Networks capture pairwise interactions between entities and are frequently used in applications such as social networks, food networks, and protein interaction networks. Communities, cohesive groups of nodes, often form in these applications, and identifying them gives insight into the overall organization of the network. One common quality function used to identify community structure is modularity. In Hu et al. [SIAM J. Appl. Math., 73 (2013), pp. 2224-2246], it was shown that modularity optimization is equivalent to minimizing a particular nonconvex total variation (TV) based functional over a discrete domain. They solve this problem, assuming the number of communities is known, using a Merriman-Bence-Osher (MBO) scheme. We show that modularity optimization is equivalent to minimizing a convex TV-based functional over a discrete domain, again assuming the number of communities is known. Furthermore, we show that modularity has no convex relaxation satisfying certain natural conditions. We therefore find a manageable nonconvex approximation using a Ginzburg-Landau functional, which provably converges to the correct energy in the limit of a certain parameter. We then derive an MBO algorithm that has fewer hand-tuned parameters than in Hu et al. and that is seven times faster at solving the associated diffusion equation because the underlying discretization is unconditionally stable. Our numerical tests include a hyperspectral video whose associated graph has $2.9 \times 10^7$ edges, which is roughly 37 times larger than what was handled in the paper of Hu et al.
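
For readers unfamiliar with the quality function named above, the following is a minimal sketch of how modularity can be evaluated for a given partition. It assumes the classical Newman-Girvan definition, $Q = \frac{1}{2m}\sum_{i,j}\bigl(w_{ij} - \gamma\,\frac{k_i k_j}{2m}\bigr)\,\delta(g_i, g_j)$, with a resolution parameter $\gamma$ included as an assumption (the abstract does not state which variant is used). The function name `modularity` and the toy graph are hypothetical and purely illustrative. This is a brute-force evaluation of the objective, not the TV reformulation or the MBO scheme described above.

```python
def modularity(adjacency, labels, gamma=1.0):
    """Newman-Girvan modularity of a partition of an undirected weighted graph.

    adjacency: symmetric list of lists of nonnegative edge weights w_ij.
    labels:    labels[i] is the community assigned to node i.
    gamma:     resolution parameter (assumed; gamma = 1 is the classical case).
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]  # k_i = sum_j w_ij
    two_m = sum(degrees)                       # 2m = total degree
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            # Only pairs of nodes in the same community contribute.
            if labels[i] == labels[j]:
                q += adjacency[i][j] - gamma * degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph (illustrative): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph, separating the two triangles scores higher than placing all nodes in one community, which is the behavior a community detection method maximizing modularity relies on. The double loop costs $O(n^2)$ and is only meant to make the definition concrete; it would not scale to graphs of the size mentioned above.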