News and views

Modularity maximization considered harmful

Published on Monday, 06 December, 2021
Tags: community detection, modularity, inference, Bayes, SBM

(This is a continuations of the previous two posts, and a slightly modified version of chapter III in [peixoto_descriptive_2021].)

The most widespread method for community detection is modularity maximization [newman_modularity_2006], which happens also to be one the most problematic. This method is based on the modularity function,

\begin{equation*} Q(\boldsymbol A,\boldsymbol b) = \frac{1}{2E}\sum_{ij}\left(A_{ij} - \frac{k_ik_j}{2E}\right)\delta_{b_i,b_j} \end{equation*}

where \(A_{ij}\in\{0,1\}\) is an entry of the adjacency matrix, \(k_i=\sum_jA_{ij}\) is the degree of node \(i\), \(b_i\) is the group membership of node \(i\), and \(E\) is the total number of edges. The method consists in finding the partition \(\hat{\boldsymbol b}\) that maximizes \(Q({\boldsymbol A},{\boldsymbol b})\),

\begin{equation*} \hat{\boldsymbol b} = \underset{\boldsymbol b}{\operatorname{argmax}}\; Q(\boldsymbol A,\boldsymbol b). \end{equation*}

The motivation behind the modularity function is that it compares the existence of an edge \((i,j)\) to the probability of it existing according to a null model, \(P_{ij} = k_ik_j/2E\), namely that of the configuration model [fosdick_configuring_2018] (or more precisely, the Chung-Lu model [chung_connected_2002]). The intuition for this method is that we should consider a partition of the network meaningful if the occurrence of edges between nodes of the same group exceeds what we would expect with a random null model without communities.

Despite its widespread adoption, this approach suffers from a variety of serious conceptual and practical flaws, which have been documented extensively [guimera_modularity_2004] [fortunato_resolution_2007] [good_performance_2010] [fortunato_community_2010] [fortunato_community_2016]. The most problematic one is that it purports to use an inferential criterion — a deviation from a null generative model — but is in fact merely descriptive. As has been recognized very early, this method categorically fails in its own stated goal, since it always finds high-scoring partitions in networks sampled from its own null model [guimera_modularity_2004].

The reason for this failure is that the method does not take into account the deviation from the null model in a statistically consistent manner. The modularity function is just a re-scaled version of the assortativity coefficient [newman_mixing_2003], a correlation measure of the community assignments seen at the endpoints of edges in the network. We should expect such a correlation value to be close to zero for a partition that is determined before the edges of the network are placed according to the null model, or equivalently, for a partition chosen at random. However, it is quite a different matter to find a partition that optimizes the value of \(Q({\boldsymbol A},{\boldsymbol b})\), after the network is observed. The deviation from a null model computed in eq:Q completely ignores the optimization step of eq:qmax, although it is a crucial part of the algorithm. As a result, the method of modularity maximization tends to massively overfit, and find spurious communities even in networks sampled from its null model. We are searching for patterns of correlations in a random graph, and most of the time we will find them. This is a pitfall known as "data dredging" or "p-hacking", where one searches exhaustively for different patterns in the same data and reports only those that are deemed significant, according to a criterion that does not take into account the fact that we are doing this search in the first place.

We demonstrate this problem in fig:randomQ, where we show the distribution of modularity values obtained with a uniform configuration model with \(k_i=5\) for every node \(i\), considering both a random partition and the one that maximizes \(Q({\boldsymbol A},{\boldsymbol b})\). While for a random partition we find what we would expect, i.e. a value of \(Q({\boldsymbol A},{\boldsymbol b})\) close to zero, for the optimized partition the value is substantially larger. Inspecting the optimized partition in fig:randomQ (c), we see that it corresponds indeed to 15 seemingly clear assortative communities — which by construction bear no relevance to how the network was generated. They have been dredged out of randomness by the optimization procedure.

/static/blog/figs/modularity/modularity_fail.png

Modularity maximization systematically overfits, and finds spurious structures even its own null model. In this example we consider a random network model with \(N=10^3\) nodes, with every node having degree \(5\). (a) Distribution of modularity values for a partition into 15 groups chosen at random, and for the optimized value of modularity, for \(5000\) networks sampled from the same model. (b) Adjacency matrix of a sample from the model, with the nodes ordered according to a random partition. (c) Same as (b), but with the nodes ordered according to the partition that maximizes modularity.

Somewhat paradoxically, another problem with modularity maximization is that in addition to systematically overfitting, it also systematically underfits. This occurs via the so-called resolution limit: in a connected network [1] the method cannot find more than \(\sqrt{2E}\) communities [fortunato_resolution_2007], even if they seem intuitive or can be found by other methods. An example of this is shown in fig:resolution, where for a network generated with the SBM containing 30 communities, modularity maximization finds only 18, while an inferential approach has no problems finding the true structure. There are attempts to counteract the resolution limit by introducing a "resolution parameter" to the modularity function, but they are in general ineffective (see [peixoto_descriptive_2021]).

/static/blog/figs/modularity/modularity_resolution.png

The resolution limit of modularity maximization prevents small communities from being identified, even if there is sufficient statistical evidence to support them. Panel (a) shows a network with \(B=30\) communities sampled from an assortative SBM parametrization. The colors indicate the \(18\) communities found with modularity maximization, where several pairs of true communities are merged together. Panel (b) shows the inference result of an assortative SBM [zhang_statistical_2020], recovering the true communities with perfect accuracy. Panels (c) and (d) show the results for a similar model where a larger community has been introduced. In (c) we see the results of modularity maximization, which not only merges the smaller communities together, but also splits the larger community into several spurious ones — thus both underfitting and overfitting different parts of the network at the same time. In (d) we see the result obtained by inferring the SBM, which once again finds the correct answer.

These two problems — overfitting and underfitting — can occur in tandem, such that portions of the network dominated by randomness are spuriously revealed to contain communities, whereas other portions with clear modular structure can have those obstructed. The result is a very unreliable method to capture the structure of heterogeneous networks. We demonstrate this in fig:resolution (c) and (d)

In addition to these major problems, modularity maximization also often possesses a degenerate landscape of solutions, with very different partitions having similar values of \(Q({\boldsymbol A},{\boldsymbol b})\) [good_performance_2010]. In these situations the partition with maximum value of modularity can be a poor representative of the entire set of high-scoring solutions and depend on idiosyncratic details of the data rather than general patterns — which can be interpreted as a different kind of overfitting.

The combined effects of underfitting and overfitting can make the results obtained with the method unreliable and difficult to interpret. As a demonstration of the systematic nature of the problem, in fig:Qrand (a) we show the number of communities obtained using modularity maximization for 263 empirical networks of various sizes and belonging to different domains, obtained from the Netzschleuder catalogue. Since the networks considered are all connected, the values are always below \(\sqrt{2E}\), due to the resolution limit; but otherwise they are well distributed over the allowed range. However, in fig:Qrand (b) we show the same analysis, but for a version of each network that is fully randomized, while preserving the degree sequence. In this case, the number of groups remains distributed in the same range (sometimes even exceeding the resolution limit, because the randomized versions can end up disconnected). As fig:Qrand (c) shows, the number of groups found for the randomized networks is strongly correlated with the original ones, despite the fact that the former have no latent community structure. This is a strong indication of the substantial amount of noise that is incorporated into the partitions found with the method.

/static/blog/figs/modularity/modularity_random.png

Modularity maximization incorporates a substantial amount of noise into its results. (a) Number of groups found using modularity maximization for 263 empirical networks as a function of the number of edges. The dashed line corresponds to the \(\sqrt{2E}\) upper bound due to the resolution limit. (b) The same as in (a) but with randomized versions of each network. (c) Correspondence between the number of groups of the original and randomized network. The dashed line shows the diagonal.

The systematic overfitting of modularity maximization — as well as other descriptive methods such as Infomap — has been also demonstrated recently in [ghasemian_evaluating_2019], from the point of view of edge prediction, on a separate empirical dataset of 572 networks from various domains.

Although many of the problems with modularity maximization were long known, for some time there were no principled solutions to them, but this is no longer the case. In the table below we summarize some of the main problems with modularity and how they are solved with inferential approaches.

Problem Principled solution via inference
Modularity maximization overfits, and finds modules in fully random networks. [guimera_modularity_2004] Bayesian inference of the SBM is designed from the ground to avoid this problem in a principled way and systematically succeeds [peixoto_bayesian_2019].
Modularity maximization has a resolution limit, and finds at most \(\sqrt{2E}\) groups in connected networks [fortunato_resolution_2007]. Inferential approaches with hierarchical priors [peixoto_hierarchical_2014] [peixoto_nonparametric_2017] or strictly assortative structures [zhang_statistical_2020] do not have any appreciable resolution limit, and can find a maximum number of groups that scales as \(O(N/\log N)\). Importantly, this is achieved without sacrificing the robustness against overfitting.
Modularity maximization has a characteristic scale, and tends to find communities of similar size; in particular with the same sum of degrees. Hierarchical priors can be specifically chosen to be a priori agnostic about characteristic sizes, densities of groups and degree sequences [peixoto_nonparametric_2017], such that these are not imposed, but instead obtained from inference, in an unbiased way.
Modularity maximization can only find strictly assortative communities. Inferential approaches can be based on any generative model. The general SBM will find any kind of mixing pattern in an unbiased way, and has no problems identifying modular structure in bipartite networks, core-periphery networks, and any mixture of these or other patterns. There are also specialized versions for bipartite [larremore_efficiently_2014], core-periphery [zhang_identification_2015], and assortative patterns [zhang_statistical_2020], if these are being searched exclusively.
The solution landscape of modularity maximization is often degenerate, with many different solutions with close to the same modularity value [good_performance_2010], and with no clear way of how to select between them. Inferential methods are characterized by a posterior distribution of partitions. The consensus or dissensus between the different solutions [peixoto_revealing_2021] can be used to determine how many cohesive hypotheses can be extracted from inference, and to what extent is the model being used a poor or a good fit for the network.

Because of the above problems, the use of modularity maximization should be discouraged, since it is demonstrably not fit for purpose as an inferential method. As a consequence, the use of modularity maximization in any recent network analysis can be arguably considered a "red flag" that strongly indicates methodological carelessness. In the absence of secondary evidence supporting the alleged community structures found, or extreme care to counteract the several limitations of the method, the safest assumption is that the results obtained with that method tend to contain a substantial amount of noise, rendering any inferential conclusion derived from them highly suspicious.

As a final note, we focus on modularity here not only for its widespread adoption but also because of its emblematic character. At a fundamental level, all of its shortcoming are shared with any descriptive method in the literature — to varied but always non-negligible degrees.

[peixoto_descriptive_2021](1, 2) Tiago P. Peixoto, "Descriptive vs. inferential community detection: pitfalls, myths and half-truths", arXiv: 2112.00183
[newman_modularity_2006]M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences 103, 8577–8582 (2006). DOI: 10.1073/pnas.0601602103
[fosdick_configuring_2018]B. Fosdick, D. Larremore, J. Nishimura, and J. Ugander, "Configuring Random Graph Models with Fixed Degree Sequences," SIAM Review 60, 315–355 (2018). DOI: 10.1137/16M1087175
[chung_connected_2002]Fan Chung and Linyuan Lu, "Connected Components in Random Graphs with Given Expected Degree Sequences," Annals of Combinatorics 6, 125–145 (2002). DOI: 10.1007/PL00012580
[guimera_modularity_2004](1, 2, 3) Roger Guimerà, Marta Sales-Pardo, and Luís A. Nunes Amaral, "Modularity from fluctuations in random graphs and complex networks," Physical Review E 70, 025101 (2004). DOI: 10.1103/PhysRevE.70.025101
[fortunato_resolution_2007](1, 2, 3) Santo Fortunato and Marc Barthélemy, "Resolution limit in community detection", Proceedings of the National Academy of Sciences 104, 36–41 (2007). DOI: 10.1073/pnas.0605965104
[good_performance_2010](1, 2, 3) Benjamin H. Good, Yves-Alexandre de Montjoye, and Aaron Clauset, "Performance of modularity maximization in practical contexts", Physical Review E 81, 046106 (2010). DOI: 10.1103/PhysRevE.81.046106
[fortunato_community_2010]Santo Fortunato, "Community detection in graphs", Physics Reports 486, 75–174 (2010). DOI: 16/j.physrep.2009.11.002
[fortunato_community_2016]Santo Fortunato and Darko Hric, "Community detection in networks: A user guide", Physics Reports(2016), DOI: 10.1016/j.physrep.2016.09.002
[newman_mixing_2003]M. E. J. Newman, "Mixing patterns in networks", Phys. Rev. E 67, 026126 (2003). DOI: 10.1103/PhysRevE.67.026126
[ghasemian_evaluating_2019]Amir Ghasemian, Homa Hosseinmardi, and Aaron Clauset, "Evaluating Overfit and Underfit in Models of Network Community Structure," IEEE Transactions on Knowledge and Data Engineering, 1–1 (2019). DOI: 10.1109/TKDE.2019.2911585
[peixoto_hierarchical_2014]Tiago P. Peixoto, "Hierarchical Block Structures and High-Resolution Model Selection in Large Networks", Physical Review X 4, 011047 (2014). DOI: 10.1103/PhysRevX.4.011047
[peixoto_nonparametric_2017](1, 2) Tiago P. Peixoto, "Nonparametric Bayesian inference of the microcanonical stochastic block model", Physical Review E 95, 012317 (2017). DOI: 10.1103/PhysRevE.95.012317
[zhang_statistical_2020](1, 2, 3) Lizhi Zhang and Tiago P. Peixoto, "Statistical inference of assortative community structures." Physical Review Research 2, 043271 (2020). DOI: 10.1103/PhysRevResearch.2.043271
[larremore_efficiently_2014]Daniel B. Larremore, Aaron Clauset, and Abigail Z. Jacobs, "Efficiently inferring community structure in bipartite networks", Physical Review E 90, 012805 (2014). DOI: 10.1103/PhysRevE.90.012805
[zhang_identification_2015]Xiao Zhang, Travis Martin, and M. E. J. Newman, "Identification of core-periphery structure in networks," Physical Review E 91, 032803 (2015). DOI: 10.1103/PhysRevE.91.032803
[peixoto_revealing_2021]Tiago P. Peixoto, "Revealing Consensus and Dissensus between Network Partitions", Physical Review X 11, 021003 (2021). DOI: 10.1103/PhysRevX.11.021003
[peixoto_bayesian_2019]Tiago P. Peixoto, "Bayesian Stochastic Blockmodeling", in Advances in Network Clustering and Blockmodeling (John Wiley & Sons, Ltd, 2019) pp. 289–332. DOI: 10.1002/9781119483298.ch11
[1]Modularity maximization, like many descriptive community detection methods, will always place connected components in different communities. This is another clear distinction with inferential approaches, since fully random models — without latent community structure — can generate disconnected networks if they are sufficiently sparse. From an inferential point of view, it is therefore incorrect to assume that every connected component must belong to a different community.

Comments