# Inferential network science

The 21st century has been marked by an unprecedented and still growing volume of digital data on human behavior, biological organisms, economies, and a variety of other complex systems.

Networks delineate the constituent interactions of a broad range of such large-scale complex systems. They provide an essential mathematical representation of socio-economic relations, the human brain, cell metabolism, ecosystems, epidemic spreading, information infrastructure, transportation systems, and many more.

The structure of these network systems is typically large and heterogeneous, and the interactions they describe are often non-linear, resulting in nontrivial emergent behavior and self-organization.

Although network theory offers a wide-ranging foundation to untangle such intricate systems, potentially allowing us to predict and control their behavior, the *analysis of network data* is particularly challenging. Since networks are high-dimensional relational objects, low-order statistics reveal very little about them. Conversely, higher-order representations are prone to overfitting if obtained heuristically, and can easily yield misleading characterizations and statistical illusions.

What is needed is a principled inferential framework that can extract from data the most appropriate level of complexity that can be justified from statistical evidence, taking into account both epistemic and aleatoric uncertainty, while achieving interpretability, algorithmic efficiency, and versatility.

A central concern of ours is the practical application of inductive reasoning and statistical inference to relational data that come from a variety of complex systems in the real world. Much of our work is framed by the following instrumental questions:

- How do we prevent overfitting and produce explanations of empirical observations that correctly separate structure from randomness?
- How can we reconstruct dynamical rules and network structures from indirect information on their behavior?
- How do we faithfully model the hierarchical, modular, higher-order, and dynamical structure of network systems?

This line of work was recognized with the Erdős–Rényi Prize from the Network Science Society.

Most of the methods developed in our group are made available as part of the graph-tool library, which is extensively documented.

For a practical introduction to many inference and reconstruction algorithms, please refer to the HOWTO.

## Simplicity science for complex systems

(or, from complex visibles to simple invisibles)

The essential goal of science is to find understandable explanations (inferred and sufficiently simple) for what at first seems incomprehensible behavior (observed and initially complex).^{1}

^{1} Recommended: M. Marsili, *Simplicity Science*, Indian Journal of Physics **98**, 3789 (2024).

In the context of complexity and network science, the understandable explanations that we seek are the fundamental local interaction rules that give rise to global emergent behavior. These fundamental rules are rarely observed directly—instead, they are almost always **latent**, i.e. hidden from the observer.

Therefore, to fulfill our scientific goal we must develop a robust inductive framework to *infer* the hidden minimal rules that govern complex behavior.

### Structure vs. randomness

A significant obstacle to the inference of such high-dimensional relational objects lies in discerning between signal and randomness. We need to be able to identify which aspects of these systems arise from stochastic fluctuations and which convey valuable information about an underlying phenomenon. This is a multifaceted problem that often defies intuition, and lies at the heart of any data-driven analysis.

Figure 1 below demonstrates how easy it is to mistake pure randomness for seemingly meaningful structure in complex systems.
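The same effect can be reproduced numerically. The following is a minimal sketch, not a method from our group: it builds a fully random graph, then searches greedily for a partition that maximizes Newman-Girvan modularity. All function names, the graph size, and the choice of four groups are assumptions made for illustration only. Because the edges are placed uniformly at random, any "community structure" the search reports is an artifact of fitting noise.

```python
import random
from collections import Counter

def random_graph(n, m, rng):
    """Return m distinct undirected edges placed uniformly at random among n nodes."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)

def modularity(edges, labels):
    """Newman-Girvan modularity of a partition given as a list of node labels."""
    m = len(edges)
    internal = Counter()  # number of edges inside each group
    degree = Counter()    # sum of node degrees in each group
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[r] / m - (d / (2 * m)) ** 2 for r, d in degree.items())

def greedy_partition(n, edges, groups, rng):
    """Move single nodes between groups for as long as modularity strictly increases."""
    m = len(edges)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    labels = [rng.randrange(groups) for _ in range(n)]
    group_degree = [0] * groups
    for i in range(n):
        group_degree[labels[i]] += len(neighbors[i])
    improved = True
    while improved:
        improved = False
        for i in rng.sample(range(n), n):  # visit nodes in random order
            k, r = len(neighbors[i]), labels[i]
            links = Counter(labels[j] for j in neighbors[i])
            best, best_gain = r, 1e-12
            for s in range(groups):
                if s == r:
                    continue
                # Exact modularity change when node i moves from group r to group s.
                gain = ((links[s] - links[r]) / m
                        - k * (group_degree[s] - group_degree[r] + k) / (2 * m * m))
                if gain > best_gain:
                    best, best_gain = s, gain
            if best != r:
                group_degree[r] -= k
                group_degree[best] += k
                labels[i] = best
                improved = True
    return labels

rng = random.Random(42)
n, m = 100, 200
edges = random_graph(n, m, rng)
labels = greedy_partition(n, edges, groups=4, rng=rng)
q_found = modularity(edges, labels)
# Baseline: same group sizes, but membership assigned without looking at the edges.
q_shuffled = modularity(edges, rng.sample(labels, n))
print(f"modularity of optimized partition: {q_found:.3f}")
print(f"modularity of shuffled partition:  {q_shuffled:.3f}")
```

The optimized partition scores well above the shuffled baseline even though the graph contains no planted groups. A high value of a quality function is therefore not, on its own, evidence of structure.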

In our group, we focus on the development of principled and trustworthy methods to extract scientific understanding from network data, in a manner that avoids statistical illusions.

Our methods are designed to be robust against overfitting, honoring the principle of maximum parsimony (Occam’s razor), and to enable model comparison, validation, and uncertainty quantification, while remaining algorithmically efficient. This is achieved by merging analytical tools and concepts from a variety of disciplines, including information theory, Bayesian statistics, machine learning, and statistical mechanics.
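One standard way to make parsimony quantitative connects Bayesian inference with information theory. The sketch below uses assumed notation that does not appear elsewhere on this page: $A$ stands for the observed network and $\theta$ for the model parameters (for example, a partition of the nodes).

```latex
P(\theta \mid A) = \frac{P(A \mid \theta)\, P(\theta)}{P(A)},
\qquad
\Sigma(A, \theta) = -\log_2 P(A \mid \theta) - \log_2 P(\theta)
```

Since $P(A)$ does not depend on $\theta$, maximizing the posterior is equivalent to minimizing the description length $\Sigma$, the number of bits needed to encode the model and then the data given the model. A more complex model is preferred only when the bits it saves in the data term exceed the bits it costs in the model term.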

## Network reconstruction

We are particularly interested in problems of network inference where meaningful structural and functional patterns cannot be obtained by direct inspection or low-order statistics, and instead require more sophisticated approaches based on large-scale generative models and efficient algorithms derived from them. In more demanding, but nonetheless ubiquitous, scenarios the network data are noisy, incomplete, or even completely hidden, leaving their trace only via observed dynamical behavior. In such cases the network needs to be fully reconstructed from indirect information (see Figure 2) [1].
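To give a flavor of the noisy-measurement case, here is a deliberately simplified toy sketch. It assumes that each node pair is measured several times independently and that the false-positive and false-negative rates are known. Both assumptions are made only for illustration, and the function and parameter names are hypothetical. Realistic reconstruction must infer the error rates and the network structure jointly.

```python
def edge_posterior(k, n, prior, false_pos, false_neg):
    """Posterior probability that an edge exists, given that k out of n
    independent noisy measurements reported it as present.

    prior     -- assumed prior probability of an edge (small for sparse networks)
    false_pos -- assumed probability of reporting an edge that does not exist
    false_neg -- assumed probability of missing an edge that does exist
    """
    # The binomial coefficient is the same in both likelihoods, so it cancels.
    like_edge = (1 - false_neg) ** k * false_neg ** (n - k)
    like_no_edge = false_pos ** k * (1 - false_pos) ** (n - k)
    evidence = prior * like_edge + (1 - prior) * like_no_edge
    return prior * like_edge / evidence

# Three measurements of one node pair in a sparse network.
for k in range(4):
    p = edge_posterior(k, n=3, prior=0.01, false_pos=0.05, false_neg=0.2)
    print(f"{k} of 3 measurements positive -> P(edge) = {p:.4f}")
```

With a sparse prior, a single positive measurement is more likely to be a false positive than a true edge, so simply thresholding the raw observations can be misleading.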

## Research highlights

**Annotated and attributed networks**

**Dynamical networks**

**Uncertain network reconstruction**

**Reconstruction from dynamics**

**Disentangling edge formation mechanisms**

### References

*Statistical Inference Links Data and Theory in Network Science*, Nature Communications **13**, 6794 (2022).

*Bayesian Stochastic Blockmodeling*, in *Advances in Network Clustering and Blockmodeling* (John Wiley & Sons, Ltd, 2019), pp. 289–332.

*Hierarchical Block Structures and High-Resolution Model Selection in Large Networks*, Physical Review X **4**, 011047 (2014).

*Parsimonious Module Inference in Large Networks*, Physical Review Letters **110**, 148701 (2013).

*Nonparametric Bayesian Inference of the Microcanonical Stochastic Block Model*, Physical Review E **95**, 012317 (2017).

*Efficient Monte Carlo and Greedy Heuristic for the Inference of Stochastic Block Models*, Physical Review E **89**, 012804 (2014).

*Latent Poisson Models for Networks with Heterogeneous Density*, Physical Review E **102**, 012309 (2020).

*Merge-Split Markov Chain Monte Carlo for Community Detection*, Physical Review E **102**, 012305 (2020).

*Nonparametric Weighted Stochastic Block Models*, Physical Review E **97**, 012306 (2018).

*Network Structure, Metadata, and the Prediction of Missing Nodes and Annotations*, Physical Review X **6**, 031038 (2016).

*Change Points, Memory and Epidemic Spreading in Temporal Networks*, Scientific Reports **8**, 15511 (2018).

*Modelling Sequences and Temporal Networks with Dynamic Community Structures*, Nature Communications **8**, 582 (2017).

*Inferring the Mesoscale Structure of Layered, Edge-Valued, and Time-Varying Networks*, Physical Review E **92**, 042807 (2015).

*Reconstructing Networks with Unknown and Heterogeneous Errors*, Physical Review X **8**, 041011 (2018).

*Network Reconstruction and Community Detection from Dynamics*, Physical Review Letters **123**, 128301 (2019).

*Scalable Network Reconstruction in Subquadratic Time*, arXiv:2401.01404 (2024).

*Disentangling Homophily, Community Structure, and Triadic Closure in Networks*, Physical Review X **12**, 011004 (2022).