Inferential Network Science

Our research group works at the interface between Computational Statistics, Information Theory, Bayesian Inference, Machine Learning, and Statistical Physics.

Group News

  • 21/06/2024 — Sebastian Kusch has beew awarded the “best talk by an early-career researcher” at NetSci 2024 in Québec!
  • 06/05/2024 — A new version of graph-tool was released!
  • 02/05/2024 — New pre-print: “Network reconstruction via the minimum description length principle” [1]
  • 12/04/2024 — Upcoming talk at NetSI, Boston.
  • 09/04/2024 — Bukyoung Jhun has been awarded the Young Statistical Physicist Award by Korean Physical Society!
  • 01/03/2024 — Martina Contisciani has joined the group as a post-doc!
  • 04/01/2024 — New arxiv paper: “Scalable network reconstruction in subquadratic time” [2]
  • 01/01/2024 — Thomas Robiglio has joined the group as a PhD student!
  • 15/12/2023 — Silvia Guerrini defended her MSc thesis!
  • 01/09/2023 — Bukyoung Jhun has joined the group as a post-doc!

We have as main focus the methodological foundations of Network Science and the study of Complex Systems.

Our ambition has been to render obsolete the reliance on ad hoc heuristics in the field of network data analysis, and transition instead to a mature and robust methodological framework that is derived from fundamental principles and is grounded in solid statistical theory.

This framework should be able to extract the most appropriate level of complexity that can be justified from statistical evidence, taking into account both epistemic and aleatoric uncertainty, while achieving interpretability, algorithmic efficiency, and versatility.

A central concern of ours is the practical implementation of inductive reasoning and statistical inference to relational data that come from a variety of complex systems in the real world. A lot of what we do is framed by the following instrumental questions:

  1. How do we prevent overfitting and produce explanations of empirical observations that correctly separate structure from randomness?
  2. How can we reconstruct dynamical rules and network structures from indirect information on their behavior?
  3. How do we faithfully model the hierarchical, modular, higher-order, and dynamical structure of network systems?

This line of work was recognized with the Erdős–Rényi Prize from the Network Science Society.

The graph-tool library

Most of the methods developed in our group are made available as part of the graph-tool library, which is extensively documented.
For a practical introduction to many inference and reconstruction algorithms, please refer to the HOWTO.

From data to parsimonious models of complex systems

Networks delineate the constituent interactions of a broad range of large-scale complex systems. They provide an essential mathematical representation of socio-economical relations, the human brain, cell metabolism, ecosystems, epidemic spreading, informational infrastructure, transportation systems, and many more.

The structure of these network systems is typically large and heterogeneous, and the interactions they describe are typically non-linear, and result in nontrivial emergent behavior and self-organization. Network theory offers a wide ranging foundation to untangle such intricate systems, potentially allowing us to predict and control their behavior, as well as to provide scientific explanations.

From complex visibles to simple invisibles

The essential goal of science is to find understandable explanations (inferred and sufficiently simple) to what at first seems incomprehensible behavior (observed and initially complex).

In the context of complexity science and network science, the understandable explanations that we seek are the fundamental local interaction rules that give rise to global emergent behavior. These fundamental rules are rarely observed directly — instead, they are almost always latent, i.e. hidden from the observer.

Therefore, to fulfill our scientific goal we must develop a robust inductive framework to infer the hidden minimal rules that govern complex behavior.

Structure vs. randomness

A significant obstacle for the inference of such high-dimensional relational objects lies in discerning between signal and randomness. We need to be able to identify which aspects of these systems arise from random stochastic fluctuations and which convey valuable information about an underlying phenomenon. This is a multifaceted problem that often defies intuition, and lies at the heart of any data-driven analysis.

Figure 1: These three adjacency matrices correspond to the same random graph; the only difference between them is how the nodes are ordered. Each ordering reveals a nontrivial—and seemingly compelling—mixing pattern between the nodes. However, since the graph is completely random, all these patterns are mere statistical illusions, dredged out of pure randomness, and therefore overfit the data. Widespread methods of network data analysis cannot distinguish these spurious patterns from statistically significant ones, posing a significant risk to their application.

Figure 1 cycles through some of the many different patterns that can be found in fully random graphs!

How about the patterns that you see in your network?

Are they statistically meaningful? 🤔

There’s a wrong way and a right way to find out.

In our group, we focus on the development of principled and trustworthy methods to extract scientific understanding from network data.

Our methods are designed to be robust against overfitting, honoring the principle of maximum parsimony—or Occam’s razor, as well as to enable model comparison, validation, and uncertainty quantification, while also being algorithmically efficient. This is achieved by merging analytical tools and concepts from a variety of disciplines, including Information Theory, Bayesian Statistics, Machine Learning, and Statistical Mechanics.

We’re particularly interested in problems of network inference where meaningful structural and functional patterns cannot be obtained by direct inspection or low-order statistics, and require instead more sophisticated approaches based on large-scale generative models and efficient algorithms derived from them. In more demanding, but nonetheless ubiquitous scenarios, the network data are noisy, incomplete, or even completely hidden, leaving their trace only via an observed dynamical behavior—in which case the network needs to be fully reconstructed from indirect information.

Research highlights

Inferring modular structures in networks

Figure 2: We develop principled methods to infer the hierarchical, modular structure of networks, based on generative models and Bayesian inference. Our approaches are efficient (scaling up to huge networks) and robust. In particular they are able to avoid both overfitting and underfitting the data. See review  [3] for an introduction, and the HOWTO documentation for graph-tool. See also  [49].

Annotated and attributed networks

Figure 3: Network data are often annotated with weights or covariates on the edges, or metadata on the nodes. We develop inference approaches that are able to leverage this formation to uncover latent, statistically meaningful network structures. Our perspective is that such annotations are just more data—not “ground truth”—and hence are also subject to noise, incompleteness, irrelevance, etc.  [See 10,11].

Dynamical networks

Figure 4: In many instances, networks are dynamical objects and their structure evolves in time. We develop inference methods that are able to characterize how the large-scale structure dynamically changes. Importantly, instead of imposing a priori characteristic time scales, we extract the relevant scales from data by formulating arbitrary-order dynamical models, within a nonparametric Bayesian inference framework.  [See 12,13,14].

Uncertain network reconstruction

Figure 5: As is unavoidable in any empirical setting, network data are subject to measurement uncertainties and omissions. However, differently from more established empirical traditions, network data often do not contain reported error assessments of any kind. We develop principled methods of error evaluation and network reconstruction that are able to function even in the demanding scenario where only a single network is observed, and the error magnitudes are unknown.  [See 15].

Reconstruction from dynamics

Figure 6: Certain networks are impossible, or prohibitively expensive, to be measured directly, and we need to infer their structure from an observed dynamics that takes place on them. We develop Bayesian methods that are able to achieve this reconstruction, and demonstrate how the joint inference of modular network structure with the network itself can significantly improve the reconstruction from indirect dynamics as well, specially when coupled with efficient algorithms. This amounts to a unification of network reconstruction with community detection—two central but traditionally isolated problems in network science, statistics and machine learning.  [2,See 16].

Disentangling edge formation mechanisms

Figure 7: Networks are often the result of a variety of different and interdependent generative mechanisms that operate on different scales (e.g. global or local). We are able to show that it is possible, in key cases, to decompose the contributions of each mechanism, based only on the traces they leave behind on the network structure. In particular we show how homophily and triadic closure can be disentangled from each other, given only a single network snapshot. This has important consequences to the interpretation of community detection methods, since the effects of both mechanisms are often conflated.  [See 17].

Open positions

Interested PhD candidates are encouraged to apply for the “PhD Program in Network Science at CEU”.

Prospective post-doc researchers should refer to the joining the group page.


T. P. Peixoto, Scalable Network Reconstruction in Subquadratic Time, arXiv:2401.01404 (2024).
T. P. Peixoto, Bayesian Stochastic Blockmodeling, in Advances in Network Clustering and Blockmodeling (John Wiley & Sons, Ltd, 2019), pp. 289–332.
T. P. Peixoto, Parsimonious Module Inference in Large Networks, Physical Review Letters 110, 148701 (2013).
T. P. Peixoto, Nonparametric Bayesian Inference of the Microcanonical Stochastic Block Model, Physical Review E 95, 012317 (2017).
T. P. Peixoto, Latent Poisson Models for Networks with Heterogeneous Density, Physical Review E 102, 012309 (2020).
T. P. Peixoto, Merge-Split Markov Chain Monte Carlo for Community Detection, Physical Review E 102, 012305 (2020).
T. P. Peixoto, Nonparametric Weighted Stochastic Block Models, Physical Review E 97, 012306 (2018).
D. Hric, T. P. Peixoto, and S. Fortunato, Network Structure, Metadata, and the Prediction of Missing Nodes and Annotations, Physical Review X 6, 031038 (2016).
T. P. Peixoto and L. Gauvin, Change Points, Memory and Epidemic Spreading in Temporal Networks, Scientific Reports 8, 15511 (2018).
T. P. Peixoto and M. Rosvall, Modelling Sequences and Temporal Networks with Dynamic Community Structures, Nature Communications 8, 582 (2017).
T. P. Peixoto, Reconstructing Networks with Unknown and Heterogeneous Errors, Physical Review X 8, 041011 (2018).
T. P. Peixoto, Network Reconstruction and Community Detection from Dynamics, Physical Review Letters 123, 128301 (2019).
T. P. Peixoto, Disentangling Homophily, Community Structure, and Triadic Closure in Networks, Physical Review X 12, 011004 (2022).