Separating structure from statistical noise in complex systems
Networks
delineate the constituent interactions of a broad range of large-scale
complex
systems. They are essential to describe socio-economical
relations, the human brain, cell metabolism, ecosystems, epidemic
spreading, informational infrastructure, transportation systems, and
many more.
The structure of these network systems is typically large and
heterogeneous. Network theory offers a wide ranging foundation to
untangle such intricate systems, potentially allowing us to predict
and control their behavior, as well as to provide scientific
explanations.
A significant obstacle for the comprehension of such
high-dimensional relational objects lies in discerning
between signal
and randomness.
It is crucial to identify which aspects of these systems arise from
random stochastic fluctuations and which convey valuable information
about an underlying phenomenon. This is a multifaceted problem that
often defies intuition, and lies at the heart of any data-driven
analysis.
These
three adjacency
matrices correspond to the
same random
graph; the only difference between them is how the nodes are
ordered. Each ordering reveals a nontrivial—and seemingly
compelling—mixing pattern between the nodes. However, since the
graph is completely random, all these patterns are mere statistical
illusions, dredged
out of pure randomness, and therefore
overfit the
data. Widespread methods of network data
analysis cannot
distinguish these spurious patterns from statistically significant
ones, posing a significant risk to their application. (Reload
this page to see how many different patterns can be found in random
graphs!)
In our group, we focus on the development of principled and
trustworthy methods to extract scientific understanding from network
data, as well as the mathematical modeling of network behavior and
evolution.
We're particularly interested in problems of network inference
where meaningful structural and functional patterns cannot be obtained
by direct inspection or low-order statistics, and require instead more
sophisticated approaches based on large-scale generative models and
efficient algorithms derived from them. In more demanding, but
nonetheless ubiquitous scenarios, the network data are noisy,
incomplete, or even completely hidden, leaving their trace only via an
observed dynamical behavior—in which case the network needs to be
fully reconstructed from indirect information.
Most of the methods developed in our group are made available
as part of
the graph-tool
library, which
is extensively
documented. For a practical introduction to many inference
and reconstruction algorithms, please refer to
the HOWTO.
We develop principled
methods to infer the hierarchical, modular structure of
networks, based on generative models and Bayesian
inference. Our approaches are efficient (scaling up to
huge networks) and robust. In particular they are able to
avoid both overfitting and underfitting the data. See
review [B2] for an
introduction, and
the HOWTO
documentation
for graph-tool. See
also
[24,20,33,23,42,43].
Annotated and attributed networks
Network data are often annotated with weights or
covariates on the edges, or metadata on the nodes. We develop
inference approaches that are able to leverage this formation to
uncover latent, statistically meaningful network structures. Our
perspective is that such annotations are just more data—not
“ground truth”—and hence are also subject to noise,
incompleteness, irrelevance, etc. See
[37,31].
Dynamical networks
In many instances, networks are dynamical
objects and their structure evolves in time. We develop
inference methods that are able to characterize how the
large-scale structure dynamically changes. Importantly,
instead of imposing a priori characteristic time scales,
we extract the relevant scales from data by formulating
arbitrary-order dynamical models, within a nonparametric
Bayesian inference framework. See
[B1,36,34,28].
Uncertain network reconstruction
As is unavoidable in any empirical setting,
network data are subject to measurement uncertainties and
omissions. However, differently from more established
empirical traditions, network data often do not contain
reported error assessments of any kind. We develop
principled methods of error evaluation and network
reconstruction that are able to function even in the
demanding scenario where only a single network is
observed, and the error magnitudes are unknown. See
[39].
Reconstruction from dynamics
Certain networks are impossible, or
prohibitively expensive, to be measured directly, and we
need to infer their structure from an observed dynamics
that takes place on them. We develop Bayesian methods that
are able to achieve this reconstruction, and demonstrate
how the joint inference of modular network structure with
the network itself can significantly improve the
reconstruction from indirect dynamics as well, specially
when coupled with efficient algorithms. This amounts to a
unification of network reconstruction with community
detection—two central but traditionally isolated
problems in network science, statistics and machine
learning. See [40].
Disentangling edge formation mechanisms
Networks are often the result of a variety of
different and interdependent generative mechanisms that
operate on different scales (e.g. global or local). We are
able to show that it is possible, in key cases, to
decompose the contributions of each mechanism, based only
on the traces they leave behind on the network
structure. In particular we show how homophily and triadic
closure can be disentangled from each other, given only a
single network snapshot. This has important consequences
to the interpretation of community detection methods,
since the effects of both mechanisms are often
conflated. See [50].