
1 Introduction

Network neuroscience sees the brain through an integrative lens by mapping and modelling its elements and interactions [3, 21]. The main theoretical framework from complex network science used to model, estimate, and simulate brain networks is graph theory [9, 24]. A graph consists of a set of elements, known as vertices, together with their interconnections, known as edges. Vertices (also known as nodes) in a network can be, for example, brain areas, while edges (also known as links) represent the functional connectivity between pairs of vertices [42]. Several descriptive graph metrics [16] can then be calculated to describe the brain network's characteristics [21, 26], and they have consistently allowed researchers to identify non-random features of brain networks. An example is the ground-breaking discovery that the brain (like most other real-world networks) follows a 'small-world network' architecture [2, 43], indicating a compromise between wiring cost and optimal efficiency. Using graph theory, many insights have been gathered on the neurobiology of the healthy and diseased brain [19, 26].

Algebraic topological data analysis (TDA) provides another prism for investigating brain connectivity, going beyond 'simple' pairwise connections to higher-order interactions. With TDA, one can identify a network's shape and its invariant properties (i.e., coordinate and deformation invariances [47]). Moreover, TDA often provides more robustness against noise than graph theoretical analysis [41], a significant issue in neuroimaging data [30]. Although TDA has only recently been adopted in network neuroscience [15, 39], it has already shown exciting results on brain network data [14, 18]. However, TDA's complexity and mathematical abstraction can hinder its comprehension and application by clinical scientists.

Here, we want to facilitate the use of network neuroscience and its constituents, graph theory and TDA, by the general neuroscientific community by providing both computational and theoretical explanations of the primary metrics, strongly inspired by [21]. The work is divided into a longer manuscript [13] containing several resources (see Table in [13]) and more theoretical explanations, and a publicly available Jupyter Notebook online (https://github.com/multinetlab-amsterdam/network_TDA_tutorial). In these notebooks, we use third-party Python packages for part of the computations (e.g., networkx [25] for the graph theory metrics and gudhi [32] for persistent homology) and provide practical scripts for some TDA metrics and 3-D visualisations of simplicial complexes (a new addition to the field). Our tutorial focuses on resting-state functional magnetic resonance imaging (rsfMRI) data; however, the main concepts and tools discussed in this paper can be extrapolated to other imaging modalities and to other biological or complex networks. The extended version [13] covers the most commonly used graph metrics in network neuroscience, in line with reference [21], as well as TDA. However, due to the size constraint, here we prioritise the latter.

1.1 Starting Point: The Adjacency Matrix

The basic unit on which graph theory and TDA are applied in the context of rsfMRI is the adjacency or functional connectivity matrix [3, 21]. Typically, rsfMRI matrices are symmetric and do not specify the direction of connectivity (i.e., whether activity in area A drives activity in area B), therefore yielding undirected networks. To further analyse the rsfMRI connectivity matrix, one has to decide whether or not to keep the edges' weights (e.g., correlation values in rsfMRI connectivity) and whether to absolutise negative weights (or anticorrelations) [21, 27]. These decisions influence the computation of the different metrics described in the tutorial and matter for the biological interpretation of the results [21]. In this tutorial we use an undirected, weighted, and absolutised connectivity matrix.
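The sketch below shows one minimal way to obtain such a matrix with numpy; the time-series array `ts` is a hypothetical placeholder for real regional rsfMRI signals.

```python
import numpy as np

# Minimal sketch: build an undirected, weighted, absolutised
# connectivity matrix from regional time series.
rng = np.random.default_rng(0)
ts = rng.standard_normal((90, 200))  # hypothetical (n_regions, n_timepoints) data

adj = np.corrcoef(ts)      # Pearson correlation between all pairs of regions
adj = np.abs(adj)          # absolutise negative weights (anticorrelations)
np.fill_diagonal(adj, 0)   # remove self-connections
# `adj` is now symmetric (undirected), weighted, and absolutised.
```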

1.2 Topological Data Analysis

TDA uses methods from topology and geometry to study the shape of data [11] and can identify different characteristics of a network by addressing its higher-order structure [4, 10, 28]. A core strength of TDA is its ability to provide robust results compared with alternative methods, even when the data are noisy [18, 41]. One of the benefits of using TDA in network neuroscience is the possibility of finding global properties of a network that are preserved regardless of the way we represent the network [35], as we illustrate below. Those properties are the so-called topological invariants. Here, we cover some fundamental TDA concepts: filtration, simplicial complexes, the Euler characteristic, phase transitions, Betti numbers, curvature, and persistent homology.

Simplicial Complexes. In TDA, we consider the network as a multidimensional structure called a simplicial complex. Such a network is made up not only of the set of vertices (0-simplexes) and edges (1-simplexes) but also of triangles (2-simplexes), tetrahedrons (3-simplexes), and higher k-dimensional structures. In short, a k-simplex is an object in k dimensions and, in our work, is formed by a subset of \(k+1\) vertices of the network.
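As a concrete illustration, this short sketch builds a single 3-simplex with gudhi's SimplexTree; inserting a simplex automatically inserts all of its lower-dimensional faces.

```python
import gudhi

# A tetrahedron (3-simplex) on four vertices; its faces (triangles,
# edges, and vertices) are inserted automatically.
st = gudhi.SimplexTree()
st.insert([0, 1, 2, 3])

print(st.dimension())      # 3
print(st.num_vertices())   # 4
print(st.num_simplices())  # 15 = 4 vertices + 6 edges + 4 triangles + 1 tetrahedron
```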

Filtration. A filtration consists of a nested sequence of simplicial complexes. Here, a filtration is defined by changing the density d of the network over the range \(0 \le d \le 1\). This yields a nested sequence of networks, in which increasing d leads to a more densely connected network. In neuroscience, filtration can be used to avoid the arbitrary threshold/density choices that are usually made in the field.
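One step of such a density-based filtration can be sketched as follows; `threshold_by_density` is a hypothetical helper that keeps only the strongest fraction d of edges of the weighted matrix `adj` built above.

```python
import numpy as np

def threshold_by_density(adj, d):
    """Keep the strongest fraction `d` of edges of a weighted,
    undirected adjacency matrix with a zero diagonal (a sketch;
    ties at the cutoff may keep a few extra edges)."""
    n = adj.shape[0]
    weights = adj[np.triu_indices(n, k=1)]   # unique edges (upper triangle)
    n_keep = int(round(d * weights.size))
    if n_keep == 0:
        return np.zeros_like(adj)
    cutoff = np.sort(weights)[-n_keep]       # weight of the weakest kept edge
    return np.where(adj >= cutoff, adj, 0)

# The filtration is the nested sequence of networks obtained by increasing d:
networks = [threshold_by_density(adj, d) for d in np.arange(0.05, 1.05, 0.05)]
```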

We can encode a network into a simplicial complex in several ways [17, 29, 31]. Here, we focus on building a simplicial complex only from the brain network's cliques, i.e., we create the so-called clique complex of a brain network. In a network, a k-clique is a subset of k nodes of the network that are all-to-all connected. The 0-clique corresponds to the empty set, 1-cliques correspond to nodes, 2-cliques to links, 3-cliques to triangles, etc. In the clique complex, each \((k+1)\)-clique is associated with a k-simplex. This choice of creating simplexes from cliques has the advantage that we can still use pairwise signal processing to create a simplicial complex from brain networks, such as in [23]. It is essential to mention that other strategies to build simplicial complexes beyond pairwise signal processing are still under development, such as applications using multivariate information theory together with tools from algebraic topology [1, 5,6,7, 22, 36]. In our Jupyter Notebook [12], we provide the code to visualise the clique complex developed in [38] (Fig. 1).
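A minimal sketch of counting the cliques of a thresholded network with networkx, reusing `adj` and `threshold_by_density` from the sketches above (the density 0.02 is an arbitrary example value):

```python
import networkx as nx

# Build a graph at an example density and count its k-cliques; in the
# clique complex, every (k+1)-clique becomes a k-simplex.
G = nx.from_numpy_array(threshold_by_density(adj, 0.02))

clique_counts = {}
for clique in nx.enumerate_all_cliques(G):  # yields cliques in increasing size
    k = len(clique)
    clique_counts[k] = clique_counts.get(k, 0) + 1

print(clique_counts)  # {1: #vertices, 2: #edges, 3: #triangles, ...}
```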

The Euler Characteristic. The Euler characteristic is one example of a topological invariant: a network property that does not depend on the specific graph representation. We first introduce the Euler characteristic for polyhedra. Later, we translate this concept to brain networks. In 3-D convex polyhedra, the Euler characteristic is defined as the number of vertices minus the number of edges plus the number of faces. For convex polyhedra without cavities (holes in their shape), which are homeomorphic to the sphere, the Euler characteristic is always two. If we take the cube and make a cavity, the Euler characteristic drops to zero, as it is for the torus. If we make two cavities in a polyhedron (as in the bitorus), the Euler characteristic drops to minus two. We can understand that the Euler characteristic tells us something about a polyhedron's topology and its analogous surface. In other words, if we have a surface and we make a discrete representation of it (e.g., a surface triangulation), its Euler characteristic is always the same, regardless of the way we do it. We can now generalise the definition of the Euler characteristic to a simplicial complex in any dimension. Thus, the high-dimensional version of the Euler characteristic is expressed by the alternating sum of the numbers \(Cl_{k}(d)\) of the k-cliques (which are \((k-1)\)-simplexes) present in the network's simplicial complex for a given value of the density threshold d:

$$\chi (d)= Cl_1(d)-Cl_2(d)+\cdots +(-1)^{n+1}Cl_n(d)=\sum _{k=1}^n(-1)^{k+1}\,Cl_k(d).$$
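Reusing `clique_counts` from the sketch above, the Euler characteristic of the clique complex is just this alternating sum:

```python
def euler_characteristic(clique_counts):
    """chi = Cl_1 - Cl_2 + Cl_3 - ... (vertices - edges + triangles - ...)."""
    return sum((-1) ** (k + 1) * n_k for k, n_k in clique_counts.items())

print(euler_characteristic(clique_counts))
```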

Betti Numbers. Another set of topological invariants are the Betti numbers (\(\beta \)). Given that a simplicial complex is a high-dimensional structure, \(\beta _k\) counts, for each \(k \ge 0\), the number of k-dimensional holes in the simplicial complex [47]. In a simplicial complex, there can be many of these k-holes, and counting them provides the Betti number \(\beta _k\); e.g., if \(\beta _2\) is equal to five, there are five two-dimensional holes. The Euler characteristic of a simplicial complex can also be computed from the Betti numbers via the following formula [17]:

$$\chi =\beta _0-\beta _1+\beta _2-\cdots +\left( -1\right) ^{k_{\max }}\beta _{k_{\max }}=\sum _{k=0}^{k_{\max }}\left( -1\right) ^k\beta _k,$$

where \(k_{\max }\) is the maximum dimension up to which we compute the cycles.
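As a toy check, the sketch below computes the Betti numbers of a hollow triangle (three edges, no filled 2-simplex) with gudhi: one connected component and one one-dimensional hole.

```python
import gudhi

st = gudhi.SimplexTree()
for edge in ([0, 1], [1, 2], [0, 2]):
    st.insert(edge)        # boundary of a triangle, without the interior

st.persistence()           # Betti numbers require persistence to be computed
print(st.betti_numbers())  # [1, 1]: beta_0 = 1, beta_1 = 1
# Consistent with the formula above: chi = 3 - 3 = 0 = beta_0 - beta_1.
```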

Curvature. Curvature is a TDA metric that can link the global network properties described above to local features [20, 38, 44]. It allows us to compute topological invariants for the whole-brain set of vertices and to understand the contribution of specific individual nodal, or subnetwork, geometric properties to the brain network's global properties. Several approaches to defining a curvature for networks are available [33, 44], including some already used in neuroscientific investigations [38]. We illustrate the curvature approach linked to topological phase transitions, previously introduced for complex systems [20, 33, 45]. To compute the curvature, the filtration is used to calculate the clique participation rank (i.e., the number of k-cliques in which a vertex i participates at density d) [40], which we denote here by \(Cl_{ik}(d)\). The curvature of a vertex based on the participation rank is then defined as:

$$\kappa _i=\sum _{k=1}^{k_{\max }}\left( -1\right) ^{k+1}\frac{Cl_{ik}\left( d\right) }{k},$$

where \(Cl_{i1} = 1\), since each vertex i participates in a single 1-clique (the vertex itself), and \(k_{\max }\) is the maximum number of vertices that are all-to-all connected in the network (see in Fig. 1 the participation in 3-cliques). We use the Gauss-Bonnet theorem for networks to link the local (nodal) curvature to the network's global properties (its Euler characteristic). Conversely, by summing up all the curvatures of the network across different thresholds, one recovers the alternating sum of the numbers \(Cl_k\) of k-cliques (a subgraph with k all-to-all connected vertices) present in the network's simplicial complex for a given density threshold \(d \in [0, 1]\). By doing so, we can also write the Euler characteristic as the sum of the curvatures of all network vertices, i.e.,

$$ \chi \left( d\right) =\sum _{i=1}^{N}\kappa _i\left( d\right). $$
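A minimal sketch of this computation, reusing the graph `G` and the helpers above (`vertex_curvatures` is a hypothetical helper name). Since each k-clique contains exactly k vertices, the nodal curvatures sum to the Euler characteristic:

```python
import networkx as nx

def vertex_curvatures(G):
    """kappa_i = sum_k (-1)**(k+1) * Cl_ik / k, where Cl_ik is the
    number of k-cliques in which vertex i participates."""
    participation = {i: {} for i in G.nodes}
    for clique in nx.enumerate_all_cliques(G):
        k = len(clique)
        for i in clique:
            participation[i][k] = participation[i].get(k, 0) + 1
    return {i: sum((-1) ** (k + 1) * cl / k for k, cl in counts.items())
            for i, counts in participation.items()}

kappas = vertex_curvatures(G)
# Gauss-Bonnet check: the nodal curvatures sum to the Euler characteristic.
print(sum(kappas.values()))  # equals euler_characteristic(clique_counts)
```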
Fig. 1. Simplex 3-D visualisation. Here we visualise the number of 3-cliques (triangles) in a functional brain network as we increase the edge density d (0.01, 0.015, 0.02, and 0.025, from A to D). Higher densities yield a larger number of 3-cliques than lower densities. The vertex colour indicates the clique participation rank. We used fMRI data from the 1000 Functional Connectome Project [8].

1.3 Discussion

This tutorial explains some of the primary metrics related to two branches of network neuroscience, graph theory and TDA, providing short theoretical backgrounds and code examples accompanied by a publicly available Jupyter Notebook, with a special section on visualisations of simplicial complexes and curvature computation in brain data. Here, we did not aim to provide a use-case report but rather a hands-on computational resource.

Finally, we would like to mention some relevant limitations in interpretation when using these metrics on connectivity-based data. Considering that rsfMRI connectivity is often calculated as a temporal correlation between time series using Pearson's correlation coefficient, a bias in the number of triangles can emerge [46]. This affects TDA (where the impact depends on how higher-order interactions are defined) and graph-theoretical metrics (such as the clustering coefficient), with networks based on this statistical method being automatically more clustered than random models. The proper way to determine and infer higher-order interactions in the brain is an ongoing challenge in network neuroscience [1, 5,6,7, 22, 36].

Moreover, it is essential to consider the computational cost. The computation of cliques falls under the clique problem, which is NP-complete; thus, listing cliques may require exponential time as the size of the cliques or networks grows [34]. What we can do for practical applications is to limit the clique size that can be reached by the algorithm, which determines the dimension of the simplicial complex in which the brain network is represented (see the sketch below). This arbitrary constraint implies a theoretical simplification, limiting the space or the dimensionality in which we analyse brain data. Another issue is that, to finish TDA computations in a realistic time frame, the researcher might need to establish a maximal threshold/density for convergence even after reducing the maximal clique size. Even though TDA approaches lead to substantial improvements in network science, apart from applications using the Mapper algorithm [37], the limitations mentioned above contribute to losing information on the data's shape.

In conclusion, graph theory has been widely used in network neuroscience, but newer methods such as TDA are gaining momentum. To further improve the field, especially for users in the domain of clinical network neuroscience, it is imperative to make the computation of the developed metrics accessible and easy to comprehend and visualise. We hope to have facilitated the comprehension of some aspects of network and topological neuroscience, as well as the computation and visualisation of some of their metrics.
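A simple way to impose such a clique-size cap, under the assumption that cliques are enumerated in increasing size (as networkx's `enumerate_all_cliques` does):

```python
from itertools import takewhile
import networkx as nx

K_MAX = 4  # arbitrary cap on clique size; fixes the maximal simplex dimension

# Stop the (worst-case exponential) enumeration once cliques exceed K_MAX;
# `G` is the thresholded brain network from the sketches above.
capped_cliques = list(
    takewhile(lambda c: len(c) <= K_MAX, nx.enumerate_all_cliques(G))
)
```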