Abstract
In recent years, there has been a growing recognition that higher-order structures are important features in real-world networks. A particular class of structures that has gained prominence is known as a simplicial complex. Despite their application to complex processes such as social contagion and novel measures of centrality, not much is currently understood about the distributional properties of these complexes in communication networks. Furthermore, it is also an open question as to whether an established growth model, such as scale-free network growth with triad formation, is sophisticated enough to capture the distributional properties of simplicial complexes. In this paper, we use empirical data on five real-world communication networks to propose a functional form for the distributions of two important simplicial complex structures. We also show that, while the scale-free network growth model with triad formation captures the form of these distributions in networks evolved using the model, the best-fit parameters are significantly different between the real network and its simulated equivalent. An auxiliary contribution is an empirical profile of the two simplicial complexes in these five real-world networks.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Background
Complex systems have undergone intense, interdisciplinary study in recent decades, with network science [3, 28] having emerged as a viable framework for understanding complexity in domains ranging from economics and finance [15, 16, 22, 23, 26], to ‘wicked’ social problems such as human trafficking and terrorism [21, 24, 25, 40]. Communication networks, as well as many other natural and social networks that are modeled as complex systems, have the scale-free topology in common. The preferential attachment model [8, 44] has been suggested as a candidate network evolution or ‘growth’ model to yield such topologies in complex networks by formalizing the intuition that highly connected nodes increase their connectivity faster than their less connected peers. The degree distribution of such networks exhibits power-law scaling [20].
While early studies in network science tended to be limited to lower-order structures like dyadic links or edges [14, 32, 34, 37] (and later, triangles), a recent and growing body of research has revealed that deep insights can be gained from the systematic study of non-simple networks, multi-layer networks [33] and ‘higher-order’ structures [46] in simple networks [1, 5, 13, 19, 43].
One such higher-order structure that continues to undergo study is a simplicial complex (often just referred to as a ‘complex’) [4, 17, 42]. The study of simplicial complexes first took root in mathematics (especially, algebraic topology) [6, 10, 27, 30, 31], but in the last several years, have found practical applications in network science (as discussed in Related Work). Figure 1 provides a practical example of two such simplicial complexes that have been studied in the literature, especially in theoretical biology and protein interaction networks. Due to space limitations, we do not provide a full formal definition; a good reference is [9], who detailed some of their properties and even proposed centrality measures due to their importance. An S-complexFootnote 1 is defined by a ‘central’ edge A-B, with one or more triangles sharing that edge. A T-complex is similar but the central unit is a triangle (A-B-C). Furthermore, non-central (or peripheral) triangles in a T-complex should not also participate in quads with the central triangle i.e., given central triangle A-B-C and peripheral triangle A-B-V3, there should be no link between V3 and C in a valid T-complex. As we detail subsequently, the adjacency factor of either an S*- or T*-complex is the number of triangles flanking the central structure (an edge or triangle respectively).
Simplicial complexes have been widely used to analyze aspects of diverse multilayer systems, including social relation [45], social contagion [35], protein interaction [39], linguistic categorization [12], and transportation [29]. New measurements, such as simplicial degree [39], simplicial degree based centralities [9, 38], and random walks [36] have all been proposed to not only measure the relevance of a simplicial community and the quality of higher-order connections, but also the dynamical properties of simplicial networks.
However, to the best of our knowledge, the distributional properties of such complexes, especially in the context of communication networks, have not been studied so far. A methodology for conducting such studies has also been lacking. Therefore, given the growing recognition that these two structures play an important role in real networks, and with this brief background in place, we propose to investigate the following research questions (RQs):
RQ1: In real-world communication networks, what are the respective distributions of S*- and T*-complexes? Can good functional fits be found for these distributions?
RQ2: Can (and to what extent) the scale-free network growth model (with triad formation) accurately capture these distributions? Or are additional parameters and steps (beyond triad formation) needed to model these higher-order structural properties in real-world networks?
2 Methodology
Since our primary goal in this paper is to understand whether (and to what extent) the scale-free network growth (with triad formation) model can accurately and empirically capture the two simplicial complexes described in the introduction, we first briefly recap the details of the growth model below. Full details are provided in [18].
2.1 Scale-Free Network Growth with Triad Formation
Networks with the power-law degree distribution have been classically modeled by the scale-free network model of Barabási and Albert (BA) [2]. In the original BA model, the initial condition is a network with \(n_0\) nodes. In each growth timestep, an incoming node v is connected using m edges to existing nodes in the network. The connections are determined using preferential attachment (PA), wherein an edge between v and another node w in the network is established with probability proportional to the degree of w.
The growth model informally described above is known to generate a network with the power-law degree distribution; however, other work has found that such networks lack triadic properties (including observed clustering coefficient) in real networks. In order to incorporate such higher-order properties, the growth step in BA model was extended by [18] to include a triad formation (TF) step. Specifically, given that an edge between nodes v and w was attached using preferential attachment, an edge is also established from v to a random neighbor of w with some probability. If all neighbors of w are connected to v, this step does not apply.
In summary, when a ‘new’ node v comes in, a PA step will first be performed, and then a TF step will be performed with probability \(P_t\) (in other words, the probability of PA without TF is \(1-P_t\)). These two steps are performed repeatedly per incoming node until m edges are added to the network. \(P_t\) is the control parameter in the model. It has been shown to have a linear relationship with the network’s average (over all nodes) clustering coefficientFootnote 2.
2.2 Adjacency Factor
To understand the distributional properties of the S*- and T*- complexes in the generated network versus real communication networks, we use the notion of the adjacency factor. From the earlier definition, we know that an S*-complex is defined by a ‘central’ edge (A-B in Fig. 1 (a)) that is adjacent to a certain number of triangles. Given an edge in the network, therefore, we denote the adjacency factor (with respect to S*-complexes) as the (maximal) number of triangles adjacent to that edge. For example, the adjacency factor of edge A-B in Fig. 1 (a) would be 3, not 1 or 2. While we record adjacency factors of 0 alsoFootnote 3 to obtain a continuous distribution, only cases where adjacency factor is greater than 0 constitute valid S*-complexes.
Similarly, the adjacency factor (with respect to T*-complexes) applies to triangles in the network. For every triangle A-B-C (see Fig. 1 (b)), the adjacency factor is the (maximal) number of triangles adjacent to itFootnote 4 in the T*-complex configuration. If no (non-quad) triangles are adjacent to any of the edges of the central A-B-C triangle, then the adjacency factor is 0, meaning that the triangle does not technically participate in a T*-complex.
Hence, depending on whether we are studying and comparing S*- or T*-complex distributions, an adjacency factor can be computed for each edge and each triangle (respectively) in the network. We compute a frequency distribution over these adjacency factors to better contrast these higher-order structures in the grown versus the actual networks from a distributional standpoint.
3 Experiments
We use five publicly available communication networks in our experiments, including Enron email communication network (Email-EnronFootnote 5), 2016 Democratic National Committee email leak network (Email-DNCFootnote 6), a European research institution email data network (Email-EUFootnote 7), the email network based on traffic data collected for 112 d at University of Kiel, Germany [7], and a mobile communication network [41]. Details are shown in Table 1. These networks are available publicly and some (such as Enron) have been extensively studied, but to our knowledge studies involving simplicial complexes and their properties have been non-existent with respect to these communication networks. While our primary goal here is not to study these properties for these specific networks, a secondary contribution of the results that follow is that they do shed some light on the extent and distribution of such complexes in these networks.
In the Introduction, we had introduced two separate (but related) research questions. Below, we discuss both individually, although both rely on a shared set of results.
RQ1: For each network, using the numbers of nodes and edges, and the observed average clustering coefficient, we generate 10 networks using the PA-based growth model (with TF). We obtain the frequency distributions (normalized to resemble a probability distribution) of adjacency factors of T*- and S*-complexes in both the real and generated networks, and visualize these distributions inFootnote 8 Fig. 2 and 3. Besides the direct comparison between the distribution curves, the figures suggest two functions that could fit the distributions (for the S*- and T*-complexes respectively):
where \(erfc(x) = \frac{2}{\sqrt{\pi }}\int _{x}^{\infty } e^{-t^2}dt\).
Both functional fits were discovered empirically using the Enron dataset as a ‘development’ set; however, as we show in response to RQ2, the functions fit quite consistently for all five datasets (but with different parameters, of course), although the first function diverges after a point (when the long tail begins). A theoretical basis for the functions is an interesting open question. We note that the second function is an Exponentially modified Gaussian (EMG) distribution, which is an important and general class of models for capturing skewed distributions. It has been broadly studied in mathematics, and has found empirical applications as well [11].
RQ2: We tabulate the best-fit parameters for each real world network, and the generated networks, in Table 2 and 3. For the real world networks, there is only one set of best-fit estimates. For the generated networks there are ten best-fit estimates per parameter (since we generate 10 networks per real-world network), for which we report the average in the table. We also compute a 2-tailed Student’s t-test and found that, for all parameters and all networks, the generated networks’ (averaged) parameter is significantly different from the corresponding real network’s best-fit parameter at the 99% level. This suggests, intriguingly, that despite the distributional similarities between Fig. 2 (a) and (b) (i.e., between the real and generated networks) the best-fit parameters are significantly different in both cases.
Of course, this does not answer the question as to whether the functions that we empirically discovered and suggested in Eqs. 1 and 2 are good approximations or models for the actual distributions. To quantify such a ‘goodness of fit’ between an actual frequency distribution curve (whether for the real network or the generated networks) and the curve obtained by using the models suggested in Eq. 1 or 2 (with best-fit parameter estimates), we compute a metric called Mean Normalized Deviation (MND). This metric is modeled closely after the root mean square error (RMSE) metric. Given an actual curve f and a modeled curve \(f'\), defined on a common supportFootnote 9 (x-axis) \(X=\{x_1, x_2, \ldots x_n\}\), the MND is given by:
Note that the lower the MND, the better \(f'\) fits f on support X. The MND can never be negative for a positive function f, but it has no upper bound. Hence, a reference is needed. Since we are not aware of any other candidate functional fits for the simplicial complex distributions in the literature, we use a simple (but functionally effective) baseline, namely the horizontal curve \(y=c\), where c is a constant that is selected to roughly coincide with the long-tail of the corresponding real network’s distribution.
In Table 2, we report not just the MNDs of the real and grown networks, but also the corresponding reference MND. Because of the significant long tails in Fig. 2, this MND is already expected to be low. We find, however, that with only three exceptions (over both equationsFootnote 10) for the generated networks (and none for the real networks) does the reference fit the actual distributions better than our proposed models (through a lower MND), despite being optimized to almost coincide with the long tail. Interestingly, Eq. 2 has (much) lower MND scores for the real network compared to the grown networks, as well as the reference function. As we noted earlier, Eq. 1 did not seem to be capturing the long tail accurately. We hypothesize that a piecewise function, where Eq. 1 is only used for modeling the short tail of the S*-complex frequency distribution, may be a better fit. In all cases, investigating the theory of this phenomenon is a promising area of investigation for complex systems research.
In our exploration, we further visualized the relationship between the estimated function’s parameter of two simplices distribution and the configuration setting P of the generated network in Fig. 4. Intriguingly, despite variations in the configuration of the generated networks, the best-fit estimates seem to exhibit convergence behavior. Specifically, as different settings for the probability of TF steps in network generation are applied, the parameter, such as \(\lambda \) in T*-simplices distribution estimation, tends to stabilize towards a consistent value. This observed consistency may hint at some inherent properties of the two high-order structures. Such invariance across diverse network configurations not only underscores the potential utility and reliability of the parameter estimation but also points towards deeper, fundamental characteristics of these structures.
4 Conclusion
Simplicial complexes have become important in the last several years for modeling and reasoning about higher-order structures in real networks. Many questions remain about these structures, including whether they are captured properly by existing (and now classic) growth models. In this paper, we showed that, for two well-known complexes, the PA-model with triad formation captures the distributional properties of the complexes, but the best-fit parameters are significantly different between the grown networks and the real communication networks. It remains an active area of research to better understand the theoretical underpinnings of our proposed functional fits for the simplicial complex distributions, and also to deduce what could be ‘added’ to the growth model to bring its parameters into alignment with the real-world network. We are also investigating the properties of other growth models with respect to accurately capturing these distributions. Finally, understanding the real-world phenomena modeled by these complexes, which are fairly common motifs in all five networks we studied, continues to be an interesting research avenue in communication (and other complex) systems.
Notes
- 1.
Technically, we refer to these in this paper as S*- and T-* complexes, with the * indicating that we are considering the maximal definition of the complex e.g., an S*-complex is not a strict sub-graph of another S-complex.
- 2.
A measure of the degree of clustering, the clustering coefficient \(\gamma _v\) of node v is given by \(\frac{|\mathcal {E}(\varGamma _v)|}{\frac{k_v(k_v-1)}{2}}\), where \(|\mathcal {E}(\varGamma _v)|\) is the number of edges that exist between node v’s neighbors.
- 3.
These are edges that are not part of any triangles.
- 4.
But subject to the ‘quad’ constraint noted in the Introduction.
- 5.
- 6.
- 7.
- 8.
The differences between generated networks corresponding to the same real network were found to be very minor, so we just show one such network (per real network) in the Fig. 2. However, subsequently described statistical analyses make use of all the generated networks.
- 9.
In our case, this is simply the set of adjacency factors.
- 10.
Specifically, on the Eq. 1 model, the MND’ (average over generated networks) is higher for Email-Enron than for the reference; on Equation refeq2, the MND’ is higher for both Uni. of Kiel and Phone Calls.
References
Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002). https://link.aps.org/doi/10.1103/RevModPhys.74.47
Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999). https://science.sciencemag.org/content/286/5439/509
Barabási, A.L., et al.: Network science. Cambridge university press (2016)
Barbarossa, S., Sardellitti, S., Ceci, E.: Learning from signals defined over simplicial complexes. In: 2018 IEEE Data Science Workshop (DSW), pp. 51–55. IEEE (2018)
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.U.: Complex networks: structure and dynamics. Phys. Rep. 424(4), 175–308 (2006). http://www.sciencedirect.com/science/article/pii/S037015730500462X
Costa, A., Farber, M.: Random simplicial complexes. In: Callegaro, F., Cohen, F., De Concini, C., Feichtner, E.M., Gaiffi, G., Salvetti, M. (eds.) Configuration Spaces. SIS, vol. 14, pp. 129–153. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31580-5_6
Ebel, H., Mielsch, L.I., Bornholdt, S.: Scale-free topology of e-mail networks. Phys. Rev. E 66, 035103 (2002).https://link.aps.org/doi/10.1103/PhysRevE.66.035103
Eisenberg, E., Levanon, E.Y.: Preferential attachment in the protein network evolution. Phys. Rev. Lett. 91(13), 138701 (2003)
Estrada, E., Ross, G.J.: Centralities in simplicial complexes. applications to protein interaction networks. J. Theoretical Biol. 438, 46–60 (2018)
Faridi, S.: The facet ideal of a simplicial complex. Manuscripta Math. 109(2), 159–174 (2002)
Foley, J.P., Dorsey, J.G.: A review of the exponentially modified gaussian (emg) function: evaluation and subsequent calculation of universal data. J. Chromatogr. Sci. 22(1), 40–46 (1984)
Gong, T., Baronchelli, A., Puglisi, A., Loreto, V.: Exploring the roles of complex networks in linguistic categorization. Artif. Life 18(1), 107–121 (2011). https://doi.org/10.1162/artl_a_00051
Guilbeault, D., Becker, J., Centola, D.: Complex contagions: a decade in review. In: Lehmann, S., Ahn, Y.-Y. (eds.) Complex Spreading Phenomena in Social Systems. CSS, pp. 3–25. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77332-2_1
Hagberg, A., Swart, P., S Chult, D.: Exploring network structure, dynamics, and function using networkx. Tech. rep., Los Alamos National Lab.(LANL), Los Alamos, NM (United States) (2008)
Hidalgo, C.A.: Economic complexity theory and applications. Nat. Rev. Phys. 3(2), 92–113 (2021)
Hidalgo, C.A., Hausmann, R.: The building blocks of economic complexity. Proc. Natl. Acad. Sci. 106(26), 10570–10575 (2009)
Hofmann, S.G., Curtiss, J., McNally, R.J.: A complex network perspective on clinical science. Perspect. Psychol. Sci. 11(5), 597–605 (2016)
Holme, P., Kim, B.: Growing scale-free networks with tunable clustering. Phys. Rev. E, Stat. Nonlinear Soft Matter Phys. 652 Pt 2, 026107 (2002)
Iacopini, I., Petri, G., Barrat, A., Latora, V.: Simplicial models of social contagion. Nat. Commun. 10(1), 1–9 (2019)
Jeong, H., Néda, Z., Barabási, A.L.: Measuring preferential attachment in evolving networks. EPL (Europhys. Lett.) 61(4), 567 (2003)
Kejriwal, M.: Link prediction between structured geopolitical events: Models and experiments. Front. Big Data 4, 779792 (2021)
Kejriwal, M.: On using centrality to understand importance of entities in the panama papers. PLoS ONE 16(3), e0248573 (2021)
Kejriwal, M., Dang, A.: Structural studies of the global networks exposed in the panama papers. Appli. Netw. Sci. 5(1), 1–24 (2020)
Kejriwal, M., Gu, Y.: Network-theoretic modeling of complex activity using UK online sex advertisements. Appli. Netw. Sci. 5, 1–23 (2020)
Kejriwal, M., Kapoor, R.: Network-theoretic information extraction quality assessment in the human trafficking domain. Appli. Netw. Sci. 4(1), 1–26 (2019)
Kejriwal, M., Luo, Y.: On the empirical association between trade network complexity and global gross domestic product. In: International Conference on Complex Networks and Their Applications. pp. 456–466. Springer (2022). doi: https://doi.org/10.1007/978-3-031-21127-0_37
Knill, O.: The energy of a simplicial complex. Linear Algebra and its Applications (2020)
Lewis, T.G.: Network science: Theory and applications. John Wiley & Sons (2011)
Lin, J., Ban, Y.: Complex network topology of transportation systems. Transp. Rev. 33(6), 658–685 (2013)
Maria, C., Boissonnat, J.-D., Glisse, M., Yvinec, M.: The gudhi library: simplicial complexes and persistent homology. In: Hong, H., Yap, C. (eds.) ICMS 2014. LNCS, vol. 8592, pp. 167–174. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44199-2_28
Milnor, J.: The geometric realization of a semi-simplicial complex. Annal. Math, 357–362 (1957)
Milward, H.B., Provan, K.G.: Measuring network structure. Public Administration 76(2), 387–407 (1998)
Mitchison, G., Durbin, R.: Bounds on the learning capacity of some multi-layer networks. Biol. Cybern. 60(5), 345–365 (1989)
Motter, A.E., Zhou, C., Kurths, J.: Enhancing complex-network synchronization. EPL (Europhys. Lett.) 69(3), 334 (2005)
Pastor-Satorras, R., Castellano, C., Van Mieghem, P., Vespignani, A.: Epidemic processes in complex networks. Rev. Mod. Phys. 87(3), 925 (2015)
Schaub, M.T., Benson, A.R., Horn, P., Lippner, G., Jadbabaie, A.: Random walks on simplicial complexes and the normalized hodge 1-laplacian. SIAM Rev. 62(2), 353–391 (2020)
Seidman, S.B.: Network structure and minimum degree. Soc. Netw. 5(3), 269–287 (1983)
Serrano, D.H., Gómez, D.S.: Centrality measures in simplicial complexes: Applications of topological data analysis to network science. Appl. Math. Comput. 382, 125331 (2020)
Serrano, D.H., Hernández-Serrano, J., Gómez, D.S.: Simplicial degree in complex networks. applications of topological data analysis to network science. Chaos, Solitons Fractals 137, 109839 (2020)
Skaburskis, A.: The origin of “wicked problems.” Planning Theory Pract. 9(2), 277–280 (2008)
Song, C., Qu, Z., Blumm, N., Barabási, A.L.: Limits of predictability in human mobility. Science 327(5968), 1018–1021 (2010). https://science.sciencemag.org/content/327/5968/1018
Torres, J.J., Bianconi, G.: Simplicial complexes: higher-order spectral dimension and dynamics. J. Phys. Complexity 1(1), 015002 (2020)
Torres, L., Blevins, A.S., Bassett, D.S., Eliassi-Rad, T.: The why, how, and when of representations for complex systems. arXiv preprint arXiv:2006.02870 (2020)
Vázquez, A.: Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. E 67, 056104 (2003). https://link.aps.org/doi/10.1103/PhysRevE.67.056104
Wang, D., Zhao, Y., Leng, H., Small, M.: A social communication model based on simplicial complexes. Phys. Lett. A 384(35), 126895 (2020)
Xu, J., Wickramarathne, T.L., Chawla, N.V.: Representing higher-order dependencies in networks. Sci. Adv. 2(5), e1600028 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shen, K., Kejriwal, M. (2024). An Analytical Approximation of Simplicial Complex Distributions in Communication Networks. In: Cherifi, H., Rocha, L.M., Cherifi, C., Donduran, M. (eds) Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1144. Springer, Cham. https://doi.org/10.1007/978-3-031-53503-1_2
Download citation
DOI: https://doi.org/10.1007/978-3-031-53503-1_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53502-4
Online ISBN: 978-3-031-53503-1
eBook Packages: EngineeringEngineering (R0)