
1 Geometric Essence: To the Memory of Jiří Matoušek

Like many in theoretical computer science and discrete mathematics, my own research has benefited from Jirka’s deep insights, especially into computational geometry [64] and linear programming [65]. In fact, our paths accidentally crossed in the final year of my Ph.D. program. As a part of my 1991 CMU thesis [88], I obtained a result on the deterministic computation of a geometric concept, called centerpoints, which led me to learn about one of Jirka’s groundbreaking results during this time.

1.1 Centerpoints

The median is a widely used concept for analyzing one-dimensional data, due to its statistical robustness and its natural algorithmic applications to divide-and-conquer. In general, suppose \(P = \{p_1, \ldots, p_n\}\) is a set of n real numbers. For \(\delta \in (0, 1/2]\), we call \(c \in \mathbb{R}\) a δ-median of P if \(\max\left(\vert\{i: p_{i} < c\}\vert, \vert\{j: p_{j} > c\}\vert\right) \leq (1-\delta)n\). A \(\frac{1}{2}\)-median of P is known simply as a median. Centerpoints are high-dimensional generalizations of medians:

Definition 1.1 (Centerpoints)

Suppose \(P = \{\mathbf{p}_1, \ldots, \mathbf{p}_n\}\) is a point set in \(\mathbb{R}^{d}\). For \(\delta \in (0, 1/2]\), a point \(\mathbf{c} \in \mathbb{R}^{d}\) is a δ-centerpoint of P if for all unit vectors \(\mathbf{z} \in \mathbb{R}^{d}\), the projection \(\mathbf{z}^{T}\mathbf{c}\) is a δ-median of the projections \(\mathbf{z}^{T} \cdot P = \{\mathbf{z}^{T}\mathbf{p}_1, \ldots, \mathbf{z}^{T}\mathbf{p}_n\}\).

Geometrically, every hyperplane \(\mathbf{h}\) in \(\mathbb{R}^{d}\) divides the space into two open halfspaces, \(\mathbf{h}^{+}\) and \(\mathbf{h}^{-}\). Let the splitting ratio of \(\mathbf{h}\) over P, denoted by \(\delta_{\mathbf{h}}(P)\), be:

$$\displaystyle{ \delta _{\mathbf{h}}(P):= \frac{\max \left (\vert \mathbf{h}^{+} \cap P\vert,\vert \mathbf{h}^{-}\cap P\vert \right )} {\vert P\vert } }$$
(1)

Definition 1.1 can be restated as: \(\mathbf{c} \in \mathbb{R}^{d}\) is a δ-centerpoint of P if the splitting ratio of every hyperplane h passing through c is at most (1 −δ). Centerpoints are fundamental to geometric divide-and-conquer [34]. They are also strongly connected to the concept of regression depth introduced by Rousseeuw and Hubert in robust statistics [7, 50].
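In the plane, the hyperplane form of Definition 1.1 can be checked exactly by brute force. The sketch below uses hypothetical function names of our own. For a candidate point c, it counts the largest number of points of P that lie strictly on one side of a line through c. By the restatement above, c is a δ-centerpoint exactly when this count is at most (1 − δ)n. This is an O(n²)-style illustration of the definition, not one of the centerpoint algorithms discussed in this section.

```python
import math


def max_one_side(points, c):
    """Largest number of points strictly on one side of some line through c (2-D).

    The count |{i : z^T (p_i - c) < 0}| only changes at directions z that are
    perpendicular to some p_i - c, so one direction inside each open arc
    between consecutive critical angles suffices.
    """
    vecs = [(x - c[0], y - c[1]) for x, y in points]
    crit = []
    for vx, vy in vecs:
        if vx == 0 and vy == 0:
            continue  # a point equal to c lies on every line through c
        a = math.atan2(vy, vx) + math.pi / 2
        crit += [a % (2 * math.pi), (a + math.pi) % (2 * math.pi)]
    if not crit:
        return 0
    crit.sort()
    best = 0
    for k, lo in enumerate(crit):
        hi = crit[k + 1] if k + 1 < len(crit) else crit[0] + 2 * math.pi
        if hi - lo < 1e-12:
            continue  # skip (numerically) repeated critical angles
        theta = (lo + hi) / 2  # a direction strictly inside the arc (lo, hi)
        zx, zy = math.cos(theta), math.sin(theta)
        best = max(best, sum(1 for vx, vy in vecs if zx * vx + zy * vy < 0))
    return best


def is_delta_centerpoint(points, c, delta):
    """c is a delta-centerpoint iff every line through c has splitting ratio <= 1 - delta."""
    return max_one_side(points, c) <= (1 - delta) * len(points)
```

For the four corners of the unit square, the center (0.5, 0.5) has count 2 and is a 1/3-centerpoint. The point (0.1, 0.1) has count 3, so it is not.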

We all know that every set of real numbers has a median. Likewise, and remarkably, every point set in d-dimensional Euclidean space has a \(\frac{1}{d+1}\)-centerpoint [30]. This mathematical result can be established by Helly’s classical theorem from convex geometry. Algorithmically, Vapnik–Chervonenkis’ celebrated sampling theorem [92] (more below) implies an efficient randomized algorithm (at least in theory) for computing a \((\frac{1}{d+1}-\epsilon)\)-centerpoint. This “simple” algorithm first takes a “small” random sample, and then obtains its \(\frac{1}{d+1}\)-centerpoint via linear programming. The complexity of this LP-based sampling algorithm is:

$$\displaystyle{2^{O(d)}\left (\frac{d} {\epsilon ^{2}} \cdot \log \frac{d} {\epsilon } \right )^{d}.}$$

1.2 Derandomization

For my thesis, I needed to compute centerpoints in order to construct geometric separators [67] for supporting finite-element simulation and parallel scientific computing [68]. Because linear programming was too slow, I needed a practical centerpoint algorithm to run large-scale experiments [45]. Because I was a theory student, I was also aiming for a theoretical algorithm to enrich my thesis. For the latter, I focused on derandomization, which was then an active research area in theoretical computer science. For centerpoint approximation without linear programming, my advisor Gary Miller and I quickly obtained a simple and practical algorithm based on Radon’s classical theorem [30]. But for derandomization, it took me more than a year to finally design a deterministic linear-time algorithm for computing \((\frac{1}{d+1}-\epsilon)\)-centerpoints in any fixed dimension. It happened in the Spring of 1991, my last semester at CMU. Gary then invited me to accompany him for a month-long visit, starting at spring break of 1991, at the International Computer Science Institute (ICSI), located near the U.C. Berkeley campus. During the California visit, I ran into Leo Guibas, one of the pioneers of computational geometry.

After I told Leo about my progress on Radon-Tverberg decomposition [90] and centerpoint computation, he mentioned to me a paper by Jirka [64], which was just accepted to the ACM Symposium on Theory of Computing (STOC 1991)—before my solution—that beautifully solved the sampling problem for a broad class of computational geometry and statistical learning problems. Jirka’s result—see Theorem 1.3 below—includes the approximation of centerpoints as a simple special case. Although our approaches had some ideas in common, I instantly knew that this mathematician—who I later learned was just a year older than me—was masterful and brilliant. I shortened that section of my thesis by referring readers to Jirka’s paper [64], and only included the scheme I had that was in common with his bigger result (Fig. 1).

Fig. 1 Page 66 (Chapter 8) of my thesis

1.3 Matoušek’s Theorem: The Essence of Dimensionality

Mathematically, a range space \(\Sigma\) is a pair \((X,\mathcal{R})\), where X is a finite or infinite set, and \(\mathcal{R}\) is a finite or infinite family of subsets of X. Each \(H \in \mathcal{R}\) can be viewed as a classifier of X, with the elements of \(X \cap H\) as its positive instances. For example, \(\mathbb{R}^{d}\) and its halfspaces form a range space; so do \(\mathbb{R}^{d}\) and its \(L_p\)-balls, for any p > 0, as well as V and the set of all cliques in a graph G = (V, E). Range spaces greatly extend the concept of linear separators.

An important technique in statistical machine learning and computational geometry is sampling. For range spaces, we can measure the quality of a sample as follows:

Definition 1.2 (ε-samples)

Let \(\Sigma = (X,\mathcal{R})\) be an n-point range space. A subset \(S \subseteq X\) is an ε-sample or ε-approximation for \(\Sigma\) if for all \(H \in \mathcal{R}\):

$$\displaystyle\begin{array}{rcl} \left \vert \frac{\vert H \cap S\vert } {\vert S\vert } -\frac{\vert H \cap X\vert } {\vert X\vert } \right \vert \leq \epsilon & &{}\end{array}$$
(2)
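When the ranges of a finite range space can be listed explicitly, condition (2) can be checked directly. The following sketch uses hypothetical helper names and exact rational arithmetic, so the comparison with ε is not affected by rounding. The ranges in the example are the contiguous intervals of X = {0, 1, ..., 9}.

```python
from fractions import Fraction


def max_discrepancy(X, S, ranges):
    """max over H of | |H ∩ S|/|S| - |H ∩ X|/|X| |, computed exactly."""
    X, S = set(X), list(S)
    worst = Fraction(0)
    for H in ranges:
        H = set(H)
        frac_S = Fraction(sum(1 for s in S if s in H), len(S))
        frac_X = Fraction(len(H & X), len(X))
        worst = max(worst, abs(frac_S - frac_X))
    return worst


def is_eps_sample(X, S, ranges, eps):
    """Condition (2) for every listed range H."""
    return max_discrepancy(X, S, ranges) <= eps


X = range(10)
intervals = [range(i, j) for i in range(11) for j in range(i, 11)]  # includes the empty range
S = [0, 2, 4, 6, 8]
print(max_discrepancy(X, S, intervals))  # 1/10
```

Taking every other element gives a maximum discrepancy of exactly 1/10 over all intervals. So S is a (1/10)-sample but not a (1/20)-sample.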

For each \(S \subseteq X\), the set of distinct classifiers that \(\mathcal{R}\) can define is \(\mathcal{R}(S) =\{ H \cap S: H \in \mathcal{R}\}\). For any m ≤ | X |, let the shatter function for \(\Sigma\) be:

$$\displaystyle{ \pi _{\mathcal{R}}(m) =\sup _{S\subseteq X,\vert S\vert =m}\left \vert \mathcal{R}(S)\right \vert }$$
(3)
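The shatter function is easy to tabulate for a simple range space. For closed intervals on the real line, every trace H ∩ S is either empty or a contiguous run of the sorted points. Any m distinct reals therefore have m(m+1)/2 + 1 traces, which gives \(\pi_{\mathcal{R}}(m)\). The hypothetical helper below enumerates these traces and prints the count next to 2^m.

```python
def interval_traces(S):
    """R(S) for the range space of closed intervals [a, b] on the real line."""
    pts = sorted(set(S))
    traces = {frozenset()}  # an interval that misses every point of S
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            traces.add(frozenset(pts[i:j + 1]))  # e.g. the interval [pts[i], pts[j]]
    return traces


for m in range(1, 6):
    # any m distinct reals give the same count, so this is pi_R(m)
    print(m, len(interval_traces(range(m))), 2 ** m)
```

The counts are 2, 4, 7, 11, 16. Two points can be shattered by intervals, but three cannot.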

Theorem 1.3 (Deterministic Sampling—Matoušek)

Let \(\Sigma = (X,\mathcal{R})\) be an n-point range space with the shatter function satisfying \(\pi_{\mathcal{R}}(m) = O(m^{d})\) (d ≥ 1 a constant). Having a subspace oracle for \(\Sigma\), and given a parameter r, we can deterministically compute a (1/r)-approximation of size \(O(dr^{2}\log r)\) for \(\Sigma\), in time \(O(n(r^{2}\log r)^{d})\).

Matoušek’s sampling theorem goes beyond traditional geometry and completely derandomizes the theory of Vapnik–Chervonenkis [92].

Theorem 1.4 (Vapnik and Chervonenkis)

There exists a constant c such that for any finite range space \(\Sigma = (X,\mathcal{R})\) and ε, δ ∈ (0, 1), if S is a set of \(c \cdot \frac{d}{\epsilon^{2}}\log\frac{d}{\epsilon\delta}\) uniform and independent samples from X, where \(d = \mathrm{VC}(\Sigma)\) (see below for the definition), then:

$$\displaystyle{\Pr\left[S\ \mbox{is an}\ \epsilon\mbox{-sample for}\ \Sigma\right] \geq 1-\delta}$$

Matoušek’s deterministic algorithm can be applied to geometric classifiers as well as to any classifier (known as a concept space) that arises in statistical learning theory [91]. The concept of range space has also provided a powerful tool for capturing geometric structures, and has played a profound role, both in theory and in practice, in data clustering [38] and geometric approximation [3]. The beauty of Vapnik–Chervonenkis’ theory and Matoušek’s sampling theorem lies in the essence of dimensionality, which is generalized from geometric spaces to abstract range spaces. In Euclidean geometry, the dimensionality comes naturally to many of us. For abstract range spaces, the growth of the shatter functions is more intrinsic! If \(\pi_{\mathcal{R}}(m) = 2^{m}\), then there exists a set \(S \subseteq X\) of m elements that is shattered, i.e., for any subset \(T \subseteq S\), there exists \(H \in \mathcal{R}\) such that \(T = H \cap S\). In other words, we can use \(\mathcal{R}\) to build classifiers for all subsets of S. There is a beautiful dichotomy of polynomial and exponential complexity within the concept of shattering:

  • either X has a subset \(S \subseteq X\) of size m that can be shattered by \(\mathcal{R}\),

  • or for any \(U \subseteq X\) with \(\vert U\vert \geq m\), \(\vert \{H \cap U: H \in \mathcal{R}\}\vert\) is polynomial in | U |.

The latter case implies that \(\mathcal{R}\) can only be used to build a polynomial number of classifiers for U. The celebrated VC-dimension of range space \(\Sigma = (X,\mathcal{R})\), denoted by \(\mathrm{VC}(\Sigma )\), is defined as:

$$\displaystyle{\mathrm{VC}(\Sigma ):= \max \{m:\pi _{\mathcal{R}}(m) = 2^{m}\}.}$$
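For a small finite range space, \(\mathrm{VC}(\Sigma)\) can be computed straight from the definition by searching for the largest shattered subset. This brute-force sketch uses hypothetical names and takes time exponential in |X|. It relies on the fact that every subset of a shattered set is itself shattered. The search can therefore stop at the first size m for which no m-subset is shattered.

```python
from itertools import combinations


def shatters(ranges, S):
    """True if every subset T of S equals H ∩ S for some range H."""
    return len({H & S for H in ranges}) == 2 ** len(S)


def vc_dimension(X, ranges):
    """Brute-force VC dimension of a finite range space (exponential time)."""
    ranges = [frozenset(H) for H in ranges]
    vc = 0
    for m in range(1, len(X) + 1):
        if any(shatters(ranges, frozenset(S)) for S in combinations(X, m)):
            vc = m
        else:
            # every subset of a shattered set is shattered, so stop at the first failure
            break
    return vc
```

On X = {0, ..., 4}, the contiguous intervals have VC dimension 2. The prefixes {0, ..., j − 1} have VC dimension 1.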

This polynomial-exponential dichotomy is established by Sauer’s lemma below.

Lemma 1.5 (Sauer)

For any range space \(\Sigma = (X,\mathcal{R})\) and \(\forall m > \mathrm{VC}(\Sigma)\), \(\pi_{\mathcal{R}}(m) \leq \sum_{k=0}^{\mathrm{VC}(\Sigma)}{m \choose k}\).
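The right-hand side of Lemma 1.5 is a polynomial of degree \(\mathrm{VC}(\Sigma)\) in m. The short sketch below uses hypothetical names. It compares this bound with a brute-force count for rays (−∞, t] on the line, a range space of VC dimension 1, where the bound m + 1 is attained exactly. For closed intervals (VC dimension 2), the bound m(m+1)/2 + 1 is likewise attained.

```python
from math import comb


def sauer_bound(m, vc):
    """Right-hand side of Lemma 1.5: sum_{k=0}^{vc} C(m, k)."""
    return sum(comb(m, k) for k in range(vc + 1))


def ray_trace_count(m):
    """|R(S)| for rays (-inf, t] on the m points 0, 1, ..., m - 1 (VC dimension 1)."""
    return len({frozenset(x for x in range(m) if x <= t) for t in range(-1, m)})


for m in range(2, 7):
    # columns: m, brute-force ray traces, Sauer bound for VC 1, for VC 2, and 2^m
    print(m, ray_trace_count(m), sauer_bound(m, 1), sauer_bound(m, 2), 2 ** m)
```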

Sauer’s lemma extends the following well-known fact of Euclidean geometry: any set of m hyperplanes in \(\mathbb{R}^{d}\) divides the space into at most \(O(m^{d})\) convex cells. By point-hyperplane duality, any set of m points can be divided into at most \(O(m^{d})\) subsets by halfspaces.

Although my construction of ε-samples in \(\mathbb{R}^{d}\) was good enough for designing a linear-time centerpoint approximation algorithm in fixed dimensions, it did not immediately generalize to arbitrary range spaces, because it was tailored to the geometric properties of Euclidean spaces.

By addressing abstract range spaces, Jirka resolved the intrinsic algorithmic problem at the heart of Vapnik–Chervonenkis’ sampling theory. Like Theorem 1.3, many of Jirka’s other landmark and breakthrough results are elegant, insightful, and fundamental. By going beyond the original objects—such as Euclidean spaces or linear programs [65]—Jirka usually went directly to the essence of the challenging problems to come up with beautiful solutions that were natural to him but remarkable to the field.

2 Backgrounds: Understanding Multifaceted Network Data

To analyze the structures of social and information networks in the age of Big Data, we need to overcome various conceptual and algorithmic challenges both in understanding network data and in formulating solution concepts. For both, we need to capture the network essence.

2.1 The Graph Model—A Basic Network Facet

At the most basic level, a network can be modeled as a graph G = (V, E), which characterizes the structure of the network in terms of:

  • nodes: for example, Webpages, Internet routers, scholarly articles, people, random variables, or counties

  • edges: for example, links, connections, citations, friends, conditional dependencies, or voting similarities

In general, nodes in many real-world networks may not be “homogeneous” [5], as they may have some additional features, specifying the types or states of the node elements. Similarly, edges may have additional features, specifying the levels and/or types of pairwise interactions, associations, or affinities.

Networks with “homogeneous” types of nodes and edges are closest to the combinatorial structures studied in traditional graph theory, which considers both weighted and unweighted graphs. Three basic classes of weighted graphs often appear in applications. The first class consists of distance networks, where each edge \(e \in E\) is assigned a number \(l_e \geq 0\), representing the length of edge e. The second class consists of affinity networks, where each edge (u, v) ∈ E is assigned a weight \(w_{u,v} \geq 0\), specifying u’s affinity weight towards v. The third class consists of probabilistic networks, where each (directed) edge (u, v) ∈ E is assigned a probability \(p_{u,v} \geq 0\), modeling how a random process connects u to v. It is usually more natural to view maps or the Internet as distance networks, social networks as affinity networks, and Markov processes as probabilistic networks. Depending on applications, a graph may be directed or undirected. Examples of directed networks include the Web, Twitter, the citation graphs of scholarly publications, and Markov processes. Meanwhile, Facebook “friends” or collaboration networks are examples of undirected graphs.

In this article, we will first focus on affinity networks. An affinity network with n nodes can be mathematically represented as a weighted graph G = (V, E, W). Unless otherwise stated, we assume V = [n] and W is an n × n non-negative matrix (for example, from \([0,1]^{n\times n}\)). We will follow the convention that for \(i \neq j\), \(w_{i,j} = 0\) if and only if \((i, j) \notin E\). If W is a symmetric matrix, then we say G is undirected. If \(w_{i,j} \in \{0, 1\}\), \(\forall i,j \in V\), then we say G is unweighted.

Although they do not always fit, three popular data models for defining pairwise affinity weights are the metric model, the feature model, and the statistical model. The first assumes that an underlying metric space, \(\mathbf{\mathcal{M}} = \left(V,\mathrm{dist}\right)\), impacts the interactions among nodes in a network. The affinities between nodes may then be determined by their distances in the underlying metric space: the closer two elements are, the higher their affinity becomes, and the more interactions they have. A standard way to define affinity weights for \(u \neq v\) is \(w_{u,v} = \mathrm{dist}(u, v)^{-\alpha}\), for some α > 0. The second assumes that there exists an underlying “feature” space, \(\mathbf{\mathcal{F}} = \left(V,\mathbf{F}\right)\), that impacts the interactions among nodes in a network. This is a widely used alternative data model for information networks. In a d-dimensional feature space, F is an n × d matrix, where \(f_{u,i} \in \mathbb{R}^{+} \cup \{0\}\) denotes u’s quality score with respect to the ith feature. Let \(\mathbf{f}_u\) denote the uth row of F, i.e., the feature vector of node u. The affinity weight \(w_{u,v}\) between two nodes u and v may then be determined by the correlation between their features: \(w_{u,v} \sim \left(\mathbf{f}_{u}^{T} \cdot \mathbf{f}_{v}\right) = \sum_{i=1}^{d} f_{u,i} \cdot f_{v,i}\). The third assumes that there exists an underlying statistical space (such as a stochastic block model, Markov process, or (Gaussian) random field) that impacts the pairwise interactions. The higher the dependency between two elements is, the higher their strength of tie is.
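The metric and feature models can each be turned into an affinity matrix in a few lines. This NumPy sketch assumes Euclidean distance as the metric dist, distinct node positions, and a nonnegative feature matrix F. Both helper names are hypothetical. In the metric model the diagonal is left at zero (no self-affinity), while F F^T keeps each node's self-correlation on its diagonal.

```python
import numpy as np


def metric_affinity(coords, alpha=1.0):
    """w_{u,v} = dist(u, v)^(-alpha) for u != v, with Euclidean dist (assumed metric)."""
    X = np.asarray(coords, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = np.zeros_like(dist)
    off = ~np.eye(len(X), dtype=bool)  # off-diagonal entries only
    W[off] = dist[off] ** (-alpha)     # assumes no two nodes share a position
    return W


def feature_affinity(F):
    """w_{u,v} = f_u^T f_v for a nonnegative n-by-d feature matrix F."""
    F = np.asarray(F, dtype=float)
    return F @ F.T
```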

If weighted networks already seem complex, real-world network data is far more complex and diverse. We discuss this further in Sects. 2.3 and 4.

2.2 Sparsity and Underlying Models

A basic challenge in network analysis is that real network data that we observe is only a reflection of underlying network models. Thus, like machine learning tasks which have to work with samples from an unknown underlying distribution, network analysis tasks typically work with observed network data, which is usually different from the underlying network model. As argued in [11, 48, 56, 89], a real-world social and information network may be viewed as an observed network, induced by a “complete-information” underlying preference/affinity/statistical/geometric/feature/economical model. However, these observed networks are typically sparse with many missing links.

For studying network phenomena, it is crucial to understand underlying network models mathematically, while working efficiently with sparse observed data algorithmically. Thus, developing systematic approaches to uncover or capture the underlying network model (or the network essence) is a central and challenging mathematical task in network analysis.

Implicitly or explicitly, underlying network models are the ultimate guide for understanding network phenomena, for inferring missing network data, and for distinguishing missing links from absent links. To study basic network concepts, we also need to simultaneously understand the observed and underlying networks. Some network concepts, such as centrality, capture various aspects of “dimension reduction” of network data. Other characterizations, such as clusterability and community classification, are more naturally expressed in a space with dimension higher than that of the observed networks.

Schematically, centrality assigns a numerical score or ranking to each node, which measures the importance or significance of each node in a network [1, 13–17, 22, 33, 36, 37, 41, 42, 51, 66, 71, 76, 80]. Mathematically, a numerical centrality measure is a mapping from a network G = (V, E, W) to a \(\vert V\vert\)-dimensional real vector:

$$\displaystyle{ \left [\ \mathrm{centrality}_{\mathbf{W}}(v)\ \right ]_{v\in V } \in \mathbb{R}^{\vert V \vert } }$$
(4)

For example, a widely used centrality measure is the PageRank centrality. Suppose G = (V, E, W) is a weighted directed graph. The PageRank centrality uses an additional parameter α ∈ (0, 1), known as the restart constant, to define a finite Markov process whose transition rule for any node \(v \in V\) is the following:

  • with probability α, restart at a random node in V, and

  • with probability (1 −α), move to a neighbor of v, chosen randomly with probability proportional to edge weights out of v.

Then, the PageRank centrality (with restart constant α) of any \(v \in V\) is proportional to v’s stationary probability in this Markov chain.
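The PageRank chain described above can be simulated by power iteration on its transition rule. The sketch below is a minimal illustration. It assumes every node has positive out-degree (dangling nodes are not handled) and uses α = 0.15 only as an illustrative default. Since the text defines PageRank only up to proportionality, the result is normalized to sum to 1.

```python
import numpy as np


def pagerank(W, alpha=0.15, tol=1e-12, max_iter=10_000):
    """Stationary distribution of the PageRank chain with restart probability alpha.

    Power iteration on x <- alpha/n * 1 + (1 - alpha) * P^T x, where
    P = D_out^{-1} W moves to an out-neighbor in proportion to edge weight.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    out_deg = W.sum(axis=1)
    if np.any(out_deg <= 0):
        raise ValueError("this sketch assumes every node has positive out-degree")
    P = W / out_deg[:, None]  # row-stochastic random-walk matrix
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = alpha / n + (1 - alpha) * (P.T @ x)  # restart + move step
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x
```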

In contrast, clusterability assigns a numerical score or ranking to each subset of nodes, which measures the coherence of each group in a network [62, 71, 89]. Mathematically, a numerical clusterability measure is a mapping from a network G = (V, E, W) to a \(2^{\vert V\vert}\)-dimensional real vector:

$$\displaystyle{ \left [\ \mathrm{clusterability}_{\mathbf{W}}(S)\ \right ]_{S\subseteq V } \in [0,1]^{2^{\vert V\vert } } }$$
(5)

An example of a clusterability measure is conductance [62]. Similarly, a community-characterization rule [19] is a mapping from a network G = (V, E, W) to a \(2^{\vert V\vert}\)-dimensional Boolean vector:

$$\displaystyle{ \left [\ \mathcal{C}_{\mathbf{W}}(S)\ \right ]_{S\subseteq V } \in \{ 0,1\}^{2^{\vert V\vert } } }$$
(6)

indicating whether or not each group \(S \subseteq V\) is a community in G. Clusterability and community-identification rules have much higher dimensionality than centrality. To a certain degree, they can be viewed as a “complete-information” model of the observed network. Thus again:

Explicitly or implicitly, the formulations of these network concepts are mathematical processes of uncovering or capturing underlying network models.
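The text cites conductance [62] as a clusterability measure but does not restate its definition here. Assuming the common definition for an undirected weighted graph, conductance(S) = w(S, V∖S) / min(vol(S), vol(V∖S)), where vol sums weighted degrees, one entry of the vector in (5) can be computed as follows. Lower conductance means a more coherent group. The helper name is hypothetical.

```python
import numpy as np


def conductance(W, S):
    """Cut weight between S and V \\ S divided by the smaller side's volume."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    in_S = np.zeros(n, dtype=bool)
    in_S[list(S)] = True
    cut = W[in_S][:, ~in_S].sum()  # total weight of edges leaving S
    deg = W.sum(axis=1)
    denom = min(deg[in_S].sum(), deg[~in_S].sum())
    if denom == 0:
        raise ValueError("conductance is undefined when S or V \\ S has zero volume")
    return cut / denom


# two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3)
W = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
```

Either triangle has cut weight 1 and volume 7, so its conductance is 1/7.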

2.3 Multifaceted Network Data: Beyond Graph-Based Network Models

Another basic challenge in network analysis is that real-world network data is much richer than the graph-theoretical representations. For example, social networks are more than weighted graphs. Likewise, the Web and Twitter are not just directed graphs. In general, network interactions and phenomena—such as social influence [55] or electoral behavior [35]—are more complex than what can be captured by nodes and edges. The network interactions are often the result of the interplay between dynamic mathematical processes and static underlying graph structures [25, 44].

2.3.1 Diverse Network Models

The richness of network data and diversity of network concepts encourage us to consider network models beyond graphs [89]. For example, each clusterability measure \(\left [\mathrm{clusterability}_{\mathbf{W}}(S)\right ]_{S\subseteq V }\) of a weighted graph G = (V, E, W) explicitly defines a complete-information, weighted hyper-network:

Definition 2.1 (Cooperative Model: Weighted Hypergraphs)

A weighted hypergraph over V is given by \(H = (V,E,\boldsymbol{\tau})\), where \(E \subseteq 2^{V}\) is a set of hyper-edges and \(\boldsymbol{\tau}: E \rightarrow \mathbb{R}\) is a function that assigns weights to hyper-edges. H is a complete-information cooperative network if \(E = 2^{V}\).

We refer to weighted hypergraphs as cooperative networks because they are the central subjects in classical cooperative game theory, but under a different name [81]. An n-person cooperative game over V = [n] is specified by a characteristic function \(\boldsymbol{\tau }: 2^{V } \rightarrow \mathbb{R}\), where for any coalition \(S \subseteq V\), \(\boldsymbol{\tau }(S)\) denotes the cooperative utility of S.

Cooperative networks are generalizations of undirected weighted graphs. One can also generalize directed networks, which specify directed node-node interactions. The first model below explicitly captures node-group interactions, while the second captures group-group interactions.

Definition 2.2 (Incentive Model)

An incentive network over V is a pair \(U = (V,\boldsymbol{u})\). For each \(s \in V\), \(u_{s}: 2^{V \setminus \{s\}} \rightarrow \mathbb{R}\) specifies s’s incentive utility over subsets of \(V \setminus \{s\}\). In other words, there are \(\vert S\vert\) utility values, \(\{u_s(S\setminus\{s\})\}_{s\in S}\), associated with each group \(S \subseteq V\) in the incentive network. For each \(s \in S\), the value of its interaction with the rest of the group \(S\setminus\{s\}\) is explicitly defined as \(u_s(S\setminus\{s\})\).

Definition 2.3 (Powerset Model)

A powerset network over V is a weighted directed network on the powersets of V. In other words, a powerset network \(P = (V,\boldsymbol{\theta })\) is specified by a function \(\boldsymbol{\theta }: 2^{V } \times 2^{V } \rightarrow \mathbb{R}\).

For example, as pointed out in [25, 55], a social-influence instance fundamentally defines a powerset network. Recall that a social-influence instance \(\mathcal{I}\) is specified by a directed graph G = (V, E) and an influence model \(\mathcal{D}\) [32, 55, 78], where G defines the graph structure of the social network and \(\mathcal{D}\) defines a stochastic process that characterizes how nodes in each seed set \(S \subseteq V\) collectively influence other nodes using the edge structures of G [55]. A popular influence model is independent cascade (IC) [55].

Mathematically, the influence process \(\mathcal{D}\) and the network structure G together define a probability distribution \(\boldsymbol{P}_{G,\mathcal{D}}: 2^{V } \times 2^{V } \rightarrow [0,1]\): for each \(S, T \in 2^{V}\), \(\boldsymbol{P}_{G,\mathcal{D}}[S,T]\) specifies the probability that T is the final activated set when S cascades its influence through the network G. Thus, \(P_{\mathcal{I}} = (V,\boldsymbol{P}_{G,\mathcal{D}})\) defines a natural powerset network, which can be viewed as the underlying network induced by the interplay between the static network structure G and the dynamic influence process \(\mathcal{D}\).

An important quality measure of S in this process is S’s influence spread [55]. It can be defined from the powerset model \(P_{\mathcal{I}} = (V,\boldsymbol{P}_{G,\mathcal{D}})\) as follows:

$$\displaystyle{\boldsymbol{\sigma }_{G,\mathcal{D}}(S) =\sum _{T\subseteq V }\vert T\vert \cdot \boldsymbol{ P}_{G,\mathcal{D}}[S,T].}$$

Thus, \((V,\boldsymbol{\sigma }_{G,\mathcal{D}})\) also defines a natural cooperative network [25].
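For a very small instance, the powerset row \(\boldsymbol{P}_{G,\mathcal{D}}[S,\cdot]\) and the influence spread can be computed exactly. The sketch below assumes the IC model in its standard live-edge form. In that form, the final activated set has the same distribution as the set of nodes reachable from S when each directed edge (u, v) is kept independently with probability \(p_{u,v}\). The code enumerates all \(2^{\vert E\vert}\) live-edge patterns, so it only illustrates the definitions and does not scale. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def final_set_distribution(n, edges, seeds):
    """Exact P[S, T] under IC, via live-edge enumeration; edges are (u, v, p_uv)."""
    dist = {}
    for live in product([False, True], repeat=len(edges)):
        prob = Fraction(1)
        adj = {u: [] for u in range(n)}
        for (u, v, p), is_live in zip(edges, live):
            prob *= p if is_live else 1 - p
            if is_live:
                adj[u].append(v)
        active, stack = set(seeds), list(seeds)  # activated set = reachable from seeds
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in active:
                    active.add(v)
                    stack.append(v)
        T = frozenset(active)
        dist[T] = dist.get(T, Fraction(0)) + prob
    return dist


def influence_spread(n, edges, seeds):
    """sigma(S) = sum over T of |T| * P[S, T]."""
    return sum(len(T) * p for T, p in final_set_distribution(n, edges, seeds).items())


edges = [(0, 1, Fraction(1, 2)), (1, 2, Fraction(1, 2))]  # the path 0 -> 1 -> 2
print(influence_spread(3, edges, {0}))  # 7/4
```

On this path, seeding {0} gives \(\boldsymbol{\sigma}(\{0\}) = 1\cdot\frac{1}{2} + 2\cdot\frac{1}{4} + 3\cdot\frac{1}{4} = \frac{7}{4}\).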

In many applications and studies, ordinal network models rather than cardinal network models are used to capture the preferences among nodes. Two classical applications of preference frameworks are voting [10] and stable marriage/coalition formation [21, 43, 46, 79]. A modern use of preference models is the Border Gateway Protocol (BGP) for network routing between autonomous Internet systems [23, 77].

In a recent axiomatic study of community identification in social networks, Borgs et al. [11, 19] considered the following abstract social/information network framework. Below, for a non-empty finite set V, let L(V) denote the set of all linear orders on V.

Definition 2.4 (Preference Model)

A preference network over V is a pair \(A = (V,\Pi )\), where \(\Pi =\{\boldsymbol{\pi } _{u}\}_{u\in V } \in L(V )^{\vert V \vert }\) is a preference profile in which \(\boldsymbol{\pi }_{u}\) specifies u’s individual preference.

2.3.2 Understanding Network Facets and Network Concepts

Each network model enables us to focus on different facets of network data. For example, the powerset model offers the most natural framework for capturing the underlying interplay between influence processes and network structures. The cooperative model matches the explicit representation of clusterability, group utilities, and influence spreads. While traditional graph-based network data often consists solely of pairwise interactions, affinities, or associations, a community is formed by a group of individuals. Thus, the basic question for community identification is to understand “how do individual preferences (affinities/associations) result in group preferences or community coherence?” [19] The preference model highlights the fundamental aspect of community characterization. The preference model is also natural for addressing the question of summarizing individual preferences into one collective preference, which is fundamental in the formulation of network centrality [89]. Thus, studying network models beyond graphs helps to broaden our understanding of social/information networks.

Several of these network models, as defined above, are highly theoretical. Their complete-information profiles have dimensionality exponential in | V |. To use them as underlying models in network analysis, succinct representations should be constructed to efficiently capture observed network data. For example, both the conductance clusterability measure and the social-influence powerset network are succinctly defined. Characterizing network concepts in these models and effectively applying them to understanding real network data are promising and fundamentally challenging research directions in network science.

3 PageRank Completion

Network analysis is the task of capturing the essence of observed networks. For example, graph embedding [61, 89] can be viewed as a process to identify the geometric essence of networks. Similarly, network completion [48, 56, 63], graphon estimation [4, 20], and community recovery in hidden stochastic block models [2] can be viewed as processes to distill the statistical essence of networks. All these approaches build constructive maps from observed sparse graphs to underlying complete-information models. In this section, we study the following basic question:

Given an observed sparse affinity network G = (V, E, W), can we construct a complete-information affinity network that is consistent with G?

This question is simpler than, but related to, matrix and network completion [48, 56], which aims to infer the missing data from sparse, observed network data. Like matrix/network completion, this problem is mathematically an inverse problem. Conceptually, we need to formulate the meaning of “a complete-information affinity network consistent with G.”

Our study is also partially motivated by the following question asked in [6, 11], aiming to derive personalized ranking information from graph-based network data:

Given a sparse affinity network G = (V, E, W), how should we construct a complete-information preference model that best captures the underlying individual preferences from network data given by G?

We will prove the following basic structural result: Every connected, undirected, weighted graph G = (V, E, W) has an undirected and weighted graph \(\overline{G } = (V, \overline{E }, \overline{\mathbf{W} })\), such that:

  • Complete Information: \(\overline{E }\) forms a complete graph with | V | self-loops.

  • Degree and Stationary Preserving: \(\mathbf{W} \cdot \mathbf{1} = \overline{\mathbf{W} } \cdot \mathbf{1}\). Thus, the random-walk Markov chains on G and on \(\overline{G }\) have the same stationary distribution.

  • PageRank Conforming: The transition matrix \(\mathbf{M}_{\overline{\mathbf{W} }}\) of the random-walk Markov chain on \(\overline{G }\) is conformal to the PageRank of G, that is, \(\mathbf{M}_{\overline{\mathbf{W} }}^{T} \cdot \mathbf{1}\) is proportional to the PageRank centrality of G.

  • Spectral Approximation: G and \(\overline{G }\) are spectrally similar.

In the last condition, the similarity between G and \(\overline{G}\) is measured by the following notion of spectral similarity [85]:

Definition 3.1 (Spectral Similarity of Networks)

Suppose G = (V, E, W) and \(\overline{G } = (V, \overline{E }, \overline{\mathbf{W} })\) are two weighted undirected graphs over the same set V of n nodes. Let \(\mathbf{L}_{\mathbf{W}} = \mathbf{D}_{\mathbf{W}} - \mathbf{W}\) and \(\mathbf{L}_{\overline{\mathbf{W}}} = \mathbf{D}_{\overline{\mathbf{W}}} - \overline{\mathbf{W}}\) be the Laplacian matrices, respectively, of these two graphs. Then, for σ ≥ 1, we say G and \(\overline{G}\) are σ-spectrally similar if:

$$\displaystyle{ \forall \mathbf{x} \in \mathbb{R}^{n},\quad \frac{1} {\sigma } \cdot \mathbf{x}^{T}\mathbf{\mathbf{L}}_{\overline{\mathbf{W} }}\mathbf{x} \leq \mathbf{x}^{T}\mathbf{\mathbf{L}}_{\mathbf{ W}}\mathbf{x} \leq \sigma \cdot \mathbf{x}^{T}\mathbf{\mathbf{L}}_{\overline{\mathbf{W} }}\mathbf{x} }$$
(7)
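For two connected graphs on the same node set, the smallest σ in (7) can be read off from the generalized eigenvalues λ of the pair \((\mathbf{L}_{\mathbf{W}}, \mathbf{L}_{\overline{\mathbf{W}}})\). Both Laplacians vanish on \(\mathbf{1}\), so only the subspace orthogonal to \(\mathbf{1}\) matters, and there σ = max(λ_max, 1/λ_min). The NumPy sketch below uses hypothetical names. It assumes both graphs are connected, so that the restricted \(\mathbf{L}_{\overline{\mathbf{W}}}\) is positive definite.

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def spectral_similarity(W, W_bar):
    """Smallest sigma satisfying (7); assumes both graphs are connected."""
    L, L_bar = laplacian(W), laplacian(W_bar)
    n = L.shape[0]
    # orthonormal basis Q of the subspace orthogonal to the all-ones vector
    evals, evecs = np.linalg.eigh(np.eye(n) - np.ones((n, n)) / n)
    Q = evecs[:, evals > 0.5]
    A, B = Q.T @ L @ Q, Q.T @ L_bar @ Q
    C_inv = np.linalg.inv(np.linalg.cholesky(B))    # B = C C^T
    lam = np.linalg.eigvalsh(C_inv @ A @ C_inv.T)  # generalized eigenvalues of (A, B)
    return max(lam.max(), 1.0 / lam.min())
```

For instance, comparing a 3-node path with the triangle gives σ = 3. On the complement of \(\mathbf{1}\), the path Laplacian has eigenvalues 1 and 3, while the triangle Laplacian acts as 3I.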

Many graph-theoretical measures, such as flows, cuts, conductances, and effective resistances, are approximately preserved by spectral similarity [12, 85]. We refer to \(\overline{G} = (V, \overline{E }, \overline{\mathbf{W} })\) as the PageRank essence or PageRank completion of G = (V, E, W).

3.1 The Personalized PageRank Matrix

\(\overline{G } = (V, \overline{E }, \overline{\mathbf{W} })\) stated above is derived from a well-known structure in network analysis, the personalized PageRank matrix of a network [8, 89].

3.1.1 Personalized PageRanks

Generalizing the Markov process of PageRank, Haveliwala [49] introduced personalized PageRanks. Suppose G = (V, E, W) is a weighted directed graph and α > 0 is a restart parameter. For any distribution s over V, consider the following Markov process, whose transition rule for any \(v \in V\) is the following:

  • with probability α, restart at a random node in V according to distribution s, and

  • with probability (1 −α), move to a neighbor of v, chosen randomly with probability proportional to edge weights out of v.

Then, the PageRank with respect to the starting vector s, denoted by \(\mathbf{p}_{\mathbf{s}}\), is the stationary distribution of this Markov chain.

Let \(d_u^{out} = \sum_{v\in V} w_{u,v}\) denote the out-degree of \(u \in V\) in G. Then, \(\mathbf{p}_{\mathbf{s}}\) is the solution to the following equation:

$$\displaystyle{ \mathbf{p}_{\mathbf{s}} =\alpha \cdot \mathbf{s} + (1-\alpha ) \cdot \mathbf{W}^{T} \cdot \left (\mathbf{D}_{\mathbf{ W}}^{out}\right )^{-1} \cdot \mathbf{p}_{\mathbf{ s}} }$$
(8)

where \(\mathbf{D}_{\mathbf{W}}^{out} = \mathrm{diag}([d_1^{out}, \ldots, d_n^{out}])\) is the diagonal matrix of out-degrees. Let \(\mathbf{1}_u\) denote the n-dimensional vector whose uth entry is 1 and all of whose other entries are 0. Haveliwala [49] referred to \(\mathbf{p}_{u}:= \mathbf{p}_{\mathbf{1}_{u}}\) as the personalized PageRank of \(u \in V\) in G. Personalized PageRank is asymmetric, and to emphasize this fact, we express \(\mathbf{p}_u\) as:

$$\displaystyle{\mathbf{p}_{u} = \left (\,p_{u\rightarrow 1},\ldots,p_{u\rightarrow n}\right )^{T}.}$$

Then the personalized PageRank profile \(\{\mathbf{p}_u\}_{u\in V}\) defines the following matrix:

Definition 3.2 (Personalized PageRank Matrix)

The personalized PageRank matrix of an n-node weighted graph G = (V, E, W) and restart constant α > 0 is:

$$\displaystyle{ \mathbf{PPR}_{\mathbf{W},\alpha } = \left [\mathbf{p}_{1},\ldots,\mathbf{p}_{n}\right ]^{T} = \left [\begin{array}{ccc} p_{1\rightarrow 1} & \cdots & p_{1\rightarrow n}\\ \vdots &\ddots & \vdots \\ p_{n\rightarrow 1} & \cdots &p_{n\rightarrow n}\\ \end{array} \right ] }$$
(9)
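Definition 3.2 can be assembled directly from Eq. (8): solve the linear system once per start vector \(\mathbf{1}_u\) and stack the solutions as rows. The NumPy sketch below uses hypothetical names and assumes every node has positive out-degree and α ∈ (0, 1]. Solving n separate systems is the most literal reading of the definition, not an efficient way to compute the full matrix.

```python
import numpy as np


def personalized_pagerank(W, s, alpha):
    """Solve Eq. (8): p_s = alpha * s + (1 - alpha) * W^T (D^out)^{-1} p_s."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    d_out = W.sum(axis=1)     # d_u^out, assumed positive for every u
    T = W.T / d_out[None, :]  # entry [v, u] = w_{u,v} / d_u^out
    rhs = alpha * np.asarray(s, dtype=float)
    return np.linalg.solve(np.eye(n) - (1 - alpha) * T, rhs)


def ppr_matrix(W, alpha):
    """Eq. (9): row u is p_u, the personalized PageRank with start vector 1_u."""
    n = np.asarray(W).shape[0]
    return np.vstack([personalized_pagerank(W, np.eye(n)[u], alpha) for u in range(n)])
```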

In this article, we normalize the PageRank centrality so that the sum of the centrality values over all nodes is equal to n. Let 1 denote the n-dimensional vector of all 1s. Then, the PageRank centrality of G is the solution to the following Markov random-walk equation [49, 72]:

$$\displaystyle{ \mathbf{PageRank}_{\mathbf{W},\alpha } =\alpha \cdot \mathbf{1} + (1-\alpha ) \cdot \mathbf{W}^{T}\left (\mathbf{D}_{\mathbf{ W}}^{out}\right )^{-1}\mathbf{PageRank}_{\mathbf{ W},\alpha } }$$
(10)
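Eq. (10) can also be solved by simple fixed-point iteration. The sketch below is an illustration under the assumption that every node has a positive out-degree; because W^T (D_out)^{-1} is column-stochastic, the update is a contraction with factor (1 − α) in the L1 norm, and starting from the all-ones vector keeps the entries summing to n, matching the normalization above.

```python
import numpy as np

def pagerank_centrality(W, alpha, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for Eq. (10); the entries of the result sum to n."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    A = W.T / W.sum(axis=1)[None, :]   # W^T (D_out)^{-1}, column-stochastic
    x = np.ones(n)                     # start from the all-ones vector (sum n)
    for _ in range(max_iter):
        x_next = alpha + (1 - alpha) * (A @ x)
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

W = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # hypothetical directed graph
pr = pagerank_centrality(W, alpha=0.15)
print(pr, pr.sum())
```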

Because \(\mathbf{1} = \sum_{u\in V}\mathbf{1}_u\) and \(\mathbf{p}_{\mathbf{s}}\) depends linearly on s (Eq. (8)), we have:

Proposition 3.3 (PageRank Conforming)

For any G = (V, E, W) and α > 0:

$$\displaystyle{ \mathbf{PageRank}_{\mathbf{W},\alpha } =\sum _{u\in V }\mathbf{p}_{u} = \mathbf{PPR}_{\mathbf{W},\alpha }^{T} \cdot \mathbf{1} }$$
(11)

Because Markov processes preserve the probability mass of the starting vector, we also have:

Proposition 3.4 (Markovian Conforming)

For any G = (V, E, W) and α > 0, \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is non-negative and:

$$\displaystyle{ \mathbf{PPR}_{\mathbf{W},\alpha } \cdot \mathbf{1} = \mathbf{1} }$$
(12)

In summary, the personalized PageRank matrix \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is a special matrix associated with network G: its row sums are all 1, and its column sums form the PageRank centrality of G.
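Both propositions are easy to confirm numerically. The following sketch, on a hypothetical 4-node directed graph, builds the personalized PageRank matrix from its closed form, solves Eq. (10) directly, and compares the row and column sums.

```python
import numpy as np

alpha = 0.2
W = np.array([[0, 2, 1, 0],
              [1, 0, 0, 3],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # hypothetical directed weights
n = W.shape[0]
M = W / W.sum(axis=1, keepdims=True)                        # (D_out)^{-1} W
PPR = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * M)   # rows are p_u
# Eq. (10) rearranged: (I - (1 - alpha) M^T) x = alpha * 1
pagerank = np.linalg.solve(np.eye(n) - (1 - alpha) * M.T, alpha * np.ones(n))

row_sums = PPR @ np.ones(n)     # Proposition 3.4: all ones
col_sums = PPR.T @ np.ones(n)   # Proposition 3.3: the PageRank centrality
print(row_sums, col_sums, pagerank)
```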

3.2 PageRank Completion of Symmetric Networks

PageRank centrality and the personalized PageRank matrix apply to both directed and undirected weighted graphs, and Propositions 3.3 and 3.4 hold in general. In this subsection, we focus mainly on undirected weighted networks. In this case, let \(\mathbf{D}_{\mathbf{W}}\) be the diagonal matrix of the weighted degrees \(\mathbf{d}_{\mathbf{W}} = \mathbf{W}\cdot\mathbf{1}\), and let \(\mathbf{M}_{\mathbf{W}} = \mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\) be the standard random-walk transition matrix on G.

To state the theorem below, we first review some basic concepts of Markov chains. Recall that a Markov chain over V is defined by an n × n transition matrix M satisfying the stochastic condition: M is non-negative and M ⋅ 1 = 1. A probability vector \(\boldsymbol{\pi }\) is the stationary distribution of this Markov process if:

$$\displaystyle{ \mathbf{M}^{T}\boldsymbol{\pi } =\boldsymbol{\pi } }$$
(13)

It is well known that every finite, irreducible, and ergodic Markov chain has a unique stationary distribution. Markov chain M is detailed-balanced if:

$$\displaystyle{ \boldsymbol{\pi }[u]\mathbf{M}[u,v] =\boldsymbol{\pi } [v]\mathbf{M}[v,u],\quad \forall \ u,v \in V }$$
(14)
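To make Eqs. (13) and (14) concrete, here is a small sketch that computes the stationary distribution as the left eigenvector of M for eigenvalue 1 and tests detailed balance via the symmetry of diag(π)·M. It assumes M is irreducible so that the eigenvector is unique; the two example weight matrices are hypothetical.

```python
import numpy as np

def stationary_distribution(M):
    """Solve Eq. (13): left eigenvector of M for eigenvalue 1, scaled to sum to 1.
    Assumes M is irreducible, so this eigenvector is unique up to scaling."""
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()

def is_detailed_balanced(M, tol=1e-10):
    """Eq. (14) holds iff diag(pi) @ M is a symmetric matrix."""
    flow = np.diag(stationary_distribution(M)) @ M
    return np.allclose(flow, flow.T, atol=tol)

# Hypothetical examples: a symmetric and a non-symmetric weight matrix.
W_sym = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
W_dir = np.array([[0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
M_sym = W_sym / W_sym.sum(axis=1, keepdims=True)
M_dir = W_dir / W_dir.sum(axis=1, keepdims=True)
print(stationary_distribution(M_sym), is_detailed_balanced(M_sym))
print(stationary_distribution(M_dir), is_detailed_balanced(M_dir))
```

For the symmetric example, the stationary distribution is proportional to the weighted degrees (3, 2, 3).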

We will now prove the following structural result:

Theorem 3.5 (PageRank Completion)

For any weighted directed graph G = (V, E, W) and restart constant α > 0:

  1. A:

    \(\mathbf{PPR}_{\mathbf{W},\alpha}\) and \(\left (\mathbf{D}_{\mathbf{W}}^{out}\right )^{-1} \cdot \mathbf{W}\) have the same eigenvectors. Thus, both Markov chains have the same stationary distribution.

  2. B:

    \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is detailed-balanced if and only if W is symmetric.

Furthermore, when W is symmetric, let \(\overline{G }_{\alpha } = (V, \overline{E }_{\alpha }, \overline{\mathbf{W} }_{\alpha })\) be the affinity network such that:

$$\displaystyle{ \overline{\mathbf{W} }_{\alpha } = \mathbf{D}_{\mathbf{W}} \cdot \mathbf{PPR}_{\mathbf{W},\alpha }\quad \mathrm{and}\quad \overline{E }_{\alpha } =\{ (u,v): \overline{\mathbf{W} }_{\alpha }[u,v]> 0\} }$$
(15)

Then, \(\overline{G }_{\alpha }\) satisfies the following conditions:

  1. 1.

    Symmetry Preserving: \(\overline{\mathbf{W} }^{T} = \overline{\mathbf{W} }\) , i.e., \(\overline{G }_{\alpha }\) is an undirected affinity network.

  2. 2.

    Complete Information: If G is connected, then \(\overline{E }_{\alpha }\) is a complete graph with | V | self-loops.

  3. 3.

    Degree and Stationary Preserving: \(\mathbf{W} \cdot \mathbf{1} = \overline{\mathbf{W} } \cdot \mathbf{1}\). Thus, \(\mathbf{D}_{\mathbf{W}} = \mathbf{D}_{\overline{\mathbf{W} }}\) and the random-walk Markov chains \(\mathbf{M}_{\mathbf{W}}\) and \(\mathbf{M}_{\overline{\mathbf{W} }}\) have the same stationary distribution.

  4. 4.

    Markovian and PageRank Conforming:

    $$\displaystyle{ \mathbf{M}_{\overline{\mathbf{W} }} \cdot \mathbf{1} = \mathbf{1}\quad \mathrm{and}\quad \mathbf{M}_{\overline{\mathbf{W} }}^{T} \cdot \mathbf{1} = \mathbf{PageRank}_{\mathbf{ W},\alpha } }$$
    (16)
  5. 5.

    Simultaneously Diagonalizable: For any symmetric W, recall that \(\mathbf{L}_{\mathbf{W}} = \mathbf{D}_{\mathbf{W}} - \mathbf{W}\) denotes the Laplacian matrix associated with W. Let \(\mathbf{\mathcal{L}}_{\mathbf{W}} = \mathbf{D}_{\mathbf{W}}^{-\frac{1} {2} }\mathbf{L}_{\mathbf{W}}\mathbf{D}_{\mathbf{W}}^{-\frac{1} {2} } = \mathbf{I} -\mathbf{D}_{\mathbf{W}}^{-\frac{1} {2} }\mathbf{W}\mathbf{D}_{\mathbf{W}}^{-\frac{1} {2} }\) be the normalized Laplacian matrix associated with W. Then, \(\mathbf{\mathcal{L}}_{\mathbf{W}}\) and \(\mathbf{\mathcal{L}}_{\overline{\mathbf{W} }}\) are simultaneously diagonalizable.

  6. 6.

    Spectral Densification and Approximation: For all \(\mathbf{x} \in \mathbb{R}^{n}\) :

    $$\displaystyle{ \alpha \cdot \mathbf{x}^{T}\mathbf{L}_{\mathbf{W}}\mathbf{x} \leq \mathbf{x}^{T}\left ( \frac{1} {1-\alpha }\cdot \mathbf{L}_{\overline{\mathbf{W} }}\right )\mathbf{x} \leq \frac{1} {\alpha } \mathbf{x}^{T}\mathbf{L}_{\mathbf{W}}\mathbf{x} }$$
    (17)
    $$\displaystyle{ \alpha \cdot \mathbf{x}^{T}\mathbf{\mathcal{L}}_{\mathbf{W}}\mathbf{x} \leq \mathbf{x}^{T}\left ( \frac{1} {1-\alpha }\cdot \mathbf{\mathcal{L}}_{\overline{\mathbf{W} }}\right )\mathbf{x} \leq \frac{1} {\alpha } \mathbf{x}^{T}\mathbf{\mathcal{L}}_{\mathbf{W}}\mathbf{x} }$$
    (18)

    In other words, G and \(\frac{1} {1-\alpha } \cdot \overline{G }_{\alpha }\) are \(\frac{1} {\alpha }\) -spectrally similar.
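Conditions 1 to 4 can be checked numerically on a small example. The sketch below assumes a hypothetical symmetric, connected weighted path and uses the closed form α(I − (1 − α)M_W)^{-1} for the personalized PageRank matrix; it is a sanity check, not a proof.

```python
import numpy as np

alpha = 0.3
# Hypothetical symmetric, connected weighted path 0-1-2-3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 2, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = W.shape[0]
d = W.sum(axis=1)
M = W / d[:, None]                                        # M_W = D_W^{-1} W
PPR = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * M)
W_bar = np.diag(d) @ PPR                                  # Eq. (15)
pagerank = np.linalg.solve(np.eye(n) - (1 - alpha) * M.T, alpha * np.ones(n))  # Eq. (10)

M_bar = W_bar / W_bar.sum(axis=1, keepdims=True)
checks = {
    "1: symmetry preserving": np.allclose(W_bar, W_bar.T),
    "2: complete information": bool(np.all(W_bar > 0)),
    "3: degree preserving": np.allclose(W_bar.sum(axis=1), d),
    "4: Markovian and PageRank conforming": np.allclose(M_bar.sum(axis=1), 1)
    and np.allclose(M_bar.sum(axis=0), pagerank),
}
print(checks)
```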

Remarks

We rescale \(\mathbf{L}_{\overline{\mathbf{W} }}\) and \(\mathbf{\mathcal{L}}_{\overline{\mathbf{W} }}\) by \(\frac{1} {1-\alpha }\) because \(\overline{G }_{\alpha }\) has self-loops of magnitude at least \(\alpha\mathbf{D}_{\mathbf{W}}\), contributed by the restart term. In other words, \(\overline{G }_{\alpha }\) uses at most a (1 − α) fraction of its weighted degrees for connecting different nodes in V.

Proof

Let n = |V|. For any initial distribution s over V, we can explicitly express \(\mathbf{p}_{\mathbf{s}}\) as:

$$\displaystyle{ \mathbf{p}_{\mathbf{s}} =\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot \left (\mathbf{W}^{T} \cdot \left (\mathbf{D}_{\mathbf{ W}}^{out}\right )^{-1}\right )^{k} \cdot \mathbf{s} }$$
(19)

Consequently, we can express \(\mathbf{PPR}_{\mathbf{W},\alpha}\) as:

$$\displaystyle{ \mathbf{PPR}_{\mathbf{W},\alpha } =\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot \left (\left (\mathbf{D}_{\mathbf{ W}}^{out}\right )^{-1} \cdot \mathbf{W}\right )^{k} }$$
(20)

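Note that \(\alpha\sum_{k=0}^{\infty}(1-\alpha)^k = 1\). Because \(\left(\left(\mathbf{D}_{\mathbf{W}}^{out}\right)^{-1}\cdot\mathbf{W}\right)^{k}\) is a stochastic matrix for every integer k ≥ 0, \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is a convex combination of (multi-step) random-walk matrices and is therefore itself stochastic. Statement A follows because \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is a power series in \(\mathbf{M} = \left(\mathbf{D}_{\mathbf{W}}^{out}\right)^{-1}\cdot\mathbf{W}\): if \(\mathbf{M}^{T}\mathbf{x} = \lambda\mathbf{x}\), then \(\mathbf{PPR}_{\mathbf{W},\alpha}^{T}\mathbf{x} = \frac{\alpha}{1-(1-\alpha)\lambda}\mathbf{x}\), and likewise for right eigenvectors. Taking λ = 1 shows that every stationary distribution of M is also a stationary distribution of \(\mathbf{PPR}_{\mathbf{W},\alpha}\).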
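A short numerical illustration of Eq. (20) and Statement A, on a hypothetical directed graph: truncating the power series after K terms leaves a tail of total weight (1 − α)^K, so for K = 200 the partial sum should match the closed-form inverse to machine precision, and the stationary distribution of the random walk should also be stationary for the personalized PageRank matrix.

```python
import numpy as np

alpha = 0.25
W = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)  # hypothetical, directed
n = W.shape[0]
M = W / W.sum(axis=1, keepdims=True)            # (D_out)^{-1} W

# Partial sum of Eq. (20); the neglected tail has weight (1 - alpha)^K.
K = 200
series = np.zeros((n, n))
term = np.eye(n)                                # holds M^k at iteration k
for k in range(K):
    series += alpha * (1 - alpha) ** k * term
    term = term @ M
closed_form = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * M)

# Statement A: the stationary distribution of M is also stationary for PPR.
vals, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()
print(np.abs(series - closed_form).max(), closed_form.T @ pi - pi)
```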

The following fact is well known (Aldous and Fill, recompiled 2014, Reversible Markov chains and random walks on graphs, Unfinished monograph. Available at http://www.stat.berkeley.edu/~aldous/RWG/book.html):

Suppose M is a Markov chain with stationary distribution \(\boldsymbol{\pi }\). Let \(\boldsymbol{\Pi }\) be the diagonal matrix defined by \(\boldsymbol{\pi }\). Then, \(\mathbf{M}^{T}\boldsymbol{\Pi }\) is symmetric if and only if the Markov process defined by M is detailed balanced.

We now assume \(\mathbf{W} = \mathbf{W}^{T}\). Then, Eq. (20) becomes:

$$\displaystyle{ \mathbf{PPR}_{\mathbf{W},\alpha } =\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot \left (\mathbf{D}_{\mathbf{ W}}^{-1} \cdot \mathbf{W}\right )^{k} }$$
(21)

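The stationary distribution of \(\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\), and hence of \(\mathbf{PPR}_{\mathbf{W},\alpha}\), is proportional to \(\mathbf{d} = \mathbf{W}\cdot\mathbf{1}\). For k ≥ 1, \(\mathbf{D}_{\mathbf{W}}\left(\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\right)^{k} = \mathbf{W}\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\cdots\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\) is symmetric, because its transpose reverses the order of its symmetric factors; hence \(\overline{\mathbf{W}}_{\alpha} = \mathbf{D}_{\mathbf{W}}\cdot\mathbf{PPR}_{\mathbf{W},\alpha}\) is a symmetric matrix, and by the fact above, \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is detailed balanced. Because the matrices \(\left(\left(\mathbf{D}_{\mathbf{W}}^{out}\right)^{-1}\cdot\mathbf{W}\right)^{k}\), for all positive integers k, have a common stationary distribution, \(\mathbf{PPR}_{\mathbf{W},\alpha}\) is not detailed balanced when W is not symmetric. It is also well known, by Eq. (19), that for all u, v ∈ V, \(\mathbf{PPR}_{\mathbf{W},\alpha}[u,v]\) equals the probability that a random walk starting at u passes by v immediately before it restarts. Thus, when G is connected, \(\mathbf{PPR}_{\mathbf{W},\alpha}[u,v] > 0\) for all u, v ∈ V, so \(\mathrm{nnz}(\overline{\mathbf{W} }_{\alpha}) = n^{2}\), and \(\overline{E}_{\alpha}\), the nonzero pattern of \(\overline{\mathbf{W} }_{\alpha}\), is a complete graph with |V| self-loops. Finally, by Proposition 3.4, \(\overline{\mathbf{W}}_{\alpha}\cdot\mathbf{1} = \mathbf{D}_{\mathbf{W}}\cdot\mathbf{PPR}_{\mathbf{W},\alpha}\cdot\mathbf{1} = \mathbf{D}_{\mathbf{W}}\cdot\mathbf{1} = \mathbf{W}\cdot\mathbf{1}\); consequently \(\mathbf{M}_{\overline{\mathbf{W}}} = \mathbf{D}_{\mathbf{W}}^{-1}\overline{\mathbf{W}}_{\alpha} = \mathbf{PPR}_{\mathbf{W},\alpha}\), and Condition 4 follows from Propositions 3.3 and 3.4. We have now established Condition B and Conditions 1–4.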

We now prove Conditions 5 and 6 (see Footnote 9). Recall that when \(\mathbf{W} = \mathbf{W}^{T}\), we can express the personalized PageRank matrix as:

$$\displaystyle{ \mathbf{PPR}_{\mathbf{W},\alpha } =\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot \left (\mathbf{D}_{\mathbf{ W}}^{-1} \cdot \mathbf{W}\right )^{k}. }$$

Thus:

$$\displaystyle{ \overline{\mathbf{W} }_{\alpha } = \mathbf{D}_{\mathbf{W}} \cdot \mathbf{PPR}_{\mathbf{W},\alpha } = \left (\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k} \cdot \mathbf{D}_{\mathbf{ W}} \cdot \left (\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\right )^{k}\right ). }$$

We compare the Laplacian matrices associated with W and \(\overline{\mathbf{W}}\):

$$\displaystyle\begin{array}{rcl} \mathbf{\mathbf{L}}_{\mathbf{W}}& =& \mathbf{D}_{\mathbf{W}} -\mathbf{W} = \mathbf{D}_{\mathbf{W}}^{1/2}\left (\mathbf{I} -\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2}\right )\mathbf{D}_{\mathbf{ W}}^{1/2} = \mathbf{D}_{\mathbf{ W}}^{1/2}\mathbf{\mathcal{L}}_{\mathbf{ W}}\mathbf{D}_{\mathbf{W}}^{1/2}. {}\\ \mathbf{\mathbf{L}}_{\overline{\mathbf{W}}}& =& \mathbf{D}_{\mathbf{W}} -\overline{\mathbf{W} } = \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{\mathcal{L}}_{ \overline{\mathbf{W} }}\mathbf{D}_{\mathbf{W}}^{1/2} {}\\ \end{array}$$

where

$$\displaystyle{ \mathbf{\mathcal{L}}_{\overline{\mathbf{W} }} = \mathbf{I} -\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k} \cdot (\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2})^{k}. }$$

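Let \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\) be the n eigenvalues of \(\mathbf{D}_{\mathbf{W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{W}}^{-1/2}\), and let \(\mathbf{u}_1,\ldots,\mathbf{u}_n\) denote the associated unit-length eigenvectors. Because \(\mathbf{D}_{\mathbf{W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{W}}^{-1/2}\) is similar to the stochastic matrix \(\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W}\), we have \(\vert\lambda_i\vert \leq 1\). Let \(\boldsymbol{\Lambda }\) be the diagonal matrix associated with \((\lambda_1,\ldots,\lambda_n)\) and \(\mathbf{U} = [\mathbf{u}_1,\ldots,\mathbf{u}_n]\). By the spectral theorem (the eigenvalue decomposition for symmetric matrices), we have: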

$$\displaystyle{ \mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{U} =\boldsymbol{ \Lambda } }$$
(22)
$$\displaystyle{ \mathbf{U}\mathbf{U}^{T} = \mathbf{U}^{T}\mathbf{U} = \mathbf{I} }$$
(23)

Therefore:

$$\displaystyle\begin{array}{rcl} \mathbf{\mathbf{L}}_{\mathbf{W}}& =& \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\mathbf{U}^{T}\left (\mathbf{I} -\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2}\right )\mathbf{U}\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{1/2} {}\\ & =& \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} -\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{U}\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{1/2} {}\\ & =& \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} -\boldsymbol{ \Lambda }\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{1/2}. {}\\ \end{array}$$

Similarly:

$$\displaystyle\begin{array}{rcl} \mathbf{\mathbf{L}}_{\overline{\mathbf{W}}_{\alpha }}& =& \mathbf{D}_{\mathbf{W}} -\overline{\mathbf{W} }_{\alpha } = \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{\mathcal{L}}_{ \overline{\mathbf{W} }}\mathbf{D}_{\mathbf{W}}^{1/2} {}\\ & =& \mathbf{D}_{\mathbf{W}}^{1/2}\left (\mathbf{I} -\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot (\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2})^{k}\right )\mathbf{D}_{\mathbf{ W}}^{1/2} {}\\ & =& \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} -\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot \mathbf{U}^{T}(\mathbf{D}_{\mathbf{ W}}^{-1/2}\mathbf{W}\mathbf{D}_{\mathbf{ W}}^{-1/2})^{k}\mathbf{U}\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{1/2} {}\\ & =& \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} -\alpha \sum _{ k=0}^{\infty }(1-\alpha )^{k} \cdot \boldsymbol{ \Lambda }^{k}\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{1/2} {}\\ & =& \mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} - \frac{\alpha } {\mathbf{I} - (1-\alpha )\boldsymbol{\Lambda }}\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{ W}}^{1/2}. {}\\ \end{array}$$
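The derivation says that the single orthogonal matrix U, built from the eigenvectors of D^{-1/2}WD^{-1/2}, diagonalizes both normalized Laplacians, with eigenvalue 1 − λ for W and 1 − α/(1 − (1 − α)λ) for its PageRank completion. A minimal check on a hypothetical symmetric graph:

```python
import numpy as np

alpha = 0.3
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # hypothetical symmetric graph
n = W.shape[0]
d = W.sum(axis=1)
D_half_inv = np.diag(d ** -0.5)
A = D_half_inv @ W @ D_half_inv              # D^{-1/2} W D^{-1/2}
lam, U = np.linalg.eigh(A)                   # A = U diag(lam) U^T, U orthogonal

PPR = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * (W / d[:, None]))
W_bar = np.diag(d) @ PPR                     # same weighted degrees as W
norm_lap = np.eye(n) - A
norm_lap_bar = np.eye(n) - D_half_inv @ W_bar @ D_half_inv

diag_W = U.T @ norm_lap @ U                  # expected: diag(1 - lam)
diag_bar = U.T @ norm_lap_bar @ U            # expected: diag(1 - alpha/(1-(1-alpha)lam))
predicted_bar = 1 - alpha / (1 - (1 - alpha) * lam)
print(np.round(diag_W, 10), np.round(diag_bar, 10), sep="\n")
```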

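The derivation above proves Condition 5: since \(\mathbf{\mathcal{L}}_{\mathbf{W}} = \mathbf{U}(\mathbf{I}-\boldsymbol{\Lambda})\mathbf{U}^{T}\) and \(\mathbf{\mathcal{L}}_{\overline{\mathbf{W}}} = \mathbf{U}\left(\mathbf{I} - \alpha(\mathbf{I}-(1-\alpha)\boldsymbol{\Lambda})^{-1}\right)\mathbf{U}^{T}\), the same orthogonal matrix U diagonalizes both. To prove Condition 6, consider any \(\mathbf{x} \in \mathbb{R}^{n}\) with \(\mathbf{x}^{T}\mathbf{L}_{\mathbf{W}}\mathbf{x} > 0\) (when \(\mathbf{x}^{T}\mathbf{L}_{\mathbf{W}}\mathbf{x} = 0\), all terms of Eq. (17) vanish). With \(\mathbf{y} = \mathbf{U}^{T}\mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{x}\), we have: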

$$\displaystyle\begin{array}{rcl} \frac{\mathbf{x}^{T} \frac{1} {1-\alpha }\mathbf{\mathbf{L}}_{\overline{\mathbf{W}}}\mathbf{x}} {\mathbf{x}^{T}\mathbf{\mathbf{L}}_{\mathbf{W}}\mathbf{x}} & =& \frac{1} {1-\alpha } \cdot \frac{\mathbf{x}^{T}\mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} - \frac{\alpha } {\mathbf{I} -(1-\alpha )\boldsymbol{\Lambda }}\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{x}} {\mathbf{x}^{T}\mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{U}\left (\mathbf{I} -\boldsymbol{ \Lambda }\right )\mathbf{U}^{T}\mathbf{D}_{\mathbf{W}}^{1/2}\mathbf{x}} {}\\ & =& \frac{1} {1-\alpha } \cdot \frac{\mathbf{y}^{T}\left (\mathbf{I} - \frac{\alpha } {\mathbf{I} -(1-\alpha )\boldsymbol{\Lambda }}\right )\mathbf{y}} {\mathbf{y}^{T}\left (\mathbf{I} -\boldsymbol{ \Lambda }\right )\mathbf{y}} {}\\ \end{array}$$

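Because \(\mathbf{I} - \alpha(\mathbf{I}-(1-\alpha)\boldsymbol{\Lambda})^{-1} = (1-\alpha)(\mathbf{I}-\boldsymbol{\Lambda})(\mathbf{I}-(1-\alpha)\boldsymbol{\Lambda})^{-1}\), this ratio is a weighted average of the values \(\frac{1}{1-(1-\alpha)\lambda_i}\) with non-negative weights \(\mathbf{y}[i]^{2}(1-\lambda_i)\). Hence it lies in the interval:

$$\displaystyle{\left [\inf _{\lambda:\vert \lambda \vert \leq 1} \frac{1} {1 - (1-\alpha )\lambda },\sup _{\lambda:\vert \lambda \vert \leq 1} \frac{1} {1 - (1-\alpha )\lambda }\right ] = \left [ \frac{1} {2-\alpha }, \frac{1} {\alpha } \right ].}$$

Since \((1-\alpha)^{2} \geq 0\) implies \(\alpha \leq \frac{1}{2-\alpha}\), this establishes Eq. (17). The same computation with \(\mathbf{y} = \mathbf{U}^{T}\mathbf{x}\) establishes Eq. (18).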
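An empirical look at Condition 6: the sketch below samples random vectors orthogonal to the all-ones vector (the null space of L_W for a connected graph) and checks that the Rayleigh-quotient ratio stays inside [1/(2 − α), 1/α]. The graph and the random seed are hypothetical choices; sampling illustrates the bound but of course does not prove it.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.2
W = np.array([[0, 3, 1, 0, 0],
              [3, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 2],
              [0, 0, 0, 2, 0]], dtype=float)  # hypothetical symmetric, connected graph
n = W.shape[0]
d = W.sum(axis=1)
L = np.diag(d) - W
PPR = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * (W / d[:, None]))
L_bar = np.diag(d) - np.diag(d) @ PPR          # Laplacian of the PageRank completion

ratios = []
for _ in range(1000):
    x = rng.standard_normal(n)
    x -= x.mean()                              # project out the null space span(1)
    ratios.append((x @ L_bar @ x) / ((1 - alpha) * (x @ L @ x)))
lo, hi = 1 / (2 - alpha), 1 / alpha
print(min(ratios), max(ratios), lo, hi)
```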

3.3 PageRank Completion, Community Identification, and Clustering

PageRank completion has an immediate application to the community-identification approaches developed in [11, 19]. This family of methods first constructs a preference network from an input weighted graph G = (V, E, W). It then applies various social-choice aggregation functions [10] to define network communities [11, 19]. In fact, Balcan et al. [11] show that the PageRank completion of G provides a wonderful scheme (see also Definition 4.10) for constructing preference networks from affinity networks.

In addition to its classical connection with PageRank centrality, PageRank completion also has a direct connection with network clustering. To illustrate this connection, let’s recall a well-known approach in spectral graph theory for clustering [9, 24, 62, 84, 86]:

Algorithm: Sweep(G, v)
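The steps of Sweep are not reproduced in this excerpt. Below is a minimal sketch of the standard sweep procedure, under the assumption that conductance is w(S, V − S) / min(vol(S), vol(V − S)): sort the nodes by decreasing value of v and return the prefix set with the smallest conductance. For clarity it recomputes each cut from scratch; practical implementations update the cut incrementally as each node is added. The example graph is hypothetical.

```python
import numpy as np

def conductance(W, S):
    """Assumed definition: w(S, V-S) / min(vol(S), vol(V-S)) for a node set S."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = W[mask][:, ~mask].sum()
    return cut / min(W[mask].sum(), W[~mask].sum())

def sweep(W, v):
    """Sort nodes by decreasing v; return the prefix set with the least conductance."""
    order = np.argsort(-np.asarray(v, dtype=float), kind="stable")
    best_set, best_phi = None, np.inf
    for i in range(1, len(order)):          # proper, non-empty prefixes only
        S = set(order[:i].tolist())
        phi = conductance(W, S)             # recomputed from scratch for clarity
        if phi < best_phi:
            best_set, best_phi = S, phi
    return best_set, best_phi

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
S, phi = sweep(W, [3, 2, 1, -1, -2, -3])
print(sorted(S), phi)
```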

Both in theory and in practice, the most popular vectors used in Sweep are:

  • Fiedler vector: the eigenvector associated with the second smallest eigenvalue of the Laplacian matrix \(\mathbf{L}_{\mathbf{W}}\) [39, 40, 84].

  • Cheeger vector: \(\mathbf{D}_{\mathbf{W}}^{-1/2}\boldsymbol{v}_{2}\), where \(\boldsymbol{v}_{2}\) is the eigenvector associated with the second smallest eigenvalue of the normalized Laplacian matrix \(\mathbf{\mathcal{L}}_{\mathbf{W}}\) [24, 28].

The sweep-based clustering method and Fiedler/Cheeger vectors are the main subject of the following beautiful theorem [24] in spectral graph theory:

Theorem 3.6 (Cheeger’s Inequality)

For any symmetric weighted graph G = (V, E, W), let \(\lambda_2\) be the second smallest eigenvalue of the normalized Laplacian matrix \(\mathbf{\mathcal{L}}_{\mathbf{W}}\) of G. Let \(\boldsymbol{v}_{2}\) be the eigenvector associated with \(\lambda_2\) and \(S = \mathrm{Sweep}(G, \mathbf{D}_{\mathbf{W}}^{-1/2}\boldsymbol{v}_{2})\). Then:

$$\displaystyle{ \frac{\lambda _{2}} {2} \leq \mathrm{ conductance}_{\mathbf{W}}(S) \leq \sqrt{2\lambda _{2}} }$$
(24)
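Eq. (24) can be checked on a tiny graph. The sketch below assumes the same conductance definition as is common in the literature, w(S, V − S) / min(vol(S), vol(V − S)), sweeps the Cheeger vector, and compares the result with a brute-force minimum over all node subsets, which is feasible only because the hypothetical example has six nodes.

```python
import itertools
import numpy as np

def conductance(W, mask):
    # Assumed definition: w(S, V-S) / min(vol(S), vol(V-S)); mask marks S.
    return W[mask][:, ~mask].sum() / min(W[mask].sum(), W[~mask].sum())

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
n = 6
W = np.zeros((n, n))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
d = W.sum(axis=1)

norm_lap = np.eye(n) - W / np.sqrt(np.outer(d, d))   # I - D^{-1/2} W D^{-1/2}
lam, vecs = np.linalg.eigh(norm_lap)                  # ascending eigenvalues
lam2 = lam[1]
cheeger_vec = vecs[:, 1] / np.sqrt(d)                 # D^{-1/2} v_2

def as_mask(nodes):
    return np.isin(np.arange(n), nodes)

order = np.argsort(-cheeger_vec)
sweep_phi = min(conductance(W, as_mask(order[:i])) for i in range(1, n))
best_phi = min(conductance(W, as_mask(S))
               for r in range(1, n) for S in itertools.combinations(range(n), r))
print(lam2 / 2, best_phi, sweep_phi, np.sqrt(2 * lam2))
```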

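By Theorem 3.5, the normalized Laplacian matrices of G and its PageRank completion are simultaneously diagonalizable. Moreover, for α < 1 the map \(\lambda \mapsto 1 - \frac{\alpha}{1-(1-\alpha)\lambda}\) is strictly decreasing on [−1, 1], just like \(\lambda \mapsto 1-\lambda\), so the two normalized Laplacians order their common eigenvectors identically. Thus, we can also use the eigenvector of the PageRank completion of G to identify a cluster of G whose conductance is guaranteed by Cheeger's inequality.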
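A quick numerical check of this claim on a hypothetical graph of two triangles joined by one edge: the eigenvector for the second smallest eigenvalue of the normalized Laplacian of W coincides, up to sign, with the corresponding eigenvector for its PageRank completion. The comparison relies on that eigenvalue being simple, which holds for this example.

```python
import numpy as np

alpha = 0.15
n = 6
W = np.zeros((n, n))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
d = W.sum(axis=1)
W_bar = np.diag(d) @ (alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * W / d[:, None]))

def normalized_laplacian(A, deg):
    return np.eye(len(deg)) - A / np.sqrt(np.outer(deg, deg))

_, V = np.linalg.eigh(normalized_laplacian(W, d))
_, V_bar = np.linalg.eigh(normalized_laplacian(W_bar, d))  # W_bar has the same degrees
v2, v2_bar = V[:, 1], V_bar[:, 1]
print(abs(v2 @ v2_bar))   # close to 1: the same unit eigenvector up to sign
```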

Then, in what sense is the PageRank completion a better representation of the information contained in the original network?

For example, with respect to network clustering, what desirable properties does the PageRank completion have that the original graph doesn’t?

While we are still looking for a comprehensive answer to these questions, we will now use the elegant result of Andersen, Chung, and Lang [9] to illustrate that the PageRank completion indeed contains more direct information about network clustering than the original data W. Andersen et al. proved that if one applies sweep to the vectors \(\{\mathbf{D}_{\mathbf{W}}^{-1}\cdot\mathbf{p}_v\}_{v\in V}\), then one can obtain a cluster whose conductance is nearly as small as that guaranteed by Cheeger's inequality. Such a statement does not hold for the rows in the original network data W, particularly when W is sparse.

In fact, the result of Andersen, Chung, and Lang [9] is much stronger. They showed that for any cluster S ⊆ V, if one selects a random node v ∈ S with probability proportional to its weighted degree \(d_v\), then, with probability at least 1/2, one can identify a cluster S′ of conductance at most \(O(\sqrt{\mathrm{conductance }_{\mathbf{W} } (S)\log n})\) by applying sweep to the vector \(\mathbf{D}_{\mathbf{W}}^{-1}\cdot\mathbf{p}_v\). In other words, the row vectors in the PageRank completion, i.e., the personalized PageRank vectors that represent the individual data associated with nodes, carry rich and direct information about network clustering (measured by conductance). The original network data simply lack this property, as one is usually unable to identify good clusters directly from the individual rows of W.
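To see the local flavor of this result, the sketch below sweeps D^{-1}p_v for a single seed node on a hypothetical two-triangle graph and recovers the seed's triangle. Two simplifications should be stated plainly: the personalized PageRank vector is computed exactly with a dense inverse rather than with the approximate local "push" computation used by Andersen, Chung, and Lang, and conductance is assumed to be w(S, V − S) / min(vol(S), vol(V − S)).

```python
import numpy as np

def conductance(W, mask):
    # Assumed definition: w(S, V-S) / min(vol(S), vol(V-S)); mask marks S.
    return W[mask][:, ~mask].sum() / min(W[mask].sum(), W[~mask].sum())

def local_sweep(W, seed, alpha=0.15):
    """Sweep D^{-1} p_seed, with p_seed computed exactly (not by a local push method)."""
    n = W.shape[0]
    d = W.sum(axis=1)
    PPR = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * W / d[:, None])
    score = PPR[seed] / d                      # D^{-1} p_seed
    order = np.argsort(-score)
    best = min(range(1, n),
               key=lambda i: conductance(W, np.isin(np.arange(n), order[:i])))
    return set(order[:best].tolist())

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
print(local_sweep(W, seed=0), local_sweep(W, seed=4))
```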

In summary, Cheeger’s inequality and its algorithmic proof can be viewed as the mathematical foundation for global spectral partitioning, because the Fiedler/Cheeger vectors are formulated from the network data as a whole. From this global perspective, the original network and its PageRank completion are equally effective. In contrast, from the local perspective of individual-row data, Andersen, Chung, and Lang’s result highlights the effectiveness of the PageRank completion for local clustering [86]: the row data associated with nodes in the PageRank completion provide effective information for identifying good clusters. Similarly, from the corresponding column of the PageRank completion, one can also directly and “locally” obtain each node’s PageRank centrality. In other words, PageRank completion transforms the input network data W into a “complete-information” network model \(\overline{\mathbf{W}}\), and in the process, it distills the centrality/clusterability information implicitly embedded globally in W into an ensemble of nodes’ “individual” network data that explicitly encodes the centrality information and locally captures the clustering structures.

4 Connecting Multifaceted Network Data

The formulations highlighted in Sect. 2.3, such as the cooperative, incentive, powerset, and preference models, are just a few examples of network models beyond the traditional graph-based framework. Other extensions include the popular probabilistic graphical model [58] and game-theoretical graphical model [26, 31, 52]. These models use relatively homogeneous node and edge types, but nevertheless represent a great source of expressions for multifaceted and multimodal network data.

While diverse network models enable us to express multifaceted network data, we need mathematical and algorithmic tools to connect them. For some applications such as community identification, one may need to properly use some data facets as metadata to evaluate or cross validate the network solution(s) identified from the main network facets [74].

But more broadly, for many real-world network analysis tasks, we need a systematic approach to network composition whose task is to integrate the multifaceted data into a single effective network worldview. Towards this goal, a basic theoretical step in multifaceted network analysis is to establish a unified worldview for capturing multifaceted network data expressed in various models.

Although fundamental, formulating a unified worldview of network models is still largely an outstanding research problem. In this section, we sketch our preliminary studies in using Markov chains to build a “common platform” for the network models discussed in Sect. 2.3. We hope this study will inspire a general theory for data integration, network composition, and multifaceted network analysis. We also hope that it will help to strengthen the connection between various fields, as diverse as statistical modeling, geometric embedding, social influence, network dynamics, game theory, and social choice theory, as well as various application domains (protein-protein interaction, viral marketing, information propagation, electoral behavior, homeland security, healthcare, etc.), that have provided different but valuable techniques and motivations to network analysis.

4.1 Centrality-Conforming Stochastic Matrices of Various Network Models

The Markov chain, a basic statistical model, is also a fundamental network concept. For a weighted network G = (V, E, W), the standard random-walk transition matrix \(\left (\mathbf{D}_{\mathbf{W}}^{out}\right )^{-1} \cdot \mathbf{W}\) is the most widely used stochastic matrix associated with G. Importantly, Sect. 3 illustrates that other Markov chains, such as the PageRank Markov chain \(\mathbf{PPR}_{\mathbf{W},\alpha}\), are also natural with respect to network data W. Traditionally, a Markov chain is characterized by its stochastic condition, stationary distribution, mixing time, and detailed-balancedness. Theorem 3.5 highlights another important feature of Markov chains in the context of network analysis: the PageRank Markov chain is conforming with respect to PageRank centrality, that is, for any network G = (V, E, W) and α > 0, we have:

$$\displaystyle{\mathbf{PPR}_{\mathbf{W},\alpha }^{T} \cdot \mathbf{1} = \mathbf{PageRank}_{\mathbf{ W},\alpha }.}$$

How should we derive stochastic matrices from other network models? Can we construct Markov chains that are centrality-conforming with respect to natural centrality measures of these network models?

In this section, we will examine some centrality-conforming Markov chains that can be derived from network data given by preference/incentive/cooperative/powerset models.

4.1.1 The Preference Model

For the preference model, there is a family of natural Markov chains based on weighted aggregations in social-choice theory [10]. For a fixed n, let \(\mathbf{w} \in (\mathbb{R}^{+} \cup \{ 0\})^{n}\) be a non-negative and monotonically non-increasing vector. For the discussion below, we assume that w is normalized so that \(\sum_{i=1}^{n}\mathbf{w}[i] = 1\). For example, while the famous Borda count [93] uses w = [n, n − 1, …, 1]^T, the normalized Borda count uses \(\mathbf{w} = [n,n - 1,\ldots,1]^{T}/{n+1\choose 2}\).

Proposition 4.1 (Weighted Preference Markov Chain)

Suppose \(A = (V,\Pi )\) is a preference network over V = [n] and w is a non-negative, monotonically non-increasing weight vector with \(\vert\vert\mathbf{w}\vert\vert_{1} = 1\). Let \(\mathbf{M}_{A,\mathbf{w}}\) be the matrix in which, for each u ∈ V, the uth row of \(\mathbf{M}_{A,\mathbf{w}}\) is:

$$\displaystyle{\boldsymbol{\pi }_{u} \circ \mathbf{w} = [\mathbf{w}[\boldsymbol{\pi }_{u}(1)],\ldots,\mathbf{w}[\boldsymbol{\pi }_{u}(n)]].}$$

Then, \(\mathbf{M}_{A,\mathbf{w}}\) defines a Markov chain, i.e., \(\mathbf{M}_{A,\mathbf{w}}\mathbf{1} = \mathbf{1}\).

Proof

\(\mathbf{M}_{A,\mathbf{w}}\) is a stochastic matrix because each of its rows is a permutation of the non-negative vector w, and permutations preserve the \(L_1\)-norm of a vector. □

Social-choice aggregation based on w also defines the following natural centrality measure, which can be viewed as the collective ranking over V based on the preference profiles of \(A = (V,\Pi )\):

$$\displaystyle\begin{array}{rcl} \mathrm{centrality}_{\Pi,\mathbf{w}}[v] =\sum _{u\in V }\mathbf{w}[\pi _{u}(v)]& &{}\end{array}$$
(25)

Like PageRank Markov chains, weighted preference Markov chains also enjoy the centrality-conforming property:

Proposition 4.2

For any preference network \(A = (V,\Pi )\) , in which \(\Pi \in L(V )^{\vert V \vert }\) :

$$\displaystyle\begin{array}{rcl} \mathbf{M}_{A,\mathbf{w}}^{T} \cdot \mathbf{1} =\mathrm{ centrality}_{ \Pi,\mathbf{w}}& &{}\end{array}$$
(26)
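Both Proposition 4.1 and Proposition 4.2 can be illustrated on a small profile. This sketch assumes that π_u(v) is the position of v in u's ranking (0-indexed in code), uses Borda-style weights [n, n − 1, …, 1] rescaled by their sum n(n + 1)/2, and a hypothetical 4-voter profile. Row sums give the Markov property; column sums give the aggregated centrality of Eq. (25).

```python
import numpy as np

def preference_markov_chain(rankings, w):
    """M[u, v] = w[position of v in u's ranking] (Proposition 4.1).

    rankings[u] lists every node from most to least preferred by u;
    w is assumed non-negative, non-increasing, and summing to 1.
    """
    n = len(rankings)
    M = np.zeros((n, n))
    for u, ranking in enumerate(rankings):
        for position, v in enumerate(ranking):
            M[u, v] = w[position]
    return M

def weighted_centrality(rankings, w):
    """Eq. (25): centrality[v] = sum over u of w[position of v in u's ranking]."""
    n = len(rankings)
    return np.array([sum(w[r.index(v)] for r in rankings) for v in range(n)])

n = 4
w = np.arange(n, 0, -1) / (n * (n + 1) / 2)     # [4, 3, 2, 1] / 10
rankings = [[0, 1, 2, 3],                       # hypothetical preference profile
            [1, 0, 3, 2],
            [1, 2, 0, 3],
            [0, 1, 3, 2]]
M = preference_markov_chain(rankings, w)
print(M.sum(axis=1))                                    # rows: all ones
print(M.sum(axis=0), weighted_centrality(rankings, w))  # Eq. (26)
```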

4.1.2 The Incentive Model

We now focus on a special family of incentive networks: we assume that, for \(U = (V,\boldsymbol{u})\) and every s ∈ V:

  1. 1.

    \(u_s\) is monotonically non-decreasing, i.e., for all \(T_1 \subseteq T_2\), \(u_s(T_1) \leq u_s(T_2)\).

  2. 2.

    \(u_s\) is normalized, i.e., \(u_s(V\setminus\{s\}) = 1\).

Each incentive network defines a natural cooperative network, \(H_{U} = (V,\boldsymbol{\tau }_{SocialUtility})\): for any S ⊆ V, let the social utility of S be:

$$\displaystyle\begin{array}{rcl} \boldsymbol{\tau }_{SocialUtility}(S) =\sum _{s\in S}u_{s}(S\setminus \{s\})& &{}\end{array}$$
(27)

The Shapley value [81]—a classical game-theoretical concept—provides a natural centrality measure for cooperative networks.

Definition 4.3 (Shapley Value)

Suppose \(\boldsymbol{\tau }\) is the characteristic function of a cooperative game over V = [n]. Recall that L(V) denotes the set of all permutations of V. Let \(S_{\boldsymbol{\pi },v}\) denote the set of players preceding v in a permutation \(\boldsymbol{\pi }\in L(V )\). Then, the Shapley value \(\boldsymbol{\phi }_{\boldsymbol{\tau }}^{Shapley}[v]\) of a player v ∈ V is:

$$\displaystyle{ \boldsymbol{\phi }_{\boldsymbol{\tau }}^{Shapley}[v] =\mathrm{ E}_{\boldsymbol{\pi } \sim L(V )}\left [\boldsymbol{\tau }[S_{\boldsymbol{\pi },v} \cup \{ v\}] -\boldsymbol{\tau } [S_{\boldsymbol{\pi },v}]\right ] }$$
(28)

The Shapley value \(\boldsymbol{\phi }_{\boldsymbol{\tau }}^{Shapley}[v]\) of player v ∈ V is the expected marginal contribution of v over the set preceding v in a random permutation of the players. The Shapley value has many attractive properties, and is widely considered to be the fairest measure of a player’s power index in a cooperative game.

We can use Shapley values to define both the stochastic matrix and the centrality of incentive networks U. Let \(\mathrm{centrality}_U\) be the Shapley value of the cooperative game defined by \(\boldsymbol{\tau }_{SocialUtility}\). Note that the incentive network U also defines |V| natural individual cooperative networks: for each s ∈ V and T ⊆ V, let:

$$\displaystyle{ \boldsymbol{\tau }_{s}(T) = \left \{\begin{array}{ll} u_{s}(T\setminus \{s\})&\mbox{ if }s \in T\\ 0 &\mbox{ if } s\not\in T \end{array} \right. }$$
(29)

Proposition 4.4 (The Markov Chain of Monotonic Incentive Model)

Suppose \(U = (V,\boldsymbol{u})\) is an incentive network over V = [n], such that \(\forall s \in V\), \(u_s\) is monotonically non-decreasing and \(u_s(V\setminus\{s\}) = 1\). Let \(\mathbf{M}_U\) be the matrix in which, for each s ∈ V, the sth row of \(\mathbf{M}_U\) is the Shapley value of the cooperative game with characteristic function \(\boldsymbol{\tau }_{s}\). Then, \(\mathbf{M}_U\) defines a Markov chain and is centrality-conforming with respect to \(\mathrm{centrality}_U\), i.e., (1) \(\mathbf{M}_U\mathbf{1} = \mathbf{1}\) and (2) \(\mathbf{M}_U^{T}\mathbf{1} = \mathrm{centrality}_U\).

Proof

This proposition is the direct consequence of two basic properties of Shapley’s beautiful characterization [81]:

  1. 1.

    The Shapley value is efficient: \(\sum _{v\in V }\phi _{\boldsymbol{\tau }}[v] =\boldsymbol{\tau } (V )\).

  2. 2.

    The Shapley value is linear: For any two characteristic functions \(\boldsymbol{\tau }\) and \(\boldsymbol{\omega }\), \(\phi _{\boldsymbol{\tau }+\boldsymbol{\omega }} =\phi _{\boldsymbol{\tau }} +\phi _{\boldsymbol{\omega }}\).

By the assumption that \(u_s\) is monotonically non-decreasing, we can show that every entry of the Shapley value (as given by Eq. (28)) is non-negative. Then, it follows from the efficiency of Shapley values and the assumption that \(\forall s \in V, u_{s}(V \setminus \{s\}) = 1\) that \(\mathbf{M}_U\) is a stochastic matrix, and hence it defines a Markov chain. Furthermore, we have:

$$\displaystyle{ \boldsymbol{\tau }_{SocialUtility} =\sum _{s\in V }\boldsymbol{\tau }_{s} }$$
(30)

Because \(\mathrm{centrality}_U\) is the Shapley value of the cooperative game with characteristic function \(\boldsymbol{\tau }_{SocialUtility}\), the linearity of the Shapley value then implies \(\mathbf{M}_U^{T}\mathbf{1} = \mathrm{centrality}_U\), i.e., \(\mathbf{M}_U\) is centrality-conforming with respect to \(\mathrm{centrality}_U\). □
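A small exact check of Proposition 4.4. The incentive network here is hypothetical: u_s(T) is the fraction of s's graph neighbors that lie in T, which is monotone and satisfies u_s(V − {s}) = 1 as the proposition requires. Shapley values are computed by brute force over all n! orderings with exact fractions, which is only feasible for tiny n.

```python
import itertools
from fractions import Fraction
import numpy as np

def shapley(tau, n):
    """Exact Shapley values (Eq. 28) by averaging over all n! orderings."""
    phi = [Fraction(0)] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        before = frozenset()
        for v in perm:
            phi[v] += tau(before | {v}) - tau(before)
            before = before | {v}
    return [p / len(perms) for p in phi]

# Hypothetical monotone, normalized utilities: u_s(T) = |T ∩ N(s)| / |N(s)|.
neighbors = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}
n = len(neighbors)

def u(s, T):
    return Fraction(len(T & neighbors[s]), len(neighbors[s]))

def tau_s(s):
    # Eq. (29): tau_s(T) = u_s(T - {s}) if s is in T, else 0.
    return lambda T: u(s, T - {s}) if s in T else Fraction(0)

def social_utility(T):
    # Eq. (27)
    return sum((u(s, T - {s}) for s in T), Fraction(0))

M_U = np.array([[float(x) for x in shapley(tau_s(s), n)] for s in range(n)])
centrality_U = np.array([float(x) for x in shapley(social_utility, n)])
print(M_U.sum(axis=1), M_U.sum(axis=0), centrality_U)
```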

4.1.3 The Influence Model

Centrality-conforming Markov chains can also be naturally constructed for a family of powerset networks. Recall from Sect. 2.3 that an influence process \(\mathcal{D}\) and social network G = (V, E) together define a powerset network, \(\boldsymbol{P}_{G,\mathcal{D}}: 2^{V } \times 2^{V } \rightarrow [0,1]\), where for each \(T \in 2^{V}\), \(\boldsymbol{P}_{G,\mathcal{D}}[S,T]\) specifies the probability that T is the final activated set when S cascades its influence through G. As observed in [25], the influence model also defines a natural cooperative game, whose characteristic function is the influence spread function:

$$\displaystyle{\boldsymbol{\sigma }_{G,\mathcal{D}}(S) =\sum _{T\subseteq V }\vert T\vert \cdot \boldsymbol{ P}_{G,\mathcal{D}}[S,T],\quad \forall S \subseteq V.}$$

Chen and Teng [25] proposed to use the Shapley value of this social-influence game as a centrality measure of the powerset network defined by \(\boldsymbol{P}_{G,\mathcal{D}}\). They showed that this social-influence centrality measure, to be denoted by \(\mathrm{centrality}_{G,\mathcal{D}}\), can be uniquely characterized by a set of five natural axioms [25]. Motivated by the PageRank Markov chain, they also constructed the following centrality-conforming Markov chain for social-influence models.

Proposition 4.5 (Social-Influence Markov Chain)

Suppose G = (V, E) is a social network and \(\mathcal{D}\) is a social-influence process. Let \(\mathbf{M}_{G,\mathcal{D}}\) be the matrix in which, for each v ∈ V, the vth row of \(\mathbf{M}_{G,\mathcal{D}}\) is given by the Shapley value of the cooperative game with the following characteristic function:

$$\displaystyle{ \boldsymbol{\sigma }_{G,\mathcal{D},v}(S) =\sum _{T\subseteq V }\boldsymbol{[}v \in T\boldsymbol{]} \cdot \boldsymbol{ P}_{G,\mathcal{D}}[S,T] }$$
(31)

where \(\boldsymbol{[}v \in T\boldsymbol{]}\) is the indicator function for the event (v ∈ T). Then, \(\mathbf{M}_{G,\mathcal{D}}\) defines a Markov chain and is centrality-conforming with respect to \(\mathrm{centrality}_{G,\mathcal{D}}\), i.e., (1) \(\mathbf{M}_{G,\mathcal{D}}\mathbf{1} = \mathbf{1}\) and (2) \(\mathbf{M}_{G,\mathcal{D}}^{T}\mathbf{1} =\mathrm{ centrality}_{G,\mathcal{D}}\).

Proof

For all v ∈ V, the characteristic function \(\boldsymbol{\sigma }_{G,\mathcal{D},v}\) satisfies the following two conditions:

  1. 1.

    \(\boldsymbol{\sigma }_{G,\mathcal{D},v}\) is monotonically non-decreasing.

  2. 2.

    \(\boldsymbol{\sigma }_{G,\mathcal{D},v}(V ) = 1\).

The rest of the proof is essentially the same as the proof of Proposition 4.4. □
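Proposition 4.5 leaves the influence process D abstract. For illustration only, the sketch below assumes the independent cascade model on a hypothetical 3-node network with made-up edge probabilities, computes P[S, T] exactly by enumerating all live-edge subgraphs, and checks both conforming conditions. The enumeration is exponential and meant only for tiny examples.

```python
import itertools
import numpy as np

# Hypothetical directed network with independent-cascade edge probabilities.
n = 3
edge_prob = {(0, 1): 0.5, (1, 2): 0.4, (0, 2): 0.3, (2, 0): 0.2}
edges = list(edge_prob)

def reachable(S, live):
    """Nodes reachable from seed set S using only live edges."""
    seen, stack = set(S), list(S)
    while stack:
        a = stack.pop()
        for (x, y) in live:
            if x == a and y not in seen:
                seen.add(y)
                stack.append(y)
    return frozenset(seen)

def final_set_distribution(S):
    """P[S, T] under independent cascade, by enumerating live-edge subgraphs."""
    dist = {}
    for flags in itertools.product([False, True], repeat=len(edges)):
        prob, live = 1.0, []
        for e, on in zip(edges, flags):
            prob *= edge_prob[e] if on else 1 - edge_prob[e]
            if on:
                live.append(e)
        T = reachable(S, live)
        dist[T] = dist.get(T, 0.0) + prob
    return dist

subsets = [frozenset(c) for r in range(n + 1) for c in itertools.combinations(range(n), r)]
P = {S: final_set_distribution(S) for S in subsets}

def sigma_v(v):
    # Eq. (31): probability that v ends up activated when S is seeded.
    return lambda S: sum(p for T, p in P[S].items() if v in T)

def sigma(S):
    # Influence spread: expected size of the final activated set.
    return sum(len(T) * p for T, p in P[S].items())

def shapley(f):
    phi = np.zeros(n)
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        before = frozenset()
        for v in perm:
            phi[v] += f(before | {v}) - f(before)
            before = before | {v}
    return phi / len(perms)

M = np.array([shapley(sigma_v(v)) for v in range(n)])
centrality = shapley(sigma)
print(M.sum(axis=1), M.sum(axis=0), centrality)
```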

4.2 Networks Associated with Markov Chains

The common feature of the Markovian formulations in Sect. 4.1 suggests the possibility of a general theory in which various network models beyond graphs can be succinctly analyzed through the worldview of Markov chains. Such analyses are forms of dimension reduction of network data: the Markov chains derived, for instance, from social-influence instances usually have lower dimensionality than the original network models. In dimension reduction of data, some information is inevitably lost. Thus, which Markov chain we formulate from a particular network model may depend largely on the mathematical lens through which we view the network data. The Markovian formulations of Sect. 4.1 are largely based on centrality formulations. Developing a more general theory of Markovian formulations for various network models remains the subject of future research.

But once we can reduce the network models specifying various aspects of network data to a collection of Markov chains representing the corresponding network facets, we effectively reduce multifaceted network analysis to a potentially simpler task—the analysis of multilayer networks [57, 60]. Thus, we can apply various emerging techniques for multilayer network analysis [47, 73, 94] and network composition [60]. We can further use standard techniques to convert the Markov chains into weighted graphs to examine these network models through the popular graph-theoretical worldview.

4.2.1 Random-Walk Connection

Because of the following characterization, the random walk is traditionally the most commonly used connection between Markov chains and weighted networks.

Proposition 4.6 (Markov Chains and Networks: Random-Walk Connection)

For any directed network G = (V, E, W) in which every node has at least one out-neighbor, there is a unique transition matrix:

$$\displaystyle{\mathbf{M}_{\mathbf{W}} = \left (\mathbf{D}_{\mathbf{W}}^{out}\right )^{-1}\mathbf{W}}$$

that captures the (unbiased) random-walk Markov process on G. Conversely, given a transition matrix M, there is an infinite family of weighted networks whose random-walk Markov chains are consistent with M. This family is given by:

$$\displaystyle{\{\boldsymbol{\varGamma }\mathbf{M}:\boldsymbol{\varGamma } \mathit{\mbox{ is a positive diagonal matrix}}\}.}$$
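A minimal sketch of Proposition 4.6, using exact rational arithmetic from Python's standard `fractions` module: it forms \(\mathbf{M}_{\mathbf{W}}\) by dividing each row of W by its out-degree, then checks that a positive diagonal rescaling \(\boldsymbol{\varGamma }\mathbf{M}\) has M as its own random-walk chain. The weights and the helper names `random_walk_matrix` and `diagonal_scale` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def random_walk_matrix(W):
    """Return M_W = (D_W^out)^{-1} W by dividing each row of W by its out-degree."""
    M = []
    for row in W:
        out_degree = sum(row)
        if out_degree == 0:
            raise ValueError("every node needs at least one out-neighbor")
        M.append([Fraction(w) / out_degree for w in row])
    return M


def diagonal_scale(gamma, M):
    """Return Gamma * M, where Gamma = diag(gamma) is a positive diagonal matrix."""
    return [[g * m for m in row] for g, row in zip(gamma, M)]


# Hypothetical directed weighted network on three nodes.
W = [[0, 2, 1],
     [1, 0, 1],
     [3, 0, 0]]
M = random_walk_matrix(W)
print(M[0])  # [Fraction(0, 1), Fraction(2, 3), Fraction(1, 3)]

# Any positive diagonal rescaling of M is a network whose random walk is M again.
gamma = [Fraction(1), Fraction(5), Fraction(1, 3)]
assert random_walk_matrix(diagonal_scale(gamma, M)) == M
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.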

The most commonly used diagonal scaling is \(\boldsymbol{\varPi }\), the diagonal matrix of the stationary distribution. This choice is partially justified by the fact that \(\boldsymbol{\varPi }\mathbf{M}\) is an undirected network if and only if M is a detailed-balanced Markov chain. In fact, in that case, \(\boldsymbol{\varGamma }\mathbf{M}\) is symmetric if and only if \(\boldsymbol{\varGamma }= c\cdot \boldsymbol{\varPi }\) for some c > 0. Let’s call \(\boldsymbol{\varPi }\mathbf{M}\) the canonical Markovian network of transition matrix M. For a general Markov chain, we have:

$$\displaystyle{ \mathbf{1}^{T}\boldsymbol{\varPi }\mathbf{M} =\boldsymbol{\pi } ^{T}\mbox{ and }\boldsymbol{\varPi }\mathbf{M}\mathbf{1} =\boldsymbol{\pi } }$$
(32)

Thus, although canonical Markovian networks are usually directed, each of their nodes always has equal weighted in-degree and out-degree. Such networks are known as weighted Eulerian graphs.
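The claims that \(\boldsymbol{\varPi }\mathbf{M}\) is always weighted Eulerian, and symmetric when M is detailed-balanced, can be checked on small chains. The sketch below assumes M is irreducible, so the stationary distribution is unique, and finds it by exact Gauss-Jordan elimination (an implementation choice, not part of the text). Both example chains and all function names are hypothetical; the second chain is the random walk on an undirected triangle with edge weights 1, 1, and 2.

```python
from fractions import Fraction

F = Fraction


def stationary_distribution(M):
    """Solve pi^T M = pi^T with sum(pi) = 1 exactly; assumes M is irreducible."""
    n = len(M)
    # Column j of pi^T (M - I) = 0 gives one equation; the n equations are
    # linearly dependent, so drop the last one and add sum(pi) = 1 instead.
    A = [[F(M[i][j]) - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([F(1)] * n)
    b = [F(0)] * (n - 1) + [F(1)]
    for col in range(n):  # Gauss-Jordan elimination over the rationals
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def canonical_markovian_network(M):
    """Return (Pi M, pi), where Pi = diag(pi) for the stationary distribution pi."""
    pi = stationary_distribution(M)
    return [[p * m for m in row] for p, row in zip(pi, M)], pi


def out_degrees(N):
    return [sum(row) for row in N]


def in_degrees(N):
    return [sum(col) for col in zip(*N)]


def is_symmetric(N):
    return all(N[i][j] == N[j][i] for i in range(len(N)) for j in range(len(N)))


# A chain that is not detailed-balanced: Pi M is directed but Eulerian.
M_dir = [[F(0), F(2, 3), F(1, 3)], [F(1, 2), F(0), F(1, 2)], [F(1), F(0), F(0)]]
N_dir, pi_dir = canonical_markovian_network(M_dir)
print(pi_dir)  # [Fraction(3, 7), Fraction(2, 7), Fraction(2, 7)]
assert out_degrees(N_dir) == in_degrees(N_dir) == pi_dir
assert not is_symmetric(N_dir)

# The random walk on an undirected graph is detailed-balanced: Pi M is symmetric.
M_rev = [[F(0), F(1, 2), F(1, 2)], [F(1, 3), F(0), F(2, 3)], [F(1, 3), F(2, 3), F(0)]]
N_rev, _ = canonical_markovian_network(M_rev)
assert is_symmetric(N_rev)
```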

4.2.2 PageRank Connection

Recall that Theorem 3.5 features the derivation of PageRank-conforming Markov chains from weighted networks. In fact, Theorem 3.5 and its PageRank power series can be naturally extended to any transition matrix M: for any finite irreducible and ergodic Markov chain M and restart constant α > 0, the matrix \(\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k} \cdot \mathbf{M}^{k}\) is a stochastic matrix that preserves the detailed-balancedness, the stationary distribution, and the spectra of M.
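For 0 < α ≤ 1, the power series sums to the closed form \(\alpha (\mathbf{I}-(1-\alpha )\mathbf{M})^{-1}\), since \((\mathbf{I}-(1-\alpha )\mathbf{M})\sum _{k\geq 0}(1-\alpha )^{k}\mathbf{M}^{k} = \mathbf{I}\). This makes some of the stated properties easy to test exactly on a small chain. The sketch below, with hypothetical helper names and an illustrative α = 1/5, checks non-negativity, row-stochasticity, and preservation of the stationary distribution; it does not test the detailed-balance or spectral claims.

```python
from fractions import Fraction

F = Fraction


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    aug = [[F(x) for x in row] + e for row, e in zip(A, identity(n))]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pagerank_series(M, alpha):
    """alpha * sum_k (1 - alpha)^k M^k, evaluated as alpha * (I - (1 - alpha) M)^{-1}."""
    n = len(M)
    I = identity(n)
    A = [[I[i][j] - (1 - alpha) * M[i][j] for j in range(n)] for i in range(n)]
    return [[alpha * x for x in row] for row in inverse(A)]


def left_multiply(v, A):
    """Row vector times matrix: v^T A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


M = [[F(0), F(2, 3), F(1, 3)], [F(1, 2), F(0), F(1, 2)], [F(1), F(0), F(0)]]
pi = [F(3, 7), F(2, 7), F(2, 7)]  # stationary distribution of M
assert left_multiply(pi, M) == pi

S = pagerank_series(M, F(1, 5))
assert all(x >= 0 for row in S for x in row)  # non-negative entries
assert all(sum(row) == 1 for row in S)       # rows sum to one
assert left_multiply(pi, S) == pi            # pi is still stationary
```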

Let’s call \(\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k} \cdot \boldsymbol{\varPi }\mathbf{M}^{k}\) the canonical PageRank-Markovian network of transition matrix M.

Proposition 4.7

For any Markov chain M, the random-walk Markov chain of the canonical PageRank-Markovian network \(\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k} \cdot \boldsymbol{\varPi }\mathbf{M}^{k}\) is conforming with respect to the PageRank of the canonical Markovian network \(\boldsymbol{\varPi }\mathbf{M}\).

4.2.3 Symmetrization

Algorithmically, computational and optimization problems on directed graphs are usually harder than those on undirected graphs. For example, many recent breakthroughs in scalable graph-algorithm design are limited to undirected graphs [9, 27, 53, 54, 59, 75, 83, 85–87]. To express Markov chains as undirected networks, we can apply the following well-known Markovian symmetrization. Recall that a matrix L is a Laplacian matrix if (1) L is a symmetric matrix with non-positive off-diagonal entries, and (2) L ⋅ 1 = 0.

Proposition 4.8 (Canonical Markovian Symmetrization)

For any irreducible and ergodic finite Markov chain M:

$$\displaystyle{ \boldsymbol{\varPi }-\frac{\boldsymbol{\varPi }\mathbf{M} + \mathbf{M}^{T}\boldsymbol{\varPi }} {2} }$$
(33)

is a Laplacian matrix, where \(\boldsymbol{\varPi }\) is the diagonal matrix associated with M’s stationary distribution. Therefore, \(\frac{\boldsymbol{\varPi }\mathbf{M}+\mathbf{M}^{T}\boldsymbol{\varPi }} {2}\) is a symmetric network whose degrees are normalized to the stationary distribution \(\boldsymbol{\pi }=\boldsymbol{\varPi } \cdot \mathbf{1}\). When M is detailed-balanced, \(\frac{\boldsymbol{\varPi }\mathbf{M}+\mathbf{M}^{T}\boldsymbol{\varPi }} {2}\) is the canonical Markovian network of M.

Proof

We include a proof here for completeness. Let \(\boldsymbol{\pi }\) be the stationary distribution of M. Then:

$$\displaystyle\begin{array}{rcl} \mathbf{M}^{T}\boldsymbol{\pi }& =& \boldsymbol{\pi } {}\\ \boldsymbol{\varPi }\cdot \mathbf{1}& =& \boldsymbol{\pi } {}\\ \mathbf{M} \cdot \mathbf{1}& =& \mathbf{1} {}\\ \end{array}$$

Therefore:

$$\displaystyle{ \left (\boldsymbol{\varPi }-\frac{\boldsymbol{\varPi }\mathbf{M} + \mathbf{M}^{T}\boldsymbol{\varPi }} {2} \right ) \cdot \mathbf{1} = \left (\boldsymbol{\pi }-\frac{\boldsymbol{\varPi }\mathbf{1} + \mathbf{M}^{T}\boldsymbol{\pi }} {2} \right ) = \mathbf{0} }$$
(34)

The proposition then follows from the fact that \(\frac{1} {2}(\boldsymbol{\varPi }\mathbf{M} + \mathbf{M}^{T}\boldsymbol{\varPi })\) is symmetric and non-negative. □
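Proposition 4.8 can be checked numerically for a chain that is not detailed-balanced. The sketch below (hypothetical function names, exact rationals, and the stationary distribution π = (3/7, 2/7, 2/7) of the example chain worked out by hand) builds the matrix in Eq. (33), tests the three Laplacian conditions recalled above, and confirms that the symmetric network \(\boldsymbol{\varPi }-\mathbf{L}\) has weighted degrees π.

```python
from fractions import Fraction

F = Fraction


def markovian_symmetrization(M, pi):
    """Return Pi - (Pi M + M^T Pi) / 2 for Pi = diag(pi), as in Eq. (33)."""
    n = len(M)
    return [[(pi[i] if i == j else 0) - (pi[i] * M[i][j] + M[j][i] * pi[j]) / 2
             for j in range(n)] for i in range(n)]


def is_laplacian(L):
    """Symmetric, non-positive off-diagonal entries, and L * 1 = 0."""
    n = len(L)
    symmetric = all(L[i][j] == L[j][i] for i in range(n) for j in range(n))
    off_diag_ok = all(L[i][j] <= 0 for i in range(n) for j in range(n) if i != j)
    return symmetric and off_diag_ok and all(sum(row) == 0 for row in L)


M = [[F(0), F(2, 3), F(1, 3)], [F(1, 2), F(0), F(1, 2)], [F(1), F(0), F(0)]]
pi = [F(3, 7), F(2, 7), F(2, 7)]  # stationary distribution of M
L = markovian_symmetrization(M, pi)
assert is_laplacian(L)

# The symmetric network (Pi M + M^T Pi) / 2 = Pi - L has weighted degrees pi.
N = [[(pi[i] if i == j else 0) - L[i][j] for j in range(3)] for i in range(3)]
assert [sum(row) for row in N] == pi
```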

Through the PageRank connection, Markov chains also have two extended Markovian symmetrizations:

Proposition 4.9 (PageRank Markovian Symmetrization)

For any irreducible and ergodic finite Markov chain M and restart constant α > 0, the two matrices below:

$$\displaystyle{ \boldsymbol{\varPi }-\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k} \cdot \frac{\boldsymbol{\varPi }\mathbf{M}^{k} + (\mathbf{M}^{T})^{k}\boldsymbol{\varPi }} {2} }$$
(35)
$$\displaystyle{ \boldsymbol{\varPi }-\alpha \sum _{k=0}^{\infty }(1-\alpha )^{k}\boldsymbol{\varPi } \cdot \left (\boldsymbol{\varPi }^{-1} \cdot \frac{\boldsymbol{\varPi }\mathbf{M} + \mathbf{M}^{T}\boldsymbol{\varPi }} {2} \right )^{k} }$$
(36)

are both Laplacian matrices. Moreover, the second Laplacian matrix is \(\frac{1} {\alpha }\)-spectrally similar to \((1-\alpha ) \cdot \left (\boldsymbol{\varPi }-\frac{\boldsymbol{\varPi }\mathbf{M}+\mathbf{M}^{T}\boldsymbol{\varPi }} {2} \right )\).
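Both matrices in Proposition 4.9 can be evaluated in closed form for 0 < α ≤ 1 by summing each geometric series as \(\alpha (\mathbf{I}-(1-\alpha )\mathbf{X})^{-1}\). The sketch below, reusing an illustrative three-state chain with π = (3/7, 2/7, 2/7) and hypothetical helper names, confirms that Eqs. (35) and (36) are Laplacian matrices; it does not test the spectral-similarity claim.

```python
from fractions import Fraction

F = Fraction


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    aug = [[F(x) for x in row] + e for row, e in zip(A, identity(n))]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pagerank_series(M, alpha):
    """alpha * sum_k (1 - alpha)^k M^k, evaluated as alpha * (I - (1 - alpha) M)^{-1}."""
    n = len(M)
    I = identity(n)
    A = [[I[i][j] - (1 - alpha) * M[i][j] for j in range(n)] for i in range(n)]
    return [[alpha * x for x in row] for row in inverse(A)]


def laplacian_of(pi, N):
    """Return Pi - N, where Pi = diag(pi)."""
    n = len(N)
    return [[(pi[i] if i == j else 0) - N[i][j] for j in range(n)] for i in range(n)]


def is_laplacian(L):
    n = len(L)
    symmetric = all(L[i][j] == L[j][i] for i in range(n) for j in range(n))
    off_diag_ok = all(L[i][j] <= 0 for i in range(n) for j in range(n) if i != j)
    return symmetric and off_diag_ok and all(sum(row) == 0 for row in L)


M = [[F(0), F(2, 3), F(1, 3)], [F(1, 2), F(0), F(1, 2)], [F(1), F(0), F(0)]]
pi = [F(3, 7), F(2, 7), F(2, 7)]  # stationary distribution of M
alpha, n = F(1, 5), len(M)

# Eq. (35): because (M^T)^k = (M^k)^T, the series equals (Pi R + R^T Pi) / 2.
R = pagerank_series(M, alpha)
L35 = laplacian_of(pi, [[(pi[i] * R[i][j] + R[j][i] * pi[j]) / 2 for j in range(n)]
                        for i in range(n)])

# Eq. (36): PageRank series of the detailed-balanced chain A = Pi^{-1} (Pi M + M^T Pi) / 2.
S = [[(pi[i] * M[i][j] + M[j][i] * pi[j]) / 2 for j in range(n)] for i in range(n)]
A = [[S[i][j] / pi[i] for j in range(n)] for i in range(n)]
RA = pagerank_series(A, alpha)
L36 = laplacian_of(pi, [[pi[i] * RA[i][j] for j in range(n)] for i in range(n)])

assert is_laplacian(L35) and is_laplacian(L36)
```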

4.2.4 Network Interpretations

We now return to Balcan et al.’s approach [11] for deriving preference networks from affinity networks. Consider the following natural extension of linear orders that expresses rankings with ties: an ordered partition of V is a total order of a partition of V. Let \(\overline{L(V )}\) denote the set of all ordered partitions of V. For \(\sigma \in \overline{L(V )}\) and i, j ∈ V, we say i is ranked strictly ahead of j if i and j belong to different parts, and the part containing i is ahead of the part containing j in σ. If i and j are members of the same part in σ, we say σ is indifferent between i and j.

Definition 4.10 (PageRank Preferences)

Suppose G = (V, E, W) is a weighted graph and α > 0 is a restart constant. For each u ∈ V, let \(\boldsymbol{\pi }_{u}\) be the ordered partition given by the descending ranking of V according to the personalized PageRank vector \(\mathbf{p}_{u} = \mathbf{PPR}_{\mathbf{W},\alpha }[u,:]\). We call \(\Pi _{\mathbf{W},\alpha } =\{\boldsymbol{\pi } _{u}\}_{u\in V }\) the PageRank preference profile of V with respect to G, and \(A_{\mathbf{W},\alpha } = (V,\Pi _{\mathbf{W},\alpha })\) the PageRank preference network of G.
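A minimal sketch of Definition 4.10, assuming that \(\mathbf{PPR}_{\mathbf{W},\alpha }\) is the personalized PageRank matrix \(\alpha \sum _{k\geq 0}(1-\alpha )^{k}(\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W})^{k} = \alpha (\mathbf{I}-(1-\alpha )\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W})^{-1}\), whose row u is \(\mathbf{p}_{u}\). Exact rational arithmetic matters here: ties in \(\mathbf{p}_{u}\) become parts of the ordered partition, and floating-point rounding could split them. The path graph and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import groupby

F = Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ppr_matrix(W, alpha):
    """Assumed definition: PPR = alpha * (I - (1 - alpha) D_W^{-1} W)^{-1}."""
    n = len(W)
    deg = [sum(row) for row in W]
    A = [[int(i == j) - (1 - alpha) * F(W[i][j], deg[i]) for j in range(n)]
         for i in range(n)]
    return [[alpha * x for x in row] for row in inverse(A)]


def ordered_partition(scores):
    """Group nodes into tiers of equal score, highest score first (ties kept)."""
    order = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
    return [set(tier) for _, tier in groupby(order, key=lambda v: scores[v])]


def pagerank_preferences(W, alpha):
    """Ordered partition pi_u for every node u, from row u of PPR."""
    P = ppr_matrix(W, alpha)
    return [ordered_partition(P[u]) for u in range(len(W))]


# Hypothetical undirected path 0 - 1 - 2.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
prefs = pagerank_preferences(W, F(1, 2))
print(prefs[1])  # [{1}, {0, 2}]: node 1 is indifferent between 0 and 2
```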

As pointed out in [11], other methods for deriving preference networks from weighted networks exist. For example, one can obtain individual preference rankings by ordering nodes according to shortest path distances, effective resistances, or maximum-flow/minimum-cut values.

Is the PageRank preference a desirable personalized-preference profile of an affinity network?

This is a basic question in network analysis, and much work has already been done on it. I refer readers to the beautiful axiomatic approach of Altman and Tennenholtz for characterizing personalized ranking systems [6]. Although they mostly studied unweighted networks, many of their results can be extended to weighted networks. Below, I will use Theorem 3.5 to address the following question, which I was asked when I first gave a talk about PageRank preferences.

By taking the ranking information from PageRank matrices, which are usually asymmetric, one may lose valuable network information. For example, when G = (V, E, W) is an undirected network, isn't it desirable to define ranking information according to a symmetric matrix?

At the time, I was not prepared to answer this question and replied that it was an excellent point. Theorem 3.5 now provides an answer. Markov chain theory uses an elegant concept, detailed balance, to characterize whether or not a Markov chain M has an undirected network realization. Although Markov-chain transition matrices are usually asymmetric, if a Markov chain is detailed-balanced, then its transition matrix M can be diagonally scaled into a symmetric matrix by its stationary distribution. Moreover, \(\boldsymbol{\varPi }\mathbf{M}\) is the “unique” underlying undirected network associated with M. By Theorem 3.5, \(\mathbf{PPR}_{\mathbf{W},\alpha }\) is a Markov transition matrix with stationary distribution given by \(\mathbf{D}_{\mathbf{W}}\), and thus \(\overline{\mathbf{W}}_{\alpha } = \mathbf{D}_{\mathbf{W}} \cdot \mathbf{PPR}_{\mathbf{W},\alpha }\) is symmetric if and only if W is symmetric. Therefore, because the ranking given by \(\mathbf{p}_{u}\) is the same as the ranking given by \(\overline{\mathbf{W} }[u,:]\), the PageRank preference profile is indeed derived from a symmetric matrix when W is symmetric.
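Under the assumed power-series form \(\mathbf{PPR}_{\mathbf{W},\alpha } = \alpha (\mathbf{I}-(1-\alpha )\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W})^{-1}\), the symmetry claim has a one-line derivation: \(\mathbf{D}_{\mathbf{W}}\cdot \mathbf{PPR}_{\mathbf{W},\alpha } = \alpha \,\mathbf{D}_{\mathbf{W}}\left (\mathbf{D}_{\mathbf{W}}-(1-\alpha )\mathbf{W}\right )^{-1}\mathbf{D}_{\mathbf{W}}\), which for α < 1 is symmetric exactly when W is. The sketch below checks this on two hypothetical weight matrices, taking \(\mathbf{D}_{\mathbf{W}}\) to be the diagonal matrix of out-degrees (an assumption for the directed case), and also checks that the degree vector is stationary for \(\mathbf{PPR}_{\mathbf{W},\alpha }\). Function names are illustrative.

```python
from fractions import Fraction

F = Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ppr_matrix(W, alpha):
    """Assumed definition: PPR = alpha * (I - (1 - alpha) D_W^{-1} W)^{-1}."""
    n = len(W)
    deg = [sum(row) for row in W]
    A = [[int(i == j) - (1 - alpha) * F(W[i][j], deg[i]) for j in range(n)]
         for i in range(n)]
    return [[alpha * x for x in row] for row in inverse(A)]


def degree_scaled(W, P):
    """W-bar = D_W * PPR, scaling row u of PPR by the (out-)degree of u."""
    deg = [sum(row) for row in W]
    return [[deg[u] * x for x in P[u]] for u in range(len(W))]


def is_symmetric(A):
    return all(A[i][j] == A[j][i] for i in range(len(A)) for j in range(len(A)))


alpha = F(1, 3)
W_sym = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]  # hypothetical undirected weights
W_dir = [[0, 2, 1], [1, 0, 1], [3, 0, 0]]  # hypothetical directed weights

P = ppr_matrix(W_sym, alpha)
assert is_symmetric(degree_scaled(W_sym, P))
assert not is_symmetric(degree_scaled(W_dir, ppr_matrix(W_dir, alpha)))

# The degree vector is stationary for PPR, consistent with Theorem 3.5.
deg = [sum(row) for row in W_sym]
assert [sum(deg[u] * P[u][v] for u in range(3)) for v in range(3)] == deg
```

Because each row of \(\overline{\mathbf{W}}\) is a positive multiple of the corresponding row of \(\mathbf{PPR}_{\mathbf{W},\alpha }\), both induce the same ranking of V.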

We can also define clusterability and other network models based on personalized PageRank matrices. For example:

  • PageRank conductance:

    $$\displaystyle{ \text{PageRank-conductance}_{\mathbf{W}}(S):= \frac{\sum _{u\in S,v\not\in S}\overline{\mathbf{W} }[u,v]} {\min \left (\sum _{u\in S,v\in V }\overline{\mathbf{W} }[u,v],\sum _{u\not\in S,v\in V }\overline{\mathbf{W} }[u,v]\right )} }$$
    (37)
  • PageRank utility:

    $$\displaystyle{ \text{PageRank-utility}_{\mathbf{W}}(S):=\sum _{u\in S,v\in S}\mathbf{PPR}_{\mathbf{W},\alpha }[u,v] }$$
    (38)
  • PageRank clusterability:

    $$\displaystyle{ \text{PageRank-clusterability}_{\mathbf{W}}(S):= \frac{\text{PageRank-utility}_{\mathbf{W}}(S)} {\vert S\vert } }$$
    (39)

Each of these functions defines a cooperative network based on G = (V, E, W). These formulations are connected with the PageRank of G. For example, the Shapley value of the cooperative network given by \(\boldsymbol{\tau }= \text{PageRank-utility}_{\mathbf{W}}\) is the PageRank of G.
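Eqs. (37) to (39) translate directly into set functions. The sketch below assumes the same power-series form of \(\mathbf{PPR}_{\mathbf{W},\alpha }\) used earlier and \(\overline{\mathbf{W}} = \mathbf{D}_{\mathbf{W}}\cdot \mathbf{PPR}_{\mathbf{W},\alpha }\); the two-triangle graph, α = 1/2, and all function names are illustrative. Because each row of \(\mathbf{PPR}_{\mathbf{W},\alpha }\) sums to one, the utility of V equals |V| and its clusterability equals 1.

```python
from fractions import Fraction

F = Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ppr_matrix(W, alpha):
    """Assumed definition: PPR = alpha * (I - (1 - alpha) D_W^{-1} W)^{-1}."""
    n = len(W)
    deg = [sum(row) for row in W]
    A = [[int(i == j) - (1 - alpha) * F(W[i][j], deg[i]) for j in range(n)]
         for i in range(n)]
    return [[alpha * x for x in row] for row in inverse(A)]


def pagerank_conductance(Wbar, S):
    """Eq. (37): W-bar weight leaving S over the smaller side's W-bar volume."""
    V = range(len(Wbar))
    cut = sum(Wbar[u][v] for u in S for v in V if v not in S)
    vol_in = sum(Wbar[u][v] for u in S for v in V)
    vol_out = sum(Wbar[u][v] for u in V if u not in S for v in V)
    return cut / min(vol_in, vol_out)


def pagerank_utility(P, S):
    """Eq. (38): total personalized PageRank that members of S place on S."""
    return sum(P[u][v] for u in S for v in S)


def pagerank_clusterability(P, S):
    """Eq. (39): PageRank utility per member of S."""
    return pagerank_utility(P, S) / len(S)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
n = 6
W = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    W[u][v] = W[v][u] = 1

P = ppr_matrix(W, F(1, 2))
deg = [sum(row) for row in W]
Wbar = [[deg[u] * P[u][v] for v in range(n)] for u in range(n)]

V = set(range(n))
assert pagerank_utility(P, V) == n and pagerank_clusterability(P, V) == 1
# The planted triangle leaks less PageRank than a set straddling both triangles.
assert pagerank_conductance(Wbar, {0, 1, 2}) < pagerank_conductance(Wbar, {0, 3})
```

Since this W is symmetric, \(\overline{\mathbf{W}}\) is symmetric too, so a set and its complement have the same PageRank conductance.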

\(\mathbf{PPR}_{\mathbf{W},\alpha }\) can also be used to define incentive and powerset network models. The former can be defined by \(u_{s}(T) = \sum _{v\in T}\mathbf{PPR}_{\mathbf{W},\alpha }[s,v]\), for s ∈ V and T ⊆ V with s ∉ T. The latter can be defined by \(\boldsymbol{\theta }_{\mathbf{W}}(S,T) = \frac{\sum _{u\in S,v\in T}\mathbf{PPR}_{\mathbf{W},\alpha }[u,v]} {\vert S\vert }\) for S, T ⊆ V. \(\boldsymbol{\theta }_{\mathbf{W}}(S,T)\) measures the rate of PageRank contribution from S to T.
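The incentive and powerset models above are plain sums over entries of the PPR matrix. In the sketch below, the matrix is \(\mathbf{PPR}_{\mathbf{W},\alpha }\) for the undirected path 0 - 1 - 2 with α = 1/2, worked out by hand under the power-series form \(\alpha \sum _{k\geq 0}(1-\alpha )^{k}(\mathbf{D}_{\mathbf{W}}^{-1}\mathbf{W})^{k}\) (an assumption); the data and function names are illustrative only. Since each row of the PPR matrix sums to one, \(\boldsymbol{\theta }_{\mathbf{W}}(S,V) = 1\) for every nonempty S.

```python
from fractions import Fraction

F = Fraction

# PPR_{W,alpha} for the undirected path 0 - 1 - 2 with alpha = 1/2,
# precomputed exactly from the power series (illustrative data).
PPR = [[F(7, 12), F(1, 3), F(1, 12)],
       [F(1, 6), F(2, 3), F(1, 6)],
       [F(1, 12), F(1, 3), F(7, 12)]]


def incentive_utility(P, s, T):
    """u_s(T): total personalized PageRank that s places on the members of T."""
    return sum(P[s][v] for v in T)


def powerset_rate(P, S, T):
    """theta(S, T): PageRank contribution from S to T, averaged over |S|."""
    return sum(P[u][v] for u in S for v in T) / len(S)


print(incentive_utility(PPR, 0, {1, 2}))  # 5/12
print(powerset_rate(PPR, {0, 1}, {2}))    # 1/8
```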

4.3 Multifaceted Approaches to Network Analysis: Some Basic Questions

We will now conclude this section with a few basic questions, aiming to study how structural concepts in one network model can inspire structural concepts in other network models. A broad view of network data will enable us to comprehensively examine different facets of network data, as each network model brings out different aspects of network data. For example, the metric model is based on geometry, the preference model is inspired by social-choice theory [10], the incentive and cooperative models are based on game-theoretical and economic principles [69, 70, 82], the powerset model is motivated by social influences [32, 55, 78], and the graphon [18] is based on graph limits and statistical modeling. We hope that addressing the questions below will help us gain a comprehensive and comparative understanding of these models and of the network structures and aspects that these models may reveal. We believe that multifaceted and multimodal approaches to network analysis will become increasingly essential for studying major subjects in network science.

  • How should we formulate personalized centrality measures with respect to other commonly used network centrality measures [1, 13–17, 33, 36, 37, 41, 42, 51, 66, 71, 76, 80]? Can they be used to define meaningful centrality-conforming Markov chains?

  • How should we define centrality measures and personalized ranking systems for general incentive or powerset networks? How should we define a personalized Shapley value for cooperative games? How should we define weighted networks from cooperative/incentive/powerset models?

  • What are natural Markov chains associated with the probabilistic graphical models [58]? How should we define centrality and clusterability for this important class of network models that are central to statistical machine learning?

  • What constitutes a community in a probabilistic graphical model? What constitutes a community in a cooperative, incentive, preference, and powerset network? How should we capture network similarity in these models? How should we integrate them if they represent different facets of network data?

  • How should we evaluate different clusterability measures and their usefulness for community identification or clustering? For example, PageRank conductance and PageRank clusterability are two different subset functions, but the latter applies to directed networks. How should we define clusterability-conforming centrality or centrality-conforming clusterability?

  • What are the limitations of the Markovian worldview of various network models? What other unified worldview models are there for multifaceted network data?

  • What is the fundamental difference between “directed” and “undirected” networks in various models?

  • How should we model networks with non-homogeneous node and edge types?

More broadly, the objective is to build a systematic algorithmic framework for understanding multifaceted network data, particularly given that many natural network models are highly theoretical, in that their complete-information profiles have dimensionality exponential in | V |. In practice, they must be succinctly defined. The algorithmic network framework consists of the complex and challenging tasks of integrating sparse and succinctly represented multifaceted network data \(N = (V, F_{1}, \ldots, F_{k})\) into an effective worldview (V, W), based on which one can build succinctly represented underlying models for network facets, analyze the interplay between network facets, and identify network solutions that are consistent with the comprehensive network data and models. What is a general model for specifying multifaceted network data? How should we formulate the problem of network composition for multifaceted network data?

5 To Jirka

The sparsity, richness, and ubiquity of multifaceted network data make them wonderful subjects for mathematical and algorithmic studies. Network science has truly become a “universal discipline,” with its multidisciplinary roots and interdisciplinary presence. However, understanding network data is a fundamental and conceptually challenging task, given the vast range of network phenomena.

The holy grail of network science is to understand the network essence that underlies the observed sparse-and-multifaceted network data.

We need an analog of the concept of range space, which provides a unified worldview of a family of diverse problems that are fundamental in statistical machine learning, geometric approximation, and data analysis. I wish that I had had a chance to discuss with you the mathematics of networks, beyond just the geometry of graphs, and to learn from your brilliant insights into the essence of networks. You and your mathematical depth and clarity will be greatly missed, Jirka.