1 Introduction

Over 40 years ago, social scientist Allen Barton stated that “If our aim is to understand people’s behavior rather than simply to record it, we want to know about primary groups, neighborhoods, organizations, social circles, and communities; about interaction, communication, role expectations, and social control.” (Barton 1968 as reported in Freeman 2004). This sentiment is fundamental to the concept of modularity. The importance of structural relationships in defining communities and predicting future behaviors has long been recognized, and is not restricted to the social sciences (Freeman 2004).

Social network analysis (SNA) has a rich history that is based on the defining principle that links between actors are informative. The advent and availability of Internet technology has created an explosion in online social networks and a transformation in SNA. The analysis of today’s social networks is a difficult Big Data problem, which requires the integration of statistics and computer science to leverage networks for knowledge mining and discovery (Manyika et al. 2011). Historically, scientists have had to rely on tractable records of social interactions and experiments (e.g., Milgram’s small world experiment); now they have a luxury of accessing huge digital databases of relational social data. SNA relies on diverse data representations and relational information, which may include (among others), tracked relationships among actors, events, and other covariate information (Scott and Carrington 2011). Modeling social networks is especially challenging due to the heterogeneity of the populations represented, and the broad spectrum of information represented in the data itself. Modern applications of SNA include, among others, the estimation of influence, privacy protection, trust (and distrust) microblogging, and web browsing.

In this review, we focus on probabilistic graphical models (PGMs), which have demonstrated promise in modeling social networks (Lauritzen 1996; Koller and Friedman 2009). PGMs represent a marriage between graph theory and probability theory that offers flexible modeling paradigms with good interpretably. The graphical representation consists of nodes connected by edges, which may be directed (Bayesian networks) or undirected (Markov networks). The relationship between nodes in a graph can be interpreted in terms of conditional independencies. These independencies can be read directly from the graph and enable a tractable decomposition of the joint distribution possible through the use of conditional probabilities. In this setting, the compact representation of a high-dimensional joint distribution of random variables \(\{X_1,X_2,\ldots ,X_p\}\) can be represented explicitly in a factorized form that has a graphical interpretation rooted in conditional independencies.

A powerful feature of the PGM modeling paradigms is the ability to perform probabilistic queries and reasoning on the graph at multiple levels (Koller and Friedman 2009). Queries of interest may include the estimation of probabilities (joint, conditional, or marginal), reasoning about variables in light of new evidence (causal, evidential, and inter-causal reasoning), and quantitative predictions through the use of the graph as a generator in simulations. Another attractive feature of PGMs is their inherent flexibility to model variables that follow different distributions, and the ability to bring in a priori information in to the learning process.

In this review, we outline the basic theory of PGMs, along with the parameter and structural learning. The topic of PGMs is extremely rich in content and theory. Several existing surveys on the topic of graphical models that are similar in spirit include (e.g., Goldenberg et al. 2010; Daud et al. 2010; Salter-Townshend et al. 2012; Srihari 2014). Our review differs from existing reviews in both style and content. One distinguishing feature is that we illustrate the different modeling paradigms using accessible and simple models. Simple examples facilitate a connection between theory and practice. Once this connection is established, we highlight more complex recent applications in SNA that differ from each other in both the nature of the data and objectives of the modeling. These applications reveal the inherent flexibility of PGMs to model a broad spectrum of data that target relevant open challenges and questions in SNA. We address both directed PGMs, known as Bayesian networks (BNs), and undirected PGMs, known as Markov networks, in Sects. 2 and 3, respectively. In Sect. 4, some of the current challenges are highlighted, comparisons between directed and undirected paradigms are made, and future directions and opportunities for PGM-based research in SNA are also highlighted.

2 Directed probabilistic graphical models

Bayesian networks (BNs) are a special class of PGMs that capture directed dependencies between variables, which may represent cause and effect relationships. The edges in a BN form a directed acyclic graphs (DAGs). The DAG architecture conveys a critical modeling assumption that there is no feedback via cycles in the graph. BNs obey the Markov assumption which states that each variable, \(X_i\), is independent of its non-descendants (unconnected nodes), given its parents in G. Taken together, these assumptions enable the compact representation of the high-dimensional joint probability distribution of the variables in the model. Despite their flexibility, the use of directed graphs in SNA has been somewhat limited, although the applications that we highlight are diverse. We describe the basic principles of these directed PGMs and motivate them with applications in the literature, which showcase their utility in SNA.

Static Bayesian Networks Our major focus is static BNs, which utilize data from a single snapshot of a social community at a given time point. A DAG conveys precise information regarding the conditional independencies between modeled variables (nodes). For a set of random variables \(\{X_1,X_2,\ldots , X_n\}\) is a network with the structure that encodes conditional independence relationships:

$$\begin{aligned} P(X_1,X_2,\ldots ,X_n)=P(G)\prod _{i=1}^{n} P(X_i \mid {\hbox {pa}}(X_i),\varTheta _i), \end{aligned}$$
(1)

where P(G) is the prior distribution over the graph G, \({\hbox {pa}}(X_i)\) are the parent nodes of child \(X_i\), and \(\varTheta _i\) denotes the parameters of the local probability distribution. The prior comes into play only when an expert cannot describe the graph and structural learning is required. Structures and relationships that are more likely (and less likely) can be embedded into P(G) to influence searches through the posterior model space.

A simple and fully parameterized BN for a course at a University is shown in Fig. 1. This network can be viewed as a template model in which different sections of a course are taught by different professors \((\hbox {Pr} \in \{p0, p1\}\)). Similar templates can be used for various courses, e.g., Calculus, Introduction to Chemistry, etc. In this example, different teaching assistants (TA) vary in their teaching effectiveness (\(\hbox {TA}_q\in \{{\text {Poor}}, {\text {Fair}}, {\text {Good}}\}\)), and their own grade in the course influences their overall ability to convey the material in the class in well (\(\hbox {TA}_g \in \{A, B, C\}\)). A student’s grade (\({\hbox {Grade}} \in \{A, B, C\}\)) is caused by their intelligence \(\{i0,i1\}\), the professor of the course, and the grade of the TA. Finally, if a professor is asked to write a letter of recommendation for a particular student, for simplicity, this may be based on the performance of the student in the course and the professor (e.g., some are more prone to write positive letters). This scenario, although overly simplistic, may hold in large classroom settings where the teacher does not get to know the individual students well.

Fig. 1
figure 1

a Simple example of a parameterized Bayesian Network of a University multi-section course. In this scenario, a student may enroll in a course taught by different Professors. A student’s Grade is influenced by the student’s Intelligence, the Professor, and the effectiveness of their TA as measured by the TA grade in the course. The TA Quality is a direct reflection of the TA’s mastery of the material (their grade). The quality of a student’s Letter of recommendation is dependent on the grade in the course, and also the professor. b Time slices from a dynamic Bayesian network of two students that interact in a study group. The true network is given to the left, which includes both feedback and a cycle (prohibited in static BNs). Relationships can unrolled in a DBN across discrete time points (right)

The Markov assumption is apparent through the conditional probability tables (CPDs) for each node, which depend only on the parents. The head nodes (top layer) have no parents, thus the CPD table is simply a marginal probability that will sum to one across the different states of the discrete variable. On the other hand, nodes with parents are conditional on the possible combinations of the parent states. It is evident that even for a small number of parents, and a small number of states for those parents, the CPD tables can grow quickly. In our example, \(P(G \mid I, \hbox {Pr}, \hbox {TA}_g)\) has 12 possible states (or scenarios) that can occur that would influence grade-level.

Our toy example is an expert system that was written down according to knowledge about the modeling domain. Importantly, there are many scenarios that may be more or less realistic, which may include changing some edges or the addition of new variables. Nonetheless, it is often the case that a network structure can be accurately described by an expert. When a structure is prescribed for a BN, parameter learning is still required. This comes in the form of CPD tables for discrete distributions (Fig. 1). In our simple example, the probabilities for the CPD table may have been extracted from teaching evaluations, grades, or other means.

As demonstrated in our simple example, each child node is dependent on its parent nodes. The parameter learning can be viewed as a local model or distribution that involves only the child and the parents. These local models are the building blocks of the graphical model and they make up the factors in the product for the joint distribution (Eq. 1). When the variables in the model are continuous, the specification of a local model requires distribution parameters. For example, if \(\{X_1, X_2, X_3\}\) are Gaussian, and \(X_1 \longrightarrow X_2 \longleftarrow X_3\), then a local model would be of the form \(X_3 \sim N(\beta _0+\beta _1\cdot X_1+\beta _2\cdot X_3,\sigma ^2)\). Therefore, in the continuous case, the local model can be viewed as a regression on the parents. Another popular local model is a conditional Gaussian Bayesian network (CG-BN), which gives rise to regressions in which a continuous child node may have parents that are both discrete or continuous (Lauritzen 1996). To enable factorization, CG-BNs prohibit the discrete children from having continuous parents.

Structural learning is required when the network is not known and has to be learned from the data. The objective function for maximization is the posterior probability of a graph, G, given by:

$$\begin{aligned} P(G \mid X)\propto P(X \mid G) P(G), \end{aligned}$$

where P(G) is the prior on the graph. The marginal likelihood, \(P(X\mid G)\), requires complex integration over the parameters \(\varTheta \):

$$\begin{aligned} P(X \mid G)\propto \int P(X\mid \theta , G) P(\theta \mid G)~{\mathrm{d}}\theta , \end{aligned}$$

which can be alleviated with the use of conjugate priors. In an effort to accelerate the learning process, and prevent from over-fitting, a fan-in assumption is typically adopted. This limits the number of parents that a node can have (e.g., a node can have no more than three parents). The graph prior P(G) can be explicitly used to encourage certain relationships, and penalize against others (Mukherjee and Speed 2008; Hageman et al. 2011). Computation relies on the fact that each node in the network, together with the corresponding parents, represents a local model, which can be described by a regression. These individual regressions have priors on their parameters, \(P(X \mid G)\), for example a normal-Wishart prior can be used for nodes that follow a normal distribution. In practice, the posterior in a graph is calculated as a product of the local models, which is valid representation under the Markov assumption. Possible local models are often pre-computed in an effort to ease the computational demand of the learning algorithms.

The structural learning problem concerns identifying a global network that assembles these local models in a optimal way. The process is a major challenge (NP-hard), as the number of possible networks is super-exponential with the number of nodes (Chickering et al. 2001). Structural learning methods rely on sampling-based approaches or a greedy optimization, e.g., hill climbing or simulated annealing (Heckerman 2008). Sampling-based approaches rely on Markov Chain Monte Carlo (MCMC) techniques that sample the posterior distribution by moving through model space according to a proposal distribution. The proposal represents the modification to the current graph, \(G_{\mathrm{{curr}}}\), which is then evaluated and potentially accepted, \(G_{\mathrm{{new}}}\), (kept in the sample) or rejected (not kept in the sample, another proposal is attempted) (Madigan et al. 1995). A widely used proposal for a new graph in the Markov chain is to either add, delete, or reverse a single edge (Fig. 2). This proposal is implemented within a Metropolis–Hastings framework. The acceptance criterion for a new graph is determined by:

$$\begin{aligned} \min \left\{ 1, \underbrace{\frac{P(G_{\mathrm{{new}}} \mid X)}{ P(G_{\mathrm{{curr}}} \mid X)}\cdot }_{\text {Bayes Factor}} \underbrace{\frac{Q(G_{\mathrm{{curr}}} \mid G_{\mathrm{{new}}})}{Q(G_{\mathrm{{new}}} \mid G_{\mathrm{{curr}}})} }_{\text {Hastings Ratio}} \right\} . \end{aligned}$$
(2)

The Bayes factor gives a measure of goodness of fit of the proposed graph relative to the current. The Hastings ratio is simply the neighborhood size of possible moves from the current and new graph, equivalently,

\(\hbox{Neighborhood}(G_{\mathrm{{curr}}})/\hbox{Neighborhood}(G_{\mathrm{{new}}})\) (\({=}5/6\) in Fig. 2).

Fig. 2
figure 2

A widely used proposal in an MCMC sampling for structural learning in a BN is the addition, reversal, or deletion of an edge from the current graph, \(G_{\mathrm{{curr}}}\), to form a new graph, \(G_{\mathrm{{new}}}\). The proposal move is selected at random from models in the neighborhood with a probability. The Metropolis–Hastings acceptance criteria are a function of the neighborhood size for the current and propose graphs and the overall fit of those graphs to the data as measured by the Bayes factor

The directionality and causal structure of the inferred model make BN an attractive modeling paradigm for social networks that capture cause and effect relationships. Screen-based bayes net structure (SBNS) was developed as a search strategy for large-scale data, which relies on the adopted assumption of sparsity in the overall network structure (Goldenberg and Moore 2004). SBSN enforces the sparsity through a two stage process, which frames the structural learning problem as market basket analysis task. The algorithm relies on the theory of frequent sets and support, to first screen for local modules of nodes, and then connect them through a global structure search. The market basket framework lends itself to transaction style data, which is by nature large, sparse and binary. The learning problem is to identify an influence graph based on derived features of the binary transaction data. In this case, actors are assumed to be linked to each other indirectly through items or events. A simple example of individuals linked through a conference is shown in Fig. 3a. In this example, the conference attendance (transaction items) can be used to infer a network of social influence between individuals, which adds insights into the social hierarchy that are not apparent in classical interaction networks. The method was shown to be effective for modeling a variety of SNs, including citation networks, collaboration data, and movie appearance records (Berry and Linoff 1997).

Koelle et al. proposed applications of BNs to SNA for the prediction of novel links and pre-specified node features (e.g., leadership potential) (Koelle et al. 2006). The authors emphasize the advantage of BN to account for uncertainty, noise, and incompleteness in the network. For example, a topology-based network measures such as degree centrality, which is often used as a surrogate for importance, are subject to summarizations over incomplete and sometimes erroneous data. Comparatively, a BN affords more flexibility that enables measures such as importance to be estimated in a more data-dependent manner. Koelle et al. provide an example of combining topology-based network measures with covariate information (Fig. 3b). Directed inference of this type leverages small local models, which can be naturally translated to regression or classification problems, depending on the child node (response variable). In this setting, the local BN can be evaluated at the node-level, ranked probability estimates can be used for predictive purposes, and the output serves as a surrogate for model fit on a given structure.

Fig. 3
figure 3

Simplified schematics of select examples of Bayesian networks in social networks. a Inferring influence based on transaction style data that links actors to events. b DAGs can utilize network features such as attributes and centrality measures on the network itself to predict derived metrics, e.g., individual importance or leadership potential. c Twitter is a microblogging community, which can be queried using a retrieval model that is based on a Bayesian network. d An application of DBNs for click modeling in a browser. The temporal dimension is a click sequence that connects the examination of webpages with attraction and satisfaction

Privacy protection is a major concern amongst users in online social networks. Generally, people prefer that their personal information is shared in small circles of friends and family, and shielded from strangers. Despite this common desire, relatively simple BNs have been shown to be successful in the invasion of privacy though the inference of personal attributes, which have been shielded through privacy settings (He et al. 2006). The BNs operate under the often accurate assumption that friends in social circles are likely to share common attributes. In 2006, the recommendation by He et al. to improve privacy was to hide friend lists through privacy settings, and to request that friends hide their personal attributes. Practically speaking, setting the optimal privacy settings is complex, and can be a tedious and difficult for an average user (Lipford et al. 2008). In 2010, a privacy wizard template was proposed, which automates a persons privacy settings based on an implicit set of rules derived using Naive Bayes (the simplest BN) or decision tree methods (Fang and LeFevre 2010).

On the other side of the application spectrum, BNs are useful for recommending products and services, to users, taking into account their interests, needs and communications patterns. Belief propagation has been used to summarize belief about a product and propagate that belief through a BN (Ayday and Fekri 2010; Yang et al. 2013). Belief propagation is the process in which node marginal distributions (beliefs) are updated in light of new evidence. In the case of a BN, evidence (e.g., opinion or ratings) is absorbed and propagated through a computational object known as a junction tree, resulting in updated marginal distributions. Comparing the network marginals before and after evidence is entered and propagated conveys a system-wide effect of influence(s), and insights into how perception or ratings change when recommendations are passed through a network. Despite its simplicity, the BN approach has been shown to be competitive with the more classical collaborative filtering (CF)-based recommendation. Trust (and distrust) can be highly variable dynamic processes, which depends not only on distance from a recommender, but also, the characteristics of the network users (Wang and Vassileva 2003; Kuter and Golbeck 2007). Accounting for trust in recommendation systems is an open area of research

Microblogging networks represent another effective venue for rapidly disseminating information and influence throughout a community. Twitter is the most well-known microblogging network, in which posts (tweets) are short and time-sensitive with respect to the reference of current topics (Kwak et al. 2010). Users within microblogging networks of this type participate though the act of following and being followed, which gives rise naturally to directed associations (Java et al. 2007). With over 50 million tweets submitted daily, ranking and querying microblogs has become an important and active area of open research. Jabeur et al. proposed a retrieval model for tweet searches, which takes into account a number of factors, including hashtags, influence of the microbloggers, and the time (Jabeur et al. 2012a, b). A query relevance function was developed based on a BN that leverages the PageRank algorithm to estimate parameters, such as influence, in the model (Fig. 3c). The retrieval model was shown to outperform traditional methods for information retrieval on Twitter data from the TREC Tweets 2011 corpus (Ounis et al. 2011).

Dynamic Bayesian Networks The static BNs described depict a network at a single time point. This is most often an oversimplification of the true nature of the network, which is inherently dynamic. Modeling the dynamics of a network over the time-course can be achieved in the BN framework with additional modeling assumptions. Dynamic Bayesian networks (DBNs) provide compact representations for encoding structured probability distributions over arbitrarily long time-courses (Murphy 2002). State-space models, such as hidden Markov model (HMM) and Kalman filter models (KFMs), can be viewed as a special class of the more general DBN. Specifically, KFMs require unimodal linear Gaussian assumptions on the state-space variables. HMMs do not allow for factorizations within the state-space, but can be extended to hierarchical HMMs for this purpose. Overall, DBNs enable a more general representation of sequential or time-course data.

DBN modeling is achieved through the use of template models, which are instantiated, i.e., duplicated, over multiple time points. The relationships between the variables within a template are fixed, and represent the inherent dependencies between ground variables in the model. There are three types of edges in a DBN. Intra-time slice edges represent dependencies within a time slice. Persistent edges link the same variable in two time slices; for example, the velocity of a vehicle at a time slice is very dependent on the velocity of the vehicle in the previous time slice. Finally, inter-time slice edges connect different variables between time slices, for example, the velocity of a car may also be influenced by weather at the previous time slice.

The objective is to model a template variable over a discretized time-course, \(X^{0}\cdots X^{\mathrm{T}}\), and represent \(P(X^{{0}}:X^{\mathrm{T}})\) as a function of the templates over the range of time points. Reducing the temporal problem to conditional template models makes the problem computationally tractable, but requires the specification of a fixed structure across the entire time trajectory. In a DBN, the probability for a random variable X spanning the time-course can be given in factored form,

$$\begin{aligned} P(X^{0:T})=P(X^{0})\prod _{t=0}^{T-1}P(X^{t+1}\mid X^{t}), \end{aligned}$$

where \(X^{0}\) represents the initial state, and the conditional probability terms of the form \(P(X^{t+1}\mid X^{t})\) convey the conditional independence assumptions. The conditional representation of the likelihood is similar in spirit to the static BN representation, but conveys the conditional independence with respect to time. The Markov assumption enables this factorization, which has different, yet analogous meanings in static and dynamic BNs. In a DBN, the Markov assumption explains the memoryless property, i.e., that the current state depends on the previous and is conditionally independent of the past \((X^{t+1}\bot X^{0:t-1}\mid X^{t})\). Comparatively, in static BNs, the Markov assumption only captures nodes’ independence of their non-descendants, given the states of their parents.

Briefly, the learning paradigms are rather similar. Structural learning is typically achieved by the same scoring strategies, but with the added constraint that the structure must repeat over time (Friedman et al. 1998). Such a constraint alleviates the computational burden for search strategies. Additionally, the best initial structure can be searched for independently from the remainder of the time-course. The search is performed either through greedy hill climbing or sampling.

A major advantage of DBNs is that they can be enriched to accommodate more complex interactions that would violate DAG assumptions in a static BN. Figure 1b shows a simple example of a common situation where the true network has feedback in the form of self-loops and a cycle in the graph. This feedback is prohibited in a static BN, but can be captured in a DBN. In this scenario, two students form a study group, but also self-study, in an effort to improve learning outcomes. The unrolled BN captures these relationships for two time slices that contains persistent and inter-time slice edges. These relationships are preserved over a time series (e.g., a semester), thereby forming a template model.

Despite the fact that social networks are inherently dynamic, the applications of DBNs in SNA have been limited. Importantly, there have been many attempts to model social networks probabilistically over time, but not in the strict PGM context, which is the focus of this review; many of these advances are mentioned in the discussion. Chapelle et al. used DBNs to model web users’ browsing history (Chapelle and Zhang 2009). The DBN extends the traditional and widely used cascade model for browsing behavior. The dynamic of click sequences for single click (Fig. 3d) takes into account the information at the query and session levels, differentiating perceived/actual attraction (\(a_u\) and \(A_i\) respectively) and perceived/actual satisfaction (\(s_u\) and \(S_i\) respectively) with links. At each click (time-step), the hidden binary variables for examination (\(E_i\)) and satisfaction (\(S_i\)) track the time progression to predict future clicks. The DBM approach was shown to outperform traditional methods, and highlighted the sensitivity of click modeling to measures of relevance and popularity at the query level.

Meetings can be viewed as social events, in which valuable information is exchanged mainly through speech. Effectively processing, capturing, and organizing this information can be costly, but important in order to maximize the impact and information flow for participants. Dielman et al. cast the problem of meeting structuring as a DBN, which partitions meetings into sequences of actions or phases based on audio (Dielmann and Renals 2004). DBNs outperformed baseline HMMs in detecting meeting actions in a smart room, such as dialog, notes at the board, computer presentations, and presentations at the board.

Twitter and microblogs, in general, have become a major resource for the media to obtain breaking news or a the occurrence of a critical event. Recently, Sakaki et al. showed that tweet modeling via Kalman Filtering is effective for the prediction of earthquakes of a certain magnitude in Japan. Furthermore, they developed a reporting system Torreter, which is quicker than the existing government reporting system in warning registered individuals through email of an impending quake (Jansen et al. 2009).

3 Undirected probabilistic graphical models

Markov networks (MNs), also known as Markov random fields (MRFs), are PGMs with undirected edges. Similar to directed BNs, a MN is a representation of the joint probability distribution between random variables (represented by nodes), where the absence of an edge between two nodes implies conditional independence between the nodes, given the other nodes in the network. In this review, we restrict our focus to MNs, Markov logic networks (MLNs) and exponential random graph models (ERGMs), which can be viewed as generalizations of the random graphs (Frank and Strauss 1986), and are widely used in SNA (Newman et al. 2002). The basic formulation of these models and their utility in SNA will be highlighted.

Markov networks express the joint probability of given random variables by decomposing the network into smaller complete sub-graphs known as cliques, and use maximal cliques to capture the random variables’ dependencies. A clique is a maximal clique if it cannot be extended to include additional adjacent nodes. The clique representation enables a compact factorization of the probability density function (pdf). The joint pdf of n random variables, \(X=\{X_1,\ldots , X_n\}\), with conditional (in)dependencies captured by a graph, can be expressed as:

$$\begin{aligned} P(X)=\frac{1}{Z}\prod _{C\in \varOmega }\psi _C(X_C), \end{aligned}$$
(3)

where C is a maximal clique in the set of maximal cliques \(\varOmega \). Let \(X_C\) denote a subset of X comprised of the random variables that form clique C. Clique potential \(\psi _C(X_C)\) is a function of these variables (e.g., the frequency of distinct realizations of the random variables forming the clique). A unique clique potential can be specified for each clique; the size of a clique can be one determinant of the corresponding clique potential; however, cliques of the same size may have different clique potentials. The clique potentials are positive functions that capture the dependence of the variables within the cliques (Koller and Friedman 2009). The normalizing constant, also known as the partition function, is given as:

$$\begin{aligned} Z=\sum _{X\in \chi } \prod _{C\in \varOmega }\psi _C(X_C). \end{aligned}$$

Each clique potential in a MN is specified by a factor, which can be viewed as a table of weights for each combination of values of variables in the potential. In some special cases of MNs such as log-linear models (Murphy 2012), clique potentials are represented by a set of functions, termed features, with associated weights (i.e., \(\theta _C\phi _C(X_C)=\log (\psi _C(X_C))\), where \(\phi _C(X_C)\) is a feature derived from the values of the variables in set \(X_C\)) and \(\theta _C\) is the weight of \(\phi _C(X_C)\) estimated at the parameter learning stage.

The Hammersley–Clifford theorem specifies the conditions under which a positive probability distribution can be represented as a MN. Specifically, the given representation (Eq. 3) implies conditional independencies between the maximal cliques and is, by definition, a Gibbs measure (Murphy 2012).

A simple example of the use of MNs to study the collective behavior in a social network is shown in Fig. 4. Suppose each member of a friendship network (Fig. 4a) is looking to make a decision about purchasing a given product, “liking” a post in an online forum, supporting a political party, participating in a school activity or choosing a family doctor. In this setting, the random variables in the nodes are binary and the edges indicate pairwise dependencies between them. Let \(X_1,\ldots , X_5\) denote five random variables, each of which takes on the value of 1 or 0 to signal the member’s attitude—Like or Dislike, respectively. Figure 4b depicts the MN with three cliques a, b and c, \(\varOmega =\{a,b,c\}\), with \(X_a=\{X_1,X_2,X_3\}\), \(X_b=\{X_1,X_5\}\) and \(X_c=\{X_4,X_5\}\). The MN includes three clique potentials, \(\psi _a(X_1,X_2,X_3),\psi _b(X_1,X_5)\) and \(\psi _c(X_4,X_5)\), satisfying the requirements of the Hammersley–Clifford theorem under the assumption that a member’s decision can be affected by only their immediate friends and that it matters if those friends are also friends with each other. The joint probability function of \(X_1,\ldots , X_5\) is expressed as:

$$\begin{aligned} P(X_1,\ldots ,X_5) = \frac{1}{Z}\psi _a(X_1,X_2,X_3)\psi _b(X_1,X_5)\psi _c(X_4,X_5), \end{aligned}$$

where \(Z=\sum _{X_1,X_2,\ldots ,X_5}\psi _a(X_1,X_2,X_3)\psi _b(X_1,X_5)\psi _c(X_4,X_5)=3\times 1\times 13+\cdots +9\times 10\times 19\) (the summation is taken over all the \(2^5\) distinct realizations of the model’s variables; the first explicitly written term above corresponds to the case where the variables are all zeros, and the last term corresponds to the case where the variables are all ones). To parameterize the MN and obtain a log-linear model, let \(\theta _a\phi _a(X_1,X_2,X_3)=\log (\psi _a(X_1,X_2,X_3))\), \(\theta _b\phi _b(X_1,X_5)=\log (\psi _b(X_1,X_5))\) and \(\theta _c\phi _c(X_4,X_5)=\log (\psi _c(X_4,X_5))\). With \(\varTheta =\{\theta _a,\theta _b,\theta _c\}\), the joint probability function defined by the MN becomes:

$$\begin{aligned} P(X_1,\ldots ,X_5) =\frac{1}{Z(\varTheta )} \exp \{\theta _a\phi _a(X_1,X_2,X_3)+\theta _b\phi _b(X_1,X_5)+ \theta _c\phi _c(X_4,X_5)\}, \end{aligned}$$

where \(Z(\varTheta )\) is the partition function. The set of the model parameters, \(\varTheta \), would need to be estimated from any available data of known decisions made by the social network members. Figure 4c depicts a factor graph corresponding to the MN. Factor graphs are bipartite graphs used to specify the factorization of the probability distribution function, and also, to inform the computation of marginal probability distributions of MN variables (Murphy 2012).

Fig. 4
figure 4

a A social network and a corresponding MN model (b). The nodes of the MN are actors’ decisions and the variable dependencies are defined based on the ties in the social network. The three tables show the frequencies of hypothetical (observed) Like and Dislike combinations. c A factor graph for the MN in b

MN specification problems, including parameters estimation and structure learning from data, can be quite challenging. The main difficulty in MN parameter estimation is that the maximum likelihood problem formulated with Eq. 3 has no analytical solution due to the complex expression of Z (Lee et al. 2006). The problem of finding the optimal structure of the MN using available data, similar to BNs, is even more challenging (Bromberg et al. 2009). Currently existing approaches to structure learning are either constraint-based or score-based (see Koller and Friedman 2009; Ding 2011; Schmidt et al. 2010 for more details).

MNs have found increased utility in SNA with the emergence of online social networks (OSNs) and digital social media (see Bonchi et al. 2011 for a review of key problems in SNA). The need to capture non-causal dependencies within and between data instances (e.g., profile information) and observed relationships (e.g., hyperlinks) in these applications is exacerbated by the presence of missing or hidden data in OSNs (Xiang and Neville 2013). A popular problem instance in this domain, that of user (missing) profile prediction, has been attacked using MNs (Taskar et al. 2002; Neville and Jensen 2007).

Along with the problem of predicting missing profiles, link prediction is among the most prominent problems in Big Data SNA. Multiple variations of MNs that have been used to estimate the probability that a (unobserved) link exists between nodes include Markov logic networks, relational Markov networks, relational Bayesian networks and relational dependency networks (Al Hasan and Zaki 2011; Chen et al. 2013; Tresp and Nickel 2013). Detection of community structures is another area of MN application (Newman 2006). Communities can be discovered through examination and subsetting (cutting) network relationships according to labels of interest, and through the use of weighted community detection algorithms. Social network clustering is especially challenging in a dynamic context, e.g. in mobile social networks (Humphreys 2007). Wan et al. employed undirected graphical models (i.e., conditional random fields) constructed from mobile user logs that include both communication records and user movement information (Wan et al. 2012).

Several generative models have been proposed, which are motivated by MNs, and explain the effects of selection and influence (e.g., see Aggarwal 2011). Modeling channeled spread of opinions and rumors, known more generally as diffusion modeling, is an active area of research in SNA (Bach et al. 2012). Several applications of diffusion models have been proposed for social networks including, but not limited to the spread of information (Cowan and Jonard 2004), viral marketing (Kempe et al. 2003), spread of diseases (Anderson and May 1979), the spread of cooperation (Santos et al. 2006). Given a social network, for each node, a corresponding random variable indicates the state of the node (e.g., product or technology adoption) and links in the network represent dependency (Wortman 2008).

Markov logic networks employ a probabilistic framework that integrates MNs with first-order logic such that the MN weights are positive for only a small subset of meaningful features viewed as templates (Richardson and Domingos 2006). Formally, let \(F_i\) denote a first-order logic formula, i.e., a logical expression comprising constants, variables, functions and predicates, and \(w_i \in \mathcal {R}\) denote a scalar weight. An MLN is then defined as a set of pairs \((F_i,w_i)\). From the MLN, the ground Markov network, \(M_{L,C}\), is constructed (Richardson and Domingos 2006) with the probability distribution (Tresp and Nickel 2013),

$$\begin{aligned} P(X=x)=\frac{1}{Z}\exp \left( \sum _i{w_in_i(x)}\right) , \end{aligned}$$
(4)

where \(n_i(x)\) is the number of true groundings (e.g., true logic expressions based on observations) of \(F_i\), i.e., such formulae that hold, in x.

A simple example network of five senators is shown in Fig. 5a. In this setting, each senator supports one of two political parties (Democratic or Republican). Each senator in this network has two attributes (1) political affiliation, (R(n) is 1 if senator n is a republican and 0 otherwise for \(n \in \{A,B,C,D,E\}\)), and (2) supporting a particular bill, (S(n) is 1 if senator n supports the bill and 0 otherwise). Let F(nm), a binary symmetric function, denote the relationship between senators n and m (\(n\ne m\)). Suppose “Republicans do not support the bill” and “If two senators have a relationship and one is republican then so is the other” are two logical statements denoted by \(F_1\) and \(F_2\). The first-order logic format is given as follows:

$$\begin{aligned}&F_1:\;\forall n, \;R(n) \Rightarrow \lnot S(n),\\&F_2:\;\forall n,m \; F(n,m)\wedge R(n) \Rightarrow R(m). \end{aligned}$$

The MLN is similar to the ground network (Fig. 5b). However, only the combinations of variables corresponding to logical statements, \(F_1\) and \(F_2\), are parameterized in the MLN (all the other weights are zero). Let \(X=\{X_1,\ldots ,X_{15}\}\) denote the set of all nodes in the \(M_{L,C}\) where \(X_i\) indicate node i (e.g., \(X_1\) is the node labeled S(A) in Fig. 5b). In an MLN, clique potentials are defined similar to those in Markov networks. Now, one can use this MLN to find the probability that all senators in this example support the bill (i.e., \(P(S(A)=1,\ldots ,S(E)=1\)). To calculate the number of true groundings (\(n_1\) and \(n_2\)), both \(F_1\) and \(F_2\) should be examined for all nodes in the observed network. More generally, this approach can be implemented to estimate missing profiles in social networks as well.

Fig. 5
figure 5

An example of using MLN in the political science. a Depicts a social network of five senators with two attributes. The ground predicates (b) are denoted by 15 elliptical nodes. The red ones are captured by the links of the social networks and dark blue nodes indicate nodes’ attributes. Two first-order logics, \(F_1\) and \(F_2\), determine the structure of the MLN. There exist five groundings of the \(F_1\) (illustrated by the edges between the R(x) and S(x)′ nodes) and 15 groundings of \(F_2\) captured by the rest of the edges. Other examples of MLNs can be found in Tresp and Nickel (2013)

Many problems in statistical relational learning, such as link prediction (Domingos et al. 2008), social network modeling, collective classification, link-based clustering and object identification, can be formulated using instances of MLN (Richardson and Domingos 2006). Dierkes et al. used MLNs to investigate the influence of Mobile Social Networks on consumer decision-making behavior. With the call detail records represented by a weighted graph, MLNs were employed in conjunction with logit models as the learning technique based on lagged neighborhood variables. The resulting MLNs were used as predictive models for the analysis of the impact of word of mouth on churn (the decision to abandon a communication service provider) and purchase decisions (Dierkes et al. 2011).

As mentioned above, link mining and link prediction problems can also be addressed using MLNs, since MLNs combine logic and probability reasoning in a single framework (Domingos et al. 2010). Furthermore, the ability of MLNs to represent complex rules by exploiting relational information makes them an appropriate alternative for collective classification (e.g., classification of publications in a citation network, or of hyperlinked webpages) (Crane and McDowell 2011).

The Ising model and its variations form a subclass of MN with foundations in theoretical physics. The Ising model is a discrete and pairwise MN, and is popular in applications in part due to its simplicity (Koller and Friedman 2009). The variables in the model, \(X_1 \cdots X_p\), are assumed to be binary, and their joint probability is given as:

$$\begin{aligned} P(X,\varTheta )=\exp \left( \sum _{(j,k)\in E}\theta _{jk}X_j X_k - \varPhi (\varTheta )\right) \quad \forall \; X \in \chi , \end{aligned}$$

where \(\chi \in \{-1,1\}^p\), and \(\varPhi (\varTheta )\) is the log of the partition function

$$\begin{aligned} \varPhi (\varTheta ) = \log \sum _{x \in \chi } \left[ \exp \left( \sum _{(j,k)\in E}\theta _{jk}x_jx_k\right) \right] . \end{aligned}$$

Special, efficient methods exist for learning the Ising Model parameters from data (Ravikumar et al. 2010). While the model has been originally found useful for understanding magnetism and phase transitions, its utility has later expanded to image processing, neural modeling, and studies of tipping points in economics and social domains (Afrasiabi et al. 2013).

In SNA, the Ising model can be employed to analyze factors such as network substructures and nodal features affecting the opinion formation process. A classical example within this is a study of medical innovation spread, namely the adoption of drug tetracycline by 125 physicians in four small cities in Illinois (Van den Bulte and Lilien 2001). A small subset of this network is illustrated in Fig. 6. The adoption status of node k is represented by \(X_k \; \forall \; k=1,\ldots ,5\), where \(X_k=1\) if adopted (blue nodes) and 0 otherwise (black nodes). Since the Ising model only captures pairwise dependencies between variables, the corresponding MN only considers cliques of size two (i.e., dyads); hence, one can concisely write the clique potentials in the form \(\psi (X_k,X_j)=X_kX_j\). Let \(n^+\) and \(n^-\) denote the number of agreements (i.e., cases with\(X_kX_j=1\) for some k and j) and the number of disagreements (i.e., cases with \(X_kX_j=-1\) for some k and j), respectively. Assuming \(\theta _{jk}=\theta _{A}\) if \(X_kX_j=1\) and \(\theta _{jk}=\theta _{D}\) if \(X_kX_j=-1\), \(\varTheta =\{\theta _{A},\theta _{D}\}\), one can obtain \(\sum _{(j,k)\in E}\theta _{jk}X_j X_k=\theta _{A}n^+-\theta _{D}n^-\). Hence, the joint probability of all nodes’ adoption status is:

$$\begin{aligned} P(X_1,\ldots ,X_5,\varTheta )=\exp ( \theta _{A}n^+-\theta _{D}n^--\varPhi (\varTheta )), \end{aligned}$$

where \(\varPhi (\varTheta )=\log \sum _{x \in \chi } [ \exp (\theta _{A}n^+-\theta _{D}n^-)]\) is the partition function. For this small example, the table in Fig. 6b shows all clique potentials which are either \(-1\) or 1 based on the network structure. The counts of agreements and disagreements are obtained next (e.g., \(n^+=5\) and \(n^-=2\) in Fig. 6b). The partition function has 32 additive terms: each combination of \(X_k\)’s leads to particular values of \(n^+\) and \(n^-\) and all these values affect the value of \(\varPhi (\varTheta )\).

In the model presented above, the counts of possible agreements and disagreements depend on the network structure, so the MN can be said to explore the impact of homophily on tetracycline adoption decisions. Note that the model’s parameters, \(\theta _{A}\) and \(\theta _{D}\), first need to be estimated from any given data (i.e., from a single observation of a network of (non)adopters); however, the approaches to such parameter estimation are beyond the scope of this paper.

Fig. 6
figure 6

An example of implementing the Ising model to find the probability of adopting a new medication. a A sub-network of an physicians’ advisory network with 5. b A pairwise Markov network is constructed where the cliques with size of at most 2 are involved

Figure 7 depicts the entire physicians’ advisory network from a data set prepared by Ron Burt from the 1966 data collected by Coleman et al. (1966) about the spread of medical innovation. The figure illustrates the physicians’ network in two different time points and shows how physicians changed their opinions and adopted the new medication overtime. To find the probability of adoption, the Ising model can be modified by considering the impact of nodal attributes on the adoption.

Fig. 7
figure 7

The spread of new drug adoption through an advisory network of physicians: two snapshots at different time points, about 2 years apart (from left to right). The growth dynamics in the number of adopters can be analyzed with an Ising Model

Recently, the Ising Model has been used to examine social behaviors (Vega-Redondo 2007), including collective decision making, opinion formation and adoption of new technologies or products (Grabowski and Kosiński 2006; Krause et al. 2012). For example, Fellows et al. proposed a random model of the full network by modeling nodal attributes as random variates. They utilized the new model formulation to analyze a peer social network from the National Longitudinal Study of Adolescent Health (Fellows and Handcock 2012). Agliari et al. (2010) proposed a model to extract the underlying dynamics of social systems based on diffusive effects and people strategic choices to convince others. Through the adaptation of a cost function, based on the Ising model, for social interactions between individuals, they showed by numerical simulation that a steady-state is obtained through natural dynamics of social systems.

Exponential random graph models (ERGMs) (Wasserman and Pattison 1996), also known as the \(p^*\)-class models, are among the most widely used network approaches to modeling social networks in recent years (Pattison and Wasserman 1999; Robins et al. 1999, 2007a). A social network of individuals is denoted by graph \(G_s\) with N nodes and M edges, \(M\le {N\atopwithdelims ()2}\). The corresponding adjacency matrix of is denoted by \(Y=[y_{ij}]_{N\times N}\), where \(y_{ij}\) is a random variable and defined as follows:

$$\begin{array}{ll} y_{ij} = \left\{ \begin{array}{ll} 1 & \hbox {if there exists a link between nodes }i \hbox { and }j\quad \forall i, j, i\ne j \\ 0 & \hbox {otherwise.} \end{array}\right. \end{array}$$

Based on an ERGM, the probability of any observed network, y, is:

$$\begin{aligned} P(Y=y,\varTheta )=\frac{1}{Z}\exp \left( \sum _{i =1}^K{\theta _{i}f_{i}(y)}\right) , \end{aligned}$$
(5)

where \(f_{i}(y), i=1,\ldots ,K\), are called sufficient statistics (Morris et al. 2008; Lusher et al. 2012), or motifs based on configurations of the observed graph and \(\varTheta =\{\theta _{1},\ldots ,\theta _{K}\}\) is a K-vector of parameters (K is the number of different sufficient statistics used in the model). Network configurations used to compute sufficient statistics, including but not limited to network edge count (tie between two actors), as well as counts of 2-stars (two ties sharing an actor) and triads of various types, are related to communication patterns among actors in a social network (see Lusher et al. 2012 for more details about network configurations). The parameters of an ERGM describe the probabilities of a wide variety of possible configurations in social networks (Robins et al. 2001). Again, Z is called the normalization constant.

As an example, a social network of five individuals is assumed. Since the edges (ties) between nodes are considered as random variables, the given network is the most likely realization out of many possible networks. In this case, an ERGM constructs a probability distribution over all possible networks with five nodes. Figure 7 illustrates the social network (A) and the corresponding graph where edges, \(y_{ij} \; \forall i,j=1,\ldots ,5\) represent random variables along with five sufficient statistics (\(f_i(y)\; i=1,\dots ,5\)) including edge, 2-star, 3-star, 4-star and triangle (B). The probability distribution of any possible network is obtained as follows:

$$\begin{aligned} P(Y=y,\varTheta )=\frac{1}{Z}\exp \left( \theta _1f_1(y)+ \theta _2f_2(y)+\theta _3f_3(y)+\theta _4f_4(y)+\theta _5f_5(y)\right) , \end{aligned}$$

where y is any observed network with five nodes, \(\varTheta =\{\theta _1,\ldots ,\theta _5\}\), the set of weights of sufficient statistics, are estimated through solving an optimization problem where the probability of the observed network is maximized. The exact computation of the normalization constant, Z, requires handling of many terms (all possible network realization must be considered and their corresponding sufficient statistics calculated). This challenge is conventionally handled using Markov Chain Monte Carlo (MCMC) sampling technique (Snijders et al. 2006).

Fig. 8
figure 8

a A social network with 5 nodes and b the corresponding realization network (graph) and sufficient statistics of the observed network

Some of the first proposed models, e.g., random graphs and \(p_1\) models (Frank and Strauss 1986), used Bernoulli and dyadic dependence structures, which are generally overly simplistic (Robins et al. 2007a). On the contrary, ERGMs are based on Markov dependence assumption (Frank and Strauss 1986) supposing that two possible ties are conditionally dependent when they share an actor (node). Moreover, Markov dependence assumption can be extended to attributed networks which assumes each node has a set of attributes influencing the node’s possible incoming and outgoing ties (Robins et al. 2007a) (e.g., more experienced actors in an advisory network, more incoming ties). When nodal attributes are taken into account as random variables, ERGMs and MNs can be integrated to model the social network due to similarities that they share (see the Appendix and Fellows and Handcock 2012; Thiemichen et al. 2014; Lusher et al. 2012).

ERGMs have been widely employed to study the network and friendship formation (Song et al. 2014) and global network structural using local structure of the observed network (Uddin et al. 2013a). The observed network is considered as one realization from too many possible networks with similar important characteristics (Robins et al. 2007a). For example, Broekel and Hartog (2013) used ERGMs to identify factors determining the structure of inter-organizational networks based on the single observation. Schaefer and Simpkins (2014) used SNA to study the relation between weight status and friend selection and ERGMs to measure the effects of body mass index on friend selection.

Moreover, Goodreau et al. (2009) used ERGMs to examine the generative processes that give rise to widespread patterns in friendship networks. Cranmer and Desmarais used ERGMs to model co-sponsorship networks in the U.S. Congress and conflict networks in the international system. They determined that several previously unexplored network parameters are acceptable predictors of the U.S. House of Representatives legislative co-sponsorship network (Cranmer and Desmarais 2011).

The ERGMs have also been utilized in modeling the changing communication network structure and classifying networks based on the occurrence of their local features (Uddin et al. 2013a) and to identify micro-level structural properties of physician collaboration network on hospitalization cost and readmission rate (Uddin et al. 2013b). Finally, a ERGM-based model of clustering nodes considering their role in the network has been reported (Salter-Townshend and Murphy 2014).

4 Discussion

Mining social networks for knowledge and discovery has proven to be a very challenging and active research area. This review focussed on PGMs. The directed and undirected PGM paradigms were described and their applications to social networks were highlighted. An important consideration and major challenge is the issue of scalability, not only for PGMs, but for SNA, in general. Structural and parameter learning in high dimensions can be prohibitive. Moreover, for structural learning, both greedy- and sampling-based search strategies can get stuck at local minima, and many graphs may be likelihood equivalent. These numerical caveats can give rise to misleading networks, generating models, and subsequent predictions. In addition, ERGMs can exhibit degeneracy, which occurs when the generated networks show little resemblance to the generating model. Proposed modifications to the concept of goodness of fit have been proposed to safeguard against the problems of degeneracy (Goodreau 2007; Hunter et al. 2008).

In the majority of applications of PGMs (both directed and undirected) in SNA, the graphical structures are assumed to be either known or designed by human experts (i.e., captured directly by social networks), thereby the learning problem is limited to the parameter estimation. However, practically hand-constructed PGMs for SNA have many barriers: time taken to construct them varies from hours to months, experts can be costly or unavailable, the data may be huge and errors may lead to poor answers. On the other hand, structure learning is NP-hard with the hypothesis space being super-exponential (\(2^{O(n^2)}\)) networks.

Directed and undirected graphs share common interpretations in terms of conditional independences. Selection of a PGM modeling paradigm is not trivial and is driven by the data and ultimately what the user hopes to achieve with the model. When the relationship can be viewed in terms of cause and effect, BNs are more appropriate, and when the relationship is association, MNs are preferred. Inferences in both paradigms are met with challenges. The types of variables (continuous or discrete) have to be carefully considered. Modeling with a mixture of these variables is possible in the case of BNs under strict assumptions. However, the inference problem becomes more sensitive to sample size, as the parameters estimated for the local models are done so from a potentially reduced population, which can be severely subset by level factors of parent nodes. Another important learning task, outside of the scope of this review, is queries that involve the absorption of evidence (e.g., new data) in the network and propagation through the network. This process is known as belief propagation and it takes place on a factor graph (aka cluster graph). In the case of BNs, the factor graph is a factor tree (aka junction tree), and the propagation schemes give rise to exact inferences of marginal distributions (aka beliefs). On the other hand, in MNs the factor graph may have cycles, which does not ensure exact inference in terms of marginals, but has still been shown to be useful in practice, see Koller and Friedman (2009) for more details.

There are several opportunities to access open source data resources in order to develop and test methodologies for PGMs, and related areas. Max-Plank researchers have released OSN data used in publications, which includes crawled data from Flickr, YouTube, Wikipedia and Facebook (Mislove et al. 2007; Cha et al. 2008, 2009; Viswanath et al. 2009). Several directed OSNs have been released in the Stanford Network Analysis Package (snap), e.g. from Epinions, Amazon, LiveJournal, Slashdot and Wikipedia voting (Stanford 2011). Recently, a Facebook dataset was released that exhibited convergence properties and was shown to be representative of the underlying population (Gjoka et al. 2010). Document classification datasets have also been released (Getoor 2012). A sample from the CiteSeer database contains 3312 publications from one of six classes, and 4732 links. The Cora dataset consists of 2708 publications classified into seven categories and the citation network has 5429 links. Each publication is described by a binary word vector which indicates the presence of certain words within a collection of 1433. WebKB consists of 877 scientific publications from five classes, contains 1601 links and includes binary word attributes similar to Cora. Terrorism databases are also publicly available (Division 1948; National Consortium for the Study of Terrorism and Responses to Terrorism 2015). The most extensive is the RAND Database of Worldwide Terrorism Incidents, which details terrorist attacks in nine distinct regions of the world across the time-span 1968–2009 (dates vary slightly depending on region) (Division 1948). Several well-known challenges may arise in the analysis and representation of terrorist network data, including incomplete information, latent variables influencing node dynamics, and fuzzy boundaries between terrorists, supporters of terrorists, and the innocent (Sparrow 1991; Krebs 2002). The DBLP computer science bibliography (http://dblp.uni-trier.de/db/) is a massive online database that contains bibliographic meta-data for over 2.6 million publications. There is also ample opportunity to enroll in various data challenges, which are often posed by corporations and operators of the networks themselves.

In this review, we surveyed directed and undirected PGMs, and highlighted their applications in modern social networks. Despite limitations that arise related to scalability and inference, it is our opinion that the utility of PGMs has been somewhat under-realized in the social network arena. It is indisputable that methods for understanding social networks have not kept pace with the data explosion. There are several relevant topics and opportunities in social networks, e.g., link predication, collective classification, modeling information diffusion, entity resolution, and viral marketing, where conditional independencies can be leveraged to improve performance. PGMs implicitly convey conditional independence and provide flexible modeling paradigms, which hold tremendous promise and untapped opportunity for SNA.