1 Introduction

Relational data can be represented in the form of networks such as social networks, biological networks and knowledge graphs (Lü and Zhou 2011; Nickel et al. 2016). Social networks exhibit a vast diversity of relations, such as friendship on Facebook and followership on Twitter (Davis et al. 2011; Chung et al. 2016). A network can be visualized as a graph in which nodes represent entities and edges correspond to links or interactions. The increasing availability of online social networks is useful not only in social network analysis, such as community detection or link prediction, but also in recommendation systems. Networks are powerful representations and are employed in tasks such as machine learning and data mining (Al Hasan et al. 2006; Ngonmang et al. 2015). Developing models to analyze such relational data has attracted considerable attention, and link prediction is a fundamental task among them. Link prediction is the task of predicting unseen links in a given social graph. More formally, the prediction of missing links, or of links that will form in the future, based on snapshots of the network is defined as the link prediction problem (Liben-Nowell and Kleinberg 2007; Clauset et al. 2008; Lü and Zhou 2011; Dunlavy et al. 2011).

Link prediction is an important task in link mining (Getoor and Diehl 2005). In order to predict links in relational data, one needs to model the different types of information in the graph (Kashima et al. 2009; Nguyen and Mamitsuka 2012). The first type of information used to predict links is captured from the entities themselves, such as node attributes (Rahman and Al Hasan 2016). The other is the relationship between two nodes. While modeling entities is common practice, modeling links is usually more complex (Nickel et al. 2016). Because of these different types of information, various methods have been presented (Bliss et al. 2014). Designing a model that can cope with such heterogeneous information and extract latent features is therefore a challenging task.

One of the remarkable features of social network link prediction is the constant change in the size of the network, as links and entities are added and removed over time (Dunlavy et al. 2011; da Silva Soares and Prudêncio 2012; Heaukulani and Ghahramani 2013; Rossetti et al. 2015). The dynamics of such networks make their study a challenging task. Moreover, a large dynamic network may be further complicated by multi-relational data (Davis et al. 2011; Yang et al. 2012; Sarkar et al. 2014; Garcia-Duran et al. 2016). Social networks are mostly heterogeneous, and dealing with multiple link and node types can be intricate. The sparsity of linked data introduces another challenge (Getoor and Diehl 2005; Nguyen and Mamitsuka 2012; Zhai and Zhang 2015). A sparse network is sensitive to noise, and in a dynamic, heterogeneous network the noise rate can change abruptly over time. Nonlinear transformations with seasonal fluctuations are also commonly seen in dynamic networks (Sarkar et al. 2012; Li et al. 2014b), and capturing such nonlinearity is expensive and time-consuming (Zhu et al. 2016b). A wide range of methods has been presented to overcome some of these challenges; however, choosing an appropriate method remains difficult. With regard to this, a triplex analytical framework that classifies link prediction approaches is introduced in this paper. The framework evaluates each category based on the functional measures we present. This triplex framework enables an empirical and technical comparison of link prediction methods and provides a reference point for future research by recording the different techniques and methods concerning the topic. The methods are also evaluated against key measures that reflect their capabilities and characteristics.

The rest of the paper is organized as follows: In the next section, related work is reviewed. Section 3 introduces the mathematical notation. Section 4 describes social networks. In Sect. 5, the formal definition of the link prediction problem is presented. Section 6 shows the general architecture of link prediction in social networks. In Sect. 7, the proposed framework is presented, followed by the conclusion and future work.

2 Previous related review study

In order to deal with the challenging task of link prediction, a considerable amount of literature has attempted to exploit the relational and temporal nature of the data, demonstrating improved performance by combining related sources of information in a single modeling framework. An early study on predicting links in social networks is that of Liben-Nowell and Kleinberg (2007). Their model focuses on unsupervised approaches, most of which generate scores based on either node neighborhoods or path information. They examined various similarity indices, including common neighbors, preferential attachment, Adamic-Adar, Jaccard, SimRank, hitting time, rooted PageRank and Katz, and found that these similarity indices work well compared to a random predictor. Later, Al Hasan et al. (2006) expanded this work in two directions. They affirmed that using social network data as well as graph topology can significantly improve the prediction result. Further, they used various similarity indices as entries of a feature vector in a supervised learning setup, mapping the link prediction problem to a binary classification task. Since then, the supervised classification approach has been popular in various other works on link prediction (Bilgic et al. 2007; Wang et al. 2007; Doppa et al. 2009).

Lü and Zhou (2011) classified link prediction approaches into three main categories: (1) similarity-based approaches, (2) maximum likelihood methods, and (3) probabilistic models. They reported that an obvious drawback of maximum likelihood methods is that they are very time-consuming, while probabilistic models optimize a target function to establish a model that best fits the observed data. It is noteworthy that they emphasized studies conducted by statistical physicists.

Al Hasan and Zaki (2011) investigated feature-based link prediction, probabilistic models, graphical models and linear algebraic models as the most commonly used approaches in link prediction. However, with the exception of Taskar et al. (2003), most of these models were designed for homogeneous networks, which have a single type of node and a single type of link. In another line of research, Wang et al. (2015) survey current trends and methods in the link prediction problem; the main topics they cover include the latest link prediction techniques, link prediction applications and active research groups.

In conjunction with the above studies, this paper presents a comprehensive classification and evaluation of link prediction approaches based on the introduced evaluation criteria. It covers more recent link prediction work that earlier surveys, now at least five years old, did not.

3 Notation

Before proceeding, let us define our mathematical notation. Scalars are denoted by lower case letters such as c; column vectors are denoted by bold lower case letters such as a; matrices are denoted by bold upper case letters such as A; and tensors are denoted by bold upper case letters with an underscore such as \({\underline{\mathbf{A}}}\) (mostly the order of tensor is 3, \(N_1\times N_2 \times N_3\)) and the (ijk)’th element by \(a_{ijk}\) (which is a scalar).
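As a concrete illustration of this notation, a third-order tensor and its elements can be sketched with NumPy (the shapes and values below are arbitrary examples, not from the paper):

```python
import numpy as np

# A third-order tensor A_ of shape N1 x N2 x N3; the (i, j, k)'th
# element a_ijk is a scalar, matching the notation in the text.
N1, N2, N3 = 4, 5, 3
A_ = np.arange(N1 * N2 * N3, dtype=float).reshape(N1, N2, N3)

a_ijk = A_[1, 2, 0]          # a single scalar element
frontal_slice = A_[:, :, 0]  # an N1 x N2 matrix (e.g., one time slice)
```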

4 Social networks

Formally, social networks consist of a set of entities, known as network entities, defined with regard to certain criteria. The other important elements of the network are relations and interactions, or in general, links (Brandes and Wagner 2004; Lü and Zhou 2011). The data structure of a graph depends on the number of link types between nodes: in single-relational data there is just one type of link in the network, while in multi-relational data there is more than one (Yang et al. 2012; Litwin and Stoeckel 2016). The study of social networks concerns both knowledge representation and how the network is supplied over time. Knowledge representation concerns what the graph structure is; although the overall network structure is a graph, the complexity and variety of social networks make their study a field of its own. Social networks involving a single type of nodes and links can be represented as homogeneous networks, while others have multiple link and node types (Keyvanpour and Azizani 2012). A homogeneous network is defined as the graph:

$$\begin{aligned} \begin{aligned} G_{homo}&=\langle \, V,E \, \rangle \\ V&=\{v\in V \, | \, \forall \, i , j\,\, v_i \,\,\textit{identical to}\,\, v_j\,\}\\ E&=\{e\in E \, | \, \forall \, i , j\,\, e_i \,\,\textit{identical to}\,\, e_j\,\} \end{aligned} \end{aligned}$$
(1)

where V is the node set and E is the link set. A heterogeneous network is defined in a similar way, but it contains multiple kinds of nodes and links:

$$\begin{aligned} \begin{aligned} G_{hetro}&=\langle \, V,E \, \rangle \\ V&=\cup _i V_i\\ E&=\cup _i E_i \end{aligned} \end{aligned}$$
(2)

where V is the node set that contains the union of different node type sets and E is the union of the heterogeneous link sets.

Together with studying the structure of social networks, the way a network is supplied is a field of interest in this domain. In almost every social network, not all entities are available at first; they are added over time (da Silva Soares and Prudêncio 2012; Nasim and Brandes 2014). A dynamic network is one in which the rate of change over equal time intervals is itself always changing (Kuhn and Oshman 2011; Aggarwal and Subbian 2014); in other words, nodes and edges shrink and grow quickly. Such a dynamic network can typically be modeled as a dynamic graph \(G = \langle \, V,E \, \rangle \), where V is a set of nodes and \(E :N^+ \rightarrow 2^{V\times V}\) is a dynamic edge function assigning to each round \(r \in N^+\) a set of edges E(r) for that round. Round \(r \in N^+\) occurs between time \(r-1\) and time r, and \(G(r) = \langle V, E(r) \rangle \) is the instantaneous graph in round r. In the literature, such dynamic graphs have also been termed evolving graphs (Kuhn and Oshman 2011; Aggarwal and Subbian 2014). Naturally, what can be achieved in a dynamic network largely depends on how the dynamic edge set is chosen. In Fig. 1, the overall view of social networks is presented.
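The dynamic-graph model above can be sketched in a few lines of Python; the node names and edge history are invented for illustration:

```python
# A minimal sketch of the dynamic graph G = <V, E(r)>: the node set V is
# fixed, while a dynamic edge function assigns an edge set E(r) to each
# round r >= 1.
V = {"a", "b", "c", "d"}

edge_history = {
    1: {("a", "b"), ("b", "c")},
    2: {("a", "b"), ("c", "d")},           # ("b","c") removed, ("c","d") added
    3: {("a", "b"), ("c", "d"), ("a", "d")},
}

def E(r):
    """Dynamic edge function: the edge set of round r (empty if unseen)."""
    return edge_history.get(r, set())

def snapshot(r):
    """Instantaneous graph G(r) = <V, E(r)> in round r."""
    return (V, E(r))
```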

Fig. 1

Social network types

5 Link prediction: problem definition

Link prediction is the task of predicting the relations and interactions in a network. Machine learning techniques have been proposed for predicting unknown links using the known links of a graph as training data. Independent of the procedure, predicting unknown links falls into two categories according to the linked data: (i) missing link prediction and (ii) future link prediction (Liben-Nowell and Kleinberg 2007; Lü and Zhou 2011; Dunlavy et al. 2011).

5.1 Missing link prediction

Consider a graph \(G = \langle \, V,E \, \rangle \), where V is the set of nodes and E is the set of edges, \(E \subseteq (V \times V)\), in which each edge \(e = ( u , v ) \in E\) represents a link between u and v. Let G[k] denote the subgraph consisting of all edges that are available (also called the training graph) and \(G[k']\) the subgraph consisting of all missing edges (called the test graph) (Lü and Zhou 2011). In other words, the union of the two subgraphs G[k] and \(G[k']\) equals the original graph, and their intersection is empty (3).

$$\begin{aligned} \begin{aligned} E_k \cup E_{k'}&=E\\ E_k \cap E_{k'}&=\emptyset \end{aligned} \end{aligned}$$
(3)

An algorithm with access to the subgraph G[k] must yield a list of edges not present in G[k] that are predicted to appear in \(G[k']\). Figure 2 gives a simple view of missing link prediction in a homogeneous social network.
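The split of Eq. (3) into a training graph G[k] and a test graph \(G[k']\) can be sketched as follows (a minimal sketch; real evaluations often additionally keep the training graph connected):

```python
import random

def split_edges(edges, test_fraction=0.2, seed=0):
    """Split an edge set E into an observed part E_k and a held-out part
    E_k' so that E_k | E_k' == E and E_k & E_k' == set(), as in Eq. (3)."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = int(len(edges) * test_fraction)
    return set(edges[n_test:]), set(edges[:n_test])  # (train, test)

E_full = {(1, 2), (2, 3), (3, 4), (4, 5), (1, 5), (2, 4), (1, 3), (3, 5)}
E_k, E_kp = split_edges(E_full, test_fraction=0.25)
```

A predictor is then trained (or scored) on \(E_k\) only, and judged by how many edges of \(E_{k'}\) it recovers.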

Fig. 2

An overview of the incoming and outgoing missing link predictor. Dashed edges in the left graph are missing, and the predictor could forecast two of them, as shown on the right

5.2 Future link prediction

Future link prediction can be grouped into the following categories: (i) periodic link prediction and (ii) non-periodic link prediction. The former considers the dynamic nature of the graph as a key feature for the prediction. The latter does not track changes over time but focuses on the current state of the network. An outlook of future link prediction is shown in Fig. 3.

  • Periodic link prediction

    Given a series of snapshots \(\{G_1 , G_2 , \cdots , G_t \}\) of an evolving graph \(G_t = \langle \, V,E_t \, \rangle \), in which each \(e = (u , v) \in E_t\) represents a link between u and v that took place at a particular time t (Brandes and Wagner 2004; Miller et al. 2009; Tylenda et al. 2009), we seek to predict the most likely link state at the next time step \(G_{t+1}\). In almost every method analyzed in this paper, it is assumed that the nodes V remain the same across all time steps, while the edges \(E_t\) change with each time t: new links may be formed, while some previous links are removed. In other words, the goal is to correctly predict the next graph snapshot. In a dynamic heterogeneous network, the graph is defined as:

    $$\begin{aligned} \small G=\langle V_1\cup V_2\cup \cdots \cup V_M,\{E_{1,t-n} \cup E_{2,t-n}\cup \cdots \cup E_{N,t-n},\cdots ,E_{1,t}\cup E_{2,t}\cup \cdots \cup E_{N,t}\}\rangle \end{aligned}$$
    (4)

    where \(V_u,\, u \in M\) represents the set of nodes of type u and \(E_j, \, j \in N\) represents the set of links of type j (Yang et al. 2012).

  • Non-periodic link prediction

    In the non-periodic type, instead of having a series of snapshots of an evolving graph, we have a single snapshot of the current state of the graph \(G_t\). More formally, let \(G = \langle \, V,E \, \rangle \), where V is the set of nodes and E denotes its edges, \(E \subseteq (V \times V)\). Consider two subgraphs corresponding to the current state \(G_t\) and the future \(G_{t+1}\), such that \(E _t \, \cup \, E_{t+1} = \, E \, , \, E_t \, \cap \, E_{t+1} \, = \, \emptyset \). Given the current state, we seek to predict the next time step of the graph, \(G_{t+1}\) (Al Hasan et al. 2006; Liben-Nowell and Kleinberg 2007). It is noteworthy that this category most often concerns heterogeneous networks because of the complexity of multi-relational data. These definitions can be extended to a heterogeneous network simply by considering the graph as:

    $$\begin{aligned} G=\langle V_1 \cup V_2 \cup \cdots \cup V_M , E_1 \cup E_2 \cup \cdots \cup E_N \rangle \end{aligned}$$
    (5)
Fig. 3

Future link prediction. The bottom shows periodic link prediction, where the inputs are snapshots of the graph at different time intervals. The top shows non-periodic link prediction, where the input is just a snapshot of the current state
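As a minimal illustration of the periodic setting, a series of snapshots can be scored with a simple frequency heuristic (purely illustrative; the approaches surveyed later replace this counting with learned temporal patterns):

```python
from collections import Counter

def periodic_scores(snapshots):
    """Given a series of edge sets {E_1, ..., E_t}, score each candidate
    pair by how often it appeared in past snapshots -- a deliberately
    simple stand-in for a periodic link predictor."""
    counts = Counter()
    for edges in snapshots:
        for u, v in edges:
            counts[frozenset((u, v))] += 1
    return counts

snaps = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("b", "c")},
]
scores = periodic_scores(snaps)
# ("a", "b") appeared in every snapshot, so it receives the highest score
# and would be predicted for G_{t+1}.
```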

6 Link prediction in social network

Link prediction is one of the important tasks of link mining in the data mining community (Getoor and Diehl 2005). Many methods have therefore been created to determine a better way of predicting links (missing or future links). The most fundamental issue in social link prediction is features (Al Hasan et al. 2006; Nguyen and Mamitsuka 2012; Li et al. 2014a). These features are obtained from two information sources. One is information from the entities of the network, such as nodes. The other kind of information used to predict links is the structure (topology) of the network itself, with or without node information (Lichtenwalter et al. 2010; Nguyen and Mamitsuka 2012). In other words, the structure of social linked data comprises latent information that can be used in many link-mining tasks, especially link prediction.

Fig. 4

General process of link prediction

In social networks, link prediction consists of two principal phases: feature extraction and prediction (Al Hasan et al. 2006; Fire et al. 2011). It is noteworthy that in many methods this boundary is not explicit, and some features are extracted during the prediction phase; inferences are then made from these features together with the previously extracted ones. The general architecture of a link prediction system is shown in Fig. 4.

7 Systemic analytical framework

In this section, a triplex analytical framework is presented and its usefulness in selecting an appropriate method is demonstrated. Since the structures of social networks differ, a variety of methods has been presented to solve the link prediction problem. The aim of this analytical framework is to provide a platform that analyzes the most common and recent methods and classifies them in a coherent way. In this framework, evaluation criteria are also introduced, by which the classified approaches are analyzed and compared. This analysis and comparison lead to a better understanding of the approaches, because the relative advantages of the methods are often unknown; this has a direct impact on performance in a particular setting, and the lack of such a framework poses challenges.

This paper explores a vast range of link prediction methods in social networks; the structure of the framework is accordingly built on three components. The aim of this study is to illustrate the different types of techniques and identify their fairness and effectiveness. The proposed framework comprises three components:

  1. Classification of Link Prediction Approaches

  2. Evaluation Criteria

  3. Analytical evaluation

These three components are described in detail as follows.

Fig. 5

Link prediction approaches

7.1 Classification of link prediction approaches

Classifying link prediction approaches is the first component of the proposed framework. A review of the wide range of factors presented in this paper shows that link prediction methods can be classified as follows (Fig. 5). At the highest level, the methods are divided into heuristic based approaches and learning based approaches. In the heuristic based approach, the prediction phase is carried out directly after determining the features (Liben-Nowell and Kleinberg 2007; Lichtenwalter et al. 2010; Sarkar et al. 2011); this group of algorithms computes a similarity score between a pair of nodes. Learning based models, by contrast, extract patterns from the input data, which can be a preprocessed feature vector, an explicit graph structure, or both (Wang et al. 2011; Yang et al. 2012; Tylenda et al. 2009; Ermiş et al. 2012). The distinction is that learning algorithms themselves extract a pattern from the data to predict unseen links, whereas heuristic based approaches predict links from known similarities in the graph structure.

In the following, each approach is described in detail. We begin with the heuristic based approach and its subcategories, and then review the variations of learning based models.

7.1.1 Heuristic based approaches

This approach includes methods that attempt to predict links via heuristic information, which captures the shared characteristics or contexts of two nodes. Based on the information captured, the heuristic based approach is divided into two classes: node neighborhood and ensemble of all paths (Liben-Nowell and Kleinberg 2007; Lü and Zhou 2011; Sarkar et al. 2011; Feng et al. 2012). Although most methods in this approach do not consider the dynamic nature of the network, on dynamic networks their results are still better than random prediction. Neighborhood based methods are known as local indicators, while path based methods are known as global indicators (Lü and Zhou 2011; Bliss et al. 2014). In these techniques, a score is assigned to each unobserved link, and the top K links with the highest scores are predicted. An advantage of these algorithms is that no domain knowledge is necessary to compute the similarity score (Lichtenwalter et al. 2010). Furthermore, they are applicable in discovering homophily patterns, i.e., the tendency of entities to be related to other entities with similar characteristics (Nickel et al. 2016). The heuristic based approach is also known as the similarity-based approach.

  • Node neighborhoods

    Node neighborhood indices encode information about the relative overlap between the neighborhoods of two nodes (Liben-Nowell and Kleinberg 2007). The expectation is that the more "similar" two nodes are, the more likely a link between them is predicted. Because of their simplicity and few parameters, these indices are used in many studies on link prediction. Another advantage is their generality: they need no specific domain knowledge. Alongside these positive aspects, they cannot explore evolutionary and non-linear patterns (Dunlavy et al. 2011). Four popular node neighborhood indices are explained below.

    • (a) Common neighbors This index is a building block of many other approaches (Liben-Nowell and Kleinberg 2007; Li et al. 2014b), with the mathematical expression:

      $$\begin{aligned} \textit{score}_{CN}(x , y ) = |\varGamma (x) \cap \varGamma (y) | \end{aligned}$$
      (6)

      where \(\varGamma (x)\) and \(\varGamma (y)\) denote the neighbor set of nodes x and y, respectively.

    • (b) Jaccard coefficient This similarity metric is mostly used in information retrieval. The Jaccard coefficient is a normalized common-neighbors count (7).

      $$\begin{aligned} \textit{score}_{JC}(x , y ) = \dfrac{|\varGamma (x) \cap \varGamma (y) |}{|\varGamma (x) \cup \varGamma (y) |} \end{aligned}$$
      (7)

      In fact, it defines the probability that a common neighbor of a pair of nodes x and y would be selected if the selection were made randomly from the union of the neighbor sets of x and y. However, the experimental results of Liben-Nowell and Kleinberg (2007) showed that the Jaccard coefficient performs worse than the plain number of common neighbors.

    • (c) Adamic/Adar Adamic and Adar (2003) proposed this score as a similarity index between two web pages. For link prediction, Liben-Nowell and Kleinberg (2007) customized the index as below, where the common neighbors are considered as features.

      $$\begin{aligned} \textit{score}_{AA}(x , y ) = \sum _{z \in \varGamma (x) \cap \varGamma (y)} \frac{1}{\log | \varGamma (z)|} \end{aligned}$$
      (8)

      Conceptually, Adamic/Adar gives more weight to neighbors that are not shared with many others. Reported results of existing work on link prediction show that Adamic/Adar works better than the previous two metrics.

    • (d) Preferential attachment The basic premise is that the probability that a new edge has node x as an endpoint is proportional to \(|\varGamma (x)|\), the current number of neighbors of x. The probability that a new link will connect x and y is proportional to:

      $$\begin{aligned} \textit{score}_{PA}(x , y ) = |\varGamma (x)| . |\varGamma (y)| \end{aligned}$$
      (9)
  • Ensemble of all paths

    Paths between two nodes are another heuristic that can be used for computing similarities between node pairs. Exploring the graph is both a strength and a weakness: path exploration is conceptually simple, but in a large-scale network it is time-consuming. It is therefore often used as the basic idea of other methods (Rossetti et al. 2015). A brief introduction to four important global indices follows.

    • (a) Katz The Katz index is a well-known path-based measure that counts all paths between two nodes (Dunlavy et al. 2011). The paths are exponentially damped by length, giving more weight to shorter paths. The measure is defined in (10), where \(path_{x,y}^l\) is the set of all paths of length l from x to y and \(\beta > 0\); a very small \(\beta \) makes the Katz metric behave much like the CN metric, because long paths then contribute very little to the final similarity (Liben-Nowell and Kleinberg 2007; Lü and Zhou 2011).

      $$\begin{aligned} \textit{score}_{katz}(x,y)=\sum _{l=1}^{\infty } \beta ^l \cdot |path_{x,y}^l|=\left[ \beta A+\beta ^2A^2+\beta ^3A^3+\cdots \right] _{x,y} \end{aligned}$$
      (10)
    • (b) Hitting time For two vertices x and y in a graph, the hitting time \(H_{x,y}\) is the expected number of steps required for a random walk starting at x to reach y. A shorter hitting time indicates that the nodes are similar, so they may form unseen links (Al Hasan and Zaki 2011). For an undirected graph, it can be symmetrized as:

      $$\begin{aligned} \textit{score}_{HT_{ugraph}}(x,y) = H_{x,y} + H_{y,x} \end{aligned}$$
      (11)

      The hitting time metric is easy to compute by performing a number of trial random walks. On the downside, its value can have high variance, so prediction by this feature can be poor. Due to the scale-free nature of social networks, some vertices may have a very high stationary probability (\(\pi \)) in a random walk; to safeguard against this, the hitting time can be normalized by multiplying it by the stationary probability of the respective node, as shown below:

      $$\begin{aligned} \textit{score}_{NHT_{ugraph}}(x,y) = H_{x,y}.\pi _y + H_{y,x}.\pi _x \end{aligned}$$
      (12)
    • (c) Rooted PageRank The similarity score between two vertices x and y can be measured as the stationary probability of y in a random walk that returns to x with probability \(1-\beta \) in each step and moves to a random neighbor with probability \(\beta \); this adaptation of PageRank for link prediction (Feng et al. 2012) is called rooted PageRank. Let D be the diagonal degree matrix with \(D[i, i] =\sum _j A[i, j]\), and let \(N = D^{-1}A\) be the adjacency matrix with row sums normalized to 1. Then,

      $$\begin{aligned} \textit{score}_{RPR}(x,y) = \left[ (1-\beta )(I-\beta N)^{-1}\right] _{x,y} \end{aligned}$$
      (13)
    • (d) SimRank SimRank is defined in a self-consistent way, according to the assumption that two nodes are similar if they are connected to similar nodes.

      $$\begin{aligned} \textit{score}_{SR}(x,y) = {\left\{ \begin{array}{ll} 1 &{} \text {if }{} \textit{x=y}\\ \gamma \cdot \frac{\sum _{a \in \varGamma (x) }\sum _{b \in \varGamma (y)}simRank(a,b)}{|\varGamma (x)| \cdot |\varGamma (y)|} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
      (14)

      where \(\gamma \in [0,1]\) is the decay factor. SimRank can also be interpreted via a random-walk process: \(\textit{score}_{SR}(x,y)\) measures how soon two random walkers, starting from nodes x and y respectively, are expected to meet at some node (Lü and Zhou 2011).
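Several of the indices above (Eqs. 6–10) can be sketched compactly in plain Python; the toy graph below is invented for illustration, and the Katz score is truncated at a maximum walk length rather than summed to infinity:

```python
import math

# Toy undirected graph as an adjacency dict; G[x] plays the role of Γ(x).
G = {
    "x": {"a", "b", "c"},
    "y": {"b", "c", "d"},
    "a": {"x"},
    "b": {"x", "y"},
    "c": {"x", "y"},
    "d": {"y"},
}

def common_neighbors(x, y):          # Eq. (6)
    return len(G[x] & G[y])

def jaccard(x, y):                   # Eq. (7)
    return len(G[x] & G[y]) / len(G[x] | G[y])

def adamic_adar(x, y):               # Eq. (8)
    return sum(1.0 / math.log(len(G[z])) for z in G[x] & G[y])

def preferential_attachment(x, y):   # Eq. (9)
    return len(G[x]) * len(G[y])

def katz(x, y, beta=0.05, max_len=4):
    """Truncated Eq. (10): sum beta^l * (#walks of length l from x to y),
    counting walks as the matrix powers A^l do."""
    score, frontier = 0.0, {x: 1}
    for _ in range(max_len):
        nxt = {}
        for node, cnt in frontier.items():
            for nb in G[node]:
                nxt[nb] = nxt.get(nb, 0) + cnt
        frontier = nxt
        beta_l = beta ** (len(frontier) and _ + 1)  # beta^l for this length
        score += beta_l * nxt.get(y, 0)
    return score
```

For the pair (x, y) this gives two common neighbors, a Jaccard coefficient of 0.5, and a preferential attachment score of 9, matching a hand calculation.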

A brief description of these similarity indices is given in Table 1. In the description column, the weaknesses and strengths of each index are noted.

Table 1 Popular similarity indices in link prediction. The set \(\varGamma (x)\) consists of the neighbors of x in \(G_{train}\)
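Among the path-based measures, rooted PageRank (Eq. 13) can be computed directly from its closed form; a sketch on an arbitrary toy graph, using a dense linear solve (large graphs would use power iteration instead):

```python
import numpy as np

# Adjacency matrix of a small undirected toy graph (nodes 0..3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

def rooted_pagerank(A, beta=0.85):
    """Matrix whose [x, y] entry is the stationary probability of y in a
    random walk that restarts at x with probability 1 - beta (Eq. 13)."""
    N = np.diag(1.0 / A.sum(axis=1)) @ A   # row-normalized adjacency D^{-1} A
    I = np.eye(A.shape[0])
    return (1 - beta) * np.linalg.inv(I - beta * N)

S = rooted_pagerank(A)
```

Since N is row-stochastic, every row of the resulting score matrix sums to 1, which gives a quick sanity check on an implementation.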

7.1.2 Learning-based approaches

At this level of abstraction, a model is learned from given features, extracts patterns, and eventually performs link prediction (Nickel et al. 2016). Conceptually, learning based models aim at abstracting the underlying structure of the input graph and predicting unseen links with the learned model (Al Hasan et al. 2006). This approach is classified into two groups, classification models and latent-feature-based models, where the key idea behind the grouping is the type of learning model. Classification models extract patterns from input data and learn a model for the prediction of missing/future links; the input data are preprocessed feature vectors in which each entry is a known quantity, such as a similarity index or external node or link information (Al Hasan et al. 2006; Liu et al. 2012; Fire et al. 2011). In the latent-feature-based model, such preprocessed feature vectors are optional: the model extracts latent features from the input graph and learns from them, while it may additionally use external information such as social network data or a preprocessed feature vector. The following two items describe the classification model and the latent-feature-based model.
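The classification setup can be sketched as follows: similarity indices become entries of a feature vector for each node pair, and a toy perceptron (standing in, purely for illustration, for the SVMs, decision trees and other learners cited below) labels pairs as link (1) or no-link (0). The graph and labels are invented:

```python
G = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"},
     "d": {"b", "e"}, "e": {"d"}}

def features(x, y):
    cn = len(G[x] & G[y])          # common-neighbors feature
    pa = len(G[x]) * len(G[y])     # preferential-attachment feature
    return [cn, pa]

def predict(w, b, x, y):
    f = features(x, y)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

def train_perceptron(pairs, labels, epochs=50, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x, y), t in zip(pairs, labels):
            err = t - predict(w, b, x, y)
            f = features(x, y)
            w = [w[0] + lr * err * f[0], w[1] + lr * err * f[1]]
            b += lr * err
    return w, b

# Toy training data: pairs labeled 1 are (missing) links, 0 are non-links.
pairs = [("a", "b"), ("c", "d"), ("a", "e"), ("c", "e")]
labels = [1, 1, 0, 0]
w, b = train_perceptron(pairs, labels)
```

With these toy labels, the learned weights predict the unobserved pair (b, e) as a link, since b and e share the common neighbor d.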

Fig. 6

The flowchart of the classification model

Fig. 7

Grouping of the most-used features

Fig. 8

Two different types of information that can be obtained from a network

Fig. 9

Example of a heterogeneous network

Fig. 10

Factorization of an adjacency tensor \({\underline{\mathbf{X}}}\) using RESCAL model. (Reproduced with permission from Nickel and Tresp 2013b)

  • Classification model

    Almost every method in this category uses supervised learning. After identifying the set of features which are key to supervised learning (Al Hasan et al. 2006; Backstrom and Leskovec 2011), the link prediction problem is mapped to binary classification (Lee et al. 2013; Fig. 6). Although in binary classification it is important to predict both classes, in link prediction the issue is to predict the missing or future link. In classification models for link prediction, researchers have used supervised models including the support vector machine (Bliss et al. 2014), decision trees (Wang et al. 2011), the multi-layer perceptron (Al Hasan et al. 2006), supervised random walks (Backstrom and Leskovec 2011) and others. Among supervised models, the support vector machine has been found to perform best in predicting future links (Nguyen-Thi et al. 2015).

    Al Hasan et al. (2006) grouped the features into categories. Topological features are mostly obtained from heuristic-based methods; instead of directly predicting a link, such a measure is used as an entry of the feature vector for learning a model. The others are node attribute features, such as profile information in social networks. Figure 7 shows some of the properties in the feature vector. Although a feature vector alone allows links to be predicted with any binary classification model, creating a valuable feature vector is an additional task.

  • Latent-feature-based model

    An underlying assumption of the latent-feature-based model is that a model can be built which discovers latent features from the structure of the graph (Sarkar and Moore 2005; Miller et al. 2009; Sarkar et al. 2012; Sewell and Chen 2016). As mentioned earlier, there are two types of information that can be obtained from the network (Fig. 8). Researchers in this domain believe that the structure of the graph, and the combination of these two types of information, carry latent characteristics that cannot be captured with simple techniques, especially when modeling dynamic characteristics or heterogeneous networks with multi-relational data (Miller et al. 2009; Bordes et al. 2014; Zhu et al. 2016b). The goal of the latent feature based approach is to learn from observed links a model that can predict the values of unobserved entries. The latent representation of each node corresponds to a point on the surface of a unit hypersphere. In the latent-feature-based model, each entity is associated with a vector \(\mathbf e _i \in \mathbb {R}^{H_e}\), where \(H_e\ll N_e\) (\(N_e\) is the number of entities in the graph), and each link is explained via the latent features of the entities. For instance, as shown in Fig. 9, each node can be modeled via the vectors

    $$\begin{aligned} \mathbf e _{Ali} =&\begin{bmatrix} 0.1 \\ 0.9 \end{bmatrix}&\mathbf e _{Sina} =&\begin{bmatrix} 0.15 \\ 0.85 \end{bmatrix}&\mathbf e _{Mohammad} =&\begin{bmatrix} 0.1 \\ 0.8 \end{bmatrix}\\ \mathbf e _{Sam} =&\begin{bmatrix} 0.9 \\ 0.5 \end{bmatrix}&\mathbf e _{Reza} =&\begin{bmatrix} 0.92 \\ 0.35 \end{bmatrix}&\mathbf e _{Borna} =&\begin{bmatrix} 0.82 \\ 0.45 \end{bmatrix} \end{aligned}$$

    where the component \(e_{i1}\) corresponds to the latent feature proficient developer and \(e_{i2}\) corresponds to being healthy. Thus, since Ali has a teammate connection, it can be inferred that he is healthier; likewise, Sam has colleague connections to Reza and Borna, so the latent feature proficient developer is higher for him. Note that, unlike this example, the latent features inferred by the following models are typically difficult to interpret. The key intuition behind relational latent feature models is that the relationships between entities can be derived from interactions of their latent features (Sarkar and Moore 2005; Rastelli et al. 2016; Rahman and Al Hasan 2016; Sewell and Chen 2016). However, there are many possible ways to model these interactions, and many ways to derive the existence of a relationship from them. With the dimensionality constraint, link prediction is efficient in both computational time and storage cost (Sarkar and Moore 2005; Nickel et al. 2016; Zhu et al. 2016b). In addition, varying the dimension of the latent space offers an opportunity to fine-tune the compromise between computational cost and solution quality: a higher dimension leads to a more accurate latent representation of each node, but also yields a higher computational cost (Zhu et al. 2016b).
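This intuition can be sketched numerically. The snippet below reuses the illustrative vector values from the example and scores a candidate link with a simple inner product; this is only one of the many possible interaction functions mentioned above, chosen here purely for illustration.

```python
import numpy as np

# Illustrative latent vectors from the example above:
# dimension 1 = "proficient developer", dimension 2 = "healthy".
embeddings = {
    "Ali":      np.array([0.10, 0.90]),
    "Sina":     np.array([0.15, 0.85]),
    "Mohammad": np.array([0.10, 0.80]),
    "Sam":      np.array([0.90, 0.50]),
    "Reza":     np.array([0.92, 0.35]),
    "Borna":    np.array([0.82, 0.45]),
}

def link_score(u, v):
    """Score a candidate link via the inner product of latent features.

    Many latent-feature models use bilinear or distance-based
    interaction functions instead; the inner product is the simplest.
    """
    return float(embeddings[u] @ embeddings[v])

# Nodes with similar latent features receive a higher link score:
print(link_score("Ali", "Sina"))   # similar vectors -> high score
print(link_score("Ali", "Reza"))   # dissimilar vectors -> lower score
```

With these toy values, the Ali-Sina pair scores higher than Ali-Reza, matching the intuition that links form between entities with interacting latent features.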

    Several approaches have been presented to obtain these latent features. The following three items will introduce each approach in detail.

    • (a) Tensor factorization based

      Tensor factorization is a well-known approach for structured data in different learning contexts. Its success in the link prediction problem is due to its high ability to model and analyze relational data (Dunlavy et al. 2011; Ermiş et al. 2012; Gao et al. 2011). Tensor-based methods usually use second-order (matrix (Menon and Elkan 2011)) or third-order tensors. In future link prediction, the third dimension corresponds to the different time snapshots; this approach has a reasonable ability to track latent features over time. More formally, the tensor is defined as:

      $$\begin{aligned} z(i,j,t)={\left\{ \begin{array}{ll} 1 &{} \text {if node } i \text { links to node } j \text { at time } t.\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
      (15)

      which shows that the link from node i to j appeared at time t (Acar et al. 2009; Spiegel et al. 2011; Dunlavy et al. 2011; Yao et al. 2015; Zhu et al. 2016b; Han and Moutarde 2016). On the other hand, in heterogeneous networks with multi-relational data, the third dimension indicates the type of link. This is most applicable in heterogeneous networks where links have a high dependency (Gao et al. 2011; Ermiş et al. 2012; Nickel and Tresp 2013b, a; London et al. 2013; Nickel et al. 2014; Krompaß et al. 2014). A third-order tensor is used to define multi-relational data as follows:

      $$\begin{aligned} z(i,j,k)={\left\{ \begin{array}{ll} 1 &{} \text {if } \textit{relation}_k (\textit{node}_i , \textit{node}_j ) \text { is true}.\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
      (16)
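Constructing these binary tensors from raw data is straightforward. The sketch below, with hypothetical names and toy sizes, builds the tensor of Eq. (15) from a list of timestamped edges; for Eq. (16) the third index would simply be a relation type instead of a time snapshot.

```python
import numpy as np

def build_adjacency_tensor(edges, n_nodes, n_steps):
    """Build the binary tensor z of Eq. (15):
    z[i, j, t] = 1 iff node i links to node j at time snapshot t.
    For multi-relational data (Eq. 16), t is a relation index instead."""
    z = np.zeros((n_nodes, n_nodes, n_steps), dtype=np.int8)
    for i, j, t in edges:
        z[i, j, t] = 1
    return z

# Toy example: 4 nodes observed over 3 snapshots.
edges = [(0, 1, 0), (1, 2, 0), (0, 1, 1), (2, 3, 2)]
z = build_adjacency_tensor(edges, n_nodes=4, n_steps=3)
print(z[0, 1, 0], z[3, 2, 2])  # 1 0 (the graph here is directed)
```

Note that the tensor is directed: the edge (2, 3, 2) does not imply z[3, 2, 2] = 1.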

      Factorization techniques like the CP (CANDECOMP/PARAFAC) and Tucker models can be considered higher-order generalizations of the matrix singular value decomposition (SVD) (Keyvanpour and Moradi 2014) and principal component analysis (PCA) (Kolda and Bader 2009; Spiegel et al. 2011). However, the CP model is more advantageous in terms of interpretability, uniqueness of solutions and parameter determination (Spiegel et al. 2011). For a three-dimensional tensor \({\underline{\mathbf{Z}}}\) of size \(m \times n \times t\), its K-component CP factorization is defined as:

      $$\begin{aligned} \sum _{k=1}^K \lambda _ka_k\circ b_k\circ c_k \end{aligned}$$
      (17)

      Symbol \(\circ \) stands for the outer product, and \( \lambda _k \in \mathbb {R}^+ \), \( a_k \in \mathbb {R}^m \), \( b_k \in \mathbb {R}^n \), \( c_k \in \mathbb {R}^t\), where \(k = 1, \cdots , K\). Each summand \((\lambda _ka_k\circ b_k\circ c_k )\) is called a component, and each vector is called a factor. RESCAL is one of the prominent latent factor models for relational learning (Nickel and Tresp 2013b), which factorizes an adjacency tensor \({\underline{\mathbf{X}}}\) into a core tensor \({\underline{\mathbf{R}}} \in \mathbb {R}^{r\times r \times m}\) and a factor matrix \(\mathbf A \in \mathbb {R}^{n\times r}\) such that

      $$\begin{aligned} {\underline{\mathbf{X}}} \approx {\underline{\mathbf{R}}} \times _1 \mathbf A \times _2 \mathbf A \end{aligned}$$
      (18)

      Equation (18) can be equivalently specified as \(x_{ijk}\approx \mathbf a _i^\top \mathbf R _k \mathbf a _j\), where the column vector \(\mathbf a _i \in \mathbb {R}^r\) denotes the i-th row of \(\mathbf A \) and the matrix \(\mathbf R _k \in \mathbb {R}^{r\times r}\) denotes the k-th frontal slice of \({\underline{\mathbf{R}}}\). Consequently, \(\mathbf a _i\) corresponds to the latent representation of \(entity_i\), while \(\mathbf R _k\) models the interactions of the latent variables for \(relation_k\) (Nickel et al. 2016; Fig. 10). Among the presented approaches, tensor factorization has a high capability for representing multi-relational data (Ermiş et al. 2012; Spiegel et al. 2011), and the tensor can even be expanded to higher orders to represent dynamic heterogeneous networks. Nevertheless, as the network size grows, the computational cost grows rapidly.
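The two factorizations above can be sketched side by side. The snippet below uses random factors and toy sizes (all hypothetical) only to show the shapes and score computations; it does not perform the actual fitting, which would minimize the reconstruction error over observed entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# CP model of Eq. (17): Z ≈ sum_k lambda_k a_k ∘ b_k ∘ c_k
m, n, t, K = 5, 5, 3, 2                  # tensor size and CP rank (toy values)
lam = rng.random(K)
A, B, C = rng.random((m, K)), rng.random((n, K)), rng.random((t, K))
# Sum of K rank-one outer products, expressed as an einsum:
Z_hat = np.einsum('k,ik,jk,tk->ijt', lam, A, B, C)

# RESCAL model of Eq. (18): x_ijk ≈ a_i^T R_k a_j
n_e, r, n_rel = 6, 3, 2                  # entities, latent rank, relations
E = rng.random((n_e, r))                 # factor matrix A (one row per entity)
R = rng.random((n_rel, r, r))            # one r x r frontal slice per relation

def rescal_score(i, j, k):
    """Predicted score for relation_k(entity_i, entity_j)."""
    return float(E[i] @ R[k] @ E[j])

print(Z_hat.shape, rescal_score(0, 1, 0))
```

The einsum line is exactly the component sum of Eq. (17): each entry Z_hat[i, j, t] equals the sum over k of lambda_k * A[i, k] * B[j, k] * C[t, k].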

      As shown by Narita et al. (2012), one significant obstacle in tensor factorization is that its prediction accuracy tends to be poor because observations made in real datasets are typically sparse. Several studies, such as coupled tensor factorization (Yılmaz et al. 2011; Acar et al. 2011; Narita et al. 2012; Ermiş et al. 2012, 2015; Nakatsuji et al. 2016), have attempted to incorporate side information such as node attributes or social connections to improve prediction accuracy. To describe coupled tensor factorization, consider a third-order tensor \({\underline{\mathbf{X}}} \in \mathbb {R}^{I\times J \times K}\) with some elements observed and others remaining unobserved. Three non-negative matrices, \(\mathbf A _1\in \mathbb {R}^{+^{I\times I}}\), \(\mathbf A _2\in \mathbb {R}^{+^{J\times J}}\), \(\mathbf A _3\in \mathbb {R}^{+^{K\times K}}\), each correspond to one of the three modes of \({\underline{\mathbf{X}}}\) and contain side information such as similarities or node attributes. Factorizing these three matrices jointly with \({\underline{\mathbf{X}}}\) can fill the unobserved parts of \({\underline{\mathbf{X}}}\) and overcome the sparsity problem.

      Another approach to learning from graphs is based on matrix factorization, where, prior to the factorization, the adjacency tensor \({\underline{\mathbf{Y}}} \in \mathbb {R}^{N_e \times N_e \times N_t}\) is reshaped into matrix \(\mathbf Y \in \mathbb {R}^{N_e^2 \times N_t}\) by associating rows with node-node pairs \((e_i , e_j)\) and columns with time t or relation \(r_k\) (Jiang et al. 2012; Riedel et al. 2013). Unfortunately, these formulations lose information as compared to tensor factorization. For instance, if each node-node pair is modeled via a different latent representation, the information that the relationships \(y_{ijk}\) and \(y_{pjq}\) share the same object is lost. It also leads to an increased memory complexity, since a separate latent representation is computed for each pair of entities.

    • (b) Nonparametric model

      The nonparametric trick yields another type of latent feature model. Methods of this type mostly use Bayesian nonparametric techniques to discover discriminative latent features and automatically infer the unknown social dimension (Wang et al. 2007; Miller et al. 2009; Yu et al. 2009; Cao et al. 2010; Kim et al. 2013; Zhu et al. 2016a). Basically, each entity is described by a set of binary features (Schmidt and Morup 2013). Nonparametric models allow the number of latent features to be inferred simultaneously with the features themselves (Sarkar et al. 2012). On the other hand, some nonparametric models are kernel based; these include kernel regression, which computes kernel similarities between a query and all members of the training set (Collomb and Härdle 1986; Yu et al. 2006; Nguyen and Mamitsuka 2011; Sarkar et al. 2012).

      Miller et al. (2009) introduced a basic nonparametric model. In it, each entity is described by a set of binary features, none of which takes priority over the others. The probability of having a link from one entity to another is entirely determined by the combined effect of all pairwise feature interactions. If there are K features, then \(\mathbf Z \) is the \(N \times K\) binary matrix where each row corresponds to an entity and each column corresponds to a feature, such that \(z_{ik}\equiv Z(i,k)= 1\) if the \(i^{ th }\) entity has feature k and \(z_{ik}=0\) otherwise. The model has a real-valued weight matrix (\(K\times K\)) where \(w_{kk'}\equiv W(k,k')\) is the weight that affects the probability of a link from entity i to entity j when entity i has feature k and entity j has feature \(k'\).

      It is assumed that links are conditionally independent given \(\mathbf Z \) and \(\mathbf W \), and that only the features of entities i and j influence the probability of a link between those entities. This defines the likelihood:

      $$\begin{aligned} Pr (\, Y\, |\, Z\, ,\, W\, )\, = \, \prod _{i,j}\, Pr \, (\, y_{ij}\, |\, Z_iZ_j\, ,\, W\, ) \end{aligned}$$
      (19)

      where the product ranges over all pairs of entities. Given the feature matrix \(\mathbf Z \) and weight matrix \(\mathbf W \) , the probability that there is a link from entity i to entity j is given as:

      $$\begin{aligned} Pr (\, y_{ij}=1\, |\, Z\, ,\, W\, )\, = \sigma ( \, Z_iWZ_j^\top ) = \sigma \left( \, \sum _{k,k'}\, z_{ik}\,z_{jk'}\,w_{kk'} \,\right) \end{aligned}$$
      (20)

      where \(\sigma (.)\) is a sigmoid function that transforms values on \((-\infty , +\infty )\) to (0, 1) .
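A minimal instantiation of Eq. (20) can make the binary-feature mechanism concrete. The matrices Z and W below are toy values chosen for illustration, not learned parameters; in the actual model they would be inferred with Bayesian nonparametric machinery.

```python
import numpy as np

def sigmoid(x):
    """Map values on (-inf, +inf) to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Toy instantiation: N = 3 entities, K = 2 binary features.
Z = np.array([[1, 0],
              [1, 1],
              [0, 1]])            # Z[i, k] = 1 iff entity i has feature k
W = np.array([[ 2.0, -1.0],
              [-1.0,  3.0]])      # W[k, k'] weights feature-pair interactions

def link_probability(i, j):
    """Pr(y_ij = 1 | Z, W) = sigmoid(Z_i W Z_j^T), as in Eq. (20)."""
    return float(sigmoid(Z[i] @ W @ Z[j]))

print(link_probability(0, 0))  # entity 0 with itself: sigmoid(2.0)
print(link_probability(0, 2))  # features 0 and 1 interact with weight -1
```

Only the feature pairs that both entities activate contribute their weights to the logit, exactly as the double sum in Eq. (20) states.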

      Nonparametric models have a high ability to explore evolutionary patterns, especially seasonal fluctuations. It is noteworthy that this effectiveness has been established only in comparison to heuristic-based methods, not to the other latent-feature-based models (Sarkar et al. 2012, 2014; Zhu et al. 2016a). Among the learning-based models, the nonparametric model is the fastest, because it has no or only a few parameters. In addition, most of the methods use an LSH implementation to speed themselves up. LSH, or locality sensitive hashing, is often used in databases for table lookups or for retrieving matching items from a large database (Sarkar et al. 2012). This model is also known as the probabilistic model (Wang et al. 2007; Clauset et al. 2008; Schmidt and Morup 2013).

    • (c) Deep model

      Due to the significant results of deep learning approaches in computer vision, speech recognition, and natural language processing (Bengio et al. 2013; Li Deng 2014; Goodfellow et al. 2016), researchers were motivated to use deep models in the link prediction task (Socher et al. 2013; Liu et al. 2013; Perozzi et al. 2014; Li et al. 2014a, b; Zhai and Zhang 2015). In general, deep learning is a set of algorithms in machine learning that performs learning tasks at multiple levels, corresponding to different levels of abstraction. It typically uses artificial neural networks. The levels in these learned statistical models correspond to distinct levels of concepts, where higher-level concepts are defined from lower-level ones, and the same lower-level concepts can help to define many higher-level concepts (Bengio et al. 2013; Goodfellow et al. 2016).

      Deep models are new to link prediction. Multiple levels of representation help to detect nonlinear latent relationships (Li et al. 2014a, b; Liu et al. 2013). Although most of the methods use a deep neural network, Perozzi et al. (2014) and Grover and Leskovec (2016) are also presented as deep models owing to their notion of deep random walks. In fact, three different ideas for using deep models have been pursued in this domain:

      \(\bullet \) :

      Using multiple layers of RBMs as building blocks of the deep model. The Restricted Boltzmann Machine is a building block of many deep models. The reasons for its success are that it supports distributed representation, has associative memory, and inference in it is easily done (Hinton et al. 2006). To use it for relational data, some modifications are applied (Liu et al. 2013; Li et al. 2014a, b). An RBM is a neural network that contains two layers. It has a single layer of hidden units that are not connected with each other; the hidden units have undirected, symmetrical connections to a layer of visible units. Each unit, hidden or visible, has a bias. The values of the visible and hidden units are often binary, or stochastic units (taking the value 0 or 1 based on a probability) (Hinton et al. 2006). When a vector \(\mathbf v (v_1, v_2, \cdots , v_n)\) is input to the visible layer, the binary state \(h_j\) of each hidden unit is set to 1 with a probability given by:

      $$\begin{aligned} p(h_j=1 | v ) = \sigma ( b_j + \sum _{i}^{n} v_iw_{ij}) \end{aligned}$$
      (21)

      where \(\sigma (x) = 1/ (1+e^{-x})\), \(b_j\) is the bias of the hidden unit j.

      When a vector \(\mathbf h (h_1, h_2, \cdots , h_m)\) is input to the hidden layer, the binary state \(v_i\) of each visible unit is set to 1 with probability

      $$\begin{aligned} p(v_i=1 | h ) = \sigma ( a_i+ \sum _{j}^{m} h_jw_{ij}) \end{aligned}$$
      (22)

      where \(a_i\) is the bias of visible unit i.
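Equations (21) and (22) can be sketched directly. The snippet below uses random weights and toy layer sizes (all hypothetical) to compute the two conditional probabilities and sample a hidden state; a real RBM would additionally train W, a and b, e.g. with contrastive divergence, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 4                       # toy layer sizes
W = rng.normal(0, 0.1, (n_visible, n_hidden))    # symmetric weights w_ij
a = np.zeros(n_visible)                          # visible biases a_i
b = np.zeros(n_hidden)                           # hidden biases b_j

def p_hidden_given_visible(v):
    """Eq. (21): p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)."""
    return sigmoid(b + v @ W)

def p_visible_given_hidden(h):
    """Eq. (22): p(v_i = 1 | h) = sigmoid(a_i + sum_j h_j w_ij)."""
    return sigmoid(a + h @ W.T)

v = rng.integers(0, 2, n_visible)                # a binary visible vector
# Stochastically set each hidden unit to 1 with its conditional probability:
h = (p_hidden_given_visible(v) > rng.random(n_hidden)).astype(int)
print(p_hidden_given_visible(v), h)
```

Stacking several such layers, with each layer's hidden activations feeding the next layer's visible units, yields the Deep Belief Network used in the link prediction work discussed below.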

      Liu et al. (2013) utilize a Deep Belief Network (a stack of Restricted Boltzmann Machines) as a model for unsupervised link prediction. They provide the original link feature vectors as input to the first RBM's visible units and use the output of the top RBM's hidden units as the represented features (Fig. 11a). After training the top RBM, each possible label is tried in turn with a test vector (Fig. 11b).

      Li et al. (2014b) introduce ctRBM, which inherits the advantages of the Restricted Boltzmann Machine family, for predicting temporal links. It contains distributed hidden states, which imply an exponentially large state space for managing complex nonlinear variations, and its conditionally independent structure makes it easy to plug in external features. Each node in the model has two types of directed connections to the hidden variables: temporal connections from the node's historical observations and neighbor connections from the expectation of its local neighbors' predictions (Fig. 12).

      \(\bullet \) :

      Since deep models have produced outstanding results in natural language processing, and data in that field is relational, researchers were prompted to draw inspiration from that field's ideas for link prediction (Perozzi et al. 2014). Link prediction algorithms such as DeepWalk (Perozzi et al. 2014) and node2vec (Grover and Leskovec 2016) use truncated random walks to learn latent representations by treating walks as the equivalent of sentences. They produce a vector representation of each entity as its latent representation via word2vec models (Mikolov et al. 2013); these vector representations of entities carry semantic meanings. The inputs of the model are sequences of nodes sampled from the underlying network, which turn the network into ordered sequences of nodes (Perozzi et al. 2014; Tang et al. 2015; Grover and Leskovec 2016; Zhang et al. 2016). In this model, the algorithm iterates over all possible collocations in the random walks. In fact, this approach generates representations of social networks that are low-dimensional and exist in a continuous vector space, and this representation encodes the latent features of the nodes.
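The walk-generation step of this idea can be sketched as follows, on a hypothetical toy graph. Only the "sentence" construction is shown: in DeepWalk the resulting corpus would then be fed to a word2vec-style skip-gram model to learn the embeddings, and node2vec would additionally bias the walk transitions; both of those steps are omitted here.

```python
import random

# Toy undirected graph as an adjacency list (hypothetical).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(start, length, rng):
    """One truncated random walk, treated as a 'sentence' of node ids."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def build_corpus(walks_per_node, length, seed=0):
    """DeepWalk-style corpus: several walks starting from every node.
    This corpus would then train a word2vec-style model (not shown)."""
    rng = random.Random(seed)
    return [random_walk(v, length, rng)
            for v in graph for _ in range(walks_per_node)]

corpus = build_corpus(walks_per_node=2, length=5)
print(len(corpus), corpus[0])
```

Each walk is a valid path in the graph, so co-occurrence within a walk reflects graph proximity, which is what the downstream embedding model exploits.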

      \(\bullet \) :

      The third idea is to create new embedding neural networks that are compatible with the nature of relational data. In these embedding models each relation is defined as a triple:

      $$\begin{aligned} (h, l, t ) \, \, \, ,\, \, \, h, t \in E , l \in L \end{aligned}$$
      (23)

      A triple is composed of two entities h, t and a relation l. Each model assigns a score to a triple using a score function, which measures how likely it is that the triple is correct. Entities are represented by a low-dimensional vector, the embedding, and the relation acts as an operator on them (Socher et al. 2013; Fig. 13). Together, the embeddings and operators define a score function that is learned so that triples observed in knowledge bases have a higher score than unobserved ones. In other words, link prediction is a complementary task in knowledge base completion (Bordes et al. 2014; Socher et al. 2013). Table 2 summarizes some popular scoring functions in the literature, along with the transformation of each score function and its parameters.
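As one concrete instance of such a score function, the sketch below implements a TransE-style score (Bordes et al. 2013), which treats the relation as a translation and scores a triple by the negative distance between h + l and t. The embeddings here are random placeholders, not trained parameters, and the sizes are hypothetical; training would adjust them so observed triples outscore corrupted ones.

```python
import numpy as np

rng = np.random.default_rng(2)
n_entities, n_relations, dim = 10, 3, 4    # toy sizes (hypothetical)
E = rng.normal(size=(n_entities, dim))     # entity embeddings
L = rng.normal(size=(n_relations, dim))    # relation embeddings

def transe_score(h, l, t):
    """TransE-style score of a triple (h, l, t): the negative distance
    -||e_h + e_l - e_t||; higher (closer to 0) means more plausible."""
    return -float(np.linalg.norm(E[h] + L[l] - E[t]))

# Link prediction as completion: rank every candidate tail for (h=0, l=1, ?).
scores = [(t, transe_score(0, 1, t)) for t in range(n_entities)]
best_tail = max(scores, key=lambda s: s[1])[0]
print(best_tail)
```

The ranking loop at the end is exactly the corrupted-triple procedure used by the Mean Rank and Hit@n evaluation criteria described later in the survey.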

Fig. 11
figure 11

a Left is DBN structure for unsupervised link prediction and feature representation. b Right is DBN structure for link prediction. (Reproduced with permission from Liu et al. 2013)

Fig. 12
figure 12

a Left Restricted Boltzmann Machine with temporal information, where N is the window size. b Right The summarized neighbor influence \(\eta ^t\) is integrated into the energy function as adaptive bias. (Reproduced with permission from Li et al. 2014b)

Fig. 13
figure 13

Perspective of deep embedded model

Table 2 Comparisons among several multi-relational models in their scoring functions

For a more efficient presentation, this classification, together with the related benefits and requirements, is summarized in Table 3 at the end of Sect. 6.1. In this table, the most important points and technical aspects are illustrated. It is noteworthy that the approaches mentioned in the table are in one-to-one correspondence with Fig. 5.

Table 3 Benefits and challenges of Link prediction models

7.2 Evaluation criteria

The second component of the proposed framework is the introduction of evaluation criteria for link prediction. Evaluation criteria play a significant role in assessing the effectiveness and weaknesses of an algorithm; indeed, they determine the quality and validity of a method. When evaluation criteria are discussed, quantitative criteria always come to mind first. However, evaluation does not end with quantitative criteria: qualitative criteria are also determinants of the validity of an algorithm. Qualitative criteria cannot be measured or represented with a number or a curve; they are derived or inferred from two or more parameters and are often expressed in comparative terms. In the following, quantitative and qualitative criteria are introduced in turn. The purpose of this section is to provide an overview of all the evaluation criteria that can be used to examine the validity, quality and capability of a method.

7.2.1 Quantitative evaluation metrics

Most of the quantitative evaluation metrics used in link prediction have been adopted from other applications such as information retrieval and binary classification (Junuthula et al. 2016). Hence, their behavior has not been specifically investigated for link prediction, although Yang et al. (2015) and Junuthula et al. (2016) study some of these evaluation criteria for temporal link prediction in dynamic networks. A review of a large volume of articles reveals that some of the evaluation criteria differ between predicting future links and predicting missing links. The reason is as follows: in temporal link prediction, all links that are not already in the network are in fact absent, or in other words, they are negative links. In missing link prediction, this assumption does not hold at all, and links that are not visible in the network fall into the two subsets of missing links and negative links. These two distinct views of the problem should be taken into account when choosing the evaluation criterion.

Quantitative criteria can be considered in two broad categories: fixed-threshold metrics and threshold curves (Lichtnwalter and Chawla 2012; Yang et al. 2015; Davis and Goadrich 2006). Fixed-threshold metrics suffer from the limitation that some estimate of a reasonable threshold must be available in the score space (Lichtnwalter and Chawla 2012). Threshold-curve criteria like ROC or PR are an alternative that avoids this weakness.

  • (a) Fixed-threshold metrics

    Fixed-threshold metrics rely on different types of thresholds: a prediction score, a percentage of instances, or a number of instances (Yang et al. 2015). The output of a link predictor is usually a set of real-valued scores, which are compared with a set of binary labels, where each label denotes the presence (1) or absence (0) of an edge. These scores are first transformed into labels with regard to the threshold, and then the evaluation criteria are applied.

    • Accuracy In classification, the most commonly used metric is accuracy, which is defined as:

      $$\begin{aligned} accuracy = \frac{TP + TN}{ P + N} \end{aligned}$$
      (24)

      where TP is the number of true positive predicted links, TN is the number of true negative predicted links, and P and N are the total numbers of positive and negative links, respectively. Accuracy is often deceptive in the case of highly imbalanced data, where high accuracy can be obtained even by a random predictor. Typically, social networks are sparse, and the existing links constitute less than \(10\%\) (approximately, across data sets) of all possible links (vonWinckel 2014; Lichtnwalter and Chawla 2012). This means that accuracy is not a meaningful measure.

    • Precision In link prediction, precision or positive predictive value is defined as:

      $$\begin{aligned} PPV = \frac{TP}{ TP + FP} \end{aligned}$$
      (25)

      where TP is the number of true positive and FP the number of false positive predicted links. This fraction shows the ratio of true positive predictions among all positive predictions. Precision takes all positively predicted links into account, but it can also be evaluated at a given cut-off rank, considering only the top-most results returned by the system. This measure is called precision at n, or P@n. As might be expected, the measured accuracy of link prediction also varies according to the choice of precision measure. Due to its robustness, P@n is a frequently used measure in the domains of information retrieval and machine learning (Spiegel et al. 2011).

    • Recall In link prediction, recall or true positive rate shows how many of the positive links are predicted. In formal literature it is defined as:

      $$\begin{aligned} Recall (TPR) = \frac{TP}{ TP + FN} = \frac{TP}{P} \end{aligned}$$
      (26)

      where TP is the number of true positive and FN the number of false negative predicted links.

    • F1-score Precision and recall are often combined into a single measure using their harmonic mean, known as the F1-score. Equation 27 describes how these two metrics combine into one.

      $$\begin{aligned} F1\text {-}score = 2 \cdot \frac{recall \cdot precision}{recall + precision} = \frac{2TP}{2TP + FP + FN} \end{aligned}$$
      (27)
    • NDCG The normalized discounted cumulative gain (NDCG) over the top k link prediction scores is another information-retrieval-based metric that has been used for evaluating link prediction accuracy (Tylenda et al. 2009). It is a common evaluation metric for measuring the performance of ranking methods (Zhang et al. 2015). It is defined as:

      $$\begin{aligned} NDCG_k = \frac{DCG_k}{IDCG} \end{aligned}$$
      (28)

      where

      $$\begin{aligned} \begin{aligned} DCG_k&= \sum _{i=1}^{k} \frac{2^{r_{i}} - 1 }{\log _2 (i+1)}\\ IDCG_k&= \sum _{i=1}^{|r|} \frac{2^{r_{i}} - 1 }{\log _2 (i+1)} \end{aligned} \end{aligned}$$
      (29)

      In Eq. 29, |r| is the number of positive links in the network and \(r_i \in \{ 0 , 1\}\) is the binary relevance of the link at rank i. NDCG is a fixed-threshold metric that suffers from the same drawbacks as the other fixed-threshold metrics, as discussed by Yang et al. (2015).

    • Mean Rank (MR) This evaluation criterion is used solely in missing link prediction. To evaluate with this measure, the dataset is divided into two sets, train and test, both of which contain only observed links; there are no negative links in either set. The test procedure is then as follows: for each test link, one node is removed and replaced in turn by each of the entities in the dictionary. The dissimilarities (or energies) of those corrupted links are first computed by the model and then sorted in ascending order, and the rank of the correct entity is stored. The mean of those predicted ranks is reported as the mean rank (Bordes et al. 2011).

    • Hit@n The procedure of Hit@n is the same as that of mean rank. Hit@n is the proportion of correct entities ranked in the top n; articles mostly report Hit@10 (Bordes et al. 2013).
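The fixed-threshold metrics above reduce to a few lines of code once the scores have been thresholded into hard labels (or, for MR and Hit@n, once the ranks of the correct entities are known). The sketch below is a minimal, dependency-free implementation with toy inputs; it assumes labels are 0/1 and ranks start at 1, and it omits the zero-division guards a production implementation would need.

```python
def confusion(y_true, y_pred):
    """Counts (TP, TN, FP, FN) from binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):                 # Eq. (24)
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):                # Eq. (25)
    tp, _, fp, _ = confusion(y_true, y_pred)
    return tp / (tp + fp)

def recall(y_true, y_pred):                   # Eq. (26)
    tp, _, _, fn = confusion(y_true, y_pred)
    return tp / (tp + fn)

def f1_score(y_true, y_pred):                 # Eq. (27)
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r)

def mean_rank(ranks):
    """Mean rank of the correct entity over all test triples."""
    return sum(ranks) / len(ranks)

def hit_at_n(ranks, n=10):
    """Proportion of correct entities ranked in the top n."""
    return sum(r <= n for r in ranks) / len(ranks)

# Toy example: scores already thresholded into hard predictions,
# plus hypothetical ranks of three correct entities.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
print(mean_rank([1, 3, 12]), hit_at_n([1, 3, 12], n=10))
```

Note how the toy example already hints at the imbalance problem discussed above: a predictor that always answered 0 would reach the same 4/6 accuracy while predicting no links at all.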

  • (b) Threshold curves

    An alternative to fixed-threshold metrics is the use of threshold curves, owing to the rarity of cases in which researchers possess a reasonable threshold (Davis and Goadrich 2006; Yang et al. 2015). A threshold curve is constructed by shifting the threshold, computing the metrics at each setting, and then drawing a curve through all the computed values. Threshold curves are popular when the class distribution is highly imbalanced. Moreover, for a threshold curve, a single scalar measure known as the area under the curve serves as a summary statistic of performance (Davis and Goadrich 2006). The Receiver Operating Characteristic curve (ROC) and the Precision-Recall curve (PR) are the two threshold curves mostly used in link prediction (Yang et al. 2015; Junuthula et al. 2016).

    • Receiver Operating Characteristic (ROC) The ROC curve plots the true positive rate against the false positive rate at various threshold settings (Davis and Goadrich 2006). The FPR measures the fraction of negative links that are misclassified as positive links, while the TPR measures the fraction of positive links that are correctly predicted (Davis and Goadrich 2006; Lichtnwalter and Chawla 2012).

      Although these metrics are widely used in link prediction, Yang et al. (2015) show that ROC and AUC can be deceptive. They explain that, due to the severe class imbalance in link prediction, it is recommended to use PR curves and PRAUC for evaluating a link predictor rather than the ROC curve and AUC.

    • Precision-Recall (PR) The precision-recall curve shows precision with respect to recall at all thresholds (Davis and Goadrich 2006; Lichtnwalter and Chawla 2012). The precision-recall curve considers only the prediction of positive links and ignores negative samples. Although in link prediction it is generally desirable to predict positive links, temporal link prediction also needs to predict removed edges, for which the PR curve does not give credit (Junuthula et al. 2016).
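The threshold-sweeping construction of both curves can be sketched in a few lines. The simplified implementation below sorts the scores once and accumulates TP/FP counts, assuming distinct scores for clarity (library implementations such as scikit-learn additionally handle tied scores and interpolation); the toy labels and scores are hypothetical.

```python
import numpy as np

def roc_and_pr_points(y_true, scores):
    """Sweep the threshold over all scores (assumed distinct) and collect
    (FPR, TPR) points for the ROC curve and (recall, precision) points
    for the PR curve."""
    y = np.asarray(y_true)[np.argsort(-np.asarray(scores))]  # by score, desc.
    P, N = y.sum(), len(y) - y.sum()
    tp = np.cumsum(y)          # true positives above each threshold
    fp = np.cumsum(1 - y)      # false positives above each threshold
    tpr, fpr = tp / P, fp / N
    prec = tp / (tp + fp)
    return fpr, tpr, tpr, prec                # recall == TPR

def trapezoid_auc(x, y):
    """Area under a threshold curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(x)):
        area += (x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2.0
    return area

# Toy scores for 3 positive and 3 negative candidate links.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
fpr, tpr, rec, prec = roc_and_pr_points(y_true, scores)
print(trapezoid_auc(list(fpr), list(tpr)))    # ROC AUC
print(trapezoid_auc(list(rec), list(prec)))   # PR AUC (trapezoidal estimate)
```

On this balanced toy set the two areas are close; with the extreme imbalance typical of link prediction, the PR area drops sharply while the ROC area can remain misleadingly high, which is the point made by Yang et al. (2015).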

7.2.2 Qualitative evaluation metrics

This section describes qualitative criteria in link prediction. Although such criteria are not measurable, understanding their importance can aid in selecting an appropriate method for link prediction. The objective is to identify fair and effective qualitative measures for link prediction evaluation.

  • Cost Cost concerns how time-consuming the method is. As the size of the network grows, the computational cost increases sharply (Dunlavy et al. 2011; Li et al. 2014b). It is obvious that the number of parameters \(N_p\) has a high impact on the computing time T and the memory M, as expressed in (30).

    $$\begin{aligned} \begin{aligned} N_p&\propto T\\ N_p&\propto M \end{aligned} \end{aligned}$$
    (30)
  • Scalability Social networks are large sparse graphs; there are few observed links compared to the size of the graph. Especially in periodic link prediction, the total size of the network varies with time. Hence, the presented method should be able to cope with evolving networks (Lü and Zhou 2011; Bordes et al. 2014).

  • Generalization It is desirable that the presented method produce reasonable results on most datasets. In other words, it should rely less on explicit features like profile information and explore the features from the input data alone. If the method requires more explicit features, then its evaluation is highly dependent on the selection of proper features (Dunlavy et al. 2011; Nguyen and Mamitsuka 2012; Litwin and Stoeckel 2016).

  • Exploring evolutionary patterns In a dynamic network, patterns form over time. Discovering these patterns helps to paint a better picture of the network over time and increases accuracy. A key problem in dynamic networks is understanding seasonal fluctuations and detecting ill-behaved nodes (Li et al. 2014b).

  • Knowledge Representation Multi-relational data in a heterogeneous social network is also known as a knowledge base. The main issue with KBs is that they are far from complete; knowledge in them can be represented as a triple (head node, relation edge, tail node), and extracting semantic relations is a task in demand (Brandes and Wagner 2004; Litwin and Stoeckel 2016).

7.3 Analytical evaluation

The third and perhaps most important component of the proposed framework is the evaluation of the link prediction categorization. This component evaluates link prediction techniques according to the criteria introduced in the previous subsection. Our evaluation is summarized in Table 4, in which the column headings are the evaluation criteria and the row headings are the nodes of the classification tree (Fig. 5). In Table 4 the strengths and weaknesses of the approaches are compared with each other, in an attempt to show the efficiency of link prediction models from a different view.

It is important to note that this is not a quantitative evaluation based on scientific experiments, but a qualitative assessment based on a detailed study of the results of previous papers. Those results are based on a variety of datasets and a variety of model validations. Since the datasets, the model validation and the purpose of the evaluation criteria all vary, our mission in this survey article is to provide a qualitative assessment that offers a comparative perspective at a macro level; even within a specific domain, a high-accuracy comparison would be affected by parameters such as the datasets, test methods, and so on. This assessment can be used in three ways: for comparison, for choosing a method, and for improving a method. This section presents a discussion and comparison of the approaches against each other using a variety of evaluation criteria; we then explain how this analytical component can be used in choosing an appropriate method. Ultimately, the solutions that can be used to improve a link prediction method are described.

Table 4 Analytical framework for evaluation of link prediction models

7.3.1 How to compare

Table 4 shows that the learning-based models have a higher accuracy than the heuristic-based ones. This is because nearly every newly proposed method compares its accuracy with the heuristic-based methods, and it is evident that the learning-based models use different types of features for prediction, especially the latent feature models, which work well among the others (Wang et al. 2007, 2011; Yang et al. 2012; Sarkar et al. 2012; da Silva Soares and Prudêncio 2012; Richard et al. 2014; Li et al. 2014a, b; Rahman and Al Hasan 2016). The accuracy of tensor factorization models is inversely proportional to network size: as the network size increases, the accuracy decreases. This is because social networks are sparse, and as the network size increases, sparsity increases sharply (Nickel et al. 2016). Some work has been done to deal with this sparsity, such as coupled tensor factorization, which has a medium accuracy among the other latent-feature-based models (Acar et al. 2011; Yılmaz et al. 2011; Nickel et al. 2016). Medium accuracy is also assigned to the classification model, whose accuracy is highly dependent on the selection of the feature vector (Al Hasan and Zaki 2011). It is important to note that although heuristic-based approaches have low accuracy compared with the others, they perform reasonably in homogeneous networks for missing link prediction or non-periodic link prediction (Liben-Nowell and Kleinberg 2007; Lü and Zhou 2011; Feng et al. 2012).

As stated in the previous section, the number of parameters has a high impact on computing time and memory, and Table 4 confirms this. For this reason we assigned a high cost to the ensemble-of-all-paths approach: walking through all paths of a graph is inherently costly (Liben-Nowell and Kleinberg 2007; Lü and Zhou 2011).

Social networks are sparse, and as a network grows it usually becomes even sparser. Sparsity therefore has a direct impact on scalability. There is no single clear winner among the approaches; within each approach, some techniques address the scalability challenge better than others. In general, because the ensemble of all paths behaves like an exhaustive search, it is not a suitable approach for large datasets. Likewise, as a dataset grows the tensor dimensions grow with it, so tensor-based models perform worse than they do on smaller networks (Nickel et al. 2016).

The classification model has less generalization ability, although much research has been done on scaling classifiers. As can be seen in Table 4, deep models have more generalization power because they do not depend heavily on hand-crafted feature vectors and extract latent features at different levels of abstraction (Li et al. 2014b; Perozzi et al. 2014; Grover and Leskovec 2016). However, deep models consume more time than the others.

It is worth noting that sparse networks are highly sensitive to noise (Nguyen and Mamitsuka 2012). Here, heuristic-based approaches perform well due to their simplicity; in particular, global indicators work well and are not sensitive to local changes caused by noise (Lü and Zhou 2011). Deep models are even more noise-resistant, owing to their invariance characteristic (Bengio et al. 2013): a deep architecture leads to abstract representations because more abstract concepts can be constructed in terms of less abstract ones, and more abstract concepts are generally invariant to most local changes of the input. One considerable difficulty with tensor factorization is that its prediction accuracy tends to be poor because it is sensitive to noise; some studies have tried to add side information to overcome this challenge (Yılmaz et al. 2011; Acar et al. 2011; Narita et al. 2012; Ermiş et al. 2012, 2015; Nakatsuji et al. 2016).
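A standard example of a global indicator is the Katz index, which sums over paths of every length with exponentially decaying weights, S = (I − βA)⁻¹ − I; because no single edge dominates the sum, the score is comparatively robust to local noise. A minimal NumPy sketch (the 3-node path graph and β = 0.1 are illustrative choices, valid since β must stay below the reciprocal of the largest eigenvalue of A):

```python
import numpy as np

def katz_index(A, beta=0.1):
    # Katz index: S = (I - beta*A)^{-1} - I = sum over l >= 1 of beta^l * A^l,
    # i.e. paths of every length, down-weighted exponentially in length.
    # Converges only for beta < 1 / largest eigenvalue of A.
    n = A.shape[0]
    I = np.eye(n)
    return np.linalg.inv(I - beta * A) - I

# Toy path graph 0-1-2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
S = katz_index(A, beta=0.1)
# S[0, 2] aggregates every path between the end nodes; the shortest
# one (0-1-2) contributes beta^2, longer walks contribute ever less.
```

Because every entry pools contributions from many paths, perturbing one edge shifts scores only slightly, which is the intuition behind the noise tolerance claimed above.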

Exploring evolutionary patterns is a long-standing goal in periodic link prediction. As shown in Table 4, the latent-feature-based models have the ability to explore evolutionary patterns in dynamic networks; evolutionary patterns are latent features and need a model that takes different time spans as input (Sewell and Chen 2016). Among latent-feature-based models, non-parametric approaches have a high ability to model evolutionary patterns (Sarkar et al. 2014). Nevertheless, they depend heavily on feature vectors, so their results are limited to a specific domain. Regarding deep models, little can be claimed yet because not much work has been done; however, a deep model has many parameters of its own, so applying it to a dynamic network makes the learning process very slow.

As stated earlier, knowledge representation is the task of discovering non-linear relations in multi-relational data. Tensor-based and deep embedded models have a high ability to represent knowledge in multi-relational data (Dunlavy et al. 2011; Perozzi et al. 2014; Grover and Leskovec 2016; Nickel et al. 2016), because these models focus on the structure of the graph, especially the relations. As noted earlier, considering both the node attributes and the structure of the network is the key to building a successful model. Non-parametric models do not reach the same level of success: they work well only when the types of relations are not diverse (Zhu et al. 2016a). Moreover, heuristic-based approaches, due to their simplicity, are unable to discover non-linear relations and are not a good choice for knowledge representation (Zhai and Zhang 2015).

7.3.2 How to select

Unlike many classification and clustering tasks, accuracy and cost are not the basic criteria for choosing a link prediction method. Link prediction is an offline task, and reaching a reasonable result in the shortest possible time has never been a benchmark for it. Still, running time cannot be ignored altogether: most social networks have high dimensions, and time-consuming methods struggle with this huge amount of data. On the other hand, in link prediction the aim is always to predict the maximum number of potential links. The question then arises: under which conditions, and with what definitions, can the maximum number of potential links be obtained?

As previously stated, link prediction falls into two categories: future link prediction and missing link prediction. This leads researchers along two separate lines when selecting an appropriate method. In periodic link prediction, evolutionary pattern recognition is always prioritized. As shown in Table 4, latent-feature-based models have the best ability to detect evolutionary patterns: they are not only concerned with network structure but also provide a model that considers the evolutionary structure of the graph. The tensor-factorization approach has high capability in modeling dynamic networks; it encodes evolutionary patterns in the model and represents them efficiently. Despite this ability, it is not scalable and becomes time-consuming in high dimensions. Hence, researchers have tried to combine it with scalable, time-saving methods such as similarity-based approaches.

Another necessary criterion in selecting a method is the type of relationship or link to be predicted. Similarity algorithms have a good ability to predict links in homogeneous networks, and they are highly scalable due to their simplicity. On the other hand, this approach cannot deal with heterogeneous networks, because similarity indices ignore the type of relationship and only predict the existence of a link. In contrast, latent-feature-based approaches have a high ability to model knowledge from multi-relational data. In fact, tensor factorization and deep models are designed around the multi-relational structure of the graph and have a high ability to discover and represent knowledge. Even so, tensor-based models and deep models achieve different accuracy on the same dataset: tensor-based models work better on low-dimensional datasets, while deep models need a large amount of data to explore the knowledge.

7.3.3 How to improve

The third aspect of the analytical assessment in this proposed framework is an overview of open paths for improving methods. After comparing the approaches and recognizing the key criteria for choosing a basic approach, improving existing methods is the main goal of this analytical assessment. The solutions are presented from three perspectives: combination of methods, use of side information, and clarification of vague and ambiguous points in link prediction.

Table 4 can be used explicitly for combining methods. For instance, Acar et al. (2009) took advantage of the strengths of similarity-based methods by combining tensor decomposition with the Katz method. The purpose of this combination is to use global indices over the whole graph, which leads to an efficient method when combined with the structural features of the tensor. Li et al. (2014b) also tried to find evolutionary patterns with a deep model, and attempted to use local indices in a snapshot of the graph by feeding common-neighbor information into this deep model. Due to the high computational cost, combining a deep model with a time-consuming approach is challenging and may lead to inefficient methods, whereas combining it with a low-cost method avoids this problem. Since local features and homophily patterns are important in social networks, such combinations are likely to be effective, and Li et al. (2014b) supports this assumption.

Whenever side information such as the content of the social network is used, generalization drops. Generalization is at its minimum in the classification and ensemble-of-all-paths approaches, whose feature vectors consist of structural information and social network content; these approaches therefore have a high potential for exploiting content information. In latent-feature-based models, on the other hand, the use of side information has been seen less often. Similarity-based approaches, given their simplicity and low computing cost, have much potential for exploiting content information, and much work has been done in this direction.

By scrutinizing a large number of articles, vague and ambiguous points regarding the evaluation criteria become apparent. Yang et al. (2015) and Junuthula et al. (2016) studied the evaluation criteria in dynamic networks and showed that the ROC curve, a well-known metric in link prediction, can be deceptive due to high class imbalance. On the other hand, the PR curve, which has been suggested as a replacement for ROC, cannot distinguish removed links in the network. To improve link prediction methods, it is necessary to characterize the behavior of the evaluation criteria carefully. For instance, there is no quantitative evaluation criterion in dynamic networks that can measure links that are created only over time. Although this paper attempts a qualitative, comparative analysis of evolutionary patterns, statistical calculations are still needed; similar research is also needed for missing link prediction.
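The deceptiveness of ROC under imbalance can be made concrete: rank-based ROC-AUC depends only on how often negatives outscore positives, so multiplying the number of non-links leaves it unchanged while precision collapses. The following self-contained sketch uses synthetic score distributions chosen only to make the effect visible:

```python
def roc_auc(pos, neg):
    # Rank-based AUC: probability a random positive outscores a random
    # negative, counting ties as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at(pos, neg, threshold):
    # Precision over all pairs predicted as links (score >= threshold).
    tp = sum(1 for s in pos if s >= threshold)
    fp = sum(1 for s in neg if s >= threshold)
    return tp / (tp + fp)

pos = [0.8] * 10                        # scores of the 10 true links
neg_small = [0.9] * 1 + [0.1] * 99      # 100 non-links, 1% scored high
neg_large = [0.9] * 100 + [0.1] * 9900  # 10,000 non-links, same 1% rate

# ROC-AUC is identical in both settings (0.99)...
assert roc_auc(pos, neg_small) == roc_auc(pos, neg_large) == 0.99
# ...but precision at the same threshold collapses: 10/11 vs. 10/110.
```

With 100 non-links the classifier looks excellent under both metrics; with 10,000 non-links the ROC-AUC still reads 0.99 while more than 90% of predicted links are false, which is exactly the imbalance pitfall noted above.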

In social networks, one well-known statistical pattern is homophily: the tendency of entities to be related to other entities with similar characteristics. It has been widely observed in various social networks (Liben-Nowell and Kleinberg 2007). The question now arises: do the patterns of social networks end with homophily? What other latent patterns do social networks contain?
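One simple way to quantify homophily is the fraction of edges whose endpoints share an attribute value; values well above what random mixing would give indicate homophilous linking. A minimal sketch (the toy attribute assignment is a hypothetical illustration):

```python
def homophily_index(edges, attr):
    # Fraction of edges whose two endpoints share the same attribute value.
    same = sum(1 for u, v in edges if attr[u] == attr[v])
    return same / len(edges)

# Hypothetical network: nodes 0 and 1 carry attribute "a", nodes 2 and 3 "b".
attr = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (1, 2), (2, 3)]
h = homophily_index(edges, attr)  # 2 of the 3 edges are same-attribute
```

More refined variants, such as attribute assortativity, correct this raw fraction for the sizes of the attribute groups.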

8 Conclusion and future work

This work was motivated by the need to understand the potential advantages of link prediction methods relative to each other. This paper presented an analytical framework for link prediction in social networks and illustrated its different challenges and techniques. The framework has a structural perspective with three components: a classification of link prediction approaches, evaluation criteria and an analytical evaluation. It proposes a new classification of link prediction methods from a different viewpoint. An attempt was made to collect all the major current works and then to evaluate this classification against the presented evaluation criteria. The framework can be used to guide future studies of social networks.