1 Introduction

One of the most important tasks of complex network analysis is the task of identifying missing links or predicting future links in the network, using the information obtained from the current network, i.e., structural features and/or nodal attributes, commonly known as link prediction [32, 33]. The link prediction task has been applied to solve problems in various fields; therefore, it has been a hot topic for researchers from different disciplinary backgrounds. Examples include but are not limited to recommender systems in commercial applications [12], protein-protein interactions (PPI) in biological networks [48], collaboration prediction in co-authorship networks [50], etc.

Most link prediction methods are based on the topological structure of the networks. While the effectiveness of these methods has been established, they still suffer from neglecting a very important source of information, i.e., features or attributes of nodes in the network. In many real-world networks, each node is associated with an informative set of features. These types of complex networks are called attributed networks [4, 17]. A famous example of these networks would be collaboration networks, in which every researcher has a specific research background and profile that influences the interactions between them [67]. The importance of these features has been studied in the concept of homophily, and many types of research have emphasized the tendency of similar individuals to interact with each other [18, 19]. Hence, the process of link formation is highly affected by homophily [2, 71].

Generally, link prediction approaches are classified into two main categories: similarity-based and learning-based techniques. The first category calculates the probability of link existence based on a similarity score that is assigned to the node pairs. The similarity score is calculated based on the topological structure of the network or nodal attributes [32]. The learning-based methods are divided into three sub-categories: The first sub-category, i.e., feature-based classification, treats the link prediction task as a binary classification problem in which a pre-defined set of features are used as the input of the classifier, and the output of 1/0 indicates link presence/ absence [63]. The second sub-category, i.e., probabilistic models, assumes that the probability of a link existing between any pair of nodes depends on a set of parameters. Therefore, an optimization function is defined based on those parameters, and once the right set of values is learned for the parameters, the conditional probability of the link’s existence is calculated for any pair of nodes [32]. The last sub-category belongs to the dimensionality reduction-based groups.

Most learning-based methods suffer from the high computational complexity that is due to the large size of the networks. Therefore, recently the dimensionality reduction-based methods have been the focus of many researchers. There are two main approaches to deal with the challenges of high dimensionality. The first approach is using embedding techniques which tries to map the nodes to a lower dimension space, such that their structural and non-structural properties are maximally preserved. Embedding techniques have been developed to be applicable for many network analysis tasks such as community detection [5], link prediction [6, 26, 47], node classification [1, 8], and recommender systems [22]. Overall, the main approaches for learning network embedding are classified into Matrix Factorization, Deep Learning, Edge Reconstruction, Graph Kernel, and Generative Model [10]. Recently, many studies have been conducted to apply embedding approaches to attributed networks and take advantage of the node attributes to obtain more accurate embedding vectors [17, 36]. The second approach is called matrix factorization, which receives a matrix, e.g., the adjacency matrix of the current network, as input and decomposes it into two lower-dimension matrices, i.e., a base matrix and a coefficient matrix, such that their product is as close as possible to the original matrix [32]. To obtain the list of potential links, these matrices are used in a supervised or unsupervised framework.

Matrix factorization methods consist of various categories, such as singular value decomposition (SVD) [24], principal component analysis (PCA) [61], and independent component analysis (ICA) [68]. However, for the task of link prediction, the most commonly used matrix factorization based method is non-negative matrix factorization (NMF), which decomposes an original matrix (X ∈ Rn × n ) into the base matrix (W ∈ Rn × k) and the coefficient matrix (H ∈ Rn × k) such that all the elements in the three matrices are greater than or equal to zero [60], which leads to a situation also known as an additive parts-based representation of the original data. This term refers to the fact that only an additive combination of the original data is possible [21, 34].

NMF-based methods have been applied to many different problems, such as computer vision [56], dimensionality reduction [42, 45, 57, 58], and data mining [35]. In addition, NMF-based methods have also been applied to the link prediction problem in many different kinds of networks such as temporal networks [20], weighted networks [14], directed networks [15], and in order to improve their performance, they take advantage of the specific properties of those networks. For example, Chen et al. [14] utilized the edge weights to improve prediction accuracy, while in [15], the direction of links was used in the process of decomposition of the original matrix. Since the effectiveness of this method has been established, due to the high potential of NMF-based methods, in the present work, we decided to benefit from the NMF-based method’s advantages in the attributed networks to solve the link prediction problem. After learning the base and coefficient matrix and multiplying them to find the reconstructed matrix, a score is obtained for every pair of nodes, which determines the probability of there being a link between the node pairs.

In this article, a new robust version of the graph regularization non-negative matrix factorization model combined with graph regularization and structure-attribute similarity is presented to capture semi-local topology structure and attribute information to solve the problem of link prediction in attributed networks. In order to obtain all of the link weight information from the original network, the Biased Local Rand Walk is used to measure the semi-local proximity and attribute similarity between local nodes; then, the graph regularization technology is combined with SARWS to explore topology information. In addition, the ℓ2,1-norm is used to remove random noise and spurious links. Ultimately, a unified link prediction model (GRNMF-AN) is suggested, and in order to learn the parameters of GRNMF-AN, multiplicative updating rules are used. The authors use nine real-world attributed networks and four evaluation metrics to assess the feasibility of the proposed model; the experimental findings indicate that our model outperforms conventional algorithms.

In Section 2, a brief overview of previous works in the field of link prediction will be provided. Then, in Section 3, some of the work’s preliminaries are discussed, such as the definition of structural-attributed similarity, NMF variants, and the proposed algorithm (GRNMF-AN). Section 4 covers the experimental analysis, which involves evaluating the performance of GRNMF-AN compared to state-of-the-art methods and analyzing the effect of the parameter, and Section 5 presents the effectiveness of the GRNMF-AN in link prediction performance. Finally, Section 6 presents the conclusion and future work and Section 5 presents the conclusion.

2 Related work

In this section, we will briefly introduce some of the most important methods that have been proposed to solve the problem of link prediction. Based on published surveys [32] on link prediction in complex networks, the main link prediction methods are classified into two main classes: similarity-based methods, which are the simplest way to predict missing or new links, and learning-based methods, which require more computational resources. In the following, we will discuss each class and provide a few examples of them.

Similarity-based methods calculate the probability of a link between a pair of nodes based on a pre-defined similarity measure and then pick the L links which have the highest similarities. The similarity score between a non-connected node pair is calculated using the topological structure of the network. In general, we can use local, global, and semi-local scores to calculate the similarity between pairs of nodes. Local-based scores use only the information from the local structure of the nodes to obtain the most similar node pairs. Global methods consider the whole structure of the network for calculating the similarity of a node pair. Therefore, they suffer from high computational complexity while benefiting from global information. Semi-local similarities have been able to achieve a trade-off between these two methods. They use more information compared to local indexes, and, unlike global indexes, they do not require high time and computational resources [7, 49]. To improve the performance of similarity-based methods, some researchers proposed benefiting from the side information associated with each node, also known as node attributes. For example, Muniz et al. [46] proposed a framework to combine structural similarity, obtained by common neighbors, node attributes, obtained by profile and gender information, and also temporal information, obtained by time attributes, and get the probability of a link existing between a pair of nodes.

Learning-based methods use structural and non-structural properties of networks as inputs for a machine learning framework to learn the probability of a link existing between a pair of nodes. These methods are divided into feature-based classification, probabilistic models, and dimensionality reduction methods [32]. The first class, i.e., feature-based classification, considers the link prediction problem as a binary classification task and applies a learning model such as a decision tree, neural network, and support vector machine to predict the labels for each pair of nodes. Structural and non-structural features can be used as input for classification [53]. The main challenge in this approach is how to define the right set of features to get the best results. Keikha et al. [30] proposed a link prediction framework based on deep learning, which extracts the best set of features from structural information and nodal attributes and eliminates the need for manual feature engineering. Structural information includes information from the local and global structure of nodes.

The second class, i.e., the probabilistic model, defines an objective function with a set of parameters and tries to obtain the current structure of the network-based optimization function with the right values for the parameters. When the right set of parameters are learned, the probability of a link existing between a pair of nodes is calculated using the conditional probability P(Aij = 1 |θ), the probabilistic [62] and maximum likelihood [27] based methods have been widely studied, and several methods based on them have been proposed to solve the link prediction problem. Their most challenging aspect is that they require parameter tuning, which is computationally expensive and very time-consuming. Therefore, it is not applicable to large-scale networks. The last class, i.e., dimensionality reduction-based methods, has been the focus of attention by lots of researchers due to their applicability to large-scale networks. In general, they consist of two main approaches, i.e., embedding techniques and matrix factorization.

Embedding-based methods map the graph’s information to a low-dimensional space in which the structural information of the graph and its components, e.g., nodes, edges, communities, etc., are preserved as much as possible. Some comprehensive surveys [10] provide a detailed study of graph embedding techniques, which are great references for embedding approaches and their applications. Here we summarize the three main approaches to graph embedding techniques. The first category is matrix factorization, in which the graph is represented in the form of a matrix, and then the matrix is factorized, and the embedding vectors for nodes are obtained. Menon and Elkan [43] are the pioneers of this technique by proposing a matrix factorization framework for graph structure and using it to solve the link prediction method for the first time. Other examples of this category include the Hope method [51], which is uses \( {\left||\ \mathrm{S}-{\mathrm{Y}}_{\mathrm{s}}{\mathrm{Y}}_{\mathrm{t}}|\right|}_{\mathrm{F}}^2 \) as an optimization function and tries to minimize it. S is the similarity matrix in this setting, which can be defined using several similarity measures, e.g., common neighbors, the Adamic-Adar index, etc.

Another approach to learning the embedding of graphs uses Deep Learning methods. Most of these methods use the autoencoder structure to achieve a dimensionality reduction on the graph structure. For example, SDNE [65] uses a deep autoencoder to maintain the graph’s first and second-order proximity. DNGR [11] is another example of using auto-encoder networks, but its input consists of a positive pointwise mutual information (PPMI) matrix, similar to the similarity matrix used in Hope. GraphSAGE [29] is a method that uses convolutional neural networks (CNNs) to learn the embedding vectors for attributed graphs. The last approach to learning graph embedding is based on random walks. DeepWalk [54] and Node2Vec [26] were the two most well-known proposals in this category. First, these methods represent the graph as a node sequence generated by random walks. Then they apply a natural language processing method called SkipGram [44] to maximize the co-occurrence probability of the nodes that are located in the w-sized window in the random walk. The difference between DeepWalk and Node2Vec is in the generated random walks used to learn the embedding vectors. While DeepWalk uses a pure random walk, Node2Vec takes advantage of a biased random walk, providing a trade-off between structural equivalence and homophily.

Recently, some research has been conducted to incorporate node attributes in the representation learning process to improve the quality of obtained embeddings, which are then used for the prediction of new links. For instance, Yang et al. [70] proposed a matrix factorization-based method called TADW, which took advantage of node attributes to enhance the quality of node embeddings. Pen et al. [52] proposed a framework to combine the structural information, nodal attributes, and node labels to learn embedding vectors. Xu et al. [69] proposed a method called GANE to combine different types of biological information to learn protein representations, which then can be used to predict PPI and disease genes. Masrour et al. [41] studied the impact of three different approaches for incorporating node attribute information in the process of representation learning and how they can improve the accuracy of link prediction.

Matrix factorization-based methods take an original matrix X and try to learn two lower rank matrices W and H, such that the product of W and H is as close as possible to X. These methods preserve local and global properties of networks in the form of a matrix, e.g., adjacency matrix, similarity matrix, etc., and factorize it to perform the link prediction task [20]. Most authors proposed methods based on non-negative matrix factorization to apply for the link prediction problem. For example, Chen et al. [16] proposed an NMF-based framework for the link prediction problem by combining manifold regularization and sparse learning. Ma et al. [39] proposed an asymmetric NMF (SNMF) method be applied to the link prediction task in temporal networks, which consisted of three steps. First, it uses SNMF to discover features in each time step. Next, it combines the feature matrices from the previous step to obtain a unified feature matrix. Finally, that feature matrix is used to predict new links in the current time step. In another work, presented by Chen et al. [14], a graph regularized NMF-based framework was proposed to solve the link prediction problem by using link weights and capturing local topology information. Ma et al. [40] applied graph regularized NMF to temporal networks to predict new links. In the time step T, GrNMF factorized the matrix associated with GT while considering the networks in previous time steps as regularization. Therefore, it captures the topological information of temporal networks.

NMF-based methods have also been applied to attributed networks. Chen et al. [13] proposed an NMF-based framework to solve the link prediction problem in attributed networks. The proposed method simultaneously decomposed the adjacency matrix A and the attribute matrix B to map them to a lower dimension space while keeping the representations at a minimum distance and similar to each other. Gao et al. [25] proposed an NMF-based model in which three types of information, i.e., local information of nodes, the global structure of the network, and nodal attributes, were combined into a unified optimization function.

3 Methodology

In this section, the authors include some notions and preliminary information concerning the definition of the variants NMF and the similarity matrix using a weighted-biased random walk and the proposed method.

3.1 Notions and notations

Given an attributed network G = (V, E, A) with n nodes V = {1, 2, …, n}, m links E = {e1, e2, …., em} and t node attributes Ai ={ a1, a2, …., at}, the network is usually represented by an adjacent matrix X = (Xij) n × n and an attribute matrix A = (Aik) n × m, where Xij = 1 if a link exists between nodes i and j, or 0 otherwise; Aik = 1 if node i has the kth attribute, or 0 otherwise. The authors also consider the attributed graphs as undirected, connected, simple graphs, and all the node attributes conform to a unique multi-dimensional schema, A. Other notations, which will be often used in this paper, are summarized in Table 1.

Table 1 Some often used parameters

3.2 A brief review of various versions of NMF

In this section, Non-negative Matrix Factorization (NMF), Graph Regularized Non-Negative Matrix Factorization (GNMF), and weighted non-negative matrix factorization (WNMF) are briefly reviewed.

Non-negative matrix factorization (NMF) is a computational method for reducing the linear dimensionality of a given data matrix X, and it can be used to solve complex data mining and machine learning challenges. An NMF decomposes an initial data matrix into two low-dimensional non-negative matrices. One of the decomposed matrices is a coefficient matrix, which is used to store a low-dimensional representation, while the other is a basic matrix, which may be considered as parts-based representations of the original data. In the following, a summary of the Non-negative Matrix Factorization problem is stated:

Given a non-negative matrix Xn ∗ n, find two non-negative matrices Wn ∗ k and Hn ∗ k with the condition k << rank(X), that minimize F(XWHT), where F(A, B) is a loss function defining the “distance” between the matrices A and B. The choice of the loss function F affects the solution of the minimization problem. One popular choice is the Frobenius norm (or the Euclidean Distance).

$$ {\mathit{\min}}_{W,H\ge 0}={\left\Vert X-W{H}^T\right\Vert}_F^2, $$
(1)

where ‖.‖F indicates the Frobenius norm, constrain W, H ≥ 0 requires that all the elements in matrices W and H are non-negative.

NMF optimization is a convex optimization problem [29]. According to the NP-hardness of the problem and the lack of suitable convex formulations, non-convex formulations with remarkably straightforward solvability are usually employed, and only local minima can be achieved in a reasonable time for computation. The multiplicative iterative updating of the Frobenius norm can be achieved as follows:

$$ {W}_{ik}\leftarrow {W}_{ik}\frac{(XH)_{ik}}{{\left(W{H}^TH\right)}_{ik}}, $$
(2)
$$ {H}_{jk}\leftarrow {H}_{ik}\frac{{\left({X}^TH\right)}_{jk}}{{\left({H}^TH{W}^T\right)}_{jk}}, $$
(3)

where at the very outset of the iterative update process, the two non-negative matrices W0 and H0 are initialized randomly. The iterative update process is performed until the given terminal condition is fulfilled, as presented in (2) and (3). Finally, the final W and H can be obtained.

However, NMF has two major shortcomings, one of which is that it only recognizes global data structures of X and ignores the local relationships between data. A graph regularization is applied to NMF to solve the first problem. Cai et al. [70] introduced the graph regularization non-negative matrix factorization (GNMF), which is a well-known effective method that incorporates a graph Laplacian term to consider the intrinsic geometrical structure. The objective function of GNMF, in particular, can be seen as follows:

$$ {\mathit{\min}}_{W,H\ge 0}={\left\Vert X-W{H}^T\right\Vert}_F^2+\upalpha\ Tr\left( HL{H}^T\right), $$
(4)

where α ≥0 is the regularization parameter, L = D − S is called the Laplacian matrix, which is based on spectral graph and manifold learning theories. D is a diagonal matrix with \( {D}_i=\sum \limits_j^n{S}_{ij} \), and S is a similarity matrix between nodes. The multiplicative update rules to solve (4) are given as the following.

$$ {W}_{ik}\leftarrow {W}_{ik}\frac{(XH)_{ik}}{{\left(W{H}^TH\right)}_{ik}}, $$
(5)
$$ {H}_{jk}\leftarrow {H}_{ik}\frac{{\left({X}^TW+\lambda SH\right)}_{jk}}{{\left(H{W}^TW+\lambda DH\right)}_{jk}}. $$
(6)

Additionally, Cai et al. [9] proved that the objective function under these two update rules is convergent. In this way, the manifold learning theory can be combined with the NMF, which has an excellent performance.

Another problem of NMF is that it is extremely sensitive to outliers and noises because of applying the squared error function to measure the loss; this issue leads to the objective function being easily dominated by a few outliers with large errors. Kong et al. [5] proposed ℓ2,1-norm to enhance the robustness of NMF. It alleviates the impact of noises or outliers by utilizing ι2, 1-norm to replace the Frobenius norm. The objective function of RNMF is defined as:

$$ {\mathit{\min}}_{W,H\ge 0}={\left\Vert X-W{H}^T\right\Vert}_{2,1}^2, $$
(7)

where ‖.‖2, 1 represents the ℓ2,1 norm.

Other modified versions of the negative matrix factorization algorithm include weighted non-negative matrix factorization (WNMF) [31], which is applied to emphasize the significance of important components; the element is more important in the case of higher weights. The objective function for general weighted non-negative matrix factorization can be formulated as follows:

$$ {\mathit{\min}}_{W,H\ge 0}={\left\Vert Yo\ \left(X-W{H}^T\right)\right\Vert}_F^2, $$
(8)

where \( {\left\Vert .\right\Vert}_F^2 \) is the Frobenius norm, Y ∈ Rn ∗ n implies the weight matrix, and o denotes the Hadamard product. Since the objective function in the equation is not convex with W and H jointly, the purpose is to find a local minimum by iteratively updating W and H in a similar way with the unweight NMF in the eq. (1).

3.3 Constructing the similarity matrix based on the random walk

In the attributed network, two data sources can be applied to perform the link prediction task. The first source of data comes from the network and the set of connections between nodes; the second is the data regarding the nodes and their attributes. With the increase of rich graph attributes, including gene annotations in protein interaction networks and user profiles in social networks, it is more important than ever to consider both the structure and attribute information of graphs for high-quality link prediction.

According to the homophily property of social networks, the relationships between nodes with similar attributes tend to be stronger than the relationships between nodes with different attributes, and they will be more probable to connect in the network [4, 23]; therefore, attribute information can influence the existence of links in networks. In order to further improve the efficiency and accuracy of node similarity measurement in the attributed network, the Structural and Attribute Random Walk Similarity (SARWS) is presented to compute the similarity between nodes through fusing structure and attribute information.

The first step is embedding the information about vertex attribute similarity into a transformed weighted graph G0 = (V, E, W). In particular, for each edge e = (ui, uj) ∈ E, an edge weight w(e) is assigned to quantify the vertex attribute similarity for ui and uj. As a result, the vertex attribute information of G is encoded into the weighted graph G0 as edge weights. The well-known cosine similarity of the angle between two node vectors is used to measure the similarity between two nodes. The reason for choosing cosine similarity is its effectiveness for sparse vectors that consider only non-zero values. For two nodes ui and uj, whose attribute vector is Ai ={ ai1, ai2, …., ait} and Aj ={ aj1, aj2, …., ajt}, respectively, the attribute similarity is expressed as follows:

$$ ATSIM\left({u}_i,{u}_j\right)=\frac{\sum_{d=1}^t{A}_{id}{A}_{jd}\kern0.5em }{\sqrt{\sum_{d=1}^t{\left({A}_{id}\right)}^2}.\sqrt{\sum_{d=1}^t{\left({A}_{jd}\right)}^2}}, $$
(9)

where t denotes the dimension of an attribute vector.

The second step is to run the biased random walk on the weighted graph to discover similarities between the nodes. Each node initially has a walker in random walk methods, so each walker would randomly select a neighbor of the node it currently stands on to localize. By a special rule of transition probability, the random walk similarity is constructed for a pair of nodes which could better capture both the information potential of topological and attribute relationships between nodes. A weight-biased random walk on a graph can be defined by a more general transition matrix, where the element pij gives again the probability that a walker on the node ui of the graph will move to node uj in a single step but depending on appropriate weights for each pair of vertex ui and uj. In the present article, the appropriate weight for each pair of nodes in the network is considered to be determined proportional to the attribute similarity (ATSIM) between the nodes obtained from the Eq. (9).

Therefore, a transition probability pij = ATSIMij on each link (ui, uj) is assigned. The Local Random Walk (LRW) algorithm [37] uses semi-local information to obtain similarities between nodes. According to the Bias Local Random Walk model, the final formula is defined as follows:

$$ {S}_{ij}^{BLRW}\left(\upeta \right)={\sum}_{l=1}^{\upeta}\frac{d_i}{2\left|E\right|}.\frac{ATSIM_{ij}}{\sum_{j\in \Gamma (i)}{ATSIM}_{ij}}(l)+\frac{d_j}{2\left|E\right|}.\frac{ATSIM_{ij}}{\sum_{i\in \Gamma (j)}{ATSIM}_{ij}}(l), $$
(10)

in which η is the number of random walk steps. d and E are referred to as the degree of node and the number of existing links in the network, respectively. In the SARWS similarity matrix, each entity obtains the similarity between nodes using structural information and attributes by applying a formula. The obtained matrix captures nodal attribute similarity in addition to the higher-order structural proximity of node pairs. Therefore, it can be treated as the weight matrix of the graph. For the sake of simplicity, from now on, we refer to the SARWS matrix as S.

3.4 The unified model: RGNMF-AN

In this section, a new algorithm is proposed, which overcomes the deficiency of GNMF for the attributed network in the link prediction problem. Firstly, the formulation of the objective function is introduced; then, the optimization method is assigned. Under the NMF framework, the novel RGNMF-AN model is proposed, which considers topology structures and node attribute information simultaneously to optimize a unified objective function. By integrating topological and attribute information of network through matrix S in Section 3.3 and enabling ℓ2,1-norm to constrain the loss function and a regularization term, a unified model RGNMF-AN is proposed for link prediction in attributed networks. The optimized overall objective function is expressed as follows:

$$ {\mathrm{J}}_{\mathrm{RGNMF}-\mathrm{AN}}={\min}_{W,H\ge 0}{\left\Vert \mathrm{S}\ \mathrm{o}\ \left(X-W{H}^T\right)\right\Vert}_{2,1}^2+\upalpha \mathrm{Tr}\left(H{\mathrm{L}}_{\mathrm{S}}{H}^{\mathrm{T}}\right), $$
(11)

where o is the Hadamard product, ‖.‖2, 1 represents the ℓ2,1-norm, and α denotes the regularization parameter. LS = D−S is called the Laplacian matrix, which is based on spectral graph theory and manifold learning theory. D is a diagonal matrix with \( {D}_i=\sum \limits_j^n{\mathrm{S}}_{ij} \), and S is the structure-attribute similarity between nodes in the objective function. In original GNMF algorithms, the similarity matrix is represented by the network’s adjacency matrix; therefore, the attributed information of nodes cannot be expressed, and the network information is limited. However, the S similarity has been applied, which preserves a higher level of similarity between nodes in terms of structure topology and attribute information.

The link prediction similarity scores are determined in the first term. To tackle random noises in the observed network, it is important to apply ℓ2,1-norm to a loss function to achieve similarity score accuracy. The second term, Tr(HTLSH), is a manifold regularization, which is a combination of local similarity, structure topology, and node attributed to obtain local topology information in a new latent space. In the original GNMF, the method utilizes the adjacency matrix to compute the Laplacian matrix. Despite all of its benefits, the adjacency matrix contains only 0/1 values for any pair of nodes, which results in not being able to distinguish between node pairs. While in real-world situations, each interaction has a specific value in the network. Therefore, we need to consider a new approach to obtain a more informative and accurate similarity matrix.

Since the objective function JRGNMF − AN is non-convex, it is extremely challenging to find the optimal global solution. The Lagrange function is used to update the objective function, which includes two variables W and H. It is convex for each matrix by fixing the other. According to matrix trace properties: Tr(WH) = Tr(HW), ‖A2, 1=trace (AUAT), the objective function can be rewritten as follows:

$$ {\displaystyle \begin{array}{c}{\mathrm{J}}_{\mathrm{RGNMF}-\mathrm{AN}}=\underset{\mathrm{U}\ge 0,\mathrm{W}\ge 0\ }{\min \left(\mathrm{W},\mathrm{H}\right)\ }{\left\Vert \mathrm{S}\ \mathrm{o}\ \left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right\Vert}_{2,1}+\upalpha\ \mathrm{trace}\left({\mathrm{H}}^{\mathrm{T}}\mathrm{LH}\right)\\ {}=\mathrm{trace}\left(\left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\mathrm{U}{\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)}^{\mathrm{T}}\right)\end{array}} $$
(12)

in which the diagonal matrix U is defined as U = [uii]n × n with the diagonal elements given by

$$ {\mathrm{u}}_{\mathrm{i}\mathrm{i}=}{\left\Vert {\left({\mathrm{S}}^{1/2}\ \mathrm{o}\ \mathrm{X}\right)}_{\mathrm{i}}-{\left({\mathrm{S}}^{1/2}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)}_{\mathrm{i}}\right\Vert}_2, $$
(13)

where (S1/2 ο X)i and (S1/2 ο (WHT))i is the ith column of S1/2 ο X and S1/2 ο (WHT), respectively. In addition to this, please note that S1/2 ο S1/2. Hence, we have

$$ {\displaystyle \begin{array}{c}{\left\Vert \mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right\Vert}_{2,1}=\mathrm{trace}\left(\left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\ \mathrm{U}\ {\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)}^{\mathrm{T}}\right)\\ {}=\mathrm{trace}\left(\left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\ \mathrm{U}\ \left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}-\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\right)\\ {}\begin{array}{c}=\mathrm{trace}\left(\left(\mathrm{X}-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\left(\ \mathrm{U}\ \left(\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}\right)-\mathrm{U}\ \left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\right)\right)\right)\\ {}=\mathrm{trace}\left(\left(\mathrm{X}\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}\ \right)\right)-\mathrm{XU}\left(\ \left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)-\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\mathrm{U}\kern0.5em \left(\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}\right)+\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\mathrm{U}\ \left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\right)\right)\right)\end{array}\end{array}} $$
(14)
  1. 1.

    First term:S ο (X − WHT)‖2, 1

    1. 1.1.

      Second part: trace ((S ο XT)UHWT)

We have

$$ \frac{\partial \mathrm{trace}\left(\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{UH}{\mathrm{W}}^{\mathrm{T}}\right)}{\mathrm{\partial W}}=\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{UH}, $$
(15)
$$ \frac{\partial \mathrm{trace}\left(\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{UH}{\mathrm{W}}^{\mathrm{T}}\right)}{\mathrm{\partial H}}=\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}\right)\mathrm{H}, $$
(16)
  1. 1.2.

    Third part: trace (WHTU(Sο(HWT)))

We have

$$ \frac{\partial \mathrm{trace}\left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{H}{\mathrm{W}}^{\mathrm{T}}\right)\right)\right)}{\mathrm{\partial W}}=2\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{H}{\mathrm{W}}^{\mathrm{T}}\right)\right)\mathrm{UH}, $$
(17)
$$ \frac{\partial \mathrm{trace}\left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{H}{\mathrm{W}}^{\mathrm{T}}\right)\right)\right)}{\mathrm{\partial H}}=2\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{H}{\mathrm{W}}^{\mathrm{T}}\right)\right)\mathrm{W}, $$
(18)
  1. 2.

    Second term: trace(HTLH)

    $$ \frac{\partial trace\left({H}^T LH\right)}{\partial H}=2 LH $$
    (19)
  2. 3.

    Derivative of J with respect W and H

Considering the constraints W, H>≥0, we set Largrange multipliers A, B for them. Then, the Largrange function of the problem can be written as

$$ {J}_{RGNMF- AN}={\left\Vert S\ o\ \left(X-W{H}^T\right)\right\Vert}_{2,1}+\alpha\ trace\left({H}^T LH\right)+ trace\left({AW}^T\right)+ trace\left({BH}^T\right) $$
(20)

According to (2), (3), (4), and (5), we have

$$ \frac{\mathrm{\partial L}}{\mathrm{\partial W}}=-2\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{UH}+2\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\mathrm{UH}+\mathrm{A}. $$
(21)
$$ \frac{\mathrm{\partial L}}{\mathrm{\partial W}}=-2\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{W}+2\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\mathrm{W}+2\upalpha \mathrm{LH}+\mathrm{B}. $$
(22)

According to the Kuhn-Tucker conditions, we have AijWij = 0 and BijHij = 0 , and hence we get

$$ -2{\left(\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{UH}\right)}_{\mathrm{ij}}{\mathrm{W}}_{\mathrm{ij}}+2{\left(\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\mathrm{UH}\right)}_{\mathrm{ij}}{\mathrm{W}}_{\mathrm{ij}}+{\mathrm{A}}_{\mathrm{ij}}{\mathrm{W}}_{\mathrm{ij}}=0. $$
(23)
$$ -2\ {\left(\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}\right)\mathrm{W}\right)}_{\mathrm{ij}}{\mathrm{H}}_{\mathrm{ij}}+2{\left(\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\mathrm{W}\right)}_{\mathrm{ij}}{\mathrm{H}}_{\mathrm{ij}}+2\upalpha\ {\left(\left(\mathrm{S}-\mathrm{D}\right)\mathrm{H}\right)}_{\mathrm{ij}}{\mathrm{H}}_{\mathrm{ij}}+{\mathrm{B}}_{\mathrm{ij}}{\mathrm{H}}_{\mathrm{ij}}=0. $$
(24)

Therefore, we have

$$ {\mathrm{W}}_{\mathrm{ij}}\leftarrow \sqrt{{\mathrm{W}}_{\mathrm{ij}}\frac{{\left(\left(\mathrm{S}\ \mathrm{o}\ \mathrm{X}\right)\mathrm{UH}\right)}_{\mathrm{ij}}}{\left(\mathrm{S}\ \mathrm{o}\ \left(\mathrm{W}{\mathrm{H}}^{\mathrm{T}}\right)\right)\mathrm{UH}\Big){}_{\mathrm{ij}}}}. $$
(25)
$$ \kern2em {\mathrm{H}}_{\mathrm{ij}}\leftarrow \sqrt{{\mathrm{H}}_{\mathrm{ij}}\frac{{\left(\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{X}}^{\mathrm{T}}\right)\mathrm{W}+\upalpha \mathrm{DH}\right)}_{\mathrm{ij}}}{\left(\mathrm{U}\left(\mathrm{S}\ \mathrm{o}\ {\mathrm{H}\mathrm{W}}^{\mathrm{T}}\right)\mathrm{W}+\upalpha \mathrm{SH}\right)\Big){}_{\mathrm{ij}}}}. $$
(26)

We obtain the new basic matrix W and the feature matrix H by minimizing Eqs. (25) and (26). Finally, we can reconstruct the similarity score of the original network with \( \hat{X}=W{H}^T \) to obtain the link prediction similarity. The RGNMF-AN algorithm is as follows:

figure a

3.5 Proof of convergence

The RGNMF-AN model uses iterative updating rules to optimize the model. To be more specific, at each iteration, we update a variable e.g. W, while keeping fixing the other variable, e.g. H. The proof of convergence for Eqs. (25) and (26) is given in [64]. Therefore, the convergence of the RGNMF-AN under those updating rules to the local minimum can be easily proven.

3.6 Computational complexity analysis

The computational cost of the RGNMF-AN algorithm is analyzed, which consists of two key steps. The first step is to construct a similarity matrix, and the second step is to update the rules for optimizing the objective function. For the first key step, constructing a similarity (SARWS), one needs to O(n2d) to calculate a similar node clustering matrix, where d denotes the average node degree. For the second key step, in each iteration, updating W and H according to Eqs. (17) and (19) require O(n2 ∗ k ∗ iter). Since both iter and k are constants, the overall time complexity of RGNMF-AN is O(n2).

4 Experiments

4.1 Datasets

To evaluate the proposed method, we conducted some experiments on 9 real-world datasets, including citation and biological networks. All the experiments were carried out on a desktop PC equipped with a quad-core Intel i7 2.20GHz processor and 16GB RAM. Details of these networks can be found in Table 2.

  • CoraFootnote 1 [59] is a citation network in which nodes present machine learning papers, and an edge between two papers is formed only if one of them is cited by the other. Keywords in papers are treated as node attributes. It consists of 2708 nodes and 5278 edges.

  • CiteSeer is a citation network in which nodes present scientific papers, and an edge between two papers is formed only if one of them is cited by the other. Keywords in papers are treated as node attributes. It consists of 3312 nodes and 4732 edges.

  • Cornell, Texas, Washington, and WisconsinFootnote 2 are subnetworks drawn from the WebKB dataset. Each of them contains web pages as nodes and links connecting them. Cornell has 195 nodes and 304 edges. Texas consists of 187 nodes and 328 edges. Wisconsin contains 265 nodes and 530 edges.

  • Protein-Protein-Interaction Networks

Table 2 Details of datasets. |V|, |E|, and |Attr| are the number of nodes, links, and attributes of each node, respectively.CC is the average clustering coefficient, Max_Degree is the maximum degree of a node in the network, 〈d〉 is the average shortest path in the network and r is the assortative coefficient

We also evaluated our proposed method, using three protein networks provided by Guo’s dataset [28]. E. coli dataset containing 1834 proteins with 6954 interactions, Drosophila dataset containing 7059 proteins with 21,975 interactions, and C. elegan dataset containing 1607 proteins with 2877 interactions.

4.2 Comparing methods

To evaluate our proposed method, we compare it against some of the state-of-the-art methods for representation learning. The description of these methods can be found here:

  • NMF is used directly and the reconstructed matrix is considered as the score matrix.

    $$ \underset{W\ge 0,H\ge 0\ }{\min }{\left\Vert \left(X-W{H}^T\right)\right\Vert}_F^2 $$
  • GNMF [9] method combines the link structure with graph neighbor information of nodes.

    $$ \underset{U\ge 0,V\ge 0\ }{\min }{\left\Vert X-W{H}^T\right\Vert}_F^2+\lambda Tr\left({H}^T LH\right), $$

where λ is the parameter, L is the Laplacian matrix that is obtained via L = D − S, in which D is the diagonal matrix and S is the similarity matrix between node pairs.

  • FSC-NMF [3] is an NMF-based method that learns node embeddings while taking the structure and contents of nodes into account. The model includes two optimization functions that iteratively optimize them.

    $$ {\displaystyle \begin{array}{c}\underset{B_1\ge 0,{B}_2\ge 0\ }{\min }{\left\Vert A-{B}_1{B}_2\right\Vert}_F^2+{\alpha}_1{\left\Vert {B}_1-U\right\Vert}_F^2+{\alpha}_2{\left\Vert {B}_1\right\Vert}_F^2+{\alpha}_3{\left\Vert {B}_2\right\Vert}_F^2,\\ {}\underset{U\ge 0,V\ge 0\ }{\min }{\left\Vert C-U{V}^T\right\Vert}_F^2+{\beta}_1{\left\Vert U-{B}_1\right\Vert}_F^2+{\beta}_2{\left\Vert U\right\Vert}_F^2+{\beta}_3{\left\Vert V\right\Vert}_F^2,\end{array}} $$
  • TADW [70] incorporates the text features of each node into the embedding process under a framework of matrix factorization

    $$ \underset{W,H\ }{\min }{\left\Vert X-{W}^T HT\right\Vert}_F^2+\frac{\lambda }{2}\left({\left\Vert W\right\Vert}_F^2+{\left\Vert H\right\Vert}_F^2\right) $$
  • M-NMF [66] uses a matrix factorization approach to learn the representation of nodes while capturing the community structure of the network

    $$ \underset{M\ge 0,U\ge 0,H\ge 0,C\ge 0\ }{\min }{\left\Vert S-M{U}^T\right\Vert}_F^2+\alpha {\left\Vert H-U{C}^T\right\Vert}_F^2-\kern0.5em \beta Tr\left({H}^T BH\right), $$

where α and β are non-negative parameters.

  • GraphSAGE [29] is a representation learning method based on CNN, which uses the nodes’ attributes of the neighborhood of each node to learn the embedding vector of that node.

4.3 Evaluation metrics

AUC [38]: The AUC metric evaluates link prediction methods based on the scores, they give to non-observed links. The AUC calculation is done through the following procedure: First, we compare the scores of randomly selected missing links vs. randomly selected non-existent links. Then, among n comparisons, assume the score of the missing link has been higher than the score of the non-existent link for n’ times and also for n”, the scores were equal. Then AUC can be defined as:

$$ AUC=\frac{n^{\prime }+0.5\ {n}^{\hbox{'}\hbox{'}}}{n} $$

F-measure: F1-measure can be interpreted as the weighted average of two other evaluation metrics, i.e. precision and recall. The F1-measure can be calculated as the following:

$$ F1=2\ast \frac{\left( precision\ast recall\right)}{\left( precision+ recall\right)} $$

In which, the precision and recall can be calculated as follows:

$$ {\displaystyle \begin{array}{c} precision=\frac{True\ Positive}{True\ Positive+ False\ Positive},\\ {} recall=\frac{True\ Positive}{True\ Positive+ False\ Negative}\end{array}} $$

ROC curve [55]: this curve can compare the capability of different models in identifying positive and negative samples, in varying decision thresholds. If model A has a higher true positive rate with respect to a false-positive rate, in all the thresholds, compared to model B, then model A is outperforming model B in the classification task. This curve is appropriate when we are dealing with a highly imbalanced dataset, like complex networks, in which the percentage of the negative class is significantly higher than the positive class.

RMSE and PCC are the standard deviations of the differences between the vectors of predicted and actual values of node pairs.

$$ {\displaystyle \begin{array}{c} RMSE=\sqrt{\frac{\sum_{i,j}{\left({r}_{ij}-{A}_{ij}\right)}^2}{n}},\\ {} PCC=\frac{\sum_{i,j}\left({A}_{ij}- average(A)\right)\left({r}_{ij}- average(r)\right)}{\sqrt{\sum_{i,j}{\left({A}_{ij}- average(A)\right)}^2}\ \sqrt{\sum_{i,j}{\left({r}_{ij}- average(r)\right)}^2}},\end{array}} $$

in which, Aij and rij are the actual value and the predicted value, respectively.

4.3.1 Parameter settings

To evaluate the proposed method against other methods, as usual, we need to split our data into training and testing sets. To get the positive samples for the test set, we randomly delete 10% of edges from the original network while making sure that the residual network is obtained after the edge removals are connected. And add them to the test set. The remaining edges belong to the training set. To obtain negative samples for training and testing, we randomly select an equal number of node pairs that have no link connecting them. Other parameters were set empirically. In particular, α = 0.1, max_iter = 40, and the dimension of latent space to 70.

4.4 Experiment analysis

The obtained results for the AUC measure are summarized in Table 3. The best-obtained result for each dataset is shown, highlighted in bold. It is shown that in all datasets, the proposed method has outperformed other methods. RGNMF-AN and RGNMF are abbreviations for the proposed methods with and without considering node attributes in the random walk generation process. In particular, except for CiteSeer and C.elegans, the proposed method has achieved the highest AUC in all the networks. To be more specific, the obtained results show that adding structure and content information can significantly improve the prediction accuracy compared to regular GNMF. For example, in Wisconsin, Cora, and CiteSeer networks, RGNMF-AN achieved 11%, 34%, and 29% improvement and RGNMF achieved 10%, 33%, and 27% of AUC, compared to GNMF. Also, it is shown that by using both structural and content information, we can obtain better results compared to using only structural information.

Table 3 AUC results for link prediction

The micro-F1 scores that resulted from all the algorithms are reported in Table 4. The best-obtained result for each dataset is shown, highlighted in bold. Results show that in all the datasets, the proposed methods have a better or competitive advantage over NMF and GNMF. In addition to that, in all networks, the proposed methods significantly outperformed embedding-based methods. For instance, the RGNMF-AN and RGNMF have achieved 39%, 30%, and 29% higher F-measure in Cornell, Washington, and Wisconsin compared to the best results obtained by embedding methods.

Table 4 F-measure results for link prediction

Tables 5 and 6 report the RMSE and PCC results for the proposed methods vs. comparing methods. As Table 5 shows, RGNMF-AN and RGNMF have achieved the lowest RMSE, compared to NMF and GNMF in all the networks except for CiteSeer. Compared to the embedding-based methods, i.e., FSCNMF, TADW, and MNMF, the proposed methods, achieved remarkable improvement in all networks. From Table 6, it can be concluded that overall, the proposed methods do not perform well in terms of PCC, compared to embedding-based methods, while compared with NMF and GNMF, they achieved considerable improvement.

Table 5 RMSE results for link prediction
Table 6 PCC results for link prediction

To investigate the effect of content and structure on the performance of link prediction in NMF-based methods, we used Fig. 1, to compare obtained AUCs for different networks. It is obvious that taking advantage of higher-order similarity, obtained by LRW, can increase the prediction accuracy compared to regular GNMF. Furthermore, in addition to structural information, taking advantage of the content of nodes in computing the similarity matrix can improve the obtained results.

Fig. 1
figure 1

Comparing the effect of structural and attributes information on GNMF

4.4.1 Parameter sensitivity

The performance of RGNMF-AN depends on various factors and parameters. Those parameters are the alpha, the dimensionality of latent space, and the maximum iteration number. To investigate the impact of these parameters, we performed experiments such that two out of three parameters are fixed and the third one is varied over a range of values. The alpha parameter can take a value between {105, 103, 101, 10−1,  10−3, 10−5} while the latent space dimension can have a value between {10, 20, …, 90} and the maximum number of iterations can be in the range of {10, 20, …, 60}.

4.4.2 Impact of alpha

Figure 2 demonstrates the effect of parameter alpha on the four evaluation metrics. It is shown that by decreasing the value of alpha from 105 to 101, the performance does not improve, but when the value of alpha is equal to 10−1, the performance increases significantly and from that point, varying the alpha value does not produce any significant change. Therefore, we can see that the optimal value of alpha should be set to 10−1.

Fig. 2
figure 2

Impact of alpha on the performance of the proposed method

4.4.3 Impact of iteration

Figure 3 illustrates the effect of the maximum number of iterations on the performance of RGNMF-AN using the four evaluation metrics. From observing Fig. 3, we can see that with the increase of iteration number, the model shows unstable results until we reach the point where the maximum number of iterations is equal to 40. At that point, the performance is stabilized which proves that the model is converged.

Fig. 3
figure 3

Impact of iteration number on the performance of the proposed method

4.4.4 Impact of latent space dimension

Figure 4 shows the effect of the value of the latent space dimension on the performance of the model. It is very critical to find the optimal value of a dimension since it has a direct impact on both performance and complexity. From Fig. 4, we can see that for most datasets, the best performance is achieved when we set the values of dimensions 70 and 80. Therefore, we chose to consider 70 as the optimal value for the latent space dimension.

Fig. 4
figure 4

Impact of latent space dimension on the performance of the proposed method

To make the evaluation more reliable, we use ROC curves to compare the proposed methods, i.e., RGNMF-AN and RGNMF, against other algorithms. The ROC curve for each algorithm illustrates the ability to distinguish between the positive and negative classes for that particular algorithm. In the worst-case scenario, an algorithm predicts the labels, randomly and as a result, the curve would be in the form of x = y. Since ROC curves are suitable for experiments in highly imbalanced datasets, they are perfect for the evaluation of link prediction methods. Figure 5 compares the obtained ROC curves for each method in all networks. Overall, RGNMF-AN and RGNMF obtained better results in almost all the thresholds and therefore achieved the highest area under the curve.

Fig. 5
figure 5

ROC Curve for the proposed method vs. comparing methods

5 Discussion

Most of the existing methods in the area of link prediction concentrate on the information provided by the topological structure of the networks. Although these methods have some benefits, they neglect a crucial information source, i.e. nodal attributes, which negatively impacts their performance. In this work, we proposed a method that considers both structural and non-structural information, that is present in the network. The nodes’ attributes are integrated with topological information via multiplying a Hadamard matrix. This approach was used to solve the link prediction problem in weighted and directed networks, but to the best of our knowledge, this is the first time that it has been applied to attributed graphs to deal with link prediction. Our proposed similarity matrix, which is the combination of structural and non-structural information has high accuracy in capturing the proximity between node pairs.

For evaluation purposes, we have performed various experiments on different datasets. The comparison methods are some of the most well-known NMF-based and deep learning-based methods. In particular, we have compared the performance of the proposed method, i.e., RGNMF-AN to six other methods, i.e., NMF, GNMF, FSC-NMF, TADW, M-NMF, and GraphSAGE. The obtained results in Table 3, illustrate the superiority of our algorithm in terms of AUC. In particular, in all the networks, except for CiteSeer, the proposed method has achieved better results. Table 4 shows the obtained F-scores for all methods. Clearly, RGNMF-AN has outperformed all the other methods and achieved better results. Tables 5 and 6 summarize the performance of RGNMF-AN, against the comparing methods. It is obvious that overall, the proposed methods have achieved the least RMSE among other methods. More specifically, except for CiteSeer, in all the other networks, RGNMF-AN achieved the best RMSE, whereas, according to Table 6, it was not able to do well, compared to other methods.

To further evaluate other aspects of our algorithm, we performed some experiments to investigate the effect of various parameters, i.e., alpha, number of iteration, and latent space dimensions on the performance of RGNMF-AN. The obtained results are illustrated in Figs. 2, 3, and 4. We have also, plotted the ROC curve for comparing the proposed method, vs. comparing methods that illustrate the better performance of RGNMF-AN.

6 Conclusion

In this paper, the authors focused on the problem of link prediction in attributed networks. The main contribution of the present paper was to propose a method, called Robust Graph Regularization Nonnegative Matrix Factorization (RGNMF-AN), that simultaneously considers topological and non-topological information about networks, to capture the semi-local proximity between a pair of nodes and present it as a weight between node pairs. Furthermore, ℓ2,1-norm was also used to constrain the objective function, to minimize the influence of the random noises and spurious links, in the step of reconstructing the original matrix. In addition, multiplicative updating rules are applied to learn and optimize the model parameters. According to the experimental analysis performed on various datasets, the proposed method has outperformed the other NMF-based methods and achieved better results, in terms of predicting new links. In particular, an extensive experimental analysis was performed on nine real-world datasets, using AUC, F-measure, RMSE, and PCC metrics to compare the obtained results of the RGNMF-AN against other methods, and the superiority of the RGNMF-AN was proved. In future studies, the proposed method would have the option to be applied to multilayer, signed, and bipartite networks to deal with the link prediction problem. Furthermore, suggesting an approach to specify a high-order relationship among nodes by applying hypergraph form in the present study could be an excellent topic for future studies.