
1 Introduction

Networks are a natural yet powerful structure that captures the relationships between entities in many domains, such as social networks, referral networks, and bioinformatics networks [12, 27]. Network analysis, also known as network science, has received attention for decades and remains an active field [1]. While the analysis of an individual network is critical for a variety of applications (e.g., link prediction; community detection), it cannot sufficiently address tasks that require considering the relationships between graphs (e.g., graph clustering; graph alignment). A subfield of network science has therefore emerged to analyze the relationships between different graphs. In this paper, we aim to solve one of the fundamental problems in comparative graph analysis: network alignment (NA).

The goal of network alignment is to identify corresponding nodes in different networks. For example, a large number of users hold accounts in multiple social networks [23], and network alignment can help identify the same users across these networks, as shown in Fig. 1. The user correspondences established by network alignment can alleviate the sparsity of a single social network, benefiting applications such as link prediction [5] and cross-domain recommendation [14]. Moreover, network alignment can help build a more compact knowledge graph from existing vertical or cross-language knowledge bases, enabling better knowledge inference. In bioinformatics, aligning protein-protein interaction networks from different species has also been extensively studied to identify common functional structures [7].

Fig. 1. An example of network alignment. The black lines between the two networks are anchor links; the dashed lines are potential aligned node pairs.

However, network alignment faces three primary challenges: alignment efficiency, network noise, and data sparsity.

Alignment Efficiency. Some NA work formulates network alignment as a maximum matching problem on a bipartite graph or as the largest common subgraph problem, but these are NP-hard problems [2]. Therefore, many methods adopt matrix factorization formulations, such as IsoRank [18], FINAL [28], and REGAL [8]. However, these approaches cannot handle very large networks because the required computation grows rapidly with network size.

Network Noise. Due to inevitable errors in data measurement or collection, real-world networks are generally noisy or even incomplete. Network noise originates from both the topology of the network and the feature matrix of the nodes.

Data Sparsity. Similar to other types of large-scale data, large-scale networks follow long-tailed distributions and suffer from severe data sparsity [26]. Only a few paths are associated with long-tail nodes, so the representations learned for them are highly inaccurate.

To improve alignment efficiency, methods based on network representation learning have been proposed, such as PALE [15] and DeepLink [30]. These alignment techniques exploit the scalability of graph embedding to handle large networks, but they rely only on topology information and are susceptible to structural noise, which limits their generalization. To overcome network structure noise, we generate better network structure representations based on an iterative deep graph learning framework.

Moreover, most network alignment methods focus on the global features of the network while neglecting local structural features. In sparse networks in particular, such methods perform poorly because of the long-tail distribution. In contrast, knowledge representation learning methods, such as the TransE [3], TransH [24], DistMult [25], ComplEx [21], and RotatE [20] models, can characterize the local structure of networks well. In view of this, to address data sparsity, we adapt these knowledge representation learning methods to network alignment to obtain high-quality local features.

In this paper, we propose Iterative Deep Graph Learning with Local Feature Augmentation for Network Alignment (IDLFA). The model consists of two parts: an encoder module and a decoder module. The encoder module learns node structure embeddings through an iterative deep graph learning model. The decoder module integrates knowledge representation learning methods into the alignment method to augment local features. During training, a bootstrapping algorithm adds newly generated alignment nodes to the training set to further alleviate data sparsity. The contributions of this paper can be summarized as follows:

1) We propose a unified network alignment framework, which combines an encoder module and a decoder module to address network structure noise and data sparsity.

2) In the encoder module, we leverage Iterative Deep Graph Learning for Graph Neural Networks to obtain better node structural embeddings and reduce network noise.

3) In the decoder module, to ease data sparsity, we integrate knowledge representation methods to augment local features and apply a bootstrapping algorithm to produce new alignments for model training.

4) Experiments on real-world datasets demonstrate that our network alignment method based on iterative deep graph learning outperforms state-of-the-art models and is highly robust on alignment tasks.

2 Related Work

In our work, network alignment techniques are divided into two categories: spectral methods and network representation learning methods. The goal of spectral approaches is to align two networks based on operations on their adjacency matrices, whereas network representation learning methods require an intermediate step in which the nodes of each network are represented as embeddings.

2.1 Spectral Method

Many spectral methods [11, 17] use matrix factorization and aim to directly compute the alignment matrix. Assuming the input graphs are given as adjacency matrices, a spectral alignment technique defines the model as a loss function over the adjacency matrices of the source and target networks, where the node features are constants and the alignment matrix is the variable. During alignment, the alignment matrix is learned by optimizing the loss function under structure or attribute consistency assumptions.

IsoRank [18] is one of the most popular and representative techniques in this category and utilizes only topological information. Its main idea is that two nodes in two networks are similar if their neighbors are similar, but the technique is highly sensitive to structural noise. BigAlign [10] uses only attribute information to align nodes. FINAL [28] differs from previous methods that used only topological or only attribute information: it uses both to better capture node information. REGAL [8] models the alignment matrix by topological and feature similarity and then employs a low-rank matrix approximation to speed up the calculation.

2.2 Network Representation Learning Method

Network representation learning approaches [6, 16, 29] solve the network alignment problem by exploiting graph embedding. They involve two steps: embedding generation and alignment matrix generation. First, a graph embedding technique represents the nodes, producing an embedding matrix for each graph separately. Then, an alignment matrix is learned to map the source network's embeddings to the target network's embedding space.

PALE [15] involves a pre-processing step in which a priori mappings between the two networks are used to populate edges present in one network but missing in the other. DeepLink [30] constructs graph embeddings in the same way as PALE, but its mapping function differs in that it considers the mapping direction; it adopts unbiased random walks to generate embeddings and uses a linked dual-learning process to improve their quality. IONE [13] uses the same mapping function as PALE, but its embedding function is more complex because it considers the neighborhood of each node. IONE aims to meet two goals: nearby nodes in each graph should have similar embeddings, and nodes with close embeddings are good alignment candidates.

3 Preliminaries

In this section, we first formally define the task of network alignment, and then briefly review the knowledge representation models.

3.1 Problem Formulation

Network alignment is the task of identifying corresponding nodes between two different networks. Given a source network \(G_s=(V_s,E_s)\) and a target network \(G_t=(V_t,E_t)\), where \(V_s\) and \(V_t\) are the node sets and \(E_s\) and \(E_t\) are the edge sets, an anchor link is a node pair \((v,v')\) with \(v \in V_s\) and \(v' \in V_t\), where v and \(v'\) are aligned. The goal of network alignment is to predict all potential anchor links.

3.2 Knowledge Representation Model

In this section, we introduce the TransE [3], TransH [24], DistMult [25], ComplEx [21], and RotatE [20] models, which are adapted to our network alignment framework. Here u and v denote node embeddings, and e denotes the edge embedding.

TransE. The idea of TransE is that the embedding of a node in the source network, translated by the edge embedding, should be close to the embedding of the corresponding node in the target network, so the score function of the TransE model is:

$$\begin{aligned} f_{TransE}(u+e,v)= \left\| u+e-v\right\| . \end{aligned}$$
(1)

TransH. To overcome the deficiency of TransE in edge modeling, TransH gives nodes distributed representations when different edges are involved. For each edge, the model positions an edge-specific translation vector \(d_e\) in the hyperplane \(w_e\) of that edge rather than in the node embedding space:

$$\begin{aligned} f_{TransH}(u,v)= \left\| (u-w_e^Tuw_e)+d_e-(v-w_e^Tvw_e)\right\| ^2_2. \end{aligned}$$
(2)

DistMult. The model employs bilinear encoding, and the embeddings of nodes and edges are learned through a neural network: the first layer projects a pair of input nodes onto low-dimensional vectors, and the second layer combines the two vectors into a scalar. With an edge-specific parameter matrix \(B_e\), the score function is:

$$\begin{aligned} f_{DistMult}(u,v)= u^T B_e v. \end{aligned}$$
(3)

ComplEx. The model introduces the complex vector space into the embedding, and its score function is:

$$\begin{aligned} f_{ComplEx}(e,u,v;\varTheta )= Re(<e,u,\overline{v}>)=Re(\sum _{k=1}^K e_k u_k\overline{v}_k), \end{aligned}$$
(4)

where \(Re(\cdot )\) denotes the real part of a complex number, and \(\overline{v}_k\) is the complex conjugate of \(v_k\).

RotatE. Similar to the ComplEx model, RotatE models nodes and edges in the complex vector space. The difference is that RotatE constrains the modulus of the edge vector to 1, so that the edge becomes a rotation from a source network node to the corresponding target network node. Its score function is:

$$\begin{aligned} f_{RotatE}(u,v)= \left\| u \odot e-v\right\| , \end{aligned}$$
(5)

where \(\odot \) denotes the Hadamard product, and \(\Vert e_i\Vert =1\) indicates that the modulus of each component of the edge vector is set to 1.
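For concreteness, the following PyTorch sketch implements the five score functions of Eqs. (1)-(5). The tensor shapes, the assumption that \(w_e\) is unit-normalized, and the use of complex-valued tensors for ComplEx and RotatE are our own illustrative choices, not part of the original models' implementations.

```python
import torch

def score_transe(u, e, v):
    # Eq. (1): || u + e - v ||
    return torch.linalg.vector_norm(u + e - v, dim=-1)

def score_transh(u, v, w_e, d_e):
    # Eq. (2): project u and v onto the edge-specific hyperplane defined by
    # w_e (assumed unit-normalized), then translate by d_e.
    u_p = u - (u * w_e).sum(-1, keepdim=True) * w_e
    v_p = v - (v * w_e).sum(-1, keepdim=True) * w_e
    return torch.linalg.vector_norm(u_p + d_e - v_p, dim=-1) ** 2

def score_distmult(u, v, B_e):
    # Eq. (3): bilinear form u^T B_e v with an edge-specific matrix B_e.
    return torch.einsum('...i,ij,...j->...', u, B_e, v)

def score_complex(e, u, v):
    # Eq. (4): Re(<e, u, conj(v)>) with complex-valued embeddings.
    return torch.real((e * u * torch.conj(v)).sum(-1))

def score_rotate(u, e, v):
    # Eq. (5): || u o e - v ||, where e is a complex vector with |e_i| = 1
    # acting as an element-wise rotation.
    return torch.linalg.vector_norm(u * e - v, dim=-1)
```

Note that the translational scores (TransE, TransH, RotatE) are distances, where lower is better, while DistMult and ComplEx are similarities, where higher is better.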

Fig. 2. Overview of the IDLFA framework.

4 Method

In this section, we present our approach IDLFA, which consists of an encoder module and a decoder module; the framework is shown in Fig. 2. In the encoder module, the network structure is learned according to the iterative deep graph learning framework. In the decoder module, we combine the knowledge representation models with the loss function of the IDGL model and adapt it to network alignment, which helps learn better local features. To further alleviate data sparsity, we use the bootstrapping algorithm [19] to add predicted aligned node pairs to the training data.

4.1 IDGL-Based Node Embedding

The Iterative Deep Graph Learning model (IDGL) [4] is an end-to-end graph learning framework that jointly and iteratively learns the graph structure and graph embeddings. Given its ability to obtain better network representations, we adapt it to network alignment. Figure 3 shows the overall architecture of the IDGL framework; a brief introduction to the model follows.

Similarity Metric Learning. Without loss of generality, the IDGL model uses a weighted cosine similarity as the metric function, \(s_{ij}^p = cos(w \odot v_i, w \odot v_j)\), where \(\odot \) denotes the Hadamard product, and w is a learnable weight vector with the same dimension as the input vectors \(v_i\) and \(v_j\). Note that the two input vectors can be either raw node features or computed node embeddings.
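A minimal sketch of this metric, assuming a single learnable weight vector (the full IDGL model averages the similarity over several such vectors):

```python
import torch
import torch.nn.functional as F

class WeightedCosine(torch.nn.Module):
    """Weighted cosine similarity s_ij = cos(w o v_i, w o v_j)."""
    def __init__(self, dim):
        super().__init__()
        self.w = torch.nn.Parameter(torch.ones(dim))  # learnable weight vector

    def forward(self, v):
        # v: (n, dim) raw node features or computed node embeddings
        h = F.normalize(self.w * v, p=2, dim=-1)      # weight, then L2-normalize
        return h @ h.t()                              # (n, n) similarity matrix
```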

Graph Node Embeddings and Prediction. Both the learned graph structure A and the original graph topology \(A^{(0)}\) are helpful to formulate an optimized graph for GNNs. IDGL combines the learned graph with the initial graph,

$$\begin{aligned} {\tilde{A}}^{(t)} = \lambda L^{(0)} + (1-\lambda )\{ {\eta f(A^{(t)}) + (1-\eta )f(A^{(1)})} \}, \end{aligned}$$
(6)

where \(L^{(0)}={D^{(0)}}^{-1/2} A^{(0)}{D^{(0)}}^{-1/2}\) is the normalized adjacency matrix of the initial graph. \(A^{(t)}\) and \(A^{(1)}\) are the two adjacency matrices computed at the t-th and 1-st iterations, respectively. \(A^{(1)}\) is computed from the raw node features X, and \(A^{(t)}\) is computed from the previously updated node embeddings \(Z^{(t-1)}\) that is optimized toward the downstream prediction task. Hyperparameter \(\eta \) is used to combine the advantages of both; \(\lambda \) is used to balance the learned graph structure and the initial one.
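As an illustration, Eq. (6) can be computed as below; the normalization \(f(\cdot )\) is assumed here to be row normalization, which is one common choice:

```python
import torch

def combine_graphs(L0, A_t, A_1, lam, eta):
    # Eq. (6): blend the normalized initial adjacency L0 with the adjacency
    # learned at the current iteration (A_t) and at the first iteration (A_1).
    def f(A):  # assumed row normalization
        return A / A.sum(dim=-1, keepdim=True).clamp(min=1e-12)
    return lam * L0 + (1 - lam) * (eta * f(A_t) + (1 - eta) * f(A_1))
```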

Fig. 3. Overall architecture of the IDGL framework. Dashed lines (in the data points on the left) indicate the initial noisy graph topology A.

We follow the setting of the IDGL model and adopt a two-layer GCN [9], where the first layer (denoted GNN\(_1\)) maps the raw node features \(\textbf{X}\) to an intermediate embedding space, and the second layer (denoted GNN\(_2\)) further maps the intermediate node embeddings \(\textbf{Z}\) to the output space:

$$\begin{aligned} \textbf{Z}={\text {ReLU}}\left( \textbf{MP}(\textbf{X}, \tilde{\textbf{A}}) \textbf{W}_{1}\right) , \quad \widehat{\textbf{y}}=\sigma \left( \textbf{MP}(\textbf{Z}, \tilde{\textbf{A}}) \textbf{W}_{2}\right) , \quad \mathcal {L}_{pred}=\ell (\widehat{\textbf{y}}, \textbf{y}), \end{aligned}$$
(7)

where \(\sigma (\cdot )\) and \(\ell (\cdot )\) are the task-dependent output function and loss function, respectively, and \(\textbf{MP}(\cdot ,\cdot )\) is a message-passing function.
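A minimal sketch of this two-layer GCN, assuming the message-passing function is \(\textbf{MP}(\textbf{X}, \tilde{\textbf{A}}) = \tilde{\textbf{A}}\textbf{X}\) with \(\tilde{\textbf{A}}\) already normalized:

```python
import torch
import torch.nn.functional as F

class TwoLayerGCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.W1 = torch.nn.Linear(in_dim, hid_dim, bias=False)   # GNN_1
        self.W2 = torch.nn.Linear(hid_dim, out_dim, bias=False)  # GNN_2

    def forward(self, X, A_tilde):
        # Raw features -> intermediate embeddings Z (first half of Eq. (7)).
        Z = F.relu(self.W1(A_tilde @ X))
        # Embeddings -> output space; the task-dependent sigma is applied outside.
        return self.W2(A_tilde @ Z)
```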

Joint Learning with a Hybrid Loss. The IDGL model jointly and iteratively learns the graph structure and the GNN parameters by minimizing a hybrid loss function that combines the task prediction loss and a graph regularization loss:

$$\begin{aligned} \mathcal {L} = \mathcal {L}_{pred} + \mathcal {L}_{\mathcal {G}}, \end{aligned}$$
(8)

where \(\mathcal {L}_{\mathcal {G}}\) is the graph regularization loss. At each iteration, a hybrid loss is computed. After all iterations, the overall loss is back-propagated through all previous iterations to update the model parameters.

4.2 Model Training

\(\mathcal {L}_{pred}\) is a task-dependent loss function. For network alignment, the objective is to embed equivalent nodes as closely as possible in the vector space, so the model is trained by minimizing the following margin-based ranking loss:

$$\begin{aligned} \mathcal {L}_{pred} = \sum _{(u, v) \in S} \sum _{(u^{\prime }, v^{\prime }) \in S_{(u,v)}^{\prime }}\left[ f\left( {u,v}\right) +\gamma - f\left( {u^{\prime },v^{\prime }}\right) \right] _{+}, \end{aligned}$$
(9)

where \([x]_+ = \max \{{0,x}\}\); S is the set of anchor links used to train the model; \(S_{(u,v)}^{\prime }\) denotes the set of negative instances constructed by corrupting (u, v), i.e., replacing u or v with a randomly chosen node in \(G_s\) or \(G_t\); and \(\gamma > 0\) is the margin hyper-parameter separating positive and negative instances. The margin-based loss requires the distance between entities in positive pairs to be small and the distance between entities in negative pairs to be large. Based on the knowledge representation methods in Sect. 3.2, we design corresponding loss functions, detailed below.
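A direct sketch of Eq. (9), assuming the positive and negative pair scores have already been computed with one of the score functions above:

```python
import torch

def margin_ranking_loss(score_pos, score_neg, gamma=1.0):
    # Eq. (9): sum of [f(u,v) + gamma - f(u',v')]_+ over anchor links and
    # their corrupted negatives; assumes f is a distance (lower = better)
    # and score_pos / score_neg are aligned element-wise.
    return torch.clamp(score_pos + gamma - score_neg, min=0).sum()
```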

TransE. We merge it with margin-based ranking loss function:

$$\begin{aligned} {\mathcal {L}_{\text {TransE}}} = \sum _{(u, v) \in S} \sum _{(u^{\prime }, v^{\prime }) \in S_{(u,v)}^{\prime }}\left[ f_{TransE}\left( {u+e,v}\right) +\gamma _1 - f_{TransE}\left( {u^{\prime }+e,v^{\prime }}\right) \right] _{+}, \end{aligned}$$
(10)

where \(\gamma _1>0\) is the margin hyperparameter separating positive and negative node alignments in the TransE model.

TransH. We merge it with margin-based ranking loss function:

$$\begin{aligned} {\mathcal {L}_{\text {TransH}}} = \sum _{(u, v) \in S} \sum _{(u^{\prime }, v^{\prime }) \in S_{(u,v)}^{\prime }}\left[ f_{TransH}\left( {u,v}\right) +\gamma _2 - f_{TransH}\left( {u^{\prime },v^{\prime }}\right) \right] _{+}, \end{aligned}$$
(11)

where \(\gamma _2>0\) is the corresponding margin hyperparameter.

DistMult. Based on DistMult model, the loss function \(\mathcal {L}_{pred}\) can be adjusted to:

$$\begin{aligned} {\mathcal {L}_{\text {DistMult}}} = \sum _{(u, v) \in S} \sum _{(u^{\prime }, v^{\prime }) \in S_{(u,v)}^{\prime }}\left[ f_{DistMult}\left( {u,v}\right) - f_{DistMult}\left( {u^{\prime },v^{\prime }}\right) + 1 \right] _{+}. \end{aligned}$$
(12)

ComplEx. We minimize the negative log-likelihood of the logistic model with regularization on the model parameters, and train the model using mini-batch stochastic gradient descent with AdaGrad to adapt the learning rates:

$$\begin{aligned} \mathcal {L}_{\text{ ComplEx } }=\sum _{(u, v) \in S} \log \left( 1+\exp \left( -\textbf{Y}_{e u v} f_{\text{ ComplEx } }(u, e, v ; \varTheta )\right) \right) +\lambda \Vert \varTheta \Vert _{2}^{2}, \end{aligned}$$
(13)

where \(\textbf{Y}_{e u v}=1\) when the node pair is positive and \(\textbf{Y}_{e u v}=-1\) otherwise; \(\lambda \) is a regularization weight.

RotatE. Unlike the above models, RotatE adopts a self-adversarial loss function based on negative sampling for training:

$$\begin{aligned} \mathcal {L}_{\text{ RotatE } }=-\log \sigma \left( \gamma _{3}-f_{\text{ RotatE } }(\textbf{u}, \textbf{v})\right) -\sum _{i=1}^{n} p\left( u_{i}^{\prime }, e, v_{i}^{\prime }\right) \log \sigma \left( f_{\text{ RotatE } }\left( \textbf{u}_{i}^{\prime }, \textbf{v}_{i}^{\prime }\right) -\gamma _{3}\right) , \end{aligned}$$
(14)

where \(\gamma _3\) is the corresponding margin hyperparameter; \(\sigma \) is the sigmoid function; \(\left( u_{i}^{\prime }, e, v_{i}^{\prime }\right) \) is the i-th negative alignment pair; and the weight \(p\left( u_{i}^{\prime }, e, v_{i}^{\prime }\right) \) is defined as:

$$\begin{aligned} p\left( u_{j}^{\prime }, e, v_{j}^{\prime } \mid \left\{ \left( u_{i}, e_{i}, v_{i}\right) \right\} \right) =\frac{\exp \alpha f_{\text{ RotatE } }\left( \textbf{u}_{j}^{\prime }, \textbf{v}_{j}^{\prime }\right) }{\sum _{i} \exp \alpha f_{\text{ RotatE } }\left( \textbf{u}_{i}^{\prime }, \textbf{v}_{i}^{\prime }\right) }, \end{aligned}$$
(15)

where \(\alpha \) denotes the temperature of sampling.
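The following sketch computes Eqs. (14)-(15) from precomputed positive and negative distances; detaching the softmax weights so they act as sampling probabilities rather than gradient paths follows the original RotatE implementation and is our assumption here:

```python
import torch
import torch.nn.functional as F

def rotate_self_adversarial_loss(pos_score, neg_scores, gamma=1.0, alpha=1.0):
    # pos_score: distance of the positive pair; neg_scores: (n,) distances
    # of the n corrupted pairs.
    p = F.softmax(alpha * neg_scores, dim=0).detach()  # Eq. (15) weights
    pos_term = -F.logsigmoid(gamma - pos_score)        # positive part of Eq. (14)
    neg_term = -(p * F.logsigmoid(neg_scores - gamma)).sum()
    return pos_term + neg_term
```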

The loss functions of the above models learn better network structure representations. Meanwhile, the model adds newly predicted alignment nodes to the training set through the bootstrapping algorithm, which helps alleviate data sparsity and further improves performance.
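A hypothetical sketch of this bootstrapping step; the mutual-nearest-neighbour criterion and the similarity threshold are our illustrative assumptions, not the exact rule of [19]:

```python
import torch
import torch.nn.functional as F

def bootstrap_new_anchors(emb_s, emb_t, train_pairs, threshold=0.9):
    # Add confident, mutually-nearest node pairs to the training anchors.
    sim = F.normalize(emb_s, dim=-1) @ F.normalize(emb_t, dim=-1).t()
    s2t = sim.argmax(dim=1)   # best target node for each source node
    t2s = sim.argmax(dim=0)   # best source node for each target node
    known = set(train_pairs)
    new_pairs = []
    for i in range(sim.size(0)):
        j = s2t[i].item()
        if t2s[j].item() == i and sim[i, j] > threshold and (i, j) not in known:
            new_pairs.append((i, j))
    return train_pairs + new_pairs
```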

4.3 Alignment Prediction

We predict alignment results based on the distance between the learned node representations of the two networks.

Euclidean distance and Manhattan distance are commonly used distance measures in Euclidean space. For an entity \(u_i\) in \(G_s\) and an entity \(v_j\) in \(G_t\), we define the distance as:

$$\begin{aligned} D\left( u_{i}, v_{j}\right) =\frac{f\left( \textbf{u}_{i}, \textbf{v}_{j}\right) }{d}, \end{aligned}$$
(16)

where \(f(x,y)=\Vert x-y\Vert _1\) and \(\Vert \cdot \Vert _1\) is the \(L_1\) norm; d denotes the embedding dimension. The distance is expected to be small for equivalent entities and large for non-equivalent ones. For a specific entity \(u_i\) in \(G_s\), our approach computes the distances between \(u_i\) and all entities in \(G_t\) and returns a ranked list of candidate alignments.
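A minimal sketch of this prediction step using Eq. (16):

```python
import torch

def rank_candidates(emb_s, emb_t):
    # Eq. (16): dimension-normalized L1 distance between every source/target
    # pair; returns, for each source entity, target indices ranked best-first.
    d = emb_s.size(-1)
    dist = torch.cdist(emb_s, emb_t, p=1) / d  # (n_s, n_t)
    return dist.argsort(dim=1)
```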

5 Experiment

In this section, we conduct extensive experiments to verify the effectiveness of the model. We first introduce the experimental settings and then evaluate the performance of our method. For further analysis, an ablation study and parameter analysis are performed.

5.1 Experiment Setup

Datasets. This section conducts experiments on two real-world datasets (four real-world networks). Detailed statistics are shown in Table 1.

Flickr and Myspace datasets: the two subnetworks of Flickr and Myspace were collected in [27] and then processed according to the method in [28]. The Flickr subnetwork contains 6,714 nodes, and the Myspace subnetwork contains 10,733 nodes. The gender of each user represents the node attribute, and only part of the ground truth is available for alignment.

Allmovie and Imdb datasets: the Allmovie network is constructed from the Rotten Tomatoes website. Two films are connected by an edge if they share at least one actor. The Imdb network is constructed in a similar way from the Imdb website. The ground-truth alignment is constructed from film identity and contains 5,176 anchor links.

Evaluation Metrics. We use both Success@q [28] and MAP (Mean Average Precision) [15] to evaluate the effectiveness of our proposed model. Success@q indicates whether a true match appears among the top-q candidates. From a ranking perspective, MAP is equivalent to Mean Reciprocal Rank in the pair-wise setting. Since network alignment is a bidirectional task, we report the average of the \(G_s \rightarrow G_t\) and \(G_t \rightarrow G_s\) results.
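For reference, a minimal sketch of both metrics, assuming each test anchor has exactly one true match:

```python
def success_at_q(ranks, q):
    # ranks: 1-based rank of the true match for each test anchor
    return sum(r <= q for r in ranks) / len(ranks)

def mean_average_precision(ranks):
    # With one true match per query, MAP reduces to mean reciprocal rank.
    return sum(1.0 / r for r in ranks) / len(ranks)
```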

Table 1. Statistics of 4 real-world networks.

Comparison Methods. Our proposed model IDLFA, its variants, and the state-of-the-art baseline methods for comparison are listed as follows:

PALE [15]: is a network representation technique that learns node embeddings by maximizing the co-occurrence likelihood of connected nodes, and then applies a linear or multi-layer perceptron mapping function.

REGAL: is a spectral method that models the alignment matrix by topological and feature similarity of nodes, and then accelerates computation with a low-rank matrix approximation [8].

IsoRank: is a spectral, global alignment method initially applied to protein interaction networks [18].

FINAL: is a spectral method designed for attributed networks, which considers graph structure, node feature, and edge feature [28].

GAlign: is the state-of-the-art alignment model, a completely unsupervised network alignment framework based on a multi-order GCN embedding model [22].

Hyperparameter Tuning. The margin hyperparameters \(\gamma \), \(\gamma _1\), \(\gamma _2\), and \(\gamma _3\) in the corresponding loss functions are set to 1, and the value of \(\lambda \) is validated over \(\{0.1, 0.03, 0.01, 0.003, 0.001\}\). The embedding dimension is set to 100 and is evaluated further below. We optimize the model with stochastic gradient descent.

Machines and Repeatability. Results are averaged over 10 runs to mitigate randomness. All experiments are conducted on eight 3.6 GHz Intel cores with 64 GB RAM and one GeForce RTX 2080Ti graphics card. Our algorithm is implemented in Python.

5.2 Experiment Result

To verify the effectiveness of our proposed model, we compare it with several state-of-the-art models on two real-world datasets; the results are shown in Table 2. Bold numbers indicate the best results, and underlined numbers indicate the second-best. The results are obtained with 80% of the anchor nodes as the training set and the rest for testing.

Table 2. The performance of network alignment on real-world datasets.

Overall, our proposed IDLFA model outperforms all baselines on the Allmovie-Imdb dataset in terms of MAP, Success@1, and Success@10. In Success@1, IDLFA achieves more than 90%, exceeding GAlign by 8% and FINAL by nearly 15%. In MAP, IDLFA outperforms the second-best method by over 8%. In addition, the Success@10 of IDLFA is about 97%.

Table 3. The performance of our proposed model IDLFA on 20% anchor links.

On Flickr-Myspace, our proposed model achieves the second-best alignment accuracy. Compared with FINAL, the MAP of IDLFA increases by more than 0.07, and Success@10 is about 15% higher. Compared with the SOTA model GAlign, our method is nearly 2% lower in Success@1 and about 0.04 lower in MAP. Although IDLFA does not exceed GAlign here, it uses only structural information, whereas GAlign uses additional attribute information.

Weakly Supervised Condition. Table 3 compares our model with the previous SOTA GAlign on the Allmovie-Imdb and Flickr-Myspace datasets when the ratio of training set to test set is 0.2:0.8. On Allmovie-Imdb, IDLFA still achieves \(Success@1 \approx 78\%\), better than GAlign; on Flickr-Myspace, IDLFA outperforms GAlign by a large margin. These results show that the model remains robust and performs well in a weakly supervised setting. Moreover, IDLFA performs better on Allmovie-Imdb than on Flickr-Myspace, indicating that the model is more effective on datasets with abundant structural information under weak supervision.

Ablation Study. The local feature augmentation mechanism is based on the knowledge representation models and is designed for sparse datasets. Although the IDGL model can learn better structure representations, this presupposes that nodes have rich topology information. As Table 2 shows, the alignment accuracy of the same model varies greatly across datasets. What causes this phenomenon? As Table 1 shows, the Flickr and Myspace networks are relatively sparse, with an average node degree of about 2; that is, these networks contain a large number of long-tail nodes, whose structural features GNN-based models struggle to learn [26]. To verify the effectiveness of our method on sparse datasets, we therefore conduct the ablation study on Flickr-Myspace.

Table 4. The result of ablation study on Flickr-Myspace.

IDNA denotes our proposed model without the local feature augmentation module. IDNA+TransE, IDNA+TransH, IDNA+DistMult, IDNA+ComplEx, and IDNA+RotatE denote fusing IDNA with the corresponding knowledge representation method to learn better local features. Table 4 reports the results on four metrics. The local feature augmentation module obtains better performance in terms of MAP, Success@1, Success@3, and Success@10, indicating that it works well and learns better structure representations. When the percentage of anchor links is 0.8, IDNA+RotatE outperforms the other methods and improves Success@1 by 2% over IDNA. When the percentage is 0.2, IDNA+ComplEx performs best, with a 1.2% improvement in MAP.

Fig. 4. The relationship between alignment results and embedding dimension.

5.3 Hyperparameter Sensitivity

Figure 4 studies the sensitivity of the embedding dimension. In general, a high embedding dimension is not advisable: it does not increase performance (Success@1) significantly, while the time and space costs grow considerably. On Flickr-Myspace, performance fluctuates somewhat as the dimension increases, but is generally best at dimension 100. On Allmovie-Imdb, performance initially increases rapidly with the dimension and stabilizes once the dimension reaches about 100.

6 Conclusion

In this paper, we propose a novel network alignment framework, IDLFA, which learns better network structure representations and further mitigates network data sparsity. Comprehensive empirical studies on two pairs of popular real-world datasets show that IDLFA significantly improves performance on social network alignment tasks compared with existing solutions. On some datasets, such as Allmovie-Imdb, our model shows clear superiority, with Success@1 reaching 90%, and can be adopted in practical applications. Our model does not take attribute information into consideration; in future work, we will study a related framework for attributed networks, and the proposed framework can be applied to other tasks, e.g., cross-lingual knowledge graph alignment.