1 Introduction

In recent years, due to the popular and diverse functionalities of social networks, more and more users simultaneously own accounts on multiple social networks such as Twitter, Flickr, or Instagram [21]. Linking user accounts across different social networks plays an important role in many cross-network applications. For user profile modeling [4, 11], a comprehensive understanding of a user's interests can be obtained by aggregating the user's historical behaviors in different networks. For cross-network recommendation [12, 27, 28] and link prediction [6, 32,33,34], anchor users (i.e., identity-linked users) mitigate the cold-start and data-sparsity problems by enabling information transfer between aligned networks. However, because most social network platforms preserve the anonymity of their users, the correspondences among a user's different accounts are usually not revealed. Therefore, an interesting question arises: how can we find user correspondences across different social networks?

Most of the existing research on user identity linkage [3, 7, 9, 31] focuses on extracting user characteristics from user-contributed content (e.g., blogs or tweets posted by users) and structural information (e.g., connections and interactions between users), assuming independence between the two types of information. Hence, existing studies usually handle content information and structural information separately. However, while the two types of information describe users from different perspectives, we note that correlations exist between them. For example, a Twitter user is very likely to post tweets about a celebrity if s/he follows (likes) the celebrity. Therefore, structural information and content information are expected to share a common space in which the structure-based and content-based representations of a user are close to each other. We are thus motivated to model user characteristics by fusing structural and content information in a unified way for linking user identities across social networks.

It is, however, challenging to fuse structural and content information into effective user representations, because of two main issues. Firstly, content and structural information come from heterogeneous feature spaces (with different granularities and data structures), which makes it hard to fuse them in a joint space. Secondly, the information is diverse and noisy across different networks. For example, users may upload images as content in one social network (e.g., Flickr) and post textual content in another (e.g., Twitter). This diversity makes it extremely difficult to leverage the diverse types of information simultaneously to accurately link user identities. Moreover, user content is noisy, containing information irrelevant to characterizing user identities, such as advertisements. The network structure may be noisy as well, because not all edges represent true "friend" relations [21].

Recently, since representing a network as low-dimensional vectors is an efficient way to reduce computation and space costs [2], methods that embed multiple types of network information into a low-dimensional space have attracted a great deal of attention in a variety of fields, such as text mining and recommendation. For instance, Tang et al. [23] proposed a text embedding method based on heterogeneous text networks, which has proved useful for document classification. Xie et al. [25] proposed a generic graph-based embedding model that jointly captures the sequential effect, geographical influence, temporal cyclic effect, and semantic effect in a unified way for recommendation. However, these methods are either applied to individual networks or not designed for user identity linkage. There are also studies [10, 13] that align users across social networks by network embedding. Nevertheless, these methods consider only structural information represented as homogeneous networks. Hence, to achieve user linkage with higher performance, we design an effective method that jointly embeds structural and content information in multiple heterogeneous networks.

In this paper, we propose a Linked Heterogeneous Network Embedding model (LHNE) that learns comprehensive descriptions of users in different social networks by jointly leveraging structural and content information in a unified framework. First, we model the topics of user interests to represent the content information of different social networks at the same granularity and to filter out noise. Second, we capture friend-based (i.e., structure) and interest-based (i.e., content) user co-occurrence in a linked heterogeneous network consisting of four types of sub-networks (i.e., user-user intra/inter-network and user-topic intra/inter-network). Third, we learn effective user representations by embedding these sub-networks into a unified low-dimensional space. Meanwhile, to bridge different social networks, we learn user transfer and topic transfer from a set of seed users. Finally, users are matched by computing the similarity between the representations of users in different networks.

The main contributions are summarized as follows.

  1.

    We focus on learning comprehensive representations of users by jointly leveraging structural and content information in a unified way, integrating network structures and content into a linked heterogeneous network that incorporates the friend-based and interest-based user co-occurrence in different social networks.

  2.

    We propose a novel network embedding model, LHNE, which embeds the linked heterogeneous network into a unified low-dimensional space in terms of intra-network and inter-network embedding. Meanwhile, we learn user and topic transfer across social networks to address the diversity problem, utilizing a set of seed users as prior information.

  3.

    We demonstrate the performance of LHNE on both real social network and synthetic datasets. A series of experimental results validates that LHNE outperforms state-of-the-art methods in terms of effectiveness, reliability and sensitivity, and can work well even with little or no structural information.

The remainder of the paper is organized as follows. Section 2 reviews existing work related to our research. Section 3 defines the concepts and terms used in this paper and formally defines the user identity linkage problem. Section 4 details our proposed LHNE model. Experimental results on both real social network and synthetic datasets are presented in Section 5. We conclude our work in Section 6.

2 Related work

There are many studies addressing the user identity linkage problem by exploring various types of user information in multiple social networks, including profile information, structural information and content information. We group existing methods for user identity linkage into the following two main categories.

The first category of methods exploits a single type of user information for user identity linkage. The most intuitive way is to use profile information [15, 30], such as username, avatar and gender. However, profile information contains many null and inconsistent values, which makes it very hard to achieve satisfactory linkage accuracy. To enhance performance, many studies leverage structural information [8, 10, 13, 16, 22] or content information [18, 20] to discover user correspondences in different social networks. The common idea shared by structure-based methods is to extract neighborhood-based features as model inputs. For example, Narayanan et al. [16] proposed a graph-theoretic model based on the number of common neighbors to perform the user identity linkage task. Korula et al. [8] designed a parallelizable mapping algorithm based on neighborhood features such as the degrees of unmapped users and the number of common neighbors. Moreover, to address the high dimensionality of networks, embedding techniques have been employed to map networks into a low-dimensional space, followed by effective user representation learning to link user identities [10, 13]. For instance, Liu et al. [10] proposed a network representation learning method that simultaneously learns the follower-ship/followee-ship of individual users, and used seed users as constraints for user representation learning across networks. Man et al. [13] presented a supervised framework that learns embedding-based representations of nodes and links users' accounts by a projection method. Meanwhile, many studies have shown that content information is also conducive to user identity linkage. Phan et al. [18] regarded the user identity linkage task as a pairwise classification problem based on the content browsed by users on different devices, and then used gradient boosting to detect identical users. Riederer et al. [20] utilized an alignment algorithm to compute affinity scores based on time-stamped location data and then adopted a maximum weighted matching scheme to find the most likely candidate pairs. Overall, exploiting only one specific type of information leads to incomplete and biased user features, which impairs the performance of user identity linkage.

The second category of methods aims to harness multiple types of user information to improve the accuracy of user identity linkage [3, 7, 9, 31]. Kong et al. [7] developed an SVM classifier with a one-to-one constraint to predict anchor links by integrating neighborhood-based network features and content features. Zhang et al. [31] proposed a unified link prediction framework for collective link identification (inter-links and intra-links), which also extracts features from both structural and content information [7]. Cao et al. [3] adopted a bootstrapping method, which separately extracts features from usernames, social ties and content, and then learns model parameters by the EM algorithm. Liu et al. [9] proposed a multi-objective framework by modeling heterogeneous behaviors (e.g., profile features and content features) and structure consistency, respectively. However, most existing methods extract user features from different types of information separately and then combine them as model inputs. Since features extracted from different information sources have different feature spaces and underlying interpretations, directly concatenating them as input features may not be ideal.

Our work in this paper distinguishes itself from other research in the following three aspects.

  1.

    Unlike most prior works on anchor link prediction [3, 7, 31] and user identity linkage [9] that assume the independence between content and structural information, our model aims to jointly leverage structural and content information in a unified framework.

  2.

    Although several studies [10, 13] have exploited network embedding methods for user identity linkage, they are all based on structural information represented as homogeneous networks. In contrast, we propose a novel network embedding method that improves linkage performance using heterogeneous networks including both structural and content information.

  3.

    To enhance information exchange across networks, we address the cross-network diversity problem by learning user transfer and topic transfer across social networks from a set of seed users.

3 Problem definition

In this section, we define the preliminary concepts used in this paper and the user identity linkage problem. Without loss of generality, we focus on user identity linkage between two social networks; the two-network setting can be easily extended to multiple networks.

Let G = {U, E} be a social network, where U is the set of users and \(E \subseteq U \times U\) is the set of edges in G representing the social connections between users. Each user \(u \in U\) is associated with a vector of words \(V_u = \{v_1, v_2, \ldots, v_n\}\) representing the content contributed by user u in G. Each edge \(e_{ij} \in E\), connecting users \(u_i\) and \(u_j\), is associated with a weight \(w_{ij}\) denoting the strength of the correlation between \(u_i\) and \(u_j\). For example, if G is a co-authorship network, \(w_{ij}\) is the number of papers \(u_i\) and \(u_j\) have co-authored.

Then, the problem of user identity linkage in two social networks can be formally defined as follows.

Problem 1. :

(User Identity Linkage) Given two social networks Gx = {Ux,Ex} and Gy = {Uy,Ey}, the task of user identity linkage is to predict whether a pair of users \({u^{x}_{i}}\in U^{x}\) and \({u^{y}_{j}}\in U^{y}\) belongs to the same natural person.

4 Linked heterogeneous network embedding model

In this section, we first model the topics of user interests from content information, and present the LHNE method mathematically based on friend-based and interest-based proximities of users in terms of intra-network embedding (intra-NE) and inter-network embedding (inter-NE). Then, we present the joint learning of user embeddings and topic embeddings of different networks in a unified low-dimensional space. Next, we map users across social networks based on their representations. An illustration of LHNE is depicted in Figure 1. Finally, we present pseudocode summarizing the overall algorithm.

Figure 1: Illustration of the LHNE framework

4.1 Modeling the topics of user interests

As discussed in Section 1, user content information is not only diverse but also noisy. To exploit user content information effectively, we capture content-wise user proximity by modeling the topics of user interests from the content contributed by users in a social network.

In particular, given a social network G, we adopt Latent Dirichlet Allocation (LDA) [1] to model topics from the set of word vectors associated with users, \(V = \{V_u \mid u \in U\}\). After obtaining the topic distribution from V via LDA, we can capture user proximities through the topics of user interests.

The detailed distributions for LDA model are as below:

$$ \theta_{i}\sim{Dir(\alpha)},\phi_{k}\sim{Dir(\beta)},z_{ij}\sim{Multi(\theta{_{i}})},w_{ij}\sim{Multi(\phi{_{z_{ij}}})} $$
(1)

where α and β are hyper-parameters, Dir(⋅) is the Dirichlet distribution, and Multi(⋅) is the multinomial distribution. For each user i, we draw the topic distribution \(\theta_i\) from the Dirichlet distribution with parameter α (\(\theta_i \sim Dir(\alpha)\)). For each word, we first draw the topic \(z_{ij}\) from the user's topic distribution (\(z_{ij} \sim Multi(\theta_i)\)), and then select the word \(w_{ij}\) according to the topic-word dictionary \(\phi_{z_{ij}}\). The topic-word dictionary \(\phi_k\) also follows a Dirichlet distribution with parameter β (\(\phi_k \sim Dir(\beta)\)).

We train the LDA model by estimating its parameters with the Gibbs sampling method. The Gibbs updating rule can be derived as follows:

$$ P(z_{ij}=k|z_{\neg{ij}},w,\cdot)\propto{\frac{n^{\neg{ij}}_{i,k}+\alpha{_{k}}}{\sum\limits_{p = 1}^{P}(n^{\neg{ij}}_{i,p}+\alpha{_{p}})} \cdot\frac{n^{\neg{ij}}_{k,j}+\beta{_{j}}}{\sum\limits_{q = 1}^{Q}(n^{\neg{ij}}_{k,q}+\beta{_{q}})}} $$
(2)

where \(n_{i,k}\) is the number of times topic k is assigned to user i (i.e., the number of words of user i with \(z = k\)) and \(n_{k,j}\) is the number of times word j is assigned to topic k. After sufficient sampling iterations, the topic distribution \(\theta_i\) can be estimated by:

$$ \hat{\theta}{_{i,k}}=\frac{n_{i,k}+\alpha{_{k}}}{\sum\limits_{p = 1}^{P}(n_{i,p}+\alpha{_{p}})} $$
(3)

In our experiments, we noticed that some topics are not important for capturing user proximities in terms of interests. Therefore, instead of taking into account the complete set of topics, we select the set \(T_i = \{k \mid \theta_{i,k} > h\}\), where h is the topic threshold. By setting a suitable threshold h, we improve not only the computation efficiency, owing to the reduced number of topics, but also the robustness, by filtering out noise represented by insignificant topics.

Note that we model topics of user interests from each social network individually, instead of collectively extracting topics from the two social networks. The reason is that the content information of social networks is noisy and diverse; topics modeled from individual social networks are expected to be of higher quality than those extracted from combined social networks. This also allows more flexibility in parameter setting. For example, we may set different topic numbers for different social networks to make the topics semantically meaningful.
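To make this step concrete, below is a minimal sketch of per-network topic modeling with threshold filtering. It assumes gensim, whose variational LDA stands in for the Gibbs sampler described above; the helper name user_topics and the input format are hypothetical. The settings num_topics = 60, symmetric hyper-parameters 1/K, and h = 0.3 follow the experiments in Section 5.

```python
# Minimal sketch of Sect. 4.1: model topics for ONE social network and keep,
# per user, only the topics whose probability exceeds the threshold h.
from gensim import corpora, models

def user_topics(user_docs, num_topics=60, h=0.3):
    """user_docs: {user_id: [word, ...]} for a single network.
    Returns {user_id: {topic_id: probability}} with insignificant
    (noisy) topics filtered out, i.e. the set T_i = {k | theta_ik > h}."""
    dictionary = corpora.Dictionary(user_docs.values())
    bows = {u: dictionary.doc2bow(words) for u, words in user_docs.items()}
    lda = models.LdaModel(
        corpus=list(bows.values()),
        id2word=dictionary,
        num_topics=num_topics,
        alpha=1.0 / num_topics,  # symmetric hyper-parameters, as in Sect. 5.5
        eta=1.0 / num_topics,
    )
    return {
        u: dict(lda.get_document_topics(bow, minimum_probability=h))
        for u, bow in bows.items()
    }
```

The per-user output doubles as the weighted edge set of the user-topic bipartite network \(G_{ut}\) introduced in Section 4.2.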

4.2 Intra-network embedding

Given a social network G, we first apply intra-network embedding to embed it into a low-dimensional space by preserving friend-based proximities and interest-based proximities within a network. In particular, we perform the following two tasks.

Task 1. :

(User-User Intra-NE) The target is to preserve the friend-based proximities of users within a network. The intuitive idea is to make the representations of users sharing common neighbors to be as similar as possible.

Task 2. :

(User-Topic Intra-NE) The target is to preserve the interest-based proximities of users within a network. That is, the representations of users who are interested in the same topics are expected to be similar.

The first task can be performed directly on a given social network, e.g., G = (U, E). For the second task, we construct a user-topic bipartite network, defined as \(G_{ut} = \{U \cup T, E_{ut}\}\), where U is the set of users, T is the set of topics extracted by LDA from the content contributed by users, and \(E_{ut}\) is the set of edges connecting users and topics. Each edge connecting user \(u_i\) with topic \(t_j\) is associated with a weight \(w_{ij}\) representing the probability that \(u_i\) is interested in topic \(t_j\), which can be obtained from the output of the LDA model (i.e., (3)).

Then, for both Tasks 1 and 2, similar to existing representation learning methods [23], we first define the conditional probability between two nodes \(v_i\) and \(v_j\) as follows:

$$ p(v_{j}|v_{i})=\frac{\exp (\mathbf{z}_{j}\cdot{\mathbf{z}_{i}})} {\sum\nolimits_{v_{k}\in{V_{B}}}\exp (\mathbf{z}_{k}\cdot{\mathbf{z}_{i}})} $$
(4)

where \(v_i, v_j \in U\) for Task 1, while \(v_i \in U\) and \(v_j \in T\) for Task 2; \(V_B = \{v_k \in U \mid v_k \neq v_i\}\) for Task 1 and \(V_B = T\) for Task 2. \(\mathbf{z}_i\) and \(\mathbf{z}_j\) are the embedding vectors of nodes \(v_i\) and \(v_j\), respectively. To preserve the weight \(w_{ij}\) on edge \((v_i, v_j)\), we make the conditional distribution \(p(\cdot|v_i)\) coincide with its empirical distribution \(\hat{p}(\cdot|v_i)\), defined as \(\hat{p}(v_j|v_i)=\frac{w_{ij}}{d_i}\). Then, we minimize the following objective function:

$$ O^{\prime}=\sum\nolimits_{v_{i}\in{V_{A}}}\lambda{_{i}}D(\hat{p}(\cdot|v_{i})||p(\cdot|v_{i})) $$
(5)

where D(⋅||⋅) is the KL-divergence between two distributions and \(\lambda_i\) is the importance of node \(v_i\) in the network, which can be set to the degree \(d_{i}=\sum\nolimits_{j}w_{ij}\); \(V_A = U\) for both Tasks 1 and 2. By omitting some constants, the objective function (5) can be rewritten as:

$$ O^{\prime}=-\sum\nolimits_{(v_{i},v_{j})\in{E_{v}}}w_{ij}\log p(v_{j}|v_{i}) $$
(6)

where Ev is the set of edges between VA and VB.

Finally, based on the objective function (6), we can complete Task 1 in the network G and Task 2 in the bipartite network \(G_{ut}\) by minimizing the following objective functions:

$$ O_{1}=-\sum\limits_{({u_{i}^{x}},{u_{j}^{x}})\in{E^{x}}}w_{ij}^{x}\log p({u_{j}^{x}}|{u_{i}^{x}})-\sum\limits_{({u_{i}^{y}},{u_{j}^{y}})\in{E^{y}}}w_{ij}^{y}\log p({u_{j}^{y}}|{u_{i}^{y}}) $$
(7)
$$ O_{2}=-\sum\limits_{({u_{i}^{x}},{t_{j}^{x}})\in{E^{x}_{ut}}}w_{ij}^{x}\log p({t_{j}^{x}}|{u_{i}^{x}})-\sum\limits_{({u_{i}^{y}},{t_{j}^{y}})\in{E^{y}_{ut}}}w_{ij}^{y}\log p({t_{j}^{y}}|{u_{i}^{y}}) $$
(8)
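For intuition, the generic objective (6) can be evaluated directly on a small graph, as sketched below. This is a naive dense computation; in practice the objective is optimized by edge sampling with negative sampling (Section 4.4). The names edges, Z_src, and Z_dst are placeholders for an edge list over \(V_A \times V_B\) and the two embedding tables, and the Task-1 restriction \(v_k \neq v_i\) is omitted for brevity.

```python
# Naive evaluation of the intra-network objective (6): p(v_j | v_i) is the
# softmax over embedding dot products in (4); each edge contributes
# -w_ij * log p(v_j | v_i).
import numpy as np

def intra_objective(edges, Z_src, Z_dst):
    """edges: list of (i, j, w) with i indexing V_A and j indexing V_B;
    Z_src: (|V_A|, d) embeddings of V_A; Z_dst: (|V_B|, d) embeddings of V_B."""
    logits = Z_src @ Z_dst.T                       # (|V_A|, |V_B|)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -sum(w * log_p[i, j] for i, j, w in edges)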

4.3 Inter-network embedding

In this section, we perform inter-network embedding on the two networks Gx and Gy. Assuming that a set of anchor users bridging the two networks is available, we learn the inter-NE through the following tasks:

Task 3. :

(User-User Inter-NE) The target is to make the anchor and potential anchor users have coincident representations in a unified space utilizing the user transfer.

Task 4. :

(User-Topic Inter-NE) The target is to make the representations of anchor and potential anchor users who share common interests as similar as possible in a unified space, with the assistance of the topic transfer.

We first introduce the user transfer and topic transfer as follows.

User transfer is learned across the two networks Gx and Gy. To this end, a classifier (SVM) is trained for anchor link prediction [7] based on features (Footnote 1) of a set of anchor users Uo, and the classifier's outputs are treated as transfer probabilities between users. It has been proved that constraining these probabilities is equivalent to making the representations of anchor users coincide [10]. Therefore, we define the transfer probability \({p_{u}}({u_{i}^{x}}|{u_{k}^{y}})\) as the probability that two users \({u_{i}^{x}}\) and \({u_{k}^{y}}\) in different networks are the same person. Then, we obtain a set of seed users (anchor and potential anchor users) \(U^{s}=\{u_{k} \mid {p_{u}}({u_{k}^{x}}|{u_{k}^{y}})>q\}\), where q is a transfer threshold.
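Below is a minimal sketch of this step, assuming scikit-learn's SVC with probability estimates. The helper pair_features (standing in for the anchor-link features of [7], Footnote 1) and the default value of the threshold q are hypothetical; the 1:1 positive/negative ratio follows Section 5.5.

```python
# Sketch of the user-transfer step: train an SVM on known anchor pairs and
# labeled negatives, score candidate cross-network pairs, and keep pairs
# whose transfer probability p_u exceeds the threshold q as seed users.
import numpy as np
from sklearn.svm import SVC

def learn_user_transfer(anchor_pairs, negative_pairs, candidate_pairs,
                        pair_features, q=0.5):
    """*_pairs: lists of (u_x, u_y) user pairs; pair_features(u_x, u_y)
    returns a feature vector. Returns {(u_x, u_y): p_u} for seed pairs."""
    X = np.array([pair_features(ux, uy) for ux, uy in anchor_pairs + negative_pairs])
    y = np.array([1] * len(anchor_pairs) + [0] * len(negative_pairs))
    clf = SVC(probability=True).fit(X, y)      # 1:1 pos/neg ratio (Sect. 5.5)
    Xc = np.array([pair_features(ux, uy) for ux, uy in candidate_pairs])
    p_u = clf.predict_proba(Xc)[:, 1]          # transfer probability per pair
    return {pair: p for pair, p in zip(candidate_pairs, p_u) if p > q}
```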

Topic transfer is learned between the two bipartite networks \(G_{ut}^{x}\) and \(G_{ut}^{y}\). We follow the intuition that if many seed users are simultaneously interested in topic \({t_{i}^{x}}\) and topic \({t_{j}^{y}}\) in different networks, the two topics tend to be relevant or similar [27]. Therefore, we define the topic transfer probability between topic \({t_{i}^{x}}\) and topic \({t_{j}^{y}}\) based on the set of seed users \(U^s\) as:

$$\begin{array}{@{}rcl@{}} {p_{t}}({t_{j}^{y}}|{t_{i}^{x}}) &=& \sum\limits_{u_{k} \in {U^{s}}} p({t_{j}^{y}}|u_{k}) p(u_{k}|{t_{i}^{x}})\\ &=& \sum\limits_{u_{k} \in {U^{s}}} p({t_{j}^{y}}|u_{k}) \frac{ p({t_{i}^{x}}|u_{k})p(u_{k})}{p({t_{i}^{x}})}\\ &=& \sum\limits_{u_{k} \in {U^{s}}} \theta{_{j,k}^{y}}\ast \theta{_{i,k}^{x}}\ast \frac{p(u_{k})}{p({t_{i}^{x}})} \end{array} $$
(9)

where \(\theta{_{j,k}^{y}}\) and \(\theta{_{i,k}^{x}}\) are topic probabilities from the LDA models, \(p(u_k)\) is the user prior, defined as \(p(u_{k})={p_{u}}({u_{k}^{x}}|{u_{k}^{y}})\), and \(p({t_{i}^{x}})\) is the topic prior, defined as \(p({t_{i}^{x}})=\sum \limits _{u_{k} \in {U^{s}}}p({t_{i}^{x}}|u_{k})p(u_{k})=\sum \limits _{u_{k} \in {U^{s}}}\theta {_{i,k}^{x}}\cdot p(u_{k})\).
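In matrix form, (9) reduces to two matrix products, as the following numpy sketch shows; the argument names are hypothetical, and the rows of theta_x and theta_y are the LDA topic distributions of the seed users in the two networks.

```python
# Sketch of the topic transfer (9): compute p_t(t_j^y | t_i^x) for all topic
# pairs from the seed users' topic distributions and user priors.
import numpy as np

def topic_transfer(theta_x, theta_y, p_u):
    """theta_x: (n_seed, T_x), theta_y: (n_seed, T_y), p_u: (n_seed,) user
    priors p(u_k). Returns a (T_x, T_y) matrix with entry [i, j] equal to
    p_t(t_j^y | t_i^x)."""
    p_tx = theta_x.T @ p_u                        # topic priors p(t_i^x)
    joint = theta_x.T @ (theta_y * p_u[:, None])  # sum_k theta^x_ik theta^y_jk p(u_k)
    return joint / p_tx[:, None]
```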

With the assistance of user transfer, we can construct a user-user inter-network from Gx and Gy, defined as \(G^{H}_{uu} = \{U^{x} \cup U^{y}, E^{H}_{uu}\}\), where \(E^{H}_{uu}\) is the set of social links \(E^{x} \cup E^{y}\) together with the anchor links \(E^{o}_{uu}\) between anchor users. \(G^{H}_{uu}\) can propagate users' structural contexts across networks.

Similarly, we build a user-topic inter-network \(G^{H}_{ut}\) by connecting the two user-topic bipartite networks \(G_{ut}^{x}\) and \(G_{ut}^{y}\) through the learned topic transfer in (9).

Then, we can embed the two inter-networks \(G^{H}_{uu}\) and \(G^{H}_{ut}\) (Footnote 2) into a unified latent space. In particular, for Task 3, although there are no real anchor links between potential anchor user pairs, the information of Gx and Gy can interact through the user transfer probabilities \(p_u\). Therefore, we define the empirical probabilities based on \(p_u\) as:

$$\begin{array}{@{}rcl@{}} \hat p({u_{j}^{y}}|{u_{i}^{x}}) &=& \sum\limits_{{u_{k}} \in {U^{x}}} {\hat p({u_{k}}|{{u_{i}^{x}}})\cdot {p_{u}}({u_{j}^{y}}|u_{k})}\\ &=&\sum\limits_{{u_{k}} \in {U^{x}}} {\frac{w_{ik}^{x}}{{d_{i}^{x}}}\ast {p_{u}}({u_{j}^{y}}|u_{k})} \end{array} $$
(10)

We minimize the KL-divergence between \(p({u_{j}^{y}}|{u_{i}^{x}})\) and \(\hat p({u_{j}^{y}}|{u_{i}^{x}})\), and obtain the corresponding objective function:

$$ O_{3}^{\prime} = - \sum\limits_{({{u_{i}^{x}}},{u_{k}}) \in {E^{x}}} \sum\limits_{{{u_{j}^{y}}} \in {U^{y}}} w_{ik}^{x}\,{p_{u}}({u_{j}^{y}}|{u_{k}})\log p(u_{j}^{y}|u_{i}^{x}) $$
(11)
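Note that the empirical distribution (10) is simply the row-normalized adjacency matrix of Gx multiplied by the user-transfer matrix; a minimal numpy sketch (with hypothetical names, assuming every user has at least one intra-network edge):

```python
# Sketch of (10): propagate intra-network edge weights through the user
# transfer to obtain the empirical cross-network distribution.
import numpy as np

def empirical_cross(W_x, P_u):
    """W_x: (n_x, n_x) weighted adjacency of G^x; P_u: (n_x, n_y) with
    P_u[k, j] = p_u(u_j^y | u_k). Returns hat{p}(u_j^y | u_i^x) as (n_x, n_y)."""
    d = W_x.sum(axis=1, keepdims=True)  # out-degrees d_i^x
    return (W_x / d) @ P_u
```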

For Task 4, although there are no real links between user \({u_{i}^{x}}\) in Gx and topic \({t_{j}^{y}}\) in Gy, user \({u_{i}^{x}}\) and topic \({t_{j}^{y}}\) can exchange information across networks through the topic transfer \({p_{t}}({{t_{j}^{y}}}|{{t_{k}^{x}}})\). Therefore, we define the empirical probabilities and the corresponding objective function as follows:

$$\begin{array}{@{}rcl@{}} \hat p({t_{j}^{y}}|{u_{i}^{x}}) &=& \sum\limits_{({{u_{i}^{x}}},{{t_{k}^{x}}}) \in E_{ut}^{x}} {\hat p({{t_{k}^{x}}}|{{u_{i}^{x}}}){p_{t}}({{t_{j}^{y}}}|{{t_{k}^{x}}})}\\ &=& \sum\limits_{({{u_{i}^{x}}},{{t_{k}^{x}}}) \in E_{ut}^{x}} {\frac{{{w_{ik}^{x}}}}{{{{d_{i}^{x}}}}}\ast {p_{t}}({{t_{j}^{y}}}|{{t_{k}^{x}}})} \end{array} $$
(12)
$$\begin{array}{@{}rcl@{}} {O_{4}^{\prime}} = - \sum\limits_{({{u_{i}^{x}}},{{t_{k}^{x}}}) \in E_{ut}^{x}} {\sum\limits_{{{t_{j}^{y}}} \in {T^{y}}} {{w_{ik}^{x}}{p_{t}}({{t_{j}^{y}}}|{{t_{k}^{x}}})\log p({{t_{j}^{y}}}|{{u_{i}^{x}}})} } \end{array} $$
(13)

Furthermore, with (9)-(13), we can calculate the inter-NE by swapping the superscripts x and y when Gy is the source network and Gx is the target network.

Finally, Tasks 3 and 4 can be realized based on the objective functions (11) and (13) by minimizing the following two objective functions:

$$\begin{array}{@{}rcl@{}} {O_{3}} &= &- {\sum\limits_{({{u_{i}^{x}}},{u_{k}}) \in {E^{x}}} \sum\limits_{{{u_{j}^{y}}} \in {U^{y}}} {{w_{ik}^{x}}{p_{u}}({{u_{j}^{y}}}|{u_{k}})\log p({{u_{j}^{y}}}|{{u_{i}^{x}}})} }\\ &&- {\sum\limits_{({{u_{i}^{y}}},{u_{k}}) \in {E^{y}}} \sum\limits_{{{u_{j}^{x}}} \in {U^{x}}} {{w_{ik}^{y}}{p_{u}}({{u_{j}^{x}}}|{u_{k}})\log p({{u_{j}^{x}}}|{{u_{i}^{y}}})} } \end{array} $$
(14)
$$\begin{array}{@{}rcl@{}} O_{4} &=& - \sum\limits_{({{u_{i}^{x}}},{{t_{k}^{x}}}) \in E_{ut}^{x}} \sum\limits_{t_{j}^{y} \in T^{y}} w_{ik}^{x}p_{t}(t_{j}^{y}|t_{k}^{x})\log p(t_{j}^{y}|u_{i}^{x})\\ &&- \sum\limits_{(u_{i}^{y},t_{k}^{y}) \in E_{ut}^{y}} \sum\limits_{t_{j}^{x} \in T^{x}} w_{ik}^{y}p_{t}(t_{j}^{x}|t_{k}^{y})\log p(t_{j}^{x}|u_{i}^{y}) \end{array} $$
(15)

4.4 Joint embedding learning

The linked heterogeneous network is composed of four parts: the user-user/user-topic intra-networks and the user-user/user-topic inter-networks, where users are shared across the four parts. To learn the representations of the networks, an intuitive idea is to embed the four parts collectively, which can be achieved by minimizing the following objective function:

$$ O=O_{1}+O_{2}+O_{3}+O_{4} $$
(16)

We use the asynchronous stochastic gradient algorithm [19] to optimize objective (16). Optimizing (16) directly is computationally expensive, since calculating the conditional probability p(⋅|ui) requires summing over the entire set of nodes. To address this problem, we adopt the negative sampling approach [14]. Taking the edges whose source node is \({u^{x}_{i}}\) as an example, the equivalent counterparts can be derived as:

$$ \log p(u_{j}^{x}|u_{i}^{x})\propto \log \sigma(\boldsymbol{\gamma}_{j}^{\prime x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})+\sum\limits_{m = 1}^{K}E_{u_{n}\sim{p_{n}(u)}}\log \sigma(-\boldsymbol{\gamma}_{n}^{\prime x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x}) $$
(17)
$$ \log p({t_{j}^{x}}|{u_{i}^{x}})\propto \log \sigma({\boldsymbol{\varphi}_{j}^{\prime x}}^{T}\cdot {\boldsymbol{\gamma}_{i}^{x}})+\sum \limits_{m = 1}^{K}E_{u_{n}\sim{p_{n}(u)}}\log \sigma({-\boldsymbol{\varphi}_{n}^{\prime x}}^{T}\cdot {\boldsymbol{\gamma}_{i}^{x}}) $$
(18)
$$ \log p({u_{j}^{y}}|{u_{i}^{x}})\propto \log \sigma({\boldsymbol{\gamma}_{j}^{\prime y}}^{T}\cdot {\boldsymbol{\gamma}_{i}^{x}})+\sum \limits_{m = 1}^{K}E_{u_{n}\sim{p_{n}(u)}}\log \sigma({-\boldsymbol{\gamma}_{n}^{\prime y}}^{T}\cdot {\boldsymbol{\gamma}_{i}^{x}}) $$
(19)
$$ \log p({t_{j}^{y}}|{u_{i}^{x}})\propto \log \sigma({\boldsymbol{\varphi}_{j}^{\prime y}}^{T}\cdot {\boldsymbol{\gamma}_{i}^{x}})+\sum \limits_{m = 1}^{K}E_{u_{n}\sim{p_{n}(u)}}\log \sigma({-\boldsymbol{\varphi}_{n}^{\prime y}}^{T}\cdot {\boldsymbol{\gamma}_{i}^{x}}) $$
(20)

where σ(x) = 1/(1 + exp(−x)) is the sigmoid function, K is the number of negative samples, and \(d_u\) is the out-degree of node u. We set K = 5 and \(p_{n}(u) \propto d_{u}^{3/4}\) as in [14].

To minimize (16), a straightforward solution is to merge all types of edges from the four sets Ex, Ey, \(E_{ut}^{x}\), and \(E_{ut}^{y}\). However, because the networks are heterogeneous, the weights of different types of edges are not comparable to each other. Therefore, it is more reasonable to sample alternately from the four sets of edges [25], which we call joint training. Moreover, owing to this separate sampling, the objective function (16) can be divided into Ouu and Out, where Ouu = O1 + O3 and Out = O2 + O4 are the objective functions when sampling edges from E and Eut, respectively. By learning the representations \({\{{{(\boldsymbol {\gamma }_{i}^{x}, \boldsymbol {\gamma }_{i}^{\prime x})}} \}_{i = 1 {\ldots } |U^{x}|}}\), \({\{ {{\boldsymbol {\varphi }_{j}^{x}}} \}_{j = 1 {\ldots } |T^{x}|}}\), \({\{(\boldsymbol {\gamma }_{i}^{y}, \boldsymbol {\gamma }_{i}^{\prime y}) \}_{i = 1 {\ldots } |U^{y}|}}\) and \({\{ {{\boldsymbol {\varphi }_{j}^{y}}} \}_{j = 1 {\ldots } |T^{y}|}}\), we are able to represent the different types of nodes with d-dimensional embeddings \(\boldsymbol {\gamma }_{i}^{x}\), \(\boldsymbol {\varphi }_{j}^{x}\), \(\boldsymbol {\gamma }_{i}^{y}\) and \(\boldsymbol {\varphi }_{j}^{y}\) in \(\mathbb{R}^d\). \(\boldsymbol {\gamma }_{i}^{\prime x}\) and \(\boldsymbol {\gamma }_{i}^{\prime y}\) are the context representations of users when serving as neighbors [24].

To update the vector of a node in network Gx, i.e., \(\boldsymbol {\gamma }_{i}^{x}\), we calculate the gradients by sampling from E and Eut, respectively. The gradients are computed as:

$$\begin{array}{@{}rcl@{}} \frac{\partial O_{uu}}{\partial \boldsymbol{\gamma}_{i}^{x}} &=&w_{ij}^{x}\ast \{ [ 1-\sigma({\boldsymbol{\gamma}_{j}^{\prime x}}^{T}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\gamma}_{j}^{\prime x}-\sigma({\boldsymbol{\gamma}_{n}^{\prime x}}^{T}\cdot \boldsymbol{\gamma}_{i}^{x})\boldsymbol{\gamma}_{n}^{\prime x} \}\\ &&+\sum \limits_{u_{j}\in U^{y}}w_{ik}^{x}\ast p_{u}({u_{j}^{y}}|{u_{k}^{x}})\{[ 1-\sigma({\boldsymbol{\gamma}_{j}^{\prime y}}^{T}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\gamma}_{j}^{\prime y}\\ &&-\sigma({\boldsymbol{\gamma}_{n}^{\prime y}}^{T}\cdot \boldsymbol{\gamma}_{i}^{x})\boldsymbol{\gamma}_{n}^{\prime y} \} \end{array} $$
(21)
$$\begin{array}{@{}rcl@{}} \frac{\partial O_{ut}}{\partial \boldsymbol{\gamma}_{i}^{x}} &=&w_{ij}^{x}\ast \{ [ 1-\sigma(\boldsymbol{\varphi}_{j}^{x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\varphi}_{j}^{x}-\sigma(\boldsymbol{\varphi}_{n}^{x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})\boldsymbol{\varphi}_{n}^{x} \}\\ &&+\sum \limits_{t_{j}\in T^{y}}w_{ik}^{x}\ast p_{t}({t_{j}^{y}}|{t_{k}^{x}})\left\{[ 1-\sigma(\boldsymbol{\varphi}_{j}^{y^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\varphi}_{j}^{y}\right.\\ &&\left.-\sigma(\boldsymbol{\varphi}_{n}^{y^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})\boldsymbol{\varphi}_{n}^{y}\right\} \end{array} $$
(22)

Similarly, we can obtain the partial derivatives w.r.t. the other vectors of the concerned nodes given as:

$$\begin{array}{@{}rcl@{}} \frac{\partial O_{uu}}{\partial \boldsymbol{\gamma}_{j}^{\prime x}} &=&w_{ij}^{x}\ast [ 1-\sigma(\boldsymbol{\gamma}_{j}^{\prime x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\gamma}_{i}^{x} \\ &&+\sum \limits_{u_{j}\in U^{x}}w_{ik}^{y}\ast p_{u}({u_{j}^{x}}|{u_{k}^{y}})[ 1-\sigma(\boldsymbol{\gamma}_{j}^{\prime x^{T}}\cdot \boldsymbol{\gamma}_{i}^{y})]\boldsymbol{\gamma}_{i}^{y} \end{array} $$
(23)
$$\begin{array}{@{}rcl@{}} \frac{\partial O_{ut}}{\partial \boldsymbol{\varphi}_{j}^{x}} &=&w_{ij}^{x}\ast [ 1-\sigma(\boldsymbol{\varphi}_{j}^{x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\gamma}_{i}^{x}\\ &&+\sum \limits_{t_{j}\in T^{x}}w_{ik}^{y}\ast p_{t}({t_{j}^{x}}|{t_{k}^{y}})[ 1-\sigma(\boldsymbol{\varphi}_{j}^{x^{T}}\cdot \boldsymbol{\gamma}_{i}^{y})]\boldsymbol{\gamma}_{i}^{y} \end{array} $$
(24)
$$\begin{array}{@{}rcl@{}} \frac{\partial O_{uu}}{\partial \boldsymbol{\gamma}_{n}^{\prime x}} &=&w_{ij}^{x}\ast [-\sigma(\boldsymbol{\gamma}_{n}^{\prime x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\gamma}_{i}^{x}\\ &&+\sum \limits_{u_{j}\in U^{x}}w_{ik}^{y}\ast p_{u}({u_{j}^{x}}|{u_{k}^{y}})\ast [ -\sigma(\boldsymbol{\gamma}_{n}^{\prime x^{T}}\cdot \boldsymbol{\gamma}_{i}^{y})]\boldsymbol{\gamma}_{i}^{y} \end{array} $$
(25)
$$\begin{array}{@{}rcl@{}} \frac{\partial O_{ut}}{\partial \boldsymbol{\varphi}_{n}^{x}} &=&w_{ij}^{x}\ast [-\sigma(\boldsymbol{\varphi}_{n}^{x^{T}}\cdot \boldsymbol{\gamma}_{i}^{x})]\boldsymbol{\gamma}_{i}^{x}\\ &&+\sum\limits_{t_{j}\in T^{x}}w_{ik}^{y}\ast p_{t}({t_{j}^{x}}|{t_{k}^{y}})\ast [-\sigma(\boldsymbol{\varphi}_{n}^{x^{T}}\cdot \boldsymbol{\gamma}_{i}^{y})]\boldsymbol{\gamma}_{i}^{y} \end{array} $$
(26)

With reference to (21)-(26), the updating rules for network Gy can be obtained by swapping the superscripts x and y; they are not listed due to the page limit. The joint training algorithm is shown in Algorithm 1:

Algorithm 1: The joint training algorithm
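For concreteness, below is a heavily simplified sketch of the joint training loop of Algorithm 1. It assumes that all users (and all topics) of the two networks share one index space, that inter-network edges carry the transfer probabilities as weights, and that a single table serves as both topic and topic-context vectors. Edges and negative nodes are drawn uniformly here, whereas the paper samples edges proportionally to their weights and negatives from \(p_n(u) \propto d_u^{3/4}\); all identifier names are our own.

```python
# Simplified joint training (Algorithm 1): alternately sample user-user and
# user-topic edges and apply SGD with the negative-sampling updates (17)-(26).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(edges_uu, edges_ut, n_users, n_topics, d=100,
          n_samples=10_000_000, K=5, lr=0.025, seed=0):
    """edges_uu / edges_ut: lists of (i, j, w) user-user / user-topic edges,
    intra- and inter-network alike (inter-network weights already multiplied
    by the transfer probabilities). Returns the user vectors gamma."""
    rng = np.random.default_rng(seed)
    gamma = (rng.random((n_users, d)) - 0.5) / d   # user vectors
    ctx = np.zeros((n_users, d))                   # user context vectors gamma'
    phi = (rng.random((n_topics, d)) - 0.5) / d    # topic vectors
    for s in range(n_samples):
        if s % 2 == 0:                             # alternate: user-user edge
            i, j, w = edges_uu[rng.integers(len(edges_uu))]
            out = ctx
        else:                                      # ... or user-topic edge
            i, j, w = edges_ut[rng.integers(len(edges_ut))]
            out = phi
        grad_i = np.zeros(d)
        for m in range(K + 1):                     # 1 positive + K negatives
            t = j if m == 0 else rng.integers(out.shape[0])
            label = 1.0 if m == 0 else 0.0
            g = lr * w * (label - sigmoid(out[t] @ gamma[i]))
            grad_i += g * out[t]                   # accumulate d/d gamma_i
            out[t] += g * gamma[i]                 # update context/topic vector
        gamma[i] += grad_i
    return gamma
```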

4.5 Mapping users across social networks

After learning the representations of users, we can discover user correspondences across social networks based on cosine similarity, calculated over user embeddings, as follows.

$$ rel({u_{i}^{x}},{u_{j}^{y}}) = \frac{{\sum\nolimits_{p = 1}^{d} {\gamma_{ip}^{x} \times \gamma_{jp}^{y}} }}{{\sqrt{\sum\nolimits_{p = 1}^{d} \gamma_{ip}^{x^{2}}} \times \sqrt {\sum\nolimits_{p = 1}^{d} \gamma_{jp}^{y^{2}} } }} $$
(27)

Given two sets of test users \(U^{\prime x} =\{{u_{1}^{x}},{u_{2}^{x}},{\ldots } , {u_{n}^{x}}\}\) and \(U^{\prime y} =\{{u_{1}^{y}},{u_{2}^{y}},{\ldots } , {u_{n}^{y}}\}\) from the two social networks Gx and Gy, we compute the cosine similarity for each pair of test users from the two lists. Then, given a similarity threshold w, we return the list of user pairs (Footnote 3) \({R}=\{<{u_{i}^{x}},{u_{j}^{y}}>|rel({u_{i}^{x}},{u_{j}^{y}})>w,{u_{i}^{x}}\in {{U}^{\prime x}},{u_{j}^{y}}\in {{U}^{\prime y}}\}\) as the set of discovered corresponding users.
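A minimal sketch of this matching step: normalize the learned user embeddings, compute all pairwise cosine similarities per (27), and keep pairs above w (0.9 on the social network dataset and 0.95 on the synthetic one; see Section 5).

```python
# Sketch of Sect. 4.5: score all test-user pairs by cosine similarity and
# return the index pairs whose similarity exceeds the threshold w.
import numpy as np

def link_users(emb_x, emb_y, w=0.9):
    """emb_x: (n, d) embeddings of test users in G^x; emb_y: (m, d) in G^y."""
    nx = emb_x / np.linalg.norm(emb_x, axis=1, keepdims=True)
    ny = emb_y / np.linalg.norm(emb_y, axis=1, keepdims=True)
    sim = nx @ ny.T                 # (n, m) matrix of rel(u_i^x, u_j^y)
    return list(zip(*np.where(sim > w)))
```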

4.6 Overall algorithm

Our overall algorithm is presented in Algorithm 2, which contains four components. First, we model the topics of user interests in Gx and Gy respectively and filter out insignificant topics (steps 1 to 4). Then, we learn the user transfer and topic transfer based on anchor users (steps 5 and 6); the details are discussed in Section 4.3. Next, in step 7, the joint training algorithm is used to learn the user and topic node representations of the two networks by collectively utilizing intra-network and inter-network embedding; the detailed process is introduced in Algorithm 1. Finally, we return a result list of predicted matching user pairs (steps 8 to 13).

Algorithm 2: The overall LHNE algorithm

5 Experiments

In this section, we compare LHNE with state-of-the-art methods on two types of cross-network datasets. The first dataset is composed of a Twitter network and a Flickr network. The second is a synthetic dataset consisting of two co-author networks from the Data Mining and World Wide Web areas.

5.1 Comparative methods

In this subsection, to evaluate the performance of LHNE for user identity linkage, we choose the following state-of-the-art methods and variants as competitors:

  1.

    LHNE: the model proposed in this paper. LHNE is based on heterogeneous networks in terms of network structures and content. It contains User-User/User-Topic intra-NE and User-User/User-Topic inter-NE.

  2.

    LHNE-U: a variation of our model that ignores the User-Topic inter-NE.

  3.

    LHNE-S: a variation of LHNE based on network structures. It contains User-User intra-NE and User-User inter-NE.

  4.

    LHNE-C: a variation of LHNE based on content. It contains User-Topic intra-NE and User-Topic inter-NE.

  5.

    IONES: a homogeneous network embedding model for user identity linkage with “soft” constraint [10], which can simultaneously learn the follower-ship/followee-ship of each user. IONES considers only the network structures.

  6.

    IONES-C: a variation of IONES extended with content information in a simple and effective manner: we concatenate a user's topic distribution vector and embedding vector into one long vector as the user representation for user identity linkage. IONES-C considers structural information and content information separately.

  7.

    KNN: a popular competitive method based on k-nearest-neighbor search [17, 22]. In the experiments, we jointly utilize the topic distribution and common neighbors as user features to compute the k nearest neighbors.

5.2 Evaluation metrics

To evaluate user identity linkage, we use recall, precision and F1 [5] as performance metrics. Recall is the fraction of real corresponding user pairs that are found over the total number of real anchor user pairs, while precision is the fraction of real corresponding user pairs among the result list.

$$ Recall = \frac{|CorrPairs|}{|RealAnchorUserPairs|} $$
(28)
$$ Precision = \frac{|CorrPairs|}{|ResultPairs|} $$
(29)
$$ F1 = \frac{2*Recall*Precision}{Recall+Precision} $$
(30)

where |RealAnchorUserPairs| is the number of all real anchor user pairs, |CorrPairs| is the number of real corresponding user pairs that the method finds in the result list R, and |ResultPairs| is the number of pairs in R.
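For reference, a direct sketch of metrics (28)-(30), taking the returned pair list R and the ground-truth anchor pairs as plain Python collections:

```python
# Sketch of (28)-(30): compute recall, precision and F1 from the result
# list R and the set of real anchor user pairs.
def evaluate(result_pairs, ground_truth):
    corr = len(set(result_pairs) & set(ground_truth))   # |CorrPairs|
    recall = corr / len(ground_truth)
    precision = corr / len(result_pairs) if result_pairs else 0.0
    f1 = (2 * recall * precision / (recall + precision)
          if recall + precision > 0 else 0.0)
    return recall, precision, f1
```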

5.3 Structural characteristic metrics

We adopt two metrics (Interop and sparsity level) to evaluate the reliability of LHNE under different social network structures.

Interoperability (abbreviated as Interop) [22] measures the overlap between the two networks and is defined as follows:

$$ Interop(x,y) = \frac{|Correlations|\ast 2}{|Relations^{x}|+|Relations^{y}|} $$
(31)

where Relations^x (resp. Relations^y) is the set of directed edges in G^x (resp. G^y), Correlations is the intersection of the two sets, and 0 ≤ Interop(x, y) ≤ 1. When the two networks completely overlap (resp. do not overlap at all), Interop(x, y) equals 1 (resp. 0).
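A direct sketch of (31), assuming both edge sets are expressed over a common (anchor-user) id space so that their intersection is well defined:

```python
# Sketch of (31): Interop as twice the shared-edge count over the total
# number of directed edges in the two networks.
def interop(relations_x, relations_y):
    correlations = set(relations_x) & set(relations_y)
    return 2 * len(correlations) / (len(relations_x) + len(relations_y))
```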

Meanwhile, we use an edge sampling ratio \(e_s\) to study the influence of different network sparsity levels. To reduce the impact of other factors, we construct dataset variants at different sparsity levels (\(e_s\) = [0.1, 0.2, …, 0.9]) by simultaneously removing overlapped and non-overlapped edges of the two networks, keeping the Interop value constant.

Besides, we also evaluate the effectiveness of the methods under different w and training ratios, and their sensitivities under different numbers of samples and dimensions.

5.4 Datasets

To evaluate the effectiveness, reliability and sensitivity of LHNE, we apply the above methods to two types of cross-network datasets, including both social network and synthetic datasets.

Social network dataset [26]

The first dataset is composed of two real social networks: Twitter and Flickr. There are 7118 anchor users with their follower/friend relationships (i.e., structural information). We collected tweets (2361.07 per user on average) in Twitter via the Twitter API and crawled tags (559.80 per user on average) in Flickr via the Flickr API as user content information. The ground truth of anchor users is provided in the dataset. Basic statistics are shown in Table 1.

Table 1 Statistics of social network dataset

Synthetic dataset

The synthetic dataset consists of two co-author networks, including co-author relationships (i.e., structural information) and paper titles, from the fields of Data Mining (DM) and World Wide Web (WWW); it is constructed from the Extracted DBLP Dataset [29]. We used paper titles as content information. On average, each author has 2.07 titles in DM and 1.81 titles in WWW. Because networks in this paper are directed, each co-author relationship is regarded as two directed edges with opposite directions and equal weights. There are 5353 anchor authors in the synthetic dataset, forming the ground truth. The statistics of the dataset are shown in Table 2.

Table 2 Statistics of synthetic dataset

Analyzing the two datasets, we find that the social network dataset contains only anchor users, while the synthetic dataset includes both anchor users and their non-anchor friends. To make our experiments more reliable, we evaluate effectiveness on both datasets, and reliability and sensitivity on the synthetic dataset, since the latter contains more comprehensive user information.

5.5 Experiment results on social network dataset

In this section, we present the performance of all methods on the social network dataset. The LDA model is adopted to generate the topics of user interests. The number of topics K is set to 60 and all hyper-parameters are set to 1/K. To achieve better embeddings, we set the classifier's imbalance ratio \(\frac {\#negative}{\#positive}= 1\), since the classifier achieves better performance when the training sets are balanced [7, 32].

Performance w.r.t topic threshold.

It is critical to find topics that represent users' real interests, in order to address the noise problem in social networks. Therefore, we first conduct experiments to study the impact of h, which helps filter out content noise. Table 3 presents the performance of our proposed LHNE and LHNE-U in terms of recall, precision and F1 with different thresholds h. From the results, we observe that the performance is sensitive to h. First, the performance of the two methods improves as h increases and peaks at h = 0.3. This is because the content contains a lot of noise, which a low threshold h filters out. Beyond that point, increasing h decreases performance, as useful information is also filtered out. To achieve the best performance, we set h = 0.3.

Table 3 Performance w.r.t topic threshold on social network dataset

Performance w.r.t similarity threshold and training ratio.

We show the performance under different similarity thresholds and training ratios in Figures 2 and 3, as the results are very sensitive to them. As discussed in Section 4.5, recall declines as w increases, since more user pairs are filtered out with a higher w. Meanwhile, user pairs with high similarities are more likely to be real corresponding pairs; it is well known that real corresponding user pairs usually have larger similarities than others [5]. Therefore, precision increases with w. To balance recall and precision, we set w = 0.9.

Figure 2: Performance w.r.t similarity threshold on social network dataset

Figure 3: Performance w.r.t training ratio on social network dataset

According to Figure 2, our proposed LHNE outperforms the other competitors significantly. Specifically, at w = 0.9 there is a 47.93% relative increase over IONES-C (0.682 vs. 0.461 recall), a 64.34% relative increase over IONES (0.682 vs. 0.415 recall), and a 182.99% relative increase over KNN (0.682 vs. 0.241 recall). Taking a closer look at the dataset, we notice that 40.94% of users in the Flickr network and 18.45% of users in the Twitter network have no links to other users. This loss of structural information degrades the performance of all methods. LHNE alleviates the problem by linking topics and users, exploiting the correlation between the two types of information; missing information between users can thus be supplemented via topic nodes serving as the context of user nodes. In contrast, IONES-C and KNN consider content and structural information separately, and IONES considers only structural information, so they fail to correlate users without social links. Meanwhile, LHNE exploits the cross-network transfer of users and topics; with the help of user transfer and topic transfer, the learned user representations are more comprehensive and effective. Consequently, LHNE performs better than LHNE-U, which considers only user transfer across networks. Besides, the structural information appears more discriminative than the content information, as LHNE-S outperforms LHNE-C. At the same time, LHNE performs better than both LHNE-S (structural information only) and LHNE-C (content information only), showing the benefit of jointly leveraging structural and content information.

Additionally, Figure 3 shows that LHNE outperforms the other methods under different training ratios. LHNE achieves much higher recall, precision and F1 even when the training ratio is as low as 10% to 20%, indicating that it captures most of the common knowledge needed for user identity linkage by jointly leveraging structural and content information. This is significant for real social networks where training data is scarce. Considering the performance of the competitors, we set the training ratio to 0.9.

5.6 Experiment results on synthetic dataset

In this experiment, besides effectiveness, we focus on the reliability of LHNE under different network structures (e.g., Interop and sparsity) and on parameter sensitivity (e.g., the number of samples and the dimension). The settings of K, the hyper-parameters and the imbalance ratio are the same as for the social network dataset.

Performance w.r.t similarity threshold and training ratio.

Figures 4 and 5 report the performance of all methods on the synthetic dataset. The comparison results are similar to those in Figures 2 and 3: LHNE performs better than the other methods under different w and training ratios. However, there are two notable differences. Firstly, all methods perform better on the synthetic dataset than on the social network dataset. This is because the synthetic dataset has a better network structure; specifically, it includes anchor users and non-anchor users simultaneously, and the ratio of edges between the DM and WWW networks is more balanced than in the social network dataset, as shown in Tables 1 and 2. Secondly, LHNE's advantage over IONES-C and IONES is larger on the social network dataset than on the synthetic dataset. Owing to the nature of social networks, the social network dataset has more noise than the synthetic dataset; LHNE remains robust by filtering content noise with a suitable topic threshold h. Moreover, the joint embedding of structural and content information makes LHNE more stable. Similar to the considerations on the social network dataset, we set w to 0.95 and the training ratio to 0.9.

Figure 4: Performance w.r.t similarity threshold on synthetic dataset

Figure 5: Performance w.r.t training ratio on synthetic dataset

Reliability w.r.t Interop and sparsity.

Because all methods are based on network structures, they often suffer from problems such as low overlap and sparsity. Therefore, we explore the performance under different Interop and sparsity levels; Figures 6 and 7 show the results. The Interop of the original synthetic dataset is 0.0889. In this experiment, we vary the Interop value from 10% to 60% by removing edges of anchor users. According to Figure 6, all methods depend on the Interop and achieve better performance as it increases. Moreover, LHNE outperforms all competitors even when the Interop value is low. Figure 7 shows that LHNE performs better than the other embedding-based methods even when the sparsity level is below 30%. In other words, LHNE can achieve good performance with relatively little structural information. We conclude that linking topics and users makes LHNE more reliable. To maintain the original network structures, we set the Interop to 0.0889 and the sparsity level to 1 in the remaining experiments.

Figure 6: Reliability w.r.t Interop on synthetic dataset

Figure 7: Reliability w.r.t sparsity on synthetic dataset

Sensitivity w.r.t number of samples and dimension.

For embedding-based methods, the number of samples and the dimension have a significant impact on computational speed and storage. Therefore, we analyze convergence by varying the number of samples, and performance by varying d. Figure 8 shows that LHNE converges much faster than IONES-C and IONES; we believe the gain comes from the joint embedding of structural and content information. This is beneficial for real-time applications based on user identity linkage. Considering the real-time requirement and the convergence of the competitors, we set the number of samples to 10 million, at which all methods have converged. Besides, Figure 9 shows that LHNE achieves much higher performance than the other methods even when d is low; hence, more efficient computation in terms of time and space can be realized through lower-dimensional embeddings. We set d = 100, at which all methods are stable and obtain their best performance.

Figure 8: Sensitivity w.r.t number of samples on synthetic dataset

Figure 9: Sensitivity w.r.t dimension on synthetic dataset

6 Conclusion

In this paper, we aim to learn comprehensive representations of users, considering the fact that structural and content information are correlated, and propose a linked heterogeneous network embedding method for user identity linkage that addresses the challenging issues of information heterogeneity, social network diversity and noise. We conducted extensive experiments to evaluate the performance of LHNE on both real social network and synthetic datasets. The results show that LHNE is significantly better than state-of-the-art methods (up to a 47.93% improvement over IONES-C), even though 40.94% and 18.45% of users have no links to others in the Flickr and Twitter networks, respectively. Therefore, our model can work well even with little or no structural information, which matters when data acquisition is difficult in social networks because of privacy protection. The performance of LHNE is reinforced by fully exploiting the correlation between heterogeneous types of information, which provide complementary information to each other. In future work, we may consider integrating more types of information into LHNE to improve the embeddings, and extending the model to more than two networks.