Abstract
In recent years, network representation learning on complex information networks attracts more and more attention. Scholars usually use matrix factorization or deep learning methods to learn network representation automatically. However, existing methods only preserve single feature of networks. How to effectively integrate multiple features of network is a challenge. To tackle this challenge, we propose an unsupervised learning algorithm named Multi-View Learning of Network Embedding. The algorithm preserves multiple features that including vertex attribute, network global and local topology structure. Features are treated as network views. We use a variant of convolutional neural networks to learn features from these views. The algorithm maximizes the correlation between different views by canonical correlation analysis, and learns the embedding that preserve multiple features of networks. Comprehensive experiments are conducted on five real networks. We demonstrate that our method can better preserve multiple features and outperform baseline algorithms in community detection, network reconstruction and visualization.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
- Network representation learning
- Multi-view fusion
- Convolutional neural networks
- Canonical Correlation Analysis
1 Introduction
Large-scale information networks are common information carriers in real world. Mining knowledge from complex information networks can help people to understand network structure [1] or information dissemination patterns [2]. Network representation learning (NRL) [3] is a basic issue in network mining area which mainly studies how to map features of vertex in a network to a low-dimensional, continuous real-valued embedding, and the process of mapping is not only try to preserve the structural feature s, but also try to preserve the properties of the vertex. Embedding learned by NRL can be used as input feature for machine learning methods, and has important applications in the real world, such as network visualization [5], network reconfiguration [4, 6], community detection [7], link prediction [8], etc.
The traditional NRL method is similar to dimensionality reduction, such as Graph Factorization (GF) [9], Local Linear Embedding (LLE) [10] Large-scale Information Network Embedding (LINE) [11] and HOPE algorithm [12], etc. These methods use single matrix to present the similarity graph structure of networks, and obtain low-dimensional embedding for networks by factorize this matrix. Matrix Factorization based Methods is unstable and incomplete because they have strong dependence on constructing single feature matrix. Our task selects multiple features from networks, and design an unsupervised fusion algorithm based on deep neural networks.
2 Related Work
With the advent of the era of big data, deep learning (DL) technology are developing rapidly. DL can discover complex structures in big data via multiple processing layers. DL brings significant results in many areas, such as computer vision, language modeling, etc. In recent years, scholars have done a lot of research on applying DL models to represent graphs or networks. Deepwalk [13] and node2vec [14] use random walk to generate sequence of nodes and adopt an unsupervised neural language model (SkipGram) [15] for networks embedding. SDNE [5] uses an unsupervised deep self-encoder to model the second-order proximity, the hidden layer of the deep self-encoder is the embedding of networks. In addition, convolutional neural networks (CNN) and its variants have been widely adopted in representation learning. PATCHY-SAN [16] selects fixed-length node sequence to assemble the neighborhood of nodes and directly use the original CNN model designed for Euclidean domains. GCN [17] defines the convolution in the spectral domain, and constructs a semi-supervised model for node classification task. However, networks usually contains multiple types of information, such as node attribute information, structure information, text information, etc. Existing method is incomplete because it only learns single types of information. In addition, existing method lack universality because each representation learning model is designed with a specific optimization goals.
Unlike previous approaches, we propose an unsupervised learning algorithm named multi-view of network embedding, also known as MVNE. MVNE uses multiple vertex attribute (text information, geographic location, user tags, etc.), network global topology, and local topology features as input features, and they are treated as network views. The views express the characteristics of different aspects of the network. We consider multiple localized first-order approximation spectral graph convolutions to extract features from views, and fuse features by analyzing correlation between them. The model can be applied to various network tasks because it learns representations in a fully unsupervised setting.
3 Multi-View Learning of Network Embedding
3.1 A Subsection Sample
We define a network \( {\text{G}} = \left( {V,E} \right) \), \( V = \left\{ {v_{1} , \ldots ,v_{i} , \ldots ,v_{N} } \right\} \) is the collection of network vertices, where \( N \) is the number of vertices. \( E \) is the collection of network edges. \( e_{ij} = \left( {v_{i} ,v_{j} } \right) \in E \) represents an edge between \( v_{i} \) and \( v_{j} \). \( {\text{A}} \) is the adjacency matrix. If there is an edge between \( v_{i} \) and \( v_{j} \), then \( A_{ij} = 1 \), otherwise \( A_{ij} = 0 \). The vertices feature matrix \( X \) corresponding to \( G \) is a highly sparse matrix. Dimension of \( X \) is usually expressed as \( \left| {\text{V}} \right| \times {\rm{m}} \), where \( {\text{m}} \) is the feature space size of the attribute. Vertices usually have multiple attributes, such as geographic location, age, hobbies, etc. Let \( Attr = \left\{ {X_{1} , \ldots ,X_{p} } \right\} \) denote the feature matrices set of network \( G \) which are treated as views of networks. In this paper, we assume that the input to our algorithm is an undirected network \( G \) and its feature matrices \( Attr \). The goal of our algorithm is mapping each vertex to a low-dimensional vector \( {\text{z}} \in {\mathbb{R}}^{d} \) by fusing the information contained in A and \( Attr \), where \( d \ll \left| V \right| \).
3.2 Feature Extraction Based on Graph Convolution
Convolutional neural networks (CNN) has achieved good results in areas. CNN can process Euclidean data (e.g. image data) efficiently. However, network data belongs to Graph-structured Data. In order to learn the features in Graph-structured Data, this paper use a variant of CNN which called spectral convolution to extract feature map from views in networks. The definition of spectral convolution is as shown in Eq. (1).
\( {\text{X}} \in {\mathbb{R}}^{{\left| {\rm{V}} \right| \times {\text{m}}}} \) is a feature matrix, \( {\text{g}}_{\uptheta} = {\rm{diag}}\left(\uptheta \right) \) is a filter. Spectral convolution \( g_{\theta } \) is generated by decomposing the normalized graph Laplacian matrix shown as Eq. (2).
\( D \) is the degree matrix, \( U \) is the matrix of eigenvectors of \( L \), \( \varLambda \) is the diagonal matrix of eigenvalues of \( {\text{L}} \). According to Eqs. (1) and (2), it can be seen that filter \( g_{\theta } \) is a function of eigenvalues \( \Lambda \). We can obtain the filter \( g_{\theta } \) via the eigenvalue decomposition of \( L \). However, in large-scale networks, eigenvalue decomposition of L is computationally expensive. So we use \( K^{th} \)-order Chebyshev polynomial to approximate \( g_{\theta } \left( \varLambda \right) \) in Eq. (3).
In Eq. (3), \( {\tilde{\Lambda }} = \frac{2}{{\lambda_{ \hbox{max} } }}\Lambda - {\text{I}}_{\rm{N}} \), \( \lambda_{ \hbox{max} } \) is the largest eigenvalue of \( {\text{L}} \). \( \uptheta^{\prime } \in {\mathbb{R}}^{\text{K}} \) is a vector of Chebyshev coefficients. \( {\text{T}}_{\rm{k}} \left( {{\tilde{\text{L}}}} \right) = 2{\tilde{\rm{L}}\text{T}}_{{{\rm{k}} - 1}} \left( {{\tilde{\text{L}}}} \right) - {\rm{T}}_{{{\text{k}} - 2}} \left( {{\tilde{\rm{L}}}} \right) \), with \( {\text{T}}_{0} \left( {{\tilde{\rm{L}}}} \right) = 1 \) and \( {\text{T}}_{1} \left( {{\tilde{\rm{L}}}} \right) = {\tilde{\text{L}}} \). If \( {\text{K}} = 2 \) and \( \lambda_{ \hbox{max} } \approx 2 \), we can obtain
where \( {\tilde{\text{A}}} = {\rm{A}} + {\text{I}}_{\rm{N}} \), \( {\tilde{\text{D}}} \) is the degree matrix of \( {\tilde{\text{A}}} \). The equation has parameter \( \uptheta =\uptheta_{0}^{\prime } = -\uptheta_{1}^{\prime } \), and it is a matrix of filter parameters in graph convolution network. As an example, we use two-layer convolution network with different \( {\text{W}} \) to learn multiple views in networks. The graph convolution network can be expressed as follows:
In Eq. (5), \( \uptheta \) to denote the vector of all filter parameters \( {\text{W}} \). \( {\hat{\text{A}}} = {\tilde{\rm{D}}}^{{ - \frac{1}{2}}} {{\tilde{\text{A}}{\tilde{\rm{D}}}}}^{{ - \frac{1}{2}}} \) is a symmetric and sparse adjacency matrix which converges the weight information of the nodes in the first-order domain of the target node. \( {\text{z}}\left(\uptheta \right) \) is the feature map learned from feature \( {\text{X}} \). We will obtain multiple feature maps by multiple convolution operations. This process is shown in Fig. 1. We consider three views in network.
3.3 MVNE Algorithm
Canonical Correlation Analysis (CCA) is used to mine complex relation mappings between two views \( ({\text{X}}_{1} ,{\rm{X}}_{2} ) \in {\mathbb{R}}^{{{\text{n}}_{1} }} \times \) \( {\mathbb{R}}^{{{\text{n}}_{2} }} \) by finding pairs of projections \( {\text{w}}_{1} ,{\rm{w}}_{2} \) that are maximize the correlation between views. The goal of CCA is shown in Eq. (6), where \( \Sigma _{11} \) and \( \Sigma _{22} \) are covariance, \( \Sigma _{12} \) is cross-covariance.
Inspired by CCA, we fuse multi-view by finding a canonical coordinate space that maximizes correlations between the projections of views. We use \( {\text{X}} = \left( {{\rm{x}}_{1} ,{\text{x}}_{2} , \ldots ,{\rm{x}}_{\text{n}} } \right) \in {\mathbb{R}}^{{{\text{N}} \times {\rm{m}}_{\text{x}} }} \) and \( {\text{Y}} = \left( {{\rm{y}}_{1} ,{\text{y}}_{2} , \ldots ,{\rm{y}}_{\text{n}} } \right) \in {\mathbb{R}}^{{{\rm{N}} \times {\text{m}}_{\rm{y}} }} \) obtained from network as an example to explain the principle of view fusion. \( {\text{m}}_{\rm{x}} \) and \( {\text{m}}_{\rm{y}} \) are dimensions of views. The goal of our task is learning the parameters \( \uptheta \) in Eq. (7) for every network views, this is express as
Let \( {\text{Z}}_{\rm{X}} = {\text{z}}\left( {{\rm{X}};\uptheta_{1} } \right) \) and \( {\text{Z}}_{\rm{y}} = {\text{z}}\left( {{\rm{Y}};\uptheta_{2} } \right) \) be the matrix produced by the graph convolutional layer on two views. \( \Sigma _{\text{xx}} = {\rm{Z}}_{\text{X}} {\rm{Z}}_{\text{X}}^{\prime } + {\rm{r}}_{1} {\text{I}} \), \( \Sigma _{\text{yy}} = {\rm{Z}}_{\text{Y}} {\rm{Z}}_{\text{Y}}^{\prime } + {\rm{r}}_{2} {\text{I}} \) are covariance matrices of \( \left( {{\text{Z}}_{\rm{X}} ,\,{\text{Z}}_{\rm{y}} } \right) \), and \( \Sigma _{\text{xy}} =\Sigma _{\rm{yx}}^{\prime } = {\text{Z}}_{\rm{X}} {\text{Z}}_{\rm{Y}}^{\prime } \) is the cross-covariance matrices of \( \left( {{\text{Z}}_{\rm{X}} ,\,{\text{Z}}_{\rm{y}} } \right) \). \( {\text{r}}_{1} ,{\rm{r}}_{2} > 0 \) is the regularization constant to reduce over-fitting in training data. We define \( {\text{O}} =\Sigma _{\rm{xx}}^{{ - \frac{1}{2}}}\Sigma _{\text{xy}}\Sigma _{\rm{yy}}^{{ - \frac{1}{2}}} \) according to the objection in Eq. (6), then we use the traces of \( {\text{O}} \) to simplify the calculation of the objective function in Eq. (7)
In order to find \( \uptheta_{1} ,\uptheta_{2} \) such that Eq. (8) is as high as possible, we calculate the gradient of \( {\text{corr}}\left( {{\rm{Z}}_{\text{X}} ,{\rm{Z}}_{\text{Y}} } \right) \) with respect to \( \uptheta_{1} ,\uptheta_{2} \), then use back propagation. Algorithm describes the multi-view fusing and embedding generation process.
4 Experiments
4.1 Dataset
In order to evaluate the effectiveness of MVNE, we use community detection, network reconstruction and network visualization to evaluate different embedding generated by different methods. Table 1 gives a properties list of all real network in our experiment. We construct random walk matrix \( X_{R} \in R^{\left| V \right| \times \left| V \right|} \) based on network to preserve local topology structure, and \( X_{{R_{ij} }} \) denote the frequency of node \( v_{j} \) appearing in random walk sequence of node \( v_{i} \). \( X_{R} \) can preserve node centrality and higher-order proximity between nodes. In addition, some networks in Table 1 contain rich information, such as region, hobbies, etc. We construct feature matrices base on attributes by one-hot coding, and use 5 layers MVNE to generate embedding.
4.2 Community Detection
Community detection is a basic task of network analysis. The interaction between nodes in same community is more frequent than the interaction among other communities in networks. We use K-means to classify and assign community label for every node base on embedding. Modularity can be used to evaluate the quality of community detection. Embedding with high modularity is high-quality. We report performance of eight models including our model. The result is shown in Table 2.
We can see that MVNE has higher modularity than other benchmark algorithms. The results show that the embedding learned by MVNE is more effective because it contains multiple network information.
4.3 Network Reconstruction
The purpose of network reconstruction is to rebuild the links between nodes based on similarity of node pairs. The similarity between nodes is evaluated by the distance of their embedding. The experiment randomly selects 20% node pairs from whole pairs as a sub-network sample. We take k pairs of nodes with the highest similarity as predicted links and calculate actual link ratio to evaluate the accuracy of network reconstruction. If node embedding is effective, the accuracy will be high.
Figure 2 shows the mean and standard deviation of corresponding accuracy. Accuracy of network reconstruction decreases with increasing of k value. MVNE achieves better network reconstruction with different k values. The reconstruction precision can reach about 80% while k = 2.
4.4 Visualization
We can visualize embedding to make better understanding for topology and characteristics of networks intuitively. Embedding learned by different methods are different in the ability of visualization and interpretation. We compare visualization ability of embedding learned by different algorithms in Football network. Each algorithm learns 64-D embedding for nodes, and use t-SNE [4] to reduce dimension to 2-D. We color nodes to observe the basic community structure of networks. The results are shown in Fig. 3. Some nodes belonging to different communities are mixed up in HOPE, LINE and GF. The embedding generated by DeepWalk and Node2Vec can represent clear community structure, but there are still a few nodes belonging to different communities mixed. MVNE is more effective than other benchmark algorithms because nodes belonging to same communities are separated clearly.
5 Future Work
In this paper, we propose a Multi-View Learning algorithm to generate embedding of networks. It fuses multiple features of networks by an unsupervised learning process. In the future, how to improve the learning efficiency of MVNE on large-scale network is a very important problem. In addition, introducing dynamic interaction information as a feature into NRL process is also worthwhile to study.
References
Zhao, D., Wang, L., Li, S., Wang, Z., et al.: Immunization of epidemics in multiplex networks. PLoS ONE 9(11), e112018 (2014)
Cozzo, E., Banos, R.A., Meloni, S., et al.: Contact-based social contagion in multiplex networks. Phys. Rev. E 88(5), 660–691 (2013)
Tan, S., Guan, Z., Cai, D., et al.: Mapping users across networks by manifold alignment on hypergraph. In: 28th AAAI Conference on Artificial Intelligence, vol. 1, pp. 159–165 (2014)
Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234 (2016)
Ou, M., Cui, P., Pei, J., Zhang, Z., et al.: Asymmetric transitivity preserving graph embedding. In: 22nd International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114 (2016)
Bhagat, S., Cormode, G., Muthukrishnan, S.: Node classification in social networks. In: Aggarwal, C. (ed.) Social Network Data Analytics, pp. 115–148. Springer, Boston (2011). https://doi.org/10.1007/978-1-4419-8462-3_5
Ahmed, A., Shervashidze, N., Narayanamurthy, S., et al.: Distributed large-scale natural graph factorization. In: 22nd International Conference on World Wide Web, pp. 37–48 (2013)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings 24th International Conference on World Wide Web, pp. 1067–1077. ACM (2015)
Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: 22nd International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114 (2016)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)
Mikolov, T., Chen, K., Corrado, G., et al.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: 33rd International Conference on Machine Learning, vol. 48, pp. 2014–2023 (2016)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Hardoon, D.R., Szedmak, S., Shawe-Taylor, J.: Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004)
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 61170112), Beijing Natural Science Foundation (4172016), and the Scientific Research Project of Beijing Educational Committee (KM201710011006), and Key Lab of Information Network Security, Ministry of Public Security).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Han, Z., Zheng, C., Liu, D., Duan, D., Yang, W. (2019). Multi-View Learning of Network Embedding. In: Kojima, K., Sakamoto, M., Mineshima, K., Satoh, K. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2018. Lecture Notes in Computer Science(), vol 11717. Springer, Cham. https://doi.org/10.1007/978-3-030-31605-1_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-31605-1_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31604-4
Online ISBN: 978-3-030-31605-1
eBook Packages: Computer ScienceComputer Science (R0)