Multi-View Learning of Network Embedding

Han, Zhongming; Zheng, Chenye; Liu, Dan; Duan, Dagao; Yang, Weijie

doi:10.1007/978-3-030-31605-1_8

Zhongming Han^12,13,
Chenye Zheng¹³,
Dan Liu¹³,
Dagao Duan^12,13 &
…
Weijie Yang¹³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11717))

Included in the following conference series:

JSAI International Symposium on Artificial Intelligence

1000 Accesses
1 Citations

Abstract

In recent years, network representation learning on complex information networks attracts more and more attention. Scholars usually use matrix factorization or deep learning methods to learn network representation automatically. However, existing methods only preserve single feature of networks. How to effectively integrate multiple features of network is a challenge. To tackle this challenge, we propose an unsupervised learning algorithm named Multi-View Learning of Network Embedding. The algorithm preserves multiple features that including vertex attribute, network global and local topology structure. Features are treated as network views. We use a variant of convolutional neural networks to learn features from these views. The algorithm maximizes the correlation between different views by canonical correlation analysis, and learns the embedding that preserve multiple features of networks. Comprehensive experiments are conducted on five real networks. We demonstrate that our method can better preserve multiple features and outperform baseline algorithms in community detection, network reconstruction and visualization.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Multi-view network embedding with node similarity ensemble

Article 12 March 2020

Intra-view and Inter-view Attention for Multi-view Network Embedding

Mineral: Multi-modal Network Representation Learning

Keywords

1 Introduction

Large-scale information networks are common information carriers in real world. Mining knowledge from complex information networks can help people to understand network structure [1] or information dissemination patterns [2]. Network representation learning (NRL) [3] is a basic issue in network mining area which mainly studies how to map features of vertex in a network to a low-dimensional, continuous real-valued embedding, and the process of mapping is not only try to preserve the structural feature s, but also try to preserve the properties of the vertex. Embedding learned by NRL can be used as input feature for machine learning methods, and has important applications in the real world, such as network visualization [5], network reconfiguration [4, 6], community detection [7], link prediction [8], etc.

The traditional NRL method is similar to dimensionality reduction, such as Graph Factorization (GF) [9], Local Linear Embedding (LLE) [10] Large-scale Information Network Embedding (LINE) [11] and HOPE algorithm [12], etc. These methods use single matrix to present the similarity graph structure of networks, and obtain low-dimensional embedding for networks by factorize this matrix. Matrix Factorization based Methods is unstable and incomplete because they have strong dependence on constructing single feature matrix. Our task selects multiple features from networks, and design an unsupervised fusion algorithm based on deep neural networks.

2 Related Work

With the advent of the era of big data, deep learning (DL) technology are developing rapidly. DL can discover complex structures in big data via multiple processing layers. DL brings significant results in many areas, such as computer vision, language modeling, etc. In recent years, scholars have done a lot of research on applying DL models to represent graphs or networks. Deepwalk [13] and node2vec [14] use random walk to generate sequence of nodes and adopt an unsupervised neural language model (SkipGram) [15] for networks embedding. SDNE [5] uses an unsupervised deep self-encoder to model the second-order proximity, the hidden layer of the deep self-encoder is the embedding of networks. In addition, convolutional neural networks (CNN) and its variants have been widely adopted in representation learning. PATCHY-SAN [16] selects fixed-length node sequence to assemble the neighborhood of nodes and directly use the original CNN model designed for Euclidean domains. GCN [17] defines the convolution in the spectral domain, and constructs a semi-supervised model for node classification task. However, networks usually contains multiple types of information, such as node attribute information, structure information, text information, etc. Existing method is incomplete because it only learns single types of information. In addition, existing method lack universality because each representation learning model is designed with a specific optimization goals.

Unlike previous approaches, we propose an unsupervised learning algorithm named multi-view of network embedding, also known as MVNE. MVNE uses multiple vertex attribute (text information, geographic location, user tags, etc.), network global topology, and local topology features as input features, and they are treated as network views. The views express the characteristics of different aspects of the network. We consider multiple localized first-order approximation spectral graph convolutions to extract features from views, and fuse features by analyzing correlation between them. The model can be applied to various network tasks because it learns representations in a fully unsupervised setting.

3 Multi-View Learning of Network Embedding

3.1 A Subsection Sample

We define a network $ {\text{G}} = \left( {V,E} \right) $, $ V = \left\{ {v_{1} , \ldots ,v_{i} , \ldots ,v_{N} } \right\} $ is the collection of network vertices, where $ N $ is the number of vertices. $ E $ is the collection of network edges. $ e_{ij} = \left( {v_{i} ,v_{j} } \right) \in E $ represents an edge between $ v_{i} $ and $ v_{j} $. $ {\text{A}} $ is the adjacency matrix. If there is an edge between $ v_{i} $ and $ v_{j} $, then $ A_{ij} = 1 $, otherwise $ A_{ij} = 0 $. The vertices feature matrix $ X $ corresponding to $ G $ is a highly sparse matrix. Dimension of $ X $ is usually expressed as $ \left| {\text{V}} \right| \times {\rm{m}} $, where $ {\text{m}} $ is the feature space size of the attribute. Vertices usually have multiple attributes, such as geographic location, age, hobbies, etc. Let $ Attr = \left\{ {X_{1} , \ldots ,X_{p} } \right\} $ denote the feature matrices set of network $ G $ which are treated as views of networks. In this paper, we assume that the input to our algorithm is an undirected network $ G $ and its feature matrices $ Attr $. The goal of our algorithm is mapping each vertex to a low-dimensional vector $ {\text{z}} \in {\mathbb{R}}^{d} $ by fusing the information contained in A and $ Attr $, where $ d \ll \left| V \right| $.

3.2 Feature Extraction Based on Graph Convolution

Convolutional neural networks (CNN) has achieved good results in areas. CNN can process Euclidean data (e.g. image data) efficiently. However, network data belongs to Graph-structured Data. In order to learn the features in Graph-structured Data, this paper use a variant of CNN which called spectral convolution to extract feature map from views in networks. The definition of spectral convolution is as shown in Eq. (1).

$$ g_{\theta } *X = Ug_{\theta } U^{T} X $$

(1)

$ {\text{X}} \in {\mathbb{R}}^{{\left| {\rm{V}} \right| \times {\text{m}}}} $ is a feature matrix, $ {\text{g}}_{\uptheta} = {\rm{diag}}\left(\uptheta \right) $ is a filter. Spectral convolution $ g_{\theta } $ is generated by decomposing the normalized graph Laplacian matrix shown as Eq. (2).

$$ L = I_{N} - D^{{ - \frac{1}{2}}} AD^{{ - \frac{1}{2}}} = U^{T}\Lambda U $$

(2)

$ D $ is the degree matrix, $ U $ is the matrix of eigenvectors of $ L $, $ \varLambda $ is the diagonal matrix of eigenvalues of $ {\text{L}} $. According to Eqs. (1) and (2), it can be seen that filter $ g_{\theta } $ is a function of eigenvalues $ \Lambda $. We can obtain the filter $ g_{\theta } $ via the eigenvalue decomposition of $ L $. However, in large-scale networks, eigenvalue decomposition of L is computationally expensive. So we use $ K^{th} $-order Chebyshev polynomial to approximate $ g_{\theta } \left( \varLambda \right) $ in Eq. (3).

$$ g_{{\theta^{\prime } }} *X = Ug_{{\theta^{\prime } }} U^{T} X = \sum\nolimits_{k = 0}^{K} {\theta_{k}^{\prime } T_{k} \left( {U{\tilde{\Lambda }}U^{T} } \right)X} = \sum\nolimits_{k = 0}^{K} {\theta_{k}^{\prime } T_{k} \left( {{\tilde{\text{L}}}} \right)X} $$

(3)

In Eq. (3), $ {\tilde{\Lambda }} = \frac{2}{{\lambda_{ \hbox{max} } }}\Lambda - {\text{I}}_{\rm{N}} $, $ \lambda_{ \hbox{max} } $ is the largest eigenvalue of $ {\text{L}} $. $ \uptheta^{\prime } \in {\mathbb{R}}^{\text{K}} $ is a vector of Chebyshev coefficients. $ {\text{T}}_{\rm{k}} \left( {{\tilde{\text{L}}}} \right) = 2{\tilde{\rm{L}}\text{T}}_{{{\rm{k}} - 1}} \left( {{\tilde{\text{L}}}} \right) - {\rm{T}}_{{{\text{k}} - 2}} \left( {{\tilde{\rm{L}}}} \right) $, with $ {\text{T}}_{0} \left( {{\tilde{\rm{L}}}} \right) = 1 $ and $ {\text{T}}_{1} \left( {{\tilde{\rm{L}}}} \right) = {\tilde{\text{L}}} $. If $ {\text{K}} = 2 $ and $ \lambda_{ \hbox{max} } \approx 2 $, we can obtain

$$ g_{{\theta^{\prime } }} *X \approx \theta_{0}^{\prime } X + \theta_{1}^{\prime } \left( {L - I_{N} } \right)X \approx \theta \left( {\tilde{D}^{{ - \frac{1}{2}}} \tilde{A}\tilde{D}^{{ - \frac{1}{2}}} } \right)X $$

(4)

where $ {\tilde{\text{A}}} = {\rm{A}} + {\text{I}}_{\rm{N}} $, $ {\tilde{\text{D}}} $ is the degree matrix of $ {\tilde{\text{A}}} $. The equation has parameter $ \uptheta =\uptheta_{0}^{\prime } = -\uptheta_{1}^{\prime } $, and it is a matrix of filter parameters in graph convolution network. As an example, we use two-layer convolution network with different $ {\text{W}} $ to learn multiple views in networks. The graph convolution network can be expressed as follows:

$$ {\text{z}}\left(\uptheta \right) = {\rm{ReLU}}\left( {{\hat{\text{A}}\rm{ReLU}}\left( {{\hat{\text{A}}\rm{XW}}^{0} } \right){\text{W}}^{1} } \right) $$

(5)

In Eq. (5), $ \uptheta $ to denote the vector of all filter parameters $ {\text{W}} $. $ {\hat{\text{A}}} = {\tilde{\rm{D}}}^{{ - \frac{1}{2}}} {{\tilde{\text{A}}{\tilde{\rm{D}}}}}^{{ - \frac{1}{2}}} $ is a symmetric and sparse adjacency matrix which converges the weight information of the nodes in the first-order domain of the target node. $ {\text{z}}\left(\uptheta \right) $ is the feature map learned from feature $ {\text{X}} $. We will obtain multiple feature maps by multiple convolution operations. This process is shown in Fig. 1. We consider three views in network.

3.3 MVNE Algorithm

Canonical Correlation Analysis (CCA) is used to mine complex relation mappings between two views $ ({\text{X}}_{1} ,{\rm{X}}_{2} ) \in {\mathbb{R}}^{{{\text{n}}_{1} }} \times $ $ {\mathbb{R}}^{{{\text{n}}_{2} }} $ by finding pairs of projections $ {\text{w}}_{1} ,{\rm{w}}_{2} $ that are maximize the correlation between views. The goal of CCA is shown in Eq. (6), where $ \Sigma _{11} $ and $ \Sigma _{22} $ are covariance, $ \Sigma _{12} $ is cross-covariance.

$$ \left( {{\text{w}}_{1}^{ *} ,{\rm{w}}_{2}^{ *} } \right) = {}_{{{\text{w}}_{1} ,{\rm{w}}_{2} }}^{\text{argmax}} {\rm{corr}}\left( {{\text{w}}_{1}^{\prime } {\rm{X}}_{1} ,{\text{w}}_{2}^{\prime } {\rm{X}}_{2} } \right) = \begin{array}{*{20}c} {\text{argmax}} \\ {{\rm{w}}_{1} ,{\text{w}}_{2} } \\ \end{array} \frac{{{\rm{w}}_{1}^{\prime }\Sigma _{12} {\text{w}}_{2} }}{{\sqrt { {\rm{w}}_{1}^{\prime } \Sigma _{11} {\text{w}}_{1} {\rm{w}}_{2}^{\prime } \Sigma _{22} {\text{w}}_{2} } }} $$

(6)

Inspired by CCA, we fuse multi-view by finding a canonical coordinate space that maximizes correlations between the projections of views. We use $ {\text{X}} = \left( {{\rm{x}}_{1} ,{\text{x}}_{2} , \ldots ,{\rm{x}}_{\text{n}} } \right) \in {\mathbb{R}}^{{{\text{N}} \times {\rm{m}}_{\text{x}} }} $ and $ {\text{Y}} = \left( {{\rm{y}}_{1} ,{\text{y}}_{2} , \ldots ,{\rm{y}}_{\text{n}} } \right) \in {\mathbb{R}}^{{{\rm{N}} \times {\text{m}}_{\rm{y}} }} $ obtained from network as an example to explain the principle of view fusion. $ {\text{m}}_{\rm{x}} $ and $ {\text{m}}_{\rm{y}} $ are dimensions of views. The goal of our task is learning the parameters $ \uptheta $ in Eq. (7) for every network views, this is express as

$$ {}_{{\theta_{1} ,\theta_{2} }}^{argmax} corr\left( {z\left( {X;\theta_{1} } \right),z\left( {Y;\theta_{2} } \right)} \right) $$

(7)

Let $ {\text{Z}}_{\rm{X}} = {\text{z}}\left( {{\rm{X}};\uptheta_{1} } \right) $ and $ {\text{Z}}_{\rm{y}} = {\text{z}}\left( {{\rm{Y}};\uptheta_{2} } \right) $ be the matrix produced by the graph convolutional layer on two views. $ \Sigma _{\text{xx}} = {\rm{Z}}_{\text{X}} {\rm{Z}}_{\text{X}}^{\prime } + {\rm{r}}_{1} {\text{I}} $, $ \Sigma _{\text{yy}} = {\rm{Z}}_{\text{Y}} {\rm{Z}}_{\text{Y}}^{\prime } + {\rm{r}}_{2} {\text{I}} $ are covariance matrices of $ \left( {{\text{Z}}_{\rm{X}} ,\,{\text{Z}}_{\rm{y}} } \right) $, and $ \Sigma _{\text{xy}} =\Sigma _{\rm{yx}}^{\prime } = {\text{Z}}_{\rm{X}} {\text{Z}}_{\rm{Y}}^{\prime } $ is the cross-covariance matrices of $ \left( {{\text{Z}}_{\rm{X}} ,\,{\text{Z}}_{\rm{y}} } \right) $. $ {\text{r}}_{1} ,{\rm{r}}_{2} > 0 $ is the regularization constant to reduce over-fitting in training data. We define $ {\text{O}} =\Sigma _{\rm{xx}}^{{ - \frac{1}{2}}}\Sigma _{\text{xy}}\Sigma _{\rm{yy}}^{{ - \frac{1}{2}}} $ according to the objection in Eq. (6), then we use the traces of $ {\text{O}} $ to simplify the calculation of the objective function in Eq. (7)

$$ \begin{array}{*{20}c} {arg{ \hbox{max} }} \\ {\theta_{1} ,\theta_{2} } \\ \end{array} corr\left( {Z_{X} ,Z_{Y} } \right) = max\,tr\left( {O^{\prime } O} \right)^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 2}}\right.\kern-0pt} \!\lower0.7ex\hbox{$2$}}}} $$

(8)

In order to find $ \uptheta_{1} ,\uptheta_{2} $ such that Eq. (8) is as high as possible, we calculate the gradient of $ {\text{corr}}\left( {{\rm{Z}}_{\text{X}} ,{\rm{Z}}_{\text{Y}} } \right) $ with respect to $ \uptheta_{1} ,\uptheta_{2} $, then use back propagation. Algorithm describes the multi-view fusing and embedding generation process.

4 Experiments

4.1 Dataset

In order to evaluate the effectiveness of MVNE, we use community detection, network reconstruction and network visualization to evaluate different embedding generated by different methods. Table 1 gives a properties list of all real network in our experiment. We construct random walk matrix $ X_{R} \in R^{\left| V \right| \times \left| V \right|} $ based on network to preserve local topology structure, and $ X_{{R_{ij} }} $ denote the frequency of node $ v_{j} $ appearing in random walk sequence of node $ v_{i} $. $ X_{R} $ can preserve node centrality and higher-order proximity between nodes. In addition, some networks in Table 1 contain rich information, such as region, hobbies, etc. We construct feature matrices base on attributes by one-hot coding, and use 5 layers MVNE to generate embedding.

Table 1. Experimental network basic properties.

Full size table

4.2 Community Detection

Community detection is a basic task of network analysis. The interaction between nodes in same community is more frequent than the interaction among other communities in networks. We use K-means to classify and assign community label for every node base on embedding. Modularity can be used to evaluate the quality of community detection. Embedding with high modularity is high-quality. We report performance of eight models including our model. The result is shown in Table 2.

Table 2. Modularity result of community detection.

Full size table

We can see that MVNE has higher modularity than other benchmark algorithms. The results show that the embedding learned by MVNE is more effective because it contains multiple network information.

4.3 Network Reconstruction

The purpose of network reconstruction is to rebuild the links between nodes based on similarity of node pairs. The similarity between nodes is evaluated by the distance of their embedding. The experiment randomly selects 20% node pairs from whole pairs as a sub-network sample. We take k pairs of nodes with the highest similarity as predicted links and calculate actual link ratio to evaluate the accuracy of network reconstruction. If node embedding is effective, the accuracy will be high.

Figure 2 shows the mean and standard deviation of corresponding accuracy. Accuracy of network reconstruction decreases with increasing of k value. MVNE achieves better network reconstruction with different k values. The reconstruction precision can reach about 80% while k = 2.

4.4 Visualization

We can visualize embedding to make better understanding for topology and characteristics of networks intuitively. Embedding learned by different methods are different in the ability of visualization and interpretation. We compare visualization ability of embedding learned by different algorithms in Football network. Each algorithm learns 64-D embedding for nodes, and use t-SNE [4] to reduce dimension to 2-D. We color nodes to observe the basic community structure of networks. The results are shown in Fig. 3. Some nodes belonging to different communities are mixed up in HOPE, LINE and GF. The embedding generated by DeepWalk and Node2Vec can represent clear community structure, but there are still a few nodes belonging to different communities mixed. MVNE is more effective than other benchmark algorithms because nodes belonging to same communities are separated clearly.

5 Future Work

In this paper, we propose a Multi-View Learning algorithm to generate embedding of networks. It fuses multiple features of networks by an unsupervised learning process. In the future, how to improve the learning efficiency of MVNE on large-scale network is a very important problem. In addition, introducing dynamic interaction information as a feature into NRL process is also worthwhile to study.

References

Zhao, D., Wang, L., Li, S., Wang, Z., et al.: Immunization of epidemics in multiplex networks. PLoS ONE 9(11), e112018 (2014)
Article Google Scholar
Cozzo, E., Banos, R.A., Meloni, S., et al.: Contact-based social contagion in multiplex networks. Phys. Rev. E 88(5), 660–691 (2013)
Article Google Scholar
Tan, S., Guan, Z., Cai, D., et al.: Mapping users across networks by manifold alignment on hypergraph. In: 28th AAAI Conference on Artificial Intelligence, vol. 1, pp. 159–165 (2014)
Google Scholar
Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
Google Scholar
Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234 (2016)
Google Scholar
Ou, M., Cui, P., Pei, J., Zhang, Z., et al.: Asymmetric transitivity preserving graph embedding. In: 22nd International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114 (2016)
Google Scholar
Bhagat, S., Cormode, G., Muthukrishnan, S.: Node classification in social networks. In: Aggarwal, C. (ed.) Social Network Data Analytics, pp. 115–148. Springer, Boston (2011). https://doi.org/10.1007/978-1-4419-8462-3_5
Chapter Google Scholar
Ahmed, A., Shervashidze, N., Narayanamurthy, S., et al.: Distributed large-scale natural graph factorization. In: 22nd International Conference on World Wide Web, pp. 37–48 (2013)
Google Scholar
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
Article Google Scholar
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings 24th International Conference on World Wide Web, pp. 1067–1077. ACM (2015)
Google Scholar
Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: 22nd International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114 (2016)
Google Scholar
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)
Google Scholar
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)
Google Scholar
Mikolov, T., Chen, K., Corrado, G., et al.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: 33rd International Conference on Machine Learning, vol. 48, pp. 2014–2023 (2016)
Google Scholar
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Hardoon, D.R., Szedmak, S., Shawe-Taylor, J.: Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004)
Article Google Scholar

Download references

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant No. 61170112), Beijing Natural Science Foundation (4172016), and the Scientific Research Project of Beijing Educational Committee (KM201710011006), and Key Lab of Information Network Security, Ministry of Public Security).

Author information

Authors and Affiliations

Beijing Key Laboratory of Food Safety Big Data Technology, Beijing, China
Zhongming Han & Dagao Duan
Beijing Technology and Business University, No. 11 Fucheng Road, Haidian District, Beijing, China
Zhongming Han, Chenye Zheng, Dan Liu, Dagao Duan & Weijie Yang

Authors

Zhongming Han
View author publications
You can also search for this author in PubMed Google Scholar
Chenye Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Dan Liu
View author publications
You can also search for this author in PubMed Google Scholar
Dagao Duan
View author publications
You can also search for this author in PubMed Google Scholar
Weijie Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhongming Han .

Editor information

Editors and Affiliations

National Institute of Advanced Industrial Science and Technology, Ibaraki, Japan
Kazuhiro Kojima
University of Electro-Communications, Tokyo, Japan
Maki Sakamoto
Ochanomizu University, Tokyo, Japan
Koji Mineshima
National Institute of Informatics, Tokyo, Japan
Ken Satoh

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Han, Z., Zheng, C., Liu, D., Duan, D., Yang, W. (2019). Multi-View Learning of Network Embedding. In: Kojima, K., Sakamoto, M., Mineshima, K., Satoh, K. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2018. Lecture Notes in Computer Science(), vol 11717. Springer, Cham. https://doi.org/10.1007/978-3-030-31605-1_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-31605-1_8
Published: 11 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31604-4
Online ISBN: 978-3-030-31605-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics