Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks

Cheng, Reynold; Huang, Zhipeng; Zheng, Yudian; Yan, Jing; Wong, Ka Yu; Ng, Eddie

doi:10.1007/978-3-319-63579-8_1

Reynold Cheng¹⁸,
Zhipeng Huang¹⁸,
Yudian Zheng¹⁸,
Jing Yan¹⁸,
Ka Yu Wong¹⁸ &
…
Eddie Ng¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10366))

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data

2014 Accesses
2 Citations

Abstract

A heterogeneous information network (HIN) is a graph model in which objects and edges are annotated with types. Large and complex databases, such as YAGO and DBLP, can be modeled as HINs. A fundamental problem in HINs is the computation of closeness, or relevance, between two HIN objects. Relevance measures, such as PCRW, PathSim, and HeteSim, can be used in various applications, including information retrieval, entity resolution, and product recommendation. These metrics are based on the use of meta-paths, essentially a sequence of node classes and edge types between two nodes in a HIN. In this tutorial, we will give a detailed review of meta-paths, as well as how they are used to define relevance. In a large and complex HIN, retrieving meta paths manually can be complex, expensive, and error-prone. Hence, we will explore systematic methods for finding meta paths. In particular, we will study a solution based on the Query-by-Example (QBE) paradigm, which allows us to discover meta-paths in an effective and efficient manner.

We further generalise the notion of meta path to “meta structure”, which is a directed acyclic graph of object types with edge types connecting them. Meta structure, which is more expressive than the meta path, can describe complex relationship between two HIN objects (e.g., two papers in DBLP share the same authors and topics). We will discuss three relevance measures based on meta structure. Due to the computational complexity of these measures, we also study an algorithm with data structures proposed to support their evaluation. Finally, we will examine solutions for performing query recommendation based on meta-paths. We will also discuss future research directions.

Access provided by CONRICYT-eBooks. Download conference paper PDF

Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks

Article 14 July 2018

Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets

PathSimExt: Revisiting PathSim in Heterogeneous Information Networks

1 Background

Heterogeneous information networks (HINs), such as DBLP [5], YAGO [8], and DBpedia [1], have recently received a lot of attention. These data sources, containing a vast number of inter-related facts, facilitate the discovery of interesting knowledge [4, 6, 7]. Figure 1(a) illustrates an HIN, which describes the relationship among entities of different types (e.g., author, paper, venue and topic). For example, Jiawei Han (\(a_2\)) has written a VLDB paper (\(p_{2,2}\)), which mentions the topic “efficient” (\(t_3\)).

Given two HIN objects a and b, the evaluation of their relevance is of fundamental importance. This quantifies the degree of closeness between a and b. In Fig. 1(a), Jian Pei (\(a_1\)) and Jiawei Han (\(a_2\)) have a high relevance score, since they have both published papers with keyword “mining” in the same venue (KDD). Relevance finds its applications in information retrieval, recommendation, and clustering [9, 10]: a researcher can retrieve papers that have high relevance in terms of topics and venues in DBLP; in YAGO, relevance facilitates the extraction of actors who are close to a given director. As another example, in entity resolution applications, duplicated HIN object pairs having high relevance scores (e.g., two different objects in an HIN referring to the same real-world person) can be identified and removed from the HIN.

Relevance Computation. In this tutorial, we will explore different ways of computing the relevance between two graph objects, for instance, neighborhood-based measures, such as common neighbors and Jaccard’s coefficient; graph-theoretic measures based on random walks, such as Personalized PageRank and SimRank. These measures do not consider object and edge type information in an HIN. We will discuss the concept of meta paths [4, 9]. A meta path is a sequence of object types with edge types between them. Figure 1(b) illustrates a meta path \(\mathcal {P}_1\), which states that two authors (\(A_1\) and \(A_2\)) are related by their publications in the same venue (V). Another meta path \(\mathcal {P}_2\) says that two authors have written papers containing the same topic (T). We will discuss several meta-path-based relevance measures, including PathCount, PathSim, and Path Constrained Random Walk (PCRW) [4, 9]. These measures have been shown to be better than those that do not consider object and edge type information.

We will further discuss meta structures, recently proposed in [3], to depict the relationship of two graph objects. This is essentially a directed acyclic graph of object and edge types. Figure 1(b) illustrates a meta structure \(\mathcal {S}\), which depicts that two authors are relevant if they have published papers in the same venue, and have also mentioned the same topic. A meta path (e.g., \(\mathcal {P}_1\) or \(\mathcal {P}_2\)) is a special case of a meta structure. However, a meta path fails to capture such complex relationship that can be conveniently expressed by a meta structure (e.g., \(\mathcal {S}\)). We will discuss how meta structures can be used to formulate three relevance definitions, as well as their efficient calculation.

Meta Path Discovery. There are often a huge number of meta paths between a pair of HIN objects. This can be very difficult, even for a domain expert, to identify the right meta paths. We will discuss a meta path discovery algorithm, recently proposed by [6], where users provide example instances of source and target objects through a Query-by-Example paradigm, to derive meta paths automatically. We will demonstrate a HIN search engine prototype based on this algorithm.

Query Recommendation. We will study the use of meta paths in query recommendation, where queries are suggested to web search users based on their previous query histories. As studied in [2], it is possible to use a knowledge base (a HIN) and its related meta-paths to perform effective query recommendation. The approach is especially useful to long-tail queries that rarely appear in query logs.

2 Proposed Schedule

The following is our proposed schedule of the 90-min tutorial.

Introduction (15 min). We will discuss the basic model of HIN, and discuss applications based on it, such as search, relevance computation, query recommendation, and data integration (10 min). We will also introduce meta-paths, a fundamental HIN analysis tool, and give an overview of the tutorial (5 min).
Main contents (60 min). Next, we will introduce meta path, and how it facilitates the computation of various relevance measures (10 min). We then explain the process of discovering meta paths (15 min). We discuss a novel query recommendation framework based on meta paths (15 min). We will also present the meta structures, which is the latest development of meta paths (15 min). We will demonstrate a HIN search engine prototype based on meta paths (5 min).
Conclusions (15 min). We will conclude the tutorial and discuss future directions (5 min). The rest of the time will be dedicated to Q&A (10 min).

3 Intended Audience

The tutorial is designed for researchers interested in latest development in the field of HINs, especially regarding meta-paths for novel applications. The HIN search demonstration will be give insight to software practitioners for developing recommendation facilities for HINs.

4 Biography of Presenters

Reynold Cheng is an Associate Professor of the Department of Computer Science in the University of Hong Kong. He obtained his PhD from Department of Computer Science of Purdue University in 2005. He was granted an Outstanding Young Researcher Award 2011–12 by HKU. He was the recipient of the 2010 Research Output Prize in the Department of Computer Science of HKU. He also received the U21 Fellowship in 2011. He received the Performance Reward in years 2006 and 2007 awarded by the Hong Kong Polytechnic University. He is a member of the IEEE, the ACM, and ACM SIGMOD. He is an editorial board member of TKDE, DAPD and IS, and was a guest editor for TKDE, DAPD, and Geoinformatica. He is an area chair of ICDE 2017, senior PC member of BigData 2017 and DASFAA 2015, PC co-chair of APWeb 2015, area chair for CIKM 2014, and workshop co-chair of ICDE 2014. He received an Outstanding Service Award in CIKM 2009. He has served as PC members and reviewer for top conferences and journals.

Zhipeng Huang is a 2nd year Ph.D. in the CS department of HKU, supervised by Prof. Nikos Mamoulis and Dr. Reynold Cheng. He received his bachelor degree from EECS department of PKU in 2015. His research interests cover data mining, data management and data cleaning.

Yudian Zheng is a 4th year Ph.D. in the CS department of HKU, supervised by Dr. Reynold Cheng. Yudian’s research interests cover crowdsourcing, data management and data cleaning. He has published full research papers in well-established database and data mining conferences/journals, including SIGMOD, VLDB, KDD, WWW, ICDE, and TKDE. He has also taken internships in Microsoft Research and Google Research.

Jing Yan is a 1st year MPhil student supervised by Dr. Reynold Cheng in the CS department of HKU. His research interests include data management and data mining, with emphasis on knowledge graphs and data cleaning.

Ka Yu Wong is currently a MSc student of the CS department of HKU.

Eddie Ng is currently a MSc student of the CS department of HKU.

References

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). doi:10.1007/978-3-540-76298-0_52
Chapter Google Scholar
Huang, Z., Cautis, B., Cheng, R., Zheng, Y.: KB-enabled query recommendation for long-tail queries. In: CIKM, pp. 2107–2112 (2016)
Google Scholar
Huang, Z., Zheng, Y., Cheng, R., Sun, Y., Mamoulis, N., Li, X.: Meta structure: computing relevance in large heterogeneous information networks. In: KDD (2016)
Google Scholar
Lao, N., Cohen, W.W.: Relational retrieval using a combination of path-constrained random walks. Mach. Learn. 81(1), 53–67 (2010)
Article MathSciNet Google Scholar
Ley, M.: DBLP Computer Science Bibliography (2005)
Google Scholar
Meng, C., Cheng, R., Maniu, S., Senellart, P., Zhang, W.: Discovering meta-paths in large heterogeneous information networks. In: WWW, pp. 754–764 (2015)
Google Scholar
Mottin, D., Lissandrini, M., Velegrakis, Y., Palpanas, T.: Exemplar queries: give me an example of what you need. PVLDB 7(5), 365–376 (2014)
Google Scholar
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Google Scholar
Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: Pathsim: meta path-based top-k similarity search in heterogeneous information networks. In: PVLDB, pp. 992–1003 (2011)
Google Scholar
Yu, X., Ren, X., Sun, Y., Sturt, B., Khandelwal, U., Gu, Q., Norick, B., Han, J.: Recommendation in heterogeneous information networks with implicit user feedback. In: RecSys, pp. 347–350 (2013)
Google Scholar

Download references

Acknowledgements

Reynold Cheng, Zhipeng Huang, Yudian Zheng, and Jing Yan were supported by the Research Grants Council of Hong Kong (RGC Projects HKU 17229116 and 17205115) and the University of Hong Kong (Projects 102009508 and 104004129).

Author information

Authors and Affiliations

University of Hong Kong, Pokfulam Road, Pokfulam, Hong Kong
Reynold Cheng, Zhipeng Huang, Yudian Zheng, Jing Yan, Ka Yu Wong & Eddie Ng

Authors

Reynold Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Zhipeng Huang
View author publications
You can also search for this author in PubMed Google Scholar
Yudian Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Jing Yan
View author publications
You can also search for this author in PubMed Google Scholar
Ka Yu Wong
View author publications
You can also search for this author in PubMed Google Scholar
Eddie Ng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Reynold Cheng .

Editor information

Editors and Affiliations

Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Computer Science, Aarhus University, Aarhus N, Denmark
Christian S. Jensen
Computer Science, University of Southern California, Los Angeles, California, USA
Cyrus Shahabi
Northeastern University, Shenyang, China
Xiaochun Yang
Kent State University, Kent, Ohio, USA
Xiang Lian

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Cheng, R., Huang, Z., Zheng, Y., Yan, J., Wong, K.Y., Ng, E. (2017). Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science(), vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_1

Download citation

DOI: https://doi.org/10.1007/978-3-319-63579-8_1
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63578-1
Online ISBN: 978-3-319-63579-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks

Abstract

Similar content being viewed by others

Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks

Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets

PathSimExt: Revisiting PathSim in Heterogeneous Information Networks

1 Background

2 Proposed Schedule

3 Intended Audience

4 Biography of Presenters

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks

Abstract

Similar content being viewed by others

Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks

Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets

PathSimExt: Revisiting PathSim in Heterogeneous Information Networks

1 Background

2 Proposed Schedule

3 Intended Audience

4 Biography of Presenters

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation