Abstract
One of the key steps in data analysis is the exploration of data. For traditional relational data, this process is facilitated by relational database management systems and the aggregates and rankings they can compute. For the exploration of graph data, however, relational databases may not be the most practical or scalable option. Many tasks related to the exploration of information networks involve the computation and analysis of connections (e.g. paths) between concepts. Traditional relational databases offer no specific support for performing such tasks. For instance, a statistic such as the shortest path between two given nodes cannot be computed by a relational database. Surprisingly, tools for querying graph and network databases are much less well developed than those for relational data, and only recently has an increasing number of studies been devoted to graph or network databases. Our position is that the development of such graph databases is important both to make basic graph mining easier and to prepare data for more complex types of analysis.
In this chapter we present the BiQL data model for representing and manipulating information networks. The BiQL data model consists of two parts: a data model describing objects, links, domains and networks, and a query language describing basic network manipulations. The main focus here lies on data preparation and data analysis, and less directly on data mining or knowledge discovery tasks.
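To make the shortest-path example from the abstract concrete, the following is a minimal sketch of the kind of path computation meant there, written in plain Python rather than in BiQL. The function name `shortest_path`, the adjacency-dictionary representation, and the toy `network` are assumptions made for illustration only; they are not taken from the chapter and do not reflect BiQL syntax.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one path with the fewest links from source to target, or None.

    `adjacency` maps each node to an iterable of its neighbours
    (a hypothetical representation chosen for this sketch).
    """
    if source == target:
        return [source]
    parents = {source: None}      # node -> node it was first reached from
    queue = deque([source])       # breadth-first frontier
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue          # already reached by a path at least as short
            parents[neighbor] = node
            if neighbor == target:
                # Walk the parent links back to the source, then reverse.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None                   # target is not reachable from source


# Hypothetical toy network: undirected links stored in both directions.
network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

print(shortest_path(network, "a", "e"))
```

The traversal visits nodes in order of their distance from the source, so the length of the path depends on the whole link structure rather than on any single record, which is why per-row aggregates and rankings do not capture it directly.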
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2012 The Author(s)
About this chapter
Cite this chapter
Dries, A., Nijssen, S., De Raedt, L. (2012). BiQL: A Query Language for Analyzing Information Networks. In: Berthold, M.R. (ed.) Bisociative Knowledge Discovery. Lecture Notes in Computer Science, vol 7250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31830-6_11
DOI: https://doi.org/10.1007/978-3-642-31830-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31829-0
Online ISBN: 978-3-642-31830-6
eBook Packages: Computer Science (R0)