Abstract
The problem of answering XML queries using path-based indexes is to find efficient methods for accelerating the XML query with pre-designed index structures over the XML database. This problem received increasing interests and have been lucubrated in recent years. Regular path expression is the core of the XML query languages e.g., XPath and XQuery. Most of the state-of-the-art path-based XML indexes, therefore, hammer at how to efficiently answer the path-based XML queries. This paper surveys various approaches to indexing XML data proposed in the literature. We give a step by step analysis to show the evolution of index structures for XML path information, based on tree structures or more commonly, directed labeled graphs. For each approach, we first present the specific issue it aims to tackle, and then the proposed solution presented. Furthermore, construction, physical data storage and maintenance costs, are analyzed.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Abiteboul, S.: Querying semi-structured data. In: Proceedings of ICDT ’97, (1997)
Abiteboul, S., Quass, D., McHugh, J., Widom, J., Wiener, J.: The Lorel query language for semistructured data. Int. J. Digit. Lib. 1(1), 68–88 (April, 1997)
Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco, California 94104–3205, USA
Al-Khalifa, S., Jagadish, H.V., Koudas, N., Patel, J.M., Srivastava, D., Wu, Y.: Structural Joins: A primitive for efficient XML query pattern matching. In: Proceedings of ICDE '02, (2002)
Barg, M., Wong. R.K.: A fast and versatile path index for querying semi-structured data. In: Proceedings of DASFAA '03, (2003)
Berglund, A., Baog, S., Chamberlin, D., et al.: XML Path Languages (XPath), ver 2.0. W3Cworking Draft, 2001. Tech. Report WD-xpath20-20011220, W3C, 2001, http://www.w3.org/TR/WD-xpath20-20011220
Biron, P.V., Malhotra, A.: XML Schema Part 2: Datatypes W3C Recommendation 02, (May 2001). http://www.w3.org/TR/xmlschema-2/
Buneman, P., Davidson, S.B., Hillebrand, G.G., Suciu, D.: A query language and optimization techniques for unstructured data. In: Proceedings of SIGMOD '96, (1996)
Chamberlin, D., Robie, J., Florescu, D.: Quilt: An XML query language for heterogeneous data sources. In: Proceedings of Third International Workshop WebDB, (2000).
Chamberlin, D., Clark, J., Florescu, D., et al.: XQuery 1.0: An XML Query Language. W3C working draft. Technical Report WD-xquery-20010607, World Wide Web Consortium, (2001)
Chen, Q., Lim, A., Ong, K.W.: D(K)-Index: An adaptive structural summary for graph-structured data. In: Proceedings of SIGMOD '03, (2003)
Cheng, J., Yu, G., Yu, J.X., Wang. G.: PathGuide: An efficient clustering based indexing method for XML path expressions. In: Proceedings of DASFAA ’03, (2003)
Chien, S.-Y., Vagena, Z., Zhang, D., Tsotras, V.J., Zaniolo, C.: Efficient structural joins on indexed XML documents. In: Proceedings of VLDB ’02, (2002)
Christophides, V., Abiteboul, S., Cluet, S., Scholl, M.: From structured documents to novel query facilities. In: Proceedings of SIGMOD '94, (1994)
Chung, C.-W., Min, J.-K., Shim, K.: APEX: An adaptive path index for XML data. In: Proceedings of SIGMOD '02, (2002)
Cooper, B.F., Sample, N., Franklin, M.J., Hjaltason, G.R., Shadmon, M.: A fast index for semistructured data. In: Proceedings of VLDB '01, (2001)
Deutsch, A., Fernandez, M., Florescu, D., Levy, A., Suciu, D.: XML-QL: A query language for XML. W3C Working Draft. Submission to the World Wide Web Consortium 19-August-1998. Available from http://www.w3.org/TR/NOTE-xml-ql., (1998)
Deutsch, A., Fernandez, M., Suciu, D.: Storing semistructured data with STORED. In: Proceedings of SIGMOD '99, (1999)
Fernándex, M., Florescu, D., Levy, A., Suciu, D.: Declarative specificaiton of web sites with strudel. VLDB J. 9(1), (2000). Electronic edition
Flajolet, P., Sedgewick, R.: Digital search trees revisited. SIAM J. Comput 15(3), 748–767 (1986)
Florescu, D., Kossmann, D.: A performance evaluation of alternative mapping schemes for storing XML data In: a relational database. (1999). Survey report
Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization In: semistructured databases. In: Proceedings of VLDB '97, (1997)
Goldman, R., Widom, J.: Approximate dataguides. In: Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, (1999)
Gonnet, G.: Efficient searching of text and pictures (extended abstract). Technical Report OED-88-02, University of Waterloo, (1988)
Harding, P.J., Li, Q., Moon, B.: XISS/R: XML indexing and storage system using RDBMS. In: Proceedings of VLDB '03, (2003)
He, H., Yang, J.: Multiresolution indexing of XML for frequent queries. In: Proceedings of ICDE '04, (2004)
Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, Massachusetts, (1979), Date
The Internet Movie Database Ltd. Internet movie database: http://www.imdb.com
Jagadish, H.V., Al-Khalifa, S., Chapman, A., Lakshmanan, L.V., Nierman, A., Paparizos, S., Patel, J.M., Srivastava, D., Wiwatwattana, N., Wu, Y., Yu, C.: TIMBER: a native XML database. VLDB J. 11 (2002)
Jiang, H., Lu, H., Wang, W., Yu, J.X.: Path materialization revisited: An efficient storage model for XML data. In: Proceedings of ADC '02, (2002)
Jiang, H., Lu, H., Wang, W., Ooi, B.C.: XR-Tree: Indexing XML data for efficient structural join. In: Proceedings of ICDE '03, (2003)
Kaushik, R., Bohannon, P., Naughton, J.F., Korth, H.F.: Covering indexes for branching path queries. In: Proceedings of SIGMOD '02, (2002)
Kaushik, R., Bohannon, P., Naughton, J.F., Shenoy, P.: Updates for structure indexes. In: Proceedings of VLDB '02, (2002)
Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths In: graph-structured data. In: Proceedings of ICDE '02, (2002)
Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proceedings of VLDB '01, (2001)
Lu, H., Wang, G., Yu, G., Bao, Y., Lv, J., Yu, Y.: XBase: Making your gigabyte disk files queriable. In: Proceedings of SIGMOD '02, (2002)
McHugh, J., Abiteboul, S., Goldman, R., Quass, D., Widom, J.: Lore: a database management system for semistructured data. Sigmod Rec. 26(3), 54–66 (1997)
Mc Hugh, J., Widom, J., Abiteboul, S., Luo, Q., Rajamaran, A.: Indexing semistructured data. Technical report, Stanford University, Computer Science Department, Database Group, (1998)
Mendelzon, A., Mihaila, G., Milo, T.: Querying the world wide web. In: Proceedings of the Fourth Conference on Parallel and Distributed Information Systems, (1996)
Milo, T., Suciu, D.: Index structures for path expressions. In: Proceedings of ICDT '99, (1999)
Morrison, D.R.: PATRICIA-practical algorithm to retrieve information coded In: alphanumeric. Journal of the Association of Computing Machinary 15(4), 514–534 (1968)
NASA XML Group. http://www.imdb.com
Nestorov, S., Ullman, J., Wiener, J., Chawathe, S.: Representative objects: concise representation of semistructured, hierarchical data. In: Proceedings of ICDE '97, (1997)
Open Directory Project. DMOZ open directory project. http://www.dmoz.org
Paige, R., Tarjan, R.: Three partition refinement algorithms. SIAM J. Comput 16(6), 973–989 (1987)
Papakonstantinou, Y., Garcia-Molina, H., Widom, J.: Object exchange across heterogeneous information sources. In: Proceedings of ICDE '95, (1995)
Prüfer, H.: Neuer beweis eines satzes über permutationen. Archiv für Mathematik und Physik 27, 142–144 (1918)
Rao, P., Moon, B.: PRIX: Indexing and querying XML using prüfer sequences. In: Proceedings of ICDE '04, (2004)
Sacks-Davis, R., Arnold-Moore, T., Zobel, J.: Database systems for structured documents. In: International Symposium on Advanced Database Technologies and Their Integration (ADTI '94), Nara, Japan, October (1994)
Schmidt, A., Kersten, M.L., Windhouwer, M., Waas, F.: Efficient relational storage and retrieval of XML documents. In: Proceedings of WebDB '00, (2000)
Schmidt, A., Waas, F., Kersten, M., Carey, M.J., Manolescu, I., Busse, R.: XMark: A benchmark for XML data management. In: Proceedings of VLDB '02, (2002)
Software AG: Tamino XML database. http://www.softwareag.com/tamino/
van Zwol, R., Apers, P., Wilschutz, A.: Implementing semi-structured data with MOA. In: Workshop on Query Processing for Semistructured data and Non-Standard Data Formats (in conjunction with ICDT), (1999)
Wang, G., Liu, M.: Query processing and optimization for regular path expressions. In: Proceedings of CAiSE '03, (2003)
Wang, H., Park, S., Fan, W., Yu, P.S.: ViST: A DynamicInde x Method for Querying XML Data by Tree Structures. In: Proceedings of SIGMOD '03, (2003)
Wang, G., Sun, B., Lv, J., Yu, G.: Rpe query processing and optimization techniques for xml databases. J. Comput Sci. Technol 19(2), 224–237 (March, 2004)
XYZFind: XML database. http://www.xyzfind.com/
Yoshikawa, M., Amagasa, T., Shimura, T., Uemura, S.: Xrel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Transaction Internet Technology 1(1), 110–141 (2001)
Zhou, A., Lu, H., Zheng, S., Liang, Y., Zhang, L., Ji, W., Tian, Z.: VXMLR: A visual XML-relational database system. In: Proceedings of VLDB '01, (2001)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Wong, KF., Yu, J.X. & Tang, N. Answering XML Queries Using Path-Based Indexes: A Survey. World Wide Web 9, 277–299 (2006). https://doi.org/10.1007/s11280-006-8557-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-006-8557-z