Abstract
Processing a twig pattern query in XML document includes structural search and content search. Most existing algorithms only focus on structural search. They treat content nodes the same as element nodes during query processing with structural joins. Due to the high variety of contents, to mix content search and structural search suffers from management problem of contents and low performance. Another disadvantage is to find the actual values asked by a query, they have to rely on the original document. In this paper, we propose a novel algorithm Value Extraction with Relational Table (VERT) to overcome these limitations. The main technique of VERT is introducing relational tables to store document contents instead of treating them as nodes and labeling them. Tables in our algorithm are created based on semantic information of documents. As more semantics is captured, we can further optimize tables and queries to significantly enhance efficiency. Last, we show by experiments that besides solving different content problems, VERT also has superiority in performance of twig pattern query processing compared with existing algorithms.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Al-Khalifa, S., Jagadish, H.V., Patel, J.M., Wu, Y., Koudas, N., Srivastava, D.: Structural joins: A primitive for efficient XML query pattern matching. In: Proc. of ICDE (2002)
Berglund, A., Chamberlin, D., Fernandez, M.F., Kay, M., Robie, J., Simeon, J.: XML Path Language (XPath) 2.0. W3C Working Draft (2003)
Boag, S., Chamberlin, D., Fernandez, M.F., Florescu, D., Robie, J., Simeon, J.: XQuery 1.0: An XML Query. W3C Working Draft (2003)
Bruno, N., Koudas, N., Srivastava, D.: Holistic twig joins: Optimal XML pattern matching. In: Proc. of ACM SIGMOD, ACM Press, New York (2002)
Chen, T., Lu, J., Ling, T.W.: On boosting holism in XML twig pattern matching using structural indexing techniques. In: Proc. of SIGMOD Conference (2005)
Grust, T.: Accelerating XPath location steps. In: Proc. of SIGMOD Conference (2002)
Jiang, H., Lu, H., Wang, W.: Efficient processing of XML twig queries with OR-predicates. In: Proc. of SIGMOD Conference (2004)
Jiang, H., Wang, W., Lu, H., Yu, J.: Holistic twig joins on indexed XML documents. In: Proc. of VLDB Conference (2003)
Lu, J., Chen, T., Ling, T.W.: Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In: Proc. of CIKM (2004)
Lu, J., Ling, T.W., Chan, C., Chen, T.: From region encoding to extended dewey: On efficient processing of XML twig pattern matching. In: Proc. of VLDB Conference (2005)
Rao, P.R., Moon, B.: PRIX: Indexing and Querying XML Using Prufer Sequences. In: Proc. of ICDE (2004)
Wang, H., Park, S., Fan, W., Yu, P.S.: ViST: A Dynamic index method for querying XML data by tree structures. In: Proc. of SIGMOD Conference (2003)
Yu, T., Ling, T.W., Lu, J.: Twigstacklistnot: A holistic twig join algorithm for twig query with NOT-predicates on XML data. In: Lee, M.L., Tan, K.-L., Wuwongse, V. (eds.) DASFAA 2006. LNCS, vol. 3882, Springer, Heidelberg (2006)
Zhang, C., Naughton, J., Dewitt, D., Luo, Q., Lohman, G.: On supporting containment queries in relational database management systems. In: Proc. of ACM SIGMOD, ACM Press, New York (2001)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, H., Ling, T.W., Chen, B. (2007). VERT: A Semantic Approach for Content Search and Content Extraction in XML Query Processing. In: Parent, C., Schewe, KD., Storey, V.C., Thalheim, B. (eds) Conceptual Modeling - ER 2007. ER 2007. Lecture Notes in Computer Science, vol 4801. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75563-0_36
Download citation
DOI: https://doi.org/10.1007/978-3-540-75563-0_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75562-3
Online ISBN: 978-3-540-75563-0
eBook Packages: Computer ScienceComputer Science (R0)