Abstract
Emerging applications such as personalized portals, enterprise search, and web integration systems often require keyword search over semi-structured views. However, traditional information retrieval techniques are likely to be expensive in this context because they assume that the set of documents being searched is materialized. In this paper, we present a system architecture and algorithm that can efficiently evaluate keyword search queries over virtual (unmaterialized) XML views. A key aspect of our approach is that it exploits indices on the base data and thereby avoids materializing large parts of the view that are not relevant to the query results. Moreover, using only these indices, the algorithm can still score the results of queries over the virtual view, and the resulting scores are identical to those that would be obtained if the view were materialized. Our performance evaluation, using the INEX data set in the Quark open-source XML database system (Bhaskar et al. in Quark: an efficient XQuery full-text implementation. In: SIGMOD, 2006), indicates that the proposed approach is scalable and efficient.
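The claim that index-only scoring can reproduce the scores of a materialized view can be illustrated with a toy example. The Python sketch below is not the authors' algorithm and does not reproduce their index structures or scoring function. It assumes a hypothetical view in which each view element is the concatenation of some base elements, and a simple unnormalized tf-idf score. Under those assumptions, a view element's term frequency is the sum of its base elements' postings, and document frequency can be counted over those sums, so no view text has to be built. The names BASE, VIEW, build_index, scores_from_index, and scores_materialized are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical base data: base element id -> text content.
BASE = {
    "b1": "xml keyword search",
    "b2": "virtual views over xml",
    "b3": "relational data",
    "b4": "keyword ranking",
}

# Hypothetical view definition: each view element is assembled
# from (the concatenated text of) a list of base elements.
VIEW = {
    "v1": ["b1", "b2"],
    "v2": ["b3"],
    "v3": ["b2", "b4"],
}


def build_index(base):
    """Inverted index on the base data: term -> {base_id: term frequency}."""
    index = {}
    for bid, text in base.items():
        for term, tf in Counter(text.split()).items():
            index.setdefault(term, {})[bid] = tf
    return index


def scores_from_index(index, view, terms):
    """Score view elements from base-data postings only; no view text is built."""
    n = len(view)
    scores = {vid: 0.0 for vid in view}
    for term in terms:
        postings = index.get(term, {})
        # Term frequency of a view element = sum over its base elements.
        tf = {vid: sum(postings.get(b, 0) for b in members)
              for vid, members in view.items()}
        df = sum(1 for f in tf.values() if f > 0)  # df over view elements
        if df == 0:
            continue
        idf = math.log(n / df)
        for vid, f in tf.items():
            scores[vid] += f * idf
    return scores


def scores_materialized(base, view, terms):
    """Reference scorer: materialize each view element's text, then score it."""
    docs = {vid: Counter(" ".join(base[b] for b in members).split())
            for vid, members in view.items()}
    n = len(docs)
    scores = {vid: 0.0 for vid in docs}
    for term in terms:
        df = sum(1 for c in docs.values() if c[term] > 0)
        if df == 0:
            continue
        idf = math.log(n / df)
        for vid, c in docs.items():
            scores[vid] += c[term] * idf
    return scores


if __name__ == "__main__":
    query = ["xml", "keyword"]
    idx = build_index(BASE)
    print(scores_from_index(idx, VIEW, query))
    print(scores_materialized(BASE, VIEW, query))
```

For the query ["xml", "keyword"], both functions rank v1 above v3 and give v2 a score of 0. Equality holds here because the assumed score depends only on per-element term counts and per-view document frequencies, both of which are derivable from base postings. Scores that use additional statistics, such as element length, would need those statistics to be indexed as well.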
References
Aboulnaga, A., Naughton, J.F., Zhang, C.: Generating synthetic complex-structured XML data. In: WebDB, pp. 79–84 (2001)
Al-Khalifa, S., Yu, C., Jagadish, H.V.: Querying structured text in an XML database. In: SIGMOD (2003)
Amer-Yahia, S., et al.: Structure and content scoring for XML. In: VLDB (2005)
Theobald, A., Weikum, G.: The index-based XXL search engine for querying XML data with relevance ranking (2002)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern information retrieval. ACM Press, New York (1999)
Bhaskar, A., et al.: Quark: an efficient XQuery full-text implementation. In: SIGMOD (2006)
Botev, C., Shanmugasundaram, J.: Context-sensitive keyword search and ranking for XML. In: WebDB (2005)
Bressan, S., Catania, B., Lacroix, Z., Li, Y.G., Maddalena, A.: Accelerating queries by pruning XML documents. Data Knowl. Eng. 54(2), 211–240 (2005)
Carey, M.J.: XPERANTO: middleware for publishing object-relational data as XML documents. In: VLDB, pp. 646–648 (2000)
Chan, C.Y., Felber, P., Garofalakis, M.N., Rastogi, R.: Efficient filtering of XML documents with XPath expressions. VLDB J. 11(4), 354–379 (2002)
Chaudhuri, S., Gravano, L., Marian, A.: Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. 16(8), 992–1009 (2004)
Chen, Z., et al.: Index structures for matching XML twigs using relational query processors. Data Knowl. Eng. 60(2), 283–302 (2007)
Chen, Z., Jagadish, H.V., Lakshmanan, L.V.S., Paparizos, S.: From tree patterns to generalized tree patterns: on efficient evaluation of XQuery. In: VLDB (2003)
Cho, S.: Indexing for XML siblings. In: WebDB (2005)
Christophides, V., Cluet, S., Simeon, J.: On wrapping query languages and efficient XML integration. In: SIGMOD (2000)
Curtmola, E., Amer-Yahia, S., Brown, P., Fernandez, M.: GalaTex: a conformant implementation of the XQuery full-text language. In: XIME-P (2005)
Diao, Y., Fischer, P., Franklin, M., To, R.: YFilter: efficient and scalable filtering of XML documents. In: ICDE (2002)
Fagin, R.: Combining fuzzy information from multiple systems. In: PODS (1996)
Fahl, G., Risch, T.: Query processing over object views of relational data. VLDB J. 6(4), 261–281 (1997)
Fernandez, M.F., Tan, W.C., Suciu, D.: SilkRoute: trading between relations and XML. Comput. Netw. 33(1–6), 723–745 (2000)
Fuhr, N., Großjohann, K.: XIRQL: a query language for information retrieval in XML documents (2001)
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: SIGMOD (2003)
Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient IR-style keyword search over relational databases. In: VLDB (2003)
Hristidis, V., Papakonstantinou, Y.: Discover: keyword search in relational databases. In: VLDB (2002)
Ilyas, I.F., et al.: Rank-aware query optimization. In: SIGMOD (2004)
Jagadish, H.V., et al.: TIMBER: a native XML database. VLDB J. 11(4), 274–291 (2002)
Kaushik, R., Krishnamurthy, R., Naughton, J.F., Ramakrishnan, R.: On the integration of structure indexes and inverted lists. In: ICDE (2004)
Marian, A., Siméon, J.: Projecting XML documents. In: VLDB (2003)
Mass, Y., et al.: JuruXML—an XML retrieval system at INEX’02. In: INEX (2002)
Myaeng, S.-H., Jang, D.-H., Kim, M.-S., Zhoo, Z.-C.: A flexible model for retrieval of SGML documents. In: SIGIR (1998)
Naughton, J.F., et al.: The Niagara Internet query system. IEEE Data Eng. Bull. 24(2), 27–33 (2001)
O’Neil, P., et al.: ORDPATHs: insert-friendly XML node labels. In: SIGMOD (2004)
Paparizos, S., Wu, Y., Lakshmanan, L.V.S., Jagadish, H.V.: Tree logical classes for efficient evaluation of XQuery. In: SIGMOD. ACM Press, New York, pp. 71–82 (2004)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)
Salton, G.: Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley, Reading (1989)
Schott, S., Noga, M.L.: Lazy XSL transformations. In: DocEng 2003. ACM Press, Grenoble (2003)
Shanmugasundaram, J., et al.: Querying XML views of relational data. In: VLDB (2001)
Shao, F., et al.: Efficient ranked keyword search over virtual XML views, technical report TR2007-2077, Cornell University (2007)
Shao, F., Guo, L., Botev, C., Bhaskar, A., Chettiar, M.M.M., Yang, F., Shanmugasundaram, J.: Efficient keyword search over virtual XML views. In: VLDB, pp. 1057–1068 (2007)
Witten, I.H., Moffat, A., Bell, T.C.: Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann Publishers, San Francisco (1999)
Yoshikawa, M., Amagasa, T.: XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans. Inter. Tech. 1(1), 110–141 (2001)
Zhang, C., et al.: On supporting containment queries in relational database management systems. In: SIGMOD (2001)
Zobel, J., Moffat, A.: Exploring the similarity space. SIGIR Forum 32(1), 18–34 (1998)
Cite this article
Shao, F., Guo, L., Botev, C. et al. Efficient keyword search over virtual XML views. The VLDB Journal 18, 543–570 (2009). https://doi.org/10.1007/s00778-008-0126-x
DOI: https://doi.org/10.1007/s00778-008-0126-x