Abstract
Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most of the current approaches identify the meaningful results of a given keyword query based on the semantics of lowest common ancestor (LCA) and its variants. However, given the fact that LCA candidates are usually numerous and of low relevance to the users’ information need, how to effectively and efficiently identify the most relevant results from a large number of LCA candidates is still a challenging and unresolved issue. In this article, we introduce a novel semantics of relevant results based on mutual information between the query keywords. Then, we introduce a novel approach for identifying the relevant answers of a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed which rely on some provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach and the results show that the proposed approach has a good performance compared with other existing approaches in different data sets and evaluation metrics
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1999)
Bartolini, I., Ciaccia, P., Patella, M.: Efficient sort-based skyline evaluation. ACM Trans. Database Syst. 33(4), 1–49 (2008). doi:10.1145/1412331.1412343
Bender, M.A., Farach-Colton, M., Pemmasani, G., Skiena, S., Sumazin, P.: Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57, 75–94 (2005)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the 17th International Conference on Data Engineering, pp. 421–430. IEEE Computer Society, Washington, DC, USA (2001)
Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of 19th International Conference on Data Engineering, vol. 717 (2003)
Cohen, S., Kanza, Y., Kimelfeld, B., Sagiv, Y.: Interconnection semantics for keyword search in XML. In: CIKM ’05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 389–396. ACM, New York, NY, USA (2005)
Cohen, S., Mamou, J., Kanza, Y., Sagiv, Y.: XSEarch: a semantic search engine for XML. In: Proceedings of the 29th International Conference on Very Large Data Bases, pp. 45–56. VLDB Endowment (2003)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience, New York, NY, USA (1991)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 102–113. ACM, New York, NY, USA (2001)
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 16–27. ACM, New York, NY, USA (2003)
Han, S.K., Shin, D., Jung, J.Y., Park, J.: Exploring the relationship between keywords and feed elements in blog post search. World Wide Web 12(4), 381–398 (2009)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in XML trees. IEEE Trans. Knowl. Data Eng. 18, 525–539 (2006)
Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 275–286. VLDB Endowment (2002)
Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. Journal of the ACM, pp. 469–476. ACM, New York, NY, USA (1975)
Lee, K.H., Whang, K.Y., Han, W.S.: Xmin: Minimizing tree pattern queries with minimality guarantee. World Wide Web 13(3), 343–371 (2010)
Li, G., Feng, J., Wang, J., Zhou, L.: Effective keyword search for valuable LCAs over XML documents. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 31–40. ACM, New York, NY, USA (2007)
Li, G., Ooi, B.C., Feng, J., Wang, J., Zhou, L.: Ease: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 903–914. ACM, New York, NY, USA (2008)
Li, L., Otsuka, S., Kitsuregawa, M.: Finding related search engine queries by web community based query enrichment. World Wide Web 13(1), 121–142 (2010)
Li, Y., Yu, C., Jagadish, H.V.: Enabling schema-free XQuery with meaningful query focus. VLDB J. 17(3), 355–377 (2008)
Liu, Z., Chen, Y.: Identifying meaningful return information for xml keyword search. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 329–340. ACM, New York, NY, USA (2007)
Liu, Z., Chen, Y.: Reasoning and identifying relevant matches for XML keyword search. In: Proceedings of the 34th International Conference on Very Large Data Bases, pp. 921–932 (2008)
Liu, Z., Chen, Y.: Processing keyword search on XML: a survey. World Wide Web 14, 671–707 (2011)
Liu, Z., Walker, J., Chen, Y.: XSeek: a semantic XML search engine using keywords. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 1330–1333. VLDB Endowment (2007)
Mondial. http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml
Online computer library center. Introduction to the Dewey Decimal Classification. http://www.oclc.org/oclc/fp/about/abouttheddc.htm
Oracle Berkeley DB. http://www.oracle.com/technology/products/berkeley-db/index.html.
Schieber, B., Vishkin, U.: On finding lowest common ancestors: simplification and parallelization. SIAM J. Comput. 17(6), 1253–1262 (1988)
Shao, F., Guo, L., Botev, C., Bhaskar, A., Chettiar, M., Yang, F., Shanmugasundaram, J.: Efficient keyword search over virtual xml views. In: VLDB ’07: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 1057–1068. VLDB Endowment (2007)
Sun, C., Chan, C.Y., Goenka, A.K.: Multiway slca-based keyword search in xml data. In: Proceedings of the 16th International Conference on World Wide Web, pp. 1043–1052. ACM, New York, NY, USA (2007)
Tan, K.L., Eng, P.K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB ’01: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 301–310 (2001)
Wang, G., Ning, B., Yu, G.: Holistically stream-based processing xtwig queries. World Wide Web 11(4), 407–425 (2008)
Wang, J., Yu, J.X., Liu, C.: Independence of containing patterns property and its application in tree pattern query rewriting using views. World Wide Web 12(1), 87–105 (2009)
Wu, X., Theodoratos, D., Souldatos, S., Dalamagas, T., Sellis, T.: Evaluation techniques for generalized path pattern queries on xml data. World Wide Web 13(4), 441–474 (2010)
Xu, Y., Papakonstantinou, Y.: Efficient keyword search for smallest lcas in xml databases. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 527–538. ACM, New York, NY, USA (2005)
Xu, Y., Papakonstantinou, Y.: Efficient lca based keyword search in xml data. In: Proceedings of the 11th International Conference on Extending Database Technology, pp. 535–546. ACM, New York, NY, USA (2008)
XML Path Language (XPath). http://www.w3.org/tr/xpath/
XQuery: An XML Query Language. http://www.w3.org/tr/xquery/
Zhou, R., Liu, C., Li, J.: Fast elca computation for keyword queries on xml data. In: Proceedings of the 13th International Conference on Extending Database Technoloy, pp. 549–560. ACM, New York, NY, USA (2010)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Nguyen, K., Cao, J. Top-k answers for XML keyword queries. World Wide Web 15, 485–515 (2012). https://doi.org/10.1007/s11280-011-0145-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-011-0145-1