Abstract
Current keyword-based XML fragment retrieval systems return XML fragments of various granularities as retrieval results. Because the number of such fragments is huge, index construction time and query processing time suffer unless the system can reliably extract only the fragments that serve as answers. In this paper, we propose a method for determining which XML fragments are appropriate for keyword-based XML fragment retrieval, which should help improve the overall performance of such systems. The proposed method analyzes statistical information about XML fragments using a technique from the dynamics of terminology in quantitative linguistics. Our keyword-based XML fragment retrieval system runs on a relational database system, and we also briefly explain its implementation.
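As a rough illustration of what statistical information about XML fragments could look like, the following minimal Python sketch decomposes a small document into one fragment per element subtree and counts the tokens and distinct tokens in each. The function name `fragment_stats`, the tokenizer, the sample document, and the choice of statistics are assumptions made for illustration only; they are not taken from the paper or from the authors' system.

```python
import re
import xml.etree.ElementTree as ET


def fragment_stats(xml_text):
    """Return (path, token_count, distinct_token_count) per element subtree.

    Hypothetical helper: each element is treated as the root of one
    candidate fragment, which is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    stats = []

    def visit(elem, path):
        # itertext() yields every text node in the subtree rooted at elem.
        text = " ".join(elem.itertext())
        # Simple lowercase word tokenizer (an assumption, not the paper's).
        tokens = re.findall(r"\w+", text.lower())
        stats.append((path, len(tokens), len(set(tokens))))
        for child in elem:
            visit(child, f"{path}/{child.tag}")

    visit(root, f"/{root.tag}")
    return stats


# Made-up sample document, used only to exercise the function.
doc = (
    "<article><title>XML retrieval</title>"
    "<sec><p>XML fragments</p><p>keyword retrieval</p></sec></article>"
)

for path, n_tokens, n_types in fragment_stats(doc):
    print(path, n_tokens, n_types)
```

Even on this tiny input the fragments differ widely in size, from two tokens for a single paragraph to six for the whole article, which is the kind of variation a fragment selection method would have to take into account.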
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hatano, K., Kinutani, H., Amagasa, T., Mori, Y., Yoshikawa, M., Uemura, S. (2005). Analyzing the Properties of XML Fragments Decomposed from the INEX Document Collection. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_14
DOI: https://doi.org/10.1007/11424550_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26166-7
Online ISBN: 978-3-540-32053-1
eBook Packages: Computer Science (R0)