Abstract
Can a system designed primarily for database-style storage and retrieval be used for information retrieval tasks? This was one of the questions that led us to participate in the INEX 2004 initiative. We answered the queries using DocBase, a prototype database system initially developed for SGML and later adapted to work with XML. DocBase uses DSQL, an adaptation of SQL, to provide a mechanism for querying XML using existing database and indexing technologies. The INEX evaluation experience was encouraging: although it showed the limitations of database query languages for classic information retrieval tasks, it also demonstrated that several interesting results can be obtained by using database query languages for information retrieval, especially for queries involving both content and structure. Our results demonstrate the adaptability and scalability of a database system for processing IR queries.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mohan, S., Sengupta, A. (2005). DocBase – The INEX Evaluation Experience. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_21
DOI: https://doi.org/10.1007/11424550_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26166-7
Online ISBN: 978-3-540-32053-1
eBook Packages: Computer Science, Computer Science (R0)