Abstract
Although Information Retrieval (IR) systems, including search engines, have been effective in locating documents that contain specified patterns in large repositories, they support only keyword searches and queries (patterns) that use Boolean operators. Expressive search for complex text patterns is important in many domains, such as patent search, search over incoming news, and search over web repositories. In this paper, we first present the operators, and their semantics, for specifying an expressive search. We then investigate how complex patterns, which search engines do not currently support, can be detected using a pre-computed index, and what information the index must contain to detect such patterns efficiently. We use an expressive pattern specification language and a pattern detection graph mechanism that allows common sub-patterns to be shared. We have developed index-based algorithms for all the pattern operators to detect complex patterns efficiently. Experiments illustrate the scalability of the proposed approach and its efficiency compared with a streaming approach.
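The abstract does not specify the index layout or the operator syntax. As a rough illustration of why an index that records only which documents contain a term is not enough for complex patterns, here is a minimal sketch that assumes a positional inverted index (every token position of every term, per document). The functions `build_index`, `followed_by`, and `near` are hypothetical names chosen for this sketch and are not the paper's operators or algorithms. Whitespace tokenization is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical positional inverted index: term -> {doc_id -> sorted token positions}.
Index = Dict[str, Dict[int, List[int]]]


def build_index(docs: Dict[int, str]) -> Index:
    """Tokenize on whitespace (an assumption) and record every position of every term."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            # Positions are appended in increasing order, so each list stays sorted.
            index[token][doc_id].append(pos)
    return index


def keyword(index: Index, term: str) -> Set[int]:
    """Plain keyword search: documents that contain the term anywhere."""
    return set(index.get(term, {}))


def followed_by(index: Index, a: str, b: str, max_gap: int) -> Set[int]:
    """Hypothetical sequence operator: some occurrence of `a` precedes an
    occurrence of `b` by at most `max_gap` tokens in the same document."""
    pa, pb = index.get(a, {}), index.get(b, {})
    hits: Set[int] = set()
    for doc_id in pa.keys() & pb.keys():  # only documents that contain both terms
        # Pairwise check kept for clarity; a real system would merge the sorted lists.
        if any(0 < j - i <= max_gap for i in pa[doc_id] for j in pb[doc_id]):
            hits.add(doc_id)
    return hits


def near(index: Index, a: str, b: str, max_dist: int) -> Set[int]:
    """Hypothetical proximity operator: like followed_by, but in either order."""
    return followed_by(index, a, b, max_dist) | followed_by(index, b, a, max_dist)
```

For example, with `docs = {1: "patent filed for search engine index", 2: "index of the search results", 3: "engine search"}`, `keyword(idx, "search")` returns `{1, 2, 3}`, `followed_by(idx, "search", "engine", 1)` returns `{1}`, and `near(idx, "search", "engine", 1)` returns `{1, 3}`. A document-level Boolean index could only answer the first query. The other two need positional information stored in the index.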
This work was supported, in part, by the following NSF grants: IIS-0326505, EIA-0216500, and IIS-0534611.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deshpande, N., Chakravarthy, S., Adaikkalavan, R. (2011). Searching for Complex Patterns over Large Stored Information Repositories. In: Fernandes, A.A.A., Gray, A.J.G., Belhajjame, K. (eds) Advances in Databases. BNCOD 2011. Lecture Notes in Computer Science, vol 7051. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24577-0_8
DOI: https://doi.org/10.1007/978-3-642-24577-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24576-3
Online ISBN: 978-3-642-24577-0
eBook Packages: Computer Science, Computer Science (R0)