Abstract
Query tagging is an important step in query understanding. It applies traditional natural language processing techniques to query strings, whose shortness raises specific challenges. In this chapter, we describe techniques from the existing literature for achieving meaningful query tagging in three areas: query segmentation, query syntactic tagging, and query semantic tagging.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wang, X. (2020). Query Segmentation and Tagging. In: Chang, Y., Deng, H. (eds) Query Understanding for Search Engines. The Information Retrieval Series, vol 46. Springer, Cham. https://doi.org/10.1007/978-3-030-58334-7_3
DOI: https://doi.org/10.1007/978-3-030-58334-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58333-0
Online ISBN: 978-3-030-58334-7
eBook Packages: Computer Science (R0)