Query Segmentation and Tagging

Wang, Xuanhui

doi:10.1007/978-3-030-58334-7_3

Part of the book series: The Information Retrieval Series (INRE, volume 46)

Abstract

Query tagging is an important step in query understanding. It applies traditional natural language processing techniques to query strings, whose shortness raises specific challenges. In this chapter, we describe techniques from the existing literature for achieving meaningful query tagging in three areas: query segmentation, query syntactic tagging, and query semantic tagging.

Author information

Authors and Affiliations

Google Research, Mountain View, CA, USA
Xuanhui Wang

Corresponding author

Correspondence to Xuanhui Wang.

Editor information

Editors and Affiliations

Jilin University, Jilin, China
Yi Chang
Alibaba Group, Zhejiang, China
Hongbo Deng

About this chapter

Cite this chapter

Wang, X. (2020). Query Segmentation and Tagging. In: Chang, Y., Deng, H. (eds) Query Understanding for Search Engines. The Information Retrieval Series, vol 46. Springer, Cham. https://doi.org/10.1007/978-3-030-58334-7_3

DOI: https://doi.org/10.1007/978-3-030-58334-7_3
Published: 02 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58333-0
Online ISBN: 978-3-030-58334-7
eBook Packages: Computer Science, Computer Science (R0)
