Abstract
To understand the user intent behind queries, many researchers study the problem of finding similar queries. Recently, the click graph has proved useful for describing the relationship between queries and URLs. Previous approaches mainly either generate related terms or find relevant queries based on co-clicked URLs. However, these approaches may suffer from the complexity of natural language processing and the sparseness of click-through data. In this paper, we tackle this problem with three models that represent a query as a probability distribution: the Click Model, the Term Model, and the Semantic Model. The Click Model extracts credible transition probabilities from queries to URLs and describes a query without considering web content. The Term Model represents a query as a term distribution over its main entities and purposes, which better captures the information needs behind short and ambiguous keyword queries. The Semantic Model learns the potential intent distribution of a query to distinguish the user intents behind it. With these three models, we apply pairwise similarity metrics and graph-based personalized PageRank to find similar queries. Compared with traditional representation models, our representation models are shown to be effective and efficient, especially for long-tail queries.
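As a rough illustration of the idea of representing a query as a probability distribution and comparing queries pairwise, the following minimal sketch row-normalizes click counts into a query-to-URL distribution and ranks queries by cosine similarity. It is not the Click Model described above: plain normalization stands in for the paper's credible transition probability, cosine similarity is only one possible pairwise metric, and the click log, URLs, and function names are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy click log: (query, clicked URL, click count).
CLICKS = [
    ("cheap flights", "flights.example.com", 8),
    ("cheap flights", "travel.example.com", 2),
    ("low cost airfare", "flights.example.com", 3),
    ("low cost airfare", "travel.example.com", 1),
    ("python tutorial", "docs.example.org", 5),
]


def click_distributions(clicks):
    """Row-normalize click counts into P(url | query) for each query.

    Assumption: simple normalization, not the paper's estimator.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for query, url, n in clicks:
        counts[query][url] += n
    dists = {}
    for query, row in counts.items():
        total = sum(row.values())
        dists[query] = {url: n / total for url, n in row.items()}
    return dists


def cosine(p, q):
    """Cosine similarity between two sparse distributions stored as dicts."""
    dot = sum(w * q.get(k, 0.0) for k, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def most_similar(query, dists):
    """Rank all other queries by similarity to `query`, best first."""
    target = dists[query]
    scores = [(other, cosine(target, d))
              for other, d in dists.items() if other != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)


DISTS = click_distributions(CLICKS)
RANKED = most_similar("cheap flights", DISTS)
```

In this toy log, "cheap flights" and "low cost airfare" share no terms but have nearly identical click distributions, so the second query ranks first, while "python tutorial" has no co-clicked URL and scores zero. That zero score for queries without shared clicks is the sparseness problem the abstract mentions.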
Cite this article
Wang, Y., Liu, J., Chen, J. et al. Finding similar queries based on query representation analysis. World Wide Web 17, 1161–1188 (2014). https://doi.org/10.1007/s11280-013-0233-5
DOI: https://doi.org/10.1007/s11280-013-0233-5