Abstract
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users often struggle to formulate appropriate input queries because they lack sufficient domain knowledge, which greatly degrades search performance. An effective way to meet the information needs of a search engine user is to suggest Web queries that are topically related to the initial query. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because queries are short, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries before computing similarity. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach that uses hidden topics as the expansion feature. The approach has two steps. In the offline model-learning step, a hidden topic model is trained, and each candidate query is re-expressed by its posterior distribution over the hidden topic space instead of by its lexical form. In the online query suggestion step, we infer the topic distribution of an input query in the same way, calculate the similarity between the input query and each candidate query in terms of their topic distributions, and produce a suggestion list of candidate queries ranked by similarity score. Our experimental results on two real data sets show that hidden topic based suggestion is much more efficient than the traditional term or URL based approaches, and is effective in finding topically related queries for suggestion.
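The online suggestion step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, so Jensen-Shannon divergence is assumed here as one common way to compare probability distributions; the topic distributions are invented values standing in for the output of the offline step; and the function names (`kl_divergence`, `js_divergence`, `suggest`) are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed measure); lies in [0, 1] with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def suggest(input_topics, candidates, top_k=3):
    """Rank candidate queries by similarity (1 - JS divergence) to the input query."""
    scored = [
        (query, 1.0 - js_divergence(input_topics, topics))
        for query, topics in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical topic distributions over three hidden topics. In the approach
# described above these would come from the offline model-learning step.
CANDIDATES = {
    "cheap flights": [0.80, 0.15, 0.05],
    "hotel deals": [0.70, 0.20, 0.10],
    "python tutorial": [0.05, 0.10, 0.85],
}

# Hypothetical inferred topic distribution for the user's input query.
INPUT_TOPICS = [0.75, 0.20, 0.05]

ranking = suggest(INPUT_TOPICS, CANDIDATES, top_k=2)
for query, score in ranking:
    print(f"{query}: {score:.4f}")
```

Because every query is reduced to a short vector over the topic space, each comparison costs time proportional to the number of topics rather than to the size of a term or URL vocabulary, which is one plausible reading of the efficiency claim.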
Cite this article
Li, L., Xu, G., Yang, Z. et al. An efficient approach to suggesting topically related web queries using hidden topic model. World Wide Web 16, 273–297 (2013). https://doi.org/10.1007/s11280-011-0151-3
DOI: https://doi.org/10.1007/s11280-011-0151-3