Abstract
Precision-oriented search results, such as those typically returned by the major search engines, are vulnerable to polysemy. When the same term refers to different things, the dominant sense is preferred in the ranking of search results. In this paper, we propose a novel two-box technique for Web search that uses contextual terms provided by users for query disambiguation, making it possible to prefer other senses without altering the original query. A prototype system, Bobo, has been implemented. In Bobo, contextual terms are used to capture domain knowledge from users, help estimate the relevance of search results, and route the results towards a user-intended domain. A major advantage of Bobo is that it can effectively use a wide range of domain knowledge: helpful contextual terms do not even need to co-occur with query terms on any page. We have extensively evaluated Bobo on benchmark datasets, and the results demonstrate the utility and effectiveness of our approach.
References
Artiles, J., Gonzalo, J., Verdejo, F.: A testbed for people searching strategies in the WWW. In: Proceedings of the 28th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 569–570 (2005)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley (1999)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the 7th International World Wide Web Conference (WWW), pp. 107–117 (1998)
Broder, A.: A taxonomy of web search. SIGIR Forum 36(2), 3–10 (2002)
Buckley, C., Singhal, A., Mitra, M.: New retrieval approaches using SMART: TREC 4. In: Proceedings of the 4th Text Retrieval Conference (TREC-4), pp. 25–48 (1995)
Cao, H., Jiang, D., Pei, J., Chen, E., Li, H.: Towards context-aware search by learning a very large variable length hidden Markov model from search logs. In: Proceedings of the 18th International World Wide Web Conference (WWW), pp. 191–200 (2009)
Cao, H., Jiang, D., Pei, J., He, Q., Liao, Z., Chen, E., Li, H.: Context-aware query suggestion by mining click-through and session data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 875–883 (2008)
Croft, W., Harper, D.: Using probabilistic models of information retrieval without relevance information. J. Doc. 35(4), 285–295 (1979)
Cui, H., Wen, J.R., Nie, J.Y., Ma, W.Y.: Probabilistic query expansion using query logs. In: Proceedings of the 11th International World Wide Web Conference (WWW), pp. 325–332 (2002)
Fellbaum, C.: WordNet: An Electronic Lexical Database. Bradford Books (1998)
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., Ruppin, E.: Placing search in context: the concept revisited. ACM Trans. Inf. Syst. 20(1), 116–131 (2002)
Gao, B.J., Anastasiu, D.C., Jiang, X.: Utilizing user-input contextual terms for query disambiguation. In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pp. 329–337 (2010)
Gao, B.J., Ester, M., Cai, J.Y., Schulte, O., Xiong, H.: The minimum consistent subset cover problem and its applications in data mining. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 310–319 (2007)
Gkanogiannis, A., Kalamboukis, T.: An algorithm for text categorization. In: Proceedings of the 31st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 869–870 (2008)
Guha, R.V., Garg, A.: Disambiguating people in search. In: Proceedings of the 13th International World Wide Web Conference (WWW) (2004)
Haveliwala, T.H.: Topic-sensitive PageRank. In: Proceedings of the 11th International World Wide Web Conference (WWW), pp. 517–526 (2002)
Jansen, B.J., Spink, A., Saracevic, T.: Real life, real users and real needs: A study and analysis of user queries on the web. Information Process. and Manage. 36(2), 207–227 (2000)
Jeh, G., Widom, J.: Scaling personalized web search. In: Proceedings of the 12th International World Wide Web Conference (WWW), pp. 271–279 (2003)
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as implicit feedback. In: Proceedings of the 28th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 154–161 (2005)
Kelly, D., Dollu, V.D., Fu, X.: The loquacious user: a document-independent source of terms for query expansion. In: Proceedings of the 28th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 457–464 (2005)
Kelly, D., Teevan, J.: Implicit feedback for inferring user preference: a bibliography. SIGIR Forum 37(2), 18–28 (2003)
Kraft, R., Chang, C.C., Maghoul, F., Kumar, R.: Searching with context. In: Proceedings of the 15th International World Wide Web Conference (WWW), pp. 477–486 (2006)
Lawrence, S.: Context in web search. IEEE Data Eng. Bull. 23(3), 25–32 (2000)
Lee, J.H.: Combining the evidence of different relevance feedback methods for information retrieval. Information Process. and Manage. 34(6), 681–691 (1998)
Lee, K.S., Croft, W.B., Allan, J.: A cluster-based resampling method for pseudo-relevance feedback. In: Proceedings of the 31st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 235–242 (2008)
Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press (2008)
Mitra, M., Singhal, A., Buckley, C.: Improving automatic query expansion. In: Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 206–214 (1998)
Mizzaro, S., Vassena, L.: A social approach to context-aware retrieval. World Wide Web 14(4), 377–405 (2011)
Qiu, Y., Frei, H.P.: Concept based query expansion. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 160–169 (1993)
Rocchio, J.: Relevance feedback in information retrieval. In: The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall (1971)
Rose, D.E., Levinson, D.: Understanding user goals in web search. In: Proceedings of the 13th International World Wide Web Conference (WWW), pp. 13–19 (2004)
Ruthven, I., Lalmas, M.: A survey on the use of relevance feedback for information access systems. Knowl. Eng. Rev. 18(1), 95–145 (2003)
Salton, G., Buckley, C.: Improving Retrieval Performance by Relevance Feedback. Morgan Kaufmann (1997)
Sandhaus, E.: The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia (2008)
Schapire, R.E., Singer, Y., Singhal, A.: Boosting and Rocchio applied to text filtering. In: Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 215–223 (1998)
Schutze, H., Hull, D.A., Pedersen, J.O.: A comparison of classifiers and document representations for the routing problem. In: Proceedings of the 18th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 229–237 (1995)
Shen, X., Tan, B., Zhai, C.: Context-sensitive information retrieval using implicit feedback. In: Proceedings of the 28th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 43–50 (2005)
Singhal, A., Mitra, M., Buckley, C.: Learning routing queries in a query zone. In: Proceedings of the 20th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 25–32 (1997)
Teevan, J., Dumais, S.T., Horvitz, E.: Personalizing search via automated analysis of interests and activities. In: Proceedings of the 28th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 449–456 (2005)
Vassilvitskii, S., Brill, E.: Using web-graph for relevance feedback in web search. In: Proceedings of the 29th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 147–153 (2006)
Voorhees, E.M.: Using WordNet to disambiguate word senses for text retrieval. In: Proceedings of the 16th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 171–180 (1993)
White, R.W., Clarke, C.L., Cucerzan, S.: Comparing query logs and pseudo-relevance feedback for web search query refinement. In: Proceedings of the 30th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 831–832 (2007)
White, R.W., Jose, J.M., Rijsbergen, C.J.V., Ruthven, I.: A simulated study of implicit feedback models. In: Proceedings of the 26th European Conference on Information Retrieval (ECIR), pp. 311–326 (2004)
Yu, S., Cai, D., Wen, J.R., Ma, W.Y.: Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In: Proceedings of the 12th International World Wide Web Conference (WWW), pp. 11–18 (2003)
Zhang, H., Chen, Z., Li, M., Su, Z.: Relevance feedback and learning in content-based image search. World Wide Web 6(2), 131–155 (2003)
Zhu, Y., Callan, J., Carbonell, J.: The impact of history length on personalized search. In: Proceedings of the 31st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 715–716 (2008)
Additional information
A preliminary version of this paper was published in the Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10) [12].
Cite this article
Anastasiu, D.C., Gao, B.J., Jiang, X. et al. A novel two-box search paradigm for query disambiguation. World Wide Web 16, 1–29 (2013). https://doi.org/10.1007/s11280-011-0154-0
DOI: https://doi.org/10.1007/s11280-011-0154-0