Coniunge et Impera: Multiple-Graph Mining for Query-Log Analysis

Bordino, Ilaria; Donato, Debora; Baeza-Yates, Ricardo

doi:10.1007/978-3-642-15880-3_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6321)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Query logs of search engines record a huge amount of data about the actions of the users who search for information on the Web. Hence, they contain a wealth of valuable knowledge about the users’ interests and preferences, as well as the implicit feedback that Web searchers provide when they click on the results obtained for their queries.

In this paper we propose a general and completely unsupervised methodology for query-log analysis: we aggregate multiple graph representations of a query log, each tailored to capturing different semantic information. The combination is carried out by applying simple but efficient graph-mining techniques. We show that our approach achieves very good performance in two different applications: classifying query transitions and recognizing spam queries.
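
To make the idea of aggregating several graph views of one log concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of the paper: the toy log, the two graph definitions (a session-based flow graph and a shared-click graph), and the names `flow_graph`, `click_graph`, and `combine` are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy log: (session_id, query, clicked_url or None), in time order.
LOG = [
    ("s1", "jaguar", "cars.example/jaguar"),
    ("s1", "jaguar price", "cars.example/jaguar"),
    ("s1", "weather rome", "meteo.example/rome"),
    ("s2", "jaguar", "zoo.example/jaguar"),
    ("s2", "jaguar habitat", "zoo.example/jaguar"),
]

def flow_graph(log):
    """Directed edges (q, q') for consecutive distinct queries in a session, with counts."""
    edges = defaultdict(int)
    last = {}  # last query seen in each session
    for session, query, _ in log:
        prev = last.get(session)
        if prev is not None and prev != query:
            edges[(prev, query)] += 1
        last[session] = query
    return dict(edges)

def click_graph(log):
    """Undirected edges {q, q'} weighted by the number of clicked URLs the two queries share."""
    by_url = defaultdict(set)
    for _, query, url in log:
        if url is not None:
            by_url[url].add(query)
    edges = defaultdict(int)
    for queries in by_url.values():
        for a, b in combinations(sorted(queries), 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)

def combine(flow, click):
    """Annotate each flow edge with the click-graph weight of the same query pair."""
    return {
        (a, b): {"flow": n, "shared_clicks": click.get(frozenset((a, b)), 0)}
        for (a, b), n in flow.items()
    }

combined = combine(flow_graph(LOG), click_graph(LOG))
for edge, features in combined.items():
    print(edge, features)
```

In this toy data, the transition from "jaguar price" to "weather rome" appears in the flow graph but has no shared clicks, which one might read as a hint of a topic change. That interpretation is an assumption of this sketch; the paper's own graphs and combination rules are described in the full chapter.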

Author information

Authors and Affiliations

Sapienza Università di Roma, Rome, Italy
Ilaria Bordino
Yahoo! Labs, California, US
Debora Donato
Yahoo! Research, Spain
Ricardo Baeza-Yates

Editor information

Editors and Affiliations

Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Avenida de los Castros, s/n, 39071, Santander, Spain
José Luis Balcázar
Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018, Barcelona, Spain
Francesco Bonchi
Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018, Barcelona, Spain
Aristides Gionis
TAO, CNRS-INRIA-LRI, Université Paris-Sud, 91405, Orsay, France
Michèle Sebag

About this paper

Cite this paper

Bordino, I., Donato, D., Baeza-Yates, R. (2010). Coniunge et Impera: Multiple-Graph Mining for Query-Log Analysis. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15880-3_17

DOI: https://doi.org/10.1007/978-3-642-15880-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15879-7
Online ISBN: 978-3-642-15880-3
eBook Packages: Computer Science, Computer Science (R0)
