Abstract
Existing work on new word detection mostly relies on web pages or other author-centric resources, which require highly complex text processing. This paper instead exploits visitor-centric resources, specifically query logs from a commercial search engine, to detect new words. Because query logs are generated by search engine users and are naturally segmented, the complex text processing can be avoided. Using dynamic time warping, a new word detection algorithm based on trajectory similarity is proposed to distinguish new words in the query logs. Experiments on real-world data sets show the effectiveness and efficiency of the proposed algorithm.
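The abstract does not spell out how trajectory similarity is computed, so the following is only a minimal sketch of the textbook dynamic time warping distance, applied to query-frequency trajectories. The function name `dtw_distance`, the absolute-difference local cost, and the daily counts are all hypothetical illustrations, not the authors' implementation or data.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance.

    Assumption: the local cost is the absolute difference between two
    frequency values; the paper may use a different cost or normalization.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1]'s successor row
                cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical daily query counts (made up for illustration only).
known_new_word = [0, 0, 1, 5, 9, 4, 2]   # trajectory of a known new word
candidate = [0, 0, 0, 1, 5, 9, 4]        # same burst shape, one day later
steady = [5, 5, 5, 5, 5, 5, 5]           # long-established query, no burst

shifted_distance = dtw_distance(known_new_word, candidate)
steady_distance = dtw_distance(known_new_word, steady)
```

Under these assumptions the time-shifted burst stays close to the known new-word trajectory while the flat trajectory does not, which is the property that makes a warping-based distance attractive for comparing query trajectories that peak on different days.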
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Sun, M., Zhang, Y. (2010). Chinese New Word Detection from Query Logs. In: Cao, L., Zhong, J., Feng, Y. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17313-4_24
DOI: https://doi.org/10.1007/978-3-642-17313-4_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17312-7
Online ISBN: 978-3-642-17313-4
eBook Packages: Computer Science, Computer Science (R0)