Abstract
This paper presents algorithms for Chinese and English-Chinese topic detection. Named entities, other nouns, and verbs serve as cue patterns for relating news stories that describe the same event. Lexical translation and name transliteration resolve lexical differences between English and Chinese. A two-threshold scheme determines whether a news story is relevant or irrelevant to a topic cluster. Lookahead information resolves the ambiguous cases in clustering. The least-recently-used removal strategy models the time factor so that older and unimportant terms have no effect on clustering. Experimental results show that using nouns and verbs together with the least-recently-used removal strategy outperforms the other models. The named-entity-only approach performs slightly worse, but it avoids the overhead of the nouns-and-verbs approach with the least-recently-used removal strategy.
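The two-threshold decision and the least-recently-used removal of terms can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the threshold values, the `classify` function, the `TopicCluster` class, and its `capacity` parameter are all assumptions introduced here for illustration, and the lookahead step itself is not modelled (the sketch only marks the stories that would need it as `"undecided"`).

```python
from collections import OrderedDict


def classify(similarity, low=0.2, high=0.5):
    """Two-threshold decision between a story and a topic cluster.

    The values of `low` and `high` are illustrative assumptions,
    not figures taken from the chapter.
    """
    if similarity >= high:
        return "relevant"
    if similarity < low:
        return "irrelevant"
    # Between the thresholds the case is ambiguous; the abstract says
    # lookahead information is used to settle such cases.
    return "undecided"


class TopicCluster:
    """Keeps at most `capacity` terms, dropping the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical limit on stored terms
        self.terms = OrderedDict()    # term -> weight, least recent first

    def add_story(self, story_terms):
        # Terms seen in the new story become the most recently used.
        for term, weight in story_terms.items():
            self.terms[term] = self.terms.get(term, 0.0) + weight
            self.terms.move_to_end(term)
        # Evict the least recently used terms beyond the capacity.
        while len(self.terms) > self.capacity:
            self.terms.popitem(last=False)


cluster = TopicCluster(capacity=3)
cluster.add_story({"earthquake": 1.0, "Taipei": 1.0})
cluster.add_story({"rescue": 1.0, "earthquake": 1.0})
cluster.add_story({"aftershock": 1.0})
print(list(cluster.terms))
```

In this usage example, "Taipei" is the term that has gone longest without reappearing, so it is the one evicted once a fourth term arrives, while "earthquake" survives because the second story refreshed it.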
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Chen, HH., Ku, LW. (2002). An NLP & IR Approach to Topic Detection. In: Allan, J. (eds) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_12
DOI: https://doi.org/10.1007/978-1-4615-0933-2_12
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5311-9
Online ISBN: 978-1-4615-0933-2
eBook Packages: Springer Book Archive