Abstract
We investigate the novel problem of event recognition from news webpages. “Events” are basic text units containing news elements. We observe that a news article always consists of more than one event; these events are the Latent Ingredients (LIs) that together form the whole document. Event recognition aims to extract these Latent Ingredients. Researchers have tackled related problems before, such as discourse analysis and text segmentation, with different goals and methods. The challenge is to detect event boundaries accurately in plain text, where the boundary decision is affected by multiple features. Event recognition can benefit topic detection by offering finer granularity and better accuracy. In this paper, we present two novel event recognition models based on LI extraction and exploit a set of useful features: context similarity, distance restriction, entity influence from a thesaurus, and temporal proximity. We conduct thorough experiments on two real datasets, and the promising results indicate the effectiveness of these approaches.
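As a rough illustration of what boundary detection over text units can look like, the sketch below places a boundary wherever the bag-of-words cosine similarity between adjacent units falls below a threshold. It is not either of the two models presented in the paper: it uses only a simple stand-in for the context similarity feature and leaves out distance restriction, entity influence, and temporal proximity. The function names (`bag_of_words`, `cosine`, `find_boundaries`, `split_events`), the threshold value of 0.3, and the sample sentences are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for one text unit (hypothetical helper)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_boundaries(units, threshold=0.3):
    """Return each index i where a boundary falls between units[i] and units[i + 1].

    The threshold is an arbitrary illustrative value, not one taken from the paper.
    """
    vectors = [bag_of_words(u) for u in units]
    return [
        i
        for i in range(len(vectors) - 1)
        if cosine(vectors[i], vectors[i + 1]) < threshold
    ]


def split_events(units, threshold=0.3):
    """Group consecutive units into candidate events using the detected boundaries."""
    boundaries = set(find_boundaries(units, threshold))
    events, current = [], []
    for i, unit in enumerate(units):
        current.append(unit)
        if i in boundaries:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


# Made-up sample text units, for illustration only.
units = [
    "The storm hit the coast on Monday.",
    "The storm flooded coast roads.",
    "Parliament passed the budget bill.",
    "The budget bill cuts taxes.",
]
events = split_events(units)
```

With these sample sentences, the only adjacent pair whose similarity falls below the threshold is the second and third, so the four units are grouped into two candidate events. A single lexical threshold like this is brittle, which fits the abstract's point that the boundary decision depends on multiple features.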
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yan, R., Li, Y., Zhang, Y., Li, X. (2010). Event Recognition from News Webpages through Latent Ingredients Extraction. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_47
DOI: https://doi.org/10.1007/978-3-642-17187-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science, Computer Science (R0)