Event Recognition from News Webpages through Latent Ingredients Extraction

Yan, Rui; Li, Yu; Zhang, Yan; Li, Xiaoming

doi:10.1007/978-3-642-17187-1_47

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6458)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

We investigate the novel problem of event recognition from news webpages. “Events” are basic text units containing news elements. We observe that a news article always consists of more than one event; we call these events Latent Ingredients (LIs), and together they form the whole document. Event recognition aims to extract these Latent Ingredients. Researchers have tackled related problems before, such as discourse analysis and text segmentation, but with different goals and methods. The challenge is to detect event boundaries accurately from plain contexts, where the boundary decision is affected by multiple features. Event recognition can benefit topic detection by providing finer granularity and better accuracy. In this paper, we present two novel event recognition models based on LI extraction and exploit a set of useful features: context similarity, distance restriction, entity influence from a thesaurus, and temporal proximity. We conduct thorough experiments on two real datasets, and the promising results indicate the effectiveness of these approaches.
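
To make the idea of boundary detection from context similarity concrete, the following is a minimal sketch and not the models proposed in the paper. It assumes sentences are compared as bag-of-words vectors and that a boundary is placed wherever the cosine similarity between adjacent sentences drops below a cutoff. The function names and the threshold value of 0.1 are hypothetical, and the other features named in the abstract (distance restriction, entity influence, temporal proximity) are not modeled here.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_into_events(sentences, threshold=0.1):
    """Group consecutive sentences into candidate events.

    A new group starts whenever the similarity between a sentence and
    the one before it falls below `threshold` (an assumed, untuned value).
    """
    if not sentences:
        return []
    groups = [[sentences[0]]]
    prev = bag_of_words(sentences[0])
    for sentence in sentences[1:]:
        cur = bag_of_words(sentence)
        if cosine(prev, cur) < threshold:
            groups.append([sentence])    # similarity dropped: open a new event
        else:
            groups[-1].append(sentence)  # still the same event
        prev = cur
    return groups


if __name__ == "__main__":
    article = [
        "The earthquake struck the city on Monday.",
        "The earthquake destroyed many buildings in the city.",
        "Stock markets rallied on Friday.",
        "Markets rallied after the stock report.",
    ]
    for i, event in enumerate(split_into_events(article), start=1):
        print(i, event)
```

Comparing only adjacent sentences keeps the sketch short; a practical system would likely compare wider windows and remove stop words, since shared function words such as "the" inflate the similarity.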

Author information

Authors and Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, P.R. China
Rui Yan, Yan Zhang & Xiaoming Li
School of Computer Science, Beihang University, Beijing, 100083, P.R. China
Yu Li

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan R.O.C.
Pu-Jen Cheng
School of Computing, National University of Singapore (NUS), Computing 1, 13 Computing Drive, 117417, Singapore
Min-Yen Kan
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Wai Lam
School of Computing, Computing 1, National University of Singapore (NUS), 13 Computing Drive, 117417, Singapore
Preslav Nakov

About this paper

Cite this paper

Yan, R., Li, Y., Zhang, Y., Li, X. (2010). Event Recognition from News Webpages through Latent Ingredients Extraction. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_47

DOI: https://doi.org/10.1007/978-3-642-17187-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science, Computer Science (R0)
