An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News

Shrestha, Niraj; Vulić, Ivan; Moens, Marie-Francine

doi:10.1007/978-3-642-41057-4_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8201)

Included in the following conference series:

Information Retrieval Facility Conference

Abstract

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method is able to find named entities missing in the transcribed speech data, and additionally to correct incorrectly assigned named entity tags. Consequently, our novel approach improves state-of-the-art NER results from speech data both in terms of recall and precision.
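
The abstract gives no implementation details, so the following is only a minimal sketch of the two steps under stated assumptions: alignment is approximated with bag-of-words cosine similarity, and expansion scores each candidate entity by the summed similarity of the top-ranked written stories that contain it, in the spirit of pseudo-relevance feedback. The names `cosine`, `recover_entities`, `k`, and `min_score`, as well as the toy data, are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recover_entities(transcript_tokens, transcript_entities, stories,
                     k=2, min_score=0.3):
    """Suggest entity tags that are missing from a speech transcript.

    stories: list of (tokens, {entity: tag}) pairs for written news.
    Step 1 (alignment): rank stories by similarity to the transcript.
    Step 2 (expansion): score each entity from the top-k stories by the
    summed similarity of the stories it occurs in (an assumed scheme).
    """
    query = Counter(transcript_tokens)
    ranked = sorted(
        ((cosine(query, Counter(tokens)), entities)
         for tokens, entities in stories),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    scores = defaultdict(float)
    tags = {}
    for similarity, entities in ranked:
        for name, tag in entities.items():
            scores[name] += similarity
            tags.setdefault(name, tag)  # keep the tag from the best story

    known = {name.lower() for name in transcript_entities}
    return {
        name: tags[name]
        for name, score in scores.items()
        if score >= min_score and name.lower() not in known
    }


# Toy data (invented for illustration only).
transcript = "the president met the prime minister in brussels today".split()
transcript_tags = {"Brussels": "LOC"}
stories = [
    ("president obama met prime minister cameron in brussels".split(),
     {"Obama": "PER", "Cameron": "PER", "Brussels": "LOC"}),
    ("football results from the weekend league".split(),
     {"Arsenal": "ORG"}),
]
recovered = recover_entities(transcript, transcript_tags, stories)
```

In this toy run the politically themed story is far more similar to the transcript than the sports story, so only its entities pass the threshold, and the entity already tagged in the transcript is not proposed again. A real system would need proper tokenization, term weighting, and a way to place recovered entities at positions in the transcript, none of which this sketch attempts.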

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Belgium
Niraj Shrestha, Ivan Vulić & Marie-Francine Moens

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Mihai Lupu
Google Inc., Brandschenkestraße 110, 8002, Zurich, Switzerland
Evangelos Kanoulas
Department of Multimedia and Graphic Arts, Cyprus University of Technology, 30 Archbishop Kyprianou Street, 3036, Limassol, Cyprus
Fernando Loizides

About this paper

Cite this paper

Shrestha, N., Vulić, I., Moens, MF. (2013). An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News. In: Lupu, M., Kanoulas, E., Loizides, F. (eds) Multidisciplinary Information Retrieval. IRFC 2013. Lecture Notes in Computer Science, vol 8201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41057-4_6

DOI: https://doi.org/10.1007/978-3-642-41057-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41056-7
Online ISBN: 978-3-642-41057-4
eBook Packages: Computer Science, Computer Science (R0)
