Abstract
We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method is able to find named entities missing in the transcribed speech data, and additionally to correct incorrectly assigned named entity tags. Consequently, our novel approach improves state-of-the-art NER results from speech data both in terms of recall and precision.
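The two steps can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the expansion technique, so everything here is an assumption chosen for illustration: bag-of-words cosine similarity stands in for document alignment, and fuzzy string matching against entities from the aligned written story stands in for term expansion. The function names `align` and `recover_entities`, the `fuzzy_threshold` parameter, and the sample data are hypothetical, and only single-token entities are handled.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(transcript, stories):
    """Step 1 (assumed measure): index and score of the most similar written story."""
    tvec = Counter(tokenize(transcript))
    scores = [cosine(tvec, Counter(tokenize(s))) for s in stories]
    best = max(range(len(stories)), key=scores.__getitem__)
    return best, scores[best]


def recover_entities(transcript, story_entities, fuzzy_threshold=0.8):
    """Step 2 (assumed technique): entities of the aligned story missing in the transcript.

    Maps each missing entity to the transcript token that looks like a
    misrecognition of it, or to None when no token is similar enough.
    """
    tokens = tokenize(transcript)
    present = set(tokens)
    recovered = {}
    for entity in story_entities:
        key = entity.lower()
        if key in present:
            continue  # already present in the transcript
        best_tok, best_ratio = None, 0.0
        for tok in tokens:
            ratio = SequenceMatcher(None, key, tok).ratio()
            if ratio > best_ratio:
                best_tok, best_ratio = tok, ratio
        recovered[entity] = best_tok if best_ratio >= fuzzy_threshold else None
    return recovered


# Hypothetical data: a noisy transcript, candidate web stories, and the
# entities a text NER tagger is assumed to have found in the aligned story.
transcript = "the president met angela merkle in berlin today"
stories = [
    "Football results from the weekend",
    "President meets Angela Merkel in Berlin for talks today",
]
story_entities = ["Merkel", "Berlin"]

best_index, best_score = align(transcript, stories)
recovered = recover_entities(transcript, story_entities)
print(best_index, recovered)
```

In this sketch the entity tag found in the written story would then be transferred to the matched transcript token; how the paper itself scores alignments and selects expansion terms is not stated in the abstract.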
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shrestha, N., Vulić, I., Moens, MF. (2013). An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News. In: Lupu, M., Kanoulas, E., Loizides, F. (eds) Multidisciplinary Information Retrieval. IRFC 2013. Lecture Notes in Computer Science, vol 8201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41057-4_6
DOI: https://doi.org/10.1007/978-3-642-41057-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41056-7
Online ISBN: 978-3-642-41057-4
eBook Packages: Computer Science, Computer Science (R0)