Stable methods for recognizing acronym-expansion pairs: from rule sets to hidden Markov models

Schumann, Eduardo Torres; Schulz, Klaus U.

doi:10.1007/s10032-005-0146-7

Regular Paper
Published: 10 May 2005

Volume 8, pages 1–14, (2006)
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

The replacement of textual units by synonymous canonical forms is an important prerequisite for many variants of automated text analysis. In scientific texts, one common normalization step is the consistent replacement of acronyms by their definitions. For many acronyms, the definition is found at the position in the text where the acronym is introduced and “expanded” to a synonymous sequence of full words. A recent approach to detecting acronym-expansion pairs by Park and Byrd [19] describes possible graphical correspondences between acronyms and expansions by means of fine-grained rules. Here we show how rule sets as used in [19] can be translated into hidden Markov models that abstract from details of the graphical correspondence and significantly improve recall. Stability in terms of precision is ensured by exploiting simple properties of the expansion, optionally reinforced by linguistic knowledge. With this extension of the original formalism, the introduction of large rule sets can be avoided and a fixed model can be applied to a large variety of texts without retraining, with good values for both recall and precision.
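The general idea of scoring a candidate acronym-expansion pair with a hidden Markov model over abstract correspondence symbols can be illustrated with a toy sketch. It is not the model described in the article: the two states, the two symbols ("initial", "skipped"), the greedy word-initial alignment, and every probability below are assumptions chosen only for illustration.

```python
# Toy illustration only: states, symbols and probabilities are hypothetical,
# hand-set values, not parameters taken from the article.
STATES = ("content", "function")
START = {"content": 0.8, "function": 0.2}
TRANS = {
    "content": {"content": 0.7, "function": 0.3},
    "function": {"content": 0.9, "function": 0.1},
}
EMIT = {
    "content": {"initial": 0.9, "skipped": 0.1},
    "function": {"initial": 0.1, "skipped": 0.9},
}


def encode(acronym, expansion):
    """Map each expansion word to an abstract symbol.

    A word is "initial" if its first character matches the next unused
    acronym letter (case-insensitive), otherwise "skipped". Returns None
    when some acronym letters remain unmatched.
    """
    letters = acronym.lower()
    pos = 0
    symbols = []
    for word in expansion.lower().split():
        if pos < len(letters) and word[0] == letters[pos]:
            symbols.append("initial")
            pos += 1
        else:
            symbols.append("skipped")
    return symbols if pos == len(letters) else None


def likelihood(symbols):
    """Forward algorithm: probability of a non-empty symbol sequence."""
    alpha = {s: START[s] * EMIT[s][symbols[0]] for s in STATES}
    for sym in symbols[1:]:
        alpha = {
            s: EMIT[s][sym] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def score(acronym, expansion):
    """Length-normalized likelihood (geometric mean per symbol), 0.0 if no alignment."""
    symbols = encode(acronym, expansion)
    if not symbols:
        return 0.0
    return likelihood(symbols) ** (1.0 / len(symbols))


if __name__ == "__main__":
    print(score("HMM", "hidden Markov model"))
    print(score("HMM", "hidden in the old Markov big model"))
```

Because the model sees only the abstract symbols, the same fixed parameters apply to any acronym, which is the kind of abstraction the abstract refers to. The greedy alignment in `encode` is a simplification: it covers only word-initial matches and would miss expansions whose acronym letters come from inside a word.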

References

1. Andrade, M.A., Valencia, A.: Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families. Bioinformatics 14(7), 600–607 (1998)
2. Boguraev, B., Kennedy, C.: Applications of term identification terminology: domain description and content characterization. Nat. Lang. Eng. 5(1), 17–44 (1999)
3. Basili, R., Moschitti, A.: Intelligent NLP-driven text classification. Int. J. Artif. Intell. Tools 11(3), 389–423 (2002)
4. Cabré, M.T.: Terminology: Theory, Methods and Applications. John Benjamins Publishing Company, Amsterdam (1998)
5. Charniak, E.: Statistical Language Learning. MIT Press, Cambridge, MA (1993)
6. Cohen, J.D.: Highlights: language and domain independent automatic indexing terms for abstracting. J. Am. Soc. Inf. Sci. 46(3), 162–174 (1995)
7. Dagan, I., Church, K.W.: Termight: identifying and translating technical terminology. In: Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics (EACL’95), pp. 34–40 (1995)
8. Fung, P., McKeown, K.: A technical word and term translation aid using noisy parallel corpora across language groups. Mach. Transl. J. (Special Issue on New Tools for Human Translators), pp. 53–87 (1996)
9. Gaizauskas, R., Demetriou, G., Humphreys, K.: Term recognition in biological science journal articles. In: Proceedings of the Workshop on Computational Terminology for Medical and Biological Applications and 2nd International Conference on Natural Language Processing (NLP-2000), Patras, Greece, pp. 37–44 (2000)
10. Hirschman, L., Park, J.C., Tsuji, J., Wong, L., Wu, C.H.: Accomplishments and challenges in literature data mining for biology. Bioinformatics 18(12), 1553–1561 (2002)
11. Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)
12. Larkey, L.S., Ogilvie, P., Price, M.A., Tamilio, B.: Acrophile: an automated acronym extractor and server. In: Proceedings of the 5th ACM International Conference on Digital Libraries (2000)
13. Lehnert, W., Soderland, S., Aronow, D., Feng, F.: Inductive text classification for medical applications. J. Exp. Theor. Artif. Intell. 7(1), 49–80 (1995)
14. Acronym/alias identification corpus of Brandeis University. http://www.medstract.org/gold-standards.html/ (2003)
15. Medline, searchable with PubMed. http://www.ncbi.nlm.nih.gov/PubMed/. Service by the U.S. National Library of Medicine (2003)
16. Mikheev, A.: Periods, capitalized words, etc. Comput. Linguist. 28(3), 289–318 (2002)
17. Nenadić, G., Spasić, I., Ananiadou, S.: Automatic acronym acquisition and term variation management within domain-specific texts. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation, vol. VI, pp. 2155–2162. European Language Resources Association (2002)
18. U.S. National Library of Medicine: Fact sheet Medline. http://www.nlm.nih.gov/pubs/factsheets/medline.html (2002)
19. Park, Y., Byrd, R.J.: Hybrid text mining for finding abbreviations and their definitions. In: Conference on Empirical Methods in Natural Language Processing (EMNLP). http://citeseer.nj.nec.com/444674.html (2001)
20. Park, Y., Byrd, R.J., Boguraev, B.K.: Automatic glossary extraction: beyond terminology identification. In: Proceedings of COLING’02 (2002)
21. Pustejovsky, J., Castaño, J., Cochran, B., Kotecki, M., Morrell, M., Rumshisky, A.: Linguistic knowledge extraction from medline: automatic construction of an acronym database. Updated version of a paper presented at Medinfo. http://medstract.org/publications.html (2001)
22. Paice, C.D., Jones, P.A.: The identification of important concepts in highly structured technical papers. In: Proceedings of the 16th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 69–78 (1993)
23. Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
24. Swanson, D.R.: Medical literature as a potential source of new knowledge. Bull. Med. Libr. Assoc. 78(1), 29–37 (1990)
25. Taghva, K., Gilbreth, J.: Recognizing acronyms and their definitions. Technical Report 95-03, ISRI Information Science Research Institute, University of Nevada, Las Vegas (1995)
26. Taghva, K., Gilbreth, J.: Recognizing acronyms and their definitions. Int. J. Doc. Anal. Recog. 1(4), 191–198 (1999)
27. Wright, S.E., Budin, G. (eds.): Handbook of Terminology Management, vol. 1, Basic Concepts of Terminology Management. John Benjamins, Amsterdam (1997)
28. Yeates, S., Bainbridge, D., Witten, I.H.: Using compression to identify acronyms in text. In: Conference on Data Compression, p. 582 (2000)
29. Yeates, S.: Automatic extraction of acronyms from text. In: New Zealand Computer Science Research Students’ Conference, pp. 117–124 (1999)

Author information

Authors and Affiliations

CIS, University of Munich, Oettingenstr. 67, D-80538 München, Germany
Eduardo Torres Schumann & Klaus U. Schulz

Corresponding author

Correspondence to Eduardo Torres Schumann.

About this article

Cite this article

Schumann, E.T., Schulz, K.U. Stable methods for recognizing acronym-expansion pairs: from rule sets to hidden Markov models. IJDAR 8, 1–14 (2006). https://doi.org/10.1007/s10032-005-0146-7

Received: 01 February 2005
Published: 10 May 2005
Issue Date: April 2006
DOI: https://doi.org/10.1007/s10032-005-0146-7
