Abstract
Morphological knowledge (inflection, derivation, compounding) is useful for medical language processing. Such knowledge is available for medical English in the UMLS Specialist Lexicon, but not for French. Large corpora of medical texts can now be obtained from the Web. We propose a method, based on the cooccurrence of formally similar words, that takes advantage of such a corpus to learn morphological knowledge for French medical words. The relations obtained before filtering have an average precision of 75.6% after 5,000 word pairs. Detailed examination of the results obtained on a sample of 376 French SNOMED anatomy nouns shows that 91–94% of the proposed derived adjectives are correct, that 36% of the nouns receive a correct adjective, and that this method can add 41% more derived adjectives than SNOMED already specifies. We discuss these results and propose directions for improvement.
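The core idea of the abstract, pairing formally similar words that cooccur in the same texts, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how formal similarity or cooccurrence is measured, so the sketch assumes that two words are formally similar when they share an initial string of at least `MIN_PREFIX` characters and that they cooccur when they appear in the same document. The threshold, the function names, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed threshold for "formal similarity"; not taken from the paper.
MIN_PREFIX = 4


def common_prefix_len(a: str, b: str) -> int:
    """Return the length of the longest initial string shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidate_pairs(documents, min_prefix=MIN_PREFIX):
    """Count, for each pair of formally similar words, the number of
    documents in which both words occur (assumed cooccurrence unit)."""
    counts = Counter()
    for doc in documents:
        # Naive whitespace tokenisation; each word counts once per document.
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            if common_prefix_len(a, b) >= min_prefix:
                counts[(a, b)] += 1
    return counts


# Toy corpus, invented for illustration only.
docs = [
    "douleur de l' abdomen et paroi abdominale",
    "examen de l' abdomen : masse abdominale palpable",
    "artère du foie et flux hépatique",
]
pairs = candidate_pairs(docs)
for (a, b), n in pairs.most_common():
    print(a, b, n)
```

On this toy corpus the sketch proposes the pair abdomen / abdominale, seen together in two documents. It cannot propose foie / hépatique, because the two words share no initial string; a purely form-based criterion misses such pairs by construction.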
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zweigenbaum, P., Grabar, N. (2003). Learning Derived Words from Medical Corpora. In: Dojat, M., Keravnou, E.T., Barahona, P. (eds) Artificial Intelligence in Medicine. AIME 2003. Lecture Notes in Computer Science, vol 2780. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39907-0_27
DOI: https://doi.org/10.1007/978-3-540-39907-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20129-8
Online ISBN: 978-3-540-39907-0
eBook Packages: Springer Book Archive