Abstract
This paper investigates a novel approach to unsupervised morphology induction that relies on community detection in networks. In a first step, morphological transformation rules are automatically acquired based on graphical similarities between words. These rules encode substring substitutions that transform one word form into another. The transformation rules are then used to construct a lexical network, in which nodes stand for words and edges represent transformation rules. In the next step, a clustering algorithm is applied to the network to detect families of morphologically related words. Finally, morpheme analyses are produced from the transformation rules and the word families obtained after clustering. Although still at a preliminary stage of development, the method obtained encouraging results at Morpho Challenge 2009, which demonstrate the viability of the approach.
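The pipeline described above (rule acquisition, network construction, clustering) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes rules are restricted to suffix substitutions after a shared prefix, keeps only rules observed a minimum number of times, and uses plain connected components as a crude stand-in for a community detection algorithm. All function names and thresholds (`suffix_rule`, `build_network`, `word_families`, `min_stem`, `min_rule_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def suffix_rule(a, b, min_stem=3):
    """Return a (suffix_a, suffix_b) substitution rule if a and b share a
    prefix of at least min_stem characters, else None.
    Assumption: only suffix substitutions are considered here."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if i < min_stem:
        return None
    return (a[i:], b[i:])


def build_network(words, min_rule_count=2):
    """Build the lexical network as a dict mapping word pairs (edges)
    to the rule that links them. Rules seen fewer than min_rule_count
    times are discarded as likely noise."""
    pairs_by_rule = defaultdict(list)
    for a, b in combinations(sorted(set(words)), 2):
        rule = suffix_rule(a, b)
        if rule is not None:
            pairs_by_rule[rule].append((a, b))
    edges = {}
    for rule, pairs in pairs_by_rule.items():
        if len(pairs) >= min_rule_count:
            for pair in pairs:
                edges[pair] = rule
    return edges


def word_families(words, edges):
    """Group words into connected components of the network.
    Assumption: this replaces real community detection for brevity."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, families = set(), []
    for word in sorted(set(words)):
        if word in seen:
            continue
        stack, family = [word], set()
        while stack:
            w = stack.pop()
            if w in family:
                continue
            family.add(w)
            stack.extend(neighbours[w] - family)
        seen |= family
        families.append(sorted(family))
    return families


words = ["walk", "walks", "walked", "talk", "talks", "talked", "cat"]
edges = build_network(words)
families = word_families(words, edges)
```

On this toy vocabulary, `walk`, `walks` and `walked` end up in one family and `talk`, `talks` and `talked` in another, while `cat` stays isolated. On a real lexicon, connected components would tend to merge unrelated words through spurious edges, which is the kind of situation a community detection algorithm is meant to handle.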
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bernhard, D. (2010). MorphoNet: Exploring the Use of Community Structure for Unsupervised Morpheme Analysis. In: Peters, C., et al. Multilingual Information Access Evaluation I. Text Retrieval Experiments. CLEF 2009. Lecture Notes in Computer Science, vol 6241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15754-7_72
DOI: https://doi.org/10.1007/978-3-642-15754-7_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15753-0
Online ISBN: 978-3-642-15754-7
eBook Packages: Computer Science, Computer Science (R0)