Abstract
This paper describes DESAM, a disambiguated corpus of Czech. It is a tagged corpus that has been manually disambiguated and can be used in various applications. We discuss the structure of the corpus, the tools used to manage it, its linguistic applications, and the possible use of machine learning techniques that rely on the disambiguated data. We also consider possible ways of developing procedures for fully automatic disambiguation.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pala, K., Rychlý, P., Smrž, P. (1997). DESAM — Annotated corpus for Czech. In: Plášil, F., Jeffery, K.G. (eds) SOFSEM'97: Theory and Practice of Informatics. SOFSEM 1997. Lecture Notes in Computer Science, vol 1338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63774-5_134
DOI: https://doi.org/10.1007/3-540-63774-5_134
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63774-5
Online ISBN: 978-3-540-69645-2
eBook Packages: Springer Book Archive