Abstract
Stochastic approaches to the tagging of Polish have produced results that are far from satisfactory. However, the successful combination of hand-written rules with a stochastic approach for Czech, as well as some initial experiments in the acquisition of tagging rules for Polish, revealed the potential of a rule-based approach. The goals are: to define a language of tagging constraints, to construct a set of reduction rules for Polish, and to apply Machine Learning to the extraction of tagging rules. A language of functional tagging constraints called JOSKIPI is proposed. An extension to the C4.5 algorithm based on introducing complex JOSKIPI operators into decision trees is presented. The construction of a preliminary set of hand-written tagging rules for Polish is discussed. Finally, the results of a comparison of different versions of the tagger are given.
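To make the idea of a reduction rule concrete, the following is a minimal Python sketch of reductionistic tagging in general: each token starts with every candidate tag, and rules only delete tags that the context rules out. It is not JOSKIPI syntax and not the paper's tagger. The names `Token`, `ReductionRule`, and `apply_rules`, the toy tags `prep`, `subst`, and `verb`, and the example rule are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Token:
    form: str
    tags: Set[str]  # candidate tags still considered possible (toy tagset)


@dataclass
class ReductionRule:
    # condition(tokens, i) returns True when the rule applies at position i
    condition: Callable[[List[Token], int], bool]
    remove: Set[str]  # tags to delete when the condition holds


def apply_rules(tokens: List[Token], rules: List[ReductionRule]) -> List[Token]:
    """Apply rules repeatedly until no tag set changes any more."""
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(tokens):
            for rule in rules:
                # Only ambiguous tokens are reduced.
                if len(tok.tags) > 1 and rule.condition(tokens, i):
                    remaining = tok.tags - rule.remove
                    # Never delete the last remaining tag of a token.
                    if remaining and remaining != tok.tags:
                        tok.tags = remaining
                        changed = True
    return tokens


# Hypothetical rule: directly after an unambiguous preposition,
# drop the verb reading of the current token.
after_prep_no_verb = ReductionRule(
    condition=lambda toks, i: i > 0 and toks[i - 1].tags == {"prep"},
    remove={"verb"},
)

sentence = [Token("na", {"prep"}), Token("piec", {"subst", "verb"})]
apply_rules(sentence, [after_prep_no_verb])
print([(t.form, sorted(t.tags)) for t in sentence])
```

Because rules only remove tags and never empty a tag set, the result may remain ambiguous where no rule applies; a learned or statistical component could then choose among the surviving tags.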
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piasecki, M. (2006). Hand-Written and Automatically Extracted Rules for Polish Tagger. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_26
DOI: https://doi.org/10.1007/11846406_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science (R0)