Abstract
The paper presents WCCL, a new formalism and toolkit for constructing morpho-syntactic features, a task crucial to many natural language processing algorithms. One existing solution, JOSKIPI, is analysed from two perspectives: the features of the formalism and software engineering issues. We then propose its successor. A short case study follows, illustrating the improvement gained by using rich features expressed in WCCL. The formalism targets Polish, although it seems well suited to any inflectional language.
This work is financed by the Innovative Economy Programme project POIG.01.01.02-14-013/09.
© 2011 Springer-Verlag Berlin Heidelberg
Radziszewski, A., Wardyński, A., Śniatowski, T. (2011). WCCL: A Morpho-syntactic Feature Toolkit. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_55
DOI: https://doi.org/10.1007/978-3-642-23538-2_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science (R0)