Abstract
The paper describes a method for identifying a set of interesting constructions in a syntactically annotated corpus of Czech, the Prague Dependency Treebank, by applying an automatic procedure of analysis by reduction to the trees in the treebank. The procedure clearly reveals certain linguistic phenomena that go beyond ‘dependency nature’ (and thus generally pose a problem for dependency-based formalisms). Moreover, it provides feedback indicating that the annotation of a particular phenomenon might be inconsistent.
The paper discusses and analyzes individual phenomena and quantifies the results of the automatic procedure on a subset of the treebank. The results show that the vast majority of sentences in the subset used in these experiments can be analyzed automatically, and they confirm that most of the problematic phenomena belong to the language periphery.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuboň, V., Lopatková, M., Mírovský, J. (2013). Automatic Processing of Linguistic Data as a Feedback for Linguistic Theory. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_20
DOI: https://doi.org/10.1007/978-3-642-45114-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science (R0)