Abstract
The main aim of the Szeged Treebank project was to create a high-quality database of syntactic structures for Hungarian that can serve as a gold standard for further research in linguistics and computational language processing. The treebank currently contains full syntactic parses of about 82,000 sentences, produced by accurate manual annotation. This paper describes the linguistic theory as well as the actual method used in the annotation process. It also presents the application of the treebank to the training of automated syntactic parsers.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Csendes, D., Csirik, J., Gyimóthy, T., Kocsor, A. (2005). The Szeged Treebank. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_16
DOI: https://doi.org/10.1007/11551874_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science (R0)