The Prague Dependency Treebank

Böhmová, Alena; Hajič, Jan; Hajičová, Eva; Hladká, Barbora

doi:10.1007/978-94-010-0201-1_7

Part of the book series: Text, Speech and Language Technology (TLTB, volume 20)

Abstract

The availability of annotated data, with annotation as rich and "deep" as possible, is desirable for any new development. Textual data are used in the so-called training phase of various empirical methods that address problems in computational linguistics. While many methods use texts in their plain (or raw) form, in most cases for so-called unsupervised training, more accurate results may be obtained if annotated corpora are available. Data annotation itself is a complex task. While morphologically annotated corpora (pioneered by Henry Kučera in the 1960s) are now available for English and other languages, syntactically annotated corpora are rare. Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, we decided to develop a similarly sized corpus of Czech with a rich annotation scheme.

Author information

Authors and Affiliations

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, CZ-118 00, Prague 1, Czech Republic
Alena Böhmová, Jan Hajič, Eva Hajičová & Barbora Hladká

Editor information

Editors and Affiliations

Université Paris 7, Paris, France
Anne Abeillé

About this chapter

Cite this chapter

Böhmová, A., Hajič, J., Hajičová, E., Hladká, B. (2003). The Prague Dependency Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_7

DOI: https://doi.org/10.1007/978-94-010-0201-1_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1335-5
Online ISBN: 978-94-010-0201-1
eBook Packages: Springer Book Archive
