German Treebanks: TIGER and TüBa-D/Z

Dipper, Stefanie; Kübler, Sandra

doi:10.1007/978-94-024-0881-2_22


Abstract

German is a language that is closely related to English but has a richer morphology and freer word order than English. Additionally, German has four existing major treebanks, which differ considerably in their syntactic annotation schemes. All treebanks use a combination of constituent structure and grammatical functions, but the decisions with regard to other phenomena differ significantly, for example in the treatment of discontinuous structures. This makes German a good choice for a comparative analysis of treebanks. This chapter presents two major treebanks of German, TIGER and TüBa-D/Z. We describe the projects in which the two treebanks were annotated, discuss the respective annotation schemes, the processes used for annotation, and the data formats. We also discuss the usage of both treebanks, as well as other German treebanks, and we present a comparison of the two annotation schemes along with their advantages and disadvantages.

We would like to thank Heike Zinsmeister for insightful comments and for providing us with references, and we would like to thank the two anonymous reviewers for valuable comments.


Notes

1. Project websites are available at http://www.ims.uni-stuttgart.de/forschung/projekte/tiger.html (TIGER) and http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html (TüBa-D/Z). All URLs provided in this paper were accessed on Nov 28, 2016.
2. Secondary edges were already proposed in the context of the NEGRA project [80] but had not been used in the actual annotation of the NEGRA corpus.
3. This period was chosen because it covers a globally relevant event: the assassination of Israeli Prime Minister Yitzhak Rabin. The idea was to keep the option open of building a multilingual corpus, because it would be rather easy to find news about this event in many different languages. A drawback is that there is some overlap in content among the articles of the two weeks.
The NEGRA corpus also consists of texts from ‘Frankfurter Rundschau’, from 1991 and 1992. As far as we know, there is no overlap in texts between the NEGRA and TIGER corpora.
4. TüBa-D/Z is short for ‘Tübinger Baumbank des Deutschen/Zeitungssprache’ (Tübingen Treebank of German/Newspaper), i.e., the Z denotes newspaper texts while the S in TüBa-D/S denotes spontaneous speech.
5. For more information on these projects, see http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html.
6. Note that we only have a parenthetical construction if the matrix clause is embedded into the direct speech. If the parenthetical were annotated as the head of the direct speech, this would result in a crossing branch, which is not an option in the TüBa-D/Z annotation scheme.
7. Besides NEGRA, TIGER, TüBa-D/Z, and the Verbmobil treebanks, Annotate was also used for, e.g., the Potsdam Commentary Corpus [84], Mercurius Treebank [18], Deutsche Diachrone Baumbank [39], and SMULTRON [93]. The tool is no longer maintained.
8. TigerMorph was developed by Berthold Crysmann and was only used in the TIGER project. It is not available.
9. The transfer system of the XEROX Translation Environment (XTE) by Martin Kay, which was part of the XLE development platform.
10. The grammar was later improved and extended, and, as of 2006, had a coverage of 86% in terms of full parses, and dependency-based F-scores of 84% [24, 71].
11. Flickinger et al. (chapter “Sustainable Development and Refinement of Complex Linguistic Annotations at Scale”) discuss the use of discriminants in grammar-based treebanking. Discriminants encode the features distinguishing competing analyses and can support annotators in disambiguating complex structures. Such an approach was later adapted to LFG in the INESS project, which developed the LFG Parsebanker. This tool has been applied in creating the Norwegian LFG treebank [56, 73].
12. For discussions of these and similar formats, see also Ide et al. (chapter “Designing Annotation Schemes: From Model to Representation”).
13. This description refers to the NEGRA export format 4. There is a previous version, export format 3, which lacks the lemma column, but is otherwise identical.
14. SynAF is a standard developed by the International Organization for Standardization in ISO/TC37/SC4 (Language Resources Management); http://www.tc37sc4.org/, see Ide et al. (chapter “Community Standards for Linguistically-Annotated Resources”).
15. The script was part of the NEGRA corpus deliverable. The script could not deal correctly with some kinds of crossing branches and was not maintained after the end of NEGRA.
16. http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/Tiger2Dep.en.html.
17. To enhance readability, we provide indentation in the example presented in Fig. 20.
18. The license can be signed here: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/index.html.
19. The license is available from http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html.
20. http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Main_Page.
21. http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Tundra.
22. http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/icarus.html.
23. http://www.deutschestextarchiv.de/.
24. There is work in progress for the Copenhagen Dependency Treebank, but the annotations have not been released yet (http://code.google.com/p/copenhagen-dependency-treebank/wiki/CDT). After the time of writing, the Hamburg Dependency Treebank was announced in 2014, which consists of approx. 200,000 manually annotated sentences plus 55,000 automatically parsed sentences, see https://corpora.uni-hamburg.de/drupal/de/islandora/object/treebank:hdt.

References

1. Albert, S., Anderssen, J., Bader, R., Becker, S., Bracht, T., Brants, S., Brants, T., Demberg, V., Dipper, S., Eisenberg, P., Hansen, S., Hirschmann, H., Janitzek, J., Kirstein, C., Langner, R., Michelbacher, L., Plaehn, O., Preis, C., Pußel, M., Rower, M., Schrader, B., Schwartz, A., Smith, G., Uszkoreit, H.: TIGER Annotationsschema. Technical report, Universität des Saarlandes, Universität Stuttgart and Universität Potsdam (2003). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf
2. Bosch, S., Choi, K.-S., de la Clergerie, É., Fang, A.C., Faaß, G., Lee, K., Pareja-Lora, A., Romary, L., Witt, A., Zeldes, A., Zipser, F.: <tiger2/> as a standardised serialisation for ISO 24615 – SynAF. In: Proceedings of the Eleventh International Workshop on Treebanks and Linguistic Theories (TLT), Lisbon, Portugal, pp. 37–60 (2012)
3. Brants, S., Hansen, S.: Developments in the TIGER annotation scheme and their realization in the corpus. In: Proceedings of the Third Conference on Language Resources and Evaluation LREC-02, Las Palmas de Gran Canaria, pp. 1643–1649 (2002)
4. Brants, S., Dipper, S., Eisenberg, P., Hansen-Schirra, S., König, E., Lezius, W., Rohrer, C., Smith, G., Uszkoreit, H.: TIGER: linguistic interpretation of a German corpus. Res. Lang. Comput., Special Issue 2(4), 597–620 (2004)
5. Brants, T.: The NeGra Export Format for Annotated Corpora. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany (1997). CLAUS Report No. 98, http://www.coli.uni-saarland.de/~thorsten/publications/Brants-CLAUS98.pdf
6. Brants, T.: Cascaded Markov models. In: Proceedings of EACL-99, Bergen, Norway, pp. 118–125 (1999)
7. Brants, T.: Inter-annotator agreement for a German newspaper corpus. In: Proceedings of Second International Conference on Language Resources and Evaluation LREC-2000, Athens, Greece (2000)
8. Brants, T.: TnT – a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing ANLP-2000, Seattle, Washington, pp. 224–231 (2000)
9. Brants, T., Skut, W.: Automation of treebank annotation. In: Proceedings of the Joint Conference on New Methods in Natural Language Processing and Computational Language Learning. NeMLaP3/CoNLL98, Sydney, Australia, pp. 49–57 (1998)
10. Brants, T., Skut, W., Uszkoreit, H.: Syntactic annotation of a German newspaper corpus. In: Proceedings of the ATALA Treebank Workshop, Paris, France, pp. 69–76 (1999)
11. Brants, T., Skut, W., Uszkoreit, H.: Syntactic annotation of a German newspaper corpus. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora. Text, Speech and Language Technology, vol. 20, pp. 73–87. Springer, The Netherlands (2003)
12. Bresnan, J.: The Mental Representation of Grammatical Relations. MIT Press, Cambridge (1982)
13. Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the Tenth Conference on Computational Language Learning (CoNLL), New York, NY, pp. 149–164 (2006)
14. Butt, M., Dyvik, H., King, T.H., Masuichi, H., Rohrer, C.: The parallel grammar project. In: Proceedings of COLING-2002 Workshop on Grammar Engineering and Evaluation, Taipei, Taiwan, vol. 15, pp. 1–7 (2002)
15. Corazza, A., Lavelli, A., Satta, G.: An information-theoretic measure to evaluate parsing difficulty across treebanks. ACM Trans. Speech Lang. Process. 9(4) (2013)
16. Crouch, D., Dalrymple, M., Kaplan, R.M., King, T.H., Maxwell III, J.T., Newman, P.: XLE documentation. Technical report, Palo Alto Research Center
17. Crysmann, B., Hansen-Schirra, S., Smith, G., Ziegler-Eisele, D.: TIGER Morphologie-Annotationsschema. Technical report, Universität des Saarlandes, Universität Stuttgart and Universität Potsdam (2005). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-morph.pdf
18. Demske, U.: Das Mercurius-Projekt: eine Baumbank für das Frühneuhochdeutsche. In: Zifonun, G., Kallmeyer, W. (eds.) Sprachkorpora - Datenmengen und Erkenntnisfortschritt, Jahrbuch des Instituts für deutsche Sprache 2006, pp. 91–104. de Gruyter, Berlin (2007)
19. Dipper, S.: Grammar-based corpus annotation. In: Proceedings of the COLING Workshop on Linguistically Interpreted Corpora (LINC-2000), Luxembourg, pp. 56–64 (2000)
20. Dipper, S.: Implementing and Documenting Large-Scale Grammars – German LFG. Ph.D. thesis, IMS, University of Stuttgart (2003). Working papers of the Institut für Maschinelle Sprachverarbeitung (AIMS), vol. 9(1)
21. Dipper, S.: Querying topological fields in the TIGER scheme with TIGERSearch. In: Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13), Tübingen, Germany, pp. 37–50 (2014)
22. Drach, E.: Grundgedanken der Deutschen Satzlehre. Diesterweg, Frankfurt am Main (1937)
23. Erdmann, O.: Grundzüge der deutschen Syntax nach ihrer geschichtlichen Entwicklung dargestellt. Verlag der Cotta’schen Buchhandlung, Stuttgart (1886). Erste Abteilung
24. Forst, M.: Treebank conversion – establishing a testsuite for a broad-coverage LFG from the TIGER treebank. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC 2003), Budapest, pp. 25–32 (2003)
25. Forst, M., Bertomeu, N., Crysmann, B., Fouvry, F., Hansen-Schirra, S., Kordoni, V.: Towards a dependency-based gold standard for German parsers - the TiGer dependency bank. In: Proceedings of LINC 2004 (2004)
26. Frank, A., King, T.H., Kuhn, J., Maxwell, J.: Optimality theory style constraint ranking in large-scale LFG grammars. In: Proceedings of the Third LFG Conference, Brisbane, Australia (1998)
27. Gärtner, M., Thiele, G., Seeker, W., Björkelund, A., Kuhn, J.: ICARUS – an extensible graphical search tool for dependency treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60, Sofia, Bulgaria, August 2013. Association for Computational Linguistics
28. Gastel, A., Schulze, S., Versley, Y., Hinrichs, E.: Annotation of explicit and implicit discourse relations in the TüBa-D/Z Treebank. In: Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL), Hamburg, Germany (2011)
29. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M.A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, Boulder, Colorado, pp. 1–18, June 2009. Association for Computational Linguistics
30. Harbusch, K.: Incremental sentence production inhibits clausal coordinate ellipsis: a treebank study into Dutch and German. Dialogue Discourse. Special issue on Incremental Processing in Dialogue 2(1), 313–332 (2011)
31. Harbusch, K., Kempen, G.: Clausal coordinate ellipsis in German: the TIGER treebank as a source of evidence. In: Proceedings of NODALIDA 2007 – Sixteenth Nordic Conference of Computational Linguistics, Tartu, Estonia (2007)
32. Hinrichs, E., Beck, K.: Auxiliary fronting in German: a walk in the woods. In: Proceedings of the Twelfth Workshop on Treebanks and Linguistic Theories (TLT), Sofia, Bulgaria, pp. 61–72 (2013)
33. Hinrichs, E., Telljohann, H.: Constructing a valence lexicon for a treebank of German. In: Proceedings of the 7th International Workshop on Treebanks and Linguistic Theories (TLT), Groningen, The Netherlands, pp. 41–52 (2009)
34. Hinrichs, E.W., Kübler, S.: Treebank profiling of spoken and written German. In: Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories, Barcelona, Spain, pp. 65–76 (2005)
35. Hinrichs, E.W., Kübler, S.: What linguists always wanted to know about German and did not know how to ask. In: Suominen, M., Arppe, A., Airola, A., Heinämäki, O., Miestamo, M., Määttä, U., Niemi, J., Pitkänen, K.K., Sinnemäki, K. (eds.) A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday. SKY Journal of Linguistics, vol. 19, pp. 24–33. The Linguistic Association of Finland (2006). Special Supplement
36. Hinrichs, E.W., Bartels, J., Kawata, Y., Kordoni, V., Telljohann, H.: The Tübingen treebanks for spoken German, English, and Japanese. In: Wahlster, W. (ed.) Verbmobil: Foundations of Speech-to-Speech Translation, pp. 550–574. Springer, Berlin (2000)
37. Hinrichs, E.W., Bartels, J., Kawata, Y., Kordoni, V., Telljohann, H.: The Verbmobil treebanks. In: Proceedings of KONVENS 2000, 5. Konferenz zur Verarbeitung natürlicher Sprache, Ilmenau, Germany, pp. 107–112 (2000)
38. Hinrichs, E.W., Filippova, K., Wunsch, H.: What treebanks can do for you: rule-based and machine-learning approaches to anaphora resolution in German. In: Civit, M., Kübler, S., Martí, M.A. (eds.) Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), Barcelona, Spain, pp. 77–88 (2005)
39. Hirschmann, H., Linde, S.: Annotationsguidelines zur Deutschen Diachronen Baumbank. Technical report, Humboldt-Universität zu Berlin (2010). http://korpling.german.hu-berlin.de/ddb-doku
40. Höhle, T.: Der Begriff “Mittelfeld”, Anmerkungen über die Theorie der topologischen Felder. In: Akten des Siebten Internationalen Germanistenkongresses 1985, Göttingen, Germany, pp. 329–340 (1986)
41. Kallmeyer, L., Maier, W.: Data-driven parsing using probabilistic linear context-free rewriting systems. Comput. Linguist. 39(1), 87–119 (2013)
42. King, T.H., Crouch, R., Riezler, S., Dalrymple, M., Kaplan, R.M.: The PARC700 dependency bank. In: Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL-03, pp. 1–8 (2003)
43. King, T.H., Dipper, S., Frank, A., Kuhn, J., Maxwell, J.: Ambiguity management in grammar writing. Res. Lang. Comput. 2, 259–280 (2004)
44. Kountz, M.: Extraktion von Dependenztripeln aus der TIGER-Baumbank (2006). Studienarbeit, Universität Stuttgart
45. Kübler, S.: How do treebank annotation schemes influence parsing results? Or how not to compare apples and oranges. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, pp. 293–300 (2005)
46. Kübler, S.: The PaGe shared task on parsing German. In: Proceedings of the ACL Workshop on Parsing German, Columbus, Ohio, pp. 55–63 (2008)
47. Kübler, S., Telljohann, H.: Towards a dependency-based evaluation for partial parsing. In: Proceedings of the LREC-Workshop Beyond PARSEVAL – Towards Improved Evaluation Measures for Parsing Systems, Las Palmas, Gran Canaria, pp. 9–16 (2002)
48. Kübler, S., Hinrichs, E.W., Maier, W.: Is it really that difficult to parse German? In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia, pp. 111–119 (2006)
49. Kübler, S., Maier, W., Rehbein, I., Versley, Y.: How to compare treebanks. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, pp. 2322–2329 (2008)
50. Kübler, S., Rehbein, I., van Genabith, J.: TePaCoC – a corpus for testing parser performance on complex German grammatical constructions. In: Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories, Groningen, The Netherlands, pp. 15–28 (2009)
51. Kübler, S., Beck, K., Hinrichs, E., Telljohann, H.: Chunking German: an unsolved problem. In: Proceedings of the Fourth Linguistic Annotation Workshop (LAW), Uppsala, Sweden, pp. 147–151 (2010)
52. Kunze, C., Lemnitzer, L.: GermaNet – representation, visualization, application. In: Proceedings of the Third International Conference on Language Resources and Evaluation (LREC), Las Palmas, Canary Islands, pp. 1485–1491 (2002)
53. Lezius, W.: Ein Suchwerkzeug für syntaktisch annotierte Textkorpora. Ph.D. thesis, Universität Stuttgart (2002). Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), vol. 8(4)
54. Martens, S.: TüNDRA: a web application for treebank search and visualization. In: Proceedings of the Twelfth Workshop on Treebanks and Linguistic Theories (TLT), Sofia, Bulgaria, pp. 133–144 (2013)
55. Mengel, A., Lezius, W.: An XML-based representation format for syntactically annotated corpora. In: Proceedings of the International Conference on Language Resources and Evaluation, pp. 121–126 (2000)
56. Meurer, P., Dyvik, H., Rosén, V., De Smedt, K., Lyse, G.I., Losnegaard, G.S., Thunes, M.: The INESS treebanking infrastructure. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). NEALT Proceedings, Oslo, Norway, vol. 16, pp. 453–458 (2013)
57. Meurers, D., Müller, S.: Corpora and syntax. In: Lüdeling, A., Kytö, M. (eds.) Corpus Linguistics: An International Handbook, pp. 920–933. Mouton de Gruyter, Berlin (2009)
58. Müller, F.H.: Stylebook for the Tübingen partially parsed corpus of written German (TüPP-D/Z). Technical report, Seminar für Sprachwissenschaft, Universität Tübingen (2004). http://www.sfs.uni-tuebingen.de/tupp/doc/stylebook.ps
59. Naumann, K.: Manual for the annotation of in-document referential relations. Technical report, Universität Tübingen (2007). http://www.sfs.uni-tuebingen.de/resources/tuebadz-coreference-manual-2007.pdf
60. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 915–932 (2007)
61. Orasan, C.: PALinkA: A highly customizable tool for discourse annotation. In: Proceedings of the 4th SIGdial Workshop on Discourse and Dialog, Sapporo, Japan, pp. 39–43 (2003)
62. Pappert, S., Schließer, J., Janssen, D., Pechmann, T.: Corpus- and psycholinguistic investigations of linguistic constraints on German object order. In: Späth, A. (ed.) Interfaces and Interface Conditions, pp. 299–328. Mouton de Gruyter, Berlin (2007)
63. Plaehn, O.: Annotate: Bedienungsanleitung. Technical report, Universität des Saarlandes (1998). http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/annotate-manual.ps.gz
64. Plaehn, O.: Probabilistic parsing with discontinuous phrase structure grammar. Master’s thesis, Department of Computational Linguistics, University of the Saarland, Saarbrücken, Germany (1999)
65. Plaehn, O., Brants, T.: Annotate – an efficient interactive annotation tool. In: Proceedings of the 6th Conference on Applied Natural Language Processing (ANLP-2000), Seattle, WA (2000)
66. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. University of Chicago Press, Chicago (1994)
67. Rehbein, I., van Genabith, J.: Treebank annotation schemes and parser evaluation for German. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 630–639 (2007)
68. Rehbein, I., van Genabith, J.: Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT), Bergen, Norway, pp. 115–126 (2007)
69. Rehm, G., Witt, A., Zinsmeister, H., Dellert, J.: Masking treebanks for the free distribution of linguistic resources and other applications. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT), Bergen, Norway (2007)
70. Reis, M.: Zum Subjektbegriff im Deutschen. In: Abraham, W. (ed.) Satzglieder im Deutschen: Vorschläge zur syntaktischen, semantischen und pragmatischen Fundierung, pp. 171–211. Narr, Tübingen (1982)
71. Rohrer, C., Forst, M.: Improving coverage and parsing quality of a large-scale LFG for German. In: Proceedings of the Language Resources and Evaluation Conference (LREC-2006), Genoa, Italy, pp. 2206–2211 (2006)
72. Romary, L., Zeldes, A., Zipser, F.: <tiger2/> – Serialising the ISO SynAF syntactic object model. Lang. Resour. Eval. (to appear)
73. Rosén, V., Meurer, P., De Smedt, K.: LFG Parsebanker: a toolkit for building and searching a treebank as a parsed corpus. In: Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories, Utrecht, pp. 127–133 (2009)
74. Roussel, A.: Documentation of the tool TIGER Tree Enricher (2014). http://www.linguistics.ruhr-uni-bochum.de/resources/software/tte
75. Samuelsson, Y., Volk, M.: Automatic node insertion for treebank deepening. In: Proceedings of the Third Workshop on Treebanks and Linguistic Theories (TLT), Tübingen, pp. 127–136 (2004)
76. Schiller, A., Teufel, S., Stöckert, C., Thielen, C.: Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset). Technical report, Universität Stuttgart and Universität Tübingen (1999). http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-1999.pdf
77. Seeker, W., Kuhn, J.: Making ellipses explicit in dependency conversion for a German treebank. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, pp. 3132–3139 (2012)
78. Simon, S., Hinrichs, E., Schulze, S., Versley, Y.: Handbuch zur Annotation expliziter und impliziter Diskursrelationen im Korpus der Tübinger Baumbank des Deutschen (TüBa-D/Z). Universität Tübingen (2011)
79. Skut, W., Brants, T., Krenn, B., Uszkoreit, H.: A linguistically interpreted corpus of German newspaper text. In: Proceedings of the ESSLLI Workshop on Recent Advances in Corpus Annotation, pp. 705–711 (1998)
80. Skut, W., Krenn, B., Brants, T., Uszkoreit, H.: An annotation scheme for free word order languages. In: Proceedings of the Fifth Conference on Applied Natural Language Processing ANLP 1997, Washington, DC, pp. 88–95 (1997)
81. Smith, G.: A brief introduction to the TIGER Treebank, version 1. Technical report, Universität Potsdam (2003). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/tiger_introduction.pdf
82. Smith, G.: Searching for morphological structure with regular expressions. Technical report, Universität Potsdam (2003). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/tiger_regex.pdf
83. Spreyer, K., Frank, A.: The TIGER 700 RMRS Bank: RMRS construction from dependencies. In: Proceedings of LINC 2005, Jeju Island, Korea, pp. 1–10 (2005)
84. Stede, M.: The Potsdam commentary corpus. In: Proceedings of the ACL-04 Workshop on Discourse Annotation, Barcelona, pp. 96–102 (2004)
85. Steiner, I.: Partial agreement in German: a processing issue? In: Proceedings of the International Conference on Linguistic Evidence, Tübingen, Germany (2009)
86. Telljohann, H., Hinrichs, E., Kübler, S.: The TüBa-D/Z treebank: annotating German with a context-free backbone. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, pp. 2229–2235 (2004)
87. Telljohann, H., Hinrichs, E.W., Kübler, S., Zinsmeister, H., Beck, K.: Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Universität Tübingen, Germany, Seminar für Sprachwissenschaft (2015)
88. Thielen, C., Schiller, A.: Ein kleines und erweitertes Tagset fürs Deutsche. In: Feldweg, H., Hinrichs, E. (eds.) Lexikon & Text, pp. 193–203. Niemeyer, Tübingen (1994)
89. Trushkina, J.: Morpho-Syntactic Annotation and Dependency Parsing of German. Ph.D. thesis, Universität Tübingen (2004)
90. Ule, T.: Treebank Refinement: Optimising Representations of Syntactic Analyses for Probabilistic Context-Free Parsing. Ph.D. thesis, Universität Tübingen (2007)
91. Veenstra, J., Müller, F.H., Ule, T.: Topological fields chunking for German. In: Proceedings of the Sixth Conference on Natural Language Learning (CoNLL 2002), Taipei, Taiwan, pp. 56–62 (2002)
92. Versley, Y., Beck, K., Hinrichs, E., Telljohann, H.: A syntax-first approach to high-quality morphological analysis and lemma disambiguation for the TüBa-D/Z Treebank. In: Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT), Tartu, Estonia, pp. 233–244 (2010)
93. Volk, M., Göhring, A., Marek, T., Samuelsson, Y.: SMULTRON (version 3.0) – The Stockholm MULtilingual parallel TReebank (2010). An English-French-German-Spanish-Swedish parallel treebank with sub-sentential alignments. http://www.cl.uzh.ch/research/parallelcorpora/paralleltreebanks_en.html
94. Wahlster, W. (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, Berlin (2000)
95. Zarrieß, S., Cahill, A., Kuhn, J.: To what extent does sentence-internal realisation reflect discourse context? A study on word order. In: Proceedings of the 13th Conference of the European Chapter of the ACL, Avignon, France, pp. 767–776 (2012)
96. Zeldes, A., Ritz, J., Lüdeling, A., Chiarcos, C.: ANNIS: a search tool for multi-layer annotated corpora. In: Proceedings of Corpus Linguistics 2009, Liverpool, UK (2009)
97. Zinsmeister, H.: Treebank data as linguistic evidence? Coordination in TüBa-D/Z. In: Proceedings of the International Conference on Linguistic Evidence, Tübingen, Germany (2006)
98. Zinsmeister, H., Kuhn, J., Dipper, S.: Utilizing LFG parses for treebank annotation. In: Proceedings of the LFG-02 Conference, Athens, Greece, pp. 427–447 (2002). CSLI Publications


Author information

Authors and Affiliations

Sprachwissenschaftliches Institut, Ruhr-Universität Bochum, 44780, Bochum, Germany
Stefanie Dipper
Department of Linguistics, Indiana University, Bloomington, IN, 47405, USA
Sandra Kübler


Corresponding author

Correspondence to Stefanie Dipper.

Editor information

Editors and Affiliations

Department of Computer Science, Vassar College, Poughkeepsie, New York, USA
Nancy Ide
Department of Computer Science, Volen Center for Complex Systems, Brandeis University, Waltham, Massachusetts, USA
James Pustejovsky


Cite this chapter

Dipper, S., Kübler, S. (2017). German Treebanks: TIGER and TüBa-D/Z. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_22


DOI: https://doi.org/10.1007/978-94-024-0881-2_22
Published: 17 June 2017
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)
