Abstract
German is closely related to English but has a richer morphology and freer word order. Additionally, German has four major treebanks, which differ considerably in their syntactic annotation schemes. All four combine constituent structure with grammatical functions, but their decisions on other phenomena differ significantly, for example in the treatment of discontinuous structures. This makes German a good choice for a comparative analysis of treebanks. This chapter presents two major treebanks of German, TIGER and TüBa-D/Z. We describe the projects in which the two treebanks were annotated and discuss their annotation schemes, the processes used for annotation, and the data formats. We also discuss the usage of both treebanks, as well as other German treebanks, and we present a comparison of the two annotation schemes along with their advantages and disadvantages.
We would like to thank Heike Zinsmeister for insightful comments and for providing us with references, and we would like to thank the two anonymous reviewers for valuable comments.
Notes
- 1. Project websites are available at http://www.ims.uni-stuttgart.de/forschung/projekte/tiger.html (TIGER) and http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html (TüBa-D/Z). All URLs provided in this paper were accessed on Nov 28, 2016.
- 2. Secondary edges were already proposed in the context of the NEGRA project [80] but had not been used in the actual annotation of the NEGRA corpus.
- 3. This period was chosen because it covers a globally relevant event: the assassination of Israeli Prime Minister Yitzhak Rabin. The idea was to keep the option open of building a multilingual corpus, because it would be rather easy to find news about this event in many different languages. A drawback is that there is some overlap in content among the articles of the two weeks.
  The NEGRA corpus also consists of texts from ‘Frankfurter Rundschau’, from 1991 and 1992. As far as we know, there is no overlap in texts between the NEGRA and TIGER corpora.
- 4. TüBa-D/Z is short for ‘Tübinger Baumbank des Deutschen/Zeitungssprache’ (Tübingen Treebank of German/Newspaper), i.e., the Z denotes newspaper texts while the S in TüBa-D/S denotes spontaneous speech.
- 5. For more information on these projects, see http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html.
- 6. Note that we only have a parenthetical construction if the matrix clause is embedded into the direct speech. If the parenthetical were annotated as the head of the direct speech, this would result in a crossing branch, which is not an option in the TüBa-D/Z annotation scheme.
- 7.
- 8. TigerMorph was developed by Berthold Crysmann and was only used in the TIGER project. It is not available.
- 9. The transfer system of the XEROX Translation Environment (XTE) by Martin Kay, which was part of the XLE development platform.
- 10.
- 11. Flickinger et al. (chapter “Sustainable Development and Refinement of Complex Linguistic Annotations at Scale”) discuss the use of discriminants in grammar-based treebanking. Discriminants encode the features distinguishing competing analyses and can support annotators in disambiguating complex structures. Such an approach was later adapted to LFG in the INESS project, which developed the LFG Parsebanker. This tool has been applied in creating the Norwegian LFG treebank [56, 73].
- 12. For discussions of these and similar formats, see also Ide et al. (chapter “Designing Annotation Schemes: From Model to Representation”).
- 13. This description refers to the NEGRA export format 4. There is a previous version, export format 3, which lacks the lemma column, but is otherwise identical.
- 14. SynAF is a standard developed by the International Organization for Standardization (ISO) in ISO/TC37/SC4 (Language Resources Management), http://www.tc37sc4.org/; see Ide et al. (chapter “Community Standards for Linguistically-Annotated Resources”).
- 15. The script was part of the NEGRA corpus deliverable. It could not deal correctly with some kinds of crossing branches and was not maintained after the end of NEGRA.
- 16.
- 17. To enhance readability, we provide indentation in the example presented in Fig. 20.
- 18. The license can be signed at http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/index.html.
- 19. The license is available from http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html.
- 20.
- 21.
- 22.
- 23.
- 24. There is work in progress for the Copenhagen Dependency Treebank, but the annotations have not been released yet (http://code.google.com/p/copenhagen-dependency-treebank/wiki/CDT). After the time of writing, in 2014, the Hamburg Dependency Treebank was announced; it consists of approx. 200,000 manually annotated sentences plus 55,000 automatically parsed sentences, see https://corpora.uni-hamburg.de/drupal/de/islandora/object/treebank:hdt.
References
Albert, S., Anderssen, J., Bader, R., Becker, S., Bracht, T., Brants, S., Brants, T., Demberg, V., Dipper, S., Eisenberg, P., Hansen, S., Hirschmann, H., Janitzek, J., Kirstein, C., Langner, R., Michelbacher, L., Plaehn, O., Preis, C., Pußel, M., Rower, M., Schrader, B., Schwartz, A., Smith, G., Uszkoreit, H.: TIGER Annotationsschema. Technical report, Universität des Saarlandes, Universität Stuttgart and Universität Potsdam (2003). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-syntax.pdf
Bosch, S., Choi, K.-S., de la Clergerie, É., Fang, A.C., Faaß, G., Lee, K., Pareja-Lora, A., Romary, L., Witt, A., Zeldes, A., Zipser, F.: <tiger2/> as a standardised serialisation for ISO 24615 – SynAF. In: Proceedings of the Eleventh International Workshop on Treebanks and Linguistic Theories (TLT), Lisbon, Portugal, pp. 37–60 (2012)
Brants, S., Hansen, S.: Developments in the TIGER annotation scheme and their realization in the corpus. In: Proceedings of the Third Conference on Language Resources and Evaluation LREC-02, Las Palmas de Gran Canaria, pp. 1643–1649 (2002)
Brants, S., Dipper, S., Eisenberg, P., Hansen-Schirra, S., König, E., Lezius, W., Rohrer, C., Smith, G., Uszkoreit, H.: TIGER: linguistic interpretation of a German corpus. Res. Lang. Comput., Special Issue 2(4), 597–620 (2004)
Brants, T.: The NeGra Export Format for Annotated Corpora. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany (1997). CLAUS Report No. 98, http://www.coli.uni-saarland.de/~thorsten/publications/Brants-CLAUS98.pdf
Brants, T.: Cascaded Markov models. In: Proceedings of EACL-99, Bergen, Norway, pp. 118–125 (1999)
Brants, T.: Inter-annotator agreement for a German newspaper corpus. In: Proceedings of Second International Conference on Language Resources and Evaluation LREC-2000, Athens, Greece (2000)
Brants, T.: TnT – a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing ANLP-2000, Seattle, Washington, pp. 224–231 (2000)
Brants, T., Skut, W.: Automation of treebank annotation. In: Proceedings of the Joint Conference on New Methods in Natural Language Processing and Computational Language Learning. NeMLaP3/CoNLL98, Sydney, Australia, pp. 49–57 (1998)
Brants, T., Skut, W., Uszkoreit, H.: Syntactic annotation of a German newspaper corpus. In: Proceedings of the ATALA Treebank Workshop, Paris, France, pp. 69–76 (1999)
Brants, T., Skut, W., Uszkoreit, H.: Syntactic annotation of a German newspaper corpus. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora. Text, Speech and Language Technology, vol. 20, pp. 73–87. Springer, The Netherlands (2003)
Bresnan, J.: The Mental Representation of Grammatical Relations. MIT Press, Cambridge (1982)
Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the Tenth Conference on Computational Language Learning (CoNLL), New York, NY, pp. 149–164 (2006)
Butt, M., Dyvik, H., King, T.H., Masuichi, H., Rohrer, C.: The parallel grammar project. In: Proceedings of COLING-2002 Workshop on Grammar Engineering and Evaluation, Taipei, Taiwan, vol. 15, pp. 1–7 (2002)
Corazza, A., Lavelli, A., Satta, G.: An information-theoretic measure to evaluate parsing difficulty across treebanks. ACM Trans. Speech Lang. Process. 9(4) (2013)
Crouch, D., Dalrymple, M., Kaplan, R.M., King, T.H., Maxwell III, J.T., Newman, P.: XLE documentation. Technical report, Palo Alto Research Center
Crysmann, B., Hansen-Schirra, S., Smith, G., Ziegler-Eisele, D.: TIGER Morphologie-Annotationsschema. Technical report, Universität des Saarlandes, Universität Stuttgart and Universität Potsdam (2005). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_scheme-morph.pdf
Demske, U.: Das Mercurius-Projekt: eine Baumbank für das Frühneuhochdeutsche. In: Zifonun, G., Kallmeyer, W. (eds.) Sprachkorpora - Datenmengen und Erkenntnisfortschritt, Jahrbuch des Instituts für deutsche Sprache 2006, pp. 91–104. de Gruyter, Berlin (2007)
Dipper, S.: Grammar-based corpus annotation. In: Proceedings of the COLING Workshop on Linguistically Interpreted Corpora (LINC-2000), Luxembourg, pp. 56–64 (2000)
Dipper, S.: Implementing and Documenting Large-Scale Grammars – German LFG. Ph.D. thesis, IMS, University of Stuttgart (2003). Working papers of the Institut für Maschinelle Sprachverarbeitung (AIMS), vol. 9(1)
Dipper, S.: Querying topological fields in the TIGER scheme with TIGERSearch. In: Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13), Tübingen, Germany, pp. 37–50 (2014)
Drach, E.: Grundgedanken der Deutschen Satzlehre. Diesterweg, Frankfurt am Main (1937)
Erdmann, O.: Grundzüge der deutschen Syntax nach ihrer geschichtlichen Entwicklung dargestellt. Verlag der Cotta’schen Buchhandlung, Stuttgart (1886). Erste Abteilung
Forst, M.: Treebank conversion – establishing a testsuite for a broad-coverage LFG from the TIGER treebank. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC 2003), Budapest, pp. 25–32 (2003)
Forst, M., Bertomeu, N., Crysmann, B., Fouvry, F., Hansen-Schirra, S., Kordoni, V.: Towards a dependency-based gold standard for German parsers - the TiGer dependency bank. In: Proceedings of LINC 2004 (2004)
Frank, A., King, T.H., Kuhn, J., Maxwell, J.: Optimality theory style constraint ranking in large-scale LFG grammars. In: Proceedings of the Third LFG Conference, Brisbane, Australia (1998)
Gärtner, M., Thiele, G., Seeker, W., Björkelund, A., Kuhn, J.: ICARUS – an extensible graphical search tool for dependency treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Sofia, Bulgaria, pp. 55–60 (2013)
Gastel, A., Schulze, S., Versley, Y., Hinrichs, E.: Annotation of explicit and implicit discourse relations in the TüBa-D/Z Treebank. In: Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL), Hamburg, Germany (2011)
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M.A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, Boulder, Colorado, pp. 1–18 (2009)
Harbusch, K.: Incremental sentence production inhibits clausal coordinate ellipsis: a treebank study into Dutch and German. Dialogue Discourse. Special issue on Incremental Processing in Dialogue 2(1), 313–332 (2011)
Harbusch, K., Kempen, G.: Clausal coordinate ellipsis in German: the TIGER treebank as a source of evidence. In: Proceedings of NODALIDA 2007 – Sixteenth Nordic Conference of Computational Linguistics, Tartu, Estonia (2007)
Hinrichs, E., Beck, K.: Auxiliary fronting in German: a walk in the woods. In: Proceedings of the Twelfth Workshop on Treebanks and Linguistic Theories (TLT), Sofia, Bulgaria, pp. 61–72 (2013)
Hinrichs, E., Telljohann, H.: Constructing a valence lexicon for a treebank of German. In: Proceedings of the 7th International Workshop on Treebanks and Linguistic Theories (TLT), Groningen, The Netherlands, pp. 41–52 (2009)
Hinrichs, E.W., Kübler, S.: Treebank profiling of spoken and written German. In: Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories, Barcelona, Spain, pp. 65–76 (2005)
Hinrichs, E.W., Kübler, S.: What linguists always wanted to know about German and did not know how to ask. In: Suominen, M., Arppe, A., Airola, A., Heinämäki, O., Miestamo, M., Määttä, U., Niemi, J., Pitkänen, K.K., Sinnemäki, K. (eds.) A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday. SKY Journal of Linguistics, vol. 19, pp. 24–33. The Linguistic Association of Finland (2006). Special Supplement
Hinrichs, E.W., Bartels, J., Kawata, Y., Kordoni, V., Telljohann, H.: The Tübingen treebanks for spoken German, English, and Japanese. In: Wahlster, W. (ed.) Verbmobil: Foundations of Speech-to-Speech Translation, pp. 550–574. Springer, Berlin (2000)
Hinrichs, E.W., Bartels, J., Kawata, Y., Kordoni, V., Telljohann, H.: The Verbmobil treebanks. In: Proceedings of KONVENS 2000, 5. Konferenz zur Verarbeitung natürlicher Sprache, Ilmenau, Germany, pp. 107–112 (2000)
Hinrichs, E.W., Filippova, K., Wunsch, H.: What treebanks can do for you: rule-based and machine-learning approaches to anaphora resolution in German. In: Civit, M., Kübler, S., Martí, M.A. (eds.) Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), Barcelona, Spain, pp. 77–88 (2005)
Hirschmann, H., Linde, S.: Annotationsguidelines zur Deutschen Diachronen Baumbank. Technical report, Humboldt-Universität zu Berlin (2010). http://korpling.german.hu-berlin.de/ddb-doku
Höhle, T.: Der Begriff “Mittelfeld”, Anmerkungen über die Theorie der topologischen Felder. In: Akten des Siebten Internationalen Germanistenkongresses 1985, Göttingen, Germany, pp. 329–340 (1986)
Kallmeyer, L., Maier, W.: Data-driven parsing using probabilistic linear context-free rewriting systems. Comput. Linguist. 39(1), 87–119 (2013)
King, T.H., Crouch, R., Riezler, S., Dalrymple, M., Kaplan, R.M.: The PARC700 dependency bank. In: Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL-03, pp. 1–8 (2003)
King, T.H., Dipper, S., Frank, A., Kuhn, J., Maxwell, J.: Ambiguity management in grammar writing. Res. Lang. Comput. 2, 259–280 (2004)
Kountz, M.: Extraktion von Dependenztripeln aus der TIGER-Baumbank (2006). Studienarbeit, Universität Stuttgart
Kübler, S.: How do treebank annotation schemes influence parsing results? Or how not to compare apples and oranges. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, pp. 293–300 (2005)
Kübler, S.: The PaGe shared task on parsing German. In: Proceedings of the ACL Workshop on Parsing German, Columbus, Ohio, pp. 55–63 (2008)
Kübler, S., Telljohann, H.: Towards a dependency-based evaluation for partial parsing. In: Proceedings of the LREC-Workshop Beyond PARSEVAL – Towards Improved Evaluation Measures for Parsing Systems, Las Palmas, Gran Canaria, pp. 9–16 (2002)
Kübler, S., Hinrichs, E.W., Maier, W.: Is it really that difficult to parse German? In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia, pp. 111–119 (2006)
Kübler, S., Maier, W., Rehbein, I., Versley, Y.: How to compare treebanks. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, pp. 2322–2329 (2008)
Kübler, S., Rehbein, I., van Genabith, J.: TePaCoC – a corpus for testing parser performance on complex German grammatical constructions. In: Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories, Groningen, The Netherlands, pp. 15–28 (2009)
Kübler, S., Beck, K., Hinrichs, E., Telljohann, H.: Chunking German: an unsolved problem. In: Proceedings of the Fourth Linguistic Annotation Workshop (LAW), Uppsala, Sweden, pp. 147–151 (2010)
Kunze, C., Lemnitzer, L.: GermaNet – representation, visualization, application. In: Proceedings of the Third International Conference on Language Resources and Evaluation (LREC), Las Palmas, Canary Islands, pp. 1485–1491 (2002)
Lezius, W.: Ein Suchwerkzeug für syntaktisch annotierte Textkorpora. Ph.D. thesis, Universität Stuttgart (2002). Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), vol. 8(4)
Martens, S.: TüNDRA: a web application for treebank search and visualization. In: Proceedings of the Twelfth Workshop on Treebanks and Linguistic Theories (TLT), Sofia, Bulgaria, pp. 133–144 (2013)
Mengel, A., Lezius, W.: An XML-based representation format for syntactically annotated corpora. In: Proceedings of the International Conference on Language Resources and Evaluation, pp. 121–126 (2000)
Meurer, P., Dyvik, H., Rosén, V., De Smedt, K., Lyse, G.I., Losnegaard, G.S., Thunes, M.: The INESS treebanking infrastructure. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). NEALT Proceedings, Oslo, Norway, vol. 16, pp. 453–458 (2013)
Meurers, D., Müller, S.: Corpora and syntax. In: Lüdeling, A., Kytö, M. (eds.) Corpus Linguistics: An International Handbook, pp. 920–933. Mouton de Gruyter, Berlin (2009)
Müller, F.H.: Stylebook for the Tübingen partially parsed corpus of written German (TüPP-D/Z). Technical report, Seminar für Sprachwissenschaft, Universität Tübingen (2004). http://www.sfs.uni-tuebingen.de/tupp/doc/stylebook.ps
Naumann, K.: Manual for the annotation of in-document referential relations. Technical report, Universität Tübingen (2007). http://www.sfs.uni-tuebingen.de/resources/tuebadz-coreference-manual-2007.pdf
Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 915–932 (2007)
Orasan, C.: PALinkA: A highly customizable tool for discourse annotation. In: Proceedings of the 4th SIGdial Workshop on Discourse and Dialog, Sapporo, Japan, pp. 39–43 (2003)
Pappert, S., Schließer, J., Janssen, D., Pechmann, T.: Corpus- and psycholinguistic investigations of linguistic constraints on German object order. In: Späth, A. (ed.) Interfaces and Interface Conditions, pp. 299–328. Mouton de Gruyter, Berlin (2007)
Plaehn, O.: Annotate: Bedienungsanleitung. Technical report, Universität des Saarlandes (1998). http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/annotate-manual.ps.gz
Plaehn, O.: Probabilistic parsing with discontinuous phrase structure grammar. Master’s thesis, Department of Computational Linguistics, University of the Saarland, Saarbrücken, Germany (1999)
Plaehn, O., Brants, T.: Annotate – an efficient interactive annotation tool. In: Proceedings of the 6th Conference on Applied Natural Language Processing (ANLP-2000), Seattle, WA (2000)
Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. University of Chicago Press, Chicago (1994)
Rehbein, I., van Genabith, J.: Treebank annotation schemes and parser evaluation for German. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp. 630–639 (2007)
Rehbein, I., van Genabith, J.: Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT), Bergen, Norway, pp. 115–126 (2007)
Rehm, G., Witt, A., Zinsmeister, H., Dellert, J.: Masking treebanks for the free distribution of linguistic resources and other applications. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT), Bergen, Norway (2007)
Reis, M.: Zum Subjektbegriff im Deutschen. In: Abraham, W. (ed.) Satzglieder im Deutschen: Vorschläge zur syntaktischen, semantischen und pragmatischen Fundierung, pp. 171–211. Narr, Tübingen (1982)
Rohrer, C., Forst, M.: Improving coverage and parsing quality of a large-scale LFG for German. In: Proceedings of the Language Resources and Evaluation Conference (LREC-2006), Genoa, Italy, pp. 2206–2211 (2006)
Romary, L., Zeldes, A., Zipser, F.: <tiger2/> – Serialising the ISO SynAF syntactic object model. Lang. Resour. Eval. (to appear)
Rosén, V., Meurer, P., De Smedt, K.: LFG Parsebanker: a toolkit for building and searching a treebank as a parsed corpus. In: Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories, Utrecht, pp. 127–133 (2009)
Roussel, A.: Documentation of the tool TIGER Tree Enricher (2014). http://www.linguistics.ruhr-uni-bochum.de/resources/software/tte
Samuelsson, Y., Volk, M.: Automatic node insertion for treebank deepening. In: Proceedings of the Third Workshop on Treebanks and Linguistic Theories (TLT), Tübingen, pp. 127–136 (2004)
Schiller, A., Teufel, S., Stöckert, C., Thielen, C.: Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset). Technical report, Universität Stuttgart and Universität Tübingen (1999). http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-1999.pdf
Seeker, W., Kuhn, J.: Making ellipses explicit in dependency conversion for a German treebank. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, pp. 3132–3139 (2012)
Simon, S., Hinrichs, E., Schulze, S., Versley, Y.: Handbuch zur Annotation expliziter und impliziter Diskursrelationen im Korpus der Tübinger Baumbank des Deutschen (TüBa-D/Z). Universität Tübingen (2011)
Skut, W., Brants, T., Krenn, B., Uszkoreit, H.: A linguistically interpreted corpus of German newspaper text. In: Proceedings of the ESSLLI Workshop on Recent Advances in Corpus Annotation, pp. 705–711 (1998)
Skut, W., Krenn, B., Brants, T., Uszkoreit, H.: An annotation scheme for free word order languages. In: Proceedings of the Fifth Conference on Applied Natural Language Processing ANLP 1997, Washington, DC, pp. 88–95 (1997)
Smith, G.: A brief introduction to the TIGER Treebank, version 1. Technical report, Universität Potsdam (2003). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/tiger_introduction.pdf
Smith, G.: Searching for morphological structure with regular expressions. Technical report, Universität Potsdam (2003). http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/tiger_regex.pdf
Spreyer, K., Frank, A.: The TIGER 700 RMRS Bank: RMRS construction from dependencies. In: Proceedings of LINC 2005, Jeju Island, Korea, pp. 1–10 (2005)
Stede, M.: The Potsdam commentary corpus. In: Proceedings of the ACL-04 Workshop on Discourse Annotation, Barcelona, pp. 96–102 (2004)
Steiner, I.: Partial agreement in German: a processing issue? In: Proceedings of the International Conference on Linguistic Evidence, Tübingen, Germany (2009)
Telljohann, H., Hinrichs, E., Kübler, S.: The TüBa-D/Z treebank: annotating German with a context-free backbone. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, pp. 2229–2235 (2004)
Telljohann, H., Hinrichs, E.W., Kübler, S., Zinsmeister, H., Beck, K.: Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Universität Tübingen, Germany, Seminar für Sprachwissenschaft (2015)
Thielen, C., Schiller, A.: Ein kleines und erweitertes Tagset fürs Deutsche. In: Feldweg, H., Hinrichs, E. (eds.) Lexikon & Text, pp. 193–203. Niemeyer, Tübingen (1994)
Trushkina, J.: Morpho-Syntactic Annotation and Dependency Parsing of German. Ph.D. thesis, Universität Tübingen (2004)
Ule, T.: Treebank Refinement: Optimising Representations of Syntactic Analyses for Probabilistic Context-Free Parsing. Ph.D. thesis, Universität Tübingen (2007)
Veenstra, J., Müller, F.H., Ule, T.: Topological fields chunking for German. In: Proceedings of the Sixth Conference on Natural Language Learning (CoNLL 2002), Taipei, Taiwan, pp. 56–62 (2002)
Versley, Y., Beck, K., Hinrichs, E., Telljohann, H.: A syntax-first approach to high-quality morphological analysis and lemma disambiguation for the TüBa-D/Z Treebank. In: Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT), Tartu, Estonia, pp. 233–244 (2010)
Volk, M., Göhring, A., Marek, T., Samuelsson, Y.: SMULTRON (version 3.0) – The Stockholm MULtilingual parallel TReebank (2010). An English-French-German-Spanish-Swedish parallel treebank with sub-sentential alignments. http://www.cl.uzh.ch/research/parallelcorpora/paralleltreebanks_en.html
Wahlster, W. (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, Berlin (2000)
Zarrieß, S., Cahill, A., Kuhn, J.: To what extent does sentence-internal realisation reflect discourse context? A study on word order. In: Proceedings of the 13th Conference of the European Chapter of the ACL, Avignon, France, pp. 767–776 (2012)
Zeldes, A., Ritz, J., Lüdeling, A., Chiarcos, C.: ANNIS: a search tool for multi-layer annotated corpora. In: Proceedings of Corpus Linguistics 2009, Liverpool, UK (2009)
Zinsmeister, H.: Treebank data as linguistic evidence? Coordination in TüBa-D/Z. In: Proceedings of the International Conference on Linguistic Evidence, Tübingen, Germany (2006)
Zinsmeister, H., Kuhn, J., Dipper, S.: Utilizing LFG parses for treebank annotation. In: Proceedings of the LFG-02 Conference, Athens, Greece, pp. 427–447 (2002). CSLI Publications
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Dipper, S., Kübler, S. (2017). German Treebanks: TIGER and TüBa-D/Z. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_22
DOI: https://doi.org/10.1007/978-94-024-0881-2_22
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)