Abstract
We present a new collection of treebanks for the Portuguese language, comprising five datasets that cover major types of grammatically annotated corpora: TreeBankPT, PropBankPT, DependencyBankPT, LogicalFormBankPT and DeepBankPT. This collection is the Portuguese part of a broader multilingual collection of aligned treebanks that are developed for different languages, including English, under the same methodological principles and guidelines, and whose raw text versions are translations of the Penn Treebank, a de facto standard dataset for research on language technology.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Branco, A. et al. (2014). DeepBankPT and Companion Portuguese Treebanks in a Multilingual Collection of Treebanks Aligned with the Penn Treebank. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_23
DOI: https://doi.org/10.1007/978-3-319-09761-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)