Abstract
Against the background of a growing need for information about language, which used to be supplied only in a rather limited way, this contribution outlines and discusses the new solution offered by language corpora and the way it has been implemented. For the Czech language, this solution has materialized in the 100-million-word representative Czech National Corpus (CNC, 2000). What follows is a brief tour through the stages of its build-up, characterizing the various corpora within the CNC and giving some figures on the proportions of the various types of language represented. The last part of the contribution sets out a minimal programme for further research and general desiderata for this important, international branch of modern science.
References
Burnard Lou, 1995, Users’ Reference Guide for the British National Corpus, Oxford U. Press, Oxford.
Čermák, F. 1995. Jazykový korpus: Prostředek a zdroj poznání. Slovo a slovesnost 56: 119–140. (Language Corpus: Means and Source of Knowledge).
Čermák F., 1997, Czech National Corpus: A Case in Many Contexts. International Journal of Corpus Linguistics 2, 181–197.
Čermák F., 1998, Czech National Corpus: Its Character, Goal and Background, In Text, Speech, Dialogue, Proceedings of the First Workshop on Text, Speech, Dialogue-TSD’98, Brno, Czech Republic, September, eds. P. Sojka, V. Matoušek, K. Pala, I. Kopeček, Masaryk University: Brno, 9–14.
Čermák F. Králík J. Kučera K., 1997, Recepce současné češtiny a reprezentativnost korpusu, Slovo a slovesnost 58, 117–124 (Reception of the Contemporary Czech and the Representativeness of Corpus).
Český národní korpus. Úvod a příručka uživatele, 2000. Eds. Kocek J., Kopřivová M., Kučera K., Filozofická fakulta KU Praha (Czech National Corpus. An Introduction and User’s Manual).
Kruyt, J. G. 1993. Design Criteria for Corpora Construction in the Framework of a European Corpora Network. Final Report. Institute for Dutch Lexicology INL: Leiden.
Kučera K., 1998, Diachronní složka Českého národního korpusu: obecné zásady, kontext a současný stav. Listy filologické 121, 303–313 (Diachronic Component of the Czech National Corpus: General Principles, Context and Current State of Affairs).
Norling-Christensen, O. 1992. Preparing a Text Corpus. Computational Tools and Methods for Standardizing, Tagging and Structuring Text Data. Papers in Computational Lexicography COMPLEX’92, ed. by R. Kiefer et al.: 251–259. Research Institute for Linguistics, Hungarian Academy of Sciences: Budapest.
Petkevič V., 2001, Neprojektivní konstrukce v češtině z hlediska automatické morfologické disambiguace (Nonprojective Constructions in Czech from the Viewpoint of an Automatic Morphological Disambiguation of Czech Texts), in Čeština. Univerzália a specifika 3, eds. Z. Hladká, P. Karlík, Masarykova univerzita Brno, 197–206.
Šulc, M. Korpusová lingvistika. První vstup. Karolinum 1999 (Corpus Linguistics. A First Introduction).
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Čermák, F. (2001). Language Corpora: The Czech Case. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_3
DOI: https://doi.org/10.1007/3-540-44805-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive