Language Corpora: The Czech Case

Čermák, František

doi:10.1007/3-540-44805-5_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Against the background of a growing need for information, which for language used to be supplied only in a rather limited way, this contribution outlines and discusses the new solution found in language corpora and the way it has been implemented. For the Czech language, this solution has materialized in the representative 100-million-word Czech National Corpus (CNC, 2000). A brief tour is offered through the stages of its build-up, characterizing the various corpora within the CNC and giving some figures on the proportions of the types of language represented. The last part of the contribution sets out a minimal programme for further research and desiderata to be followed generally in this important and international branch of modern science.

References

Burnard Lou, 1995, Users’ Reference Guide for the British National Corpus, Oxford U. Press, Oxford.
Čermák, F. 1995. Jazykový korpus: Prostředek a zdroj poznání. Slovo a slovesnost 56: 119–140. (Language Corpus: Means and Source of Knowledge).
Čermák F., 1997, Czech National Corpus: A Case in Many Contexts. International Journal of Corpus Linguistics 2, 181–197.
Čermák F., 1998, Czech National Corpus: Its Character, Goal and Background, In Text, Speech, Dialogue, Proceedings of the First Workshop on Text, Speech, Dialogue-TSD’98, Brno, Czech Republic, September, eds. P. Sojka, V. Matoušek, K. Pala, I. Kopeček, Masaryk University: Brno, 9–14.
Čermák F. Králík J. Kučera K., 1997, Recepce současné češtiny a reprezentativnost korpusu, Slovo a slovesnost 58, 117–124 (Reception of the Contemporary Czech and the Representativeness of Corpus).
Český národní korpus. Úvod a příručka uživatele, 2000. Eds. Kocek J., Kopřivová M., Kučera K., Filozofická fakulta KU Praha (Czech National Corpus. An Introduction and User’s Manual).
Kruyt, J. G. 1993. Design Criteria for Corpora Construction in the Framework of a European Corpora Network. Final Report. Institute for Dutch Lexicology INL: Leiden.
Kučera K., 1998, Diachronní složka Českého národního korpusu: obecné zásady, kontext a současný stav. Listy filologické 121, 303–313 (Diachronic Component of the Czech National Corpus: General Principles, Context and Current State of Affairs).
Norling-Christensen, O. 1992. Preparing a Text Corpus. Computational Tools and Methods for Standardizing, Tagging and Structuring Text Data. Papers in Computational Lexicography COMPLEX’92, ed. by R. Kiefer et al.: 251–259. Research Institute for Linguistics, Hungarian Academy of Sciences: Budapest.
Petkevič V., 2001, Neprojektivní konstrukce v češtině z hlediska automatické morfologické disambiguace (Nonprojective Constructions in Czech from the Viewpoint of an Automatic Morphological Disambiguation of Czech Texts), in Čeština. Univerzália a specifika 3, eds. Z. Hladká, P. Karlík, Masarykova univerzita Brno. 197–206.
Šulc, M. Korpusová lingvistika. První vstup. Karolinum 1999 (Corpus Linguistics. A First Introduction).

Author information

Authors and Affiliations

The Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague
František Čermák

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, University of West Bohemia in Plzeň, Faculty of Applied Sciences, Univerzitní 22, 306-14, Plzeň, Czech Republic
Václav Matoušek, Pavel Mautner, Roman Mouček & Karel Taušer

About this paper

Cite this paper

Čermák, F. (2001). Language Corpora: The Czech Case. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_3

DOI: https://doi.org/10.1007/3-540-44805-5_3
Published: 24 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive
