Text Encoding and Annotation Schemes

Piotrowski, Michael

doi:10.1007/978-3-031-02146-6_5

Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)

Abstract

Once text has been digitized, it needs to be encoded and annotated for storage and further processing. This applies to all kinds of text, not just historical text, but because of their special properties (see Section 1.1), historical texts tend to have special requirements. This chapter gives a short overview of two standards that are particularly relevant for the encoding of historical texts: Unicode, for the encoding of characters, and TEI, an XML application for annotating texts with structural information and metadata.
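
As a rough illustration of how the two standards interact, the following Python sketch inspects the Unicode code points of a word containing the long s (U+017F) and reads original and regularized spellings from a small TEI fragment. The sample sentence, the constant `SAMPLE_TEI`, and the helper names `describe` and `orig_reg_pairs` are assumptions made up for this example and are not taken from the chapter; the `choice`, `orig`, and `reg` elements and the namespace URI are those of TEI P5.

```python
import unicodedata
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical sample document, invented for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Das <choice><orig>Wa\u017f\u017fer</orig><reg>Wasser</reg></choice> ist kalt.</p>
    </body>
  </text>
</TEI>"""


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def orig_reg_pairs(tei_xml):
    """Collect (original, regularized) spellings from every <choice> element."""
    root = ET.fromstring(tei_xml)
    pairs = []
    for choice in root.iter(f"{{{TEI_NS}}}choice"):
        orig = choice.find("tei:orig", NS)
        reg = choice.find("tei:reg", NS)
        pairs.append((orig.text, reg.text))
    return pairs


if __name__ == "__main__":
    for original, regularized in orig_reg_pairs(SAMPLE_TEI):
        print(original, "->", regularized)
        for row in describe(original):
            print("  ", row)
        # NFKC compatibility normalization folds the long s to a plain "s".
        print("  NFKC:", unicodedata.normalize("NFKC", original))
```

Keeping the original spelling in `orig` and the modernized form in `reg` preserves the source reading while still giving downstream tools a normalized form. Compatibility normalization such as NFKC is lossy, so it is better suited to deriving a search key than to replacing the stored text.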

Author information

Authors and Affiliations

Leibniz Institute of European History, Germany
Michael Piotrowski

About this chapter

Cite this chapter

Piotrowski, M. (2012). Text Encoding and Annotation Schemes. In: Natural Language Processing for Historical Texts. Synthesis Lectures on Human Language Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-02146-6_5

DOI: https://doi.org/10.1007/978-3-031-02146-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-01018-7
Online ISBN: 978-3-031-02146-6
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 4