Abstract
Once text has been digitized, it needs to be encoded and annotated for storage and further processing. This applies to all kinds of text, not just historical text, but because of their special properties (see Section 1.1), historical texts tend to have special requirements. This chapter gives a short overview of two standards that are particularly relevant for the encoding of historical texts: first Unicode, for the encoding of characters, and then TEI, an XML application for annotating texts with structural information and metadata.
Copyright information
© 2012 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Piotrowski, M. (2012). Text Encoding and Annotation Schemes. In: Natural Language Processing for Historical Texts. Synthesis Lectures on Human Language Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-02146-6_5
DOI: https://doi.org/10.1007/978-3-031-02146-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-01018-7
Online ISBN: 978-3-031-02146-6
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 4