Abstract
This chapter provides a broad overview of the state of the art in standards development for language resources, beginning with a brief historical overview to serve as context. It describes in some detail several major current efforts that define the standardization landscape for language resources today, with the aim of outlining their differences and commonalities and, more generally, identifying the progress made to date as well as the obstacles to definitive standardization. In addition to describing the standards most applicable to linguistic annotation of text, we include a section that surveys considerations and alternatives for spoken data. We also review a widely used and influential de facto standard and consider its role in standards development. Finally, we assess the standards landscape and the options available to current and future creators of linguistically annotated resources.
Notes
- 1.
Note that until roughly 2001, the separation of physical format and linguistic information was typically not taken into account in the development of standards for language resources.
- 2.
The Poughkeepsie Principles together with an accounting of the founding assumptions and sponsors of the TEI are available at http://www.tei-c.org/Vault/ED/edp01.htm
- 3.
SGML was formally adopted as an ISO standard in 1986; see [62].
- 4.
The TEI Guidelines were later converted to the Extensible Markup Language (XML), which superseded SGML in the mid-1990s and whose design was influenced by work undertaken in the TEI project.
- 5.
Originally called “remote markup”; see http://www.cs.vassar.edu/CES/CES1-5.html
- 6.
ISLE (International Standards for Language Engineering), a standards-oriented transatlantic initiative, was established in 2000 as a continuation of EAGLES.
- 7.
EAGLES Guidelines are still available at http://www.ilc.cnr.it/EAGLES/browse.html
- 8.
- 9.
Chapter “Designing Annotation Schemes: From Model to Representation”, Sect. 2, in this volume provides a history of the development of standards for physical format.
- 10.
See chapter “Designing Annotation Schemes: From Model to Representation”, Sect. 5.2.
- 11.
In particular, the TEI Guidelines contain a wealth of examples for each element and the major constructs they allow.
- 12.
For instance, the class att.global contains general-purpose attributes such as the W3C’s @xml:id and @xml:lang and the TEI’s generic @n (for local numbering) and @rend (for rendering information).
- 13.
ISO 24610-1:2006 Language resource management – Feature structures – Part 1: Feature structure representation.
- 14.
See the implementation in the National Corpus of Polish [98].
- 15.
See for instance [103] for introducing TBX entries within a TEI document.
- 16.
See http://morphadorner.northwestern.edu, with the annotation tagset described in http://panini.northwestern.edu/mmueller/nupos.pdf
- 17.
- 18.
- 19.
See also chapter “Designing Annotation Schemes: From Model to Representation”, Sect. 3.2.4 for a description of the MMAX2 annotation tool.
- 20.
The reference specification of the TEI-based TXM pivot format is available at http://txm.sourceforge.net/wiki/index.php/XML-TXM
- 21.
This is a version based on ISO 24615:2010 SynAF, with the title changed.
- 22.
See chapter “Case Study: The Manually Annotated Sub-Corpus (MASC)”.
- 23.
Two additional standards, ISO 24617-6 SemAF Principles and ISO 24617-8 ISO DR-Core, were published in 2016.
- 24.
See chapter “Building FactBank or How to Annotate Event Factuality One Step at a Time” for an example of ISO-TimeML applied to language data.
- 25.
Copied from [80].
- 26.
<TIMEX3 xml:id="t21"/> may be treated as an element, called a non-consuming tag, that has no associated markable expression in the text; the value of its attribute @target is therefore empty. See ISOspace [70], A.3.4 Special Section: Non-consuming tags.
- 27.
See chapter “It-TimeML and the Ita-TimeBank: Language Specific Adaptations for Temporal Annotation” for an example of ISOspace applied to language data.
- 28.
The noun Mia is tagged as se (spatial entity) because it is spatially involved as the figure of the event lives near Harvard in Cambridge.
- 29.
\(_{pl7}\) is a non-consuming tag referring to some spot on the Charles River that is crossed.
- 30.
A new attribute @dir for the direction of a motion may need to be introduced to annotate a markable such as eastward.
- 31.
The informative annex B in SemAF-SR [69] reviews these existing frameworks in detail.
- 32.
See [17], p. 41.
- 33.
The specification of the annotation structure here is much simplified, differing from that presented in [17].
- 34.
See Annex C.3 Concrete syntax, SemAF-SR [69].
- 35.
See chapters “Semantic Annotation of MASC” and “VerbNet/OntoNotes-Based Sense Annotation”.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
See e.g. http://clarkparsia.com/pellet/icv/
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
Note that converters from many graph-based formats to CoNLL IOB exist, but the reverse conversion from CoNLL IOB into these formats is significantly more challenging.
References
Allen, J., Core, M.: DAMSL: dialogue act markup in several layers (Draft 2.1). Technical report. University of Rochester, Rochester, NY (1997). http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/
Allwood, J.: On dialogue cohesion. Gothenburg Papers in Theoretical Linguistics 65 (1992). Gothenburg University, Department of Linguistics
Auer, S., Hellmann, S.: The web of data: decentralized, collaborative, interlinked and interoperable. In: LREC (2012)
Austin, P.K., Grenoble, L.A.: Current trends in language documentation. Lang. Doc. Descr. 4, 12–25 (2007)
Bigi, B., Hirst, D.: SPeech phonetization alignment and syllabification (SPPAS): a tool for the automatic analysis of speech prosody. In: Speech Prosody, Shanghai, China, pp. 1–4. (2012). https://hal.archives-ouvertes.fr/hal-00983699
Bird, S., Klein, E.: Phonological events. Journal of Linguistics 26, 33–56 (1990)
Bird, S., Liberman, M.: A formal framework for linguistic annotation. Speech Communication 33(1–2), 23–60 (2001)
Boersma, P., Weenink, D.: Praat, a system for doing phonetics by computer. Glot International 5(9/10), 341–345 (2001)
Breen, M., Dilley, L.C., Kraemer, J., Gibson, E.: Inter-transcriber agreement for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory 8(2), 277–312 (2012)
Broeder, D., Schuurman, I., Windhouwer, M.: Experiences with the ISOcat data category registry. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pp. 4565–4568. European Language Resources Association (ELRA), Reykjavik, Iceland (2014)
Buchholz, S., Marsi, E.: CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pp. 149–164 (2006)
Bunt, H.: Context and dialogue control. Think Quarterly 3(1), 19–31 (1994)
Bunt, H.: Dialogue pragmatics and context specification. In: Bunt, H., Black, W. (eds.) Abduction, Belief and Context in Dialogue. Studies in Computational Pragmatics, pp. 81–150. John Benjamins, Amsterdam (2000)
Bunt, H.: The DIT++ taxonomy for functional dialogue markup. In: Heylen, D., Pelachaud, C., Catizone, R., Traum, D. (eds.) Proceedings of AAMAS 2009 Workshop "Towards a Standard Markup Language for Embodied Dialogue Acts" (EDAML 2009), Budapest, pp. 13–24 (2009)
Bunt, H.: A methodology for designing semantic annotation languages exploring semantic-syntactic iso-morphisms. In: Fang, A., Ide, N., Webster, J. (eds.) Proceedings of the Second International Conference on Global Interoperability for Language Resources (ICGL 2010), pp. 29–46. Department of Chinese, Translation and Linguistics, City University of Hong Kong, Hong Kong (2010)
Bunt, H.: Introducing abstract syntax + semantics in semantic annotation, and its consequences for the annotation of time and events. In: Lee, E., Yoon, A. (eds.) Recent Trends in Language and Knowledge Processing, pp. 157–204. Hankookmunhwasa, Seoul (2011)
Bunt, H., Palmer, M.: Conceptual and representational choices in defining an ISO standard for semantic role annotation. In: Bunt, H. (ed.) Proceedings of the 9th Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation (ISA-9), pp. 41–50. Association for Computational Linguistics, Potsdam, Germany (2013). http://www.aclweb.org/anthology/W13-0500
Bunt, H., Pustejovsky, J.: Annotating event and temporal quantification. In: Proceedings of the Fifth Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation ISA-5, pp. 15–22 (2010)
Bunt, H., Alexandersson, J., Carletta, J., Choe, J.W., Fang, A.C., Hasida, K., Lee, K., Petukhova, V., Popescu-Belis, A., Romary, L., Soria, C., Traum, D.: Towards an ISO standard for dialogue act annotation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10) (2010)
Bunt, H., Alexandersson, J., Choe, J.W., Fang, A.C., Hasida, K., Petukhova, V., Popescu-Belis, A., Traum, D.: ISO 24617-2: a semantically-based standard for dialogue annotation. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey (2012)
Buschmeier, H., Wlodarczak, M.: TextGridTools: a TextGrid processing and analysis toolkit for Python. In: Tagungsband der 24. Konferenz zur Elektronischen Sprachsignalverarbeitung (ESSV 2013), pp. 152–157 (2013)
Carletta, J., Isard, S., Kowtko, J., Doherty-Sneddon, G.: HCRC dialogue structure coding manual. Technical report HCRC/TR-82 (1996)
Carletta, J., Dahlbäck, N., Reithinger, N., Walker, M.A.: Standards for dialogue coding in natural language processing. Technical report no. 167. Report from Dagstuhl seminar number 9706 (1997)
Chiarcos, C.: Ontologies of linguistic annotation: survey and perspectives. In: LREC. European Language Resources Association (2012)
Cinková, S.: From PropBank to EngValLex: adapting the PropBank-Lexicon to the valency theory of the functional generative description. In: Proceedings of the 6th Edition of International Conference on Language Resources and Evaluation (LREC 2006), pp. 2170–2175 (2006)
Corpus Encoding Standard (1994). http://www.cs.vassar.edu/CES/CES1.html
Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: GATE: a framework and graphical development environment for robust NLP tools and applications. In: ACL (2002). doi:10.3115/1073083.1073112. http://www.aclweb.org/anthology/P02-1022
de Marneffe, M.C., Manning, C.D.: The Stanford typed dependencies representation. In: Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation, pp. 1–8 (2008)
de Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (2006)
de Marneffe, M.C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., Manning, C.D.: Universal Stanford dependencies: a cross-linguistic typology. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), pp. 4585–4592 (2014)
Dhillon, R., Bhagat, S., Carvey, H., Shriberg, E.: Meeting recorder project: dialogue labelling guide. ICSI Technical Report TR-04-002 (2004)
Di Eugenio, B., Jordan, P.W., Pylkkanen, L.: The COCONUT project: dialogue annotation manual. ISP Technical Report 98–1, University of Pittsburgh (1998)
Eckle-Kohler, J., Gurevych, I., Hartmann, S., Matuschek, M., Meyer, C.M.: UBY-LMF - exploring the boundaries of language-independent lexicon models. In: Francopoulo, G. (ed.) LMF Lexical Markup Framework, Chap. 10, pp. 145–156. ISTE - HERMES - Wiley, London (2013)
Farrar, S., Langendoen, D.T.: A linguistic ontology for the semantic web. GLOT International 7, 97–100 (2003)
Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Ferrucci, D., Lally, A.: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering 10(3/4), 327–348 (2004)
Fillmore, C.J.: The case for case. In: Bach, E., Harms, R. (eds.) Universals in Linguistic Theory, pp. 1–89. Holt, Rinehart, and Winston (1968)
Fillmore, C., Baker, C., Sato, H.: FrameNet as a “net”. In: Proceedings of the 4th Edition of International Conference on Language Resources and Evaluation (LREC 2004), pp. 1091–1094 (2004)
Francopoulo, G. (ed.): LMF: Lexical Markup Framework. Wiley-ISTE, London (2013)
Gibbon, D.: Time types and time trees: prosodic mining and alignment of temporally annotated data. In: Sudhoff, S., Lenertova, D., Meyer, R., Pappert, S., Augurzky, P., Mleinek, I., Richter, N., Schließer, J. (eds.) Methods in Empirical Prosody Research, pp. 281–209. Walter de Gruyter, Berlin (2006)
Gibbon, D.: Modelling gesture as speech: a linguistic approach. Poznań Studies in Contemporary Linguistics 47, 470–508 (2011)
Gibbon, D., Moore, R., Winski, R. (eds.): Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin (1997)
Gibbon, D., Mertins, I., Moore, R.: Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. The Springer International Series in Engineering and Computer Science. Springer US (2000). http://books.google.com/books?id=Ntb0T7gfIn8C
Głowińska, K., Przepiórkowski, A.: The design of syntactic annotation levels in the National Corpus of Polish. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), pp. 19–21. European Language Resources Association (ELRA), Valletta, Malta (2010)
Grishman, R.: TIPSTER architecture design document version 2.2. Technical report, Defense Advanced Research Projects Agency (1996)
Gurevych, I., Eckle-Kohler, J., Hartmann, S., Matuschek, M., Meyer, C.M., Wirth, C.: UBY - a large-scale unified lexical-semantic resource. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, pp. 580–590 (2012)
Hellmann, S., Lehmann, J., Auer, S.: Linked-data aware URI schemes for referencing text fragments. EKAW 2012. LNCS, vol. 7603. Springer, New York (2012)
Hirst, D., Di Cristo, A.: Intonation Systems: A Survey of Twenty Languages. Cambridge University Press, Cambridge (1998). http://www.google.com.sg/books?id=LClvNiI4k0sC
Ide, N., Veronis, J.: Multext: multilingual text tools and corpora. In: COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics (1994). http://aclweb.org/anthology/C94-1097
Ide, N., Veronis, J.: Encoding dictionaries. In: Ide, N., Veronis, J. (eds.) The Text Encoding Initiative: Background and Context. Kluwer Academic Publishers, Dordrecht (1995)
Ide, N., Romary, L.: Standards for language resources. In: Proceedings of the IRCS Workshop on Linguistic Databases, Philadelphia, PA, pp. 141–149 (2001)
Ide, N., Romary, L.: Outline of the international standard linguistic annotation framework. In: Proceedings of ACL’03 Workshop on Linguistic Annotation: Getting the Model Right, pp. 1–5 (2003)
Ide, N., Romary, L.: International standard for a linguistic annotation framework. Natural Language Engineering 10(3–4), 211–225 (2004)
Ide, N., Romary, L.: A registry of standard data categories for linguistic annotation. In: Proceedings of the Fourth Language Resources and Evaluation Conference (LREC), Lisbon, pp. 135–139 (2004)
Ide, N., Romary, L.: Towards international standards for language resources. In: Dybkjaer, L., Hemsen, H., Minker, W. (eds.) Evaluation of Text and Speech Systems, pp. 263–284. Springer, New York (2007)
Ide, N., Suderman, K.: GrAF: a graph-based format for linguistic annotations. In: Proceedings of the Linguistic Annotation Workshop (LAW), pp. 1–8. Association for Computational Linguistics (2007)
Ide, N., Pustejovsky, J.: What does interoperability mean, anyway? Toward an operational definition of interoperability. In: Proceedings of the Second International Conference on Global Interoperability for Language Resources. Hong Kong (2010)
Ide, N., Suderman, K.: The linguistic annotation framework: a standard for annotation interchange and merging. Language Resources and Evaluation 48(3), 395–418 (2014)
Ide, N., Bonhomme, P., Romary, L.: XCES: an XML-based encoding standard for linguistic corpora. In: Proceedings of the Second International Language Resources and Evaluation Conference (LREC’00) (2000)
Ide, N., Baker, C., Fellbaum, C., Passonneau, R.: The Manually Annotated Sub-Corpus: A Community Resource For and By the People. In: Proceedings of the ACL 2010 Conference Short Papers, pp. 68–73. Association for Computational Linguistics, Uppsala, Sweden (2010)
Ide, N., Pustejovsky, J., Suderman, K., Verhagen, M.: The language application grid web service exchange vocabulary. In: Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT (OIAF4HLT). Dublin (2014)
International Organization for Standardization: ISO 8879:1986: Information processing – Text and office systems – Standard Generalized Markup Language (SGML). ISO, Geneva (1986)
ISO 24612:201 Language resource management - Linguistic annotation framework (LAF), ISO, Geneva. ISO Working Group: ISO/TC 37/SC 4/WG 2 convenor and project leader, Nancy Ide
ISO: ISO 8601:2004 Data elements and interchange formats – Information interchange – Representation of dates and times. ISO, Geneva (2004)
ISO: ISO 24612:2012 Language resource management - Linguistic annotation framework (LAF). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 1, Convenor and project leader: Nancy Ide (2012)
ISO: ISO 24617-1:2012 Language resource management - Semantic annotation framework - Part 1: time and events (SemAF-Time, ISO-TimeML). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 2, Editors: James Pustejovsky (chair), Harry Bunt, Kiyong Lee (convenor and project leader), Bran Boguraev, and Nancy Ide in cooperation with the TimeML Working Group (2012). http://www.timeml.org
ISO: ISO 24617-2:2012 Language resource management - Semantic annotation framework - Part 2: dialogue acts (SemAF-DA). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 2 Convenor: Kiyong Lee. Project leader: Harry Bunt (2012)
ISO: 24612:2012 Language resource management, Linguistic annotation framework (LAF). ISO, Geneva, Switzerland (2012)
ISO: ISO 24617-4:2014 Language resource management - Semantic annotation framework - Part 4: Semantic roles (SemAF-SR). ISO, Geneva. ISO Working Group:TC 37/SC 4/WG 2 Convenor: Kiyong Lee, Project leader: Martha Palmer, Writers: Martha Palmer (USA), Collin Baker (USA), Claire Bonial (USA), Harry Bunt (Holland), Katrin Erk (USA, Germany), Olga Petukhova (Germany), James Pustejovsky (USA), Zdenka Uresova (the Czech Republic), Nianwen Xue (USA, China) (2014)
ISO: ISO 24617-7:2014 Language resource management - Part 7: spatial information (ISOspace). ISO, Geneva. ISO Working Group: TC 37/SC 4/WG 2, Project leaders: James Pustejovsky and Kiyong Lee, supported by the ISOspace Working Group headed by James Pustejovsky at Brandeis University, Waltham, MA, USA (2014). ISO-Space project homepage: https://sites.google.com/site/wikiisospace/
Katz, G.: Annotating temporal and event quantification. Annotating, Extracting and Reasoning About Time and Events, pp. 88–106 (2007)
Kipper, K., Korhonen, A., Ryant, N., Palmer, M.: A large-scale classification of English verbs. Language Resources and Evaluation 42, 21–40 (2008)
Kipper-Schuler, K.: VerbNet: a broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania (2005)
Klessa, K., Gibbon, D.: Annotation Pro + TGA: automation of speech timing analysis. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA), Reykjavik, Iceland (2014)
Knuth, D.E.: Literate Programming. CSLI Lecture Notes. CSLI, Stanford (1992)
Kübler, S., McDonald, R., Nivre, J.: Dependency Parsing. Morgan and Claypool, San Rafael (2009)
Romary, L.: TEI and LMF crosswalks. Journal for Language Technology and Computational Linguistics (JLCL) 30(1) (2009). http://www.jlcl.org. hal-00762664v4
Lee, K.: Formal Semantics for Temporal Annotation. Lecture Notes for CIL, vol. 18 (2008)
Lee, K.: A compositional interval semantics for temporal annotation. In: Lee, E., Yoon, A. (eds.) Recent Trends in Language and Knowledge Processing, pp. 157–204. Hankookmunhwasa, Seoul. Presented at the workshop on language and knowledge processing, Pusan National University, in summer 2008 (2011)
Lee, K.: The annotation of measure expressions in ISO standards. In: Bunt, H. (ed.) Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11). QMUL, London. A satellite workshop of IWCS 2015, London, U.K (2015)
Lee, K., Romary, L.: Towards interoperability of ISO standards for language resource management. In: Fang, A.C., Ide, N., Webster, J. (eds.) Proceedings of Language Resources and Interoperability, The Second International Conference on Global Interoperability for Language Resources (ICGL201), Hong Kong, pp. 95–104 (2010)
Mani, I., Hitzeman, J., Richer, J., Harris, D., Quimby, R., Wellner, B.: SpatialML: annotation scheme, corpora, and tools. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08). European Language Resources Association (ELRA), Marrakech, Morocco (2008). http://www.lrec-conf.org/proceedings/lrec2008/
McDonald, R., Petrov, S., Hall, K.: Multi-source transfer of delexicalized dependency parsers. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 62–72 (2011)
McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N., Lee, J.: Universal dependency annotation for multilingual parsing. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 92–97 (2013)
McNeill, D. (ed.): Language and Gesture: Window into Thought and Action. Cambridge University Press, Cambridge (2000)
Mehler, A., Romary, L., Gibbon, D. (eds.): Handbook of Technical Communication. Handbooks of Applied Linguistics. De Gruyter Mouton, Berlin and Boston (2012)
MITRE: SpatialML: annotation scheme for marking spatial expressions in natural language. The MITRE Corporation (2009). Version 3.1, October 1, 2009, Contact: cdoran@mitre.org
Nivre, J., Hall, J., Nilsson, J.: MaltParser: a data-driven parser-generator for dependency parsing. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), pp. 2216–2219 (2006)
Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007, pp. 915–932 (2007)
Palmer, M., Gildea, D., Kingsbury, P.: The proposition bank: an annotated corpus of semantic roles. Computational Linguistics 31(1), 71–106 (2005)
Peroni, S., Vitali, F.: Annotations with earmark for arbitrary, overlapping and out-of order markup. In: Borghoff, U.M., Chidlovskii, B. (eds.) ACM Symposium on Document Engineering, pp. 171–180. ACM, New York (2009)
Petrov, S., Das, D., McDonald, R.: A universal part-of-speech tagset. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC) (2012)
Petukhova, V., Bunt, H.: The independence of dimensions in multidimensional dialogue act annotation. In: Proceedings NAACL HLT Conference, Boulder, Colorado (2009)
Petukhova, V., Bunt, H., Schiffrin, A.: LIRICS semantic role annotation: design and evaluation of a set of data categories. In: Proceedings of the 6th Edition of International Conference on Language Resources and Evaluation (LREC 2008). Marrakech (2007)
Petukhova, V., Prévot, L., Bunt, H.: Discourse relations in dialogue. In: Proceedings 6th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation (ISA-6). Oxford, UK (2011)
Popescu-Belis, A.: Dialogue acts: one or more dimensions? ISSCO Working Paper 62. ISSCO, Geneva (2005). http://www.issco.unige.ch/publicaitons/working-papers/papers/apb-issco-wp62b.pdf
Pratt-Hartmann, I.: From TimeML to interval temporal logic. In: Bunt, H. (ed.) Proceedings of the Seventh International Workshop on Computational Semantics (IWCS-7), Tilburg, The Netherlands, pp. 166–180 (2007)
Przepiórkowski, A.: TEI P5 as an XML standard for treebank encoding, pp. 149–160 (2009)
Pustejovsky, J., Gaizauskas, R., Saurí, R., Setzer, A., Ingria, R.: Annotation guideline to TimeML 1.0 (2002). Available at http://timeml.org
Pustejovsky, J., Ingria, R., Saurí, R., Castaño, J., Littman, J., Gaizauskas, R., Setzer, A., Katz, G., Mani, I.: The specification language TimeML. In: Mani, I., Pustejovsky, J., Gaizauskas, R. (eds.) The Language of Time: a Reader, pp. 545–557. Oxford University Press, Cambridge (2005)
Pustejovsky, J., Lee, K., Bunt, H., Romary, L.: ISO-TimeML: an international standard for semantic annotation. In: Proceedings of LREC2010. Malta (2010)
Rizzo, G., Troncy, R., Hellmann, S., Bruemmer, M.: NERD meets NIF: lifting NLP extraction results to the linked data cloud. In: LDOW (2012)
Romary, L.: TBX goes TEI - implementing a TBX basic extension for the text encoding initiative guidelines. In: Terminology and Knowledge Engineering 2014, Berlin, Germany (2014). hal-00950862v2
Romary, L., Bonhomme, P.: Parallel alignment of structured documents. Parallel Text Processing, pp. 201–217. Springer, New York (2000)
Rossini, N.: Reinterpreting Gesture as Language - Language in Action. IOS Press, Amsterdam (2012)
Rubiera, E., Polo, L., Berrueta, D., Ghali, A.E.: Telix: an RDF-based model for linguistic annotation. In: ESWC (2012)
Schierle, M.: Language engineering for information extraction. Ph.D. thesis, Universität Leipzig (2011)
Schiffrin, A., Bunt, H.: LIRICS deliverable D4.3: documented compilation of semantic data categories (2007). http://lirics.loria.fr
Schmidt, T.: A TEI-based approach to standardising spoken language transcription. Journal of the Text Encoding Initiative 1 (2011). http://jtei.revues.org/142. doi:10.4000/jtei.142
Sperberg-McQueen, C., Burnard, L. (eds.): Guidelines for electronic text encoding and interchange. TEI P3. Text Encoding Initiative, Oxford, Providence, Charlottesville, Bergen (1994)
Szymański, M., Bachan, J.: Interlabeller agreement on segmental and prosodic annotation of the Jurisdict Polish database. Speech and Language Technology 14/15, 105–121 (2012)
TEI Consortium (ed.): Guidelines for electronic text encoding and interchange. TEI P5. Text Encoding Initiative, Oxford, Providence, Charlottesville, Bergen, Nancy (2003)
Teoh, A., Chin, S.: Transcribing the speech of children with cochlear implants: clinical application of narrow phonetic transcriptions. American Journal of Speech-Language Pathology 18(4), 388–401 (2009)
Tobies, S.: Complexity results and practical algorithms for logics in knowledge representation. Ph.D. thesis, TU Dresden (2001)
Erjavec, T., Fišer, D., Krek, S., Ledinek, N.: The JOS linguistically tagged corpus of Slovene. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Malta (2010)
Traum, D.: 20 questions on dialogue act taxonomies. Journal of Semantics 17(1), 7–30 (2000)
Tsarfaty, R.: A unified morpho-syntactic scheme of Stanford dependencies. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 578–584 (2013)
Windhouwer, M.: RELcat: a relation registry for ISOcat data categories. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, pp. 3661–3664 (2012)
Windhouwer, M., Wright, S.E.: LMF and the data category registry: principles and application. In: Francopoulo, G. (ed.) LMF Lexical Markup Framework, Chap. 10, pp. 41–50. ISTE - HERMES - Wiley, London (2013)
Zeman, D.: Reusable tagset conversion using tagset drivers. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC), pp. 213–218 (2008)
Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z., Hajič, J.: HamleDT: To parse or not to parse? In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), pp. 2735–2741 (2012)
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Ide, N. et al. (2017). Community Standards for Linguistically-Annotated Resources. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_4
DOI: https://doi.org/10.1007/978-94-024-0881-2_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)