Abstract
Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high-quality, reusable annotations at low cost, the focus is on identifying capabilities that are necessary or useful for annotation tools, as well as common problems these tools present that reduce their utility. Although examples of specific tools are provided in many cases, this chapter concentrates more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair. The two core capabilities tools must have are support for the chosen annotation scheme and the ability to work on the language under study. Additional capabilities are organized into three categories: those that are widely provided; those that are often useful but found in only a few tools; and those that have as yet little or no available tool support.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Finlayson, M.A., Erjavec, T. (2017). Overview of Annotation Creation: Processes and Tools. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_5
DOI: https://doi.org/10.1007/978-94-024-0881-2_5
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)