Abstract
This paper proposes applying RDF to the representation of linguistic annotations. We argue that RDF is a suitable data model for capturing multiple annotations on the same text segment and for integrating multiple layers of annotation. Beyond this use of RDF, the main contribution of the paper is an OWL ontology, called TELIX (Text Encoding and Linguistic Information eXchange), which models annotation content. This ontology builds on the SKOS XL vocabulary, a W3C standard for representing lexical entities as RDF graphs. We extend SKOS XL to capture lexical relations between words (e.g., synonymy) and to support word sense disambiguation, morphological features and syntactic analysis, among others. In addition, a formal mapping of feature structures to RDF graphs is defined, enabling complex composition of linguistic entities. Finally, the paper suggests the use of RDFa as a convenient syntax that combines source texts and linguistic annotations in the same file.
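To make the idea of mapping feature structures to RDF graphs concrete, the following is a minimal sketch and not the formal mapping defined in the paper. It assumes a feature structure can be modelled as a nested Python dictionary, and it emits plain (subject, predicate, object) tuples rather than using an RDF library. The namespace `http://example.org/fs#`, the function name `feature_structure_to_triples`, and the feature names are all hypothetical and are not taken from the TELIX ontology.

```python
from itertools import count

# Hypothetical namespace for illustration only; this is NOT the TELIX namespace.
EX = "http://example.org/fs#"


def feature_structure_to_triples(fs, subject, counter=None):
    """Flatten a nested feature structure (a dict) into RDF-style triples.

    Each feature becomes a predicate. Atomic values are emitted as-is,
    and nested feature structures are attached through fresh blank nodes.
    """
    if counter is None:
        counter = count(1)
    triples = []
    for feature, value in fs.items():
        predicate = EX + feature
        if isinstance(value, dict):
            # A complex value gets its own blank node, then is expanded recursively.
            node = f"_:b{next(counter)}"
            triples.append((subject, predicate, node))
            triples.extend(feature_structure_to_triples(value, node, counter))
        else:
            triples.append((subject, predicate, value))
    return triples


# Assumed example: a token annotated with a lemma, a part of speech,
# and a nested agreement structure.
word = {
    "lemma": "run",
    "pos": "verb",
    "agreement": {"person": "3", "number": "singular"},
}

triples = feature_structure_to_triples(word, EX + "token1")
for triple in triples:
    print(triple)
```

Because each nested structure becomes a node of its own, several annotations can point at the same text segment or share a substructure, which is the kind of composition the abstract attributes to the graph model.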
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rubiera, E., Polo, L., Berrueta, D., El Ghali, A. (2012). TELIX: An RDF-Based Model for Linguistic Annotation. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds) The Semantic Web: Research and Applications. ESWC 2012. Lecture Notes in Computer Science, vol 7295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30284-8_20
DOI: https://doi.org/10.1007/978-3-642-30284-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30283-1
Online ISBN: 978-3-642-30284-8
eBook Packages: Computer Science (R0)