Abstract
FactBank is a corpus of news reports containing event mentions annotated with their factuality status, that is, whether they refer to factual situations, to possibilities, or to events that did not (or will not) take place in the world. Annotating this level of information poses several challenges for the annotation procedure. For example: What is the adequate level of annotation (sentence, clause, lexical unit)? Which elements are involved in the linguistic expression of event factuality and should therefore be accounted for in the annotation scheme? Should it be a text-extent annotation or a classification task? This chapter presents the methodological decisions adopted for building FactBank and details the different steps of the annotation process. An analysis of the complexity of the data and of the annotation results suggests that the methodological framework applied for building FactBank (annotation scheme, set of factuality values, etc.) is rich enough to express the necessary distinctions while remaining simple enough to ensure coherent data, as attested by the good interannotation agreement scores obtained.
Notes
- 1.
In this chapter, the terms event and eventuality are used in a very broad sense, covering processes and states as well as other abstract objects such as situations, propositions, facts, and possibilities.
- 2.
- 3.
Events in the examples are identified by marking only their verb, noun, or adjective head, following the convention adopted in TimeML, the specification language for temporal information [44]. Some of the example sentences contain other event expressions (e.g., regretted, claimed, generation); only those relevant to the point being illustrated are underlined.
- 4.
Because of its recent adoption in the NLP area of sentiment analysis, the term polarity is often taken to express only the direction of an opinion (i.e., positive vs. negative). Here, I use the term in its original grammatical sense, that is, as conveying the distinction between affirmative and negative contexts (e.g., [22]). Being more abstract, this definition encompasses the different facets of the positive/negative opposition, and not only the one relevant in opinion mining.
- 5.
Square brackets in this and subsequent examples serve only to make the syntactic complexity of the sentence explicit. They are not part of the annotation scheme, as will become clear when the scheme is presented.
- 6.
This differs from most of the work within truth-conditional semantics, which conceives of modality as independent of the speaker’s perspective (e.g., [29]).
- 7.
The original sentence is example (9b) (http://www.irishtimes.com/newspaper/ireland/2011/0502/1224295867753.html). The other two have been adapted for the sake of the argument.
- 8.
This is equivalent to the notation \(\langle author, nelles\rangle\) in Wiebe’s work. FactBank reverses the order of the nesting (i.e., it places the non-embedded source last) because this puts the most direct source of the event in the outermost position, making the representation easier to read.
- 9.
The vowels naming the vertices, which are derived from the Latin verbs affirmo ‘I affirm’ (A, I) and nego ‘I deny’ (E, O), reflect this distinction.
- 10.
This step is applied here only to illustrate the complete process; the meaning of the sentence alone should already make clear that the event change in the original example is presented with some degree of uncertainty.
- 11.
- 12.
- 13.
- 14.
Note that some events are also factuality markers.
- 15.
As with event markup, only the heads of source expressions are annotated here.
- 16.
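Note 8 describes a purely notational reversal: Wiebe’s \(\langle author, nelles\rangle\) lists the non-embedded source first, whereas FactBank lists it last. The Python sketch below illustrates that conversion under stated assumptions. It is not part of FactBank or its tools: the function names, the underscore separator (yielding nelles_author), and the three-level "police" chain are hypothetical, and source names are assumed not to contain the separator.

```python
from typing import Sequence, Tuple

# Hypothetical helpers, not part of FactBank. The underscore separator is an
# assumption, and source names are assumed not to contain it.
SEP = "_"


def wiebe_to_factbank(chain: Sequence[str], sep: str = SEP) -> str:
    """Wiebe-style order lists the non-embedded source first,
    e.g. ("author", "nelles"); the FactBank-style label lists it last."""
    if not chain:
        raise ValueError("a source chain needs at least one source")
    return sep.join(reversed(chain))


def factbank_to_wiebe(label: str, sep: str = SEP) -> Tuple[str, ...]:
    """Inverse conversion: split the label and restore Wiebe-style order."""
    return tuple(reversed(label.split(sep)))


def most_direct_source(label: str, sep: str = SEP) -> str:
    """In the reversed order the most direct source comes first,
    so it can be read off without inspecting the whole chain."""
    return label.split(sep)[0]


if __name__ == "__main__":
    fb = wiebe_to_factbank(("author", "nelles"))
    print(fb)                      # nelles_author
    print(factbank_to_wiebe(fb))   # ('author', 'nelles')
    print(most_direct_source(fb))  # nelles
    # Hypothetical deeper nesting: the author reports what Nelles said
    # the police claimed.
    print(wiebe_to_factbank(("author", "nelles", "police")))  # police_nelles_author
```

Placing the non-embedded source last means the first element of the label is always the most direct source, which is the reading convenience note 8 points to.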
References
ACE: ACE (Automatic Content Extraction) English Annotation Guidelines for Relations, Version 6.0 (2008.01.07). http://www.ldc.upenn.edu/Projects/ACE/ (2008)
Aikhenvald, A.Y.: Evidentiality. Oxford University Press, Oxford (2004)
Asher, N.: Reference to Abstract Objects in English. Kluwer Academic Press, Dordrecht (1993)
Bach, K., Harnish, R.M.: Linguistic Communication and Speech Acts. The MIT Press, Cambridge (1979)
Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of the 17th International Conference on Computational Linguistics, pp. 86–90 (1998)
Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D.: Automatic extraction of opinion propositions and their holders. In: 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text (2004)
Biber, D., Finegan, E.: Styles of stance in English: Lexical and grammatical marking of evidentiality and affect. Text 9(1), 93–124 (1989)
Carlson, L., Marcu, D., Okurowski, M.E.: Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. In: Kuppevelt, J.V., Smith, R.W. (eds.) Current and New Directions in Discourse and Dialogue. Springer, Berlin (2003)
Chafe, W.: Evidentiality in English conversation and academic writing. In: Chafe, W., Nichols, J. (eds.) Evidentiality: The Linguistic Coding of Epistemology. Ablex Publishing Corporation, Norwood (1986)
Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: Identifying sources of opinions with conditional random fields and extraction patterns. In: Proceedings of the HLT/EMNLP (2005)
Dalianis, H., Skeppstedt, M.: Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 5–13 (2010)
de Marneffe, M.-C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC 2006, pp. 449–454. Genoa (2006)
de Marneffe, M.-C., Manning, C.D., Potts, C.: Did it happen? The pragmatic complexity of veridicality assessment. Comput. Linguist. 38(2), 301–333 (2012)
Diab, M., Dorr, B., Levin, L., Mitamura, T., Passonneau, R., Rambow, O., Ramshaw, L.: Language Understanding Annotation Corpus. Linguistic Data Consortium. LDC2009T10 (2009)
Dor, D.: Representations, Attitudes and Factivity Evaluations. An Epistemically-based Analysis of Lexical Selection. PhD thesis, Stanford University (1995)
Farkas, R., Vincze, V., Móra, G., Csirik, J., Szarvas, G.: The CoNLL-2010 Shared Task: Learning to detect hedges and their scope in natural language text. In: Proceedings of the 14th Conference on Computational Natural Language Learning Shared Task, pp. 1–12 (2010)
Haan, F.d.: The Interaction of Modality and Negation: A Typological Study. Garland, New York (1997)
Halliday, M.A.K., Matthiessen, C.M.I.M.: An Introduction to Functional Grammar. Hodder Arnold, London (2004)
Henriksson, A., Velupillai, S.: Levels of certainty in knowledge-intensive corpora: an initial annotation study. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 41–45 (2010)
Hooper, J.B.: On assertive predicates. In: Kimball, J. (ed.) Syntax and Semantics, IV, pp. 91–124. Academic Press, New York (1975)
Horn, L.R.: On the Semantic Properties of Logical Operators in English. PhD thesis, UCLA. Distributed by the Indiana University Linguistics Club in 1976 (1972)
Horn, L.R.: A Natural History of Negation. University of Chicago Press, Chicago (1989)
Hyland, K.: Writing without conviction? Hedging in science research articles. Appl. Linguist. 17(4), 433–454 (1996)
Karttunen, L.: Implicative verbs. Language 47, 340–358 (1971)
Kiefer, F.: On defining modality. Folia Linguist. XXI(1), 67–94 (1987)
Kilicoglu, H., Bergler, S.: Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC Bioinform. 9(Suppl 11), S10 (2008)
Kim, J.-D., Ohta, T., Tsujii, J.: Corpus annotation for mining biomedical events from literature. BMC Bioinform. 9(1), 10 (2008)
Kiparsky, P., Kiparsky, C.: Fact. In: Bierwisch, M., Heidolph, K.E. (eds.) Progress in Linguistics. A Collection of Papers, pp. 143–173. Mouton, The Hague, Paris (1970)
Kratzer, A.: Modality. In: Stechow, A.v., Wunderlich, D. (eds.) Semantik: Ein internationales Handbuch der zeitgenössischen Forschung, pp. 639–650. Walter de Gruyter, Berlin (1991)
Lakoff, G.: Hedges: a study in meaning criteria and the logic of fuzzy concepts. J. Philos. Log. 2(4), 458–508 (1973)
Light, M., Qiu, X.Y., Srinivasan, P.: The language of bioscience: facts, speculations, and statements in between. In: BioLINK 2004: Linking Biological Literature, Ontologies, and Databases, pp. 17–24 (2004)
Lyons, J.: Semantics. Cambridge University Press, Cambridge (1977)
Martin, J.R., White, P.R.R.: Language of Evaluation: Appraisal in English. Palgrave Macmillan (2005)
Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 992–999 (2007)
Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: The Penn Discourse Treebank. In: Proceedings of LREC 2004 (2004)
Morante, R., Daelemans, W.: Learning the scope of hedge cues in biomedical texts. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp. 28–36 (2009)
Morante, R., Daelemans, W.: A metalearning approach to processing the scope of negation. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 21–29 (2009)
Mushin, I.: Evidentiality and Epistemological Stance. John Benjamins Publisher, Amsterdam (2001)
Nawaz, R., Thompson, P., Ananiadou, S.: Evaluating a meta-knowledge annotation scheme for bio-events. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 69–77 (2010)
Ohta, T., Kim, J.-D., Tsujii, J.: Guidelines for event annotation (2007)
Palmer, F.R.: Mood and Modality. Cambridge University Press, Cambridge (1986)
Palmer, M., Gildea, D., Kingsbury, P.: The Proposition Bank: An annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–105 (2005)
Prasad, R., Dinesh, N., Lee, A., Joshi, A., Webber, B.: Attribution and its annotation in the Penn Discourse TreeBank. Traitement Automatique des Langues 47(2), 43–64 (2007)
Pustejovsky, J., Knippen, R., Littman, J., Saurí, R.: Temporal and event information in natural language text. Language Resources and Evaluation 39(2–3), 123–164 (2005)
Pustejovsky, J., Verhagen, M., Saurí, R., Littman, J., Gaizauskas, R., Katz, G., Mani, I., Knippen, R., Setzer, A.: TimeBank 1.2. Linguistic Data Consortium (LDC), Philadelphia, Pennsylvania. LDC Catalog No. 2006T08 (2006)
Rizomilioti, V.: Exploring epistemic modality in academic discourse using corpora. In: Maci, E.A., Cervera, A.S., Ramos, C.R. (eds.) Information Technology in Languages for Specific Purposes, vol. 7, pp. 53–71. Springer, US (2006)
Rubin, V.L.: Identifying Certainty in Texts. PhD thesis, Syracuse University (2006)
Rubin, V.L.: Stating with certainty or stating with doubt: intercoder reliability results for manual annotation of epistemically modalized statements. In: Proceedings of the NAACL-HLT 2007, pp. 141–144 (2007)
Rubin, V.L.: Epistemic modality: From uncertainty to certainty in the context of information seeking as interactions with texts. Inf. Process. Manag. 46, 533–540 (2010)
Rubin, V.L., Liddy, E.D., Kando, N.: Certainty identification in texts: categorization model and manual tagging results. In: Shanahan, J., Qu, Y., Wiebe, J. (eds.) Computing Attitude and Affect in Text: Theories and Applications. Springer, New York (2005)
Saurí, R.: A Factuality Profiler for Eventualities in Text. PhD thesis, Brandeis University (2008)
Saurí, R., Pustejovsky, J.: From structure to interpretation: a double-layered annotation for event factuality. In: Proceedings of the 2nd Linguistic Annotation Workshop (The LAW II). LREC 2008, Marrakech, Morocco (2008)
Saurí, R., Pustejovsky, J.: FactBank: a corpus annotated with event factuality. Lang. Res. Eval. 43, 227–268 (2009)
Saurí, R., Pustejovsky, J.: Are you sure that this happened? Assessing the factuality degree of events in text. Comput. Linguist. 38(2), 261–299 (2012)
Saurí, R., Verhagen, M., Pustejovsky, J.: Annotating and recognizing event modality in text. In: 19th International FLAIRS Conference, FLAIRS 2006 (2006)
Szarvas, G.: Hedge classification in biomedical texts with a weakly supervised selection of keywords. In: Proceedings of ACL-08: HLT, pp. 281–289 (2008)
Van Valin, R.D., LaPolla, R.J.: Syntax. Structure, Meaning and Function. Cambridge University Press, Cambridge (1997)
Velupillai, S.: Towards a better understanding of uncertainties and speculations in Swedish clinical text: analysis of an initial annotation trial. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 14–22 (2010)
Vincze, V., Szarvas, G., Farkas, R., Móra, G., Csirik, J.: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinform. 9(Suppl 11), S9 (2008)
Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. Lang. Res. Eval. 39(2–3), 165–210 (2005)
Wilbur, W.J., Rzhetsky, A., Shatkay, H.: New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinform. 7(1), 356+ (2006)
Wolf, F., Gibson, E.: Representing discourse coherence: a corpus-based analysis. Comput. Linguist. 31(2), 249–287 (2005)
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Saurí, R. (2017). Building FactBank or How to Annotate Event Factuality One Step at a Time. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_34
DOI: https://doi.org/10.1007/978-94-024-0881-2_34
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)