Building FactBank or How to Annotate Event Factuality One Step at a Time

Saurí, Roser

doi:10.1007/978-94-024-0881-2_34

Abstract

FactBank is a corpus of news reports in which event mentions are annotated with their factuality status, that is, whether they refer to factual situations, to possibilities, or to events that did not (or will not) take place in the world. Annotating this level of information raises several methodological challenges. For example: What is the adequate level of annotation (sentence, clause, lexical unit)? Which elements are involved in the linguistic expression of event factuality and should therefore be accounted for in the annotation scheme? Should it be a text-extent annotation or a classification task? This article presents the methodological decisions adopted for building FactBank and details the different steps of the annotation process. An analysis of the complexity of the data and of the annotation results suggests that the methodological framework applied for building FactBank (annotation scheme, set of factuality values, etc.) is rich enough to express the necessary distinctions while remaining simple enough to ensure coherent data, as the good interannotation agreement scores attest.
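The abstract cites interannotation agreement as evidence that the data are coherent. This excerpt does not say which coefficient was used, so the following is only a minimal sketch assuming Cohen's kappa, a common choice when two annotators assign one label to each item. The function name `cohens_kappa` is hypothetical, and the label strings are illustrative, modelled on FactBank-style factuality values.

```python
from collections import Counter
from collections.abc import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label from both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that two independent draws from each
    # annotator's own label distribution coincide.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical annotations of five event mentions (illustrative labels only).
    annotator_a = ["CT+", "CT+", "PR+", "CT-", "Uu"]
    annotator_b = ["CT+", "PS+", "PR+", "CT-", "Uu"]
    print(round(cohens_kappa(annotator_a, annotator_b), 2))  # 0.75
```

In this toy example, the annotators agree on four of five events (observed agreement 0.8). The chance agreement computed from their label distributions is 0.2, so kappa is (0.8 - 0.2) / (1 - 0.2) = 0.75. Correcting for chance matters when a few values dominate the data, because raw percentage agreement would then look inflated.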

Notes

1. In this chapter, the terms event and eventuality are used in a very broad sense to refer to processes and states, but also to other abstract objects such as situations, propositions, facts, and possibilities.
2. The main references for these corpora are: PropBank [42], FrameNet [5], RST Corpus [8], Penn Discourse TreeBank [35], GraphBank [62], TimeBank [45], and MPQA Opinion Corpus [60].
3. Events in the examples are identified by marking only their verb, noun, or adjective head, following the convention of TimeML, the specification language for temporal information [44]. Some of the sentences in these examples contain other event expressions (e.g., regretted, claimed, generation); only those relevant to the example are underlined.
4. Because of its recent adoption in sentiment analysis, the term polarity is often taken to express only the direction of an opinion (i.e., positive vs. negative). Here, I use the term in its original grammatical sense, that is, as conveying the distinction between affirmative and negative contexts (e.g., [22]). Being more abstract, this definition encompasses the different facets of the positive/negative opposition, not only the one relevant in opinion mining.
5. The square brackets in this and the following examples serve only to make the syntactic complexity of the sentence explicit. They are not part of the annotation scheme presented later.
6. This differs from most work within truth-conditional semantics, which conceives of modality as independent of the speaker’s perspective (e.g., [29]).
7. The original sentence is example (9b) (http://www.irishtimes.com/newspaper/ireland/2011/0502/1224295867753.html). The other two have been adapted for the sake of the argument.
8. This is equivalent to the notation ⟨author, nelles⟩ in Wiebe’s work. FactBank reverses the order of the nesting (i.e., the non-embedded source comes last) because this places the most direct source of the event at the outermost layer, which makes the representation easier to read.
9. The vowels naming the vertices (A and I; E and O) derive from the Latin verbs affirmo ‘I affirm’ and nego ‘I deny’, and reflect this distinction.
10. This step is applied here only to illustrate the complete process; the meaning of the sentence alone should make clear that the event change in the original example is presented with some degree of uncertainty.
11. http://www.timeml.org/site/timebank/timebank.html.
12. http://www.timeml.org/site/timebank/timebank.html.
13. The figures reported here update those reported in previous work [51, 52].
14. Note that some events are also factuality markers.
15. As with event markup, only the heads of source expressions are annotated here.
16. See also http://compprag.christopherpotts.net/factbank.html.
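Note 8 describes FactBank's source nesting as the reverse of Wiebe's ⟨author, nelles⟩ notation. The following is a minimal sketch of that reordering. It assumes a source chain is held as a list of source names in Wiebe's order (non-embedded source first) and rendered in FactBank order as a single string. The underscore separator and both function names are illustrative assumptions, not the FactBank file format.

```python
from collections.abc import Sequence

# Hypothetical separator; FactBank's own serialization may differ.
SEP = "_"


def wiebe_to_factbank(nesting: Sequence[str], sep: str = SEP) -> str:
    """Turn a Wiebe-style nesting (non-embedded source first) into a
    FactBank-style chain (most direct source first, non-embedded source last)."""
    if not nesting:
        raise ValueError("a source nesting needs at least one source")
    if any(sep in source for source in nesting):
        raise ValueError(f"source names must not contain {sep!r}")
    return sep.join(reversed(nesting))


def factbank_to_wiebe(chain: str, sep: str = SEP) -> tuple[str, ...]:
    """Inverse mapping: split a FactBank-style chain and restore Wiebe's order."""
    return tuple(reversed(chain.split(sep)))


if __name__ == "__main__":
    print(wiebe_to_factbank(["author", "nelles"]))  # nelles_author
    print(factbank_to_wiebe("nelles_author"))       # ('author', 'nelles')
```

Keeping the chain as an ordered sequence makes the change of notation a plain reversal. The round trip stays lossless as long as no source name contains the separator, which is why the sketch rejects such names.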

References

[1] ACE: ACE (Automatic Content Extraction) English Annotation Guidelines for Relations (Version 6.0, 2008.01.07). http://www.ldc.upenn.edu/Projects/ACE/ (2008)
[2] Aikhenvald, A.Y.: Evidentiality. Oxford University Press, Oxford (2004)
[3] Asher, N.: Reference to Abstract Objects in English. Kluwer Academic Press, Dordrecht (1993)
[4] Bach, K., Harnish, R.M.: Linguistic Communication and Speech Acts. The MIT Press, Cambridge (1979)
[5] Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of the 17th International Conference on Computational Linguistics, pp. 86–90 (1998)
[6] Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D.: Automatic extraction of opinion propositions and their holders. In: 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text (2004)
[7] Biber, D., Finegan, E.: Styles of stance in English: lexical and grammatical marking of evidentiality and affect. Text 9(1), 93–124 (1989)
[8] Carlson, L., Marcu, D., Okurowski, M.E.: Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. In: Kuppevelt, J.V., Smith, R.W. (eds.) Current and New Directions in Discourse and Dialogue. Springer, Berlin (2003)
[9] Chafe, W.: Evidentiality in English conversation and academic writing. In: Chafe, W., Nichols, J. (eds.) Evidentiality: The Linguistic Coding of Epistemology. Ablex Publishing Corporation, Norwood (1986)
[10] Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: Identifying sources of opinions with conditional random fields and extraction patterns. In: Proceedings of HLT/EMNLP (2005)
[11] Dalianis, H., Skeppstedt, M.: Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 5–13 (2010)
[12] de Marneffe, M.-C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC 2006, pp. 449–454. Genoa (2006)
[13] de Marneffe, M.-C., Manning, C.D., Potts, C.: Did it happen? The pragmatic complexity of veridicality assessment. Comput. Linguist. 38(2), 301–333 (2012)
[14] Diab, M., Dorr, B., Levin, L., Mitamura, T., Passonneau, R., Rambow, O., Ramshaw, L.: Language Understanding Annotation Corpus. Linguistic Data Consortium. LDC2009T10 (2009)
[15] Dor, D.: Representations, Attitudes and Factivity Evaluations. An Epistemically-based Analysis of Lexical Selection. PhD thesis, Stanford University (1995)
[16] Farkas, R., Vincze, V., Móra, G., Csirik, J., Szarvas, G.: The CoNLL-2010 Shared Task: learning to detect hedges and their scope in natural language text. In: Proceedings of the 14th Conference on Computational Natural Language Learning: Shared Task, pp. 1–12 (2010)
[17] de Haan, F.: The Interaction of Modality and Negation: A Typological Study. Garland, New York (1997)
[18] Halliday, M.A.K., Matthiessen, C.M.I.M.: An Introduction to Functional Grammar. Hodder Arnold, London (2004)
[19] Henriksson, A., Velupillai, S.: Levels of certainty in knowledge-intensive corpora: an initial annotation study. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 41–45 (2010)
[20] Hooper, J.B.: On assertive predicates. In: Kimball, J. (ed.) Syntax and Semantics, IV, pp. 91–124. Academic Press, New York (1975)
[21] Horn, L.R.: On the Semantic Properties of Logical Operators in English. PhD thesis, UCLA (1972). Distributed by the Indiana University Linguistics Club in 1976
[22] Horn, L.R.: A Natural History of Negation. University of Chicago Press, Chicago (1989)
[23] Hyland, K.: Writing without conviction? Hedging in science research articles. Appl. Linguist. 17(4), 433–454 (1996)
[24] Karttunen, L.: Implicative verbs. Language 47, 340–358 (1971)
[25] Kiefer, F.: On defining modality. Folia Linguist. XXI(1), 67–94 (1987)
[26] Kilicoglu, H., Bergler, S.: Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC Bioinform. 9(Suppl 11), S10 (2008)
[27] Kim, J.-D., Ohta, T., Tsujii, J.: Corpus annotation for mining biomedical events from literature. BMC Bioinform. 9(1), 10 (2008)
[28] Kiparsky, P., Kiparsky, C.: Fact. In: Bierwisch, M., Heidolph, K.E. (eds.) Progress in Linguistics. A Collection of Papers, pp. 143–173. Mouton, The Hague/Paris (1970)
[29] Kratzer, A.: Modality. In: Stechow, A.v., Wunderlich, D. (eds.) Semantik: Ein internationales Handbuch der zeitgenössischen Forschung, pp. 639–650. Walter de Gruyter, Berlin (1991)
[30] Lakoff, G.: Hedges: a study in meaning criteria and the logic of fuzzy concepts. J. Philos. Log. 2(4), 458–508 (1973)
[31] Light, M., Qiu, X.Y., Srinivasan, P.: The language of bioscience: facts, speculations, and statements in between. In: BioLINK 2004: Linking Biological Literature, Ontologies, and Databases, pp. 17–24 (2004)
[32] Lyons, J.: Semantics. Cambridge University Press, Cambridge (1977)
[33] Martin, J.R., White, P.R.R.: The Language of Evaluation: Appraisal in English. Palgrave Macmillan (2005)
[34] Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 992–999 (2007)
[35] Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: The Penn Discourse Treebank. In: Proceedings of LREC 2004 (2004)
[36] Morante, R., Daelemans, W.: Learning the scope of hedge cues in biomedical texts. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp. 28–36 (2009)
[37] Morante, R., Daelemans, W.: A metalearning approach to processing the scope of negation. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 21–29 (2009)
[38] Mushin, I.: Evidentiality and Epistemological Stance. John Benjamins, Amsterdam (2001)
[39] Nawaz, R., Thompson, P., Ananiadou, S.: Evaluating a meta-knowledge annotation scheme for bio-events. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 69–77 (2010)
[40] Ohta, T., Kim, J.-D., Tsujii, J.: Guidelines for event annotation (2007)
[41] Palmer, F.R.: Mood and Modality. Cambridge University Press, Cambridge (1986)
[42] Palmer, M., Gildea, D., Kingsbury, P.: The Proposition Bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–105 (2005)
[43] Prasad, R., Dinesh, N., Lee, A., Joshi, A., Webber, B.: Attribution and its annotation in the Penn Discourse TreeBank. Traitement Automatique des Langues 47(2), 43–64 (2007)
[44] Pustejovsky, J., Knippen, R., Littman, J., Saurí, R.: Temporal and event information in natural language text. Lang. Res. Eval. 39(2–3), 123–164 (2005)
[45] Pustejovsky, J., Verhagen, M., Saurí, R., Littman, J., Gaizauskas, R., Katz, G., Mani, I., Knippen, R., Setzer, A.: TimeBank 1.2. Linguistic Data Consortium (LDC), Philadelphia, Pennsylvania. LDC Catalog No. 2006T08 (2006)
[46] Rizomilioti, V.: Exploring epistemic modality in academic discourse using corpora. In: Maci, E.A., Cervera, A.S., Ramos, C.R. (eds.) Information Technology in Languages for Specific Purposes, vol. 7, pp. 53–71. Springer, US (2006)
[47] Rubin, V.L.: Identifying Certainty in Texts. PhD thesis, Syracuse University (2006)
[48] Rubin, V.L.: Stating with certainty or stating with doubt: intercoder reliability results for manual annotation of epistemically modalized statements. In: Proceedings of NAACL-HLT 2007, pp. 141–144 (2007)
[49] Rubin, V.L.: Epistemic modality: from uncertainty to certainty in the context of information seeking as interactions with texts. Inf. Process. Manag. 46, 533–540 (2010)
[50] Rubin, V.L., Liddy, E.D., Kando, N.: Certainty identification in texts: categorization model and manual tagging results. In: Shanahan, J., Qu, Y., Wiebe, J. (eds.) Computing Attitude and Affect in Text: Theories and Applications. Springer, New York (2005)
[51] Saurí, R.: A Factuality Profiler for Eventualities in Text. PhD thesis, Brandeis University (2008)
[52] Saurí, R., Pustejovsky, J.: From structure to interpretation: a double-layered annotation for event factuality. In: Proceedings of the 2nd Linguistic Annotation Workshop (The LAW II). LREC 2008, Marrakech, Morocco (2008)
[53] Saurí, R., Pustejovsky, J.: FactBank: a corpus annotated with event factuality. Lang. Res. Eval. 43, 227–268 (2009)
[54] Saurí, R., Pustejovsky, J.: Are you sure that this happened? Assessing the factuality degree of events in text. Comput. Linguist. 38(2), 261–299 (2012)
[55] Saurí, R., Verhagen, M., Pustejovsky, J.: Annotating and recognizing event modality in text. In: 19th International FLAIRS Conference, FLAIRS 2006 (2006)
[56] Szarvas, G.: Hedge classification in biomedical texts with a weakly supervised selection of keywords. In: Proceedings of ACL-08: HLT, pp. 281–289 (2008)
[57] Van Valin, R.D., LaPolla, R.J.: Syntax: Structure, Meaning and Function. Cambridge University Press, Cambridge (1997)
[58] Velupillai, S.: Towards a better understanding of uncertainties and speculations in Swedish clinical text: analysis of an initial annotation trial. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 14–22 (2010)
[59] Vincze, V., Szarvas, G., Farkas, R., Móra, G., Csirik, J.: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinform. 9(Suppl 11), S9 (2008)
[60] Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. Lang. Res. Eval. 39(2–3), 165–210 (2005)
[61] Wilbur, W.J., Rzhetsky, A., Shatkay, H.: New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinform. 7(1), 356 (2006)
[62] Wolf, F., Gibson, E.: Representing discourse coherence: a corpus-based analysis. Comput. Linguist. 31(2), 249–287 (2005)

Author information

Authors and Affiliations

Dictionaries Technology Group—Global Academic, Oxford University Press, Oxford, UK
Roser Saurí

Corresponding author

Correspondence to Roser Saurí.

Editor information

Editors and Affiliations

Department of Computer Science, Vassar College, Poughkeepsie, New York, USA
Nancy Ide
Department of Computer Science, Volen Center for Complex Systems, Brandeis University, Waltham, Massachusetts, USA
James Pustejovsky

About this chapter

Cite this chapter

Saurí, R. (2017). Building FactBank or How to Annotate Event Factuality One Step at a Time. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_34

DOI: https://doi.org/10.1007/978-94-024-0881-2_34
Published: 17 June 2017
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)
