Building FactBank or How to Annotate Event Factuality One Step at a Time

Handbook of Linguistic Annotation

Abstract

FactBank is a corpus of news reports containing event mentions annotated with their factuality status, that is, whether they refer to factual situations, possibilities, or events that did (or will) not take place in the world. Annotating this level of information raises several kinds of challenges for the annotation procedure. For example: What is the adequate level of annotation (sentence, clause, lexical unit)? Which elements are involved in the linguistic expression of event factuality and should thus be accounted for in the annotation scheme? Should it be a text-extent annotation or a classification task? This article presents the methodological decisions adopted for building FactBank and details the different steps of the annotation process. An analysis of the complexity of the data and the annotation results suggests that the methodological framework applied for building FactBank (annotation scheme, set of factuality values, etc.) is rich enough to express the necessary distinctions while, at the same time, simple enough to ensure coherent data, as attested by the good interannotator agreement scores obtained.

Notes

  1. In this chapter, the terms event and eventuality will be used in a very broad sense to refer to both processes and states, but also other abstract objects such as situations, propositions, facts, possibilities, etc.

  2. The main references for these corpora are: PropBank [42], FrameNet [5], RST Corpus [8], Penn Discourse TreeBank [35], GraphBank [62], TimeBank [45], and MPQA Opinion Corpus [60].

  3. Events in the examples will be identified by marking only their verb, noun, or adjective head, following the convention assumed in TimeML, the specification language for temporal information [44]. Some of the sentences in these examples contain other event expressions (e.g., regretted, claimed, generation, etc.). Here, only those that are relevant for the example’s sake are underlined.

  4. Because of its recent adoption in the NLP area of sentiment analysis, the term polarity is often taken to express only the direction of an opinion (i.e., positive vs. negative). Here, I use the term in its original grammatical sense, that is, as conveying the distinction between affirmative and negative contexts (e.g., [22]). Being more abstract, this definition encompasses the different facets of the positive/negative opposition, and not only the one relevant in opinion mining.

  5. The use of square brackets in this and coming examples is only for making explicit the syntactic complexity of the sentence. Square brackets are not part of the annotation scheme, as will be presented later.

  6. This differs from most of the work within truth-conditional semantics, which conceives of modality as independent from the speaker’s perspective (e.g., [29]).

  7. The original sentence is example (9b) (http://www.irishtimes.com/newspaper/ireland/2011/0502/1224295867753.html). The other two have been adapted for the argument’s sake.

  8. This is equivalent to the notation \(\langle author, nelles\rangle\) in Wiebe’s work. FactBank adopts a reversed representation of the nesting (i.e., the non-embedded source last) because it positions the most direct source of the event at the outmost layer, thus facilitating its reading.

  9. The vowels naming the vertices, which are derived from Latin verbs affirmo ‘I affirm’, and nego ‘I deny’, reflect this distinction.

  10. This step is applied here only for the purpose of illustrating the complete process, although it should be clear just from the meaning of the sentence that the event change in the original example is presented with some degree of uncertainty.

  11. http://www.timeml.org/site/timebank/timebank.html.

  12. http://www.timeml.org/site/timebank/timebank.html.

  13. The figures reported here update those reported in previous work [51, 52].

  14. Note that some events are also factuality markers.

  15. Likewise with event markup, only the heads of source expressions are annotated here.

  16. See also http://compprag.christopherpotts.net/factbank.html.

References

  1. ACE: ACE (Automatic Content Extraction) English Annotation Guidelines for Relations (Version 6.0 2008.01.07 ed.). http://www.ldc.upenn.edu/Projects/ACE/ (2008)

  2. Aikhenvald, A.Y.: Evidentiality. Oxford University Press, Oxford (2004)

  3. Asher, N.: Reference to Abstract Objects in English. Kluwer Academic Press, Dordrecht (1993)

  4. Bach, K., Harnish, R.M.: Linguistic Communication and Speech Acts. The MIT Press, Cambridge (1979)

  5. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of the 17th International Conference on Computational Linguistics, pp. 86–90 (1998)

  6. Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D.: Automatic extraction of opinion propositions and their holders. In: 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text (2004)

  7. Biber, D., Finegan, E.: Styles of stance in English: Lexical and grammatical marking of evidentiality and affect. Text 9(1), 93–124 (1989)

  8. Carlson, L., Marcu, D., Okurowski, M.E.: Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. In: Kuppevelt, J.V., Smith, R.W. (eds.) Current and New Directions in Discourse and Dialogue. Springer, Berlin (2003)

  9. Chafe, W.: Evidentiality in English conversation and academic writing. In: Chafe, W., Nichols, J. (eds.) Evidentiality: The Linguistic Coding of Epistemology. Ablex Publishing Corporation, Norwood (1986)

  10. Choi, Y., Cardie, C., Riloff, E., Patwardhan, S.: Identifying sources of opinions with conditional random fields and extraction patterns. In: Proceedings of the HLT/EMNLP (2005)

  11. Dalianis, H., Skeppstedt, M.: Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 5–13 (2010)

  12. de Marneffe, M.-C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC 2006, pp. 449–454. Genoa (2006)

  13. de Marneffe, M.-C., Manning, C.D., Potts, C.: Did it happen? The pragmatic complexity of veridicality assessment. Comput. Linguist. 38(2), 301–333 (2012)

  14. Diab, M., Dorr, B., Levin, L., Mitamura, T., Passonneau, R., Rambow, O., Ramshaw, L.: Language Understanding Annotation Corpus. Linguistic Data Consortium. LDC2009T10 (2009)

  15. Dor, D.: Representations, Attitudes and Factivity Evaluations. An Epistemically-based Analysis of lexical Selection. PhD thesis, Stanford University (1995)

  16. Farkas, R., Vincze, V., Móra, G., Csirik, J., Szarvas, G.: The CoNLL-2010 Shared Task: Learning to detect hedges and their scope in natural language text. In: Proceedings of the 14th Conference on Computational Natural Language Learning Shared Task, pp. 1–12 (2010)

  17. Haan, F.d.: The Interaction of Modality and Negation: A Typological Study. Garland, New York (1997)

  18. Halliday, M.A.K., Matthiessen, C.M.I.M.: An Introduction to Functional Grammar. Hodder Arnold, London (2004)

  19. Henriksson, A., Velupillai, S.: Levels of certainty in knowledge-intensive corpora: an initial annotation study. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 41–45 (2010)

  20. Hooper, J.B.: On assertive predicates. In: Kimball, J. (ed.) Syntax and Semantics, IV, pp. 91–124. Academic Press, New York (1975)

  21. Horn, L.R.: On the Semantic Properties of Logical Operators in English. PhD thesis, UCLA. Distributed by the Indiana University Linguistics Club in 1976 (1972)

  22. Horn, L.R.: A Natural History of Negation, vol. 960. University of Chicago Press, Chicago (1989)

  23. Hyland, K.: Writing without conviction? Hedging in science research articles. Appl. Linguist. 14(4), 433–454 (1996)

  24. Karttunen, L.: Implicative verbs. Language 47, 340–358 (1971)

  25. Kiefer, F.: On defining modality. Folia Linguist. XXI(1), 67–94 (1987)

  26. Kilicoglu, H., Bergler, S.: Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC Bioinform. 9(Suppl 11), S10 (2008)

  27. Kim, J.-D., Ohta, T., Tsujii, J.: Corpus annotation for mining biomedical events from literature. BMC Bioinform. 9(1), 10 (2008)

  28. Kiparsky, P., Kiparsky, C.: Fact. In: Bierwisch, M., Heidolph, K.E. (eds.) Progress in Linguistics. A Collection of Papers, pp. 143–173. The Hague, Paris: Mouton (1970)

  29. Kratzer, A.: Modality. In: Stechow, A.v., Wunderlich, D. (eds.) Semantik: Ein internationales Handbuch der zeitgenoessischen Forschung, pp. 639–650. Walter de Gruyter, Berlin (1991)

  30. Lakoff, G.: Hedges: a study in meaning criteria and the logic of fuzzy concepts. J. Philos. Log. 2(4), 458–508 (1973)

  31. Light, M., Qiu, X.Y., Srinivasan, P.: The language of bioscience: facts, speculations, and statements in between. In: BioLINK 2004: Linking Biological Literature, Ontologies, and Databases, pp. 17–24 (2004)

  32. Lyons, J.: Semantics. Cambridge University Press, Cambridge (1977)

  33. Martin, J.R., White, P.R.R.: Language of Evaluation: Appraisal in English. Palgrave Macmillan (2005)

  34. Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 992–999 (2007)

  35. Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: The Penn Discourse Treebank. In: Proceedings of LREC 2004 (2004)

  36. Morante, R., Daelemans, W.: Learning the scope of hedge cues in biomedical texts. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp. 28–36 (2009)

  37. Morante, R., Daelemans, W.: A metalearning approach to processing the scope of negation. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 21–29 (2009)

  38. Mushin, I.: Evidentiality and Epistemological Stance. John Benjamins Publisher, Amsterdam (2001)

  39. Nawaz, R., Thompson, P., Ananiadou, S.: Evaluating a meta-knowledge annotation scheme for bio-events. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 69–77 (2010)

  40. Ohta, T., Kim, J.-D., Tsujii, J.: Guidelines for event annotation (2007)

  41. Palmer, F.R.: Mood and Modality. Cambridge University Press, Cambridge (1986)

  42. Palmer, M., Gildea, D., Kingsbury, P.: The Proposition Bank: An annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–105 (2005)

  43. Prasad, R., Dinesh, N., Lee, A., Joshi, A., Webber, B.: Attribution and its annotation in the Penn Discourse TreeBank. Traitement Automatique des Langues 47(2), 43–64 (2007)

  44. Pustejovsky, J., Knippen, R., Littman, J., Saurí, R.: Temporal and event information in natural language text. Language Resources and Evaluation 39(2–3), 123–164 (2005)

  45. Pustejovsky, J., Verhagen, M., Saurí, R., Littman, J., Gaizauskas, R., Katz, G., Mani, I., Knippen, R., Setzer, A.: TimeBank 1.2. Linguistic Data Consortium (LDC), Philadelphia, Pennsylvania. LDC Catalog No. 2006T08 (2006)

  46. Rizomilioti, V.: Exploring epistemic modality in academic discourse using corpora. In: Maci, E.A., Cervera, A.S., Ramos, C.R. (eds.) Information Technology in Languages for Specific Purposes, vol. 7, pp. 53–71. Springer, US (2006)

  47. Rubin, V.L.: Identifying certainty in Texts. PhD thesis, Syracuse University (2006)

  48. Rubin, V.L.: Stating with certainty or stating with doubt: intercoder reliability results for manual annotation of epistemically modalized statements. In: Proceedings of the NAACL-HLT 2007, pp. 141–144 (2007)

  49. Rubin, V.L.: Epistemic modality: From uncertainty to certainty in the context of information seeking as interactions with texts. Inf. Process. Manag. 46, 533–540 (2010)

  50. Rubin, V.L., Liddy, E.D., Kando, N.: Certainty identification in texts: categorization model and manual tagging results. In: Shanahan, J., Qu, Y., Wiebe, J. (eds.) Computing Attitude and Affect in Text: Theories and Applications. Springer, New York (2005)

  51. Saurí, R.: A Factuality Profiler for Eventualities in Text. PhD thesis, Brandeis University (2008)

  52. Saurí, R., Pustejovsky, J.: From structure to interpretation: a double-layered annotation for event factuality. In: Proceedings of the 2nd Linguistic Annotation Workshop (The LAW II). LREC 2008, Marrakech, Morocco (2008)

  53. Saurí, R., Pustejovsky, J.: FactBank: a corpus annotated with event factuality. Lang. Res. Eval. 43, 227–268 (2009)

  54. Saurí, R., Pustejovsky, J.: Are you sure that this happened? Assessing the factuality degree of events in text. Comput. Linguist. 38(2), 261–299 (2012)

  55. Saurí, R., Verhagen, M., Pustejovsky, J.: Annotating and recognizing event modality in text. In: 19th International FLAIRS Conference, FLAIRS 2006 (2006)

  56. Szarvas, G.: Hedge classification in biomedical texts with a weakly supervised selection of keywords. In: ACL 08: HLT, pp. 281–289 (2008)

  57. Van Valin, R.D., LaPolla, R.J.: Syntax. Structure, Meaning and Function. Cambridge University Press, Cambridge (1997)

  58. Velupillai, S.: Towards a better understanding of uncertainties and speculations in Swedish clinical text: analysis of an initial annotation trial. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 14–22 (2010)

  59. Vincze, V., Szarvas, G., Farkas, R., Móra, G., Csirik, J.: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinform. 9(Suppl 11), S9 (2008)

  60. Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. Lang. Res. Eval. 39(2–3), 165–210 (2005)

  61. Wilbur, W.J., Rzhetsky, A., Shatkay, H.: New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinform. 7(1), 356+ (2006)

  62. Wolf, F., Gibson, E.: Representing discourse coherence: a corpus-based analysis. Comput. Linguist. 31(2), 249–287 (2005)

Author information

Correspondence to Roser Saurí.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Saurí, R. (2017). Building FactBank or How to Annotate Event Factuality One Step at a Time. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_34

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_34

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
