Abstract
Word Sense Disambiguation (WSD) continues to present a formidable challenge for Natural Language Processing. To better perform automatic WSD, manually annotated corpora are created that serve as training and testing data. When the annotation labels are drawn from an independently created lexical resource, there is an added benefit of checking the resource's lexical inventory and sense representations against the corpus data. Such corrections can in turn benefit future manual and automatic annotation. We report on the annotation of a number of selected word forms of different parts of speech in the MASC corpus with WordNet senses. Analyses of the annotations reveal good annotator agreement for half of the lemmas but low agreement for the other half, with no obvious indication of the reasons. Through crowdsourcing, however, we had many annotators label each word rather than collecting a single label, creating a corpus from which a single ground-truth label per sentence can be inferred from the many labels, along with a confidence. Even for words with low agreement, many of the instances have confident labels. In a complementary effort, 100 of the MASC sentences with WordNet-annotated lemmas were fully annotated with FrameNet lexical units and Frame Elements. This allowed for the comparison between, and alignment of, the WordNet and FrameNet senses for the chosen lemmas. We reflect on the fundamental design differences between these two complementary resources and their respective contributions to WSD. The MASC word sense annotation effort has demonstrated that it is possible to collect reliable manual annotations of moderately polysemous words, and that we do not yet know what makes this possible for some words and not others. The corpus, therefore, can serve as a valuable resource for investigating this question.
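The abstract mentions inferring a single ground-truth label per instance, along with a confidence, from many annotators' labels. The corpus itself relies on a probabilistic model of annotation (see [22, 41]); as a minimal sketch of the idea, a simple majority vote over the annotators' labels already yields a label and a vote-share confidence. The `aggregate_labels` function and the sense-key strings below are illustrative, not part of the released corpus.

```python
from collections import Counter

def aggregate_labels(labels):
    """Infer one label and a confidence from multiple annotators' labels.

    labels: list of sense labels assigned to one instance by different
    annotators. Returns (majority label, fraction of annotators who chose it).
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Hypothetical example: ten annotators label one occurrence of a lemma.
votes = ["fair%3", "fair%3", "fair%1", "fair%3", "fair%3",
         "fair%3", "fair%1", "fair%3", "fair%3", "fair%3"]
label, confidence = aggregate_labels(votes)
print(label, confidence)  # → fair%3 0.8
```

A probabilistic model such as Dawid and Skene's [22] improves on this by weighting each annotator's votes by an estimated per-annotator error rate, which is why instances of low-agreement words can still receive confident labels.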
Notes
1. FrameNet does not include the mathematical sense, equivalent to orthogonal.
2. The annotation process has been described in detail in several publications. The text for this section is drawn from [27].
3. A preliminary version of the same table appeared in [27] prior to completion of the corpus.
4. The \(\alpha \) scores and confidence intervals are produced with Ron Artstein's script, calculate-alpha.perl, which is distributed with the word sense sentence corpus.
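Note 4 refers to \(\alpha \) (Krippendorff's alpha) scores computed with calculate-alpha.perl. As an illustration of what that measure computes, here is a minimal implementation for nominal labels; it is not the distributed script, and it omits the script's confidence intervals.

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels the annotators
    assigned to one unit (item). Units with fewer than two labels
    contribute no pairable values and are ignored.
    """
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)                 # total pairable values
    totals = Counter(label for u in units for label in u)

    # Observed disagreement: within-unit label pairs that differ.
    d_o = 0.0
    for u in units:
        m = len(u)
        counts = Counter(u)
        pairs_disagree = m * m - sum(c * c for c in counts.values())
        d_o += pairs_disagree / (m - 1)
    d_o /= n

    # Expected disagreement from the pooled label distribution.
    d_e = (n * n - sum(c * c for c in totals.values())) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Two annotators agree on three of four items.
alpha = krippendorff_alpha_nominal(
    [["a", "a"], ["b", "b"], ["a", "b"], ["a", "a"]])
print(round(alpha, 3))  # → 0.533
```

Note that the 75% raw agreement in the example yields an \(\alpha \) of only about 0.53, because \(\alpha \) discounts agreement expected by chance given the pooled label distribution.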
References
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Ling. 34(4), 555–596 (2008)
Baker, C.F., Fellbaum, C.: WordNet and FrameNet as complementary resources for annotation. In: Proceedings of the Third Linguistic Annotation Workshop, pp. 125–129. Association for Computational Linguistics, Suntec, Singapore (2009). http://www.aclweb.org/anthology/W/W09/W09-3021
Chow, I.C., Webster, J.J.: Integration of linguistic resources for verb classification: FrameNet, WordNet, VerbNet, and suggested upper merged ontology. In: Proceedings of CICLing, pp. 1–11 (2007)
Clark, P., Fellbaum, C., Hobbs, J.R., Harrison, P., Murray, W.R., Thompson, J.: Augmenting WordNet for deep understanding of text. In: Proceedings of the 2008 Conference on Semantics in Text Processing, STEP ’08, pp. 45–57. Association for Computational Linguistics, Stroudsburg (2008). http://dl.acm.org/citation.cfm?id=1626481.1626486
Coppola, B., Moschitti, A., Tonelli, S., Riccardi, G.: Automatic FrameNet-based annotation of conversational speech. In: Proceedings of IEEE-SLT 2008, Goa, pp. 73–76 (2008)
Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28(1), 20–28 (1979)
De Cao, D., Croce, D., Basili, R.: Extensive evaluation of a FrameNet-WordNet mapping resource. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA), Valletta, Malta (2010)
de Melo, G., Baker, C.F., Ide, N., Passonneau, R.J., Fellbaum, C.: Empirical comparisons of MASC word sense annotations. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey (2012). http://www.icsi.berkeley.edu/pubs/ai/empiricalcomparisons12.pdf
Erk, K., Padó, S.: Analysing models for semantic role assignment using confusability. In: Proceedings of HLT/EMNLP-05. Vancouver, Canada (2005)
Fellbaum, C. (ed.): WordNet. An Electronic Lexical Database. MIT Press, Cambridge (1998)
Fellbaum, C., Grabowski, J., Landes, S., et al.: Analysis of a hand-tagging task. In: Proceedings of ANLP-97 Workshop on Tagging Text with Lexical Semantics: Why, What, and How (1997)
Fellbaum, C., Grabowski, J., Landes, S.: Performance and confidence in a semantic annotation task. WordNet: An Electronic Lexical Database, pp. 217–239. MIT Press, Cambridge (1998)
Ferrández, O., Ellsworth, M., Muñoz, R., Baker, C.F.: Aligning FrameNet and WordNet based on semantic neighborhoods. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), pp. 310–314. European Language Resources Association (ELRA), Valletta, Malta (2010)
Fillmore, C.J.: Scenes-and-frames semantics. In: Zampolli, A. (ed.) Linguistic Structures Processing in Fundamental Studies in Computer Science, vol. 59. North Holland Publishing, Netherlands (1977)
Fillmore, C.J.: Frame semantics. Linguistics in the Morning Calm, pp. 111–137. Hanshin Publishing Co., South Korea (1982)
Fillmore, C.J., Baker, C.F.: A frames approach to semantic analysis. In: Heine, B., Narrog, H. (eds.) Oxford Handbook of Linguistic Analysis, pp. 313–341. Oxford University Press, Oxford (2010)
Ide, N., Reppen, R., Suderman, K.: The American National Corpus: more than the web can provide. In: Proceedings of the Third Language Resources and Evaluation Conference (LREC), pp. 839–844, Las Palmas, Canary Islands, Spain (2002). http://americannationalcorpus.org/pubs.html
Ide, N., Baker, C., Fellbaum, C., Fillmore, C., Passonneau, R.: MASC: the Manually Annotated Sub-Corpus of American English. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), Morocco (2008)
Johansson, R., Nugues, P.: LTH: Semantic structure extraction using nonprojective dependency trees. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 227–230. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-2048
Kratzer, A.: Stage level and individual level predicates. In: Carlson, G., Pelletier, F.J. (eds.) The Generic Book. The University of Chicago Press, Chicago (1995)
Kučera, H., Francis, W.N.: Computational Analysis of Present-day American English. Brown University Press, Providence (1967)
Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993). http://www-personal.umich.edu/~jlawler/levin.html
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995). doi:10.1145/219717.219748
Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a semantic concordance for sense identification. In: Proceedings of the Workshop on Human Language Technology, HLT ’94, pp. 240–243. Association for Computational Linguistics, Stroudsburg (1994). doi:10.3115/1075812.1075866
Passonneau, R.J., Carpenter, B.: The benefits of a model of annotation. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 187–195. Association for Computational Linguistics, Sofia, Bulgaria (2013). http://www.aclweb.org/anthology/W13-2323
Passonneau, R.J., Habash, N., Rambow, O.: Inter-annotator agreement on a multilingual semantic annotation task. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), Genoa, Italy, pp. 1951–1956 (2006)
Passonneau, R.J., Baker, C., Fellbaum, C., Ide, N.: The MASC word sense sentence corpus. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC) (2012)
Passonneau, R.J., Bhardwaj, V., Salleb-Aouissi, A., Ide, N.: Multiplicity and word sense: evaluating and learning from multiply labeled word sense annotations. Lang. Resour. Eval. 46(2), 219–252 (2012). doi:10.1007/s10579-012-9188-x
Poesio, M., Artstein, R.: The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. In: Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pp. 76–83 (2005)
Pradhan, S., Loper, E., Dligach, D., Palmer, M.: SemEval-2007 task 17: English lexical sample, SRL and all words. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pp. 87–92. Association for Computational Linguistics, Prague, Czech Republic (2007). http://www.aclweb.org/anthology/W/W07/W07-2016
© 2017 Springer Science+Business Media Dordrecht
Cite this chapter
Baker, C., Fellbaum, C., Passonneau, R.J. (2017). Semantic Annotation of MASC. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_25
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2