Abstract
Annotation is one of the main vehicles for supplying knowledge to machine learning systems built to automate text processing tasks. In this chapter, we discuss how linguistic annotation is used in machine learning for different natural language processing (NLP) tasks. Specifically, we focus on how different layers of annotation are leveraged in tasks that aim to discover higher-level linguistic information. We describe how machine learning fits into the annotation process within the MATTER cycle, survey some common machine learning algorithms used in NLP, explain the fundamentals of feature selection, and explore methods for leveraging limited quantities of annotated data. We close with a case study of the 2012 i2b2 NLP shared task, which targeted temporal information extraction, a higher-level task that requires synthesizing information from multiple linguistic levels.
References
Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966). doi:10.1214/aoms/1177699147
Berger, A.L., Pietra, S.A.D., Pietra, V.J.D.: A maximum entropy approach to natural language processing. Comput. Linguist. 22(1), 39–71 (1996)
Biber, D., Conrad, S., Reppen, R.: Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press, Cambridge (1998)
Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. O’Reilly (2009)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Blum, A., Mitchell, T.M.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92–100 (1998)
Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing. In: Proceedings of 15th International Conference on Artificial Intelligence and Statistics (2012)
Chang, Y.-C., Dai, H.-J., Wu, J.C.-Y., Chen, J.-M., Tsai, R.T.-H., Hsu, W.-L.: TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries. J. Biomed. Inform. 46(Supplement), S54–S62 (2013)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL '96), pp. 310–318. Association for Computational Linguistics, Stroudsburg, PA, USA (1996). doi:10.3115/981863.981904
Cherry, C., Zhu, X., Martin, J., de Bruijn, B.: A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. J. Am. Med. Inform. Assoc. 20(5), 843–848 (2013). doi:10.1136/amiajnl-2013-001624
Chiticariu, L., Li, Y., Reiss, F.R.: Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems! EMNLP 2013, pp. 827–832 (2013)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273 (1995). doi:10.1007/BF00994018
D’Souza, J., Ng, V.: Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. J. Biomed. Inform. 46(Supplement), S29–S39 (2013)
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10) (2012). doi:10.1145/2347736.2347755
Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)
Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)
Ferraro, J.P., Daume 3rd, H., Duvall, S.L., Chapman, W.W., Harkema, H., Haug, P.J.: Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. J. Am. Med. Inform. Assoc. 20(5), 931–939 (2013). doi:10.1136/amiajnl-2012-001453. Epub 13 Mar 2013
Finkel, J.R., Manning, C.D.: Hierarchical joint learning: Improving joint parsing and named entity recognition with non-jointly labeled data. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 720–728. Association for Computational Linguistics (2010)
Grouin, C., Grabar, N., Hamon, T., Rosset, S., Tannier, X., Zweigenbaum, P.: Eventual situations for timeline extraction from clinical reports. J. Am. Med. Inf. Assoc. 20, 820–827 (2013). doi:10.1136/amiajnl-2013-001627
Jindal, P., Roth, D.: Extraction of events and temporal expressions from clinical narratives. J. Biomed. Inform. 46(Supplement), S13–S19 (2013). doi:10.1016/j.jbi.2013.08.010. Epub 8 Sep 2013
Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall (2009)
Klein, D., Manning, C.D.: Conditional structure versus conditional estimation in NLP models. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10. Association for Computational Linguistics (2002)
Kovacevic, A., Dehghan, A., Filannino, M., Keane, J.A., Nenadic, G.: Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. J. Am. Med. Inform. Assoc. 20, 859–866 (2013). doi:10.1136/amiajnl-2013-001625
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann (2001)
Lin, Y.-K., Chen, H., Brown, R.A.: MedTime: a temporal information extraction system for clinical narratives. J. Biomed. Inform. 46(Supplement), S20–S28 (2013)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
McCallum, A., Freitag, D., Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the Seventeenth International Conference on Machine Learning (2000)
Ng, A., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: NIPS (2001)
Nikfarjam, A., Emadzadeh, E., Gonzalez, G.: Towards generating a patient's timeline: extracting temporal relationships from clinical notes. J. Biomed. Inform. 46(Supplement), S40–S47 (2013)
Pustejovsky, J., Rumshisky, A.: SemEval-2010 Task 7: argument selection and coercion. In: NAACL 2009 Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009). Boulder, Colorado USA (2009)
Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning. O'Reilly Media (2012)
Roberts, K., Rink, B., Harabagiu, S.M.: A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. J. Am. Med. Inform. Assoc. 20, 867–875 (2013). doi:10.1136/amiajnl-2013-001619
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall (2003). ISBN 978-0137903955
Settles, B.: Active Learning Literature Survey. Computer Sciences Technical Report. University of Wisconsin–Madison (2009)
Settles, B.: Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 6(1). Morgan and Claypool (2012). http://dx.doi.org/10.2200/S00429ED1V01Y201207AIM018
Singh, S., Riedel, S., Martin, B., Zheng, J., McCallum, A.: Joint inference of entities, relations, and coreference. In: Third International Workshop on Automated Knowledge Base Construction (AKBC) (2013)
Sohn, S., Wagholikar, K.B., Li, D., Jonnalagadda, S.R., Tao, C., Elayavilli, R.K., Liu, H.: Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. J. Am. Med. Inform. Assoc. 20(5), 836–842 (2013). Published online 4 Apr 2013. doi:10.1136/amiajnl-2013-001622
Sun, W., Rumshisky, A., Uzuner, O.: Annotating temporal information in clinical narratives. J. Biomed. Inform. 46(Supplement), S5–S12 (2013)
Sun, W., Rumshisky, A., Uzuner, O.: Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J. Am. Med. Inform. Assoc. 20(5), 806–813 (2013). doi:10.1136/amiajnl-2013-001628. Epub 5 Apr 2013
Sutton, C., McCallum, A.: An introduction to conditional random fields for relational learning. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning. MIT Press (2006)
Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognit. Lett. 15(11), 1119–1125 (1994)
Tang, B., Cao, H., Wu, Y., Jiang, M., Xu, H.: Clinical entity recognition using structural support vector machines with rich features. In: ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics, Maui, HI, USA, pp. 13–20 (2012)
Tang, B., Wu, Y., Jiang, M., Chen, Y., Denny, J.C., Xu, H.: A hybrid system for temporal information extraction from clinical text. J. Am. Med. Inform. Assoc. doi:10.1136/amiajnl-2013-001635
Xu, Y., Hong, K., Tsujii, J., Chang, E.I-C.: Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries. J. Am. Med. Inform. Assoc. 19, 824–832 (2012). doi:10.1136/amiajnl-2011-000776
Xu, Y., Wang, Y., Liu, T., Tsujii, J.T., Chang, E.I-C.: An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. J. Am. Med. Inform. Assoc. 20, 849–858 (2013). doi:10.1136/amiajnl-2012-001607
Appendix: Machine Learning Resources and Toolkits
For more information on the inner workings of ML algorithms, we highly recommend the following books:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice-Hall, 2009.
- Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
- Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2013.
A variety of toolkits are available for building ML systems. These toolkits provide implementations of different ML algorithms, allowing NLP researchers to focus on designing appropriate feature sets rather than on implementing the algorithms themselves.
Many machine-learning systems for NLP are free and open source; here is a short list of commonly used ML toolkits and other systems:
- NLTK: http://www.nltk.org/
- GATE: http://gate.ac.uk/
- LingPipe: http://alias-i.com/lingpipe/index.html
- MALLET: http://mallet.cs.umass.edu/
- Stanford NLP tools: http://nlp.stanford.edu/software/index.shtml
The NLTK also has an accompanying book: “Natural Language Processing with Python” by Steven Bird, Ewan Klein, and Edward Loper [4].
In addition to implementations of many machine learning algorithms that users can train for their own tasks, many of these toolkits provide pre-trained models for common NLP tasks such as part-of-speech tagging, named entity recognition, and dependency parsing. These pre-trained components are frequently used as preprocessing steps when building systems for higher-level tasks.
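To make the division of labor concrete, the sketch below shows the pattern these toolkits support: the researcher writes a feature-extraction function, and a standard classifier (here, a minimal Naive Bayes with add-one smoothing, written from scratch for illustration) handles the learning. The tiny training set and the EVENT/TIMEX labels are invented for this example; toolkit classes such as NLTK's NaiveBayesClassifier provide more robust implementations of the same idea.

```python
from collections import Counter, defaultdict
import math

# Invented toy data in the spirit of clinical temporal annotation:
# each phrase is labeled as a medical EVENT or a temporal expression (TIMEX).
TRAIN = [
    ("patient admitted with fever", "EVENT"),
    ("patient discharged home", "EVENT"),
    ("on the morning of admission", "TIMEX"),
    ("two days after surgery", "TIMEX"),
]

def features(text):
    """The researcher-supplied part: map text to a bag-of-words feature vector."""
    return Counter(text.lower().split())

def train_naive_bayes(data):
    """Estimate log P(label) and log P(token | label) with add-one smoothing."""
    label_counts = Counter(label for _, label in data)
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in data:
        for tok, n in features(text).items():
            token_counts[label][tok] += n
            vocab.add(tok)
    model = {}
    for label in label_counts:
        total = sum(token_counts[label].values())
        model[label] = (
            math.log(label_counts[label] / len(data)),            # log prior
            {tok: math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
             for tok in vocab},                                   # log likelihoods
            math.log(1 / (total + len(vocab))),                   # unseen-token mass
        )
    return model

def classify(model, text):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, (prior, likelihood, unseen) in model.items():
        scores[label] = prior + sum(
            n * likelihood.get(tok, unseen) for tok, n in features(text).items()
        )
    return max(scores, key=scores.get)

model = train_naive_bayes(TRAIN)
print(classify(model, "two days after admission"))  # → TIMEX
```

Improving such a system typically means enriching `features` (e.g. adding part-of-speech tags from a pre-trained tagger) rather than touching the learning code, which is exactly the workflow the toolkits above are built around.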
Copyright information
© 2017 Springer Science+Business Media Dordrecht
Cite this chapter
Rumshisky, A., Stubbs, A. (2017). Machine Learning for Higher-Level Linguistic Tasks. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_13
DOI: https://doi.org/10.1007/978-94-024-0881-2_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)