Abstract
Annotation is one of the main vehicles for supplying knowledge to machine learning systems built to automate text processing tasks. In this chapter, we discuss how linguistic annotation is used in machine learning for different natural language processing (NLP) tasks. Specifically, we focus on how different layers of annotation are leveraged in tasks that aim to discover higher-level linguistic information. We describe how machine learning fits into the annotation process within the MATTER cycle, survey some common machine learning algorithms used in NLP, explain the fundamentals of feature selection, and explore methods for leveraging limited quantities of annotated data. We close with a case study of the 2012 i2b2 NLP shared task, which targeted temporal information extraction, a higher-level task that requires synthesizing information from multiple linguistic levels.
References
Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966). doi:10.1214/aoms/1177699147
Berger, A.L., Pietra, S.A.D., Pietra, V.J.D.: A maximum entropy approach to natural language processing. Comput. Linguist. 22(1), 39–71 (1996)
Biber, D., Conrad, S., Reppen, R.: Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press, Cambridge (1998)
Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit. O’Reilly (2009)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Blum, A., Mitchell, T.M.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92–100 (1998)
Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing. In: Proceedings of 15th International Conference on Artificial Intelligence and Statistics (2012)
Chang, Y.-C., Dai, H.-J., Wu, J.C.-Y., Chen, J.-M., Tsai, R.T.-H., Hsu, W.-L.: TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries. J. Biomed. Inform. 46(Supplement), S54–S62 (2013)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL '96), pp. 310–318. Association for Computational Linguistics, Stroudsburg, PA, USA (1996). doi:10.3115/981863.981904
Cherry, C., Zhu, X., Martin, J., de Bruijn, B.: A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. J. Am. Med. Inform. Assoc. 20(5), 843–848 (2013). doi:10.1136/amiajnl-2013-001624
Chiticariu, L., Li, Y., Reiss, F.R.: Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems! EMNLP 2013, pp. 827–832 (2013)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273 (1995). doi:10.1007/BF00994018
D’Souza, J., Ng, V.: Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. J. Biomed. Inform. 46(Supplement), S29–S39 (2013)
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10) (2012). doi:10.1145/2347736.2347755
Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130 (1997)
Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)
Ferraro, J.P., Daume 3rd, H., Duvall, S.L., Chapman, W.W., Harkema, H., Haug, P.J.: Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. J. Am. Med. Inform. Assoc. 20(5), 931–939 (2013). doi:10.1136/amiajnl-2012-001453. Epub 13 Mar 2013
Finkel, J.R., Manning, C.D.: Hierarchical joint learning: Improving joint parsing and named entity recognition with non-jointly labeled data. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 720–728. Association for Computational Linguistics (2010)
Grouin, C., Grabar, N., Hamon, T., Rosset, S., Tannier, X., Zweigenbaum, P.: Eventual situations for timeline extraction from clinical reports. J. Am. Med. Inf. Assoc. 20, 820–827 (2013). doi:10.1136/amiajnl-2013-001627
Jindal, P., Roth, D.: Extraction of events and temporal expressions from clinical narratives. J. Biomed. Inform. 46(Supplement), S13–S19 (2013). doi:10.1016/j.jbi.2013.08.010. Epub 8 Sep 2013
Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall (2009)
Klein, D., Manning, C.D.: Conditional structure versus conditional estimation in NLP models. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10. Association for Computational Linguistics (2002)
Kovacevic, A., Dehghan, A., Filannino, M., Keane, J.A., Nenadic, G.: Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. J. Am. Med. Inform. Assoc. 20, 859–866 (2013). doi:10.1136/amiajnl-2013-001625
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann (2001)
Lin, Y.-K., Chen, H., Brown, R.A.: MedTime: a temporal information extraction system for clinical narratives. J. Biomed. Inform. 46(Supplement), S20–S28 (2013)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
McCallum, A., Freitag, D., Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the Seventeenth International Conference on Machine Learning (2000)
Ng, A., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: NIPS (2001)
Nikfarjam, A., Emadzadeh, E., Gonzalez, G.: Towards generating a patient's timeline: extracting temporal relationships from clinical notes. J. Biomed. Inform. 46(Supplement), S40–S47 (2013)
Pustejovsky, J., Rumshisky, A.: SemEval-2010 Task 7: argument selection and coercion. In: NAACL 2009 Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009). Boulder, Colorado USA (2009)
Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning. O'Reilly Media (2012)
Roberts, K., Rink, B., Harabagiu, S.M.: A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. J. Am. Med. Inform. Assoc. 20, 867–875 (2013). doi:10.1136/amiajnl-2013-001619
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall (2003). ISBN 978-0137903955
Settles, B.: Active Learning Literature Survey. Computer Sciences Technical Report. University of Wisconsin–Madison (2009)
Settles, B.: Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 6(1). Morgan and Claypool (2012). http://dx.doi.org/10.2200/S00429ED1V01Y201207AIM018
Singh, S., Riedel, S., Martin, B., Zheng, J., McCallum, A.: Joint inference of entities, relations, and coreference. In: Third International Workshop on Automated Knowledge Base Construction (AKBC) (2013)
Sohn, S., Wagholikar, K.B., Li, D., Jonnalagadda, S.R., Tao, C., Elayavilli, R.K., Liu, H.: Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. J. Am. Med. Inform. Assoc. 20(5), 836–842 (2013). Published online 4 Apr 2013. doi:10.1136/amiajnl-2013-001622
Sun, W., Rumshisky, A., Uzuner, O.: Annotating temporal information in clinical narratives. J. Biomed. Inform. 46(Supplement), S5–S12 (2013)
Sun, W., Rumshisky, A., Uzuner, O.: Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J. Am. Med. Inform. Assoc. 20(5), 806–813 (2013). doi:10.1136/amiajnl-2013-001628. Epub 5 Apr 2013
Sutton, C., McCallum, A.: An introduction to conditional random fields for relational learning. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning. MIT Press (2006)
Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognit. Lett. 15(11), 1119–1125 (1994)
Tang, B., Cao, H., Wu, Y., Jiang, M., Xu, H.: Clinical entity recognition using structural support vector machines with rich features. In: ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics, Maui, HI, USA, pp. 13–20 (2012)
Tang, B., Wu, Y., Jiang, M., Chen, Y., Denny, J.C., Xu, H.: A hybrid system for temporal information extraction from clinical text. J. Am. Med. Inform. Assoc. doi:10.1136/amiajnl-2013-001635
Xu, Y., Hong, K., Tsujii, J., Chang, E.I-C.: Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries. J. Am. Med. Inform. Assoc. 19, 824–832 (2012). doi:10.1136/amiajnl-2011-000776
Xu, Y., Wang, Y., Liu, T., Tsujii, J.T., Chang, E.I-C.: An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. J. Am. Med. Inform. Assoc. 20, 849–858 (2013). doi:10.1136/amiajnl-2012-001607
Appendix: Machine Learning Resources and Toolkits
For more information on the inner workings of ML algorithms, we highly recommend the following books:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice-Hall, 2009.
- Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
- Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2013.
A variety of toolkits are available for building ML systems. These toolkits provide implementations of different ML algorithms, allowing NLP researchers to focus on designing appropriate feature sets rather than on implementing the algorithms themselves.
Many machine-learning systems for NLP are free and open source; here is a short list of commonly used ML toolkits and other systems:
- NLTK: http://www.nltk.org/
- GATE: http://gate.ac.uk/
- LingPipe: http://alias-i.com/lingpipe/index.html
- MALLET: http://mallet.cs.umass.edu/
- Stanford NLP tools: http://nlp.stanford.edu/software/index.shtml
The NLTK also has an accompanying book: “Natural Language Processing with Python” by Steven Bird, Ewan Klein, and Edward Loper [4].
In addition to implementations of many machine learning algorithms that users can train for their own tasks, many of these toolkits provide pre-trained models for common NLP tasks such as part-of-speech tagging, named entity recognition, and dependency parsing. These pre-trained components are frequently used as preprocessing steps when building systems for higher-level tasks.
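To make the division of labor concrete, the sketch below shows the pattern these toolkits support: the researcher writes a feature-extraction function, and a standard classifier (here, a minimal Naive Bayes with add-one smoothing, written from scratch for illustration) handles the learning. The tiny training set and the EVENT/TIMEX labels are invented for this example; toolkit classes such as NLTK's NaiveBayesClassifier provide more robust implementations of the same idea.

```python
from collections import Counter, defaultdict
import math

# Invented toy data in the spirit of clinical temporal annotation:
# each phrase is labeled as a medical EVENT or a temporal expression (TIMEX).
TRAIN = [
    ("patient admitted with fever", "EVENT"),
    ("patient discharged home", "EVENT"),
    ("on the morning of admission", "TIMEX"),
    ("two days after surgery", "TIMEX"),
]

def features(text):
    """The researcher-supplied part: map text to a bag-of-words feature vector."""
    return Counter(text.lower().split())

def train_naive_bayes(data):
    """Estimate log P(label) and log P(token | label) with add-one smoothing."""
    label_counts = Counter(label for _, label in data)
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in data:
        for tok, n in features(text).items():
            token_counts[label][tok] += n
            vocab.add(tok)
    model = {}
    for label in label_counts:
        total = sum(token_counts[label].values())
        model[label] = (
            math.log(label_counts[label] / len(data)),            # log prior
            {tok: math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
             for tok in vocab},                                   # log likelihoods
            math.log(1 / (total + len(vocab))),                   # unseen-token mass
        )
    return model

def classify(model, text):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, (prior, likelihood, unseen) in model.items():
        scores[label] = prior + sum(
            n * likelihood.get(tok, unseen) for tok, n in features(text).items()
        )
    return max(scores, key=scores.get)

model = train_naive_bayes(TRAIN)
print(classify(model, "two days after admission"))  # → TIMEX
```

Improving such a system typically means enriching `features` (e.g. adding part-of-speech tags from a pre-trained tagger) rather than touching the learning code, which is exactly the workflow the toolkits above are built around.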
Copyright information
© 2017 Springer Science+Business Media Dordrecht
Cite this chapter
Rumshisky, A., Stubbs, A. (2017). Machine Learning for Higher-Level Linguistic Tasks. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_13
DOI: https://doi.org/10.1007/978-94-024-0881-2_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)