Abstract
This paper proposes a Mutual Information Independence Model (MIIM) for segmenting and labeling sequential data. MIIM overcomes the strong context-independence assumption of traditional generative HMMs by assuming a novel pairwise mutual information independence. As a result, MIIM separately models the long state dependence in its state transition model in a generative way, and the observation dependence in its output model in a discriminative way. In addition, a variable-length pairwise mutual information-based modeling approach and a kNN algorithm using kernel density estimation are proposed to capture the long state dependence and the observation dependence, respectively. Evaluation on shallow parsing shows that MIIM effectively captures long context dependence when segmenting and labeling sequential data. Notably, using kernel density estimation leads to better performance than a classifier-based approach.
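To make the discriminative output model concrete, the following is a minimal sketch of how a kNN algorithm with kernel density estimation can produce the posterior P(state | observation) that the abstract refers to. The feature representation, the Gaussian kernel, and the parameters `k` and `bandwidth` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_kde_posterior(x, train_X, train_y, k=5, bandwidth=1.0):
    """Estimate P(label | x) from a Gaussian-kernel density over the
    k nearest neighbours of x in the training set (hypothetical sketch)."""
    # Euclidean distance from x to every training observation
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k nearest neighbours
    nn = np.argsort(dists)[:k]
    # Gaussian kernel weight for each neighbour
    weights = np.exp(-(dists[nn] ** 2) / (2 * bandwidth ** 2))
    # Sum the kernel mass per label and normalize to a posterior
    labels = np.unique(train_y)
    scores = np.array([weights[train_y[nn] == c].sum() for c in labels])
    probs = scores / scores.sum()
    return dict(zip(labels, probs))

# Toy usage with two well-separated clusters of 1-D "observations"
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
y = np.array(["B-NP", "B-NP", "B-NP", "O", "O"])
post = knn_kde_posterior(np.array([0.05]), X, y, k=3)
```

In a full sequence model these posteriors would be combined with the generative state transition scores during decoding; here the sketch only shows the local density estimate.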
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhou, G., Yang, L., Su, J., Ji, D. (2005). Mutual Information Independence Model Using Kernel Density Estimation for Segmenting and Labeling Sequential Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science (R0)