Abstract
This paper proposes a Mutual Information Independence Model (MIIM) for segmenting and labeling sequential data. MIIM overcomes the strong context-independence assumption of traditional generative HMMs by assuming a novel pairwise mutual information independence. As a result, MIIM separately models the long state dependence in its state transition model in a generative way, and the observation dependence in its output model in a discriminative way. In addition, a variable-length pairwise mutual information-based modeling approach and a kNN algorithm using kernel density estimation are proposed to capture the long state dependence and the observation dependence, respectively. Evaluation on shallow parsing shows that MIIM effectively captures long context dependence when segmenting and labeling sequential data. Notably, using kernel density estimation leads to better performance than a classifier-based approach.
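To make the discriminative output model concrete, the following is a minimal sketch of how a kNN algorithm with kernel density estimation can produce the posterior P(state | observation) that the abstract refers to. The feature representation, the Gaussian kernel, and the parameters `k` and `bandwidth` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_kde_posterior(x, train_X, train_y, k=5, bandwidth=1.0):
    """Estimate P(label | x) from a Gaussian-kernel density over the
    k nearest neighbours of x in the training set (hypothetical sketch)."""
    # Euclidean distance from x to every training observation
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k nearest neighbours
    nn = np.argsort(dists)[:k]
    # Gaussian kernel weight for each neighbour
    weights = np.exp(-(dists[nn] ** 2) / (2 * bandwidth ** 2))
    # Sum the kernel mass per label and normalize to a posterior
    labels = np.unique(train_y)
    scores = np.array([weights[train_y[nn] == c].sum() for c in labels])
    probs = scores / scores.sum()
    return dict(zip(labels, probs))

# Toy usage with two well-separated clusters of 1-D "observations"
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
y = np.array(["B-NP", "B-NP", "B-NP", "O", "O"])
post = knn_kde_posterior(np.array([0.05]), X, y, k=3)
```

In a full sequence model these posteriors would be combined with the generative state transition scores during decoding; here the sketch only shows the local density estimate.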
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhou, G., Yang, L., Su, J., Ji, D. (2005). Mutual Information Independence Model Using Kernel Density Estimation for Segmenting and Labeling Sequential Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science (R0)