Abstract
Partial parsing techniques try to recover syntactic information efficiently and reliably by sacrificing completeness and depth of analysis. One difficulty of partial parsing is finding a way to extract the required grammar automatically. In this paper, we present a method for automatically extracting partial parsing rules from a tree-annotated corpus using decision tree induction. We define partial parsing rules as those that can deterministically decide the structure of a substring in an input sentence. This decision can be treated as a classification: for a substring in an input sentence, the proper structure is chosen from among the structures that occur in the corpus. We perform this classification with decision tree induction and derive the partial parsing rules from the resulting decision tree. The acquired grammar is similar to a phrase structure grammar with contextual and lexical information, but it allows building structures of depth one or more. Our experiments show that the proposed partial parser, using the automatically extracted rules, is not only accurate and efficient but also achieves reasonable coverage for Korean.
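The classification view described above can be illustrated with a small sketch. It is not the authors' implementation: the tag names, the two features (`tags` for the substring and `right` for one word of right context), the `NO_STRUCTURE` label, and the toy training rows are all invented for illustration, and a plain ID3-style entropy split stands in for whichever induction algorithm the paper actually uses. The point it shows is that only leaves with a single structure label yield rules, so every extracted rule decides deterministically.

```python
import math
from collections import Counter

NO_STRUCTURE = "NO_STRUCTURE"  # hypothetical label: the substring forms no constituent

# Invented toy instances: (features of a substring and its context, structure label).
TOY = [
    ({"tags": "det noun", "right": "verb"}, "(NP det noun)"),
    ({"tags": "det noun", "right": "noun"}, NO_STRUCTURE),
    ({"tags": "det noun", "right": "verb"}, "(NP det noun)"),
    ({"tags": "adj noun", "right": "verb"}, "(NP adj noun)"),
    ({"tags": "adj noun", "right": "noun"}, "(NP adj noun)"),
    ({"tags": "noun noun", "right": "verb"}, "(NP noun noun)"),
    ({"tags": "noun noun", "right": "verb"}, NO_STRUCTURE),  # stays ambiguous
]

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(rows, features):
    """ID3-style induction; a leaf is a Counter of labels, a node is (feature, branches)."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels)

    def split_entropy(feature):
        groups = {}
        for feats, label in rows:
            groups.setdefault(feats[feature], []).append(label)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    # Lowest weighted entropy after the split means highest information gain.
    best = min(features, key=split_entropy)
    rest = [f for f in features if f != best]
    branches = {}
    for feats, label in rows:
        branches.setdefault(feats[best], []).append((feats, label))
    return (best, {value: build_tree(sub, rest) for value, sub in branches.items()})

def extract_rules(tree, conditions=()):
    """Turn each pure leaf that names a structure into a (conditions, structure) rule."""
    if isinstance(tree, Counter):
        if len(tree) == 1:
            (label,) = tree
            if label != NO_STRUCTURE:
                return [(dict(conditions), label)]
        return []  # impure or structure-less leaf: no deterministic rule
    feature, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((feature, value),)))
    return rules

RULES = extract_rules(build_tree(TOY, ["tags", "right"]))
for conditions, structure in RULES:
    print(conditions, "->", structure)
```

In this toy data the `noun noun` substring receives conflicting labels in an identical context, so no rule is produced for it. Leaving such cases undecided rather than guessing reflects the trade-off the abstract describes, where completeness is given up for reliability.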
This research was supported in part by the Ministry of Science and Technology, the Ministry of Culture and Tourism, and the Korea Science and Engineering Foundation in Korea.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Choi, MS., Lim, C.S., Choi, KS. (2005). Automatic Partial Parsing Rule Acquisition Using Decision Tree Induction. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_13
DOI: https://doi.org/10.1007/11562214_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)