Abstract
We present RDRCE (RDR Case Explorer), a new interactive workbench that facilitates the combination of Machine Learning and manual Knowledge Acquisition for Natural Language Processing problems. We show how to apply Brill’s well-regarded transformation-based learning approach and convert its results into an RDR (Ripple-Down Rules) tree. RDRCE then guides the systematic inspection of the generated RDR tree so that it can be refined and improved by manually adding further rules. RDRCE also helps to quickly recognise potential noise in the training data and to deal with that noise effectively. Finally, we present a first study using RDRCE to build a high-quality Part-of-Speech tagger for English. After some 60 hours of manual knowledge acquisition, we already slightly exceed state-of-the-art performance on unseen benchmark test data, and with it the fruits of some 15 years of further research in learning methods for Part-of-Speech taggers.
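To make the idea of an RDR tree concrete, the following is a minimal sketch of a generic single-classification ripple-down rule tree applied to tagging. It is not the conversion procedure or data structure used in RDRCE, which this abstract does not describe. The names `Node`, `classify`, the case fields `word` and `prev_tag`, and the three rules are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical case format: one token plus a little context.
Case = dict


@dataclass
class Node:
    condition: Callable[[Case], bool]
    conclusion: str
    except_child: Optional["Node"] = None  # tried when the condition holds
    if_not_child: Optional["Node"] = None  # tried when the condition fails


def classify(root: Node, case: Case) -> Optional[str]:
    """Return the conclusion of the last rule on the path that fired."""
    conclusion = None
    node = root
    while node is not None:
        if node.condition(case):
            conclusion = node.conclusion
            node = node.except_child
        else:
            node = node.if_not_child
    return conclusion


# Illustrative tree (assumed rules, not taken from the paper):
# default tag NN; two learned-style corrections; one manually added exception.
tree = Node(
    lambda c: True, "NN",
    except_child=Node(
        lambda c: c["prev_tag"] == "TO", "VB",
        # Manual refinement: an exception to the rule above.
        except_child=Node(lambda c: c["word"].endswith("ness"), "NN"),
        # Alternative rule, tried when the previous tag is not TO.
        if_not_child=Node(lambda c: c["prev_tag"] == "MD", "VB"),
    ),
)

print(classify(tree, {"word": "run", "prev_tag": "TO"}))      # VB
print(classify(tree, {"word": "illness", "prev_tag": "TO"}))  # NN
```

In this sketch a manually added rule only affects cases that reach its parent, which is the property that makes local, incremental refinement of a learned rule set possible.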
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, H., Hoffmann, A. (2010). RDRCE: Combining Machine Learning and Knowledge Acquisition. In: Kang, BH., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2010. Lecture Notes in Computer Science, vol 6232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15037-1_15
DOI: https://doi.org/10.1007/978-3-642-15037-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15036-4
Online ISBN: 978-3-642-15037-1
eBook Packages: Computer Science (R0)