Abstract
WISDOM is an intelligent document processing system that transforms printed information into a symbolic representation. Its distinguishing feature is a rule base that is built automatically from a set of training documents using two inductive learning techniques: decision tree learning for block classification, and first-order rule induction for document classification and understanding. The paper illustrates advances over previous studies in this application domain and reports a complete set of experimental results.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esposito, F., Malerba, D., Lisi, F.A. (1999). Machine learning for intelligent document processing: The WISDOM system. In: Raś, Z.W., Skowron, A. (eds) Foundations of Intelligent Systems. ISMIS 1999. Lecture Notes in Computer Science, vol 1609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095095
DOI: https://doi.org/10.1007/BFb0095095
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65965-5
Online ISBN: 978-3-540-48828-6
eBook Packages: Springer Book Archive