
Machine learning for intelligent document processing: The WISDOM system

  • Communications
  • Conference paper
Foundations of Intelligent Systems (ISMIS 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1609)


Abstract

WISDOM is an intelligent document processing system that transforms printed information into a symbolic representation. Its distinguishing feature is a rule base that is built automatically from a set of training documents using two inductive learning techniques: decision tree learning for block classification, and first-order rule induction for document classification and understanding. The paper illustrates the advances made over previous studies in this application domain and reports a complete set of experimental results.
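The abstract states that layout blocks are classified by a learned decision tree. The sketch below is offered only as an illustration under stated assumptions. It grows a small binary tree over numeric layout features by choosing, at each node, the threshold test `feature <= t` with the highest information gain, with candidate thresholds placed at midpoints between consecutive observed values. It is not WISDOM's learner. The three features (block height, width, black-pixel density), the class labels `text`/`image`/`line`, the training values, and all function names (`entropy`, `best_split`, `build_tree`, `classify`) are hypothetical. The sketch also uses plain information gain rather than C4.5's gain ratio and does no pruning.

```python
import math
from collections import Counter

# Hypothetical training blocks: (height_px, width_px, black_pixel_density) -> label.
# Features, labels and values are illustrative assumptions, not WISDOM data.
TRAINING = [
    ((12, 300, 0.25), "text"),
    ((14, 280, 0.22), "text"),
    ((40, 310, 0.28), "text"),
    ((200, 250, 0.55), "image"),
    ((180, 300, 0.60), "image"),
    ((2, 400, 0.95), "line"),
    ((3, 350, 0.90), "line"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature, threshold) for the test row[feature] <= threshold
    with the highest information gain; feature is None if no split helps."""
    base = entropy(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if base - remainder > best[0]:
                best = (base - remainder, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=5):
    """Grow a binary tree top-down; internal nodes are dicts, leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    _, f, t = best_split(rows, labels)
    if f is None:
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Follow threshold tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


rows = [r for r, _ in TRAINING]
labels = [y for _, y in TRAINING]
tree = build_tree(rows, labels)
print(classify(tree, (16, 290, 0.24)))   # short, sparse block -> text
print(classify(tree, (190, 260, 0.58)))  # tall, dense block -> image
```

On this toy data the root test splits on density, separating the sparse text blocks from the rest, and a second test on height then separates thin rules from tall images. A production learner would add pruning and a much richer feature set.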




Editor information

Zbigniew W. Raś, Andrzej Skowron


Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Esposito, F., Malerba, D., Lisi, F.A. (1999). Machine learning for intelligent document processing: The WISDOM system. In: Raś, Z.W., Skowron, A. (eds) Foundations of Intelligent Systems. ISMIS 1999. Lecture Notes in Computer Science, vol 1609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095095


  • DOI: https://doi.org/10.1007/BFb0095095


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65965-5

  • Online ISBN: 978-3-540-48828-6

  • eBook Packages: Springer Book Archive
