Abstract
This paper deals with the representation of document models used in the field of document recognition. It presents a novel formalism, called the generalized n-gram, which is shown to be accurate for the recognition task and well suited to automatic learning from examples. The paper also addresses the thorny problem of integrating models for document analysis with existing standards used for document manipulation and production.
This project is funded by the Swiss National Fund for Scientific Research, code 21-42'355.94.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brugger, R., Bapst, F., Ingold, R. (1998). A DTD extension for document structure recognition. In: Hersch, R.D., André, J., Brown, H. (eds) Electronic Publishing, Artistic Imaging, and Digital Typography. RIDT 1998. Lecture Notes in Computer Science, vol 1375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053282
DOI: https://doi.org/10.1007/BFb0053282
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64298-5
Online ISBN: 978-3-540-69718-3
eBook Packages: Springer Book Archive