Abstract
DEBORA (Digital AccEss to BOoks of the RenAissance) is a multidisciplinary European project that aims to digitize rare sixteenth-century books and thereby make them more accessible. End-users, librarians, historians, researchers in book history, and computer scientists took part in developing remote and collaborative access to digitized Renaissance books, which is needed because digital libraries in image mode remain poorly accessible through the Internet. Three factors currently limit remote access to digital libraries: the size of the image files, the lack of a standard file exchange format suitable for progressive transmission, and limited querying possibilities. To improve accessibility, historical documents must be digitized and retro-converted to extract a detailed description of the image contents suited to users’ needs. Specialists of the Renaissance have described the metadata generally required by end-users and the ideal functionalities of the digital library. The retro-conversion of historical documents is a complex process that includes image capture, metadata extraction, image storage and indexing, automatic conversion into a reusable electronic form, publication on the Internet, and data compression for faster remote access. The steps of this process cannot be developed independently. DEBORA proposes a global approach to retro-conversion, from digitization to the final functionalities of the digital library, centered on users’ needs. The retro-conversion process is mainly based on a document image analysis system that simultaneously extracts the metadata and compresses the images. We also propose a file format that describes compressed books as heterogeneous data (images, text, links, annotations, physical layout, and logical structure) suitable for progressive transmission, editing, and annotation. DEBORA is an exploratory project that aims to demonstrate the feasibility of these concepts by developing prototypes tested by end-users.
Cite this article
Le Bourgeois, F., Emptoz, H. DEBORA: Digital AccEss to BOoks of the RenAissance. IJDAR 9, 193–221 (2007). https://doi.org/10.1007/s10032-006-0030-0
DOI: https://doi.org/10.1007/s10032-006-0030-0