Abstract.
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
Additional information
Received: 18 December 2003, Accepted: 1 November 2004, Published online: 21 June 2005
Cite this article
Liang, J., Doermann, D. & Li, H. Camera-based analysis of text and documents: a survey. IJDAR 7, 84–104 (2005). https://doi.org/10.1007/s10032-004-0138-z
DOI: https://doi.org/10.1007/s10032-004-0138-z