Abstract
This paper proposes a model for word recognition in natural images that captures the visual and lexicon consistency of words within a single probabilistic model. Our approach combines local likelihood and pairwise positional consistency priors with higher-order priors that enforce consistency of characters (lexicon) and their attributes (font and colour). Unlike traditional stage-based methods, our framework recognizes a word by estimating the maximum a posteriori (MAP) solution under the model's joint posterior distribution. MAP inference is performed using weighted finite-state transducers (WFSTs). We show how efficient WFST operations can be exploited to find the most likely word under the model. We evaluate our method on a range of challenging datasets (ICDAR’03, SVT, ICDAR’11). Experimental results demonstrate that our method outperforms state-of-the-art methods for cropped word recognition.
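To make the idea of lexicon-constrained MAP decoding concrete, the following is a minimal sketch rather than the method of the paper. It assumes only per-position character costs (hypothetical negative log-likelihoods) and a small lexicon, and it leaves out the pairwise positional priors and the font and colour attribute priors entirely. The lexicon is stored as a trie, which plays the role of a deterministic acceptor, and the best word is found by a shortest-path search over (position, trie state) pairs, which mimics composing a character lattice with a lexicon acceptor. The names `build_trie`, `map_word`, and `char_costs` are illustrative and do not come from the paper or from any WFST library.

```python
import heapq
import math


def build_trie(lexicon):
    """Build a trie over the lexicon; it acts as a deterministic acceptor."""
    trie = [{}]      # trie[state] maps a character to the next state
    final = set()    # accepting states, one per complete lexicon word
    for word in lexicon:
        state = 0
        for ch in word:
            if ch not in trie[state]:
                trie.append({})
                trie[state][ch] = len(trie) - 1
            state = trie[state][ch]
        final.add(state)
    return trie, final


def map_word(char_costs, lexicon):
    """Return the lexicon word with the lowest total cost, and that cost.

    char_costs[i][c] is an assumed negative log-likelihood of character c
    at position i. Characters missing from char_costs[i] are treated as
    impossible at that position.
    """
    trie, final = build_trie(lexicon)
    n = len(char_costs)
    # Each heap entry is (cost so far, position, trie state, decoded prefix).
    # In a trie every state has a unique prefix, so no visited set is needed.
    heap = [(0.0, 0, 0, "")]
    while heap:
        cost, pos, state, prefix = heapq.heappop(heap)
        if pos == n:
            if state in final:
                return prefix, cost   # first accepted pop is the cheapest word
            continue
        for ch, nxt in trie[state].items():
            if ch in char_costs[pos]:
                step = char_costs[pos][ch]
                heapq.heappush(heap, (cost + step, pos + 1, nxt, prefix + ch))
    return None, math.inf             # no lexicon word fits the observations


if __name__ == "__main__":
    # The local evidence alone prefers "cet", which is not in the lexicon.
    costs = [
        {"c": 0.1, "d": 2.0},
        {"e": 0.3, "a": 0.5, "o": 2.5},
        {"t": 0.2, "r": 1.5, "g": 3.0},
    ]
    word, cost = map_word(costs, ["cat", "car", "dog"])
    print(word)  # prints cat
```

In this toy case the lexicon overrides the locally cheapest character at the second position, which is the kind of joint decision a stage-based pipeline would have to repair after the fact. A WFST toolkit would express the same search as a composition followed by a shortest-path operation, and would also allow the additional priors to be added as further transducers.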
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Novikova, T., Barinova, O., Kohli, P., Lempitsky, V. (2012). Large-Lexicon Attribute-Consistent Text Recognition in Natural Images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33783-3_54
DOI: https://doi.org/10.1007/978-3-642-33783-3_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33782-6
Online ISBN: 978-3-642-33783-3
eBook Packages: Computer Science (R0)