Large-Lexicon Attribute-Consistent Text Recognition in Natural Images

Novikova, Tatiana; Barinova, Olga; Kohli, Pushmeet; Lempitsky, Victor

doi:10.1007/978-3-642-33783-3_54


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7577)

Included in the following conference series:

European Conference on Computer Vision

Abstract

This paper proposes a new model for the task of word recognition in natural images that simultaneously models visual and lexicon consistency of words in a single probabilistic model. Our approach combines local likelihood and pairwise positional consistency priors with higher order priors that enforce consistency of characters (lexicon) and their attributes (font and colour). Unlike traditional stage-based methods, word recognition in our framework is performed by estimating the maximum a posteriori (MAP) solution under the joint posterior distribution of the model. MAP inference in our model is performed through the use of weighted finite-state transducers (WFSTs). We show how the efficiency of certain operations on WFSTs can be utilized to find the most likely word under the model in an efficient manner. We evaluate our method on a range of challenging datasets (ICDAR’03, SVT, ICDAR’11). Experimental results demonstrate that our method outperforms state-of-the-art methods for cropped word recognition.
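The idea of picking the most likely word under a lexicon constraint can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the WFST machinery with a plain trie search, keeps only per-character costs (negative log-likelihoods), and omits the pairwise positional and attribute (font and colour) priors described above. The names `build_trie`, `map_word`, and `char_costs`, and all numeric values, are hypothetical.

```python
import math


def build_trie(lexicon):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def map_word(char_costs, lexicon):
    """Return (cost, word) for the lexicon word with the lowest summed cost.

    char_costs[i] maps a character to its non-negative cost (negative
    log-likelihood) at position i. Only words whose length equals
    len(char_costs) are considered. Assumption: costs are non-negative,
    which makes the pruning step below valid.
    """
    trie = build_trie(lexicon)
    best = (math.inf, None)
    # Each stack entry: (trie node, position, accumulated cost, prefix).
    stack = [(trie, 0, 0.0, "")]
    while stack:
        node, pos, cost, prefix = stack.pop()
        if cost >= best[0]:
            continue  # cannot improve on the best complete word found so far
        if pos == len(char_costs):
            if None in node:  # prefix is a complete lexicon word
                best = (cost, prefix)
            continue
        for ch, child in node.items():
            if ch is None:
                continue
            step = char_costs[pos].get(ch, math.inf)
            stack.append((child, pos + 1, cost + step, prefix + ch))
    return best


# Hypothetical per-position costs for a three-character word image.
char_costs = [
    {"c": 0.1, "e": 1.0},
    {"a": 0.2, "o": 0.3},
    {"t": 0.5, "r": 0.4, "n": 0.3},
]
lexicon = ["cat", "car", "eat", "dog", "cart"]

cost, word = map_word(char_costs, lexicon)
print(word)  # car
```

Choosing the cheapest character at each position independently would give "can", which is not in this lexicon; searching jointly over lexicon words returns "car" instead. Summing costs and taking the minimum corresponds to a shortest-path computation in the tropical semiring, which is the kind of operation a WFST library performs over a composed transducer.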

Author information

Authors and Affiliations

Lomonosov Moscow State University, Russia
Tatiana Novikova & Olga Barinova
Microsoft Research Cambridge, UK
Pushmeet Kohli
Yandex, Russia
Victor Lempitsky

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Novikova, T., Barinova, O., Kohli, P., Lempitsky, V. (2012). Large-Lexicon Attribute-Consistent Text Recognition in Natural Images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33783-3_54

DOI: https://doi.org/10.1007/978-3-642-33783-3_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33782-6
Online ISBN: 978-3-642-33783-3
eBook Packages: Computer Science, Computer Science (R0)