Deep Features for Text Spotting

Jaderberg, Max; Vedaldi, Andrea; Zisserman, Andrew

doi:10.1007/978-3-319-10593-2_34

Max Jaderberg¹⁹,
Andrea Vedaldi¹⁹ &
Andrew Zisserman¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8692))

Included in the following conference series:

European Conference on Computer Vision

25k Accesses
245 Citations

Abstract

The goal of this work is text spotting in natural images. This is divided into two sequential tasks: detecting words regions in the image, and recognizing the words within these regions. We make the following contributions: first, we develop a Convolutional Neural Network (CNN) classifier that can be used for both tasks. The CNN has a novel architecture that enables efficient feature sharing (by using a number of layers in common) for text detection, character case-sensitive and insensitive classification, and bigram classification. It exceeds the state-of-the-art performance for all of these. Second, we make a number of technical changes over the traditional CNN architectures, including no downsampling for a per-pixel sliding window, and multi-mode learning with a mixture of linear models (maxout). Third, we have a method of automated data mining of Flickr, that generates word and character level annotations. Finally, these components are used together to form an end-to-end, state-of-the-art text spotting system. We evaluate the text-spotting system on two standard benchmarks, the ICDAR Robust Reading data set and the Street View Text data set, and demonstrate improvements over the state-of-the-art on multiple measures.

Download to read the full chapter text

Chapter PDF

Reading Text in the Wild with Convolutional Neural Networks

Article 07 May 2015

Detecting Text in Natural Image with Connectionist Text Proposal Network

An Image Dataset of Text Patches in Everyday Scenes

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

http://algoval.essex.ac.uk/icdar/datasets.html
https://code.google.com/p/tesseract-ocr/
http://www.flickr.com/
http://www.flickr.com/groups/type/
http://www.iapr-tc11.org/mediawiki/index.php/kaist_scene_text_database
Alsharif, O., Pineau, J.: End-to-End Text Recognition with Hybrid HMM Maxout Models. In: ICLR (2014)
Google Scholar
Anthimopoulos, M., Gatos, B., Pratikakis, I.: Detection of artificial and scene text in images and video frames. Pattern Analysis and Applications, 1–16 (2011)
Google Scholar
Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: PhotoOCR: Reading text in uncontrolled conditions. In: ICCV (2013)
Google Scholar
Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proc. ICCV, vol. 2, pp. 105–112 (2001)
Google Scholar
de Campos, T., Babu, B.R., Varma, M.: Character recognition in natural images, pp. 591–604 (2009)
Google Scholar
Chen, H., Tsai, S., Schroth, G., Chen, D., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: Proc. International Conference on Image Processing (ICIP), pp. 2609–2612 (2011)
Google Scholar
Chen, X., Yuille, A.L.: Detecting and reading text in natural scenes. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, vol. 2, p. II–366. IEEE (2004)
Google Scholar
Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: 2011 International Conference on Document Analysis and Recognition (ICDAR), pp. 440–445. IEEE (2011)
Google Scholar
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531 (2013)
Google Scholar
Dutta, S., Sankaran, N., Sankar, K., Jawahar, C.: Robust recognition of degraded documents using character n-grams. In: International Workshop on Document Analysis Systems (DAS), pp. 130–134. IEEE (2012)
Google Scholar
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Proc. CVPR, pp. 2963–2970. IEEE (2010)
Google Scholar
Erhan, D., Bengio, Y., Courville, A., Vincent, P.: Visualizing higher-layer features of a deep network. Tech. rep. University of Montreal (2009)
Google Scholar
Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. IJCV 61(1) (2005)
Google Scholar
Goel, V., Mishra, A., Alahari, K., Jawahar, C.: Whole is greater than sum of parts: Recognizing scene text words. In: 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 398–402. IEEE (2013)
Google Scholar
Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V.: Multi-digit number recognition from street view imagery using deep convolutional neural networks. In: ICLR (2014)
Google Scholar
Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. arXiv preprint arXiv:1302.4389 (2013)
Google Scholar
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
Google Scholar
Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 (2014)
Google Scholar
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras, L.P., et al.: Icdar 2013 robust reading competition. In: 2013 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 1484–1493. IEEE (2013)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, vol. 1, p. 4 (2012)
Google Scholar
Lalonde, M., Gagnon, L.: Key-text spotting in documentary videos using adaboost. In: Electronic Imaging 2006, p. 60641N. International Society for Optics and Photonics (2006)
Google Scholar
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
Article Google Scholar
Lucas, S.M.: Icdar 2005 text locating competition results. In: Proceedings of the Eighth International Conference on Document Analysis and Recognition 2005, pp. 80–84. IEEE (2005)
Google Scholar
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proc. BMVC, pp. 384–393 (2002)
Google Scholar
Mathieu, M., Henaff, M., LeCun, Y.: Fast training of convolutional networks through FFTs. CoRR abs/1312.5851 (2013)
Google Scholar
Mishra, A., Alahari, K., Jawahar, C., et al.: Scene text recognition using higher order language priors. In: 23rd British Machine Vision Conference on BMVC 2012 (2012)
Google Scholar
Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010, Part III. LNCS, vol. 6494, pp. 770–783. Springer, Heidelberg (2011)
Chapter Google Scholar
Neumann, L., Matas, J.: Text localization in real-world images using efficiently pruned exhaustive search. In: Proc. ICDAR, pp. 687–691. IEEE (2011)
Google Scholar
Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Proc. CVPR, vol. 3, pp. 1187–1190. IEEE (2012)
Google Scholar
Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: 2013 IEEE International Conference on Computer Vision (ICCV 2013), pp. 97–104. IEEE, California (2013)
Chapter Google Scholar
Novikova, T., Barinova, O., Kohli, P., Lempitsky, V.: Large-lexicon attribute-consistent text recognition in natural images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VI. LNCS, vol. 7577, pp. 752–765. Springer, Heidelberg (2012)
Chapter Google Scholar
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62–66 (1979)
Article MathSciNet Google Scholar
Ozuysal, M., Fua, P., Lepetit, V.: Fast keypoint recognition in ten lines of code. In: Proc. CVPR (2007)
Google Scholar
Posner, I., Corke, P., Newman, P.: Using text-spotting to query the world. In: Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS (2010)
Google Scholar
Quack, T.: Large scale mining and retrieval of visual data in a multimodal context. Ph.D. thesis, ETH Zurich (2009)
Google Scholar
Rath, T., Manmatha, R.: Word spotting for historical documents. IJDAR 9(2-4), 139–152 (2007)
Article Google Scholar
Shahab, A., Shafait, F., Dengel, A.: Icdar 2011 robust reading competition challenge 2: Reading text in scene images. In: Proc. ICDAR, pp. 1491–1496. IEEE (2011)
Google Scholar
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. In: Workshop at International Conference on Learning Representations (2014)
Google Scholar
Torralba, A., Murphy, K.P., Freeman, W.T.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proc. CVPR, pp. 762–769 (2004)
Google Scholar
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: Proc. ICCV, pp. 1457–1464. IEEE (2011)
Google Scholar
Wang, K., Belongie, S.: Word spotting in the wild. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 591–604. Springer, Heidelberg (2010)
Chapter Google Scholar
Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: 2012 21st International Conference on Pattern Recognition (ICPR), pp. 3304–3308. IEEE (2012)
Google Scholar
Weinman, J.J., Butler, Z., Knoll, D., Feild, J.: Toward integrated scene text reading. IEEE Trans. Pattern Anal. Mach. Intell. 36(2), 375–387 (2014)
Article Google Scholar
Yang, H., Quehl, B., Sack, H.: A framework for improved video text detection and recognition. Int. Journal of Multimedia Tools and Applications, MTAP (2012)
Google Scholar
Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Transactions on Image Processing 20(9), 2594–2605 (2011)
Article MathSciNet Google Scholar
Yin, X.C., Yin, X., Huang, K.: Robust text detection in natural scene images. CoRR abs/1301.2628 (2013)
Google Scholar

Download references

Author information

Authors and Affiliations

Visual Geometry Group, Department of Engineering Science, University of Oxford, UK
Max Jaderberg, Andrea Vedaldi & Andrew Zisserman

Authors

Max Jaderberg
View author publications
You can also search for this author in PubMed Google Scholar
Andrea Vedaldi
View author publications
You can also search for this author in PubMed Google Scholar
Andrew Zisserman
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
KU Leuven, ESAT - PSI, iMinds, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jaderberg, M., Vedaldi, A., Zisserman, A. (2014). Deep Features for Text Spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_34

Download citation

DOI: https://doi.org/10.1007/978-3-319-10593-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Deep Features for Text Spotting

Abstract

Chapter PDF

Similar content being viewed by others

Reading Text in the Wild with Convolutional Neural Networks

Detecting Text in Natural Image with Connectionist Text Proposal Network

An Image Dataset of Text Patches in Everyday Scenes

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Deep Features for Text Spotting

Abstract

Chapter PDF

Similar content being viewed by others

Reading Text in the Wild with Convolutional Neural Networks

Detecting Text in Natural Image with Connectionist Text Proposal Network

An Image Dataset of Text Patches in Everyday Scenes

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation