Caption analysis and recognition for building video indexing systems

Chang, Fu; Chen, Guey-Ching; Lin, Chin-Chin; Lin, Wen-Hsiung

doi:10.1007/s00530-004-0159-y

Published: April 2005

Multimedia Systems, Volume 10, pages 344–355 (2005)

Fu Chang¹, Guey-Ching Chen¹, Chin-Chin Lin¹,² & Wen-Hsiung Lin¹

Abstract.

In this paper, we propose several methods for analyzing and recognizing Chinese video captions, which constitute a very useful information source for video content. Image binarization, performed by combining a global threshold method and a window-based method, is used to obtain clearer images of characters, and a caption-tracking scheme is used to locate caption regions and detect caption changes. The separation of characters from possibly complex backgrounds is achieved by using size and color constraints and by cross examination of multiframe images. To segment individual characters, we use a dynamic split-and-merge strategy. Finally, we propose a character recognition process using a prototype classification method, supplemented by a disambiguation process using support vector machines, to improve recognition outcomes. This is followed by a postprocess that integrates multiple recognition results. The overall accuracy rate for the entire process applied to test video films is 94.11%.
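
The abstract does not say how the global threshold and the window-based method are combined, so the following is only a minimal sketch under stated assumptions: Otsu's method stands in for the global threshold, a local mean over a small square window stands in for the window-based method, caption text is assumed to be brighter than its background, and a pixel is kept only when it passes both tests. The function names (`otsu_threshold`, `window_mean`, `binarize`) and the `half` window parameter are hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def otsu_threshold(img: Image) -> int:
    """Global threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def window_mean(img: Image, r: int, c: int, half: int) -> float:
    """Mean gray level of the window centered at (r, c), clipped to the image."""
    rows = range(max(0, r - half), min(len(img), r + half + 1))
    cols = range(max(0, c - half), min(len(img[0]), c + half + 1))
    vals = [img[i][j] for i in rows for j in cols]
    return sum(vals) / len(vals)


def binarize(img: Image, half: int = 1) -> Image:
    """Assumed combination rule: foreground (1) only if the pixel is above the
    global threshold AND at least as bright as its local window mean."""
    t = otsu_threshold(img)
    return [
        [1 if v > t and v >= window_mean(img, r, c, half) else 0
         for c, v in enumerate(row)]
        for r, row in enumerate(img)
    ]


if __name__ == "__main__":
    frame = [
        [10, 10, 10, 10, 10],
        [10, 200, 200, 200, 10],
        [10, 200, 200, 200, 10],
        [10, 10, 10, 10, 10],
    ]
    for row in binarize(frame):
        print(row)
```

Requiring both tests is one plausible design choice: the global threshold rejects uniformly dark background, while the window test rejects pixels that are bright overall but darker than their immediate surroundings. Other combinations, such as using the window method only where the global result is ambiguous, would fit the abstract's description equally well.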

Author information

Authors and Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan
Fu Chang, Guey-Ching Chen, Chin-Chin Lin & Wen-Hsiung Lin
Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan
Chin-Chin Lin

Corresponding author

Correspondence to Fu Chang.

Additional information

Published online: 2 February 2005

About this article

Cite this article

Chang, F., Chen, GC., Lin, CC. et al. Caption analysis and recognition for building video indexing systems. Multimedia Systems 10, 344–355 (2005). https://doi.org/10.1007/s00530-004-0159-y

Issue Date: April 2005
DOI: https://doi.org/10.1007/s00530-004-0159-y
