Abstract.
In this paper, we propose several methods for analyzing and recognizing Chinese video captions, which constitute a very useful information source for video content. Image binarization, performed by combining a global threshold method and a window-based method, is used to obtain clearer images of characters, and a caption-tracking scheme is used to locate caption regions and detect caption changes. The separation of characters from possibly complex backgrounds is achieved by using size and color constraints and by cross examination of multiframe images. To segment individual characters, we use a dynamic split-and-merge strategy. Finally, we propose a character recognition process using a prototype classification method, supplemented by a disambiguation process using support vector machines, to improve recognition outcomes. This is followed by a postprocess that integrates multiple recognition results. The overall accuracy rate for the entire process applied to test video films is 94.11%.
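As an illustration of the binarization step, the following minimal sketch combines a global threshold with a window-based test. The abstract does not say how the two methods are combined, so the rule used here is an assumption: a pixel is kept only if it exceeds both the global Otsu threshold and the mean of its local window. The sketch also assumes bright characters on a darker background. The names `otsu_threshold`, `window_mean`, `binarize`, and the parameter `half` are hypothetical and do not come from the paper.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, integer values in 0..255


def otsu_threshold(img: Image) -> int:
    """Global threshold that maximizes between-class variance (Otsu 1979)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is no split to score
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def window_mean(img: Image, r: int, c: int, half: int) -> float:
    """Mean intensity of the (2*half+1)-wide window centered at (r, c), clipped at borders."""
    h, w = len(img), len(img[0])
    vals = [
        img[i][j]
        for i in range(max(0, r - half), min(h, r + half + 1))
        for j in range(max(0, c - half), min(w, c + half + 1))
    ]
    return sum(vals) / len(vals)


def binarize(img: Image, half: int = 1) -> Image:
    """Assumed combination rule: foreground only if both tests agree.

    A pixel becomes 1 when it is brighter than the global Otsu threshold
    AND brighter than the mean of its local window; otherwise it is 0.
    """
    t = otsu_threshold(img)
    return [
        [1 if (v > t and v > window_mean(img, r, c, half)) else 0
         for c, v in enumerate(row)]
        for r, row in enumerate(img)
    ]


# A one-pixel-high bright stroke on a dark background.
frame = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 200, 10],
    [10, 10, 10, 10, 10],
]
mask = binarize(frame)
```

Requiring both tests to agree is one way to let the window-based test reject large uniformly bright regions that a global threshold alone would keep. The window size would need tuning to the stroke width, since a window smaller than the stroke would also reject the stroke interior.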
Published online: 2 February 2005
Cite this article
Chang, F., Chen, GC., Lin, CC. et al. Caption analysis and recognition for building video indexing systems. Multimedia Systems 10, 344–355 (2005). https://doi.org/10.1007/s00530-004-0159-y
DOI: https://doi.org/10.1007/s00530-004-0159-y