1 Introduction

Text in scene images carries important information and is exploited in many content-based video and image applications [1]. Text segmentation is a challenging problem because of variations in font, color, size, orientation, etc. Binarization is also a major challenge, especially when processing text-based scene images, where the binarization result can directly influence the OCR rate. Several binarization methods exist for document images, but they cannot be applied directly to low-resolution images. Conventional binarization techniques use either global [2] or local [3, 4] thresholding. Existing techniques for scene text segmentation can generally be classified into two groups: sliding window [5] and connected component (CC) [6] based schemes. Sliding window based schemes use a sliding window to search for possible text in the scene image and then use machine learning methods to identify text. CC based methods extract character candidates from scene images by CC analysis. Because of their relatively simple implementation, CC based methods are widely used. Here, we are interested in color images with embedded text. Our methods are presented in the following sections.

2 Variance Based Image Binarization Scheme

Consider a color image consisting of red (R), green (G) and blue (B) planes. Information must be retrieved from each plane. However, the image contains highly varying gray pixel values, which makes binarization a difficult task. This difficulty can be overcome to a great extent by using variance to guide the binarization.

Each plane is passed separately through the binarization process. First, the variance matrix, which captures the change in pixel intensities across the image, is calculated from the gray scale image. By binarizing the variance matrix, we separate the image into two regions, one with high variance values and the other with low variance values. Using each region, two gray scale images (one from the white region and the other from the black region of the binarized variance matrix) are formed. Binarizing these two images separately produces a more even binarization, as neither contains strongly fluctuating pixel intensities. These gray scale images are binarized by a window based Otsu binarization method. Finally, the binarized images from the three planes are merged to form the final binarized image. The details are presented in Algorithm 1.
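For reference, one common way to define the local variance of a gray scale plane is written out below. The symbols \(I\) (pixel intensity), \(W_{x,y}\) (the \(w\times w\) window centred on pixel \((x, y)\)), \(\mu\) and \(\sigma^2\) are notation assumed here for illustration; the exact formula used in Algorithm 1 is not given in the text.

```latex
\[
\mu(x,y) = \frac{1}{w^{2}} \sum_{(i,j)\in W_{x,y}} I(i,j),
\qquad
\sigma^{2}(x,y) = \frac{1}{w^{2}} \sum_{(i,j)\in W_{x,y}} \bigl(I(i,j) - \mu(x,y)\bigr)^{2}
\]
```

Under this definition, \(\sigma^{2}(x,y)\) is close to zero in flat regions and large near edges, which is what allows the variance matrix to split a plane into a high variance and a low variance region.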

Algorithm 1.
figure a

Consider the RGB image (Fig. 1(a)) as input. The image is separated into three planes (Figs. 1(b), (c) and (d)). Consider the R plane, which is passed into the proposed binarization algorithm. Figure 2(a) shows the binarized variance matrix, where the variance is calculated by moving a \(5\times 5\) window over the image. Separate gray scale images are then formed for the white pixels and the black pixels, and a window based Otsu algorithm is applied to each. The results are presented in Figs. 3(a) and (b). These are merged to obtain the binarized image for the R plane (Fig. 3(e)). Similarly, the binarized images for the G and B planes are presented in Figs. 4(c) and (d). The binarized images from all planes are merged to obtain the final binarized image (Fig. 4(e)).
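The walk-through above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration and not the authors' implementation. Several details are assumptions, because the text does not specify them: the variance matrix is rescaled to 0..255 and binarized with Otsu, windows are clipped at the image borders, a single Otsu threshold per region stands in for the window based Otsu step, pixels above a threshold are mapped to 1, and the three planes are merged with a logical OR. All function names are hypothetical.

```python
def otsu_threshold(values):
    """Otsu threshold for 8-bit gray values; pixels > threshold form one class."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def local_variance(plane, win=5):
    """Variance of each pixel's win x win neighbourhood (clipped at the borders)."""
    h, w, r = len(plane), len(plane[0]), win // 2
    var = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [plane[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            var[y][x] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return var


def binarize_plane(plane, win=5):
    """Binarize one 8-bit plane (a list of rows) using a variance based split."""
    h, w = len(plane), len(plane[0])
    var = local_variance(plane, win)
    vmax = max(max(row) for row in var)
    # Assumption: the variance matrix is rescaled to 0..255 and binarized with Otsu.
    scaled = [[round(255 * v / vmax) if vmax > 0 else 0 for v in row] for row in var]
    t_var = otsu_threshold(v for row in scaled for v in row)
    high = [[v > t_var for v in row] for row in scaled]
    out = [[0] * w for _ in range(h)]
    for region in (True, False):  # high variance pixels, then low variance pixels
        coords = [(y, x) for y in range(h) for x in range(w) if high[y][x] == region]
        if not coords:
            continue
        # Simplification: one Otsu threshold per region instead of a windowed Otsu.
        t = otsu_threshold(plane[y][x] for y, x in coords)
        for y, x in coords:
            out[y][x] = 1 if plane[y][x] > t else 0
    return out


def binarize_rgb(r, g, b, win=5):
    """Binarize each plane and merge them. Assumption: the merge is a logical OR."""
    planes = [binarize_plane(p, win) for p in (r, g, b)]
    h, w = len(r), len(r[0])
    return [[int(any(p[y][x] for p in planes)) for x in range(w)] for y in range(h)]
```

Thresholding the high and low variance regions separately means that pixels near edges and pixels in flat areas each get a threshold fitted to their own intensity distribution. In practice an array library would replace the nested loops; they are kept here only to make each step explicit.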

Fig. 1.
figure 1

(a) Input color image. (b) R plane. (c) G plane. (d) B plane. (Color figure online)

Fig. 2.
figure 2

Binarization for variance matrices on (a) R plane, (b) G plane and (c) B plane.

Fig. 3.
figure 3

Binarized gray scale images corresponding to (a) white pixels in Fig. 2(a), (b) black pixels in Fig. 2(a), (c) white pixels in Fig. 2(b) and (d) black pixels in Fig. 2(b). (e) Merged image of 3(a) and 3(b).

Fig. 4.
figure 4

Binarized gray scale images corresponding to (a) white pixels in Fig. 2(c) and (b) black pixels in Fig. 2(c). Merged images: (c) of 3(c) and 3(d), (d) of 4(a) and 4(b), and (e) of 3(e), 4(c) and 4(d).

3 Shape Based Feature Extraction

Image binarization creates a number of CCs. In order to segment text, we have considered a number of features from each CC.

  1. AL:

    The axial ratio (AL) of a CC is the ratio of the lengths of its two axes: the longer axis divided by the shorter.

  2. LO:

    Number of lobes in a CC [7].

  3. A:

    Aspect ratio of a CC [7].

  4. E:

    Elongation ratio of a CC [7].

  5. O:

    Object to background pixels ratio of a CC [7].

  6. AR:

    Area ratio of a CC: the ratio of the area of the CC to the area of the input image.

  7. L:

    Length ratio (L) of a CC: the ratio of max(height of the CC, width of the CC) to max(height of I, width of I), where I is the input image.

Now, we construct the feature vector \(\mathbf{Y} = \{AL, LO, A, E, O, AR, L\}\) for a CC.
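As an illustration, the three features defined directly in the text (AL, AR and L) could be computed as in the sketch below. The remaining features (LO, A, E and O) follow [7] and are not reproduced here. The function name and the input format (a CC given as a list of (row, column) pixel coordinates) are hypothetical. Two further assumptions are made: the area of a CC is taken as its pixel count, and the two axes used for AL are taken as the sides of the bounding box. Principal axes of a fitted ellipse would be another reasonable reading.

```python
def shape_features(cc_pixels, image_height, image_width):
    """Return AL, AR and L for one connected component.

    cc_pixels: list of (row, col) coordinates belonging to the component.
    """
    rows = [r for r, _ in cc_pixels]
    cols = [c for _, c in cc_pixels]
    height = max(rows) - min(rows) + 1  # bounding-box height of the CC
    width = max(cols) - min(cols) + 1   # bounding-box width of the CC

    # AL: longer axis divided by shorter axis (assumption: bounding-box sides).
    axial_ratio = max(height, width) / min(height, width)
    # AR: area of the CC (assumption: pixel count) over the area of the image.
    area_ratio = len(cc_pixels) / (image_height * image_width)
    # L: longest side of the CC over the longest side of the input image I.
    length_ratio = max(height, width) / max(image_height, image_width)

    return {"AL": axial_ratio, "AR": area_ratio, "L": length_ratio}


# Usage: a 2 x 4 rectangular component inside a 10 x 20 image.
component = [(r, c) for r in range(3, 5) for c in range(1, 5)]
features = shape_features(component, image_height=10, image_width=20)
```

Because AR and L are normalized by the image size, they stay comparable across images of different resolution.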

4 KNN and SVM Based Text Segmentation

To segment the text components, K-NN and SVM classifiers are applied. The feature vector Y is calculated for text and non-text CCs. The dataset contains 420 training images and 102 test images. Ground truth information of the training images is used to create the feature file for 21700 text components. Next, the input images are binarized with our binarization method, and the components present in the ground truth images are eliminated. This yields 78800 non-text CCs, which are used to prepare the feature file for non-text components. Based on these feature files, the K-NN and SVM classifiers are trained separately. To segment the text components from a test image, the image is binarized using our binarization method and the feature vector Y is obtained for each CC. Each CC is then fed to both the trained K-NN and SVM classifiers to decide whether the component is text or non-text. Thus, two output images are obtained, one from each classifier. Finally, these two images are merged using a logical OR operation to get the final image consisting only of text components.
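The decision and merging steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the trained system: the K-NN classifier is a plain majority vote with Euclidean distance, the value of k and the two-dimensional toy features are made up for the example (the paper uses the seven-dimensional vector Y), and the SVM decisions are passed in as a precomputed list rather than produced by a trained SVM. Both output images contain CCs from the same binarized image, so the OR of the two images amounts to keeping every CC that at least one classifier labels as text. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    nearest = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def merge_decisions(knn_is_text, svm_is_text):
    """Logical OR per CC: keep a CC if either classifier labels it as text."""
    return [a or b for a, b in zip(knn_is_text, svm_is_text)]


# Toy training data: True marks a text CC, False a non-text CC.
train_features = [(1.0, 0.10), (1.2, 0.20), (1.1, 0.15),
                  (5.0, 0.90), (5.5, 0.80), (6.0, 0.95)]
train_labels = [True, True, True, False, False, False]

test_ccs = [(1.05, 0.12), (5.2, 0.85)]
knn_is_text = [knn_predict(train_features, train_labels, cc) for cc in test_ccs]
svm_is_text = [False, False]  # placeholder for the output of a trained SVM
final_is_text = merge_decisions(knn_is_text, svm_is_text)
```

An OR merge can only add CCs to the final result, so it favors recall: a text component missed by one classifier is still kept if the other one accepts it, at the cost of also keeping the false positives of both.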

5 Results and Discussion

The experimental results are obtained on the ICDAR 2011 Born Digital Dataset [8]. These images are inherently low-resolution, so automatic segmentation of their text is an important task. Our experiments are divided into two parts, in line with the aims of the paper.

5.1 Results of Binarization Scheme

Let us first look at some binarization results. A few example results are presented in Table 2: the first column shows the sample input images and the second column shows the corresponding binarized images. Our binarization scheme is evaluated in terms of precision, recall and F-measure (FM) [7]. Its performance is also compared with a few known methods in terms of recall, precision and FM on the ICDAR 2011 Born Digital data set. The results (Table 1) show that our binarization method outperforms the compared methods.
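For readers unfamiliar with these measures, the sketch below computes precision, recall and F-measure from a predicted binary mask and a ground truth mask. It uses the standard definitions with flat pixel-wise counting, which is an assumption; the exact evaluation protocol of [7] is not reproduced here, and the function name is hypothetical.

```python
def precision_recall_f(predicted, truth):
    """Precision, recall and F-measure for two flat sequences of 0/1 labels."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)        # text pixels found
    fp = sum(1 for p, t in pairs if p and not t)    # background marked as text
    fn = sum(1 for p, t in pairs if not p and t)    # text pixels missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Usage on a six-pixel toy example: 3 true positives, 1 false positive, 1 miss.
predicted = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 1, 0, 1]
precision, recall, f_measure = precision_recall_f(predicted, truth)
```

The F-measure is the harmonic mean of precision and recall, so it is high only when both are high.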

Table 1. Recall, Precision and FM for different binarization techniques.
Table 2. Input images, binarized images and segmented text (using KNN, SVM and merged KNN and SVM) are presented in the \(1^{st}\), \(2^{nd}\), \(3^{rd}\), \(4^{th}\) and \(5^{th}\) columns, respectively.

5.2 Text Identification Results

We present the text segmentation results obtained by our KNN and SVM classifiers. A few images and their corresponding segmented text using the KNN, SVM and merged KNN and SVM classifiers are presented in Table 2. Visually, our approach performs well at text segmentation. For a quantitative comparison, the Recall, Precision and FM values of our different classification methods on the ICDAR 2011 Born Digital data set images are presented in Table 3. The final evaluation of our scheme compares it with other known techniques. The ICDAR 2011 Robust Reading Competition presented evaluation results for a number of methods from different participants. In Table 3, a few of these techniques are compared with our scheme. Our scheme achieves the highest recall (77.72).

Table 3. Recall, Precision and FM for different text segmentation methods.

6 Summary and Future Scope

This paper presents a new variance based image binarization scheme and its application to text segmentation. A number of shape based features are defined for the segmentation of text. SVM and KNN classifiers are then trained to classify text and non-text components. Finally, the results obtained from SVM and KNN are merged to get the final segmented text. The proposed method is effective for low resolution images, as shown on the ICDAR 2011 Born Digital data set. Future work may aim at combining machine learning tools to improve the binarization scheme.