1 Introduction

Recognition of text in images is an active research area, and the literature reports many techniques for it. Most of them consider only grayscale images and involve two processes: binarization and recognition. Binarization is important because proper recognition depends on good pre-processing. A variety of OCR engines is available for Roman scripts [1, 2], but images submitted to those engines do not necessarily produce good results. Word recognition basically involves two procedures: detection (or localization) followed by recognition [3]. The best-performing algorithm achieved a recognition rate of 52% in ICDAR 2003 [4]; in ICDAR 2011, the rates were 62.5% using TH-OCR and 64.3% using ABBYY Fine Reader [5]. As stated already, recognition involves two stages: binarization followed by recognition. For the recognition stage, training datasets and OCR engines for Roman scripts already exist [7–10], so that stage needs little further research. It is the earlier stage, binarization, that needs to be improved. Although the literature reports different algorithms for the binarization of color images, none of them produced proper results. Hence, a novel algorithm that combines color component clustering [6, 7] with the Power Law Transform (PLT) [8] is applied for the binarization of the images. Clustering of similar color components and PLT were each proposed separately for binarization; the novelty here lies in combining the two. Scanned images containing different colors, such as pamphlets and posters, are considered (Fig. 1).

Fig. 1 Some color image dataset considered

First, clustering of the color components is applied to the image, and then the PLT algorithm is applied. Because an image contains different color components, the clustering step gives more efficient results. Because most of the characters appear to be linked with each other, the Power Law Transform reduces the stroke width and hence separates the characters. Combining the two algorithms is expected to increase the recognition rate further.

2 Related Work

Various algorithms have been proposed in the literature for the binarization of color images, including the global threshold algorithm [11], the adaptive threshold algorithm [8], the Power Law Transform method [12], and clustering algorithms [13–15]. All of these algorithms have certain flaws that reduce the recognition rate. Of these, the Power Law Transform is the simplest, with very good efficiency [8], but in certain cases it fails to binarize the image.

Figures 2 and 3 show cases in which the PLT algorithm fails to produce proper results.

Fig. 2 Images in which PLT fails

Fig. 3 Binarized images by the PLT

3 Color Components Clustering

Text is one of the most important features of the images taken into consideration. Because color images are normally multi-colored, binarization becomes challenging. Hence, a novel approach of clustering the different color components is considered. The red, green, and blue color planes are first extracted, Canny edge detection [16–18] is performed on each extracted plane independently, and finally the union of the three edge-detected images is taken. The edge map is obtained as shown below:

$${\mathbf{E}} = {\mathbf{E}}_{{\mathbf{R}}} \;\vee\; {\mathbf{E}}_{{\mathbf{G}}} \;\vee\; {\mathbf{E}}_{{\mathbf{B}}}$$

Here E_R, E_G, and E_B are the Canny edge-detected images of the red, green, and blue planes, respectively, and ∨ denotes the union, i.e., the pixel-wise OR operation. Consider the same images on which the PLT had failed (Fig. 4).

Fig. 4 Images considered and the subsequent images showing the edge detected red, green, and blue plane images and the final image representing the addition of all the three images (Color figure online)
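The edge map construction can be illustrated with a minimal sketch. It is not the authors' implementation: the `edge_map` function below is a crude gradient-threshold detector used as a stand-in for Canny so that the example needs no external library, and the function names and the `threshold` parameter are assumptions made for illustration. Only the final pixel-wise OR corresponds directly to the equation for E.

```python
def edge_map(plane, threshold=50):
    """Crude gradient-magnitude edge detector (a stand-in for Canny).

    plane: 2-D list of intensities (0-255). Returns a 2-D list of 0/1.
    """
    h, w = len(plane), len(plane[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the image border.
            gx = plane[y][min(x + 1, w - 1)] - plane[y][x]
            gy = plane[min(y + 1, h - 1)][x] - plane[y][x]
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


def union_edge_maps(e_r, e_g, e_b):
    """Pixel-wise OR of the three per-plane edge maps: E = E_R v E_G v E_B."""
    return [
        [r | g | b for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(e_r, e_g, e_b)
    ]


def color_edge_map(image, threshold=50):
    """image: 2-D list of (R, G, B) tuples. Returns the combined edge map E."""
    # Split the image into its red, green, and blue planes.
    planes = [[[px[c] for px in row] for row in image] for c in range(3)]
    e_r, e_g, e_b = (edge_map(p, threshold) for p in planes)
    return union_edge_maps(e_r, e_g, e_b)
```

Processing the planes separately means that a boundary between two colors is still detected when it shows up in only one or two planes; the OR keeps an edge pixel if any plane reports it.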

Once this is done, 8-connected component labeling is applied to the edge map, and an independent rectangular bounding box is fitted to each component. Each of these rectangular boxes is called an edge box (EB). The EBs are subject to certain constraints. The aspect ratio must lie within the range 0.1–10, which eliminates elongated areas. The size of an edge box should be more than 10 pixels and less than 1/5th of the total image size; otherwise, the box is subjected to further pre-processing (Fig. 5).

Fig. 5 Edge map image with edge box (EB)
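The edge box step can be sketched as follows, under stated assumptions. The function names are hypothetical, boxes are represented as `(x0, y0, x1, y1)` tuples with inclusive coordinates, and the "size" of an edge box is assumed to mean its area in pixels, which the text does not state explicitly.

```python
from collections import deque


def edge_boxes(edge_map):
    """Bounding boxes (x0, y0, x1, y1) of 8-connected components of 1-pixels."""
    h, w = len(edge_map), len(edge_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if edge_map[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over the 8 neighbours.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and edge_map[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def is_valid_box(box, image_area):
    """Apply the stated constraints; 'size' is ASSUMED to mean box area."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0 + 1, y1 - y0 + 1
    aspect = width / height
    area = width * height
    return 0.1 <= aspect <= 10 and 10 < area < image_area / 5
```

With 8-connectivity, diagonally adjacent edge pixels belong to the same component, so a slanted stroke yields one edge box rather than several fragments.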

Edge boxes may overlap. The algorithm is designed so that only the internal edge boxes are taken into consideration and the others are ignored.
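The text does not spell out how internal edge boxes are selected, so the sketch below shows only one possible reading: a box that fully encloses another box is discarded, and the enclosed box is kept. The function names and the `(x0, y0, x1, y1)` box representation are assumptions, and partially overlapping boxes are left untouched here.

```python
def contains(outer, inner):
    """True if box `inner` lies entirely inside box `outer` (and differs from it)."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return (outer != inner
            and ox0 <= ix0 and oy0 <= iy0
            and ix1 <= ox1 and iy1 <= oy1)


def keep_internal_boxes(boxes):
    """Drop every box that fully encloses at least one other box (assumed rule)."""
    return [b for b in boxes
            if not any(contains(b, other) for other in boxes)]
```

For example, given an outer box `(0, 0, 10, 10)` that encloses `(2, 2, 4, 4)`, only the inner box and any unrelated boxes survive.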

4 Power Law Transform

After clustering the similar color components and creating an independent edge box for every text character, the Power Law Transform is applied within every edge box so that binarization is performed independently within each box. A common global thresholding technique, Otsu's method, can be applied after the Power Law Transformation [11]. A further problem is arriving at the optimum threshold value. The histogram is split into two parts by the threshold value k, and the optimum threshold k* is found by maximizing the following objective function.

$$\sigma^{2}(k^{*}) = \max_{k} \frac{[\mu_{T}\,\omega(k) - \mu(k)]^{2}}{\omega(k)\,[1 - \omega(k)]}$$

where,

$$\omega(k) = \sum_{i = 1}^{k} p_{i};\quad \mu(k) = \sum_{i = 1}^{k} i\,p_{i};\quad \mu_{T} = \sum_{i = 1}^{L} i\,p_{i}$$

Here, L is the number of gray levels and p_i is the normalized probability distribution (histogram) of the image. One of the most important advantages of the Power Law Transform is that it increases the contrast of the image and creates well-distinguishable pixels for improved segmentation.
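The threshold selection by Otsu's method can be sketched as below. This is a minimal illustration, not the authors' code: the function name is hypothetical, and gray levels are indexed from 0 to L-1 rather than 1 to L, which shifts the means but does not change which k maximizes the objective.

```python
def otsu_threshold(histogram):
    """Return the level k maximising the between-class variance.

    histogram: list of pixel counts per gray level (index 0..L-1).
    Pixels with level <= k form one class, the rest form the other.
    """
    total = sum(histogram)
    p = [count / total for count in histogram]       # normalised p_i
    mu_t = sum(i * p_i for i, p_i in enumerate(p))   # global mean mu_T
    best_k, best_var = 0, -1.0
    omega = mu = 0.0
    for k, p_k in enumerate(p):
        omega += p_k          # cumulative probability omega(k)
        mu += k * p_k         # cumulative first moment mu(k)
        denom = omega * (1.0 - omega)
        if denom <= 0.0:      # one class is empty; objective undefined
            continue
        var = (mu_t * omega - mu) ** 2 / denom
        if var > best_var:
            best_k, best_var = k, var
    return best_k
```

Because ω(k) and μ(k) are running sums, the whole search takes a single pass over the histogram. When several levels give the same maximum, this sketch returns the lowest one.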

The transform maps each pixel intensity as

$$s = c\,r^{\gamma}$$

Here r is the input intensity, s is the output intensity, and c and γ are positive constants. The exponent γ of the power law is called gamma. In the experiments, the gamma value is predefined as 1.5. To avoid another rescaling stage after PLT, c is fixed at 1. With the increase in gamma value, the number of samples increases gradually, and for optimum segmentation, the gamma value for the PLT of images is fixed at 1.5 (Fig. 6).

Fig. 6 Independently binarized image using PLT
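A minimal sketch of the power-law mapping applied to one gray-level patch (for instance, the contents of a single edge box) is given below. The function name and the `max_level` parameter are assumptions. Intensities are assumed to be normalized to [0, 1] before the transform, which is what lets c = 1 keep the output in range without a separate rescaling stage.

```python
def power_law_transform(patch, gamma=1.5, c=1.0, max_level=255):
    """Apply s = c * r**gamma to each pixel of a 2-D gray-level patch."""
    out = []
    for row in patch:
        new_row = []
        for value in row:
            r = value / max_level    # normalise input intensity to [0, 1]
            s = c * r ** gamma       # power-law mapping
            # Clip (only matters if c > 1) and map back to 0..max_level.
            new_row.append(round(min(s, 1.0) * max_level))
        out.append(new_row)
    return out
```

With gamma = 1.5, the extremes 0 and 255 are unchanged while intermediate levels are pushed toward the dark end; for example, an input level of 64 maps to 32. A global threshold such as Otsu's could then be applied to the transformed patch.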

5 Proposed Algorithm

As stated above, of the two important stages of text recognition in images, binarization is the critical one. For color images, a novel method that combines the clustering technique with PLT is used here, which gives improved and more universal results. The color component labeling is followed by edge detection in the separate clusters, edge map formation by the union operation, and binarization using the PLT method.

6 Recognition

Finally, the binarized text image is given to an OCR engine. Many are available, such as OmniPage, Adobe Reader, ABBYY Fine Reader, and Tesseract. All of these are relatively good, so any of them can be used to recognize the binarized color images. Here, the trial version of ABBYY Fine Reader is used, and performance is evaluated on its output.

7 Results

The training dataset consists of 2589 word images extracted from the color images, and the testing dataset consists of 781 word images. Table 1 compares the word recognition rate of the proposed color clustering and PLT algorithm with those of other methods. Figure 7 shows some of the images considered and their responses to the proposed algorithm.

Table 1 Performance evaluation of proposed color cluster and PLT algorithm on the color image dataset

Fig. 7 Other examples showing the binary results obtained using the color clustering and PLT algorithm

8 Conclusion

A novel algorithm combining color component clustering and the PLT technique is proposed to improve recognition on color word image datasets. The experiments show that the algorithms proposed in the literature are not sufficient for word recognition in color images. The Power Law Transform, one of the simplest and most effective algorithms, performs appreciably well on many image datasets but fails in certain cases, as shown above. Hence, a novel method is proposed that uses the PLT together with the connected component algorithm. The results show that the proposed algorithm performs well across the kinds of image datasets considered, even on images for which PLT alone failed to produce results, and consequently enhances the recognition rate.