1 Introduction

Although optical character recognition (OCR) techniques have become widely available, document image binarization remains a crucial first step for OCR. Despite having been studied for many years, extracting clear characters from degraded document images is still a challenging problem.

Image binarization sets the gray value of each pixel to 0 or 255, creating a black-and-white image [1]. Wen et al. [2] divided binarization methods into three major categories: clustering-based, threshold-based, and hybrid methods. In clustering-based methods, the gray levels of the image pixels are partitioned into two clusters according to model-based features; fuzzy classification [3,4,5] is a typical example. Because clustering is an iterative process, the key to clustering-based methods is the selection of the initial value, the clustering criterion, and the termination condition. If this selection is effective, the binary image will have clear characters and little noise; if it is poor, parts of the background will be mistakenly clustered into the foreground and the binary image will lose a large amount of foreground information. Clustering-based methods perform well on images with non-uniform illumination. However, for images with bleed-through, they are usually unable to select an ideal clustering criterion or to produce a high-quality binary image. Recently, many researchers have presented hybrid algorithms for image binarization. For example, Chou and Lin’s method [6] combined SVM with Otsu’s threshold, and Mesquita et al. [7] combined K-means with Otsu’s threshold. However, most hybrid methods involve a trade-off between noise reduction and processing complexity: if an algorithm reduces noise strongly, its complexity and running time increase, which is not conducive to practical applications. Threshold-based methods, in contrast, have been researched widely because of their simplicity, efficiency, and ease of comprehension. They fall into two sub-categories: global and local. Global binarization methods [8,9,10,11] segment well images in which the gray value distributions of the foreground and background are uniform and their deviations are small; for degraded document images, however, global methods produce numerous mistakes. Local binarization methods [12,13,14,15,16,17,18] are more suitable for document image binarization because they use windows or blocks to determine a local threshold. Nevertheless, the choice of window or block affects the performance of local threshold methods: windows or blocks that are too small produce a large amount of noise, whereas those that are too large lose text.

Most current techniques achieve a good binarization effect for a specific type of degradation. However, degradation has many causes, and a good binarization algorithm should be able to deal with a variety of situations. Based on these observations, this paper proposes a local binarization algorithm that can handle images with uneven illumination, bleed-through, and variable background.

This paper is organized as follows: Section 2 reviews binarization methods. Section 3 describes the proposed binarization method. Section 4 presents an analysis of experimental results. Section 5 presents a discussion of the results and concludes the paper.

2 Review of binarization methods

The key issue in threshold-based methods is how to select the threshold. Researchers usually use histograms to determine a global threshold for images that have a clear distinction between the foreground and background. Classical global binarization methods (such as Otsu’s method [8]) determine the threshold dynamically by maximizing the variance between foreground and background. Thanks to its adaptability, Otsu’s method is still one of the most commonly used image segmentation methods. For images degraded by noise, uneven illumination, or low contrast, local analysis can overcome the influence of degradation to some extent, and local binarization methods generally give better results on such images. This section therefore mainly reviews local threshold-based methods.

Niblack’s method [13] is a commonly used local adaptive binarization algorithm. It calculates the threshold from the local mean and standard deviation. The threshold T for pixel f(x, y) is defined as:

$$\begin{aligned} T(x,y)=m(x,y)+k\cdot s(x,y) \end{aligned}$$
(1)

where m(x, y) and s(x, y) are the mean and standard deviation of the gray values in a neighborhood, respectively. The neighborhood should be moderate in size so that it can both preserve local details and suppress noise. Hence, the value of k and the neighborhood size are chosen as in [18]. According to Gatos et al. [19], Niblack’s method [13] cannot handle backgrounds with light texture.
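To make the thresholding rule concrete, the following is a minimal NumPy/SciPy sketch of Eq. 1, not the authors’ implementation; the window size w = 25 and k = −0.2 are illustrative assumptions only (the paper adopts the settings of [18]).

```python
# Minimal sketch of Niblack's local threshold (Eq. 1); not the authors' code.
# Assumptions: square w x w neighborhood, k = -0.2 (illustrative values only).
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray, w=25, k=-0.2):
    """gray: 2-D array of gray values in [0, 255]; returns a 0/255 image."""
    gray = gray.astype(np.float64)
    m = uniform_filter(gray, size=w)           # local mean m(x, y)
    m2 = uniform_filter(gray * gray, size=w)   # local mean of squares
    s = np.sqrt(np.maximum(m2 - m * m, 0.0))   # local standard deviation s(x, y)
    T = m + k * s                              # Eq. (1)
    return np.where(gray > T, 255, 0).astype(np.uint8)
```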

Sauvola and Pietikainen [14] proposed an improved algorithm based on Niblack’s method, which has become a standard for local threshold methods. Their algorithm takes the current pixel as the center of a neighborhood and dynamically calculates the threshold from the grayscale mean and standard deviation within that neighborhood. The threshold T is as follows:

$$\begin{aligned} T(x,y)=m(x,y)\cdot \left[ {1+k\cdot \left( \frac{s(x,y)}{R}-1\right) } \right] \end{aligned}$$
(2)

where m(x, y) and s(x, y) are the same as in Niblack’s method, R is the dynamic range of the standard deviation, and k is a correction factor that ranges from 0 to 1. Sauvola and Pietikainen’s method can handle degraded document images with variable illumination, resolution variation, and noise, but it fails for very light or very dark backgrounds [18].
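A corresponding sketch of Eq. 2 is given below; again the window size is an assumption, and k = 0.5 with R = 128 are merely commonly cited defaults for 8-bit images rather than values prescribed here.

```python
# Minimal sketch of Sauvola and Pietikainen's threshold (Eq. 2); not the authors' code.
# Assumptions: square w x w neighborhood, k = 0.5 and R = 128 (commonly cited defaults).
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_binarize(gray, w=25, k=0.5, R=128.0):
    gray = gray.astype(np.float64)
    m = uniform_filter(gray, size=w)           # local mean m(x, y)
    m2 = uniform_filter(gray * gray, size=w)   # local mean of squares
    s = np.sqrt(np.maximum(m2 - m * m, 0.0))   # local standard deviation s(x, y)
    T = m * (1.0 + k * (s / R - 1.0))          # Eq. (2)
    return np.where(gray > T, 255, 0).astype(np.uint8)
```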

Bernsen’s method [12] is a typical local threshold algorithm. It calculates the threshold using mean and contrast information over a local region. The threshold is calculated as:

$$\begin{aligned} T(x,y)=\frac{Z_{\text {low}} +Z_{\text {high}} }{2} \end{aligned}$$
(3)

where \(Z_\mathrm{low} \) and \(Z_\mathrm{high} \) are the lowest and highest gray levels, respectively, in an \( r \times r\) region. Bernsen selects \(r = 15\). This method produces a large amount of background noise, especially for degraded document images with blank backgrounds.
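The following sketch implements only the thresholding rule of Eq. 3 with r = 15 as stated above; any additional contrast checks used in some variants of Bernsen’s method are omitted.

```python
# Minimal sketch of Bernsen's threshold (Eq. 3) only; not the authors' code.
# Assumption: r x r neighborhood with r = 15 as stated above.
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def bernsen_binarize(gray, r=15):
    gray = gray.astype(np.float64)
    z_high = maximum_filter(gray, size=r)      # highest gray level in the r x r region
    z_low = minimum_filter(gray, size=r)       # lowest gray level in the r x r region
    T = (z_low + z_high) / 2.0                 # Eq. (3)
    return np.where(gray > T, 255, 0).astype(np.uint8)
```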

Singh et al. [18] proposed a new adaptive binarization method. Their method has four steps: contrast analysis, contrast stretching, thresholding, and noise removal. Singh et al.’s method works well on degraded document images. However, it fails when the document does not contain dense text or suffers serious bleed-through. Furthermore, it is more time-consuming and sensitive to parameter changes.

For degraded document images, current local binarization methods are affected by the size of the window or block to some extent, which means they are unable to binarize images that have bleed-through or little text. Hence, this paper presents a new binarization method for degraded document images based on contrast enhancement. It mainly addresses document images degraded by uneven illumination, bleed-through, and variable background.

3 Proposed algorithm

For both humans and computers, the basis for distinguishing text from background is the obvious difference in gray values at the edges of the characters. In degraded document images, some areas have obvious grayscale contrast, but others do not. Therefore, for degraded document images whose grayscale contrast differs markedly between areas, a single binarization treatment cannot achieve a good result. To address this issue, this paper presents a binarization method that applies different contrast enhancements to areas with different grayscale contrasts.

3.1 Area partition

Regional division directly determines whether an adaptive contrast enhancement method can achieve the best result in the corresponding region; hence, it is crucial to find a suitable region division method. The contrast of pixels can be used as the basis for dividing the areas. Let F(x, y) with 256 gray levels be the grayscale image of an input document of size \(M\times N\), where M is the number of lines and N is the number of pixels per line in the image. The grayscale contrast C for pixel f(x, y) is defined as:

$$\begin{aligned} C(x,y)= & {} \max [C_h (x,y),C_v (x,y)] \end{aligned}$$
(4)
$$\begin{aligned} C_h (x,y)= & {} \left| {f(x+1,y)-f(x,y)} \right| \end{aligned}$$
(5)
$$\begin{aligned} C_v (x,y)= & {} \left| {f(x,y+1)-f(x,y)} \right| \end{aligned}$$
(6)

where \(C_h (x,y)\) and \(C_v (x,y)\) are the absolute contrasts along the horizontal and vertical directions, respectively. For an image with a white background and black foreground, there may be many background areas without characters, and removing such background regions directly saves a great deal of computation. For the target areas that contain characters, because the gray contrast between regions (e.g., bright and dark regions) may differ significantly, they may need to be divided again. By dividing the areas repeatedly, the image can be fully partitioned into not-significant areas, significant areas, and comparatively significant areas.
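A minimal sketch of Eqs. 4–6 is shown below; it assumes that x indexes rows and y indexes columns and that the contrast is set to zero on the last row and column, where no forward neighbor exists.

```python
# Minimal sketch of the contrast measure in Eqs. (4)-(6).
# Assumptions: x indexes rows, y indexes columns; zero contrast where no
# forward neighbor exists (last row/column).
import numpy as np

def contrast_map(f):
    """f: 2-D grayscale array with values in [0, 255]; returns C(x, y)."""
    f = f.astype(np.float64)
    c_h = np.zeros_like(f)
    c_v = np.zeros_like(f)
    c_h[:-1, :] = np.abs(f[1:, :] - f[:-1, :])   # Eq. (5): |f(x+1, y) - f(x, y)|
    c_v[:, :-1] = np.abs(f[:, 1:] - f[:, :-1])   # Eq. (6): |f(x, y+1) - f(x, y)|
    return np.maximum(c_h, c_v)                  # Eq. (4)
```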

3.1.1 Coarse region division

The proposed algorithm uses a quadtree to divide areas on the basis of grayscale contrast, as shown in Fig. 1. After the first division, the image is divided into four subregions named A, B, C, and D. If the maximum grayscale contrast in subregion B is less than \(k_1 \) times the maximum grayscale contrast of the whole area as follows:

$$\begin{aligned} C_{B\max } (x,y)\le k_1 \cdot C_{\mathrm{entire}\max } (x,y) \end{aligned}$$
(7)

where \(C_{B\max } (x,y)\) is the maximum grayscale contrast of subregion B after the first division, \(C_{\mathrm{entire}\max } (x,y)\) is the maximum grayscale contrast of the whole image, and \(k_1 \) is the partition coefficient between the foreground and background, then the grayscale variation in subregion B is considered insignificant. Hence, this subregion is regarded as background without characters and output directly. In this step, large areas of background can be eliminated, which notably reduces the computation. The remaining target areas containing characters are then divided further.
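The coarse background test of Eq. 7 can be expressed as a small predicate, sketched below under the assumption that the contrast_map() helper above is available; k1 = 0.2 is an illustrative value from the range given in Sect. 3.1.2.

```python
# Minimal sketch of the coarse background test in Eq. (7), reusing the
# contrast_map() sketch above; k1 = 0.2 is an illustrative value only.
def is_background(region_contrast, parent_max_contrast, k1=0.2):
    """True if the subregion's maximum contrast is at most k1 times the parent's."""
    return float(region_contrast.max()) <= k1 * float(parent_max_contrast)
```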

Fig. 1 Division diagram. Empty box: background; box with right-side stripe: areas with significant grayscale contrast; box with left-side stripe: areas with comparatively significant grayscale contrast

3.1.2 Fine region division

After coarse division, areas that do not satisfy Eq. 7 are regarded as target areas with characters. In the example shown in Fig. 1, A, C, and D are target areas with characters. For a degraded document image, there may also be a significant difference in the gray contrast among the remaining regions. Hence, further subdivision needs to be done for the rest of the regions.

For instance, subregion A (subregions C and D follow the same division rules) is divided a second time after the coarse division. If the maximum grayscale contrast in subregion AB is less than \(k_1 \) times the maximum grayscale contrast of the parent region, as follows:

$$\begin{aligned} C_{AB\max } (x,y)\le k_1 \cdot C_{A\max } (x,y) \end{aligned}$$
(8)

then there is no significant grayscale variation in this subregion; hence, it is also regarded as background and output directly.

If the maximum grayscale contrast in subregion AA is more than \(k_2 \) times the maximum grayscale contrast of the parent region, as follows:

$$\begin{aligned} C_{AA\max } (x,y)\ge k_2 \cdot C_{A\max } (x,y) \end{aligned}$$
(9)

then this subregion has significant grayscale variation, and weak contrast enhancement is used in it. If the maximum grayscale contrast in subregion AC is between \(k_1 \) and \(k_2 \) times the maximum grayscale contrast of the parent region, as follows:

$$\begin{aligned} k_1 \cdot C_{A\max } (x,y)\le C_{AC\max } (x,y)\le k_2 \cdot C_{A\max } (x,y) \end{aligned}$$
(10)

where \(C_{AB\max } (x,y)\), \(C_{AA\max } (x,y)\), and \(C_{AC\max } (x,y)\) are the maximum grayscale contrasts of subregions AB, AA, and AC after the second division, respectively, \(C_{A\max } (x,y)\) is the maximum grayscale contrast of subregion A after the first division, and \(k_2 \) is the partition coefficient between the significant and comparatively significant areas, then the grayscale variation in this subregion is comparatively significant, and strong contrast enhancement is used in it.

In this study, the ranges of \(k_1 \) and \(k_2 \) were empirically determined to be \(k_1 \in [0,0.4]\) and \(k_2 \in [0.7,1]\), respectively. Two divisions were also found empirically to be the optimal number for determining the property of grayscale variation: too many divisions lead to a large amount of calculation and confuse noise with the target, so noise is not handled well, whereas too few divisions reduce the calculation but lose detail.
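As an illustration, the two-level quadtree partition of Sections 3.1.1 and 3.1.2 can be sketched as follows, assuming the contrast_map() and is_background() sketches given earlier and the illustrative values k1 = 0.2 and k2 = 0.8 taken from the stated ranges; the returned label map uses 0 for not-significant, 1 for significant, and 2 for comparatively significant areas.

```python
# Illustrative sketch of the two-level quadtree partition (Sects. 3.1.1-3.1.2),
# reusing the contrast_map() and is_background() sketches above.
# Labels: 0 = not significant (background), 1 = significant, 2 = comparatively significant.
import numpy as np

def quadrants(r0, r1, c0, c1):
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    return [(r0, rm, c0, cm), (r0, rm, cm, c1), (rm, r1, c0, cm), (rm, r1, cm, c1)]

def partition(gray, k1=0.2, k2=0.8):
    C = contrast_map(gray)
    labels = np.zeros(gray.shape, dtype=np.uint8)
    c_entire = C.max()                                      # maximum contrast of the whole image
    for (r0, r1, c0, c1) in quadrants(0, gray.shape[0], 0, gray.shape[1]):
        sub = C[r0:r1, c0:c1]                               # first (coarse) division
        if sub.size == 0 or is_background(sub, c_entire, k1):
            continue                                        # Eq. (7): background, output directly
        c_parent = sub.max()
        for (s0, s1, t0, t1) in quadrants(r0, r1, c0, c1):  # second (fine) division
            cc = C[s0:s1, t0:t1]
            if cc.size == 0 or is_background(cc, c_parent, k1):
                continue                                    # Eq. (8): background
            elif cc.max() >= k2 * c_parent:
                labels[s0:s1, t0:t1] = 1                    # Eq. (9): significant
            else:
                labels[s0:s1, t0:t1] = 2                    # Eq. (10): comparatively significant
    return labels
```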

3.2 Grayscale contrast enhancement

Section 3.1 divides areas into not-significant areas, significant areas, and comparatively significant areas. Usually, a document image has a black foreground and white background. Therefore, for not-significant areas, the gray values of pixels within this area are set to:

$$\begin{aligned} ff(x,y)=255 \end{aligned}$$
(11)

For significant areas, weak contrast enhancement [20] is used to modify gray values as follows:

$$\begin{aligned} ff(x,y)=(n-1)\times \frac{f(x,y)-f_{\min } (x,y)}{f_{\max } (x,y)-f_{\min } (x,y)} \end{aligned}$$
(12)

For comparatively significant areas, this paper proposes a strong contrast enhancement mode that further widens the contrast between pixels within that region. The gray values of the pixels are modified as follows:

$$\begin{aligned} ff(x,y)=(nn-1)\times \left( \frac{f(x,y)-f_{\min } (x,y)}{f_{\max } (x,y)-f_{\min } (x,y)}\right) ^{2} \end{aligned}$$
(13)

In Eqs. 12 and 13, f(x, y) is the gray value of the original grayscale image, \(f_{\max }\) and \(f_{\min }\) denote the maximum and minimum gray levels in the original document image, respectively, and n and nn denote the numbers of gray levels after modification.
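A minimal sketch of the three enhancement rules in Eqs. 11–13 is given below; it assumes n = nn = 256 output gray levels and takes f_max and f_min over the region being enhanced, which is an interpretation rather than a detail stated explicitly in the text.

```python
# Minimal sketch of the enhancement rules in Eqs. (11)-(13).
# Assumptions: n = nn = 256 output gray levels; f_max and f_min are taken over
# the region being enhanced (an interpretation, not stated explicitly above).
import numpy as np

def enhance_region(f, label, n=256, nn=256):
    """f: gray values of one region; label: 0, 1, or 2 as in the partition sketch."""
    f = f.astype(np.float64)
    if label == 0:
        return np.full_like(f, 255.0)              # Eq. (11): not significant -> background
    span = max(f.max() - f.min(), 1e-9)            # guard against division by zero
    norm = (f - f.min()) / span
    if label == 1:
        return (n - 1) * norm                      # Eq. (12): weak contrast enhancement
    return (nn - 1) * norm ** 2                    # Eq. (13): strong contrast enhancement
```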

The essence of gray contrast enhancement is contrast stretching: the objective is to spread the contrast of the target areas over a larger range of gray levels and to suppress gray-level changes in the background. Two enhancement modes are used because strong contrast enhancement inevitably produces noise in significant areas, whereas, for comparatively significant areas, weak contrast enhancement may not separate clear characters because it cannot widen the contrast between pixels sufficiently. Moreover, for degraded document images, a single contrast enhancement method may fail to produce a satisfactory two-tone image. Therefore, the two types of contrast enhancement are both indispensable. Figure 2 compares the binarized images obtained with strong and weak contrast enhancement, illustrating the necessity of using different contrast enhancements for areas with different grayscale variation.

Fig. 2 a Original degraded image. Binarized output of the proposed algorithm using b weak contrast enhancement only, c strong contrast enhancement only, and d both weak and strong contrast enhancements

The proposed grayscale enhancement method can effectively adjust the pixel gray values of an image with non-uniform illumination, bleed-through, or variable background; as a result, these three issues in image binarization can be solved. For images degraded by bleed-through or non-uniform illumination, the ink bleed-through areas and the lighter or darker areas are classified as comparatively significant areas because they show little difference between foreground and background; in this situation, the strong contrast enhancement of Eq. 13 can separate clear characters. For areas that are only slightly degraded, which show a significant difference between foreground and background, the weak contrast enhancement of Eq. 12 can be used to reduce the effect of noise. For degraded images with variable background, the contrast variation within the background is far smaller than that between the background and foreground. Therefore, if a region consists only of variable background without any characters, Eq. 11 is used to remove it; if a region contains both characters and variable background, the two kinds of contrast enhancement are used to widen the contrast between pixels so that the foreground can be separated from the background.

3.3 Local threshold estimation

The foreground can be distinguished from the background intuitively after the grayscale values have been modified. In general, a document image contains fewer character pixels than background pixels; only in a very small number of documents do character pixels outnumber background pixels. Hence, the gray values of the background and the foreground can be determined by accumulating the number of pixels at each gray value in a histogram. For an enhanced image of size \(p\times q\), we search for the gray value \(n_\mathrm{halfsize} \) whose accumulated pixel count is closest to \(\frac{p\times q}{2}\). The most frequent gray value in \(0\sim n_\mathrm{halfsize} \) is regarded as the foreground value \(ff_{\mathrm{foreground}} \), and the most frequent gray value in \(n_\mathrm{halfsize} \sim n\) is regarded as the background value \(ff_{\mathrm{background}} \). If more than one gray value has the highest frequency, the smallest such value is chosen for the foreground and the largest for the background. The threshold T is defined as the mean of \(ff_{\mathrm{foreground}} \) and \(ff_{\mathrm{background}} \):

$$\begin{aligned} T=\frac{ff_{\text {foreground}} +ff_{\text {background}} }{2} \end{aligned}$$
(14)

Finally, the binarized image g(xy) is obtained as:

$$\begin{aligned} g(x,y)=\left\{ {{\begin{array}{ll} {1,}&{} {ff>T} \\ {0,}&{} \mathrm{otherwise} \\ \end{array} }} \right. \end{aligned}$$
(15)
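The histogram-based threshold estimation and the binarization of Eqs. 14 and 15 can be sketched as follows; integer gray values in [0, n−1] are assumed (the enhanced image is rounded first), and the tie-breaking rule follows the interpretation given above.

```python
# Minimal sketch of the threshold estimation and binarization in Eqs. (14)-(15).
# Assumptions: 256 gray levels after enhancement; ties broken as described above.
import numpy as np

def estimate_threshold(ff, n=256):
    """ff: enhanced image of size p x q with gray values in [0, n-1]."""
    vals = np.clip(np.rint(ff).ravel(), 0, n - 1).astype(np.int64)
    hist = np.bincount(vals, minlength=n)
    cum = np.cumsum(hist)
    n_half = int(np.argmin(np.abs(cum - vals.size / 2.0)))  # accumulation closest to p*q/2
    low, high = hist[:n_half + 1], hist[n_half:]
    ff_fore = int(np.flatnonzero(low == low.max())[0])      # smallest most-frequent value below the split
    ff_back = n_half + int(np.flatnonzero(high == high.max())[-1])  # largest most-frequent value above it
    return (ff_fore + ff_back) / 2.0                        # Eq. (14)

def binarize(ff, n=256):
    T = estimate_threshold(ff, n)
    return np.where(ff > T, 1, 0).astype(np.uint8)          # Eq. (15)
```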

4 Experiments and discussion

4.1 Experimental environment and test datasets

All algorithms were implemented in MATLAB (R2011a) and run on an Intel Core i3-3240 CPU (3.40 GHz) with 4.00 GB RAM under Windows 7. In our experiments, we used the Document Image Binarization Contest (DIBCO) series datasets (DIBCO 2009, H-DIBCO 2010, DIBCO 2011, and H-DIBCO 2012) [21,22,23,24], which include 50 handwritten and printed images.

Fig. 3 Example document images in the DIBCO datasets illustrating document degradation: bleed-through in (a) and (c), image contrast variation in (b) and (d), and uneven illumination in (e)

Fig. 4 Binarization results of the sample document image in Fig. 3a using a Otsu’s method, b Niblack’s method, c Sauvola and Pietikainen’s method, d Bernsen’s method, e Singh et al.’s method, and f the proposed method

Fig. 5 Binarization results of the sample document image in Fig. 3b using a Otsu’s method, b Niblack’s method, c Sauvola and Pietikainen’s method, d Bernsen’s method, e Singh et al.’s method, and f the proposed method

Fig. 6 Binarization results of the sample document image in Fig. 3c using a Otsu’s method, b Niblack’s method, c Sauvola and Pietikainen’s method, d Bernsen’s method, e Singh et al.’s method, and f the proposed method

Fig. 7 Binarization results of the sample document image in Fig. 3d using a Otsu’s method, b Niblack’s method, c Sauvola and Pietikainen’s method, d Bernsen’s method, e Singh et al.’s method, and f the proposed method

Fig. 8 Binarization results of the sample document image in Fig. 3e using a Otsu’s method, b Niblack’s method, c Sauvola and Pietikainen’s method, d Bernsen’s method, e Singh et al.’s method, and f the proposed method

4.2 Testing segmentation results

The proposed binarization approach was compared with five benchmark binarization methods: Otsu’s, Niblack’s, Sauvola and Pietikainen’s, Bernsen’s, and Singh et al.’s. The results of all the methods on the images in Fig. 3 are shown in Figs. 4, 5, 6, 7 and 8. Because Niblack’s, Bernsen’s, Sauvola and Pietikainen’s, and Singh et al.’s methods all involve parameter selection, this study set the parameters according to the references [12,13,14, 18]. For Niblack’s, Bernsen’s, and Sauvola and Pietikainen’s methods, five experiments were run with window sizes of 5 \(\times \) 5, 15 \(\times \) 15, 25 \(\times \) 25, 35 \(\times \) 35, and 50 \(\times \) 50. For Singh et al.’s method, we used block sizes of 32 \(\times \) 32, 64 \(\times \) 64, 128 \(\times \) 128, 256 \(\times \) 256, and 512 \(\times \) 512. For the proposed method, five pairs of \(k_1 \) and \(k_2 \) were randomly selected from \(k_1 \in [0,0.4]\) and \(k_2 \in [0.7,1]\) at intervals of 0.1. In all cases, the binary image with the best F-measure was selected as the final result.

4.3 Visual evaluation

4.3.1 Experiment 1

As can be seen from Figs. 4 and 6, for images with bleed-through, Otsu’s method and Sauvola and Pietikainen’s method inevitably produce some noise. Niblack’s method mistakes the noise caused by bleed-through for foreground, Bernsen’s method produces a large amount of background noise, and Singh et al.’s method also introduces noise in the background areas. In contrast, for these images, the proposed algorithm can reliably separate the target areas from the non-target background areas, avoiding the interference of noise.

4.3.2 Experiment 2

It can be seen from Figs. 5 and 7 that, for images with variable background, Otsu’s method and Sauvola and Pietikainen’s method can separate characters without noise. However, the characters have clearly broken strokes in weak-contrast areas, so these methods cannot provide a reliable basis for subsequent character recognition. Although Niblack’s method can isolate clear characters in both strong- and weak-contrast areas, it also detects a large number of black blobs in the non-target areas. The noise generated by Bernsen’s method almost covers the target areas, making it impossible to distinguish the background from the target. Singh et al.’s method still does not work well on degraded images with non-dense text. In contrast, the proposed method can separate clear characters in both the strong- and weak-contrast areas without noise.

4.3.3 Experiment 3

It can be seen in Fig. 8 that, for images with uneven illumination, Otsu’s, Niblack’s, and Bernsen’s methods cannot eliminate the influence of the dark background. Although Sauvola and Pietikainen’s method can handle the noise, it loses many characters in the lighter and darker areas. In contrast, Singh et al.’s method and the proposed method can restore more complete characters with minimal noise.

4.4 Ground-truth-based evaluation measures

Higher F-measure, higher PSNR, and lower negative rate metric (NRM) are the essential conditions for a high-quality binarized image [16]. F-measure is calculated as:

$$\begin{aligned} {\hbox {FM}}=\frac{2\times {\hbox {RC}}\times {\hbox {PR}}}{{\hbox {RC}}+{\hbox {PR}}} \end{aligned}$$
(16)

where RC and PR refer to the binarization recall and precision, respectively. Table 1 shows the F-measure of the results of the various algorithms on the DIBCO datasets.

PSNR is calculated using

$$\begin{aligned} {\text {PSNR}}=10\log \left( \frac{C^{2}}{\text {MSE}}\right) \end{aligned}$$
(17)

where MSE denotes the mean square error. Table 2 shows the PSNR of the results of various algorithms on the DIBCO datasets.

Table 1 Comparison of F-measure for six algorithms
Table 2 Comparison of PSNR for six algorithms

Finally, NRM is calculated as:

$$\begin{aligned} {\text {NRM}}=\frac{\frac{{\hbox {FN}}}{{\hbox {FN}}+{\hbox {TP}}}+\frac{{\hbox {FP}}}{{\hbox {FP}}+{\hbox {TN}}}}{2} \end{aligned}$$
(18)

where TP, TN, FP, and FN denote the number of true positives, true negatives, false positives, and false negatives, respectively. Table 3 shows the NRM of the results of the various algorithms on the DIBCO datasets.
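For reference, a minimal sketch of Eqs. 16–18 is given below; it assumes binary images in which foreground pixels are marked 1 (invert first if text is stored as black on white), a base-10 logarithm in the PSNR formula, and C = 255.

```python
# Minimal sketch of the evaluation measures in Eqs. (16)-(18).
# Assumptions: foreground pixels are marked 1 in both images, base-10 log, C = 255.
import numpy as np

def evaluate(binarized, ground_truth, C=255.0):
    b, g = binarized.astype(bool), ground_truth.astype(bool)
    tp = np.sum(b & g)          # foreground in both images
    tn = np.sum(~b & ~g)        # background in both images
    fp = np.sum(b & ~g)         # foreground in result, background in ground truth
    fn = np.sum(~b & g)         # background in result, foreground in ground truth
    rc = tp / (tp + fn)                                  # recall
    pr = tp / (tp + fp)                                  # precision
    fm = 2 * rc * pr / (rc + pr)                         # Eq. (16)
    mse = np.mean(((b.astype(np.float64) - g.astype(np.float64)) * C) ** 2)
    psnr = 10 * np.log10(C ** 2 / mse)                   # Eq. (17)
    nrm = (fn / (fn + tp) + fp / (fp + tn)) / 2.0        # Eq. (18)
    return fm, psnr, nrm
```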

Table 3 Comparison of NRM for six algorithms

Tables 1, 2, and 3 show that the images binarized using the proposed algorithm have the highest F-measure (4% higher than Otsu’s method), the highest PSNR (5% higher than Sauvola and Pietikainen’s method), and a slightly higher NRM. The following explains why the proposed method has a slightly higher NRM. Table 4 lists the metrics of the binary images in Fig. 8 for the various algorithms.

Table 4 Parameters for binary image by various algorithms

Here, FP represents the number of pixels that are black in the binarized image but white in the ground truth, FN is the number of pixels that are white in the binarized image but black in the ground-truth image, TP represents the number of pixels that are black in both the binarized and ground-truth images, and TN represents the number of pixels that are white in both. In addition, FP + FN represents the total number of pixels in error. According to Eq. 18, NRM averages the proportion of foreground pixels mistakenly classified as background and the proportion of background pixels mistakenly classified as foreground. For instance, comparing the proposed method with Otsu’s method, the proposed method has a higher F-measure, but \({\hbox {NRM}}_{\mathrm{otsu}} =\frac{\frac{765}{765+45,216}+\frac{60,069}{60,069+691,103}}{2}=0.0483\) and \({\hbox {NRM}}_\mathrm{proposed} =\frac{\frac{8963}{8963+37,013}+\frac{3180}{3180+747,992}}{2}=0.0996\).

This indicates that the binary image produced by Otsu’s method has a large FP because of the great deal of noise it contains, whereas the image binarized by the proposed method effectively avoids this noise and hence has a small FP. However, when the contrast at fuzzy character edges is widened, the pixels on these edges may be mistakenly assigned to the background because their contrast is only slightly enhanced. This is why the FN of the proposed method is larger and, hence, \({\hbox {NRM}}_\mathrm{proposed} >\mathrm{NRM}_\mathrm{otsu} \). In other words, an output binary image with fewer misclassified pixels overall may still have a larger NRM. This does not affect subsequent recognition as long as the character strokes are not extremely fine; at the same time, it shows that the algorithm is not suitable for blurry images with slender characters.

4.5 Execution time-based evaluation

Table 5 shows that the average execution time of the proposed method is not the fastest, which is a consequence of its relatively high complexity. Nevertheless, even on the MATLAB platform, the algorithm completes within 1 s, which fully meets the needs of practical applications.

Table 5 Comparison of execution time for six algorithms
Fig. 9 Recognition results of the binarized images produced by each algorithm in the two OCR programs

4.6 OCR-based evaluation

OCR-based comparison is one of the most accepted methods for the quantitative evaluation of binarization algorithms [25]. To test the recognition performance of the various algorithms in OCR, this experiment used all printed images in the DIBCO datasets as well as an image randomly captured under non-uniform illumination. One image was selected as a representative example, and the four algorithms with the best F-measure in Table 1 were used to process this degraded image; their recognition rates were then tested in ABBYY FineReader 12 [26] and Free OCR [27]. Figure 9 shows the recognition results of the binarized images produced by each algorithm in the two OCR programs.

4.6.1 Qualitative analysis for Fig. 9

Table 6 Recognition rate of various algorithms by two OCR programs
Table 7 Recognition rate of various algorithms by ABBYY FineReader

In Fig. 9, the first image in the upper left corner is the original gray image. Because of non-uniform illumination, it has a lighter background in the top left corner and a darker background in the bottom right corner. Otsu’s method does not work well here: as a global threshold method, it cannot separate clear characters in both the lighter and darker backgrounds using a single threshold. Sauvola and Pietikainen’s method, Singh et al.’s method, and the proposed algorithm are insensitive to the non-uniform illumination and can separate the characters.

4.6.2 Quantitative analysis for Fig. 9

The CRR (correct rate of recognition) by OCR is defined as:

$$\begin{aligned} {\text {CRR}}=\frac{N_{\hbox {crc}} }{N_{\text {total}} }\times 100{\% } \end{aligned}$$
(19)

where \(N_\mathrm{crc} \) is the number of correctly recognized characters and \(N_\mathrm{total} \) is the total number of characters.
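A trivial helper for Eq. 19 might look as follows; the character counts are assumed to be supplied by the OCR evaluation.

```python
# Trivial sketch of Eq. (19); character counts come from the OCR evaluation.
def crr(n_crc, n_total):
    return 100.0 * n_crc / n_total
```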

Table 6 shows the CRRs for the original gray image and binary images of the four algorithms in ABBYY FineReader and Free OCR.

The combination of Fig. 9 and Table 6 shows that the original gray image, with its non-uniform illumination, has a very low recognition rate in Free OCR. The binary image created by Otsu’s method has only a 63% correct rate in the two OCR programs because of the black area on the right side of the image. Although Sauvola and Pietikainen’s and Singh et al.’s methods segment the characters, the characters are stuck together, incomplete, or have broken strokes; hence, their binarized images have a high error rate in both OCR programs. In contrast, the images produced by the proposed method have clear and complete characters that are easy to identify, and the OCR programs achieve a recognition rate of more than 99.5%.

Table 8 Recognition rate of various algorithms by Free OCR

4.6.3 Quantitative analysis for images in datasets

In order to further test the generality of the proposed algorithm, all printed images in the datasets were also tested. Tables 7 and 8 show the CRRs of the four algorithms in ABBYY FineReader and Free OCR. From the tables, it can be seen that the proposed algorithm attains the highest average CRR, about 4.5% higher than the second-highest average, which was obtained on the original images. The reason is that the other three algorithms have very low correct rates on some individual images, which pulls their average CRRs down. For example, the CRRs are 0% for 2011-PR7 processed by Otsu’s and Singh et al.’s methods and for 2011-PR6 processed by Sauvola and Pietikainen’s and Singh et al.’s methods. This occurs because many characters binarized by the other three algorithms have broken strokes: for instance, an e was binarized so that it reads as c, and an m as ni. Images with broken strokes can still obtain a high F-measure and PSNR, but OCR will recognize the wrong characters; hence, a higher F-measure and PSNR can coexist with a lower CRR. This illustrates that the other three algorithms have limitations on some images and may not yield good recognition in difficult cases, whereas the proposed method is more universal and achieves relatively good recognition accuracy for most images. The average recognition accuracy of the proposed method is the highest, and these OCR results show the effectiveness of the proposed binarization technique.

5 Conclusion

Using the differences in gray contrast between regions, the method proposed in this paper adaptively divides an image into significant areas and comparatively significant areas. For significant areas, weak contrast enhancement is used to magnify the difference between the foreground and background while also reducing noise in the results. For comparatively significant areas, strong contrast enhancement is used to adjust the gray values so that the foreground and background can be easily distinguished and clear characters can be separated. Hence, regardless of the type of degradation (variable background, non-uniform illumination, or bleed-through), there is always an appropriate enhancement that achieves satisfactory results; the proposed method is particularly effective for degraded document images with bleed-through and severely uneven illumination. The experimental results on the DIBCO image sets show that, among the six algorithms, the images binarized by the proposed method have clear and complete characters as well as mostly noise-free backgrounds, and they achieve the highest F-measure and PSNR. When compared with the OCR results of the four top binarization methods, the proposed method also obtains the highest CRR.