Efficient multiscale Sauvola’s binarization

Lazzara, Guillaume; Géraud, Thierry

doi:10.1007/s10032-013-0209-0

Efficient multiscale Sauvola’s binarization

Original Paper
Published: 04 July 2013

Volume 17, pages 105–123, (2014)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

International Journal on Document Analysis and Recognition (IJDAR) Aims and scope Submit manuscript

Efficient multiscale Sauvola’s binarization

Download PDF

Guillaume Lazzara¹ &
Thierry Géraud¹

991 Accesses
41 Citations
Explore all metrics

Abstract

This work focuses on the most commonly used binarization method: Sauvola’s. It performs relatively well on classical documents, however, three main defects remain: the window parameter of Sauvola’s formula does not fit automatically to the contents, it is not robust to low contrasts, and it is not invariant with respect to contrast inversion. Thus, on documents such as magazines, the contents may not be retrieved correctly, which is crucial for indexing purpose. In this paper, we describe how to implement an efficient multiscale implementation of Sauvola’s algorithm in order to guarantee good binarization for both small and large objects inside a single document without adjusting manually the window size to the contents. We also describe how to implement it in an efficient way, step by step. This algorithm remains notably fast compared to the original one. For fixed parameters, text recognition rates and binarization quality are equal or better than other methods on text with low and medium x-height and are significantly improved on text with large x-height. Pixel-based accuracy and OCR evaluations are performed on more than 120 documents. Compared to awarded methods in the latest binarization contests, Sauvola’s formula does not give the best results on historical documents. On the other hand, on clean magazines, it outperforms those methods. This implementation improves the robustness of Sauvola’s algorithm by making the results almost insensible to the window size whatever the object sizes. Its properties make it usable in full document analysis toolchains.

ISauvola: Improved Sauvola’s Algorithm for Document Image Binarization

A new efficient binarization method: application to degraded historical document images

Article 24 February 2017

Adaptive binarization method for degraded document images based on surface contrast variation

Article 30 October 2015

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

1.1 Overview

Over the last decades, the need for document image analysis has increased significantly. One critical step of the analysis is to identify and retrieve foreground and background objects correctly. One way to do it is to produce a binary image; however, it is not easy to find the best thresholds because of change of illumination or noise presumed issues. As exposed in Sezgin and Sankur’s survey [1], many attempts have been made to find an efficient and relevant binarization method.

Some methods perform globally. Otsu’s algorithm [2] is known as one of the best in that category. It aims at finding an optimal threshold for the whole document by maximizing the separation between two pre-assumed classes. Despite fast computing times, it is not well adapted to uneven illumination and to the presence of random noise.

Other methods perform locally, trying to find the different satisfying thresholds for specific regions or around every pixels. A well-performing local thresholding method was proposed by Niblack [3]. The idea is to compute an equation usually based on the mean and the standard deviation of a small neighborhood around each pixel. It works fine on clean documents but can give deceiving results in relatively degraded documents. Then, Sauvola and Pietikainen [4] proposed an improvement of Niblack’s method to improve binarization robustness on noisy documents or when show-through artifacts are present. Currently, this is one of the best binarization methods for classical documents according to several surveys [1, 5]. While Niblack’s and Sauvola’s methods rely on the local variance, other local methods use different local features like the contrast in Bernsen’s method [6]. Some methods also try to mix global and local approaches, like Gabarra’s [7], so that object edges are detected and the image is split into regions thanks to a quadtree. Depending on whether the region contains an edge or not, a different threshold formula is used.

Local methods are more robust than global methods but often introduce parameters. Usually, methods are parameter sensitive, and the most difficult part is to find the best values for the set of documents to be processed. Algorithms for automatic estimation of free parameter values have been proposed by Badekas and Papamarkos [5] and Rangoni et al. [8]. Unfortunately, even if these values fit many kinds of documents, they may not be generic enough, and some adaptations may be needed with respect to the type of documents to process. That is the main reason why many attempts have been made to improve the best methods by automatically adjusting the parameters to the global [9, 10] or local contents [11, 12]. This also includes some works on getting multiscale versions of common algorithms like Otsu’s or Sauvola’s [10]. Eventually, improvements can be effectively observed in specific cases.

Over the last four years, more and more local methods try to rely not only on the pixel values in the threshold decision but also on higher-level information. Lu et al. [13] model the document background via an iterative polynomial smoothing and then choose local thresholds based on detected text stroke edges. Lelore and Bouchara [14, 15] use coarse thresholding to partition pixels into three groups: ink, background, and unknown. Some models describe the ink and background clusters and guide decisions on the unknown pixels. Because they rely on the document contents, those methods are usually considered as parameters free. Furthermore, the recent contests have proven their efficiency on historical documents: Lu’s method won dibco 2009 [16] and an improved version tied as a winner of hdibco 2010 [17], whereas Lelore’s method won dibco 2011 [18]. More recently, the winner of hdibco 2012, Howe, proposes a method [19] which optimizes a global energy function based on the Laplacian image. It uses both a Laplacian operator to assess the local likelihood of foreground and background labels and Canny edge detection to identify likely discontinuities. Finally, a graph cut implementation finds the minimum energy solution of a function combining these concepts. Parameters of the method are also adjusted dynamically w.r.t the contents using a stability criterion on the final result.

Because Sauvola’s binarization is widely used in practice and gives good results on magazines, this paper focuses on that particular method.

1.2 Sauvola’s algorithm and issues

Sauvola’s method [4] takes a grayscale image as input. Since most of document images are color images, converting color to grayscale images is required [10]. For this purpose, we choose to use the classical luminance formula, based on the eye perception:

$$\begin{aligned} \text{ Luma } = 0.299 \times \text{ R } + 0.587 \times \text{ G } + 0.114 \times \text{ B }. \end{aligned}$$

From the grayscale image, Sauvola proposed to compute a threshold at each pixel using:

$$\begin{aligned} T = m \times \left[ 1 + k \times \left( \frac{s}{R} - 1\right) \right] . \end{aligned}$$

(1)

This formula relies on the assumption that text pixels have values close to black (respectively, background pixels have values close to white). In Eq. 1, $k$ is a user-defined parameter, $m$ and $s$ are, respectively, the mean and the local standard deviation computed in a window of size $w$ centered on the current pixel, and $R$ is the dynamic range of standard deviation ($R = 128$ with 8-bit gray level images). The size of the window used to compute $m$ and $s$ remains user defined in the original paper.

Combined with optimizations like integral images [20], one of the main advantages of Sauvola’s method is its computational efficiency. It can run in $<$60 ms on A4 300-dpi documents with a modern computer. Another advantage is that it performs relatively well on noisy and blurred documents [1].

Due to the binarization formula, the user must provide two parameters $(w,k)$. Some techniques have been proposed to estimate them. Badekas and Papamarkos [5] state that $w=14$ and $k=0.34$ are the best compromise for show-through removal and object retrieval quality in classical documents. Rangoni et al. [8] based the parameter research on optical character recognition (OCR) result quality and found $w=60$ and $k=0.4$. Sezgin and Sankur [1] and Sauvola and Pietikainen [4] used $w=15$ and $k=0.5$. Adjusting those free parameters usually requires an a priori knowledge on the set of documents to get the best results. Therefore, there is no consensus in the research community regarding those parameter values.

Sauvola’s method suffers from different limitations among the following ones.

1.3 Missing low-contrast objects

Low-contrasted objects may be considered as textured background or show-through artifacts due to the threshold formula (Eq. 1) and may be removed or partially retrieved. Figure 1 illustrates this issue. The region of interest considered shows the values taken into account in a window of size $w=51$ centered at the central point depicted in green: contrasts are very low. In that case, the corresponding histogram illustrates how sensitive Sauvola’s method is to $k$. Object pixels cannot be correctly retrieved if $k$ is greater than 0.034. A low value of this parameter can help retrieving low-contrasted objects but since it is set for the whole document, it also alters other parts of the result: correctly contrasted objects are thicker in that case, possibly causing unintended connections between components. This is due to the fact that background noise and artifacts are usually poorly contrasted and are retrieved as objects.

1.4 Keeping textured text as is

Textures are really sensitive to window size. Figure 2a and d show binarization results of textured and non-textured text with the same font size. Even though the textured text is bold, inner parts of the characters are missing after binarization (see Fig. 2b). In Fig. 2e, the text is still well preserved and suitable for OCR . In Fig. 2c, using a larger window may improve the binarization results on textured text. However, this solution cannot be applied if it is mixed with plain text since, as shown in Fig. 2f, the retrieved text would be bolded.

1.5 Handling badly various object sizes

In case of both small and large objects in a same document, Sauvola’s method will not be able to retrieve all objects correctly. In most cases, one may want to retrieve text in documents, so a small window may be used. Small text should be retrieved perfectly, however, larger text may not. Figure 2h illustrates what happens when the selected window is too small compared to the objects of the document. We expect the algorithm to retrieve plain objects but in case of a too small window, statistics inside the objects may behave like in background: pixels values are locally identical. Since Sauvola’s formula relies on the fact there is a minimum of contrast in the window to set a pixel as foreground, it is unable to make a proper choice.

1.6 Spatial object interference

This issue mainly appears with image captions such as in Fig. 3a. Too large windows may include data from objects of different nature. In Fig. 3a, data from the image located above the caption are taken into account, leading to irrelevant statistics and invalid binarization. This is probably one of the reasons why Sauvola and Pietikainen [4] choose to first identify text and non-text regions before binarization.

Several attempts have been made in order to improve Sauvola’s binarization results and to prevent these defects. Wolf and Jolion [21] try to handle low-contrast and textured text defects. It consists in normalizing the contrast and the mean gray level of the image in order to maximize local contrast. Text is slightly bold though. Bukhari et al. [12] try to improve results by adjusting the parameter $k$ depending on whether a pixel is part of a foreground or background object. They claim that Sauvola’s method is very sensible to $k$ and can perform better if it is tuned, which is something we have also noticed. Farrahi Moghaddam and Cheriet [10] tried to improve the results in case of intensity and interfering degradation by implementing a multiscale version of Sauvola’s algorithm. First, the average stroke width and line height are evaluated. Then, in each step, the scale is reduced by a factor of 2 and the parameters are adjusted: $k$ is set from $0.5$ to $0.01$. The idea is to make the results from the lower scale grow while retrieving only text pixels at each step. Yet, this method only works well on uniform text size.

Kim [22] describes in details issues caused by too small or too large windows. He actually describes some of the limitations cited above and proposes an hybrid solution that takes advantage of two window sizes: a small one in order to get local fine details and a larger one to get the global trend. First, the input image is binarized with a moderate-size window. Then, text lines are located and features are computed from the text: average character thickness and text height. For each text line, two windows are deduced from those features, and two thresholds $T_{large}$ and $T_{small}$ are computed thanks to Sauvola’s formula. Finally, the binarization of each text line is performed using:

$$\begin{aligned} T(x,y) = \alpha T_{large}(x,y) + (1 - \alpha ) T_{small}(x,y). \end{aligned}$$

According to the author, this method gives better results than Sauvola’s binarization. However, it introduces a new parameter $\alpha $ which tends to make the fine tuning of the method more difficult, even if the authors claim that the method is not very sensitive to it. Moreover, the critical part of the algorithm remains the text line detection which assumes that the first binarization has retrieved all the text parts and that text components are correctly grouped. In the case of magazines with different kinds of non-text, we have observed that some text components can be grouped with non-text components which may lead to incorrect features and binarization.

In the remainder of this paper, we present an algorithm to overcome one of the four limitations of Sauvola’s binarization mentioned previously, e.g., handling various object sizes on a single run of the algorithm, without any prior knowledge on the location of text lines. It is actually penalizing while processing magazines or commercials where text uses different font sizes: titles, subtitles, subscripts, etc. We also focus on the implementation and computational speed which is also a critical aspect from our point of view.

In Sect. 2, we first expose the general principle of the proposed multiscale algorithm. In Sect. 3, we describe implementation details and explain how to implement our method efficiently. In Section 4, we present some results and compare them to other methods. We conclude on the achievements of this work and discuss future work in Sect. 7.

2 Multiscale binarization

Large text in documents like magazines is of prime importance for indexing purpose since it usually contains the main topics of documents. Among the four presented defects, handling object of different sizes is thus a priority. The problem with binarizing objects of different sizes is caused by using a single window size, thus not well-suited to all objects. A local window is needed in order to fit appropriately the local document contents. Since we want to preserve performance, we need to avoid costly pre-processing algorithms which would require additional passes on data. Therefore, we want to include the window selection inside the binarization process, which is possible with a multiscale approach.

Multiscale strategies are common in image processing literature and are sometimes used in binarization. Farrahi Moghaddam and Cheriet [10] start by processing the full size image with very strict parameters. After each iteration, the input image is subsampled and the parameters are relaxed, so that more pixels are considered as foreground. Only those connecting to previously identified components are kept increasing the area of the components found in the previous iterations. Here, both data and parameters vary. Tabbone and Wendling propose a method [9] where the image size does not change. A parameter varies in a range of values and the best parameter value is selected by evaluating the result stability at each step. In Gabarra and Tabbone’s method [7], edges are detected then a quad-tree decomposition of the input image is computed. On each area, a local threshold is computed and applied to all the pixels of that area. It is multiscale since parts of an image can be processed at different levels in its quad-tree representation.

In our approach, we choose to run the same process at different scales using the same parameters; the input data is just subsampled. Eventually, the final result is deduced as a merge of the results obtained at different scales. In our approach, we make the assumption that objects are more likely to be well retrieved at one of the scales. The method is described in details in the following subsections.

2.1 Notation

We will use the following notation:

uppercase letters refer to image names: S, I, ...
lowercase letters refer to scalar values: s, n, ...
subscript values refer to a scale number: $I_s$ is an image $I$ at scale $s$.
$I_s(p)$ corresponds to the value of the pixel $p$ of the image $I_s$.

2.2 General description

The main goal of the proposed method is to find the best scale for which the algorithm is able to decide correctly whether a pixel belongs to the foreground or to the background. As described in Sect. 2.4, a relationship exists between the scale where an object is retrieved and the window size which should be used for capturing this object correctly. Our algorithm is composed of four main steps described below and illustrated in Fig. 4:

Step 1 Subsampling. The input image is successively subsampled to different scales.
Step 2 Object selection at each scale. Subsample images are binarized, labeled, and, object components are selected.
Step 3 Results merging. Selected objects are merged into a single scale image. A threshold image is deduced from the scale image.
Step 4 Final binarization. From the threshold image, the input image is binarized.

2.3 Step 1: subsampling

First, the input image, once turned into grayscale, $I_1$ is subsampled at three different scales thus producing three other images: $I_2, I_3$, and $I_4$. The choice of the number of scales, here 4, is related to the fact that we work mainly on A4 documents between 300 and 600 dpi. Thus, the maximum size of an object is constrained and, because of the object area range accepted (see Sect. 2.4), four scales are sufficient to retrieve correctly objects. However, on larger documents and/or with higher resolutions, it might be useful to have a few more scales.

The reduction factor between scales $s$ and $s+1$ is mostly set to 2. This value has been chosen because higher values may lead to a loss of precision in the final results. This side effect mainly appears for high scales, where images contain less and less information. Using a reduction factor of 3 for the first subsampling is usually fine, if the image has a minimum resolution of 300 dpi. This reduction factor value may also be useful in the case of large documents in order to improve overall performance. At the end of this step, we have four grayscale images: $I_1, I_2, I_3$, and $I_4$.

2.4 Step 2: object selection at each scale

Each input image $I_s$ is processed separately thanks to the same processing chain as depicted in Fig. 4. The goal is to select objects within a specific size (area) range.

$I_s$ is binarized using Sauvola’s algorithm with a fixed window of size $w$. As shown in Fig. 2h, the size of the window influences the size and the shape of the retrieved objects. Here, the image $I_s$ is a subsampled version of the input and so are the objects. Therefore, working at a scale $s$ with a window of size $w$ is equivalent to work at scale 1 with a window of size $w_{1,s}$ regarding the reduction factor $q$:

$$\begin{aligned} w_{1,s} = q^{(s-1)} \times w. \end{aligned}$$

When the scale increases, objects size decreases in the subsampled image, and objects are more likely to fit the window to avoid the defects shown in Fig. 2g.

As shown in Fig. 4, the binarization at scale $s$ produces two images: $T_s$, a threshold image storing the point-wise thresholds used during the binarization; and $B_s$, the resulting binary image at scale $s$. $T_s$ will be used later, during the final step. The binarization of $I_s$ includes connected components of various sizes. Some of them need to be removed because they are too small or too large for giving good results with the current window size. We consider a minimum and a maximum size for acceptable objects. We chose the area, i.e., the number of pixels of a connected component, as size criterion. The different ranges of area are defined according to the current scale $s$, the reduction factor $q$, and the constant window size $w$:

at scale $1$:
- $min\_area(1) = 0$
- $max\_area(1) = w^2 \times 0.7$
at scale $s$:
- $min\_area(s) = 0.9 \times max\_area(s-1) / q^2$
- $max\_area(s) = max\_area(s-1) \times q^2$
at the last scale $s_{{\max }}$:
- $min\_area(s_{\max }) = 0.9 \times max\_area(s_{\max }-1) / q^2$
- $max\_area(s_{\max }) = +\infty $.

Those area ranges correspond to commonly used body text, small titles, and large titles in 300 dpi magazines documents. In order to disambiguate objects which are at the limit between two scales, object area ranges overlap when considered at scale 1.

A connected component labeling is performed on $B_s$, and a selection of components having their area in the expected range is performed. The result is stored in the binary image $S_s$. At the end of this step, eight images are kept for the next steps: $T_{1}, T_{2}, T_{3}$, and $T_{4}$ store the thresholds; $S_{1}, S_{2}, S_{3}$, and $S_{4}$ store the selection of objects for their corresponding scale.

2.5 Step 3: result merging

The main goal of this step (Fig. 4) is to prepare the final binarization by mapping each pixel from the input image to a threshold previously computed during Step 2.

Once an object is stored in $S_s$, it means that it has been retrieved at scale $s$. One wants to merge this piece of information into a single scalar image $E_1$. It consists in marking in $E_1$ each object in $S_s$ using its corresponding scale $s$ (see Fig. 5). Since $S_1, S_{2}, S_{3}$, and $S_{4}$ are at different scale, objects extracted from $S_s$ images must be rescaled before being marked in $E_1$.

Sometimes objects are retrieved several times at different scales: during subsampling, components may be connected and may have formed objects large enough to enter several area ranges. Overlapping area criteria may also be responsible for such an issue sometimes. In that case, objects are considered to have been retrieved at the highest scale. This way, we guarantee that the window, even if it is not the best one, will be large enough to avoid degraded objects as depicted in Fig. 2h.

Once $E_1$ is computed, every pixel of binarized objects is mapped to a scale. Yet, non-object pixels do not belong to a scale at this step. Most of them are usually background pixels but others can be pixels around objects, ignored because of the loss of precision due to subsampling. For that reason, they must be associated with a scale too in order to be processed like other pixels afterward. Omitting scale mapping for that pixels and considering them as background information directly would lead to sharp object edges and artifacts.

An influence zone algorithm [23, 24] is applied to $E_1$ to propagate scale information and guaranty smooth results. It actually consists in a discrete Voronoï tesselation where the seeds are the different connected components. The result is stored in $E_1\_IZ$, and values of $E_1\_IZ$ are restricted to scale numbers. Here, the possible values in that image are 1, 2, 3, and 4. $E_1\_IZ$ maps pixels to scales; yet, to effectively binarize the input image, $T_1\_MS$ is needed to map scales data to effective thresholds for each pixel. From $E_1\_IZ$ and the $T_{1}$, $T_{2}$, $T_{3}$, $T_{4}$ images, produced during Step 2, $T_1\_MS$ is deduced: for each pixel $p$ at scale 1, we know its scale $s=E_1\_IZ(p)$, and we deduce the corresponding point $p^{\prime }$ at scale $s$, so that $T_s(p^{\prime })$ is stored as $T_1\_MS(p)$. This image computation is illustrated in Fig. 6. At the end of this step, $T_1\_MS$ provides a single threshold for each pixel.

Reusing the thresholds in $T_s$ is equivalent to computing the thresholds in $I_1$ with the window corresponding to scale $s$. That way, the window is defined pixel-wise which contrasts with the global approach of the original method.

2.6 Step 4: final binarization

A point-wise binarization is performed with $I_1$ and

$T1\_MS$ to get the final result stored in $B_1$.

3 Optimization

This algorithm is designed to be part of a whole document processing chain. Performance is thus a requirement. The multiscale approach implies extra computation: in addition to a classical implementation of Sauvola’s algorithm, we introduce three new steps. Thanks to the multiscale approach, most of the computation is performed on subsampled (smaller) images which limits the impact of the additional steps. Whatever the size of an image, iterations over all its pixels are time consuming, so the main goal of the following optimization is to reduce the number of iterations performed on images. Working at scale $1$ is also expensive because of its full resolution. Therefore, step 2 is restricted to scale $s >= 2$, and the original input image is only used to initialize multiscale inputs and to perform the final binarization.

3.1 Step 1: setup of input data

In order to prepare multiscale computations, successive antialiased subsamples are computed. Image at scale $s$ is computed thanks to the image at scale $s - 1$ by computing for each pixel at scale $s$ the average value of its neighboring pixels at scale $s - 1$. Computing this way reduces the number of operations from $3 \times height_1 \times width_1$ to $height_1 \times width_1 \times \left( 1+\frac{1}{q^2}+\frac{1}{q^4}\right) $, where $height_1$ and $width_1$ are, respectively, the height and the width of images at scale 1.

Subsampling is performed using integer ratios. Images not having dimensions divisible by these ratios are handled by adding special border data: data located on the inner border of the image are duplicated in an added outer image border. The size of this new border is adjusted to make the dimension of the so extended image a multiple of the subsampling ratio. Having images with an exact proportional size is required to find corresponding pixels between images at different scales.

An integral image is also computed in a single pass. The use of integral images in binarization algorithms was first introduced by Shafait et al. [20] allowing local thresholding methods, such as Sauvola’s method, to run in time close to global methods. The idea is to compute an image in which the intensity at a pixel position is equal to the sum of the intensities of all the pixels above and to the left of that position in the original image. Thanks to such an image, as shown in Fig. 8, computing the mean $m(x,y)$ boils down to:

$$\begin{aligned} m(x,y) \!&= \!\frac{1}{w^2}\times \left( I\left( x\!+\!\frac{w}{2}, y \!+\! \frac{w}{2}\right) \!+\! I\left( x \!-\! \frac{w}{2}, y \!-\!\frac{w}{2}\right) \right. \\&\quad \left. - I\left( x + \frac{w}{2}, y - \frac{w}{2}\right) - I\left( x - \frac{w}{2}, y + \frac{w}{2}\right) \right) . \end{aligned}$$

Our algorithm uses integral images to compute both the local means and variances which respectively need local sums and squared sums. For performance reasons, these statistics are stored in a single image as a pair: single data block and directional reading enable data locality. A single integral image is computed from $I_1$, so that statistics are exact. It is stored in an image at scale 2 because it is needed only for scales $s >= 2$ (as explained further) and its reduces the memory footprint. Since there exists a relationship regarding the pixel positions between images at different scales, it is possible to read directly in that image from any scale with the guarantee of computing exact statistics in constant time.

The integral image and the subsampled images are actually computed from the same data and with the same process: iteration over the whole image and computation of sums of values. In our implementation, as shown in Fig. 7, the first subsampled image, $I_2$, and the integral image are computed at the same time. It saves one more iteration on the input image and many floating point operations.

3.2 Step 2: object selection at each scale

Through this step, each subsampled image is binarized, objects are selected and finally marked in the scale image $E_2$. In the general approach, each subsampled image needs to be binarized and labeled before selecting objects thanks to an area criterion. These three algorithms follow a common pattern:

Binarization. For each pixel, a threshold is computed and defines whether the current pixel is part of an object or of the background.
Labeling. For each object pixel, during a first pass, performing a backward iteration, a new parent is marked so that, at the end of the pass, pixels of the same object are linked altogether thanks to a parent relationship. During a second pass, performing a forward iteration, component labels are propagated within components. Component area can also be retrieved directly because of the way relationships were created. This labeling is based on Tarjan’s Union-Find algorithm [25].
Filtering. For each object pixel, if the object area is too low or too high, the pixel is considered as belonging to background.
Marking. For each selected object pixel, mark the pixel in $E_2$ with the current scale as value.

Thanks to this pattern, these steps can be combined into a single algorithm, described in Algorithm 1, decreasing the number of iterations on the whole image from five down to two.

In Algorithm 1, seven images are used: $I_s$, the subsampled input image, $T_s$, the image storing thresholds used for binarization, $Parent_s$, storing the parent relationship for the Union-Find algorithm, $Card_s$, storing the components’ area, $B_s$, the filtered and binarized image, $E_2$, the image at scale 2 where retrieved objects are marked with their corresponding scale, and $Int_2$, the integral image at scale 2. At any time, for a specific pixel in $I_s$, the corresponding data must be readable in all these images at different scales. This algorithm is composed of two passes. In the first pass, the input image at scale $s$ is binarized and the component area is computed. In the second pass, components are filtered with the area criterion and marked in the scale image $E_2$.

In the first pass, for each pixel, statistics are computed from the integral image $Int_2$ (line 14). Note that in compute_stats(), the window size effectively used to read $Int_2$ is actually of size $w_{2,s}$ (see Sect. 2.4). Then, Sauvola threshold is computed (line 15), and the pixel value is thresholded (line 16). If the thresholded value returns True, either a new component is created or the pixel is attached to an existing component in its neighborhood. In those both cases, component area is updated (lines 19 and 24).

In the second pass, all pixels belonging to a component with an area within the valid range are marked in $E_2$ (line 32). Note that those pixels are marked in $E_2$ with the current scale as value using mark_pixel_in_scale_image() but only if they have not been marked before (lines 34 and 39). Processing these selected pixels is straightforward and does not require any labeling.

At the end of this step, four images are available for the next steps: $E_2$ and $T_{2}, T_{3}$, and $T_{4}$. Note that since this step is performed on scales $s >= 2$, the scale image is only known at scale 2 (not at full scale), and there are only three $T_s$ threshold images instead of 4. Therefore, avoiding computation at scale 1 reduces memory usage and saves some execution time.

3.3 Step 3: result merging

Since $E_2$ is built iteratively during step 2, no merging is needed anymore here. Only the influence zone is performed on $E_2$ to produce $E_2\_IZ$.

3.4 Step 4: final binarization

During this step, the aim is to compute a binarization of the input image $I_1$. Images $T_{2}, T_{3}, T_{4}, E_2\_IZ, I_1$ and the output image are browsed simultaneously. The process remains identical to the one depicted in Fig. 6 except that the threshold image $T_1\_MS$ is never created: the binarization is performed directly once the threshold is found.

To prevent many memory accesses because of the numerous images to read, we rely on the scale relationship and iterate over the pixels of all the images simultaneously. We rely on the Morton order [26] to iterate over pixels in square subparts of images. In Fig. 9, reading a square of four pixels in the left image is equivalent to read a single pixel in the two other images. A new value is read in subsampled images only if all the pixels corresponding to the current one have been processed. Such an iteration in this final binarization reduces the total number of memory accesses in the images from $6 \times height_1 \times width_1$ to less than $3 \times height_1 \times width_1$ where $height_1$ and $width_1$ are respectively the height and the width of images at scale 1.

4 Experimental results

4.1 Analysis of the method

During the development of this method, we have used the document dataset of the page segmentation competition from icdar 2009 [27]. It is composed of 63 A4 full page documents at 300dpi. It contains a mix of magazines, scientific publications, and technical articles. We run our multiscale algorithm with several window values on that dataset and found that, $w = 51$ gives good results in all cases. This is the reason why we use that value in the following tests and evaluation.

Figure 10a is a good candidate to illustrate the drawbacks of the original strategy of Sauvola’s algorithm. The original size is $1,500\times 1,500$ pixels, and different sizes of objects are represented: the largest is more than 370 pixels and the smallest around 10 pixels tall. Object thickness varies from 40 to 1 pixels. Running Sauvola’s algorithm leads to almost empty incomplete binarized objects (Fig. 10b). Figure 10c shows that the multiscale implementation takes object sizes into account. Here, objects are colored according to the scale they have been retrieved from: green for scale 2, blue for scale 3, and red for scale 4. It clearly shows the dispatch of the different object sizes into the different scales. As a consequence Large objects are clearly well retrieved (Fig. 10d).

Figure 11 also shows some examples of the binarization results performed with this method on real documents. One can also notice the limits of using the object area as criterion: in Fig. 11a thick line separators are retrieved at scale 3 but they should be retrieved at scale 2 for best results.

Table 1 shows computation times results obtained on an Intel Xeon W3520@2,67Ghz with 6 GB of RAM and programs compiled with GCC 4.4 with -O3 optimization flag. While the classical Sauvola’s algorithm runs in 0.05s on A4 document images scanned at 300dpi, the multiscale implementation runs in 0.15 s. Table 2 illustrates in details this difference on a larger image. As expected, multiscale features imply a cost on computation time: it is about 2.45 times slower than the classical implementation on large images mainly due to the multiscale processing. The computation is constant with respect to the input image size.

Table 1 Comparison of Sauvola’s implementation computing times

Full size table

Table 2 Computation time details between two implementations with an image of $7{,}780\times 11{,}600$ pixels as input

Full size table

4.2 Adjustment of parameter $k$

In the original binarization method proposed by Sauvola, the parameter $k$ is set globally for the whole document. Adjusting this parameter can lead to better results on documents with low contrast or thin characters. In the multiscale approach, it is possible to set a different $k$ value for each scale, e.g., for ranges of object sizes. We have compared such a variant with the classical (monoscale) approach where $k$ is set to $0.34$ globally. We will notate $k_s$ the value of parameter $k$ at scale $s$. According to our experiment using $k_2=0.2$, $k_3=0.3$ and $k_4=0.5$ gives good results. At scale 2 and 3, only small and medium objects are retrieved. They can be thin or not contrasted enough, so setting a low value of $k_{2}$ and $k_{3}$ allows us to be less strict in Sauvola’s formula, i.e., retrieving pixels with lower contrasts. At scale 4, large objects are retrieved and they are large enough not to need some additional precision. This new version, with one value of $k$ per scale, namely Sauvola $\text{ MS }_{kx}$ , is evaluated in the next section, along with other methods.

5 Evaluation

All the material used in this section (datasets, ground truths, implementations, and benchmark tools) are freely available online from the resources page related to this article.^{Footnote 1}

The required quality of the binarization highly depends on use cases. We have chosen to evaluate two aspects which are important in a whole document analysis toolchain: pixel-based accuracy and a posteriori OCR -based results.

The evaluation is performed with eight binarization algorithms. We propose two implementations of Sauvola Multiscale: one with a fixed $k$ value, Sauvola $\text{ MS }_k$ , and another one where $k$ is adjusted according to the scale, Sauvola $\text{ MS }_{kx}$ . We compare those two implementations to the classical (monoscale) algorithm from Sauvola and height other state-of-the-art methods: Wolf’s [21], Otsu’s [2], Niblack’s [3], Kim’s [22], tmms [28], Sauvola MsGb [10], Su 2001 [29], and Lelore 2011 [14].

tmms [28] is a morphological algorithm based on the morphological toggle mapping operator [30]. The morphological erosion and dilation are computed. Then, if the pixel value is closer to the erosion value than to the dilation value, it is marked as background, otherwise it is marked as foreground. If this method was initially dedicated to natural images, hdibco 2009 challenge [16] shows that this algorithm gives also good results on document images. It was ranked $2nd$ out of 43.

Multiscale grid-based Sauvola (Sauvola MsGb), introduced by Farrahi Moghaddam and Cheriet [10], is a multiscale adaptive binarization method based on Sauvola formula and a grid-based approach. This method was initially dedicated to binarize degraded historical documents while preserving weak connections and stroke. It was ranked 9th out of 15 at hdibco 2010 and 14th out of 24 at hdibco 2012.

Su 2011 [29] relies on image contrast defined by the local image maximum and minimum. Text is then segmented using local thresholds that are estimated from the detected high contrast pixels within a local neighborhood window. This method was initially dedicated to processing historical documents. It was ranked 2nd out of 18 at dibco 2011.

Lelore and Bouchara’s method [14, 15] is based on the one which won dibco 2011 but without upscaling the input images. In a first step, the method achieved a rough localization of the text using an edge-detection algorithm based on a modified version of the well-known Canny method. From the previous result, pixels in the immediate vicinity of edges are labeled as text or background thanks to a clustering algorithm while the remaining pixels are temporarily labeled as unknown. Finally, a post-processing step assigns a class to these ‘unknown’ pixels.

For all methods, we have chosen the best parameter values. The parameter $k$ has been set according to the recommended values in the literature. Concerning the window size $w$, we have run the algorithms on the document dataset of the page segmentation competition (pscomp ) from icdar 2009 [27], and we have tuned the value to obtain results with good visual quality.

tmms parameters have been set by its author based on the results on the pscomp dataset. Su 2011 is self-parameterized and so is Sauvola MsGb since we relied on its smart mode. All parameter values are summarized in Table 3. For each method, the meaning of parameters is detailed in their respective reference article. Note that for tmms , as the author did for dibco challenge, an hysteresis is used to set up the cmin parameter; that is, the reason why there are two values for it. It is important to note that parameters are fixed for the whole evaluation in order to highlight the robustness of the methods.

Table 3 Parameters used to evaluate each method

Full size table

For most methods, we used their freely accessible implementations. We implemented Kim’s technique. tmms , Sauvola MsGb, and Su 2011 were provided as binaries by their respective authors. Binarization results for Lelore’s method have been directly computed by its author.

The evaluation is performed on two document types: historical documents and magazines.

5.1 Datasets

As shown in Table 4, the most common reference datasets for document image analysis provide pieces of documents without the whole context data. A majority of them, dibco ’s and hdibco ’s documents, are dedicated to historical documents, which include also handwritten documents. It implies specific cares for reconstructing or preserving thin characters, and for separating background and foreground data which may require special processing before and/or after binarization to obtain best results. Prima Layout Analysis Dataset (LAD) is composed of more than 300 full page documents from magazines and articles with a good quality but it only provides the layout ground truth.

Table 4 Overview of the reference datasets in document image analysis

Full size table

Our method remains a raw binarization method, without any pre- or post-processing, and is designed to work best on magazines with less severe degradations than in dibco datasets.

For the next evaluation, we use dibco 2009 and 2011, and hdibco 2010 and 2012 datasets for historical documents, and our own datasets, lrde Document Binarization Dataset (dbd ) for magazines.

lrde dbd is composed of two subsets. Both are based on an original vector-based version of a French magazine. From this magazine, we have selected 125 A4 300-dpi pages with different text sizes.

One subset of images has been directly extracted from the digital magazine and rasterized. Those images are clean with no illumination nor noise issues. For every pages, we have removed the pictures in order to make the groundtruthing and the evaluation easier. Our selection includes pages with background colored boxes and low contrasts. This dataset is used both for the pixel-based accuracy and for the OCR -based evaluation. We will refer to it as the clean documents dataset.

The other subset is based on the same documents that have been first printed as two-sided pages then scanned at 300-dpi resolution. A rigid transform has been applied to each document, so that text lines match the ones of the corresponding clean documents. Therefore, this process has introduced some noise, show-through, and illumination variations. This subset is used for OCR -based evaluation only. We will refer to it as the scanned documents dataset.

For each page, text lines have been grouped into three categories w.r.t. their font size. Lines with a x-height less than 30 pixels are categorized as Small (S) and correspond to core paragraph text; lines with a x-height between 30 and 55 pixels are considered as Medium (M) and correspond to subtitles or small titles; and for higher x-height, lines are considered as Large (L), e.g., titles. A set of lines composed of 123 large lines, 320 medium lines, and 9,551 small lines is available for both clean documents and scanned documents. For the OCR-based evaluation, we are thus able to measure the binarization quality for each text category independently.

The ground truth images have been obtained using a semi-automatic process. To that aim, we rely on a binarization using a global threshold. Sometimes, due to contrast or color issues, some objects were not correctly binarized or were missing in the output; therefore, we made some adjustments in the input image to preserve every objects.

In order to produce the OCR ground truth, we used Tesseract 3.02 [34] on the clean documents dataset. Errors and missed text were fixed, and text was grouped by lines to produce a plain text OCR output reference.

Datasets, associated ground truths, implementations, and the whole set of tools used for this evaluation are freely available online from the resources page related to this article.^{Footnote 2}

5.2 Evaluation with historical documents

We have tested our method on dibco /hdibco datasets from 2009 to 2012 [17, 18, 32] with the parameters given in Table 3. The results are detailed in Table 5. On historical documents, the multiscale versions of Sauvola are roughly comparable to the original Sauvola’s method. That was expected since the historical documents of the contest databases do not contain “multiscale text”; text is only either of small or of medium size. Note that the original Sauvola method (or any of its variations) is a monolithic general-purpose binarization algorithm. It cannot compete with some elaborate binarization chains, including pre- and post-processings, and a fortiori dedicated for historical documents.

Table 5 Pixel-based accuracy evaluation results performed on the dibco /hdibco subsets from 2009 to 2012

Full size table

All output images are available on the Web page related to this article for extra analysis.

5.3 Evaluation with magazines

5.3.1 Pixel-based accuracy evaluation

The clean documents dataset is used for pixel-based accuracy evaluation since the ground truth binarized documents are perfectly known.

Evaluation measure. According to common evaluation protocols [18], we used the F-measure (FM) in order to compare our method with other approaches:

$$\begin{aligned} FM = \frac{2\times Recall \times Precision}{Recall + Precision}, \end{aligned}$$

where $ Recall = \frac{TP}{TP + FN}$ and $ Precision = \frac{TP}{TP + FP}$, with $TP, FP$, and $FN$, respectively, standing for true-positive (total number of well-classified foreground pixels), false-positive (total number of misclassified foreground pixels in binarization results compared to ground truth), and false-negative (total number of misclassified background pixels). In tables, the F-measure is expressed in percentage.

Results and Analysis. Table 6 gives the evaluation results.

Table 6 Evaluation of the binarization results of different methods over 125 documents extracted from a digital issue of a French magazine

Full size table

A selection of three regions of document images is depicted in Figs. 12, 13 and 14 to compare the results of the different methods. We can see that our approach increases the result quality of the classical binarization method proposed by Sauvola by five percentage points. This difference is mainly due to a now-adapted window size that can adequately process large objects, as we can see on Figs. 12 and 14. Thanks to the multiscale approach, locally low-contrasted objects may be retrieved because they are considered on a larger area. This is the case in Fig. 13 where there is a large and low-contrasted object.

Compared to Sauvola’s approach, Sauvola $\text{ MS }_k$ and Sauvola $\text{ MS }_{kx}$ are able to retrieve the right part of the object. Niblack’s method is able to find it but the too small window prevents it from retrieving it completely. Wolf’s method performs relatively well but some objects are missing in the output. Otsu’s method performs better than any Sauvola’s approach. This is understandable because it is known to give good results on clean documents, which is the case here, and can retrieve large objects correctly. Its corresponding score results mainly from missing objects because of low contrasts (see Fig. 13g). Niblack performs well in the text but does not handle color text boxes edges correctly. Transitions between color boxes and the background lead to some artifacts. Same issues arise with textured background. Kim encounters some trouble with text in colored text box which leads to large artifacts, the box being considered as object instead of as background. Sauvola MsGb performs as well as Sauvola and surprisingly has some difficulties to extract large text like titles and drop capitals. This is also the case for Su’s method: large text and large reverse video areas are missing or degraded (Figs. 12l and 13l). In addition, small text edges are not as smooth as they should be. Lelore’s method gives good overall results, although it has some difficulties some times on large titles and small text. Its performances are really close to ours.

In Table 6, the time needed to compute the final binarization results on 125 documents confirms the multiscale overheads as compared to the classical Sauvola’s implementation. It shows also the large range of computation times, which is a crucial information to take into account while choosing an algorithm.

This evaluation also shows that adjusting the parameter $k$ w.r.t. the scale, in our approach, may improve the quality of the results. Sauvola $\text{ MS }_{kx}$ gets a three points higher F-measure than Sauvola $\text{ MS }_k$ . Those results highlight that, despite the most recent binarization methods perform very well on historical documents, they may not be able to properly binarize simple and clean magazine document images.

5.3.2 OCR -based evaluation

Evaluation method. Once a document has been binarized, and character recognition is performed, line by line, by Tesseract (the options: -l fra -psm 7 specify the recognized language, French, and that the given image contains only one text line). The OCR has been run on the two datasets: clean and scanned documents, and the error is computed thanks to the Levenshtein distance.

Results and analysis. Table 7 shows the recognition rate for seven binarization algorithms w.r.t. text line quality and x-height.

Our Sauvola $\text{ MS }_{kx}$ and Sauvola $\text{ MS }_k$ implementations almost always outperformed the original Sauvola’s algorithm and some state-of-the-art algorithms. On clean documents, results are very close thanks to clean backgrounds, which dramatically reduce the risk of erroneous retrievals. The small differences that are encountered here are due to the contrast between the text and the background colors: text is usually well retrieved but edges are not always as clean as those obtained with a multiscale Sauvola’s implementation (see Fig. 16). This fact is even more true for scanned documents where illumination variations and show-through artifacts are introduced. Sauvola $\text{ MS }_k$ and Sauvola $\text{ MS }_{kx}$ perform better than the classical Sauvola’s algorithm on large text, e.g., text with a high x-height. It is globally more robust, and the results are more stable on a wider set of objects. Moreover, they do not need fine parameter adjustment to deliver acceptable results with any object sizes.

Table 7 Raw OCR error rates of different binarization algorithms on the provided dataset according to the text size (S/M/L stand respectively for small/medium/large)

Full size table

Regarding tmms method, the results are globally equivalent to Sauvolas algorithm for clean documents. On scanned documents, the text is correctly binarized but cartouches and colored text boxes are considered as foreground (see Fig. 15). They surround text components, preventing the OCR from recognizing the text correctly. Since this is a common situation in this dataset, the OCR error is thus extremely high compared to the other methods. To be usable in our context, this method would require post-processing. Su’s method does not scale to large objects thus giving really bad results for that category. Results quality for small and medium text are below average due to many broken and missing characters. This method seems to be really specific to the kind of noise it was designed for. Surprisingly, Otsu’s method performs relatively well on both clean and scanned documents despite a non-uniform illumination. Lelore’s method performs very well on clean documents, but, on scanned documents. Small characters are usually broken and large ones have holes.

Except for Otsu’s and Lelore’s methods, the main drawback of state-of-the-art methods is their difficulty to correctly binarize large objects.

6 Reproducible research

We advocate reproducible research [35], which means that research product shall also include all the materials needed to reproduce research results. To that aim, we provide the community with many resources related to this present article, available from http://publis.lrde.epita.fr/201302-IJDAR.

First a demo is online, so that the user can upload an image and get the binarization result of our method; therefore, third-party people can quickly reproduce our results or run some extra experiments on their own data. Second, we also provide an implementation of our method. It is part of the scribo module [36] of the Olena project [37], along with most of the algorithms discussed in this paper. The Olena project is an open source platform dedicated to image processing, freely available from our Web site.^{Footnote 3} It contains a general-purpose and generic C++ image processing library, described in [38]. Last the magazine document image database (including ground truths and the results of 10 binarization methods, see Table 4) that we have set up is now usable by the community.

As a matter of comparison, among the 36 methods that entered the dibco 2011 and hdibco 2012 competitions, an implementation is available for only two of them and 31 of them cannot be reproduced due to more or less partial descriptions or missing parameter settings.

7 Conclusion and future work

In this paper, we propose an approach that significantly improves the results of Sauvola’s binarization on documents with objects of various sizes like in magazines. Sauvola’s binarization is made almost insensitive to the window parameter thanks to this implementation.

Its accuracy is tested on 125,300-dpi A4 documents. Where on small and medium text sizes, this implementation gets better or similar results than the classical implementation, it dramatically improves the results for large text in magazines. This property is very important for document analysis because text using large font sizes usually correspond to titles and may be particularly relevant for indexing purpose. Furthermore, pixel-based accuracy and character recognition rates are also improved by our proposal, that is crucial for a whole document analysis, from the layout to the contents. Sauvola’s formula is probably not the best one to use for historical documents but at least our evaluation showed that it still competes with the latest awarded methods regarding magazines and classical documents. We also proposed a fast implementation of our method, limiting the impact of the additional steps to a 3 times slower method instead of a 7-times slowdown in a naive version.

The proposed implementation is part of the scribo [36] module from the Olena platform [38], an open-source platform for image processing written in C++, freely available on our Web site. The scribo module also contains the implementation of some algorithms presented in this paper. An online demo of our method is available^{Footnote 4} where documents can be uploaded for testing purpose.

An issue remains though and may be considered for further investigations. The area criterion used to select at which scale an object should be retrieved is probably not precise enough to make a distinction between large thin objects and large thick objects.

Notes

References

Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13, 146–165 (2004)
Article Google Scholar
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern 9(1), 62–66 (1979)
Article MathSciNet Google Scholar
Niblack, W.: An Introduction to Digital Image Processing. Strandberg Publishing Company, Birkeroed (1985)
Google Scholar
Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recogn. 33, 225–236 (2000)
Article Google Scholar
Badekas, E., Papamarkos, N.: Automatic evaluation of document binarization results. In: Progress in Pattern Recognition, Image Analysis and Applications, pp. 1005–1014 (2005)
Bernsen, J.: Dynamic thresholding of grey-level images. In: Proceedings of the International Conference on, Pattern Recognition, pp. 1251–1255 (1986)
Gabarra, E., Tabbone, A.: Combining global and local threshold to binarize document of images. In: Pattern Recognition and Image Analysis, vol. 3523 of LNCS, pp. 173–186. Springer, Berlin (2005)
Rangoni, Y., Shafait, F., Breuel, T.M.: OCR based thresholding. In: Proceedings of IAPR Conference on Machine Vision Applications, pp. 98–101 (2009)
Tabbone, S., Wendling, L.: Multi-scale binarization of images. Pattern Recogn. Lett. 24(1–3), 403–411 (2003)
Article Google Scholar
Farrahi Moghaddam, R., Cheriet, M.: A multi-scale framework for adaptive binarization of degraded document images. Pattern Recogn. 43(6), 2186–2198 (2010)
Article MATH Google Scholar
Chang, F., Liang, K.-H., Tan, T.-M., Hwang, W.-L.: Binarization of document images using hadamard multiresolution analysis. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 157–160 (1999)
Bukhari, S.S., Shafait, F., Breuel, T.: Foreground-background regions guided binarization of camera-captured document images. In: Proceedings of the International Workshop on Camera Based Document Analysis and Recognition, 7 (2009)
Lu, S., Su, B., Tan, C.: Document image binarization using background estimation and stroke edges. Int. J. Doc. Anal. Recogn. 13, 303–314 (2010)
Article Google Scholar
Lelore, T., Bouchara, F.: Super-resolved binarization of text based on the FAIR algorithm. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 839–843 (2011)
Lelore, T., Bouchara, F.: FAIR: a fast algorithm for document image restoration. (2013, published)
Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR 2009 document image binarization contest (DIBCO). In: Proceedings of ICDAR, pp. 1375–1382 (2009)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: H-DIBCO 2010—handwritten document image binarization competition. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, pp. 727–732 (2010)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2011 document image binarization contest (DIBCO). In: Proceedings of International Conference on Document Analysis and Recognition, pp. 1506–1510 (2011)
Howe, N.: A laplacian energy for document binarization. In: Proceedings of the IEEE International Conference on Document Analysis and Recognition, pp. 6–10 (2011)
Shafait, F., Keysers, D., Breuel, T.M.: Efficient implementation of local adaptive thresholding techniques using integral images. In: Document Recognition and Retrieval XV, vol. 6815, p. 681510 (2008)
Wolf, C., Jolion, J.-M.: Extraction and recognition of artificial text in multimedia documents. Pattern Anal. Appl. 6, 309–326 (2004)
Google Scholar
Kim I.-J. Multi-window binarization of camera image for document recognition. In: Proceedings of International Workshop on Frontiers in Handwriting Recognition, pp. 323–327 (2004)
Vincent, L.: Exact Euclidean distance function by chain propagations. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 520–525 (1991)
Chassery, J.-M., Montanvert, A.: Geometrical representation of shapes and objects for visual perception. In: Geometric Reasoning for Perception and Action, vol. 708 of LNCS, pp. 163–182. Springer, Berlin (1993)
Dillencourt, M.B., Samet, H., Tamminen, M.: A general approach to connected-component labeling for arbitrary image representations. J. ACM 39(2), 253–280 (1992)
Article MATH MathSciNet Google Scholar
Morton, G.M.: A computer oriented geodetic data base; and a new technique in file sequencing. Technical report, IBM Company (1966)
Antonacopoulos, A., Pletschacher, S., Bridson, D., Papadopoulos, C.: ICDAR 2009 page segmentation competition. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 1370–1374 (2009a)
Fabrizio, J., Marcotegui, B., Cord, M.: Text segmentation in natural scenes using toggle-mapping. In: Proceedings of IEEE International Conference on Image Processing (2009)
Su, B., Lu, S., Tan, C.L.: Binarization of historical document images using the local maximum and minimum. In: Proceedings of the IAPR International Workshop on Document Analysis Systems, pp. 159–166 (2010)
Serra, J.: Toggle mappings. Technical report, CMM, Ecole des Mines, France (1989)
Antonacopoulos, A., Bridson, D., Papadopoulos, C., Pletschacher, S.: A realistic dataset for performance evaluation of document layout analysis. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 296–300 (2009b)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICFHR 2012 competition on handwritten document image binarization. In Proceedings of International Conference on Frontiers in Handwriting Recognition, pp. 813–818 (2102)
Mollah, A.F., Basu, S., Nasipuri, M.: Computationally efficient implementation of convolution-based locally adaptive binarization techniques. In: Wireless Networks and Computational Intelligence, vol. 292 of CCIS pp. 159–168. Springer, Berlin (2012)
Smith, R.: An overview of the Tesseract OCR engine. In Proceedings of International Conference on Document Analysis and Recognition 2, 629–633 (2007)
Vandewalle, P., Kovacevic, J., Vetterli, M.: Reproducible research in signal processing. IEEE Signal Process. Mag. 26(3), 37–47 (2009)
Google Scholar
Lazzara, G., Levillain, R., Géraud, T., Jacquelet, Y., Marquegnies, J., Crépin-Leblond, A.: The SCRIBO module of the Olena platform: a free software framework for document image analysis. In Proc. of the Intl. Conf. on Document Analysis and Recognition (2011)
Levillain, R., Géraud, T., Najman, L.: Milena: Write generic morphological algorithms once, run on many kinds of images. In: Mathematical Morphology and Its Application to Signal and Image Processing (Proceedings of the International Symposium on Mathematical Morphology), pp. 295–306. Springer, Berlin (2009)
Levillain, R., Géraud, T., Najman, L.: Why and how to design a generic and efficient image processing framework: The case of the Milena library. In: Proceedings of the IEEE International Conference on Image Processing, pp. 1941–1944 (2010)

Download references

Acknowledgments

The authors would like to thank Yongchao Xu, Jonathan Fabrizio, and Roland Levillain for proof-reading and commenting on the paper, and Reza Farrahi Moghaddam, Thibault Lelore, and Frédéric Bouchara for their active collaboration. The authors are very grateful to Yan Gilbert who has accepted that we use and publish as data some pages from the French magazine “Le Nouvel Obs” (issue 2402, November 18–24, 2010) for our experiments.

Author information

Authors and Affiliations

EPITA Research and Development Laboratory, LRDE, 14-16, rue Voltaire, 94276, Le Kremlin-Bicêtre, France
Guillaume Lazzara & Thierry Géraud

Authors

Guillaume Lazzara
View author publications
You can also search for this author in PubMed Google Scholar
Thierry Géraud
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Thierry Géraud.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lazzara, G., Géraud, T. Efficient multiscale Sauvola’s binarization. IJDAR 17, 105–123 (2014). https://doi.org/10.1007/s10032-013-0209-0

Download citation

Received: 31 August 2012
Revised: 24 April 2013
Accepted: 11 June 2013
Published: 04 July 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10032-013-0209-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Efficient multiscale Sauvola’s binarization

Abstract

Similar content being viewed by others

ISauvola: Improved Sauvola’s Algorithm for Document Image Binarization

A new efficient binarization method: application to degraded historical document images

Adaptive binarization method for degraded document images based on surface contrast variation

Explore related subjects

1 Introduction

1.1 Overview

1.2 Sauvola’s algorithm and issues

1.3 Missing low-contrast objects

1.4 Keeping textured text as is

1.5 Handling badly various object sizes

1.6 Spatial object interference

2 Multiscale binarization

2.1 Notation

2.2 General description

2.3 Step 1: subsampling

2.4 Step 2: object selection at each scale

2.5 Step 3: result merging

2.6 Step 4: final binarization

3 Optimization

3.1 Step 1: setup of input data

3.2 Step 2: object selection at each scale

3.3 Step 3: result merging

3.4 Step 4: final binarization

4 Experimental results

4.1 Analysis of the method

4.2 Adjustment of parameter \(k\)

5 Evaluation

5.1 Datasets

5.2 Evaluation with historical documents

5.3 Evaluation with magazines

5.3.1 Pixel-based accuracy evaluation

5.3.2 OCR -based evaluation

6 Reproducible research

7 Conclusion and future work

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation