1 Introduction

The binarization of an image is an important step in many image analysis and vision applications, such as document enhancement, scene text detection, nuclei segmentation, and many more. However, the need for binarization depends on the requirements of the application and the desired final output of the system. The role of binarization is to separate the desired objects from the others present in the image. For decades, the problem has been to separate the regions of interest (foreground objects) from the background in an image, under the assumption that the objects are distinguishable from a dark (or light) background (Yarramalle and Rao 2007). Thus, the process of binarization separates such objects from the image background. Binarization is the conversion of image pixel values, either from R, G, B (0–255 [red, green, blue] taken separately) or gray scale (0–255), into binary form, that is, 0 and 1 (Singh and Singh 1988). This conversion employs a threshold value, \(T_{d}\): pixel values greater than \(T_{d}\) are set to 1 and pixel values smaller than or equal to \(T_{d}\) are set to 0, or vice versa (Otsu 1979; Mandal et al. 2012), using the following formula:

$$\begin{aligned} b(x,y)=\left\{ \begin{array}{ll} 1, &{} f(x,y)>T_{d}\\ 0, &{} f(x,y)\le T_{d} \end{array}\right. \end{aligned}$$
(1)

This is an example of a global binarization method, in which a single threshold value, \(T_{d} = constant\), binarizes the entire image; however, this produces poor binarization results that show the effects of non-uniform illumination and uneven background in an image. Global methods take into consideration the global characteristics of an image and binarize it using a single value. However, the results of these techniques are not always satisfactory and need further refinement to obtain the desired output. Interestingly, global methods are fast in computation regardless of the application (Sahoo et al. 1988; Trier and Jain 1995). This raised the need for a method that can deal precisely with image characteristics at the local level while requiring little computation time (Chaki et al. 2014). From these surveys, it is clear that global methods are insufficient to deal with the various deficiencies specific to images, because a global method separates the image pixels into only two regions, one with pixel values higher than the threshold and the other with values lower than the threshold. Thus, the need for a binarization method capable of eliminating such deficiencies grew steadily. Considering the need at that time for an efficient binarization method, Wayne Niblack proposed a local binarization method in 1986. This method computes the local mean and standard deviation of the pixel values within a window of confined size in an image.
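
As a minimal illustration of Eq. (1), the following sketch applies a single global threshold to a gray scale image held in a NumPy array. This is an illustrative sketch only; the threshold value of 128 and the synthetic test image are assumptions for demonstration, not settings recommended by any of the surveyed works.

```python
import numpy as np

def global_binarize(image: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binarize a gray scale image with a single global threshold (Eq. 1).

    Pixels strictly above the threshold become 1 (foreground); all other
    pixels become 0. The default threshold of 128 is an arbitrary example
    value, not a recommended setting.
    """
    return (image > threshold).astype(np.uint8)

# Example: a small synthetic gray scale image
img = np.array([[10,  50, 200, 220],
                [30,  90, 180, 240],
                [20,  70, 160, 250],
                [40, 110, 140, 230]], dtype=np.uint8)
print(global_binarize(img))
```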

1.1 Niblack’s binarization method (1986)

Niblack's binarization method (NBM) is the oldest local binarization method found in the present literature (Niblack 1986). Before NBM, global methods were used for the segmentation of an image; however, these are incapable of preserving the minute details of an image while separating the objects. Thus, NBM was developed to preserve minute details at a local level while binarizing an image, introducing the new concept of a local window. In this method, the estimation of a threshold value is based on the calculation of the local mean and standard deviation of pixel values in a local window confined to an image. The threshold estimation formula for NBM is defined by:

$$\begin{aligned} T_d = m(x,y) + k\times s(x,y) \end{aligned}$$
(2)

where m(x,y) and s(x,y) are the local mean and standard deviation and k is an image-dependent, manually selected parameter (NBM used \(-0.2\) for dark foreground and +0.2 for dark background). This parameter changes according to the image foreground and background conditions. Selecting a parameter manually could be considered the only drawback of NBM.
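
To make Eq. (2) concrete, the sketch below computes a per-pixel Niblack threshold from the local mean and standard deviation over a square window. The 15-by-15 window and \(k=-0.2\) follow the values mentioned in this section; the use of scipy's uniform_filter and its default reflective border handling is an implementation assumption, not part of the original method description.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_threshold(image: np.ndarray, window: int = 15, k: float = -0.2) -> np.ndarray:
    """Per-pixel Niblack threshold T = m(x,y) + k * s(x,y) (Eq. 2).

    The local mean m and standard deviation s are computed over a
    window x window neighborhood; image borders are handled by scipy's
    default reflection padding.
    """
    img = image.astype(np.float64)
    mean = uniform_filter(img, size=window)
    sq_mean = uniform_filter(img * img, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return mean + k * std

def niblack_binarize(image: np.ndarray, window: int = 15, k: float = -0.2) -> np.ndarray:
    """Foreground (1) where the pixel exceeds its local threshold, else 0."""
    return (image > niblack_threshold(image, window, k)).astype(np.uint8)
```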

In the literature, it is observed that a large value of k adds extra pixels to the foreground area, which makes the text unreadable, while a small k value reduces the foreground area, resulting in broken and incomplete characters (Chaki et al. 2014). Thus, the disadvantage of this method is that it produces a large amount of background noise, enlarging the foreground region (Sezgin and Sankur 2004). It has also been observed that the processing time increases due to per-pixel threshold estimation, and that a lot of background noise appears around the foreground objects. Nevertheless, one survey found that NBM performed better than ten other local binarization methods for gray scale images with low contrast, noise, and uneven background intensity (Trier and Jain 1995).

The value of the parameter k is the reason NBM cannot be applied directly to many imaging applications that require binarization. NBM is sensitive to k, and it is difficult to find a precise value for it. This has called for direct and indirect modifications to NBM in applications that intend to implement binarization. In addition, the default value of the parameter k is not always sufficient to obtain good results on the test images. According to one observation, the results of NBM exhibit pepper noise in the shaded regions of an image, even after a precise value of the parameter k has been chosen (Zhang and Tan 2001).

The NBM finds a threshold value for a pixel within a window whose size is chosen empirically, based on heuristics dependent on the image characteristics. Moreover, the NBM relies on neighborhood processing, that is, the threshold value is calculated from the values of the pixels in the neighborhood. Revisiting the NBM formula, it is evident that the method calculates the threshold value as the local mean plus k times the local standard deviation. More precisely, any increase or decrease in the threshold value depends on the value of the constant k: if k is greater than 0 the threshold moves toward the highest gray level, if k is less than 0 it moves toward the lowest gray level, and if k equals 0 it equals the local mean value.

Generally, it is considered that a local binarization method preserves more information relevant to the desired output than global methods do (Uchida 2013). Looking at the disadvantages of local binarization methods, the approximation of the window size is considered the main one, and this is true for the NBM as well. A small window adds noise, while a large window makes the method behave like a global one. More interestingly, when a window contains no object pixels, local methods may still classify some of its pixels as object pixels.

The NBM, since its development, has seen many changes or modifications that directly or indirectly reinterpret the original local mean and standard deviation calculation. It is the first binarization method that used the concept of a window, dividing the image into small parts and finding the threshold for the part lying within the window. Since then, the threshold calculated for a small part of an image has been known as a local threshold. Modifications are made to the NBM depending on the application, the input image, and the desired output. Such modifications can be seen in various applications, such as deteriorated document image binarization, manuscript restoration, finding text in video frames, revealing engraved wooden stamps, vehicle license plate number recognition, stained cytology nuclei detection, and product barcode reading. There could also be other applications using NBM with modifications based on the input images and the applications' requirements, for example, medical imaging (Misiak et al. 2014; LaTorre et al. 2013; Liu et al. 2016a; Kinoshita et al. 2016; Petitjean and Dacher 2011), strip steel defect detection (Liu et al. 2016b), concrete surface crack detection (Mohan and Poobal 2017), augmented reality applications (Carabias 2012), and optical music recognition (Rebelo et al. 2012).

A recent study focuses on the binarization of strip steel defect images using a genetic algorithm and mathematical morphology (Liu et al. 2016b). It addresses the identification and classification of strip steel defects in images suffering from non-uniform illumination and low contrast, and uses a top-hat transformation to reduce the effect of non-uniform illumination. In its genetic algorithm operations, this work "randomly generates ten chromosomes as initial population according to genotype coding," which then change with a certain probability. The work claims that its mathematical morphology and genetic algorithm based binarization method is more effective, efficient, and practical in terms of time consumption than global and local binarization methods such as Otsu and Bernsen, respectively. Further, it points out the drawbacks of both global and local binarization methods: global methods do not cope with non-uniform illumination and low contrast, while the dynamic determination of the threshold value in local methods leads to heavy calculation and is time consuming. Furthermore, local methods create pseudo-shadows, that is, they produce noise in the binarized image, which is unwanted in the desired output of the application.

In the most recent work, a pooling-based evaluation technique performs image binarization without using any ground truth or reference image (Liu et al. 2017). This evaluation employs twenty-seven well-known binarization algorithms, comprising both global and local methods. Further, it asserts that only the relevant images are collected in the pool while the irrelevant ones are ignored, which, interestingly, guarantees that no manual work is involved. Furthermore, this quantitative evaluation emphasizes producing a reference binary image (also called a pseudo ground truth) from the binarized output images of the binarization methods kept in the pool. The work claims that this quantitative evaluation approach to image binarization is effective and practical in terms of both the desired output and the processing time. It is an interesting approach because it does not rely on ground-truth data; instead, it creates the ground-truth image from the binarized output images kept in the pool, which makes the approach worth applying.

2 Motivation

Image binarization is one of the initial steps in image analysis, computer vision and pattern recognition applications. The goal of image binarization is to convert image pixels (whether color or gray scale) into black or white based on a predetermined threshold value selected manually or automatically. This assists in splitting an image into a set of disconnected sub-image regions with consistent and identical features.

A global binarization method such as Otsu (Otsu 1979) finds a single threshold value for the entire image. In such processing, there is a tendency to lose desired object pixels, because some of them may fall in the background region. Several surveys (Stathis et al. 2008; Alginahi 2010; Athimethphat 2011; Som et al. 2011) show that global binarization methods are less feasible than local binarization methods. The comparison of global with local methods and their efficiency studies motivated this review to consider NBM as the basis of the local binarization methods study; several works (Trier and Jain 1995; Trier and Taxt 1995; Alginahi 2010; Som et al. 2011; Ntirogiannis et al. 2014) found NBM to be more efficient than global methods as well as some of the other local binarization methods.

Many local binarization methods have been proposed in the past decades, among which NBM is one of the most successful and has undergone heavy modification. The NBM laid the foundation for local binarization methods by introducing processing at the pixel level, using the values of neighboring pixels within a window over a region of an image. Therefore, this paper is further motivated to consider those works that modified NBM according to their input images and specific desired outputs.

The improvements over NBM are warranted by the drawbacks in its processing of test images and in the final output it produces.

Firstly, there is the processing time taken by the NBM: applications need a fast binarization method that takes minimal processing time on the test images, which the NBM does not provide. More specifically, the NBM uses a small 15-by-15 window over a region of an image to calculate the threshold value, irrespective of the size of the input image. This increases the processing time of the NBM when binarizing an input image, which is unacceptable; almost all of the improved binarization methods reported in the literature address this, with the exception of Sauvola's binarization method.

Secondly, the improved or modified binarization methods demand noise-free binarized images as the final output of the applications using them, which the original NBM also fails to provide. More precisely, the NBM produces a large amount of noise in the output binarized images, in the form of many small dots or black pixels (also referred to as salt-and-pepper noise) that interfere with the desired object pixels. The occurrence of such undesired pixels in the output binarized images is unacceptable to almost all of the modified NBM-based binarization methods used in real-time applications.

These are the two major drawbacks of the original NBM, which laid the foundation for the several improvements or modifications to the NBM based on the test images available for different real-time applications. Some methods compromise on the quality of the binarized output images in favor of processing time, while others compromise on processing time in favor of output quality. Both conditions are unfavorable for the original NBM, and improvements to the NBM are needed to overcome both drawbacks.

The NBM is a binarization method that separates the desired objects (text) from the background locally. It processes at the pixel level using the gray scale values of the pixels: NBM finds the local threshold value of a portion (region) of an image using the values of the neighboring pixels. More precisely, NBM finds the mean and standard deviation of the respective region using only the pixel values confined to that region. Such processing avoids having to process all (or most) of the image pixels to calculate a single threshold value for the whole image.

The NBM uses a 15-by-15 pixel window to threshold an image region, finding a separate threshold value at each pixel. This threshold calculation uses the mean and standard deviation values within a window over an image region, and the pixel at the center of the window is set to black or white depending on whether its value is lower or higher than the threshold obtained for that region. Some methods are direct modifications of the original NBM applied to their own datasets. Thus, the objective of this review is to present local binarization methods that modify the original NBM for real-time applications. Later sections of the paper discuss the binarization methods that introduced constants as well as interdependent variables as modifications to the original NBM.

3 Deteriorated documents image binarization

Observing the implementations of NBM on document images suffering from uneven background, low illumination, and other deficiencies, a modified local binarization method was in much demand. One of the first modifications to NBM capable of dealing with such deficiencies followed Niblack's original local method (Sauvola and Pietikainen 2000). This method estimates the threshold value by calculating the mean and standard deviation of the pixels within a window. Low contrast in the pixel neighborhood favors this method, since it eliminates dark regions from the background by lowering the threshold value, whereas in regions of high contrast the standard deviation approaches R, which in turn brings the threshold close to the local mean. This confirms the dependence of the modified method on the local neighborhood pixels for estimating the threshold value (Sezgin and Sankur 2004; Badekas et al. 2006). The threshold value in a local window is also regulated by the parameter k; however, the method is less sensitive to k than the original NBM. The method computes the threshold for each pixel, which increases the computational complexity: for example, an image of size \(n\times n\) and window size \(w\times w\) would require O(\(n^2w^2\)) computation.
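
As a minimal sketch of the Sauvola-style threshold discussed above, the code below uses the commonly cited form \(T = m\,(1 + k\,(s/R - 1))\) with the frequently quoted defaults \(k = 0.5\) and \(R = 128\) for 8-bit images; these defaults, and the reuse of the local-statistics helper from the Niblack sketch, are assumptions for illustration rather than a definitive implementation of the published method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_threshold(image: np.ndarray, window: int = 15,
                      k: float = 0.5, R: float = 128.0) -> np.ndarray:
    """Sauvola-style threshold T = m * (1 + k * (s / R - 1)).

    m and s are the local mean and standard deviation in a square window;
    R is the dynamic range of the standard deviation (128 for 8-bit images).
    The defaults are commonly quoted values and usually need tuning.
    """
    img = image.astype(np.float64)
    mean = uniform_filter(img, size=window)
    sq_mean = uniform_filter(img * img, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return mean * (1.0 + k * (std / R - 1.0))
```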

Another modification to NBM came in 2001 (Zhang and Tan 2001), attempting to overcome two deficiencies of the original NBM: the sensitivity of the parameter k and the occurrence of pepper noise (black pixels) in the image background regions. This method proves effective for the binarization of gray scale scanned images of thick bound books; in experiments, the shaded regions near the spines of the book are effectively recovered, which was difficult with the original NBM. A method modified for the binarization of low quality document images introduced three components and three coefficients (\(\alpha _1\), \(\alpha _2\) and \(\alpha _3\)) for estimating the threshold value based on local gray-value contrast (Feng and Tan 2004). To increase the binarization performance, this method also introduces, empirically, an additional exponential parameter (\(\gamma \)). In order to reduce the computational load, the method calculates the threshold by bilinearly interpolating the local threshold values calculated in primary local windows to all the image pixels. This method obtained a character recognition rate of 90.8%, with suppression of noise in non-text regions and complete binarized text characters; however, there is no evidence of binarization of handwritten document images.

A modified NBM uses both global and local image characteristics to estimate the binarization threshold for an image (Rais et al. 2004). It calculates the normalized difference of the global and local means, relating the illumination of pixels in each local window to the global illumination. For images with different contrast stretch values the method fails, since it estimates different threshold values in such cases; to overcome this issue, the method introduces the calculation of the standard deviation, thus improving the binarization performance. The standard deviation value remains consistent across images with different characteristics, which is in accordance with NBM. A modified binarization method based on the concept of integral images (Viola and Jones 2004; Bradley and Roth 2007) employs both the local mean and the local variance to binarize gray scale document images without any restriction on window size (Shafait et al. 2008): "If in an image, pixel intensity at a certain position tends to equate to the sum of all the pixels intensities above and to the left of that position in original image is known as an integral image." In fact, this method is not a modification of NBM; more precisely, it only employs the local mean and local variance within a window for document binarization. However, the method claims to binarize gray scale documents effectively in a single pass, computing the local mean with a constant number of addition and subtraction operations per pixel, independent of the window size. This reduces the computational complexity from \(O(n^2w^2)\) to \(O(n^2)\).
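
The integral-image idea quoted above can be sketched as follows: once a cumulative sum table is built in one pass, the sum (and hence the mean) of any rectangular window is obtained from four table lookups, so the per-pixel cost no longer depends on the window size. This is a simplified illustration of the principle, not the algorithm of Shafait et al. (2008).

```python
import numpy as np

def integral_image(image: np.ndarray) -> np.ndarray:
    """Integral image with a leading row/column of zeros: S[y+1, x+1] is the
    sum of all pixels above and to the left of (y, x), inclusive."""
    S = np.zeros((image.shape[0] + 1, image.shape[1] + 1), dtype=np.float64)
    S[1:, 1:] = np.cumsum(np.cumsum(image.astype(np.float64), axis=0), axis=1)
    return S

def window_sum(S: np.ndarray, y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum of the window with inclusive corners (y0, x0) and (y1, x1),
    obtained from four lookups in the integral image."""
    return S[y1 + 1, x1 + 1] - S[y0, x1 + 1] - S[y1 + 1, x0] + S[y0, x0]

def local_mean(S: np.ndarray, y0: int, x0: int, y1: int, x1: int) -> float:
    """Local mean of the same window; its cost is independent of the window size."""
    area = (y1 - y0 + 1) * (x1 - x0 + 1)
    return window_sum(S, y0, x0, y1, x1) / area
```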

One attempt redefined Sauvola's binarization method and observed the effects of the parameter k for different types of degraded camera-captured documents, drawing inspiration from NBM (Bukhari et al. 2009). The method employed different k values for document binarization, 0.05 and 0.2, based on the presence and absence of ridge(s) in the local neighborhood window, respectively. It then applies a median filter to remove the salt-and-pepper noise in the binarized images, obtaining good results for foreground text lines and drawings. In order to produce good results, the parameter k was also modified (Liu and Ding 2009) according to the input data used for binarization in the modified Niblack algorithm (Zhang and Tan 2001). The results of this work (Liu and Ding 2009) showed the unsuitability of the default parameter k, that is, an increase in noise in the shadow areas of the images. A modified value of k is adopted in this work, with 0.05 found to be a suitable value for binarization, thus preserving the upper and lower boundary lines of the text in the document images.

A parameter-free algorithm for the binarization of gray scale document images estimates the threshold value based on variable-size windows sliding over the document image (Boiangiu et al. 2011). In this method, the size of the window varies in accordance with the standard deviation of the gray level pixels in the local window. In this way, it preserves the local properties and reduces the noise, while producing quality output and fast conversion. In another method, an integral sum image is used to reduce the running time of computing the local mean and mean deviation, and the method does not require calculating the local standard deviation at all (Singh et al. 2011). The parameter k in this method controls the estimation of the threshold value, such that a low k value yields a high threshold and vice versa. This method obtained good binarization results for gray scale document images without any constraint on window size, similar to global techniques, with a computational complexity of only \(O(n^2)\).

Table 1 State-of-the-art modified and improved binarization methods based on the original NBM

4 Degraded manuscripts restoration

An adaptive NBM that automatically finds k values and window sizes for different document images, comprising five steps (detailed in Table 1), effectively binarizes historical archive documents (He et al. 2005). This work tested various combinations of k values and window sizes using the original NBM on archival document images; heuristically, the results were unsatisfactory for archival purposes. The proposed adaptive NBM performed better than the original NBM and the other binarization methods chosen for comparison. An automatic method to recognize characters from ancient manuscripts recorded on palm leaves was developed under the 'Preservation of Palm Leaf Manuscripts' project at Mahasarakham University in Thailand (Chamchong and Fung 2009). This method calculates the threshold for each pixel by computing the local mean and standard deviation of all local threshold values, thus reducing the noise and sharpening and recovering the characters. This work raised the issue of selecting the best binarization technique and claimed that no single method is applicable to all types of document images.

An enhanced system for degraded old documents estimates the threshold value using the mean and standard deviation, depending on image brightness (Halabi et al. 2009). For dark images (low values in the histogram), the computed mean and standard deviation tend to be low, which decreases the overall threshold estimate. The system uses this binarization method to choose the values of two parameters, k1 and k2, ranging between [1–2], depending on the document image brightness. This work involved various threshold selection methods depending on the document image characteristics, for optical character recognition (OCR) and digital preservation. A work concentrating on manuscript restoration through denoising and binarization of historical document images found local binarization more effective than global methods (Ventzas et al. 2012). It mainly focused on text-to-background enhancement, considering restoration techniques and binarization methods together with image processing applied to document images. A computational complexity of \(O(n^3)\) was calculated for the algorithm used for digitization and restoration; however, parallel computing machines overcame this complexity. This work found text orientation, skew, skew detection time, and skew reconstruction time to be the critical parameters of handwritten documents. However, the selection or combination of appropriate algorithms remains an open issue.

Another work, on the Jawi historical manuscripts at the Royal Museum of Pahang, Malaysia, applied local binarization methods to document images suffering from various deteriorations (Som et al. 2011). The deteriorations observed in the historical documents included faded paper, ink spread, uneven color tone, torn paper, and other disrupting elements, namely small spots. NBM was the only method found efficient in producing a clean copy of the manuscripts with clear text, thus improving the readability and suitability for character recognition. As future enhancements, the addition of edge detection and image refinement techniques was suggested for post-processing and for formulating a proper method for this class of documents. A similar digitizing and recognition project on Early Christian Greek manuscripts, the D-Scribe Project, a Greek GSRT-funded R&D project at the Mount Sinai monastery, aimed at the preservation of historical documents using image processing techniques (Perantonis et al. 2004). This project used binarization only in the context of document image enhancement; in addition, employing NBM aided in providing a rough estimation of the image foreground regions. The detection and recognition of old Greek handwritten characters and the processing of other types of historical manuscripts were left for future enhancements.

A local binarization algorithm for binarizing low quality ancient documents, based on the concept of a sliding window, is directly derived from the original NBM (Khurshid et al. 2009). This method claims to estimate suitable threshold values for most types of document image degradation. The parameter k in this method also depends on the requirements of the application and varies accordingly to produce the desired outputs. For images with low intensity variations and whiter regions, the method is found to be efficient and better than the original NBM. The latest modification to NBM binarizes stain-affected palm leaf and other types of manuscript images to improve readability (Saxena 2014). In this method, the parameter k has minimal effect on threshold estimation; instead, it introduces a variable constant of proportionality depending on the type of the image. This work observed the effect of window size on processing time: the smaller the window, the longer the processing time, and vice versa. The method satisfactorily preserved stroke width and maintained the shape and connectivity of the characters, thus outperforming the original NBM. The processing time for an image with a given window size was close to \(O\left( \displaystyle \frac{n(n+l)}{w}\right) \).

5 Texts in scenes and videos

A hybrid text binarization method based on Otsu (1979) and NBM extracts textual content from images suffering from shadows, non-uniform illumination, specular reflections, and low contrast (Deepa and Victor 2012). This method utilizes the background separation and edge detection characteristics of the Otsu and NBM methods, respectively, and heuristically assigns separate weights to the threshold values obtained by these methods. The threshold estimated by the hybrid method is an average of the thresholds estimated by these standard methods. Further, the method uses the dual tree complex wavelet transform, morphological dilation, and logical operations to separate the text and non-text regions in the images. A stroke filter based binarization method considers the intrinsic characteristics of text in segmenting textual content in video images (Liu et al. 2006). This method performs three sub-procedures for text segmentation, namely stroke filtering, text polarity determination, and local region growing and analysis. The stroke filtering finds orientation, scale, and appearance (dark and bright) as responses to image pixels. The text polarity determination estimates two features, "the ratio of the sums of bright and dark stroke filter response magnitude values" and "the ratio of the numbers of edge points in binarized bright and dark stroke filter response maps." For local region growing, the method combines a global probability density function and local similarities based on stroke filter responses.

A method developed for the binarization of still images from videos finds text using an additional gain parameter tuned for the local contrast window (Wolf and Jolion 2003). In this method, contrast is maximized to estimate the threshold value for a still image from a video. This work uses the gray levels of a still image drawn from the video signal for text extraction and recognition, although the video signal also carries geometrical, morphological, and temporal properties. The method incrementally calculates the mean and standard deviation values from the preceding windows, fully traversing only the first pixel of the first window. For future research, this work would consider scene text, text with general orientations, and moving text for extraction and recognition. A two-threshold based hybrid method detects strokes and stroke edges using an improved version of the NBM (Li et al. 2009). It consolidates stroke edges using NBM in a gray scale edge map, and also employs global information to overcome the fallacies of the original NBM. This hybrid binarization method separates embedded text from images, mainly from the image background. It uses heuristics in integrating strokes and stroke edges, thus improving text segmentation from color backgrounds.

An automatic, parameter-free binarization method extracts text areas from complex backgrounds in low resolution images from video frames and web pages (Saidane and Garcia 2007). This method uses a convolutional neural network to learn the binarization peculiarities, independent of parameter tuning, from a training set of synthesized images. Such a supervised learning approach learns to distinguish the color distributions and geometrical properties of text, mainly in low resolution images. Comparisons with other methods support the authors' claim of a parameter-free binarization method efficient enough to separate scene text from complex image backgrounds. Another binarization method, based on a multilayer perceptron (MLP), classifies handwritten text regions in video frames and employs the k-means algorithm to detect textual components (Banerjee et al. 2014). This method uses scale invariant feature transform descriptors over the entire frame to localize the text in a video frame. These features, in the form of an input vector, are passed through the MLP network that separates the text and non-text regions in the particular frame. The approach further uses k-means clustering based on two simple rules to distinguish the detected components into text, and non-text components as noise.

6 Seals and stamps binarization

The analysis and conservation work on ancient wooden stamps uses a modified NBM to simulate the printing process with black ink on white pages (Seulin et al. 2006). This work treated printing zones as black (pixel value = 0) and non-printing zones as white (pixel value = 1), corresponding to the high and low elevation zones of a stamp, respectively. The method calculates both the local mean (in a window) and the global mean (for the entire image); in binarization, the maximum of the local and global means is taken as the threshold value for assigning the pixel as black or white (value = 0 or 1). This work further provides insights into modifying the parameter k to simulate the inking and printing process under various conditions (ink quantity, paper quality or humidity, ink fluidity, exerted pressure ...). A parameter-free binarization method based on NBM is used to binarize low contrast gray scale images of technical documents (Valverde and Grigat 2000). This method employed NBM along with morphological image processing (erosion followed by dilation) and gradient based decisions to remove "ghost" objects; the whole algorithm considered 4-connectivity while processing the images. The authors claimed to restore faint stamp lines, suppress variable background contrast, and reconstruct fine details in low contrast image regions. The algorithm takes average computation time on low contrast images, and the authors proposed to improve this in future enhancements of the method.

An application such as postal envelope analysis uses a combination of global and local binarization techniques in a multi-stage thresholding approach (Wu and Amin 2003). The approach uses a global technique to estimate a preliminary threshold value for the entire image, which removes the background and enables connected component analysis to find objects of interest. The local technique then estimates a separate threshold value for each sub-image region in the main image; each region is binarized separately and the final binarized image is the composition of these sub-images. The results support the method's claim of effectively binarizing sparse and dense textual regions, mixed fonts with different sizes and orientations, and text on different shading or watermarks within one image. A patent application based on a local binarization method includes a modified NBM, which is used to correct the contours of the alphanumeric characters on stamps and envelope covers (Yoder 2009). This method identifies and replaces the defective character strings, and then decodes the corrected character strings in the reconstructed character set. The binarization stage of this method uses the modified NBM to binarize the stamp image and generates a black and white image with labelled character components. Further, the method uses a contour coder that corrects the defective contours with a corrected character set. This enables the effective recognition of the characters by an optical character recognition application.

7 Cytological and histological imaging

The binarization of images to find foreground objects is not limited to documents or video scenes; interestingly, it also extends to the analysis of cell nuclei. A modified binarization method deals with the problem of segmenting nuclei in cytology images (Phansalkar et al. 2011). This method claims to analyze stain-independent, low contrast cytology images, in general dealing with the problem of non-uniform staining. Furthermore, it introduces three additional fixed parameters that are specific to cytology images and would need to be re-tuned for document images. Unlike other modified NBMs, this method uses a circular local window rather than a rectangular one, whose radius is also fixed.
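
A sketch of the threshold form commonly attributed to Phansalkar et al. is given below, \(T = m\,(1 + p\,e^{-q\,m} + k\,(s/R - 1))\) on intensities normalized to [0, 1]; the defaults p = 2, q = 10, k = 0.25, and R = 0.5 are the values usually quoted in the literature and are assumptions here, as is the square window, which simplifies the circular window used by the original method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def phansalkar_threshold(image: np.ndarray, window: int = 15, k: float = 0.25,
                         R: float = 0.5, p: float = 2.0, q: float = 10.0) -> np.ndarray:
    """Phansalkar-style threshold for low-contrast (e.g. cytology) images:
        T = m * (1 + p * exp(-q * m) + k * (s / R - 1))
    where m and s are the local mean and standard deviation of intensities
    normalized to [0, 1]. Defaults are commonly quoted literature values and
    should be tuned per application; a square window is used here for brevity.
    """
    img = image.astype(np.float64) / 255.0   # normalize 8-bit input to [0, 1]
    mean = uniform_filter(img, size=window)
    sq_mean = uniform_filter(img * img, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return mean * (1.0 + p * np.exp(-q * mean) + k * (std / R - 1.0))
```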

A patent application based on the NBM is applied to the segmentation of cell nuclei in histological sections (Nielsen et al. 2011). This method binarizes a gray scale nuclei image using NBM, identifies objects of a predetermined size as nuclei and ignores the rest; an edge detector extracts these nuclei, and overlapping objects are split into separate nuclei. Specifically, nuclei matching a predetermined average gradient magnitude are kept, whereas the rest are removed and the resulting holes are filled. In post-processing, the method determines the active contours of the objects (here, the nuclei) in an image by detecting object edges and segmenting along them; overlapping objects are segmented into individual nuclei. A nuclear image analysis method based on NBM segments cell nuclei in digital images of Feulgen-stained histological sections of prostate cancer (Nielsen et al. 2012). This method uses an active contour model to detect each nucleus, applying NBM to the histological images and detecting nuclei with an object perimeter gradient verification step. The authors claim to have developed an automatic nuclei segmentation method better than manual segmentation, based on the experiments conducted during the study. The mean segmentation sensitivity/specificity was 95%/96% against the ground truth provided by the manual and automatic segmentation methods. The authors suggest the "possibility for large-scale nuclear analysis based on automatic segmentation of nuclei in Feulgen stained histological sections."

Recently, the analysis of histopathological images using pattern recognition and image processing techniques has become a popular research field. Applications such as "counting of red and white blood cells using microscopic images of blood smear samples and breast cancer malignancy grading from slides of fine needle aspiration biopsies" are possible (Krzyzak et al. 2011). This work used NBM for the segmentation of cells in a histopathological image, and found it the "most reliable method to maintain disjoint components crucial in avoiding over or under segmentation." This research has dealt with difficulties that include the large variation of blood and cancer cells, occlusions, segmentation, low image quality, and difficulty in obtaining real data. In a work validating adaptive threshold methods, binarization techniques were applied to digital images of stained follicular lymphoma, and it was concluded that the modified NBM (Sauvola and Pietikainen 2000) is better than other methods for the study of microscopic image synthesis (Korzynska et al. 2013).

8 Productivity and security imaging

For security purposes, such as car license plate detection and recognition, it is an emerging trend to use a variant of NBM (Hennecke et al. 2012). This method introduces the concept of mean absolute differences, replacing the standard deviation that is at the core of the original NBM. The work considers this an improvement over NBM in terms of performance, runtime, and memory consumption. Further, the method does not require tuning of the parameter k; instead, it introduces a constant positive value of 0.5. The authors claim that this method is applicable to any application that uses the standard deviation for image binarization. An automatic license plate detection method based on a color gradient map used NBM to retrieve the candidate license plate regions (Huang 2014). In addition, this method uses template matching to remove the background noise, thus making the license plate characters clear. The author claimed to detect license plates under uneven illumination and complex background conditions; handling severe illumination changes is proposed for future research.

An automatic license plate recognition system based on an inverted NBM is designed to recognize the license plates of Botswana (Aghdasi and Ndungo 2004). In addition, this method uses color filtering, blob selection, segmentation, template matching, neural networks, the Euler number, and positional checks. The authors claimed to detect and identify license plates without constant human intervention in an economical way. A license plate detection algorithm based on NBM is used to detect automobile license plates without any a priori information about the object (Trapeznikov et al. 2014). Additionally, the algorithm computes histograms of oriented gradients to compare images for invariance to scale and illumination. The method also analyzes "the dependence of correct detection probability from the different values of the internal parameters using receiver operation characteristic-curves method."

An image processing system based on NBM is used to locate, segment, and decode the most common 2D symbols, the 2D barcodes (Ottaviani et al. 1999). This system treats different symbols, such as Maxicode, Datamatrix, QR-code, and PDF417, according to their similarities in developing a unified computational framework. It also takes into consideration difficulties such as the variable resolution of the symbols inside the image and shape-based distortions due to misalignment of the imaging system during image acquisition. The modified NBM effectively binarizes the code images; however, there are cases of failure due to varying lighting conditions and poor code printing quality.

An eyes-free barcode scanning algorithm based on NBM was installed on a Google Nexus One smartphone with Android 2.2 to decode Universal Product Code (UPC) barcodes (Kutiyanawala et al. 2011). This algorithm comprises three modules, namely an interactive camera alignment loop, barcode localization, and barcode decoding. The experiments show that using barcode localization improved the decoding rates, and visually impaired participants successfully scanned most of the UPC barcodes on various grocery products. Another modification to NBM concerns barcode detection in smartphone images using support vector machines (Kulyukin et al. 2012). This method is free of the k parameter, using \(k=0\) for threshold calculation. The developed application uses an eyes-free barcode detection algorithm to detect the presence of a barcode in image regions for blind and visually impaired smartphone users. For experimentation, a Google Nexus One smartphone with Android 2.3.3 was used on 124 product images by three blindfolded sighted individuals who used the smartphone to detect UPC barcodes on 10 grocery products.

Table 1 shows the modifications to NBM, the developed model or algorithm, the input dataset to the model, and remarks based on the implementation and observed results.

9 Window size selection and threshold estimation

The estimation of a threshold value specifically for a pixel of an image within a window is known as local binarization. The local binarization method NBM is the first to calculate the local mean and standard deviation values in a window confined to an image. The local mean and standard deviation of a window vary or remain constant, according to the pixel values or gray levels in the image, with respect to the other windows in that image. It is interesting to note that an estimated threshold value is one of the values from the gray levels of an image, that is, [0–255]. The window slides over the pixels of an image in a top-to-bottom and left-to-right manner. For the last, corner, or border pixels, the image is padded using a symmetric (or replicative or circular) extension to calculate the local mean and local standard deviation values. This is because the window cannot find pixels for masking beyond the image region, which makes it difficult to calculate means and standard deviations at the image border; when the center of the window coincides with a border pixel, the mean and standard deviation must still be calculated for that pixel. Hence, the replication (or other padding scheme) creates imaginary pixel values that allow the window to keep sliding and the threshold to be estimated.
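
The border handling described above can be sketched with NumPy's padding modes: the image is extended by half the window size using symmetric (or replicate or circular) values so that a full window exists around every border pixel. This is one possible way to implement the padding, assumed here for illustration, and an odd window size is assumed so that the window has a well-defined center pixel.

```python
import numpy as np

def pad_for_window(image: np.ndarray, window: int, mode: str = "symmetric") -> np.ndarray:
    """Pad an image so that a full window x window neighborhood exists around
    every pixel, including the border pixels. `mode` may be 'symmetric',
    'edge' (replicative), or 'wrap' (circular), mirroring the options above."""
    half = window // 2
    return np.pad(image, pad_width=half, mode=mode)

def local_window(padded: np.ndarray, y: int, x: int, window: int) -> np.ndarray:
    """Extract the window centered at original-image coordinates (y, x).

    Because the image was padded by window // 2 on each side, the window for
    the original pixel (y, x) starts at row y and column x of the padded image
    (assuming an odd window size).
    """
    return padded[y:y + window, x:x + window]
```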

The selection of a window size is an important aspect of any local binarization process. The size of the window determines the size of the objects of interest that can be captured in the image foreground; thus, it is the window size that gathers the intended details of the image objects during the binarization process. In most cases, mainly empirically, the size of the window is predetermined based on the objects in the foreground region of an image, since the window size is selected to cover as many foreground objects as possible. Further, the computation time of threshold estimation is approximately proportional to the square of the window size: the larger the window, the more computation time is spent on threshold estimation. Thus, choosing a window size based on the objects in an image is an important task for obtaining a binarization method with satisfactory results.

10 Experimental results

In the previous sections, this review explained the two major drawbacks of the NBM, which are the reasons why the original NBM is not adopted directly in real-time applications. First, the NBM depends on a user-defined value of k, and choosing a single value of k for all the test images is difficult. Second, even with a well-chosen value of k, the binarization result exhibits salt-and-pepper noise in the shadow (or non-text) regions. These are the reasons why modifications or improvements were made to the original NBM based on the requirements of real-time applications.

Fig. 1

Original and ground truth images from HDIBCO 2016 dataset. a Original image. b Ground truth image of (a). c Original image. d Ground truth image of (c). e Original image. f Ground truth image of (e)

10.1 Dataset

This section presents the experimental results of the original NBM and the other modified or improved NBMs on the HDIBCO 2016 dataset. Figure 1 shows the test and ground truth (also called reference) images from the HDIBCO 2016 dataset. The HDIBCO 2016 dataset is freely available and can be downloaded for educational or experimental purposes. This review comprises nineteen modified or improved local binarization methods based on the NBM, plus the original NBM.

10.2 Evaluation based on processing time

Table 3 shows the processing time of all the binarization methods on the HDIBCO 2016 dataset. The method of Chamchong and Fung (2009) performed well on the HDIBCO 2016 dataset both in terms of binarization results and processing time. It took less than 0.03 s to process the 10 images, which is the least processing time among the modified or improved NBMs, including the original NBM. This method calculates the mean and standard deviation of all local thresholds based on the window size selected for an image. Table 2 shows the computational complexity of Chamchong and Fung (2009) as \(O(w\times w)\), i.e., the method depends on the window size. It is followed by Kulyukin et al. (2012), which took less than 0.08 s to process the HDIBCO 2016 dataset. In this method, an image is divided into \(s_i\times s_i\) subimages, and a threshold is calculated for each subimage. The computational complexity of this method is \(O(s_i^2)\), see Table 2.

The majority of the binarization methods, specifically the twelve methods of Liu and Ding (2009), Shafait et al. (2008), Seulin et al. (2006), Zhang and Tan (2001), Sauvola and Pietikainen (2000), Wolf and Jolion (2003), Khurshid et al. (2009), Singh et al. (2011), Feng and Tan (2004), Hennecke et al. (2012), Phansalkar et al. (2011), and He et al. (2005), took between 1 and 2 s. In this category, the computational complexity of the methods of Feng and Tan (2004), He et al. (2005), Hennecke et al. (2012), Shafait et al. (2008), Singh et al. (2011), and the rest is \(O(W^2w^2)\), \(O(n^2+w^2)\), \(O(n^2)\), \(O(n^2)\), \(O(n^2)\), and \(O(n^2w^2)\) respectively, see Table 2. The methods of Halabi et al. (2009), Rais et al. (2004), and Boiangiu et al. (2011) took between 2 and 2.5 s; this review reports their computational complexity as \(O(n^2)\), \(O(n^2+n^2w^2)\), and \(O(w^2)\) respectively, see Table 2. The methods of Saxena (2014) and Bukhari et al. (2009) took slightly more than 4 s and less than 7 s to process the HDIBCO 2016 dataset, respectively; their computational complexity is \(O\left( \displaystyle \frac{n(n+l)}{w}\right) \) and \(O(n^2w^2)\) respectively, see Table 2. Finally, the NBM took more than 350 s to process the HDIBCO 2016 dataset, which is the highest processing time among all the binarization methods. The computational complexity of the NBM is \(O(n^2w^2)\).

Table 2 Computational complexity of the original NBM and modified binarization methods. (where \(n\times n\) is the size of the image and \(w\times w\) is the size of the window.)
Table 3 Calculated values of the average threshold and total processing time for the original NBM and the modified binarization methods

10.3 Evaluation based on binarization results

Figures 2, 3, and 4 show the binarization results of the modified NBMs and the original NBM. It is clear from Fig. 2 that the methods of Boiangiu et al. (2011), Chamchong and Fung (2009), Khurshid et al. (2009), Sauvola and Pietikainen (2000), Saxena (2014), and Wolf and Jolion (2003) present similar results, that is, the background effect in the original image is removed successfully in the binarized images. However, it is also evident from the binarized images that the text in the top-left of the original image is not recovered by these methods. Similarly, the methods of Halabi et al. (2009), Kulyukin et al. (2012), Phansalkar et al. (2011), and Shafait et al. (2008) successfully removed the background effect, but with a little noise in the binarized images; the text in the top-left of their binarized results shows broken or incomplete characters.

The methods of Bukhari et al. (2009), Liu and Ding (2009), Singh et al. (2011), and Zhang and Tan (2001) recovered the top-left text, but with some background noise in the binarization result. If this noise is ignored, these are the best methods for binarizing such types of images; as an additional improvement or modification, post-processing of the binarized images obtained by these methods could achieve results close to the ground truth image. In contrast, the methods of Feng and Tan (2004), He et al. (2005), Hennecke et al. (2012), Rais et al. (2004), and Seulin et al. (2006) show considerably more background noise in the binarized results, and the NBM resulted in heavy background noise in the shadow (or non-text) region. This is the main cause of the improvements or modifications to the original NBM.

Fig. 2

Binarization result of the original NBM and modified NBM’s. a Boiangiu et al. b Bukhari et al. c Chamchong and Fung. d Feng and Tan. e Halabi et al. f He et al. g Hennecke et al. h Khurshid et al. i Kulyukin et al. j Liu and Ding. k NBM. l Phansalkar et al. m Rais et al. n Sauvola and Pietikainen. o Saxena’s method. p Seulin et al. q Shafait et al. r Singh et al. s Wolf and Jolion. t Zhang and Tan

Fig. 3

Binarization result of the original NBM and modified NBM’s. a Boiangiu et al. b Bukhari et al. c Chamchong and Fung. d Feng and Tan. e Halabi et al. f He et al. g Hennecke et al. h Khurshid et al. i Kulyukin et al. j Liu and Ding. k NBM. l Phansalkar et al. m Rais et al. n Sauvola and Pietikainen. o Saxena’s method. p Seulin et al. q Shafait et al. r Singh et al. s Wolf and Jolion. t Zhang and Tan

Fig. 4

Binarization result of the original NBM and modified NBM’s. a Boiangiu et al. b Bukhari et al. c Chamchong and Fung. d Feng and Tan. e Halabi et al. f He et al. g Hennecke et al. h Khurshid et al. i Kulyukin et al. j Liu and Ding. k NBM. l Phansalkar et al. m Rais et al. n Sauvola and Pietikainen. o Saxena’s method. p Seulin et al. q Shafait et al. r Singh et al. s Wolf and Jolion. t Zhang and Tan

Figures 3 and 4 show the effectiveness of the methods of Boiangiu et al. (2011), Chamchong and Fung (2009), Sauvola and Pietikainen (2000), Saxena (2014), and Wolf and Jolion (2003) in separating the foreground (text) region from the background. These methods present near-perfect binarized results with neither background nor bleed-through effect, the same as the ground truth image, although they show broken or incomplete characters in Fig. 4. The binarized results of the methods of Khurshid et al. (2009), Kulyukin et al. (2012), Phansalkar et al. (2011), and Shafait et al. (2008) are slightly affected by the bleed-through effect; if this effect is ignored or the binarization results are pruned in a post-processing step, these methods produce results similar to the ground truth image. These methods performed well for the test image in Fig. 4, that is, with neither background nor bleed-through effect.

Feng and Tan (2004), Halabi et al. (2009), Seulin et al. (2006), and Zhang and Tan (2001) produced binarization results with a bleed-through effect, although there is no background effect in the results. The methods of Bukhari et al. (2009), He et al. (2005), Hennecke et al. (2012), Liu and Ding (2009), Rais et al. (2004), and Singh et al. (2011) show a stronger bleed-through effect in the binarization results, and the NBM produced results strongly affected by both bleed-through and background effects.

10.4 Performance measures

This section describes the performance measures, namely Precision, Recall, F-Measure, Accuracy, and Jaccard Index, used to show the effectiveness of the original NBM and the modified and improved NBMs. Details of each of the performance measures are as follows:

Precision: Precision (P) is defined as the number of true positives (\(T_p\)) over the number of true positives \((T_p)\) plus the number of false positives (\(F_p\)).

$$\begin{aligned} P=\frac{T_p}{T_p+F_p} \end{aligned}$$
(3)

Recall: Recall (R) is defined as the number of true positives (\(T_p\)) over the number of true positives (\(T_p\)) plus the number of false negatives (\(F_n\)).

$$\begin{aligned} R=\frac{T_p}{T_p+F_n} \end{aligned}$$
(4)

F-Measure: F-Measure is defined as the harmonic mean of Precision (P) and Recall (R).

$$\begin{aligned} F\text{-}Measure=\frac{2\times P\times R}{P+R} \end{aligned}$$
(5)

Accuracy: Accuracy is defined as the number of true positives (\(T_p\)) plus the number of true negatives (\(T_n\)) over the number of true positives (\(T_p\)) plus the number of true negatives (\(T_n\)) plus the number of false positives (\(F_p\)) plus the number of false negatives (\(F_n\)).

$$\begin{aligned} Accuracy=\frac{T_p+T_n}{T_p+T_n+F_p+F_n} \end{aligned}$$
(6)

Jaccard Index: Jaccard Index is defined as the number of true positives (\(T_p\)) over the number of true positives (\(T_p\)) plus the number of false positives (\(F_p\)) plus the number of false negatives (\(F_n\)).

$$\begin{aligned} Jaccard~Index=\frac{T_p}{T_p+F_p+F_n} \end{aligned}$$
(7)
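
As a small illustration of how the measures above are computed from a binarized output and its ground truth, the sketch below counts true and false positives and negatives on binary arrays; the convention that foreground pixels are marked 1 and treated as the positive class is an assumption made for demonstration.

```python
import numpy as np

def binarization_scores(result: np.ndarray, truth: np.ndarray) -> dict:
    """Precision, Recall, F-Measure, Accuracy and Jaccard Index (Eqs. 3-7)
    for binary arrays in which 1 = foreground (positive class), 0 = background."""
    result = result.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(result & truth))
    fp = int(np.sum(result & ~truth))
    fn = int(np.sum(~result & truth))
    tn = int(np.sum(~result & ~truth))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"Precision": precision, "Recall": recall, "F-Measure": f_measure,
            "Accuracy": accuracy, "Jaccard Index": jaccard}
```
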
Table 4 Performance measures "Precision", "Recall", "F-Measure", "Accuracy", and "Jaccard Index" for example image 1

Tables 4, 5, and 6 present the performance measures, namely Precision, Recall, F-Measure, Accuracy, and Jaccard Index, for all the modified and improved NBMs, including the original NBM, on the ground truth and test images in Fig. 1. It is evident from Table 4 that the original NBM achieved the highest values for Precision, F-Measure, and Jaccard Index, whereas the method of Saxena (2014) achieved the highest value for Accuracy, and the methods of Chamchong and Fung (2009) and Sauvola and Pietikainen (2000) achieved the highest value for Recall. However, the binarization results of the methods of Saxena (2014), Chamchong and Fung (2009), and Sauvola and Pietikainen (2000) are much better than that of the original NBM: these methods present clear and distinct text, whereas the NBM presents a lot of background noise in the binarized image.

For the example (or test) image 2 in Fig. 1, Table 5 shows that the original NBM achieved the highest values for Precision and Jaccard Index, and the method of Saxena (2014) achieved the highest value for Accuracy. The methods of Liu and Ding (2009) and Zhang and Tan (2001) achieved the highest values for F-Measure and Recall, respectively. The binarization results of the methods of Liu and Ding (2009) and Zhang and Tan (2001) contain background noise, which makes the foreground text difficult to read, and the binarization result of the original NBM contains a huge amount of background noise, which makes the foreground text unreadable. The method of Saxena (2014), however, presents clear and distinct text, successfully eliminating the background noise.

For the example (or test) image 3 in Fig. 1, Table 6 shows that the original NBM achieved the highest values for Precision, F-Measure, and Jaccard Index, while the method of Saxena (2014) achieved the highest values for both Recall and Accuracy. The binarization result of the method of Saxena (2014) presents clear and distinct text with the background noise eliminated, although there are instances of broken and incomplete text. Evidently, the binarization result of the original NBM suffers from background noise mixed with the foreground text.

Table 5 Performance measures "Precision", "Recall", "F-Measure", "Accuracy", and "Jaccard Index" for example image 2
Table 6 Performance measures "Precision", "Recall", "F-Measure", "Accuracy", and "Jaccard Index" for example image 3

11 Outcome of the review

Despite the several modifications and alterations made to NBM, some drawbacks still prevail, and further investigation is needed to obtain the desired outputs. The reported modifications to NBM are application-specific; not all modifications are similar or could be applied to other methods. Each method presents a specific modification depending on the quality of the input images and the requirements of its application. In the case of document image binarization, processing only the noise-affected portions, rather than the whole image, would be economical in terms of both processing speed and memory consumption. It is a way of selectively processing the desired objects in an image containing different objects that differ in several image properties. This processing would be most effective in document image analysis applications where the major portion of an image is text.
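A rough sketch of this selective-processing idea (not one of the reviewed methods; the block size, variance cutoff, and global threshold are hypothetical parameters) is given below: a cheap global threshold is applied everywhere, and only blocks whose intensity variance suggests degradation are re-binarized with a Niblack-style local rule.

```python
import numpy as np

def selective_binarization(image, block=64, noise_var=400.0, global_thresh=128):
    """Illustrative sketch: binarize with a cheap global threshold, then
    re-process only blocks whose variance suggests noise or uneven
    background using a Niblack-style local threshold."""
    out = (image > global_thresh).astype(np.uint8)
    h, w = image.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = image[y:y + block, x:x + block].astype(np.float64)
            if patch.var() > noise_var:               # block looks degraded
                t = patch.mean() - 0.2 * patch.std()  # Niblack-style threshold
                out[y:y + block, x:x + block] = (patch > t).astype(np.uint8)
    return out
```

In such a scheme the expensive local computation is paid only where the image actually needs it, which is the economy argued for above.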

A number of modifications to NBM have been proposed in the past for the binarization of images. However, low-quality images produce inferior binarization results, whereas high-resolution images produce better ones (Trier and Taxt 1995). Some methods have incorporated constant variables, while others have involved interdependent variables based on the image properties, as modifications to NBM. Although such modifications can be applied to the original NBM, the process of binarization remains the same. In all, this review presents twenty binarization methods, including the original NBM; no method can be said to be suitable for all images or applicable to all applications. Some modified NBMs are application-specific, whereas others are content-specific, based on the contents of the images. Hence, the objective of this review is to present a survey of real-time applications based on the modifications incorporated in the original NBM.

12 Author’s suggestions

It is evident from this review that future local binarization methods will focus on hybrid binarization as well as on the addition of one more step, a post-processing step. Future methods for image enhancement or object recognition should rely on a minimum number of parameters. Along this line, implementing several modifications in a single binarization method can achieve high performance in a particular application. Another possibility is a combination of global and local methods, applying the global method first and then the local one, or vice versa. Further, the results of such combined methods will need to be optimized.
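As a rough illustration of the global-then-local idea (a sketch of this suggestion, not a method from the reviewed literature), the code below applies Otsu's global threshold everywhere and then re-decides only the pixels lying close to that threshold with Niblack's local rule, using scikit-image's threshold_otsu and threshold_niblack; the band width, window size, and k are hypothetical parameters.

```python
import numpy as np
from skimage.filters import threshold_otsu, threshold_niblack

def hybrid_binarization(image, band=30, window=25, k=0.2):
    """Global-first, local-second hybrid (illustrative only).
    Pixels far from the Otsu threshold are decided globally; pixels within
    +/- `band` gray levels of it are re-decided with Niblack's local rule
    (scikit-image uses T = m - k * s)."""
    t_global = threshold_otsu(image)
    t_local = threshold_niblack(image, window_size=window, k=k)

    binary = image > t_global                       # global decision everywhere
    ambiguous = np.abs(image.astype(np.int32) - t_global) <= band
    binary[ambiguous] = image[ambiguous] > t_local[ambiguous]  # local refinement
    return binary.astype(np.uint8)
```

Whether the local rule should refine the global decision or the other way around is exactly the kind of design choice that would require the optimization mentioned above.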

There is a need to develop a universal binarization method that is applicable to any real-time application. As stated previously, hybridizing global and local methods could be one option to fulfil this demand; such a hybrid method could be made applicable to all types of images and to image analysis tasks where robustness is required. This points towards a unified binarization method, or a common modification criterion for NBM, covering all images in any application, and its development and formal proposal is the natural next step.

The modifications must contribute to the robustness of the method and exploit the available properties of the desired objects in the images. As a checking criterion, research must concentrate on eliminating false alarms introduced by any modification during binarization. The developed method should take into account the quality of the images while separating the desired objects from non-objects. Extensive experiments are needed to prove the effectiveness of both the hybrid binarization method and the common-modification-criterion method. It would also be interesting to see how effective a post-processing step is when added to modified NBMs as well as to a common-modification-criterion method.

13 Significance of the review to this developing world

Global methods are considered unsuitable for most types of images; local methods, however, still hold promise. Initial attempts tried to find the threshold value using global binarization methods (Otsu 1979), and later local methods such as NBM came into existence. Local methods are more sensitive to degradations; however, they preserve object shapes and image details better than global methods. The degradations in an image disturb the minute details of importance, which is a major issue for any binarization method.

One aspect of complexity relates to resource availability, which is hardly an issue for any method running on modern computers. For a global method such as Otsu's (Otsu 1979), one computation is devoted to histogram processing and the other to image binarization. In the case of local binarization methods, however, processing sometimes becomes slower because of the large number of small windows, which raises the issue of time complexity. In many applications the time complexity is directly proportional to the number of pixels, or it depends on the square of the size of the image. For large-scale projects, however, the time complexity is sometimes mitigated by efficient hardware implementations.
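As one illustration of how the per-window cost can be removed (a standard summed-area-table technique, not a method from the reviewed literature), the local means and standard deviations for a Niblack-style threshold can be computed from integral images so that each pixel costs the same regardless of window size. The sketch below assumes a 2-D grayscale NumPy array and an odd window size; the function name and parameters are hypothetical.

```python
import numpy as np

def niblack_integral(image, window=25, k=-0.2):
    """Niblack-style threshold map via integral images (summed-area tables),
    so each local mean/std costs O(1) per pixel independent of window size.
    Illustrative sketch; window is assumed odd and borders are edge-padded."""
    img = image.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode='edge')

    # Integral images of pixel values and squared values, with a leading zero
    # row/column so box sums can be read off with four lookups.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    ii2 = np.pad(padded ** 2, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

    h, w = img.shape
    n = window * window
    # Box sums over the window centred on each original pixel.
    s = (ii[window:window + h, window:window + w] - ii[:h, window:window + w]
         - ii[window:window + h, :w] + ii[:h, :w])
    s2 = (ii2[window:window + h, window:window + w] - ii2[:h, window:window + w]
          - ii2[window:window + h, :w] + ii2[:h, :w])

    mean = s / n
    std = np.sqrt(np.maximum(s2 / n - mean ** 2, 0.0))
    threshold = mean + k * std          # Niblack: T = m + k * sigma
    return (img > threshold).astype(np.uint8)
```

Since most of the reviewed modifications only change how the local mean and standard deviation enter the threshold formula, the same tables could serve those variants as well.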

The experimental evaluation of binarization methods uses the same input images for comparison, which sometimes yields misleading results. A universal binarization method would be capable of producing perfectly binarized results for all types of images; the uncertainty in classifying foreground and background pixels hinders the development of such a method. Perfectly binarized results are of great importance for binarization methods employed in real-time applications. Most of these applications produce good results, but this is due to human intervention and the fine-tuning of parameters, which contradicts the goal of a universal method.

14 Conclusions

This paper is a review of modified local binarization methods; it found twenty binarization methods that are directly or indirectly based on modifications to NBM. Within the scope of this review, it is clear that none of the binarization methods is suitable for every type of image deterioration. Most of the methods depend on the selection of the window size. The window should be large enough to eliminate the noise and at the same time small enough to preserve local details in an image; the larger the window, the more computation time is needed for threshold estimation. Thus, choosing a window size based on the objects in an image is an important task for obtaining a better binarization method with satisfactory results. Therefore, it is truly said that no single algorithm works well for all types of images (Sahoo et al. 1988; Trier and Jain 1995; Chaki et al. 2014; Sezgin and Sankur 2004; Korzynska et al. 2013). Some perform better than others depending on the input and on the method's controlling parameters, whether set manually, automatically, or both. Thus, the need for a universal binarization method still remains unfulfilled. This is a challenge that has to be taken up, to come up with a binarization method capable of segmenting, if not all, then at least most of the images across various domains.