1 Introduction

Nowadays, with the increasing use of e-resources, it has become essential to transform paper documents into usable digital versions. Paper documents can be converted to digital format by scanning them. Digital documents are present almost everywhere; to further enhance the advantages of digitization, the images need to be compressed using a suitable compression technique. The noise present in an image becomes significant after any digital process such as compression, and the resulting amount of noise decides whether an enhancement or reconstruction technique will be effective or not [1]. Part of, or almost all of, the information in an image may be lost if the noise is not paid sufficient attention prior to applying any image processing operation. Noise is also one of the most vital factors to consider when processing digital images; it is crucial to remove or minimize the noise or degradation introduced through transmission, storage, and retrieval. The undesirable parts of a document image, known as noise, are likely to increase the cost of transmission and storage because they take up valuable resources. Scanned document images frequently suffer from degradations including uneven contrast, show-through, interfering strokes, background spots, humidity absorbed by the paper in various locations, and uneven backgrounds. These days, digital images of documents are acquired using image acquisition tools such as digital cameras, mobile phone cameras, or scanners. While capturing or acquiring images, these acquisition devices may introduce undesired elements into the images [8]. In such a scenario, noise reduction is necessary to accomplish image compression and the subsequent objectives of useful portability, storage, and retrieval of the document images.

2 Types of Noises

Documents are digitized by scanning them with digital scanners and cameras. Some of the documents to be digitized are of poor quality, and the illumination conditions or the scanning process may also be responsible for the introduction of noise during digitization. Possible reasons for the introduction of noise are poor paper and ink quality, ageing in the case of old documents, an older printing process or mechanism, or the scanning equipment itself. Since the presence of noise affects image quality, noise removal is an important low-level operation in digital image pre-processing [10]. Noise can appear in the foreground or background of an image; the noises generated before or during the digitization process are discussed here. The types of noise considered in scanned document images are as follows.

2.1 Speckle Noise

This multiplicative noise affects the pixels of greyscale images and is most evident in low-luminance images [3]. Image enhancement is an essential low-level task in image processing to reduce speckle noise before performing subsequent higher-level tasks such as edge detection, image segmentation, and object detection. Let I(m, n) represent the noiseless image and O(m, n) its corresponding distorted image. According to the multiplicative noise model, the relation between them is given by Eq. 1.

$$O(m,n) = I(m,n) \times N(m,n)$$
(1)

where N(m, n) represents the speckle noise and the multiplication is performed point-wise.
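For illustration, a minimal Python/NumPy sketch of the multiplicative model in Eq. 1 is given below. It assumes a greyscale image normalized to [0, 1] and takes N(m, n) = 1 + n(m, n), with n zero-mean Gaussian noise of the chosen variance (one common convention; the generator used in the paper's MATLAB experiments may differ).

import numpy as np

def add_speckle(image, variance=0.1, seed=None):
    """Distort a [0, 1] greyscale image with multiplicative speckle noise,
    following O(m, n) = I(m, n) * N(m, n) as in Eq. 1, where N is taken
    as 1 + zero-mean Gaussian noise (an assumed convention)."""
    rng = np.random.default_rng(seed)
    n = rng.normal(loc=0.0, scale=np.sqrt(variance), size=image.shape)
    noisy = image * (1.0 + n)          # point-wise multiplication
    return np.clip(noisy, 0.0, 1.0)    # keep intensities in the valid range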

The grey-level statistics of an image are affected by speckle noise. This influence on the grey values increases with the speckle noise variance, making it more difficult to recover the original image with little or no noise. To show the effect of speckle noise with variances of 0.1 and 0.9 on the text image, the respective histograms are shown in Fig. 1. The variations in the histogram and the effect of the noise addition are evident. The noise is closely linked with the high-frequency content of the image, i.e. its detail features. As a result, it becomes difficult to maintain a balance between minimizing the noise as much as possible and keeping the image information intact.

Fig. 1. Effect of speckle noise on the image histogram

2.2 Gaussian Noise

Gaussian noise is also referred to as electronic noise since it originates in amplifiers or detectors. The discrete nature of radiation from warm objects and the thermal vibration of atoms are considered the main mechanisms that cause Gaussian noise. Gaussian noise typically distorts the grey levels in digital images, so the histogram of the image, or equivalently the probability density function (PDF) of the noise, reveals its basic nature and characteristics. This statistical noise has a PDF equal to that of the normal (Gaussian) distribution. The noise is referred to as white Gaussian noise when the values at any pair of times are statistically independent and uncorrelated. The PDF of Gaussian noise is given by Eq. 2.

$$p_G (z) = \frac{1}{{\sigma \sqrt {2\pi } }}\,e^{ - \frac{{(z - \mu )^2 }}{{2\sigma^2 }}}$$
(2)

In Eq. 2, p_G(z) is the probability density of grey level z; it depends on the mean μ and the standard deviation σ. Gaussian noise was often neglected in earlier digital image processing, although it is present most of the time when the intensities of images and the amount of error are examined; moreover, the mechanisms used here for digitization are prone to this kind of noise. The influence of Gaussian noise with variances of 0.01 and 0.5 on the text image is shown in Fig. 2 by means of variations in the histogram.
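A corresponding sketch for additive Gaussian noise is given below, again assuming a [0, 1] greyscale image; the noise samples are drawn from the normal distribution whose PDF is given in Eq. 2.

import numpy as np

def add_gaussian(image, mean=0.0, variance=0.01, seed=None):
    """Add Gaussian noise (PDF as in Eq. 2) to a [0, 1] greyscale image;
    the variance parameter controls sigma^2."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=mean, scale=np.sqrt(variance), size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)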

Fig. 2. Effect of Gaussian noise on the image histogram

2.3 Salt and Pepper Noise

Salt and pepper noise, which may be caused by dirt and stains on the document paper, can appear in a document image during the conversion process. Although one or more pixels may be contaminated by this noise, the affected spots are assumed to be very small, smaller than the size of the textual elements. Simple filters such as the median filter are generally effective at eliminating this noise, but if the noise is more widespread, techniques like k-fill or other morphological operators should be used [11]. Printed documents are produced with a wide variety of writing inks. The pepper noise results in spurious representations of textual characters in the document image, whereas the salt noise appears as a lack of ink. Because noisy pixels are set alternately to the minimum or maximum intensity value, images distorted by impulse noise of this kind have a characteristic "salt and pepper" appearance [12]. Unaffected pixels, however, always maintain their original state. The noise can be formulated mathematically as in Eq. 3.

$$O(m,n) = \begin{cases} S_{\min } & {\text{with probability }}p \\ S_{\max } & {\text{with probability }}q \\ I(m,n) & {\text{with probability }}1 - (p + q) \end{cases}$$
(3)

where O(m, n) and I(m, n) denote the noisy and noise-free images, respectively, and pixel intensities lie in the range [Smin, Smax]. With intensities normalized to [0, 1], Smin = 0 corresponds to pepper noise and Smax = 1 to salt noise; the total probability p + q indicates the amount of salt and pepper noise. Using modifications to the histogram, Fig. 3 shows the impact of salt and pepper noise with variances of 0.01 and 0.5 on the text image.
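The model of Eq. 3 can be sketched in the same way; the function below is an illustrative implementation, assuming a [0, 1] greyscale image and equal default probabilities p and q.

import numpy as np

def add_salt_pepper(image, p=0.025, q=0.025, seed=None):
    """Corrupt a [0, 1] greyscale image with salt-and-pepper noise as in Eq. 3:
    a pixel becomes S_min = 0 (pepper) with probability p, S_max = 1 (salt)
    with probability q, and keeps its value with probability 1 - (p + q)."""
    rng = np.random.default_rng(seed)
    r = rng.random(image.shape)
    noisy = image.copy()
    noisy[r < p] = 0.0                    # pepper pixels
    noisy[(r >= p) & (r < p + q)] = 1.0   # salt pixels
    return noisy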

Fig. 3. Effect of salt and pepper noise on the image histogram

3 Proposed Work for Document Image Enhancement

Image enhancement is the initial pre-processing step in image processing. Real-time document images may contain one or several common distortions such as contrast variation, blur, and salt and pepper noise. Image enhancement is commonly used to improve the perceived image quality through spatial- or frequency-domain processing. Spatial-domain methods carry out direct pixel-level manipulations, whereas frequency-domain methods make the changes indirectly through transformations. Removing low-level blur from an image is a difficult task. The enhancement of low-level distortion in images is studied here using existing image smoothing techniques, such as the mean filter, median filter, and conservative smoothing, along with the proposed modified conservative smoothing. The techniques of thresholding and grey-level reduction are then used to further improve the readability of the textual contents in the image.

Fig. 4. Flow of the proposed scheme for document image enhancement

As shown in Fig. 4, noise is first added to the images individually or in combination. Then, as part of the proposed methodology to reduce the added noise, the first step of image smoothing is carried out with an edge-preserving smoothing method, once using Conserve Smoothing (CS) and a second time using Modified Conserve Smoothing (MCS). The smoothed image is fed to the grey-level reduction process, which is developed on the basis of the colour level reduction algorithm proposed in [21]. Finally, Otsu's thresholding algorithm is applied and the output image is produced. The resultant image and the input image are compared using various image quality metrics, and the performance of CS and MCS is recorded.

3.1 Edge Preserving Image Smoothing

Noise in a digital image can be minimized or suppressed by smoothing. Smoothing techniques rely on low-pass filters, which traverse an image and replace the central pixel value with the mean or median of the surrounding pixels [2]. Point processing and neighbourhood processing are the two spatial-domain techniques that are more effective for real-time image processing [6]. The point processing method directly modifies each pixel to enhance the image. In neighbourhood processing, or spatial filtering, each pixel in an image is modified with a mathematical function of its neighbours to improve the image [4]. Correlation and convolution are two crucial concepts in spatial filtering. Correlation measures the degree of similarity between two images by moving the kernel over all pixels in an image and performing a predetermined transformation at each pixel position. Image convolution is like correlation except that the spatial mask is rotated by 180° before processing. Convolution is frequently used to produce or undo effects such as blurring and edge sharpening in an image. Combining a mathematical function with a convolution mask is the fundamental idea behind spatial filtering; it is mostly used to remove extraneous or superfluous information from images. Spatial-domain filters can be linear or non-linear. Linear filters modify the value of the target pixel using precise linear combinations of the surrounding pixels, whereas non-linear filters use arbitrary non-linear combinations of the surrounding pixels.

The most fundamental linear spatial filter, the mean filter, is used to smooth images for noise reduction by minimizing the variability in pixel values. It uses the idea of spatial convolution: a filtering kernel of size 3 × 3, 5 × 5, and so on slides over the entire image, and the central pixel value of the current window is replaced by the mean of its eight adjacent pixels and the centre pixel value itself. This technique is very effective at reducing noise, but at the same time high-frequency information in the image is lost to a greater extent. If a small kernel is used, undesirable local features may be added, and when a large kernel is applied, some important features may be lost. The mathematical computation used to replace the centre value of the kernel may vary, and changes to the replacement criterion should be made keeping in mind the loss of image information, energy compaction, and so on.
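As an illustration of mean filtering by spatial convolution, the following sketch (using SciPy, with a hypothetical helper name mean_filter) replaces each pixel by the average of its size × size neighbourhood; it is a generic sketch, not the exact implementation used in the experiments.

import numpy as np
from scipy import ndimage

def mean_filter(image, size=3):
    """Smooth an image with a size x size averaging kernel (spatial
    convolution): every pixel is replaced by the mean of itself and
    its neighbours."""
    kernel = np.ones((size, size)) / (size * size)
    return ndimage.convolve(image, kernel, mode='nearest')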

3.1.1 Conserve Smoothing

The smoothing process reduces noise, but sensitive details are also lost and object boundaries become less sharp. A solution to this challenge, particularly for document images, is to use an edge-preserving smoothing technique (an adaptive mean filter), where the degree of blurring for each pixel is determined after acquiring local information in its 3 × 3 neighbourhood (Harwood et al., 1987; Perona and Malik, 1990). Here, a fast and effective Edge Preserving Spatial Filtering (EPSF) algorithm is applied to complete this task.
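A minimal sketch of the standard conservative smoothing rule is shown below: the centre pixel is altered only when it lies outside the intensity range of its 8-neighbours, in which case it is clamped to the nearest bound. This illustrates the general technique and is not necessarily the exact EPSF variant used here.

import numpy as np
from scipy import ndimage

def conservative_smoothing(image, size=3):
    """Edge-preserving conservative smoothing: clamp each pixel to the
    [min, max] range of its neighbours, leaving in-range pixels unchanged."""
    footprint = np.ones((size, size), dtype=bool)
    footprint[size // 2, size // 2] = False   # exclude the centre pixel
    local_min = ndimage.minimum_filter(image, footprint=footprint, mode='nearest')
    local_max = ndimage.maximum_filter(image, footprint=footprint, mode='nearest')
    return np.clip(image, local_min, local_max)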

3.1.2 Modified Conserve Smoothing

Here, the centre pixel replacement policy of conservative smoothing is changed slightly, and the results obtained show good edge-preserving ability. Figure 5 shows the outcome of applying the various smoothing algorithms to a test image (a textual portion of a document image). It is evident that the output of the modified conservative smoothing algorithm is more readable than the other outputs. The saturation of noise is lower than with the conservative smoothing method, and the granularity of the text symbols is better preserved than with the median filter, owing to the edge-preserving nature of this algorithm.

Fig. 5. Result of various smoothing algorithms

3.2 Gray Level Reduction

Reducing the number of grey levels in an image is crucial for image compression, segmentation, presentation, and transmission. A common technique for reducing the number of grey levels in a digital image is multi-thresholding. Multi-thresholding establishes the threshold values that set the boundaries of the grey-level classes of an image by using the data from the image histogram. Multi-thresholding techniques can be classified into three categories: zero-crossing of second-order derivatives, histogram based, and entropy based. Other procedures depend on grey-level error diffusion or nearest-grey-level merging. In the nearest-grey-level procedures, each pixel in the image changes its value to the grey level in a palette that most closely matches some typical neighbouring pixel. Error diffusion strategies are founded on dithering procedures; the difference between the actual pixel values and the true values is referred to as the "error". All the low-pass-filtering-based techniques presented here rely on thresholding the pixel intensities of the surrounding grey levels, which leads to an averaging-like procedure that reduces important edge features of the image. The colour level quantization method is applied to the colour spaces of colour images; it decreases the number of distinct colours in an image, typically with the goal of making the new image as visually similar to the original as possible. Most conventional approaches treat colour quantization as a problem of point clustering in three dimensions, where the points correspond to the colours present in the original image and the three axes to the three colour channels. The proposed approach adopts this view for grey-level images. It uses the neighbouring grey-level values of each pixel and applies clustering on them, taking the new centroid as the average of the neighbouring values. All the pixels should fall into one cluster if the 8-neighbours are considered, and they should satisfy the intra-class and inter-class similarity criteria.
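As a simplified illustration of the clustering view described above, the sketch below quantizes grey levels with a one-dimensional k-means-style clustering; it is only an analogue of colour quantization applied to a single channel, not the full six-step scheme proposed in this section (listed next).

import numpy as np

def reduce_grey_levels(image, n_levels=8, iterations=10):
    """Illustrative grey-level reduction: cluster pixel intensities into
    n_levels groups and replace each pixel by its cluster centroid.
    A simplified sketch, not the proposed algorithm of Sect. 3.2."""
    pixels = image.ravel().astype(float)
    # initialize centroids uniformly over the intensity range
    centroids = np.linspace(pixels.min(), pixels.max(), n_levels)
    for _ in range(iterations):
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(labels == k):
                centroids[k] = pixels[labels == k].mean()  # centroid = cluster mean
    return centroids[labels].reshape(image.shape)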

The proposed document grey-level reduction technique consists of the following steps:

Step 1. Application of Edge preserving smoothing filter

Step 2. Edge detection of image regions

Step 3. Gray level subsampling

Step 4. Tentative grey-level reduction

Step 5. Mean-shift procedure

Step 6. Final grey-level reduction

Fig. 6. Effect of image grey-level reduction

Figure 6 shows the output of the grey-level reduction scheme for a representative test image. The same algorithm was applied to ten other text images, and the numbers of grey levels to which they were reduced without introducing perceptual differences are listed in Table 1.

Table 1. Effect of Grey-level reduction algorithm

3.3 Image Thresholding Using Otsu’s Method

In image thresholding, every pixel in the image is assigned to the image object or to the image background using an intensity threshold value. As a result, each image pixel is classified as either a background point or an object point [5]. In many image processing applications, the grey levels of the pixels belonging to the object differ substantially from those of the background; thresholding then becomes a simple technique for separating objects from the background. Typical applications include map processing, where lines, legends, and characters are to be found, and document image analysis, where the goal is to extract printed characters, logos, graphic content, or musical scores [9]. By iterating through all plausible threshold values, Otsu's thresholding method determines the spread of the pixel levels on either side of the threshold, i.e. in the foreground and in the background [7]. The objective is to find the threshold for which the combined (within-class) spread of the foreground and background pixels is at its minimum [13]. As a result, thresholding is frequently used to distinguish between light and dark areas. The greyscale image is converted into a binary image by setting all pixels below the threshold to ZERO and all pixels above the threshold to ONE. Image thresholding can be thought of as an extreme form of grey-level quantization. Assume that a grey-level image 'I' has 'K' grey levels 0, 1, 2, …, K − 1. Let us define an integer threshold 'Th' that takes a value in 0, 1, 2, …, K − 1. Each image pixel value in 'I' is compared with the threshold 'Th' during this process. Based on this comparison, a binary value is selected to represent the pixel under consideration in the output binary image 'O'.

If O(m, n) is the result of thresholding I(m, n) at the selected global threshold 'Th', then O(m, n) = 1 when I(m, n) > Th and O(m, n) = 0 otherwise.

Generalized Algorithm:

1. Process the input image using the smoothing and grey-level reduction algorithms
2. Obtain the image histogram
3. Compute the threshold value 'Th'
   (i) calculate the histogram-based intensity level probabilities
   (ii) initialize probabilities and means
   (iii) iterate over possible thresholds
   (iv) update the values of probabilities and means
   (v) calculate the within-class variance
   (vi) calculate the between-class variance
4. Replace image pixels by '1' where the pixel value is greater than 'Th' and by '0' in the rest of the cases (Fig. 7).

Fig. 7. Outcome of Otsu's image thresholding algorithm
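A compact sketch of Otsu's thresholding following the steps above is given below; it assumes a greyscale input array and maximizes the between-class variance, which is equivalent to minimizing the within-class variance.

import numpy as np

def otsu_threshold(image, bins=256):
    """Compute Otsu's global threshold and binarize the image (steps 2-4)."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    prob = hist.astype(float) / hist.sum()           # intensity level probabilities
    levels = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(prob)                             # background class probability
    w1 = 1.0 - w0                                    # foreground class probability
    cum_mean = np.cumsum(prob * levels)
    mu_total = cum_mean[-1]
    mu0 = cum_mean / np.where(w0 > 0, w0, 1)         # background mean
    mu1 = (mu_total - cum_mean) / np.where(w1 > 0, w1, 1)  # foreground mean
    between_var = w0 * w1 * (mu0 - mu1) ** 2         # between-class variance
    th = levels[np.argmax(between_var)]
    return (image > th).astype(np.uint8), th         # step 4: binarize at 'Th'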

4 Experimentation and Results

The set of steps in the proposed document image enhancement method was implemented in MATLAB. One hundred document images, ten from each of ten different classes, were used for the experimentation. The results with respect to the considered image quality metrics are presented in this section. The three types of noise, namely speckle noise (variance 0.1 to 0.9), Gaussian noise (variance 0.01 to 0.5), and salt and pepper noise (variance 0.01 to 0.5), were added to the set of images individually, and a combination of all these noises was also applied. Various groupings of the noise variance levels were made to create the ten classes of noisy images. Then the combination scheme consisting of edge-preserving smoothing, colour-level-reduction-based grey-level reduction, and Otsu's thresholding was performed, one step after the other. The results of the proposed method are presented in terms of image quality metrics, namely Mean Square Error (MSE), Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), and Structural Similarity Index Metric (SSIM). Tables 2, 3, and 4 show the performance of the proposed method considering one noise at a time with varied variance/saturation, for Gaussian noise, salt and pepper noise, and speckle noise, respectively, whereas Table 5 shows the performance of the algorithm on the combined noise. The seventh row of each table shows the most promising metric values, and Fig. 8 shows the overall result of the method, which signifies the perceptual improvement.
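Although the experiments were carried out in MATLAB, the quality metrics can be illustrated with an equivalent Python sketch using scikit-image; the helper name quality_metrics and the SNR formula used here are assumptions for illustration, and both images are assumed to be greyscale arrays in the range [0, 1].

import numpy as np
from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity

def quality_metrics(reference, processed):
    """MSE, SNR, PSNR, and SSIM between a reference and a processed image."""
    mse = mean_squared_error(reference, processed)
    psnr = peak_signal_noise_ratio(reference, processed, data_range=1.0)
    ssim = structural_similarity(reference, processed, data_range=1.0)
    # SNR: ratio of signal power to error power, expressed in decibels
    snr = 10 * np.log10(np.mean(reference ** 2) / (mse + 1e-12))
    return {"MSE": mse, "SNR": snr, "PSNR": psnr, "SSIM": ssim}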

Table 2. Gaussian noise: (Smoothing + GrayLevelReduction + Thresholding)
Table 3. Salt and pepper noise: (Smoothing + GrayLevelReduction + Thresholding)
Table 4. Speckle noise: (Smoothing + GrayLevelReduction + Thresholding)
Table 5. Combination of all noises: (Smoothing + GrayLevelReduction + Thresholding)
Fig. 8. The result of combining smoothing, grey-level reduction and thresholding

5 Conclusions and Future Work

The process of noise removal is inherently lossy and may discard some important attributes of the image; keeping this loss minimal while enhancing the perceptual quality (readability, in our case) of the image is a challenge. The quality of the resultant image is evaluated using quality metrics such as MSE, SNR, PSNR, and SSIM. It is evident from the results that the proposed method has succeeded to a significant extent. The results of document pre-processing show that edge-preserving image smoothing gives better results than the traditional averaging (mean) filtering approach. The grey-level reduction based on colour level quantization is a better alternative to the other grey-level quantization approaches. Otsu's method also works well; it involves fast and simple calculations that are seldom affected by the brightness and contrast of the input image and leads to satisfactory image thresholding. In particular, the text part of the document images is processed very well and made more readable. The proposed combinational algorithm gives satisfactory results for images affected by noises such as speckle noise, Gaussian noise, salt and pepper noise, and their mixture. For the natural noise that is added through various unmodelled processes, a more advanced algorithm may be required.