Abstract
Multi-oriented handwritten documents require additional preprocessing for segmentation and the subsequent phases of a handwriting recognition system to work accurately. Skew correction is one such additional phase. Skew appears more frequently in multi-oriented handwritten documents in Indian languages owing to their cursive nature. In the current work, we utilise a salient feature of Indian scripts called the \(m\bar {a}\)tr\(\bar {a}\) (also known as the headline), extract a group of eligible pixels, and employ linear curve fitting to detect and correct skew in handwritten words. The proposed method is capable of correcting skew in four distinct Indian languages, viz. Bangla, Hindi, Marathi, and Punjabi. It efficiently handles word images skewed up to ± 55∘ and delivers precise results even when the \(m\bar {a}\)tr\(\bar {a}\) is mostly absent or discontinuous.
1 Introduction
The image processing domain has come a long way, and over time each of its research areas [18, 36, 37, 40] has brought novel challenges [19, 38, 39]. Handwriting recognition is one such area. It has become popular over the past decade owing to the need to automate the handwritten forms and texts filled in every day at railway counters, banks, post offices, educational institutions, and in job applications. The need for research and development in the various aspects of a handwriting recognition system is acute, especially in a developing, multilingual country like India. A handwritten character recognition system encompasses four components, viz. preprocessing, segmentation, feature detection and extraction, and classification [8, 26]. The preprocessing component refines the text using various strategies to increase accuracy at the segmentation level. For several reasons, such as the way a document is fed into a scanner, or a writer's tendency to slant text upwards or downwards, the skew present in most documents can vary within ± 30∘. Most segmentation algorithms work best in the ideal case, i.e., when no skew is present. A skew estimation and correction strategy in the preprocessing phase therefore helps segmentation algorithms achieve better results. An example of skew in handwritten text is shown in Fig. 1.
Skew estimation and correction is addressed at two levels:
-
For correcting skew present in text lines of a handwritten text document,
-
For correcting skew present in words of a text line extracted from a multi-oriented handwritten text document.
A diverse range of methodologies has been presented in the literature to estimate and correct skew at the text-line level of document images [2, 5, 6, 15, 22, 31, 33, 41]. The same attention, however, has not been paid at the word level. The cursive nature of Indian regional languages makes the task difficult [1]. Moreover, the stroke styles of characters differ across scripts, so the script itself poses a very challenging obstacle [14]. Bhowmik et al. [4] have used a modified Hough transform based method for correcting skew in handwritten Bangla word images. Roy et al. [29] have illustrated a methodology that utilises the angle created between two successive characters to estimate and correct skew in handwritten Bangla words. Jayadevan et al. [11] have used the Radon transform for correcting skew in handwritten words; they experimented on handwritten Devanagari legal words. Malakar et al. [20] have delineated a generalized Hough transform based method for skew detection and correction in handwritten Bangla words. Ghosh and Mandal [7] have defined a method where a word is divided based on its bounding box, followed by calculation of the center of gravity of both parts of the word; the slope created by the two centers of gravity is taken as the skew of the word. They experimented on Bangla handwritten words. Guru et al. [9] have proposed a methodology that encapsulates a handwritten word image in an ellipse; the slope of the major axis of the ellipse is taken as the skew of the word. They experimented on multilingual Indian documents. Jundale and Hegadi [12] have delineated the use of the Hough transform for skew correction of handwritten Devanagari words. Further, they [13] have used a parallel-axes linear regression based strategy for correcting skew in handwritten Devanagari words. Guru et al. [10] have extracted the contour of a word image.
This is followed by extraction of line segments based on small eigenvalues. They further applied K-means clustering to extract the headline present in Devanagari words and finally estimate the skew from the extracted headline. Pramanik and Bag [25] have proposed a linear regression based strategy to correct skew in Bangla and Devanagari handwritten words, but with limited capability. Kar et al. [14] have used a rectangular mask to detect the core region of a word, followed by linear regression and skew correction. A detailed overview of the aforementioned works is provided in Table 1.
Based on our observation of the literature, we found the following limitations or challenges in skew correction of handwritten words:
-
M1:
Existing works directly impose linear curve fitting on words (e.g. [13]). We observed that exploiting the structural properties of handwritten words before employing linear curve fitting may provide better results. To the best of our knowledge, no existing work has utilised this structural knowledge.
-
M2:
Most reported works fail when the skew is greater than ± 25∘ (e.g. [4, 14, 29]).
-
M3:
A few other works fail when the m\(\bar {\mathrm {a}}\textit {tr}\bar {\mathrm {a}}\) (also known as headline) is absent or discontinuous (e.g. [20]).
-
M4:
The absence of a skew detection step limits a method’s capability of handling skewed text (e.g. [16, 32]).
-
M5:
The method of Pramanik and Bag [25] at times fails to distinguish the eligible pixels from the ineligible ones when estimating the regression line.
Keeping the aforementioned limitations in mind, we present a method that provides the following solutions:
-
C1:
Based on the structural knowledge of the languages under consideration, the proposed method extracts a group of eligible pixels and employs linear curve fitting for estimation and correction of skew present in Bangla, Hindi, Marathi, and Punjabi handwritten words.
-
C2:
The proposed method is capable of handling high skew to an extent of ± 55∘.
-
C3:
The proposed method works even when the m\(\bar {\mathrm {a}}\textit {tr}\bar {\mathrm {a}}\) is mostly absent or discontinuous.
-
C4:
Moreover, we have used an existing segmentation-based classification strategy to show the efficacy of the proposed method, with non-skew-corrected words as the baseline.
-
C5:
The proposed work efficiently distinguishes between eligible and ineligible pixels, which is a limitation of [25].
In the past few years, Convolutional Neural Networks (CNNs) have evolved rapidly and are widely used to improve performance across various image processing and multimedia research domains. However, CNNs require considerable training data and energy, and when the training data is limited they may suffer from a cross-learning problem. So, there is still research space for model-based approaches to image segmentation, which are more cost, energy, and space efficient. As such, to assess the effectiveness of our proposed method, we have also focused on the feature extraction and recognition part. For correct recognition, the selection of appropriate feature extraction techniques plays a crucial part [17, 21, 35, 42]. We used the structural features delineated in [28], with MLP and SVM for classification.
The paper is organised as follows: a detailed description of the proposed methodology is provided in Section 2; the experimental results and their analysis are presented in Section 3, followed by a brief conclusion in Section 4.
2 Proposed methodology
We propose a strategy that detects and corrects skew present in Bangla, Hindi, Marathi, and Punjabi handwritten words. A complete working model of the entire approach is shown in Fig. 2.
2.1 Preprocessing
Given a m\(\bar {\mathrm {a}}\textit {tr}\bar {\mathrm {a}}\)-based multilingual document image, we utilise the method depicted in [30] to extract the text lines, followed by word extraction (Fig. 2a). For each extracted word image (Iq), we perform component labelling and noise cleaning on Iq and remove all components below a certain threshold υ. Here, we consider υ as 30 [24]. We then crop the noise-cleaned image Iq to its boundaries (Fig. 2b).
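The noise-cleaning and cropping step can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code (the paper's implementation is in MATLAB); the function name `preprocess_word` and its parameters are our own, with the component-area threshold defaulting to the paper's υ = 30.

```python
import numpy as np
from scipy import ndimage

def preprocess_word(binary_img, min_area=30):
    """Remove small noise components and crop a binary word image.

    binary_img: 2-D array with foreground pixels == 1.
    min_area: components with fewer pixels are discarded (paper uses 30).
    """
    # Label 8-connected foreground components.
    labels, n = ndimage.label(binary_img, structure=np.ones((3, 3)))
    # Pixel count of each component, indexed by label 1..n.
    areas = ndimage.sum(binary_img, labels, index=np.arange(1, n + 1))
    # Keep only components at or above the area threshold.
    keep = np.isin(labels, np.nonzero(areas >= min_area)[0] + 1)
    cleaned = np.where(keep, binary_img, 0)
    # Crop to the tight bounding box of the remaining foreground.
    rows = np.any(cleaned, axis=1)
    cols = np.any(cleaned, axis=0)
    if not rows.any():
        return cleaned  # nothing left after cleaning
    r0, r1 = np.nonzero(rows)[0][[0, -1]]
    c0, c1 = np.nonzero(cols)[0][[0, -1]]
    return cleaned[r0:r1 + 1, c0:c1 + 1]
```

A 6 × 6 stroke (36 pixels) survives the threshold while a 2 × 2 speck (4 pixels) is removed before cropping.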
2.2 Skew estimation and correction
Here, Iq, having dimension m × n, is taken as input. From the n columns of Iq, k uniformly spaced columns are selected, where k is 12% of the width of Iq (Fig. 2c). We experimented on a set of 500 randomly chosen images and found 12% to be the optimal fraction of columns for obtaining the best result; a validation of this choice is provided in Section 3.2. The k selected columns are denoted as \({\mathscr{L}}~=~<c_{1},c_{2},\cdots ,c_{k}>\). Algorithm 1 identifies the pixels admissible for skew estimation and correction. Each column in \({\mathscr{L}}\) is traversed consecutively from top to bottom and the first encountered foreground pixel is stored in \(\mathcal {P}\) (Alg. 1, Steps: 2–6) (Fig. 2d). The stored pixels are denoted as \(\mathcal {P}~=~<p_{1},p_{2},\cdots ,p_{l}>\), where l≤k. Each pixel pj in \(\mathcal {P}\) is associated with a row number and a column number (pj(r), pj(c)). Three sets, viz. \(\mathcal {A} \), \(\mathcal {A}^{\prime } \), and \(\mathcal {T} \), are used to mark pixels in \(\mathcal {P} \) as admissible, inadmissible, and transitional, respectively. \(\mathcal {A} \) constitutes pixels that are admissible for further skew correction. \(\mathcal {A}^{\prime }\) constitutes pixels that are inadmissible for further computation and will be removed from \(\mathcal {P} \). \(\mathcal {T} \) constitutes pairs of pixels that are provisionally kept there before examining whether they belong in \(\mathcal {A} \) or \(\mathcal {A}^{\prime }\). For every three successive pixels pj, pj+ 1, and pj+ 2 in \(\mathcal {P}\), the angle \(\angle p_{j}p_{j+1}p_{j+2}\) (denoted as 𝜃l) is computed. If 𝜃l≤ 165∘, then \(\lvert p_{j}(r)-p_{j+1}(r)\rvert \) and \(\lvert p_{j+1}(r)-p_{j+2}(r)\rvert \) are evaluated. A validation for choosing 165∘ as the optimal value is provided in Section 3.2.
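The column selection and top-down scan that build \(\mathcal {P}\) can be sketched as follows. This is an illustrative Python sketch under our reading that the k columns are spread uniformly across the width (Section 3.2 speaks of uniformly distributed columns); `top_profile_pixels` is a hypothetical name.

```python
import numpy as np

def top_profile_pixels(img, col_fraction=0.12):
    """Select uniformly spaced columns (12% of the width, per the paper)
    and record the first foreground pixel met scanning each top-down."""
    m, n = img.shape
    k = max(2, int(round(col_fraction * n)))
    cols = np.linspace(0, n - 1, k).astype(int)
    pixels = []
    for c in cols:
        fg = np.nonzero(img[:, c])[0]   # foreground rows in this column
        if fg.size:                     # empty columns are skipped (l <= k)
            pixels.append((int(fg[0]), int(c)))  # (row, column)
    return pixels
```

For a 100-pixel-wide image this yields k = 12 sampled columns, matching the 12% rule.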
If \(\lvert p_{j}(r)-p_{j+1}(r)\rvert > \lvert p_{j+1}(r)-p_{j+2}(r)\rvert \), then (pj, pj+ 1) is considered a transitional pixel pair and stored in \(\mathcal {T} \); otherwise, (pj+ 1, pj+ 2) is considered a transitional pixel pair and stored in \(\mathcal {T} \) instead (Alg. 1, Steps: 7–11). If a pixel pj in \(\mathcal {P} \) appears twice in two consecutive pairs in \(\mathcal {T} \) within a single iteration, then pj is transferred from the set \(\mathcal {T} \) to \(\mathcal {A}^{\prime } \), while the pixels paired with pj in \(\mathcal {T} \) are removed from \(\mathcal {T} \) (Alg. 1, Steps: 12–16). Once all the intermediate pixels are marked in a single iteration, the admissible pixels in \(\mathcal {A}\) are computed as \(\mathcal {A}\) = \(\mathcal {P}\) ∖ (\(\mathcal {T}~\cup ~\mathcal {A}^{\prime } \)) (Alg. 1, Step: 18).
For each pixel pair (pj, pj+ 1) in \(\mathcal {T} \), the row-wise differences \(d_{p_{j}}\) and \(d_{p_{j+1}}\) of pj and pj+ 1 with every pixel in \(\mathcal {A}\) are computed. The maximum of the two differences \(d_{p_{j}}\) and \(d_{p_{j+1}}\) is evaluated as maxd. The pixel in the pair (pj, pj+ 1) that attains maxd the maximum number of times is sent from \(\mathcal {T} \) to \(\mathcal {A}^{\prime }\), while the other is sent to \(\mathcal {A}\) (Alg. 1, Steps: 19–30). Once all the pixel pairs in \(\mathcal {T} \) are examined, the pixels belonging to \(\mathcal {A}^{\prime } \) are deleted from \(\mathcal {P} \), and \(\mathcal {A} \), \(\mathcal {T} \), and \(\mathcal {A}^{\prime } \) are all emptied (Alg. 1, Step: 31). This procedure is carried out until no pixel gets placed in \(\mathcal {T} \) or \(\mathcal {A}^{\prime } \). The inadmissible pixels are removed from \(\mathcal {P} \) to ensure that the skew estimation is not affected by certain consonants and upper modifiers that appear above the headline, or by rifts formed where the headline is absent in a word (Fig. 2e). Several consonants and upper modifiers in Bangla script appear above the headline.
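The filtering described above can be sketched as follows. This is a simplified illustrative sketch, not Algorithm 1 itself: the tie-breaking rule here drops the pair member with the largest row-wise deviation from the admissible set (a simplification of the maxd-count rule), and the consecutive-pair shortcut is omitted. All names are our own.

```python
import math

def interior_angle(p, q, r):
    """Angle at q (degrees) formed by distinct pixels p, q, r = (row, col)."""
    a = (p[0] - q[0], p[1] - q[1])
    b = (r[0] - q[0], r[1] - q[1])
    dot = a[0] * b[0] + a[1] * b[1]
    na, nb = math.hypot(*a), math.hypot(*b)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (na * nb)))))

def filter_pixels(pixels, angle_thresh=165.0):
    """Iteratively drop, from each flagged transitional pair, the pixel
    whose row deviates most from the remaining admissible pixels."""
    P = list(pixels)
    while True:
        T = []  # transitional pixel pairs flagged in this iteration
        for j in range(len(P) - 2):
            if interior_angle(P[j], P[j + 1], P[j + 2]) <= angle_thresh:
                if abs(P[j][0] - P[j + 1][0]) > abs(P[j + 1][0] - P[j + 2][0]):
                    T.append((P[j], P[j + 1]))
                else:
                    T.append((P[j + 1], P[j + 2]))
        if not T:
            return P  # no transitional pixels left: all admissible
        # Admissible set: pixels not belonging to any transitional pair.
        A = [p for p in P if all(p not in pair for pair in T)]
        drop = set()
        for u, v in T:
            # Drop the pair member row-wise farther from the admissible set.
            du = max(abs(u[0] - a[0]) for a in A) if A else 0
            dv = max(abs(v[0] - a[0]) for a in A) if A else 0
            drop.add(u if du > dv else v)
        P = [p for p in P if p not in drop]
```

On near-collinear headline pixels the angle stays close to 180∘ and nothing is flagged; a single outlier pulled upward by an upper modifier is flagged and removed.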
To illustrate the proposed approach, two skewed words are used as working examples (Figs. 3 and 6), one for each of the two conditions explained above. We first consider the word in Fig. 3 to illustrate the first condition. During the first iteration, for the first three pixels p1, p2, and p3, the computed angle \(\angle p_{1}p_{2}p_{3}\) (𝜃l) is ≤ 165∘, i.e., below the specified threshold, and \(\lvert p_{1}(r)\) - \(p_{2}(r)\rvert > \lvert p_{2}(r)\) - \(p_{3}(r)\rvert \). So, the pixel pair (p1, p2) is sent to \(\mathcal {T} \) (Fig. 3). Similarly, for the consecutive pixels p2, p3, and p4, the angle \(\angle p_{2}p_{3}p_{4}\) is below the specified threshold and \(\lvert p_{2}(r)\) - \(p_{3}(r)\rvert \leq \lvert p_{3}(r)\) - \(p_{4}(r)\rvert \). As a result, the pixel pair (p3, p4) is sent to \(\mathcal {T} \) (Fig. 3). Once all the elements in \(\mathcal {P}\) are examined and the first iteration completes, the elements in \(\mathcal {A}\) are computed as \(\mathcal {A}\)=\(\mathcal {P}\) ∖ (\(\mathcal {T} \cup \mathcal {A}^{\prime }\)). \(d_{p_{1}}\) and \(d_{p_{2}}\) of pixel pair (p1, p2) are computed against the elements in \(\mathcal {A}\), and correspondingly \(d_{p_{3}}\) and \(d_{p_{4}}\) of pixel pair (p3, p4) are computed against the elements in \(\mathcal {A}\). It is concluded that for the first pair p1 attains maxd most often, and for the second pair p3 attains maxd most often (Fig. 3). As a result, p1 and p3 are transferred to \(\mathcal {A}^{\prime }\) from \(\mathcal {T}\) and subsequently removed from \(\mathcal {P}\) (Fig. 3). All the elements in \(\mathcal {A}, \mathcal {A}^{\prime }\), and \(\mathcal {T}\) are then removed.
During the second iteration, for the three pixels p2, p4, and p5, the computed angle \(\angle p_{2}p_{4}p_{5}\) (𝜃l) is ≤ 165∘, i.e., below the specified threshold, and \(\lvert p_{2}(r)\) - \(p_{4}(r)\rvert \leq \lvert p_{4}(r)\) - \(p_{5}(r)\rvert \). So, the pixel pair (p4, p5) is sent to \(\mathcal {T} \) (Fig. 4). Once all the elements in \(\mathcal {P}\) are examined and the second iteration completes, the elements in \(\mathcal {A}\) are computed as \(\mathcal {A}\)=\(\mathcal {P}\) ∖ (\(\mathcal {T} \cup \mathcal {A}^{\prime }\)). \(d_{p_{4}}\) and \(d_{p_{5}}\) of pixel pair (p4, p5) are computed against the elements in \(\mathcal {A}\), and it is concluded that p4 attains maxd most often. As a result, p4 is transferred to \(\mathcal {A}^{\prime }\) from \(\mathcal {T}\) and subsequently removed from \(\mathcal {P}\) (Fig. 4). All the elements in \(\mathcal {A}, \mathcal {A}^{\prime }\), and \(\mathcal {T}\) are removed. During the third iteration, no consecutive pixels with an angle below the specified threshold of 165∘ are found (Fig. 5).
To illustrate the second condition, the word in Fig. 6 is used. The presence of an upper modifier puts the third encountered foreground pixel p3 much higher than the rest of the pixels in \(\mathcal {P}\) (Fig. 6a). As a result, during the first iteration, the angle computed for the pixels p1, p2, and p3, i.e., \(\angle p_{1}p_{2}p_{3}\) (𝜃l), is ≤ 165∘, and \(\lvert p_{1}(r)\) - \(p_{2}(r)\rvert \leq \lvert p_{2}(r)\) - \(p_{3}(r)\rvert \). So, the pixel pair (p2, p3) is sent to \(\mathcal {T} \) (Fig. 6a). Similarly, the angle computed for the next three consecutive pixels p2, p3, and p4, i.e., \(\angle p_{2}p_{3}p_{4}\), is below the specified threshold of 165∘ and \(\lvert p_{2}(r)\) - \(p_{3}(r)\rvert > \lvert p_{3}(r)\) - \(p_{4}(r)\rvert \). So, the pixel pair (p2, p3) is again selected; as the pair already exists in \(\mathcal {T}\), the current pair is discarded (Fig. 6b). Identically, the angle created by the next three subsequent pixels p3, p4, and p5 is below 165∘. As \(\lvert p_{3}(r)\) - \(p_{4}(r)\rvert > \lvert p_{4}(r)\) - \(p_{5}(r)\rvert \), the pixel pair (p3, p4) is sent to \(\mathcal {T} \). Pixel p3 now appears in two consecutive pairs in \(\mathcal {T}\). As a consequence, p3 is transferred to \(\mathcal {A}^{\prime } \), while the two pixels associated with it, i.e., p2 and p4, are removed from \(\mathcal {T}\) (Fig. 6c). Eventually, p3 is removed from \(\mathcal {P}\) (Fig. 6d) and \(\mathcal {A}\), \(\mathcal {A}^{\prime }\), and \(\mathcal {T}\) are emptied. During the second iteration, no consecutive pixels with an angle below the specified threshold of 165∘ are found.
Next, predictive \(\hat {p_{i}}(r)\) values are computed utilising Algorithm 2 and Matlab’s polyfit function based on the pixels in \(\mathcal {P} \). For this purpose, we compute the mean values of the rows (pi(r)) and columns (pi(c)) (Alg. 2, Steps: 2–5). We then compute two coefficients, b0 and b1: b0 is the intercept, which determines where the regression line intercepts the y-axis, and b1 is the slope of that line (Alg. 2, Steps: 6–9). Finally, we compute the predictive \(\hat {p_{i}}(r)\) values utilising these coefficients (Alg. 2, Steps: 10–12). The predicted \(\hat {p_{i}}(r)\) values and the pi(c) values (where i∈{1 \({\cdots } |\mathcal {P}|\}\)) are used to compute the regression line (Figs. 2f, 5, and 6e). The skew angle is determined from the computed regression line (Fig. 2g). Finally, the image is rotated according to the estimated angle for de-skewing (Figs. 2h, 5, and 6f).
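The line-fitting and angle-estimation step can be sketched as follows. This illustrative Python sketch uses `numpy.polyfit` as an analogue of MATLAB's `polyfit`; both return the same least-squares coefficients (b1, b0) that Algorithm 2 computes from the row and column means. The function name `estimate_skew_degrees` is our own.

```python
import math
import numpy as np

def estimate_skew_degrees(pixels):
    """Fit a least-squares line through the admissible (row, col) pixels
    and return its slope as a skew angle in degrees."""
    rows = np.array([p[0] for p in pixels], dtype=float)
    cols = np.array([p[1] for p in pixels], dtype=float)
    b1, b0 = np.polyfit(cols, rows, 1)  # row ~ b0 + b1 * col
    # Image rows grow downward, so a positive slope is a downward skew.
    return math.degrees(math.atan(b1))
```

Rotating the image by the negative of the returned angle then de-skews the word.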
3 Experimental results and analysis
3.1 Dataset
We have employed the ICDAR 2013 Segmentation Dataset [34] and the PHDIndic_11 dataset [23] for performing skew correction of the word images. A total of 10,000 images are used to carry out the experiment. Apart from this, we have also used another set of 1000 word images for validation. All modules and codes are implemented in the MATLAB environment.
3.2 Parameter validation
Figure 7a delineates the validation of choosing 12% of the width of Iq as the optimal percentage for selecting uniformly distributed columns for skew estimation and correction. We experimented on a set of 500 randomly chosen images and found 12% to be the optimal percentage for obtaining the best result.
For every three successive pixels pj, pj+ 1, and pj+ 2 in \(\mathcal {P}\), it is checked whether the angle \(\angle p_{j}p_{j+1}p_{j+2}\) (denoted as 𝜃l) is ≤ 165∘. We randomly chose 500 word images, performed validation, and found 165∘ to be the optimal value (Fig. 7b).
3.3 Time complexity
The time complexity of Algorithm 1 depends on three main factors, namely change, \(\lvert \mathcal {T}\rvert \), and \(\lvert \mathcal {A}^{\prime }\rvert \). change is the number of iterations the algorithm runs until the sets \(\mathcal {T}\) and \(\mathcal {A}^{\prime }\) become empty. \(\mathcal {A} \) comprises pixels that are eligible for further skew correction and \(\mathcal {T} \) comprises pairs of pixels that are provisionally stored before examining whether they belong in \(\mathcal {A} \) or \(\mathcal {A}^{\prime }\). So, the time complexity of Algorithm 1 can be approximated as O(change \( \times \lvert \mathcal {T}\rvert \times \lvert \mathcal {A}\rvert \)). Similarly, the time complexity of Algorithm 2 can be approximated as O(\(\lvert \mathcal {P}\rvert \)).
3.4 Experimental analysis
As the actual skew angle of a particular word in an unconstrained environment can only be judged by human beings, we have employed a semi-automated process to obtain the ground truth images and the corresponding angle of each word. We have used a recently developed method [25] to estimate the slope of each handwritten word; in erroneous cases, we have used traditional rotation operations to find the actual skew. The ground truth angle of each word is compared with the corresponding result of our proposed method as well as with other state-of-the-art methods. The root mean square (RMS) value of the absolute error is determined by the following equation:

$$E_{RMS} = \sqrt{\frac{1}{t}\sum\limits_{i=1}^{t}\left(g_{i} - x_{i}\right)^{2}}$$

where gi is the ground truth angle of the i-th word, xi is the skew angle determined by the proposed method, and t is the total number of words.
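The RMS error over the ground truth and estimated angles can be computed directly; a minimal sketch (the function name is our own):

```python
import math

def rms_abs_error(ground_truth, estimated):
    """RMS of the angular error over t words: sqrt(mean((g - x)^2))."""
    t = len(ground_truth)
    return math.sqrt(
        sum((g - x) ** 2 for g, x in zip(ground_truth, estimated)) / t
    )
```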
The proposed methodology has the lowest RMS error of 0.95 for Punjabi words, while the highest error of 2.47 is recorded for Bangla words. In the ideal case, a word belonging to any of the four languages under consideration contains a headline that connects the entire word, and all four languages yield almost similar results for the proposed method. In the general case, however, we have observed across the writing of different individuals that headlines are completely absent or very frequently disconnected in Bangla words, which is not the case for the other languages. Our dataset reflects this general case, and due to this characteristic of Bangla words, the performance on Bangla is not as good as on the other languages. The performance of our proposed method is delineated in Table 2. To show the efficiency of the proposed skew estimation and correction technique, we have synthetically rotated a few handwritten word images and provided a visual and statistical analysis in Table 3. We have compared our proposed method with six recent works, namely Malakar et al. [20], Ghosh and Mandal [7], Jundale and Hegadi [12], Jundale and Hegadi [13], Pramanik and Bag [25], and Roy et al. [29]. Malakar et al. [20] have used a generalized Hough transform for skew correcting handwritten Bangla word images. Ghosh and Mandal [7] have divided the word images morphologically and computed the centers of gravity of both parts for skew angle detection. Jundale and Hegadi [12] have used a modified Hough transform based method for skew correcting Hindi handwritten words. Later, they [13] have used a basic parallel-axes linear regression based strategy for correcting skew in handwritten Devanagari words. Pramanik and Bag [25] is similar to the current method at the initial stage but differs considerably at the latter stage.
That method examines whether the angle computed between three consecutive pixels is greater than or equal to a certain threshold and removes the middle pixel if the condition matches. Due to this rigid condition, several eligible pixels get removed and several ineligible pixels remain. Roy et al. [29] have used the angle between two successive characters to estimate and correct skew; this technique cannot handle skew beyond ± 10∘. In the current method, we use three different sets as well as a few additional conditions to ensure that eligible pixels are not removed, and we handle skewed word images up to ± 55∘. This mechanism shows improved accuracy when compared with other recent works. A comparison with [20], [7], [12], [13], and [25] is reported in Table 4. We have also reported and compared the average execution time required by each of the aforementioned methods to completely process an image; in terms of execution time, our proposed method is relatively slow compared with the other methodologies. We have also delineated a visual comparison of a few word images with [25] and [29] in Tables 5 and 6, respectively, to validate that the current method provides better skew correction.
Apart from the aforementioned analysis, we have also used a segmentation-based classification methodology to delineate the effectiveness of our proposed skew correction strategy. For segmentation of the words into pseudo-characters, we used the methodology depicted in [27]. We extracted structural features using the methodology delineated in [28] and used MLP and SVM for classification. For comparison purposes, we used non-skew-corrected word images as the baseline, applying the same segmentation and feature extraction strategy on them. The comparison delineates that the segmentation methodology works better on skew-corrected words and thereby provides better classification accuracy (Table 7).
3.5 Failure cases
The proposed methodology is very robust when a single skew is present in a word image, but it fails if a particular word is associated with multiple skews. If the size of an upper modifier is too large w.r.t. the actual size of the word image, or the characters in a word are too widely separated, the proposed method fails at times, as most eligible pixels get deleted. Complete absence of the headline is also a reason for some failures. A few such examples are provided in Fig. 8.
4 Conclusion
Segmentation algorithms for multi-oriented handwritten documents often fail due to the presence of skew in the text, so it is important to deskew text images in the preprocessing phase before applying segmentation algorithms. Skew appears more frequently in multi-oriented handwritten documents in Indian languages owing to their cursive nature. In the current work, we utilise a salient feature of Indian scripts called the m\(\bar {\mathrm {a}}\textit {tr}\bar {\mathrm {a}}\), extract a group of eligible pixels, and employ linear curve fitting to detect and correct skew in handwritten words. The method is capable of correcting skew in four different Indian languages, viz. Bangla, Hindi, Marathi, and Punjabi, and handles word images skewed up to ± 55∘.
References
Bag S, Harit G (2013) A survey on optical character recognition for Bangla and Devanagari scripts. Sadhana 38(1):133–168
Bagdanov A, Kanai J (1997) Projection profile based skew estimation algorithm for JBIG compressed images. In: Proceedings of the international conference on document analysis and recognition, vol 1, pp 401–405. IEEE
Basu S, Chaudhuri C, Kundu M, Nasipuri M, Basu DK (2007) Text line extraction from multi-skewed handwritten documents. Pattern Recogn 40 (6):1825–1839
Bhowmik TK, Roy A, Roy U (2005) Character segmentation for handwritten Bangla words using artificial neural network. In: Proceedings of the IAPR TC3 NNLDAR
Boukharouba A (2017) A new algorithm for skew correction and baseline detection based on the randomized Hough Transform. Journal of King Saud University-Computer and Information Sciences 29(1):29–38
Brodić D, Milivojević ZN (2012) Estimation of the handwritten text skew based on binary moments. Radioengineering 21(1):162–169
Ghosh R, Mandal G (2012) Skew detection and correction of online Bangla handwritten word. Int J Comp Sci Issues 9(4):202
Gupta D, Bag S (2019) Handwritten multilingual word segmentation using polygonal approximation of digital curves for Indian languages. Multi Tools App 78(14):1–26
Guru DS, Ravikumar M, Manjunath S (2013) Multiple skew estimation in multilingual handwritten documents. Int J Comp Sci Issues 10(5):65
Guru DS, Suhil M, Ravikumar M, Manjunath S (2015) Small eigenvalue based skew estimation of handwritten Devanagari words. In: International conference on mining intelligence and knowledge exploration, pp 216–225. Springer
Jayadevan R, Kolhe SR, Patil PM, Pal U (2011) Database development and recognition of handwritten Devanagari legal amount words. In: Proceedings of the international conference on document analysis and recognition, pp 304–308. IEEE
Jundale TA, Hegadi RS (2015) Skew detection and correction of Devanagari script using Hough Transform. Proc Comp Sci 45:305–311
Jundale TA, Hegadi RS (2015) Skew detection of Devanagari script using pixels of axes-parallel rectangle and linear regression. In: Proceedings of the international conference on energy systems and applications, pp 480–484. IEEE
Kar R, Saha S, Bera SK, Kavallieratou E, Bhateja V, Sarkar R (2019) Novel approaches towards slope and slant correction for tri-script handwritten word images. The Imaging Sci J 67(3):159–170
Kavallieratou E, Fakotakis N, Kokkinakis G (2002) Skew angle estimation for printed and handwritten documents using the Wigner–Ville distribution. Image Vis Comput 20(11):813–824
Kumar R, Singh A (2010) Detection and segmentation of lines and words in Gurmukhi handwritten text. In: Proceedings of the international conference on advance computing conference, pp 353–356. IEEE
Liang Y, He F, Zeng X (2020) 3D mesh simplification with feature preservation based on whale optimization algorithm and differential evolution. Integrated Computer-Aided Engineering Preprint, pp 1–19
Liu S, Li M, Li M, Xu Q (2020) Research of animals image semantic segmentation based on deep learning. Concurrency and Computation: Practice and Experience 32(1):e4892
Liu S, Yu M, Li M, Xu Q (2019) The research of virtual face based on deep convolutional generative adversarial networks using tensorflow. Physica A: Statistical Mechanics and its Applications 521:667–680
Malakar S, Seraogi B, Sarkar R, Das N, Basu S, Nasipuri M (2012) Two-stage skew correction of handwritten Bangla document images. In: Proceedings of the international conference on emerging applications of information technology, pp 303–306. IEEE
Mei M, Zhong Y, He F, Xu C (2020) An innovative multi-label learning based algorithm for city data computing. GeoInformatica 24(1):221–245
Mello Carlos AB, Sánchez A, Cavalcanti George DC (2011) Multiple line skew estimation of handwritten images of documents based on a visual perception approach. In: Proceedings of the international conference on computer analysis of images and patterns, pp 138–145. Springer
Obaidullah SM, Halder C, Santosh KC, Das N, Roy K (2018) PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification. Multi Tools App 77(2):1643–1678
Pramanik R, Bag S (2017) Linear curve fitting-based headline estimation in handwritten words for Indian scripts. In: Proceedings of the international conference on pattern recognition and machine intelligence, pp 116–123. Springer
Pramanik R, Bag S (2018) Linear regression-based skew correction of handwritten words in Indian languages. In: Proceedings of the international conference on computer vision & image processing, pp 129–139. Springer
Pramanik R, Bag S (2018) Shape decomposition-based handwritten compound character recognition for Bangla OCR. J Vis Commun Image Represent 50:123–134
Pramanik R, Bag S, Kumar R (2018) A fuzzy and contour-based segmentation methodology for handwritten Hindi words in legal documents. In: Proceedings of the international conference on recent advances in information technology, pp 1–6. IEEE
Pramanik R, Raj V, Bag S (2018) Finding the optimum classifier: Classification of segmentable components in offline handwritten Devanagari words. In: Proceedings of the international conference on recent advances in information technology, pp 1–5. IEEE
Roy A, Bhowmik TK, Parui SK, Roy U (2005) A novel approach to skew detection and character segmentation for handwritten Bangla words. In: Proceedings of the international conference on digital image computing: techniques and applications, pp 30–30. IEEE
Roy K, Roy K, Pal U (2006) Segmentation of unconstrained handwritten text based on RLSA algorithm. In: Proceedings of the national conference on recent trends in information systems, pp 196–199
Sharma MK, Dhaka VP (2016) Segmentation of English offline handwritten cursive scripts using a feedforward neural network. Neural Comput & Applic 27(5):1369–1379
Shaw B, Parui SK (2010) A two stage recognition scheme for offline handwritten Devanagari words. In: Machine interpretation of patterns: image analysis and data mining, World Scientific, pp 145–165
Shi Z, Govindaraju V (2003) Skew detection for complex document images using fuzzy runlength. In: Proceedings of the international conference on document analysis and recognition, p 715. IEEE
Stamatopoulos N, Gatos B, Louloudis G, Pal U, Alaei A (2013) ICDAR 2013 handwriting segmentation contest. In: Proceedings of the international conference on document analysis and recognition, pp 1402–1406. IEEE
Wu Y, He F, Zhang D, Li X (2015) Service-oriented feature-based data exchange for cloud-based design and manufacturing. IEEE Transactions on Services Computing 11(2):341–353
Xu Q, Huang G, Yu M, Guo Y (2020) Fall prediction based on key points of human bones. Physica A: Statistical Mechanics and its Applications 540:123205
Xu Q, Li M, Li M, Liu S (2018) Energy spectrum CT image detection based dimensionality reduction with phase congruency. J Medical Systems 42 (3):49
Xu Q, Wang F, Gong Y, Wang Z, Zeng K, Li Q, Luo X (2019) A novel edge-oriented framework for saliency detection enhancement. Image Vis Comput 87:1–12
Xu Q, Wang Z, Wang F, Gong Y (2019) Multi-feature fusion CNNs for Drosophila embryo of interest detection. Physica A: Statistical Mechanics and its Applications 531:121808
Xu Q, Wang Z, Wang F, Li J (2018) Thermal comfort research on human CT data modeling. Multi Tools App 77(5):6311–6326
Yu H, He F, Pan Y (2020) A scalable region-based level set method using adaptive bilateral filter for noisy image segmentation. Multi Tools App 79 (9):5743–5765
Zhang DJ, He FZ, Han SH, Li XX (2016) Quantitative optimization of interoperability during feature-based data exchange. Integrated Computer-Aided Engineering 23(1):31–50
Ethics declarations
Conflict of Interests
Rahul Pramanik declares that he has no conflict of interest. Soumen Bag declares that he has no conflict of interest.
Pramanik, R., Bag, S. A novel skew correction methodology for handwritten words in multilingual multi-oriented documents. Multimed Tools Appl 80, 27323–27342 (2021). https://doi.org/10.1007/s11042-021-10822-2