1 Introduction

With the prevalence of computers and scanners, a tremendous number of books and handwritten manuscripts are becoming digitally available. To make these documents easy to access, various techniques are used, and some already play a major role in commercial applications. Text line segmentation is a significant stage of offline handwritten document recognition and analysis [1]. The correctness of the segmented text lines directly influences the process and results of subsequent stages [2]. Text-line segmentation of printed document images is easily handled by a simple projection method with a statistically estimated threshold. However, this is not a promising way to segment handwritten document images [3,4,5]. Unlike machine-printed documents [6], handwritten documents vary widely with the writing habits of different writers: the spacing between text lines is irregular, and touching and overlapping text lines make the task challenging.

Modern Uyghur script is an alphabetic script with 32 basic characters, written from right to left [7]. Almost every letter has ascenders or descenders that distinguish it from similar letter forms. Because Uyghur script is cursive, these strokes may connect or overlap not only within a word or text line but also between neighboring text lines. This makes text line segmentation more difficult than for printed text or for scripts written in isolated styles.

The traditional projection-based text-line segmentation method uses a fixed, constant threshold to separate neighboring text lines [7]. It is suitable for machine-printed text images because the spacing between neighboring text lines is equal or regular. However, it is not effective enough for handwritten documents.

In this paper, we propose a novel approach to text line segmentation based on projection and an adaptive thresholding mechanism. In our experiments, the proposed method proved effective and robust on handwritten text images whose text lines differ in style, length, skew and degree of touching. The rest of the paper is organized as follows: previous work is reviewed in Sect. 2. Sect. 3 describes the proposed method in detail. Sect. 4 discusses the experiments and evaluation methods. Sect. 5 gives a brief conclusion.

2 Related work

In 2006, Li et al. proposed an approach based on smearing [8]. They first convert a binary image to a gray scale image using a Gaussian window. Text lines are then extracted by evolving an initial estimate using the level set method [2]. The algorithm correctly detected 85.6% of 2691 ground-truth text lines. Segmentation errors caused by adjacent and overlapping text lines make this algorithm less widely applicable.

In 2009, Papavassiliou et al. proposed an algorithm based on piece-wise projection [9]. The algorithm was tested on the benchmarking datasets of the ICDAR07 handwriting segmentation contest, where the correct segmentation rate reached 95.67%. Although the segmentation is mostly correct, over-segmentation occurs.

Bal and Saha [10] proposed a text line segmentation algorithm based on projection. All rising sections in the projection are measured, and the average value of the rising sections is used as the threshold. The algorithm was tested on the IAM database, which contains more than 550 text images by different writers. This approach correctly segmented 95.65% of the text lines. Because the chosen threshold is constant, it does not adapt to varied handwritten documents, and it cannot segment severely sloped text lines.

Ptak et al. [11] proposed an algorithm based on projection with a variable threshold. This method can segment handwritten text lines of similar length. However, segmentation performance declines when text lines are short or touching. The authors tested the algorithm on their own collection of Polish document images, which contains documents with text lines of similar length and documents with text lines of random length.

In this paper, a projection based adaptive threshold algorithm for text line segmentation is proposed.

3 Methodology

3.1 Framework

The Uyghur handwritten text samples, which we collected ourselves, are preprocessed with common techniques: conversion of the original image to gray scale, dilation, binarization and noise removal [12]. After preprocessing, the horizontal projection of the image is calculated, and thresholds are derived from the projection peaks and their locations. Each text line is then segmented according to its threshold, and line separators are drawn in the original image at the valley points between neighboring text lines, which are determined from the horizontal projection profile. The major steps of the proposed algorithm are shown in Fig. 1.

Fig. 1 Major steps of proposed algorithm

3.2 Preprocessing

Preprocessing aims to eliminate or minimize harmful or insignificant content and to enhance useful features in images, especially document images [13]. It thus improves the generality of the sample representation and the performance of subsequent steps. Before the proposed text-line segmentation method is applied, the original color image is converted to gray scale and then processed by dilation, noise removal and binarization; binarization is applied twice.

3.2.1 Gray scaling

To calculate a projection profile, the original document image must be converted to a binary image, so gray scaling is performed before binarization. The weighted sum method is used for gray scaling. A color image commonly contains three channels, each storing a 2-dimensional array that represents red, green or blue [14]. The gray scale image is obtained by calculating, for every pixel of the color image, a weighted sum of the three channel components. The three-dimensional tensor thus becomes a two-dimensional array, which is the final gray scale image.

$$ S = 0.2989 \times R + 0.587 \times G + 0.1140 \times B $$
(1)
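As a minimal sketch of Eq. 1, the weighted sum can be written in plain Python. The function name `to_grayscale` and the representation of the image as nested lists of (R, G, B) tuples are assumptions made for illustration only; they are not part of the original implementation.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (rows of (R, G, B) tuples) to gray scale using Eq. 1."""
    return [
        [0.2989 * r + 0.587 * g + 0.1140 * b for (r, g, b) in row]
        for row in rgb_image
    ]


# A 1 x 3 toy image: a pure red, a pure green and a white pixel.
image = [[(255, 0, 0), (0, 255, 0), (255, 255, 255)]]
gray = to_grayscale(image)
```

Because the three weights sum to 0.9999, a white pixel maps to a value just below 255.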

3.2.2 Dilation

Dilation is one of the basic operations in mathematical morphology [15]. The dilation operation usually uses a structuring element for probing and expanding the shapes contained in the input image [16]. The dilation of \( A \) by \( B \) is defined by:

$$ A \oplus B = \bigcup\limits_{b \in B} {A_{b} } $$
(2)

where \( A_{b} \) is the translation of \( A \) by \( b \).
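Eq. 2 can be illustrated directly on sets of pixel coordinates. The sketch below assumes that the foreground \( A \) is given as a set of (row, column) pairs and the structuring element \( B \) as a set of offsets; the function name `dilate` and the 1 × 3 kernel are hypothetical and do not reproduce the kernel of Fig. 2.

```python
def dilate(foreground, structuring_element):
    """Dilate a set of (row, col) foreground pixels by a set of (dr, dc) offsets.

    Implements Eq. 2 literally: the union of the translations A_b for b in B.
    """
    result = set()
    for (dr, dc) in structuring_element:
        # A_b: every foreground pixel shifted by the offset b = (dr, dc).
        result |= {(r + dr, c + dc) for (r, c) in foreground}
    return result


# Hypothetical 1 x 3 horizontal kernel centered on the origin.
kernel = {(0, -1), (0, 0), (0, 1)}
stroke = {(5, 5)}
dilated = dilate(stroke, kernel)
```

A single foreground pixel grows into a three-pixel horizontal stroke, which is the thickening effect that dilation is used for in this work.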

The dilation process is highly dependent on its structuring element [17]. If the structuring element does not suit the particular image, dilation may produce an unexpected, unsatisfactory result [18]. Thus, the structuring element must be defined properly. The dilation kernel used in this work is shown in Fig. 2. With this kernel, the text in the document image becomes more conspicuous.

Fig. 2 Dilation kernel

In this paper, dilation is used to thicken the strokes of the text in the document image while keeping the main text area, which makes it easier for the proposed algorithm to extract vital information such as peak points and valley points and to distinguish each text line. Fig. 3 compares the binary image of a text document with its dilated version.

Fig. 3 Before and after dilation

3.2.3 Noise removal

Noise removal is important in any kind of image processing task [19], especially for handwritten document images [20]. A scanned handwritten document image generally contains noisy points caused by dirt or by the scanning process. These points are harmful to the entire algorithm. Because the binarized image is dilated, the noisy points also become larger, which could affect subsequent steps. Filtering is a prevalent way to minimize or remove noise in images. Each filter commonly has a corresponding window; the larger the window, the blurrier the filtered result [21]. The window size must therefore be chosen appropriately; otherwise, the document image loses important information during filtering. In this paper, we use a mean filter for noise removal. Mean filtering is a simple, intuitive and easy-to-implement method of smoothing images, i.e. reducing the amount of intensity variation between one pixel and the next [22]. Noisy points in blank areas of the document image can thus be weakened or eliminated. For every pixel in the image, the filter calculates the average value over the corresponding window and replaces the original value with it:

$$ O\left( {x,y} \right) = \frac{1}{mn}\mathop \sum \limits_{{\left( {s,t} \right) \in S}} I\left( {s,t} \right) $$
(3)
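The following sketch shows one possible reading of Eq. 3, taking \( S \) as the \( m \times n \) window centered at \( (x, y) \), \( I \) as the input image and \( O \) as the output image. The border handling (clipping the window to the image and averaging over the remaining pixels) and the restriction to odd window sizes are assumptions, since the text does not specify them; the name `mean_filter` is likewise hypothetical.

```python
def mean_filter(image, m, n):
    """Replace each pixel by the mean of the m x n window centered on it.

    Assumes odd m and n. Border handling is an assumption: the window is
    clipped to the image and the mean is taken over the pixels that remain.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            r0, r1 = max(0, x - m // 2), min(rows, x + m // 2 + 1)
            c0, c1 = max(0, y - n // 2), min(cols, y + n // 2 + 1)
            window = [image[s][t] for s in range(r0, r1) for t in range(c0, c1)]
            out[x][y] = sum(window) / len(window)
    return out


# An isolated noisy pixel (value 9) in an otherwise blank 3 x 3 area.
noisy = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = mean_filter(noisy, 3, 3)
```

The isolated value is spread over its neighborhood and its own intensity drops, so a subsequent binarization is more likely to discard it.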

In addition, we use a mean filter to suppress the local extrema (minima and maxima) in the projection profile, which is calculated after all preprocessing is done. Several blurring parameters were tested to observe their effects; a window size of 30 by 30 pixels gave the best blurring effect and was selected as the blurring parameter in later experiments. Handwritten document images smoothed with different window sizes are compared in Fig. 4.

Fig. 4 Differently blurred images

3.2.4 Binarization

Document image binarization is a crucial phase that separates the text from the background by eliminating the remaining unimportant information [23]. The histograms of the original gray image and of the image blurred with a 30 × 30 window are shown in Fig. 5. As the second histogram shows, after the document image is blurred, the gray level of each black pixel is reduced [24]. Moreover, the gray levels of the pixels near the black ones are increased. This means that the binarization threshold must be chosen carefully.

Fig. 5 Histogram of document image

Thus, the Otsu thresholding method is used for image binarization [25]. It selects the threshold \( t \) that minimizes the intra-class variance:

$$ \sigma_{\omega }^{2} \left( t \right) = \omega_{0} \left( t \right)\sigma_{0}^{2} \left( t \right) + \omega_{1} \left( t \right)\sigma_{1}^{2} \left( t \right) $$
(4)

The weights \( \omega_{0} \) and \( \omega_{1} \) are the probabilities of the two classes separated by a threshold \( t \), i.e. the text lines and the background (black pixels and white pixels), and \( \sigma_{0}^{2} \) and \( \sigma_{1}^{2} \) are the variances of these two classes [26].
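A brute-force sketch of this selection is given below: it evaluates Eq. 4 for every candidate threshold and keeps the one with the smallest intra-class variance. The function name `otsu_threshold`, the convention that class 0 holds the gray levels below \( t \), and the toy pixel values are assumptions for illustration; practical implementations usually use a faster incremental formulation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that minimizes the intra-class variance of Eq. 4."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)

    def weight_and_variance(lo, hi):
        # Probability and variance of the gray levels in [lo, hi).
        count = sum(hist[lo:hi])
        if count == 0:
            return 0.0, 0.0
        mean = sum(g * hist[g] for g in range(lo, hi)) / count
        var = sum(hist[g] * (g - mean) ** 2 for g in range(lo, hi)) / count
        return count / total, var

    best_t, best_var = 0, float("inf")
    for t in range(1, levels):
        w0, v0 = weight_and_variance(0, t)        # class 0: levels below t
        w1, v1 = weight_and_variance(t, levels)   # class 1: levels at or above t
        if w0 == 0 or w1 == 0:
            continue  # skip thresholds that leave one class empty
        within = w0 * v0 + w1 * v1
        if within < best_var:
            best_t, best_var = t, within
    return best_t


# Toy data: dark "ink" pixels near 10 and bright "paper" pixels near 200.
pixels = [10, 11, 12, 10, 200, 201, 202, 200]
t = otsu_threshold(pixels)
```

On this toy input the returned threshold falls between the two clusters, so all ink pixels lie below it and all paper pixels at or above it.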

In this work, binarization also enhances the generality of the text lines in our document image. The binarization results for four differently blurred images are shown in Fig. 6.

Fig. 6 Binarization effect

As the binary images show, some noisy points are removed and the text area in the document image becomes smoother than in the original image. This is very conducive to computing a smooth projection profile. The projections computed after binarization of each differently blurred image are shown in Fig. 7.

Fig. 7 Projections after binarization

3.3 Text line segmentation

The widely used projection-based text line segmentation method calculates the average gap between successive text lines and then defines a constant threshold to separate them [27, 28]. However, when the threshold is constant, touching or closely spaced text lines might be missed. Therefore, the threshold must adapt to the different gaps between each pair of neighboring text lines.

In this work, after the horizontal projection profile \( H \) is calculated from the preprocessed image, the significant peaks, each of which may represent a potential text line, are extracted: their values are stored in set \( P \) and their locations in set \( P^{{\prime }} . \) Thresholding is then performed as follows: visit each element \( P\left( i \right) \) in set \( P \) and, for the given \( P\left( i \right), \) assign half of the peak value to the threshold \( T_{p} \). In other words, each peak yields its own threshold, equal to half of its projection value.

$$ T_{p} = P\left( i \right) \cdot \frac{1}{2} $$
(5)
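The per-peak threshold of Eq. 5 and the backward and forward scans described in the next paragraph can be sketched as follows. The function name `line_interval`, the toy profile and the fallback to the profile borders when no value drops below the threshold are assumptions for illustration.

```python
def line_interval(profile, peak_location):
    """Find the start and end of the text line around one projection peak.

    The threshold is half of the peak value (Eq. 5). The profile is scanned
    backward and forward from the peak until a value falls below the
    threshold. Falling back to the profile borders is an assumption.
    """
    threshold = profile[peak_location] / 2
    start = 0
    for i in range(peak_location, -1, -1):        # backward scan
        if profile[i] < threshold:
            start = i
            break
    end = len(profile) - 1
    for i in range(peak_location, len(profile)):  # forward scan
        if profile[i] < threshold:
            end = i
            break
    return start, end


# Toy profile with two text lines; the peaks are at indices 2 and 7.
profile = [0, 4, 10, 4, 1, 2, 9, 20, 9, 0]
intervals = [line_interval(profile, p) for p in (2, 7)]
```

The two peaks have different heights (10 and 20), so they are cut at different thresholds (5 and 10), which is the adaptive behavior that a single constant threshold cannot provide.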

Since each threshold is derived from a different peak of the horizontal projection, the threshold takes a different value for each pair of neighboring text lines. After a threshold is measured, the projection values are visited backward from the current peak location. As soon as a visited projection value is less than the threshold, its location is taken as the starting point of a text line and added to set \( S \), and the loop ends. The ending point is determined in the same way by visiting the projection values forward, and it is added to set \( E \). However, the intervals formed by these starting and ending points are not fully reliable for determining each potential text line. Therefore, interval inspection is performed to remove the intervals that do not represent a text line. The pre-estimated text-line intervals and their end points (starting, ending) are checked for validity and correctness by the following algorithm.

First, visit each element in sets \( S \) and \( E \); for a given start point \( S\left( i \right) \) and end point \( E\left( i \right), \) calculate the midpoint \( M_{i} \) of the interval using the equation below:

$$ M_{i} = \frac{S\left( i \right) + E\left( i \right)}{2} $$
(6)

Second, take the next interval’s start point \( S\left( {i + 1} \right) \). If it is greater than or equal to \( M_{i} \), the two intervals do not overlap and the next interval is accepted as a true interval; otherwise it is regarded as a false interval (Eq. 7) and is merged into the previous interval. This check makes the interval selection more reliable.

$$ \left\{ {\begin{array}{*{20}l} {S\left( {i + 1} \right) \ge M_{i} } \hfill & {true} \hfill \\ {S\left( {i + 1} \right) < M_{i} } \hfill & {false} \hfill \\ \end{array} } \right. $$
(7)
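The interval inspection of Eqs. 6 and 7 can be sketched as below. The sketch assumes that the intervals are sorted by start point, that a false interval is combined with its predecessor by taking the union of their extents, and that the midpoint is recomputed from the already merged interval; these details and the name `inspect_intervals` are not specified in the text and are assumptions.

```python
def inspect_intervals(starts, ends):
    """Merge false intervals into their predecessor using Eqs. 6 and 7."""
    if not starts:
        return []
    merged = [(starts[0], ends[0])]
    for s_next, e_next in zip(starts[1:], ends[1:]):
        s_prev, e_prev = merged[-1]
        midpoint = (s_prev + e_prev) / 2     # Eq. 6
        if s_next >= midpoint:               # Eq. 7: true interval
            merged.append((s_next, e_next))
        else:                                # false interval: combine with previous
            merged[-1] = (min(s_prev, s_next), max(e_prev, e_next))
    return merged


# The second interval starts before the midpoint (20) of the first one,
# so it is treated as a false interval and merged.
result = inspect_intervals([10, 18, 60], [30, 35, 80])
```

In this example the three candidate intervals collapse into two text lines, (10, 35) and (60, 80).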

After sets \( S \) and \( E \) are modified, straight lines are drawn to separate the text lines in the document image. The separator lines are drawn horizontally at the valley points of the horizontal projection profile between two adjacent estimated text-line positions.
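One way to locate these valley points is to take the row with the smallest projection value between the end of one interval and the start of the next. The sketch below follows that reading; the function name `separator_rows` and the choice of the first minimum when several rows tie are assumptions.

```python
def separator_rows(profile, intervals):
    """Return one separator row per pair of adjacent intervals.

    The separator is placed at the minimum of the projection profile between
    the end of one interval and the start of the next. Choosing the first
    minimum when several rows tie is an assumption.
    """
    rows = []
    for (_, end), (start, _) in zip(intervals, intervals[1:]):
        lo, hi = min(end, start), max(end, start)
        valley = min(range(lo, hi + 1), key=lambda i: profile[i])
        rows.append(valley)
    return rows


profile = [0, 4, 10, 4, 1, 2, 9, 20, 9, 0]
rows = separator_rows(profile, [(1, 3), (6, 8)])
```

For the toy profile the separator falls on row 4, where the projection between the two lines is lowest.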

In terms of computational complexity, the projection calculation depends on the height and width (rows and columns) of the document image. Because the peaks are extracted from the projection vector (a one-dimensional array), the peak extraction stage is linear. Line drawing is also linear. Thus, the final expression for the time complexity is:

$$ O\left( n \right) = rc\left( {w*h} \right) $$
(8)

where r and c refer to the numbers of rows and columns of the binarized document image, and w and h refer to the width and height of the filter.

3.4 Algorithm

The pseudo code of proposed algorithm is shown in Table 1.

Table 1 Pseudo code of algorithm
  • Step 1: Read a handwritten document image as a multi-dimensional array;

  • Step 2: Convert the raw image to gray scale image and binarize the gray image;

  • Step 3: Dilate the binarized image;

  • Step 4: Blur the dilated image;

  • Step 5: Binarize the blurred image;

  • Step 6: Calculate the horizontal projection profile of binarized image;

  • Step 7: Add the peaks that lie above the mean value of the projection to set P and store their locations in set P’;

  • Step 8: For each element in set P, calculate the threshold as half of the peak value. Visit the elements of the projection vector forward and backward from the current peak’s location to determine the ending point and the starting point, respectively. The first location where the projection value is less than the threshold is taken as the starting or ending point and added to set S or set E;

  • Step 9: For each interval, calculate the midpoint Mi and compare it with the next interval’s start point. If the start point is greater than or equal to Mi, accept the next interval as a true interval. Otherwise it is regarded as a false interval and merged into the previous interval;

  • Step 10: Draw a straight line at the valley point between two adjacent intervals according to the horizontal projection profile (HPP);

  • Step 11: End.
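Steps 6 and 7 of Table 1 can be sketched as follows. The sketch assumes that text pixels have the value 1 in the binarized image and that a peak is a local maximum of the profile (strictly greater than its upper neighbor and at least as large as its lower neighbor); the text only states that peaks above the mean are kept, so this peak definition and the function names are assumptions.

```python
def horizontal_projection(binary_image):
    """Count the foreground (value 1) pixels in every row of a binary image."""
    return [sum(row) for row in binary_image]


def significant_peaks(profile):
    """Return (values, locations) of local maxima above the mean of the profile."""
    mean = sum(profile) / len(profile)
    values, locations = [], []
    for i in range(1, len(profile) - 1):
        is_peak = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_peak and profile[i] > mean:
            values.append(profile[i])
            locations.append(i)
    return values, locations


# Toy binary image with two text lines (rows 1-3 and row 5).
image = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
profile = horizontal_projection(image)
P, P_prime = significant_peaks(profile)
```

Here `P` and `P_prime` play the roles of the sets P and P’ in Table 1: one peak value and one row location per candidate text line.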

4 Experimental result

4.1 Database

To verify the proposed algorithm, we collected 210 Uyghur handwritten document images containing 2570 text lines. The documents were written by different writers, so they vary in length and handwriting style. The handwriting styles in the established database are broadly categorized into three types: (1) neatly written text lines of random length; (2) text lines of similar length in a casual style that contain many overlaps and ligatures; (3) skewed normal handwriting. Fig. 8 shows typical examples of these handwriting styles in the database. Each document image is stored separately in TIF format. The image size of the samples varies from 1477 × 944 to 2175 × 2277 pixels.

Fig. 8 Three samples of database

Additionally, we collected the Polish handwritten document images from the website provided by Ptak et al. [11]. The dataset includes 29 pairs of Polish document images, 58 images in total. The images fall into two classes: documents that contain short text lines and documents whose text lines are of almost equal length. Each document is stored as a pair: a version with text lines of random length and a version with text lines of mostly identical length. The writing style differs from image to image. Some documents are very neatly written but severely sloped in multiple directions. Others are not sloped but written in an extremely casual style. Because of these challenging features, testing on this dataset is also a useful evaluation of the proposed algorithm.

Finally, the proposed algorithm is also tested on a public offline handwriting dataset, the IAM dataset [29], to evaluate its performance. It contains unconstrained handwritten text, which was scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels. A sample from the IAM database is shown in Fig. 9.

Fig. 9 Sample of IAM English handwriting dataset

4.2 Evaluation method

In this paper, we calculate precision, recall and the F-measure to evaluate the performance of the proposed algorithm [30]. Precision is obtained by manually counting the total number of segmented text lines and the number of correctly segmented text lines; recall is obtained by counting the total number of text lines and the number of correctly segmented text lines. The F-measure is then calculated from precision and recall.

$$ P = \frac{{L_{c} }}{{L_{s} }} $$
(9)
$$ R = \frac{{L_{c} }}{{L_{t} }} $$
(10)
$$ F = \frac{2PR}{P + R} $$
(11)

where \( L_{c} \) and \( L_{s} \) denote the numbers of correctly segmented text lines and of all segmented text lines, respectively, and \( L_{t} \) is the total number of text lines in the document image.
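Eqs. 9 to 11 translate directly into code. In the sketch below the function name `evaluate` and the line counts are hypothetical and serve only to illustrate the arithmetic; they are not results from the experiments.

```python
def evaluate(correct, segmented, total):
    """Return precision, recall and F-measure as defined in Eqs. 9-11."""
    precision = correct / segmented   # Eq. 9:  L_c / L_s
    recall = correct / total          # Eq. 10: L_c / L_t
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. 11
    return precision, recall, f_measure


# Hypothetical counts: 100 lines segmented, 90 of them correct, 120 lines in total.
p, r, f = evaluate(correct=90, segmented=100, total=120)
```

With these counts the precision is 0.9 and the recall is 0.75, so an algorithm that misses lines is penalized in recall even when most of the lines it does produce are correct.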

4.3 Result and analysis

Several algorithms, including projection-based ones, are tested on the datasets introduced above for comparison with the proposed algorithm. The algorithms and their segmentation mechanisms are briefly described below.

The participant algorithm takes three parameters: the input image, the window size of the filter and the relative threshold. The optimum parameter values are a window size of 9 and a relative threshold of 0.5. The experimental results of text-line segmentation on our dataset are shown in Fig. 10 and Table 2. For comparison, we evaluated the participant algorithm on our database.

Fig. 10 Comparison of algorithms

Table 2 Result of experiments

In the participant algorithm [11], the Polish document image is preprocessed by conversion to gray scale, binarization and noise reduction. The projection profile of the preprocessed image is then computed and sorted in descending order. Each value of the sorted projection is visited in turn to determine the threshold. Each time the algorithm chooses a threshold, the text lines are segmented accordingly. If the text lines are already segmented, the algorithm continues to the next iteration. The algorithm stops when the current projection value is less than 1/10 of the maximum projection value.

In contrast, the preprocessing stage of our algorithm has one more step, dilation. This ensures that the important features of the text in the document image are not removed by noise reduction. For threshold measurement, we extract the locations of the significant peaks to determine the threshold rather than sorting the entire projection profile. In the text line extraction stage, our algorithm starts from the location of a significant peak and terminates when it finds the starting point or ending point of an interval, rather than visiting all projection values. To check whether the extracted text lines are correctly segmented, we use a checking mechanism that is entirely different from that of the participant algorithm. The participant algorithm simply discards the currently segmented text line if it overlaps with a previously segmented interval, even if the overlap is not severe. In our checking mechanism, we consider each pair of adjacent intervals and check whether the next interval’s start point is greater than or equal to the current interval’s midpoint.

According to the results of the two segmentation algorithms in Table 2, the proposed algorithm outperforms the participant algorithm in recall and F-measure. Although the precision of the participant algorithm is higher than that of the proposed algorithm, its recall is much lower. This means that method [11] is not as strong as the proposed algorithm at text-line detection. The segmentation precision of the participant algorithm is high for neatly written text lines, but it does not detect enough of the text lines. Some text-line segmentation results of the two compared algorithms are illustrated in Fig. 11. In sample (a), a neatly written handwriting sample, the participant algorithm is unable to detect and segment short text lines. Although the text lines in sample (b) are mostly similar in length, the casual writing style and the skewed text lines reduce the participant algorithm’s accuracy. Even where the participant algorithm detects one of the skewed text lines, the segmentation is incorrect. In contrast, our algorithm segments all text lines in both samples properly.

Fig. 11 Result of two different algorithms

The proposed algorithm and algorithm [11] are also tested on the Polish handwriting documents. In this experiment, the proposed segmentation method again outperforms the compared method. However, the results of both algorithms are not promising, because the test dataset is very challenging and the segmentation conditions are extreme. The results show that the proposed algorithm detected and segmented most text lines in this Polish document image. However, the proposed algorithm also makes errors where text lines are skewed: although it detected every text line in the image, the segmentation is not correct, because skewed text lines affect the extraction of the significant peaks of the projection profile. Algorithm [11] is not sensitive to short text lines; when they occur in a document, the algorithm is not able to segment them. Finally, the recall rates of the proposed algorithm and algorithm [11] are 63.23% and 38.06%, respectively.

We tested several algorithms on our Uyghur documents; Table 3 gives the result of each algorithm. The results show that the proposed algorithm also outperforms the other compared algorithms.

Table 3 Comparison of algorithms on Uyghur dataset

In addition, the proposed algorithm is also tested on the public IAM handwriting dataset. The experimental results show that our method is also promising on a public handwriting dataset. As Table 4 shows, the proposed algorithm also performs better than other recent approaches on the same dataset.

Table 4 Comparison of algorithms on IAM dataset

In the final stage of the segmentation process, the detected text lines are separated from the original image, and every separated text line is stored as an individual line image. Some of the segmented line images are shown in Fig. 12.

Fig. 12 Result of two different algorithms

As the separation results show, sample A, which is neatly written and has a significant gap between text lines, is separated easily with all of its content, and no significant information is lost during the separation process. In the recognition stage [31], this enhances recognition accuracy by providing a whole text line. In contrast, because some text lines in other types of document images are skewed, the separated line image loses important information, including parts of words or characters, even when the line is accurately detected. In this scenario, handwriting recognizers are directly affected and produce incorrect recognition. The consequence of this kind of segmentation is illustrated in Fig. 13.

Fig. 13 Result of two different algorithms

5 Conclusion

This paper proposed a novel approach to off-line Uyghur handwritten text line segmentation using projection-based adaptive threshold selection; the approach is not affected by the length of the text lines in a handwritten document. The proposed algorithm is verified on 210 different Uyghur handwritten document images and 29 pairs of Polish document images (58 images in total) including 1474 text lines. The experimental results show the robustness of the proposed text line segmentation algorithm. On our dataset, the recall of the proposed text-line segmentation algorithm is 97.70%, much higher than the 82.35% recall of the compared algorithm. On the Polish document dataset, the recall of the proposed algorithm is 63.23%, considerably higher than the 38.06% of algorithm [11]. Finally, on the public IAM handwriting dataset, the proposed algorithm also performs better than the recent approach. The increase in segmentation rate means that the subsequent stages can be carried out more reliably. However, the proposed algorithm has some disadvantages that stem from its simple projection-based mechanism. If the writing direction of a document is severely skewed, the performance of the proposed algorithm declines, or the algorithm is even unable to segment the skewed text lines. Another factor that degrades performance is incorrect peak extraction from the calculated projection profile, caused by overlapping text lines and closely written neighboring text lines. Developing a more comprehensive and general text-line segmentation algorithm that is able to segment skewed text lines is the main subject of our future work.