Abstract
Handwritten character recognition has been a well-known area of research for the last five decades. It is an important application of pattern recognition in image processing. Generally, 2D scanning is used and the text is captured in the form of an image. In this work, instead of the regular scanning method, the X and Y co-ordinates are measured at every selected pixel point using a measuroscope. In addition, a 3D feature, the depth of indentation ‘Z’, which is proportional to the pressure applied by the scriber at that point, is measured using a dial gauge indicator. The profile-based features extracted for palm leaf character recognition in the present work are the ‘histogram’ and ‘distance’ profiles. Using the Z dimension, a 3D feature, gives high recognition accuracy; the best result obtained is 92.8 % using the histogram profile algorithm.
1 Introduction
Technological developments gave printed character recognition a new dimension called Optical Character Recognition (OCR). In the initial stages, most of the work was devoted to printed English characters, because English has more speakers than any of the Indian languages. However, handwritten character recognition became important in due course for applications such as automatic mail sorting (zip code identification). Further, signature verification in forensic departments and verifying the authenticity of a document written by a specific scriber increased the importance of handwritten character recognition.
2 Previous Work
Sastry and Krishnan developed a database and test characters for Palm Leaf Character Recognition (PLCR) pertaining to Telugu (a south Indian language) [1,2,3,4,5]. They developed several PLCR models using 2D correlation, PCA and the Radon transform [1,2,3]. Based on the measure of similarity, all the Telugu characters were divided into three co-ordinate planes, namely XZ, XY and YZ. For all the methods, the best recognition accuracy was reported in the YZ plane, with a maximum of 90 %.
Patvardhan et al. [6, 7] presented a denoising approach using the discrete curvelet transform and a binarization technique using wavelets.
Aradhya et al. [8] proposed character identification using a combination of the Fourier transform (FT) and PCA. The documents were scanned on an HP Scanjet 2400 scanner and subsequently skew corrected. Aradhya et al. [9] presented text recognition from videos and images using Gabor filters and wavelet transforms. Aradhya et al. also presented recognition of numerals and multilingual text using wavelet transforms and wavelet entropy, respectively.
Vijaya Lakshmi et al. [10,11,12] worked on isolated Telugu handwritten characters using zoning techniques and hybrid classification approaches. In [12] they reported that the recognition accuracy can be improved by classifying the characters with two classifiers, viz., k-NN and SVM. For various feature extraction methods they reported improved recognition accuracy using a two-stage classification approach.
3 Data Acquisition
In general, for any character recognition system the documents are scanned and stored in a computer for further analysis. When documents are scanned, noise and skew problems are inevitable. In the present work, the X, Y and Z co-ordinates are measured in a laboratory setup using a measuroscope and a dial gauge indicator with a plunger assembly [1, 3]. After a basic isolated Telugu character is selected, its X and Y co-ordinates are measured with the measuroscope, using a probe attached to the moving axis of the machine.
The points selected for measurement are the starting point, bends, intersections, turns and end points of the character, as shown in Fig. 1. The leftmost pixel of the character is taken as the origin ‘O’, and the co-ordinates of the remaining pixels are measured with reference to ‘O’ [1]. The number of pixel points per character varies from 13 to 30, depending on the shape of the Telugu character [1].
The depth of indentation (Z) is proportional to the pressure applied by the scriber’s stylus on the palm leaf [1,2,3]. A Teflon needle is attached to the dial indicator plunger to measure the depth of a character at a selected point. The distance to the bottom of the pixel point is measured and recorded as \( Dist_{1} \). The distance is then measured on the top of the palm script and recorded as \( Dist_{2} \). The difference between \( Dist_{1} \) and \( Dist_{2} \) gives the depth of indentation for the selected pixel. This method is followed for the different Telugu characters on the palm leaf, and the measurements of X and Y are recorded. The depth of indentation, obtained with the dial indicator and plunger assembly, varies from 10 to 150 microns. It is the 3D feature that differentiates highly similar characters and gives the best results in PLCR.
For every Telugu character, the three co-ordinates (X, Y, Z) are obtained in order to build the training and test images. Using the X and Y co-ordinates, a pattern is developed and termed the image of the XY plane. In the same way, the values of X, Y and Z are used to develop images in the XZ, XY and YZ co-ordinate planes. In each plane of projection, 112 training samples are developed from 28 different classes, i.e., 4 samples per class, and 28 samples from the different classes are used for testing. A total of 420 character patterns, (112 + 28) × 3, are thus developed across the three planes. The training and testing samples are mutually disjoint.
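The depth computation and the sample counts described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the gauge readings and the point co-ordinates are hypothetical values chosen only to fall within the 10 to 150 micron range quoted in the text, and the function name `indentation_depth` is an assumption.

```python
def indentation_depth(dist_bottom, dist_top):
    """Depth of indentation Z at one point: difference of the two gauge readings."""
    return abs(dist_bottom - dist_top)

# Hypothetical dial gauge readings (Dist1, Dist2) in microns, one pair per point.
readings = [(512.0, 430.0), (498.0, 455.0), (505.0, 380.0)]
depths = [indentation_depth(d1, d2) for d1, d2 in readings]

# Hypothetical (X, Y) positions relative to the origin 'O', paired with the depths.
xy_points = [(0.0, 1.5), (0.8, 2.1), (1.6, 0.9)]
points = [(x, y, z) for (x, y), z in zip(xy_points, depths)]

# Each measured point projects onto the three co-ordinate planes.
planes = {
    "XY": [(x, y) for x, y, z in points],
    "YZ": [(y, z) for x, y, z in points],
    "XZ": [(x, z) for x, y, z in points],
}

# Sample counts stated in the text: 28 classes, 4 training samples per class,
# 28 test samples, repeated for each of the three planes.
train_per_plane = 28 * 4
total_patterns = (train_per_plane + 28) * len(planes)
```

Here `train_per_plane` evaluates to 112 and `total_patterns` to 420, matching the figures given in this section.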
4 Proposed Recognition Model
The proposed recognition model consists of three steps: preprocessing, feature extraction and classification. These steps are performed for every projection plane, and the resulting recognition accuracies are reported in Sect. 5.
4.1 Preprocessing
The character images are binarized using Otsu’s thresholding technique. The minimum bounding rectangle method is used to normalize all the images to a size of \(50\,\times \,50\). Because the data acquisition method does not involve scanning the palm leaf, the problems of skew and noise are avoided.
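The two preprocessing steps can be sketched in NumPy as below. This is an illustrative sketch under several assumptions that the text does not state: strokes are brighter than the background, the image contains at least one foreground pixel, and resizing uses nearest-neighbour sampling. The function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold for a uint8 image: maximises the between-class variance."""
    prob = np.bincount(gray.ravel(), minlength=256) / gray.size
    omega = np.cumsum(prob)                    # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    denom = omega * (1.0 - omega)
    num = (mu[-1] * omega - mu) ** 2
    sigma_b = np.where(denom > 1e-12, num / np.maximum(denom, 1e-12), 0.0)
    return int(np.argmax(sigma_b))

def normalize_to_square(binary, size=50):
    """Crop to the bounding rectangle of the foreground, then resize to size x size."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    crop = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Nearest-neighbour sampling (an assumption; the text does not name a method).
    r_idx = np.arange(size) * crop.shape[0] // size
    c_idx = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(r_idx, c_idx)].astype(np.uint8)

# Synthetic test pattern: a bright "T"-like stroke on a dark background.
gray = np.full((20, 30), 30, dtype=np.uint8)
gray[5:15, 8:12] = 200
gray[9:11, 8:25] = 200

binary = gray > otsu_threshold(gray)
norm = normalize_to_square(binary)
```

After cropping, the stroke touches all four borders of the \(50\,\times \,50\) image, which is what makes the later profile features comparable across characters of different sizes.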
4.2 Feature Extraction
The two features extracted from the characters and used for recognition are Histogram profile and Distance profile.
4.2.1 Histogram Profile
Histograms projected in specified directions are created. The directions considered are vertical, horizontal and the two diagonals (left and right). The vertical \( (V_{q}) \) and horizontal \( (H_{p}) \) histograms are computed using Eq. (1):
\[ V_{q} = \sum_{p=1}^{N} I(p, q), \qquad H_{p} = \sum_{q=1}^{N} I(p, q) \qquad (1) \]
for q \(=\) 1:N and p \(=\) 1:N respectively, where I(p, q) is the character image of size N \(\times \) N.
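A minimal NumPy sketch of the vertical and horizontal histograms follows. It assumes that \( V_{q} \) sums the pixels of column q and \( H_{p} \) sums the pixels of row p of a binary image I(p, q), which is the usual reading of a projection histogram; the function names are hypothetical.

```python
import numpy as np

def vertical_histogram(img):
    """V_q: sum of I(p, q) over all rows p, one value per column q."""
    return img.sum(axis=0)

def horizontal_histogram(img):
    """H_p: sum of I(p, q) over all columns q, one value per row p."""
    return img.sum(axis=1)

# Small 3 x 3 binary example (1 = stroke pixel).
img = np.array([[0, 1, 0],
                [1, 1, 0],
                [0, 1, 1]])

v = vertical_histogram(img)    # [1, 3, 1]
h = horizontal_histogram(img)  # [1, 2, 2]
```

Each histogram has N entries for an N \(\times \) N image, and both sum to the total number of stroke pixels.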
The left diagonal (\( LD_{j} \)) and right diagonal (\( RD_{j} \)) histograms are computed using Eq. (2) for j \(=\) \(-(N-1):(N-1)\).
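The diagonal histograms can be sketched in the same way. Which diagonal direction is called "left" and which "right" is an assumption here: the sketch takes \( LD_{j} \) as the sum along the diagonal with column index minus row index equal to j, and \( RD_{j} \) as the same sum on the horizontally mirrored image. The index range j \(=\) \(-(N-1):(N-1)\) follows the text.

```python
import numpy as np

def diagonal_histograms(img):
    """Return (LD, RD), each of length 2N - 1, for a square N x N image."""
    n = img.shape[0]
    offsets = range(-(n - 1), n)               # j = -(N-1) ... (N-1)
    ld = np.array([np.trace(img, offset=j) for j in offsets])
    # Mirroring left-to-right turns anti-diagonals into diagonals.
    mirrored = np.fliplr(img)
    rd = np.array([np.trace(mirrored, offset=j) for j in offsets])
    return ld, rd

img = np.array([[0, 1, 0],
                [1, 1, 0],
                [0, 1, 1]])

ld, rd = diagonal_histograms(img)  # ld = [0, 2, 2, 1, 0], rd = [1, 1, 1, 2, 0]
```

Under these assumptions, concatenating the four histograms gives N + N + 2(2N - 1) = 6N - 2 values, i.e. 298 for a \(50\,\times \,50\) image; the text itself does not state this length.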
For a sample character ‘va’ the four histograms are shown in Fig. 2.
4.2.2 Distance Profile
The distances of a character image are measured from its bounding box in specified directions. The directions considered are top to bottom, bottom to top, left to right and right to left. Each direction contributes N distance values, so the size of the feature vector for an N\(\,\times \,\)N image is 4N.
For a sample character ‘va’ the four distance profiles are shown in Fig. 3.
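The distance profile described in this subsection can be sketched as below. The sketch assumes that each distance is the number of background pixels between the image border and the first stroke pixel along that row or column, and that a row or column without any stroke pixel takes the full image dimension; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def distance_profile(binary):
    """Concatenate top, bottom, left and right distance profiles (length 4N)."""
    fg = np.asarray(binary).astype(bool)
    n_rows, n_cols = fg.shape
    has_col = fg.any(axis=0)
    has_row = fg.any(axis=1)
    # argmax on a boolean array returns the index of the first True value.
    top = np.where(has_col, fg.argmax(axis=0), n_rows)
    bottom = np.where(has_col, fg[::-1].argmax(axis=0), n_rows)
    left = np.where(has_row, fg.argmax(axis=1), n_cols)
    right = np.where(has_row, fg[:, ::-1].argmax(axis=1), n_cols)
    return np.concatenate([top, bottom, left, right])

img = np.array([[0, 1, 0],
                [1, 1, 0],
                [0, 1, 1]])

features = distance_profile(img)
# top = [1, 0, 2], bottom = [1, 0, 0], left = [1, 0, 1], right = [1, 1, 0]
```

For a normalized \(50\,\times \,50\) character image this yields a 200-element feature vector, in agreement with the 4N size given above.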
5 Experimental Results and Discussions
Similar Telugu characters are classified into 6 groups based on the correlation coefficient [1, 3]. As reported in the literature, the correlation coefficient is more than 0.75 for characters in the same group [1, 3]. Many Telugu characters are highly similar, so their patterns are easily confused during recognition. Table 1 shows the three co-ordinates of the character “Ba”.
A few similar Telugu characters from the same group, “Ae”, “Na” and “Pa”, are shown in Fig. 4.
The corresponding YZ images are shown in Fig. 5. These patterns are highly uncorrelated for any two different Telugu palm leaf characters, so the proposed method resolves the problem of recognizing similar Telugu characters to a large extent. Further, these patterns are highly repeatable for any given palm leaf character.
The corresponding XZ images are shown in Fig. 6. These patterns are also clearly different from one another, so the recognition accuracy of the proposed approach improves accordingly.
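The similarity criterion behind the grouping can be illustrated with a short sketch. It assumes the correlation coefficient is the Pearson correlation between two equally sized character images, flattened to vectors; the 0.75 threshold is the value quoted from [1, 3], while the function names and the toy patterns are hypothetical. How the six groups themselves were formed is not described here and is not reproduced.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.75  # value quoted in the text from [1, 3]

def correlation_coefficient(img_a, img_b):
    """Pearson correlation between two equally sized images."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def are_similar(img_a, img_b, threshold=SIMILARITY_THRESHOLD):
    """True when the two patterns would count as highly similar."""
    return correlation_coefficient(img_a, img_b) > threshold

# Toy patterns: a horizontal band, the same band with one extra pixel,
# and a vertical band.
band = np.zeros((4, 4), dtype=np.uint8)
band[1:3, :] = 1
band_plus = band.copy()
band_plus[0, 0] = 1
vertical = band.T

similar_pair = are_similar(band, band_plus)     # correlation is about 0.88
different_pair = are_similar(band, vertical)    # correlation is about 0
```

Note that `np.corrcoef` returns NaN for a constant image, so blank patterns would need separate handling.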
The proposed recognition model is compared with the published methods on the same dataset in Table 2. Both the proposed and the existing methods use the k-NN classifier for character classification. Sastry et al. [1,2,3] contributed work on palm leaf character recognition using the Radon transform, PCA and 2D correlation approaches.
Sastry et al. reported a maximum Recognition Accuracy (RA) of 90 % in the YZ co-ordinate system using the 2D correlation approach [1]. This is the spatial-domain approach against which the proposed model is compared.
Sastry and Krishnan reported a maximum RA of 89 % in the YZ co-ordinate system using the Radon transform [2]. Palm leaf character recognition was carried out by considering the image intensities along a radial line [2]. This is the transform-domain approach against which the proposed model is compared.
Sastry et al. [3] reported that the recognition accuracy using Principal Component Analysis (PCA) is less than 50 % for the XZ, XY and YZ planes, as shown in Table 2.
In the proposed method the YZ plane of projection gave the best results, which is in line with the existing methods. This is attributed to the large variation in the Y direction between any two similar Telugu palm leaf characters, which is an inherent characteristic of the script. The 3D feature, i.e., the depth of indentation (Z dimension), also contributes to the best results. The RA is higher for the YZ plane than for the XY and XZ planes of projection; the best RA obtained is 92.8 % in the YZ plane.
The recognition rate along the XZ projection plane is 89.2 % using the histogram profile approach, which is higher than the recognition accuracy in the XY plane of projection. This improvement is attributed to the 3D feature, i.e., the ‘Z’ dimension.
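A k-NN classification step of the kind referred to above can be sketched as follows. The value of k and the distance metric are not given in the text, so the sketch assumes k = 1 by default and Euclidean distance; the function names and the toy feature vectors are hypothetical and stand in for the histogram or distance profile features.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=1):
    """Label of the majority class among the k nearest training vectors."""
    dists = np.linalg.norm(np.asarray(train_x, dtype=float) - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[int(i)] for i in nearest)
    return votes.most_common(1)[0][0]

def recognition_accuracy(train_x, train_y, test_x, test_y, k=1):
    """Percentage of test vectors whose predicted label is correct."""
    correct = sum(
        knn_predict(train_x, train_y, np.asarray(q, dtype=float), k) == y
        for q, y in zip(test_x, test_y)
    )
    return 100.0 * correct / len(test_y)

# Toy two-class example with 2D feature vectors.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y = ["a", "a", "b", "b"]
test_x = np.array([[0.2, 0.1], [5.5, 5.2]])
test_y = ["a", "b"]

ra = recognition_accuracy(train_x, train_y, test_x, test_y)  # 100.0 on this toy data
```

In the setting of this paper, `train_x` would hold the 112 training feature vectors of one projection plane and `test_x` the 28 test vectors of the same plane.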
6 Conclusions and Future Work
There is no standard database for Indian languages, including Telugu, as reported in the literature. Because Telugu has many characters, with inherently curvilinear shapes and a large number of modifiers, achieving high recognition accuracy is a challenging task. In the XY plane, the recognition accuracy obtained is only 51 % and 35.7 % using the histogram computation and distance profile methods, respectively. The scriber uses a stylus to scribe Telugu characters on the palm leaf, which produces varying depth values along the contour of the character. These values become the distinguishing features in palm leaf character recognition. Since many Telugu characters are highly similar, the depth information of the pixels along the contour (Z, measured in microns) helped to improve the recognition accuracy from 35.7 % to 92.8 % using the proposed methods. As future work, automatic scanning of the palm leaf characters can be developed to reduce human intervention and the time needed for data acquisition.
References
1. P.N. Sastry, R. Krishnan, B.V.S. Ram, Classification and identification of Telugu handwritten characters extracted from palm leaves using decision tree approach. J. Appl. Eng. Sci. 5(3), 22–32 (2010)
2. P.N. Sastry, R. Krishnan, Isolated Telugu palm leaf character recognition using radon transform, a novel approach, in World Congress on Information and Communication Technologies (WICT), (2012), pp. 795–802
3. P.N. Sastry, R. Krishnan, B.V.S. Ram, Telugu character recognition on palm leaves- a three dimensional approach. Technol. Spectr. 2(3), 19–26 (2008)
4. P.N. Sastry, T.R. Vijaya Lakshmi, R. Krishnan, N. Rao, Analysis of Telugu palm leaf characters using multi-level recognition approach. J. Appl. Eng. Sci. 10(20), 9258–9264 (2015)
5. T.R. Vijaya Lakshmi, P.N. Sastry, R. Krishnan, N.V.K. Rao, T.V. Rajinikanth, Analysis of Telugu palm leaf character recognition using 3D feature, in International Conference on Computational Intelligence and Networks (CINE), (2015), pp. 36–41
6. C. Patvardhan, A.K. Verma, C.V. Lakshmi, Document image binarization using wavelets for OCR applications, in Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, (ACM, 2012), pp. 60:1–60:8
7. C. Patvardhan, A.K. Verma, C.V. Lakshmi, Document image denoising and binarization using curvelet transform for OCR applications, in Engineering (NUiCONE), 2012 Nirma University International Conference on, (Dec 2012), pp. 1–6
8. V.M. Aradhya, G.H. Kumar, S. Noushath, Multilingual OCR system for south Indian scripts and English documents: an approach based on Fourier transform and principal component analysis. Eng. Appl. Artif. Intell. 21(4), 658–668 (2008)
9. V.M. Aradhya, M. Pavithra, A comprehensive of transforms, Gabor filter and k-means clustering for text detection in images and video. Appl. Comput. Inf. (2014)
10. P.N. Sastry, T.R. Vijaya Lakshmi, N.V.K. Rao, T.V. Rajinikanth, A. Wahab, Telugu handwritten character recognition using Zoning features, in International Conference on IT Convergence and Security (ICITCS), (Beijing, 2014), pp. 1–4
11. T.R. Vijaya Lakshmi, P.N. Sastry, T.V. Rajinikanth, Recognition of isolated Telugu handwritten characters using 2D FFT, in Proceedings of the 2nd International Conference on Advanced Computing Methodologies, ser. ICACM’13, (2013), pp. 372–376
12. T.R. Vijaya Lakshmi, P.N. Sastry, T.V. Rajinikanth, Hybrid approach for Telugu handwritten character recognition using k-NN and SVM classifiers. Int. Rev. Comput. Softw. 10(9), 923–929 (2015)
Acknowledgements
This work is sponsored by AICTE under the Research Promotion Scheme. The authors thank AICTE for motivating this work, which is useful to society.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sastry, P.N., Vijaya Lakshmi, T.R., Koteswara Rao, N.V., RamaKrishnan, K. (2017). A 3D Approach for Palm Leaf Character Recognition Using Histogram Computation and Distance Profile Features. In: Satapathy, S., Bhateja, V., Udgata, S., Pattnaik, P. (eds) Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications . Advances in Intelligent Systems and Computing, vol 516. Springer, Singapore. https://doi.org/10.1007/978-981-10-3156-4_40
DOI: https://doi.org/10.1007/978-981-10-3156-4_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3155-7
Online ISBN: 978-981-10-3156-4
eBook Packages: Engineering, Engineering (R0)