Writer Identification System for Handwritten Gurmukhi Characters: Study of Different Feature-Classifier Combinations

Sakshi; Garg, Naresh Kumar; Kumar, Munish

doi:10.1007/978-981-10-6319-0_11

Sakshi⁵,
Naresh Kumar Garg⁵ &
Munish Kumar⁶

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies ((LNDECT,volume 9))

616 Accesses
12 Citations

Abstract

In this paper, we are exploring various features and classifiers for writer identification in light of Gurmukhi text handwriting. The identification of the writers based on a piece of handwriting is a challenging task for pattern recognition. The writer identification framework proposed in this paper includes diverse stages like image preprocessing, feature extraction, training, and classification. The framework first prepares a skeleton of the character so that meaningful data about the handwriting of writers can be extracted. The feature extraction stage incorporates various plans, namely, zoning, diagonal, transition, intersection and open end points, centroid, the horizontal peak extent, the vertical peak extent, parabola curve fitting, and power curve fitting based features. In order to assess the prominence of these features, we have used four classification techniques, namely, Naive Bayes, Decision Tree, Random Forest and AdaBoostM1. For experimental results, we have collected 49,000 samples from 70 different writers. In this work, maximum accuracy of 81.75% has been obtained with centroid features and AdaBoostM1 classifier.

Access provided by CONRICYT-eBooks. Download conference paper PDF

A Comparative Study of Tree-based Structure Methods for Handwriting Identification

Improved recognition results of offline handwritten Gurumukhi characters using hybrid features and adaptive boosting

Article 25 July 2021

Tree-base Structure for Feature Selection in Writer Identification

Keywords

1 Introduction

Writing is implied by the representation of language in a textual medium using an arrangement of ‘signs or symbols’. Every individual has his own writing style which relies on a considerable measure of variables like particular shape of letters, spacing between letters, etc. Handwriting of a person is also subject to the mental condition of the individual like his level of inspiration, anger, joy, and others. In any case, it is found that handwriting of a person is generally steady however might be influenced gradually with age. Writer identification, in general plays an important role in forensic, writer verification schemes, and related branches of science. The parameters are typically considered for writer identification is comprehensiveness uniqueness, aging, accessibility, etc. The same is applicable in Gurmukhi script too with an extra essentialness of finding the advancement and development of the Gurmukhi script. It is clear that the significance of writer identification has turned out to be more vital nowadays. So, the number of researchers engaged with this challenging issue, is going on top of these opportunities. There are many languages all through the world. Each language represents an alternate danger to the writer identification issue contingent upon the characteristics of the language. It is clear that the identification problem differs across various languages. The handwriting-based writer identification is an active research area. A comprehensive survey of writer identification till 1989 was presented by Lorette and Plamondon [8, 11]. Zois and Anastassopoulos [14] have presented a writer identification framework based on English and Greek writers. They have achieved an accuracy of 95% for writer identification of both English and Greek writers. Leeham et al. [9] proposed a technique to recognize the writers based on numerals written by them. Schlapbach and Bunke [12, 13] presented a Hidden Markov Model (HMM) based writer identification and verification strategy. They prepared an individual HMM for each writer’s handwriting. The identification technique was tested with data set collected from 650 scholars. Gazzah and Amara [2] have proposed an approach for writer identification framework based on offline handwritten Arabic script documents. The proposed technique depends on combining the global and neighborhood feature sets by using the genetic algorithm so as to take out the redundant and irrelevant features. They have considered two classifiers, SVM and MLP for recognition. They have noticed that MLP performs preferable outcomes over SVM and got precision of around 94%. Ghiasi and Safabakhsh [3] have proposed an effective technique for writer recognition using a code book. They have utilized Farsi database which incorporates short, medium, and extensive messages and results demonstrate that the productivity of short messages are more effective. Maadeed [10] has proposed a writer identification framework based on Arabic handwritten text. He has also compared the edge direction distribution features with other features of Arabic text and used k-NN classifier for recognition. Writer identification framework is divided into two categories, namely, text dependent and text independent. Depending on the text content, text-dependent methods match the same characters and hence require the writer to write the same text. Text-independent methods are able to identify writers independent of the text content and it does not require comparison of the same characters. A very few studies in Indian languages have been documented so far. Currently, writer identification of handwritten Gurmukhi script documents is done manually. In this paper, we have presented a study of different features and classifiers combinations for text dependent writer identification model based on Gurmukhi text handwriting. This paper is divided into six sections. The introduction and related work have been presented in Sect. 1. Section 2 illustrates the Gurmukhi script and data collection phase. Section 3 portrays the feature extraction techniques. In Sect. 4, we have briefly discussed about classification techniques. Section 5 incorporates experimental results using these features and classifiers. At long last, conclusion and future works are exhibited in Sect. 6.

2 Gurmukhi Script and Data Set

Gurmukhi script is the script used for writing the Punjabi language. In Gurmukhi script, there are 35 basic character constants out of which the initial three are vowel bearers. In this work, we have considered all these 35 fundamental Gurmukhi characters. Twenty samples of each character written by each writer are taken. In this manner, we have collected 49,000 samples from 70 different writers. 70% data of 49,000 samples has been taken as training dataset and rest of data is considered as testing dataset.

3 Feature Extraction Techniques

The performance of writer identification system, basically, is dependent on the features that are being extracted. The extracted features ought to have the capacity to classify a writer in a unique way. In this work, we have explored various features like, zoning, diagonal, transition, intersection and open end points, centroid, horizontal peak extent, vertical peak extent, parabola curve fitting, and power curve fitting based features. These features are extracted by using a hierarchical technique presented by Kumar et al. [6]. Numerous scientists have been utilized zoning, diagonal, transitions and intersection point’s, etc., based features for printed or handwritten character recognition work. Parabola curve fitting and power curve fitting based features are provided by Kumar et al. [7]. They have used these features for offline handwritten Gurmukhi character recognition. They have also proposed peak extent based feature extraction techniques for handwritten Gurmukhi character recognition [5]. In this work, we have considered these strategies for proposing a text dependent writer identification framework based on Gurmukhi text handwriting.

4 Classification Techniques

Classification is the last stage of the writer identification framework which is used to classify the writers based on the features extracted in the previous phase. In this stage, just a final decision is taken about the unknown writer of character, to which class it belongs to find the identity of a writer. In this paper, we have considered four different classification methods, namely, Naive Bayes, Decision Tree, Random Forest, and AdaBoostM1. The Naive Bayes [4] classifier is a basic method which has a very clear semantics representing a probabilistic knowledge. It assumes that in a given class, predicative attributes are conditionally independent. Attributes in decision tree are nodes and each leaf node is representing a class of the writer. Decision tree classifiers are used to classify various subsamples of dataset. Random forest is one of the most popular and powerful machine learning algorithms [1]. It is a type of ensemble machine learning algorithm. Random forest removed the over-fitting crisis of decision tree. The meta estimator that fits the number of decision tree classifiers for such purpose is called random forest. The random forest uses averaging to help in getting better predictive accuracy and control over-fitting. Random forest is unexcelled in accuracy among existing supervised learning algorithms for classification and runs efficiently on large data bases [12]. AdaBoostM1 is a machine learning meta algorithm and can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoostM1 is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. In this work, we have considered AdaBoostM1 with random forest classifier for improve the proposed writer identification accuracy.

5 Experimental Results and Discussion

In this section, we have presented experimental results of various features and classifiers considered in this work. For experimental results, we have used a dataset of 49,000 samples collected from 70 different writers of isolated handwritten characters of Gurmukhi script. 70% data from 49,000 samples is taken as training dataset and rest of data is considered as testing dataset. Experimental results are derived using distinctive feature extraction methods and classification procedures. These outcomes are graphically portrayed in Fig. 1. We have accomplished most extreme identification accuracy of 81.75% with centroid features and AdaBoostM1 ensemble classification technique as depicted in Table 1. True Positive Rate (TPR) and False Positive Rate (FPR) of each writer for this case (Centroid features and AdaBoostM1 classification) are presented in Figs. 2 and 3.

Table 1 Writer identification accuracy with different features and classifiers

Full size table

6 Conclusion and Future Scope

In this paper, we have presented a study of different features and classification techniques for text-dependent writer identification system. Maximum identification accuracy of 81.75% has been accomplished with centroid features and AdaBoostM1 ensemble classifier. This accuracy may be improved either by increasing the span of training dataset or by using the various optimal feature selection techniques like PCA, Correlation Feature Set (CFS), etc. This work can be employed to other scripts like Devanagari, Bengali, and Tamil and so forth which are similar to the Gurmukhi script after building the training dataset of these scripts.

7 Declaration

In this manuscript, we have used 49,000 samples of Gurmukhi characters collected from 70 different writers. These all individuals who participated in this work have given their consent for publish this dataset in this manuscript.

References

Breiman L 2001 Random Forests, Machine Learning, 45(1):5–32.
Google Scholar
Gazzah S and Amara N B 2008 Neural networks and support vector machines classifiers for writer identification using Arabic script, The International Arab Journal of Information Technology, 5(1): 92–101.
Google Scholar
Ghiasi G and Safabakhsh R 2010 An efficient method for offline text independent writer identification, In Proceedings of the 20th International Conference on Pattern Recognition, 1245–1248.
Google Scholar
John G H and Langley P 1995 Estimating Continuous Distributions in Bayesian Classifiers, In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 338–345.
Google Scholar
Kumar M, Sharma R K and Jindal M K 2013 A Novel Feature Extraction Technique for Offline Handwritten Gurmukhi Character Recognition, IETE Journal of Research, 59(6):687–692.
Google Scholar
Kumar M, Jindal M K and Sharma R K 2014a A Novel Hierarchical Techniques for Offline Handwritten Gurmukhi Character Recognition, National Academy Science Letters, 37(6):567–572.
Google Scholar
Kumar M, Sharma R K and Jindal M K 2014b Efficient Feature Extraction Techniques for Offline Handwritten Gurmukhi Character Recognition, National Academy Science Letters, 37(4):381–391.
Google Scholar
Leclerc F, Plamondon R 1994 Automatic signature verification: the state of the art 1989–1993, International Journal of Pattern Recognition and Artificial Intelligence, 8(3):643–660.
Google Scholar
Leeham G, Chachra S 2003 Writer identification using innovative binarised features of handwriting numerals, In the Proceedings of the 7 ^th International Conference on Document Analysis and Recognition (ICDAR).
Google Scholar
Maadeed S A 2012 Text-dependent writer identification for Arabic Handwriting, Journal of Electrical and Computer Engineering, 13: 1–8.
Google Scholar
Plamondon R, Lorette G 1989 Automatic Signature Verification and Writer Identification The State of the Art, Pattern Recognition, 22(2):107–131.
Google Scholar
Schlapbach A, Bunke H 2005 Writer identification using an HMM based hand writing recognition system: to normalize the input or not?, In the Proceedings of 12th Conference of the International Graphonomics Society, Salerno, Italy.
Google Scholar
Schlapbach A, Bunke H 2007 A writer identification and verification system using HMM based recognizers, Pattern Analysis and Application, 10(1):33–43.
Google Scholar
Zois E, Anastassopoulos V 2000 Morphological Waveform Coding for Writer Identification, Pattern Recognition, 33(3):385–398.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science & Engineering, GZS Campus College of Engineering & Technology (Maharaja Ranjit Singh Punjab Technical University), Bathinda, Punjab, India
Sakshi & Naresh Kumar Garg
Department of Computer Applications, GZS Campus College of Engineering & Technology (Maharaja Ranjit Singh Punjab Technical University), Bathinda, Punjab, India
Munish Kumar

Authors

Sakshi
View author publications
You can also search for this author in PubMed Google Scholar
Naresh Kumar Garg
View author publications
You can also search for this author in PubMed Google Scholar
Munish Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Munish Kumar .

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki
DAIS, Ca’ Foscari University of Venice, Venice, Italy
Agostino Cortesi
Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
Nagaraju Devarakonda

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sakshi, Garg, N.K., Kumar, M. (2018). Writer Identification System for Handwritten Gurmukhi Characters: Study of Different Feature-Classifier Combinations. In: Chaki, N., Cortesi, A., Devarakonda, N. (eds) Proceedings of International Conference on Computational Intelligence and Data Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 9. Springer, Singapore. https://doi.org/10.1007/978-981-10-6319-0_11

Download citation

DOI: https://doi.org/10.1007/978-981-10-6319-0_11
Published: 21 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6318-3
Online ISBN: 978-981-10-6319-0
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

Writer Identification System for Handwritten Gurmukhi Characters: Study of Different Feature-Classifier Combinations

Abstract

Similar content being viewed by others

A Comparative Study of Tree-based Structure Methods for Handwriting Identification

Improved recognition results of offline handwritten Gurumukhi characters using hybrid features and adaptive boosting

Tree-base Structure for Feature Selection in Writer Identification

Keywords

1 Introduction

2 Gurmukhi Script and Data Set

3 Feature Extraction Techniques

4 Classification Techniques

5 Experimental Results and Discussion

6 Conclusion and Future Scope

7 Declaration

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Writer Identification System for Handwritten Gurmukhi Characters: Study of Different Feature-Classifier Combinations

Abstract

Similar content being viewed by others

A Comparative Study of Tree-based Structure Methods for Handwriting Identification

Improved recognition results of offline handwritten Gurumukhi characters using hybrid features and adaptive boosting

Tree-base Structure for Feature Selection in Writer Identification

Keywords

1 Introduction

2 Gurmukhi Script and Data Set

3 Feature Extraction Techniques

4 Classification Techniques

5 Experimental Results and Discussion

6 Conclusion and Future Scope

7 Declaration

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation