1 Introduction

Classification is one of the important steps in document analysis and recognition. In recent years, machine learning approaches have been progressively in demand and have received great attention from researchers for the statistical validation of reported outcomes. This can be credited to the growth of the field, the expanding number of real-life applications and the accessibility of open machine learning systems that make it simple to propose new algorithms or modify existing ones. In the computer vision and pattern recognition fields, various classifiers are widely used for classification because of their learning adaptability and ability to handle complex situations. The decision about which strategy to use for classifier performance assessment depends on many qualities, and it is argued that no single technique fulfills all the desired requirements. This implies that, for some applications, researchers have to utilize more than one classification technique to achieve a reliable assessment. A bad selection of classification methods sometimes yields less accurate results, so great care must be taken in their selection. Recognition accuracy and the training time needed to build the classification model also depend on the quality of the features and the number of classes in the dataset, for instance when the same classifier is applied to different scripts or datasets, such as the Gurmukhi script consisting of 56 classes or the Devanagari script consisting of 49 classes.

Researchers in the area of character/numeral recognition have presented a large body of work using different classifiers. In this paper, we have evaluated the performance of various classifiers for Gurmukhi character/numeral recognition in such a way that an efficient classifier can also work for other scripts with a structure similar to that of the Gurmukhi script. Our work proceeds by processing the characters and numerals of the dataset using various classification techniques, namely, k-NN, Linear-SVM, RBF-SVM, Naïve Bayes, Decision Tree, Convolutional Neural Network (CNN) and Random Forest. The goal is to develop a system that is able to recognize the characters and numerals of Gurmukhi script efficiently with promising accuracy rates. The classification evaluation metrics considered are accuracy, training sample size, False Acceptance Rate (FAR), False Rejection Rate (FRR) and Area Under Receiver Operating Characteristic (AUROC) Curve.

The paper is structured into seven sections. The present work is introduced in Sect. 1. Section 2 presents related work and the collection of the dataset; it covers the background of character/numeral recognition and depicts the various methodologies used by different researchers. Section 3 focuses on the feature extraction phase used for extracting the properties of characters and numerals. Feature extraction is an important phase of an optical character recognition system, and this section gives a brief introduction to the features considered in this work. Section 4 focuses on the classifiers evaluated in this work. The classification phase decides class membership based on the features extracted from the samples; the section presents a detailed introduction and block diagrams of the classifiers considered for performance evaluation. Section 5 presents the evaluation metrics against which the performance of the classifiers is measured. Section 6 describes the experimental work performed with the different classifiers and analyzes their performance based on parameters such as recognition accuracy, time taken to build the training model, False Acceptance Rate (FAR), False Rejection Rate (FRR) and Area Under Receiver Operating Characteristic (AUROC) curve; it also reports the performance of the individual features with the best classifier evaluated in this work. Finally, concluding notes and future directions of the present study are presented in Sect. 7.

2 Related work and data set

Literature shows that a good amount of work has already been done on the performance evaluation of a few classifiers for character and numeral recognition. For digit recognition, various methods of feature extraction and classifiers were studied and compared by Lee and Srihari (1993). The results obtained claimed high accuracy with the chain code feature, the gradient feature, stroke-level features and concavity features (Favata et al. 1994). Jeong et al. (1999) presented a comparison of different classifiers for digit recognition. For fingerprint and digit recognition, Blue et al. (1994) analyzed a few classifiers and found that the Probabilistic Neural Network (PNN) and the k-NN rule executed without problems. Jain et al. (2000) presented a study based on small datasets, including a digit dataset. Zhu et al. (1999) differentiated between connected character images and typical images using the Fourier transform. By comparing Decision Tree, Artificial Neural Network and Logistic Regression, Kim (2008) evaluated the effectiveness of these classifiers based on Root Mean Square Error. In that article, the impact of the type of attributes and the size of the dataset on the classification methods was examined, and the outcomes were reported for regression. An Artificial Neural Network (ANN) was applied to real and simulated data. The reported results showed that if the data include errors and the real values of attributes are not available, then the statistical method of regression can act better than the ANN method and produce superior performance. Huang et al. (2003) considered Naïve Bayes (NB), Decision Tree (DT) and SVM collectively under the Area Under Curve (AUC) paradigm. After applying these techniques to real data, they observed that the AUC measure is superior to accuracy for comparing classification methods. Moreover, it was observed that the C4.5 implementation of the decision tree has a higher AUC than Naïve Bayes and SVM. A standout contribution among the most cited papers in this area is the one by Dietterich (1998). After describing a taxonomy of statistical questions in machine learning, he concentrates on the problem of selecting, from two algorithms under consideration, the one that produces more precise results for a given data collection. Liu et al. (2002) presented a performance evaluation study in which some efficient classifiers were used for handwritten digit recognition. They also indicated that multiple classifiers should be combined with great care to acquire high performance.

Kumar et al. (2018) have presented a review of character recognition for non-Indic and Indic scripts, in which they also examined the major challenges and issues in character/numeral recognition. Sharma et al. (2009) have expounded a method to rectify the recognition results of handwritten and machine-printed Gurmukhi OCR systems. Sharma and Lehal (2009) have proposed an algorithm for removing the field frame boundary of hand-filled forms in Gurmukhi script. Sharma and Jhajj (2010) have extracted zoning features for handwritten Gurmukhi character recognition. They employed two classifiers, namely, k-NN and SVM, and achieved maximum recognition accuracies of 72.5% and 72.0% with the k-NN and SVM classifiers, respectively. Kumar et al. (2013a) have presented a novel feature extraction technique for offline handwritten Gurmukhi character recognition, and have also presented efficient feature extraction techniques based on curvature features for the same task (Kumar et al. 2014a). Table 1 lists some of the studies that have used existing features and classifiers for character and numeral recognition.

Table 1 Studies on numeral and character recognition

For the experimental work in this paper, we have used a balanced primary dataset. This dataset consists of 13,000 handwritten samples from 45 classes: 7000 samples of handwritten Gurmukhi characters for a 35-class problem (200 samples per class) and 6000 samples of handwritten numerals for a 10-class problem (600 samples per class).

Kumar et al. (2013b) have noticed that, irrespective of the features, a few classifiers perform consistently better when the number of samples in the training data set is increased. Therefore, for the experimental work, the data set is divided using different partitioning strategies into training and testing datasets, as presented in Table 2.

Table 2 Data set partitioning strategies

Partitioning strategies f and g represent standard k-fold cross validation. In general, k-fold cross validation divides the complete data set for each category into k equal subsets. One subset is then taken as testing data and the remaining k − 1 subsets are taken as training data. Through cross validation, each sample of the data is predicted exactly once, and the procedure reports the percentage of correctly recognized testing samples.
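For concreteness, the following minimal sketch illustrates this k-fold protocol using scikit-learn; the feature matrix X and label vector y are random placeholders standing in for the extracted Gurmukhi feature vectors and class labels, and the choice of classifier here is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Placeholder data standing in for the 105-element feature vectors and labels
X = np.random.rand(700, 105)
y = np.random.randint(0, 35, size=700)

# k-fold cross validation: each of the k subsets serves once as testing data,
# while the remaining k-1 subsets form the training data
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print("Percentage of correctly recognized test samples:", 100 * np.mean(scores))
```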

3 Feature extraction

The feature extraction stage plays an important role in the performance of a recognition system. The essential logic behind feature extraction is to extract the important properties of a digitized character image that boost the recognition accuracy. In this work, the Nearest Neighbor Interpolation (NNI) technique has first been used to resize the digitized images to 88 × 88 pixels. A feature vector of 105 elements is then extracted using a hierarchical technique; this feature vector comprises horizontal and vertical peak extent features (Kumar et al. 2012), diagonal features (Kumar et al. 2012), and centroid features (Kumar et al. 2014b).
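As an illustration, the resizing step can be sketched as follows with OpenCV; the file name is hypothetical.

```python
import cv2

# Load a digitized character image in grayscale (file name is hypothetical)
img = cv2.imread("gurmukhi_sample.png", cv2.IMREAD_GRAYSCALE)

# Nearest Neighbor Interpolation (NNI) normalizes the image to 88 x 88 pixels
img_88 = cv2.resize(img, (88, 88), interpolation=cv2.INTER_NEAREST)
```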

3.1 Peak extent based features

In this technique, features are extracted by taking into account the sum of the peak extents that fit successive black pixels in each zone. Peak extent based features can be extracted horizontally and vertically. For the horizontal peak extent features, the sum of the peak extents that fit successive black pixels horizontally in each row of a zone is considered, whereas for the vertical peak extent features, the sum of the peak extents that fit successive black pixels vertically in each column of a zone is considered. Using this technique, 2n features are obtained for each character, where n is the number of zones.
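A plausible sketch of this computation is given below; it assumes that the "peak extent" of a row (or column) is its longest run of consecutive foreground pixels and that the zoning is a 4 × 4 grid. Both are illustrative assumptions, not the exact formulation of Kumar et al. (2012).

```python
import numpy as np

def longest_run(line):
    """Longest run of consecutive foreground (non-zero) pixels in a 1-D array."""
    best = cur = 0
    for px in line:
        cur = cur + 1 if px else 0
        best = max(best, cur)
    return best

def peak_extent_features(img, grid=4):
    """2n features for n = grid * grid zones: one horizontal and one vertical
    peak extent sum per zone (assumed interpretation)."""
    zh, zw = img.shape[0] // grid, img.shape[1] // grid
    horiz, vert = [], []
    for i in range(grid):
        for j in range(grid):
            zone = img[i * zh:(i + 1) * zh, j * zw:(j + 1) * zw]
            horiz.append(sum(longest_run(row) for row in zone))    # row-wise peaks
            vert.append(sum(longest_run(col) for col in zone.T))   # column-wise peaks
    return np.array(horiz + vert)
```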

3.2 Diagonal features

In this technique, the original thinned image of a character is divided into n equally sized zones. The features are extracted by moving along the diagonals of the pixels in each zone. Each zone has 2k − 1 diagonals (for a zone of side k), and the ON (foreground) pixels along each diagonal are counted to obtain a sub-feature. These sub-feature values are averaged to form a single value that is assigned to the corresponding zone as its feature. This yields n features for each sample.
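The following sketch, under the same illustrative 4 × 4 zoning assumption, counts the foreground pixels along every diagonal of a zone and averages the counts into one feature per zone.

```python
import numpy as np

def diagonal_features(img, grid=4):
    """One feature per zone: the average foreground count over all
    diagonals of the zone (the zoning grid is an illustrative assumption)."""
    zh, zw = img.shape[0] // grid, img.shape[1] // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            zone = img[i * zh:(i + 1) * zh, j * zw:(j + 1) * zw]
            # offsets -(zh-1)..(zw-1) enumerate all diagonals of the zone
            counts = [np.diag(zone, k).sum() for k in range(-(zh - 1), zw)]
            feats.append(np.mean(counts))
    return np.array(feats)
```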

3.3 Centroid feature

For centroid feature extraction, the bitmap image is divided into n zones. The coordinates of the foreground pixels in each zone are then found, the centroid of these foreground pixels is calculated, and the coordinates of this centroid are stored as the feature values. For zones that do not contain any foreground pixel, the feature value is taken as zero. Using this methodology, 2n feature elements are obtained for each character image.
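A sketch under the same illustrative zoning assumption:

```python
import numpy as np

def centroid_features(img, grid=4):
    """Two features per zone: the (row, column) centroid of the zone's
    foreground pixels, or (0, 0) when the zone is empty."""
    zh, zw = img.shape[0] // grid, img.shape[1] // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            zone = img[i * zh:(i + 1) * zh, j * zw:(j + 1) * zw]
            rows, cols = np.nonzero(zone)
            if rows.size:
                feats.extend([rows.mean(), cols.mean()])  # centroid coordinates
            else:
                feats.extend([0.0, 0.0])                  # empty zone -> zero
    return np.array(feats)
```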

4 List of classifiers employed for the experimental work

4.1 Convolution neural network (CNN)

Convolutional Neural Network (CNN) or ConvNet is a special kind of multi-layer neural network that is among the most suitable classifiers in the field of pattern recognition. LeCun and Bengio (1990) introduced the concept of CNNs. CNNs are made up of neurons that have learnable weights and biases. Each neuron receives some input, performs a dot product and optionally follows it with a non-linearity. The whole network expresses a single differentiable score function from the raw image pixels at one end to class scores at the other end, with a loss function (e.g., Softmax) on the last (fully-connected) layer. A CNN is a feed-forward network that can extract topological properties of an image and is trained with a version of the back-propagation algorithm. CNNs can recognize patterns with extreme variability, such as handwritten characters. A block diagram of the CNN classification process for numeral recognition is illustrated in Fig. 1.

Fig. 1
figure 1

Block diagram of CNN classification

4.1.1 Layers used to build CNN

A CNN is a sequence of layers, and every layer transforms one volume of activations to another through a differentiable function. There are three main types of layers used to build a CNN architecture: the convolutional layer, the pooling layer and the fully-connected layer. These layers are described as follows:

  • The convolutional layer is the core building block of a CNN and does most of the computational heavy lifting.

  • The pooling layer is placed between successive convolutional layers of a CNN architecture. Its function is to progressively reduce the spatial size of the representation in order to reduce the number of parameters and the amount of computation in the network, and hence also to control over-fitting. The pooling layer operates independently on each depth slice of the input and resizes it spatially, using the MAX operation.

  • In the fully-connected layer, neurons have full connections to all activations in the previous layer. Their activations can be computed with a matrix multiplication followed by a bias offset.

There are several well-known architectures that illustrate how CNNs work in practice. These include:

  • LeNet The first successful application of CNNs was developed by LeCun and Bengio in the 1990s; the best known is the LeNet (1998) architecture, which was used to read zip codes, digits, etc.

  • AlexNet The first work that popularized Convolutional Networks in Computer Vision was the AlexNet (Krizhevsky et al. 2012). The AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of 16% compared to the runner-up's 26%).

  • ZFNet The ILSVRC 2013 winner was a Convolutional Network from Matthew Zeiler and Rob Fergus that became known as the ZFNet (Zeiler and Fergus 2014). It improved on AlexNet by modifying the architecture hyper-parameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size of the first layer smaller.

  • GoogLeNet The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. (2015) from Google. Its main contribution was the development of an inception module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M).

  • VGGNet The runner-up in ILSVRC 2014 was the network from Simonyan and Zisserman that became known as the VGGNet (2015). Its main contribution was in showing that the depth of the network is a critical component for good performance.

  • ResNet The Residual Network developed by He et al. (2016) was the winner of ILSVRC 2015. Its features include special skip connections and a heavy use of batch normalization. The ResNet architecture also omits fully-connected layers at the end of the network.

A considerable number of findings and studies have been presented in the field of pattern recognition using Convolutional Neural Networks. For example, Yuan et al. (2012) applied CNNs to offline handwritten English character recognition using a modified LeNet-5 CNN model. Liu et al. (2013) proposed a hybrid model combining a CNN and a Conditional Random Field (CRF) for handwritten English character recognition, where the CNN is used as a trainable topology-sensitive hierarchical feature extractor and the CRF is trained to model the dependency between characters. Anil et al. (2015) used LeNet-5, a CNN trained with gradient-based learning and the back-propagation algorithm, for the recognition of Malayalam characters. Wu et al. (2014) proposed a handwritten Chinese character recognition method based on the Relaxation Convolutional Neural Network (R-CNN) and the Alternately Trained Relaxation Convolutional Neural Network (ATR-CNN). In the present paper, we have used the LeNet architecture (the first successful application of convolutional networks) for script classification, with a dropout rate of 0.2, a patch size of 3 × 3, and a pool width and height of 2. CNN achieved the third rank among the top seven supervised learning algorithms for the handwritten character and numeral recognition work considered in the present paper.
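A minimal Keras sketch of a LeNet-style network consistent with the configuration stated above (3 × 3 patches, 2 × 2 pooling, dropout rate 0.2, softmax output) is given below; the filter counts and the dense layer width are illustrative assumptions, not values reported in this paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

# LeNet-style CNN for 88 x 88 grayscale character images and 35 classes;
# filter counts and dense width are illustrative assumptions
model = keras.Sequential([
    layers.Input(shape=(88, 88, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # 3 x 3 patches
    layers.MaxPooling2D((2, 2)),                   # 2 x 2 max pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.2),                           # dropout rate = 0.2
    layers.Dense(128, activation="relu"),
    layers.Dense(35, activation="softmax"),        # softmax score layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```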

4.2 Decision tree

The decision tree algorithm uses various attributes of the data for processing and decision making. Attributes in the decision tree are nodes, and each leaf node represents a classification. The decision tree is a type of supervised machine learning algorithm in which the data is continuously split according to certain parameters. A block diagram of decision tree classification for fruit classification is illustrated in Fig. 2.

Fig. 2
figure 2

Block diagram of decision tree classification

The decision tree classifier organizes a series of test questions and conditions in a tree structure. In the decision tree, the root and internal nodes contain attribute test conditions to separate records that have different characteristics. All the terminal nodes are assigned class labels, such as Yes or No. After construction of the decision tree, classification of a test record starts from the root node: the test condition is applied to the record and the appropriate branch is followed based on the outcome of the test. This leads either to another internal node, for which a new test condition is applied, or to a leaf node. When a leaf node is reached, the class label associated with it is assigned to the record. Building an optimal decision tree is the key problem in decision tree classification. Various efficient algorithms have been developed to construct a reasonably accurate decision tree in a reasonable amount of time. These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimal decisions about which attribute to use for partitioning the data; Hunt's algorithm, ID3, C4.5, CART and SPRINT are examples of greedy decision tree induction algorithms. A few findings and related work in the field of character or pattern recognition based on the decision tree algorithm are discussed in this section. For example, Amin and Singh (1998) have presented a new technique for the recognition of hand-printed Chinese characters using the decision tree/C4.5 machine learning system. Sastry et al. (2010) have proposed a system to identify and classify Telugu characters extracted from palm leaves using a decision tree approach. Ramanan et al. (2015) proposed a novel hybrid decision tree for printed Tamil character recognition using Directed Acyclic Graph (DAG) and Unbalanced Decision Tree (UDT) classifiers. As per the comparative study of classification methods presented in this paper for character/numeral recognition, the decision tree obtained the fifth rank among the top seven supervised learning algorithms.
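As an illustration, a CART-style tree with the Gini criterion can be trained in a few lines with scikit-learn; the data arrays below are placeholders for the extracted feature vectors and class labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the 105-element feature vectors and labels
X_train = np.random.rand(200, 105)
y_train = np.random.randint(0, 35, size=200)
X_test = np.random.rand(20, 105)

# Greedy induction: each node takes the locally optimal attribute split
# under the Gini impurity criterion, as in CART
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X_train, y_train)
predictions = tree.predict(X_test)
```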

4.3 k-NN

k-NN is considered a lazy learning algorithm that classifies data based on similarity with neighbors. Here k stands for the number of nearest dataset items considered for the classification. A case is classified by a majority vote of its neighbors, the case being assigned to the class most common among its k nearest neighbors as measured by a distance function. If k = 1, then the case is simply assigned to the class of its nearest neighbor. Usually the Euclidean distance is used for calculating the distance between a stored feature vector and the candidate feature vector in the k-Nearest Neighbor algorithm. A block diagram of the k-NN classifier is depicted in Fig. 3.

Fig. 3
figure 3

Block diagram of k-NN classification

For the given attributes,

$$ A = \left\{ X_{1}, X_{2}, \ldots, X_{D} \right\}, $$

where D is the dimension of the data, we need to predict the corresponding classification group,

$$ G = \left\{ Y_{1}, Y_{2}, \ldots, Y_{n} \right\} $$

using a proximity metric over the k nearest items in the D-dimensional space that defines the closeness of the association, such that \( X \in \mathbb{R}^{D} \) and \( Y_{p} \in G \).

We choose the optimal value of k by first inspecting the data. In general, a larger k value is more precise as it reduces the overall noise, but there is no guarantee. Cross-validation is another way to determine a good k value, using an independent dataset to validate it. Rathi et al. (2012) proposed an approach for the recognition of offline handwritten Devanagari vowels by means of the k-NN classifier and achieved a recognition rate of 96.1%. Rashad and Semary (2014) have developed a system for isolated printed Arabic character recognition using k-NN and Random Forest classifiers. Hazra et al. (2017) have presented an application of pattern recognition using k-NN to recognize handwritten or printed text. Elakkiya et al. (2017) have developed a system for offline handwritten Tamil character recognition using k-NN. k-NN classifies characters/numerals in view of the neighboring samples in the training feature space. This classifier obtained the fourth rank among the seven classification algorithms for character/numeral recognition experimented with in this paper.
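A minimal scikit-learn sketch of this procedure, with Euclidean distance and cross-validation over candidate k values, is given below; the data arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the extracted feature vectors and labels
X = np.random.rand(700, 105)
y = np.random.randint(0, 35, size=700)

# Euclidean distance (Minkowski with p = 2); cross-validation is used
# to pick a good neighborhood size k
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=2)
    score = cross_val_score(knn, X, y, cv=10).mean()
    print(f"k = {k}: mean cross-validated accuracy = {score:.3f}")
```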

4.4 Naive Bayes

The Naive Bayes classifier (John and Langley 1995) is a basic method with very clear semantics for representing probabilistic knowledge. This classifier is called simple, or naive, because of its important simplifying assumptions: it expects that, within a given class, the predictive attributes are conditionally independent, and it assumes that no hidden or latent attributes influence the prediction process. Naive Bayes is a family of probabilistic algorithms that take advantage of probability theory and Bayes' theorem to predict the category of a sample. It is particularly suited to cases where the dimensionality of the input is high. The algorithm calculates the probability of each category for a given sample and then outputs the category with the highest probability. These probabilities are obtained using Bayes' theorem, which describes the probability of a feature based on prior knowledge of conditions that might be related to that feature. The Naive Bayes classifier assumes that all features are unrelated to each other: the presence or absence of one feature does not influence the presence or absence of any other feature, and each feature is given the same weight or importance. This method achieved the sixth rank among the seven algorithms for recognition of handwritten characters and numerals considered in this study.
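Formally, under the conditional independence assumption described above, the predicted class is the one maximizing the posterior probability given by Bayes' theorem:

$$ \hat{y} = \arg \max_{c} \; P(c)\prod\nolimits_{i = 1}^{D} P(x_{i} \mid c) $$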

4.5 Random forest

Random Forest (RF) is an ensemble method for supervised learning. Random forests alleviate the over-fitting problem of single decision trees: decision tree classifiers are fitted on various sub-samples of the dataset, and the meta-estimator that fits this collection of decision trees is called a Random Forest. A block diagram of the random forest classifier is shown in Fig. 4. The random forest uses averaging, which helps improve predictive accuracy and control over-fitting. Random forest is unexcelled in accuracy among other existing supervised learning algorithms for classification and runs efficiently on large databases (Breiman 2001). The random forest classifier creates a set of decision trees from randomly selected subsets of the training set and then aggregates the votes from the different decision trees to decide the final class of the test object. Alternatively, the random forest can apply a weighting concept to account for the impact of each decision tree's result: a tree with a high error rate is given a low weight and vice versa, which increases the decisive impact of trees with low error rates. The basic parameters of random forest classifiers are the total number of trees to be generated and decision-tree-related parameters such as minimum split and split criteria. A Random Forest classifier consists of a collection of tree-structured classifiers {h(x, Θk), k = 1, …}, where the Θk are independent, identically distributed random vectors and each tree casts a unit vote for the final classification of input x. Like CART, Random Forest uses the Gini index for determining the final class of each tree. The Gini index of node impurity is the measure most commonly used for classification-type problems.

Fig. 4
figure 4

Block diagram of random forest classification

Homenda and Lesinski (2011) have presented a study of the influence of feature selection techniques on the effectiveness of different classifiers. Their experimental results show that the random forest classifier achieves better results than the other methods. Zahedi and Eslami (2012) have discussed the use of the Random Forest classifier in the field of Persian handwritten character recognition. Cordella et al. (2014) have presented an experimental study of Random Forest classifier reliability in handwritten character recognition using two real-world datasets, namely the NIST and PD datasets. Rachidi and Mahani (2017) have presented a system for automatic recognition of Amazigh characters using the Random Forest method on images obtained by camera phone. Random Forest is the best classification algorithm for character and numeral recognition among the top seven algorithms considered in this paper. The Random Forest classifier achieves the best recognition accuracy because it initially performs efficient feature selection for classification; it builds trees based on good features and favours those trees over trees built on noisy features.
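A minimal scikit-learn sketch of this scheme is given below; the number of trees and the data arrays are illustrative placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the extracted feature vectors and labels
X = np.random.rand(700, 105)
y = np.random.randint(0, 35, size=700)

# An ensemble of Gini-based trees, each grown on a bootstrap sample with a
# random feature subset per split; class votes are aggregated across trees
rf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
rf.fit(X, y)
print(rf.feature_importances_[:5])  # per-feature importance scores
```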

4.6 Support vector machine (SVM)

SVM is a supervised learning algorithm for the classification of both linear and non-linear data. It maps the original data into a higher-dimensional space, where it can find a hyper-plane for the separation of the data using important training samples called support vectors. A block diagram of the SVM classifier is shown in Fig. 5. A hyper-plane is a "decision boundary" that splits one class from another (Han and Kamber 2001). Using the support vectors and the margins defined by them, the SVM locates the hyper-plane. In this work, the authors have considered an SVM with a linear kernel, namely Linear-SVM, and an SVM with an RBF kernel, namely RBF-SVM, for classification. The kernel parameters for RBF-SVM are \( \gamma \) = 0.01 and C = 1, and the random state value is taken as zero for both kernels (Linear-SVM and RBF-SVM). Linear-SVM achieved the second rank and RBF-SVM the seventh rank among the seven supervised learning algorithms for recognition of offline handwritten Gurmukhi characters and numerals in this work.

Fig. 5
figure 5

Block diagram of support vector machine classification
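For reference, the two kernels with the parameter settings stated above can be instantiated in scikit-learn as follows; the data arrays are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data standing in for the extracted feature vectors and labels
X = np.random.rand(700, 105)
y = np.random.randint(0, 35, size=700)

# Linear-SVM and RBF-SVM with gamma = 0.01, C = 1 and random state zero
linear_svm = SVC(kernel="linear", C=1, random_state=0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=0.01, C=1, random_state=0).fit(X, y)
```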

5 Performance metrics

The performance of the classifiers has been measured with respect to different performance metrics: training sample size, recognition accuracy, False Acceptance Rate (FAR), False Rejection Rate (FRR) and Area Under Receiver Operating Characteristic (AUROC) Curve. The False Acceptance Rate (FAR) is the probability that the recognition system will incorrectly accept a test sample; it is the number of false acceptances divided by the total number of incorrect samples. Similarly, the False Rejection Rate (FRR) is the probability that the recognition system will mistakenly reject a test sample. The mutual relationship between FAR and FRR is shown in Fig. 6. FAR and FRR can be calculated as follows.

$$ FAR = \frac{\text{Wrongly accepted samples}}{\text{Total number of wrong samples}} $$
$$ FRR = \frac{\text{Wrongly rejected samples}}{\text{Total number of correct samples}} $$
Fig. 6
figure 6

Relationship between FAR and FRR
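A sketch of how these quantities can be computed from a multi-class confusion matrix is shown below, treating each class one-versus-rest and macro-averaging; this aggregation scheme is an assumption, as the paper does not spell out its exact averaging. AUROC can be obtained analogously with sklearn.metrics.roc_auc_score.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def far_frr(y_true, y_pred, labels):
    """One-versus-rest FAR and FRR per class, macro-averaged (assumed scheme)."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    fars, frrs = [], []
    for i in range(len(labels)):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp      # class-i samples wrongly rejected
        fp = cm[:, i].sum() - tp      # other-class samples wrongly accepted as i
        tn = cm.sum() - tp - fn - fp
        fars.append(fp / (fp + tn))   # wrongly accepted / total wrong samples
        frrs.append(fn / (fn + tp))   # wrongly rejected / total correct samples
    return np.mean(fars), np.mean(frrs)
```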

The Area Under Receiver Operating Characteristic (AUROC) Curve is used in classification analysis to determine which of the models under consideration predicts the classes best. The classifiers considered in this work are trained with a variable number of samples, as discussed in Table 2. We have additionally reported, as a performance metric, the time taken by each classifier to build its model (Table 3). Recognition accuracies accomplished using the different classification methods considered in this work are depicted in Table 4.

Table 3 Time taken to build training model (in seconds)
Table 4 Recognition accuracy achieved using the classifiers

6 Experimental results

In this section, the authors present the experimental results of the assessment study for the Convolutional Neural Network (CNN), decision tree, k-NN, Linear-SVM, Naïve Bayes, RBF-SVM and random forest classifiers. A dataset of 13,000 samples (7000 characters and 6000 numerals) has been considered for the experimental work. The authors used a variable number of training samples to train the seven classifiers, as discussed in Table 2. The time taken to train each model is presented in Table 3; as shown there, the k-NN classifier takes the minimum time among the compared classifiers for training the model.

In Table 4, we have presented the recognition accuracies achieved with the different classifiers for offline handwritten Gurmukhi character and numeral recognition. The recognition accuracy achieved with the various classifiers is graphically plotted in Fig. 7. As depicted in Table 4 and Fig. 7, recognition accuracies of 87.9%, 82.5%, 75.4%, 74.7%, 70.5%, 66.3%, and 64.9% have been achieved with the Random Forest, Linear-SVM, CNN, k-NN, Decision Tree, Naïve Bayes and RBF-SVM classifiers, respectively.

Fig. 7
figure 7

Recognition accuracies attained using evaluated classifiers

The FAR, FRR and AUROC values of the seven classifiers considered in this work are depicted in Tables 5, 6 and 7 and graphically plotted in Figs. 8, 9 and 10, respectively.

Table 5 False acceptance rate (FAR) for the classifiers
Table 6 False rejection rate (FRR) for the classifiers
Table 7 Area under receiver operating characteristic (AUROC) curve for the classifiers
Fig. 8
figure 8

False acceptance rate (FAR) for the classifiers

Fig. 9
figure 9

False rejection rate (FRR) for the classifiers

Fig. 10
figure 10

Area under receiver operating characteristic (AUROC) curve for the classifiers

The authors have also calculated the mean squared error (MSE), one of the most widely used loss functions, for all classifiers considered in this study; it is the average of the squared differences between the actual and predicted values. The MSE values of the seven classifiers considered in this work are depicted in Table 8 and graphically plotted in Fig. 11.
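In its usual form, with actual values \( y_{i} \), predicted values \( \hat{y}_{i} \) and n samples,

$$ MSE = \frac{1}{n}\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} $$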

Table 8 Mean squared error (MSE) for the classifiers
Fig. 11
figure 11

Mean squared error for the classifier

Comparing the results based on recognition accuracy, we can see that the recognition accuracy achieved by the Random Forest classifier is noticeably higher than that of the other classifiers considered in this work. It has also been noticed that the FAR, FRR, MSE and AUROC values of the Random Forest classifier are comparable to those of the other classifiers, as depicted in Tables 5, 6, 7 and 8. Recognition results of the individual features with the Random Forest classifier and the tenfold cross validation methodology are depicted in Table 9. These features perform well for Gurmukhi character recognition (Sundaram and Ramakrishnan 2008; Kumar et al. 2012, 2013b, 2014b) and are also useful for other scripts that are structurally akin to the Gurmukhi script. As depicted in Table 9, a recognition accuracy of 87.9%, a FAR of 0.4%, and a FRR of 12.0% have been attained. The confusion matrix for this case, using the random forest classifier and tenfold cross validation, is depicted in Table 10.

Table 9 Performance based on individual features and random forest classifier
Table 10 Confusion matrix of random forest classifier with tenfold cross validation

7 Inferences and observations

For developing successful applications in document analysis and recognition, many directions and alternatives are possible when selecting feature extraction and classification methods in order to improve the recognition accuracy. A number of researchers have proposed feature extraction/selection and classification techniques for different scripts. In this paper, the authors have focused on a comparative analysis of classifiers for offline handwritten Gurmukhi character and numeral recognition. This study provides potential readers with an overview of classification techniques for document analysis and recognition in Gurmukhi script. It is worth mentioning here that increasing the size of the training dataset generally improves the classification accuracy. The authors selected seven classifiers, namely, Convolutional Neural Network, decision tree, k-NN, Linear-SVM, Naïve Bayes, RBF-SVM and Random Forest, for the character and numeral recognition in this work. These classifiers require moderate memory space and computation cost and provide reasonably high accuracy. After comparing the results based on recognition accuracy, FAR, FRR, AUROC and MSE, the authors observed that the Random Forest classifier performs better than the other classifiers for offline handwritten Gurmukhi character and numeral recognition. Researchers can take the new direction of introducing novel feature extraction and classification methods giving higher accuracy rates. One can also look into tuning and optimization techniques for the classification algorithms to make sure that a large training set will not cause over-fitting and to achieve higher recognition accuracy.