1 Introduction

The human ability to examine and categorize objects and scenes is a very useful skill, and researchers have tried to reproduce it through machine learning algorithms in many domains, including education, sports, transportation, oil and gas, financial services, marketing and sales, government and health care, as well as in many safety-critical applications such as fingerprint recognition and facial recognition [1]. Handwritten digit recognition is one of the major applications of machine learning and is used in a wide range of real-life tasks such as signature identification and verification, zip code recognition for sorting postal mail, form processing, handwritten digit verification in banks and fraud detection. It plays a crucial role in optical character recognition (OCR) and in pattern recognition [2]. Many devices, such as smartphones and tablets, accept handwriting as input on a touch screen via a finger or an electronic stylus. This allows users to transfer text to the device quickly, which especially helps individuals who are not well versed in input devices such as keyboards and who can write faster than they can type. Recognizing such text can be hard even for humans. Thus, a system that supports automatic recognition of handwritten text would be very helpful in many applications.

1.1 Need of the System

A handwritten digit recognition system is developed to achieve higher accuracy and more reliable performance than existing solutions. Over the last decades, many machine learning techniques have been applied to handwritten digit recognition with impressive results, such as the baseline linear classifier, baseline nearest neighbour classifier, pairwise linear classifier, radial basis network, large fully connected multilayer neural network, tangent distance classifier, optimal margin classifier [3], support vector machine (SVM) [4,5,6,7,8], CNN [5], fuzzy [9], neural network [7, 10,11,12,13,14], PCA [6, 15], CNN-SVM classifier [2, 16], KNN [17], recurrent neural network (RNN) [18] and DNN classifiers [19].

However, some challenges remain. Handwritten characters differ in writing style, stroke thickness, deformation, rotation, etc., which makes them difficult to recognize [17, 18, 20, 21]. The main challenge in a handwriting recognition system is to classify a handwritten digit based on black and white images. Furthermore, to meet industry needs, both accuracy and robustness to variation in individual writing styles must be high.

1.2 Scope of the System

The advent of the digital world began a mere century or two ago, but scriptures and books upon books have been handwritten by human scholars since the beginning of mankind. Accepting the digital world begins with the task of integrating the scripts that came into existence before the rise of computers and technology. This conversion and integration must therefore begin with the most common values in the world, which transcend different languages as well: numbers.

The problem is to categorize handwritten digits into ten distinct classes, zero (0) to nine (9), with accuracy as high as possible. In this work, we combined support vector machines (SVMs), principal component analysis (PCA) and K-nearest neighbour (KNN) techniques to form a novel method to solve the problem. The experiments were carried out on a digit data set [22, 23] taken from the well-known Modified National Institute of Standards and Technology (MNIST) data set [23].

2 Related Work

The literature on handwritten digit recognition presents a number of studies that make use of machine learning techniques. A few techniques related to this work are presented below.

Matan et al. developed a neural network architecture for recognizing handwritten digits in a real-world setting. This network has a 1% error rate with about a 7% reject rate on handwritten zip code digits provided by the US Postal Service [24]. Jitendra Malik et al. developed a simple approach for measuring the similarity between shapes and used it for object recognition. The proposed approach was tested on the COIL data set, silhouettes, trademarks and handwritten digits [21].

S. M. Shamim et al. presented an approach to offline handwritten digit recognition. The main problem is to develop a cost-effective algorithm that can recognize handwritten digits submitted by users by way of a scanner, tablet or other digital device [14].

Caiyun Ma et al. proposed an approach based on specific feature extraction and a deep neural network on the MNIST database. The proposed work is compared with SOM [6] and P-SVM [25]; the results show that the proposed algorithm achieves an accuracy of 94.2% with 24 dimensions and that deep analysis is more beneficial than traditional methods in terms of visualization of features [19]. Anuj Dutt et al. compared some of the most widely used machine learning algorithms, such as SVM, KNN and RFC, with deep learning algorithms such as a multilayer CNN using Keras with Theano and TensorFlow. The results showed an accuracy of 98.70% using CNN (Keras + Theano), compared with 97.91% using SVM, 96.67% using KNN and 96.89% using RFC, and the lowest error rate, 1.28%, using the convolutional neural network [26]. Chayaporn Kaensar presented a comparative analysis of three algorithms: neural network, support vector machine and K-nearest neighbour. The analysis demonstrates that the SVM is the best classifier, with 96.93% accuracy, although it requires more training time than the neural network and K-nearest neighbour [7].

Mohd Razif Shamsuddin et al. presented handwritten digit recognition on the MNIST data set. In this work, four different methods (logistic regression, random forest, extra trees classifier and convolutional neural network) were applied to a normalized MNIST data set and a binary data set. The analysis shows that the best result on the normalized data set, 99.4%, was obtained with the convolutional neural network, and the best result on the binary data set, 92.4%, with the extra trees algorithm. The analysis also shows that the system works better on the normalized data set [27]. Saeed AL-Mansoori proposed a multilayer perceptron (MLP) neural network to solve the problem of handwritten digit recognition. The system performance was observed on the MNIST data set by altering the number of hidden layers and the number of iterations, and the results showed an overall training accuracy of 99.32% and a testing accuracy of 100% [8].

Cheng-Lin Liu et al. presented handwritten digit recognition on binary and grey images using eight different classifiers (KNN, MLP, PC, RBF, LVQ, DLQDF, SVC-poly and SVC-RBF), tested on three different data sets: CENPARMI, CEDAR and MNIST. The work concludes that SVC-RBF gives the highest accuracy among all the algorithms, but that this algorithm is extremely expensive in memory space and computation [28]. In addition to the above, other important works include research on local similarity [29], prototype generation techniques [30], handwriting verification [31], trajectory and velocity modelling [5] and feature extraction [15].

3 Materials and Methods

The work was implemented and tested on a system meeting the following requirements: Intel i3 or later processor, minimum 2 GB RAM, minimum 2 GB graphics processing unit, operating system Windows 7 or above, and Anaconda Python 3.7. All the algorithms were tried using the scikit-learn Python library, version 0.17.1.

3.1 Data Set

The proposed system was implemented and tested using the MNIST (Modified National Institute of Standards and Technology) data set. MNIST contains handwritten digits, with 60,000 examples in the training set and 10,000 examples in the test set. It was derived from the NIST data set, which is a superset of MNIST. Each image is 28 × 28 pixels = 784 pixels. There are 70,000 images in the combined data set that can be used for training and evaluating the system. The data set contains the inputs and the label indicating the class to which each image belongs (i.e. the machine-encoded digits, 0–9) [22, 23].
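As a small illustration of the image-to-feature mapping described above, the following sketch flattens one 28 × 28 image into a 784-element feature vector. The image here is random synthetic data standing in for an MNIST digit, and the scaling to [0, 1] is an assumed preprocessing choice, not something stated in this work.

```python
import numpy as np

# Synthetic stand-in for one 28 x 28 greyscale digit image (pixel values 0-255).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into a single feature vector of 28 * 28 = 784 values.
features = image.reshape(-1)

# Assumed preprocessing: scale pixel intensities to the range [0, 1].
features_scaled = features.astype(np.float64) / 255.0

# Flattening is lossless: reshaping back recovers the original image.
restored = features.reshape(28, 28)

print(features.shape)
```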

3.2 Methods

Figure 1 shows that the proposed approach combines the PCA, KNN and SVM algorithms to improve classification accuracy. The PCA algorithm reduces the number of attributes, keeping those that contribute most towards classification. The first step is to load the data set and separate the feature columns from the target column. The data set is rather large (60,000 samples with 784 features); thus, PCA is used in the initial stage to extract features from the original high-dimensional data.

Fig. 1 Working of the proposed system for handwritten digit recognition
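The PCA step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it computes PCA directly with NumPy's SVD on synthetic low-rank data, and the helper names `pca_fit` and `n_components_for` are hypothetical. The 0.97 target is used only as an example threshold for the fraction of variance retained. In scikit-learn, the `PCA` class exposes the same per-component ratios through its `explained_variance_ratio_` attribute.

```python
import numpy as np

def pca_fit(X):
    """Return (mean, components, explained_variance_ratio) for X of shape (n_samples, n_features)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2 / (X.shape[0] - 1)
    return mean, Vt, variance / variance.sum()

def n_components_for(ratio, target):
    """Smallest number of leading components whose cumulative variance ratio reaches target."""
    cumulative = np.cumsum(ratio)
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(ratio))

# Synthetic data: 5 latent factors mixed into 50 features, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))

mean, components, ratio = pca_fit(X)
n_keep = n_components_for(ratio, 0.97)

# Project the centred data onto the retained principal components.
X_reduced = (X - mean) @ components[:n_keep].T
print(n_keep, X_reduced.shape)
```

Plotting `np.cumsum(ratio)` against the component index gives a curve of retained variance versus number of components.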

The first 60 features explain approximately 97% of the total variance, which is sufficient to represent the information in the original data set, as shown in Fig. 2. Thus, the first 60 principal components are used as the extracted features. The data is then split into training and testing sets. A simple implementation of SVM-KNN goes as follows: the KNN model is created and fit to the training set, which trains the KNN classifier. For a query, the Euclidean distances from the query to all the training samples are computed and the K nearest neighbours are picked. The Euclidean distance (d) is calculated using Eq. 1.

Fig. 2 Amount of data versus component number: first 314 principal components as the extracted features using PCA

$$ d(p,q) = \sqrt {\sum\limits_{i = 1}^{n} {(q_{i} - p_{i})^{2} } } $$
(1)

where p is the first data point, q is the second data point and n is the number of dimensions of each data point.
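Eq. 1 and the neighbour search that uses it can be illustrated with a short sketch. The function names `euclidean_distance` and `k_nearest` and the toy points are hypothetical and chosen only for illustration.

```python
import numpy as np

def euclidean_distance(p, q):
    """Euclidean distance between two points, as in Eq. 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((q - p) ** 2)))

def k_nearest(query, X_train, k):
    """Indices and distances of the k training samples closest to the query."""
    dists = np.sqrt(np.sum((X_train - query) ** 2, axis=1))
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

# Toy example: a 3-4-5 right triangle gives a distance of exactly 5.
print(euclidean_distance([0, 0], [3, 4]))

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]])
indices, distances = k_nearest(np.array([0.0, 0.0]), X_train, k=2)
print(indices, distances)
```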

If the K neighbours (excluding the query) all have the same class, the query is labelled with that class. Otherwise, the pairwise distances between the K neighbours are calculated and the distance matrix is converted into a kernel matrix. Finally, a multiclass SVM is applied to the kernel matrix to label the query. In the initial implementation, 314 principal components were extracted, with parameter values of k = 2 for KNN and C = 0.5 for SVM. This resulted in an accuracy score of 0.964, as shown in Table 1. The k parameter was then tuned by changing its value over a number of iterations while keeping the other parameter values the same and observing the results. The same steps were applied for 20 distinct values of k (number of neighbours), keeping the C (penalty parameter) value constant.
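The SVM-KNN steps described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The text does not specify how the distance matrix is converted into a kernel matrix, so a Gaussian conversion, exp(-gamma * d^2), is assumed here. The function names, the `gamma` parameter and the toy data are hypothetical. The local SVM uses scikit-learn's `SVC` with a precomputed kernel.

```python
import numpy as np
from sklearn.svm import SVC

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=2)

def svm_knn_predict(query, X_train, y_train, k=3, C=0.5, gamma=1.0):
    # Step 1: find the k nearest training samples to the query.
    d2 = pairwise_sq_dists(query[None, :], X_train)[0]
    idx = np.argsort(d2, kind="stable")[:k]
    X_nb, y_nb = X_train[idx], y_train[idx]

    # Step 2: if all neighbours agree, their class is the answer.
    if np.unique(y_nb).size == 1:
        return y_nb[0]

    # Step 3: pairwise neighbour distances -> kernel matrix (assumed Gaussian conversion).
    K_nb = np.exp(-gamma * pairwise_sq_dists(X_nb, X_nb))
    K_query = np.exp(-gamma * d2[idx])[None, :]

    # Step 4: train a local multiclass SVM on the kernel matrix and label the query.
    svm = SVC(kernel="precomputed", C=C)
    svm.fit(K_nb, y_nb)
    return svm.predict(K_query)[0]

# Toy data: two small clusters with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(svm_knn_predict(np.array([0.2, 0.2]), X_train, y_train, k=3))
print(svm_knn_predict(np.array([5.5, 5.5]), X_train, y_train, k=3))
```

Because the SVM is only trained when the neighbours disagree, and then only on k samples, each local problem is very small.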

Table 1 Initial test
Table 2 Accuracy observed for different values of k with C = 0.5

The bold value in Table 2 highlights the best k value, i.e. the one with the highest test set accuracy and the fastest prediction time. Figure 3 plots test set accuracy against the k values taken from Table 2.

Fig. 3 k versus accuracy (test)
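A sweep over k of the kind described above might look like the following sketch. To keep it self-contained, it uses a plain majority-vote KNN classifier as a stand-in for the full SVM-KNN model and three synthetic Gaussian clusters instead of MNIST. The function names and the data are assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = y_train[nearest]
    n_classes = int(y_train.max()) + 1
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])

def sweep_k(X_train, y_train, X_test, y_test, k_values):
    """Test accuracy for each candidate k, with everything else held fixed."""
    results = {}
    for k in k_values:
        predictions = knn_predict(X_train, y_train, X_test, k)
        results[k] = float((predictions == y_test).mean())
    return results

# Synthetic data: three Gaussian clusters, 100 points each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 100)
X = centres[y] + rng.normal(size=(300, 2))

perm = rng.permutation(300)
train, test = perm[:200], perm[200:]

# Try 20 distinct values of k and keep the one with the highest test accuracy.
accuracy_by_k = sweep_k(X[train], y[train], X[test], y[test], range(1, 21))
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(best_k, accuracy_by_k[best_k])
```

When several values of k tie for the best accuracy, `max` returns the first one encountered, which here is the smallest k.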

The value k = 3 gives the highest accuracy, as shown in Table 2. Hence, k = 3 is kept constant and the value of C is changed to understand how accuracy varies with C, as follows (Fig. 4; Table 3):

Fig. 4 Accuracy (test) with respect to C

Table 3 Accuracy observed for k = 3 with different values of C

From the above results, it is concluded that the best value of k is k = 3. However, changes in the C value do not affect the final accuracy score. This result is unusual, but it can be explained: the input space to the SVM is very small (size 3), so the SVM algorithm can classify the data quickly and changing its parameters has little effect on the accuracy. The final solution then uses k = 3 and C = 0.005 and yields an accuracy score of 0.9720, as shown in Table 2.

4 Results and Discussions

4.1 Data set Analysis

The digit data set has a total of 70,000 image samples (42,000 in the training set and 28,000 in the testing set, each with 784 features). Figure 5 shows the number of occurrences of each digit label (i.e. 0–9) in the training data set of 42,000 samples.

Fig. 5 Occurrence of each digit in the training set
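Counting the occurrences of each digit label is straightforward; a minimal sketch is shown below. The labels are randomly generated stand-ins for the 42,000 training labels, so the counts printed are not those of the actual data set.

```python
import numpy as np

# Synthetic stand-in for the training labels (digits 0-9).
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=42_000)

# counts[d] is the number of training samples labelled with digit d.
counts = np.bincount(labels, minlength=10)

for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} samples")
```

A roughly uniform distribution of counts indicates that no digit class dominates the training set.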

4.2 Classification Report

Figure 6 displays the detailed classification report, containing the precision, recall, F1 score and support of the model.

Fig. 6 Classification report
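The quantities in a classification report can be computed per class from the true and predicted labels, as in the sketch below. The function name `per_class_report` and the six toy labels are hypothetical. scikit-learn provides an equivalent summary through `sklearn.metrics.classification_report`.

```python
import numpy as np

def per_class_report(y_true, y_pred, n_classes):
    """Precision, recall, F1 score and support for each class."""
    rows = []
    for c in range(n_classes):
        tp = int(np.sum((y_pred == c) & (y_true == c)))  # correctly predicted as c
        fp = int(np.sum((y_pred == c) & (y_true != c)))  # wrongly predicted as c
        fn = int(np.sum((y_pred != c) & (y_true == c)))  # samples of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({"class": c, "precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn})
    return rows

# Toy labels for three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

report = per_class_report(y_true, y_pred, n_classes=3)
for row in report:
    print(row)
```

Support is the number of true samples of each class, so the supports sum to the size of the evaluated set.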

4.3 Manual Result Testing

By manually taking digits from the data set, plotting their 28 px by 28 px square images using the imshow function in matplotlib and comparing them with the predicted outcomes, we get the following:

The actual images and their labels are shown in Fig. 8. The same images were fed to the model, and the model's predictions are shown in Fig. 7. The model mislabels the fifth image, identifying it as 0 when the correct label is 9.

Fig. 7 Images with their predicted labels

Fig. 8 Actual images with their true labels

4.4 Confusion Matrix

Figure 9 visualizes the confusion matrix. It plots the predicted values against the actual values, with the actual labels on the Y-axis and the predicted labels on the X-axis. The model was applied to the testing data set. The model correctly predicted the label 0 a total of 1636 times.

Fig. 9 Confusion matrix, without normalization
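An unnormalized confusion matrix with actual labels on the rows and predicted labels on the columns can be built as in the following sketch. The function name `confusion_counts` and the toy labels are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Matrix whose entry [i, j] counts samples with actual label i and predicted label j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Unbuffered add so that repeated (actual, predicted) pairs are all counted.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels for three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_counts(y_true, y_pred, n_classes=3)
print(cm)

# Diagonal entries are correct predictions, so accuracy is the trace over the total.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```

The diagonal entry for a class is the number of times that class was predicted correctly, which is how a figure such as the count for label 0 is read off the matrix.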

5 Conclusion and Future Scope

In this work, the model provides an acceptable solution for classifying handwritten digits in the MNIST data set into their respective labels: it categorizes them with accuracy quite close to that of humans using a combination of two classification techniques, support vector machine and K-nearest neighbours. However, the model is still in its rudimentary stages and is useful in a limited domain. To solve larger problems, such as recognizing multiple digits in an image or recognizing arbitrary multidigit information in unconstrained natural images, several changes need to be made to this work.