1 Introduction

Handwritten character recognition is gaining momentum because it can significantly reduce the time spent on tasks such as data entry, form filling, banking and the automation of postal services. However, developing a dependable system for recognizing handwritten characters of regional scripts such as Meetei-Mayek still poses a challenge to researchers. The main reason is the variation in the shapes of the characters, which depends on factors such as the acquisition device, the ink colour and the width of the pen. Moreover, handwritten Meetei-Mayek characters tend to be much more complex than common English characters owing to the presence of modifiers and to their shape and structure. These factors demand a sophisticated pattern recognition algorithm that can efficiently handle the challenging task of classifying these characters.

In this paper, the design of a Handwritten Manipuri Meetei-Mayek (HMMM) character recognition system is discussed. The history and origin of Meetei-Mayek are described in detail in the works of Wangkhemcha [1], Mangang [2] and Hodson [3]. Manipuri, or Meeteilon, is one of the scheduled languages of India and the official language of Manipur, a state in the North-Eastern part of India. The script contains a total of 56 characters, which can be classified into 5 categories: Iyek Ipee/Mapung Iyek (27 alphabets), Cheitek Iyek (8 symbols), Lonsum Iyek (8 letters), Khudam Iyek (3 symbols) and Cheishing Iyek (10 numerals). Apart from the three Khudam Iyek, all other characters are internationally accepted symbols. The basic characters, or Iyek Ipee, appear only as the main character of a word, which may be modified by adding one of the extended symbols or vowel modifiers to produce the required pronunciation. All the original characters of the Manipuri Meetei-Mayek (MMM) alphabet are drawn, winded and wreathed based on the features of the human anatomy. Accordingly, the alphabets are named after different parts of the human body [2]. The Meetei-Mayek characters recognized in the current work are shown in figure 1(a), along with the meanings of their names.

The paper is organized in the following order. Section 2 highlights some works related to handwritten character recognition. Section 3 presents the system design of the proposed handwritten character recognition method. Section 4 presents the experimental results, error analysis and evaluations of the proposed character recognition approaches described in section 3. Section 5 charts a comparison between the two approaches for determining size of the feature descriptor, and also to other existing research works for MMM character recognition. Section 6 concludes the paper.

2 Related works

Research on OCR for MMM is still at an early stage, whereas many research works have already been carried out on the scripts of other Indian languages. Sections 2.1 and 2.2 highlight the research works carried out on popular Indian languages and on MMM, respectively.

2.1 Research works on other Indian languages

Rani et al [4] focused on the problem of recognition related to Gurumukhi script; they used different techniques for extracting features such as projection histogram, background directional distribution (BDD) and zone-based diagonal features. The extracted features were classified using a support vector machine (SVM) classifier with 5-fold cross-validation and an RBF (radial basis function) kernel. They achieved a very high accuracy of 99.4% using a combination of BDD and diagonal features with the SVM classifier. Arora et al [5] discussed the characteristics of some classification methods that have been successfully applied to handwritten Devanagari character recognition. The results showed that good classification of Devanagari can be achieved with SVM. Sinha [6] presented an overview of the historical development of the modern Indian scripts’ writing system, their mechanization and adaptation to computing, and examined how it facilitated the development of Indian language processing. He concentrated primarily on the Devanagari script and also discussed those features found in current language usage; he explained how the unifying characteristics of the scripts and languages have been exploited for all Indian scripts and languages. Pal et al [7] proposed a system for recognizing offline Bengali handwritten compound characters using the Modified Quadratic Discriminant Function (MQDF). Using a 5-fold cross-validation technique they obtained an accuracy of 85.90% on a dataset of Bengali compound characters containing 20,543 samples. Sharma et al [8] proposed a scheme for unconstrained offline handwritten Devanagari numeral and character recognition using a quadratic classifier based on features obtained from the chain code histogram. They achieved an average accuracy of 98.86% for Devanagari numerals and 80.36% for Devanagari characters.
Basu et al [9] presented a recognition system for handwritten Bengali alphabets using a 76-element feature set, which included 24 shadow features, 16 centroid features and 36 longest-run features. The recognition performances achieved for training and test sets were 84.46% and 75.05%, respectively. Plamondon and Srihari [10] presented a comprehensive survey for online and offline handwriting recognition; they described the nature of handwritten language and the basic concepts behind written language recognition algorithms. They also indicated in their literature the algorithms for pre-processing, character word recognition and performance with practical systems. Other fields of application like signature verification, writer authentication and handwriting learning tools were also considered.

2.2 Research works on MMM

Maring and Dhir [11] described the recognition of both handwritten and printed Meetei-Mayek numerals. A Gabor filter was used for feature extraction and classification was carried out using SVM. The experiment was carried out using \(14 \times 10\) pixel images, and overall accuracies of 89.58% and 98.45% were achieved for handwritten and printed numerals, respectively. Romesh et al [12] described the design of an OCR system for handwritten text in Meitei-Mayek alphabets using an artificial neural network (ANN). The database contained 1000 samples, of which 500 were used as a training database and the remaining samples were kept for testing and validation. They observed that the success of the system depended on the features used to represent the character as well as on the segmentation stage of the test image. Chandan and Sanjib [13] presented an SVM-based handwritten numeral recognition system for Manipuri script or Meetei-Mayek. They used various techniques for extracting features such as BDD, zone-based diagonal, projection histograms and Histogram Oriented features, which were then classified using SVM with 5-fold cross-validation and an RBF kernel. They achieved a maximum accuracy of 95%. Romesh et al [14] described a way of simulating and modelling handwritten Meitei-Mayek digits using a back-propagation (BP) neural network approach. They achieved an overall performance of 85%. Thokchom et al [15] proposed methods for training a BP network with probabilistic features, fuzzy features and a combination of both for recognizing handwritten Meetei-Mayek characters. They achieved an accuracy of 90.3% for the proposed 27-class neural network classifier with a combination of probabilistic and fuzzy features.

3 System design

The motivation of this paper is to propose a robust method for classifying offline HMMM characters. In the current work, Multilayer Perceptron Neural Network (MLPNN)-based classification with Histogram of Oriented Gradient (HOG) descriptors is compared with a multiple-feature-size HOG descriptor combined with a linear kernel SVM classifier (figures 2 and 3). The work began with a thorough literature survey of the existing works on MMM script. It was found that, owing to the complex nature of the script, no published work so far classifies handwritten Meetei-Mayek alphabets and numerals together successfully or efficiently. However, previous works on numerals alone were quite successful, as reported in section 2.2.

In order to fully comprehend the nature of the problem affecting the recognition accuracy of such handwritten characters, two different approaches are studied in detail. To begin with, all the acquired sample images are pre-processed to remove noise and to extract the characters individually. The pre-processing steps are discussed in section 3.1. The pre-processed image samples are passed to the HOG descriptor with the required cell sizes in order to extract feature vectors. As a first approach, the HOG feature vectors were trained and recognized using MLPNN. The classification task using this approach, described in section 3.3, was done once, i.e. using \(128 \times 128\) input image sizes. It was learnt that an increase in the input image size resulted in only a meagre increase in recognition accuracy, a gain that was not justified however small the time taken to train the network was. It was also realized from the three confusion matrices that most of the characters like ‘2’, ‘4’, ‘LAI’, ‘MIT’, ‘PA’, ‘NA’, ‘CHEEN’, ‘TIL’, ‘KHOU’, ‘WAI’, ‘GOK’, ‘RAAI’, ‘GHOU’, ‘DHOU’, ‘BHAM’, ‘KOK-LONSUM’, ‘MIT-LONSUM’, ‘NGOU-LONSUM’, ‘EE-LONSUM’, ‘INAP’, ‘UNAP’, ‘SOUNAP’, ‘CHEINAP’ and ‘NUNG’ were consistently misclassified or confused amongst themselves despite the increase in image size. However, other characters showed a slight improvement in their individual accuracies.

Therefore, based on the work by Dalal and Triggs [16], section 3.4 describes a procedure for efficiently recognizing HMMM using HOG feature descriptors and a linear SVM classifier. Their feature extractor works by dividing an image into small spatial regions or cells; each cell accumulates a local 1-D histogram of edge orientations over its pixels, and the combined histogram entries form the representation. In this work, multiple cell sizes for extracting HOG features have been considered in order to determine which size yields better results for the current classification problem. The extracted feature vectors were used as training data for the linear kernel SVM classifier. With this approach, a significant increase in overall (average) accuracy was obtained, along with a large decrease in training time compared with the MLPNN approach.

Figure 1. (a) Meetei-Mayek script. (b) A sample of the handwritten character ‘MIT(Ma)’. (c) Pre-processed image. (d) All elements are detected and then encapsulated prior to extraction of each one of them.

Figure 2. Perceptron neural network.

Figure 3. Feedforward neural network.

3.1 Processing the handwritten image

In this section, the stages prior to the recognition stage are described.

3.1a Image acquisition:

In this stage, raw data are created and collected. First, a total of 5600 handwritten samples were collected from people with different handwriting styles. The samples were then scanned using a scanner and saved as JPEG files. A sample of the acquired handwritten image for the letter ‘MIT’ is shown in figure 1(b).

3.1b Pre-processing:

In order to make the image suitable for further processing, the acquired images must be pre-processed. The term pre-processing refers to the removal of any form of noise that corrupts the useful data, so that recognition efficiency is not reduced. For character recognition tasks, a binary image is sufficient to work with, so the input grey image is transformed using thresholding. Morphological erosion is performed so as to close the discontinuities between some letters; a square-shaped structuring element of size 2 is selected for the purpose. Morphological erosion is a basic operator in mathematical morphology that is usually applied to binary or greyscale images. The operation erodes the boundaries of regions of foreground pixels (i.e. white pixels), so that the areas of foreground pixels shrink in size and holes within those regions become larger. The morphologically eroded image is finally converted into a binary image [17]. Figure 1(c) shows the final image after pre-processing.
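The thresholding and erosion steps can be illustrated with a minimal sketch in plain Python. The threshold level of 128, the convention that bright pixels are foreground, and the anchoring of the structuring element at its top-left corner are assumptions made for illustration; the function names are hypothetical and this is not the implementation used in the paper.

```python
def threshold(gray, level=128):
    """Binarize a greyscale image given as a list of rows.

    Assumption: pixels brighter than `level` are foreground (1)."""
    return [[1 if px > level else 0 for px in row] for row in gray]


def erode(binary, size=2):
    """Binary erosion with a size x size square structuring element.

    A pixel stays 1 only if every pixel under the element (anchored at
    its top-left corner) is 1. Positions where the element would leave
    the image are set to 0."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            fits = r + size <= h and c + size <= w
            out[r][c] = int(fits and all(
                binary[r + dr][c + dc]
                for dr in range(size) for dc in range(size)))
    return out


# A 3 x 3 bright block inside a 4 x 4 image shrinks to 2 x 2 after erosion.
gray = [[200, 200, 200, 10],
        [200, 200, 200, 10],
        [200, 200, 200, 10],
        [10, 10, 10, 10]]
eroded = erode(threshold(gray))
```

The example shows the shrinking of foreground regions described above: the eroded image keeps only the four pixels whose full 2 × 2 neighbourhood is foreground.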

3.1c Extracting individual elements:

Prior to extracting each element from the binary image obtained in the previous step, each element must be labelled so that automatic extraction is possible. For this purpose, each element is bounded by a rectangular box. It can be seen from figure 1(d) that the sizes of the boxes differ because some characters are bigger than others. The bounding box property of each object is an array of 4 elements formatted as [x, y, w, h], where (x,y) represents the row–column coordinates of the upper left corner of the box; w and h are, respectively, the width and height of the box. The next step is to create a 4-column matrix that holds all of these bounding box properties together, where each row denotes a single bounding box. To visualize the bounding boxes, a red box is drawn around each detected character. The final task is to extract all of the characters and place them into a cell array, which accommodates the uneven character sizes. A cell array is a container whose indexed elements, called cells, may each hold any type of data, commonly combinations of text and numbers, lists of strings or numeric arrays of varying sizes. Looping over every bounding box and extracting the pixels within it yields a character that can be placed in the cell array. Thereafter, using a loop, each of the characters in the cell array is written to a directory for further use.
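The labelling and extraction procedure can be sketched as follows. This is an illustrative plain-Python version, not the paper's code: it assumes 8-connected components, zero-based indices, and that x is the column and y the row of the upper left corner; a Python list stands in for the cell array, and the function names are hypothetical.

```python
from collections import deque


def bounding_boxes(binary):
    """Return one [x, y, w, h] box per 8-connected foreground component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r0 in range(h):
        for c0 in range(w):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search over the component containing (r0, c0).
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            rmin = rmax = r0
            cmin = cmax = c0
            while queue:
                r, c = queue.popleft()
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < h and 0 <= cc < w
                                and binary[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            boxes.append([cmin, rmin, cmax - cmin + 1, rmax - rmin + 1])
    return boxes


def crop_characters(binary, boxes):
    """Cut out the pixels of each box; crops may differ in size."""
    return [[row[x:x + w] for row in binary[y:y + h]]
            for x, y, w, h in boxes]


page = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]]
boxes = bounding_boxes(page)          # one row per detected element
characters = crop_characters(page, boxes)
```

Because each crop keeps its own dimensions, a list of lists plays the role of the cell array; each entry could then be written to disk in a final loop.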

3.2 Feature extraction from Handwritten Meetei-Mayek script using HOG descriptors

Detecting features in Meetei-Mayek script is a complicated task owing to the similarity and complexity of the characters. The first requirement is a robust feature detector that conforms to the shape or structure of the input image so that characters can be discriminated clearly. The current study focuses on feature set extraction from Handwritten Meetei-Mayek script using HOG descriptors. The features extracted with multiple HOG cell sizes are used as training data for multiple classifiers; the detailed implementation is explained in sections 3.3 and 3.4.

HOG is a standard image feature used, among others, in object detection and deformable object detection. The method evaluates normalized histograms of image gradient orientations on a dense grid. The underlying idea is that the shape and appearance of an object can be characterized well by the distribution of local edge directions, even without exact knowledge of the corresponding edge positions. It is implemented by dividing the image window into ‘cells’, which are small spatial regions. Each cell accumulates a local 1-D histogram of gradient directions over its pixels, and the combined histogram entries form the representation. It is also useful to contrast-normalize the local responses before using them, for improved invariance to shadowing or illumination effects. This is achieved by accumulating a measure of the ‘energy’ of the local histograms over somewhat larger spatial regions or ‘blocks’ and then normalizing all of the cells in the block. The normalized blocks are referred to as the HOG descriptor.

HOG divides the input image into square cells of size cellsize, fitting as many cells as possible and filling the image domain from the upper-left corner to the lower-right one. For each row and column, the last cell is at least half contained in the image. More precisely, the number of cells obtained in this manner is

$$\begin{aligned} width\_hog= & {} (width + cellsize/2)/cellsize, \end{aligned}$$
(1)
$$\begin{aligned} height\_hog= & {} (height + cellsize/2)/cellsize. \end{aligned}$$
(2)
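As a worked illustration of Eqs. (1) and (2), the sketch below evaluates the cell-grid size for a \(50 \times 50\) image and the three cell sizes considered in this work. It assumes that the divisions in the equations are integer (floor) divisions, which the equations do not state explicitly; the function name is hypothetical.

```python
def hog_grid(width, height, cellsize):
    """Number of HOG cells per row and column, following Eqs. (1)-(2).

    Assumption: both divisions are integer (floor) divisions."""
    width_hog = (width + cellsize // 2) // cellsize
    height_hog = (height + cellsize // 2) // cellsize
    return width_hog, height_hog


# 50 x 50 images with the three cell sizes studied in this work.
grids = {c: hog_grid(50, 50, c) for c in (6, 7, 8)}
```

Under this assumption, the \(50 \times 50\) images yield grids of \(8 \times 8\), \(7 \times 7\) and \(6 \times 6\) cells for cell sizes 6, 7 and 8, respectively.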

Later, the image gradient \(\nabla l(x,y)\) is computed using central differences and is then assigned to one of \(2 \times number\_of\_orientations\) orientations in the range \([0,2\pi ]\). The contributions are then accumulated using bilinear interpolation into four neighbouring cells, which results in a histogram of dimension \(2 \times number\_of\_orientations\), called directed orientations since it accounts for the direction as well as the orientation of the gradient.

Implementation: The implementation of the HOG feature descriptors for Meetei-Mayek script is based on the research work by Dalal and Triggs [16]. The detector has been tested in our Meetei-Mayek database, which roughly comprises 56 different classes multiplied by 100 samples each.

The training images comprise roughly 56 different classes times 75 samples each. The pre-processing procedure detailed in section 3.1 is used to segment each of the character samples and finally, the images are resized to \(50 \times 50\) pixels. For testing, the remaining 25 samples for each of the character/class are used to validate how well the classifier performs on data that are different from the training data. Although this is not the most representative dataset, there are enough data to train and test a classifier, and show the feasibility of the approach.

The data used for training the classifier are the HOG feature vectors extracted from the input training images. Hence, it is important that the feature vector encodes a sufficient amount of information about the object. By varying the cell size parameter, the amount of information encoded by each feature vector can be observed. Each pixel in the image casts a weighted vote for an edge orientation histogram channel. The weighted vote, which is based on the orientation of the gradient element, is accumulated into bins over local regions, which are termed cells. Whether signed orientation is used is specified as a logical scalar. With unsigned orientation, the bins are evenly spaced from 0 to 180 degrees, and a gradient angle of less than 0 is placed into the bin of that angle plus 180 degrees. With signed orientation, dark-to-light and light-to-dark transitions within an area of an image can be differentiated. Bilinear interpolation of votes between neighbouring bin centres reduces aliasing in orientation as well as position. Increasing the cell size captures large-scale spatial information. The cell size is specified in pixels as a 2-element vector. The ability to suppress changes in local illumination is reduced with increasing block size, because minute details are lost as a result of averaging. Therefore, a reduction in the size of blocks helps in capturing the significance of local pixels. However, in actual practice, the parameters must be varied by repeatedly training and testing in order to identify the optimal settings.
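The voting scheme for a single cell can be sketched as below. This is a simplified illustration under stated assumptions: 9 unsigned orientation bins over 0 to 180 degrees, votes weighted by gradient magnitude, and linear interpolation between the two nearest bin centres only. Spatial interpolation between cells and block normalization are omitted, and the function name is hypothetical.

```python
import math


def cell_histogram(gx, gy, num_bins=9):
    """Unsigned orientation histogram for the pixels of one cell.

    gx, gy: lists of rows holding the horizontal and vertical gradient
    of each pixel. Each pixel votes with its gradient magnitude, split
    between the two nearest bin centres."""
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)
            # Unsigned orientation: negative angles are folded by +180.
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            pos = angle / bin_width - 0.5   # position in bin-centre units
            lower = math.floor(pos)
            frac = pos - lower
            hist[lower % num_bins] += magnitude * (1.0 - frac)
            hist[(lower + 1) % num_bins] += magnitude * frac
    return hist


# A horizontal gradient (0 degrees) lies midway between the centres of
# the last and first bins, so its vote is split between them.
hist = cell_histogram([[1.0]], [[0.0]])
```

With unsigned bins, a gradient pointing in the opposite direction (180 degrees) produces the same histogram, which is the folding behaviour described above.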

For instance, in the current work, the optimal cell size of the HOG feature for efficiently recognizing Meetei-Mayek characters is explored by considering the cell sizes \(6 \times 6\), \(7 \times 7\) and \(8 \times 8\). Figure 4 shows the features extracted using HOG descriptors for the Meetei-Mayek numeral ‘9’.

The extracted HOG features are returned as a 1\(\times N\) vector (the features encode local shape information from regions or from point locations within an image) where N is HOG feature length and is based on the image size and the function parameter values. Let us suppose that \(B_{image}\) is the number of blocks per image, C is the cell size, \(N_{b}\) is the number of bins, \(B_{o}\) is the block overlap, \(B_{size}\) is the block size and \(size_{image}\) is the size of the image. The following equations are used for appropriately deducing the value of N:

$$\begin{aligned} N = B_{image}B_{size}N_{b} \end{aligned}$$
(3)

where

$$\begin{aligned} B_{image}=\frac{\left( \frac{size_{image}}{C}-B_{size}\right) }{B_{size}-B_{o}}+1. \end{aligned}$$
(4)
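Equations (3) and (4) can be checked numerically. The sketch below assumes a block size of \(2 \times 2\) cells, a block overlap of one cell and 9 orientation bins, values that are not stated in the text; with these assumptions and \(50 \times 50\) images, the computed lengths agree with the feature lengths quoted in the caption of figure 4. Equation (4) is applied per image dimension with floor division, and \(B_{image}\) and \(B_{size}\) in Eq. (3) are taken as the products over both dimensions.

```python
def hog_feature_length(image_size, cell_size,
                       block_size=2, block_overlap=1, num_bins=9):
    """HOG feature length N for a square image, following Eqs. (3)-(4).

    Assumptions: square cells and blocks, floor division, and the
    default block size, overlap and bin count given above."""
    cells = image_size // cell_size
    # Eq. (4), evaluated along one image dimension.
    blocks = (cells - block_size) // (block_size - block_overlap) + 1
    # Eq. (3): blocks per image x cells per block x bins per cell.
    return (blocks * blocks) * (block_size * block_size) * num_bins


lengths = {c: hog_feature_length(50, c) for c in (6, 7, 8)}
```

For cell sizes 6, 7 and 8 this gives 1764, 1296 and 900, respectively, showing how a larger cell size shortens the feature vector.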

Table 1 highlights the number of detected features on MMM for different cell sizes. It is important to deduce the dimension of cell size that gives us the best recognition performance when combined with classifiers.

3.3 MMM recognition using MLPNN with HOG descriptors

ANNs are among the most popular techniques for recognition and classification tasks because of their learning and generalization abilities. They are composed of multiple layers, within which a large number of processing elements, or neurons, are interconnected and work in unison to solve problems. They can be configured for specific applications, such as data classification or pattern recognition, through a learning process. The multilayer perceptron is a network of fully connected neurons having an input layer, one or more hidden layers and an output layer. Each neuron in a layer is connected to every neuron in the next layer by a weighted link through which the state of the neuron is transmitted. Each layer may have its own activation function and number of neurons.

3.3a Perceptron:

The perceptron neural network consists of a single layer of S neurons connected to R inputs through a set of weights \(w_{i,j}\) as shown in figure 2. The indices i and j indicate that \(w_{i,j}\) is the strength of the connection from the \(j^{\mathrm{th}}\) input to the \(i^{\mathrm{th}}\) neuron.

3.3b Feedforward neural network:

A feedforward network usually consists of one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with non-linear transfer functions allow the network to learn non-linear as well as linear relationships between the input and output vectors. The linear output layer lets the network produce values outside the range –1 to +1.

On the other hand, a sigmoid-type transfer function should be used in the output layer if the outputs of the network are required to be constrained (such as between 0 and 1). For multiple-layered networks, the superscript on the weight matrices identifies the layer, which can also be noted from the neuron model and network architectures. Figure 3 shows a two-layered tansig/purelin network; it can be used as a general function approximator that approximates any function with a finite number of discontinuities, provided the number of neurons in the hidden layer is sufficient.
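The forward computation of a two-layered tansig/purelin network can be sketched in a few lines. The weights below are arbitrary illustrative values, not trained parameters, and the function name is hypothetical; `tansig` is taken to be the hyperbolic tangent and `purelin` the identity.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tansig (tanh) hidden layer, purelin (linear) output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


# One input, one hidden neuron, one output neuron (illustrative weights).
w_hidden, b_hidden = [[1.0]], [0.0]
w_out, b_out = [[3.0]], [0.5]
y_zero = forward([0.0], w_hidden, b_hidden, w_out, b_out)
y_large = forward([100.0], w_hidden, b_hidden, w_out, b_out)
```

The hidden activation is bounded by the tansig function, but the linear output layer rescales it, so the network output for the large input lies well outside the range –1 to +1.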

3.3c BP algorithm:

The BP algorithm is the most popular method for training neural networks and has been used to solve numerous real-life problems. It is applied to multilayer feedforward neural networks and iteratively minimizes a cost function by adjusting the connection weights according to the error between the computed and the desired output values. Figure 2 shows a general three-layer network. The error or cost function is the sum of squared differences between the output values and the desired target values of the network, scaled by one half. The following formula is used for this error:

$$\begin{aligned} E = \frac{1}{2}\sum _{p}\left( \sum _{k}(t_{pk}-o_{pk})^{2}\right) . \end{aligned}$$
(5)

In the current work, a general three-layer BP network is used. When \(w_{jk}\) changes, it affects the error only on one output unit k. When \(w_{ij}\) changes, it affects the error on all the output units. Here p in the subscript represents a pattern and k represents the output units. Thus, \(t_{pk}\) is the target value of output unit k for pattern p and \(o_{pk}\) is the actual output value of output unit k for pattern p. This error function is commonly used; however, other types of error function can also be applied. During the training process, a set of pattern examples is used, each example consisting of an input paired with the corresponding target output. The patterns are presented to the network sequentially in an iterative manner. Appropriate weight corrections are performed during the process to adapt the network to the desired behaviour. The iterative procedure continues until the corrected weight values allow the network to perform the required mapping; each presentation of the whole pattern set is named an epoch. The minimization of the error function is carried out using a gradient descent technique. The necessary corrections to the weights of the network for each iteration n are obtained by calculating the partial derivative of the error function with respect to each weight \(w_{jk}\), which gives the direction of steepest ascent in weight space. The weight update value \(\Delta w_{jk}\) uses the negative of this gradient to perform the minimization. Based on the gradient direction, the delta rule determines the required weight update together with the step size:

$$\begin{aligned} \Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} \end{aligned}$$
(6)

The parameter \(\eta \) represents the step size and is called the learning rate.

We have the weight change in the hidden layer equal to

$$\begin{aligned} \Delta w_{ij} = \eta \delta _{j} O_{i} \end{aligned}$$
(7)

where \(\delta _{j}\) is the error term of hidden unit j and \(O_{i}\) is the output of unit i feeding it. The \(\delta _{k}\) for the output units can be calculated using directly available values since the error measure is based on the difference between the desired \(t_{k}\) and the actual \(o_{k}\) values. However, this measure is not available for the hidden neurons. The solution is to back-propagate the \(\delta _{k}\) values layer by layer through the network so that all the weights can be updated. A momentum term was introduced in the BP algorithm by Rumelhart. The idea is to incorporate in the present weight update some influence of the past iterations. The delta rule thus becomes

$$\begin{aligned} \Delta w_{ij}(n) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \Delta w_{ij}(n-1) \end{aligned}$$
(8)

where \(\alpha \) is the momentum parameter, which determines the amount of influence of the previous iteration on the present one. It introduces a ‘damping’ effect on the search procedure, avoiding oscillations in irregular areas of the error surface by averaging gradient components with opposite signs, and accelerating convergence in long flat areas. In some situations, it may prevent the search procedure from becoming trapped in local minima. It may be considered an approximation to a second-order method, as it uses information from the previous iterations. In some applications, it has been shown to improve the convergence of the BP algorithm [18, 19].
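The interaction of the error function, the gradient step and the momentum term can be illustrated on a deliberately small problem. The sketch below trains a single linear unit rather than a full three-layer network, so no back-propagation through hidden layers is involved; the learning rate, momentum value, epoch count and toy data are assumptions chosen only so that the example converges, and the function names are hypothetical.

```python
def momentum_update(grad, prev_delta, eta, alpha):
    """Momentum rule: delta(n) = -eta * dE/dw + alpha * delta(n - 1)."""
    return [-eta * g + alpha * d for g, d in zip(grad, prev_delta)]


def train_linear_unit(samples, epochs=200, eta=0.05, alpha=0.5):
    """Fit o = w0 * x + w1 by minimising E = 1/2 * sum_p (t_p - o_p)^2."""
    w = [0.0, 0.0]
    delta = [0.0, 0.0]
    for _ in range(epochs):            # one epoch = one pass over all patterns
        grad = [0.0, 0.0]
        for x, t in samples:
            o = w[0] * x + w[1]
            grad[0] += -(t - o) * x    # dE/dw0 for this pattern
            grad[1] += -(t - o)        # dE/dw1 for this pattern
        delta = momentum_update(grad, delta, eta, alpha)
        w = [wi + di for wi, di in zip(w, delta)]
    return w


# Toy patterns generated by t = 2x + 1.
weights = train_linear_unit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

After training, the weights approach the generating values 2 and 1. Setting `alpha` to 0 reduces the update to the plain delta rule of Eq. (6).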

3.3d Implemented neural network architecture:

The neural network architecture consists of three layers: the input layer, the hidden layer and the output layer. The input layer consists of 16384 neurons because the pre-processed samples are converted to Glyph images of size \(128 \times 128\). The output layer is composed of only 6 neurons, each of which represents 0 or 1, i.e., a binary representation; ‘tansig’ and ‘logsig’ transfer functions are used, and the 0–1 output range of ‘logsig’ is well suited to learning Boolean output values. The matched character is obtained by decoding the 6-digit binary number formed by the outputs. The 6 output neurons are capable of representing a total of 63 characters according to binary calculations.
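The binary output coding can be sketched as follows. The mapping of class indices to codes is an assumption made for illustration (classes numbered from 1, most significant bit first, outputs thresholded at 0.5); the paper does not specify the code assignment, and the function names are hypothetical.

```python
def encode_class(index, bits=6):
    """Target vector for a class index, most significant bit first."""
    return [(index >> shift) & 1 for shift in range(bits - 1, -1, -1)]


def decode_outputs(outputs, threshold=0.5):
    """Round each output neuron to 0 or 1 and read the bits as an integer."""
    value = 0
    for o in outputs:
        value = (value << 1) | (1 if o >= threshold else 0)
    return value


target = encode_class(56)                       # code for class number 56
predicted = decode_outputs([0.9, 0.8, 0.7, 0.1, 0.2, 0.3])
```

Numbering the classes from 1 leaves the all-zero code unused, which gives 63 usable codes for 6 output neurons, enough for the 56 classes considered here.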

A hidden layer of 100 neurons was finally selected after testing different layer sizes, because it gave the best results and too many neurons increase the risk of overfitting. Characters were resized, normalized and formed into vectors to feed into the network for training. Figure 5 presents the neural network architecture. In the current work, the same neural network architecture is used in three different ways: the training samples are used with three different HOG cell sizes (\(6 \times 6\), \(7 \times 7\) and \(8 \times 8\)) to study the effects on accuracy.

Training set: A total of 56 different characters/classes with 100 handwritten samples each were collected, out of which 75 samples for each of the characters were used for training the neural network. Out of the 75 samples for each of the classes, 80% are used for training, 10% are used for validation and the remaining 10% are used for testing during the regression. The 56 different classes of Meetei-Mayek characters considered in the current work comprise 43 alphabets and 10 digits as shown in figure 1(a).

Figure 4. (a) Sample of the pre-processed handwritten Meetei-Mayek numeral ‘9’. (b) HOG feature of \(6 \times 6\) size cell with length 1764. (c) HOG feature of \(7 \times 7\) size cell with length 1296. (d) HOG feature of \(8 \times 8\) size cell with length 900.

Table 1. Cell size versus HOG feature length.

Figure 5. Neural network for \(20 \times 20\) pixel feature size.

Figure 6. Mean squared error (mse) versus epochs for HOG cell size \(6 \times 6\) with MLPNN.

Figure 7. Mean squared error (mse) versus epochs for HOG cell size \(7 \times 7\) with MLPNN.

Figure 8. Mean squared error (mse) versus epochs for HOG cell size \(8 \times 8\) with MLPNN.

As described earlier, the neural network is trained in two passes during each iteration: a forward pass and a backward pass. In the forward pass, the input signals are propagated from the input layer of the network to the output layer. In the backward pass, error signals generated at the output layer are propagated backward through the network to adjust the weights of the neurons. The network is trained using a training function that updates weight and bias values according to gradient descent with momentum and an adaptive learning rate.

The performance goal was set to 0.0005. Owing to constant optimization during the regression phase, the neural network with HOG cell size \(6 \times 6\) converged to its maximum accuracy in just 123 epochs, with a training time of 2 min and 34 s. Secondly, the neural network trained using Glyph images with HOG cell size \(7 \times 7\) took 190 epochs and 5 min and 17 s to converge to its maximum accuracy. Lastly, for the HOG cell size \(8 \times 8\), the network converged to its maximum accuracy in 117 epochs in just 1 min and 8 s. The mean squared error versus epochs for the networks trained with HOG cell sizes \(6 \times 6\), \(7 \times 7\) and \(8 \times 8\) is shown in figures 6, 7 and 8, respectively.

3.4 MMM recognition using linear SVM classifier with HOG feature descriptors

The current section describes in detail how features are extracted from HMMM using HOG descriptors with multiple cell sizes and then used to train a classifier for efficient recognition. A linear kernel SVM classifier has been used in the current training tasks because of its speed and reliability. The following paragraphs explain the basic terminology, implementation and performance of the proposed approach.

SVM is a classifier that separates classes in feature space; it identifies separating hyperplanes, which are linear functions of the feature space. Among the separating hyperplanes, only one is chosen, placed such that the margin between the classes is maximum. SVM has a very high accuracy rate for two-class problems, but it can also be extended to multiclass problems. A classifier with a large number of adjustable parameters, and therefore large capacity, is likely to learn the training set without error but may generalize poorly; in SVM, the effective number of parameters is adjusted automatically to match the complexity of the problem [20]. The equation \(w^{T}x+b=0\) is a hyperplane separating two classes. Let \((X_{i},Y_{i})\) for \(i = 1,2,3,...,N\) denote the training dataset, where \(Y_{i}\) is the class label of \(X_{i}\). There may be numerous hyperplanes that can separate the two classes, but the aim of SVM is to find the one that gives equal and maximum margin from both classes. Mathematically, the aim of SVM is to maximize the objective function \(L(\alpha )\) given by

$$\begin{aligned} L(\alpha )=\sum _{i=1}^{N}\alpha _{i} - \frac{1}{2} \sum _{i=1}^{N}\sum _{j=1}^{N} \alpha _{i}\alpha _{j} Y_{i}Y_{j}\,\phi (X_{i})\cdot \phi (X_{j}) \end{aligned}$$
(9)

subject to the constraint

$$\begin{aligned} \sum _{i=1}^{N}\alpha _{i}Y_{i} = 0, \ 0\le \alpha _{i}\le C \ \ \forall i \end{aligned}$$
(10)

where C is the cost parameter that determines the cost caused by constraint violation, \(\alpha _{i}\) are the Lagrange multipliers and \(\phi (.)\) is the feature mapping function. Asking for the maximum-margin linear separator in Eq. (9) leads to a standard quadratic programming (QP) problem. With the mentioned constraints, the QP solution leads to the following classification function for SVMs:

$$\begin{aligned} Y= & {} sgn(W\phi (Z)+b) \end{aligned}$$
(11)
$$\begin{aligned} Y= & {} sgn\left(\sum _{i=1}^{q}\alpha _{i}Y_{i}\,\phi (X_{i})\cdot \phi (Z)+b\right)\end{aligned}$$
(12)

where \(\alpha _{i}\) is the Lagrange multiplier assigned to each training sample, whose value depends on the role of that sample in the classifier. The non-zero values of \(\alpha _{i}\) correspond to the support vectors that are used to construct the classifier in Eq. (12); ‘q’ denotes the number of support vectors. If the feature functions \(\phi (.)\) are chosen with care, one can calculate the scalar products without actually computing all features, thereby greatly reducing the computational complexity. In SVM the learning algorithms require only dot products between the vectors, and the mapping is chosen such that these high-dimensional dot products can be computed within the original input space by means of a kernel function, the so-called ‘kernel trick’ [20]:

$$\begin{aligned} K(x,x_{i})=\phi (x)\cdot \phi (x_{i}). \end{aligned}$$
(13)
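
The kernel trick and the resulting decision rule can be illustrated with a small numerical sketch. The code below is not the implementation used in this work; it assumes a two-dimensional input, a homogeneous quadratic kernel chosen only because its explicit feature map is short, and hand-picked support vectors, multipliers and bias. All function names are hypothetical. For a linear kernel, as used in this paper, \(\phi\) is simply the identity map.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi_quadratic(x):
    """Explicit feature map of the kernel (x . z)^2 for 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def kernel_quadratic(x, z):
    """Same value as dot(phi(x), phi(z)), computed in the input space."""
    return dot(x, z) ** 2


def kernel_linear(x, z):
    """Linear kernel: the feature map is the identity."""
    return dot(x, z)


def svm_decision(z, support_vectors, labels, alphas, b, kernel):
    """Sign of sum_i alpha_i * Y_i * K(X_i, z) + b (a score of 0 maps to +1)."""
    score = sum(a * y * kernel(x, z)
                for x, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Kernel trick: both routes give the same scalar product.
x, z = [1.0, 2.0], [3.0, -1.0]
explicit = dot(phi_quadratic(x), phi_quadratic(z))
implicit = kernel_quadratic(x, z)

# Toy two-class problem with hand-picked (illustrative) dual variables.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]   # satisfies sum_i alpha_i * Y_i = 0
bias = 0.0

label_a = svm_decision([2.0, 0.5], support_vectors, labels, alphas, bias, kernel_linear)
label_b = svm_decision([-1.0, 0.0], support_vectors, labels, alphas, bias, kernel_linear)
```

In this toy setting the explicit and implicit products agree, and the two query points fall on opposite sides of the separating hyperplane.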

4 Experimental results and evaluation

The current section highlights the experimental results and evaluation in detail for the two implementation strategies mentioned earlier. Finally, the comparison subsection also verifies the selection of a suitable type of feature extractor and classifier for efficiently recognizing HMMM characters.

4.1 Experimental results and error analysis for MLPNN with HOG feature descriptors

Table 2 Tabulation of the resulting accuracy (%) and time taken for training the classifier in seconds.

The experimental results described in the current section are a follow-up of the procedure explained in section 3.3. As described earlier, the network was tested against each of the remaining 25 samples from each of the 56 classes of the script for three different HOG cell sizes, i.e. \(6 \times 6\), \(7 \times 7\) and \(8 \times 8\). Table 2 shows the success percentage against each of the characters.

Using HOG cell size \(6 \times 6\): It can be seen from tables 4 and 5 that most of the characters have low accuracy, with values as low as 4% for characters like ‘ (0)’ and ‘ (KHOU)’. Characters with accuracy rates of only about 30–50% are ‘ (SAM)’, ‘ (MIT)’, ‘ (CHEEN)’, ‘ (THOU)’, ‘ (EE)’, ‘ (DIL)’ and ‘ (4)’. Other characters whose accuracy is low, in the range 50–80%, include ‘ (1)’, ‘ (2)’, ‘ (3)’, ‘ (4)’, ‘ (6)’, ‘ (8)’, ‘ (KOK)’, ‘ (LAI)’, ‘ (PA)’, ‘ (NGOU)’, ‘ (WAI)’, ‘ (PHAM)’, ‘ (ATIYA)’, ‘ (RAAI)’, ‘ (BAA)’, ‘ (GHOU)’, ‘ (DHOU)’, ‘ (KOKLONSUM)’, ‘ (LAILONSUM)’, ‘ (MITLONSUM)’, ‘ (PALONSUM)’, ‘ (NGOULONSUM)’, ‘ (EELONSUM)’, ‘ (INAP)’, ‘ (SOUNAP)’, ‘ (YETNAP)’, ‘ (OTNAP)’, ‘ (CHEINAP)’, ‘ (NUNG)’, ‘ (QUESTION MARK)’, ‘ (COMMA)’ and ‘ (FULLSTOP)’, whereas the remaining characters showed a fair to good level of accuracy. The overall accuracy achieved is 68.61%.

Using HOG cell size \(7 \times 7\): Tables 4 and 5 show that the use of HOG cell size \(7 \times 7\) results in increased accuracy in comparison with its \(6 \times 6\) counterpart. It can be seen that there are more characters for which the accuracy has crossed 80%. However, there are three characters for which the accuracy is below 50%: ‘ (0)’, ‘ (MIT)’ and ‘ (CHEEN)’. The characters for which the accuracy lies between 51% and 80% are: ‘ (1)’, ‘ (2)’, ‘ (5)’, ‘ (8)’, ‘ (KOK)’, ‘ (SAM)’, ‘ (LAI)’, ‘ (MIT)’, ‘ (PA)’, ‘ (NA)’, ‘ (CHEEN)’, ‘ (KHOU)’, ‘ (THOU)’, ‘ (UN)’, ‘ (EE)’, ‘ (PHAM)’, ‘ (ATIYA)’, ‘ (RAAI)’, ‘ (DIL)’, ‘ (GHOU)’, ‘ (LAILONSUM)’, ‘ (PALONSUM)’, ‘ (NALONSUM)’, ‘ (NGOULONSUM)’, ‘ (EELONSUM)’, ‘ (INAP)’, ‘ (SOUNAP)’, ‘ (YETNAP)’, ‘ (OTNAP)’, ‘ (CHEINAP)’, ‘ (NUNG)’ and ‘ (FULLSTOP)’. The overall accuracy achieved is 75.57%.

Using HOG cell size \(8 \times 8\): Tables 4 and 5 show that when HOG cell size \(8 \times 8\) is used it results in a slight decrease in individual accuracy as compared with that of \(7 \times 7\). It can be seen that there are more characters for which the accuracy has crossed 80%. However, there are some characters for which the accuracy is below 50%: ‘ (0)’, ‘ (SAM)’, ‘ (MIT)’, ‘ (NA)’, ‘ (CHEEN)’, ‘ (KHOU)’, ‘ (THOU)’, ‘ (EE)’ and ‘ (YETNAP)’. The characters for which the accuracy lies between 51% and 80% are ‘ (1)’, ‘ (2)’, ‘ (3)’, ‘ (4)’, ‘ (9)’, ‘ (KOK)’, ‘ (LAI)’, ‘ (PA)’, ‘ (TIL)’, ‘ (NGOU)’, ‘ (WAI)’, ‘ (UN)’, ‘ (DIL)’, ‘ (GHOU)’, ‘ (DHOU)’, ‘ (LAILONSUM)’, ‘ (PALONSUM)’, ‘ (NALONSUM)’, ‘ (TILLONSUM)’, ‘ (NGOULONSUM)’, ‘ (EELONSUM)’, ‘ (SOUNAP)’, ‘ (OTNAP)’, ‘ (CHEINAP)’, ‘ (NUNG)’ and ‘ (FULLSTOP)’. The overall accuracy achieved is 66.78%.

4.1a Evaluation:

The current work has demonstrated the application of MLP networks with HOG descriptors to the HMMM character recognition problem. The accuracies are strongly affected by the HOG cell size, so choosing a suitable cell size is essential. Maximum accuracy is obtained when the HOG cell size is set to \(7 \times 7\). Training this network took 5 min and 17 s, and the solution converged in 190 iterations, as highlighted earlier. The lower percentage of successful character recognition is attributed to the roundedness and the starting and finishing styles of the Meetei-Mayek characters. Finally, the current application shows that training a neural network model on a very complex dataset is computationally intensive, which means that it will be slow on low-end PCs or machines without math co-processors. However, processing speed is not the only factor in performance, and neural networks do not require the time spent on programming, debugging or testing assumptions that other analytical approaches do. Nevertheless, there is a need to improve the performance of our system using a more robust pattern recognition technique such as SVM in the classification phase. The limitations of the current character recognition task, such as speed and accuracy, are alleviated with the help of HOG descriptors combined with linear SVM.
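
One way to see why the cell size matters is to look at how it changes the length of the HOG vector fed to the classifier. The sketch below is only an illustration under stated assumptions: a hypothetical \(64 \times 64\) input image, blocks of \(2 \times 2\) cells that slide by one cell, and 9 orientation bins. These are common HOG settings, not values reported in this paper, and the function name is hypothetical.

```python
def hog_feature_length(image_size, cell_size, block_size=2, num_bins=9):
    """Length of a HOG vector when blocks slide by one cell (assumed layout)."""
    length = num_bins
    for pixels in image_size:
        cells = pixels // cell_size        # whole cells along this axis
        blocks = cells - block_size + 1    # block positions along this axis
        length *= blocks * block_size      # cells contributed along this axis
    return length


IMAGE_SIZE = (64, 64)  # hypothetical glyph size, not taken from the paper

lengths = {cell: hog_feature_length(IMAGE_SIZE, cell) for cell in (6, 7, 8)}
for cell, length in lengths.items():
    print(f"cell {cell}x{cell}: {length} features")
```

Under these assumptions a smaller cell gives a longer, finer-grained descriptor, which is the trade-off between detail and dimensionality that the choice of cell size controls.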

4.2 Experimental results and evaluation using multiple-HOG-feature vector with multiclass linear kernel SVM classifier

The current section describes the experimental results of the HMMM character recognition operation using multiple-HOG-feature vector with multiclass linear kernel SVM classifier as described in section 3.4. The use of linear SVM classifier returned a fully trained multiclass, error-correcting output codes (ECOC) model using the training features or HOG descriptors and the corresponding class labels. The One-versus-one coding scheme was employed. In this scheme, for each binary learner, one class is positive, another is negative and the software ignores the rest. This design exhausts all combinations of class pair assignments. The number of binary learners is \({K(K-1)/2}\), where K is the number of unique class labels [21, 22]. In the current study, a handwritten character recognition system for Meetei-Mayek script based on HOG feature descriptors and trained by SVM linear kernel is successfully implemented. Three different cell sizes have been considered, which were examined for accuracy by training the linear SVM classifier individually. Table 3 shows the time taken to train the linear SVM classifiers and the accuracy achieved in each case. Testing of the classifier was performed using the remaining 25 samples from each of the 56 classes of the script; the individual success rates are recorded in tables 4 and 5. Some of the characters that are marked with an asterisk (*), viz. ‘ (PA)’, ‘ (KHOU)’ and ‘ (WAI)’ in table 4, have very low accuracy in comparison with other characters. For the ‘ (PA)’ character the accuracy increased drastically from 40% to 96%, which is promising. However, the worst recognition rate is obtained for ‘KHOU’, for which the accuracy starts from just 20% and ends at a maximum of 52%. While most of the characters need to be worked on for better efficiency, some other characters like ‘LAI’ and ‘THOU’ also need an increase in accuracy.
Despite the low accuracy readings mentioned earlier, there are also 14 cases where 100% accuracy holds for all cell sizes, viz. ‘ (7)’, ‘ (8)’, ‘ (SAM)’, ‘ (PHAM)’, ‘ (GOK)’, ‘ (RAAI)’, ‘ (BAA)’, ‘ (ATAP)’, ‘ (UNAP)’, ‘ (YETNAP)’, ‘ (NUNG)’, ‘ (QUESTION MARK)’, ‘ (COMMA)’ and ‘ (FULLSTOP)’. Table 3 also shows a variation in training time from 50.99 s up to 67.91 s, which is due to the varying length of the HOG feature vector for each cell size. Thus, it can be stated that offline handwritten Meetei-Mayek characters are most suitably recognized using the HOG feature vector of cell size \(6 \times 6\), which achieves an accuracy of 96.928%.
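
The size of the One-versus-one design described above follows directly from \({K(K-1)/2}\). The sketch below enumerates the class pairs for \(K = 56\) and builds one column of the coding matrix; it illustrates the scheme only and is not the software used in this work. The function names are hypothetical.

```python
from itertools import combinations


def one_vs_one_pairs(num_classes):
    """All unordered class pairs; each pair gets its own binary learner."""
    return list(combinations(range(num_classes), 2))


def coding_column(pair, num_classes):
    """Column of the coding matrix: +1 positive, -1 negative, 0 ignored."""
    positive, negative = pair
    column = [0] * num_classes
    column[positive] = 1
    column[negative] = -1
    return column


NUM_CLASSES = 56  # number of Meetei-Mayek classes in this paper

pairs = one_vs_one_pairs(NUM_CLASSES)
num_learners = len(pairs)
first_column = coding_column(pairs[0], NUM_CLASSES)

print(num_learners)  # 1540
```

With 56 classes the design therefore trains 1540 binary learners, each of which sees only the samples of its two classes.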

Table 3 Tabulation of the resulting accuracy (%) and time taken for training the classifier in seconds.

5 Comparison

Table 4 Comparison of accuracy for each class of the MMM between MLPNN with HOG feature and linear SVM with HOG-feature-based classifiers.
Table 5 Comparison of accuracy for each class of the MMM between MLPNN with HOG feature and linear SVM with HOG-feature-based classifiers.

This section is divided into two parts. In subsection 5.1 the current approach, i.e. HOG features with SVM classifier, is compared against the MLPNN with HOG descriptors. Subsection 5.2 compares our results against some of the top performing character recognition works in MMM.

5.1 Comparison between the MLPNN with HOG descriptors and linear SVM with HOG descriptors classifiers

In this study, we propose a novel feature-extraction-cum-classification approach for recognizing HMMM by establishing that HOG features can play a crucial role in increasing the accuracy of classifiers. This objective is accomplished by comparing two different models as mentioned earlier: in the first model, HOG feature vectors are used to train an MLPNN, while in the second model, HOG features are used to train a fast and reliable SVM classifier. The implementation strategies and the discussion for the two approaches are explained in detail in section 3.

The results obtained by training and testing on our Meetei-Mayek database are tabulated in tables 4 and 5 for comparison. A total of 56 different classes are compared. The value recorded in the tables gives the accuracy achieved for each character, which is calculated from the number of correct recognitions for that character in the corresponding confusion matrix. A total of 6 confusion matrices were recorded for the tasks, 3 for the first method and 3 for the second method. For both the techniques, three different HOG cell sizes are used: \(6 \times 6\), \(7 \times 7\) and \(8 \times 8\).
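
The per-character accuracies can be read off a confusion matrix as sketched below. This is a minimal illustration on a made-up three-class matrix, not data from this study; it assumes that rows index the true class and columns the predicted class, and the function names are hypothetical.

```python
def per_class_accuracy(confusion):
    """Percentage of each true class (row) that was predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Percentage of all test samples that lie on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


# Made-up counts for three classes with 25 test samples each.
toy_confusion = [
    [24, 1, 0],
    [3, 20, 2],
    [0, 12, 13],
]

class_acc = per_class_accuracy(toy_confusion)
total_acc = overall_accuracy(toy_confusion)
print(class_acc, total_acc)
```

When every class has the same number of test samples, as in this toy matrix, the overall accuracy equals the mean of the per-class accuracies.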

The average recognition rate for each of the classifiers, as recorded in table 6, shows that when HOG feature vectors extracted from MMM are simply trained by MLPNN, a maximum accuracy of about 75.57% can be realized using a cell size of \(7 \times 7\). However, if these HOG vectors are trained using the SVM classifier, a maximum accuracy of about 96.928% can be realized using a cell size of \(6 \times 6\), which is a large increase over the MLPNN with HOG descriptor method. The level of accuracy can still be improved by using more training samples per class.

Table 6 A comparison of accuracy and training time between MLPNN with HOG descriptors and HOG descriptors with SVM classifiers.
Table 7 Comparison of other works on MMM with the current work.

5.2 Comparison to other works on MMM

The current technique, i.e. HOG features with linear SVM, is compared to four other research works on MMM. It can be seen that, previously, most of the work was performed either on the 10 numerals or on the 27 main alphabets. However, for developing a complete OCR platform, there is a need to classify all the classes of the script by a single baseline classifier and that too with a high level of accuracy. In this sense, we are the first to implement a recognition system that is able to classify all the 56 different classes of the script with a high level of accuracy. Table 7 shows a comparison of the proposed approach with some previous works in MMM character classification or recognition.

Numeral classification: Romesh et al [14] used around 700 and 300 training and testing samples, respectively, for classifying the 10 different numeral classes of the script. They achieved an overall accuracy of 85%. Maring and Dhir [11], on the other hand, proposed a different architecture and used around 6000 and 1200 training and testing samples, respectively, for classifying the same 10 different classes and achieved an accuracy of around 89.58%.

Character classification: Thokchom et al [15] developed a technique to categorize all the 27 different main alphabets of the script using around 459 and 135 training and testing samples, respectively. They achieved an overall accuracy of 90.3%.

Numeral, character, punctuation mark and additional letters: In the current paper, the use of HOG features with SVM classifier is studied in detail using different cell sizes and also tuned to classify all the 56 classes of the script. Three different HOG cell sizes were used (i.e., \(6 \times 6\), \(7 \times 7\) and \(8 \times 8\)). For each of them the same numbers of training and testing samples were used, i.e. 4200 and 1400, respectively. An overall accuracy of 96.93% was achieved with a cell size of \(6 \times 6\).
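
The sample counts above imply a fixed number of training and testing samples per class. The short calculation below works this out, assuming the 4200 and 1400 samples are spread evenly over the 56 classes; the variable names are hypothetical.

```python
NUM_CLASSES = 56
TRAIN_TOTAL = 4200
TEST_TOTAL = 1400

# Even split over classes is an assumption of this sketch.
train_per_class, train_remainder = divmod(TRAIN_TOTAL, NUM_CLASSES)
test_per_class, test_remainder = divmod(TEST_TOTAL, NUM_CLASSES)

# Smallest possible change in a per-class accuracy (one test sample).
accuracy_step = 100.0 / test_per_class

print(train_per_class, test_per_class, accuracy_step)  # 75 25 4.0
```

With 25 test samples per class, each per-class accuracy can only change in steps of 4%, which matches per-character values such as 4%, 52% and 96% quoted in section 4.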

6 Conclusion

In this work, a novel approach for efficiently recognizing HMMM characters is presented by means of a comparison between the traditional BP-learning-based ANN with HOG descriptors and the multiple-cell-sized HOG descriptors with linear SVM classifier. About 5600 handwritten samples of the 56 different classes of the MMM were collected from a group of different people. The samples were then pre-processed to remove the noise in and around the letters, followed by extraction of each letter from the group. The training and testing phases used three different HOG cell sizes: \(6 \times 6\), \(7 \times 7\) and \(8 \times 8\). The maximum accuracy that we were able to achieve was 96.928%, and the minimum training time was just 50.99 s.

Therefore, it can be stated that the complex Meetei-Mayek characters can be efficiently recognized using the \(6 \times 6\) cell-sized HOG descriptors with multiclass linear kernel SVM classifier.