1 Introduction

Optical Character Recognition (OCR) is used for converting hard copy image containing machine-printed or handwritten text into a format which can be edited or processed by a computer. OCR frameworks have just accomplished impressive accuracy in fine printed text recognition. Historical books and medieval handwritten text documents containing bleed through text are yet to be recognized with acceptable accuracy. Gurmukhi is one of the important Indian scripts used in north India. There are a number of medieval handwritten documents of the Gurmukhi script containing ancient manuscripts, printed books and typewritten documents of the sixteenth to the twentieth century. But, major composition is the handwritten manuscripts, mainly written by spiritual persons during that period. These records are profitable for the substance as well as particularly imperative for their physical appearance. A sample of the medieval handwritten document is shown in Fig. 1. To exploit the contents of these valuable Gurmukhi script manuscripts, recognition of these manuscripts is required. In this paper, an improved architecture has been proposed for recognition of isolated medieval handwritten Gurmukhi characters. Using this architecture, maximum recognition accuracy of 95.91% has been accomplished using AdaBoost (Adaptive Boosting) methodology. The organization of the paper is as follows: Sect. 2 illustrates the related work. Data collections are discussed in Sect. 3. Motivation and challenges for the proposed framework are discussed in Sect. 4. Section 5 describes the various phases of the proposed architecture. Section 6 presents the fundamentals of adaptive boosting and bootstrap aggregating methodologies. Experimental results and discussions are described in Sect. 7. Section 8, depicts the state-of-the-art work. Finally, inferences and future directions of the proposed work are presented in Sect. 9.

Fig. 1
figure 1

A sample of medieval handwritten Gurmukhi script document

2 Related Work

Recognition of medieval handwritten manuscript documents is a challenging issue because of the unique properties of writing styles of individuals. More robust techniques are required to take care of this challenging issue as existing strategies for this design are not fruitful. Although handwritten document recognition is a timeworn problem and a considerable measure of research has just been done on it, but this issue is not completely tackled yet. Great outcomes have been accounted for online handwriting recognition because of the accessibility of dynamic data, for example, the strokes of handwriting [1]. A lot of work has been accomplished for handwriting recognition at the character-level, which requires determining character boundaries [2]. As the complex writing styles make it difficult to extract character boundaries, the technique of jointly segmenting and recognizing the characters has been adopted. Lorigo and Govindaraju [3] have studied the different methodologies utilized for recognition of offline Arabic handwritten documents. Nikolos and Dimitrios [4] proposed basic and strong binarization technique for authentic historical documents. No reported work has been found in literature dealing with the problem of recognition of old handwritten manuscripts of any Indian script, in general. In particular, nothing has been done to recognize the old handwritten manuscripts of Gurmukhi script. Prominent work done in recognition of the Gurmukhi script documents is by Lehal and Singh [5, 6]. Alirezaee et al. [7, 8] built up a character recognition system for middle age Persian manuscripts. They have also proposed structural and statistical features based combined approach for character recognition of Greek and Egyptian documents. Feng and Manmatha [9] investigated different machine learning models to solve the historical handwritten manuscript recognition problem. They used Naive Bayes with kernel density estimates, SVM and conditional maximum entropy models for recognition of the 20 pages of George Washington’s manuscripts. Kumar et al. [10] have proposed two efficient feature extraction techniques, namely, parabola fitting based and power curve fitting based for offline handwritten Gurmukhi character recognition. The classifiers that have been employed in their work are k-NN and SVM. Kumar et al. [11] have presented a hierarchical feature extraction technique for offline handwritten Gurmukhi character recognition. They have achieved a recognition accuracy of 91.80% with proposed hierarchical feature extraction technique while using the PCA feature set and SVM with a linear kernel classifier. They have also observed that PCA perform better than CFS and CON feature selection techniques for character recognition. Cheriet et al. [12] discussed the achievements of handwriting recognition research in the last twenty years and also discussed the challenges for handwriting recognition. Shayegan et al. [13] described a list of most widely used structural features, statistical features and transformation features in OCR applications. Tian et al. [14] have presented an efficient multilingual scene character recognition system based on co-occurrence of the histogram of oriented gradients. They have presented two new feature descriptors: Co-occurrence HOG (Co-HOG) and Convolutional Co-HOG (ConvCo-HOG) for accurate recognition of scene texts of different languages. They have experimented on five scene character datasets of three different languages, including, three sets in English, one set in Chinese and one set in Bengali. Sarkhel et al. [15] have proposed a novel multi-column multi-scale convolutional neural network (MMCNN) based architecture for the recognition of isolated handwritten characters of popular Indic scripts. They have evaluated the proposed framework on 10 publicly available datasets of isolated handwritten characters or digits including MNIST and promising results have been achieved by the proposed system for all of the datasets.

3 Data Set

Gurmukhi is the script used for writing the Punjabi language. Since, there is no standard database of Gurmukhi manuscripts, therefore, data has been collected for training and testing purpose. For that, pre-segmented characters of 150 documents of Gurmukhi manuscripts from different books have been scanned. This is the first of its kind of a database consisting of medieval handwritten manuscripts of any Indian script. In this work, we have taken 1140 samples of 43-classess (25 samples of each class) consisting of isolated medieval handwritten Gurmukhi characters. Medieval handwritten Gurmukhi characters are a little bit different from present day’s handwriting because there may be disintegrating of pages, rust or defacing of ink etc. existing in the medieval document.

4 Motivations and Challenges

The physical condition of historical documents is affected by a number of reasons: ageing, rehashed utilize, staining, disintegrating of pages, rust, diffusing of ink and so on. Therefore, capturing text from such degraded historical documents and later recognizing the text is a crucial and typical task. There are many challenges also occurred during recognition of medieval handwritten Gurmukhi documents like in different phases of the recognition system. While digitizing the manuscripts, it is required to handle them very securely as most of the manuscript pages are fragile. Noise is always a major issue to be dealt with, which obstructs the recognition of the medieval historical documents. A historical document may contain a variety of slanting that is also an issue for text recognition. Segmentation is also an important issue because; most of the historical documents of the Gurmukhi script do not contain many word breaks. Text recognition of historical documents is a challenging problem as the shapes of the handwritten characters in historical documents are different from handwritten characters of modern time.

5 Proposed Architecture

The proposed system for medieval handwritten character recognition consists of various phases like digitization, pre-processing, feature extraction, and classification. These phases are briefly discussed in following sub-sections.

5.1 Digitization and Pre-processing

Foreground and background information is required to be preserved during the digitization of medieval handwritten documents. The quality of the scanner and camera influences the quality of the image. While digitizing the manuscripts, it is required to handle them very safely as most of the manuscript pages are fragile. The captured digital image of a document during digitization phase contains many problems, requiring digital enhancement of the image. The actual shape of the characters may change and different types of strokes can be found in a document. This occurs due to degradation of both the paper and the ink. Therefore, digital enhancement of the manuscript documents is essential and requires a good research work to manage this issue. Digitization creates the digital image which is sustained in the pre-processing phase. In this phase, the gray level character image is normalized into a window of size 64 × 64 using Nearest Neighborhood Interpolation (NNI) algorithm. After normalization, we produce a bitmap image of the normalized image.

5.2 Feature Extraction

Feature extraction is the most important phase of a character recognition system. The performance of a character recognition system depends heavily on the types and number of feature values used. As far as it is concerned with handwritten historical documents, shape discrepancy among characters belonging to the same class is sometimes quite large because of the poor quality, low resolution of the document images, different styles of writing the characters, etc. In this paper, statistical feature extraction techniques, namely, zoning, discrete cosine transformations and gradient features have been used. These feature extraction techniques are briefly discussed in following sub-sections.

5.2.1 Zoning Features

This is one of the simplest and very powerful feature extraction methods. Using this method, the character image is divided into N × M (N = 8 and M = 8) zones. Features are extracted from each zone to generate the feature vector. Normally, the number of foreground pixels in each zone is considered a feature. Zoning is a local feature extraction technique to obtain the local characteristics of any character instead of the global characteristics of it. The number ni of foreground pixels in each zone is counted and a feature vector is obtained by dividing ni by N, where N is the total number of foreground pixels in the character image. Density values (Number_of_foreground_pixels/Total_number_of_pixels) are obtained for each zone. All density values are used to form the input feature vector for a particular character pattern.

5.2.2 Discrete Cosine Transformations (DCT) Features

DCT basically expresses a sequence of real data points into its real spectrum. The distribution of the coefficients of DCT depends on the nature of the studied image. For an image having low intensity with low spatial information, DCT provides good energy compaction in low frequency zones. In other words, conventional DCT coefficient distributions diminish while going to higher frequencies [16]. The DCT feature extraction technique has already been used for digit recognition by Ahmed [17] using four variations of DCT: DCT upper left corner coefficients, DCT zigzag coefficients, block based DCT upper left corner coefficients and block based DCT zigzag coefficients.

5.2.3 Gradient Features

The gradient of the image is the intensity at each point, giving the direction of the largest possible increase of light to dark and the rate of change in that direction. The gradient measures the magnitude and direction of the greatest change in intensity in a small neighborhood of each pixel. The rate of change of image/pixel intensity is modeled in gradient direction feature. The gradient direction feature is suitable for application using machine print/handwriting, binary/gray scale and low-resolution images [18]. A grayscale image is generated from an input binary image, and the gradient is calculated as described below.

  1. 1.

    Input normalized image of a character.

  2. 2.

    Mean filter of size 2 × 2 is repeatedly applied r times to obtain a grayscale image.

  3. 3.

    The gray scale image is normalized so that the mean and the maximum of the gray scale are 0 and 1, respectively.

  4. 4.

    Roberts filter given by Eqs. (1) and (2) is applied to each pixel \( g\left( {i,j} \right) \) of the normalized image to calculate the gradient.

$$ d_{u} = g\left( {i + 1, j + 1} \right) - g\left( {i, j} \right) $$
$$ d_{v} = g\left( {i + 1, j} \right) - g\left( {i, j + 1} \right) $$
$$ Direction: \theta \left( {i,j} \right) = tan^{ - 1} \frac{{d_{v} }}{{d_{u} }}^{ }_{ } $$
(1)
$$ Strength: f\left( {i,j} \right) = \sqrt {d_{u}^{2} + d_{v}^{2} }^{ }_{ } $$
(2)

5.3 Classification

In this section, classification techniques used for character recognition have been discussed. Classification stage uses the features extracted in the feature extraction stage for deciding the class membership. Classification phase is the decision making phase of a character recognition system. In this study, k-NN, Linear-SVM, Decision Tree, and Random forest classifiers have been applied individually and in various combinations of these classifiers with a voting scheme for recognition.

5.3.1 k-NN Classifier

In the k-Nearest Neighbour (k-NN) classifier, Euclidean distances from the candidate vector to stored vector are computed. The Euclidean distance between a candidate vector and a stored vector is given by

$$ d = \sqrt {\mathop \sum \limits_{k = 1}^{N} \left( {x_{k} - y_{k} } \right)^{2} } $$

here N is the total number of features in the feature set, \( x_{k} \) is the library stored feature value and \( y_{k} \) is the candidate feature value.Let Xi be an input training sample with m features {xi1, xi2, xi3 …………xim}, n be the total number of input samples {i = 1, 2, 3,…,n} and m the total number of features {j = 1, 2, 3,…,m}. It means lazy or instance learning is used in k-NN, while in other classifiers as SVM, eager learning is used. K specifies the number of nearest neighbours to be considered and the class of the majority of these neighbours is determined as the class of unknown pattern. More robust models can be achieved by locating k, where k > 1, neighbors and letting the majority vote decide the outcome of the class labeling. A higher value of k results in a smoother, less locally sensitive function. Generally, a larger value of k reduces the effect of noise on the classification, but makes the boundaries between classes less distinct. In this work, we have considered k = 1.

5.3.2 Linear-SVM

SVM is a learning machine which has been widely applied in pattern recognition. SVMs are based on statistical learning theory that uses supervised learning. In supervised learning, a machine is trained instead of programmed to perform a given task on a number of inputs/outputs pairs. SVM classifier has also been considered with a linear kernel. C-SVC type classifier in Lib-SVM tool has been used for classification purpose in this research work. In this work, initially we have computed recognition accuracy, using individuality of features and various combinations of these features like (zoning, DCT and gradient features) with a linear-SVM classifier.

5.3.3 Decision Tree and Random Forest (RF)

The attributes of the data are used by the decision tree learning algorithm for processing and decision making. Attributes in the decision tree are nodes and each leaf node is representing a classification. The ensemble for supervised learning method is called Random Forest. Random forest removed the over-fitting crisis of decision tree. Decision tree classifiers are used to classify various sub-samples of the dataset. The meta estimator that fits the number of decision tree classifiers for such design is called Random Forest. The random forest uses averaging to help in improving recognition accuracy and control over-fitting. Random forest is an outstanding algorithm in accuracy among existing supervised learning algorithms for classification and runs efficiently on large databases [19].

6 Adaptive Boosting and Bootstrap Aggregating

Boosting and Bagging are two techniques that can be used to improve the recognition results of classification.

6.1 Adaptive Boosting (AdaBoost)

Boosting is a way to manage machine learning in light of making a precise expectation rule by combining many moderately feeble and inaccurate rules. The AdaBoost algorithm of Freund and Schapire [20] was the principle useful boosting algorithm, and stays a standout amongst the most broadly utilized and contemplated, with applications in various fields. In this paper, we have used this methodology for improving the recognition results of medieval handwritten Gurmukhi manuscripts. AdaBoost is a sort of classifier with high precision. It gives the structure of order and makes the building up of sub-classifiers simple.

6.2 Bootstrap Aggregating (Bagging)

Bootstrap Aggregating (Bagging) is an ensemble method that creates separate samples of the training data set and creates a classifier for each sample. The results of these multiple classifiers are then combined (such as averaged or majority voting). Bagging, introduced by Breiman [21], takes bootstrap samples of objects and classifiers are trained on each sample. The classifier votes are then combined by majority voting. Breiman [21] shows that the predictors build using Bagging can make a weak unstable classifier significantly optimal. Bagging predictors have been used in various classifications and prediction problems in biology and social sciences. Diverse classifiers are generated by randomly selecting subsets of samples to train them.

7 Experimental Results

Experimental outcomes based on above mentioned features and classifiers with AdaBoost and Bagging algorithms are presented in this section. Various features are combined for improving the recognition accuracy. Different combinations of features and length of a combined feature vector, are also depicted in Tables 1, 2, 3 and 4. For classification, initially, we have considered individual classifiers as discussed in Sect. 5.3 and finally, we have also combined the classifiers and classification is done based on the voting scheme. For experimental work, the complete dataset is divided into training dataset and testing dataset. So, 70% data is considered as training dataset (798 samples) and remaining 30% considered as testing dataset (342 samples).

Table 1 Recognition accuracy using various features and classifiers
Table 2 Recognition Accuracy using various features and classifiers with Bootstrap Aggregating methodology
Table 3 Recognition Accuracy using adaptive boosting methodology
Table 4 Recognition Accuracy using a combination of classifiers with majority voting scheme

7.1 Recognition Accuracy Using Various Features and Classifiers

We have done experiments using zoning, DCT and gradient based features. Different combinations of these features are also experimenting. Classifier wise recognition accuracy is depicted in Table 1. As depicted in this Table 1, maximum recognition accuracy of 91.82% has been achieved using Linear-SVM classifier with a combination of zoning and DCT features. These results are graphically shown in Fig. 2.

Fig. 2
figure 2

Recognition accuracy using various features and classifiers

7.2 Recognition Accuracy using Bootstrap Aggregating Methodology

In this sub-section, recognition results based on bootstrap aggregating are presented. Classifier wise recognition results using bootstrap aggregating are depicted in Table 2. As shown in Table 2, maximum recognition accuracy of 92.19% has been achieved using a combination of zoning, DCT and gradient as features and Linear-SVM as a classifier. Recognition accuracy achieved using bootstrap aggregating methodology is graphically depicted in Fig. 3.

Fig. 3
figure 3

Recognition accuracy achieved with bootstrap aggregating methodology

7.3 Recognition Accuracy using Adaptive Boosting Methodology

Recognition results based on adaptive booting methodology are presented in this section (Table 3). It has been observed that the linear-SVM classifier with adaptive boosting methodology make this possible to achieve a maximum recognition accuracy of 91.07% when we use a combination of zoning, DCT and gradient features. Recognition accuracy achieved using adaptive boosting methodology is graphically depicted in Fig. 4.

Fig. 4
figure 4

Recognition accuracy achieved with adaptive boosting methodology

7.4 Recognition Accuracy Using Combination of Classifiers with Majority Voting Scheme

In this sub-section, recognition results based on a combination of all classifiers considered in this work with a majority voting scheme are illustrated (Table 4). These results are graphically shown in Fig. 5. It has been seen that a combination of all classifiers considered in this study with adaptive boosting methodology achieved maximum recognition accuracy of 95.91% when we use a combination of zoning and DCT features.

Fig. 5
figure 5

Recognition accuracy achieved using a combination of classifiers with majority voting scheme

8 State of the Art Work

In this section, we have discussed about the state-of-the-art work related to the present study. Lehal and Singh [22] have achieved a recognition accuracy of 91.6% for printed Gurmukhi text recognition. Sharma and Jain [23] have extracted zoning features for handwritten Gurmukhi character recognition. They have employed two classifiers, namely, k-NN and SVM. They have achieved maximum recognition accuracy of 72.5% and 72.0%, respectively, with k-NN and SVM. Kumar et al. [24] have achieved a recognition accuracy of 94.29% using intersection and open end points based features and SVM classifier. In their work, they have considered 3500 samples for experimentation work. They have also concluded the recognition accuracy is depending upon the number of samples in training and testing dataset. Using diagonal features with a k-NN classifier, they have achieved a recognition accuracy of 94.12% for the same dataset. Kumar et al. [25] have considered 10,500 samples for experimental work and used modified division points based features for offline handwritten Gurmukhi character recognition. They have achieved recognition accuracy of 84.57, 85.85 and 89.20% using Linear-SVM, MLP and k-NN classifiers, respectively. A few work has been done by Rani [26] for medieval handwritten Gurmukhi character recognition. They have achieved maximum recognition accuracy of 93.31% recognition accuracy, using a combination of zoning, DCT and gradient based features and Linear-SVM classifier. Similarity in the shape of different characters in Gurmukhi script makes confusion during the recognition process as depicted in Table 5. In this paper, a framework for medieval handwritten Gurmukhi character recognition has been presented and achieved a recognition accuracy of 95.91% with a combination of all classifiers considered in this work with the adaptive boosting methodology.

Table 5 Similar shaped characters in Gurmukhi script

9 Inferences and Future Directions

In this paper, an efficient recognition system has been presented for medieval handwritten Gurmukhi character recognition. Experimental results have affirmed the viability of the proposed approach and proved the characteristic advantages. We improved the recognition accuracy, using a combination of classifiers and majority voting scheme. Finally, with the help of adaptive boosting methodology we, achieved maximum recognition accuracy of 95.91%. We have also concluded that adaptive boosting and bootstrap aggregating methodologies can be widely used in pattern recognition field for improving the recognition results of various categories.