1 Introduction

We humans need to learn how to work with computers as a tool, and to provide facilities for working with software programs. One of the most important types of such programs is Optical Character Recognition (OCR) for Farsi characters, which means converting printed or handwritten text within an image into machine-readable text. The image is obtained by optical scanning or digital imaging.

Over the past few decades, the recognition of written patterns, mostly printed or handwritten alphabetic and numeric characters, has attracted many researchers. The first efforts date from the early 1970s. Although extensive research has led to many efficient methods for importing the information in documents, books, and similar sources into computers, most of these methods are still not fully functional. Filling this gap requires greater effort.

Character recognition draws on various fields, such as digital signal processing, image processing, machine vision, machine learning, fuzzy logic, statistics, and neural networks. Alphanumeric character recognition has many applications, such as postal address recognition, license plate recognition, automatic bank cheque processing, and security applications like passport authentication.

An overall categorization for OCR systems in terms of the type of input pattern is as follows: 1. Recognition systems for printed texts. 2. Recognition systems for handwritten texts.

Today, traditional writing tools like paper and pen are used more commonly than keyboards and computers. In other words, information is usually handwritten. Therefore, the digitization of such information requires handwriting recognition (HWR).

In a different categorization, OCR systems are divided into two categories: 1. online systems, and 2. offline systems.

Online recognition applies to handwriting, and offline recognition applies to both printed and handwritten texts. The input to an offline system is a scanned image of the text, whereas the input to an online system is the sequence of pen-movement coordinates captured by a digitizer pen and tablet.

In the offline method, only spatial information is available, i.e. the locations of the highlighted (written) image points. In the online method, the sequence of line segments written by the user is also accessible (the two-dimensional coordinates of the point sequence are recorded as a function of time). That is, the online method uses a space-time representation.
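The difference between the two representations can be illustrated with a minimal sketch. The `(x, y, t)` tuple format and the `rasterize` helper are assumptions made for illustration only; real digitizers and datasets define their own formats.

```python
from typing import List, Tuple

# An online sample: pen points as (x, y, t) in writing order.
# This tuple layout is a hypothetical format chosen for the example.
Stroke = List[Tuple[int, int, float]]


def rasterize(stroke: Stroke, width: int, height: int) -> List[List[int]]:
    """Drop the time axis and mark visited pixels (offline-style binary image)."""
    image = [[0] * width for _ in range(height)]
    for x, y, _t in stroke:
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = 1
    return image


stroke = [(0, 0, 0.00), (1, 1, 0.01), (2, 2, 0.02), (2, 3, 0.03)]
image = rasterize(stroke, width=4, height=4)
for row in image:
    print(row)
```

After rasterization the writing order and timing are lost, which is exactly the information an online recognizer can exploit and an offline recognizer cannot.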

Today, the demand for online systems has increased due to (1) commercial devices that communicate with users through pressure-sensitive screens instead of keyboards (such as PDAs and Tablet PCs), (2) writing being simpler than typing, (3) typing being impossible in some cases, (4) some characters being difficult to type, and (5) the lack of a full keyboard on small computers.

The main stages of a numeric character recognition system are presented in Fig. 1.

Fig. 1 The three main stages of the handwritten numeric character recognition system

2 A Review of the Research on Farsi Numeric Character Recognition Methods

A method was presented for recognizing handwritten Farsi numeric characters that tolerates rotation and scale variation of the characters to an acceptable extent [1]. This method was implemented on a database of 8600 numerals, i.e. 860 samples for each of the digits 0–9. In this method, 30% of the numerals were randomly rotated by angles ranging from 10 to 40 degrees clockwise or counterclockwise. The recognition rate obtained was not reduced significantly compared with the non-rotated case. In this method, k-means clustering was applied first, followed by a fuzzy SVM for classification. Feature extraction was carried out using two methods, i.e. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).

In order to improve the recognition rate, the images of the numerals were first enhanced in the preprocessing stage, and then the slant in the image was corrected [2]. The database used included 4096 training numerals and 1532 test numerals written by 500 people in a number of forms. The preprocessing increased the recognition rate by 3.3%. In this method, intersection features, chain codes, and an SVM classifier were used.

Gradient histogram features and the developed characteristic locus method were used in combination [3], together with an SVM classifier. To improve the recognition rate, the features that most affect it were selected from all extracted features using an Improved New Binary Particle Swarm Optimization (INBPSO) algorithm. The initial feature vector included 400 features, which were reduced to 64 after feature selection. The HODA Farsi Digit Dataset, a large dataset of handwritten Farsi digits, was used. It consists of 102,352 digits, including 60,000 training digits, 20,000 test digits, and 22,352 remaining samples. With this method, a recognition rate of 99.40% was obtained.

In order to improve the recognition rate, effective features were selected from all the features [4]. For feature selection, an evolutionary genetic algorithm was used. In this method, gradient features and a multi-layer perceptron classifier (MLPC) were employed. A recognition rate of 98.85% was obtained using the HODA dataset. In this research, 40,000 training samples and 20,000 test samples were used.

In order to increase the speed of feature extraction and the recognition rate, two new features called improved gradient and gradient histogram were introduced [5]. Both features are based on the brightness gradient feature and apply to two-level and gray-scale images. Using a neural network classifier and the HODA dataset, recognition rates of 99.02% (improved gradient) and 98.80% (gradient histogram) were obtained. Compared with the brightness gradient, feature extraction was 2 times faster for the improved gradient and 10 times faster for the gradient histogram.

In another method, a binary SVM classifier and the HODA dataset were used for handwritten Farsi digit recognition [6]. For the SVM, the One-vs-All (OvA) strategy was adopted. Feature extraction was performed using a two-dimensional wavelet transform, followed by feature reduction with PCA. A recognition rate of 91.75% was obtained.

Handwritten Farsi digit recognition was done using features extracted from the image gradient [7]. In this method, the image was first normalized and the gradient was calculated. After that, the gradient angle was calculated for each image point and converted to one of 4 or 8 standard angles. From the gradient image, 4 or 8 separate images were created, each containing the gradient related to one angle. Hidden features were extracted by sampling these images. A support vector machine (SVM) was used as the classifier, with the One-vs-the-Rest (OvR) method. A recognition rate of 99.59% was obtained on a database of 3939 test digits. This recognition rate was achieved with the 8-way gradient and the RBF kernel for the SVM classifier. The database used included 4974 training digits. The initial numbers were 5000 training and 4000 test samples, which were reduced to 4974 and 3939, respectively, after the removal of samples that were badly written and difficult to recognize even for humans. The samples had been written on forms randomly distributed among 90 high school and university students. The samples written by an individual appeared in only one of the two sets, i.e. the training or the test set.
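The directional decomposition described above can be sketched as follows. This is only an illustration under stated assumptions: central differences stand in for whatever gradient operator the cited work used, and block sums stand in for its sampling scheme. The function names `direction_planes` and `sample_planes` are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def direction_planes(img: Image, n_dirs: int = 8) -> List[Image]:
    """Split the gradient magnitude into n_dirs images, one per quantized angle."""
    h, w = len(img), len(img[0])
    planes = [[[0.0] * w for _ in range(h)] for _ in range(n_dirs)]
    step = 2 * math.pi / n_dirs
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences (an assumption; a Sobel operator is also common).
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            k = int(round(angle / step)) % n_dirs  # nearest standard angle
            planes[k][y][x] = mag
    return planes


def sample_planes(planes: List[Image], block: int) -> List[float]:
    """Sample each directional image by summing it over block x block zones."""
    feats = []
    for plane in planes:
        h, w = len(plane), len(plane[0])
        for by in range(0, h, block):
            for bx in range(0, w, block):
                feats.append(sum(plane[y][x]
                                 for y in range(by, min(by + block, h))
                                 for x in range(bx, min(bx + block, w))))
    return feats


# A vertical edge: the gradient points along +x, so only direction 0 responds.
edge = [[0, 0, 1, 1] for _ in range(4)]
features = sample_planes(direction_planes(edge), block=2)
print(len(features), features[:4])
```

Passing `n_dirs=4` gives the 4-way variant; the resulting feature vector would then be fed to the classifier.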

A method was proposed to improve the fuzzy recognition of handwritten Farsi digits. The initial fuzzy method was based on one fuzzy rule for each digit (a total of 10 simple fuzzy rules) [8]. In the improved fuzzy method, the training data for each digit were split into several clusters, and a fuzzy rule was extracted from the data of each cluster. The optimal number of clusters was selected by the particle swarm optimization algorithm. A recognition rate of 96.20% was obtained. The HODA dataset was also used in this article. The features used in this method were zoning, characteristic location, and Zernike moments.

A very large dataset called HODA was presented for the recognition of handwritten Farsi numeric and alphabetic characters [9]. This dataset was extracted from about 11,942 registration forms filled out by undergraduate and postgraduate students. The sample resolution was 200 DPI (dots per inch). From these forms, two large databases of handwritten Farsi numeric and alphabetic characters were built, and both have been used widely. The total number of digits is 102,352, of which 60,000 are training samples, 20,000 are test samples, and 22,352 are remaining samples. The remaining samples can be used in various problems.

A new method was introduced for improving the recognition rate of handwritten Farsi digits using a combination of different classifiers [10]. Digit recognition is a ten-class problem, which was converted into ten simple two-class problems. Each two-class classifier distinguishes one digit from the others. The digit was then recognized by applying a combining rule that selects the maximum final output. The database used included 8600 samples, and the recognition rate obtained was 96.3% using 600 test samples.
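The decomposition into two-class problems with a maximum-output combining rule can be sketched as below. The two-class scorer used here is a simple nearest-mean linear discriminant, chosen only to keep the example short; it is an assumption and not the classifier of the cited work. The toy data use three classes instead of ten.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of one two-class scorer


def mean(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_one_vs_rest(samples: List[Vector], labels: List[int]) -> Dict[int, Model]:
    """Fit one two-class scorer per class: class k against all other classes."""
    models = {}
    for k in sorted(set(labels)):
        pos = mean([x for x, y in zip(samples, labels) if y == k])
        neg = mean([x for x, y in zip(samples, labels) if y != k])
        w = [p - n for p, n in zip(pos, neg)]
        mid = [(p + n) / 2 for p, n in zip(pos, neg)]
        b = -sum(wi * mi for wi, mi in zip(w, mid))
        models[k] = (w, b)
    return models


def predict(models: Dict[int, Model], x: Vector) -> int:
    """Combining rule: the class whose two-class scorer gives the maximum output."""
    scores = {k: sum(wi * xi for wi, xi in zip(w, x)) + b
              for k, (w, b) in models.items()}
    return max(scores, key=scores.get)


samples = [(0, 0), (0, 1), (5, 0), (5, 1), (0, 5), (1, 5)]
labels = [0, 0, 1, 1, 2, 2]
models = train_one_vs_rest(samples, labels)
print(predict(models, (4, 1)), predict(models, (0, 4)), predict(models, (1, 0)))
```

The same combining rule underlies the One-vs-All and One-vs-the-Rest SVM strategies mentioned in this section, where each scorer would be a binary SVM.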

In order to improve the recognition rate, another method was based on the selection of effective features from all the features [11]. Such selection increases the recognition rate and reduces computational costs. Population-based algorithms, including binary particle swarm and genetic algorithms, were used for feature selection. The classifier used was a simple fuzzy classifier with no preprocessing and postprocessing. The HODA dataset was also used for system evaluation.

A novel smart method for handwritten Persian digit recognition was proposed [12]. By applying the smart method to the feature selection problem, the recognition rate was increased to an acceptable extent. The fitness function, minimized by the Gravitational Search Algorithm (GSA), was the number of fuzzy classifier errors. The recognition rate obtained was 84.55% for test digits and 90.01% for training digits, without any preprocessing and postprocessing.

To improve the recognition rate, a combination of the HOG and LBP descriptors was used [13]. One advantage was that the information and features related to the image texture were captured. In addition, the feature vector was short, which gave a high computation speed. Using the HODA dataset, the recognition rate was 99.3%.
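As an illustration of the texture descriptor mentioned above, the following sketch computes the basic 3x3 local binary pattern (LBP) code and its histogram. The neighbor ordering and the `>=` comparison are one common convention and are assumptions here; the cited work may use a different LBP variant, and the HOG part is not shown.

```python
from typing import List

# Neighbor offsets (dy, dx), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], y: int, x: int) -> int:
    """8-bit code: bit i is set when neighbor i is at least as bright as the center."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist


patch = [[5, 1, 5],
         [1, 3, 1],
         [5, 1, 5]]
print(lbp_code(patch, 1, 1))  # the four corners are brighter: bits 0, 2, 4, 6
```

In a combined descriptor, a histogram like this would be concatenated with the HOG vector before classification.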

In another handwritten Farsi digit recognition method [14], point, local, and global features were extracted for recognition. The SVM classifier was used with the One-vs-All (OvA) approach. Using the Online-TMU database, a recognition rate of 95.99% was obtained.

Two methods for improving the recognition rate in handwritten Farsi digit recognition were presented [15]. In the first method, superior features were selected from all the extracted features using the Binary version of the Gravity Search Algorithm (BGSA). In addition to reducing the computational cost, this increased the recognition rate. In the second method, instead of feature selection, a weight was assigned to each feature using the Real-valued Gravity Search Algorithm (RGSA). A new feature vector was obtained by multiplying the initial feature vector by the weight vector. In both methods, the fitness function was the number of errors made by a simple fuzzy classifier, and the goal was to minimize this function. Both methods increased the recognition rate significantly.
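The two ideas, a binary selection mask and a real-valued weight vector, both evaluated by an error-count fitness, can be sketched as follows. The search algorithm itself (BGSA or RGSA) is not shown, and a nearest-centroid classifier stands in for the simple fuzzy classifier; both choices are assumptions made to keep the example self-contained.

```python
from typing import Callable, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, class label)


def select(x: List[float], mask: List[int]) -> List[float]:
    """Binary feature selection: keep only features whose mask bit is 1."""
    return [v for v, keep in zip(x, mask) if keep]


def reweight(x: List[float], weights: List[float]) -> List[float]:
    """Feature weighting: element-wise product with the weight vector."""
    return [v * w for v, w in zip(x, weights)]


def count_errors(train: List[Sample], test: List[Sample]) -> int:
    """Errors of a nearest-centroid classifier (stand-in for the fuzzy classifier)."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(c) / len(xs) for c in zip(*xs)] for y, xs in groups.items()}
    errors = 0
    for x, y in test:
        predicted = min(centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[c])))
        errors += predicted != y
    return errors


def fitness(transform: Callable[[List[float]], List[float]],
            train: List[Sample], test: List[Sample]) -> int:
    """Fitness of one candidate mask or weight vector: lower is better."""
    return count_errors([(transform(x), y) for x, y in train],
                        [(transform(x), y) for x, y in test])


# Toy data: feature 0 is informative, feature 1 is misleading on the test set.
train = [([0, 0], 0), ([1, 0], 0), ([4, 10], 1), ([5, 10], 1)]
test = [([1, 9], 0), ([4, 1], 1)]
print(fitness(lambda x: x, train, test))                         # all features
print(fitness(lambda x: select(x, [1, 0]), train, test))         # masked
print(fitness(lambda x: reweight(x, [1.0, 0.1]), train, test))   # weighted
```

A search algorithm would call `fitness` repeatedly on candidate masks or weight vectors and keep the one with the fewest errors.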

3 A Review of the Research on Farsi Alphabetic Character Recognition Methods

A relatively new method for the online recognition of distinct handwritten Farsi letters was proposed [16]. This method included preprocessing and postprocessing stages. In the preprocessing stage, the dimensions of the extracted features were equalized. The recognition stage consisted of two steps. In the first step, the main body of the input letter was assigned to one of the 18 main-body groups. In the second step, the final letter was recognized based on the location, shape, and number of micro movements, such as points. For example, to recognize the letter “ت”, the body group “ب،پ،ت،ث” was first determined, and the letter was then recognized from the “two points” micro movement above it. The classification was performed using a support vector machine. In the postprocessing stage, possible errors from the previous steps were corrected, and the recognition rate was increased, by matching the information of the main body and the micro movements. For example, if the classifier detected the letter “ل” with a point above, the system corrected it to the letter “ن” in the postprocessing stage. The database used was the Online-TMU dataset, provided by the Electrical Engineering Department of Tarbiat Modares University [17]. In total, 70% of the samples were used for training and 30% for testing. The best recognition rate obtained through this method was 98%. The implementation used MATLAB with the LIBSVM package as the support vector machine library.

A simple method for recognizing distinct printed Farsi letters was proposed [18]. In this method, the letters were divided into nine groups based on the points and signs above or below them. Using a neural network classifier, the points and signs were recognized and the corresponding group was determined. At this stage, three simple features were used for recognition (the ratio of black to white pixels in the symbol frame, the ratio of the length to the width of the symbol frame, and the number of horizontal crossings through the black parts of the symbol frame). If the recognized group included just one letter, that letter was assigned to the unknown letter. Otherwise, a minimum distance classifier was used, and the final letter was recognized by comparing the body of the unknown letter with the bodies of the letters from the same group. At this stage, characteristic location features were extracted for recognition. To test the recognition system, fonts such as Mitra, Lotus, Zar, and Nazanin were used in different sizes. The recognition rate was 100% using this method.
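The three simple features listed above can be sketched as follows. Several details are assumptions, since the text does not fix them: the symbol frame is taken to be the bounding box of the black pixels, "length to width" is taken as height divided by width, and crossings are counted as black runs along the middle row of the frame.

```python
from typing import List, Tuple


def symbol_frame(img: List[List[int]]) -> List[List[int]]:
    """Crop the image to the bounding box of its black (1) pixels."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[min(cols):max(cols) + 1] for row in img[min(rows):max(rows) + 1]]


def simple_features(img: List[List[int]]) -> Tuple[float, float, int]:
    frame = symbol_frame(img)
    height, width = len(frame), len(frame[0])
    black = sum(map(sum, frame))
    white = height * width - black
    black_white_ratio = black / white if white else float("inf")
    aspect_ratio = height / width  # assumed meaning of "length to width"
    middle = frame[height // 2]
    # Count black runs on the middle row (assumed meaning of "crossings").
    crossings = sum(1 for i, v in enumerate(middle)
                    if v and (i == 0 or not middle[i - 1]))
    return black_white_ratio, aspect_ratio, crossings


symbol = [[0, 0, 0, 0, 0],
          [0, 1, 0, 1, 0],
          [0, 1, 0, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
print(simple_features(symbol))
```

These three numbers would form the input to the classifier that recognizes the points and signs.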

Classifier combination was used to improve the recognition rate of handwritten Farsi letters [19]. The aim was to create different data for the training process, so that dividing the input features at each step yielded a different classifier, and to obtain better results by combining the outputs of these classifiers.

In this method, the input data was first randomly divided into several classes, Principal Component Analysis was applied to each class, and the features were extracted. By combining these features, the final feature vector was created, and the training process was carried out using the SVM classifier. The method was evaluated on 10 datasets. Each dataset included 3200 samples of handwritten Farsi letters (100 samples per letter). For each letter, 70 samples (a total of 2240) were used at the training stage, and 30 samples (a total of 960) were used at the testing stage. This method included a preprocessing stage. Each image had a white background with the letter in the middle. To enhance the quality, the images were converted to binary images. To reduce the computational load, the letters were separated from the background, and all the images were normalized to a standard size.
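The 70/30 per-letter split described above can be reproduced with a small stratified split. This is a sketch only; the function name `stratified_split`, the fixed seed, and the integer letter labels are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Tuple


def stratified_split(labels: List[Hashable], n_train_per_class: int,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train, test) sample indices with a fixed training count per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        train.extend(indices[:n_train_per_class])
        test.extend(indices[n_train_per_class:])
    return train, test


# 32 letters with 100 samples each, as in the datasets described above.
labels = [letter for letter in range(32) for _ in range(100)]
train_idx, test_idx = stratified_split(labels, n_train_per_class=70)
print(len(train_idx), len(test_idx))
```

Splitting per letter, rather than over the whole set at once, keeps every letter equally represented in both stages.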

The advantage of this method over other combination methods is the dispersion of samples at each stage and the increased accuracy of the base classifier. However, its runtime is longer than that of other methods. The improved recognition rate was 82.51%.

A method for the online recognition of separate handwritten Farsi letters was presented [20]. In this method, the main body and the micro movement information of the letter were used simultaneously. The letters were divided into 18 groups based on the similarity of the main body and into 11 groups based on the similarity of the micro movements. The main body group and the micro movement group of an unknown letter were recognized first, and the character was recognized if the two groups matched. If the recognized groups did not match, an error correction algorithm was used in the postprocessing stage. The preprocessing steps in this method included the removal of bracket and duplicate points, point refinement, dimensional alignment, translation to the origin of the coordinates, and making the number of points and the spacing between them uniform. For more information, see the reference. These steps were necessary because the online data were input with an optical pen on a touch-sensitive screen, so the number of points, the spacing between them, and the dimensions of the sampled data varied greatly. Therefore, uniformity was needed. A series of global and point features was used to classify the main body, and a series of structural features, together with several features extracted from the first and second micro movements, was used to classify the micro movements.
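Three of the preprocessing steps named above (duplicate removal, translation to the origin, and uniform resampling) can be sketched as follows. Linear interpolation along the arc length is an assumption here; the cited work may resample differently, and the remaining steps are not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def remove_duplicates(points: List[Point]) -> List[Point]:
    """Drop points that repeat the immediately preceding point."""
    out: List[Point] = []
    for p in points:
        if not out or p != out[-1]:
            out.append(p)
    return out


def to_origin(points: List[Point]) -> List[Point]:
    """Translate so that the bounding box starts at (0, 0)."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return [(x - min_x, y - min_y) for x, y in points]


def resample(points: List[Point], n: int) -> List[Point]:
    """Return n points equally spaced along the stroke (needs n >= 2, 2+ points)."""
    dists = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    out, seg = [], 0
    for i in range(n):
        target = dists[-1] * i / (n - 1)
        while seg < len(points) - 2 and dists[seg + 1] < target:
            seg += 1
        span = dists[seg + 1] - dists[seg]
        t = (target - dists[seg]) / span if span else 0.0
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


raw = [(2, 3), (2, 3), (3, 3), (6, 3), (6, 3), (6, 7)]
uniform = resample(to_origin(remove_duplicates(raw)), n=5)
print(uniform)
```

After these steps every sample has the same number of points with equal spacing, so feature vectors of a fixed length can be extracted.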

In addition, the feature vector of the main body was reduced from 102 to 17 features using the principal component analysis (PCA) and linear discriminant analysis (LDA) algorithms, with the aim of increasing the discriminative power of the features and reducing the computational cost. The classifiers used for the main body and the micro movements were SVMs based on the One-vs-One (OvO) approach. In this method, the optimal recognition rate was 98%. The Online-TMU database was used with 4022 distinct letters.

A fuzzy method was used for the online recognition of distinct Farsi letters [21]. In this method, a hierarchical algorithm was proposed. A fuzzy classifier was used to recognize the body of the letters, and expert knowledge and automated learning were combined to create it. After the body recognition, the secondary symbols of the letters were recognized through a set of fuzzy rules. Using this method, the recognition rate was 90.3% for test samples from 54 individuals.

For the online recognition of Farsi handwriting, a database was introduced with digits, letters, and the 1000 subwords used most often in texts [22]. This database is fully functional for research on the recognition of Farsi letters, digits, and words.

For distinct Farsi letter recognition, the letters were first divided into 12 groups based on the points and signs below or above the main body [23]. The points and signs of each letter were recognized, and the letter was assigned to one of the 12 groups. If the group included only one letter, the unknown letter was thereby recognized. Otherwise, the final recognition was carried out by comparing the body of the letter with the bodies of the letters from the same group using a minimum distance classifier.
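The two-stage decision described above can be sketched as follows. The three groups and the two-dimensional body prototypes are hypothetical values invented for the example; they are not the 12 groups or the features of the cited work.

```python
import math
from typing import Dict, List

# Hypothetical groups keyed by the recognized sign, each mapping a letter to a
# made-up body feature prototype. Real systems would learn these from data.
GROUPS: Dict[str, Dict[str, List[float]]] = {
    "one dot above": {"ن": [0.9, 0.2], "ز": [0.1, 0.8]},
    "two dots above": {"ت": [0.8, 0.1], "ق": [0.5, 0.9]},
    "madd above": {"آ": [0.0, 1.0]},
}


def recognize(sign_group: str, body_features: List[float]) -> str:
    """Stage 1 picks the group; stage 2 resolves it by minimum distance."""
    candidates = GROUPS[sign_group]
    if len(candidates) == 1:
        # A single-letter group needs no body comparison.
        return next(iter(candidates))
    return min(candidates,
               key=lambda letter: math.dist(candidates[letter], body_features))


print(recognize("madd above", [0.5, 0.5]))
print(recognize("one dot above", [0.2, 0.7]))
print(recognize("two dots above", [0.7, 0.2]))
```

Grouping by signs first keeps the minimum distance search small, because the body only has to be compared with the few letters that share the same signs.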

Using a hidden Markov model (HMM), a method for the online recognition of Farsi letters was presented [24]. In this method, the number of letter components was obtained and the small components of the letter were recognized. After that, the body and the small components of the unknown letter were preprocessed. Feature extraction was thus performed with greater accuracy. The features used were local and structural features. For the training phase, the Baum-Welch algorithm was used. This method also included a postprocessing operation. The recognition rate was 97.22% for training samples and 94.9% for test samples.

4 Conclusion

According to the literature review, which covered 15 studies on digit recognition and 8 studies on letter recognition, it is clear that recent research has focused mostly on digit recognition. Table 1 includes the results of various methods used in the recognition of letters and digits, some of which were reviewed in the present paper. According to Table 1, it is evident that both the digit and letter recognition methods showed high recognition rates. Therefore, it is necessary to develop methods for commercialization. Moreover, excellent databases for digits and letters were developed during recent years, most notably the HODA Dataset for digits and the Online-TMU Database for distinct letters, both of which were used frequently in the reviewed studies.

Table 1 The results obtained by different character recognition methods