1 Introduction

Character and numeral recognition systems have been a subject of research for several decades. Still, it remains exceptionally difficult to build a recognition system that works under every possible condition and gives highly accurate results. In Optical Character Recognition (OCR), the patterns are letters, digits and similar symbols, while the classes correspond to the distinct characters. The machine is trained by showing it examples of characters from all the classes. From these examples, the machine builds a model of each class of characters. An unknown pattern (character or numeral) is then compared with the previously obtained descriptions and assigned to the class that gives the best match. Optical recognition is performed after the text has been written or printed on paper, in contrast to online recognition, where the computer recognizes the characters as they are drawn. Both handwritten and printed characters may be recognized accurately, but the recognition quality depends directly on the quality of the input documents. Research in the area of character recognition began in the nineteenth century, and the first optical character recognition system was available in 1929. A modern version of OCR was produced in 1951 by David Shepard (Schantz 1982). Character recognition has played, and continues to play, an important role in pattern recognition research. Research on optical character recognition for Indic scripts is in progress, but so far no solution has been offered that solves the problem correctly and efficiently for Indic scripts. Character/numeral recognition research has various applications, such as reading handwritten notes, reading bank cheques, postcode recognition and form processing. A character recognition system can be used for reading handwritten notes. 
Notes are normally used to record facts, topics, or thoughts, written down as an aid to memory. Cheque reading is a very important commercial application of character recognition. In banks, character recognition systems play an important role in signature verification and in recognizing the amount filled in by the user. A character and numeral recognition system can be used for reading handwritten postal addresses on letters and the handwritten digits of postcodes. Character recognition can also be used for form processing. Forms are normally used to collect information from the public, and this information can be processed by a handwritten character recognition system. Signature identification is the specific field of handwriting OCR in which the writer is verified by some specific handwritten text. An offline handwritten character recognition system can also be used to identify a person by handwriting, as handwriting varies from person to person. This paper consists of eight sections. In Sect. 2, we present various issues and challenges for character recognition. Section 3 presents the motivation for this work. In Sect. 4, the recognition of different non-Indic scripts is reviewed, and in Sect. 5, a literature review is conducted for Indic scripts. Section 6 presents recognition accuracies achieved for typical non-Indic and Indic scripts, and in Sect. 7, we discuss a few suggestions on future directions for character recognition of different scripts. Finally, in Sect. 8, we conclude this paper.
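
The train-then-match procedure described earlier in this section (build a model of each class from examples, then assign an unknown pattern to the best-matching class) can be illustrated with a minimal sketch. It assumes tiny 3 x 3 binary glyphs, a mean-vector prototype per class, and squared Euclidean distance as the matching criterion; the glyphs and the function names `train` and `classify` are hypothetical and are not taken from any of the surveyed systems.

```python
# Minimal nearest-prototype classifier on tiny binary glyphs (illustrative only).

def to_vector(glyph):
    """Flatten a list of equal-length strings ('#' = ink) into a 0/1 vector."""
    return [1.0 if ch == "#" else 0.0 for row in glyph for ch in row]

def train(examples):
    """Build one prototype (mean vector) per class from (label, glyph) pairs."""
    sums, counts = {}, {}
    for label, glyph in examples:
        vec = to_vector(glyph)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(prototypes, glyph):
    """Assign the class whose prototype has the smallest squared distance."""
    vec = to_vector(glyph)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(prototypes[label], vec))

    return min(prototypes, key=distance)

# Hypothetical training examples for three numeral classes.
EXAMPLES = [
    ("1", [".#.", ".#.", ".#."]),
    ("1", ["##.", ".#.", ".#."]),
    ("7", ["###", "..#", "..#"]),
    ("7", ["###", "..#", ".#."]),
    ("0", ["###", "#.#", "###"]),
]

prototypes = train(EXAMPLES)
print(classify(prototypes, [".#.", ".#.", "..."]))  # a truncated "1"
print(classify(prototypes, ["###", "..#", "..."]))  # a truncated "7"
```

Practical systems replace the raw pixel vector with more robust features and the mean prototype with a trained classifier, but the compare-and-assign step has the same shape.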

2 Challenges and issues

Several challenges have been identified that may attract further research interest in character/numeral recognition. These include the difficulty of handling varied styles of human writing, different shapes and sizes of letters, poor input quality, and low recognition accuracy. Hence, a lot of research work remains to be done to solve these problems. The quality of the input document and varying font styles are challenging for various scripts. Compared to non-Indic scripts, Indic scripts pose additional challenges, such as a larger character set due to modifiers and the lack of standard test databases. Segmentation is a major challenge in text recognition. Different skew angles between lines on the page, or even along the same text line (multiple skew in a document), complicate the line segmentation process. Curvilinear and fluctuating lines create problems in identifying the exact line boundaries. Differences in the skew angle of words and characters within the same line also hinder line segmentation. Overlapping adjacent text lines, where the lower or upper portion of one line extends into the neighbouring text line, make line segmentation difficult. Poor quality documents containing holes, noise, spots, broken strokes, etc. make line segmentation extremely difficult. Character segmentation poses additional problems due to touching, broken and overlapping characters. A non-uniform background is another challenging condition for text recognition. Recognition of historical manuscript documents is also a challenging problem due to the low quality of manuscripts, absence of standard alphabets, presence of unknown fonts, etc. 
Several challenges have been identified for Arabic text recognition. Arabic is cursive in nature: individual characters join together to form a complete word, so identifying the segmentation points that separate a word into isolated characters is difficult. Another major challenge for Arabic is that its characters are enriched with dots and diacritics, whose position relative to the associated character changes frequently. In Japanese text recognition, it is sometimes difficult to determine whether two radicals are in fact two separate characters or two component parts of the same character. Still, the largest challenge is recognizing the large number of characters, and the majority of research has been devoted to overcoming this difficulty. Other major challenges and issues for Japanese text recognition are cursive characters, word spotting, and document images that contain both printed and handwritten text. Bangla character recognition is a great challenge for researchers because of the large number of characters and the change of character shape within words and in conjunctive characters. Recognition of printed Devanagari script is challenging because the same character varies with font family, font size, font orientation, etc. Sometimes the same font and size may also have boldface characters as well as normal ones. Thus, the width of the stroke is also an issue that affects recognition. There are a few major challenges in Gurmukhi text recognition. In online Gurmukhi handwriting recognition, challenges such as confusing strokes, reverse handwriting, and new classes in handwritten words exist. In offline handwriting recognition, the major challenges are that the headlines of words are sometimes not straight, many touching or overlapping characters may be found in a word, and the shape of a single character varies across its occurrences. 
There are a few major challenges for Kannada text recognition: the Kannada character set is very large, some characters are similar to each other, and the size of characters and words in Kannada is not uniform.
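
To make the line segmentation difficulties above concrete, the following minimal sketch uses a horizontal projection profile, one common baseline for line segmentation (it is an assumption here, not a method attributed to any specific cited work). Rows with no ink are treated as gaps between text lines. The toy images and the names `segment_lines` and `min_ink` are hypothetical. The second image shows how a single stroke that reaches into the neighbouring line removes the gap and merges two lines into one segment.

```python
def horizontal_profile(image):
    """Count ink pixels ('#') in each row of a binary image given as strings."""
    return [row.count("#") for row in image]

def segment_lines(image, min_ink=1):
    """Return inclusive (start_row, end_row) pairs for runs of rows with ink."""
    lines, start = [], None
    for y, ink in enumerate(horizontal_profile(image)):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # a gap row closes the line
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(image) - 1))
    return lines

# Two well-separated text lines.
CLEAN = [
    "........",
    ".######.",
    ".######.",
    "........",
    ".######.",
    "........",
]

# A stroke from the first line touches the second line, so no empty row remains.
TOUCHING = [
    "........",
    ".######.",
    "...#....",
    ".######.",
    "........",
]

print(segment_lines(CLEAN))     # two segments
print(segment_lines(TOUCHING))  # one merged segment
```

Skewed or curvilinear lines defeat this baseline in the same way, because ink from adjacent lines then falls on the same image rows.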

3 Motivation

Optical character recognition systems are divided into two categories according to the technique for data acquisition: online character recognition and offline character recognition. An online character recognition system uses a digitizer, which captures the writing along with the order of the strokes, speed, and pen-up and pen-down data. Offline character recognition captures the information from paper through an optical scanner or camera. Offline character recognition is also called optical character recognition because the image of the text is converted into a bit pattern by optically digitizing devices. Recognition is carried out on this bit pattern for both printed and handwritten text. Offline handwritten character recognition is more difficult than online handwritten character recognition because stroke information is not available in offline handwriting. The major difficulties, as in any handwritten character recognition problem, are the huge variety in the writing styles of an individual at different times and among different people, for example in shape, writing speed and thickness of characters. The problem of printed character recognition is largely solved, with few limitations, and available systems yield approximately 99% recognition accuracy. Handwritten character recognition, however, still has limited capabilities. Other challenges include the similarity of some characters to each other and the vast variety of character shapes. Offline handwritten character recognition is one of the most popular areas of research in document analysis and recognition because of its enormous application potential. Offline handwritten character recognition is well developed for scripts such as Arabic, Chinese, Korean, and Roman. Some encouraging research findings have been reported for Indic scripts such as Bangla and Devanagari. 
For Indic scripts, although many research papers have been published in this area, the reported results are insufficient for the design of efficient handwritten character recognition systems. This is the motivation behind this paper.
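
The difference between online and offline input described in this section can be sketched as two data representations. The example below is a minimal illustration under stated assumptions: the `Stroke` class, the `rasterize` function and the 3 x 3 letter "T" are hypothetical, and real digitizers record more attributes (for example pressure). It shows that two writings with different stroke order and direction produce the same bitmap, which is why stroke information cannot be recovered directly from offline data.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stroke:
    """One pen-down to pen-up trace: ordered (x, y, time) samples."""
    points: List[Tuple[int, int, float]]

def rasterize(strokes, width, height):
    """Render strokes into a binary bitmap, discarding order and timing."""
    bitmap = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y, _time in stroke.points:
            bitmap[y][x] = 1
    return bitmap

# Writer A draws the letter "T": bar left to right, then stem top to bottom.
writer_a = [
    Stroke([(0, 0, 0.00), (1, 0, 0.01), (2, 0, 0.02)]),
    Stroke([(1, 0, 0.10), (1, 1, 0.11), (1, 2, 0.12)]),
]

# Writer B draws the stem bottom to top first, then the bar right to left.
writer_b = [
    Stroke([(1, 2, 0.00), (1, 1, 0.01), (1, 0, 0.02)]),
    Stroke([(2, 0, 0.10), (1, 0, 0.11), (0, 0, 0.12)]),
]

offline_a = rasterize(writer_a, 3, 3)
offline_b = rasterize(writer_b, 3, 3)

print(writer_a != writer_b)    # the online data differ
print(offline_a == offline_b)  # the offline bitmaps are identical
```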

4 Recognition of non-Indic scripts

Borovikov (2004) has presented a survey of modern optical character recognition techniques, discussing the latest advances and major developments in the field. Hussain et al. (2015) have presented a comprehensive survey of handwritten document benchmarks. They have also compared these databases on a number of dimensions and discussed the ground truth information of each database along with the supported tasks. Sonkusare and Sahu (2016) have presented a survey on handwritten character recognition techniques for English alphabets. They have outlined current research work on recognition of handwritten English alphabets and discussed a variety of recognition methodologies with their performance. Modi and Parikh (2017) have presented a detailed review in the field of optical character recognition. They have surveyed various techniques for the pre-processing and segmentation phases of optical character recognition. We have noticed that most of the existing efforts on optical character recognition deal with non-Indic scripts. In the following sub-sections, we present the literature on different non-Indic scripts.

4.1 Arabic

The Arabic script is used for writing the Arabic and Persian languages. Almuallim and Yamaguchi (1987) have presented a recognition system for Arabic script. They have used geometrical and topological features for recognition. Impedovo and Dimauro (1990) have proposed a recognition system for handwritten Arabic numerals based on Fourier descriptors. Roy et al. (2004) have presented an Arabic postal automation system for sorting of postal documents. They have used a Multi-Layer Perceptron (MLP) classifier for recognition of Bangla and Arabic numerals and obtained a maximum recognition accuracy of about 92.1% for handwritten numerals. Lorigo and Govindaraju (2006) have presented a critical review of offline Arabic handwriting recognition systems, covering the techniques employed at the different stages of an offline handwritten Arabic character recognition system. Izadi et al. (2006) addressed the issues that arise in the Arabic alphabet as adopted and evolved for writing the Persian language. Abd and Paschos (2007) have obtained a recognition accuracy of 99.0% with a Support Vector Machine (SVM) for the Arabic script. Alaei et al. (2009) have presented a recognition system for Arabic numerals based on a fivefold cross validation technique. They have achieved a recognition accuracy of 99.4% on a 10-class problem with 20,000 samples in the testing data set. Alaei et al. (2010a) have proposed a technique for segmentation of handwritten Persian script text lines into characters. The proposed algorithm finds the baseline of the text image and straightens it. They have extracted features using histogram analysis and removed segmentation points using baseline dependent as well as language dependent rules. They have achieved a maximum segmentation accuracy of 92.5%. Alaei et al. (2010b) have proposed a Persian isolated handwritten character recognition system. 
They employed SVM for classification and achieved a recognition accuracy of 98.1% with modified chain code features. Kacem et al. (2012) have used structural features for recognition of Arabic names. Shahin (2017) has introduced a system for printed Arabic text recognition using linear and nonlinear regression. He has tested the proposed methodology with 14,000 different words of Arabic script and achieved a recognition accuracy of 86.0%. Althobaiti and Lu (2017) have presented a review of Arabic optical character recognition and have proposed a technique for isolated handwritten Arabic character recognition based on an encoded Freeman chain code.
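
Chain codes appear in several of the systems reviewed above. As a minimal sketch, the standard 8-direction Freeman chain code of a pixel path can be computed as follows. This is the textbook form, not the modified or encoded variants used in the cited works; the direction numbering (0 = east, increasing counter-clockwise, with image rows growing downward) and the function names are assumptions made for illustration.

```python
# Map each unit step (dx, dy) to a Freeman direction code.
# Image coordinates are assumed: x grows rightward, y grows downward,
# so "north" (code 2) is dy = -1.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}

def freeman_chain_code(path):
    """Encode a path of 8-connected (x, y) pixels as a list of direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels are not 8-connected neighbours: {step}")
        codes.append(DIRECTIONS[step])
    return codes

def chain_code_histogram(codes):
    """Normalized frequency of each of the 8 directions (a simple feature vector)."""
    total = len(codes)
    return [codes.count(d) / total if total else 0.0 for d in range(8)]

# Boundary of a 3 x 3 square, traced from the top-left corner:
# right along the top, down the right side, left along the bottom, up the left side.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]

codes = freeman_chain_code(square)
print(codes)
print(chain_code_histogram(codes))
```

The histogram is independent of the starting point of the trace, which is one reason chain code histograms are used as features rather than the raw code sequence.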

4.2 Chinese

Some Chinese character recognition systems based on orthogonal moment descriptors have been reported (Liu and Ma 1996; Zhang et al. 1990; Yap and Paramesran 2003). Zhu et al. have recognized Chinese characters based on stroke and structural features. Liao et al. (2002) have presented a method based on Gegenbauer moments for Chinese character recognition. Their method can provide a modest improvement in recognition for Chinese characters that are very similar in shape. They have used a set of 6763 Chinese characters as the testing images. Das and Banerjee (2015) have presented an algorithm based on geometry topology for Japanese Hiragana character recognition. They have achieved an average recognition rate of 94.1%. Bluche and Messina (2016) have presented a segmentation-free technique for recognition of handwritten Chinese text. He and Hu (2016) have presented a system for Chinese character recognition from natural scenes. They have presented a novel method based on the integrated channel feature and pooling technology to extract informative features from scene images.

4.3 French

Tran et al. (2010) have considered the problem of French handwriting recognition using 24,800 samples. They have worked on both online and offline handwritten character recognition. Grosicki and Abed (2009) proposed a French handwriting recognition system in a competition held in ICDAR-2011. In this competition, they have presented comparisons between different classification and recognition systems for French handwriting recognition. Swaileh et al. (2016) introduced a new unified syllabic model for French handwriting recognition based on hidden Markov models (HMM).

4.4 Japanese

Nakagawa et al. (2005) have presented a model for online handwritten Japanese text recognition which is free from line direction constraints and writing format constraints. Zhu et al. (2010) have described a robust model for online handwritten Japanese text recognition. They achieved a recognition accuracy of 92.8% using 35,686 samples. Tsai (2016) has achieved a recognition accuracy of 96.1% for handwritten Japanese characters, which consist of three different types of scripts: hiragana, katakana, and kanji. Deep convolutional neural networks were used for classification. For the experiments, the Electrotechnical Laboratory (ETL) Character Database from the National Institute of Advanced Industrial Science and Technology (AIST) was used. Liang et al. (2016) have presented an online handwritten Japanese text recognition system. They introduced a new unified syllabic model for French and English handwriting recognition, based on hidden Markov models (HMM). Their proposed method sets each off-stroke between real strokes as undecided and evaluates the segmentation probability with an SVM model.

4.5 Roman

Schomaker and Segers (1999) have proposed a technique for cursive Roman handwriting recognition using geometrical features. Park et al. (2000) have presented a hierarchical character recognition system that achieves high speed and accuracy by using a multi-resolution and hierarchical feature space. They obtained a recognition rate of about 96.0%. Wang et al. (2000) have presented a technique for recognition of Roman alphabets and numeric characters. They achieved a recognition rate of about 86.0%. Bunke and Varga (2007) have reviewed the state of the art in offline Roman cursive handwriting recognition. They identified the challenges in Roman cursive handwriting recognition. Liwicki and Bunke (2007) have combined online and offline Roman handwriting recognition systems using a new multiple classifier system. They obtained a maximum recognition accuracy of 66.8% for the combination of online and offline handwriting recognition. Schomaker (2007) has presented a method for retrieval of handwritten lines of text in historical administrative documents. Chanda et al. (2007a) have proposed an SVM based method for identification of printed Roman script documents. They have extracted structural features for script identification and achieved 99.4% recognition accuracy. Pal et al. (2010) have proposed a bi-lingual city name recognition system for Bangla and English. They have considered 11,875 samples for testing and obtained 92.2% recognition accuracy. Jayadevan et al. (2010) have developed a scheme for recognition of the words used to write the amount on bank cheques. They collected a database of 5400 words from fifty writers for testing and achieved a recognition accuracy of 97.0%. Afroge et al. (2016) have proposed an optical character recognition system for Roman script using a back propagation neural network. 
They trained their network with more than 10 samples per class and reported accuracies of 99.0, 97.0, 96.0 and 93.0% for numeric digits, capital letters, small letters and alphanumeric characters, respectively.

4.6 Thai

Chanda et al. (2007b) have developed an SVM based method for identification of printed Thai script documents. They have obtained 99.4% script identification accuracy. Karnchanapusakij et al. (2009) have used a linear interpolation approach for online handwritten Thai character recognition. They have obtained 90.9% recognition accuracy using this system. Kobchaisawat and Chalidabhongse (2015) have proposed a method for multi-oriented Thai text localization in natural scene images. They have used a convolutional neural network for classification. Asavareongchai and Giarta (2016) have presented an image processing system for recognition of Thai characters and text from documents. Sopon et al. (2017) have proposed a framework for Thai text retrieval using speech. They have achieved an average word accuracy of 74.50%.

5 Recognition of Indic scripts

Pal et al. (2012) have presented a state-of-the-art survey of the techniques available in the area of offline handwriting recognition (OHR) in Indian regional scripts. They have surveyed nine regional scripts and categorized them into four subgroups based on their similarity and evolutionary information. Various feature extraction and classification techniques associated with offline handwriting recognition of the regional scripts are discussed in this survey. They have also discussed the details of the datasets available in different Indian regional scripts. Singh et al. (2012) have presented a survey of various applications of optical character recognition in different fields. Prasad (2014) has presented an in-depth literature survey of Indic script recognition systems for Bangla, Devnagari, Gurumukhi, Kannada, Malayalam, Tamil, and Urdu. The survey focused on a multitude of feature and classification techniques for recognition of various scripts. Koundal et al. (2017) have presented a survey of Punjabi character recognition. They have discussed various feature extraction and classification techniques explored for printed and handwritten Punjabi character recognition. Compared to non-Indic scripts, research on character recognition of Indic scripts has not yet reached the same maturity, so research in this field is ongoing. For Indic scripts, work has mainly concentrated on character recognition of machine printed text. Only limited attempts have been made at recognition of degraded printed text and handwritten text.

5.1 Bangla

A good number of researchers have worked on recognition of handwritten characters in Bangla script. Bangla script is used for writing the Bengali and Assamese languages. Dutta and Chaudhury (1993) have presented a system for recognition of isolated Bangla alphabets and numerals using curvature features. Pal and Chaudhuri (1994) have proposed a character recognition system using a tree classifier. Their system was quite fast because pre-processing such as thinning is not required in their scheme. They have achieved a recognition accuracy of 96.0% using a data set of 5000 characters. Bishnu and Chaudhuri (1999) have used a recursive shape based technique for segmentation of handwritten Bangla script documents. Pal and Dutta (2003) have proposed a system for segmentation of unconstrained Bangla handwritten connected numerals. They achieved a segmentation accuracy of 94.8%. Roy et al. (2004) have presented a handwritten numeral recognition system for Indian postal automation and achieved a recognition accuracy of 92.1%. Bhattacharya et al. (2006) have presented a Bangla character recognition system and obtained a maximum recognition accuracy of 94.7%. Pal et al. (2006) have proposed a technique for slant correction of Bangla characters based on the Modified Quadratic Discriminant Function (MQDF). They have achieved a recognition accuracy of 87.2% on a dataset of Bangla city name images. Bhattacharya et al. (2007) have presented an approach for online Bangla handwritten character recognition. They formulated a 50-class recognition problem and achieved an accuracy of 92.9 and 82.6% for training and testing, respectively. Pal et al. (2007a) dealt with recognition of offline handwritten Bangla compound characters using MQDF. They have obtained 85.9% recognition accuracy using a fivefold cross validation technique. Pal et al. (2008) have proposed a technique for a Bangla handwritten pin code recognition system. Reddy et al. 
(2012a, b) have presented a handwritten numeral recognition system for the Assamese language that can be applied in both online and offline situations. For online handwritten numeral recognition, they have used x and y coordinates for feature extraction and an HMM classifier for recognition. For offline numeral recognition, they have considered projection profile features, zonal discrete cosine transforms, chain code histograms and pixel level features, with a Vector Quantization (VQ) classifier for recognition. They have achieved a recognition accuracy of 96.6 and 97.6% for online and offline handwritten numerals, respectively. Reddy et al. (2012a, b) have also presented an HMM based online handwritten digit recognition system using first and second order derivatives at each point as features. They obtained a recognition accuracy of 97.1% on a testing data set of 18,000 samples. Sarma et al. (2013) have presented a handwritten Assamese numeral recognition system using HMM and SVM classifiers. They have attained a recognition accuracy of 96.5 and 96.8% with the HMM and SVM classifiers, respectively. Afroge et al. (2016) have presented an offline printed optical character recognition system based on a multilayer perceptron model for Bangla script. They have proposed a feature extraction technique based on “Discrete Frechet Distance” and “Dynamic Time Warping”. They have achieved a recognition accuracy of 95% for all basic characters of Bangla script.

5.2 Devanagari

The Devanagari script is used for writing four languages, namely, Hindi, Marathi, Nepali and Sanskrit. Sethi and Chatterjee (1976) have done a good amount of work on Devanagari script recognition. They have used a binary decision tree classifier for recognition. Pal and Chaudhuri (2001) have proposed a methodology for machine recognition of printed and handwritten texts of Devanagari script. They have achieved a recognition accuracy of 98.3% using this system. Bansal and Sinha (2000) have also presented a two-phase recognition system for Devanagari script. In the first phase, they recognized the unknown stroke, and in the second phase, they identified the character based on the strokes recognized in the first phase. Roy et al. (2004) have developed a scheme for handwritten script identification, generating a tree classifier for word by word script identification of Bangla, Devanagari, and English. They have achieved a recognition accuracy of 98.4% with the proposed technique for printed text. Joshi et al. (2005) have presented an online handwritten Devanagari character recognition system. They have proposed a structural feature based algorithm for recognition. Hanmandlu et al. (2007) have used membership functions of fuzzy sets for handwritten Devanagari script recognition. Pal et al. (2007b) have developed a modified classifier based scheme for offline handwritten numeral recognition of six widely used Indian scripts. They have extracted directional features for numeral recognition and obtained 99.6% recognition accuracy with a fivefold cross validation technique. Pal et al. (2007d) have developed a system for offline handwritten Devanagari character recognition. They have achieved a recognition accuracy of 94.2% with a fivefold cross validation test. Kumar (2008) has introduced an artificial intelligence based technique for machine recognition of handwritten Devanagari script. 
He has used three levels of abstraction to describe this technique. Pal et al. (2009) have carried out a comparative study of handwritten Devanagari character recognition. Garg et al. (2010) have developed a line segmentation technique for handwritten Hindi text. Lajish and Kopparapu (2010) have described a technique for online handwritten Devanagari script recognition. They have extracted fuzzy directional features for writer independent Devanagari character recognition. Marathi is an Indo-Aryan language spoken in the Indian state of Maharashtra and neighbouring states. Ajmire and Warkhede (2010) have presented a technique based on invariant moments for isolated handwritten Marathi character recognition. The proposed technique is size independent. Shelke and Apte (2011) have presented a multi-stage handwritten character recognition system for Marathi script. They have achieved a recognition accuracy of 96.1 and 94.2% for training and testing data sets, respectively, with wavelet approximation features. They have also achieved 98.7 and 96.2% recognition accuracy for training and testing samples, respectively, with modified wavelet features. Belhe et al. (2012) have presented a Hindi handwritten word recognition system. They have used HMM and tree classifiers for recognition and obtained a recognition accuracy of 89.0% using 10,000 Hindi words.
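
Several of the accuracies quoted in this survey were obtained with fivefold cross validation. As a minimal sketch of that evaluation protocol, the code below splits the samples into five folds, uses each fold once as the test set, and averages the five accuracies. The function names, the contiguous unshuffled split, and the toy majority-class classifier are assumptions for illustration only; the cited works do not specify their splitting procedure here, and practical evaluations usually shuffle or stratify the data first. The sketch also assumes at least as many samples as folds.

```python
from collections import Counter

def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(samples, labels, train_fn, predict_fn, k=5):
    """Mean accuracy over k rounds; each fold is the test set exactly once."""
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Toy stand-in classifier: always predicts the most frequent training label.
def train_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, sample):
    return model

samples = list(range(10))
labels = ["a"] * 8 + ["b"] * 2

print(k_fold_indices(10, 5))
print(cross_validate(samples, labels, train_majority, predict_majority))
```

Because every sample is tested exactly once by a model that never saw it during training, the averaged figure is less dependent on one particular train/test split than a single hold-out accuracy.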

5.3 Gujarati

Antani and Agnihotri (1999) are pioneers in attempting Gujarati printed text recognition. For their experiments, they have used a dataset of scanned images of printed Gujarati texts collected from various internet sites. Dholakia et al. (2007) have used wavelet features and a k-NN classifier in a printed Gujarati text recognition system. They have achieved a recognition accuracy of 96.7% with the k-NN classifier. Prasad et al. (2009) have presented a pattern matching technique for Gujarati script recognition. In this technique, they have identified a character by its shape. Gohell et al. (2015) have presented a low level stroke feature based method for recognition of online handwritten Gujarati characters and numerals. They have achieved a recognition accuracy of 95, 93 and 90% for the numerals dataset, the characters dataset and the combined dataset of numerals and characters, respectively. Ardeshana et al. (2016) have extracted DCT features for handwritten Gujarati character recognition. They have achieved a recognition accuracy of 78.05% for 22,000 samples using a Naïve Bayes classifier. Patel and Kayasth (2017) have presented a recognition system for offline handwritten Gujarati numerals. They have extracted various features for recognition, namely, hole, straight line, number of open/end edges and open edges present in different zones.

5.4 Gurmukhi

Gurmukhi script is used for writing the Punjabi language. Lehal and Singh (1999) have presented a hybrid classification scheme for printed Gurmukhi script recognition. Using this scheme, they have achieved a recognition accuracy of 91.6%. A post processor for Gurmukhi script has been proposed by Lehal et al. (2001). Jindal et al. (2005) have proposed a solution for touching character segmentation of printed Gurmukhi script. They have also provided a very useful solution for overlapping line segmentation in various Indian scripts (2007). They have proposed a technique for segmentation of degraded Gurmukhi script words into upper, middle and lower zones, and have provided a degraded printed Gurmukhi script recognition system. Sharma et al. (2008) have used an elastic matching technique for online handwritten Gurmukhi script recognition. Sharma et al. (2009) have described a method to rectify the recognition results of handwritten and machine printed Gurmukhi OCR systems. Sharma and Lehal (2009) have presented an algorithm for removal of the field frame boundary of hand filled forms in Gurmukhi script. Sharma and Jhajj (2010) have extracted zoning features for handwritten Gurmukhi character recognition. They have employed two classifiers, namely, k-NN and SVM, and achieved a maximum recognition accuracy of 72.5 and 72.0%, respectively. Kumar et al. (2013) have presented a novel feature extraction technique for offline handwritten Gurmukhi character recognition. They have also presented efficient feature extraction techniques based on curvature features for offline handwritten Gurmukhi character recognition (2014a). Kumar et al. (2013) have also presented a character recognition system using principal component analysis. They have explored k-NN and SVM classifiers for offline handwritten character recognition (Kumar et al. 2011a, b, c, d, 2012).
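
Zoning features combined with a k-NN classifier recur in the Gurmukhi work above and in several other scripts covered by this survey. A minimal sketch of the general idea follows: the glyph is divided into a grid of zones, the ink density of each zone forms the feature vector, and an unknown glyph takes the majority label of its k nearest training glyphs. The 4 x 4 glyphs, the 2 x 2 zone grid and the function names are hypothetical; this is a generic illustration, not the exact feature definition of any cited paper, and it assumes square glyphs whose size is divisible by the number of zones.

```python
from collections import Counter

def zoning_features(glyph, zones=2):
    """Ink density in each cell of a zones x zones grid over a square glyph."""
    step = len(glyph) // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(
                glyph[r][c] == "#"
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            )
            features.append(ink / (step * step))
    return features

def knn_predict(training, query, k=1):
    """Majority label among the k training vectors nearest to `query`."""
    ranked = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 4 x 4 training glyphs for two shape classes.
L_SHAPE = ["#...", "#...", "#...", "####"]
T_SHAPE = ["####", ".#..", ".#..", ".#.."]
training = [
    (zoning_features(L_SHAPE), "L"),
    (zoning_features(T_SHAPE), "T"),
]

# An "L" with one pixel missing from its base.
noisy_l = ["#...", "#...", "#...", "###."]

print(zoning_features(L_SHAPE))
print(knn_predict(training, zoning_features(noisy_l)))
```

Zone densities tolerate small local distortions such as the missing pixel here, since the change alters only one feature value slightly.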

5.5 Kannada

Kannada is one of the most widely used scripts of Southern India, and the Kannada language is spoken by more than fifty million people in India. Little work has been conducted on handwritten Kannada script recognition. Ashwin and Sastry (2002) have presented a font and size independent OCR system for printed Kannada documents. They extracted features based on the foreground pixels in the radial and angular directions. They achieved a maximum recognition accuracy of 94.9% using an SVM classifier. Sharma et al. (2006) have employed a quadratic classifier for offline handwritten Kannada numeral recognition. They have achieved a maximum recognition accuracy of 98.5% using a fivefold cross validation technique. Kunte and Samuel (2007) have presented an efficient printed Kannada text recognition system. They considered invariant moments and Zernike moments as features and a Neural Network (NN) as the classifier. They obtained a recognition accuracy of 96.8% using 2500 characters. Acharya et al. (2008) have proposed a handwritten Kannada numeral recognition system. They have used structural features and multilevel classifiers for recognition. Rajashekararadhya and Ranjan (2008a) have developed a technique based on zoning and distance metric features. They have used a feed forward back propagation neural network and obtained a recognition accuracy of about 98.0% for Kannada numerals. They have also achieved a recognition accuracy of 97.8% for Kannada numerals with zoning and distance metric features and an SVM classifier (2008b). They have used a Nearest Neighbour classifier for recognition and obtained a 97.8% recognition rate for Kannada numerals (2009a). Rajashekararadhya and Ranjan (2009b) have extracted zoning features for offline handwritten numerals of four widely used Indian scripts. For Kannada numerals, they have obtained a recognition accuracy of 98.7% with an SVM classifier. 
Rampalli and Ramakrishnan (2011) have presented an online handwritten Kannada character recognition system which works in combination with an offline handwriting recognition system. They improved the accuracy of the online handwriting recognizer by 11% by combining it with the offline handwriting recognition system. Venkatesh and Ramakrishnan (2011) have presented a technique for fast recognition of online handwritten Kannada characters. Using this technique, they obtained an average accuracy of 92.6% for Kannada characters. Ramakrishnan and Shashidhar (2013) have addressed the challenges in segmentation of online handwritten isolated Kannada words. They achieved 94.3% segmentation accuracy using an attention feed-based segmentation technique. Pasha and Padma (2015) have discussed wavelet transforms and structural features for handwritten Kannada character recognition. They have achieved recognition accuracies of 91.0 and 97.6% for characters and numerals, respectively. Karthik and Srikanta (2016) have presented a novel approach for handwritten Kannada text recognition using a combination of histogram of gradient features and an SVM classifier.
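
Some of the accuracies in this section are reported under a fivefold cross validation technique. The following minimal sketch shows how such an estimate is typically computed; it is an illustration under stated assumptions, not the protocol of any cited paper. The names `k_fold_splits` and `cross_validated_accuracy`, the contiguous (unshuffled) folds, and the `train_and_predict` callback are hypothetical.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validated_accuracy(samples, labels, train_and_predict, k=5):
    """Average the test accuracy over k folds.

    `train_and_predict(train_x, train_y, test_x)` is a hypothetical callback
    that trains any classifier and returns predicted labels for `test_x`.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(samples), k):
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy check with a trivial "classifier" that always predicts class 0.
toy_samples = list(range(10))
toy_labels = [0] * 8 + [1] * 2


def always_zero(train_x, train_y, test_x):
    return [0] * len(test_x)


toy_accuracy = cross_validated_accuracy(toy_samples, toy_labels, always_zero, k=5)
```

In practice the samples are usually shuffled, and often stratified by class, before the folds are formed, so that each fold reflects the class distribution of the whole data set.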

5.6 Malayalam

Malayalam is one of the popular scripts of Southern India. It is the eighth most widely used script in India. Lajish (2007) has presented a system based on fuzzy zoning and normalized vector distance measures for recognition of offline handwritten Malayalam characters. He has also presented a method for offline handwritten segmented Malayalam character recognition (2008). John et al. (2007) have presented a method based on wavelet transform for offline handwritten Malayalam character recognition. Rajashekararadhya and Ranjan (2008b) have developed a feature extraction technique for Malayalam script recognition. They have also obtained a recognition accuracy of 96.5% with SVM for Malayalam numeral recognition (2009a). Arora and Namboodiri (2010) have proposed a system for online handwritten Malayalam character recognition. The system achieves a stroke level accuracy of 97.9%. Rahiman et al. (2010) have developed an algorithm which accepts the scanned image of handwritten characters as input and produces editable Malayalam characters in a predefined format as output. Sreeraj and Idicula (2010) have presented a technique for online handwritten Malayalam character recognition. They have employed the k-NN classifier and achieved a recognition accuracy of 98.1%. Sunija et al. (2016) have presented a comparative study of various classifiers for a Malayalam dialect recognition system. They have reported recognition accuracies of 90.2, 88.2 and 84.1% using ANN, SVM, and Naïve Bayes classifiers, respectively. Baiju and Sabeerath (2016) have compared k-NN, MLP, and SVM classifiers for online handwritten Malayalam text recognition. They have achieved a maximum recognition accuracy of 95.12% for Malayalam character recognition using an SVM classifier with RBF kernel.

5.7 Oriya

An Oriya OCR system has been developed at the Indian Statistical Institute, Kolkata by Pal and Chaudhuri (1997). They have utilized a Hough transform based technique for skew angle detection in Oriya alphabet recognition. Tripathy and Pal (2004) have segmented Oriya handwritten text using a water reservoir based technique. Roy et al. (2005) have dealt with offline unconstrained handwritten Oriya numeral recognition. They have achieved a recognition accuracy of 90.4% using an NN classifier with a rejection rate of about 1.84%. Bhowmik et al. (2006) have presented an HMM based Oriya numeral recognition system and have achieved recognition accuracies of 95.9 and 90.6% for training and testing sets, respectively. Pal et al. (2007d) have used curvature features for Oriya numeral recognition. They have obtained a recognition accuracy of 94.6% using this system. Raj (2015) has presented an optical character recognition system for Oriya script, in which structural features and a novel combination of a binary tree and Naïve Bayes classifier have been considered for recognition. Chaudhary et al. (2015) have used histogram of gradient features and an ANN classifier for recognition of printed Oriya characters. Bhoi et al. (2015) have presented an Oriya handwritten text recognition system using Hidden Markov Model (HMM). They have extracted concavity features for recognition.
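
The idea behind Hough transform based skew angle detection, mentioned above for Pal and Chaudhuri (1997), can be sketched as follows. This is a simplified illustration and not a reconstruction of their method. The function name `estimate_skew_degrees`, the integer candidate angles from -10 to 10 degrees, the coordinate convention (x to the right, y upward) and the synthetic points are assumptions. For each candidate angle, every foreground pixel votes for the offset of the line through it at that angle; the angle whose accumulator shows the strongest peak is taken as the skew.

```python
import math
from collections import Counter


def estimate_skew_degrees(points, candidate_angles=range(-10, 11)):
    """Return the candidate angle (in degrees) with the strongest Hough peak.

    `points` is an iterable of (x, y) foreground pixel coordinates.
    """
    points = list(points)
    best_angle, best_votes = 0, -1
    for angle in candidate_angles:
        theta = math.radians(angle)
        # All pixels on one line tilted by theta share the same offset rho.
        accumulator = Counter(
            round(y * math.cos(theta) - x * math.sin(theta)) for x, y in points
        )
        votes = max(accumulator.values())
        if votes > best_votes:
            best_angle, best_votes = angle, votes
    return best_angle


# Synthetic baseline: pixels along a line tilted by 5 degrees (toy data).
tilted_baseline = [(x, round(x * math.tan(math.radians(5)))) for x in range(200)]
skew = estimate_skew_degrees(tilted_baseline)
```

On a scanned page the votes would come from many text lines at once, and a finer angular resolution than one degree would normally be used.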

5.8 Tamil

Aparna et al. (2004) have presented a system for online handwritten Tamil character recognition. They have used shape based features including dot, line terminal, bumps and cusp in their work. Deepu et al. (2004) have presented an online handwritten Tamil character recognition system using PCA. Prasanth et al. (2007) have described a character based elastic matching technique for online handwritten Tamil character recognition. Bharath and Madhvanath (2011) have used HMM for a Tamil word recognition system. They have achieved a maximum recognition accuracy of 98.0%. Sundaram and Ramakrishnan (2013) have proposed a script-dependent approach to segment online handwritten isolated Tamil words into their constituent symbols. They tested their proposed scheme on a set of 10,000 isolated handwritten words. Sundaram and Ramakrishnan (2014) reduced the error rate of the Tamil symbol recognition system by reevaluating certain decisions of the SVM classifier. Rajashekararadhya and Ranjan (2008b) have developed a zoning based feature extraction technique for recognition of offline handwritten numerals of four widely used Indian scripts. They have utilized nearest neighbour, feed forward back propagation neural network and SVM classifiers for recognition. They have obtained a recognition accuracy of 96.1% for Tamil character recognition. Janani et al. (2016) have worked on the recognition and analysis of Tamil inscriptions and the mapping of image processing techniques. Elakkiya et al. (2017) have presented Tamil text recognition using a k-NN classifier. They have achieved a recognition accuracy of 91.0%.

5.9 Telugu

Prasanth et al. (2007) have used elastic matching technique for online handwritten Telugu character recognition. They have obtained a recognition accuracy of 90.6%. Pal et al. (2007b) have used direction information for Telugu character recognition. They have used a fivefold cross validation technique and obtained a recognition accuracy of 99.4% for Telugu character recognition. Rajashekararadhya and Ranjan (2008a) have recognized handwritten Telugu numerals with zoning and distance metric based features. For recognition, they have used feed forward back propagation neural network classifier and obtained a recognition accuracy of 96.0%. Rajashekararadhya and Ranjan (2008b) have proposed an algorithm based on zoning features for offline handwritten numerals recognition of four widely used Indian scripts. They have obtained a recognition accuracy of 98.6% for handwritten Telugu numerals recognition with an SVM classifier. Arora and Namboodiri (2010) have proposed a system for online handwritten Telugu character recognition. They have achieved a stroke level accuracy of 95.1% for Telugu character recognition. Sastry et al. (2014) have extracted zoning based features for Telugu handwritten character recognition. They have achieved a recognition accuracy of 78.0% using zoning based features. Jyothi et al. (2015) have presented innovative feature sets for Telugu character recognition. They have considered Discrete Wavelet Transformation (DWT), Projection Profile (PP) and Singular Value Decomposition (SVD) features and for classification they have explored k-NN and SVM classifiers. Kinjarapu et al. (2016) have presented an online handwriting recognition system for Telugu script. They have accomplished a recognition accuracy of 90% using a combination of strokes. Prasad and Kanduri (2016) have used zoning based features and Genetic Algorithm for Telugu handwritten character recognition. They have considered k-NN classifier for recognition purpose and achieved a recognition accuracy of 88.8%.

Table 1 Recognition results of numerals
Table 2 Feature wise recognition results of numerals

6 Recognition results of non-Indic and Indic scripts

This section presents a brief report on the recognition accuracies achieved by researchers for character and numeral recognition. We have presented their results in Tables 1 and 2 for numerals, Tables 3 and 4 for non-Indic scripts and Tables 5 and 6 for Indic scripts. As illustrated in Table 1, a recognition accuracy of 99.6% has been achieved for handwritten numerals by Pal et al. (2007b). A feature wise comparative study of numeral recognition is depicted in Table 2. In Table 3, the results for non-Indic scripts are presented. As shown in this table, recognition accuracies of 99.4, 99.9, 92.8 and 99.2% have been achieved for Arabic, French, Japanese and Roman scripts, respectively. Srihari and Leedham (2003) have also presented a good survey on computer methods in forensic handwritten document examination. They have presented various software systems that automate some of the examination processes and have included verification methods to provide the degree of match between a questioned and a known document. A feature wise comparative study for non-Indic script recognition is presented in Table 4. In Table 5, the results on Indic scripts have been presented. It can be seen that a lot of work has been done on Bangla, Devanagari and Kannada scripts. Some work has also been done to recognize the Gurmukhi, Malayalam, Oriya and Tamil scripts as given in this table. As depicted in Table 5, for the Bangla script, a maximum recognition accuracy of 97.6% has been achieved by Roy et al. (2005). In Devanagari script, a maximum recognition accuracy of 99.0% has been achieved by Pal et al. (2007b). They have used directional features and an MQDF classifier for recognition. Kunte and Samuel (2007) have achieved a maximum recognition accuracy of 96.8% for Kannada characters. They have tested their technique with 1000 samples of a 50-class problem. For all classes of Kannada script, a maximum recognition accuracy of 92.6% has been achieved by Venkatesh and Ramakrishnan (2011). They have considered 26,926 samples for the testing data set. Arora and Namboodiri (2010) have achieved a recognition accuracy of 95.8% for Malayalam character recognition. They have tested their technique with 7348 samples of Malayalam characters. Joshi et al. (2004) have achieved a maximum recognition accuracy of 91.5% for Tamil character recognition. They have considered 4860 samples of 156 classes for the testing data set. For offline handwritten Gurmukhi script, a recognition accuracy of 91.8% has been achieved by Kumar et al. (2014b). They have tested their technique with 5600 samples of a 56-class problem. Nonetheless, no complete recognition system is yet available for Indic scripts. In Table 6, we have presented comparisons of recognition results using the same feature types with different data sets and classifiers.

Table 3 Recognition results of non-Indic scripts
Table 4 Feature wise comparative study of recognition results for non-Indic scripts
Table 5 Recognition results of Indic scripts
Table 6 Feature wise comparative study of recognition results for Indic scripts

7 Suggestions on future directions

In the field of optical character recognition, many directions are possible for future research. For example, the algorithms proposed for the segmentation task can be extended further to improve recognition accuracy, because segmentation is an essential part of the document recognition process. The following are some suggestions on future research directions in character and numeral recognition:

  a. Multiple standard handwritten character databases should be available for non-Indic scripts, and each database should be adequately large.
  b. New features can be proposed to improve the recognition accuracy of different scripts. There is a need to develop standard databases for Devanagari, Gurmukhi and other scripts.
  c. A combination of statistical and structural features should be considered for extracting the relevant information about characters.
  d. More research should focus on image transformation based representations.
  e. More research can be carried out on feature selection techniques and classification techniques for the recognition of different scripts. An optical character recognition system could be developed for multi-font style characters.
  f. Most of the reported work deals with fair quality documents. Sophisticated studies on degraded documents have not yet been undertaken in the development of character recognition systems.
  g. Experiments should be carried out to observe the effect of degraded paper quality and of various types of noise, and corrective measures should be taken.

8 Conclusions

In this paper, we have surveyed the character and numeral recognition work that has been done on non-Indic and Indic scripts. We have assessed the work done for various Indic scripts, i.e., Bangla, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu. We have also presented the work done for recognition of various non-Indic scripts, i.e., Arabic, French, Japanese, Roman, and Thai, and the recognition accuracies achieved for character and numeral recognition of different non-Indic and Indic scripts. We have seen that the efficient techniques used for non-Indic scripts may also be used for Indic scripts (printed text and handwritten text), so that recognition accuracy for Indic scripts may be raised towards the level achieved for non-Indic scripts. One of the key motivations for the early development of character and numeral recognition systems was a reading aid for the visually impaired. One possible way of achieving this goal is to convert the recognized character and numeral output into speech.