1 Introduction

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. The source can be a scanned document, a photograph of a document, or caption text superimposed on an image. OCR is widely used for data entry from printed paper records such as passports, invoices, bank statements, computer-generated receipts, business cards, printouts, or any other suitable documents. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more efficiently, and used in machine processes such as cognitive computing, machine translation, text-to-speech, key information extraction and text analysis [12, 13]. OCR is a research field in artificial intelligence, pattern recognition and computer vision. In practice, there is great interest in converting printed records into electronic records in order to maintain the security of clients' information. Alongside this, there is a need for character recognition software that performs document image analysis, converting paper documents into electronic form [14, 15]. A generalized real-time block diagram is shown in Fig. 1.

Fig. 1. Generalized real-time block diagram

Various systems have been proposed for this process. Among these approaches, optical character recognition is the most important technique for recognizing characters [1]. The main goal is to speed up character recognition in document processing, so that the system can process a large number of documents in less time. Current OCR systems are designed to convert information available on paper into computer-processable documents, so that the records can be reused and edited. However, the existing OCR systems provide only OCR, without framework functionality. That is, they perform homogeneous character recognition, i.e. character recognition for a single language.

The paper is organized as follows. Related work is briefly reviewed in Sect. 2. The proposed method is explained in Sect. 3, which covers dataset preparation, preprocessing, feature extraction and selection, and the classifier. Experimental results are presented in detail in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Related Works

This section reviews research on optical character recognition systems. Chaudhary, Garg, Behera et al. [1] applied optical character recognition to document image analysis, arranging the data in grid form. They describe a software system known as a character recognition system, and argue that OCR is needed to provide a software framework for Document Image Analysis, which converts documents from paper format into electronic form. Hamad, Kaya et al. [6] surveyed OCR from several angles. They gave a detailed overview of the challenges that may arise in the OCR phases. In addition, they reviewed the general stages of an OCR system: preprocessing, segmentation, normalization, feature extraction and classification. They also highlighted developments in, and the main applications of, OCR. Ahlawat et al. [2] presented a review of OCR processes. The authors discussed strategies for converting text in an image into a machine-readable format; the machine recognizes the characters through this conversion method, which is what is referred to as optical character recognition. Afroge, Ahmed, Mahmud et al. [7] presented an ANN-based method for recognizing English characters using a feedforward neural network (FNN). Noise is considered one of the main factors that degrade the performance of a character recognition (CR) system. Their network has several layers: input, hidden and output. The entire system is divided into two parts, training and recognition, and comprises image acquisition, preprocessing and feature extraction. Lee, Osindero et al. proposed recursive recurrent neural networks with attention modeling for character recognition [8]. The main advantages of their system are the use of recursive CNNs for efficient and effective image feature extraction, a language model learned by a recurrent neural network, the use of an attention mechanism over the image features, and end-to-end training within a standard backpropagation framework. There are also various methods for omnidirectional processing of 2D images that contain recognizable characters [11]. Santosh et al. proposed a dynamic time warping (DTW) Radon-based shape descriptor for pattern recognition, in which DTW is used to match Radon features and avoid missing features [18,19,20,21].

3 Proposed Method

This section describes the proposed optical character recognition system, which learns automatically from training images. The proposed approach is divided into two phases. The first is the training phase, in which the image is normalized by filtering, features are extracted, and finally the features are optimized using particle swarm optimization. The second phase is classification, which is performed using a backpropagation neural network. The main consideration is proper training of the system, which is evaluated in terms of the mean square error with respect to the number of epochs. Figure 2 shows the block-level flow diagram of the proposed approach. The first block is the construction of the graphical user interface for human-machine interaction.

Fig. 2. Flow diagram

3.1 Dataset and Preprocessing

The dataset consists of 3410 samples selected from Chars74K, with characters of different shapes drawn using a PC and tablet. This dataset was selected because character recognition in scanned images is largely a solved problem in constrained situations, such as common character fonts and uniform backgrounds, whereas images taken with popular cameras and hand-held devices still pose challenges for character recognition. The dataset consists of both English letters and digits. We have considered characters written in different styles for the samples. There are 62 classes in total, with different writing styles for each character. Each image is 900 \(\times \) 1200 pixels, which is high-dimensional, and our proposed system is able to handle such high-dimensional images. 70% of the image dataset is used in the training phase and the remaining 30% in the testing phase. This demonstrates the ability of our proposed system to handle large datasets in the training phase and to classify in the testing phase.
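The 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `split_indices`, the random shuffle and the seed are assumptions, and the reading of the 62 classes as 10 digits plus 26 upper-case and 26 lower-case letters is inferred from the class count.

```python
import random

N_SAMPLES = 3410   # number of samples stated in the text
N_CLASSES = 62     # assumed to be 10 digits + 26 upper-case + 26 lower-case letters


def split_indices(n_samples, train_percent=70, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # the seed is an illustrative choice
    # Integer arithmetic avoids floating-point rounding in the split size.
    n_train = n_samples * train_percent // 100
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(N_SAMPLES)
samples_per_class = N_SAMPLES // N_CLASSES

print(len(train_idx), len(test_idx), samples_per_class)  # 2387 1023 55
```

With 3410 samples, this gives 2387 training and 1023 test images, and an average of 55 samples per class.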

The scanned images are preprocessed by binarization, filtering and edge detection. Binarization produces a black-and-white image (pixel values 0 or 1). A gradient filter is then used to smooth the image, which makes edge detection easier [3]. Edge detection finds the boundaries in the image, on the basis of which all unnecessary pixels are eliminated. The boundaries are detected using the Canny edge detector; alternatively, other operators such as Sobel, LoG or Prewitt can be used [5].
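As a rough illustration of binarization followed by edge detection, the sketch below thresholds a tiny grayscale image and marks edges with the Sobel operator, one of the alternatives named above. It is not the Canny detector used in the paper, and the threshold values, function names and toy image are assumptions chosen for readability.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1 pixels."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img, min_strength=1):
    """Mark interior pixels whose Sobel gradient magnitude is >= min_strength."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if (gx * gx + gy * gy) ** 0.5 >= min_strength:
                edges[y][x] = 1
    return edges


# Toy 5x5 image: dark on the left, bright on the right (one vertical boundary).
gray = [[10, 10, 200, 200, 200] for _ in range(5)]
binary = binarize(gray)
edges = sobel_edges(binary)

for row in edges:
    print(row)
```

Only the two pixel columns on either side of the dark-to-bright boundary are marked; border pixels are left at 0 because the 3 \(\times \) 3 kernel does not fit there.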

3.2 Independent Component Analysis Feature Extraction

In this paper, we use Independent Component Analysis (ICA), a generative model for observed multivariate data, which is typically applied to large collections of data to extract meaningful information. In the model, the observed variables are assumed to be linear or nonlinear mixtures of unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent; they are called the independent components of the observed data, or sources. We use ICA for factor analysis because it can be seen as an extension of principal component analysis. It is a powerful technique for computing independent components as features, and it can find the underlying factors where traditional methods fail. Here, the image dimensions are treated as a set of parallel signals, or time series. Blind source separation is used for this purpose.

Let the observed variables \(T_{1}\), ..., \(T_{n}\) be linear mixtures of n independent components \(s_{1}\), ..., \(s_{n}\), and let A be the mixing matrix with elements \(a_{pk}\), such that

$$\begin{aligned} T_{p}= a_{p1}s_{1} + a_{p2}s_{2} + a_{p3}s_{3} + a_{p4}s_{4}+ \cdots + a_{pn}s_{n} \quad \text {for all } p \end{aligned}$$
(1)

In vector-matrix notation, this is written as

$$\begin{aligned} T=A*s \end{aligned}$$
(2)

The components \(s_{i}\) are statistically independent. In this work, the non-Gaussianity of the components is used to estimate the matrix A. After A has been estimated, we compute its inverse, W, and obtain the independent feature vector using

$$\begin{aligned} s= W*T \end{aligned}$$
(3)

where s contains the extracted independent feature values, organized as a feature vector; this is closely related to blind source separation. The feature vector is then the input to the optimization stage for instance selection, which is performed using particle swarm optimization.
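To make Eqs. (2) and (3) concrete, the toy example below mixes two source values with a known 2 \(\times \) 2 matrix and recovers them with its inverse. This only illustrates the mixing and unmixing relationship: in the paper A is unknown and has to be estimated from the non-Gaussianity of the data, a step that is omitted here. The matrix, the source values and the helper names are all hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inverse_2x2(A):
    """Invert a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2.0, 1.0], [1.0, 1.0]]   # hypothetical mixing matrix (assumed known here)
s = [3.0, -1.0]                # hypothetical independent components

T = matvec(A, s)               # Eq. (2): observed mixtures
W = inverse_2x2(A)             # unmixing matrix W = A^{-1}
s_rec = matvec(W, T)           # Eq. (3): recovered components

print(T, W, s_rec)
```

Because W is the exact inverse of A in this example, `s_rec` reproduces `s`. With an estimated W, the components are generally recovered only up to scaling and ordering.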

3.3 Particle Swarm Optimization for Selection of Features

PSO is a global optimization algorithm: a general procedure that iteratively improves candidate solutions to a problem and returns the best among them. The algorithm was developed by observing the behavior of birds and fish. In PSO, the population is called a swarm and the individuals are called particles. Each particle keeps track of the best position it has found so far, termed the personal best (pbest), while the global best (gbest) is the best value, together with its location, obtained by any particle in the population. At each step, the velocity of each particle is changed according to the values of pbest and gbest, with the acceleration towards pbest and gbest weighted by separately generated random numbers. PSO optimizes a population of random candidate solutions (particles) in a D-dimensional space, and it involves neither mutation nor overlapping (crossover) operations. The choice of algorithm parameters can have a large effect on the optimization results. In this paper we optimize the feature set extracted by ICA, which is known as instance selection: the process of selecting a subset of relevant features for use in constructing the training model. A new feature vector is formed by operating on the feature vector extracted by ICA. Selecting the input parameters for PSO is one of the crucial tasks in instance selection, as it determines how well the optimization performs on difficult problems. The steps are as follows [9, 10].

  1. For every particle \(i = 1, ...,\) swarm do

    Here, swarm is the total number of rows and columns generated from the feature vector.

    Initialize the particle's location with a uniformly distributed random vector \(X_{i}\).

    Initialize the particle's best known location \(P_{i}\) to its initial location.

  2. If \(f (GP) < f (GB)\) then

    Update the swarm's best position, which is the selected instance or feature value.

    Initialize the particle's velocity \(V_{i}\).

  3. Repeat until all particles in the swarm have been processed.

  4. Evaluate the fitness function and generate the new solution \(T_{i}\) for the next iteration.

  5. Update the particle's velocity \(V_{i}\) and location \(X_{i}\).

  6. Generate the global best solution \(G_{b}\), which is the complete set of selected instances, i.e. the optimized feature vector.

    Here, \(G_{b}\) is the global best solution, which is refined until all iterations are completed.
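The steps above can be sketched as a basic PSO for feature selection. This is a generic textbook formulation rather than the authors' implementation: the inertia weight `w`, the acceleration constants `c1` and `c2`, the 0.5 threshold that turns a position into a 0/1 feature mask, and the toy fitness function are all assumptions made for illustration.

```python
import random


def pso_select(fitness, dim, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise fitness(mask) over 0/1 feature masks with a basic PSO.

    Positions live in [0, 1]^dim; feature d is selected when coordinate d > 0.5.
    """
    rng = random.Random(seed)

    def to_mask(x):
        return [1 if xi > 0.5 else 0 for xi in x]

    # Step 1: random initial locations X, zero velocities V, personal bests P.
    X = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]
    P_fit = [fitness(to_mask(x)) for x in X]

    # Step 2: the global best is the best of the personal bests.
    g = min(range(n_particles), key=P_fit.__getitem__)
    G, G_fit = P[g][:], P_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):            # Step 3: loop over the swarm
            for d in range(dim):                # Step 5: velocity and location update
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))
                X[i][d] = min(1.0, max(0.0, X[i][d] + V[i][d]))
            fit = fitness(to_mask(X[i]))        # Step 4: evaluate the fitness
            if fit < P_fit[i]:
                P[i], P_fit[i] = X[i][:], fit
                if fit < G_fit:                 # Step 6: keep the global best
                    G, G_fit = X[i][:], fit
    return to_mask(G), G_fit


# Hypothetical fitness: number of disagreements with a made-up "ideal" mask.
# In the paper's setting this would be a measure of classification error.
IDEAL = [1, 0, 1, 0, 0, 1]


def toy_fitness(mask):
    return sum(m != t for m, t in zip(mask, IDEAL))


mask, fit = pso_select(toy_fitness, dim=len(IDEAL))
print(mask, fit)
```

The returned mask marks which entries of the ICA feature vector would be kept. Thresholding continuous positions is only one of several ways to apply PSO to a selection problem.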

3.4 Back Propagation Neural Networks

The backpropagation neural network algorithm operates in two phases. The first phase presents the input pattern to the layers of the network. Training is then repeated over a number of iterations until a stopping criterion is met. The network contains a hidden layer, connected through synaptic weights that are adjusted until the connections stabilize.

This algorithm was proposed by Sharma et al. [17] to identify text in scanned images of handwritten and printed documents, and it is used for both training and testing. After training is complete, we move to the testing phase, in which a test sample containing an English character is uploaded, the system classifies it automatically, and performance is evaluated on that basis. The training set consists of the features in the feature vector, on which the neural network performs backpropagation training until the weight updates become small and the connections stabilize.
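A minimal sketch of backpropagation training with a sigmoid activation and a mean square error stopping criterion is given below. It is a generic one-hidden-layer network trained on a toy AND problem, not the network or data used in the paper; the layer size, learning rate, target error and seed are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(data, n_hidden=2, lr=0.5, max_epochs=1000,
                   target_mse=1e-3, seed=0):
    """Train a one-hidden-layer sigmoid network with online backpropagation.

    Returns the per-epoch MSE history and a predict function.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each weight row holds the input weights followed by a trailing bias.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in W1]
        y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + W2[-1])
        return h, y

    history = []
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, t in data:
            h, y = forward(x)
            err = y - t
            sq_err += err * err
            # Error terms for the output and hidden layers (sigmoid derivative).
            delta_out = err * y * (1.0 - y)
            delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                         for j in range(n_hidden)]
            # Gradient descent updates of the synaptic weights and biases.
            for j in range(n_hidden):
                W2[j] -= lr * delta_out * h[j]
            W2[-1] -= lr * delta_out
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * delta_hid[j] * x[i]
                W1[j][-1] -= lr * delta_hid[j]
        history.append(sq_err / len(data))
        if history[-1] <= target_mse:       # stop early once the error is low enough
            break
    return history, lambda x: forward(x)[1]


# Toy training set (logical AND), standing in for the optimized feature vectors.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
history, predict = train_backprop(data)
print(len(history), history[0], history[-1])
```

The `history` list plays the role of the mean square error versus epochs curve used to judge training, and the loop stops before `max_epochs` if the target error is reached.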

4 Results and Discussions

This section presents the simulation of the proposed approach, which was carried out in the MATLAB environment.

Fig. 3. Main GUI panel

Figure 3 shows the graphical user interface panel, built with the MATLAB toolbox, which provides a user-friendly environment for human-machine interaction. The user clicks the buttons to perform actions and trigger the corresponding events.

4.1 Training and Preprocessing

Part (a) of Fig. 4 shows a training sample image of the English letter D, which is to be filtered and normalized before edge detection. Part (b) shows the result of edge detection, which finds the edges in the image, on the basis of which all unnecessary pixels are eliminated. The Canny edge detection technique is used.

Fig. 4. (a) Training sample and (b) edge detection

4.2 Feature Extraction

In the feature extraction step, features are extracted from the content of the uploaded sample image using ICA (Independent Component Analysis). ICA is an efficient technique that uses blind source separation to find independent components without disturbing neighboring pixels, yielding characteristics that do not affect the other intensities in the image. Figure 5 shows the features extracted for sample 9, given as independent values per bit.

Fig. 5. Features extracted

4.3 Feature Vector Selection

Particle swarm optimization is used to optimize the feature vector, retaining the relevant instances used to classify the sample in the testing phase. The optimized vector is also the input fed directly to the neural network to train the whole system and build the layered network model, with the sigmoid function as the activation function. Figure 6 shows the optimized feature values.

Fig. 6. Optimized feature vector

4.4 Neural Network Training

Figure 7 shows the training of the system using the backpropagation neural network over a number of iterations. The proposed system takes only 20 iterations, out of a maximum of 1000, to train the whole system, which indicates robustness and a fast response time. Training uses a gradient descent optimization model, which decreases the loss function to achieve a low mean square error with respect to the number of trained samples.

Fig. 7. Neural network training

Fig. 8. Classified outputs

4.5 Performance Evaluation

Figure 8 shows the classification output of the neural network: the uploaded sample corresponds to image number 10 in the training dataset, which contains the letter D, and it is classified automatically by the system. This shows that our proposed approach is able to perform automatic classification for optical character recognition of English characters [4].

Figure 9 shows the performance evaluation of the proposed system in terms of sensitivity, specificity and recognition rate. The classification results indicate that our proposed approach achieves a high recognition rate of 0.98216, high sensitivity, which signifies a high true positive rate, and high specificity, which signifies a high true negative rate. Recognition rate, sensitivity and specificity must all be high for the error rate and the number of classification errors to be low.

Fig. 9. Performance evaluation
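For reference, the three measures can be computed from confusion counts as in the sketch below. The counts are hypothetical and are not the paper's results, and treating the recognition rate as overall accuracy is an assumption, since the text does not define it explicitly.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, recognition rate) from confusion counts."""
    sensitivity = tp / (tp + fn)                        # true positive rate
    specificity = tn / (tn + fp)                        # true negative rate
    recognition_rate = (tp + tn) / (tp + fn + tn + fp)  # assumed to be accuracy
    return sensitivity, specificity, recognition_rate


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, rate = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(sens, spec, rate)  # 0.9 0.8 0.85
```

For a 62-class problem such counts would typically be computed per class (one class against the rest) and then averaged.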

In terms of recognition rate, Table 1 shows that 98.21% of characters are matched correctly by this algorithm.

Table 1. Comparison evaluations

5 Conclusion

OCR is an efficient and emerging technology in real-world scenarios. Researchers have proposed a variety of approaches to optical character recognition for automatic classification and correlation. This paper presents an effective approach based on automatic classification and optimization, combined with normalization and feature extraction. It achieves a high recognition rate of 0.98, a high sensitivity of 0.995 and a high specificity of 0.96, and therefore a low error rate. From the results and discussion, it can be seen that the backpropagation neural network responds quickly and classifies accurately based on the training data, and that the trained network can be simulated in the testing phase to perform automatic character recognition.