1 Introduction

Handwritten math formula recognition is attracting growing interest because of its practical applications for consumers and academics in areas such as education and office automation. It offers an easy and direct way to enter math formulas into computers and therefore improves the productivity of scientific writers. However, it remains a challenging field because of the variety of writing styles and formula structures. Hundreds of alphanumeric and math symbols need to be recognized, together with the two-dimensional structures that relate pairs of symbols, for example superscript and subscript. Both aspects increase the difficulty of the recognition problem.

Many on-line techniques have been studied for handwritten math formula recognition. However, when the recognition is carried out from a document image, off-line techniques must be considered. In the last decade, most off-line research has focused on typeset formulas, and little has been published on handwritten ones. The main difference between on-line and off-line recognition of math formulas is the temporal information, which is available in the former and lost in the latter. Math formula recognition can be divided into two main steps: symbol recognition and structure analysis. Note that an accurate math formula recognition system greatly depends on an efficient symbol recognizer. Recently, deep learning has set the state of the art for math symbol classification, especially with models that stack multiple layers of nonlinear information processing and learn features automatically without using prior knowledge. The results of recent studies, summarized in Table 1, indicate that deep neural networks improve math symbol recognition [5] compared to previous methods such as Modified Quadratic Discriminant Functions (MQDFs) [4], Bidirectional Long Short-term Memory (BLSTM), and Hidden Markov Models (HMM) [9].

The focus of this work is on handwritten math symbol recognition. It is an application of pattern classification: the task of assigning a symbol class to each symbol candidate. It is also a difficult task because the number of classes is large: more than one hundred different symbols, including digits, letters, operators, Greek letters, and some special math symbols (see Fig. 1).

Fig. 1. Samples of math symbols.

Some symbol classes overlap (inter-class variability): symbols belonging to different classes may look almost the same across different handwritten samples. There is also a high intra-class variability because each writer has their own writing style (see Fig. 2). In addition, off-line recognition brings its own challenges. Many results show that on-line recognition reaches higher accuracy than off-line recognition [9], because off-line data lacks the pen trajectory, i.e. the coordinates tracked from the start to the end of the symbol, which helps to recognize it properly. It is therefore important to design robust and efficient classifiers and to use a representative training data set. Nowadays, most of the proposed solutions use machine learning algorithms such as artificial neural networks or support vector machines.

Fig. 2. Inter-class (a) and intra-class (b) variability.

This paper is organized as follows. In Sect. 2, we present a brief review of the field of math symbol recognition. In Sect. 3, we detail the architectures of the proposed deep learning models and the data augmentation and transfer learning mechanisms that we used. In Sect. 4, we discuss the obtained results and compare the proposed models with some related works. In Sect. 5, we give some conclusions and prospects.

2 Related Works

In this section, we briefly discuss recent advances in the area of math symbol recognition, as summarized in Table 1. The focus is on deep learning-based models.

Table 1. Comparison between some related works.

In [1], the authors define their own CNN with a simple architecture composed of two convolutional layers, two pooling layers, a fully connected layer, and a softmax layer to classify off-line handwritten math symbols. To improve their results, they tuned the CNN by changing the number of feature maps in the convolutional layers, the number of nodes in the fully connected layer, and the size of the input image. They used CROHME 2014 for training and evaluation and report 93.27% training accuracy and 87.72% test accuracy, but they do not show any accuracy or loss curve, which would add credibility to their work.

In [2], the authors proposed a CNN called HMS-VGGNet for off-line recognition of handwritten math symbols. It is inspired by VGGNet, with smaller image sizes and additional batch normalization layers. The authors also replaced the fully connected layers with global average pooling layers. To compensate for the lack of off-line data, they used elastic distortion to enrich the training set. Their CNN uses only off-line features of the symbols and achieved an accuracy of 92.42% on the CROHME 2016 test set.

In [3], the authors described an approach for off-line recognition of handwritten math symbols. They used Simple Linear Iterative Clustering for symbol segmentation and three methods for symbol classification: k-Nearest Neighbors (k-NN), LeNet, and SqueezeNet. The best accuracy obtained with k-NN is 84% with 66 classes of symbols. With a modified LeNet, they achieved an accuracy of 90% with 87 classes. Finally, they reached 90% with a pre-trained SqueezeNet for 101 classes. The authors mention that they used the 6000 MNIST images from the CROHME dataset and 2000 images from the set of Handwritten Digit Images published by the Computer Vision Group of the University of Sao Paulo, but they give no details about the number of instances used for training, validation, and testing, or about the cited accuracies.

Recently, researchers have combined off-line features extracted from symbol images with online features to recognize online math symbols, with strong results. MyScript [6], the winner of CROHME 2016, extracted both online and off-line features and processed them with a combination of a Deep Multilayer Perceptron and Recurrent Neural Networks. MyScript achieved 92.81% on the CROHME 2016 test set.

From our study of the state of the art, we noted that combining online and off-line features improves symbol recognition performance, and that the off-line recognition of math symbols deserves more attention if we aim to reach the best performance. We also noted that many classification techniques have been used, but very few works compare different classification techniques on the same database and under the same experimental conditions.

3 Proposed System

We explored different CNN architectures and trained and tested them on the CROHME 2019 dataset. The objective is to find the most appropriate model for off-line math symbol recognition. To that end, we followed the steps described below.

3.1 Data Generation and Tuning

In CROHME 2019, there are 101 different classes of math symbols. The online data is given in Ink Markup Language (InkML), where each symbol is represented by an InkML file. This file contains the set of symbol traces; a trace consists of a sequence of points sampled over time, and each point records its position. When generating symbol images from the online data, we connected the points of the same trace with a single continuous line. The generation of symbol images from InkML files is performed by a tool provided with the CROHME 2019 dataset. We then made some changes so that the images are automatically distributed into folders named after the classes. To train the proposed CNN models, we generated 30993 symbol images of size \(38\times 38\). To build the training and validation sets, we automatically split the image dataset into 24758 images for training and 6235 for validation. For the test, we generated 15483 images from the CROHME test dataset and created their ground truth. Improving the performance of a deep learning model depends on tuning either the model or the data. Since training deep learning models takes several hours, we chose to normalize the generated symbol images by binarizing and inverting them.
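The image generation and normalization steps described above can be illustrated with a small, dependency-free sketch. It is not the CROHME tool itself: the function names, the margin, the threshold, and the choice of Bresenham's algorithm for connecting points are assumptions made for illustration. Only the \(38\times 38\) image size, the connection of points within a trace, and the binarize-and-invert normalization come from the text.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trace = List[Point]

SIZE = 38        # image side length reported in the text
MARGIN = 2       # hypothetical padding, not taken from the paper
THRESHOLD = 128  # hypothetical binarization threshold


def to_pixels(traces: List[Trace], size: int = SIZE, margin: int = MARGIN):
    """Scale trace coordinates into the pixel grid, preserving aspect ratio."""
    xs = [x for trace in traces for x, _ in trace]
    ys = [y for trace in traces for _, y in trace]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 1 - 2 * margin) / span
    return [[(margin + round((x - min_x) * scale),
              margin + round((y - min_y) * scale)) for x, y in trace]
            for trace in traces]


def draw_line(image, start, end, value=0):
    """Draw a straight segment with Bresenham's line algorithm."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    error = dx + dy
    while True:
        image[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        doubled = 2 * error
        if doubled >= dy:
            error += dy
            x0 += step_x
        if doubled <= dx:
            error += dx
            y0 += step_y


def render(traces: List[Trace], size: int = SIZE):
    """Render dark ink (0) on a white background (255)."""
    image = [[255] * size for _ in range(size)]
    for trace in to_pixels(traces, size):
        if len(trace) == 1:          # a dot: single sampled point
            x, y = trace[0]
            image[y][x] = 0
        for start, end in zip(trace, trace[1:]):
            draw_line(image, start, end)
    return image


def binarize_and_invert(image, threshold: int = THRESHOLD):
    """Binarize and invert: ink becomes 1, background becomes 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


# Toy symbol: one diagonal stroke made of two sampled points.
stroke = [[(0.0, 0.0), (10.0, 10.0)]]
binary = binarize_and_invert(render(stroke))
```

After inversion the strokes carry the non-zero values, so most of each input image is zero.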

3.2 Model Tuning

To classify math symbols using a deep learning model, we have to choose one of three alternatives: 1) define a new model and train it on our data, 2) use a predefined model, tune it, and train it on our data, or 3) reuse a model pre-trained on other data and train it on our data. Based on first tests and limited by the available data and computational resources, we chose the third alternative. One of the most common problems that we encountered while training these deep networks is overfitting. Recall that overfitting happens when a model learns the detail and noise in the training data to the extent that its performance on new data degrades. To limit this problem, which is mainly due to overly complex models with too many parameters, we added a global average pooling layer, in which all the values of one feature map are averaged into a single output. As deep networks need to be trained on large-scale datasets and obtaining a large amount of labeled data is labor-intensive in real applications, we first augmented the database. Relying on transfer learning, we then tested and compared several pre-trained CNNs. These steps are described in more detail in the next subsections.
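Global average pooling reduces each feature map to a single number, so the layer itself has no trainable weights. The following minimal sketch shows the operation on plain Python lists; the function name and the toy feature maps are illustrative assumptions, not part of the described system.

```python
def global_average_pooling(feature_maps):
    """Average every H x W feature map into one value.

    feature_maps: list of C maps, each a list of rows of numbers.
    Returns a list of C averages.
    """
    pooled = []
    for feature_map in feature_maps:
        values = [value for row in feature_map for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two toy 2 x 2 feature maps.
toy_maps = [
    [[1, 2], [3, 4]],
    [[0, 0], [0, 8]],
]
pooled = global_average_pooling(toy_maps)
```

Compared with flattening the feature maps before a dense layer, this leaves far fewer inputs for the classifier, which is why it is commonly used to limit overfitting.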

Data Augmentation. Having a large dataset is crucial for the performance of a deep learning model. When more data cannot be collected, we can still improve the performance of the model by augmenting the data we already have, and deep learning frameworks usually provide built-in data augmentation utilities. However, augmenting a dataset of handwritten math symbols must not change the meaning of a symbol: for example, a flipped symbol can turn into a different symbol. Therefore, some augmentation techniques cannot be applied to all symbols. In this work, we applied rotation by a random angle in the range [−15, 15] and horizontal and vertical shifts, which are almost always safe for handwritten math symbol recognition. Figure 3 shows several symbol image samples generated by these augmentation techniques.
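As an illustration of these two augmentations, the sketch below rotates and shifts a binary image using inverse mapping with nearest-neighbor sampling. It is a stand-in for a framework's built-in utilities, not the code used in this work. The angle range is assumed to be in degrees, and the maximum shift (10% of the image size) is a hypothetical value, since the text does not state it.

```python
import math
import random


def augment(image, rng, max_angle_deg=15.0, max_shift=0.1):
    """Rotate about the image center and shift, filling gaps with 0.

    max_angle_deg: assumed to be degrees; max_shift: hypothetical
    fraction of the image width/height.
    """
    height, width = len(image), len(image[0])
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    shift_x = rng.uniform(-max_shift, max_shift) * width
    shift_y = rng.uniform(-max_shift, max_shift) * height
    center_x, center_y = (width - 1) / 2, (height - 1) / 2
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: undo the shift, then undo the rotation.
            rel_x = x - shift_x - center_x
            rel_y = y - shift_y - center_y
            src_x = round(cos_a * rel_x + sin_a * rel_y + center_x)
            src_y = round(-sin_a * rel_x + cos_a * rel_y + center_y)
            if 0 <= src_x < width and 0 <= src_y < height:
                output[y][x] = image[src_y][src_x]
    return output


# Toy 38 x 38 symbol: a filled square in the middle of the image.
symbol = [[1 if 12 <= x < 26 and 12 <= y < 26 else 0 for x in range(38)]
          for y in range(38)]
augmented = augment(symbol, random.Random(0))
```

Flips are deliberately left out, in line with the remark that some transformations can change the meaning of a symbol.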

Fig. 3. Samples of symbol images resulting from data augmentation.

Used Models. Deep learning models need high computational resources and a huge dataset to obtain good results. One way to use deep learning-based models for symbol classification with limited resources is to reuse pre-trained models and test them with different parameters to improve accuracy. In our work, we tried four pre-trained CNN models: VGGNet, SqueezeNet, Xception Network, and DenseNet. To these deep networks, we added two layers: 1) an average pooling layer to limit over-fitting by averaging each feature map, and 2) a dense layer with regularization for math symbol class prediction. We ran our tests from the shallowest to the deepest network:

  • SqueezeNet: a CNN that is 18 layers deep, characterized by a compressed architecture based on fire modules. A fire module combines a squeeze layer (\(1\times 1\) convolution filters) with an expand layer (a mix of \(1\times 1\) and \(3\times 3\) convolution filters).

  • VGGNet19 [10]: a CNN that is 19 layers deep, composed of 16 convolutional layers and 3 fully connected layers.

  • Xception [11]: a CNN that is 71 layers deep, based entirely on depthwise separable convolution layers.

  • DenseNet121 [12]: a CNN that is 121 layers deep. Recent work has shown that CNNs can be deeper, more accurate, and more efficient to train if they contain shorter connections between layers, and this is what characterizes DenseNet. Each layer in DenseNet is connected to every other layer in a feed-forward fashion. Whereas a traditional CNN with L layers has L connections, one between each layer and its subsequent layer, DenseNet has \(L(L+1)/2\) direct connections. DenseNets have several advantages: they strengthen feature propagation, encourage feature reuse, and reduce the number of parameters. The efficiency of this model is confirmed by our tests on the recognition of math symbols.
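The connection counts quoted for DenseNet can be checked with a short calculation. In the sketch below, the function names are illustrative. Note that in DenseNet the dense connectivity applies within each dense block, so the count is best read per block rather than over all 121 layers.

```python
def plain_connections(num_layers: int) -> int:
    """A traditional CNN: one connection into each of the L layers."""
    return num_layers


def dense_connections(num_layers: int) -> int:
    """Dense connectivity: layer k receives the input and all k - 1
    earlier outputs, so the total is 1 + 2 + ... + L = L(L + 1) / 2."""
    return num_layers * (num_layers + 1) // 2


# A block of 5 layers has 5 plain connections but 15 dense ones.
block_size = 5
plain = plain_connections(block_size)
dense = dense_connections(block_size)
```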

Table 2 and Fig. 4 show the architectures of the different models.

Fig. 4. Architecture of the Xception network.

Table 2. Architecture of SqueezeNet, VGGNet and DenseNet

Transfer Learning. In computer vision, transfer learning is usually applied through pre-trained models. A pre-trained model is a model that was trained on a large dataset to solve a problem similar to the one we want to solve. It allows us to build accurate models in a time-saving way [13]. To apply transfer learning well and reuse a pre-trained model, we first have to characterize the problem at hand, considering the size of our dataset and its similarity to the dataset used to train the pre-trained model. Figure 5 shows the size-similarity matrix that guides the choice of the model and the way to fine-tune it.

Fig. 5. Size-similarity matrix.

Fine-Tuning. Having situated our problem in the size-similarity matrix, we can choose the adequate fine-tuning alternative. Figure 6 represents a CNN model as a succession of two blocks: a convolutional base for feature extraction at the top and a classifier at the bottom. Following the size-similarity matrix, four fine-tuning decisions can be taken.
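One fine-tuning option, the one adopted in Sect. 4.2, freezes the first layers of the convolutional base and trains the remaining layers together with the classifier. The sketch below shows the idea without any framework: the `Layer` class, the layer names, and the 16-layer base are hypothetical stand-ins. In Keras, the same effect is obtained by setting the `trainable` attribute of the corresponding layers to `False`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a framework layer object."""
    name: str
    trainable: bool = True


def freeze_first_layers(layers: List[Layer], num_frozen: int = 10) -> List[Layer]:
    """Freeze the first `num_frozen` layers and leave the rest trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= num_frozen
    return layers


# Hypothetical convolutional base followed by the two added layers.
convolutional_base = [Layer(f"conv_{i}") for i in range(1, 17)]
classifier = [Layer("global_average_pooling"), Layer("dense_softmax")]

model_layers = freeze_first_layers(convolutional_base + classifier, num_frozen=10)
trainable_names = [layer.name for layer in model_layers if layer.trainable]
```

The frozen layers keep the generic features learned on the source dataset, while the later layers adapt to the symbol images.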

4 Experimental Results

4.1 CROHME Dataset

Since datasets of off-line handwritten mathematical symbols are rare, we used the online data of CROHME 2019 to generate symbol images for off-line symbol recognition. The number of symbol classes in the CROHME dataset is 102, including a junk class for erroneous symbols. To evaluate the proposed CNN models, we generated 30993 symbol images. To build the training and validation sets, we automatically split the image dataset into 24758 images for training and 6235 for validation. For the tests, we generated 6820 images from the CROHME 2019 test dataset and created their ground truth.

4.2 Experimental Setup

Our experiments were performed on an Intel(R) Core(TM) CPU running at 2.5 GHz with 8 GB of memory. We built our system on pre-trained deep learning models from the TensorFlow library, trained on the ImageNet dataset. Although our generated images are different from the natural images of the ImageNet dataset, we found that starting from the pre-trained models allows for much faster convergence than training from scratch, especially with a small dataset. Regarding the size-similarity matrix presented in Fig. 5, we found that our classification problem satisfies the third condition (a small dataset that is different from the pre-trained model's dataset). That is why we fine-tuned our model by freezing the first ten layers of the convolutional block, which are responsible for extracting generic features, and training the rest on our data. The initial learning rate was 0.001 and the batch size was set to 32. We set the maximum number of epochs to 200 and used early stopping callbacks.
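The interaction between the 200-epoch budget and early stopping can be sketched as follows. The monitored quantity (validation loss) and the patience value are assumptions, since the text does not specify them, and the function replays a recorded list of losses rather than training a model.

```python
def early_stopping_epoch(val_losses, max_epochs=200, patience=10):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    `patience` is a hypothetical value, not taken from the paper.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return last_epoch, best_epoch


# Toy validation-loss curve that stops improving after the third epoch.
toy_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(toy_losses, patience=3)
```

In practice the weights from `best_epoch` are the ones worth keeping, which most frameworks support through an option on their early stopping callback.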

Fig. 6. Decision map for fine-tuning pre-trained models.

4.3 Results and Discussion

We trained different CNN models with various parameters on our dataset. Figure 7 shows the accuracy and loss curves of the different CNN models. Our best obtained experimental results are shown in Table 3.

Fig. 7. Accuracy and loss curves.

We can see that DenseNet121 outperforms the other models that we tested. This can be explained as follows: 1) this network is considerably deeper than the others (121 layers), and 2) adding more instances to the dataset enhances the capabilities of the model (using a dataset of 24758 training images and 6235 validation images improves the accuracy from 91.73% to 91.88% for training and from 84.07% to 88.82% for validation). Evaluating our model on the test dataset, we obtained an accuracy of 83.68%. Table 4 shows the performance measures (precision, recall, and F1-score) for the different classes and for the overall system.
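For reference, the per-class measures named above follow the standard definitions: precision is TP / (TP + FP), recall is TP / (TP + FN), and the F1-score is their harmonic mean. The sketch below computes them from lists of true and predicted labels; the function name and the toy labels are illustrative and do not reproduce the values of Table 4.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, prediction in zip(y_true, y_pred):
        if truth == prediction:
            true_pos[truth] += 1
        else:
            false_pos[prediction] += 1   # predicted class gains a false positive
            false_neg[truth] += 1        # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp, fp, fn = true_pos[label], false_pos[label], false_neg[label]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy example: one "5" is misread as "S".
toy_true = ["5", "5", "S", "S", "x", "x"]
toy_pred = ["5", "S", "S", "S", "x", "x"]
toy_scores = per_class_scores(toy_true, toy_pred)
```

In this toy case the class "5" keeps a precision of 1.0 but its recall drops to 0.5, while "S" keeps a recall of 1.0 and loses precision.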

Table 3. Accuracy and loss results.
Table 4. Model performance evaluation.

Comparing our work to others, we note that the obtained results are promising but still below those of some online symbol recognition systems or of systems that use both online and offline features. This is because online data carries the tracing information while off-line data does not. Online data is an advantage when classifying symbols that have similar shapes but are written differently, such as 5 and s. Our networks use only offline features, so it is hard for them to classify those symbols correctly.

Although the symbol recognizer achieves good accuracy, it still makes mistakes on some symbol classes. Analyzing the confusion matrix, we found that misrecognitions are mainly due to the close resemblance of certain distinct symbols, such as the capital letter X and the math symbol \(\times \), the division symbol / and the comma, the letter O and the Greek letter \(\varTheta \), the capital letter S and the digit 5, the digit 9 and the letter g, etc.
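A simple way to surface such confusions is to count the off-diagonal entries of the confusion matrix and rank them. The sketch below does this from lists of true and predicted labels; the function name and the toy labels are illustrative and are not the experimental results.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top_k=5):
    """Rank (true label, predicted label) pairs by number of errors."""
    confusions = Counter(
        (truth, prediction)
        for truth, prediction in zip(y_true, y_pred)
        if truth != prediction
    )
    return confusions.most_common(top_k)


# Toy labels: "X" is misread as "times" twice, "5" as "S" once.
toy_true = ["X", "X", "X", "5", "5", "9"]
toy_pred = ["times", "times", "X", "S", "5", "9"]
top_pairs = most_confused_pairs(toy_true, toy_pred, top_k=2)
```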

Examining the confusions, we noted that confused symbols have roughly similar shapes, which makes them difficult to distinguish even for a human. We consider some of the misrecognition cases too difficult for any classifier to resolve without considering the symbol context. That is why we leave the resolution of some of these confusion cases to future work dealing with the recognition of entire math formulas.

5 Conclusion and Future Work

In this paper, we addressed the problem of offline recognition of handwritten mathematical symbols. We used a deep learning recognition method based on the DenseNet model, to which we made some modifications. Our symbol recognition system has shown its efficiency on a reasonable number of handwritten symbols from the CROHME 2019 dataset, with an accuracy of 83.71%. In future work, we plan to improve the performance of the model by further augmenting the available data. We will also address junk symbols by investigating why they are considered junk and how to treat them based on a cause analysis.