1 Introduction

Amazigh languages are a family of languages within the Afro-Asiatic language phylum [1]. They are spoken in scattered areas across a large region of northern Africa stretching from Egypt’s Siwa Oasis to Mauritania [2].

The writing system used for the Amazigh language is called Tifinagh [3]. It is derived from the “Libyco–Berber” inscriptions used since the sixth century BCE by the populations of North Africa, the Sahel, and the Canary Islands [3, 4] (Fig. 1).

Fig. 1 Medallion of Carthage [5]

Ancient Tifinagh scripts are engraved on stones and tombs at historic sites in Algeria, Morocco, Tunisia, the Tuareg areas and the Canary Islands [5]. Figure 2 shows an image of an ancient Tifinagh script found at the Dougga site in Tunisia.

Fig. 2 An ancient Tifinagh script found in the Mausoleum of Atban (Dougga site, Tunisia) [5]

Tifinagh has evolved from its origins to its present form, from Libyque to Neo-Tifinagh, passing through Saharan and Tuareg Tifinagh [2, 4]. Libyque is the oldest variant, used on the Mediterranean coasts from Kabylie, Constantine and Aurès (Algeria) to Morocco and Tunisia, as well as in the Canary Islands (Spain) [3]. Saharan Tifinagh, also called Libyco–Berber or old Tuareg, was used to transcribe the old Tuareg inscriptions. Neo-Tifinagh, based on Tuareg Tifinagh, is designed for writing the Amazigh dialects of the Maghreb (Morocco and Algeria) [3, 4].

Further information about ancient and modern Tifinagh can be found in the book by Ameur et al. [4], which also provides a history of the alphabet, its origin, its different variants and their decipherment.

Morocco has the largest concentration of Amazigh speakers [1]; there, the Amazigh language has been official since October 17, 2001 [2], with the creation of the Royal Institute of Amazigh Culture (Institut Royal de la Culture Amazighe, IRCAM) [6].

The Amazigh language in Morocco has three varieties, each spoken in different regions: Tarifite in the north, Tamazight in the Atlas Mountains and Tachelhit in the southern regions [2]. IRCAM has adopted the so-called Tifinagh-IRCAM alphabet as the official alphabet for the Amazigh language [2]. This alphabet was subsequently recognized by the International Organization for Standardization (ISO) [7, 8]. Tifinagh-IRCAM contains 33 characters, shown with their corresponding pronunciation in Latin characters in Fig. 3.

Fig. 3 Tifinagh characters adopted by the IRCAM with their pronunciation in Latin characters

The great variability inherent in handwriting has made this area of research very active. With recent advances in computing technologies, several automatic handwriting recognition techniques have been improved and refined, particularly for the Latin and Arabic scripts [9,10,11,12,13,14]. However, despite existing works, building an automatic recognition system for handwritten Tifinagh characters remains an open research challenge and is still in its early stages.

In this paper, we present a deep convolutional neural network (CNN) for recognizing Amazigh handwritten Tifinagh characters. The results obtained by our proposed system are discussed and compared with other approaches from the literature.

The rest of this paper is structured as follows. Section 2 reviews related works. Section 3 gives an overview of convolutional neural networks. The proposed deep CNN architecture is explained in Sect. 4. Experimental results are detailed in Sect. 5. Finally, Sect. 6 draws conclusions from the experimental results.

2 Related Works

Recognition of Amazigh handwritten characters has become an active research field because of its potential and various applications. Rachidi et al. [15] presented a state of the art and a comparison of scientific works published on automatic recognition of Amazigh characters. In [16], Aharrane et al. gave a comparative study of different supervised algorithms for handwritten Amazigh character recognition. Their goal was to compare the performance of some popular classifiers (Bayesian networks [17], decision trees [18] and multilayer perceptrons [19]) using a set of proposed statistical features. The same authors proposed in [20] a handwritten Amazigh character recognition system based on a statistical approach, with a feature set (densities and shadow features) of 79 elements representing each Amazigh character. In the recognition phase, they used a multilayer perceptron (MLP) as a classifier. The accuracy obtained, using only 24,180 characters from the Amazigh Handwritten Character Database (AMHCD) [21], is 96.47%. A slight improvement of this system was later published in [22], combining three classifiers with majority voting strategies.

Amrouch et al. [23, 24] developed an automatic Amazigh character recognition system based on Hidden Markov Models (HMMs) [25]. They start with a preprocessing step on the character image, then use the Hough transform to represent each character as a string of numbers. The resulting string is fed to a one-dimensional HMM (1D-HMM), trained with the Baum–Welch algorithm. Finally, they use the Viterbi algorithm to recognize the character. To evaluate the performance of the proposed system, the authors use only 24,180 of the 25,740 characters constituting the entire AMHCD database (as in [22]). The best score obtained is 97.89%, with a model of 14 states and 1 or 2 Gaussians.

Es-saady et al. [26, 27] proposed an approach based on determining the character’s horizontal central line. Based on its position, a set of statistical character features is computed using a sliding-window technique. The classifier is an MLP with one input layer of 90 neurons, one hidden layer and one output layer of 31 neurons. An improvement of this approach was published in [7], using the character’s horizontal baselines instead of the central line. They evaluated the approach using only 24,180 characters from the AMHCD database, as in all the works mentioned above, and obtained a best recognition rate of 94.62%.

In [28], the authors proposed and compared the performance of two networks for recognizing Tifinagh characters: a convolutional neural network (CNN) and a Deep Belief Network (DBN) [29]. The proposed CNN is composed of 7 convolutional layers with ReLU activation functions. The DBN has three or four hidden layers, with the layer sizes varied from 500-500-2000-31 to 1000-1000-1000-2000-31. Using the AMHCD handwritten character database, the authors show that the CNN outperforms the DBN, with an accuracy of 98.25%.

All the approaches mentioned above have three major weaknesses: first, they need preprocessing techniques that require significant computation time; second, they can confuse some characters, such as ‘Yas’ and ‘Yar’, ‘Yaz’ and ‘Yazz’, ‘Yadd’ and ‘Yatt’, and ‘Yay’ and ‘Yag’; third, they use at most 31 of the 33 Amazigh characters. To overcome these limits, we propose in this paper a robust and fully automatic handwritten recognition system. This system extracts Tifinagh character features directly, without any additional preprocessing step, and it can recognize all characters in the AMHCD database.

3 Convolutional Neural Networks

A convolutional neural network (CNN) is a concatenation of an input layer, multiple hidden layers and an output layer. Compared with fully connected neural networks, the number of parameters of these models is greatly reduced by sharing weights and biases [30]. LeNet [31], developed in 1998 for handwritten digit recognition, was the first CNN. At that time, CNNs were limited by high computation and memory costs. But after the emergence of GPUs and the use of the ReLU activation function instead of Sigmoid and Tanh, CNNs have demonstrated excellent performance on image recognition and classification tasks, especially for handwritten digit and character recognition. Several works have been reported in the literature, such as handwritten Chinese recognition [32], handwritten character classification [33] and handwritten digit classification [34].

A typical convolutional architecture using an RGB image as input is shown in Fig. 4. First, a convolution operation is applied to the input image using k filters (masks) (\(k=1\) in the figure) for each of the R, G and B channels. These filters act as feature detectors on the original input image. A non-linearity function \(\psi\) is then applied to the result of the convolution to obtain the so-called activation map (also called a feature map). Each convolutional layer is followed by a pooling layer to reduce the size of the activation map and to provide invariance to small local translations. Finally, a fully connected layer with a softmax activation function is used in the output layer to perform the classification.
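To make these operations concrete, the following toy sketch (an illustrative assumption, not code from the paper) applies a single 3×3 convolution, the ReLU non-linearity playing the role of \(\psi\), and 2×2 max-pooling to one channel using plain numpy:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling with a size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(8, 8)                      # toy single-channel input
kernel = np.array([[1, 0, -1]] * 3, dtype=float)  # hypothetical edge-detector mask
fmap = np.maximum(conv2d(image, kernel), 0.0)     # convolution followed by ReLU
pooled = max_pool(fmap)                           # 6x6 activation map -> 3x3
```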

Many advanced convolutional neural networks have been developed in recent years, such as AlexNet [35], Inception modules [36], VGG networks [37] and region-based convolutional neural networks [38].

Recently, many researchers have employed convolutional neural networks to design handwriting recognition systems for several languages, such as Chinese [39], Arabic [40], Bangla [41], Devanagari [42], Indic [43] and Tamil [44]. The results obtained are good and encourage the use of CNNs in handwriting recognition systems for other languages.

Fig. 4 Typical CNN architecture with a convolutional layer, a max-pooling layer, a flattening layer and a fully connected layer with a fixed number of neurons

4 The Proposed System

Fig. 5 The proposed system architecture

The CNN architecture used is shown in Fig. 5. It is composed of five layers. All training images are labeled and resized to \(32\times 32\) pixels. The first three layers are responsible for feature extraction, and the last two perform classification. The first layer (L1) is a convolutional layer with 32 activation maps of \(32\times 32\) pixels each. Each neuron is associated with a convolution by a \(3\times 3\) mask plus a trainable bias. In this layer, different activation maps correspond to different trainable masks and biases. Each map has 9 trainable weights plus a trainable bias, which leads to 320 \((32\times 10)\) trainable parameters for this layer. The activation function used is the rectified linear unit (ReLU), given in Eq. (1).

$$f(x) = \begin{cases} 0 & \quad \text{for } x < 0 \\ x & \quad \text{for } x \ge 0 \end{cases}$$
(1)

Layer (L2) is also a convolutional layer containing 32 activation maps. Its implementation differs from (L1) by the addition of max-pooling and dropout layers. The pooling size is 2 by 2 in the x and y directions, and the dropout layer with probability 0.5 is added for regularization. In total, this layer has 9248 trainable parameters.

Next, we add a layer (L3) composed, like (L2), of three sub-layers: a convolutional layer, a max-pooling layer and a dropout layer, with 64 output channels. The size of each feature map is \(6\times 6\) pixels. In total, the (L3) layer has 18,496 trainable parameters. This stacking of feature extractors lets the network capture higher-level features.

The last two layers perform the classification. The first is composed of 64 ReLU neurons, each fully connected to the flattened feature maps of (L3). The second contains 33 neurons indicating the class of the input image and uses a softmax activation function. We train the system using the RMSProp optimizer with an adaptive learning rate. The whole architecture has 177,729 trainable parameters. Table 1 summarizes all the parameters of the proposed architecture.
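A minimal Keras sketch of this architecture is given below. It is our reconstruction under stated assumptions (‘same’ padding in L1, ‘valid’ padding elsewhere, single-channel input, integer labels); with these choices, model.summary() reproduces the parameter counts reported above.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),
    # L1: 32 maps of 32x32 -> (3*3*1 + 1) * 32 = 320 parameters
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    # L2: convolution + max-pooling + dropout -> (3*3*32 + 1) * 32 = 9248
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    # L3: 64 output channels, 6x6 feature maps -> (3*3*32 + 1) * 64 = 18,496
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    # Classification: flatten (6*6*64 = 2304), 64 ReLU neurons, 33-way softmax
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(33, activation='softmax'),
])

model.compile(optimizer=keras.optimizers.RMSprop(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # 177,729 trainable parameters in total
```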

Table 1 The proposed architecture parameters summary

5 Experimental Results

5.1 Data Set

To evaluate the proposed approach, we use the Amazigh Handwritten Character Database (AMHCD), the only available large database of handwritten Amazigh characters. This data set was created and developed at the IRF-SIC Laboratory of Ibn Zohr University, Agadir, Morocco [21]. It contains 780 scanned images of each of the 33 Tifinagh-IRCAM characters, giving a total of \(780\times 33=\) 25,740 images written by 60 writers of different sexes, ages and occupations. Figure 6 presents some examples of handwritten Amazigh characters. All images in this database are resized to \(32\times 32\) pixels. Note that, in contrast to existing methods in the literature, our system does not require any preprocessing of the AMHCD images.

Fig. 6 Some Amazigh handwritten character samples from the AMHCD database [21]

5.2 Results and Discussions

5.2.1 Data Splitting into Training and Validation Sets

For the experiments, the images of the AMHCD database were randomly shuffled and split into a training set and a validation set. To investigate the performance of our system, we vary the training set from 50% to 80% of the images (with the validation set shrinking correspondingly from 50% to 20%) and record the accuracy obtained with each split. Table 2 reports the accuracy for the various training set sizes. As we can see, the accuracy reaches its best value, 99.1%, when 80% of the AMHCD images are used for training. Based on these results, we use 80% of the images for training and 20% for validation in the remaining experiments.

Table 2 Accuracy vs the number of training samples for the AMHCD database
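As an illustration, such a split can be reproduced in a few lines of Python; the placeholder arrays and the stratified option below are our assumptions (the paper only specifies random shuffling).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders standing in for the AMHCD data: 780 images per class
# for 33 classes = 25,740 grayscale images of 32x32 pixels.
X = np.zeros((25740, 32, 32, 1), dtype=np.float32)
y = np.repeat(np.arange(33), 780)

# 80%/20% shuffled split, as selected from Table 2.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=0.8, shuffle=True, stratify=y, random_state=42)
print(X_train.shape, X_val.shape)  # (20592, 32, 32, 1) (5148, 32, 32, 1)
```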

5.2.2 Feature Visualization

In this part of the experiments, we show the output of each hidden layer of the proposed CNN after training. Our goal is to see how different filters in different layers highlight different parts of the character image. Figure 7 shows the feature maps of the three convolutional layers for an input image of the ‘Yakw’ character. As we can see, some filters in the first and second layers act as edge detectors, while others detect a particular region of the character. We also note that the patterns captured by the convolution filters in the third layer are less interpretable, because the feature maps become more sparse and localized.
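The feature maps in Fig. 7 can be extracted with a small auxiliary model; the sketch below assumes the Keras model from Sect. 4 and the validation arrays from the split sketch above.

```python
from tensorflow import keras

# Build a model that returns the activations of every convolutional layer.
conv_layers = [l for l in model.layers if isinstance(l, keras.layers.Conv2D)]
activation_model = keras.Model(inputs=model.inputs,
                               outputs=[l.output for l in conv_layers])

# One validation image, standing in for a preprocessed 'Yakw' sample.
img = X_val[:1]
# Each element of 'maps' holds one convolutional layer's activation maps
# (corresponding to panels b-d of Fig. 7).
maps = activation_model.predict(img)
```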

Fig. 7 Illustration of feature visualization using the ‘Yakw’ character for the proposed system: a input image; b–d output feature maps of the first, second and third convolutional layers of the proposed CNN

5.2.3 Accuracy Using the Proposed CNN

To show the performance of the proposed system in classifying Tifinagh characters, we summarize the classification rate of each Tifinagh character in Fig. 8 and compute the confusion matrix shown in Fig. 9. As we can see, the number of misclassified characters is very small compared to the correctly predicted ones (the values are concentrated on the diagonal). The mistakes made by the proposed system are not surprising; they can be explained by the fact that some characters are not well written in the AMHCD database and by the similarities between some Tifinagh characters, such as ‘Yagw’ and ‘Yag’, ‘Yak’ and ‘Yahh’, ‘Yall’ and ‘Yan’, ‘Yarr’ and ‘Yar’, ‘Yas’ and ‘Yar’, ‘Yaz’ and ‘Yazz’, and ‘Yi’ and ‘Ya’, as observed in the confusion matrix. According to this matrix, the number of misclassified samples does not exceed five per character in the worst case.
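These per-class statistics can be computed as sketched below, assuming the trained model and the validation split from the earlier sketches.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(X_val), axis=1)
cm = confusion_matrix(y_val, y_pred)              # the matrix of Fig. 9
per_class_rate = cm.diagonal() / cm.sum(axis=1)   # the rates of Fig. 8
```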

Fig. 8 Classification rate of each character in the AMHCD database during the validation phase

Fig. 9 Confusion matrix for the AMHCD database

5.2.4 Comparison of the Proposed CNN with Other Existing Methods

First, we compare the proposed system with the well-known LeNet-5 network [31]. Our objective in this experiment is to show that the architecture of our system is better suited than LeNet-5 to the Tifinagh character recognition task. For that, the data set is divided into a training set and a validation set. At each iteration, the proposed system and LeNet-5 are trained with an equal number of images from the training set. Using the validation set, we obtain a more realistic estimate of how the networks would perform on unseen data and check for overfitting.

Note that we kept the original LeNet-5 architecture as in [31], except for the activation function: to overcome the vanishing gradient problem [45], we used the ReLU activation function instead of the Tanh and Sigmoid activation functions.
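For reference, a hedged Keras reconstruction of this baseline is given below: the classic LeNet-5 layer sizes from [31] with ReLU activations, and the output layer adapted from 10 to 33 classes (the adaptation is our assumption).

```python
from tensorflow import keras
from tensorflow.keras import layers

lenet5 = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, (5, 5), activation='relu'),    # C1
    layers.AveragePooling2D((2, 2)),                # S2
    layers.Conv2D(16, (5, 5), activation='relu'),   # C3
    layers.AveragePooling2D((2, 2)),                # S4
    layers.Conv2D(120, (5, 5), activation='relu'),  # C5
    layers.Flatten(),
    layers.Dense(84, activation='relu'),            # F6
    layers.Dense(33, activation='softmax'),         # 33 Tifinagh classes
])
```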

Figures 10 and 11 show the accuracy and loss curves, respectively, as a function of the number of epochs for our proposed model and for the LeNet-5 architecture with the ReLU activation function. Figure 10 shows that our network behaves quite well: after only 6 epochs, it reaches 95% accuracy. As we can see in Fig. 11, the losses on both the training and validation data continue to fall, with only minor spikes and no sign of overfitting. In contrast, the LeNet-5 architecture learns from the data during the first 15 epochs but then overfits, as seen in Figs. 10 and 11. Based on the loss and accuracy curves, our proposed CNN achieves good validation accuracy with high consistency and outperforms LeNet-5 with the ReLU activation function.
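The curves in Figs. 10 and 11 come from training runs like the sketch below, continuing the earlier sketches (the 40 epochs match the figures; the batch size is our assumption).

```python
import matplotlib.pyplot as plt

history = model.fit(X_train, y_train, epochs=40, batch_size=64,
                    validation_data=(X_val, y_val))

plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()
```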

Fig. 10 A plot of the accuracy over the course of 40 epochs for the proposed architecture and LeNet-5 trained on the AMHCD data set

Fig. 11 A plot of the losses over the course of 40 epochs for the proposed architecture and LeNet-5 trained on the AMHCD data set

Finally, the proposed system is compared with various existing techniques. Table 3 gives the performance obtained by the proposed method and by some existing systems on the AMHCD database. As we can see, the proposed approach gives the best performance without any preprocessing step (unlike [7, 22, 24, 26, 28, 46]), and it gives the best accuracy even though we use all the images in the AMHCD database, unlike all the works cited in Table 3.

Table 3 Comparison with other existing approaches

6 Conclusion

In this paper, we have presented a recognition system for Amazigh handwritten characters based on a deep convolutional neural network. The proposed system operates directly on the original character images, whereas all published works require several preprocessing steps. The proposed system was evaluated on all the characters of the AMHCD database and gives the best performance compared with existing systems in the literature, including those using CNNs. As future work, we aim to extend our system to sentence recognition and to multilingual handwriting recognition.