1 Introduction

The number of children with learning difficulties (LD) is increasing worldwide. A significant number of Qatari schoolchildren have difficulty reading, understanding, and writing Arabic words and sentences. Many of them stop learning and become isolated and marginalized. In fact, in 2016, it was estimated that 20% of children in elementary schools in the country experience LD. Current teaching methods are not suitable for them. In addition, teachers do not have enough resources and time to introduce new concepts properly in the classroom. Parents cannot afford the high cost of special education centers and private instructors to teach their children. Teaching these children is therefore recognized as a serious and alarming problem that needs immediate intervention. They need special attention, dedicated care, and curricula tailored to each of them. They should learn to communicate and to rely on themselves in reading, understanding, and writing.

The aim of this research work is to develop and evaluate theoretical models and practical strategies for the use of multimodal multimedia (combining visual, auditory, and haptic modalities) to help children with LD overcome their learning problems. To achieve this goal, we propose an assistive platform entitled MOALEM (Multimedia-Oriented Arabic Language Educational Materials) that uses multimedia technology, text mining, and haptic and avatar tools. These software and hardware components have the potential to change the manner of learning and teaching in schools and special education centers. The proposed algorithms automatically mine Arabic texts in a specific domain, extract the main concepts (actions, events, actors, etc.), and link them directly to multimedia elements. Children can then see images that explain the appropriate meaning of the Arabic vocabulary occurring in the text. Teachers can address logical questions to MOALEM and get responses through an inference engine. We use Arabic text processing techniques including morphological analysis and ontology [1]. MOALEM includes a haptic (touch-based) device that teaches children how to write Arabic words correctly by physically guiding their hand along the sequence of strokes of a reference handwriting sample. MOALEM also has an avatar called “Badr” that can pronounce standard Arabic words. Children can listen to and watch Badr to observe the correct pronunciation of words and learn the corresponding articulation and emotional expression. The use of the MOALEM platform is not limited to special education centers and schools. Families can use MOALEM to assist their children at home. MOALEM can also be used to teach Arabic to nonnative learners who want to learn it as a second language.

The chapter is organized as follows. Section 2 discusses the background. Section 3 presents the MOALEM platform, and Sect. 4 describes its evaluation and assessment. Finally, Sect. 5 concludes the chapter.

2 Background

The population of children with LD is noticeably increasing across the globe [2, 3]. These children require comprehensive care and quick intervention, especially during the early childhood years [4]. They struggle with learning new concepts and can sometimes display negative behavior. They may learn successfully with methods different from those used with their peers. Thus, it is very important to understand the difficulties that these children face and to develop a customized curriculum with personalized content that accomplishes remedial learning and puts them on a productive path for new learning.

There are two types of learning disabilities, as detailed in [5]: (1) global learning disability (GLD) and (2) specific learning disability (SLD). Children with GLD have difficulty understanding almost everything new as others would. These children are called “slower learners” in the literature, and their thinking abilities are below the average of their peers. Children with SLD are of average intelligence and need different teaching methodologies to help them understand. They can continue as regular students if teaching methods that suit their needs are found. These children are currently integrated in regular schools in the state of Qatar, which puts considerable strain on these schools and their teachers. In fact, these children need more dedication and one-to-one teaching, but schools cannot recruit new teachers to follow children individually.

Generally, these children have problems in several language skills, including the following: (1) reading problems: for instance, they cannot properly distinguish among letters or among different words, or form simple, correct sentences; (2) writing problems: they have difficulty writing letters and words properly, including writing letters out of order or in reverse; and (3) understanding problems: they may not be able to understand the correct meaning of simple words in each context. In addition, they may have difficulty understanding simple sentences correctly, remembering the names of objects and locations, or establishing logical relations between words. These difficulties can be due to different physical and psychological problems, for instance, intellectual disabilities, concentration deficit, difficulties in learning through conventional teaching methods, disorganization, limited self-reliance and confidence, and low self-esteem.

Early intervention with appropriate teaching and assessment methods tailored to these children’s needs can decrease their delays in learning and break their isolation. Interaction between teacher and student is important for learning but is almost absent in the learning sessions, owing to the children’s lack of understanding of the concepts being taught and to the teachers’ lack of media resources for remedial teaching. Multimedia technology can improve language learning for children with LD, as we have shown in our previous work [4, 6, 7, 8]. In fact, multimedia can demonstrate new concepts using multiple appropriate modalities such as visual, haptic, auditory, and gestural ones [9, 10]. It keeps the children engaged for a longer time and takes into consideration their different levels of difficulty in learning new concepts. Children can see a computer-animated tutor and can also hear the proper pronunciation of vocabulary [11]. Many research studies have shown that animations can improve the perception of new concepts, especially in language learning [9, 10, 12, 13]. Learners who used multimedia animations showed better progress and marked improvement in communication [5, 14, 15, 16].

A good number of educational systems exist in the literature. Cheng et al. [17] designed an online learning system for the Arabic language with ready-made content. Their system selects the learning material for each learner based on the prior knowledge that the learner reports at the start. Erradi et al. proposed a simple game-like system called “ArabicTutor” [7] to teach Arabic. Its content shows an Arabic word with its synonyms in different contexts, accompanied by images. Wastam et al. [8] developed a system to teach children stories through flashcards. The instructor selects a story and then asks the child to arrange its flashcards in a logical sequence. Rosmani and Abdul Wahab [18] proposed a simple prototype called “i-IQRA” to teach children the Holy Qur’an. It helps them to pronounce the verses correctly. Cheng et al. [19] proposed Crome for children and adults. Every lesson is followed by a set of exercises to assess the progress of the learners. Tabot and Hamada [20] proposed a web-based educational system to teach physics to students. Ping et al. [21] built an educational system to teach the Malay language to children with hearing impairment. Wuang et al. [22] built a multimedia courseware system based on learning theories. All these systems are based on static content and address different learning objectives. However, none of the related work that we have seen so far addresses all aspects of the Arabic language learning process (reading, writing, observing, and understanding) as the MOALEM platform does.

3 MOALEM Platform

We propose a new platform to improve the teaching of the Arabic language to Qatari children with LD and to nonnative Arabic learners at Qatar University. The learning process consists of reading, understanding, listening, and writing. However, many children cannot keep up with teachers in the classroom, understand Arabic vocabulary and grammar, or pronounce words correctly. They need additional resources, personalized content, and dedicated and skilled teachers. Very few Qatari elementary schools and educational centers have adequate staff and resources to help children with LD. For instance, the Shafallah center has different schools to teach children with different degrees of disability (i.e., mild, moderate, severe). However, the capacity of the center is limited and hundreds of children are on waiting lists. In addition, the number of children with LD is increasing every year in Qatar as the population grows considerably, with more than 30,000 newcomers arriving for work every month. Even though some schools have excellent teachers (e.g., Al Bayan schools), it is not feasible for them to teach students one-on-one. Children with LD have to seek other alternatives for learning. Parents are obliged to hire private teachers to help their children keep up, which adds to the already high cost of schooling. Figure 1 gives an overview of the MOALEM platform.

Fig. 1 The MOALEM platform architecture

3.1 Dynamic Multimedia-Based Tutorial

This component of the platform generates multimedia-based tutorials by mining Arabic text. It uses a core multidomain ontology that exploits existing Arabic natural language processing technologies including morphological analysis, logical rules, and discretization. The MOALEM platform will be able to understand Arabic story texts for children and generate personalized, adaptive multimedia tutorials. We develop an educational ontological model enhanced with an Arabic corpus that groups the terms, their synonyms, and multimedia elements. All the concepts of the ontology are semantically linked. Instructors can then address semantic queries to the ontology and get multimedia-based responses. We use search engines (e.g., Google and Yahoo) to get additional multimedia content whenever needed to complement the explanation of the text. This component consists of Arabic text processing, educational ontology construction, tutorial generation, and assessment and evaluation. For morphological and orthographic disambiguation, we use MADAMIRA [23].

3.1.1 Arabic Text Processing

Arabic has a high degree of syntactic freedom that is attributed mostly to two phenomena: verb position alternations and case endings. The verb in Arabic often occurs at the beginning of a sentence, but it can also appear after the subject. Arabic nouns have case endings that allow some freedom of word order, especially in poetry. Given the above, Arabic text processing must first fully disambiguate the text orthographically and morphologically and then produce a syntactic parse of it. Once this is done, an optimal set of linguistic features will be used in the MOALEM platform.

For morphological and orthographic disambiguation, we use the MADAMIRA system. MADAMIRA combines a rule-based morphological analyzer for words out of context with a set of classifiers trained on a corpus of Arabic that is morphologically annotated in context. The two resources are used to select, for each word in context, all of its morphological and orthographic features. These include the full diacritization of the word, identification of its stem and morphemes (clitics and affixes), its part of speech, and identification of its lemma (or citation form) and its English gloss. The performance of MADAMIRA is quite competitive, scoring over 96% accuracy for lemma and stem as well as over 99% accuracy for word segmentation. Diacritization accuracy is also high (96%) when case markers are excluded; it drops to 86% when they are included. Finally, MADAMIRA’s speed is close to 1000 words per second in a server–client mode. Figure 2 (left) shows the output of the MADAMIRA online interface for the Arabic sentence meaning “the red fox chases the white rabbit.”

Fig. 2 MADAMIRA Arabic morphology analyzer (left); CATiB Arabic parser output (right)

For syntactic analysis, we use the CATiB dependency parser [24]. The parser produces a simplified dependency representation, which is based on the Columbia Arabic Treebank (CATiB). Dependency representations abstract away from surface order and thus allow the different verb positions in Arabic to be represented in the same way. For instance, Fig. 2 (right) illustrates the parse tree associated with the sentence discussed above, which is the same whether the verb is sentence initial or sentence medial. The parser expects the input to be tokenized in the Arabic Treebank and Columbia Arabic Treebank tokenization scheme and to be part-of-speech tagged. This step must therefore follow the initial morphological analysis and disambiguation step done by MADAMIRA. The parser’s accuracy is almost 82% in terms of labeled attachment.
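The order independence of a dependency representation can be illustrated with a minimal sketch. The `Token` structure, the relation labels (`ROOT`, `SBJ`, `OBJ`, `MOD`), and the helper `extract_triple` are illustrative assumptions and do not reproduce the parser's actual input or output format; English glosses stand in for the Arabic words.

```python
from typing import NamedTuple

class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    gloss: str   # English gloss standing in for the Arabic word
    head: int    # index of the governing token, 0 for the root
    rel: str     # hypothetical relation label: ROOT, SBJ, OBJ, MOD

def extract_triple(tokens):
    """Return (subject, verb, object) by following head links, ignoring word order."""
    root = next(t for t in tokens if t.head == 0)

    def child(rel):
        for t in tokens:
            if t.head == root.index and t.rel == rel:
                return t.gloss
        return None

    return (child("SBJ"), root.gloss, child("OBJ"))

# Verb-initial order: chases / fox / red / rabbit / white
verb_initial = [
    Token(1, "chases", 0, "ROOT"),
    Token(2, "fox", 1, "SBJ"),
    Token(3, "red", 2, "MOD"),
    Token(4, "rabbit", 1, "OBJ"),
    Token(5, "white", 4, "MOD"),
]
# Verb-medial order: fox / red / chases / rabbit / white
verb_medial = [
    Token(1, "fox", 3, "SBJ"),
    Token(2, "red", 1, "MOD"),
    Token(3, "chases", 0, "ROOT"),
    Token(4, "rabbit", 3, "OBJ"),
    Token(5, "white", 4, "MOD"),
]

# Both surface orders yield the same subject-verb-object triple.
print(extract_triple(verb_initial) == extract_triple(verb_medial))
```

Because the triple is read from head links rather than from positions, downstream steps that consume it do not need separate handling for each word order.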

3.1.2 Educational Ontology Construction

An ontology is used to define the concepts of a domain and link them semantically. It is used for information modeling, sharing, and retrieval. Several domain-based ontologies have been developed recently. For instance, the SNOMED ontology covers the medical domain, and the UNSPSC ontology covers the products and services domain. We use an iterative approach that consists of first designing the global structure of the ontology, then building a hierarchical taxonomy, designing the slots and filling them with instances, and finally validating the ontology. We create a knowledge base (KB) for the educational domain. This KB is enriched with the terms of new stories. The terms of interest are nouns (e.g., named entities and objects such as lion, dog, and cat), adjectives (color, size, etc.), and verbs (e.g., run, eat). We use an ontology of animal classes, as this is the most attractive domain for children. The ontology can then answer questions such as “Where does the camel live?”, “Which animals live with the camel?”, and “Is the camel a carnivore or herbivore?”
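As a rough illustration of how a taxonomy with filled slots can answer questions of this kind, the following sketch uses plain Python dictionaries and sets. The class names, the slot name `livesIn`, and the listed facts are hypothetical placeholders and are not taken from the MOALEM knowledge base.

```python
# Taxonomy: each class points to its parent class (hypothetical names).
TAXONOMY = {
    "Camel": "Herbivore",
    "Goat": "Herbivore",
    "Lion": "Carnivore",
    "Herbivore": "Animal",
    "Carnivore": "Animal",
}

# Slots filled with values, stored as (subject, slot, value) triples.
FACTS = {
    ("Camel", "livesIn", "Desert"),
    ("Goat", "livesIn", "Desert"),
    ("Lion", "livesIn", "Savanna"),
}

def ancestors(cls):
    """Walk up the taxonomy and return all superclasses of cls."""
    out = []
    while cls in TAXONOMY:
        cls = TAXONOMY[cls]
        out.append(cls)
    return out

def slot_values(subject, slot):
    """Return the values stored for one slot of one subject."""
    return sorted(v for s, p, v in FACTS if s == subject and p == slot)

def shares_slot(subject, slot):
    """Return other subjects that have the same value for the given slot."""
    values = set(slot_values(subject, slot))
    return sorted({s for s, p, v in FACTS
                   if p == slot and v in values and s != subject})

print(slot_values("Camel", "livesIn"))    # where does the camel live?
print(shares_slot("Camel", "livesIn"))    # which animals live with the camel?
print("Herbivore" in ancestors("Camel"))  # is the camel a herbivore?
```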

3.1.3 Multimedia Tutorial Generation

For multimedia generation, we build a mapping component that links the dependency graphs with our ontologies according to each topic. The extracted relationships and entities are semantically mapped and validated according to the domain of discourse. To do so, we first need to determine the similarity rating between the keywords extracted from the text, using the following formula, which we adapted in our previous work [3]:

$$ \textit{fSimilarity}(z_i, z_j)=\begin{cases} w_{ls}\,\textit{fsim}_{ls}(z_i, z_j)+w_{ss}\,\textit{fsim}_{ss}(z_i, z_j), & z_i = z_j \\ \textit{fsim}_{ls}(z_i, z_j), & z_i \ne z_j \end{cases} $$
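The two cases of the formula above can be sketched as a small function. The chapter does not define the component measures, the weights, or how the case condition is evaluated, so the sketch takes the two component scores and the condition as inputs; the default weights of 0.5 are placeholders and not values from the chapter.

```python
def f_similarity(sim_ls, sim_ss, same, w_ls=0.5, w_ss=0.5):
    """Piecewise similarity rating for one pair of keywords.

    sim_ls, sim_ss: precomputed values of fsim_ls and fsim_ss for the pair
    same: True when the first-case condition of the formula holds
    w_ls, w_ss: weights (placeholder defaults, assumed for illustration)
    """
    if same:
        # First case: weighted combination of both component scores.
        return w_ls * sim_ls + w_ss * sim_ss
    # Second case: only the fsim_ls score is used.
    return sim_ls

print(f_similarity(0.8, 0.6, same=True))
print(f_similarity(0.8, 0.6, same=False))
```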

Once we find an instance that matches the subject and the object, we can search for a matching property for the predicate in the domain ontology model. SPARQL is the query language associated with our ontology. It is used to extract knowledge and infer new knowledge. When a SPARQL query is executed, a list of instances satisfying the request is generated (e.g., the “Lion” instance with all its details). Finally, we check whether the concepts to which the subject and object instances belong coincide with the domain and range of the property. After generating the mappings, we combine them into search engine queries (e.g., a Google search query) in order to retrieve the corresponding multimedia content, which is then semiautomatically ranked and presented to the teacher. The proposed approach allows us to get detailed information from the Arabic story text, including character names, actions, and events. We send logical queries in simple Arabic words to the ontology to retrieve the corresponding multimedia elements. Figure 3 shows the result of processing an Arabic sentence and its conversion to multimedia elements. All the previous Arabic text disambiguation techniques can be used to improve the accuracy of text mining.
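The validation and query-building steps described above can be sketched without a SPARQL engine by letting in-memory dictionaries stand in for the ontology. This is only an illustration of the domain and range check; the dictionaries, the helper names `map_triple` and `search_query`, and the sample entries are assumptions, and the platform itself queries its ontology through SPARQL.

```python
# Hypothetical ontology fragments: instance -> class, property -> (domain, range).
INSTANCE_OF = {"lion": "Animal", "fox": "Animal", "rabbit": "Animal", "meat": "Food"}
PROPERTIES = {"chases": ("Animal", "Animal"), "eats": ("Animal", "Food")}

def map_triple(subject, predicate, obj):
    """Return the triple if its instances and property fit the ontology, else None."""
    if subject not in INSTANCE_OF or obj not in INSTANCE_OF:
        return None  # no matching instance for the subject or the object
    if predicate not in PROPERTIES:
        return None  # no matching property for the predicate
    domain, range_ = PROPERTIES[predicate]
    if INSTANCE_OF[subject] != domain or INSTANCE_OF[obj] != range_:
        return None  # classes do not coincide with the property's domain and range
    return (subject, predicate, obj)

def search_query(triple):
    """Combine a validated triple into a keyword query for a media search."""
    return " ".join(triple)

valid = map_triple("fox", "chases", "rabbit")
print(search_query(valid))
print(map_triple("fox", "eats", "rabbit"))  # rejected: "rabbit" is not in the range of "eats"
```

Rejecting triples that fail the check keeps ill-formed extractions from reaching the search engine, which is where the semiautomatic ranking would otherwise have to filter them out.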

Fig. 3 An Arabic sentence processing and mapping to multimedia

3.2 ALKATIB: The Haptic Handwriting Component

A simplified architecture of the ALKATIB system is shown in Fig. 4. It comprises six modules: (1) a language repository (storing Arabic alphabet data), (2) haptic rendering, (3) audio and visual rendering, (4) haptic interface, (5) a Quality of Performance Evaluation module, and (6) a Graphical User Interface (GUI). Note that the haptic system setup costs less than US$300 ($250 for the Novint Falcon device and less than $50 for the custom grip and software).

Fig. 4 ALKATIB system architecture

The proposed design for the haptic device is shown in Fig. 5. The Graphical User Interface for the ALKATIB system is made up of two windows: the instructor window and the student window. The student window (Fig. 6) enables the learner to load and play back handwriting tasks. The instructor window enables the instructor to author handwriting tasks by recording haptic, audio, and visual content and displaying it to the student. The instructor window also provides a control panel to customize the haptic and visual rendering to suit the specific needs of the students.

Fig. 5 The proposed design for the haptic handwriting device

Fig. 6 The proposed system interface

3.2.1 Performance Evaluation for ALKATIB

The objective of the experiment is to compare partial haptic guidance and full haptic guidance in terms of their effect on learning outcomes. The experimental setup included a laptop, the haptic interface (Novint Falcon haptic device with the custom grip), and the software application running on the laptop. The laptop has an Intel Core i7-2640M CPU running at 2.80 GHz, 8 GB of RAM, and Intel HD Graphics 3000, and it runs the Windows 7 Professional operating system (64-bit). A snapshot of the experimental setup is shown in Fig. 7. A total of 22 adult users participated in the experiment. They were divided into two groups, each consisting of five females and six males. The age range was 18–45 years.

Fig. 7 The experimental setup for ALKATIB

Group 1 began its training with the full haptic guidance mode in the first three sessions and then moved on to partial haptic guidance in the last three sessions. Group 2, on the other hand, started with partial guidance (first three sessions) and continued with the full haptic guidance mode (last three sessions). We compare the average scores that the participants earned by the end of the first session and the end of the last session for both expert and algorithmic evaluations. These results are depicted in Fig. 8. The improvement in the average score for Group 2 (21.5%) is significantly higher than the improvement in the average score for Group 1 (16.1%). The same conclusion can be derived by examining Fig. 9 (left) (algorithmic evaluation) and Fig. 9 (right) (expert evaluation).

Fig. 8 Expert (top) and algorithmic (bottom) evaluations per group per session

Fig. 9 Algorithmic evaluation (left); expert evaluation (right)

Comparing partial haptic guidance and full haptic guidance, it seems that when learning the gross aspects of handwriting trajectory, partial guidance is more efficient, while learning fine details of the handwriting is conveyed better with full guidance. This suggests that learning generic handwriting skills may utilize partial haptic guidance, whereas personalized handwriting skills can be learned better through full guidance.

3.3 ALNATEQ Badr: The Speech Synthesis Component

Research has shown that the perception of a language vocabulary is influenced by the speaker’s face, gestures, and sounds [25]. These elements are particularly valuable for children with language learning difficulties. Therefore, we propose to create a realistic animated talking face “Badr” and use it as a tutor in language learning with the children. Creating a realistic Arabic animated talking face will involve the following tasks. The computer-animated face is critically dependent on a speech synthesizer that provides phoneme and duration information as well as auditory speech. We create an Arabic auditory speech synthesizer and validate it with the teachers. Speech synthesizers put all of a language’s basic segments in memory and then combine these appropriately to generate new speech [26]. Thus, any new text can be used as input. The written text is translated into a phonemic representation and the segments in memory are optimally chosen and concatenated to provide the auditory speech. The computer-animated face uses the phoneme representation and the durations of the phonemes to create the appropriate facial and tongue animation, which is appropriately aligned with the synthesized auditory speech. MOALEM accesses the auditory speech database in order to carry out auditory speech synthesis.
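The pipeline described above (text to phonemes, concatenation of stored segments, and a phoneme timeline that drives the face) can be sketched as follows. This is a deliberately simplified illustration: it stores a single segment per phoneme instead of choosing segments optimally, and the unit inventory `UNITS`, the letter-to-phoneme table `G2P`, the durations, and the sample values are all invented placeholders applied to a romanized input.

```python
# Hypothetical unit inventory: phoneme -> (duration in ms, stored audio samples).
UNITS = {
    "b": (60, [0.1, 0.2]),
    "a": (120, [0.5, 0.6, 0.5]),
    "d": (70, [0.2, 0.1]),
    "r": (80, [0.3, 0.3]),
}
# Hypothetical letter-to-phoneme table for a romanized input word.
G2P = {"b": "b", "a": "a", "d": "d", "r": "r"}

def synthesize(text):
    """Return (samples, timeline); timeline holds (phoneme, start_ms, end_ms)."""
    phonemes = [G2P[ch] for ch in text]        # written text -> phonemic representation
    samples, timeline, clock = [], [], 0
    for ph in phonemes:
        duration, unit = UNITS[ph]
        samples.extend(unit)                   # concatenate the stored segment
        timeline.append((ph, clock, clock + duration))
        clock += duration
    return samples, timeline

audio, timeline = synthesize("badr")
# The timeline is what an animated face would use to align mouth shapes with audio.
for phoneme, start, end in timeline:
    print(phoneme, start, end)
```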

Badr is currently implemented on a PC for development and as an application on iPhone devices. We will design an Arabic-looking talking face, animate it with respect to Arab cultural behaviors and norms, and integrate it into the MOALEM platform window so that the talking face is readily available to the teacher. Previous research has developed computer-animated talkers in a variety of languages. Most relevant to the present proposal is the development of an Arabic talking face, which has been shown to be extremely accurate and effective in language learning. We will adapt our computer-animated tutor Baldi to pronounce words in standard Arabic. The modifications will be made to the control parameters of Baldi’s polygon model. One set of parameters controls the movement of vertices and their immediate neighbors. Geometric changes include rotation, such as jaw rotation, and translation of the vertices, such as mouth widening. The scale of the face and of its subareas, such as the cheek, can also be changed. The affect of the face (e.g., showing happiness or sadness) can also be changed through control parameters. Badr is animated by synthesizing a sequence of phonemes: using text-to-speech synthesis, a written utterance is mapped into a phoneme sequence and the corresponding control parameters. Rounding and jaw rotation are examples of control parameters. The control of Badr also implements coarticulation, which means that the articulation of a phoneme is influenced by surrounding phonemes.

Direct instruction is an effective method for teaching new vocabulary [27]. Presenting class material in different forms and representations has a strong impact on learning. Text and visual images can be paired with appropriate definitions as well as with the spoken form of the words to be learned. This multisensory approach, with an English computer-animated agent, has been effective in several studies with deaf and hard-of-hearing children [1]. Many new words and grammatical constructions were mastered in peer-reviewed experiments [28, 29]. Figure 10 shows a lesson to teach fruits and vegetables. Each lesson includes the talking face, the vocabulary words, and “stickers.” The goal is for the child to learn the most commonly used vocabulary of vegetables and fruits. Badr might say “click on the olives.” If the child selects the right item, he or she is rewarded with praise and a happy face.

Fig. 10 The talking face Badr

4 Evaluation and Assessment

Two groups of 10 children each are identified to take part in the evaluation: the first group will use the MOALEM platform with its components, while the second group will continue learning through current conventional teaching methods. We will periodically assess the two groups, measuring their progress in reading, understanding, and writing Arabic vocabulary and sentences. A set of specific words will be determined and used in both groups. Our assessments include several measurements and metrics that we have developed and used in our previous research [2, 4], including the time to read a paragraph, explanation of the words, time of writing, and time to retrieve the meaning of words. Other metrics proposed in our work [30] are also applied. All potential users of the system, including teachers, children, parents, and caregivers, will be asked to provide feedback on the benefits of the platform, its interfaces, effectiveness, usage, generation of adaptive and personalized learning tutorials, problems, and suggestions for improvement. To assess the reading and comprehension outcomes, we will use the Neuropsychological Assessment Battery for children. Nonverbal IQ skills will be measured with the WISC nonverbal IQ subtests, including picture completion, picture arrangement, block design, object assembly, digit-coding symbol, and mazes. Furthermore, two Rapid Automatized Naming (RAN) tasks will be administered to the children (a picture RAN and a digit RAN). Finally, Arabic texts will be created for the purpose of the experiment (one fully vowelized and one non-vowelized). As for handwriting, the Evaluation Tool of Children’s Handwriting (ETCH) will be used to assess the quality of the children’s handwriting.

For the clinical sample, assessment will be done using the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III) for 5-year-old patients and the Wechsler Intelligence Scale for Children (WISC-IV) for patients aged 6 and above. The cutoff range for selection will be 50–79, that is, from mild intellectual difficulty to borderline difficulty. The assessment will be divided into two parts. Part 1 will assess individual learning elements of the MOALEM platform (reading, writing, or speaking). We implement the multisensory computer-controlled environment to teach vocabulary and grammar directly. To test whether our computer-animated tutor, Badr, is responsible for the learning of new vocabulary, we carry out experiments using a multiple baseline design for each child. In this design, all words are tested, whereas only a subset of these words is trained as well as tested. Under this procedure, if words are learned only once they are being trained, the training regimen can be taken to be responsible for the learning. The lessons contain vocabulary words that are unique for each child. Each lesson consists of 24 words, broken down into three subsets of eight words each. The Arabic computer-animated talker says the word when images of the vocabulary items are selected, as shown in Fig. 10. The child responds to Badr’s instructions such as “click on the olive” or “show the beets.” Reading exercises allow the child to recognize and type the word. Badr will also have the child pronounce the word when Badr names a highlighted image or simply highlights the image without pronouncing it.
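The structure of such a lesson can be sketched as a testing and training schedule. The split of 24 words into three subsets of eight follows the description above; the staggered order, in which one subset is trained per phase while all words are tested in every phase, is an assumption about how the multiple baseline design would be run, and the word labels are placeholders.

```python
def build_schedule(words, n_subsets=3):
    """Split words into equal subsets and stagger training across phases."""
    size = len(words) // n_subsets
    subsets = [words[i * size:(i + 1) * size] for i in range(n_subsets)]
    schedule = []
    for phase in range(n_subsets):
        schedule.append({
            "phase": phase + 1,
            "tested": list(words),       # every word is tested in every phase
            "trained": subsets[phase],   # only one subset is trained in this phase
        })
    return subsets, schedule

words = [f"word{i:02d}" for i in range(1, 25)]  # placeholder vocabulary items
subsets, schedule = build_schedule(words)
for step in schedule:
    print(step["phase"], len(step["tested"]), step["trained"][0], step["trained"][-1])
```

If scores on a subset rise only in the phase where that subset is trained, while the untrained subsets stay at baseline, the improvement can be attributed to the training rather than to repeated testing.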

5 Conclusion

We proposed a new platform that uses multimedia technology to teach Arabic vocabulary to children with learning difficulties. The platform supports reading, writing, and listening to Arabic words. It has a customized haptic device for writing and a talking face that pronounces Arabic vocabulary with gestures. The platform can be used in school settings as well as at home, where parents can help their children review the materials they study in school. We have developed an educational ontology that allows the instructor to get semantically related information about words.