1 Introduction

One of the main limitations of current educational software such as Intelligent Tutoring Systems (ITS) is the lack of features for adapting to students’ emotional states [1, 5]. Many current ITS pay little or no attention to students’ emotional experience and therefore do not reach the level of interactivity that human tutors achieve [3]. This limitation is relevant given the inextricable relationship between emotions and learning [12].

Advances in Affective Computing have motivated a growing body of research on systems that can recognize and adapt to students’ affective states [7]. Despite these advances, the accuracy of inferred emotions is not yet sufficient to serve as the basis for adaptation, particularly in real classroom learning environments [3]. There is also the practical challenge of deploying some of the physical affective sensors [1, 3].

In this context, this paper presents the proposal and implementation of a Hybrid Inference Model of Learning Related Emotions (ModHEmo). The proposal stands out by combining physical and cognitive data gathered with minimally intrusive or non-intrusive sensors that can be easily deployed outside the lab. The approach is grounded on the fact that human emotions are strongly related to physical reactions but also involve a rational, cognitive process [12].

2 Hybrid Emotion Inference Model

Because this proposal focuses on educational software, we need to deal with emotions that affect and correlate with learning outcomes. Prior works [2, 3, 4, 8, 11] show that there is no consensus on a specific set of learning related emotions. To choose the set of emotions to infer, we therefore used the work of [14] as a conceptual foundation. That work builds on two established theories: the ‘circumplex model’ [13] and the ‘spiral learning model’ [9]. These theories define a two-dimensional space in which a student’s emotions can move dynamically during learning. Based on these references, we decided to implement the inference process not over a specific set of emotions but over emotions grouped into quadrants.

Figure 1 shows the approach used in this work to represent learning related emotions in a two-dimensional space. The ‘Valence’ (horizontal axis) and ‘Arousal’ (vertical axis) dimensions are used to position the emotions in the quadrants “Q1”, “Q2”, “Q3” and “Q4”, plus a “Neutral” state named “QN”. These quadrants play the role of classes in the classification processes performed by ModHEmo, which are described in the next section.

Fig. 1. Quadrants and learning related emotions
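
The quadrant representation can be illustrated with a minimal sketch. The text does not state the axis ranges, the quadrant numbering or the extent of the neutral region, so the code below assumes that both axes are scaled to [-1, 1], that quadrants are numbered counter-clockwise starting from positive valence and high arousal, and that a hypothetical `neutral_radius` around the origin counts as “QN”.

```python
def quadrant(valence: float, arousal: float, neutral_radius: float = 0.1) -> str:
    """Return the quadrant label for a point in valence/arousal space.

    Assumptions (not taken from the paper): axes scaled to [-1, 1],
    counter-clockwise numbering starting at positive valence / high arousal,
    and a small disc around the origin treated as the neutral state.
    """
    # Points close to the origin are treated as neutral.
    if (valence ** 2 + arousal ** 2) ** 0.5 <= neutral_radius:
        return "QN"
    if valence >= 0 and arousal >= 0:
        return "Q1"
    if valence < 0 and arousal >= 0:
        return "Q2"
    if valence < 0 and arousal < 0:
        return "Q3"
    return "Q4"
```

For example, under these assumptions `quadrant(0.8, 0.6)` falls in “Q1” and `quadrant(0.0, 0.05)` falls in “QN”.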

Fig. 2 schematically presents ModHEmo, in which the inference process is divided into two main components: physical and cognitive. The final inference of the student’s probable affective state is made by combining the information from these two components.

Fig. 2. ModHEmo’s structure and sub-components (labeled with capital letters)

The physical component’s inference is based on Ekman’s classical model [6] and includes eight basic emotions: anger, disgust, fear, happiness, sadness, surprise, contempt and neutral. The cognitive component considers the eight emotions of the Ortony, Clore and Collins (OCC) theory [10] that affect the agent itself (e.g. the student) and that are triggered as valenced (positive or negative) reactions to events. The eight cognitive emotions are: joy, distress, disappointment, relief, hope, fear, satisfaction and fears-confirmed.

The cognitive component, based on OCC theory, is responsible for handling relevant events in the computing environment. The physical component deals with observable reactions, using images of the student’s face captured with a standard webcam after a relevant event occurs. Each component returns a score for each of its emotions. Each component then maps these emotion scores to the respective quadrants (see Fig. 1) based on the values of the valence and arousal dimensions.
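
The mapping from emotion scores to quadrant scores could look roughly like the sketch below. Both the emotion-to-quadrant assignment and the aggregation rule (a plain sum) are assumptions made for illustration: the actual placement of emotions is defined by Fig. 1, and the text does not specify how the scores are combined within a quadrant.

```python
QUADRANTS = ("Q1", "Q2", "Q3", "Q4", "QN")

# Hypothetical assignment for the physical component's emotions.
# The real placement comes from Fig. 1 and may differ.
PHYSICAL_TO_QUADRANT = {
    "happy": "Q1", "surprise": "Q1",
    "anger": "Q2", "fear": "Q2", "disgust": "Q2", "contempt": "Q2",
    "sadness": "Q3",
    "neutral": "QN",
}

def quadrant_scores(emotion_scores, emotion_to_quadrant):
    """Sum per-emotion scores into one score per quadrant (plus QN)."""
    totals = {q: 0.0 for q in QUADRANTS}
    for emotion, score in emotion_scores.items():
        totals[emotion_to_quadrant[emotion]] += score
    return totals

# Example scores for one face image (made-up values).
example = {"happy": 0.6, "surprise": 0.1, "sadness": 0.2, "neutral": 0.1}
physical_quadrants = quadrant_scores(example, PHYSICAL_TO_QUADRANT)
```

The same function would apply to the cognitive component with its own, equally hypothetical, mapping of the eight OCC emotions.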

The final fusion process is accomplished by creating a single dataset containing the quadrant scores of each component. After the fusion, this dataset contains 10 attributes (5 physical + 5 cognitive) holding the scores of the four quadrants plus Neutral for each component. The class obtained through the labeling process (described in the next section) is also part of the dataset. Based on this dataset, we train and test two classification algorithms, which perform the final inference of the model.
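
As a rough illustration of the fusion step, one dataset row can be built by concatenating the five quadrant scores of each component and appending the class label. The attribute names (`phys_*`, `cog_*`, `class`) and the example values are hypothetical; only the 5 + 5 layout comes from the text.

```python
QUADRANTS = ("Q1", "Q2", "Q3", "Q4", "QN")

def fuse(physical, cognitive, label=None):
    """Build one dataset row: 5 physical + 5 cognitive scores, plus the class."""
    row = {f"phys_{q}": physical[q] for q in QUADRANTS}
    row.update({f"cog_{q}": cognitive[q] for q in QUADRANTS})
    if label is not None:
        row["class"] = label  # quadrant chosen by the student during labeling
    return row

# Made-up quadrant scores for a single monitored event.
physical = {"Q1": 0.7, "Q2": 0.0, "Q3": 0.2, "Q4": 0.0, "QN": 0.1}
cognitive = {"Q1": 0.5, "Q2": 0.1, "Q3": 0.0, "Q4": 0.3, "QN": 0.1}
row = fuse(physical, cognitive, label="Q1")
```

Rows of this shape can then be exported to whatever format the classification toolkit expects.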

3 Experiment Design and Results

To check ModHEmo’s performance in a real educational environment, an experiment was conducted with 15 elementary school students (aged between 11 and 15 years). In the experiment, the students used a customized version of the educational software ‘Tux, of Math Command’, or TuxMath. TuxMath is an educational game that lets kids exercise their mathematical reasoning.

While students used TuxMath, some of the main events of the game were monitored. When a monitored event occurred, an image of the student’s face was captured and used as input to the physical component of ModHEmo. The kind of event served as input to the cognitive component.

After completing the game, students labeled the events using a customized tool in order to build a ground truth dataset. This tool allows the student to review the game session, synchronized with a video showing their facial reactions as captured by the webcam. The tool automatically stops the video whenever a monitored event occurred and asks the student to select the quadrant (represented by emoticons) that best describes their affective state at that moment.

A ground truth with 935 instances of monitored events was created, with a class distribution of 141, 188, 173, 130 and 303 for Q1, Q2, Q3, Q4 and QN, respectively. This dataset was used for training and testing (10-fold cross-validation) the classification algorithms RandomForest and IBK using Weka. These algorithms were chosen for their simplicity and performance. RandomForest achieved an accuracy of 64.81% and a Cohen’s Kappa of 0.545. IBK achieved an accuracy of 63.53% and a Cohen’s Kappa of 0.532.
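
To put these figures in context, the sketch below computes accuracy and Cohen’s Kappa from a confusion matrix, together with the majority-class baseline implied by the reported class distribution (always predicting QN would be correct for 303 of 935 instances, about 32.4%). The paper’s confusion matrices are not given, so the function is only shown on a toy matrix; it is a generic implementation of the standard Kappa formula, not Weka’s code.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Class counts reported in the text for Q1, Q2, Q3, Q4 and QN.
CLASS_COUNTS = [141, 188, 173, 130, 303]
majority_baseline = max(CLASS_COUNTS) / sum(CLASS_COUNTS)

# Toy two-class matrix (made-up values), only to exercise the function.
toy_accuracy, toy_kappa = accuracy_and_kappa([[40, 10], [20, 30]])
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is a useful complement to accuracy on an unbalanced class distribution such as this one.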

Another useful tool for analyzing classifier performance is the ROC (Receiver Operating Characteristic) curve, which depicts the performance of a classifier without regard to class distribution or error costs. Fig. 3 depicts the ROC curves for the five classes obtained by the RandomForest algorithm (the curves for IBK are very similar). The figure also shows the Area Under the Curve (AUC) computed in Weka.

Fig. 3. ROC curves and AUC for RandomForest algorithm
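
For a multi-class problem such as this one, a per-class AUC is commonly obtained by treating one class as positive and all others as negative. The sketch below computes that one-vs-rest AUC through its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a generic illustration with made-up data and is not claimed to match Weka’s implementation in every detail.

```python
def auc_one_vs_rest(scores, labels, positive):
    """One-vs-rest AUC via pairwise comparison of positive and negative scores.

    scores[i] is the classifier's score for class `positive` on instance i;
    labels[i] is the true class of instance i.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Made-up scores for class "Q1" on four instances.
example_auc = auc_one_vs_rest([0.9, 0.2, 0.5, 0.1], ["Q1", "Q1", "Q2", "QN"], "Q1")
```

The pairwise loop is quadratic in the number of instances, which is acceptable for a dataset of a few hundred events; a sort-based version would be preferable for larger data.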

In this work, we adopted the premise that combining the physical and cognitive components could be an effective way to improve the inference results. Tests were therefore performed to verify the impact on the inference process of using each component individually.

To perform this test we created two datasets: one with physical attributes only and the other with cognitive attributes only. Using only cognitive attributes, the accuracy was 39.25% and 40% for RandomForest and IBK, respectively. With only physical attributes, the accuracy was 55.29% and 52.19% for RandomForest and IBK, respectively. This result indicates that combining ModHEmo’s two components was important for improving the inference accuracy.
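
The size of the improvement can be read directly from the reported accuracies. The short sketch below, which only restates the numbers given in the text, computes the gain in percentage points of the combined attribute set over the best single component: 9.52 points for RandomForest and 11.34 points for IBK.

```python
# Accuracy (%) reported in the text for each attribute set.
ACCURACY = {
    "RandomForest": {"combined": 64.81, "physical": 55.29, "cognitive": 39.25},
    "IBK": {"combined": 63.53, "physical": 52.19, "cognitive": 40.00},
}

def gain_over_best_single(results):
    """Percentage-point gain of the combined set over the best single component."""
    best_single = max(results["physical"], results["cognitive"])
    return round(results["combined"] - best_single, 2)

for name, results in ACCURACY.items():
    print(name, gain_over_best_single(results))
```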

The results achieved in the experiment with ModHEmo presented above show some improvements when compared with [2, 3, 11]. In our experiment, Cohen’s Kappa was 0.545 and 0.532 for RandomForest and IBK, respectively, and the AUC values were between 0.843 and 0.888 (see Fig. 3).

4 Final Considerations, Limitations and Future Work

Inferences obtained with this model could be very useful for implementing learning environments or ITS that are able to appropriately recognize and respond to learners’ emotional reactions. The model described in this paper stands out by presenting a method for combining quite distinct kinds of information (physical and cognitive), an approach that is still little explored in the research community.

Despite some limitations, the initial results can be considered promising, since they are similar or superior to the state of the art. Furthermore, we believe that our hybrid approach resembles the natural process of emotion inference and thus offers promising opportunities for future improvements by adding new data or sensors.

As future work, we intend to expand the current experiment to involve more students from other age groups, as well as other types of educational environments. We also intend to evaluate the effect of adding new information to the physical and cognitive components.