
1 Introduction

Emotions play a critical role in the learning process and may strongly influence students’ performance. They affect cognitive processes, and thereby student learning and achievement [1, 4]. Recently, especially due to the reliance on distance education during the COVID-19 emergency, there has been considerable interest in detecting emotions during learning in distance education contexts, where the automatic assessment and monitoring of students’ emotions may provide information about their wellbeing and help in understanding problems and difficulties. This information can then be used to provide personalized support through appropriate interventions [41].

Emotional processes are directly linked to the learning process [2, 3]. In particular, Pekrun et al. [3] highlighted that positive emotions are related to reflection and creative thinking, whereas negative emotions are associated with lower levels of performance. The hypothesis underpinning our work is that the detection of emotions in distance education can be used to develop an emotional profile of the students, not only to detect a specific emotional issue at a particular moment during the lecture, but also to monitor how the situation evolves over time and to identify relevant changes.

Given the importance of detecting and monitoring emotions, advances in the field of computer vision have made it possible to recognize emotions from facial expressions [5]. Many research works proposed using facial expression analysis to detect and interpret students’ emotions during e-learning [6, 7]. However, many of them focused on the recognition of basic emotions (anger, disgust, fear, happiness, sadness, and surprise) [8], which is not sufficient in this application domain: basic emotions are quite infrequent during short e-learning sessions [27], and they do not allow understanding of the user’s mental state during the learning process [9]. Instead, affective states such as engagement, boredom, confusion, frustration, happiness, curiosity, and anxiety are much more frequent in this context, since they are related to goal achievement, the state of flow, and the understanding of the learning material.

Since most software available for emotion recognition is trained to recognize only primary emotions, we developed a computer vision module for the recognition of the cognitive emotions that typically arise during the learning process. In the present work, we describe the development of a Facial Expression Recognition (FER) system that recognizes cognitive emotions from facial expressions in the context of distance education. Our FER system classifies the following cognitive emotions, which were found to be most related to the presence or lack of engagement and flow: enthusiasm, interest, surprise, boredom, perplexity, frustration, and a neutral state.

This module will be integrated into e-learning systems to create a more personalized educational environment in which the system detects and monitors not only the students’ performance (based, for instance, on tests) but also their emotional states, making it possible to reason about their wellbeing during the learning process.

2 E-Learning and Cognitive Emotions

Emotion and cognition are related, and this relation becomes even more important in the context of online education [9]. Several studies have investigated the kinds of emotions and experiences that are present in e-learning activities.

Loderer et al. [28] identified three kinds of emotions in technology-rich learning environments: positive activating emotions (e.g., enjoyment), negative activating emotions (e.g., anxiety), and negative deactivating emotions (e.g., boredom). Duffy et al. [29] indicated positive emotions such as enjoyment and negative activating emotions such as anxiety as the most strongly experienced ones in e-learning contexts. Recently, D’Errico et al. [17] found a positive correlation between academic self-efficacy and the experience of positive emotions during e-learning activities, while self-efficacy was negatively associated with negative emotions. In other words, when students experience positive emotions in e-learning contexts, this can be interpreted as a positive experience of self-efficacy, since they feel they are living a positive experience in interaction with the class.

The focus of the present paper is to automatically recognize the cognitive emotions that, in Scheffler’s work [38], are considered the “emotional filters through which we view the world, interpret its objects and evaluate its critical features. They involve seeing things as beneficial or harmful, promising or threatening, fulfilling or thwarting” (p. 45). In addition, the acquisition of new knowledge and skills elicits cognitive emotions that monitor incoming content [30], so these emotions can play a central role in the learning process. Cognitive emotions can also be associated with the student’s “flow state”, a positive mental state associated with enjoyment and concentration during a stimulating activity [43]. In our perspective, therefore, observing cognitive emotions can be a way of monitoring the more general state of students, including their sense of control over, or enjoyment of, the learning process. In this sense, the recognition of the cognitive factors at the basis of emotional processes can be a way to understand the learner’s beliefs, expectations, and goals [31], which are strictly linked to learning and content delivery. For instance, in terms of real-time appraisal processes, a state of excitement can be strongly linked to newly acquired information that is relevant for the student, while a state of frustration or confusion can be interpreted as negative feedback signaling that the new information contrasts with previous knowledge. Regarding the psychological factors involved in the learning process, recent works have shown the relevance of self-efficacy for academic adjustment, particularly with respect to performance and wellbeing (for a review see [17]).

The psychological features of self-efficacy concern the students’ beliefs that they are able to plan, control, and direct their learning activities. Thus, it implies cognitive strategies of (a) planning learning actions, (b) assessing learning activities, and (c) reflecting in order to modulate learning actions even in case of difficulties.

In particular, in e-learning environments, the cited study showed differences in cognitive emotions between younger and older adult students when they interact with the teacher (in chat or video activities). Results indicated that, in the case of younger adults, self-efficacy was linked to positive emotions, such as interest, as well as to academic adjustment and wellbeing. In older adult students, in parallel, feeling negative cognitive emotions like frustration and boredom can be an emotional signal of lower levels of self-efficacy and of low academic adjustment [32]. For older students, weak academic self-efficacy could increase personal distress, resulting in negative emotional states during online learning processes. This could probably be related to their low sense of control in returning to study and to perceiving difficulties as impossible to face during online learning. In other words, during e-learning activities, young adult students with high self-efficacy could reach the ‘state of flow’, in which cognitive effort is most likely promoted by the willingness to develop and build one’s own professional pathway, whereas older students with low self-efficacy could feel states of frustration and boredom, probably related to the awareness of the difficulties they need to overcome to manage the academic tasks they have to perform. Indeed, these older students expressed emotions such as boredom, which could signal that an academic task is perceived as too simple or not interesting given their past experience and knowledge, and frustration, which could indicate a task perceived as too difficult for them. The co-presence of these negative states highlights that, for older students, the state of ‘flow’ is more difficult to achieve, a difficulty that seems related to their low perceived efficacy in controlling academic tasks and self-regulating in online academic settings.

In a previous study [18], we also investigated, through facial analysis, the relation between cognitive emotions and personality traits. That study partially confirms findings in the classical literature on emotional expression: results show that, for young students, emotions like perplexity during video lectures are negatively associated with energy and openness to experience. In the case of adults, instead, the energy trait and emotional stability were associated with boredom and frustration. Moreover, the extroversion of young students is positively associated with positive emotions in chat interaction with a tutor. In adult students, instead, energy and emotional stability are related to a lower presence of negative emotions like boredom and frustration during chat interaction with the teacher, while adult learners with low levels of emotional stability can easily lose the state of flow during the learning process.

In addition, for adult students, neuroticism can be related to greater vulnerability to negative emotions. For adult students with high levels of neuroticism, negative emotions could be provoked by the absence of new stimuli, when they feel boredom, or by the presence of problematic and complex information, when they feel frustration. Moreover, adult students with high levels of neuroticism show more negative emotions in all the e-learning settings that were examined. Finally, it is interesting to note that, for adult students, another significant personality trait is conscientiousness: more conscientious adult students feel fewer negative emotions, such as frustration, perplexity, and boredom, when they interact by chatting with peers.

3 Emotion Detection from Facial Expressions in E-Learning Context

To assess students’ emotional states, self-report measures are most often used. In this case, the emotional state is usually collected with specific questionnaires in which students report their own perception of what they felt during the learning session. However, even if questionnaires are useful for collecting subjective evaluations of the student’s state and for relating automatic measurements to self-assessments of being in a particular affective state, they have some limitations. First of all, they do not link the actual, expressed emotions of students to the particular moment of the learning task. Moreover, answering a questionnaire takes time and may be boring and disruptive for students.

Since in this context learning is performed using a digital environment, it is feasible to adopt an approach based on the automatic analysis of the student’s behavior during the learning process. Emotions can be detected automatically by analyzing a student’s behavior across multiple communication channels. For instance, in [38, 39] facial expressions were used to detect the student’s affective state, while in [40] other data were used alongside facial expressions for the same purpose. Facial expressions are the most commonly used communicative channel for displaying emotions, and facial features are also the most commonly used for automatic emotion recognition, since their detection does not require expensive or intrusive hardware: webcams are present on many devices, and the user does not have to wear sensors or particular devices. We therefore focus on facial features to detect affective states.

Several research studies recognize the emotional state of students in e-learning environments by analyzing facial expressions [19,20,21, 44]. Ashwin et al. [19], in their multi-user face detection-based e-learning system, used an SVM (Support Vector Machine) to classify emotions. Al-Alwani et al. [20] classify moods from facial features using a neural network to improve students’ involvement in e-learning platforms. Neural networks were also used by Magdin et al. [22] to evaluate the emotional state of the user in real time through a webcam with good accuracy. An approach based on a deep Convolutional Neural Network model for identifying students’ attentiveness from facial expressions is proposed by Tabassum et al. [26]. Much of this work, however, has focused on the emotions of anger, fear, sadness, happiness, disgust, and surprise, whereas the results of the analysis reported in [27] indicate that basic emotions are quite infrequent during learning sessions with technology. In particular, the authors analyzed five studies concerning automatic monitoring and detection while users were performing conceptually difficult tasks, and found that emotions such as anxiety, boredom, confusion, curiosity, engagement, frustration, and happiness were much more frequent than basic ones in this context.

In the proposed work, we aim to develop a new FER system that recognizes the cognitive-emotional states of students in e-learning systems in real time.

4 The FER System for Cognitive Emotion Recognition

Recognizing facial expressions from images requires the implementation of a pipeline involving different modules. Figure 1 illustrates the schema of the one used in our work: after a pre-processing phase, the faces present in the input image are detected, cropped, and registered [5]. These preliminary operations are necessary to normalize the position of the facial components. Then, feature extraction is performed, and the extracted features are used to classify facial emotions.

Fig. 1. A schema of a typical pipeline of a FER system.
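To make the flow of Fig. 1 concrete, the following minimal sketch wires the stages together in Python. It is an illustration only: the face detector, feature extractor, and classifier are passed in as placeholders for the concrete components described in the next subsections, and the fixed 112 × 112 crop size is an assumption of this sketch, not a setting of our system.

```python
# A minimal sketch of the FER pipeline of Fig. 1. The face_detector,
# feature_extractor, and classifier arguments stand in for the concrete
# components of Sects. 4.2 and 4.3; the crop size is an illustrative choice.
import cv2

def fer_pipeline(image_bgr, face_detector, feature_extractor, classifier, labels):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)     # pre-processing
    x, y, w, h = face_detector(image_bgr)                  # face detection
    face = cv2.resize(gray[y:y + h, x:x + w], (112, 112))  # cropping + registration (simplified)
    features = feature_extractor(face)                     # feature extraction
    return labels[classifier.predict([features])[0]]       # map predicted class id to emotion name
```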

4.1 Dataset

To train the classifier, a dataset with examples of the target emotions is needed. To the best of our knowledge, no validated datasets are available for the above-mentioned cognitive emotions with enough examples for each class to properly train a classifier. For this reason, we combined examples from existing datasets with a set of images found on the web.

The first portion of the dataset was collected by taking images from different datasets:

  • EU-Emotion Stimulus Set [10],

  • Yale Face dataset of the UC San Diego,

  • Japanese Female Facial Expression (JAFFE) database of Kyushu University [11],

  • Senthil IRTT database [12].

In particular, we selected a set of 200 images whose distribution is the following: enthusiasm (34), interest (28), surprise (32), boredom (32), perplexity (24), frustration (18), and neutral (32).

We then searched the web for additional images of these emotions, excluding images containing very exaggerated expressions. The selected images (210 in total) showed facial expressions that were clear and as frontal as possible; moreover, we excluded images of elderly people and of bearded people, in order to stay as close as possible to the images taken from the validated datasets.

This new set of images was validated by three expert human raters, and we assessed inter-annotator agreement with Fleiss’ kappa [13]. The average kappa over all the examined images was 0.81, indicating almost perfect agreement among the raters.
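As an illustration of the agreement check, Fleiss’ kappa can be computed with statsmodels; the ratings below are hypothetical and only show the expected data layout (one row per image, one column per rater, values encoding the seven categories).

```python
# A sketch of the inter-annotator agreement computation with statsmodels.
# The ratings matrix is hypothetical: rows = images, columns = the three
# raters, values = category ids (0 = enthusiasm, ..., 6 = neutral).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([
    [0, 0, 0],  # unanimous "enthusiasm"
    [3, 3, 5],  # two raters chose "boredom", one "frustration"
    [6, 6, 6],  # unanimous "neutral"
])

# aggregate_raters converts per-rater labels into per-image category counts,
# the table format that fleiss_kappa expects.
counts, _ = aggregate_raters(ratings, n_cat=7)
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")
```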

The final dataset was thus made up of 410 images in total, with the following distribution: enthusiasm (60), interest (66), surprise (56), boredom (58), perplexity (58), frustration (58), and neutral (54).

4.2 Preprocessing, Face Detection and Cropping

The input of the implemented pipeline is a single facial image, which is converted to grayscale. Subsequently, the face in the image is detected with a Multi-task Cascaded Convolutional Network (MTCNN) [45] and cropped. Figure 2 shows an example of this step; a code sketch follows the figure.

Fig. 2. Face detection and cropping.
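As a sketch of this step, the open-source `mtcnn` Python package (an implementation of the architecture of [45]) can be combined with OpenCV. The file path and the choice of keeping only the first detected face are assumptions of this example; note also that MTCNN operates on color images, so here detection runs on the RGB frame and the crop is converted to grayscale afterwards.

```python
# A sketch of MTCNN face detection, cropping, and grayscale conversion,
# using the open-source `mtcnn` package and OpenCV.
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def detect_and_crop(path):
    """Return a grayscale crop of the first face found in the image, or None."""
    image_rgb = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)  # MTCNN expects RGB
    faces = detector.detect_faces(image_rgb)
    if not faces:
        return None
    x, y, w, h = faces[0]["box"]   # bounding box of the detected face
    x, y = max(x, 0), max(y, 0)    # the box may extend past the image border
    crop = image_rgb[y:y + h, x:x + w]
    return cv2.cvtColor(crop, cv2.COLOR_RGB2GRAY)
```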

4.3 Classifying Facial Expressions

The emotion recognition system is based on machine learning, specifically on a classification task. The input of the classifier is a set of features extracted from the face, designed to characterize the facial expression. Due to the low number of examples for each class in the dataset, we did not consider approaches based on deep learning.

To decide which type of features to use to train a classifier for facial expressions, we considered HOG (Histogram of Oriented Gradients [14]) descriptors, AUs (Action Units [15]), and AUs plus gaze direction.

HOG features were considered since facial expressions result from muscle movements that generate a kind of deformation of the face; HOG features are sensitive to object deformations and have been widely used in FER systems.
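For illustration, a HOG descriptor of this kind can be computed with scikit-image; note that our system takes the HOG representation produced by OpenFace (4464 descriptors, see below), so the parameters and file path here are examples rather than our actual configuration.

```python
# An illustrative HOG extraction with scikit-image. The system itself uses
# the HOG representation computed by OpenFace (4464 descriptors), so the
# parameters below are examples, not our configuration.
import cv2
from skimage.feature import hog

face = cv2.imread("face_crop.png", cv2.IMREAD_GRAYSCALE)  # a registered face crop
descriptors = hog(
    face,
    orientations=9,          # gradient-orientation bins per cell
    pixels_per_cell=(8, 8),  # small cells capture local facial deformations
    cells_per_block=(2, 2),  # block normalization adds illumination robustness
)
```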

As features, besides HOG (we considered 4464 HOG descriptors), we used the intensity (expressed as a float from 0 to 5) of 17 facial AUs (AU01r, AU02r, AU04r, AU05r, AU06r, AU07r, AU09r, AU10r, AU12r, AU14r, AU15r, AU17r, AU20r, AU23r, AU25r, AU26r, AU45r), the presence of the facial action unit AU28c, and the estimated orientation of the subject’s gaze (x, y, and z coordinates of the left and right eye gaze directions). Even if gaze is not part of facial expressions, gaze direction may help in recognizing certain affective states, since these are related to cognitive processes and not necessarily to a response to a stimulus [33].

To extract these features, we used OpenFace 2.0 [16], a freely available tool capable of accurate facial landmark detection, recognition of a subset of Action Units (AUs), gaze tracking, and head pose estimation. The selected AUs are those that can be estimated with the OpenFace software.
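OpenFace’s FeatureExtraction tool writes its estimates to a CSV file; a sketch of assembling the AUs + Gaze feature vector from that output is shown below. The column names follow the OpenFace 2.0 output format (AUxx_r for intensities, AU28_c for presence, gaze_0_* and gaze_1_* for the two eyes’ gaze vectors); the file path is hypothetical.

```python
# A sketch of building the 24-dimensional AUs + Gaze feature vector from
# the CSV produced by OpenFace 2.0's FeatureExtraction tool.
import pandas as pd

AU_INTENSITY = [f"AU{n:02d}_r" for n in
                (1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 45)]
GAZE = ["gaze_0_x", "gaze_0_y", "gaze_0_z",
        "gaze_1_x", "gaze_1_y", "gaze_1_z"]

def load_features(csv_path):
    """Return one feature row per processed frame: 17 AU intensities,
    the AU28 presence flag, and the 6 gaze-direction coordinates."""
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()  # some OpenFace builds pad header names
    return df[AU_INTENSITY + ["AU28_c"] + GAZE].to_numpy()
```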

Figure 3 illustrates an example of a facial analysis performed with the OpenFace software. On the left side of the figure, the visualization of the extracted facial landmarks, gaze direction, and head pose is shown. In the middle, the cropped face and its HOG representation are visualized. On the right side, the presence of AUs and their intensity is shown.

Fig. 3. Face analysis with OpenFace.

To select the most accurate classification model, we tested the performance of three classification algorithms (Multi-SVM [23], Random Forest [24], MultiLayer Perceptron [25]) on three different sets of features: HOG, AUs, and AUs + Gaze.

To test the performance of the proposed approach, we used k-fold cross-validation with k = 10. Table 1 reports the results of the testing phase.

Table 1. A summary of the precision rates and F1 scores of each feature set (HOG, AUs, AUs + Gaze) for the three algorithms (Multi-SVM, Random Forest, MLP).

We can notice that the Multi-SVM classifier (cost = 1000 and gamma = 0.001) reached the best precision rate using the AUs + Gaze features. The Random Forest classifier on AUs and the MultiLayer Perceptron (MLP) on AUs + Gaze achieved a slightly lower precision than the Multi-SVM, but it was still quite high.
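A comparison of this kind can be sketched with scikit-learn stand-ins for the three classifiers. Only the SVM hyperparameters (cost = 1000, gamma = 0.001) are taken from our reported configuration; the random data below merely stands in for the extracted feature matrix and emotion labels so the sketch runs on its own.

```python
# A sketch of the 10-fold cross-validated model comparison with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(410, 24))    # placeholder for the extracted features
y = rng.integers(0, 7, size=410)  # placeholder for the seven emotion labels

models = {
    "Multi-SVM": SVC(C=1000, gamma=0.001),  # hyperparameters as reported above
    "Random Forest": RandomForestClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="precision_macro")
    print(f"{name}: mean precision = {scores.mean():.3f}")
```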

Figure 4 shows an example in which a student’s facial expression is recognized as boredom during an e-learning session.

Fig. 4. Recognition of a facial expression as “boredom” during an e-learning class.

5 Conclusions

This paper presented our research on emotions and their relation to the learning process in distance education. In particular, we developed a FER system able to recognize cognitive emotions from facial expressions in real time.

In previous studies, we obtained results indicating that emotions can be used as indicators of the quality of the student’s learning process [17, 18]. We plan to use the cognitive emotion recognition module to analyze emotional profiles during distance education courses (MOOCs, e-learning) and to reason about the possible causes of positive and negative experiences during learning, thus moving from recognition to the interpretation of the student’s mental state. In particular, we are planning a user study for monitoring the student’s mental state during the learning process and enhancing the student’s learning experience in real time through the use of gamification strategies [34], a virtual tutor [35], or a robot [36].