1 Introduction

Measuring emotions could be crucial in fields as varied as psychology, sociology, marketing, information technology, and e-learning. Consequently, several researchers have developed their own instruments to assess emotions [1]. The core channels/methods for measuring emotions are the following [2]: (1) questionnaires, (2) personal preference information, (3) speech recognition, (4) physiological data, and (5) facial expressions. Although this paper evaluates and uses the facial expressions method, the following paragraphs briefly highlight the main points of the aforementioned emotion recognition methods.

Many researchers have used static methods such as questionnaires and dialogue boxes in order to infer a user’s emotions. These methods are easy to administer but have been criticized for being static and thus unable to recognize changes in affective states. Moreover, Oatley recognized that self-reporting of emotions simplifies the recognition problem [3], whereas Dieterich, Malinowski, Kühme, and Schneider-Hufschmidt stated that this approach transfers one of the hardest problems in adaptive affective interfaces from the computer to the user [4]. On the other hand, an advantage of the questionnaire is that it provides feedback from the user’s point of view and not an outsider’s [1]. Questionnaires can be used to infer users’ emotions either stand-alone or in support of another affect recognition method. However, the way questions are framed and presented [5], the order in which questions are asked, and the terminology employed in them are all known to affect the subject’s responses [6, 7]. Similarly, there is evidence that judgments on rating scales are non-linear and that subjects hesitate to use the extreme ends of a rating scale [8]. Hence, when using verbal scales, one should make sure that the terminology employed, and the context in which it is presented, really reflect the subjective significance of the subject population [9].

Emotion recognition frameworks using personal preference information are based on the assumption that people do not necessarily recognize emotions just from signals seen or heard; they also use high-level knowledge and reasoning to process the goals, situations, and preferences of the user. A person’s emotions could be predictable if their goals and perception of relevant events were known [10]. Implemented in a computational model, this can be achieved by using agents and artificial intelligence techniques that reason about goals, situations, and preferences [11]. For example, if the system can reason about the reactions of a user from the input it receives (an assumption derived from the time of day, speed of reading, provided personal information, etc.), appropriate content could be displayed in a way adapted to the emotion or the mood of the user.

The modulation of voice intonation is one of the main channels of human emotional expression [12]. Certain emotional states, such as anger, fear, or joy, may produce physiological reactions [13], such as an increased heart rate and more rapid breathing. These in turn have quite mechanical, and thus predictable, effects on speech, particularly on pitch (fundamental frequency F0), timing, and voice quality [14]. Some researchers have investigated the existence of reliable acoustic correlates of emotion in the acoustic characteristics of the signal [12, 15]. Their results agree on the speech correlates that are derived from physiological constraints and correspond to broad classes of basic emotions, but disagree and remain unclear concerning the differences between the acoustic correlates of fear and surprise, or boredom and sadness. This is perhaps explained by the fact that fear produces physiological reactions similar to those of surprise, and boredom produces physiological reactions similar to those of sadness; consequently, very similar physiological correlates result in very similar acoustic correlates [14]. The task of machine recognition of basic emotions in informal everyday speech is extremely challenging.

Another valuable channel for emotion detection derives from the measurement of physiological quantities, such as temperature or blood pressure. This is important not only for the study of physiological processes and the clinical diagnosis of various diseases, but also for the estimation of emotional states. William James was the first to propose that patterns of physiological response could be used to recognize emotion [16]. Psychologists have been using physiological measures as identifiers of human emotions such as anger, grief, and sadness [17]. Usually, changes in emotional state are associated with physiological responses such as changes in heart rate, respiration, temperature, and perspiration [18]. The use of engineering techniques and computers in physiological instrumentation and data analysis is a new and challenging research practice, especially with regard to emotion recognition. For instance, researchers at the MIT Media Laboratory have been using sensors that detect galvanic skin response (GSR), blood volume pulse, respiration rate, and electromyographical activity of muscles [19]. The emotion mouse, an example of recent advances in affective computing, measures the user’s skin temperature, galvanic skin response (GSR), and heart rate, and uses these data to categorize the user’s emotional state [20]. It has also been suggested that facial electromyography (EMG) could be a potentially useful input signal in HCI [21, 22]. Therefore, there is a need for adequate measures to associate physiological measurements with definite emotional states, in order to assign them to conditions meaningful to a computer [23]. Since the physiological state is so closely associated with the affective state, an accurate model of a physiological response could enable interactive computer environments to effectively determine a user’s affective state and guide appropriately customized interactions [24]. Nevertheless, subjective and physiological measures do not always agree, which indicates that physiological data may detect responses that users are either unconscious of or cannot recall at post-session subjective assessment [25]. Moreover, the sensors might often fail and produce missing or unfavorable data, a common problem in many multimodal scenarios, resulting in a considerable reduction in the performance of the pattern recognition system [26].

Research evidence supports the existence of a number of universally recognized facial expressions of emotion, such as happiness, surprise, fear, sadness, anger, and disgust [27]. Therefore, estimating emotional experiences from objectively measured facial expressions has become an important research topic. Existing facial recognition systems either employ advanced video-based techniques [28] or measure the electrical activity of muscles with facial electromyography (EMG) [21].

An important issue is that many existing facial recognition systems rely on analyzing single facial images instead of tracking changes in facial expressions continuously [29]. It would be more meaningful if computerized learning environments could analyze the student’s facial expressions continuously, so as to react to changes in the student’s emotional state at the right time. In this regard, Essa and Pentland made the point that the lack of temporal information is a significant limitation in many facial expression recognition systems. Consequently, methods for analyzing facial expressions in human–computer interaction, especially those concerning computer-aided learning systems, should incorporate real-time analysis [28]. This can be achieved either by using advanced video-based techniques [28] or by measuring the electrical activity of muscles with facial electromyography (EMG) [21].

At present, machine vision techniques using video cameras are the predominant methods for measuring facial expressions [30–32]. A notable application is FaceReader, recently developed by Vicar Vision and Noldus Information Technology bv. FaceReader recognizes facial expressions by distinguishing the six basic emotions (happy, angry, sad, surprised, scared, and disgusted), plus a neutral state, with an accuracy of 89% [33]. The system is based on Ekman and Friesen’s theory of the Facial Action Coding System (FACS), which states that basic emotions correspond to characteristic facial models [34]. Several studies have used FaceReader for different purposes [35, 36].

With regard to learning, there have been very few approaches to affect recognition. Real-time analysis should be incorporated into human–computer interaction [2], especially in computer-aided learning systems. Previous studies in different fields have shown that FaceReader is a reliable measuring tool [35, 36]. However, learning and self-assessment are procedures with particular characteristics.

This paper evaluates the effectiveness of FaceReader 2.0 during a computer-based assessment (CBA). Accordingly, FaceReader’s efficiency was measured against the observations of two experts. Moreover, the proportions of students’ seven basic emotions were estimated during the CBA and compared between genders.

2 Methodology

The course was an introductory informatics course in the Department of Economic Sciences of a Greek university. The course contains theoretical and practical modules. In the theoretical module, students learn general concepts of Information and Communication Technology (ICT). In the practical module, students learn how to use word processing software and the Internet. The computer-based assessment (CBA) includes questions from both modules.

A total of 208 students enrolled to participate in the computer-based assessment, and appointments were then arranged. Finally, 172 of the 208 applicants attended their appointments: 60 males (35%) and 112 females (65%). The average age of the students was 18.4 years (SD = 1.01). Participation in the CBA was voluntary. The CBA consisted of 45 multiple-choice questions, each with 4 possible answers, and its duration was 45 min. The sequence of questions was randomized.

The use of the CBA was very simple. Each student had to choose the right answer and then push the “next” button. Each page included the question, the 4 possible answers, and the “next” button. The text was in Greek. Teachers did not offer any additional instruction at the beginning. Only a few students, who were not comfortable with the use of the assessment and asked for help, received further information and instructions. The CBA’s appearance was also kept simple, in order to avoid any effects of design and esthetics.

During the evaluation stage of a system, the effects of human–computer interaction (HCI) are often examined in what is called the “Wizard of Oz” mode, where a researcher hidden behind a curtain controls the system and makes observations [37]. Accordingly, each student took the test alone in a properly designed room, divided into two spaces by a partition. In the first space was the PC on which the CBA took place; the FaceReader camera was hidden in a bookcase. After all, it is well known that people express themselves more freely when they feel that they are on their own.

The two researchers were in the second space. FaceReader was connected to another PC in that space, so the researchers were able to watch the facial expressions and emotions of the participants in real time. The researchers were also able to observe the student’s actions during the test through VNC viewer software, which displayed the student’s screen in a separate window on the researchers’ screen (Fig. 1). At the same time, each researcher recorded the student’s emotions as measured by FaceReader, together with his/her own estimation of the student’s emotions, based on the student’s facial expressions and actions.

Fig. 1 Researchers’ screen: FaceReader and VNC viewer (student’s screen)

In a live analysis, FaceReader’s output is a number of charts and files. Each emotion is expressed as a value between 0 and 1, indicating the intensity of the emotion: “0” means that the emotion is not visible in the facial expression, and “1” means that the emotion is fully present. Only emotions with a value ≥0.5 were evaluated by the researchers. Changes in FaceReader’s measurements, in relation to the student’s facial expressions and/or actions observed by the researchers during the test, determined whether a FaceReader measurement was confirmed or not.
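To make the rule concrete, the following is a minimal Python sketch of the thresholding step, assuming a hypothetical per-timestamp record format; it is illustrative only and does not reproduce FaceReader’s actual export schema.

```python
# Minimal sketch of the >= 0.5 intensity rule described above.
# The record format is hypothetical, for illustration only.

THRESHOLD = 0.5  # only emotions with intensity >= 0.5 were evaluated

# Each record: (timestamp in seconds, {emotion: intensity in [0, 1]})
records = [
    (12.4, {"neutral": 0.82, "angry": 0.31, "happy": 0.05}),
    (13.0, {"neutral": 0.46, "angry": 0.57, "disgusted": 0.52}),
]

for timestamp, intensities in records:
    # keep only emotions considered "present" at this timestamp
    visible = {e: v for e, v in intensities.items() if v >= THRESHOLD}
    if visible:
        print(f"t={timestamp:5.1f}s -> evaluated: {visible}")
```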

The purpose of this study has two dimensions in the context of CBA: the first is to examine FaceReader’s efficiency in measuring students’ instant emotions, and the second is to provide empirical data concerning students’ instant emotions.

3 Results

Firstly, it had to be examined whether the two researchers’ estimations were statistically different. It was important to show that these estimations were free from the individual researchers’ subjective opinions; that is, any researcher would have a good chance of obtaining the same results if the experiment were repeated. Thus, a contingency table was created for each emotional state and overall. The two groups were the two researchers, and the outcomes were agreement and disagreement with FaceReader (Table 1). The difference between the two researchers was not statistically significant for any emotional state or overall.

Table 1 Contingency table
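The section does not name the exact statistical test; a chi-square test of independence on this 2×2 contingency table (researcher × agreement) is one standard choice. A hedged sketch follows, with placeholder counts rather than the study’s data.

```python
# Chi-square test of independence on a 2x2 contingency table
# (researcher x agreement). Counts below are placeholders only.
from scipy.stats import chi2_contingency

#             agreed  disagreed
table = [[3200,  500],   # researcher 1 (placeholder counts)
         [3180,  520]]   # researcher 2 (placeholder counts)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
# p > 0.05 -> no significant difference between the two researchers
```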

Secondly, for the 172 students, 7,416 different emotional states were recorded by FaceReader. Table 2 shows the results for each emotional state. The second column shows confirmed records, i.e., FaceReader records that were also confirmed by the researchers. In contrast, the third column shows all FaceReader records (confirmed + not confirmed) during the CBA. Researchers and FaceReader were almost in full agreement regarding the Neutral (99%) and Happy (90%) emotions. Moreover, researchers and FaceReader showed high agreement for the Scared (87%), Surprised (82%), and Sad (79%) emotions. However, the agreement was lower for the Disgusted (70%) and Angry (71%) emotions. Nevertheless, there was high overall agreement between the emotions measured by FaceReader and the researchers’ opinions (87%).

Table 2 FaceReader and researchers’ agreement on various emotional states
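The agreement figures in Table 2 are simple proportions, confirmed records divided by all FaceReader records for each emotion; a minimal sketch with placeholder counts follows.

```python
# Agreement rate per emotion: confirmed FaceReader records divided by
# all FaceReader records for that emotion. Counts are placeholders;
# the study's actual counts appear in Table 2.
confirmed = {"Neutral": 3540, "Happy": 270, "Angry": 500}
total     = {"Neutral": 3576, "Happy": 300, "Angry": 704}

for emotion, n_total in total.items():
    rate = confirmed[emotion] / n_total
    print(f"{emotion}: {rate:.0%} agreement")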

Moreover, Table 3 shows the agreement between the researchers and FaceReader on the emotional states observed for each gender. The fourth column of Table 3 presents the proportion of confirmed records to total FaceReader records (confirmed and not confirmed by the researchers) for each emotion in each gender. The null hypothesis was that the proportions of confirmed records to total FaceReader records for each emotion would not differ statistically between genders. The results of the Z test are presented in columns 5 and 6 of Table 3. For the Neutral, Happy, and Angry emotions, FaceReader showed almost the same results in both genders. The Scared emotion was recognized by FaceReader significantly better for males than for females, whereas the Sad emotion was recognized significantly better for females than for males. Thus, gender differences in FaceReader’s performance were observed in 2 out of 7 emotional states.

Table 3 FaceReader and researchers’ agreement on various emotional states observed regarding each gender
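A hedged sketch of a two-proportion Z test of the kind used for these gender comparisons is given below; the counts are placeholders, not the study’s data.

```python
# Two-proportion Z test: are two independent proportions different?
# Counts are placeholders, not the study's data.
from math import sqrt
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Z test for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))      # two-sided p value

# e.g., confirmed Scared records for males vs. females (placeholders)
z, p = two_proportion_z(90, 100, 150, 200)
print(f"z = {z:.3f}, p = {p:.4f}")     # p < 0.05 -> significant difference
```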

Table 4 presents the confirmed (column 2) and total (column 3) proportions of each instant emotion’s records out of all records during the CBA. A Z test was again used to compare the proportions of the two groups and determine whether they differ significantly from one another. It was expected that Neutral would be the instant emotion with the highest proportion: during the CBA, students’ facial expressions mostly stayed calm, and students changed their facial expressions instantly only when they read questions or answers that provoked negative or positive emotions. However, the percentage of Neutral appearances among the overall emotions observed by FaceReader alone was lower (48%) than the percentage of confirmed Neutral appearances (55%) among the overall confirmed emotions (observed by FaceReader and confirmed by the researchers). The co-appearance, in FaceReader’s observations, of Neutral with other emotions such as Angry and Disgusted increased the total records and thus decreased Neutral’s percentage. In such cases, the researchers most of the time agreed only with the Neutral observation.

Table 4 Confirmed and total records percentages for each emotion records out of overall records during CBA

On the other hand, the percentage of confirmed Disgusted and Angry emotions among the overall confirmed observations was lower than among the overall observations of FaceReader alone. However, Surprised, Happy, Scared, and Sad were not statistically different, which indicates that FaceReader’s and the researchers’ observations agreed concerning these emotions during the CBA. The results also showed that “negative” emotions (Angry, Sad, and Disgusted) appeared more often than positive emotions such as Happy.

Table 5 presents the confirmed (column 2) and total (column 3) percentages of instant emotions for each gender. Neutral and Angry were again statistically different for both genders. However, Disgusted was statistically different only for males, which indicates that the FaceReader and researchers’ observations agreed concerning females’ Disgusted emotion. Finally, concerning the Happy, Scared, Surprised, and Sad emotions, FaceReader’s and the researchers’ observations were statistically indistinguishable in both genders.

Table 5 Confirmed and total records’ percentages for each emotion records out of overall records during CBA in each gender

Moreover, we compared the confirmed percentages of the two genders for each emotion’s records out of the overall confirmed records during the CBA. Table 6 shows whether the differences between the two genders are statistically significant. The results indicated that males were Disgusted and Angry more often than females, whereas females showed Neutral and Happy facial expressions significantly more often than males. Surprised, Scared, and Sad showed no significant difference between the two genders regarding confirmed records.

Table 6 Statistical significance of the differences between the confirmed percentages for each emotion records out of overall confirmed records during CBA in each gender

4 Discussion

Measuring instant emotions through facial expressions is a well-known method. However, this knowledge and technology have not yet been used extensively in learning environments. The aim of this study was firstly to examine the effectiveness of FaceReader during a computer-based assessment. In parallel, we presented the percentages of instant emotions that arose during the CBA; in other words, we showed how the students felt instantly while taking the CBA. Furthermore, we extended our analysis to gender in order to highlight differences between males and females.

The results showed that FaceReader is capable of measuring emotions with an overall agreement of 87% during CBA (Fig. 2) and that it could be successfully integrated into a computer-aided learning system for the purpose of emotion recognition. Specifically, FaceReader successfully recognized the Surprised, Happy, Scared, and Sad emotions, and was also successful for Neutral (Fig. 2).

Fig. 2 FaceReader and researchers’ agreement on various emotional states

Moreover, the results indicated that FaceReader did not show significant differences in emotion recognition between genders, except for the Sad and Scared emotions (Fig. 3): for Sad, FaceReader was more successful for females, whereas for Scared it was more effective for males.

Fig. 3 FaceReader and researchers’ agreement on various emotional states observed regarding each gender. *Emotions with significant differences regarding emotion recognition between genders

Our analysis revealed limitations concerning the distinction between Neutral, Angry, and Disgusted for males during the CBA. Practitioners and researchers could therefore improve facial emotion recognition methods to better distinguish between Neutral, Angry, and Disgusted in the context of CBA. Specifically, Figs. 4, 5, and 6 show examples of FaceReader’s limitations during the CBA. As discussed earlier, in most cases where FaceReader simultaneously measured Angry and Disgusted, the researchers agreed only with the presence of an Angry emotion (Fig. 4). Some movements of the jaw, mouth, and nose may have interfered with FaceReader’s accuracy.

Fig. 4 Angry and Disgusted emotions co-appearance

Fig. 5 Angry and Neutral emotions co-appearance

Fig. 6 Modeling failed

Additionally, FaceReader often measured an Angry emotion simultaneously with a Neutral one, but Neutral was the only emotion confirmed by the researchers (Fig. 5). This particular disagreement was expected: when participants read the questions, many of them had a clouded brow, a facial expression people adopt when reading something with great concentration. Zaman and Shrimpton-Smith came to the same result [1]. This may be the reason why FaceReader so frequently measured an Angry emotion at the same time as a Neutral one.

Moreover, FaceReader faced limitations with participants who wore glasses or had piercings. Other problems were caused by distinctive characteristics of some persons, such as big noses, bushy brows, or small eyes and chins. Another difficulty was fringes reaching down to the eyebrows (Fig. 6).

However, these limitations are being addressed. Researchers are currently working on classifying features that are located outside the modeled area of the face (e.g., hair) or features that are poorly modeled, such as wrinkles, tattoos, piercings, and birthmarks. Moreover, person identification will be added to the system [33].

Our analysis also included the measurements of the different instant emotions that appeared during the CBA. Neutral was the dominant confirmed instant emotion, at 55% (Fig. 7). As noted earlier, most of the time students’ facial expressions stayed calm, and they changed their facial expressions only when they read something that changed their emotions, such as a very difficult or a very easy question. Besides Neutral, the share of confirmed Angry was also very large, at 20% (Fig. 7). This is a crucial result: Angry is a negative emotion that could disrupt a student’s effectiveness during a self-assessment or a learning procedure [38]. Another negative confirmed instant emotion with a notable percentage during the test was Sad (9.1%); similarly, Sad could have negative effects on a student’s attention and motivation [39]. Disgusted (4.6%) and Scared (3%) are two other negative confirmed emotions that were not observed extensively (Fig. 7). However, their measurement is also important, because practitioners and researchers who wish to manage students’ instant emotions have to take Disgusted and Scared into account as well [40]: during a CBA, these two negative emotions can influence the student’s emotional experience. Scared and Disgusted were mostly observed after a long series of wrong answers. On the other hand, confirmed Happy (4%) also had a small percentage during the CBA (Fig. 7). This result is understandable, since a test is an anxiety-provoking procedure. Happy was observed when students answered a difficult question correctly, or during the last questions if they felt that they had already reached a good score.

Fig. 7 Confirmed records percentages for each emotion out of overall confirmed records during CBA

Moreover, the gender analysis revealed some useful results (Fig. 8). Surprised, Scared, and Sad showed no significant difference between genders. Males presented significantly larger percentages for Disgusted and Angry, which may indicate that males lose their temper and concentration more easily. On the other hand, females appeared to experience more Neutral and Happy emotions.

Fig. 8 Confirmed records for each emotion out of overall confirmed records during CBA for each gender. *Emotions with significant differences regarding confirmed records percentages between genders

When the effect of negative emotions (such as Sad, Fear, or Angry) is too intense, the student’s performance can be seriously impaired. Frequent errors could create the expectation of more errors, thus increasing negative emotions and leading to even more wrong answers, until the student’s performance collapses [41]. Positive emotions may also occasionally necessitate instruction. For instance, providing the correct answer to a hard question could induce positive emotions such as joy and enthusiasm, but also lead to loss of concentration if too much attention is given to the elicited emotions.

Although fear was not often observed in this study, it is still an emotion that can have a detrimental effect on students’ performance during a test [42, 43]. Neither was Happy often observed, but, as noted above, positive emotions may also occasionally necessitate instruction; for instance, positive emotions can lead students to focus on the excitement and undervalue the effort required to achieve a successful result [44, 45]. On the other hand, the Angry and Sad emotions were observed often enough in this study to be emotions “calling for feedback.”

Regarding emotional feedback, Economides proposed an emotional feedback framework, taking Computer Adaptive Testing (CAT) systems as the field of application, in order to manage emotions [44, 46]. Emotional feedback can occur before and after the test, during the test, and before and after a student’s answer to a question [46, 47]. In all these cases, emotional feedback can be provided either automatically, according to the student’s emotional state, or upon the student’s or the teacher’s request. Humor and jokes, amusing games, expressions of sympathy, reward, pleasant surprises, encouragement, acceptance, and praise, but also criticism, are some of the possible actions that could be practiced by a testing system [44].

Finally, gender analysis revealed that females exhibited significantly higher percentages for Neutral and Happy emotions. On the other hand, males appeared to experience more Disgusted and Angry emotions. Therefore, the results of this study indicate that gender differences should be seriously taken into account when designing emotional feedback strategies for computerized tests.

5 Conclusions

An instrument like FaceReader is crucial for the improvement of computer-aided learning systems. Educators will have the opportunity to better recognize how their students are feeling during learning procedures, and they will also be able to give better and more effective emotional feedback in learning, self-assessment, or Computer Adaptive Testing (CAT) systems [41].

To the best of our knowledge, this is the first study to evaluate an emotional facial recognition instrument during CBA. Our analysis yields some useful results. Firstly, FaceReader is efficient in measuring emotions, with 87% overall agreement during CBA. Specifically, FaceReader successfully recognized the Neutral, Surprised, Happy, Scared, and Sad emotions, while it faced some limitations with Angry and Disgusted. Moreover, our research indicates that FaceReader did not show significant differences in emotion recognition between genders, except for Sad, for which it was more successful for females, and Scared, for which it was more effective for males.

Besides the evaluation of FaceReader, this study provides empirical data on the emotional states of students during computer-based assessments and learning procedures. Our analysis shows that Neutral (55%) was the dominant instant emotion, followed by Angry (20%) and Sad (9%). Students also experienced the other four instant emotions that FaceReader is able to measure, at lower percentages: Disgusted at 4.5%, Happy at 4%, Surprised at 3.3%, and Scared at 3%. Finally, the gender analysis revealed that females presented significantly larger percentages for Neutral and Happy, whereas males appeared to experience more Disgusted and Angry emotions.

To conclude, our study provides important results regarding the effectiveness of FaceReader and students’ instant emotions during CBA. These results could be useful for tutors, researchers, and practitioners.