1 Introduction

Several approaches can be used to measure mood: graphical discrete scales such as the Likert scale, continuous scales such as the visual analogue scale (VAS), and more abstract methods such as colors or pictures.

Discrete scales such as the Likert scale [22] and continuous scales such as the visual analogue scale (VAS) [3] are suitable for mood assessment, as they are generally intuitive and have been widely used in practice. However, using such scales requires participants to map the concept of mood onto a numerical or graphical scale. This may result in some information loss, which makes graphical scales less practical for mood assessment. Furthermore, graphical scales have no inherent inclination to represent mood [11].

Other approaches to measuring affect rely on affective pictures [13, 20, 21], smileys [2, 14], colors [7, 19] or physiological data [15, 17]. The Photographic Affect Meter (PAM), for example, uses affective pictures to measure affect. It consists of 16 images arranged in a two-dimensional space according to their valence and arousal ratings. The authors of [2] use discrete scales represented through icons and sad/happy smileys for the assessment of arousal and valence, respectively. The approach in [7] uses colors to span a two-dimensional emotion space: different colors represent emotions, while color shades represent the intensity of the emotions. All of these approaches provide an easily accessible way of reporting mood; however, they are limited in the number of emotional intensities they offer.

These abstract representations, while very expressive, do not translate well between people, as they are highly subjective in nature. To obtain consistent measurements, we need a representation that is universally understood across populations and provides enough variation to describe a broader space.

Facial expressions are inherently linked with emotions and are a visual tool for communicating our emotions to the surrounding world. They are embodied representations of our feelings and are as such intrinsically suitable for measuring mood. We are also well versed in using and recognizing facial expressions, which supports the universality of the representation. Research has identified distinct facial expressions, each of which is universally associated with a specific basic emotion [5, 6].

Lorish et al. introduced the concept of using a face scale to measure mood [11]. They argue that facial expressions are tuned to capture and represent mood, because facial feature variations are universal, valid indicators of mood [5, 6]. Kamashita et al. explored the reliability of such scales by comparing them to VAS [9]. The authors evaluated two facial expression-based scales against a VAS scale and found a correlation of 0.68–0.70 between the assessments. In addition, in a user experience questionnaire, participants preferred the face scales to the VAS scales. This suggests that such scales might offer some unique interaction quality. Another study, conducted by McKinley et al., explored the consistency of a facial expression-based scale [12]. Seven photographs of facial expressions with increasing intensities had to be positioned on a VAS scale. Six out of seven photograph placements were almost equidistant and fell within the expected intervals.

If we are to use or improve such a method, we need to make sure that it is reliable, in the sense that assessing mood with facial expressions yields results at least comparable to those of established mood measurement methods, and sensitive, in the sense that assessments provided with such a scale effectively capture changes in mood.

A growing body of HCI research has focused on the impact of emotions and emotional awareness on emotional wellbeing and mental health [16]. A facial expression-based mood assessment system would be particularly useful in the context of affective disorders such as depression, which are characterized by disturbances in mood as one of the main symptoms. Being able to frequently assess a person’s mood could provide a reasonable estimate of that person’s state of well-being and enable, for example, the early detection of depressive episodes.

2 Method

We developed an Android application that features a bipolar sad-to-happy facial expression scale and a VAS scale. The facial expression scale is represented through an image of a face, which displays happier or sadder expressions as the user slides a finger vertically along the display (see Fig. 1). The middle point of the scale is the neutral expression. Navigating upwards displays increasingly happier expressions, while navigating downwards displays increasingly sadder ones. The image space comprises 101 images: 50 represent happiness, 50 represent sadness and one represents the neutral expression. The images were taken from the male sadness–happiness facial expressions of the dynamic visual analogue mood scales (D-VAMS) project [1]. The scale is conceptualized as a brief, nonverbal mood assessment instrument to be used for self-reporting. A slider with 101 discrete points represents the VAS scale (see Fig. 2). Text anchors at both extremes denote the respective emotions (i.e. sadness and happiness). Both scales aim to capture the valence of the provided assessment. For each assessment, both scales were initialized in the neutral position, i.e. the slider was positioned in the middle and the face showed a neutral expression.
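The mapping from finger position to one of the 101 face images can be sketched as follows. This is a minimal illustration and not the application's actual code: the function name `face_index`, the absolute (rather than drag-relative) mapping, and the convention that index 0 is the saddest image and index 100 the happiest are all assumptions made for the example.

```python
def face_index(touch_y: float, screen_height: float, n_images: int = 101) -> int:
    """Map a vertical touch position to an image index.

    Assumed convention (hypothetical): 0 = saddest, 50 = neutral, 100 = happiest.
    """
    if screen_height <= 0:
        raise ValueError("screen_height must be positive")
    # Clamp the touch to the screen area.
    clamped = min(max(touch_y, 0.0), screen_height)
    # Screen y grows downwards, but moving the finger upwards should
    # select happier expressions, so the fraction is flipped.
    fraction_up = 1.0 - clamped / screen_height
    return round(fraction_up * (n_images - 1))


# A touch in the vertical middle of a 1000 px display selects the neutral image.
print(face_index(500, 1000))
```

With an odd number of images, the middle of the display falls exactly on the neutral expression, which matches the neutral starting position described above.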

Fig. 1. Screenshots of the D-VAMS face scale assessment from the application [1].

Fig. 2. Screenshot of the VAS scale assessment as taken from the application.

2.1 Participants

We recruited 11 healthy participants from a research environment via flyers. Eight women and three men took part in the study, with an average age of 29. All participants received and signed an informed consent form.

2.2 Assessment

The experiment aimed to evaluate whether a facial expression-based scale would perform comparably to a VAS scale for mood assessment and whether the user experience would differ between the two scales.

Participants were asked to read 30 vignettes and to use a smartphone provided by the experimenter for the assessment. Half of the vignettes were taken from [10] and were labeled with a positive emotion. The negative vignettes were collected from various online blogs and forums. The vignettes were paraphrased to portray a story from a third-person perspective.

2.3 Procedure

Before starting the experiment, participants were presented with three training vignettes to become acquainted with the system. The results from the training set were omitted from the final dataset. Participants were asked to read each vignette and then use the application to assess the mood of the main actor in the vignette using both the VAS and the facial expression-based scale. All participants received the vignettes in the same order. For each vignette, the two scales were presented in a randomized order.
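The presentation scheme described above (fixed vignette order, randomized scale order per vignette) can be sketched as follows. The function name `presentation_plan`, the scale labels and the optional seed are hypothetical and only serve the illustration; the source does not describe how the randomization was implemented.

```python
import random

SCALES = ("face", "vas")  # hypothetical labels for the two interfaces


def presentation_plan(n_vignettes: int, seed=None):
    """Return (vignette number, scale order) pairs.

    Vignettes keep a fixed order; the order of the two scales is
    shuffled independently for every vignette.
    """
    rng = random.Random(seed)
    plan = []
    for vignette in range(1, n_vignettes + 1):
        order = list(SCALES)
        rng.shuffle(order)
        plan.append((vignette, tuple(order)))
    return plan


for vignette, order in presentation_plan(5, seed=42):
    print(vignette, order)
```

Passing a seed makes a plan reproducible, which can help when the order of presentation needs to be reconstructed during analysis.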

At the end, all participants filled out a user experience survey comprising 26 questions. The survey can be found in Appendix A. Eighteen questions evaluated the method and implementation on unipolar Likert scales. They covered ease of use, suitability for mood assessment, accuracy, satisfaction, user experience, responsiveness, intuitiveness and preference. Two questions evaluated the preference for and speed of both implementations on bipolar Likert scales. Two yes/no questions asked the participants whether they would be able to use the interfaces without instructions. The survey also included four open-ended questions, which inquired about any difficulties participants might have had with the application or invited them to share their insights on how the assessment could be improved.

3 Results

The data were analyzed using Python 3.6 with the NumPy and pandas libraries. The plots were created using the seaborn library.

Pearson’s correlation coefficient was calculated between the VAS and the facial expression scale assessments, which yielded a correlation of 0.97 across all participants. Figure 3 displays the results as a scatterplot, where the assessments obtained from the VAS and the facial expression scale are plotted on the Y- and X-axis, respectively. The lack of ‘neutral’ vignettes in the stimulus set explains the sparsity of assessments in the central region of the plot.
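For reference, Pearson's correlation coefficient between paired assessments can be computed as in the sketch below. It uses only the Python standard library; the two rating lists are made-up values for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired ratings on the 0-100 range (not the study data).
face_ratings = [95, 88, 12, 5, 70, 20]
vas_ratings = [90, 85, 15, 10, 65, 25]
print(round(pearson_r(face_ratings, vas_ratings), 3))
```

Each pair consists of the two assessments one participant gave for one vignette, so the coefficient reflects agreement between the scales at the level of individual assessments.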

Fig. 3. Scatterplot of the assessments for each vignette and participant for the face scale and VAS.

The average time to complete an assessment with the VAS scale was 4.2 s, while using the face scale took 5.6 s. Figure 4 depicts the relationship between assessment values provided with each interface and the respective duration.

Fig. 4. KDE plot depicting the relationship between assessment values and duration for both interfaces. (The facial expression values extend beyond the maximum value of 100 because of the Gaussian kernel estimate used to model the data, whose input is the assessment values and their respective durations. The effect arises because the maximum value was selected more often in the face scale assessments (see Fig. 3).)
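The spill-over beyond 100 noted in the caption is a general property of Gaussian kernel density estimates, as the following one-dimensional sketch illustrates. It is a simplification: the plot in Fig. 4 is two-dimensional, and the fixed bandwidth and the sample values here are assumptions chosen for the example, not the settings or data of the study.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on each sample."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


# Hypothetical ratings with many selections of the maximum value 100.
ratings = [100, 100, 100, 98, 95, 90, 20, 10]
density = gaussian_kde(ratings, bandwidth=5.0)

# Every kernel centred on 100 extends symmetrically to both sides, so the
# estimate is non-zero above 100 even though no rating can exceed it.
print(density(105) > 0)
```

The more assessments sit at the maximum value, the more probability mass the estimate places outside the valid range.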

Table 1 presents the part of the user experience survey results that rated the method and implementation of each scale individually. The questions were answered on a five-point Likert scale, where 1 designated a low/negative score and 5 a high/positive one.

Although none of the results were statistically significant, which may be due to the relatively low participant count, they show a consistent preference for the face scale on most aspects. Particularly interesting are the noticeable differences in the scores for satisfaction in the method section and for user experience in the application section. On both accounts the face scale was preferred to VAS, with only two participants favoring the VAS on both. Both of these participants left the open-ended questions blank. Four participants found the slider less responsive than they would have liked. This may have partially influenced the user experience scores and the speed of assessment for the VAS scale.
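Table 1 reports t-values and p-values, but the text does not state which test was used. Since every participant rated both scales, a paired t-test is one plausible choice, and the sketch below computes its test statistic under that assumption. The function name `paired_t` and the ratings are hypothetical. The p-value is omitted because it requires the t-distribution, which the Python standard library does not provide.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical five-point Likert ratings from four participants (not study data).
face_scores = [4, 5, 3, 4]
vas_scores = [3, 3, 3, 2]
t_value, dof = paired_t(face_scores, vas_scores)
print(round(t_value, 2), dof)
```

With only 11 participants, such a test has 10 degrees of freedom, so differences between the scales need to be fairly large and consistent to reach significance.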

Only one participant indicated that they would need instructions before using the face scale.

Table 2 presents the preferred method of assessment as well as which interface was considered faster. Both were assessed on a bipolar Likert scale, where 1 favored the face scale and 5 favored the VAS. The results show that most participants found the VAS scale slightly faster than the face scale. This is consistent with the results from Fig. 4, which showed an average difference of 1.4 s between assessments with the VAS and the facial expression scales.

However, most participants preferred the face scale for mood assessment. The two participants who preferred the VAS scale in the previous section preferred it here as well.

Table 1. Mean (standard deviation), t-value and p-value scores on aspects of the method and application, rated on a five-point Likert scale for negative-to-positive responses.

Table 2. Ratings on a 1-to-5 Likert scale for preference and speed of assessment. 1 is the maximum value for the facial expression scale and 5 is the maximum value for the VAS scale.

Several participants noted in the open-ended questions that a simple sadness-happiness scale is insufficient to capture mood for the presented vignettes. One participant wrote: “I think there is more to the emotional spectrum than just happiness or sadness. Other emotions might be relevant to depression as well. Such as fear, disgust, anger, disappointment, frustration, satisfied, grateful, relaxed, nervous, challenged.” Interestingly, one participant pointed out that they liked that the face scale featured a real face instead of a cartoon-like character: “I like the use of a real person and not a cartoon or smiley-type of representation.”

4 Discussion

First, we would like to acknowledge that the study was conducted as a pilot and aimed to give us some insight into the proposed assessment method. As several participants pointed out, an approach featuring only sad and happy facial expressions is not sufficient for true mood assessment. The study was set up to assess mood only on a sadness-happiness scale. It remains an open question how scales featuring multiple mood dimensions would perform. Future research will aim to assess interfaces featuring multiple facial expressions and to produce a more comprehensive tool for mood assessment.

The high correlation between the two assessments points to a high consistency with an already established mood measurement method such as VAS. Surprisingly, this holds despite the fact that complex emotions, such as awe or compassion, were present in the vignettes. We acknowledge that the vignettes were presented in the same order for all participants, which might have introduced a carry-over effect. This effect, however, would be consistently present in both assessments. The randomized order in which the scales were presented after each vignette ensured that participants could not simply ‘seek out’ the corresponding value on the second scale. Furthermore, the facial expression scale provided no numerical reference for the currently selected value, which made it more difficult to carry over values from one scale to the other. Unfortunately, the design does not allow us to establish whether either scale ‘outperforms’ the other. This is due to the mismatch between the emotions portrayed by the vignettes and the dimensions available on the scales. Furthermore, the negative vignettes have not been rated. It would be interesting, however, to evaluate a multidimensional facial expression-based scale with a validated set of stimuli. Such an approach could provide some insight into how sensitive and accurate a facial expression-based scale is in capturing mood.

The slightly shorter average time per VAS assessment can be attributed to the scale space being completely visible. Participants could immediately select a value at the extremes, while the face scale needed to be ‘browsed’. As the provided stimuli were emotionally charged, most of the assessments veered away from the neutral expression. Figure 4 relates the time it took to complete an assessment with each scale to the assessment value. Although the facial expression scale had to be navigated, this did not noticeably influence assessment time, as there is no pronounced relationship linking longer assessment times with assessments at the extremes of the scale. This suggests that the interface could be navigated easily with negligible slowdown, and hints that the scale can be used for frequent assessments. A potential application for this method would be as an ecological momentary assessment (EMA) tool [18]. A longitudinal study might reveal whether such a scale remains viable when it is used multiple times per day.

Most participants preferred the face scale on most accounts, despite the slightly longer time required to provide an assessment; however, some still found the VAS scale to be more adequate for mood assessment. The preference could be due to the scale providing a better interaction experience or due to a ‘novelty’ factor. A real-world application would reveal whether the preference for such a scale persists with daily use.

It would also be interesting to see how such a scale would perform in a clinical population. It is known that clinical populations have an attentional bias towards sadder-looking faces and perceive more negative expression in ambiguous faces [4, 8]. Such a use case could result in more frequent and reliable mood tracking, which could open up opportunities for the design of intervention systems. Such an approach could be further augmented by sensor data and enable more comprehensive monitoring of patients.

5 Conclusion

This pilot study shows that assessing mood with a face scale provides results similar to those obtained with a visual analogue scale. Additionally, most participants indicated that they preferred the face scale to the visual analogue scale. The way the user interface was conceptualized resulted in slightly longer assessment times with the facial expression-based scale. However, most participants preferred such a scale in terms of ease of use, user experience and satisfaction.