
1 Introduction

There are different ways to evaluate a product, whether it is a physical good, a service, or a computational solution. To conduct a practical assessment, it is crucial to know how and when to use the different types of assessment available in the literature. Human-Computer Interaction (HCI) is a research area of Computer Science in which evaluation plays a central role. During the evaluation stage, problems in the interface and in user interaction that went unnoticed during design and development are identified and corrected. Thus, after a systematic and careful evaluation, the user is more likely to receive a safer, more effective product that, above all, does not harm their experience of using it.

In the context of evaluation and HCI, another concept intrinsically associated with usability is User eXperience (UX). According to Nielsen [36], the user experience encompasses all aspects of the end user’s interaction with a company, its services, and its products. More specifically, UX concerns how people feel about a product and the pleasure and satisfaction they derive from using it.

User emotion, in turn, is no longer considered only in relation to unexpected system responses or frustration with incomprehensible error messages. It is now understood that a wide range of emotions plays an essential role in all tasks performed on the computer. When users interact with computer systems, their emotions are fundamental to understanding the user experience [4, 47].

This shift in how user emotion is viewed in interactive systems has raised the need to better understand what emotion is and how it influences the user during interaction. However, even though the term emotion is used very often and several studies in the literature address this issue, there is no consensus on the concept, which is controversial even among specialists in the field [46].

Given that HCI lies at the intersection of Psychology and the Social Sciences on the one hand and Computer Science and technology on the other, it is crucial to understand how different areas of knowledge conceive of and assess individuals’ emotional responses. Hence, we conducted a Systematic Mapping (SM) study whose objective is to identify instruments for evaluating emotional responses and to find instruments from other fields that can be systematized and incorporated into Computing or other areas.

The remainder of this paper is organized as follows: Sect. 2 describes the theoretical foundation; Sect. 3 describes the protocol for planning, conducting, and reporting the systematic mapping; Sect. 4 synthesizes the results obtained through the mapping; and Sect. 5 presents our final remarks and conclusions on the subject.

2 Theoretical Foundation

In this section, we present a summary of the study of emotions. The literature offers several definitions of emotion. According to Young [60], emotion is an acute disturbance of the individual, of psychological origin, involving behavior, conscious experience, and visceral functioning. For Ekman [12], emotion refers to the process by which an elicitor is appraised automatically or in an extended way; an affect program may or may not be triggered, and organized responses may occur, albeit more or less managed by attempts to control emotional behavior.

According to Izard [21], emotion is a complex concept with neurophysiological, neuromuscular, and phenomenological aspects. At the neurophysiological level, emotion is defined primarily in terms of patterns of electrochemical activity in the nervous system. At the neuromuscular level, emotion is primarily facial activity and facial patterning, and secondarily a bodily response. At the phenomenological level, emotion is essentially a motivating experience, or an experience that has immediate meaning and significance. These definitions were found in the work of Kleinginna and Kleinginna [25], who compiled 92 definitions and nine skeptical statements from a variety of sources in the emotion literature.

For Coan and Allen [9], emotion is too broad a class of events to be a single scientific category. As psychologists use the term, it includes the euphoria of winning an Olympic gold medal, a brief startle at an unexpected noise, deep and unrelenting pain, and the fleeting pleasant sensations of a warm breeze. It can also refer to cardiovascular changes in response to watching a film, the stalking and murder of an innocent victim, lifelong love for a child, feeling excited for no known reason, and interest in a newsletter.

The boundaries of emotion can be so blurred that almost anything can be characterized as emotion, and experts do not agree on what is and is not an emotion. All the different types of events covered by this term are important, some of them vitally so. Nevertheless, it is increasingly evident that not all of these events can be explained in the same way. No description and evaluation framework can do justice to this heterogeneous class of events without differentiating one type of event from another [9].

2.1 Emotion Evaluation

The definition of Scherer [45] states that: “Emotion is defined as an episode of synchronized and interrelated changes in the states of all or most of the five subsystems of the organism in response to the evaluation of an external or internal stimulus event as relevant to the main concerns of the organism”. We adopted this definition in this study because it is one of the most comprehensive available. The five components are:

  1. Cognitive evaluations, which evaluate objects and events;

  2. Action tendencies, responsible for preparing and directing actions;

  3. Motor expressions (facial and vocal expressions), which communicate reactions and behavioral intentions;

  4. Physiological reactions (physical symptoms), responsible for regulating the body;

  5. Subjective feelings (conscious experience), which monitor the organism’s internal state and its interaction with the environment.

This study focuses on the subjective feelings component; consequently, only self-report instruments were considered in the systematic mapping. In Psychology, a self-report is any test, measure, or survey based on an individual’s own account of their symptoms, behaviors, beliefs, or feelings. Widely used examples are interviews (structured or unstructured) and questionnaires, which are usually administered on paper or online.

3 Methodology

This section outlines the protocol used to carry out this study. The protocol consists of five activities: defining the research question, the search process, the search strategy (inclusion/exclusion criteria), the data extraction strategy, and the synthesis of the extracted data.

3.1 Research Question

The present systematic mapping study addresses a single main research question, which we named RQ: “What self-report instruments are used to evaluate individuals’ emotional responses?”.

Answering this question reveals which self-report instruments are used to assess individuals’ emotional responses. Since the question is broad and not limited to a specific area, the results may include well-known, widely used instruments as well as innovative instruments that may be discovered and disseminated.

3.2 Search Process

The search process aimed to identify studies that answer our research question. To this end, we created a search string by gathering the most relevant terms related to the research question and combining them with logical operators. To obtain relevant and valuable results, several iterations were carried out before arriving at the final string: (“emotion evaluation” OR “emotional evaluation” OR “emotional response evaluation” OR “evaluation of emotion”).
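As a minimal sketch of how the final search string can be assembled from its terms, the snippet below quotes each phrase and joins the phrases with OR. The helper name build_search_string is hypothetical, and straight quotes replace the typographic quotes shown above; only the four terms come from the text.

```python
# Hypothetical helper: build a boolean search string by OR-ing quoted phrases.
# The four terms reproduce the final string reported in Sect. 3.2.

def build_search_string(terms: list[str]) -> str:
    """Quote each term, join the terms with OR, and wrap the result in parentheses."""
    quoted = [f'"{term}"' for term in terms]
    return "(" + " OR ".join(quoted) + ")"


TERMS = [
    "emotion evaluation",
    "emotional evaluation",
    "emotional response evaluation",
    "evaluation of emotion",
]

if __name__ == "__main__":
    print(build_search_string(TERMS))
```

Keeping the terms in a list makes each search iteration a matter of editing the list rather than the string itself.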

The procedure consisted of an automated search of well-known digital libraries in both the Computing and Health areas. The electronic search was performed on:

3.3 Search Strategy

Inclusion (IC) and exclusion (EC) criteria were defined for the studies returned by the search string, as shown in Table 1.

Table 1. Selection Criteria.

The selection process followed six steps:

  1. Execution of the search on the previously chosen databases;

  2. Removal of duplicate studies;

  3. Selection through title and abstract;

  4. Application of the selection criteria to the full text of the studies selected in step 3;

  5. Application of the quality criteria to the final set of selected studies;

  6. Data extraction.

Quality criteria (see Table 2) were adopted to ensure that the selected studies were relevant to answering the research question. Each question could be answered “yes”, “partially”, or “no”, scored as 1, 0.5, and 0, respectively. To be considered of sufficient quality for data extraction, a study had to reach a minimum score of 3.5 points; studies that did not reach this score were eliminated. The quality criteria applied to the studies are described in Table 2.
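The scoring rule above can be sketched as follows. The mapping of answers to 1, 0.5, and 0 and the 3.5 threshold come from the text; the function names and the example study, assessed against five criteria, are hypothetical, since the actual criteria are listed in Table 2.

```python
# Sketch of the quality-assessment rule: "yes" = 1, "partially" = 0.5, "no" = 0,
# and a study is kept only if its total score reaches 3.5.

ANSWER_SCORES = {"yes": 1.0, "partially": 0.5, "no": 0.0}
MIN_SCORE = 3.5


def quality_score(answers: list[str]) -> float:
    """Sum the numeric value of each answer ('yes', 'partially', or 'no')."""
    return sum(ANSWER_SCORES[answer.lower()] for answer in answers)


def passes_quality(answers: list[str]) -> bool:
    """Return True if the study reaches the minimum quality score."""
    return quality_score(answers) >= MIN_SCORE


if __name__ == "__main__":
    # Hypothetical study assessed against five criteria.
    study = ["yes", "yes", "partially", "yes", "no"]
    print(quality_score(study), passes_quality(study))  # 3.5 True
```

Because the threshold is an absolute score, its strictness depends on how many criteria Table 2 contains.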

Table 2. Quality Criteria.

Table 3. Set of studies identified.

The database search returned a total of 1410 studies, of which 736 were duplicates. Thus, 674 studies went through the first iteration of the inclusion and exclusion criteria. At this stage, based on reading the title and abstract, we selected 70 studies. In the second iteration, we read the full text of the remaining studies and applied the selection and quality criteria; the final set consisted of 52 studies. The identified works are described in Table 3.
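The counts reported above form a selection funnel. The following sketch (the function name is hypothetical) recomputes the number of studies left after duplicate removal and checks that each stage keeps a subset of the previous one.

```python
# Sketch: trace the selection funnel and check that the reported counts are consistent.

def selection_funnel(retrieved: int, duplicates: int,
                     after_title_abstract: int, final: int) -> list[tuple[str, int]]:
    """Return (stage, count) pairs; raise if a stage keeps more than it received."""
    screened = retrieved - duplicates
    if not (final <= after_title_abstract <= screened <= retrieved):
        raise ValueError("each stage must keep a subset of the previous one")
    return [
        ("retrieved", retrieved),
        ("after duplicate removal", screened),
        ("after title/abstract screening", after_title_abstract),
        ("after full text and quality criteria", final),
    ]


if __name__ == "__main__":
    for stage, count in selection_funnel(1410, 736, 70, 52):
        print(f"{stage}: {count}")
```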

3.4 Data Analysis

The data extraction process was carried out systematically, using a form to record the information needed to answer the research question, with the following fields:

  1. Study identifier (ID);

  2. Title;

  3. Authors;

  4. Year;

  5. Search base;

  6. Evaluation instrument;

  7. Instrument origin field;

  8. Emotions evaluated by the instrument;

  9. Target audience;

  10. Evaluation procedure.
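One way to represent the extraction form is as a record type with one field per item in the list above. The sketch below is illustrative only: the field names and the grouping helper are hypothetical, and in the example all values other than the study IDs and instrument names taken from Sect. 4 are placeholders, not data from the mapping.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ExtractionRecord:
    """One filled-in data extraction form (fields follow the list in Sect. 3.4)."""
    study_id: int
    title: str
    authors: list[str]
    year: int
    search_base: str
    instrument: str
    instrument_origin_field: str
    emotions_evaluated: list[str] = field(default_factory=list)
    target_audience: str = ""
    evaluation_procedure: str = ""


def studies_by_instrument(records: list[ExtractionRecord]) -> dict[str, list[int]]:
    """Group study IDs by the instrument they used, as in a table of instruments."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        groups[record.instrument].append(record.study_id)
    return {name: sorted(ids) for name, ids in groups.items()}


if __name__ == "__main__":
    # Placeholder records: titles, authors, years, and bases are NOT real data.
    records = [
        ExtractionRecord(6, "Study A", ["Author A"], 2000, "Library X", "SAM",
                         "Psychology", ["pleasure", "arousal", "dominance"]),
        ExtractionRecord(3, "Study B", ["Author B"], 2000, "Library Y", "OHQ",
                         "Psychology", ["happiness"]),
        ExtractionRecord(10, "Study C", ["Author C"], 2000, "Library X", "SAM",
                         "Psychology", ["pleasure", "arousal", "dominance"]),
    ]
    print(studies_by_instrument(records))  # {'SAM': [6, 10], 'OHQ': [3]}
```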

Table 4. Instruments identified.

Table 3 shows the ID, authors, year, source, and evaluation instrument of each study, and Table 4 lists the instruments used in the studies described in Table 3. In total, we identified 19 assessment instruments; some instruments, such as the POMS, have variations, and only one version of each appears in the table.

4 Results

The instruments described in the tables were divided into four categories: screening instruments, non-verbal instruments, instruments based on rating scales, and instruments based on the semantic differential.

4.1 Screening Instruments

A screening test is performed to detect potential health disorders or diseases in people who do not have any symptoms. The goal is early detection, followed by lifestyle changes or surveillance, to reduce the risk of disease or to detect it early enough to treat it most effectively. Brief psychological measures can be used to screen individuals for a range of mental health conditions; such screening measures are often questionnaires completed by clients. Screening tends to be quick to administer, but its results are only indicative: a positive screening result can be followed up by a more definitive test [55]. The IDs of the studies that use instruments in this category are: 14, 23, 25, 27, 30, 31 and 32.

The following is a brief description of the instruments in this category:

  • Beck Youth Inventory (BYI II): this instrument uses five self-report inventories to assess symptoms of depression, anxiety, anger, disruptive behavior, and self-concept in children and adolescents [52];

  • Differential Emotions Scale (DES-II): the DES is a standardized instrument that reliably divides an individual’s description of emotional experience into validated, discrete categories of emotion [22];

  • The Short Depression, Anxiety and Stress Scale (DASS-21): The Depression, Anxiety and Stress Scale - 21 Items (DASS-21) is a set of three self-report scales designed to measure the emotional states of depression, anxiety and stress. Each of the three DASS-21 scales contains 7 items, divided into subscales with similar content [29];

  • Hospital Anxiety and Depression Scale (HADS): This instrument measures anxiety and depression in a general medical population of patients. HADS focuses on non-physical symptoms so that it can be used to diagnose depression in people with significant physical ill health [53];

  • Edinburgh Postnatal Depression Scale (EPDS): The 10-question Edinburgh Postnatal Depression Scale (EPDS) is a valuable and efficient way of identifying patients at risk for “perinatal” depression. The EPDS is easy to administer and has proven to be an effective screening tool. This instrument can only be applied by a specialist [10];

  • Geriatric Depression Scale (GDS): The GDS is a self-report measure of depression in older adults. Respondents answer in a “Yes/No” format. The form can be completed in approximately 5 to 7 min, making it ideal for people who are easily fatigued or have a limited ability to concentrate for longer periods of time. The GDS is widely used and can be administered by non-specialists [59];

  • Beck Depression Inventory (BDI): The Beck Depression Inventory (BDI) is a 21-item, self-report rating inventory that measures characteristic attitudes and symptoms of depression. The BDI has been developed in different forms, including several computerized forms [5].

Screening tests for emotional disorders are usually administered by trained professionals. Systematizing them would be possible with the help of domain experts.
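The screening logic described in this subsection (sum the item responses, then flag a positive result for follow-up by a more definitive test) could be systematized roughly as below. The instrument, its 21 items rated 0 to 3, and the cutoff of 20 are hypothetical parameters; real cutoffs would have to come from each instrument’s manual and from domain experts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningInstrument:
    """A generic summed screening questionnaire (all parameters hypothetical)."""
    name: str
    n_items: int
    min_response: int
    max_response: int
    cutoff: int  # hypothetical threshold for a "positive" screen

    def score(self, responses: list[int]) -> int:
        if len(responses) != self.n_items:
            raise ValueError(f"{self.name} expects {self.n_items} responses")
        for r in responses:
            if not self.min_response <= r <= self.max_response:
                raise ValueError(f"response {r} outside {self.min_response}-{self.max_response}")
        return sum(responses)

    def needs_follow_up(self, responses: list[int]) -> bool:
        """A positive screen is only indicative; it triggers a definitive assessment."""
        return self.score(responses) >= self.cutoff


if __name__ == "__main__":
    demo = ScreeningInstrument("demo-21", n_items=21, min_response=0, max_response=3, cutoff=20)
    answers = [1] * 21
    print(demo.score(answers), demo.needs_follow_up(answers))  # 21 True
```

The boolean flag deliberately stops at "needs follow-up" rather than producing a diagnosis, mirroring the indicative nature of screening.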

4.2 Non-verbal Instruments

Non-verbal instruments have no age restriction: they can be applied to children, the elderly, people with communication difficulties, and people with little formal education. The Self-Assessment Manikin (SAM) (IDs: 6, 10, 11, 12, 15, 16, 22 and 28) is a non-verbal instrument whose ratings are given on a 9-point Likert-type scale.

The Self-Assessment Manikin (see Fig. 1) is an image-based questionnaire developed by Bradley and Lang [7] to measure emotional responses. The questionnaire, widely used in evaluations by Computing professionals, was designed to measure three dimensions of an emotional response (pleasure, arousal, and dominance), identified as central to emotion in research conducted by Lang et al. [27]. SAM can be considered language-free; that is, individuals of any educational level can answer it.
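A SAM response consists of three ratings, one per dimension. The sketch below validates each rating against the 9-point scale mentioned above and averages the ratings over participants; coding the scale as 1 to 9, the class and function names, and the example ratings are assumptions.

```python
from dataclasses import dataclass
from statistics import fmean

SCALE_MIN, SCALE_MAX = 1, 9  # assumed coding of the 9-point SAM scale


@dataclass(frozen=True)
class SamRating:
    """One participant's SAM rating of one stimulus."""
    pleasure: int
    arousal: int
    dominance: int

    def __post_init__(self) -> None:
        for name in ("pleasure", "arousal", "dominance"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name}={value} is outside {SCALE_MIN}-{SCALE_MAX}")


def mean_profile(ratings: list[SamRating]) -> dict[str, float]:
    """Average each SAM dimension across participants for one stimulus."""
    return {
        "pleasure": fmean(r.pleasure for r in ratings),
        "arousal": fmean(r.arousal for r in ratings),
        "dominance": fmean(r.dominance for r in ratings),
    }


if __name__ == "__main__":
    ratings = [SamRating(7, 5, 6), SamRating(8, 4, 6), SamRating(6, 6, 5)]
    print(mean_profile(ratings))
```

Keeping the three dimensions separate, instead of collapsing them into one number, preserves the structure SAM was designed to capture.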

Fig. 1. SAM (extracted from Bradley and Lang [7]).

Fig. 2. EmoCards (extracted from Russell [44]).

EmoCards (2) is a manually administered instrument inspired by Russell’s model [44]. It covers eight emotions, each represented by a male and a female face, totaling 16 cards, as shown in Fig. 2.
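The card count follows from eight emotions times two faces. For illustration, the sketch below assumes that each emotion combines a pleasantness level and an arousal level (excluding the neutral center), in the spirit of Russell’s two-dimensional model; these labels and function names are assumptions, not the official card names.

```python
from itertools import product

# Assumed axes; the official EmoCards labels may differ.
PLEASANTNESS = ("unpleasant", "neutral", "pleasant")
AROUSAL = ("calm", "average", "excited")
FACES = ("male", "female")


def emotions() -> list[tuple[str, str]]:
    """Eight (pleasantness, arousal) combinations around the neutral center."""
    return [(p, a) for p, a in product(PLEASANTNESS, AROUSAL)
            if (p, a) != ("neutral", "average")]


def deck() -> list[tuple[str, str, str]]:
    """One card per emotion and face: 8 x 2 = 16 cards."""
    return [(face, p, a) for (p, a) in emotions() for face in FACES]


if __name__ == "__main__":
    print(len(emotions()), len(deck()))  # 8 16
```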

4.3 Instruments Based on Rating Scales

One of the most common rating scales is the Likert scale. The original Likert scale is a set of statements about a real or hypothetical situation under study. Participants are asked to indicate their level of agreement with each statement (item), from strongly disagree to strongly agree, on a metric scale. Taken together, the statements reveal a specific dimension of the attitude towards the issue, so they are necessarily interlinked [24].
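To illustrate how individual statements combine into one score, the sketch below sums Likert responses after reverse-scoring negatively worded items, a common practice that makes all items point in the same direction. The 5-point coding, the set of reversed items, the function names, and the example responses are assumptions.

```python
def reverse(response: int, low: int = 1, high: int = 5) -> int:
    """Map low<->high, e.g. 1->5 and 2->4 on a 5-point scale."""
    return low + high - response


def likert_score(responses: list[int], reversed_items: set[int],
                 low: int = 1, high: int = 5) -> int:
    """Sum item responses, reverse-scoring the items whose 0-based index is in reversed_items."""
    total = 0
    for index, response in enumerate(responses):
        if not low <= response <= high:
            raise ValueError(f"item {index} response {response} out of range")
        total += reverse(response, low, high) if index in reversed_items else response
    return total


if __name__ == "__main__":
    # Four hypothetical items; item 2 (0-based) is negatively worded.
    print(likert_score([4, 5, 2, 4], reversed_items={2}))  # 17
```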

UCLA (1, 25), POMS (4, 7, 17, 19, 26), PANAS (2, 9, 18, 28), STAI (8, 16, 30), and OHQ (3) are instruments based on a Likert scale; they are briefly described below.

  • University of California, Los Angeles Loneliness Scale (UCLA): The UCLA Loneliness Scale is a commonly used measure of loneliness. It was originally released in 1978 as a 20-item scale. It has since been revised several times and shorter versions have been introduced [43];

  • The Profile of Mood States (POMS): POMS questionnaires contain a series of descriptive words/statements that describe feelings people have. Subjects self-report on each of these items using a 5-point Likert scale. There are several versions of the POMS questionnaire. The most commonly used version is currently the POMS 2, which is available for adults aged 18 years and older (POMS 2-A) and for adolescents aged 13 to 17 years (POMS 2-Y). Both POMS 2 instruments are available in full-length (65 items) and short (35 items) versions [26];

  • Positive and Negative Affect Schedule (PANAS): PANAS is a self-report questionnaire consisting of two 10-item scales to measure positive and negative affect. Each item is rated on a 5-point scale from 1 to 5 [11];

  • State-Trait Anxiety Inventory (STAI): The State-Trait Anxiety Inventory (STAI) is a commonly used measure of trait and state anxiety. It can be used in clinical settings to diagnose anxiety and to distinguish it from depressive syndromes. Form Y, its most popular version, has 20 items for assessing trait anxiety and 20 for state anxiety. All items are rated on a 4-point scale [50];

  • The Oxford Happiness Questionnaire (OHQ): The OHQ is a widely used scale for assessing personal happiness. Each item is presented as a single statement that can be endorsed on a uniform six-point Likert scale [19].
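Because PANAS has two separate 10-item scales rated from 1 to 5, each respondent obtains two scores, each ranging from 10 to 50, rather than a single total. A minimal sketch follows; the function name is hypothetical, and the word lists are the commonly reported PANAS items, which should be checked against the official form before any real use.

```python
# Commonly reported PANAS items; verify against the official form before use.
POSITIVE = ("active", "alert", "attentive", "determined", "enthusiastic",
            "excited", "inspired", "interested", "proud", "strong")
NEGATIVE = ("afraid", "ashamed", "distressed", "guilty", "hostile",
            "irritable", "jittery", "nervous", "scared", "upset")


def panas_scores(ratings: dict[str, int]) -> tuple[int, int]:
    """Return (positive affect, negative affect); every item must be rated once on 1-5."""
    expected = set(POSITIVE) | set(NEGATIVE)
    if set(ratings) != expected:
        raise ValueError("ratings must cover exactly the 20 PANAS items")
    if any(not 1 <= value <= 5 for value in ratings.values()):
        raise ValueError("PANAS items are rated from 1 to 5")
    pa = sum(ratings[item] for item in POSITIVE)
    na = sum(ratings[item] for item in NEGATIVE)
    return pa, na


if __name__ == "__main__":
    ratings = {item: 4 for item in POSITIVE} | {item: 2 for item in NEGATIVE}
    print(panas_scores(ratings))  # (40, 20)
```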

4.4 Instruments Based on the Semantic Differential

Developed by Osgood, Suci and Tannenbaum [38], the Semantic Differential usually consists of bipolar adjective scales with 5 or 7 points. Its variants usually differ in the number of points on the scale and in how these points are graded and labeled. The authors created this method when they realized the need to assess the affective qualities of a concept and to quantify the affective meaning of attitudes, opinions, perceptions, social image, personality, preferences, and interests, whether of people in general or of patients regarding their health, treatment, and illness, which are not directly measurable. The studies with IDs 4, 7 and 20 use instruments based on the semantic differential [28].
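A semantic differential profile can be scored by re-centering each bipolar rating so that its sign shows which adjective the respondent leaned towards, and two profiles can be compared with a Euclidean distance, the D measure associated with Osgood’s work. In the sketch below, the adjective pairs, ratings, and function names are hypothetical, and the centering assumes an odd number of points (5 or 7).

```python
from math import sqrt

# Hypothetical bipolar adjective pairs (negative pole, positive pole).
PAIRS = (("unpleasant", "pleasant"), ("calm", "excited"), ("weak", "strong"))


def centre(rating: int, points: int = 7) -> int:
    """Map 1..points to a symmetric scale, e.g. 1..7 -> -3..+3 (odd scales only)."""
    if not 1 <= rating <= points:
        raise ValueError(f"rating {rating} outside 1-{points}")
    return rating - (points + 1) // 2


def profile(ratings: list[int], points: int = 7) -> list[int]:
    """Re-centre one respondent's ratings over all adjective pairs."""
    return [centre(r, points) for r in ratings]


def osgood_d(a: list[int], b: list[int]) -> float:
    """Distance between two profiles: sqrt(sum((a_i - b_i)^2))."""
    if len(a) != len(b):
        raise ValueError("profiles must rate the same adjective pairs")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    interface_a = profile([6, 5, 4])  # one rating per pair in PAIRS
    interface_b = profile([3, 5, 2])
    print(interface_a, interface_b, osgood_d(interface_a, interface_b))
```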

The next subsection describes the potential threats to the validity of the systematic mapping conducted.

4.5 Threats to Validity

This subsection presents the most common threats to the validity of this research, described as follows:

  • Study inclusion/exclusion bias: conflicting or overly generic inclusion/exclusion criteria may lead to relevant studies being excluded or irrelevant ones being included;

  • Construction of the search string: Problems with string construction can result in the search returning a large number of studies (including many irrelevant ones) or missing some relevant studies;

  • Data extraction bias: The data extraction phase can be hampered by the use of “open questions” on the variables collected, whose treatment is not explicitly discussed in the protocol;

  • Researcher bias: finally, this threat refers to the potential bias of the authors of this study while interpreting or synthesizing the extracted results.

5 Final Remarks

This study describes a systematic mapping of emotion evaluation instruments. Its contributions are the protocol planning and the mapping results.

The mapping answered our research question and identified several self-report instruments used in different domains to assess different emotions. Our objective is to offer a framework consisting of a system that integrates several of these systematized instruments (with the support of domain professionals), so that professionals can carry out their assessments and obtain data and results in real time.
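The envisioned framework could expose every systematized instrument through a common interface so that results are computed as soon as responses arrive. Every name in the sketch below (Instrument, SummedScale, InstrumentRegistry) is hypothetical; the paper states the goal, not an implementation, and real instruments would be implemented together with domain professionals.

```python
from typing import Protocol


class Instrument(Protocol):
    """Hypothetical common interface for a systematized instrument."""
    name: str

    def score(self, responses: list[int]) -> dict[str, float]:
        ...


class SummedScale:
    """A simple instrument whose score is the sum of its item responses."""

    def __init__(self, name: str, n_items: int) -> None:
        self.name = name
        self.n_items = n_items

    def score(self, responses: list[int]) -> dict[str, float]:
        if len(responses) != self.n_items:
            raise ValueError(f"{self.name} expects {self.n_items} responses")
        return {"total": float(sum(responses))}


class InstrumentRegistry:
    """Looks up registered instruments by name and scores submitted responses."""

    def __init__(self) -> None:
        self._instruments: dict[str, Instrument] = {}

    def register(self, instrument: Instrument) -> None:
        self._instruments[instrument.name] = instrument

    def evaluate(self, name: str, responses: list[int]) -> dict[str, float]:
        """Score one participant's responses as soon as they are submitted."""
        return self._instruments[name].score(responses)


if __name__ == "__main__":
    registry = InstrumentRegistry()
    registry.register(SummedScale("demo-scale", n_items=4))
    print(registry.evaluate("demo-scale", [1, 2, 3, 4]))  # {'total': 10.0}
```

Instruments with subscales (such as PANAS) would simply return more than one key in the score dictionary.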

We identified 32 papers that describe 18 different instruments. For each of these instruments, we extracted and summarized the relevant information. The self-report instruments listed in this study for assessing users’ emotional responses come mainly from the field of Psychology and are aimed at adults. Most instruments are administered manually. Some instruments, such as the SAM, the Semantic Differential, and EmoCards, are already used in the Computing area.

We believe that this study is relevant to our field because computer professionals develop applications for many other areas, a frequent example being tools for the Health area. In many cases, application developers receive no feedback from end users or specialists, because in most situations the evaluation is carried out by the domain specialists themselves. Therefore, it is of utmost importance that the professionals who build these applications have the tools needed to evaluate them. One way to achieve this goal is to analyze users’ emotional responses to these interactive systems. Identifying instruments that assess users’ emotional responses in different areas is, therefore, essential so that these instruments can be disseminated to new fields.