
1 Introduction

The task of recognizing and detecting emotions has been relevant for decades. Knowledge of the emotions experienced by a subject can be useful in many fields, e.g., marketing, education, robotics, banking, and advertising. People’s moods can be recognized through many distinctive cues. The most common are facial expressions, gestures, voice volume and interruptions, and physiological characteristics (sweating, heart rate, skin conductivity, etc.). A human can recognize emotion from a person’s facial expression with a high degree of accuracy, but for an automated system this is a highly complex task: a computer needs to take many factors into account when identifying moods, and other problems arise as well. Cultural differences also matter, because similar facial expressions can sometimes convey opposite emotions in different cultures, leading to an incorrect assessment. Furthermore, facial expressions may not reflect a person’s real feelings, or may express feelings that are not actually experienced. Therefore, there is a need for a new way of identifying and analyzing emotions that can be applied to different ethnic and linguistic groups and avoids the difficulties of existing mechanisms.

The diversity of data types is leading to new and improved analysis methods. Today, there are many approaches to detecting the psycho-emotional state of a person, capable of analyzing speech recordings, digitized speech recognition results, textual data, images, and video materials. Textual information can serve as the main source of data about a person’s state: machine learning applied to text analysis can reach an accuracy of over 90% [1]. High-quality emotion classification can also be achieved by analyzing the audio signal [2, 3] or by combining the audio and video modalities [4]. Video-based emotion analysis also makes it possible to study the relationship between cognitive function and emotion in humans, a particularly acute issue in the elderly [5]. Video streaming can likewise be used to assess the emotional state of a car driver [6], which can help avoid accidents and keep all road users safe from dangerous situations. For more detailed research in emotion recognition, there are datasets that include several modalities at once [7], such as audio, video, oculography, speech, and motion capture data.

The emotions evoked in a person, intentionally or not, leave a trace in their physiology. Thus, data on afferent muscle excitation [8], skin conductance [9], functional magnetic resonance imaging (fMRI), and respiration rate [10] can be used to determine a subject’s emotional state, such as sadness, happiness, or joy.

Emotional states can therefore be analyzed through many different modalities, which makes this a wide field of research.

2 The Need for a New Modality

Modalities for recognizing emotional states, such as text, audio, video, or human physiological characteristics, are quite informative sources of knowledge about a subject, but each has its own advantages and disadvantages in how data are accumulated, transformed, and processed.

There are more than 7000 languages in the world, many of which have dialects. However, only a seventh of them can be machine-processed, which makes it difficult to analyze emotions across the rest of the language space. In addition, textual and audio modalities are not always available for processing, and neither are video, photos, or physiological characteristics. This motivates the search for and analysis of new data sources. One of them is the unconscious actions of a subject while interacting with a computer, in particular with a computer mouse. During this interaction, a person experiences many emotions arising from work and from the environment, and at the same time makes hundreds of cursor movements, which can be indicative of the subject’s internal state. The creation of an emotion recognition mechanism based on unconscious activity is driven by the need to monitor the emotional well-being of the population in order to identify problematic situations at an early stage. The mechanism is intended to help address the mental ill-health of the subject (for example, emotional burnout, fatigue, or depression) by detecting bad moods and irritability from computer mouse movements at the initial stage of their manifestation. However, the lack of freely available data on this problem makes it necessary to collect such a dataset. Such a dataset will help to identify and assess the subject’s emotions regardless of language, race, or other affiliations.

A person’s movements constantly change according to the feelings they are experiencing. When moving a computer mouse, the subject continuously processes data about the desired hand location and its difference from the actual location, while generating the motor commands needed to reach the goal [11]. Neurological movement disorders, such as Parkinson’s disease and Tourette’s syndrome, support the assumption that cursor movements in a choice task are influenced by emotion [12]. Arousal experienced during online shopping influences the duration of movements and quantitative changes in computer mouse speed [13]. Decision making and emotional change can be observed not only when using a computer mouse, a keyboard, or similar devices, but also when using a smartphone: interaction with a smartphone can be used to study changes in levels of joy [14].

An important aspect of the analysis is to identify and define the indicators to look for, since a plausible assessment of a subject’s emotional state depends on them. A study of the impact of negative emotions on cursor speed and distance travelled can help in this regard [15]. Besides the distance travelled and the mouse cursor speed, spatial characteristics of the trajectory such as the AUC (area under the curve) and zigzag (the number of deviations of the trajectory from the straight line connecting the start and end points) are considered [12].
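The two trajectory measures from [12] can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the definition used in [12]: the AUC is taken as the signed area between the cursor path and the straight start-to-end line, returned as an absolute value (so lobes on opposite sides of the line partly cancel), and zigzag is read as the number of times the path crosses that line. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def line_frame(points: Sequence[Point]) -> List[Tuple[float, float]]:
    """Express each point as (u, h): position along the start-end line and signed offset from it."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points coincide; the reference line is undefined")
    frame = []
    for x, y in points:
        px, py = x - x0, y - y0
        u = (dx * px + dy * py) / length  # projection onto the line
        h = (dx * py - dy * px) / length  # signed perpendicular distance (2D cross product)
        frame.append((u, h))
    return frame


def area_under_curve(points: Sequence[Point]) -> float:
    """Assumed AUC: trapezoidal area between the path and the straight line, as an absolute value."""
    frame = line_frame(points)
    area = sum(0.5 * (h0 + h1) * (u1 - u0) for (u0, h0), (u1, h1) in zip(frame, frame[1:]))
    return abs(area)


def zigzag_count(points: Sequence[Point]) -> int:
    """Assumed zigzag: number of sign changes of the offset, i.e. crossings of the straight line."""
    signs = [1 if h > 0 else -1 for _, h in line_frame(points) if h != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

For a triangular path `[(0, 0), (5, 5), (10, 0)]`, `area_under_curve` returns 25.0 (half of base 10 times height 5) and `zigzag_count` returns 0, since the path never crosses the line.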

Analyzing emotional state based on the subject’s unconscious actions, such as mouse movement or interaction with a smartphone, is an innovative approach to this task. The versatility of the mechanism lies in its independence from the subject’s race, language, and other affiliations, which makes this modality particularly suitable for investigating psycho-emotional states when data such as text, audio, or video are absent or scarce. Both spatial characteristics, such as distance travelled and curve deviation, and temporal characteristics, such as total mouse interaction time per session, can be used as properties of cursor movement for this task. This makes it possible to assess how well emotional state can be estimated from computer mouse activity, and later to extend the dataset with additional modalities to achieve better results.

3 Creation of a Mechanism for Recognizing the Emotional State of the Subject Based on His Unconscious Actions

3.1 Designing a Mechanism for Recognizing the Emotional State of the Subject

The first step in implementing a mechanism for recognizing a person’s psycho-emotional state is to design the mechanism. For successful design it is necessary to understand who will interact with the system, which blocks are needed to implement the analysis, and how the blocks will be linked together. The structure of the emotion recognition mechanism is shown in Fig. 1.

Fig. 1. Structure of the mechanism for recognizing the emotional state of the subject.

The person who communicates directly with the system is the user, who controls the mouse while performing tasks. The emotion recognition mechanism consists of two main units: data collection and data processing. In the current study, data for analysis are collected manually because there are no freely available datasets that provide information on human-computer interaction through control of the mouse cursor. An algorithm for creating an in-house dataset, including the coordinates of the subject’s cursor position and an estimate of his or her emotional state, was developed and implemented; it is described in more detail in Sect. 3.2. After the necessary data are received from the user, they are passed to the processing unit, which estimates the emotional state of the user interacting with the system based on the spatial characteristics of the mouse cursor position and the accumulated experience of the mechanism. The data processing algorithm is described in detail in Sects. 3.2 and 3.3. The output of the mechanism is the classification of the user as a subject with a positive, negative, or neutral emotional state.

From this description, the input data for emotion analysis are the spatial characteristics of the mouse cursor position, and the output data are the resulting assessment of the user. Such inputs are independent of the subject’s language, appearance, and other factors, which supports the universality of the emotional state analysis mechanism.

The following subsections describe in detail the data collection and processing algorithm and the training of the models that classify participants.

3.2 Collecting Spatial Characteristics of the Mouse Cursor Position and Creating an In-House Dataset

Analyzing the emotional state of a user is a classification task: based on a training sample, the model identifies relationships between dependent and independent variables so that it can then accurately predict results for previously unseen data. Automated classification has been used for decades, helping people perform analysis in very different areas of life. A training dataset is a necessary part of any classification task.

The problem of analyzing emotions from computer mouse movements is new, and collecting and standardizing such data is time-consuming. Therefore, such training sets are not publicly available, which makes analysis in this area difficult. To collect data about the subject’s cursor location during interaction with a computer, it is necessary to design and implement a distraction-free user interface so that the evaluation of the emotional state is more reliable. It is also necessary to organize the data collection itself and the interaction between the handler program and the user interface. This work is done in the data collection unit, whose structure is shown in Fig. 2.

A web application was chosen as the platform through which research participants interact with the interface while the coordinates of the cursor are saved in parallel; this application is the data collection block of the mechanism. It consists of the following elements:

  1. A database into which the details of each participant are placed.
  2. A server, written in the Python programming language, responsible for processing client-side requests, accumulating and transforming received data, and providing feedback.
  3. A web interface, i.e., the interface between a client (the page with which a computer user interacts) and a server, designed for the stable functioning of the web application.

The interface is written using the FastAPI web framework [16] for Python.
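The storage layer behind the server is not described in detail. As a hedged illustration of the database element only, the sketch below uses Python’s built-in sqlite3 module rather than anything taken from the original application; the table and column names (participants, cursor_samples, t_ns) are hypothetical, and timestamps are assumed to be in nanoseconds to match the units of the final dataset.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per participant, one row per recorded cursor position.
SCHEMA = """
CREATE TABLE IF NOT EXISTS participants (
    id INTEGER PRIMARY KEY,
    gender INTEGER NOT NULL  -- 1 = male, 2 = female (coding used in the final dataset)
);
CREATE TABLE IF NOT EXISTS cursor_samples (
    participant_id INTEGER NOT NULL REFERENCES participants(id),
    task_id INTEGER NOT NULL,
    x REAL NOT NULL,
    y REAL NOT NULL,
    t_ns INTEGER NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_participant(conn: sqlite3.Connection, participant_id: int, gender: int) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO participants (id, gender) VALUES (?, ?)", (participant_id, gender))


def add_sample(conn: sqlite3.Connection, participant_id: int, task_id: int,
               x: float, y: float, t_ns: int) -> None:
    with conn:
        conn.execute("INSERT INTO cursor_samples VALUES (?, ?, ?, ?, ?)",
                     (participant_id, task_id, x, y, t_ns))


def trajectory(conn: sqlite3.Connection, participant_id: int, task_id: int) -> List[Tuple[float, float, int]]:
    """Return the (x, y, t_ns) samples of one association task in time order."""
    rows = conn.execute(
        "SELECT x, y, t_ns FROM cursor_samples WHERE participant_id = ? AND task_id = ? ORDER BY t_ns",
        (participant_id, task_id),
    )
    return rows.fetchall()
```

Keeping raw samples per task, rather than precomputed features, leaves the feature definitions free to change later.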

Fig. 2. Structure of the “Data collecting” block of the emotion recognition mechanism of the subject.

The participant never interacts directly with the server; all interaction goes through the web interface by sending GET, POST, and other requests and receiving responses. These responses can be web pages displayed to the user, status codes, success messages, and so on.

A web application is a complex structure that combines a client side, a server side, and a web interface. First, consider the client side. In the first step of interacting with the web application, the user sees a welcome page with an introduction and instructions for further actions. The second page is a short survey that identifies whose data will be processed later.

When the user completes registration (by pressing the “Continue” button), a page becomes available with an image that may induce a particular emotion in the viewer. These images are taken from the open affective image database OASIS [17]. There is no limit to the time a participant can spend viewing the page. The participant is then asked to rate their psycho-emotional state on an emotion scale: positive, rather positive, neutral, rather negative, negative.

After viewing the affective image and completing the self-assessment, a simple choice task (association task) [12, 18] becomes available (Fig. 3a), in which the participant chooses, from two offered figures, the one that in their opinion better matches the given figure. The choice is indicated by pressing the Right or Left button. An example of cursor movement during a simple choice task is shown in Fig. 3b.

The whole test consists of repeated cycles, each comprising an affective image, a self-assessment survey, and three association tasks. In total, each participant completed 10 cycles, thus viewing 10 images, completing 10 self-assessments, and solving 30 simple choice tasks. The recorded times show that, on average, participants needed no more than 7 min to complete the whole test.
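The session structure and the resulting counts can be checked with a small Python sketch; the step names are hypothetical labels, not identifiers from the original application.

```python
from typing import List, Tuple

Step = Tuple[str, int]  # (step kind, cycle index)


def build_test_schedule(n_cycles: int = 10, tasks_per_cycle: int = 3) -> List[Step]:
    """Return the ordered steps of one session: image, self-assessment, then association tasks."""
    steps: List[Step] = []
    for cycle in range(n_cycles):
        steps.append(("affective_image", cycle))
        steps.append(("self_assessment", cycle))
        steps.extend(("association_task", cycle) for _ in range(tasks_per_cycle))
    return steps


schedule = build_test_schedule()
counts = {kind: sum(1 for k, _ in schedule if k == kind)
          for kind in ("affective_image", "self_assessment", "association_task")}
print(counts)  # {'affective_image': 10, 'self_assessment': 10, 'association_task': 30}
```

With the defaults, a session has 50 steps, of which 30 produce cursor trajectories.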

Fig. 3. a) Example of an association task to record the location of a subject’s mouse cursor. b) An example of a solution to an association problem. The red line is a hypothetical trajectory and is not shown in the real experiment.

While the participants solve the choice tasks, each new cursor location, namely its x and y coordinates, is recorded in the background. The scheme of interaction between the client and the server is shown in Fig. 4. Each time a participant moves the computer mouse, the client interface sends data such as the participant’s identification number (ID) and the spatial characteristics of the cursor to the server in a POST request via the web interface. In response, the server sends a message reporting the result of processing the request.
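The request handling described above can be sketched as a framework-independent Python function. This is not the authors’ FastAPI endpoint: the JSON field names (id, x, y), the in-memory list used as a store, and the HTTP-style status codes in the report are assumptions chosen for illustration. In a FastAPI application, logic of this kind would sit inside a route handler for POST requests.

```python
import json
from typing import List, Tuple

Sample = Tuple[int, float, float]  # (participant ID, x, y)


def handle_cursor_post(body: bytes, store: List[Sample]) -> dict:
    """Validate one cursor sample sent by the client and return a processing report."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"status": 400, "detail": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return {"status": 400, "detail": "body must be a JSON object"}
    missing = [field for field in ("id", "x", "y") if field not in payload]
    if missing:
        return {"status": 422, "detail": "missing fields: " + ", ".join(missing)}
    try:
        sample = (int(payload["id"]), float(payload["x"]), float(payload["y"]))
    except (TypeError, ValueError):
        return {"status": 422, "detail": "id, x and y must be numeric"}
    store.append(sample)
    return {"status": 200, "detail": "sample stored"}
```

Rejected requests leave the store untouched, so malformed client messages cannot corrupt a trajectory.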

Fig. 4. Interaction between client and data collection unit server in a mechanism for analyzing emotional state based on unconscious subject movements.

In this way, the information about the movement of the cursor on each task page is sent to the server, where it is placed in the database and waits for further processing.

The next stage after collecting information from the participants is to systematize it and transform it into a unified dataset. According to [12, 15, 18, 19], when a person is affected by emotion, attention ceases to be goal-directed and becomes stimulus-driven, and the subject is more easily distracted by environmental stimuli. From the perspective of neuroscience, the brain’s capacity for concentration is also affected by emotions, particularly negative ones. Therefore, one temporal, one spatio-temporal, and two spatial characteristics are chosen as metrics for assessing emotional state from mouse cursor movements. The temporal characteristic is the time the user spends interacting with the page of the simple choice (association) task. The spatio-temporal characteristic is the maximum speed at which the computer mouse moves. The spatial characteristics are the distance travelled by the cursor and the maximum deviation from the straight line connecting the start and end points of the movement.

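A program developed in Python is used to create a single dataset. The distance travelled is computed by Eq. (1) as the sum of the Euclidean distances between consecutive cursor positions, each given by Eq. (2), where $a_i$ is the $i$-th of the $n$ recorded cursor positions and $a_i^{(x)}$, $a_i^{(y)}$ are its coordinates.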

$$ D = \sum\nolimits_{i = 1}^{n - 1} {d\left( {a_{i} , a_{i + 1} } \right)} $$
(1)
$$ d\left( {a_{i} , a_{i + 1} } \right) = \sqrt {\left( {a_{i}^{\left( x \right)} - a_{i + 1}^{\left( x \right)} } \right)^{2} + \left( {a_{i}^{\left( y \right)} - a_{i + 1}^{\left( y \right)} } \right)^{2} } $$
(2)
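Equations (1) and (2) translate directly into Python. In this minimal sketch, a trajectory is assumed to be a list of (x, y) pixel coordinates in recording order; the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Eq. (2): Euclidean distance between two cursor positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distance_travelled(points: Sequence[Point]) -> float:
    """Eq. (1): sum of distances between consecutive cursor positions a_1, ..., a_n."""
    return sum(euclidean(p, q) for p, q in zip(points, points[1:]))


print(distance_travelled([(0, 0), (3, 4), (3, 10)]))  # 11.0
```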

Peak mouse speed is calculated as the maximum, over all pairs of consecutive cursor positions, of the ratio of the Euclidean distance between the two points to the time taken to travel it.
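A minimal sketch of the peak-speed computation follows, assuming each sample is an (x, y, t) tuple with t in nanoseconds (so the result is in pixel/ns, the unit of the final dataset) and that speed is evaluated between consecutive samples. Skipping segments with a non-increasing timestamp, rather than dividing by zero, is an assumption of this sketch.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, int]  # (x, y, timestamp in ns)


def peak_speed(samples: Sequence[Sample]) -> float:
    """Largest distance/time ratio over consecutive cursor samples (pixel/ns)."""
    best = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp: speed undefined for this segment
        best = max(best, math.hypot(x1 - x0, y1 - y0) / dt)
    return best
```

For example, moving 5 px in 1 ms and then 6 px in 0.5 ms gives a peak of 6 / 500000 = 1.2 × 10⁻⁵ pixel/ns.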

The maximum deviation of the cursor path from the straight line connecting the start and end points is calculated as the largest perpendicular distance from a recorded cursor position to that line.
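The point-to-line distance can be computed from a 2D cross product, as in this sketch. The fallback for a path that ends where it started (where the reference line is undefined) is an assumption, since the text does not cover that case.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def max_deviation(points: Sequence[Point]) -> float:
    """Largest perpendicular distance from a cursor position to the start-end straight line."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        # Assumption: if the path returns to its start, use the distance from the start point.
        return max(math.hypot(x - x0, y - y0) for x, y in points)
    # |cross product| / line length is the perpendicular point-to-line distance.
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in points)
```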

After the four characteristics are computed for each completed association task (i.e., one in which the participant indicated a choice by pressing the appropriate button), all data are placed in one file, each line of which contains: id, the participant number; gender, 1 for male and 2 for female; emotion, the self-rated emotional state (1, positive; 2, negative; 3, neutral); Euclidean distance, the distance travelled by the cursor (pixels); max straight line distance, the maximum deviation of the cursor from the straight line connecting the start and end points; peak velocity, the peak cursor speed (pixel/nanosecond); and time, the interaction time with the simple choice task (nanoseconds). The final file contains 3139 lines of information.
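Writing one such line per completed task is straightforward with Python’s csv module. In this sketch the header spellings are assumptions (snake_case versions of the field names above), and the range checks on the categorical codes are an added safeguard rather than part of the described program.

```python
import csv
import io
from typing import Dict, Iterable

# Column names mirror the fields listed above; the exact header spelling is an assumption.
FIELDS = ["id", "gender", "emotion", "euclidean_distance",
          "max_straight_line_distance", "peak_velocity", "time"]


def dataset_to_csv(rows: Iterable[Dict[str, float]]) -> str:
    """Serialize one row per completed association task, checking the categorical codes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["gender"] not in (1, 2):  # 1 = male, 2 = female
            raise ValueError(f"unexpected gender code: {row['gender']}")
        if row["emotion"] not in (1, 2, 3):  # 1 = positive, 2 = negative, 3 = neutral
            raise ValueError(f"unexpected emotion code: {row['emotion']}")
        writer.writerow(row)
    return buffer.getvalue()
```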

3.3 Research and Data Processing

The study of the temporal, spatio-temporal, and spatial characteristics is key to analyzing emotional state based on the unconscious actions of the subject. Tables 1, 2, and 3 present the maximum, minimum, and median values, respectively, of the four characteristics investigated, for men, for women, and for the whole group of participants. This makes it possible to reveal any relationship between the values and the gender of the participants.
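Per-group statistics of this kind can be computed with a few lines of standard-library Python. This is a sketch assuming each row of the dataset is a dict keyed by the column names of the final file; the function name is illustrative.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, Iterable, Mapping, Optional


def summarize(rows: Iterable[Mapping], feature: str,
              group_key: Optional[str] = None) -> Dict[Hashable, Dict[str, float]]:
    """Max, min and median of one feature for the whole group ("all") and, optionally, per subgroup."""
    groups = defaultdict(list)
    for row in rows:
        groups["all"].append(row[feature])
        if group_key is not None:
            groups[row[group_key]].append(row[feature])
    return {name: {"max": max(v), "min": min(v), "median": median(v)} for name, v in groups.items()}
```

Calling it with `group_key="emotion"` instead of `"gender"` gives per-state extremes of the kind reported in Table 4.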

Table 1 shows that men have higher maximum values for the maximum deviation of the movement trajectory from the straight line connecting the start and end points (856.21 vs. 823.48 for women), the peak speed (816 × 10⁻⁶ vs. 612 × 10⁻⁶ pixel/ns for women), and the interaction time with the simple choice task (575.040 × 10⁸ vs. 382.679 × 10⁸ ns for women). The maximum distance travelled is higher for women (14620.5 vs. 10955.7 pixels for men).

Table 1. Maximum values for spatial and temporal characteristics of the mouse cursor.

The minimum values (Table 2) show that women have the lowest values of the maximum deviation of the trajectory from the straight line (5.87 vs. 7.26 for men) and of the time to solve the simple choice task (3.671 × 10⁸ vs. 4.858 × 10⁸ ns for men). The minimum distance travelled is observed for men (477.9 vs. 532.3 pixels for women). The minimum peak speed is the same for both genders, 1 × 10⁻⁶ pixel/ns.

Table 2. Minimum values for spatial and temporal characteristics of the mouse cursor.

The median values of men and women (Table 3) are closest for the peak speed, 5.02 × 10⁻⁶ and 5.73 × 10⁻⁶ pixel/ns, respectively (5.28 × 10⁻⁶ for the full group of participants), and for the association task solution time, 18.19 × 10⁸ ns for men and 18.20 × 10⁸ ns for women (18.19 × 10⁸ ns for the full group of participants).

Table 3. Median values for spatial and temporal characteristics of the mouse cursor.

From the analysis of the statistical characteristics, it can be concluded that the values do not depend strongly on the gender of the participants: the observed differences are small.

Table 4 presents the lowest and highest values of the time, peak speed, and spatial characteristics for the three emotional states of the subject: positive, negative, and neutral.

Table 4. Maximum and minimum values of characteristics for different emotional states of the subject.

Both the maximum and the minimum distance travelled by the cursor occur in the neutral state. Likewise, both extremes of the maximum deviation of the trajectory from the straight line occur in a single state, in this case the negative one. Peak speed reaches its highest value in the positive state, while its minimum does not depend on the participant’s mood. Both the longest and the shortest times to solve the simple choice (association) task correspond to positive emotions. The analysis shows that emotions affect subjects differently, and emotional state cannot be assessed from the statistical characteristics of cursor movements alone.

Two classification models are trained in Python to analyze emotions based on the unconscious movements of the subject, i.e., mouse movement: a multilayer perceptron built with the TensorFlow library [20] and a OneVsRestClassifier from the Scikit-learn library [21]. Emotion recognition based on subject actions is a multi-class classification task [22] because there are three classes: positive, negative, and neutral emotional states. The multilayer perceptron (MLP) is a class of simple feedforward neural networks trained with supervised learning. The OneVsRestClassifier fits one binary classifier per class, each separating that class from all the others, which makes the approach interpretable. To compare performance, the emotion analysis task was solved with seven approaches: in six cases, a different pair of the four features is taken as the independent variables, and in the seventh case, all four features are used. In each approach, the dataset is split into a training sample and a test sample.
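The seven input configurations follow from choosing two of the four features, C(4, 2) = 6 ways, plus the full set. The sketch below enumerates them with itertools; the 80/20 split ratio and the fixed seed are assumptions, since the text does not state how the training and test samples were formed.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

FEATURES = ("euclidean_distance", "max_straight_line_distance", "peak_velocity", "time")


def feature_sets() -> List[List[str]]:
    """The seven inputs compared: all C(4, 2) = 6 pairs plus the full set of four features."""
    return [list(pair) for pair in combinations(FEATURES, 2)] + [list(FEATURES)]


def split_rows(rows: Sequence[T], test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly and split into (train, test); the 20% test share is an assumption."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Using the same split for every feature set keeps the seven accuracy figures comparable.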

For emotion analysis with the MLP neural network, the dependent variable (emotional state value) is transformed using one-hot encoding: each label becomes a binary array whose length equals the number of classes, with a one in the position corresponding to the class and zeros elsewhere. The neural network consists of a normalization layer adapted to the corresponding independent variables; two hidden fully connected (Dense) layers with 64 neurons each and a ReLU activation function [23]; and an output fully connected (Dense) layer with a Softmax activation function [24] and three neurons, one per class. The model was trained for 100 epochs.
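A sketch of the label encoding and the described architecture follows. The one-hot helper is plain Python; the model builder uses the Keras API bundled with TensorFlow (`tf.keras.layers.Normalization`, `Dense`, `Sequential`) and imports TensorFlow lazily so that the encoding can be used without it. The optimizer (Adam) and the loss (categorical cross-entropy) are assumptions, as the text does not name them.

```python
from typing import List, Sequence

CLASSES = (1, 2, 3)  # emotion codes from the dataset: 1 = positive, 2 = negative, 3 = neutral


def one_hot(labels: Sequence[int], classes: Sequence[int] = CLASSES) -> List[List[int]]:
    """Map each class label to a binary vector of length len(classes) with a single 1."""
    position = {c: i for i, c in enumerate(classes)}
    return [[1 if position[label] == j else 0 for j in range(len(classes))] for label in labels]


def build_mlp(train_features):
    """Normalization -> Dense(64, ReLU) -> Dense(64, ReLU) -> Dense(3, Softmax).

    train_features: 2D float array (samples x selected features) used to adapt the normalizer.
    """
    import tensorflow as tf  # lazy import: only needed when the model is actually built

    normalizer = tf.keras.layers.Normalization(axis=-1)
    normalizer.adapt(train_features)
    model = tf.keras.Sequential([
        normalizer,
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Training would then follow the text, e.g. `model.fit(x_train, np.array(one_hot(y_train)), epochs=100)`, where `x_train` and `y_train` are hypothetical NumPy arrays holding the selected features and the emotion codes.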

When the relationship between emotion and mouse movements is investigated with the OneVsRestClassifier, no transformation of the dependent variable is required. Support vector classification is used to build the model [25]. Table 5 summarizes the results of the seven experiments for each classifier.
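The one-vs-rest principle itself can be shown without any library: each class gets its own binary target vector, and a sample is assigned to the class whose binary classifier returns the highest score. The pure-Python sketch below illustrates only this decomposition and decision rule; the per-class support vector classifiers are left out. In scikit-learn, the same wrapping is provided by `sklearn.multiclass.OneVsRestClassifier`, for example around `sklearn.svm.SVC`.

```python
from typing import Dict, Hashable, List, Mapping, Sequence


def one_vs_rest_targets(labels: Sequence[Hashable], classes: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """For each class, a binary target vector: 1 for that class, 0 for every other class."""
    return {c: [1 if label == c else 0 for label in labels] for c in classes}


def one_vs_rest_predict(scores: Mapping[Hashable, float]) -> Hashable:
    """Pick the class whose binary classifier gives the highest decision score for a sample."""
    return max(scores, key=scores.__getitem__)
```

Because each class has its own binary model, inspecting that model says something about the class alone, which is the interpretability property mentioned above.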

Table 5. Training accuracy of multi-class classification models for different independent variables.

The results show that the pair consisting of the maximum deviation of the movement trajectory from the straight line connecting the start and end points and the time to solve the simple choice (association) task is the most informative for analyzing emotional state based on the unconscious movements of the subject, as both classifiers achieved their highest accuracy with this pair. For the MLP network, the least effective input is the pair of maximum deviation from the straight line and peak speed; for the OneVsRestClassifier model, the three least effective inputs are distance travelled with maximum deviation from the straight line, distance travelled with interaction time, and the full set of four features.

4 Conclusion

Technologies for emotion detection based on the subject’s unconscious actions, such as computer mouse control and interaction with a smartphone, were considered, and the lack of freely available data for research on this topic was revealed. A dataset containing temporal, spatial, and spatio-temporal characteristics of the cursor was collected: the interaction time with the association task page (temporal), the distance travelled by the cursor and its maximum deviation from the straight line connecting the start and end points of movement (spatial), and the peak speed of movement (spatio-temporal). A web application written in Python with the FastAPI web framework was developed to compile the dataset. The final dataset contains 3139 lines of information. Such data make it possible to explore a subject’s emotional state regardless of race, language, or other affiliations.

The maximum, minimum, and median values of the cursor characteristics were analyzed for men and women, as were the maximum and minimum values for each emotional state (positive, negative, neutral). This analysis suggests that the values do not depend on the gender of the participants. An MLP built with TensorFlow and a OneVsRestClassifier from Scikit-learn were trained to identify the relationship between emotional state and cursor movement, reaching an accuracy of 41% in determining emotional state. The results indicate that there is a relationship between the emotions experienced and the unconscious movements of the subject.