1 Introduction

Anxiety is an emotional state characterised by negative affect and worry, heightened arousal, careful environmental monitoring, rumination and avoidance behaviour, ranging from mild to severe. Intense states of anxiety, or even fear (a more rudimentary physiological response to a perceived threat that can lead to fight/flight/freeze reactions and panic behaviour), can be symptoms of different psychological disorders. For example, phobias, which appear in many forms, are defined by an exaggerated fear or unrealistic sense of threat in response to a situation or object. In the Diagnostic and Statistical Manual of Mental Disorders (DSM-5, 2013) [18, 23], the American Psychiatric Association defines five types of phobia, related to natural environments (e.g., heights), animals (e.g., spiders), specific situations (e.g., public spaces), blood/injury or medical issues, and other types (e.g., loud noise, vomiting, choking). These debilitating disorders affect about 13% of the world’s total population. Research is ongoing into the factors contributing to the onset, development, and maintenance of phobias and anxiety-related disorders, their underlying cognitive and behavioural processes, physical manifestation, and treatment methods [4, 5, 26, 31]. Traditional treatments of such disorders include in-vivo exposure, interoceptive exposure, cognitive behavioural therapy (CBT), applied muscle tension, supportive psychotherapy, hypnotherapy, and medications such as beta-blockers or sedatives [9].

Virtual reality exposure therapy (VRET) is one of the most promising novel treatments, enabled by its superior immersive capabilities, which generate a greater sense of presence and enhance user effects, especially for negatively valenced, high arousal stimuli [37]. Over the last two decades, VRET, which builds on psychological treatment principles and advances in display and computing technology, has become a popular digital intervention for various psychological disorders [6, 38], and is as effective as in-vivo (i.e., face-to-face) exposure therapy post-intervention [20]. For example, a meta-analysis showed VRET for Social Anxiety Disorder (an exaggerated fear of being rejected, negatively evaluated or humiliated during social interactions, observations and/or performance situations) to be more effective than wait-list controls (with large effect sizes), and even than therapist-led in-vivo exposure therapy (though with only a small effect size) [6]. VRET shows good acceptability in users because it offers a safe, controlled and empowering means of exposure. A vital part of the development of VRET is the integration of bio-signals, such as heart rate variability or cortical arousal, to assess and ameliorate physiological distress states (e.g., fear or anxiety induced arousal) during exposure. Here, correct detection of physiological states through robust models for effective management of anxiety-induced arousal or stress is pivotal to facilitating intervention and enhancing psychological health and well-being.

2 Related Work

Arousal detection, a noninvasive intervention, requires a multi-disciplinary approach in which psychological state determination, machine learning models for arousal or stress detection, and exploration of the related domains for model implementation are equally important. In this paper, we narrow down these areas and present an overview of the state of the art.

Emotion/Stress Detection: Koelstra et al. (2012) presented a multimodal dataset for the analysis of human affective states [21]. They collected physiological signals, including electroencephalographic (EEG) data, from participants who watched music videos and rated each video in terms of excitement, stress, arousal, flaws, valence, like, dislike. The data has been widely used for developing various machine learning models for arousal, anxiety and stress detection. Ahuja and Banga (2019) created another dataset from the Jaypee Institute of Information Technology, where they classified mental stress in 206 students [2]. They used Linear Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF) machine learning classification algorithms [10, 11, 15, 25, 28, 32, 33, 34] to determine mental stress. Using SVM and 10-fold cross-validation, they reported an accuracy of 85.71%. Ghaderi et al. (2015) used respiration, galvanic skin response (GSR) from hand and foot, heart rate (HR) and electromyography (EMG) at different time intervals to examine different stress levels. They then used k-nearest neighbour (k-NN) and SVM models for stress detection [16].

Table 1. Machine learning models of arousal detection.

Emotion/Stress Detection using EEG: EEG is a non-invasive way to measure electrical responses generated by the outer layers of the cortex, primarily pyramidal cells. It has been used to investigate neural activity during arousal, stress, depression, anxiety or various other emotions. Several studies have applied machine learning methods to classify and/or predict emotional brain states based on EEG activity [12, 13]. For example, Chen et al. (2020) designed a neural feedback system to predict and classify anxiety states using resting-state EEG signals from 34 subjects [8]. Anxiety was quantified using power spectral density (PSD), and SVM was then used to classify anxious and non-anxious states. Shon et al. (2018) integrated genetic algorithm (GA)-based features in the machine learning pipeline along with a k-NN classifier to detect stress in EEG signals [36]. The model was evaluated using the DEAP data set [21] for the identification of emotional stress states. Other work also used the publicly available DEAP data set for emotion recognition in virtual environments [27]. Based on Russell’s circumplex model, statistical features, high order crossing (HOC) features and power bands were extracted from the EEG signals, and affective state classification was performed using SVM and RF. In major depressive disorder (MDD, n = 32), Duan et al. (2020) [14] extracted interhemispheric asymmetry and cross-correlation features from EEG signals and combined these in a classification using k-NN, SVM and convolutional neural networks (CNN). Similarly, in other research by Omar [3], frontal lobe EEG data was used to identify stressed patients. Fast Fourier Transformation (FFT) was applied to extract features from the signal, which were then passed to machine learning models, such as SVM and NB, for subject-wise classification of control and stress groups. Table 1 shows a summary of ML models used for arousal detection and their performance.
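Several of the studies above derive PSD-based band-power features before classification. The following is a minimal sketch of that step, not the method of any cited study: the band boundaries, the 250 Hz sampling rate, the helper name band_powers and the synthetic signal are all assumptions chosen for illustration, and Welch's method (scipy.signal.welch) is used as one common PSD estimator.

```python
import numpy as np
from scipy.signal import welch

# Conventional EEG bands in Hz. Boundaries differ between studies;
# these values are assumptions for illustration only.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_powers(signal, fs):
    """Return the mean PSD per band for a 1-D signal sampled at fs Hz."""
    # Welch's method with 2-second segments gives 0.5 Hz resolution.
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    powers = {}
    for name, (low, high) in BANDS.items():
        mask = (freqs >= low) & (freqs < high)
        powers[name] = float(psd[mask].mean())
    return powers


# Synthetic stand-in for one EEG channel: a 10 Hz sine plus weak noise.
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

powers = band_powers(signal, fs)
for band, value in powers.items():
    print(f"{band}: {value:.5f}")
```

The resulting per-band values could then be stacked across channels into a feature vector for a classifier such as SVM.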

Machine Learning and VRET: Balan et al. (2020) used the publicly available DEAP [21] database and applied various machine learning algorithms for classifying the six basic emotions (joy, anger, sadness, disgust, surprise and fear) based on the physiological data [5]. They presented the stages of model development and its evaluation in a virtual environment with gradual stimulus exposure for acrophobia treatment, accompanied by monitoring of physiological signals. In [39], the authors used a hybrid machine learning technique combining the k-Means++ clustering algorithm and principal component analysis (PCA) to cluster drug addicts and identify the relationship between cardiac physiological characteristics and the effect of treatment using virtual reality. Other research [35] used a single session of VRET for patients with spider phobia, including clinical, neuroimaging (functional magnetic resonance imaging, fMRI), and genetic data for baseline and post-treatment (after six months) analysis. They reported a 30% reduction in spider phobia, assessed psychometrically, and a 50% reduction in individual distance avoidance tests using behavioural patterns.

Fig. 1. Proposed Machine Learning Pipeline: We collect EEG and multimodal physiological data from suitable sensors. To clean the data for further processing, we use individual phases of feature selection, feature preprocessing and feature construction, which feed into model selection and parameter optimisation. This process is repeated using automated machine learning to obtain the best possible outcome from the collected data set. After model validation, we use our trained model for meltdown moment detection, workplace stress detection, VRET and/or other domains where arousal detection is crucial.

3 ML Model Pipeline and Data Set

First, we collected EEG and multimodal physiological data from suitable sensors. Then we cleaned the data for further processing. Here we used individual phases of feature selection, feature preprocessing and feature construction, which fed into model selection and parameter optimisation. This process was repeated using automated machine learning to obtain the best possible outcome from the collected data set. After model validation, we apply our trained model to VRET and/or other domains where arousal detection is crucial. Figure 1 shows the proposed machine learning pipeline.
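As a rough illustration of how such stages can be chained, the sketch below builds a scikit-learn Pipeline (preprocessing, feature selection, model) and tunes it with a cross-validated grid search. It is a sketch under stated assumptions, not the pipeline used in this work: an exhaustive grid search stands in for the automated machine learning step, the data are synthetic (make_classification), and the parameter grid is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a physiological feature table (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing -> feature selection -> model, fitted as one estimator so
# that every step is re-fitted inside each cross-validation fold.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),
        ("model", RandomForestClassifier(random_state=0)),
    ]
)

# Hypothetical search space: number of kept features and forest settings.
param_grid = {
    "select__k": [5, 10],
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 5],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Validation on held-out data that the search never saw.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Wrapping all steps in one Pipeline keeps feature selection and scaling inside the cross-validation loop, which avoids leaking information from validation folds into the fitted preprocessing.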

Fig. 2. The time domain representation of the EEG data of [40]. The top figures show the combined representations. Figures on the left show the initial condition and figures on the right show the stressed condition in channels F3, F4, Fz, Cz. We can clearly see the increase of oscillatory patterns of the signal from initial to stressful condition.

Data Set: For this research, we explored three publicly available data sets. The first one is the SWELL data set of [22]. The authors calculated the inter-beat interval (IBI) between peaks in electrocardiographic (ECG) signals. The heart rate variability (HRV) index was then computed on a five-minute IBI array, repeatedly appending each new IBI sample to the array. The data set was manually annotated with the conditions under which the data was collected. This data set has 204885 samples with 75 features and 3 labelled classes. Here, 25 people performed regular cognitive activities, including reading e-mails, writing reports, searching, and making presentations under manipulated working conditions. We used a second publicly available data set of [30], which was originally inspired by [19], with HRV data to train our proposed machine learning model and determine arousal levels. We also used a third publicly available data set titled ‘EEG during Mental Arithmetic Task Performance’ [40] to explore EEG recordings of 36 participants during resting state and while doing an arithmetic task. This data set has been commonly used to identify anxiety triggered in individuals while performing arithmetic tasks. It was collected using a Neurocom monopolar EEG 23-channel system device. Electrodes (Fp1, Fp2, F3, F4, Fz, F7, F8, C3, C4, Cz, P3, P4, Pz, O1, O2, T3, T4, T5, T6) were placed on the scalp using the international 10/20 standard. The sampling rate for each channel was 500 Hz, with a high-pass filter of 0.5 Hz and a low-pass filter of 45 Hz cut-off frequency. In the experimental manipulation, participants were asked to solve mental arithmetic questions to increase cognitive load and induce stress, thus evoking higher arousal states.
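To make the HRV computation on a rolling IBI array more concrete, here is a minimal sketch. It is not the SWELL authors' code: the function names, the choice of four standard time-domain measures (mean RR, median RR, SDRR, RMSSD), the minimum of three intervals per window and the toy input are assumptions for illustration.

```python
from collections import deque

import numpy as np


def hrv_features(ibi_ms):
    """Standard time-domain HRV measures from inter-beat intervals in ms."""
    ibi = np.asarray(list(ibi_ms), dtype=float)
    successive_diffs = np.diff(ibi)
    return {
        "MEAN_RR": float(ibi.mean()),
        "MEDIAN_RR": float(np.median(ibi)),
        "SDRR": float(ibi.std(ddof=1)),  # sample standard deviation
        "RMSSD": float(np.sqrt(np.mean(successive_diffs**2))),
    }


def rolling_hrv(ibi_stream_ms, window_ms=5 * 60 * 1000):
    """Append each new IBI to a window of at most window_ms and recompute."""
    window = deque()
    total_ms = 0.0
    results = []
    for ibi in ibi_stream_ms:
        window.append(ibi)
        total_ms += ibi
        # Drop the oldest intervals once the window exceeds its time span.
        while total_ms > window_ms and len(window) > 1:
            total_ms -= window.popleft()
        # Hypothetical minimum: at least three intervals per estimate.
        if len(window) >= 3:
            results.append(hrv_features(window))
    return results


example = hrv_features([800, 810, 790, 800])
print(example)

# Short window (2.5 s) so the toy stream exercises the eviction logic.
rolling = rolling_hrv([800, 800, 800, 800, 800], window_ms=2500)
print(len(rolling))
```

In practice the window would hold roughly five minutes of beats, so each appended IBI yields one new feature row, which matches the sample-per-beat structure described above.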

Fig. 3. Average frequency content of the signal before and during the arithmetic task using the data set of [40]. We can clearly see changes in excitation levels. The figure on the left shows the initial level, whereas the figure on the right shows the stressed condition during mathematical problem solving. The figures were generated using the open source Python package MNE-Python [17].

Fig. 4. Time-frequency representations plotted using power plots and topographic maps. Changes in power spectral density can be seen for individual channels before and during the stressed conditions. The figures were generated using the open source Python package MNE-Python [17].

4 Result Analysis

In this study, we took the data set of EEG signals during mental arithmetic tasks [40]. Decomposed EEG signals for a duration of 5 s before and during an arithmetic task are shown in Fig. 2. The signals, provided in EDF format, were converted to epochs, and their statistical features (mean, std, ptp, var, minim, maxim, argminim, argmaxim, skewness and kurtosis) were calculated. These features were then used for the classification of the signals. An RF model was used for this purpose, which gave an accuracy of 87.5%. Figure 2 shows the time-domain representation of the EEG signal of [40]. In this figure, plots on the left show recordings during the initial condition and plots on the right during the stressed condition in channels F3, F4, Fz, Cz. We can clearly see the increase of oscillatory patterns of the signal from initial to stressful condition (Fig. 4).
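The feature extraction and classification step can be sketched as follows. This is an illustrative sketch only: it uses synthetic epochs instead of the EDF recordings of [40] (with real data, the epochs array could come from MNE-Python), the helper name epoch_features and the array shapes are assumptions, and it makes no attempt to reproduce the reported 87.5% accuracy.

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def epoch_features(epochs):
    """Map (n_epochs, n_channels, n_times) to (n_epochs, n_channels * 10).

    Computes the ten per-channel statistics named in the text: mean, std,
    peak-to-peak, variance, min, max, argmin, argmax, skewness, kurtosis.
    """
    stats = [
        epochs.mean(axis=-1),
        epochs.std(axis=-1),
        np.ptp(epochs, axis=-1),
        epochs.var(axis=-1),
        epochs.min(axis=-1),
        epochs.max(axis=-1),
        epochs.argmin(axis=-1),
        epochs.argmax(axis=-1),
        skew(epochs, axis=-1),
        kurtosis(epochs, axis=-1),
    ]
    return np.concatenate(stats, axis=1)


# Synthetic stand-in (assumption): 40 "rest" and 40 "task" epochs with
# 4 channels and 500 samples each; task epochs have a larger amplitude.
rng = np.random.default_rng(0)
rest = rng.standard_normal((40, 4, 500))
task = 1.5 * rng.standard_normal((40, 4, 500))
epochs = np.concatenate([rest, task])
labels = np.array([0] * 40 + [1] * 40)

X = epoch_features(epochs)
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, labels, cv=5)
print(X.shape, round(float(scores.mean()), 3))
```

Because the synthetic classes differ strongly in variance, the cross-validated accuracy here is much easier to achieve than on real EEG and should not be read as a benchmark.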

Fig. 5. Pair plot of a few notable features (MEAN-RR, MEDIAN-RR, SDRR-RMSSD, MEDIAN-REL-RR, SDRR-RMSSD-REL-RR, VLF, VLF-PCT) from the SWELL data set [22]. These statistical features have been used for the classification of the signals aiming at arousal detection. This publicly available HRV data set has been used to train our machine learning models.

Fig. 6. Prediction of stressful moments from the HRV data set generated by [30], inspired by [19]. We used the publicly available data set of [30] to train our proposed machine learning model for VRET and determine momentary stress states.

Fig. 7. Performance (accuracy, precision, recall and F1-score) of the models on the publicly available data sets that we used for training. Here we consider QDA, GNB, SVM, MLP, ADB, KNN, DT and RF machine learning models. KNN, DT and RF have been used with multiple parameter settings. The figure on the top shows the performance on the SWELL [22] data set and the figure on the bottom shows the performance on the EEG data set of [40].

Figure 3 shows the average frequency content of signal epochs before and during the arithmetic task using the data set of [40]. We can see some changes in excitation levels. The figures on the left show the signal in a relaxed state, whereas the figures on the right depict the signals under stress while performing the mental arithmetic task. Similarly, the subsequent images in Fig. 4 show the time-frequency analysis of individual channels (F3, Cz, P4) generated using power plots and topographic maps. Clear differences can be seen between the plots before and during evoked stress states. Figure 5 shows the pair plot of a few notable features (MEAN-RR, MEDIAN-RR, SDRR-RMSSD, MEDIAN-REL-RR, SDRR-RMSSD-REL-RR, VLF, VLF-PCT) from the SWELL data set [22]. These statistical features have been used to classify the signals for arousal detection. This publicly available HRV data set has been used to train our machine learning models. Figure 6 shows the prediction of stressful moments from the HRV data set generated by [30], inspired by [19]. We used the publicly available data set of [30] to train our proposed machine learning model and determine momentary stressful states. Figure 7 shows the performance (accuracy, precision, recall and F1-score) of the models on the publicly available data sets that we used for training. Here we consider Gaussian Naïve Bayes (GNB), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), Multilayer Perceptron (MLP), AdaBoost (ADB), k-nearest neighbour (KNN), Decision Tree (DT) and Random Forest (RF) machine learning models. KNN, DT and RF have been used with multiple parameter settings. The figure on the top shows the performance on the SWELL [22] data set and the figure on the bottom shows the performance on the EEG data set of [40]. With a different data set, the results may vary slightly, as shown by [1].
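A comparison of this kind can be sketched with scikit-learn as below. The sketch is illustrative and does not reproduce Fig. 7: the three-class synthetic data set, the default hyperparameters, the single train/test split and the macro-averaging of precision, recall and F1 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class stand-in for an HRV feature table (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "GNB": GaussianNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "ADB": AdaBoostClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Scale inside the pipeline so the scaler is fitted on training data only.
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Running KNN, DT and RF with several parameter settings, as described above, would amount to adding further entries to the models dictionary.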

5 Challenges and Future Research Directions

As mentioned in the Related Work section (Sect. 2), this work draws on multidisciplinary research, and we have identified diverse open challenges across these domains. Some of the key issues are:

  • Real-time analysis of the incoming data. Stream processing is one of the next challenges that we want to address for this problem.

  • The placement of the BCI electrodes is an important consideration, and interesting to investigate further to determine the most relevant regions of the brain to monitor arousal.

  • In future, additional sensor/polar devices, chest-straps and/or wrist bands could be used to collect further types of signals. Moreover, additional data should be collected from different experimental conditions to further improve efficacy.

6 Conclusion

In self-guided VRET, participants can gradually increase their own exposure to anxiety-evoking stimuli (such as audience size, audience reaction, salience of self etc.) to desensitise and reduce momentary anxiety and arousal states, facilitating amelioration of public speaking anxiety (PSA) over time. However, creating this VR environment and determining anxiety-induced arousal or momentary stress states is an open challenge. In this work, we showed which selection of parameters and machine learning models can facilitate arousal detection. As such, we propose a machine learning pipeline for effective arousal detection. We trained our model with three publicly available data sets, focusing particularly on EEG and HRV data. Considering these scenarios, our proposed automated machine learning pipeline addresses the model selection problem for arousal detection. Our trained machine learning model can be used for further development in VRET to overcome psychological distress in anxiety and fear related disorders. Further useful applications of the model can be seen in meltdown moment detection in Autism Spectrum Disorder (ASD) and other scenarios where stress and arousal play a significant role and early intervention will be helpful for physiological amelioration. For example, early identification and signalling of a meltdown moment can facilitate the initiation of targeted interventions that prevent meltdowns, which will help parents, carers and supporting staff deal with such occurrences and reduce distress and harm in individuals with ASD. Finally, arousal and increasing stress have become buzzwords of recent times, adversely affecting a vast range of populations across the globe regardless of age group, ethnicity, gender, or work profile. The long ongoing COVID-19 pandemic, changing scenarios, work patterns and lifestyles, increasing pressures, and technological advancements are a few possible reasons for this trend [16, 21, 29, 30]. Thus, accurate detection of distress related arousal levels across the general population (e.g., in educational settings or the workplace) may help to avoid associated adverse impacts through effective interventions, prevent long-term mental health issues and improve overall well-being.