
1 Introduction

A phobia is a type of anxiety disorder manifested as an extreme and irrational fear of objects or situations. According to recent statistics, 13% of the world’s population suffer from some type of phobia [1]. Phobias are classified into social phobias (agoraphobia, fear of public speaking) and specific phobias (triggered by specific objects or situations). Worldwide, 15–20% of people [2] experience a specific phobia at least once in their lifetime. Acrophobia (fear of heights) ranks highest, affecting 7.5% of people worldwide, followed by arachnophobia (fear of spiders) with 3.5% and aerophobia (fear of flying) with 2.6%. The recommended treatment for phobia is either pharmacological or psychological – in-vivo exposure to the fear-provoking stimuli and Cognitive Behavioral Therapy (CBT), a modality that helps the patient change his thoughts about the fear-generating objects. Nearly 80% of phobics find relief in medication and CBT. However, treatment must be provided continuously, as the disorder tends to relapse in more than 50% of cases. The prescribed medication includes anti-anxiety and anti-depressive drugs that alleviate anxiety symptoms [3], with side effects such as impaired cognition and a tendency to create dependence [4].

Virtual Reality (VR) has emerged in recent years, and a series of systems for phobia therapy have been tested and validated, either as commercial products or as research prototypes. They are called Virtual Reality Exposure Therapy (VRET) systems and are preferred by more than 80% of patients over classical in-vivo exposure therapy [5].

VR phobia therapy has the advantage of providing a safe exposure environment, with a varied range of modifiable stimuli and the possibility of immediate intervention from the therapist. Our approach replaces the human therapist with a virtual one, in the form of a female software avatar called eTher, which evaluates the patient’s level of anxiety based on recorded biophysical signals, provides guidance and encouragement, changes its voice parameters (pitch, tempo and volume) according to the user’s emotional state and automatically adjusts the level of game exposure. We continued our previous research [6,7,8] and enriched the software by adding a virtual agent embodied as an avatar.

The paper is structured as follows: Sect. 2 presents related work, Sect. 3 introduces emotions and biophysical signals, Sect. 4 describes the virtual environment for acrophobia therapy, Sect. 5 details eTher’s implementation and Sect. 6 outlines its validation – experimental procedure and results. Finally, in Sect. 7 we provide conclusions and future research directions.

2 Related Work

Virtual Reality (VR) has been successfully used in the treatment of mental disorders for 25 years [9, 10]. One of the first studies on virtual environments and acrophobia treatment is presented in [11]. Three virtual environments were created for exposure therapy, containing an elevator, balconies and bridges. 17 subjects were randomly divided into two groups: a treatment group and a control group. The subjects from the treatment group were gradually exposed in the VR environments, and the results indicated that virtual height situations generated the same experience as physical-world height exposure.

A broad discussion of the potential of VRET in acrophobia treatment is presented in [12]. VR can elicit phobic stimuli, and patients may feel as if they are in situations that are difficult to access in the real world. VR also offers a solution for treating patients who cannot imagine an acrophobic environment, and VRET can be used by anyone, including those who lack the courage to acknowledge the phobia and seek treatment. Technological advances have made possible the development of cheap VR applications and devices for use in therapy. The VR-based triggers of acrophobic behavior are identified and described in [12]: visuo-vestibular triggers, postural triggers, and visual and motion triggers. The authors concluded that VR can be used both as a tool for treating acrophobia and for investigating and understanding it [12].

Recently, Freeman et al. [13] showed that VR has not been used to its full potential in mental healthcare. They conducted a systematic review of the field in which both benefits and issues were identified. The main benefit is increased access to treatment, while the main concern is the therapy’s quality control. The highlighted benefits of VR for mental healthcare are that it enables the accurate implementation of therapeutic strategies, that helpful situations for therapy can be created, and that treatment can be repeated without additional cost and can be delivered to patients’ homes [13]. VR makes it possible to implement various therapeutic techniques, realistic situations can be simulated, and patients can gradually and repeatedly experience them until they overcome their problems.

In 2018, the results of an automated treatment for fear of heights were presented in [14]. A software application called Now I Can Do Heights, featuring a virtual coach, was used in the treatment of 49 subjects, while 51 subjects were allocated to the control group. The VR-based treatment was delivered over 6 sessions of 30 min each, spanning 2 weeks. The results proved the efficiency of the VR-based automated psychological therapy: fear of heights decreased for all the participants in the VR group.

The efficiency of VR cognitive behavioral therapy was demonstrated in a large experiment in [15]. 193 subjects, aged between 18 and 65, were divided into two groups: one was exposed to VR-CBT applications and gamified VR environments, and the other served as a control group. A significant reduction of acrophobia was recorded for the VR-CBT group after three months of therapy, compared to the control group [15]. The authors also concluded that VR-based treatment can be performed even in the absence of a therapist.

A pilot study was performed in [16] to evaluate e-virtual reality exposure for acrophobia treatment. 6 subjects underwent six sessions of VR-based therapy over three weeks: three of them participated in e-VRET sessions (without a therapist) and three in traditional p-VRET sessions (in the physical presence of the therapist). The results showed no significant difference between the e-VRET and p-VRET sessions in the recorded anxiety level. The authors thus made a first step towards proving that VRET can be delivered over the Internet for phobia treatment [16].

Most automated applications for acrophobia treatment use artificial intelligence to estimate the subjects’ fear level in acrophobic situations: various biophysical data are collected and learning models are used to predict the patients’ fear level. In the experiment presented in [17], the EEG data of 60 participants was acquired and fed to a deep learning model to detect the subjects’ acrophobia level.

In [6,7,8], we investigated the efficiency of different machine learning classifiers in a VR system for treating acrophobia. The system automatically estimated fear level based on multimodal sensory data – EEG, pulse, electrodermal activity and a self-reported emotion evaluation. The results showed classification accuracies ranging from 42.5% to 89.5%, using the Support Vector Machine, Random Forest and k Nearest Neighbors techniques. The most important features for fear level classification were GSR, HR and the EEG in the beta frequency range.

VRET gained the status of an effective therapy for various mental disorders according to the results of the meta-analysis presented in [18]. 30 studies on VRET in different disorders were analyzed: 14 studies on specific phobias, 8 on social anxiety disorder or performance anxiety, 5 on posttraumatic stress disorder and 3 on panic disorder. The authors concluded that VRET is an equally effective medium for exposure therapy [18].

3 Emotions and Biophysical Data

Accurate recognition of emotions allows the appropriate adjustment of the therapeutic approach in phobias. Phobic behavior is a defense-like overreaction to a certain category of stimuli [19]. As a manifestation of autonomic nervous system activation, emotions can be identified by measuring and analyzing physiological reactivity. Nowadays, a wide variety of neurophysiological methods are used to sense the biophysical signals associated with emotions: Electromyography (EMG), Electrodermal Activity (EDA), Electroencephalography (EEG) and Heart Rate Variability (HRV).

To capture the response to phobic stimuli, we chose to record Galvanic Skin Response (GSR) and Heart Rate (HR). We expect GSR and HR amplitudes to increase as a result of phobic stimuli, without habituation, and to decrease as a result of the virtual agent’s therapeutic intervention.

4 VRET Game for Acrophobia

The designed system is based on a VR game for acrophobia therapy, to which we added a virtual agent acting as a virtual therapist.

The acrophobia game depicts a mountain landscape, with three possible scenarios that can be selected from the start menu: a ride by cable car, one by ski lift and one by foot. The game is rendered on the HTC Vive Head Mounted Display, and interaction is performed by pressing the buttons on the HTC Vive controllers. In the current version, eTher is implemented only for the cable car ride (Fig. 1). Throughout the ride there are 10 stops where the cable car halts and the user’s emotional state is evaluated based on the biophysical data (GSR and HR) recorded during the previous level. A level is defined as the ride segment between two consecutive stops. Each level takes approximately 10 s, during which the cable car moves slowly, so that the user can look out the window, move and rotate his head and have a fully realistic experience of the environment. The ride follows a mostly ascending path, with stops set at the following altitudes: starting point – 28 m, Stop1 – 138 m, Stop2 – 264 m, Stop3 – 327 m, Stop4 – 213 m, Stop5 – 388 m, Stop6 – 460 m, Stop7 – 395 m, Stop8 – 470 m, Stop9 – 607 m, Stop10 – 640 m.
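The ride structure can be captured in a simple data definition. The sketch below is purely illustrative, written in C# to match the Unity-based implementation; the class and member names are our assumptions, not the actual game code.

```csharp
// Illustrative sketch only: hypothetical names, not the actual Unity game code.
public static class CableCarRide
{
    // Altitudes (in metres) of the starting point and the 10 stops, as listed above.
    public static readonly float[] StopAltitudes =
        { 28f, 138f, 264f, 327f, 213f, 388f, 460f, 395f, 470f, 607f, 640f };

    // A level is the ride segment between two consecutive stops (~10 s long).
    public const float LevelDurationSeconds = 10f;

    public static int LevelCount => StopAltitudes.Length - 1; // 10 levels

    // Altitude gained (or lost) on a given level, 1-based (Level1 .. Level10).
    public static float AltitudeChange(int level) =>
        StopAltitudes[level] - StopAltitudes[level - 1];
}
```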

Before the game starts, we store the user’s profile – name, age and level of acrophobia (low, medium or high), determined by completing 3 questionnaires: the Heights Interpretation Questionnaire [20], the Visual Height Intolerance Severity Scale [21] and the Acrophobia Questionnaire [22]. We also store each user’s favorite song, image and quote, which will serve as relaxation modalities.

At the start, we also measure the baseline HR and GSR during a relaxation period of 3 min. The difference between the average baseline relaxation HR and GSR values and the average HR and GSR values recorded during a game level determines the current anxiety level. According to this anxiety level, eTher either allows the user to continue the game (if the anxiety level is low) or provides relaxation modalities by randomly presenting the user’s favorite image, song or quote. The game ends when the user reaches the final stop or after a predefined number of game epochs – 20 in the current implementation. The user may also leave the game at any time if he feels uncomfortable or experiences motion sickness. A full description of eTher’s workflow is given in Sect. 5.

5 eTher for Acrophobia Therapy

5.1 Description

eTher (Fig. 2) is a therapeutic agent that provides assistance to the user through social interactions in order to increase the efficiency of the VR game in acrophobia therapy. It is a virtual agent equipped with conversational capabilities for maintaining a dialogue with the users. The agent aims to keep the users in a comfort zone, defined by their baseline relaxation HR and GSR values, during exposure to the anxiety-producing acrophobic stimuli in the VR environment.

Fig. 1.
figure 1

Virtual environment seen from the cable car

Fig. 2.
figure 2

eTher female game avatar

The main capabilities of eTher are:

  • Provides assistance in acrophobia treatment by monitoring patients’ biophysical data during the therapy.

  • Offers encouragement, motivation, corrective feedback and support to the patients.

  • Provides personalized guidance during the VR game.

  • Automatically adjusts the level of exposure.

The architecture of eTher is inspired by INTERRAP [23] and MEBDP [24] (Fig. 3).

Fig. 3.
figure 3

eTher architecture

The therapeutic agent consists of three modules: a World Interface, which handles information exchange with the environment; a Knowledge Base consisting of facts and rules; and a Control Unit. The environment is defined by the patients and the VR game. The World Interface contains the user data acquisition systems, sensor systems, systems for describing game scenarios, and actuators – the elements through which the agent modifies the environment. The Knowledge Base and the Control Unit are each structured into two layers: one for reactive behavior and one for planned behavior. The facts in the Knowledge Base contain the agent’s beliefs about the environment (world model) and its beliefs about itself (mental model). For example, the world model contains relaxing songs; motivational, encouraging and congratulating expressions; conversation topics; music with changeable parameters; patients’ profiles; game scenario descriptions; etc. The rules in the Knowledge Base comprise a plan library and a reaction library.

  • A plan is a tuple p = <condition, procedure>

  • If the condition is true, then the procedure is run.

The Control Unit contains algorithms for situation recognition (reactive or planning situations), a module that controls layer activation, and two layers, one for each type of situation. All layers use Reinforcement Learning techniques to decide which rules, plans or cooperation protocols are selected in any given state of the environment. An environment state is defined by the anxiety level, the patient’s profile and the game scenario. The planning layer is divided into two sub-layers, similar to MEBDP [24]: a task planning sub-layer that decomposes a global task into subtasks, and an expert action selection sub-layer that selects an action for each subtask. The reactive and planning layers generate actions, which are transferred to the actuators to be performed.
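To make the plan representation concrete, the sketch below shows one possible encoding of the <condition, procedure> tuple and of a plan library that runs every plan whose condition holds in the current environment state. The types and names are illustrative assumptions, not the actual eTher code, and the Reinforcement Learning-based selection is omitted.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the <condition, procedure> plan tuple and plan library;
// names and types are hypothetical, not the actual eTher implementation.
public sealed class Plan
{
    public Func<EnvironmentState, bool> Condition { get; }
    public Action<EnvironmentState> Procedure { get; }

    public Plan(Func<EnvironmentState, bool> condition, Action<EnvironmentState> procedure)
    {
        Condition = condition;
        Procedure = procedure;
    }
}

// An environment state is defined by the anxiety level, the patient profile
// and the game scenario (see text above).
public sealed class EnvironmentState
{
    public int AnxietyLevel { get; set; }
    public string PatientProfile { get; set; }
    public string GameScenario { get; set; }
}

public sealed class PlanLibrary
{
    private readonly List<Plan> plans = new List<Plan>();

    public void Add(Plan plan) => plans.Add(plan);

    // Run the procedure of every plan whose condition holds in the current state.
    public void Execute(EnvironmentState state)
    {
        foreach (var plan in plans)
            if (plan.Condition(state))
                plan.Procedure(state);
    }
}
```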

5.2 Implementation

For measuring HR and GSR, we used the Shimmer3 GSR+ Unit [25]. It captures the skin’s electrical conductance via two electrodes placed on the medial phalanges of the middle and ring fingers, and the photoplethysmography (PPG) signal via an optical pulse probe placed on the tip of the index finger. The PPG signal is then converted into HR by an in-lab modified version of the module integrated into the Shimmer C# API. In the first seconds after the Shimmer3 GSR+ Unit starts, the recorded values for both GSR and HR are invalid (usually −1), so we allowed the system a calibration period and only stored and used the data once the recorded signal became valid. For each user who plays the game, we record a baseline HR and GSR for a period of 3 min during which the user stays still and tries to relax. The data is averaged for both GSR and HR, yielding GSRb and HRb.
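The calibration and baseline-averaging steps can be sketched as follows. This is a minimal C# illustration that abstracts the Shimmer C# API behind a simple sample type; all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the calibration and baseline-averaging steps; sample acquisition
// through the actual Shimmer C# API is abstracted behind a simple value type.
public readonly struct BioSample
{
    public double Gsr { get; }   // skin conductance, microsiemens
    public double Hr { get; }    // heart rate, bpm (derived from PPG)
    public BioSample(double gsr, double hr) { Gsr = gsr; Hr = hr; }

    // In the first seconds after start-up the unit reports invalid values (typically -1).
    public bool IsValid => Gsr > 0 && Hr > 0;
}

public static class Baseline
{
    // Average the valid samples recorded during the 3-minute relaxation period
    // to obtain GSRb and HRb.
    public static (double GsrB, double HrB) Compute(IEnumerable<BioSample> relaxationSamples)
    {
        var valid = relaxationSamples.Where(s => s.IsValid).ToList();
        if (valid.Count == 0)
            throw new InvalidOperationException("No valid samples after calibration.");
        return (valid.Average(s => s.Gsr), valid.Average(s => s.Hr));
    }
}
```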

During each game level, we record the biophysical signals and obtain the current GSR (GSRc) and HR (HRc) values. The differences (in percent) between the current biophysical values and the baseline ones are computed as follows:

$$ P_{HR} = 100 \cdot \frac{HR_{c} - HR_{b}}{HR_{b}}, \qquad P_{GSR} = 100 \cdot \frac{GSR_{c} - GSR_{b}}{GSR_{b}} $$
(1)

In order to provide the user with a form of feedback, these percentages (P_HR and P_GSR) are displayed on the screen as bars of changeable color, one bar for P_HR and one for P_GSR:

$$ \begin{array}{ll} P_{HR/GSR} \le 10\% : & color = green \\ 10\% < P_{HR/GSR} \le 40\% : & color = yellow \\ 40\% < P_{HR/GSR} \le 70\% : & color = orange \\ P_{HR/GSR} > 70\% : & color = red \end{array} $$
(2)

P_HR/GSR refers to either P_HR or P_GSR.
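Formulas (1) and (2) translate directly into code. The following C# sketch (with hypothetical names) computes the percentage change and maps it to the bar color:

```csharp
public enum BarColor { Green, Yellow, Orange, Red }

public static class AnxietyFeedback
{
    // Formula (1): percentage difference between current and baseline values.
    public static double PercentChange(double current, double baseline) =>
        100.0 * (current - baseline) / baseline;

    // Formula (2): threshold-based color of the on-screen feedback bar.
    public static BarColor ToColor(double p)
    {
        if (p <= 10.0) return BarColor.Green;
        if (p <= 40.0) return BarColor.Yellow;
        if (p <= 70.0) return BarColor.Orange;
        return BarColor.Red;
    }
}

// Example usage (hypothetical variables):
//   double pHr = AnxietyFeedback.PercentChange(hrCurrent, hrBaseline);
//   BarColor hrBarColor = AnxietyFeedback.ToColor(pHr);
```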

The objective of eTher is to keep the user within the green and yellow comfort area. eTher is a humanoid software agent with the appearance of a female game avatar. It is designed using the Unity game engine, starting from a realistic human face. The appearance of the agent needs to match the users’ preferences, and the agent must have a positive voice that inspires trustworthiness, competence and warmth. After a preliminary analysis of the gender dimension, we chose to design the therapeutic agent with a female voice for several reasons: (i) it is perceived as helping, not commanding [26]; (ii) we are biologically predisposed from intrauterine life to prefer the female voice and to identify the mother’s voice, not necessarily that of the father [27]; (iii) the female voice is clearer and more melodious, having a calming and soothing effect (the female voice is processed in the same auditory area dedicated to musical information) [28]; (iv) the female voice inspires greater confidence than the male voice due to its higher pitch [29].

By default, the female avatar has a normal, neutral voice in terms of voice parameters – pitch, tempo and volume. The voice parameters are then modified in Audacity [30] in order to create the four Agent Interactions (AIs) described in Table 1. For AI1, pitch, tempo and volume are increased by 10%, so the agent’s voice is more energetic, dynamic and lively, appropriate for giving encouragement and motivating the user to go on with the game. For AI2, eTher’s voice is rather neutral. For AI3, the parameters are decreased by 10%, meaning that the voice is deeper, the words are pronounced more slowly and the volume is lower, in order to help the user relax, detach and diminish his level of anxiety. In the case of AI4, all three parameters are decreased by 20%. This interaction is played when the subject is extremely tense and the voice needs to be sober and serious, with a staccato rhythm and a quieter volume, so that he may fully understand his emotional state and relax.
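For reference, the four Agent Interactions of Table 1 can be summarized as relative adjustments of the neutral voice. The sketch below only documents these factors (the actual clips were prepared offline in Audacity); the type names are our assumptions.

```csharp
// Sketch of the four Agent Interactions (Table 1); the voice clips were prepared
// offline in Audacity, so these factors only document the relative adjustments.
public enum AgentInteraction { AI1, AI2, AI3, AI4 }

public readonly struct VoiceProfile
{
    public double Pitch { get; }
    public double Tempo { get; }
    public double Volume { get; }
    public VoiceProfile(double pitch, double tempo, double volume)
    { Pitch = pitch; Tempo = tempo; Volume = volume; }
}

public static class VoiceProfiles
{
    public static VoiceProfile For(AgentInteraction ai) => ai switch
    {
        AgentInteraction.AI1 => new VoiceProfile(1.10, 1.10, 1.10), // +10%: energetic, encouraging
        AgentInteraction.AI2 => new VoiceProfile(1.00, 1.00, 1.00), // neutral
        AgentInteraction.AI3 => new VoiceProfile(0.90, 0.90, 0.90), // -10%: calmer, slower, quieter
        AgentInteraction.AI4 => new VoiceProfile(0.80, 0.80, 0.80), // -20%: sober, quiet
        _ => new VoiceProfile(1.00, 1.00, 1.00)
    };
}
```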

Table 1. Virtual agent interactions

We identified the following situations or tuples <condition, procedure> which compose the Therapy Plan Library (Table 2):

Table 2. Therapy plan library

A diagram that details the therapy workflow is presented in Fig. 4.

In Situation1–Situation4, GSRcolor and HRcolor are both either green or yellow. This means that the user is relaxed or fairly relaxed and can continue to the next game level; eTher, in the form of AI1, appears on the screen and encourages him to go on. In Situation5–Situation8, one of GSRcolor and HRcolor is green or yellow and the other is orange, suggesting that the subject is becoming anxious. eTher plays AI2 and then randomly presents the user’s favorite image, song or quotation for 20 s. After these 20 s, the subject’s emotional state is evaluated again. If it falls into Situation1–Situation4, AI1 appears and he may continue the game from there. If it falls into Situation5–Situation8, the player is taken to the previous level, so the level of exposure decreases by 1 (if a previous level exists). For Situation9–Situation15, the exposure decreases by 2 game levels, while for Situation16 the user is automatically taken 3 game levels behind the current one. In Situation9–Situation15, GSRcolor and HRcolor are both orange or one of them is red; in these cases, AI3 is played and, if the relaxation modalities fail to work, the exposure is lowered by 2 levels. In Situation16, both GSRcolor and HRcolor are red; AI4 is played by eTher and, if the relaxation means are inefficient, the user is taken 3 game levels behind the current one.
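The mapping from the color pair to the agent interaction and the exposure adjustment can be sketched as follows, reusing the BarColor and AgentInteraction types from the earlier sketches. It groups the sixteen situations by their color combinations; the names are hypothetical and the intermediate relaxation step is omitted.

```csharp
// Sketch of the situation handling in Table 2 / Fig. 4, keyed on the color pair
// rather than the sixteen enumerated situations; names are hypothetical.
public static class TherapyPolicy
{
    // Returns the agent interaction to play and how many levels to step back
    // if the relaxation modalities fail (0 means the user may continue).
    public static (AgentInteraction Ai, int LevelsBack) Decide(BarColor gsr, BarColor hr)
    {
        bool bothRelaxed = IsRelaxed(gsr) && IsRelaxed(hr);                 // Situations 1-4
        bool oneOrange   = (IsRelaxed(gsr) && hr == BarColor.Orange) ||
                           (IsRelaxed(hr) && gsr == BarColor.Orange);       // Situations 5-8
        bool bothRed     = gsr == BarColor.Red && hr == BarColor.Red;       // Situation 16

        if (bothRelaxed) return (AgentInteraction.AI1, 0);
        if (oneOrange)   return (AgentInteraction.AI2, 1);
        if (bothRed)     return (AgentInteraction.AI4, 3);
        return (AgentInteraction.AI3, 2);                                   // Situations 9-15
    }

    private static bool IsRelaxed(BarColor c) => c == BarColor.Green || c == BarColor.Yellow;
}
```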

Fig. 4.
figure 4

Therapy workflow

6 eTher Validation

6.1 Experimental Procedure

In order to validate our pilot implementation of eTher, we performed a series of tests with 10 acrophobic subjects, 5 male and 5 female, aged 24–50. The experiment consisted of a baseline GSR and HR recording during a relaxation period of 3 min, followed by a gameplay session without intervention from eTher, another 3 gameplay sessions spread over a period of 3 days assisted by eTher, and a subjective evaluation in which the participants were asked to share their opinion on the human-agent interaction by answering the following questions:

  1. How was your experience with the agent?

  2. Was it easy to use?

  3. Did you feel safe during gameplay?

  4. Did you feel an improvement of your acrophobic (comfort) state during the game?

As our research is in an incipient phase, we designed and evaluated the functionality of the therapeutic agent simultaneously, adjusting its capabilities during the current development phase. To find out more about the human-agent interaction, at the end of the experiment the subjects took part in an interview with the researchers, verbally expressing their opinions and sharing their experience.

6.2 Results

In the last session of the experiment, across all users, the most frequent situation was Situation1 (44%), followed by Situation2 (21%), Situation3 (17%), Situation13 (9%), Situation6 (5%) and Situation15 (4%). The participants succeeded in reaching the final stop of the game after completing 11.75 game levels on average.

There were 14% exposure decreases by 1 level and no exposure decreases by 2 or 3 levels. Of these 14%, 9% were due to a high level of anxiety appearing at Stop1, so Level1 had to be repeated (as Level0 does not exist, the user had to replay Level1). These users probably experienced anxiety at the start of the game because they knew they were being monitored or felt uneasy in the presence of the tester.

Both the skin conductance (Fig. 5) and heart rate (Fig. 6) values decreased by the end of the three days of gameplay, indicating a reduction of phobic anxiety.

In order to assess the efficiency of the proposed model, we applied a paired-samples t-test that compared, for the same group of subjects, the averaged recordings before and after therapy. We started from the hypothesis that the average values decrease after therapy for both GSR and HR; p < 0.05 was considered statistically significant. The average GSR was 1.68 µS before therapy and 0.9 µS after therapy. As for HR, the average was 77.34 bpm before therapy and 75.17 bpm after therapy.
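The paired-samples t-test can be reproduced with a short routine that computes the t statistic from the per-subject differences. The sketch below is a minimal illustration under that assumption, not the actual analysis script.

```csharp
using System;
using System.Linq;

// Minimal paired-samples t-test sketch: computes the t statistic from the
// per-subject differences (before - after); degrees of freedom = n - 1.
public static class PairedTTest
{
    public static (double T, int Df) Compute(double[] before, double[] after)
    {
        if (before.Length != after.Length || before.Length < 2)
            throw new ArgumentException("Need paired samples of equal length (n >= 2).");

        double[] diff = before.Zip(after, (b, a) => b - a).ToArray();
        int n = diff.Length;
        double mean = diff.Average();
        double variance = diff.Sum(d => (d - mean) * (d - mean)) / (n - 1);
        double standardError = Math.Sqrt(variance / n);
        return (mean / standardError, n - 1);
    }
}

// Example usage (hypothetical arrays of per-subject averages):
//   var (t, df) = PairedTTest.Compute(gsrBefore, gsrAfter);
//   // compare t against the critical value for df at p < 0.05
```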

According to Table 3, we reject the null hypothesis and support the research hypothesis stating that virtual agent-assisted therapy has a positive effect on acrophobia alleviation.

Fig. 5.
figure 5

Reduction of GSR after virtual agent therapy

Fig. 6.
figure 6

Reduction of HR after virtual agent therapy

Table 3. Compared averaged recordings before and after therapy

As an example, we present the analysis of one subject’s data before and after therapy.

The data obtained from the electrodermal recording was processed using the Ledalab software [31, 32]. Processing includes electrical noise reduction, artifact detection and measurement, as well as signal decomposition into two components: Skin Conductance Level (SCL) and Skin Conductance Response (SCR).

SCL is a slowly varying component reflecting general changes in autonomic arousal, while SCR is a rapidly varying component representing the phasic response to successive stimuli in the environment. The two processes rely on different neurological mechanisms.

Figure 7 presents the evolution of GSR during gameplay before (a) and after (b) therapy.

Fig. 7.
figure 7

The evolution of GSR during gameplay before (a) and after (b) therapy

Figure 8 presents individual responses to stimuli obtained after processing raw data (before therapy – a and after therapy – b) using the Discrete Decomposition Analysis (DDA) method.

Fig. 8.
figure 8

Individual responses to stimuli before (a) and after (b) therapy.

Regarding the qualitative questionnaire, we quote the opinion of the user with the lowest level of VR experience: “Nice and interesting experience, although it was the first time I used the VR glasses; it wasn’t easy for me to use them, as I had to stay focused and alert. Yes, I felt secure and I enjoyed knowing that a virtual therapist takes care of me. I felt an improvement of my acrophobic condition, but also a stress because I do not have necessary abilities for interacting with the virtual environment”.

7 Conclusions

In this paper, we presented the pilot implementation and validation of an assistive virtual agent that identifies emotional states and provides guidance by adjusting exposure levels in a virtual environment for acrophobia therapy. The therapeutic agent automatically adjusts its speech and voice parameters according to the player’s anxiety level and supplies relaxation modalities. In an experiment with 10 acrophobic users, we showed a significant improvement in anxiety levels, observable in the GSR and HR values in the final session of the experiment. As future directions, we plan to perform more experiments and to refine the system so that patients can use it in safe conditions.