Introduction

People often smoke in response to specific cues such as seeing a cigarette, ashtray, or matches1,2. Hence exposure to smoking cues is an important step in therapies designed to build resilience to craving3. Presentation of smoking cues within virtual reality (VR) has been shown to elicit cigarette craving4. There are two key advantages of use of VR. First, multiple different smoking cues and scenarios, graduated in difficulty, can be easily presented and there are no actual cigarettes present to smoke and reinforce the established response. Second, it is now possible to automate delivery of therapy within VR5. We have successfully piloted with thirteen smokers a VR smoking environment delivered in the new generation of standalone VR headsets6. In this paper, we report a definitive test of this environment as the first step in developing a new VR therapy for smoking cessation.

One important reason that prevents people who smoke from successfully quitting is the difficulty of not responding to everyday smoking cues1. Smoking cues can be pervasive in daily lives and exposure to these cues is a predictor of smoking7. These cues may be specific items related to smoking, such as ashtrays and cigarette butts, and general environments where people usually smoke, such as a bar, and can include time-related events such as a morning coffee routine8. Exposure to smoking cues elicits craving, and craving has been identified as the mediator that leads to smoking9,10,11. Drinking alcohol is another well-recognised cue for smoking12 and people who drink alcohol are more likely to smoke too13.

Exposure to smoking cues triggering craving for a cigarette is a well-replicated phenomenon1,14. The results are consistent across different means of presentation, including pictures15, video16, feature films17, and VR18. VR clearly provides a higher degree of experimental control compared to studying occurrences of smoking in a natural setup and also provides a higher degree of immersion and interaction than 2D technologies, in which a typical setup offers a reduced field of view and participants are simply spectators19. Furthermore, VR allows the placement of people in a surrounding virtual environment that may be associated to smoking, thus triggering craving from a broad contextual cue20.

In a systematic review of 18 studies that involved 541 smokers, it has been shown that VR presentation of cues can produce a large triggering of craving (Cohen’s d = 1.0)21. In the largest study to date we wanted to show that similar effects can be produced from delivery of VR scenes in the new generation of standalone headsets. This could then form the basis of the development of a new cognitive intervention for smoking cessation. There is extensive evidence in the literature that exposure to smoking cues in VR triggers craving18,22, 23. However, only a few papers have reported the results of randomised control tests of the use of immersive VR technologies so far. A number of studies used VR to expose smokers to cues24 or used VR to try to improve the results of an approach bias modification approach25. Other studies used more limited technologies such as 360-degree videos26 and Second Life27. These studies have taken different approaches in experimental design or the experimental setup, making it difficult to compare their results. None have tested long-term effects.

Subjective measurements are the most common way to assess cravings. A small number of studies have also supplemented self-report with objective measurements such as physiological data. For example, it has been suggested that craving can be a predictor of physiological arousal28, and skin conductance has been shown to increase after exposure to smoking cues29. Therefore we also included physiological measures in our test.

In our pilot study, simple pre and post testing with 13 smokers indicated that our new VR environment may increase smoking craving6. In this paper, we present the results of a randomised controlled study with 100 participants using the same VR smoking cue scenario. We collected self-reported measurements through questionnaires and physiological data related to heart rate and skin conductance.

Methods

Experimental design

The study was a between-subject experiment in which we carried out a between group comparison of smoking craving scores after going through a VR experience, either a neutral environment or a scenario depicting potential smoking cues. Participants were randomly allocated to an experimental group using the online tool Sealed Envelope (https://www.sealedenvelope.com/). Ethics approval was granted by the University of Oxford Central University Research Ethics Committee (reference R81586/RE001). All research was performed in accordance with relevant guidelines/regulations, and written informed consent was provided by all participants.

Participants

Participants were recruited through advertisements on social media and local radio stations. The inclusion criteria were: over 18 years old and smoke a minimum of 10 cigarettes per day. The exclusion criteria were: photosensitive epilepsy; significant visual, auditory, or balance impairment; insufficient comprehension of English to complete study measures; using nicotine replacements; primary diagnosis of another substance dependency; or medication that reduces nicotine cravings (e.g. Bupropion).

Measures

The main outcome of the study was the score from the Tobacco Craving Questionnaire30. It is a 12-item short version of the long 47-item questionnaire. The long version is validated and reported as being reliable for research31 and the short version has similar internal consistency to the original version. It assesses craving for a cigarette at the time of filling it out, with answers on a 1 (strongly disagree) to 7 (strongly agree) Likert scale. Participants filled out this questionnaire before and after the VR experience. Scores can range from 12 to 84. Higher scores indicate greater craving for a cigarette.

As additional outcomes, participants provided subjective scores of their current cravings of cigarettes and alcohol using two visual analogue scales (VAS), with answers from 0 (Do not want one at all) to 10 (Extremely want one). Participants provided their current cravings before and after the VR experience. Higher scores indicate greater craving. Although internal consistency cannot be checked due to being just a single item, the use of VAS has been an increasingly popular method to quantify subjective experiences (e.g. pain32), and has been found to be a valid way to measure intensity of cigarette craving33.

At baseline participants completed the Heaviness of Smoking Index (HSI)34, which is a 2-item questionnaire to assess how much a person smokes, including the questions “How many cigarettes do you typically smoke per day?” and “How soon after you wake up do you have your first cigarette?” Answers were given as categorical numbers. These two measurements have been shown to be fairly reliable when used either separately or together35.

At baseline participants completed the AUDIT-C36, which is a three-item questionnaire to assess average alcohol use. Multiple studies have validated this questionnaire37.

Electrodermal activity and heart rate were recorded during baseline and the VR experience with the use of an Empatica E4 wristband (https://www.empatica.com/en-gb/research/e4/). Data included two pairs of inter beat interval (IBI) and electrodermal activity (EDA) files, one pair for baseline and the other recorded during the VR experience.

VR scenarios

We used the Meta Quest 2 VR headset in standalone mode for all the VR sessions. That means we used the VR headset without using a computer to run the simulation. There were two scenarios, one for the experimental group and one for the control group. Each lasted three minutes. In both scenarios participants sat down for the entire duration of the experience.

In the experimental group, participants were placed in an environment that resembled a British pub outdoor space6 (see Fig. 1). There were several people sitting around as on a typical warm sunny day. There were several items related to smoking cues in the scenario—pint glasses, ashtrays with cigarette butts, one of them half extinguished and still releasing a trail of smoke. On the bench next to the participant, two virtual characters were chatting. Over time their discussion turned towards cigarettes and how hard it is to quit smoking. At the end of the scenario, one of the characters turned their head towards the participant and asked them directly if they wanted a cigarette.

Figure 1
figure 1

Screenshots of the VR scenarios taken from the initial perspective of the participant (1) the beer garden; (2) the room in the neutral environment. Images created in Unity 2020.3.3f1 (https://unity.com/releases/editor/whats-new/2022.3.3).

In the control group, participants visited a neutral environment in a modern house with wide windows, similar to the welcome room described in38 (see Fig. 1). The landscape outside included different types of vegetation, a water stream, and clear blue sky. The environment included quiet background music. This environment did not contain any smoking cues.

No other hardware was required besides the VR headset, the Empatica E4 wristband, and a smartphone to record the physiological data.

Experimental procedures

Participants were asked to refrain from smoking 30 min before coming to the VR lab. Upon arrival, they were met at the reception area by a researcher who guided them to the VR lab. Once in the lab, they were asked to confirm that they had read the information sheet at least 24 h prior to the VR session and if willing to participate, to sign a consent form. After agreeing to participate, they filled out the AUDIT-C, the Tobacco Craving Questionnaire, and the visual analogue scales to obtain baseline measurements for the initial craving scores.

When they completed the questionnaires, they were randomised and allocated to the experimental condition they were instructed to remain seated for the rest of the session. A researcher helped them put the Empatica E4 wristband on their dominant arm and recorded two minutes of physiological data as a baseline. After that, participants put the VR headset on, the researcher made sure that vision was clear and the headset had been adjusted to the participant’s comfort, the Empatica E4 started recording data once again, and the VR experience started.

After the VR experience ended, the researcher helped them to take the VR headset off and were asked to fill out the Tobacco Craving Questionnaire and the visual analogue scales again. Participants were compensated twenty pounds for their time.

Data analysis

Analyses were conducted in R version 4.3.039. The main outcome was the craving score on the TCQ after the VR experience. We carried out a linear regression analysis with experimental group (StudyCondition) as the independent variable and controlling for initial craving scores.

$$Tobacco \,Craving \,Questionnaire \,\left(TCQ\right)score \sim StudyCondition + TCQBaseline$$

The scores reported by participants on the two VASs (cigarettes and alcohol) were also analysed using linear regression. Similarly, we used group as the independent variable and controlling for initial craving scores. Linear regression analyses were carried out in R using the lm function. Effect sizes were summarised as Cohen’s d values calculated using the cohen.d function in R.

We tested scores for how much participants smoked (HSI) and drank (AUDIT-C) on average as possible moderators of the main outcome with the following equation:

$$TCQ \,score \sim StudyCondition + HSI+ AUDIT\_C+ StudyCondition * HSI + StudyCondition * AUDIT\_C$$

Electrodermal activity (EDA) data pre-processing and initial visual analysis was carried out using the Matlab-based tool Ledalab40. We carried out a visual inspection on each dataset to detect anomalies in the data. We discarded the data from participants if more than 50% of their data were zero on either the baseline or the VR experience dataset. We also discarded the data showing sudden jumps that were too abrupt to be attributed to a change in skin conductance and did not recover to the original level after a few seconds.

Data cleaning41 included data trimming, smoothing, and correction of artifacts originated from bad readings. We trimmed a few seconds in the beginning and in the end of the dataset as it was common that there were a few faulty readings at both ends. Trimming was done manually keeping the data from the moment the function looked stable. We smoothed out the data to remove high frequency noise using a filter with a Gauss window size 8. We also removed any isolated spike due to bad readings and we reconstructed the signal with either a linear or a Spline interpolation, depending on what was more suitable in each case. The sample rate was kept at the recording rate of 4 Hz.

We split the signal between the tonic and the phasic components using continuous decomposition analysis40. The tonic component provides an overall background level and tendency over time of the signal, while the phasic component contains the information about sudden peaks and changes. We then analysed both components separately. We looked at the mean and standard deviation in the tonic component during the VR experience relative to the baseline. For this, we divided both the mean and standard deviation obtained in the VR experience by the values obtained in the baseline. We also looked at the skin conductance level (SCL) as the gradient of the tonic component. In the phasic component, we studied the mean and standard deviation relative to baseline the same way we calculated it in the tonic component. We then compared these extracted features between experimental groups.

Regarding heart rate data, we were interested in the heart rate variability (HRV), calculated from the IBI data. These data are processed only when the two beats are detected, thus the number of samples varied between participants. The mean and standard deviation of the HRV used in the statistical analysis were also relative to the baseline values.

We analysed all the features extracted from both EDA and IBI signals in a linear regression with the experimental group as the sole independent variable.

Results

30 male and 22 female participants were allocated to the experimental group. The average age was 39.12 (SD = 15.12). In the control group there were 27 male and 21 female participants, with an average age of 37.77 (SD = 14.49). No participants selected their gender as either ‘non-binary’ or ‘preferred not to say’. In the experimental group, the average HSI score was 3.33 (SD = 1.20) and the AUDIT-C score was 5 (SD = 3.01), and in the control group, the average HSI was 3.25 (SD = 0.84) and the average AUDIT-C score was 6.25 (SD = 2.97).

Table 1 shows the scores of the three questionnaire outcomes. All three scores obtained after the VR experience were statistically different between experimental groups. Compared to the neutral condition, the experimental group had large effect size increases in cigarette craving and a moderate increase in craving for alcohol. The Heaviness of Smoking Index reported during screening predicted cigarette craving after the VR session across both experimental groups (HSI, p < 0.0001) and alcohol use did not (AUDIT-C, p = 0.85). Looking within groups (i.e. pre to post changes), the experimental group increased in scores on the TCQ (p < 0.01), the VAS for cigarettes (p < 0.0001), and the VAS for alcohol (p < 0.001). The control group decreased in scores on the TCQ (p < 0.0001) and the VAS for cigarettes (p = 0.02) but did not significantly alter in alcohol craving (p = 0.18).

Table 1 Mean, std deviation and tests results of the craving outcome scores.

There were missing data from the physiological recordings. We had electrodermal activity (EDA) data from 71 participants (42 in the experimental group and 29 in the control group). Table 2 shows the mean, the standard deviation, and the results from the regression analyses of the different features extracted from the physiological data. The results did not show any statistically significant difference between experimental groups on any of the features extracted.

Table 2 Mean, standard deviation, and test results between experimental groups of the features extracted from the physiological data.

Discussion

We conducted the largest experimental test of whether VR simulations can produce craving for cigarettes in people who smoke regularly. Importantly the test used the latest standalone headset without use of an external computer. The VR public house scene produced a large increase in cigarette craving compared to a neutral VR scene. The Cohen’s d for cigarette craving was 1.1, which is similar to the effect size reported in a meta-analysis of studies focused on cue-induced craving in VR21. The VR pub experience led to significantly increased levels of craving from before to after immersion (i.e., there was a within group effect), but it should be noted when considering the magnitude of the between groups effect that there was also a significant reduction in cigarette craving in the neutral experience. This reduction may perhaps be explained by the use of VR technology being interesting for the participant and hence distracting from cravings. Furthermore, it is possible that the virtual environment, which had windows with views to a natural landscape, might have been found calming like a relaxation nature scene42. The VR pub scene also led to an increase in the smokers in craving for alcohol. The results once again show how VR can induce similar responses to real-world environments. Our VR pub scene could form the basis for the development of a smoking cessation therapy.

We tested whether level of smoking and alcohol consumption affected responses. Neither had a differential affect by the type of VR scene. However, people who reported smoking a greater number of cigarettes had higher levels of cigarette craving in both the VR scene and the neutral scene. In contrast, level of alcohol use did not predict level of cigarette craving in VR. This further validates the use of VR, since it shows that, as would be expected, cravings elicited in VR are affected by a person’s severity of smoking (but not alcohol use).

Regarding the physiological information collected, we explored different features from electrodermal and heart rate data that could be related to craving. The tonic component in the electrodermal data could reveal an overall increase in anxiety. We analysed the mean and standard deviation of this signal relative to the data recorded during baseline. We also looked at skin conductance by calculating the gradient of the linear regression. Electrodermal values naturally change over time, so we predicted that in the experimental group there would be a significantly higher number of participants with a positive slope compared to the control group. We did not find evidence that any feature was statistically different between the randomised groups. Table 2 shows that the values are in the decimals, so we speculate that the signal-to-noise ratio was possibly close to zero dB. Skin conductance values were low. That means that the overall tonic values did not change to any great degree for any participant. Analysing the tonic driver might be more meaningful in longer experiences than three minutes.

The phasic signal is a marker of how participants respond to specific events during the experience. The VR pub scenario contained several smoking cues. We were interested in looking at whether these cues could trigger craving in very specific timestamps thus showing a spike in the data. We analysed the mean and standard deviation of the phasic driver, but the results did not show any difference in the data between the two groups. Data from an accelerometer can provide the information to see if a change in the electrodermal data comes from the movement of limbs. Additionally, electrodermal values can change when people talk, so a voice detection algorithm could be helpful. We finally studied the inter-beat interval to look for statistical differences in the heart rate variability. Again, we looked at the mean and standard deviation relative to baseline for each participant. The results did not show any statistical differences in this case either. Given the strong findings for subjective craving, it is plausible that we did not assess the most useful physiological information to detect it.

The Empatica E4 device has been validated43,44. However, there were missing data. Researchers need procedures in place for setting it up correctly and to expect that the signal might change if the wristband moves. Recordings were three minutes long. Longer sessions would have provided better estimations of the tonic driver and the overall skin conductance level over time. On the other hand, the phasic signal detecting peaks should not have been too affected by the length of the recording. However, it should be kept in mind that changes in the phasic driver induced by stimuli will be reflected in the signal a few seconds later, between one and five seconds40. Our scenario with smoking cues ended with one of the characters looking directly at the participant and offering a cigarette. Recording was stopped right at that moment, whereas it should have carried on longer to capture the response. For the phasic driver, it is important to make an estimation of the signal-to-noise ratio (SNR) to facilitate the task of discriminating the peaks from the noise. We applied a smoothing function when preparing the signal before the decomposition but the phasic signal was not completely noise-free.

There are considerations when using the Empatica E4. If the band is too loose, the readings will vary and data will become unreliable. If the band is too tight, it can create discomfort and interfere with the VR experience. Ideally, the signal acquired needs to be as clean as possible to minimise the amount of post-processing. Another consideration that we noticed is that the wristband has a button that needs to be pressed to start and stop recording data. When the button is pressed, the sensors are pressed against the skin of the person wearing it and that is clearly visible in the data with short spikes and oscillation on the first seconds of the recording, as well as at the end. The data needed to be trimmed but, ideally, the best way to operate the wristband is remotely via the API provided by the manufacturer.

Developing VR therapies based on rigorous experiment is more likely to lead to clinically successful outcomes. This study not only confirms the base potential for VR in helping people smoke less but shows that the scenario can form part of the content of a VR therapy.