1 Introduction

Mental health cases are increasing at a worrying rate. Early recognition of stress may prevent its detrimental impact, and proper stress management may reduce the risk of stress-related diseases [1]. Because they are simple and non-invasive to measure, heart rate (HR) and heart rate variability (HRV) have been proposed as practical indicators for stress evaluation using wearable sensors, e.g. chest-strap detectors with electrocardiogram (ECG) electrodes, or finger- or wrist-worn photoplethysmography devices [2,3,4]. HRV is the variation in the time between consecutive heartbeats, i.e. in the R-R interval (the interval between two successive R waves in the ECG signal), and reflects the interaction between heart and brain and the balance between the parasympathetic and sympathetic branches of the autonomic nervous system (ANS). During stressful moments, the sympathetic nervous system stimulates the release of hormones that cause the “fight-or-flight” reaction, increasing HR and affecting HRV [5]. Numerous parameters have been proposed to evaluate stress based on group analysis [2, 3, 5]. However, HR and HRV responses to stress may vary between individuals, so significant parameter differences between stress and baseline conditions at the group level may not truly reflect the response of each individual.

The HR and HRV responses to stress are usually measured in the short term (conventionally 5 min) or the long term. However, real-time requirements restrict the use of conventional short-term HRV in routine medical practice, brief experimental tasks and the sports industry [6]. This has led to an interest in developing ultra-short-term metrics, in which HR and HRV analyses are obtained from recordings of shorter duration [7]. Studies have attempted to establish the correlation between ultra-short-term and short-term recordings to identify reliable parameters [8,9,10]. However, the validity and reliability of the techniques and tests used are questionable [6]. Therefore, a standardised guideline has been proposed in which reliable parameters are determined using validation tests [7].

This study aims to identify reliable HR and HRV parameters for stress conditions by implementing the recommended standardised guideline [7]. The surrogacy of ultra-short-term recording for conventional short-term recording in stress evaluation is also investigated.

2 Methodology

2.1 Dataset

The WESAD dataset [11] was used to perform HRV analysis. The data was collected from 15 subjects (mean age 27 ± 2 years, 12 males). Exclusion criteria were pregnancy, heavy smoking, mental disorders, and chronic or cardiovascular diseases. The dataset contains physiological and acceleration signals in three conditions: baseline (control), amused and stressed. Participants underwent a guided meditation to allow their heart rate activity to return to baseline level after the amusement and stress stimulation. The entire data collection process took about two hours, with the order of conditions alternated between subjects to avoid event sequence effects (Fig. 1).

Fig. 1 Two versions of event sequences used in data collection. The red boxes indicate a pause where participants fill in self-reported questionnaires after a stimulation session [11]

For this study, only the baseline and stress conditions were evaluated. In the baseline condition, a 20-min recording was obtained while subjects were given neutral reading material (magazines). In the stress condition, the Trier Social Stress Test (TSST), a test designed to induce moderate mental stress in a laboratory setting, was performed. The TSST has been shown to stimulate the secretion of cortisol, a stress hormone related to the “fight-or-flight” reaction [11]. Subjects were asked to deliver a five-minute public speech, followed by a five-minute arithmetic task of counting down from 2023 to zero in steps of 17. Participants were required to start over if they made a mistake. The total duration of the TSST was 10 min. The data was sampled at 700 Hz and recorded using a RespiBAN Professional chest device (Wireless Biosignals S.A., Lisbon, Portugal). Only the ECG signals were used in the analysis.

2.2 Data Analysis

The ECG data was extracted using MATLAB R2020b (The Mathworks Inc, Natick, MA, USA) and imported into SinusCor for HRV analysis [12]. Table 1 shows all parameters that were extracted and analysed in this study.

Table 1 HR and HRV parameters based on Camm et al. [13]

The ECG signals were visually inspected and manual peak corrections were performed (Fig. 2). A moving median filter was applied to eliminate noise, while a quotient filter was employed to remove abnormal beats. A beat was identified as incorrect if the change between two successive R-R interval values exceeded 20% [14]. The raw and filtered signals are compared in Fig. 3. The R-R intervals were then extracted from the processed ECG signal. In the baseline condition, 10 min of R-R intervals were taken from the middle of the full 20-min recording, while the full 10-min duration of the stress condition was analysed. These signals were used to obtain the parameters for stress evaluation.
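The 20% rule for abnormal beats can be illustrated with a short sketch. This is a simplified variant written for illustration, not the SinusCor implementation: the function name `quotient_filter`, the `tolerance` parameter, the choice to compare each interval against the last accepted one, and the assumption that the first interval is valid are all hypothetical.

```python
def quotient_filter(rr_ms, tolerance=0.20):
    """Drop R-R intervals that change by more than `tolerance` (20% by default).

    Simplified illustration: each interval is compared with the last
    accepted interval, and the first interval is assumed to be valid.
    """
    if not rr_ms:
        return []
    accepted = [rr_ms[0]]
    for rr in rr_ms[1:]:
        # Relative change with respect to the last accepted interval.
        change = abs(rr - accepted[-1]) / accepted[-1]
        if change <= tolerance:
            accepted.append(rr)
    return accepted


# Example: a missed beat (1600 ms) and a false detection (400 ms) are removed.
cleaned = quotient_filter([800, 810, 1600, 805, 400, 795])
```

Comparing against the last accepted interval, rather than the immediately preceding raw interval, prevents a single artefact from also causing the rejection of the valid beat that follows it.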

Fig. 2 A fragment of raw ECG data obtained under baseline condition

Fig. 3 The raw ECG signal (top) and the filtered R-R tachogram (bottom)

The HRV was analysed in the time and frequency domains. The time-domain measurements were evaluated over successive non-overlapping 30 s segments, while Welch’s method was used to estimate the frequency-domain measurements with a segment size of 256 and 50% overlap. A Hanning window was used together with a linear polynomial fit for signal de-trending to control spectral leakage [14]. To investigate the reliability of ultra-short-term HRV, the 10-min segments of the filtered ECG signal under baseline and stress conditions were truncated into 5-min and 1-min segments (Fig. 4).

Fig. 4 Illustration of 1-min and 5-min segment extraction
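As an illustration of the truncation step and of the time-domain parameters, the following sketch computes a few measures from a list of R-R intervals in milliseconds. It uses the standard definitions of SDNN, RMSSD and pNN50; the function names, the assumption that truncated segments start at the beginning of the 10-min segment, and the use of the mean of instantaneous heart rates for mean HR are assumptions made here, not details confirmed by the study.

```python
import math


def truncate_rr(rr_ms, duration_s):
    """Keep the leading R-R intervals that fit within `duration_s` seconds.

    Assumption: the shorter segment starts at the beginning of the record.
    """
    kept, elapsed_ms = [], 0.0
    for rr in rr_ms:
        if elapsed_ms + rr > duration_s * 1000.0:
            break
        kept.append(rr)
        elapsed_ms += rr
    return kept


def time_domain(rr_ms):
    """Return basic time-domain HR/HRV measures (needs at least 2 intervals)."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the R-R intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # Successive differences drive RMSSD and pNN50.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    # Mean HR taken as the mean of the instantaneous rates (assumption).
    mean_hr = sum(60000.0 / x for x in rr_ms) / n
    return {
        "mean_rri_ms": mean_rr,
        "mean_hr_bpm": mean_hr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
        "pnn50_pct": pnn50,
    }


# Usage: compare a 5-min and a 1-min segment of the same filtered record.
# five_min = time_domain(truncate_rr(rr_ms, 300))
# one_min = time_domain(truncate_rr(rr_ms, 60))
```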

2.3 Statistical Analysis

Statistical analysis was performed using GNU PSPP V1.4.1 (GNU Project, Boston, USA) and Microsoft Excel (Microsoft Corporation, Redmond, USA). Figure 5 shows the flowchart of the analysis implemented in this study. Data normality was examined using the Shapiro–Wilk test. For normally distributed data, the independent t-test was used to test for significant differences between the baseline and stress measurements, while non-normally distributed data was tested using the Mann–Whitney U test. Levene’s test and the two-sample Kolmogorov–Smirnov test were applied to assess the equality of variances and the distribution of the data, respectively, to determine whether the datasets met the assumptions of the Mann–Whitney U test.

Fig. 5 Flowchart of the procedure carried out in this study

The reliability of ultra-short-term parameters was examined using the algorithm proposed by Pecchia et al. [7]. The appropriate tests (Levene’s test and the two-sample Kolmogorov–Smirnov test) were applied to the dataset to confirm that it met the assumptions of the statistical tests used to test for significant differences. Pearson’s correlation (parametric) and Spearman’s correlation (non-parametric) were used to identify the correlation. The Bland–Altman plot and linear regression were used to verify the agreement of the data. Once all steps were performed, the ultra-short-term HR and HRV parameters could be presumed to be good surrogates if they preserved the same behaviour between the 5-min and 1-min recordings and showed a significantly high correlation (r > 0.7 and p < 0.05) between the 5-min and 1-min recordings. All significance thresholds were set at 0.05, and the correlation coefficient threshold was set at 0.7.
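The acceptance criterion described above can be sketched as follows. The correlation coefficients are computed with the standard Pearson and rank-based Spearman formulas; the p-value is taken as an input because it would normally come from the statistics package. The function names and the representation of the trend as a string label are hypothetical choices made for this illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def is_good_surrogate(r, p, trend_5min, trend_1min, r_min=0.7, alpha=0.05):
    """Apply the criterion: high, significant correlation and the same trend.

    `trend_*` is the direction of change from baseline to stress,
    e.g. "increase" or "decrease" (hypothetical labels).
    """
    return r > r_min and p < alpha and trend_5min == trend_1min
```

A parameter such as SDNN, whose direction of change differs between the 1-min and 5-min recordings, would be rejected by `is_good_surrogate` regardless of how high its correlation is.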

3 Results and Discussion

3.1 HR and HRV for Stress Evaluation

HR and HRV responses to stress could vary between individuals, so the individual stress response was evaluated based on the significant difference between the stress and baseline conditions (Table 2: Individual analysis). Parameters that differed significantly in a large number of subjects have good potential to identify stress in the majority of subjects. To show the trend of the results, the comparison between the baseline and stress conditions is given as mean ± standard error across the subjects (Table 2: Group analysis). All 15 subjects demonstrated a significantly lower mean RRi under stress, while 14 showed a significantly higher mean HR. This was followed by 13 subjects with lower RMSSD, SD1 and HF, whereas 12 subjects showed greater LF/HF and LFnu in the stress condition (p < 0.05). Only mean RRi and mean HR differed significantly between the conditions in the group analysis.

Table 2 Comparison between baseline and stress conditions based on individual (left) and group analysis (right)

From the analysis, most of the parameters were non-normally distributed and heteroscedastic, and thus required logarithmic transformation. There was a certain degree of fluctuation in these parameters, with the smallest changes in mean RRi and mean HR. When the subjects were under stress, RMSSD, SDNN, pNN50, SD1, mean RRi, VLF, HF, and HFnu were lower than in the baseline condition, while SD2, mean HR, LF, LF/HF, and LFnu increased. These findings followed the expected trend: as mean HR increased, the interval between successive R waves decreased. Under stress, the ANS responded by suppressing the parasympathetic nervous system (PNS) and activating the sympathetic nervous system (SNS). The SNS sustained homeostasis through sweating, heat dissipation and increased cardiac output. Once the stress had subsided, the PNS would facilitate the return of the body to equilibrium, countering the SNS effects [15]. SNS activity could be reflected by LF, while HF denoted PNS activity [15]. The increase in LF and LF/HF during the stress condition confirmed the SNS ascendancy.

3.2 Ultra-Short-Term Analysis

In the ultra-short-term analysis, all parameters demonstrated significant correlations (r > 0.6), with the majority showing a very high correlation (r > 0.7, p < 0.05) between the 5-min and 1-min recordings (Table 3). Nevertheless, good correlation did not imply good agreement, as the data could be widely spread. As such, the Bland–Altman plot was used to assess the agreement between the 5-min and 1-min measurements, and linear regression was used to test for proportional bias. This bias was observed (q < 0.05) in LF, LF/HF, LFnu, and HFnu under both conditions, and in VLF under stress only. The limits of agreement (LOA) varied greatly because of the wide spread of values between parameters (Table 3). The percentage error (PE) was calculated because the information conveyed by the LOA was unclear and there were no clear visual differences between the parameters in the Bland–Altman plots (Fig. 6). The PE gave a more context-sensitive value: the LOA was divided by the mean of the measurements, and a threshold of ±30% was applied [16]. Based on this, all parameters with PE > 30% in both conditions were disregarded, and the accepted parameters were mean RRi, mean HR, VLF, LF, and HF (Table 3).

Table 3 Significant correlation (r) and proportional bias (q) between the 1-min and 5-min recordings during baseline and stress conditions
Fig. 6 Bland–Altman plot of mean RRi (left) and mean HR (right) during baseline (top row) and stress conditions (bottom row)
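The agreement quantities discussed in this section can be computed as in the sketch below. It assumes the usual Bland–Altman definitions (bias as the mean difference, LOA as bias ± 1.96 standard deviations of the differences) and takes the PE as the LOA half-width divided by the grand mean of the paired measurements; the exact PE formula used in the study is not stated, so this is an assumption. The slope of the differences regressed on the pair means indicates proportional bias; its significance (q) is not computed here. The function name and dictionary keys are hypothetical.

```python
import math


def bland_altman(reference, test, z=1.96):
    """Bias, limits of agreement, percentage error and proportional-bias slope.

    `reference` would hold the 5-min values and `test` the 1-min values.
    Requires at least two pairs with differing pair means.
    """
    diffs = [t - r for r, t in zip(reference, test)]
    means = [(t + r) / 2.0 for r, t in zip(reference, test)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    loa = (bias - z * sd, bias + z * sd)
    grand_mean = sum(means) / n
    # Percentage error: LOA half-width relative to the mean measurement.
    percentage_error = 100.0 * z * sd / grand_mean
    # Least-squares slope of the differences on the pair means.
    sxx = sum((m - grand_mean) ** 2 for m in means)
    slope = sum((m - grand_mean) * (d - bias) for m, d in zip(means, diffs)) / sxx
    return {
        "bias": bias,
        "loa": loa,
        "percentage_error": percentage_error,
        "proportional_bias_slope": slope,
    }


# Usage: a parameter would be disregarded when the PE exceeds 30%.
# result = bland_altman(values_5min, values_1min)
# accepted = result["percentage_error"] <= 30.0
```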

Reliable ultra-short-term parameters should exhibit a significantly high correlation and show the same trend between baseline and stress conditions in both the 1-min and 5-min recordings. Most of the parameters preserved the same trend between baseline and stress conditions in the 1-min and 5-min recordings, except for SDNN and HF (Table 4): SDNN increased during the stress condition in the 1-min recording, whereas a decrease was observed in the 5-min analysis, and the same was true of HF. From this analysis, mean RRi and mean HR not only preserved the trend, but also significantly reflected the stress condition in both the 1-min and 5-min recordings.

Table 4 Comparison of HR and HRV parameters during baseline and stress conditions for 1-min and 5-min recordings

Our results generally concurred with those of Salahuddin et al. [8], who also reported mean HR and mean RRi as being among the reliable parameters for ultra-short 50 s recordings. Esco and Flatt [10] found that the 1-min segment showed the strongest correlation and considered the natural log of RMSSD promising, while Baek et al. [9] accepted RMSSD and HF as good surrogates. However, no standardised tests were used in those studies. Other studies did not justify the reliability of ultra-short-term HRV [17]. Although some studies made valuable comparisons between various time intervals [8,9,10], no comparison was made between the control and the ultra-short-term measurements to assess each parameter’s capability to preserve information (result trend and significant pairs) [8,9,10, 17, 18].

A common mistake is to validate a surrogate measure on the basis of correlation alone, which is insufficient as a validation tool. Our results showed that only mean RRi and mean HR were able to preserve the information (showing the same result trend and significant pairs), although most of the parameters demonstrated a good correlation. Another misconception is to accept a marker as a good surrogate when the null hypothesis of the statistical test between the standard and the marker is accepted [7]. When the LF and HF power spectra are involved, measurements of at least one minute are required to avoid inaccurate results. The lowest frequency of the LF band has a period of 25 s, and a minimum of 250 s of HRV signal is required to measure the full LF power spectrum completely. At least one minute is required for the HF power spectrum [13].
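The 25 s and 250 s figures quoted above are consistent with requiring ten full cycles of the lowest frequency in each band. The sketch below reproduces that arithmetic under this assumption, using the conventional band edges of 0.04 Hz for LF and 0.15 Hz for HF; the function name and the `cycles` parameter are hypothetical.

```python
def min_recording_s(lowest_freq_hz, cycles=10):
    """Recording length (s) needed to capture `cycles` periods of a frequency."""
    return cycles / lowest_freq_hz


lf_period_s = 1.0 / 0.04           # period of the lower LF bound, about 25 s
lf_min_s = min_recording_s(0.04)   # about 250 s for the LF band
hf_min_s = min_recording_s(0.15)   # about 67 s for the HF band
```

Under this rule a 1-min recording is roughly adequate for HF but far too short for LF, which is one reason to interpret LF-based measures from ultra-short recordings with caution.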

The limitations of this study included a small sample size and uneven gender distribution. Future studies should gather a larger sample with equal gender participation to investigate the influence of different activity levels and breathing pace to better reflect the behaviours of HR and HRV parameters [19].

4 Conclusion

The mean RRi, mean HR, RMSSD, SD1, HF, LF/HF and LFnu could reflect the individual stress level in the majority of subjects. With the implementation of the recommended standardised tests, mean RRi and mean HR emerged as potential indicators of the stress condition. Consistent trends were found in the ultra-short-term analysis, and the 1-min mean RRi and mean HR could serve as surrogates for conventional short-term analysis.