1 Introduction

Heart rate variability (HRV) is commonly considered an important tool to assess overall cardiac health and the status of the autonomic nervous system (ANS) [1, 2]. HRV reflects cardiovascular complexity and the organism's capability to react to environmental and psychological stimuli, and is widely regarded as a noteworthy feature of healthy physiological regulation that is commonly degraded with disease and aging [1,2,3,4,5]. Therefore, HRV may reveal an ongoing disease, or even an impending cardiovascular pathological state [2, 6]. The “gold standard” method to assess HRV involves measuring the time interval between consecutive heartbeats (RR intervals) from electrocardiographic (ECG) recordings [1,2,3,4,5,6]. While recordings as long as 24 h (“long-term” recordings) can be considered in the analysis of HRV, recordings of ~ 5 min are the most commonly analyzed, both for practical purposes and because they allow the assessment of short-term cardiovascular control [1, 6]. The analysis of short-term HRV can also be used, with a reasonable time resolution, to assess changes in ANS activity related to altered psychophysiological states, including the response to physiological stressors [7, 8].

Short-term HRV is commonly described through different analysis techniques, based on time-domain, frequency-domain, and information-theoretic indexes [1, 6, 9, 10]. Time-domain analysis can quantify the magnitude of variability either beat-to-beat or overall within the considered timeframe. Frequency-domain indexes, on the other hand, are convenient to estimate the distribution of absolute or relative power into specific frequency bands [1]. Furthermore, information-domain entropy-based measures allow the assessment of the complexity of HRV, which is related to the regularity of the temporal patterns found in the time series and is linked to the balance between sympathetic and parasympathetic ANS activity [1, 3, 7, 11,12,13,14,15,16]. Notably, the interest in the information-theoretic assessment of short-term cardiovascular variability is increasing, as documented by several recent studies focused not only on the analysis of HRV complexity, but also on the analysis of physiological interactions within the cardiovascular and cardiorespiratory control systems [3, 7, 11,12,13,14,15,16].

In recent years, there has been an upswing in the use outside clinical settings of photoplethysmography (PPG), an optical technique capable of detecting microvascular blood volume changes in tissues [16, 17]. The simplest PPG devices usually consist of a light source illuminating the tissue, and a photodetector capable of sensing the small variations in reflected or transmitted light intensity [16,17,18,19]. The PPG working principle relies on the different absorption of infrared light by blood compared with the surrounding skin tissues, which yields a signal proportional to changes in blood volume [16,17,18,19]. The growing interest in PPG is mainly due to its ease of use, low cost, safety, minimal invasiveness, and potential to carry out a wide range of physiological assessments, such as the measurement of blood oxygen saturation and the extraction of cardiovascular and respiratory parameters. Moreover, PPG signals are becoming more widely available thanks to the possibility of acquiring them with cameras embedded in smartphones and smartwatches [20, 21]. The PPG technique is also used in clinical and physiological research to measure arterial pressure variability, employing medical devices that exploit the volume-clamp method, such as Finapres and Finometer [17, 22,23,24]. Finapres (FINger Arterial PRESsure) devices were first developed in the early 1980s to provide reliable continuous blood pressure (CBP) monitoring. Their working principle is based on the dynamic vascular unloading of the finger arterial walls using an inflatable finger cuff [17, 22,23,24]. In such devices, a PPG probe controls an air pump to counteract finger volume changes, so that the cuff pressure oscillations reflect the arterial pressure signal and can be used as its surrogate [17, 22]. CBP and PPG signals are thus related to each other, since a pulsation of the arterial diameter during a heartbeat produces a pulsation in the photodetected signal; e.g., the timings of the CBP signal are based on the plethysmographic principle [17, 23].

Starting from a PPG or blood pressure signal, a time series of pulse rate variability (PRV) can be extracted, which can be considered as an alternative to the recording of HRV [16,17,18,19]. However, PPG or CBP recordings are typically affected both by physiological factors related to the transmission of the pulse wave along the vascular bed, and by measurement errors due, for instance, to motion-induced signal corruption and to the lower accuracy of peak detection [16, 25]. Such drawbacks may impair the agreement between PRV and HRV, thus potentially limiting the usability of PPG-based approaches for the evaluation of HRV parameters. This issue has been widely investigated in recent years [21, 24, 26,27,28,29,30,31,32,33,34,35,36,37], and we refer the reader to [24] for a comprehensive review. Most of the papers [26,27,28,29,30,31,32,33] take into account time- and frequency-domain indexes for the analysis, while only a few also consider information-theoretic variables [34, 35]. While an overall good agreement between PRV and HRV variables has been found in the literature, results still appear controversial [24, 27,28,29,30,31,32,33].

The present study aims at contributing to the assessment of the reliability of PRV as a surrogate of HRV with a special focus on the following: (i) the exploration of a broad range of measures able to characterize short-term cardiac control complexity, and (ii) the evaluation of the agreement during conditions of physiological stress. To this end, we evaluate, in a group of healthy subjects monitored during supine rest, postural stress induced by head-up tilt, and mental stress induced by arithmetic tests, several measures typically used to quantify short-term HRV in the time domain (mean, variance, and root mean square of the successive differences (RMSSD)), frequency domain (low-to-high frequency power ratio LF/HF, HF band central frequency, and spectral power), and information domain (entropy, conditional entropy, self entropy). To the best of our knowledge, this paper is the first which compares PRV to HRV taking into account, in the same study, nine different indexes, also including information-theoretic domain ones. This approach considers the complex nature of beat-to-beat heart rate oscillations stressing the importance of their assessment using time series analysis tools in various domains. In addition, our work brings special focus on validation, performing a thorough analysis of the differences and relations between two assessed methods (PRV and HRV) exploiting three types of analysis (i.e., hypothesis testing, correlation analysis, Bland–Altman plots). Our intention was to complement the results already available in the literature in terms of when and to which extent PPG-based measurements can be used to assess HRV in place of the standard but more cumbersome ECG technique as regards the detection of stress states resembling those of daily life situations, with possible implications for homecare and fitness applications. The present work completes and extends results recently presented in a preliminary form in ref. [38].

2 Materials and methods

2.1 Subjects and experimental protocol

The present work makes use of a database previously collected to evaluate the effects of physiological stress and cognitive workload on cardiovascular variability [39]. Seventy-six young healthy volunteers (32 males, 44 females, age 18.4 ± 2.7 years), all normotensive and with a normal body mass index (BMI = 21.3 ± 2.3 kg/m²), participated in the study. All participants signed a written informed consent, and when the subject was a minor (age < 18 years), prior permission from a parent or legal guardian was obtained to allow participation in the study. All the procedures were approved by the Ethical Committee of the Jessenius Faculty of Medicine, Comenius University, Martin, Slovakia.

Signals were recorded in five different phases of the experimental protocol: (a) 15 min with subjects resting in the supine position, after initial stabilization of physiological parameters, to obtain measurements at a baseline level (phase R1), (b) a head-up tilt test performed for 8 min in order to produce orthostatic stress (phase T), (c) another phase of 10 min of supine rest to allow the physiological parameters of the subjects to recover (phase R2), (d) a mental arithmetic test lasting 6 min, executed with subjects lying in the supine position, to evoke cognitive load (phase M), and (e) another phase of 10 min of supine rest to let the physiological parameters recover again (phase R3). Head-up tilt was performed in phase T by passively tilting the motorized bed table on which the volunteers were lying to a 45° upright position. The arithmetic test was carried out in phase M using WQuick software with WIN 5 PMT test (Psycho Soft Software, s.r.o., Brno, Czech Republic) and consisted of a repetitive display on the ceiling of the room of randomly generated 3-digit numbers. Each subject was asked to read the numbers and mentally sum up the digits as quickly as possible: if the result was a two-digit number, the subject was instructed to keep summing the digits until a one-digit number was reached; then, the subject had to decide whether the final resulting number was even or odd by using a computer mouse to click the corresponding virtual button also projected on the ceiling.

The analyzed data consisted of ECG and blood pressure recordings acquired simultaneously at a sampling rate of 1 kHz. ECG was obtained using horizontal bipolar thoracic leads and recorded by Cardiofax ECG-96220 (Nihon Kohden, Japan), while blood pressure data were obtained using Finometer Pro device (FMS, The Netherlands), which measures beat-to-beat arterial pressure variability through the volume-clamp method [22, 23].

2.2 Time series and data analysis

Analyses have been carried out selecting windows of N = 300 consecutive heartbeats, for each of the five phases described in the previous subsection. The windows were selected during stable physiological conditions to avoid transition effects from one phase to another, thus favoring the stationarity of the time series. In detail, the 300-point windows were extracted from the recorded signals, respectively, starting ~ 8 min after the beginning of phase R1, ~ 3 min after the beginning of phase T, ~ 3 min after the beginning of phase R2, ~ 2 min after the beginning of phase M, and ~ 5 min after the beginning of phase R3. The analyzed 300-point windows were free of artifacts, including those related to calibration of the Finometer device (such calibration, which interrupts the measurement of CBP, was executed only in the last minute of phases R1 and R2).

Starting from the acquired data, the n-th RR interval (RRI) was calculated from the ECG as the time interval between the n-th and (n + 1)-th QRS apexes, while the n-th pulse-to-pulse interval (PPI) was measured as the time interval between the n-th and (n + 1)-th blood pressure maxima. RRI and PPI values were extracted using the LabChart 8 toolbox (ECG analysis and blood pressure modules) from ADInstruments. An example of the RRI and PPI time series measured for a representative subject during the phases R1, T, and M is reported in Fig. 1 (top row panels).

Fig. 1

Examples of RRI and PPI time series (top row panels), spectral decomposition (mid row panels), and probability distributions (bottom row panels) for a representative subject monitored at rest (a, phase R1), during orthostatic stress (b, phase T), and during mental workload (c, phase M). Black lines/plots represent RRI data, while blue lines/plots denote PPI data. Dotted lines in mid row panels indicate the k components obtained from spectral decomposition representing the oscillatory structure of the process. In this example, the indexes measured in the three phases (R1, T, and M) were the following: MEAN (911.24, 720.08, 843.09 ms from RRI; 911.24, 720.13, 843.09 ms from PPI), SDNN (40.33, 52.74, 40.74 ms from RRI; 42.18, 57.19, 45.24 ms from PPI), and RMSSD (40.82, 29.24, 33.16 ms from RRI; 42.18, 39.57, 45.70 ms from PPI) for the time-domain analysis; fHF (0.336, 0.166, 0.191 Hz from RRI; 0.336, 0.171, 0.189 Hz from PPI), HF (606.19, 608.32, 527.81 ms² from RRI; 726.38, 744.60, 727.46 ms² from PPI), and LF/HF (0.999, 3.028, 1.002 from RRI; 0.826, 3.116, 0.732 from PPI) for the frequency-domain analysis; and H (5.046, 5.209, 4.842 nats from RRI; 5.035, 5.321, 5.009 nats from PPI), CE (2.067, 1.470, 1.700 nats from RRI; 2.042, 1.839, 1.974 nats from PPI), and SE (0.116, 0.547, 0.122 nats from RRI; 0.119, 0.395, 0.062 nats from PPI) for the information-domain analysis.

For both the RRI and PPI time series measured in each phase, time-domain analysis was performed computing the average value (MEAN), the standard deviation of the normal-to-normal intervals (SDNN), and the root mean square of successive RRI or PPI interval differences (RMSSD) [1, 40]. SDNN has been chosen as an index of interest since both sympathetic nervous system (SNS) and parasympathetic nervous system (PNS) activity contribute to its value, and also because it represents the gold standard for medical stratification of cardiac risk, although this has been proved on 24-h period recordings [1]. Instead, RMSSD reflects the beat-to-beat variability in pulse interval (or heart rate) and is the primary time-domain measure used to assess vagally mediated changes in HRV [1], and was computed as [40]

$$ \mathrm{RMSSD}=\sqrt{\frac{1}{N-1}\sum \limits_{n=1}^{N-1}{\left(x\left(n+1\right)-x(n)\right)}^2}, $$
(1)

where x(n) can be either RRI(n) or PPI(n), indicating the n-th measurement of the interval, and N = 300. Before performing frequency- and information-domain analyses, PPI and RRI time series were further pre-processed by removing slow trends (by means of a zero-phase IIR high-pass filter with a cutoff frequency of 0.015 Hz) and then reduced to zero mean by subtracting the mean value. Quasi-stationarity of the selected time series was verified by applying the test described in [41], which checks a restricted form of weak stationarity by assessing the stability of mean and variance across sub-windows of the analyzed time series.
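As an illustration of the time-domain indexes, the following is a minimal sketch that computes MEAN, SDNN, and RMSSD (Eq. 1) from a list of intervals in milliseconds. The function name `time_domain_indexes` is hypothetical, and the use of the sample (N − 1) standard deviation for SDNN is an assumption, since the normalization is not stated here.

```python
import math
import statistics


def time_domain_indexes(x):
    """Return (MEAN, SDNN, RMSSD) for a sequence of RRI or PPI values in ms."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two intervals are required")
    mean = statistics.fmean(x)
    # Assumption: SDNN is taken as the sample (N - 1) standard deviation.
    sdnn = statistics.stdev(x)
    # Eq. (1): root mean square of the N - 1 successive differences.
    diffs = [x[i + 1] - x[i] for i in range(n - 1)]
    rmssd = math.sqrt(sum(d * d for d in diffs) / (n - 1))
    return mean, sdnn, rmssd


# Toy example with four intervals (illustrative values, not study data).
mean, sdnn, rmssd = time_domain_indexes([800.0, 810.0, 790.0, 805.0])
```

In the study, the same computation would be applied to each 300-beat window of RRI and of PPI separately.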

Parametric spectral analysis was then carried out fitting the pre-processed time series with an autoregressive (AR) model of the form

$$ x(n)=\sum \limits_{k=1}^p{a}_kx\left(n-k\right)+w(n), $$
(2)

where a_k are the linear regression coefficients that weight the linear dependence of the current sample of the time series, x(n), on the past samples, x(n−k), with delay k = 1,..., p; p is the model order, and w(n) is an uncorrelated Gaussian white noise process with zero mean and variance \( {\sigma}_w^2 \). Here, model identification was performed through the ordinary least squares method and, instead of using standard model order selection criteria, a model order of p = 10 was selected to allow representation of different oscillations within the low-frequency (LF, range 0.04–0.15 Hz) and high-frequency (HF, range 0.15–0.4 Hz) spectral bands [38]; in general, it has been shown that orders from p = 9 to p = 25 generate statistically similar normalized spectral parameters [42]. Then, the power spectral density of the AR process was computed in the frequency domain starting from the AR coefficients as [43]

$$ P(f)={\left.\frac{\sigma_w^2}{{\left|A(z)\right|}^2}\right|}_{z={e}^{j2\pi fT}}, $$
(3)

where \( A(z)=1-{\sum}_{k=1}^p{a}_k{z}^{-k} \) is the representation of the coefficients in the z-domain.
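To make Eq. (3) concrete, the sketch below evaluates the AR power spectral density at a given frequency from a set of AR coefficients. The function name `ar_psd` is hypothetical, the coefficients in the example are arbitrary, and taking T as the mean interval of the series (in seconds) is an assumption, since T is not defined explicitly in the text.

```python
import cmath
import math


def ar_psd(a, sigma2_w, f, T):
    """Evaluate P(f) = sigma_w^2 / |A(z)|^2 at z = exp(j*2*pi*f*T).

    a        : AR coefficients [a_1, ..., a_p] as in Eq. (2)
    sigma2_w : variance of the white noise w(n)
    f        : frequency in Hz
    T        : sampling period in s (assumed here: mean interval)
    """
    z = cmath.exp(2j * math.pi * f * T)
    # A(z) = 1 - sum_k a_k z^{-k}, the z-domain polynomial of Eq. (2).
    A = 1 - sum(a_k * z ** (-k) for k, a_k in enumerate(a, start=1))
    return sigma2_w / abs(A) ** 2


# Example: spectrum of an arbitrary AR(2) model on a grid up to 1/(2T).
T = 0.9  # s, illustrative mean interval
freqs = [i * (0.5 / T) / 200 for i in range(201)]
spectrum = [ar_psd([0.5, -0.3], 1.0, f, T) for f in freqs]
```

In practice, the coefficients and the noise variance would come from the least squares identification of the order-10 model described above.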

In this work, we used a spectral decomposition method to split P(f) into k components reflecting the oscillatory structure of the process [44], each associated with a central frequency f_k and a power P_k computed from the roots of the polynomial A(z) [16]. An example of spectral decomposition performed from RRI and PPI time series during R1, T, and M is shown in Fig. 1 (mid row panels). As frequency-domain indexes, we selected the central frequency of the most prominent peak located in the high-frequency band (fHF), the HF spectral power in absolute units, and the ratio of the total power located in the LF band to that found in the HF band (LF/HF). In detail, LF power may reflect the activity of both PNS and SNS; however, the SNS usually does not generate rhythms much above 0.1 Hz, while the PNS can affect heart rhythms down to 0.05 Hz [1]. The power content in the HF band reflects parasympathetic activity, and a strong relationship between the HF central frequency and respiratory influences has been observed [1]; moreover, lower HF power has been correlated with stress, panic, anxiety, or worry conditions [1]. Finally, the LF/HF power ratio has been chosen since it has long been used as an index reflecting the proportion of sympathetic to parasympathetic activity [1], although its reliability as a measure of the autonomic balance is largely debated [45, 46]. In addition to the indexes analyzed herein, the LF spectral power (both in absolute and normalized units), the central frequency of the most prominent peak located in the LF band, and the normalized power in the HF band have also been considered; description and relevant results are included as electronic supplementary material (Online Resource 1).

Finally, information-domain analysis was carried out to assess the amount of information contained in the most recent sample of the time series x(n), as well as the information shared between x(n) and the past samples x(n−1),..., x(n−m) and the residual information contained in x(n) but not in x(n−1),..., x(n−m). These quantities reflect, respectively, the entropy of the time series (H), the part of this entropy which can be derived from the past history (self entropy, SE) and the part that cannot be derived from it (conditional entropy, CE) [11, 47, 48]. In detail, entropy, conditional entropy, and self entropy are given by:

$$ {\displaystyle \begin{array}{l}{H}_x=-E\left[\log p\left(x(n)\right)\right]\\ {}{C}_x=-E\left[\log \frac{p\left(x(n),x\left(n-1\right),...,x\left(n-m\right)\right)}{p\left(x\left(n-1\right),...,x\left(n-m\right)\right)}\right]\\ {}{S}_x={H}_x-{C}_x=E\left[\log \frac{p\left(x(n),x\left(n-1\right),...,x\left(n-m\right)\right)}{p\left(x(n)\right)\cdot p\left(x\left(n-1\right),...,x\left(n-m\right)\right)}\right]\end{array}} $$
(4)

where for the generic random variables a and b, p(a) is the probability of a, and p(a|b) = p(a,b)/p(b) is the conditional probability of a given b. An example of the probability distribution p(x(n)), computed using histogram quantization where x(n) is RRI(n) or PPI(n), is reported in Fig. 1 (bottom row panels). In this work, we applied two different model-free approaches, described in detail e.g., in ref. [48], to compute the information measures defined in Eq. (4). The conditional entropy was computed using the kernel estimator, which yields the estimate of CE very well known as sample entropy [12]. This is achieved computing the kernel density estimate of the probability distribution, i.e., computing the probability of a generic vector v(n) from M realizations as \( p\left(\mathbf{v}(n)\right)={\left(M-1\right)}^{-1}{\sum}_{i=1,i\ne n}^M\varTheta \left(\left\Vert \mathbf{v}(n)-\mathbf{v}(i)\right\Vert \right) \) where ||∙|| is the maximum norm and Θ is the Heaviside kernel with threshold distance r (i.e., the vectors v(n) and v(i) are similar if the maximum distance between their scalar components is lower than r), and then using the kernel estimator for computing p(x(n), x(n − 1), ..., x(n − m)) and p(x(n − 1), ..., x(n − m)) to be plugged in Eq. (4) to estimate Cx. As to the estimate of entropy and self entropy, here, we adopted an approach based on k-nearest neighbor method for the estimation of probability density functions [48, 49] that is more accurate and less biased than the kernel entropy estimation methods commonly used for Cx [12]. With this approach, estimates of the quantities Hx and Sx defined in Eq. (4) are obtained in practice as follows:

$$ {\displaystyle \begin{array}{l}{H}_x=\psi (N)-\psi (k)+\frac{1}{N}{\sum}_{n=1}^N\ln \varepsilon (n)\\ {}{S}_x=\psi \left(N-m\right)+\psi (k)-\frac{1}{N-m}{\sum}_{n=1}^{N-m}\left(\psi \left({N}_{x\left(n,m\right)}\right)+\psi \left({N}_{x(n)}\right)\right)\end{array}} $$
(5)

where ψ(∙) is the digamma function, ε(n) is twice the distance from x(n) to its k-th nearest neighbor in the one-dimensional space, x(n, m) = (x(n − 1), ..., x(n − m)), ε(n, m + 1) is twice the distance from (x(n), x(n, m)) to its k-th nearest neighbor in the (m + 1)-dimensional space, and Nx(n, m) and Nx(n) are the number of points whose distance from x(n,m) and x(n), respectively, is smaller than ε(n, m + 1)/2. Details about the estimation of these measures can be found in [48, 49]. In this study, before evaluating information-domain indexes, the time series were normalized to unit variance. Then, in accordance with the literature standards for short-time series [12, 49], the parameter m was set equal to m = 2; the threshold distance for the kernel estimator was set to r = 0.2, and the number of neighbors to be used in Eq. (5) was set to k = 10.
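The kernel-based estimate of the conditional entropy can be sketched as follows, using the parameters stated above (m = 2, r = 0.2, series normalized to unit variance). This is a minimal sketch in the usual sample entropy form, i.e., the negative logarithm of the ratio between the number of template pairs matching over m + 1 samples and the number matching over the m past samples; the function name `conditional_entropy_kernel` and the use of the population standard deviation for normalization are assumptions.

```python
import math
import statistics


def conditional_entropy_kernel(x, m=2, r=0.2):
    """Sample-entropy-style estimate of CE (in nats) with a Heaviside kernel."""
    n = len(x)
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("constant series: cannot normalize to unit variance")
    z = [(v - mu) / sd for v in x]

    past_matches = 0   # pairs similar over (x(n-1), ..., x(n-m))
    joint_matches = 0  # pairs similar over (x(n), x(n-1), ..., x(n-m))
    for a in range(m, n):
        for b in range(a + 1, n):  # distinct pairs only, no self-matches
            # Maximum norm between the two past vectors.
            d_past = max(abs(z[a - k] - z[b - k]) for k in range(1, m + 1))
            if d_past < r:
                past_matches += 1
                if abs(z[a] - z[b]) < r:
                    joint_matches += 1
    if joint_matches == 0:
        raise ValueError("no matches found: estimate is undefined")
    return -math.log(joint_matches / past_matches)


# Toy binary sequences (illustrative only): fully regular vs. less regular.
ce_regular = conditional_entropy_kernel([0.0, 1.0] * 10)
ce_irregular = conditional_entropy_kernel(
    [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
)
```

A perfectly periodic sequence is fully predictable from its past and yields a null estimate, while less regular sequences yield larger values. The nearest neighbor estimates of H and SE in Eq. (5) require a neighbor search and are not sketched here.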

2.3 Statistical analysis

The main aim of the analyses carried out in this work was to assess to what extent PPI-based measures can be used to evaluate HRV in place of the gold standard ECG technique. For this reason, the agreement between PPI- and RRI-based approaches has been assessed comparing the distributions of the nine indexes described in Sect. 2.2 across the 76 subjects, using three different testing approaches: (a) hypothesis testing, (b) correlation analysis, and (c) Bland–Altman plots.

For each of the five phases described in Sect. 2.1, we applied the Wilcoxon signed rank test to check whether the paired RRI-based and PPI-based values of each index come from distributions with the same median. Rejection of the null hypothesis (p value < 0.05) indicates a statistically significant difference, i.e., a lack of agreement, between the two distributions under test. Moreover, to test the statistical significance of the differences in median among the five distributions of each measure evaluated across conditions (R1, T, R2, M, R3), we used the Friedman ANOVA test, followed by post hoc Wilcoxon signed rank tests to assess pairwise differences (e.g., R1 vs T, R1 vs R2, R1 vs M, R1 vs R3, and so on), with the Bonferroni correction for multiple comparisons.

As regards correlation analysis, a robust regression technique was used to calculate the slope (a) and the intercept (b) of the regression line, in order to quantify the agreement between RRI-based and PPI-based measurements according to the linear prediction model y = ax + b, where x and y represent values of measures assessed from RRI and from PPI time series, respectively. Moreover, the standard linear regression technique was employed to calculate the Pearson correlation coefficient R, which was taken as a measure of the agreement between the two distributions under test.
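The linear prediction model y = ax + b and the Pearson correlation coefficient R can be illustrated with the sketch below. Note that it uses ordinary least squares for the slope and intercept, not the robust regression employed in this study (whose specific weighting scheme is not detailed here); the function name `pearson_and_ols` is hypothetical.

```python
import math


def pearson_and_ols(x, y):
    """Return (R, a, b) for the model y = a*x + b fitted by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx                  # slope
    b = my - a * mx                # intercept
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return r, a, b


# x: index values from RRI, y: the same index from PPI (toy numbers).
r, a, b = pearson_and_ols([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

With perfect agreement between the two methods, one would expect R and a close to 1 and b close to 0; a robust fit would additionally down-weight outlying subjects.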

Finally, the Bland–Altman plots were used to assess the agreement between the two approaches in terms of the differences between RRI-based and PPI-based measurements (in the y axis) plotted versus their average (in the x axis). The agreement was quantified as a ratio between half the 95% confidence interval for the difference and the mean of the averaged values [50].
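The agreement index described above can be sketched as follows. This is a minimal sketch assuming that half the 95% confidence interval of the differences is taken as 1.96 times their sample standard deviation (i.e., approximately normal differences) and that differences are computed as PPI − RRI; the function name `bland_altman` is hypothetical.

```python
import statistics


def bland_altman(rri_values, ppi_values):
    """Return (bias, (lower, upper), agreement) for paired index values."""
    if len(rri_values) != len(ppi_values) or len(rri_values) < 2:
        raise ValueError("paired inputs of equal length (>= 2) are required")
    diffs = [p - r for r, p in zip(rri_values, ppi_values)]
    avgs = [(p + r) / 2 for r, p in zip(rri_values, ppi_values)]
    bias = statistics.fmean(diffs)
    # Assumption: half-width of the 95% interval = 1.96 * sample SD.
    half_width = 1.96 * statistics.stdev(diffs)
    # Agreement: half the 95% interval divided by the mean of the averages.
    agreement = half_width / statistics.fmean(avgs)
    return bias, (bias - half_width, bias + half_width), agreement


# Toy paired values of one index across three subjects (not study data).
bias, limits, agreement = bland_altman(
    [800.0, 900.0, 1000.0], [802.0, 898.0, 1003.0]
)
```

Smaller values of the agreement index indicate that the dispersion of the differences is small relative to the magnitude of the measured index.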

3 Results

Figure 2 shows the comparison of the distributions of time-domain indexes (i.e., MEAN, SDNN, and RMSSD) across the 76 subjects in the five considered conditions.

Fig. 2

Boxplot distributions of time-domain indexes, i.e., (a) MEAN, (b) SDNN, and (c) RMSSD calculated from PPI (white) and RRI (gray) time series during supine rest phases (R1, R2, R3), head-up tilt (T), and mental arithmetic test (M). Statistical tests: phase name, p < 0.05 Ph.1 vs Ph.2; *p < 0.05 PPI vs RRI

As reported, for all the three considered indexes, the phase-by-phase comparison shows that the distributions in supine rest conditions (R1, R2, R3) are not statistically different. On the contrary, all indexes were significantly lower (p < 0.05) during mental arithmetic and head-up tilt if compared with each of the three resting conditions.

Importantly, the results of the significance tests were exactly the same when the indexes were computed from RRI or from PPI measurements. Regarding the comparison between the two approaches to compute the same index in a given phase, we observed that the distributions of the MEAN index were statistically similar. On the contrary, the values of SDNN and RMSSD were significantly higher if assessed from PPI than from RRI time series.

Figure 3 depicts the comparison of the distributions of frequency-domain indexes, i.e., fHF, HF, and LF/HF. The central frequency of the HF peak in the power spectrum did not show any statistically significant difference across the analyzed experimental conditions, while the HF spectral power was significantly lower during M and T than during any of the supine rest conditions. Analysis of the LF/HF power ratio (Fig. 3c) shows that, while the three resting conditions did not exhibit significant differences from each other, the ratio was significantly higher during T and M than during rest, and was significantly higher during T than during M. For the HF and LF/HF indexes, we also note that the PPI-based values are significantly different from the RRI-based values in all conditions, the difference being positive (larger PPI) for HF and negative (larger RRI) for LF/HF.

Fig. 3

Boxplot distributions of frequency-domain indexes, i.e., (a) fHF, (b) HF, and (c) LF/HF, calculated from PPI (white) and RRI (gray) time series during supine rest phases (R1, R2, R3), head-up tilt (T), and mental arithmetic test (M). Statistical tests: phase name, p < 0.05 Ph.1 vs Ph.2; *p < 0.05 PPI vs RRI

Similar remarks can be made from the comparison of the distributions of information-domain indexes, i.e., H, CE, and SE, shown in Fig. 4. Again, RRI-based or PPI-based distributions of supine resting phases are similar, while T and M are statistically different. In particular, significant differences can be observed for H during both T and M (a significant decrease), for CE during T (a decrease), and for SE during T (an increase). RRI-based and PPI-based values of the indexes are statistically different as the PPI method overestimates H and CE and underestimates SE (p value < 0.05), the difference being more remarkable during T.

Fig. 4

Boxplot distributions of information-domain indexes, i.e., (a) H, (b) CE, and (c) SE calculated from PPI (white) and RRI (gray) time series during supine rest (R1, R2, R3), head-up tilt (T), and mental arithmetic test (M). Statistical tests: phase name, p < 0.05 Ph.1 vs Ph.2; *p < 0.05 PPI vs RRI

Table 1 summarizes the results of the linear correlation analysis between RRI-based and PPI-based measurements of, respectively, time-, frequency-, and information-domain indexes, assessed through robust regression in the five different conditions (R1, R2, R3 shown together in the left panel in each figure, T in the central panel, M in the right panel). In detail, the numerical values of the correlation coefficient (R), slope (a), and intercept (b) of the regression lines are reported in Table 1.

Table 1 Results of regression analysis

With regard to time-domain indexes, an almost perfect agreement (i.e., correlation coefficient and slope of the regression of ~ 1.00) is observed in resting conditions (R1, R2, R3), while a slightly lower slope has been obtained in case of RMSSD in T (a~0.97) and M (a~0.98) phases.

Regression analysis of frequency-domain indexes demonstrates again a very good agreement of central frequency and HF spectral power in resting conditions, with almost unitary slope values and correlation coefficients higher than 0.93. As to the LF/HF power ratio, the agreement is lower than expected during the R1 condition due to the presence of some outlier values, but the correlation is very high in the two other resting phases R2 and R3. The lowest value of the slope of the regression line has been obtained also in this case during the T condition.

Regression analysis of information-domain indexes shows similar results to the previous analysis, demonstrating again a very good agreement of all the considered indexes (i.e., H, CE, and SE) in the supine resting conditions (R1, R2, and R3), with high values of the correlation coefficient in all the cases; lower correlation values were obtained for CE (computed with the kernel estimator) compared with H and SE (computed with the nearest neighbor estimator). In a similar way, the slope of the robust regression line was higher than 0.9 during the three rest conditions when assessed for H and SE, while it was lower for CE. Again, a worse agreement is observed moving from rest to stress, with generally lower R and a values obtained in the M and especially in the T phases. The worst agreement is observed during head-up tilt for SE (R~0.79, a~0.63) and for CE (R~0.79, a~0.61).

Figures 5, 6, and 7 depict Bland–Altman plots, respectively for the time-domain, frequency-domain, and information-domain indexes, during all the five conditions; again, the resting phases R1, R2, and R3 have been analyzed together. Diagrams have been obtained plotting the differences between RRI-based and PPI-based measurement versus their average. The average difference is indicative of the bias of PPI-based measures compared to RRI-based ones, while the width of the 95% confidence intervals of the differences PPI-RRI is indicative of the dispersion of PPI-based measures around RRI-based ones. Numerical values of the agreement between PPI-based and RRI-based measures are reported in Table A of the electronic supplementary material (Online Resource 1).

Fig. 5

Bland–Altman plots of time-domain indexes, i.e., (a) MEAN, (b) SDNN, and (c) RMSSD showing mean values of RRI-based and PPI-based indexes against their difference, computed for the five considered conditions. For each figure, the left panel shows results obtained in resting conditions (R1 in blue, R2 in green, and R3 in red), the central panel those extracted for head-up tilt (T) case, and the right panel those computed for mental arithmetic test condition (M). Horizontal lines denote median (dotted lines) and 95% confidence intervals of the difference PPI-RRI (solid lines)

Fig. 6

Bland–Altman plots of frequency-domain indexes, i.e., (a) fHF, (b) HF, and (c) LF/HF showing mean values of RRI-based and PPI-based indexes against their difference, computed for the five considered conditions. For each figure, the left panel shows results obtained in resting conditions (R1 in blue, R2 in green, and R3 in red), the central panel those extracted for head-up tilt (T) case, and the right panel those computed for mental arithmetic test condition (M). Horizontal lines denote median (dotted lines) and 95% confidence intervals of the difference PPI-RRI (solid lines)

Fig. 7

Bland–Altman plots of information-domain indexes, i.e., (a) H, (b) CE, and (c) SE showing mean values of RRI-based and PPI-based indexes against their difference, computed for the five considered conditions. For each figure, the left panel shows results obtained in resting conditions (R1 in blue, R2 in green, and R3 in red), the central panel those extracted for head-up tilt (T) case, and the right panel those computed for mental arithmetic test condition (M). Horizontal lines denote median (dotted lines) and 95% confidence intervals of the difference PPI-RRI (solid lines)

With regard to time-domain indexes (Fig. 5), there is an almost perfect agreement for the MEAN parameter between RRI-based and PPI-based measurements (around 10⁻⁴) in all the conditions.

Moreover, the average values of PPI-RRI are very small, as also seen from the almost null median values in the plots. SDNN also shows a very good agreement, with an agreement index lower than 0.05 in all cases and especially in the resting conditions. However, the average values of PPI-RRI and the median of their distribution show that PPI measurements overestimate SDNN in all conditions, especially during head-up tilt. The same overestimation by PPI measurements is observed for RMSSD. In addition, the agreement for RMSSD was very good (~ 0.05) for the resting phases, but worse for the other conditions, being 0.09 during M and 0.13 during T.

The Bland–Altman analysis carried out on frequency-domain indexes (Fig. 6) shows quite different results. First, the bias is very low for fHF in all the considered conditions. A larger positive bias was instead found for HF, indicating that PPI-based measures overestimate HF power.

The agreement is quite good (~ 0.01) only for fHF, while it is worse for the other indexes. In detail, agreement is poor (> 2) for the LF/HF power ratio during R1, R2, and R3 (considered together) and during T, and only slightly better in the M phase. The agreement differs markedly across conditions for HF, ranging from ~ 0.16 in the supine resting conditions to ~ 0.83 during T.

Figure 7 reports the results of the Bland–Altman analysis performed for the information-domain indexes. The bias, which is the average of the differences, is positive for the indexes H and CE and negative for SE, confirming that PPI-based measures overestimate entropy and complexity and underestimate the regularity of HRV. Moreover, the bias was small during rest and M and larger during T for all indexes. The agreement is very good (~ 0.02) in all conditions for the index H, while it is worse for the indexes CE and SE, with values ranging from ~ 0.14 to ~ 0.5.

We highlight that similar remarks can be made when analyzing LF spectral power and also normalized power values, which are other frequency-domain indexes widely employed in the literature [1, 6]. We refer the reader to the electronic supplementary material (Online Resource 1) for this analysis.

4 Discussion

The aim of the present study was to investigate to what extent PRV measures, derived by detecting blood pulsation in the peripheral circulation, can substitute for traditional HRV measures derived from the ECG, with a focus on descriptive indexes based on short-term analysis performed in the time, frequency, and information domains. For PRV analysis, we have employed blood pressure data acquired via a Finometer device (previously collected for other purposes). We are aware that volume clamp photoplethysmography-based devices, being less portable and more expensive than basic PPG devices, are not the preferred choice for practical applications if the purpose is to measure PRV. However, as the CBP and PPG signals are closely related to each other in terms of pulsation timings, both have been proposed to compare HRV and PRV [24], and this suggests that our results should also reflect the agreement between HRV and PRV when the latter is assessed through portable photoplethysmographic devices [17, 23].

In the literature, different works have been devoted to comparing PRV and HRV variables [24, 27,28,29,30,31,32,33]. Even if an overall good agreement has been obtained in most papers for time- and frequency-domain indexes, the results appear still somewhat controversial [24, 27,28,29,30,31,32,33], especially with regard to RMSSD and some frequency-related variables, such as LF, HF, and LF/HF [27,28,29] or during head-up tilt, exercise [26, 27], or mental stress [29]. Also, a few studies using blood pressure (Finapres) data tend to confirm a greater disagreement between PRV and HRV variables in the short-term or HF domain [30,31,32,33]. In particular, in [30], a comparative study of ECG and CBP signals recorded during a variety of situations (e.g., supine, seated rest, orthostatic tilt, psychological tasks) has been carried out, again taking into account time- and frequency-domain indexes. However, data were analyzed by intraclass correlation reliability coefficients only [30]. More recent research papers have investigated the feasibility to assess the reliability of pulse-rate variability measurements obtained from PPG signals acquired by video cameras [36], smartphones [21], or smartwatches [37]. Also, in such works, only time- and frequency-domain indexes have been taken into account, finding an overall good agreement of temporal parameters, but bigger differences and disagreement for LF and HF powers [21, 37] or during standing if compared to resting position [36].

Our results complement the findings already available in the literature, highlighting the feasibility of extracting HRV indexes from PPI-based data in several different conditions. Overall, the relatively good agreement of PPI-based measures with the corresponding RRI-based gold standard is demonstrated by their ability to detect changes in the autonomic nervous system state during physiological challenges, by the generally low difference of their absolute values (Figs. 2, 3, and 4), high correlation (Table 1), and low dispersion of differences (Figs. 5, 6, and 7). The agreement was almost perfect in the resting supine positions (R1, R2, R3), while it was worse in the 45° upright position (T) and during the mental arithmetic test (M). In particular, PPI-based measures of the time-domain indexes exhibited a higher bias and coefficient of agreement during T and M according to Bland–Altman analysis (Fig. 5). As to frequency-domain indexes, resting conditions presented a better agreement, documented in terms of correlation coefficient (Table 1) and Bland–Altman plots (Fig. 6), compared to mental and especially to postural stress. The HF spectral power was significantly lower during M and T than during any of the supine rest conditions, suggesting that the HF power decreases during M and especially during head-up tilt due to reduced parasympathetic activity. Moreover, the LF/HF ratio was markedly higher during T, confirming the well-established behavior that in this case the LF component becomes dominant, reflecting sympathetic activation; this occurred without any significant change of LF spectral power in absolute units compared to resting conditions, due to the decrease of the total variance [6]. Moreover, the reduction of HF spectral power during the mental arithmetic test is consistent with the fact that stress states are associated with a decreased parasympathetic tone, which is reflected in lower HF power [1].
In the information domain, supine resting conditions showed a close agreement between the two approaches, being instead worse during stress, especially in the case of conditional entropy and self entropy computed during head-up tilt. A possible explanation of the worse agreement for information measures during physiological stress is that the higher variability exhibited by PRV compared to HRV is likely determining more complex patterns in the time series, which cause the higher CE and lower SE found in our study (Fig. 7). This finding has potential physiological and clinical relevance, since information measures such as the CE and the SE have been shown to respond to different degrees of neural sympathetic activity during postural stress, and are thus believed to respond to sympathetic control [7, 13].

When comparing the different descriptors in the time, frequency, and information domains, we observe that the disagreement between PRV and HRV is largest for the LF/HF power ratio. For this index, the accordance between PPI- and RRI-based data was worse both in terms of correlation coefficient and of Bland–Altman agreement. In the literature, the agreement between PPI- and RRI-based values of the LF/HF index has been controversial [24]. In particular, similarly to our results, in several previous works the LF/HF ratio presented a negative bias and a lower agreement when compared to other indexes [24, 29, 51]. The reason behind this may be physiological, as the LF/HF ratio, though traditionally used as an indicator of sympathovagal balance, is actually a measure affected by several factors which determine its high variability across subjects and conditions. In fact, its use as a measure of balance between sympathetic and parasympathetic activity has already been widely questioned [1, 45, 46], since the underlying assumption in using the LF/HF ratio (i.e., that an increased sympathetic activity is accompanied by a decreased parasympathetic activity) is not always valid and instead depends on specific measurement conditions, especially in the case of short-term HRV (5-min data) [1, 45]. Moreover, LF power is not a pure index of sympathetic activity, since a significant portion of the variability in this frequency band is mediated by the PNS [1]. The situation may change during head-up tilt, when sympathetic activation induces a shift in the sympathovagal balance that generally increases the value of LF/HF [6]; this may also explain the fact that in our study, LF/HF did not show the higher disagreement during tilt displayed by most of the other indexes.

The analysis of the distribution of values for each index computed over the 76 subjects analyzed showed that PPI-based distributions present in most cases a small but statistically significant deviation from the reference RRI-based measures, with a bias which is positive for SDNN, RMSSD, HF, H, and CE, and negative for the LF/HF and SE indexes. However, such deviation does not affect the capability of the PPI-based method to detect, in almost all cases, the same differences across conditions (rest, T, and M) that are detected by the RRI-based analysis. Moreover, the detected deviations are not surprising, since previous studies reported bias in the estimation of PRV-based indexes, especially regarding fast variations in the cycle length reflected by the RMSSD measure in the time domain and by the HF activity in the frequency domain [26, 34].

The discrepancies observed between HRV and PRV can be due to inaccurate detection of the pulse rate and/or to noise and movement artifacts which are known to affect CBP and PPG recordings. In our study, we tend to exclude the second reason because our acquisitions were executed in carefully controlled laboratory settings, also avoiding transition effects from one phase to another, and the analyzed time series were selected as free of artifacts and fulfilled the test for restricted weak-sense stationarity [41] (see Sect. 2.2) [3, 39]; however, random errors due to localization of the maxima in the round-shaped pulse waves of the blood pressure signal may play an important role in determining different values of PPI compared to RRI. Another plausible reason for the observed discrepancies can be related to physiological factors. PRV differs from HRV due to the distorting effect of non-constant pre-ejection period (PEP), which depends mainly on left ventricular contractility [52,53,54,55], and to pulse transit time (PTT), which has been also shown to exhibit physiological variability [24, 56,57,58,59,60,61]. PTT is the time that a pressure pulse takes to travel between two arterial sites and depends on the current condition of the vessels and on blood pressure [59, 60]. We refer to [60] for propagation models establishing the relationship between PTT and arterial elasticity, which may play a role in determining the discrepancy between short-term PRV and HRV and between the related descriptive indexes. PEP is one of the components of the time delay between R wave from ECG and pulse wave from PPG, and several previous studies demonstrated its dependence on the autonomic nervous system state and ventricular filling (venous return) [52,53,54,55]. PEP depends on the electromechanical functioning of the heart and can thus vary independently of PTT (e.g., PEP changes in the same direction as PTT during exercise but in the opposite direction during vasoconstriction) [61]. 
Finally, we note that a main role in determining physiological differences between PRV and HRV is played by respiration, which may affect ventricular loading and thus PEP, as well as intra-thoracic pressure, stroke volume, and arterial blood flow, and thus PTT. Future studies are needed to clarify better the role of inaccuracies in PPI measurement vs physiological factors in determining the differences between HRV and PRV.
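The way PEP and PTT enter the pulse-to-pulse interval can be made explicit with a short worked relation. This is a sketch under stated assumptions: the symbols $t_R(n)$ (time of the $n$-th R wave), $t_P(n)$ (time of the corresponding peripheral pulse), and $\mathrm{PAT}(n)$ (pulse arrival time, taken here as the sum of PEP and PTT for beat $n$) are introduced for illustration and do not appear in the text above.

```latex
\begin{align}
  t_P(n) &= t_R(n) + \mathrm{PAT}(n),
  \qquad \mathrm{PAT}(n) = \mathrm{PEP}(n) + \mathrm{PTT}(n) \\
  \mathrm{RRI}(n) &= t_R(n+1) - t_R(n) \\
  \mathrm{PPI}(n) &= t_P(n+1) - t_P(n)
                   = \mathrm{RRI}(n) + \bigl[\mathrm{PAT}(n+1) - \mathrm{PAT}(n)\bigr]
\end{align}
```

Under these assumptions, PPI coincides with RRI only when the pulse arrival time is constant from beat to beat; any beat-to-beat change in PEP or PTT adds directly to the difference PPI-RRI.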

In spite of the discrepancies discussed above, our results demonstrate that it is feasible to employ PPI-based measures to assess statistically significant variations of both standard and novel descriptive indexes of short-term HRV computed in the time, frequency, and information domains in response to physiological changes related to orthostatic stress or cognitive workload. Indeed, a number of changes observed in the descriptive indexes of HRV across different experimental conditions were observed identically also for the corresponding indexes based on PRV. These changes are specific for the ANS state modification and reflect the well-known physiological regulatory mechanisms responsible for the cardiovascular system control [3, 13, 62, 63]. In detail, orthostatic and mental stress induce tachycardia as well as an overall decrease of HRV associated with a reduced PNS activity and/or elevated SNS activity, here reflected by the lower values of the MEAN, SDNN, and RMSSD indexes. In the frequency domain, orthostatic stress is reflected in this study by higher values of the LF/HF power ratio, which are seen, even though to a lesser extent, also during mental stress induced by the arithmetic test. With regard to the entropy measures, the decrease of conditional entropy, a measure associated with system complexity [7], as well as the increase of information storage (i.e., the self entropy, SE) [38, 64] induced by head-up tilt in short-term HRV series is a well-known result which has been ascribed to the shift in the sympathovagal balance occurring during head-up tilt causing simpler cardiac dynamics with dominant oscillations centered around the frequency of Mayer waves [3]. It is worth noting also that, in agreement with recent findings [11, 65], conditional entropy and information storage were altered by postural stress but not by mental stress.

Correlation analysis and Bland–Altman plots confirmed the good agreement between PPI- and RRI-based measurements in all the resting conditions. The agreement decreases for the mental arithmetic test and even more for head-up tilt: in detail, lower correlation coefficients and/or a higher deviation of the robust regression line from the optimum condition of a = 1, b = 0 have been obtained, especially for LF/HF, CE, and SE. Such results are in agreement with other works in the literature which demonstrated that the consistency between PPI- and RRI-based measures decreases during physiological stress [26, 34], most probably due to the usually lower magnitude of HRV in such conditions [6]. Again, while the administration of stressors generally favors noise and motion artifacts, physiological factors may contribute to the observed higher discrepancies. These factors may include the reduced variability of RRI during orthostatic and mental stress, and the stronger mechanical coupling between respiration and the thoracic vascular system in the upright position compared to the supine one [66].
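The idea of comparing a robust regression line against the ideal a = 1, b = 0 can be sketched as follows. The sketch assumes that a denotes the slope and b the intercept of the line relating PPI-based to RRI-based values, and it uses the Theil–Sen estimator as one possible robust method; the estimator actually used in the study is not specified in this section, and the function name `theil_sen` and the sample values are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen line y = a * x + b: median of pairwise slopes, median intercept."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    # Slopes between all pairs of points with distinct x values
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    a = median(slopes)
    # Intercept as the median residual after removing the slope
    b = median(yi - a * xi for xi, yi in zip(x, y))
    return a, b


# Hypothetical index values: PPI matches RRI except for one outlying subject
rri_values = [30.0, 40.0, 50.0, 60.0, 70.0]
ppi_values = [30.0, 40.0, 50.0, 60.0, 95.0]
a, b = theil_sen(rri_values, ppi_values)
print(a, b)
```

In this illustrative data set the single outlier does not move the fitted line away from the identity, which is the property that motivates a robust estimator over ordinary least squares when a few subjects show large discrepancies.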

5 Conclusions

This work presents an in-depth comparison of PRV and HRV as regards the computation of descriptive indexes of heart period variability in different physiological states, ranging from the resting supine position to head-up tilt and mental arithmetic test. Our results speak in favor of the utilization of PRV to characterize short-term cardiovascular control, both at rest and in response to postural and mental stress. Nevertheless, the larger disagreement found during mental stress and particularly during postural stress suggests that some caution should be adopted when using PRV measures in place of HRV measures during altered physiological states, especially when subtle modifications in cardiovascular control are sought. Future work should focus on studying how the agreement between PRV- and HRV-based indexes of short-term variability varies with age and in the presence of cardiovascular pathologies, and on identifying its major determinants.