Introduction

Orthostatic blood pressure (BP) responses are used to identify patients with orthostatic hypotension who are at risk of syncope, falls and autonomic dysfunction [1]. Traditionally, auscultatory or oscillometric BP measurement methods have been employed [2]. Over the past 20 years emerging clinical and epidemiological practice has shifted towards the use of continuous beat-to-beat BP (CBP) measurements [3]. Impairments in CBP responses are related to increased levels of frailty [4], cognitive impairment [5] and falls risk [6] in cross-sectional and longitudinal studies.

Understanding the reliability of CBP measurements is important in all applications of these measures. Numerous approaches for characterising altered CBP behaviour [7] are suggested in the literature, e.g. initial orthostatic hypotension [8], classical orthostatic hypotension (OH) [3], impaired BP stabilisation [9] and morphological approaches [6]. Given that traditional BP measurements have considerable intra- and inter-individual variability [10] and intra-observer effects [11] we hypothesise that CBP is similarly affected.

Information on measurement error for CBP is important for clinical practice and research studies. Clinical diagnoses require repeatable and robust biomarkers with well-characterised intra- and inter-individual variability [11], while in research, knowledge of measurement error is used in power calculations, to select outcome measures and to reduce errors associated with incomplete correction of confounding and regression dilution bias [12].

In this article we focus on the medium term (4–12 weeks) reliability of active stand responses and related classifications of OH using CBP approaches in a population study of adults aged over 50 years. Bias, standard error of measurement (SEM), minimal detectable change (MDC) between measurements, intra-class correlation coefficients (ICC) and kappa statistics are reported.

Methods

Study population

Participants for this study were recruited from a random sample of older adults (n ≈ 1100) aged over 50 living in Ireland who previously took part in the pan-European Survey of Health, Ageing and Retirement in Europe (SHARE), Ireland between 2006 and 2007 [15]. These individuals were invited to take part in a health assessment in 2011 that mimicked that of the Irish Longitudinal Study on Ageing (TILDA) and included an active stand test [13, 14]. A short-form version of the TILDA computer-aided personal interview (CAPI) was administered during this assessment to provide demographic and self-reported health and socioeconomic information.

To examine the repeatability of outcome variables participants attended for a repeat health assessment approximately 4–12 weeks after the baseline measurement. To control for the effects of time of day and observer on the outcome variables participants were randomly assigned to have their health assessment performed in either the morning (9:30 a.m.) or afternoon (2:00 p.m.) and to be assessed by one of two research nurses. Ethical approval was obtained from Trinity College Dublin ethics committee and all respondents provided signed informed consent prior to participation. All experimental procedures adhered to the Declaration of Helsinki.

Inclusion and exclusion criteria

This nationally representative sample was constructed within the RANSAM framework [14] by a two-stage clustered sampling process, with the Irish Geodirectory as the sampling frame. The Irish Geodirectory is a comprehensive listing of all addresses in the Republic of Ireland. These addresses were assessed for eligibility, and members of eligible households aged ≥50 years were canvassed to participate in the study. The n ≈ 1100 individuals who participated in the original SHARE study were subsequently invited to participate in the health assessment as described above. Patients with specific autonomic dysfunction were not excluded from the study; however, those who lacked capacity to consent to the study or were living in institutional care were excluded.

Continuous orthostatic blood pressure measurement

CBP responses to orthostasis were measured using the volume clamp method (Finometer®, Finapres Medical Systems, Arnhem, The Netherlands) as previously described [3, 16]. Measurements were performed in a quiet, temperate room (21–23 °C). Participants rested in the supine position for 10 min and were then instructed to stand using a "5-4-3-2-1-STAND" prompt. Assistance to mobilise was provided as required. Participants stood while systolic BP (SBP), diastolic BP (DBP) and heart rate (HR) were monitored for 2.5 min with subjects reporting symptoms of dizziness, light-headedness or unsteadiness during the stand (coded as orthostatic symptoms). The measurement hand (left) was maintained by the participant’s side throughout the period of measurement. The height correction unit was used to correct for any hydrostatic offsets introduced during change of position [3, 16]. The Physiocal® system remained on during the supine rest period and was switched off just prior to the stand. This identical protocol was repeated on the second health assessment visit by the assigned research nurse. The role of the observing research nurse was to guide participants throughout the protocol according to a standard operating procedure. Data analyses were performed independently of the observer [16].

Data analysis of CBP records

Raw CBP data were automatically processed to reject artefacts and to perform moving average filtering and BP waveform feature extraction [16]. A 2-s moving average filter was applied to the raw CBP to allow nadir and peak CBP and HR values to be extracted with a high time resolution while also minimising noise. A second moving average filter (±5 s around each reported time point) was applied to the raw CBP, e.g. the filtered value at 60 s is a weighted mean of the values occurring in the window from 55 to 65 s. These data were used to capture the recovery trends as per previous work [3, 16] while balancing noise reduction, the data set size for epidemiological samples and falls risk prediction.

The following features were automatically extracted [3, 16]: (i) supine SBP, DBP and HR: values of SBP (SBPB), DBP (DBPB) and HR (HRB), defined as the average of supine data occurring −60 to −30 s prior to standing; (ii) systolic BP and diastolic BP nadirs: the minimum SBP or DBP value occurring within 30 s of standing, denoted SBPNadir and DBPNadir; (iii) peak HR: the maximum HR after standing (HRPeak), included given its association with ageing [3] and autonomic dysfunction [17]; (iv) recovery values at specified time intervals: SBP, DBP and HR values, denoted SBP (t), DBP (t) and HR (t), where t is time in seconds after standing and takes on values 30, 60, 90 and 110 s; (v) delta values at specified time intervals: the change in SBP, DBP and HR from supine values, denoted ΔSBP (t), ΔDBP (t) and ΔHR (t), where t is time in seconds after standing and takes on values 30, 60, 90 and 110 s; (vi) impaired orthostatic BP stabilisation, OH (t): defined as failure of BP to return towards supine levels, i.e. a persisting drop of ≥20 mmHg in SBP and/or ≥10 mmHg in DBP relative to supine levels at t seconds after standing, where t takes on values 30–90 s in 30-s intervals.

An additional analysis examined the effect of applying a modified threshold of a 30 mmHg SBP drop in those with supine systolic hypertension (>160 mmHg). Sustained tachycardia was defined as a sustained HR increase of >30 bpm or a HR of 120 bpm at all time points after standing.
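The feature definitions above can be illustrated with a minimal sketch. It is not the processing pipeline used in the study [16]: the function names, the unweighted window mean, the 1 Hz sampling and the synthetic step-shaped SBP trace are all assumptions made for illustration, and only the SBP criterion is shown (DBP and HR would follow the same pattern).

```python
from statistics import mean


def window_mean(t, x, centre, half_width):
    """Unweighted mean of the samples of x whose time stamp lies within
    half_width seconds of centre (assumed form of the moving average)."""
    return mean(xi for ti, xi in zip(t, x) if abs(ti - centre) <= half_width)


def extract_sbp_features(t, sbp, time_points=(30, 60, 90, 110), drop_threshold=20.0):
    """Hypothetical helper: supine baseline, nadir, recovery and delta values,
    and the SBP part of the OH (t) criterion. Time 0 is the moment of standing."""
    # Supine baseline: average of data from -60 to -30 s before standing.
    baseline = mean(x for ti, x in zip(t, sbp) if -60 <= ti <= -30)
    # Nadir: minimum of the 2-s (+/-1 s) filtered signal within 30 s of standing.
    nadir = min(window_mean(t, sbp, ti, 1.0) for ti in t if 0 <= ti <= 30)
    features = {"SBP_B": baseline, "SBP_nadir": nadir}
    for tp in time_points:
        value = window_mean(t, sbp, tp, 5.0)  # +/-5 s window around t
        delta = value - baseline
        features[f"SBP({tp})"] = value
        features[f"dSBP({tp})"] = delta
        # Impaired stabilisation: SBP still >= 20 mmHg below supine level.
        features[f"OH({tp})"] = delta <= -drop_threshold
    return features


# Synthetic record sampled once per second: 130 mmHg supine, a drop to
# 100 mmHg on standing and a partial recovery to 125 mmHg from 45 s.
t = list(range(-60, 121))
sbp = [130.0 if ti < 0 else 100.0 if ti < 45 else 125.0 for ti in t]
features = extract_sbp_features(t, sbp)
print(features["dSBP(30)"], features["OH(30)"], features["OH(60)"])
```

In this synthetic trace the 30 mmHg deficit at 30 s meets the SBP criterion, whereas the 5 mmHg deficit at 60 s does not.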

Participant characteristics

The following characteristics were recorded (see Table 1): age (years), gender, height (cm), body mass index (BMI, kg/m²), mean gait speed (cm/s), dominant hand grip strength (kg), history of fractures, faints and falls, and a doctor’s diagnosis of heart attack, hypertension, high cholesterol, stroke, diabetes, atrial fibrillation, arthritis and osteoporosis. Medication use was quantified to capture the number of medications taken, the use of antihypertensives and polypharmacy (taking ≥5 medications). To assess global cognition the Montreal Cognitive Assessment (MoCA) was administered [18], with mental health assessed using the eight-item Centre for Epidemiological Studies Depression (CES-D) Scale [13, 19].

Statistical analyses

Analyses were performed using Stata (version 12). Paired t tests were applied to assess univariate mean differences across time points for each continuous measure (see Table 2). For each continuous variable a linear mixed-effects model (Table 3) was estimated, including a random effect of participant and fixed effects of time of day and observer. From the model the standard deviations of the between-participant variation (\(\text{SD}_{between}\); standard deviation of the random effect) and within-participant variation (\(\text{SD}_{within}\); residual standard deviation) were extracted. ICC, the proportion of variability explained by between-individual differences, was estimated as \(\text{ICC} = \text{SD}_{between}^{2} / (\text{SD}_{within}^{2} + \text{SD}_{between}^{2})\) (see Table 3). A false discovery rate approach was used to control for multiple testing with a corrected significance level of P = 0.0015 [20] for statistical significance. Bland-Altman plots (Supplementary Figs. 1–3) were used to estimate the mean difference between repeated measures (bias) and the MDC for each continuous measure. The MDC reflects the maximum difference between two occasions that could be attributed to chance within the same individual given the estimated within-person variability for each measure. MDC is estimated by \(\text{MDC} = \text{SD}_{within} \times \sqrt{2} \times Z\), where Z = 1.96 for the 95% MDC. The agreement across time points of the following binary classifications was reported using Cohen’s kappa statistic: OH, OH (t) (at all time points t = 10, 20, …, 110 s after standing) and sustained tachycardia.
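As a worked illustration of the two formulas above, the short sketch below computes the ICC and the 95% MDC from a pair of standard deviations. The function names and the input values (15 and 12 mmHg) are hypothetical and are not taken from Table 3; the study itself obtained these quantities from mixed-effects models in Stata.

```python
import math


def icc(sd_between, sd_within):
    """Proportion of total variance attributable to between-participant differences."""
    return sd_between ** 2 / (sd_within ** 2 + sd_between ** 2)


def mdc(sd_within, z=1.96):
    """Minimal detectable change; z = 1.96 gives the 95% MDC."""
    return sd_within * math.sqrt(2) * z


# Hypothetical inputs (mmHg), chosen for illustration only.
sd_between, sd_within = 15.0, 12.0
print(f"ICC = {icc(sd_between, sd_within):.2f}")
print(f"95% MDC = {mdc(sd_within):.1f} mmHg")
```

With these illustrative inputs the ICC is about 0.61 and the 95% MDC about 33 mmHg, showing how a moderate ICC can coexist with a wide MDC.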

Results

Sample

Data from 125 participants (aged 66.2 ± 7.5 years; 55.6% female) were available for analysis; five participants did not have a complete active stand data set. Eighty-two (65.6%) participants were allocated a different observer (observer 1 vs. observer 2), while 57 (45.6%) were assigned a different time of day (a.m. vs. p.m.) for the second visit. The mean time between visits was 84.3 ± 23.3 days. Participant characteristics are detailed in Table 1.

Table 1 Cohort characteristics

Reliability of CBP measurements

After correcting for multiple testing, univariate analysis showed that some absolute measures of BP were lower at repeat than at baseline. However, this did not translate into substantial mean differences in BP recovery between the baseline and repeat measurements, for which only a few statistically significant differences were found. Mean HR and HR recovery were similar in the baseline and repeat assessments (Table 2). We found similar results after multivariate analysis with time of day and observer as covariates (Table 3).

Table 2 Univariate repeatability of active stand variables
Table 3 Multivariate reliability statistics of active stand variables

Substantial within-person variation was noted across all absolute measures of SBP, DBP and HR. These values had (by conventional cut-offs) moderate to high repeatability with ICC values ranging from 0.6 to 0.8, with HR more reliable than BP values.

The repeatability of the baseline-corrected parameters was lower, with similar magnitudes of variation within and between individuals leading to moderate ICC values ranging from 0.4 to 0.7.

The MDC for absolute SBP ranged from 32.7 mmHg to 39.4 mmHg. The MDC values for absolute DBP were smaller and ranged from 15.9 mmHg at 110 s after standing to 20.2 mmHg at the nadir. HR MDC values ranged from 12.7 bpm to 16.7 bpm. MDC values for the measures corrected for supine resting values were marginally improved (Table 3). Supplementary Figures 1–3 depict the Bland-Altman plots for these relationships.

Reliability of orthostatic hypotension and related classifications

Classical OH (κ = 0.21; P = 0.009) and OH (t) demonstrated low-to-moderate agreement across repeated measurements, with the reliability of OH (t) highest at 50 s after standing (κ = 0.40; P < 0.0001) (Table 4). For example, although 16 out of 125 participants were classified with OH at the baseline assessment and likewise at the repeat, only 5 of these were classified with OH on both occasions.
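To show how Cohen’s kappa relates to counts of this kind, the sketch below computes kappa from a 2 × 2 agreement table. The cell counts are an assumption: they take the example above to mean that 16 of 125 participants had OH at each visit and 5 had OH at both, which leaves 11 positive at baseline only, 11 at repeat only and 98 at neither. The function name is hypothetical.

```python
def cohens_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two binary classifications of the same participants."""
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n          # observed agreement
    p_first = (both_pos + first_only) / n         # prevalence at visit 1
    p_second = (both_pos + second_only) / n       # prevalence at visit 2
    # Agreement expected by chance if the two visits were independent.
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Assumed reading of the classical OH example: 5 positive at both visits,
# 11 at baseline only, 11 at repeat only, 98 at neither (total 125).
kappa = cohens_kappa(both_pos=5, first_only=11, second_only=11, both_neg=98)
print(f"kappa = {kappa:.2f}")
```

Under this assumed table the raw agreement is 82.4%, yet chance agreement is already 77.7% because OH is uncommon, which is why kappa is only about 0.21.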

Table 4 Reliability of orthostatic hypotension (and variants) and postural tachycardia definitions

Additional analysis indicated that the reliability of the OH definition was not sensitive to the effect of changing the diagnostic threshold to 30 mmHg in those with supine hypertension >160 mmHg.

Discussion

Continuous BP and HR measurements (and commonly derived clinical measures, e.g. OH) during standing have low-to-moderate levels of reliability (ICC = 0.46–0.80 for continuous measures, κ = 0.21–0.40 for binary classifications) and wide MDCs over a 4–12-week period. While many absolute BP measures were lower during the repeat visit, there was less systematic difference in measures of BP recovery.

The mean values of SBP, DBP, HR, ΔSBP, ΔDBP and ΔHR are relatively stable across a 4–12-week period. HR values are the most reliable and DBP values the least reliable. From the Bland-Altman plots (Supplementary Figs. 1–3) it is evident that the MDCs are wide and range from 25 to 40 mmHg for SBP/ΔSBP, 11–16 mmHg for DBP/ΔDBP and 12–16 bpm for HR/ΔHR measures.

Clinicians should be aware of the MDC in these measures when assessing the effect of interventions or the progression of patients. Owing to the natural variation and day-to-day changes in factors that affect BP, 95% of patients would be expected to exhibit changes within the 95% MDC by chance alone when two measures taken at different times are compared.
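The interpretation of the 95% MDC can be checked with a small simulation. This is a sketch under stated assumptions: the true value of each simulated participant is unchanged between visits, measurement noise is normally distributed, and the within-person standard deviation of 12 mmHg is a hypothetical value rather than an estimate from Table 3.

```python
import math
import random


def fraction_within_mdc(sd_within, n_pairs=100_000, z=1.96, seed=0):
    """Simulate pairs of measurements of an unchanged true value and return the
    fraction of pairs whose absolute difference falls within the MDC."""
    rng = random.Random(seed)
    mdc = sd_within * math.sqrt(2) * z
    within = 0
    for _ in range(n_pairs):
        first = rng.gauss(0.0, sd_within)   # visit 1 deviation from true value
        second = rng.gauss(0.0, sd_within)  # visit 2 deviation from true value
        if abs(second - first) <= mdc:
            within += 1
    return within / n_pairs


print(f"Fraction within 95% MDC: {fraction_within_mdc(12.0):.3f}")
```

The difference between two independent measurements has standard deviation \(\text{SD}_{within} \times \sqrt{2}\), so the simulated fraction should lie close to 0.95; only changes larger than the MDC are unlikely to be explained by within-person variation alone.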

From a research perspective, knowledge of between- and within-person variability will assist in performing power calculations and reducing bias associated with misclassified exposure variables (otherwise called predictor or independent variables) or confounding variables. Regression dilution bias occurs when exposure variables are measured with error and causes underestimation of effects. This bias can be corrected using the reliability information we report here [12]. The degree of dilution is determined by the reliability of the exposure measurement, given here by the ICC, which ranges from 0 to 1. For example, an uncorrected estimate, β = −0.15, of the association between increasing ΔSBP(30) and measures of global cognition is attenuated (biased towards zero) if an unreliable measurement of ΔSBP(30) is used. This can be corrected by applying the correction formula \(\beta_{corrected} = \beta / \text{ICC}\). Using the ICC of 0.59 reported in Table 3, this gives a corrected coefficient of \(\beta_{corrected} = -0.25\). Our findings are therefore important to those estimating the effects of impaired BP recovery on future health outcomes. More generally, measurement error in confounding variables can also lead to false-positive effects, because confounding control for such variables will be incomplete if unreliable measures are used. Procedures such as errors-in-variables regression can produce unbiased estimates, but rely on knowing the reliability or the within-person standard deviation to correct these effects. Hence our estimates are also important to those using such measures as confounding variables [12].
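The correction in the worked example can be reproduced in a few lines. The function name is hypothetical, and the sketch covers only the simple case described above, in which a single exposure is measured with error and its reliability is given by the ICC.

```python
def correct_for_dilution(beta, reliability):
    """Divide an attenuated regression coefficient by the reliability (ICC)
    of the exposure to undo regression dilution in the single-exposure case."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in the interval (0, 1]")
    return beta / reliability


# Values from the worked example: beta = -0.15 and ICC = 0.59 for dSBP(30).
beta_corrected = correct_for_dilution(-0.15, 0.59)
print(f"corrected beta = {beta_corrected:.2f}")
```

Because the ICC is below 1, the corrected coefficient is always further from zero than the uncorrected one; with an ICC of 1 the estimate is left unchanged.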

The variability reported here is comparable to previous reports of autonomic function and BP measures [21, 22] and is likely physiological in origin [23]. It is well known that SBP, DBP and HR fluctuate considerably over a broad range of time scales, from diurnal to seasonal [10, 24]. These fluctuations are thought to arise from internal and external sources including changes in neurohormonal activity [25], circulating volume [26], changes in sympathetic and parasympathetic activity [8] and environmental effects [11]. Furthermore, the active stand response is under the influence of a number of additional factors including self-selected speed of standing, muscle pump activation [27], movement artefact [16], time of day [28] and other experimental effects (e.g. observer), which are likely to further affect the repeatability of measurements. Previous studies have noted that orthostatic BP measurements have relatively poor reproducibility [29, 30], with the reproducibility thought to be higher in those with neurogenic OH [31]. Others have also noted low levels of agreement in OH definitions (kappa 0.17–0.32) based on active stands performed at the same time on different days [32], suggesting that, at least in this relatively healthy population, OH can be transient for a number of people.

Strengths and limitations

A number of limitations are worth noting in our study. Underlying physiological changes, e.g. in hydration status, may have occurred over the 4–12-week period and may account for some of the within-individual variability detected. No restrictions were placed on participants in terms of exercise, food intake and medications, all of which may contribute to measurement variability. Repeating this study over a shorter time frame in a more controlled environment with the inclusion of same-day repeated measurements could lead to higher estimates of reliability, but our findings reflect the variation likely to be observed in real-world settings. The presence of orthostatic symptoms was noted; however, their severity and type were not quantified, nor was the reliability of initial orthostatic hypotension assessed [33]. We recognise that the duration of the stand was shorter than advised by AAS consensus criteria. This was for a number of reasons: the challenges of a large epidemiological study require a trade-off between sample representativeness and length of time to conduct individual tests; based on an initial pilot study 95% of the population had recovered by 120 s; and the duration of the data reported (110 s) was chosen to minimise missing data and end effects, which influence the moving average filtering process [3].

This study has a number of significant strengths. A community-representative sample was studied, allowing our results to be generalised to community-dwelling individuals. Given the prevalence of OH (12.8%), sustained tachycardia on standing (5.6%) and diabetes (6.4%) in this sample, it also captures those with autonomic dysfunction. A dedicated sample would be required to focus on specific autonomic disorders, e.g. diabetic autonomic neuropathy. However, our findings suggest that within-person standard deviation is not linked to absolute values of BP recovery, and so there is no evidence that our estimates of within-person standard deviation would not be applicable in this group, although it would be interesting to examine these relationships in individuals with type 2 diabetes or hypertension [34]. A comprehensive battery of physical, cognitive and mental health measures was obtained, alongside socioeconomic and health utilisation metrics. All measures were collected using internationally standardised protocols.

Conclusion

Continuous BP and HR measurements and related clinical definitions of OH during standing show low to moderate reliability and substantial natural variation over a 4–12-week period. Understanding variation in measures is essential for researchers designing studies or estimating the effects of OH, while in clinical settings reliability statistics should be considered when evaluating longer term treatment effects.