Introduction

Neurogenic orthostatic hypotension (NOH) is a disorder of sympathetic vasoconstriction [11]. Upon standing, the release of norepinephrine from sympathetic nerve terminals is decreased or absent, vasoconstriction in the systemic circulation fails, and blood pressure falls [10]. Reduced blood supply to the brain [8, 16] causes characteristic symptoms that are disabling and interfere with the ability to perform everyday physical activities [4, 9, 10].

Classic symptoms of NOH occur when standing and disappear when lying down [19]. Affected patients describe lightheadedness, dizziness or feeling faint [4]. Changes in vision, described as blurry or gray, are also common. When NOH is severe, blood supply to the brain is critically reduced and loss of consciousness (syncope) may ensue [9, 19]. Other less specific symptoms of NOH, including generalized weakness, fatigue, shoulder and neck pain [1], cognitive slowing, leg buckling and headache, often go unrecognized.

Orthostatic hypotension is defined operationally as a fall in systolic blood pressure of at least 20 mmHg and/or a fall in diastolic blood pressure of at least 10 mmHg within 3 min of standing [2]. NOH can be symptomatic or asymptomatic. Neither the magnitude of the blood pressure fall nor the standing blood pressure always correlates with symptoms of NOH and, in some patients, neither can be used as a reliable measure for clinical decision-making.

Global quality of life questionnaires and symptom assessment instruments are not designed to specifically evaluate symptoms of NOH. Nor do these instruments measure the impact of NOH on daily activities. Therefore, we developed a novel clinical rating scale, the Orthostatic Hypotension Questionnaire, with two components: the Orthostatic Hypotension Symptom Assessment (OHSA) to measure the presence and severity of symptoms and the Orthostatic Hypotension Daily Activity Scale (OHDAS) to measure the impact of orthostatic symptoms on daily activities.

The psychometric properties of the OHQ were evaluated using data from a Phase IV multi-center, randomized, placebo-controlled cross-over trial to assess the clinical benefit of the alpha-adrenergic agonist, midodrine hydrochloride, a drug widely used to treat NOH [5, 28]. Here we present the results of the validation analyses.

Methods

Subjects and study design

The validity and responsiveness of the OHQ were tested on patients with NOH participating in a clinical trial of midodrine [5, 28]. Patients were included in the study if they had a fall in systolic BP > 20 mmHg and/or a fall in diastolic BP > 10 mmHg within 3 min of standing; symptoms of NOH; and a diagnosis of Parkinson Disease (PD), pure autonomic failure (PAF), multiple system atrophy (MSA) or an autonomic neuropathy. The study design was a placebo controlled cross-over trial, with an active treatment run-in. Subjects who responded to treatment were then randomized to a sequence arm (Fig. 1).

Fig. 1 Trial design

Procedures

The OHQ, a clinical global impression–severity (CGI-S) [7, 20], and the SF-36® (version 2) Health Survey [26] were administered to all patients by a trained clinician at baseline, after 2 weeks of receiving midodrine or placebo (treatment 1) and after 2 weeks of cross-over into the opposite arm of the study (treatment 2). The clinical global impression of improvement scale (CGI-Improvement) [7] was administered at both visits (treatment 1 and treatment 2), when the patient had been taking placebo or active agent for 2 weeks. Rating scores were recorded on standardized data-collection forms.

Measures

OHQ

The OHQ was developed by clinicians experienced in the treatment of patients with autonomic disorders (3 neurologists, 1 cardiologist, and 1 endocrinologist) and a psychometrician with experience in clinical scores. During development of the questionnaire, two focus groups were conducted to incorporate the patients’ experience with OH and to avoid clinical “jargon” (data not presented). The questionnaire is divided into two parts. Part I, Symptom Assessment (OHSA), consists of six questions, each rating the intensity of one characteristic symptom of NOH [1. Dizziness, lightheadedness, feeling faint, or feeling like you might black out; 2. Problems with vision (blurring, seeing spots, tunnel vision, etc.); 3. Generalized weakness; 4. Fatigue; 5. Trouble concentrating; 6. Head/neck discomfort]. Part II, Daily Activity Scale (OHDAS), consists of four questions that assess the impact of NOH symptoms on daily activities.

The questions are preceded by instructions to restrict answers only to symptoms that occur on standing and resolve when lying down. The recall period is “over the past week”. The items are scored on an 11-point scale from 0 to 10, with 0 indicating no symptoms/no interference and 10 indicating the worst possible symptoms/complete interference, and with the option of selecting “cannot be done for other reasons”. The composite OHQ score is calculated by averaging the OHSA and the OHDAS. Items that are marked as zero or ‘cannot be done for other reasons’ at baseline are not included in the scoring. Thus, symptoms that are not experienced by an individual subject, as well as other neurological (usually motor) abnormalities that cause problems similar to those being measured, are not included in the rating of NOH. The OHQ scales at post-baseline are calculated using only those items that were included in the baseline scores. When patients had a score for an OHDAS activity at baseline but endorsed ‘cannot be done for other reasons’ or were without a value at randomization (visit 5) or after crossover (visit 6, end of study), a score was assigned using last observation carried forward (in this study, this occurred in only 6 patients).
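The scoring rules above can be illustrated with a minimal Python sketch. The function names and the example answers are hypothetical and are not part of the OHQ itself; `None` is used here as an assumed encoding of ‘cannot be done for other reasons’. The sketch does not implement last observation carried forward, so it assumes every retained item has a value at follow-up.

```python
from statistics import mean
from typing import List, Optional, Sequence

# None stands for "cannot be done for other reasons" (or a missing answer).
Score = Optional[float]


def included_items(baseline: Sequence[Score]) -> List[int]:
    """Indices of items kept for scoring: answered and non-zero at baseline."""
    return [i for i, value in enumerate(baseline) if value is not None and value != 0]


def subscale_score(values: Sequence[Score], keep: Sequence[int]) -> float:
    """Mean of the retained items (assumes each retained item has a value)."""
    return mean(values[i] for i in keep)


def composite_score(ohsa: float, ohdas: float) -> float:
    """Composite OHQ score: the average of the OHSA and OHDAS scores."""
    return (ohsa + ohdas) / 2


# Hypothetical answers for one patient (6 OHSA items, 4 OHDAS items).
baseline_ohsa = [6, 4, 0, 8, 5, 3]
baseline_ohdas = [7, 5, None, 9]
followup_ohsa = [3, 2, 1, 4, 2, 1]
followup_ohdas = [4, 3, None, 6]

keep_ohsa = included_items(baseline_ohsa)    # item 3 dropped (scored 0 at baseline)
keep_ohdas = included_items(baseline_ohdas)  # item 3 dropped (cannot be done)

baseline_composite = composite_score(
    subscale_score(baseline_ohsa, keep_ohsa),
    subscale_score(baseline_ohdas, keep_ohdas),
)
# Follow-up scores reuse the baseline item selection.
followup_composite = composite_score(
    subscale_score(followup_ohsa, keep_ohsa),
    subscale_score(followup_ohdas, keep_ohdas),
)
```

Note that the third OHSA item is scored 1 at follow-up in this example but is still excluded, because item selection is fixed at baseline.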

Global impression of severity and change scales

The clinician global impression of severity (CGI-Severity) is a 7-point scale ranging from 1 (no symptoms) to 7 (most extremely ill with symptoms of OH), with a higher score indicating greater severity. The Clinician Global Impression of Change (CGI-C) is also a 7-point scale, ranging from 1 (very much improved) to 7 (very much worse). The CGI-S and CGI-C were completed by the clinician. Patient-reported versions of the rating scales, the Patient Global Impression of Severity (PGI-S) and Patient Global Impression of Change (PGI-C), were also administered.

SF-36 Health Survey Questionnaire

The SF-36 generic health status questionnaire consisted of 36 items subdivided into 8 domains. Each SF-36 scale was scored using a norm-based approach to standardize scores to a mean of 50 and a standard deviation of 10.
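As a brief illustration of norm-based scoring, the sketch below standardizes a scale score against a reference population and rescales it to a mean of 50 and a standard deviation of 10. The function name is hypothetical, and the reference mean and standard deviation in the example are placeholders, not the published SF-36 norms.

```python
def norm_based_score(scale_score: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a score so the reference population has mean 50 and SD 10."""
    z = (scale_score - norm_mean) / norm_sd  # distance from the norm in SD units
    return 50 + 10 * z


# Placeholder norms: reference mean 50, reference SD 20 on the 0-100 scale.
example = norm_based_score(60, norm_mean=50, norm_sd=20)
```

A score of 60 in this example lies half a standard deviation above the placeholder norm and therefore maps to 55.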

Psychometric analysis

Data quality review

The frequency distribution of OHQ items was examined. Descriptive characteristics, such as mean, median, and range were evaluated, followed by the extent of missing and out-of-range data. The percentage of samples scoring the minimum and maximum possible scores was determined to evaluate the floor and ceiling effects, respectively.
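The floor and ceiling check can be sketched as follows. The function name and the example responses are hypothetical; the sketch assumes the 0 to 10 item range described for the OHQ and ignores missing answers encoded as `None`.

```python
from typing import Optional, Sequence, Tuple


def floor_ceiling(
    scores: Sequence[Optional[float]], minimum: float = 0, maximum: float = 10
) -> Tuple[float, float]:
    """Percentage of answered responses at the minimum and maximum scores."""
    answered = [s for s in scores if s is not None]
    n = len(answered)
    floor_pct = 100 * sum(s == minimum for s in answered) / n
    ceiling_pct = 100 * sum(s == maximum for s in answered) / n
    return floor_pct, ceiling_pct


# Hypothetical responses to a single item from ten patients.
responses = [0, 0, 5, 10, 7, 3, 2, 8, 1, 4]
floor_pct, ceiling_pct = floor_ceiling(responses)
```

Here two of ten responses sit at the minimum and one at the maximum, giving a floor of 20% and a ceiling of 10%.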

Validity

The construct validity of the OHQ was examined to assess the validity of combining items into the OHSA and OHDAS scale scores as well as the validity of a single composite score. Exploratory factor analysis was performed with no prior assumptions as to the structure of the OHQ. Eigenvalues were generated using a correlation matrix from data collected at baseline (visit 3A). A factor was retained if its eigenvalue was greater than unity and it explained more than 5% of the variance in item scores [18]. Confirmatory factor analysis was conducted on the factor solutions suggested by the exploratory factor analysis using OHQ data collected at visit 5 and at the end of study (visit 6). Model fit was examined using the comparative fit index, Tucker–Lewis Index, root mean square error of approximation and weighted root mean square residual. Hu and Bentler’s guidelines were used to interpret model fit, with values of the comparative fit index and Tucker–Lewis Index ≥0.95, root mean square error of approximation <0.06 and weighted root mean square residual <0.90 indicating good fit. Analyses were performed using Mplus (version 5.1).
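The factor retention rule can be sketched in a few lines of Python. This is only an illustration of the two criteria applied to a list of eigenvalues; it is not the Mplus procedure used in the study, and the function name and eigenvalues below are hypothetical. The sketch relies on the fact that the eigenvalues of a correlation matrix sum to the number of items, so each eigenvalue divided by that sum is the proportion of variance explained.

```python
from typing import List, Sequence


def retained_factors(
    eigenvalues: Sequence[float],
    min_eigenvalue: float = 1.0,
    min_variance: float = 0.05,
) -> List[int]:
    """1-based positions of factors meeting both retention criteria."""
    total = sum(eigenvalues)  # equals the number of items for a correlation matrix
    return [
        position
        for position, value in enumerate(eigenvalues, start=1)
        if value > min_eigenvalue and value / total > min_variance
    ]


# Hypothetical eigenvalues for a 10-item questionnaire (they sum to 10).
example_eigenvalues = [5.1, 1.6, 0.8, 0.6, 0.5, 0.4, 0.4, 0.3, 0.2, 0.1]
kept = retained_factors(example_eigenvalues)
```

With these hypothetical values only the first two factors are retained. For a 10-item correlation matrix, any eigenvalue above 1 already accounts for more than 10% of the variance, so the eigenvalue criterion is the binding one.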

Convergent validity of the OHQ was determined by assessing the relationship between the OHQ and other validated patient-scored measures. The strength of the relationship between the patient’s OHQ scores and CGI-S and SF-36 generic health questionnaire scores was examined using Pearson correlations. Both patient and clinician CGI responses were analyzed. As the SF-36 scale is a generic health-related quality of life scale and the CGI scale is conceptually closer to the OHQ, it was hypothesized that CGI scales would be more closely correlated with OHQ scores. The inter-correlations between the OHQ and the different domains of the SF-36 questionnaire were examined to determine whether specific aspects of the scale that measure symptoms of NOH, such as fatigue, correlated more closely with OHQ scores.

Convergent validity was also assessed by correlating absolute OHQ scores and standing systolic blood pressure at baseline and Visits 5 and 6, and by assessing the relationship between changes in OHQ scores and changes in standing systolic blood pressure from baseline to Visits 5 and 6.

Clinical validity was evaluated using known-groups methods to determine whether the OHSA, OHDAS and OHQ-Composite scores were systematically related to disease severity. Patients were sub-divided into two severity groups according to their symptom rating on the CGI-S scale. Group 1 consisted of patients with “little or no symptoms” (with a CGI-S score of 1, 2 or 3). Group 2 consisted of patients with “moderate to severe NOH symptoms” (with a CGI-S score of 4, 5, 6 or 7). Mean OHSA, OHDAS and OHQ-Composite scores were compared in the two groups using ANOVA.
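The known-groups comparison can be sketched as follows: patients are split by CGI-S score and a one-way ANOVA F statistic is computed for the two groups. The function names and the patient data are hypothetical, and the sketch stops at the F statistic; obtaining a P value would require the F distribution, which is not in the Python standard library.

```python
from statistics import mean
from typing import List, Sequence


def severity_group(cgi_s: int) -> int:
    """Group 1: CGI-S of 1 to 3 (little or no symptoms); group 2: CGI-S of 4 to 7."""
    return 1 if cgi_s <= 3 else 2


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical (CGI-S, OHQ composite) pairs for six patients.
patients = [(2, 1.0), (3, 2.0), (1, 3.0), (5, 4.0), (6, 5.0), (4, 6.0)]
by_group: List[List[float]] = [[], []]
for cgi_s, ohq in patients:
    by_group[severity_group(cgi_s) - 1].append(ohq)

f_statistic = one_way_anova_f(by_group)
```

With two groups, this F statistic equals the square of the pooled-variance t statistic, so the comparison is equivalent to a two-sample t test.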

Reliability

Test–retest reliability of the OHQ was examined by calculating the intraclass correlation coefficient between baseline and Visit 5 in the sub-group of patients who reported no change on the PGI-C at Visit 5.

Internal consistency was assessed for the OHSA score, OHDAS score, and OHQ composite score using Cronbach’s alpha. Alpha was calculated for the OHSA using all 6 symptom items, for the OHDAS using the 4 impact on activity items, and for the OHQ composite using all 10 symptom and impact items.
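For readers unfamiliar with Cronbach’s alpha, a minimal sketch of the standard formula is given below: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The function name and the example data are hypothetical, and the sketch assumes complete data with one row per patient.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a patients-by-items table of complete responses."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: three patients, three perfectly consistent items.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(consistent)
```

Items that rise and fall together, as in this example, give an alpha of 1; less consistent items give lower values.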

Responsiveness

Responsiveness was examined by determining whether a change in the patient’s underlying disease status produced an appropriate change in OHSA, OHDAS and OHQ scores. Participants were sub-divided into two groups depending on how they rated their health status relative to baseline while randomized to receive placebo or midodrine (visits 5 and 6). Patients were classified as ‘improved’ if they endorsed ‘very much improved’, ‘much improved’ or ‘slightly improved’ on the PGI-C scale. Patients were classified as ‘the same or worse’ if they endorsed ‘no change’, ‘slightly worse’, ‘much worse’ or ‘very much worse’ on the PGI-C. OHQ responsiveness was examined by comparing the average change in OHQ scores from baseline in the two groups using ANOVA. Analyses were also performed in the same manner using groups defined by clinician report on the CGI-C.

Minimally important difference

In addition to the ability to detect change, it is also helpful to assess the magnitude of change that is important. Using an anchor-based approach, the MID is defined for this study as the change from baseline in the “Minimally Improved” group as defined on the PGI-C. As there is not a single well-accepted approach to the assessment of MID, distributional estimates are also included. In addition to the ½ standard deviation estimates of MID, the standard error of measurement (SEM) is also included. The SEM (standard deviation × √(1 − reliability)), an estimate of MID that is based on the reliability of the questionnaire, is equivalent to the ½ SD when reliability = 0.75 and decreases as reliability increases. The logic behind the relationship is that as the noise inherent in the tool decreases, a score change smaller than the ½ SD estimate becomes more meaningful. Research using the SEM indicates a convergence between anchor-based and distribution-based methods [15]. The internal consistency reliability has been proposed as the most stable estimate to use in calculation of the SEM [17].
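The two distribution-based estimates can be sketched as follows. The function names and the standard deviation used in the example are hypothetical; the formulas are the ones stated above.

```python
from math import sqrt


def half_sd(sd: float) -> float:
    """Half a standard deviation estimate of the MID."""
    return sd / 2


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# Hypothetical baseline SD of 2.0 points.
at_075 = sem(2.0, 0.75)  # equals the half-SD estimate at this reliability
at_090 = sem(2.0, 0.90)  # smaller, because reliability is higher
```

At a reliability of 0.75 the square root term is exactly 0.5, which is why the SEM and ½ SD estimates coincide there and diverge as reliability rises.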

Results

Sample

OHQ data were collected from 137 patients at baseline (visit 3A); of these patients, 103 had OHQ data at Visit 5 and 127 had OHQ data available at Visit 6. The sample was 54% male, 93.6% Caucasian, and had a mean age of 62.7 years (SD 15.15). NOH was associated with diabetic peripheral neuropathy (29%), non-diabetic peripheral neuropathy (11%), pure autonomic failure (28%), multiple system atrophy (11%) and Parkinson’s disease (21%). At baseline, on average, systolic blood pressure fell from 136 ± 21 to 101 ± 22 mmHg. Table 1 shows mean scores and blood pressure values at baseline and after cross-over onto placebo and active agent.

Table 1 Results of the trial

Data quality review

Analysis of the frequency distribution revealed the degree of “missingness” was minimal for all items across time points, with the exception of item Q10 (activities that require walking for a long time), which slightly exceeded the acceptable value of 10%. The OHQ scores showed a normal distribution. The percentage of samples scoring the minimum and maximum possible scores was less than 30%, indicating that none of the OHQ items showed significant floor or ceiling effects. Item standard deviations in the OHSA ranged from 2.65 to 3.16 and from 2.57 to 3.25 in the OHDAS, suggesting equality of variances among items in the scale. In addition, skewness and kurtosis were low for all questions in the OHSA (skewness, −0.31 to 0.61; kurtosis, −1.22 to −0.56) and OHDAS (skewness, −0.90 to 0.50; kurtosis, −1.15 to −0.03).

Validity

Construct validity

Exploratory factor analysis revealed that the OHQ had two underlying factors. The first two eigenvalues were greater than 1 and each explained more than 5% of the variance in item responses. The pattern of correlations between items and the first two factors was consistent with the a priori hypothesized item-scale relationships. Promax rotated factor loadings using weighted least squares estimator extraction revealed that items 1 through 6, which measure symptoms of NOH, all loaded more highly on the first factor (with r values ranging from 0.526 to 0.829). Items 7 through 10, which measure impact of symptoms on activities, all loaded more highly on the second factor (with r values ranging from 0.645 to 0.850). The two factor solution suggested by the exploratory factor analysis was confirmed with confirmatory factor analysis using data from visits 5 and 6. As shown in Table 2, a two factor solution had acceptable goodness of fit statistics and all but the vision, concentration, and head/neck discomfort items loaded very highly (>0.8) on their respective factor. The goodness of fit statistics for a single factor model were also acceptable, suggesting that, from a measurement perspective, a single composite index can appropriately be scored from all OHQ items.

Table 2 Confirmatory factor analysis for the one and two factor solutions: completely standardized factor loadings—Visit 5 and 6

Convergent validity

As shown in Table 3, all correlations between the OHQ and other symptom scales were in the predicted direction and statistically significant (P < 0.05). The OHQ was strongly correlated with the CGI-S (Pearson r: 0.43–0.51) and the PGI-S (r: 0.58–0.67). As expected, the OHQ correlated less well with most of the generic health status domains of the SF-36 quality of life questionnaire (average Pearson value −0.39 ± 0.03). As hypothesized, SF-36 derived vitality scores, which measure energy/fatigue, showed the strongest correlation with OHQ scores (average Pearson value −0.49 ± 0.04). Social functioning domains, which measure daily activities (average Pearson value −0.47 ± 0.02), showed the strongest relationship with OHDAS scores. At visit 6 (on placebo/midodrine), there was a moderate correlation between absolute standing systolic blood pressure and OHQ scores (r values: OHSA −0.31, OHDAS −0.32 and OHQ-Composite −0.33) and a moderate-to-strong correlation between changes in standing systolic blood pressure and change in OHQ scores from baseline (r values: OHSA −0.41, OHDAS −0.40, OHQ-Composite −0.42).

Table 3 Convergent/discriminant validity—correlations of the OHQ scores with criterion measure scores

Known groups validity

Based on the patient’s own rating of their symptom severity on the PGI-S scale, 29 patients were classified as having ‘little/no symptoms’ and 106 patients were classified as having ‘moderate to severe symptoms’. As shown in Fig. 2, compared with the group with ‘little/no symptoms’, patients with moderate to severe symptoms had significantly higher scores on the OHSA, OHDAS and OHQ-Composite. Clinician-scored CGI-S ratings showed similar findings.

Fig. 2 Clinical validity. Average scores on OHQ scales according to the patient’s rating of illness on the clinical global impression of OH severity scale. OHSA Orthostatic Hypotension Symptom Assessment Score, OHDAS Orthostatic Hypotension Daily Activities Scale, OHQ Orthostatic Hypotension Questionnaire Composite Score. *P < 0.05

Reliability

The internal-consistency reliability estimates for each OHQ scale were consistently above the recommended threshold (Table 4). Cronbach’s alpha values were above 0.8 for the OHQ composite scale and for both subscales, and the average inter-item correlation was above 0.4. The internal consistency reliability estimates for the OHSA, OHDAS and OHQ-Composite exceeded the minimum standard for group-level comparisons (r > 0.7). Average inter-item correlations between items within each scale were consistently moderate to high (the average inter-item correlation for most patient-reported outcome measures should be at least 0.3, and preferably above 0.4).

Table 4 Multi-trait analysis—internal-consistency reliability—Visit 3 (baseline)

Eighteen patients were identified who reported no change on the PGI-C at Visit 5. Intraclass correlation coefficients in the stable subgroup were 0.92 for the OHSA, 0.87 for the OHDAS and 0.86 for the OHQ-Composite, suggesting excellent test–retest reliability.

Responsiveness

Compared to baseline (visit 3A), at visit 6, 67 patients were categorized as ‘improved’ and 58 patients were categorized as ‘the same/worse’ based on response to the PGI-C. As shown in Fig. 3, OHSA, OHDAS and OHQ Composite scores fell significantly more (indicating improved symptoms and greater physical activity levels) in patients who rated themselves as “improved” than in patients who rated themselves “the same or worse”. Results from groups classified according to the CGI-C ratings showed similar findings.

Fig. 3 Responsiveness. Average score change in OHQ according to the patient’s opinion of whether their symptoms of OH were improved or the same/worse on the clinical global impression of improvement scale. OHSA Orthostatic Hypotension Symptom Assessment Score, OHDAS Orthostatic Hypotension Daily Activities Scale, OHQ Orthostatic Hypotension Questionnaire Composite Score. **P < 0.01, ***P < 0.001

Minimally important change

The anchor-based MID estimates (Table 5) for the OHSA are 0.82 and 1.26 points (at Visit 5 and Visit 6, respectively), while the ½ SD estimate is 0.98 points and the SEM estimate is 0.78 points (the lower SEM reflecting the good reliability of the scale). The OHDAS MID estimates using the anchor-based approach are 0.71 to 0.89 points, less than the ½ SD estimate of 1.04 and very close to the SEM estimate of 0.83. For the OHQ composite score, the anchor-based MID estimates are 0.83 and 1.16 points (at Visit 5 and Visit 6, respectively), while the ½ SD estimate is 0.94 and the SEM estimate is 0.62. The point ranges from the anchor-based approach are similar to the ½ SD estimates, in keeping with the often cited “remarkable universality of half a standard deviation” [15]. These estimates are supported by the reliability-based SEM estimates of the instrument, which are lower than the ½ SD estimates because of the high reliability of the scales. Given the convergence seen across the different methodologies, an MID of between 0.8 and 1.0 points on the OHSA, OHDAS and OHQ composite, or more generally ½ SD, can be considered appropriate and conservative to guide the interpretation of the clinical trial results (Table 5).

Table 5 Minimal important difference estimates

Discussion

This study evaluated the psychometric properties of a patient-completed tool to assess the severity of NOH symptoms and their impact on patients’ activities. The OHQ was developed with input from patients and physicians. In the findings presented here, the OHQ was shown to have internal consistency, reproducibility, construct validity and responsiveness to change. Furthermore, it was shown that the OHSA and the OHDAS are valid subscale scores and that the OHQ can be scored as a single overall composite that combines symptom severity and impact on activities.

Orthostatic hypotension is arguably the most disabling symptom of autonomic failure. It is a frequently occurring feature of central degenerative autonomic disorders such as multiple system atrophy, Lewy body disorders and Parkinson disease; peripheral autonomic neuropathies; and the peripheral autonomic degenerative disorder, pure autonomic failure [4]. Despite this, there is no comprehensive, validated symptom assessment questionnaire nor is there a validated disease-specific activity questionnaire. These deficiencies prompted this study.

Orthostatic hypotension produces a diverse array of symptoms. Lightheadedness, dizziness, pre-syncope and syncope occurring in response to sudden postural change are the most characteristic symptoms of orthostatic hypotension. However, non-specific symptoms such as generalized weakness, fatigue, nausea or headache are common concomitant symptoms and, in some patients, may be the predominant feature. Furthermore, orthostatic hypotension may result in focal symptoms due to tissue or end-organ hypoperfusion. Thus, patients may report visual blurring due to retinal or occipital lobe ischemia; cognitive slowing, concentration difficulties and leg buckling due to cerebral ischemia; neck pain due to neck muscle ischemia; orthostatic dyspnea due to ventilation perfusion mismatch; and anginal pains due to impaired myocardial perfusion [6, 14].

To increase the range of symptoms encompassed by the OHSA, thereby providing a more comprehensive measure of the symptom burden of orthostatic hypotension, we included non-specific symptoms (weakness and fatigue) and focal symptoms (concentration difficulties, neck pain and visual blurring). Several general questionnaires that target the features of autonomic failure exist. These were developed to assess the autonomic features of multiple system atrophy [27], Parkinson disease [25], diabetic autonomic peripheral neuropathy [12], the neuropathy of impaired glucose tolerance [29] and generalized autonomic failure [21, 23, 24]. All of these have one or more questions that assess orthostatic hypotension. However, the majority of these [24] address only the characteristic symptoms of orthostatic hypotension: dizziness, lightheadedness, pre-syncope and syncope. Few questionnaires address the non-specific symptoms, such as fatigue and weakness [22], or the focal symptoms, such as visual blurring and concentration difficulties [23, 25], and none target neck and shoulder pain.

The psychometric properties of the questionnaire should be viewed within the context of the challenges imposed by the need to differentiate the symptoms of orthostatic hypotension from those due to multi-system degenerative disorders. For example, balance difficulties due to Parkinsonism or a peripheral neuropathy may be experienced as dizziness, and patients may struggle to differentiate falls or near-falls due to these disorders from syncope and pre-syncope. Similarly, fatigue, cognitive slowing and visual blurring may be a consequence of the underlying disease. The non-orthostatic features of these disorders also impair activities of daily living and quality of life. Although patients completing the questionnaire are explicitly asked to rate symptoms and activity limitations due to orthostatic hypotension and not due to the underlying disease, this task is not easily accomplished. Despite this, the psychometric qualities of the OHQ appear similar to those of other questionnaires that assess symptoms and activities in degenerative disorders.

The minimally important difference (MID, Table 5) is often misunderstood to represent the point difference that must be present between treatment groups in order to be clinically meaningful. It is important to understand the MID as a within group phenomenon [13]. In fact, it has been reported that the between group effect size should be smaller than the MID [3]. As a study is designed, it is the clinical knowledge of the indication and the expected effect of both treatment and control that should guide estimates of effect size. Indeed, one might expect a very small effect between a treatment and an active comparator, and a rather larger estimate of effect between placebo and treatment. The accurate estimation of sample size to test a certain effect size takes into account the precision of the endpoint. The point to be made is that the MID should guide decisions about change within a group that can be considered meaningful.

The determination of a clinically meaningful between-group effect size is not an artifact of the instrument in use, but a decision to be made based upon clinical and statistical considerations. Although an appropriate application of MID estimates would be to define a responder criterion based on these estimates and to perform a responder analysis, the FDA guidance recommends instead providing a plot of the cumulative distribution function of change from baseline scores. These results, produced by treatment group, provide a rich overview of the treatment response pattern for the scale, and are less affected by outliers than are parametric analyses.

There are several limitations to this study. The population under study had autonomic failure due to several different central and peripheral causes. Of the participants, 32% had a central nervous system cause of orthostatic hypotension associated with Parkinsonism, while 68% had a peripheral cause of orthostatic hypotension. It is possible that the questionnaire has different psychometric qualities in different disease populations. The sample size was not sufficiently large for sub-group analysis. This is a suitable topic for further study. Additionally, as part of the clinical trial protocol, the patients were shown their previous responses to the OHQ before completing the questionnaire at their follow-up visits. It is possible that the high test–retest reliability observed in this study may not generalize to a situation where patients are not provided with prior responses. Reliability should be re-evaluated in future studies.

The OHQ has several strengths. First, the OHQ focuses on the full range of symptoms relevant to patients with NOH. Second, the assessment of activities is specific for the most prominent activity impairments imposed by orthostatic hypotension: standing or walking for short or long periods of time. Third, the psychometric properties presented here indicate that the OHQ can measure the symptoms and impact of NOH in a valid and reliable way and, of particular importance for an outcome measure used to assess the impact of treatment interventions, can appropriately detect change over time. Finally, it is brief, making it quick and easy to complete with minimal patient burden, a common concern regarding patient-reported outcomes in clinical trials.

Validation in other languages is required for the OHQ to be widely used. Although the OHQ has been translated, cross cultural validation has not yet been conducted due to inadequate sample sizes.

In conclusion, the OHQ demonstrated good psychometric properties. By highlighting the patient report of symptoms as well as impact of symptoms on patients’ activities, the OHQ can provide rich information to a physician and augment clinical outcomes for planning treatment management.