Introduction

Clinical measures are critically important but may not reflect the day-to-day functioning and well-being of patients with chronic diseases. The Patient-Reported Outcomes Measurement Information System (PROMIS®) initiative of the National Institutes of Health (NIH) was developed to advance methodology and the application of patient-reported outcomes (PROs) among patients with chronic diseases for use in research and clinical practice [1].

Chronic obstructive pulmonary disease (COPD) is a progressive disease characterized by airflow limitation that is not fully reversible [4]. It is a prevalent condition that ranks in the top five for leading causes of death worldwide and in the top three in the USA [2, 3]. COPD is characterized by episodes of exacerbation that require acute therapies and sometimes hospitalization that are associated with declines in health-related quality of life (HRQOL) [5].

COPD represents a potentially informative target condition for evaluating the validity of the PROMIS instruments for several reasons. Stable COPD is associated with relatively poor health status across many areas covered by PROMIS instruments, including depression, anxiety, fatigue, mobility, activities of daily living (ADL), and social activities, with significant declines seen in several of these domains during acute exacerbations of the condition [9–13]. One study examined the most important HRQOL domains from the COPD patient perspective, and these patients identified several PROMIS domains, including fatigue, physical functioning, social roles, and social activities, as most relevant to their condition [36]. Another study demonstrated that stable COPD patients with more severely impaired lung function had significantly worse PROMIS physical function and social role domain scores [37]. Across studies, several different HRQOL instruments, including generic and COPD-targeted measures, have been used to evaluate the impact of COPD on HRQOL. Most COPD-specific measures correlate weakly with clinical measures such as FEV1 [32–35, 37]. Therefore, the best instrument and the relative sensitivity of generic versus condition-targeted measures are not generally agreed upon in the literature [5–8]. Since PROMIS instruments are designed to be applicable to a range of chronic illnesses, they allow for comparisons across a variety of chronic health conditions and studies. Hence, the relative validity of PROMIS instruments compared with existing COPD-specific instruments is important to document.

The aim of this study was to examine the validity of PROMIS scales in a cross-sectional comparison of stable patients with COPD and patients with a recent exacerbation. Specifically, validity was evaluated by (1) exploring the correlations of the PROMIS scales with clinical indices such as forced expiratory volume in 1 s (FEV1) and 6-min walk assessments; (2) evaluating the correlations of PROMIS scales with established COPD-targeted instruments; and (3) comparing PROMIS scale scores between stable COPD patients and patients experiencing an exacerbation.

Materials and methods

Participants

Patients were eligible for inclusion in the study if they met the following criteria:

  1. had an established clinical history of COPD in accordance with the Global Initiative for Chronic Obstructive Lung Disease (GOLD) definition [14, 15].

  2. had at least a 10 pack-year history of smoking.

  3. were 40 years or older.

  4. read and spoke English.

  5. had access to and were able to communicate using a touch-tone telephone.

  6. were able to see and interact with a computer screen, mouse, and keyboard.

Two groups of patients with a clinical diagnosis of COPD were eligible for enrollment: (1) patients with a stable COPD diagnosis and (2) patients currently experiencing a COPD exacerbation. Patients enrolled in the stable group (n = 100) needed to have been exacerbation-free for a minimum of 2 months prior to enrollment. For patients enrolled in the exacerbation group (n = 85), treatment for an exacerbation could have started no more than 3 days prior to the day of enrollment for those recruited in the outpatient setting and no more than 6 days prior to the day of enrollment for those recruited in the inpatient setting. An exacerbation was defined as a sustained worsening of COPD symptoms from the stable state beyond normal day-to-day variations. Criteria for an exacerbation included that it was acute in onset, necessitated a change in regular medication [4], and required treatment with antibiotics, corticosteroids, hospitalization, or a combination of these [16, 17].

Patients were excluded from participation if they had any concurrent medical or psychiatric condition that may have precluded participation in this study or completion of self-administered questionnaires (e.g., moderate to severe dementia and/or severe, uncontrolled schizophrenia), had a history of asthma without coexistent COPD as the primary diagnosis, or were experiencing a current heart failure exacerbation.

Participants were recruited from outpatient clinics and hospitals at four research sites (University of North Carolina Health System, NorthShore University Health System, Pittsburgh VA Medical Center, and Durham VA Medical Center). The study was conducted in accordance with the amended Declaration of Helsinki and was approved by the Institutional Review Board (IRB) at each site. At the time of enrollment, eligible participants gave written informed consent and began the baseline assessment.

Study procedures

For those who were stable at enrollment, the baseline assessment included a questionnaire, literacy assessment, percent predicted FEV1, and a 6-min walk test. For those in exacerbation at enrollment, the questionnaire was administered at baseline, and the clinical measures (FEV1 and a 6-min walk test) along with the literacy assessment were administered when the patient was deemed stable (approximately 3 months after exacerbation). This was done because it was difficult for the exacerbators to complete these measurements during the exacerbation. In addition, all analyses involving the clinical measures (FEV1 and 6-min walk test) used data obtained when the patients were deemed stable (see “Data analysis” section). The questionnaire collected information on demographics, comorbid conditions, COPD history (symptoms, duration of diagnosis, and the number of exacerbations, hospitalizations, and emergency room (ER) visits during the past year), and health-related quality of life (HRQOL). Participants self-reported their responses on laptop computers in the clinic or in the hospital. Research assistants reviewed the clinical chart to abstract variables including clinical characteristics, body mass index (BMI), and COPD medications.

Study measures

One goal of the study was to evaluate the associations between the clinical assessments (percent predicted FEV1 and 6-min walk) and the HRQOL measures. The measures are summarized in Table 1. Included were the PROMIS adult health domains (www.nihpromis.org) and several targeted “legacy” measures: St. George's Respiratory Questionnaire (SGRQ), Modified Medical Research Council (MMRC) Dyspnea Scale, Functional Assessment of Chronic Illness Therapy (FACIT) Dyspnea Scale, EXAcerbations of Chronic pulmonary disease Tool, patient-reported outcome (EXACT-PRO), and the Pittsburgh Sleep Quality Index (PSQI) [18–24, 38]. These measures were chosen because they are some of the most commonly used HRQOL measures in COPD clinical trials and observational studies. The MMRC and FACIT Dyspnea Scales assess the impact of dyspnea on activities of daily living and physical functioning. The SGRQ is an HRQOL questionnaire designed for patients with chronic airflow limitation and evaluates three domains: (1) symptoms; (2) activities that exacerbate symptoms; and (3) areas of disease impact such as employment, panic, stigmatization, need for medications, side effects of medications, expectations, and being in control of health as well as disturbances of daily life. The EXACT-PRO measures COPD symptoms and manifestations of exacerbations. The PSQI evaluates sleep quality and disturbances. Version 1.0 of the PROMIS items was used; these items can be found at www.nihpromis.org. The analysis for this manuscript is cross-sectional and included the initial assessment day for the EXACT-PRO. Due to the quantity of data, the longitudinal data collected from the EXACT-PRO diaries and the other PRO measures will be the subject of another manuscript. Literacy was assessed using the short test of functional health literacy in adults (S-TOFHLA) [27].

Table 1 Study measures

We anticipated that all of the legacy instruments would be significantly correlated with the PROMIS domain scales. Specifically, we expected the PROMIS physical function and social health (discretionary social activities and social roles) domains to be the most strongly correlated with the SGRQ, FACIT, and MMRC instruments, as had been indicated in another study [37]. Because these PROMIS domains (physical functioning and social health) are scored so that a higher score represents better health, we hypothesized negative correlations with the legacy measures.

A 6-min walk assessment and percent predicted FEV1 measurements were performed at the baseline visit unless the patient was experiencing an exacerbation or feeling too ill, in which case these measures were performed when the patient was deemed stable. The 6-min walk test measured the distance in meters that a participant is able to walk in a 6-min time span [25, 26]. Portable spirometry was used to estimate FEV1 using the American Thoracic Society criteria [4].

Scoring of HRQOL measures

The PROMIS 1.0 measures administered assess physical function, pain interference, pain behavior, fatigue, anxiety, depression, anger, social roles (SR: satisfaction with participation in social roles), discretionary social activities (DSA: satisfaction with participation in discretionary social activities), and global health (Table 1). PROMIS measures can be administered via static short forms (SF) (SF number of items: physical function = 10, pain interference = 6, pain behavior = 7, fatigue = 7, anxiety = 7, depression = 8, anger = 8, social roles = 7, discretionary social activities = 7, and global health = 10) or by computer adaptive testing (CAT). For CAT administration, the next item to be administered is based on the participant’s prior responses, and items are administered until the reliability of measurement meets a target threshold (e.g., 0.90). The PROMIS CAT parameters (www.assessmentcenter.net) were used to administer the CAT. Any remaining SF items that had not yet been administered were presented after the CAT was completed. Scores from the CAT and SF (all items) were calculated using item response theory (IRT) parameters, yielding a CAT score and an SF score for each participant on the same underlying metric. PROMIS scores are reported on a T-metric, with 50 representing the mean and 10 the standard deviation in the US general population. For the PROMIS domains of anger, anxiety, depression, fatigue, pain behavior, and pain interference, higher scores indicate worse health; for the domains of physical function, DSA, SR, and global health (physical and mental), higher scores indicate better health.
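The T-metric and the reliability-based CAT stopping rule described above can be illustrated with a minimal Python sketch. It assumes that the underlying IRT estimate (theta) is standardized to mean 0 and standard deviation 1 in the reference population, so that a target reliability corresponds to a standard error of sqrt(1 − reliability) on the theta metric. The function names are hypothetical, and the sketch is not the Assessment Center implementation, whose exact stopping rules are not described here.

```python
import math


def theta_to_t(theta: float) -> float:
    """Map a standardized IRT estimate (assumed mean 0, SD 1) to the T-metric."""
    return 50.0 + 10.0 * theta


def se_for_reliability(reliability: float) -> float:
    """Standard error on the theta metric implied by a target reliability.

    Assumes unit variance of theta, so reliability = 1 - SE**2.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return math.sqrt(1.0 - reliability)


def cat_should_stop(se_theta: float, target_reliability: float = 0.90) -> bool:
    """Return True once the current standard error meets the target threshold."""
    return se_theta <= se_for_reliability(target_reliability)


# Hypothetical values for illustration only (not study data).
print(theta_to_t(-1.5))                    # one and a half SDs below the mean
print(round(se_for_reliability(0.90), 3))  # SE threshold on the theta metric
print(cat_should_stop(0.30))               # SE already below the threshold
```

Under these assumptions, a target reliability of 0.90 corresponds to a standard error of roughly 3.2 points on the T-metric.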

The SGRQ contains three domains (symptoms, activity, and impacts) and a summary score on a 0–100 scale with 100 representing the worst HRQOL [18, 38]. The MMRC scale is scored on a scale of 0–4 (0 = not troubled with breathlessness except with strenuous exercise; to 4 = too breathless to leave the house or breathless when dressing or undressing) [19]. To perform correlation analysis, the MMRC score was linearly transformed to a 0–100 possible range (1 = 0; 2 = 25; 3 = 50; 4 = 75; and 5 = 100) for some analyses. This was done by subtracting one from the original MMRC score and multiplying it by 25. The FACIT Dyspnea Scale consists of 20 items that assess dyspnea severity (ten items) and related functional limitations (ten items). Lower scores reflect less severity or difficulty completing a task [20]. The PSQI is scored on a 0–21 scale with higher scores representing worse sleep quality [24]. The EXACT-PRO daily diary total score is computed across the 14 items and has a possible range of 0–100, with higher values indicating a more severe condition [21–23]. The S-TOFHLA was scored according to published guidelines, and literacy was classified as adequate for individuals scoring in the range of 23–36 and inadequate for those scoring 0–22 [27].
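The two simplest scoring rules in this section, the linear MMRC transformation and the S-TOFHLA literacy classification, can be written as a short Python sketch. The MMRC function assumes the 1–5 coding listed with the transformation (the scale is also described above with anchors of 0–4), and the function names are hypothetical.

```python
def rescale_mmrc(score: int) -> int:
    """Linearly transform an MMRC score to a 0-100 range.

    Assumes the 1-5 coding given with the transformation:
    subtract one from the score, then multiply by 25.
    """
    if score not in range(1, 6):
        raise ValueError("MMRC score is assumed to be coded 1-5")
    return (score - 1) * 25


def classify_stofhla(score: int) -> str:
    """Classify S-TOFHLA literacy: 23-36 is adequate, 0-22 is inadequate."""
    if not 0 <= score <= 36:
        raise ValueError("S-TOFHLA score must be between 0 and 36")
    return "adequate" if score >= 23 else "inadequate"


# Hypothetical inputs for illustration only (not study data).
print([rescale_mmrc(s) for s in range(1, 6)])
print(classify_stofhla(22), classify_stofhla(23))
```

Because the transformation is linear, it changes the scale of the MMRC score but not its Pearson correlation with other measures.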

Data analysis

A cross-sectional analysis was performed using the baseline evaluations unless otherwise specified. Preliminary data were analyzed using descriptive and graphical methods wherever applicable, to facilitate interpretation of the data. Data were summarized using descriptive statistics (e.g., means and standard deviations for continuous and ordinal variables and counts and frequencies for categorical variables) for demographic variables and all HRQOL and clinical measures. All item responses were examined using measures of central tendency (mean, median), spread (standard deviation, range), and response category frequencies. Correlations between PROMIS and legacy measures as well as between PROMIS and clinical measures were estimated. Only the correlations between PROMIS administered by CAT and legacy measures are presented, as no significant differences were noted when examining the correlations between PROMIS SF and legacy measures. PROMIS IRT-calibrated person parameters were used for correlations. As noted earlier, 6-min walk and FEV1 percent predicted scores were collected when all patients were deemed stable. Analyses of the relationship between HRQOL (PROMIS and legacy) scores and clinical measures (6-min walk test and FEV1) used the HRQOL scores assessed at the time the patients were deemed stable. In other words, these correlations (PROMIS with clinical measures) were performed at baseline for stable patients. For patients experiencing an exacerbation, the PROMIS and clinical measures were collected when the patient was deemed stable, which may have been up to 3 months after the baseline visit.

A two-sample t test was used to compare scores on the PROMIS domains between the two COPD groups (stable versus in an exacerbation). Pearson correlation coefficients of the PROMIS measures with clinical measures and the HRQOL legacy measures were computed.
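The two statistics named above can be sketched in plain Python as follows. The sketch assumes an equal-variance (pooled) form of the two-sample t test, since the variant used is not stated, and it reports only the test statistic and degrees of freedom, without p values. The function names and all input values are hypothetical and are not study data.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(a, b):
    """Two-sample t statistic and degrees of freedom, assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical T-scores and walk distances for illustration only.
stable = [45.0, 42.0, 48.0, 40.0, 44.0]
exacerbation = [38.0, 36.0, 41.0, 35.0, 39.0]
walk_m = [380.0, 350.0, 410.0, 300.0, 360.0]

t_stat, dof = pooled_t(stable, exacerbation)
print(round(t_stat, 2), dof)
print(round(pearson_r(stable, walk_m), 2))
```

In practice, a statistical package would also supply p values and, where the equal-variance assumption is doubtful, a Welch correction.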

Results

The stable COPD patients did not differ substantially from the COPD patients enrolled during an exacerbation on BMI, smoking history, percent predicted FEV1, ability to walk greater than 300 meters on 6-min walk assessment, literacy, gender, race, and presence of comorbid conditions. However, exacerbators were significantly younger and had been diagnosed for shorter periods of time than stable patients. In addition, the exacerbators reported significantly more COPD-related hospitalizations, emergency room (ER) visits, and exacerbations during the past 12 months. Not surprisingly, the exacerbators reported significantly more COPD symptoms and exacerbation-related medications (antibiotics and systemic steroids, Table 2).

Table 2 Baseline demographic and clinical characteristics of enrolled patients

Table 3 shows the mean PROMIS scores by COPD exacerbation status at enrollment. For all domains, the stable patients reported significantly better PROMIS scores, whether administered via CAT or SF. Similarly, the stable patients reported significantly better HRQOL on all of the legacy instruments. PROMIS short form scores were significantly correlated (p < 0.001) with CAT scores for all domains (physical function r = 0.89; pain interference r = 0.95; pain behavior r = 0.97; fatigue r = 0.88; anxiety r = 0.94; depression r = 0.85; anger r = 0.98; social roles r = 0.97; and discretionary social activities r = 0.95).

Table 3 Mean PROMIS SF, CAT, and legacy instrument scores by COPD exacerbation status at enrollment

All of the PROMIS measures (using CAT) were significantly correlated with the legacy instruments except for the PROMIS pain domain measures (pain behavior and pain interference) and the MMRC (Table 4). These results did not differ for PROMIS SF (data not shown). Six-min walk scores were most highly correlated with the PROMIS physical function scores, followed by the fatigue and social domain (SR and DSA) scores and, to a lesser extent, the anxiety and depression scores (Table 5). Similar correlations were found for PROMIS SF and CAT. Percent predicted FEV1 scores were significantly correlated with PROMIS physical function scores (SF and CAT) as well as with FACIT, MMRC, and SGRQ (activities and total) scores (Table 5).

Table 4 Product-moment correlations of PROMIS CAT and legacy instruments when participants are stable
Table 5 Product-moment correlations of PROMIS SF, CAT, and legacy instruments with 6-min walk and percent predicted FEV1 when participants are stable

Discussion

Only a couple of studies have evaluated PROMIS instruments among COPD patients. One prior cross-sectional study showed PROMIS scores to be worse for those with COPD than those without it [28]. This study used self-reported chronic disease status including COPD and did not include any assessment of clinical diagnosis, and COPD-specific results were not reported in detail. Another recent study noted that PROMIS physical function and social activity scores decreased with level of lung function measured by GOLD grade [37]. The present study is unique in comparing PROMIS scores between stable and exacerbating COPD patients. Exacerbators reported significantly worse HRQOL on all domains. This study is also one of the first studies to examine the correlations between PROMIS scores and clinical indices.

The results of this study for the PROMIS measures were similar whether the administration was done using a static short form or CAT. The availability of PROMIS instruments in both CAT and short form offers researchers flexibility in administration formats. CAT administration offers the advantage of minimal participant burden without sacrificing measurement precision, but requires a computer for administration. Short forms can be completed via paper and pencil and thus do not require a computer for administration [39]. Both were developed with rigorous qualitative and quantitative methodology and offer the advantages of comparability across conditions, reliability, validity, and precision.

Several studies have now confirmed that COPD patients experiencing an exacerbation report significantly poorer HRQOL than stable COPD patients using either generic or disease-targeted measures [5–8]. The results for the disease-targeted measures administered in this study were similar to those previously reported [5–8]. The magnitude of score differences between stable and exacerbating patients for the disease-specific measures is substantially greater than for the PROMIS measures. This is not unexpected since the disease-specific measures would be most likely to demonstrate differences between these two populations as compared to a generic measure. Exacerbations of COPD have been reported to lead to substantial reductions in HRQOL [29, 30]. In addition, some studies have found that patients with worse HRQOL scores on disease-targeted measures are more likely to be hospitalized and less likely to survive [29, 30].

Patients with COPD often have several other chronic illnesses. Because it is difficult for patients to attribute their symptoms to one disease or treatment versus another, generic HRQOL measures may be easier to complete [31]. In this study, comparable findings were observed for generic and disease-targeted measures. PROMIS scores were significantly correlated with the disease-specific legacy instruments. Similar to other reported results [37], this study found moderate correlations between the PROMIS domain scores and the FACIT-dyspnea (correlation coefficients ranging from 0.31 to 0.78) and MMRC-dyspnea (correlation coefficients ranging from 0.11 to 0.55). In addition, the largest correlations with the 6-min walk test were similar in magnitude for the SGRQ activities (r = −0.41), the FACIT-dyspnea and functional limitations (r = −0.50), and the PROMIS physical function (r = 0.57) domain scores.

The correlations between FEV1 and HRQOL tended to be small in magnitude and similar to those reported in another study correlating FEV1 with PROMIS measures [37]. There is a sizable literature reflecting the relatively weak correlations of disease-targeted and generic HRQOL instruments with clinical measures, similar to those found in this study. A recent meta-analysis reported weighted correlations of −0.29 between SGRQ total score and FEV1 and −0.34 between SGRQ total score and 6-min walk [32]. Similarly, weak correlations have been noted between generic HRQOL instrument scores and FEV1 assessments (range for SF-36 physical functioning summary (PCS) r = 0.06–0.38 and SF-36 mental health summary (MCS) r = 0.09–0.25) [33–35].

One limitation of this study was the small sample size at each site that recruited patients; hence, clinical site-specific analyses were not feasible. All study sites underwent extensive three-day training on a standardized study protocol to ensure consistent implementation for patient recruitment, enrollment, and study procedures and to minimize variability among sites. Another limitation was that differential item functioning for the two study groups (stable and exacerbation patients) was not examined. In addition, for some of the analyses, a subset of the study patients was used because not all patients completed these measures when they were in a stable state. Moreover, the analyses reported here do not address longitudinal changes in COPD status; hence, it is difficult to determine whether some of the differences seen between stable and exacerbating patients are really due to the exacerbation or to different underlying disease severity. These issues will be addressed more completely in future manuscripts that incorporate longitudinal components of this study.

Conclusion

The study provides support for the validity of the PROMIS measures and found that they performed similarly to legacy measures targeting the impact of COPD on HRQOL. Because the PROMIS instruments are designed to be applicable to a range of chronic illnesses, they offer some advantages over disease-targeted instruments by allowing for comparisons across a variety of chronic health conditions and studies.