Introduction

The use of patient-reported outcomes (PROs) in oncology has increased steadily in the past decade. With the emergence of the Patient-Reported Outcomes Measurement Information System (PROMIS) [1], the publication of the U.S. Food and Drug Administration’s guidance to support labeling claims [2], and the development of the PRO version of the Common Terminology Criteria for Adverse Events [3], we expect to see continued use of PROs in descriptive studies and clinical trials. This use is being facilitated by the accumulation of validation evidence for widely used assessment tools such as the M. D. Anderson Symptom Inventory (MDASI) [4], the Functional Assessment of Chronic Illness Therapy [5], the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire QLQ-C30 [6, 7], and the 36-item Short-Form Health Survey (SF-36) [8] or its derivative, the SF-12 [9].

Patient-reported outcomes not only capture symptom or quality-of-life information for clinical use but also often serve as prognostic factors for clinical outcomes such as overall survival [10–12]. Indeed, some studies have suggested that patient-reported quality-of-life information is a more valuable prognostic factor than clinician-assessed performance status [10]. However, most studies have found only modest improvement in predictive ability when PROs are added to models already containing other prognostic factors [10, 13]. Many PROs, especially those pertaining to the interference of symptoms with patients’ daily activities, have not been studied as predictors of survival.

In this study, we sought to test whether adding various PROs to models containing established prognostic factors would improve the prediction of survival in patients with advanced non-small cell lung cancer (NSCLC). We placed particular emphasis on a less-studied PRO: baseline ratings of the interference scale of the MDASI, which measures how much symptoms interfere with life domains. Although symptom interference was shown previously to be predictive of primary tumor recurrence in patients with brain tumors [14], symptom interference has not been used to examine survival in patients with lung cancer. We also considered the added prognostic ability of several other PROs derived from either the MDASI or the SF-12. Because many studies of modeling survival have used performance status as part of a statistically significant index (e.g., [15–19]), we were particularly interested in whether PROs improved survival prediction when models were adjusted for the “gold standard” of performance status.

Methods

Patients

Patients were recruited between January 2004 and December 2008 from the thoracic medical oncology clinic at The University of Texas MD Anderson Cancer Center. Eligible patients had stage IV NSCLC, were at least 18 years old, spoke English, had an Eastern Cooperative Oncology Group performance status (ECOG PS) of 0, 1, 2, or 3, and were scheduled for first-line chemotherapy. Patients provided written informed consent, and the MD Anderson Institutional Review Board approved the study.

The current study is a reanalysis of data obtained in a previous study [20], for which patients were recruited from several kinds of treatment facility, including MD Anderson. Because of variations in symptom management between tertiary cancer centers and other hospitals, patients were excluded from the present study if they were not treated at MD Anderson.

Prognostic factors

With the goal of understanding the added benefit of various PROs in modeling overall survival, we considered prognostic factors of two types: those based on baseline demographic or clinical variables related to overall survival according to previous studies [11, 12, 14], and those based on baseline PROs. The demographic and clinical prognostic factors were age (in years), ECOG PS (poor, defined as ≥2, vs. good, defined as ≤1), previous chemotherapy (yes vs. no), and sex (male vs. female).

The PROs considered were derived from the baseline administration of the MDASI [4] and the SF-12 [9]. The MDASI solicits patient self-report of the severity of 13 common cancer-related symptoms and the degree to which symptoms interfere with daily functioning [4]. The severity of each symptom is reported on an integer scale of 0 (“not present”) to 10 (“as bad as you can imagine”). For this study, we used the MDASI-LC [21], a validated lung cancer version of the MDASI that assesses three additional symptoms: constipation, coughing, and sore throat. The six MDASI symptom interference items are also reported using an integer scale of 0 (“did not interfere”) to 10 (“interfered completely”). These items measure interference with activity, work, walking, mood, relations with others, and enjoyment of life.

The mean of all symptom severity scores can provide a convenient one-number summary for use as a prognostic factor; however, to have a more sensitive measure, we elected to use the mean of only five of the 15 remaining MDASI-LC symptom scores (the MDASI-LC item “sore throat” was excluded from the analysis because it is related to radiation therapy, which none of our sample received). In a previous report of data from a sample that included patients in the present study, symptoms were ranked by severity and analyzed individually as predictors of overall survival [12]; in the current study, we calculated the composite score of the five symptoms with the highest average baseline levels (fatigue, disturbed sleep, shortness of breath, drowsiness, and pain), rather than analyzing them individually. Mendoza et al. [21] suggested that different composite scores can be derived using various combinations of MDASI-LC items so long as they are specified a priori. We also evaluated the mean of the scores for the six interference items as a potential prognostic factor.
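The two MDASI-derived composites described above are simple item means, which can be sketched as follows. This is only an illustration: the item keys, the example ratings, and the choice to return no score when an item is missing are assumptions made for the sketch, not details taken from the study.

```python
# Hypothetical item keys; the five symptoms and six interference domains
# themselves are the ones named in the text.
TOP5_SYMPTOMS = ["fatigue", "disturbed_sleep", "shortness_of_breath",
                 "drowsiness", "pain"]
INTERFERENCE_ITEMS = ["activity", "work", "walking", "mood",
                      "relations", "enjoyment"]


def composite(ratings, items):
    """Mean of the 0-10 integer ratings for the listed items.

    Returns None if any item is missing (an assumption of this sketch).
    """
    values = [ratings.get(item) for item in items]
    if any(v is None for v in values):
        return None
    for v in values:
        if not (isinstance(v, int) and 0 <= v <= 10):
            raise ValueError("MDASI items are integers from 0 to 10")
    return sum(values) / len(values)


# Invented baseline ratings for one illustrative patient.
patient = {
    "fatigue": 6, "disturbed_sleep": 4, "shortness_of_breath": 5,
    "drowsiness": 3, "pain": 2,
    "activity": 5, "work": 7, "walking": 3, "mood": 2,
    "relations": 1, "enjoyment": 6,
}

severity = composite(patient, TOP5_SYMPTOMS)           # (6+4+5+3+2) / 5
interference = composite(patient, INTERFERENCE_ITEMS)  # (5+7+3+2+1+6) / 6
print(severity, interference)
```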

The SF-12 is a validated derivative of the SF-36 [8, 9]. The 12 items of the SF-12 include one item for the respondent’s self-reported general health. One apparent advantage of a single-item measure is its simplicity [22]. The five response categories for the general health item were combined to form a binary response: poor or fair versus excellent, very good, or good. The SF-12 items can be combined to form the physical component summary (PCS) and mental component summary (MCS) [9], each of which ranges from 0 (worst) to 100 (best). In addition to the dichotomized general health item, we evaluated the PCS and MCS as potential prognostic factors.

Statistical analysis

Descriptive statistics for the prognostic factors and PROs were computed. In addition, the univariate significance of each potential prognostic factor in a Cox proportional hazards model was assessed. To facilitate comparisons, only data from patients with observed values for each factor considered were included. This guaranteed that the same subset of patients was used in every analysis.
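The complete-case restriction described above amounts to filtering on all candidate factors at once, before any model is fit. A minimal sketch, with invented records and hypothetical field names:

```python
def complete_cases(records, factors):
    """Keep only records with an observed value for every listed factor."""
    return [r for r in records
            if all(r.get(f) is not None for f in factors)]


# Invented records; None marks a missing value.
records = [
    {"id": 1, "ecog_ps": 1, "sf12_pcs": 41.2, "interference": 3.5},
    {"id": 2, "ecog_ps": None, "sf12_pcs": 38.0, "interference": 2.0},
    {"id": 3, "ecog_ps": 2, "sf12_pcs": 35.7, "interference": 6.2},
]
FACTORS = ["ecog_ps", "sf12_pcs", "interference"]

analysis_set = complete_cases(records, FACTORS)
print([r["id"] for r in analysis_set])
```

Filtering once on the union of all factors, rather than separately for each model, is what keeps the patient subset identical across analyses.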

Most of the factors evaluated in this study were categorical, but several were treated as quantitative factors. Age, SF-12 PCS, and SF-12 MCS were manifestly quantitative. Although both the mean MDASI interference level and the mean MDASI symptom severity level were based on ordinal items rather than quantitative measurements, the 11 possible categories for each item were sufficiently refined and had an inherent equal spacing to justify their treatment as a quantitative response; this assumption was even more reasonable for the average of several such items.

To assess the added prognostic ability of the various PROs considered, a Cox proportional hazards model was fit using the demographic and clinical factors noted above. This base model was assessed for validity. To this base model was added in turn each of the PROs, and if the PRO significantly improved the model fit (likelihood ratio test p ≤ 0.05), then interactions between the PRO and other factors were also considered.
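The likelihood ratio test used to compare the base model with a model containing one additional PRO can be sketched as below. The sketch assumes the two models are nested and differ by a single parameter, so the statistic is referred to a chi-square distribution with 1 degree of freedom. The log partial likelihood values are invented for illustration and are not results from this study.

```python
import math


def likelihood_ratio_test(loglik_base, loglik_extended):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For 1 degree of freedom the chi-square
    upper tail is P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_extended - loglik_base), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented log partial likelihoods for a base and an extended model.
stat, p = likelihood_ratio_test(-300.0, -298.0)
print(round(stat, 2), round(p, 4))
```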

The additive benefit of a potential prognostic factor to a model can be assessed by comparing the C statistic for models with and without the factor. The C statistic [23] measures the concordance between the model’s predicted risk and overall survival for all pairs of patients in which it is possible to determine which patient lived longer [24].
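As an illustration of how such a concordance measure is computed, the following sketch counts usable pairs directly. It assumes a pair is usable only when the patient with the shorter follow-up time was observed to die, skips pairs with tied times, and gives half credit for tied predicted risks; these conventions are assumptions of the sketch and may differ from the implementation used in the analysis. The data are invented.

```python
def c_statistic(times, died, risks):
    """Concordance between predicted risk and observed survival.

    A pair (i, j) is usable when patient i has the shorter time and was
    observed to die. It is concordant when patient i also has the higher
    predicted risk. Tied risks count as half concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and died[i]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Invented data: survival in months, death indicator, model risk score.
times = [2, 5, 7, 9, 12]
died = [1, 1, 0, 1, 1]
risks = [2.1, 1.4, 1.6, 0.3, 0.9]

c = c_statistic(times, died, risks)
print(c)
```

A value of 0.5 corresponds to chance agreement and 1.0 to perfect ordering, so the added benefit of a factor is read from the change in C between the models with and without it.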

Bootstrapping techniques were used to check the robustness of the findings. This was particularly important because of the relatively small ratio of events to factors in the models, which heightened concerns that influential points might drive model conclusions. We used the bootstrap to construct confidence intervals for hazard ratios (HRs) by the percentile method. First, we resampled (with replacement) the same number of observations from the original data set 1,000 times and refit a given model to each resampled data set. The HR estimate for any factor of interest was computed for each resample, and the 2.5th and 97.5th percentiles of these 1,000 HR estimates formed a 95 % bootstrap confidence interval. This is one of the most straightforward bootstrap techniques available [25]. In addition, we determined the proportion of the bootstrap resamples for which the factor had p ≤ 0.05. Such a proportion can be used in assessing the usefulness of a prognostic factor [26].
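The percentile bootstrap described above can be sketched generically as follows. To keep the example self-contained, the estimator is a crude hazard ratio (ratio of deaths per person-month between two groups), which is a stand-in and not the Cox model fit used in the analysis. The data, the seed, and the interpolation rule for percentiles are likewise assumptions of the sketch.

```python
import random


def percentile(sorted_vals, q):
    """Percentile (q in 0-100) of a sorted list by linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def bootstrap_percentile_ci(data, estimate, n_resamples=1000,
                            level=95.0, seed=0):
    """Resample rows with replacement, re-estimate, take percentiles."""
    rng = random.Random(seed)
    n = len(data)
    stats = []
    while len(stats) < n_resamples:
        resample = [data[rng.randrange(n)] for _ in range(n)]
        try:
            stats.append(estimate(resample))
        except ZeroDivisionError:
            continue  # degenerate resample (one group absent); redraw
    stats.sort()
    tail = (100.0 - level) / 2.0
    return percentile(stats, tail), percentile(stats, 100.0 - tail), stats


def crude_hazard_ratio(rows):
    """Stand-in estimator: deaths per person-month, group 1 vs. group 0."""
    deaths = {0: 0, 1: 0}
    person_time = {0: 0.0, 1: 0.0}
    for months, died, group in rows:
        deaths[group] += died
        person_time[group] += months
    return (deaths[1] / person_time[1]) / (deaths[0] / person_time[0])


# Invented data: (survival in months, died, group); group 1 = poor ECOG PS.
good = [(m, 1, 0) for m in (6, 9, 11, 14, 15, 18, 21, 24, 27, 30)]
poor = [(m, 1, 1) for m in (1, 2, 3, 4, 5, 6, 7, 8, 10, 12)]
data = good + poor

lower, upper, hr_samples = bootstrap_percentile_ci(data, crude_hazard_ratio)
print(round(lower, 2), round(upper, 2))
```

If the estimator also returned a p value for each resample, the proportion of resamples with p ≤ 0.05 could be tallied in the same loop.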

All calculations were performed using R [27] with the survival package [28]. p values are in all instances based on the (two-sided) likelihood ratio test.

Results

Of a potential pool of 95 patients, 90 had no missing information for any of the factors and thus were included in our analysis. The five patients with missing information had missing SF-12 PCS and SF-12 MCS scores (three patients), ECOG PS (one patient), MDASI pain severity (one patient), and MDASI relationship interference (one of the patients also missing SF-12 PCS and SF-12 MCS). The median time span between diagnosis of lung cancer and baseline symptom assessment was 58 days. At the time of the initial analysis, 88 of the 90 patients had died. The median survival time was 8.4 months.

Summary information for these 90 patients, including the univariate significance of each potential prognostic factor, is included in Table 1. For the factors treated as quantitative, the comparison for the HRs corresponds to a change in the measure of roughly one-half standard deviation. Using an unadjusted α of 0.05 for each factor as a predictor of overall survival, only the mean MDASI symptom interference score and the SF-12 general health item were found to be significant. The Cronbach’s alpha for the composite of the five symptoms with the highest average baseline levels (fatigue, disturbed sleep, shortness of breath, drowsiness, and pain) was 0.74. The mean score from these five symptoms for patients with good ECOG PS was 1.9 (SD = 1.49), whereas patients with poor ECOG PS had a mean score of 3.19 (SD = 1.75). Hence, the groups differed statistically (p < 0.001) and meaningfully (effect size = 0.79), demonstrating known-group validity.
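As a rough check on the known-group comparison, a standardized mean difference can be computed from the reported means and standard deviations. The formula used for the effect size is not stated here; the sketch below assumes an equal-weight pooled standard deviation (the group sizes are not given in this paragraph), which reproduces a value of about 0.79.

```python
import math


def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Mean difference divided by an equal-weight pooled SD (assumption)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_b - mean_a) / pooled_sd


# Reported composite scores: good ECOG PS 1.9 (SD 1.49), poor 3.19 (SD 1.75).
d = standardized_difference(1.9, 1.49, 3.19, 1.75)
print(round(d, 2))  # 0.79
```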

Table 1 Baseline characteristics of 90 patients with stage IV non-small cell lung cancer and univariate significance in a Cox proportional hazards model for overall survival

The base Cox model with sex, age, previous chemotherapy, and ECOG PS had a pronounced weakness because the estimated survival function appeared to behave differently for good ECOG PS than for poor ECOG PS. Because of this, we stratified the base Cox model by good versus poor ECOG PS. Figure 1 depicts the estimated survival curve for patients [29] in each stratum.

Fig. 1 Estimated overall survival curves for two “average” patients with stage IV NSCLC who differ at baseline only by ECOG PS group membership

Table 2 displays the estimated adjusted HR of each factor for this base model and for the models with each PRO-derived measure added. Each PRO had its estimated effect in the anticipated direction, but only the mean MDASI interference had a p value ≤0.05 for its marginal effect when added to the base model.

Table 2 Estimates of adjusted HRs of each factor in the base Cox proportional hazards model and when the PRO-based measures were added

Further investigation found that the effect of mean MDASI interference depended on ECOG PS, so the model with this interaction was also considered (Table 2). When the interaction was included, the effect of mean MDASI interference on overall survival was significant only among patients with poor ECOG PS. The estimates and 95 % Wald confidence intervals for the adjusted HR of a one-point increase in mean MDASI interference were 1.03 (0.89–1.20) for the good ECOG PS group and 1.58 (1.26–1.99) for the poor ECOG PS group. Figure 2 depicts these estimates and 95 % confidence intervals for the stratum-specific adjusted HRs.
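Because the Cox model is log-linear in the covariate, a per-point HR can be rescaled to other increments, and the standard error of the log HR can be approximately recovered from a Wald interval. The sketch below applies this to the reported poor ECOG PS estimate. The two-point increment is chosen only for illustration, and the recovered standard error is approximate because the published values are rounded.

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile


def hr_for_increase(hr_per_point, points):
    """HR for a multi-point increase: exp(points * log(HR per point))."""
    return math.exp(points * math.log(hr_per_point))


def wald_se_from_ci(lower, upper, z=Z_975):
    """Approximate SE of log(HR) recovered from a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported estimate for the poor ECOG PS group: 1.58 (1.26-1.99).
hr_two_points = hr_for_increase(1.58, 2)
se_log_hr = wald_se_from_ci(1.26, 1.99)
print(round(hr_two_points, 2), round(se_log_hr, 3))
```

On this scale, the Wald interval for a k-point increase is obtained by raising each bound of the one-point interval to the power k.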

Fig. 2 Estimates and 95 % confidence intervals of the adjusted HR associated with a one-point increase in mean MDASI symptom interference level, by ECOG PS stratum; an HR > 1 indicates a higher hazard of death with greater interference

Because the multivariate Cox proportional hazards models were stratified by ECOG PS, the C statistic was computed within each stratum and by combining the stratum-specific concordance results (Table 2). The overall C statistic for all models was low, but when computed separately within each stratum, the C statistics revealed that for each model, the concordance was substantially higher among patients with poor ECOG PS than among those with good ECOG PS.
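One way to combine stratum-specific concordance results is to compare patients only within their own stratum and then pool the pair counts across strata. The sketch below follows that approach; whether this matches the exact pooling rule used in the analysis is an assumption, and the data are invented.

```python
from itertools import combinations


def pair_counts(rows):
    """rows: (time, died, risk). Returns (concordant, usable) pair counts."""
    concordant = 0.0
    usable = 0
    for a, b in combinations(rows, 2):
        first, second = (a, b) if a[0] < b[0] else (b, a)
        # Skip tied times and pairs whose earlier time is censored.
        if first[0] == second[0] or not first[1]:
            continue
        usable += 1
        if first[2] > second[2]:
            concordant += 1.0
        elif first[2] == second[2]:
            concordant += 0.5
    return concordant, usable


def stratified_c(strata):
    """Per-stratum C and a pooled C from summed within-stratum counts."""
    counts = {name: pair_counts(rows) for name, rows in strata.items()}
    per_stratum = {name: c / u for name, (c, u) in counts.items()}
    total_c = sum(c for c, _ in counts.values())
    total_u = sum(u for _, u in counts.values())
    return per_stratum, total_c / total_u


# Invented data: (survival in months, died, predicted risk) by ECOG PS group.
strata = {
    "good": [(4, 1, 0.2), (9, 1, 0.5), (14, 0, 0.1), (20, 1, 0.3)],
    "poor": [(2, 1, 1.8), (5, 1, 1.1), (8, 1, 0.6)],
}

per_stratum, overall = stratified_c(strata)
print(per_stratum, overall)
```

Pairs made up of one patient from each stratum are never compared, which is consistent with a model that allows a separate baseline hazard in each stratum.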

The bootstrap procedures yielded results agreeing with the earlier findings. The 95 % bootstrap confidence intervals for each PRO’s adjusted HR are shown in Table 3. Not surprisingly, these intervals tended to be wider than Wald-based confidence intervals [25, 30]. Mean MDASI interference was the only PRO for which the confidence interval excluded 1.00, and when allowed to be stratum specific, symptom interference had a marked effect for the poor ECOG PS stratum but not for the good ECOG PS stratum.

Table 3 Bootstrap-based confidence intervals for the adjusted HR when PRO-based measures were added to the base model

The proportions of bootstrap resamples in which each PRO had p ≤ 0.05 when added to the base model are given in Table 4. These proportions were high only for the mean MDASI interference score.

Table 4 Proportion of bootstrap resamples for which the PRO measure was a significant prognostic factor for overall survival when added to the base model containing only demographic and clinical factors

Discussion

Our results indicate that symptom interference may be a prognostic factor for overall survival in patients with stage IV NSCLC. This finding seemed robust despite the small sample size. Some of the evidence for the added prognostic benefit of the mean MDASI symptom interference score might be attributable to the increased precision that this PRO offers compared with ECOG PS, which is already categorical and which we further dichotomized. Still more of the evidence might be due to the restricted range of ECOG PS in the analysis data set: no patients had baseline ECOG PS = 4 and only 3 % had ECOG PS = 3. Even so, a particular range of ECOG PS scores might be expected for patients who are eligible for a particular treatment. Therefore, a PRO might further differentiate patients likely to have longer survival from those likely to have shorter survival, at least among patients with poor performance status.

Guidelines for the interpretation of C statistics [31] indicate that the models considered in this analysis may not be comparable with those used in clinical practice. One possible cause of the relatively poor prognostic ability of all models considered is that the patient population was limited to patients with stage IV NSCLC, who would be expected to die sooner than patients with early-stage NSCLC. Therefore, stage, which has previously been identified as a significant prognostic factor for overall survival in patients with NSCLC [32], could not be used to improve the model.

Of the many available patient-reported measures, we considered only a few for our retrospective analyses. Ideally, more PRO instruments would have been considered. Availability of data from the EORTC QLQ-C30 [6], for instance, would have permitted us to consider the physical functioning scale, which Braun et al. [32] reported as a prognostic factor for NSCLC. The ability to verify the usefulness of EORTC’s physical functioning scale would have been especially useful in our assessment of the added prognostic benefit of PROs after adjustment for ECOG PS, because Braun et al. did not control for performance status. Likewise, although we included one single-item measure (SF-12 general health), it was not the same single-item measure (overall quality of life) previously shown to be a significant prognostic factor in NSCLC [11].

Most of the PROs we considered did not meet our criterion for having a significant added benefit, but this might be because our analysis data set was relatively small, especially if one considers the ratio of observed deaths to prognostic factors. Even most of the demographic and clinical factors were generally not significant. For example, sex was the non-PRO factor that had the strongest evidence for being important, but it was not statistically significant in many of our models even though it was demonstrated to be a significant prognostic factor in a larger data set [33]. In addition, all of the patients in this study were treated at a tertiary cancer center; however, Cleeland et al. [20] showed that type of treatment site is associated with differences in symptom management. Further studies should include patients from different patient populations and treatment settings, including community clinics and public hospitals, and should include more PROs.

We do not claim that symptom interference is the best PRO for use as a prognostic factor in oncology research and practice. Instead, we note that it demonstrated a statistically significant effect in a prognostic model in advanced lung cancer despite a small sample size and despite adjustments for multiple clinical and demographic factors. It is especially interesting that symptom interference, a patient-reported measure that has not been extensively studied as a prognostic factor, was more predictive in our analysis than demographic or clinical factors and was not redundant in models stratified by ECOG PS. In fact, the usefulness of the measure seemed to depend entirely on ECOG PS, suggesting that symptom interference and clinician-assessed performance status should be seen as complementary. To our knowledge, the interactive effect of symptom interference and performance status as a prognostic factor for overall survival in NSCLC has not been previously established. While our results apply directly to patients with advanced NSCLC, it is plausible that mean MDASI symptom interference and/or other PROs could serve as important prognostic factors in other patient populations, especially for patients with advanced disease. Hence, we encourage use of the MDASI symptom interference scale to help provide clinicians with prognostic information in practice.

Some have proposed using PROs as stratifying factors in clinical trials because of their frequent superiority vis-à-vis performance status in predicting survival [34]. We agree that PROs should be considered in randomization strata. Our findings also suggest that the utility of at least one PRO as a prognostic factor depends on performance status. Further attention should be devoted to studying possible interaction effects between performance status and PROs, especially symptom interference, in prognostic models. In addition, future studies should include patients across all types and stages of cancer.