Introduction

Cardiac trials commonly include the collection of cardiac-specific outcome measures. However, these measures do not reflect how individuals perceive the impact of angina symptoms on their health-related quality of life (HRQL). For purposes of resource allocation, the impact of interventions on HRQL is ideally measured in generic terms to facilitate comparison across different conditions and patient groups. Furthermore, economic evaluation requires these HRQL effects to be valued on a single scale, anchored at death and good health, based on the preferences of an appropriate group of individuals (e.g. a sample of patients or the general public). The use of preference-based measures of health outcome also enables HRQL and length of survival to be integrated into a single measure, most commonly quality-adjusted life years (QALYs). The EQ-5D is one of a group of outcome measures which categorise patients into a series of generic health states for which a set of social preferences or ‘utilities’ have been previously generated [1]. Other examples of this group of instruments include the Quality of Wellbeing Scale [2], and the Health Utilities Index [3]. An algorithm that may be used to obtain single index utility scores from the Short-Form 36 (SF-36) has also recently been published [4].

Economic evaluations of health care technologies are increasingly used in a policy-making context. Within the United Kingdom the National Institute for Health and Clinical Excellence (NICE) provides guidance on current ‘best practice’ to the NHS for England and Wales. NICE recommends that evidence provided to it should include a cost-utility analysis which relates the differential cost of interventions to differential QALYs [5]. Furthermore, it recommends that health states should be measured in patients using a generic classification system for which UK population preference values exist, such as the EQ-5D [6]. Statements of good practice in the United States have also supported the use of QALYs based on social preferences [7].

Many clinical trials within the cardiac area have not collected preference-based outcome information to facilitate estimates of QALYs. As a result related economic evaluations have focused upon more limited clinical measures of outcome, such as the number of patients free from angina [8], cost per life year saved [9], and cost per patient free from major adverse cardiac events [10]. A range of clinical indicators has been developed to measure the severity of angina, including the Canadian Cardiovascular Society (CCS) score, and measures of breathlessness. However, these measures do not account for how angina symptoms are perceived by patients and do not place the effects of angina within the wider context of HRQL necessary for economic evaluation.

Utility data may not have been collected within clinical trials for a variety of reasons. For example, trials may have been designed before reliable instruments for measuring utility became available; there may be a concern about over-burdening patients with the completion of questionnaires; economic evaluations of the technologies of interest may have been considered only after the trial design was completed; and there may be constraints upon time and resources that prevent utility data being collected within clinical trials. One cardiac trial that has included a utility-based assessment of quality of life is the Bypass Angioplasty Revascularisation Investigation, which compared bypass surgery and angioplasty for the treatment of multivessel coronary artery disease. A sub-group of patients completed a battery of quality of life scales and a time trade-off questionnaire [11]. However, the majority of patients in this study did not have angina, and the number of responses for each of the CCS categories was small (n<30).

The second Randomised Intervention Treatments for Angina (RITA-2) trial was a major UK trial that compared percutaneous transluminal coronary angioplasty (PTCA) and medical management for the treatment of stable angina [12]. Although utility data were not directly collected within the trial, non-utility based HRQL data were collected. The RITA-2 trial indicated that patients’ HRQL (as measured by the Nottingham Health Profile and the SF-36) is strongly correlated with cardiac-specific measures of angina symptoms and breathlessness [13].

Where preference-based outcome data have not been collected directly in a clinical trial, it may be possible to generate utility values by mapping from data collected in a separate, non-trial survey of patients with characteristics similar to those of the trial patients. The mapping relies on disease-specific outcome measures that are collected both in the trial and in the separate survey.

Previously published approaches to mapping to utility measures have included the elicitation of time trade-off valuations of the four New York Heart Classifications of heart failure in a sample of 64 persons [14] and using regression analysis to derive a utility-based scoring mechanism for the Utility-based Quality of Life – Heart Questionnaire using disease-specific data and self-reported utilities in a sample of 201 cardiovascular patients [15]. A previous study has explored the relationship between utilities, CCS scores and the Duke Activity Specific Index [16]. Utilities were estimated using the standard gamble approach in a sample of 41 patients with chest pain about to undergo elective cardiac catheterisation. The authors reported that the correlations between the utilities and the disease-specific measures were low.

This study explored the relationship between utility data, based on angina patients’ health states and public valuations, and clinical, demographic and angina-specific outcome data collected in a trial of angina. The main aim of this contribution is to provide a method to estimate utilities when only angina-specific outcome measures have been included within a trial. The data reported here are from a sample of patients with stable angina who were chosen to be representative of patients from the RITA-2 trial [12].

Methods

Study sample

Ethics committee approval was obtained to recruit patients from four UK cardiac out-patient clinics. Subjects were aged between 40 and 75 years and either had coronary artery disease with symptoms of angina or had undergone previous revascularisation procedures. Exclusion criteria from the RITA-2 trial [12] were applied: patients convalescing after an invasive therapeutic procedure, patients known to have significant left main stem disease or haemodynamically significant valve disease, and patients known to have life-threatening non-cardiac disease that would limit survival to 5 years. Patients unable to complete the questionnaire unassisted were also excluded.

Measures used

For patients who agreed to take part in the study, research nurses at the clinics collected demographic and clinical data through patient interviews. Clinical data included previous revascularisation procedures, current medication (nitrates, calcium antagonists, or beta-blockers), and two measures of disease severity. The first, the CCS score, is a categorical measure of angina pain ranging from level one (least severe) to level four (very severe) [17]. Level zero was added to indicate that the patient suffered no angina pain. The second measure, the breathlessness grade, ranges from zero (mild breathlessness) to five (breathless at rest) [12]: grade 1, the patient becomes breathless when climbing hills, compared with other people of the same age; grade 2, the patient becomes breathless when hurrying on the level, compared with other people of the same age; grade 3, the patient becomes breathless when walking at own pace on the level; grade 4, the patient becomes breathless when washing or dressing; and grade 5, the patient becomes breathless when at rest.

Patients were also requested to complete health status questionnaires, containing the EuroQol EQ-5D. The EQ-5D is a non-disease specific instrument for describing and valuing HRQL [18]. Respondents categorise their health status on five dimensions (mobility, self-care, usual activities, pain, and anxiety/depression), where each dimension has three possible levels of response. An index of values was applied to the EQ-5D classification system to generate single-index utility scores, where full health is equal to one and death equals zero. This index of values was derived from a random sample of approx. 3,000 members of the UK general population using the time trade-off technique to obtain their preferences towards a set of pre-defined health states [19]. Thus each patient in the sample describes his or her own health, and the utility score corresponding to that description of health is derived from a population sample.
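The structure of the classification system described above can be illustrated with a short sketch. It enumerates the health states implied by five dimensions with three levels each and shows how a profile is turned into a single index. The decrements used here are placeholders chosen purely for illustration; they are not the UK population values [19], and the simple additive form is an assumption of the sketch.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain", "anxiety/depression")

# Every possible profile, e.g. (1, 1, 2, 3, 1): one level (1-3) per dimension.
states = list(product((1, 2, 3), repeat=len(DIMENSIONS)))

# PLACEHOLDER decrements per level (level 1 = no problems).
# These are hypothetical and are NOT the published UK valuation set.
PLACEHOLDER_DECREMENT = {1: 0.0, 2: 0.1, 3: 0.3}


def placeholder_index(profile):
    """Return an illustrative index: 1 for full health, lower for worse states."""
    return 1.0 - sum(PLACEHOLDER_DECREMENT[level] for level in profile)


n_states = len(states)                          # 3**5 profiles
full_health = placeholder_index((1, 1, 1, 1, 1))
worst_state = placeholder_index((3, 3, 3, 3, 3))  # may fall below zero
```

In practice the patient supplies only the profile; the index attached to it comes from the population valuation set, which is why scores below zero (states valued as worse than death) can occur.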

The objective of this study was to develop an algorithm that can be used to estimate EQ-5D scores from the clinical and demographic data collected in RITA-2. A cost analysis has been undertaken for this trial [20], and synthesising these cost data with the outcome data from the original trial and the utility data reported here will enable a full cost-utility analysis of RITA-2 to be presented in future.

Sample size

On the basis of data collected from similar patients in a previous study [13] the expected standard deviation in utility scores per CCS level was approx. 0.25. In order to achieve a level of precision (in terms of a 95% confidence interval) for a mean utility score per CCS level of 0.08 on the 0–1 scale (i.e. 0.04 on either side of the mean utility), approx. 150 responses were required per CCS level. Thus for the five CCS levels (including level zero) we estimated that at least 750 patients were required.
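The arithmetic behind this target can be reproduced with a minimal sketch, assuming the usual normal-approximation formula for the confidence interval of a mean, n = (z × SD / half-width)². The function name and the use of the normal approximation are assumptions of this sketch rather than details stated in the text.

```python
from statistics import NormalDist


def n_per_group(sd, half_width, confidence=0.95):
    """Responses needed so the CI for a mean extends half_width either side."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return (z * sd / half_width) ** 2


# Expected SD of 0.25 and a half-width of 0.04 (total interval width 0.08).
n = n_per_group(sd=0.25, half_width=0.04)

# Rounded per-level requirement, multiplied over CCS levels 0 to 4.
per_level = round(n)
total = 5 * per_level
```

Under these assumptions the per-level requirement comes out at about 150, which gives the overall target of 750 patients.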

Statistical methods

Statistical significance is assumed where the p value is less than or equal to 0.05. Correlation was measured using Spearman’s correlation coefficient for ordinal data. Ordinary least squares regression techniques were used to predict utility scores from the clinical and demographic data. Dummy variables were used where appropriate, including for the representation of CCS and breathlessness grades. A forward stepwise approach was used in the regression analysis. Transformations of the dependent variable, the EQ-5D index values, were considered and the Box-Cox method was used to determine the most appropriate transformation [21]. Variables considered for inclusion in the predictive model included CCS score, breathlessness score, gender, age, centre where treated, type and number of previous revascularisation experiences, and type and number of current medications. Interaction terms were included if they were found to be statistically significant on the F test. The Ramsey RESET test was used to test for omitted variable problems. The Cook-Weisberg test was used to detect heteroscedasticity in the resulting model. Transformations of the dependent variable were considered where heteroscedasticity was detected, and White’s corrected standard errors were used where this could not be rectified. Using regression analysis for estimation purposes leads to artificially small levels of uncertainty in the predicted data. The variance-covariance matrix is therefore also presented (Table 4), so that those wishing to apply the regression model can incorporate the variance of the prediction error, reflecting the fact that the coefficients of the model are estimated rather than known.
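The dummy-variable coding and least squares fit described above can be sketched as follows. This is an illustration only: it omits the stepwise selection and diagnostic tests, it assumes grade zero is the reference level for both scales and that the number of drug classes enters linearly, and the coefficients are made up to generate noise-free synthetic data. None of the values are those of the fitted model.

```python
def design_row(ccs, breath, n_drugs):
    """Intercept, dummies for CCS 1-4 and breathlessness 1-5, drug count."""
    row = [1.0]
    row += [1.0 if ccs == k else 0.0 for k in range(1, 5)]
    row += [1.0 if breath == k else 0.0 for k in range(1, 6)]
    row.append(float(n_drugs))
    return row


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical coefficients (NOT those of the fitted model), used only to
# generate synthetic utilities for every combination of the three predictors.
true_b = [0.90, -0.02, -0.10, -0.20, -0.22,
          -0.03, -0.06, -0.15, -0.20, -0.25, -0.02]
X = [design_row(c, b, d)
     for c in range(5) for b in range(6) for d in range(4)]
y = [sum(bi * xi for bi, xi in zip(true_b, row)) for row in X]

estimated_b = ols(X, y)  # recovers true_b because the data contain no noise
```

With real data the same design matrix would be passed to a statistical package, which would also supply standard errors and the diagnostic tests named above.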

Results

Interviews were completed by 533 patients. Ten of these respondents were excluded from the analysis based on the exclusion criteria. Of the remaining responses, 510 had complete EQ-5D data, and it is these responses that form the basis of the analysis (417 men; median age 61 years, inter-quartile range 55–66). The sample size achieved therefore fell short of the target by 240 patients. The four centres contributed between 64 and 190 patients each to the usable total of responses. Patients had received up to four PTCA procedures and up to three bypass grafts; 44 (9%) patients had received both a PTCA and a bypass graft. In total 228 (45%) patients had not undergone any revascularisation procedure, 217 (43%) had undergone at least one PTCA, and 109 (21%) had undergone at least one bypass graft. Figure 1 describes respondents’ current medication at time of interview. Forty-three (9%) respondents were not currently taking any of the three types of medication. Complete information on current medication was missing for the remaining 7 patients.

Fig. 1 Details of medication taken by respondents

Descriptive analysis

Table 1 shows the number of respondents classified in each of the CCS and breathlessness classifications. On both scales the numbers of patients in the sample declined at higher levels of severity. The largest proportion of respondents was classified as having CCS 1 (29%), closely followed by CCS 2 (26%). Not surprisingly, CCS 4 had the lowest proportion of respondents (8%). Table 2 provides summary statistics of the EQ-5D index scores for each CCS grade where 0 is equivalent to dead and 1 to good health. As expected, mean EQ-5D scores decrease as the severity of angina increases and the differences in EQ-5D scores between CCS grades are statistically significant (χ²=149, 4 degrees of freedom, p<0.001). The difference in utility score between each angina grade is not constant; the difference in EQ-5D scores is greater between grades 1 and 2, and between 2 and 3, than the difference in scores between grades 0 and 1, and between 3 and 4. The correlation coefficient between CCS and breathlessness grades is 0.54, between CCS and EQ-5D is −0.54 and between breathlessness and EQ-5D is −0.52. All correlation coefficients are statistically significant (p<0.001).
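For readers wishing to reproduce this type of correlation on their own data, the following is a minimal sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks so that the many ties on an ordinal scale are handled. The function names and the small data set are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation, with ties given average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CCS grades and utility scores for eight patients.
ccs = [0, 0, 1, 1, 2, 2, 3, 4]
utility = [0.85, 1.00, 0.80, 0.76, 0.69, 0.73, 0.52, 0.59]
rho = spearman(ccs, utility)  # negative: utility falls as CCS grade rises
```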

Table 1 Number of respondents by Canadian Cardiovascular Society score (CCS) and Breathlessness grade (Breath)
Table 2 Summary statistics for EQ-5D index scores by Canadian Cardiovascular Society score (CCS) and Breathlessness grade (Breath) (IQR inter-quartile range)

Multivariate analysis

Table 3 details the results of the final model for predicting EQ-5D scores. CCS scores, breathlessness scores (breath), and a variable representing whether patients were currently taking 0, 1, 2, or all 3 of the specified classes of drugs (nodrugs) were included in the final model. The constant value is not equal to zero, indicating that regardless of angina severity this sample of patients has less than full health, which is expected given the age range of the sample. The adjusted R² value of the model is 0.370, and the mean squared error is 0.227. Transformations of the dependent variable, including taking natural logarithms, squaring, and square roots, were rejected as they did not improve the distribution of the residuals. The results of the Box-Cox analysis did not identify an appropriate transformation. The Ramsey RESET test indicates that there is not an omitted variable problem in the model (p=0.75). The variance-covariance matrix from the model is shown in Table 4. The parameter coefficients in the model are consistent with the ordinal nature of the two clinical scales. The results show that utility is dependent on clinical scores and medication, but not on previous revascularisation surgery.

Table 3 Regression model to predict EQ-5D scores (CCS Canadian Cardiovascular Society score, Breath Breathlessness grade, No. drugs number of medications from beta-blockers, calcium antagonists, and nitrates)
Table 4 Variance-covariance matrix (CCS Canadian Cardiovascular Society score, Breath Breathlessness grade, No. drugs number of medications from beta-blockers, calcium antagonists, and nitrates)

A scatter-plot of the predicted values from the regression model on the observed values is shown in Fig. 2. Each point on the figure illustrates a single observation. The figure demonstrates a large amount of variability between the predicted and observed scores. However, there are a large number of observations occurring around the 45° line, where the observed and predicted scores are equivalent, particularly in the top end of the scale. There is less consensus around the lower end of the scale of observed scores, below 0.4. Predicted values below this value are systematically higher than observed values. The lowest value predicted by the model is 0.23 whereas the lowest observed value is −0.18 (indicating that the health state is considered worse than death).

Fig. 2 Plot of observed values on OLS model predicted values

Discussion

The results from this study demonstrate a relationship between the EQ-5D and the CCS and breathlessness clinical indicators. Both the CCS and breathlessness measures are ordinal scales, and therefore when mapping between the measures, differences in utility values between increments on the cardiac-specific measures are not equal. The difference between levels 3 and 4 on the CCS scale is small, but differences in the coefficients are much larger between levels 1 and 2, and 2 and 3. The largest difference in the coefficients on the breathlessness scale is between levels 2 and 3. The model reported here to predict EQ-5D scores from the cardiac-specific measures and the information on current medication demonstrates greater consensus between observed and predicted scores at the higher end of the utility scale than the lower end, which may reduce the ability of the model to predict EQ-5D scores for patients with severe angina.

The greater variability at the lower end of the scale may stem from the relatively fewer observations in these more severe states. This sample was drawn from patients attending cardiac out-patient clinics; predicting scores at the more severe end of the scale with less variation would require including patients too ill to be treated as outpatients. To overcome this problem of relatively few observations in the more severe cardiac grades we considered a model including the two highest CCS grades as a single dummy variable and the three highest breathlessness grades as another dummy variable. This did not improve the model, and the variation at the lower end of the scale between predicted and observed scores remained. The R² value of the model indicates a moderate fit to EQ-5D single-index scores and is similar to that of other models of this type [11].

Whenever using data of this type in cost-utility analyses it is necessary to consider the surrounding elements of uncertainty. One element of uncertainty corresponds to the utility score for the health state, using a predefined index of values. The index applied here is based upon valuations from a sample of over 3,000 members of the UK population. However, despite the large number of observations the measures of dispersion in these valuations were much higher than the authors of the study anticipated [22]. This variation should not be ignored when using the results to map to data from other studies. Another element of uncertainty stems from the cardiac measures themselves. These are completed in interview scenarios, and it is possible that misclassification may exist within the CCS and breathlessness measures.

One issue regarding uncertainty more specific to this analysis is that the use of regression approaches to estimate utilities may lead to artificially small variance estimations. If applying the results of this model to external data, two forms of variation should be added back into the model predictions: variance to reflect that parameter coefficients are estimated from a sample rather than known from a population; and variance to reflect that the utilities of patients with the same observed characteristics will differ. The first element of variance should be added in from the variance-covariance matrix (Table 4) and the second type of variance is described by the mean squared error. Approaches to incorporating uncertainty surrounding imputed values are discussed in a recent paper by Briggs et al. [23].
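The two components described above can be combined as in the following sketch, which assumes the standard expression for the variance of a prediction for a new individual, x'Vx + σ², where x is the patient's row of covariates, V the variance-covariance matrix of the coefficients and σ² the residual variance (the mean squared error; a model reporting the root mean squared error would need it squared first). The three-parameter matrix and all numbers are hypothetical and are not those of Table 4.

```python
def prediction_variance(x, vcov, residual_variance):
    """Variance of a predicted utility for one new patient: x'Vx + sigma^2."""
    coefficient_part = sum(x[i] * vcov[i][j] * x[j]
                           for i in range(len(x))
                           for j in range(len(x)))
    return coefficient_part + residual_variance


# Hypothetical model with an intercept and two dummy variables.
vcov = [[0.0004, -0.0002, -0.0001],
        [-0.0002, 0.0009, 0.0001],
        [-0.0001, 0.0001, 0.0016]]
x = [1.0, 1.0, 0.0]  # intercept plus the first dummy switched on

# Uncertainty from estimating the coefficients alone (residual term set to 0).
from_coefficients = prediction_variance(x, vcov, 0.0)

# Total variance once between-patient variation (hypothetical 0.05) is added.
total_variance = prediction_variance(x, vcov, 0.05)
```

In this illustration the residual term dominates, which is typical when a model explains only a moderate share of the variation between patients.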

Other non-trial based studies have reported utility data for patients with stable angina [16, 24]. The correlations between CCS scores and estimates of utility using the EQ-5D reported here are stronger than those previously reported between CCS scores and estimates of utility using the standard gamble approach [16]. This may be due to the larger number of patients included in this study or to the different approaches used to derive utility values. The median EQ-5D values presented here are somewhat lower than those reported in a study of 220 patients which elicited utility values using standard gamble and time trade-off methods [24].

This contribution describes an investigation to map data from cardiac-specific measures and medication data to a utility scale using ordinary least squares regression techniques. The results demonstrate the possibility of estimating utility data based on social preferences for studies where such data have not been collected directly. For example, the characteristics of the sample of patients presented here are similar to those of the patients in the RITA-2 trial. Thus it should be possible to revisit such trials to produce estimates of cost-effectiveness in terms of cost per QALY gained.

Researchers should strive to collect utility data within clinical trials that include an economic component. Where this cannot be done, utility data from similar patient samples can be mapped to the original study data in order to produce estimates of cost per QALY gained. However, the uncertainty surrounding the estimates should also be considered within the mapping process.