Key Points

Regulatory authorities generally recommend the use of cost-utility analysis, which implies the need to use health state utility as the outcome measure.

The current study aims to enable cost-utility analysis using data from a widely used Parkinson-disease-specific questionnaire.

Additionally, the current study is the first that attempts to predict the utilities of 15D and EQ-5D-5L based on PDQ-8 dimensions and total scores of patients with Parkinson’s disease.

1 Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder worldwide. Despite much research on this disease, its specific cause remains unknown. Diagnosis requires bradykinesia in combination with at least one of the remaining clinical manifestations: muscular rigidity, tremor, or postural instability [1, 2]. The multiple motor and non-motor symptoms of a severe chronic condition such as PD have a strong impact on health-related quality of life (HRQoL) and patient well-being. The gradual deterioration of quality of life (QoL) also entails a significant financial burden for the public healthcare system [3, 4]. Health programs and treatments aimed at alleviating the symptoms of PD patients may reduce HRQoL losses but also impose costs on the healthcare budget. Health economic evaluation can therefore become an important tool for decision making by healthcare managers. By comparing the costs and results of health interventions, one can assess whether emerging technologies, or even those currently in use, offer ‘value for money.’

Economic evaluation may not resolve every issue regarding the allocation of scarce resources across healthcare interventions, but it can help to reduce political arbitrariness in making such decisions. Different economic evaluation techniques exist, but the one recommended by the National Institute for Health and Care Excellence (NICE) in the UK and other health regulatory agencies around the world is cost-utility analysis (CUA) [5]. This method may be regarded as a particular case of cost-effectiveness analysis in which the outcome (i.e., effectiveness) is expressed in quality-adjusted life-years (QALYs). A QALY combines the two main dimensions of health outcomes, namely, QoL (morbidity) and life-years (mortality), in a single index [6]. In essence, QALYs are obtained by weighting life-years with a QoL index ranging from 0 (death) to 1 (full health).
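The weighting of life-years by a QoL index can be illustrated with a minimal sketch. The function name `qalys` and the example health profile are hypothetical and are not taken from this study; discounting of future years, which is common in applied CUA, is deliberately left out.

```python
def qalys(periods):
    """Sum utility-weighted life-years over (years, utility) pairs.

    `periods` is a list of (duration_in_years, utility) tuples, where the
    utility is the QoL weight on the scale anchored at 0 (death) and
    1 (full health).
    """
    total = 0.0
    for years, utility in periods:
        if years < 0:
            raise ValueError("duration must be non-negative")
        if utility > 1:
            raise ValueError("utility cannot exceed 1 (full health)")
        total += years * utility
    return total


# Hypothetical profile: 2 years at utility 0.75, then 3 years at 0.5.
profile = [(2.0, 0.75), (3.0, 0.5)]
print(qalys(profile))  # 2 * 0.75 + 3 * 0.5 = 3.0
```

In this illustration, five life-years yield three QALYs because each year is scaled by the utility of the health state in which it is lived.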

QoL weights are also known as ‘utilities’ (hence the term cost-utility analysis) and are designed to reflect preferences of the general population about the values of different health states. There are two ways to obtain these utilities: via direct preference valuation techniques, such as time trade-off (TTO) and standard gamble (SG), or indirectly via HRQoL measurement instruments, also known as multi-attribute utility (MAU) instruments [7]. These MAU instruments include a multi-attribute descriptive system and an algorithm that converts the description of a health state into a single QoL index. Among existing MAU instruments, the EQ-5D-3L has been the most widely used [8]. For all of them, a value set or ‘tariff’ of utilities is available, and in some cases, there are several country-specific ‘tariffs.’

Disease-specific instruments are often preferred to generic instruments because of their higher sensitivity to changes in HRQoL and greater ability to detect clinically important differences in a particular condition. That is why these instruments are commonly used in the context of clinical trials [9, 10, 11]. However, most of these disease-specific measures do not have weights based on people’s preferences, and do not have a scoring algorithm that allows utility indexes to be obtained. A major advantage of generic preference-based HRQoL instruments is that they generate utilities that can be used to compare QALYs gained from interventions across patient groups and diseases to aid broad health-service resource allocation decisions.

Since disease-specific instruments may be superior in terms of sensitivity and are also widely used in clinical practice, it would be useful to find a way to assign utilities to condition-specific measures. One way consists of using mapping algorithms, which often include demographic characteristics, in particular age and gender, to improve the predictive performance of the models [12].

The main purpose of this study is to develop mapping algorithms that use HRQoL data from the Parkinson’s Disease Questionnaire-8 (PDQ-8) to predict utility values for the three MAU instruments selected in this study (EQ-5D-3L, EQ-5D-5L, and 15D).

2 Methods

2.1 Study Sample and Data Collection

A cross-sectional study was conducted. Data were obtained between May 2012 and August 2013. Patients were recruited from 23 local PD associations, belonging to 13 of the 17 regions of Spain. All participants in this study were over 30 years of age (range 34–86 years) and had been diagnosed with PD. Patients with the highest possible degree of severity (Hoehn and Yahr stage V) were excluded because they lacked the autonomy to complete the questionnaire set by themselves. Likewise, patients diagnosed with other similar disorders were excluded, including other subtypes of parkinsonism such as vascular parkinsonism, drug-induced parkinsonism, dementia with parkinsonism, and multiple system atrophy (MSA) [13]. A total of 280 patients who were registered in the study received a set of questionnaires and detailed information about the aims and procedures of the study. They were included after giving their written informed consent. However, 52 patients were subsequently excluded because of missing data on either the MAU instruments or the PDQ-8. A final sample of 228 evaluable patients was included in the analyses. They answered the set of questionnaires under the supervision of trained interviewers. The study was approved by the Ethics Committee of the University of Extremadura and was conducted following the ethical guidelines of the Declaration of Helsinki as revised in Seoul in October 2008.

2.2 Health-Related Quality of Life and Clinical Measures

The EQ-5D-3L [14] is one of the most widely used MAU instruments to assess HRQoL. It consists of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and three response levels for each dimension. A new five-level version of the EQ-5D (EQ-5D-5L) was developed to increase the sensitivity of the original three-level version [15]. This new questionnaire maintains the five dimensions of the original version but includes five response levels for each dimension. Both the EQ-5D-3L and EQ-5D-5L questionnaires were scored using the official Spanish tariffs [16, 17].

The 15D is a MAU instrument that was originally developed in Finland [18]. It has 15 dimensions: mobility, vision, hearing, breathing, sleeping, eating, speech, excretion, usual activities, mental function, discomfort and symptoms, depression, distress, vitality, and sexual activity. However, in comparison, the EQ-5D-5L seems to have better efficiency and greater sensitivity in detecting clinical HRQoL changes than the 15D [19]. The valuation system of the 15D is based on an application of the MAU theory and derived by using a combination of rating scale and magnitude estimation methods from a Finnish population sample [18].

The PDQ-8 is a clinical measure, which was developed from the PDQ-39 [20]. The PDQ-8 requires less time for self-completion than the PDQ-39. The PDQ-8 is a short-form questionnaire, made up of eight items, representing different dimensions: mobility, activities of daily living, emotional well-being, social support, cognition, communication, bodily discomfort, and stigma. Each item is scored on an ordinal scale from 0 (never or not at all) to 4 (always or cannot do at all). The summary index of all items is standardized on a scale with a range of 0–100, where the lower index represents better HRQoL.
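As a sketch of how the summary index can be computed, the snippet below assumes the usual convention that the index is the sum of the eight item scores expressed as a percentage of the maximum possible sum (8 × 4 = 32). The text above states only that the index is standardized to 0–100, so this formula, the function name `pdq8_summary_index`, and the example responses are assumptions.

```python
PDQ8_ITEMS = (
    "mobility",
    "activities_of_daily_living",
    "emotional_well_being",
    "social_support",
    "cognition",
    "communication",
    "bodily_discomfort",
    "stigma",
)


def pdq8_summary_index(responses):
    """Return the 0-100 summary index from eight item scores (each 0-4).

    Assumed formula: 100 * (sum of items) / (8 items * maximum score of 4).
    Lower values indicate better HRQoL.
    """
    if len(responses) != len(PDQ8_ITEMS):
        raise ValueError("expected exactly eight item scores")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each item must be an integer from 0 to 4")
    return 100.0 * sum(responses) / (len(PDQ8_ITEMS) * 4)


# Hypothetical patient answering 1 on every item: 8 / 32 = 25%.
print(pdq8_summary_index([1, 1, 1, 1, 1, 1, 1, 1]))  # 25.0
```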

In all cases, Spanish versions of the instruments were used in the questionnaires. The mean duration of the interviews was 35 min.

2.3 Statistical Analyses

2.3.1 Descriptive Statistics

The Shapiro–Wilk test was used to determine whether QoL scores were normally distributed. The degree of correlation between the health state utility and PDQ-8 (item and overall) scores was calculated using the Spearman rank correlation.

2.3.2 Mapping Analyses

A direct mapping approach using regression techniques was first used to develop a series of mapping algorithms from the PDQ-8 onto each of the three MAU instruments. In each regression, the dependent (left hand) variable was a utility score calculated according to the choice of MAU instrument. Two sets of independent (right hand) variables were considered. Model I used the overall PDQ-8 score, its square term (to detect non-linear effects), age, and gender (a dummy variable). The independent variables of Model II replaced the overall PDQ-8 score (and its squared term) with the item scores (and corresponding squared terms) of the PDQ-8. Secondly, for both EQ-5D-3L and EQ-5D-5L, an indirect response mapping was also conducted. In the indirect mapping, the responses of each EQ-5D dimension were predicted separately from PDQ-8 using the ordered logit model. Then, applying country-specific tariffs, the country-specific EQ-5D utilities can be calculated. This indirect mapping approach provides an advantage of producing a set of mapping algorithms that can potentially be used for other country-specific tariffs, not just the Spanish tariff that was used for this study. This indirect mapping was not applied to the 15D, which contains broader dimensions not included in the PDQ-8, since large prediction errors would be expected.
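The two sets of regressors in the direct mapping can be sketched as follows. The snippet only builds the right-hand-side variables of Model I and Model II and applies a linear equation to them; it is not an estimation routine. The function names, the coding of the gender dummy (1 = male is an assumption), and the coefficient vector are all hypothetical, and the coefficients in particular are placeholders rather than the estimates in Table 4.

```python
def model_i_row(pdq8_total, age, male):
    """Regressors for Model I: constant, overall score, its square, age, gender."""
    return [1.0, float(pdq8_total), float(pdq8_total) ** 2, float(age), float(male)]


def model_ii_row(items, age, male):
    """Regressors for Model II: constant, 8 item scores, their squares, age, gender."""
    if len(items) != 8:
        raise ValueError("expected eight PDQ-8 item scores")
    linear = [float(x) for x in items]
    squared = [x ** 2 for x in linear]
    return [1.0, *linear, *squared, float(age), float(male)]


def predict(row, coefs):
    """Apply a linear mapping equation: sum of coefficient * regressor."""
    if len(row) != len(coefs):
        raise ValueError("row and coefficient vector differ in length")
    return sum(x * b for x, b in zip(row, coefs))


row = model_i_row(pdq8_total=25.0, age=66, male=1)
print(row)  # [1.0, 25.0, 625.0, 66.0, 1.0]

# Placeholder coefficients for illustration only (NOT the published estimates).
placeholder_coefs = [0.9, -0.004, 0.0, 0.0, 0.0]
print(round(predict(row, placeholder_coefs), 3))  # 0.8
```

Model II has 19 regressors under this layout (a constant, eight linear terms, eight squared terms, age, and gender), which is why variable selection matters in a sample of this size.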

Three statistical techniques were used to estimate each model in the direct mapping approach, including the most widely used ordinary least squares (OLS), the robust MM-estimator (which is designed to deal with the presence of potential outliers) [21, 22] and the generalized linear model (GLM) (which allows for the non-normal distribution of dependent variables) [23]. For the GLM estimator, among different combinations of commonly used family (Gaussian and gamma) and link (log) functions in the crosswalk literature, the optimal combination was chosen (and reported in Table 3) for each mapping function using the goodness-of-fit criteria described below (detailed results not reported but available from the authors). The ordered logit model was used to estimate the EQ-5D response in the indirect mapping [24]. Except for the constant, independent variables were retained only when they were statistically significant (i.e., p < 0.05), based on a forward stepwise regression technique [22, 25]. For logical reasons, the squared terms of the overall score of PDQ-8 and the square of the item scores were only included when the linear terms were significant.
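To make the ordered logit step of the indirect mapping concrete, the sketch below turns a linear predictor and a set of cutpoints into response-level probabilities, using the parameterization P(Y ≤ j) = logistic(cut_j − xb). The cutpoints and the linear predictor are invented for illustration. Picking the most likely level is only one possible way to convert probabilities into a predicted response, and the text does not state which rule the authors applied.

```python
import math


def _logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(xb, cutpoints):
    """Probabilities of each ordered level given linear predictor xb.

    With K strictly increasing cutpoints there are K + 1 levels, and
    P(Y <= j) = logistic(cutpoints[j] - xb).
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [_logistic(k - xb) for k in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of adjacent cumulative probs
        previous = c
    return probs


def most_likely_level(probs):
    """Return the 1-based level with the highest predicted probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Hypothetical three-level dimension (e.g., an EQ-5D-3L item) with two cutpoints.
probs = ordered_logit_probs(xb=0.0, cutpoints=[-1.0, 1.0])
print([round(p, 3) for p in probs])
print(most_likely_level(probs))  # level 2 has the largest probability here
```

Once a level has been predicted for each of the five dimensions, any country-specific tariff can be applied to the resulting health state, which is the portability advantage described for the indirect approach.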

2.3.3 Validation and Goodness-of-Fit

Without an external database, this study adopts a commonly applied internal validation ‘hold-out’ approach [22, 25]. The full data were randomly divided into two mutually exclusive groups using Stata’s random-number generator: 75% of the full sample was used as an ‘estimation’ sample, whilst the remaining 25% served as a ‘validation’ sample.
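A hold-out split of this kind can be sketched as below. The study used Stata's random-number generator; this Python version, the function name `holdout_split`, the seed, and the rounding rule are assumptions made for illustration.

```python
import random


def holdout_split(ids, estimation_share=0.75, seed=2012):
    """Randomly split ids into mutually exclusive estimation/validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * estimation_share)
    return shuffled[:cut], shuffled[cut:]


patient_ids = range(1, 229)  # 228 evaluable patients
estimation, validation = holdout_split(patient_ids)
print(len(estimation), len(validation))  # 171 57
```

With 228 patients, this fixed-size rule yields 171 and 57 patients, close to but not identical to the 172 and 56 reported in Tables 4 and 5 and Fig. 1, so the rule actually used by the authors evidently differed in detail (for example, assigning each patient independently with probability 0.75 would give group sizes that vary around these values).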

Three goodness-of-fit measures were employed to compare models: the mean absolute error (MAE), the root-mean-square error (RMSE), and the intra-class correlation coefficient (ICC) between the predicted and observed utilities. The ICC, which shows the agreement between the absolute magnitude of two variables, was calculated using a two-way mixed model [26]. Since there is no consensus on the choice of criteria, the regression technique that achieved the best result on the majority of these three criteria was selected as optimal. For the OLS estimates, an additional popular goodness-of-fit measure, R2, is also reported (although it was not used as an evaluation criterion since this statistic is not available for all econometric methods). The optimal method for estimating each mapping function was identified by evaluating the goodness-of-fit measures calculated on the 25% validation sample. The ICC was calculated in SPSS version 21.0 (IBM Corp, Armonk, NY, USA). Other analyses were conducted in Stata version 14.1 (StataCorp LP, College Station, Texas, USA).
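The two error-based criteria can be written out in a few lines. The sketch below uses made-up observed and predicted utilities and hypothetical function names; the ICC is omitted because its value depends on the exact two-way mixed specification, which is not reproduced here.

```python
import math


def _check(observed, predicted):
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")


def mae(observed, predicted):
    """Mean absolute error between observed and predicted utilities."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted utilities."""
    _check(observed, predicted)
    mean_sq = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mean_sq)


# Made-up utilities for three validation-sample patients.
observed = [0.5, 0.75, 1.0]
predicted = [0.75, 0.75, 0.5]
print(mae(observed, predicted))             # 0.25
print(round(rmse(observed, predicted), 4))  # 0.3227
```

Because the RMSE squares each error before averaging, it penalizes large individual errors more heavily than the MAE, which is one reason the two criteria can rank mapping functions differently.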

3 Results

3.1 Descriptive Statistics

Summary statistics for patients’ characteristics are reported in Table 1. The mean (SD) age was 66.2 (9.5) years old, with the majority of patients being male (66.2%). Health state utility scores assessed using the MAU instruments ranged from 0.61 (EQ-5D-3L) to 0.74 (15D). The mean (SD) PDQ-8 score was 27.51 (18.65). The distributions of the patients’ responses in the three MAU instruments are presented in the Appendix (see electronic supplementary material) and, as can be seen, health state utilities are not normally distributed. There were no statistically significant differences in patients’ characteristics between the estimation and validation samples.

Table 1 Patient characteristics

The bivariate correlations between the PDQ-8 domains and each MAU instrument are reported in Table 2. Generally, it can be seen that the lowest correlations (ranging from insignificant −0.154 to −0.208, p > 0.05) were found between the social support domain (of PDQ-8) and all the MAU instruments. The strongest correlations varied by MAU instrument: between 15D and the ‘communication’ domain (r = 0.578), and between EQ-5D-3L/EQ-5D-5L and the ‘activities of daily living’ domain (r = −0.671/−0.618). The correlations between the overall PDQ-8 score and each MAU instrument ranged from −0.729 (EQ-5D-5L) to −0.752 (EQ-5D-3L).

Table 2 Correlation coefficients of Parkinson’s Disease Questionnaire (PDQ-8) and multi-attribute utilities

3.2 Mapping Results

Goodness-of-fit statistics for each of the mapping algorithms are reported in Table 3 for both the estimation sample (Panel A) and the validation sample (Panel B). Better goodness-of-fit is indicated by lower MAE/RMSE and higher ICC. Focusing on the validation sample, it can be seen firstly that, in the majority of cases, mapping algorithms performed better using the PDQ-8 item scores than using the overall score. Secondly, the optimal statistical method identified (bold in table) varied across mapping functions. Among the six mapping functions in the direct mapping framework, the robust MM-estimator was chosen four times, followed by OLS (twice). Among all mapping equations, better goodness-of-fit was found when mapping PDQ-8 onto 15D, judging by the ICC and supported by the scatter plots between observed and predicted utilities (Fig. 1). In addition, goodness-of-fit results in the validation sample are not necessarily worse than those in the estimation sample. In fact, for 15D, all three indicators reported in the validation sample are better than their counterparts in the estimation sample.

Table 3 Goodness-of-fit results for transformation from PDQ-8 scores to multi-attribute utilities
Fig. 1 Scatter plots between observed and predicted multi-attribute utilities (direct mapping based on item model shown on left and indirect response mapping shown on right), N = 56. OLS ordinary least squares

Regarding the indirect response mapping, results show that its performance was comparable to or even better than that of the direct mapping approach based on MAE but was slightly worse based on RMSE/ICC. For example, the optimal MAE of mapping PDQ-8 item scores onto EQ-5D-3L was 0.1681 based on the MM-estimator, whilst the indirect response mapping had an MAE of 0.1606. Similarly, for the EQ-5D-5L scenario, the MAE of indirect response mapping was 0.1063, slightly worse than the best MAE value identified from direct mapping (0.1054, MM-estimator), but much better than those of the other two methods, both of which had MAE values larger than 0.11. Goodness-of-fit statistics of the estimation sample are largely comparable with those of the validation sample (details available in Table 3).

Mapping equations corresponding to the optimal methods identified above are reported in Table 4 (direct mapping) and Table 5 (indirect response mapping). In direct mapping, none of the personal characteristics were significant, so they were not included in the final models, whilst in indirect response mapping, age and gender were found to be significant in some equations. For all mapping equations, the social support and cognition domains of PDQ-8 were found to be consistently insignificant. All other domains were significant in at least one mapping function.

Table 4 Direct mapping equations from PDQ-8 scores to multi-attribute utilities, N = 172
Table 5 Indirect response mapping equations from PDQ-8 scores to each EQ-5D dimension (ordered logit estimates), N = 172

4 Discussion

The present study is the first that attempts to predict the utilities from 15D and EQ-5D-5L based on PDQ-8 dimensions and total scores of patients with PD. There are a few published studies that developed mapping algorithms onto EQ-5D-3L in PD [25, 27]. Among all generic MAU instruments, the EQ-5D-3L is the most used measure to predict utilities in this population and other diseases, since this is the MAU instrument preferred by NICE in the UK [28]. On the other hand, both PDQ-8 and PDQ-39 have been used interchangeably in the development of this technique to predict utilities [25, 27, 29, 30].

The mean (SD) PDQ-8 score [27.51 (18.65)] is within what is expected according to other studies with PD patients. For example, a study conducted in Singapore with 88 patients [31] reported a mean (SD) PDQ-8 score of 31.6 (16.8), whilst another multicenter study with 134 patients from the UK, Germany, and Italy [32] showed a mean (SD) score of 28.19 (17.82). The mean utilities from the MAU instruments differ (from 0.61 for EQ-5D-3L to 0.74 for 15D). Likewise, the SD of patient utilities varied by more than 100%, from 0.16 for 15D to 0.33 for the EQ-5D-3L. Therefore, it seems necessary to compare the utilities derived from different MAU instruments with caution. All three MAU instruments show similar magnitudes of correlation with the PDQ-8 total score (absolute values ranging from 0.73 to 0.75).

Regarding the mapping performance, the goodness-of-fit results reported in this study are within the ranges of previously published mapping studies [33]. It is difficult to compare the performance of this study with previous mapping literature from PDQ-8 owing to the different country-specific tariffs being used. Cheung et al. [25] developed a mapping algorithm onto EQ-5D-3L Japanese value sets based on patients from Singapore. The highest R2 attained was 0.547 with three significant PDQ items identified (PDQ1, PDQ2, PDQ7), and PDQ3 included as a dummy variable, plus a constant. Dams et al. [27] considered both the Unified Parkinson’s Disease Rating Scale and PDQ-8 as predictors for the EQ-5D-3L (scored using a German value set developed using TTO and a European value set developed using the visual analog scale [VAS] method). For the model in which only PDQ-8 items were considered and scored using German specific tariffs, the significant predictors include the squared terms of three items (PDQ1, PDQ2, and PDQ8) and a constant, with an overall R2 of 0.603 (slightly lower than the R2 reported in this study of 0.619). The mapping functions onto a European VAS value set attained a slightly higher R2 (0.666) with the same items (PDQ1, and the squared terms of PDQ2 and PDQ8).

There are several strengths of this study [34]. Firstly, this study developed mapping algorithms onto three MAU instruments, offering future users a choice of the mapping algorithm they regard as most suitable for their studies. Secondly, the optimal mapping algorithms were selected from among different statistical methods. Thirdly, in addition to the more widely used direct mapping, this study has also conducted indirect response mapping to predict the responses to each of the EQ-5D dimensions. One of the key advantages of this approach is that by applying different country-specific value sets of the EQ-5D, the reported mapping functions can be more widely applied by users from other countries. However, it should be noted that the generalizability of the indirect response mapping functions depends on whether PD patients from other countries have a similar response pattern to patients in Spain. Further external validation is warranted.

Some limitations should also be considered to accurately interpret the results. Firstly, with respect to the sample, although the study included 280 patients, all analyses were conducted with only the 228 patients who completed all the questionnaires. Also, the patients with the highest degree of severity possible (Stage V) were excluded because of lack of autonomy to complete the questionnaire set by themselves. Furthermore, the convenience sample does not guarantee representativeness for all patients with Parkinson’s disease. To ratify the results reported here and to determine other psychometric properties, such as longitudinal response, more research with a larger sample size would be desirable. Secondly, the use of an external dataset for validation is strongly recommended. Thirdly, although EQ-5D-5L and EQ-5D-3L are very similar with regard to the health state classification system (i.e., they use five identical dimensions), the distributions are very different. The selected predictors (items) from the PDQ-8 in each optimal mapping algorithm differ slightly in Table 4. It is not the aim of this study to investigate further why such differences exist between EQ-5D-3L and EQ-5D-5L value sets; however, it is an interesting topic that deserves further investigation.

Despite the growing popularity of mapping algorithms, there are doubts about whether mapping should be the first-choice method. McCabe et al. [35] support the idea of expressing the value of healthcare interventions using a consistent tool, but according to them, these methods are not as reliable as expected when the aim is to measure preferences about health. Likewise, in the same year, another study [36] concluded that mapping should mostly be viewed as a second-best solution. In response to these observations, other probabilistic methods have recently been published to predict the utilities of EQ-5D from PDQ-8 and PDQ-39, using multidimensional Bayesian network classifiers [37, 38].

5 Conclusions

The algorithm for each MAU instrument allows the calculation of health state utility scores with good precision in a PD sample. These algorithms can be used for CUA to assess the cost effectiveness of interventions in patients with PD when only the disease-specific PDQ-8 instrument has been administered. In addition, the indirect mapping functions reported in this study for the two EQ-5D questionnaires will further facilitate the calculation of EQ-5D utility scores using other country-specific tariffs.