Abstract
Purpose
To develop mapping algorithms that transform Diabetes-39 (D-39) scores onto EQ-5D-5L utility values for each of eight recently published country-specific EQ-5D-5L value sets, and to compare mapping functions across the EQ-5D-5L value sets.
Methods
Data include 924 individuals with self-reported diabetes from six countries. The D-39 dimensions, age and gender were used as potential predictors for EQ-5D-5L utilities, which were scored using value sets from eight countries (England, Netherland, Spain, Canada, Uruguay, China, Japan and Korea). Ordinary least squares, generalised linear model, beta binomial regression, fractional regression, MM estimation and censored least absolute deviation were used to estimate the mapping algorithms. The optimal algorithm for each country-specific value set was primarily selected based on normalised root mean square error (NRMSE), normalised mean absolute error (NMAE) and adjusted-r2. Cross-validation with fivefold approach was conducted to test the generalizability of each model.
Results
The fractional regression model with loglog as a link function consistently performed best in all country-specific value sets. For instance, the NRMSE (0.1282) and NMAE (0.0914) were the lowest, while adjusted-r2 was the highest (52.5%) when the English value set was considered. Among D-39 dimensions, the energy and mobility was the only one that was consistently significant for all models.
Conclusions
The D-39 can be mapped onto the EQ-5D-5L utilities with good predictive accuracy. The fractional regression model, which is appropriate for handling bounded outcomes, outperformed other candidate methods in all country-specific value sets. However, the regression coefficients differed reflecting preference heterogeneity across countries.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
Condition-specific health-related quality of life (HRQoL) instruments are commonly regarded as being superior to generic HRQoL instruments at identifying changes in health [1]. However, for the purpose of comparing the relative cost-effectiveness of competing healthcare programmes across different disease areas, generic preference-based outcome measures are required. The most widely applied such instrument is the EQ-5D [2] that produces health state utilities for use in the calculation of quality-adjusted life years (QALYs). Given that clinical trials often do not collect data on patients’ health state utilities, there is an increasing research interest in developing mapping algorithms that can estimate the relationship between a condition-specific measure of health, or ‘source’ instrument onto a generic preference-based HRQoL measure, or ‘target’ instrument [3, 4].
Diabetes is a common cause of premature death and disability, which imposes a large economic burden on the healthcare system [5]. In 2015, the International Diabetes Federation estimated about 415 million diabetic people (or 8.8% of adults aged 20–79) and 5 million deaths [6]. The rising prevalence of diabetes will lead to increased demand for healthcare services. Consequently, there is a growing need for evaluating the cost-effectiveness of many new diabetes interventions as compared to competing use of resource in other disease groups [4, 7].
The Diabetes-39 (D-39) is the most widely used non-preference-based condition-specific instrument addressing the HRQoL of diabetic patients [8], while the EQ-5D is the most widely used generic preference-based instrument [9,10,11]. Over 17,000 studies using the EQ-5D instrument have been registered with the EuroQol Group by 2015 [11]. Further, the United Kingdom (UK) National Institute for Health and Care Excellence (NICE) [12] strictly requested that the EQ-5D should be used in all economic evaluations to ensure comparability between studies. Mapping studies aimed at generating algorithms for predicting EQ-5D utilities are also increasing, particularly after the NICE has endorsed mapping when the direct measure of EQ-5D utility is unavailable [13].
So far, only one study has developed mapping functions from D-39 to generic preference-based instruments including the five-level EQ-5D (EQ-5D-5L) [14]. A key limitation of this previous study is that it was based on an interim value set, which was a ‘cross-walk’ between the earlier three-level EQ-5D (EQ-5D-3L) and the EQ-5D-5L descriptive system [15]. Most recently, several country-specific EQ-5D-5L value sets have been published, including four Western countries (England, the Netherlands, Spain, Canada), three Asian countries (China, Japan, Korea) and one South American (Uruguay) [16,17,18,19,20,21,22,23]. Hence, the existing mapping algorithm for D-39 is becoming obsolete following the publication of these new EQ-5D-5L value sets.
This paper makes three important contributions to the literature. First, it produces optimal mapping functions for EQ-5D-5L value sets recently developed by direct elicitation of the general public. Second, we test several regression models. While the previous mapping algorithm for D-39 applied only ordinary least squares (OLS) and generalised linear model (GLM) [14], the current paper investigates into the merit of four other regression models. Third, we show that when preference-based value sets differ across countries, the regression coefficients in the mapping functions will also differ. Thus, there is a need to derive country-specific mapping algorithms.
The present study aims to develop mapping algorithms that transform D-39 scores onto EQ-5D-5L utility values for each of eight recently published country-specific EQ-5D-5L value sets and to compare mapping functions across the EQ-5D-5L value sets. This study followed the recently developed checklist of minimum reporting requirement for mapping studies [24]: ‘MApping Preference-based Measures reporting Standards (MAPS)’.
Methods
Data
Data were obtained from the Multi Instrument Comparison (MIC) study, which is based on an online survey administered in six countries (Australia, Canada, Germany, Norway, UK and United States of America) by a global panel company, CINT Australia Pty Ltd. The key aims of this large-scale study that includes more than 8,000 subjects were (i) to compare health state utilities obtained from different generic preference-based instruments, and (ii) to compare such values with values obtained from disease-specific instruments. There was no missing value in the online survey. However, considering the lack of direct control in the online survey, several edit procedures (e.g. a comparison of duplicated questions, and removal of respondents whose recorded completion time shorter than 20 minutes) were conducted to ensure the quality of data. For further details on the MIC study, see Richardson et al. [25]. Among the seven chronic disease groups included in this comprehensive international study, the current paper is based on the diabetes group (N = 924). The MIC data are an ideal source for developing mapping algorithms from condition-specific HRQoL onto generic preference-based values. So far it has been used to derive mapping functions in different chronic diseases, including asthma [26], depression [27] and heart diseases [28].
Measures of variables
The EQ-5D includes five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The construction of the EQ-5D-5L retained the original dimensional structure of the EQ-5D-3L questionnaire and increased the descriptive systems to five levels of severity to address the problem of sensitivity and potential ceiling effects [29]. The five severity levels include no problems, slight problems, moderate problems, severe problems and unable to/extreme problems. Here, we applied the new EQ-5D-5L value sets from eight countries [16,17,18,19,20,21,22,23]: England, the Netherlands, Spain, Canada, Uruguay, China, Japan and Korea. For ease of exposition, the English value set would be emphasised, though results for other countries’ value sets were discussed. The scale length differs across countries with the worst health state or the ‘pits’ (55555) ranging from − 0.446 in the Netherlands to − 0.025 in Japan [30].
The D-39 includes 39-items, each with 7-response levels ranging from 1 (not affected at all) to 7 (extremely affected) [8]. The D-39 covers five domains: energy and mobility (15 items), diabetes control (12 items), anxiety and worry (4 items), social burden (5 items) and sexual functioning (3 items). Each item was reverse-coded, such that low values representing poor HRQoL. Domain scores were calculated by summing the item scores within each domain. Similarly, the total score of D-39 was obtained by summing the entire 39 item scores, where item scores are set equal to the rank order of the response. Finally, the total D-39 score and each subscale summary score were linearly transformed onto a 0 to 1 scale: 0 indicating the worst, and 1 the best possible health state.Footnote 1
It should be noted that the dataset on which the modelling is based does not include subjects from any Asian countries. Thus, to the extent that people from Asia would describe their health problems differently along the instruments considered here, the regression coefficients in the mapping functions for the three Asian countries should be interpreted with somewhat more caution than those for the Western countries.
Statistical analyses and estimation
Descriptive analyses
Respondents’ characteristics were described as mean (standard deviation, SD) unless otherwise indicated. The degree of conceptual overlap between the source (D-39) and target (EQ-5D-5L) instruments was examined using Spearman’s correlation coefficients and a principal component analysis (PCA). PCA is a multivariate statistical technique used to reduce the number of variables in a dataset into a smaller number of linearly uncorrelated ‘principal components’ that helps to investigate whether a number of items generate information about a more general underlying component [31]. The PCA was applied to explore and compare the underlying dimensional structure of the D-39 data and EQ-5D-5L information evident in these data. The number of principal components was decided according to eigenvalues, which indicate the amount of variance in the original variables explained by each principal component [32]. Only principal components with eigenvalues greater than 1 were considered in the exploratory PCA [33]. Rotation of the dimensions identified in the initial extraction is needed in order to obtain simple and interpretable components [34]. Rotation methods, which allow correlations between components, are referred to as oblique. Thus, to account for potential correlations among components, rotation was performed using an oblique Promax method [35].
Econometric techniques for mapping analysis
A direct mapping approach was adopted. The D-39 five domains, as well as age and gender, were the potential variables to predict EQ-5D-5L utilities. The final predictors were determined via stepwise forward selection that included only significant variables (p < 0.05). Predictors were also required to be logically consistent, meaning that poorer scores on a source instrument should lead to lower utility on the target instrument. Non-linearities were investigated, and they were included only if the main effect variable significantly contributed to the model.
Six econometric models were adopted: OLS, GLM, MM estimation, censored least absolute deviation (CLAD), fractional regression model (FRM) and beta binomial (BB) regression. The OLS, which depends on a normally distributed error term, is the most commonly used regression model in mapping studies [36]. The GLM is a flexible generalisation of OLS that allows our target variable to have a non-normal error distribution [14, 36]. The log link function with Gaussian family predicting EQ-5D-5L utility fitted the data well, and hence applied in the estimation of GLM.
The MM estimationFootnote 2 is one of the robust regression estimation methods that is used when the distribution of residual is not normal or there are some outliers that affect the model [38]. It aims to obtain estimates that have the high breakdown value, which is a common measure of the proportion of outliers that can be addressed before these observations affect the model [39], and is more efficient. Similarly, the CLAD model is a robust estimator which further takes into account the censoring issue of the outcome variable [40]. It provides consistent and robust estimates under rather general conditions, particularly for misspecification of errors related to heteroscedasticity and non-normality [41].
The FRM was developed to model empirical bounded dependent variables defined on [0, 1] interval that exhibit piling-up at one of the two corners [42]. Unlike other parametric methods, the important advantage of this semi-parametric FRM is that it does not make any distributional assumption about an underlying structure used to obtain the outcome variable [42, 43]. Given a vector of independent variables (X) and a dependent variable (Y), the basic assumption underlying the FRM can be summarised as
where G(·) is a known non-linear function satisfying 0 ≤ G(·) ≤ 1 and β is a vector of parameters to be estimated. The generalised goodness-of-functional form test proposed by Ramalho et al. [44] suggested that ‘loglog’ is the best alternative functional form for G(.) and used as a link function. It is defined as
Another issue in FRM is to decide whether a one- or two-part model should be used, where the discrete component (observations at one or both boundaries) is modelled as a binary or multinomial model and the rest as FRM in a two-part model [45]. In EQ-5D, since boundary values are extreme scores best described by the same mechanism as all other values in the interval, there is no strong theoretical reason to use the two-part model. Further, a P-test statistic proposed by Davidson and MacKinnon [46] and applied to this particular context by Ramalho et al. [43] suggested no sufficient evidence to reject the null hypothesis that the one-part model is most appropriate (p = 0.128). The FRM (with loglog link function) was implemented using ‘frm’, a user-defined Stata command [43]. The observed EQ-5D-5L utilities were firstly transformed onto a 0–1 scale before entering into the regression as the dependent variable through the following equation: (X − Xmin)/(Xmax − Xmin), in which X stands for the EQ-5D-5L utility and min/max refers to the minimum/maximum value of observed EQ-5D-5L utilities in each country-specific value set. Eventually, predictions of EQ-5D-5L utilities were back-transformed to the original scale.
The BB regression is a fully parametric approach for modelling bounded data. For instance, due to its known flexibility that allows a great variety of asymmetric forms, the most popular choice to estimate Eq. (1) is the parametric beta distribution [47, 48]. The re-parameterisation of the beta regression has been detailed in Ferrari and Cribari-Neto [47]. The logit functional form fitted the data well and hence applied as a link function in the BB regression model for predicting the EQ-5D-5L utilities. As this parametric model is not defined at the boundary values, the outcome values should be restricted to the 0–1 range, excluding 0 and 1. This can be achieved by linear transformation, [Y(N − 1) + 0.5]/N, following earlier literature [49, 50], where N is sample size and Y is linearly transformed EQ-5D-5L utility on 0–1 scale.
Goodness-of-fit measures
Following external guidance [3, 36], mean absolute error (MAE) and root mean square error (RMSE), both of which were adjusted for the degrees of freedom owing to the potential varied number of predictors in each model, were firstly calculated. Both MAE and RMSE may easily be influenced by the scale of the dataset because wider scale size leads to a larger error [51]. Thus, we normalised both MAE and RMSE to the range of the measured data. Such normalised RMSE (NRMSE) and normalised MAE (NMAE) are non-dimensional and facilitate the comparison between datasets or models with different scales. Model performance was also examined by the square of the correlation coefficient between the observed and predicted values adjusted for the number of predictors in the model (adjusted-r2). Furthermore, the degree of absolute agreement between the predicted and the observed EQ-5D-5L was assessed using Lin’s concordance correlation coefficient (CCC) [52]. The CCC is popular for assessing agreement between two measures, which includes both precision (degree of scatter) and accuracy (systematic bias) components [52, 53]. A value of CCC close to unity implies a good concordance between predicted and observed measures.
In the absence of an external data, cross-validation was usually conducted using existing data to evaluate the performance of the mapping algorithms for generalisability. This study applied a k fold approach, where the total sample was randomly divided into k (k = 5) equally sized groups. Each time, four groups were combined as an estimation sample (i.e. 80% of the full sample) to derive mapping algorithms which were then applied to the remaining validation group (i.e. 20% of the full sample). This process was repeated five times, and hence all groups in the dataset were used for both estimation and validation sample. Finally, the normalised average values of RMSE and MAE, as well as mean r2 from the five iterations, were reported for easy comparison of the models’ predictive performance.
The best-fitting model with good predictive accuracy was selected primarily based on three metrics derived from both the full sample and validation sample: NRMSE, NMAE and adjusted-r2. As there is no single gold standard criterion, these model performance indicators were equally important. Thus, the model that performs best in most of these criteria should be selected. When a disagreement is observed between them, the CCC from the full sample would be applied to determine the final preferred mapping algorithm. Eventually, this preferred model was estimated using the full sample (N = 924). All statistical analyses were conducted using Stata® version 14.2 (StataCorp LP, College Station, Texas, USA) except the PCA, which was carried out in SPSS version 24 (IBM Corp., Armonk, NY, USA).
Results
Descriptive and conceptual overlap
Table 1 reports sample characteristics and a summary of source and target instruments. The mean (SD) value was 0.786 (0.215) for EQ-5D-5L (English value set) and 0.659 (0.222) for D-39. The average scores of the D-39 domains range from 0.535 for the ‘anxiety and worry’ domain to 0.781 for ‘social burden’. The highest correlation (Table 8 in Appendix) was found between the ‘energy and mobility’ domain of D-39 and EQ-5D-5L utility indices: it ranges from 0.692 with Uruguayan to 0.721 with the Japanese value set. For EQ-5D-5L dimensions, the strongest correlation (0.646) was observed between the ‘anxiety and worry’ domain of D-39 and the ‘anxiety/depression’ dimension of EQ-5D-5L. For the remaining four EQ-5D-5L dimensions, higher coefficients were consistently observed with the ‘energy and mobility’ domain of D-39, where the highest coefficient (0.612) was observed with EQ-5D-5L ‘mobility’ dimension and the lowest (0.449) with ‘self-care’.
Results from the PCA (Table 2) revealed six principal components when the number of components was limited to those with an eigenvalue greater than 1 and explained about 70% of the total variance. The first five principal components were the same as the D-39 domains (‘diabetes control’, ‘energy and mobility’, ‘anxiety and worry’, ‘sexual functioning’ and ‘social burden’), and the sixth component was related to ‘self-care’. Two EQ-5D-5L dimensions (‘mobility’ and ‘pain/discomfort’) were mainly loaded onto the same component as the D-39 items that describe activities related to ‘energy and mobility’. Another two EQ-5D-5L dimensions (‘self-care’ and ‘usual activities’) were mainly loaded to the last component, in which only one D-39 item (i.e. ‘having trouble caring yourself’) was mainly loaded onto. The EQ-5D-5L ‘anxiety/depression’ dimension was loaded to the ‘anxiety and worry’ component, together with relevant D-39 items. None of the EQ-5D-5L dimensions were mainly loaded on the ‘diabetes control’, ‘social burden’ or ‘sexual functioning’ components, supporting the poor correlations identified above. In sum, the EQ-5D-5L dimensions mainly loaded on three principal components (‘energy and mobility’, ‘anxiety and worry’ and ‘self-care’), which explains 17.14% of the total variance.
Model performance
Tables 3 and 4 summarise the performance of models (assessed by three goodness-of-fit indicators: NRMSE, NMAE and adjusted-r2), while Table 5 reports the fourth indicator (the CCC) for all countries. When the English value set was considered, the fractional regression model with loglog as a link function was the best performing model in terms of all four indicators: NRMSE (0.1282), NMAE (0.0914), adjusted-r2 (52.5%) and CCC (0.691). The cross-validation approach also revealed similar results that the FRM consistently performed best. The FRM model continued to perform well for the remaining country-specific value sets. However, both OLS and FRM equally performed well for the Dutch value set. Both models had the lowest NRMSE (0.1282) and the highest CCC (0.667) in the full sample. While OLS model provided the highest adjusted r2 in the full sample and cross-validation, the FRM model consistently produced lower NMAE than OLS model in both full sample and cross-validation; it also produced the lowest NMAE in the full sample among all estimators. Overall, we recommend FRM for the Dutch value set.
Regarding the CCC results, the FRM revealed the highest concordance, with only the exception for the Japanese value set in which the MM estimator produced the highest CCC (followed by CLAD and FRM), confirming the stability and consistency of our model. Except for the Japanese value set (CCC = 0.716), the MM estimator provided generally poor concordance. The lowest concordance (CCC = 0.361) was observed for the Canadian value set.
Although mapping algorithms produce adequate mean prediction for EQ-5D-5L, this may not always be true for individual predictions. The predicted English EQ-5D-5L had a mean (SD) value of 0.786 (0.157) for the best model, which is quite similar to the observed mean. The respective 25th, 50th and 75th percentiles of the predicted values were 0.703, 0.840 and 0.911 compared with 0.712, 0.858 and 0.937 for the observed values. These values were comparable. However, our best-fitting model over-predicted at severe health states (Fig. 1). For instance, the 5th and 10th percentiles of the predicted English EQ-5D-5L value set were 0.445 and 0.550 against 0.317 and 0.462 for the observed value set, respectively (Table 9 in Appendix).
Regression results
Table 6 shows the regression results for the various models when the English value set was applied. The coefficient for ‘energy and mobility’ (EM) was positive and statistically significant (p < 0.01) in all models. The ‘anxiety and worry’ (AW) domain was also a significant positive predictor of EQ-5D-5L in BB and MM models. Further, the squared-term for ‘energy and mobility’ (EM2) was a significant predictor of EQ-5D-5L in OLS, GLM and CLAD models. The remaining D-39 domains were either insignificant or yield logically inconsistent signs, and hence ignored in all models. Two socio-demographic variables (age and gender) were considered. However, only age revealed statistical significance in all models except for CLAD.
Table 7 reports the best performing mapping algorithms for eight country-specific EQ-5D-5L value sets based on the model performance criteria discussed above. For each country, the optimal mapping function was consistently estimated using the FRM.
Discussion
This study developed algorithms that allow the mapping of the D-39 instrument onto eight country-specific EQ-5D-5L value sets. The EQ-5D-5L utility index was modelled as a function of the D-39 domains using six regression models. The result generally showed the FRM to be the preferred model in predicting EQ-5D-5L utility values in all country-specific value sets.
The present study also assessed the empirical performance of different statistical approaches to predict EQ-5D-5L utility values. For diabetes patients, the EQ-5D-5L was found to have a moderate degree of ceiling effect (17.4% respondents reported to be in full health according to the EQ-5D-5L descriptive system), and highly skewed to the left. Alternative regression models robust to such properties were investigated, where the FRM with loglog link performed best in comparison with other models under consideration for each country-specific value set. Yet, the regression coefficients were considerably different across countries (Table 7).
Typically, many mapping studies apply standard linear regression to predict EQ-5D utilities. However, EQ-5D has two fundamental properties. It is characterised by (i) bounded nature of utility data (between the worst and the best health states), and (ii) ceiling effects (i.e. piling-up of utilities at 1). Under such circumstances, the effect of predictor variables cannot be constant throughout its entire range [45]. The novelty of the FRM model is that it is non-linear and more appropriate for naturally bounded data that exhibit piling-up at one of the two corners as the case in EQ-5D [43, 45]. Furthermore, FRM is a semi-parametric approach estimated by quasi-maximum likelihood, which gives consistent estimators regardless of the true distribution of the outcome variable conditional on the predictors. This model also constrains predictions within the range determined by EQ-5D utility.
The current study differs in several important aspects from the previous study [14]. The study by Chen et al. [14] only considered two regression models (OLS and GLM). The present study, however, considered different analytical approaches, addressing the characteristics of the data such as censoring, problems of normality and heterogeneity of variance by comparing six distinct econometric models. Furthermore, while the study by Chen and colleagues used the interim ‘cross-walk’ vale sets, the mapping algorithm provided in this study employed the directly elicited EQ-5D-5L value sets. Thus, the preferred models and their performance in terms of both MAE and RMSE as well as adjusted-r2 were quite different as expected. An MAE, RMSE and adjusted-r2 values for our preferred model were 0.0914, 0.1282 and 52.5%, respectively, as compared to 0.131, 0.177 and 47.5% in the study by Chen et al.Footnote 3 This discrepancy may partly be attributable to variation in the target instrument used and partly due to the mapping functions employed. The two studies also differ in terms of the use of additional covariates in predicting EQ-5D-5L utility values. While the study by Chen et al. [14] included country dummies and gender, the present study has considered respondents’ age and gender alone. Country variable was initially considered as a potential predictor in this study. However, it does not improve the prediction of EQ-5D-5L utility and hence dropped from the analysis. In both studies, gender was not a significant predictor of EQ-5D-5L utility values.
Mapping models usually overestimate for patients with severe health states and underestimate for patients with good health states. This general problem of mapping studies is due to regression to the mean [36]. Furthermore, this might also be explained by the preference pattern of EQ-5D-5L, where substantial decrements in preference weights occur at severe health states [30]. The conceptual difference between the source and target instruments may be another explanation. From the PCA result, ‘self-care’ and ‘usual activities’ dimensions of EQ-5D-5L were mainly loaded onto the ‘self-care’ component, while only one D-39 item (i.e. ‘having trouble caring for yourself’) mainly loaded onto the same component. Many of the D-39 items related to the domains ‘diabetes control’, ‘sexual functioning’ and ‘social burden’, formed three distinct components unrelated to the EQ-5D-5L dimensions. These components revealed logically inconsistent signs in the estimated coefficients (results not reported here), which may be attributable to the hidden unknown interrelationships among the domains of the D-39, evidenced by relatively high correlation coefficients between the ‘energy and mobility’ domain with the rest of the D-39 domains.
In addition to providing insight into the dimensional structure of both D-39 and EQ-5D-5L instruments, the PCA also provided a basis for the choice between a direct or an indirect (response) mapping approach [32]. As demonstrated in the PCA (Table 2), nearly 21 items from D-39, which were mainly loaded onto the ‘diabetes control’, ‘sexual functioning’ and ‘social burden’ components, were not related to EQ-5D-5L dimensions. Thus, the probability of accurately predicting five EQ-5D-5L response levels for each dimension was low. In general, exact prediction of a health state requires correct predictions for each dimension of EQ-5D in response mapping approach, which is rarely achieved. Consequently, response mapping can be severely penalised when an incorrect prediction is made [54].
This study has several strengths. First, we have investigated several mapping algorithms that could be used to predict EQ-5D-5L utility when only the D-39 instrument has been applied in a clinical study. The FRM and BB models are appropriate for bounded outcome variable as the case in this study, which has the ability to predict within the given range. Despite its strong assumptions, the OLS regression model has performed pretty well as compared to other robust candidate estimators, such as GLM, BB regression, CLAD and MM estimation. Secondly, we have adjusted for scale differences to facilitate comparison between instruments or models with different scales, which is usually undermined in many mapping studies. Thirdly, this is the first study to assess the predictive accuracy of different EQ-5D-5L value sets using the D-39 instrument. Boland et al. [55] examined the clinical chronic obstructive pulmonary disease with three EQ-5D-3L value sets (for Netherland, UK and the US), though they did not make direct comparison as in the present study. Lastly, our algorithm has wider generalisability because of the multinational nature of the patient population used. Yet, as generalisability is a major issue for mapping studies, it should be tested how these models perform in different diabetic patient populations. With regard to study limitations, the dataset on which the modelling is based does not include Asian subjects. If Asian diabetes patients would describe their problems differently on the source (D-39)—and the target (EQ-5D-5L) instruments, the regression coefficients in the mapping functions for China, Japan and Korea should be interpreted with some caution. Secondly, although the estimation data in this study cover a wide range of EQ-5D-5L utilities, they do not cover the whole theoretical range; consequently the prediction error towards the lower end of utility distribution tends to be large. Furthermore, self-selection bias might have occurred, as respondents were volunteered to participate in the online survey.
In conclusion, results from this study reveal that the estimated EQ-5D-5L utility values from the preferred mapping algorithm can adequately predict a mean utility value in other sample in the absence of preference-based generic instruments. Thus, such mapping algorithms facilitate health economic evaluation that can guide decision-makers for appropriate interventions in diabetic patients.
Notes
For example, the transformed score for energy and mobility (EM) domain is calculated by subtracting the sum of reverse-coded responses across all 15 items (say, X) from its minimum score (15) and divided by the range (maximum (105) minus minimum scores). That is, the algorithm for computing the EM score is: (X-15)/(105-15).
MM estimation estimates the regression parameter using S estimation, which minimises the scale of the residual from M estimation and then proceed with M estimation. The S in ‘S estimation’ stands for ‘scale’ of the residual, M in ‘M estimation’ for ‘maximum likelihood type’ and MM in ‘MM estimation’ stands for ‘minimising M estimation’. For detail, see Yohai [37].
RMSE and MAE were adjusted for scale differences, and normalised RMSE and normalised MAE were reported in the present study.
References
Brazier, J., & Dixon, S. (1995). The use of condition specific outcome measures in economic appraisal. Health Economics, 4(4), 255–264. https://doi.org/10.1002/hec.4730040402.
Dolan, P. (1997). Modeling valuations for EuroQol health states. Medical Care, https://doi.org/10.1097/00005650-199711000-00002.
Longworth, L., & Rowen, D. (2013). Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value in Health, 16(1), 202–210. https://doi.org/10.1016/j.jval.2012.10.010.
Brazier, J., Ratcliffe, J., Saloman, J., & Tsuchiya, A. (2017). Measuring and valuing health benefits for economic evaluation. Oxford: Oxford University Press.
WHO. (2016). Global report on diabetes. France: World Health Organization.
IDF. (2015). Diabetes Atlas (7th edn.). Brussels: International Diabetes Federation (IDF).
Drummond, M. F., Sculpher, M. J., Torrance, G. W., O’Brien, B. J., & Stoddart, G. L. (2015). Methods for the economic evaluation of health care programme (4th edn.). Oxford: Oxford University Press: Oxford.
Boyer, J. G., & Earp, J. A. (1997). The development of an instrument for assessing the quality of life of people with diabetes: Diabetes-39. Medical Care, 35(5), 440–453.
Richardson, J., McKie, J., & Bariola, E. (2014). Multi attribute utility instruments and their use. In A. J. Culyer (Ed.), Encyclopedia of health economics (pp. 341–357). San Diego: Elsevier Science.
Wisløff, T., Hagen, G., Hamidi, V., Movik, E., Klemp, M., & Olsen, J. A. (2014). Estimating QALY gains in applied studies: A review of cost-utility analyses published in 2010. Pharmacoeconomics, 32(4), 367–375. https://doi.org/10.1007/s40273-014-0136-z.
Devlin, N. J., & Brooks, R. (2017). EQ-5D and the EuroQol group: Past, present and future. Applied Health Economics and Health Policy, 15(2), 127–137. https://doi.org/10.1007/s40258-017-0310-5.
NICE (National Institute for Health and Care Excellence). (2013). Guide to the methods of technology appraisal. London: National Health Service. Retrieved September 18, 2017, from http://www.nice.org.uk.
Dakin, H. (2013). Review of studies mapping from quality of life or clinical measures to EQ-5D: an online database. Health Qual Life Outcomes. https://doi.org/10.1186/1477-7525-11-151.
Chen, G., Iezzi, A., McKie, J., Khan, M. A., & Richardson, J. (2015). Diabetes and quality of life: Comparing results from utility instruments and Diabetes-39. Diabetes Research and Clinical Practice, 109(2), 326–333. https://doi.org/10.1016/j.diabres.2015.05.011.
van Hout, B., Janssen, M. F., Feng, Y.-S., Kohlmann, T., Busschbach, J., Golicki, D., Lloyd, A., Scalone, L., Kind, P., & Pickard, A. S. (2012). Interim scoring for the EQ-5D-5L: Mapping the EQ-5D-5L to EQ-5D-3L Value Sets. Value in Health, 15(5), 708–715. https://doi.org/10.1016/j.jval.2012.02.008.
Augustovski, F., Rey-Ares, L., Irazola, V., Garay, O. U., Gianneo, O., Fernandez, G., Morales, M., Gibbons, L., & Ramos-Goni, J. M. (2015). An EQ-5D-5L value set based on Uruguayan population preferences. Quality of Life Research. https://doi.org/10.1007/s11136-015-1086-4.
Kim, S.-H., Ahn, J., Ock, M., Shin, S., Park, J., Luo, N., & Jo, M.-W. (2016). The EQ-5D-5L valuation study in Korea. Quality of Life Research, 25(7), 1845–1852. https://doi.org/10.1007/s11136-015-1205-2.
Luo, N., Liu, G., Li, M., Guan, H., Jin, X., & Rand-Hendriksen, K. (2017). Estimating an EQ-5D-5L Value Set for China. Value in Health, 20(4), 662–669. https://doi.org/10.1016/j.jval.2016.11.016.
Ramos-Goni, J. M., Pinto-Prades, J. L., Oppe, M., Cabases, J. M., Serrano-Aguilar, P., & Rivero-Arias, O. (2014). Valuation and modeling of EQ-5D-5L health states using a hybrid approach. Medical Care. https://doi.org/10.1097/mlr.0000000000000283.
Versteegh, M. M., Vermeulen, K. M., Evers, S. M. A. A., de Wit, G. A., Prenger, R., & Stolk, E. A. (2016). Dutch Tariff for the five-level version of EQ-5D. Value in Health. https://doi.org/10.1016/j.jval.2016.01.003.
Xie, F., Pullenayegum, E., Gaebel, K., Bansback, N., Bryan, S., Ohinmaa, A., Poissant, L., & Johnson, J. A. (2015). A time trade-off-derived value set of the EQ-5D-5L for Canada. Medical Care. https://doi.org/10.1097/mlr.0000000000000447.
Devlin, N. J., Shah, K. K., Feng, Y., Mulhern, B., & van Hout, B. (2017). Valuing health-related quality of life: An EQ-5D-5L value set for England. Health Economics. https://doi.org/10.1002/hec.3564.
Shiroiwa, T., Ikeda, S., Noto, S., Igarashi, A., Fukuda, T., Saito, S., & Shimozuma, K. (2016). Comparison of value set based on DCE and/or TTO data: Scoring for EQ-5D-5L health states in Japan. Value in Health, 19(5), 648–654. https://doi.org/10.1016/j.jval.2016.03.1834.
Petrou, S., Rivero-Arias, O., Dakin, H., Longworth, L., Oppe, M., Froud, R., & Gray, A. (2015). Preferred reporting items for studies mapping onto preference-based outcome measures: The MAPS statement. Health and Quality of Life Outcomes, 13(1), 106. https://doi.org/10.1186/s12955-015-0305-6.
Richardson, J., Iezzi, A., & Maxwell, A. (2012). Cross-national comparison of twelve quality of life instruments: Mic Paper 1 background, questions, instruments. Research Paper 76. Retrieved November 23, 2017, from https://www.aqol.com.au/papers/researchpaper76.pdf.
Kaambwa, B., Chen, G., Ratcliffe, J., Iezzi, A., Maxwell, A., & Richardson, J. (2017). Mapping between the Sydney Asthma Quality of Life Questionnaire (AQLQ-S) and five multi-attribute utility instruments (MAUIs). Pharmacoeconomics, 35(1), 111–124. https://doi.org/10.1007/s40273-016-0446-4.
Mihalopoulos, C., Chen, G., Iezzi, A., Khan, M. A., & Richardson, J. (2014). Assessing outcomes for cost-utility analysis in depression: Comparison of five multi-attribute utility instruments with two depression-specific outcome measures. British Journal of Psychiatry, 205(5), 390–397. https://doi.org/10.1192/bjp.bp.113.136036.
Chen, G., McKie, J., Khan, M. A., & Richardson, J. R. (2014). Deriving health utilities from the MacNew Heart Disease Quality of Life Questionnaire. European Journal of Cardiovascular Nursing, 14(5), 405–415. https://doi.org/10.1177/1474515114536096.
Herdman, M., Gudex, C., Lloyd, A., Janssen, M., Kind, P., Parkin, D., Bonsel, G., & Badia, X. (2011). Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Quality of Life Research, 20(10), 1727–1736. https://doi.org/10.1007/s11136-011-9903-x.
Olsen, J. A., Lamu, A. N., & Cairns, J. (2018). In search of a common currency: A comparison of seven EQ-5D-5L value sets. Health Economics, 27(1), 39–49. https://doi.org/10.1002/hec.3606.
Vyas, S., & Kumaranayake, L. (2006). Constructing socio-economic status indices: how to use principal components analysis. Health Policy and Planning, 21(6), 459–468. https://doi.org/10.1093/heapol/czl029.
Oppe, M., Devlin, N., & Black, N. (2011). Comparison of the underlying constructs of the EQ-5D and Oxford Hip Score: Implications for mapping. Value in Health, 14(6), 884–891. https://doi.org/10.1016/j.jval.2011.03.003.
Jobson, J. (2012). Applied multivariate data analysis: Volume II: Categorical and multivariate methods. New York: Springer.
Yaremko, R. M., Harari, H., Harrison, R. C., & Lynn, E. (2013). Handbook of research and quantitative methods in psychology: For students and professionals. Abingdon: Taylor & Francis.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299. https://doi.org/10.1037//1082-989x.4.3.272. DOI.
Brazier, J. E., Yang, Y., Tsuchiya, A., & Rowen, D. L. (2010). A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. The European Journal of Health Economics. https://doi.org/10.1007/s10198-009-0168-z.
Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics, 15(2), 642–656.
Susanti, Y., Pratiwi, H. (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349–360. https://doi.org/10.12732/ijpam.v91i3.7.
Ayinde, K., Lukman, A. F., & Arowolo, O. (2015). Robust regression diagnostics of influential observations in Linear Regression Model. Open Journal of Statistics. https://doi.org/10.4236/ojs.2015.54029.
Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25(3), 303–325. https://doi.org/10.1016/0304-4076(84)90004-6.
Hao, L., & Naiman, D. Q. (2007). Quantile regression. Thousand Oaks: SAGE Publications.
Papke, L. E., & Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11(6), 619–632.
Ramalho, E. A., Ramalho, J. J. S., & Murteira, J. M. R. (2011). Alternative estimating and testing empirical strategies for Fractional Regression Models. Journal of Economic Surveys, 25(1), 19–68. https://doi.org/10.1111/j.1467-6419.2009.00602.x.
Ramalho, E. A., Ramalho, J. J. S., & Murteira, J. M. R. (2014). A Generalized goodness-of-functional form test for Binary and Fractional Regression Models. The Manchester School, 82(4), 488–507. https://doi.org/10.1111/manc.12032.
Ramalho, J. J. S., & da Silva, J. V. (2009). A two-part fractional regression model for the financial leverage decisions of micro, small, medium and large firms. Quantitative Finance, 9(5), 621–636. https://doi.org/10.1080/14697680802448777.
Davidson, R., & MacKinnon, J. G. (1981). Several tests for model specification in the presence of alternative hypotheses. Econometrica, 49(3), 781–793. https://doi.org/10.2307/1911522.
Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression or modeling rates and proportions. Journal of Applied Statistics, 31(7), 799–815. https://doi.org/10.1080/0266476042000214501.
Paolino, P. (2001). Maximum likelihood estimation of models with beta distributed dependent variables. Political Analysis. https://doi.org/10.1093/oxfordjournals.pan.a004873.
Hunger, M., Baumert, J., & Holle, R. Analysis of SF-6D Index Data: Is beta regression appropriate? Value in Health, 14(5), 759–767. https://doi.org/10.1016/j.jval.2010.12.009.
Smithson, M., & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta distributed dependent variables. Psychological Methods. https://doi.org/10.1037/1082-989x.11.1.54.
Versteegh, M. M., Leunis, A., Luime, J. J., Boggild, M., Uyl-de Groot, C. A., & Stolk, E. A. (2012). Mapping QLQ-C30, HAQ, and MSIS-29 on EQ-5D. Medical Decision Making. https://doi.org/10.1177/0272989x11427761.
Barnhart, H. X., Haber, M., & Song, J. (2002). Overall concordance correlation coefficient for evaluating agreement among multiple observers. Biometrics, 58(4), 1020–1027.
Andrews, G., & Slade, T. (2001). Interpreting scores on the Kessler Psychological Distress Scale (K10). Australian and New Zealand Journal of Public Health, 25(6), 494–497.
Gray, A. M., Rivero-Arias, O., & Clarke, P. M. (2006). Estimating the association between SF-12 responses and EQ-5D utility values by response mapping. Medical Decision Making, 26(1), 18–29. https://doi.org/10.1177/0272989X05284108.
Boland, M. R. S., van Boven, J. F. M., Kocks, J. W. H., van der Molen, T., Goossens, L. M., Chavannes, N. H., & Rutten-van Mölken, M. P. M. H. (2015). Mapping the clinical chronic obstructive pulmonary disease questionnaire onto generic preference-based EQ-5D values. Value in Health, 18(2), 299–307. https://doi.org/10.1016/j.jval.2014.11.006.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
Ethical approval was granted by the Monash University Human Research Ethics Committee [Reference No. CF11/ 3192–2011001748]. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Rights and permissions
About this article
Cite this article
Lamu, A.N., Chen, G., Gamst-Klaussen, T. et al. Do country-specific preference weights matter in the choice of mapping algorithms? The case of mapping the Diabetes-39 onto eight country-specific EQ-5D-5L value sets. Qual Life Res 27, 1801–1814 (2018). https://doi.org/10.1007/s11136-018-1840-5
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11136-018-1840-5