Introduction

Condition-specific health-related quality of life (HRQoL) instruments are commonly regarded as being superior to generic HRQoL instruments at identifying changes in health [1]. However, for the purpose of comparing the relative cost-effectiveness of competing healthcare programmes across different disease areas, generic preference-based outcome measures are required. The most widely applied such instrument is the EQ-5D [2] that produces health state utilities for use in the calculation of quality-adjusted life years (QALYs). Given that clinical trials often do not collect data on patients’ health state utilities, there is an increasing research interest in developing mapping algorithms that can estimate the relationship between a condition-specific measure of health, or ‘source’ instrument onto a generic preference-based HRQoL measure, or ‘target’ instrument [3, 4].

Diabetes is a common cause of premature death and disability, which imposes a large economic burden on the healthcare system [5]. In 2015, the International Diabetes Federation estimated about 415 million diabetic people (or 8.8% of adults aged 20–79) and 5 million deaths [6]. The rising prevalence of diabetes will lead to increased demand for healthcare services. Consequently, there is a growing need for evaluating the cost-effectiveness of many new diabetes interventions as compared to competing use of resource in other disease groups [4, 7].

The Diabetes-39 (D-39) is the most widely used non-preference-based condition-specific instrument addressing the HRQoL of diabetic patients [8], while the EQ-5D is the most widely used generic preference-based instrument [9,10,11]. Over 17,000 studies using the EQ-5D instrument have been registered with the EuroQol Group by 2015 [11]. Further, the United Kingdom (UK) National Institute for Health and Care Excellence (NICE) [12] strictly requested that the EQ-5D should be used in all economic evaluations to ensure comparability between studies. Mapping studies aimed at generating algorithms for predicting EQ-5D utilities are also increasing, particularly after the NICE has endorsed mapping when the direct measure of EQ-5D utility is unavailable [13].

So far, only one study has developed mapping functions from D-39 to generic preference-based instruments including the five-level EQ-5D (EQ-5D-5L) [14]. A key limitation of this previous study is that it was based on an interim value set, which was a ‘cross-walk’ between the earlier three-level EQ-5D (EQ-5D-3L) and the EQ-5D-5L descriptive system [15]. Most recently, several country-specific EQ-5D-5L value sets have been published, including four Western countries (England, the Netherlands, Spain, Canada), three Asian countries (China, Japan, Korea) and one South American (Uruguay) [16,17,18,19,20,21,22,23]. Hence, the existing mapping algorithm for D-39 is becoming obsolete following the publication of these new EQ-5D-5L value sets.

This paper makes three important contributions to the literature. First, it produces optimal mapping functions for EQ-5D-5L value sets recently developed by direct elicitation of the general public. Second, we test several regression models. While the previous mapping algorithm for D-39 applied only ordinary least squares (OLS) and generalised linear model (GLM) [14], the current paper investigates into the merit of four other regression models. Third, we show that when preference-based value sets differ across countries, the regression coefficients in the mapping functions will also differ. Thus, there is a need to derive country-specific mapping algorithms.

The present study aims to develop mapping algorithms that transform D-39 scores onto EQ-5D-5L utility values for each of eight recently published country-specific EQ-5D-5L value sets and to compare mapping functions across the EQ-5D-5L value sets. This study followed the recently developed checklist of minimum reporting requirement for mapping studies [24]: ‘MApping Preference-based Measures reporting Standards (MAPS)’.

Methods

Data

Data were obtained from the Multi Instrument Comparison (MIC) study, which is based on an online survey administered in six countries (Australia, Canada, Germany, Norway, UK and United States of America) by a global panel company, CINT Australia Pty Ltd. The key aims of this large-scale study that includes more than 8,000 subjects were (i) to compare health state utilities obtained from different generic preference-based instruments, and (ii) to compare such values with values obtained from disease-specific instruments. There was no missing value in the online survey. However, considering the lack of direct control in the online survey, several edit procedures (e.g. a comparison of duplicated questions, and removal of respondents whose recorded completion time shorter than 20 minutes) were conducted to ensure the quality of data. For further details on the MIC study, see Richardson et al. [25]. Among the seven chronic disease groups included in this comprehensive international study, the current paper is based on the diabetes group (N = 924). The MIC data are an ideal source for developing mapping algorithms from condition-specific HRQoL onto generic preference-based values. So far it has been used to derive mapping functions in different chronic diseases, including asthma [26], depression [27] and heart diseases [28].

Measures of variables

The EQ-5D includes five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The construction of the EQ-5D-5L retained the original dimensional structure of the EQ-5D-3L questionnaire and increased the descriptive systems to five levels of severity to address the problem of sensitivity and potential ceiling effects [29]. The five severity levels include no problems, slight problems, moderate problems, severe problems and unable to/extreme problems. Here, we applied the new EQ-5D-5L value sets from eight countries [16,17,18,19,20,21,22,23]: England, the Netherlands, Spain, Canada, Uruguay, China, Japan and Korea. For ease of exposition, the English value set would be emphasised, though results for other countries’ value sets were discussed. The scale length differs across countries with the worst health state or the ‘pits’ (55555) ranging from − 0.446 in the Netherlands to − 0.025 in Japan [30].

The D-39 includes 39-items, each with 7-response levels ranging from 1 (not affected at all) to 7 (extremely affected) [8]. The D-39 covers five domains: energy and mobility (15 items), diabetes control (12 items), anxiety and worry (4 items), social burden (5 items) and sexual functioning (3 items). Each item was reverse-coded, such that low values representing poor HRQoL. Domain scores were calculated by summing the item scores within each domain. Similarly, the total score of D-39 was obtained by summing the entire 39 item scores, where item scores are set equal to the rank order of the response. Finally, the total D-39 score and each subscale summary score were linearly transformed onto a 0 to 1 scale: 0 indicating the worst, and 1 the best possible health state.Footnote 1

It should be noted that the dataset on which the modelling is based does not include subjects from any Asian countries. Thus, to the extent that people from Asia would describe their health problems differently along the instruments considered here, the regression coefficients in the mapping functions for the three Asian countries should be interpreted with somewhat more caution than those for the Western countries.

Statistical analyses and estimation

Descriptive analyses

Respondents’ characteristics were described as mean (standard deviation, SD) unless otherwise indicated. The degree of conceptual overlap between the source (D-39) and target (EQ-5D-5L) instruments was examined using Spearman’s correlation coefficients and a principal component analysis (PCA). PCA is a multivariate statistical technique used to reduce the number of variables in a dataset into a smaller number of linearly uncorrelated ‘principal components’ that helps to investigate whether a number of items generate information about a more general underlying component [31]. The PCA was applied to explore and compare the underlying dimensional structure of the D-39 data and EQ-5D-5L information evident in these data. The number of principal components was decided according to eigenvalues, which indicate the amount of variance in the original variables explained by each principal component [32]. Only principal components with eigenvalues greater than 1 were considered in the exploratory PCA [33]. Rotation of the dimensions identified in the initial extraction is needed in order to obtain simple and interpretable components [34]. Rotation methods, which allow correlations between components, are referred to as oblique. Thus, to account for potential correlations among components, rotation was performed using an oblique Promax method [35].

Econometric techniques for mapping analysis

A direct mapping approach was adopted. The D-39 five domains, as well as age and gender, were the potential variables to predict EQ-5D-5L utilities. The final predictors were determined via stepwise forward selection that included only significant variables (p < 0.05). Predictors were also required to be logically consistent, meaning that poorer scores on a source instrument should lead to lower utility on the target instrument. Non-linearities were investigated, and they were included only if the main effect variable significantly contributed to the model.

Six econometric models were adopted: OLS, GLM, MM estimation, censored least absolute deviation (CLAD), fractional regression model (FRM) and beta binomial (BB) regression. The OLS, which depends on a normally distributed error term, is the most commonly used regression model in mapping studies [36]. The GLM is a flexible generalisation of OLS that allows our target variable to have a non-normal error distribution [14, 36]. The log link function with Gaussian family predicting EQ-5D-5L utility fitted the data well, and hence applied in the estimation of GLM.

The MM estimationFootnote 2 is one of the robust regression estimation methods that is used when the distribution of residual is not normal or there are some outliers that affect the model [38]. It aims to obtain estimates that have the high breakdown value, which is a common measure of the proportion of outliers that can be addressed before these observations affect the model [39], and is more efficient. Similarly, the CLAD model is a robust estimator which further takes into account the censoring issue of the outcome variable [40]. It provides consistent and robust estimates under rather general conditions, particularly for misspecification of errors related to heteroscedasticity and non-normality [41].

The FRM was developed to model empirical bounded dependent variables defined on [0, 1] interval that exhibit piling-up at one of the two corners [42]. Unlike other parametric methods, the important advantage of this semi-parametric FRM is that it does not make any distributional assumption about an underlying structure used to obtain the outcome variable [42, 43]. Given a vector of independent variables (X) and a dependent variable (Y), the basic assumption underlying the FRM can be summarised as

$$E(Y|X)\,=\,G(X\beta ),$$
(1)

where G(·) is a known non-linear function satisfying 0 ≤ G(·) ≤ 1 and β is a vector of parameters to be estimated. The generalised goodness-of-functional form test proposed by Ramalho et al. [44] suggested that ‘loglog’ is the best alternative functional form for G(.) and used as a link function. It is defined as

$$E(Y|X)\, = \,\exp ( - e^{{X\beta }} )\, = \,\exp \left[ { - \exp (X\beta )} \right].$$
(2)

Another issue in FRM is to decide whether a one- or two-part model should be used, where the discrete component (observations at one or both boundaries) is modelled as a binary or multinomial model and the rest as FRM in a two-part model [45]. In EQ-5D, since boundary values are extreme scores best described by the same mechanism as all other values in the interval, there is no strong theoretical reason to use the two-part model. Further, a P-test statistic proposed by Davidson and MacKinnon [46] and applied to this particular context by Ramalho et al. [43] suggested no sufficient evidence to reject the null hypothesis that the one-part model is most appropriate (p = 0.128). The FRM (with loglog link function) was implemented using ‘frm’, a user-defined Stata command [43]. The observed EQ-5D-5L utilities were firstly transformed onto a 0–1 scale before entering into the regression as the dependent variable through the following equation: (X − Xmin)/(Xmax − Xmin), in which X stands for the EQ-5D-5L utility and min/max refers to the minimum/maximum value of observed EQ-5D-5L utilities in each country-specific value set. Eventually, predictions of EQ-5D-5L utilities were back-transformed to the original scale.

The BB regression is a fully parametric approach for modelling bounded data. For instance, due to its known flexibility that allows a great variety of asymmetric forms, the most popular choice to estimate Eq. (1) is the parametric beta distribution [47, 48]. The re-parameterisation of the beta regression has been detailed in Ferrari and Cribari-Neto [47]. The logit functional form fitted the data well and hence applied as a link function in the BB regression model for predicting the EQ-5D-5L utilities. As this parametric model is not defined at the boundary values, the outcome values should be restricted to the 0–1 range, excluding 0 and 1. This can be achieved by linear transformation, [Y(N − 1) + 0.5]/N, following earlier literature [49, 50], where N is sample size and Y is linearly transformed EQ-5D-5L utility on 0–1 scale.

Goodness-of-fit measures

Following external guidance [3, 36], mean absolute error (MAE) and root mean square error (RMSE), both of which were adjusted for the degrees of freedom owing to the potential varied number of predictors in each model, were firstly calculated. Both MAE and RMSE may easily be influenced by the scale of the dataset because wider scale size leads to a larger error [51]. Thus, we normalised both MAE and RMSE to the range of the measured data. Such normalised RMSE (NRMSE) and normalised MAE (NMAE) are non-dimensional and facilitate the comparison between datasets or models with different scales. Model performance was also examined by the square of the correlation coefficient between the observed and predicted values adjusted for the number of predictors in the model (adjusted-r2). Furthermore, the degree of absolute agreement between the predicted and the observed EQ-5D-5L was assessed using Lin’s concordance correlation coefficient (CCC) [52]. The CCC is popular for assessing agreement between two measures, which includes both precision (degree of scatter) and accuracy (systematic bias) components [52, 53]. A value of CCC close to unity implies a good concordance between predicted and observed measures.

In the absence of an external data, cross-validation was usually conducted using existing data to evaluate the performance of the mapping algorithms for generalisability. This study applied a k fold approach, where the total sample was randomly divided into k (k = 5) equally sized groups. Each time, four groups were combined as an estimation sample (i.e. 80% of the full sample) to derive mapping algorithms which were then applied to the remaining validation group (i.e. 20% of the full sample). This process was repeated five times, and hence all groups in the dataset were used for both estimation and validation sample. Finally, the normalised average values of RMSE and MAE, as well as mean r2 from the five iterations, were reported for easy comparison of the models’ predictive performance.

The best-fitting model with good predictive accuracy was selected primarily based on three metrics derived from both the full sample and validation sample: NRMSE, NMAE and adjusted-r2. As there is no single gold standard criterion, these model performance indicators were equally important. Thus, the model that performs best in most of these criteria should be selected. When a disagreement is observed between them, the CCC from the full sample would be applied to determine the final preferred mapping algorithm. Eventually, this preferred model was estimated using the full sample (N = 924). All statistical analyses were conducted using Stata® version 14.2 (StataCorp LP, College Station, Texas, USA) except the PCA, which was carried out in SPSS version 24 (IBM Corp., Armonk, NY, USA).

Results

Descriptive and conceptual overlap

Table 1 reports sample characteristics and a summary of source and target instruments. The mean (SD) value was 0.786 (0.215) for EQ-5D-5L (English value set) and 0.659 (0.222) for D-39. The average scores of the D-39 domains range from 0.535 for the ‘anxiety and worry’ domain to 0.781 for ‘social burden’. The highest correlation (Table 8 in Appendix) was found between the ‘energy and mobility’ domain of D-39 and EQ-5D-5L utility indices: it ranges from 0.692 with Uruguayan to 0.721 with the Japanese value set. For EQ-5D-5L dimensions, the strongest correlation (0.646) was observed between the ‘anxiety and worry’ domain of D-39 and the ‘anxiety/depression’ dimension of EQ-5D-5L. For the remaining four EQ-5D-5L dimensions, higher coefficients were consistently observed with the ‘energy and mobility’ domain of D-39, where the highest coefficient (0.612) was observed with EQ-5D-5L ‘mobility’ dimension and the lowest (0.449) with ‘self-care’.

Table 1 Summary statistics (N = 924)

Results from the PCA (Table 2) revealed six principal components when the number of components was limited to those with an eigenvalue greater than 1 and explained about 70% of the total variance. The first five principal components were the same as the D-39 domains (‘diabetes control’, ‘energy and mobility’, ‘anxiety and worry’, ‘sexual functioning’ and ‘social burden’), and the sixth component was related to ‘self-care’. Two EQ-5D-5L dimensions (‘mobility’ and ‘pain/discomfort’) were mainly loaded onto the same component as the D-39 items that describe activities related to ‘energy and mobility’. Another two EQ-5D-5L dimensions (‘self-care’ and ‘usual activities’) were mainly loaded to the last component, in which only one D-39 item (i.e. ‘having trouble caring yourself’) was mainly loaded onto. The EQ-5D-5L ‘anxiety/depression’ dimension was loaded to the ‘anxiety and worry’ component, together with relevant D-39 items. None of the EQ-5D-5L dimensions were mainly loaded on the ‘diabetes control’, ‘social burden’ or ‘sexual functioning’ components, supporting the poor correlations identified above. In sum, the EQ-5D-5L dimensions mainly loaded on three principal components (‘energy and mobility’, ‘anxiety and worry’ and ‘self-care’), which explains 17.14% of the total variance.

Table 2 Principal component analysis—rotated factor matrix

Model performance

Tables 3 and 4 summarise the performance of models (assessed by three goodness-of-fit indicators: NRMSE, NMAE and adjusted-r2), while Table 5 reports the fourth indicator (the CCC) for all countries. When the English value set was considered, the fractional regression model with loglog as a link function was the best performing model in terms of all four indicators: NRMSE (0.1282), NMAE (0.0914), adjusted-r2 (52.5%) and CCC (0.691). The cross-validation approach also revealed similar results that the FRM consistently performed best. The FRM model continued to perform well for the remaining country-specific value sets. However, both OLS and FRM equally performed well for the Dutch value set. Both models had the lowest NRMSE (0.1282) and the highest CCC (0.667) in the full sample. While OLS model provided the highest adjusted r2 in the full sample and cross-validation, the FRM model consistently produced lower NMAE than OLS model in both full sample and cross-validation; it also produced the lowest NMAE in the full sample among all estimators. Overall, we recommend FRM for the Dutch value set.

Table 3 Primary goodness-of-fit indicators of country-specific EQ-5D-5L value sets (England, Netherland, Spain, Canada)
Table 4 Primary goodness-of-fit indicators of country-specific EQ-5D-5L value sets (Uruguay, China, Japan and Korea)
Table 5 Goodness-of-fit: concordance correlation coefficient (full sample)

Regarding the CCC results, the FRM revealed the highest concordance, with only the exception for the Japanese value set in which the MM estimator produced the highest CCC (followed by CLAD and FRM), confirming the stability and consistency of our model. Except for the Japanese value set (CCC = 0.716), the MM estimator provided generally poor concordance. The lowest concordance (CCC = 0.361) was observed for the Canadian value set.

Although mapping algorithms produce adequate mean prediction for EQ-5D-5L, this may not always be true for individual predictions. The predicted English EQ-5D-5L had a mean (SD) value of 0.786 (0.157) for the best model, which is quite similar to the observed mean. The respective 25th, 50th and 75th percentiles of the predicted values were 0.703, 0.840 and 0.911 compared with 0.712, 0.858 and 0.937 for the observed values. These values were comparable. However, our best-fitting model over-predicted at severe health states (Fig. 1). For instance, the 5th and 10th percentiles of the predicted English EQ-5D-5L value set were 0.445 and 0.550 against 0.317 and 0.462 for the observed value set, respectively (Table 9 in Appendix).

Fig. 1
figure 1

Scatter plot of observed versus predicted EQ-5D-5L (English value set). Line of equality between observed and predicted values (red-line); OLS ordinary least square, GLM generalised linear model, BB beta binomial regression, FRM fractional regression model, MM robust MM estimator, CLAD censored least absolute deviation, EQ-5D-5L EuroQol five-dimensional five-level questionnaire. (Color figure online)

Regression results

Table 6 shows the regression results for the various models when the English value set was applied. The coefficient for ‘energy and mobility’ (EM) was positive and statistically significant (p < 0.01) in all models. The ‘anxiety and worry’ (AW) domain was also a significant positive predictor of EQ-5D-5L in BB and MM models. Further, the squared-term for ‘energy and mobility’ (EM2) was a significant predictor of EQ-5D-5L in OLS, GLM and CLAD models. The remaining D-39 domains were either insignificant or yield logically inconsistent signs, and hence ignored in all models. Two socio-demographic variables (age and gender) were considered. However, only age revealed statistical significance in all models except for CLAD.

Table 6 Regression results predicting English EQ-5D-5L values from D-39 subscales

Table 7 reports the best performing mapping algorithms for eight country-specific EQ-5D-5L value sets based on the model performance criteria discussed above. For each country, the optimal mapping function was consistently estimated using the FRM.

Table 7 FRM as optimal mapping equations predicting each country-specific EQ-5D-5L utility value from D-39 subscales

Discussion

This study developed algorithms that allow the mapping of the D-39 instrument onto eight country-specific EQ-5D-5L value sets. The EQ-5D-5L utility index was modelled as a function of the D-39 domains using six regression models. The result generally showed the FRM to be the preferred model in predicting EQ-5D-5L utility values in all country-specific value sets.

The present study also assessed the empirical performance of different statistical approaches to predict EQ-5D-5L utility values. For diabetes patients, the EQ-5D-5L was found to have a moderate degree of ceiling effect (17.4% respondents reported to be in full health according to the EQ-5D-5L descriptive system), and highly skewed to the left. Alternative regression models robust to such properties were investigated, where the FRM with loglog link performed best in comparison with other models under consideration for each country-specific value set. Yet, the regression coefficients were considerably different across countries (Table 7).

Typically, many mapping studies apply standard linear regression to predict EQ-5D utilities. However, EQ-5D has two fundamental properties. It is characterised by (i) bounded nature of utility data (between the worst and the best health states), and (ii) ceiling effects (i.e. piling-up of utilities at 1). Under such circumstances, the effect of predictor variables cannot be constant throughout its entire range [45]. The novelty of the FRM model is that it is non-linear and more appropriate for naturally bounded data that exhibit piling-up at one of the two corners as the case in EQ-5D [43, 45]. Furthermore, FRM is a semi-parametric approach estimated by quasi-maximum likelihood, which gives consistent estimators regardless of the true distribution of the outcome variable conditional on the predictors. This model also constrains predictions within the range determined by EQ-5D utility.

The current study differs in several important aspects from the previous study [14]. The study by Chen et al. [14] only considered two regression models (OLS and GLM). The present study, however, considered different analytical approaches, addressing the characteristics of the data such as censoring, problems of normality and heterogeneity of variance by comparing six distinct econometric models. Furthermore, while the study by Chen and colleagues used the interim ‘cross-walk’ vale sets, the mapping algorithm provided in this study employed the directly elicited EQ-5D-5L value sets. Thus, the preferred models and their performance in terms of both MAE and RMSE as well as adjusted-r2 were quite different as expected. An MAE, RMSE and adjusted-r2 values for our preferred model were 0.0914, 0.1282 and 52.5%, respectively, as compared to 0.131, 0.177 and 47.5% in the study by Chen et al.Footnote 3 This discrepancy may partly be attributable to variation in the target instrument used and partly due to the mapping functions employed. The two studies also differ in terms of the use of additional covariates in predicting EQ-5D-5L utility values. While the study by Chen et al. [14] included country dummies and gender, the present study has considered respondents’ age and gender alone. Country variable was initially considered as a potential predictor in this study. However, it does not improve the prediction of EQ-5D-5L utility and hence dropped from the analysis. In both studies, gender was not a significant predictor of EQ-5D-5L utility values.

Mapping models usually overestimate for patients with severe health states and underestimate for patients with good health states. This general problem of mapping studies is due to regression to the mean [36]. Furthermore, this might also be explained by the preference pattern of EQ-5D-5L, where substantial decrements in preference weights occur at severe health states [30]. The conceptual difference between the source and target instruments may be another explanation. From the PCA result, ‘self-care’ and ‘usual activities’ dimensions of EQ-5D-5L were mainly loaded onto the ‘self-care’ component, while only one D-39 item (i.e. ‘having trouble caring for yourself’) mainly loaded onto the same component. Many of the D-39 items related to the domains ‘diabetes control’, ‘sexual functioning’ and ‘social burden’, formed three distinct components unrelated to the EQ-5D-5L dimensions. These components revealed logically inconsistent signs in the estimated coefficients (results not reported here), which may be attributable to the hidden unknown interrelationships among the domains of the D-39, evidenced by relatively high correlation coefficients between the ‘energy and mobility’ domain with the rest of the D-39 domains.

In addition to providing insight into the dimensional structure of both D-39 and EQ-5D-5L instruments, the PCA also provided a basis for the choice between a direct or an indirect (response) mapping approach [32]. As demonstrated in the PCA (Table 2), nearly 21 items from D-39, which were mainly loaded onto the ‘diabetes control’, ‘sexual functioning’ and ‘social burden’ components, were not related to EQ-5D-5L dimensions. Thus, the probability of accurately predicting five EQ-5D-5L response levels for each dimension was low. In general, exact prediction of a health state requires correct predictions for each dimension of EQ-5D in response mapping approach, which is rarely achieved. Consequently, response mapping can be severely penalised when an incorrect prediction is made [54].

This study has several strengths. First, we have investigated several mapping algorithms that could be used to predict EQ-5D-5L utility when only the D-39 instrument has been applied in a clinical study. The FRM and BB models are appropriate for bounded outcome variable as the case in this study, which has the ability to predict within the given range. Despite its strong assumptions, the OLS regression model has performed pretty well as compared to other robust candidate estimators, such as GLM, BB regression, CLAD and MM estimation. Secondly, we have adjusted for scale differences to facilitate comparison between instruments or models with different scales, which is usually undermined in many mapping studies. Thirdly, this is the first study to assess the predictive accuracy of different EQ-5D-5L value sets using the D-39 instrument. Boland et al. [55] examined the clinical chronic obstructive pulmonary disease with three EQ-5D-3L value sets (for Netherland, UK and the US), though they did not make direct comparison as in the present study. Lastly, our algorithm has wider generalisability because of the multinational nature of the patient population used. Yet, as generalisability is a major issue for mapping studies, it should be tested how these models perform in different diabetic patient populations. With regard to study limitations, the dataset on which the modelling is based does not include Asian subjects. If Asian diabetes patients would describe their problems differently on the source (D-39)—and the target (EQ-5D-5L) instruments, the regression coefficients in the mapping functions for China, Japan and Korea should be interpreted with some caution. Secondly, although the estimation data in this study cover a wide range of EQ-5D-5L utilities, they do not cover the whole theoretical range; consequently the prediction error towards the lower end of utility distribution tends to be large. Furthermore, self-selection bias might have occurred, as respondents were volunteered to participate in the online survey.

In conclusion, results from this study reveal that the estimated EQ-5D-5L utility values from the preferred mapping algorithm can adequately predict a mean utility value in other sample in the absence of preference-based generic instruments. Thus, such mapping algorithms facilitate health economic evaluation that can guide decision-makers for appropriate interventions in diabetic patients.