FormalPara Key Points

A non-linear, constrained hybrid model was shown to be a viable statistical technique generating health state values from combined time trade-off and discrete choice experiment preference data.

The results are comparable to linear main-effects models, but with improved out-of-sample predictive accuracy.

The non-linear, constrained hybrid model was chosen to represent the Malaysian EQ-5D-5L value set, which has a range of − 0.442 to 1.

1 Introduction

The EuroQol five-dimension inventory (EQ-5D), a preference-based instrument designed to capture health-related quality of life (HRQoL), is the most-used instrument worldwide in economic evaluations measuring health benefit in terms of quality-adjusted life-years (QALYs) [1, 2]. It is recommended by several health technology assessment agencies, the most prominent of which is the National Institute for Health and Care Excellence of England and Wales (NICE). The EQ-5D instrument describes health problems along five dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The original version [3], now generally referred to as the EQ-5D-3L, describes each dimension at three levels, roughly corresponding to no problems, some problems, and extreme problems. A newer version, known as the EQ-5D-5L, has been released and is gaining popularity [1]. As the name suggests, the EQ-5D-5L describes each dimension of health at five severity levels [4], corresponding to no, slight, moderate, severe, and unable/extreme problems.

The EQ-5D-5L instrument comprises two main components: the descriptive system, and value sets [5]. The descriptive system is a questionnaire allowing respondents to describe their health along the five dimensions, as well as presenting their overall health on a standardized, vertical visual analog scale, the EQ-VAS. Each unique combination of responses to the descriptive system (sans the EQ-VAS) is called as an EQ-5D health state. The health states are conventionally referred to using a five-digit short-hand, such that each digit represents the level of functioning for each dimension in the previously presented order. Thus, state 11111 reflects no problems on any dimension, and state 55555 reflects extreme problems on all five dimensions. For responses in terms of EQ-5D health states to be useful in QALY calculation, values reflecting the relevant populations’ preferences for each health states must be assigned. By convention, such utility values are presented on a scale where 1 reflects the preference for full health, and 0 reflects the preference for not being alive. An EQ-5D value set comprises such values for all 3125 EQ-5D health states.

Value sets for EQ-5D instruments have conventionally been generated through population valuation studies in which representative samples of the general public have been administered different tasks to value a subset of the possible EQ-5D health states, after which statistical modeling is used to generate a full set of values. Previous valuation studies [6, 7] for EQ-5D-3L suffered from substantial variation in elicitation methods. To improve comparability and ensure greater quality, the EuroQol Group has developed a standardized protocol for conducting EQ-5D-5L valuation studies, involving the use of a computer-assisted interview system called EuroQol Valuation Technology (EQ-VT) [8]. EQ-VT and the standardized protocol are both updated to reflect best practice based on available evidence. For the first time, this valuation tool was utilized in the Malaysian setting to systematically derive a value set for the EQ-5D-5L instrument. This value set should have high comparability to other EQ-VT-based value sets due to the similar protocols employed.

The standardized protocol has been used in EQ-5D-5L valuation studies involving various countries, including Canada [9], Uruguay [10], England [11], the Netherlands [12], Korea [13], Japan [14], Spain [15], China [16], Hong Kong [17], Indonesia [18], Germany [19], Thailand [20], and Ireland [21], while studies are underway in countries including the United States and Singapore. As implemented methodologies are becoming more standardized, value set differences among countries, even those belonging to similar continents, are apparent and might be attributed, in part, to cultural values shared by people of a country. This is especially evident in the variability of preference weights set on different dimensions in different settings, stressing the need for a localized value set for economic evaluation use.

The EQ-5D is commonly used in the Malaysian healthcare setting [22], but the lack of a local EQ-5D-5L value set may limit its scope of usage. Having a local value set ensures that health effects or outcomes are valued according to the preferences of the concerned population. The use of a standardized and thoroughly tested valuation protocol and standard technology improves quality control and ensures greater comparability with other studies. The objective of this study was to derive a Malaysian EQ-5D-5L value set using the standardized protocol and the EQ-VT.

2 Methods

2.1 Sampling and Recruitment

A sample size of 1000 is recommended by the EuroQol Group for valuation studies aiming to establish an EQ-5D-5L value set [23]. To take into account possible study attrition, a total of 1200 Malaysians were targeted in this study. The eleven states located in the Peninsular Malaysia were divided into four regions (Northern, Central, Eastern, Southern), with each region considered as a cluster. One urban and one rural area from each cluster was then randomly selected using simple random sampling. The clusters included in the study were Penang (Northern), Selangor (Central), Kelantan (Eastern), and Malacca (Southern).

Respondents were recruited from the respective areas using quota-based sampling with quotas set for urbanicity (urban/rural), gender (male/female), age (18–39/40–64/> 64 years), and ethnicity (Indigenous/ Non_Indigenous) based on the Malaysian National Census [24]. Individuals aged > 18 years who understood English or Malay were recruited to participate in this study while those with low cognitive levels (visible signs of mental impairment) or who were unable to understand the instructions were excluded. Informed consent was obtained from all eligible respondents.

2.2 Data Collection

Eighteen trained interviewers (final-year pharmacy or medical undergraduate students) administered the valuation instrument. They moved from one region to the next after completing the required number of interviews. The interviewers were stationed at selected gathering points (e.g., shopping complexes, markets, food courts, or community halls) in targeted areas. Eligible subjects encountered in nearby public places were approached for study consent based on convenience sampling and pre-set quotas.

Prior to the field work, each interviewer attended a 3-day training workshop conducted by experienced EuroQol representatives and the Malaysian investigator team. Components of the training were co-designed by EuroQol Group and the Malaysian team. These included an introduction to the background and rationale of the study; detailed review of interview tasks using specifically designed visual materials, giving exposure to health preference elicitation methods; practical interviewing techniques; project-specific interviewer’s tasks; demonstration of the interview procedures; practice interview with fellow participants, and a ‘mock interview’ with the general public. In addition, the interviewers performed a minimum of five practice interviews as part of their training. Only interviewers that satisfied the training requirements were allowed in the field.

Quality control (QC) was carried out following the official guidelines using the EQ-VT QC tool [25] to closely and continuously monitor interviewers throughout the whole data collection process. In addition, all interviewers were required to report to the project supervisors on a daily basis.

2.3 Valuation Protocol

The valuation tasks were administered using the EQ-VT application, adhering to the valuation protocol set forth by the EuroQol Group [8]. Briefly, the interview had three parts: (1) self-reported health using the descriptive EQ-5D-5L and warm-up tasks leading to the valuation tasks; (2) valuation tasks involving completing ten composite time trade-off (C-TTO) tasks and subsequently seven discrete choice experiments (DCEs); and (3) country-specific questions including socio-demographics of respondents.

The C-TTO method has been described in detail elsewhere [26]. C-TTO applies the classic approach for health states deemed better than dead, whereby the respondent would find an indifference point between x years in full health (Life A) and 10 years in a specific health state (Life B), with the value of the impaired health state being x/10. If the respondent is faced with a health state where he or she perceives that dying immediately is preferable to being in the health state for 10 years (worse than dead, WTD), a slightly different scenario, called lead-time TTO, is used instead. In this case, the respondent must still find an indifference point between two lives: Life A consisting of x years in full health, and Life B consisting of 10 years in full health followed by 10 years in the health state. The health state is then valued as (x − 10)/10. The current EQ-VT protocol employs a total of 86 health states for direct elicitation with C-TTO in ten blocks of ten health states each. One of the five very mild health states (i.e., having only slight problems in one of the five dimensions) and the pits state (55555) were present in every block, leaving 80 states (eight health states × ten blocks) chosen by Monte Carlo simulation for inclusion in the standardized EQ-VT C-TTO design. These 80 unique states were distributed over the ten blocks ensuring mean utility is approximately equal across blocks [23].

After valuing ten health states using C-TTO, respondents were administered a set of seven DCE tasks, in which they were asked to choose between two different EQ-5D-5L health states with no duration specified. DCEs measure health state preference indirectly, on a latent scale of relative attractiveness of one health state over another, without undergoing an iterative process to reach a point of indifference, as in C-TTO [27, 28]. As both C-TTO and DCE aim to measure the same construct (health-state preference), the variability in both methods may offer information regarding the utility of the measured health states. The current EQ-VT employs 196 different DCE pairs, organized in 28 blocks of seven pairs each, with the blocks relatively balanced in terms of severity. Further information about the selected DCE pairs can be found elsewhere [23]. Each respondent was randomly assigned one C-TTO block and one DCE block, and for both, the order of presentation was randomized.

2.4 Data Analysis and Modeling

Four main candidate models were extensively tested for performance: 20-parameter linear main-effect models; and eight-parameter non-linear constrained models; both fitted to C-TTO data only (ADD20_TTO, MULT8_TTO), or to C-TTO and DCE data in a hybrid model design (ADD20_HYBRID, MULT8_HYBRID).

The 20-parameter model employs one parameter for each of levels 2 through 5 for each dimension. Let α represent the intercept, xdl the dummy variable indicating the presence of problems on dimension d at level l, and βdl the coefficient representing the estimated disutility of having problems on dimension d at level l (e.g., βMO3 representing the disutility of having moderate problems [level 3] on mobility). The mathematical function of the 20-parameter model is as follows:

$$\begin{aligned} y & = \alpha + \Sigma_{l} \Sigma_{d} \beta_{dl} x_{dl} + e = \alpha \\ & \quad + \beta_{{{\text{MO}}2}} x_{{{\text{MO}}2}} + \beta_{{{\text{SC}}2}} x_{{{\text{SC}}2}} + \beta_{{{\text{UA}}2}} x_{{{\text{UA}}2}} + \beta_{{{\text{PD}}2}} x_{{{\text{PD}}2}} + \beta_{{{\text{AD}}2}} x_{{{\text{AD}}2}} \\ & \quad + \beta_{{{\text{MO}}3}} x_{{{\text{MO}}3}} + \beta_{{{\text{SC}}3}} x_{{{\text{SC}}3}} + \beta_{{{\text{UA}}3}} x_{{{\text{UA}}3}} + \beta_{{{\text{PD}}3}} x_{{{\text{PD}}3}} + \beta_{{{\text{AD}}3}} x_{{{\text{AD}}3}} \\ & \quad + \beta_{{{\text{MO}}4}} x_{{{\text{MO}}4}} + \beta_{{{\text{SC}}4}} x_{{{\text{SC}}4}} + \beta_{{{\text{UA}}4}} x_{{{\text{UA}}4}} + \beta_{{{\text{PD}}4}} x_{{{\text{PD}}4}} + \beta_{{{\text{AD}}4}} x_{{{\text{AD}}4}} \\ & \quad + \beta_{{{\text{MO}}5}} x_{{{\text{MO}}5}} + \beta_{{{\text{SC}}5}} x_{{{\text{SC}}5}} + \beta_{{{\text{UA}}5}} x_{{{\text{UA}}5}} + \beta_{{{\text{PD}}5}} x_{{{\text{PD}}5}} + \beta_{{{\text{AD}}5}} x_{{{\text{AD}}5}} + e. \\ \end{aligned}$$

The constrained eight-parameter models fit a single parameter per dimension (βMO, βSC, βUA, βPD, βAD), taking a value representing level 5; and one parameter for each of levels 2, 3, and 4 (L2, L3, L4), which are multiplied by the respective dimension parameters. For example, the disutility of having moderate problems (level 3) on mobility is βMO × L3. The mathematical function is as described below, where xdl still represents the dummy variable representing the presence of problems on dimension d at level l.

$$\begin{aligned} y & = \alpha + \Sigma_{l} \left( {\Sigma_{d} \beta_{d} x_{dl} } \right)L_{l} + e = \alpha \\ & \quad + \left( {\beta_{\text{MO}} x_{{{\text{MO}}2}} + \beta_{\text{SC}} x_{{{\text{SC}}2}} + \beta_{\text{UA}} x_{{{\text{UA}}2}} + \beta_{\text{PD}} x_{{{\text{PD}}2}} + \beta_{\text{AD}} x_{{{\text{AD}}2}} } \right)L_{2} \\ & \quad + \left( {\beta_{\text{MO}} x_{{{\text{MO}}3}} + \beta_{\text{SC}} x_{{{\text{SC}}3}} + \beta_{\text{UA}} x_{{{\text{UA}}3}} + \beta_{\text{PD}} x_{{{\text{PD}}3}} + \beta_{\text{AD}} x_{{{\text{AD}}3}} } \right)L_{3} \\ & \quad + \left( {\beta_{\text{MO}} x_{{{\text{MO}}4}} + \beta_{\text{SC}} x_{{{\text{SC}}4}} + \beta_{\text{UA}} x_{{{\text{UA}}4}} + \beta_{\text{PD}} x_{{{\text{PD}}4}} + \beta_{\text{AD}} x_{{{\text{AD}}4}} } \right)L_{4} \\ & \quad + \beta_{\text{MO}} x_{{{\text{MO}}5}} + \beta_{\text{SC}} x_{{{\text{SC}}5}} + \beta_{\text{UA}} x_{{{\text{UA}}5}} + \beta_{\text{PD}} x_{{{\text{PD}}5}} + \beta_{\text{AD}} x_{{{\text{AD}}5}} + e. \\ \end{aligned}$$

The constrained models were tested based on previous studies demonstrating that these may produce improved out-of-sample predictions [16, 29]. C-TTO-only models were fitted using mixed-effects models with random intercepts at the level of individual study participants.

Hybrid models were also explored. In these models, C-TTO and DCE data were combined as one data file to find a common vector of coefficients. This modeling technique was originally proposed by Oppe and van Hout [30] and has been successful in producing logically consistent predictions in a number of studies [15, 17,18,19, 31]. The hybrid modeling method uses a joint maximum likelihood procedure, in which the same set of parameters is used to obtain likelihood estimates over C-TTO data assuming a normal distribution, and DCE data using conditional logit. As DCE produces relative distances between health states on a latent scale, a scaling parameter between DCE and C-TTO is introduced. Since standard errors (SEs) and confidence intervals (CIs) are not readily available from non-linear models, bootstrapping was used. All models were fitted to each of the 10,000 individual respondent-level bootstrap samples.

In addition to the eight-parameter and 20-parameter models, various other models and extensions were tested but discarded due to lack of monotonicity or greatly reduced model performance. All models were tested with intercepts (separate intercepts for C-TTO and DCE in the hybrid cases). Intercepts were removed if they were statistically non-significant.

All data were analyzed as disutilities (1 − utility score) using statistical software STATA version 12 (StataCorp, College Station, TX, USA) and R statistical package, version 3.3.0 (R Foundation for Statistical Computing, Vienna, Austria) [32].

2.5 Model Performance

Model performance was evaluated based on two criteria. The primary criterion evaluated is monotonicity; logical consistency of the parameters with respect to the hierarchical structure of levels within the five dimensions. Parameters of increasing severity (of a dimension) should have higher disutility values (e.g., valueMO2 < valueMO3). The second criterion was out-of-sample predictive accuracy over mean C-TTO health state values. This was tested using a cross-validation approach [33, 34]; sequentially splitting the dataset into two subsets, fitting the models to one set, using the fitted models to predict the other set, and comparing the predicted and observed values. The splitting method tested here is known as the leave-out-by-block method [29]. Respondents were excluded from both C-TTO and DCE data depending on their assigned C-TTO block. One block was sequentially left out, serving to evaluate prediction accuracy of the fitted model using the remaining nine blocks of health states. Cross-validation as demonstrated previously [16, 29] has shown to be useful in revealing cases in which improved fit on observed data does not correspond to improved out-of-sample predictive accuracy. Mean square error (MSE), mean absolute error (MAE), Lin concordance correlation coefficient (CCC), intra-correlation coefficient (ICC), and Pearson product-moment correlation (R) were used to assess how similar predicted mean health state values are compared with mean observed values. Better predictive accuracy is indicated by lower MSE and MAE values, and higher CCC, ICC, and R values.

2.6 Exclusion Criteria

Respondents whose C-TTO value increased with health-state severity and those who valued −1 on all health states were excluded on the basis that these choices were not rational, and the respondents may not have understood the time trade-off process.

3 Results

3.1 Respondents

A total of 1137 out of the 1172 (97.01%) consenting respondents successfully completed interviews in eight locations in the four randomly selected Malaysian states: Penang (Bayan Baru, Balik Pulau), Kelantan (Kubang Kerian, Rantau Panjang), Melaka (Bandar Melaka, Jasin), and Selangor (Subang Jaya, Sungai Buloh). The remaining 35 respondents asked to discontinue the study as they either did not understand the valuation process or were not comfortable with the time trade-off process, or the forced choice presentation of the DCE. Twelve interviews met the exclusion criteria, leaving data from 1125 respondents to be included in the data analysis process. Table 1 outlines the demographic characteristics of included respondents. While respondents were largely representative of the general population in terms of gender, age, ethnicity, and residential area, those categorized as being employed full time/self-employed were slightly under-represented.

Table 1 Study sample characteristics (N = 1125)

3.2 Data Characteristics

The constant term for the 20-parameter model fitted to C-TTO data was statistically significantly different from 0. The constant was therefore retained in the C-TTO models. For the hybrid, the constant terms were non-significant, and were therefore dropped. Table 2 presents the parameter estimates for the four models tested in this study. In terms of the characteristics of values generated by the models (Table 3), hybrid models predicted slightly lower 55555 scores than their C-TTO-only model counterparts. Among the 3125 EQ-5D-5L health states, the proportion for which negative values were predicted was roughly equal between the tested models, with MULT8_TTO and ADD20_TTO having the highest (9.15%) and lowest (9.02%), respectively. Dimension ranking differed slightly in the preference of the top two, with C-TTO-based models favoring pain/discomfort, whilst the hybrid models tilted towards mobility.

Table 2 Parameter estimates of the tested models
Table 3 Comparison of score characteristics among models

3.3 Modeling Data

Logical consistency was present in all the four estimated models whereby higher severity parameters had higher disutility values. Subsequent cross-validation analysis revealed the multiplicative models exhibiting slightly better predictive accuracy (lower MSE and MAE; higher CCC, ICC, and R) when compared with their additive model counterparts (MULT8_TTO vs ADD20_TTO; MULT8_HYBRID vs ADD20_HYBRID). A similar trend of performance was observed in all predictive accuracy measures when comparing hybrid and C-TTO-only models, with the MULT8_HYBRID model slightly outperforming the other three models by all the prediction accuracy standards (Table 4). Thus, the MULT8_HYBRID model was chosen as it best represents the Malaysian EQ-5D-5L value set with a value range of − 0.442 to 1. The predicted values of the 3125 health states, together with SEs and 95% CIs are available in CSV (comma-separated values) format in the Electronic Supplementary Material. Figure 1a illustrates the range of the predicted 3125 health state values while Figs. 1b and 2 display how similar the predictions of MULT8_HYBRID are compared with the observed C-TTO health state values and rescaled DCE health state values, respectively.

Table 4 Cross validation of various models
Fig. 1
figure 1

Observed and MULT8_HYBRID predicted values for the EQ-5D-5L health states. a Values for the whole 3125 health states by descending order of MULT8_HYBRID predicted values. b Values for 86 health states by descending order of observed mean values. C-TTO composite time trade-off, EQ-5D-5L five-level EuroQol five-dimensional questionnaire, MULT8_HYBRID multiplicative eight-parameter hybrid model

Fig. 2
figure 2

Comparison of observed rescaled discrete choice experiment (DCE) and MULT8_HYBRID predicted values for the EQ-5D-5L health states included in the EQ-VT design. EQ-5D-5L five-level EuroQol five-dimensional questionnaire, EQ-VT EuroQol Valuation Technology, MULT8_HYBRID multiplicative eight-parameter hybrid model

In calculating the predicted index value for the health state ‘12345’ using the chosen MULT8_HYBRID model’s value set, the following formula is used:

$$\begin{aligned} {\text{Utility}}\;{\text{weight}}\left( { '12345 '} \right) & = 1 - {\text{MO}}1\left( 0 \right) - {\text{SC}}2\left( {0.062} \right) - {\text{UA}}3\left( {0.064} \right) \\ & \quad - {\text{PD}}4\left( {0.259} \right) - {\text{AD}}5\left( {0.300} \right) \\ & = 0.315. \\ \end{aligned}$$

4 Discussion

A Malaysian EQ-5D-5L value set was developed from the health preferences of 1125 Malaysians using a constrained eight-parameter hybrid model fitted to C-TTO and DCE data. The respondents were representative of the general population of Malaysia in terms of urbanicity, gender, age, and ethnicity.

Constrained, non-linear models were explored in this study as they have demonstrated better out-of-sample predictive accuracy of C-TTO values in a past study that compared linear main-effect models and constrained non-linear modeling using four-country datasets [29]. Conceptually, the eight-parameter non-linear model used in this study is a special case of the 20-parameter model, in which the relative distance between levels is constrained such that it is the same for all five dimensions. With fewer parameters and less complexity, the risk of overfitting is reduced, and parameters can be estimated with improved accuracy [33]. The observation that the constrained models produce better out-of-sample predictions suggests that any real deviation from the underlying assumption of shared relative level difference between dimensions is less substantial than the noise introduced by allowing the model to fit separate level parameters for each dimension. This was also demonstrated in our study, where cross-validation analysis revealed a slightly better fit of constrained models with the observed C-TTO values.

On the other hand, hybrid modeling has also gained a reputation for modeling utility values, as it overcomes logical inconsistencies [15, 19] encountered when C-TTO or DCE data are modeled individually. Although our models did not suffer from logical inconsistencies, hybrid modeling was also included as it incorporates utility responses from two distinct tasks, potentially resulting in a more complete view on the pattern of utility response. The two hybrid models tested in this study had on average a wider range of utility values and smaller bootstrapped SEs when compared with their C-TTO-only model counterparts. Importantly, the inclusion of DCE data in the modeling improved out-of-sample predictive accuracy over left-out C-TTO health state means. This has not been demonstrated in any published hybrid model to date. This supports the notion that C-TTO and DCE tap into the same underlying preference structure, at least in this specific valuation study. We know from secondary analyses of other existing EQ-5D-5L valuation study data that this is not universally true for already published EQ-5D-5L valuation studies. When we perform the same analyses on data from already published EQ-5D-5L valuation studies from other countries, we find that inclusion of DCE data tends to reduce out-of-sample predictive accuracy over left-out TTO data (results not presented here).

Interestingly, while pain/discomfort was the dimension with greatest impact on assigned values in the C-TTO-only models, mobility had relatively greater importance in the hybrids. This subtle but important difference was captured by the combination of response patterns from two conceptually different preference elicitation techniques trying to measure the same thing: matching tasks of the TTO and choice-based tasks of the DCE [15, 30, 35].

Model fit between individual observations and predicted values were not used as a performance indicator in this study considering that results would generally favor increased model complexity. As the validity of predicted values outside the scope of observed data cannot be truly assessed, cross-validation is an attractive proxy [16, 29]. The issue of hybrid models not having apparent counterfactual observed values may be a concern, as the cross-validation procedure was based on C-TTO blocks and values [15]. In this study, predictive accuracy over C-TTO means was chosen as the most important metric, and consequently, cross-validation was performed at the level of C-TTO blocks. If greater emphasis was placed on DCE, some metric of accuracy over DCE data could be chosen as a criterion, and cross-validation using DCE blocks could be employed. A combination approach would also be possible but would potentially indicate different models as superior for different criteria.

The best-performing model in this study was the constrained, non-linear eight-parameter hybrid, which displayed the greatest out-of-sample predictive accuracy over C-TTO health state mean, produced the widest range of health state values, and the smallest SEs among all models. The highest disutility weight was placed on the mobility dimension and the lowest on usual activities, which is similar to China [16], Hong Kong [17], and Thailand [20]. Many Asian countries [13, 14, 16,17,18, 20] having published EQ-5D-5L value sets to date place the highest emphasis on the mobility dimension. The similarities in ranking of dimensions usually end there as cultural and socioeconomic factors come into play to shape the preferences of the people. Indonesia and Malaysia are neighbors, and the citizens of the two countries share similar backgrounds but display quite distinct value set characteristics. ‘Pain and discomfort’ was ranked a close second in terms of preference in the Malaysian value set but was surprisingly the least important dimension from the Indonesian perspective. Indonesia also has a health state 55555 value of −0.865, which was significantly lower than Malaysia’s. These differences further support the need for country-specific value sets to better capture the health preferences of the population.

There are a number of study limitations to be considered. Malaysia as a country consists of Peninsular Malaysia (79.6% of the population, 11 states, two Federal Territories) in the West and Borneo island (20.4% population, two states, one Federal Territory) in the East. Our study only involved sampling clusters in Peninsular Malaysia due to resource constraints. Generally, the focus of economic development in Peninsular Malaysia has resulted in better job and study opportunities, leading to the mobility of East Malaysians to the West. Conducting our interviews in gathering spots such as hospitals, colleges, and market areas allowed us to maximize the respondent characteristic sampling. Accordingly, subsequent studies focusing on valuation data of East Malaysia residents should be conducted in the future to assess if there are any significant differences in the preference patterns compared with people residing in West Malaysia. Also, due to the nature of the sampling strategy, we do not have a response rate to report. There were potential respondents who approached the interviewer booths expressing interest in the study while there were those who declined when first approached but came back at a later time. This was common in rural areas where they had to finish chores before returning when they had free time. Based on the similarities of the sample characteristics with the general population, we believe the representativeness of Malaysians in terms of age, gender, ethnicity, and urbanicity is accounted for. However, those employed full-time or self-employed may be slightly under-represented and those outside the labor force over-represented in our study. Further analysis comparing the mean difference in observed C-TTO values between these two groups was not statistically significant, indicating the slight skew in employment status representativeness was not a major factor for concern. Finally, in reflecting the number of respondents who were excluded during the data analysis stage, there was a slight learning effect observed. Most of the excluded respondents were from the first cluster. However, the quality control monitoring process [25], together with experience, aided interviewers in improving the interview process, leading to only 1.1% (12 respondents) being excluded in total.

5 Conclusion

This study demonstrated that the constrained eight-parameter hybrid model has the best potential in representing the Malaysian value set. The establishment of the Malaysian EQ-5D-5L value set will facilitate its application in health-related research, especially in the Malaysian health technology assessment scene.

Data Availability Statement

The predicted values of the 3125 health states, together with standard errors and 95% confidence intervals, are available in CSV format as Electronic Supplementary Material.