Introduction

Preference-based measures are increasingly being used for economic evaluations of health policy. An economic appraisal referred to as cost–utility analysis (CUA) has been developed to compare the costs of a healthcare program with its consequences, including both quantity and quality of life (e.g., quality-adjusted life years, QALYs) [1]. Use of QALYs requires quality weights for each health state. Several multiattribute utility instruments and their tariffs are available: EQ-5D-3L [2], Health Utilities Index Mark 2 and 3 [3, 4], and SF-6D [5]. Most countries prefer to determine their own population-based preference weights for instruments used in such economic evaluations, and there are substantial differences in EQ-5D-3L value sets among countries [6, 7]. To date, EQ-5D-3L is the only multiattribute utility instrument for which value sets for the general population in Korea have been generated [7, 8].

Recently, the EuroQol Group designed a new questionnaire, the EQ-5D-5L (five response categories per dimension of EQ-5D), to improve the sensitivity and reduce the ceiling effects associated with EQ-5D-3L [9]. Several studies, including a Korean study, showed good psychometric properties of EQ-5D-5L compared with EQ-5D-3L [10–12]. These findings will presumably increase demand for EQ-5D-5L in Korea. To use the EQ-5D-5L instrument effectively in CUA or other outcomes research, it is critical to determine the value set of EQ-5D-5L for the Korean population. The EuroQol Group also developed a new valuation protocol for EQ-5D-5L and the EuroQol Group Valuation Technology (EQ-VT) program to implement EQ-5D-5L valuation studies uniformly across different countries. The EQ-VT program also provides information on interviewer performance to control data quality, including time spent and moves for the exercise task, lead time part use for the exercise example, time spent and moves to complete the 10 cTTO tasks, and extent of negative valuation. This helps to detect problematic interviewers early and retrain them.

This study aimed to determine the social value set for EQ-5D-5L health states using composite time trade-off (cTTO) and develop a prediction model for estimating the preference weights for all 3125 health states, in the Korean population. Additionally, we examined the feasibility of cTTO using computer-based, face-to-face interviews in Korea.

Methods

EQ-5D-5L

The EQ-5D-5L used in this study is the official version provided by the EuroQol Group and consists of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with five levels (no problems, slight problems, moderate problems, severe problems, and extreme problems). The EQ-5D-5L health states are defined by combining one level from each of the five dimensions, for a total of 3125 possible health states. For example, “11111” indicates no problems on any of the five dimensions, while “12345” indicates no problems with mobility, slight problems with self-care (e.g., washing or dressing), moderate problems with usual activities, severe pain/discomfort, and extreme anxiety/depression [13].
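The five-digit coding described above can be illustrated with a short sketch. The function and constant names here are hypothetical and chosen only for illustration; the sketch simply enumerates every combination of five levels across five dimensions and decodes one state.

```python
from itertools import product

# Dimension and level labels as described in the text (names are illustrative).
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = ("no", "slight", "moderate", "severe", "extreme")


def all_states():
    """Return every EQ-5D-5L state as a five-digit string, '11111' to '55555'."""
    return ["".join(str(level) for level in combo)
            for combo in product(range(1, 6), repeat=5)]


def describe(state):
    """Map a five-digit state string to a {dimension: level label} dict."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    return {dim: LEVELS[int(ch) - 1] for dim, ch in zip(DIMENSIONS, state)}


if __name__ == "__main__":
    print(len(all_states()))   # 5 ** 5 = 3125 possible states
    print(describe("12345"))
```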

Valuation survey

This study was approved by the Institutional Review Board of the National Evidence-Based Healthcare Collaborating Agency (approval number NECA IRB13-002), and all participants provided written informed consent. We applied the valuation methodology according to the EuroQol Group protocol for EQ-5D-5L valuation studies, including composite TTO (cTTO) and discrete choice experiment (DCE) [14]. Composite TTO starts with the “conventional” TTO for all health states and switches to “lead time” TTO when the participant’s responses indicate that the health state is considered worse than being dead [15]. The lead time TTO applied in this study uses a 20-year time period (10 years of lead time followed by 10 years in the health state). Subsequently, in the DCE task, respondents completed paired comparisons in which they chose between two different EQ-5D-5L health states. Additional details regarding the EQ-VT program and the cTTO and DCE tasks are available elsewhere [14].

Participant selection

The EuroQol Group recommends a representative sample of at least 1000 respondents; the power calculation is based on a standard error of 0.01 for the observed mean cTTO value [16]. A representative sample was recruited from the Korean general population. The target population consisted of adults (≥19 years) living in Korea (excluding Jeju Island), most of whom were Korean. Sampling was performed using the multistage stratified quota method. A sample quota was assigned to each of the 15 regions according to population structure (regional population size, sex, age, and education level), as defined in the June 2013 resident registration data available through the Ministry of Administration and Security, South Korea.

Selection of health states

According to the EuroQol protocol, a total of 86 EQ-5D-5L health states were selected for valuation using cTTO. The health states are displayed in Supplementary Appendix 1. The 86 health states consisted of 80 intermediate EQ-5D-5L states, 5 very mild health states (“21111”, “12111”, “11211”, “11121”, and “11112”), and a pit state (“55555”). All health states were divided into 10 blocks, each comprising 8 intermediate EQ-5D-5L states, 1 very mild state, and the “55555” state, resulting in a total of 10 health states per block. The DCE task consisted of 196 pairs of EQ-5D-5L health states divided over 28 blocks, each consisting of 7 pairs. Each block was randomly assigned to participants using EQ-VT software.

Interviews

All interviews were conducted according to the EuroQol Group’s EQ-5D-5L valuation protocol and training manuals. Data were collected using computer-assisted, face-to-face interviews. The EQ-VT program and accompanying interviewer manuals provided by the EuroQol Group were translated into Korean. Interviewers visited each participant’s home to conduct the EQ-5D-5L valuation tasks using a laptop. The interviewers were employed by a research agency with several local branches and were experienced in EQ-5D-3L valuation studies. In total, 27 interviewers performed the interviews. Surveys were performed between August 9 and November 13, 2013. The interviewers were trained to perform each valuation method and to use EQ-VT software, and all interviewers practiced in pairs and conducted three pilot surveys prior to field surveys. Participants were initially asked to evaluate their own health states using EQ-5D-5L. Next, they completed four practice examples using cTTO (i.e., a wheelchair state and three EQ-5D-5L health states comprising mild, moderate, and severe states). Then, each participant consecutively valued 10 hypothetical health states using cTTO and 7 health state pairs using DCE. The cTTO valuation task consisted of conventional TTO for better-than-death states and lead time TTO with 10 years of lead time for worse-than-death states.

Analysis

Feasibility

Feasibility of cTTO was assessed according to administration time, number of inconsistent responses, and self-reported level of difficulty. Logical consistency took into account a given pair of health states, i.e., if one state of a pair was better than the other in ≥1 dimension and was not worse than any other, then the valuation of the former state must be at least as good as the valuation of the latter state [17]. Weak inconsistency allows ties between two states and was applied in this study [18]. The feasibility of the DCE task was assessed with regard to administration time and level of difficulty.
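The logical consistency rule above can be sketched as a pairwise check over one respondent's valuations. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, and counting one inconsistency per violating pair is an assumption. Under the weak criterion, ties between a dominating and a dominated state are not counted.

```python
from itertools import combinations


def dominates(a, b):
    """True if state a is no worse than b on every dimension and better on at least one."""
    return a != b and all(x <= y for x, y in zip(a, b))


def count_inconsistencies(values):
    """Count violating pairs for one respondent.

    values maps a five-digit state string to that respondent's cTTO value.
    Weak criterion: a pair is inconsistent only if the logically better
    state receives a strictly lower value (ties are allowed).
    """
    n = 0
    for a, b in combinations(values, 2):
        if dominates(a, b) and values[a] < values[b]:
            n += 1
        elif dominates(b, a) and values[b] < values[a]:
            n += 1
    return n


if __name__ == "__main__":
    # Hypothetical responses, for illustration only.
    responses = {"11112": 0.90, "11122": 0.95, "21345": 0.30, "55555": -0.20}
    print(count_inconsistencies(responses))  # "11112" is valued below "11122"
```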

Model development and selection

A total of 1085 respondents were included in the DCE analysis. In matched pairs (i.e., comparison between two health states), conditional logistic regression was used to investigate the relationship between health state preference and basic independent variables. S1, S2, and S3 were the only parameter estimates that were statistically significant and logically consistent. P2 was statistically significant but logically inconsistent (i.e., the health state for severe pain/discomfort demonstrated higher preferences than the health state for slight pain/discomfort). In other words, the coefficient of a higher level was smaller than that of a lower level in most dimensions (data not shown). Therefore, we developed the valuation model using cTTO data alone.

The preference weights of the states in the cTTO task were calculated as: (years at full health state ‘A’ − 10)/(20 − 10). Thus, the quality weight ranged between −1 and 1. Five respondents reported the same answer for all health states and were excluded from valuation set modeling; it is likely that these respondents did not understand the task or that the data were manipulated by the interviewers. The remaining 1080 respondents were used in model development, regardless of inconsistent responses. Various model specifications were explored to examine cTTO; however, only the most appropriate models are reported. Linear mixed models with individual-level data were evaluated because the data were obtained from a sample population and our aim was to predict the preferences of other members of that population. We defined the dependent variable as 1 − cTTO quality weight for a given health state (y). The basic independent variables were the five dimensions of EQ-5D-5L, which were specified as 20 dummy variables indicating levels 2, 3, 4, and 5 for each dimension. Additional variables were also considered as independent variables during model development. The independent variables used in the final alternative models are presented in Table 1.
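As a minimal sketch of the formula above, the following computes the quality weight and the modeled dependent variable. It assumes the number of years in full health at the point of indifference is expressed on the 20-year scale (10 years of lead time plus 10 years); the function and constant names are hypothetical.

```python
LEAD_TIME_YEARS = 10   # lead time stated in the text
TOTAL_YEARS = 20       # lead time plus 10 years in the health state


def ctto_weight(years_full_health):
    """Quality weight: (years in full health - 10) / (20 - 10)."""
    if not 0 <= years_full_health <= TOTAL_YEARS:
        raise ValueError("years must lie between 0 and 20")
    return (years_full_health - LEAD_TIME_YEARS) / (TOTAL_YEARS - LEAD_TIME_YEARS)


def disutility(weight):
    """Dependent variable used in the models: y = 1 - cTTO quality weight."""
    return 1 - weight


if __name__ == "__main__":
    for years in (0, 10, 15, 20):
        w = ctto_weight(years)
        print(years, w, disutility(w))
```

The end points reproduce the stated range: 0 years gives −1 and 20 years gives 1.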

Table 1 Definition of independent variables in final alternative models

We considered three criteria in the selection of the final model. First, the model must demonstrate logically consistent predictions. Thus, the model should predict a higher score for state A than for state B if A is logically better than B on ≥1 dimension and no worse on any other dimension [18]. Second, we considered the goodness of fit of the model, which demonstrates how well the model explains the observed data. The mean absolute error (MAE), i.e., the average of the absolute differences between observed and predicted values, was considered an important indicator during model selection. A smaller MAE indicates a better model. The generalized R² and the number of health states with absolute error ≥0.05 or 0.1 were also determined to assess goodness of fit. Lastly, if models with similar MAE values were logically consistent, the simplest model was selected to maintain parsimony. All statistical analyses were carried out using SAS software version 9.1 (SAS Institute, Cary, NC, USA).
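The goodness-of-fit indicators described above can be sketched as follows. The numbers in the usage example are toy values invented for illustration and are not taken from the study; the function names are hypothetical, and the strict inequality used for the threshold count is an assumption.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def count_large_errors(observed, predicted, threshold):
    """Number of health states whose absolute error exceeds the threshold."""
    return sum(abs(o - p) > threshold for o, p in zip(observed, predicted))


if __name__ == "__main__":
    # Toy values for three hypothetical health states (not study data).
    observed = [0.90, 0.50, -0.10]
    predicted = [0.87, 0.56, -0.10]
    print(round(mean_absolute_error(observed, predicted), 3))
    print(count_large_errors(observed, predicted, 0.05))
    print(count_large_errors(observed, predicted, 0.10))
```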

Results

Demographic characteristics of subjects

A total of 1085 of 3206 subjects completed interviews (33.8 %). Reasons for interview failure included refusal to participate (1155 persons, 35.4 %), inappropriate age, sex, or education level (456 persons, 14.0 %), absence (295 persons, 9.0 %), discontinuation of interview by respondent request (123 persons, 3.8 %), elimination during the quality control process (47 persons, 1.5 %, i.e., subjects who did not confirm by telephone or nonparticipation in the survey, or different gender and age from the survey during the quality control process), and removal during database upload (45 persons, 1.4 %, i.e., interviewer uploaded the survey via EQ-VT using laptop, but data were missing from the server).

Five respondents who gave the same answer for all 10 cTTO health states were excluded from the model dataset. General characteristics of the study participants are presented in Table 2. Pain/discomfort was the dimension in which problems were most commonly reported (15.8 %). Approximately three-quarters of respondents’ EQ-5D-5L health profiles indicated full health (“11111”). The average EQ-VAS score was 83 [standard deviation (SD) = 10].

Table 2 General characteristics of study participants

Feasibility

The average administration time was 13.6 min for the 10 cTTO tasks and 4.4 min for the 7 DCE tasks. The proportion of inconsistent cTTO answers is presented in Table 3. In total, 717 respondents (66.4 %) demonstrated no logical inconsistencies, and the number of respondents who demonstrated >3 inconsistencies was 48 (4.4 %). Table 4 presents the self-reported level of difficulty in conducting cTTO and DCE, which was similar for both cTTO and DCE.

Table 3 Distribution of inconsistent answers at individual level
Table 4 Responses to feedback questions for composite TTO and DCE

Composite TTO data

The misery score is defined as the sum of the five digits of the EQ-5D-5L health state, ranging from 6 (i.e., very mild states) to 25 (i.e., the “55555” state). The highest utility weight was 0.88 when the misery score was 6, while the lowest was −0.07 when the score was 25 (i.e., the “55555” state). In addition, only 1 of the 86 states demonstrated a negative value on average. Higher misery scores were generally associated with lower values of health states. The distribution of cTTO on several EQ-5D health states is shown in Fig. 1.
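The misery score defined above is a simple digit sum, sketched here with a hypothetical function name.

```python
def misery_score(state):
    """Sum of the five level digits of an EQ-5D-5L state string."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    return sum(int(ch) for ch in state)


if __name__ == "__main__":
    print(misery_score("11112"))  # a very mild state
    print(misery_score("55555"))  # the pit state
```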

Fig. 1 Distribution of utility weight of health states by misery scores

Model development

Table 5 lists the representative alternative models that were logically consistent and demonstrated optimal goodness of fit. All five dimension variables were statistically significant in all alternative models. Mobility was the most influential variable in all models. The MAEs of the models varied between 0.027 and 0.031. The N4 model, which included the basic indicator variables with N4 (i.e., the third model in Table 5), was selected as the final preferred model because its MAE was low (0.027) and the model was simple. In the final model, 12 health states had absolute error >0.05 and no health states had absolute error >0.1. For example, in the N4 model, the utility of the “12345” EQ-5D health state was obtained as follows:

Table 5 Parameter estimates of final alternative models (N = 1080)

Utility weight (“12345”) = 1 − constant (0.096) − M1 (0) − S2 (0.032) − U3 (0.051) − P4 (0.166) − A5 (0.137) − N4 (0.078) = 0.44

(Mk, Sk, Uk, Pk, and Ak denote the coefficient at level k on the mobility, self-care, usual activities, pain/discomfort, and anxiety/depression dimensions, respectively. Variables are defined as in Table 1. N4 is the coefficient applied when any of the five health dimensions is at level ≥4.)
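The worked example above can be reproduced with a short sketch. Only the coefficients quoted in the example are included; the remaining Table 5 coefficients are not reproduced in this section, so the function deliberately fails for states that need them. The function name and data layout are hypothetical, and because the treatment of full health (“11111”) is not stated here, the sketch does not cover it.

```python
# Coefficients quoted in the worked example for the N4 model.
CONSTANT = 0.096
COEFFS = {"S2": 0.032, "U3": 0.051, "P4": 0.166, "A5": 0.137}
N4 = 0.078
PREFIXES = "MSUPA"  # mobility, self-care, usual activities, pain, anxiety


def predict_utility(state, constant=CONSTANT, coeffs=COEFFS, n4=N4):
    """Utility = 1 - constant - dimension-level coefficients - N4 (if any level >= 4)."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    if state == "11111":
        raise ValueError("full health is not covered by this sketch")
    decrement = constant
    for prefix, ch in zip(PREFIXES, state):
        if ch != "1":                       # level 1 is the reference (coefficient 0)
            decrement += coeffs[prefix + ch]  # KeyError if the coefficient is not listed
    if any(ch in "45" for ch in state):     # N4 applies when any dimension is at level 4 or 5
        decrement += n4
    return 1 - decrement


if __name__ == "__main__":
    print(round(predict_utility("12345"), 2))  # 0.44, matching the worked example
```

Supplying the full set of Table 5 coefficients through the `coeffs` argument would extend the same calculation to other health states.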

 

The observed and estimated quality weights for the 86 EQ-5D-5L health states are presented in Appendix 1. The absolute difference between the observed and estimated weight ranged between 0.001 and 0.096. The generalized R² value was 0.538 for all alternative models. The Pearson correlation coefficient between the magnitude of the health state weight and magnitude of the prediction error was −0.049. Applying our final model to the 3125 EQ-5D-5L health states, the maximum value was 0.883 for the “11211” state whereas the minimum value was −0.066 for the “55555” state. Four of the 3125 health states demonstrated negative values (Appendix 2).

Discussion

In this study, societal cTTO weights for 86 EQ-5D-5L health states were elicited from 1080 members of the South Korean general population. All quality weights for the 3125 theoretically possible EQ-5D-5L health states were estimated using random-effects modeling based on the cTTO weights. Computer-assisted cTTO was successful in determining the social preference of the Korean population. In previous research, the proportion of respondents with logical consistency according to TTO was 21.0 % in Denmark [20], 40.8 % in Spain [17], and 71.2 and 82.2 % in Korea [7, 8], whereas this study yielded 66.4 %.

Most sociodemographic characteristics of the respondents were not significantly different from those of the Korean general population, except for educational level. The group that had attended only elementary school or less accounted for 4.6 % of our study population, which is lower than in the Korean general population (13.2 %). The effect of educational level on valuation tasks remains inconclusive. Dolan et al. [21] reported that the TTO valuation process was not significantly affected by education, whereas a Spanish study reported that education influences the values of health states [22]. According to our modeling analysis, age, sex, and educational level were not statistically significant variables (data not shown). Thus, the impact of differences in educational level may not be critical.

In accordance with the EuroQol standard protocol, DCE was included as a direct valuation method. DCE is regarded as a relatively easy method for valuing health states in the general population [23]. However, the DCE results of this study were generally logically inconsistent. Two possible reasons are as follows. First, quality control focused mostly on cTTO rather than DCE because cTTO is generally more difficult than DCE for interviewers and study subjects to understand. The DCE results of this study indicate that quality control is as important for DCE as for cTTO. Second, respondents may have experienced cognitive fatigue when performing DCE at the end of the survey, since the total survey required 31.6 min. Changing the order of cTTO and DCE would help assess the validity of this assumption.

A random-effects model including the basic indicator variables and the N4 variable, estimated using cTTO data only, was chosen as the predictive model because of its goodness of fit (MAE) and parsimony. In log-transformation modeling, prediction errors increased because of information loss during retransformation from the log scale. There is no consensus regarding the minimally important difference (MID), but one study reported that the mean MID for EQ-5D-3L is 0.074, while the corresponding mean MID estimate for SF-6D is 0.041 [24]. Because the MAE determined in this study (0.027) is smaller than these estimates, its magnitude is acceptable. All regression coefficients in the final model were statistically significant. At every level of EQ-5D-5L except level 4, mobility demonstrated the largest influence on utility estimates, similar to EQ-5D-3L valuation studies [18, 19, 25].

Differences between the coefficients of each level and dimension of the estimation model suggested enhanced discrimination properties of EQ-5D-5L. Compared with EQ-5D-3L, EQ-5D-5L demonstrated higher constants and, thus, tended to demonstrate lower quality weights for mild to moderate health states. However, the quality weights for severe health states on EQ-5D-5L tended to increase in comparison with EQ-5D-3L health states.

We explored the potential impact of inconsistent responses on our final model. We reanalyzed the data according to the number of inconsistent responses (e.g., entirely consistent responses, +1 inconsistent response, +2 inconsistent responses). Most differences in coefficients between datasets were not significant, and their absolute magnitudes were small.

One limitation of this study is the possibility that significant interviewer effects biased the final estimation model. The interviewer effect was statistically significant when dummy variables for interviewers were included in our final preferred model. Mean moves per TTO task, time on TTO task, and extent of negative valuation varied among interviewers. Although interviewers received 1 day of training prior to the survey, conducted a pilot study, and attended a quality control meeting after two sample surveys, a reinforced training program seems to be required for valuation studies. In addition, a ranking task prior to the main valuation task may help reduce logical inconsistencies. Another limitation of this study is the lack of validation of our model. As the sample size was insufficient, we included as much data as possible in the model estimation.

This study was performed according to the EuroQol Group’s final EQ-5D-5L valuation protocol. Our results are comparable to findings obtained in other countries. The final model built in this study appears to predict the utilities of the included health states which were directly valuated. This could be used to generate national tariff values for EQ-5D-5L health states and contribute to economic evaluations of healthcare interventions in Korea.