Introduction

Cancer is a prominent public health issue and the second most common cause of death worldwide [1, 2]. According to the International Agency for Research on Cancer (IARC), an estimated 19.29 million new cancer cases and 9.96 million cancer deaths occurred worldwide in 2020, resulting in 250 million disability-adjusted life years (DALYs) lost [2] and healthcare expenditures of up to $1.16 trillion [3].

Cancer is also a major concern in China. The incidence of cancer in China is 204 new cases per 100,000 population, which is slightly higher than the global average level of 200 new cases per 100,000 population [4]. According to the National Cancer Center, there were 4.57 million new cancer cases and 3 million cancer-related deaths in 2020 [5]. Cancer cases in China account for 23.7% of the total number of cancer cases in the world, while cancer deaths in China account for 30.2% of the total number of deaths in the world [2]. This is disproportionate to the number of Chinese people in the total global population (18.6%) [6]. IARC predicts that cancer incidence and mortality in China will continue to increase over the next 20–30 years [7].

Cost-utility analysis (CUA) has been widely used in cancer-related health decision making, including cancer drug research and development [8], diagnosis and treatment of cancer [9,10,11], medical insurance access and reimbursement [12], drug price negotiation [13], and other cancer-related fields [14]. Further, CUA is recommended as the preferred form of economic evaluation by many health technology assessment agencies, such as the National Institute for Health and Care Excellence (NICE) [15], the Canadian Agency for Drugs and Technologies in Health (CADTH) [16] and the French National Authority for Health (HAS) [17].

Quality-adjusted life years (QALYs) are usually used in CUA to quantify health effects. Estimating QALYs requires measuring health-related quality of life (HRQL) and valuing HRQL as health utility [18]. Health utility can be measured directly, for example with the standard gamble (SG) method, or indirectly, for example with multi-attribute utility instruments (MAUIs). MAUIs use a descriptive system to measure HRQL and a value set to determine the corresponding health utility. In general, generic MAUIs are recommended for measuring health utility. However, the descriptive systems of generic MAUIs lack cancer-specific domains such as nausea and vomiting and therefore may not be sensitive enough to capture changes in HRQL in cancer patients [19]. When the decision-making goal is the efficient allocation of funds within specific conditions, the content validity of generic measures may be compromised [15]. Notably, cancer patients have judged cancer-specific MAUIs to have superior content validity to generic instruments [20, 21]. Multiple national pharmacoeconomic evaluation guidelines suggest that condition-specific MAUIs should be used when there is evidence that generic MAUIs are insufficient to reflect the disease characteristics of a specific patient population [15, 22].

To address this concern, the Multi-Attribute Utility in Cancer (MAUCa) Consortium developed a cancer-specific MAUI, the EORTC QLU-C10D [23]. Compared with generic MAUIs, the QLU-C10D provides a more detailed description of the health status of cancer patients and may be more sensitive to changes in HRQL due to cancer treatment, as it contains seven symptoms commonly experienced by cancer patients, including nausea and fatigue [23]. Following the practice of developing country-specific value sets for generic MAUIs such as the EQ-5D, several national QLU-C10D value sets have been developed [24,25,26,27,28,29,30,31,32,33,34] and some studies are ongoing.

In China, value sets have been developed for a few generic MAUIs such as EQ-5D [35,36,37,38], EQ-5D-Y [39], and SF-6D [40]. However, there are currently no value sets for disease-specific MAUIs. Because China has the largest number of cancer cases and the heaviest cancer burden in the world, there is a strong need for a national value set for the QLU-C10D.

There is ongoing debate regarding whether to apply preferences from the general population or from patients when developing value sets [41]. Arguably, patient preferences are more appropriate for clinical decision-making because they provide a patient-centered perspective [42]. However, for decisions about the allocation of resources in a publicly funded health system, general population preferences are generally considered more appropriate [43]. For this reason, general population samples have been used in the development of QLU-C10D value sets in various countries. Importantly, previous research indicates that the general population can reliably respond to discrete choice experiment (DCE) questions in online QLU-C10D valuation surveys [44].

We followed the standard valuation protocol for the QLU-C10D developed by MAUCa and EORTC [24]. The feasibility and test-retest reliability of this valuation method have been established [44, 45]. QLU-C10D valuation studies in other countries have also adopted this methodology, which promotes consistency and comparability across countries. Compared to traditional methods like the time trade-off (TTO) or standard gamble, DCE offers greater comprehensibility and ease of application, making data collection more efficient and results more intuitive [46]. However, the online DCE survey in this study also has some constraints, as detailed in the limitations discussed later.

Therefore, the purpose of this study was to use DCE to estimate a Chinese value set for QLU-C10D based on the societal preferences in China.

Methods

The recruitment, survey and analysis methods were consistent with previous QLU-C10D valuation studies [24,25,26,27,28,29,30,31,32,33,34] conducted by the MAUCa Consortium and the EORTC Quality of Life Group, i.e. using the standard valuation protocol required for EORTC-endorsement of QLU-C10D value sets.

Study design and sample

This was a cross-sectional valuation study of a general population sample drawn from a Chinese online panel. Using the latest demographic data from the National Bureau of Statistics of China (NBSC) [47, 48] and the United Nations Statistics Division (UNSD) [49], we attempted quota sampling by age, gender, province/territory, residence, and education level to obtain a sample representative of the Chinese general population. However, only the quotas for gender and age were ultimately met. The sample covers urban and rural residents in all regions of mainland China, including 22 provinces (Hebei, Shanxi, Liaoning, Jilin, Heilongjiang, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, Shandong, Henan, Hubei, Hunan, Guangdong, Hainan, Sichuan, Guizhou, Yunnan, Shaanxi, Gansu, and Qinghai), 5 autonomous regions (Inner Mongolia, Guangxi, Tibet, Ningxia, and Xinjiang) and 4 centrally administered municipalities (Beijing, Tianjin, Shanghai, and Chongqing). All respondents met the following criteria: (1) aged between 18 and 79 years; (2) able to comprehend Chinese characters and without cognitive impairment, such as dementia; (3) of Chinese nationality and residing in mainland China; (4) able to provide informed consent.

Valuation survey

The online DCE survey closely followed the original survey created in collaboration with the MAUCa Consortium and the EORTC QOL Group, and was translated into Chinese by Weidong Huang and Nan Luo. Participant recruitment and survey implementation were managed by Survey Engine, a company that specializes in choice modelling methods such as DCE and has a record of successful cooperation with the MAUCa Consortium and the EORTC QOL Group [24,25,26,27,28,29,30,31,32,33,34]. Invitations were sent to potential respondents who met the above-mentioned criteria. Information explaining the purpose and content of the survey was included. We took continuing with the survey as implied consent. Participants were then asked to complete four sections of the survey: (1) a survey of self-reported health problems, including general health measured with the QLQ-C30, the Kessler-10 (mental health) questionnaire, and the EQ-5D-5L (all questionnaires were official Chinese versions endorsed by the questionnaire developers and had been forward-backward translated for use in the survey); (2) the DCE tasks, comprising 16 choice sets; (3) feedback questions on the DCE tasks, including four questions on task difficulty, clarity, and strategies used in the DCE tasks; (4) sociodemographic questions on gender, age, education, and residence.

The QLU-C10D health state classification system

The QLU-C10D is a cancer-specific MAUI developed by the MAUCa Consortium based on the QLQ-C30 [23]. The “10D” indicates that its health state classification system covers 10 dimensions of the QLQ-C30: Physical functioning, Role functioning, Social functioning, Emotional functioning, Pain, Fatigue, Sleep, Appetite, Nausea, and Bowel problems. Each dimension has four severity levels: level 1 (not at all), level 2 (a little), level 3 (quite a bit), and level 4 (very much). Table 1 shows the QLU-C10D health state classification system in detail. For example, health problems reported by a respondent that include “quite a bit” in physical functioning and role functioning, “very much” in social functioning, and “not at all” in the remaining dimensions would be coded as “3341111111”. All combinations of dimensions and levels describe a total of 4^10 = 1,048,576 unique health states.
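The coding scheme described above can be sketched in a few lines of Python. This is only an illustration: the dimension order follows the list in this section, while the short dimension labels, the function name `encode_state`, and the input format are assumptions made for the example and are not part of any official scoring material.

```python
# Dimension order as listed in the text; the short labels are illustrative.
DIMENSIONS = [
    "physical", "role", "social", "emotional", "pain",
    "fatigue", "sleep", "appetite", "nausea", "bowel",
]


def encode_state(levels):
    """Turn a {dimension: level} mapping into a 10-digit state string.

    Dimensions that are not mentioned default to level 1 ("not at all").
    """
    digits = []
    for dim in DIMENSIONS:
        level = levels.get(dim, 1)
        if level not in (1, 2, 3, 4):
            raise ValueError(f"level for {dim} must be 1-4, got {level}")
        digits.append(str(level))
    return "".join(digits)


# Example from the text: "quite a bit" (3) in physical and role functioning,
# "very much" (4) in social functioning, "not at all" (1) elsewhere.
state = encode_state({"physical": 3, "role": 3, "social": 4})

# Four levels on each of ten dimensions.
n_states = 4 ** len(DIMENSIONS)
```

Here `state` reproduces the code given in the text and `n_states` reproduces the number of unique health states.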

Table 1 Health state classification system of the QLU-C10D

DCE design

This study employed the DCE methodology previously described by King et al. [24] and used in valuation studies across various countries [25,26,27,28,29,30,31,32,33,34]. The QLU-C10D health state classification system encompasses over a million potential health states (4^10 = 1,048,576). We used an experimental design to select 960 choice sets, aiming to maximize statistical efficiency in estimating the utility model parameters. In each DCE task, respondents were asked to choose between two hypothetical scenarios, A and B, each consisting of 12 attributes: 11 attributes based on the 10 domains of the QLU-C10D (the physical functioning dimension was split into 2 attributes, one for ‘long walk’ and one for ‘short walk’ (see Fig. 1), to ease respondent burden and increase comprehensibility) and a survival time of 1, 2, 5, or 10 years. However, in both the experimental design and the data analysis, the physical functioning dimension was treated as a single four-level dimension. Each respondent completed 16 choice sets (each a pair of scenarios) randomly selected from the 960 choice sets, which were determined using optimal design theory. Each time a choice set was shown, the assignment of the two options to Health State A or B was randomized to mitigate ordering bias. The 11 attributes covering the 10 QLU-C10D domains were always presented in the same order, as a previous QLU-C10D DCE methodology study indicated no systematic bias in utility weights due to dimension order [50]. In addition, to minimize respondent burden, we began with a balanced incomplete block design (BIBD) to define which four of the ten QLU-C10D dimensions differed within each choice set, while the remaining attributes were kept equal. Previous research has evaluated the feasibility and reliability of the DCE format applied in this study [44, 45]. An example of a choice task is presented in Fig. 1; the version used in the survey was in Chinese (Figure A in the online supplementary material).
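The task allocation described above (16 choice sets per respondent out of 960, with the A/B position randomized) might be sketched as follows. The text does not describe the exact selection mechanism, so simple random sampling without replacement is an assumption here, as are the function name `draw_tasks`, the dictionary layout, and the seed.

```python
import random

N_DESIGN_SETS = 960     # choice sets in the experimental design (from the text)
N_PER_RESPONDENT = 16   # choice sets shown to each respondent (from the text)


def draw_tasks(rng):
    """Pick 16 distinct choice sets and randomize which option is shown as A.

    Assumes simple random sampling without replacement; the survey platform
    may have used a different allocation scheme.
    """
    chosen = rng.sample(range(N_DESIGN_SETS), N_PER_RESPONDENT)
    tasks = []
    for set_id in chosen:
        # Each design set holds two options, indexed 0 and 1 here.
        order = [0, 1]
        rng.shuffle(order)
        tasks.append(
            {"set_id": set_id, "shown_as_A": order[0], "shown_as_B": order[1]}
        )
    return tasks


# The seed is arbitrary and only makes the illustration reproducible.
tasks = draw_tasks(random.Random(2021))
```

Randomizing the position of the two options within each pair, rather than the order of the attributes, matches the description that the attributes were always shown in the same order.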

Fig. 1 An example choice set from the discrete choice experiment valuation task

Data analysis

Descriptive statistics (frequency and percentage, or mean and standard deviation) were calculated for the characteristics of the sample. Chi-square tests were used to assess the study sample’s representativeness of the Chinese population for gender and age (population data available from the Population Statistics for Mainland China; age, gender: UNSD, 2010 [48]), highest level of education (data available from the China Statistical Yearbook 2021 [44]), and rural/urban residence (data available from the National Bureau of Statistics of China, 2019 [47]).
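As a rough illustration of the representativeness check, the sketch below computes a Pearson chi-square goodness-of-fit statistic for sample counts against population proportions. The counts and population shares are made-up numbers chosen for the example, not the study data, and the function name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed_counts, population_props):
    """Pearson chi-square statistic for sample counts vs population shares."""
    n = sum(observed_counts)
    expected = [n * p for p in population_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    df = len(observed_counts) - 1
    return stat, df


# Hypothetical illustration only: these are NOT the study's figures.
observed = [1010, 993]        # made-up sample counts for two gender groups
population = [0.512, 0.488]   # made-up population shares
stat, df = chi_square_gof(observed, population)

# 3.841 is the 5% critical value of the chi-square distribution with 1 df.
consistent_with_population = stat < 3.841
```

A statistic below the critical value means the sample distribution is not significantly different from the population distribution at the 5% level for that characteristic.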

Following previous research [24,25,26,27,28,29,30,31,32,33,34], two approaches were used to analyze the DCE data: conditional logistic regression as the base analysis (Eq. 1) and mixed logistic regression as a supplementary analysis (Eq. 2).

In the first approach, modeling analysis was performed to fit data to Eq. (1), in which the utility of option j (A or B) in choice set s for respondent i is described by the following formula:

$$U_{isj} = \alpha\,\text{TIME}_{isj} + \beta X_{isj}^{\prime}\,\text{TIME}_{isj} + \varepsilon_{isj}$$
(1)

where TIME_isj is the survival time presented in option j (i.e. 1, 2, 5 or 10 years) and X_isj is a set of dummy variables for the levels of the corresponding health state. The errors ε_isj were assumed to be independent and identically Gumbel distributed. The conditional logit results were converted into utility decrements, i.e. a set of preference weights for each dimension that reflect the trade-off between HRQL and length of life, by taking the ratios of the health-state parameters β (a vector) to the time coefficient α (a scalar). If the utility decrements obtained in the DCE analysis did not increase monotonically with increasing severity level, a modified conditional logistic regression model was run instead, with each non-monotonic coefficient constrained to equal the coefficient of the adjacent level [24].
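The rescaling and the monotonicity check described above can be illustrated with a small Python sketch. It assumes the sign convention in which a negative ratio β/α represents a utility loss relative to level 1; the coefficient values are invented for the example, and the function names are hypothetical. The sketch only flags non-monotonic levels; it does not re-estimate a constrained model.

```python
def utility_decrements(alpha, betas):
    """Rescale health-state coefficients by the time coefficient.

    betas maps dimension -> [coef_level2, coef_level3, coef_level4];
    level 1 is the reference level with coefficient 0.
    """
    return {dim: [b / alpha for b in coefs] for dim, coefs in betas.items()}


def non_monotonic_levels(ratios):
    """List (dimension, level) pairs valued better than the previous level."""
    flagged = []
    for dim, values in ratios.items():
        previous = 0.0  # level 1 (reference)
        for level, value in zip((2, 3, 4), values):
            if value > previous:  # less negative than a milder level
                flagged.append((dim, level))
            previous = value
    return flagged


# Invented coefficients, for illustration only.
alpha = 0.5
betas = {
    "pain": [-0.02, -0.05, -0.10],
    "sleep": [-0.01, -0.03, -0.02],  # level 4 looks better than level 3
}
ratios = utility_decrements(alpha, betas)
flagged = non_monotonic_levels(ratios)
```

In this toy input only Sleep level 4 is flagged, which is the kind of level that would then be constrained to the adjacent level's coefficient in a re-estimated model.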

In the second approach, the DCE data were analyzed using a mixed logit model (Eq. 2). In this model, the coefficients α and β are assumed to be drawn from a distribution, which allows for heterogeneous preference patterns between respondents in the QLU-C10D domains. The terms γ_i and η_i reflect individual differences in mean utility, and the random terms are assumed to follow a multivariate normal distribution (0, Σ). More details can be found in the paper on the Australian valuation study [24].

$$U_{isj} = (\alpha + \gamma_i)\,\text{TIME}_{isj} + (\beta + \eta_i) X_{isj}^{\prime}\,\text{TIME}_{isj} + \varepsilon_{isj}$$
(2)

We estimated both unweighted and post-stratification weighted models. For the latter, we used iterative proportional fitting (raking), implemented with the ipfweight command in Stata, to adjust for respondent characteristics that were not representative of the Chinese general population, with the aim of obtaining estimates that better reflect the population.
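To make the weighting step more concrete, the following is a minimal Python sketch of iterative proportional fitting on two margins. It is not the authors' Stata code and does not reproduce the options of the ipfweight command; the toy records, target shares, field names, and function names are all assumptions made for illustration.

```python
def rake(records, targets, n_iter=50):
    """Iterative proportional fitting of respondent weights.

    records: list of dicts with categorical fields.
    targets: {field: {category: population_share}}, shares summing to 1.
    Returns one weight per record, with a mean weight of 1.
    """
    weights = [1.0] * len(records)
    for _ in range(n_iter):
        for field, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category of this field.
            current = {cat: 0.0 for cat in shares}
            for rec, w in zip(records, weights):
                current[rec[field]] += w
            # Scale weights so this field's margin matches its target share.
            weights = [
                w * shares[rec[field]] * total / current[rec[field]]
                for rec, w in zip(records, weights)
            ]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


def weighted_share(records, weights, field, category):
    """Weighted proportion of records falling in one category."""
    hit = sum(w for rec, w in zip(records, weights) if rec[field] == category)
    return hit / sum(weights)


# Toy sample that over-represents highly educated, urban respondents.
records = (
    [{"edu": "high", "res": "urban"}] * 6
    + [{"edu": "high", "res": "rural"}] * 1
    + [{"edu": "low", "res": "urban"}] * 2
    + [{"edu": "low", "res": "rural"}] * 1
)
# Made-up population shares, for illustration only.
targets = {
    "edu": {"high": 0.3, "low": 0.7},
    "res": {"urban": 0.6, "rural": 0.4},
}
weights = rake(records, targets)
```

After raking, the weighted shares of education and residence match the target margins, at the cost of large weights for under-represented groups, which can make the weighted model estimates less stable.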

Data quality was assessed using the following metrics. First, we counted the number of respondents who consistently chose either all A’s or all B’s across the choice sets, and then re-estimated the final applied model excluding their data. Second, we divided respondents into deciles based on the total survey time, ran a conditional logit analysis on the DCE data within each decile, and plotted the pseudo-R² and the number of statistically significant coefficients for each decile.
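The two data quality checks can be sketched as below. This is a simplified illustration under assumed data structures (a list of "A"/"B" choices per respondent and a list of total survey times); the function names are hypothetical, and the per-decile model fitting itself is not shown.

```python
def is_flatliner(choices):
    """True if a respondent picked the same option in every choice set."""
    return len(set(choices)) == 1


def decile_groups(times):
    """Assign each respondent a decile (0 = fastest 10%) by total survey time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    groups = [0] * len(times)
    for rank, idx in enumerate(order):
        groups[idx] = rank * 10 // len(times)
    return groups


# Illustrative inputs only.
always_a = ["A"] * 16
mixed = ["A", "B"] * 8
times = list(range(20, 0, -1))  # 20 respondents, total times 20 down to 1
groups = decile_groups(times)
```

Each decile group could then be passed to a separate conditional logit analysis to compare model fit between faster and slower respondents.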

Results

Sample characteristics and representativeness

In the main valuation survey, 5244 individuals initially opted in. Of these, 3248 individuals met the pre-defined quotas and were able to proceed with the survey. Among these, 2003 (61.7%) completed all DCE choice sets and were included in the valuation analysis. Participants were recruited and surveyed between May 27, 2021 and July 6, 2021. Participants completed all choice sets with a mean response time of 12.2 minutes (standard deviation (SD) = 7.4). Owing to quota sampling, the study sample was representative of the general Chinese population with regard to age and gender, but not in terms of education and residence. More than one third of the participants (37.3%) suffered from a chronic disease. This was significantly higher than the percentage of people with chronic disease in the general Chinese population (34.2%) [51]. The highest and lowest proportions of “no problem” reported on the EQ-5D-5L dimensions were 90.9% for ‘self-care’ and 59.4% for ‘anxiety/depression’, respectively (Table 2).

Table 2 Sociodemographic and clinical characteristics

Utility estimates

Given the differences in education level and area of residence between the survey sample and the general population, the conditional logit model results were reported in terms of both unweighted QLU-C10D utility decrements (Table 3) and QLU-C10D utility decrements post-stratification weighted based on education level and residence (Table 4).

Table 3 Unweighted QLU-C10D utility decrements (unconstrained and corrected for monotonicity)
Table 4 Post-stratification weighted QLU-C10D utility decrements (weighted for education and residence)

Considering first the unweighted results, in the “Unconstrained” column of Table 3, the effects of most dimension levels were negative, with the exception of the level 2 coefficients for Role functioning and Social functioning, which were positive (0.008 and 0.002). In general, incremental moves to the next worse level were associated with significantly larger decrements. However, there were inconsistencies between level 3 and level 4 in dimensions such as Fatigue, Sleep, and Nausea, i.e. the more severe level 4 was preferred over the less severe level 3. As indicated in the second column of Table 3, we therefore imposed monotonicity by constraining the affected adjacent levels to share the same coefficient and re-estimated the model to eliminate these non-monotonicities. The utility weights in the unconstrained and monotonicity-corrected models were generally similar. Figure 2 depicts the change in levels after correction for each of the dimensions.

Fig. 2 Unweighted QLU-C10D utility decrements (after correction for each of the dimensions)

Since the sample was not representative of the Chinese general population for education level and residence, the results incorporating post-stratification weights based on education level and residence are presented in Table 4. Compared to the unweighted results, the post-stratification weighted results contained more disordered utility decrements (non-monotonicity), and irregularities remained even after the coefficients were constrained. For example, some decrements were rather large and there were quite a few zero decrements (see Fig. 3). We therefore recommend using the unweighted results as the value set for China.

Fig. 3 QLU-C10D utility decrements, analysis with weighting for residence and education (after correction for each of the dimensions)

For completeness, the mixed logit model was estimated as an exploratory extension to investigate whether preferences were homogeneous across the sample, which the conditional logit model cannot capture. Table A in the online supplementary material contains the results of the mixed logit model. Overall, the results of the mixed logit model were comparable to those of the conditional logit analysis, though the non-monotonic dimensions were different. The standard deviations of the vast majority of dimensions were highly statistically significant, reflecting considerable heterogeneity in preferences between respondents. However, since preference heterogeneity is not particularly relevant for health service decision-making, we used the results from the conditional logit model for the utility scoring algorithm, consistent with other country-specific QLU-C10D utility algorithms [24,25,26,27,28,29,30,31,32,33,34].

Data quality

Figures B and C in the online supplementary material detail the findings regarding data quality. In summary, the results of the data quality check indicate that our dataset was of high quality overall. Excluding participants with the fastest response times or those who consistently chose only the first or only the second of the options in a choice set regardless of the attribute levels did not substantively affect the model estimates, confirming the stability and reliability of our results.

QLU‑C10D utility value set

We elected to use the unweighted results from Model 1 (in Table 3) to generate the QLU-C10D utility values. Since the QLU-C10D is not recommended to be used separately from the QLQ-C30, in order to calculate the QLU-C10D utility scores, responses to the QLQ-C30 need to be converted to the QLU-C10D dimensions and levels (see Table 1 for details), and the individual utility score is calculated as follows:

$${\text{QLU-C10}}{{\text{D}}_p}=1 - \sum\limits_{{d=1}}^{{10}} {{w_{dl}}} \left| {{\text{QLU-C10}}{{\text{D}}_{dlp}}} \right.$$
(3)

where w is the utility weight for each level l of dimension d of the QLU-C10D. For example, if a person describes their health state as 4323132342, their utility score in the Chinese value set would be: 1 - 0.316 - 0.073 - 0 - 0.037 - 0 - 0.046 - 0.038 - 0.041 - 0.065 - 0.002 = 0.382.
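The scoring rule in Eq. (3) can be sketched in Python using only the decrements that appear in the worked example above. The lookup table is therefore deliberately partial; the complete set of weights is given in Table 3. The sketch assumes that the terms of the worked calculation follow the dimension order of the health state classification system; the dimension labels and the function name `qlu_c10d_utility` are illustrative.

```python
# Dimension order of the health state code; the short labels are illustrative.
DIMENSIONS = [
    "physical", "role", "social", "emotional", "pain",
    "fatigue", "sleep", "appetite", "nausea", "bowel",
]

# Decrements copied from the worked example in the text, assuming its terms
# follow the dimension order above. Only the levels that occur in state
# "4323132342" are included, so this table is partial.
EXAMPLE_DECREMENTS = {
    ("physical", 4): 0.316,
    ("role", 3): 0.073,
    ("social", 2): 0.0,
    ("emotional", 3): 0.037,
    ("fatigue", 3): 0.046,
    ("sleep", 2): 0.038,
    ("appetite", 3): 0.041,
    ("nausea", 4): 0.065,
    ("bowel", 2): 0.002,
}


def qlu_c10d_utility(state, decrements):
    """Utility = 1 minus the sum of decrements for the levels in the state."""
    if len(state) != len(DIMENSIONS):
        raise ValueError("state must have one digit per dimension")
    total = 0.0
    for dim, digit in zip(DIMENSIONS, state):
        level = int(digit)
        if level == 1:
            continue  # level 1 is the reference level and has no decrement
        total += decrements[(dim, level)]
    return 1.0 - total


utility = round(qlu_c10d_utility("4323132342", EXAMPLE_DECREMENTS), 3)
```

A state with level 1 on every dimension scores 1, and any level missing from the partial table raises a `KeyError` rather than silently returning a wrong value.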

Discussion

This study estimated a QLU-C10D value set for mainland China, and it has implications for health technology assessment. First, it significantly strengthens the QLU-C10D as an international utility instrument by expanding its application to the largest cancer population in the world. Second, it provides a new option for CUA of cancer treatment in China. As the descriptive system of the QLU-C10D is derived from the QLQ-C30 [52], a cancer-specific HRQL questionnaire widely used around the world, including in China [53], historic QLQ-C30 data can be used to calculate QALYs. This is a great advantage, as it may save the resources and time needed to collect primary HRQL data.

This study also has important implications for health utility research. First, the new value set will facilitate cross-country comparisons of health preferences for cancer-related health outcomes between eastern and western populations. Second, this study suggests that valuation studies via online survey are feasible in China. Previous valuation studies for EQ-5D and SF-6D in China [35,36,37,38,39,40] used face-to-face interviews to collect preference data, which is both expensive and time-consuming. Compared to those studies, the current study was much cheaper and faster, and the research sample covers urban and rural residents in all regions of mainland China. This coverage is unprecedented in the development of value sets in China, with its vast territory and numerous provinces, although online surveying has its limitations (as discussed below). Among the other countries for which QLU-C10D value sets are currently available, only Canada [25] and Japan [34] have attempted to quota-sample by region or province of residence in addition to gender and age.

Because all QLU-C10D country-specific value sets are based on a standardized valuation methodology, direct comparisons can be made. The key similarity is that the greatest utility decrement was observed in the dimension of Physical functioning [24,25,26,27,28,29,30,31,32,33,34], followed by Pain and Role functioning, while smaller decrements occurred in cancer-specific dimensions such as Nausea/Vomiting. Also similar to other QLU-C10D valuation studies, non-monotonicity was observed mainly in the dimensions with the smallest utility decrements (i.e. Fatigue, Sleep, and Nausea/Vomiting). The QLU-C10D value range in China (0.083, 1) is similar to that in Italy [28], Poland [28] and the USA [31] (pit states of 0.025, 0.048 and 0.032, respectively), indicating that the worst health state is better than being dead (0 indicates a health state equivalent to being dead). However, this range is narrower than in other countries where the worst QLU-C10D health state was considered worse than being dead, with France having the most extreme pit state value (-0.411) [29].

It is worth noting that the utility decrement for Emotional functioning is the smallest in the Chinese QLU-C10D value set, which differs from all other QLU-C10D value sets except those for Germany [26] and Austria [28], which are similar to China’s. The weights of mental health-related dimensions are also relatively low in the Chinese generic value sets [35,36,37,38,39,40], including the Anxiety/Depression dimension in the EQ-5D-3L value set established by Liu et al. [38]. It has been suggested that people in Western countries place a higher value on mental health than people in Eastern countries, perhaps because mental illness has been destigmatized there in recent decades. A multi-country study found that Chinese people are less likely to report mental illness than European Americans and Chinese Americans [54]. This is probably an example of socio-culturally driven differences. Other sources of between-country differences in QLU-C10D values may include population demographics, socio-cultural factors, perceptions of health status across countries, and linguistic differences introduced during translation of the DCE survey [28, 55]. These differences support the use of country-specific value sets.

The Chinese QLU-C10D utility values estimated in this study showed both similarities to and differences from those generated in China with generic MAUIs, specifically the EQ-5D-5L [35] and SF-6Dv2 [40]. The similarity is that the greatest utility decrements occur in Physical functioning and Pain in both the cancer-specific and the generic MAUIs. This is not surprising, as all of these value sets are based on the preferences of the general public in China. The difference lies in the range of the utility values. The Chinese EQ-5D-5L [35] and SF-6Dv2 [40] value ranges are [-0.391, 1] and [-0.277, 1], respectively, which are wider than that of the QLU-C10D. These differences can be attributed to the different health state classification systems and valuation methods. The two Chinese generic value sets were developed using a TTO technique, which requires respondents to trade off the duration of a health state. Valuation methods have a significant impact on the resultant utility values [56].

Our study has some limitations. Firstly, as the survey was self-administered online, we relied on respondents’ self-reported experience to assess their understanding of and engagement with the valuation tasks. Given the cognitive demands of the DCE tasks, some respondents might have misunderstood them or used heuristics, potentially introducing bias. However, the self-reported experience was satisfactory, and Gamper et al. have shown good reliability of the QLU-C10D valuation [44]. Secondly, individuals with higher education and those living in urban areas were overrepresented in the study sample, a recurring problem in QLU-C10D studies and an inevitable limitation of online surveys due to the exclusion of individuals with poor computer skills or low literacy. To mitigate this issue, we calculated post-stratification weights (based on education level and residence), but the modelling results were poor. Some valuation studies using the TTO method in China have shown that education level does not have an impact on health preferences [57, 58]. Future studies are needed to investigate whether valuations in DCE tasks are driven by the sociodemographic characteristics of respondents.

Conclusions

This study provides the first utility weights and value set for China for the new cancer-specific MAUI, the EORTC QLU-C10D. This new value set will significantly extend the international applicability of the QLU-C10D and will provide a new utility instrument for economic evaluation (especially CUA) of cancer treatment in China. Future studies are needed to compare it with existing generic MAUIs (e.g., EQ-5D-5L), to evaluate its performance in target populations, and to provide a scientific basis for its widespread use.