Key Points for Decision Makers

The goodness-of-fit of mapping onto the Short Form 6D (SF-6D) is significantly better than that of mapping onto the 5-level EQ-5D (EQ-5D-5L); the SF-6D utilities derived from the mapping functions are therefore more appropriate for calculating quality-adjusted life years.

The response mapping used in this study can substantially improve prediction accuracy, especially for the SF-6D.

It should be noted that the best models were not able to predict negative EQ-5D-5L utility scores, hence they may not predict well for the poor health states.

1 Introduction

Breast cancer is the most common and second deadliest cancer in women worldwide, and it is estimated that 1.4 million women a year receive a diagnosis of breast cancer [1, 2]. In 2018, it was the most prevalent cancer and the fifth leading cause of cancer death in Chinese women [3]. The treatment for breast cancer patients resulted in a substantial financial burden on the Chinese health care system [4].

To prioritize health care resource allocation, health economic evaluation, especially cost-utility analysis (CUA), has become a preferred method [5]. In CUA, effectiveness is measured using quality-adjusted life years (QALYs), which are calculated by multiplying life-years by health state utility scores. The European Organization for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ)-BR53 (which consists of the QLQ-C30 and QLQ-BR23) is one of the most widely used disease-specific outcome measures in breast cancer studies [6,7,8]; however, it is not a preference-based instrument and cannot be used to calculate QALYs.
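The QALY calculation described above can be illustrated with a minimal sketch. The function name `qalys` and the patient profile are hypothetical and are not taken from the study data.

```python
def qalys(periods):
    """Sum utility-weighted time over (years, utility) pairs."""
    return sum(years * utility for years, utility in periods)

# Hypothetical profile: 2 years at utility 0.83, then 1.5 years at utility 0.65.
profile = [(2.0, 0.83), (1.5, 0.65)]
total = qalys(profile)  # 2.0 * 0.83 + 1.5 * 0.65
```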

Mapping (or ‘crosswalk’) provides a solution for predicting health state utility scores from a non-preference-based quality of life instrument [9]. The predicted utility values can then be analyzed using standard methods for trial-based analyses, or summarized for each health state within an economic model [10, 11]. This method has been used successfully to predict 3-level EQ-5D utilities in breast cancer patients [12, 13]. This study aimed to develop mapping algorithms from the QLQ-BR53 onto either the 5-level EQ-5D (EQ-5D-5L) or the Short Form 6D (SF-6D) utility scores based on breast cancer patients in China. The output from this study will facilitate future CUAs in which the QLQ-C30, the QLQ-BR23, or the full 53-item QLQ-BR53 is included.

2 Materials and Methods

2.1 Study Population

A total of 621 female inpatients with breast cancer were recruited from Qingdao Municipal Hospital in China between October 2014 and February 2015. The inclusion criteria were a diagnosis of breast cancer, age 18 years or older, and provision of written consent to participate in the study before the interview. The exclusion criteria were unwillingness to provide informed consent, inability to understand the questionnaires, breast cancer in combination with other serious diseases, or age under 18 years at the time of the survey. Information on sociodemographic characteristics, clinical data, and health-related quality of life (HRQoL) was collected using two methods in a single visit: (1) a face-to-face interview using an essential information questionnaire as well as three standard instruments, i.e. QLQ-BR53, EQ-5D-5L and SF-36 (used to derive the SF-6D); and (2) a review of the patients’ medical records for clinical information. Among the 621 respondents, 14 patients who had missing values on key questions were excluded from the mapping analysis. For the remaining 607 patients, there were some missing values in four QLQ-BR53 domain scores. Informed consent was obtained from all participants before the completion of the instrument survey. Ethical approval (reference no. 20131002) was obtained from the Ethics Review Board of the School of Public Health, Shandong University, and the research adhered to the tenets of the Declaration of Helsinki.

2.2 Instruments

2.2.1 QLQ-BR53

The QLQ-BR53 consists of two instruments: (1) the QLQ-C30, which contains 30 items and covers five functional scales, three symptom scales, a global health status/QOL scale, and six single items; and (2) the QLQ-BR23, which contains 23 items divided into five multi-item scales assessing systemic therapy side effects, arm symptoms, breast symptoms, body image and sexual functioning, as well as three single items evaluating sexual enjoyment, hair loss and future perspectives [6]. The score for each subscale was calculated from the responses to all items in that subscale according to the official EORTC scoring manual [14]. The raw scores were then linearly transformed to a 0–100 scale, with a higher score indicating a better quality of life for the functioning and global health status scales, but a poorer quality of life for the symptom scales. The QLQ-BR53 shows reasonable reliability, validity and responsiveness, and can be used to measure quality of life for Chinese patients with breast cancer [7].
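The linear transformation to the 0–100 scale can be sketched as follows. This is a minimal illustration that assumes the standard EORTC approach, in which the raw score is the mean of the item responses and the item range is 3 for items scored 1–4 (6 for the 7-point global health items). The function names and example responses are hypothetical, and missing-item handling is not shown.

```python
def raw_score(items):
    """Mean of the item responses in one scale."""
    return sum(items) / len(items)

def functional_scale(items, item_range=3):
    """Functional scales: a higher score means better functioning."""
    return (1 - (raw_score(items) - 1) / item_range) * 100

def symptom_scale(items, item_range=3):
    """Symptom scales and single items: a higher score means worse symptoms.
    Global health status uses the same form with item_range=6."""
    return (raw_score(items) - 1) / item_range * 100

# Hypothetical responses on the 1-4 scale.
pf = functional_scale([1, 1, 2, 1, 1])  # about 93.3
pa = symptom_scale([2, 3])              # 50.0
```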

2.2.2 EQ-5D-5L

The EQ-5D-5L is a common and validated instrument consisting of five dimensions of health (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), which are characterized by five levels (no problems, slight problems, moderate problems, severe problems, and extreme problems) [15]. The EQ-5D-5L was scored using a Chinese-specific tariff developed based on a time trade-off method. The Chinese tariff has a theoretical range of scores from − 0.149 to 1.0 [16].

2.2.3 SF-6D

The SF-6D was constructed from 11 items selected from the SF-36 [17]. The SF-6D is based on a six-dimensional health state classification that assesses physical functioning, role limitations, social functioning, bodily pain, mental health, and vitality. Each dimension of the SF-6D has 4–6 levels and can be used to describe 18,000 health states [18]. In the absence of the mainland China utility algorithm, the Hong Kong tariff was used for this study, and has a theoretical range of scores from 0.315 to 1.0 [19].
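As a quick check on the size of the two descriptive systems, and assuming the usual level counts of the SF-36-derived SF-6D (physical functioning 6, role limitations 4, social functioning 5, bodily pain 6, mental health 5, vitality 5), the numbers of health states are:

$$\begin{aligned} {\text{SF-6D}} & : 6 \times 4 \times 5 \times 6 \times 5 \times 5 = 18{,}000 \\ {\text{EQ-5D-5L}} & : 5^{5} = 3125 \end{aligned}$$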

2.3 Statistical Analysis

2.3.1 Crosswalks

This study was conducted in accordance with the ‘MApping onto Preference-based measures reporting Standards’ (MAPS) checklist [20] (see Online Resource 1) and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Good Practices for Outcome Research Task Force report on mapping to estimate health state utilities [9], as well as a systematic review on mapping studies in the annual report of the National Institute for Health and Care Excellence (NICE) [21].

Patient characteristics were described using mean ± standard deviation (SD) or percentage in the sample. We tested for normality of variables using the Shapiro–Wilk test. The degree of conceptual overlap between the source and the target variables was examined using Spearman’s rank correlation. Three model specifications were considered in this study. Considering the potential multicollinearity among the QLQ-BR53 items in the regression, the mapping functions focused on the dimension scores of the QLQ-BR53. In Model 1, only dimensions from the QLQ-C30 were considered; in Model 2, only dimensions from the QLQ-BR23 were considered; and in Model 3, all dimensions from the QLQ-C30 and QLQ-BR23 were considered. One additional demographic characteristic that is commonly available in other datasets, i.e. age, was also considered in the mapping functions. Potential non-linear effects of the dimension scores can be captured by some of the econometric methods introduced below. A stepwise regression technique was used to help choose the final statistically significant (i.e. p < 0.05) predictors.

Both direct and indirect (response) mapping analyses were conducted. In this study, six commonly used statistical methods were adopted for direct mapping and one method was used for indirect mapping [22]. It should be noted that when mapping onto the SF-6D utility, two censored models were not used since there was no censoring issue for the SF-6D utility.

  1. The ordinary least squares (OLS) model was used to estimate the unknown parameters by minimizing the sum of squared errors from the data [23]. This is the most widely used method in mapping studies [24].

  2. The Tobit model takes better account of the censored nature of EQ-5D data, handles truncated data, and can accommodate skewed data by setting the upper limit to 1 [25].

  3. The censored least absolute deviation (CLAD) model is a censored model that estimates conditional medians, such that it is robust to distributional assumptions and heteroscedasticity [26].

  4. The generalized linear model (GLM) allows for a non-normal distribution of the dependent variable (e.g. left/negatively skewed utility scores) [27]. In this study, the GLM was estimated using a Gaussian distribution with a log-link function, which was identified as producing the best goodness-of-fit among different combinations of family and link functions.

  5. The robust MM-estimator (MM) model is designed to deal with some limitations of traditional regression methods, including heteroscedasticity and the presence of outliers. It has been shown to have both a high breakdown point (i.e. the percentage of incorrect observations an estimator can handle before giving an incorrect result) and a high efficiency [28]. It was first introduced into the mapping literature by Chen and colleagues for both adolescent and adult samples, and was found to have good performance [29, 30].

  6. The finite mixtures of beta regression model (BETAMIX) is a general version of the truncated inflated beta regression model for variables with truncated supports either at the top or bottom of the distribution [31]. The model is robust to skewness and can estimate both unimodal and bimodal utilities [32].
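As an illustration of the simplest of the direct mapping methods listed above, the following minimal sketch fits an OLS mapping function with NumPy. The data are simulated, and the variable names and coefficient values are hypothetical; this is not the study's Stata implementation, and the other estimators (Tobit, CLAD, GLM, MM, BETAMIX) require dedicated routines that are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated, hypothetical data: two dimension scores rescaled to 0-1.
X = rng.uniform(0, 1, size=(n, 2))
y = 0.6 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 0.05, n)
y = np.minimum(y, 1.0)  # utilities cannot exceed full health

# OLS: choose coefficients that minimize the sum of squared errors.
A = np.column_stack([np.ones(n), X])  # add an intercept column
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict, then truncate to the theoretical range of the target tariff
# (here the EQ-5D-5L Chinese tariff range of -0.149 to 1.0).
predicted = np.clip(A @ beta_hat, -0.149, 1.0)
```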

The choice of the above methods covers a wide range of potential challenges when deriving mapping functions. For instance, the MM and CLAD are robust estimators that are less influenced by potential outliers in the dataset and can cope with potential heteroscedasticity [29]. The Tobit and CLAD models can cope with the censoring issue that arises when a large proportion of health state utilities in the dataset equal 1 [33]. The GLM and BETAMIX estimators are typically used to handle skewed distributions in the study data [34]. Although, theoretically, different estimators have their own strengths and may be better suited to the study data, empirical evidence is crucial for justifying the optimal estimator in mapping studies. Even when its assumptions are violated, the OLS can still perform better than more advanced methods [10, 11].

For indirect mapping, the response level of each dimension was first predicted, and the relevant tariff was then used to generate the overall utility score. Since the response level is an ordinal variable (e.g. no problems, slight problems, moderate problems, severe problems, and extreme problems), a multinomial or ordered logit model (OLOGIT) is commonly used [35]. Following Chen et al. [36], an OLOGIT [37] was adopted in this study, and the corresponding tariffs described in Section 2.2 were then used to calculate the overall EQ-5D-5L/SF-6D utility scores.
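The two steps of response mapping can be sketched for a single dimension as follows. This is a minimal illustration under stated assumptions: the cutpoints, linear predictor, and tariff decrements are hypothetical values rather than estimates from this study, and the ordered logit is written in the common parameterization P(Y ≤ k) = 1 / (1 + exp(−(cut_k − xb))).

```python
import numpy as np

def ologit_probs(xb, cuts):
    """Probability of each response level under an ordered logit model."""
    cdf = 1.0 / (1.0 + np.exp(-(np.asarray(cuts, dtype=float) - xb)))
    cdf = np.concatenate([[0.0], cdf, [1.0]])
    return np.diff(cdf)  # P(level k) = P(Y <= k) - P(Y <= k-1)

# Hypothetical cutpoints (four cutpoints for five levels) and linear predictor.
cuts = [-1.0, 0.5, 2.0, 3.5]
probs = ologit_probs(xb=0.2, cuts=cuts)

# Step 1: predicted response level (here the most likely level, 1-5).
level = int(np.argmax(probs)) + 1

# Step 2: apply a tariff; these decrements per level are hypothetical.
decrements = np.array([0.0, 0.05, 0.10, 0.25, 0.35])
decrement_for_level = decrements[level - 1]
expected_decrement = float(probs @ decrements)  # probability-weighted alternative
```

Using the most likely level and using the probability-weighted decrement are two possible ways to combine the predicted responses with a tariff; the text above does not specify which was used, so both are shown.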

2.3.2 Goodness-of-Fit Indicators

Predictive ability was mainly assessed based on the mean absolute error (MAE) and the mean squared error (MSE). Three additional indicators were also considered: Lin’s concordance correlation coefficient (CCC), and the proportions of predicted utilities deviating from observed values by an absolute error > 0.05 and > 0.1. For all indicators except the CCC (for which a higher value indicates better performance), a lower value indicates better mapping performance. In cases where the prediction exceeded the theoretical range of the target utility (e.g. the predicted maximum utility was above 1.0), to mimic the real-life solution, those predictions were truncated at the theoretical maximum and/or minimum utility scores before the goodness-of-fit statistics were calculated. The goodness-of-fit results without this adjustment can be found in Online Resource 1.
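The five indicators, including the truncation step, can be computed as in the following minimal sketch. The function name and the example values are hypothetical; the CCC uses Lin's formula with population (1/n) variances and covariance.

```python
import numpy as np

def goodness_of_fit(observed, predicted, lower, upper):
    """MAE, MSE, Lin's CCC, and the shares of absolute errors above 0.05 and 0.1."""
    obs = np.asarray(observed, dtype=float)
    # Truncate predictions to the theoretical range before scoring.
    pred = np.clip(np.asarray(predicted, dtype=float), lower, upper)
    err = pred - obs
    abs_err = np.abs(err)
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    ccc = 2 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)
    return {
        "MAE": abs_err.mean(),
        "MSE": (err ** 2).mean(),
        "CCC": ccc,
        "AE>0.05": (abs_err > 0.05).mean(),
        "AE>0.1": (abs_err > 0.1).mean(),
    }

# Hypothetical observed and predicted EQ-5D-5L utilities.
fit = goodness_of_fit([1.0, 0.8, 0.6, 0.4], [1.05, 0.82, 0.68, 0.55],
                      lower=-0.149, upper=1.0)
```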

Two internal validation procedures were performed. In the first (Validation I), the whole sample was randomly divided into five groups. In each round, four groups (80% of the sample) were used to estimate the mapping algorithm and the remaining group (20%) was used to predict the health state utility with that algorithm. This procedure was repeated five times, so that every group was used for both estimation and prediction. In the second internal validation procedure (Validation II), a random sample of 300 patients was drawn from the full sample to validate the mapping functions. The final mapping algorithm was developed using the full data, based on the optimal statistical methods identified from the validation exercises. All analyses were conducted in Stata version 14.0 (Stata Corp LLC, College Station, TX, USA).
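The two validation designs can be sketched as index splits. This is an illustrative sketch only: the function name and seeds are hypothetical, and the study's analyses were run in Stata.

```python
import numpy as np

def five_fold_indices(n, k=5, seed=0):
    """Yield (estimation, prediction) index arrays for k-fold validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        prediction = folds[i]
        estimation = np.concatenate([folds[j] for j in range(k) if j != i])
        yield estimation, prediction

# Validation I: five rounds over the 607 patients.
splits = list(five_fold_indices(607))

# Validation II: one random subsample of 300 patients, drawn without replacement.
subsample = np.random.default_rng(1).choice(607, size=300, replace=False)
```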

2.3.3 Model Comparisons

In this study, the five indicators mentioned above (MAE, MSE, CCC, AE > 0.05 and AE > 0.1) were used to evaluate the predictive accuracy of the models. The optimal econometric method for each model specification was identified based on the number of times it produced the best goodness-of-fit indicator across the two validation procedures (i.e. a total of ten indicators).
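The counting rule can be sketched as follows. The function name, the handling of ties (the first method listed wins), and the indicator values are assumptions made for illustration and are not results from this study.

```python
def count_wins(results, higher_is_better=("CCC",)):
    """Count how often each method has the best value of an indicator.

    results maps method -> {indicator: value}. Ties go to the first method.
    """
    wins = {method: 0 for method in results}
    indicators = next(iter(results.values())).keys()
    for indicator in indicators:
        pick = max if indicator in higher_is_better else min
        best = pick(results, key=lambda m: results[m][indicator])
        wins[best] += 1
    return wins

# Hypothetical indicator values for three methods in one validation.
results = {
    "OLS":   {"MAE": 0.10, "MSE": 0.020, "CCC": 0.70},
    "Tobit": {"MAE": 0.09, "MSE": 0.021, "CCC": 0.72},
    "CLAD":  {"MAE": 0.11, "MSE": 0.019, "CCC": 0.69},
}
wins = count_wins(results)
```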

3 Results

3.1 Patient Characteristics

The sociodemographic and clinical characteristics of 607 patients are presented in Table 1. The mean age of participants was 49.0 years (SD 9.8) and 32.8% were either illiterate or completed only primary school education. The majority (88.6%) of participants were married and 50.7% lived in the city. Approximately two-thirds (63.6%) of patients had breast cancer for a duration of up to 3 years; 48% were classified as TNM stage III and IV and approximately half (55%) had a premenopausal status.

Table 1 Patient characteristics (N = 607)

3.2 Descriptive Statistics and Conceptual Overlap

Table 2 shows the descriptive statistics on the quality of life instruments. The mean utility score derived from the EQ-5D-5L was 0.828 (SD 0.184), and 0.646 (SD 0.125) for the SF-6D. Among all the 23 dimensions of QLQ-BR53, the mean score of the highest dimension (i.e. sexual functioning) was 88.963 (SD 15.933), while the dimension with the lowest score was diarrhea at 10.434 (SD 18.886). The overlap between QLQ-BR53 and the utility scales (EQ-5D-5L and SF-6D) is presented in Table 3. Among the QLQ-BR53 dimensions, most of the functioning dimensions (e.g. physical functioning and role functioning) and symptom dimensions (e.g. pain and arm symptoms) generally provided stronger (r ≥ 0.5) correlation with the subscales of EQ-5D-5L and SF-6D. Compared with dimension scores, the utility scores generally provided a strong correlation (r ≥ 0.5, highlighted in bold) with most QLQ-BR53 subscales. All correlations were statistically significant (p < 0.01).

Table 2 Descriptive statistics for health utility and HRQoL variables
Table 3 Spearman rank correlations of QLQ-BR53 dimension scores with EQ-5D-5L and SF-6D utility values (N = 607)

3.3 Goodness-of-Fit of Mapping Functions

Tables 4 and 5 present a summary of the goodness-of-fit statistics for mapping onto the EQ-5D-5L and SF-6D, based on the two validation analyses (i.e. a five-fold pooled validation and a random sample validation), as well as the full-sample analyses. The identified optimal econometric methods varied across the three model specifications.

Table 4 Goodness-of-fit results from validation analyses
Table 5 Goodness-of-fit results from the full sample (N = 607)

When mapping onto the EQ-5D-5L, the Tobit model had the best performance across both validation procedures for Model 1. The indirect mapping via OLOGIT showed good performance, especially in the second validation procedure. In Models 2 and 3, the CLAD had the best mapping performance.

Regarding the full-sample results reported in Table 5, the CLAD estimates continued to show the best performance in Models 2 and 3. The performance for Model 1 was mixed: the Tobit model had the best performance based on the CCC, while the indirect mapping via OLOGIT had the best performance on MAE and MSE. However, it should be noted that since no patients reported the fifth level of the anxiety/depression dimension in the EQ-5D-5L, the indirect mapping cannot predict this level. Based on the above considerations, the Tobit model was chosen as the best mapping function onto the EQ-5D-5L for Model 1, and the CLAD model was chosen for Models 2 and 3. It should also be noted that, regardless of the model specification, all estimators tended to overestimate the lower limit of the EQ-5D-5L utility score. Furthermore, among all estimators, only the BETAMIX and OLOGIT models were able to predict negative EQ-5D-5L utility scores.

Regarding the performance of mapping onto the SF-6D, the indirect mapping based on OLOGIT had the best performance for Models 1 and 3, while the MM-estimator had the best performance for Model 2, in the two validation analyses (Table 4). For Models 1 and 3, except for OLOGIT, the GLM estimates had the best mapping performance among the direct mapping functions. Regarding the full-sample analyses, the above conclusion holds for all three models (Table 5).

Figure 1 shows the scatter plots of observed versus predicted EQ-5D-5L/SF-6D utilities from the optimal econometric methods of each model specification, while Fig. 2 shows the scatter plots of the indirect mapping results based on OLOGIT. The distributions of the prediction errors of direct mapping for the optimal methods of each model specification are shown in Fig. 3, while Fig. 4 shows the predicted error of indirect mapping results.

Fig. 1 Scatter plot of observed versus predicted values for optimal direct mapping approaches. CLAD censored least absolute deviation model, GLM generalized linear model, MM robust MM-estimator
Fig. 2 Scatter plot of observed versus predicted values for the indirect mapping approach. OLOGIT ordered logit regression
Fig. 3 Predicted error distribution of the optimal direct mapping methods. CLAD censored least absolute deviation model, GLM generalized linear model, MM robust MM-estimator
Fig. 4 Predicted error distribution of the indirect mapping approach. OLOGIT ordered logit regression

3.4 Optimal Mapping Functions

The selected dimension coefficients from the best direct mapping models are reported in Table 6, while the results of the indirect mapping models to predict each dimension are reported in Tables 7, 8, and 9. For the OLOGIT regressions reported in Tables 7, 8, and 9, it should be noted that the most severe level of the anxiety/depression dimension in the EQ-5D-5L was not reported by any patient in this study sample. Consequently, this level cannot be predicted accurately based on the results in Tables 7, 8, and 9. Whenever users have access to both the QLQ-C30 and QLQ-BR23, the mapping algorithms developed based on Model 3 are preferred, whereas if only the QLQ-C30 or the QLQ-BR23 has been included in the study, the mapping algorithms developed from Model 1 or Model 2, respectively, could be used.

Table 6 Direct mapping equations from the QLQ-BR53 index to health state utilities
Table 7 Indirect mapping equations from the QLQ-BR53 index to each of the EQ-5D-5L and SF-6D dimensions (ordered logit regression): Model 1
Table 8 Indirect mapping equations from the QLQ-BR53 index to each of the EQ-5D-5L and SF-6D dimensions (ordered logit regression): Model 2
Table 9 Indirect mapping equations from the QLQ-BR53 index to each of the EQ-5D-5L and SF-6D dimensions (ordered logit regression): Model 3

To use the mapping functions, the first step is to rescale each 0–100 dimension score onto a 0–1 scale (i.e. divide the score by 100). Using Model 3 reported in Table 6, the predicted EQ-5D-5L and SF-6D utilities can be calculated as:

$$\begin{aligned} {\text{EQ-5D-5L utility}} & = 0.6194 + 0.3175 \times {\text{PF}} + 0.1300 \times {\text{EF}} \\ & \quad - 0.1177 \times {\text{PA}} - 0.3257 \times {\text{AS}} \end{aligned}$$
$$\begin{aligned} {\text{SF-6D utility}} & = \exp \left( - 1.0632 + 0.2712 \times {\text{PF}} + 0.1608 \times {\text{RF}} + 0.1302 \times {\text{EF}} \right. \\ & \quad + 0.0739 \times {\text{SF}} + 0.0461 \times {\text{CO}} + 0.0862 \times {\text{QL}} \\ & \quad \left. + 0.1492 \times {\text{SEF}} - 0.0949 \times {\text{BS}} - 0.1810 \times {\text{AS}} \right) \end{aligned}$$
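The two Model 3 equations above can be applied as in the following minimal sketch. The coefficients are copied from the equations and the argument names follow their abbreviations; the function names, the `rescale` and `truncate` helpers, and the example scores are hypothetical. Truncation to the theoretical tariff range is shown as a separate step.

```python
import math

def rescale(score_0_100):
    """Rescale a 0-100 dimension score to the 0-1 scale."""
    return score_0_100 / 100.0

def eq5d5l_model3(PF, EF, PA, AS):
    """Direct mapping onto the EQ-5D-5L utility (inputs on the 0-1 scale)."""
    return 0.6194 + 0.3175 * PF + 0.1300 * EF - 0.1177 * PA - 0.3257 * AS

def sf6d_model3(PF, RF, EF, SF, CO, QL, SEF, BS, AS):
    """Direct mapping onto the SF-6D utility (log link, inputs on the 0-1 scale)."""
    return math.exp(-1.0632 + 0.2712 * PF + 0.1608 * RF + 0.1302 * EF
                    + 0.0739 * SF + 0.0461 * CO + 0.0862 * QL
                    + 0.1492 * SEF - 0.0949 * BS - 0.1810 * AS)

def truncate(value, lower, upper):
    """Keep a prediction within the theoretical range of the tariff."""
    return max(lower, min(upper, value))

# Hypothetical patient with dimension scores on the 0-100 scale.
u_eq5d = truncate(eq5d5l_model3(PF=rescale(80), EF=rescale(50),
                                PA=rescale(50), AS=rescale(20)),
                  lower=-0.149, upper=1.0)

# Lowest value the EQ-5D-5L equation can produce (PF = EF = 0, PA = AS = 1).
u_eq5d_floor = eq5d5l_model3(PF=0.0, EF=0.0, PA=1.0, AS=1.0)
```

The floor of the EQ-5D-5L equation is 0.6194 − 0.1177 − 0.3257 = 0.1760, which is above the tariff minimum of −0.149; this is one way to see why the direct mapping function cannot reproduce the poorest health states.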

4 Discussion

Accurate measurement and valuation of HRQoL are an important component of economic evaluations of healthcare interventions targeted at breast cancer patients. The QLQ-BR53, EQ-5D-5L and SF-6D have been demonstrated as valid instruments for the measurement of HRQoL in breast cancer patients; however, QLQ-BR53 is not currently preference-based. This study has developed mapping algorithms that can predict EQ-5D-5L and SF-6D utility scores from the QLQ-BR53 to conduct a CUA when the preference-based quality of life instrument is not included. To the best of our knowledge, this is the first mapping study based on the Chinese version of the EORTC QLQ-BR53 for breast cancer patients. There is one existing mapping study that was also developed based on breast cancer patients in China, and this can be used to convert the FACT-B scale onto EQ-5D-5L [38]. Although both FACT-B and QLQ-BR53 are widely used in breast cancer patients, they are two different instruments [39]. The new mapping algorithms reported in this study will further assist researchers to predict health state utility scores based on the breast cancer-specific quality of life instruments in China.

This study differs from previous literature in several ways. First, several mapping algorithms have been developed from the QLQ-C30 onto the EQ-5D-3L, but not yet onto the EQ-5D-5L [13, 40, 41]. This study further extends the literature by including mapping algorithms onto another widely used preference-based instrument, the SF-6D. In the absence of a tariff from Mainland China, the Hong Kong tariff, which has been widely used among Chinese populations, was adopted. The indirect mapping functions reported in this study can still be used once a Mainland China-specific tariff becomes available. For the two instruments studied here, mapping onto the SF-6D tended to perform better than mapping onto the EQ-5D-5L in general (based on lower prediction errors and higher CCC values).

Second, for breast cancer clinical trials, if two instruments are included, the study found that using dimensions from both instruments can lead to a better prediction [13]. Comparing our study with the Korean study by Kim et al. [13], similar model specifications were used and some of the final predictors were the same, e.g. the physical functioning dimension and the symptom dimensions of pain, dyspnea, systemic therapy side effects, and arm symptoms (all of which were statistically significant at the 0.05 level). To some extent, this similarity reflects a robust relationship between some QLQ-BR53 dimensions and EQ-5D utility. From the goodness-of-fit results, we can see that the mapping performance based on Model 3 was always better than Models 1 or 2; this finding suggests that in predicting health state utilities of breast cancer patients, the use of QLQ-BR53 will lead to a better prediction accuracy than using either QLQ-C30 or QLQ-BR23 alone.

Third, previous mapping studies have only used the OLS method to derive a mapping algorithm in this context [13]. Although the OLS is a popular method, its prediction performance may not be as good as other statistical methods. As demonstrated in this study, the OLS estimates have not been selected as the optimal algorithms. It should be noted that owing to the different versions of the EQ-5D instruments (i.e. 3-level vs 5-level EQ-5D used in this paper), as well as the different country-specific tariffs used, it is also impossible to directly compare the mapping performance between this study and previous literature [13, 40]. On the other hand, similar to previous mapping studies, almost all mapping algorithms tend to overestimate the utilities for patients in poor health [42].

There are some limitations to this study. First, only breast cancer patients were used in the study, therefore it is unclear to what extent the mapping algorithms from QLQ-C30 onto EQ-5D-5L/SF-6D can be more widely used for other cancer patients. Second, although the predicted mean health state utility scores at the sample level are very close to the observed mean utility, as commonly reported in the literature, the mapping performance at the lower end of the utility distribution is not good. For example, among the econometric methods used, only BETAMIX and OLOGIT can predict the negative EQ-5D-5L utility scores; however, based on a wide range of goodness-of-fit indicators, the overall mapping performance of these two methods may not be as good as other methods. Caution is therefore warranted when applying mapping algorithms to patients in poor health status. Third, the developed mapping algorithms should be externally validated. In particular, external validation using a longitudinal dataset will be helpful to explore to what extent the incremental utility can be accurately calculated based on the mapping algorithms.

Finally, mapping algorithms should serve as a second-best solution to generate health utilities from the non-preference-based, disease-specific quality of life instruments [23]. Recently, King et al. [43] developed a health state classification system from the EORTC QLQ-C30, called the EORTC Quality of Life Utility Measure-Core 10 dimensions (QLU-C10D). The responses to the QLQ-C30 can be converted into QLU-C10D utility scores for conducting cost-utility analyses. However, the QLU-C10D only contains information from the QLQ-C30, and, as such, more breast cancer-specific dimensions that are captured using the QLQ-BR23 will be omitted.

5 Conclusions

This study reported mapping algorithms from the QLQ-C30 and/or QLQ-BR23 onto EQ-5D-5L or SF-6D utilities based on breast cancer patients in China. Outputs from this study can be used in CUAs to prioritize health resources. Further studies are warranted to externally validate the mapping algorithms and to explore their use in other cancer patients.