Key Points for Decision Makers

The Sydney Asthma Quality of Life Questionnaire (AQLQ-S) was designed to measure functional problems in adults who have asthma. However, it is currently not possible to estimate health utilities based on the AQLQ-S because of methodological constraints.

Using regression approaches, our study showed that it is possible to predict health-state utilities for five commonly used multi-attribute utility instruments (MAUIs) from AQLQ-S responses, i.e. the Assessment of Quality of Life 8 Dimensions (AQoL-8D), EuroQoL 5 Dimensions 5-Level (EQ-5D-5L), Health Utilities Index Mark 3 (HUI3), 15 Dimensions (15D), and the Short-Form 6 Dimensions (SF-6D).

The results of this study can be used to inform utility estimation within future economic evaluations of interventions targeted at populations of people with asthma.

1 Introduction

The global prevalence, morbidity, mortality and economic burden associated with asthma have been increasing over the years [1]. Asthma affects between 1 and 18 % of the population in different countries, with an estimated 300 million individuals affected worldwide [2, 3]. Its effect on health-related quality of life (HRQoL) is increasingly being measured to inform patient management and policy decisions, including decisions relating to the share of the health budget that should be allocated to the treatment of asthma [4–9]. Many decision bodies, including the UK National Institute for Health and Care Excellence (NICE) and the Australian Pharmaceutical Benefits Advisory Committee (PBAC) and Medical Services Advisory Committee (MSAC), recommend the use of cost-utility analysis (CUA) [10–12], which estimates and compares the cost per additional quality-adjusted life-year (QALY) obtained from each service. QALYs are calculated as life-years multiplied by an index of the utility of the relevant health state, measured on a 0–1 (death to full health) scale [13, 14]. Increasingly, utilities have been derived from a limited number of multi-attribute utility instruments (MAUIs) [15, 16]; however, MAUIs are often perceived as being less sensitive to particular conditions than non-utility, condition-specific quality-of-life (QoL) measures [17].
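The QALY arithmetic described above can be sketched in a few lines. This is a minimal illustration only: the function name `qalys` and the patient profile are hypothetical, and no discounting is applied, since none is mentioned here.

```python
def qalys(durations_years, utilities):
    """Sum of (time spent in a health state, in years) x (utility of that state)."""
    if len(durations_years) != len(utilities):
        raise ValueError("one utility per duration is required")
    return sum(t * u for t, u in zip(durations_years, utilities))


# Hypothetical patient: 2 years at utility 0.70, then 3 years at utility 0.85.
total = qalys([2.0, 3.0], [0.70, 0.85])
print(round(total, 2))
```

A CUA would then compare the difference in cost between two services with the difference in QALYs computed this way.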

The Sydney Asthma Quality of Life Questionnaire (AQLQ-S) is a non-utility-based asthma-specific QoL instrument that was developed to measure functional problems in adults who have asthma [18, 19]. A recent review identified it as one of the most commonly used asthma-specific QoL measures [6]. Compared with one of its variants (the McMaster Asthma Quality of Life Questionnaire [AQLQ-McMaster]), the AQLQ-S has been shown to have lower respondent burden and is therefore preferred by researchers and respondents for inclusion as an asthma-specific QoL measure in broader population health surveys [9]. A limitation of the AQLQ-S for economic evaluation is that it does not have utility weights and cannot therefore be used to estimate QALYs, as needed for a CUA.

This limitation may be overcome by creating an algorithm that predicts utility scores from the AQLQ-S. To date, no such mapping algorithm has been created. Tsuchiya et al. [20] employed ordinary least squares (OLS) and multinomial logistic regression to map the AQLQ-McMaster onto an MAUI, the EuroQoL 5 Dimension 3-Level (EQ-5D-3L), using a sample of 3000 individuals. While the authors concluded that it was possible to estimate a robust relationship between EQ-5D-3L utilities and the AQLQ-McMaster, the study was limited by the exclusion of sociodemographic variables and by only mapping to the EQ-5D-3L, which performs less well on tests of sensitivity and content validity than other MAUIs [21].

Using data from a large Multi-Instrument Comparison (MIC) study [22], the present paper develops mapping algorithms that use the AQLQ-S and patient socio-demographic characteristics to predict utilities for the five most commonly used MAUIs, which are listed in Fig. 1 along with their common abbreviation and major reference in the literature.

Fig. 1 AQLQ-S and the multi-attribute utility instruments, along with their common abbreviation and major reference in the literature

2 Methods

We followed the newly developed ‘Mapping onto Preference-Based Measures Reporting Standards’ (MAPS) checklist in conducting this study [23]. The target instruments for mapping were the AQoL-8D, EQ-5D-5L, HUI3, 15D and SF-6D, while the source instrument was the AQLQ-S.

2.1 Instruments

2.1.1 Sydney Asthma Quality of Life Questionnaire (AQLQ-S)

This 20-item, condition-specific instrument was developed to measure the functional impairments that are most troublesome to adults (17–70 years) living with asthma [18, 24], and consists of four domains, some with overlapping items: breathlessness (five items), mood disturbance (five items), social disruption (seven items), and concerns for health (seven items). It has shown good validity when used within asthma populations [19, 25–27]. Results may be reported as average scores for each of the four domains or as a simple score that may be reduced to a 0–1 (worst–best) scale.

2.1.2 Assessment of Quality of Life 8 Dimensions (AQoL-8D)

This is an eight-dimension MAUI designed to assess HRQoL across health conditions, and is applicable to individuals aged ≥16 years [28, 29]. The AQoL-8D measures the following dimensions: independent living, relationships, mental health, coping, pain, senses, happiness and self-worth [30]. Using the time trade-off (TTO) approach [31], population preference weights were obtained from the Australian population, resulting in utilities ranging from −0.094 to 1 [32]. The validity of the AQoL-8D has been proven in multiple patient populations [21, 33–35].

2.1.3 EuroQoL 5 Dimensions 5-Level (EQ-5D-5L)

This is a measure of HRQoL suitable for use on individuals aged ≥18 years, and comprises five single-item dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression [36]. It is a modification of the original EQ-5D-3L and includes five, rather than three, levels of impairment in each domain: no, slight, moderate, severe, and extreme problems in the relevant dimension of health [37]. Using these responses, the EQ-5D-5L is able to distinguish between 3125 states of health. A UK-specific algorithm developed using TTO techniques was used to convert the EQ-5D-5L health description into a valuation ranging from −0.281 to 1 [38]. Scores less than 0 represent health states that are worse than death [39]. The EQ-5D-5L has been validated in differentiated clinical populations [40–42].
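The figure of 3125 health states follows from five dimensions with five levels each (5^5). A minimal sketch that enumerates them is given below; coding the levels as the digits 1 to 5 and writing a state as a five-digit string is an assumption made for illustration and is not taken from this section.

```python
from itertools import product

# Assumed coding: 1 = no problems ... 5 = extreme problems.
LEVELS = range(1, 6)
DIMENSIONS = [
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
]

# Every combination of one level per dimension, written as a digit string.
states = [
    "".join(str(level) for level in combo)
    for combo in product(LEVELS, repeat=len(DIMENSIONS))
]
n_states = len(states)
print(n_states, states[0], states[-1])
```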

2.1.4 Health Utilities Index Mark 3 (HUI3)

The HUI3 is an HRQoL outcome that measures eight domains, namely vision, hearing, speech, ambulation/mobility, pain, dexterity, emotion, and cognition [43, 44]. Each of these domains has five to six rank-ordered response options. Utilities were developed using a visual analogue scale (VAS) and the Standard Gamble (SG) technique, and ranged from −0.36 to 1 [44]. The HUI3 has been validated in diverse clinical conditions and is suitable for individuals aged 5 years and older [43, 45, 46].

2.1.5 Short-Form 6 Dimensions (SF-6D)

This MAUI was derived from the Short-Form 36 dimensions (SF-36), a 36-item generic HRQoL instrument designed to measure general health concepts across different ages, diseases and treatment groups [47]. The SF-6D consists of six dimensions: vitality, physical functioning, pain, role functioning, social functioning and mental health [48]. The number of levels per dimension varies from four to six. Utilities, developed using the SG approach, can be derived from 11 of the 36 items in the SF-36, and range from 0.291 to 1 [49]. The validity of the SF-6D has been demonstrated in differentiated populations with variable clinical conditions [46, 50–52].

2.1.6 15 Dimensions (15D)

This MAUI is suitable for individuals aged ≥16 years and has 15 HRQoL dimensions, namely mobility, vision, hearing, breathing, sleeping, eating, speech, excretion, usual activities, mental function, discomfort and symptoms, depression, distress, vitality and sexual activity [53]. Each of these dimensions has five ordinal levels of severity [53]. It is well-validated in various clinical populations [54–56], and health states defined by the MAUI can be converted into utilities (ranging from 0 to 1) that were derived as a weighted average of VAS scores for the 15 dimensions [48, 53].

A comparison between the dimensions of the AQLQ-S and those of the five MAUIs shown in Fig. 2 depicts the conceptual overlap between these instruments.

Fig. 2 Comparisons between the dimensions of the AQLQ-S and the MAUIs. AQLQ-S Sydney Asthma Quality of Life Questionnaire, AQoL-8D Assessment of Quality of Life 8 Dimensions, EQ-5D-5L EuroQoL 5 Dimensions 5-Level, HUI3 Health Utilities Index Mark 3, SF-6D Short-Form 6 Dimensions, 15D 15 Dimensions, MAUIs multi-attribute utility instruments

2.2 Data

A large MIC survey was carried out in six countries: Australia, Canada, Germany, Norway, the UK and the US, details of which are provided elsewhere [34]. The online survey was administered by a global company (CINT Pty Ltd), to a demographically representative group of the healthy population in each country and to patients in seven major disease areas. Quotas were applied to obtain a target number of respondents in each of the chronic disease areas. Only patients with asthma were included in the present study. Data collected included age, gender, educational level, country of residence, ethnicity, marital status, occupational status, income level, body mass index (BMI), smoking status and responses to the six instruments described above. Data were collected between October 2011 and January 2014, and all participants gave their informed consent prior to inclusion in the study. Ethical approval was granted by the Monash University Human Research Ethics Committee (MUHREC) [CF11/3192–2011001748].

2.3 Statistical Analysis

All analyses were conducted in STATA version 14.1 [57], and the analysis was conducted in two stages. In the first stage, the correlation between the AQLQ-S scores and the five MAUIs was assessed using scatter diagrams and Spearman’s rank correlation coefficients.
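For readers who want to reproduce the first-stage check outside STATA, Spearman's rank correlation can be sketched with the standard library alone: rank both variables (averaging ranks for ties) and take the Pearson correlation of the ranks. The data below are made up for illustration, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up AQLQ-S total scores and MAUI utilities, for illustration only.
aqlq_total = [0.2, 0.5, 0.4, 0.9, 0.7]
utility = [0.55, 0.70, 0.80, 0.95, 0.85]
rho = spearman(aqlq_total, utility)
print(round(rho, 3))
```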

In the second stage, independent variables were chosen for inclusion in the regression models. Highly correlated independent variables (|r| > 0.7) [58] were identified using Spearman’s rank correlation, and a decision was made with respect to which variables to include in the analysis. The independent variables, including the AQLQ-S, were mapped onto each of the five MAUIs using the nine models described in Table 1. Specific patient characteristics were included when they improved the predictive ability of the models (see Sect. 2.5 for measures of predictive ability). The effect of including dummy variables (representing each of the six countries in the MIC study) in all of the nine models was also tested within a sensitivity analysis. The following regression model families were used in the mapping:

Table 1 Variables used in regression models
  • OLS regression models: These have been the most widely used models in mappings [17]; however, they have a potential limitation, i.e. the presence of a data ceiling can lead to inconsistent coefficient estimates [59, 60].

  • Censored least absolute deviations (CLAD): This technique takes the ceiling effect into account and is also robust to heteroscedastic and skewed data [61]. It is consistent and asymptotically normal for a wide class of error distributions [62].

  • The generalised linear model (GLM): This family of models is also robust to heteroscedasticity and skewness [63]. The choice of the GLM distribution and link was guided by the modified Park test suggested by Manning [64].

  • The Beta Binomial (BB) regression model: The BB model can estimate unimodal or bimodal utilities while being robust to skewness [65, 66]. A limitation of the model is that it restricts utilities to a 0 to 1 range [66]. However, utilities in our data set were positive, except for a small number of observations for the EQ-5D-5L (0.7 %) and HUI3 (1.05 %). As done elsewhere, these data were set equal to 0 [67].
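The recoding step mentioned for the BB model (setting the few negative utilities to 0 so that all values fall in the 0 to 1 range) could look like the sketch below. The function name and the data are hypothetical; only the rule itself comes from the text.

```python
def floor_at_zero(utilities):
    """Return utilities with negative values set to 0, plus the share recoded."""
    floored = [max(0.0, u) for u in utilities]
    share_changed = sum(u < 0 for u in utilities) / len(utilities)
    return floored, share_changed


# Made-up EQ-5D-5L utilities; one value is below 0 (a state worse than death).
observed = [0.82, -0.10, 0.64, 1.00]
floored, share = floor_at_zero(observed)
print(floored, share)
```

Reporting the share of recoded observations alongside the results, as the text does, lets readers judge how much the restriction could matter.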

2.4 Estimation and Validation of Primary Models

We used an approach similar to previous studies to estimate ‘primary’ or ‘estimation’ models from a subset of the data, and validated these with the remaining data [68–71]. In this ‘hold-out’ approach, data were split into two parts: an ‘estimation sample’ consisting of two-thirds of the data (793 observations) that were used to construct the primary models, and a ‘validation sample’ consisting of the remaining third (396 observations) that were used for validation. A total of 180 primary models were estimated (four model families × nine model specifications × five MAUI-dependent variables). These primary models were then tested on the validation sample to assess their predictive ability.
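A minimal sketch of such a hold-out split, together with the count of primary models, is given below. The random assignment, the seed and the function name are assumptions for illustration; the text does not state how observations were allocated to the two samples.

```python
import random


def holdout_split(n_obs, estimation_share=2 / 3, seed=0):
    """Randomly partition row indices into estimation and validation sets."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_est = round(n_obs * estimation_share)
    return sorted(indices[:n_est]), sorted(indices[n_est:])


# Toy data set of 12 rows: 8 go to estimation, 4 to validation.
est_idx, val_idx = holdout_split(n_obs=12)

# Number of primary models: families x specifications x target MAUIs.
families = ["OLS", "CLAD", "GLM", "BB"]
specifications = range(1, 10)  # model specifications (1) to (9)
mauis = ["AQoL-8D", "EQ-5D-5L", "HUI3", "15D", "SF-6D"]
n_models = len(families) * len(specifications) * len(mauis)
print(len(est_idx), len(val_idx), n_models)
```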

2.5 Assessment of Predictive Ability

Predicted utilities from each of the 180 models were estimated using STATA’s inbuilt post-estimation commands. The predictive ability of the models was primarily assessed using two measures of predictive error [72]: the root mean squared error (RMSE) and the mean absolute error (MAE), with lower values implying a better-performing predictive model. To calculate the RMSE, the difference between the observed and predicted values of the MAUIs was squared and summed over all observations; the RMSE was then estimated as the square root of the mean of these squared differences. The MAE was calculated by summing the absolute differences between the observed and predicted values of the MAUIs and taking the mean of these absolute differences. Where the RMSE and MAE indicated different results in the validation sample, and as recommended in the literature [73], more weight was placed on the RMSE, particularly when the distribution of the error from the model was Gaussian. Performance of the models was further assessed using four additional criteria estimated using the validation sample, namely (1) the ranges of, and Spearman’s rank correlations between, the predicted and observed utilities; (2) an examination of the distributions of the predicted and observed utilities to determine how closely predicted values matched observed scores [74]; (3) assessment of the distribution of the residuals (observed minus predicted utilities) to determine bias in the predicted utilities [75]; and (4) assessment of the proportion of predicted utilities deviating from observed values by <0.03 or <0.05 [76]. A breakdown of which regression model family, model specification (among the nine models) and MAUI prediction (AQoL-8D, EQ-5D-5L, HUI3, SF-6D or 15D) performed best according to the six selection criteria (RMSE, MAE and the four additional criteria) is also presented.
The best-fitting models overall, based primarily on the performance of their measures of predictive error, were re-estimated using data from the entire sample.
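The two error measures, and the proportion of predictions falling within a given absolute error, can be sketched as follows. The function names and the four data points are hypothetical and serve only to make the definitions concrete.

```python
from math import sqrt


def rmse(observed, predicted):
    """Square root of the mean squared prediction error."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def mae(observed, predicted):
    """Mean of the absolute prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def share_within(observed, predicted, threshold):
    """Proportion of predictions with absolute error below the threshold."""
    hits = sum(abs(o - p) < threshold for o, p in zip(observed, predicted))
    return hits / len(observed)


# Made-up utilities, for illustration only.
observed = [0.90, 0.70, 0.85, 0.60]
predicted = [0.88, 0.76, 0.85, 0.70]
print(rmse(observed, predicted), mae(observed, predicted))
print(share_within(observed, predicted, 0.05))
```

Because the RMSE squares each error before averaging, it penalises large misses more heavily than the MAE, which is one reason the two measures can rank models differently.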

Complete data sets were available for all of the instruments and demographic data analysed.

3 Results

3.1 Demographic and Other Characteristics

Table 2 presents summary statistics for 856 study participants. No significant differences in the instrument scores were observed between the estimation and validation samples. Mean utilities were highest for the 15D (mean 0.85) and lowest for the AQoL-8D (mean 0.69). The majority of individuals in the sample were <45 years of age (58 %), female (62 %), married or living with a partner (59 %), non-smokers (77 %), educated beyond high school (71 %), and had a good or very good standard of living (88 %). There were no statistically significant differences between the estimation and validation sample in terms of patient characteristics. All six countries were fairly represented in the dataset.

Table 2 Descriptive statistics of estimation and validation samples

3.2 Bivariate Relationship between AQLQ-S and Multi-Attribute Utility Instruments (MAUIs)

The dimensions of the AQLQ-S and MAUIs are compared in Fig. 2. When contrasted against the dimensions of the AQLQ-S, the 15D had the greatest number of overlapping dimensions (12/15), followed by the SF-6D (5/6), AQoL-8D (6/8), EQ-5D-5L (4/5) and HUI3 (2/8). Figure 3 depicts the relationship between the AQLQ-S total scores and utilities for each of the five MAUIs. The plots show moderate to strong correlation for all comparisons, with the lowest being between the AQLQ-S and the HUI3 (0.458), and the highest being between the AQLQ-S and the 15D (0.544).

Fig. 3 Scatter plots between the AQLQ-S total scores and utilities of each of the MAUIs, as well as corresponding correlation coefficients. AQLQ-S Sydney Asthma Quality of Life Questionnaire, MAUIs multi-attribute utility instruments, AQoL-8D Assessment of Quality of Life 8 Dimensions, EQ-5D-5L EuroQoL 5 Dimensions 5-Level, HUI3 Health Utilities Index Mark 3, SF-6D Short-Form 6 Dimensions, 15D 15 Dimensions

3.3 Assessment of Model Predictive Ability

Selection criteria statistics for assessing the predictive ability of the 180 models are presented in electronic supplementary Table 1 for both the estimation and validation samples. These were used to rank each model, resulting in rankings that were sufficiently consistent to permit the selection of a ‘shortlist’ of 10 best-fitting models. The shortlist for each MAUI, as well as for all MAUIs combined, is shown in electronic supplementary Table 2. A total of nine regression algorithms were candidates for best predicting models as they were the best-fitting models based on the selection criteria statistics: AQoL-8D – OLS (9), AQoL-8D – GLM (9), 15D – OLS (9), 15D – GLM (9), 15D – OLS (9) and 15D – CLAD (8) in the estimation sample, and AQoL-8D – GLM (8), 15D – GLM (8), and 15D – CLAD (5) in the validation sample. Selection criteria statistics estimated using these models ranged from (figures given for the estimation and validation samples) 0.0950 to 0.0973 and 0.0834 to 0.0866 (RMSE), 0.0730 to 0.0740 and 0.0645 to 0.0665 (MAE), 0.6120 to 0.6420 and 0.6370 to 0.6610 (correlation), and 43–49 % and 43–46 % (proportion of predictions with absolute errors <0.05). In addition, the ‘minimum to maximum’ range of the predicted utilities for all nine models was narrower than that for the observed utilities, while the distributions of the residuals for both samples all appear close to normal (supporting our decision to put more weight on the RMSE for selecting the best-fitting model [67]). These nine regression models are analysed below to give a breakdown of which regression model family, model specification and MAUI prediction performed best according to the selection criteria.

With respect to the performance of the regression model families, there were some mixed results. Overall, however, the OLS and GLM performed best on correlation, MAE and RMSE and the CLAD on the proportion of predicted utilities deviating from mean observed utilities by <0.05 (electronic supplementary Table 2). The OLS and GLM predicted mean utilities whose values were closest to those of the observed utilities; however, the CLAD predicted more utilities whose distribution ‘mimicked’ that of the observed values. Fewer CLAD-predicted utilities also deviated from the mean utilities by >0.05. Based on best performance on the most criteria, the OLS and GLM were deemed to have been better models.

In terms of model specification, electronic supplementary Table 2 shows that, regardless of the MAUI predicted, model specifications (8) and (9) performed the best. Model (8) was a non-linear model that included a quadratic term of the AQLQ-S, as well as demographic characteristics as independent variables. Interaction terms were added to these independent variables in model (9).

With respect to prediction of specific MAUIs (electronic supplementary Table 2), the prediction of 15D was the strongest when assessed using the RMSE, MAE and proportion of predicted utilities deviating from mean observed utilities by <0.05, while that for the AQoL-8D was strongest when correlation was assessed. There was mixed performance from the ‘AQLQ-S to AQoL-8D’ and ‘AQLQ-S to EQ-5D-5L’ predictions. Therefore, based on best performance on the most criteria the ‘AQLQ-S to 15D’ prediction was, on average, the strongest, followed by the ‘AQLQ-S to SF-6D’, while the ‘AQLQ-S to HUI3’ prediction was the weakest.

3.4 Best-Performing Models Overall for All MAUIs

When the regression model families and model specifications are considered together, GLM (8) performed best on the RMSE and MAE (except for the HUI3 and SF-6D predictions where CLAD (8) performed best on the MAE). In the estimation sample, GLM (8) was ranked within the top four best-performing models for all MAUI predictions in terms of the RMSE and MAE (except for the EQ-5D-5L prediction, where it was ranked outside the top 10 on the MAE, and for the HUI3 prediction, where it was ranked fifth on both the RMSE and MAE). In the validation sample, GLM (8) overpredicted mean utilities whose range (minimum to maximum) was also narrower than that of the observed utilities (Table 3). Although the range of utilities predicted by GLM (8) was again narrower than that of the observed utilities in the estimation sample, the mean predicted and observed utilities were the same (Table 3). However, an examination of the measures of spread (particularly the 25th percentile, median and 75th percentile) shows that the distributions of predicted utilities in both samples were similar to those for the observed utilities. Spearman’s rank correlations between GLM (8) predicted and observed utilities in both samples all showed moderate correlation (range 0.51–0.66). Finally, the plots of residuals for GLM (8) (Fig. 4) for comparable predictions in the estimation and validation samples look significantly different but appear close to being normally distributed. Including country dummies in all model specifications within the sensitivity analysis did not result in better-performing models (predictive accuracy results of the 10 best-fitting models across all MAUIs are shown in electronic supplementary Table 3). On this basis, and in order to have parsimonious prediction models, preference was given to models without country dummies. In particular, GLM (8) was chosen as relatively best-fitting in both samples, and then re-estimated using data from the entire sample. 
The regression model coefficients for predicting the five MAUIs using GLM (8) are shown in Table 4. To predict 15D utilities from the AQLQ-S, for instance, the following equation would have to be used:

Table 3 Predictive ability of the best-fitting models
Fig. 4 Scatter plots of residuals for GLM (8) in estimation and validation samples. GLM generalised linear model, AQoL-8D Assessment of Quality of Life 8 Dimensions, EQ-5D-5L EuroQoL 5 Dimensions 5-Level, HUI3 Health Utilities Index Mark 3, SF-6D Short-Form 6 Dimensions, 15D 15 Dimensions

Table 4 Model coefficients for best-fitting models [n = 856]
$$\begin{aligned}
\text{15D Utility} ={}& -0.473 - 0.462 \times \text{Breath\_domain} + 0.168 \times \text{Mood\_domain} + 0.4038 \times \text{Social\_domain} \\
&+ 0.065 \times \text{Concerns\_domain} + 0.306 \times \left(\text{Breath\_domain}\right)^{2} + 0.081 \times \left(\text{Mood\_domain}\right)^{2} \\
&- 0.205 \times \left(\text{Social\_domain}\right)^{2} + 0.024 \times \left(\text{Concerns\_domain}\right)^{2} + 0.013 \times \text{Age\_}{<}65\ \text{years} \\
&+ 0.019 \times \left(\text{Female\_Gender}\right)
\end{aligned}$$
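As a convenience, the right-hand side of the 15D equation above can be evaluated with a short function. This is a sketch under stated assumptions rather than the authors' implementation: the function and argument names are hypothetical, the age and gender terms are assumed to be 0/1 indicators (1 if aged under 65 years, 1 if female), and this section states neither the scale of the domain scores nor the GLM link function. The function therefore returns the printed expression unchanged, and any inverse-link transformation would have to be applied separately.

```python
def predict_15d(breath, mood, social, concerns, age_under_65, female):
    """Evaluate the right-hand side of the 15D equation exactly as printed.

    Assumptions: age_under_65 and female are 0/1 indicators; the domain
    scores are on whatever scale the published coefficients expect.
    """
    return (
        -0.473
        - 0.462 * breath
        + 0.168 * mood
        + 0.4038 * social
        + 0.065 * concerns
        + 0.306 * breath ** 2
        + 0.081 * mood ** 2
        - 0.205 * social ** 2
        + 0.024 * concerns ** 2
        + 0.013 * age_under_65
        + 0.019 * female
    )


# Hypothetical respondent: all domain scores 0.5, aged under 65, female.
value = predict_15d(0.5, 0.5, 0.5, 0.5, age_under_65=1, female=1)
print(value)
```

Coefficients for the other four MAUIs are reported in Table 4 and could be substituted into the same structure.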

4 Discussion

The AQLQ-S is a non-preference-based measure of QoL for people with asthma, frequently used in clinical and epidemiological studies in Australia and internationally [6]. As it is not a utility instrument, it cannot be used for comparisons between interventions for disparate services. This limitation is overcome by mapping the AQLQ-S onto an MAUI. The estimated utilities from the mappings may then be used to calculate QALYs, and for the conduct of CUA. The present study has provided such mapping functions for each of the major MAUIs. As there were slight differences in the estimated utilities, the choice of which MAUIs to map onto must be guided by whether the health-state classification system of each MAUI reflects the domains deemed most important for the condition under consideration.

The AQLQ-S demonstrated strong positive association with all the MAUIs, implying good convergent validity between them. In the preliminary analysis, correlation was highest between the AQLQ-S and the 15D, and lowest between the AQLQ-S and the HUI3. The strong correlation with the 15D is a reflection of the close correspondence of the conceptualisation of the dimensions of health in the two instruments [77].

There were some mixed results among the regression algorithms for the best-predictive models (assessed according to correlation, RMSE, MAE and percentage of absolute differences between predicted and observed utilities of <0.05). The range of these statistics (0.0834–0.0973, RMSE; 0.0645–0.0740, MAE; 0.6420–0.6610, correlation; and 46–49 % of absolute differences between predicted and observed utilities of <0.05) for these models were all within acceptable ranges of published estimates [17], making the selection of the optimal algorithms for each MAUI difficult; however, differences between these models were small. It was not possible to compare our results with those of comparable analyses as our study was the first to map the AQLQ-S onto MAUIs, and the first to provide mappings for all of the major MAUIs. However, the RMSE estimates obtained in this study were substantially lower than those reported by Tsuchiya et al. [20] for the mapping of the AQLQ-McMaster onto the EQ-5D-3L (range 0.2024–0.2775).

For economic evaluation, mean estimates are of great importance [75, 78, 79]. Using this criterion, our results show that the OLS and GLM performed best as they predicted mean scores that were closest to the mean observed utilities. However, if an analyst is also interested in accurate prediction across the whole distribution of utilities, then the performance of the CLAD was the best because, compared with the OLS and GLM, CLAD models predicted mean utilities that had a wider range (minimum to maximum) that more closely described the variation of observed utilities, implying that CLAD-predicted utilities had a better spread of predicted values than the OLS and GLM models. This result has also been seen elsewhere [74, 75]. Generally though, all three model families predicted utilities with narrower ranges than those of observed utilities, a result seen in other research [68, 74, 79], and may have been due to few patients having scores or utilities at the lower or upper scales of the instruments used in this study.

Some limitations in our data and analysis need to be noted. First, no suitable out-of-sample dataset was available and therefore in-sample validation of mapping algorithms, successfully used in a number of other mapping studies [68–71], was applied; however, it is desirable that the algorithms should be validated on an external dataset. Second, asthma status was self-reported and therefore subject to reporting biases. Third, our sample may not be fully representative of the asthma population as we used a self-selected sample of respondents, namely people who used the internet and were part of the online database of CINT Pty Ltd; however, there are no strong prior reasons for believing that this should skew the functional relationships reported here. The sample included a wide representation across six countries, and the study participants were reflective of a broad range of sociodemographic characteristics. Finally, the same preference weights were used regardless of the nationality of the respondents as national weights do not exist for all of the MAUIs. However, it has been shown that the content of MAUIs has a greater impact on utilities than the difference in intercountry preference weights [34].

5 Conclusions

Directly collecting data on utilities will always be the best way of measuring QoL for the purpose of conducting a CUA. When this has not been done, our results demonstrate the possibility of predicting utilities if data on the AQLQ-S have been collected. We recommend using a GLM (8) mapping function for this exercise.