Key Points for Decision Makers

There is growing interest in measuring outcomes for economic evaluation in a way that goes beyond health-related quality of life (HRQoL). Operationalizing this work requires consideration of the overlap between these different approaches.

Variation in the dimensions of HRQoL included in current preference-based HRQoL instruments means that overlap with wellbeing and/or capability instruments [here, the ICEpop CAPability measure for Adults (ICECAP-A)] will differ.

In this study, the ICECAP-A provided additional complementary information when compared with the 15D, EQ-5D-5L, Health Utilities Index Mark 3 (HUI-3), and SF-6D, while there was substantial overlap between the ICECAP-A and Assessment of Quality of Life 8-dimension (AQoL-8D).

1 Introduction

Economic evaluation has become an important tool in many countries to inform decision makers about the value of alternative courses of action [1]. These evaluations usually take the form of a cost-utility analysis, where the outcome is measured in quality-adjusted life years (QALYs) [2]. The QALY has become the gold standard measure of health outcome in economic evaluation and is recommended by numerous health technology assessment agencies to assist in the allocation of scarce healthcare resources [3,4,5]. Indeed, an objective to ‘maximize health’ within a healthcare system is often operationalized by maximizing the number of QALYs gained from a fixed budget.

The focus on QALYs for resource allocation decisions in healthcare has been challenged for decades [6,7,8], with recent contributions drawing attention to areas such as public health [9], social care [10], mental health [11], and end-of-life care [12]. It is often argued that there are important benefits that cannot be measured in terms of health alone and that the evaluative space of economic evaluations should be more encompassing, allowing for the inclusion of broader benefits, such as wellbeing [13]. The term ‘wellbeing’ has been used inconsistently in the literature [14], although a distinction can be made between psychological wellbeing (a eudaimonic measure, i.e., a measure of flourishing such as self-acceptance or autonomy) [15] and subjective wellbeing (a hedonic measure, i.e., a measure of happiness and satisfaction) [16]. Another conceptualization of wellbeing has been offered by Amartya Sen, referred to as the capability approach [17], which distinguishes between capabilities (a person’s opportunities to achieve wellbeing) and achieved functionings (the actual outcomes realized by individuals) [18]. The capability approach accounts for the fact that a person’s capabilities (what a person can do) may differ from their functionings (what a person actually does) [19]. There is growing interest in Sen’s capability approach within health economics, and for outcome measurements in economic evaluations in particular [20]. Recent efforts to operationalize the capability approach have led to the development of preference-based instruments for the measurement of capability wellbeing, suitable for use in economic evaluation. Three such measures have resulted from the Investigating Choice Experiments for the Preferences of Older People (ICEPOP) project: the ICEpop CAPability measure for Older Adults (ICECAP-O) [21], Adults (ICECAP-A) [19], and individuals at the end of life [ICECAP Supportive Care Measure (ICECAP-SCM)] [22].

Changes in guidelines for health technology assessments have recognized the potential importance of broader benefits in economic evaluation and have made provision for the measurement of capability wellbeing. For example, the National Institute for Health and Care Excellence in the UK has recommended the use of capability measures in economic evaluations for interventions that are associated with non-health benefits [23], yet little guidance has been provided in terms of what constitutes a health benefit or a non-health benefit, and which decision rules should be applied if using ICECAP instruments alongside other preference-based health-related quality of life (HRQoL) instruments. Dutch guidelines also advocate the use of ICECAP instruments for long-term care, where the focus of interventions might be more on improving a person’s wellbeing than their health [5]. Because ICECAP instruments do not have ‘QALY properties’ (i.e., current values are not anchored onto the ‘full health’ to ‘dead’ scale but on a ‘full capability’ to ‘no capability’ scale [24]), the reference cases described in the UK and Dutch guidelines recommend supplementing cost-utility analysis (using the EQ-5D [25, 26]) with a cost-consequences analysis or cost-effectiveness analysis using an ICECAP instrument. The underlying intention is to explicitly capture broader aspects of capability wellbeing alongside health benefits.

In practice, decision makers may find it difficult to interpret and reconcile findings from such primary and supplementary analyses without further information describing the extent of overlap between measures of HRQoL and capability wellbeing. The extent of overlap between the ICECAP instruments and the three-level EQ-5D (EQ-5D-3L) has been examined in two previous studies. Davis and colleagues performed an exploratory factor analysis (EFA) comparing the ICECAP-O with the EQ-5D-3L in seniors enrolled in a falls prevention clinic [27], showing that the two instruments tapped into distinct and complementary factors. These results were confirmed by a second EFA, which compared the ICECAP-A with the EQ-5D-3L in an adult population of patients with knee pain [28].

Further research is needed to explore whether the same relationship holds in other clinical and non-clinical settings, as well as for other preference-based HRQoL instruments. Preference-based HRQoL instruments differ greatly in their coverage of physical, mental, and social health domains [29,30,31,32], as well as the extent to which ‘non-health’ items are included in the respective descriptive systems. These issues raise the potential for different degrees of overlap between preference-based HRQoL instruments (i.e., preference-based instruments that define health states) and measures of capability wellbeing. Such investigations are particularly important to avoid double counting when using HRQoL and capability wellbeing instruments simultaneously in health economic evaluations. In this context, double counting, where the same underlying concept of benefit is measured twice, could occur explicitly (i.e., summing health and non-health benefits into a single metric) or implicitly (e.g., misguided interpretation of outcomes data from a cost-consequences analysis). The objectives of this work are to investigate the extent to which five preference-based HRQoL instruments capture aspects of capability wellbeing, as measured by the ICECAP-A, and to consider the implications of our findings within the context of other literature regarding capability wellbeing and economic evaluation.

2 Methods

2.1 Data Source

Data were obtained from the Multi Instrument Comparison (MIC) project, a multinational survey funded by Australia’s National Health and Medical Research Council. Comprehensive details regarding the background, rationale, and administration of the MIC survey have been reported elsewhere [33]. Briefly, the aim of the MIC project was to compare several quality of life and wellbeing instruments across seven disease areas (in addition to a ‘disease free’ population) in six countries: Australia, Canada, Germany, Norway, UK, and USA. The MIC survey was administered online between February 2012 and May 2012 by a global survey company, CINT Pty Ltd.

2.2 Instruments

The MIC survey contained a comprehensive set of questions and standardized instruments [33]. In addition to questions about demographics, self-reported illnesses, and subjective wellbeing, all participants were asked to complete the ICECAP-A (with the exception of participants in Norway) and seven preference-based HRQoL instruments: 15D [34], Assessment of Quality of Life 4-dimension (AQoL-4D) [35], Assessment of Quality of Life 8-dimension (AQoL-8D) [36], EQ-5D-5L [26], Health Utilities Index Mark 3 (HUI-3) [37], Quality of Well-Being Scale Self-Administered (QWB-SA) [38], and SF-6D (based on the 36-item Short Form health survey version 2 (SF-36v2) [39]) [40]. Instruments were administered in a randomized order to account for order-effect bias [41].

For the analyses reported in the current paper, a decision was made to focus on the 35-item AQoL-8D (rather than the 12-item AQoL-4D) because of the more comprehensive descriptive system and the greater potential for overlap. The QWB-SA was also excluded because the measurement scale used for many items provides nominal data. These data would require transformation to meet the requirements for the statistical analysis performed, and such transformations would render the analysis meaningless because the descriptive system would have been modified. An overview of the dimensions and items contained within the preference-based HRQoL instruments included in this analysis (15D, AQoL-8D, EQ-5D-5L, HUI-3, and SF-6D) is provided in Online Supplementary Material I, with more comprehensive details available elsewhere [30]. The ICECAP-A comprises five dimensions (lay descriptions used by the instrument developers are included in brackets): stability (an ability to feel settled and secure), attachment (an ability to have love, friendship, and support), autonomy (an ability to be independent), achievement (an ability to achieve and progress in life), and enjoyment (an ability to experience enjoyment and pleasure). Each dimension comprises one question with four levels of response, ranging from full capability to no capability [19].

2.3 Statistical Analysis

Exploratory factor analyses were conducted in Mplus 7.4 (Muthén & Muthén, Los Angeles) [42]. In all pairwise comparisons (i.e., item-level responses for the ICECAP-A compared with item-level responses for each of the other instruments, namely 15D, AQoL-8D, EQ-5D-5L, HUI-3, and SF-6D), EFA was used to ascertain the number of unique underlying latent factors that were associated with the items covered by the respective preference-based HRQoL instrument and the ICECAP-A. The purpose of the EFA was to explore the underlying structure for a set of measures and to determine whether or not the ICECAP-A instrument measures something unique, i.e., a construct or constructs not captured by current preference-based HRQoL instruments. Output from EFA includes factor loadings, which reflect the strength and direction of association between each item and each of the common factors. Higher factor loadings indicate that more of the variance in the observed variables (i.e., items from the descriptive systems of the instruments being compared) is attributable to the latent variable (i.e., the common factor) [43]. The axes of the initial factor analysis were rotated using the geomin oblique rotation. Oblique rotation permits correlations between common factors, which is to be expected when all items measure aspects of a person’s quality of life. Pearson correlation coefficients were used to examine the extent of the relationship between factors (factors are considered as continuous variables); correlations were interpreted as weak (0.10–0.30), moderate (0.30–0.50), or strong (>0.50) [44]. Weighted least squares mean- and variance-adjusted (WLSMV) model estimation was applied to account for the ordinal nature of the item-level data.
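
As a minimal sketch of the interpretation bands quoted above, the following Python snippet labels a factor correlation as weak, moderate, or strong. The function name `classify_correlation` is hypothetical. Three choices are assumptions that the text does not state: the absolute value of the coefficient is used, a value of exactly 0.30 is assigned to the moderate band, and values below 0.10 are labeled negligible. The example values are factor correlations reported in Sect. 3.2.

```python
def classify_correlation(r: float) -> str:
    """Label a correlation using the bands quoted in the text."""
    size = abs(r)  # assumption: the sign is ignored when judging strength
    if size > 0.50:
        return "strong"
    if size >= 0.30:  # assumption: exactly 0.30 counts as moderate
        return "moderate"
    if size >= 0.10:
        return "weak"
    return "negligible"  # assumption: label for values below 0.10


# Factor correlations reported in the Results section of this paper.
for r in (0.714, 0.323, 0.685):
    print(r, classify_correlation(r))
```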

The factor model and the number of common factors for each pairwise analysis were selected using the following procedure. The first step comprised an examination of eigenvalues. Eigenvalues are numerical values that correspond to the variance in the items accounted for by each of the common factors [43]. More specifically, an eigenvalue is the sum of the squared factor loadings for a given factor. Model selection based on eigenvalues typically entails comparison of eigenvalues against the Kaiser criterion, where the number of factors with eigenvalues >1 gives the number of common factors to be specified in the model [43]. Evaluation against the Kaiser criterion was supplemented with inspection of scree plots, which are graphical representations of the eigenvalues plotted in a descending order. Model selection based on scree plots typically involves identification of the last substantial drop in the magnitude of the eigenvalues and retention of common factors prior to this drop [45].
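
The two eigenvalue-based rules described above can be sketched as follows. This is an illustration only, not the procedure run in Mplus. All function names are hypothetical, the eigenvalues and loadings are made-up numbers rather than study results, and the `min_drop` parameter is an assumed numerical stand-in for the visual judgement of what counts as a 'substantial' drop on a scree plot.

```python
def eigenvalue_from_loadings(column):
    """Sum of squared loadings for one factor (definition used in the text)."""
    return sum(loading ** 2 for loading in column)


def kaiser_count(eigenvalues):
    """Number of factors with an eigenvalue above 1 (Kaiser criterion)."""
    return sum(1 for value in eigenvalues if value > 1.0)


def scree_retain(eigenvalues, min_drop):
    """Number of factors kept before the last drop of at least `min_drop`.

    `min_drop` is a hypothetical threshold that replaces visual inspection.
    """
    ordered = sorted(eigenvalues, reverse=True)
    drops = [a - b for a, b in zip(ordered, ordered[1:])]
    retained = 0
    for position, drop in enumerate(drops, start=1):
        if drop >= min_drop:
            retained = position  # factors 1..position precede this drop
    return retained


# Hypothetical eigenvalues, not taken from the study.
eigenvalues = [4.2, 1.8, 1.1, 0.7, 0.5, 0.4, 0.3]
print("Kaiser criterion:", kaiser_count(eigenvalues))
print("Scree rule:", scree_retain(eigenvalues, min_drop=0.5))
print("Eigenvalue:", eigenvalue_from_loadings([0.8, 0.7, 0.6, 0.1]))
```

With these illustrative numbers the Kaiser criterion points to three factors, whereas the scree rule points to two, which shows how the two criteria can disagree.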

Scree plots were also used to identify whether increasing (decreasing) the number of factors suggested by the Kaiser criterion would return large gains (small losses) in the variance explained. Three model fit indices were used to further quantify such gains (losses). The root mean square error of approximation (RMSEA) estimates goodness of fit as the discrepancy between the model and the data per degree of freedom for the model [45]; RMSEA values were interpreted as indicating a close (<0.05), acceptable (0.05–0.08), marginal (0.081–0.1), or poor (>0.1) fit [43]. The Tucker–Lewis Index (TLI) and Comparative Fit Index (CFI) were also used. These goodness-of-fit estimates indicate how much better a model fits the data compared with a baseline model that assumes no relationship exists between any of the variables [46]. For both the TLI and CFI, values >0.9 indicate a ‘good’ model fit [47].
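
To make the three fit indices concrete, the sketch below computes them from chi-square statistics using common textbook definitions. These are assumptions for illustration: software may base the indices on adjusted chi-square statistics (for example, under mean- and variance-adjusted estimation), so the values need not match Mplus output. All function names and the chi-square inputs are hypothetical. Only the sample size (6756) and the RMSEA interpretation bands come from this paper.

```python
import math


def rmsea(chi2, df, n):
    """Textbook RMSEA from a model chi-square, its df, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline (independence) model."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, excess_model)
    return 1.0 if excess_base == 0 else 1.0 - excess_model / excess_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis Index from the chi-square/df ratios of both models."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def rmsea_band(value):
    """Interpretation bands quoted in the text."""
    if value < 0.05:
        return "close"
    if value <= 0.08:
        return "acceptable"
    if value <= 0.10:  # assumption: values in 0.08-0.081 count as marginal
        return "marginal"
    return "poor"


# Hypothetical chi-square inputs; only n = 6756 comes from the study.
fit = rmsea(250.0, 100, 6756)
print(round(fit, 4), rmsea_band(fit))
print(round(cfi(250.0, 100, 5000.0, 120), 4))
print(round(tli(250.0, 100, 5000.0, 120), 4))
```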

Using more than one criterion to guide the selection of the number of factors raises the possibility of seemingly conflicting results (e.g., a situation where the Kaiser criterion suggests a two-factor model, whereas model fit statistics suggest a three-factor model). Within EFA, it is important to recognize that the objective is not to arrive at the ‘true’ or ‘correct’ number of factors but to estimate the patterns of correlations among observed variables and to simplify the data so that these patterns of correlations can be more easily interpreted [43]. Where selection based on the Kaiser criterion, scree plot, and model fit did not yield a ‘clean’ factor structure, models with an increased number of factors were explored to see whether this improved the interpretation of the model (i.e., the interpretability of each set of items in the respective factors). A clean factor structure is given when item loadings are all >0.3 on at least one factor, and there are no or few cross-factor loadings (i.e., items that load >0.3 on more than one factor) [48]. Where expansion of the number of factors failed to remove cross-loadings, the parsimonious model with fewer factors suggested by the Kaiser criterion, scree plot, and model fit statistics was selected as the preferred model.
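
The definition of a clean factor structure can be illustrated with a short check. This is a sketch under stated assumptions: the function names and the loading matrix are hypothetical (not taken from Tables 2 to 6), and absolute values are compared with the 0.3 cut-off so that negative loadings are also counted.

```python
THRESHOLD = 0.3  # salient-loading cut-off quoted in the text


def salient_factors(row, threshold=THRESHOLD):
    """1-based indices of the factors on which an item loads above threshold."""
    # Assumption: absolute values are used so negative loadings count too.
    return [k for k, loading in enumerate(row, start=1) if abs(loading) > threshold]


def structure_report(loadings):
    """Split items into non-loading, cleanly loading, and cross-loading."""
    report = {"none": [], "clean": [], "cross": []}
    for item, row in loadings.items():
        hits = len(salient_factors(row))
        key = "none" if hits == 0 else "clean" if hits == 1 else "cross"
        report[key].append(item)
    return report


# Hypothetical three-factor loadings, not taken from the study tables.
loadings = {
    "item_a": [0.80, 0.05, 0.10],
    "item_b": [0.66, 0.02, 0.43],
    "item_c": [0.12, 0.71, 0.08],
    "item_d": [0.20, 0.25, 0.15],
}
print(structure_report(loadings))
```

In this made-up example, `item_b` is a cross-loading item and `item_d` fails to load on any factor, so the structure would not be considered clean.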

Once a preferred factor model was identified for each pairwise comparison, using the procedure described above, overlap between the ICECAP-A and the respective HRQoL instrument was examined using the following criteria: (1) the number of common factors shared by both instruments, and (2) the extent to which items from each instrument correlate with each shared common factor based on factor loadings. While the former refers to items of the ICECAP-A and the respective HRQoL instrument that contribute to the same underlying latent factor, the latter describes the strength of this contribution. The correlation among common factors was also examined to explore the extent to which the instruments in each pairwise comparison measure separate but correlated factors. The robustness of results was examined by comparing the extent of overlap in the preferred factor model against the extent of overlap in alternative factor models for each pairwise comparison.
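
The two overlap criteria can be sketched in code as follows. The rule used here, that a factor is 'shared' when at least one item from each instrument loads above 0.3 on it, is an assumed operationalization for illustration. The function name, item names, and loadings are all hypothetical.

```python
def shared_factors(loadings, instrument_of, threshold=0.3):
    """Map each shared factor to the items that load on it, per instrument.

    A factor counts as shared when at least one item from every instrument
    in the comparison loads above `threshold` (assumption: absolute value).
    """
    instruments = set(instrument_of.values())
    n_factors = len(next(iter(loadings.values())))
    shared = {}
    for k in range(n_factors):
        members = {}
        for item, row in loadings.items():
            if abs(row[k]) > threshold:
                members.setdefault(instrument_of[item], {})[item] = row[k]
        if set(members) == instruments:
            shared[k + 1] = members  # report factors with 1-based numbering
    return shared


# Hypothetical loadings for two capability items and three HRQoL items.
loadings = {
    "cap_stability": [0.78, 0.04, 0.02],
    "cap_autonomy": [0.35, 0.42, 0.05],
    "hrqol_mood": [0.70, 0.10, 0.12],
    "hrqol_mobility": [0.02, 0.85, 0.08],
    "hrqol_vision": [0.05, 0.03, 0.66],
}
instrument_of = {
    "cap_stability": "capability",
    "cap_autonomy": "capability",
    "hrqol_mood": "hrqol",
    "hrqol_mobility": "hrqol",
    "hrqol_vision": "hrqol",
}

result = shared_factors(loadings, instrument_of)
print("Number of shared factors:", len(result))   # criterion 1
for factor, members in result.items():            # criterion 2
    print(factor, members)
```

The count of shared factors corresponds to the first criterion, and the loadings listed for each shared factor correspond to the second.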

3 Results

Data from 6756 individuals were used in the analyses. Table 1 provides the characteristics of the study population for the combined sample and by country. Quota sampling was used in the MIC study and, therefore, the distributions of age, sex, and education level are similar across the countries. The presence of a chronic disease was self-reported by the majority (78%) of the study population.

Table 1 Characteristics of the study population [values are numbers (percentages) unless stated otherwise]

3.1 ‘Preferred’ Factor Models

Scree plots and the Kaiser criterion suggested a two-factor model for the EQ-5D-5L; a three-factor model for the 15D, HUI-3, and SF-6D; and a five-factor model for the AQoL-8D. In an attempt to improve model fit and interpretability, expansion of the number of factors was explored for all models. For the EQ-5D-5L and 15D, this resulted in an improvement in the model fit and factor structure with fewer cross-factor loadings, supporting the superiority of a three- and four-factor model, respectively. For the HUI-3, moving to a four-factor model improved model fit but resulted in a poorer factor structure and the three-factor model was retained as the preferred model. With regard to the SF-6D, a four-factor model was preferred because of a better model fit and a cleaner factor structure. A six-factor model was explored for the AQoL-8D but this did not improve interpretability of the factor structure and the five-factor model was retained. Results pertaining to the preferred factor model for each pairwise EFA are provided in Tables 2, 3, 4, 5 and 6.

Table 2 EFA comparing the ICECAP-A with the 15D (four-factor model)
Table 3 EFA comparing the ICECAP-A with the AQoL-8D (five-factor model)
Table 4 EFA comparing the ICECAP-A with the EQ-5D-5L (three-factor model)
Table 5 EFA comparing the ICECAP-A with the HUI-3 (three-factor model)
Table 6 EFA comparing the ICECAP-A with the SF-6D (four-factor model)

3.2 Overlap with the ICECAP-A

Results suggest some degree of overlap between the ICECAP-A and the HRQoL instruments, although the extent of overlap varied across instruments. For the 15D EFA, two common factors were shared (Factors 2 and 4) [see Table 2]. In each case, ICECAP-A dimensions did not load strongly onto the respective shared factor [autonomy (0.337) on Factor 2 and stability (0.307) on Factor 4] and the shared factor mostly explained variance in the 15D items. All five ICECAP-A dimensions loaded strongly onto Factor 1, a factor that was not shared by any 15D items. However, Factor 1 was strongly correlated (r = 0.714) with Factor 4, which included the 15D items depression (0.841), distress (0.870), vitality (0.491), mental function (0.309), and sleeping (0.439).

The degree of overlap was much larger when comparing the ICECAP-A with the AQoL-8D. Three common factors (Factors 1–3) were shared by ICECAP-A and AQoL-8D items (see Table 3). Four ICECAP-A dimensions [stability (0.782), autonomy (0.345), achievement (0.634), and enjoyment (0.553)] and 18 AQoL-8D items loaded onto Factor 1. Factor 2 was shared by ICECAP-A autonomy (0.415) and 14 AQoL-8D items. Factor 3 included ICECAP-A attachment (0.682) and enjoyment (0.338), and six items of the AQoL-8D [social exclusion (0.307), close relationships (0.782), enjoy close relationships (0.842), pleasure (0.365), social isolation (0.338), and intimacy (0.581)]. Strong correlations were observed between Factors 1 and 3 (r = 0.643), and Factors 1 and 4 (r = 0.641). Despite the strong correlation with Factor 1, Factor 4 was not a shared factor. Factor 4 comprised AQoL-8D items only, with the largest factor loadings being social exclusion (0.679) and social isolation (0.653).

The EQ-5D-5L shared two common factors with the ICECAP-A (Factors 1 and 3) [see Table 4]. Four ICECAP-A dimensions [stability (0.803), attachment (0.798), achievement (0.658) and enjoyment (0.826)] and EQ-5D-5L anxiety/depression (0.703) loaded onto Factor 1. Factor 3 was primarily represented by the ICECAP-A autonomy (0.657) and achievement (0.426), as well as EQ-5D-5L self-care (0.301). Whereas a moderate correlation was found between Factor 1 and Factor 3 (r = 0.323), a strong correlation (r = 0.685) was observed between Factor 3 and Factor 2, where Factor 2 comprised EQ-5D-5L items only.

The HUI-3 (Table 5) and SF-6D (Table 6) also shared two common factors with the ICECAP-A in the respective pairwise comparisons. All five ICECAP-A dimensions loaded onto the same factor as a single SF-6D item [energy (0.391)], and two HUI-3 items [emotion (0.895) and cognition (0.455)]. In both models, ICECAP-A autonomy cross-loaded onto a second factor that was shared by ambulation (0.883), dexterity (0.576), and pain (0.719) in the HUI-3 EFA, and five items from the physical functioning and role limitation dimensions in the SF-6D EFA. Moderate correlations were observed between the shared factors for the respective pairwise comparisons.

3.3 Robustness of the Preferred Factor Models

Comparing the extent of overlap in the preferred factor models against alternative (larger or smaller) factor models identified differences in overlap for the pairwise analyses comprising the 15D, EQ-5D-5L, and HUI-3 (see Online Supplementary Material II–IV, respectively). For the 15D, a three-factor model suggested a higher degree of overlap with the ICECAP-A than the preferred four-factor model. For the three-factor 15D model, four 15D items [depression (0.691), distress (0.619), vitality (0.439), and sleeping (0.314)] and all five ICECAP-A dimensions were explained by Factor 1. For the EQ-5D-5L, a two-factor model confirmed the strong loading of anxiety/depression onto Factor 1, but the remaining four EQ-5D-5L dimensions now shared a common factor with ICECAP-A autonomy, which loaded onto both common factors. Differences with regard to autonomy were also observed for the HUI-3. Unlike the preferred three-factor model, a four-factor model showed that autonomy loaded strongly on a factor that was not shared by any HUI-3 items.

4 Discussion

The ICECAP-A was developed to overcome perceived limitations associated with existing preference-based instruments that focus primarily (but not only) on health-related aspects of quality of life. Our analyses have shown that the ICECAP-A provides information over and above that garnered from several commonly used preference-based HRQoL instruments. However, the level of overlap with the ICECAP-A varied across instruments. Compared with other preference-based HRQoL instruments, more common factors were identified between the ICECAP-A and AQoL-8D. Based on item loadings, these three common factors can be described as reflecting aspects of wellbeing (Factor 1), physical health (Factor 2), and relationships (Factor 3). Some but not all of these common factors emerged from other pairwise comparisons. The third factor, relationships, was not identified when comparing the ICECAP-A with the SF-6D, EQ-5D-5L, or HUI-3. Only one factor explained the overlap with the 15D, which was related to aspects of physical health.

Our findings are consistent with those of recent studies that conducted an EFA with the ICECAP-A and the EQ-5D-3L [28], as well as with the ICECAP-O and EQ-5D-3L [27]. In these studies, the respective ICECAP instrument and the EQ-5D-3L measured two separate but correlated factors, with the majority of the EQ-5D-3L items loading onto one factor and the majority of the respective ICECAP items loading onto the second. Only EQ-5D-3L anxiety/depression loaded strongly onto the same factor as four dimensions of the ICECAP-A (stability, attachment, achievement, and enjoyment) and ICECAP-O (attachment, security, role, and enjoyment), while ICECAP-A autonomy and ICECAP-O control loaded moderately onto both factors. The authors of the two previous EFA studies conclude that the EQ-5D-3L and ICECAP instruments provide complementary information and, therefore, should not be treated as substitute outcome measures. The current study confirms these findings for the EQ-5D-5L, given the relatively minimal overlap observed with the ICECAP-A.

Similar conclusions can be drawn about the 15D, HUI-3 and SF-6D, where relatively few items loaded onto the same common factor(s) as the ICECAP-A items. In contrast, the AQoL-8D provided good coverage of the three factors it shared with the ICECAP-A, with 18 AQoL-8D items loading on Factor 1 (wellbeing factor), 14 AQoL-8D items loading on Factor 2 (physical health factor), and six AQoL-8D items loading on Factor 3 (relationships factor). As a note of caution, the overlap observed between the AQoL-8D and ICECAP-A does not endorse any suggestion that these two measures are substitutes.

The observed differences in overlap across instruments may be the result, inter alia, of differences in the framing of items (e.g., question formats, response options, recall time, etc.), based on evidence of previous comparative studies of preference-based HRQoL instruments [29, 31, 49]. The combination of different health issues within a single item [e.g., anxiety and depression (EQ-5D-5L); downhearted and depressed (SF-6D); and sad, melancholic, or depressed (15D)] may also contribute to the differences observed between instruments. More generally, the fact that the instruments included in this study differ in the way they conceptualize HRQoL [32], and in their coverage of domains to define health states, is likely to be a primary reason for the variation in study findings [29, 30]. To illustrate, compared with other instruments, the AQoL-8D has a strong focus on the psycho-social domain (25 out of 35 items) and contains questions in its descriptive system that have the greatest ability to capture the concept of capability wellbeing, or wellbeing in general. As has been shown in a previous publication using data from the MIC study, which compared three subjective wellbeing instruments (Satisfaction with Life Scale, Personal Wellbeing Index, and the Integrated Household Survey of the Office for National Statistics) with preference-based HRQoL instruments, the AQoL-8D accounted for variation in subjective wellbeing to a greater extent than the other preference-based HRQoL instruments [50].

4.1 Implications and Directions for Further Research

This study has shown that the ICECAP-A, when compared directly with the 15D, EQ-5D-5L, HUI-3, and SF-6D, provides additional complementary information in terms of the impact of an intervention on an individual’s capability wellbeing. Recent studies have demonstrated that the choice of outcome measure for economic evaluation, i.e., selecting a capability measure or a HRQoL measure, is not a trivial issue [51, 52]. In an economic evaluation of an integrated care model for frail seniors, Makai and colleagues found the intervention had a higher probability of being cost effective when using the ICECAP-O when compared with use of the EQ-5D-3L [51]. This direct comparison of cost-effectiveness findings was made possible because (1) ICECAP-O responses were used to define ‘capability QALYs’ and (2) the same range of willingness to pay (WTP) values was applied in the analysis of capability QALYs and QALYs derived from EQ-5D-3L responses. Despite the use of identical economic evaluation approaches, Makai and colleagues go on to highlight that there are no estimates of WTP for a capability QALY, and state that it is unlikely that valid comparisons can be made between the ICECAP-O and EQ-5D-3L at a given level of WTP. A second example examined the cost effectiveness of psychological interventions for drug addiction [52], concluding that under the health maximization principle (using EQ-5D-5L), the results yielded different treatment recommendations when compared with the application of the ‘sufficient capability’ approach developed by Mitchell and colleagues (using ICECAP-A) [53].

Although methodologies to operationalize the use of ICECAP instruments in economic evaluation are still in their infancy [53], findings such as those in the above examples support the use of ICECAP instruments alongside preference-based HRQoL instruments to triangulate results and evaluate the robustness of conclusions regarding cost effectiveness. However, the use of different metrics to value different healthcare interventions raises questions about the objective for resource allocation decisions in healthcare [54], e.g., does health or wellbeing (or both) enter the objective function, and is the form of this function consistent with the current emphasis on maximization (rather than sufficiency)? To answer this question, further research is needed to determine whether a society is willing to sacrifice health outcomes for improvements in dimensions of wellbeing.

The use of ICECAP instruments within the current QALY-based paradigm for economic evaluation also requires further attention in health economics research. As mentioned above, the ICECAP-A is anchored on a ‘full capability’ and ‘no capability’ scale and the instrument was not intended to be used within the QALY framework. Recent advances in this area have proposed to adjust the ICECAP-A for time to enable the assessment of gains in terms of ‘years of full capability equivalence’ [24], and an approach that focuses on the objective of achieving ‘sufficient capability’ [53]. Outside the ICECAP instruments, Cookson suggested an application of the capability approach to economic evaluation by re-interpreting the QALY, referred to as the ‘capability QALY’ [55]. Cookson argued that, in practice, HRQoL instruments incorporate some elements of capability because health affects an individual’s freedom to choose non-health activities. Compared with the ‘health QALY’, this operationalization of the ‘capability QALY’ represents individuals’ entire wellbeing (not just the health component), and, therefore, reflects the value of the capability set. Concerns over using preference-based HRQoL instruments as the base of a capability QALY because they may neglect non-health dimensions of wellbeing led Cookson to conclude that, “… the QALY approach is compatible with the capability approach only insofar as the health state descriptive systems used for generating QALYs pay close attention to proxy capability variables that cover a wide range of health and non-health dimensions of wellbeing” [56]. Results from the current study suggest the AQoL-8D could be a measure that best fits Cookson’s notion of a capability QALY because of the overlap with the ICECAP-A and the presence of non-health items in the AQoL-8D descriptive system.

In a recent review, Karimi and colleagues conclude that existing capability measures (including ICECAP instruments) have important limitations because they do not elicit capability as originally intended in the capability approach [57]. Accordingly, if the added value of capability instruments in health economics is based solely on broadening the evaluative space to extend beyond a narrow focus on health, our findings provide evidence that such benefits can be potentially captured, to some degree, by the AQoL-8D (i.e., not only through the aggregation of outcomes collected by ‘complementary’ health-related and capability measures). However, as alluded to earlier, our findings do not imply the AQoL-8D and ICECAP-A are interchangeable instruments. Further work is needed to build on these findings and explore unanswered questions, such as whether individuals are able to distinguish between their capabilities and functionings, and the comparative performance of the ICECAP-A and AQoL-8D with regard to capturing the wellbeing impacts of interventions in different clinical contexts. It is also important to note that ICECAP instruments are not the only capability measures that could be combined with QALYs derived from HRQoL instruments to provide a broader assessment of the benefit of interventions. For example, the Adult Social Care Outcomes Toolkit (ASCOT) [10] is designed to capture information about an individual’s social care-related quality of life and further research is needed to explore the relationships (including overlap) between the ASCOT, ICECAP instruments, and preference-based HRQoL instruments.

4.2 Strengths and Limitations

A major strength of this study is the inclusion of multiple preference-based instruments. While previous studies explored the overlap between ICECAP instruments and the EQ-5D-3L (using much smaller samples [27, 28]), this study provides EFA results comparing the ICECAP-A with five preference-based HRQoL instruments. Conducting an EFA that uses data from the descriptive systems only (i.e., item-level response data) is a further strength because there is no reliance on country-specific index scores, where variations across national valuation studies could influence the results [58, 59]. Given that ‘overlap’ between instruments can be explored within the descriptive systems or health state valuations, this item-level analysis complements previous work that used correlation analyses and regression-based techniques to assess index scores from the MIC study [60]. The analysis also addressed the potential problem of factor under- or over-extraction by investigating alternative factor models to examine the robustness of the ‘preferred’ factor models [61]. Potential limitations associated with using data from a multinational survey include issues with the validity of instrument translations and the representation of the respective populations (for example, participants were required to have Internet access). Survey bias resulting from the repetition of similar items should also be acknowledged owing to the administration of seven preference-based HRQoL instruments.

5 Conclusion

The ICECAP-A has the potential to capture benefits of interventions and treatments that go beyond those measured by many of the traditional health-focused preference-based instruments, such as the 15D, EQ-5D-5L, HUI-3, and SF-6D. Substantial overlap was observed between the ICECAP-A and AQoL-8D. Researchers and decision makers should be aware that there is a risk of double counting when using the ICECAP-A as a complementary measure, but the level of such a risk varies depending on the choice of HRQoL measure. Further investigations are needed to explore the extent and implications of double counting, particularly when applying the ICECAP-A alongside the AQoL-8D.