Key Points for Decision Makers

A condition-specific preference-based measure (CSPBM) is a measure of HRQOL that is specific to a condition or disease and that also has a set of preference weights that enable a health state utility value to be generated each time the measure is completed.

CSPBMs have a useful role in health technology assessment (HTA) where a generic preference-based measure (generic PBM) is not appropriate, sensitive or responsive, because they can provide health state utility values that capture change in that condition.

Due to issues of comparability across different patient groups and interventions, the usage of CSPBMs in HTA is generally limited to interventions where it is inappropriate to use a generic PBM.

1 What is a Condition-Specific Preference-Based Measure of Health?

This paper provides a definition of a condition-specific preference-based measure (CSPBM) of health or health-related quality of life (HRQOL) and critically examines its role in health technology assessment (HTA) and beyond. The paper provides an overview and summary of all existing CSPBMs, thus providing a resource of references for all CSPBMs across all conditions that have been derived in the literature. The paper also summarises available psychometric evidence on the performance of CSPBMs, and provides guidance on the advantages and disadvantages of using CSPBMs for HTA in comparison with generic preference-based measures (PBMs) such as the EQ-5D.

A CSPBM is a measure of HRQOL that is specific to a certain condition or disease and that also has a set of preference weights that enables a utility value to be generated from responses to the measure. Analogously to generic measures, a CSPBM consists of (1) items or questions that are typically completed by the patient to report their own health, (2) a classification system which is used to classify the self-reported health of the patient into a health state and (3) a value set that enables a utility value to be produced for every health state described by the classification system. CSPBMs typically include dimensions that are important for that condition but generally not important across all patient groups. Each CSPBM is unique and their content varies substantially. Some CSPBMs include a range of dimensions covering both generic and condition-specific aspects (e.g. a cancer-specific measure with dimensions of physical functioning, role functioning, pain, emotional functioning, social functioning, fatigue and sleep disturbance, nausea, and constipation and diarrhoea [2]), whereas others are focussed upon symptoms (e.g. a measure for flushing [a side effect of niacin medications] with dimensions of redness of skin, warmth of skin, tingling of skin, itching of skin, and difficulty sleeping [3]). Some CSPBMs are uni-dimensional and have several items relating to the same dimension (e.g. the measure for flushing [3]), whereas others are multi-dimensional (e.g. the measure for cancer [2]). CSPBMs can be developed as new measures (‘de novo’) or derived from an existing condition-specific measure.
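
The first two components described above (items and classification system) can be illustrated with a minimal sketch. The two-dimension measure, its dimension names and its level wording are all hypothetical and are not taken from any published CSPBM; the third component, the value set, would then assign a utility value to each health state label produced here.

```python
from dataclasses import dataclass

# Hypothetical two-dimension measure; names and level wording are illustrative only.
@dataclass(frozen=True)
class Dimension:
    name: str
    levels: tuple  # level descriptions, ordered from best (level 1) to worst


CLASSIFICATION = (
    Dimension("pain", ("none", "mild", "severe")),
    Dimension("sleep", ("no difficulty", "some difficulty", "great difficulty")),
)


def classify(responses: dict) -> str:
    """Map self-reported item responses to a health state label such as '13'."""
    digits = []
    for dim in CLASSIFICATION:
        level = dim.levels.index(responses[dim.name]) + 1  # 1 = best level
        digits.append(str(level))
    return "".join(digits)


# A patient reporting no pain but great difficulty sleeping is in state '13'.
state = classify({"pain": "none", "sleep": "great difficulty"})
print(state)
```

Each digit of the label gives the severity level on one dimension, in the order the dimensions are listed in the classification system.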

2 What is the Role of Condition-Specific Preference-Based Measures?

CSPBMs have a role in HTA where a generic PBM is not appropriate, or has poor psychometric performance in a condition or patient group, as they provide appropriate utility values under these circumstances. Where a generic PBM has been shown to perform poorly in terms of sensitivity or responsiveness (e.g. vision and hearing, severe and complex mental health problems and dementia, as discussed in Sect. 2), it is not expected that it will accurately capture the impact of an intervention on the HRQOL of the patient. For example, if a generic PBM has been shown to suffer from ceiling effects for a condition then an improvement in HRQOL following an intervention cannot be captured. In addition, a generic PBM may fail to capture all aspects of HRQOL that are important for that patient group. In contrast, CSPBMs are designed to capture the aspects of HRQOL that are important for that condition, and unlike a generic PBM this is likely to include symptoms, sometimes alongside more generic dimensions of HRQOL (e.g. a cancer-specific measure with dimensions of physical functioning, role functioning, pain, emotional functioning, social functioning, fatigue and sleep disturbance, nausea, and constipation and diarrhoea [2]).

In circumstances where a generic PBM has been shown to be appropriate for a condition, CSPBMs can be used in sensitivity analyses of the economic model to indicate how the use of the generic PBM, which although appropriate may be less sensitive or responsive to changes in health, may have impacted on incremental cost-effectiveness ratios.

CSPBMs have a role in HTA external to the economic model to demonstrate additional benefits that may not be captured by the generic PBM and provide additional supporting evidence. CSPBMs also have a wide role outside of economic evaluation where they can be used to compare health and treatment effects across different studies within a patient group. The inclusion of CSPBMs in a wide range of studies provides utility values that are relevant for that condition as they take into consideration the specific aspects of health that are important for that condition. These utility values can be reported alongside the detailed HRQOL data provided from the condition-specific measure that the CSPBM is derived from (e.g. reporting condition-specific EORTC QLQ-C30 HRQOL data alongside CSPBM data from the EORTC-8D for patients with prostate cancer [4]).

3 Development Issues

3.1 Development from an Existing Condition-Specific Measure

The advantage of deriving a PBM from an existing condition-specific measure is that the existing measure has already been used in many studies, and therefore existing datasets can be used to generate utility values. In addition, the existing measure is likely to have been validated and is likely to have evidence of good psychometric performance.

Figure 1 outlines the six-stage process developed by researchers at the University of Sheffield to derive a CSPBM from an existing condition-specific measure [1]. Stages I–IV derive the classification system and stages V–VI derive the value set for every health state described by the classification system. The classification system consists of multiple dimensions with typically one item to reflect that dimension, with several levels of severity.

Fig. 1 Six stages for deriving a condition-specific preference-based measure from an existing condition-specific (non-preference-based) measure. Modified from Brazier et al. [1]

Stages I–IV derive the classification system using a combination of factor analysis, Rasch analysis and classical psychometric analysis. Factor analysis can be used to either confirm the dimensional structure of the existing condition-specific measure, to propose a different dimensional structure indicating where dimensions are not independent or where items within the same dimension capture different concepts [1], or to propose a dimension structure for the existing condition-specific measure that does not have one proposed by the instrument developer [5, 6]. Rasch analysis is a mathematical technique that enables qualitative data to be converted onto a continuous latent scale using a logit model [7, 8]. Classical psychometric analyses are used to indicate the performance of each item within each dimension and include floor and ceiling effects, correlation between items and dimensions, responsiveness over time and levels of missing data.
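
As an illustration of one of the classical psychometric checks listed above, the sketch below computes floor and ceiling effects for a single item as the share of respondents at the worst and best level. The item scoring (1 = best, 4 = worst) and the response data are hypothetical, and the threshold at which such a share is judged problematic is not specified here.

```python
from collections import Counter


def floor_ceiling(responses, best, worst):
    """Return (floor, ceiling): the share of responses at the worst and best level."""
    counts = Counter(responses)
    n = len(responses)
    return counts[worst] / n, counts[best] / n


# Hypothetical responses to one item scored 1 (best) to 4 (worst).
item = [1, 1, 1, 2, 1, 3, 1, 2, 1, 4]
floor, ceiling = floor_ceiling(item, best=1, worst=4)
print(f"floor={floor:.0%}, ceiling={ceiling:.0%}")
```

In this invented sample, six of the ten respondents sit at the best level, so any improvement in their health could not be registered by this item.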

Stage I involves the derivation of the dimensions using a combination of factor analysis and the existing factor structure of the measure, and stage II uses Rasch analysis or item response theory and classical psychometric analysis to select the best item(s) to reflect each dimension in terms of coverage, ordering of levels, no differential item functioning across different groups, low floor and ceiling effects and good responsiveness. Stage III considers reducing the item levels to ensure that respondents can accurately distinguish between each item level. Stage IV validates stages I–III, preferably on an independent dataset, to ensure the classification system has not been unduly influenced by the choice of dataset used to derive it.

Stage V entails a valuation study typically with members of the general population to value a sample of health states, as it is generally not feasible to value all health states within the full classification system as typically there are too many. Stage VI involves regression analysis of the valuation data to produce a decrement from the reference level for every level of every dimension. This enables a utility value to be generated for every health state described by the classification system. Stages V and VI typically involve the same procedure as valuation of a generic PBM (see Sect. 2 for an overview). One additional challenge is that some CSPBMs may be uni-dimensional, or have a uni-dimensional component; for example, a CSPBM for flushing or common mental health problems. For uni-dimensional measures or components, valuation can be adapted to take this uni-dimensionality into consideration through the selection of health states for valuation using Rasch analysis, which does not require independence of items [3, 9].
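
The way a set of decrements yields a utility value for every health state can be sketched as follows. This assumes a simple additive specification (full health minus one decrement per dimension); published value sets may use other specifications, for example with interaction terms. The dimensions, level counts and decrement values below are invented for illustration and come from no published value set.

```python
from itertools import product
from math import prod

# Hypothetical decrements from the reference (best) level; level 1 has decrement 0.
# These numbers are invented for illustration only.
DECREMENTS = {
    "pain":    {1: 0.0, 2: 0.05, 3: 0.20},
    "sleep":   {1: 0.0, 2: 0.03, 3: 0.10},
    "fatigue": {1: 0.0, 2: 0.04, 3: 0.12, 4: 0.25},
}


def utility(state: dict) -> float:
    """Additive model: full health (1.0) minus the decrement for each dimension's level."""
    return 1.0 - sum(DECREMENTS[dim][level] for dim, level in state.items())


def all_states():
    """Yield every health state the classification system can describe."""
    dims = list(DECREMENTS)
    for levels in product(*(DECREMENTS[d] for d in dims)):
        yield dict(zip(dims, levels))


# The number of states is the product of the number of levels per dimension.
n_states = prod(len(levels) for levels in DECREMENTS.values())
value_set = {tuple(s.values()): round(utility(s), 3) for s in all_states()}
print(n_states, value_set[(1, 1, 1)], value_set[(3, 3, 4)])
```

Because the number of states multiplies with every added dimension or level, even a modest classification system describes far more states than can be valued directly, which is why only a sample is valued and the remainder are predicted from the regression.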

At every stage, clinical input is used and often the instrument developer of the existing condition-specific measure is also involved. Some measures have also involved patients to ensure that the classification system includes all aspects that are important to patients (e.g. see [10]). Other measures have been developed using psychometric analyses on multiple existing condition-specific measures in order to select the best performing dimensions and items across these measures (e.g. [11]).

3.2 Developing a New Measure ‘De Novo’

The advantage of developing a new measure is that it does not have to be based on an existing condition-specific measure, as for some patient groups existing measures may not cover all important aspects of HRQOL. However, there will be no pre-existing evidence on the psychometric performance of the new measure, which can be important for some international agencies when they are examining the appropriateness of the usage of a CSPBM. It may therefore be necessary to establish the psychometric properties of the measure before it can be recommended for usage.

Developing a new measure involves a modification of the six-stage process. Guidelines for the development of dimensions and items for new measures are available from the US Food and Drug Administration (FDA) [12]. Patient involvement is emphasised at every stage of developing a classification system for a new measure, including both the generation and the validation of the content. Approaches in the literature include qualitative research with patients to identify dimensions, items and item wording (e.g. [13]). The new measure is valued in the same way as a CSPBM derived from an existing condition-specific measure, as described above for stages V and VI.

4 Description of Condition-Specific Preference-Based Measures

Papers developing CSPBMs either from existing condition-specific measures or ‘de novo’ that were published in English were identified using (1) a literature search conducted in December 2010 [1] and updated in March 2016 for the purpose of this paper and (2) a recent review of the literature [14]. Measures were excluded if they do not provide utility weights, do not anchor utilities on the 1–0 full-health–dead scale, or derive utilities by mapping from a condition-specific measure to own utility values (as this is mapping, not a PBM). In total, 36 CSPBMs were identified across a range of 29 conditions. The CSPBMs are summarised in Table 1 and further details are provided in Appendix 3 (see electronic supplementary material, Table updated and modified from [63]).

Table 1 Summary of existing condition-specific preference-based measures

5 Psychometric Properties of Condition-Specific Preference-Based Measures

5.1 Psychometric Performance of Condition-Specific Preference-Based Measures in Comparison with Existing Condition-Specific Measures

There is limited evidence comparing CSPBMs to the existing condition-specific measure they are derived from [1, 14]. However, evidence suggests largely comparable psychometric performance in terms of discrimination across severity groups and responsiveness to change over time between the existing condition-specific measure and CSPBMs for asthma, cancer, common mental health problems and overactive bladder [1].
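
Responsiveness to change over time can be summarised in several ways. The sketch below uses the standardised response mean (mean change divided by the standard deviation of change), which is one common statistic and is assumed here for illustration; it is not necessarily the statistic used in the studies cited above. The baseline and follow-up scores are hypothetical.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """Mean of the paired changes divided by the sample standard deviation of the changes."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical utility values for four patients before and after treatment.
baseline = [0.60, 0.55, 0.70, 0.65]
follow_up = [0.70, 0.60, 0.85, 0.70]
srm = standardised_response_mean(baseline, follow_up)
print(round(srm, 2))
```

Computing the same statistic for the CSPBM and for the measure it is compared with, on the same patients, gives a like-for-like comparison of responsiveness.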

5.2 Psychometric Performance of Condition-Specific Preference-Based Measures in Comparison with Generic Preference-Based Measures

There is limited evidence comparing CSPBMs and generic PBMs [1, 14]. However, evidence suggests that CSPBMs in asthma, cancer, common mental health problems and overactive bladder offer an advantage for measuring milder health states, and are less prone to ceiling effects than the EQ-5D [1]. The ceiling effects of EQ-5D have been widely reported in the general literature examining the performance of EQ-5D (see for example [41]), and therefore for patients with mild health problems CSPBMs may be more likely to provide a more accurate measurement of HRQOL and capture change in HRQOL. The evidence also suggests that these CSPBMs and a measure in vision better discriminated across severity groups than the generic PBM they were compared with [1, 42, 43, 44]. It is recommended that the psychometric properties of any CSPBM are examined prior to its usage to inform HTA, and preferably compared with a generic PBM to confirm where it offers an advantage.

Mean change over time and differences in utility values between different severity groups have been found to be smaller for CSPBMs than generic PBMs, with smaller standard deviation, in particular in comparison with EQ-5D [1] (although this may not always be the case [43, 45]). Any differences may impact on incremental cost-effectiveness ratios, and may potentially impact upon whether interventions are considered cost effective. However, research in this area has been limited to a small number of datasets on a small number of conditions, CSPBMs and generic PBMs, and the existing published evidence is unlikely to be representative across all CSPBMs. Further research in this area is encouraged.
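
The effect of a smaller utility change on the incremental cost-effectiveness ratio can be shown with a worked sketch. All figures are hypothetical, and the QALY gain is approximated as a constant utility difference multiplied by its duration, which is a simplifying assumption rather than the approach of any particular economic model.

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: incremental cost per QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("This sketch only handles a positive QALY gain")
    return delta_cost / delta_qaly


# Hypothetical inputs: the same intervention and incremental cost, with the
# utility gain measured by two different instruments over one year.
delta_cost = 2000.0
years = 1.0
qaly_gain_generic = 0.10 * years  # larger mean change on the generic PBM
qaly_gain_cspbm = 0.05 * years    # smaller mean change on the CSPBM

icer_generic = icer(delta_cost, qaly_gain_generic)
icer_cspbm = icer(delta_cost, qaly_gain_cspbm)
print(round(icer_generic), round(icer_cspbm))
```

With these invented inputs, halving the measured utility gain doubles the cost per QALY, so the same intervention could fall on either side of a cost-effectiveness threshold depending on the measure used.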

6 Selecting a Measure for Economic Evaluation

Recent ISPOR taskforce guidance provides a framework for researchers considering the collection of utility data for HTA [46]. An important consideration is the appropriateness of the measure for the condition and population, and the choice will also depend on the requirements of the agency to which the economic evaluation will be submitted (see [47]). A further important consideration is whether to use a generic PBM or a CSPBM. Table 2 outlines the advantages and disadvantages of generic PBMs and CSPBMs with reference to different criteria: completion of the measure by the patient, psychometric performance, HRQOL coverage, issues with the valuation process used to elicit the utility values and comparability of values for use in HTA.

Table 2 Advantages and disadvantages of generic and condition-specific preference-based measures

Overall, CSPBMs offer several advantages: they impose a lower completion burden on patients, are more relevant to the patient and are less likely to suffer from ceiling effects, and the existing condition-specific measures they are derived from are typically sensitive and responsive. However, they also have disadvantages: they may not be able to capture the impact of all side effects and comorbidities, their elicited utility values may be prone to exaggeration from focussing effects, the values they generate are not directly comparable across different conditions, and they are not accepted in base-case cost-effectiveness analyses by many international agencies.

It is important to note that the advantages and disadvantages of CSPBMs vary both by the exact measure and the patient group it is administered to. The content of CSPBMs varies widely, where for example a CSPBM in cancer [2, 10] may be perceived as more generic in its dimensions, and could even have ‘bolt-on’ dimensions for certain cancers, whereas other CSPBMs such as for flushing are uni-dimensional [3]. It is also important to note that the psychometric performance of measures differs across patient groups, and hence a measure that is appropriate for use in some patient groups is not necessarily appropriate in all patient groups.

Generic PBMs have the advantage that they offer comparability across patient groups and interventions, have no issues in their valuation and can arguably capture comorbidities where these occur in the generic dimensions of HRQOL. However, they may not be responsive or sensitive, may suffer from ceiling effects and may not be relevant to the patient. They may also increase patient burden where they are administered in addition to condition-specific measures that are included for reasons unrelated to populating the economic model.

It has been argued that CSPBMs can provide utility values that are comparable to generic PBMs as they can be derived using the same methodology as a generic PBM (e.g. a large number of CSPBMs have been derived using a time trade-off interview with the UK general population as also used by the EQ-5D UK value set), and utility values are anchored on a comparable 1–0 full-health–dead scale required to generate QALYs. However, there remains the issue of the differences in descriptive systems, and issues in the valuation of CSPBMs due to labelling the condition (disease labelling of health states can impact on elicited values [48]) and focussing effects (respondents focus only on the areas of HRQOL mentioned and exaggerate their importance) that may mean that there are important underlying issues of comparability. For this reason, to enable comparability in HTA conducted across interventions and patient groups a generic PBM is typically recommended for use in base-case analyses, and a CSPBM is typically only recommended where evidence demonstrates a generic PBM is inappropriate (see for example prescriptive guidance by NICE [49]), or for use alongside a generic PBM in sensitivity analyses.

7 Summary

The paper provides an overview and summary of all existing CSPBMs, providing a resource for researchers. There are a large number of CSPBMs across a wide range of conditions, and the coverage of these measures varies from covering a wide range of dimensions to more symptomatic or uni-dimensional measures. CSPBMs have a useful role in HTA where a generic PBM is not appropriate, sensitive or responsive. Due to issues of comparability across different patient groups and interventions, their usage in HTA is typically limited to conditions where it is inappropriate to use a generic PBM, or to sensitivity analyses. Widespread use of CSPBMs rather than generic PBMs in HTA would reduce comparability of evaluations of interventions across different patient groups. For this reason, CSPBMs are not recommended as a common replacement for generic PBMs; rather, they offer important evidence alongside generic PBMs or where generic PBMs are inappropriate. Evidence suggests that CSPBMs offer an advantage in more accurate measurement of milder health states. However, CSPBMs can fail to capture comorbidities and all side effects. Mean change and standard deviation can differ from generic PBMs, and this may impact on incremental cost-effectiveness ratios.