Introduction

Heart disease is the most costly of all chronic diseases. The burden of heart disease is highlighted by the Centers for Disease Control and Prevention in the US, which indicated costs of $219 billion [1], including healthcare, medicine and loss of productivity, between 2014 and 2015 [2]. With ageing populations in many developed countries, these costs are expected to continue to increase significantly in the future [3]. New research developments and a continuous influx of new technology also generate increasing demands and expectations for new interventions for the prevention and treatment of heart disease, which is challenging in the context of healthcare resource scarcity [4]. This poses a growing challenge for decision makers in allocating resources and subsidising important care among competing demands. Health economic evaluation assesses the incremental benefits and costs of new interventions, which can provide crucial information on value for money to support decision-making [5].

One well-established outcome measure in economic evaluation is the quality-adjusted life year (QALY). The “quality” represents the health state utility value attached to different health states, which lie on a QALY scale anchored at 0 (death) and 1 (full health). These utility values are normally derived from national-level valuation surveys of the general public which assess the preference for each health state [6]. Typically, QALYs are generated using multi-attribute utility instruments (MAUIs) (also referred to as preference-based measures), which consist of a health state classification system with a corresponding utility value set [7]. By far the most widely used generic MAUI is the EQ-5D [8]. However, there are compelling arguments that, for some diseases, generic MAUIs are not sensitive enough to measure the change in utility [9, 10]. A recent review of methods to derive QALYs in trial-based cost-utility analyses in heart failure found that EQ-5D was the most widely used measure, but that overall there were similar proportions of studies reporting significant differences in QALYs between interventions, non-significant differences, and no information on significance [10]. While this may be because the interventions were ineffective, there is a concern that the use of generic preference-based measures such as EQ-5D in economic evaluation in heart failure may mean that the full impact of the intervention has not been captured due to the psychometric performance of EQ-5D in this patient population [11]. While condition-specific measures are used in trials, these are often mapped to a generic preference-based measure to generate utility values. However, in this mapping process the sensitivity of the condition-specific measure will not be reflected unless it is also captured in the generic preference-based measure [12, 13]. Generating utility values for QALYs directly from a sensitive condition-specific measure is therefore an advantage.
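As a minimal illustration of how utility values enter a QALY calculation, the sketch below weights the time spent in each health state by its utility value and sums the result. The function name, the patient profiles and all utility values are hypothetical and are not taken from this study; discounting is deliberately omitted.

```python
def qalys(periods):
    """Sum utility-weighted time; `periods` is a list of (utility, years) pairs."""
    total = 0.0
    for utility, years in periods:
        if years < 0:
            raise ValueError("duration must be non-negative")
        total += utility * years
    return total


# Hypothetical health profiles (illustrative values only, not study data).
usual_care = [(0.60, 2.0), (0.50, 3.0)]
intervention = [(0.70, 2.0), (0.65, 3.0)]

# Incremental QALYs: the quantity that feeds into a cost-utility analysis.
incremental_qalys = qalys(intervention) - qalys(usual_care)
print(round(incremental_qalys, 2))
```

If the instrument supplying the utility values cannot detect a real change in health, the two profiles receive similar utilities and the incremental QALYs shrink towards zero, which is the sensitivity concern raised above.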

Heart disease is a collection of complex clinical syndromes presenting as heart failure, atrial fibrillation and angina pectoris, among others. Measuring quality of life in patients with heart disease with complex symptoms is challenging for widely used generic instruments such as the EQ-5D [11]. In contrast, disease-specific quality of life instruments in heart disease include specific information such as shortness of breath/chest pain and their frequency and impact on sleep [14]. Additionally, the emotional and social impacts of heart disease are difficult to detail with the limited questions posed by the EQ-5D [10]. The inability to participate in social activities, feeling excluded and feeling worn out due to the constant effect of chronic heart conditions are some of the health-related quality of life domains that heart disease-specific instruments are capable of capturing [14]. As such, there are calls for the inclusion of disease-specific measures when capturing the effectiveness of interventions in heart conditions [15], and disease-specific measures may have more content validity than generic measures. While available disease-specific measures are sensitive to changes in quality of life in heart disease, none are preference-based. Generic measures may lack sensitivity to changes as they do not capture important symptoms of heart disease such as breathlessness, chest pain and aching legs [10]. Moreover, a recent review suggested that the sensitivity of the instrument measuring quality of life is an important factor in determining the cost-effectiveness of an intervention [10].

There are a number of heart disease-specific instruments covering specific conditions such as heart failure [16] and angina [17]. The MacNew Heart Disease Health-Related Quality of Life Instrument [MacNew] is one of the most widely used heart disease-specific instruments [18] and has been found to be clinically sensitive and responsive in detecting a change in heart disease [19]. The reliability, validity, responsiveness and interpretability of this instrument have been tested [18,19,20]. However, the MacNew is not a preference-based instrument and cannot be used directly to calculate QALYs, which are required in economic evaluation. Additionally, in its existing form, the MacNew has too many questions to be reliably valued. Although mapping functions have been reported that predict health state utility values elicited from generic preference-based instruments, these mapping functions from the MacNew may not be ideal because the target instruments are not heart disease-specific [13]. To the best of our knowledge, no preference-based heart disease-specific instrument is available. As economic evaluations underpin billions of dollars of public and private healthcare investment and expenditure, it is imperative that accurate measurement and valuation of changes in health for heart disease patients exist. There is evidence that disease-specific preference-based measures can accurately measure milder health states, and differences have been observed in the mean change and standard deviation of utility values between generic and disease-specific measures [21]. It is important to ascertain whether heart disease-specific utility values would have an impact on resource allocation decisions. This is the first step in determining whether more sensitive and nuanced heart disease-specific utility weights would translate to saving money for health systems in the future.
This study fills the gap in the literature by developing a heart disease-specific health state classification system from the MacNew that is amenable to valuation. This will facilitate the development of a new preference-based heart disease-specific instrument. Preference-based measures (PBMs) have two components: a classification system and a utility value set. The aim of this study was to develop a heart disease-specific classification system.

Methods

The study collected data from patients with cardiovascular conditions using four heart disease-specific quality of life instruments: the MacNew Heart Disease Health-Related Quality of Life Instrument [18], the Minnesota Living with Heart Failure Questionnaire [16], the Seattle Angina Questionnaire [17], and the Kansas City Cardiomyopathy Questionnaire [22], as well as two generic instruments, the EQ-5D-5L and SF-36, together with some sociodemographic information. The data were collected between June 2017 and June 2018 at the Royal Brisbane and Women’s Hospital (RBWH), Brisbane, Australia. Participants were approached in the RBWH cardiac outpatient clinics and hospital wards and invited to take part in the study. Adults aged over 18 years who were attending a cardiology clinic or admitted to a ward for treatment of any chronic heart condition were eligible for the study. The self-reported heart disease category was confirmed by the treating physician. Participants were asked to complete the questionnaire at the clinic or were given the option to complete the questionnaire at home and post it to the study team. Ethical approval was obtained from the RBWH Human Research Ethics Committee (HREC/17/QRBW/418).

MacNew

The MacNew instrument has shown strong psychometric properties [19] and, unlike the other three instruments, it does not ask patients to consider a specific heart disease. The MacNew evolved from the Quality of Life after Myocardial Infarction (QLMI) instrument [23], which captured the quality of life changes in patients who underwent cardiac rehabilitation after an acute myocardial infarction (MI). The MacNew consists of 27 items within three dimensions: physical, emotional and social. It uses a Likert-type item scoring system with 7 as the best and 1 as the worst quality of life.

Analysis

Development of the new classification system involved a three-stage process—Stage I: establish dimensionality; Stage II: eliminate and select items per dimension; Stage III: explore item-level reduction. This process has been used previously to derive the SF-6D from the SF-36 [24], and since used to generate several condition-specific preference-based measures [25,26,27,28].

Frequency distributions of the MacNew items (to measure floor and ceiling effects) and domain scores (global, physical, social and emotional) were investigated first. Spearman’s correlation coefficients were used to examine whether individual items had a substantial correlation with each other, as the reliability of a scale is based on the strength of the average inter-item correlation [29]. Mean inter-item correlations and corrected item-total correlations were also checked. It has been suggested that inter-item correlations above 0.3 and below 0.7 are desirable [29]. Corrected item-total correlations indicate how well each item correlates with the total of the other items in the scale.
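The floor and ceiling check described above can be sketched as follows. This is an illustration only, assuming item responses coded 1 (worst) to 7 (best) as in the MacNew and missing values coded as None; the function name and the example responses are hypothetical and are not study data.

```python
from collections import Counter


def floor_ceiling(responses, lowest=1, highest=7):
    """Return (floor, ceiling) proportions for one item, ignoring missing values."""
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no valid responses")
    counts = Counter(valid)
    n = len(valid)
    # Floor: share of respondents at the worst level; ceiling: share at the best level.
    return counts[lowest] / n, counts[highest] / n


# Hypothetical responses to a single item (illustrative only).
item = [7, 7, 6, 7, 5, 7, 1, 7, None, 4]
floor, ceiling = floor_ceiling(item)
print(f"floor={floor:.2f}, ceiling={ceiling:.2f}")
```

An item where most respondents sit at the best level has little room to register improvement, which is why such items are examined more closely before being carried into a classification system.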

Stage I: establish dimensionality

A principal component analysis (PCA) was used as the extraction technique. Standard methods of PCA based on Pearson’s correlation assume that variables are cardinal and follow a multivariate normal distribution. Items 7 (“Happy, satisfied, pleased”), 11 (“felt dependent on”), 13 (“others confidence”) and 27 (“sexual intercourse”) were removed prior to the PCA as these items were deemed not to contribute to the quality of life of patients but rather to reflect patient confidence and satisfaction. In addition, the majority of respondents responded “not applicable” to item 27, rendering it non-relevant for the analysis. Initially, a three-component solution was followed according to the original theoretical domain structure of the instrument [18]. However, a more recent empirical study resulted in five components: emotion, restriction, symptoms, perception of others, and social [19]. We generated PCAs for 3–6 components in different iterations to seek the optimal solution. We followed the criteria recommended by Tabachnick and Fidell [30] in experimenting with different extraction techniques, considering various numbers of components, and applying both oblique and orthogonal rotation techniques until an optimal and meaningful dimension structure was obtained. This followed their instruction to obtain the “solution with greatest utility, consistency and meaning” [30]. In deciding the number of components, we used the Kaiser–Meyer–Olkin measure of sampling adequacy (KMO), Bartlett’s test of sphericity, Kaiser’s criterion, Cattell’s scree plot, percentage of variance and parallel analysis (using Monte Carlo PCA) as analytical criteria. We used a previously published PCA of the MacNew as well as clinical and patient advice to guide the meaning of the components we extracted [19]. As the MacNew data are ordinal in nature, a supplementary analysis was also undertaken using a polychoric correlation matrix, and the results are reported in the supplementary material.
The PCA was conducted using SPSS and FACTOR software.
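Two of the retention criteria named above, Kaiser’s criterion and parallel analysis, reduce to simple comparisons once eigenvalues are available. The sketch below illustrates those decision rules only; it is not the SPSS or FACTOR procedure used in this study, the function names are hypothetical, and the eigenvalues are invented for illustration.

```python
def kaiser_count(eigenvalues):
    """Number of components with an eigenvalue greater than 1 (Kaiser's criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def parallel_analysis_count(observed, random_reference):
    """Count leading components whose observed eigenvalue exceeds the matching
    eigenvalue obtained from random data; stop at the first one that does not."""
    kept = 0
    for obs, ref in zip(observed, random_reference):
        if obs <= ref:
            break
        kept += 1
    return kept


# Hypothetical eigenvalues in descending order (not study results).
observed = [8.9, 2.1, 1.6, 1.3, 1.05, 0.8]
# Hypothetical mean eigenvalues from randomly generated data sets of the same size.
random_reference = [1.40, 1.33, 1.28, 1.23, 1.19, 1.15]

print(kaiser_count(observed))                               # Kaiser's criterion
print(parallel_analysis_count(observed, random_reference))  # parallel analysis
```

The two rules can disagree, as they do for these invented values, which is one reason several criteria were considered alongside interpretability rather than relying on any single rule.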

Stage II: eliminate and select items per dimension

Within each dimension resulting from stage I, we conducted Rasch analysis [31] to reduce the items. The Rasch model is the unidimensional variant of the Item Response Theory models and converts categorical responses to a continuous latent scale [32]. All Rasch analyses were conducted using RUMM2030 software [31]. For each dimension, we considered the goodness of fit of each item within the fitted model, together with its range, spread, response ordering and differential item functioning (DIF) for gender [26]. The goodness of fit was assessed using the individual item fit [33]. Fit residual values above 2.5 and significant probability values below the Bonferroni adjustment (probability base 0.05) were used to identify non-fitting items [34]. A respondent with a high level of the latent trait within each dimension being measured would endorse high scoring responses on each item, and vice versa. This was presented in the threshold map of the analysis. DIF occurs when different groups within the sample (e.g. males versus females) respond in a different manner to an individual item despite no difference in underlying health. Generally, once an item was found not to fit a model it was removed, and the model was re-estimated; however, we also investigated the potential reason for the misfit. It could have been that the item would be better grouped into a different dimension, or perhaps it represented a standalone dimension (for example, for symptoms). Ceiling and floor effects of each item were considered during item selection for each dimension [25]. We also examined item wording and its suitability for valuation. This resulted in a draft classification system.
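The item fit screening described above can be illustrated with a short sketch. It assumes that an item is flagged when either criterion is breached and that the absolute value of the fit residual is compared with 2.5; both choices are assumptions made for illustration, as are the function names and the example statistics, which are not RUMM2030 output from this study.

```python
def bonferroni_alpha(base_alpha, n_tests):
    """Bonferroni-adjusted significance level for n_tests comparisons."""
    return base_alpha / n_tests


def misfitting_items(item_stats, base_alpha=0.05, residual_limit=2.5):
    """Return labels of items flagged as non-fitting.

    `item_stats` maps an item label to a (fit_residual, p_value) pair.
    """
    alpha = bonferroni_alpha(base_alpha, len(item_stats))
    flagged = []
    for label, (residual, p_value) in item_stats.items():
        # Assumption: either a large absolute residual or a p-value below
        # the adjusted threshold is enough to flag the item.
        if abs(residual) > residual_limit or p_value < alpha:
            flagged.append(label)
    return flagged


# Hypothetical fit statistics for four items in one dimension.
stats = {
    "item A": (0.8, 0.40),
    "item B": (3.1, 0.20),
    "item C": (1.2, 0.001),
    "item D": (-0.5, 0.03),
}
print(misfitting_items(stats))
```

With four items the adjusted threshold is 0.05 / 4 = 0.0125, so item D (p = 0.03) is not flagged even though it would be at the unadjusted 0.05 level.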

Stakeholder engagement for item selection

Patient opinion and the collective opinion of the research team were also considered for item selection. A proposed classification system was discussed with the clinical team to ascertain the clinical importance of the selected items and to identify any omitted items that should be included due to their clinical importance. The MacNew developers were also consulted at this stage. The proposed health state classification was then presented to cardiologists, nursing and allied health practitioners (n = 20) who interact with patients with heart disease daily. They were presented with an unfinalized set of items in each dimension and were asked to select items based on their experience with patients. A revised classification system was next presented to a health consumer advisory group (n = 12) for heart disease, consisting of senior members of the society with numerous chronic heart diseases. This part of the study was conducted as a focus group discussion facilitated by a researcher experienced in the method. The purpose of selecting items was explained to the group and the selected items were presented one by one for their opinion. We also asked if any important aspects of QoL in heart disease had been missed.

Stage III: explore response-level reduction

In valuation, respondents only see one level for each item in the classification system, e.g. ‘none of the time’. The MacNew uses different wordings for its items, ranging from “all of the time” to “none of the time”, “extremely limited” to “not limited at all” and “extreme shortness of breath” to “no shortness of breath”, with seven response levels each. The objective of stage III was to assess the possibility of reducing the original seven response levels to four levels in each item, as respondents may not be able to distinguish all seven levels in valuation. Rasch analysis and the distribution of levels in selected items were used to guide the collapsing of adjacent levels and to examine the ability of the respondents to identify the order of the levels. Threshold probability curves that give information on the distribution of item levels across the latent space were examined [33]. Domain-specific Rasch models were fitted with all the items per dimension, including the excluded items. The levels were merged, taking into consideration the threshold probability curves, the frequency distribution of the levels, threshold maps and item goodness of fit statistics of the proposed merged levels [33].

Results

A total of 685 respondents participated in this study (Table 1). Patients with chronic heart disease conditions were recruited mainly from the RBWH cardiology outpatient clinic. The others were from wards and electrocardiography appointments. The majority (67%) were male. The age range was from 18 to 97 years with a mean age of 62 (SD 13.5) years. Stable coronary disease and chronic heart failure were the most common diseases reported. The mean MacNew scale scores ranged from 5.3/7 (emotion) to 5.6/7 (social), and the mean global score was 5.4/7, with a higher value indicating a better quality of life.

Table 1 Demographics of the respondents (n = 685)

Stage I: establish dimensionality

An oblique rotation technique was deemed most appropriate, and direct oblimin is suitable here as correlations between components are assumed and this method uses a parameter to control the degree of correlation between the components. The PCA indicated a four-component solution (Table 2). In our analysis, items loaded onto emotion, restriction, symptoms and perception of others components (Table 2). All symptom items loaded together into one dimension (Table 2). We divided the restriction dimension into physical and social dimensions (as they are important aspects of quality of life), creating five final dimensions: physical restriction, social restriction, emotion, perception of others and symptoms. The results of the PCA conducted using polychoric correlation were similar to the standard results, except for item 16 “aching legs”, which loaded into the restriction component instead of symptoms. We have presented the standard PCA results in Table 2 and all comparisons between Pearson’s correlation and polychoric correlation-based results in the supplementary file.

Table 2 Principal component analysis

Stage II: eliminate and select items per dimension

Table 3 reports the Rasch and psychometric analyses. The psychometric analysis showed that all selected items (n = 23, after exclusions) performed well based on missing data (n = 0), floor effects and internal consistency. However, we observed substantial ceiling effects (> 50%) on several items (Table 3), in particular items 10 (tearful), 14 (chest pain), 18 (frightened), 22 (family overprotective), 23 (burden on others), 24 (felt excluded) and 25 (unable to socialise). All items in the restriction dimension, including the social and physical components, performed well. All symptoms except chest pain also showed good psychometric properties. Items with high levels of floor and ceiling effects were further investigated to determine whether they were appropriate to include.

Table 3 Rasch and psychometric analysis for item reduction

Physical dimension

Items 17, 20 and 26 were selected for the physical dimension (Table 3) from the restriction component. Although item 21 (unsure of exercise) was initially also considered, it was excluded as it was about uncertainty rather than physical activity. The three remaining items were modelled using Rasch analysis. All three items fitted well, except that item 17 (limited in doing exercise) showed a discrepancy in the threshold map showing the ordering across the seven responses. Cross-tabulation of item 20 (restricted) and item 26 (physically restricted) showed local dependency between them, indicating that they capture physical restriction in a similar way. As the adjective “physical” appears in item 26, there was consensus among clinicians and academics that item 26 best describes the physical domain of quality of life; hence item 26 was selected.

Social dimension

There were three items selected from the restriction component which describe the social aspect of quality of life; items 12 (social activities), 24 (felt excluded) and 25 (unable to socialise). In item 12, the words “usual” and “family” might not relate to everyone, as some people might not have family and others might not do their social activities with family. Therefore, item 12 was excluded. Items 24 and 25 were found to be covering the same quality of life space when they were cross-tabulated, and found to have similar answers, making one of them redundant. We selected item 24 after consulting with clinicians and patients.

Emotion dimension

Ten items loaded into this dimension in the PCA (Table 2). As items 3 (confident), 4 (discouraged) and 6 (worn out/low in energy) did not fit, they were excluded, and the Rasch model was fitted again using the remaining seven items. These seven items fitted well together. Both items 1 and 15 have low floor and ceiling effects, ordered responses, and no DIF, with a wide range and spread. There was consensus across clinicians, patients and researchers that item 1 (frustrated) should be in the final instrument. Item 15 (lack of self-confidence) was also selected based on the advice from the clinical team. Although it did not fit, it was ultimately decided that item 6 (worn out) should be included in the final instrument, due to its minimal floor and ceiling effects, good internal consistency and ordered responses. The item 6 description “felt worn out or low in energy” was considered an important description of the emotional attributes of quality of life. We ultimately selected three items from this dimension (items 1, 6 and 15) based on the advice of patients and clinicians.

Perception of others

Two items loaded into ‘perception of others’ (burden on the family): items 22 (family overprotective) and 23 (burden on others). The wording “family” in item 22 may not relate to everyone, and high ceiling effects and DIF for gender in item 23 contributed to their exclusion. We concluded these aspects of the quality of life are already represented by the items of social and emotional dimensions, therefore we did not choose any item from this dimension.

Symptoms

We did not use Rasch analysis on this collection of items representing symptoms since they are not expected to be unidimensional. There were four items: 9 (shortness of breath), 14 (chest pain), 16 (aching legs) and 19 (dizzy). The clinical team decided the most important symptoms were chest pain and shortness of breath and this was confirmed by the patient group. We created two separate dimensions from symptom items 9 and 14. This selection was entirely based on clinician advice.

Stakeholder engagement for item selection

More than half of the clinicians agreed with item 26 for the physical dimension, item 24 for the social dimension and item 1 for the emotional dimension. All clinicians agreed that shortness of breath and chest pain are the two most important symptoms, selecting items 9 and 14. Doubts remained about whether to choose item 24 or 25, whether item 15 was necessary when item 6 was included, and whether any important characteristics had been missed.

The patient advisory group debated which of the two social components items—item 24 or 25—best described their experience. The majority (9/12 patient participants) agreed that item 24 was better. Reasons for this included: “Item 24 includes your socialisation and sports or anything else, so it’s more general so it probably would capture more” “… [it’s] is a better question, because it’s very clear what that means to people”. Participants felt strongly and unanimously that item 15 was a critical aspect impacting the quality of life for people with heart conditions. Reasons for this included “Because of the physical restriction we become anxious and lacking in self-confidence” “It sometimes gets to a point where you just don’t go out, because there’s too many of those things to consider”. “… if you’ve got no confidence, people give up. And [people who’ve given up] are a cost to the health system”. Most patients felt that the proposed items captured the most important components associated with their quality of life. The only additional suggestion involved including an item on sexual and intimate relationships. The majority view was expressed in the following: “A lot of these things are playing on most of our minds, every single day. And I don’t think there’s anything else that you’ve missed”.

Stage III: explore item-level reduction

Examination of threshold probability curves for items suggested that levels “most of the time”, “a good bit of the time” and “some of the time” were closest together (Fig. 1). This indicates disordering of these curves and that respondents had difficulty in distinguishing between levels [32]. This pattern was similar across the other items not selected in the classification system. Although not present in all items, most other items have “a little of the time” and “hardly any of the time” close to each other. These observations suggested collapsing levels 2, 3, 4 together and levels 5, 6 together. Threshold curves of item 26 suggested collapsing levels 2, 3, 4, 5 together; however, this was not confirmed by its frequency distribution. In the emotion dimension, item 15 showed the same clumping of levels, confirming the four-level collapse. However, threshold curves of item 1 suggested five levels, collapsing 2, 3, 4 together and keeping 1, 5, 6, 7 as independent levels. Each suggested option was tested with the Rasch analysis, resulting in new frequency distributions. Collapsing levels 2, 3, 4 and levels 5, 6 resulted in the best-ordered models and fit. The collapsed 2, 3, 4 level was named “some of the time” and the collapsed 5, 6 level was named “hardly any of the time” as they had the highest frequency in nearly 90% of the total items considered.

Fig. 1 A Threshold probability curve for item 1 before rescoring. B Threshold probability curve for item 1 after rescoring

The final classification system, MacNew-7D, with four levels in each dimension and consisting of items 1 (frustrated), 6 (worn out), 9 (shortness of breath), 14 (chest pain), 15 (self-confidence), 24 (excluded) and 26 (physical restriction), is presented in Table 4.

Table 4 New classification system MacNew-7D
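The level collapse reported in stage III and the size of the resulting descriptive system can be sketched as below. The mapping follows the merge of original levels 2, 3, 4 and of levels 5, 6 described above; the numbering of the new levels from 1 to 4 (keeping the original direction, with higher values indicating better quality of life) and the function names are assumptions made for illustration.

```python
# Original MacNew response level -> collapsed level.
# Collapsed level 2 corresponds to "some of the time" and
# collapsed level 3 to "hardly any of the time".
COLLAPSE = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4}


def collapse_level(original):
    """Map an original seven-point response to the four-level scheme."""
    if original not in COLLAPSE:
        raise ValueError(f"response must be an integer from 1 to 7, got {original!r}")
    return COLLAPSE[original]


def count_health_states(n_dimensions, n_levels):
    """Number of distinct health states a classification system can describe."""
    return n_levels ** n_dimensions


# Hypothetical responses to the seven selected items (illustrative only).
responses = [7, 3, 5, 6, 2, 1, 4]
health_state = [collapse_level(r) for r in responses]
print(health_state)
print(count_health_states(7, 4))
```

Seven dimensions with four levels each describe 4^7 = 16,384 possible health states, compared with 7^7 = 823,543 if all seven original levels had been retained.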

Discussion

This study developed a new brief heart disease-specific health state classification system consisting of seven dimensions, each with four levels, from the widely used and validated MacNew instrument [18]. This classification system covers physical, social and emotional aspects, together with the symptoms of chest pain and shortness of breath, which are well suited for this broad group of patients. The new classification system, based on the validated MacNew, is ideal for health state valuation and generating heart disease-specific utility values across a number of specific cardiac diseases. Moreover, it is a brief 7-item questionnaire and will impose less burden on patients for completion.

The MacNew instrument is a quality of life instrument without an associated utility algorithm. The next stage of this project will be valuing the classification system derived here to generate a utility value set. More importantly, the resulting algorithm will facilitate the generation of utility values from the MacNew. Currently, there is a mapping (crosswalk) function that enables EQ-5D utility values to be predicted using MacNew scores [13]. The mapped EQ-5D utility values will not, however, reflect the sensitivity of MacNew, rather they reflect the sensitivity of EQ-5D and are dependent upon the predictive relationship between MacNew and EQ-5D which will contain error. In contrast, the results from the current programme of research will provide a more robust utility value set to be used to evaluate heart disease-related health interventions.

In selecting dimensions, we considered many aspects including the psychometric analysis results, the wording of the items, and opinions from the research team, clinicians and patients. The five-component solution was chosen after conducting many PCAs and considering previous evidence; it is similar to that of Dempster et al. [19] but differs from what has been reported by the original developers [18]. Our result did not exactly match that of Dempster et al. [19]: the social items in our analysis were within the restriction dimension. Items 23 (burden on others) and 22 (family overprotective) loaded together in our analysis within “perception of others”. In contrast, item 23 was within the emotion dimension as reported by Dempster et al. [19]. In our results, physical and social functions loaded together. We decided to divide them into two separate dimensions, as it is clear that social and physical functions are separate. There were three items relating to emotional functions (worn out, frustrated and self-confidence) that loaded together but were separated for the classification system based on guidance from clinicians. Inclusion of these items was later confirmed by the patient advisory group. Rasch analysis and item probability curves were the main sources of information used to determine the number of levels; however, our focus was to produce a classification system that would be amenable to health state valuation. An instrument with seven levels would not be amenable to valuation since respondents to a valuation exercise would be unlikely to be able to clearly distinguish between all of the different severity levels and the health states would become complex.

There are currently no other heart disease-specific preference-based measures. A heart disease-specific preference-based measure will be useful for resource allocation decisions within cardiovascular budget allocations. Once the utility value set is developed, comparisons can be made with the EQ-5D-5L generic utility values on discrimination and responsiveness of the utility values in this specific group of patients and the impact on economic evaluation analysis. It will be important to determine whether using this instrument affects the incremental cost-effectiveness ratio, and what impact this would have on resource allocation decisions, particularly when cost-effectiveness estimates are close to funding thresholds.

There are several limitations to this study. The MacNew-7D was derived from the responses of inpatients with heart disease and patients who attended outpatient clinics. However, our sample did not include patients with more severe conditions, such as those recovering from myocardial infarction. Therefore, the ceiling effects may be overestimated. This classification system was developed using a single data sample. Nevertheless, we used a large sample (n = 685) [35] to prevent inter-sample variations. Moreover, clinicians and patients with heart disease from a single site provided face validity for the selected items, indicating the appropriateness of the classification system for heart disease. The aim of the analysis was to construct a health state classification system using single or multiple items from each dimension [33]. In this analysis, PCA was used before Rasch analysis [36]. They were used independently of each other [32, 33, 36]. PCA was used to identify structurally independent components (with little correlation between them). This helps to avoid illogical health states defined by the resulting health classification system. Rasch analysis was used to select items from each component [36].

We developed the first heart disease-specific classification system based on an existing disease-specific quality of life instrument. There are seven dimensions to the instrument with four levels in each. The classification system will be valued to enable utility values to be directly generated to estimate QALYs from all existing and future MacNew datasets.