Introduction

Because more effective treatment, combined with an ageing population, has increased the prevalence of heart failure (HF) in recent years, the assessment of health-related quality of life (HRQL) in HF patients deserves special attention [1, 2]. Morbidity in HF patients not only gives rise to recurrent hospitalization, impaired exercise capacity and physical symptoms (shortness of breath and fatigue), but is also responsible for psychological problems, iatrogenic adverse effects and the curtailment of social activities. Improving all of these dimensions of patients’ lives is becoming a priority for cardiologists [3] and has expanded the role of patient-reported outcomes (PRO) in clinical research and practice [4, 5]. In addition, HRQL has recently been shown to be a good predictor of mortality and hospitalization in patients with HF [6–8].

A systematic review published in 2009 [9] identified at least five standardized and structured disease-specific instruments for measuring HRQL in HF patients: the Minnesota Living with Heart Failure Questionnaire (MLHFQ) [10], the Chronic Heart Failure Questionnaire (CHFQ) [11], the Quality of Life Questionnaire for Severe Heart Failure (QLQ-SHF) [12], the Kansas City Cardiomyopathy Questionnaire (KCCQ) [13] and the Left Ventricular Dysfunction (LVD-36) questionnaire [14]. On the whole, the review suggested that most of these questionnaires met the minimum psychometric criteria for assessing HRQL. Among them, the MLHFQ was the most commonly applied disease-specific measure of HRQL in patients with HF (used in approximately 100 publications over the last 20 years). In fact, at least 34 linguistic versions of the MLHFQ exist [15]. The original US English version was developed by Thomas Rector in 1987 to assess the impact of HF on HRQL [10]. The questionnaire consists of 21 items that were intended from the outset to make up a total score. In addition, a physical and an emotional domain have typically been calculated from eight and five of the 21 items, respectively. The remaining eight items contribute only to the total score.

The MLHFQ total score was conceptually designed to summarize all the issues that have a bearing on the HRQL of HF patients. Its professional use throughout the world [16] and the fact that more recent instruments have been inspired by the MLHFQ indicators [13] attest to the suitability of its content and its underlying latent construct. However, the unidimensionality of this total score has never been methodologically confirmed. Furthermore, although the structure of the MLHFQ in terms of specific domains has been explored in some studies [17–20], the measurement model as a whole has not been subjected to factor analysis.

Consequently, this study aimed to verify the unidimensionality of the MLHFQ total score by exploring and confirming the questionnaire’s global measurement model and to evaluate the reliability and validity of the MLHFQ in the 21 country-specific versions.

Methods

The Minnesota Living with Heart Failure Questionnaire (MLHFQ)

The MLHFQ is self-administered, and the response options of its 21 items are presented on a 6-point scale (0–5) ranging from “no impairment” to “very much impairment”. As mentioned earlier, the questionnaire is summarized in three scores: total (range 0–105, from better to worse HRQL), physical (range 0–40) and emotional (range 0–25). As proposed by the authors of the original version, these scores are computed by adding together the corresponding item responses, and missing values are handled by mean imputation when the missing items make up less than half of those used to compute the scale [21].
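The scoring rule can be sketched in Python as follows. This is an illustration, not the official scoring algorithm: the item numbers assigned to the physical (items 2–7, 12 and 13) and emotional (items 17–21) scales are taken from the commonly published MLHFQ key and should be checked against the official instructions, the function name `scale_score` is hypothetical, and returning no score when half or more of a scale's items are missing is an assumption, since the text only states when imputation is applied.

```python
from typing import Dict, Optional, Sequence

# ASSUMPTION: item numbers follow the commonly published MLHFQ scoring key;
# verify them against the official scoring instructions before use.
PHYSICAL = (2, 3, 4, 5, 6, 7, 12, 13)  # 8 items, score range 0-40
EMOTIONAL = (17, 18, 19, 20, 21)       # 5 items, score range 0-25
TOTAL = tuple(range(1, 22))            # all 21 items, score range 0-105


def scale_score(responses: Dict[int, Optional[int]],
                items: Sequence[int]) -> Optional[float]:
    """Sum of 0-5 item responses for one scale.

    Missing items (None or absent) are imputed with the mean of the answered
    items of the same scale, but only if fewer than half of the scale's items
    are missing; otherwise the scale is not scored (None, an assumption).
    """
    answered = [responses[i] for i in items if responses.get(i) is not None]
    if any(not 0 <= value <= 5 for value in answered):
        raise ValueError("MLHFQ responses must lie between 0 and 5")
    n_missing = len(items) - len(answered)
    if 2 * n_missing >= len(items):  # half or more missing: not scorable
        return None
    item_mean = sum(answered) / len(answered)
    # Observed sum plus the scale's item mean for each missing item.
    return sum(answered) + n_missing * item_mean


patient = {item: 3 for item in TOTAL}
patient[19] = None  # one emotional item left blank
print(scale_score(patient, EMOTIONAL))  # 15.0
print(scale_score(patient, TOTAL))      # 63.0
```

Imputing the scale's own item mean is equivalent to rescaling the mean of the answered items to the full number of items, so a partially completed scale stays on the same 0–40, 0–25 or 0–105 range.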

Although the majority of the MLHFQ’s linguistic adaptations have been created following the standard forward and backward translation process [15], information on their metric properties is only available for some of them.

Study design

The evaluation of the properties of the MLHFQ was one of the aims of the International Quality of Life Outcomes Database (IQOD) project [22], funded by the European Commission. To this end, cross-sectional data from three observational studies and five clinical trials were merged into a common database including 3,847 HF patients from 21 countries (Table 1). The variables common to all of the studies included MLHFQ responses, functional capacity measured by the New York Heart Association (NYHA) class [23], cardiovascular risk factors such as body mass index (BMI) and smoking status, and the socio-demographic characteristics of the subjects. These variables were compared across countries using ANOVA with Bonferroni post hoc comparisons for continuous variables and the chi-squared test for categorical variables.
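To give a sense of the scale of these comparisons, the sketch below (hypothetical helper names, standard formulas) counts the pairwise comparisons between 21 countries and the resulting Bonferroni-adjusted significance threshold, and computes Pearson's chi-squared statistic for a country-by-category contingency table. The example counts are invented for illustration and are not study data.

```python
from itertools import combinations
from typing import Sequence, Tuple


def bonferroni_alpha(n_groups: int, alpha: float = 0.05) -> Tuple[int, float]:
    """Number of pairwise comparisons and the Bonferroni per-comparison alpha."""
    m = sum(1 for _ in combinations(range(n_groups), 2))  # n(n-1)/2 pairs
    return m, alpha / m


def pearson_chi2(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c
    contingency table of observed counts (e.g. rows = countries)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


print(bonferroni_alpha(21))  # 210 comparisons, threshold about 0.000238
# Invented counts (men, women) for three hypothetical countries:
print(pearson_chi2([[30, 10], [20, 20], [25, 15]]))  # statistic about 5.33, 2 df
```

With 21 countries each pairwise test is therefore judged against roughly 0.05/210. A p-value for the chi-squared statistic additionally requires the chi-squared distribution, for example `scipy.stats.chi2.sf(chi2, dof)` if SciPy is available.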

Table 1 Socio-demographic and clinical variables by country

Measurement model

The factorial structure of the MLHFQ was assessed in the international sample, which was randomly divided into two sub-samples for this purpose: an exploratory factor analysis (EFA) was conducted on one sub-sample, and its results were subsequently tested via a confirmatory factor analysis (CFA) on the other [24]; both were conducted as categorical factor analyses. In the EFA, the most appropriate model (number of factors and allocation of items) was selected based on two main criteria: (1) non-negative residual variances and (2) factor loadings near or above 0.4. To test both the factors identified in the EFA (specific domains within the MLHFQ) and the existence of a general factor (the MLHFQ total score), a bifactor structure was imposed in the CFA. This model allows all items to load on a general factor, regardless of whether or not they belong to one of the specific domains. This property was fundamental in confirming the MLHFQ global measurement model, as the questionnaire contains several items that count only towards the total score. The CFA was performed using the weighted least squares method, and its goodness of fit was assessed using the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI), which should be above 0.95, and the Root Mean Square Error of Approximation (RMSEA), which indicates an adequate fit below 0.08. Both the EFA and the CFA were conducted with Mplus 4.2 [25].
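The random split and the fit criteria can be illustrated with the sketch below. It assigns each patient to the EFA or CFA sub-sample with probability 0.5 (the exact allocation scheme is not reported; Table 2 lists n = 1,936 for the EFA) and implements the textbook maximum-likelihood formulas for RMSEA, CFI and TLI. These are assumptions for illustration only: under the weighted least squares estimators used in Mplus the chi-squared statistics are adjusted, so recomputing the indices this way will not reproduce Mplus output. All function names and the seed are hypothetical.

```python
import math
import random
from typing import Iterable, List, Tuple


def split_half(ids: Iterable[int], seed: int = 12345) -> Tuple[List[int], List[int]]:
    """Assign each patient to the EFA or CFA sub-sample with probability 0.5."""
    rng = random.Random(seed)  # arbitrary fixed seed for reproducibility
    efa, cfa = [], []
    for patient_id in ids:
        if rng.random() < 0.5:
            efa.append(patient_id)
        else:
            cfa.append(patient_id)
    return efa, cfa


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: model (m) versus baseline/independence (b) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def meets_cutoffs(cfi_value: float, tli_value: float, rmsea_value: float) -> bool:
    """Cut-offs stated in the text: CFI and TLI above 0.95, RMSEA below 0.08."""
    return cfi_value > 0.95 and tli_value > 0.95 and rmsea_value < 0.08


efa_ids, cfa_ids = split_half(range(3847))
print(len(efa_ids), len(cfa_ids))  # roughly 1,900 each; exact sizes depend on the seed
```

A Bernoulli assignment rather than an exact 50/50 cut is used here only because it naturally produces unequal halves such as the n = 1,936 EFA sub-sample; either scheme yields two independent random sub-samples.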

Reliability and validity of the MLHFQ scores

The MLHFQ constructs confirmed in the previous step were evaluated in terms of reliability and validity in accordance with standard recommendations [26]. These assessments were conducted for the overall sample (all 3,847 individuals) and for each country separately. The distribution of scores was evaluated in terms of floor and ceiling effects (percentage of patients with the worst and best possible scores, respectively). Cronbach’s alpha coefficient [27] was calculated for the different constructs to assess internal consistency.
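Both statistics are simple to compute; a minimal Python sketch follows, assuming complete item-level data (one row per patient) and hypothetical function names. Because the MLHFQ is scored from better to worse HRQL, the minimum possible score (0) corresponds to the best HRQL and the maximum to the worst. The responses in the usage example are invented.

```python
from statistics import variance
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], lowest: float, highest: float) -> Tuple[float, float]:
    """Percentage of patients at the lowest and at the highest possible score."""
    n = len(scores)
    at_min = sum(1 for s in scores if s == lowest)
    at_max = sum(1 for s in scores if s == highest)
    return 100.0 * at_min / n, 100.0 * at_max / n


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a patients x items matrix (complete cases).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total)
    """
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Invented emotional-scale responses (5 items, 0-5) for four patients:
emotional = [[0, 0, 0, 0, 0], [2, 3, 2, 2, 3], [4, 4, 5, 3, 4], [1, 2, 1, 1, 2]]
print(floor_ceiling([sum(r) for r in emotional], 0, 25))  # (25.0, 0.0)
print(round(cronbach_alpha(emotional), 2))
```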

Known groups defined by the NYHA classification were used to assess the discriminant validity of the MLHFQ. NYHA classes were collapsed into two groups, I-II and III-IV (a restriction already present in several of the pooled studies). As a first step, we examined whether all countries presented similar MLHFQ scores within the same NYHA group (ANOVA with post hoc pairwise tests using Bonferroni’s method to adjust for multiple comparisons). After testing for this homogeneity, MLHFQ scores were compared between NYHA groups by means of a t test, and the magnitude of the differences was quantified with an effect size coefficient (ES = difference in mean scores/pooled SD). ES values from 0.2 to <0.5 were considered small, those from 0.5 to <0.8 moderate, and those of 0.8 or above large [28].
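The known-groups comparison can be sketched as follows, assuming a Student's t test with pooled variance (the text does not specify the variant) and the ES definition given above; the helper names are hypothetical. The difference is taken as NYHA III-IV minus NYHA I-II, so that worse HRQL in the more impaired group gives a positive ES.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))


def effect_size(nyha_1_2: Sequence[float], nyha_3_4: Sequence[float]) -> float:
    """ES = difference in mean scores / pooled SD (III-IV minus I-II)."""
    return (mean(nyha_3_4) - mean(nyha_1_2)) / pooled_sd(nyha_1_2, nyha_3_4)


def student_t(nyha_1_2: Sequence[float], nyha_3_4: Sequence[float]) -> Tuple[float, int]:
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(nyha_1_2), len(nyha_3_4)
    se = pooled_sd(nyha_1_2, nyha_3_4) * math.sqrt(1 / n1 + 1 / n2)
    return (mean(nyha_3_4) - mean(nyha_1_2)) / se, n1 + n2 - 2


def classify_es(es: float) -> str:
    """Thresholds stated in the text; values below 0.2 are labelled
    'negligible' here (a label not used in the text)."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


print(classify_es(0.95))  # large
```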

Results

The mean (SD) age of the patients in the overall sample was 63.5 (12.1) years, ranging from 52.8 (12.6) in Brazil to 69.9 (7.3) in Sweden (Table 1). Most of the patients were male (74.2 % of the overall sample), except in Switzerland (47.6 %). Mean BMI ranged from 25.9 (4.7) kg/m2 in Brazil to 29.8 (6.5) kg/m2 in the US and was 27.3 (4.7) kg/m2 in the overall sample. On average, 23.9 % of the patients were smokers, but the percentages ranged from 9.3 % in Israel to 95.2 % in Switzerland. The two NYHA groups were similarly represented in the overall sample (49.1 % for I-II and 50.9 % for III-IV) but were unevenly represented across countries (e.g. 86.6 vs. 13.4 % in Brazil and 0 vs. 100 % in Switzerland). MLHFQ items did not present relevant percentages of missing values; missing values exceeded 10 % only for item 8 in Hungary (15.6 %) and Spain (10.8 %), and for item 10 in Spain (13.6 %).

In the EFA, the 3-factor solution with quartimin rotation (Table 2) yielded better results than the 2- and 4-factor solutions. This structure was fixed in the CFA, where the model showed good fit (CFI = 0.949, TLI = 0.988 and RMSEA = 0.065). The measurement model consisted of three specific factors and a general factor (Fig. 1). Factor 1 included eight items, like the original physical score, but included item 1 (swelling in your ankles, legs) instead of item 7 (relating to or doing things with your friends or family difficult), based on the statistically significant loadings from the CFA (arrows in Fig. 1). The emotional domain (Factor 2) consisted of five items, those considered strictly emotional in the original MLHFQ score. Factor 3 included four items (working to earn a living; recreational pastimes, sports or hobbies; sexual activities; and money for medical care), which might be considered related to the social environment in a broad sense, from individual relationships to contextual factors such as health services. Four items therefore did not load clearly on any specific factor (relating to or doing things with your friends or family difficult, eat less of foods you like, stay in a hospital and side effects from medications). Finally, the most important result from the CFA was the confirmation of a single unidimensional latent construct (which we might call HRQL-in-HF) underlying the total score, which included all 21 items. In fact, after controlling for the general factor, the specific domains provided little additional measurement precision.
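One common way to quantify the statement that the specific domains add little precision beyond the general factor is to compute the explained common variance (ECV), omega total and omega hierarchical from standardized bifactor loadings. These indices are not reported in this study and are swapped in here for illustration only; the sketch assumes standardized loadings, items that load on at most one specific factor, and invented loadings in the usage example. Omega hierarchical close to omega total indicates that most of the reliable variance is attributable to the general factor.

```python
from typing import Dict, Mapping, Sequence


def bifactor_indices(general: Sequence[float],
                     specific: Mapping[str, Mapping[int, float]]) -> Dict[str, float]:
    """ECV, omega total and omega hierarchical from standardized loadings.

    general:  loading of item i on the general factor, for i = 0..k-1.
    specific: {factor name: {item index: loading}}; each item loads on at
              most one specific factor, and items loading on none are omitted.
    """
    gen_sq = sum(lam ** 2 for lam in general)
    spec_sq = sum(lam ** 2 for f in specific.values() for lam in f.values())
    # Unique variance of each standardized item: 1 - communality.
    unique = sum(
        1.0 - general[i] ** 2 - sum(f.get(i, 0.0) ** 2 for f in specific.values())
        for i in range(len(general))
    )
    gen_part = sum(general) ** 2
    spec_part = sum(sum(f.values()) ** 2 for f in specific.values())
    denom = gen_part + spec_part + unique
    return {
        "ECV": gen_sq / (gen_sq + spec_sq),
        "omega_total": (gen_part + spec_part) / denom,
        "omega_h": gen_part / denom,
    }


# Invented loadings for a toy 6-item questionnaire (not MLHFQ estimates);
# items 4 and 5 load only on the general factor, like total-only items.
general = [0.80, 0.75, 0.70, 0.70, 0.65, 0.60]
specific = {"physical": {0: 0.30, 1: 0.25}, "emotional": {2: 0.20, 3: 0.25}}
print(bifactor_indices(general, specific))
```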

Table 2 Original MLHFQ principal component structure (1992) and exploratory factor analysis (n = 1,936) results: 3-factor structure, with factor loadings near or above 0.4 in bold
Fig. 1

Bifactor model tested by confirmatory factor analysis: general and specific factors (with their assigned names) and the loadings of the items studied

Floor and ceiling effects were negligible for these scores in the overall international sample and when stratified by country. The best possible score was reached by more than 20 % of the patients in only three particular cases (Table 3, in bold). In the overall sample, the Cronbach’s alpha coefficients were 0.9, 0.84, 0.72 and 0.92 for the physical, emotional, social environment and total scores, respectively. Alpha was above 0.8 for the physical score and above 0.7 for the emotional score in all countries. Cronbach’s alpha for the social environment score was lower (0.4–0.82). The internal consistency coefficient for the MLHFQ total score was close to 0.9 in all countries.

Table 3 Distribution of MLHFQ scores and internal consistency coefficients by country

The mean MLHFQ scores within each NYHA group were similar between countries (data not shown), with the exception of the social environment scores of Polish patients (p < 0.05). Given this homogeneity across countries, MLHFQ scores were lower in patients in NYHA classes I-II than in those in classes III-IV (p < 0.001) (Fig. 2), with ES above 0.5 and a particularly high ES for the physical score (ES = 0.95). We therefore stratified this score alone by country and found differences of the same magnitude between NYHA groups (Fig. 3), with the exception of Brazil, Finland and Hungary (ES from 0.39 to 0.47). Switzerland was not included in these analyses because it had no patients in NYHA classes I-II.

Fig. 2

Means of MLHFQ scores calculated for the overall sample for patients in the NYHA I-II (grey) and in the NYHA III-IV (white) classifications, and the 95 % confidence interval and corresponding effect size coefficient of the differences across functional capacity groups

Fig. 3

Means of the physical domain scores of the MLHFQ by country, for patients in the NYHA I-II (grey) and in the NYHA III-IV (white) classifications, and the 95 % confidence interval and corresponding effect size coefficient of the differences across functional capacity groups. Aus Australia, Brz Brazil, Can Canada, CzR Czech Republic, Den Denmark, Fin Finland, Fra France, Ger Germany, GB Great Britain, Hun Hungary, Isr Israel, Ita Italy, Neth Netherlands, Nwy Norway, Pol Poland, Slvk Slovakia, Spn Spain, Swd Sweden, Swtz Switzerland, US United States, Yug Yugoslavia

Discussion

Our results confirm the original measurement model of the MLHFQ and open the possibility of assessing a third specific domain concerning patients’ social lives. Furthermore, the reliability and validity of MLHFQ scores have been shown to be adequate across the different country-specific versions. These findings, together with the already proven simplicity, clarity and good performance of the MLHFQ, support the use of the questionnaire as an outcome measure for HF patients and add to the body of knowledge about the instrument and its interpretation.

The measurement model of the MLHFQ has been explored using EFA in a large, heterogeneous international sample. The analysis yielded a proposal for the specific domains covered by the questionnaire that is nearly the same as that originally suggested by Rector. Only two differences arose: a minor modification of the content of the physical score, and the possibility of computing a social environment score. The original physical score included the item relating to or doing things with your friends or family; the model in this study, however, excludes this item from the physical score and adds one that originally did not belong to any specific domain: swelling in the ankles or legs. This slight change may add face validity to the physical score while keeping unchanged the number of items, on which coefficients such as Cronbach’s alpha depend (both the original domain and the one suggested here contain eight items).
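The proposed domain structure can be written down as item sets, which makes the single replacement in the physical domain and the four total-only items explicit. The item numbers are an assumption taken from the commonly published MLHFQ item order (item 1 swelling, item 7 friends or family, items 8, 9, 10 and 15 work, recreation, sexual activities and money for medical care) and should be checked against the questionnaire; `domain_score` is a hypothetical helper that ignores missing data. With four items scored 0–5, the social environment score ranges from 0 to 20.

```python
from typing import Dict, Iterable

# ASSUMPTION: item numbering follows the commonly published MLHFQ order.
ALL_ITEMS = set(range(1, 22))
ORIGINAL_PHYSICAL = {2, 3, 4, 5, 6, 7, 12, 13}
# Revised physical domain: item 7 (friends or family) out, item 1 (swelling) in.
REVISED_PHYSICAL = (ORIGINAL_PHYSICAL - {7}) | {1}
EMOTIONAL = {17, 18, 19, 20, 21}
# Work, recreation, sexual activities, money for medical care.
SOCIAL_ENVIRONMENT = {8, 9, 10, 15}
# Items that count only towards the total score in the revised model.
TOTAL_ONLY = ALL_ITEMS - REVISED_PHYSICAL - EMOTIONAL - SOCIAL_ENVIRONMENT


def domain_score(responses: Dict[int, int], items: Iterable[int]) -> int:
    """Simple sum of 0-5 responses; assumes no missing items."""
    return sum(responses[i] for i in items)


print(sorted(TOTAL_ONLY))  # [7, 11, 14, 16]
```

Under this assumed numbering, the four total-only items are exactly the four items that did not load on any specific factor in the Results (friends or family, eating less of foods you like, hospital stays and side effects from medications).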

Another issue is the understandable concern regarding the third factor, which suggests the use of a new specific MLHFQ social environment score. The decision to differentiate this dimension was based primarily on the better fit of the model compared to the bifactor structure, but also, from a conceptual perspective, on the confirmation of its validity by means of the CFA, and on the possibility of working on and exploring this aspect of HF patients’ quality of life in further studies (without corrupting or changing any of the other well-established scores). Although people today are generally believed to have better lives than those in previous decades, they, and especially the elderly, tend to report poorer HRQL [31]. Consequently, chronic patients now attach more importance to social issues and to the absence of treatment side effects than they used to, in addition to physical and emotional health. For the same reason, and considering the changes in treatment and in the length of time patients can live with HF since the MLHFQ was constructed, newer disease-specific instruments for HF have included social and other constructs from the initial stages of their development (e.g. the KCCQ has domains that quantify social interference and self-efficacy) [13].

The two suggestions emerging from this study regarding the specific domains (one item replacement and the calculation of a third specific score) do not hinder comparisons with previously published data, nor do they prevent comparisons in follow-up studies. Nevertheless, although there should be no reason to avoid using these domains in the future, these recommendations may be dismissed, as has happened with the results of other studies [17, 20]. Modifications of other instruments have been proposed without success, owing to the poor methodology of the studies proposing the changes [32], the limited dissemination of results, and a general resistance to changing established approaches. In the specific case of the MLHFQ, another reason for the scarce implementation of recommendations regarding the specific domains might be that many clinical publications presenting HRQL as one outcome among many report only the total score.

However, this is where the importance of the MLHFQ total score comes into play. In this study, two main characteristics made it possible to address the crucial issue of the unidimensionality of the total score (i.e. the actual existence of the one-dimensional latent construct underlying the MLHFQ total score). On the one hand, this construct, which we might call “HRQL-in-HF”, was confirmed through a bifactor model [33], which allows each item to load directly on both the general factor (the total score) and one of the specific domains. In a second-order factor model (commonly used in CFA of PRO measures), the general factor is constructed through its correlations with the specific domains (e.g. physical, emotional, social), which are considered first-order factors, and every item must load on one of those specific factors [34]. In the MLHFQ measurement model (the original or the one presented in this study), some items count towards the total score without belonging to any specific domain. Consequently, a bifactor model rather than a second-order model had to be applied to support the validity of the MLHFQ total score as an overall measure of quality of life in HF patients.
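The contrast between the two models can be written as measurement equations. The notation is introduced here for illustration and is not taken from the study: x_i^* is the latent response underlying item i, G the general factor, S_s and F_s the specific (first-order) factors, and s(i) the domain of item i.

```latex
% Bifactor model: every item loads directly on G and, if it belongs
% to a domain, on that domain's specific factor
x_i^{*} = \lambda_{G,i}\, G + \lambda_{S,i}\, S_{s(i)} + \varepsilon_i,
\qquad \lambda_{S,i} = 0 \ \text{if item } i \text{ belongs to no domain}

% Second-order model: items load only on first-order factors F_s,
% which in turn load on G
x_i^{*} = \lambda_i\, F_{s(i)} + \varepsilon_i, \qquad
F_s = \gamma_s\, G + \zeta_s
\quad\Longrightarrow\quad
\operatorname{loading}(x_i^{*}, G) = \lambda_i\, \gamma_{s(i)}
```

In the second-order model an item with no first-order factor has no path to G at all, which is why the MLHFQ items that count only towards the total score require the bifactor formulation.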

On the other hand, the large size of the sample also made it possible to confirm the total score. By randomly splitting the sample, we were able first to perform an exploratory factor analysis with Mplus (an analysis not previously published) and then to confirm the entire measurement model. Moreover, the sample covered a broad range of MLHFQ country-specific versions and did not reveal a heterogeneous pattern of missing values (data not shown).

The fact that the structure has been validated in an international sample may constitute the first step towards a cross-cultural validation of the MLHFQ. Of all the existing linguistic versions of the MLHFQ, only a few have been validated [35–41]. Most of the available information regarding the performance of the MLHFQ is based on the original version and comes from the application of classical test theory. The lack of even minimal validation across MLHFQ country versions is inconsistent with the wide acceptance and use of this instrument in international clinical trials, effectiveness studies and, more recently, clinical practice. This study did not address cross-cultural validation in the strict sense, but for most MLHFQ country-specific versions it yielded what may be their first psychometric evaluation.

In general, all of the country-specific versions appear capable of capturing the entire range of HRQL impairments experienced by HF patients (low percentages of floor and ceiling effects for all four MLHFQ scores in all versions). With regard to both internal consistency and construct validity, our results are consistent with those of other studies, mainly of the original version [10]. The total and physical scores presented the highest Cronbach’s alpha coefficients (near the standard of 0.9 usually required for comparisons between individuals) and therefore seem suitable for comparing individuals. Moreover, these two scores also showed the best ability to differentiate between patients’ functional capacities, as they were moderately associated with NYHA classes (mean r = 0.6).

This study has some limitations that deserve consideration. The first, and the most relevant from the authors’ perspective, is the lack of an assessment of structural invariance across country-specific versions [29]. This evaluation was limited by the small sample sizes in some countries; one of our priorities was to include as many individuals as possible to ensure a sample large enough for exploratory and confirmatory factor analyses and heterogeneous in HF patient characteristics. A second limitation is the restricted validity assessment of the MLHFQ scores due to the nature of the sample. The pooled studies had different primary aims and shared only a few co-variables (not including a common walking test or generic HRQL measure). Moreover, the NYHA classification is a controversial measure whose reliability may vary across countries. However, the collapsing strategy applied in this study may have mitigated these problems, as the main discordances typically involve classes IIIa, IIIb and IV [30]. Also, regarding the MLHFQ’s psychometric properties, follow-up data were not available to evaluate reproducibility or responsiveness. Finally, to take advantage of the international sample, the authors would have liked to assess differential item functioning (DIF) across countries. However, the small size of some samples and the absence of common patient characteristics among the countries studied limited these analyses. Country-specific studies may be the most appropriate designs for evaluating possible DIF between the original and each adapted version.

Conclusions

The findings of this study support the validity of using the MLHFQ to assess HRQL in HF patients, and confirm the robustness of its total score along with its capacity for covering three domains of life (physical, emotional and social environment) that are important to patients living with HF. Although it is commonly used all over the world, this is the first time that the MLHFQ model has been confirmed, and it has been done in a large international sample and through the application of an up-to-date methodology, 20 years after its initial development. Moreover, MLHFQ scores were found to have adequate reliability and validity among the different country-specific versions, thereby providing new information that justifies their use in research and clinical settings.