Introduction

According to the United States Renal Data System, the prevalence of chronic kidney disease (CKD) continues to grow each year, with the incidence of patients receiving hemodialysis reaching 310 per million population in 2004 [1]. In addition, annual Medicare expenditures for outpatient hemodialysis totaled nearly US$6.7 billion in 2004, a 9.9% increase from 2003 [1]. As the prevalence and costs of hemodialysis continue to increase, accurate evaluation of treatment outcomes in CKD patients becomes increasingly important, not only in terms of economic burden but also in terms of how this complex chronic illness affects individuals’ quality of life.

The measurement of health-related quality of life (HRQOL) has become increasingly common in recent years as an important indicator of health and well-being. Health-related quality of life outcome data are frequently used to determine healthcare effectiveness, including medication and procedural treatment effects as well as resource allocation and policy development [2, 3]. For example, utility HRQOL measures generate a single summary score that can be translated into quality-adjusted life years (QALYs). Theoretically, QALYs provide a standard metric by which results can be compared across studies using any utility or preference-based measure [4]. QALYs can have significant implications in clinical practice in that they allow comparative analysis of various treatments regardless of the HRQOL outcome measure used. However, despite the significant role that HRQOL measurement has in health care and economic evaluation, a lack of clarity remains in how HRQOL is measured and how results are interpreted.
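For illustration, the translation from a utility score to QALYs is utility-weighted time: each interval contributes its duration multiplied by the utility experienced during it. The following minimal sketch (Python, with hypothetical utility values and durations) shows the arithmetic:

```python
# Minimal sketch of the QALY calculation: QALYs = sum of (utility x duration).
# The utility values and durations below are hypothetical, for illustration only.

def qalys(utilities, years):
    """Each time interval contributes utility * duration; sum across intervals."""
    return sum(u * t for u, t in zip(utilities, years))

# e.g., two years at utility 0.70 followed by one year at utility 0.55
print(qalys([0.70, 0.55], [2.0, 1.0]))  # 1.95 QALYs
```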

Health-related quality of life is an important issue for patients with CKD receiving hemodialysis. Hemodialysis, although not a cure for CKD, helps prolong and improve patients’ quality of life [5]. Hemodialysis requires that patients be connected to a dialysis machine several hours a day at least three times a week, during which time they are essentially immobile. Social activities, physical functioning, and mental health are affected by the constraints of hemodialysis as well as by the effects of CKD, which can include fatigue and nausea. A number of studies have demonstrated that the perceived HRQOL of patients receiving hemodialysis is significantly impaired [6–9].

Although many HRQOL tools have well-documented validity and reliability, each takes a different approach to measuring the highly complex construct of HRQOL. These differences may lead to conflicting results, depending upon which tool is used. In a review of the literature over the past 10 years, several studies found that different HRQOL measurements, both profile and utility tools, yielded widely varying results within the same study sample (Appendix). Relatively few studies have compared HRQOL measurements in patients receiving hemodialysis. One study compared the EQ-5D and the SF-6D in the CKD population [10]. In this study, the authors did not find any significant differences between the two utility measures; however, they concluded that the EQ-5D was the preferred measurement tool because it had a higher response rate than the SF-6D. Another study compared HRQOL utility measures with a disease-specific HRQOL measure, the Kidney Disease Quality of Life Instrument (KDQOL), and found moderate correlation between the HUI3 and the KDQOL [11]. Although the Quality of Well-Being Scale-Self-Administered (QWB-SA), SF-6D, and the Kidney Disease Quality of Life Instrument-Short Form (KDQOL-SF) have been used in a number of studies of patients receiving hemodialysis [12–14], no studies were found that compared these HRQOL measures in this population. Given the significant impact that hemodialysis has on HRQOL, as well as the important role that measurement of HRQOL has in health-care decision making and allocation of resources, further evaluation of HRQOL measures in this chronically ill population is important.

The primary objective of this study was to test the hypothesis, using confirmatory factor analysis, that the QWB-SA, SF-6D, and the Kidney Disease Component Summary (KDCS; a subscale of the KDQOL-SF) measured the same construct, HRQOL, in a sample of patients receiving hemodialysis for CKD. Based upon previous research comparing other HRQOL measures, we expected to find that these tools measured different aspects of HRQOL. We will explore how different HRQOL measures used within the same sample may produce conflicting results and how this issue may be addressed in future studies. Lastly, because the KDCS is a fairly recently reported measure and was revised for this study, a secondary objective of our study was to examine the components of the KDCS and refine them as necessary.

Methods

Study design

Data for this investigation were from a large prospective observational study comparing outcomes and costs of care of veterans dialyzing at U.S. Department of Veterans Affairs facilities or in the private sector [15]. Any patient who had received care at a Veterans Affairs facility within the prior 3 years and was receiving hemodialysis for CKD was eligible for enrollment. Patients were excluded if they: (1) had a live kidney donor identified; (2) required care in a skilled nursing facility; (3) had a life expectancy of less than 1 year, as determined by a nephrologist; (4) were cognitively impaired; (5) had a severe speech or hearing impairment; (6) were not fluent in English; or (7) had no access to a telephone for follow-up contact.

Patients were recruited from eight Veterans Affairs medical centers with outpatient dialysis facilities. Institutional review board approval was obtained from all Veterans Affairs sites. Coordinators at each site explained the study and obtained written informed consent from patients who were interested in participating. A total of 364 patients consented to participate in the study, with 322 subsequently completing baseline questionnaires.

Measures

The three HRQOL measures used in this study—QWB-SA, SF-6D, and KDQOL—are summarized in Table 1.

Table 1 Characteristics of the QWB-SA, SF-6D, and KDQOL instruments

Quality of well-being scale-self-administered 1.4

The QWB-SA [16] is a preference-based measurement of quality of life and is derived from the longer, more complex Quality of Well-Being Scale (QWB) [17]. The QWB-SA, a 76-item questionnaire, assesses objective level of functioning in three areas: mobility, physical activity, and social activity. In addition, the QWB-SA measures the presence of 58 acute and chronic symptoms. The scores are combined to form one preference score on a continuum from .30 (for death) to 1.0 (for perfect health) [18]. In addition to the one overall preference score, scores for the following four preference-weighted subscales can also be calculated: (1) mobility scale, (2) physical activity scale, (3) social activity scale, and (4) symptom/problem complexes scale. However, few studies have reported the subscales of the QWB. The QWB-SA has been used in several studies with a wide variety of disease entities, including migraines, diabetes, and posttraumatic stress disorder [19–21]. Test–retest reliability of the original QWB ranged from .83 to .98 [18]. Reliability and validity of the QWB-SA are similar to those of the original QWB [16].

Short-form-6D

The SF-6D is a preference-based HRQOL measure that can be derived from either the Short-Form-36 (SF-36) or the Short-Form-12 (SF-12) [22–24]. Both the SF-36 and SF-12 are measures of quality of life and comprise eight health-related dimensions: (1) physical functioning, (2) role limitations related to physical functioning, (3) bodily pain, (4) general health perceptions, (5) vitality, (6) social functioning, (7) role limitations related to emotional problems, and (8) mental health [25, 26]. Using an algorithm based upon preference weights, these eight dimensions are condensed into six dimensions (general health is omitted, and role limitations related to emotional problems and role limitations related to physical functioning are combined) and a single score is calculated to produce the SF-6D index score [22, 23, 27]. The SF-6D index score ranges on a continuum from .296 (most impaired) to 1.0 (full health) and can be translated into QALYs [28]. Although fairly new, the SF-6D has been used in a number of studies with a wide range of patient populations [29–32]. Studies have demonstrated good test–retest reliability and discriminant validity of the SF-6D [32, 33]. For this study, reliability was good (Cronbach’s alpha = .702).
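The published SF-6D algorithm relies on estimated preference weights that we do not reproduce here; its general structure, however, is a decrement model in which the index starts at full health (1.0) and level-specific decrements are subtracted for each dimension. The sketch below (Python) illustrates only that structure, using invented placeholder decrements rather than the published SF-6D weights:

```python
# Illustrative decrement-style scoring, structurally similar to the SF-6D algorithm.
# The decrement values below are invented placeholders, NOT the published weights.

EXAMPLE_DECREMENTS = {
    "physical_functioning": [0.00, 0.03, 0.06, 0.08, 0.10, 0.12],
    "role_limitation":      [0.00, 0.05, 0.06, 0.07],
    "social_functioning":   [0.00, 0.04, 0.05, 0.07, 0.09],
    "pain":                 [0.00, 0.02, 0.04, 0.07, 0.10, 0.14],
    "mental_health":        [0.00, 0.04, 0.06, 0.09, 0.12],
    "vitality":             [0.00, 0.03, 0.05, 0.07, 0.09],
}

def decrement_index(levels):
    """levels maps each dimension to its response level (1 = best).
    Start at full health (1.0) and subtract one decrement per dimension."""
    return 1.0 - sum(EXAMPLE_DECREMENTS[dim][lvl - 1] for dim, lvl in levels.items())

print(decrement_index({
    "physical_functioning": 3, "role_limitation": 2, "social_functioning": 1,
    "pain": 4, "mental_health": 2, "vitality": 3,
}))  # approximately 0.73 with these placeholder values
```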

Kidney disease component summary

The KDCS is a subscale of the KDQOL questionnaire. The KDQOL was developed as a self-report, health-related quality of life measurement tool designed specifically for patients with CKD [34]. The 134-item KDQOL was later condensed into the 80-item Kidney Disease Quality of Life Instrument-Short Form (KDQOL-SF) [35]. The questionnaire consists of the generic SF-36 [25] as well as multi-item scales focused on quality of life issues specific to patients with kidney disease. The kidney-disease-specific subscales are listed in Table 1. All subscales are scored on a 0–100 scale, with higher numbers representing better HRQOL. A few studies were found in which scores from the 11 kidney-disease-specific subscales were averaged to form a KDCS [36–39]. However, no literature was found describing the psychometrics of the KDCS. The KDQOL-SF has been used widely in studies of patients with kidney disease, including the ongoing, international Dialysis Outcomes and Practice Patterns Study (DOPPS) [13, 36, 40–44], and has demonstrated good test–retest reliability on most dimensions.
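As an illustration of how a composite of this kind is formed, the following sketch (Python) averages 0–100 subscale scores into a KDCS; the subscale names follow the KDQOL-SF, and the values are hypothetical:

```python
# Minimal sketch: averaging the kidney-disease-specific subscale scores (0-100,
# higher = better) into a single KDCS composite. Values below are hypothetical.

def kdcs(subscale_scores):
    """Average the available 0-100 subscale scores."""
    scores = [s for s in subscale_scores.values() if s is not None]
    return sum(scores) / len(scores)

example = {
    "symptoms": 72, "effects_of_kidney_disease": 65, "burden_of_kidney_disease": 40,
    "work_status": 25, "cognitive_function": 80, "quality_of_social_interaction": 75,
    "sleep": 60, "social_support": 85, "dialysis_staff_encouragement": 90,
    "patient_satisfaction": 70,
}
print(kdcs(example))  # 66.2 for these hypothetical scores
```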

Statistical analyses

Baseline data from the QWB-SA and SF-6D were converted into utility scores based upon the scoring algorithms from Kaplan et al. [16] and Brazier and Roberts [24], respectively. Preference-weighted single-attribute scores were used as indicators of the domains of the SF-6D and QWB-SA. Missing SF-6D data (<5%) were replaced using a multiple imputation procedure [45]. All other data were complete (n = 322), except for the KDCS subscale for sexual activity, which was completed by only 55 participants. Therefore, the sexual activity subscale was not included in the calculation of the KDCS, leaving 10 kidney-disease-specific subscales of the KDQOL-SF to generate the KDCS for this study. Descriptive statistics (mean, 95% confidence intervals, median, and range) and Pearson correlations of the QWB-SA, SF-6D, and KDCS were calculated using SAS version 9.1 (SAS Institute, Cary, NC, USA). Ceiling and floor effects for each instrument were compared by determining the highest and lowest scores for each tool.
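The analyses described above were run in SAS 9.1; as a rough equivalent for readers working in other environments, the sketch below (Python with pandas/SciPy) computes the same descriptive statistics and Pearson correlations and checks the extremes of each scale. The file name, column names, and the proportion-at-bounds operationalization of ceiling/floor effects are assumptions for illustration:

```python
# Hedged sketch of the descriptive analyses (a Python stand-in for the SAS 9.1 runs).
# File name, column names, and scale bounds are assumptions for illustration.

import pandas as pd
from scipy import stats

df = pd.read_csv("hrqol_baseline.csv")                 # hypothetical: one row per patient
scales = {"qwb_sa": (0.30, 1.0), "sf6d": (0.296, 1.0), "kdcs": (0.0, 100.0)}

for col, (lo, hi) in scales.items():
    x = df[col].dropna()
    ci = stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))
    print(col, "mean", round(x.mean(), 3), "95% CI", ci, "median", x.median(),
          "range", (x.min(), x.max()),
          "% at floor", (x == lo).mean() * 100, "% at ceiling", (x == hi).mean() * 100)

print(df[list(scales)].corr(method="pearson"))         # Pearson correlations among instruments
```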

One way to assess the degree of commonality and uniqueness across the three HRQOL instruments is merely to inspect the Pearson correlations among scores on the three instruments. However, because unreliability in measurement attenuates (i.e., dilutes) the strength of observed correlations [46, 47], differences in the reliabilities of the three instruments will produce differences in the apparent degree of relationship among the instruments, thereby yielding spurious conclusions. Confirmatory factor analysis overcomes this statistical problem by adjusting correlations among the instruments for measurement error (i.e., unreliability), thereby disattenuating correlations and providing unbiased estimates of the degree of interrelationship.
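The adjustment that CFA provides is analogous to the classic Spearman correction for attenuation, in which an observed correlation is divided by the square root of the product of the two measures' reliabilities. A minimal sketch, with hypothetical values:

```python
# Sketch of the correction for attenuation that motivates using CFA here:
# dividing an observed correlation by the geometric mean of the two reliabilities
# estimates the correlation between true scores. Values are hypothetical.

def disattenuate(r_xy, rel_x, rel_y):
    return r_xy / (rel_x * rel_y) ** 0.5

print(disattenuate(0.60, 0.80, 0.70))  # roughly 0.80
```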

Maximum-likelihood confirmatory factor analysis was completed using LISREL 8.8 [48] in three phases: (1) analysis of each instrument (QWB-SA, SF-6D, and KDCS) separately using a one-factor model representing HRQOL containing each instrument’s respective subscales; (2) consideration of a one-factor model combining the subscales of the QWB-SA, SF-6D, and KDCS, in which the one factor represented HRQOL (Fig. 1); and (3) analysis of a three-factor model containing the subscales of the QWB-SA, SF-6D, and KDCS, in which each factor represented its corresponding HRQOL measure (Fig. 2). Models were refined as necessary. Factors were allowed to intercorrelate.
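The models were estimated in LISREL 8.8; for readers without access to LISREL, a roughly equivalent specification of the phase-3 three-factor model could be written with the open-source semopy package in Python, as sketched below. The observed-variable names are placeholders for the subscale scores, and the one-factor model of phase 2 is obtained by replacing the three measurement lines with a single HRQOL factor:

```python
# Hedged sketch of a three-factor CFA specification (semopy, lavaan-style syntax);
# the paper's analysis used LISREL 8.8. Column names below are placeholders.

import pandas as pd
from semopy import Model, calc_stats

data = pd.read_csv("hrqol_subscales.csv")   # hypothetical: one column per subscale score

three_factor = """
QWB  =~ qwb_mobility + qwb_physical + qwb_social + qwb_symptoms
SF6D =~ sf_physical + sf_role + sf_social + sf_pain + sf_mental + sf_vitality
KDCS =~ kd_symptoms + kd_effects + kd_burden + kd_cognitive + kd_social_interaction + kd_sleep + kd_social_support
QWB  ~~ SF6D
QWB  ~~ KDCS
SF6D ~~ KDCS
"""

model = Model(three_factor)
model.fit(data)                  # maximum-likelihood estimation
print(calc_stats(model).T)       # chi-square, df, RMSEA, CFI, NFI, TLI (NNFI), GFI, ...
```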

Fig. 1

Completely standardized solution path diagram of one-factor model of HRQOL measures. Standardized factor loadings of one-factor model. Chi-square = 364.86, df = 119. Boxes represent HRQOL subscales from the QWB-SA, SF-6D, and KDCS. Long arrows represent factor loadings and short arrows represent measurement error

Fig. 2

Completely standardized solution path diagram of three-factor model of HRQOL measures. Standardized factor loadings of three-factor model. Chi-square = 241.53, df = 116. Boxes represent HRQOL subscales from the QWB-SA, SF-6D, and KDCS. Long arrows represent factor loadings and short arrows represent measurement error. Curved arrows represent factor intercorrelations

In addition to the overall test of fit (the χ² statistic), six fit indices were used to assess model adequacy: (1) the goodness-of-fit index (GFI) [49]; (2) the root-mean-square error of approximation (RMSEA) [50]; (3) the standardized root-mean-square residual (SRMR) [51]; (4) the comparative fit index (CFI) [52]; (5) the normed fit index (NFI) [53]; and (6) the non-normed fit index (NNFI) [53]. Acceptable standards of fit for the absolute fit indices, which indicate how well the data fit the theoretically proposed model, are: GFI >.90 [53], RMSEA ≤.08 [54, 55], and SRMR ≤.05 [54, 55]. Browne and Cudeck [56] suggest that RMSEA values ≤.05 indicate close fit, and RMSEA values between .05 and .08 indicate adequate fit. The CFI, NFI, and NNFI are relative or incremental fit indices and compare the theoretical model to the null model in which all observed variables are uncorrelated [53]. A fit index above .90 for the CFI, NFI, and NNFI is considered to be acceptable [53].
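For convenience, the thresholds cited above can be expressed as a simple check; the sketch below (Python) encodes them and evaluates a set of hypothetical fit values:

```python
# Sketch: the fit-index thresholds cited above, applied to hypothetical values.

THRESHOLDS = {
    "GFI":   lambda v: v > 0.90,    # absolute fit
    "RMSEA": lambda v: v <= 0.08,   # absolute fit (<= .05 indicates close fit)
    "SRMR":  lambda v: v <= 0.05,   # absolute fit
    "CFI":   lambda v: v > 0.90,    # incremental fit
    "NFI":   lambda v: v > 0.90,    # incremental fit
    "NNFI":  lambda v: v > 0.90,    # incremental fit
}

def assess_fit(indices):
    return {name: ("acceptable" if THRESHOLDS[name](value) else "outside threshold")
            for name, value in indices.items()}

print(assess_fit({"GFI": 0.93, "RMSEA": 0.06, "SRMR": 0.052,
                  "CFI": 0.95, "NFI": 0.92, "NNFI": 0.94}))
```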

Results

Table 2 summarizes the demographic and clinical characteristics of the sample. Consistent with the veteran population, the majority of participants were male. Participants had been receiving hemodialysis for an average of 2.50 years, and most had at least one comorbidity, the most common being hypertension (84%), diabetes (63%), and congestive heart failure (46%).

Table 2 Demographics of sample

There were 322 QWB-SA, SF-6D, and KDCS forms available for analysis. Summary statistics for each HRQOL measure are reported in Table 3 and correlations between instruments are reported in Table 4. None of the HRQOL measures revealed ceiling or floor effects.

Table 3 Summary statistics for each instrument (n = 322)
Table 4 Pearson correlation coefficients of QOL variables and index of physical impairment

Confirmatory factor analysis

Phase 1: one-factor analyses of each individual instrument

In the first phase of the confirmatory factor analysis, the subscales of each instrument—QWB-SA, SF-6D, and KDCS—were considered in three separate one-factor models representing HRQOL. As expected, for both the QWB-SA and SF-6D, the one-factor structures fit the data very well, with all six fit index thresholds met (Table 5). However, the one-factor structure was a poor model for the initial set of 10 KDCS subscales, with all fit indices outside recommended thresholds of acceptability (Table 5). Completely standardized factor loadings, in which both the items and the factors were standardized to have variances of 1.0, were strong for the 10-subscale KDCS, ranging from .494 to .740, except for three subscales—patient satisfaction (.312), work status (.233), and dialysis staff encouragement (.205)—suggesting that these subscales measured something other than HRQOL. Work status may have loaded weakly because most participants (91.6%) in this study were not working. Patient satisfaction and dialysis staff encouragement seem to measure satisfaction with care rather than HRQOL. Therefore, the three subscales of work status, patient satisfaction, and dialysis staff encouragement were dropped from the KDCS, leaving 7 subscales. Subsequently, the 7-subscale KDCS one-factor model fit the data reasonably well, with 4 of the 6 fit index thresholds met (Table 5). The 7-subscale KDCS one-factor model fit the data significantly better than the 10-subscale KDCS in this sample. Therefore, the 7-subscale KDCS was used in subsequent phases of the confirmatory factor analysis.

Table 5 Fit statistics for each individual instrument (n = 322)

We computed the reliability of both the 10-subscale and 7-subscale versions of the KDCS total score using Mosier’s [57] formula, which requires a reliability estimate for each subscale that is combined to construct the composite total score. For each multi-item subscale, at least two reliability estimates were available: the Cronbach’s alpha (α) reliability coefficient and the squared multiple correlation (R²) from the one-factor CFA solution. Whereas the former reflects the average inter-item correlation for each subscale as an index of internal consistency, the latter represents the proportion of variance in the subscale that the underlying KDCS factor explains as an index of measurement reliability. Weighting each subscale equally, we computed separate Mosier reliability estimates using both α and R², although we were able to use only R² as a reliability estimate for the single-item measure of patient satisfaction (which is part of the 10-subscale version of the KDCS total score). These Mosier reliability estimates for the KDCS total score were as follows: 10-subscale version (using α: .88; using R²: .77); 7-subscale version (using α: .91; using R²: .81). Although both KDCS total scores achieved acceptable reliability, Mosier reliabilities were noticeably higher for the 7-subscale version than for the 10-subscale version; and in each case, the estimated reliability of the composite total score was lower when using R² versus α, reflecting the fact that R² was lower than α for each multi-item subscale.
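Mosier's formula treats the composite as a weighted sum of subscales: its reliability is one minus the ratio of the summed (weighted) subscale error variance to the variance of the composite. A hedged sketch follows, with hypothetical inputs rather than the study's actual subscale statistics:

```python
# Sketch of Mosier's (1943) composite reliability. Inputs are hypothetical and do
# not reproduce the study's subscale statistics.

import numpy as np

def mosier_reliability(weights, sds, reliabilities, corr):
    """weights, sds, reliabilities: one entry per subscale; corr: intercorrelation matrix."""
    w, s, r = np.asarray(weights, float), np.asarray(sds, float), np.asarray(reliabilities, float)
    cov = np.asarray(corr, float) * np.outer(s, s)   # covariance matrix of subscales
    composite_var = w @ cov @ w                      # variance of the weighted composite
    error_var = np.sum(w**2 * s**2 * (1.0 - r))      # weighted subscale error variance
    return 1.0 - error_var / composite_var

# Equal weights for three hypothetical subscales:
corr = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.6], [0.4, 0.6, 1.0]]
print(mosier_reliability([1, 1, 1], [10, 12, 9], [0.85, 0.78, 0.80], corr))  # about 0.90
```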

Phase 2: one-factor analysis combining subscales of QWB-SA, SF-6D, and 7-subscale KDCS

The second phase of the confirmatory factor analysis comprised evaluating a one-factor model imposed on the data for the subscales of the three HRQOL measures: QWB-SA, SF-6D, and 7-subscale KDCS (Fig. 1). The one-factor structure proved to be a poor fit to the data (Table 6). Although the CFI, NFI, and NNFI suggested reasonably close fit, the GFI, RMSEA, and SRMR were outside recommended thresholds of acceptability. These findings suggested that, although the one-factor model came close to fitting the data, the combined subscales of the three HRQOL instruments measured more than one factor.

Table 6 Fit statistics for HRQOL measurement models (n = 322)

Phase 3: three-factor analysis combining QWB-SA, SF-6D, and 7-subscale KDCS

Because the originally proposed one-factor model did not demonstrate uniformly acceptable fit, a three-factor model was imposed on the data for the subscales of the three HRQOL measures (Fig. 2). The three-factor model fit the data well, with 5 of the 6 fit indices within acceptable ranges (Table 6), and explained between 30 and 74% of the variance in the individual subscales (Fig. 2). The SRMR of .052 was near the indicated threshold of acceptability of ≤.05. Thus, the three-factor model fit the data significantly better than the one-factor model, as evidenced by a χ² difference of 123.3 (with a difference of 3 degrees of freedom) between the two models as well as more acceptable goodness-of-fit indices.
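The nested-model comparison underlying this statement can be made explicit with a chi-square difference test using the values reported in Figs. 1 and 2 (Δχ² = 364.86 − 241.53, Δdf = 119 − 116); the sketch below (Python) computes the associated p value:

```python
# Chi-square difference test for the nested one-factor vs. three-factor models,
# using the chi-square and df values reported in Figs. 1 and 2.

from scipy.stats import chi2

chi2_one_factor, df_one = 364.86, 119
chi2_three_factor, df_three = 241.53, 116

delta_chi2 = chi2_one_factor - chi2_three_factor   # 123.33
delta_df = df_one - df_three                       # 3
print(delta_chi2, delta_df, chi2.sf(delta_chi2, delta_df))  # p << .001
```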

In the completely standardized solution for this three-factor model (Fig. 2), the SF-6D and 7-subscale KDCS factors correlated .911 (P < .05), indicating that 83% (i.e., .911² ≈ .83) of the variance in the 7-subscale KDCS was shared with the SF-6D, and vice versa. As expected, factor correlations between the HRQOL measures were higher than their corresponding Pearson correlations (Table 4).

Refinement of model: two-factor analysis

The strong correlation between the SF-6D and 7-subscale KDCS factors (.911) suggested that a two-factor model combining the SF-6D and KDCS might fit the data. Therefore, a two-factor model with the combined measures (SF-6D and 7-subscale KDCS) and the QWB-SA was imposed on the data (Fig. 3). This two-factor model came close to fitting the data, with 5 of the 6 fit indices within acceptable ranges (Table 6). The SRMR of .056 was just slightly higher than the recommended threshold of ≤.050. Thus, the two-factor model fit the data reasonably well and was more parsimonious [58].

Fig. 3

Completely standardized solution path diagram of two-factor model of HRQOL measures. Standardized factor loadings of two-factor model. Chi-square = 264.65, df = 118. Boxes represent HRQOL subscales from the QWB-SA, SF-6D, and KDCS. Long arrows represent factor loadings and short arrows represent measurement error. Curved arrows represent factor intercorrelations

Discussion

With regard to our hypothesis that the subscales of the QWB-SA, SF-6D, and the KDCS measured the same construct, we found evidence to the contrary. We determined that the QWB-SA measured a related but different construct than the SF-6D and KDCS. This was evidenced by only 38% (i.e., .62² ≈ .38) of the variance being shared between the combined SF-6D/KDCS factor and the QWB-SA, compared with 83% of the variance shared between the SF-6D and KDCS.

Our CFA offered strong evidence that the QWB-SA, SF-6D, and 7-subscale KDCS measured more than one factor in this study sample. A three-factor model fit the data well. However, a two-factor model, in which the highly correlated SF-6D and 7-subscale KDCS were combined, fit the data almost as well as the three-factor model and was more parsimonious. The QWB-SA and SF-6D were found to measure different, but similar, constructs in this study.

One would have expected that the QWB-SA and the SF-6D, both generic, preference-weighted tools, would be more similar to each other than the SF-6D is to the KDCS, a disease-specific, non-preference-weighted instrument. The difference we found between the QWB-SA and SF-6D may be related to differences in preference valuation techniques, measured health states, and conceptual bases. Brazier and Roberts [24] reported that, in the SF-6D model, mental health appeared to be the second most important dimension in determining health states, following bodily pain. The findings of this study support those of a meta-analysis that distinguished between quality of life measures and health status measures in quality of life research [59]. That meta-analysis concluded that tools such as the QWB, which do not measure psychological functioning (deemed an important component of QOL), may be considered measures of perceived health rather than of QOL [59]. A significant implication of the conceptual differences between these tools is that QALYs derived from these different tools cannot be compared.

Although no other studies that specifically compared the QWB-SA, SF-6D, and the KDCS were found, the finding that these HRQOL measures were not interchangeable is consistent with other studies that have compared various HRQOL measures in different patient populations (Appendix). The plethora of HRQOL measures with little standardization has been repeatedly addressed over the years [60–62]. Suggestions to improve HRQOL studies have included reporting why a particular HRQOL measure was chosen over others for a given study [62] and standardizing HRQOL tools for specific populations [63]. Although some of these recommendations have been implemented, such as the coordination of task forces to standardize instrumentation within a specific patient population (for example, the DOPPS [44]), more work is needed to be able to reliably compare results of HRQOL studies and have confidence in the construct validity of the measures. We believe that a large-scale meta-analysis of instruments used to assess HRQOL is required to determine and compare overall levels of reliability and construct validity across the various available measures in the many different health-related areas. This approach would enable researchers to choose instruments based on objective evidence rather than subjective preference. Ultimately, researchers need to come to consensus on which HRQOL instruments best capture the concept of HRQOL and end the proliferation of different instruments to measure HRQOL.

A secondary objective of this study was to explore the components of the KDCS and refine them as necessary. After eliminating the sexual activity subscale because of missing data, we found that the patient satisfaction, work status, and dialysis staff encouragement subscales loaded weakly in our one-factor model, suggesting that these subscales measured something other than HRQOL. Thus, a 7-subscale KDCS one-factor model fit the data significantly better than the 10-subscale KDCS in this sample. In addition, reliability of the 7-subscale KDCS was stronger than that of the 10-subscale KDCS. These findings may have implications for how the KDCS is calculated in future studies. However, it is important to note that a higher percentage of participants in our sample did not work (91.6%) than reported in other studies of hemodialysis patients, in which 70.6 to 81.1% were unemployed [13, 64, 65]. The low employment rate in our sample may be one reason why work status had such a low factor loading (.23). A sample with a higher proportion of employed subjects could lead to a choice of different KDCS subscales than those we used in our study.

A significant strength of this study is that it is the first known study to use confirmatory factor analysis (CFA) to compare the subscales of these HRQOL measures within a single study sample. There are several advantages to using CFA over Pearson and intraclass correlations in evaluating how well HRQOL measures within the same sample relate to one another. The first is that, unlike Pearson and intraclass correlations, CFA removes measurement error, which can dilute the observed strength of the associations. The second is that CFA allows the researcher to compare how well competing conceptual models fit the data [66]. Thus, in using CFA for this study, we were able to confidently determine that the QWB-SA, SF-6D, and 7-subscale KDCS measured more than one construct.

This study had several limitations. The first is that, because the sample was mostly male, it is unclear whether the results can be generalized to females with CKD receiving outpatient hemodialysis. Second, the sexual activity subscale of the KDCS was not included in the analysis because of missing data. Missing data for this subscale may have been related to the sensitive nature of the questions or to the severity of physical impairment of the patients in this sample. Third, the high unemployment rate in our sample may have influenced our choice of KDCS subscales to use in the CFA. Lastly, the results of the confirmatory factor analysis may have been different if item-level scores or total scores, rather than subscale scores, had been used.

Conclusion

Using CFA, we were able to provide strong evidence that the SF-6D and KDCS were more closely related to each other than to the QWB-SA. The differences between these utility tools have significant implications for how QALYs derived from the instruments can (or cannot) be compared. More work regarding the construct validity of HRQOL instruments is needed to provide future researchers with the information required to make sound decisions when choosing an appropriate HRQOL instrument.