Introduction

The assessment of quality of life is an important issue in social and health policy [1–4]. Quality of life in old age has been the focus of much recent research and underpinned all of the projects funded by the UK Economic and Social Research Council ‘Growing Older’ research programme [5–7]. However, the assessment of quality of life faces particular challenges, both theoretical and methodological. First, quality of life is a complex, multidimensional construct, embracing both objective and subjective aspects of individuals’ lives [8]. Second, any measure must be suited to the population of interest, and in particular must be sensitive to differences in determinants and perceptions of quality of life associated with age, or those that relate to environment and living arrangements [1, 9, 10]. Finally, once a measure has been developed, it must exhibit the desired psychometric properties of a measurement tool [11].

The CASP-19 is a quality-of-life scale for use in old age, originally generated and validated in a representative UK sample of 264 people in ‘early old age’ (65–75 years) [12]. Given the diversity of older people’s experience and social situation, an important issue with any newly developed scale is the extent to which it can be used in different older populations and in different environments. This study therefore sought to test the properties of the CASP-19 in a UK population whose demographic profile and residential status differed markedly from the general population sample with which it was developed.

In this paper, we report quantitative measures of the scale’s psychometric properties, specifically, the distribution of scores, its factor structure, the internal consistency of its domains, and its construct validity. Comparisons are drawn with other studies of the scale’s psychometric properties in individuals aged 55 years and older [12, 13], based on the British Household Panel Survey (BHPS) [14] and the English Longitudinal Study of Ageing (ELSA) [15]. A recently proposed 12-item version of the scale [13] is also assessed.

Methods

The CASP-19

The CASP-19 is based on a needs-satisfaction model and focuses in particular on higher-level needs in the Maslovian hierarchy [16]. The scale conceptualizes quality of life in sociological—rather than in biological or medical—terms and explicitly resists a conflation of quality of life with health status, with its potential to medicalize old age [17].

The CASP-19 is a summative scale of 19 items, tapping four domains: control, autonomy, pleasure and self-realization; the scale includes both positively and negatively worded items (see Table 1). Control is defined in terms of an individual’s ability to intervene in his or her environment, whereas autonomy is freedom from unwanted interference from others [12]. Higgs et al. argue that ‘once these requirements [control and autonomy] have been met then individuals need to use them to pursue the reflexive process of self-realization through activities that make them happy’ [17, p. 245]. Each item is scored on a four-point adverbial scale (‘Often’, ‘Sometimes’, ‘Not often’ and ‘Never’), giving a possible range of scores of 0–57, with higher scores representing better quality of life.
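The scoring scheme just described can be sketched as follows. This is a minimal illustration, not the published scoring key: the item-to-domain numbering and the direction of the 0–3 coding (‘Often’ scoring 3 on positively worded items and 0 on negatively worded ones) are assumptions, and the helper names are hypothetical. The set of negatively worded items (1, 2, 4, 6, 8 and 9) is the one reported later in this paper.

```python
# Hypothetical scoring helper for a 19-item scale with four response options.
# ASSUMPTIONS (not confirmed by the text): the item-to-domain numbering below,
# and that 'Often' scores 3 on positive items and 0 on negative items.
RESPONSES = ("Often", "Sometimes", "Not often", "Never")

DOMAINS = {
    "control": [1, 2, 3, 4],
    "autonomy": [5, 6, 7, 8, 9],
    "pleasure": [10, 11, 12, 13, 14],
    "self_realization": [15, 16, 17, 18, 19],
}
NEGATIVE_ITEMS = {1, 2, 4, 6, 8, 9}  # negatively worded items


def score_item(item, response):
    """Return 0-3 for one item, reverse-coding negatively worded items."""
    idx = RESPONSES.index(response)  # 0 = 'Often' ... 3 = 'Never'
    return idx if item in NEGATIVE_ITEMS else 3 - idx


def score_casp19(answers):
    """answers maps item number (1-19) to a response label.

    Returns (domain_scores, total_score); the total ranges from 0 to 57.
    """
    missing = set(range(1, 20)) - set(answers)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    item_scores = {i: score_item(i, r) for i, r in answers.items()}
    domain_scores = {
        name: sum(item_scores[i] for i in items)
        for name, items in DOMAINS.items()
    }
    return domain_scores, sum(domain_scores.values())


# The most favourable answer to every item yields the scale maximum.
best = {i: ("Never" if i in NEGATIVE_ITEMS else "Often") for i in range(1, 20)}
best_domains, best_total = score_casp19(best)
```

Raising an error on incomplete responses mirrors the complete-case approach taken in this study; other handling of missing items would need its own justification.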

Table 1 Domains and items in the CASP-19

In our study, the CASP-19 scale was included in a larger questionnaire used to measure aspects of health and well-being among residents of a UK retirement community that provides accommodation for people aged 55 years or older. The site has been redeveloped, and the previous 1950s rental accommodation replaced by purpose-built, self-contained houses, bungalows and apartments, available for rental or purchase. When the study began in June 2006, there were 160 residents living in 132 properties. An overarching aim of retirement communities is to maintain and improve residents’ quality of life, and a robust means of measuring this construct is therefore a key concern within such an environment.

The CASP-19 was originally validated for use as a self-completed measure, but as previous experience of using it in this way had shown a high rate of missing values [18], the scale was interviewer-administered in the present study. Response options to most questions were displayed to the respondent on one of a series of show-cards. A total of 122 residents agreed to be interviewed.

Psychometric evaluation

The factor structure of the scale was tested using confirmatory factor analysis (CFA), with scale items designated as ordinal. The factor structure of a measurement tool represents the relationship between its constituent items and one or more latent variables, specified a priori. CFA allows one to assess the fit of the observed data to this factor structure, in terms of the number of latent variables and the relationship of individual scale items to these latent variables [19]. A close fit between the data and the hypothesized latent variables serves to confirm the factor structure. A number of different indices of fit exist, and their use and interpretation are the subject of ongoing debate [20–23]. Attempts have been made to establish cutoffs for these indices, but these are controversial in view of the various factors that may affect the magnitude of fit indices (e.g. sample size, model complexity, nature and extent of misspecification, level of measurement of the data [24]). There is broad consensus, however, that indices should not depend unduly on sample size, that they should take appropriate account of model parsimony and that their magnitude should not be interpreted mechanistically. Moreover, in the absence of a single robust measure of fit, it is recommended that a combination of such measures should be used and interpreted collectively, and in relation to other aspects of the CFA model [24–26]. Three indices were used in the present study:

  1. The Tucker–Lewis index (TLI)

  2. The comparative fit index (CFI)

  3. The root mean square error of approximation (RMSEA).

These indices permitted a direct comparison with the results for the CASP-19 presented by Wiggins et al. [13]. The TLI and the CFI determine the fit of the current model with reference to a null model in which the covariances between the items are zero (the ‘independence’ model), whereas the RMSEA expresses the misfit of the current model per degree of freedom, without reference to a null model. All three are considered to be resistant to the effect of sample size, and the TLI and RMSEA reward model parsimony [21, 27]. For the TLI and the CFI, values approaching unity are desirable, with values greater than approximately 0.95 considered to indicate a good fit, whereas for the RMSEA values should approach zero, preferably below approximately 0.05 [19, 28, 29].
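For readers unfamiliar with these indices, their textbook forms can be sketched as below. This is an illustration only: it uses the standard chi-square-based definitions, whereas the analysis here treated items as ordinal, for which software applies adjusted test statistics. The function name and the input values are hypothetical and do not come from this study; some programs also use N rather than N − 1 in the RMSEA.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Textbook CFI, TLI and RMSEA from model and null-model chi-squares.

    A sketch under stated assumptions; not the computation used by any
    particular package for ordinal data.
    """
    # Excess of each chi-square over its degrees of freedom, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)

    # CFI: proportional reduction in misfit relative to the null model.
    denom = max(d_model, d_null)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    # TLI: compares chi-square/df ratios, so it penalizes extra parameters.
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"CFI": cfi, "TLI": tli, "RMSEA": rmsea}


# Invented inputs, for illustration only.
example = fit_indices(chi2_model=300.0, df_model=146,
                      chi2_null=1500.0, df_null=171, n=120)
```

With these invented inputs all three indices fall short of the cutoffs quoted above, which illustrates how the indices are read together rather than singly.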

Following Wiggins et al. [13], three different factor structures, of increasing complexity, were tested (see Fig. 1a–c in online supplement):

  1. the single-factor model, whereby all 19 items are posited to measure a single latent variable, ‘quality of life’.

  2. the domain model, whereby the items are deemed to measure four latent variables, namely the four domains of the CASP-19 indicated in Table 1; the domains are permitted to covary, but are not considered to represent a single, higher-order latent variable.

  3. the domain and total model, in which the items depend upon the four domains, as specified in the ‘domain model’, and these four domains depend in turn upon a single ‘quality of life’ latent variable.

In addition, Spearman item-total correlations were calculated for all items, in relation to their own and other domains, to explore the dimensionality of the scale at the level of individual items; when correlated with its own domain, each item was excluded from the calculation of the domain total. An item’s correlation with its own domain should exceed approximately 0.30 [30], and items should correlate more strongly with their own than with other domains [31].
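The corrected item-total correlation described above can be sketched in plain Python. This is a minimal illustration with hypothetical function names and toy data; it computes Spearman's coefficient as the Pearson correlation of average ranks, and it is not the SPSS procedure used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def item_domain_correlation(rows, item, domain_items):
    """Correlate one item with a domain total.

    rows is a list of dicts mapping item number to score. If the item
    belongs to the domain, it is left out of the total (the 'corrected'
    item-total correlation); for other domains nothing is excluded.
    """
    others = [i for i in domain_items if i != item]
    item_col = [row[item] for row in rows]
    total_col = [sum(row[i] for i in others) for row in rows]
    return spearman(item_col, total_col)


# Toy data: four respondents, three items in one hypothetical domain.
toy = [{1: 0, 2: 0, 3: 1}, {1: 1, 2: 1, 3: 1},
       {1: 2, 2: 3, 3: 2}, {1: 3, 2: 3, 3: 3}]
rho_own = item_domain_correlation(toy, item=1, domain_items=[1, 2, 3])
```

Excluding the item from its own domain total avoids the inflation that arises when an item is correlated with a sum that contains it.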

Having sought to confirm the factor structure of a scale through CFA, the consistency with which individual scale items tap their respective domains can be assessed by means of internal consistency analysis. This was achieved by calculating coefficients of composite reliability from the estimates provided within the CFA, as described by Raykov [32]. This reliability coefficient provides a more valid measure of internal consistency than Cronbach’s alpha, which will underestimate true reliability if items load unequally on a domain (i.e. when tau-equivalence does not obtain) [33], but equally may overestimate reliability under other circumstances, such as when measurement errors are correlated [34, 35]. The composite reliability coefficient can take values from 0 to 1, and higher values indicate that scores on a scale are internally consistent. DeVellis [36] regards values for internal consistency reliability below 0.60 as unacceptable, between 0.60 and 0.65 as undesirable, between 0.65 and 0.70 as minimally acceptable and between 0.70 and 0.80 as respectable.
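As an illustration of the composite reliability coefficient, the sketch below uses the common closed form for a single factor with uncorrelated measurement errors: the squared sum of loadings (times the factor variance) divided by that quantity plus the sum of error variances. This simple form is an assumption made for illustration and may not match every detail of the cited procedure; the loadings are invented. The banding function encodes only the thresholds quoted above.

```python
def composite_reliability(loadings, error_variances, factor_variance=1.0):
    """Composite reliability for one factor with uncorrelated errors.

    A sketch of the standard closed form, not a full implementation of
    the interval-estimation method cited in the text.
    """
    true_var = factor_variance * sum(loadings) ** 2
    return true_var / (true_var + sum(error_variances))


def reliability_band(value):
    """Label a coefficient using only the bands quoted in the text."""
    if value < 0.60:
        return "unacceptable"
    if value < 0.65:
        return "undesirable"
    if value < 0.70:
        return "minimally acceptable"
    if value <= 0.80:
        return "respectable"
    return "above the bands quoted here"


# Invented standardized loadings for a four-item domain; with a
# standardized solution each error variance is 1 - loading**2.
loadings = [0.7, 0.6, 0.5, 0.4]
errors = [1 - l ** 2 for l in loadings]
rho = composite_reliability(loadings, errors)
band = reliability_band(rho)
```

Because the loadings enter individually rather than being assumed equal, the coefficient does not rely on tau-equivalence, which is the advantage over Cronbach's alpha noted above.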

Evidence of construct validity of the scale within this population was sought by correlating scores on the scale with other theoretically distinct measures to which it would be expected to be related. Two measures that had been incorporated in the larger questionnaire were used for this purpose. The Satisfaction with Life Scale (SWLS) is a five-item summative measure of subjective well-being developed by Diener et al. [37]. Higher scores indicate higher life satisfaction, and a positive correlation with the CASP-19 would be hypothesized. The SF-12 (v2) is a 12-item health status measure, derived from the SF-36 [38]. It generates two scores: a physical component summary and a mental component summary. For each of these components, higher scores indicate better health status, such that a positive correlation with the CASP-19 would be expected. We took it that if the CASP-19 were a valid measure of quality of life, it should correlate positively with both life satisfaction (SWLS) and health status (SF-12).

Data were analysed in Mplus 6 and SPSS 17, and statistical significance was set at P ≤ 0.05 (two-tailed).

Results

One hundred and twenty-two residents responded to the questionnaire, representing 75% of those living in the community at the time. There were one or more missing values on the CASP-19 in two questionnaires (due in one case to the respondent’s cognitive difficulties, and for an unidentified reason in the other case); all analyses were performed on the 120 questionnaires with complete data.

The mean age of respondents was 77.5 years (range 55–98), which is just above the top end of the ‘early old age’ range (65–75) with respect to which the scale was developed and validated. It is also higher than the mean age of those 55 or over in the first wave of ELSA (68 years; [39]) and in the 11th wave of the BHPS (68 years; http://www.iser.essex.ac.uk/ulsc/bhps/). Additionally, whilst our sample was drawn from a purpose-built retirement community, the data in these earlier studies were derived from older people living in the general population. There were 34 (28%) male and 86 (72%) female respondents. The percentage of female respondents is higher than in ELSA (54%) and in the BHPS (55%). Nearly half of the respondents (59; 49%) were widowed, 51 (43%) were married or cohabiting, five (4%) were divorced/separated and the remaining five (4%) had never married. The percentage of widowed respondents (49%) was greater than in ELSA (21%) and the BHPS (23%), and the percentage of those married or cohabiting (43%) was correspondingly lower than in ELSA (65%) and the BHPS (64%).

The mean (SD) total score on the CASP-19 was 40.24 (9.04), and individual scores ranged from 15 to 57. The variation in scores—as expressed by the coefficient of variation (SD/mean)—was 0.22. Scores exhibited a slight negative skew (median score = 41.0; skewness = −0.43, SE = 0.22), and accordingly the data departed slightly from normality (though not significantly on a Kolmogorov–Smirnov test; P = 0.093) (see Fig. 2 in online supplement).

The mean (SD) scores for the four domains were 7.38 (3.01) for control; 10.92 (2.69) for autonomy; 13.22 (2.17) for pleasure; and 8.73 (3.45) for self-realization. The control domain has one fewer item than the others and thus a smaller maximum achievable score; to assist comparison across domains, the mean score on this domain adjusted for the number of items (i.e. domain score × [5/4]) is 9.23. The distribution of domain scores for control, autonomy and self-realization was in each case fairly symmetrical; skewness ranged from −0.39 (SE = 0.22) for control to −0.44 (SE = 0.22) for self-realization. However, the negative skew in the pleasure domain scores was very marked (−1.48; SE = 0.22); the median score was 14, and 42 (35%) respondents recorded the maximum score of 15 (see Fig. 3 in online supplement).
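The arithmetic behind the coefficient of variation and the item-adjusted control score can be checked directly from the figures reported above. The per-item means at the end are an additional, illustrative way of putting the domains on a common 0–3 footing and are not reported in the paper; variable names are hypothetical.

```python
# Figures taken from the text.
mean_total, sd_total = 40.24, 9.04
domain_means = {"control": 7.38, "autonomy": 10.92,
                "pleasure": 13.22, "self_realization": 8.73}
n_items = {"control": 4, "autonomy": 5,
           "pleasure": 5, "self_realization": 5}

# Coefficient of variation = SD / mean (reported as 0.22).
cv = sd_total / mean_total

# Control has four items; rescale to the five-item metric of the other
# domains: domain score x (5/4) (reported as 9.23).
adjusted_control = domain_means["control"] * 5 / 4

# Illustrative extra: mean score per item within each domain (0-3 metric).
per_item = {name: domain_means[name] / n_items[name] for name in domain_means}

# Consistency check: domain means should sum to the total mean, up to rounding.
sum_of_domains = sum(domain_means.values())
```

On the adjusted metric the control mean remains below the autonomy and pleasure means, so the ordering of the domains is not simply a consequence of the differing number of items.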

Table 2 shows the CFA results for the CASP-19 in this population, alongside those reported by Wiggins et al. [13]. Overall, there is little evidence of a good fit. The ‘single-factor model’ appears to show the worst fit, and there is then little to choose between the ‘domain model’ and the ‘domain and total model’. The fit indices are of a similar magnitude to those reported by Wiggins et al. [13].

Table 2 Measures of fit for CASP-19, using confirmatory factor analysis

In addition, each item was correlated with total scores for both its own domain and the other three domains (Table 3). Mean item-total correlations within each domain were 0.38 (control), 0.25 (autonomy), 0.44 (pleasure) and 0.51 (self-realization). Three items (items 6, 9 and 14) had low correlations with their respective domains (0.06, 0.12 and 0.15, respectively); these correlations fell below Kline’s [30] proposed minimum of 0.30. Items 6 and 9 had negative correlations with one or more of the other domains. Additionally, for nine items (items 1, 2, 3, 4, 5, 8, 11, 15 and 17), correlations with one or more other domains were stronger than the item’s correlation with its own domain (Table 3). These 12 potentially questionable items included all six items in the scale that are worded negatively (items 1, 2, 4, 6, 8 and 9; see Table 1).

Table 3 Correlations (Spearman’s) between items and domains

Table 4 shows the composite reliability coefficients. All coefficients but that for the autonomy domain achieved the minimum acceptable magnitude of 0.60 [36]. Further analysis of the autonomy domain, following the method of Raykov [40], indicated that the deletion of item 6 would raise the reliability coefficient to 0.59, while deletion of item 9 would raise it to 0.57, suggesting that the low reliability of this domain relates to these items. Correlations between the domains of the CASP-19 are shown in Table 5, together with the corresponding values obtained by Hyde et al. [12].

Table 4 Internal consistency of the CASP-19
Table 5 Pearson correlations among the four domains for the CASP-19

In relation to the construct validity of the CASP-19, its correlation with the SWLS was moderately high, r = 0.66 (95% CI 0.55, 0.75). Somewhat lower correlations were found with the SF-12 for the physical component summary (r = 0.53; 95% CI 0.38, 0.65) and the mental component summary (r = 0.49; 95% CI 0.34, 0.62).
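Confidence intervals of this kind are commonly obtained with the Fisher z-transformation. The sketch below assumes that method and n = 120; neither is stated in the text, so any agreement with the reported limits is indicative only. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z.

    ASSUMPTION: the paper does not state how its intervals were derived.
    """
    z = math.atanh(r)                    # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Correlation with the SWLS reported above, assuming n = 120.
swls_low, swls_high = fisher_ci(0.66, 120)
```

The interval is asymmetric around r because the transformation stretches values near ±1, which is why the upper limit sits closer to the estimate than the lower limit.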

A possible CASP-12

As a result of their evaluation of the CASP-19, Wiggins et al. [13] propose a 12-item version of the scale. The control and autonomy domains are combined into a single control/autonomy domain, which comprises 6 items—numbers 1, 2, 4, 5, 7 and 9. The new pleasure domain consists of items 10, 11 and 12, and the new self-realization domain is made up of items 15, 18 and 19 (see Fig. 4 in online supplement).

We evaluated this scale using our data, in relation to the ‘domain and total model’. The fit statistics from the CFA are shown in Table 6, alongside those obtained by Wiggins et al. [13] in their samples. All indices suggested a good fit. The mean item-total correlations were as follows: 0.32 (control/autonomy), 0.60 (pleasure) and 0.51 (self-realization). Two of the six items in the control/autonomy domain (items 2 and 4) had stronger correlations with a domain other than their own, as did two of the three items in the self-realization domain (items 15 and 19). As in the original 19-item scale, the scores in the pleasure domain displayed a marked negative skew (−1.78; SE = 0.22); 60 (50%) respondents recorded the maximum score of 9.

Table 6 Measures of fit for the ‘domain and total model’ for the CASP-12, using confirmatory factor analysis

The composite reliability coefficients were 0.69 (95% CI 0.62, 0.75) for control/autonomy, 0.91 (95% CI 0.90, 0.92) for pleasure and 0.74 (95% CI 0.74, 0.80) for self-realization.

Discussion

This study has provided information on key psychometric properties of the CASP-19 quality-of-life scale in a different population—in terms of both demographic profile and residential status—from that in which it was originally validated.

A concern with measurement instruments when used in specific populations is that they may exhibit floor or ceiling effects, decreasing the ability of the scale to discriminate at its lower or upper extreme. Accordingly, the approximately symmetrical distribution of the total scores on the CASP-19 in this population is reassuring, as is the fact that scores cover a large proportion of the available range. The negative skew in the data is similar to that originally reported by Hyde et al. [12]. Among the domain scores, those for pleasure are notable for a very marked negative skew, with 61% of respondents scoring within one point of the scale maximum. This suggests that there is a strong ceiling effect for this domain among this population.

The CFA results suggest that the ‘single-factor model’ is a worse fit than the ‘domain model’ or the ‘domain and total model’. The fit statistics for the latter two models are similar, though on average marginally better for the ‘domain model’; this ranking of the models in terms of fit reflects Wiggins et al.’s [13] findings. However, for neither the ‘domain model’ nor the ‘domain and total model’ was there a very good fit. Viewed from a theoretical perspective, given that there is little to choose between these two models in terms of their dimensionality, the ability of the ‘domain and total model’ to provide a total summative quality-of-life score in addition to domain scores gives it greater utility.

Indications of where mismatches with the hypothesized factor structure may lie are provided by the item-domain correlations. Mean item-total correlations, within each domain, were noticeably lower for the control and autonomy domains than for the pleasure and self-realization domains. Moreover, all four items in the control domain and two out of five items in the autonomy domain correlated more strongly with at least one other domain total score than with their own. Only two items in the autonomy domain had an item-domain correlation at or above 0.30. In respect of these two factors at least, the dimensionality of the CASP-19 is questionable in this population, and the negatively worded items in the scale appeared to be implicated here.

Regarding the internal consistency of the four domains, whereas the absolute values of composite reliability obtained for the control, self-realization and pleasure domains do not cause undue concern, that for the autonomy domain falls below DeVellis’ minimum threshold of acceptability of 0.60 [36]. Items 6 and 9 are likely to be the source of the low reliability of this domain. It should again be noted that these items are worded negatively. While the use of both positively and negatively worded items will potentially counteract acquiescence bias and certain other response sets, it may impair the validity of a scale in other respects [41]. Hence it may be wise to reconsider the inclusion of negatively worded items in subsequent revision of the scale.

A possible explanation for the low internal consistency of the autonomy domain may be that certain items lacked face validity for some of our respondents. Older people living in a purpose-built retirement community may conceptualize the notion of autonomy differently from the broader population in this age range, such that its operationalization within this scale may not suit our population. Another possible explanation is that the environment influences the type and degree of autonomy that individuals can achieve. This may mean that respondents do not respond to certain items in the way intended, so that such items ‘behave’ differently from others in the same domain, thereby reducing internal consistency. Finally, life-course experiences particularly affect expectations and assumptions around old age and retirement. This may apply in particular to item 9, which focuses on ‘shortage of money’. Residents in a community such as this may have developed effective methods of managing their finances and may have ‘recalibrated’ their expectations accordingly; shortage of money may therefore be less of a constraint on quality of life in general, and on autonomy in particular, than in other populations. Underlying these considerations is an inevitable tension between the individuality of respondents’ interpretations of quality of life—with the specific meaning that it holds within their life and in the context of their social environment—and the need to produce aggregate measures that can be interpreted at a population level.

Correlations among the domains of the CASP-19 differed, to varying degrees, from those reported by Hyde et al. [12], but were on average of a similar magnitude. The lowest correlation involved the pleasure domain, and the smaller range of obtained scores, and large proportion of values at the upper end of this subscale, may have served to depress this correlation. However, the magnitudes of the remaining inter-domain correlations seem to be acceptable: large enough to suggest that the domains cohere, but not so high as to cast doubt upon their distinctness.

The high positive correlation of the CASP-19 with the SWLS, which measures a similar underlying construct, supports its construct validity. The CASP-19 correlated positively with the component summaries of the SF-12, though the correlations were somewhat lower than with the SWLS. However, it may not be reasonable to expect a very high correlation between the SF-12 and the CASP-19, given that health status is only one determinant of the broader concept of quality of life, and that the CASP-19 explicitly adopts a social, rather than a medical, model of quality of life [17].

Turning to the CASP-12 version of the scale, the fit statistics from the CFA were satisfactory, and were for the most part at least as good as those achieved by Wiggins et al. [13] for this version of the scale. The mean item-domain correlation for the pleasure domain increased markedly from that derived for the CASP-19, while that for the self-realization domain was very similar. The mean item-domain correlation for the new control and autonomy domain was approximately midway between those for the separate constituent domains in the CASP-19. Overall, and in line with Wiggins et al.’s findings [13], this revised version of the scale achieved a better fit than the CASP-19, though with a possible restriction of its domain of content, given the loss of seven items.

Although data were obtained from a respectable percentage (75%) of residents in this community and all but two respondents provided full data on the scale, a limitation of this study is its sample size; a larger sample would provide more precise estimates from the confirmatory factor analysis model.

Conclusion

This study has provided further information on key psychometric properties of the CASP-19, such as internal reliability and construct validity. By testing these properties within a specific group of older people—in this case, people at the upper end of the age spectrum living in a purpose-built retirement community—we have highlighted important issues to do with the use of the scale across different populations of older people.

An attempt to confirm the factor structure of the CASP-19 in this population gave equivocal results. The somewhat poor results for the ‘single-factor model’ might suggest that it is unwise to present only the total quality-of-life score, without the domain scores. However, there are countervailing arguments against the separate use of the domain scores, owing to questionable dimensionality of the items in the control and autonomy domains, low internal consistency of the autonomy domain and marked skewing of scores in the pleasure domain. On balance, we suggest that the domain scores should not be used in the absence of evidence of their psychometric robustness in the population concerned.

We recommend that the structure of the scale be reconsidered and that a modified version, along the lines of the CASP-12 proposed by Wiggins et al. [13], be further evaluated. The inclusion of negatively worded items should be minimized in any such further development of the scale. In the interim, the use of the CASP-12 in preference to the CASP-19 can be recommended on psychometric grounds, though this will restrict comparisons with studies that have previously used the CASP-19, unless the data for the individual items in the scale are available.