Introduction

The evaluation of QoL among older adults has become increasingly important in health and social sciences. This is due not only to the growing numbers of older adults, but also to the eradication of most infectious diseases; the dominance of chronic, degenerative diseases as populations age; impressive medical technological progress; the necessity for making the effects of medical treatment more explicit; and the demand for indicators of well-being, including psychological and social aspects (Higgs et al. 2003; Walker 2005a, b, c). Research on QoL expanded especially during the 1990s, resulting in over a hundred definitions of QoL (Cummins 1997), and more than 1,000 measures of various aspects of QoL (Hughes and Hwang 1996).

Conceptualization of QoL

Although QoL research has increased in methodological rigor, progress has been hindered by the fact that QoL has been used to mean a variety of different things. There seems to be no widely accepted theoretical framework for QoL or a general consensus concerning which areas are necessary for any comprehensive definition among adults. The question has been raised whether the conceptualization of QoL among older adults is the same as for younger, middle-aged adults (Bowling 2001a; Brown et al. 2004; Power et al. 2005; Walker 2005c).

Classic conceptualizations of QoL in all adults have included such domains as physical health, social relationships and support, environment, financial and material circumstances, and cognitive beliefs (Andrews and Withey 1976; Campbell et al. 1976; Flanagan 1978; George and Bearon 1980). Currently, most researchers are in agreement that QoL among older adults reflects a multidimensional concept, including physical, emotional and social domains (Bowling 2001a; Brown et al. 2004; Ellingson and Conn 2000; Haywood et al. 2004; Moons 2004).

Further, the position has been taken that QoL should be studied from the perspective of the individual (Andrews and Withey 1976; Calman 1987; Taylor and Bogdan 1990; Walker 2005c), although it has been suggested that the lay views of older adults have not been given enough consideration when measuring QoL (Brown et al. 2004; Haywood et al. 2004; Repetto et al. 2001). Researchers have been specifically challenged to avoid measures of QoL that exclude or ineffectively explore areas that are important to older adults, or worse, lead to disadvantages in the allocation of health resources (Frytak 2000; Noro and Aro 1996). For example, authors report that little attention has been given to assessing important areas such as transitions from employment to retirement, from responsible duties to free time, integration into retired community activities, alterations in family and friends, issues of intimacy, and spiritual concerns including death and dying (Farquhar 1995; Nilsson et al. 1998; O’Boyle 1997; Power et al. 2005). Further, a recent review found that older people consistently nominated components such as relationships with family and others, independence and autonomy, finances, health, spirituality, and institutional care as important (Brown et al. 2004).

Measurement issues

The measurement of QoL has become more complicated because the term “health-related quality of life” (HRQoL) has evolved. Citations in Medline of this term go back to 1989. Although this term is intended to focus on the effects of health, illness and treatment on aspects of QoL (De Korte et al. 2004; Ferrans et al. 2005; Fries 1983; Hyland 1992; White 1967), both HRQoL and QoL as concepts include many of the same domains, and the literature documents problems in differentiating them (Farquhar 1995; Frytak 2000; Gill and Feinstein 1994). QoL and HRQoL are concepts that are often used interchangeably in discourse and in outcomes measurement, although it is generally agreed that QoL is considerably more comprehensive than HRQoL and includes aspects of the environment that may or may not be affected by health and treatment (Patrick and Chiang 2000). Traditionally, the concept of HRQoL was meant to distinguish outcomes relevant to health research from earlier sociological research on subjective well-being and life satisfaction in healthy general populations (Campbell et al. 1976). Currently, words such as happiness, life satisfaction and subjective well-being are still described as being closely aligned with QoL but not including QoL (Sirgy et al. 2006).

Another measurement issue regarding older adults is the scarcity of older adult-specific instruments for QoL assessment (Frytak 2000; Haywood et al. 2004; Hendry and McVittie 2004; Power et al. 1999). Traditionally, QoL in older adults has been measured by generic QoL/HRQoL instruments developed for younger and middle-aged samples, and these have often been applied inappropriately (Hendry and McVittie 2004). Examples include using measures that assess only “ill health” and using domains that are irrelevant (Bowling 2001a; Ellingson and Conn 2000; Fayers and Machin 2007).

It has also been recommended that when assessing QoL, instruments should be evaluated for their psychometric properties such as reliability, validity, and responsiveness to important clinical changes in various populations (Deyo et al. 1991; Ettema et al. 2005; Haywood et al. 2004; Patrick and Chiang 2000; Scientific Advisory Committee of the Medical Outcomes Trust 2002). Consideration should also be given to response burden, understandability of the items and features of score distributions (McHorney 1996). Moreover, the correspondence between QoL and underlying theoretical origins, conceptual models of relationships, concept definitions and reasons for instrument choice should be considered (Brown et al. 2004; Haywood et al. 2004; Patrick and Chiang 2000).

Special considerations in assessment

Other methodological considerations regarding older adults include physical, mental, and functional changes taking place in this population. The specificity of these changes, and how these changes appear, are dependent upon aging phases, transitions, and medical conditions (O’Boyle 1997), lifestyle characteristics (Ellingson and Conn 2000; Parse 2003), personality (Erikson and Erikson 1997; Kempen et al. 1999; Krause 2004), psychological factors (Bowling 2005; Brown et al. 2004), coping capacity (Kempen et al. 1999), and social relationships (Bowling 2005; Walker 2005a). Many older adults suffer from cognitive impairment, and the measurement of cognitive status demands special attention (Ettema et al. 2005; Fors et al. 2006; Grundy 2006; Haywood et al. 2004; Kane et al. 2002; Walker 2005a). Also, older adults often experience co-morbidity together with normal ageing processes, necessitating the assessment of sensory changes (Østby 2004; World Health Organization 2000). Problems including educational level, sight, hearing, communication, and fatigue also demand special concern in measurement administration and instrument adaptation (Bowling 2001a; Haywood et al. 2004, 2005a, b; Kane et al. 2002; Tidermark et al. 2004).

During the last 10 years, there has been growth in studies describing the assessment of QoL and HRQoL amongst older adults (Brand et al. 2004; Brazier et al. 1996; Grimby and Wiklund 1994). Haywood’s reviews have identified an increase in the number of instrument evaluations with older adults, particularly since 2000 (Haywood et al. 2004, 2005a, b). However, these authors, together with others, recommend continual evaluation of existing generic instruments in this age group (Brown et al. 2004; Buck et al. 2000; Ettema et al. 2005).

Aim

The aim of this paper is to conduct a narrative review of the conceptualization and the measurement properties of QoL instruments used in empirical studies among older adults from 1994 to 2006.

Search method

The sample consisted of all studies meeting the inclusion criteria published from 1994 to 2006. A literature search in the Medline, Cinahl, Embase, PsycINFO and Cochrane databases was undertaken in May 2005. In January 2007, a supplemental search was conducted covering the years 2005–2006 in these same databases, and also including the Sociological Abstracts and Anthropological Literature databases for the period 1994–2006. In both searches the keywords “quality of life, elderly, measurement, measurement scale, health-related, and assessment” were used to identify the corresponding controlled vocabulary system within each database. A controlled vocabulary system is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search (Craig and Smyth 2002). The Medical Subject Heading (MeSH) system used by the Medline database is an example of a controlled vocabulary system (Gault et al. 2002). Within the databases Medline, Cinahl, Embase, PsycINFO and Cochrane the word “elderly” is defined as the subject heading “aged”. We use the definition of aged given in Medline, “A person 65 through 79 years of age” and “aged, 80 and over”, which is also supported by others (Bowling 2001a; Bowling et al. 2002).

Inclusion and exclusion criteria

Titles and abstracts of all articles were assessed against the inclusion and exclusion criteria by two reviewers. Articles included were retrieved in full. Publications were included in this paper if they met the following inclusion criteria: (1) addressed older adults 65 years or older, (2) the authors explicitly stated that they intended to measure QoL and/or HRQoL, and (3) written in English or a Scandinavian language. Publications were excluded when: (1) authors did not explicitly use the term “QoL” or “HRQoL” and used other words such as mortality, life-satisfaction, happiness, well-being, or functional status, (2) QoL was only pointed out as a topic for further investigation in new studies, (3) proxy informants were used, (4) age classification was under 65 for a part of or the whole sample, (5) review articles, (6) articles in the form of concept analyses, letters, commentaries, and abstracts relating to posters and oral presentations, (7) articles with qualitative design, (8) not in English or a Scandinavian language, and (9) not within the period 1994–2006. Articles were excluded on the basis of their abstracts and of reading full article texts.

Data extraction

Data extraction followed criteria considered important in instrument evaluation discussed by several authors. These criteria include: evidence given for an underlying conceptual model in the study, concept definitions, internal consistency, reproducibility, responsiveness, floor and ceiling effects, content and construct validity, interpretability and acceptability (Andresen 2000; Bowling 2001a, b; Bowling and Ebrahim 2005; Brown et al. 2004; Fitzpatrick et al. 1998; Fletcher et al. 1992; Haywood et al. 2004, 2005a; McHorney 1996; Patrick and Chiang 2000; Scientific Advisory Committee of the Medical Outcomes Trust 2002; Streiner and Norman 2003, 2006; Terwee et al. 2007; U.S. Department of Health and Human Services Food and Drug Administration 2006). Special considerations related to domains covered, age-specific areas, cognitive status, administration, and instrument adaptation were also extracted.

A conceptual model is a set of interrelated concepts or abstractions that are assembled together in some rational scheme by virtue of their relevance to a common theme; it is also referred to as a conceptual or theoretical framework. A conceptual framework can also be shown as a diagram or schema, with a set of related concepts and the linkages among them displayed by the use of boxes and arrows (Gerritsen et al. 2004; Polit and Beck 2004). The use of theory assumes that a conceptual model is utilized (Chinn and Kramer 1999). In the review, criteria for assessing evidence of a conceptual model included that QoL or HRQoL was the major construct used in connection to a specific theory named in a model, and/or the authors provided a schematic model which pictorially represented the QoL or HRQoL concepts and/or interrelationships.

Various types of psychometric criteria have been defined. Reliability summarizes the consistency of a measurement. One form is internal consistency, for which values of Cronbach’s α of 0.70 and over are taken as evidence (Bowling 2001a; Fitzpatrick et al. 1998; Nunnally and Bernstein 1994). Another form of reliability is examined by test–retest reproducibility, assessing score consistency over two points in time (Bowling and Ebrahim 2005). A kappa test, Pearson’s correlation, Spearman’s rho, Kendall’s tau and the intraclass correlation coefficient (ICC) may be used as evidence to assess the extent to which the results obtained by two or more raters or interviewers are in agreement for the same populations (Bowling and Ebrahim 2005). There is no standard level of the reliability coefficient (Polit and Beck 2004). It is common to recommend 0.90 if the measurement is to be used for evaluating individuals and 0.70 when discriminating between groups (Fayers and Machin 2007; Polit and Beck 2004), although Fayers and Machin (2007) refer to values of 0.60 and even 0.50 as acceptable. Altman (1999) suggests a kappa value <0.20 to be poor evidence. The ICC, expressed as a ratio between 0 and 1 (Terwee et al. 2007), demonstrates evidence that is mathematically equivalent to the unweighted kappa statistic (Streiner and Norman 2003).
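As an illustration of the two reliability statistics named above, the following minimal sketch computes Cronbach’s α from item-level scores and an unweighted Cohen’s kappa for two raters. It is not taken from any of the reviewed studies: the function names and the example data are hypothetical, and sample variances (n − 1 denominator) are assumed for α.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters classifying the same subjects."""
    n = len(ratings_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical data: five respondents answering a four-item scale (1-5).
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, meets 0.70 criterion: {alpha >= 0.70}")
```

The 0.70 cut-off applied in the last line is the criterion cited in the text for group-level comparisons; a stricter value such as 0.90 would be substituted when scores are used to evaluate individuals.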

Responsiveness to change, sometimes called sensitivity, has been examined as a third category in addition to reliability and validity (Bowling and Ebrahim 2005; Fitzpatrick et al. 1998). Some consider responsiveness to be related mathematically to reliability, and on the conceptual level, as an aspect of validity (Patrick and Chiang 2000; Streiner and Norman 2003; Terwee et al. 2003). Responsiveness is understood as a measurement’s power to detect clinically important change over time, the most common evidence being the effect size statistic (Haywood et al. 2006; Streiner and Norman 2003; Terwee et al. 2003). Also, where more than 20% of the respondents have the minimum or maximum score, the score distribution indicates floor or ceiling effects, which reduce reliability and threaten responsiveness of the measurement (Haywood et al. 2005a).
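The effect size statistic and the 20% rule for floor and ceiling effects can be sketched as follows. This is a hedged illustration only: the text does not specify which effect size variant the reviewed studies used, so the sketch assumes one common form (mean change divided by the standard deviation of baseline scores), and the function names and data are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (assumed variant)."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def floor_ceiling(scores, min_score, max_score, threshold=0.20):
    """Flag floor/ceiling effects when more than `threshold` of respondents
    obtain the minimum or maximum possible score."""
    n = len(scores)
    floor_share = sum(s == min_score for s in scores) / n
    ceiling_share = sum(s == max_score for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical scale scores (0-100) for ten respondents.
scores = [0, 0, 0, 50, 60, 70, 80, 90, 100, 100]
print(floor_ceiling(scores, min_score=0, max_score=100))
```

In this example three of ten respondents score the minimum, so a floor effect is flagged, whereas exactly 20% at the maximum does not exceed the threshold.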

Streiner and Norman (2003) reported differences in definitions of validity. As recommended, we use the concepts of content and construct validity. Validity summarizes the degree to which a measurement measures what it is supposed to measure. Evidence for face and content validity requires a more qualitative approach to assess the underlying relationship between the items and the theoretical base, the intended purpose, or intended use of the measurement (Haywood et al. 2006). Construct validity, understood as convergent or discriminant validity, requires that the instrument display evidence of correlations with related but not with dissimilar variables (Bowling and Ebrahim 2005; Streiner and Norman 2003). Factor analysis is the most common statistical method for examining construct validity (Fitzpatrick et al. 1998).
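The logic of convergent and discriminant validity described above can be illustrated with a simple correlation check. The sketch below assumes Pearson’s correlation as the measure of association; the variable names and scores are hypothetical and do not come from any reviewed study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five respondents.
new_qol_scale = [1, 2, 3, 4, 5]
related_measure = [2, 4, 6, 8, 10]   # a construct expected to correlate
unrelated_measure = [3, 1, 4, 1, 5]  # a construct expected not to correlate

convergent_r = pearson_r(new_qol_scale, related_measure)
discriminant_r = pearson_r(new_qol_scale, unrelated_measure)
print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

Evidence of construct validity in this sense would be a high correlation with the related measure together with a clearly lower correlation with the dissimilar one.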

Interpretability is defined as the degree to which one can assign qualitative meaning to a measurement’s quantitative score. Interpretability can be assessed by comparing the data with representative data from the general population (normative data) (Fitzpatrick et al. 1998; Terwee et al. 2007). Instrument acceptability addresses the willingness of people to complete an instrument (Fitzpatrick et al. 1998). Evidence of acceptability can be explored by such characteristics as response rate, missing values, response burden and mode of administration.

Findings

The review generated 499 articles from the seven databases. Only 47 articles were found to be relevant for the purpose of this article (Table 1). Articles were most often excluded because they did not meet the age-related criteria (65%), did not focus on QoL (13%), or were written in non-English or non-Scandinavian languages (6%).

Table 1 Summary of reviewed papers with conceptual frameworks, concept used, definitions, and methodological considerations about older people

Evaluation of studies

The variability of the 47 evaluated studies was large in terms of conceptual frameworks, definitions and measurements utilized, cited psychometric properties, and special considerations given to assessment issues among older adults.

Conceptual frameworks of QoL

A conceptual framework was found in only 13% of the studies (Beaumont and Kenealy 2004; Fry 2001; Grundy and Bowling 1999; Higgs et al. 2003; Nesbitt and Heidrich 2000; Sarvimaki and Stenbock-Hult 2000). Of these six studies, Grundy and Bowling (1999) described QoL in relation to different models of human needs, as reflected in Maslow’s theory (1954), and satisfaction with life and happiness as accompanying successful aging. These authors underscored the broad and multidimensional perspective of well-being in old age, with the assumption that QoL covers all aspects in life. Nesbitt and Heidrich (2000) proposed a model of QoL, positing interrelationships among physical health limitations, sense of coherence, illness appraisal and QoL. Sarvimaki and Stenbock-Hult (2000) applied a model of QoL based upon the definition of QoL as “a sense of well being, of meaning, and of value or self-worth” (p. 1027). These authors suggested that QoL is influenced by intra-individual characteristics, such as health, functional capacity and coping mechanisms, and by external conditions including environment, work, housing conditions and social network. Fry (2001) used social-cognitive theory in a study predicting HRQoL among older adults who had lost a spouse, and stated that “self-efficacy beliefs or expectancies of elderly individuals influence the level of effort they expend to preserve their QoL” (p. 788). Aspects of contemporary social theory as reflected in models based upon social comparison strategies and need satisfaction (control, autonomy, pleasure and self realization) were also applied (Beaumont and Kenealy 2004; Higgs et al. 2003).

Definitions of QoL

Of the reviewed studies, 58% reported measuring QoL, 36% reported measuring HRQoL, and 6% stated that both QoL and HRQoL were examined (Table 1). QoL or HRQoL was actually defined in only 43% of the studies. Sometimes, the concepts of QoL and HRQoL were used to mean the same thing. For example, the SF-36 is described as both a QoL and HRQoL measurement (Akifusa et al. 2005; Berkman 1999; Berlowitz 1995; Brazier et al. 1996; Byles 1999; Byles et al. 2004; Gagnon et al. 1999; Hanlon et al. 1996; Jenkins 2002; McFall 2000; Nelson et al. 2004; Peek 2004; Pfisterer et al. 2003; Reeves et al. 2004; Varma et al. 1999). Various authors define QoL broadly, while others do not make a distinction between QoL and HRQoL (Bowling et al. 2002; De Leo et al. 1998; Hellstrom et al. 2004b; McHugh et al. 1997; Nesbitt and Heidrich 2000). One study (Noro and Aro 1996, pp. 355–356) defined both concepts. In some studies, HRQoL was reflected by the study aims and measurements chosen (Berlowitz 1995; Brazier et al. 1996; MacRae et al. 1996; Peek 2004).

QoL measurements and domains

Wilson and Cleary (1995) have developed a conceptual model for HRQoL outcomes that has increased in popularity (Ferrans et al. 2005; Patrick and Chiang 2000). We have used this model to categorize measurements and domain areas described in the review. Basically, the model depicts relationships among biological and physiological variables, symptom status, functional status, general health perceptions and overall QoL (Table 2). A total of 40 different measurements were reported, with 34 instruments applied in single studies and six instruments used in more than one study. The SF-36 (36%) and SF-12 (11%) were most frequently used. The Life Quality Gerontological Centre Scale (LGC) was used in 9%; and the Nottingham Health Profile (NHP), EuroQol and Sickness Impact Profile (SIP) each in 4% of the studies. Two studies provided evidence for assessing lay views and personal importance given to various domains by using the Schedule for Evaluation of Individual Quality of Life: direct weighting (SEIQoL–DW) and the Modified Patient Generated Index (MPGI). According to Wilson and Cleary’s (1995) model, none of the measurements met the criterion for all five levels. The SF-36 and SF-12 included four of the five levels in the model, exhibiting the greatest multidimensionality in their assessment (see references in Table 3). Further, 22 instruments assessed functional status factors and 15 instruments assessed symptom status factors. Only four measurements assessed biological–physiological variables, and these appeared in the same study (Kumar et al. 1995). Assessments measuring overall QoL and general health perceptions were included in 16 and seven studies, respectively.
Additional domains and content areas were assessed by 17 of the measurements and included the following: religion/spirituality (spiritual life, meaning, purpose, important areas of life); independence, mobility and autonomy (autonomy, respected by others, environment); enabling activities (control, pleasure, self realization, capacity, sense of coherence); social/leisure activities and community (work, interests, hobbies, holidays, retirement); finances/standards of living (economy, economic dimensions); and health (common health complaints, self-reported diseases, subjective impact of disease, sexual activity, ADL, disability).

Table 2 Instrument domains after Wilson and Cleary's conceptual model
Table 3 Summary of QoL and HRQoL measurements and domains after Wilson and Cleary’s model
Table 4 Psychometric properties of the instruments cited

Psychometric properties: reliability and validity

Internal consistency and reproducibility are reported for 14 of the 40 measurements utilized. Internal consistency reliability is reported for ten instruments (Table 4). Unacceptable reliability coefficients with Cronbach’s alpha below 0.70 were reported in Berlowitz (1995) and Brazier et al. (1996) for the SF-36, in Hellstrøm et al. (2004a) for the SF-12, in Hellstrøm et al. (2004a) for the LGC, in De Leo et al. (1998) for the LEPAD, and in Bowling et al. (2002) for the QoL survey questionnaire. Reproducibility was reported with acceptable values by Brand et al. (2004) with a kappa value (0.79) for the assessment of QoL instrument; by Liu-Ambrose et al. (2005) for the QUALEFFO with kappa values (0.54–0.90), test–retest (r = 0.99), and ICC (0.83); and by Brazier et al. (1996) with Spearman’s correlation for the EuroQol (r = 0.53) and partly for the SF-36 (r = 0.28–0.70). Responsiveness to change (effect size) was reported for the SF-36 (Berkman 1999; Brazier et al. 1996; Byles et al. 2004, 2006; Hanlon et al. 1996), EuroQol (Brazier et al. 1996), SIP (Fletcher et al. 2004), and the QUALEFFO (Liu-Ambrose et al. 2005). Floor effects were reported for the SF-36, EuroQol, and the control, autonomy, pleasure and self-realization (CASP) measure (Berkman 1999; Brazier et al. 1996; Higgs et al. 2003).

Evidence for construct validity was reported for all measurements except for the QUALEFFO (Liu-Ambrose et al. 2005). Of those reporting construct validity, 16 measurements provided evidence of convergent validity, 34 of discriminant validity, and ten of factor analysis. Face and content validity were assessed in nine measurements belonging to six studies (Berlowitz 1995; Bowling et al. 2002; Brazier et al. 1996; Grundy and Bowling 1999; Higgs et al. 2003; Sarvimaki and Stenbock-Hult 2000).

Evidence of acceptability was assessed by response rate, missing values, removal of items based on focus work, and clarification that the older adults were too frail or cognitively impaired to answer items (Andersson et al. 2006; Brand et al. 2004; Brazier et al. 1996). Interpretability, as evidenced by normative comparisons, was reported for the SF-36, SF-12, LGC and the EuroQol (Akifusa et al. 2005; Berkman 1999; Borglin et al. 2005; Byles 1999; Byles et al. 2004; Peek 2004; Stenzelius et al. 2005; Tidermark et al. 2004).

Special considerations for the assessment of QoL among older adults

Special considerations given to domain coverage, age-specific areas, cognitive status, administration method and instrument adaptation were reviewed. Of the studies, 55% did not provide any evidence of age-specific content considerations given to the assessment of QoL among older adults and a large majority (89%) of the studies did not discuss any special considerations given to instrument adaptation. However, all studies reported the administration method. Nearly two-thirds (62%) used face-to-face interviews separately or combined with other methods, and 11% used phone interviews. Evidence for sensory changes in relation to vision and hearing impairment was cited only once, in spite of the fact that 80% of individuals over 60 years are visually impaired and 22% experience impairment in both vision and hearing, which can complicate self or telephone completion of questionnaires (Haywood et al. 2004).

In the studies that reviewed age-specific areas, discussion was focused on physical or physiological changes (Berlowitz 1995; Brazier et al. 1996; Grimby and Wiklund 1994; Liu-Ambrose et al. 2005; Stenzelius et al. 2005); role and developmental changes (Grimby and Wiklund 1994; Grundy and Bowling 1999; Noro and Aro 1996; Sarvimaki and Stenbock-Hult 2000); cognitive and mental functioning (Brazier et al. 1996; De Leo et al. 1998; Noro and Aro 1996); changes in social network (Dempster and Donnelly 2000); changes in functional ability (Borglin et al. 2005; Liu-Ambrose et al. 2005; Stenzelius et al. 2005); need for control, autonomy, pleasure, and self-realization (Higgs et al. 2003); pain (Liu-Ambrose et al. 2005); residential arrangements and social comparison processes (Beaumont and Kenealy 2004); and sight, hearing, communication, and fatigue as considerations in administration (Tidermark et al. 2004). Also, considerations given to education, value orientations, work, and an understanding of health as differing from younger samples were made (Beaumont and Kenealy 2004; Berlowitz 1995; Bowling et al. 2002; Brazier et al. 1996; Byles 1999; De Leo et al. 1998; Higgs et al. 2003).

Cognitive status was measured in only 11% of the studies. Further, in 43% of the studies, cognitive impairment was an exclusion criterion. In 47%, cognitive factors were not mentioned at all (Table 1). A few studies considered cognitive and other mental changes as natural ageing processes that influenced the choice of administrative methods, such as reducing the number of questions posed (Hellstrom et al. 2004b; Tidermark et al. 2004), using a probing guide (Andersson et al. 2006) and training interviewers (Lee et al. 2006).

Discussion

The 47 evaluated studies varied widely in the evidence provided for conceptual frameworks, definitions, measurements utilized, psychometric properties cited and methodological considerations given to the assessment of QoL among older adults.

Conceptual frameworks of QoL

Of the 47 evaluated studies, 87% lacked evidence of a conceptual framework. Lund (2005) argues that when research is largely atheoretical, measurement validity is called into serious question, especially when the measure is not consistent with the conceptual definition. Gerritsen et al. (2004) suggested that a theoretical framework should: (1) be based on assumptions about the comprehensiveness of human beings in general; (2) describe the contribution of each domain to QoL, (3) identify relationships among dimensions, and (4) take individual preferences into account. In a recent review, Brown et al. (2004) found that researchers failed to address the complexity and dynamics of QoL and the interdependency of the domains, such as specifying distinctions between indicator and causal variables and potential mediating variables. They specifically advocated the need for causal models of ageing grounded in lay perspectives. Notably, in our review we found very few studies that specified causal interrelationships.

Evidence showed that both QoL and HRQoL remain ambiguous terms. Diffuse conceptual meanings were given to both terms, which were also reflected in their operationalization and measurement. For example, many studies referred to the same instruments as measurements for both terms, QoL and HRQoL. The words QoL and HRQoL were also used interchangeably in the same article. Various studies reported that HRQoL was measured, but described results as QoL (Fletcher et al. 2004; Kumar et al. 1995; MacRae et al. 1996; Pfisterer et al. 2003). These results support earlier reviews. Gill and Feinstein (1994) reported that QoL was used as a generic term for an assortment of physical and psychosocial variables, that few studies clarified the distinction between overall QoL and HRQoL, and that the majority of articles were atheoretical. Brown et al. (2004) also voiced concern, both theoretically and methodologically over the interchangeable use, without justification, of the term QoL with other related concepts, including HRQoL.

Definitions of QoL

Although a large majority of the studies imply that QoL or HRQoL is the major focus, only 43% of the studies specifically defined these concepts. QoL has been defined as a much broader concept than health, including cultural, political and social attributes such as quality of the environment, public safety, education, standard of living, transportation, political freedom or cultural amenities (Brown et al. 2004; Ferrans et al. 2005; Guyatt et al. 1996; Higgs et al. 2003). Others describe QoL as representing physical function, health status, perceptions, behavior, lifestyle, and social functioning (Frytak 2000; Moons 2004; O’Boyle 1997). Evidence from the review showed that HRQoL and QoL were measured by many of the same broad domains and content areas (Akifusa et al. 2005; Berkman 1999), a finding also supported by Brown et al. (2004).

Quality of life measurements and domains

The most frequently applied measurements, the SF-36 and the SF-12, represented the most comprehensive assessment when linked to Wilson and Cleary’s (1995) conceptual model. The Haywood et al. (2004) review of 40 instruments also found that the SF-36 was the most widely evaluated instrument. According to Wilson and Cleary’s (1995) model, the majority of measurements assessed functional status and symptoms, lending support to findings that QoL research in older adults has focused primarily on measures of health and illness as equivalents of QoL (Higgs et al. 2003). Few older adult-specific instruments were utilized, a finding supported by Brown et al. (2004), in spite of the fact that the Haywood et al. (2004) review presented empirical evidence of 18 older adult-specific instruments. The application and testing of existing older adult-specific instruments needs to be addressed in future work.

Many of the additional domains and content areas assessed by 17 of the measurements supported the findings of Brown et al.’s (2004) review regarding important domains nominated by older adults. The use of these additional measures suggests limitations in existing instruments, supporting the need to focus on gaps in existing measurement scales. Haywood et al. (2004) found only one publication which evaluated limitations in domain coverage. Only two studies provided evidence for assessing lay views and personal importance given to various domains. Recently there has been more agreement about combining qualitative and quantitative assessment (Gilhooly et al. 2005), e.g., using open-ended questions for capturing lay views alongside standardized scales (Bowling et al. 2003). Our findings showed few studies using QoL and HRQoL measures together in older adults. Brown and colleagues (2004) also found that few authors attempted to develop a composite model of QoL, showing QoL on a multi-dimensional continuum with different domains being analyzed together, rather than separately.

Psychometric properties

The SF-36 had the greatest evidence base. The unacceptable internal consistency coefficients with Cronbach’s alpha cited in six of the reviewed studies threaten the homogeneity of the items used, raising doubt as to what has actually been measured, and making it difficult to compare studies (Nunnally and Bernstein 1994; Streiner and Norman 2003). Responsiveness and acceptability were poorly reported. Only three studies cited ceiling and floor effects. Also, the clinical significance of change scores was seldom reported, a finding supported by others (Deyo et al. 1991; Haywood et al. 2004, 2005b). These results may reflect problems in how to define change. Responsiveness of measures to change is especially important in studies of older adults, due to the controversy over whether dysfunction and diminished well-being can be reversed. Evidence for construct validity was reported for all measurements except one, with 16 measurements reporting convergent validity, 34 discriminant validity, and ten factor analysis.

Special considerations for the assessment of QoL among older adults

Of the studies, 55% did not provide any evidence of special content considerations given to the assessment of QoL among older adults. Almost half (47%) did not mention cognitive status. The Haywood et al. (2004) review found that only two of 18 instruments assessed cognitive function. Future measurement of cognitive impairment demands special consideration (Ettema et al. 2005). More than half (55%) of the studies did not discuss any special considerations given to instrument adaptation among older adults. Administration difficulties, such as respondent burden, were seldom mentioned. Measurement of acceptability was found lacking for most instruments in another review (Andresen and Meyers 2000). Haywood et al. (2005b) specifically advised seeking the views of older people with regard to instrument format, relevance and mode of completion. Future administration strategies could be considered, such as postal surveys with large print (Bowling et al. 2003; Hellstrom et al. 2004b; Pfisterer et al. 2003). Consideration should also be given to the educational level of older adults (Laake 2003). Most generic instruments, including the SF-36, which was utilized most frequently in the studies, are written at the seventh-grade reading level or higher (McHorney 1996).

Limitations

This review excluded 91% of the studies generated from the databases, mostly because the samples did not meet the age-specific criterion of 65 years of age or older. It can be questioned whether this criterion was too rigid, as it excluded larger studies recently conducted among older adults (Kempen et al. 1997, 1999; Power et al. 2005). Although a supplementary search was undertaken in social sciences and anthropological databases, limitations in our search method may have prevented a more relevant and comprehensive review of QoL measurement in old age (Bowling 2005; Bowling and Ebrahim 2005; Brown et al. 2004; Haywood et al. 2004; Walker 2005a, b, c). These limitations include the exclusion of key terms synonymous with or closely aligned with QoL; the exclusion of grey literature, reports and systematic reviews; and the absence of manual searches in articles and books. They may also have influenced the number of measurements which could represent level five of Wilson and Cleary’s (1995) model. Further, it has been shown that searching with MeSH terms resulted in higher precision and lower sensitivity than text-word searching (Jenuwine and Floyd 2004). Jenuwine and Floyd (2004) reported relevant unique hits for each search strategy and recommended combining both strategies. Others have found that a MeSH search was more efficient than a text-word search (Chang et al. 2006). Nonetheless, researchers comparing MeSH searches in different databases have found that the systems do not retrieve identical sets of documents (Gault et al. 2002; Hallett and Todd 1998). Because we used a keyword approach, the use of controlled vocabulary systems may have directly influenced system recall and precision capabilities in our searches. It may be considered a further limitation that we did not evaluate the adequacy of the empirical data reported, although this was not the aim of our paper.
Our review drew our attention to the scarcity of explicit grading systems that can be used in the interpretation, assessment and evaluation of the adequacy of evidence provided in publications (Andresen and Meyers 2000; Haywood et al. 2004).
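
The trade-off between precision and sensitivity (recall) noted above for controlled-vocabulary and text-word searching can be made concrete with a small sketch. The record identifiers and the set of relevant records below are invented for illustration only; they do not reproduce our searches or the results of the cited comparisons.

```python
def search_performance(retrieved, relevant):
    """Return (sensitivity, precision) of a search.

    sensitivity = relevant records retrieved / all relevant records
    precision   = relevant records retrieved / all records retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Hypothetical record identifiers (assumption, for illustration only).
relevant_records = {1, 2, 3, 4, 5, 6}
mesh_search = {1, 2, 3, 11}                              # narrower, fewer false hits
text_word_search = {2, 3, 4, 5, 6, 12, 13, 14, 15, 16}   # broader, more false hits
combined_search = mesh_search | text_word_search

mesh_result = search_performance(mesh_search, relevant_records)
text_word_result = search_performance(text_word_search, relevant_records)
combined_result = search_performance(combined_search, relevant_records)
```

In this invented example each strategy retrieves relevant records that the other misses, so the combined search has the highest sensitivity, in line with the recommendation to use both strategies together.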

Conclusion

Of the 47 studies reviewed, a great majority (87%) lacked a conceptual framework, and a third lacked any formal definition of QoL. Almost two-thirds of the studies focused on QoL, where HRQoL was used as an overlapping term. Although construct validity was reported in the majority of studies, minimal empirical evidence was provided for other psychometric properties of the instruments applied, a finding supported by others (Haywood et al. 2006). Furthermore, more than half of the studies did not report any methodological considerations given to older adults. Findings confirm the need for improvement in the quality of reporting of psychometric measurements so as to determine which of the growing number of QoL instruments perform most adequately and under what set of circumstances (Andresen and Meyers 2000; Brown et al. 2004; Grotle et al. 2005; Haywood et al. 2004, 2005b, 2006; McDowell and Jenkinson 1996; Terwee et al. 2007). Continued efforts are needed to reach interdisciplinary consensus on definitions of acceptable measurement standards for good measurement properties, including explicit quality criteria for the assessment and grading of these properties. Efforts are also needed to identify those content areas that are likely to discriminate best and display the greatest responsiveness to change. Our results lend support to Walker’s (2005b) discourse regarding the amorphous, multidimensional and complex nature of QoL and the need to resolve such methodological issues if interdisciplinary research on QoL is to develop further. Research grounded in subjective evaluations of QoL is required to capture more adequately the multi-dimensional conceptualization reflected in measurement assessment. Future priority should be given to the development of common QoL assessment models that are person-centered, causal and multidimensional, based on collaborative efforts among professionals in the international gerontology research community.