Introduction

Individualized measures of health-related quality of life (HRQL) have been used in health research for several decades, and a systematic search identified more than 140 studies reporting results using this approach. These measures were developed on the premise that HRQL is a very personal construct whose quality can only be judged by the person living that life [13].

The two main individualized measures are the Schedule for the Evaluation of Individual Quality of Life (SEIQoL) [1] and the Patient Generated Index (PGI) [4]. This study focuses on the PGI. Although several versions exist [5, 6], the PGI is completed in three stages. In the first, patients are asked to name up to five areas of their life that are important to them and affected by their health problem; a sixth area is reserved for all other non-health-related aspects. In the second, they rate, on a 6- or 11-point scale, how badly they are affected in each area and in the rest of their life. In the third, they are given a fixed number of hypothetical points to distribute across the areas in which they would most like to see an improvement. The total score is calculated by summing, across areas, the product of the severity rating and the points allocated, and converting this sum to a percent, with 100 indicating the highest quality of life.
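The scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published PGI algorithm: it assumes ratings coded on an 11-point scale from 0 (worst) to 10 (best, unaffected), so that higher totals mean better HRQL, and a budget of 12 points, the number used in the studies analysed below. The name `pgi_score` is hypothetical.

```python
from typing import Sequence


def pgi_score(ratings: Sequence[float], points: Sequence[float],
              max_rating: float = 10, total_points: float = 12) -> float:
    """Return a PGI-style total score on a 0-100 scale (100 = best HRQL).

    Assumption: each rating runs from 0 (worst) to max_rating (best), and
    total_points is the fixed point budget given to the respondent.
    """
    if len(ratings) != len(points):
        raise ValueError("each area needs one rating and one point allocation")
    if any(r < 0 or r > max_rating for r in ratings):
        raise ValueError("ratings must lie between 0 and max_rating")
    # Sum of cross-products: each area's rating weighted by the points it received.
    weighted = sum(r * p for r, p in zip(ratings, points))
    # Divide by the largest attainable sum to express the result as a percent.
    return 100 * weighted / (max_rating * total_points)


# Hypothetical respondent: five health-affected areas, then "rest of life".
ratings = [3, 5, 2, 6, 4, 8]
points = [4, 2, 3, 1, 2, 0]
print(pgi_score(ratings, points))  # prints 35.0
```

Because points are spent only where improvement is wanted, an area that receives no points (here the sixth, "rest of life" area) does not affect the total, however it is rated.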

Several studies have undertaken traditional psychometric testing of these types of measures. A 2007 review indicated that reliability varied widely [5], partly explained by differences among the versions of the PGI. Where test–retest reliability has been assessed, estimates have ranged from 0.48 to 0.91 [6–15], generally lower than for fully standardized measures, which typically have reliability coefficients of 0.8 or higher [16–18]. This is because three components of an individualized measure can vary between administrations: the areas, the severity ratings, and the weights. For example, reliability was much lower when the areas nominated changed than when they did not [9, 14]. Comparisons of reliability coefficients between individualized and fully standardized measures also need to consider that the response options are quite different: fully standardized measures typically have only a few response options, which may be too coarse to register small changes.

The PGI was designed to be self-administered and takes only a few minutes to complete [15, 19], although some patients require the assistance of an interviewer [7]. Research using postal administration showed that a population with disabilities could not manage this mode of administration [20] and that, in a general practice population, only 51% completed the PGI correctly [13]. More recently, Garratt et al. [6] reported a completion rate of 81% for an inpatient rehabilitation population.

Research has shown that these measures can be used with populations with cognitive impairment [21], although the cognitive burden is high for some people [22]; this burden can be reduced with supported administration. In general, correlations between HRQL ratings from individualized measures and those from disease-specific measures are low [13, 23, 24].

Although some comparisons have been made between HRQL scores derived from individualized measures and from fully standardized measures, less has been done to understand why differences occur [25]. Several investigators [26, 27] have noted that HRQL ratings are likely to differ between individualized and standardized measures when their content differs widely. For example, Broberger et al. [27] showed that, in patients with lung cancer, concerns generated by an individualized approach were not fully represented in a standardized cancer HRQL measure (EORTC); this was also observed in a cardiac sample [24].

Another gap in understanding individualized measures is the lack of an in-depth comparison of their performance across health conditions. Only the early work of Ruta [13] has compared the interpretation of an individualized measure (the PGI) across health conditions, finding correlations of moderate strength between PGI scores and disease-specific measures. A comparison of PGI domains and scores across health conditions, and against standardized measures, was not made. This information would be useful because the format of the PGI is very attractive for clinical use: it generates content of clinical concern and also provides an HRQL rating for research purposes. Data on the performance of the PGI against fully standardized measures would help support the use of the PGI alone or in combination with other measures.

Objective

The purposes of this study were to estimate, across four health conditions, the magnitude of the association between ratings for HRQL derived from the Patient Generated Index (PGI) and those from fully standardized generic and disease-specific measures, and to identify the extent to which the areas generated from the PGI are covered by the content of the fully standardized measures. The hypotheses were that HRQL ratings from the PGI would be lower than those from fully standardized measures and that each health condition would yield unique content that would differ from the content of the fully standardized measures.

Methods

Four data sources were used for this study, all of which had incorporated the PGI into their measurement strategy. The study samples comprised people with stroke [28], multiple sclerosis (MS) [29], advanced cancer [30], and persons living with human immunodeficiency virus (HIV+) [31]. The methods for these studies have been presented previously [32, 33]. Briefly, the stroke sample comprised people with chronic stroke recruited into a controlled trial testing an intervention to improve participation in meaningful activities; participants had to be cognitively able to engage and able to toilet independently [28]. The MS group was sampled at random from the databases maintained at the three largest MS clinics in Montreal, QC, and comprised people diagnosed after 1994 [32]. The cancer sample comprised participants in a longitudinal study on anorexia/cachexia who were newly diagnosed with advanced cancer; people with brain tumors were excluded [33]. The HIV+ sample comprised men and women aged ≥35 years who had been diagnosed for at least one year, recruited into a longitudinal study on brain health at sites across Canada; exclusion criteria were dementia, life expectancy of <3 years, other personal factors limiting the ability to participate in follow-up, a non-HIV-related neurological disorder likely to affect cognition, known active CNS opportunistic infection, hepatitis C requiring interferon-based treatment during the follow-up period, psychotic disorder, and current substance use disorder or severe substance use disorder within the past 12 months [31].

All of the studies received ethical approval from the institutional review boards of the respective institutions. For the three studies using longitudinal designs (stroke, cancer, and HIV), only data from the initial visit were used. Although the protocol for each study stipulated interviewer assistance for the PGI, this condition was not always met in the stroke and cancer studies, and several participants completed the PGI on their own.

For all studies, the World Health Organization’s International Classification of Functioning, Disability, and Health (ICF) [34] and/or the Wilson–Cleary model of HRQL [35] were the underlying measurement models. Measures included the PGI [4]; two multi-item generic standardized measures that yield a single index, the EQ-5D index value [36] (5 items; 3-point scale) and the SF-6D [37] (6 dimensions derived from 36 items); and the health rating from the EQ-VAS [36]. Disease-specific HRQL indices were the Preference-Based Stroke Index (PBSI) [38] (10 items; 3-point scale), the Preference-Based MS Index (PBMSI) [39, 40] (5 items; 3-point scale) and, for cancer, the Existential subscale of the McGill Quality of Life Measure (MQOL) [41] (6 of 16 items). For cancer, two single items with numeric rating scales for QOL were also available [41, 42]; for HIV, the single item for global QOL from the WHOQOL-HIV Bref [43] was used. The disease-specific measures differed across studies, and the SF-6D was not available for the stroke sample. All measures were transformed to range from 0 to 100, with 100 representing the best possible state. In addition to questionnaires, the measurement strategy included tests of physical performance. The PGI was always administered as the first questionnaire.

Completion of Stage 1 of the PGI yields text threads, which in these studies were in two languages, English and French. To create a common nomenclature for each area nominated, the coding system from the ICF [34] was used in the matching language [29, 30]. The same process had already been applied to the generic measures, EQ-5D and SF-6D [44], to compare content.

Descriptive statistics were used to summarize the results across studies. Pearson correlations between the PGI and all other available measures were calculated, with 95% confidence intervals (CI) obtained using Fisher’s z transformation. For persons who did not complete the PGI as instructed, a total score was still calculated provided that at least 7 of the 12 hypothetical points used for importance weighting had been distributed; in that case, the weighting was based on the number of points actually distributed.
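The two analytic rules above can be sketched with the Python standard library. The rescaling in `pgi_score_partial` is our reading of "weighting based on number of points distributed" (dividing by the points actually allocated instead of 12); the 0–10 rating coding and the 1.96 critical value are assumptions, and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def pgi_score_partial(ratings: Sequence[float], points: Sequence[float],
                      max_rating: float = 10, min_points: float = 7) -> Optional[float]:
    """PGI-style 0-100 score tolerating incomplete point allocation.

    Returns None when fewer than min_points points were distributed;
    otherwise the weights are rescaled to the points actually distributed.
    """
    distributed = sum(points)
    if distributed < min_points:
        return None  # too few points allocated: treated as unusable
    weighted = sum(r * p for r, p in zip(ratings, points))
    return 100 * weighted / (max_rating * distributed)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% CI for a Pearson r via Fisher's z transformation."""
    z = math.atanh(r)          # r -> z, approximately normal
    se = 1 / math.sqrt(n - 3)  # standard error of z
    # Build the interval on the z scale, then back-transform to r.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical respondent who distributed only 8 of the 12 points.
print(pgi_score_partial([3, 5, 2, 6, 4, 8], [3, 2, 2, 1, 0, 0]))  # prints 36.25
print(fisher_ci(0.55, 185))
```

Working on the z scale keeps the interval inside [-1, 1] and makes it asymmetric around r, which a simple r ± 1.96·SE interval would not.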

Results

Table 1 summarizes the key features of the samples including age, sex, and scores on the study measures. A total of 1328 people nominated areas of HRQL impact using the PGI; however, only 1263 people had data usable for analyses: 222 people with stroke (mean age 63 years); 185 people with MS (mean age 43); 690 people with HIV (mean age 52); and 173 people with advanced cancer (mean age 63). Of all the measures, the PGI provided the lowest rating across all health conditions.

Table 1 Characteristics of the four samples and ratings on measures of HRQL

Table 2 lists, in order of prevalence, the 10 most common areas (including ties) nominated in each sample. Harmonizing the PGI text threads to the ICF yielded a total of 19 areas for stroke, 60 for MS, 34 for HIV, and 114 for people with advanced cancer. For stroke, the most commonly nominated area was walking/mobility, but even this was nominated by only 42% of the sample; second was arm impairment. For people with MS, work/school was the most frequently nominated area, endorsed by 62%, followed by fatigue. For HIV, 97% nominated health worries or management as a key contributor to HRQL, followed by emotional function. For advanced cancer, fatigue was the most frequently nominated area, but only by 39%; sleep disturbance was second.

Table 2 Top 10 areas of life impact for each of the four health conditions

Figure 1 presents, for each of the four health conditions, the areas nominated using the PGI under their corresponding ICF domains (orange for impairment, blue for activity, green for participation, purple for non-ICF domains). The dimensions included in the EQ-5D and SF-6D are shown in lighter shading. The EQ-5D has five dimensions: pain and depression/anxiety in the impairment domain of the ICF, walking and self-care as activities, and usual activities as participation. The SF-6D has six dimensions: pain, emotion, and fatigue in the impairment domain, and work, recreation/leisure, and social in the participation domain.

Fig. 1 Overlap between areas nominated on the PGI (dark shading) and dimensions of the standardized HRQL measures (pale shading) as classified using ICF domains. Orange shading indicates impairments; blue, activity; green, participation; purple, non-ICF domains

For stroke, only the walking dimension (EQ-5D) and the work and recreation/leisure dimensions (SF-6D) matched areas nominated on the PGI. For MS, HIV, and cancer, the fatigue (SF-6D) and mood (EQ-5D) dimensions were nominated, as were the walking dimension (EQ-5D) for MS and cancer and all the dimensions related to participation (SF-6D). Only people with HIV nominated pain. However, a number of important areas nominated on the PGI, mainly from the impairment domain, were not represented in the generic HRQL measures, such as balance, cognition, speech, sleep, and appetite. Also important to people with the four health conditions were the ability to look after household tasks and engagement in vigorous activities and sports.

The feasibility of using the PGI also varied across health conditions. For stroke, 249 people completed the PGI, but 27 (11%) did not do so correctly, as indicated by assigning fewer than 7 of the required 12 points. For MS, all 185 assigned 12 points. For HIV, 691 people completed the PGI and only one did not assign 12 points. For cancer, 203 people nominated areas using the PGI, but 33 (16%) did not assign points correctly; for three of these, PGI scores could be calculated from the points assigned.
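The percentages above follow directly from the reported counts. A short check, with the counts copied from this paragraph (the dictionary layout and the function name `completion_rates` are our own):

```python
# Reported counts: (people who completed Stage 1, people who did not assign points correctly)
REPORTED_COUNTS = {
    "stroke": (249, 27),
    "MS": (185, 0),
    "HIV": (691, 1),
    "cancer": (203, 33),
}


def completion_rates(counts):
    """Return {condition: (% incorrect, % correct)} from (completed, incorrect) counts."""
    rates = {}
    for condition, (completed, incorrect) in counts.items():
        pct_incorrect = 100 * incorrect / completed
        rates[condition] = (pct_incorrect, 100 - pct_incorrect)
    return rates


for condition, (bad, good) in completion_rates(REPORTED_COUNTS).items():
    print(f"{condition}: {bad:.1f}% incorrect, {good:.1f}% correct")
```

The stroke (10.8%) and cancer (16.3%) figures round to the 11% and 16% reported above.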

Table 3 gives the correlations and 95% CI between the PGI and each of the other measures. Overall, the correlations were low to moderate in strength. The highest correlations between the PGI and other measures were observed for people with MS, ranging from 0.53 to 0.59. The lowest correlations, ≤0.33, were observed for people with HIV and cancer. Within each health condition, there was little difference in the magnitude of the correlations between the PGI and the other measures. Across measures, the correlations between the PGI and other measures were much lower in the cancer group.

Table 3 Pearson correlations (95% CI) between the PGI and other measures of HRQL across the four health conditions

Discussion

This study showed that, across health conditions, the PGI provides unique information not captured by the standardized measures, such as the importance of specific motor impairments, cognitive ability, sleep, and sports and vigorous activities (see Fig. 1). The areas nominated are all actionable with targeted interventions and could also guide the selection of treatment options, when these are available. Using the PGI alongside one fully standardized measure would provide information suitable for within-person, within-condition, and cross-condition comparisons.

A number of lessons about individualized measures were learned from the analyses of the PGI carried out here. First, across all four health conditions, the rating of HRQL was much lower using the PGI than with the other measures. The PGI asks people to nominate areas where their health condition has affected their life. Although people can nominate areas that have been affected positively or negatively, most nominate negatively affected areas, or perceive that they are expected to nominate negative areas only. In fully standardized measures, everyone responds to the same set of questions, whether or not they apply to the person concerned. The scoring systems for the generic HRQL measures are based on utility weights assigned to each level, with higher utilities assigned to more optimal levels. A person who is unaffected in a given dimension is therefore scored at the best level of that dimension, so the total score will, by design, be higher than on a measure in which the person considers only those areas that negatively affect HRQL. The fact that the PGI focuses on the negative may make it a more clinically relevant measure. To quote Richard Smith, former editor of the British Medical Journal [45]:

“But what is health? For most doctors that’s an uninteresting question. Doctors are interested in disease, not health.”

As a research measure, the PGI’s much lower scores relative to the fully standardized measures could mean that there is more room for improvement, making it attractive as an evaluative measure, particularly for interventions that specifically target HRQL [46]. Others, however, suggest that patient-generated outcome measures would be better used as complementary rather than primary measures, to help guide treatment, inform the content of new measures, and assess the content validity of existing measures [47].

Second, the PGI did not correlate very strongly with the fully standardized measures, which may be due to differences in content and to how the points assigned to the areas nominated are used to calculate the total score. Figure 1 shows the content discrepancy clearly. Overall, the fully standardized measures best cover the participation domain, as it is common across health conditions, but do not tap the specific impairments and activity limitations unique to each health condition. These components, rather than the downstream outcomes of participation, are the targets of treatment, which could limit the responsiveness of the generic measures to rehabilitation or cognitive behavioral interventions. Likewise, because fully standardized measures do not tap many impairment-level constructs, they may not be sensitive to the adverse effects of more curative interventions, such as drug therapies that can cause fatigue and cognitive impairment. Of note, the correlations between the PGI and other measures were lowest for the cancer group. Of the four groups studied here, people with cancer nominated some 114 different areas, two to six times more than people with the other conditions. This diversity, captured by the PGI, cannot be captured by the fully standardized measures and likely contributed to these low correlations.

Third, the areas nominated differed considerably across health conditions, whereas in the standardized measures everyone answers the same item set. Table 3 illustrates that a “one-size-fits-all” approach to HRQL assessment may not provide the most useful representation of this important construct. As shown in Table 1, the EQ-5D rating was very similar for people with stroke, MS, and cancer, despite quite different disease aetiologies, prognoses, and ages at onset. The EQ-5D value for HIV was much higher, as this group usually has no primary motor impairment causing difficulty with walking, a dimension covered by the EQ-5D but not the SF-6D. The SF-6D ratings were also very similar across MS, HIV, and cancer. While these ratings may help to allocate resources across these health conditions, they may be less useful in identifying the problems experienced by people with these differing conditions. In contrast, PGI scores were most similar between people with stroke and people with cancer (means 35 and 37), who have major deficits or are systemically ill, and were lower than those of people with MS or HIV (means 50 and 51), who are younger and have treatments that control the underlying disease processes.

Fourth, even within a health condition, there is a tremendous amount of heterogeneity in the areas that affect HRQL. For example, for stroke, the most prevalent area nominated was walking/mobility, but it was nominated by only 42% of the sample. For MS, work was the most prevalent, named by 62%; for cancer, it was fatigue, named by 39%. Interestingly, almost all people with HIV (97%) named some aspect of health as affecting HRQL. This was despite their rating their health higher than the other groups; however, in HIV, health is maintained through an intensive medication regimen and lifestyle considerations.

Lastly, these results suggest that having an interviewer assist with completing the PGI is more likely to yield usable scores. The most difficult part of the PGI is assigning points, and the scoring algorithm depends on all 12 points being assigned. For MS and HIV, where an interviewer assisted with completion, virtually all people filled out the PGI correctly. For stroke, which was a multicentre community-based trial, the protocol stipulated interviewer assistance for the PGI, but this could not be assured, which is likely why 11% did not complete it correctly. In the cancer study, some patients were left on their own to complete the questionnaires, which likely resulted in misunderstanding and incorrect use of the points. Nevertheless, despite these protocol violations, the PGI was completed correctly by virtually all people with MS and HIV, 89% of people with stroke, and 84% of people with cancer. In the absence of an interviewer, an electronic data capture system could be designed to force the use of 12 points; this would be an avenue for further research.
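As a sketch of the electronic data capture idea, a form could refuse submission until the point allocation is valid. The function `allocation_errors` and its messages are hypothetical; the sketch assumes whole-number points and the 12-point budget used in these studies.

```python
from typing import List, Sequence

POINT_BUDGET = 12  # assumption: the 12-point budget used in the studies reported here


def allocation_errors(points: Sequence[int], budget: int = POINT_BUDGET) -> List[str]:
    """Return messages describing problems with a point allocation (empty if valid)."""
    errors = []
    if any(p < 0 for p in points):
        errors.append("Points cannot be negative.")
    allocated = sum(points)
    if allocated != budget:
        errors.append(f"You have allocated {allocated} of {budget} points; "
                      f"please allocate exactly {budget}.")
    return errors


# A form would block submission while this list is non-empty.
print(allocation_errors([4, 2, 3, 1, 2, 0]))  # prints []
print(allocation_errors([3, 2, 2, 1, 0, 0]))
```

Returning a list of messages rather than raising an exception lets the form show every problem at once, so the respondent can correct the allocation in a single pass.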

What construct does the PGI represent? The instructions ask respondents to identify the most important areas of life that are affected by the health condition, and a sixth area is for all other non-health-related factors. This would situate the PGI as a QOL measure. There are actually few fully standardized QOL measures; the WHOQOL [48] and the Quality of Well-being Index (QWB) [49] come to mind as generic QOL measures, but of these only the QWB yields a single index. The SEIQoL [1] would qualify as an individualized QOL measure. Most measures, however, assess HRQL, and these can be generic or disease specific. Thus, the PGI would appear to be, uniquely, a disease-specific QOL measure. As each area is weighted by the individual’s preference for improvement, the total score is essentially a preference-weighted index. However, as the sixth area for non-health-related aspects rarely factors into the scoring of the PGI, dropping it would simplify completion and situate the PGI as a disease-specific HRQL measure [6].

The literature supports the clinical relevance of the PGI. It would also be a feasible HRQL technology for multilingual contexts: although the PGI instructions would need to be translated, they are informational and therefore more easily translated. The content of the measure is person-specific, and people identify their own concerns in their own language and their own words [50]. This removes a barrier often encountered when valid translations are not available for all conditions in all languages or language variants. Of course, mapping this content back to a standard nomenclature, to harmonize different words used to describe the same area or domain, is still needed. For this purpose, the standard nomenclature of the WHO’s ICF [34] is the “dictionary” we have found ideal, as our research is always done in two languages, English and Canadian French; the ICF is available in many languages, including Arabic, Chinese, Russian, German, and Spanish.

The PGI would also best be administered with some qualitative debriefing that asks, among other topics, whether the areas were favourably affected and, if the PGI is administered over time, why different areas were nominated. Ahmed et al. [51] reported on interviews with 46 people with stroke to identify why responses on the PGI changed over time and found that the reasons included response shift [52], forgetting to mention a previously nominated area, actual improvement, and major life events.

The results reported here arose from secondary analyses of existing data rather than from a study designed to compare the PGI across conditions. Thus, different disease-specific measures were used, and one generic measure (SF-6D) was not administered to one sample (stroke). In the future, a simplified version of the PGI that drops the sixth area could reduce difficulties with completion [6].

The clear advantage of an individualized measure is that it directly captures what matters to the person being interviewed. This could make it very attractive for use in clinical practice and, as it also provides a score from 0 to 100, in research as well. The score has meaning as a health index and the areas of personal concern would be important to consider for optimal management. It would be the ultimate “patient-centered outcome”.