Introduction

Over the past 15 years, several validated health-related quality of life (HRQoL) instruments have been developed for child self-report. The availability of companion measures with parallel content for proxy raters, especially parents, has allowed researchers to explore the agreement between raters. Research to date has demonstrated considerable variability in inter-rater concordance by child age, gender, diagnosis, duration of illness, treatment status, and HRQoL domain [1, 3, 7, 9, 10, 14, 27]. Much of this research has been cross-sectional in design, leaving largely unanswered how and why patient–parent agreement changes over time.

The prevailing wisdom about the role of self-report versus proxy report has shifted dramatically over the past two decades. Prior to the recent proliferation of new measures and validation studies that have accompanied their release, children, especially before adolescence, were thought to lack the cognitive/developmental skills to provide valid and reliable self-report. Several emerging instruments have directly addressed these developmental limitations through altered questionnaire design, pictorial response scales, and item wording. Field-testing of these instruments has demonstrated the ability of the majority of children, beginning in the latency period of development, to rate their own HRQoL. Increasingly, there has been a shift to relying on children as the primary respondents of their own HRQoL with recognition that there are situations in which self-report is not possible (age, illness, and impairment) [6, 8, 18, 22].

Despite the differences between parent and child ratings of the child’s HRQoL, parent proxy reports do play a role in specific settings. A recent article by Upton et al. [19] presents two very different uses of parental ratings: first, to enrich our understanding of the child’s HRQoL by adding another perspective, and second, to substitute for child report in instances where the child is unable or unavailable to provide self-report. In the first use, the concordance between raters is less important, whereas in the second, the degree of agreement between the raters is of clinical importance.

In this study, we explore the relationship between child self-report and parent proxy report of HRQoL by domain, using the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 Generic Core Scales in a cohort of children newly diagnosed with cancer. Capitalizing on the rich database available through the previously conducted supportive care trial, involving 222 parent–child pairs, we describe the longitudinal nature of parent and child ratings of the child’s HRQoL over the first 16 weeks following diagnosis. Specifically, we examine how age, intensity of treatment, and time on study affect the relationship of parent and child report for each of the domains within the measure. In addition to the expected variation in ratings by child age and domain, as previously reported, we hypothesized that concordance between parent and child would vary by treatment intensity and by time. Specifically, we anticipated that there would be better agreement among parent–child dyads receiving more intense therapy and that agreement would increase over time as parent and child spent more time together in a “shared reality.”

Patients and methods

Summary of original efficacy trial

A double-blind, placebo-controlled study was conducted at 27 sites in the United States to test the tolerability and effects of once-weekly intravenous epoetin alfa (EPO) on patient-reported HRQoL and clinical outcomes in anemic pediatric patients receiving myelosuppressive treatment for cancer [17]. The study sample was stratified by tumor type (solid tumor/Hodgkin’s disease or ALL/NHL). Patients were randomly assigned to treatment arms and were followed every 3–4 weeks (depending on their chemotherapy schedule) with a final visit at Week 16. Patients and their parents reported on the patients’ HRQoL at baseline, Week 4 or 5, 7, 9 or 10, 13 and Week 16, using the age-appropriate forms of the PedsQL™ 4.0 Generic Core Scales and PedsQL™ 3.0 Cancer Module (patient-report only). The primary study endpoint was mean change from baseline to Week 16 in patient-reported PedsQL™ 4.0 Generic Core Scales. Enrollment goals to achieve 80% power at α = .05 were achieved with 113 patients randomly assigned to receive EPO and 111 assigned to receive placebo. Patients did not differ at baseline demographically or in terms of their HRQoL scores; additionally, no significant differences in HRQoL were observed between the two groups at Week 16. Because HRQoL scores were similar between the two treatment groups, data from the two treatment arms were pooled for the analysis of observer agreement in the current study.

Study measure

The primary outcome measure is the age- and rater-specific version of the PedsQL™ 4.0, a multidimensional health profile measure of HRQoL. The PedsQL™ 4.0 contains 23 items forming four principal domains including physical functioning (8 items), emotional functioning (5 items), school functioning (5 items), and social functioning (5 items). The Acute Version of the PedsQL™ 4.0 has a reference period of 1 week and uses a Likert-type response scale for each item. Higher scores connote better functioning. Although summary subscales (physical and psychosocial functioning) and total scores can be generated, we elected to rely solely on the domain scores, based on the study’s goals. This instrument has been shown to have acceptable internal consistency, known groups, and construct validity estimates, principally derived from cross-sectional samples [22–25]. There has been limited use of this instrument to assess child and parent agreement over time [7, 9].

Intensity of treatment

Each diagnosis and related treatment was reviewed by a pediatric oncologist and pediatric oncology nurse to establish criteria for designating levels of intensity of anti-cancer treatment. Initially, three levels of intensity (low, moderate, and high) were created based on patient diagnosis and stage, and type and duration of treatment. Low intensity included a short course of chemotherapy. Mid-range intensity included treatment up to 1 year that was considered to be of moderate impact and toxicity. High intensity included multimodal therapy of greater than 6 months in duration, which was considered to be of greater intensity than the other two intensity levels. The two oncology specialists independently reviewed each diagnosis in addition to consulting with a pediatric oncologist at each of their respective settings. Consensus review followed, resulting in complete agreement between the two original pediatric oncology specialists.

Using these criteria, 12 patients were categorized as low intensity, 59 as mid-range intensity, and 148 as high intensity. Subsequently, for the purposes of analysis, the low and mid-range intensity categories were combined as non-high. In total, 66.7% of patients were categorized as high intensity and 33.3% of patients as non-high intensity.

Data completeness

The data set was examined for completeness. For missing items within each of the four domains of the PedsQL™ 4.0, guidelines established by the instrument’s author [20], based on the “50% rule”, were used for imputation of the domain scores. While domain scores were missing for <1% of respondents (both raters) for the domains of physical, emotional, and social functioning (data not shown), overall missingness for school functioning exceeded 20% (both raters) at baseline. Given the high level of missingness within this domain, patterns of missingness within the school functioning domain were also examined by age group, treatment intensity, and visit (time interval).
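The “50% rule” can be illustrated with a minimal Python sketch. It assumes the commonly published PedsQL™ scoring convention (raw item responses of 0–4 are reverse-scored to 100, 75, 50, 25, 0, and a domain score is the mean of the answered items unless more than half of the items are missing). The function name `domain_score` is hypothetical, and this is not the code used in the study.

```python
from typing import Optional, Sequence


def domain_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Score one domain from raw 0-4 item responses (None = missing item).

    Assumed convention: reverse-score each answered item to a 0-100 scale
    and average the answered items; return None when more than 50% of the
    items in the domain are missing.
    """
    answered = [r for r in items if r is not None]
    # "50% rule": no domain score if fewer than half of the items were answered
    if not answered or len(answered) * 2 < len(items):
        return None
    # Reverse-score and rescale: 0 -> 100, 1 -> 75, 2 -> 50, 3 -> 25, 4 -> 0
    transformed = [100 - 25 * r for r in answered]
    return sum(transformed) / len(transformed)
```

Under this rule a 5-item domain with two missing items is still scored from the remaining three, whereas one with three missing items is left missing.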

Within-respondent measures

Internal consistency reliability was measured using Cronbach’s alpha [2] and examined for each domain of HRQoL by observer, age group, treatment intensity, and time interval. The criterion level of ≥0.70 was used to evaluate the adequacy of the internal consistency estimates [15].
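For readers unfamiliar with the statistic, the standard formula for Cronbach’s alpha can be sketched in a few lines of Python. The sketch assumes a complete respondents-by-items table with no missing values; the function name `cronbach_alpha` and the example data are illustrative only and are not taken from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items table (no missing values).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])  # number of items in the domain
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Illustrative (made-up) responses: three respondents, two items
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
meets_criterion = alpha >= 0.70  # criterion level used in this study
```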

Agreement between parent and child

To assess the extent of agreement between the child and parent assessments, we examined three components of the variation of each HRQoL domain using the model \( Y_{oit} = \mu_{ot} + \delta_{oi} + \varepsilon_{oit} \), where i indicates the ith child, o indicates whether the observer is the child (c) or parent (p), and t indicates the tth assessment. The first component assesses observer bias and is measured as the difference between the child and parent group means averaged over the assessments, \( \sum_{t=1}^{T} (\hat{\mu}_{ct} - \hat{\mu}_{pt})/T \). The second component assesses the agreement between the child and parent assessments of the individual child’s status relative to the group average, that is, the between-subject variation, \( \delta_{oi} \). This second component is measured by the correlation of the child and parent deviations, \( \rho(\delta_{ci}, \delta_{pi}) \). Strong correlations of the between-subject variation would indicate that both observers consistently identified the child as having higher or lower scores than the “average” child. The third component assesses the agreement with respect to within-subject changes between assessments, \( \varepsilon_{oit} \), and is measured by the correlation of the child and parent deviations, \( \rho(\varepsilon_{cit}, \varepsilon_{pit}) \). The intraclass correlation (ICC), a measure of the correlation of scores within observer across time, was also calculated. The ICC is the ratio of the between-subject variation to the total variance. Mean differences and correlations were estimated using multivariate mixed-effect models. Standard errors for correlations were estimated using 500 bootstrap samples. Although there are no established criteria for the interpretation of such correlations, correlations of ≤0.29 are generally considered low, 0.3–0.6 moderate, and >0.6 strong [12]. All analyses were conducted using SAS version 9.1. Additional details are presented in the “Appendix”.
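The three components can be made concrete with a crude moment-based sketch in Python. This is not the multivariate mixed-effect model fitted in SAS for this study: it assumes complete, balanced data (every child and parent assessed at every time point), estimates each term of the model by simple averaging, and omits the bootstrap standard errors. The function names and the returned dictionary keys are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Scores = List[List[float]]  # scores[i][t]: child i, assessment t


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def decompose(scores: Scores) -> Tuple[List[float], List[float], List[float]]:
    """Split one observer's scores into mu_t, delta_i, and flattened eps_it."""
    n, n_times = len(scores), len(scores[0])
    # mu_t: group mean at each assessment
    mu = [mean(scores[i][t] for i in range(n)) for t in range(n_times)]
    centered = [[scores[i][t] - mu[t] for t in range(n_times)] for i in range(n)]
    # delta_i: each child's average deviation from the group mean
    delta = [mean(row) for row in centered]
    # eps_it: what remains after removing mu_t and delta_i
    eps = [centered[i][t] - delta[i] for i in range(n) for t in range(n_times)]
    return mu, delta, eps


def agreement(child: Scores, parent: Scores) -> Dict[str, float]:
    """Observer bias, between- and within-subject correlations, and ICCs."""
    mu_c, d_c, e_c = decompose(child)
    mu_p, d_p, e_p = decompose(parent)
    return {
        "bias": mean(c - p for c, p in zip(mu_c, mu_p)),
        "rho_between": pearson(d_c, d_p),
        "rho_within": pearson(e_c, e_p),
        # Crude ICC: between-subject variance over total variance
        "icc_child": pvariance(d_c) / (pvariance(d_c) + pvariance(e_c)),
        "icc_parent": pvariance(d_p) / (pvariance(d_p) + pvariance(e_p)),
    }
```

A positive `bias` means children rated their functioning higher than their parents did. Because the subject means still contain some within-subject noise, these moment estimates will generally differ from mixed-model estimates.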

Results

Study sample

All patients included in the efficacy analysis of the original trial (n = 222) were included in the analysis for the aims of this study. The majority of patients were between the ages of 13 and 18 years and Caucasian, and slightly more than half were boys (Table 1). There was a variety of diagnoses with acute lymphoblastic leukemia (ALL) being the most frequent. All subjects were on a regimen with a risk of chemotherapy-induced anemia with roughly two-thirds classified as having received a high-intensity regimen.

Table 1 Patient and disease characteristics (n = 222)

Missing assessments

For the domains of physical, emotional, and social functioning, low levels of missingness (≤1%) were observed at the item level, indicating that domain scores did not require imputation for either rater group or across age groups (data not shown). In contrast, high rates of missingness were noted in the domain of school functioning (Table 2). Younger children had higher rates of missingness than older children. Across all age groups and both raters, the highest rates of missingness were detected at Week 7.

Table 2 Percentage missing within school functioning domain by rater, age group, and treatment intensity

Internal consistency estimates

Internal consistency estimates varied by HRQoL domain, observer, and child age (Table 3). Overall, parental and child reliability was highest for physical functioning and lowest for social functioning. Among the child ratings of social functioning, internal consistency estimates were slightly below the criterion level of ≥0.70 across all age groups, treatment intensities, and time intervals. Similarly, the reliability estimates for emotional functioning and school functioning were approximately 0.6 for children in the 5- to 7-year age group. Otherwise, the domains for both child and parent observers met the criterion for acceptable internal consistency reliability.

Table 3 Internal consistency (Cronbach’s alpha)

Between- and within-rater agreement

Children consistently reported higher functioning than their parents (Table 4), with the largest overall differences occurring in emotional functioning (9.1 points) and the smallest differences in physical functioning (2.8 points). Differences in social and school functioning were 5.0 and 5.4 points, respectively. Differences varied by age, with the biggest differences occurring in the oldest children (13–17 years) for emotional, social, and school functioning and in the youngest children (5–7 years) for physical functioning. No differences were associated with the intensity of treatment. Differences between average child and parent reports of emotional and social functioning decreased as the study progressed (0–7 weeks vs. 9/10–16 weeks).

Table 4 Mean difference between parent and child

Agreement between the individual parent and child responses is summarized in Table 5. The correlation of the child and parent report of the between-subject variation for the entire group ranged from 0.61 (social functioning) to 0.86 (physical functioning). Within the subgroups defined by age, treatment intensity, and time on study, the correlations were similar with two exceptions. The agreement of parent and child was significantly lower for the youngest children for the measures of physical and school function.

Table 5 Between and within parent and child correlations

The correlation between child and parent in the within-subject variation in the entire group ranged from 0.27 (social functioning) to 0.37 (physical functioning). Again, the correlations were stable across age, treatment, and time, with the exception of lower correlations for adolescents in measures of social and emotional functioning.

Finally, the correlation of measures within-respondent across time (ICC) was slightly higher for child reports (0.53–0.67) than for parent reports (0.48–0.57). The ICCs were relatively constant across age, treatment, and time, with the exception of lower ICCs for the youngest children in measures of physical, social, and school functioning.

Discussion

Notable strengths of this work include the large number of child/parent dyads reporting serially about the child’s HRQoL during the first 16 weeks of treatment, use of age-specific matched child/parent HRQoL instruments, and the multiple statistical strategies used to assess child/parent agreement within HRQoL domains. To our knowledge, this is among the few studies that have addressed child and parent agreement using the PedsQL™ 4.0 longitudinally in a large sample of children with newly diagnosed cancer.

The low rate of missingness of data and the high rate of completeness of child and parent reports increase the trustworthiness of findings with the exception of school functioning. Given higher rates of missingness in the school functioning domain for both child and parent, across child age groups and data assessment points, findings from this domain are interpreted with caution. The extent of missingness in this domain and the greater degree of missingness within certain age groups and diagnoses may imply that this domain is especially sensitive to treatment intensity and timing within the treatment course. Alternative explanations, such as school vacations or delayed school enrollment, are also plausible reasons for missing data in this domain, as previously reported [21]. Thus, it is likely that this degree of missingness in the school functioning domain will occur in other studies involving children in active treatment for cancer who are of school age, particularly when the acute version of the instrument is used. This argues for the collection of information about reasons for missingness.

The multiple ways in which we examined child/parent HRQoL agreement converge in their results to indicate that parent proxy reports of their children in treatment for cancer are reasonable estimates of the child’s reports.

The internal consistency estimates for the social functioning domain as reported by children across all age, treatment, and time subgroups are lower than needed to conclude that the reports are reliable. Similarly, we found suboptimum levels of reliability among young raters (5–7 years) in the domains of emotional and school functioning. This is an important issue when examining agreement between raters because agreement at the individual level is dependent upon the reliability of the instrument. The low internal consistency estimates in these domains likely contribute to the observed lower child/parent agreement, as compared to other domains. Of note, for the more concrete and observable domain of physical functioning, high reliability was noted across all subgroups, as expected, with high correlation between observers and small mean differences.

The relationship between ICC and low reliability was formally addressed by Verrips et al. [26], who demonstrated a method for correcting the ICC by dividing the observed ICC by the square root of the product of the raters’ reliabilities. The authors suggest that correction for attenuation can raise the ICC by 10–15%. While we do not present corrected ICCs for our study population, the corrections we observed were of similar magnitude.
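The correction described above amounts to a one-line calculation, sketched here in Python. The function name `disattenuated_icc` is hypothetical, and the input values are purely illustrative; they are not estimates from this study or from Verrips et al.

```python
from math import sqrt


def disattenuated_icc(observed_icc: float,
                      reliability_child: float,
                      reliability_parent: float) -> float:
    """Correct an observed ICC for attenuation due to imperfect reliability.

    Divides the observed ICC by the square root of the product of the two
    raters' reliability estimates (for example, their Cronbach's alphas).
    """
    return observed_icc / sqrt(reliability_child * reliability_parent)


# Illustrative (made-up) inputs: observed ICC 0.60, reliabilities 0.85 and 0.90
corrected = disattenuated_icc(0.60, 0.85, 0.90)
relative_increase = corrected / 0.60 - 1  # proportional rise after correction
```

With these illustrative inputs the corrected value is roughly 0.69, a relative increase of about 14%. The lower the reliabilities, the larger the correction, which is why domains with low internal consistency can show markedly understated agreement.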

As reported in previous studies involving children with different illnesses and their parents reporting on the child’s HRQoL [6, 27, 28], children in our study consistently reported higher HRQoL functioning than did their parents, especially in the domain of emotional functioning. We attribute this consistent difference to a rating bias. Parent ratings of HRQoL reported in other studies indicate that parents over- or underestimate their child’s HRQoL compared to the child’s ratings, with particular discrepancies reported in emotion-focused items [5, 6, 10, 11, 14, 28]. The rating differences in this domain appear to become more discordant with adolescents as compared to younger children [1]. Our findings also include discrepancies in social and school functioning ratings between young children (ages 5–7 years) and parents. In addition to the potential contribution of lower reliability among these young raters, a recently proposed explanation of differences between child and parent HRQoL ratings is that children tend to base their HRQoL reports on a recent event, whereas parents tend to base their reports on multiple events [4]. It is likely that information about the child’s clinical status during treatment, often shared preferentially with the parent and not directly with the child, influences parent HRQoL ratings. In previous studies, parent HRQoL ratings tended to be more highly associated with clinical indicators of the child’s condition [3, 13, 16], a finding that has led to the suggestion that parent reports may be more accurate (in terms of disease status) than the child’s self-reported HRQoL ratings. However, while HRQoL measures should be correlated with disease status, they are not a measure of disease status, and these data do not confirm that the child’s self-report is a less accurate measure of the impact of the disease and its treatment.

Importantly, our findings indicate that the size of the difference between child/parent ratings decreases over time and assessment points in three of the domains. This may indicate that a training effect is reflected in the serial completion of HRQoL assessments. Alternatively, the increasing alignment of the child and parent ratings suggests that the experience of the cancer treatment becomes a more shared experience between child and parent over time. From a measurement perspective, the latter interpretation supports the role of a parent proxy in reporting on their child’s HRQoL during treatment for cancer when the child is unable to report. There are important developmental implications to these findings, particularly in adolescence, when creation of separate identity and perception is the norm. The illness experience may thrust the dyad into increased closeness (Eiser, personal communication, 1998).

Previous research studies emphasizing the differences between child and parent reports were cross-sectional or measured HRQoL ratings at a single time point. Our findings of decreasing differences in agreement indicate that conclusions about child/parent HRQoL agreement should not be based on a single measurement point or even on the first of a series of measurement points. Moreover, our findings point to the need for further research about parent–child agreement over time, extending beyond the acute treatment phase.

Conclusion

The moderate-to-strong agreement between child and parent ratings, particularly in the domains of physical and emotional functioning, supports the usefulness of parent proxy reports during a child’s initial treatment for cancer. Interestingly, the agreement improved in these two domains quickly (by Week 9/10 of therapy). Although agreement levels may not support replacing child reports with parent reports, they do indicate the value of the parent proxy reports as ancillary data that help to interpret the child’s illness experience. We recommend collecting HRQoL data from both the child and the parent proxy across the course of treatment for a child with a life-threatening illness.