Introduction

Health practitioners use client-centered approaches to improve an individual’s health (Lin et al. 2012b), and they need effective evaluations to assess the health condition. Thus, quality of life (QoL) measurement tools, which provide useful and subjective information (World Health Organization [WHO] 1993; Young et al. 2013), can yield an effective evaluation for healthcare practitioners. In addition, QoL measures have been developed for various populations, including children (Ravens-Sieberer et al. 2006; Roy et al. 2013). Children, however, may lack the cognitive skills necessary to accurately complete a QoL questionnaire; therefore, parent-proxy reports have been developed for some pediatric QoL instruments (Ravens-Sieberer and Bullinger 1998, 2000; Varni et al. 1999, 2001, 2008).

Some evidence has shown that even children younger than 8 years can use rating scales, understand response terms and the underlying concepts, and assess their own QoL (Cremeens et al. 2006a, b). Therefore, using child self-reports is better than using parent-proxy reports to measure a child’s QoL. However, even though a self-report is better than a proxy-report, a parent-proxy report provides valuable supplemental information for health practitioners. First, concurrent parent-proxy reports and child self-reports can be used to examine the construct validity of QoL instruments. Because QoL is multidimensional (WHO 1993), items on QoL questionnaires are supposed to be discriminant for different dimensions and convergent for the same dimension. The construct validity (i.e., discriminant and convergent validity) for the different and same dimensions can be examined by using two measuring methods (e.g., parent-proxy reports and child self-reports). The test for measuring the validity of these methods is called multitrait-multimethod (MTMM) analysis. Correlations of the same trait (the same dimension in QoL) between different methods are supposed to be higher than those of different traits (different dimensions in QoL) between different methods (Campbell and Fiske 1959). Furthermore, MTMM analysis that uses confirmatory factor analysis (CFA) models can provide thorough evidence of construct validity (Huang and Michael 2002; Marsh and Grayson 1995) and a quantitative description of the method effect for QoL instruments. Once an instrument has been validated, healthcare practitioners can use it to evaluate patients’ health conditions and assess the effectiveness of their interventions.

The other advantage of parent-proxy reports is that healthcare practitioners can know the parents’ perceptions of their children’s QoL, and can compare the agreement between parents and children (Eiser and Morse 2001; Lin et al. 2013b; Upton et al. 2008). After knowing the agreement, healthcare practitioners may help to improve the children’s QoL (Su et al. 2013). For example, when parents overrate their children’s QoL, healthcare practitioners can help parents understand their children’s QoL difficulties. Likewise, when parents underrate their children’s QoL, healthcare practitioners can help parents understand that their children are doing well and that they need not pressure their children to improve their QoL. In addition, parent-proxy reports can be the primary outcome measure when the child is too young or too ill to complete a self-report (Limbers et al. 2008), especially if agreement across parent-proxy reports and child self-reports is established. Therefore, knowing the relationships between parent- and child-reported questionnaires would be helpful for healthcare practitioners making clinical decisions.

The primary methods for comparing agreement between parents and children are correlation analyses, such as the Pearson correlation, Spearman correlation, and intraclass correlation (ICC). However, these correlation analyses measure only relative agreement; a high correlation reveals little information regarding agreement (Chen et al. 2007). In contrast, absolute agreement indices, including standard error of measurement (SEM) and smallest real difference (SRD), help healthcare practitioners understand the variances that are not caused by traits (Chen et al. 2007; Su et al. 2014). In other words, SEM and SRD provide information about the variance that comes from different methods or from measurement error. Moreover, SEM and SRD represent the agreement for a group of individuals and for a single individual, respectively. Therefore, SEM indicates how the ratings of a group of parents agree with their children’s ratings on QoL, and SRD indicates how the QoL rating of a parent agrees with that of his or her child. A lower SEM and a lower SRD indicate better agreement. In addition to SEM and SRD, the Bland-Altman method provides a visualization of parent–child agreement (Rankin and Stokes 1998) and helps healthcare practitioners understand the relationship between child- and parent-rated reports.

To the best of our knowledge, no studies have tested the psychometric properties of a commonly used generic QoL questionnaire (Pediatric Quality of Life Inventory Version 4.0 [PedsQL]) with MTMM analyses using CFA. In addition, absolute agreements have not been examined for the PedsQL. Because an instrument should be evaluated in several ways to establish its validity, this study focused on two purposes that correspond to psychometric development. The first purpose was to use two types of MTMM analysis, the traditional one and the one with CFA, to investigate the construct validity of the PedsQL. The second purpose was to assess both the relative and the absolute agreement across parent-proxy reports and child self-reports.

Methods

Sample

We used secondary data, for which the sampling and procedures have been reported elsewhere (Lin et al. 2013a). In brief, information about this study was disseminated by teachers of 11 elementary schools, and the children and parents interested in participating in this study signed informed consent forms. Those children who agreed to participate filled out the child self-report in their classrooms under the supervision of their teachers and one researcher. The parent-proxy reports were taken home by the children for one of their parents to complete, and the parent reports were collected from the children 1 to 3 days later. In the current study, we included only the 254 parent–child dyad questionnaires that had no missing values.

Quality of Life Questionnaire: Pediatric Quality of Life Inventory Version 4.0 (PedsQL)

The Chinese version of the short form PedsQL is a validated 15-item questionnaire for children aged 8 to 12 years (Lin et al. 2012a, 2013a). In addition, evidence supports the notion that comparisons between parents’ and children’s reports are meaningful, that is, that measurement equivalence has been established across parent reports and child reports of PedsQL (Lin et al. 2013a). Each item on the PedsQL asks the frequency of a difficulty’s occurrence. These frequencies are then transformed into a 0–100 scale (never = 100, almost never = 75, sometimes = 50, often = 25, and almost always = 0). Four subscale scores, two summary scores, and one total score are computed in parent- and child-reported PedsQL separately and according to the developer’s instructions (Chan et al. 2005; Varni et al. 2001). The physical summary score is the same as the physical subscale, and the psychosocial summary score is the average of the other three subscales. Higher scores represent better QoL for the corresponding subscales, summary scores, and total score.
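The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the developer's scoring program: the 0-to-4 raw coding and the item-to-subscale layout in `SUBSCALE_ITEMS` are assumptions made for the example, and the actual assignment should be taken from the developer's instructions. The psychosocial summary follows the description in this section (the average of the emotional, social, and school subscales).

```python
from statistics import mean

# Assumption: raw responses are coded 0 (never) to 4 (almost always).
RAW_TO_SCORE = {0: 100, 1: 75, 2: 50, 3: 25, 4: 0}

# Hypothetical item layout for a 15-item form, used for illustration only.
SUBSCALE_ITEMS = {
    "physical": [0, 1, 2, 3, 4],
    "emotional": [5, 6, 7, 8],
    "social": [9, 10, 11],
    "school": [12, 13, 14],
}


def score_pedsql(raw_responses):
    """Return subscale and summary scores on the 0-100 scale."""
    if len(raw_responses) != 15:
        raise ValueError("expected 15 item responses")
    # Reverse-transform each frequency rating so that higher = better QoL.
    transformed = [RAW_TO_SCORE[r] for r in raw_responses]
    scores = {
        name: mean(transformed[i] for i in items)
        for name, items in SUBSCALE_ITEMS.items()
    }
    # The physical summary equals the physical subscale; the psychosocial
    # summary is the average of the other three subscales.
    scores["physical_summary"] = scores["physical"]
    scores["psychosocial_summary"] = mean(
        [scores["emotional"], scores["social"], scores["school"]]
    )
    return scores


example = score_pedsql([0] * 5 + [4] * 4 + [2] * 3 + [2] * 3)
```

The same function would be applied separately to the parent-reported and the child-reported forms.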

Data Analysis

For MTMM analyses, the Pearson correlation coefficient, Cronbach’s α, and CFA models were used. An 8 × 8 Pearson correlation matrix was created, and the correlation effects were designated as small (.1 to .29), medium (.3 to .49), and large (≥.5) (Varni et al. 2001). A Cronbach’s α value >.7 is regarded as acceptable (Portney and Watkins 2000). In addition, five competing CFA models were used to explore the structure of the MTMM matrix and the effects of traits and methods on items. Model 1 is a four-oblique-trait-factor (physical, emotional, social, and school factors) model (Fig. 1a); Model 2 is a two-oblique-method-factor (parent- and child-reported PedsQL) model (Fig. 1b); and Model 3 is a one-general-factor (the overall QoL factor) and two-oblique-method-factor model (Fig. 1c). Model 4 is a four-oblique-trait-factor and two-method-factor model (Fig. 1d). Model 5 is a four-oblique-trait-factor and two-minus-one-method-factor model (Fig. 1e). We then compared Model 1 with Model 4 to test the method effects, compared Model 2 with Model 3 and Model 2 with Model 4 to examine convergent validity, and compared Model 3 with Model 4 to examine the discriminant validity between the hypothesized traits when the method effect is taken into account. Model 5 was used to compensate for one major limitation of our Model 4: its slightly insufficient degrees of freedom (df = 5 in our Model 4), which might lead to convergence problems. Although Model 5 contains only one method effect (the child reports in our case), it tests both trait and method effects, as Model 4 does. Because all the subscale scores can be regarded as normally distributed (all absolute skewness values < 3 and all kurtosis values < 10; Table 1) (Kline 1998) in both parent- and child-reported PedsQL, using maximum likelihood estimation for CFA was appropriate.

Fig. 1

Phy = physical; Emo = emotional; Soc = social; Sch = school; P = parent; C = children. Model 1: Four oblique trait-factors model with standardized coefficients. Model 2: Two oblique method-factors model with standardized coefficients. Model 3: One general factor and two oblique method-factors with standardized coefficients. Model 4: Four oblique trait-factors and two oblique method-factors model with standardized coefficients

Table 1 Descriptive statistics of parent- and child-reported PedsQL (N = 254)

In these competing CFA models, nine goodness-of-fit indices were used to evaluate how the data fit the hypothesized model. For the models, a χ² test that shows no significant difference between the hypothesized model and the data suggests a satisfactory fit. However, a χ² test is easily biased when the sample size is large; therefore, other indices are recommended instead (Hoyle and Panter 1995; Schweizer 2010). Consequently, we used χ²/df, the standardized root mean square residual (SRMR), the root mean square error of approximation (RMSEA), the goodness of fit index (GFI), the adjusted goodness of fit index (AGFI), the comparative fit index (CFI), the incremental fit index (IFI), and the Tucker-Lewis index (TLI) to examine the model fit. A χ²/df < 3, an SRMR and an RMSEA < .08, and other indices > .9 suggested a satisfactory data-model fit (Hoyle and Panter 1995; Schweizer 2010).
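As a minimal sketch of how these cut-offs could be applied to the output of a CFA program, the following Python function checks each index against the thresholds stated above. The function name and the numeric values in the example are hypothetical and are not taken from Table 3.

```python
def check_fit(chi2, df, srmr, rmsea, gfi, agfi, cfi, ifi, tli):
    """Return a dict mapping each criterion to True (met) or False (not met)."""
    return {
        "chi2/df < 3": chi2 / df < 3,
        "SRMR < .08": srmr < 0.08,
        "RMSEA < .08": rmsea < 0.08,
        "GFI > .9": gfi > 0.9,
        "AGFI > .9": agfi > 0.9,
        "CFI > .9": cfi > 0.9,
        "IFI > .9": ifi > 0.9,
        "TLI > .9": tli > 0.9,
    }


# Hypothetical index values, for illustration only.
good_model = check_fit(chi2=12.0, df=8, srmr=0.03, rmsea=0.04,
                       gfi=0.98, agfi=0.95, cfi=0.99, ifi=0.99, tli=0.98)
poor_model = check_fit(chi2=90.0, df=14, srmr=0.09, rmsea=0.15,
                       gfi=0.92, agfi=0.80, cfi=0.88, ifi=0.88, tli=0.76)

good_fits = all(good_model.values())
failed = [name for name, ok in poor_model.items() if not ok]
```

Listing the failed criteria, rather than returning a single pass/fail flag, makes it easier to see how many indices a competing model misses.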

Relative agreement was examined using ICC: >.75 was considered excellent; between .4 and .75, fair-to-good; and <.4, poor (Rosner 2006). Absolute agreement was examined using SEM (= standard deviation of the parent–child mean score × √[1 − ICC]) and SRD (= 1.96 × SEM × √2). In addition, SRD% (= SRD/total score range; SRD/100 in this study) was also calculated: an SRD% < 30 % was considered good agreement (Chen et al. 2009).
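The agreement indices defined above are simple to compute once the ICC and the standard deviation are known. The sketch below restates the formulas in Python; the function names and the input values (SD = 10, ICC = .75) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def classify_icc(icc):
    """Label relative agreement using the cut-offs stated above."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair-to-good"
    return "poor"


def absolute_agreement(sd, icc, score_range=100.0):
    """Return (SEM, SRD, SRD%) from the SD of the parent-child mean score."""
    sem = sd * math.sqrt(1 - icc)            # SEM = SD x sqrt(1 - ICC)
    srd = 1.96 * sem * math.sqrt(2)          # SRD = 1.96 x SEM x sqrt(2)
    srd_percent = srd / score_range * 100    # SRD relative to the score range
    return sem, srd, srd_percent


# Hypothetical inputs: SD = 10 on the 0-100 scale, ICC = .75.
sem, srd, srd_percent = absolute_agreement(sd=10.0, icc=0.75)
good_agreement = srd_percent < 30
```

With these inputs, SEM is 5 because √(1 − .75) = .5, and SRD is 1.96 × 5 × √2.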

Descriptive data, Cronbach’s α, MTMM using Pearson’s correlation, and ICC were analyzed using SPSS 16.0 (SPSS Inc., Chicago, IL, USA). MTMM combined with CFA models were analyzed using AMOS 7.0 (SPSS Inc.). SEM, SRD, and SRD% were computed using Microsoft Excel 2007.

Results

Participant Characteristics

The 254 children were between 9 and 12 years old (mean ± SD: 10.68 ± 1.18 years), and 120 of them were girls (47.2 %). Their mothers (n = 200) and fathers (n = 196) were aged 39.95 ± 5.29 and 42.69 ± 6.45 years, respectively. The proxy raters were mothers (n = 159), fathers (n = 82), others (e.g., grandparents) (n = 10), and anonymous (n = 3). The QoL scores are presented in Table 1.

Multi-Traits Multi-Methods of PedsQL

The correlation coefficients ranged from small-to-medium for monotrait-heteromethod (same QoL dimension in different methods, r = .25 to .46) and heterotrait-heteromethod values (different QoL dimensions in different methods, r = .11 to .32), and were from medium-to-large for heterotrait-monomethod values (different QoL dimensions in the same method, r = .31 to .58). To examine discriminant validity, Campbell and Fiske (1959) suggest comparing monotrait-heteromethod values both with heterotrait-heteromethod values and with heterotrait-monomethod values. Among 24 comparisons, 23 (95.8 %) monotrait-heteromethod values were higher than heterotrait-heteromethod values, but only 3 (12.5 %) values were higher than heterotrait-monomethod values. The Cronbach’s α values were all higher than .7 in both parent- (Physical: .77, Emotional: .84, Social: .81, and School: .80) and child-reported PedsQL (Physical: .73, Emotional: .83, Social: .83, and School: .74) (Table 2).
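The counting behind these comparisons can be sketched as follows, assuming one common reading of the Campbell and Fiske criteria: each monotrait-heteromethod value is compared with the heterotrait-heteromethod values in its row and column, and with the heterotrait-monomethod values that involve the same trait. The ordering of the matrix (four parent-reported subscales followed by four child-reported subscales) and the correlation values are hypothetical and are not those of Table 2.

```python
def campbell_fiske_counts(r, n_traits=4):
    """Count how often a monotrait-heteromethod value exceeds its comparators.

    r is a (2 * n_traits) x (2 * n_traits) correlation matrix ordered as
    method 1 traits followed by method 2 traits (an assumption of this sketch).
    Returns {comparison: (number of wins, number of comparisons)}.
    """
    hh_wins = hh_total = hm_wins = hm_total = 0
    for t in range(n_traits):
        validity = r[t][n_traits + t]  # same trait, different methods
        for u in range(n_traits):
            if u == t:
                continue
            # Different traits, different methods (same row, same column).
            for other in (r[t][n_traits + u], r[u][n_traits + t]):
                hh_total += 1
                hh_wins += validity > other
            # Different traits, same method (within each method's block).
            for other in (r[t][u], r[n_traits + t][n_traits + u]):
                hm_total += 1
                hm_wins += validity > other
    return {
        "vs_heterotrait_heteromethod": (hh_wins, hh_total),
        "vs_heterotrait_monomethod": (hm_wins, hm_total),
    }


def toy_matrix(n_traits=4, validity=0.4, hetero_hetero=0.2, hetero_mono=0.5):
    """Build a hypothetical MTMM matrix with constant values per cell type."""
    size = 2 * n_traits
    r = [[1.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            same_method = (i < n_traits) == (j < n_traits)
            same_trait = i % n_traits == j % n_traits
            if same_method:
                r[i][j] = hetero_mono
            elif same_trait:
                r[i][j] = validity
            else:
                r[i][j] = hetero_hetero
    return r


counts = campbell_fiske_counts(toy_matrix())
```

With four traits and two methods, each criterion yields 4 × 6 = 24 comparisons, which matches the number of comparisons reported above.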

Table 2 Multi-traits multi-methods correlation for PedsQL (N = 254)

Of the five competing models, Models 4 and 5 showed the best data-model fit: all of their indices met the criteria (Table 3). The other three models failed at least one criterion (Model 3) and as many as nine (Model 1). Figure 1 shows the standardized factor loadings and correlations of Models 1 to 4.

Table 3 Goodness of fit indices for competing models (N = 254)

We then compared Model 4 with the other four. The method effects, convergent validity, and discriminant validity were supported by the significant Δχ2 of Model 4 versus Model 1, versus Model 2, and versus Model 3 (Table 3). In addition, all the fit indices in Model 4 were better than those in Models 1 to 3. The child reports had much higher standardized coefficients for each factor than did the parent reports. Moreover, no significant difference was found between Models 4 and 5, which suggests that the two models are similar.

Agreement Between Parents and Children

The relative agreements of all the subscale scores (ICC = .452 to .621) and the total score (ICC = .555) between parents and children were fair-to-good, except for the physical subscale, which was only close to fair (ICC = .398) (Table 1). In addition, the absolute agreements showed that all the subscale scores (SRD% = 20.29–24.56) and the total score (SRD% = 17.88) were satisfactory, except for the emotional subscale, which was only close to acceptable (SRD% = 30.55). Moreover, the Bland-Altman figures showed rightward arrows in all the subscales and the total score, and most parent–child dyads fell within the 95 % limits of agreement (mean difference ± 1.96 SD: the horizontal lines immediately above and below the center line at 0.00) (Fig. 2).
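The quantities behind a Bland-Altman figure can be sketched as below, taking the difference as the parent-rated score minus the child-rated score and the limits of agreement as the mean difference ± 1.96 SD. The function name and the four dyad scores are hypothetical, and the use of the sample standard deviation is an assumption of this sketch.

```python
from statistics import mean, stdev


def bland_altman(parent_scores, child_scores):
    """Return the points and the 95 % limits of agreement for paired scores."""
    pairs = list(zip(parent_scores, child_scores))
    differences = [p - c for p, c in pairs]      # y-axis: parent minus child
    pair_means = [(p + c) / 2 for p, c in pairs]  # x-axis: mean of the pair
    bias = mean(differences)
    sd = stdev(differences)                       # sample SD (assumption)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    n_within = sum(lower <= d <= upper for d in differences)
    return {
        "means": pair_means,
        "differences": differences,
        "bias": bias,
        "limits": (lower, upper),
        "n_within": n_within,
    }


# Hypothetical scores for four parent-child dyads.
result = bland_altman([80, 70, 90, 60], [75, 75, 85, 65])
```

Plotting `differences` against `means`, with horizontal lines at the bias and at the two limits, gives the usual Bland-Altman display.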

Fig. 2

The Bland-Altman figures of parent- and child-rated PedsQL. Phy = physical; Emo = emotional; Soc = social; Sch = school; Psy = psychosocial; Total = total score; Mean = (parent-rated PedsQL score + child-rated PedsQL score)/2; Difference = parent-rated PedsQL score – child-rated PedsQL score

Discussion

Our findings support the construct validity of the Chinese version of the PedsQL and of the relationship between parent- and child-rated reports in a Taiwan sample. Method effects, discriminant validity, and convergent validity were evidenced by our competing CFA models. In addition, relative and absolute agreements between parents and children were acceptable in all of the QoL domain scores and in the total score.

Method Effects and the Construct Validity of the PedsQL

The MTMM matrix in this study is comparable to that in Varni et al. (2001), who reported medium correlation coefficients for the monotrait-heteromethod (r = .36 to .50) and heterotrait-monomethod values (r = .42 to .49), and small-to-medium correlation coefficients for heterotrait-heteromethod values (r = .17 to .33). All of their monotrait-heteromethod values were higher than all of their heterotrait-heteromethod values in 24 comparisons (100 %; our result: 95.8 %), and six of their monotrait-heteromethod values were higher than heterotrait-monomethod values in 24 comparisons (25 %; our result: 12.5 %). Although their results seem better than ours, both sets of results demonstrated the method effects of the PedsQL as evidenced by medium correlation in heterotrait-monomethod values and few monotrait-heteromethod values higher than heterotrait-monomethod values. Method factors rather than trait factors make observed items more correlated. We hypothesize that this method effect is caused by the halo effect (Thorndike 1920) in parents’ reports. In other words, when parents believe that their children have a good physical QoL, they also tend to think that their children have a good emotional, social, and school QoL. Thus, the halo effect raises the heterotrait-monomethod values above those of the monotrait-heteromethod values.

The CFA models in the present study demonstrated the method effects and supported the discriminant and convergent validity of the PedsQL. The PedsQL has been reported to be valid and reliable for measuring children’s QoL in various language versions (e.g., Chan et al. 2005; Huguet and Miró 2008; Kobayashi and Kamibeppu 2010; Upton et al. 2005), including the Chinese version used in Taiwan (Lin et al. 2012a, 2013a). However, although both parent-proxy reports and child self-reports are used, most of these studies examined the validity of the PedsQL separately for the two methods. To the best of our knowledge, only Varni et al. (2001) have examined the construct validity by using the two methods at the same time, i.e., by using traditional MTMM analysis, whose criteria for judging discriminant and convergent validity are known to have several disadvantages (Marsh and Grayson 1995). Campbell and Fiske (1959) said that the MTMM criteria should be viewed as “common-sense desideratum”, and Marsh and Grayson (1995) noted that the criteria were not a “summative evaluation” or “global summaries” of method effects, discriminant validity, and convergent validity. Therefore, the alternative approach to MTMM matrices, CFA models, is widely used because it provides a thorough examination of method effects and of discriminant and convergent validity (Marsh and Grayson 1995; Tomás et al. 2000). The present study extends the evidence of method effects, discriminant validity, and convergent validity for the PedsQL by using the CFA approach to MTMM methodology.

In addition, we found that the standardized coefficients of child self-report scores were higher than those of parent-proxy report scores. This suggests that items in a child self-report are better indicators of the traits we are interested in than are those in a parent-proxy report. Our findings support the notion of using child self-reports as the primary outcome for measuring children’s QoL (Limbers et al. 2008). We could also see that the two methods were correlated mainly because they share the same QoL traits. When the traits were not included in our model (see Model 2; Fig. 1b), the two methods were moderately correlated (r = .53). However, when the traits and methods were simultaneously taken into account, i.e., traits and methods were separately extracted (see Model 4; Fig. 1d), the correlation between the two methods was low (r = .12). Therefore, the correlations between child- and parent-rated PedsQL were mainly attributable to their shared QoL traits. Moreover, the best CFA model, that is, Model 4, showed unusually satisfactory fit indices (e.g., CFI = 1; IFI and TLI over 1; RMSEA = 0), and some may be skeptical of the results of Model 4. However, our Model 5, which compensates for the limitation of Model 4, showed acceptable fit indices (CFI, IFI, and TLI all < 1, and RMSEA > 0); hence, we are confident in the method effects and the discriminant and convergent validity that we demonstrated for the PedsQL.

Agreements Across Parents and Children

Upton et al. (2008) pointed out that of 10 studies using the PedsQL as the measurement, five found higher parent–child agreement for the physical part (i.e., concrete, observable characteristics), four others found higher levels for psychosocial domains, and one (Upton et al. 2005) found better agreement on the physical subscale only when the children were unhealthy; for healthy children, that study reported correlations of .2 in the physical domain and .27 to .42 in the psychosocial domains. Because our participants had no chronic illnesses (they were all healthy), our findings of better relative agreements in the psychosocial domain scores (ICC = .452 to .621, r = .30 to .46) than in the physical domain score (ICC = .398, r = .25) agree with the findings of Upton et al. (2005). Cremeens et al. (2006a, b) reported a similar trend in a healthy sample: ICC = .12 in the psychosocial domain and .02 in the physical domain. The relatively low ICC values in Cremeens et al.’s study as compared with ours may be due to their young participants (age 5.5 to 8.5 years). Using a stable physically disabled sample, Sheffler et al. (2009) also found higher ICC values in the psychosocial domains (.36 to .47) than in the physical domain (.34). The ICC values in the psychosocial domains (.31 to .74) also seemed to be higher than that in the physical domain (.34) in an overweight sample (Lin et al. 2013b). In addition, our finding of an acceptable SRD% was comparable to that of Upton et al. (2008), who found only small and insignificant differences between parent-proxy report scores and child self-report scores. Moreover, the Bland-Altman figures show that parent–child agreement was higher for children with a better QoL than for those with a worse QoL.

Clinical Implications

Our findings suggest several implications for health practitioners. First, it is feasible to use the PedsQL to rate children’s QoL and to evaluate their QoL difficulties. Because we found that the factor structures of the PedsQL’s parent-proxy reports and child self-reports are consistent, we suggest using both at the same time if both can be applied. In addition, parents and children should not fill out their questionnaires together because doing so may increase their agreement (Upton et al. 2008). We recommend taking the child-rated QoL as the primary QoL outcome and the parent-rated QoL as the secondary outcome. Second, comparing the difference between parent- and child-rated QoL can also provide health practitioners with information that they can use to understand whether parents know their children’s QoL difficulties, and that should suggest whether they ought to intervene when, for example, a large difference is found. Third, when the child is unable to fill out the PedsQL, health practitioners can use the parent-proxy report to evaluate the child’s QoL, but the discrepancy between the two forms should be taken into account. Although the parent-proxy report might give health practitioners some insight into the child’s QoL, the parent-rated QoL cannot completely substitute for the child-rated QoL. For example, our results showed that parents tended to overestimate their children’s QoL in the Physical, Emotional, and Social domains and to underestimate it in the School domain. This overestimation and underestimation should be considered when a parent-proxy report is used to estimate a child’s QoL. Moreover, if a parent rates a low QoL for his or her child, health practitioners could be alerted that the child may have some difficulties. However, in that case health practitioners should additionally and thoroughly examine the child instead of treating the child mainly on the basis of the parent-reported QoL.

Limitations and Future Directions

This study had some limitations. First, convenience sampling was used to recruit the participants (Lin et al. 2013a), which may limit the generalizability of our results. Second, the parents who participated in this study may pay more attention to their children than those who did not participate; thus, the results of discrepancies and agreements in this study may be limited. Because all the parents knew that the PedsQL was an instrument that would help them understand their children’s health, we assume that the parents who completed the questionnaire may care more about their children than those who did not. It is impossible for us to know the ratings of parents who did not participate. However, when we compared the child-reported PedsQL scores of children whose parents filled out parent-proxy reports with those of children whose parents did not, we found no significant differences. Therefore, parents who did not complete the parent-proxy report likely had little or no effect on our results. Third, different kinds of parent-proxy ratings (from fathers, mothers, grandparents, and other caregivers) were used in this study. Jozefiak et al. (2008) reported only moderate agreement between mother-rated and father-rated proxy reports. In addition, they pointed out that their findings may be applicable only for countries with a gender-role structure similar to those in Sweden and Norway. Therefore, further studies are needed on the differences between the proxy ratings given by different types of raters of children’s QoL. Fourth, we used only two methods (i.e., parent- and child-reported PedsQL) to perform MTMM, while three or more methods are recommended to strongly support the convergent and discriminant validity (Campbell and Fiske 1959; Kenny and Kashy 1992; Marsh and Bailey 1991). Therefore, a third method is suggested for future studies to examine the validity of the PedsQL. For example, a teacher-reported PedsQL might be useful.
Fifth, although the PedsQL can be used to assess children with chronic illness, this study did not recruit any children with a chronic illness. The discrepancy and agreement across parent-proxy reports and child self-reports may be different between children with and without a chronic illness. Therefore, future studies using the PedsQL to understand the discrepancy and agreement for children with a chronic illness are suggested. Moreover, because our study recruited participants only in Taiwan, the generalizability of our results may be confined to the following East Asian countries/cities, which share a similar culture or traditional values with our sample: Japan, South Korea, the People’s Republic of China, and Hong Kong. Parents in these countries/cities highly respect academic achievement, and most of the school-age children there go to cram schools after public or private school (Lin et al. 2012b). Moreover, these countries/cities are collectivist societies that emphasize a “we” mentality and group solidarity (Chun et al. 2005; Hofstede 2001; Triandis 1995). However, future studies are still needed to corroborate our results.