Introduction

Children are uniquely positioned to self-report their perspectives on their health and well-being through their perceptions of their health-related quality of life (HRQOL) outcomes. The last 15 years have seen a significant increase in the development and utilization of pediatric HRQOL and symptom-specific measures in an effort to improve pediatric patient health and determine the value of health care services [1]. Although the measurement of pediatric self-reported HRQOL in clinical trials has been advocated for a number of years [2], the emerging paradigm shift toward patient-reported outcomes (PROs) has provided the opportunity to further emphasize the value of child self-report HRQOL measures as efficacy outcomes in clinical trials [3, 4].

Age groups and health-related quality of life measurement

Recent US Food and Drug Administration guidelines recommend that instrument development and validation testing for children and adolescents be conducted within fairly narrow age groupings, and that the lower age limit at which children can provide reliable and valid responses that can be compared across age categories be determined [4]. Consistent with these recommendations, it has been an explicit goal of the PedsQL™ Measurement Model to develop and test brief measures for the broadest age group empirically feasible, specifically including pediatric patient self-report for the youngest children possible [1, 5]. The PedsQL™ scales include child self-report for ages 5–18 and parent proxy-report for ages 2–18 [6]. The items chosen for inclusion were initially derived from the measurement properties of the child self-report scales, while the parent proxy-report scales were constructed to directly parallel the child self-report items. Thus, the development and testing of the PedsQL™ as a pediatric PRO explicitly emphasizes the child’s perceptions, including children as young as 5 years of age [7], and consequently the PedsQL™ serves as an age-appropriate instrument to test the lower age limits achievable for factorial invariance of child self-report.

Gender differences and health-related quality of life measurement

Gender differences in health outcomes have been extensively documented in children and adolescents [8]. In children, adolescents, and young adults, gender differences in self-reported HRQOL have been demonstrated irrespective of the instrument utilized [9–13]. However, in order to have greater confidence that an HRQOL instrument is measuring the same constructs across gender groups (i.e., that the items have the same meaning for boys and girls), it is essential to demonstrate measurement invariance across gender [14, 15].

Factorial invariance

Generic HRQOL and symptom-specific instruments enable comparisons across diverse pediatric populations [5, 16]. In order for these comparisons to be valid, items on such an instrument must have equivalent meaning across the subpopulations being compared [17, 18], that is, they must demonstrate factorial invariance [19]. Multigroup confirmatory factor analysis (MG-CFA) is one method used to assess these levels of factorial invariance across groups [17, 19]. To the degree that the components of the factor model (i.e., factor patterns, intercepts, and covariances) are determined to be equal across subpopulations, factorial invariance of an instrument can be inferred [17].

While the use of MG-CFA for invariance testing has grown substantially in recent years, relatively few studies have examined the factorial invariance of symptom-specific measures, particularly in pediatric populations. Further, most of these studies have focused on establishing configural and metric invariance, ignoring higher levels of invariance that assess group differences in item-specific intercepts and residual variances (i.e., strong/scalar or strict invariance) [19]. Without establishing stricter levels of invariance, age and gender differences in scores on health outcome instruments may be confounded by differences in what the instrument is measuring among the groups [20]. Table 1 shows the levels of invariance, the constraints imposed at each level, and the between-group comparisons allowed if that invariance level is tenable.

Table 1 Levels of measurement invariance from least to most restrictive

Recently, studies have demonstrated the factorial invariance of the PedsQL™ 4.0 Generic Core Scales longitudinally and across a number of subpopulations [21–26], including age and gender [27, 28]. However, the factorial invariance of the fatigue construct utilizing the PedsQL™ Multidimensional Fatigue Scale across age and gender has not been previously reported.

The PedsQL™ Multidimensional Fatigue Scale

The PedsQL™ Multidimensional Fatigue Scale was designed as a generic symptom-specific instrument to measure fatigue across pediatric populations [29, 30]. PedsQL™ Multidimensional Fatigue Scale scores have previously demonstrated good to excellent self-report reliability and validity across a number of pediatric chronic health conditions, with significant correlations with generic HRQOL (more fatigue symptoms associated with lower generic HRQOL) [29–39], and the instrument is supported by an expanding number of international translations and studies with young adult and adult patients [40–43]. However, we are not aware of a study that has used the MG-CFA framework to examine the factorial invariance of the PedsQL™ Multidimensional Fatigue Scale; this framework may have significant utility as a statistical method for international cross-cultural assessment research in which different age, gender, and other subpopulations are compared within and across countries.

Consequently, the objective of the present study was to examine the factorial invariance of child self-reported multidimensional fatigue across three age groups and across gender, utilizing the MG-CFA framework with the PedsQL™ Multidimensional Fatigue Scale.

Methods

Participants and settings

The sample comprised item-level data from previously published child self-reports (n = 837) [29, 30, 33, 35–37]. Participants were recruited from general pediatric clinics, subspecialty clinics, and hospitals in which children were being seen for well-child checks, mild acute illness, or chronic illness care. Participants were assessed in person or by telephone. For in-person administration, research assistants obtained written parental informed consent and child assent. Paper-and-pencil surveys were self-administered for children aged 8–18 and interview-administered for children aged 5–7 (and also in situations where the child was unable to read or write as a consequence of either physical or cognitive impairment). For telephone administration, parents of children aged 5–18 were called by a research assistant who explained the study and obtained verbal parental informed consent and child assent. The research assistant verbally administered the PedsQL™ Multidimensional Fatigue Scale to the child. If the child was not home at the time of the initial call, the research assistant arranged to call at another time. These research protocols were approved by all appropriate Institutional Review Boards (IRBs).

The average age of the 379 girls (45.3 %) and 377 boys (45.0 %) was 12.32 ± 4.98 years. Gender data were missing for 81 participants (9.7 %). With regard to race/ethnicity, the sample contained 308 (36.8 %) White non-Hispanic, 195 (23.3 %) Hispanic, 58 (6.9 %) Black non-Hispanic, 25 (3.0 %) Asian/Pacific Islander, 71 (8.5 %) American Indian or Alaskan Native, and 40 (4.8 %) other. Race/ethnicity data were missing for 140 participants (16.7 %). Seventy-five percent of the sample had a chronic health condition.

Missing survey data

On the 18-item PedsQL™ Multidimensional Fatigue Scale, one or more item responses were missing for 38 (5 %) of the respondents. Of those 38, 28 (74 %) were missing only one response, 7 (18 %) were missing only two responses, 1 (3 %) was missing four responses, 1 (3 %) was missing eight responses, and 1 (3 %) was missing all responses. The respondent with missing values for all items was removed from the analysis, and the remaining missing data were handled by using full information maximum likelihood (FIML) estimation [44].
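As a minimal sketch of this step (assuming a data frame named mfs with hypothetical item columns gf1–gf6, sr1–sr6, and cf1–cf6 rather than the actual PedsQL™ variable labels), the all-missing respondent is dropped and the remaining missingness is left to FIML during model estimation:

## Hypothetical item column names; the actual PedsQL(TM) MFS labels differ
item_cols <- c(paste0("gf", 1:6), paste0("sr", 1:6), paste0("cf", 1:6))

n_missing <- rowSums(is.na(mfs[, item_cols]))   # missing MFS items per respondent
table(n_missing)

## drop the respondent who answered none of the items; FIML (missing = "ml" in
## the lavaan calls below) handles the remaining item-level missingness
mfs <- mfs[n_missing < length(item_cols), ]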

Measures

The PedsQL™ Multidimensional Fatigue Scale

The PedsQL™ Multidimensional Fatigue Scale (MFS) is an 18-item instrument encompassing three scales: (1) General Fatigue (6 items, e.g., “I feel tired.”; “I feel too tired to do things that I like to do.”), (2) Sleep/Rest Fatigue (6 items, e.g., “I feel tired when I wake up in the morning.”; “I rest a lot.”), and (3) Cognitive Fatigue (6 items, e.g., “It is hard for me to keep my attention on things.”; “It is hard for me to remember what people tell me.”). The PedsQL™ MFS was developed based on research and clinical experience in pediatric chronic health conditions and on the instrument development literature [45–47]; development consisted of a review of the extant literature on fatigue in both adult and pediatric patients, patient and parent focus groups and individual focus interviews, item generation, cognitive interviewing, pretesting, and subsequent field testing of the new measurement instrument [29, 30]. In addition to the original studies [29, 30], subsequent studies have continued to support the reliability and validity of the PedsQL™ MFS scores for use with children and adolescents diagnosed with multiple chronic health conditions [29–36, 39, 42, 43, 48, 49].

The format, instructions, response scale, and scoring method are identical to the PedsQL™ 4.0 Generic Core Scales, with higher scores indicating fewer fatigue symptoms. The instructions ask how much of a problem each item has been during either the past month or the past 7 days. The PedsQL™ MFS has two forms, a child self-report and a parent proxy-report. There are three separate child and adolescent self-report forms: ages 5–7, 8–12, and 13–18 years. Only child self-report was utilized in the present study. The self-report forms were designed to be parallel, with the items differing only in reading level. A 5-point Likert-type response scale is utilized for the 8–12- and 13–18-year-old forms (0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; 4 = almost always a problem). The 5–7-year-old form was simplified to a 3-point scale to increase ease of use (0 = not at all a problem; 2 = sometimes a problem; 4 = a lot of a problem), with each response choice anchored to a happy-to-sad faces scale. Items are reverse-scored and linearly transformed to a 0–100 scale (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0), so that higher scores indicate better HRQOL (i.e., fewer symptoms of fatigue). Scale scores are computed as the sum of the item scores divided by the number of answered items. If more than 50 % of the items in a scale are missing, the scale score is not computed [50].
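The scoring rule can be sketched as follows; this is an illustration using the hypothetical item names introduced above, not the official PedsQL™ scoring program:

## Reverse-score, transform to 0-100, and average the answered items; return NA
## when more than half of the items in the scale are missing (a sketch)
score_mfs_scale <- function(items) {
  transformed <- 100 - 25 * items                      # 0->100, 1->75, ..., 4->0
  n_answered  <- rowSums(!is.na(transformed))
  scores      <- rowMeans(transformed, na.rm = TRUE)   # mean of answered items
  scores[n_answered < ncol(items) / 2] <- NA           # >50 % missing: no score
  scores
}

## e.g., General Fatigue Scale score from the six hypothetical gf items
mfs$general_fatigue <- score_mfs_scale(mfs[, paste0("gf", 1:6)])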

PedsQL™ Family Information Form

The PedsQL™ Family Information Form [6] is a demographic questionnaire for parents to complete that asks about the child’s date of birth, gender, and race/ethnicity.

Statistical analysis

Multiple group factor analysis

The purpose of this study was to examine the invariance of the PedsQL™ MFS items for the self-report forms across both age and gender. To assess invariance, we used an MG-CFA approach, which assesses the invariance of measurement parameters (e.g., factor patterns) across two or more groups by using a series of increasingly stringent, nested models (see Table 1) [51]. We tested model fit with a bifactor model of the items [52–54], which posits one general factor influencing all the items and three domain-specific factors representing the General, Sleep/Rest, and Cognitive Fatigue domains.
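A minimal lavaan sketch of this bifactor specification, using the hypothetical item names assumed earlier (not the actual PedsQL™ variable labels):

library(lavaan)

## Bifactor model: one general fatigue factor plus three orthogonal
## domain-specific factors (a sketch with hypothetical item names)
bifactor_model <- '
  G   =~ gf1 + gf2 + gf3 + gf4 + gf5 + gf6 +
         sr1 + sr2 + sr3 + sr4 + sr5 + sr6 +
         cf1 + cf2 + cf3 + cf4 + cf5 + cf6
  GEN =~ gf1 + gf2 + gf3 + gf4 + gf5 + gf6
  SLP =~ sr1 + sr2 + sr3 + sr4 + sr5 + sr6
  COG =~ cf1 + cf2 + cf3 + cf4 + cf5 + cf6
'

fit_bifactor <- cfa(bifactor_model, data = mfs,
                    orthogonal = TRUE,   # specific factors uncorrelated with G and each other
                    estimator  = "MLR",  # robust ML (see "Analyzing item-level data")
                    missing    = "ml")   # FIML for item-level missingness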

Researchers [20, 55] suggest two sets of criteria when testing for factorial invariance. The first (“traditional perspective”) examines the change in chi-square values (Δ χ2) across nested models. If, as the models grow more restrictive, the Δ χ2 values do not “significantly” change (using a given α level), this is evidence that a more restrictive model fits the data as well as the less restrictive model; thus, the more restrictive (i.e., more parsimonious) model should be favored over the less restrictive one.

The use of Δ χ2 values has been criticized because of its sensitivity to sample size [56]. Recently, Cheung and Rensvold [56] and Meade et al. [57] have argued that some alternative fit indices are not prone to this problem. Specifically, they found that the Comparative Fit Index (CFI) [58] and McDonald’s [59] Noncentrality Index (NCI) were more robust across a variety of sample sizes. Thus, the second set of evaluation criteria (“practical perspective”) recommends that invariance be based on two criteria: (a) the multigroup factor model exhibits an adequate fit to the data and (b) the change in values for fit indices (e.g., Δ CFI, Δ NCI) is negligible.

Based on Byrne and Stewart’s [55] and Little’s [20] recommendations, this study used two sets of fit indices: one to assess overall model fit and the other to assess the change in model fit between two models. As Hu and Bentler [60] recommend, we used multiple fit indices for both. For overall model fit, we included the root mean square error of approximation (RMSEA) [61], Comparative Fit Index (CFI) [62], McDonald’s Noncentrality Index (NCI) [63], and the standardized root mean square residual (SRMR) [64]. These indices were chosen because they represent a variety of fit criteria and tend to perform well in evaluating different models [65]. For both overall model fit and change in model fit, we looked for patterns in the fit statistics and judged acceptance/rejection of a specific model based on the majority of the indices. For this study’s criteria of overall model-data fit, we used the following: (a) RMSEA ≤0.08 [61, 66]; (b) SRMR ≤0.08 [60, 67]; (c) CFI ≥0.96 [68]; and (d) NCI ≥0.90 [67, 69].
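For illustration, these cutoffs can be checked against a fitted lavaan model as sketched below; lavaan reports McDonald's NCI under the label mfi (robust variants of some indices are also available under MLR):

## Overall fit of the bifactor model against this study's cutoffs (a sketch)
fit_vals <- fitMeasures(fit_bifactor, c("rmsea", "srmr", "cfi", "mfi"))

c(rmsea_ok = unname(fit_vals["rmsea"]) <= 0.08,
  srmr_ok  = unname(fit_vals["srmr"])  <= 0.08,
  cfi_ok   = unname(fit_vals["cfi"])   >= 0.96,
  nci_ok   = unname(fit_vals["mfi"])   >= 0.90)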

To test the change in fit between nested models, we used the Δ CFI and Δ NCI. Cheung and Rensvold [56] suggested 0.01 as the threshold for Δ CFI and 0.02 as the threshold for Δ NCI. Meade et al. [57], however, suggested more restrictive values of 0.002 for Δ CFI and 0.007 for Δ NCI (based on having 3 factors and 18 indicators, p. 586) to maximize power. As this issue is not yet resolved, we considered both values for the Δ CFI and Δ NCI, with values less than Meade et al.’s [57] criteria indicating stronger evidence of invariance than values only meeting Cheung and Rensvold’s [56] criteria. All analyses were conducted in R [70] using the lavaan [71] and psych [72] statistical packages.
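A sketch of how the change-in-fit criteria can be computed between two nested models (the model objects are assumed to come from the invariance sequences reported in the Results):

## Change in CFI and NCI between a less and a more restrictive nested model;
## "mfi" is lavaan's label for McDonald's NCI (a sketch)
delta_fit <- function(less_restrictive, more_restrictive) {
  d <- fitMeasures(less_restrictive, c("cfi", "mfi")) -
       fitMeasures(more_restrictive, c("cfi", "mfi"))
  c(delta_cfi = unname(d["cfi"]), delta_nci = unname(d["mfi"]))
}

## e.g., delta_fit(fit_configural, fit_metric)
## Cheung & Rensvold: delta_cfi < 0.01,  delta_nci < 0.02
## Meade et al.:      delta_cfi < 0.002, delta_nci < 0.007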

Analyzing item-level data

Before testing for invariance, the data were inspected to see whether a categorical data model or a continuous data model should be used. Previous research has shown that treating categorical variables as continuous is usually not problematic when the variables have at least three categories and do not have substantial differential skew (i.e., one variable is highly skewed positively and another is highly skewed negatively) [73–75].

To assess skew, we first examined all items together using Mardia’s test [76]. The results (b1p = 48.46, χ2 = 6,453.78, df = 1,140) indicated that some of the items might have substantial skew, so we examined each item independently. The only items with a skew statistic substantially above 1 were the fourth item on the General Fatigue Scale (skew = 1.74) and the fifth item on the Sleep/Rest Fatigue Scale (skew = 1.52). We subsequently plotted the frequency of each item’s responses to examine whether the skew ran in the same direction or in different directions. Figures 1, 2, and 3 indicate that any skew that exists is in the same direction.

Fig. 1 Item response distributions for General Fatigue Scale

Fig. 2 Item response distributions for Sleep/Rest Scale

Fig. 3 Item response distributions for Cognitive Scale
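These skew checks can be reproduced in outline with the psych package (a sketch using the hypothetical item columns assumed earlier):

library(psych)

mardia(mfs[, item_cols])          # multivariate skew (b1p) and kurtosis (b2p)
describe(mfs[, item_cols])$skew   # univariate skew; flag items with skew well above 1

## inspect response frequencies to see whether any skew runs in one direction
barplot(table(mfs$gf4))           # e.g., the fourth General Fatigue item (hypothetical name)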

Next, we compared the correlation matrices between categorical (polychoric) and continuous (Pearson) estimators. The difference was minimal for the items within a domain (SRMR = 0.06) as well as for items between domains (SRMR = 0.05). Second, separate exploratory factor analyses were conducted assuming the indicators were either continuous or categorical. Both the parallel analysis [77] and the minimum average partial analysis [78] (using both Pearson and polychoric correlations) [79, 80] indicated that three factors should be extracted. Consequently, the indicators were treated as continuous for the purposes of this study, but we used a maximum likelihood estimator with robust standard errors [81], which is a better estimator when the data may not meet the multivariate normality assumption of traditional ML estimation [82] and helps the performance of full information estimation [83].
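A sketch of these comparisons with the psych package; the vss() call reports Velicer's minimum average partial (MAP) criterion:

## Pearson vs. polychoric correlations, then factor-retention checks under
## both treatments of the items (a sketch)
r_pearson    <- cor(mfs[, item_cols], use = "pairwise.complete.obs")
r_polychoric <- polychoric(mfs[, item_cols])$rho

## root mean square difference between the two correlation estimators
sqrt(mean((r_pearson[lower.tri(r_pearson)] -
           r_polychoric[lower.tri(r_polychoric)])^2))

fa.parallel(mfs[, item_cols], fm = "ml", cor = "cor")    # parallel analysis, Pearson
fa.parallel(mfs[, item_cols], fm = "ml", cor = "poly")   # parallel analysis, polychoric
vss(mfs[, item_cols], n = 6)                             # includes the MAP criterion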

Results

General model

Initially, we fit a baseline model for each domain separately using all the groups combined. Within each domain, a single factor was specified to account for all the item covariance. For the General (Model General 1 in Table 2), Sleep/Rest (Model Sleep/Rest 1), and Cognitive (Model Cognitive 1) domains, the model did not fit the data badly, indicating that, within a domain, the items appear to be unidimensional. We then fit models combining items from all three domains. First, we fit a combined model (Model All 1) that had the six items within a domain as the sole indicators of their intended factor and allowed the three domain factors to covary (i.e., oblique factors). According to most of the alternative fit indices, this model fit the data adequately, but the NCI was below the suggested value of 0.90. We then fit a bifactor model of the items [52, 53], which posits one general factor influencing all the items and three domain-specific factors representing the General, Sleep/Rest, and Cognitive Fatigue domains (Model All 2). All the alternative fit indices indicate that this model fits the data better than the three oblique factor model. Consequently, we used the bifactor model (Model All 2) as the baseline model to test for invariance. For a graphical representation of the model, see Fig. 4.

Table 2 Results from confirmatory factor analysis of PedsQL™ Multidimensional Fatigue Scale using all respondents
Fig. 4 Bifactor model of PedsQL™ Multidimensional Fatigue Scale
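A sketch of this baseline comparison, reusing the bifactor specification given in the Methods and an analogous three-oblique-factor model (hypothetical item names):

## Model All 1: three correlated domain factors, no general factor (a sketch)
oblique_model <- '
  GEN =~ gf1 + gf2 + gf3 + gf4 + gf5 + gf6
  SLP =~ sr1 + sr2 + sr3 + sr4 + sr5 + sr6
  COG =~ cf1 + cf2 + cf3 + cf4 + cf5 + cf6
'
fit_oblique <- cfa(oblique_model, data = mfs,
                   estimator = "MLR", missing = "ml")    # factors covary by default

## Model All 2 is the bifactor model fitted earlier (fit_bifactor)
rbind(oblique  = fitMeasures(fit_oblique,  c("rmsea", "srmr", "cfi", "mfi")),
      bifactor = fitMeasures(fit_bifactor, c("rmsea", "srmr", "cfi", "mfi")))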

Gender

To examine invariance across gender, we split the data by gender. Eighty-one respondents’ parents did not indicate the child’s gender, so those respondents were not used in this analysis, leaving 756 respondents in the data set. First, we assessed for configural invariance (see Model S1 in Table 3). The results suggest that the model fits the data relatively well, although the CFI and NCI values were on the border of the “acceptable” range.

Table 3 Test for invariance by gender on PedsQL™ Multidimensional Fatigue Scale

The next step involved assessing for metric invariance, which we did by examining whether the factor pattern coefficients were the same across both genders. The results (Model S2) indicate that the overall model fits the data slightly better than model S1. Thus, there was enough evidence to continue the invariance assessment.

In the next model (Model S3), we examined scalar invariance by constraining all the indicator variables’ intercepts (i.e., the scales’ origins) to be equal across groups. The results suggest that the model fits the data relatively well, and the Δ CFI and Δ NCI values met both Cheung and Rensvold’s [56] and Meade et al.’s [57] criteria. Consequently, we tested for further levels of invariance.

The next step involved examining whether any of the subtests’ unique (residual) variances (Model S4) were invariant across gender. While such invariance is not required to compare the latent constructs between males and females, it is a necessary step (along with the invariance of the factor variances) to determine whether the constructs’ reliabilities are the same across groups [19]. The results indicated that this model fits the data fairly well, although the CFI and NCI values drop below the “acceptable” range. The Δ CFI and Δ NCI meet the Cheung and Rensvold [56] criteria, but not the Meade et al. [57] criteria. Consequently, we tested the homogeneity of factor variances using both the scalar (Model S3) and strict (Model S4) models as the baseline (Table 4).

Table 4 Standardized estimates of intercepts and factor loadings for MG-CFA model by gender
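The nested sequence for gender can be sketched in lavaan as follows (the grouping variable name "gender" is an assumption):

## Increasingly constrained MG-CFA models across gender (a sketch)
fit_configural <- cfa(bifactor_model, data = mfs, group = "gender",
                      orthogonal = TRUE, estimator = "MLR", missing = "ml")
fit_metric     <- cfa(bifactor_model, data = mfs, group = "gender",
                      group.equal = "loadings",
                      orthogonal = TRUE, estimator = "MLR", missing = "ml")
fit_scalar     <- cfa(bifactor_model, data = mfs, group = "gender",
                      group.equal = c("loadings", "intercepts"),
                      orthogonal = TRUE, estimator = "MLR", missing = "ml")
fit_strict     <- cfa(bifactor_model, data = mfs, group = "gender",
                      group.equal = c("loadings", "intercepts", "residuals"),
                      orthogonal = TRUE, estimator = "MLR", missing = "ml")

sapply(list(configural = fit_configural, metric = fit_metric,
            scalar = fit_scalar, strict = fit_strict),
       fitMeasures, fit.measures = c("cfi", "mfi", "rmsea", "srmr"))

Equality of the factor variances (the models labeled S5a and S5b below) can be added to this sketch by including "lv.variances" in group.equal.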

For the last step, we assessed whether the factor variances were invariant across the two groups (Models S5a and S5b). As both models fit the data as well as Model S3 and Model S4, respectively, the results indicate that the latent variances are not substantially different between genders. Subsequently, we assessed the reliability of each latent construct using ω [84]. Using the scalar invariance model, which allows different residual variances for males and females, the Sleep/Rest and Cognitive Fatigue Scales’ construct reliabilities are 0.99 for both males and females. For the General Fatigue Scale, the construct reliability is 0.98 for males and 0.97 for females. For the general fatigue factor (i.e., the factor related to all fatigue items), the construct reliability was 0.99 for males and females. Using the strict invariance model, the construct reliabilities were 0.99 for all three of the domain factors, as well as for the general fatigue factor.
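For illustration, ω-type construct reliabilities per group can be obtained from the fitted models with the semTools package; this is a sketch of the computation, not the authors' exact code, and newer semTools versions provide compRelSEM() in place of reliability():

library(semTools)

reliability(fit_scalar)   # scalar model: residual variances free across groups
reliability(fit_strict)   # strict model: residual variances constrained equal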

Age

To assess for invariance across age, we then split the data by age form: Young Child (5–7 years), Child (8–12 years), and Adolescent (13–18 years). There were 87 participants (10.4 %) who completed the Young Child (5–7) form, 343 participants (41.0 %) who completed the Child (8–12) form, and 407 participants (48.6 %) who completed the Adolescent (13–18) form.

First, we assessed for configural invariance (see Model A1 in Table 5). All the alternative fit indices except NCI indicate that the model does not fit the data badly. Consequently, we used it to test the subsequent invariance model.

Table 5 Test for invariance by age form on PedsQL™ Multidimensional Fatigue Scale

The next step involved assessing for metric invariance, which we did by examining whether the factor pattern coefficients were the same across age groups. The results (Model A2) indicate that the overall model fits the data no worse than model A1, and the Δ CFI and Δ NCI values met the Cheung and Rensvold [56] criteria. Thus, there appeared to be enough evidence to continue the invariance assessment.

In the next model (A3), we examined scalar invariance by constraining all the indicator variables’ intercepts (i.e., the scales’ origins) to be equal across age groups. The evidence on how well the model fits the data is mixed. The RMSEA and SRMR indicate that the model does not fit the data badly, but the NCI did not indicate a good fit and the CFI is on the border of the unacceptable range. The Δ CFI met the Cheung and Rensvold [56] criteria, but the Δ NCI values did not. Consequently, we examined the intercepts in model A2 to determine whether there were any large differences in intercepts among the groups. The differences were minimal except for a single intercept, so model A3 was retained for subsequent model assessment (Table 6).

Table 6 Standardized estimates of intercepts and factor loadings for MG-CFA model by age

The next step involved examining whether any of the subtests’ unique (residual) variances (Model A4) were invariant across the age groups. Such invariance is not required to compare the constructs across age groups, but it is a necessary step (along with invariance of the factor variances) to determine whether the constructs’ reliabilities are the same across groups [19]. All fit indices for this model indicate that it did not fit the data well. Further inspection showed that the majority of the residual variances needed to be unconstrained for this model to fit well, so none of them were considered equivalent across the age forms.
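The constraint-by-constraint inspection referred to above can be sketched in lavaan as follows (the age-group models are assumed to be fitted analogously to the gender sequence, with the group argument set to the age form variable):

## Which residual-variance equality constraints are untenable in the strict
## age-group model? (a sketch; fit_age_strict and fit_age_scalar are assumed
## to be fitted with group = "form", analogous to the gender models)
lavTestScore(fit_age_strict)        # score test for releasing each constraint

## or compare the group-specific residual variance estimates directly
lavInspect(fit_age_scalar, "est")   # per-group parameter estimates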

We next examined whether the factor variances were invariant across the age groups (Model A5). The results indicate this model fits the data as well as Model A3, with the Δ CFI meeting the Cheung and Rensvold [56] criteria and the Δ NCI meeting both the Cheung and Rensvold [56] and Meade et al. [57] criteria, so the variances were left equal across the age forms.

Subsequently, we assessed the reliability of each latent construct using ω [84] with model A5, which allows different residual variances across the age groups but constrains the latent variables’ variances to be the same across groups. For the Sleep/Rest and Cognitive Fatigue Scales, construct reliability is 0.98 for the Young Child form and 0.99 for the Child and Adolescent forms. For the General Fatigue Scale, the construct reliability is 0.97 for the Young Child form and 0.99 for the Child and Adolescent forms. For the general fatigue factor (i.e., the factor related to all fatigue items), the construct reliability was 0.98 for the Young Child form and 0.99 for the Child and Adolescent forms.

Discussion

The present findings demonstrate that, when self-reporting their fatigue, pediatric patients who completed the PedsQL™ Multidimensional Fatigue Scale (MFS) had a similar three-factor multidimensional fatigue model structure across the three age groups studied and across gender; that is, across gender and age groupings, the MFS items related to their intended constructs similarly and had similar conditional means (intercepts). The factor covariances were the same across gender and age groupings, as were the factor mean scores, except for the cognitive domain across the age groups. Thus, across both gender and age, the MFS scores have the same meaning and can be interpreted similarly. While the residual variances were similar between males and females, they were not similar among the age groupings. Nonetheless, the reliability estimates were high across all groupings (ω ≥ 0.97), meaning that while the reliability of scores might differ among the age forms, the difference is minimal and, in general, the variability among observed scores is largely due to variance in the constructs they are designed to measure.

Raju et al. [85] succinctly describe the importance of measurement equivalence by stating that “When measurement equivalence is present, the relationship between the latent variable and the observed variable remains invariant across populations. In this case, the observed mean difference may be viewed as reflecting only the true difference between the populations” (p. 517). Factorial invariance is an essential component of the iterative process of demonstrating the measurement equivalence of latent constructs across groups, including gender and age subpopulations. MG-CFA across age and gender subpopulations has not previously been conducted in fatigue measurement in pediatric patients utilizing child and adolescent self-reported multidimensional fatigue instruments.

The MG-CFA statistical methods utilized in the present study have important implications for international comparative clinical research in children and adolescents in which different age and gender subpopulations are studied. Standardized assessment instruments must demonstrate that test items are interpreted similarly across age, gender, language, socioeconomic, health status, and race/ethnicity subpopulations [86]. Demonstrating stricter levels of the hierarchy of factorial invariance across these subpopulations is critical given the growing importance of patient-reported outcomes in international clinical trials, health disparities analyses, and comparative health research. Since fatigue has been found to be a common symptom in adolescent populations [87, 88], with potential gender differences in fatigue reporting [87, 89], the demonstration of factorial invariance across age and gender is an essential step in further understanding any differences in pediatric self-reporting of multidimensional fatigue symptoms, including fatigue associations with, for example, physical, emotional, social, and school functioning in pediatric populations. Reliable and valid assessment of multidimensional fatigue in pediatric populations is imperative to the evaluation of interventions designed to reduce fatigue, particularly given that previous longitudinal research in general pediatric populations of adolescents indicates that persistently fatigued participants demonstrate higher levels of depression and anxiety, are less physically active, and sleep for a shorter duration at night [88, 89].

The present findings contribute to the empirical literature on the PedsQL™ Measurement Model by demonstrating strict factorial invariance for child and adolescent multidimensional fatigue self-report across gender and strong factorial invariance across age subpopulations. The results of the present study suggest that when mean differences are found utilizing the PedsQL™ Multidimensional Fatigue Scale across the three age groups and across gender, these differences are more likely real differences in self-perceived multidimensional fatigue rather than differences in the interpretation of the items as a function of age and gender. To our knowledge, the present study represents the first empirical test of the multidimensional fatigue construct in pediatric populations utilizing a bifactor model while testing for factorial invariance. These findings have important implications for scoring the PedsQL™ Multidimensional Fatigue Scale and suggest that both a total scale score composed of all 18 items and individual scale scores for each of the three 6-item scales (General Fatigue Scale, Sleep/Rest Fatigue Scale, and Cognitive Fatigue Scale) are justified [90].

The present study has several limitations. First, given the available sample size, the age groups were limited to the PedsQL™ instrument age groups of 5–7, 8–12, and 13–18 years. With a larger sample size, it would have been ideal to study each individual age group, as well as race/ethnicity and mode of administration subgroups, as we were able to do for the PedsQL™ 4.0 Generic Core Scales factorial invariance analyses [23, 24, 27]. Further, the sample size for the 5–7 age group would ideally have been larger than what was available in the existing database in order to increase statistical power. Because we worked from an existing database, information on nonparticipants was not available, nor were response rates for the groups studied. Also, age and gender information was missing for some participants, which may limit the generalizability of the findings. Nevertheless, these findings complement the previous findings that the PedsQL™ 4.0 Generic Core Scales demonstrate factorial invariance for child and adolescent self-report across age, gender, language, socioeconomic, health status, and race/ethnicity subpopulations [21–23, 26–28], and they add factorial invariance analyses across age and gender for the PedsQL™ Multidimensional Fatigue Scale to this emerging list of pediatric factorial invariance studies of HRQOL and symptom-specific instruments.