Introduction

The measurement of pediatric health-related quality of life (HRQOL) in pediatric medicine and health services research has grown significantly over the past decade (Matza et al. 2004). HRQOL instruments are multidimensional, consisting at the minimum of the physical, psychological (including emotional and cognitive), and social health dimensions delineated by the World Health Organization (FDA 2006; World Health Organization 1948). Although the measurement of pediatric self-reported HRQOL in clinical trials and school and community populations has been advocated for a number of years (Eiser 2004; Varni et al. 2003b), the emerging paradigm shift toward patient-reported outcomes has provided the opportunity to further emphasize the value and critical need for pediatric patient self-report measurement as efficacy outcomes in clinical trials (Connelly and Rapoff 2006; FDA 2006; Razzouk et al. 2006; Schwimmer et al. 2005) and population-based investigations (Seid et al. 2006).

By definition, patient-reported outcomes are self-report instruments that directly measure the patient’s perceptions of the impact of disease and treatment as clinical trial endpoints (FDA 2006). Patient-reported outcomes include multi-item health-related quality of life instruments, as well as single-item symptom measures (e.g., pain visual analogue scale [VAS]) (Acquadro et al. 2003; Sherman et al. 2006; Willke et al. 2004). Recently, the Food and Drug Administration was quite definitive in their support of patient-reported outcomes by stating that “some treatment effects are known only to the patient” (FDA 2006). However, what has been an obvious recognition in clinical trials for adult patients, that is, that patient-reported outcomes are patient self-reported outcomes, has not received the same level of recognition in clinical trials for pediatric patients (Clarke and Eiser 2004). Consequently, demonstrating the measurement properties of child self-reported HRQOL instruments is an essential step toward meeting the requirements by the Food and Drug Administration for patient-reported outcomes in pediatric clinical trials, as well as other comparative group analyses.

Socioeconomic status differences and health-related quality of life measurement

Socioeconomic status (SES) differences in health outcomes have been extensively documented in adult and child populations (Chen et al. 2002; Denvir et al. 2006; Montazeri et al. 2003; Olson et al. 2004). A comprehensive review demonstrated an SES-health gradient in pediatric populations, with decreases in SES found to be associated in a monotonic fashion with increases in health problems (Chen et al. 2002). While reviews of the literature point to an increasingly clear picture of the effects of SES on child health outcomes, less evidence has been documented on the effects of SES on the broader concept of pediatric HRQOL (Bradley and Corwyn 2002). In children and adolescents, SES differences in self-reported HRQOL have been demonstrated irrespective of the instrument utilized (Varni et al. 2006b; von Rueden et al. 2006). For example, children in low SES schools reported significantly lower PedsQLTM Total Scale scores than children in low-middle SES schools and middle SES schools (Varni et al. 2006b).

Cultural and social differences among SES groups may lead to differences in the conceptualization of health-related quality of life. For example, some research with adults suggests that lower SES groups have the propensity to incorporate physical symptoms into their responses to questions designed to evaluate emotional functioning (Nguyen et al. 2004). Thus, items on a HRQOL measure that assess emotional and physical functioning may be understood differently by respondents in different SES groups. Specifically, for lower SES groups, the prevalence of physical symptoms may be a stronger marker of impairments in emotional functioning than affective symptoms (Nguyen et al. 2004). To date, no published studies have examined the factor structure of the PedsQLTM 4.0 Generic Core Scales or other pediatric health-related quality of life measures across different socioeconomic status groups. In order to have greater confidence that a HRQOL instrument is measuring the same constructs across different SES groups (i.e., that the items have the same meaning for all participants regardless of their SES), it is essential to demonstrate factorial invariance (Gregorich 2006; Meredith and Teresi 2006). This is particularly important in SES investigations given the educational achievement differences inherent across SES gradients for parents and their children.

Factorial invariance

Generic HRQOL instruments facilitate comparisons across diverse chronic health conditions, as well as benchmarking with healthy and community populations (Patrick and Deyo 1989; Varni et al. 1999). However, in order for these comparisons to be valid, items on a HRQOL measure must have equivalent meaning across the subpopulations being compared (French and Finch 2006). That is, they must demonstrate factorial invariance (Cheung and Rensvold 2002; Vandenberg and Lance 2000). Horn and McArdle (Horn and McArdle 1992) defined two hierarchical levels of factorial invariance. The first level of factorial invariance, which necessitates that the psychometric properties of an instrument are equal across groups (i.e., configural and metric invariance), must be established prior to testing a higher level of factorial invariance, which requires equality of item-specific intercepts across groups (i.e., scalar invariance) (French and Finch 2006; Horn and McArdle 1992; Little 1997). Configural invariance is a test in which the same pattern of fixed and free factor loadings is specified for each group (Vandenberg and Lance 2000), while metric invariance is a test where factor loadings for like items are held invariant across groups (Vandenberg and Lance 2000). Scalar invariance assesses whether intercepts of like items’ regressions on the latent variables are invariant across groups (Vandenberg and Lance 2000). Multigroup Confirmatory Factor Analysis (CFA) is a method used to assess these different levels of factorial invariance (French and Finch 2006; Vandenberg and Lance 2000). To the degree that components of the factor model are determined to be equal across subpopulations, factorial invariance of an instrument can be inferred (French and Finch 2006).
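The hierarchy described above can be written compactly in conventional multigroup CFA notation. The symbols are introduced here only as an illustrative sketch and are not used elsewhere in this article: for group g, x is the vector of item responses, τ the vector of item intercepts, Λ the matrix of factor loadings, ξ the vector of latent factors, and δ the vector of unique factors.

```latex
\begin{aligned}
\text{Measurement model:}\quad
  & x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)}, \qquad g = 1, \dots, G \\
\text{Configural invariance:}\quad
  & \Lambda^{(1)}, \dots, \Lambda^{(G)} \ \text{share the same pattern of fixed and free loadings} \\
\text{Metric invariance:}\quad
  & \Lambda^{(1)} = \Lambda^{(2)} = \cdots = \Lambda^{(G)} \\
\text{Scalar invariance:}\quad
  & \Lambda^{(1)} = \cdots = \Lambda^{(G)} \ \text{and} \ \tau^{(1)} = \cdots = \tau^{(G)}
\end{aligned}
```

Each level adds equality constraints to the one before it, which is why the models are nested and can be compared by the change in fit from one level to the next.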

While the use of Multigroup CFA for invariance testing has grown considerably in recent years (French and Finch 2006; Vandenberg and Lance 2000), there has been a relative absence of studies that have examined the factorial invariance of pediatric HRQOL measures across socioeconomic status groups. Furthermore, the majority of studies to date have been concerned with establishing configural and metric invariance, ignoring higher levels of factorial invariance (Gregorich 2006; Vandenberg and Lance 2000). Without establishing stricter levels of factorial invariance such as scalar invariance, differences in observed health outcomes across socioeconomic status groups may be confounded by differences in item-specific intercepts (Meredith and Teresi 2006).

The 23-item PedsQLTM 4.0 Generic Core Scales is a health-related quality of life measure in which feasibility, reliability and validity have been supported for pediatric patients with chronic health conditions and healthy school and community populations (Varni et al. 2006a; 2003a, 2002a, 2004, 2006b, 2003b, 2007a, 2007b, 2002b, 2001), with international data on tens of thousands of healthy children and children with pediatric chronic health conditions published or in press in over 230 peer-reviewed journal articles since the 2001 publication of the PedsQLTM 4.0 Generic Core Scales in the journal Medical Care (see Publications, www.pedsql.org). The PedsQLTM has been translated into over 60 international languages for both research and clinical practice including, for example, German (Felder-Puig et al. in press), Catalan (Huguet and Miro 2008), Turkish (Laaksonen et al. 2007), Lithuanian (Makari et al. 2007), and Dutch (Reinfjell et al. 2006). The PedsQLTM 4.0 has demonstrated feasibility across numerous studies, as evidenced by requiring less than 5 min to complete and averaging less than 1% missing responses for the Total Scale Score. The objective of the present study was to examine the factorial invariance of children’s self-reported HRQOL on the PedsQLTM 4.0 Generic Core Scales across socioeconomic status groups using the Multigroup CFA framework (Vandenberg and Lance 2000).

Methods

Participants and settings

The sample contains child self-report data on 453 children ages 5–18 years from the PedsQLTM 4.0 Generic Core Scales database (previously published data, n = 453, 100.0%). Participants were recruited from subspecialty clinics and hospitals in which they were being seen for chronic illness care (cancer, n = 128, 28.3%; asthma, n = 13, 2.9%; cardiac conditions, n = 90, 19.9%; rheumatic condition, n = 60, 13.2%; diabetes, n = 110, 24.3%; cerebral palsy, n = 31, 6.8%; end stage renal disease, n = 21, 4.6%). Participants were assessed in-person (n = 446, 98.5%) or by telephone (n = 7, 1.5%). For in-person mode of administration, research assistants obtained written parental informed consent and child assent. Paper-and-pencil surveys were self-administered for parents and for children ages 8–18, and interview-administered for children ages 5–7 (and also in situations where the child was unable to read or write as a consequence of either physical or cognitive impairment). For telephone administration, parents of children ages 5–18 were called by a research assistant who explained the study and obtained verbal parental informed consent and child assent. The research assistant verbally administered the PedsQLTM 4.0 individually to the parent and the child. If the child was not home at the time of the initial call, the research assistant arranged for a call at another time. These research protocols were approved by the Institutional Review Board at Children’s Hospital and Health Center, San Diego and other appropriate local Institutional Review Boards and were carried out in accordance with universal ethical principles.

Measures

The PedsQLTM 4.0 (Pediatric Quality of Life InventoryTM Version 4.0)

The a priori conceptually-derived measurement model for the 23-item PedsQLTM 4.0 Generic Core Scales specified four scales: (1) Physical Functioning (8 items), (2) Emotional Functioning (5 items), (3) Social Functioning (5 items), and (4) School Functioning (5 items), which were developed through focus groups, cognitive interviews, pre-testing, and field testing measurement development protocols (Varni et al. 2001, 1999).

An empirically-derived measurement model from the initial field test of the PedsQLTM 4.0 Generic Core Scales utilizing principal components analysis with oblique rotation on the 23 items to test the underlying dimensions indicated that a five-factor model is appropriate (Varni et al. 2001). Specifically, the School Functioning Scale items split into two different factors, with three items measuring school-related cognitive functioning and two items measuring days missed from school due to illness. Thus, based on these findings from the initial field test, a five-factor model was specified for the CFA analyses in the current study.

The PedsQLTM 4.0 Generic Core Scales are comprised of parallel child self-report and parent proxy-report formats. Child self-report includes ages 5–7, 8–12, and 13–18 years. The items for each of the forms are essentially identical, differing in developmentally appropriate language. The instructions ask how much of a problem each item has been during the past one month. A 5-point categorical response scale is utilized across child self-report for ages 8–18 (0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; 4 = almost always a problem). Items are reverse-scored and linearly transformed to a 0–100 scale (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0), so that higher scores indicate better HRQOL. Scale scores are computed as the sum of the items divided by the number of items answered. To further increase the ease of use for the young child self-report (ages 5–7), the response scale is reworded and simplified to a 3-point scale (0 = not at all a problem; 2 = sometimes a problem; 4 = a lot of a problem), with each response choice anchored to a happy-to-sad faces scale. For the purposes of these analyses, child self-report data across socioeconomic status groups for children 5–18 years were utilized.
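The scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration of the stated rule only; the function name and the use of None for a missing response are assumptions made for the example, and no minimum-completion rule is applied because none is stated here.

```python
from typing import Optional, Sequence

# Reverse-scoring map stated in the text: 0 -> 100, 1 -> 75, 2 -> 50, 3 -> 25, 4 -> 0.
# The young child form (ages 5-7) uses only the responses 0, 2 and 4.
REVERSE_SCORE = {0: 100, 1: 75, 2: 50, 3: 25, 4: 0}


def scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Sum of the transformed items divided by the number of items answered.

    None marks a missing response (an assumption of this sketch).
    Returns None when no item was answered.
    """
    transformed = [REVERSE_SCORE[r] for r in responses if r is not None]
    if not transformed:
        return None
    return sum(transformed) / len(transformed)
```

For example, scale_score([0, 1, 2, None, 4]) averages 100, 75, 50 and 0 over the four answered items and returns 56.25, so that higher scores indicate better HRQOL.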

PedsQLTM Family Information Form

The PedsQLTM Family Information Form, completed by parents (Varni et al. 2001), contains demographic information including the child’s date of birth, gender, race/ethnicity, and parental education and occupation information.

For these analyses, socioeconomic status groupings were assigned according to the Hollingshead Index of Social Status (Hollingshead 1975), which was calculated for each child based on the parental education and occupation reported by parents on the PedsQLTM Family Information Form. The Hollingshead Index integrates an ordinal ranking of nine occupational categories, ranging from higher executives, proprietors of large businesses, and major professionals (9) to farm laborers/menial service workers (1), with a seven-category ranking of educational attainment ranging from a professional degree (7) to completing less than the seventh grade (1) (Hollingshead 1975). The occupational score is weighted by 5 and the educational score is weighted by 3; these two weighted scores are added together to create an overall score (Hollingshead 1975). If both parents are employed, the overall scores for the father and mother are summed and divided by two (Hollingshead 1975). If only one parent is employed, the education and occupation scores for that parent are utilized. Overall scores range from a high of 66 to a low of 8. Scores ranging from 66 to 55 are designated as upper social status, 54–40 are designated as upper-middle social status, 39–30 are designated as middle social status, 29–20 are designated as lower-middle social status, and 19–8 are designated as lower social status (Hollingshead 1975).
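The index arithmetic described above can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of half-point averages (for example 39.5, which the integer bands in the text do not cover) is an assumption of this example rather than part of the published index.

```python
from typing import Optional, Tuple

OCCUPATION_WEIGHT = 5  # occupational rank runs from 1 (lowest) to 9 (highest)
EDUCATION_WEIGHT = 3   # educational rank runs from 1 (lowest) to 7 (highest)


def parent_score(occupation: int, education: int) -> int:
    """Weighted sum for one employed parent (minimum 8, maximum 66)."""
    if not 1 <= occupation <= 9:
        raise ValueError("occupation rank must be between 1 and 9")
    if not 1 <= education <= 7:
        raise ValueError("education rank must be between 1 and 7")
    return OCCUPATION_WEIGHT * occupation + EDUCATION_WEIGHT * education


def family_score(first: Tuple[int, int],
                 second: Optional[Tuple[int, int]] = None) -> float:
    """Each argument is (occupation, education) for an employed parent.

    With two employed parents the two scores are summed and divided by two.
    """
    if second is None:
        return float(parent_score(*first))
    return (parent_score(*first) + parent_score(*second)) / 2


def social_status(score: float) -> str:
    """Map an overall score to the five bands listed in the text.

    Assumption: a half-point average falls into the band whose lower
    bound it has reached (39.5 is treated as middle).
    """
    if not 8 <= score <= 66:
        raise ValueError("overall score must be between 8 and 66")
    if score >= 55:
        return "upper"
    if score >= 40:
        return "upper-middle"
    if score >= 30:
        return "middle"
    if score >= 20:
        return "lower-middle"
    return "lower"
```

As a check on the stated range, parent_score(9, 7) gives 5 × 9 + 3 × 7 = 66 and parent_score(1, 1) gives 5 + 3 = 8.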

For the present study, the upper (66–55), upper-middle (54–40), and middle (30–39) social status groups designated by Hollingshead (Hollingshead 1975) were combined into one group (Upper), as were the lower-middle (29–20) and lower (19–8) social status groups (Lower). Grouping participants in this manner initially resulted in a total of 828 participants in the Upper SES group and 259 participants in the Lower SES group. Given the strong association between race/ethnicity and socioeconomic status (Barrett et al. 1995), in order to control for race/ethnicity in our analyses, we randomly matched the Upper socioeconomic status group on race/ethnicity to the Lower socioeconomic status group utilizing the SPSS Version 15.0 statistical software random sample case selection command (SPSS 2007). Due to the disproportionate number of White-Non-Hispanic participants in the Upper SES group (59%), this resulted in excluding 616 participants in the Upper SES group from the analyses. Thus, the final sample for the Upper SES group (Hollingshead Index ranged from 30 to 66) randomly matched by race/ethnicity to the Lower SES group was 212. The final sample for the Lower SES group (Hollingshead Index ranged from 8 to 29) after excluding participants that had missing race/ethnicity data was 241. As such, for the present analyses, the heuristic recommendations set forth by Guilford and Kline, which suggest a minimum sample size of 200 per group when conducting confirmatory factor analysis (Guilford 1954; Kline 2005; MacCallum et al. 2001), were met.
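One way to picture the matching step is as a random draw within each race/ethnicity category. The Python sketch below is an illustration under stated assumptions, not a reproduction of the SPSS procedure: the function name, the tuple layout, and the rule of drawing at most as many Upper cases as there are Lower cases in a category are all assumptions of this example.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Case = Tuple[Hashable, str]  # (participant id, race/ethnicity category)


def frequency_match(upper: Sequence[Case], lower: Sequence[Case],
                    seed: int = 0) -> List[Case]:
    """Randomly select Upper cases, category by category.

    Within each category, the number of Upper cases kept is the smaller of
    (a) the Upper cases available and (b) the Lower cases in that category.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    lower_counts: Dict[str, int] = defaultdict(int)
    for _, category in lower:
        lower_counts[category] += 1

    upper_by_category: Dict[str, List[Case]] = defaultdict(list)
    for case in upper:
        upper_by_category[case[1]].append(case)

    matched: List[Case] = []
    for category, cases in upper_by_category.items():
        k = min(len(cases), lower_counts.get(category, 0))
        matched.extend(rng.sample(cases, k))
    return matched
```

Under this rule the matched Upper group can be smaller than the Lower group whenever a category has fewer Upper than Lower cases, which is the kind of outcome reported here (212 versus 241).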

Statistical analyses

A five-factor model was specified for the CFA analyses. We first conducted an omnibus test of the equality of sample covariance matrices (Model 0) (Cheung and Rensvold 2002; Vandenberg and Lance 2000). The omnibus test of the equality of covariance matrices simultaneously compares the groups on all parameters in the covariance matrix (O’Brien 1992). Thus, failure to reject the null hypothesis that the covariance matrices are invariant across groups is an indication of overall measurement equivalence across groups and consequently this test serves as an important first step in assessing factorial invariance (Vandenberg and Lance 2000). We next assessed configural invariance (Model 1). For the configural invariance model, the pattern of fixed and free factor loadings was held constant across the socioeconomic status groups while the magnitudes of these loadings were not constrained to be equal (Vandenberg and Lance 2000). Subsequently, metric invariance (Model 2) was tested by constraining corresponding factor loadings to be equal across groups (Horn and McArdle 1992; Vandenberg and Lance 2000). We proceeded by testing the scalar invariance (strong factorial invariance) model. This model (Model 3) further imposed equality constraints on the item intercepts across socioeconomic status groups (Vandenberg and Lance 2000). Following Cheung and Rensvold’s (Cheung and Rensvold 1999) recommendation, we fixed one item loading at unity for each factor, and also allowed the five latent factors of the PedsQLTM 4.0 Generic Core Scales to intercorrelate.

The chi-square statistic is an extremely sensitive statistical test, and with large samples such as the present one (n = 453), is not a practical test of model fit (Cheung and Rensvold 2002; Maydeu-Olivares and D’Zurilla 1996). Thus, consistent with recommendations set forth in the literature (Bentler 1990; Cheung and Rensvold 2002; Mulaik et al. 1989), we examined several different indices of practical fit including the Root Mean Square Error of Approximation (RMSEA) (Steiger 1990), the Comparative Fit Index (CFI) (Bentler 1990), the Tucker-Lewis Index (also known as the Non-Normed Fit Index; NNFI) (Tucker and Lewis 1973), and the Parsimony Normed Fit Index (PNFI) (James et al. 1982). For the RMSEA, values in the range of 0.05–0.08 were taken to indicate acceptable fit, values in the range of 0.08–0.10 to indicate marginal fit, and values larger than 0.10 to indicate poor fit (Browne and Cudeck 1992; Steiger 1989). For the NNFI and CFI indices, acceptable model fit was suggested by values greater than or equal to 0.90 (Bentler 1990; Hu and Bentler 1995; Mulaik et al. 1989). For parsimony indices such as the PNFI, however, the concept of optimal values may not pertain given that parsimony indices are mainly influenced by model degrees of freedom (Sivo et al. 2006). We therefore do not invoke any absolute standard of model fit for the PNFI, but rather note simply that higher PNFI values reflect more parsimonious fit (Sivo et al. 2006).
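For readers who wish to see how these indices relate to the chi-square statistic, the Python sketch below gives their commonly used textbook definitions. It is an illustration only: the argument names are assumptions of this example (chi2_m and df_m for the fitted model, chi2_b and df_b for the baseline or independence model, n for sample size), the RMSEA is shown in its single-group form, and SEM software may apply a multigroup adjustment that is not reproduced here.

```python
import math


def rmsea(chi2_m: float, df_m: int, n: int) -> float:
    """Single-group RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative Fit Index, based on the noncentrality of model and baseline."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0.0 else 1.0 - d_m / d_b


def nnfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis Index (NNFI), based on chi-square/df ratios."""
    ratio_m = chi2_m / df_m
    ratio_b = chi2_b / df_b
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


def pnfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Parsimony Normed Fit Index: (df_m / df_b) * NFI."""
    nfi = (chi2_b - chi2_m) / chi2_b
    return (df_m / df_b) * nfi


def rmsea_band(value: float) -> str:
    """Bands adopted in the text. The stated ranges share the endpoints 0.08
    and 0.10; assigning 0.08 to 'acceptable' and 0.10 to 'marginal' is an
    assumption of this sketch. Values below 0.05 are not labeled in the text."""
    if value > 0.10:
        return "poor"
    if value > 0.08:
        return "marginal"
    if value >= 0.05:
        return "acceptable"
    return "below 0.05 (not labeled in the text)"
```

Because the PNFI multiplies the NFI by the ratio df_m / df_b, a model that frees fewer parameters (larger df_m) is rewarded when its NFI is otherwise similar, which is why no absolute cutoff is applied to it.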

For evaluating the relative fit of nested measurement equivalence models, Cheung and Rensvold (Cheung and Rensvold 2002) recommend the change in CFI as an appropriate criterion, because large sample sizes, such as ours, will almost always signal a statistically significant chi-square difference, even when the model parameter differences are trivial (Cheung and Rensvold 2002; Vandenberg and Lance 2000). When conducting Multigroup CFA to assess factorial invariance, a decrease in CFI of no more than 0.01 (i.e., a ΔCFI no more negative than −0.01) is recommended as indicative of factorial invariance across groups (Cheung and Rensvold 2002; Vandenberg and Lance 2000). While it has been proposed that decreases in CFI of between 0.01 and 0.02 (ΔCFI from −0.01 to −0.02) may also be indicative of factorial invariance across groups (Vandenberg and Lance 2000), for the present analyses only decreases in CFI of 0.01 or less were taken to indicate factorial invariance across groups (Cheung and Rensvold 2002; Vandenberg and Lance 2000). In each model comparison, the change in CFI was used to test whether the more constrained model (i.e., the model imposing more equality constraints across groups) resulted in worse fit. Statistical analyses were conducted using LISREL 8.5 for Windows (Joreskog and Sorbom 2003).

Results

Participant characteristics

Table 1 presents the age, gender, and race/ethnicity of the participants across the socioeconomic status groups, and suggests that with the exception of gender (P = 0.04), these characteristics do not differ significantly between the socioeconomic status groups in the current sample.

Table 1 Characteristics of Study Participants across Socioeconomic Status Groups

Factorial invariance: goodness of fit indices

Table 2 presents the chi-square values (with df), fit indices, and change in CFI values for the measurement invariance models. Model 0 (invariant covariance matrices) demonstrates acceptable model fit (RMSEA = 0.0694; CFI = 0.956; NNFI = 0.920). Taken together, the fit indices for Model 1 (configural invariance) indicate that the five-factor measurement model of the PedsQLTM 4.0 Generic Core Scales has acceptable fit within each socioeconomic status group (RMSEA = 0.0867; CFI = 0.920; NNFI = 0.908). Based on the change in CFI criterion recommended by Cheung and Rensvold (Cheung and Rensvold 2002) and the indices of practical fit taken together, we conclude that Model 2 (the metric invariance model) fits as well as Model 1, supporting metric invariance (RMSEA = 0.0856; CFI = 0.919; NNFI = 0.910; ΔCFI = −0.001). We conclude that the scalar invariance model (Model 3) fits nearly as well as Model 2 (RMSEA = 0.0841; CFI = 0.918; NNFI = 0.913; ΔCFI = −0.001), supporting that item intercepts are relatively invariant across socioeconomic status groups (Vandenberg and Lance 2000). Considering all practical fit indices together we judge the five-factor measurement model of the PedsQLTM 4.0 Generic Core Scales to have acceptable fit across the socioeconomic status groups.

Table 2 Health-related quality of life and socioeconomic status: χ² Values (with df), goodness of fit indices, and change in CFI values for measurement invariance models

PNFI values (Table 2) give an advantage to the model imposing the most equality constraints across the socioeconomic status groups (Model 3), because it estimates fewer parameters and is thus more parsimonious (Mulaik et al. 1989). In short, estimating 2 separate measurement models across the 2 socioeconomic status groups (Model 1) results in similar empirical fit as estimating a single measurement model that constrains the factor loadings and item intercepts to be equal across socioeconomic status groups (Model 3; overall change in CFI from Model 1 to Model 3 is −0.002, from 0.920 to 0.918). However, using the model imposing the most equality constraints (Model 3) is much more parsimonious (PNFI improves from 0.762 to 0.819).
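The ΔCFI comparisons reported above can be retraced with simple arithmetic. The sketch below uses only the CFI values reported in the text for Models 1 to 3; the function names are hypothetical, and the threshold of −0.01 is the criterion adopted for the present analyses.

```python
THRESHOLD = -0.01  # criterion adopted in the text

# CFI values reported for the nested invariance models.
CFI_CONFIGURAL = 0.920  # Model 1
CFI_METRIC = 0.919      # Model 2
CFI_SCALAR = 0.918      # Model 3


def delta_cfi(constrained: float, less_constrained: float) -> float:
    """Change in CFI when moving to the more constrained model.

    Rounded to three decimals, the precision at which the CFIs are reported.
    """
    return round(constrained - less_constrained, 3)


def supports_invariance(delta: float, threshold: float = THRESHOLD) -> bool:
    """True when the drop in CFI does not exceed the threshold in magnitude."""
    return delta >= threshold


metric_vs_configural = delta_cfi(CFI_METRIC, CFI_CONFIGURAL)  # -0.001
scalar_vs_metric = delta_cfi(CFI_SCALAR, CFI_METRIC)          # -0.001
scalar_vs_configural = delta_cfi(CFI_SCALAR, CFI_CONFIGURAL)  # -0.002
```

All three differences are far smaller in magnitude than 0.01, so each comparison satisfies the criterion.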

Table 3 presents the factor loadings for two models: (a) Model 1 (configural invariance model, in which factor loadings are estimated separately within each socioeconomic status group), and (b) Model 2 (metric invariance model, in which a single set of factor loadings is estimated for the socioeconomic status groups). Estimates in Table 3 reflect the common metric completely standardized solution (for Model 1 and Model 2). To be able to compare factor parameters across samples when conducting a Multigroup CFA, the variables in each group must be measured in a common metric (Joreskog and Sorbom 1996). In LISREL, the common metric completely standardized solution standardizes the observed variables such that factor loadings can be compared across groups on a common scale (Joreskog and Sorbom 1996). Overall, across the socioeconomic status groups, the items on the PedsQLTM 4.0 Generic Core Scales load similarly onto their five corresponding factors, providing further evidence that the items are measuring similar HRQOL constructs across socioeconomic status groups. It could be noted post hoc, however, that the factor loadings for three of the twenty-three items indicate that these items may not be operating identically across the Lower and Upper socioeconomic status groups. For these three items (Physical 7: I hurt or ache; Emotional 3: I feel angry; and Emotional 4: I have trouble sleeping), there is a general trend for the items to have higher factor loadings for the Upper socioeconomic status group.

Table 3 Health-related quality of life and socioeconomic status: standardized factor loadings for the configural and metric invariance models

Intercorrelations among the five PedsQLTM latent factors (averaged across groups) range from 0.448 to 0.777. Intercorrelations among the five PedsQLTM latent factors for the Lower socioeconomic status group range from 0.423 to 0.833, while for the Upper socioeconomic status group they range from 0.272 to 0.710. Although these are relatively large factor intercorrelations, we do not judge them to be so large as to threaten the discriminant validity of the PedsQLTM 4.0 Generic Core Scales. That is, factor correlations significantly smaller than 1.0 suggest discriminant validity (Campbell and Fiske 1959; Widaman 1985). Although the five PedsQLTM factors are intercorrelated (oblique), they are similarly intercorrelated across socioeconomic status groups, and therefore the factor obliqueness does not threaten the measurement equivalence of the scales.

Discussion

The present findings demonstrate that when self-reporting their HRQOL, children across the two socioeconomic status groups studied had a similar five-factor HRQOL structure on the PedsQLTM 4.0 Generic Core Scales. As a result, it can be concluded that children in this study had a similar interpretation of the items on the PedsQLTM 4.0 Generic Core Scales regardless of their socioeconomic status.

These findings contribute to the empirical literature on the PedsQLTM Measurement Model by demonstrating factorial invariance for child self-report across two SES groups. These findings complement recent findings that the PedsQLTM 4.0 Generic Core Scales demonstrate factorial invariance for child self-report across 12 individual age groups (Limbers et al. in press) and across health status (Limbers et al. 2008). As the consumers of pediatric healthcare, children and adolescents are uniquely positioned to give their self-reported perspectives on healthcare quality through perceptions of their own health-related quality of life outcomes.

The statistical methods utilized in the present study have important implications in general for research studies in which different SES groups are recruited, and specifically for studies in which the HRQOL of different SES groups are analyzed with the PedsQLTM 4.0 Generic Core Scales (Hassan et al. 2006). The results of the present study suggest that when differences are found across groups when utilizing the PedsQLTM, these differences are more likely real differences in self-perceived HRQOL for the groups studied, rather than differences in interpretation of the PedsQLTM items as a function of SES.

Standardized HRQOL instruments must demonstrate that test items are interpreted similarly across age, gender, language, socioeconomic, and race/ethnicity subpopulations (Gregorich 2006; Meredith and Teresi 2006), as well as mode of administration and health status categories. Demonstrating stricter levels of the hierarchy of factorial invariance across these subpopulations and methods is critical given the reliance of clinical trials, health disparities and comparative public health research on self-report instruments (Gregorich 2006; Meredith and Teresi 2006). Gregorich recently emphasized that there has been a general lack of comparative studies in medical and health services research utilizing self-report measures that have adequately demonstrated stricter levels of factorial invariance (Gregorich 2006). Further, he proposed that self-report instruments demonstrating stricter levels in the hierarchy of factorial invariance in some population groups should also be shown to demonstrate factorial invariance in additional, relevant populations. Thus, the establishment of factorial invariance is an iterative process, and must be replicated periodically as, for example, language groups, cultural norms, methods of administration, and the compositions of race/ethnicity and socioeconomic subpopulations change over time (Gregorich 2006).

The present study has several limitations. First, the socioeconomic status groups in our study differed significantly with regard to gender. Specifically, there was a higher percentage of girls in the Upper socioeconomic status group compared to the Lower socioeconomic status group (Table 1). To the best of our knowledge, no previously published studies have investigated the factor stability of the PedsQLTM 4.0 Generic Core Scales child self-report across gender. However, previous research with healthy children and children with chronic health conditions utilizing the PedsQLTM 4.0 Generic Core Scales child self-report has demonstrated no significant differences in PedsQLTM mean scores across gender (Berrin et al. 2007; Stern et al. 2007; Uzark et al. 2008; Varni et al. 2001). Further, it should be noted that any similarities in the factor structures of the PedsQLTM 4.0 Generic Core Scales across the socioeconomic status subpopulations in the present study reflect similarities in the measurement properties of the scale despite gender differences across socioeconomic status groups. That is, our test of measurement invariance across socioeconomic status subpopulations represents a conservative test of measurement equivalence, since gender differences across socioeconomic status groups would be expected to exacerbate, not attenuate, groupwise differences in factor structures.

The large percentage of Hispanic participants in both SES groups may also limit the generalizability of these findings. In addition, for three items (Physical 7: I hurt or ache; Emotional 3: I feel angry; and Emotional 4: I have trouble sleeping), there is a general trend for the items to have higher factor loadings for the Upper socioeconomic status group. Thus, despite the acceptable fit of the five-factor model across socioeconomic status groups (i.e., both SES groups demonstrated equivalence in the pattern of factor structure, suggesting that the relationship between the factors and the variables by which the factors are being measured is reliable), the construct validity for some items may be slightly better for respondents in the Upper socioeconomic status group. The response thresholds for reporting pain, sleeping problems, and anger may be higher in lower socioeconomic status groups, as expressing these symptoms could be less normative for lower socioeconomic status children. Further research on the item characteristic curves of these three items (within an Item Response Theory framework) may be warranted. If the finding that these three HRQOL items show lower loadings for low SES groups replicates in future work, then there may be grounds for item revision or rescaling when the PedsQLTM is used to compare high versus low SES respondents. In the meantime, we recommend that future SES comparative research on HRQOL be conducted with the awareness that inclusion of these three items may lead to subtle differences across groups in the underlying constructs being compared. If a researcher is especially concerned about these loading differences, then SES groups could be compared using reduced subscales from which these items have been removed, as a supplement to SES comparisons on the full, validated scale.

Finally, there are multiple methods available for measuring SES, each with documented strengths and limitations (Cirino et al. 2002). The Hollingshead Index of Social Status was utilized in the present study because it is one of the most commonly used SES measures in the US, and has demonstrated a high level of agreement with other commonly used SES measures (Cirino et al. 2002). For the present study, larger sample sizes would have provided an opportunity for a finer-grained analysis of Hollingshead’s five SES groups. In the present study we combined the upper (66–55), upper-middle (54–40), and middle (30–39) social status groups designated by Hollingshead (Hollingshead 1975) into one group (Upper), and the lower-middle (29–20) and lower (19–8) social status groups into another group (Lower). We then randomly matched the Upper SES group to the Lower SES group on race/ethnicity given the strong association between race/ethnicity and socioeconomic status (Barrett et al. 1995). Doing so required us to exclude a large number of cases in the Upper SES group (n = 616), which has the potential to bias parameter estimates. It should be noted, however, that when examining factorial invariance of the PedsQLTM 4.0 Generic Core Scales across SES groups for child self-report in the full sample of 1,087 children, not matched on race/ethnicity, scalar invariance across socioeconomic status groups was demonstrated based on stability of the CFI between the models (all decreases in CFI were smaller than 0.01; CFIs were all at or above 0.936) and additional indices of practical fit including the RMSEA and the NNFI (RMSEAs were all at or below 0.077; NNFIs were all at or above 0.931). Taken as a whole, these data support the robustness of our findings of an equivalent five-factor structure of child self-report on the PedsQLTM across the SES groups studied.

In sum, quantitative comparative research requires that instruments measure constructs with the same meaning across groups, which has been inadequately reported in the extant comparative literature using self-report instruments (Gregorich 2006). Self-reported clinical outcomes such as HRQOL that are not directly observed require latent variable models such as Multigroup CFA to demonstrate factorial invariance across groups to support valid assessment (Meredith and Teresi 2006). The PedsQLTM 4.0 Generic Core Scales appear to be interpreted equivalently, and thus to measure the same constructs, irrespective of the socioeconomic status group of the children who provide their self-reported HRQOL.