Introduction

Autism spectrum disorders (ASD) are a group of developmental disabilities thought to affect about 1 in 68 children [9]. When compared to typically developing children (TD), many of them experience a markedly reduced well-being, measured as quality of life (QoL; [17]), not to mention the stress experienced by their parents [32, 38]. QoL is understood to involve both affective and cognitive elements [10, 14, 16]. Accurate measurement of QoL in children and adolescents with ASD is particularly important. This informs much of the utility of interventions. If interventions are not improving the well-being and quality of life of individuals with ASD, the value of these interventions would be questionable. Hence, it is important to establish accurate measurement of QoL among those with ASD and to establish the psychometrics of these QoL measurements. Yet exploration of QoL within populations with ASD has been limited. In the current study, we closely examine parent- and child-reported QoL in a large sample of children with and without ASD.

Recently two reviews have been published on QoL in autism. Ikeda et al. [25] undertook a systematic review of QoL assessments in children and adolescents with ASD and found that of 13 articles evaluating QoL, 11 evaluated QoL against either norms or TD control participants. Van Heijst and Geurts [54] reported a meta-analysis on quality of life in autism across the lifespan. They included 10 studies, overlapping with around 50% of the Ikeda et al. study. Thus, to date eleven papers have compared children and adolescents with ASD against norms or TD children and adolescents. Each found QoL was significantly reduced in those with ASD.

The above studies relied on parent report (4 studies), self-report (3 studies), or combined self- and parent-reported QoL (4 studies). The variable use of informants highlights the importance of analysing the value of self- and parent report, including possible measurement biases. To date, no studies have analysed these issues. This is surprising, because autism has been suggested to affect the child’s ability to reflect on his or her own experiences, and there has been much debate over the use of self-report measures in ASD. For instance, parents were found to rate the QoL of their children with autism lower than the children themselves do [29]. It could thus be argued that individuals with ASD and their parents use QoL scales differently from typically developing groups. This may not be specific to ASD, however: Bastiaansen et al. [4] reported that proxy-reported QoL (i.e. reported by parents or clinicians) was lower than that reported by the children themselves for all psychiatric or developmental conditions assessed. Although this result is not specific to ASD, it still suggests that proxy reports of QoL may be used differently from self-reports, and that this would also apply to those with ASD. Given that QoL scores are based on subjective experience or parental estimations of these experiences, it is difficult to establish whether the underlying trait (subjective well-being) is indeed similar in those with ASD compared to TD groups. However, even in the absence of this information, newer techniques allow us to disentangle whether parents and children differ in their probabilities of responding to QoL measures.

The Pediatric Quality of Life Inventory™ version 4.0 (PedsQL) is one of the most widely studied and cited assessments of child QoL reported in the literature, with more than 1100 citations to date (cf. www.pedsql.org) covering many thousands of children, both with and without various health conditions, and including at least seven evaluations of the item structure of this instrument. The PedsQL consists of 23 items evaluating aspects of a child’s or adolescent’s quality of life. These are divided into four subscales addressing physical functioning, emotional functioning, social functioning, and school functioning. All items are responded to on five-point Likert scales (0 = never a problem to 4 = almost always a problem). Various findings suggest that the underlying factor structure of the PedsQL is multidimensional, consisting of four [44], five [37, 42, 60], or even six factors [1, 28], and various configurations of second-order factor models [23]. The four subscales may each be treated as a separate dimension; however, the PedsQL™ 4 manual specifies that all items provide a single, unidimensional score of well-being, representing the sum of all items [56]. In the present study, we seek to evaluate whether parents and children are responding to the same unidimensional overall construct of quality of life.

To date, the PedsQL has been used in six studies of children and adolescents with ASD [31, 36, 49, 50, 52, 55]. Reliability has generally been assessed by Cronbach’s α, and most of the reported values fall within acceptable limits for Cronbach’s α (0.72–0.93). However, Cronbach’s α is an insufficient basis from which to conclude either general reliability or that the scale is unidimensional. Cronbach’s α reflects only the lower bound of reliability, not the actual reliability, is vulnerable to manipulation through the number of test items, reflects the average relatedness of items for a particular sample, and does not reflect the unidimensionality of a particular set of items even when scores are high [41, 48, 51].
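As an illustration of the statistic discussed above, the following minimal sketch computes Cronbach’s α from a respondents-by-items score matrix using the standard variance-ratio formula. The function name and the toy data are hypothetical and are not taken from the study; the sketch only shows that α summarises average item relatedness in a given sample and says nothing by itself about dimensionality.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 respondents, 3 items
toy_scores = [[2, 4, 3], [1, 2, 1], [4, 4, 4], [0, 1, 2]]
alpha = cronbach_alpha(toy_scores)
```

Because the formula depends on the number of items k and on the particular sample’s variances, the same items can yield different α values in different samples.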

In the present day, with ready access to much more powerful techniques, Cronbach’s α is an insufficient index to establish the reliability of an instrument. Item response analysis and the Rasch model both offer a more sophisticated and nuanced way to establish the reliability of an instrument, as well as its unidimensionality [12, 21]. For instance, using item response analysis, groups can be compared in their differential probability to endorse items; specifically, as well-being increases in an individual, it would be expected that they select or rate higher scores of well-being. If respondents in one group, for example parents, differ systematically from respondents in another group, for example children, the two groups would show a differential probability to endorse items. In these instances, differential item functioning (DIF) is used to measure differences between respondents’ likelihood to respond based upon their group membership and their level of trait or ability. Item response analysis first evaluates each individual’s ability and assigns all respondents to a class interval based upon their assessed trait level or ability (i.e. overall score). Thus, all individuals are of known abilities (i.e. class interval) and known groups. All persons in the same class interval are treated as having the same underlying level of trait. Each class interval is relatively homogeneous for the trait and can now be compared for differences in group membership (i.e. person traits). DIF analysis then tests the interaction of group and level of trait. Two kinds of DIF may be found. Uniform DIF arises where one or more groups have a consistent advantage relative to other groups; the test is therefore easier for one or more groups across all levels of ability. However, where the advantage is contingent upon both group and class interval (i.e. level of trait), this is non-uniform DIF. The presence of non-uniform DIF suggests modifications need to be made to the scale or instrument being assessed [57].
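The distinction between uniform and non-uniform DIF can be sketched descriptively as follows. This is a simplified illustration only, not the residual-based test implemented in Rasch software: respondents are split into class intervals by total score, the parent-minus-child gap on one item is computed within each interval, and the pattern of gaps is labelled. The function names, the record layout, and the tolerance of 0.25 are all hypothetical choices.

```python
from statistics import mean


def interval_gaps(records, n_intervals=3):
    """Mean item-score gap (parent minus child) within each class interval.

    records: list of (group, total_score, item_score) tuples, where group is
    "parent" or "child". Class intervals are equal-sized bands of total score.
    """
    ordered = sorted(records, key=lambda r: r[1])
    n = len(ordered)
    gaps = []
    for i in range(n_intervals):
        band = ordered[i * n // n_intervals:(i + 1) * n // n_intervals]
        parents = [r[2] for r in band if r[0] == "parent"]
        children = [r[2] for r in band if r[0] == "child"]
        if parents and children:  # skip bands missing either group
            gaps.append(mean(parents) - mean(children))
    return gaps


def describe_dif(gaps, tol=0.25):
    """Label a gap pattern; tol is an arbitrary illustrative tolerance."""
    if all(abs(g) <= tol for g in gaps):
        return "none"
    if all(g > tol for g in gaps) or all(g < -tol for g in gaps):
        return "uniform"      # same direction at every trait level
    return "non-uniform"      # gap depends on the class interval


# Hypothetical records: parents score the item one point higher in every band
records = [("parent", 1, 2), ("child", 1, 1),
           ("parent", 5, 2), ("child", 5, 1),
           ("parent", 9, 2), ("child", 9, 1)]
pattern = describe_dif(interval_gaps(records))
```

A formal analysis would replace the fixed tolerance with a significance test of the group effect and of the group-by-class-interval interaction.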

While the item structure of the PedsQL has previously been analysed (cf. [18, 27, 33]), no study to date has done so in ASD, evaluated how individuals with ASD might use the scale differently from TD individuals, or examined how parents and children with ASD may differ in their use of the PedsQL, thereby establishing the differential item functioning. The presence of DIF in a scale makes its use with different informants problematic and likely to lead to erroneous conclusions [12, 21]. Agreement between parents and children on measures of QoL has been found to vary largely by measure [20, 53], and the PedsQL is no exception. Studies evaluating the PedsQL suggest DIF is present between parents and their TD children [27], between parents of different genders evaluating their own healthy children [18], between children with special health care needs and unaffected children [24], and between healthy and unwell children from the general population [30, 33]. Further, it has been found that some response categories are not the most likely choice at any level of trait, a pattern known as disordered thresholds [26]. Therefore, the model may not be unidimensional in its standard form.

We aimed to establish whether the PedsQL was responded to similarly by parents and by their children, with ASD or TD. Consequently, further aims were to establish whether DIF was present between parents and their children with ASD, between parents and their TD children, and between children with ASD and TD children. It was hypothesised that (1) parents of TD children when answering about their child and their children themselves would answer the PedsQL in similar ways; (2) parents of children with ASD when answering about their child and their children themselves would not differ on the PedsQL; and (3) the PedsQL was a unidimensional measure in children that assessed QoL regardless of the presence of a diagnosis of ASD.

Method

Participants

The sample consisted of 74 parents of TD children and the 74 children themselves, as well as 160 parents of children with ASD and 229 children with ASD. Gender distributions did not differ between groups, with 63 boys and 11 girls in the TD group (M = 85.1%, F = 14.9%) and 196 boys and 33 girls in the group with ASD (M = 85.6%, F = 14.4%; \(\chi_{(1)}^{2}\) = 0.009, p = 0.923). Children and adolescents with ASD were recruited from special primary and secondary schools in the Amsterdam region. Children were included based on a clinical diagnosis of Autism or Asperger’s disorder established prior to recruitment according to DSM-IV-TR criteria [2] by psychiatrists and/or psychologists who were not involved in the current research project and who were qualified to make the diagnosis. The diagnostic process included psychiatric and neuropsychological examinations. The comparison group was recruited via public primary and secondary schools in the Amsterdam region. Parents confirmed the absence of ASD in the comparison group.
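The gender comparison reported above can be checked with a short sketch of the Pearson chi-square test for a 2 × 2 table. It assumes the test was run without a continuity correction, which is what reproduces the reported values; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Boys and girls in the TD group (63, 11) and the ASD group (196, 33)
stat, p_value = chi_square_2x2(63, 11, 196, 33)
print(f"chi2(1) = {stat:.3f}, p = {p_value:.3f}")
```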

Data from all participants who successfully completed the Dutch version of the Peabody Picture Vocabulary Test-III [19] and who had a receptive verbal IQ of at least 70 were included in the analyses. Data from these high-functioning participants with ASD (HFASD) were included when they met the clinical cut-off on the Social Responsiveness Scale [15, 45]. Consequently, the HFASD group consisted of 202 participants (173 boys; 29 girls) versus 68 participants in the comparison group (58 boys; 10 girls). Mean age of the final HFASD group was significantly higher than that of the comparison group (t (268) = 4.06, p < .001, d = .50; see Table 1). Receptive verbal IQ did not differ between the groups (t (268) = 1.53, p = .12, d = .19).

Table 1 Descriptives for the ASD group and the typically developing comparison group

Procedure

After receiving written permission from their parents, children and adolescents were invited to participate. All tests were administered by trained psychologists and master’s students and took place at the participants’ schools. Parent reports were obtained via mail. Quality of life was evaluated using the age-appropriate versions of the PedsQL™ version 4. Autistic symptomatology was confirmed by formal diagnosis and report from psychologists independent of this study and by an SRS score greater than 60. Normal intelligence was confirmed using the Peabody Picture Vocabulary Test.

Measures

Quality of Life. Subjective quality of life was assessed using the Pediatric Quality of Life Inventory™ version 4.0 (PedsQL; Parent, Child, 23 items; see Table 2). The instructions ask the respondent to indicate how much of a problem an item has been for the child during the past month. By formulating the instruction in this way, the informant is not asked to rate the presence of a certain behaviour, but if present, its impact on the child’s everyday functioning. The items are scored on a five-point Likert scale (0, 1, 2, 3, 4). Four subscales and a total score can be computed, covering the following dimensions of QoL: (1) physical functioning (8 items, e.g., “hard to do sports” or “having hurts”); (2) emotional functioning (5 items, e.g., problems with “feeling angry” or “trouble sleeping”); (3) social functioning (5 items, e.g., “trouble getting along with peers” or “being teased”); and (4) school functioning (5 items, e.g., “trouble keeping up with schoolwork” or “missing school”). Good reliability and validity have been reported for the American and Dutch versions [5] of the PedsQL. The PedsQL has three self-report versions for children: 5–7, 8–12, and 13–18 years. The age-appropriate version was used with each child.

Table 2 PedsQL items

The Autism Diagnostic Observation Schedule (ADOS; [34]) assesses autism across age, developmental level, and language skills by observing social and communication behaviours. During a semi-structured observation, the ADOS interviewer offers playful activities (e.g. reading a story book) and topics of discussion (e.g. peer problems) to assess the socio-communicative abilities of the participant. Each of the participant’s behaviours is rated on a scale ranging from normal behaviour (0) to clearly deviant and autistic behaviour (2). An ADOS score of 7 or higher is indicative of the presence of an ASD [34, 35, 39].

The Social Responsiveness Scale (SRS; [15]) measures the severity of autism spectrum symptoms as they occur in natural social settings, with a 65-item questionnaire completed by parent or teacher. Several studies have found evidence for good internal consistency, test–retest reliability, interrater reliability, construct validity, and convergent validity (with both the ADOS and ADI-R) of the SRS [11, 59].

The Peabody Picture Vocabulary Test (PPVT; [19]) is designed as a test of receptive vocabulary achievement and verbal ability. The test consists of a series of pictures and is suitable for a wide age range (2–90 years). The participant has to match an orally given word to a picture. The total score is converted to a verbal IQ. The reliability of the PPVT tested with split-half and test–retest administration is excellent, and the construct and content validity are good. The validity of the PPVT is evidenced by strong correlations between PPVT scores and overall intelligence [6, 7].

Data analyses

Data were analysed using RUMM2030 (2012; RUMM Laboratory Pty Ltd, Perth, Western Australia) and Winsteps (version 3.92.1, © John Linacre, 2016, www.winsteps.com). In order to evaluate the hypotheses, five main analyses were undertaken. The first consisted of parents of TD children and the children themselves, contrasted for DIF by respondent (parent vs child), and the second treated paired respondents as a repeated factor in a separate facet analysis. The third consisted of data for parents of children with ASD and the children themselves, contrasted for DIF by respondent (parent vs child), and the fourth treated paired respondents as a repeated factor in a separate facet analysis. The fifth consisted of data for children (ASD and TD), contrasted for differential item functioning (DIF) by diagnosis. Given the number of analyses to be undertaken, Bonferroni adjustments were made to error rates where appropriate. There were five main analytic approaches undertaken, each with two principal unrelated analyses (fit and information). Bonferroni, Sidak, Holm-Bonferroni, and False Discovery Rate (step 1) techniques each suggest setting α = 0.01 for this number of tests.
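The agreement between the four correction methods at their first step can be sketched as below. The number of tests m is not stated explicitly in the text; m = 5 (one per main analysis) is an assumption made here because it reproduces the reported α = 0.01 from a family-wise level of 0.05. The function names are hypothetical.

```python
def bonferroni(alpha, m):
    """Per-test level under the Bonferroni correction."""
    return alpha / m


def sidak(alpha, m):
    """Per-test level under the Sidak correction."""
    return 1 - (1 - alpha) ** (1 / m)


def holm_first_step(alpha, m):
    """Holm-Bonferroni threshold for the smallest p-value: alpha / (m - 1 + 1)."""
    return alpha / m


def fdr_first_step(alpha, m):
    """Benjamini-Hochberg threshold for the smallest p-value: (1 / m) * alpha."""
    return alpha * 1 / m


M_TESTS = 5  # assumption: one test per main analysis
thresholds = {
    "Bonferroni": bonferroni(0.05, M_TESTS),
    "Sidak": sidak(0.05, M_TESTS),
    "Holm (step 1)": holm_first_step(0.05, M_TESTS),
    "FDR (step 1)": fdr_first_step(0.05, M_TESTS),
}
for name, value in thresholds.items():
    print(f"{name}: {value:.4f}")
```

The methods diverge only at later steps, where Holm and the False Discovery Rate procedure relax the threshold for larger p-values.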

In all models, cases with missing data were removed listwise. Thereafter, an unrestricted polytomous or partial credit (PC) model was assessed across four parameters (location, scale, kurtosis, skew; giving 92 parameters to fit for 23 items). Models were all fitted to a convergence criterion of 0.001 or smaller, unless otherwise stated. A PC model allows each item to have different thresholds (the balance point between choosing, for example, 0 or 1). In all analyses, the five levels or ratings were found to be disordered for most items (cf. Fig. 1). Disordered thresholds indicate that a category is never the most likely to be chosen or endorsed regardless of level of trait. Consequently, in each model, all items were rescaled to three categories (0 → 0, 1 → 1, 2 → 1, 3 → 1, and 4 → 2), collapsing the three intermediate categories into a single category. Thereafter, all cases with extreme data were removed from each model. This strategy retained the broad structure and all items of the PedsQL, while maximising the fit to the unidimensional measurement model. Following this, model fits were evaluated, and a likelihood ratio test was used to contrast the PC model with the rating model (where all thresholds are equal and are independent of the item).
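The data preparation described above can be sketched as a short pipeline: listwise deletion, collapsing the five response categories to three, and dropping extreme cases. The sketch assumes that “extreme data” means a respondent whose rescored total is the minimum or maximum possible score, which is the usual meaning in Rasch analysis; the function name and toy responses are hypothetical.

```python
# Rescoring map from the text: the three intermediate categories collapse to 1
RECODE = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2}


def prepare_responses(responses, n_items=23):
    """Return rescored response vectors ready for model fitting.

    responses: list of per-respondent lists of item scores (0-4, None if missing).
    """
    # 1. Listwise deletion: drop any respondent with a missing item
    complete = [r for r in responses
                if len(r) == n_items and all(v is not None for v in r)]
    # 2. Collapse five categories to three
    rescored = [[RECODE[v] for v in r] for r in complete]
    # 3. Drop extreme cases (assumed: minimum or maximum possible total)
    max_total = 2 * n_items
    return [r for r in rescored if 0 < sum(r) < max_total]


# Hypothetical three-item example
toy = [[0, 4, 2], [None, 1, 2], [0, 0, 0], [4, 4, 4], [3, 1, 0]]
prepared = prepare_responses(toy, n_items=3)
```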

Fig. 1

Category probability curves indicating disordered thresholds for children with ASD for item 3 “it is hard for me to do sports activity or exercise”. Nowhere on curve 3 is a person more likely to choose response 3 than responses 2 or 4
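The pattern in Fig. 1 can be reproduced with a minimal sketch of partial credit model category probabilities. The threshold values used here are hypothetical and chosen only to show one reversed pair of thresholds; they are not the estimates from this study. The helper scans a grid of trait levels and reports categories that are never the most probable response.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities under the partial credit model for one item.

    P(X = x) is proportional to exp(sum over k <= x of (theta - tau_k)),
    with the empty sum for x = 0 equal to zero.
    """
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def never_modal(thresholds, lo=-6.0, hi=6.0, steps=241):
    """Categories that are not the most probable response anywhere on the grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = pcm_probabilities(theta, thresholds)
        modal.add(probs.index(max(probs)))
    return sorted(set(range(len(thresholds) + 1)) - modal)


ordered = [-2.0, -1.0, 1.0, 2.0]     # hypothetical, thresholds in order
disordered = [-2.0, -1.0, 2.0, 1.0]  # hypothetical, last two thresholds reversed
```

With the reversed pair, category 3 would need a trait level above 2.0 to beat category 2 and below 1.0 to beat category 4, so it is never the most likely response.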

Results

Parents compared to TD children

Prior to analysing the differences between children and parents of children with an ASD, it was first necessary to establish if there were differences between TD children and their parents on the PedsQL. Of the original 68 parents and their TD children, 2 children and 17 parents were removed due to missing data, giving a sample of 66 TD children and 51 parents. The likelihood ratio test was significant (\(\chi_{(21)}^{2}\) = 63.56, p < 0.001), suggesting the unrestricted PC model contained more information than the rating model. Consequently, a PC model was used to analyse the data. The data were found to fit the Rasch model [\(\chi_{(46)}^{2}\) = 64.86, p = 0.03; Infit 1.05 (0.58–1.78); Outfit 1.06 (0.62–1.39)].

Reliability indices were strong (Cronbach’s α = 0.858; PSI = 0.851), while the mean fit residuals for items (Residual Fit = 0.193; SD = 0.859) and persons (Residual Fit = −0.292; SD = 0.991) were close to expected values, with no items having excessive fit residuals. The mean person location on the latent continuum was 2.366 (SD = 1.280). The plot of person–item thresholds was examined (see Fig. 2a), with scores for parents and TD children not differing (F (1,121) = 3.84, p > 0.05, η² = 0.03). A facet analysis was undertaken on the rescaled data treating rater (child or parent) as the repeated factor (n = 47 pairs), revealing both raters were unidimensional (\(\chi_{(92)}^{2}\) = 133.05, p = ns). Parents of TD children consistently overestimated their TD child’s own assessment of QoL (see Fig. 3). At the individual level (comparing parent to child for the 23 items), 13 parents significantly differed from their child (p < 0.01). In seven of these, the parent underestimated the child’s assessment and in six cases the parent overestimated the child’s score.

Fig. 2

Person–item threshold distribution for all data for (a) parents of TD children and the children themselves, separated by respondent type; (b) parents of children with ASD and the children themselves, separated by respondent type (red = parents’ responses, blue = children’s responses). (Color figure online)

Fig. 3

Distribution of scores for parents and children. (a) Distribution of scores by location for TD children (blue line) and parents (red line). (b) Distribution of scores by location for ASD children (blue line) and parents (red line). (Color figure online)

DIF analysis was undertaken comparing parents and their TD children. Using Bonferroni corrected tests, no items showed significantly different class intervals. Parents and children differed on three items (item 4, “hard to lift something heavy”: F (1,110) = 36.97, p < .0007, η² = 0.25; item 7, “hurt or ache”: F (1,110) = 15.95, p < .0007, η² = 0.15; and item 19, “hard to concentrate”: F (1,110) = 19.06, p < .0007, η² = 0.13). There were no significant interactions of class interval by respondent. Removing the three items with DIF resulted in a non-significant change in fit to the Rasch model; thus, the items were retained. Models were attempted with splitting out the items displaying DIF [57]; none resulted in model improvement. In summary, while parents and their TD children differed in their use of the PedsQL on some items, there were no concerning interactions between class interval and rater. That is, while parents and children differed, this did not depend upon the level of QoL in the child: both groups use the PedsQL in similar and consistent ways.

Parents compared to children with ASD

Differences between parents of children with ASD and the children themselves were then explored. Of the 160 parents and 202 children with ASD, 40 children with ASD and 6 of their parents were removed due to missing data. This left a sample of 162 children with ASD and 154 of their parents. The likelihood ratio test was significant (\(\chi_{(21)}^{2}\) = 205.61, p < 0.001), suggesting the unrestricted PC model contained more information than the rating model. Consequently, a PC model was used to analyse the data. The data were restricted as detailed above, with all items rescored to have three categories; this did not result in a fitting model [\(\chi_{(115)}^{2}\) = 190.22, p < 0.001; Infit 1.05 (0.59–1.79); Outfit 1.06 (0.61–1.55)]. One item, item 21 (“I have trouble keeping up with my schoolwork”), had an excessive fit residual (2.92 > 2.5), removal of which improved the fit, but only marginally (p < 0.01), and so this item was retained for completeness. The failure of the test of fit alone is insufficient to conclude a lack of unidimensionality, because the test is oversensitive to sample size, which dictates the number of class intervals [46]. Further, infit and outfit values were within acceptable ranges, and examination of fits revealed that person and item fits were improved by retention of item 21. Although classical reliability was slightly worsened, it was therefore decided to retain the model with item 21 included. With item 21 retained, the reliability indices indicated the scale to be reliable in classical terms (Cronbach’s α = 0.778) and in Rasch model terms (PSI = 0.810). The mean fit residuals for items (Residual Fit = −0.160; SD = 1.232) and persons (Residual Fit = −0.247; SD = 1.113) were close to expected values. The mean person location on the latent continuum was 1.90 (SD = 0.990).

Unidimensionality was further evaluated by assessing the Principal Components (PrC) of the residuals. The first eigenvector accounted for 10.9%, while the second and third accounted for 9.0 and 8.4%, respectively. The location scores of all items with unrotated PrC factor loadings greater than ± 0.3 were contrasted to all items, and no significant difference was found (t (315) = .32, p = ns), while the two scores were highly related (r = 0.87; [43]). Thereafter, the location scores of items loading highly onto the first eigenvector were compared to those of items that did not load onto the first eigenvector. No significant difference was found (t (315) = .17, p = ns), and the scores were strongly correlated (r = 0.67). Last, items loading strongly onto the first eigenvector were compared to those loading strongly onto the second eigenvector, with no significant difference found (t (315) = .20, p = ns, r = 0.60). Consequently, it would appear that while the scale is not entirely unidimensional, it functions as a unidimensional scale among TD respondents. Examination of the residual correlation matrix revealed some local dependence (r > ±0.3). Of the 231 pairs, 7 pairs (3%) of items had correlations greater than the criterion (±.37 > r > ±.30; items 2 and 3; 14 and 15; 14 and 18; 15 and 16; 19 and 20; 19 and 21; 22 and 23; see Table 2).

The person–item threshold plot was examined (see Fig. 2b). A significant difference was found between parents of children with ASD and the children themselves (F (1,314) = 27.09, p < .001, η² = 0.08). In this broad analytic overview, parents rated the children lower than children with ASD rated themselves (parents M = 1.18, SD = 1.00; children with ASD M = 1.82, SD = 1.17). A facet analysis was undertaken treating rater (child or parent) as the repeated factor, with 134 paired cases. The likelihood ratio test was significant (\(\chi_{(44)}^{2}\) = 190.12, p < 0.0027), suggesting the PC model was more informative than the rating model. The model was found to fit the data, revealing the two levels of rater (self and parent) were unidimensional (\(\chi_{(92)}^{2}\) = 71.57, p = ns), meaning both groups utilised the PedsQL on the same dimension. However, parents of children with ASD tended to be less extreme than their child in estimation of their child’s QoL. For children with low QoL, parents tended to slightly overestimate their child’s own assessment of QoL and underestimate for children with above average QoL (see Fig. 3). At the individual level (comparing parent to child for the 23 items) in 27 cases, parents significantly differed from their child (p < 0.01). In 14 of these, the parent underestimated the child’s assessment, and in 13 cases, the parent overestimated the child’s score.

Thereafter, DIF analysis was undertaken. Using Bonferroni corrected tests, no item had a significant interaction of class interval by respondent, but two items showed significantly different class intervals (item 14, “trouble getting along with peers”: F (1,304) = 4.84, p < .0007, η² = 0.02; and item 21, “trouble keeping up with schoolwork”: F (1,304) = 6.29, p < .0007, η² = 0.02). Parents and children differed on three items (item 4, “hard to lift something heavy”: F (1,297) = 28.91, p < .0007, η² = 0.09; item 11, “feel angry”: F (1,297) = 38.77, p < .0007, η² = 0.12; and item 23, “miss school: doctor appointment”: F (1,297) = 16.58, p < .0007, η² = 0.05). Removal of the three items displaying DIF by respondent did not improve model fit and generated other issues. Further, removal did not remove the overall group differences noted above, but instead resulted in an enlarged group difference (F (1,307) = 60.31, p < .001, η² = 0.16). Therefore, as removal of these three items would neither improve fit nor reduce overall group differences, these three items were retained. In summary, parents and their children with ASD differed in their use of some items of the PedsQL. Moreover, while parents’ assessment of their child’s QoL appeared to depend upon the child’s level of QoL, overall parents of children with ASD significantly underestimated their child’s QoL.

ASD children compared to TD children

Of the final sample of 270 participant children, 40 participants with ASD and 2 TD participants were removed due to missing data. This left 162 children with ASD and 66 TD children. The likelihood ratio test comparing to the rating model was significant (\(\chi_{(21)}^{2}\) = 114.02, p < 0.001, convergence 0.01), suggesting the unrestricted partial credit (PC) model contained more information than the rating model. The data were found to fit the unidimensional PC model [\(\chi_{(69)}^{2}\) = 92.22, p = 0.01; Infit 1.06 (0.49–1.98); Outfit 1.03 (0.52–1.58)]. While the upper infit value was high, scores such as this are considered unproductive, but acceptable. No items were observed with excessive fit residuals. Consequently, the PC model was retained. Indices suggested the scale to be reliable in classical terms (Cronbach’s α = 0.842) and in Rasch model terms [Person Separation Index (PSI) = 0.806].

Given the two groups of children had different ages, it was decided to examine whether PedsQL scores differed by age group (5–7, 8–12, 13–18, based upon the PedsQL self-report forms) and then to establish if an age-matched data set differed from the main analysis. PedsQL scores were not found to differ based upon age group (F (2,264) = 2.73, p = .07, η² = 0.02). Child participants were then matched on age (n = 55 per group), which revealed the data fit the PC model [\(\chi_{(46)}^{2}\) = 61.06, p = 0.07; Infit 1.06 (0.43–1.72); Outfit 1.04 (0.46–1.74)]. Indices also suggested the scale to be reliable in Rasch model terms (PSI = 0.75), although Cronbach’s α was lower than desirable (α = 0.70). No issues related to DIF were found in the data. Last, groups were not found to differ (F (1,108) = 1.73, p = .19, η² = 0.02). As age group showed no effect and the model based solely on age-matched data showed no group differences, we proceeded with the main analysis relying upon all the data for child participants.

The mean fit residual for items was −0.162 (SD = 1.035), and the mean person fit residual was −0.293 (SD = 1.191), which are close to the expected values of 0 and 1. The mean person location on the latent continuum was 1.731 (SD = 1.026); the mean item location on the latent continuum was 0.00 (SD = 0.561). The plot of person–item thresholds was examined (see Fig. 4a), with no issues noted. Group differences by diagnosis were also examined (see Fig. 4b), and the groups were not found to differ (F (1,226) = 0.03, p = 0.87, η² < 0.01). Subsequently, assessment of model structure and DIF was undertaken.

Fig. 4

Person–item threshold distribution for (a) all data for child respondents and (b) separated by diagnostic group (red = HFASD, blue = TD). (Color figure online)

With 23 items in the data set evaluated for class interval (the band of ability or trait to which an individual belongs), diagnostic group, and the interaction of these, giving 69 tests, with α = 0.05, p was set to 0.0007. Examination of the results revealed no item differed by class interval, diagnostic group, or the interaction of these. In summary, it was found that children did not differ from each other by diagnostic group in how they assessed themselves on the PedsQL.
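The per-test threshold quoted above is consistent with a Bonferroni division of the family-wise level by the number of tests, assuming the 69 tests are the 23 items each tested for three effects (class interval, diagnostic group, and their interaction):

\[
\alpha_{\text{per test}} = \frac{0.05}{23 \times 3} = \frac{0.05}{69} \approx 0.00072
\]

which corresponds to the reported criterion of 0.0007.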

Discussion

Children with ASD and TD children use the PedsQL in similar ways, and the overall PedsQL conformed to a unidimensional model. The PedsQL when scored as specified was not found to fit a unidimensional Rasch model and was replete with disordered thresholds on most items. However, when this was addressed by collapsing scoring (i.e. 0 → 0, 1 → 1, 2 → 1, 3 → 1, and 4 → 2), it was found that parents and TD children utilise the rescored PedsQL in similar ways, and a unidimensional model easily fitted their data. Further, while parents of and children with ASD also used the rescored PedsQL in similar ways, with the rescored scale showing conformity to a unidimensional model, parents of children with ASD estimated the level of QoL in their children in a manner dependent upon the child’s level of QoL. Moreover, the rescored PedsQL displayed good levels of reliability in each of the three total scale assessments.

Others have previously reported that the PedsQL did not fit a unidimensional Rasch model [26, 30]. Disordered thresholds are where a particular response is never the most likely choice of a person at any given level of ability. Therefore, a modified model was analysed with three categories. Categories 1, 2, and 3 were collapsed into a single category, intermediate between the lowest and highest categories. Under this approach, we found all items in all groups functioned as expected. This model was found to fit the Rasch model, lacked any disordered thresholds, and was unidimensional when used by TD children, as well as their parents, and when used by children with ASD. However, a unidimensional model for children with ASD and their parents was not obtained from model fit alone. It was necessary to infer from other indices of fit. The mean item and person fit residuals were within acceptable limits. Item 21 (“I have trouble keeping up with my schoolwork”) had an excessive fit residual, but its removal did not make the data fit the model, and fit indices were stronger with item 21 retained. Last, the lack of difference between parents and their children with ASD when analysed in the facet model revealed these groups treated the model unidimensionally. Analysing the data in this way tied each parent to their specific child. The success of the rescored model has implications for the scoring of individuals and suggests that a modified scoring approach with only three categories should be used in future. Nonetheless, this result should be replicated before such a strategy is adopted, as this study was limited to a specific sample of children with ASD and a small sample of TD children, and their parents, and results differed by group.

The level of observed reliability in the rescored model, measured as Cronbach’s α, was good for the overall scale when measured among all participants. This was supported by person separation indices (PSI) above 0.8 in all instances of the rescored model. The PSI indicates the strength with which the instrument is able to distinguish between individuals of different ability and ranges from 0 (low) to 1 (high) [12]. These results suggest that the PedsQL, once rescored to three categories, is reliable and can adequately discriminate between individuals of different ability.
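For readers unfamiliar with Cronbach’s α, the standard formula is α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below computes it for a small made-up data set; the data and the function name cronbach_alpha are hypothetical and do not reproduce the study’s analysis. The PSI is a Rasch-based counterpart that is not computed here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for complete data.

    `scores` is a list of respondents, each a list of item scores of equal
    length. Sample variances (n - 1 denominator) are used throughout.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer the same number of items")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the total score across respondents.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Made-up rescored responses (0-2) for four respondents on three items.
    example = [
        [0, 1, 0],
        [1, 1, 2],
        [2, 2, 2],
        [1, 0, 1],
    ]
    print(round(cronbach_alpha(example), 3))
```

Values closer to 1 indicate that the items covary strongly relative to their individual variances.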

We explored whether items were responded to similarly by parents and their children and by both groups of children. Others have found DIF in the PedsQL [18, 24, 27, 30, 33]. The present study did not find DIF in the way in which TD children and children with an ASD used the PedsQL. Nonetheless, parents of TD children and the children themselves differed on three items (4, 7, and 19), with effect sizes ranging from 12 to 25%. Parents of and children with ASD also differed on three items (4, 11, and 23), with effect sizes ranging from 5 to 12%. As the overall scales between TD children and their parents did not differ, it is reasonable to conclude that these significant DIF scores cancelled each other out; nonetheless, care should be taken when using these items with TD samples. In contrast, parents of and children with ASD did differ at a group level in their use of the PedsQL, but not when each parent was tested against their child, which suggests the DIF may have contributed to the apparent group differences. However, removal of the three items displaying respondent DIF did not remove the overall group difference; instead, it exacerbated it. It should be noted that item 4 (“hard to lift something heavy”) differed between parents and their children for both the TD and ASD groups. As children did not show DIF based upon diagnosis, it is likely that these differences in response to items are related to the way parents interpret and respond to them.

The question of whether children with ASD have adequate insight to assess their own QoL can be addressed by facet analysis. It was found that TD children and their responding parents did not differ, with only about 3% of variance between them. A facet analysis treating respondent as the facet found no difference between respondents. Though not significant, parents of TD children consistently overestimated their child’s own estimate of QoL. By contrast, as a general trend, parents of children with ASD significantly underestimated their child’s QoL by 0.64 logits. However, the facet analysis revealed a more complex picture. This analysis paired each parent with their child. It was found that parents of and children with an ASD did not differ significantly in their scores. Children with ASD who had low self-reported QoL were rated as closer to average QoL by their parents, while children with ASD who had high QoL were rated by their parents also as closer to the average QoL for all children with ASD. In other words, parent estimates tended to be less extreme than their children’s responses. Thus, when rating their child’s QoL, parents differ from their child in how they perceive their child’s QoL. It is possible that children lack insight into themselves and that parents have better insight into their child. But this position is hard to parsimoniously reconcile with the data.

From the perspective of parsimony, parents must rely upon their observations of their child, who plausibly may conceal positive and negative aspects of their life. The facet analysis would suggest parents are not fully aware of their child’s QoL. This lack of shared knowledge has been noted among clinicians when dealing with children with ASD [3, 22]. Further, although parents and children with ASD in general differed in how they responded to the PedsQL, when the target child was controlled for through the facet design, parents at the group level did not differ significantly from their matched child with ASD. This suggests that, for the most part, parents and children agree on their PedsQL ratings. That only a small number of parents of children with ASD significantly over- or underestimated their child’s QoL suggests that disagreement between parent and child occurs only in some instances.

Last, TD children and children with ASD were compared. These two groups were not found to differ in their PedsQL scores or in the way they used any particular item. Hence, it is reasonable to conclude that, for the most part, children with ASD do have sufficient insight to respond to an instrument measuring QoL. So are parents able to suitably act as proxies? From the facet analysis, it would appear that some parents of children with ASD tend to describe their child’s QoL as closer to the mean of all children than the child does. This may not be surprising, given that children with ASD may not readily be forthcoming with the details of their daily lives [3]. In addition, we need to consider that parents may compare their children with ASD to TD children when responding to the QoL questions, while such social comparison may be more difficult for children with ASD. Still, for the most part, parents and children agreed, and so parents are reasonable proxies for children. However, given that children are able to answer questions pertaining to QoL themselves, it seems reasonable to rely upon the child where possible (also see [58] for the importance of the child’s view).

These results contradict findings by some and support findings by others. Interestingly, in measures of social insight and of insight into cognitive processes, various authors report that children with ASD do not display insight [13, 40]. In the case of Mehzabin and Stokes, the conclusion was based upon a comparison between parents’ responses and self-responses to questions about sexual behaviour. Broadbent and Stokes evaluated insight into cognitive processes explicitly and found it to be lacking. However, the contradiction between their conclusions and the data herein suggests that there may be different levels of insight into cognitive and emotional processes within individuals with ASD. The current data support findings by Schriber et al. [47], Sheldrick et al. [49], and Berthoz and Hill [8]. These authors each report a suitable level of insight for the evaluation of emotional processes among individuals with ASD.

The present study is not without limitations. While the overall sample size was sound, the sample of TD children and their parents was considerably smaller than the sample with ASD. In addition, although all children with ASD were diagnosed by a suitably qualified and trained independent professional using an extensive clinical assessment, not all participants met the ADOS threshold for ASD, though the SRS confirmed the diagnosis for all ASD participants.

Conclusions

From the data presented herein, it is apparent that the PedsQL is a reliable measure that TD children and their parents can use, as can children with ASD, although their parents may not be as reliable reporters as the children themselves. The present results call into question the standard scoring structure for the PedsQL and suggest it may be wisest to restructure this into 3 categories rather than the typical 5. The most important finding here though is that children with ASD are able to adequately report on their own QoL.