Introduction

There is a longstanding theoretical connection between autism spectrum disorder (ASD) and theory of mind (ToM), the capacity to attribute mental states to oneself and others (Premack and Woodruff 1978). Many studies have provided data consistent with the hypothesis that social difficulties in ASD are explained by a decreased ability to use mental state concepts to interpret and predict one’s own and other people’s behavior (Baron-Cohen et al. 2000; Tager-Flusberg 2007). Although a “single cause” explanation for all the features of ASD is now considered implausible (Happé et al. 2006), the theoretical influence of ToM on social abilities in ASD remains highly prominent in the literature, with more recent work examining the potential combined contribution of ToM with other factors such as executive function (Jones et al. 2018).

Traditionally, ToM is assessed by asking young children to reflect on a naive story character that holds a “false belief.” False belief understanding is essential for social interactions, including both “first-order” false beliefs, which indicate what children think about real events, and more advanced “second-order” false beliefs, which refer to what children think about other people’s thoughts (Perner and Wimmer 1985). Preschoolers with ASD consistently show deficits in false belief reasoning compared to typically developing children (Baron-Cohen et al. 2000; Senju 2012). However, the sensitivity of traditional false belief paradigms is limited when used with school-aged and cognitively unimpaired children with ASD, who usually perform as well as typically developing children on these tasks (Scheeren et al. 2013; but see Senju et al. 2009).

Within ToM, there are distinct theoretical types with varying complexity, as well as different ways of measuring the broad construct of ToM (Apperly 2012). Advanced ToM consists of many abilities, including higher-order false belief understanding (Perner and Wimmer 1985), emotion recognition (Baron-Cohen et al. 1997), and social understanding (Baron-Cohen et al. 1999). A recent factor analysis found that advanced ToM abilities, unlike basic ToM, include social reasoning, reasoning about ambiguity, and recognizing transgressions of social norms (Osterhaus et al. 2016). These advanced ToM abilities are conceptually distinct from the more basic type of ToM measured by traditional false belief reasoning tasks.

Recent work has begun to examine advanced ToM abilities in school-aged and cognitively unimpaired children with ASD. Some studies have demonstrated that these children perform equally well on advanced ToM tasks as children without ASD (Begeer et al. 2016; Scheeren et al. 2013), while other studies continue to find deficits (Baribeau et al. 2015; Salter et al. 2008; White et al. 2009). However, across studies, the definition used to characterize advanced ToM is inconsistent, and there is variability in the tasks used. For example, advanced ToM has been defined and measured as the ability to interpret the emotional cues in pictures of eyes in some studies (Baribeau et al. 2015; Baron-Cohen et al. 1997), but as the ability to reason about others’ thoughts, feelings, and behaviors in response to social stories in other studies (Mazza et al. 2017; Scheeren et al. 2013; White et al. 2009). Moreover, most studies have used a single measure to assess advanced ToM (Baribeau et al. 2015; Salter et al. 2008; Shimoni et al. 2012), rather than multiple measures within the same sample. The mixed literature suggests that future work should clarify which components of ToM are most impacted in ASD.

One possible reason for the inconsistent findings is that past studies did not adequately distinguish performance on tasks measuring distinct theoretical components of ToM. Research has suggested that affective and cognitive aspects of ToM are somewhat separable. Shamay-Tsoory and Aharon-Peretz (2007) found a distinction between cognitive ToM (knowledge about others’ beliefs) and affective ToM (knowledge about others’ emotions), which are thought to involve overlapping but dissociable brain systems. Another possible reason for inconsistencies in the literature is that laboratory-based ToM tasks vary in their sensitivity to real-life task demands. In order to be successful in real-life social situations, it is necessary to spontaneously identify relevant social information (e.g., relationships, intentions) before using social cognitive problem solving skills (Klin et al. 2003). Yet, only a few studies have examined spontaneous social attribution skills by eliciting open-ended narratives in children with ASD (Salter et al. 2008; Shimoni et al. 2012), and none simultaneously measured cognitive and affective ToM.

In addition to variation in theoretical ToM subdomains that may account for the mixed findings in the literature, there are many methodological differences across studies. The majority of ToM measures present information verbally, while fewer present information visually. Individual differences in language ability have been found to relate to ToM performance in both typically developing children and children with ASD (Milligan et al. 2007), suggesting performance may be mediated by verbal ability. Children with strong verbal skills may be able to “hack” false belief reasoning through use of cognitive skills (Happé 1995). Age differences may also account for some of the conflicting findings in the literature, as some studies suggest that ToM is delayed rather than deficient in ASD (Burger-Caplan et al. 2016; Steele et al. 2003). In addition, the majority of ToM measures are scored as either pass or fail, while fewer allow for open-ended responses that may provide a more sensitive measurement of ToM ability. Examining different methodological components of ToM tasks is important in order to separate ToM ability from factors related to task demands, such as how much verbal reasoning and spontaneous attention to key details of the stimuli are required. A related consideration is the sample size of previous work. Given known heterogeneity within ASD, if ToM is not universally impaired among individuals with ASD, smaller samples may include differing proportions of children with ToM impairments. Research with larger samples may clarify mixed results.

Many studies have examined the extent to which ToM challenges contribute to the wide range of social difficulties observed in ASD, both in terms of the severity of specific ASD symptoms (e.g., Jones et al. 2018) and broader social functioning in daily life (e.g., Burger-Caplan et al. 2016). Symptoms represent specific social communication deficits, whereas social functioning represents overall behavior across many domains, including interpersonal relationships, adaptive function, and leisure time (Yager and Ehmann 2006). ToM deficits have been linked to higher levels of ASD symptoms in individuals with ASD (Jones et al. 2018; Lerner et al. 2011; Shimoni et al. 2012). Other studies have shown that higher ToM predicts better communicative and socially adaptive functioning, as well as fewer broad social problems (Berenguer et al. 2018; Bishop-Fitzpatrick et al. 2017; Burger-Caplan et al. 2016; Joseph and Tager-Flusberg 2004; Mazza et al. 2017; Shimoni et al. 2012). In contrast, some research suggests that ToM ability is unrelated to social communication difficulties in ASD (Cantio et al. 2016; Pellicano 2013). More research using detailed ToM batteries is needed to clarify whether distinct components of ToM predict symptom severity and social functioning in ASD.

Present Study

Given that ToM is thought to be a multifaceted construct with ToM types that vary in level of complexity (Apperly 2012), the present study examines whether theoretical distinctions are present among school-aged children (7–11 years old) with ASD and average to above average intelligence (IQs of 85 or above). A substantial limitation of past research on ToM in ASD, particularly among school-aged children, is that few studies have examined the advanced ability to reason about others’ emotions in social situations as well as the ability to spontaneously interpret others’ ambiguous actions. Tasks that more closely mirror the unfolding of spontaneous social interpretations in everyday life, and that capture the advanced ToM skill of reasoning about the emotional experiences of others (Osterhaus et al. 2016), may represent distinct aspects of the ToM profiles of school-aged children with ASD. To date, no studies have simultaneously examined cognitive, affective, and spontaneous ToM in ASD, a theoretical limitation addressed by the present study.

First, we tested whether theoretical distinctions (cognitive, affective, and spontaneous) exist within ToM. We hypothesized that cognitive, affective, and spontaneous ToM would represent separate factors with distinct patterns of variation. Because language ability accounts for some individual differences in ToM performance (Milligan et al. 2007), we also included tasks that present information both verbally and visually. Second, we tested whether ToM predicted social symptoms and functioning. Consistent with literature in school-aged children with ASD (e.g., Jones et al. 2018), we predicted that better performance on traditional ToM tasks would relate to better social functioning. Yet, given recent work related to advanced aspects of ToM (Osterhaus et al. 2016), we also explored whether advanced ToM would be more sensitive to individual differences in social symptoms and real-life social behavior than traditional false belief ToM measures.

Method

Participants and Procedure

Sixty children (4 females) between the ages of 7 and 11 years with ASD and IQs of 85 or above participated (see Table 1). Children were recruited from the hospital’s recruitment registry and other community sources (events, clinics, word of mouth). Exclusionary criteria included severe sensory or motor impairments that limited the ability to complete the test battery, colorblindness, insufficient English fluency for valid completion of standardized measures, medical disorders that impact the central nervous system, prolonged prenatal substance exposure, and a history of seizures or use of anticonvulsant medications. The hospital’s human subjects division approved all study procedures, and all parents consented for their children to participate.

Table 1 Sample characteristics (N = 60)

All participants had a previous ASD diagnosis, which was confirmed using the Autism Diagnostic Observation Schedule, Second edition (ADOS-2; Lord et al. 2012), the Autism Diagnostic Interview, Revised (ADI-R; Rutter et al. 2003) according to Collaborative Programs of Excellence in Autism (CPEA) criteria (see Sung et al. 2005 for details), and the Diagnostic and Statistical Manual of Mental Disorders, Fifth edition (DSM-5; American Psychiatric Association 2013) criteria for ASD.

Basic exclusion criteria were screened by phone prior to enrollment. Then, the Vineland Adaptive Behavior Scales, Second edition (Vineland-2; Sparrow et al. 2005), ADI-R, ADOS-2, Wechsler Abbreviated Scale of Intelligence, Second edition (WASI-2; Wechsler 2011), and a colorblindness screening were administered, and eligibility was determined from the results. A battery of ToM measures was administered in a fixed order during two additional visits. During the second visit, a parent or guardian was asked to complete questionnaires, including measures of social functioning.

Materials

ToM Measures

Given that the literature is limited on how best to measure individual differences in ToM in school-aged children with average to above average IQ, multiple measures were used to assess different types of ToM. All ToM stimuli (pictures, text, audio, and video recordings) and task instructions were presented on a computer using E-Prime 2.0 software.

In the First-Order False Belief (FOFB) Videos task (see Appendix 1), children answered questions after watching two videos. In the Location Change False Belief video (Saxe 2009; Wimmer and Perner 1983), children inferred the knowledge and belief of a person regarding the location of an object that was moved to a new location while he was absent. Moral judgment of his behavior (false belief or naughty) was also obtained. In the Unexpected Contents False Belief video (Perner et al. 1987), children viewed a familiar container with unexpected contents and were asked about the belief of a naive person regarding the contents. Percent correct was calculated by dividing the number of correct answers to test questions across the two videos by the total number of test questions (7). Four participants did not complete this task due to time constraints.

In the Theory-of-Mind Test (TOM Test; Muris et al. 1999), children answered questions about a series of cartoons and audio stories designed to test both basic and more advanced aspects of ToM. The task contained 38 items and three subscales: TOM level 1 (20 items), TOM level 2 (13 items), and TOM level 3 (5 items). TOM Test level 1 measures affective ToM, such as recognition of others’ affective mental states and understanding of social scenarios and emotions. TOM Test level 2 measures first-order false belief. TOM Test level 3 measures second-order false belief. The experimenter scored the child’s responses, based on a scoring sheet with common correct and incorrect responses provided by Muris et al. (1999). The responses flagged for review during administration were resolved post-administration by consensus coding and reviewed by the last author. Percent correct scores were calculated by dividing the total number of correct answers by the total number of questions for each level. One participant did not complete this task due to time constraints. Seven videos (12%) were scored by a second scorer. Intraclass correlation coefficients indicated excellent reliability (see Table 2).

Table 2 Inter-rater reliability

In the Social Attribution Task (SAT; Klin 2000), children viewed animated geometrical figures that are commonly interpreted as enacting a social scene (Heider and Simmel 1944). Children were asked to describe the meaning of the animation in an open-ended manner and then with verbal cues about its social nature. The SAT video clips, instructions, and coding scheme were identical to those used by Klin (2000). The first narrative was obtained after viewing the full animation twice. Then, a series of meaningful segments were presented one at a time (narratives 2 through 7). For narratives 1 through 7, children were asked “what happened there?” For narratives 8 through 10, they were asked “what kind of person is the big triangle/small triangle/small circle?” Explicit questions followed (narratives 11 through 17), such as “why did the big triangle go into the house?” All responses were recorded for transcription and coding.

To quantify spontaneous ToM, five index scores were derived from the narratives: salience, cognition, affect, person, and problem-solving (Klin 2000; see Footnote 1). Propositions (see Footnote 2) in narratives 1 through 7 that were rated as non-pertinent to the broad social theme of the animation (inconstant attributions, vague references, misattributions, and irrelevant attributions) were eliminated for the coding of the salience, cognition, and affect indices. The cognition and affect indices measure the frequency of cognitive mental state terms (e.g., knowledge, desire, or belief) and affective mental state terms (e.g., jealousy or embarrassment), respectively. Each index is scored by dividing the number of cognitive or affective terms by the total number of propositions in narratives 1 through 7, such that higher scores indicate a higher frequency of spontaneous mental state terms. The salience index measures the ability to impose a social interpretation on the animations and is scored by dividing the number of social elements identified (from a total of 20 pre-identified themes) by 20. Higher salience index scores indicate a higher frequency of spontaneous social attribution. The person index measures the capacity to create personality attributions about the shapes and is scored hierarchically based on level of sophistication using an ordinal scale, wherein 0 represents no personality attributions (e.g., describes the shapes in physical terms only) and 6 represents attributions of sophisticated psychological characteristics. Finally, the problem-solving index measures the ability to answer explicit questions about the animation. It is scored by dividing the number of correct responses (from a total of 10 items) by 10, with higher scores representing an increased ability to make salient social attributions when asked about the social nature of the animation.
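The four ratio-based SAT indices described above can be illustrated with a short sketch. This is not the coding scheme of Klin (2000): the names `SatCounts` and `sat_ratio_indices` and the example tallies are hypothetical, and the sketch assumes that a trained coder has already produced the counts (including the removal of non-pertinent propositions). The ordinal person index (0 to 6) is assigned by the coder and is not computed here.

```python
from dataclasses import dataclass

N_SALIENCE_THEMES = 20   # pre-identified social elements (from the text)
N_PROBLEM_ITEMS = 10     # explicit questions scored for problem-solving (from the text)


@dataclass
class SatCounts:
    """Hypothetical per-child tallies produced by a human coder."""
    propositions: int      # total propositions in narratives 1 through 7
    cognitive_terms: int   # cognitive mental state terms
    affective_terms: int   # affective mental state terms
    social_elements: int   # social elements identified, out of 20
    correct_answers: int   # correct explicit answers, out of 10


def sat_ratio_indices(c: SatCounts) -> dict:
    """Return the four ratio-based SAT indices as proportions."""
    if c.propositions <= 0:
        raise ValueError("at least one proposition is needed to form a ratio")
    return {
        "cognition": c.cognitive_terms / c.propositions,
        "affect": c.affective_terms / c.propositions,
        "salience": c.social_elements / N_SALIENCE_THEMES,
        "problem_solving": c.correct_answers / N_PROBLEM_ITEMS,
    }


# Made-up tallies for illustration only; not data from the study.
example = SatCounts(propositions=40, cognitive_terms=6, affective_terms=4,
                    social_elements=12, correct_answers=7)
print(sat_ratio_indices(example))
```

Because the cognition and affect indices share the same denominator, a child who produces few propositions can obtain a high index from only one or two mental state terms, which is one reason to inspect the total number of propositions alongside the indices.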

To maximize inter-rater reliability (Klin 2000): (1) three coders were trained on SAT scoring before coding the transcripts included in this study, and frequent meetings were held to learn explicit scoring guidelines and clarify coding issues, (2) coders followed a procedural sequence for coding each transcript, and (3) examples of frequent terms encountered in SAT narratives were included in the manual with their corresponding codes. Twenty-nine transcripts (48%) were coded by two different coders. Intraclass correlation coefficients for SAT index scores and the total number of propositions indicated moderate to excellent reliability (see Table 2). A summary of the descriptive statistics for ToM measures and simple bivariate correlations between measures is shown in Table 3.

Table 3 Descriptive statistics and bivariate correlations among ToM variables

Measures of Social Symptoms and Social Function

The ADOS-2 Module 3 provided a measure of ASD social symptom severity using revised algorithms that included items from the Social Affect scale (Hus et al. 2014). Calibrated severity scores are derived from these items. Higher scores reflect more severe ASD-related symptoms adjusted for ADOS-2 module and age.

The Vineland-2 (Sparrow et al. 2005) interview was administered to caregivers by a clinician and determines a child’s current level of adaptive functioning with normative scores relative to same-aged peers. The Socialization domain captures interpersonal relationships, play and leisure time, and coping skills. Higher scores reflect better social functioning.

Analytic Approach

Data were analyzed using Structural Equation Modeling (SEM) to assess both measurement and structural models (Bentler 2007; Jöreskog 1973). SEM employs a simultaneous equation approach to define unobservable constructs using observed behaviors (measured using items, exercises, etc.). The goal of estimation is to minimize the discrepancy between the model-implied population variance–covariance matrix [Σ(θ)] and the sample’s variance–covariance matrix [Σ]. Model fit is evaluated by means of an omnibus asymptotic Chi square test of the hypothesis that Σ(θ) = Σ, with good model fit reflected by a non-significant Chi square value. Given power-related concerns (excessive power given large samples), additional fit-related information comes from absolute, incremental, and parsimonious fit indices such as the Comparative Fit Index (CFI), the Normed Fit Index (NFI), and the Adjusted Goodness of Fit Index (AGFI), for which acceptable values are greater than 0.95 (Hu and Bentler 1999). Values of the root mean square error of approximation (RMSEA), a residual-based index, below 0.05 are indicative of excellent fit, whereas values between 0.05 and 0.08 indicate acceptable model fit. Evaluation of the residuals provides the most unbiased estimate of model fit (MacCallum et al. 1996).
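To make the fit indices concrete, the following sketch computes the RMSEA and the CFI from Chi square values using their common textbook definitions. It is an illustration under stated assumptions rather than the computation performed by the study's software: the function names are hypothetical, the input values are made up, and some programs divide by N rather than N − 1 in the RMSEA.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation.

    Uses the common form sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model, 0.0)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base


# Illustrative (made-up) values, not results from the study.
print(round(rmsea(chi2=30.0, df=20, n=60), 3))
print(round(cfi(30.0, 20, 150.0, 28), 3))
# A Chi square smaller than its degrees of freedom gives RMSEA = 0.
print(rmsea(chi2=15.0, df=20, n=60))
```

Under these definitions, whenever the model Chi square does not exceed its degrees of freedom, both indices take their best possible values (an RMSEA of 0 and a CFI of 1).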

Power for the measurement and structural models was evaluated using a Monte Carlo simulation (see Footnote 3) and the recommendations of Muthén and Muthén (2006) using Mplus 8.0. For the measurement model, latent variables were standardized with a mean of zero and variance equal to one. Factor loadings were specified at 0.50 and residual variances at 0.75. The between-factor correlation was posited at 0.50. Results using 1,000 replicated samples of n = 60 participants indicated that power for identifying significant factor loadings equal to or greater than 0.50 ranged between 87.1% and 90.9%. Coverage ranged between 93.2% and 93.9%. Structural paths were specified to a standardized value of 0.50 as per Cohen’s (1992) conventions for a medium effect size. Power for those structural paths was 84.3%. Thus, the sample size of 60 participants would suffice to identify both the proper measurement model and clinically important structural paths. These power-related findings agreed with a recent simulation in which power and parameter stability were observed with SEM models having 50–70 participants (Sideridis et al. 2014). In addition to the use of inferential statistics, the significance of structural paths was further verified by simulating the population distribution of a slope, drawing 10,000 replicated b-values from the mean and variance estimates of the sample. Evidence of significant effects would be manifested with point estimates and 95% confidence intervals that do not contain the value of zero. These findings are described in the section “Structural Model.”
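The logic of a Monte Carlo power analysis can be sketched with a deliberately simplified analogue. The code below is not the Mplus latent-variable simulation described above: it assumes a single observed predictor and outcome with a standardized slope of 0.50, tests each replication with a Fisher z-test, and uses a hypothetical function name and seed. Because it ignores measurement error, its estimate should not be expected to match the power values reported for the latent models and will generally come out higher.

```python
import math
import random


def simulate_power(beta: float = 0.50, n: int = 60, reps: int = 1000,
                   seed: int = 1) -> float:
    """Share of replications in which a standardized slope is detected.

    Each replication draws n pairs with population correlation `beta`
    and applies a two-sided Fisher z-test at alpha = .05.
    """
    rng = random.Random(seed)
    resid_sd = math.sqrt(1.0 - beta ** 2)   # keeps the outcome variance at 1
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [beta * x + rng.gauss(0.0, resid_sd) for x in xs]
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        r = sxy / math.sqrt(sxx * syy)          # sample correlation
        z = math.atanh(r) * math.sqrt(n - 3)    # Fisher z statistic
        hits += abs(z) > 1.96
    return hits / reps


print(f"estimated power: {simulate_power():.3f}")
```

Setting `beta=0.0` turns the same routine into a check of the false-positive rate, which should stay near the nominal 5% level.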

Results

Measurement Model

A two-factor correlated model was posited to assess the cognitive and affective domains of ToM. As shown in Fig. 1, this model provided good fit to the data. The RMSEA (Steiger and Lind 1980) was less than 0.01, and descriptive fit indices, such as the CFI and TLI, had values equal to unity (Bentler 1990). Furthermore, the omnibus Chi square test was not significant, suggesting “exact fit” of the data to the model (MacCallum et al. 1996). The correlation between cognitive and affective domains was 0.6, suggesting good convergence but also the presence of discriminant construct validity. More on the model’s fit is shown in Appendix 2 by use of absolute and incremental fit indices. This model was contrasted to a model positing a unidimensional structure. Using a Chi square difference test, results pointed to the superiority of the 2-factor correlated model over its unidimensional counterpart [ΔChi-square(1) = 17.379 (see Footnote 4), p < .001], with the latter not fitting the data well (e.g., RMSEA = 0.157, CFI = 0.861). A second alternative involved contrasting the 2-factor correlated solution to a 3-factor model in which a domain of spontaneous ToM was included. This model did not provide a good fit to the data, as the RMSEA (0.137) was far beyond the 0.05 cutoff value, fit indices were unacceptable (CFI = 0.755, TLI = 0.633), and two factor loadings were not significant for the spontaneous ToM factor. A Chi square difference test again pointed to the superiority of the 2-factor correlated solution, as the 3-factor model fit significantly worse [ΔChi-square(19) = 48.316 (see Footnote 5), p < .001].
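The p-values attached to the two Chi square difference statistics can be checked with a short standard-library sketch. It assumes an ordinary (unscaled) difference test referred to a Chi square distribution with the stated degrees of freedom; the footnotes attached to these statistics are not reproduced in this section, so any adjustment they describe is not modeled. The helper name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a Chi square variate with integer df.

    Builds the regularized upper incomplete gamma function Q(df/2, x/2)
    from Q(1/2, t) = erfc(sqrt(t)) or Q(1, t) = exp(-t) using the
    recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / gamma(a + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2:                      # odd df: start the recurrence at a = 1/2
        a, q = 0.5, math.erfc(math.sqrt(t))
    else:                           # even df: start the recurrence at a = 1
        a, q = 1.0, math.exp(-t)
    while a < df / 2.0:
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Difference statistics and degrees of freedom as reported in the text.
for delta, df in [(17.379, 1), (48.316, 19)]:
    p = chi2_sf(delta, df)
    print(f"delta Chi-square({df}) = {delta}: p < .001 is {p < 0.001}")
```

Both comparisons fall below the .001 threshold under this assumption, in line with the reported significance levels.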

Fig. 1 Two-factor correlated solution for the measurement of cognitive and affective ToM. Residual values (1 minus the squared factor loadings) are not shown for parsimony

The roles of age and verbal IQ were assessed by relating both variables with the latent factors of cognitive and affective ToM. Both covariates exerted null effects on ToM. Specifically, affective ToM was unrelated to age (r = .306, p = .19) and verbal IQ (r = −.133, p = .50). Similarly, cognitive ToM was unrelated to age (r = −.088, p = .39) and verbal IQ (r = −.110, p = .37).

Structural Model

Following identification of the optimal simple structure for the measurement of cognitive and affective ToM, a structural model was posited in which social outcomes, including social symptoms (ADOS-2 Social Affect calibrated severity score) and functioning (Vineland-2 Socialization score), were predicted by cognitive and affective ToM (see Fig. 2). Results indicated that social symptoms were predicted significantly by affective ToM (b = −.346, p < .05). No other structural path reached conventional levels of significance. For the significant structural path, bootstrapping involved simulating the population distribution of b-values. Using 10,000 replicated paths, results indicated that the bias between the population and sample estimates was equal to 0.002. Furthermore, the 95% confidence interval did not contain zero, suggesting that the structural path between ADOS-2 and affective ToM was different from zero (see Fig. 3). Better affective ToM performance predicted less severe social symptoms.
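The simulation of the slope's population distribution can be sketched as follows. This is an illustration, not the Mplus procedure used in the study: it takes the affective ToM path to be −.346, draws values from a normal distribution, and uses a standard error of 0.15 that is purely hypothetical because the text does not report one. The function name and seed are likewise assumptions.

```python
import random
import statistics


def simulate_slope(b: float, se: float, draws: int = 10_000, seed: int = 1):
    """Draw slope values from Normal(b, se) and summarize them.

    Returns (bias, lower, upper): the mean of the draws minus b, and the
    approximate 2.5th and 97.5th percentiles of the simulated values.
    """
    rng = random.Random(seed)
    sims = sorted(rng.gauss(b, se) for _ in range(draws))
    bias = statistics.fmean(sims) - b
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    return bias, lower, upper


# b is taken from the text; se = 0.15 is a HYPOTHETICAL value for illustration.
bias, lower, upper = simulate_slope(b=-0.346, se=0.15)
print(f"bias = {bias:.4f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
print("interval excludes zero:", upper < 0 or lower > 0)
```

With a larger assumed standard error the interval would widen and could include zero, so the conclusion depends on the variance estimate supplied by the fitted model.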

Fig. 2 Structural equation modeling (SEM) predicting the cognitive and affective aspects of ToM using the Vineland-2 and ADOS-2 instruments. A (*) indicates two-tailed significance at p < .05

Fig. 3 Simulated population distribution of the path coefficient between affective ToM and the ADOS-2, based on 10,000 replicated values drawn using the mean and variance estimates from the structural model. The value of zero was not included in the 95% confidence interval of the population distribution, suggesting that the path was significantly different from zero, agreeing with the inferential statistical finding (i.e., p < .05). The simulation was run using Mplus 8.0

Discussion

The present study examined individual differences in ToM in a large sample of cognitively unimpaired school-aged children with ASD. Our ToM battery measured cognitive, affective, and spontaneous ToM with a variety of different task demands, including presentation of social information visually and verbally and with structured and open-ended questions. Our results demonstrate that affective ToM has explanatory power for understanding the phenotypic heterogeneity in social symptoms of ASD and that theoretical constructs within ToM matter.

The first aim of the present study was to examine the nature of the construct of ToM in school-aged children with ASD. We examined three potential theoretical distinctions within ToM: cognitive, affective, and spontaneous. A two-factor model fit the data well and provided support for latent cognitive and affective factors. As hypothesized, traditional false belief variables related across visual and verbal task demands. Likewise, affective ToM variables related across structured and open-ended task demands. However, the model that included a third latent factor of spontaneous ToM, assessed by asking open-ended questions about the ambiguous actions of shapes, was not supported. This suggests that spontaneous ToM may not be a specific theoretical type of ToM, but rather, a ToM methodology that is more consistent with real-life social attribution demands because of its open-ended and implicit nature. As with the SAT, spontaneous ToM tasks may contain both cognitive and affective elements. Taken together, our results identify two theoretically distinct types of ToM present in ASD, cognitive and affective, and demonstrate that they can be measured across structured, open-ended, visual, and verbal task demands. The present study indicates the importance of explicitly evaluating and describing affective and cognitive ToM components in ASD.

The other focus of the present study was to examine the explanatory power of ToM with respect to individual differences in social outcomes in ASD. Only affective ToM predicted social symptom severity. In contrast, cognitive ToM did not predict symptom severity, which supports the theory that individuals with ASD may use compensatory strategies to infer others’ cognitive mental states (Happé 1995; Senju 2012). The finding that only affective ToM predicts social symptom severity may explain the inconsistent findings in the ToM literature in ASD and has many theoretical and clinical implications. Our results highlight that affective ToM, which captures the advanced ToM factor of social reasoning about the emotional experiences of others (Osterhaus et al. 2016), is sensitive to subtle individual differences in social function. The SAT affect index measures spontaneous affective attributions to shapes in a video, while affective questions on the TOM Test assess understanding emotional situations, that certain stimuli lead to behavioral and emotional responses, and that the social context of situations leads to different emotions. Although Muris et al. (1999) call the scale “ToM precursors,” our results indicate that affective ToM remains elusive for some children with ASD and may represent a more clinically relevant type of ToM than cognitive ToM. This adds to the literature on emotion recognition deficits in ASD (e.g., Fridenson-Hayo et al. 2016; Trevisan and Birmingham 2016) by specifying that children with worse social impairment have a particular challenge reasoning about affective components of mental states. It is also consistent with the theory that ToM challenges in ASD extend beyond mental state reasoning to include the fundamental processing of social stimuli (Klin et al. 2003). Clinically, these findings indicate that affective ToM assessments may help identify social challenges that contribute to symptom severity of children with high verbal reasoning skills who are able to perform well on more rule-bound, traditional false belief tasks.

Our results also indicate affective ToM difficulties are uniquely predictive of social symptoms that are characteristic of ASD, but not social functioning more broadly. This suggests that individual differences in ToM are most closely tied to the specific social challenges associated with ASD, rather than general social skills as would be expected for children without ASD. In contrast to the argument that social cognition contributes to social functioning (Beauchamp and Anderson 2010), these results support the model within the ASD literature that suggests social cognition contributes specifically to symptoms (Frith 1989). However, it is possible that laboratory-based ToM tasks are more likely to relate to a laboratory-based social measure, which may capture different social behaviors than assessments asking parents to report on enduring real-world social behaviors.

Several limitations of the present study should be addressed in future research. The ToM battery was designed to assess a wide range of ToM abilities, but floor and ceiling effects may have constricted the range and could dampen true effects for some measures. Yet, the variability in performance across the entire battery is also a strength, as the battery captured different components of ToM that vary in level of difficulty for children with ASD. Future work will benefit from the use of batteries that include an even wider range of measures and task demands in order to capture individual differences and replicate our finding of affective and cognitive factors within ToM among school-aged children with ASD. In addition, given that the two processes of emotion perception and ToM are conceptually linked but poorly understood (Mitchell and Phillips 2015), future studies that examine both affective ToM and basic emotion processing skills in children with ASD will shed light on the extent to which higher-level integration and inference of social-emotional information is impaired, beyond basic emotion perception. Future work is also needed to explore how other variables, such as motivation or executive functioning, contribute to ToM performance. Finally, research with a larger number of females and using a comparison group without ASD or with a clinical condition thought to have impaired ToM will provide useful information about the generalizability of our findings.

In sum, through use of a large battery of ToM measures, this study demonstrates that the emotional aspect of interpreting others’ mental states can be distinguished from traditional cognitive aspects of ToM and uniquely predicts phenotypic heterogeneity in ASD. Finding two factors within ToM provides valuable theoretical insight into the ToM profile of school-aged children with ASD. Furthermore, to our knowledge, this is the first demonstration that reasoning about the affective component of others’ ambiguous mental states and social scenarios uniquely predicts social symptom severity in ASD. These results specify the centrality of affective social reasoning as a critical component of ToM in school-aged children with ASD. Clinically, it may be important to develop assessments and treatments that target affective ToM difficulties and other related challenges with reasoning about the social and emotional world.