Introduction

Autism Spectrum Conditions (ASC) are neurodevelopmental conditions characterized by cognitive and behavioural difficulties in communication and social interaction (American Psychiatric Association 1994; World Health Organisation 1994). A fundamental part of these difficulties is a delay in the ability to recognize emotions and mental states in others (Baron-Cohen 1995; Hobson 1994).

Emotion recognition difficulties in ASC have been identified in cognitive, behavioural and neuroimaging studies, and across different sensory modalities (Frith and Hill 2004). Most of these studies have focused on the recognition of the six basic emotions: happiness, sadness, fear, anger, surprise and disgust (Ekman 1993; Ekman and Friesen 1971). Studies assessing recognition of these emotions report inconclusive findings in children with ASC: Some studies report difficulties in recognition of basic emotions from facial expressions, from voice recordings, and from the matching of stimuli across the two modalities (Celani et al. 1999; Deruelle et al. 2004; Hobson 1986a, b; Loveland et al. 1995; Macdonald et al. 1989; Yirmiya et al. 1992). Other studies have found no such difficulties (Baron-Cohen et al. 1993; Boucher et al. 2000; Grossman et al. 2000; Loveland et al. 1997).

In contrast, studies investigating recognition of complex emotions and mental states by children with ASC have shown stronger and more consistent evidence of emotion recognition difficulties. Generally, complex emotions involve attributing a cognitive state as well as an emotion, and are more context and culture dependent (Griffiths 1997). Studies have reported that children with ASC have deficits in complex emotion and mental state recognition from photos of eyes (Baron-Cohen et al. 2001), from facial expressions and short voice recordings (Capps et al. 1992; Yirmiya et al. 1992), from pictures (Bauminger 2004) and from linguistic contextual cues (Baron-Cohen et al. 1999a; Happe 1994). However, since judging complex emotions may require integrating multimodal information, including semantic information, prosody, and nonverbal visual cues (body postures and facial expressions), into a coherent holistic picture (Herba and Phillips 2004), the difficulties found in these studies may reflect these tasks' reliance on partial information presented through one perceptual channel at a time.

Multi-modal, context-rich, ecologically valid complex emotion recognition assessments have received less empirical attention. While a few studies have reported using ecologically valid complex emotion recognition tasks for adults with ASC (Dziobek et al. 2006; Golan et al. 2006b; Heavey et al. 2000), no such tasks have been reported for children. Studies of complex emotion recognition in children with ASC have mostly focused on specific emotions, such as jealousy (Bauminger 2004), embarrassment (Hillier and Allinson 2002), or pride (Yirmiya et al. 1992).

In the current study we present a multi-modal, ecologically valid task which assesses recognition of a wide variety of complex emotions and mental states. This task, the child version of the ‘Reading the Mind in Films’ task (RMF-C), uses 22 short ecologically valid scenes, taken from feature films. The scenes include visual input (facial expressions, body language, action), auditory input (prosody, verbal content) and context, thus making it more ecologically valid than tasks testing each modality separately. The emotions and mental states selected vary in valence, intensity and complexity. They were chosen for their relevance to everyday social interaction.

Prediction of the performance of children with ASC on this multi-modal, complex emotion recognition task is based on two theories: The weak central coherence theory of autism holds that individuals with ASC focus on details and have difficulty integrating them into a coherent whole (Frith 2003), and therefore predicts that children with ASC should find the RMF-C task hard. The mindblindness/empathy theory (Baron-Cohen 1995, 2003) predicts that since individuals with ASC have mentalizing difficulties, children with ASC will score lower on the RMF-C compared to matched controls from the general population. The RMF-C can quantify the deficit predicted by both theories, acknowledging that it cannot distinguish between these alternative theories should any deficit be found.

In addition, we tested whether RMF-C task scores correlate with several factors: (1) Age was hypothesized to correlate positively with task scores, since emotion recognition tends to improve with age (Harris 1989; Herba and Phillips 2004). (2) Correlations of task score with item length and with the number of characters appearing in each item were calculated to test whether difficulties on the task were due to poor working memory. (3) Correlation with IQ was tested, since previous studies have found positive correlations between IQ and emotion and mental state understanding (Dyck et al. 2001; Fein et al. 1992; Hobson 1986a; Steele et al. 2003). We hypothesized a positive association between task scores and verbal IQ, given that the task involves matching a mental state word to a character's action. (4) Finally, correlations with other complex emotion recognition tasks (from faces and from voices separately) were tested, to validate the RMF-C task.

Method

Participants

Two groups took part in the study: The ASC group comprised 23 children (22 boys and 1 girl), aged 8.3–11.8 years (M = 10.0, SD = 1.1). All participants had been diagnosed with Asperger Syndrome (AS) or High Functioning Autism (HFA) in specialist centres using established criteria (American Psychiatric Association 1994; World Health Organisation 1994). They were recruited from a volunteer database and a local clinic for children with ASC.

A control group from the general population was matched to the clinical group. It comprised 24 children (23 boys and 1 girl), aged 8.2–12.1 years (M = 10.1, SD = 1.2), recruited from a local primary school. All participants were given the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler 1999) and scored above 75 on both the verbal and performance scales. To screen for autism spectrum conditions, participants' parents filled in the Childhood Asperger Syndrome Test (CAST) (Scott et al. 2002; Williams et al. 2005). None of the control participants and all but one of the participants with ASC scored above the cut-off point of 15. The remaining participant with ASC scored 11 on the CAST, possibly because several items were left unanswered; he was therefore not excluded from the study. The two groups were matched on sex, age, verbal IQ, and performance IQ. The groups' background data appear in Table 1.

Table 1 Means, standard deviations and ranges of CAST, chronological age and WASI scores for the ASC group and the control group

Instruments

The ‘Reading the Mind in Films’ Task—Child Version (RMF-C): Task Development

Twenty-seven short scenes (6–30 s long, M = 16.5, SD = 7.3) were sampled from four children's feature films by two of the authors. All films were rated 'Universal' (appropriate for children from 4 years of age) by the British Board of Film Classification. The selected scenes involved socio-emotional interaction between one and four characters and the expression of complex emotions and mental states (e.g. relieved, guilty, lonely). In each scene, a protagonist was identified and their emotion or mental state at the end of the scene was labelled. Next, three foils were selected for each item. To keep the foils' verbal difficulty similar to that of the target, an emotion taxonomy was used (Baron-Cohen et al. 2004) (www.jkp.com/mindreading). The taxonomy comprises 412 emotions and mental states, each assigned to one of six developmental levels. Foils were drawn from the same level as the target or from easier levels, and were chosen so that they matched some but not all of the emotional information in the scene, e.g. matching the content of an utterance but not its intonation or context. The labels and foils were then reviewed by two independent judges, and two scenes were removed at this point due to disagreement between the judges. A handout with definitions of all target and foil words included in the items was prepared for participants to use before and during the task.

The items were then played to 16 typically developing children: two girls and two boys from each of four age groups (8, 9, 10 and 11 years of age). Children were randomly selected from a local mainstream school. Teachers confirmed that the selected children had no learning difficulties, and parents confirmed that no family members had been diagnosed with autism spectrum conditions. The items were played to the children on two laptop computers, using DMDX experimental software (Forster and Forster 2003). Items were mixed so that no two adjacent items came from the same film, and within each film the order of the scenes presented was reversed, so that scenes from the end of the film were played first. This was done to prevent participants from using the plot as a contextual cue when answering the items. Every item was preceded by the question 'At the end of the scene, what is the woman/man/girl/boy feeling?', followed by a choice of four emotion labels. To avoid confounds due to reading difficulties, the experimenter read each question and the possible answers to the children and made sure they were familiar with all the words before playing the scene. The question and possible answers appeared again at the end of the scene, and participants were asked to press a number from 1 to 4 to choose the answer which best described how the protagonist was feeling. Figure 1 shows an example of an item from the task. The scene depicts a young woman rushing to the post office and knocking at the door. An elderly woman opens the door, and we see her face as the young woman says 'I'm sorry, I know you're closed'. Participants were asked to choose the answer that best described how the elderly woman felt, with unfriendly being the target emotion.
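The ordering constraints described above (within-film scene order reversed, and no two adjacent items from the same film) can be illustrated with a short sketch. This is not the authors' actual procedure; it is one hypothetical way of producing such an ordering in Python, assuming items are represented as (film, scene position) pairs.

```python
def order_items(items):
    """items: list of (film_id, scene_position) pairs.
    Returns an ordering in which each film's scenes appear in reverse plot
    order and no two adjacent items come from the same film (greedy
    interleaving). Illustrative only, not the authors' actual procedure."""
    # Reverse the within-film scene order
    by_film = {}
    for film, position in items:
        by_film.setdefault(film, []).append((film, position))
    for scenes in by_film.values():
        scenes.sort(key=lambda s: s[1], reverse=True)

    # Greedily draw from the film with the most scenes left, avoiding the
    # film used for the previous item where possible
    ordered, last_film = [], None
    while any(by_film.values()):
        candidates = [f for f, scenes in by_film.items()
                      if scenes and f != last_film] or \
                     [f for f, scenes in by_film.items() if scenes]
        film = max(candidates, key=lambda f: len(by_film[f]))
        ordered.append(by_film[film].pop(0))
        last_film = film
    return ordered
```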

Fig. 1
figure 1

An item from the child version of ‘Reading the Mind in Films’, showing only one frame out of the full clip. Screenshot taken from Anne of Green Gables - The Sequel (1987), Courtesy of http://www.sullivanmovies.com

Next, an item analysis was carried out. Items were included if the target answer was picked by at least half of the participants, and if no foil was selected by more than a third of the participants (p < .05, binomial test). Items which failed to meet these criteria were assigned new foils and played to a different group of 16 children. Three items which still did not meet the criteria after this second round of validation were excluded from the final task, which comprised 22 items. Task scores could therefore range from 0 to 22.
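As an illustration of these inclusion criteria, the following minimal sketch (Python with scipy, an assumption, since the original analysis software is not reported) applies one plausible reading of the rules to a single item's response counts from a 16-child validation round. The function name and example counts are hypothetical.

```python
from scipy.stats import binomtest

N_PILOT = 16      # children in each validation round
CHANCE = 0.25     # four answer options per item

def item_passes(target_count: int, foil_counts: list[int]) -> bool:
    """One plausible reading of the inclusion criteria: the target must be
    chosen by at least half of the children and significantly more often
    than the 25% chance level (binomial test), and no single foil may be
    chosen by more than a third of the children."""
    above_chance = binomtest(target_count, N_PILOT, CHANCE,
                             alternative="greater").pvalue < .05
    return (target_count >= N_PILOT / 2
            and above_chance
            and all(count <= N_PILOT / 3 for count in foil_counts))

# Example: 9 of 16 children chose the target; the foils drew 4, 2 and 1 choices.
print(item_passes(9, [4, 2, 1]))   # True under these criteria
```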

The Cambridge Mindreading Face-Voice Battery—Children Version (CAM-C) (Golan & Baron-Cohen, submitted)

This battery tests recognition of nine complex emotions and mental states using two unimodal tasks: a face task, comprising silent video clips of child and adult actors expressing the emotions on their faces; and a voice task, comprising recordings of short sentences spoken with various emotional intonations. The battery provides an overall facial and an overall vocal emotion recognition score, as well as the number of basic and complex emotions and the overall number of emotions correctly recognized. Children with ASC scored significantly lower than controls on overall face and voice scores, and recognized fewer emotions (Golan and Baron-Cohen, submitted). Test–retest correlations, calculated for children with ASC, were r = .74 for the face task and r = .76 for the voice task.

Procedure

Participants with ASC were tested at the second meeting of an intervention study, in which they all served as controls (i.e., received no intervention). Typically developing controls were tested at a local school. The final version of the task was presented to participants on a laptop with a 15-inch screen, using DMDX experimental software. Headphones were used to aid auditory perception. An instructions slide and a practice item preceded the task. The experimenter read the instructions, as well as the questions and answer options of all items, with the participants and checked that they were familiar with the possible answers. When needed, the definition handout was used to familiarize participants with the answers. There was no time limit for answering each item. Completion of the task took about 30 min, including a short break in the middle. The CAM-C battery was completed by the participants at an earlier assessment meeting on a different day and took about 45 min to complete. The WASI was also administered during that earlier meeting. Participants' parents filled in the CAST in advance and brought it with them to the assessment.

Results

Task scores were calculated by counting the number of correct answers for each participant. All participants in the control group, and all but one of the participants in the ASC group, scored above chance (i.e. more than 9 correct out of 22; p < .05, binomial test) on the RMF-C task. No participant scored at ceiling. A review of the percentage of correct responses for the 22 items of the task (see Table 2) shows that no item was answered correctly by 100% of the participants in either group. As in the validation stage, targets were picked by more than 50% of the controls on all items. The proportion of correct responses to task items did not correlate significantly with the number of characters appearing in them (Spearman's r = −.06, n.s.) or with item length (Spearman's r = −.10, n.s.).
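The above-chance criterion can be checked directly: for 22 four-option items, the short sketch below (Python, illustrative) finds the smallest score whose one-tailed probability under pure guessing falls below .05, which should reproduce the 'more than 9 correct' cut-off used above.

```python
from scipy.stats import binom

N_ITEMS, CHANCE = 22, 0.25

# Smallest score k whose one-tailed probability under guessing is below .05
for k in range(N_ITEMS + 1):
    p_tail = binom.sf(k - 1, N_ITEMS, CHANCE)   # P(X >= k)
    if p_tail < .05:
        print(f"Scores of {k} or more exceed chance (p = {p_tail:.3f})")
        break
```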

Table 2 The 22 target emotions and mental states included in the RMF-C task, and the percent of participants in each group who selected them

An analysis of covariance (ANCOVA) was performed on the RMF-C task score, with group as the independent variable and verbal IQ, performance IQ, and age as covariates. A significant main effect of group was found (F[1,42] = 33.73, p < .001), with the ASC group (M = 14.2, SD = 3.4) scoring significantly lower than the control group (M = 18.2, SD = 2.6).
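A minimal sketch of this ANCOVA in Python with statsmodels (an assumption about tooling; the data file and column names are hypothetical) is given below.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data file with one row per child and illustrative columns:
# rmfc (task score), group ('ASC'/'control'), viq, piq, age
df = pd.read_csv("rmfc_scores.csv")

# ANCOVA: effect of group on RMF-C score, with verbal IQ, performance IQ
# and age entered as covariates
model = ols("rmfc ~ C(group) + viq + piq + age", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # Type II sums of squares
```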

In addition to the group main effect, the ANCOVA yielded significant effects of verbal IQ (F[1,42] = 9.91, p < .005) and age (F[1,42] = 22.7, p < .001). Correlation analysis revealed a significant positive correlation between task scores and verbal IQ across groups (r = .47, p < .005), and similar correlations within each of the two groups (r = .52 for the ASC group and r = .53 for the control group, p < .02 for both). A positive correlation was also found between task scores and age across groups (r = .57, p < .001), with a tendency for a stronger correlation in the control group (r = .78, p < .001) than in the ASC group (r = .54, p < .01). To check whether the association of age with task scores held over and above verbal IQ, partial correlations were computed for the two groups, controlling for verbal IQ. The correlation did not change in the control group (r partial = .78, p < .001), but was somewhat weaker in the ASC group (r partial = .46, p < .05). Task scores were not significantly correlated with performance IQ (r = .20, n.s.).
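The partial correlations reported above can be reproduced with the residual method: regress both age and task score on verbal IQ and correlate the residuals. A brief, illustrative sketch follows (the function and variable names are assumptions).

```python
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, covar):
    """Pearson correlation between x and y after removing the linear
    effect of covar from both (residual method)."""
    x, y, covar = (np.asarray(v, dtype=float) for v in (x, y, covar))
    design = np.column_stack([np.ones_like(covar), covar])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearsonr(res_x, res_y)   # returns (r, p)

# e.g. within each group: partial_corr(age, rmfc_score, verbal_iq)
```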

The correlations of RMF-C task scores with CAM-C scores were positive for the face task (r = .68, p < .001), the voice task (r = .68, p < .001) and the number of CAM-C emotional concepts recognized (r = .63, p < .001). These correlations suggest an association between the ability to recognize complex emotions separately in faces and voices and the ability to recognize them when facial, vocal and contextual information is integrated. Power calculations for the task (two-tailed, with α = .01) revealed a power level of 1 − β = .961. A discriminant analysis was conducted, and the significant discriminant function (χ2[22] = 35.2, p < .05) correctly classified 87.2% of the participants (87% of the participants with ASC and 87.5% of the controls) into their original groups.
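The discriminant analysis corresponds to fitting a linear discriminant function to the 22 item responses and re-classifying participants into their original groups. The sketch below uses scikit-learn as an assumed tool, with a hypothetical data file and column names.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: one row per participant, columns item_1..item_22 (0/1
# correctness) plus a 'group' column ('ASC' or 'control').
df = pd.read_csv("rmfc_item_responses.csv")
X = df[[f"item_{i}" for i in range(1, 23)]]
y = df["group"]

lda = LinearDiscriminantAnalysis().fit(X, y)
# Proportion of participants re-classified into their original group
print(f"Correctly classified: {lda.score(X, y):.1%}")
```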

Discussion

This study reports the child version of the 'Reading the Mind in Films' task (RMF-C), a multi-modal, ecologically valid task for assessing recognition of complex emotions and mental states, using social scenes from films. Results show that high-functioning children with autism spectrum conditions (ASC) scored significantly lower than matched controls from the general population. This effect was not simply due to the association of task scores with verbal ability or with age. The task quantifies the difficulties children with ASC experience in recognizing complex emotions and mental states in everyday life.

The task has a wide score range in both groups, with no ceiling or floor effects. Power calculations and discriminant analysis showed the task is sensitive and has good discriminative validity, with more than 87% of the participants correctly allocated to their groups based solely on their task performance. RMF-C scores correlated significantly with other complex emotion recognition tasks, supporting the task's validity. Performance on the task was not correlated with the length of its items or the number of characters appearing in the scenes; hence, these measures gave no indication of a working memory confound.

The significant group difference in task scores replicates previous findings of difficulties among high-functioning children with ASC on complex emotion and mental state recognition tasks using visual, auditory and contextual stimuli (Baron-Cohen et al. 1999a; Baron-Cohen et al. 2001; Golan and Baron-Cohen, submitted; Happe 1994; Yirmiya et al. 1992). It also replicates findings from studies of adults with ASC, which found difficulties in recognition of complex emotions and mental states in ecologically valid film tasks (Golan et al. 2006b; Heavey et al. 2000; Klin et al. 2002a, b). To our knowledge, the RMF-C is the first ecologically valid task to test complex emotion recognition in children. As the results show, multi-modal information does not seem to help the recognition of complex emotions and mental states among children with ASC. Indeed, studies have shown that adding emotional cues in different channels does not help, and may even hamper, the ability of children and adults with ASC to recognize emotions and mental states (Hall et al. 2003; Pierce et al. 1997). Pierce et al. (1997) played social scenes with increasing numbers of emotional cues in different channels (prosody, verbal content, nonverbal, or nonverbal with object) to children with ASC and controls, and asked them whether this was a good way to make friends. Children with ASC performed as well as controls on scenes containing one cue, but performed more poorly on scenes containing multiple cues, suggesting that individuals with ASC fail to benefit from multimodal socio-emotional information. Results of the current study show a similar pattern across a variety of emotions and mental states.

As predicted, task scores were correlated with age, indicating that the ability to recognize complex emotions and mental states improves with age. However, this association was stronger for the control group than for children with ASC, even when controlling for verbal IQ. Typically developing children learn to recognize complex emotions and mental states through constant interaction with family members and peers (Denham 1998; Harris 1994; Jenkins and Astington 1996). The reduced levels of social interaction among children with ASC may to some extent account for their slower learning of complex emotion and mental state recognition. It will be important to examine whether this knowledge can be acquired through training.

The correlation of task scores with verbal IQ, which was predicted because of the task's verbal nature, suggests that participants with ASC use verbal cues to pick up the protagonist's mental state. Previous studies report that children and adults with ASC use verbal content as a way to compensate for their difficulties in noticing non-verbal socio-emotional cues in a situation (Adolphs et al. 2001; Golan et al. 2006a, b; Grossman et al. 2000; Kasari et al. 2001). As the following examples show, relying on language alone may result in misinterpretation of the protagonist's mental and emotional state.

A more detailed examination of the task reveals that some items show a greater group difference in the proportion of correct answers. For example, only 48% of the ASC group labelled the emotional state depicted in Fig. 1 as unfriendly, compared to 83% of controls. Twenty-two percent of the clinical group labelled the protagonist's emotion in this scene as sorry, whereas only 8% of the control group gave this label. Since the protagonist in this scene did not speak, participants who chose this distracter may have relied on what the other character said ('I'm sorry, I know you're closed') rather than on the context set up by this utterance and the protagonist's facial expression.

Another example was the mislabelling of a protagonist who was lying. The scene shows a father, mother and son having dinner together. The son wants to tell his mother about something he saw that day, but his father looks at him and then kicks him under the table. The son then reports seeing something dull, to which the father responds with false excitement. Participants were asked about the father's mental state at the end of the scene. In the control group, 79% of the participants spotted the son's initial enthusiasm, the father's disapproval and then his false excitement, which together give the impression that the father is trying to conceal some information from the mother. Only 35% of participants in the ASC group correctly labelled this scene, whereas 39% of them thought the father was genuinely interested. This item confirms the difficulties children with ASC have with understanding deception (Baron-Cohen 1992; Sodian and Frith 1992). The social relevance of such difficulties is clear. In her book, Claire Sainsbury, a woman with ASC, describes how she used to skip school on 'April Fool's Day', as she could not recognize the other children's tricks or understand their hidden intent, which left her the victim of their pranks (Sainsbury 2000). Recognizing deception remains challenging even for adults with ASC: in a recent study testing complex emotion and mental state recognition from faces and voices in adults with ASC, insincerity was the hardest mental state for participants to recognize (Golan et al. 2006a).

These two examples demonstrate a tendency among participants with ASC to process only part of the available information, both perceptually and temporally. Perceptually, participants with ASC did not integrate the visual, auditory and linguistic information to arrive at an answer, but rather focused on one channel only, the linguistic. Temporally, they used information that was usually proximal to the time the clip was stopped and the question was raised, instead of taking into account the entire development of the situation. This suggests that participants with ASC processed the surface information, perhaps because of difficulties accessing the full depth of the situation (Lawson 2003). Reference to 'depth' of course does not specify the nature of the information participants with ASC found difficult. As mentioned earlier, there are at least two possibilities. In terms of weak central coherence theory, they may have over-focused on the details, resulting in difficulty seeing the larger picture (Frith 2003; Plaisted et al. 2003). In terms of mindblindness/empathy theory, participants with ASC may have had specific difficulty in representing another person's mental state (Baron-Cohen 1995). Neurologically, the over-reliance on local detail may result from excessive local neural activation (i.e. within neural assemblies) and insufficient long-range connectivity between the functional regions that would support integration (Baron-Cohen and Belmonte 2005). The mindblindness difficulties may arise from underactivity in 'social brain' regions such as the amygdala, left medial prefrontal cortex, orbito-frontal cortex, anterior cingulate cortex, and insula (Baron-Cohen et al. 1994, 1999b, 2000; Frith and Frith 2006).

The examples above demonstrate some errors children with ASC make when interpreting others’ mental states. In real life situations these could lead to misinterpretation of an interaction or to inappropriate responses. These subtle differences highlight the importance of an ecologically valid assessment of socio-emotional understanding in high-functioning children with ASC, since they may be able to pass more basic emotion recognition tasks.

Further investigation is needed to reveal the strategies children with ASC use to interpret emotional and mental states. Our task employed a forced-choice paradigm, which may explain the relatively good performance of the ASC group. The use of open-ended descriptions of situations, as well as neuroimaging and eye-tracking techniques, could reveal the behavioural and neuropsychological aspects of the children's compensatory strategies. Using imaging and gaze-tracking techniques could also eliminate the need for the verbal component of the task, which had a significant effect on performance in the current study.

We conclude that the 'Reading the Mind in Films' task [Child Version] can quantify the complex emotion recognition skills that distinguish high-functioning children with ASC from controls in the general population. This task may be useful in intervention research, to monitor improvements in this skill, or to augment diagnostic assessments. Being standardized and validated, the task also lends itself to neuroimaging, gaze-tracking, and developmental research. It will be of interest to apply the task to clinical groups other than ASC (e.g. schizophrenia, conduct disorder, Williams syndrome) in order to establish the sensitivity and specificity of the instrument in detecting deficits, and perhaps even talents.