Introduction

In Kanner’s (1943) description of 11 children with inborn autistic disturbances of affective contact, 6 showed an interest in music, and since that time, musical abilities have been considered a relative strength in individuals with autism spectrum disorders (ASD) in comparison to their global profile (Applebaum et al. 1979; Rimland 1964). Above-average auditory processing abilities, including enhanced pitch discrimination (Bonnel et al. 2003; Heaton et al. 1999b) and increased sensitivity to alterations of single pitches in a melody (Heaton et al. 1999b; Mottron et al. 2000), have been observed in children and adolescents with ASD. Children with autism can identify single pitches more accurately and possess better long-term memory for pitches than typically developing children (Heaton et al. 1998, 2008b; Heaton 2003). Music and music therapy have also been shown to have a powerful and lasting effect on a variety of outcome measures among individuals with ASD, facilitating social interaction and general behavioral improvements (Kaplan and Steele 2005; Kim et al. 2008; Whipple 2004).

The communication of emotion is generally regarded as a primary purpose, intent, or effect of music (Juslin and Sloboda 2001; Meyer 1956). Hence, the evidence of preserved musical ability in ASD is intriguing because impairments in processing emotional information are characteristic of autism (Buitelaar et al. 1999; see Hobson 2005 for a review). The relationship between autism and emotions has been studied extensively (Hobson 2005), but few studies have focused on how individuals with ASD perceive emotions conveyed by music.

Children with autism have been shown to identify melodies in a major mode as happy and melodies in a minor mode as sad (Heaton et al. 1999a), a distinction that typically developing children can normally accomplish by 3 or 6 years of age depending on study design (Dalla Bella et al. 2001; Kastner and Crowder 1990). Children with ASD are more accurate than children with Down syndrome when asked to associate musical excerpts with visual representations of feelings: anger, fear, love, triumph, and contemplation (Heaton et al. 2008a). However, in that particular study, the effect of diagnosis failed to reach statistical significance when verbal mental age was accounted for. The authors therefore suggest that the emotional deficits present in ASD do not generalize to music and that understanding of music is limited by cognitive function, specifically verbal ability. A detailed comparison of the groups on each of the five emotions included in the study would have been informative; however, data were presented only with all five emotions collapsed together.

Parents report that affect elicited by music in their children with ASD persists for longer periods than is reported by parents of controls (Levitin et al. 2004). Children and adolescents with ASD and controls are equally distracted by music accompanying moving visual images (Bhatara et al. 2009). Adults with ASD respond to and appreciate music in a fashion similar to the typical listener, as indicated by a semi-structured interview (Allen et al. 2009). However, adults with ASD tend to describe the effects of music in terms of arousal or internally focused language (e.g. calm, tense) rather than emotional language (e.g. happy, sad).

Thus far, investigating emotion perception via tasks of identification, recognition, or categorization through music in ASD has yielded interesting results and warrants further exploration. In addition, employing musical stimuli addresses methodological concerns encountered in more traditional studies of emotion recognition. For one, it reduces reliance on verbal material. This is important because it has been suggested that individuals with ASD who have higher levels of cognitive functioning (indicated by higher verbal or nonverbal mental age or IQ) can adopt cognitive strategies enabling them to solve emotion recognition tasks successfully (Hobson 1986; Ozonoff et al. 1990; Teunisse and de Gelder 2001). Emotion research could also benefit from alternatives to the photographs and line drawings of faces or parts of faces often used as experimental stimuli. Behavioral and imaging data suggest that children and adults with ASD do not process facial cues, specifically the eyes, in the same way as typically developing individuals (Gross 2004; Pelphrey et al. 2002; Schultz et al. 2000; Spezio et al. 2007). This may cause differences in emotion recognition in ASD that are specific to faces but do not appear across other domains.

Studies in the auditory domain suggest that children with autism recognize basic emotions such as happiness, sadness, and anger in nonverbal vocalizations (Ozonoff et al. 1990) and that adults with Asperger syndrome can recognize these same emotions in the voice (O’Connor 2007). However, adults with high-functioning autism or Asperger syndrome are impaired at recognizing complex emotions or mental states portrayed by the voice, such as being hopeful, concerned, nervous, or embarrassed (Golan et al. 2007; Rutherford et al. 2002). Individuals with ASD have difficulty matching nonverbal vocalizations (Hobson 1986) and speech to videos (Loveland et al. 1995) or pictures (O’Connor 2007) of people displaying basic emotions. In addition, level of functioning (low vs. high), but not diagnosis (autism vs. control), distinguishes among children, adolescents, and young adults in their ability to recognize basic emotions conveyed in videos where verbal and/or nonverbal emotional cues are emphasized to varying degrees (Loveland et al. 1997).

Although most studies have found that, when verbal ability is considered, children with autism (Castelli 2005; Ozonoff et al. 1990) and adults with high-functioning autism (Baron-Cohen et al. 1997; Neumann et al. 2006) can recognize facial expressions of basic emotions, some studies report deficits in similar tasks (Baron-Cohen et al. 1993; Celani et al. 1999) that have been attributed to impaired recognition of one emotion in particular: fear (Pelphrey et al. 2002). For example, Teunisse and de Gelder (2001) presented faces which were morphed along a continuum from one emotion to another. They found that young adults with high-functioning autism and typically developing young adults discriminate happy versus sad similarly, but that they discriminate angry versus sad and angry versus afraid differently.

The amygdala is known to play a role in the processing of complex mental states including social emotions (Adolphs et al. 2002; Baron-Cohen et al. 1999) and in the recognition of fear in both facial expressions (Adolphs et al. 1994, 1995) and auditory stimuli (Scott et al. 1997). Adults with unilateral amygdala resection have difficulty recognizing scary and peaceful music (Gosselin et al. 2005; peaceful can be thought of as an antonym of scary, hence its use). Higher functioning adolescents and adults with ASD, like patients with amygdala damage, show difficulty identifying eye-gaze direction and facial expressions of fear (Howard et al. 2000) and trustworthiness (Adolphs et al. 2001). They also show an atypical pattern of fear acquisition (Gaigg and Bowler 2007). Although they exhibit appropriate electrodermal responses to distress cues, they show hyporesponsiveness to threatening stimuli (Blair 1999). Findings from event-related potential (ERP) studies demonstrate that 3–4 year olds with ASD do not exhibit the typical N300 and negative slow wave responses to fearful versus neutral faces (Dawson et al. 2004). Thus, Baron-Cohen et al. (2000) proposed the “amygdala theory of autism,” which posits that reduced functioning of the amygdala (hypoamygdalism) or of circuits including it contributes to social and emotional deficits and may explain abnormal responses to fear in autism. However, animal models (Amaral et al. 2003) and psychological assessments of a patient (S.M.) with bilateral amygdala damage (Tranel et al. 2006) suggest that amygdala damage impairs neither social behavior nor the range of affects and emotions, although it alters reactions to, or the sense of, fear, danger, and distrust. Specifically, S.M.’s recognition of scary music is impaired (Gosselin et al. 2007). Thus, Amaral et al. (2003) suggest that atypical amygdala functioning accounts primarily for increased anxieties in ASD rather than for social deficits per se.

As noted by Baron-Cohen et al. (2000), research on emotional responsiveness, including responsiveness to fear, represents an indirect way to test the amygdala theory of autism (the direct way requiring imaging techniques). This has fuelled many studies focusing on fear recognition in ASD, most of which have been conducted in the visual domain. The experimental task presented here seeks to assess emotion recognition in a different modality: music. To the authors’ knowledge, only one study to date (Heaton et al. 2008a) has included scary music in tasks of emotion recognition in ASD, but data for all the emotions presented were analyzed together, i.e. no separate analysis was carried out on recognition of scary music specifically.

Based on the experiments conducted by Gosselin et al. (2005, 2007), the present study explores the ability of individuals with ASD to recognize or categorize different emotions in music using a forced-choice experimental procedure and explores perception of emotional intensity in music. Happy, sad, scary, and peaceful music are employed so that results can be compared with those obtained in previous studies (Gosselin et al. 2005, 2007; Heaton et al. 1999a). It is hypothesized that children and adolescents with ASD will be able to recognize happy and sad music (per Heaton et al. 1999a). However, the amygdala theory of autism (Baron-Cohen et al. 2000) and studies of patients with damage to the amygdala (see Gosselin et al. 2005) suggest that individuals with ASD will exhibit difficulties recognizing scary and peaceful music and will provide lower ratings of emotional intensity in comparison to controls. In addition, self-awareness of emotion perception will be assessed through confidence ratings given by participants. This has not been studied extensively; therefore, confidence ratings are used here as exploratory measures. This will allow for a test of Hill et al.’s (2004) argument that, although individuals with ASD show some degree of insight, there seems to be a dissociation between what they think, feel, or experience and their descriptions of their thoughts, feelings, and experiences.

Method

Participants

Two groups (typically developing; ASD) composed of 26 participants each (52 participants in total) were included in the final sample retained for the analyses (see Table 1). Given the mean age of the participants (years: months; ASD: M = 13:7, SD = 1:1, range of 10:10 to 19:4; TD: M = 13:6, SD = 2:2, range of 9:11 to 17:9) and for brevity, we will refer to the participants collectively as adolescents rather than children and adolescents from this point on.

Table 1 Group characteristics of patients with ASD (N = 26) and typically developing (TD, N = 26) participants

Seventy-nine participants were initially recruited (46 typically developing adolescents, TD, and 33 high-functioning adolescents with ASD). Participants with ASD were then group-matched to participants with TD so that performance and full scale IQ scores (PIQ and FSIQ), obtained with the Wechsler Abbreviated Scale of Intelligence (WASI: Wechsler 1999), differed by less than one standard deviation between groups (see Table 1). However, a statistically significant difference remained between groups on verbal IQ (VIQ); consequently, ANCOVAs and stepwise linear regression analyses were used in the main analyses reported below to fully account for the impact of verbal ability on task performance. The final ASD group comprised 3 adolescents with autistic disorder, 13 with Asperger syndrome, and 10 with PDD-NOS. Independent samples t-tests confirmed no significant difference between groups on the following: (a) chronological age, (b) sequential auditory processing and auditory working memory, and (c) number of years of musical training and number of instruments played. Auditory processing and working memory (b above) were assessed using the Digit Span and Letter-Number Sequencing subtests of the Wechsler Intelligence Scale for Children, 4th edition (WISC-IV: Wechsler 2003) because the experimental task requires use of temporal auditory memory. Musical training and experience (c above) were assessed by combining information from the Salk and McGill Musical Inventory (SAMMI, Levitin et al. 2004) completed by parents and a semi-structured interview conducted with the participants (Queen’s University Music Questionnaire—Revised, based on Cuddy et al. 2005).
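For readers wishing to reproduce this type of matching check, the following is a minimal sketch in Python; scipy is assumed rather than the software used in the original analyses, and all values are hypothetical stand-ins, not the study’s data.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the matching variables reported in Table 1:
# chronological age (years) and Digit Span scaled scores, 26 per group.
asd = {"age": rng.normal(13.6, 1.1, 26), "digit_span": rng.normal(10, 3, 26)}
td = {"age": rng.normal(13.5, 2.2, 26), "digit_span": rng.normal(10, 3, 26)}

# Independent-samples t-tests, as used to confirm that the groups did not
# differ on age, auditory memory, and musical training variables.
for measure in asd:
    t, p = ttest_ind(asd[measure], td[measure])
    print(f"{measure}: t(50) = {t:.2f}, p = {p:.3f}")
```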

Participants with TD were recruited through word of mouth, and advertisements placed at the university and posted in four schools in Montreal (two elementary schools and two high schools). Participants with ASD were recruited through a specialized clinic for ASD at the Montreal Children’s Hospital (25 participants) and a Montreal school offering special education for children with ASD (8 participants). All participants with ASD had received a diagnosis by a specialized medical team (child psychiatrist, developmental psychologist, etc.) based on DSM-IV criteria.

Parents filled out the Social Responsiveness Scale (SRS; Constantino et al. 2003) and the Social Communication Questionnaire (SCQ; Rutter et al. 2003). This provided descriptive information (and converging evidence of the diagnosis) in the case of adolescents with ASD, and confirmed that participants with TD did not show signs of ASD. Before proceeding to group-matching, 3 adolescents originally classified as TD were excluded because they presented a neurodevelopmental or psychiatric disorder that interfered with testing (e.g., ADHD). None of the profiles for the remaining adolescents with TD indicated the possibility of ASD or other disorders. Two participants with ASD among the 26 retained for analyses did not meet the cut-off for ASD on the SCQ (a raw score of 15) but scored in the ASD range on the SRS and were therefore retained in the analyses.

Stimulus Creation and Experimental Procedure

A musical task was created by the present authors to assess recognition or categorization of emotions, combining the methodologies described by Gosselin et al. (2005, 2007) and Rapport et al. (2002). This task was validated with 20 participants (19–39 years old; 7 men and 13 women) recruited through word of mouth and advertisements placed at the university in order to evaluate how healthy adults respond to the task and to infer probable reactions in adolescents with typical development and ASD. The validation study included 28 music clips that had previously been used in the last author’s laboratory to target happiness, sadness, scariness, and peacefulness. Of those, the 20 music clips (five for each target emotion) for which the highest agreement was obtained were retained for the experiment (see “Appendix 1”).

Before testing, participants with TD and ASD were asked to identify the emotions depicted by 4 faces (without labels) and positive feedback or correction was given. The 4 faces were then presented with the appropriate label (happy, sad, scared, and peaceful). During testing, a “judgment screen” appeared on the computer 7 s after the beginning of exposure to the music clip. Participants chose which of the 4 faces coupled with the appropriate label best described the music clip. There was only one “judgment screen”; the same 4 faces and labels were presented for each musical stimulus (see “Appendix 2”). These 4 emotions were employed in order to present the same emotions that Gosselin et al. (2005, 2007) had focused on for patients with damage to the amygdala. A forced-choice procedure was used per previous research (Juslin and Sloboda 2001) and thus the experimental task will be referred to from this point on as a task of emotion recognition.

To avoid possible stimulus or participant biases due to factors such as age, gender, and ethnicity, line drawings of faces were used instead of photographs (visual stimuli adapted from Hess et al. 2004, 2005). Line drawings have also been successfully employed in past experiments with individuals with ASD (Heaton et al. 1999a). The order of presentation of the music clips was randomized for each participant. Responses were considered “correct” (and assigned a score of 1) if the emotion selected by the participant corresponded to the “intended emotion,” and “incorrect” (and assigned a score of 0) if it did not. The maximum possible score was 20 in total and 5 for each emotion. Participants also rated how intensely the music clip conveyed the selected emotion by using the computer mouse to move a slider along a 32-cm continuous scale ranging from slightly intense (0) to very intense (1). On an identical slider scale, participants rated how confident they were that they had correctly recognized the emotion, from not at all confident (0) to very confident (1). Response times to select an emotion were also recorded; the response times reported do not include the 7 s listening period.
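The scoring scheme can be made concrete with a short sketch (hypothetical responses for a single participant; pandas is assumed, and none of these values come from the study):

```python
import pandas as pd

# Hypothetical trial-level responses for one participant: 20 clips (5 per
# intended emotion) and the emotion the participant selected on each trial.
trials = pd.DataFrame({
    "intended": ["happy"] * 5 + ["sad"] * 5 + ["scared"] * 5 + ["peaceful"] * 5,
    "selected": ["happy"] * 5 + ["sad"] * 4 + ["happy"]
                + ["scared"] * 5 + ["peaceful"] * 3 + ["happy", "sad"],
})

# A response scores 1 when the selected emotion matches the intended one,
# 0 otherwise (max 5 per emotion, max 20 in total).
trials["score"] = (trials["intended"] == trials["selected"]).astype(int)

per_emotion = trials.groupby("intended")["score"].sum()
total = int(trials["score"].sum())
print(per_emotion)
print("total =", total)
```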

Participants heard the music clips through stereo loudspeakers (Advent Powered Partners 570, Audiovox Electronics Corporation, New York, 2004) connected to a laptop computer (Powerbook G4 15’’, Apple Computer Inc., 2006). The task was programmed in PsiExp (for the graphics, Smith 1995) and MaxMSPRunTime (for the sounds, Cycling 74/IRCAM, 2005). The music clips’ durations were between 30 and 50 s. Equal subjective loudness was obtained by collecting loudness matching judgments from 5 trained listeners in a separate validation study, a method more reliable than using acoustical or electrical measurements for time-varying stimuli such as music (Caclin et al. 2005; Marozeau et al. 2003). Each musical clip had a sound pressure level varying between 50 and 70 dB A-weighted at the position of the participants’ ears as measured with a Brüel and Kjær 2203 sound level meter fitted with a Brüel and Kjær 4144 omnidirectional microphone (Brüel and Kjær, Naerum, Denmark).

Participants were tested individually in a soundproof room. The emotion recognition task lasted between 10 and 20 min and was included as part of a larger experimental protocol that lasted approximately 3 h. Informed consent was obtained from parents and participants. Participants and parents were debriefed at the end of each session. Participants received a $20 gift certificate for a local music store as compensation and parking was also paid for. The research received ethical approval from both McGill University and McGill University Health Center Research Ethics Boards.

Statistical Analyses

Repeated measures ANOVA and ANCOVAs were performed to assess performance on the emotion recognition task; the role of VIQ in task performance was accounted for by the ANCOVAs. Linear regressions and ANOVAs assessed various predictors of task performance, including VIQ and gender. These analyses were repeated to assess recognition of each emotion, with a specific interest in recognition of scary and peaceful music given the amygdala theory of autism. This led to an analysis of patterns of confusion, i.e. which emotion was selected instead of the “intended emotion” when “incorrect” responses were given, using measures of inter-rater agreement and chi-square tests. Patterns of confusion were compared between groups with intraclass correlations. Next, correlations were computed among ratings of intensity and confidence and response times. MANOVA and ANOVAs were performed on these measures, with a specific interest in group differences in intensity ratings given the amygdala theory of autism. Confidence ratings were compared within the ASD group to explore self-awareness of emotion recognition in ASD in line with Hill et al.’s (2004) work. Finally, the impact of musical training and experience on task performance was evaluated by creating groups based on these factors and performing ANOVAs and ANCOVAs.
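A minimal sketch of the core ANOVA/ANCOVA steps is given below, assuming the pingouin Python package and simulated data; the original analyses were not necessarily run in this software, and the repeated-measures ANCOVAs reported later would require a more general linear mixed model than the between-subjects ANCOVA shown here.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
emotions = ["happy", "sad", "scared", "peaceful"]

# Simulated long-format data: one accuracy score (0-5) per participant per
# "intended emotion", plus diagnostic group and VIQ. All values hypothetical.
df = pd.DataFrame({
    "subject": np.repeat(np.arange(52), 4),
    "group": np.repeat(["ASD"] * 26 + ["TD"] * 26, 4),
    "emotion": emotions * 52,
    "score": rng.integers(2, 6, 208).astype(float),
    "viq": np.repeat(rng.normal(105, 15, 52).round(), 4),
})

# Mixed-design ANOVA: "intended emotion" within subjects, group between.
print(pg.mixed_anova(data=df, dv="score", within="emotion",
                     between="group", subject="subject"))

# ANCOVA on the total score with VIQ as a covariate (between-subjects only).
totals = df.groupby(["subject", "group"], as_index=False).agg(
    total=("score", "sum"), viq=("viq", "first"))
print(pg.ancova(data=totals, dv="total", covar="viq", between="group"))
```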

Results

Emotion Recognition

Performance of each group on emotion recognition is reported in Table 2. Visual examination of Table 2 reveals that, at least in terms of raw scores on the task, the mean performances of the ASD group are slightly below those of the TD group. Repeated measures ANOVA and ANCOVAs (Table 3) were conducted with “intended emotion” as a within-subject factor with four levels (happy, sad, scared, peaceful), diagnostic group (ASD and TD) as the between-subject factor, and VIQ as a covariate for the ANCOVA. Significant main effects were found for both diagnostic group and “intended emotion” (ANOVA in Table 3); the participants with ASD were less accurate than participants with TD, but the group effect failed to reach significance when VIQ was considered as a covariate (first ANCOVA in Table 3). The effect size for diagnostic group decreased from medium-large to medium-small when VIQ was considered as a covariate. A significant main effect was found for the VIQ covariate, with a medium to large effect size. There was no significant interaction between “intended emotion” and diagnostic group: although some emotions were recognized with greater accuracy than others (the main effect of “intended emotion”), this pattern was similar for both groups. A follow-up repeated measures ANCOVA (second ANCOVA in Table 3) showed a significant interaction between VIQ and diagnostic group, indicating that the effect of VIQ on task performance was not the same for each diagnostic group.

Table 2 Recognition of musical “intended emotion” for participants with ASD and TD: diagnostic group and VIQ effects
Table 3 Repeated measures ANOVA and ANCOVAs for recognition of musical “intended emotion”

Given the significant interaction between VIQ and diagnostic group, stepwise regressions were performed for each diagnostic group and for both groups combined. The outcome variable to be predicted by the regression models was the total score on the emotion recognition task (max = 20), which differed between groups (Table 2). The following predictors were entered: chronological age, VIQ, PIQ, Social Responsiveness Scale (t-score), Social Communication Questionnaire (raw score), total digit span (scaled score), letter-number sequencing (scaled score), number of years of musical training, and number of instruments played. VIQ was retained as a significant predictor for both groups combined, R² = .15, B = .06, SE B = .02, p < .01, as well as for the ASD group, R² = .17, B = .07, SE B = .03, p = .04; the other predictors were not retained. For the TD group, none of the predictors were significant. Regression slopes for the total score on the emotion recognition task and VIQ are presented for both groups in Fig. 1 (a procedural sketch of the stepwise selection follows the figure). Thus, VIQ was related to task performance for the ASD group but not for the TD group (confirming the Diagnostic group × VIQ interaction previously found). The effect of gender on task performance was explored through ANOVA rather than linear regression because gender is a categorical variable. That ANOVA revealed no significant effect of gender on task performance, either for both groups combined or for each group considered separately.

Fig. 1 Regression slopes for recognition of musical “intended emotion” (total score) and VIQ
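As a procedural illustration, the sketch below implements a forward stepwise selection by p-value on hypothetical data; the exact entry/removal criteria of the original analyses are not specified, so the alpha threshold and predictor values here are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X: pd.DataFrame, y: pd.Series, alpha: float = 0.05):
    """Forward selection: repeatedly add the candidate predictor with the
    smallest p-value below alpha; stop when no candidate qualifies."""
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        if not remaining:
            break
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
    return sm.OLS(y, sm.add_constant(X[selected])).fit() if selected else None

# Hypothetical predictors for 26 participants (names mirror the paper's
# predictor list; values are simulated, not the study's data).
rng = np.random.default_rng(2)
X = pd.DataFrame({"viq": rng.normal(105, 15, 26),
                  "piq": rng.normal(105, 15, 26),
                  "age": rng.normal(13.6, 1.1, 26)})
y = 0.07 * X["viq"] + rng.normal(3, 1.5, 26)  # simulated total score
model = forward_stepwise(X, y)
if model is not None:
    print(model.summary())
```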

Specific Emotions

Analyses were conducted to assess specific a priori hypotheses regarding separate recognition of each of the four emotions in ASD. Bonferroni adjusted one-tailed t-tests, with an adjusted alpha level of .013, failed to show significant differences between the ASD and TD groups for recognizing music as happy, sad, and scared (Table 2). An ANCOVA revealed a significant effect of VIQ for recognition of music as scared but there were no differences between groups whether or not VIQ was considered as a covariate for this emotion. A significant diagnostic group difference was found for recognizing peacefulness, with the participants with ASD being less accurate than participants with TD, but the group effect did not remain significant when the effect of VIQ was controlled for.

Given that the diagnostic group effect on the score for recognizing peaceful music did not remain significant when VIQ was controlled for, and that the effect of VIQ itself was not significant, a stepwise regression was performed to assess whether any of the other potential predictors previously considered for the total score (see Table 1) predicted the score for recognizing peaceful music. Number of instruments played was retained as the only significant predictor, R² = .09, B = .33, SE B = .15, p = .04; VIQ was not retained. An ANOVA revealed that the effect of gender on recognition of peaceful music was not significant for both groups combined or for either group considered separately.

Analyses of Confusion Matrices

Additional analyses were conducted to further investigate the participants’ responses. Measures of inter-rater agreement (Cohen 1960) performed on the data presented in Table 4 revealed that the majority of responses fell along the diagonal (ASD: κ = .63, 95% CI: .58–.68; TD: κ = .73, 95% CI: .68–.78), indicating consistent agreement between the “intended emotion” and the emotion selected. Follow-up analyses assessed the possibility that the three remaining (incorrect) emotions were equally confusable. Taking the diagonal entries as fixed values, chi-square tests conducted on the off-diagonal cells (incorrect responses) revealed that the three remaining emotions were not selected equally often (Table 4); seven comparisons were significant and one showed a trend. Visual inspection of Table 4 suggests that peacefulness tended to be confused with happiness or sadness in both diagnostic groups. The pattern of confusions did not differ significantly between groups; that is, if an emotion was confused with another one, this was the case equally for both groups, as demonstrated by the intraclass correlation coefficients (Shrout and Fleiss 1979) obtained for each “intended emotion”: happy, .99; sad, .98; scared, .99; peaceful, .95. (A computational sketch of these analyses follows Table 4.)

Table 4 Percentage of “correct” (along the diagonal) and “incorrect” recognition of the “intended emotion”
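The sketch below illustrates the confusion-matrix analyses on hypothetical trial-level labels; scikit-learn’s cohen_kappa_score and scipy’s chisquare stand in for whatever software the original analyses used, and the 20% error rate is an arbitrary assumption.

```python
import numpy as np
from scipy.stats import chisquare
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(3)
emotions = ["happy", "sad", "scared", "peaceful"]

# Hypothetical trial-level labels for one group: 26 participants x 5 clips
# per "intended emotion"; roughly 20% of responses are perturbed.
intended = np.repeat(emotions, 130)
selected = intended.copy()
flip = rng.random(intended.size) < 0.2
selected[flip] = rng.choice(emotions, flip.sum())

# Agreement between intended and selected emotions (diagonal of Table 4).
print("kappa =", round(cohen_kappa_score(intended, selected), 2))

# For each intended emotion, test whether the three remaining emotions were
# selected equally often among the incorrect responses (H0: uniform).
for emo in emotions:
    errors = selected[(intended == emo) & (selected != emo)]
    counts = [int(np.sum(errors == e)) for e in emotions if e != emo]
    stat, p = chisquare(counts)
    print(f"{emo}: chi2 = {stat:.2f}, p = {p:.3f}")
```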

Intensity and Confidence Ratings and Response Times

Accuracy (correct vs. incorrect emotion recognition) was significantly positively correlated with ratings of intensity, Kendall’s τ = .14, p ≤ .01, and with ratings of confidence, Kendall’s τ = .16, p ≤ .01. In turn, intensity and confidence ratings were correlated with each other, as revealed by a two-tailed Pearson correlation, r = .63, p ≤ .01. These correlations remained significant when the groups were analyzed separately. In addition, response times were significantly negatively correlated with confidence ratings for participants with TD, r = −.23, p ≤ .01, indicating that shorter response times were associated with higher confidence ratings; this correlation failed to reach significance for participants with ASD, r = −.04, ns.
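These correlations can be computed as in the following sketch (simulated trial-level data; scipy assumed; the variable names and effect sizes are hypothetical):

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

rng = np.random.default_rng(5)

# Hypothetical trial-level data: dichotomous accuracy (0/1) and the two
# slider ratings, both on a 0-1 scale.
accuracy = rng.integers(0, 2, 200)
intensity = np.clip(0.1 * accuracy + rng.uniform(0, 0.9, 200), 0, 1)
confidence = np.clip(0.5 * intensity + rng.uniform(0, 0.5, 200), 0, 1)

# Kendall's tau suits the dichotomous accuracy variable; Pearson's r is
# used between the two continuous ratings.
tau_i, p_i = kendalltau(accuracy, intensity)
tau_c, p_c = kendalltau(accuracy, confidence)
r, p_r = pearsonr(intensity, confidence)
print(f"tau(accuracy, intensity) = {tau_i:.2f} (p = {p_i:.3f})")
print(f"tau(accuracy, confidence) = {tau_c:.2f} (p = {p_c:.3f})")
print(f"r(intensity, confidence) = {r:.2f} (p = {p_r:.3f})")
```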

A MANOVA was performed with the intensity ratings, confidence ratings, and response times as dependent variables. MANOVA was selected to take into account the associations among the dependent variables. Diagnostic group (ASD, TD) and response accuracy (correct or incorrect emotion recognition) were considered as independent variables.
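A minimal sketch of such a MANOVA follows, using statsmodels on simulated trial-level data; the original software is not specified, and all values here are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(4)
n = 1040  # hypothetical trial count

# Simulated trial-level data standing in for the real measures: intensity
# and confidence ratings (0-1), response time (s), group, and accuracy.
df = pd.DataFrame({
    "intensity": rng.uniform(0, 1, n),
    "confidence": rng.uniform(0, 1, n),
    "rt": rng.gamma(2.0, 1.5, n),
    "group": rng.choice(["ASD", "TD"], n),
    "accuracy": rng.choice(["correct", "incorrect"], n),
})

# The three correlated measures enter jointly as dependent variables, with
# diagnostic group, accuracy, and their interaction as factors.
mv = MANOVA.from_formula("intensity + confidence + rt ~ group * accuracy",
                         data=df)
print(mv.mv_test())
```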

The difference between diagnostic groups in terms of intensity ratings was nonsignificant, F(1,1040) = 2.13, p = .15. Correct responses were associated with higher intensity ratings than incorrect responses, F(1,1040) = 31.14, p < .01. The interaction between diagnostic group and accuracy was nonsignificant, F(1,1040) = .77, p = .38 (Fig. 2).

Fig. 2 Means and standard errors for intensity ratings

Although the TD group generally gave higher confidence ratings, F(1,1040) = 21.10, p < .01, confidence ratings were lower for both the TD and the ASD group when their responses were incorrect, F(1,1040) = 37.68, p < .01. This pattern of response did not differ between groups, F(1,1040) = 2.76, p = .10 (Fig. 3). An additional ANOVA assessing confidence ratings given by participants with ASD only confirmed they gave higher confidence ratings when they had correctly (vs. incorrectly) selected the “intended emotion”, F(1, 520) = 27.64, p < .01.

Fig. 3 Means and standard errors for confidence ratings

For response times as the dependent variable, the difference between groups was marginally significant, F(1,1040) = 3.27, p = .07, with participants with TD responding slightly faster than participants with ASD; the difference between correct and incorrect responses was not significant, F(1,1040) = 2.46, p = .12, nor was the interaction between diagnostic group and accuracy, F(1,1040) = .12, p = .73.

Musical Training and Experience

Although the groups were matched on musical training and experience, additional analyses were performed to further investigate the impact of these factors. Two groups were created based on the data collected with the Salk and McGill Musical Inventory (SAMMI) and the Queen’s University Music Questionnaire—Revised (see “Method”). Participants were included in the first, “musicians” group (N = 28) if they played at least one musical instrument and had received at least 2 years of musical training; this group comprised 10 adolescents with TD and 18 with ASD (the grouping criterion is illustrated in the sketch after Table 5). The remaining participants, 16 with TD and 8 with ASD, were included in the second, “non-musicians” group (N = 24). A repeated measures ANOVA was performed with “intended emotion” as a within-subject factor and musical training and experience as the between-subject factor (Table 5). Significant main effects of “intended emotion” and of musical training and experience were found. The analysis was repeated as an ANCOVA with VIQ as the covariate: the effect of VIQ was significant, while the effects of both “intended emotion” and musical training and experience failed to reach significance. Two repeated measures ANCOVAs were then performed separately for the participants with TD and ASD. For participants with TD, no significant effects were found. For participants with ASD, the effect of VIQ was significant, while the effects of both “intended emotion” and musical training and experience failed to reach significance.

Table 5 ANOVA and ANCOVAs for emotion recognition with groups based on musical training and experience
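The grouping criterion can be expressed as a short sketch (hypothetical questionnaire data; pandas assumed; column names are invented for illustration):

```python
import pandas as pd

# Hypothetical per-participant musical-background data, as would be drawn
# from the SAMMI and the Queen's University Music Questionnaire—Revised.
background = pd.DataFrame({
    "participant": [1, 2, 3, 4, 5, 6],
    "n_instruments": [0, 1, 2, 0, 1, 3],
    "years_training": [0, 3, 2, 1, 0, 5],
})

# "Musicians": play at least one instrument AND have >= 2 years of training.
background["musician"] = ((background["n_instruments"] >= 1)
                          & (background["years_training"] >= 2))
print(background)
```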

Discussion

The current study was conducted to complement the existing, albeit scarce, literature on emotion perception in music in ASD and to discover whether the deficits in emotion recognition and categorization reported in the visual domain for individuals with ASD might also exist in the musical domain. Stimuli included happy, sad, scary, and peaceful music to compare results with those previously reported for patients with damage to the amygdala (Gosselin et al. 2005, 2007). Thus, the experimental task allowed for a test of the amygdala theory of autism at the perceptual level in the musical domain. Ratings of emotional intensity were also collected to assess the amygdala theory of autism. Self-awareness of emotional recognition in music by adolescents with ASD was also explored via confidence ratings.

Emotion Recognition

The group means for raw scores revealed that adolescents with ASD were not as accurate as adolescents with TD at recognizing or categorizing emotions (happy, sad, scared, peaceful) represented in musical excerpts, but when verbal ability was accounted for as a covariate, the two groups did not differ statistically. The lack of a significant interaction between “intended emotion” and diagnostic group reveals that both groups exhibited a similar pattern of performance in the task, meaning that emotions recognized with greater accuracy by participants with TD were also recognized with greater accuracy by participants with ASD. In other words, some emotions are easier to recognize in music, whether or not an adolescent has ASD, and verbal ability is related to the ability to make such judgments accurately. This was also supported by high intraclass correlation coefficients found when analyzing patterns of confusions.

Burack et al. (2004) emphasize the importance of interpreting findings of studies on ASD within a developmental context, taking into account the role of cognitive functioning in task performance and, consequently, in group differences in performance. In the present study, the effects of diagnostic group on task performance were not significant once VIQ was controlled for. This is paralleled by a smaller effect size associated with diagnostic group in the ANCOVA with VIQ than in the ANOVA (without VIQ). Task performance for the group of participants with TD was not predicted by any of the factors entered in a regression model including chronological age, VIQ, PIQ, measures of auditory memory, and musical training and experience, while VIQ was retained as the only significant predictor of total task performance for participants with ASD (Fig. 1). These results may reflect the fact that the groups included in this study were matched more closely on PIQ than on VIQ, or that the variability and range in VIQ and on the emotion recognition task were greater for the ASD group. To the authors’ knowledge, only one study to date has reported results on a task of emotion recognition in music wherein a group of participants with autism was matched to two different control groups according to verbal or nonverbal IQ (Heaton et al. 1999a). There, the two matching procedures did not yield different results, i.e. both types of matching showed that children with ASD could distinguish happy and sad music as accurately as both control groups. Given this lack of difference, and given that participants with ASD in the present study were high-functioning, the fact that the groups were matched more closely on PIQ than on VIQ does not invalidate the results, although it may limit interpretation. It would be instructive to conduct this study with lower-functioning children with ASD and with younger participants to discover whether the findings can be replicated, and thus to further understand the contribution of verbal ability to emotion recognition in music. The findings reported here are not unexpected given the existing evidence that, in ASD, performance on emotion recognition tasks can be mediated by cognitive function and, more specifically, by verbal ability (Heaton et al. 2008a; Hobson 1986; Ozonoff et al. 1990; Teunisse and de Gelder 2001). The findings also support the two-threshold model proposed by Happé (1995), which states that a higher level of verbal ability is required for children with ASD to succeed on mentalizing tasks than for controls, who can succeed at lower levels of verbal ability. In line with this, VIQ predicted task performance for participants with ASD (see Fig. 1) in the present study while performance of participants with TD seemed to be unrelated to VIQ.

In addition to controlling for VIQ, the impact of gender on task performance was controlled for statistically because unequal numbers of girls and boys with ASD participated in the study, and unequal numbers of men and women participated in the validation study. This could represent a limitation of the study; however, there was no effect of gender on task performance for participants with TD (similar numbers of TD boys and girls participated), for participants with ASD, or for both groups combined. Although this suggests that gender does not have a crucial impact on emotion recognition in music in high-functioning adolescents with ASD, more research is needed to address this issue. For instance, gender may have a greater impact on emotion recognition in music in younger children with ASD or in lower functioning individuals with ASD. In typical development, emotion recognition in music develops at least until the age of 8 (Heaton et al. 2008b).

Specific Emotions

When “intended emotions” were considered separately, participants with ASD and TD could not be distinguished in recognizing music as happy, sad, or scared, although participants with ASD were slightly (but not statistically significantly) less accurate than participants with TD. Thus, recognition of basic emotions in music is comparable in ASD and TD, as is the case for recognition of basic emotions in nonverbal vocalizations (Ozonoff et al. 1990), the voice (O’Connor 2007), and facial expressions (Baron-Cohen et al. 1997; Castelli 2005; Neumann et al. 2006; Ozonoff et al. 1990). In addition, the present results replicate findings by Heaton et al. (1999a) that children with ASD can distinguish happy and sad music.

Scary and Peaceful Music and the Amygdala Theory of Autism

Notably, it had not been predicted that adolescents with ASD would be able to recognize scary and peaceful music (for peaceful music, the group effect was not significant once the effect of VIQ was controlled for) as accurately as adolescents with TD. These results fail to replicate those of Gosselin et al. (2005, 2007) for patients with damage to the amygdala. Thus, emotion recognition in music among individuals with ASD differs from that in patients with amygdala damage, in the sense that individuals with ASD can recognize musical emotions, such as scariness and peacefulness, that such patients cannot. This observation, combined with the lack of group difference in ratings of emotional intensity, cannot be reconciled with the amygdala theory of autism at the perceptual level. Emotion perception in music in ASD does not appear to be atypical: data from psychophysiological measures such as electrodermal response and heart rate, and from imaging techniques, could further inform this issue, but the behavioral results presented in the current study point to intact perception. Imaging techniques could be considered in future research to assess amygdala activation associated with musical emotion recognition in comparison to activation of other brain areas thought to be atypically developed in ASD, such as the frontal cortex (see Hill 2004, for a review), while considering possible atypical neural connectivity in ASD (Belmonte et al. 2004; Just et al. 2004). Exploring recognition of musical emotions in ASD with the help of recent imaging techniques, such as diffusion tensor imaging, seems promising, as does comparing emotion recognition across modalities (visual: still images, videos; auditory: voice, non-voice). In addition to amygdala activation, it will also be important to consider other areas implicated in emotional processing of music, such as the temporal poles (Koelsch et al. 2006), and areas involved in pitch processing, such as Heschl’s gyrus in the auditory cortex (Patterson et al. 2002).

The Case of Peaceful Music

When many potential predictors of task performance were considered, the number of instruments played (which averaged near 1 for both diagnostic groups) was retained as a predictor of recognition of peaceful music, while VIQ was not. When peaceful music was not correctly recognized, it tended to be confused with happy or sad music, and this was the case for both diagnostic groups. Interestingly, the typically developing adults who participated in the validation study for stimulus selection were also less accurate at recognizing peaceful music than the other three emotions (see “Appendix 1”), which suggests that peaceful music is generally the most difficult of the four emotions to recognize. A possible explanation is that a state of peacefulness can be thought of as a complex emotion or mental state, whereas the other three emotions included in the experimental task are basic emotions (as per Golan et al. 2007; Rutherford et al. 2002). The findings reported here are thus consistent with previous findings suggesting that individuals with ASD have difficulty recognizing complex emotions and mental states in the voice (Golan et al. 2007; Rutherford et al. 2002).

Perhaps the difficulties presented by both groups can be attributed to the nature of the peaceful stimuli used, which may represent a limitation of this study. However, it seems more likely that the difficulty in recognizing peaceful music is attributable to a quality inherent in peaceful music itself; the boundaries or conventions for what is considered peaceful music may be ambiguous. This could partly explain why group differences were nonsignificant when verbal ability was controlled for. Teunisse and de Gelder (2001) showed that, although individuals with ASD can assign emotions to categories, the way they classify representations of emotions falling between category boundaries differs from that of typically developing individuals. Thus, the inaccurate recognition by participants with ASD of music meant to be peaceful (an ambiguous category) can be seen as consistent with the work of Teunisse and de Gelder. To explore this question, future studies of emotion recognition in music could examine classification of emotions along a continuum instead of, or in addition to, using a forced-choice method. Allowing participants to recognize more than one emotion per stimulus could also be informative.

Intensity and Confidence Ratings and Response Times

Emotion recognition accuracy was found to be associated with ratings of emotional intensity, confidence in task performance, and response times. Participants rated music as more emotionally intense when they correctly recognized the “intended emotion,” and this was the case for both groups. Like participants with TD, participants with ASD reported being more confident in their responses when they correctly (vs. incorrectly) recognized the “intended emotion.” Thus, one can argue that high-functioning adolescents with ASD can perceive and relate to the emotional quality of music similarly to the typical listener. In addition, the confidence ratings made by participants with ASD suggest awareness of their response accuracy, which supports the claim made by Hill et al. (2004) that individuals with ASD show some degree of insight into their own thought processes. Whether there is a dissociation between what adolescents with ASD think, feel, or experience while listening to music and their descriptions of those thoughts, feelings, and experiences remains to be addressed. Future work using psychophysiological measures (electrodermal response, heart rate, etc.) could inform this issue. In addition, different questions could be asked of participants in future studies, such as how music makes them feel, how it makes someone else feel, and which music they prefer. The interviews on experiences with music conducted by Allen et al. (2009) with adults with ASD could be adapted to adolescents in pursuit of these questions.

Implications for ASD

Results from the current study suggest that high-functioning adolescents with ASD can recognize basic emotions in music; specifically, this is the case for happy, sad, and scary music. These findings underscore the need to vary the types of stimuli used to test emotion recognition in ASD and, moreover, not to limit stimuli to the visual domain. Findings on emotion recognition are, so far, more consistent in the musical domain than in the visual domain, and this may be due to the nature of music. Although music is initially a social product created by a composer, a listener does not have to enter into a direct interpersonal interaction with the composer in order to appreciate the music. This may explain why the participants with ASD were able to interpret the music’s meaning in a similar way to participants with TD. These findings therefore suggest that emotion processing deficits in ASD are domain specific, arising in response to the social stimuli and situations specified in the DSM-IV criteria for pervasive developmental disorders, and may not generalize to other domains such as music.

Music seems to be a channel through which emotions can be communicated to individuals with ASD. Whether individuals with ASD feel emotions from music or process them similarly to the typical listener at the neural level remains to be verified by additional behavioral, physiological, and imaging measures. Further research comparing emotion recognition in many modalities and including other basic emotions is needed.

In closing, this study’s findings can be applied in the context of music therapy or other intervention programs targeting social, communicative, and emotional skills. Although the findings do not suggest that music therapy is a panacea, they do not contradict the use of music therapy in ASD. Music perception seems to be a relative strength for individuals with ASD, in the context of a profile characterized by strengths and weaknesses (Happé and Frith 1994; Happé 1999). Music therapy has yielded positive increases in verbal and nonverbal communication in ASD (Gold et al. 2006). Musical soundtracks influence emotional interpretation of stories in typically developing children (Ziv and Goshen 2006). Music may help individuals with ASD understand basic emotions and/or social situations in everyday life. Verbal instruction techniques meant to teach children with ASD to perceive and express emotions have been shown to be more efficient with the addition of background music targeting the emotions to be learned (Katagiri 2009). Given the accessibility of music, children with ASD and their parents could create lists of songs associated with basic emotions or social situations that they refer to or play to demonstrate which emotion a family member is feeling or felt. These efforts will be limited, however, by the reality that not all emotions, to say nothing of complex mental states, can be reliably expressed through music. Future studies could also examine whether music can be used efficiently for a variety of purposes such as helping individuals with ASD regulate their mood, reduce anxiety, or increase concentration.