There is growing consensus that the disturbance in the development of joint attention skills is the major characteristic of the social deficits in young children with autism (Bruner 1995; Mundy and Sigman 2006). Numerous studies have addressed the need for a more extensive understanding of the nature of this disturbance to inform both early diagnosis and intervention, and to further predict prognoses of language development and social competency for these children (Bono et al. 2004; McArthur and Adamson 1996; Mundy et al. 1986; Mundy and Sigman 2006; Siller and Sigman 2002).

Improvisational music therapy has long been noted for its efficacy in engaging autistic children at their own level and interest, and in helping them to develop spontaneous self-expression, emotional communication and social interaction. Music offers a means of self-expression, communication and interaction that these children can assimilate more easily than some other media (Alvin 1978; Edgerton 1994; Gold et al. 2006; Nordoff and Robbins 1977; Robarts 1996; Trevarthen 2002).

Acquisition of joint attention skills plays an important role in early development, as without joint attention skills, higher functions such as communication, social interaction and language cannot develop well. To date there has not been a controlled study in this area in improvisational music therapy. Therefore, this study set out to investigate whether pre-school children with autism show observable and measurable changes in joint attention behaviors in improvisational music therapy.

Identifying methods for effectively improving joint attention skills in young children with autism remains a high priority. Studies have indicated superior gains in joint attention, language development and social communication skills when the adult’s behavior remains either contingent on or imitative of the child’s behavior, showing a high level of synchronization and matching during play interaction (Escalona et al. 2002; Lewy and Dawson 1992; Siller and Sigman 2002; Watson 1998).

Since the early pioneering years of improvisational music therapy, the process of musical attunement, whereby the therapist sensitively and musically matches a client’s musical and non-musical expression in order to ‘tune in’ empathically, has been a germane feature of clinical practice and an essential skill of a music therapist (Alvin 1978; Nordoff and Robbins 1977). The term ‘musical attunement’ implies a moment-by-moment, responsive use of improvised music which is sensitive and attentive to the child’s musical and non-musical expression. This often involves matching the child’s pulse, rhythmic patterns of movement or musical play, dynamic forms of expression and melodic contour to the point where there is a common musical foundation between the child and the therapist. This in turn acts as a shared context for a therapeutic relationship (Kim 2006; Trolldalen 2005; Wigram and Elefant 2008). In this context, autistic children often appear to perceive that the therapist’s music has something to do with themselves, which often encourages them to join in with, or even initiate, interaction with the therapist. This happens because the therapist creates predictable patterns in the musical improvisation, built up from material originating from the child (Kim 2006; Robarts 1996; Saperston 1973). This typically child-centered approach utilises mainly non-verbal musical interaction and is comparable with early reciprocal interaction between mother and infant (Heal Hughes 1995; Holck 2004a, b; Pavlicevic 1997; Robarts 1996; Trolldalen 2005).

Typically developing infants are reported to be born with the emerging capacity to relate to and communicate with people (Stern 1985; Trevarthen 2001). This ability has been described as “communicative musicality” (Malloch 1999) and “inherent musicality” (Robarts 1996). Early pre-verbal infant–mother interaction is indeed fundamentally improvisational in nature, with both participants subtly tuning in, exchanging responses, and adjusting and developing their tonal and temporal qualities, dynamic forms and shapes in relation to each other (Stern 1985). Robarts (1996, p. 140) wrote:

It is this very intersynchrony, flexibility and creative reciprocity that is absent in the autistic child, and which the music therapist seeks to help the child experience and assimilate to whatever extent she or he is able to do so.

Holck (2002) suggested that music therapists use ‘response evoking techniques’ that involve creating mutually meaningful and enjoyable musical interaction themes in relation to the child’s expression and focus of attention. These offer the potential for drawing the child’s attention towards joint musical engagement. The caregiving/scaffolding model (Bakeman and Adamson 1984) of the development of joint attention skills in infancy is relevant to the way music therapists work with children with a wide range of developmental disorders. Furthermore, Bakeman and Adamson (1984) used the term “coordinated attention” to describe an interactive state of the mother–infant dyad in which they share a common focus of attention, a definition that is relevant to this study.

The working definition of joint attention behavior for the analyses undertaken in this study is: an interactive state of joint engagement that involves the child, the therapist, and objects or events in either musical form or in play. This study set out to explore that state in improvisational music therapy and play. It was predicted that musical attunement would open and maintain a channel of communication with the child, that the child’s joint attention ability would increase over time, and that joint attention behavior would be better in the music therapy condition than in the play condition.

Methods

Participants

Thirteen boys and two girls with autism, aged between 3 and 5 years, who had no previous experience of music therapy or play therapy, were recruited from the Department of Child and Adolescent Psychiatry at Seoul National University Hospital (SNUH). Parents gave informed consent for their children to be involved in the study. During the clinical trial, 5 children dropped out for various reasons, and the 10 remaining children were all boys. The reasons for the high drop-out rate are explained in the discussion section. The participants had a mean chronological age of 51.20 months (SD = 12.08; range 39–71 months) when they entered the clinical trials. All the participants were examined independently at SNUH by two child and adolescent psychiatrists experienced with this population and met the DSM-IV criteria for Autistic disorder. In addition, each participant met criteria on the Korean version of the Childhood Autism Rating Scale (Kim and Park 1995; mean score = 36.10; range 32–42.50; SD = 3.41). The Autism Diagnostic Observation Schedule (Lord et al. 1999), which was only available during the latter stages of the trials, was administered to 4 of the 10 remaining participants. The results also supported the diagnoses of autism. The results of the Korean version of the PsychoEducational Profile (Kim 1995) showed a mean developmental quotient of 70.28 (range 60–89; SD = 9.97). Five children were non-verbal, whereas the other five were verbal with varying degrees of language skills.

Procedure

A repeated measures comparison design, both between conditions and within subjects, was used. Each participant had 12 weekly 30 min improvisational music therapy sessions, which were compared with a control condition of 12 weekly 30 min play sessions with toys. Due to holidays and sick leave, it took the participants between 7 and 8 months to complete the 24-session program. The participants were randomly assigned to two groups. Group one (five children) had music therapy sessions first and play sessions afterwards, while group two (five children) had play sessions first followed by music therapy sessions. Each session in both conditions was divided into two parts of approximately 15 min each: an undirected (child-led) part in which the therapist supported and elaborated the child’s play, followed by a directed part in which the therapist gently introduced modeling and turn-taking activities within the child’s focus of attention and range of interest. The Pervasive Developmental Disorder Behavior Inventory-C (PDDBI; Cohen and Subhalter 1999) and the Early Social Communication Scales (ESCS; Mundy et al. 2003) were used as pre-, in-between, and post-treatment measures. Predefined target behaviors were analyzed from sampled DVD excerpts using both frequency and duration measurements. In order to prevent familiarity with a particular person from influencing the therapeutic outcome, the music therapy and play sessions were carried out by two different therapists. Two music therapists, one play therapist and three music therapy graduate students formed the research team. A semi-flexible treatment manual was developed and used for both conditions to ensure consistency and reliability of intervention. The clinical trials were undertaken at the first author’s private practice clinic in Seoul, Korea.
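The two-group crossover allocation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the condition labels, the participant identifiers and the use of a seeded shuffle are all hypothetical, and the study does not report how the randomization was actually carried out.

```python
import random


def assign_crossover_groups(participant_ids, seed=None):
    """Randomly split participants into two equal groups with opposite condition orders.

    Hypothetical helper: the labels and the 12 + 12 session layout mirror the
    design described in the text, not a documented randomization procedure.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule = {}
    # Group one: 12 music therapy sessions followed by 12 play sessions.
    for pid in shuffled[:half]:
        schedule[pid] = ["music_therapy"] * 12 + ["play"] * 12
    # Group two: 12 play sessions followed by 12 music therapy sessions.
    for pid in shuffled[half:]:
        schedule[pid] = ["play"] * 12 + ["music_therapy"] * 12
    return schedule


# Ten made-up participant identifiers, matching the number of completers.
schedule = assign_crossover_groups(range(1, 11), seed=0)
```

Each participant serves as his own control in such a layout, and the two orders allow sequence effects to be inspected separately from condition effects.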

Measures

The PDDBI

The PDDBI is an informant-based scale that can reliably measure responsiveness to interventions in individuals with autism spectrum disorder. The social approach subscale from the PDDBI was used as the relevant scale for the study. There are two versions of the PDDBI: the Parent and the Teacher PDDBI. Both versions were translated and back-translated by the first author. A professor of clinical psychology (Shin) at SNUH scrutinized the first version of the translated PDDBI and made amendments. To examine its validity, the amended PDDBI was tested on a range of people in Korea, including mothers of autistic children, teachers in special education, primary school and university, and speech and language therapists. The Korean version of the PDDBI was developed before the trials of this study began in 2004.

The PDDBI was completed by both the mothers of the children and professionals who were involved with each child. For practical reasons, all mothers observed every session through the TV monitor in the waiting room of the clinic, while the professionals nominated to complete the PDDBI were blind to the order of the experimental conditions. Cohen et al. (2003) reported that the PDDBI was a reliable and valid tool for assessing children’s responsiveness to interventions. The social approach subscale was used in order to identify whether the mothers and professionals recognized improvements in the joint attention skills and pro-social behaviors of the children.

The ESCS

The ESCS is a structured toy play assessment measuring non-verbal social communication skills in typically developing infants aged between 6 and 30 months. It has also been applied to assess those skills in young children with autism (Kasari et al. 1990; Mundy et al. 1994; Siller and Sigman 2002). In this study the abridged version of the ESCS (Mundy et al. 2003) was used. The ESCS provides frequency scores for two types of joint attention behaviors: Initiation of Joint Attention (IJA) and Responding to Joint Attention bids (RJA). IJA distinguishes low level behaviors (IJAL), such as making eye contact and alternating eye contact between a toy and the tester, from high level behaviors (IJAH), such as pointing and showing gestures that indicate the child’s intention to share the experience of toy play with the tester. RJA refers to the number of times (as a percentage) the child follows the tester’s pointing gesture correctly.

Treatment Sessions

For each condition, a consistent range of materials (Table 1) was pre-selected and made available to participants throughout the trials.

Table 1 Equipment in music therapy and play conditions

All sessions in both conditions were video recorded and stored on DVD. Minutes 4–7 (undirected part), and minutes 19–22 (directed part) from selected sessions (1st, 4th, 8th, and 12th) in both conditions (music therapy and play) were subjected to detailed analysis for pre-defined target behaviors of joint attention. Table 2 shows brief definitions of these target behaviors.

Table 2 Brief definitions of target behaviors

For coding the frequency and duration of the behaviors, a coding procedure and data chart were developed for microanalytic (second by second) analysis of the session material.
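To make the distinction between the two measures concrete, the sketch below derives a frequency and a duration from a second-by-second coding of a single target behavior. The 0/1 per-second representation, the function name and the example sequence are assumptions made for illustration; the study's actual coding chart is not reproduced here.

```python
def summarize_events(coded_seconds):
    """Return (frequency, total_duration_in_seconds) for one coded behavior.

    coded_seconds is a hypothetical per-second coding: 1 if the behavior
    was observed during that second, 0 otherwise.
    """
    frequency = 0
    duration = 0
    previous = 0
    for value in coded_seconds:
        if value:
            duration += 1          # every coded second adds to the duration
            if not previous:
                frequency += 1     # a 0 -> 1 transition starts a new event
        previous = value
    return frequency, duration


# Made-up ten-second excerpt containing three separate events.
excerpt = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
frequency, duration = summarize_events(excerpt)
print(frequency, duration)  # prints "3 6"
```

Counting onsets rather than coded seconds is what separates frequency from duration: a single long event and several brief ones can share the same total duration.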

Data Analysis

Repeated measures ANOVAs were applied and effect sizes were calculated in order to find out whether changes were clinically meaningful. For a small sample outcome study, there is a need to reduce the number of outcome variables (Cohen 1990). Therefore, the combined scales of the ESCS were constructed by adding up the raw values. While this gave more weight to the elements that vary most (IJAL), it was the most transparent way to analyze the data. The internal consistency of the joint attention sub-scale was calculated using Cronbach’s alpha (0.81 for the first time point only; 0.60 for all time points combined).
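As an illustration of the two steps just described, the following sketch sums raw item values into a combined score and computes Cronbach's alpha with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The item names and all numbers are invented for the example and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list per item, each holding that item's scores for the
    same participants in the same order.
    """
    k = len(item_scores)
    totals = [sum(values) for values in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical raw scores for four children on three items.
ijal = [12, 20, 7, 15]
ijah = [1, 3, 0, 2]
rja = [4, 6, 2, 5]

# Combined scale built by adding up the raw values per child.
combined = [sum(values) for values in zip(ijal, ijah, rja)]
alpha = cronbach_alpha([ijal, ijah, rja])
```

Because raw values are summed without rescaling, the item with the largest spread dominates the combined score, which is the weighting issue noted in the text.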

For the statistical treatment of the frequency and duration data drawn from the analysis of DVD excerpts, the distribution of values of session data resembled a Poisson distribution (Upton and Cook 2002). These data were therefore analysed with a repeated measures analysis based on a generalized linear mixed model with multivariate normal random effects, fitted by penalized quasi-likelihood (www.r-project.org, package MASS, function glmmPQL; Venables and Ripley 2002, pp. 297–298).
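For readers unfamiliar with this class of model, a generic Poisson mixed model with a log link can be written as below. This is a textbook form given as an assumption-laden sketch, not the exact specification fitted in the study: the symbols are introduced here for illustration, with y_{ij} the value for child i in excerpt j, x_{ij} the fixed-effect covariates (for example condition, session part and session), and b_i the child-specific random effects.

```latex
y_{ij} \mid \mathbf{b}_i \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\mathbf{b}_i, \qquad
\mathbf{b}_i \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

The random effects account for repeated observations from the same child, which is what makes the analysis a repeated measures one.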

Results

Levels of Agreement Between Observers in PDDBI, ESCS, and Session Analysis

Intraclass correlation coefficients (ICC) were used to determine levels of agreement in perception of behavior between the two raters (parents and professionals) in scoring the PDDBI, and levels of agreement between the two observers analyzing DVD data for the ESCS and session analysis.

For the PDDBI, the level of agreement (correlation) between the mothers and the professionals was low initially but increased over time (0.19 at pre-treatment; 0.51 between treatments; 0.67 at post-treatment). Strictly speaking, this was not inter-rater agreement but a comparison of data, as the raters were reporting on a child’s behavior in two different situations.

For the ESCS and session analysis, 30% of total DVD recordings were randomly selected and rated to establish inter-observer reliability. As an index of inter-observer reliability, the ICC levels for the ESCS and session analyses ranged between good and excellent (Cicchetti and Sparrow 1981; Cicchetti 1994). For the ESCS, the two independent observers were blind to the order of conditions. Inter-observer reliability, as measured on the first observation of each participant, ranged between 0.89 and 0.97, except for IJAH, which had a lower reliability (0.71). For the session analysis, the primary coder was the first author. Since the microanalytic coding procedure was highly complex, a research assistant was thoroughly trained to carry out independent coding (once a week for approximately 4 months) in order to ensure good enough inter-observer agreement. The second coder was blind to the order of the sessions. The results of the ICC ranged from 0.90 up to 0.98.
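The sketch below shows one way such an agreement index can be computed and labelled. It implements the two-way random-effects, single-rater form often written ICC(2,1); the study does not state which ICC form was used, so that choice is an assumption, as are the function names and the example ratings. The descriptive bands follow the guideline commonly cited from Cicchetti (1994): below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, and 0.75 or above excellent.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per rated excerpt, each row holding one score per observer.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(icc):
    """Descriptive band for an ICC value (guideline attributed to Cicchetti 1994)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Made-up scores from two observers on four excerpts.
example = [[1, 2], [2, 2], [3, 4], [4, 4]]
label = agreement_label(icc_2_1(example))
```

Under these bands, the reported value of 0.71 for IJAH falls in the "good" range and the values from 0.89 upwards in the "excellent" range.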

The PDDBI

Figure 1 shows an analysis of the pooled scores given by the professionals (left) and the mothers (right) on the social approach subscale from the PDDBI (y-axis) at the three separate time points (x-axis). Group 1 is represented by a dotted line and group 2 by a solid line. Group 1 in both graphs improved after music therapy and then either improved to a slightly lesser degree (the professionals) or slightly worsened (the mothers) after play sessions. The scores by the professionals indicated that group 2 improved slightly after play, and to a larger degree after music therapy, whereas the scores by the mothers showed that group 2 improved in both conditions, with little difference between the two.

Fig. 1 Pooled scores of social approach behaviors by professionals and parents (PDDBI)

The results of the ANOVAs indicated that time was significant (p < 0.0001), but the other independent variables were not. Effect sizes comparing scores after music therapy with after play (ignoring sequence) found a small effect for professionals (d = 0.16, 95% confidence interval (CI) ranging from −0.31 to 0.62). Recalculating the effect based on the change scores between data points (i.e., change during music therapy versus change during play) yielded an effect size of d = 0.79 (95% CI from −0.14 to 1.71) for professionals, which was a larger effect, but still not significant. Effect size calculations of parents’ scores provided similar results.
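The relationship between an effect size and its confidence interval can be illustrated with a short calculation. The sketch uses the independent-groups form of Cohen's d with a pooled standard deviation and a common large-sample approximation for its standard error; the study does not report which formula was used for its within-subject data, so this is an assumption, and the function name and example values are invented.

```python
import math
from statistics import mean, variance


def cohens_d_with_ci(group_a, group_b, z=1.96):
    """Return (d, lower, upper): Cohen's d with an approximate 95% interval.

    Uses the pooled standard deviation and a large-sample standard error;
    a paired-data variant would differ.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d, d - z * se, d + z * se


# Made-up scores after each condition for three children.
after_music = [2, 4, 6]
after_play = [1, 3, 5]
d, lower, upper = cohens_d_with_ci(after_music, after_play)
```

An interval that includes zero, as with the reported d = 0.79 (95% CI −0.14 to 1.71), corresponds to an effect that is not statistically significant at the 5% level even when the point estimate is large.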

The ESCS

Figure 2 shows the combined scores of joint attention behaviors of the participants and indicates that there was improvement over time in both groups and the improvement appeared to be greater after music therapy than after play.

Fig. 2 Pooled scores of joint attention (ESCS)

The results of the ANOVA suggest that the interaction of time and group was significant (p = 0.01), indicating that music therapy was significantly more effective than play sessions in addressing joint attention skills. Effect sizes comparing scores after music therapy with scores after play (ignoring sequence) showed a medium effect (d = 0.63, 95% CI ranging from 0.31 to 0.95). Recalculating the effect based on the change scores between data points yielded an effect size of d = 0.97 (95% CI ranging from 0.20 to 1.74), which was a large and significant effect. The patterns seen in the individual subscales IJAL and RJA were mostly similar to those of the combined scale. IJAH was the exception, showing either hardly any change over time or a slight worsening of the behavior.

Treatment Session Analysis

Session analysis yielded very similar frequency and duration data on selected target behaviors. Therefore, only duration data are presented in this article.

Eye Contact Duration

A significant effect (p < 0.0001) was found when comparing the music therapy condition with the play condition. Figure 3 shows that eye contact events lasted markedly longer in music therapy than in play.

Fig. 3 Eye contact duration

Turn-taking Duration

For turn-taking duration, significant effects were found for condition (p < 0.0001) and session part (p = 0.037); the results are illustrated in Figure 4. The effect of the selected sessions approached significance (p = 0.051).

Fig. 4 Turn-taking duration

Figure 4 indicates that music therapy was more effective at facilitating a longer turn-taking duration than the play condition. There was longer duration of turn-taking activity in the second (directed) half of the sessions in both music therapy and the play condition.

Discussion

The overall results from both the standardized measures and the session analysis were generally in favor of music therapy over the play condition with toys in improving the joint attention behaviors of the participants.

The findings of the PDDBI social approach behavior sub-scale suggested that the parents and the professionals recognized improvements in both conditions, as the sessions were carried out by therapists experienced in this field. While the scores of the professionals suggested greater improvement after music therapy than after the play condition, the scores of the mothers were not consistently in that direction. Overall the mothers appeared to give much higher scores than the professionals, and the levels of agreement between mothers and professionals were therefore low. Cohen et al.’s (2003) study also indicated that parent–professional reliabilities were not as high as professional–professional (teacher–teacher) reliabilities. The scores of the mothers often reflected the mother’s level of expectation and understanding (or misunderstanding) of her child’s condition and level of functioning, while the scores of the professionals (who were blind to the order of the experimental conditions) were quite congruent with the results of the ESCS and session analysis, suggesting that the professionals’ scores retained more objectivity. However, the correlation between the two groups of raters at mid- and post-test improved over time, suggesting that the mothers’ scoring was becoming more accurate and realistic. The differences between the scores of the mothers and the professionals may reflect that children behave differently in different situations, but may also suggest that informant-based rating scales carry some level of personal bias relating to who is scoring the scale. The interpretation of such results therefore needs to be carefully considered.

The results of repeated measures ANOVAs in the total joint attention scores of the ESCS and session analysis indicated that the improvement after music therapy was markedly better than after play at a significant level (p < 0.05).

The most outstanding results among the individual items of the ESCS were for ‘Initiation of Joint Attention Low (IJAL)’, consisting of eye contact and alternating eye contact. Both the ESCS and session analysis indicated that the majority of participants showed more marked improvement in joint visual attention skills during and after music therapy than during and after the play condition, throughout the trials and across the cases. There are many case studies and a few controlled studies that describe how the use of improvised music (attentive to what the autistic child does in the here-and-now and previously described as ‘musical attunement’) results in increased spontaneous eye contact among other behavioral improvements (Bunt 1994; Plahl 2000; Saperston 1973; Robarts 1996; Wigram 2002). Gold et al. (2006) point out that improvisational music therapy offers premises for communicative behaviors, such as joint attention behaviors including eye contact. The majority of the participants in this study, however, failed to exhibit improvement in higher levels of gestural joint attention (pointing and showing). This finding was congruent with Mundy et al.’s study (1994). Only 2 out of 10 participants exhibited a few pointing gestures during the ESCS. Their gestures appeared to be ambiguous, which resulted in lower inter-observer agreement. Concerning responses to a joint attention bid (RJA), improvement was greater after music therapy than after play sessions. Some studies (Bono et al. 2004; Siller and Sigman 2002) specifically indicated that children who respond positively to the joint attention bids of others (RJA) potentially make the largest developmental gains in language. Anecdotal reports from the therapists and the mothers indicated that three out of five non-verbal participants did begin to develop some initial language skills during and after the music therapy condition, which seemed to support the claims of these studies.

The turn-taking activity in the session analysis shares a common procedural feature with the RJA of the ESCS (i.e., the therapist was instructed to make purposeful, but gentle interpersonal demands in the second half of the session). The participants showed longer turn-taking behavior in the directed part (second half of the session) than in the undirected part (first half of the session) in both the music therapy and the play condition. The results suggest that turn-taking may occur spontaneously in the first half, but lasted longer in the second half, when clinical direction was intended to influence the child in that way. The results therefore provided some evidence that the therapists were able to follow the direction of the treatment protocol. Holck (2004b) pointed out that musical turn-taking often consisted of imitation and variation. In this study, it was initially the therapist who imitated what the participant did in order to build empathic mutuality of interaction with the participant. Holck (2004b) described in her study how the child and the therapist exchanged roles and the initiator (child) became the imitator in the latter stages of the music therapy intervention, which was congruent with the observational findings of this study. Other studies report that autistic children not only perform poorly on imitation tasks, but also do not readily alternate the roles of initiation and imitation in turn-taking (DeMyer et al. 1972; Nadel et al. 1999). The results therefore suggest that improvisational music therapy has the potential to facilitate skills fundamental to social interaction, especially non-verbal interaction, in children with autism.

The ESCS and session analysis results suggested that music therapy intervention was especially effective in improving lower levels of initiating joint attention (IJAL; eye contact, and alternating eye contact between an object and a person), responding to joint attention bids (RJA) and social interaction. What then makes improvisational music therapy an effective intervention in improving these specific social skills in children with autism? Music therapy experts in the field have pointed out two contrasting qualities of the improvisational music making process which are clinically relevant in working with autistic children: the possibilities for stability (predictable structure) and flexibility (spontaneity) (Brown 1994; Oldfield 2006; Wigram 2002). Improvisational musical interaction can foster flexibility and creativity in a structured framework for those children who cannot readily adjust themselves to the unpredictability of daily life. Improvised music in relation to the child’s musical and non-musical expression is therefore an ideal way to work through issues of control and rigidity with these children. Wigram (1995, p. 184) has noted that music therapy offers the child opportunities to be “more in control, and even musically direct the behavior of an adult”, thereby focusing on what the child is able to do rather than on the pathology of the child.

As the process of improvisational music therapy facilitated simultaneous coordination of ‘listening’, ‘visual referencing’, ‘responding’ and ‘engaging’, the results suggested that improvisational music therapy facilitated the spontaneous process of social learning and provided the premise for social motivation in children with autism.

While the results of the ESCS and session analysis were encouraging, the study has the limitations of a small sample. There were indications that both parents and professionals recognized more improvement in joint attention behaviors in children after music therapy than after play sessions. However, the results of the ANOVAs for the PDDBI were not statistically significant. There was a high drop-out rate, primarily due to long-distance travel requirements. Three of the five children who dropped out did so for health reasons (hospitalization) in addition to the difficulties of long-distance travel. The data from the drop-outs were not included in this study. As the study involved only the 10 remaining participants, drawing any generalisable conclusions is premature. The claims of the study of increased joint attention remain to be established through further research.

Future research should focus on replicating this study with larger samples to find out whether similarly encouraging results can be generalized beyond what occurred in this experimental study. The order of the session parts (therapist-led or child-led) should also be randomized to explore how the autonomy of the child may be a significant factor in joint attention and social communication. This study may serve as a model for such a future study, but the very time-consuming nature of video analysis, even with selected excerpts, needs to be taken into account. The findings of this study highlighted the social engagement that occurs through improvisational music making, and the therapeutic potential of child-centered approaches like improvisational music therapy.