Introduction

Narratives relate to personally experienced or fictional events (McCabe 1991). From an early age they represent a large part of discourse (McCabe et al. 2008) and have been linked with success in peer relationships (Bloome et al. 2003), daily interactions (McCabe et al. 2008), development of personal identity (Bloome et al. 2003) and school achievement (Bishop and Edmundson 1987; Hughes et al. 1997; Kaderavek and Sulzby 2000). Narratives have been called a bridge between oral language and literacy (Westby 1991) because they provide a structure for organizing abstract thought through sequencing and a structure for the development of literate language (Petersen 2011), such as conjunctions, elaborated noun phrases, mental verbs and adverbs (Greenhalgh and Strong 2001). Skill with both personal and fictional narratives has been associated with skill in reading (Wellman et al. 2011), reading comprehension (Dimino et al. 1995), written language (Kaderavek 2015; McCabe et al. 2008) and classroom discussion (Nathanson et al. 2007). Crucially, narrative is also a major tool for teacher evaluation of student knowledge (Bloome et al. 2003; Petersen and Spencer 2010a).

Mainstream Western academic culture places a high value on narratives that adhere to a set macrostructure (story grammar) organization (Bliss and McCabe 2008; Brown et al. 2014; Petersen et al. 2010). Macrostructure is the content and organization of a story (Finestack 2012) and provides a means of making sense of narratives (McCabe 1991).

Specific microstructural features (e.g., total number of words, number of different words, coordinating conjunctions, subordinating conjunctions, past tense) may enhance narrative macrostructure quality (Segal and Duchan 1997) and clarify meaning (Eisenberg et al. 2008; Spencer et al. 2013). Macrostructure elements, however, are considered core to fictional narratives (Peterson and McCabe 1991) and include setting (incorporating character), initiating event, internal response or feelings, plan, attempt at a resolution and an ending (Stein and Glenn 1978). See the appendices for general and specific examples of macrostructure elements.

Narrative Retell

Narrative retells require individuals to listen to or read a story and then retell it in their own words (Kalmbach 1986). A narrative retell is not an attempt at verbatim recall but rather an attempt to communicate understanding by selecting, organizing and emphasizing some parts of the narrative while ignoring others (Kalmbach 1980). The ability to retell a fictional narrative using macrostructural elements is an important skill for literacy development (Dimino et al. 1995), as proficiency with narrative retells helps individuals comprehend narrative structure and the main idea while simultaneously facilitating oral language development (Rog 2003). Narrative retell may provide a bridge to original narratives, because individuals are required to identify, comprehend and reproduce the narrative structure of an existing story without the additional cognitive demands of generating an original narrative.

Narrative and ASD

Autism spectrum disorder (ASD) is a developmental disorder characterized by impairments in social interaction and social communication and by restricted and repetitive patterns of behavior (American Psychiatric Association 2013). Narratives are one of the most socially motivated areas of language (Eigsti et al. 2011), and children with ASD have been found to have difficulties with narratives even when they do not have a diagnosed language impairment (Baixauli et al. 2016). Baixauli et al. (2016) conducted a meta-analysis of 24 studies in which researchers investigated the oral narrative production skills of individuals with ASD but no language or intellectual impairment. They concluded that individuals with ASD performed significantly worse than peers in both macrostructural and microstructural domains. Specifically, they concluded that individuals with ASD may produce narratives that have impaired story structure (Barnes and Baron-Cohen 2012), that include fewer causal relations and fewer mental state verbs (Barnes and Baron-Cohen 2012; Baron-Cohen et al. 1986), and that may be shorter, less descriptive and less grammatically complex (Tager-Flusberg 1995; King et al. 2013). Such difficulties are likely to be substantially compounded in individuals with language impairments.

There is limited research into the effect of oral narrative intervention on the oral narratives of children with ASD. Researchers in three studies have investigated the effects of oral narrative intervention on the narratives of children with ASD (Favot et al. 2018; Gillam et al. 2015; Petersen et al. 2014) and found that explicit oral narrative intervention may be an effective strategy. Participants in these studies were required to generate personal narratives (Favot et al. 2018; Petersen et al. 2014) or original fictional narratives (Gillam et al. 2015); narrative retells have not been examined to date. Favot et al. (2018) used a single macrostructure score, combined from the elements where, who with, what and feelings, to measure the efficacy of an oral narrative intervention. Gillam et al. (2015) measured narrative growth using three different scales: two made from combined macrostructure and microstructure scores, and one using a combined score of five of the seven macrostructure elements taught in the intervention program. Petersen et al. (2014) measured growth across individually targeted single elements of macrostructure and microstructure. Gillam et al. (2015) included two participants with mild language impairment and two with moderate to severe language impairment. All three participants in Favot et al. (2018) had language impairment diagnosed with a battery of standardized language assessments. All three participants in Petersen et al. (2014) were described as having language impairment based on parent and teacher reports and on narrative retell skills significantly below developmental expectations on the Test of Narrative Retell (Petersen and Spencer 2010b). Only one study included participants with documented low verbal IQ (Favot et al. 2018).
Common intervention components of these studies included using icons to represent macrostructure elements, using pictures to represent individual narratives, clinician modeling of narratives, and requiring students to say an entire narrative in each intervention session.

Given the links between narrative success and academic success, the problems that children with ASD experience with narrative retell, and the paucity of research in the area, further research to extend the existing evidence is warranted. The purpose of this study is to investigate the effect of an oral narrative intervention on the macrostructure of fictional narrative retells of children with ASD and severe language impairments. Given that no research is currently available on teaching this type of narrative to children with ASD and severe language impairment, this pilot study was also intended to provide information on measurement issues, problems related to the intervention and any adjustments that might be required. The specific research questions were:

  1. Does oral narrative intervention have an effect on the macrostructure of fictional narrative retells produced by school-aged children with ASD and severe language impairment?

  2. Do improvements in the macrostructure of fictional narrative retells produced by school-aged children with ASD and severe language impairment maintain after intervention has been withdrawn?

  3. Do improvements in the macrostructure of fictional narrative retells produced by school-aged children with ASD and severe language impairment generalize to storybooks typical of classroom use?

Method

Participants

Two girls and two boys were selected to participate in the intervention study. All four participants attended the university-based special education program where the intervention took place. The university research ethics committee approved the study. Participants attended the program Monday to Friday and received instruction in a broad educational program with a focus on literacy and numeracy. Participants were eligible for the study if they (a) had a diagnosis of ASD from a pediatrician or psychologist, (b) had a receptive and expressive language impairment according to results from a standardized language assessment, (c) had English as their home language, (d) had speech intelligible to unfamiliar listeners, as judged by the researchers, (e) were able to sit at a desk and participate in a structured class activity for 10 to 15 min, as reported by the classroom teachers, and (f) did not include all of the following macrostructure elements in their fictional narrative retells: who, what + where, problem, feelings about the problem, what the person in the story did to fix the problem, what happened next, and the end.

The first author, who was also the school speech and language pathologist, conducted language assessments using the Clinical Evaluation of Language Fundamentals, 4th Ed, Australian and New Zealand Standardized Edition (Semel et al. 2006), and the Peabody Picture Vocabulary Test, 4th Ed (Dunn and Dunn 2007). The final inclusion criterion was based on a screening fictional narrative retell collected from each participant before the study by asking each child to listen to a short narrative and then tell it back. The fictional narrative retells were collected in a quiet room with the participant sitting next to the first author. The participants’ classroom teachers also completed the Childhood Autism Rating Scale, 2nd Ed (Schopler et al. 2010). The results of the assessments are provided in Table 1. Shortly after the intervention finished, Zoe was diagnosed by a neurologist with absence seizures; she did not receive medication.

Table 1 Participant description

Experimental Design

A multiple baseline across participants design with probes was used to investigate the effects of a fictional narrative retell intervention on the macrostructure of the participants’ fictional narratives. Children who may benefit from narrative intervention are a diverse group who may have complex and idiosyncratic problems, and the diversity of needs and skills in children with ASD makes it difficult to recruit the large samples needed for group designs. Multiple baseline across participants designs allow the researcher to investigate behaviors in individuals rather than groups. Experimental control is demonstrated in multiple baseline designs when the data show an experimental effect at three different points in time (Kratochwill et al. 2010; Kazdin 2011).

Materials

The first author used a magnetic whiteboard (60 × 45 cm), icon cards (5 × 5 cm) representing each of the seven macrostructure elements, and one probe narrative and one intervention narrative per session. The narratives were written by the first author based on narratives in The Test of Narrative Retell-Preschool (TNR-P) (Spencer and Petersen 2010). Each narrative was textually explicit, as all the information needed to fully understand the text was given to the listener (Carnine et al. 2009). The narratives contained situations and problems likely to be within the participants’ experience (e.g., falling off a scooter). All 30 stories were written in the same format: they contained between 65 and 75 words and presented information pertaining to the seven macrostructure elements in the same order. Each narrative included the macrostructure elements of who (the main character), what + where (what the main character was doing and where they were), problem (what went wrong), feelings about the problem, do (what the main character did to try to fix the problem), next (what happened after the main character tried to fix the problem), and end. Picture Communication Symbols (Mayer-Johnson 2008) representing each of the macrostructure elements were used as icons (visual supports). After two intervention sessions with the first participant, the data indicated that the original “where” macrostructure component and its corresponding icon were not eliciting the expected information. The component and icon were therefore altered to become “what + where”, to explicitly incorporate the setting activity plus the location, as used by Petersen and Spencer in the TNR-P (Spencer and Petersen 2010). This change was applied in both the icon probe condition and the intervention for all participants.

Each of the 30 narratives was assigned a number between one and 30. A random number generator (www.random.org) was used to select 10 numbers between one and 30, to be the probe narratives. Those 10 narratives were then renamed probe narrative 1–10. The remaining 20 narratives were used for intervention (intervention narratives 1–20).
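As an illustration only, the allocation of the 30 narratives to probe and intervention sets can be sketched in Python. The sketch substitutes Python's random.sample for the www.random.org generator used in the study and assumes that the 10 selected stories were renamed in ascending order of their original numbers; the function name allocate_narratives and the seed value are hypothetical.

```python
import random


def allocate_narratives(n_stories=30, n_probes=10, seed=None):
    """Split stories numbered 1..n_stories into probe and intervention sets.

    Hypothetical helper: random.sample stands in for www.random.org, and
    renaming in ascending order of original number is an assumption.
    """
    rng = random.Random(seed)
    story_numbers = list(range(1, n_stories + 1))
    # Draw the probe stories without replacement, then sort them.
    probe_originals = sorted(rng.sample(story_numbers, n_probes))
    # Every story that was not drawn becomes an intervention story.
    intervention_originals = [n for n in story_numbers if n not in probe_originals]
    # Rename: probe narrative k -> original story number (same for intervention).
    probes = {k: orig for k, orig in enumerate(probe_originals, start=1)}
    interventions = {k: orig for k, orig in enumerate(intervention_originals, start=1)}
    return probes, interventions


probes, interventions = allocate_narratives(seed=2024)
print(len(probes), len(interventions))  # prints: 10 20
```

Sampling without replacement guarantees that no story appears in both sets, so probe stories remain untaught.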

Dependent Variables

The dependent variable was the macrostructure of fictional narrative retells. Data were collected in both the no icon and the icon condition. The icon condition was included because it is a more sensitive measure of improvement, and it was likely that progress would be made in the icon condition before the no icon condition. The seven macrostructure elements used in this study are based on Stein and Glenn’s (1978) macrostructure elements but were renamed to make their meaning more transparent to the participants, given their level of language impairment. The macrostructure of fictional narrative retell consisted of who, what + where, problem, feelings about the problem, do, next and end.

Responses of any length were acceptable. The responses for each component were not required to be linked grammatically or to be provided in a specific order. The first author awarded each of the seven macrostructure components a score of 0, 1 or 2 according to set criteria for each story, so each retell was scored out of a possible 14 points. The scoring criteria for each story followed the general scoring guidelines set out in the Test of Narrative Retell School Age: Examiner’s Manual (Petersen and Spencer 2010a) but were adapted to suit the stimulus stories. Two points were awarded if all the relevant information was explicitly included. One point was awarded if only some relevant information was included or if the information was not specific. See Appendix 1 for definitions of each macrostructure component and general scoring guidelines. See Appendix 2 for an example story with specific scoring guidelines.
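The total score is the sum of seven component scores. A minimal Python sketch, assuming each component has already been scored 0, 1 or 2 by a coder using the story-specific criteria, shows how a retell total out of 14 could be computed and checked; the helper name and the example scores are hypothetical, not data from the study.

```python
# Component names follow the study; the example scores below are invented.
COMPONENTS = ("who", "what + where", "problem", "feelings", "do", "next", "end")
MAX_PER_COMPONENT = 2


def total_macrostructure_score(component_scores):
    """Sum the 0-2 scores for the seven components (maximum 14)."""
    missing = set(COMPONENTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = 0
    for name in COMPONENTS:
        score = component_scores[name]
        # 2 = all relevant information, 1 = partial or non-specific, 0 = otherwise.
        if score not in (0, 1, 2):
            raise ValueError(f"{name}: score must be 0, 1 or 2, got {score}")
        total += score
    return total


example = {"who": 2, "what + where": 1, "problem": 2, "feelings": 0,
           "do": 1, "next": 0, "end": 2}
print(total_macrostructure_score(example), "/", len(COMPONENTS) * MAX_PER_COMPONENT)
# prints: 8 / 14
```

Because responses need not be in order or grammatically linked, the sketch scores each component independently and simply sums the results.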

Procedures

In baseline and probe conditions the participant sat next to the first author at a table in a small room next to their regular classroom. The whiteboard was on the table directly in front of the participant. An iPhone 4 was in an elevated position on the table and was used to video record each session.

Baseline and Probes

Probes were collected weekly while a participant was not yet receiving intervention, and four times a week while they were in true baseline or receiving intervention. In baseline and probe sessions, a different narrative was used in each session for 10 sessions, after which the narratives were reused. In the intervention phase, probes were conducted before that day’s intervention session.

Two probes were conducted in each session. The first probe was the no icons condition. The whiteboard was placed in landscape orientation in front of the student but was not used. The first author greeted the participant and gained their attention by saying that she was going to read a story and that the participant should listen and then tell it back to her. If the participant began to talk while the first author was still reading, the first author raised a hand to indicate non-verbally that the participant should stop talking. The first author read the narrative, paused for 1 to 2 seconds and then asked the participant to retell the story. When the participant had stopped talking for 3 seconds, the first author thanked them but made no other comments.

The second probe was the icon condition, carried out immediately after the no icon probe using the same stimulus story. The first author placed the seven macrostructure icons across the top of the whiteboard from left to right in the following order: who, what + where, problem, feelings about the problem, do, next and end. The first author did not explain the macrostructure icons. Otherwise, the same procedure was used as for the no icon condition.

The participants’ responses were transcribed by the first author as much as possible during the probe sessions, but all of the probes and intervention sessions were video recorded to allow for baseline and probe transcription, coding, and interrater and procedural reliability. All probes were transcribed verbatim by the first author, including fillers, false starts and idiosyncratic articulation. If the first author was not able to understand the participant on the recording then it was replayed at 50% playback speed to ensure the participants were not penalized for lack of intelligibility.

After 20 probes with Zoe, the first author discontinued transcription during the probe sessions and relied on the video recording only. Zoe appeared to keep talking, including extended off-topic talk, for as long as the first author was writing. When transcription ceased, this behavior also ceased. This change was made only for Zoe.

Intervention

Intervention was implemented by the first author immediately after the probes. Participants received four intervention sessions over 3 days each week. Intervention sessions were conducted one to one in a small room next to the participants’ classroom. All intervention sessions were conducted with the first author sitting next to the participant. In the intervention sessions a different narrative was used for 20 sessions and then the intervention narratives were reused.

The intervention procedure was designed so that the participant produced each element of the retell separately in response to questioning and then said the entire retell independently. The procedure is outlined in Table 2. The specific wording used by the first author during the intervention reflected the participant’s language level (e.g., “He couldn’t get to sleep because he was scared.” or “He couldn’t get to sleep. He was scared.”). Reminders to attend and/or praise for being on task were used as needed.

Table 2 Key Steps of Fictional Narrative Retell Intervention

The seven macrostructure icons were presented as in the baseline and probe conditions. The first author stated that she was going to read another story and that the participant should listen because they would have to retell the story. The first author waited for the participant to indicate that they were ready to begin.

The first author read the intervention narrative, asked the participant to retell the story, then immediately asked the participant to say who was in the story while simultaneously pointing to the who icon at the top of the board. If the participant responded correctly, the first author provided confirming feedback (e.g., stating the character’s name and then “you told me who was in the story”) and moved the who icon to the bottom of the whiteboard. If the participant provided no response, a partially correct response or an incorrect response, the first author modeled the correct response and then asked the participant again who was in the story. If the participant then responded correctly, the first author restated the correct information, confirmed that the participant had said who was in the story, pointed to the who icon and moved it to the bottom of the whiteboard. If the participant again provided no response, a partially correct response or an incorrect response, the first author stated the correct information, stated that it was the who information, pointed to the who icon and moved it to the bottom of the whiteboard. The same procedure was followed for the remaining macrostructure elements until all the icons were at the bottom of the board.

The first author then asked the participant to retell the entire narrative, pointing to the who icon as a cue to begin. As the participant provided information for each macrostructure element, the first author pointed to the next macrostructure icon. If the participant provided no information, partially correct information or incorrect information for any macrostructure element, the first author immediately provided the correct information for the whole element and then pointed to the next macrostructure icon card.

When the participant had finished their retell, the first author retold the whole narrative while pointing to each relevant macrostructure icon. If the participant had made an error during the first opportunity to retell the narrative, the first author asked the participant to say the entire narrative again, pointing to the who icon as a cue to begin. If the participant again made an error, the first author provided the correct information and immediately moved on to the next macrostructure element. To conclude the session, the first author told the participant that they had done a great job and were finished.

A gradual introduction of the seven macrostructure components was implemented for each participant. In the first intervention session, the who, what + where and problem elements were elicited and retold, in the second session feelings about the problem and do were added and from the third session, all macrostructure elements were included.

After 33 intervention sessions, Monica was still not consistently including the do, next and end components in probe conditions. The first author therefore amended the intervention procedure to highlight those macrostructure elements: after asking for the do information, the first author modeled the correct answer straight away and then asked the question again. The correct-answer and error-correction procedures remained the same. Similarly, the correct answer was modeled straight away for the next and end macrostructure components. This change was made only for Monica.

After seven intervention sessions with Stephano, a narrative retell without icons was added to the end of each intervention session, as he had begun to show an intervention effect in the icon condition but was still scoring zero in the no icon condition. After he retold the whole narrative with icons, the icon cards were removed and the first author stated that he could also retell the story without the cards. The first author asked Stephano to retell the narrative, pointing to the place where the who icon would have been as a cue to begin. This change to the intervention was made only for Stephano.

Maintenance Probes

Maintenance probes were collected for all participants under the same conditions as baseline and intervention probes. Five maintenance probes were collected for Monica up to 26 weeks after intervention ceased; four were collected for Andre in the no icons condition and three in the icons condition up to 15 weeks after intervention; three were collected for Stephano up to 15 weeks after intervention; and two were collected for Zoe up to 8 weeks after intervention ceased.

Generalization Probes

The daily probes were a measure of generalization, as the probe stories were untaught. In addition, generalization data across stimulus types, using three storybooks that had not been read to the class but were indicated by the classroom teacher as being typical of classroom use, were collected in both the no icons and the icons condition. Generalization data were collected under the same conditions as the probes. Data were collected during the intervention and maintenance phases for Monica and Andre, and in the baseline, intervention and maintenance phases for Zoe and Stephano.

Transcript Reliability

A research assistant independently transcribed 20% of randomly selected probes. For training purposes the research assistant was instructed to transcribe the recordings verbatim including all false starts, fillers and idiosyncratic articulations and indicate blocks of unintelligible speech as UI (unintelligible). They could play the recording as often as was needed to allow full transcription and at a reduced speed. The research assistant transcribed three recordings not used for transcript reliability and the first author conducted reliability as described below. Training reliability was 80%.

Each participant’s probes were assigned a number and then selected for reliability using a random number generator (www.random.org). The first author’s transcription was the base transcription, and the research assistant’s transcription was compared against it. All words were counted in each base transcription. Fillers were not included in the assessment of transcript reliability. The differences between the two transcriptions were recorded and divided into differences that could lead to coding changes and those that could not. Differences that could lead to coding changes were those involving information essential to the story (e.g., “He was a surprise” versus “He wasn’t surprise”). Overall transcript reliability was 76% (range 73–78% across participants). Monica’s overall transcript reliability was 78% (range 65–93%), Andre’s was 76% (range 59–83%), Zoe’s was 73% (range 63–84%) and Stephano’s was 76% (range 40–100%). Only 4% of disagreements led to coding changes.

Coding Reliability

A different trained research assistant independently coded the same 20% of the participants’ narrative retell transcripts. For training purposes, the research assistant was provided with a copy of the general scoring rubric and the specific scoring rubric for each narrative. Coding for one narrative not used for reliability was then discussed. Twelve further transcripts not used for reliability were selected for coding practice, and overall agreement on these training transcripts was 86%. Disagreements were discussed.

Reliability was calculated by dividing agreements by agreements plus disagreements. Overall reliability across all seven macrostructure components was 89%. Reliability by participant was: Andre 92% (range 86–100%), Monica 86% (range 57–100%), Zoe 87% (range 57–100%) and Stephano 95% (range 92–100%). The low minimum values (57%) for Monica and Zoe reflect transcripts on which the coders disagreed on three of the seven components, which occurred on three different occasions. These disagreements arose because the participants’ disordered language structures led the coders to interpret content differently. Reliability across the individual macrostructure components was: who 96%, what + where 96%, problem 82%, feeling 93%, do 71%, next 78%, and end 84%. The do component of the stories contained complex information, which made coding judgments more challenging. Because total narrative scores were used as the dependent variable, a Pearson correlation was also calculated between the two raters’ total scores, giving a correlation of 0.98 in the no icons condition and 0.97 in the icons condition.

Procedural Reliability

A trained research assistant also conducted a procedural reliability check on the same 20% of all intervention sessions using a procedural reliability checklist (available from first author on request). For training purposes a procedural reliability checklist was provided to and discussed with the research assistant. The first author and the research assistant watched one intervention session together and jointly conducted reliability as described below. The research assistant then independently conducted reliability on two more intervention sessions. Questions arising were discussed. For reliability scoring purposes each step was scored as either correctly or incorrectly completed. Steps that were not carried out were scored as errors. Steps that were not required, for example the error correction steps if no errors occurred, were not included in the final calculations. Overall procedural reliability was 97% (range 91–100%).
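The procedural reliability rule, in which omitted steps count as errors and steps that were not required are excluded, can be sketched as follows. This is a hypothetical Python helper, assuming each checklist step is recorded as True (completed correctly), False (incorrect or not carried out) or None (not required); the session data are invented.

```python
def procedural_reliability(step_results):
    """Percentage of applicable checklist steps completed correctly.

    step_results: True (correct), False (incorrect or omitted), or
    None (not required, e.g. error correction when no error occurred).
    """
    # Steps that were not required are dropped from the calculation.
    applicable = [result for result in step_results if result is not None]
    if not applicable:
        raise ValueError("no applicable steps to score")
    # sum() counts True as 1, so this is correct steps / applicable steps.
    return 100 * sum(applicable) / len(applicable)


# Hypothetical session: 12 steps, 2 not required, 1 error among the other 10.
session = [True, True, None, True, False, True, None, True, True, True, True, True]
print(round(procedural_reliability(session), 1))  # prints: 90.0
```

Excluding non-required steps keeps sessions with few participant errors from inflating the denominator with error-correction steps that never had a chance to occur.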

Social Validity

Two social validity measures were conducted. The purpose of the first measure was to determine whether a naïve observer rated baseline or intervention narratives as better. A school volunteer with experience interpreting disordered language read five pairs of transcribed retells for each participant. Each pair consisted of the participant’s first attempt at retelling a narrative in baseline and their final attempt at retelling the same narrative after intervention. The order of baseline and intervention retells within each pair was randomized and the rater was asked to read the paired retells and judge which was the better story. The rater was given the original narrative but no explanation of what constituted a better story.

The primary purpose of the second measure of social validity was to determine whether macrostructural elements from a participant narrative could be extracted without access to the original story. Four school staff members were trained by the first author using two training scripts. The training exercise involved teachers individually extracting macrostructure elements then discussing the outcomes as a group.

Each of the staff members read five randomly selected pairs of retell transcripts for one participant. Each pair consisted of one retell from baseline (or, if a particular story was not available from baseline, from early intervention; this occurred on three of a total of 20 occasions across the four participants) and one retell from the final third of intervention. The first author used a random number generator (www.random.org) to select the narrative retells that would be presented together for each participant. For each narrative in each pair, teachers were asked to extract and record the macrostructure information (who, what + where, problem, feelings, do, next, end) and then judge which was the better narrative. The teachers were not given the original story or any explanation of what constituted a better story.

Results

Figure 1 shows the effects of oral narrative intervention on the macrostructure of fictional narrative retell for each participant in the no icons condition. Figure 2 shows the effect of oral narrative intervention on the macrostructure of fictional narrative retell for each participant in the icons condition.

Fig. 1 No icons condition

Fig. 2 Icons condition

Monica received 52 intervention sessions, totaling approximately 5 hours of intervention. Intervention sessions ranged between approximately 4 and 8 min in length. Examination of Figs. 1 and 2 indicates an intervention effect for Monica. Her baseline scores in both conditions were low, apart from one high score in the icons condition. Her scores increased quickly in both conditions once intervention began and, despite variability, show a general upward trend.

Ideally, once Monica showed an intervention effect, intervention would have started with Stephano, given his low, stable baseline; however, due to classroom considerations, intervention began with Andre. He received 30 intervention sessions, totaling approximately 2 hours and 30 minutes of intervention. Intervention sessions ranged between 3 and 8 min in length. His baseline scores in the no icons condition were stable, and even though his true baseline was higher than the weekly probes, the true baseline data showed no upward trend. His intervention data were initially quite variable but then showed a clear upward trend with a higher degree of stability.

Andre’s baseline scores in the icons condition were variable. There was an upward trend in his true baseline but it stabilized before intervention began. His intervention scores in the icon condition were also initially variable but then became more stable.

Zoe received 31 intervention sessions, totaling approximately 3 hours of intervention. Sessions ranged between 5 and 6 min. In the no icons condition, her mean score increased from 5.8 in the weekly probes to 8.45 during intervention. However, her data were variable in both conditions, and an intervention effect was not clearly demonstrated.

Stephano received 21 intervention sessions and approximately 2 hours of intervention, with sessions ranging between 5 and 7 min. Baseline scores in both conditions were low and stable, and he scored 0 on all true baseline probes. He quickly showed an intervention effect in the icons condition, and all data points except one were at or above the highest baseline data point. He did not show an intervention effect at the same time in the no icons condition, so after seven intervention sessions a structured fading of icons was introduced into the intervention procedure, as described in the method. He showed an intervention effect two sessions after this change, scoring between 5 and 13 on all remaining probes except one, when he scored 0 after a 2-week school break.

In the first measure of social validity, the naïve observer selected the intervention narrative as better than the baseline narrative in 90% of the paired transcripts. In the second measure of social validity, each staff member selected the late intervention narrative as the better narrative in five out of five pairs. For Monica, the teachers identified 11 (out of a possible 35) macrostructure elements from the early retells and 34 (out of a possible 35) from the late intervention retells. For Andre, teachers correctly identified 21 elements from early retells and 34 from late intervention retells. For Zoe, teachers identified 15 from early retells and 27 from late intervention retells, and for Stephano, 3 were identified from early retells and 23 from late intervention retells.

Discussion

The aim of this pilot study was to improve the capacity of four participants with ASD and severe language impairment to retell a short fictional narrative. This study has extended the existing body of research with participants with ASD by including participants with lower levels of intellectual ability than in previous oral narrative interventions (e.g., Gillam et al. 2015; Petersen et al. 2014) and participants with ASD and severe language impairment. The macrostructure measure used to score the dependent variable was reliable and, despite the participants’ severe language impairment, no significant problems were encountered in its application. The results of this intervention are consistent with previously reported interventions that also used macrostructure icons, modeling of narrative, and production of the entire narrative by the participant in each session (Brown et al. 2014; Gillam et al. 2015; Miller et al. 2018; Petersen et al. 2014; Spencer et al. 2013; Spencer and Slocum 2010).

A strong experimental effect was demonstrated in both the no icons and the icons conditions for Monica and Stephano, and a suggestive experimental effect in both conditions for Andre. Although Zoe’s mean scores improved, the variability in her scores means that an experimental effect was not clearly demonstrated. It is also difficult to know the extent to which her absence seizures may have affected her performance. Although all four participants improved their performance following intervention, none achieved the maximum score, which required them to recall and produce seven different components of the story macrostructure. There are several possible reasons for this. First, the demands of planning even a simple narrative retell may have exceeded the children’s language capabilities. They may have been unable to allocate mental resources to both macrostructure and microstructure (Colozzo et al. 2011), resulting in a trade-off between language features (Crystal 1987). Second, the task may have placed excessive demands on working memory. Working memory enables an individual to store and process information at the same time (Baltruschat et al. 2011) and, although much of the evidence concerning working memory in individuals with ASD is conflicting, there is evidence for reduced working memory performance in those with ASD and cognitive delay (Poirier and Martin 2008). Finally, Zoe and Andre produced segments of unintelligible speech during their probes, and their scores may have been depressed because parts of some responses could not be understood.

The purpose of the present pilot study was, in part, to trial and refine intervention procedures. The revisions made to Monica’s and Stephano’s intervention procedures highlight the benefits of single case research with this population. Children with ASD often have specific and idiosyncratic abilities (Busby et al. 2012; Nicholas et al. 2008) and, consequently, may respond differently to interventions. Single case research allows changes to be made to the intervention procedure during a study to accommodate these individual responses.

Although Monica made rapid and consistent progress with the first four elements, her progress then plateaued, and she struggled to consistently achieve a score of two on each of the do, next and end elements. Similarly, Andre had particular difficulty with the next component of the retell and, in hindsight, may have benefited from extra intervention targeting that component. Previous researchers have noted that some elements of retells are more difficult than others and may require more teaching (Dimino et al. 1995). The revisions to Monica’s intervention were designed to teach by example rather than by explaining the meanings of the do, next and end elements. These amendments did not lead to a consistent increase in her scores; it is possible that the complexity of the do and next elements in particular exceeded her language or cognitive capabilities. External factors also contributed to the variability of her scores, as on some occasions she was apparently distracted during the tasks. Monica was moved to maintenance conditions after 52 intervention sessions because it was considered that she had possibly reached her cognitive and linguistic limit.

Revisions were made to Stephano’s intervention procedure after only seven intervention sessions, as he was not transferring the skills he gained in the icons condition to the no icons condition. A structured fading procedure was implemented in which the icons were removed as a final step in the intervention and he was required to retell the intervention narrative without the icons to guide him. He was then able to rapidly transfer the skills developed with the icons to the no icons condition. Additional fading procedures were not necessary for the other participants.

Experimental control was not clearly demonstrated for Zoe, whose data showed considerable variability, including some high scores in baseline. There were, however, 11 occasions out of 31 when her intervention probe data for the no icons condition were above the highest baseline data point. The case for an intervention effect in the icons condition is stronger because, toward the end of the intervention period, her scores began to stabilize at or above the highest baseline data point. This stability in the icons condition could be due to two factors. First, her increasing stability coincided with the change in probe collection conditions. Second, her better performance in the icons condition could reflect her explicit use of the icons to help her retell the story; very soon after the intervention started, she stated “I can use the icons to help me.”

Social validity data address the meaningfulness of the intervention, including whether it produced clinically important changes (Foster and Mash 1999; Wolf 1978). The social validity results indicated that raters who were blind to the conditions under which a narrative retell had been produced judged the later intervention narratives to be better than the baseline narratives. They also indicated that the later intervention narratives of all participants included more recognizable and correct macrostructure components. Thus, there is strong evidence of meaningful improvement in narrative retell according to the assessment of blinded observers.

The daily probes used to measure intervention effects were a close measure of generalization, as participants were not taught the probe narratives; the outcomes for this measure have been discussed above. An additional, far measure of generalization was the participants’ capacity to retell a storybook typical of classroom use. The participants were not able to generalize the taught macrostructure system when retelling these storybooks. The storybook narratives were more complex than the intervention narratives: they were longer, presented the macrostructure elements in varied ways, contained more complex syntax and vocabulary, and required some inference to establish a full understanding of the events. The lack of generalization may therefore reflect a mismatch between these storybooks and the capabilities of the participants, at least with regard to independent narrative understanding and retell.

Limitations

A number of limitations of the present pilot study should be acknowledged. The production of accurate transcripts was difficult for several reasons. First, participants presented with marked social pragmatic language deficits, such as not speaking loudly enough, not facing the first author when talking, and speaking quickly. These issues decreased intelligibility, even for motivated and familiar listeners. Second, because of their severe language impairments, participants made a high number of unpredictable language errors. In addition, recordings were made with an iPhone microphone, and a higher quality external microphone might have increased transcript reliability. Because of these transcription difficulties, transcript reliability was marginal at 78%. It was, however, calculated stringently: all utterances, not just those affecting coding, were included in the reliability data. Critically, most disagreements were over words or phrases that did not carry meaning for the coding (e.g., whether the participant said the versus a, or ate it versus ated) and had very limited effect on coding.

Transferring knowledge from the clinic to the classroom is important in the research to practice framework (Brown et al. 2014). Another limitation of the study is therefore the individual delivery of the intervention by a speech language pathologist, which is not always practical in a school setting. The short, semi-scripted intervention could, however, be modified for use by teachers as an individual or classroom-based group intervention. Researchers in two previous studies have shown an intervention effect when delivering an oral narrative intervention to small groups (Brown et al. 2014; Spencer and Slocum 2010).

Finally, two changes to the probe conditions should be noted: first, the change made for Zoe after the first author noticed that a non-essential component of the data collection process was affecting her performance; and second, the change in the icons probe condition for Monica early in the study. These changes compromise the ability to infer causal influence and were threats to internal validity. Nevertheless, the purpose of the current pilot study was to trial measurement, and these data will be useful in future research.

Implications for Practice and Future Research

The outcomes of this pilot study are promising. The results indicate that the intervention may be effective and support the usefulness of previous interventions using similar strategies. The materials and strategies implemented in this study are potentially useful for clinical or classroom practitioners but, given the limited research, should be used with caution.

Future research in the area should incorporate a number of changes. Researchers in future narrative retell intervention studies should consider strategies that specifically address more complex narrative components, such as the do and next components of fictional narrative retells, so that more complete information can be retold.

Researchers in future studies could also investigate the efficacy of small group intervention with participants with ASD and severe language impairment within the classroom. Finally, future research could investigate how to extend the effects of intervention with simple, predictable stories, such as those used in this study, to more complex stories, for example, stories in which the elements are presented in an unpredictable order and higher levels of inference are required. This might include progressively varying the order of macrostructure elements and gradually increasing complexity to facilitate generalization to the more complex stories typically used in the classroom.

Conclusion

This paper describes the effects of an oral narrative intervention on the fictional narrative retells of four participants with ASD and severe language impairment. Key components of the intervention included the use of macrostructure icons to represent the components of a simple, orally presented narrative, modeling of the narrative, and requiring the participants to retell an entire narrative in each intervention session. There was reasonable evidence, from retells of untaught narratives, that the intervention was effective for three of the four participants. Revisions to the intervention procedure were made for two participants, highlighting the suitability of single case research for this population. The learned skills were maintained, but generalization to storybooks typical of classroom use did not occur. Areas for future research include investigation of group delivery, maximizing performance on more complex narrative components, and transferring skills developed with simple short narratives to storybooks typical of classroom use.