Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impaired development in social communication and social interaction and restricted, repetitive patterns of behavior, interests, or activities (American Psychiatric Association 2013). Severity and degree of impairment vary greatly within ASD (Irwin et al. 2011). The prevalence of ASD has increased over the last decades (Campbell et al. 2011; Elsabbagh et al. 2012; Irwin et al. 2011), now estimated at 62 per 10,000 (Elsabbagh et al. 2012). People with ASD often require life-long support (Reichow et al. 2012) and experience reduced quality of life (Khanna et al. 2011; Kuhlthau et al. 2010; Lee et al. 2008). The provision of effective treatment and education is essential for the patient’s independence and coping.

Numerous treatment strategies have been claimed to improve functional outcomes for children with ASD, but there is a paucity of controlled studies examining the efficacy of most treatments (National Standards Project 2009; Oono et al. 2013). Early and intensive interventions, based on applied behavior analysis, are among the few treatment options with strong empirical support (Eldevik et al. 2009; Howlin et al. 2009; Makrygianni and Reed 2010; Reichow et al. 2012, 2018; Reichow and Wolery 2009; Virués-Ortega 2010; Wong et al. 2015). Behavioral interventions utilizing a developmental orientation have evolved over the past 20 years and are now referred to as “Naturalistic Developmental Behavioral Interventions” (NDBI). NDBIs integrate developmental principles and applied behavior analysis and also incorporate a developmental systems approach (Schreibman et al. 2015). NDBIs focus on facilitating learning and development in daily interactions, supporting the learning of functional skills. In NDBIs, the intervention provider systematically uses joint activities to expand children’s reciprocity, communication, social, and play skills while simultaneously targeting cognitive, motor, and adaptive skills (Schreibman et al. 2015). Pivotal response treatment (PRT) is one of several interventions referred to as NDBI (Schreibman et al. 2015). Other NDBI interventions are, among others, the Early Start Denver Model (ESDM; Rogers et al. 2012), Reciprocal Imitation Training (RIT; Ingersoll 2012), and Joint Attention Symbolic Play Engagement and Regulation (JASPER) (Kasari et al. 2010).

PRT for ASD is provided in naturally occurring situations in order to facilitate generalization, reduce prompt dependency, and increase spontaneity and motivation (Suhrheinrich 2015). PRT is distinct from other NDBIs in its focus on pivotal areas, i.e., developmental areas that, when targeted, result in a widespread effect on other, not targeted, areas and skills. Research on PRT indicates that targeting pivotal areas contributes to more efficient treatment, as fewer skills need to be specifically targeted (Koegel and Koegel 2012). The four pivotal areas are motivation, self-initiations, self-management, and responding to multiple cues (Bryson et al. 2007), with motivation as the core one (Koegel and Koegel 2012; Smith et al. 2015). The area of motivation can be targeted by five main PRT motivation techniques (Koegel and Koegel 2012):

  • Child chosen stimulus items

  • Interspersal of acquisition and maintenance tasks

  • Task variation

  • Natural reinforcement

  • Reinforcing attempts

Several sources consider PRT an evidence-based intervention for children with ASD, but few longitudinal or controlled trials confirm the efficacy (Bozkus Genc and Vuran 2013; Cadogan and McCrimmon 2015; Suhrheinrich 2015; Wong et al. 2015). Our aim was to conduct a systematic review of randomized controlled studies examining the effectiveness of PRT on social communication, social interaction, and repetitive behavior in children with ASD.

Methods

A protocol for this review is published in the PROSPERO international prospective register of systematic reviews (identification number CRD42016038328). We followed the recommendations of the Cochrane Collaboration when conducting the review (Higgins and Green 2011) and the PRISMA checklist for reporting systematic reviews (Moher et al. 2009). An extended version of this review is published in Norwegian as part of a master’s thesis and is available on request.

Eligibility Criteria

We included randomized controlled trials involving children with ASD up to 18 years of age (Table 1). Our targeted intervention was PRT, with outcome measures for social-communication, social interaction, and repetitive behaviors.

Table 1 Study eligibility and exclusion criteria

Information Sources and Search Strategy

We searched the following databases from their inception to August 2017: MEDLINE, EMBASE, PsycINFO (all via Ovid), ERIC, Cinahl, SocINDEX (all via EBSCOhost), Cochrane Central Register of Controlled Trials (CENTRAL), and ahead of print citations in PubMed. We searched OpenGrey and Google Scholar for gray literature and ClinicalTrials and WHO-International Clinical Trial Registry Platform (ICTRP) for ongoing trials. We also conducted a citation search of included studies in Web of Science and assessed reference lists of included studies as well as existing systematic reviews to identify additional potentially relevant studies.

We developed a search strategy for Ovid MEDLINE, which was adapted to the other databases (Online Resource 1). A librarian assessed the quality of the search using the Peer Review of Electronic Search Strategies (PRESS) checklist (McGowan et al. 2016).

Study Selection

Two reviewers (HNO and KL) independently assessed the titles and abstracts of records identified by the search. Records appearing to meet the inclusion criteria were retrieved in full text. The same two reviewers independently assessed the full-text publications for inclusion.

Data Extraction and Risk of Bias Assessments

Two reviewers (HNO and KL, LVN, or KGB) extracted data from included studies and used a modified version of the guidelines from the Cochrane Consumers and Communication Review Group to assess risk of bias in the included studies (Ryan et al. 2007). Risk of bias was assessed in nine domains: random sequence generation, allocation concealment, baseline measurements, blinding of treatment providers, deviation from intended interventions, blinding of outcome assessment, incomplete outcome data, selective reporting, and other bias. Each criterion was assessed as having “high,” “low,” or “unclear” risk of bias (Higgins and Altman 2008).

Data Synthesis

In meta-analyses of continuous outcome data, we calculated standardized mean differences (SMD), 95% confidence intervals, and P values using a random-effects model (Deeks et al. 2008). We assessed heterogeneity using the chi-square test and the I² statistic (Deeks et al. 2008).
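The synthesis described above can be illustrated with a minimal Python sketch. It is not the software used in this review; it assumes Hedges' adjusted g as the SMD, a common large-sample approximation for its variance, and the DerSimonian-Laird estimator for the between-study variance, none of which is stated explicitly in the text. The function names and the input values are hypothetical.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance) for two independent groups (assumed estimator)."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with Q and I-squared."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    z = pooled / se
    return {
        "smd": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P value
        "q": q,
        "tau2": tau2,
        "i2": max(0.0, (q - df) / q) * 100 if q > 0 else 0.0,
    }

if __name__ == "__main__":
    # Hypothetical summary data (mean, SD, n per arm), not taken from any included study.
    studies = [hedges_g(10, 2, 20, 8, 2, 20), hedges_g(12, 3, 15, 11, 3, 16)]
    print(random_effects([g for g, _ in studies], [v for _, v in studies]))
```

With only two studies per analysis, as in this review, the between-study variance is estimated from a single degree of freedom, so the pooled confidence interval should be read with caution.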

Two reviewers (HNO and KGB) applied Grading of Recommendation, Assessment, Development and Evaluation (GRADE) to assess the quality of the evidence for each outcome (Schünemann et al. 2008). Briefly, the quality of the evidence depends on the risk of bias in included studies, directness of the evidence, heterogeneity, precision of the summary estimates, and risk of publication bias (GRADEpro 2014).

Results

The search resulted in 4916 records after removal of duplicates (Fig. 1). A total of 4821 records were excluded after assessment of titles and abstracts, and 89 records were excluded after assessment of full text (Online Resources 2 and 3).

Fig. 1

Flow chart of search results and study selection (Moher et al. 2009)

Study Characteristics

Five studies, described in seven publications (two studies were each described in two publications), were included in this review (Hardan et al. 2015; Mohammadzaheri et al. 2014, 2015; Nefdt 2007; Nefdt et al. 2010; Openden 2005; Schreibman and Stahmer 2014). Characteristics of included studies are presented in Table 2.

Table 2 Characteristics of included studies. Two studies were described in two different publications

Setting and Participants

The five studies included 181 children, and 91 of these received PRT. The mean age of all children was 5.3 years, ranging from 2.4 to 9.2 years across the studies. Three studies included children with a minimum of language skills (Hardan et al. 2015; Mohammadzaheri et al. 2014; Nefdt 2007), one included children with a maximum of intelligible words (Schreibman and Stahmer 2014), and one study did not apply criteria for language skills (Openden 2005).

Interventions and Comparisons

Two studies compared PRT to treatment as usual (Mohammadzaheri et al. 2014) or information about the diagnosis (Hardan et al. 2015). One study compared PRT to the Picture Exchange Communication System (PECS) (Schreibman and Stahmer 2014). The remaining two studies used a waiting-list control group (Nefdt 2007; Openden 2005).

The duration of the PRT intervention ranged from 1 to 23 weeks, but one study did not specify duration (Nefdt 2007). Two studies used professional therapists to implement the intervention (Mohammadzaheri et al. 2014; Schreibman and Stahmer 2014), whereas parents provided the intervention in the remaining studies (Hardan et al. 2015; Nefdt 2007; Openden 2005). A detailed overview of the interventions and comparisons is available in Table 3.

Table 3 Overview of interventions in included studies. Two studies were described in two different publications

Reported Outcomes

All studies reported outcomes for social communication within the subdomain for expressive language. Three studies assessed the subdomain for communication (Hardan et al. 2015; Mohammadzaheri et al. 2014; Schreibman and Stahmer 2014), and one study assessed the subdomain for receptive language (Hardan et al. 2015). Moreover, Hardan et al. (2015) and Mohammadzaheri et al. (2015) reported social interaction and repetitive behavior, respectively (Table 2).

Risk of Bias

A summary of the risk of bias assessments is shown in Fig. 2. Incomplete information to judge risk of bias associated with random sequence generation and allocation concealment was a concern in three studies (Mohammadzaheri et al. 2014; Nefdt 2007; Openden 2005). Lack of blinding was associated with high or unclear risk of bias for four studies (Hardan et al. 2015; Mohammadzaheri et al. 2014; Openden 2005; Schreibman and Stahmer 2014), while two studies had a risk of bias related to incomplete outcome data (Nefdt 2007; Schreibman and Stahmer 2014). We distinguished between outcomes directly measured with tests or observations of the child and more subjectively scored outcomes. Only one study was assessed as having an overall low risk of bias for language and social interaction outcomes when measured directly (Hardan et al. 2015). When more subjective scoring was used to assess the same two outcomes (Hardan et al. 2015), the outcomes were judged as having unclear risk of bias. Most other outcomes were judged as having high risk of bias. A full risk of bias assessment is provided (Online Resource 4).

Fig. 2

Summary of risk of bias of included studies (Review Manager (RevMan) 2014)

Effects of Interventions

Social-Communication Skills: Communication

Two studies measured communication skills using parent or professional reporting (Hardan et al. 2015; Mohammadzaheri et al. 2014). The results of these two studies were synthesized in a meta-analysis, and the resulting standardized mean difference (SMD) was 1.12 (95% CI −0.49 to 2.73; P = 0.17; Fig. 3). Hence, the difference between the groups was not statistically significant, but the confidence interval ranged from a medium effect in favor of the comparator to a large effect in favor of PRT. Wide confidence intervals and the presence of considerable heterogeneity in the analysis (P = 0.003 and I² = 89%) prevent us from drawing firm conclusions.
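As a rough consistency check on the figures reported above, the P value can be back-calculated from the SMD and its 95% confidence interval, and the heterogeneity P value from I² when only two studies are pooled (one degree of freedom). The sketch below assumes a normal approximation and works from the rounded published values, so it can only reproduce the reported numbers approximately. The function names are hypothetical.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Two-sided P value implied by an estimate and its 95% CI (normal approximation)."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

def q_from_i2(i2_percent, df):
    """Invert I2 = (Q - df) / Q to recover Cochran's Q; valid for 0 <= I2 < 100."""
    return df / (1 - i2_percent / 100)

def chi2_p_df1(q):
    """Upper-tail P value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))

if __name__ == "__main__":
    # Values as reported for subjectively rated communication (two studies, df = 1).
    print("P for pooled SMD:", p_from_ci(1.12, -0.49, 2.73))
    print("P for heterogeneity:", chi2_p_df1(q_from_i2(89, df=1)))
```

Both back-calculated values round to the reported P = 0.17 and P = 0.003, which suggests the reported estimate, interval, and heterogeneity statistics are internally consistent.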

Fig. 3

Forest plot for the comparison of PRT versus treatment as usual measured on subjectively reported communication (child social-communication skills)

Schreibman and Stahmer (2014) compared PRT versus PECS (another active communication intervention), and their study was therefore not included in the meta-analysis. They reported an SMD of −0.57 (95% CI −1.25 to 0.10; P = 0.10) on subjectively reported communication skills. The difference was not statistically significant, and the wide confidence interval ranged from a small effect in favor of PRT to a large effect in favor of the comparison intervention.

Social-Communication Skills: Expressive Language

Mohammadzaheri et al. (2014) and Hardan et al. (2015) measured expressive language by child observation. The results were pooled in a meta-analysis (SMD of 0.48; 95% CI 0.04 to 0.93; P = 0.03; Fig. 4), implying a statistically significant result in favor of PRT. The meta-analysis was not associated with heterogeneity (P = 0.67 and I² = 0%), but the confidence interval ranged from little or no difference to a large positive effect in favor of PRT.

Fig. 4

Forest plot for the comparison of PRT versus treatment as usual measured on directly measured expressive language (child social-communication skills)

Expressive language was also reported by Nefdt (2007) and Openden (2005), but the duration of the PRT intervention was very short and the control conditions were poorly described. We therefore deemed it inappropriate to include the data in the same meta-analysis as above. Instead, Nefdt (2007) and Openden (2005) were pooled in a separate meta-analysis that resulted in an SMD of 0.58 (95% CI 0.03 to 1.13; P = 0.04; Fig. 5). Hence, the latter analysis also pointed in favor of PRT, with a similar effect estimate, confidence interval, and heterogeneity (P = 0.30 and I² = 8%) as the meta-analysis comparing PRT versus treatment as usual.

Fig. 5

Forest plot for the comparison of PRT versus passive intervention measured on directly measured expressive language (child social-communication skills)

Schreibman and Stahmer (2014) compared PRT versus PECS without finding a statistically significant difference in expressive language between the two active interventions (SMD = −0.40; 95% CI −1.04 to 0.24; P = 0.22). The confidence interval ranged from a large positive effect in favor of PECS to a small positive effect in favor of PRT and does not allow firm conclusions.

In two studies, parents or professionals subjectively rated the child’s expressive language (Hardan et al. 2015; Schreibman and Stahmer 2014). Hardan et al. (2015) did not find statistically significant differences between PRT and information when measured by VABS (SMD = 0.45; 95% CI −0.13 to 1.03; P = 0.13) or CDI (SMD = −0.35; 95% CI −0.92 to 0.23; P = 0.24). Schreibman and Stahmer (2014) did not detect a statistically significant difference between PRT and PECS when measured by CDI (SMD = −0.06; 95% CI −0.72 to 0.61; P = 0.87). For all three effect measures, the confidence intervals were too wide to allow firm conclusions about the effectiveness of PRT.

Social-Communication Skills: Receptive Language

Hardan et al. (2015) described the effect of PRT on receptive language (SMD 0.22; 95% CI −0.35 to 0.79; P = 0.45). The confidence interval ranged from a slight positive effect in favor of information about the diagnosis to a medium effect in favor of PRT.

Social Interaction

Hardan et al. (2015) measured the effectiveness of PRT by using an objectively assessed severity scale (CGI-S) (SMD 0.46; 95% CI −0.12 to 1.04; P = 0.12) and an improvement scale (CGI-I) (SMD 1.12; 95% CI 0.50 to 1.74; P = 0.0004). The authors also reported a more subjective outcome measure (SRS) that gave rise to an SMD of 0.48 (95% CI −0.10 to 1.06; P = 0.10). The reported effect estimates were highly inconsistent, preventing us from drawing firm conclusions about the effectiveness of PRT.

Repetitive Behavior

Only Mohammadzaheri et al. (2015) assessed repetitive behavior through direct assessment, and the authors showed a statistically significant effect in favor of PRT (SMD 15.97; 95% CI 11.57 to 20.36; P < 0.0001).

Subgroup and Sensitivity Analyses

The planned subgroup and sensitivity analyses, intended to explore possible differences between treatments and reasons for heterogeneity in our meta-analyses, were not conducted because too few studies were included.

Quality of Evidence

The quality of the evidence was assessed using GRADE and rated as “low” for most of the outcomes and “very low” for one outcome (Tables 4 and 5; Online Resource 5). The main reasons for the low rating of the quality of evidence were poor precision of results due to wide confidence intervals and few studies. Some outcomes were also downgraded due to risk of bias, indirectness, and inconsistencies. When the quality of evidence is rated as low or very low, it implies that we have limited confidence in the presented effect estimates and that more research is needed before we can draw firm conclusions.

Table 4 Summary of findings: comparison control treatment
Table 5 Summary of findings: comparison passive treatment

Discussion

Five randomized controlled trials were included in this systematic review. Due to differences in interventions and comparison interventions, we were only able to conduct three meta-analyses, each with two studies. Expressive language measured by direct methods showed statistically significant effects of PRT as compared to treatment as usual and passive treatment, with an SMD of about 0.5. A positive effect of PRT on subjectively reported communication skills cannot be ruled out, but the difference did not reach statistical significance (SMD 1.12; 95% CI −0.49 to 2.73; P = 0.17). We also summarized results from other studies and outcomes, but these results could not be combined in meta-analyses due to differences in the interventions and comparison interventions. For most of the outcomes, the quality of the evidence was judged to be low, implying limited confidence in the results. With regard to the subjectively reported communication skills, the quality of evidence was rated as very low, implying that no conclusions could be drawn.

The majority of the included studies had methodological limitations, with selection bias as the most concerning. Three of the included studies lacked information about randomization and allocation concealment (Mohammadzaheri et al. 2014; Nefdt 2007; Openden 2005). We included the three studies despite these possible shortcomings but are aware that inadequate random sequence generation and allocation concealment can be associated with inflated effect estimates (Pildal et al. 2007; Schulz et al. 1995). Moreover, inadequate randomization procedures and small studies often indicate a need to adjust for baseline differences between the groups in the analysis. The P values reported by Openden (2005) demonstrate that P values may vary considerably between adjusted and unadjusted analyses, but because adjusted effect estimates were not reported in the primary studies, we had to include unadjusted effect estimates in our meta-analysis.

There were two main concerns about the appropriateness of performing meta-analysis. First, there was a risk that pooling studies at high and low risk of bias would exaggerate the summary effect and increase the heterogeneity in a meta-analysis. We planned to accentuate studies with low risk of bias in sensitivity analysis, but unfortunately, the limited number of available studies prevented us from carrying out meaningful sensitivity analyses. The second concern was the variation in treatment providers. In three studies, parents provided the intervention (Openden 2005; Nefdt 2007; Hardan et al. 2015), but the number of included studies in each meta-analysis was not sufficient to allow meaningful subgroup analysis. However, there was a clear difference between parents and professionals in their ability to reach fidelity with the implementation criteria. Thus, the decision to pool studies irrespective of who provided the intervention may contribute to clinical diversity and a more heterogeneous meta-analysis.

The meta-analyses of communication skills and expressive language included the same two studies (Hardan et al. 2015; Mohammadzaheri et al. 2014), but only the meta-analysis of communication skills was associated with considerable heterogeneity (P = 0.003; I² = 89%). Mohammadzaheri et al. (2014) reported large effect sizes for all their outcomes, and as discussed by Wood et al. (2008), the combination of subjective measures and inadequate allocation concealment may have led to exaggerated effect sizes. Another factor that may contribute to a placebo effect, as pointed out by Masi et al. (2015), is whether a trial is located in Iran or in the United States (USA). The study by Mohammadzaheri et al. (2014) was conducted in Iran, while the study by Hardan et al. (2015) was conducted in the USA.

Last, the studies included in this review used three different manuals describing only the pivotal area of motivation. Motivation was identified first and is considered the core area, affecting all other areas (Koegel and Koegel 2012). Several efficacy studies investigating motivation have been published (Koegel and Koegel 2012), while the other pivotal areas have been explored only in more recent years (Koegel and Koegel 2012). More research investigating the other areas is therefore highly needed. The lack of studies investigating these areas restricts the generalizability of the results to the three other pivotal areas.

Comparison with Other Studies

There are other published reviews that evaluate PRT alone or as one of several interventions for ASD (Bozkus Genc and Vuran 2013; Oono et al. 2013; Rispoli et al. 2011; Schultz et al. 2011; Sisavath 2014). Their results are more positive than those of our review, although one review showed mixed results (Verschuur et al. 2014), and one review (Boudreau et al. 2015) concluded that PRT is not a promising intervention. There are several reasons why our results differ from those of previous reviews. First, former reviews have mainly included single case design studies (Boudreau et al. 2015; Bozkus Genc and Vuran 2013; Rispoli et al. 2011; Schultz et al. 2011; Sisavath 2014; Verschuur et al. 2014). Accordingly, these reviews have included a larger number of studies. One review included studies where only one participant needed to be diagnosed with ASD (Verschuur et al. 2014), and one review included only studies in which peers of children were the education providers (Boudreau et al. 2015). Previous reviews have also included parent interventions or parent education programs in general (Oono et al. 2013; Rispoli et al. 2011; Schultz et al. 2011; Sisavath 2014). Those reviews analyzed outcomes according to studies using PRT; three studies specified communication or language as outcomes (Oono et al. 2013; Rispoli et al. 2011; Schultz et al. 2011; Sisavath 2014), and one specified social skills (Boudreau et al. 2015). Sisavath (2014) reports that all included studies using PRT increased communication and language skills for the children. Boudreau et al. (2015) concluded that PRT is not a promising intervention for targeting social communication skills. This result is likely to be influenced by the fact that peers of children implemented the intervention.

Last, Oono et al. (2013) included one study specifying implementation of PRT (Nefdt 2007), which is the only one also included in this review. They assessed the study by Nefdt (2007) more strictly than we did with respect to randomization, allocation concealment, and blinding of education providers, but the study was judged as having an overall high risk of bias in both reviews. Oono et al. (2013) neither included the study in a meta-analysis nor described other analyses of outcomes for children’s language skills, making any comparison with our review difficult.

Implication for Practice and Research

Due to the low overall quality of evidence for most of the results, the impact on the field of practice is uncertain, and our review does not answer whether the efficacy of PRT depends on whether parents or professionals implement the treatment. A clearer result in this regard would have had a greater impact on recommendations for practice. The internal validity of the studies included in this review indicates that reaching fidelity of implementation before providing the intervention may affect the results in a positive way. This implies a larger focus on implementation fidelity, regardless of whether parents or professionals are treatment providers. We need more research addressing this aspect. Until firmer conclusions can be drawn, professionals should follow up on criteria for implementation fidelity by offering good and sufficient training to support parents’ implementation of PRT.

Another important recommendation for future research is to separate evaluations of implementation of PRT from evaluations of training parents in PRT. Our review included a limited number of outcomes, and more knowledge on several other outcomes for PRT is needed, such as quality of life, intelligence quotients, adaptive behavior, and parent stress. These outcomes are important for the child and family’s long-term outcomes.

Given that most of our included studies had methodological shortcomings, the quality of evidence is low. For the future, we need robust research to improve the certainty of the evidence, a recommendation that also calls for increased use of validated and objectively measured outcomes.

The results of this review are mixed and inconclusive, but this does not imply that PRT interventions are ineffective. On the contrary, our results support and call for more high-quality research on PRT interventions for children with ASD.

Conclusions

Five randomized controlled trials were included in this review. A statistically significant effect of PRT was seen on expressive language skills, but the overall certainty of evidence was judged as low, and it is difficult to draw firm conclusions. We need more research to fill the existing gap of evidence. Future studies should focus on transparent and robust methodology and the use of validated outcomes.