Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects an estimated one in 36 children in the United States (Maenner et al., 2023). ASD is characterized by persistent deficits in social communication and interaction, as well as repetitive and restricted patterns of behavior, interests, or activities (American Psychiatric Association, 2013). Children with ASD often face challenges in multiple areas of development, including language, communication, social skills, and adaptive functioning (e.g., Zwaigenbaum et al., 2015). Research (e.g., Magiati et al., 2014) has demonstrated the long-term impact of ASD on academic performance, social engagement, and quality of life. Moreover, the increasing economic and societal cost of raising a child with ASD can be a burden to families (Rogge & Janssen, 2019). Research has shown that early intervention for children with ASD can have positive impacts, especially for communication and socialization (e.g., Lord et al., 2022; Zwaigenbaum et al., 2015). However, questions about the active elements of early intervention remain, with research showing that the effectiveness of intervention can vary according to the specific intervention approach and child characteristics (Trembath et al., 2023). Therefore, it is essential to scientifically identify effective early interventions and their active elements for children with ASD.

Naturalistic Developmental and Behavioral Interventions

In 2015, Schreibman and colleagues proposed a framework to categorize intervention models that integrate behavioral and developmental strategies, which they termed naturalistic developmental and behavioral interventions (NDBIs; Bruinsma et al., 2020; Schreibman et al., 2015). NDBIs integrate behavioral technologies and developmental principles in instructional strategies, such as environmental arrangements, natural reinforcers, prompting and prompt fading strategies, turn-taking, adult imitation of the child’s behavior, and use of the three-term contingency (i.e., antecedent, behavior, consequence). The teaching targets of NDBIs arise from developmental domains, such as language and communication, play, social interaction, cognition, and motor skills. Teaching opportunities occur in the child’s natural environment, during daily routines, or during other highly motivating interactions to promote generalization and maintenance of new skills in natural settings. Empirical studies have shown positive impacts of NDBIs for children with ASD across areas such as adaptive skills (e.g., Estes et al., 2015; Ingersoll et al., 2017), cognitive skills (e.g., Kasari et al., 2008; Wetherby & Woods, 2006), social communication (e.g., Brian et al., 2017; Shire et al., 2017), language (e.g., Chang et al., 2016; Dawson et al., 2010), and play (e.g., Chang et al., 2016; Shire et al., 2017).

While a number of interventions can be considered NDBIs, two of the most researched are the Early Start Denver Model (ESDM) and Pivotal Response Treatment (PRT). ESDM (Rogers & Dawson, 2010) is an evidence-based intervention for young children with ASD that focuses on children’s social-emotional, cognitive, and language development. ESDM prioritizes individualized intervention and parental involvement; parents are often trained to implement ESDM during daily interactions to promote generalization. Since the initial clinician-led trials of ESDM, further research has shown it to be effective when delivered in groups (e.g., Vinen et al., 2018; Vivanti et al., 2014) and through parent education and training (e.g., Rogers et al., 2012; Vismara et al., 2016). PRT (Koegel & Koegel, 2019) is also an evidence-based intervention for children with ASD that targets pivotal developmental areas, such as motivation, responsivity to multiple cues, and self-management. PRT uses natural reinforcers to encourage engagement and directs interventionists to respond to children’s cues and interests during child-led activities. While PRT is often delivered individually to children, studies have shown it to be effective when delivered in groups (e.g., Hardan et al., 2015) and through parent education and training (e.g., Gengoux et al., 2019; Schreibman & Stahmer, 2014).

Recently, a number of systematic reviews and meta-analyses have been published on NDBIs (e.g., Forbes et al., 2020; Sandbank et al., 2020; Tiede & Walton, 2019). While the reviews have, overall, reported predominantly positive findings, gaps and inconsistencies in the research on NDBIs for children with ASD remain. For example, reviews vary in how they define NDBIs and in how they apply those definitions when selecting primary studies. Tiede and Walton (2019) included studies on Learning Experiences and Alternative Program (LEAP; Strain & Bovey, 2011) while others have not. In their review of PRT, Forbes et al. (2020) excluded studies that did not explicitly identify the intervention as PRT by name, possibly to the exclusion of studies with similar intervention techniques. Given the increased adoption of NDBIs in clinical practice (Bruinsma et al., 2020) and divergent findings across NDBI reviews, we judged that an overview of reviews would provide a systematic appraisal of the extant evidence on NDBIs from meta-analytic reviews. This overview addressed two questions: (1) What are the overall effects of NDBIs on children with ASD under 8 years old? and (2) Which variables may influence the effects of NDBIs for children with ASD under 8 years old?

Methods

We registered, a priori, a review protocol with the International Prospective Register of Systematic Reviews (PROSPERO; CRD42022353045). We conducted this overview of reviews consistent with the methods outlined by Cochrane (e.g., Pollock et al., 2023) and report it consistent with the Preferred Reporting Items for Overviews of Reviews (PRIOR; Gates et al., 2022).

Eligibility Criteria

We included systematic reviews meeting the following inclusion criteria:

  1. Included at least one meta-analytic synthesis for a child outcome;
  2. Primary studies of the review examined an NDBI (as defined by Schreibman et al., 2015);
  3. Primary studies of the review included children with ASD who had a mean pre-treatment age of less than 8 years;
  4. Primary studies of the review were conducted using a two-group experimental design comparing NDBIs to a comparator;
  5. Review was published in English.

For this overview, we did not place any restrictions on publication status (i.e., we did not exclude grey literature) or publication date. We did not consider systematic reviews with meta-analysis of single case experimental designs or qualitative research given differences in meta-analytic methods for these types of research design.

Information Sources and Search Strategy

We conducted an electronic database search of Academic Search Premier, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Education Resources Information Center (ERIC), Medline, and APA PsycINFO in EBSCO during October 2022 (updated in August 2023). The search strategy for each database is shown in Supplemental Text 1. We also used snowball methods (Greenhalgh & Peacock, 2005), reviewing the reference lists of all included reviews and using Google Scholar to conduct a forward search of articles that cited included reviews.

Selection Process

We exported the records from the electronic database search into Covidence (Veritas Health Innovation, 2020) for screening and study selection. Two reviewers independently screened records by title and abstract based on eligibility criteria, with disagreements resolved through consensus. The remaining records were then screened at the full-text stage, in which the same two screeners independently screened the full text of each record against the eligibility criteria. Disagreements between reviewers were resolved through discussion with a third party.

Data Collection Process and Data Items

A data extraction sheet was developed, pilot tested on three randomly selected reviews, and then refined. After finalizing the data extraction form, two reviewers extracted the data independently, with disagreements resolved through discussion and mediation when necessary. We extracted information on research characteristics, participant characteristics, and intervention characteristics. For the research characteristics, we extracted the number of primary studies included, the meta-analytic model, the study selection methods, and the characteristics of the comparator or counterfactual condition. For the characteristics of the participants, we extracted variables such as age and gender. For the characteristics of the intervention, we extracted the intervention methods/techniques, intervention components, intervention agent (i.e., who implemented the intervention), and intervention density (i.e., duration of intervention and intensity). We also extracted the data, when provided, on moderator and mediator analyses included in the meta-analytic synthesis of the included reviews (i.e., we extracted extant moderator/mediator analyses but did not conduct new moderator/mediator analyses).

Because outcomes were reported differently across reviews, we created outcome categories to help synthesize evidence across reviews. The categories included communication/language, cognition, adaptive behavior, autism symptomatology, and restricted and repetitive behaviors. Given the breadth of studies and outcomes related to communication and language, we created four sub-categories within this outcome category (i.e., generalized language, expressive communication, receptive communication, social communication). Operational definitions of the outcome categories used in this review, with exemplar measures for each category, are shown in Table 1.

Table 1 Operational definition of outcome categories

Risk of Bias

We used the Risk of Bias in Systematic Reviews (ROBIS) tool (Whiting et al., 2016) to assess the risk of bias of the included meta-analytic reviews. As with data extraction, two independent reviewers extracted the risk of bias data, with disagreements resolved through consensus. For the risk of bias assessment, we judged four domains: study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings. After each domain of bias was assessed, the reviewers made a summary-level judgement by collating the concerns about risk of bias identified for each domain. Finally, an overall risk of bias rating (i.e., high, low, or unclear) was made for each review. We did not assess the risk of bias for the primary studies of the included reviews.

Overlap of Primary Studies

We used the corrected covered area (Pieper et al., 2014) to quantify the degree of primary study overlap across included meta-analyses. Corrected covered area was calculated as \(CCA= (N-u)/(uc-u)\), where N was the number of included primary studies (including double counting), u was the number of primary studies (excluding duplicated reports), and c was the number of meta-analyses. We used Pieper and colleagues’ guidelines to classify the level of overlap as slight (0–5%), moderate (6–10%), high (11–15%), or very high (>15%).
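The corrected covered area calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the review labels, and the study identifiers in the usage example are hypothetical, and treating the band edges as upper-inclusive cut points (5, 10, 15) is an assumption, since the published bands leave values between whole percentages undefined.

```python
from typing import Dict, Set


def cca_from_counts(n_total: int, u: int, c: int) -> float:
    """Corrected covered area as a proportion: (N - u) / (u*c - u)."""
    if c < 2 or u < 1:
        raise ValueError("CCA needs at least two reviews and one primary study")
    return (n_total - u) / (u * c - u)


def cca_from_reviews(reviews: Dict[str, Set[str]]) -> float:
    """Derive N, u, and c from a mapping of review -> primary-study IDs."""
    c = len(reviews)                                      # number of reviews
    n_total = sum(len(ids) for ids in reviews.values())   # N, with double counting
    u = len(set().union(*reviews.values()))               # unique primary studies
    return cca_from_counts(n_total, u, c)


def overlap_level(cca: float) -> str:
    """Map a CCA proportion to an overlap band (cut points are an assumption)."""
    pct = cca * 100
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"


# Hypothetical toy data: three reviews sharing some primary studies.
toy = {
    "review_A": {"s1", "s2", "s3"},
    "review_B": {"s2", "s3", "s4"},
    "review_C": {"s3", "s5"},
}
toy_cca = cca_from_reviews(toy)  # N = 8, u = 5, c = 3
print(f"{toy_cca:.2%} -> {overlap_level(toy_cca)}")
```

Building N and u from the inclusion sets, rather than entering them by hand, keeps the double-counted and de-duplicated totals consistent with each other.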

Synthesis Methods

We conducted descriptive and narrative syntheses of the outcomes reported in meta-analyses of the included reviews. The intervention effect on each category of outcome was first assessed by examining the estimated magnitude of effects shown by the effect sizes calculated in each review. We extracted the standardized mean difference (SMD) effect size (e.g., Cohen’s d, Hedges’ g), or data from which to calculate an SMD, for each outcome category or sub-category from each review. When a review reported more than one outcome (i.e., effect estimate) for a single category or sub-category, we selected one dependent measure as the representative estimate based on the following hierarchy: (1) dependent measure collected using direct or standardized assessments, (2) dependent measure with the largest number of primary studies, and (3) dependent measure that included the largest number of child participants. We applied these decision rules to select representative effect size estimates for the Ona et al. (2020) and Uljarević et al. (2022) reviews. To formulate conclusions regarding the evidence of NDBIs on child outcomes across reviews, we created tables to explore patterns of magnitude and statistical significance.
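One way to read the selection hierarchy above is as a lexicographic ordering, in which each later criterion only breaks ties left by the earlier one. The sketch below illustrates that reading; it is an assumption about how the rules combine, and the class, field names, and example values are hypothetical rather than taken from any included review.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Estimate:
    measure: str                   # label of the dependent measure
    smd: float                     # standardized mean difference
    direct_or_standardized: bool   # criterion 1: direct/standardized assessment
    n_studies: int                 # criterion 2: number of primary studies
    n_participants: int            # criterion 3: number of child participants


def select_representative(candidates: List[Estimate]) -> Estimate:
    """Pick one estimate per outcome category using the three-step hierarchy."""
    if not candidates:
        raise ValueError("no candidate estimates for this outcome category")
    # Tuples compare element by element, so later criteria act as tie-breakers.
    return max(
        candidates,
        key=lambda e: (e.direct_or_standardized, e.n_studies, e.n_participants),
    )


# Hypothetical candidates for a single outcome category within one review.
candidates = [
    Estimate("parent questionnaire", 0.60, False, 5, 300),
    Estimate("standardized test A", 0.35, True, 3, 150),
    Estimate("standardized test B", 0.40, True, 3, 180),
]
chosen = select_representative(candidates)
print(chosen.measure)  # standardized test B
```

In this example the questionnaire is passed over despite having the most studies, because the assessment-type criterion is applied first.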

Deviations from Review Protocol

We made two changes to the intended overview methods outlined in our protocol (CRD42022353045). First, we decided to exclude meta-analyses of primary studies that used single-case research designs. This decision was made, in part, because we found a sufficient number of meta-analyses of group design studies, which met the aims of this overview more closely than single-case research designs. Second, we had planned to include caregiver outcomes in addition to child outcomes, but we were unable to do so because the reviews that met our inclusion criteria did not contain any caregiver outcomes.

Results

The electronic database search identified 1,304 records. After removing 398 duplicates, 906 records remained and underwent title and abstract screening. We removed 809 records based on titles and abstracts alone, which left 97 records for full-text screening. After full-text review, five meta-analyses (reported in six articles; Fuller et al., 2020; Ona et al., 2020; Sandbank et al., 2020; Uljarević et al., 2022; Wang et al., 2022) met all eligibility criteria and were included in this overview (see Fig. 1 for a flow diagram of review selection). The snowball search methods identified an additional 1,390 records but yielded no additional reviews meeting all inclusion criteria. All five reviews were published in peer-reviewed journals.

Fig. 1 PRIOR flowchart of initial search

Review Characteristics

The five reviews that met our inclusion criteria were published between 2020 and 2022. The number of primary studies (u) in each review ranged from seven (Ona et al., 2020) to 26 (Sandbank et al., 2020). Across the five reviews, the cumulative number of included studies summed to 66. This figure is a gross count that double counts primary studies included in more than one review. Across reviews, the total number of unique (unduplicated) primary studies was 48 (u = 48); 11 primary studies (Dawson et al., 2010; Hardan et al., 2015; Mohammadzaheri et al., 2014, 2015; Nefdt et al., 2010; Rogers et al., 2012, 2019; Schreibman & Stahmer, 2014; Vinen et al., 2018; Vismara et al., 2016; Vivanti et al., 2014) were included in more than one review. The primary study overlap estimated by the corrected covered area (CCA) was approximately 9.38%, indicating a moderate level of overlap.
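As a check on the reported overlap, substituting the counts above (N = 66, u = 48, c = 5) into the corrected covered area formula given in the Methods yields:

\[
CCA = \frac{N-u}{uc-u} = \frac{66-48}{(48)(5)-48} = \frac{18}{192} \approx 0.0938
\]

That is, about 9.38%, which falls within the moderate band (6–10%).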

The characteristics of the five included reviews are shown in Table 2. Most reviews (4 of 5) used multiple search methods for study selection. The most common methods for study selection included electronic database searches (5 of 5) and hand searches of the reference lists of included studies (3 of 5). Three meta-analyses included information on the comparators used within the primary studies, with the most common comparators being treatment as usual (u = 9), waitlist control (u = 4), and psychoeducation (u = 1). Across reviews, different methods of meta-analysis were used to statistically combine effect size estimates. Three reviews (Ona et al., 2020; Uljarević et al., 2022; Wang et al., 2022) used a random-effects model and two reviews (Fuller et al., 2020; Sandbank et al., 2020) used robust variance estimation. Across reviews, we extracted 23 SMD effect size estimates across our five outcome categories and sub-categories.

Table 2 Characteristics of included meta-analytic reviews

Participant Characteristics

The characteristics of the participants of the reviews are shown in Table 3. Across the four reviews that reported sample sizes, 1,697 child participants were included (data on the total number of participants were not available for the Sandbank et al. review), with a range of 181 (Ona et al., 2020) to 640 (Fuller et al., 2020) children per review. Across reviews, the mean age of the child participants ranged from 2.5 (Fuller et al., 2020) to 5.6 (Uljarević et al., 2022) years old. The majority of reviews reported that more than 80% of participants were male, which is consistent with the typical ratio of males to females diagnosed with ASD (American Psychiatric Association, 2013).

Table 3 Characteristics of primary studies included in reviews

Intervention Characteristics

Table 3 also shows the characteristics of the interventions examined in the primary studies included in the reviews. Two reviews (Fuller et al., 2020; Wang et al., 2022) included studies that examined the ESDM, two reviews (Ona et al., 2020; Uljarević et al., 2022) included studies that examined PRT, and one review (Sandbank et al., 2020) included studies examining NDBIs collectively. Across the primary studies included in the reviews, the intervention agents included caregivers (u = 24), interventionists (u = 30; i.e., paraprofessionals, therapists, educators, and clinicians), and a combination of caregivers and professionals (u = 12). Five reviews reported data on intervention density. Two reviews (Sandbank et al., 2020; Wang et al., 2022) reported the mean length of intervention across primary studies, with a range of 0.8 (Sandbank et al., 2020) to 10.8 months (Wang et al., 2022), and the other three reviews reported the range of intervention length across primary studies, which spanned 0.2 to 35.9 months. One review (Wang et al., 2022) reported the density of intervention across primary studies in hours per week, with a mean of 10.9 hours per week.

Risk of Bias

A summary of risk of bias across the five included reviews is shown in Fig. 2 and an itemized risk of bias by review and domain is shown in Supplemental Table 1. Overall, 80% of the included reviews were judged to have a high risk of bias. The highest levels of concern were seen in the domains of synthesis and findings (4 of 5 reviews rated as having a high risk of bias) and study selection (2 of 5 reviews rated as having a high risk of bias). Lower risks of bias were seen in the domains of study eligibility criteria and data collection and appraisal, where all reviews were judged to have had a low risk of bias.

Fig. 2 Risk of bias assessment

Effects of NDBI on Child Outcomes

Effect size estimates for each outcome category are shown by review in Table 4, which summarizes the results of this overview. Four reviews (Fuller et al., 2020; Ona et al., 2020; Sandbank et al., 2020; Wang et al., 2022) meta-analyzed an outcome included in our category of generalized language outcomes. Across meta-analyses, the SMD ranged from 0.20 (95% CI 0.03 to 0.38, u = 19; Sandbank et al., 2020) to 1.12 (95% CI –0.49 to 2.73, u = 2; Ona et al., 2020). Two reviews (Ona et al., 2020; Uljarević et al., 2022) meta-analyzed expressive communication outcomes, with SMD ranging from 0.48 (95% CI 0.04 to 0.93, u = 2; Ona et al., 2020) to 1.37 (95% CI –2.53 to 5.27, u = 3; Uljarević et al., 2022), and one review meta-analyzed receptive communication (SMD = 0.51; 95% CI 0.23 to 0.80, u = 3; Uljarević et al., 2022). Three reviews (Fuller et al., 2020; Sandbank et al., 2020; Wang et al., 2022) meta-analyzed social communication, with SMD ranging from 0.01 (95% CI –0.18 to 0.20, u = 7; Wang et al., 2022) to 0.35 (95% CI 0.18 to 0.53, u = 24; Sandbank et al., 2020).

Table 4 Summary of effect sizes by outcome categories

Four reviews (Fuller et al., 2020; Sandbank et al., 2020; Uljarević et al., 2022; Wang et al., 2022) meta-analyzed cognitive outcomes, with SMD ranging across reviews from 0.15 (95% CI –0.17 to 0.48, u = 3; Uljarević et al., 2022) to 0.41 (p = 0.04, u = 9; Fuller et al., 2020). Three reviews (Fuller et al., 2020; Sandbank et al., 2020; Uljarević et al., 2022) meta-analyzed adaptive behavior. Across reviews, the SMD ranged from 0.12 (p = 0.46, u = 6; Fuller et al., 2020) to 0.31 (95% CI –0.03 to 0.65, u = 2; Uljarević et al., 2022). Four reviews (Fuller et al., 2020; Sandbank et al., 2020; Uljarević et al., 2022; Wang et al., 2022) meta-analyzed autism symptomatology, with SMD ranging from –6.03 (95% CI –13.45 to 1.40, u = 2; Uljarević et al., 2022) to 0.05 (95% CI –0.38 to 0.48, u = 6; Sandbank et al., 2020). Finally, two reviews (Fuller et al., 2020; Sandbank et al., 2020) meta-analyzed restricted and repetitive behaviors, with SMD ranging from –0.01 (95% CI –0.34 to 0.32, u = 7; Sandbank et al., 2020) to 0.02 (p = 0.88, u = 5; Fuller et al., 2020).
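As a reading aid for the intervals reported above, the following sketch shows how a 95% confidence interval relates to statistical significance and to an approximate standard error. It assumes a symmetric, normal-theory interval, which is only an approximation for reviews that used robust variance estimation or small-sample corrections; the function names are hypothetical and this is not a reanalysis of the included reviews.

```python
Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate a standard error from a symmetric 95% interval."""
    return (upper - lower) / (2 * Z_95)


def approx_z(smd: float, lower: float, upper: float) -> float:
    """Approximate z statistic implied by an SMD and its 95% interval."""
    return smd / se_from_ci(lower, upper)


# Two generalized language estimates quoted in the text.
sandbank = (0.20, 0.03, 0.38)   # SMD, lower, upper (u = 19)
ona = (1.12, -0.49, 2.73)       # SMD, lower, upper (u = 2)

print(excludes_zero(sandbank[1], sandbank[2]))  # True
print(excludes_zero(ona[1], ona[2]))            # False
print(se_from_ci(ona[1], ona[2]) > se_from_ci(sandbank[1], sandbank[2]))  # True
```

The comparison illustrates why the larger point estimate is not the more convincing one here: the interval based on two primary studies is far wider and spans zero.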

Influential Variables

Three reviews (Fuller et al., 2020; Sandbank et al., 2020; Wang et al., 2022) conducted statistical tests (i.e., moderator/mediator analyses) to explore variables that might be related to differences in outcomes reported in their meta-analyses. The variables or factors that were explored included study quality (i.e., correlated measurement error, parent report, independent assessor, and randomization), intervention characteristics (i.e., parent involvement, length of intervention, hours of intervention per week, total number of intervention hours, interventionist, intervention delivery method, and study location), sample characteristics (i.e., mean chronological age, mean language age at study entry, and percentage of males in each sample), and outcome characteristics (i.e., proximity of outcome to intervention, boundedness of outcome to intervention, and measurement proximity). Among these variables, proximity of outcome to intervention (Sandbank et al., 2020), boundedness of outcome to intervention (Sandbank et al., 2020), and study location (Wang et al., 2022) were identified as variables with a statistically significant association with the effects of NDBI. Outcomes proximal to the intervention had larger effect sizes (β = 0.25, p = 0.041) than distal outcomes in Sandbank et al. (2020). Generalized (β = –0.40, p = 0.003) or potentially context-bounded outcomes (β = –0.31, p = 0.022) had smaller effect sizes than context-bounded outcomes in Sandbank et al. (2020). Additionally, Wang et al. (2022) found that studies conducted in Asia (i.e., China), on average, had larger effect sizes than studies conducted in western countries (i.e., United States, Australia) for both autism symptomatology (Qbetween = 3.99, p = 0.046) and generalized communication/language (Qbetween = 7.12, p = 0.008). However, proximity of outcome to intervention and boundedness of outcome to intervention were not statistically significant moderators in Fuller et al. (2020; p = 0.20 to 0.95).

Discussion

This overview of reviews synthesizes the meta-analytic evidence on child outcomes associated with NDBIs for young children with ASD. The findings support prior positive conclusions about the effects of NDBIs on young children’s communication and language skills and cognitive development. The findings across reviews for these outcomes were robust, with medium to large effect sizes shown for each outcome in multiple reviews. The largest statistically significant effect sizes for the communication outcomes were shown in Fuller et al. (2020), Ona et al. (2020), and Uljarević et al. (2022), and the largest estimated effect for cognition was shown in Fuller et al. (2020). NDBIs were not found to produce statistically significant differences favoring the treatment group over the control group for adaptive behavior, autism symptomatology, or restricted and repetitive behaviors. However, the meta-analyses for these outcomes were based on a small sample of primary studies, which can limit statistical power and increase the risk of Type II error.

While NDBIs were shown to be an effective treatment option for most children with ASD, it is important to consider heterogeneity (between-study heterogeneity and between-review heterogeneity) when interpreting the findings of this overview. Heterogeneity is most typically used to refer to systematic differences between studies included in a single review or meta-analysis (Borenstein, 2019). For an overview of reviews, heterogeneity can also be used in reference to differences between included reviews (Pollock et al., 2023). Different types of heterogeneity have been suggested, including clinical heterogeneity (i.e., differences in interventions, participants, or outcomes; Fletcher, 2007), statistical heterogeneity (i.e., differences in effects or study results; Fletcher, 2007), and methodological heterogeneity (i.e., differences in design or study confounds; Deeks et al., 2019). While there are standard methods for dealing with heterogeneity in meta-analyses (e.g., Borenstein, 2019; Deeks et al., 2019), guidelines and standards for dealing with heterogeneity in overviews of reviews have not been established (Gates et al., 2020). Without a standard for detecting and dealing with heterogeneity in overviews of reviews, we explore the presence and impact of heterogeneity descriptively for this overview in the following section.

Between Review Heterogeneity in Overview of NDBI

Across the five reviews included in this overview, two primary NDBI intervention methods were used: ESDM and PRT. While these interventions have many similarities and are both considered NDBIs, they do have some distinctions, which might introduce clinical heterogeneity. In addition, the mean age of participants receiving ESDM was lower than the mean age of participants receiving PRT. Furthermore, Sandbank et al. (2020) included primary studies of ESDM and PRT alongside other NDBIs, whereas the other meta-analyses included primary studies of either ESDM or PRT alone. Differences in these and other variables (e.g., intervention fidelity) may lead to systematic differences in intervention effects and need further attention in systematic reviews.

We observed variability in the effects (i.e., results) presented across reviews, suggesting the presence of statistical heterogeneity. One possible reason for this heterogeneity could be that different meta-analytic models were used across reviews. In a random-effects model, the true effect size is assumed to be random and is estimated using a weighted mean that takes into account both within-study and across-study variances (Konstantopoulos & Hedges, 2019). When using robust variance estimation, the standard error is estimated from the observed variation in effect sizes, and the average effect size does not require assumptions about the normal distribution of effect sizes, correct variances, or inverse variance weights (Hedges et al., 2010). Given the differences in how data are analyzed in these two meta-analytic models, differences in estimated effects (i.e., mean effect size estimates) would be likely to occur. For instance, Sidik and Jonkman (2006) found a random-effects model produced imprecise results due to errors associated with the estimation of marginal variances not being used. Robust variance estimation may provide a means by which to address some of the issues of multiple measures within a study and to obtain robust estimates of the mean effect size; however, the method does not confer robust estimates of the prediction interval, which can serve as a reliable indicator of heterogeneity. Additionally, the assumed within-study correlation between effect sizes (i.e., rho; Fisher & Tipton, 2015) needs to be justified in a robust variance estimation and is not always readily known or reported. Finally, between-study variance (i.e., tau-squared) and meta-regression coefficients are sensitive to changes in rho when between-study covariance is significantly smaller than the within-study covariances. Given these differences and possible influences, evaluation of differences in meta-analytic models using large extant data sets would be helpful.
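The random-effects weighted mean described above can be sketched as follows. The text does not say which between-study variance estimator the included reviews used, so the DerSimonian-Laird estimator here is an assumption chosen because it is a common default; the function name and the input values are hypothetical, and robust variance estimation is not implemented.

```python
from typing import Sequence, Tuple


def random_effects_pool(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (pooled mean, standard error, tau-squared) using DerSimonian-Laird."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for at least two studies")
    k = len(effects)

    # Fixed-effect step: inverse within-study variance weights.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects step: weights combine within- and between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = (1.0 / sum_w_star) ** 0.5
    return pooled, se, tau2


# Hypothetical SMDs and sampling variances for four primary studies.
pooled, se, tau2 = random_effects_pool([0.10, 0.45, 0.30, 0.80], [0.04, 0.02, 0.05, 0.09])
print(f"pooled SMD = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

When the estimated tau-squared is zero the weights reduce to the fixed-effect weights, which is one reason two reviews of overlapping studies can report different pooled estimates depending on the model chosen.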

Moderator/mediator analyses were conducted in three reviews (Fuller et al., 2020; Sandbank et al., 2020; Wang et al., 2022) included in this overview. While some variables (i.e., moderators) were found to have a statistically significant relation to the magnitude of effect (e.g., proximity of outcome to intervention, boundedness of outcome to intervention, and study location), most of the statistical tests that were conducted in the included reviews did not find statistically significant relations between variables and outcomes, suggesting the variables were not mediating or moderating the effects of NDBIs. Again, the small number of intervention studies on which these secondary analyses were conducted could limit the power of the tests to show meaningful associations between variables, and the analyses should be reexamined when more primary studies have been completed.

Limitations

There are several limitations to this overview that should be taken into consideration when interpreting our findings. First, only one included review, Ona et al. (2020), had a preregistered protocol. Lack of preregistration has been shown to be associated with greater review-level risks of bias and thus should be considered a limitation of this overview. Differences between the Ona et al. (2020) review protocol and their published review, including changes to the meta-analytic synthesis methods used, were noted, which could increase risks of bias. Second, while a risk of bias assessment of the primary studies of the reviews included in this overview was not completed, four of five reviews (Ona et al., 2020; Sandbank et al., 2020; Uljarević et al., 2022; Wang et al., 2022) reported concerns with the methodological quality of the primary studies, including lack of blinding of outcome assessors, lack of blinding of treatment personnel, and selection bias. Third, we only included meta-analyses of primary studies that were conducted using comparative group experimental designs. Thus, the meta-analysis of single case studies of NDBI by Bozkus-Genc and Yucesoy-Ozkan (2016) was excluded. Restriction on design type for primary studies may limit the generality of our findings given the large number of NDBI primary studies that are conducted using single case experimental designs. Future reviews or overviews of NDBI might consider inclusion and synthesis of studies that utilized single case experimental designs to gain further insights into the effects of this category of interventions for children with ASD. Fourth, the number of primary studies (u) included in the meta-analytic syntheses of the reviews was often small, which could lead to underpowered analyses and raises a concern of Type II error. This concern is seen in the wide confidence intervals associated with many of the analyses (see Table 4). Moreover, the primary studies included in the analyses often had small sample sizes themselves, which could compound this potential limitation. A fifth, related limitation is that although three reviews conducted moderator analyses, only a small number of moderators were found to have a statistically significant relation to the effects of NDBI. Given the relatively small sample sizes in the included meta-analyses, these exploratory analyses should be replicated, as the small sample of primary studies could lead to Type I error. A final limitation of this overview is the moderate level of overlap of primary studies across the included reviews (corrected covered area [CCA] = 9.38%). Although several primary studies were included in more than one review, methods for dealing with this overlap have yet to be established for overviews of reviews.

Conclusion

As shown in this overview, the positive effects of NDBI for young children with ASD are supported by meta-analytic evidence. While the overall findings for NDBI across reviews are positive, the findings on specific outcomes and on the variables moderating the effects of NDBI are inconsistent. Additional evidence from randomized controlled trials and future meta-analyses is needed to strengthen our knowledge of the effects of NDBI for young children with ASD.