Introduction

Perinatal impaired gas exchange (hypoxia and hypercarbia) is a determinant of mortality in the neonatal period (Victory et al. 2004; Casey et al. 2001). Impaired gas exchange has also been associated with neurological outcomes including seizure, cerebral palsy and neonatal encephalopathy (Scafidi and Gallo 2008; Martinez-Biarge et al. 2011). Hypoxia and hypercarbia alter cell energy metabolism which can subsequently lead to cell death (Scafidi and Gallo 2008). Brain tissue is particularly sensitive to disturbances in energy metabolism during the perinatal period (du Plessis and Volpe 2002; de Haan et al. 2006). Several studies suggest that impaired gas exchange in the perinatal period may be associated with neurodevelopmental abnormalities such as intellectual disability (ID) and autism spectrum disorders (ASD) but results are mixed (Gardener et al. 2011; Gonzalez and Miller 2006).

Whereas measurement of impaired gas exchange would ideally include a combination of clinical, laboratory, and pathology data, clinical proxies are typically used in neonatal units (Goldenberg et al. 1984; Gonzalez and Miller 2006). Many of these proxies are related to resuscitative efforts of the newborn, which reflect some degree of cardiorespiratory compromise (Kattwinkel et al. 2010). Despite their limitations, such measures have the advantage of being recorded in virtually every newborn. For example, the Apgar score, introduced by Virginia Apgar in 1953, is a measure of well-being in the immediate neonatal period and is routinely assessed worldwide (Apgar 1953). Standard measurements take place at 1 and 5 min after birth. In recent years, there has been a growing debate over the utility of the Apgar score for the prediction of asphyxia (American Academy of Pediatrics 2006; Watterberg et al. 2015). Nevertheless, universal measurement of the Apgar score provides researchers with an opportunity for an in-depth exploration of its clinical value. Several other measures have been used as proxies of impaired gas exchange, including delayed onset of breathing, need for resuscitation and/or ventilation, and metabolic acidosis (Scafidi and Gallo 2008). In the present study we aimed to systematically and quantitatively review the evidence for the hypothesis that proxies of impaired gas exchange are associated with increased risk of ID and ASD.

Methods

We performed our meta-analysis in accordance with the latest guidelines on meta-analysis of observational studies (Stroup et al. 2000).

Search Strategy and Eligibility Criteria

A.M. and J.M. independently searched Medline and the Web of Science Core Collection from inception through July 2015 for English-language studies using the search terms: [(perinatal) OR (neonatal) OR (prenatal) OR (birth) OR (asphyxia) OR (hypoxia) OR (anoxia) OR (Apgar) OR (“fetal distress”) OR (“respiratory distress”)] AND [(autism) OR (“autistic disorder”) OR (“mental retardation”) OR (“intellectual disability”)]. We also manually searched the references of several relevant review articles on the subject.

We included all published peer-reviewed population- or clinic-based studies with a healthy comparison group that assessed the relationship between proxies of impaired gas exchange and ID or ASD. Definitions of ID and ASD were based on the criteria used by the authors of each study. Neurodevelopmental comorbidities were not considered as exclusion criteria. The studies were required to include one or more proxies of impaired gas exchange, namely: acidosis, Apgar score, need for resuscitation or oxygenation, apnea or delayed crying and breathing, respiratory distress syndrome, and asphyxia/hypoxia (as defined by the authors).

Exclusion criteria were: (1) assessing proxies as a part of a larger optimality score (without separate data), (2) failure to separate data on ID or ASD and including them as a broader term e.g. “neurodevelopmental disability”, (3) providing insufficient information to extract estimates and (4) limiting the study to high risk groups such as neonatal ICU survivors or preterm individuals.

Selection of the Studies and Data Extraction

Two independent reviewers (A.M. and J.M.) screened the search output for eligible studies and discussed any discordance with a third reviewer (A.R.) to reach a final conclusion. Overlap of reports from the same study was investigated by comparing author names, location, and date. In the case of overlap, we selected the most suitable report based on a combination of quality, sample size, publication year, and adequate data presentation. Estimates for each study were extracted by A.M. independently on two separate occasions. In the case of inadequate presentation of data, we contacted the corresponding authors to obtain the data. We extracted the following data for each study:

  1. Setting (population-based or clinical)

  2. Definition of ID or ASD

  3. Characteristics of subjects

  4. Proxies of impaired gas exchange as risk factors

  5. Results in the form of 2 × 2 contingency table, or odds ratio (OR) and 95 % confidence intervals (95 % CI). Adjusted estimates were also extracted if available.

  6. In the case of ID, we also extracted mean and SD of IQ, if available for the exposed and unexposed groups.

We assessed the quality of each study using the Newcastle–Ottawa scale for assessment of observational studies in systematic reviews (Wells et al. 2011). This nine-point scale evaluates the quality of studies in the domains of sample selection, comparability, and outcome or exposure. The scale has been used widely in systematic reviews and is recommended by the Cochrane Collaboration (Higgins 2012).

Statistical Analysis

Stata/MP V.14.0 (StataCorp, College Station, TX) was used for all data analysis. Meta-analysis and meta-regression were performed using “metan” and “metareg” commands respectively. We calculated pooled ORs (95 % CIs) for each proxy-outcome combination if two or more studies were available. Some studies reported results for subtypes of ID (mild, moderate, severe, profound) or ASD [autistic disorder, pervasive developmental disorder (PDD), Asperger’s syndrome]. In such cases the contingency table was used to recalculate the combined OR (95 % CI). If the values for 2 × 2 tables were not available, but the control group for each subtype was different, all ORs (95 % CI) were included in the meta-analysis separately. If neither was available and we could not obtain data from the author, the OR from the most representative group (usually that with the largest sample size) was included. Some studies reported relative risks. We recalculated ORs and 95 % CIs for these studies or requested data from the authors. We also carried out a meta-analysis of studies that compared continuous IQ scores between individuals with and without proxies of impaired gas exchange using standardized mean difference (SMD). In the case of Apgar score, if more than one cut-off was used for definition of a low Apgar score, we pooled data for each cut-off separately. Because few studies reported adjusted ORs, only crude estimates were used for the main meta-analyses. To study the effect of multivariable adjustment on the estimates, we also performed a sensitivity analysis using the adjusted estimates.
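The calculations described above can be sketched in a few lines. The following is a minimal illustration, not the authors' Stata code: it assumes the logit (Woolf) standard error for each 2 × 2 table and inverse-variance fixed-effect pooling on the log scale, whereas the exact pooling method used by "metan" in the original analysis is not stated. All function names and counts are hypothetical.

```python
import math

def odds_ratio(a, b, c, d):
    """Log OR and its standard error from a 2 x 2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    Uses the logit (Woolf) standard error; assumes no zero cells.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def ci95(log_est, se, z=1.96):
    """Back-transform a log estimate to (OR, lower, upper)."""
    return math.exp(log_est), math.exp(log_est - z * se), math.exp(log_est + z * se)

def combine_subtypes(subtype_cases, controls):
    """Collapse subtype case counts that share one control group.

    subtype_cases: list of (exposed, unexposed) counts, one pair per subtype
    controls: (exposed, unexposed) counts of the common control group
    """
    a = sum(exposed for exposed, _ in subtype_cases)
    b = sum(unexposed for _, unexposed in subtype_cases)
    c, d = controls
    return a, b, c, d

def pool_fixed(log_ors, ses):
    """Inverse-variance fixed-effect pooled log OR and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical counts, for illustration only.
table = combine_subtypes([(12, 188), (8, 92)], controls=(30, 970))
study_1 = odds_ratio(*table)
study_2 = odds_ratio(15, 285, 40, 1160)
pooled, pooled_se = pool_fixed([study_1[0], study_2[0]], [study_1[1], study_2[1]])
print("pooled OR %.2f (95%% CI %.2f-%.2f)" % ci95(pooled, pooled_se))
```

Working on the log scale keeps the confidence interval symmetric around the log OR, which is why each estimate is exponentiated only at the reporting step.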

Publication bias was assessed using Peters regression test (Peters et al. 2006). When there was a risk of publication bias in subgroups in the absence of heterogeneity, trim and fill method was used within that subgroup to correct for potentially unpublished studies (Duval and Tweedie 2000). Agreement between the standard estimates and the estimates calculated by trim and fill method ensures that the findings are robust to bias (Peters et al. 2007).
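As a rough illustration of the regression test mentioned above, the sketch below follows the commonly described form of Peters' test: a weighted linear regression of the log OR on the inverse of the total sample size, with weights equal to events × non-events / total. This is an assumption about the form of the test rather than a reproduction of the authors' Stata analysis; the function name and the data are hypothetical, and at least three studies are required.

```python
import math

def peters_regression(log_ors, n_totals, events, non_events):
    """Weighted least-squares fit of log OR on 1/N (needs >= 3 studies).

    Weights are events * non_events / N for each study.
    Returns (intercept, slope, standard error of the slope).
    """
    k = len(log_ors)
    x = [1 / n for n in n_totals]
    w = [e * f / n for e, f, n in zip(events, non_events, n_totals)]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, log_ors))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual sum of squares, used for the residual variance.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, log_ors))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    return intercept, slope, se_slope

# Hypothetical studies: smaller studies report larger effects.
log_ors = [1.2, 0.9, 0.7, 0.6, 0.5]
n_totals = [120, 300, 800, 2000, 5000]
events = [30, 60, 150, 300, 700]
non_events = [n - e for n, e in zip(n_totals, events)]
intercept, slope, se_slope = peters_regression(log_ors, n_totals, events, non_events)
print("slope %.1f, t statistic %.2f" % (slope, slope / se_slope))
```

The t statistic would be compared with a t distribution on k − 2 degrees of freedom; a slope clearly different from zero suggests small-study effects.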

We used a fixed- or random-effect model based on heterogeneity status. The I² statistic and the p value from the chi-squared test of heterogeneity were used to assess heterogeneity (I² > 50 % or p < 0.1 indicating heterogeneity). We then investigated heterogeneity using random-effect meta-regression. Meta-regression was performed only if at least ten studies were available. We performed meta-regression for year of publication (both as continuous and categorical with categories of before 1990, 1990 and thereafter), sample size (both as continuous and categorical with categories of <1000, >1000), country (North America versus other countries), type of study (population-based vs. clinic-based), disorder type for ASD (autism or ASD), and IQ score cut-off for ID. Standard meta-regression methods can lead to false positive findings in the case of significant heterogeneity. To reduce the risk of false positive results, we used Monte Carlo simulation with 1000 permutations to report the p values.
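The model-selection rule above can be illustrated with a short sketch. It assumes Cochran's Q for the heterogeneity test and the DerSimonian–Laird estimator for the between-study variance; the source does not name the estimator, so both are assumptions, and the function names and inputs are hypothetical. The chi-squared tail probability is computed with a closed-form recurrence that is valid for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (integer df >= 1)."""
    h = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1)
    while a < df / 2:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1
    return q

def heterogeneity(log_ors, ses):
    """Cochran's Q, df, I2 (%), p value and DerSimonian-Laird tau2 (k >= 2)."""
    w = [1 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = 100 * (q - df) / q if q > df else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    return q, df, i2, chi2_sf(q, df), tau2

def pool(log_ors, ses):
    """Pick the model by the I2 > 50 % or p < 0.1 rule, then pool."""
    q, df, i2, p, tau2 = heterogeneity(log_ors, ses)
    model = "random" if (i2 > 50 or p < 0.1) else "fixed"
    extra = tau2 if model == "random" else 0.0
    w = [1 / (se ** 2 + extra) for se in ses]
    est = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    return model, est, math.sqrt(1 / sum(w))

# Hypothetical log ORs and standard errors from three studies.
model, est, se = pool([1.1, 1.6, 0.4], [0.25, 0.30, 0.20])
print(model, round(math.exp(est), 2))
```

Adding tau2 to each study's variance flattens the weights, so small studies count for more under the random-effect model than under the fixed-effect model.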

We also performed sensitivity analyses for different pooling methods (fixed vs. random), population based studies, high quality studies (quality score >6), studies with adjusted estimates, IQ cut-off score of <70 (for ID), and autism and ASD subgroups (for ASD).

Results

Search Results and Study Characteristics

A total of 12,119 potentially relevant reports were identified. After excluding 3134 duplicates, 8649 reports were further excluded by screening titles and abstracts. We identified 19 additional studies by searching in the references of relevant review articles. Out of 355 full-text papers, a total of 67 studies (34 studies of ID and 31 studies of ASD, 2 studies of both) were eligible for meta-analysis (see eFigure 1 for specific reasons of exclusion) (Akesson 1966; Atladottir et al. 2015; Benaron et al. 1960; Bilder et al. 2009, 2013; Buchmayer et al. 2009; Burd et al. 1999; Burstyn et al. 2010, 2011; Camp et al. 1998; Campbell et al. 1950; Cans et al. 1999; Cassimos et al. 2015; Chapman et al. 2008; Comi et al. 1999; Corah et al. 1965; Darke 1944; Deykin and MacMahon 1980; Dodds et al. 2011; Drage et al. 1969; Duan et al. 2014; Dweck et al. 1974; Ehrenstein et al. 2009; Finegan and Quarrington 1979; Fisch et al. 1975; Froehlich-Santino et al. 2014; Gillberg et al. 1990; Gillberg and Gillberg 1983; Glasson et al. 2004; Graham et al. 1962; Handley-Derry et al. 1997; Jonas et al. 1989; Kamper 1978; Khaiman et al. 2015; Lamont and Dennis 1988; Langridge et al. 2013; Larsson et al. 2005; Lawlor et al. 2006; Lord et al. 1991; Louhiala 1995; Mamidala et al. 2013; Maramara et al. 2014; Mason-Brothers et al. 1990; Mrozek-Budzyn et al. 2013; Nath et al. 2012; Odd et al. 2009; Piven et al. 1993; Polo-Kantola et al. 2014; Rantakallio and von Wendt 1985; Robertson and Finer 1993; Schreiber 1943; Seidman et al. 1991; Sonnander and Gustavson 1987; Steffenburg et al. 1989; Stein et al. 2006; Stromme 2000; Sugie et al. 2005; Sussmann et al. 2009; Taylor et al. 1985; Thorngren-Jerneck and Herbst 2001; Usdin and Weil 1952; van Handel et al. 2007; Visser et al. 2013; von Wendt and Rantakallio 1987; Williams et al. 2008; Zhang et al. 2010; Maimburg and Vaeth 2006). Upon contact with the authors, we were able to retrieve data for two large studies (Langridge et al. 2013; Chapman et al. 2008). The most widely used proxy of impaired gas exchange was 5-min Apgar score <7, measured in 17 studies. eTable 1 and eTable 2 summarize the main characteristics of the studies included in the meta-analysis.

Meta-analysis

Results of meta-analyses are presented in Table 1. In the analyses of different proxies of impaired gas exchange, results of regression test for small study effect were significant for one meta-analysis. We observed slight changes in pooled estimates following the use of trim and fill method to correct for small study effect (Table 1). The measure for inter-study heterogeneity was significant for three out of eight meta-analyses for ID, and two out of ten meta-analyses for ASD (Table 1). Therefore, for the main analyses we used random effect models in these cases, whereas in the rest of the meta-analyses we used fixed effect models.

Table 1 Meta-analysis of the association between indicators of impaired gas exchange and intellectual disability and autism spectrum disorder

Acidosis

Two studies for ID (Dweck et al. 1974; Cans et al. 1999) and two studies for ASD (Maimburg and Vaeth 2006; Burstyn et al. 2011) investigated the association between neonatal acidosis and the outcomes of interest (Table 1). Acidosis at birth was associated with an OR of 3.55 (95 % CI 2.23–5.49) for ID, whereas it only slightly increased the risk of ASD (OR 1.10; 95 % CI 0.91–1.31).

Apgar Score

An Apgar score of <7 at 1 min was associated with an OR of 3.28 (95 % CI 2.31–4.65) for ID (Louhiala 1995; Taylor et al. 1985; Stromme 2000) and an OR of 1.40 (95 % CI 1.26–1.55) for ASD (Mason-Brothers et al. 1990; Polo-Kantola et al. 2014; Burstyn et al. 2010) (Table 1). Children with an Apgar score of < 7 at 5 min (Cans et al. 1999; Chapman et al. 2008; Ehrenstein et al. 2009; Jonas et al. 1989; Lamont and Dennis 1988; Louhiala 1995; Stromme 2000; Sussmann et al. 2009; Taylor et al. 1985; Thorngren-Jerneck and Herbst 2001; Seidman et al. 1991) had an OR of 5.39 (95 % CI 3.84–7.55) for ID and an OR of 1.67 (95 % CI 1.34–2.09) for ASD (Buchmayer et al. 2009; Burstyn et al. 2010; Mrozek-Budzyn et al. 2013; Bilder et al. 2009; Mason-Brothers et al. 1990; Dodds et al. 2011) (Fig. 1). Furthermore, there was a progressive increase in the risk of ID (but not ASD) with a decrease in the 5-min Apgar score cut-off, although the number of studies for each cut-off (except for the cut-off score of 7) was small (Fig. 1).

Fig. 1 Meta-analysis of studies of 5-min Apgar score: the relationship between 5-min Apgar score and intellectual disability and autism spectrum disorder

Apnea and Respiratory Distress

Meta-analysis of five studies for neonatal apnea showed an OR of 2.93 (95 % CI 1.45–5.92) for ID (Benaron et al. 1960; Camp et al. 1998; Jonas et al. 1989; Sonnander and Gustavson 1987; Usdin and Weil 1952) (Table 1). Similarly, meta-analysis of five studies showed an OR of 2.89 (95 % CI 1.46–5.72) for ASD (Deykin and MacMahon 1980; Glasson et al. 2004; Mamidala et al. 2013; Stein et al. 2006; Zhang et al. 2010). Respiratory distress was associated with an OR of 2.70 (95 % CI 2.28–3.19) for ID (Atladottir et al. 2015; Fisch et al. 1975; Gillberg et al. 1990; Kamper 1978) and an OR of 1.55 (95 % CI 1.34–1.78) for ASD (Steffenburg et al. 1989; Piven et al. 1993; Finegan and Quarrington 1979; Gillberg and Gillberg 1983; Bilder et al. 2009; Lord et al. 1991; Sugie et al. 2005; Mrozek-Budzyn et al. 2013; Froehlich-Santino et al. 2014; Dodds et al. 2011; Buchmayer et al. 2009; Atladottir et al. 2015).

Resuscitation and Ventilation

A meta-analysis of four studies (Odd et al. 2009; Sonnander and Gustavson 1987; Langridge et al. 2013; Sussmann et al. 2009) showed that neonatal resuscitation was associated with a slightly increased risk of ID (OR 1.56; 95 % CI 1.47–1.66) (Table 1). The only study (Langridge et al. 2013) that addressed the association between resuscitation and risk of ASD found a slightly increased risk of ASD (OR 1.22; 95 % CI 1.08–1.37). O2 treatment was associated with an OR of 4.32 (95 % CI 3.23–5.78) for ID (Bilder et al. 2013; Cans et al. 1999; Gillberg et al. 1990; Jonas et al. 1989; Kamper 1978; Taylor et al. 1985) and an OR of 2.02 (95 % CI 1.45–2.83) for ASD (Finegan and Quarrington 1979; Froehlich-Santino et al. 2014; Gillberg and Gillberg 1983; Lord et al. 1991; Mason-Brothers et al. 1990; Mrozek-Budzyn et al. 2013; Piven et al. 1993; Steffenburg et al. 1989; Bilder et al. 2009) (Fig. 2).

Fig. 2 Meta-analysis of studies of measures of resuscitation: the relationship between measures of resuscitation and intellectual disability and autism spectrum disorder

Undefined Hypoxia/Asphyxia

We found several studies that assessed the association between ID/ASD and hypoxia/asphyxia without defining the latter (Table 1). Meta-analysis of six studies (Darke 1944; Akesson 1966; Schreiber 1943; Rantakallio and von Wendt 1985; Campbell et al. 1950; Graham et al. 1962) showed a significantly increased risk of ID in children with undefined asphyxia/hypoxia (OR 3.29; 95 % CI 1.81–5.96). Slightly higher estimates were found for the association between ASD and undefined asphyxia/hypoxia (OR 3.84; 95 % CI 1.95–7.56) (Comi et al. 1999; Froehlich-Santino et al. 2014; Visser et al. 2013; Mamidala et al. 2013; Duan et al. 2014; Sugie et al. 2005; Maramara et al. 2014; Khaiman et al. 2015; Nath et al. 2012).

Meta-regression and Sensitivity Analysis

In all five meta-analyses where there was significant heterogeneity, we used a random-effect model. Only one meta-analysis, that of the association between 5-min Apgar score and ID, included a sufficient number of studies to allow for a meta-regression analysis. None of the tested variables in the meta-regression could explain the heterogeneity.

We performed several sensitivity analyses to assess the effect of various conditions on the estimates (Table 2). The differences in the estimates between random- and fixed-effect models were <10 %, with one exception: there was a 60 % difference in the pooled estimates between random-effect and fixed-effect meta-analyses of the association between ASD and apnea. The number of studies with adjusted estimates was small. Overall, analyses of studies with adjusted estimates were associated with <20 % change in the estimates, except for the meta-analysis of the association between undefined asphyxia/hypoxia and ASD. In general, restricting the meta-analyses to population-based or high-quality studies had little effect on the estimates, except for the association between ID and undefined asphyxia/hypoxia and the association between ASD and apnea. Meta-analyses of autism often resulted in larger estimates than meta-analyses of ASD, although the 95 % CIs always overlapped.

Table 2 Sensitivity analysis of intellectual disability and autism spectrum disorder

The number of studies that provided mean and standard deviation for IQ was small. A meta-analysis of two studies using hypoxic ischemic encephalopathy as the definition of asphyxia/hypoxia yielded an SMD of −0.72 (95 % CI −1.32 to 0.12); a meta-analysis of two studies of undefined asphyxia/hypoxia yielded an SMD of −0.26 (95 % CI −0.50 to 0.02). A meta-analysis of two studies of apnea yielded an SMD of 0.13 (95 % CI −0.19 to 0.45) (eFigure 2).
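For readers unfamiliar with the SMD, the sketch below shows how a single study's estimate could be derived from group means and standard deviations of IQ. It assumes the Cohen's d form with a pooled standard deviation and a common large-sample approximation for its standard error; the source does not state which SMD variant was used, and the function name and numbers are hypothetical.

```python
import math

def smd(mean_exp, sd_exp, n_exp, mean_unexp, sd_unexp, n_unexp):
    """Standardized mean difference (Cohen's d) and approximate standard error.

    A negative value means a lower mean IQ in the exposed group.
    """
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_unexp - 1) * sd_unexp ** 2)
        / (n_exp + n_unexp - 2)
    )
    d = (mean_exp - mean_unexp) / pooled_sd
    # Large-sample approximation to the variance of d (an assumption here).
    se = math.sqrt(
        (n_exp + n_unexp) / (n_exp * n_unexp) + d ** 2 / (2 * (n_exp + n_unexp))
    )
    return d, se

# Hypothetical IQ summary data for exposed and unexposed children.
d, se = smd(95, 15, 50, 100, 15, 50)
print("SMD %.2f (95%% CI %.2f to %.2f)" % (d, d - 1.96 * se, d + 1.96 * se))
```

Because the difference is divided by the pooled standard deviation, studies that used different IQ instruments can be combined on a common scale.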

Discussion

The results of this meta-analysis show that the presence of proxies of impaired gas exchange in the neonate is strongly associated with increased risk of ID. Presence of these proxies in the neonate is also associated with increased risk of ASD but to a lesser extent. A number of systematic reviews on the association between prenatal complications and neurodevelopmental disorders have been published (Gardener et al. 2009, 2011; Bass et al. 2004). Our meta-analysis is the largest to date and is unique in three important ways. First, to the best of our knowledge, it is the first meta-analysis of the association between prenatal nonhereditary risk factors and ID. Second, it focuses on the proxies of impaired gas exchange as a means to assess possible hypoxic brain damage. And third, it provides us with insight about the differences between ID and ASD in their relation to neonatal altered gas exchange status.

With regard to ID, several inferences can be made from our findings. First, the direction of association between ID and proxies of impaired gas exchange was consistent across measures and studies. Second, the magnitude of association was particularly strong for proxies that are more reliable measures of impaired oxygenation such as acidosis or O2 treatment. Third, when cut-offs were available for an indicator (e.g. Apgar score), lower cut-off values were associated with larger estimates indicating a dose–response relationship. Low 5-min Apgar score often indicates poorer neonatal condition than low 1-min Apgar score. In our analysis, there was a stronger relation between low 5-min Apgar score and ID risk than low 1-min Apgar score and ID. Fourth, the association was robust to various analysis methods. Importantly, analyses of high quality studies, adjusted estimates, and population-based studies did not attenuate the results. Taken together, these findings are consistent with the assumption that neonatal exposure to impaired gas exchange might have detrimental effects on the brain with subsequent impairment in cognitive function (Bass et al. 2004).

With regard to ASD, the associations with proxies of impaired gas exchange were qualitatively different. Associations were weaker, particularly for measures with a more direct relation to impaired gas exchange, such as acidosis or O2 treatment. Furthermore, compared to meta-analyses of ID, there was little evidence of a dose–response relationship between proxies with cut-off scores and ASD. It is important to note that a large proportion of patients with ASD also have some degree of ID (Baio 2012; Fombonne et al. 1997; Leonard et al. 2011; Oliveira et al. 2007; Wong and Hui 2008). This suggests that concomitant ID might account for the observed association between the gas exchange proxies and ASD. A recent study of the association between low Apgar score and ASD and/or ID has addressed this issue (Schieve et al. 2015). The authors found that ASD per se has a weak relation with low Apgar score, that the association is stronger when ID accompanies ASD, and that it is strongest when ID occurs without ASD. Another study showed that the association of resuscitation with ASD with ID is not different from that of ASD without ID (Langridge et al. 2013). Taken together, these findings suggest that the nature of the relation of ID accompanying ASD to impaired gas exchange is different from that of ID per se.

Our findings must be interpreted in the context of our study’s limitations. Our meta-analysis was based on observational studies. Therefore, although we found a strong, consistent association between proxies of impaired gas exchange and ID, our results cannot definitively establish whether this association is causal (Weed 2000). However, given that it is impossible to conduct experimental studies of impaired gas exchange and neurodevelopmental outcomes in humans, our meta-analysis provides the most rigorous synthesis of the current evidence. Moreover, meta-analysis, even at its best, only provides a rough solution to the problem of causality by addressing some of its aspects. For example, in the case of our meta-analysis there may be unknown factors that could account for the observed association. Unfortunately, we were unable to assess the effect of sex as a potential effect modifier because only a small number of studies provided the necessary data. Sex-specific fetal programming can modify the brain’s vulnerability to prenatal stress and is therefore crucial to investigate in more depth in future studies (Mueller and Bale 2008).

Our meta-analysis could not verify the biological plausibility behind the association. However, evidence shows that neonatal brain tissue is extremely sensitive to hypoxic-ischemic damage (de Haan et al. 2006). Immaturity of the brain vasculature and autoregulation at birth, along with white matter vulnerability, contribute to brain injury following exposure to hypoxia/ischemia (Armstrong-Wells et al. 2010). In addition, hypoxic/ischemic insult induces inflammation, formation of free radicals and excitotoxicity, all of which can lead to exacerbation of neuronal damage and death (Armstrong-Wells et al. 2010; du Plessis and Volpe 2002). Regions that are involved in cognitive function, such as the hippocampus and cortex, are commonly injured following neonatal hypoxia (de Haan et al. 2006). Nevertheless, the detailed interplay between hypoxia and intellectual dysfunction has yet to be investigated.

Several other methodological considerations warrant further attention. We tried to minimize heterogeneity by separately analyzing various proxies. This approach resulted in a meaningful reduction in heterogeneity compared with previous meta-analyses. For example, in their comprehensive meta-analysis of prenatal and perinatal risk factors for ASD, Gardener et al. (2011) pooled apnea and respiratory distress together, which resulted in significant heterogeneity. We could significantly reduce the heterogeneity by analyzing these proxies separately. Nevertheless, heterogeneity due to variable design, definition of the conditions, and outcome measures was still an issue. Furthermore, even within a single measure, definitions could differ. This was particularly true for undefined hypoxia/asphyxia, which was associated with significant heterogeneity in both ASD and ID. Moreover, the unclear definition of asphyxia/hypoxia makes it less useful than other proxies of impaired gas exchange. Another important limitation of our study was the lack of more objective evidence of impaired gas exchange, with the exception of a few studies that considered acidosis. At the same time, however, the measures used represent real-world standard clinical practice and are therefore both important and practical to study. Notably, these measures are not screening tools because most neonates with these proxies do not develop disability, and most neonates with developmental disability have a normal clinical course at the time of birth (Nelson and Ellenberg 1981).

Our study underscores the need for more thorough investigations into the role of altered gas exchange status in neurodevelopmental disorders. To increase the validity of these measurements, more direct measures of hypoxia/hypercarbia such as blood gas values and blood pH are needed. Neuroimaging studies can be particularly helpful in detecting functional and structural consequences of hypoxia/ischemia (Sie et al. 2000; Maneru et al. 2003). An important factor in our study was the definition of impaired gas exchange. Several other perinatal factors, from cord accidents and placental abnormalities to maternal disorders, might be associated with hypoxia, acidemia, and hypercarbia in the neonate (Scafidi and Gallo 2008). We did not include all possible related measures because some of them were even less strongly related to asphyxia/hypoxia than currently used measures, making interpretation more difficult. Nevertheless, the measures that we included in our study have their own caveats. For example, a low Apgar score can be a result of medications, infections, congenital anomalies, hypoxia and prematurity (Freeman and Nelson 1988), few of which were controlled for in the included studies. An Apgar score of less than three might be a more valid measure of disturbed gas exchange status, but was unreported in most studies. Similarly, a need for O2 treatment might reflect some degree of hyperoxygenation and thus hyperoxic damage (Bancalari and Claure 2012; Casey et al. 2001).

In summary, our results show that the presence of proxies of impaired gas exchange is associated with a higher risk of ID. Moreover, our findings suggest that ASD has a weak association with clinical proxies of impaired gas exchange. Future studies might focus on more objective measures of impaired gas exchange, such as blood gas values and placental pathology, as well as on the effect of important modifiers such as sex and severity on the association between hypoxia/asphyxia and neurodevelopmental outcomes.