Introduction

Millions of children are exposed to anesthesia every year for surgical and diagnostic procedures [1, 2]. Advances in anesthesia care have resulted in dramatic improvements in perioperative safety in children over the past few decades. However, the question of long-term neurodevelopmental effects of anesthetic agents in children has emerged as a concern, with the Food and Drug Administration (FDA) issuing a drug safety communication in 2016 about the neurodevelopmental effects of anesthetic drugs in “children younger than 3 years or in pregnant women during their third trimester” [3].

Concerns about potential neurotoxic effects of anesthetic drugs on the developing brain stemmed from preclinical studies, with animal models from rodents to non-human primates suggesting that exposure to all commonly used anesthetic agents is associated with alterations in neurodevelopment [4, 5]. Several mechanisms by which anesthetic agents could produce neurotoxicity via direct effects on neuronal structure have been hypothesized, including neuro-apoptosis, generation of reactive oxygen species, influences on synaptogenesis and receptor expression, and inhibition of neurotrophic factors such as brain-derived neurotrophic factor, among others [5]. However, translating the results from animal models to humans has been challenging given differences in brain structure and developmental stages between species as well as ethical and logistical considerations that limit the types of clinical studies that can be performed.

Interpreting the Clinical Studies

The vast majority of published clinical studies of anesthetic neurotoxicity are observational in nature, with most using data from pre-existing birth cohorts, other research cohorts, and educational or insurance databases. The advantage of this is that much of the data has already been collected, so these studies often can be performed more efficiently than prospective studies. The disadvantage, however, is that they are restricted to the available data and the number of subjects that exist in the pre-existing datasets. For example, the available neurodevelopmental outcomes may not be the ideal outcomes for evaluating children following exposure to anesthesia, and adjustment for confounding is also restricted to the available covariates. Since these datasets were created for a variety of different reasons, the data available in these datasets also varies and contributes to differences between the studies, including the ages at which children were evaluated for anesthetic exposure, the age of outcome evaluation, the types of outcomes available, the types of surgery and comorbid disease in the children, the sample sizes, and who was chosen as a control for comparison. Given these differences, it is not surprising that there is also heterogeneity in the results, with some studies reporting differences in children who have been exposed to anesthesia and others not reporting differences. Summarizing the results from the published studies is therefore complex and requires an understanding of the limitations of the studies and a nuanced interpretation of the literature.

Age at Exposure and Assessment

Given that anesthetic neurotoxicity was initially observed in animal models, there has been uncertainty regarding translation to humans and the exact age at which children may be vulnerable to anesthesia. Given this lack of clear guidance, clinical studies have assessed children exposed at a variety of ages from the neonatal period [6] to early infancy [7, 8] to late childhood [9, 10]. Comparing studies evaluating exposures at different ages to determine a clear age of vulnerability is difficult because the studies may also differ in other important factors besides the age of exposure. Some studies were designed with the specific purpose of evaluating children exposed at different ages, but even the interpretation of those studies may be limited. As an example, one study reported worse neurodevelopmental assessment scores in children exposed at older ages [11]. However, in that study, most of the older children received anesthesia for dental procedures, and it is possible that the need for anesthesia for dental procedures may indicate a higher baseline risk for poor neurodevelopmental assessment scores. Another study that attempted to overcome confounding based on procedure type evaluated children who all received the same minor procedures at various ages, finding that a similar increased risk of subsequent neuropsychological diagnoses was seen at all ages of exposure between birth and age 5 [12]. Despite the FDA warning against anesthetic exposures in pregnant women, prenatal exposures due to maternal surgery during pregnancy were not evaluated in clinical studies until recently, with two published studies to date. In one, of 2024 children evaluated, the 22 children with prenatal exposure to general anesthesia had an increased incidence of subsequent externalizing behavioral problems [13••]. In the second, by Bleeser et al., 129 mothers with exposure to general or regional anesthesia during pregnancy were evaluated, and the prenatally exposed children did not differ from unexposed children. However, in a sub-analysis evaluating only the 111 mothers who had general anesthesia, more problems with executive function were observed in the children [14••]. Studies of prenatal anesthetic exposures are relevant because anesthetic exposure stems from the need to treat a medical problem in the mother, not the child. As a result, prenatally exposed children are unlikely to have a higher level of medical disease than children without prenatal exposure, eliminating a possible source of bias. However, an important limitation is that some mothers had serious illnesses including malignancy, where concomitant treatments such as radiation or chemotherapy have the potential to adversely affect the neurodevelopment of the fetus.

In addition to variation in exposure age, there is also wide variability in the ages at which children are assessed due to the available data in the published studies. The age of assessment may have important implications as some neurodevelopmental domains are challenging to evaluate accurately in early childhood. In addition, some neurodevelopmental deficits may not manifest until children are older and can evolve over time. The age at which children may be vulnerable to anesthetic agents and the ideal age to assess children following exposure remain unclear. These are important questions to answer as they could inform the design of prospective studies of anesthetic-exposed children as well as guide clinical management, including delay of elective procedures if anesthetics are ultimately determined to be neurotoxic.

Anesthetic Dose and Types of Medications

Anesthetic medication data are unavailable in many studies, and children are commonly assessed as being either exposed or unexposed without consideration for the type of anesthetic medications or doses administered. A group from the Mayo Clinic has used the number of exposures, comparing children with single or multiple exposures, as a surrogate for exposure dose [15,16,17,18]. These studies have generally shown that multiply exposed children have worse neurodevelopmental test scores and a higher rate of learning disability compared to either unexposed or singly exposed children. A limitation of these studies, however, is that children who require multiple anesthetic exposures generally also have a higher rate of comorbid disease than other children [19].

Some studies have recently evaluated specific types and doses of medications in potentially at-risk populations. A study of preterm infants admitted to a neonatal ICU evaluated associations between Full-Scale Intelligence Quotient (FSIQ) scores and types of anesthetic agents including volatile anesthetics, propofol, benzodiazepines, barbiturates, and ketamine, finding that exposure to each medication class except opioids was associated with lower FSIQ at 3 years of age [20]. A separate study of preterm infants found that only exposure to over 7 days of opioids or benzodiazepines was associated with worse scores at 2 years of age on the Bayley III, which evaluates language, cognition, and motor function, while children with short exposures were more similar to unexposed children [21]. Simpao et al. investigated the association between various anesthetic and sedative agents and Bayley III scores at 18 months of age in children who had congenital heart surgery during infancy. They reported that while cumulative exposures to volatile agents, opioids, benzodiazepines, and dexmedetomidine were not associated with adverse neurodevelopmental outcomes, higher doses of ketamine were associated with worse motor function [22]. Andropoulos et al. also evaluated children who had congenital cardiac surgery and conversely reported worse Bayley III scores with greater volatile anesthetic dose, while also finding no association with opioid and benzodiazepine dose [23]. Many questions about the toxicity of specific medications, minimum doses for toxicity, and possible interactions between drugs remain. While these questions have been evaluated in some studies, it remains unclear whether specific drugs and doses cause neurodevelopmental deficits or are simply markers for higher levels of illness. However, as more studies are performed that evaluate drug-specific effects, a clearer view of these associations may emerge.

Types of Outcomes

A challenge with performing clinical studies of anesthetic neurotoxicity is that children exposed to anesthesia do not display deficits that can be easily appreciated on routine examination. Therefore, a clear phenotype of injury and the appropriate tests to evaluate this phenotype are not readily apparent. The outcomes that have been assessed include psychiatric or behavioral diagnoses, academic achievement tests, neuroimaging studies, and a wide range of neuropsychological tests. A few large-scale prospective studies have tried to address this uncertainty by evaluating a range of outcomes based on input from neuropsychologists and neurotoxicologists. The first is the GAS trial, the only large-scale randomized controlled trial of anesthetic neurotoxicity that has been performed, which randomized infants to receive either a brief sevoflurane anesthetic or an awake regional anesthetic for herniorrhaphy [8]. Children were exposed before 60 weeks postmenstrual age and evaluated at age 5. Two other prospective studies were observational in nature and employed an “ambi-directional” approach, with children retrospectively identified as having been exposed to anesthesia and then prospectively evaluated. The MASK study included children undergoing a variety of surgical procedures [24]. Children who had either a single or multiple exposures to anesthesia prior to age 3 were matched to unexposed children, and neurodevelopmental evaluation was performed at 8–12 or 15–20 years. The PANDA study compared siblings discordant for exposure to hernia surgery with neurodevelopmental evaluation at 8–15 years of age [25]. The primary outcome for all three prospective studies was intelligence as measured by FSIQ, with a host of other neuropsychological tests evaluated as secondary outcomes. While no score differences were observed in the primary outcome of FSIQ in any of the three studies, in each study exposed children had statistically significantly worse scores in some of the secondary outcomes.

As small effects may be difficult to recognize given the limited sample size of individual studies, a meta-analysis was performed evaluating prospective studies and pooling data from the outcomes that were common between the GAS, MASK, and PANDA studies [26••]. After pooling, approximately 800 children with a single brief exposure to anesthesia were compared to approximately 800 unexposed children. The exposed children were found to have no difference in FSIQ but significantly worse scores in internalizing, externalizing, and total behavioral problems as measured by the Child Behavior Checklist (CBCL), and in executive function as measured by the Behavior Rating Inventory of Executive Function (BRIEF). To put these score differences into a clinical context, a secondary analysis evaluated the increased risk of crossing a threshold for clinical deficit based on these score differences and found that a single exposure to anesthesia was associated with a 47% increased risk of an internalizing behavioral deficit and a 68% increased risk of a deficit in executive function.
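The link between a small mean score difference and a larger relative increase in the risk of crossing a clinical threshold can be illustrated with a short calculation. The sketch below is not the method used in the cited meta-analysis. It assumes, purely for illustration, normally distributed T-scores with a mean of 50 and a standard deviation of 10 in unexposed children, a hypothetical clinical cutoff of 65, and hypothetical mean shifts in exposed children.

```python
import math


def upper_tail(threshold, mean, sd):
    """Return P(X > threshold) for X ~ Normal(mean, sd)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def relative_risk(shift, threshold=65.0, mean=50.0, sd=10.0):
    """Ratio of the probability of exceeding the cutoff after the mean
    moves by `shift` points to the probability before the shift.
    The cutoff, mean, and SD defaults are illustrative assumptions."""
    exposed = upper_tail(threshold, mean + shift, sd)
    unexposed = upper_tail(threshold, mean, sd)
    return exposed / unexposed


if __name__ == "__main__":
    # Hypothetical mean shifts, in T-score points
    for shift in (0.5, 1.0, 2.0):
        print(f"mean shift {shift:.1f} -> relative risk {relative_risk(shift):.2f}")
```

Because the cutoff lies in the tail of the distribution, a shift of only a point or two in the mean changes the proportion above the cutoff by a much larger relative amount, which is one way a difference that is small on an individual level can still matter at the population level.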

Given that most published studies were not prospective, another meta-analysis evaluated all published clinical studies [27••]. In this study, it was found that in 108 clinical studies of neurotoxicity, 422 different measures were evaluated, showing tremendous heterogeneity in the reported outcomes. The outcomes were classified into 9 different neurodevelopmental domains, and data from different studies were pooled, with the largest differences seen in executive function, behavior, and motor function, which is consistent with the meta-analysis of the GAS, PANDA, and MASK studies.

Recently, interest has developed in other outcomes including autism spectrum disorder (ASD). Pikwer et al. evaluated the association between anesthesia exposure and ASD in a nationwide cohort of children in Sweden, reporting that exposure prior to age 5 was associated with an almost two-fold higher risk of ASD and that the risk was higher with younger age at exposure [28•]. However, Laporta et al. evaluated children exposed to anesthesia prior to 3 years of age and conversely found that ASD was not associated with anesthesia exposure after adjusting for covariates [29•].

In prospective studies, young children exposed to anesthesia commonly need to grow older before they can be adequately evaluated with neuropsychological testing, potentially requiring years of follow-up that can be costly and logistically challenging. If, however, an objective measure of injury can be identified, children can be assessed earlier, reducing the cost of studies and loss to follow-up. Salaun et al. recently published a translational study incorporating neuroimaging and examining exposure to general anesthesia and its long-term impact on behavior and brain structure in mice and humans [30•]. They reported preclinical and clinical evidence that exposure to anesthesia in childhood was associated with gray matter atrophy in the right prefrontal gyrus that was more pronounced with earlier general anesthesia exposure. In mice, the periaqueductal gray matter plays a role in fear discrimination, anxiety, and depression, which is consistent with existing literature [31,32,33]. In humans, reductions in the volume of the inferior frontal gyrus have been associated with dysregulated emotional function and depression [34, 35], which is also consistent with other published clinical studies [26••].

Neurodevelopmental differences in children exposed to anesthesia have been reported in many studies evaluating a range of neurodevelopmental domains. However, due to the limitations of the studies, it remains unclear if these differences are caused by the anesthetic medications or other factors or confounders. The concept of confounding is that other factors may be associated with anesthesia exposure (e.g., underlying medical conditions), and it is these external factors and not the anesthetic medications that are causing the observed differences in exposed and unexposed children. While confounding in principle can be overcome by performing a randomized controlled trial, given the difficulty in performing these studies, there has only been one large-scale randomized controlled trial. In the remaining published observational studies, while it is unclear how much bias is introduced by confounding, these factors should still be considered. A variety of methods can be applied to reduce potential bias, but in order to implement these methods, the sources of confounding must first be understood.

Visualizing Confounders Using a Graphical Framework: Directed Acyclic Graph (DAG)

A method that can be used to better comprehend the relationships between exposures, outcomes, and potential sources of bias is the directed acyclic graph (DAG) [36,37,38,39,40,41,42], and its utility in studies of anesthetic neurotoxicity has been recognized [43]. In studies of anesthetic neurotoxicity, DAGs can be applied as a graphical tool to visualize the hypothesized causal relationship between anesthetic exposure, neurodevelopmental outcomes, and confounding factors that may bias that relationship. The use of DAGs can prevent overadjustment bias in multivariate analyses and permit a greater degree of accuracy in establishing causal associations, which is particularly helpful in studying areas where randomized controlled trials are difficult if not impossible to perform [44]. DAGs consist of nodes, which represent variables, with arrows denoting cause and effect relationships between nodes. The absence of an arrow between two nodes represents a lack of a causal relationship between those variables. For a graph to be a directed acyclic graph, each arrow must point in a single direction, and no variable can be an ancestor of itself [45]. DAGs are underpinned by a robust mathematical framework, and there are numerous software packages available to draw and analyze these graphs [46]. DAGs are used in various disciplines such as economics [47,48,49], education [50,51,52], and sociology [53,54,55] and have been steadily increasing in popularity across various areas of healthcare research including anesthesiology and surgery [42, 56, 57, 58••].
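The two defining properties described above (arrows point in one direction, and no variable is its own ancestor) can be checked programmatically. The following is a minimal sketch using only the Python standard library; the node names and arrows are illustrative assumptions loosely based on the variables discussed in this review, not a validated causal model.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to the set of its direct causes (parents).
# Node names and edges are hypothetical and for illustration only.
PARENTS = {
    "baseline_factors": set(),
    "surgery": {"baseline_factors"},
    "anesthesia": {"surgery"},
    "pain_inflammation": {"baseline_factors", "surgery"},
    "periop_complications": {"baseline_factors", "surgery", "anesthesia"},
    "neurodevelopment": {
        "baseline_factors",
        "anesthesia",
        "pain_inflammation",
        "periop_complications",
    },
}


def is_acyclic(parents):
    """Return True if the graph has no directed cycle."""
    try:
        # static_order() raises CycleError if a cycle is present
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def ancestors(parents, node):
    """Return every variable with a directed path into `node`."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


if __name__ == "__main__":
    print("acyclic:", is_acyclic(PARENTS))
    print("ancestors of anesthesia:", sorted(ancestors(PARENTS, "anesthesia")))
```

Storing parents rather than children matches the input that `graphlib.TopologicalSorter` expects and makes ancestor queries, which are central to identifying confounders, straightforward.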

Using a DAG in Clinical Studies of Anesthetic Neurotoxicity

A proposed DAG illustrating relationships between anesthetic exposure and pediatric neurodevelopmental outcomes, along with shared common causes, can be seen in Fig. 1. An important feature encoded in the DAG is the deterministic relationship between surgery and anesthetic exposure, indicated by a bold red arrow. This signifies that surgery completely determines the exposure to anesthesia since a child will not receive surgery without anesthesia or receive anesthesia without undergoing a surgical or diagnostic procedure. While inclusion of this red arrow is not a typical feature of DAGs, it is a relevant consideration since distinguishing the effect of anesthesia from that of surgery is not generally possible in observational clinical studies. Although many factors are listed in this DAG, it is important to note that there may be other unknown confounders that have not been listed.

Fig. 1

Directed acyclic graph (DAG) to help visualize the relationships between anesthesia exposure and pediatric neurodevelopmental outcomes. The bold red arrow denotes a deterministic relationship between surgery and anesthetic exposure. The baseline factors represent confounders that may be a cause of surgery, pediatric neurodevelopment, and post-exposure factors. Post-exposure factors represent potential mediators and are separated into two groups, some of which may stem from surgery and anesthesia, while others stem from surgery alone. Some specific examples of potential confounders and mediators are listed.

Baseline Sociodemographic and Clinical Factors

The baseline factors, or confounders, may be particularly important to consider in the DAG because they precede all other variables. The connection between baseline factors and pediatric neurodevelopment is motivated by the understanding that children requiring surgical procedures may possess medical conditions and underlying comorbidities. These clinical factors and other factors such as sociodemographic or geographic characteristics may increase the underlying risk for neurodevelopmental deficits independent of exposure to anesthesia, with an extensive list of potential confounders published by Walkden et al. [59]. Other baseline factors may also influence the need for surgery including demographic, geographic, and socio-economic characteristics. While one necessary condition for reaching a causal conclusion (that anesthesia causes neurodevelopmental deficits in children) is that all confounders are perfectly specified and accounted for, this is generally not possible in observational studies. However, nearly all clinical studies of anesthetic neurotoxicity make attempts to minimize bias by implementing methods to control for confounding. In observational studies, sociodemographic factors are likely to differ between exposed and unexposed children. Child sex is accounted for in most studies, as children who do and do not have surgery and anesthesia often differ based on sex, with hernia repairs, for example, being performed much more commonly in boys [60]. Similarly, sociodemographic factors are commonly accounted for using statistical methods [61] or by using sibling-matched [25, 62, 63] or twin-matched analyses [64], which have an added benefit of also achieving similarities in baseline genetic characteristics.

Accounting for clinical factors also presents a challenge, as even in sibling-matched studies, the sibling requiring surgery may have different baseline clinical characteristics from those who do not have surgery. This has been dealt with in some studies by adjusting for or matching exposed and unexposed children on baseline comorbidity or healthcare utilization variables [12, 24] and is particularly important in children requiring complex or multiple surgical procedures, as these children are more likely to have more underlying comorbidities. However, given that the majority of children who have surgery do not have major comorbid conditions [19], and that the exclusion of children with complex conditions, such as those requiring cardiac surgery, had minimal impact on the study results [24], it is possible that bias resulting from underlying medical problems may be limited after using methods to control for such confounding.

Another factor that complicates the interpretation of the clinical studies is the considerable variation in the patient populations evaluated. Some studies include primarily healthy children, while others exclusively evaluate preterm infants or children with significant comorbid diseases such as congenital cardiac disease or medulloblastomas [23, 65,66,67]. As such, findings from these studies may not be generalizable to healthy children, and finding appropriate controls for children in these studies may be challenging.

Adverse Healthcare Interactions, Parental Separation, Perioperative Pain, and Inflammation

The DAG also depicts two nodes representing variables that stem from exposure to surgery and anesthesia. The lower node is connected to surgery and baseline factors and includes psychological factors such as healthcare interactions and parental separation, as well as clinical factors such as perioperative pain and inflammation. The causal basis of these factors includes the concept that preoperative anxiety (including parental separation [68], lack of control over the environment, and adverse interactions with the healthcare system [69]) may influence long-term outcomes in children. Perioperative and intraoperative pain may also be associated with neurodevelopmental outcomes, as may preoperative anxiety, which in pediatric patients has been linked to downstream effects such as postoperative delirium [70], perioperative pain [71], and maladaptive behaviors [72]. Painful stimuli without analgesia have been shown to trigger neurotoxic effects in the developing brain in both preclinical and clinical models [73, 74], particularly during the critical neonatal period [75, 76]. Even seemingly minor procedures such as neonatal circumcision, when performed without analgesia, are associated with increased subsequent pain behaviors [77]. However, given that adequate analgesia during painful stimuli attenuates the deleterious effects in both animals and humans [78,79,80,81] and these patients are in the operating room under anesthesia care, their pain is likely being managed, and therefore, the long-term effect on children may be limited. Over the past few decades, inflammation has also been increasingly recognized as playing a key role in central nervous system injury and development, and the inflammatory response due to surgery may also result in neurodevelopmental injury or heighten the brain’s sensitivity to anesthetic-induced injury [82]. Several neurodevelopmental disorders have been associated with early life immune activation and inflammation, including autism spectrum disorder, cerebral palsy, depression, and schizophrenia [83, 84]. Surgery and certain underlying comorbidities may result in a pro-neuroinflammatory response, so while an increased neurodevelopmental risk due to inflammation is possible, any long-term effects in children remain unproven [82, 85].

Perioperative Complications and Physiological Disturbances

The upper node of the DAG is connected to surgery, baseline factors, and anesthetic exposure and includes the presence of perioperative complications and physiologic disturbances that could potentially contribute to neurodevelopmental injury by reducing cerebral perfusion [86,87,88].

Blood pressures in anesthetized children have been found to be considerably lower than in non-anesthetized children [89], particularly in children under general anesthesia compared to regional anesthesia [90, 91]. Recently, concern has developed from data reporting associations between intraoperative hypotension and negative cardiac, renal, and neurologic outcomes [92,93,94]. Some have even proposed that pediatric neurotoxicity could be explained by extrinsic factors, namely deficits in anesthetic management such as hypotension, rather than by the intrinsic effects of the anesthetic drugs themselves [95]. Several studies investigating blood pressure, however, were unable to identify a significant association between intraoperative blood pressure and subsequent risk of neurodevelopmental deficits [19, 95•].

What Can We Learn from This DAG?

Factors present prior to exposure to surgery and anesthesia (i.e., baseline factors) should be adjusted for in order to control for spurious associations between anesthetic exposure and pediatric neurodevelopment. While the use of specific methods varies based on the research question, some common methods that can be implemented include propensity score weighting, various matching techniques, instrumental variable analysis, or difference-in-differences analysis [37, 38, 45].
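As an illustration of one of these methods, the sketch below applies propensity score weighting to simulated data. Everything in it is a hypothetical assumption made for demonstration: a single binary baseline confounder (labelled comorbidity) raises the chance of exposure and lowers the outcome score, the true effect of exposure is set to zero, and the propensity score is estimated as the observed proportion exposed within each confounder level. Real analyses typically involve many covariates and a fitted propensity model.

```python
import random


def simulate(n, seed=0):
    """Simulate (comorbidity, exposed, score) rows with no true exposure effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        comorbidity = 1 if rng.random() < 0.3 else 0
        p_exposed = 0.6 if comorbidity else 0.2  # confounder drives exposure
        exposed = 1 if rng.random() < p_exposed else 0
        # Confounder lowers the score; exposure itself contributes nothing
        score = 100.0 - 5.0 * comorbidity + rng.gauss(0.0, 5.0)
        rows.append((comorbidity, exposed, score))
    return rows


def naive_difference(rows):
    """Unadjusted mean score difference, exposed minus unexposed."""
    exposed = [y for _, a, y in rows if a == 1]
    unexposed = [y for _, a, y in rows if a == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def ipw_difference(rows):
    """Inverse-probability-weighted mean difference (normalized weights)."""
    propensity = {}
    for level in (0, 1):
        stratum = [a for c, a, _ in rows if c == level]
        propensity[level] = sum(stratum) / len(stratum)
    num1 = den1 = num0 = den0 = 0.0
    for c, a, y in rows:
        if a == 1:
            w = 1.0 / propensity[c]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - propensity[c])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


if __name__ == "__main__":
    data = simulate(20000, seed=1)
    print(f"naive difference:    {naive_difference(data):.2f}")
    print(f"weighted difference: {ipw_difference(data):.2f}")
```

In this simulation the unadjusted comparison suggests that exposed children score lower even though exposure has no effect, while the weighted comparison is close to zero. The correction works only because the confounder is measured; unmeasured confounders would leave residual bias.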

The DAG also incorporates multiple post-exposure variables, or factors that may occur as a result of anesthetic exposure, including perioperative complications and hypotension. Adjusting for such factors to study the effect of anesthetic exposure on pediatric neurodevelopment would lead to overadjustment bias. In contrast, causal mediation leverages post-exposure variables to understand whether the relationship between the exposure and outcome is partially or fully explained by one of these variables, also known as mediators [97,98,99]. There are various techniques used for performing causal mediation analysis, such as regression-based, weighting-based, and simulation-based estimation [97,98,99,100,101].
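The difference between adjusting for a mediator and using it in a mediation analysis can be sketched with simulated data. The example below is a simplified illustration under strong, explicitly hypothetical assumptions: exposure is randomly assigned (no confounding), a binary mediator (labelled hypotension) is more likely after exposure, the outcome depends linearly on both with no exposure-mediator interaction, and all effect sizes are invented. The direct effect is estimated by stratifying on the mediator, and the indirect effect is taken as the total effect minus the direct effect.

```python
import random


def simulate(n, seed=0):
    """Simulate (exposed, hypotension, score) rows with a mediated pathway."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        exposed = 1 if rng.random() < 0.5 else 0
        p_mediator = 0.4 if exposed else 0.1  # exposure raises mediator risk
        hypotension = 1 if rng.random() < p_mediator else 0
        # Hypothetical effects: -1 point directly, -4 points via the mediator
        score = 100.0 - 1.0 * exposed - 4.0 * hypotension + rng.gauss(0.0, 5.0)
        rows.append((exposed, hypotension, score))
    return rows


def mean_difference(rows):
    """Mean score difference, exposed minus unexposed."""
    exposed = [y for a, _, y in rows if a == 1]
    unexposed = [y for a, _, y in rows if a == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def decompose(rows):
    """Return (total, direct, indirect) effects under the stated assumptions."""
    total = mean_difference(rows)
    direct = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[1] == level]
        # Within-stratum difference, weighted by stratum size
        direct += mean_difference(stratum) * len(stratum) / len(rows)
    return total, direct, total - direct


if __name__ == "__main__":
    total, direct, indirect = decompose(simulate(20000, seed=1))
    print(f"total effect:    {total:.2f}")
    print(f"direct effect:   {direct:.2f}")
    print(f"indirect effect: {indirect:.2f}")
```

If the goal were the total effect of exposure, reporting only the mediator-adjusted estimate would understate it, which is the overadjustment problem described above. The same stratified estimate becomes useful once it is interpreted as the direct component of a decomposition. With confounding of the mediator-outcome relationship or with interactions, this simple subtraction is no longer valid and the dedicated methods cited above are needed.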

Conclusion

Since the question of anesthetic neurotoxicity first emerged two decades ago, many clinical studies evaluating children exposed to anesthetic agents have been performed. While interpreting these studies has been complex given the heterogeneity in the published literature, a few concepts have now been appreciated. Neurodevelopmental differences between exposed and unexposed children have been observed but are relatively small on an individual level. The magnitude of the differences varies based on neurodevelopmental domain, with larger differences seen in executive function and behavior and the smallest differences in cognition. Given that nearly all published studies are observational, the results may be biased by confounding factors. Most studies have used methods to account for differences between exposed and unexposed children, so while there is almost certainly some confounding, how much these factors alter study results is uncertain. Methods exist to evaluate causal relationships and reduce bias, and they have been applied in other scenarios, such as environmental exposures, where randomized controlled trials cannot be performed but causality can in principle be teased out. As the question of whether anesthetic medications influence neurodevelopmental outcomes in children remains a subject of intense debate, further well-designed studies will be required, and the application of methods to aid with study design will be helpful in quantifying the contribution of mediating factors and reducing confounding bias in future studies.