Introduction

Subsurface contaminant transport is a common but difficult issue at impacted sites. Preventing the transport of contaminants in groundwater is often a key component of site remediation. Offsite transport can affect ecological and human health endpoints distant from the source. Subsurface characterization can be complex and expensive because contaminant distributions can vary both spatially and temporally. Because subsurface plumes can be challenging to characterize directly and are less amenable to observation than surface waters and soils, a weight-of-evidence approach is often used to facilitate site management.

The conceptual site model (CSM) is a key component for characterizing and remediating contaminated sites (e.g., USEPA 2011; USACE 2012; ASTM 2020). The CSM summarizes information about the site, including the exposure pathways between sources of stressors (e.g., contaminants) and receptors. Although qualitative, the CSM informs the calculations used for characterizing and predicting the behavior of contaminants at a site (Bian et al. 2023), and it is often paired with the development of clear assumptions for quantitative predictions (Bear and Cheng 2010). Because the CSM captures the system processes represented in a model, the CSM can be developed iteratively alongside the model and tested against new information and data (McMahon et al. 2001). This makes the CSM especially useful for capturing information in a weight-of-evidence approach. Related uncertainties, including how and whether they can be quantified, are also identified during CSM development (Lerner et al. 2003). Bayesian modeling (e.g., Bian et al. 2023; Pan et al. 2020; Koch and Nowak 2015) may be used to capture additional uncertainties in the statistical parameters or, sometimes, in the structure of the model itself, making it especially useful for propagating uncertainties from lines of evidence.

Recent research has introduced the notion of conditional dependence and independence, visualized through directed acyclic graph (DAG) relationships, for representing CSMs and elucidating exposure pathways (Carriger and Parker 2021). The conditional independence notions in a DAG are guided by set and graph theory and are critical assumptions when developing the DAG structure of Bayesian networks. In a Bayesian network graph, conditional independence is described by the d-separation criterion: two sets of variables are conditionally independent of each other given a third set of variables if every path connecting the first two sets is blocked by the third set (Pearl 1988).
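To make the conditional independence idea concrete, the sketch below enumerates a three-node serial chain A -> B -> C with hypothetical binary conditional probability tables (all numbers are illustrative assumptions, not site data). A and C are dependent when B is unobserved, but once B is known, A carries no further information about C.

```python
from itertools import product

# Hypothetical CPTs for a serial chain A -> B -> C with binary states 0/1.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}
P_C_GIVEN_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}


def joint(a, b, c):
    """Joint probability factorized along the DAG: P(A) P(B | A) P(C | B)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


def prob_c1(**evidence):
    """P(C = 1 | evidence) by brute-force enumeration; evidence keys are 'a', 'b', 'c'."""
    num = den = 0.0
    for a, b, c in product((0, 1), repeat=3):
        state = {"a": a, "b": b, "c": c}
        if any(state[k] != v for k, v in evidence.items()):
            continue  # skip joint states inconsistent with the evidence
        p = joint(a, b, c)
        den += p
        if c == 1:
            num += p
    return num / den


print(round(prob_c1(a=0), 2), round(prob_c1(a=1), 2))  # A informs C: 0.22 0.64
print(round(prob_c1(a=0, b=1), 2), round(prob_c1(a=1, b=1), 2))  # given B, A does not: 0.7 0.7
```

Brute-force enumeration is used only because the example is tiny; it makes the factorization implied by the DAG explicit without any Bayesian network library.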

As applied to CSMs, the notions of conditional independence can be beneficial for representing hypothesized causal pathways by providing clear and logical assumptions for the connections within a CSM. A CSM constructed in this manner was called a conceptual Bayesian network (CBN) by Carriger and Parker (2021). A CBN provides additional support for inferences on the qualitative nature of causal pathways and, when the pathways are known, documents the sufficient set of remediation alternatives needed to break them. Previous work using DAGs has also demonstrated their utility for conceptual modeling in different applications, including combining coastal watershed and marine management concerns (Carriger et al. 2018), elucidating causal exposure pathways for site ecological risk assessment (Carriger and Parker 2021), integrating multidisciplinary perspectives on biofouling management in shipping (Luoma et al. 2021), and sustainable development of marinas (Luoma et al. 2024).

This article will further explore the utility of CBNs for applications concerned with controlling and preventing offsite contaminant migration, specifically at sites contaminated with dense nonaqueous phase liquids (DNAPLs). Remediation of DNAPL-contaminated sites is a priority in the United States and globally because of the long-term risk DNAPLs pose to groundwater resources (e.g., Cohen and Mercer 1993; Pankow et al. 1996; Kueper et al. 2014a; and references therein). Kueper et al. (2014b) divide the life cycle of subsurface DNAPL contamination into five stages: (1) initial DNAPL release; (2) DNAPL redistribution; (3) continued DNAPL dissolution and aging; (4) complete DNAPL depletion; and (5) desorption and back diffusion. While most sites are likely in stage 3 or higher, some sites fall into the first two stages. Uncontrolled mobile DNAPL, that is, DNAPL not being actively recovered, generally represents a greater risk than immobile DNAPL because its spatial extent is growing. Contaminants may volatilize from DNAPL into the gas phase within the vadose zone and dissolve from DNAPL into groundwater in both the vadose and saturated zones, creating vapor-phase and dissolved-phase plumes that may extend beyond the limits of the DNAPL itself (e.g., Feenstra et al. 1996). Moreover, given the relatively low solubility of DNAPL in groundwater, recovery of DNAPL (when practical) is a much more efficient remediation process than recovery of contaminant mass dissolved in groundwater. Consequently, it is good practice to minimize DNAPL migration and remove as much DNAPL as possible from impacted sites. Characterization and remediation of subsurface DNAPL contamination can be difficult and expensive (e.g., U.S. EPA 2003; Suchomel et al. 2014). There is often a large degree of uncertainty about the presence of DNAPL, its spatial distribution, and the extent to which it is mobile or immobile.
The topic is therefore well suited for representation with a CBN.

The CBN concept and network flow will be adapted for a stylized example representing the subsurface environment of a generic contaminated site. Causal pathway analysis, and intervention analysis for breaking causal pathways, will be demonstrated. Additional extensions of the CBN concept will also be explored, such as parallels with pictorial CSMs and measurement nodes that represent lines of evidence for hypothesized events in the CBN.

Conceptual Bayesian networks

As introduced for use as a conceptual modeling tool for ecological risk assessment and management, the CBN allows the representation and evaluation of exposure pathways (Carriger and Parker 2021). The generalized structure, flow, and content of a CBN (Fig. 1) were also constructed based on guidance and recommendations from Carriger et al. (2018), Tighe et al. (2013), Ayre and Landis (2012), Neapolitan (2009), Korb et al. (2009), Cain (2001), and U.S. EPA (1998). The components of the CBN are the nodes (circles) and the connections between them, represented by arcs (arrows). The presence of an arc between two nodes normally implies a hypothesized cause-effect connection within a CBN. For example, the concentration in one spatially distinct region can causally influence the concentration in another connected region; this would be represented by two nodes, one for the concentration in each region, joined by an arc. The nodes represent random variables that can take on a mutually exclusive, exhaustive set of states (not shown), such as concentration intervals, presence/absence of a chemical, or high/medium/low groupings. One suggestion is that the selected nodes should be useful for capturing available data and guiding future information-gathering activities, and that their states should reflect the potential outcomes of the information gathered (Borsuk 2008; Song and Song 2023).
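Once a CBN is quantified, the requirement that each node's states be mutually exclusive and exhaustive means every conditional distribution for that node must assign non-negative probabilities over exactly its state set and sum to one. The sketch below checks this; the `Node` class, the `validate_distribution` helper, and the "low/medium/high" states are hypothetical illustrations, not part of any particular software package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A CBN node: a random variable with a finite set of discrete states."""
    name: str
    states: tuple  # mutually exclusive, exhaustive state labels


def validate_distribution(node, dist, tol=1e-9):
    """Raise ValueError unless `dist` (state -> probability) is a proper distribution over node.states."""
    if set(dist) != set(node.states):
        raise ValueError(f"{node.name}: states {sorted(dist)} do not match {sorted(node.states)}")
    if any(p < 0 for p in dist.values()):
        raise ValueError(f"{node.name}: negative probability")
    total = sum(dist.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"{node.name}: probabilities sum to {total}, not 1")
    return True


# Hypothetical node with concentration-interval states.
cgw_fr = Node("CGW FR", ("low", "medium", "high"))
validate_distribution(cgw_fr, {"low": 0.5, "medium": 0.3, "high": 0.2})  # passes
# validate_distribution(cgw_fr, {"low": 0.5, "high": 0.3})  # would raise: "medium" is missing
```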

Fig. 1

Conceptual Bayesian network components and structural flow overview example. Red node = source; blue node = intermediate node; green node = endpoint; yellow node = modifier; light blue square node = intervention; orange boundary = exposure pathway system

A CBN may be constructed by first identifying the endpoints and the sources of stressors. The red node in Fig. 1 represents a source for a stressor, such as a release or a legacy source media concentration value. Source nodes should be chosen based on their capability for representing initiating, ongoing, or intermittent releases to the system. The green nodes represent the endpoints, such as ecological receptors or offsite migration flux. The endpoints may be chosen for their importance in the problem context (e.g., sensitivity to the stressors, stakeholder values, environmental relevance) and may be defined by the environmental entities being assessed and their attributes (EPA 1998). Because of uncertainties, and the need to understand other facets of the system relevant to interventions, additional node types may be needed to represent exposure pathways between the sources and the endpoints.

The blue nodes represent intermediate nodes in the exposure pathways. Intermediate nodes should be directly dependent on their parent nodes (the nodes that directly precede them) and are necessary for understanding their child nodes (the nodes that directly follow them) (Cain 2001). Their identification can be elicited in a forward direction (from source to endpoint), against the arcs (from endpoint to source), or both. Borsuk (2008) recommends developing well-defined endpoints as measurable attributes reflecting stakeholder and management objectives and provides a process for working back from the endpoints to the source nodes once these are identified. Because the value each node will take is unknown, the intermediate nodes should represent key uncertainties between the source and the endpoints. For chemical concentrations, this would usually include the different phase transfers and media that the chemicals traverse, which can be identified from characteristics of the receiving environment and the chemicals’ properties (e.g., volatility, solubility). Taken as a whole, intermediate nodes should include all the key uncertain variables needed for representing the potential pathways between the source and the endpoints in a CBN. A system with higher certainty may have fewer intermediate nodes between the source(s) and endpoint(s). Tighe et al. (2013) also include other intermediate node types, such as nodes representing bioavailability in an exposure system, which may be helpful to include in a CBN used for estimating risks. The formation of secondary stressors or indirect effects of the chemical(s) could also be included when needed (EPA 1998). Interventions should also be considered when developing the intermediate nodes, including what an intervention can directly or indirectly change and any additional impacts (Cain 2001; Korb et al. 2009).

Modifiers (yellow) include factors (e.g., hydrological) that are not in the exposure pathway but that enhance or inhibit the migration of chemicals along the pathways or the performance of remediation interventions. Their importance comes from representing site characteristics that may influence exposure pathways. Marcot et al. (2006) recommend limiting the number of parent nodes to three or fewer per child, where appropriate, to keep the analysis of the relationships tractable. This may make the selection of modifiers difficult, but they should adequately capture influential characteristics that can affect the fate and transport of chemicals (Tighe et al. 2013). If potential modifiers are numerous, it may help to group variables into objects or into single or aggregate nodes, or simply to list the modifiers during the qualitative stages. One key assumption for appropriate inferences in the Bayesian network (quantitative portion) is that modifiers that are causal factors for more than one node are included (Borsuk 2008). In quantitative stages, some modifiers may be removed from consideration and their influence subsumed in the probabilistic relationships among the system variables, provided this does not violate the previous rule (Borsuk 2008). Modifiers may also be variables that could be intervened upon to influence the intermediate or other nodes in the exposure system, and they may be refined in later stages. Tighe et al. (2013) recommend examining the modifiers closely for increased accuracy when higher probabilities of negative outcomes for endpoints or intermediate nodes are indicated.
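Two of the rules above can be checked mechanically on a draft CBN: the recommended limit of three or fewer parents per child (Marcot et al. 2006), and the rule that modifiers acting as causal factors for more than one node should be retained (Borsuk 2008). The sketch below flags both. The arc list, including the single-child "Recharge" modifier, is a hypothetical illustration rather than the case-study network, and "subsumable" only means a candidate for removal, not a recommendation.

```python
from collections import defaultdict


def review_structure(arcs, modifiers, max_parents=3):
    """Return (children with too many parents, modifiers to keep, modifiers that could be subsumed)."""
    parents, children = defaultdict(set), defaultdict(set)
    for parent, child in arcs:
        parents[child].add(parent)
        children[parent].add(child)
    too_many = {n: sorted(p) for n, p in parents.items() if len(p) > max_parents}
    # A modifier that is a common cause of two or more nodes must stay in the network.
    keep = sorted(m for m in modifiers if len(children[m]) > 1)
    # Single-child modifiers are candidates for being folded into their child's CPT.
    subsumable = sorted(m for m in modifiers if len(children[m]) <= 1)
    return too_many, keep, subsumable


arcs = [
    ("CGW UO", "CGW offsite flux UO"),
    ("DNAPL UO", "DNAPL offsite flux UO"),
    ("HG1", "CGW offsite flux UO"),
    ("HG1", "DNAPL offsite flux UO"),
    ("Recharge", "CGW UO"),  # hypothetical single-child modifier
]
print(review_structure(arcs, modifiers={"HG1", "Recharge"}))
# ({}, ['HG1'], ['Recharge'])
```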

Intervention nodes (light blue squares) are also outside of the exposure pathway system between the source and endpoints but can facilitate or hinder fate and transport. Interventions are actions that may or may not be taken; they are not variables representing the states of a system. Unlike modifiers, interventions are under the complete control of the decision maker(s) (Carriger et al. 2018). Because they represent decisions, intervention nodes have additional characteristics, including intentionality and whether the intervention is overwhelming (Korb et al. 2009). An overwhelming intervention removes the influence of the targeted node's other parents. If the intervention is not overwhelming, the state of the intervened node will remain uncertain, and its other parents may still influence it.
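One simple way to express the difference between overwhelming and non-overwhelming interventions, assuming the intervention is added as an extra parent of the targeted node, is to mix the node's "natural" distribution with the distribution the intervention aims to impose. The mixture form and the `effectiveness` parameter below are illustrative assumptions, not a formulation prescribed by Korb et al. (2009). With `effectiveness=1.0` the other parents no longer matter (overwhelming), while smaller values leave residual dependence on them.

```python
def intervened_distribution(natural, target, active, effectiveness=1.0):
    """Distribution of a node that has an intervention as an extra parent.

    natural: P(node | its other parents) for one parent configuration.
    target: distribution the intervention tries to impose when active.
    effectiveness: 1.0 = overwhelming (other parents are cut off);
                   values in (0, 1) = non-overwhelming (other parents still matter).
    """
    if not active:
        return dict(natural)
    return {s: effectiveness * target[s] + (1 - effectiveness) * natural[s] for s in natural}


# Hypothetical states of "DNAPL FR" under two configurations of its other parents.
natural_high = {"present": 0.9, "absent": 0.1}
natural_low = {"present": 0.2, "absent": 0.8}
target = {"present": 0.0, "absent": 1.0}  # what a recovery system aims for

for eff in (1.0, 0.8):
    print(eff, [round(intervened_distribution(d, target, True, eff)["present"], 2)
                for d in (natural_high, natural_low)])
# 1.0 [0.0, 0.0]    overwhelming: parents no longer matter
# 0.8 [0.18, 0.04]  non-overwhelming: residual dependence on parents
```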

The purpose or context of the CBN will affect the nodes and relationships identified above. In the initial steps, the model bounds, scope, assumptions, objectives, and prior knowledge will help guide the initial features or nodes that are defined and selected (Jakeman et al. 2006). In an ecological risk assessment, much of this information would be gathered during the problem formulation phase (EPA 1998). Developing the CBN will be an iterative process as knowledge is obtained, so model features and needs should be revisited throughout an assessment (EPA 1998; Jakeman et al. 2006). Use of the CBN for adaptive management or quantitative analysis can also play an important role (Nyberg et al. 2006). Uncertainties about the variables (nodes) may be resolved, and simplifications may help remove insignificant modifiers or intermediate nodes (Chen and Pollino 2012). In addition, nodes and arcs may be used to represent different viewpoints on relationships arising from uncertain factors, such as stressor behavior and site conditions, before knowledge is gathered. Differences can be represented by indicating where a relationship may be hypothesized between two nodes or, as recommended by EPA (1998), by combining or constructing multiple conceptual site models. Formal review by domain experts may help reduce structural uncertainty (Chen and Pollino 2012).

Case study

A generic groundwater DNAPL CBN was developed (Fig. 2) based on the following assumed conditions. The source is represented as a historic and ongoing DNAPL release to a site. The site geology consists of unconsolidated overburden (UO) underlain by fractured rock (FR). Subsurface characterization has confirmed mobile DNAPL in both formations. Site decision-makers have placed an initial priority on containment to prevent offsite contaminant migration. Consequently, the CBN endpoints are related to offsite flux in the groundwater system (green nodes). The key modifiers for the mobility of DNAPL and contaminated groundwater are the hydraulic gradients of the system (yellow nodes). Causal pathways in exposure routes are found by tracing the arcs (arrows) in the forward direction from the source to the endpoint. All CBN structures and analyses from this point forward were conducted in BayesiaLab 10.2 (Bayesia S.A.S. 2023). The causal pathways from the source to the four endpoints are shown as bold blue arcs in Fig. 2. Fig. 2a highlights one causal pathway, with three intermediate nodes in the exposure pathway between the surface release and the offsite flux of contaminated groundwater from the FR layer, while Fig. 2b shows in bold all arcs in the causal pathways between the surface DNAPL release and each of the offsite flux endpoints.
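Tracing causal pathways amounts to enumerating every directed path from the source node to each endpoint. The sketch below does this by depth-first search on an arc list that is only a hypothetical approximation of the Fig. 2 network; the node names (e.g., `CGW_FR`) and the exact arcs, including where the hydraulic gradient modifiers `HG1` and `HG2` attach, are assumptions.

```python
from collections import defaultdict

# Hypothetical arc list loosely following the case-study CBN (not the published figure).
ARCS = [
    ("Release", "DNAPL_UO"),
    ("DNAPL_UO", "CGW_UO"), ("DNAPL_UO", "DNAPL_FR"),
    ("CGW_UO", "CGW_FR"), ("DNAPL_FR", "CGW_FR"),
    ("DNAPL_UO", "DNAPL_offsite_UO"), ("CGW_UO", "CGW_offsite_UO"),
    ("DNAPL_FR", "DNAPL_offsite_FR"), ("CGW_FR", "CGW_offsite_FR"),
    ("HG1", "DNAPL_offsite_UO"), ("HG1", "CGW_offsite_UO"),  # modifiers, off the pathway
    ("HG2", "DNAPL_offsite_FR"), ("HG2", "CGW_offsite_FR"),
]
ENDPOINTS = ["DNAPL_offsite_UO", "CGW_offsite_UO", "DNAPL_offsite_FR", "CGW_offsite_FR"]


def causal_paths(arcs, source, target):
    """All directed paths source -> target, following arcs forward (graph assumed acyclic)."""
    children = defaultdict(list)
    for parent, child in arcs:
        children[parent].append(child)
    paths = []

    def walk(path):
        node = path[-1]
        if node == target:
            paths.append(path)
            return
        for nxt in children[node]:
            walk(path + [nxt])

    walk([source])
    return paths


for end in ENDPOINTS:
    for p in causal_paths(ARCS, "Release", end):
        print(len(p) - 1, " -> ".join(p))  # path length (number of arcs) and the path
```

With this hypothetical structure the search finds five causal pathways in total, two of which (each of length 4) reach groundwater offsite flux in the FR. The modifiers `HG1` and `HG2` never appear, because no directed path runs from the source through them.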

Fig. 2

Causal pathway analysis from source (historic release, red node) to endpoints (offsite flux, green nodes) for a hypothetical conceptual Bayesian network for assessing offsite flux from DNAPL. a One causal pathway of length 4 from the surface release source to groundwater offsite flux in the fractured rock layer; and b all causal pathways together from surface release to multiple offsite flux endpoints. Arcs in causal pathways are bolded blue. CGW = contaminated groundwater; FR = fractured rock; HG = hydraulic gradient; UO = unconsolidated overburden

A CSM may be illustrated graphically to depict more details of the contamination sources and transport pathways, as well as the site characteristics that influence fate and transport. Often, the basis for this depiction is a longitudinal geologic cross-section with the different hydrogeologic units identified (Fig. 3a, adapted from Fig. 1 in Kueper and Davies 2009). This CSM representation is hereafter called a pictorial CSM. Several features are normally apparent in a pictorial subsurface CSM (Lerner et al. 2003). These include characteristics of the subsurface such as fractures and permeable zones, site stratigraphy, the dispersal characteristics of DNAPL (including continuous and residual distributions), the presence of horizontal and vertical pathways, groundwater flow and gradients, and likely partitioning among various media. From these features, the CSM can be used to identify influential variables, linkages, and transport parameters critical to the assessment of the subsurface environment. In addition, existing quantitative and monitoring work can be used to identify and prioritize stressor pathways in a pictorial CSM, as demonstrated by Bartolo et al. (2017). The information represented in the pictorial CSM can then be used to construct quantitative simulation models and to target data collection efforts that validate and improve conceptual and quantitative understanding.

Fig. 3

Pictorial conceptual site model for the case study system and contaminant behavior shown a alone and b combined with the case study CBN. Pictorial model was adapted from Kueper and Davies (2009). CGW = contaminated groundwater; FR = fractured rock; HG = hydraulic gradient; UO = unconsolidated overburden

While a pictorial representation of the CSM is more common, a CBN can be used in conjunction with a pictorial CSM to capture the key variables involved in the transport pathway, such as source terms, pathway concentrations, and phase transfers, that will be used to inform quantitative modeling and reasoning about the site characteristics (Fig. 3b). There are multiple benefits to combining the CBN with the pictorial representation. In particular, the CBN can be a useful bridge between the development of the CSM and the quantitative model by summarizing the key causal features affecting the fate and transport mechanisms and the contaminant pathways and phases (ANZG 2018). Causality and uncertainty (through the inclusion of random variables) are naturally considered in a CBN and form the basis of the relationships between variables in a mechanistic model (Carriger and Parker 2021). A CBN can also be quantified and used to capture the outcomes of ensemble modeling, representing them as probability distributions and enabling fast inference on scenarios. Liedloff and Smith (2010) demonstrated this benefit of Bayesian networks by running scenarios on simulation model output with Bayesian updating.
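As a rough illustration of quantifying one CBN relationship from ensemble model output, the sketch below estimates a conditional probability table by counting discretized simulation runs, with a small pseudocount so that unobserved state combinations do not receive zero probability. The run data, state labels, and the `cpt_from_runs` helper are hypothetical, and counting with additive (Laplace) smoothing is only one of several possible estimation choices; it is not presented as the procedure used by Liedloff and Smith (2010).

```python
from collections import Counter


def cpt_from_runs(runs, parent_states, child_states, pseudocount=1.0):
    """Estimate P(child | parent) from (parent_state, child_state) pairs with additive smoothing."""
    counts = Counter(runs)
    cpt = {}
    for ps in parent_states:
        total = sum(counts[(ps, cs)] for cs in child_states) + pseudocount * len(child_states)
        cpt[ps] = {cs: (counts[(ps, cs)] + pseudocount) / total for cs in child_states}
    return cpt


# Hypothetical discretized ensemble output: (HG2 state, CGW offsite flux FR state) per run.
runs = ([("high", "high")] * 6 + [("high", "low")] * 2
        + [("low", "high")] * 1 + [("low", "low")] * 7)
cpt = cpt_from_runs(runs, parent_states=("low", "high"), child_states=("low", "high"))
print(cpt["high"])  # {'low': 0.3, 'high': 0.7}
print(cpt["low"])   # {'low': 0.8, 'high': 0.2}
```

Once such tables are filled in for every node, scenario queries become ordinary Bayesian network inference rather than new simulation runs.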

The pictorial CSM can be a useful aid in risk communication for describing the system, the uncertainties, and the assumptions behind the quantitative predictions (ANZG 2018). Most audiences, with or without quantitative backgrounds, will understand a pictorial CSM more easily and quickly than a CBN (DEHP 2012). The important causal features can still be described by the CBN and clarified further with a pictorial CSM. Textual information can accompany the CBN to support communication of the problem domain, and it can be developed with multiple parties to combine knowledge sources (EPA Victoria 2023). However, the CSM should provide visual inference support capabilities that go beyond textual descriptions (Bresciani et al. 2008; DEHP 2012). The choice of CSM(s) should consider the risk communication goals along with the understanding of system processes that is needed (Gross 2003).

Extensions of the conceptual Bayesian network

Representing interventions

Interventions can be developed with the CBN and represented as decision nodes (Carriger and Parker 2021). With this CBN, several interventions may be proposed to prevent offsite flux of DNAPL and contaminated groundwater at the hypothetical site (Fig. 4), and multiple interventions may be represented in the network. The first intervention considered is a barrier wall that blocks the offsite movement of DNAPL and contaminated groundwater from the UO (Fig. 4a). This intervention still leaves potential offsite flux from the FR layer below. To prevent the movement of DNAPL and contaminated groundwater from the FR layer, underneath the barrier wall, a dual-phase recovery system is proposed (Fig. 4b).

Fig. 4

Conceptual Bayesian network depicting interventions (blue boxes) to break causal pathways from initial and intermediate sources to offsite flux endpoints with a one intervention that breaks the causal pathways between the source emissions and the unconsolidated overburden offsite flux and b two interventions that break all causal pathways to offsite flux. Arcs in causal pathways are bolded blue. CGW = contaminated groundwater; FR = fractured rock; HG = hydraulic gradient; UO = unconsolidated overburden

A key feature of the CBN is the ability to explicitly indicate the nodes affected by the interventions, including both the targeted nodes and nodes that may be affected unintentionally (Korb et al. 2009). The dual-phase recovery system may reduce or remove the FR DNAPL and contaminated groundwater but could also influence the hydraulic gradient in the FR layer. This connection is shown as a cause-effect relationship between the intervention and the hydraulic gradient. Together, the two interventions should interrupt all exposure pathways from the sources to the endpoints in this hypothetical example.
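If both interventions are treated as overwhelming, their effect on causal pathways can be checked by "graph surgery": deleting every arc into each targeted node and then counting directed paths from the source. The arc list and the choice of targeted nodes (barrier wall on the two UO offsite flux nodes; dual-phase recovery on DNAPL and contaminated groundwater in the FR) are hypothetical readings of Fig. 4. The recovery system's side effect on `HG2` is kept as an ordinary arc, because it does not lie on any source-to-endpoint pathway.

```python
from collections import defaultdict

# Hypothetical arc list loosely following the case-study CBN (not the published figure).
ARCS = [
    ("Release", "DNAPL_UO"),
    ("DNAPL_UO", "CGW_UO"), ("DNAPL_UO", "DNAPL_FR"),
    ("CGW_UO", "CGW_FR"), ("DNAPL_FR", "CGW_FR"),
    ("DNAPL_UO", "DNAPL_offsite_UO"), ("CGW_UO", "CGW_offsite_UO"),
    ("DNAPL_FR", "DNAPL_offsite_FR"), ("CGW_FR", "CGW_offsite_FR"),
    ("HG1", "DNAPL_offsite_UO"), ("HG1", "CGW_offsite_UO"),
    ("HG2", "DNAPL_offsite_FR"), ("HG2", "CGW_offsite_FR"),
]
ENDPOINTS = ["DNAPL_offsite_UO", "CGW_offsite_UO", "DNAPL_offsite_FR", "CGW_offsite_FR"]


def count_causal_paths(arcs, source, target):
    """Number of directed paths source -> target (graph assumed acyclic)."""
    children = defaultdict(list)
    for parent, child in arcs:
        children[parent].append(child)

    def count(node):
        if node == target:
            return 1
        return sum(count(nxt) for nxt in children[node])

    return count(source)


def apply_interventions(arcs, interventions):
    """Overwhelming interventions: cut all arcs into each target, then add intervention -> target arcs."""
    targets = {t for ts in interventions.values() for t in ts}
    kept = [(p, c) for p, c in arcs if c not in targets]
    added = [(name, t) for name, ts in interventions.items() for t in ts]
    return kept + added


def pathway_counts(arcs):
    return {e: count_causal_paths(arcs, "Release", e) for e in ENDPOINTS}


barrier = {"Barrier wall": ["DNAPL_offsite_UO", "CGW_offsite_UO"]}
both = {**barrier, "Dual-phase recovery": ["DNAPL_FR", "CGW_FR"]}
scenarios = {
    "no intervention": ARCS,
    "barrier wall": apply_interventions(ARCS, barrier),
    # Side effect on HG2 kept as an ordinary (non-overwhelming) arc.
    "barrier wall + dual-phase": apply_interventions(ARCS, both) + [("Dual-phase recovery", "HG2")],
}
for label, arcs in scenarios.items():
    print(label, pathway_counts(arcs))
```

Under these assumptions the barrier wall alone leaves the three FR pathways intact, and adding the dual-phase recovery system drives every endpoint's pathway count to zero.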

Weight of evidence analysis

The states of some variables are directly observable, while others might require one or more sources of evidence to infer their changes. We consider the former to be evidence variables and the latter to be hypothesis variables. Evidence variables are especially important when simulation models, used to establish prior understanding of the functional relationships in the CBN, are coupled with updates from field observations. The hypothesis variables could also be considered and evaluated as hidden state or latent variables (Fenton and Neil 2019). The evidence variables are used to infer the states of the latent variables, with uncertainties.

The weight of evidence reflects the totality of the diagnoses of the hypotheses from observable evidence. The outcome of evidence collection to evaluate hypotheses can be viewed as a causal process in which the observed measurement depends on the true underlying hypothesis. This notion underlies the hypothesis-evidence model (Fenton and Neil 2011). In the CBN, lines of evidence can be added as evidence nodes that are direct children of the hypothesis nodes, each with an incoming arc from its hypothesis (Fig. 5). As demonstrated in Fenton et al. (2013), evidence nodes can also be related to one another to represent dependencies among the lines of evidence, as shown by the dashed line potentially connecting evidence nodes in Fig. 5. These evidence nodes are particularly useful in a weight-of-evidence process where probabilities cannot be reliably estimated for one or more individual nodes in the network. Evidence nodes can be used for source terms, exposure pathways, remediation outcomes, or endpoint nodes. Evidence on a node can be hard (certain) or soft (uncertain). To better examine uncertain evidence, further extensions can represent the accuracy of candidate measurement procedures within a Bayesian network structure, as described in Fenton et al. (2013).
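A minimal numerical version of the hypothesis-evidence idea, assuming a binary hypothesis, hard evidence, and two lines of evidence that are conditionally independent given the hypothesis (i.e., no dashed link between them in Fig. 5): each observed result multiplies the hypothesis probability by its likelihood, and the result is renormalized. The prior, sensitivities, and false-positive rates below are invented for illustration only.

```python
def posterior(prior, observations):
    """P(H | observations) for a binary hypothesis H.

    observations: list of (P(result | H), P(result | not H)) pairs, one per observed
    evidence node, assumed conditionally independent given H.
    """
    p_h, p_not_h = prior, 1.0 - prior
    for like_h, like_not_h in observations:
        p_h *= like_h
        p_not_h *= like_not_h
    return p_h / (p_h + p_not_h)


# Hypothetical hypothesis "DNAPL present in the UO" with prior probability 0.3.
obs_a_positive = (0.8, 0.10)  # line of evidence A: sensitivity 0.8, false-positive rate 0.10
obs_b_positive = (0.6, 0.05)  # line of evidence B: sensitivity 0.6, false-positive rate 0.05
obs_b_negative = (1 - 0.6, 1 - 0.05)  # B observed negative

print(round(posterior(0.3, [obs_a_positive]), 3))                  # 0.774
print(round(posterior(0.3, [obs_a_positive, obs_b_positive]), 3))  # 0.976
print(round(posterior(0.3, [obs_a_positive, obs_b_negative]), 3))  # 0.591
```

If the two lines of evidence were correlated, this product form would overstate their combined weight; that is the situation the dashed link in Fig. 5 is meant to capture.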

Fig. 5

Relationship between a hypothesis node and multiple sources of evidence represented by evidence nodes. The dashed line means a potential linkage for correlated lines of evidence while a solid line refers to a definite linkage

Inferential pathways

Just as CSM exposure pathways can be identified with causal pathway analysis, the evidence nodes can be examined for inferential pathways. Inference rests on the potential to update the probability distribution over the states of one node when another node is observed. Whereas causal pathways always travel forward along the arcs, inferential pathways can run with or against the arcs. Causal pathways might be broken by interventions (e.g., Fig. 4) that override the influence of the parent nodes on an intervened node. Observations, in contrast, do not break exposure pathways: an observation records some aspect of the system, whereas an intervention can remove the influence of parent and predecessor nodes in the model.

Several key rules can be used to examine the propagation of evidence in a model. As a coarse introduction, an observation on an evidence node will first update the probability distribution of the hypothesis node to which it is attached (the first step in the inferential pathway). This change will in turn update the probability distributions of the nodes directly attached to the hypothesis node (second step), and so on until the more distant nodes of the network are reached. In practice, however, not all node distributions will be updated by a single observation, because some inferential pathways are blocked. How evidence propagates from an observation on an evidence node is determined through the d-separation criterion, which translates statistical conditional independence into graphical terms (Geiger et al. 1990). The principles of d-separation and d-connection can be illustrated with the basic three-node connections (i.e., serial, diverging, converging) and the effect of hard evidence on the middle node for the other two (Fig. 6; Neapolitan 2009). Essentially, two nodes are d-separated by an intermediate node if they form a serial or diverging connection with hard evidence on the intermediate node, or a converging connection with no evidence on the intermediate node or any of its descendants (Fenton and Neil 2019). For soft (probabilistic) evidence on the intermediate node that leaves its distribution unfixed, the two outer nodes in a serial or diverging connection are still not d-separated (Conrady et al. 2014). In a converging connection, however, soft evidence on the intermediate node (e.g., node ‘B6’ in Fig. 6f) d-connects the outer nodes whether or not it freezes the intermediate node's distribution. The graph may therefore be used to identify whether an inferential pathway in the CBN is blocked (d-separated) or opened (d-connected) by evidence on another node or set of nodes.
For the current case, updating a probability distribution from observing a line of evidence may or may not provide additional information on other variables of interest in a CSM, and this potential for updating probabilities can be better understood by applying d-separation to the CBN. Moreover, observing multiple lines of evidence might provide additional inferences, or block inferences that would have been found from a single line of evidence alone; this can also be determined from the d-separation and d-connection criteria.
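The d-separation test can be automated. The sketch below uses a standard criterion known to be equivalent to d-separation, the moralized ancestral graph test: keep only the queried nodes and their ancestors, "marry" parents that share a child, drop arc directions, delete the observed nodes, and check whether the two node sets are still connected. It handles hard evidence only and reproduces the three-node cases of Fig. 6; it is an illustrative sketch, not the BayesiaLab implementation.

```python
from collections import defaultdict, deque


def d_separated(arcs, xs, ys, zs):
    """True if node sets xs and ys are d-separated given hard evidence on zs.

    Uses the moralized ancestral graph criterion, which is equivalent to d-separation.
    """
    parents = defaultdict(set)
    for p, c in arcs:
        parents[c].add(p)

    # 1. Keep only the queried/observed nodes and all of their ancestors.
    relevant = set(xs) | set(ys) | set(zs)
    stack = list(relevant)
    while stack:
        for p in parents[stack.pop()]:
            if p not in relevant:
                relevant.add(p)
                stack.append(p)

    # 2. Moralize: link each node to its parents, marry co-parents, drop directions.
    adj = defaultdict(set)
    for node in relevant:
        ps = sorted(parents[node])
        for i, p in enumerate(ps):
            adj[p].add(node)
            adj[node].add(p)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # 3. Delete observed nodes and test whether any x can still reach any y.
    blocked = set(zs)
    queue = deque(x for x in xs if x not in blocked)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node in ys:
            return False
        for nxt in adj[node] - seen - blocked:
            seen.add(nxt)
            queue.append(nxt)
    return True


serial = [("A", "B"), ("B", "C")]      # A -> B -> C
diverging = [("B", "A"), ("B", "C")]   # A <- B -> C
converging = [("A", "B"), ("C", "B")]  # A -> B <- C
for name, arcs in (("serial", serial), ("diverging", diverging), ("converging", converging)):
    print(name, d_separated(arcs, {"A"}, {"C"}, set()), d_separated(arcs, {"A"}, {"C"}, {"B"}))
# serial False True
# diverging False True
# converging True False
```

Observing a descendant of the converging node B also d-connects A and C in this test, matching the rule stated above.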

Fig. 6

d-Separation and d-connection examples for three-node model structures (a and b; c and d; e and f). Orange nodes indicate that the node is observed. A1 is d-connected to C1 in a; however, when B2 is observed in b, A2 is d-separated from C2. A3 is d-connected to C3 in c; however, when B4 is observed in d, A4 is d-separated from C4. A5 is d-separated from C5 in e; however, when B6 is observed in f, A6 is d-connected to C6. Figure adapted from Sinha (2016)

A weight-of-evidence approach is especially useful for subsurface evaluations. For example, Kueper and Davies (2009) present a weight-of-evidence approach for evaluating the presence or absence of DNAPL. The CBN from the case study was augmented to include evidence nodes for the nodes in the subsurface pathways and for the offsite flux endpoints (Fig. 7). The evidence nodes, colored gray, represent measurements taken to infer the values of their parent nodes. In this case, the DNAPL and contaminated groundwater concentrations are observed or measured from samples. Offsite flux is detected in downgradient wells, and the analyses do not clearly differentiate between DNAPL and contaminated groundwater. This situation is represented in the CBN by a single well detection evidence node in each layer, connected to both the contaminated groundwater and the DNAPL offsite flux nodes for that layer (Fig. 7).

Fig. 7

Groundwater DNAPL assessment conceptual Bayesian network with observation nodes (gray) for conceptual model components. Single DNAPL observation with two delineated pathways in a and b for inferences on CGW offsite flux FR from observing DNAPL obs 1. Arcs in inference pathways are bolded pink. CGW = contaminated groundwater; FR = fractured rock; HG = hydraulic gradient; obs = observation; UO = unconsolidated overburden

Observing evidence updates the probability distributions (recall that an inferential pathway can go against the direction of the arcs). As an example, the pink arcs in Fig. 7a, b show the potential implications of observing DNAPL in the UO for understanding the offsite migration of contaminated groundwater in the FR layer. One inferential pathway between DNAPL obs1 and CGW offsite flux FR, passing through DNAPL in the FR, is shown in bold pink arcs in Fig. 7a. The second and only other inferential pathway passes through contaminated groundwater in the UO rather than DNAPL in the FR (Fig. 7b). Both inferential pathways are open when DNAPL obs1 is observed, and if this is the only observation made on an evidence node, the probability distributions of all variables in the two pathways can change. In other words, a DNAPL observation in the UO may provide information on the contaminated groundwater offsite flux in the FR as well as on multiple intermediate variables in both layers of the subsurface. If additional observations are made, probability distributions are updated and the inferences on different variables through the possible pathways may be better quantified.

Conditioning on additional nodes can open or block inferential pathways between variables. In one scenario, a monitoring well (Well detection2) is observed before DNAPL obs1 (Fig. 8a). This prior observation opens seven new inferential pathways between DNAPL obs1 and CGW offsite flux FR, while the two pathways identified without the well detection observation (Fig. 7) remain active. In a second scenario, Well detection1 is also observed, which opens seventeen new inferential pathways between DNAPL obs1 and CGW offsite flux FR beyond the original two; the pathways in Fig. 8a also remain active (Fig. 8b). In fact, with these two prior observations, all relationships in the CBN are potentially active between DNAPL obs1 and CGW offsite flux FR, except for the arcs originating from the source and from evidence variables that have not been observed. For example, pathways between DNAPL obs1 and CGW offsite flux FR can now run through HG2, because the downstream observation on Well detection2 opens the collider relationships with DNAPL FR and CGW FR. This scenario indicates potential synergies between these measurements for reducing uncertainty on variables.
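Inferential pathways can be enumerated in the same spirit: list every simple path between two nodes while ignoring arc direction, and keep those that are active, meaning every collider on the path is observed or has an observed descendant and no non-collider on the path is observed. The arc list below is a hypothetical approximation of the Fig. 7 and Fig. 8 network, including the assumption that each well detection node is a child of both offsite flux nodes in its layer. The numbers of opened pathways therefore need not match the seven and seventeen reported above. The sketch only illustrates that observing `Well_detection2` opens additional pathways while leaving the original two active.

```python
from collections import defaultdict

# Hypothetical arc list loosely following Fig. 7 and Fig. 8 (not the published network).
ARCS = [
    ("Release", "DNAPL_UO"),
    ("DNAPL_UO", "CGW_UO"), ("DNAPL_UO", "DNAPL_FR"),
    ("CGW_UO", "CGW_FR"), ("DNAPL_FR", "CGW_FR"),
    ("DNAPL_UO", "DNAPL_offsite_UO"), ("CGW_UO", "CGW_offsite_UO"),
    ("DNAPL_FR", "DNAPL_offsite_FR"), ("CGW_FR", "CGW_offsite_FR"),
    ("HG1", "DNAPL_offsite_UO"), ("HG1", "CGW_offsite_UO"),
    ("HG2", "DNAPL_offsite_FR"), ("HG2", "CGW_offsite_FR"),
    ("DNAPL_UO", "DNAPL_obs1"),  # evidence node on DNAPL in the UO
    ("DNAPL_offsite_UO", "Well_detection1"), ("CGW_offsite_UO", "Well_detection1"),
    ("DNAPL_offsite_FR", "Well_detection2"), ("CGW_offsite_FR", "Well_detection2"),
]


def active_paths(arcs, start, end, observed):
    """All simple paths start ... end (ignoring arc direction) that are active given `observed`."""
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in arcs:
        parents[c].add(p)
        children[p].add(c)

    def has_observed_descendant(node):
        stack, seen = [node], set()
        while stack:
            for c in children[stack.pop()]:
                if c in observed:
                    return True
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return False

    paths = []

    def extend(path):
        node = path[-1]
        if node == end:
            paths.append(path)
            return
        for nxt in parents[node] | children[node]:
            if nxt in path:
                continue  # simple paths only
            if len(path) >= 2:  # `node` is interior: test the triple prev - node - nxt
                prev = path[-2]
                collider = prev in parents[node] and nxt in parents[node]
                if collider and node not in observed and not has_observed_descendant(node):
                    continue  # an unobserved collider blocks the path
                if not collider and node in observed:
                    continue  # an observed chain or fork node blocks the path
            extend(path + [nxt])

    extend([start])
    return paths


base = active_paths(ARCS, "DNAPL_obs1", "CGW_offsite_FR", observed=set())
with_well2 = active_paths(ARCS, "DNAPL_obs1", "CGW_offsite_FR", observed={"Well_detection2"})
print(len(base), len(with_well2))
print(all(p in with_well2 for p in base))  # True: the original pathways stay open
```

With no prior observation this hypothetical structure yields exactly two pathways, one through DNAPL in the FR and one through contaminated groundwater in the UO; observing `Well_detection2` adds pathways such as the one running through `HG2`.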

Fig. 8

Impact of additional observations on inference pathways to CGW offsite flux FR from observing DNAPL obs 1 for a single DNAPL observation after observing a Well detection2 for fractured rock layer and b Well detection1 for unconsolidated overburden and Well detection2 for fractured rock layers. Arcs in inference pathways are bolded pink. When observed, Well detection1 and Well detection2 contain highlighted orange outer rings. CGW = contaminated groundwater; FR = fractured rock; HG = hydraulic gradient; obs = observation; UO = unconsolidated overburden

When multiple measurements are taken, some variables may provide more information on the exposure pathways than they would if measured alone. However, in some cases, combining measurements can be antagonistic, closing pathways rather than opening them and d-separating nodes that were once connected. For example, if the contaminated groundwater and DNAPL concentrations in the FR are known with certainty, additional measurements in the UO would not inform the offsite flux in the FR. Thus, according to the dependencies mapped by the CBN, additional sampling in the upper layer of the subsurface would not be informative for predicting FR offsite flux endpoints. This measurement combination can also be viewed as blocking information in the upper layer from reaching offsite flux predictions for the lower layer. Inferences based on measurement combinations may be intuited, but a properly developed CBN provides confirmation and a formal tool for detecting and explaining when and why these situations occur. Measurement scenarios can be further examined for additional interactive and antagonistic effects. The structural accuracy of the site relationships is key to a qualitative understanding of the potential inferences that can occur.
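Once probabilities are attached, the antagonistic case can also be confirmed numerically. The sketch below uses a hypothetical reduced graph loosely based on Fig. 7 (DNAPL UO as the common parent of the UO observation, DNAPL FR, and CGW UO; CGW FR as the only parent of the FR offsite flux) with made-up conditional probabilities, and it computes posteriors by brute-force enumeration of the joint distribution. Both the graph and the numbers are assumptions for illustration only.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers only) for the reduced graph:
# DNAPL UO -> {DNAPL obs1, DNAPL FR, CGW UO}; {CGW UO, DNAPL FR} -> CGW FR -> flux.
P_DUO = 0.3                                   # P(DNAPL UO present)
P_OBS = {1: 0.8, 0: 0.1}                      # P(DNAPL obs1 positive | DNAPL UO)
P_DFR = {1: 0.6, 0: 0.05}                     # P(DNAPL FR present | DNAPL UO)
P_CUO = {1: 0.9, 0: 0.2}                      # P(CGW UO exceeds | DNAPL UO)
P_CFR = {(0, 0): 0.05, (0, 1): 0.7,           # P(CGW FR exceeds | CGW UO, DNAPL FR)
         (1, 0): 0.5, (1, 1): 0.95}
P_FLUX = {1: 0.7, 0: 0.02}                    # P(offsite flux FR exceeds | CGW FR)
NAMES = ("duo", "obs", "dfr", "cuo", "cfr", "flux")


def bern(p, value):
    """Probability of a binary outcome, given P(value = 1) = p."""
    return p if value else 1.0 - p


def joint(duo, obs, dfr, cuo, cfr, flux):
    """Chain-rule factorization of the joint distribution over the DAG."""
    return (bern(P_DUO, duo) * bern(P_OBS[duo], obs) * bern(P_DFR[duo], dfr)
            * bern(P_CUO[duo], cuo) * bern(P_CFR[(cuo, dfr)], cfr)
            * bern(P_FLUX[cfr], flux))


def prob(query, evidence):
    """P(query | evidence) by enumerating all 2**6 joint states."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        state = dict(zip(NAMES, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(**state)
        den += p
        if all(state[k] == v for k, v in query.items()):
            num += p
    return num / den


# Alone, the UO observation shifts the FR flux prediction ...
print(prob({"flux": 1}, {"obs": 1}), prob({"flux": 1}, {"obs": 0}))
# ... but once CGW FR and DNAPL FR are known, it no longer does.
print(prob({"flux": 1}, {"cfr": 1, "dfr": 1, "obs": 1}),
      prob({"flux": 1}, {"cfr": 1, "dfr": 1, "obs": 0}))
```

The second line prints two identical values equal to P_FLUX[1] (0.7, up to floating-point rounding), because a known CGW FR screens the flux off from everything upstream. In this toy graph, knowing CGW FR alone is enough for the blocking, since it is the flux node's only parent.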

Discussion

The CBN concept was demonstrated and explored for subsurface assessment and remediation applications. A CBN is a useful tool for DNAPL site management because it explicitly includes modifying variables, measurements in the subsurface, remediation interventions, and causal and inferential pathways. The current paper extended the CBN concept initially described in Carriger and Parker (2021) to a new problem domain: offsite contaminant migration via groundwater and DNAPL. Developing a CBN with measurements and proposed remediation alternatives can benefit the evaluation of scenarios established in problem formulation and used for characterizing risks. Key Bayesian network structural assumptions such as d-separation, the criteria for determining variable independencies in a graph (Pearl 1988), are helpful for examining causal exposure pathways with the CBN, especially as complexity and the number of potential pathways increase (Carriger and Parker 2021). A clear benefit of using a CBN is communicating the remediation alternatives under consideration and how they are anticipated to perform (underwhelming vs. overwhelming in breaking a pathway), either by themselves or as components of a treatment train.

The information that goes into building a CBN is often available for contaminated site assessments that follow the Superfund risk assessment process or similar assessment procedures. The CBN does not supplant existing risk assessment methods. Rather, it can augment the assessment process and provide information beyond what is found in typical CSMs. As such, CBNs are powerful knowledge representation tools. The authors' experience bears this out: one author condensed the contents of multiple lengthy site assessment reports into one simple CBN. The CBN can be developed to capture literature reviews, workshop discussions, formal expert elicitation, multiple model types, and the synthesis of different sources such as scientific and local knowledge (Gregory et al. 2012; Carriger et al. 2018).

On its own, the CBN can be a powerful communication and reasoning tool for mitigating the risks from environmental stressors and measuring the effectiveness of remedies. In conjunction with a pictorial CSM, the information in a CBN can clarify the exposure scenarios. A pictorial CSM can be helpful for orienting and displaying site features (ANZG 2018). The CBN can be overlaid on the pictorial CSM, or it can be used separately to convey different types of information about a site. Pictorial CSMs are generally less informative than CBNs regarding fate and transport pathways and require modifications to indicate the magnitudes of uncertainties. CBNs can be augmented with probabilities to represent the strengths of the relationships, and they can also convey qualitative uncertainties by marking directed relationships as hypothesized or by indicating the magnitudes of the relationships. The modular structure of a CBN also permits easy structural modifications for iterative model development as new knowledge becomes available, such as in an adaptive management process (Chen and Pollino 2012).

Although the components of the CBN can accommodate many contaminated site problems, complexity can be an issue when using the CBN for communication purposes. Model representations of real-world processes often balance complexity against relevant knowledge and data (Ferre 2020; Conrady and Jouffe 2015). Castilla-Rho (2017) describes how groundwater systems meet the criteria of a complex system, necessitating interactive and iterative approaches to model development. Identifying the hypothesized exposure pathways with a CBN can support these flexible modeling processes by rigorously capturing the key features and uncertainties of the relationships between source terms and receptors. The use of nodes that represent probability distributions also reduces the need for additional mechanistic detail while still capturing key uncertain variables (Reckhow 1999). CBNs condense layers of information on potential events and influencing factors, and the outcomes within each node remain uncertain even with a fully specified and certain CBN structure for a well-understood system.

As discussed in Carriger and Parker (2021), probabilities are recommended for use with the CBN to represent uncertainty quantitatively. This makes the CBN platform useful for setting up detailed risk models for contaminated sites. Moreover, CBNs are built on the same assumptions of conditional independence used in probabilistic analysis for causal and inferential purposes, which makes them readily translatable into quantitative models. Probabilities can help express missing knowledge of site risks and remediation as coherent uncertainties. The CBN provides a suitable structure for formulating these probabilities through the conditional probability tables common to Bayesian networks (Tighe et al. 2013). The quantitative portion of the Bayesian network can readily propagate the uncertainties on one or multiple variables throughout a model for evaluating scenarios with or without measurements and the implementation of remediation actions (Ayre and Landis 2012). Future research may adapt the CBN to a Bayesian network within a complete assessment process for a real-world contaminated site application.
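The following minimal sketch shows how conditional probability tables propagate uncertainty through scenarios with and without a measurement and a remediation action. It uses a hypothetical five-node fragment: source strength (SRC) influences contaminated groundwater (CGW), which influences offsite flux (FLUX); a monitoring well detection (WELL) is a child of CGW; and a barrier remedy (BARRIER) is a second parent of FLUX. All node names and probabilities are assumptions chosen for illustration.

```python
# Hypothetical CPTs for a five-node fragment (illustrative values, not site data):
# SRC -> CGW -> FLUX <- BARRIER, and CGW -> WELL (a measurement node).
P_SRC = 0.4                                   # P(SRC = high)
P_CGW = {1: 0.8, 0: 0.1}                      # P(CGW exceeds | SRC)
P_WELL = {1: 0.9, 0: 0.05}                    # P(WELL detects | CGW)
P_FLUX = {(1, 0): 0.7, (1, 1): 0.15,          # P(FLUX exceeds | CGW, BARRIER)
          (0, 0): 0.05, (0, 1): 0.01}


def p_cgw(well=None):
    """P(CGW = 1): marginal prior, or posterior after a well result (Bayes' rule)."""
    prior = P_SRC * P_CGW[1] + (1 - P_SRC) * P_CGW[0]
    if well is None:
        return prior
    like1 = P_WELL[1] if well else 1 - P_WELL[1]   # P(result | CGW = 1)
    like0 = P_WELL[0] if well else 1 - P_WELL[0]   # P(result | CGW = 0)
    return like1 * prior / (like1 * prior + like0 * (1 - prior))


def p_flux(barrier, well=None):
    """P(FLUX = 1) for a barrier scenario, optionally given a well result.

    BARRIER has no parents here, so fixing it for a scenario is the same as
    conditioning on it; WELL informs FLUX only through CGW.
    """
    c = p_cgw(well)
    return c * P_FLUX[(1, barrier)] + (1 - c) * P_FLUX[(0, barrier)]


for barrier in (0, 1):
    for well in (None, 1, 0):
        print(f"barrier={barrier} well={well}: P(FLUX)={p_flux(barrier, well):.3f}")
```

With these assumed numbers, the prior exceedance probability is about 0.297 without the barrier and about 0.063 with it, and a positive well result raises both. In a full CBN, the same sums would be carried out over every parent configuration by the Bayesian network software rather than written out by hand.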

Uncertainties in a probabilistic model such as a Bayesian network can be broken down into two main but interdependent parts: structural and conditional uncertainty. Structural uncertainties concern the actual connections among the nodes, the types of nodes, and their definitions. When structural uncertainties are large, as in chaotic systems, they may limit the use of the CBN for causal and inferential pathway analysis (French et al. 2009; Snowden and Boone 2007). However, testing interventions in simulated or real-world applications may help probe the structural assumptions, and such testing gains importance when the CBN is used within an adaptive process (Nyberg et al. 2006). Thus, even in highly chaotic contexts, a CBN can still represent existing knowledge of the structural aspects of a problem by fostering updates within an adaptive process. Conditional probabilities are specified for the direct connections but propagate throughout the network, depending on the network structure and the presence or absence of interventions and observations, to yield marginal prior probabilities and, if evidence is entered anywhere in the network, posterior probabilities.
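The propagation described in this paragraph rests on the standard Bayesian network factorization. As a sketch in generic notation that is not used elsewhere in this paper ($X_i$ for a network variable, $\mathrm{pa}(X_i)$ for its parents, $\mathbf{e}$ for entered evidence), the joint distribution, marginal prior, posterior, and the truncated factorization commonly used to represent an intervention are:

```latex
% x_{-j}: all variables other than X_j, summed with X_j held at x_j.
% x ~ e: assignments consistent with the entered evidence e.
\[ P(X_1,\ldots,X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr) \]
\[ P(X_j = x_j) = \sum_{\mathbf{x}_{-j}} \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr) \]
\[ P(X_j = x_j \mid \mathbf{e}) = \frac{P(X_j = x_j, \mathbf{e})}{P(\mathbf{e})}
   = \frac{\sum_{\mathbf{x}_{-j} \sim \mathbf{e}} \prod_{i} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)}
          {\sum_{\mathbf{x} \sim \mathbf{e}} \prod_{i} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)} \]
\[ P\bigl(\mathbf{x} \mid \mathrm{do}(X_k = x_k')\bigr)
   = \prod_{i \neq k} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
   \quad \text{for every } \mathbf{x} \text{ with } x_k = x_k' \]
```

The last line drops the conditional probability table of the intervened node. This follows Pearl's do-operator and is one common way to represent a remediation action as distinct from an observation. Whether it matches a given CBN depends on how that network's intervention nodes are defined.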

Along with risk predictions, the quantitative phases may be useful for amending the CBN in an iterative process. Sensitivity analysis can help examine the magnitude and direction of the changes in one node when knowledge of another node is available (Hassan et al. 2022). Sensitivity analysis is commonly used in Bayesian networks for environmental risk assessments to examine the influence of a collection of nodes on endpoint(s) of concern (Kaikkonen et al. 2021). One useful tool for sensitivity analysis with a Bayesian network examines uncertainty reduction capabilities through measures such as mutual information (Nicholson and Jitnah 1998). Spence and Jordan (2013) demonstrated this in a meta-analysis model of the impacts of nitrogen removal by wetlands on ecosystem services. Sensitivity analysis can be used to investigate the need for greater structural complexity or additional investigation by adding hypothesized arc connections and examining the impacts of these relationships on target nodes (Laskey 1993). Higher-order sensitivity analysis should also be considered to examine joint sensitivity to findings on two or more nodes together (Coupé and van der Gaag 2002). Likewise, the effects of multiple interventions on endpoints should be properly examined (Cain 2001). Beyond node evaluation, Taylor et al. (2016) emphasize the importance of sensitivity analysis for examining the impacts of multiple data sources and for focusing future data collection efforts. The sufficiency of the data in an existing model might also be justified by a sensitivity analysis (Taylor et al. 2016). Similarly, when data are inaccurate, varying the conditional probabilities and checking the predictions on endpoints can help evaluate higher-order uncertainties (Cain 2001).
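The mutual information measure mentioned above can be sketched for the simplest case of a binary endpoint T and a binary measurement M, where I(T; M) = Σ P(t, m) log2[P(t, m) / (P(t) P(m))]. The candidate measurements and their detection probabilities below are hypothetical. In a full Bayesian network, the joint P(T, M) would come from propagating the network rather than from a prior and a two-entry table.

```python
from math import log2


def mutual_information(p_t, p_m_given_t):
    """I(T; M) in bits for binary T and M.

    p_t:         P(T = 1), e.g. probability of an offsite flux exceedance
    p_m_given_t: {t: P(M = 1 | T = t)} for the measurement node
    """
    p_t_of = {0: 1 - p_t, 1: p_t}
    joint = {}
    for t in (0, 1):
        for m in (0, 1):
            pm = p_m_given_t[t] if m else 1 - p_m_given_t[t]
            joint[(t, m)] = p_t_of[t] * pm
    p_m_of = {m: joint[(0, m)] + joint[(1, m)] for m in (0, 1)}
    # Zero-probability cells contribute nothing to the sum.
    return sum(p * log2(p / (p_t_of[t] * p_m_of[m]))
               for (t, m), p in joint.items() if p > 0)


# Hypothetical candidate measurements: {T-state: P(positive result | T-state)}.
candidates = {
    "targeted well sample": {1: 0.9, 0: 0.05},
    "screening survey": {1: 0.7, 0: 0.3},
    "uninformative test": {1: 0.5, 0: 0.5},
}
ranked = sorted(candidates, key=lambda name: -mutual_information(0.3, candidates[name]))
for name in ranked:
    print(f"{name}: {mutual_information(0.3, candidates[name]):.3f} bits")
```

Ranking candidates by the bits of uncertainty they are expected to remove from the endpoint is the kind of comparison that can focus future data collection. The measure says nothing about cost, which is where value of information analysis comes in.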

As demonstrated, CBNs are amenable to including lines of evidence and the potential inferential connections that observations create between variables. Some variables are not directly observable. Including lines of evidence as measurement nodes within a CBN identifies the measurement sources that inform these unobservable variables (Fenton and Neil 2019). For conditional uncertainties, the measurement nodes help communicate how data are being used to address uncertainties about the unobservable phenomena being modeled. The lines of evidence may be used to help resolve uncertainties related to source term, site transport, offsite migration, modifier, and intervention efficacy variables. Introducing measurement variables can also help future work identify information sources for the conditional uncertainties underlying the CBN. Examining the potential influence of these measurements in the problem formulation phase with a CBN will further aid the quantification of uncertainty in probabilistic risk assessment and risk management decisions.
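Measurement nodes for an unobservable variable can be sketched as children of that variable with imperfect detection rates. If the lines of evidence are assumed to be conditionally independent given the latent state (that is, they share no other parents or links in the CBN), their results combine additively on the log-odds scale. The prior, the detection rates, and the two example lines of evidence below are hypothetical.

```python
from math import exp, log


def posterior(prior, findings):
    """P(latent = 1 | findings) for conditionally independent measurement nodes.

    prior:    P(latent = 1), e.g. DNAPL present in a zone (hypothetical); 0 < prior < 1
    findings: iterable of (result, p_pos_if_present, p_pos_if_absent), rates in (0, 1)
    """
    log_odds = log(prior / (1 - prior))
    for result, tpr, fpr in findings:
        if result:
            log_odds += log(tpr / fpr)              # likelihood ratio of a positive
        else:
            log_odds += log((1 - tpr) / (1 - fpr))  # likelihood ratio of a negative
    return 1 / (1 + exp(-log_odds))


# Hypothetical lines of evidence: (result, P(+ | present), P(+ | absent)).
visual = (1, 0.4, 0.02)       # e.g. a positive visual DNAPL observation in a core
dissolved = (0, 0.8, 0.2)     # e.g. a negative dissolved-phase screening result
print(posterior(0.2, [visual]))
print(posterior(0.2, [visual, dissolved]))
```

Under these assumptions, the positive visual observation raises the probability of DNAPL presence from 0.2 to about 0.83, and the negative screening result then lowers it to about 0.56. If the measurements share error sources, the conditional independence assumption fails, and the CBN would need an extra node or arc to represent that dependence.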

Measurement variables also offer benefits for addressing qualitative uncertainty with CBNs. As with causal pathway analysis, the inference pathways are based purely on the qualitative connections in the graph, and thus the magnitudes of the uncertainty reductions from measurements are not discernible. CBNs can still aid in evaluating the potential for environmental measurements to inform a system model prior to quantification for risk assessment. However, quantification is necessary to examine the strength of the weight of evidence and the value of information. A quantitative CBN in which relationships include probabilities would facilitate understanding the magnitude of uncertainty reductions. When costs and outcomes are valued, as in an influence diagram or decision network, the CBN can be extended to value of information assessments that prioritize the collection of more and different types of data and determine its value (Clemen and Reilly 2014). To better represent conditional and structural uncertainties, model averaging (Neuman 2003) can be useful for supporting multiple CSMs or probability distribution functions within a CBN.
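The value of information idea can be sketched for a single binary decision. In this hypothetical setup, a site manager either remediates at a fixed cost or accepts an expected loss if offsite flux exceeds a threshold. The remedy is assumed to be fully effective, and the costs, prior, and measurement error rates are illustrative assumptions rather than values from this paper. The expected value of sample information (EVSI) is the reduction in expected cost from choosing the action after seeing the measurement result. A perfect measurement gives the expected value of perfect information (EVPI).

```python
# Hypothetical decision: costs, prior, and error rates are illustrative assumptions.


def best_cost(p_exceed, remedy_cost=30.0, loss=100.0):
    """Expected cost of the better action: remediate (fixed cost) or accept the risk."""
    return min(remedy_cost, loss * p_exceed)


def evsi(p_exceed, tpr, fpr, **costs):
    """Expected value of sample information for a binary measurement.

    tpr = P(positive | exceedance), fpr = P(positive | no exceedance);
    assumes 0 < P(positive) < 1.
    """
    p_pos = tpr * p_exceed + fpr * (1 - p_exceed)
    post_pos = tpr * p_exceed / p_pos
    post_neg = (1 - tpr) * p_exceed / (1 - p_pos)
    # Preposterior analysis: expected cost if the action is chosen after the result.
    with_info = (p_pos * best_cost(post_pos, **costs)
                 + (1 - p_pos) * best_cost(post_neg, **costs))
    return best_cost(p_exceed, **costs) - with_info


print(evsi(0.4, 1.0, 0.0))   # perfect measurement: EVSI equals EVPI
print(evsi(0.4, 0.8, 0.1))   # imperfect measurement: smaller value
```

With these assumed numbers, the perfect measurement is worth 18 cost units and the imperfect one about 10.6, so under this sketch, a measurement costing more than its EVSI would not be worth collecting for this decision.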

The CBN process was explored for supporting lines-of-evidence approaches in the characterization and remediation of contaminated DNAPL sites. Characterization and, ultimately, remediation of contaminated DNAPL sites rely on lines of evidence at an appropriate resolution because of the long-term risk to groundwater resources and offsite migration issues (Rossabi et al. 2022). Preventing the movement of DNAPL and other contaminants to sensitive receptors is important for supporting and evaluating remediation. However, DNAPL site characterization is expensive and often involves many uncertainties, including the DNAPL spatial distribution, total DNAPL mass, and mass discharge. A weight-of-evidence approach has become increasingly common in groundwater characterization and remediation due to the uncertainties encountered during site characterization. Evaluating the weight of evidence for DNAPL characterization and remediation with a CBN or quantified Bayesian network may provide valuable information and better communication of uncertainties in future site characterization tasks.