Introduction

Canada and the northern United States are covered by complex and heterogeneous glacial deposits. These deposits contain major aquifers that are important sources of water and are increasingly being relied upon as water scarcity increases and surface-water availability decreases due to human activity and climate change (Vaccaro 1992). Advancements in groundwater modelling software have resulted in more effective use of regional groundwater flow models as tools to support sustainable groundwater management (Pasanen and Okkonen 2017); however, conceptualization of geology still presents the greatest uncertainty, particularly for glacial deposits (Anderson et al. 2015; Refsgaard et al. 2012). The continuity of glacial deposits influences aquifer extents, where hydraulic interactions occur along the flow path (e.g. recharge and discharge areas), groundwater chemistry, and the development of hydraulic conditions (e.g. confined aquifers, artesian wells; Bayless et al. 2017).

Geologic models are the backbone of every groundwater model and are critical to understanding groundwater flow within complex glacial aquifer systems (Pasanen and Okkonen 2017). The deposition of glacial deposits varies both temporally and spatially and results in complex relations and hierarchical structures that make them difficult to model (Schorpp et al. 2022). Traditional methods used to develop three-dimensional (3D) geologic models for groundwater applications rely on manual interpretations that incorporate known geologic relationships and expert knowledge into a deterministic model; however, such models are typically time-consuming to create, difficult to update and do not adequately represent subsurface heterogeneity or account for uncertainty in geologic structure (Jørgensen et al. 2015; Kearsey et al. 2015). Geostatistical approaches have been applied using numerical data (e.g. geophysical surveys) to make spatial predictions in the subsurface but typically have limitations associated with the underlying assumption of stationarity (i.e., statistics used to describe data distribution that does not change throughout the spatial domain) and the inability to reproduce subsurface complexity (Bianchi et al. 2015).

There are several examples in the literature that provide recommendations for geologic model development specific to glacial environments that differ from traditional approaches and embrace stochastic methods. Stochastic methods are more automated, allow representation of complexity and evaluation of uncertainty that is well suited for glacial deposits (Refsgaard et al. 2012; Jørgensen et al. 2015; Toth et al. 2016). Kearsey et al. (2015) use indicator kriging (IK) and sequential indicator simulation (SIS) as stochastic methods to investigate lithological variations within glacial and post-glacial deposits. They advocate for a lithofacies-based approach instead of stratigraphic layers for geologic modelling, especially in complex, heterogenous deposits where it can be difficult to locate stratigraphic boundaries. The 3D geologic model constructed by Jørgensen et al. (2015) uses a combination of manual and stochastic methods to represent subsurface heterogeneity. The developers of ArchPy (Schorpp et al. 2022) provide automated solutions for 3D geologic modelling of Quaternary aquifers to any desired hierarchical level (stratigraphic unit, stratigraphic subunits, lithofacies, hydraulic properties) using stochastic methods based on multi-point statistics, variogram-based categorical simulation techniques, and Gaussian random functions.

Artificial neural network (ANN) is a widely used machine learning algorithm inspired by biological neural networks invented in the 1950s that can be trained to identify hidden patterns and structures without any assumptions on data distribution. ANNs are suitable for geoscience applications because they can model complex nonlinear dependencies, are adaptive for managing nonstationary data, and allow the integration of contextual information (Kanevski et al. 2009). Most ANN applications in geoscience have generally focused on using numerical data, including geophysical survey data to interpret lithofacies (Cracknell and Reading 2014; Baykan and Yilmaz 2010; Morgan 2018), historical groundwater level measurements to make future predictions (Rajaee et al. 2019), as well as seismic and borehole geophysics for reservoir modelling (Vo Thanh et al. 2019; Ansah et al. 2020).

ANN, however, can also be used to solve classification problems using categorical information—for example, Rizzo and Dougherty (1994) use ANN to characterize the 2D distribution of hydraulic conductivity based on three classes (low, medium, and high). ANN also proved to be a useful methodology for a basin-wide study characterizing complex and heterogeneous lithology with a multilayered aquifer system at the borehole level (Sahoo and Jha 2017). The ANNs used in these aforementioned studies include self-organizing maps (SOMs) and multilayer perceptron (MLP). SOMs are a single-layer feedforward neural network used to produce a low-dimensional representation of a dataset with a high-dimensional space or multiple features. SOMs have been combined with MLP to enhance pattern recognition in hydrogeologic applications (Rizzo and Dougherty 1994; Sahoo and Jha 2017). MLP is a feedforward neural network that uses supervised learning to make predictions. The architecture of a MLP consists of an input layer, an output layer, and one or more hidden layers. The layers contain neurons that are connected by weights and act as computational units to convert incoming signals into outputs using activation functions. The stochastic nature of MLP training and probability outputs from predictive modelling are key features that allow for multiple geologic realizations as well as exploration of geologic uncertainty.

Borehole logs from public well records are typically the most abundant type of subsurface data for hydrogeologic applications and contain large amounts of textual data that can be leveraged to classify similar materials and provide qualitative information about the subsurface (Russell et al. 1998; Allen et al. 2008; Bayless et al. 2017). Natural Language Processing (NLP) techniques provide a statistical approach to understanding language and have recently been applied in geoscience applications to explore the geologic lexicon so that material classification is more automated and less subjective (Padarian and Fuentes 2019; Fuentes et al. 2020). Fuentes et al. (2020) provide one of the first geoscience applications where a numerical representation of textual data (word embedding) from borehole logs is used to spatially predict lithology using MLP and a 2.5D interpolation approach. However, an example of using categorical data to predict complex subsurface conditions directly in 3D using ANNs was not found in literature.

The main goal of this research is to leverage textual data from borehole logs and apply machine learning to solve the 3D multi-class classification problem of predicting lithofacies in the subsurface. Different data selection alternatives are considered and the impact on the training and prediction capabilities of MLP are evaluated. This study aims to effectively reproduce the geologic complexity associated with glacial environments that can be used in groundwater models to support water resource management. The study addresses the major source of uncertainty in groundwater modelling (e.g. heterogeneity) and provides an innovative technique and workflow for predicting lithofacies in complex glacial deposits.

Materials and methods

Study area

The regional basin containing the Fraser Lowland in southwest British Columbia, Canada, and the northwest portion of Whatcom County, Washington, United States, was selected as the study area (Fig. 1). It has gently rolling and flat-topped uplands separated by wide, flat-bottomed lowlands bounded by mountainous terrain and the sea. Drainage is directed towards the Fraser and Nooksack rivers as well as smaller local rivers that all discharge to the Salish Sea.

Fig. 1
figure 1

Fraser-Whatcom Basin location, topographical areas, and major drainage features. Processed point data for borehole logs are also visualized based on data sources (BCBH – British Columbia Ministry of Environment and Climate Change borehole log lithology, BCOG – British Columbia Oil & Gas Commission eLibrary, BCWR – BC GWELLS application, WSSD – Washington Geological Survey subsurface database, WSWR – Washington State Department of Ecology Well Log Viewer)

Geologic setting

The geologic setting of the study area has been influenced by tectonic and glacial events. The study area lies largely within a sedimentary basin surrounded by three different geological basements—Coast Mountains, Cascade Mountains, and Vancouver Island Ranges (Jones 1999; Mustard and Rouse 1994). Bedrock outcrops are generally limited to the northwestern portion of the study area along Burrard Inlet and Sumas Mountain in the central valley. Bedrock surfaces have been prepared separately for the Canadian and American portions of the study area (Hamilton and Ricketts 1994; Eungard 2014). They provide a generalized representation of the bedrock surface given the limited number of deep wells with significant discrepancies particularly along the international boundary where the surfaces can be compared. The bedrock surface can be over 300 m deep and generally becomes shallower along the margins of the mountains. Fault structures near Sumas Mountain are also inferred to locally deepen the bedrock surface (Mustard and Rouse 1994).

Multiple glaciations, fluctuations in sea level, and isostatic adjustments have contributed to a complex distribution of Quaternary deposits in the study area. The stratigraphic framework is primarily based on Pleistocene deposits associated with the Fraser Glaciation and post-glacial deposits from the Holocene (Figs. 2 and 3).

Fig. 2
figure 2

Hydrostratigraphic chart for the Fraser-Whatcom Basin modified from Fig. 6A in Jones (1999) and Fig. 2 in Ward and Thomson (2004). Equivalent naming convention used in the United States is indicated in brackets

Fig. 3
figure 3

Stratigraphic units within the Fraser-Whatcom Basin modified from surficial mapping (Armstrong 1976, 1977; Armstrong and Hicock 1979, 1980; Washington Division of Geology and Earth Resources 2016)

During the beginning of the Fraser Glaciation, advancing outlet glaciers from the Cordilleran Ice Sheet led to the deposition of thick, proglacial outwash called Quadra Sand (Easterbrook 1986; Clague 1991; Vaccaro et al. 1998). The Coquitlam Stade and Port Moody Interstade occurred early in the Fraser Glaciation (Ward and Thomson 2004). Quadra Sand is generally not exposed at the surface and deposits from the Coquitlam Stade and Port Moody Interstade are limited in extent based on surficial mapping.

The Vashon Stade was the most extensive advancement of the Cordilleran Ice Sheet. At the Vashon stadial maximum, the study area was completely covered by ice more than 1.5 km thick (Clague et al. 1991). Isostatic depression of the land mass occurred from the weight of the ice sheet. During deglaciation of the Cordilleran Ice Sheet, marine waters inundated the area and resulted in a calving embayment that induced rapid ablation of the glacier (Armstrong 1957). Capilano sediments were deposited along the western portion of the study area during this period. These sediments include glaciomarine silt and clay containing drop-stones from melting icebergs and fossil shells.

Glacier retreat slowed and the ice margin stabilized in the central portion of the study area. Thick interbedded glaciomarine and glacial sediments deposited within the area of the fluctuating ice margin are included in the Fort Langley Formation. The glacier that persisted became land-based as isostatic rebound of the terrain and sea level rise started to occur. This led to initial development of drainage networks on the emerged terrain while the sea occupied the lower reaches of many of the paleovalleys (Kovanen 2002). A minor-readvancement during the Sumas Stade extended the glacier to the southwest where Sumas till and glaciofluvial deposits were deposited on top of the Fort Langley Formation.

Following the final retreat of the glacier, nonglacial Salish sediments were deposited by marine, fluvial, mass movement and aeolian processes. Peat accumulated in poorly drained areas in surface depressions or valley bottoms. Modern alluvial deposition from aggradation of the floodplains and major deltas of the Fraser and Nooksack occurred and continues today.

While the Quaternary history of the area has been well-studied and surficial geology is well mapped, the subsurface geology is less understood at the basin scale with cross-sections from existing geological maps providing only local context.

Hydrogeologic framework

The hydrogeologic framework varies across the study area because of partial glacier advances and the timing of glacial retreat. Aquifers exist in the study area within a complex sequence of glaciated materials (Halstead 1986; Vaccaro et al. 1998). The aquifer units generally consist of coarse-grained outwash deposited during glacial advances and retreats, proglacial deposits, and fluvial sediments deposited during glacial interstades. The semiconfining and confining units generally consist of till, glaciomarine, lacustrine and organic deposits. Alluvial sediments in the major broad alluvial valleys overlie eroded drift sequences. Bedrock that underlies these sediments forms the lateral and basal boundaries of the aquifer system.

Over 100 unconsolidated aquifers have been locally mapped within the Canadian portion of the study area (BCMECC 2023). A hydrogeologic framework and numerous fence diagrams within the central portion of the study area in Canada are detailed in Halstead (1986). Three regional aquifers including an alluvium aquifer within the Nooksack River valley, glaciofluvial sediments of the Sumas Drift (Fraser Aquifer), and a deeper confined aquifer in the Quadra Sands (Puget Aquifer) have been conceptualized within Washington State (Vaccaro et al. 1998). A regional framework for aquifers in quaternary sediments in the glaciated portions of the United States has also been developed (Haj et al. 2018; Yager et al. 2019).

General modeling approach

The general approach includes development of a geologic database, standardization of lithofacies based on descriptions from borehole logs, and creation of a 3D mesh to facilitate data selection for MLP training. MLP was then used to generate a geologic realization for three data selection alternatives with the best outcome verified against subsurface interpretations from independent studies and hydraulic indicators for the region. The study was executed using Python programming language with final visualization of results using ParaView (Ayachit 2015) (v5.10.1).

Geologic database

The geologic database includes information from public borehole logs and point data from surficial geology mapping. Approximately 13,900 borehole logs from various data collections managed by government agencies in both British Columbia and Washington State were included in the database (Fig. 1; Table 1). For the United States, water well records from WSWR required manual processing since multiple well reports from multiple wells can exist at a given location (Eungard 2014). The WSWR well report having the deepest depth within a 1-km grid spacing was selected for manual entry of lithology data. As such, the density of subsurface information in the United States is much lower compared to what was used in Canada (Fig. 1). Boreholes from environmental and geotechnical investigations are relatively shallow (<20 m deep), while water wells are deeper but generally less than 100 m below ground surface. Boreholes from oil and gas wells were generally the deepest but are sparsely distributed within the study area.

Table 1 Summary of borehole data organized based on country and government agency

Borehole data were augmented with ~40,000 data points based on geomorphic mapping by Kovanen and Slaymaker (2015). This mapping provides a single representation of surficial deposits for the study area recreated using surficial mapping (Dunn and Ricketts 1994; Washington Division of Geology and Earth Resources 2016). Lithofacies were assigned predominantly based on geomorphic units with consideration of material descriptions from surficial mapping (Table 2). Point data were generated using a grid spacing of 1 km as well as vertical points every 5 m to a depth of 100 m to provide additional vertical coverage where bedrock outcrops were mapped.

Table 2 Lithofacies classification overview. Material grouping codes modified from the Unified Soil Classification System with consideration of hydraulic data

Lithofacies standardization

Material descriptions of lithology from borehole logs were first compiled and processed using various text-cleaning methods (e.g. correcting spelling mistakes, standardizing short-forms and abbreviations). After text cleaning, fewer than five words were typically used with two or three words being the most common to describe lithology. The processed lithology was classified into lithofacies using a semiautomated approach that included NLP techniques. The multistep process is described in the electronic supplementary material (ESM). Classification was primarily based on size and composition of grains using a modification of the Unified Soil Classification System (USCS). Hydraulic characteristics (e.g. hydraulic conductivity, yield estimates) were also considered to assign material grouping codes (see Table S3 in the ESM). The original 20,000 unique material descriptions were consolidated into the top 45 descriptor combinations, grouped into 10 material grouping codes, and classified into five lithofacies (coarse, fines, clay, till, bedrock) as shown in Table 2. This captured 92% of the total thickness from the processed lithology dataset.

Mesh generation

A mesh was created using the Pyvista (Sullivan and Kaszynski 2019) Python package to generate a 3D representation of the study area and to facilitate processing lithofacies. The lateral boundaries of the mesh were based on the extent of Quaternary mapping, excluding the land north of Burrard Inlet and intermountain valleys. A cell size of 200 m wide by 200 m long and 5 m high was used to capture the major variability in lithofacies for regional modelling purposes. A uniform vertical cell height of 5 m was established based on a statistical review of lithofacies thickness and depth. The greatest generalization using this approach occurs near the surface where lithofacies tend to be less than 5 m thick.

The top of the mesh was generated based on a digital elevation model (DEM) that combined topographic and bathymetric data (Canadian Hydrographic Service 2019; DMTI Spatial Inc. 2002; US Geological Survey 2019; Finlayson et al. 2000). All datasets were reprojected to NAD 1983 UTM zone 10 with elevation units in metres above sea level (masl) and then combined. A cell spacing of 30 m and automatic filling of elevation voids was initially completed followed by resampling of the surface to generate a DEM with 200-m cell spacing. The bottom of the mesh was established at an elevation of –150 masl given the vertical extent of lithofacies that could act as aquifers, which also corresponds to the deepest extent considered in groundwater modelling studies within the region (Golder 2005). The resultant mesh has over 3.3 million cells with a surface area of ~670 km2.

Data selection alternatives

Three data selection alternatives were considered to evaluate the training and prediction capabilities of MLP. Alternative 1 consists of lithofacies spatially represented using coordinates and interval depths from the geologic database. Alternative 2 considers a 3D mesh and uses the cell centroid coordinates and the lithofacies mode within each cell to provide a more regularly spaced dataset. To determine the lithofacies mode, a point cloud was generated spacing data vertically every metre for each interval. The most frequently occurring lithofacies within each cell was then used as the target for training. Like alternative 2, alternative 3 uses the lithofacies mode within each cell but takes the coordinates of the cell centroid and maps them on a 2D grid using SOM before being used in the MLP algorithm. Table 3 provides an overview of each data selection alternative, including the approach for input features and targets as well as the number of samples.

Table 3 Overview of data selection alternatives for MLP

A representative distribution of samples for each alternative is shown in Fig. 4 to highlight spatial differences. Alternative 2 has fewer lateral locations compared to Alternative 1 because cell centroids are used instead of well locations. However, there are more samples for alternative 2 in the vertical direction since intermediate points are added to intervals greater than 5 m thick. For alternative 3, the spatial distribution of lithofacies is now assigned using coordinates from 2D mapping. Data are grouped as clusters with multiple lithofacies assigned to the same 2D coordinates. All three alternatives have imbalanced lithofacies distributions, meaning that the lithofacies are not represented equally, but the distribution is not considered extreme (e.g. ratios below 1:5).

Fig. 4
figure 4

Spatial distribution of subsurface data for a alternative 1, b alternative 2, and c alternative 3. Lithofacies are shown for alternatives 1 and 2. Spatial clusters are shown for alternative 3 to show grouping of data

Machine learning

MiniSom (v. 2.3.0, Vettigli 2018) and scikit-learn (v.0.24.2, Pedregosa et al. 2011) were used to implement the SOM and MLP algorithms respectively. For SOM, normalization was used to scale the coordinates of the cell centroids. For the first stage, the elevation of each cell centroid was set to zero; therefore, only the normalized eastings and northings were used to initially train the SOM. For the second stage, the normalized northing, easting, and elevation of the cell centroids were used to refine mapping of the samples onto the 2D grid. SOM hyperparameters selected for optimization include sigma and learning rate. Hyperopt (Bergstra et al. 2013) was used for hyperparameter optimization to determine the best combination that would result in the lowest error.

For MLP, training and testing subsets from the datasets were made using an 80 and 20% split, respectively. A stratified splitting approach was used to ensure each set contains approximately the same percentage of lithofacies as the original dataset. Input data were transformed using the standardization scaling method. Four hyperparameters (hidden layer size, alpha, batch size, and initial learning rate) were selected for optimization. These hyperparameters were chosen because they appeared to have the greatest impact on MLP performance when using the ReLU activation function and Adam solver. The hyperparameter default values were used and generally modified by an order of magnitude to establish upper and lower limits of the search space considered for optimization. A stratified threefold cross-validation grid search implemented with the GridSearchCV optimizer in scikit-learn was used to objectively select the combination of hyperparameters that achieved the best performance on the training dataset.

Once the hyperparameters were established, the MLP was trained using all the training data. The testing data were then used to evaluate the generalization performance of the model when making predictions on unseen data. The performance metrics used to evaluate the models include log-loss (i.e. cross-entropy), balanced accuracy, and the confusion matrix. MLP tries to minimize log-loss using stochastic gradient descent by adjusting weights during training; therefore, a lower log-loss indicates better performance. Balanced accuracy is the average fraction of relevant instances that were correctly predicted. The balance accuracy scores ranged from 0 to 1 with values closer to 1, indicating good accuracy. A confusion matrix is a common technique used to summarize the performance of a classification algorithm. It highlights the errors being made by the MLP or what is making the MLP ‘confused’ when making predictions. Once the training model was validated and tested, all the data were used to train and establish the final weights of the predictive model.

The main purpose of the predictive model is to take lithofacies data for each alternative and make predictions where information is not available in the study area. Cell centroid coordinates from the mesh were used for alternatives 1 and 2, while the coordinates from 2D SOM mapping were used for alternative 3. The same scaling methods were used to transform inputs prior to running the predictive models. This resulted in the prediction of lithofacies at ~3,336,000 unique cell centroids and 7,700 unique SOM neurons. The output from the predictive model includes lithofacies and probability predictions at each cell centroid. The negative log of probability (between 0 and 1) for the predicted lithofacies in each cell was used to calculate entropy. A low entropy indicates less uncertainty while higher values suggest more uncertainty. The model outputs were assigned as attributes to the mesh using PyVista and then exported as a voxel model for 3D viewing using Paraview.

Results

MLP training and testing

MLP training for each alternative was evaluated based on a review of log-loss, balanced accuracy scores, and confusion matrix results. At the end of training, alternative 2 had the lowest log-loss at 0.8, followed by Alternative 3 at 1.10 and Alternative 1 at 1.15. For a naïve classification model, which assumes the same probability for each lithofacies, the log-loss score would be 1.3, e.g. –log(0.2). The MLP models for all three alternatives achieve a lower log-loss compared to a naïve classification model.

MLP tuning (e.g. hyperparameter optimization, number of epochs) improved training performance compared to the use of default MLP values for all alternatives. Tuning improved balanced accuracy scores by 9–17%. Alternative 2 had the highest training and testing scores of 65 and 60%, respectively. Model variance (difference between training and testing score) is lowest for alternative 3 and highest for alternative 2 with values below 5%. The balanced accuracy scores indicate modelling results with a high bias and low variance. Typically, training and testing accuracy scores above 80% indicate good performance for other machine learning applications (Brownlee 2022); however, this may not be achievable given the multiclass classification problem and variability in subsurface data for this study.

The confusion matrix from the testing results showed that alternative 2 had the best precision for all lithofacies. Bedrock was most accurately predicted despite bedrock samples occurring less frequently in all three datasets. This could be attributed to the continuity of bedrock once it is encountered. The prediction performance for unconsolidated lithofacies follows the same trend for all alternatives where ‘coarse’ has the second highest precision with lower precision in descending order for clay, fines, and till. This may be attributed to the distribution of unconsolidated lithofacies or could be associated with discontinuity of unconsolidated lithofacies in the subsurface and the underfitting of the model to capture this level of complexity. Clay and till are most often confused with each other, while coarse seems to be confused most often with till and fines. Fines are commonly confused with till and clay.

Predictions

A cross-sectional view of predicted lithofacies for each alternative is shown in Fig. 5. Alternative 1 appears to be the most underfit (e.g. unable to model training data nor make predictions), which is expected given that it has the lowest balanced accuracy score. Alternative 2 shows more complexity in the subsurface compared to alternatives 1 and 2. Alternative 3 is more comparable to alternative 2, but is more generalized. This suggests the dimensions of the 2D SOM (used in alternative 3) could be inadequate to represent the subsurface complexity. Alternative 2 resulted in a larger coverage and higher probability for coarse, clay and till compared to the other alternatives. As shown in Fig. 5, the spatial coverage of higher entropy values is generally greater for alternatives 1 and 3 compared to alternative 2. The average entropy values of 0.72, 0.28, and 0.64 for alternatives 1, 2, and 3, respectively, indicate the most confidence in alternative 2 results. Extrapolation was required to make predictions given the limited distribution of deep boreholes available for the region.

Fig. 5
figure 5

Cross-sectional view of predicted lithofacies (left) and calculated cell entropy (right) from MLP predictive models developed using a alternative 1, b alternative 2, and c alternative 3

Compared to alternatives 1 and 3, MLP predictions for alternative 2 had the lowest log-loss, highest balanced accuracy scores for both training and testing, and highest precision for all lithofacies. Based on the cross-sectional reviews, alternative 2 shows more complexity in the subsurface compared to alternatives 1 and 3. Alternative 2 may have performed better because lithofacies were averaged based on the cell mode which may have reduced some noise associated with the spatial variability of data for alternative 1. The size of the 2D SOM for alternative 3 may have been inadequate to represent the subsurface complexity. Alternative 2 was selected as the preferred alternative based on better performance metrics, lower entropy, and cross-sectional reviews that indicated more complexity in the subsurface compared to the other alternatives.

Verification

Geologic modelling results from MLP predictions using alternative 2 data were verified by considering published interpretations of the subsurface from independent studies. Verification efforts focused on local areas (Fig. 6), including the Township of Langley (TOL) and the Nicomekl-Serpentine Valley between Langley and Surrey because of the high density of wells, number of mapped aquifers, and availability of groundwater modelling reports.

Fig. 6
figure 6

Local area used to verify geologic modelling results from MLP predictions using alternative 2 data. Mapped aquifers of interest are shown with unique numbers assigned by the Province of British Columbia (e.g. AQ33 is provincially mapped aquifer number 33). Geologic cross-sections A–A′ (Fig. 7) and B–B′ (Fig. 8) are approximate locations based on Golder (2005). C–C′ and D–D′ are shown in Fig. 10. See Fig. 1 for the local area relative to the study area

Township of Langley

Geologic interpretations within the Township of Langley (TOL) published by Golder (2005) were reviewed to verify MLP predictions. The approximate locations of geologic cross-sections A–A′ and B–B′ within the Langley Uplands are shown in Fig. 6. Comparisons of the geologic interpretations provided by Golder (2005) and the alternative 2 MLP geologic model at the cross-section locations are provided in Fig. 7 (A–A′) and Fig. 8 (B–B′).

Fig. 7
figure 7

Geologic cross-section A–A′ modified from Golder (2005) and from MLP predictions using alternative 2 data. Provincial aquifers are labelled beginning with AQ. Colour-coded circles indicate lithofacies for the processed lithology material descriptions from boreholes. See Fig. 6 for cross-section locations

Fig. 8
figure 8

Geologic cross-section B–B′ modified from Golder (2005) and from MLP predictions using alternative 2 data. Provincial aquifers are labelled beginning with AQ. Colour-coded circles indicate lithofacies for the processed lithology material descriptions from boreholes. See Fig. 6 for cross-section locations

Cross-section A–A′ (Fig. 7) from MLP predictions shows aquifer (AQ) 35 separately from AQ33 and AQ1144 consistent with the interpretation by Golder (2005). The distribution of coarse material lumps AQ33 and AQ1144 into one aquifer unit which may be reasonable given arbitrary cut-offs used by Golder to establish aquifer extents. Provincial mapping of AQ33 overlies AQ1144, although this is not shown on the Golder cross-section. There are interconnected areas between the three aquifers elsewhere in the TOL area described by Golder 2005. Coarse material from MLP predictions is interpolated at the surface where AQ1144 has been mapped and extends to deeper depths compared to interpretations by Golder. In general, the MLP geologic model performs well at representing the subsurface in this area with the potential issue of extrapolation at depth.

For geologic cross-section B–B′ (Fig. 8), three distinct aquifer units are represented in the MLP geologic model while five aquifers are interpreted by Golder (2005). The original cross-section B–B’ from Golder shows the aquifer material for AQ32 as sand and gravel but is described as ‘a body of fine sands, sand and locally gravel and till’ in the report. The MLP geologic model shows a relatively large fines unit approximately 20 m thick in this area. AQ33, AQ1193, and AQ32 are lumped in the lower coarse unit from MLP predictions and have a larger extent compared to interpretations by Golder. The description for AQ1193 in Golder (2005) indicates it is located between +20 and –20 masl; therefore, it may be reasonable for the lower coarse unit to extend below 0 masl in this area. In general, the MLP cross-sections show a more generalized representation of aquifers and greater connectivity of permeable units compared to the Golder cross-sections.

Golder identified 18 major aquifers based on hydrostratgraphic interpretation of geologic units. For these major aquifers, permeable units that overlap by at least 10% and any aquitard between overlapping units less than 10 m thick were considered by Golder as ‘well-connected hydraulically’. Coarse lithofacies with a probability above 50% from MLP modelling were arbitrarily selected and grouped by connectivity to represent geologic areas as ‘well-connected hydraulically’ (Fig. 9). This includes lithologic materials described as sand, sand and gravel, and gravel, but does not include fines such as ‘silty sand’ which may be important to this local area for aquifer characterization purposes (Figs. 7 and 8). Grouping by connectivity assigns a zone for each group of connected cells.

Fig. 9
figure 9

3D representation of major aquifers identified by Golder (2005) in the alternative 2 MLP geologic model for coarse lithofacies with a probability above 50% colour coded to represent hydraulic connnectivity

The unconfined aquifers including the Brookswood Aquifer (AQ41), Fort Langley Aquifer (AQ41), Hopington AB Aquifer (AQ35), and Abbotsford A (AQ15) are represented by the MLP geologic model; however, some of the linear features, interpreted as meltwater channels by Golder (2005), and extending from the main volume of the Brookswood (AQ41) and Abbotsford A (AQ15) aquifers are not reproduced using MLP. A finer mesh resolution and additional surficial data points could potentially be used to model this connection.

The MLP geologic model shows a large connected volume that consolidates several of the mapped aquifers interpreted in the area. This consolidated representation of aquifers may be plausible given the potential for interconnection noted by Golder (2005); however, the use of a higher probability cut-off limit may be more representative of aquifer extents which would be more conservative for water exploration but less conservative for water management purposes. Some of the deeper aquifers in Golder (2005) are only partially represented or not reproduced by the MLP geologic model, likely due to aquifer materials with a greater fines content (e.g. silty sand) that are locally significant as an aquifer unit but are not captured by the coarse lithofacies. A more detailed review of data within this area would be required to support conceptualization of intertill aquifers given the limited frequency of data points categorized as till.

Nicomekl-Serpentine Valley

Most flowing artesian wells within the study area are within the Nicomekl-Serpentine valley in the Surrey-Langley area (Fig. 6). Flowing artesian wells can occur when the aquifer is confined by low permeability materials or where there are large upward hydraulic gradients if the aquifer is unconfined. Well drilling advisory currently exists for AQ58 (Nicomekl-Serpentine; BC Ministry of Forests, Lands, Natural Resource Operations and Rural Development 2018) and are proposed for the western portion of AQ33 (West Aldergrove; Johnson et al. 2022) due to the potential for flowing artesian conditions.

AQ58 includes two permeable units consisting of a shallower unit generally occurring between –60 and –90 masl in the local upland area and a deeper unit up to 20 m thick generally occurring between –120 and –150 masl that underlies the upland but also extends along the northern portion of TOL (BCMECC 2016a). AQ33 is described as an intertill aquifer consisting of two permeable units including a shallower unit between 5–15 m thick and a deeper unit up to 20 m thick, both sloping westward and merging along the western extent of the aquifer (BCMECC 2016b).

Geologic cross-sections that intersect AQ58 and AQ33 are shown in Fig. 10 based on the alternative 2 MLP geologic model. MLP predictions show upper and lower permeable units consistent with the description provided for AQ58 and AQ33. The continuity of the lower permeable unit for AQ58 is typically associated with fines. There are several overlapping confined aquifers in the Nicomekl-Serpentine Valley that were difficult to distinguish based on the MLP geologic model. Most flowing artesian wells appear to be screened below confining material; however, several may be screened in unconfined AQ35. The northward and westward sloping topography around unconfined AQ35 likely contributes to flowing artesian conditions. The majority of flowing artesian conditions appear to be attributed to low permeability material overlying aquifer material and the topographical transition from upland to lowland as discussed in the following.

Fig. 10
figure 10

Geologic cross-section C–C′ and D–D′ showing predicted lithofacies based on alternative 2 data and MLP interpolation algorithm, artesian well locations (red tubes), and inferred provincial mapped aquifers. The lines of cross section are shown in Fig. 6

The distribution of clay in relation to flowing artesian wells in the Nicomekl-Serpentine valley is shown in Fig. 11. Till is not shown since the interpolated spatial extent is limited. The majority of clay cells are connected throughout the area of interest with variability in both the vertical and horizontal directions. The top view in Fig. 11 shows clay intersecting the tops of most flowing artesian wells. The bottom view shows the bottom of most flowing wells extending through the clay. This suggests the model adequately represents the confining unit that contributes to flowing artesian conditions in the Nicomekl-Serpentine Valley. Aquifer material confined by clay includes sand and gravel (e.g. standardized as coarse) and finer-grained material like silty sand (e.g. standardized as fines). Most flowing artesian wells appear to be screened in confined aquifers; however, several are screened in unconfined AQ35 where artesian conditions exist due to the topographical transition from upland to lowland.

Fig. 11
figure 11

Clay lithofacies (blue) and flowing artesian wells (red tubes) for the local area shown in a top view and b bottom view of the geologic model. Lateral extent of the local area is shown in Figs. 1 and 6

Discussion

Three data selection alternatives were considered to evaluate the training and prediction capabilities of MLP. The main differences between the datasets include the level of effort to process data, number of samples, frequency distribution of lithofacies, and spatial distribution of data. Alternative 1 involved the least amount of processing and had the lowest number of samples, typically under-representing bedrock and clay due to limited point data representing these thick intervals. Alternative 2 required more data processing effort but resulted in a larger dataset that was more regularly spaced in the vertical direction and had fewer lateral locations that potentially reduced noise in the data by using the lithofacies mode. For alternative 3, the dataset from alternative 2 was lumped into spatial clusters with consideration of the entire mesh domain. Alternative 3 required the greatest amount of effort for data processing because of hyperparameter tuning and training with SOM prior to using MLP. Hyperparameter optimization and ANN training can introduce variability in results although effort was taken to automate evaluation of multiple combinations of hyperparameters and to include cross-validation (e.g. multiple subsets) to inform training inline with current best practices (Pedregosa et al. 2011; Vettigli 2018). Testing data were also used to evaluate the generalization performance of the MLP models when making predictions on unseen data.

The MLP models for all three alternatives achieve a lower log-loss compared to a naive classification model which assumes the same probability for each lithofacies. Alternative 2 data resulted in the best MLP performance based on better performance metrics and cross-sectional reviews that indicated more complexity in the subsurface as expected for the study area. The combination of SOM and MLP (alternative 3) did not perform the best despite the enhanced pattern recognition anticipated using this approach. Heuristics used to size the SOM grid may have been insufficient to spatially cluster data in a manner to reproduce subsurface complexity.

A possible approach to improve MLP predictions may be to exclude bedrock as a category. Alternatively, a bedrock surface could be used to constrain cells within the 3D mesh used to predict unconsolidated lithofacies. This aligns with modelling objectives that focus on glacial deposits while representing bedrock as a surface. The uncertainty associated with lithofacies predicted at deeper depths is likely underestimated using entropy as a metric given the limited distribution of deep subsurface data. It may be beneficial to combine entropy with another metric based on data density or proximity to a cell with data to better reflect areas where data extrapolation occurs and uncertainty increases. In addition, the impact of hyperparameter selection on model development was not evaluated and could introduce additional uncertainty in predictions.

In general, the geologic model generated using alternative 2 data and the MLP algorithm performed well at representing the subsurface in the Langley area based on cross-sectional review with the potential issue of extrapolation at depth. Confining units that contribute to artesian conditions in the Nicomekl-Serpentine Valley were also adequately represented by MLP predictions.

This study provides an initial conceptualization of glacial sediments in the subsurface of the Fraser-Whatcom Basin. Material descriptions of lithology (e.g. borehole log) provide the necessary vertical coverage but are likely insufficient for lateral characterization of continuous subsurface features as discussed in other studies (Cummings et al. 2012; Frind et al. 2014; Jørgensen et al. 2015). This study did not evaluate the impact of data density on the predicted results. Additional soft data (e.g. geophysics, fence diagrams from Halstead 1986) that provide spatially continuous or profile sources of information could be incorporated into the geologic database to improve the representation of subsurface conditions.

Conclusions

The feasibility of using MLP to interpret glacial deposits in the subsurface of the Fraser-Whatcom Basin was explored in this study. Based on the results, MLP appears to be a promising algorithm to solve multiclass classification problems related to modelling complex glacial deposits in the subsurface. This has the benefit of interpreting subsurface conditions using categorical data instead of numerical information which is typically more readily available for hydrogeologic applications. This study showed additional processing effort to create a more regular dataset using the lithofacies mode in each cell of the mesh produced better results compared to directly using borehole intervals. The stochastic nature of training MLP also makes it possible to generate multiple geologic models, which has been recommended to quantify the sensitivity of groundwater flow to geologic architecture as part of groundwater modelling (Poeter and Anderson 2005; Refsgaard et al. 2012; He et al. 2013; Lukjan et al. 2016).