Abstract
The feasibility of using multilayer perceptron (MLP), an artificial neural network, was evaluated to predict lithofacies in complex glacial deposits within the Fraser-Whatcom Basin in southwest British Columbia, Canada, and northwest Washington State, USA. Descriptions of materials from borehole logs were standardized into lithofacies using natural language processing techniques to reduce subjectivity in classification and improve automation. Three data-selection alternatives were considered to evaluate the training and prediction capabilities of MLP. Block-model representations of the subsurface were created and the best geologic realization was verified against geologic cross-sections from independent studies, evidence of hydraulic connectivity between aquifers, and the occurrence of artesian wells. Verification results showed MLP predictions were typically more generalized but produced similar subsurface trends and recreated confining units contributing to local artesian conditions. MLP appears to be a promising algorithm to solve multi-class classification for geologic modelling purposes. The workflow developed has the added benefit of being stochastic with the potential to generate multiple geologic realizations to account for uncertainty in heterogeneity.
Résumé
La faisabilité de l’utilisation du perceptron multicouche (MLP), un réseau neuronal artificiel, a été évaluée pour prédire les lithofaciès dans les dépôts glaciaires complexes du bassin Fraser-Whatcom dans le sud-ouest de la Colombie-Britannique, au Canada, et le nord-ouest de l’État de Washington, aux États-Unis d’Amérique. Les descriptions des matériaux provenant des diagraphies ont été normalisées en lithofaciès à l’aide de techniques de traitement du langage naturel afin de réduire la subjectivité de la classification et d’améliorer l’automatisation. Trois possibilités de sélection des données ont été envisagées pour évaluer les capacités d’apprentissage et de prédiction du MLP. Des représentations du sous-sol sous forme de blocs ont été créées et la meilleure réalisation géologique a été vérifiée considérant des coupes géologiques transversales provenant d’études indépendantes, la connectivité hydraulique entre les aquifères avérée et l’occurrence de puits artésiens. Les résultats de la vérification ont montré que les prédictions du MLP avaient typiquement une généralisation importante, mais qu’elles produisaient des tendances souterraines similaires et recréaient des unités de confinement contribuant aux conditions artésiennes locales. Le MLP semble être un algorithme prometteur pour résoudre la classification multi-classe à des fins de modélisation géologique. Le processus de travail mis au point présente l’avantage supplémentaire d’être stochastique, avec la possibilité de générer plusieurs réalisations géologiques pour tenir compte de l’incertitude de l’hétérogénéité.
Resumen
Se evaluó la viabilidad del uso del perceptrón multicapa (MLP), una red neuronal artificial, para predecir litofacies en depósitos glaciares complejos dentro de la cuenca Fraser-Whatcom en el suroeste de la Columbia Británica, Canadá, y el noroeste del estado de Washington, EEUU. Las descripciones de materiales de los registros de sondeos se estandarizaron en litofacies utilizando técnicas de procesamiento del lenguaje natural para reducir la subjetividad en la clasificación y mejorar la automatización. Se consideraron tres alternativas de selección de datos para evaluar las capacidades de entrenamiento y predicción del MLP. Se crearon representaciones del subsuelo mediante modelos de bloques y la mejor realización geológica se verificó con secciones geológicas de estudios independientes, pruebas de conectividad hidráulica entre acuíferos y la presencia de pozos artesianos. Los resultados de la verificación mostraron que las predicciones MLP eran típicamente más generalizadas, pero producían tendencias subsuperficiales similares y recreaban unidades de confinamiento que contribuían a las condiciones artesianas locales. MLP parece ser un algoritmo prometedor para resolver la clasificación multiclase con fines de modelización geológica. El flujo de trabajo desarrollado tiene la ventaja añadida de ser estocástico, con la posibilidad de generar múltiples realizaciones geológicas para tener en cuenta la incertidumbre en la heterogeneidad.
摘要
在加拿大不列颠哥伦比亚省西南部和美国华盛顿州西北部的Fraser-Whatcom盆地中, 评估了使用人工神经网络的多层感知器 (MLP) 方法预测复杂冰川沉积物岩相的可行性。采用自然语言处理技术将钻孔记录中的岩性描述标准化为岩相, 以减少分类的主观性并改善自动化。考虑了三种数据选择方案来评估MLP的训练和预测能力。创建了地下的块模型表示, 并通过独立研究的地质剖面、含水层之间的水力联系证据以及自流井的出现来验证最佳地质分布情况。验证结果显示, MLP的预测通常更具一般性, 但产生了类似的地下趋势, 并重新创建了导致局部自流条件的承压单元。MLP似乎是解决地质建模多类别分类问题的一种有前景的算法。开发的工作流程的附加优势是具有随机性, 有潜力生成多个地质分布以应对非均质性的不确定性。
Resumo
A viabilidade do uso do perceptron multicamadas (MLP), uma rede neural artificial, foi avaliada para prever litofáceis em depósitos glaciais complexos na Bacia de Fraser-Whatcom, no sudoeste da Colúmbia Britânica, Canadá, e no noroeste do Estado de Washington, EUA. As descrições de materiais dos registros dos furos de sondagem foram padronizadas em litofáceis usando técnicas de processamento de linguagem natural para reduzir a subjetividade na classificação e melhorar a automação. Três alternativas de seleção de dados foram consideradas para avaliar os recursos de treinamento e previsão do MLP. Foram criadas representações de modelos de blocos da subsuperfície e a melhor realização geológica foi verificada com base em seções transversais geológicas de estudos independentes, evidências de conectividade hidráulica entre aquíferos e a ocorrência de poços artesianos. Os resultados da verificação mostraram que as previsões do MLP eram normalmente mais generalizadas, mas produziam tendências semelhantes na subsuperfície e recriavam unidades de confinamento que contribuíam para as condições artesianas locais. O MLP parece ser um algoritmo promissor para resolver a classificação multiclasse para fins de modelagem geológica. O fluxo de trabalho desenvolvido tem a vantagem adicional de ser estocástico, com a possibilidade de gerar várias realizações geológicas para levar em conta a incerteza na heterogeneidade.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
Introduction
Canada and the northern United States are covered by complex and heterogeneous glacial deposits. These deposits contain major aquifers that are important sources of water and are increasingly being relied upon as water scarcity increases and surface-water availability decreases due to human activity and climate change (Vaccaro 1992). Advancements in groundwater modelling software have resulted in more effective use of regional groundwater flow models as tools to support sustainable groundwater management (Pasanen and Okkonen 2017); however, conceptualization of geology still presents the greatest uncertainty, particularly for glacial deposits (Anderson et al. 2015; Refsgaard et al. 2012). The continuity of glacial deposits influences aquifer extents, where hydraulic interactions occur along the flow path (e.g. recharge and discharge areas), groundwater chemistry, and the development of hydraulic conditions (e.g. confined aquifers, artesian wells; Bayless et al. 2017).
Geologic models are the backbone of every groundwater model and are critical to understanding groundwater flow within complex glacial aquifer systems (Pasanen and Okkonen 2017). The deposition of glacial deposits varies both temporally and spatially and results in complex relations and hierarchical structures that make them difficult to model (Schorpp et al. 2022). Traditional methods used to develop three-dimensional (3D) geologic models for groundwater applications rely on manual interpretations that incorporate known geologic relationships and expert knowledge into a deterministic model; however, such models are typically time-consuming to create, difficult to update and do not adequately represent subsurface heterogeneity or account for uncertainty in geologic structure (Jørgensen et al. 2015; Kearsey et al. 2015). Geostatistical approaches have been applied using numerical data (e.g. geophysical surveys) to make spatial predictions in the subsurface but typically have limitations associated with the underlying assumption of stationarity (i.e., statistics used to describe data distribution that does not change throughout the spatial domain) and the inability to reproduce subsurface complexity (Bianchi et al. 2015).
There are several examples in the literature that provide recommendations for geologic model development specific to glacial environments that differ from traditional approaches and embrace stochastic methods. Stochastic methods are more automated, allow representation of complexity and evaluation of uncertainty that is well suited for glacial deposits (Refsgaard et al. 2012; Jørgensen et al. 2015; Toth et al. 2016). Kearsey et al. (2015) use indicator kriging (IK) and sequential indicator simulation (SIS) as stochastic methods to investigate lithological variations within glacial and post-glacial deposits. They advocate for a lithofacies-based approach instead of stratigraphic layers for geologic modelling, especially in complex, heterogenous deposits where it can be difficult to locate stratigraphic boundaries. The 3D geologic model constructed by Jørgensen et al. (2015) uses a combination of manual and stochastic methods to represent subsurface heterogeneity. The developers of ArchPy (Schorpp et al. 2022) provide automated solutions for 3D geologic modelling of Quaternary aquifers to any desired hierarchical level (stratigraphic unit, stratigraphic subunits, lithofacies, hydraulic properties) using stochastic methods based on multi-point statistics, variogram-based categorical simulation techniques, and Gaussian random functions.
Artificial neural network (ANN) is a widely used machine learning algorithm inspired by biological neural networks invented in the 1950s that can be trained to identify hidden patterns and structures without any assumptions on data distribution. ANNs are suitable for geoscience applications because they can model complex nonlinear dependencies, are adaptive for managing nonstationary data, and allow the integration of contextual information (Kanevski et al. 2009). Most ANN applications in geoscience have generally focused on using numerical data, including geophysical survey data to interpret lithofacies (Cracknell and Reading 2014; Baykan and Yilmaz 2010; Morgan 2018), historical groundwater level measurements to make future predictions (Rajaee et al. 2019), as well as seismic and borehole geophysics for reservoir modelling (Vo Thanh et al. 2019; Ansah et al. 2020).
ANN, however, can also be used to solve classification problems using categorical information—for example, Rizzo and Dougherty (1994) use ANN to characterize the 2D distribution of hydraulic conductivity based on three classes (low, medium, and high). ANN also proved to be a useful methodology for a basin-wide study characterizing complex and heterogeneous lithology with a multilayered aquifer system at the borehole level (Sahoo and Jha 2017). The ANNs used in these aforementioned studies include self-organizing maps (SOMs) and multilayer perceptron (MLP). SOMs are a single-layer feedforward neural network used to produce a low-dimensional representation of a dataset with a high-dimensional space or multiple features. SOMs have been combined with MLP to enhance pattern recognition in hydrogeologic applications (Rizzo and Dougherty 1994; Sahoo and Jha 2017). MLP is a feedforward neural network that uses supervised learning to make predictions. The architecture of a MLP consists of an input layer, an output layer, and one or more hidden layers. The layers contain neurons that are connected by weights and act as computational units to convert incoming signals into outputs using activation functions. The stochastic nature of MLP training and probability outputs from predictive modelling are key features that allow for multiple geologic realizations as well as exploration of geologic uncertainty.
Borehole logs from public well records are typically the most abundant type of subsurface data for hydrogeologic applications and contain large amounts of textual data that can be leveraged to classify similar materials and provide qualitative information about the subsurface (Russell et al. 1998; Allen et al. 2008; Bayless et al. 2017). Natural Language Processing (NLP) techniques provide a statistical approach to understanding language and have recently been applied in geoscience applications to explore the geologic lexicon so that material classification is more automated and less subjective (Padarian and Fuentes 2019; Fuentes et al. 2020). Fuentes et al. (2020) provide one of the first geoscience applications where a numerical representation of textual data (word embedding) from borehole logs is used to spatially predict lithology using MLP and a 2.5D interpolation approach. However, an example of using categorical data to predict complex subsurface conditions directly in 3D using ANNs was not found in literature.
The main goal of this research is to leverage textual data from borehole logs and apply machine learning to solve the 3D multi-class classification problem of predicting lithofacies in the subsurface. Different data selection alternatives are considered and the impact on the training and prediction capabilities of MLP are evaluated. This study aims to effectively reproduce the geologic complexity associated with glacial environments that can be used in groundwater models to support water resource management. The study addresses the major source of uncertainty in groundwater modelling (e.g. heterogeneity) and provides an innovative technique and workflow for predicting lithofacies in complex glacial deposits.
Materials and methods
Study area
The regional basin containing the Fraser Lowland in southwest British Columbia, Canada, and the northwest portion of Whatcom County, Washington, United States, was selected as the study area (Fig. 1). It has gently rolling and flat-topped uplands separated by wide, flat-bottomed lowlands bounded by mountainous terrain and the sea. Drainage is directed towards the Fraser and Nooksack rivers as well as smaller local rivers that all discharge to the Salish Sea.
Geologic setting
The geologic setting of the study area has been influenced by tectonic and glacial events. The study area lies largely within a sedimentary basin surrounded by three different geological basements—Coast Mountains, Cascade Mountains, and Vancouver Island Ranges (Jones 1999; Mustard and Rouse 1994). Bedrock outcrops are generally limited to the northwestern portion of the study area along Burrard Inlet and Sumas Mountain in the central valley. Bedrock surfaces have been prepared separately for the Canadian and American portions of the study area (Hamilton and Ricketts 1994; Eungard 2014). They provide a generalized representation of the bedrock surface given the limited number of deep wells with significant discrepancies particularly along the international boundary where the surfaces can be compared. The bedrock surface can be over 300 m deep and generally becomes shallower along the margins of the mountains. Fault structures near Sumas Mountain are also inferred to locally deepen the bedrock surface (Mustard and Rouse 1994).
Multiple glaciations, fluctuations in sea level, and isostatic adjustments have contributed to a complex distribution of Quaternary deposits in the study area. The stratigraphic framework is primarily based on Pleistocene deposits associated with the Fraser Glaciation and post-glacial deposits from the Holocene (Figs. 2 and 3).
During the beginning of the Fraser Glaciation, advancing outlet glaciers from the Cordilleran Ice Sheet led to the deposition of thick, proglacial outwash called Quadra Sand (Easterbrook 1986; Clague 1991; Vaccaro et al. 1998). The Coquitlam Stade and Port Moody Interstade occurred early in the Fraser Glaciation (Ward and Thomson 2004). Quadra Sand is generally not exposed at the surface and deposits from the Coquitlam Stade and Port Moody Interstade are limited in extent based on surficial mapping.
The Vashon Stade was the most extensive advancement of the Cordilleran Ice Sheet. At the Vashon stadial maximum, the study area was completely covered by ice more than 1.5 km thick (Clague et al. 1991). Isostatic depression of the land mass occurred from the weight of the ice sheet. During deglaciation of the Cordilleran Ice Sheet, marine waters inundated the area and resulted in a calving embayment that induced rapid ablation of the glacier (Armstrong 1957). Capilano sediments were deposited along the western portion of the study area during this period. These sediments include glaciomarine silt and clay containing drop-stones from melting icebergs and fossil shells.
Glacier retreat slowed and the ice margin stabilized in the central portion of the study area. Thick interbedded glaciomarine and glacial sediments deposited within the area of the fluctuating ice margin are included in the Fort Langley Formation. The glacier that persisted became land-based as isostatic rebound of the terrain and sea level rise started to occur. This led to initial development of drainage networks on the emerged terrain while the sea occupied the lower reaches of many of the paleovalleys (Kovanen 2002). A minor-readvancement during the Sumas Stade extended the glacier to the southwest where Sumas till and glaciofluvial deposits were deposited on top of the Fort Langley Formation.
Following the final retreat of the glacier, nonglacial Salish sediments were deposited by marine, fluvial, mass movement and aeolian processes. Peat accumulated in poorly drained areas in surface depressions or valley bottoms. Modern alluvial deposition from aggradation of the floodplains and major deltas of the Fraser and Nooksack occurred and continues today.
While the Quaternary history of the area has been well-studied and surficial geology is well mapped, the subsurface geology is less understood at the basin scale with cross-sections from existing geological maps providing only local context.
Hydrogeologic framework
The hydrogeologic framework varies across the study area because of partial glacier advances and the timing of glacial retreat. Aquifers exist in the study area within a complex sequence of glaciated materials (Halstead 1986; Vaccaro et al. 1998). The aquifer units generally consist of coarse-grained outwash deposited during glacial advances and retreats, proglacial deposits, and fluvial sediments deposited during glacial interstades. The semiconfining and confining units generally consist of till, glaciomarine, lacustrine and organic deposits. Alluvial sediments in the major broad alluvial valleys overlie eroded drift sequences. Bedrock that underlies these sediments forms the lateral and basal boundaries of the aquifer system.
Over 100 unconsolidated aquifers have been locally mapped within the Canadian portion of the study area (BCMECC 2023). A hydrogeologic framework and numerous fence diagrams within the central portion of the study area in Canada are detailed in Halstead (1986). Three regional aquifers including an alluvium aquifer within the Nooksack River valley, glaciofluvial sediments of the Sumas Drift (Fraser Aquifer), and a deeper confined aquifer in the Quadra Sands (Puget Aquifer) have been conceptualized within Washington State (Vaccaro et al. 1998). A regional framework for aquifers in quaternary sediments in the glaciated portions of the United States has also been developed (Haj et al. 2018; Yager et al. 2019).
General modeling approach
The general approach includes development of a geologic database, standardization of lithofacies based on descriptions from borehole logs, and creation of a 3D mesh to facilitate data selection for MLP training. MLP was then used to generate a geologic realization for three data selection alternatives with the best outcome verified against subsurface interpretations from independent studies and hydraulic indicators for the region. The study was executed using Python programming language with final visualization of results using ParaView (Ayachit 2015) (v5.10.1).
Geologic database
The geologic database includes information from public borehole logs and point data from surficial geology mapping. Approximately 13,900 borehole logs from various data collections managed by government agencies in both British Columbia and Washington State were included in the database (Fig. 1; Table 1). For the United States, water well records from WSWR required manual processing since multiple well reports from multiple wells can exist at a given location (Eungard 2014). The WSWR well report having the deepest depth within a 1-km grid spacing was selected for manual entry of lithology data. As such, the density of subsurface information in the United States is much lower compared to what was used in Canada (Fig. 1). Boreholes from environmental and geotechnical investigations are relatively shallow (<20 m deep), while water wells are deeper but generally less than 100 m below ground surface. Boreholes from oil and gas wells were generally the deepest but are sparsely distributed within the study area.
Borehole data were augmented with ~40,000 data points based on geomorphic mapping by Kovanen and Slaymaker (2015). This mapping provides a single representation of surficial deposits for the study area recreated using surficial mapping (Dunn and Ricketts 1994; Washington Division of Geology and Earth Resources 2016). Lithofacies were assigned predominantly based on geomorphic units with consideration of material descriptions from surficial mapping (Table 2). Point data were generated using a grid spacing of 1 km as well as vertical points every 5 m to a depth of 100 m to provide additional vertical coverage where bedrock outcrops were mapped.
Lithofacies standardization
Material descriptions of lithology from borehole logs were first compiled and processed using various text-cleaning methods (e.g. correcting spelling mistakes, standardizing short-forms and abbreviations). After text cleaning, fewer than five words were typically used with two or three words being the most common to describe lithology. The processed lithology was classified into lithofacies using a semiautomated approach that included NLP techniques. The multistep process is described in the electronic supplementary material (ESM). Classification was primarily based on size and composition of grains using a modification of the Unified Soil Classification System (USCS). Hydraulic characteristics (e.g. hydraulic conductivity, yield estimates) were also considered to assign material grouping codes (see Table S3 in the ESM). The original 20,000 unique material descriptions were consolidated into the top 45 descriptor combinations, grouped into 10 material grouping codes, and classified into five lithofacies (coarse, fines, clay, till, bedrock) as shown in Table 2. This captured 92% of the total thickness from the processed lithology dataset.
Mesh generation
A mesh was created using the Pyvista (Sullivan and Kaszynski 2019) Python package to generate a 3D representation of the study area and to facilitate processing lithofacies. The lateral boundaries of the mesh were based on the extent of Quaternary mapping, excluding the land north of Burrard Inlet and intermountain valleys. A cell size of 200 m wide by 200 m long and 5 m high was used to capture the major variability in lithofacies for regional modelling purposes. A uniform vertical cell height of 5 m was established based on a statistical review of lithofacies thickness and depth. The greatest generalization using this approach occurs near the surface where lithofacies tend to be less than 5 m thick.
The top of the mesh was generated based on a digital elevation model (DEM) that combined topographic and bathymetric data (Canadian Hydrographic Service 2019; DMTI Spatial Inc. 2002; US Geological Survey 2019; Finlayson et al. 2000). All datasets were reprojected to NAD 1983 UTM zone 10 with elevation units in metres above sea level (masl) and then combined. A cell spacing of 30 m and automatic filling of elevation voids was initially completed followed by resampling of the surface to generate a DEM with 200-m cell spacing. The bottom of the mesh was established at an elevation of –150 masl given the vertical extent of lithofacies that could act as aquifers, which also corresponds to the deepest extent considered in groundwater modelling studies within the region (Golder 2005). The resultant mesh has over 3.3 million cells with a surface area of ~670 km2.
Data selection alternatives
Three data selection alternatives were considered to evaluate the training and prediction capabilities of MLP. Alternative 1 consists of lithofacies spatially represented using coordinates and interval depths from the geologic database. Alternative 2 considers a 3D mesh and uses the cell centroid coordinates and the lithofacies mode within each cell to provide a more regularly spaced dataset. To determine the lithofacies mode, a point cloud was generated spacing data vertically every metre for each interval. The most frequently occurring lithofacies within each cell was then used as the target for training. Like alternative 2, alternative 3 uses the lithofacies mode within each cell but takes the coordinates of the cell centroid and maps them on a 2D grid using SOM before being used in the MLP algorithm. Table 3 provides an overview of each data selection alternative, including the approach for input features and targets as well as the number of samples.
A representative distribution of samples for each alternative is shown in Fig. 4 to highlight spatial differences. Alternative 2 has fewer lateral locations compared to Alternative 1 because cell centroids are used instead of well locations. However, there are more samples for alternative 2 in the vertical direction since intermediate points are added to intervals greater than 5 m thick. For alternative 3, the spatial distribution of lithofacies is now assigned using coordinates from 2D mapping. Data are grouped as clusters with multiple lithofacies assigned to the same 2D coordinates. All three alternatives have imbalanced lithofacies distributions, meaning that the lithofacies are not represented equally, but the distribution is not considered extreme (e.g. ratios below 1:5).
Machine learning
MiniSom (v. 2.3.0, Vettigli 2018) and scikit-learn (v.0.24.2, Pedregosa et al. 2011) were used to implement the SOM and MLP algorithms respectively. For SOM, normalization was used to scale the coordinates of the cell centroids. For the first stage, the elevation of each cell centroid was set to zero; therefore, only the normalized eastings and northings were used to initially train the SOM. For the second stage, the normalized northing, easting, and elevation of the cell centroids were used to refine mapping of the samples onto the 2D grid. SOM hyperparameters selected for optimization include sigma and learning rate. Hyperopt (Bergstra et al. 2013) was used for hyperparameter optimization to determine the best combination that would result in the lowest error.
For MLP, training and testing subsets from the datasets were made using an 80 and 20% split, respectively. A stratified splitting approach was used to ensure each set contains approximately the same percentage of lithofacies as the original dataset. Input data were transformed using the standardization scaling method. Four hyperparameters (hidden layer size, alpha, batch size, and initial learning rate) were selected for optimization. These hyperparameters were chosen because they appeared to have the greatest impact on MLP performance when using the ReLU activation function and Adam solver. The hyperparameter default values were used and generally modified by an order of magnitude to establish upper and lower limits of the search space considered for optimization. A stratified threefold cross-validation grid search implemented with the GridSearchCV optimizer in scikit-learn was used to objectively select the combination of hyperparameters that achieved the best performance on the training dataset.
Once the hyperparameters were established, the MLP was trained using all the training data. The testing data were then used to evaluate the generalization performance of the model when making predictions on unseen data. The performance metrics used to evaluate the models include log-loss (i.e. cross-entropy), balanced accuracy, and the confusion matrix. MLP tries to minimize log-loss using stochastic gradient descent by adjusting weights during training; therefore, a lower log-loss indicates better performance. Balanced accuracy is the average fraction of relevant instances that were correctly predicted. The balance accuracy scores ranged from 0 to 1 with values closer to 1, indicating good accuracy. A confusion matrix is a common technique used to summarize the performance of a classification algorithm. It highlights the errors being made by the MLP or what is making the MLP ‘confused’ when making predictions. Once the training model was validated and tested, all the data were used to train and establish the final weights of the predictive model.
The main purpose of the predictive model is to take lithofacies data for each alternative and make predictions where information is not available in the study area. Cell centroid coordinates from the mesh were used for alternatives 1 and 2, while the coordinates from 2D SOM mapping were used for alternative 3. The same scaling methods were used to transform inputs prior to running the predictive models. This resulted in the prediction of lithofacies at ~3,336,000 unique cell centroids and 7,700 unique SOM neurons. The output from the predictive model includes lithofacies and probability predictions at each cell centroid. The negative log of probability (between 0 and 1) for the predicted lithofacies in each cell was used to calculate entropy. A low entropy indicates less uncertainty while higher values suggest more uncertainty. The model outputs were assigned as attributes to the mesh using PyVista and then exported as a voxel model for 3D viewing using Paraview.
Results
MLP training and testing
MLP training for each alternative was evaluated based on a review of log-loss, balanced accuracy scores, and confusion matrix results. At the end of training, alternative 2 had the lowest log-loss at 0.8, followed by Alternative 3 at 1.10 and Alternative 1 at 1.15. For a naïve classification model, which assumes the same probability for each lithofacies, the log-loss score would be 1.3, e.g. –log(0.2). The MLP models for all three alternatives achieve a lower log-loss compared to a naïve classification model.
MLP tuning (e.g. hyperparameter optimization, number of epochs) improved training performance compared to the use of default MLP values for all alternatives. Tuning improved balanced accuracy scores by 9–17%. Alternative 2 had the highest training and testing scores of 65 and 60%, respectively. Model variance (difference between training and testing score) is lowest for alternative 3 and highest for alternative 2 with values below 5%. The balanced accuracy scores indicate modelling results with a high bias and low variance. Typically, training and testing accuracy scores above 80% indicate good performance for other machine learning applications (Brownlee 2022); however, this may not be achievable given the multiclass classification problem and variability in subsurface data for this study.
The confusion matrix from the testing results showed that alternative 2 had the best precision for all lithofacies. Bedrock was most accurately predicted despite bedrock samples occurring less frequently in all three datasets. This could be attributed to the continuity of bedrock once it is encountered. The prediction performance for unconsolidated lithofacies follows the same trend for all alternatives where ‘coarse’ has the second highest precision with lower precision in descending order for clay, fines, and till. This may be attributed to the distribution of unconsolidated lithofacies or could be associated with discontinuity of unconsolidated lithofacies in the subsurface and the underfitting of the model to capture this level of complexity. Clay and till are most often confused with each other, while coarse seems to be confused most often with till and fines. Fines are commonly confused with till and clay.
Predictions
A cross-sectional view of predicted lithofacies for each alternative is shown in Fig. 5. Alternative 1 appears to be the most underfit (e.g. unable to model training data nor make predictions), which is expected given that it has the lowest balanced accuracy score. Alternative 2 shows more complexity in the subsurface compared to alternatives 1 and 2. Alternative 3 is more comparable to alternative 2, but is more generalized. This suggests the dimensions of the 2D SOM (used in alternative 3) could be inadequate to represent the subsurface complexity. Alternative 2 resulted in a larger coverage and higher probability for coarse, clay and till compared to the other alternatives. As shown in Fig. 5, the spatial coverage of higher entropy values is generally greater for alternatives 1 and 3 compared to alternative 2. The average entropy values of 0.72, 0.28, and 0.64 for alternatives 1, 2, and 3, respectively, indicate the most confidence in alternative 2 results. Extrapolation was required to make predictions given the limited distribution of deep boreholes available for the region.
Compared to alternatives 1 and 3, MLP predictions for alternative 2 had the lowest log-loss, highest balanced accuracy scores for both training and testing, and highest precision for all lithofacies. Based on the cross-sectional reviews, alternative 2 shows more complexity in the subsurface compared to alternatives 1 and 3. Alternative 2 may have performed better because lithofacies were averaged based on the cell mode which may have reduced some noise associated with the spatial variability of data for alternative 1. The size of the 2D SOM for alternative 3 may have been inadequate to represent the subsurface complexity. Alternative 2 was selected as the preferred alternative based on better performance metrics, lower entropy, and cross-sectional reviews that indicated more complexity in the subsurface compared to the other alternatives.
Verification
Geologic modelling results from MLP predictions using alternative 2 data were verified by considering published interpretations of the subsurface from independent studies. Verification efforts focused on local areas (Fig. 6), including the Township of Langley (TOL) and the Nicomekl-Serpentine Valley between Langley and Surrey because of the high density of wells, number of mapped aquifers, and availability of groundwater modelling reports.
Township of Langley
Geologic interpretations within the Township of Langley (TOL) published by Golder (2005) were reviewed to verify MLP predictions. The approximate locations of geologic cross-sections A–A′ and B–B′ within the Langley Uplands are shown in Fig. 6. Comparisons of the geologic interpretations provided by Golder (2005) and the alternative 2 MLP geologic model at the cross-section locations are provided in Fig. 7 (A–A′) and Fig. 8 (B–B′).
Cross-section A–A′ (Fig. 7) from MLP predictions shows aquifer (AQ) 35 separately from AQ33 and AQ1144 consistent with the interpretation by Golder (2005). The distribution of coarse material lumps AQ33 and AQ1144 into one aquifer unit which may be reasonable given arbitrary cut-offs used by Golder to establish aquifer extents. Provincial mapping of AQ33 overlies AQ1144, although this is not shown on the Golder cross-section. There are interconnected areas between the three aquifers elsewhere in the TOL area described by Golder 2005. Coarse material from MLP predictions is interpolated at the surface where AQ1144 has been mapped and extends to deeper depths compared to interpretations by Golder. In general, the MLP geologic model performs well at representing the subsurface in this area with the potential issue of extrapolation at depth.
For geologic cross-section B–B′ (Fig. 8), three distinct aquifer units are represented in the MLP geologic model while five aquifers are interpreted by Golder (2005). The original cross-section B–B’ from Golder shows the aquifer material for AQ32 as sand and gravel but is described as ‘a body of fine sands, sand and locally gravel and till’ in the report. The MLP geologic model shows a relatively large fines unit approximately 20 m thick in this area. AQ33, AQ1193, and AQ32 are lumped in the lower coarse unit from MLP predictions and have a larger extent compared to interpretations by Golder. The description for AQ1193 in Golder (2005) indicates it is located between +20 and –20 masl; therefore, it may be reasonable for the lower coarse unit to extend below 0 masl in this area. In general, the MLP cross-sections show a more generalized representation of aquifers and greater connectivity of permeable units compared to the Golder cross-sections.
Golder identified 18 major aquifers based on hydrostratgraphic interpretation of geologic units. For these major aquifers, permeable units that overlap by at least 10% and any aquitard between overlapping units less than 10 m thick were considered by Golder as ‘well-connected hydraulically’. Coarse lithofacies with a probability above 50% from MLP modelling were arbitrarily selected and grouped by connectivity to represent geologic areas as ‘well-connected hydraulically’ (Fig. 9). This includes lithologic materials described as sand, sand and gravel, and gravel, but does not include fines such as ‘silty sand’ which may be important to this local area for aquifer characterization purposes (Figs. 7 and 8). Grouping by connectivity assigns a zone for each group of connected cells.
The unconfined aquifers including the Brookswood Aquifer (AQ41), Fort Langley Aquifer (AQ41), Hopington AB Aquifer (AQ35), and Abbotsford A (AQ15) are represented by the MLP geologic model; however, some of the linear features, interpreted as meltwater channels by Golder (2005), and extending from the main volume of the Brookswood (AQ41) and Abbotsford A (AQ15) aquifers are not reproduced using MLP. A finer mesh resolution and additional surficial data points could potentially be used to model this connection.
The MLP geologic model shows a large connected volume that consolidates several of the mapped aquifers interpreted in the area. This consolidated representation of aquifers may be plausible given the potential for interconnection noted by Golder (2005); however, the use of a higher probability cut-off limit may be more representative of aquifer extents which would be more conservative for water exploration but less conservative for water management purposes. Some of the deeper aquifers in Golder (2005) are only partially represented or not reproduced by the MLP geologic model, likely due to aquifer materials with a greater fines content (e.g. silty sand) that are locally significant as an aquifer unit but are not captured by the coarse lithofacies. A more detailed review of data within this area would be required to support conceptualization of intertill aquifers given the limited frequency of data points categorized as till.
Nicomekl-Serpentine Valley
Most flowing artesian wells within the study area are within the Nicomekl-Serpentine valley in the Surrey-Langley area (Fig. 6). Flowing artesian wells can occur when the aquifer is confined by low permeability materials or where there are large upward hydraulic gradients if the aquifer is unconfined. Well drilling advisory currently exists for AQ58 (Nicomekl-Serpentine; BC Ministry of Forests, Lands, Natural Resource Operations and Rural Development 2018) and are proposed for the western portion of AQ33 (West Aldergrove; Johnson et al. 2022) due to the potential for flowing artesian conditions.
AQ58 includes two permeable units consisting of a shallower unit generally occurring between –60 and –90 masl in the local upland area and a deeper unit up to 20 m thick generally occurring between –120 and –150 masl that underlies the upland but also extends along the northern portion of TOL (BCMECC 2016a). AQ33 is described as an intertill aquifer consisting of two permeable units including a shallower unit between 5–15 m thick and a deeper unit up to 20 m thick, both sloping westward and merging along the western extent of the aquifer (BCMECC 2016b).
Geologic cross-sections that intersect AQ58 and AQ33 are shown in Fig. 10 based on the alternative 2 MLP geologic model. MLP predictions show upper and lower permeable units consistent with the description provided for AQ58 and AQ33. The continuity of the lower permeable unit for AQ58 is typically associated with fines. There are several overlapping confined aquifers in the Nicomekl-Serpentine Valley that were difficult to distinguish based on the MLP geologic model. Most flowing artesian wells appear to be screened below confining material; however, several may be screened in unconfined AQ35. The northward and westward sloping topography around unconfined AQ35 likely contributes to flowing artesian conditions. The majority of flowing artesian conditions appear to be attributed to low permeability material overlying aquifer material and the topographical transition from upland to lowland as discussed in the following.
The distribution of clay in relation to flowing artesian wells in the Nicomekl-Serpentine valley is shown in Fig. 11. Till is not shown since the interpolated spatial extent is limited. The majority of clay cells are connected throughout the area of interest with variability in both the vertical and horizontal directions. The top view in Fig. 11 shows clay intersecting the tops of most flowing artesian wells. The bottom view shows the bottom of most flowing wells extending through the clay. This suggests the model adequately represents the confining unit that contributes to flowing artesian conditions in the Nicomekl-Serpentine Valley. Aquifer material confined by clay includes sand and gravel (e.g. standardized as coarse) and finer-grained material like silty sand (e.g. standardized as fines). Most flowing artesian wells appear to be screened in confined aquifers; however, several are screened in unconfined AQ35 where artesian conditions exist due to the topographical transition from upland to lowland.
Discussion
Three data selection alternatives were considered to evaluate the training and prediction capabilities of MLP. The main differences between the datasets include the level of effort to process data, number of samples, frequency distribution of lithofacies, and spatial distribution of data. Alternative 1 involved the least amount of processing and had the lowest number of samples, typically under-representing bedrock and clay due to limited point data representing these thick intervals. Alternative 2 required more data processing effort but resulted in a larger dataset that was more regularly spaced in the vertical direction and had fewer lateral locations that potentially reduced noise in the data by using the lithofacies mode. For alternative 3, the dataset from alternative 2 was lumped into spatial clusters with consideration of the entire mesh domain. Alternative 3 required the greatest amount of effort for data processing because of hyperparameter tuning and training with SOM prior to using MLP. Hyperparameter optimization and ANN training can introduce variability in results although effort was taken to automate evaluation of multiple combinations of hyperparameters and to include cross-validation (e.g. multiple subsets) to inform training inline with current best practices (Pedregosa et al. 2011; Vettigli 2018). Testing data were also used to evaluate the generalization performance of the MLP models when making predictions on unseen data.
The MLP models for all three alternatives achieve a lower log-loss compared to a naive classification model which assumes the same probability for each lithofacies. Alternative 2 data resulted in the best MLP performance based on better performance metrics and cross-sectional reviews that indicated more complexity in the subsurface as expected for the study area. The combination of SOM and MLP (alternative 3) did not perform the best despite the enhanced pattern recognition anticipated using this approach. Heuristics used to size the SOM grid may have been insufficient to spatially cluster data in a manner to reproduce subsurface complexity.
A possible approach to improve MLP predictions may be to exclude bedrock as a category. Alternatively, a bedrock surface could be used to constrain cells within the 3D mesh used to predict unconsolidated lithofacies. This aligns with modelling objectives that focus on glacial deposits while representing bedrock as a surface. The uncertainty associated with lithofacies predicted at deeper depths is likely underestimated using entropy as a metric given the limited distribution of deep subsurface data. It may be beneficial to combine entropy with another metric based on data density or proximity to a cell with data to better reflect areas where data extrapolation occurs and uncertainty increases. In addition, the impact of hyperparameter selection on model development was not evaluated and could introduce additional uncertainty in predictions.
In general, the geologic model generated using alternative 2 data and the MLP algorithm performed well at representing the subsurface in the Langley area based on cross-sectional review with the potential issue of extrapolation at depth. Confining units that contribute to artesian conditions in the Nicomekl-Serpentine Valley were also adequately represented by MLP predictions.
This study provides an initial conceptualization of glacial sediments in the subsurface of the Fraser-Whatcom Basin. Material descriptions of lithology (e.g. borehole log) provide the necessary vertical coverage but are likely insufficient for lateral characterization of continuous subsurface features as discussed in other studies (Cummings et al. 2012; Frind et al. 2014; Jørgensen et al. 2015). This study did not evaluate the impact of data density on the predicted results. Additional soft data (e.g. geophysics, fence diagrams from Halstead 1986) that provide spatially continuous or profile sources of information could be incorporated into the geologic database to improve the representation of subsurface conditions.
Conclusions
The feasibility of using MLP to interpret glacial deposits in the subsurface of the Fraser-Whatcom Basin was explored in this study. Based on the results, MLP appears to be a promising algorithm to solve multiclass classification problems related to modelling complex glacial deposits in the subsurface. This has the benefit of interpreting subsurface conditions using categorical data instead of numerical information which is typically more readily available for hydrogeologic applications. This study showed additional processing effort to create a more regular dataset using the lithofacies mode in each cell of the mesh produced better results compared to directly using borehole intervals. The stochastic nature of training MLP also makes it possible to generate multiple geologic models, which has been recommended to quantify the sensitivity of groundwater flow to geologic architecture as part of groundwater modelling (Poeter and Anderson 2005; Refsgaard et al. 2012; He et al. 2013; Lukjan et al. 2016).
References
Allen DM, Schuurman N, Deshpande A, Scibek J (2008) Data integration and standardization in cross-border hydrogeological studies: a novel approach to hydrostratigraphic model development. Environ Geol 53:1441–1453. https://doi.org/10.1007/s00254-007-0753-3
Anderson W, Woessner WW, Hunt RJ (2015) Applied groundwater modeling: simulation of flow and advective transport, 2nd edn. Academic, San Diego, CA
Ansah EO, Vo Thanh H, Sugai Y, Nguele R, Sasaki K (2020) Microbe-induced fluid viscosity variation: field-scale simulation, sensitivity and geological uncertainty. J Petrol Explor Prod Technol 10:1983–2003. https://doi.org/10.1007/s13202-020-00852-1
Armstrong JE (1957) Surficial geology of New Westminister map-area, British Columbia. Paper 57-5, Geological Survey of Canada, Ottawa
Armstrong JE (1976) Surficial geology, Mission, British Columbia. Map 1485A, 1 sheet, scale 1:50,000, Geological Survey of Canada. https://doi.org/10.4095/108875
Armstrong JE (1977) Surficial geology, Chilliwack, British Columbia. Map 1487A, 1 sheet, scale 1:50,000, Geological Survey of Canada. https://doi.org/10.4095/108877
Armstrong JE, Hicock SR (1979) Surficial geology, Vancouver, British Columbia. Map 1486A, 1 sheet, scale 1:50,000, Geological Survey of Canada. https://doi.org/10.4095/108876
Armstrong JE, Hicock SR (1980) Surficial geology, New Westminister, British Columbia. Map 1484A, 1 sheet, scale 1:50,000, Geological Survey of Canada. https://doi.org/10.4095/108874
Ayachit U (2015) The paraview guide: a parallel visualization application. Kitware, Inc., USA
Baykan NA, Yilmaz N (2010) Mineral identification using color spaces and artificial neural networks. Comput Geosci 36:91–97. https://doi.org/10.1016/j.cageo.2009.04.009
Bayless ER, Arihood LD, Reeves HW, Sperl BJS, Qi SL, Stipe VE, Bunch AR (2017) Maps and grids of hydrogeologic information created from standardized water-well drillers’ records of the glaciated United States. US Geol Surv Sci Invest Rep 2015-5105. https://doi.org/10.3133/sir20155105
Bergstra J, Yamins D, Cox DD (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. Proc. of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, June 2013
Bianchi M, Kearsey T, Kingdon A (2015) Integrating deterministic lithostratigraphic models in stochastic realizations of subsurface heterogeneity: impact on predictions of lithology, hydraulic heads and groundwater fluxes. J Hydrol 531:557–573. https://doi.org/10.1016/j.jhydrol.2015.10.072
British Columbia Ministry of Environment and Climate Change Strategy (BCMECC) (2016a) Aquifer classification worksheet (aquifer reference number: 33). https://s3.ca-central-1.amazonaws.com/aquifer-docs/00000/AQ_00033_Aquifer_Mapping_Report.pdf. Accessed March 2019
British Columbia Ministry of Environment and Climate Change Strategy (BCMECC) (2016b) Aquifer classification worksheet (aquifer reference number: 58). https://s3.ca-central-1.amazonaws.com/aquifer-docs/00000/AQ_00058_Aquifer_Mapping_Report.pdf. Accessed March 2019
British Columbia Ministry of Environment and Climate Change Strategy (BCMECC) (2016d) Ground water aquifers. https://catalogue.data.gov.bc.ca/dataset/ground-water-aquifers. Accessed March 2019
British Columbia Ministry of Forests, Lands, Natural Resource Operations and Rural Development (2018) Well drilling advisory flowing artesian conditions: Surrey and Langley, BC. https://www2.gov.bc.ca/assets/gov/environment/air-land-water/water/water-wells/surrey_and_langley_bc.pdf. Accessed March 2019
Brownlee J (2022) How to improve deep learning performance. Machine Learning Mastery. https://machinelearningmastery.com/improve-deep-learning-performance/. Accessed August 2022
Canadian Hydrographic Service (2019) NONNA bathymetric data. https://www.charts.gc.ca/data-gestion/resourcemaps-cartesressources-eng.html. Accessed December 2019
Clague JJ (1991) Quaternary stratigraphy and history of south-coastal British Columbia. In: Monger JWH (ed) Geology and geological hazards of the Vancouver region, southwestern British Columbia. Geol Surv Can Bull 481:181–192
Cracknell MJ, Reading AM (2014) Geological mapping using remote sensing data: a comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput Geosci 63:22–33. https://doi.org/10.1016/j.cageo.2013.10.008
Cummings DI, Russell HAJ, Sharpe DR (2012) Buried-valley aquifers in the Canadian prairies: geology, hydrogeology, and origin. Earth Science Sector (ESS) Contribution 20120131. Can J Earth Sci 49(9):987–1004. https://doi.org/10.1139/e2012-041
DMTI Spatial Inc. (2002) Digital elevation model (DEM). NAD83, UTM projection, 30 m resolution from 1:50,000 scale digital mapping. SFU Library. Accessed on December 2019
Dunn D, Ricketts B (1994) Surficial geology of Fraser Lowlands digitized from GSC Maps 1484A, 1485A, 1486A, and 1487A (92 G/1, 2, 3, 6, 7; 92 H/4). Geol Surv Can Open File 2894. https://doi.org/10.4095/194084
Easterbrook DJ (1986) Stratigraphy and chronology of Quaternary deposits of the Puget Lowland and Olympic Mountains of Washington and the Cascade Mountains of Washington and Oregon. Quatern Sci Rev 5:145–159. https://doi.org/10.1016/0277-3791(86)90180-0
Eungard DW (2014) Models of bedrock elevation and unconsolidated sediment thickness in Puget Lowland, Washington. US Geol Surv Open-File Rep 2014-04. https://doi.org/10.3133/ofr20181115
Finlayson DP, Haugerud RA, Greenberg H, Logsdon MG (2000) Puget Sound digital elevation model. University of Washington. https://www.ocean.washington.edu/data/pugetsound/asciigrid.html. Accessed December 2019
Frind EO, Molson JW, Sousa MR, Martin PJ (2014) Insights from four decades of model development on the Waterloo Moraine: a review. Can Water Resour J 39(2):149–166. https://doi.org/10.1080/07011784.2014.914799
Fuentes I, Padarian J, Iwanaga T, Vervoort RW (2020) 3D lithological mapping of borehole descriptions using word embeddings. Comput Geosci 141:104516. https://doi.org/10.1016/j.cageo.2020.104516
Golder Associates (Golder) (2005) Comprehensive groundwater modelling assignment, Township of Langley. Prepared for the Township of Langley, BC
Haj AE, Soller DR, Reddy JE, Kauffman LJ, Yager RM, Buchwald CA (2018) Hydrogeologic framework for characterization and occurrence of confined and unconfined aquifers in quaternary sediments in the glaciated conterminous United States: a digital map compilation and database. US Geol Surv Data Series 1090. https://doi.org/10.3133/ds1090
Halstead EC (1986) Ground water supply: Fraser Lowland, British Columbia. Scientific series, Canada Inland Waters Directorate, no. 145, NHRI paper no. 26, NHRI, Ontario
Hamilton TS, Ricketts BD (1994) Contour Map of the Sub-Quaternary Bedrock Surface, Strait of Georgia and Fraser Lowland. Geol Surv Can Bull 481:193–196
He X, Sonnenborg TO, Jørgensen F, Høyer AS, Møller RR, Jensen KH (2013) Analyzing the effects of geological and parameter uncertainty on prediction of groundwater head and travel time. Hydrol Earth Syst Sci 17:3245–3260. https://doi.org/10.5194/hess-17-3245-2013
Johnson B, Allen DM, Wei M (2022) Mapping the likelihood of flowing artesian conditions in the Okanagan Basin and Fraser Valley, British Columbia. Water Science Series, Province of British Columbia, Victoria, BC
Jones MA (1999) Geologic framework for the Puget Sound Aquifer System, Washington and British Columbia. US Geol Surv Pap 1424-C. https://doi.org/10.3133/pp1424C
Jørgensen F, Høyer AS, Sandersen PBE, He X, Foged N (2015) Combining 3D geological modelling techniques to address variations in geology, data type and density: an example from Southern Denmark. Comput Geosci 81:53–63. https://doi.org/10.1016/J.CAGEO.2015.04.010
Kanevski M, Pozdnoukhov A, Timonin V (2009) Machine learning for spatial environmental data, 1st edn. EPFL Press, Lausanne, Switzerland. https://doi.org/10.1201/9781439808085
Kearsey T, Williams J, Finlayson A, Williamson P, Dobbs M, Marchant B, Kingdon M, Campbell D (2015) Testing the application and limitation of stochastic simulations to predict the lithology of glacial and fluvial deposits in central Glasgow, UK. Eng Geol 187:98–112. https://doi.org/10.1016/j.enggeo.2014.12.017
Kovanen DJ (2002) Morphologic and stratigraphic evidence for Allerod and Younger Dryas age glacier fluctuations of the Cordilleran Ice Sheet, British Columbia, Canada and Northwest Washington, U.S.A. Boreas 31:163–184
Kovanen DJ, Slaymaker O (2015) The paraglacial geomorphology of the Fraser Lowland, southwest British Columbia and northwest Washington. Geomorphology 232:78–93. https://doi.org/10.1016/j.geomorph.2014.12.021
Lukjan A, Swasdi S, Chalermyanont T (2016) Importance of alternative conceptual model for sustainable groundwater management of the Hat Yai Basin, Thailand. Procedia Eng 154:308–316. https://doi.org/10.1016/j.proeng.2016.07.480
Morgan SE (2018) Investigating the role of buried valley aquifer systems in the regional hydrogeology of the Central Peace Region in Northeast British Columbia. MSc Thesis, Simon Fraser University, BC, Canada. http://summit.sfu.ca/item/17906
Mustard PS, Rouse GE (1994) Stratigraphy and evolution of Tertiary Georgia Basin and subjacent Upper Cretaceous sedimentary rocks, southwestern British Columbia and northwestern Washington State. In: Monger JWH (ed) Geology and geologic hazards of the Vancouver region, southwestern British Columbia. Geol Surv Can Bull 481:97–169
Padarian J, Fuentes I (2019) Word embeddings for application in geosciences: development, evaluation, and examples of soil-related concepts. Soil 5:177–187. https://doi.org/10.5194/soil-5-177-2019
Pasanen AH, Okkonen JS (2017) 3D Geological models to groundwater flow models: data Integration between GSI3D and groundwater flow modelling software GMS and FeFlow®. Geol Soc Spec Publ 408(1):71–87. https://doi.org/10.1144/SP408.15
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830. https://doi.org/10.48550/arXiv.1201.0490
Poeter E, Anderson D (2005) Multimodel ranking and inference in ground water modeling. Ground Water 43(4):597–605. https://doi.org/10.1111/j.1745-6584.2005.0061.x
Rajaee T, Ebrahimi H, Nourani V (2019) A review of the artificial intelligence methods in groundwater level modeling. J Hydrol 572:336–351. https://doi.org/10.1016/j.jhydrol.2018.12.037
Refsgaard JC, Christensen S, Sonnenborg TO, Seifert D, Højberg AL, Troldborg L (2012) Review of strategies for handling geological uncertainty in groundwater flow and transport modeling. Adv Water Resour 36:36–50. https://doi.org/10.1016/j.advwatres.2011.04.006
Rizzo DM, Dougherty DE (1994) Characterization of aquifer properties using artificial neural networks: neural kriging. Water Resour Res 30(2):483–497. https://doi.org/10.1029/93WR02477
Russell HAJ, Brennand TA, Logan C, Sharpe DR (1998) Standardization and assessment of geological descriptors from water well records, Greater Toronto and Oak Ridges Moraine areas, southern Ontario. Geol Surv Can Current Res 1998-E:89–102. https://doi.org/10.4095/209956
Sahoo S, Jha MK (2017) Pattern recognition in lithology classification: modeling using neural networks, self-organizing maps and genetic algorithms. Hydrogeol J 25(2):311–330. https://doi.org/10.1007/s10040-016-1478-8
Schorpp L, Straubhaar J, Renard P (2022) Automated hierarchical 3D modeling of quaternary aquifers: the ArchPy approach. Front Earth Sci 10:884075. https://doi.org/10.3389/feart.2022.884075
Standard, A.S.T.M. (2011) D2487-11 standard practice for classification of soils for engineering purposes (Unified Soil Classification System). ASTM Int., West Conshohocken, PA
Sullivan CB, Kaszynski A (2019) PyVista: 3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK). J Open Source Softw 4(37):1450. https://doi.org/10.21105/joss.01450
Toth A, Havril T, Simon S, Galsa A, Santos FAM, Muller I, Madl-Szonyi J (2016) Groundwater flow pattern and related environmental phenomena in complex geologic settings based on integrated model construction. J Hydrol 539:330–344. https://doi.org/10.1016/j.hydrol.2016.05.038
United States Geologic Survey (2019) National Elevation Dataset (NED). https://gdg.sc.egov.usda.gov/Catalog/ProductDescription/NED.html. Accessed July 2019
Vaccaro JJ (1992) Plan of study for the Puget-Willamette lowland regional aquifer system analysis, Western Washington and Western Oregon. US Geol Surv Water Resour Invest Rep 91-4189. https://doi.org/10.3133/wri914189
Vaccaro JJ, Hansen AJ, Jones MA (1998) Hydrogeologic framework of the Puget Sound Aquifer System. US Geol Surv Prof Pap 1424–D. https://doi.org/10.3133/pp1424D
Vettigli G (2018) MiniSom: minimalistic and NumPy-based implementation of the self organizing map. https://github.com/JustGlowing/minisom/. Accessed May 2020
Vo Thanh H, Sugai Y, Nguele R, Sasaki K (2019) Integrated workflow in 3D geological model construction for evaluation of CO2 storage capacity of a fractured basement reservoir in Cuu Long Basin, Vietnam. Int J Greenh Gas Control 90:102826. /https://doi.org/10.1016/j.ijggc.2019.102826
Ward BC, Thomson B (2004) Late Pleistocene stratigraphy and chronology of lower Chehalis River Valley, southwestern British Columbia: evidence for a restricted Coquitlam Stade. Can J Earth Sci 41(7):881–895. https://doi.org/10.1139/e04-037
Washington Division of Geology and Earth Resources (2016) Surface geology, 1:100,000: GIS data. Washington Division of Geology and Earth Resources Digital Data Series DS-18, version 3.1, Washington Division of Geology and Earth Resources, Olympia, WA
Yager RM, Kauffman LJ, Soller DR, Haj AE, Heisig PM, Buchwald CA, Westenbroek SM, Reddy JE (2019) Characterization and occurrence of confined and unconfined aquifers in Quaternary sediments in the glaciated conterminous United States (ver. 1.1). US Geol Ssurv Sci Invest Rep 2018-5091. https://doi.org/10.3133/sir20185091
Acknowledgements
This research was supported by a Natural Sciences and Engineering Research Council (NSERC) Discovery Grant to D. Allen.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
On behalf of all authors, Zidra Hammond states that there is no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hammond, Z., Allen, D.M. Evaluating the feasibility of using artificial neural networks to predict lithofacies in complex glacial deposits. Hydrogeol J 32, 509–526 (2024). https://doi.org/10.1007/s10040-023-02726-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10040-023-02726-2