Introduction

Bioturbation is a geological phenomenon in which sediments are displaced by the activity of living organisms. It can alter not only the textural but also the petrophysical characteristics of a rock (Bromley 1996). Past research on diverse formations within the Subei region has confirmed the significant impact of bioturbation on reservoir quality, underscoring its crucial role in determining a reservoir's productivity. These studies identified previously overlooked secondary reservoir targets that could increase reserve estimates in petroleum fields (Quaye et al. 2019, 2022, 2023). Analyzing textural parameters within sedimentary rocks poses significant challenges, particularly where bioturbation is complex. This complexity arises from the wide variation in borings, rootlets, and the size and intricacy of burrows, as well as frequent and rapid vertical and horizontal changes, often due to factors such as crosscutting burrows and the arrangement of trace-forming endobenthic communities (Bromley and Ekdale 1984). Over time, researchers have adopted various approaches to describing bioturbation, ranging from early classification attempts (Schäfer 1956) and semi-quantitative measurements (Reineck 1963) to quantitative estimations (Dorador et al. 2014) and mathematical modelling (Guinasso and Schink 1975).

Bioturbation has long affected reservoir quality through modifications to porosity and permeability and through its effects on depositional stability (Pemberton and Gingras 2005). Burrowing can increase porosity and permeability by opening pathways among grains, or reduce these properties where burrows compact bordering sediments or are filled with finer material (Tonkin et al. 2010). Bioturbation complicates depositional interpretations, homogenizes the microstratigraphic distribution of sediment layers, and affects redox chemistry by mixing oxygenated surface sediments with subsurface reducing zones. It also influences compaction and cementation, which can either preserve or modify porosity and permeability. In general, bioturbation causes heterogeneity at different scales that can affect fluid flow and reservoir performance, making it a key factor in efficient hydrocarbon recovery (Gingras et al. 1999; Hovikoski et al. 2008).

Machine learning, a component of artificial intelligence, comprises diverse data processing approaches such as classification, regression, and clustering. It can be categorized into supervised and unsupervised techniques, the two primary branches of the field (Hall 2016). Supervised learning involves training a computer algorithm on labelled input data (herein the training wells data) to predict specific outputs. Through iterative training, the algorithm learns to recognize hidden patterns and connections between the input and output data, ultimately allowing it to provide precise predictions when given new, unlabelled data (herein the test dataset) (Mohri et al. 2012). Mandal and Rezaee (2019) asserted that the use of machine learning, especially when combined with wells data, has gained significant traction in tackling geoscientific issues within sectors of the oil and gas industry. It has been widely applied in geological, geophysical, and petrophysical characterization (Deshenenkov and Polo 2020; Gharavi et al. 2022; Hansen et al. 2023; Mohammadinia et al. 2023). Fomel and Liu (2017) interpreted seismic data to identify subsurface geological structures and predict reservoir properties. Machine learning models can classify minerals and rocks based on their spectral signatures obtained from remote sensing data (Crosta and Souza Filho 1998). Sarma and Gupta (2000) used machine learning for reservoir characterization, predicting porosity, permeability, and lithology from well logs and seismic data. These examples demonstrate the diverse range of applications for machine learning in geology, geophysics, and petrophysics, helping researchers and professionals better understand and characterize subsurface geological formations.

The degree of bioturbation (vol.% of bioturbation) is mostly interpreted via visual estimation using the Bioturbation Index (BI) (Taylor and Goldring 1993). This can be a limitation that further compounds the confusion between biogenic and diagenetic structures. This paper aims to improve the prediction of bioturbation in reservoir rocks via machine learning, as has been done in some studies (e.g., Tarabulski and Reinhardt 2020; Zhang et al. 2021), and to serve as a framework for future bioturbation-related machine learning studies. This work combines the Support Vector Machine (SVM), k-Nearest Neighbour (k-NN), and Linear Discriminant Analysis (LDA) classification algorithms with wells data to effectively determine bioturbated zones in selected reservoir facies, reducing human error and providing more accurate outcomes.

Geological setting

Located on the western periphery of the Yellow Sea in northern Jiangsu province, eastern China, the Subei basin (Fig. 1) is characterized as a fault sag basin. Its geological history dates back to the Late Cretaceous period, when it began as a rift, and it covers an estimated area of around 35,000 square kilometres (Song et al. 2010). The basin's formation can be divided into two primary rift phases, the first occurring between 83 and 54.9 million years ago and the second between 54.9 and 38 million years ago (Yang and Chen 2003; Chen 2010). The intervals of rift activity were separated by significant tectonic events known as the Wubao and Sanduo events, associated with thermal subsidence, as documented by Liu et al. (2014). Liu et al. (2017) suggested that the Wubao event induced faulting and segmentation within the basin. In parallel, the Sanduo event resulted in notable uplift, subsequently causing erosion of the Oligocene and the strata beneath it. This process resulted in the formation of an angular unconformity between the Neogene and the formations beneath it (Yi et al. 2003).

Fig. 1

a Description and maps detailing the general location of the Subei Basin, positioned west of the Yellow Sea. b Delineation of the different depressions within the Subei Basin, its surrounding tectonic features, and the wells examined in this research. Modified after Zhou et al. (2019)

The Paleocene Funing Formation (E1f)

The Paleocene Funing Formation's lowermost member (E1f1) comprises 350 to 800 m of red beds (Fig. 2). These layers consist of intermixed brownish-red, very fine to fine-grained sandstones, siltstones, mudstones, and sporadic occurrences of greyish-green, very fine to fine-grained sandstones and siltstones (Zhang et al. 2006; Deng 2014). The second member (E1f2) comprises alternating layers of lacustrine carbonates and fine grey sandstones, spanning 70 to 110 m (Fig. 2), succeeded by a dark grey mudstone layer 60 to 120 m thick (Liu et al. 2012; Luo et al. 2013; Shao et al. 2013). The third member (E1f3) includes interbedded grey, very fine-grained mudstones and sandstones, 200 to 300 m in thickness (Fig. 2). Zhang et al. (2006) suggested that this section is topped by the fourth member, a layer of dark grey mudstone that ranges from 300 to 400 m thick.

Fig. 2

In-depth analysis of the E1f1, E1f2, and E1f3 (65.0–56 Ma), involving a comprehensive study of their stratigraphy, facies, petroleum systems, tectonic occurrences, and evolutionary changes. Adapted from Quaye et al. (2022)

Jinhu depression

The Jinhu Depression is situated in the southwest of the Subei Basin and is the largest depression within the basin, covering an area of approximately 5,500 square kilometres (Fig. 1B). The Jianhu Uplift lies to the northwest, the Zhangbaling Uplift to the southwest, and the Sunan Uplift to the southeast (Li et al. 2011; Liu et al. 2012; Shao et al. 2013). Within the Jinhu Depression, specific geological features are noteworthy, including the Liubao sandstone reservoirs and the Lingtangqiao Low-Uplift of the Paleocene Funing Formation (Li et al. 2011; Wang 2011). The formation has been divided into four clearly defined members (Fig. 2) using well logs and lithological studies (Liu et al. 2012).

Gaoyou depression

Positioned in the south of the Subei basin, the Gaoyou depression covers about 2,670 square kilometres (Fig. 1B). Extending more than 100 km from east to west and about 30 km from north to south, this depression takes the form of a half-graben. It is demarcated to the east and south by the Wubao Low and Tongyang uplifts.

The Jiangdu-Wubao fault zone delineates the southern and eastern perimeters of the depression, spanning over 140 km and housing prominent faults such as Wu 1, Wu 2, Zhen 1, and Zhen 2. Within the Gaoyou depression, four discernible depocenters emerge: the Fanchuan, Liulu, Liuwushe, and Shaobo sub-basins (Liu et al. 2017).

In the western area, the Gaoyou Depression finds its boundaries marked by the Lingtangqiao Low Uplift, while the Tongyang Uplift characterizes its southern extent. During the Dainan-Yancheng period, significant movement occurred along two growth faults situated in the southern region of the Gaoyou depression. This movement resulted in the partitioning of the depression into three distinct segments as illustrated by Gu and Dai (2015): the South Fault-Terrace Belt, the Central Deep Sag, and the North Slope.

Bioturbation in the Funing Formation of the Subei basin

During the early Paleocene, the Subei basin was likely situated in a semiarid environment with seasonal rainfall patterns. This setting led to the development of diverse ichnofauna, including meniscate burrows, simple horizontal, vertical, or sub-vertical burrows, and plant roots/debris (Fig. 3G) along with their traces, which are characteristic of the Scoyenia or Skolithos ichnofacies in the Funing Formation. The Scoyenia ichnofacies predominantly features horizontal meniscate burrows such as Beaconites coronus (Fig. 3A), Taenidium satanassi (Fig. 3B), and Taenidium barretti (Fig. 3C), along with simple horizontal cylindrical burrows like Planolites (Fig. 3D), Palaeophycus heberti (Fig. 3E), and Palaeophycus tubularis (Fig. 3F). The Skolithos ichnofacies includes Skolithos isp. (Fig. 3H) and Skolithos linearis (Fig. 3I) (Zhou et al. 2019; Quaye et al. 2022, 2023). These ichnofacies are typically found in a mixture of clean, silty, and muddy substrates, indicative of multipurpose structures for feeding, dwelling, breeding, escape, and scavenging (Hubert and Dutcher 2010). They often exhibit very high bioturbation intensities (4 ≤ BI ≤ 6).

Fig. 3

A Beaconites coronus (Be); B Taenidium satanassi (Ta); C Taenidium barretti (Ta); D Planolites isp. (Pl); E Palaeophycus heberti (Pa); F Palaeophycus tubularis (Pa); G plant debris and/or traces (Rt); H Skolithos isp. (Sk); I Skolithos linearis (Sk). Modified after Zhou et al. (2019)

Methods

Data acquisition

This study used the Support Vector Machine (SVM), k-Nearest Neighbour (k-NN), and Linear Discriminant Analysis (LDA) classification algorithms combined with seventy-six (76) data points from nine (9) core samples (see Appendix) retrieved from five selected wells (B5, F12, H19, M7, and L5; see Fig. 1) in E1f2 of the Jinhu depression and E1f1 and E1f3 of the Gaoyou Depression, respectively. These nine reservoir facies were selected mainly according to several indispensable factors: the outcrop area of the facies, the orientation and spacing of oilfield wellbores sectioned in the relevant strata for ichnofauna search, and the degree of bioturbation. The selected facies were also required to be free of fissures/fractures and any other defects, so that the results obtained would accurately reflect their true values.

Table 1 presents a summary of the core sample properties that were considered as features for the prediction of the BI in this work. Key parameters were core dimensions, density, porosity, permeability, and the measurement location of each property.

Table 1 Core sample properties considered as features

Data pre-processing and labelling

Data cleaning was performed to remove duplicates and handle missing values, ensuring data quality for training. Data labelling was then achieved by assigning each row of core sample features a bioturbation index classification.

The seventy-six data points from the datasets of nine core samples were labelled with the appropriate BI, which grades the degree of bioturbation from 0 to 6, hence seven classes. Details of the data points are provided in Appendix Tables 9, 10, 11, 12, 13, 14, 15, 16 and 17. Each dataset was labelled based on expert advice and observation of the cores.

Feature selection

Important features relevant to reservoir bioturbation were considered from the dataset: seven relevant features extracted from the 12 features in the dataset, as listed in Table 2. SelectKBest is a common feature selection method that selects the best subset of features based on statistical tests. The ANOVA F-test was chosen for its suitability for high-dimensionality datasets. Comparing the variance between the BI classes to the variance within each class identifies features that have a significant relationship with the BI. Numerous investigators have likewise used the ANOVA F-test for feature selection in supervised ML work (Shayestegan et al. 2024; Theng and Bhoyar 2024). In the Python script, the ANOVA F-test (f_classif) is implemented via the SelectKBest statistical test, which scores and ranks features based on their relationship with the output variable, as sketched below.
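A minimal sketch of this selection step with scikit-learn; the file name and the "BI" column name are illustrative assumptions, not the authors' exact script:

```python
# Feature selection with SelectKBest and the ANOVA F-test (f_classif).
# "core_samples.csv" is a hypothetical stand-in for the 76-point core
# dataset described in this paper.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.read_csv("core_samples.csv")
X = df.drop(columns=["BI"])   # the 12 candidate features (Table 3)
y = df["BI"]                  # BI labels, classes 0-6

selector = SelectKBest(score_func=f_classif, k=7)
X_selected = selector.fit_transform(X, y)

# Rank features by F-score (Eq. 1) and inspect the associated p-values
ranking = pd.DataFrame({
    "feature": X.columns,
    "f_score": selector.scores_,
    "p_value": selector.pvalues_,
}).sort_values("f_score", ascending=False)
print(ranking)
```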

Table 2 Summary of variables used for Linear Discriminant Analysis

The F-test statistic is defined in Eq. 1.

$$F=\frac{variance\;of\;feature\;between\;the\;different\;classes}{variance\;of\;feature\;within\;each\;class}$$
(1)

The k features with the highest F-scores (f_values) indicate a strong relationship with the target. Feature scaling was then applied to transform the features in the dataset to a comparable scale and range, to avoid some features dominating others and to enable the models to perform better and converge quickly. The MinMax scaler is a feature scaling method that shrinks the features in a given dataset to the range 0 to 1. Equation 2 shows the formula for MinMax normalization, and a short sketch of this step follows the equation.

$$Minmax= \frac{x- {x}_{min}}{{x}_{max}- {x}_{min}}$$
(2)
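In scikit-learn, Eq. 2 corresponds to MinMaxScaler. A minimal sketch, assuming the train/test arrays produced by the split described in the next subsection; fitting on the training data only avoids leaking test-set minima and maxima:

```python
# MinMax scaling (Eq. 2): each feature is mapped to [0, 1] using the
# per-feature minimum and maximum learned from the training set.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                         # feature_range=(0, 1) by default
X_train_scaled = scaler.fit_transform(X_train)  # learns x_min and x_max per feature
X_test_scaled = scaler.transform(X_test)        # reuses the training min/max
```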

Data split

The dataset was divided randomly, allocating 80% for training purposes and reserving 20% for testing. The split hence translates into 60 training and 16 test data points, with their assigned labels. It is typical of a data split to have a higher percentage of training data than test data: a higher training percentage provides sufficient information for the models to be well fitted and to handle most feature combinations for adequate prediction of the BI (vol.% of bioturbation). An 80/20 split was deemed suitable for the dataset size in this study and also follows the more conservative approach of Birba (2020). The test set is treated as unseen data for assessing model performance (Joseph and Vakayil 2022).
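A minimal sketch of this split with scikit-learn, continuing the earlier snippets; random_state is an illustrative choice for reproducibility, not a value reported here:

```python
# 80/20 random split of the 76 labelled points (60 training / 16 test).
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=42
)
```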

Model selection

In this work, three supervised classification ML methods are considered: Support Vector Classification (SVC), K-Nearest Neighbour (K-NN), and Linear Discriminant Analysis (LDA). The models are trained using the selected features as inputs to provide robust models for BI (vol.% of bioturbation) prediction. The justification for selecting these three classical supervised ML methods for the novel application of BI prediction is that each has proven able to handle small datasets with reasonable generalization; that is, the impact of the dataset size on the performance of each model is considered insignificant, especially after thorough cross-validation (Beckmann et al. 2015; Nalepa and Kawulok 2019; Raikwal and Saxena 2012). It is well acknowledged that more recent ML algorithms exist, such as ensemble methods; however, for the sake of simplicity and novelty of application, it is worth commencing with established, classical supervised ML algorithms to provide a novel prediction of the BI. Furthermore, the ease of interpretability of LDA, SVC, and k-NN on a small dataset makes them the preferred models for this study. More advanced models can also be applied to the prediction of BI; however, numerous hyperparameters would need to be tuned to achieve an optimal model. Although the selected models come with some limitations, associated with the choice of kernel function (for SVC), the choice of k neighbours (for k-NN), and the restriction to linear boundaries in multi-dimensional space (for LDA), the type of classification problem presented in this work affords their use, as the advantages outweigh the limitations.

Linear Discriminant Analysis (LDA)

The LDA serves as a supervised machine learning algorithm designed to execute classification tasks effectively. Additionally, it is adept at addressing dimensionality reduction challenges, eliminating redundant and interdependent features, and transforming high-dimensional features into a more concise low-dimensional space (Tharwat et al. 2017). All classes are assumed to be linearly separable, and hyperplanes within the feature space are created to differentiate between classes (Vaibhaw and Pattnaik 2020). Hyperplanes are created based on two criteria: first, maximizing the separation between the means of distinct classes, known as the between-class variance (\({S}_{Bi}\)) as shown in Eq. 3; secondly, minimizing the distance between the class means and their respective samples, termed the within-class variance (\({S}_{Wi}\)) as represented in Eq. 4. The variables used in this paper are described in Table 2.

$${S}_{Bi}={{W}^{T}({\mu }_{i}-\mu )({\mu }_{i}-\mu )}^{T}W$$
(3)
$${S}_{wi}={W}^{T}({x}_{ij}-{\mu }_{j}){({x}_{ij}-{\mu }_{j})}^{T}W$$
(4)
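A minimal LDA sketch with scikit-learn, continuing the earlier snippets; no hyperparameter tuning is applied, consistent with the training section below:

```python
# LDA classifier: projects features so as to maximize between-class
# variance (Eq. 3) relative to within-class variance (Eq. 4).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
lda.fit(X_train_scaled, y_train)
print(lda.predict(X_test_scaled))   # predicted BI classes (0-6)
```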

Support Vector Classification (SVC)

The Support Vector Classifier (SVC) represents a supervised machine-learning algorithm specifically applied to address multi-classification challenges. It operates in a fashion similar to LDA. The SVM creates decision boundaries between classes that help predict labels from feature vectors (Huang et al. 2018). Decision boundaries are called hyperplanes. The number of dimensions in the data dictates the configuration of hyperplanes. Through a set of constraints, Support Vector Machines (SVM) engage in an optimization process to ascertain optimal hyperplanes that maximize the margin between distinct classes. This margin denotes the space between the hyperplane and the support vector, which represents the closest data point from each class. Equation 5 defines the hyperplane equation, where 'w' signifies the weight, and 'b' stands for the bias.

$$f(x)=w\cdot x+b=0$$
(5)

The objective is to identify a hyperplane that maximizes the margin while minimizing classification errors. This requires optimizing the quadratic function defined in Eq. 6 subject to the linear constraints in Eq. 7, where \({y}_{i}\) denotes the class label of training data point \({x}_{i}\).

$$\underset{w,b}{minimize}\;\frac{1}{2}{\Vert w\Vert }^{2}$$
(6)
$$subject\;to:\;{y}_{i}\left(w\cdot {x}_{i}+b\right)\ge 1\;\;for\;all\;i$$
(7)

The optimal hyperplane separates data points according to the decision rule in Eq. 8.

$$\left\{\begin{array}{c}+1,\;if\;w\cdot x+b\ge 0\\ -1,\;if\;w\cdot x+b<0\end{array}\right.$$
(8)
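A minimal SVC sketch with scikit-learn, continuing the earlier snippets; the hyperparameter values shown are the optima reported later in this paper (C = 1000, linear kernel, gamma = 0.10):

```python
# Support Vector Classification; with a linear kernel the gamma
# parameter has no effect, consistent with the insensitivity noted
# in the Results section.
from sklearn.svm import SVC

svc = SVC(C=1000, kernel="linear", gamma=0.10)
svc.fit(X_train_scaled, y_train)
print(svc.predict(X_test_scaled))   # predicted BI classes (0-6)
```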

K-Nearest Neighbour

The K-Nearest Neighbour (K-NN) algorithm enjoys widespread use in classification due to its simplicity in computation and interpretation (Moldagulova and Sulaiman 2017). The K-NN algorithm relies on distance metrics, evaluating the similarity between two points based on their distance. The distance metric commonly used in scikit-learn is the Euclidean distance between points X and Y, defined for an n-dimensional feature space in Eq. 9.

$$\left|XY\right|=\sqrt{\sum_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}$$
(9)

The k value is the number of nearest neighbours used to make a prediction. Choosing k is important because it significantly affects the algorithm's performance. The proximity of the k nearest points is evaluated and sorted based on closeness, and the K-NN algorithm predicts the class of a new data point from the majority class of the k most similar data points.
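A minimal K-NN sketch with scikit-learn, continuing the earlier snippets; n_neighbors = 5 is the optimum reported later in this paper, and Euclidean distance (Eq. 9) is the library default:

```python
# k-NN classifier: a new point takes the majority class of its k
# nearest training points (Minkowski metric with p=2, i.e. Euclidean).
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
print(knn.predict(X_test_scaled))   # predicted BI classes (0-6)
```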

Model training and hyper parameter tuning

Model training entails the use of the 80% training set (60 data points) to fit multidimensional decision boundaries for each model. To avoid underfitting and overfitting, the hyperparameters of the SVC and K-NN models were optimized to ensure that the models minimize the loss function. For the LDA, no optimization was required since there are no hyperparameters in the algorithm framework. The SVC model has the C, gamma, and kernel parameters to tune, while the K-NN model has the k-nearest neighbour value to optimize before further cross-validation can be performed. The C parameter represents the regularization parameter and plays a crucial role in the tradeoff between bias and variance: a smaller C yields a larger margin around the separating hyperplane, while a larger C yields a smaller margin around the hyperplane that decides among the BI (vol.% of bioturbation) classes. The kernel parameter defines the type of decision boundary, such as linear, polynomial, sigmoid, or radial basis function (RBF). The gamma parameter defines the degree of influence of a single training point on the decision boundary: high gamma values restrict influence to nearby points, and low values extend it. For the K-NN classifier, the kth nearest neighbour assigns new data to a class based on the proximity of the new data to the k labelled points within a defined distance termed the neighbourhood. If k is low, the algorithm captures local patterns within the data but handles noise and outliers poorly; a high k value may provide a smoother decision boundary but may not capture local variations in the data. Hence the need for k-parameter tuning and optimization.

The grid search method is applied to evaluate all candidate hyperparameter combinations over a multidimensional grid and select the optimum, as sketched below. Hyperparameter optimization aids an adequate bias-variance tradeoff, providing model robustness in the prediction of unseen test data.
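A sketch of the grid search with scikit-learn's GridSearchCV, continuing the earlier snippets; the grids mirror the ranges discussed in the Results (C from 1 to 1000, gamma from 0.01 to 1.00, k from 1 to 10) but are assumptions rather than the authors' exact script:

```python
# Grid search for the SVC and k-NN hyperparameters with 5-fold CV.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

svc_grid = GridSearchCV(
    SVC(),
    {"C": [1, 10, 100, 1000],
     "gamma": [0.01, 0.1, 1.0],
     "kernel": ["linear", "rbf"]},
    cv=5,
)
svc_grid.fit(X_train_scaled, y_train)

knn_grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": list(range(1, 11))},
    cv=5,
)
knn_grid.fit(X_train_scaled, y_train)

print(svc_grid.best_params_)   # e.g. {'C': 1000, 'gamma': 0.1, 'kernel': 'linear'}
print(knn_grid.best_params_)   # e.g. {'n_neighbors': 5}
```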

Model cross-validation

It is important to attain model stability and generalization, which implies that the classification models are independent of the training data selection. To ensure consistent accuracy and model reliability, cross-validation was executed on the training datasets. This process evaluated how well the machine learning models performed on unseen data, aiming for generalization and stability. K-fold cross-validation was applied with k set to 5: the training set was further split into 80% sub-training and 20% validation folds (illustrated in Fig. 4), and training and validation were repeated five times, each time with a different fold held out. Results from the cross-validation provide insight into the data dependence and stability of the classification models. The cross-validation step is also required to guard against model underfitting and overfitting.
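A minimal sketch of the fivefold evaluation with scikit-learn, continuing the earlier snippets:

```python
# Fivefold cross-validation: each model is refit on 4/5 of the training
# data and scored on the held-out fold, five times.
from sklearn.model_selection import cross_val_score

for name, model in [("SVC", svc), ("k-NN", knn), ("LDA", lda)]:
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}, "
          f"mean error {1 - scores.mean():.3f}")
```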

Fig. 4
figure 4

5-Fold cross validation step

Model performance evaluation

Given that the issue addressed in this study pertains to multiclass classification, a confusion matrix was considered for the model performance evaluation. A detailed treatment of the confusion matrix for multiclass model assessment has been given by several researchers (Delgado and Núñez-González 2019; Mathur and Foody 2008). Key metrics derived from the confusion matrix include average error, F1 score, recall, precision, and average accuracy. The performance metrics considered for this work are based primarily on average accuracy and average error. Overall accuracy on the training, cross-validation, or test sets was determined as the ratio of accurately predicted BI (vol.% of bioturbation) classifications to the total actual BI classifications. The average model error was evaluated as the loss of the target, defined as 1 minus the average accuracy. Precision refers to the ratio of correctly predicted positive observations (true positives) to the total predicted positive observations:

$$Precision=\frac{True\;positives}{True\;positives+False\;positives}\times 100\%$$
(10)

Recall is defined as the ratio of true positives to the sum of true positives and false negatives:

$$Recall=\frac{True\;positives}{True\;positives+False\;negatives}\times 100\%$$
(11)

The F1 score is the harmonic mean of precision and recall, such that

$$F1\;score=2\times \frac{Precision\times Recall}{Precision+Recall}$$
(12)
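These metrics can be obtained directly from scikit-learn; a minimal sketch, continuing the earlier snippets:

```python
# Multiclass confusion matrix and per-class precision/recall/F1
# (Eqs. 10-12) for the held-out test set.
from sklearn.metrics import confusion_matrix, classification_report

y_pred = svc.predict(X_test_scaled)
print(confusion_matrix(y_test, y_pred))       # rows = actual BI class, cols = predicted
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
```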

Figure 5 summarizes the key steps of the machine learning workflow.

Fig. 5

Summary flow chart of the machine learning process

Results and discussion

Feature selection

In this work, a total of twelve features (summarized in Table 3) were screened based on their relevance and significance to changes in the predicted bioturbation. Based on F-scores and p-values, seven features, indexed 1–7, were selected as inputs for each of the supervised ML models. The remaining five features, indexed 8–12, were rejected because of their low F-scores and high p-values. Features with p-values greater than 0.45 were rejected, since an increasing p-value represents a higher probability that the respective feature adds little or no information to predicting bioturbation.

Table 3 Feature selection

Put differently, features with high p-values have a high likelihood of appearing to change the BI (vol.% of bioturbation) by chance rather than through a genuine correlation with the BI. The rejected features with high p-values also corresponded with low F-scores.

Model training performance

Training of the classifiers was performed on the selected features (see Table 3), with hyperparameter tuning of the SVC and K-NN models using the training set. Figure 6 shows the effects of the SVC hyperparameters gamma, kernel, and C on the average training accuracy of the bioturbation index (vol.% of bioturbation) prediction. It can be deduced from the figure that training accuracy generally increases with the regularization parameter C across the gamma values and kernel functions considered. This is because increasing C reduces regularization and allows a more complex decision boundary, hence a smaller margin separating the bioturbation index classes, such that the hyperplane classifies the training data more accurately. Although very high C values (small regularization) can lead to overfitting of the training set, a balance between generalization and complexity is key to achieving the optimal classifier model.

Fig. 6

Effects of C, Kernel and gamma on the training performance

For the linear kernel function, for each C value in the range 1 to 1000, increasing the gamma parameter from 0.01 to 1.00 showed little or no effect on the training accuracy. In other words, the variation in similarity radius for each bioturbation index class does not affect the training fit of the SVC model. For example, at C = 1 under the linear kernel, increases in gamma from 0.01 to 1.00 produced similar training accuracies of around 62%. Similar trends can be observed for C = 10, 100, and 1000 under the linear kernel, at around 78.4%, 84.2%, and 96.1%, respectively. The insensitivity of the SVC prediction to the gamma parameter is mainly due to the simpler linear function that defines the decision boundaries for the classification of the BI (vol.% of bioturbation) on the training data considered.

Conversely, under the RBF kernel, which is a more complex function, the effect of the gamma parameter is evident in the clear distinction in training accuracies for a given C parameter (as shown in Fig. 6). The more flexible Gaussian nature of the RBF kernel allows the gamma parameter, which represents changes in the similarity radii of each BI class, to have a significant effect, leading to increased fitting performance of the SVC model, with training accuracies as high as 98% (for C = 1000 and gamma = 1.00).

Overall, the optimal hyperparameters of the SVC model selected for this work are C = 1000 and gamma = 0.10 with the linear kernel function. This selection yields an SVC model with an adequate bias-variance tradeoff. The results for the RBF kernel also show less stability in training accuracy compared to the linear kernel, for which the gamma parameter has little or no effect on training accuracy.

Figure 7 presents the effect of the k value on the training accuracy and corresponding error rate of the K-NN model. Varying k from 1 to 10 indicates a maximum training accuracy at k = 5 in the grid search results, corresponding to a training accuracy of 73.28%. For k values below 5, the decision boundaries of the K-NN model tend to be more complex and hence sensitive to outliers in the training data, since the model relies heavily on the nearest-neighbour classification. For k values greater than 5, the training accuracy of the K-NN model decreases (the error rate increases) as underfitting results from simpler decision boundaries that misclassify the training data. k = 5 was therefore selected as the optimal hyperparameter for the K-NN model.

Fig. 7

Effect of the kth nearest neighbour on the training accuracy of the K-NN model

Training results of optimized classifiers

This section presents the training results of the optimized classifiers (SVC and K-NN) and the LDA model in the form of the confusion matrices in Tables 4, 5 and 6. The confusion matrix in Table 4 shows the training results of the optimized SVC in predicting bioturbation for the training set (60 data points). For the SVC, most classes of bioturbation were excellently classified and predicted relative to the actual BI (vol.% of bioturbation) classification. However, although classes 3 and 5 were adequately predicted, false positive predictions involving classes 2 and 4, respectively, led to errors of 14.29% (for class 3) and 12.50% (for class 5). An overall training accuracy of 96.17% was obtained (3.83% error) with the optimized SVC.

Table 4 Confusion matrix showing training performance of the hyperparameter-tuned SVC model
Table 5 Confusion matrix showing training performance of the hyperparameter-optimized KNN model
Table 6 Confusion matrix showing training performance of the LDA model

In comparison, the K-NN method (Table 5) gave an overall training accuracy of 73.28%, corresponding to an average BI prediction error of 26.72%. With k = 5, the K-NN model produced some misclassifications relative to the actual BI labels of the datasets considered.

As presented in Table 6, the LDA method provides an overall training accuracy of 67.36% and an error of 32.64%. Under the LDA method, a linear combination of the features is expected to predict the BI (vol.% of bioturbation); hence, the significant error obtained could be related to an underfitting problem, which requires further cross-validation to confirm.

The SVC confusion matrix (Table 4) shows the classification performance across the seven classes. Precision, recall, and F1 score are perfect (1.00) for classes 0, 1, 2, 4, and 6, indicating flawless classification with no false positives or false negatives.

Class 3 has a lower performance with a precision and recall of 0.86, reflected in its F1 score of 0.86. This reduction is due to one instance of Class 2 being misclassified as Class 3 (false positive) and one instance of Class 3 being misclassified as Class 2 (false negative). This suggests some confusion between these two classes.

Class 5 also shows slightly reduced performance, with a precision and recall of 0.88 and an F1 score of 0.88. This is due to one instance of Class 4 being misclassified as Class 5 and one instance of Class 5 being misclassified as Class 4, indicating a minor overlap between these classes.

Overall, the model demonstrates high accuracy, correctly classifying the vast majority of instances. The slight misclassifications in Classes 3 and 5 suggest areas for further refinement, potentially focusing on distinguishing features between the closely related classes.

Other performance metrics

The results of other performance metrics, namely precision, recall, and F1 score, for each model are presented in Tables 7 and 8. Training results based on these metrics are given in Table 7. The results indicate that the SVC model outperforms the other models, with an average precision, recall, and F1 score of 96.17%.

Table 7 Other performance metrics for all models in this study
Table 8 Test performance results on BI prediction

Moreover, on a class basis, the SVC still performs best, with its lowest training classification performance being 85.71% for class 3. This indicates that the SVC model can accurately predict the bioturbation index across the different classes with minimal error.

The KNN model shows relatively lower training performance than the SVC model; although it achieved perfect recall, precision, and F1 scores for class 0, the other classes were only moderately well classified, with the lowest precision, recall, and F1 score being 50% for class 4 predictions. Overall, the F1 score of the KNN model in training was 69.71%. These performances suggest that the KNN model has some overlapping class boundaries, given the imbalanced class distributions, relative to the SVM model. The LDA model, which provides linear class boundaries, performs best for classes 1, 2, and 3, with F1 scores above 80%. However, training classifications for classes 4, 5, and 6 were as low as 37.50%. The overall precision, recall, and F1 scores of 69.15%, 67.36%, and 68.13%, respectively, indicate that the LDA model, given its linear boundaries, tends to misclassify overlapping classes.

Cross-validation of models

To establish the generalization of each model's performance, cross-validation using the fivefold stratification of the training set was performed. Figure 8(a) shows the error resulting from the variation in training sets used for each model. The stability of each model can be inferred from the variation in average error across folds 1 to 5. For instance, for the SVC model the error varies between 1% (at fold 1) and 6.25% (at fold 4). The undulating behaviour within this range is indicative of a stable model that is independent of the training data used. For the K-NN model, similar stability of training performance can be inferred, albeit with significant prediction losses. The LDA model shows a relative increase in loss with increasing fold number, combined with the highest average errors in the prediction of BI (vol.% of bioturbation); it is therefore considered the worst-performing model in this work. The average cross-validation accuracies of BI (vol.% of bioturbation) prediction for all models are depicted in Fig. 8(b). These results indicate that the optimized SVC is the best-performing classifier, followed by the optimized K-NN classifier, with the LDA classifier performing worst.

Fig. 8

Cross-validation results of the fivefold cross-validation on (a) error rates for SVC, K-NN and LDA classifiers and (b) cross-validation accuracy

Test performance

Table 8 presents the test performance outcomes for BI (vol.% of bioturbation) prediction across the SVC, K-NN, and LDA classifiers. The support vector classifier outperforms the other classifiers in the overall prediction of BI (vol.% of bioturbation), with an average accuracy of 92.86%. The poor prediction of the class 2 BI (50%) cannot be ignored, however, and more data are recommended to improve the classification results. The KNN classifier performs at an acceptable average accuracy of 90.48%; classes 3 and 5 were predicted at 66.7% accuracy as a result of the decision boundary defined by the K-NN model. The linear discriminant model remains the worst-performing classifier in this work, with an average accuracy of 76.2%; only classes 0, 4, and 6 were predicted at 100% accuracy for the test data considered.

These results compare favourably with other works: Timmer et al. (2021) considered deep learning methods (deep convolutional neural networks, DCNNs) using images as input and arrived at an 88% prediction accuracy for bioturbation, while Ayranci et al. (2021) obtained an accuracy of 70% using a neural network algorithm combined with a large number of input images to detect the Bioturbation Index (vol.% of bioturbation). In this work, the results indicate an improved prediction of the BI, especially with the SVC, given the relatively small sample space and the expert labelling of core samples with BI (vol.% of bioturbation) within the respective classes. The improved performance of the SVC compared to neural networks in this classification problem is due to the capability of the kernel function to map the data into a higher-dimensional feature space in which the intrinsic data properties become separable.

Models’ prediction limitations on bioturbated reservoir facies

Reservoir-scale advantages of studying bioturbation mainly centre on a better understanding of mud-dominated sedimentary structures, allowing improved predictions of rock properties (e.g., Buatois and Mángano 2011; Gingras et al. 2001). Bioturbation is also of special interest in hydrocarbon reservoirs since it modifies porosity and permeability. Yet such studies are also fraught with a range of major limitations and difficulties (Gibling and Bird 1994). Machine learning (ML) models applied to predicting bioturbation under reservoir conditions have both strengths and weaknesses. They are particularly good at churning through vast amounts of data, uncovering patterns and links that traditional methods may not identify. However, reservoir conditions such as fractures, fissures, and diagenetic processes can substantially impact the accuracy and reliability of these ML models. Fractures and fissures create complex flow pathways within the reservoir, complicating the interpretation of bioturbation signals and potentially reducing the performance of ML predictions (Oliver et al. 2008; Tarabulski and Reinhardt 2020). Baniak et al. (2013) propose that bioturbated reservoir facies may be permeable, with areas of low porosity acting as semi-sealed, intra-stratum micro-fracture systems. Such arrangements focus fluid flow along high-permeability pathways that may be decoupled from the surrounding bioturbated matrix by structural features, complicating predictions. Diagenetic processes such as cementation, dissolution, and recrystallization can modify primary sedimentary structures and bioturbation features at the micro-scale, complicating interpretation because these elements may affect the model's surrogates for hydraulic properties. Insight into these interactions is important for improving the predictive capabilities of ichnological models across a wide range of reservoir conditions (Worden and Burley 2003; Buatois and Mángano 2007).

Conclusion

This work considered a unique set of inputs, including the key dimensions and physical properties of core samples together with the volume of bioturbation in each sample, for the prediction of BI via the SVC, K-NN, and LDA algorithms. These classifiers provide decision boundaries that aid the multiclass prediction of bioturbation in the form of the Bioturbation Index. Seventy-six data points from core samples retrieved from existing wells in the Subei Basin, China, were used. Key machine learning steps performed in this work include data preprocessing, feature selection, model training, cross-validation, and testing. Seven (7) selected features from the core data were used as inputs to build each classifier to predict bioturbation.

A training-test data split of 80/20 was adequate for the study. Training of the SVC and K-NN models considered hyperparameter optimization and cross-validation of all models before a model evaluation using the test data set. Based on grid search, the hyperparameters of the SVC and K-NN models were selected based on adequate bias-variance tradeoff considerations. The training and test results indicate that the optimized SVC was the best classifier followed by the k-NN classifier and then the LDA classifier which was the worst-performing classifier.

The results also show that hyperparameter optimization is critical for desired model performances. The novelty of this work was evident in the application of core data that comprised rock properties and BI parameters as selected features for training each classifier to predict bioturbation compared to other works that considered images of core samples as features for bioturbation prediction. We recommend adaptive unsupervised ML classifiers to predict bioturbation in future works.