Introduction

Bioturbation is a geological phenomenon in which sediments are displaced by the activity of living organisms. It can alter not only the textural but also the petrophysical characteristics of a rock (Bromley 1996). Past research on diverse formations within the Subei region has confirmed the significant impact of bioturbation on reservoir quality, underscoring its crucial role in determining a reservoir's productivity. These studies identified previously overlooked secondary reservoir targets that could increase reserve estimates in petroleum fields (Quaye et al. 2019, 2022, 2023). Analyzing textural parameters within sedimentary rocks poses significant challenges, particularly where bioturbation is complex. This complexity arises from the wide variation in borings, rootlets, and the size and intricacy of burrows, as well as frequent and rapid vertical and horizontal changes, often due to factors such as crosscutting burrows and the arrangement of trace-forming endobenthic communities (Bromley and Ekdale 1984). Over time, researchers have adopted various approaches to describing bioturbation, ranging from early classification attempts (Schäfer 1956) and semi-quantitative measurements (Reineck 1963) to quantitative estimations (Dorador et al. 2014) and mathematical modelling (Guinasso and Schink 1975).

Bioturbation has long affected reservoir quality through modifications to porosity and permeability and through its effects on depositional stability (Pemberton and Gingras 2005). Burrowing can increase porosity and permeability by opening pathways among grains, or reduce these properties where burrows compact bordering sediments or are filled with finer material (Tonkin et al. 2010). Bioturbation complicates depositional interpretations, homogenizes the microstratigraphic distribution of sediment layers, and affects redox chemistry by mixing oxygenated surface sediments with subsurface reducing zones. It also influences compaction and cementation, which can either preserve or modify porosity and permeability. In general, bioturbation causes heterogeneity at different scales that can affect fluid flow and reservoir performance, making it a key factor in efficient hydrocarbon recovery (Gingras et al. 1999; Hovikoski et al. 2008).

Machine learning, a component of artificial intelligence, comprises diverse data processing approaches such as classification, regression, and clustering. It can be categorized into supervised and unsupervised techniques, the two primary branches of the field (Hall 2016). Supervised learning involves training a computer algorithm on labelled input data (herein the training wells data) to predict specific outputs. Through iterative training, the algorithm learns to recognize hidden patterns and connections between the input and output data, ultimately allowing it to provide precise predictions when given new, unlabelled data (herein the test dataset) (Mohri et al. 2012). Mandal and Rezaee (2019) asserted that the use of machine learning, especially when combined with wells data, has gained significant traction in tackling geoscientific issues within sectors of the oil and gas industry. It has been widely applied in geological, geophysical, and petrophysical characterization (Deshenenkov and Polo 2020; Gharavi et al. 2022; Hansen et al. 2023; Mohammadinia et al. 2023). Fomel and Liu (2017) interpreted seismic data to identify subsurface geological structures and predict reservoir properties. Machine learning models can classify minerals and rocks based on their spectral signatures obtained from remote sensing data (Crosta and Souza Filho 1998). Sarma and Gupta (2000) used machine learning for reservoir characterization, predicting porosity, permeability, and lithology from well logs and seismic data. These examples demonstrate the diverse range of applications for machine learning in geology, geophysics, and petrophysics, helping researchers and professionals better understand and characterize subsurface geological formations.

The degree of bioturbation (vol.% of bioturbation) is mostly interpreted via visual estimation using the Bioturbation Index (BI) (Taylor and Goldring 1993). This can be a limitation that further compounds the confusion between biogenic and diagenetic structures. This paper aims to improve the prediction of bioturbation in reservoir rocks via machine learning, as has been done in some studies (e.g., Tarabulski and Reinhardt 2020; Zhang et al. 2021), and to serve as a framework for future bioturbation-related machine learning studies. This work combines the Support Vector Machine (SVM), k-Nearest Neighbour (k-NN), and Linear Discriminant Analysis (LDA) classification algorithms with wells data to effectively determine bioturbated zones in selected reservoir facies, reducing human error and providing more accurate outcomes.

Geological setting

Located on the western periphery of the Yellow Sea in northern Jiangsu province, eastern China, the Subei basin (Fig. 1) is characterized as a fault sag basin. Its geological history dates back to the Late Cretaceous period, when it began as a rift, and it covers an estimated area of around 35,000 square kilometres (Song et al. 2010). The basin's formation can be divided into two primary rift phases, the first occurring between 83 and 54.9 million years ago and the second between 54.9 and 38 million years ago (Yang and Chen 2003; Chen 2010). The intervals of rift activity were separated by significant tectonic events known as the Wubao and Sanduo events, associated with thermal subsidence, as documented by Liu et al. (2014). Liu et al. (2017) suggested that the Wubao event induced faulting and segmentation within the basin. In parallel, the Sanduo event resulted in notable uplift, subsequently causing erosion of the Oligocene and the strata beneath it. This process resulted in the formation of an angular unconformity between the Neogene and the formations beneath it (Yi et al. 2003).

Fig. 1

a Description and maps detailing the general location of the Subei Basin, positioned west of the Yellow Sea. b Delineation of the different depressions within the Subei Basin, its surrounding tectonic features, and the wells examined in this research. Modified after Zhou et al. (2019)

The Paleocene Funing Formation (E1f)

The Paleocene Funing Formation's lowermost member (E1f1) comprises 350 to 800 m of red beds (Fig. 2). These layers consist of intermixed brownish-red, very fine to fine-grained sandstones, siltstones, mudstones, and sporadic occurrences of greyish-green, very fine to fine-grained sandstones and siltstones (Zhang et al. 2006; Deng 2014). The second member (E1f2) comprises alternating layers of lacustrine carbonates and fine grey sandstones, spanning 70 to 110 m (Fig. 2), succeeded by a dark grey mudstone layer 60 to 120 m thick (Liu et al. 2012; Luo et al. 2013; Shao et al. 2013). The third member (E1f3) includes interbedded grey, very fine-grained mudstones and sandstones, 200 to 300 m in thickness (Fig. 2). Zhang et al. (2006) suggested that this section is topped by the fourth member, a layer of dark grey mudstone that ranges from 300 to 400 m thick.

Fig. 2

In-depth analysis of the E1f1, E1f2, and E1f3 (65.0–56 Ma), involving a comprehensive study of their stratigraphy, facies, petroleum systems, tectonic occurrences, and evolutionary changes. Adapted from Quaye et al. (2022)

Jinhu depression

The Jinhu Depression is situated in the southwest of the Subei Basin and is the largest depression within the basin, covering an area of approximately 5,500 square kilometres (Fig. 1B). The Jianhu Uplift lies to the northwest, the Zhangbaling Uplift to the southwest, and the Sunan Uplift to the southeast (Li et al. 2011; Liu et al. 2012; Shao et al. 2013). Within the Jinhu Depression, specific geological features are noteworthy, including the Liubao sandstone reservoirs and the Lingtangqiao Low-Uplift of the Paleocene Funing Formation (Li et al. 2011; Wang 2011). The formation has been divided into four clearly defined members (Fig. 2) using well logs and lithological studies (Liu et al. 2012).

Gaoyou depression

Positioned in the south of the Subei basin, the Gaoyou depression covers about 2,670 square kilometres (Fig. 1B). Extending more than 100 km from east to west and about 30 km from north to south, this depression takes the form of a half-graben. It is demarcated to the east and south by the Wubao Low and Tongyang uplifts.

The Jiangdu-Wubao fault zone delineates the southern and eastern perimeters of the depression, spanning over 140 km and housing prominent faults such as Wu 1, Wu 2, Zhen 1, and Zhen 2. Within the Gaoyou depression, four discernible depocenters emerge: the Fanchuan, Liulu, Liuwushe, and Shaobo sub-basins (Liu et al. 2017).

In the western area, the Gaoyou Depression finds its boundaries marked by the Lingtangqiao Low Uplift, while the Tongyang Uplift characterizes its southern extent. During the Dainan-Yancheng period, significant movement occurred along two growth faults situated in the southern region of the Gaoyou depression. This movement resulted in the partitioning of the depression into three distinct segments as illustrated by Gu and Dai (2015): the South Fault-Terrace Belt, the Central Deep Sag, and the North Slope.

Bioturbation in the Funing Formation of the Subei basin

During the early Paleocene, the Subei basin was likely situated in a semiarid environment with seasonal rainfall patterns. This setting led to the development of diverse ichnofauna, including meniscate burrows, simple horizontal, vertical, or sub-vertical burrows, and plant roots/debris (Fig. 3G) along with their traces, which are characteristic of the Scoyenia or Skolithos ichnofacies in the Funing Formation. The Scoyenia ichnofacies predominantly features horizontal meniscate burrows such as Beaconites coronus (Fig. 3A), Taenidium satanassi (Fig. 3B), and Taenidium barretti (Fig. 3C), along with simple horizontal cylindrical burrows like Planolites (Fig. 3D), Palaeophycus heberti (Fig. 3E), and Palaeophycus tubularis (Fig. 3F). The Skolithos ichnofacies includes Skolithos isp. (Fig. 3H) and Skolithos linearis (Fig. 3I) (Zhou et al. 2019; Quaye et al. 2022, 2023). These ichnofacies are typically found in a mixture of clean, silty, and muddy substrates, indicative of multipurpose structures for feeding, dwelling, breeding, escape, and scavenging (Hubert and Dutcher 2010). They often exhibit very high bioturbation intensities (4 ≤ BI ≤ 6).

Fig. 3

A Beaconites coronus (Be); B Taenidium satanassi (Ta); C Taenidium barretti (Ta); D Planolites isp. (Pl); E Palaeophycus heberti (Pa); F Palaeophycus tubularis (Pa); G plant debris and/or traces (Rt); H Skolithos isp. (Sk); I Skolithos linearis (Sk). Modified after Zhou et al. (2019)

Methods

Data acquisition

This study used the Support Vector Machine (SVM), k-Nearest Neighbour (k-NN), and Linear Discriminant Analysis (LDA) classification algorithms combined with seventy-six (76) data points from nine (9) core samples (see Appendix) retrieved from five selected wells (B5, F12, H19, M7, and L5; see Fig. 1) in E1f2 of the Jinhu depression and E1f1 and E1f3 of the Gaoyou Depression, respectively. These nine reservoir facies were selected mainly according to several indispensable factors: the outcrop area of the facies, the orientation and spacing of oilfield wellbores sectioned in the relevant strata for ichnofauna search, and the degree of bioturbation. The selected facies were also required to be free of fissures/fractures and any other defects, so that the results obtained would accurately reflect their true values.

Table 1 presents a summary of the core sample properties that were considered as features for the prediction of the BI in this work. Key parameters were core dimensions, density, porosity, permeability, and the measurement location of each property.

Table 1 Core sample properties considered as features

Data pre-processing and labelling

Data cleaning was performed to remove duplicates and handle missing values, ensuring data quality for training. Data labelling was then achieved by assigning each row of core sample features a bioturbation index classification.

The seventy-six data points from the datasets of nine core samples were labelled with the appropriate BI, which grades the degree of bioturbation from 0 to 6, hence seven classes. Details of the data points are provided in Appendix Tables 9, 10, 11, 12, 13, 14, 15, 16 and 17. Each dataset was labelled based on expert advice and observation of the cores.

Feature selection

Important features relevant to reservoir bioturbation were considered from the dataset: seven relevant features extracted from the 12 features in the dataset, as listed in Table 2. SelectKBest is a common feature selection method that selects the best subset of features based on statistical tests. The ANOVA F-test was chosen for its suitability for high-dimensionality datasets. Comparing the variance between the BI classes to the variance within each class identifies features that have a significant relationship with the BI. Numerous investigators have likewise used the ANOVA F-test for feature selection in supervised ML work (Shayestegan et al. 2024; Theng and Bhoyar 2024). In the Python script, the ANOVA F-test (f_classif) is implemented via the SelectKBest statistical test, which scores and ranks features based on their relationship with the output variable, as sketched below.
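A minimal sketch of this selection step with scikit-learn; the file name and the "BI" column name are illustrative assumptions, not the authors' exact script:

```python
# Feature selection with SelectKBest and the ANOVA F-test (f_classif).
# "core_samples.csv" is a hypothetical stand-in for the 76-point core
# dataset described in this paper.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.read_csv("core_samples.csv")
X = df.drop(columns=["BI"])   # the 12 candidate features (Table 3)
y = df["BI"]                  # BI labels, classes 0-6

selector = SelectKBest(score_func=f_classif, k=7)
X_selected = selector.fit_transform(X, y)

# Rank features by F-score (Eq. 1) and inspect the associated p-values
ranking = pd.DataFrame({
    "feature": X.columns,
    "f_score": selector.scores_,
    "p_value": selector.pvalues_,
}).sort_values("f_score", ascending=False)
print(ranking)
```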

Table 2 Summary of variables used for Linear Discriminant Analysis

The F-test statistic is defined in Eq. 1.

$$F=\frac{variance\;of\;feature\;between\;the\;different\;classes}{variance\;of\;feature\;within\;each\;class}$$
(1)

The k features with the highest F-scores (f_values) indicate a strong relationship with the target. Feature scaling was then applied to transform the features in the dataset to a comparable scale and range, to avoid some features dominating others and to enable the models to perform better and converge quickly. The MinMax scaler is a feature scaling method that shrinks the features in a given dataset to the range 0 to 1. Equation 2 shows the formula for MinMax normalization, and a short sketch of this step follows the equation.

$$Minmax= \frac{x- {x}_{min}}{{x}_{max}- {x}_{min}}$$
(2)
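In scikit-learn, Eq. 2 corresponds to MinMaxScaler. A minimal sketch, assuming the train/test arrays produced by the split described in the next subsection; fitting on the training data only avoids leaking test-set minima and maxima:

```python
# MinMax scaling (Eq. 2): each feature is mapped to [0, 1] using the
# per-feature minimum and maximum learned from the training set.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                         # feature_range=(0, 1) by default
X_train_scaled = scaler.fit_transform(X_train)  # learns x_min and x_max per feature
X_test_scaled = scaler.transform(X_test)        # reuses the training min/max
```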

Data split

The dataset was divided randomly, allocating 80% for training purposes and reserving 20% for testing. The split hence translates into 60 training and 16 test data points, with their assigned labels. It is typical of a data split to have a higher percentage of training data than test data: a higher training percentage provides sufficient information for the models to be well fitted and to handle most feature combinations for adequate prediction of the BI (vol.% of bioturbation). An 80/20 split was deemed suitable for the dataset size in this study and also follows the more conservative approach of Birba (2020). The test set is treated as unseen data for assessing model performance (Joseph and Vakayil 2022).
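A minimal sketch of this split with scikit-learn, continuing the earlier snippets; random_state is an illustrative choice for reproducibility, not a value reported here:

```python
# 80/20 random split of the 76 labelled points (60 training / 16 test).
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=42
)
```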

Model selection

In this work, three supervised classification ML methods are considered: Support Vector Classification (SVC), K-Nearest Neighbour (K-NN), and Linear Discriminant Analysis (LDA). The models are trained using the selected features as inputs to provide robust models for BI (vol.% of bioturbation) prediction. The justification for selecting these three classical supervised ML methods for the novel application of BI prediction is that each has proven able to handle small datasets with reasonable generalization; that is, the impact of the dataset size on the performance of each model is considered insignificant, especially after thorough cross-validation (Beckmann et al. 2015; Nalepa and Kawulok 2019; Raikwal and Saxena 2012). It is well acknowledged that more recent ML algorithms exist, such as ensemble methods; however, for the sake of simplicity and novelty of application, it is worth commencing with established, classical supervised ML algorithms to provide a novel prediction of the BI. Furthermore, the ease of interpretability of LDA, SVC, and k-NN on a small dataset makes them the preferred models for this study. More advanced models can also be applied to the prediction of BI; however, numerous hyperparameters would need to be tuned to achieve an optimal model. Although the selected models come with some limitations, associated with the choice of kernel function (for SVC), the choice of k neighbours (for k-NN), and the restriction to linear boundaries in multi-dimensional space (for LDA), the type of classification problem presented in this work affords their use, as the advantages outweigh the limitations.

Linear Discriminant Analysis (LDA)

The LDA serves as a supervised machine learning algorithm designed to execute classification tasks effectively. Additionally, it is adept at addressing dimensionality reduction challenges, eliminating redundant and interdependent features, and transforming high-dimensional features into a more concise low-dimensional space (Tharwat et al. 2017). All classes are assumed to be linearly separable, and hyperplanes within the feature space are created to differentiate between classes (Vaibhaw and Pattnaik 2020). Hyperplanes are created based on two criteria: first, maximizing the separation between the means of distinct classes, known as the between-class variance (\({S}_{Bi}\)) as shown in Eq. 3; secondly, minimizing the distance between the class means and their respective samples, termed the within-class variance (\({S}_{Wi}\)) as represented in Eq. 4. The variables used in this paper are described in Table 2.

$${S}_{Bi}={{W}^{T}({\mu }_{i}-\mu )({\mu }_{i}-\mu )}^{T}W$$
(3)
$${S}_{wi}={W}^{T}({x}_{ij}-{\mu }_{j}){({x}_{ij}-{\mu }_{j})}^{T}W$$
(4)
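A minimal LDA sketch with scikit-learn, continuing the earlier snippets; no hyperparameter tuning is applied, consistent with the training section below:

```python
# LDA classifier: projects features so as to maximize between-class
# variance (Eq. 3) relative to within-class variance (Eq. 4).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
lda.fit(X_train_scaled, y_train)
print(lda.predict(X_test_scaled))   # predicted BI classes (0-6)
```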

Support Vector Classification (SVC)

The Support Vector Classifier (SVC) represents a supervised machine-learning algorithm specifically applied to address multi-classification challenges. It operates in a fashion similar to LDA. The SVM creates decision boundaries between classes that help predict labels from feature vectors (Huang et al. 2018). Decision boundaries are called hyperplanes. The number of dimensions in the data dictates the configuration of hyperplanes. Through a set of constraints, Support Vector Machines (SVM) engage in an optimization process to ascertain optimal hyperplanes that maximize the margin between distinct classes. This margin denotes the space between the hyperplane and the support vector, which represents the closest data point from each class. Equation 5 defines the hyperplane equation, where 'w' signifies the weight, and 'b' stands for the bias.

$$f(x)=w\cdot x+b=0$$
(5)

The objective is to identify a hyperplane that maximizes the margin while minimizing classification errors. This requires optimizing the quadratic function defined in Eq. 6 subject to the linear constraints in Eq. 7, where \({y}_{i}\) denotes the class label of training data point \({x}_{i}\).

$$\underset{w,b}{minimize}\;\frac{1}{2}{\Vert w\Vert }^{2}$$
(6)
$$subject\;to:\;{y}_{i}\left(w\cdot {x}_{i}+b\right)\ge 1\;\;for\;all\;i$$
(7)

The optimal hyperplane separates data points according to the decision rule in Eq. 8.

$$\left\{\begin{array}{c}+1,\;if\;w\cdot x+b\ge 0\\ -1,\;if\;w\cdot x+b<0\end{array}\right.$$
(8)
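A minimal SVC sketch with scikit-learn, continuing the earlier snippets; the hyperparameter values shown are the optima reported later in this paper (C = 1000, linear kernel, gamma = 0.10):

```python
# Support Vector Classification; with a linear kernel the gamma
# parameter has no effect, consistent with the insensitivity noted
# in the Results section.
from sklearn.svm import SVC

svc = SVC(C=1000, kernel="linear", gamma=0.10)
svc.fit(X_train_scaled, y_train)
print(svc.predict(X_test_scaled))   # predicted BI classes (0-6)
```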

K-Nearest Neighbour

The K-Nearest Neighbour (K-NN) algorithm enjoys widespread use in classification due to its simplicity in computation and interpretation (Moldagulova and Sulaiman 2017). The K-NN algorithm relies on distance metrics, evaluating the similarity between two points based on their distance. The distance metric commonly used in scikit-learn is the Euclidean distance between points X and Y, defined for an n-dimensional feature space in Eq. 9.

$$\left|XY\right|=\sqrt{\sum_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}$$
(9)

The k value is the number of nearest neighbours used to make a prediction. Choosing k is important because it significantly affects the algorithm's performance. The proximity of the k nearest points is evaluated and sorted based on closeness, and the K-NN algorithm predicts the class of a new data point from the majority class of the k most similar data points.
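A minimal K-NN sketch with scikit-learn, continuing the earlier snippets; n_neighbors = 5 is the optimum reported later in this paper, and Euclidean distance (Eq. 9) is the library default:

```python
# k-NN classifier: a new point takes the majority class of its k
# nearest training points (Minkowski metric with p=2, i.e. Euclidean).
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
print(knn.predict(X_test_scaled))   # predicted BI classes (0-6)
```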

Model training and hyper parameter tuning

Model training entails the use of the 80% training set (60 data points) to fit multidimensional decision boundaries for each model. To avoid underfitting and overfitting, the hyperparameters of the SVC and K-NN models were optimized to ensure that the models minimize the loss function. For the LDA, no optimization was required since there are no hyperparameters in the algorithm framework. The SVC model has the C, gamma, and kernel parameters to tune, while the K-NN model has the k-nearest neighbour value to optimize before further cross-validation can be performed. The C parameter represents the regularization parameter and plays a crucial role in the tradeoff between bias and variance: a smaller C yields a larger margin around the separating hyperplane, while a larger C yields a smaller margin around the hyperplane that decides among the BI (vol.% of bioturbation) classes. The kernel parameter defines the type of decision boundary, such as linear, polynomial, sigmoid, or radial basis function (RBF). The gamma parameter defines the degree of influence of a single training point on the decision boundary: high gamma values restrict influence to nearby points, and low values extend it. For the K-NN classifier, the kth nearest neighbour assigns new data to a class based on the proximity of the new data to the k labelled points within a defined distance termed the neighbourhood. If k is low, the algorithm captures local patterns within the data but handles noise and outliers poorly; a high k value may provide a smoother decision boundary but may not capture local variations in the data. Hence the need for k-parameter tuning and optimization.

The grid search method is applied to evaluate all candidate hyperparameter combinations over a multidimensional grid and select the optimum, as sketched below. Hyperparameter optimization aids an adequate bias-variance tradeoff, providing model robustness in the prediction of unseen test data.
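A sketch of the grid search with scikit-learn's GridSearchCV, continuing the earlier snippets; the grids mirror the ranges discussed in the Results (C from 1 to 1000, gamma from 0.01 to 1.00, k from 1 to 10) but are assumptions rather than the authors' exact script:

```python
# Grid search for the SVC and k-NN hyperparameters with 5-fold CV.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

svc_grid = GridSearchCV(
    SVC(),
    {"C": [1, 10, 100, 1000],
     "gamma": [0.01, 0.1, 1.0],
     "kernel": ["linear", "rbf"]},
    cv=5,
)
svc_grid.fit(X_train_scaled, y_train)

knn_grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": list(range(1, 11))},
    cv=5,
)
knn_grid.fit(X_train_scaled, y_train)

print(svc_grid.best_params_)   # e.g. {'C': 1000, 'gamma': 0.1, 'kernel': 'linear'}
print(knn_grid.best_params_)   # e.g. {'n_neighbors': 5}
```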

Model cross-validation

It is important to attain model stability and generalization, which implies that the classification models are independent of the training data selection. To ensure consistent accuracy and model reliability, cross-validation was executed on the training datasets. This process evaluated how well the machine learning models performed on unseen data, aiming for generalization and stability. K-fold cross-validation was applied with k set to 5: the training set was further split into 80% sub-training and 20% validation folds (illustrated in Fig. 4), and training and validation were repeated five times, each time with a different fold held out. Results from the cross-validation provide insight into the data dependence and stability of the classification models. The cross-validation step is also required to guard against model underfitting and overfitting.
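A minimal sketch of the fivefold evaluation with scikit-learn, continuing the earlier snippets:

```python
# Fivefold cross-validation: each model is refit on 4/5 of the training
# data and scored on the held-out fold, five times.
from sklearn.model_selection import cross_val_score

for name, model in [("SVC", svc), ("k-NN", knn), ("LDA", lda)]:
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}, "
          f"mean error {1 - scores.mean():.3f}")
```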

Fig. 4
figure 4

5-Fold cross validation step

Model performance evaluation

Given that the issue addressed in this study pertains to multiclass classification, a confusion matrix was considered for the model performance evaluation. A detailed treatment of the confusion matrix for multiclass model assessment has been given by several researchers (Delgado and Núñez-González 2019; Mathur and Foody 2008). Key metrics derived from the confusion matrix include average error, F1 score, recall, precision, and average accuracy. The performance metrics considered for this work are based primarily on average accuracy and average error. Overall accuracy on the training, cross-validation, or test sets was determined as the ratio of accurately predicted BI (vol.% of bioturbation) classifications to the total actual BI classifications. The average model error was evaluated as the loss of the target, defined as 1 minus the average accuracy. Precision refers to the ratio of correctly predicted positive observations (true positives) to the total predicted positive observations:

$$Precision=\frac{True\;positives}{True\;positives+False\;positives}\times 100\%$$
(10)

Recall is defined as the ratio of true positives to the sum of true positives and false negatives:

$$Recall=\frac{True\;positives}{True\;positives+False\;negatives}\times 100\%$$
(11)

The F1 score is the harmonic mean of precision and recall, such that

$$F1\;score=2\times \frac{Precision\times Recall}{Precision+Recall}$$
(12)
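These metrics can be obtained directly from scikit-learn; a minimal sketch, continuing the earlier snippets:

```python
# Multiclass confusion matrix and per-class precision/recall/F1
# (Eqs. 10-12) for the held-out test set.
from sklearn.metrics import confusion_matrix, classification_report

y_pred = svc.predict(X_test_scaled)
print(confusion_matrix(y_test, y_pred))       # rows = actual BI class, cols = predicted
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
```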

Figure 5 summarizes the key steps of the machine learning workflow.

Fig. 5

Summary flow chart of the machine learning process

Results and discussion

Feature selection

In this work, a total of twelve features (summarized in Table 3) were screened based on their relevance and significance to changes in the predicted bioturbation. Based on F-scores and p-values, seven features, indexed 1–7, were selected as inputs for each of the supervised ML models. The remaining five features, indexed 8–12, were rejected because of their low F-scores and high p-values. Features with p-values greater than 0.45 were rejected, since an increasing p-value represents a higher probability that the respective feature adds little or no information to predicting bioturbation.

Table 3 Feature selection

Put differently, features with high p-values have a high likelihood of appearing to change the BI (vol.% of bioturbation) by chance rather than through a genuine correlation with the BI. The rejected features with high p-values also corresponded with low F-scores.

Model training performance

Training of the classifiers was performed on the selected features (see Table 3), with hyperparameter tuning of the SVC and K-NN models using the training set. Figure 6 shows the effects of the SVC hyperparameters gamma, kernel, and C on the average training accuracy of the bioturbation index (vol.% of bioturbation) prediction. It can be deduced from the figure that training accuracy generally increases with the regularization parameter C across the gamma values and kernel functions considered. This is because increasing C reduces regularization and allows a more complex decision boundary, hence a smaller margin separating the bioturbation index classes, such that the hyperplane classifies the training data more accurately. Although very high C values (small regularization) can lead to overfitting of the training set, a balance between generalization and complexity is key to achieving the optimal classifier model.

Fig. 6

Effects of C, Kernel and gamma on the training performance

For the linear kernel function, for each C value in the range 1 to 1000, increasing the gamma parameter from 0.01 to 1.00 showed little or no effect on the training accuracy. In other words, the variation in similarity radius for each bioturbation index class does not affect the training fit of the SVC model. For example, at C = 1 under the linear kernel, increases in gamma from 0.01 to 1.00 produced similar training accuracies of around 62%. Similar trends can be observed for C = 10, 100, and 1000 under the linear kernel, at around 78.4%, 84.2%, and 96.1%, respectively. The insensitivity of the SVC prediction to the gamma parameter is mainly due to the simpler linear function that defines the decision boundaries for the classification of the BI (vol.% of bioturbation) on the training data considered.

Conversely, under the RBF kernel, which is a more complex function, the effect of the gamma parameter is evident in the clear distinction in training accuracies for a given C parameter (as shown in Fig. 6). The more flexible Gaussian nature of the RBF kernel allows the gamma parameter, which represents changes in the similarity radii of each BI class, to have a significant effect, leading to increased fitting performance of the SVC model, with training accuracies as high as 98% (for C = 1000 and gamma = 1.00).

Overall, the optimal hyperparameters of the SVC model selected for this work are C = 1000 and gamma = 0.10 with the linear kernel function. This selection yields an SVC model with an adequate bias-variance tradeoff. The results for the RBF kernel also show less stability in training accuracy compared to the linear kernel, for which the gamma parameter has little or no effect on training accuracy.

Figure 7 presents the effect of the k value on the training accuracy and corresponding error rate of the K-NN model. Varying k from 1 to 10 indicates a maximum training accuracy at k = 5 in the grid search results, corresponding to a training accuracy of 73.28%. For k values below 5, the decision boundaries of the K-NN model tend to be more complex and hence sensitive to outliers in the training data, since the model relies heavily on the nearest-neighbour classification. For k values greater than 5, the training accuracy of the K-NN model decreases (the error rate increases) as underfitting results from simpler decision boundaries that misclassify the training data. k = 5 was therefore selected as the optimal hyperparameter for the K-NN model.

Fig. 7

Effect of the kth nearest neighbour on the training accuracy of the K-NN model

Training results of optimized classifiers

This section presents the training results of the optimized classifiers (SVC and K-NN) and the LDA model in the form of the confusion matrices in Tables 4, 5 and 6. The confusion matrix in Table 4 shows the training results of the optimized SVC in predicting bioturbation for the training set (60 data points). For the SVC, most classes of bioturbation were excellently classified and predicted relative to the actual BI (vol.% of bioturbation) classification. However, although classes 3 and 5 were adequately predicted, false positive predictions involving classes 2 and 4, respectively, led to errors of 14.29% (for class 3) and 12.50% (for class 5). An overall training accuracy of 96.17% was obtained (3.83% error) with the optimized SVC.

Table 4 Confusion matrix showing training performance of the hyperparameter-tuned SVC model
Table 5 Confusion matrix showing training performance of the hyperparameter-optimized KNN model
Table 6 Confusion matrix showing training performance of the LDA model

In comparison, the K-NN method (Table 5) gave an overall training accuracy of 73.28%, corresponding to an average BI prediction error of 26.72%. With k = 5, the K-NN model produced some misclassifications relative to the actual BI labels of the datasets considered.

As presented in Table 6, the LDA method provides an overall training accuracy of 67.36% and an error of 32.64%. Under the LDA method, a linear combination of the features is expected to predict the BI (vol.% of bioturbation); hence, the significant error obtained could be related to an underfitting problem, which requires further cross-validation to confirm.

The SVC confusion matrix (Table 4) shows the classification performance across the seven classes. Precision, recall, and F1 score are perfect (1.00) for classes 0, 1, 2, 4, and 6, indicating flawless classification with no false positives or false negatives.

Class 3 has a lower performance with a precision and recall of 0.86, reflected in its F1 score of 0.86. This reduction is due to one instance of Class 2 being misclassified as Class 3 (false positive) and one instance of Class 3 being misclassified as Class 2 (false negative). This suggests some confusion between these two classes.

Class 5 also shows slightly reduced performance, with a precision and recall of 0.88 and an F1 score of 0.88. This is due to one instance of Class 4 being misclassified as Class 5 and one instance of Class 5 being misclassified as Class 4, indicating a minor overlap between these classes.

Overall, the model demonstrates high accuracy, correctly classifying the vast majority of instances. The slight misclassifications in Classes 3 and 5 suggest areas for further refinement, potentially focusing on distinguishing features between the closely related classes.

Other performance metrics

The results of other performance metrics, namely precision, recall, and F1 score, for each model are presented in Tables 7 and 8. Training results based on these metrics are given in Table 7. The results indicate that the SVC model outperforms the other models, with an average precision, recall, and F1 score of 96.17%.

Table 7 Other performance metrics for all models in this study
Table 8 Test performance results on BI prediction

Moreover, on a class basis, the SVC still performs best, with its lowest training classification performance being 85.71% for class 3. This indicates that the SVC model can accurately predict the bioturbation index across the different classes with minimal error.

The KNN model shows relatively lower training performance than the SVC model; although it achieved perfect recall, precision, and F1 scores for class 0, the other classes were only moderately well classified, with the lowest precision, recall, and F1 score being 50% for class 4 predictions. Overall, the F1 score of the KNN model in training was 69.71%. These performances suggest that the KNN model has some overlapping class boundaries, given the imbalanced class distributions, relative to the SVM model. The LDA model, which provides linear class boundaries, performs best for classes 1, 2, and 3, with F1 scores above 80%. However, training classifications for classes 4, 5, and 6 were as low as 37.50%. The overall precision, recall, and F1 scores of 69.15%, 67.36%, and 68.13%, respectively, indicate that the LDA model, given its linear boundaries, tends to misclassify overlapping classes.

Cross-validation of models

To establish the generalization of each model's performance, cross-validation using the fivefold stratification of the training set was performed. Figure 8(a) shows the error resulting from the variation in training sets used for each model. The stability of each model can be inferred from the variation in average error across folds 1 to 5. For instance, for the SVC model the error varies between 1% (at fold 1) and 6.25% (at fold 4). The undulating behaviour within this range is indicative of a stable model that is independent of the training data used. For the K-NN model, similar stability of training performance can be inferred, albeit with significant prediction losses. The LDA model shows a relative increase in loss with increasing fold number, combined with the highest average errors in the prediction of BI (vol.% of bioturbation); it is therefore considered the worst-performing model in this work. The average cross-validation accuracies of BI (vol.% of bioturbation) prediction for all models are depicted in Fig. 8(b). These results indicate that the optimized SVC is the best-performing classifier, followed by the optimized K-NN classifier, with the LDA classifier performing worst.

Fig. 8

Cross-validation results of the fivefold cross-validation on (a) error rates for SVC, K-NN and LDA classifiers and (b) cross-validation accuracy

Test performance

Table 8 presents the test performance outcomes for BI (vol.% of bioturbation) prediction across the SVC, K-NN, and LDA classifiers. The support vector classifier outperforms the other classifiers in the overall prediction of BI (vol.% of bioturbation), with an average accuracy of 92.86%. The poor prediction of the class 2 BI (50%) cannot be ignored, however, and more data are recommended to improve the classification results. The KNN classifier performs at an acceptable average accuracy of 90.48%; classes 3 and 5 were predicted at 66.7% accuracy as a result of the decision boundary defined by the K-NN model. The linear discriminant model remains the worst-performing classifier in this work, with an average accuracy of 76.2%; only classes 0, 4, and 6 were predicted at 100% accuracy for the test data considered.

These results compare favourably with other works: Timmer et al. (2021) considered deep learning methods (deep convolutional neural networks, DCNNs) using images as input and arrived at an 88% prediction accuracy for bioturbation, while Ayranci et al. (2021) obtained an accuracy of 70% using a neural network algorithm combined with a large number of input images to detect the Bioturbation Index (vol.% of bioturbation). In this work, the results indicate an improved prediction of the BI, especially with the SVC, given the relatively small sample space and the expert labelling of core samples with BI (vol.% of bioturbation) within the respective classes. The improved performance of the SVC compared to neural networks in this classification problem is due to the capability of the kernel function to map the data into a higher-dimensional feature space in which the intrinsic data properties become separable.

Models’ prediction limitations on bioturbated reservoir facies

Reservoir-scale advantages of studying bioturbation mainly centre on a better understanding of mud-dominated sedimentary structures, allowing improved predictions of rock properties (e.g., Buatois and Mángano 2011; Gingras et al. 2001). Bioturbation is also of special interest in hydrocarbon reservoirs since it modifies porosity and permeability. Yet such studies are also fraught with a range of major limitations and difficulties (Gibling and Bird 1994). Machine learning (ML) models applied to predicting bioturbation under reservoir conditions have both strengths and weaknesses. They are particularly good at churning through vast amounts of data, uncovering patterns and links that traditional methods may not identify. However, reservoir conditions such as fractures, fissures, and diagenetic processes can substantially impact the accuracy and reliability of these ML models. Fractures and fissures create complex flow pathways within the reservoir, complicating the interpretation of bioturbation signals and potentially reducing the performance of ML predictions (Oliver et al. 2008; Tarabulski and Reinhardt 2020). Baniak et al. (2013) propose that bioturbated reservoir facies may be permeable, with areas of low porosity acting as semi-sealed, intra-stratum micro-fracture systems. Such arrangements focus fluid flow along high-permeability pathways that may be decoupled from the surrounding bioturbated matrix by structural features, complicating predictions. Diagenetic processes such as cementation, dissolution, and recrystallization can modify primary sedimentary structures and bioturbation features at the micro-scale, complicating interpretation because these elements may affect the model's surrogates for hydraulic properties. Insight into these interactions is important for improving the predictive capabilities of ichnological models across a wide range of reservoir conditions (Worden and Burley 2003; Buatois and Mángano 2007).

Conclusion

This work considered a unique set of inputs, including the key dimensions and physical properties of core samples together with the volume of bioturbation in each sample, for the prediction of BI via the SVC, K-NN, and LDA algorithms. These classifiers provide decision boundaries that aid the multiclass prediction of bioturbation in the form of the Bioturbation Index. Seventy-six data points from core samples retrieved from existing wells in the Subei Basin, China, were used. Key machine learning steps performed in this work include data preprocessing, feature selection, model training, cross-validation, and testing. Seven (7) selected features from the core data were used as inputs to build each classifier to predict bioturbation.

A training-test data split of 80/20 was adequate for the study. Training of the SVC and K-NN models considered hyperparameter optimization and cross-validation of all models before a model evaluation using the test data set. Based on grid search, the hyperparameters of the SVC and K-NN models were selected based on adequate bias-variance tradeoff considerations. The training and test results indicate that the optimized SVC was the best classifier followed by the k-NN classifier and then the LDA classifier which was the worst-performing classifier.

The results also show that hyperparameter optimization is critical for desired model performances. The novelty of this work was evident in the application of core data that comprised rock properties and BI parameters as selected features for training each classifier to predict bioturbation compared to other works that considered images of core samples as features for bioturbation prediction. We recommend adaptive unsupervised ML classifiers to predict bioturbation in future works.