1 Introduction

The oil and gas industry has adopted machine learning methods in several areas, including exploration, production, and drilling, among others (Choubey and Karmakar 2020; Tian et al. 2021; Osarogiagbon et al. 2020; Alakeely and Horne 2021; Cheraghi et al. 2021).

At exploration stages, recoverable fluid volumes in a reservoir rock are usually estimated by interpreting integrated well-logs and 3D seismic data (Abiola and Obasuyi 2019; Paris and Stewart 2020). The interpretation of these data, including porosity logs (derived from sonic, density, neutron, or nuclear magnetic resonance measurements; El Shayeb et al. 2020), depends on a series of factors, among them matrix density, the saturating fluid type, and the drilling fluid type (Singh et al. 2016). Some of these parameters are not usually known and must be estimated from laboratory analysis of extracted samples, which is time-consuming and expensive at the reservoir scale (Singh et al. 2016). In the traditional approach, i.e., using plug samples for laboratory analysis, a typical set of samples usually requires 4–6 months to process, with the added downside that the process is destructive.

Furthermore, sample extraction involves highly localized data, which usually cover tiny areas, and diagenetic processes in carbonate reservoirs prevent the application of conventional upscaling methods. Determining porosity from well-logs is also complicated by their resolution, which is usually very coarse compared to core plugs and results in the loss of fine details (Ghaleh and Kordavani 2019). Depending on the tool, the vertical resolution achieved for porosity logs is up to 1.82 m, as with Nuclear Magnetic Resonance (NMR) logs. Additionally, these methodologies are susceptible to fluid alterations in the invaded zone due to an unexpected mixture of mud filtrate and formation fluids (Mondol 2015). Accordingly, porosity estimation from well-logs involves uncertainties and is also affected by the logging environment and logging tools (Elkatatny et al. 2018).

The reservoir characterization process requires data analysis from different sources, such as well-logs and core samples, to mitigate indirect measurements and carbonate heterogeneity uncertainties. Digital core sample analysis using X-ray Computed Tomography (CT) data has become a routine technique used to acquire 3D images for rocks that are difficult to characterize, such as carbonate rocks (Basso et al. 2020; Chandra et al. 2019; Victor et al. 2017). The CT image grayscale level is a nonlinear function of material density. Hence, since material porosity impacts medium thickness, it can be estimated from image attenuation coefficients. Wolański et al. (2017) found correlations between attenuation coefficients, formation bulk density, and the total porosity index for 230 sandstone samples classified by lithofacies without individualizing pore space.

Deep learning models, such as convolutional neural networks (CNNs) (LeCun 1989), have been used for automatic analysis of Micro Computed Tomography (\(\mu \)-CT) images in several tasks, including lithological classification of carbonate rocks (dos Anjos et al. 2021); mineral segmentation of 2D and 3D data (Da Wanga et al. 2021); porosity, specific surface area, and average pore size quantification (Alqahtani et al. 2020); and permeability estimation (Tembely et al. 2020; Sudakov et al. 2019; dos Anjos et al. 2023). Despite the promising results for petrophysical characterization using deep learning models, the proposed approaches mostly use rock samples without complex structures such as sandstone, employ high-quality \(\mu \)-CT images of rock samples as input, or apply complex simulation techniques or laboratory tests to calculate the properties used as targets for model training. The \(\mu \)-CT samples provide only very localized information, which is not representative enough to upscale to the reservoir scale, especially in heterogeneous rocks such as carbonates. Furthermore, the methodology employed to estimate these properties from well-logs at the reservoir scale can be strongly influenced by data acquisition conditions and is plagued by coarse resolutions.

An attempt to predict the petrophysical properties at the well-scale at a more detailed resolution is presented in Valentín et al. (2018). This proposal uses borehole image logs and the petrophysical properties obtained through well-log analysis as training pairs to adjust the neural network architecture. The main obstacle of this methodology is that borehole images can present many artifacts requiring complex processing methods for their generation. Another limitation stems from using well-log data to obtain petrophysical properties since these are less accurate and display lower resolution than laboratory tests. These factors can be problematic in heterogeneous samples.

Another method that attempts to predict a set of petrophysical properties, such as porosity, density, formation factor, and clay volume, at the well-scale is presented in Jouini and Keskes (2017). In that work, texture features are extracted from the tomography image using the steerable pyramids decomposition method. The extracted characteristics are used to predict the desired petrophysical characteristics using multilayer perceptron networks. Once trained, the authors use these models to create continuous profiles of the desired petrophysical properties using the textural features extracted along the core. Although the results were satisfactory, the method was tested only on sandstone samples. Unlike that approach, our proposal explores a deep convolutional model’s ability to directly estimate the set of relevant features from the image to determine the desired property in complex carbonate samples. This proposal stems from carbonate samples having a wide range of textural attributes that are difficult to characterize using traditional texture analysis methods.

Considering the previous limitations, the primary purpose of this work is to combine machine learning methods, whole-core CT images, and laboratory data to estimate porosity logs from whole-core CT images using a deep neural network. The main idea behind this work is to merge the different scales (whole-core and plugs) and measurement sets (CT images and laboratory results) to reduce uncertainties when upscaling laboratory data to the reservoir scale. The dataset used in this work is composed of Brazilian carbonate rock samples. However, the method employed is not specific for this type of rock and can be used for other types of rock formations.

An additional advantage of the proposed method is the time reduction compared with conventional porosity analysis since ours is almost instantaneous (depending only on machine time).

The main contributions of this work are:

  • an end-to-end method for porosity prediction using multiple scales data currently available in the petrophysical workflow;

  • a realistic experimental validation method using 26 wells of a Brazilian pre-salt carbonate reservoir based on well-based cross-validation to assess model generalization capabilities;

  • the method employs widely available CNN models such that practitioners can easily deploy it on their own data.

The remainder of the paper is organized as follows. Section 2 presents the theoretical background of X-ray computed tomography. Section 3 details the materials and methods used in this work. Section 4 presents the results and discussion, and Section 5 presents the conclusions.

2 Theoretical background

2.1 X-ray computed tomography

CT is a non-destructive spatial imaging technique that was originally developed for medical diagnostics but has been extensively applied in the industrial environment since the 1970s (Wevers et al. 2018). CT has been widely used in the oil and gas industry given its outstanding potential in reservoir engineering since it enables analysis of the internal structure of rock samples while also preserving a digital record of the core prior to other destructive analysis.

Based on internal variations in material density, CT images are used for quantitative and qualitative analysis of rocks. In the qualitative sphere, the technique allows facies evaluation (Odi and Nguyen 2018) and fracture mapping (Remeysen and Swennen 2008), among others. Concerning quantitative evaluations, high-resolution CT data allow 3D pore structures to be individualized, enabling estimation of a series of petrophysical properties (Andrä et al. 2013; Faisal et al. 2019). Additionally, laboratory and well-log scales can be correlated using CT images, which can also be easily associated with electrical and acoustic borehole image logs.

Detailed descriptions of acquisition systems and imaging processing can be found in Hanna and Ketcham (2017) and Ketcham and Carlson (2001).

2.2 Multi-scale analysis

The rock characterization process consists of determining the different physical, chemical, and petrophysical characteristics of rock samples. These characteristics are determined at a scale that allows them to be tied to well data and at a resolution that captures the heterogeneity of the well in detail, so that they can later be used in the characterization of oil and gas reservoirs. Some of the most important characteristics determined in this process are porosity and permeability, which are related to the amount of fluid contained in the sample and its ability to flow. To determine these characteristics, data obtained from various sources at different scales are used, such as well profile data and rock sample analysis, among others, which makes this process challenging. The idea of this paper is to combine data coming from different scales, which are detailed first:

  • whole-Core: Cylindrical rock sample extracted from the well with a conventional height of 30 m, used to perform a more detailed analysis compared to the well profile data. These samples are taken only from some regions of interest of the well;

  • whole-Core Section: Cut of 90 cm from the Whole-Core sample, made to facilitate the manipulation and digitalization of the rock. The problem with this scale is that it is not feasible to perform laboratory tests on rock samples of this size, and digital images obtained by CT equipment have a very low resolution (around 200 \(\mu \)m), which makes it impossible to model the porous system in detail, especially in heterogeneous samples such as carbonates;

  • plug/\(\mu \)-Plug: Sub-samples of the Whole-Core extracted from some regions of interest. Plugs are usually between 5 and 10 cm in height and 3 cm in diameter, while \(\mu \)-plugs are 3 to 5 mm in height and 1 to 2 mm in diameter. These sub-samples are the ones used for laboratory tests and to obtain \(\mu \)-CT images. Plugs are usually digitized at low resolution (around 40 \(\upmu \)m), while \(\mu \)-plugs are digitized at high resolution (typically between 1 and 5 \(\upmu \)m). From these images it is possible to know in detail the internal structures of the rock that control some of the petrophysical properties.

A schematic representation of the distinct scales utilized in our study can be seen in Fig. 1. In this light, the idea of this paper is to merge the coarse information obtained in the scale of the Whole-Core with the more granular information obtained from the laboratory tests done in the plug scale.

Fig. 1 Depiction of the different scales involved in a typical workflow

3 Materials and methods

In Fig. 2, we summarize the approach proposed in this work. This approach uses CT images and laboratory porosity measures (orange points) as input and targets for model training and validation. Once the model is trained, a porosity curve (blue line) can be extracted from the CT images. The proposed approach consists of 3 phases: the first phase consists of building the tomographic profile image of the rock; the second phase corresponds to training the machine learning model; and the last phase refers to using the model to infer the desired porosity curve. Although deep learning models are proposed in this work, a linear regression model trained on the averages of grayscale lines was used as a simple baseline for comparison. The elements of the proposed approach are explained in detail in this section.

Fig. 2 Overview of the proposed workflow

3.1 Dataset

This work uses CT images of 26 wells from 3 Brazilian carbonate fields. The 16-bit images were acquired on a GE BrightSpeed Elite Select 16 scanner with a voxel size of 0.49 \(\times \) 0.49 \(\times \) 1.25 mm, i.e., a spatial resolution of approximately 490 \(\upmu \)m. The number of cores, CT images, and laboratory tests for each field are presented in Table 1. The laboratory data correspond to porosity values obtained by a gas porosimeter.

Table 1 Dataset description

3.2 Data preparation

The tomographic profile is extracted by unwrapping the external cylindrical shell from the volumetric tomography as displayed in Fig. 3c. This view was selected because it is the closest portion of the volumetric data to the logged formation.
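
A minimal sketch of this unwrapping step is given below for illustration only. It assumes a volume stored as a NumPy array with axes (depth, y, x), a core centered in each slice, and nearest-neighbour sampling on a circle. The function name `unwrap_shell`, the `radius` argument, and the default of 501 angular samples (chosen to match the crop width reported later) are hypothetical choices, not details reported in this work.

```python
import numpy as np


def unwrap_shell(volume, radius, n_angles=501, center=None):
    """Sample each slice of a (depth, y, x) volume along a circle.

    Returns a 2D tomographic profile of shape (depth, n_angles).
    All parameters are illustrative assumptions.
    """
    depth, ny, nx = volume.shape
    if center is None:
        # Assumption: the core axis passes through the slice center.
        cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    else:
        cy, cx = center
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    # Nearest-neighbour pixel coordinates on the circle, clipped to the slice.
    ys = np.clip(np.rint(cy + radius * np.sin(theta)).astype(int), 0, ny - 1)
    xs = np.clip(np.rint(cx + radius * np.cos(theta)).astype(int), 0, nx - 1)
    # Advanced indexing picks the same ring of pixels in every slice.
    return volume[:, ys, xs]
```

In practice, the radius would be set slightly inside the core boundary so that the sampled ring lies in rock rather than in air; an interpolating sampler could replace the nearest-neighbour lookup.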

Before extracting the tomographic profile, the volumetric CT data are pre-processed to correct possible artifacts generated during the acquisition process, such as beam hardening. Boas and Fleischmann (2012) provide an overview of common CT artifacts and their respective correction methods. It is important to note that the acquisition equipment should be calibrated to make sure that the output image values correspond to the Hounsfield scale, which is a scale used by radiologists to interpret medical tomography (DenOtter and Schubert 2023). All images used in this work have the same acquisition settings.

Fig. 3 (a) Three-dimensional CT data and the derived bi-dimensional views: (b) transversal, (c) tomographic profile, and (d) longitudinal layer

Once the tomography profile image is extracted, the porosity measurements of the available horizontal plug samples performed at the laboratory are aligned with the CT image (Fig. 4b) using the depth information available for core photographs (Fig. 4a). Using these points as a reference, crops of dimensions [\(91\times 501\)] (adjusted so the height is similar to the sample’s diameter, 3.8 cm, and with a length spanning the entire whole-core) are generated from the tomographic profile image (Fig. 4c). Naturally, the length of the tomographic profile varies from well to well; thus, we used a sliding window with a stride of 45 pixels to create the desired crops. Using this methodology, 22,000 crops were generated.

Finally, two traditional data augmentation techniques were applied to each crop: flipping and rotation by 180 degrees, as depicted in Fig. 4d. The goal is to increase the diversity of the database and decrease the risk of overfitting the neural network model. Following the aforementioned process, input–output pairs, i.e., image crops and laboratory measures, are formed to be used in the deep learning models.
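
The crop generation and augmentation described above can be sketched as follows. This is an illustrative sketch rather than the code used in this work: it assumes the tomographic profile is a 2D NumPy array with depth along the rows, and that the flip is a left-right flip (the flip axis is not stated in the text). The function names are hypothetical, and the pairing of each crop with its laboratory porosity value is omitted.

```python
import numpy as np

CROP_HEIGHT = 91  # crop height in pixels, as reported in the text
STRIDE = 45       # sliding-window stride in pixels, as reported in the text


def sliding_crops(profile, crop_height=CROP_HEIGHT, stride=STRIDE):
    """Yield (top_row, crop) pairs along the depth axis of the profile."""
    n_rows = profile.shape[0]
    for top in range(0, n_rows - crop_height + 1, stride):
        yield top, profile[top:top + crop_height, :]


def augment(crop):
    """Return the original crop, a flipped copy, and a 180-degree rotation."""
    flipped = np.fliplr(crop)    # assumption: left-right flip
    rotated = np.rot90(crop, 2)  # two quarter turns = 180 degrees
    return [crop, flipped, rotated]
```

With a stride of 45 pixels and a height of 91 pixels, consecutive crops overlap by roughly half their height.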

For the linear regression model, which is used as a baseline, at each depth of the CT images containing a plug, a row of pixels with the same width as the image is considered and its average grayscale value is calculated. These average values are used as input to the linear regression model. The target, as in the case of the deep learning models, is the laboratory-measured porosity.
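
For illustration, the baseline can be sketched as below. The sketch uses `np.polyfit` as a stand-in for the scikit-learn regression mentioned later in the text; the function names and the `plug_rows` argument (row indices of the profile that contain a plug) are hypothetical.

```python
import numpy as np


def row_mean_features(profile, plug_rows):
    """Average grayscale value of the full-width pixel row at each plug depth."""
    return profile[np.asarray(plug_rows), :].mean(axis=1)


def fit_line(features, porosity):
    """Least-squares fit of porosity = slope * feature + intercept."""
    slope, intercept = np.polyfit(features, porosity, deg=1)
    return slope, intercept


def predict_line(features, slope, intercept):
    """Apply the fitted line to new row-mean features."""
    return slope * np.asarray(features, dtype=float) + intercept
```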

Fig. 4 Depth adjustment of laboratory porosity measurements used for model training and evaluation based on (a) colored photographs; (b) the tomography profile image; (c) crop generation by using a sliding window with a stride of 45 pixels (each dashed line rectangle represents a crop); and (d) crop data augmentation (the top rectangle is a crop taken from (c) and the remaining two rectangles are the resulting crops after the flipping and the 180-degree rotation)

3.3 Deep learning models

In this work, two distinct network topologies were utilized: a simpler convolutional neural network, henceforth referred to as the Shallow CNN (SCNN), which aims to establish a baseline model with a low computational cost, and a deeper convolutional neural network, termed the Deep CNN (DCNN). The DCNN is based on the well-known VGG19 architecture (Simonyan and Zisserman 2014) and it was initialized with its weights pre-trained on the ImageNet dataset (Deng et al. 2009). Its last layer was replaced with new fully-connected layers that were randomly initialized. After this modification, all layers of the DCNN were jointly trained on our dataset.

All convolutional layers of both topologies used the Rectified Linear Unit (ReLU) (Goodfellow et al. 2016) as an activation function. The SCNN features a single fully-connected layer after the convolutional layers, which employs ELU activation and is succeeded by a final linear layer. The DCNN, while maintaining a similar structure after the convolutional layers, includes 2 hidden fully-connected layers rather than 1. Both the SCNN and DCNN apply dropout in these fully-connected layers. Table 2 summarizes the configuration of the proposed topologies.

Table 2 Architecture of the deep learning models: \(k\) corresponds to the size of the convolutional filters, \(F\) is the number of filters in each layer, and \(S\) is the stride of the max-pooling layer

3.4 Training and validation setup

Before training, validating, or testing the model, the input images must be properly normalized. For the SCNN and the linear regression models, this normalization consists of dividing each pixel by 65,536 (\(2^{16}\), the number of gray levels of a 16-bit image). For the DCNN model, given that it was pre-trained on the ImageNet dataset, the inputs need to follow the same normalization used in the pre-training step, which consists of scaling the pixel range to the \([-1,1]\) interval. Additionally, since ImageNet (and thus the pre-trained network) uses 3-channel images, we need to convert our grayscale CT images to a 3-channel representation. This is done by replicating the original channel in all 3 channels.
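
The two normalization paths can be sketched as follows, for illustration only. The function names are hypothetical, and the exact mapping to \([-1,1]\) is an assumption (a linear rescaling of the full 16-bit range); the text only states the target interval.

```python
import numpy as np

UINT16_LEVELS = 65536.0  # 2**16, the divisor described in the text


def normalize_unit(crop_u16):
    """SCNN / linear-regression path: scale 16-bit values into [0, 1)."""
    return crop_u16.astype(np.float32) / UINT16_LEVELS


def normalize_for_pretrained(crop_u16):
    """DCNN path: scale to [-1, 1] and replicate the channel three times.

    Assumption: a linear map of [0, 65535] onto [-1, 1].
    """
    scaled = crop_u16.astype(np.float32) / 65535.0 * 2.0 - 1.0
    # (H, W) -> (H, W, 3): the same grayscale plane in every channel.
    return np.repeat(scaled[..., np.newaxis], 3, axis=-1)
```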

To assess the generalization capabilities of the proposed method, a cross-validation scheme was devised. In each of the 26 folds, an entire well is set aside to act as a blind test, while the remaining 25 wells are used for training and validation. This type of validation simulates the following real-case scenario: in an oil field with several established wells, a new one is to be built, and the model trained on the established wells is used to perform predictions for the new well. The training and validation sets correspond to 80% and 20% of the CT crops generated from the 25 remaining wells, while the test set is composed of the separated well. The training set is used to adjust the models’ weights. The validation set is used to stop the model’s training early to prevent overfitting. The test set is used to gauge performance.
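
A minimal sketch of this well-based splitting is shown below. It assumes that each crop carries the identifier of the well it came from, and that the 80/20 split is a random split over the crops of the remaining wells; the function name, the `seed` argument, and the random splitting strategy are assumptions made for illustration.

```python
import numpy as np


def well_based_folds(well_ids, val_fraction=0.2, seed=0):
    """Yield (held_out_well, train_idx, val_idx, test_idx), one fold per well."""
    well_ids = np.asarray(well_ids)
    rng = np.random.default_rng(seed)
    for well in np.unique(well_ids):
        # Every crop of the held-out well forms the blind test set.
        test_idx = np.flatnonzero(well_ids == well)
        # Crops from all other wells are shuffled and split 80/20.
        rest = rng.permutation(np.flatnonzero(well_ids != well))
        n_val = int(round(val_fraction * rest.size))
        yield well, rest[n_val:], rest[:n_val], test_idx
```

Because the held-out well never contributes crops to training or validation, the test metric reflects performance on a well the model has not seen.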

The Huber loss is used as the cost function of the SCNN and DCNN models (Eq. 1). This loss function was selected because of the presence of outliers in the dataset, which are associated with the low CT image resolution: poor structural detail in the images leads to outlier targets. The Adam optimization algorithm was used with a learning rate of 1e-4. All models were trained in an environment with four Nvidia V100 GPUs using TensorFlow and scikit-learn as frameworks. On average, an epoch of the SCNN model took approximately 26 s, while an epoch of the DCNN model took 222 s. To evaluate the models’ performance, two figures of merit were selected: the root mean square error (RMSE) (Eq. 2) and the Pearson correlation coefficient (PCC) (Eq. 3).

  • Huber Loss

    $$\begin{aligned} \text {loss}_{\delta }(y, f(x)) = \begin{cases} 0.5 (y - f(x))^2, &amp; |y - f(x)| \le \delta \\ \delta (|y - f(x)| - 0.5 \delta ), &amp; \text {otherwise} \end{cases} \end{aligned}$$
    (1)

    where \(\text {loss}_{\delta }(y, f(x))\) is the loss value for a target value \(y\) and a prediction \(f(x)\) using a hyperparameter \(\delta \). If the absolute error between the target and the model prediction is at most \(\delta \), the loss is quadratic in the error, as in the Mean Squared Error (MSE); otherwise, it grows linearly with the absolute error, as in the Mean Absolute Error (MAE). The limitation of this loss function is that the hyperparameter \(\delta \) must be chosen.

  • Root Mean Squared Error (RMSE)

    $$\begin{aligned} \text {RMSE} = \sqrt{\frac{1}{N} \sum _{i=1}^{N} (x_i - y_i)^2}, \end{aligned}$$
    (2)

    where \(x_i\) are the predicted values, \(y_i\) are the target values, and \(N\) is the number of samples.

  • Pearson Correlation

    $$\begin{aligned} \text {Pearson} = \frac{ \sum _{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y}) }{ \sqrt{\sum _{i=1}^{N}(x_i-\bar{x})^2}\sqrt{\sum _{i=1}^{N}(y_i-\bar{y})^2}} \end{aligned}$$
    (3)

    where \(\overline{x}\) and \(\overline{y}\) are the mean values of the predictions and the targets, respectively.
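
As a hedged illustration, Eqs. 1 to 3 can be written in NumPy as below. The default \(\delta = 1\) is an assumption, since the value used in this work is not reported, and the function names are hypothetical. The Huber function returns the per-sample loss; averaging over a batch would follow in training.

```python
import numpy as np


def huber_loss(y, pred, delta=1.0):
    """Per-sample Huber loss (Eq. 1); delta=1.0 is an assumed default."""
    err = np.abs(np.asarray(y, dtype=float) - np.asarray(pred, dtype=float))
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return np.where(err <= delta, quadratic, linear)


def rmse(pred, target):
    """Root mean squared error (Eq. 2)."""
    x = np.asarray(pred, dtype=float)
    y = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def pearson(pred, target):
    """Pearson correlation coefficient (Eq. 3)."""
    x = np.asarray(pred, dtype=float)
    y = np.asarray(target, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / (np.sqrt(np.sum(xc ** 2)) * np.sqrt(np.sum(yc ** 2))))
```

For an error of 0.5 with \(\delta = 1\) the loss is \(0.5 \cdot 0.5^2 = 0.125\), while for an error of 3 it is \(1 \cdot (3 - 0.5) = 2.5\) instead of the 4.5 a purely quadratic loss would give, which is how large outlier errors are damped.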

3.5 Model inference at the core scale

Once the model is trained, the porosity for each well depth can be inferred from the CT images alone without the need for the information obtained through laboratory tests. Thus, it is possible to perform model inference at the level of whole-core sections. Using these predictions, a porosity curve is constructed at the core scale with an approximate resolution of 490 \(\upmu \)m following the process summarized in Algorithm 1 and detailed below:

  • for each depth at the whole-core sections a prediction is generated. These predictions are subsequently stacked to generate a whole-core section log. Here, only the central crop of the CT image is used as the input of the model. To generate a model prediction with the same dimension as the input the top and bottom edges are padded. In this case, the padding is performed with a 45-pixel square mirroring operator;

  • before stacking the whole-core section predictions to generate the logs, the top and bottom edges are filtered-out to reduce the edge effect generated by the convolutional operators and the mirror technique used in the previous step. This filter corresponds to the function edge_filter in Algorithm 1, and consists of replacing the first and last 45 pixels with the closest valid prediction made by the model;

  • due to rock sample integrity, discontinuities or overlaps may occur when stacking predictions based on depth information, especially in fractured samples, where the dimensions of each section sample are not exactly standard. Therefore, to work around this problem, 3 operations are performed: first, fractures with a length greater than 80% of the core sample length are discarded; second, samples with a height greater than 90 cm are cropped to conform with the standard height; and finally, for samples shorter than the standard height, the model’s last prediction is repeated until the target dimension of 90 cm is reached;

  • after the previous steps, the predictions of each sub-core sample are stacked to form the predicted porosity log. Given that the obtained log displays high-frequency components, it undergoes a smoothing process. This is also done in order to perform direct comparisons with other property profiles obtained by well drilling tools, which are usually smooth. The smoothing process consists of applying a sixth-order Butterworth filter with a cutoff frequency.
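
Two of the post-processing steps above can be sketched as follows, under stated assumptions. The name `edge_filter` comes from Algorithm 1, but its body here is only a plausible reading of the description; `fit_to_length` is a hypothetical helper, and the target length in rows (the number of prediction rows corresponding to 90 cm) is left as a parameter. The fracture discarding and the Butterworth smoothing are not shown.

```python
import numpy as np

EDGE = 45  # number of edge pixels replaced, as reported in the text


def edge_filter(pred, edge=EDGE):
    """Replace the first and last `edge` values by the closest valid prediction."""
    out = np.array(pred, dtype=float)
    if out.size <= 2 * edge:
        raise ValueError("section is too short for the requested edge width")
    out[:edge] = out[edge]         # first valid value after the top edge
    out[-edge:] = out[-edge - 1]   # last valid value before the bottom edge
    return out


def fit_to_length(pred, target_len):
    """Crop a section log that is too long, or repeat its last value if too short."""
    pred = np.asarray(pred, dtype=float)
    if pred.size >= target_len:
        return pred[:target_len]
    padding = np.full(target_len - pred.size, pred[-1])
    return np.concatenate([pred, padding])
```

After both steps, every section log has the same length, so the sections can be concatenated in depth order to form the whole-core porosity log.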

Algorithm 1 Model inference at the core scale

The analysis of the predicted porosity curve at the core scale is essential to achieve the upscaling objective of the present work; thus, in addition to the direct comparison between the predictions of the model and the measurements made in the plug scale, a second evaluation approach was proposed, which consists of a comparative analysis between the porosity curve predicted by the model and the porosity log derived from the NMR log. This comparison is only performed because no other result concerning the porosity curve (with the same high-frequency components as the predicted curve) may serve as a reference. Our objective is not to match the referenced porosity log perfectly but to obtain a high-frequency porosity curve in the core scale that exhibits the same behaviour.

One of the main limitations of this comparison is the limited availability of tomography images with corresponding porosity logs at the well scale for research purposes. Thus, in this work, porosity logs could only be obtained for a single well, including an 8-meter section of the core images (not included in Table 1). However, the well-log data still present obstacles even after being acquired. First, these data are strongly influenced by acquisition conditions, such as tool calibration, the presence of drilling fluid, vugular porosity, and magnetic minerals. These factors reduce the reliability of the data, which in some cases causes significant deviations from the laboratory data, hindering the comparison of data at different scales. The second obstacle is related to the low resolution of these data (on the scale of centimeters) compared to the tomography data (on the scale of micrometers), which can prevent the identification of certain levels of heterogeneity in the sample that are noticeable at the tomography scale but not at the well scale. Furthermore, this mismatch may generate errors in the alignment between the log data and the data extracted from CT images. Thus, this process usually uses complementary information from well-logs acquired with different logging tools. Even the profile predicted by the model can be used as a reference to perform this adjustment automatically.

To solve the problem of scaling the profile, a cubic interpolation process is performed to generate a profile with the same resolution as the core image (490 \(\mu \)m); the resulting record is called the filtered log. Then, to correct the misalignment between the predicted porosity log and the filtered porosity log, a manual alignment process was performed. Given the above considerations, matching the porosity values perfectly is not as relevant as matching their behavior (which in the end is our goal); thus, the correlation analysis between the available logs (ours and the ones coming from NMR) is important to validate this result.

4 Results

4.1 Well-based cross-validation scheme

Tables 3 and 4 present the results obtained by the proposed models in terms of the RMSE and PCC in the cross-validation scheme introduced in Sect. 3.4. The metrics presented correspond to the model’s performance when the corresponding well is used as a blind test for the model.

Table 3 RMSE results for each fold of the well-based cross-validation scheme
Table 4 PCC results for each fold of the cross-validation scheme

These tables indicate some interesting facts: firstly, the models are able to generalize on the blind test dataset; secondly, the deep learning models show better results on average compared to the linear regression model, with the DCNN model presenting the smallest RMSE on average. Besides, it is possible to notice that, when we analyse the results for each well, the deep learning models have the best RMSE for almost every well and the best PCC for every well. On the other hand, the linear regression model has the best RMSE for only two wells.

To examine the differences between the deep learning models and the linear regression model, a histogram of the per-well RMSE of the tested models was computed and is shown in Fig. 5. For the linear model, the errors concentrate on larger RMSE values, while the deep learning models have a greater concentration of smaller RMSE values. Another interesting fact is that the difference between the deep models is small, which suggests that the approach is robust with respect to the network topology.

Fig. 5 Histogram per well of the RMSE for the LR, SCNN and DCNN models

Fig. 6 Scatter plot for each fold of the cross-validation scheme (well W2 is isolated in Fig. 7). The horizontal axis of the plots corresponds to the target porosity value, and the vertical axis corresponds to the predicted porosity

Even though the difference of the average results of SCNN and DCNN is small, the DCNN model was chosen for the application described in this article since it has a smaller RMSE on average with smaller variance. Thus, all the results presented hereafter will refer to the DCNN architecture.

Figures 6 and 7 show the scatter plots for each well for the fold in which this well is used as the blind test. In these images, apart from the scatter plot itself, two lines were plotted. These lines represent the maximum desired error of the method, stipulated here to be 5\(\%\). The 5\(\%\) accuracy limit was defined by experts based on a tolerable margin of error for reservoir characterization, considering the uncertainties involved in the process. Visual inspection of these figures confirms the overall good performance of the model, with relatively few cases where the model’s predictions fall outside the stipulated limits. These cases allow us to better understand the source of errors in the problematic wells. In the rest of this section, we try to understand the reasons why the model’s performance dropped for these particular wells.
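
The visual check against the error limit can also be expressed numerically. The sketch below is illustrative only and assumes that the 5\(\%\) limit is an absolute difference in porosity units (percentage points); the function name is hypothetical.

```python
import numpy as np


def fraction_within_tolerance(pred, target, tol=5.0):
    """Share of predictions whose absolute error is at most `tol` porosity units."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target) <= tol))
```

Computed per well, this fraction would flag the same wells that stand out in the scatter plots.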

Fig. 7 Scatter plot of the predicted porosity and laboratory porosity of well W2. Regions within the black dashed lines correspond to the laboratory porosity range of [12–13%] and predicted porosity range of [19–20%]. It is interesting to notice that the target porosity, although restricted, spans the entire range of predicted values, and vice-versa

By analyzing Fig. 6, an unusual porosity distribution can be identified in well W9, which is mostly represented by low-porosity samples. Another characteristic of this well, shared with W8, is a reduced number of samples (approximately 1.37% and 1.72% of the total data), which results in poor statistical relevance. Notably, despite the low correlations for wells W8 and W9, the RMSE values remain low, at 3.793 and 2.231, respectively, compared with the LR baseline model, indicating that the model can infer satisfactory porosity values for these wells.

To correlate the numerical results with texture patterns, some image slices of sections with significant divergence between porosity predictions and laboratory measurements are discussed in the following figures. Notably, to perform this visualization, the contrast of each image is adjusted to facilitate expert analysis, allowing us to highlight some textural attributes.

In Fig. 8, characteristics of silicification processes can be identified in both nodular and laminated structures. The specific gravity of silica minerals is lower than that of most of the minerals that compose carbonate rocks; therefore, such materials present a darker gray tone in the tomography images, which can generate a texture that can be confused with porous spaces in some cases, resulting in overestimations by the model of the predicted porosity.

Fig. 8 Crops of size 44.4 mm \(\times \) 244.6 mm from wells W8 and W9 showing laminated and nodular textural patterns where the predicted porosity value was overestimated compared to the laboratory-measured porosity at the same depth

Concerning wells W20 and W22, crops at depths where the divergence between the predicted porosity value and the laboratory measurements exceeds 5% are illustrated in Fig. 9. In the first and second crops, from well W20, the presence of vugs and release joints between the laminations is evident. The remaining crops in Fig. 9 illustrate vugs that were possibly not covered by sampling and laboratory measurement but that may be interpreted as porous space by the CNN models, which could explain the overestimation by the model. In cases where porosity was underestimated (these cases are not shown in Fig. 9), especially in well W22, a pattern could not be found in the images, but a probable explanation for this behavior is the presence of sub-resolution porosity, i.e., pores with diameters smaller than the resolution of the medical CT acquisition setup, which are consequently not captured by CT imaging and thus cannot be seen by the model.

Fig. 9 Crops of size 44.4 mm \(\times\) 244.6 mm from wells W20 (first and second) and W22 (third and fourth) where the predicted porosity exceeded the laboratory-measured porosity at the same depth by more than 5%
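The selection of depths outside the error band can be illustrated with a minimal sketch. It assumes that the 5% threshold refers to absolute porosity percentage points, which the text does not state explicitly. The function name `flag_divergent` and all numbers are hypothetical and are not data from the study.

```python
import numpy as np

def flag_divergent(depth, predicted, measured, tol=5.0):
    """Split depths into over- and underestimated sets.

    Porosities are given in percent. `tol` is the half-width of the
    error band, assumed here to be absolute percentage points.
    """
    depth = np.asarray(depth, dtype=float)
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    over = depth[diff > tol]     # prediction exceeds lab value by more than tol
    under = depth[diff < -tol]   # prediction falls below lab value by more than tol
    return over, under

# Made-up values for illustration only
depth = [5000.1, 5000.4, 5000.7, 5001.0]
predicted = [12.0, 18.5, 6.0, 9.0]
measured = [11.0, 10.0, 14.0, 9.5]

over, under = flag_divergent(depth, predicted, measured)
print("overestimated at:", over)
print("underestimated at:", under)
```

Keeping the two sets separate mirrors the discussion, where overestimation is attributed mainly to non-sampled vugs and underestimation to sub-resolution porosity.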

The models tend to underestimate the porosity values for well W13, as shown in Fig. 6. This well is composed primarily of a laminated texture, as shown in Fig. 10. In this example, the porosity of the laminated textures consists predominantly of sub-resolution porosity because of their fine depositional components. Furthermore, most of the high-porosity laminated texture in our dataset is found in this well, so the model fails when this well is not included in the training set, which points once again to the need for a representative dataset in machine learning models.

Fig. 10 Crops of size 44.4 mm \(\times\) 244.6 mm from well W13 showing laminated textural patterns where predicted porosity values were underestimated by the model compared to the laboratory-measured porosities at the same depth

Some of the images outside the error range (\(\pm 5\%\)) from well W17 are shown in Fig. 11, where a large amount of possibly non-sampled vugular porosity can be observed. Because the images used here show the external area of the core, we cannot guarantee that these pores were adequately represented in the sample extracted from the inner portion of the core for laboratory tests, which may be the main reason for the overestimation of the predicted porosity. Fig. 11 also shows two cases in which the contribution of sub-resolution porosity is probably large. As with W13, the model recognizes a textural pattern in the images and predicts porosity values within a small range for similar images.

Fig. 11 Crops of size 44.4 mm \(\times\) 244.6 mm from well W17 showing a similar textural pattern, where predicted porosity values are similar, while the laboratory-measured porosities at the same depth range from approximately 2 to 20%

As with wells W13 and W17, the model predicts lower porosity values for well W19 at depths where the laboratory porosity exceeds 15%. However, depths with porosities below 5% are overestimated, as shown in Fig. 6 (W19). As previously described, the overestimation is probably related to vugs that were possibly not covered during the sampling process and were therefore not measured by routine core analysis, as shown in Fig. 12. Fig. 13 depicts cases in which the contribution of sub-resolution porosity is probably large.

Fig. 12 Crops of size 44.4 mm \(\times\) 244.6 mm where predicted porosity values were overestimated by the model with respect to the values measured for well W19 in the laboratory at the same depth

Fig. 13 Crops of size 44.4 mm \(\times\) 244.6 mm where predicted porosity values were underestimated by the model with respect to the values measured for well W19 in the laboratory at the same depth

Finally, to understand the impact of image heterogeneity on this methodology, two porosity intervals were selected from the data for well W2: laboratory porosity between 12% and 13%, and predicted porosity between 19% and 20%, as outlined in Fig. 7.

Some of the tomography images from depths whose porosity falls within these intervals are presented in Figs. 14 and 15, where extensive variability is evident between images of the same group. In Fig. 14, some images show fine granulometry and no visible pores, whereas others show visible pores; this variability in textures with the same laboratory-measured porosity may cause confusion in the training process. Fig. 15 shows that there is no single textural pattern to which the model assigns the same range of porosity values: some cases show extensive macro-porosity and others no apparent macro-porosity, which reflects the heterogeneity of the training set.

Fig. 14 Crops of size 44.4 mm \(\times\) 244.6 mm from well W2 with porosity values measured in the laboratory ranging from 12 to 13% whose predicted porosity values range from approximately 6 to 19% at the same depth

Fig. 15 Crops of size 44.4 mm \(\times\) 244.6 mm from well W2 with predicted porosity values ranging from 19 to 20% whose porosity values measured in the laboratory range from approximately 6 to 21% at the same depth
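The two interval selections used for well W2 can be sketched as follows: samples are filtered by one porosity value (laboratory or predicted), and the spread of the other value within the group is reported. This is only an illustrative sketch. The helper name `spread_in_interval` and the sample values are hypothetical, chosen to resemble the ranges quoted in the captions, and are not the authors' data.

```python
import numpy as np

def spread_in_interval(selector, other, low, high):
    """Select samples with `selector` in [low, high]; return (count, min, max) of `other`."""
    selector = np.asarray(selector, dtype=float)
    other = np.asarray(other, dtype=float)
    mask = (selector >= low) & (selector <= high)
    if not mask.any():
        return 0, None, None
    return int(mask.sum()), float(other[mask].min()), float(other[mask].max())

# Made-up porosity values in percent, for illustration only
lab = np.array([12.2, 12.8, 12.5, 6.0, 21.0, 15.0])
pred = np.array([6.5, 19.0, 12.0, 19.4, 19.8, 19.5])

# Group 1: fixed laboratory porosity, spread of the predictions
print(spread_in_interval(lab, pred, 12.0, 13.0))
# Group 2: fixed predicted porosity, spread of the laboratory values
print(spread_in_interval(pred, lab, 19.0, 20.0))
```

A wide spread in either group signals that images with very different textures share the same porosity label or the same prediction.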

In summary, a significant portion of the model’s errors seems to stem from three cases. First, the textural attributes of the well were not present during training, in which case the only solution is to retrain the model (although machine learning models can generalize, they cannot do so for previously unseen features). Second, the porosity occurs at a finer resolution than that of the images used by the model and is thus invisible to it. Third, the extracted plug samples were not representative of the whole-core pore structure. Apart from these cases, the model achieved good results on the images considered.

4.2 Prediction at the core/well log scale

As described in Sect. 3.5, the smoothed estimated porosity is compared with the NMR-derived porosity. NMR logs are commonly used for porosity estimation and pore size distribution analysis, together with bulk density logs. These logs were chosen because of their shallow depth of investigation (approximately 2–10 cm), which allows a reasonable comparison with imaging data acquired from core samples. Notably, an interpolation process was performed to create a continuous porosity log from the NMR data. This comparison is presented in Fig. 16; data from an 8-meter sequence of a well were used to evaluate model performance in this task, resulting in an RMSE of 2.656 and a PCC of 0.768. As can be seen, the two porosity curves behave very similarly.

Fig. 16 a Log view of an 8-meter sequence of a tomography profile image and the respective b porosity logs (black: NMR-derived porosity; blue: predicted porosity log after postprocessing) and c density logs (yellow: grayscale-based CT; orange: bulk density) (color figure online)
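The two metrics reported for this comparison, the root-mean-square error (RMSE) and the Pearson correlation coefficient (PCC), can be computed once both logs share a common depth grid. The sketch below assumes linear interpolation of the NMR log onto the depths of the predicted log. The interpolation scheme actually used is not specified in the text, and the function name `compare_logs` and the toy values are hypothetical.

```python
import numpy as np

def compare_logs(depth_pred, phi_pred, depth_nmr, phi_nmr):
    """Interpolate the NMR log onto the predicted-log depths; return (RMSE, PCC)."""
    depth_pred = np.asarray(depth_pred, dtype=float)
    phi_pred = np.asarray(phi_pred, dtype=float)
    # np.interp requires increasing depth_nmr; linear interpolation is an assumption
    phi_nmr_on_grid = np.interp(depth_pred, depth_nmr, phi_nmr)
    rmse = float(np.sqrt(np.mean((phi_pred - phi_nmr_on_grid) ** 2)))
    pcc = float(np.corrcoef(phi_pred, phi_nmr_on_grid)[0, 1])
    return rmse, pcc

# Toy logs: the prediction follows the NMR trend with a constant +1 offset
depth_nmr = [0.0, 1.0, 2.0]
phi_nmr = [10.0, 12.0, 14.0]
depth_pred = [0.0, 0.5, 1.0, 1.5, 2.0]
phi_pred = [11.0, 12.0, 13.0, 14.0, 15.0]

rmse, pcc = compare_logs(depth_pred, phi_pred, depth_nmr, phi_nmr)
print(f"RMSE = {rmse:.3f}, PCC = {pcc:.3f}")
```

In this toy case the constant offset produces a nonzero RMSE while the PCC remains 1, which is why the two metrics are reported together: the RMSE measures the magnitude of the deviation, whereas the PCC measures only how well the curves co-vary.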

5 Conclusion

Traditional laboratory methods for porosity analysis are accurate but localized and time-consuming. In this paper, a new methodology based on artificial intelligence is proposed to estimate porosity directly from CT images of rock samples and subsequently obtain a high-frequency porosity well log. Two CNN models with different depths were proposed to achieve this objective, and the proposed method was compared against a baseline linear regression model. The results using the proposed cross-validation scheme show that all models are capable of generalization in blind sets. The DCNN model displayed lower RMSE deviations and averages, suggesting that it takes into account not only the grayscale values but also the texture of the image. These gains, together with the fact that these models are trained once and used many times, justify the model’s increased complexity. However, the models may perform differently in wells with other textural characteristics, considering the wide heterogeneity of carbonate rocks. This work focused mainly on validating the proposed methodology rather than identifying the best neural network architecture; other CNN models and different topologies could be employed and tested in future work.

The proposed methodology can be used to generate a continuous core-scale porosity log directly from CT images in a few seconds, depending on core dimensions and computational resources. The results show that this prediction is helpful for the preliminary analysis of rock samples, providing valuable insight into reservoir characteristics. Core-to-log correlation is a common practice used to calibrate log measurements, providing a bridge between coarse-resolution data obtained at the well-log scale and laboratory data obtained at fixed intervals according to government rules. The continuous porosity log predicted by the model overcomes the lack of resolution of the previous data, reducing the uncertainty of the upscaling process in heterogeneous regions. The proposed approach also makes it possible to automate the alignment between well-log data and core-scale data. Traditionally, this process involves the analysis of complementary information from different well logs. Some tests performed by the authors, followed by comparison with manual alignment, show promising results.
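One simple way such an automated alignment could work is to search for the depth shift that maximizes the correlation between the predicted core-scale log and a well log. The sketch below is an assumption made for illustration and is not the procedure tested by the authors. It presumes both logs have already been resampled onto the same regular depth step, and the function name `best_depth_shift` and the synthetic signals are hypothetical.

```python
import numpy as np

def best_depth_shift(core_log, well_log, step, max_shift):
    """Return (shift, r): the shift to add to core depths that maximizes Pearson r.

    Both logs are assumed to be sampled on the same regular depth step.
    """
    core_log = np.asarray(core_log, dtype=float)
    well_log = np.asarray(well_log, dtype=float)
    max_lag = int(round(max_shift / step))
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Pair core_log[i] with well_log[i + lag] over the overlapping samples
        if lag >= 0:
            a, b = core_log[: len(core_log) - lag], well_log[lag:]
        else:
            a, b = core_log[-lag:], well_log[: len(well_log) + lag]
        n = min(len(a), len(b))
        if n < 3:
            continue  # too few samples for a meaningful correlation
        a, b = a[:n], b[:n]
        if a.std() == 0 or b.std() == 0:
            continue  # correlation is undefined for a constant segment
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * step, float(best_r)

# Synthetic check: the core log equals the well log displaced by 3 samples
rng = np.random.default_rng(0)
well = rng.normal(size=60)
core = np.roll(well, -3)  # core[i] == well[i + 3] for i < 57
shift, r = best_depth_shift(core, well, step=0.1, max_shift=1.0)
print(shift, r)
```

An exhaustive lag search is used here because it is transparent and easy to validate against a manual alignment; it only recovers a constant shift, not stretching or compression along the core.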

Future work may include efforts in automation and in extending the proposed method to other petrophysical properties, such as permeability. On the automation front, one goal is the identification of high-frequency stratigraphic cycles at the well-core scale, which are associated with specific patterns of petrophysical attributes (Zhao and Li 2015). Such automatic identification can accelerate the process of detailed well characterization, thus avoiding human bias.