1 Introduction

Successful production of hydrocarbons from unconventional shale plays relies heavily on knowledge of the mechanical properties of the formation, including elastic moduli and strength (Dahi-Taleghani et al. 2011; Hoek and Martin 2014). Shales are generally extremely heterogeneous composite materials, with various constituents and a complex microstructure. As a consequence, researchers face many challenges in studying their mechanical properties (Li et al. 2018a, b, 2019a). Conventional laboratory mechanical testing, such as uniaxial and triaxial compressive strength testing, is time-consuming and expensive. Additionally, acquiring good-quality rock samples for these laboratory measurements can be challenging, and sometimes impossible, because of the brittleness of the samples and their limited availability. Numerical modeling methods based on micromechanical theory have therefore been adopted in many studies (Goodarzi et al. 2017, 2016; Li et al. 2019a, b; Liu et al. 2018; Zhao et al. 2019). However, these models generally rely on assumptions that simplify the interactions between the various components (Goodarzi et al. 2017, 2016; Zhao et al. 2019).

Recently, indentation techniques, which show great promise for assessing the mechanical properties of shales at a local scale, have gained popularity relative to conventional macroscale methods. Instrumented indentation is a technique that allows us to measure the mechanical properties of porous composites comprised of multiple minerals. A small probe penetrates a flat surface of the sample, while the applied force and the displacement of the probe are recorded concurrently and used to calculate mechanical properties, including Young’s modulus and hardness (Ulm and Abousleiman 2006). The mechanical properties of shales are scale-dependent (Abedi et al. 2016b; Li et al. 2021a; Veytskin et al. 2017). At the macroscale, the mechanical properties of shale are measured in the laboratory on core plugs or bulk samples, which contain natural fractures and weak bedding planes. In comparison, the indentation technique can be used to investigate the mechanical properties at a fine scale, avoiding the effect of unevenly distributed microfractures. Many existing analytical models ignore these features by treating shales as composite materials in which different minerals and organics are geometrically stacked together. In such models, the mechanical response is determined by the microstructure, the configuration of the components, and grain-to-grain interactions (Sayers 2013; Goodarzi et al. 2016, 2017; Zhao et al. 2019).

Considering the sophisticated relationships between the various components in shales, artificial intelligence (AI) and machine learning (ML) techniques have made these problems more tractable. Machine learning techniques are generally used to uncover inherent rules and to highlight hidden features. In the past few years, a large number of studies using ML methods have been conducted to solve problems in petroleum engineering and the geosciences (Izadi et al. 2015; Khan et al. 2018; Li et al. 2019a, b, c, 2021b; Misra et al. 2019; Torlov et al. 2017). Among the various machine learning algorithms, artificial neural networks (ANNs) have demonstrated outstanding performance on a range of tasks. A typical ANN is comprised of multiple layers, including an input layer, hidden layers, and an output layer, and each layer contains a number of nodes (Hassoun et al. 1995). The computation passes through the connected nodes of each layer; the connections carry weights and biases, and a non-linear transformation is applied before the outputs are sent to the next layer of the network (Hassoun et al. 1995). ANN models must be trained on a large dataset; during training, the weights and biases are iteratively updated to minimize the error via the backpropagation algorithm. ANNs with multiple hidden layers are classified as deep learning models. Convolutional neural networks (CNNs) are a category of deep neural networks that has achieved state-of-the-art performance in computer vision applications. CNNs generally follow the same basic structure as ANNs, with additional convolutional and pooling layers (Simard et al. 2003). However, unlike ANNs, where all nodes are fully connected to one another, convolutional layers apply filters over multiple image locations and preserve spatial relationships. Pooling layers are used to simplify the output of the preceding convolutional layer (Krizhevsky et al. 2012). Compared with classical machine learning algorithms, which mostly rely on hand-engineered filters or algorithms to extract specific features, ANNs and CNNs learn complex hidden features automatically and can therefore exhibit outstanding performance on complex data such as images (Sultana et al. 2018).

Studies using image-based deep learning methods to estimate mechanical properties are limited. Most machine learning methods applied so far to the geomechanical properties of rocks are restricted to well logs. For example, He et al. (2018) reported a comparative study of shallow learning models for calculating compressional and shear wave travel-time logs. Gupta et al. (2019) compared the performance of several machine learning regressors in generating synthetic sonic logs and predicting mechanical properties. The application of machine learning to imagery for determining mechanical properties is comparatively rare. Li et al. (2019c) employed CNNs to establish a relationship between simplified mineralogy maps and the effective mechanical properties of shale rocks; the labeled mechanical behavior of each sample was obtained using a finite-element model with generated input. However, obtaining mineralogy maps of rocks is time-consuming: it requires either special proprietary software or processing of backscattered electron (BSE) SEM and energy-dispersive X-ray (EDX) images by experienced technicians. In contrast, the EDS/EDX elemental maps themselves are easily obtained. Elemental mapping involves rastering an electron beam point by point over an area of interest on the sample surface, while the spatial distribution and intensity of each element are recorded as pixel-by-pixel (bitmap) images. In the geosciences, the surface distribution and intensity of the elements in 2D maps of the same area allow users to determine which mineral phases are present. Additionally, microstructural information on shale rocks, such as the shape of the mineral particles, their distribution pattern, and the interaction between mineral phases, can be obtained from these electron micrographs.

In this study, we present a new approach for estimating Young’s modulus of shales by applying deep learning CNNs to 2D elemental intensity distribution maps. An end-to-end CNN model was employed to link the constituent chemical elements of shale samples to the corresponding mechanical properties of the same area. To do so, a CNN model was trained with the elemental maps as input and Young’s modulus as ground truth data. Compared with 2D mineral maps, elemental images are much easier to generate, since they are the main result obtained from electron microscopy. Hence, we can skip the generation of mineral maps as an intermediate step toward estimating the mechanical properties, which is the most labor-intensive and time-consuming step. The labeled data were obtained from laboratory microindentation measurements of Young’s modulus. The CNN model was then trained and used to predict the mechanical behavior of the samples. This study shows that collecting quick and easy elemental maps of the surface with SEM–EDS would be sufficient to estimate the mechanical properties of the samples.

2 Materials

Samples used in this study were retrieved from the Bakken Formation, Williston Basin, North Dakota, U.S. The Williston Basin is a major energy resource that covers western North Dakota and the northeastern region of Montana and extends into parts of Canada (Fig. 1a). The Bakken Formation is the most significant hydrocarbon-bearing layer within the basin. It comprises three distinct members: the Upper, Middle, and Lower Bakken. The upper and lower members are mostly black, organic-rich shales with an average total organic carbon (TOC) content of 8 wt%, and the middle member is fine carbonate-rich sandstone and siltstone (Abarghani et al. 2018). Ten shale samples taken from the lower and upper members in three different wells were used in this study (Fig. 1b). The samples were purposely chosen from various locations to reflect variations in mineralogy, thermal maturity, and microstructure.

Fig. 1

a Schematic map showing the location of the Bakken Formation in the Williston Basin. The Bakken Formation covers western North Dakota and northeastern Montana, US, and extends into Canada; b Well locations from which the samples were retrieved, shown on an enlarged map of North Dakota

3 Methodology

This section describes the methods used to collect the input data (element intensity maps) and the labeled data (Young’s modulus), followed by a description of the architecture of the CNN model and a flowchart summarizing the entire workflow.

3.1 Acquisition of Input Dataset–Elemental Density Images

The input data are element density maps. An element density map is an image showing the spatial distribution and intensity of an element on the surface of the sample (Wenner et al. 2017). The elemental density images were acquired by energy-dispersive X-ray spectroscopy (EDS/EDX). An FEI Quanta FEG 650 SEM instrument, equipped with an X-ray detector, was used to map the elements over the area of investigation. The operating principle is based on the ejection of an electron from an inner shell of an atom, which leaves a hole in that shell. An X-ray is generated when an electron from an outer, higher-energy shell drops into the inner, lower-energy shell to fill the hole and releases the energy difference. The energy and the number of the emitted X-rays can be detected by an energy-dispersive spectrometer. Since the energy (and hence the wavelength) of these X-rays is unique to each element, it is characteristic of the atomic structure of the emitting element (Shindo and Oikawa 2002). The resulting maps show the elements present on the surface and their proportions at the submicron scale (Saif et al. 2017).

Mapping of the element intensity is based on compiling the elemental composition across a given area of the sample, following these steps (Fig. 2): (a) the electron beam scans the target area to produce a BSE image, (b) the EDS X-ray detector examines each pixel at the desired resolution to collect an X-ray spectrum at each grid block and extracts information about which elements are present and the proportion of each, and (c) an energy window is defined for each element of interest, and the number of X-rays detected in that window at each X, Y location is plotted to create the intensity maps. These maps illustrate regions of relatively high and low elemental intensity (Shindo and Oikawa 2002).
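
Step (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the acquisition software used in this study: the spectrum cube, the channel energies, and the ±0.10 keV window half-width are all made up, and the K-alpha line energies are standard reference values rather than settings taken from the experiments.

```python
def intensity_map(cube, energies_kev, window):
    """Sum the X-ray counts inside an energy window at every pixel.

    cube[y][x] is the spectrum (counts per channel) recorded at pixel (x, y);
    energies_kev[i] is the centre energy of spectral channel i.
    """
    lo, hi = window
    channels = [i for i, e in enumerate(energies_kev) if lo <= e <= hi]
    return [[sum(spectrum[i] for i in channels) for spectrum in row]
            for row in cube]


# Reference K-alpha energies in keV; the +/-0.10 keV window is hypothetical.
LINES_KEV = {"Si": 1.740, "Ca": 3.692, "Fe": 6.404}
WINDOWS = {el: (e - 0.10, e + 0.10) for el, e in LINES_KEV.items()}

# Toy 2 x 2 scan with five spectral channels per pixel (made-up counts).
energies = [1.50, 1.70, 1.80, 3.70, 6.40]
cube = [
    [[0, 5, 3, 1, 0], [0, 0, 1, 9, 0]],
    [[1, 1, 1, 1, 1], [0, 2, 2, 0, 7]],
]

# One intensity map per element, each with the same 2 x 2 shape as the scan.
maps = {el: intensity_map(cube, energies, win) for el, win in WINDOWS.items()}
```

Each entry of `maps` has the pixel grid of the scan, and stacking the per-element maps gives the multi-channel image used as model input.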

Fig. 2

Schematic illustration of element density mapping through energy-dispersive X-ray spectroscopy. Acquiring the element intensity maps requires the following steps: a a back-scattered electron (BSE) image is scanned via SEM and divided into multiple tiles; b for each point within a tile, the X-ray detector is used to obtain c the X-ray spectrum, which contains the intensity information for the elements; after all the points have been examined and analyzed, d a spatial intensity map is generated for each element

3.2 Acquisition of Labeled Data (Young’s Modulus) via Microindentation

The labeled data are Young’s modulus values measured through laboratory microindentation tests. Instrumented indentation is a load- and displacement-sensing technique that has been broadly used to characterize the mechanical properties of a variety of materials at the microscale or nanoscale (Abedi et al. 2016a, b; Bobko and Ulm 2008; Bobko 2008; Fischer-Cripps 2011). The indentation procedure involves pressing an indenter with a fine probe into the sample surface by applying a load. During this process, the applied load P and the displacement h are recorded by sensors. The elastic properties of the material are estimated from the measured indentation load–displacement curve (Fig. 3) via the following analytical model (Oliver and Pharr 1992, 2004):

Fig. 3

Load–displacement curve of indentation (modified from Hu and Li 2015). Each indentation procedure has three sections. A fine probe is pressed into the surface of the shale sample by applying a load (loading section, shown in red). When the applied load reaches the predefined maximum load, Pmax, the probe holds the load for a period of time (holding section, 200 s in this study, shown in black), and then the probe is lifted from the sample surface to its original position (unloading section, shown in green). Young’s modulus for each indent was then calculated from the unloading curve using Eqs. 1–4

$$\begin{array}{c}{E}_{r}=\frac{\sqrt{\pi }}{2}\frac{S}{\sqrt{{A}_{c}}}.\end{array}$$
(1)
$$\begin{array}{c}H=\frac{P}{{A}_{c}}.\end{array}$$
(2)
$$\begin{array}{c}S={\left(\frac{dP}{dh}\right)}_{h={h}_{\mathrm{max}}}.\end{array}$$
(3)

Here, \({E}_{r}\) is the reduced Young’s modulus, which is a function of the stiffness, S, and the indent contact area, \({A}_{c}\). The stiffness, S, is obtained by fitting the slope of the upper portion of the unloading curve (Fig. 3). The hardness, H, can be estimated from the peak load, P, and the contact area, \({A}_{c}\). Young’s modulus can then be calculated from the following equation (Constantinides et al. 2006; Hertz 1881):

$$\begin{array}{c}\frac{1}{{E}_{r}}=\frac{1-{{v}_{s}}^{2}}{{E}_{s}}+\frac{1-{{v}_{\mathrm{tip}}}^{2}}{{E}_{\mathrm{tip}}}.\end{array}$$
(4)

Here, \({v}_{s}\) and \({v}_{\mathrm{tip}}\) denote the Poisson’s ratios of the sample and the probe, respectively, and \({E}_{s}\) and \({E}_{\mathrm{tip}}\) are the Young’s moduli of the sample and the probe, respectively. It has been shown that the value of Poisson’s ratio does not significantly affect the derived Young’s modulus; therefore, an accepted value of 0.3 is used in further calculations (Liu et al. 2018).
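
As a worked illustration of Eqs. (1), (2), and (4), the sketch below evaluates the reduced modulus, the hardness, and the sample modulus for one indent. It is a minimal sketch under assumptions: the stiffness and contact area are invented numbers, and the diamond tip constants (`e_tip = 1141` GPa, `nu_tip = 0.07`) are commonly quoted literature values that are not stated in this section. With S in mN/μm, P in mN, and \({A}_{c}\) in μm², the results come out directly in GPa.

```python
import math


def reduced_modulus(stiffness, contact_area):
    """Eq. (1): E_r = (sqrt(pi) / 2) * S / sqrt(A_c)."""
    return math.sqrt(math.pi) / 2.0 * stiffness / math.sqrt(contact_area)


def hardness(peak_load, contact_area):
    """Eq. (2): H = P / A_c."""
    return peak_load / contact_area


def sample_modulus(e_r, nu_s=0.3, e_tip=1141.0, nu_tip=0.07):
    """Eq. (4) solved for E_s.

    nu_s = 0.3 follows the text; e_tip and nu_tip are assumed values
    for a diamond tip and are not taken from this study.
    """
    sample_compliance = 1.0 / e_r - (1.0 - nu_tip ** 2) / e_tip
    return (1.0 - nu_s ** 2) / sample_compliance


# Hypothetical single indent: S in mN/um, A_c in um^2, P in mN.
S, A_c, P_max = 400.0, 392.0, 400.0
E_r = reduced_modulus(S, A_c)   # GPa
H = hardness(P_max, A_c)        # GPa
E_s = sample_modulus(E_r)       # GPa
```

Because the tip is far stiffer than shale, the tip term in Eq. (4) is a small correction, so \({E}_{s}\) is governed mainly by \({E}_{r}\) and the factor \(1-{{v}_{s}}^{2}\).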

It is important to note that the measured mechanical properties can reflect either the average property of the sample (mesoscopic) or the response of a single mineral (microscopic), depending on the magnitude of the applied load and the size of the indenter. Delineating the governing relationship between microstructure (minerals/organics and pore space) and mechanical properties is therefore critical to obtaining meaningful outcomes and interpreting the results properly. The loading force in the microindentation experiments must be chosen in accordance with the heterogeneity and constituent components of the material (here, shale) so that it yields the correct average mechanical response of the sample (Ulm and Abousleiman 2006). Consider a heterogeneous material comprised of two distinct phases. When the indent size (\(l\)) is smaller than the characteristic size (\(D\)) of the particles, each indent measures the properties of an individual phase, and the density distribution curve of Young’s modulus over the entire dataset should be bimodal, with two distinct peaks (Fig. 4a). In contrast, to access the bulk properties, that is, the average mechanical response of the composite material, a much higher load should be chosen so that the indent size is much larger than the characteristic size, \(l \gg D\). In this way, each indentation assesses the average property of the heterogeneous shale sample (Fig. 4b). Previous studies have shown that a displacement of at least 4 μm per indent can be used to examine the bulk mechanical properties of shale samples (Zhao et al. 2019). In this study, indentation tests on the Bakken shale samples demonstrated that a maximum load of 400 mN (Pmax = 400 mN) generates sufficiently large indents, provided the other necessary conditions are met.

Fig. 4

Comparison between the principles of a nanoindentation and b microindentation. \(\mathrm{l}\) denotes the indent size, \(\mathrm{D}\) is the characteristic size, and \(\mathrm{L}\) is the spacing between indents. The dark brown circles illustrate the mineral particles, and the gray represents the matrix of a two-component material. The yellow triangle shows the indenter. For shale samples, when a relatively small load (Pmax = 3–5 \(\mathrm{mN}\)) is applied, the displacement of the probe and the indent size are relatively small, and the obtained Young’s modulus reflects the mechanical properties of each constituent mineral, as shown in (a). In contrast, in microindentation with a larger load (Pmax ≈ 400 \(\mathrm{mN}\)), a larger indent is generated, which reflects the average mechanical response of the composite material

The indentation tests were performed using a Berkovich pyramidal tip on a TI 980 Triboindenter (Hysitron, Minneapolis, Minnesota). Each sample was indented 50 times in two sets of 5 \(\times\) 5 grid patterns. The indent locations were randomly selected within each sample. A load-control mode with a maximum load of 400 mN was used. A constant hold time is needed to reduce viscoelastic effects when measuring Young’s modulus. Since the samples examined in this study are organic-rich, a relatively long holding time of 200 s was chosen following the loading stage. The indenter was advanced at a rate of 40 mN/s to 400 mN, held at constant load for 200 s, and unloaded at a rate of 40 mN/s. The elastic modulus of each sample was then obtained by averaging the values calculated from the individual curves.
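
The load-controlled protocol described above amounts to a trapezoidal load history. The following sketch reproduces it from the stated rate, maximum load, and hold time; the function names are illustrative and do not come from the instrument software.

```python
def load_at(t, p_max=400.0, rate=40.0, hold=200.0):
    """Applied load (mN) at time t (s) for a load-hold-unload cycle."""
    if t < 0:
        raise ValueError("time must be non-negative")
    t_load = p_max / rate                 # duration of the loading ramp
    if t <= t_load:                       # loading section
        return rate * t
    if t <= t_load + hold:                # holding section
        return p_max
    # unloading section, clipped at zero once the probe has lifted off
    return max(p_max - rate * (t - t_load - hold), 0.0)


def cycle_duration(p_max=400.0, rate=40.0, hold=200.0):
    """Total time (s) of one indent: ramp up, hold, ramp down."""
    return 2.0 * p_max / rate + hold
```

With the settings used here, each ramp lasts 10 s, so the 200 s hold dominates the 220 s cycle.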

3.3 Convolutional Neural Networks’ Architecture

A standard CNN architecture consists of several convolutional blocks, each of which typically comprises one or several convolutional layers followed by pooling layers and a non-linear activation transformation (Sultana et al. 2018). A fully connected layer is attached at the end of the architecture, as shown in Fig. 5:

(a) Convolutional layer: the convolutional layer takes an image as input and applies transformations through filters. The spatial relationships of the input image are passed on to the filtered maps, which are also referred to as feature maps. During training, the filter parameters are modified gradually to reduce the loss. The parameters of the filters are shared across the input space, which reduces the number of trainable parameters compared with fully connected layers.

(b) Pooling layer: a pooling layer is usually attached to the convolutional layers in CNNs to simplify the feature maps and to reduce the number of model parameters. Maximum pooling is the most popular pooling layer; it applies a maximum function to a receptive field (usually an n \(\times\) n kernel).

(c) Non-linear activation transformation: as in ANNs, a non-linear transformation carried out by an activation function is included in the CNN architecture. The rectified linear unit (ReLU), defined as \(ReLU(x)=max(x,0)\), has the advantage of mitigating vanishing gradient problems (Goodfellow et al. 2016). ReLU is conventionally used in deep CNN models.

(d) Fully connected layer: fully connected layers are added after the last convolutional layer. Their function is to transform the previously extracted features into a final class score for classification tasks or into numerical values for regression tasks.
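
The three operations in (a) to (c) can be illustrated on a tiny single-channel image. The sketch below is plain Python and is meant only to show what each layer computes. It uses no padding and a stride of 1 for the convolution, which differs from the settings of the actual model, and the edge-detecting kernel is hand-picked rather than learned.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """ReLU(x) = max(x, 0), applied element-wise."""
    return [[max(v, 0) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size maximum pooling."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, cols - size + 1, size)]
            for i in range(0, rows - size + 1, size)]


# 6 x 6 toy image with a vertical boundary between two "phases".
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked 3 x 3 filter that responds to left-to-right intensity steps.
kernel = [[-1, 0, 1]] * 3

features = max_pool(relu(conv2d(image, kernel)))
```

The same nine filter weights are reused at every image location, which is the parameter sharing described in (a); the pooled output is a 2 × 2 summary of where the boundary was detected.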

Fig. 5

Architecture of the modified CNN model. The original VGG-16 model has two convolution layers in each of the first two Conv. Blocks, three convolution layers in each of Conv. Blocks 3–5, and three Dense layers attached at the end. In our modified model, we use two convolution layers in all Conv. Blocks and add one more Dense layer.

A number of classic CNN architectures have been proposed and have shown outstanding performance in image recognition and object detection. These architectures have also been applied successfully to other problems. Classic architectures including AlexNet (Krizhevsky et al. 2012), ResNet50 (He et al. 2016), VGG (Simonyan and Zisserman 2015), and GoogLeNet (Szegedy et al. 2015) have exhibited excellent performance in image classification. In this study, the VGG-16 architecture was chosen as the base model and further modified. The VGG model was chosen because of the small 3 \(\times\) 3 receptive field used in its architecture, which is suitable for capturing the details of the contact interactions between grain particles in the element maps. The original VGG-16 architecture consists of five convolutional blocks, each comprising two or three convolution layers followed by one pooling layer, with three fully connected dense layers attached at the end. In the original VGG-16 model, the input has a size of \(224\times 224\times 3\).

In this study, the following modifications were made to the architecture of the model, as shown in Fig. 5. We keep the structure of five convolutional blocks, but each block has two convolutional layers and one pooling layer, in contrast to the variable number of convolutional layers in the original VGG-16 model. After the final block, we increase the number of dense layers to four, compared with three in the original model. For each convolutional layer, the filter size is 3 \(\times\) 3, the padding size is 1, and the stride size is 2 \(\times\) 2. ReLU activation was used. Additionally, the input dimension was changed from the original \(224\times 224\times 3\) to \(112\times 112\times 9\), where \(112\times 112\) is the size of each element map and 9 is the number of element intensity channels. The number of learnable parameters in this CNN model is around 12 million (Table 1), which is only around 9% of that of the original VGG-16 (Simonyan and Zisserman 2015). For more details regarding the model architecture, training process, and prediction, a GitHub repository link: https://github.com/chunxiaoqiuyue/Estimate_Young-s-modulus-from-EDS-maps is provided for reference.
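
The parameter comparison can be checked with simple bookkeeping. The sketch below counts the learnable parameters of the original VGG-16 and traces the feature-map size through five blocks for a 112-pixel input. It assumes, as in the original VGG-16, that the convolutions use stride 1 with padding 1 and that the 2 × 2 stride belongs to the pooling layers; this is an assumption made for illustration only. The channel widths and dense-layer sizes of the modified model are not listed in this section, so its parameter count is not reproduced here.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def conv_out(size, k=3, pad=1, stride=1):
    """Output width of a convolution (assumed stride 1, padding 1)."""
    return (size + 2 * pad - k) // stride + 1


def pool_out(size, k=2, stride=2):
    """Output width of a 2 x 2 max pooling with stride 2."""
    return (size - k) // stride + 1


# (channel width, number of conv layers) per block in the original VGG-16.
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def vgg16_param_count(in_channels=3, in_size=224, dense=(4096, 4096, 1000)):
    total, channels, size = 0, in_channels, in_size
    for width, n_conv in VGG16_BLOCKS:
        for _ in range(n_conv):
            total += conv_params(channels, width)
            size = conv_out(size)
            channels = width
        size = pool_out(size)
    n_in = channels * size * size          # flattened feature vector
    for units in dense:
        total += dense_params(n_in, units)
        n_in = units
    return total


def spatial_trace(in_size, n_blocks=5, convs_per_block=2):
    """Feature-map width after each block (two convs, then one pool)."""
    sizes, size = [in_size], in_size
    for _ in range(n_blocks):
        for _ in range(convs_per_block):
            size = conv_out(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes
```

Under these assumptions, `vgg16_param_count()` gives 138,357,544 parameters, so about 12 million parameters corresponds to roughly 8.7% of the original network, in line with the figure quoted above. `spatial_trace(112)` returns `[112, 56, 28, 14, 7, 3]`, that is, a 3 × 3 feature map entering the dense layers.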

Table 1 Architecture and number of parameters in the CNNs model

3.4 Overview of the Computational Framework

The purpose of the computational framework is to train a valid CNN model that can effectively predict mechanical properties from input element intensity images. The proposed framework has four steps: (a) building the input database by collecting the element density maps through EDS experiments, (b) collecting the mechanical property, Young’s modulus, of the samples through indentation experiments, (c) building a CNN model and training it on the input images and the mechanical property dataset, and (d) predicting the mechanical properties using the trained CNN model (Fig. 6).

Fig. 6

A general overview of the proposed framework: a collecting intensity maps of nine elements via SEM–EDS mapping; b collecting Young’s modulus by laboratory microindentation tests; c building CNNs model and training with element maps as input data and Young’s modulus as labeled data; d using trained model to predict Young’s modulus value for unseen samples.

4 Results and Discussion

4.1 Elements Density Images

Intensity distribution maps of the nine most common elements in shale rocks, namely aluminum (Al), calcium (Ca), carbon (C), iron (Fe), potassium (K), magnesium (Mg), sodium (Na), sulfur (S), and silicon (Si), were generated (Fig. 7). Twenty different locations were scanned on each shale sample. The original map of each scan area had a size of 768 \(\times\) 1024 pixels with a pixel resolution of 0.25 μm/pixel, which covered an area of 192 \(\times\) 256 μm. The smallest area over which a measurement can be made that yields a value representative of the whole sample is referred to as the representative element area (REA). REA studies of 2D mineral maps of the Bakken shales have confirmed that the fraction of each mineral calculated by the box-counting method does not vary much once the box size exceeds 100 μm (Liu et al. 2018). Furthermore, a similar study of the REA of the Bakken shale in terms of porosity, based on SEM maps, reported that the REA can be a few hundred micrometers (Saraji and Piri 2015). In this study, considering the REA and the need to collect enough images to train our model, we chose an area of 125 \(\times\) 125 μm as the size of the basic REA images. Since CNN models use square n \(\times\) n input matrices, images with a size of 125 μm \(\times\) 125 μm were randomly cropped from each scanned area. A data augmentation technique was then adopted to generate a sufficient number of training images, which resulted in a total of 800 sets of element density images (10 samples \(\times\) 20 scan locations per sample \(\times\) 4 randomly cropped sub-images per scan location) in the entire image dataset.
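
The cropping arithmetic can be made concrete with a short sketch. The scan size, pixel resolution, crop size, and counts are those stated above; the helper names and the use of Python's `random` module are illustrative assumptions. How the cropped sub-images are brought to the 112 × 112 network input is not described in this section and is not modeled here.

```python
import random

PIXEL_SIZE_UM = 0.25          # micrometres per pixel
SCAN_SHAPE = (768, 1024)      # rows, columns of one scanned map
CROP_UM = 125.0               # side length of one REA sub-image


def crop_size_px(crop_um=CROP_UM, pixel_um=PIXEL_SIZE_UM):
    """Side length of the crop in pixels."""
    return round(crop_um / pixel_um)


def random_crop_boxes(n_crops, scan_shape=SCAN_SHAPE, size=None, rng=None):
    """Return n_crops boxes (top, left, bottom, right) inside the scan."""
    rng = rng or random.Random()
    size = size or crop_size_px()
    rows, cols = scan_shape
    boxes = []
    for _ in range(n_crops):
        top = rng.randint(0, rows - size)     # randint is inclusive
        left = rng.randint(0, cols - size)
        boxes.append((top, left, top + size, left + size))
    return boxes


def crop(channel, box):
    """Cut one box out of a single-element map given as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in channel[top:bottom]]


# Dataset size stated in the text: samples x scans per sample x crops per scan.
N_IMAGE_SETS = 10 * 20 * 4
```

The same box should be applied to all nine element maps of a scan so that the channels of each sub-image stay spatially aligned.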

Fig. 7

Maps of the element density of cropped sub-images for element Al, C, Ca, Si, K, Mg, Na, S, and Fe. Each image has the size of 125 μm × 125 μm and the brightness indicates the intensity of each element.

Additionally, the brightness of a given pixel in the 2D map represents the relative intensity of the corresponding element. It was found that Si was the most widely spread in the samples, followed by Al, K, and Ca, while Ca, Mg, Na, S, and Fe were less abundant. The correlation matrix of the nine element channels shows the relationships among the densities of the different elements, where 1 indicates a perfect positive linear correlation between two variables, −1 indicates a perfect negative linear correlation, and 0 means no linear correlation (Fig. 8). In our study, three pairs with clearly positive correlations were noted, shown in gray–black and black in the correlation matrix: Al vs. K, Mg vs. Ca, and S vs. Fe are highly positively correlated, since these elements co-exist in most minerals. Chemically, each mineral has a fixed elemental composition. For example, K-feldspar has the formula KAlSi3O8, and therefore the presence of Al is strongly related to the presence of K. In contrast, strong negative correlations arise because only one mineral can be present at each pixel: for elements that do not occur within the same mineral, such as Si vs. C, Si vs. Ca, Si vs. Fe, and Si vs. S, a negative correlation was observed.
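
A channel correlation matrix of this kind can be computed by flattening each element map and taking pairwise Pearson coefficients. The sketch below is a minimal illustration; the 2 × 2 toy maps are invented so that one pair is perfectly correlated and one is perfectly anti-correlated, and they are not measured data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(channels):
    """channels maps an element name to its 2D intensity map (list of rows)."""
    flat = {name: [v for row in img for v in row]
            for name, img in channels.items()}
    return {a: {b: pearson(flat[a], flat[b]) for b in flat} for a in flat}


# Made-up maps: K follows Al pixel by pixel, Si is high where Al is low.
toy = {
    "Al": [[1, 2], [3, 4]],
    "K": [[2, 4], [6, 8]],
    "Si": [[4, 3], [2, 1]],
}
corr = correlation_matrix(toy)
```

Here `corr["Al"]["K"]` is close to 1 and `corr["Al"]["Si"]` is close to −1, mirroring the co-occurrence and mutual-exclusion patterns discussed above.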

Fig. 8

Correlation matrix among elemental intensity. Zero denotes no linear correlation; 1 and -1 each indicates perfect positive and negative linear correlations, respectively. Al vs. K, Mg vs. Ca, and S vs. Fe are highly positively related, since these elements are co-existing in minerals abundant in shales.

With chemical information on the minerals, not only can the raw elemental composition and distribution be obtained, but the mineral phases can also be mapped with additional processing of the EDS maps. For example, high intensities of Ca and Mg indicate the presence of dolomite, and locations with relatively high intensities of Fe and S indicate the presence of pyrite. Previous studies have already reported mineral classification and segmentation using EDS maps (Knaup et al. 2019; Li et al. 2021b; Tang and Spikes 2017). This process is commonly carried out with proprietary software; in this study, however, we used an end-to-end method in which the EDS maps were fed directly into the learning model, instead of using derived mineral maps or other features to predict mechanical properties.

4.2 Mechanical Dataset

Indentations were conducted on the shale samples; each sample received two sets of 5 \(\times\) 5 grid indents, for a total of 50 indents per sample. For each indent, Young’s modulus was calculated from the load–displacement curve (Fig. 3). Figure 9 shows the load–displacement curves of Sample 4 as a representative example. With the same maximum load setting, the displacements for Sample 4 vary because of the heterogeneity of the shale, and the displacement for most curves is in the range of 4–5 \(\mathrm{\mu m}\) at the maximum load of 400 \(\mathrm{mN}\).

Fig. 9

Load–displacement curves from indentations of Sample 4. At the maximum load of 400 \(\mathrm{mN}\), the displacement of most curves is in the range of 4–5 \(\mathrm{\mu m}\)

Based on the analysis of the load–displacement curves, the reduced Young’s modulus and, from it, Young’s modulus were calculated using Eqs. (1)–(4). Owing to the highly heterogeneous nature of shales, the calculated Young’s modulus varied among the indented points within the same sample (Figs. 9, 10). The arithmetic mean of Young’s modulus was calculated and used as the ground truth label for each sample. The values range from 15.20 to 20.39 GPa among the samples, with Sample 4 exhibiting the highest value and Sample 5 the lowest.

Fig. 10

Box chart of Young’s modulus obtained from indentation experiment (S on the x-axis, refers to sample numbers), Sample 4 exhibiting the highest value, and Sample 5 the lowest.

4.3 Training Process

The entire dataset was divided into training and testing sets, with the testing set making up 25% of the overall dataset. All weights and biases of the CNN model were updated using stochastic gradient descent (SGD) with a mini-batch size of 16 to avoid high-cost local minima. A learning rate of \(1\times {10}^{-7}\) was used in this optimizer. The objective function is the mean absolute percentage error (MAPE), which can be expressed in the following equation:

$$\begin{array}{c}M=\frac{1}{n}\sum_{t=1}^{n} |\frac{{A}_{t}-{F}_{t}}{{A}_{t}}|,\end{array}$$
(5)

where \({A}_{t}\) is the ground truth value of Young’s modulus and \({F}_{t}\) is the predicted Young’s modulus. The model was trained for 300 epochs. The training loss was large in the initial iterations because of the randomly assigned weights and biases, but it then dropped sharply after several iterations (Fig. 11).
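
Eq. (5) translates directly into code. The sketch below is a plain-Python reference implementation for checking values by hand; it is not the loss function taken from the training code, and the numbers in the usage line are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error of Eq. (5), returned as a fraction.

    Multiply by 100 to express it in percent. Ground-truth values must be
    non-zero, which holds for Young's modulus.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - f) / a) for a, f in zip(actual, predicted)) / len(actual)


# Two hypothetical samples (GPa): each prediction is off by 10% of the truth.
example = mape([20.0, 10.0], [18.0, 11.0])
```

Because each error is normalized by the ground truth, samples with low and high moduli contribute on the same relative scale.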

Fig. 11

Loss decreasing during training process. The loss for training dataset is in orange, while that for validation in blue. Total 300 epochs with a batch size of 16 and learning rate of 1e-7 were used. The loss changes smoothly at the beginning and then has a sharp decrease, and after 30 epochs, it reaches a plateau.

4.4 Predicting the Moduli of Unseen Samples

In the final step, the trained network was used to predict Young’s modulus from unseen EDS images. Figure 12a displays the predicted results: the x-axis is the labeled (true) value of the testing samples, while the y-axis represents the predicted Young’s modulus. The figure indicates that the predicted values are in good agreement with the data measured in the laboratory tests. Additionally, for each sample, the predicted Young’s modulus exhibits variations owing to the highly heterogeneous nature of the samples. The histogram of MAPE over all samples shows that for most of the test data points, the error between prediction and ground truth is less than 10%, with an average error of 6.5%, which is within an acceptable range when compared with the higher error percentage of the results obtained from the laboratory tests (Fig. 12b).

Fig. 12

a Predicted Young’s modulus compared with the laboratory data; the x-axis shows the ground truth data obtained from laboratory microindentation, while the y-axis shows the Young’s modulus predicted by the trained model. b Histogram of the mean absolute percentage error for the test dataset, showing that most of the test data have an error smaller than 15%, with an average error of 6.5%

4.5 Discussion of Advantages and Limitations

It is worth understanding why the proposed method provides good predictive performance. A critical prerequisite for satisfactory predictive performance is that Young’s modulus of rocks is indeed a non-linear function of the information that can be extracted from the EDS maps, mostly the mineral information. It is well known that the effective material properties of shale rocks are determined by the mechanical properties, distribution pattern, and configuration of the constituents. In analytical methods for estimating mechanical properties based on micromechanical theory, this can be described as follows (Mori and Tanaka 1973):

$$\begin{array}{c}{\mathbb{C}}_{hom}=\sum_{r=1}^{N}{f}_{r}{\mathbb{C}}_{r}:{\left[{\mathbb{I}}+{\mathbb{P}}_{{I}_{r}}^{0}:\left({\mathbb{C}}_{r}-{\mathbb{C}}_{0}\right)\right]}^{-1}\times {\left[\sum_{s=0}^{N}{f}_{s}:{\left[{\mathbb{I}}+{\mathbb{P}}_{{I}_{s}}^{0}:\left({\mathbb{C}}_{s}-{\mathbb{C}}_{0}\right)\right]}^{-1}\right]}^{-1},\end{array}$$
(6)

where \({\mathbb{C}}_{0}\) and \({\mathbb{C}}_{r}\) (or \({\mathbb{C}}_{s}\)) are the stiffness tensors of the matrix phase and the inclusion phases, respectively; \({f}_{r}\) or \({f}_{s}\) denotes the fraction of each phase; and N is the total number of phases. \({\mathbb{P}}_{{I}_{r}}^{0}\) and \({\mathbb{P}}_{{I}_{s}}^{0}\) are tensors related to the shape and distribution patterns of the constituents (Laws 1977). Equation (6) shows that the mechanical properties are a non-linear function of the fractions, mechanical properties, and contact boundaries of the mineral phases. As described earlier, the mineral phases, their fractions, spatial distribution, and configuration can be extracted from the elemental maps. The convolutional transforms in the CNNs model can capture and establish the connection between these features and the mechanical properties. Additionally, as optimization proceeds over the iterations, the CNNs can identify the useful information and filter out irrelevant features.
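To illustrate the non-linearity, Eq. (6) can be reduced to scalar form under two simplifying assumptions that are not made in this study: every phase is isotropic and every inclusion is spherical. The tensors then split into a volumetric part and a deviatoric part, and both sums are taken over all phases, matrix included. The Python sketch below follows this reduced form; the function name and the phase moduli are illustrative placeholders, not measured values.

```python
def mori_tanaka_isotropic(phases):
    """Mori-Tanaka estimate for isotropic phases with spherical inclusions.

    `phases` is a list of (fraction, bulk_modulus, shear_modulus) tuples;
    the first entry is the matrix (index 0). Returns (K_hom, G_hom, E_hom).
    """
    total = sum(f for f, _, _ in phases)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("phase fractions must sum to 1")
    _, k0, g0 = phases[0]
    # Volumetric and deviatoric coefficients of the Eshelby tensor
    # (S = P : C_0) for a sphere embedded in an isotropic matrix.
    alpha = 3.0 * k0 / (3.0 * k0 + 4.0 * g0)
    beta = 6.0 * (k0 + 2.0 * g0) / (5.0 * (3.0 * k0 + 4.0 * g0))
    num_k = den_k = num_g = den_g = 0.0
    for f, k, g in phases:
        # Volumetric and deviatoric parts of [I + P : (C_r - C_0)]^-1.
        a_k = 1.0 / (1.0 + alpha * (k / k0 - 1.0))
        a_g = 1.0 / (1.0 + beta * (g / g0 - 1.0))
        num_k += f * k * a_k
        den_k += f * a_k
        num_g += f * g * a_g
        den_g += f * a_g
    k_hom = num_k / den_k
    g_hom = num_g / den_g
    e_hom = 9.0 * k_hom * g_hom / (3.0 * k_hom + g_hom)  # isotropic relation
    return k_hom, g_hom, e_hom


# Placeholder moduli in GPa: a soft matrix with a stiff inclusion phase.
matrix = (0.6, 20.0, 8.0)
inclusion = (0.4, 37.0, 44.0)
k_hom, g_hom, e_hom = mori_tanaka_isotropic([matrix, inclusion])

# Linear (fraction-weighted) mixture of the bulk moduli, for comparison.
k_linear = matrix[0] * matrix[1] + inclusion[0] * inclusion[1]
print(f"K_hom = {k_hom:.2f} GPa, linear mixture = {k_linear:.2f} GPa")
print(f"G_hom = {g_hom:.2f} GPa, E_hom = {e_hom:.2f} GPa")
```

In this two-phase illustration the homogenized bulk modulus falls below the fraction-weighted average, which is the kind of non-linear dependence on phase fractions and stiffness contrast that a purely linear mixing rule would miss.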

However, the proposed method has some limitations. First, as mentioned previously, the mechanical properties of shales are scale-dependent, and the proposed approach is based on a mesoscopic point of view. Neither the element density maps nor the Young’s modulus measured from microindentation tests account for the effect of natural fractures and weak bedding planes on the mechanical properties. Therefore, the Young’s modulus reported in this study might differ from values measured in traditional mechanical tests. Second, even though we have as many as 50 indents on each shale sample, locating each indent is challenging, so we used an averaged Young’s modulus for each sample, which means that heterogeneity within a shale sample is ignored on the output end. In the future, we would like to link the variability of the microindentation data to the variability of the element maps.

5 Conclusion

In this study, a deep learning CNNs model was applied to 2D elemental intensity distribution maps to predict the mechanical properties of shales. The proposed framework consisted of: (a) collecting element intensity maps for nine major elements abundant in shale, namely Al, Ca, C, Fe, K, Mg, Na, S, and Si; (b) collecting the mechanical properties (Young’s modulus) of the corresponding samples; (c) building and training the CNNs model on the image and mechanical property datasets; and (d) predicting mechanical properties using the trained CNNs model. The input data were created from SEM–EDS mapping, and the ground truth data were the Young’s modulus values corresponding to each image, obtained from microindentation tests. A total of 800 images obtained from ten shale samples were used for training and testing. The results showed that the predicted Young’s modulus values had an average relative error of 6.5%, which is within an acceptable range compared to laboratory errors. In addition, predicting the mechanical parameters of rocks with this newly proposed method can be an alternative to laboratory approaches, which inevitably require sample preparation and more elaborate data interpretation.