Introduction

A 3D geological structural model can describe complicated geological phenomena in an intuitive way. In general, the accuracy of a geological structural model is affected by many factors, such as the geological complexity, the measurement errors, the sparsity of samples, the subjectivity of the modeler, and the limitations of the modeling method and the software. These factors result in inevitable uncertainties in geological models, so a 3D geological model can only be an approximate description of objective geological phenomena. To further revise and improve a model, it is necessary to determine the uncertainty factors that affect it.

To date, many efforts have been made to analyze and quantify the uncertainties of geological models (Chilès et al. 2004; Caumon and Journel 2005; Bistacchi et al. 2008; Suzuki et al. 2008; Jessell et al. 2010; Wellmann and Regenauer-Lieb 2012; Thiele et al. 2016; Schneeberger et al. 2017; Edwards et al. 2017). Uncertainties in 3D geological models can be classified into three types, according to their different sources (Mann 1993; Bárdossy and Fodor 2004; Wu et al. 2005; Wellmann et al. 2010; Zhu and Zhuang 2010; Bond et al. 2011; Lindsay et al. 2012; Bond 2015). (1) Data error (Fig. 1a), e.g., measurement error in observations and accuracy loss in data processing. (2) Spatial variation (Fig. 1b). This type of uncertainty is derived from the random process defined in the modeling algorithm to describe geological phenomena. Similar conditions may produce different results. (3) Cognitive uncertainty (Fig. 1c). Cognitive uncertainty is mainly caused by a geologist’s cognitive bias or incomplete knowledge, e.g., empirical knowledge bias in the delimitation of ore bodies.

Fig. 1

Multi-source uncertainties in geological models: (a) Data error in contact measurements; (b) different boundary realizations via a stochastic simulation with the same data; (c) Diverse ore bodies based on the same observations

Many studies of uncertainty analysis have focused only on uncertainties from specific sources, or have ignored the interactions between different sources. This makes it hard to precisely estimate the comprehensive impact from multi-source uncertainties on model accuracy. To determine a more accurate assessment of this impact, some researchers have considered multiple uncertainty sources and have integrated the uncertainties from different sources (Riddick et al. 2005; Lelliott et al. 2009; Lark et al. 2013; Pakyuz-Charrier et al. 2018). Numerous geostatistical modelling methods, such as Bayesian Maximum Entropy (Christakos 1990), Bayesian sequential Gaussian simulation (Doyen and Den Boer 1996), kriging with measurement errors (Fazekas and Kukush 2005), and so on, can process data with measurement errors. These methods can also be used to evaluate the comprehensive uncertainty of data error and the spatial variation in their posterior distribution (e.g., Li et al. 2013), various realizations (e.g., Thore et al. 2002), or estimation error (Kang et al. 2017). In addition to the methods mentioned above, geostatistical methods can also be combined with stochastic simulation to assess the comprehensive uncertainty (Wellmann et al. 2010; Wellmann and Regenauer-Lieb 2012; Røe et al. 2014; Schweizer et al. 2017; Hou et al. 2017; Soares et al. 2017; Pakyuz-Charrier et al. 2019). Although cognitive uncertainty is difficult to evaluate (Bárdossy and Fodor 2001), some researchers have presented uncertainty assessment methods that consider a geologist’s cognition (Wellmann and Regenauer-Lieb 2012; Wellmann et al. 2017; de la Varga and Wellmann 2016; Demyanov et al. 2019). To assess the impact of spatial variation and cognitive bias, Tacher et al. (2006) assumed that a geological model is the best guess based on modeler’s cognition, and used a Gaussian random field around the expected model to evaluate the comprehensive uncertainty. 
This is a very useful method to effectively mine information from the established model. To satisfy the hypotheses of the respective methods used, many uncertainty assessment studies have more or less ignored the effects of data error and the modeler’s subjective cognition. This simplification of uncertainty factors may affect the outcome of the assessment. Therefore, it is necessary to conduct an overall assessment of multi-source uncertainties.

In this paper, we propose an uncertainty assessment method for 3D geological models that integrates data error, spatial variation and cognition. The method is applied to an established 3D geological model of the Huangtupo Slope in Badong, Yichang. In the following sections, we analyze the sources of uncertainty in the model and introduce a geological interface random function to formulate the multi-source uncertainties. Within a Bayesian framework, we integrate the various data errors, the spatial variation and the geologist’s cognition into the posterior probability distribution of the geological interface. The probability distributions of the geological interfaces are then converted into a series of conditional probabilities on the basis of the contact relationships of the strata. Lastly, to assess the uncertainty at any specific location in the geological model, the probability field for each individual stratum and the comprehensive uncertainty for the entire model are calculated.

Materials and method

Data and model description

The Huangtupo slope is located on the south limb of the Guandongkou syncline in the Three Gorges reservoir area, on the south bank of the Yangtze River in Badong, Hubei, China. The slide-prone strata at this location pose a landslide threat to the residents and towns on the Huangtupo slope. The Huangtupo Landslide is a complex landslide mass formed by multiple slumps (including Riverside slump mass 1#, Riverside slump mass 2#, the Garden Spot Landslide and the Substation Landslide), covering an area of 1.35 km² with a volume of 69 million m³. The strata on the Huangtupo slope mainly belong to the Badong Formation of the Triassic system (T2b): the slump mass of Huangtupo developed in the slide-prone strata of Section 2 (T2b2) and Section 3 (T2b3) of the Middle Triassic Badong Formation, which are mainly composed of mudstone, pelitic siltstone and muddy limestone (Yang and Chen 2001). The Huangtupo slope shows an alternation of soft and hard rocks: upper soft rock (T2b2), middle hard rock (T2b3 − 1) and lower soft rock (T2b3 − 2) (Hu et al. 2012). The surface of the underlying bedrock in the landslide area is undulating, and the rock mass of the slide body is fragmented. The geological map and location of the Huangtupo slope are shown in Fig. 2.

Fig. 2

Geological map and location of the Huangtupo slope

Many detailed geological investigations have been conducted in this area. A 3D structural model of the Huangtupo slope was built in previous geological modeling work to assist the geological hazard analysis (Fig. 3 and Fig. 4). However, many uncertainty factors may limit the accuracy of the subsequent analysis. Therefore, it is necessary to determine the impact of these uncertainties on the established model. The model was built with modeling software and manual assistance, and the modeler ignored data errors in the modeling process. The model mainly used the contact information for 22 strata from 95 boreholes, 6 sections, and a detailed geological map. Because the area has few observational data, some virtual boreholes were added based on the modeler’s expertise. The size of the study area is 1530 × 2265 × 702 m. Because the original modelers and the relevant records are unavailable, the details of the parameter settings used in the modeling procedure are unknown. Only the observational data and the established model are available for the uncertainty assessment in this paper.

Fig. 3

Geological subsurface model of the Huangtupo slope

Fig. 4

Two geological profiles of the Huangtupo slope model

Source analysis of the uncertainty of the model

Before the uncertainty assessment is conducted, we need to ascertain the sources of uncertainty in our study. The 3D geological model is built in three stages: data acquisition, data processing, and structural modeling (Fig. 5). The observations from the measurements are processed into a dataset. Within the constraints of the available geological information and the dataset, the geological interfaces are modeled and assembled into a geological structural model. During data acquisition, measurement error is inevitable. The data processing procedure may cause accuracy loss. During structural modeling, the randomness of spatial variation and the cognitive uncertainty of the modeler are introduced. All of these uncertainties propagate and accumulate as the modeling progresses.

Fig. 5

Accumulation of Multi-source Uncertainties in Modeling

To quantify the impact of multi-source uncertainty factors on the geological model, a random function that describes the probable events affecting the geological structure should be defined. We define a geological interface random function based on the work of Abrahamsen and Omre (1994). First, for each interface, a coordinate system (U, V, Z) is defined according to the occurrence of the formation: U is parallel to the strike direction, V is parallel to the dip direction, and Z is perpendicular to the UV plane and oriented upwards. The UV plane is horizontal, and the Z-direction is vertical. The local coordinate system (U, V, Z) and the global coordinate system (X, Y, Z) share the same origin. The coordinates in the (U, V, Z) system are obtained by a rotation transformation according to the azimuth of the dip direction. Then, the random function Z(u) is defined as the elevation value of the geological interface at the location u = (u, v). The purpose of the coordinate transformation is to account for the anisotropy of spatial variation. Since the trends of different interfaces differ, each interface should be handled individually. A diagram of the random function Z(u) for the interface between strata T2b2 and T2b1 is shown in Fig. 6.
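The rotation from global to local coordinates can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that X points east, Y points north, and that the dip-direction azimuth is measured clockwise from north; the function name `to_local_uv` is hypothetical.

```python
import math

def to_local_uv(x, y, dip_azimuth_deg):
    """Rotate global (x, y) into local (u, v); the z coordinate is unchanged.

    Assumptions (not stated in the paper): x points east, y points north,
    and the dip-direction azimuth is measured clockwise from north.
    """
    t = math.radians(dip_azimuth_deg)
    u = x * math.cos(t) - y * math.sin(t)  # component along the strike
    v = x * math.sin(t) + y * math.cos(t)  # component along the dip direction
    return u, v
```

Under these assumptions, a point one unit east of the origin has v = 1 and u = 0 when the interface dips due east (azimuth 90°), and distances from the origin are preserved.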

Fig. 6

Random function Z(u) of the interface between the strata T2b2 and T2b1. The enlarged section shows the distribution of elevation value of the geological interface in the plane of U = u0

In geostatistics, the spatial variation in the geological structures presents two types of characteristics: a structural trend and random fluctuations. The structural trend is caused by the spatial continuity of geological structures. The randomness reflects the measurement error and the spatial variation. The geological model can be seen to be a mixture of deterministic and random information (Abrahamsen et al. 1991). Therefore, Z(u) consists of two distinct parts: a local drift m(u) and a local variation σ(u) ⋅ ε(u).

$$ Z\left(\boldsymbol{u}\right)=m\left(\boldsymbol{u}\right)+\sigma \left(\boldsymbol{u}\right)\cdot \varepsilon \left(\boldsymbol{u}\right). $$
(1)

Where m(u) is the most probable value of Z(u) at the location u, and σ(u) ⋅ ε(u) expresses the random fluctuation around the best guess m(u). m(u) is inferred by the geologist based on his or her knowledge of the geological phenomenon. σ(u) is the standard deviation of Z(u) at the location u, i.e., the fluctuation amplitude of the random distribution. ε(u) is a random function with Var{ε(u)} = 1. The distribution of ε(u) depends on the spatial correlation structure in the local area surrounding the location u. Because of insufficient information, we usually simply assume that E{ε(u)} = 0 and that ε(u) follows a Gaussian distribution. However, it should be noted that ε(u) is not restricted to a Gaussian distribution (Gunning 2000). m(u), σ(u) and ε(u) are spatially dependent; these terms may change with the location u.
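As a small illustration of Eq. (1) at a single location, the sketch below draws realizations of Z(u) = m(u) + σ(u) ⋅ ε(u) under the Gaussian assumption for ε(u). The function name and the numerical values in the usage note are hypothetical, and the spatial correlation between locations is deliberately ignored.

```python
import numpy as np

def sample_interface_elevation(m, sigma, n_draws, seed=0):
    """Draw realizations of Z = m + sigma * eps at one location.

    eps is assumed standard normal (E = 0, Var = 1); spatial correlation
    between neighbouring locations is not modelled in this sketch.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_draws)
    return m + sigma * eps
```

For example, `sample_interface_elevation(120.0, 5.0, 1000)` returns 1000 elevations scattered around a best guess of 120 m with a standard deviation of about 5 m.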

Since Z(u) is a random function that describes the geologic structure, the three types of uncertainty mentioned in the introduction also exist in the sub-terms of Z(u):

  1. Data error. σ(u) is the amplitude of the combined random fluctuation from spatial variation and data error at the location u. Part of σ(u) is due to data errors; the remainder reflects the randomness of spatial variation.

  2. Spatial variation. From the viewpoint of probability and statistics, the observations of geological phenomena are the outcome of a regionalized random process (Tacher et al. 2006). In the unobserved area, the distribution of geological phenomena is characterized by randomness. In practical applications, we make assumptions about the study variables based on the application scenario, such as the intrinsic, stationary, or Gaussian assumption, which help us select appropriate modeling methods. Different modeling methods rely on different theoretical models and assumptions to describe spatial variation; therefore, the descriptions of randomness and the assumptions used in these theories may take a variety of forms (Arnold et al. 2019). The random function ε(u) reflects the pattern of spatial variation simulated by the chosen modeling method.

  3. Cognitive uncertainty. Cognitive bias is a systematic (non-random) error in the modeler’s mind (Haselton et al. 2005; Ariely 2008). Owing to a lack of knowledge or scientific ignorance, subjective bias in the modeler’s cognition inevitably leads to bias and limitations in the research methods adopted and in the conclusions drawn (Mann 1993). The modeling method, the modeling software, and all of the components that are used as modeling tools are inevitably influenced by the cognition of the modeler. Therefore, the uncertainty caused by cognitive bias derives not only from the human-computer interaction during modeling but also from the selection of the modeling method, the software parameter settings, and so on. The subjective decisions made by the modeler result in the cognitive uncertainty of the geological model. Although cognitive uncertainty is difficult to quantify, the established model is based on the available observations and the modeler’s own subjective understanding. Therefore, a geological model is the most likely instance of the geological structure in the modeler’s cognition. We assume that m(u) is chosen by the modeler as the best guess of the geological structure at the location u, developed after considering all of the available information, and that it is consistent with the observations at the sampling locations. The established geological model is an assemblage of the best guesses m(u) over the entire region. The deviation between the model and reality reflects the uncertainty caused by cognitive bias. It should be noted that the parts of cognitive uncertainty described as “We don’t know what we don’t know” and “We cannot know” (Caers 2011), i.e. the “unknown unknowns”, are theoretically unknowable on the basis of the given information (Mann 1993; Wellmann et al. 2010). The potential effect of the “unknown unknowns” is outside the scope of this paper.

In error theory, m(u) represents the predicted value with a systematic error, and σ(u) ⋅ ε(u) represents a random error. m(u) expresses the most likely result, including the modeler’s cognitive bias. σ(u) expresses the degree of random uncertainty, and ε(u) expresses its distribution pattern. The cognitive bias (as a systematic error) may not reduce the precision of the geological model, but it may change the accuracy of the model. Multi-source uncertainty factors may be correlated. For example, modeling data can introduce measurement error into the randomness of spatial variation. To assess the comprehensive influence of these factors, multi-source uncertainties should be integrated following certain rules.

Methodology of uncertainty assessment

Bayesian theory is widely used in multi-source information integration (Dowd 2018). In some ways, the propagation and accumulation of uncertainty in a modeling procedure can be regarded as a process of information integration (Caers 2011). Thus, we use a Bayesian approach to integrate multi-source uncertainties. In this section, uncertainties from data error, spatial variation and the modeler’s cognition are considered and integrated.

Statistically, data error and the randomness of spatial variation are all included in random uncertainty. We integrate these two types of uncertainties into the PDF (Probability Density Function) of Z(u) based on the Bayesian Maximum Entropy (BME) principle. Although the cognitive uncertainty has a different nature than the random uncertainty (Mann 1993; Ariely 2008), both of them can be quantified as probabilities in Bayesian theory. Hence we adopt a Bayesian Inference (BI) method to update the BME PDF. A posterior PDF is calculated for each individual geological interface; thus, it is insufficient for analyzing the uncertainty of the entire model. The posterior PDF of each interface is transformed into a stratigraphic type probability field and integrated by the stratigraphic relationship. Ultimately, the comprehensive uncertainty will have a unified representation; therefore, there will be a holistic picture of the uncertainty in the model. The workflow of the uncertainties assessment is shown in Fig. 7. The details of the workflow will be explained in the following sections.

Fig. 7

Workflow of the uncertainty assessment

Uncertainty integration of data error and spatial variation

In this paper, we adopt the spatial BME method (Christakos 1990), implemented with the BMElib code library, to integrate data error and spatial variation into the posterior PDF of every interface. BMElib is a spatiotemporal geostatistics toolbox that runs in MATLAB (for more details, see Christakos et al. 2002) and can be downloaded freely (https://mserre.sph.unc.edu/BMElib_web/). One advantage of this method is that it can integrate data with various errors. Many geostatistical prediction methods are based on the minimum prediction error criterion and an assumption of a Gaussian distribution. However, multi-source data with measurement error may not conform to a Gaussian distribution. In addition, more than one type of error distribution may be present in the data, such as uniform, Gaussian, Bernoulli and skewed distributions. Because of the complexity of multi-source uncertainties, many linear integration methods may not be applicable to data with such intricate errors. The BME method can process both hard data (e.g., accurate observations without errors) and soft data (e.g., data with uncertainties, such as interval data or data expressed by PDFs) (Bordwell 2002), and it does not require the data to conform to a Gaussian distribution. Another advantage is that BME allows many aspects of general knowledge (such as physical laws, expertise, and statistical information) to be added as constraints (Christakos 2017). These non-numerical constraints are treated as prior information about the variables. BME then uses this prior information to calculate the prior distribution that maximizes the information entropy. The maximum entropy principle ensures that the information is fully used in the estimation procedure to obtain the optimum result. BME generates not only a single prediction value but also the PDFs of all potential values.
Using the sample information (i.e., data from boreholes, sections, geological maps, etc.), the prior distribution will be updated to the BME probability distribution in accordance with the Bayesian inference. If the constraints are only the mean and the variogram of the variable, and the sample data conform to a Gaussian distribution without measurement error, then the solution of the BME will coincide with that given by a simple kriging prediction (Christakos 1990). In this condition, spatial BME and simple kriging are equivalent.
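The simple kriging limit mentioned above can be illustrated with a short sketch. It is not BMElib code: the exponential covariance model, its parameters, and the function names are assumptions made for illustration, and only hard data with a known mean are handled.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=100.0):
    """Exponential covariance model (an assumed choice for this sketch)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def simple_kriging(coords, values, target, mean, sill=1.0, corr_range=100.0):
    """Simple kriging prediction and variance at one target location.

    coords: (n, 2) data locations; values: (n,) hard data; mean: known mean.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    # Data-to-data and data-to-target covariances
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = exp_cov(d, sill, corr_range)
    c0 = exp_cov(np.linalg.norm(coords - target, axis=1), sill, corr_range)
    w = np.linalg.solve(C, c0)       # kriging weights
    z_k = mean + w @ (values - mean)  # prediction z_k(u)
    var_k = sill - w @ c0             # prediction variance sigma_k^2(u)
    return z_k, var_k
```

At a data location the prediction reproduces the datum with zero variance, and far from all data it reverts to the mean with a variance equal to the sill; the pair (z_k, var_k) defines the Gaussian distribution that BME would return in this special case.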

It should be noted that, as a method for spatial interpolation in 2D mapping, BME cannot be applied directly to some complex geological settings. In some scenarios, a geological structure may have multiple values of Z(u) (the elevation of the boundary) at a single location u = (u, v). To solve this problem, a boundary decomposition strategy (Pomian-Srzednicki 2001) is adopted. In this strategy, the boundaries of complex geological structures such as faults, lenticular bodies and overturned folds are divided into different parts so that each part satisfies the condition of a single-valued Z(u). For instance, we can separate the different limbs of an overturned fold into distinct interfaces and divide strata into different parts on either side of a fault surface.

In BME, the prior distribution reflects the understanding of a geological structure based on knowledge and experience that exists before the sample data are considered. In addition, the BME PDF represents the new understanding after the field measurement data have been considered. BME provides a tool that can be used to update our understanding of geological structures.

Uncertainty update considering cognition

The uncertainty of a modeler’s cognition is affected by many subjective factors that are difficult to describe quantitatively. We consider the established model as a deterministic instance based on the combination of sample information and modeler’s empirical knowledge. The model that incorporates empirical knowledge reflects the modeler’s biased cognition. To integrate the cognitive uncertainty into the comprehensive uncertainty, we adopt a Bayesian Inference (BI) method to update the BME PDF with information from modeler’s empirical knowledge.

We reference Tacher’s assumption (Tacher et al. 2006) about the established model and provide further development. In a practical geological modeling procedure, most of the modeling software and modelers ignore the measurement error in observations. Usually, they just utilize the best estimate of the observations to build the model. In Tacher’s methodology, ignoring the data error, the uncertainty in a geological variable can be estimated by substituting the expectation mk(u) of random variables Z(u) with the best guess m(u), which is the model established by a modeler with a background in geology. In the uncertainty integration, before and after the modeler’s empirical knowledge is considered, the uncertainty of Z(u) varies as follows:

$$ {Z}_{prior}\left(\boldsymbol{u}\right)={m}_k\left(\boldsymbol{u}\right)+{\sigma}_k\left(\boldsymbol{u}\right)\cdot {\varepsilon}_k\left(\boldsymbol{u}\right)\overset{update}{\to }{Z}_{post}\left(\boldsymbol{u}\right)=m\left(\boldsymbol{u}\right)+{\sigma}_k\left(\boldsymbol{u}\right)\cdot {\varepsilon}_k\left(\boldsymbol{u}\right). $$
(2)

Without the influence of measurement errors, εk(u) follows the standard normal distribution, and the prior distribution of Z(u) can be expressed as a Gaussian distribution (Abrahamsen 1993; Gunning and Glinsky 2004) in which the expectation is the kriging prediction value mk(u) = zk(u) and the variance equals the prediction variance \( {\sigma}_k^2\left(\boldsymbol{u}\right) \), namely \( {Z}_{prior}\left(\boldsymbol{u}\right)\sim N\left({z}_k\left(\boldsymbol{u}\right),{\sigma}_k^2\left(\boldsymbol{u}\right)\right) \). The kriging prediction variance \( {\sigma}_k^2\left(\boldsymbol{u}\right) \) represents the possible random fluctuation of the interface elevation Z(u). Suppose m(u) = zm(u) is the most likely elevation value of the interface model at the location u = (u, v). Then the posterior distribution of Z(u) can be expressed as the Gaussian distribution with expectation zm(u) and variance \( {\sigma}_k^2\left(\boldsymbol{u}\right) \), namely \( {Z}_{post}\left(\boldsymbol{u}\right)\sim N\left({z}_m\left(\boldsymbol{u}\right),{\sigma}_k^2\left(\boldsymbol{u}\right)\right) \). The difference between the expectations zm(u) and zk(u) reflects the influence of the modeler’s cognition. As a systematic error, the cognitive bias does not influence the level of the random fluctuation (i.e. \( {\sigma}_k^2\left(\boldsymbol{u}\right) \)).

As mentioned in Section 2.3.1, without data error, the probability distribution of Z(u) calculated by the BME method will attain the same Gaussian distribution \( N\left({z}_k\left(\boldsymbol{u}\right),{\sigma}_k^2\left(\boldsymbol{u}\right)\right) \). However, considering the various errors in the measurement data, BME may attain a skewed distribution or even a complex distribution with multiple peak values. m(u) and ε(u) in the BME estimation may not have explicit expressions. Therefore, it is difficult to determine an optimal solution of Z(u). In this situation, it is not appropriate to replace the expectation with the established model. To update the probability distribution of Z(u), we choose the Bayesian inference method. To avoid confusion and awkward phrasing, some comments on notation are needed here. In this paper, p(⋅) represents the marginal distribution. p(⋅| ⋅) represents the conditional probability density of the parameters mentioned in the context. P(⋅) represents the probability of an event.

Since measurement and modeling happen in different stages and are carried out by different people, we assume that the sample information from measurement and the modeler’s subjective empirical knowledge are conditionally independent given Z(u) = z. The Bayesian inference of the uncertainty of Z(u) is as follows:

$$ p\left(Z=z|D,K\right)\propto p\left(Z=z|D\right)\cdot L\left(Z=z|K\right) $$
(3)

Where z means the possible value in the domain of Z(u). D denotes the dataset from sample observations. K denotes the modeler’s empirical knowledge which implies the cognitive bias of modeler. p(Z = z| D) is the prior probability density of Z(u) = z before the Bayesian update, which is estimated by geostatistical method based on the observation D. The prior distribution p(Z| D) reflects the understanding of the geological structure summarized from statistical information and the modeling algorithm. p(Z = z| D, K) is the posterior probability density of Z(u) = z after considering the modeler’s empirical knowledge K. The posterior distribution p(Z| D, K) reflects the new understanding after integrating all of the observation information and the subjective cognition in the modeler’s mind. Likelihood function L(Z = z| K) = p(K| Z = z) is a function of Z(u). p(K| Z = z) denotes the probability density of the empirical knowledge K in the condition of given Z(u) = z. L(Z = z| K) represents the likelihood that Z(u) = z is the real interface position value under the given empirical knowledge K.

Without considering data error (i.e., Tacher’s study), for each location u, using the sample data D and the empirical knowledge K, the prior probability density p(Z = z| D) and the posterior probability density p(Z = z| D, K) would be calculated as follows:

$$ p\left(Z=z|D\right)=\frac{1}{\sqrt{2\pi }{\sigma}_k(u)}{e}^{-\frac{{\left[z-{z}_k(u)\right]}^2}{2{\sigma}_k^2(u)}}, $$
(4)
$$ p\left(Z=z|D,K\right)=\frac{1}{\sqrt{2\pi }{\sigma}_k(u)}{e}^{-\frac{{\left[z-{z}_m(u)\right]}^2}{2{\sigma}_k^2(u)}} $$
(5)

After the update, the expectation of Z(u) changes from the kriging prediction value zk(u) to the model value zm(u). According to Eq. (3), we have the likelihood function:

$$ L\left(Z=z|K\right)=\alpha \cdot \frac{p\left(Z=z|D,K\right)}{p\left(Z=z|D\right)} $$
(6)

Where α denotes a constant scaling factor.

To assess the uncertainty from measurement error, we estimate the error distribution for each sample and convert inaccurate sample data to soft data. We use De to denote the dataset after error assessment. Considering the measurement error, the posterior probability density p(Z = z| De, K) can be obtained by:

$$ p\left(Z=z|{D}_e,K\right)\propto p\left(Z=z|{D}_e\right)\cdot L\left(Z=z|K\right) $$
(7)

According to Eq. (6),

$$ p\left(Z=z|{D}_e,K\right)\propto p\left(Z=z|{D}_e\right)\cdot \frac{p\left(Z=z|D,K\right)}{p\left(Z=z|D\right)} $$
(8)

As mentioned above, the probability distributions p(Z| D) and p(Z| De) can be calculated with geostatistical methods; here we use kriging to calculate p(Z| D) (i.e., ignoring data errors) and the BME method to calculate p(Z| De) (i.e., considering data errors). In addition, we adopt the methodology of Tacher et al. (2006), i.e., we select the Gaussian distribution with mean zm(u) from the model and the kriging variance \( {\sigma}_k^2\left(\boldsymbol{u}\right) \) as p(Z| D, K). After p(Z| De, K) is calculated from the proportionality in Eq. (A2), it should be normalized so that its integral over all z equals 1.
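The update in Eqs. (4) to (8) can be sketched numerically on a grid of z values at one location. This is a minimal illustration, not the released MATLAB code: the function names are hypothetical, p(Z| De) is supplied as an array (for example, a BME PDF evaluated on the grid), and the Gaussian ratio of Eqs. (4) and (5) is evaluated in log space to avoid dividing by near-zero tail densities.

```python
import numpy as np

def gaussian_pdf(z, mu, sigma):
    """Gaussian density, as in Eqs. (4) and (5)."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def integrate(f, z):
    """Trapezoidal integral of f over the grid z."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z)))

def update_with_model(z, p_de, z_k, z_m, sigma_k):
    """Posterior p(Z=z | De, K) on the grid z, following Eq. (8).

    p_de: prior density p(Z=z | De) on the grid.
    z_k, sigma_k: kriging prediction and standard deviation (data errors ignored).
    z_m: interface elevation taken from the established model.
    """
    # log[ N(z; z_m, sigma_k) / N(z; z_k, sigma_k) ], up to the constant alpha
    log_ratio = ((z - z_k) ** 2 - (z - z_m) ** 2) / (2.0 * sigma_k ** 2)
    unnorm = p_de * np.exp(log_ratio - log_ratio.max())
    # Normalize so that the posterior integrates to 1
    return unnorm / integrate(unnorm, z)
```

When p(Z| De) happens to equal the kriging Gaussian N(zk, σk²), the result reduces to N(zm, σk²), which is the case without data error in Eq. (5); for a skewed or multimodal p(Z| De), the same call reweights the prior towards the model value zm.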

Comprehensive uncertainty field calculation

The posterior distribution of Z(u) can only describe the spatial uncertainty of the interface between two adjacent strata. However, the occurrences of different interfaces are mutually exclusive in certain locations, and we should consider the impact of the other strata when calculating the uncertainty in the present stratum. We use the conditional probability of a stratigraphic type to assess the uncertainty that a certain stratum exists at a given location under the influence of the other strata. The occurrence probability of each stratigraphic type is calculated with the CDF (cumulative distribution function) of the stratigraphic interface Z(u). According to the contact relationships of the strata, we update the stratigraphic type probability iteratively (Pomian-Srzednicki 2001). Using this method (see the comments on the method in the Appendix), we obtain a multi-stratigraphic type probability field P(X, Li), where, at each location X = (x, y, z), P(X, Li) represents the conditional probability of the stratigraphic type Li.
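The role of the interface CDF can be illustrated for the simplest case of a single interface separating two strata. The sketch below is a simplification and not the iterative multi-strata scheme described in the Appendix: it assumes a Gaussian posterior for Z(u), and the function names are hypothetical. A point at elevation z lies above the interface whenever Z(u) < z, so the probability of the upper stratum is the CDF of Z(u) evaluated at z.

```python
import math

def prob_above_interface(z, z_mean, sigma):
    """P(Z(u) < z) for a Gaussian interface elevation, i.e. the CDF at z."""
    return 0.5 * (1.0 + math.erf((z - z_mean) / (sigma * math.sqrt(2.0))))

def two_strata_probabilities(z, z_mean, sigma):
    """Probabilities of the upper and lower stratum at elevation z."""
    p_upper = prob_above_interface(z, z_mean, sigma)
    return {"upper": p_upper, "lower": 1.0 - p_upper}
```

At the expected interface elevation both strata are equally likely, and the probabilities approach 0 and 1 a few standard deviations away from it.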

To guide the further adjustment of the geological model, we should know the spatial distribution of the uncertainty of the model. The stratigraphic type probability field P(X, Li) can quantify the spatial uncertainty in each stratum. However, sometimes we need to not only reveal the uncertainty of each stratum but also reveal the uncertainty in the integral structure. For this purpose, the probability field of each stratum should be merged into a 3D uncertainty field. Following Wellmann and Regenauer-Lieb (2012), information entropy is applied as the measurement of the integral structure uncertainty in this study. We can calculate the information entropy of the stratigraphic type based on the multi-type probability field P(X, Li). In any location X = (x, y, z), the information entropy H(X) of n stratigraphic types is defined as:

$$ H\left(\boldsymbol{X}\right)=-\sum \limits_{i=1}^nP\left(\boldsymbol{X},{L}_i\right){\log}_bP\left(\boldsymbol{X},{L}_i\right). $$
(9)

Using different values of base b, information entropy has different units; if b = 2, the unit is a bit. In this paper, ‘bit’ is adopted as the unit of information entropy. When the probability of each event is equal, the information entropy will be at its maximum, which means that the dispersion degree of the possible result is maximized; therefore, the uncertainty in the prediction will be at its maximum in this situation. If the sample space has only one result, the entropy of this definite event will be 0.
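Eq. (9) can be evaluated at one location with a few lines of code. This is a minimal sketch with a hypothetical function name; it uses the usual convention that terms with zero probability contribute nothing to the sum.

```python
import numpy as np

def information_entropy(probs, base=2.0):
    """Information entropy H of stratigraphic-type probabilities at one location.

    Terms with P = 0 are skipped (convention 0 * log 0 = 0).
    With base=2 the result is in bits.
    """
    p = np.asarray(probs, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)) / np.log(base))
```

Four equally likely stratigraphic types give 2 bits, the maximum for n = 4, whereas a location where one type is certain gives 0 bits.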

It should be noted that among the probability density functions p(Z| D), p(Z| De), p(Z| D, K) and p(Z = z| De, K) mentioned in Section 2.3.2, only p(Z| D, K) and p(Z = z| De, K) carry information from the established model. p(Z| D) and p(Z| De) should not be used as the distribution of Z(u) when calculating the probability field of the geological model; otherwise, the uncertainty fields would express only the geological uncertainty and would have no connection with the established model. In other words, the consideration of cognitive bias is inherent in the uncertainty assessment of a geological model.

The MATLAB code for the comprehensive uncertainty calculation is freely available (https://github.com/bomer2000/UAGM).

Results

This section presents the results of an experiment that illustrates the comprehensive uncertainty assessment approach on the Huangtupo geological model.

Probability distribution of stratigraphic interface

To analyze the spatial uncertainty of the structural model at each location, we discretized the study area into a 3D field with a resolution of 15 × 15 × 1 m. The coordinate system (U, V, Z) was set up based on an analysis of the main tectonic directions.

We analyzed the sample data to evaluate the measurement errors of the boreholes and sections. Contact data with small measurement errors were treated as hard data, while the other data were converted into probability distributions as soft data. We considered the quality standard of the data and set up their error distributions by referring to previous research (Wellmann et al. 2010; Pakyuz-Charrier et al. 2018; Hou et al. 2019). For example, according to the quality standard of the geological survey data (e.g., the measurement error of a borehole should not exceed 1‰ of the drilling length; in a geological profile, the root mean square error of a geological boundary should correspond to the scale of the profile), some contact positions from the boreholes and profiles were described with Gaussian distributions (with the estimated values as means and their root mean square errors as standard deviations). Where the direct contact was missing from the data, a uniform distribution was used to represent the uncertainty of the missing part. For virtual boreholes provided by geologists, we made educated guesses based on expert knowledge and used probability distributions to represent the contact point positions. The remaining data, assumed to contain no errors, were treated as hard data.
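The two kinds of soft data described above can be sketched in Python as follows. The borehole elevation, drilling length, and missing interval are hypothetical values, and using the 1‰ limit directly as the Gaussian standard deviation is an assumption made only for this illustration.

```python
import math

def gaussian_pdf(z, mean, std):
    """Soft datum for a contact measured with a known root mean square error."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def uniform_pdf(z, low, high):
    """Soft datum for a contact known only to lie somewhere in [low, high]."""
    return 1.0 / (high - low) if low <= z <= high else 0.0

# Hypothetical borehole contact: elevation 560.0 m, reached after 120 m of drilling.
drilling_length = 120.0
borehole_std = 0.001 * drilling_length  # assumption: 1 per mille limit used as the std

# Hypothetical interval with the direct contact missing, between 540 m and 548 m.
missing_low, missing_high = 540.0, 548.0

# Midpoint-rule check that both densities integrate to 1.
step = 0.001
gauss_mass = sum(
    gaussian_pdf(559.0 + (i + 0.5) * step, 560.0, borehole_std) * step
    for i in range(2000)
)
uniform_mass = sum(
    uniform_pdf(539.0 + (i + 0.5) * step, missing_low, missing_high) * step
    for i in range(10000)
)
```

Hard data correspond to the limiting case in which the standard deviation tends to zero, so the contact position is treated as exact.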

The experimental covariance function that expresses the spatial variation in the interface elevation was computed from the contact data. The parameters of the covariance model were determined by fitting the experimental covariance with the assistance of expert knowledge. We used the samples (both soft data and hard data) and their covariance function to calculate the PDF of the interface elevation at each location (u, v) by the BME method. Once the data error was considered, the interface elevation followed a non-Gaussian distribution. With the help of the established model and the kriging prediction, the likelihood function was calculated to update the BME PDF. Finally, the posterior PDFs of all of the interfaces were calculated. The comprehensive uncertainty expressed in the posterior PDFs integrates the data error, the randomness of spatial variation, and the cognitive uncertainty.

We chose the interface between strata T2b2 and T2b1 (as shown in Fig. 6) as an example of the uncertainty integration of data error and spatial variation. The elevation Z(u) of this interface is anisotropic: the covariance functions in the two directions, U and V, are different. The experimental covariance functions and the fitted spherical models in the two directions are shown in Fig. 8. Because the sills were observed to be equal in the directions U and V, we set up an anisotropic nested covariance function. The nested structural model is as follows:

$$ C(h)=\begin{cases}{C}_0+{C}_1, & h=0\\ {C}_1\cdot \left(1-\frac{3}{2}\cdot \frac{h}{a_u}+\frac{1}{2}\cdot \frac{h^3}{a_u^3}\right), & 0<h\le {a}_u\\ 0, & h>{a}_u\end{cases} $$
(10)
Fig. 8 Covariance functions in two directions

In the nested spherical model, the ranges in the two directions are \( {a}_u \) = 1000 m and \( {a}_v \) = 500 m. The equivalent isotropic separation distance is expressed as \( h=\sqrt{h_u^2+{\left(K\cdot {h}_v\right)}^2} \), with the anisotropy ratio \( K=\frac{a_u}{a_v} \). The sill is the sum of the nugget \( {C}_0 \) = 0.25 and the partial sill \( {C}_1 \) = 25000. This covariance model reflects the structural information and the randomness of the elevation Z(u) of the interface between strata T2b2 and T2b1.
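To make the anisotropy correction concrete, the following Python sketch evaluates Eq. (10) at the equivalent isotropic distance h, using the parameter values quoted above. The function and constant names are illustrative choices, not taken from the authors' code.

```python
import math

C0 = 0.25      # nugget
C1 = 25000.0   # partial sill
A_U = 1000.0   # range along U (m)
A_V = 500.0    # range along V (m)

def covariance(h_u, h_v):
    """Spherical covariance of Eq. (10) at the anisotropy-corrected distance."""
    k = A_U / A_V                   # anisotropy ratio K
    h = math.hypot(h_u, k * h_v)    # h = sqrt(h_u^2 + (K * h_v)^2)
    if h == 0.0:
        return C0 + C1
    if h <= A_U:
        r = h / A_U
        return C1 * (1.0 - 1.5 * r + 0.5 * r ** 3)
    return 0.0

sill = covariance(0.0, 0.0)          # nugget plus partial sill
at_range_u = covariance(1000.0, 0.0)  # vanishes at the range along U
at_range_v = covariance(0.0, 500.0)   # vanishes at the range along V
half_range_u = covariance(500.0, 0.0)
half_range_v = covariance(0.0, 250.0)
```

Scaling the V component by K means that a lag of 250 m along V is treated like a lag of 500 m along U, so a single range parameter serves both directions.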

A location u = (u0, v0) (as shown in Fig. 6) was chosen in the study area to show the uncertainty update of the interface between strata T2b2 and T2b1. At this location, the elevation of the interface in the established model is zm = 561.2 m, the kriging prediction of this interface is zk = 574.7 m, and the kriging prediction variance is \( {\sigma}_k^2=210.3 \). In Fig. 9, the red solid curve is the probability distribution p(Z| D) from the kriging prediction, which expresses the random uncertainty of the geological variable Z(u). The red vertical line is the kriging estimate zk, and the green vertical line represents the elevation zm of the interface model. The green solid curve is the uncertainty p(Z| D, K) before considering data error. The blue dotted curve is the BME prediction p(Z| De) that integrates data error and the randomness of the spatial variation; the BME PDF is a non-Gaussian distribution. The red dashed curve is the comprehensive uncertainty p(Z| De, K) that integrates the data error, the randomness of spatial variation, and the cognitive uncertainty. In terms of variance, the conditional variances \( Var\left(Z|D\right)={\sigma}_k^2=210.3 \), Var(Z| De) = 242.5 and Var(Z| De, K) = 446.8 show that the uncertainty of the geological interface increases as more uncertainty factors are considered.
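As a small worked check of the kriging stage only, the Python sketch below builds the Gaussian p(Z| D) from the reported zk and kriging variance, expresses the offset of the model elevation zm in kriging standard deviations, and recovers the mean and variance numerically from a gridded PDF. The 0.1 m grid step and the ±8σ window are arbitrary choices for this illustration; the other PDFs in Fig. 9 are not reproduced here.

```python
import math

z_m = 561.2    # interface elevation in the established model (m)
z_k = 574.7    # kriging prediction (m)
var_k = 210.3  # kriging variance (m^2)

sigma_k = math.sqrt(var_k)
z_score = (z_m - z_k) / sigma_k  # model offset in kriging standard deviations

def kriging_pdf(z):
    """Gaussian p(Z|D) with mean z_k and variance var_k."""
    return math.exp(-0.5 * ((z - z_k) / sigma_k) ** 2) / (sigma_k * math.sqrt(2.0 * math.pi))

# Grid the PDF (midpoint rule) and recover its moments numerically.
step = 0.1
n_cells = int(16.0 * sigma_k / step)
grid = [z_k - 8.0 * sigma_k + (i + 0.5) * step for i in range(n_cells)]
mass = sum(kriging_pdf(z) * step for z in grid)
mean = sum(z * kriging_pdf(z) * step for z in grid) / mass
variance = sum((z - mean) ** 2 * kriging_pdf(z) * step for z in grid) / mass
```

The same moment calculation applies to any gridded PDF, including non-Gaussian ones, which is one way conditional variances such as those quoted above could be obtained.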

Fig. 9 PDFs of Z(u) under different conditions

Likewise, the probability distributions of the interface between strata T2b2 and T2b1 were calculated at all locations and are shown in Fig. 10. The probability density at each location represents the occurrence probability that the interface passes through that grid cell. In the p(Z| D) stage, high probabilities appear around the observations, and the occurrence probability of the interface concentrates around the kriging prediction. In the p(Z| De) stage, when the impact of the data error is considered, the occurrence probability in the observation area decreases and the confidence interval of the interface widens. In addition, the occurrence probability in the area far away from the observations increases slightly. The changes (increase or decrease) in the probability distribution of the interface during the integration of data error and spatial variation (p(Z| D)→p(Z| De)) are shown in Fig. 11a. In the p(Z| De, K) stage, the probability density is concentrated around the interface model, whereas the occurrence probability decreases rapidly outside of the model coverage. The changes (increase or decrease) in the probability distribution of the interface during the uncertainty update that considers cognition (p(Z| De)→p(Z| De, K)) are shown in Fig. 11b.
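One way to read the per-cell occurrence probability is as the integral of the elevation PDF over the vertical extent of a grid cell. The Python sketch below does this for the Gaussian kriging stage at a single (u, v) column with 1 m cells; the column extent (474 m to 674 m) is a hypothetical choice, and interpreting the cell value as a difference of CDFs is an assumption of this illustration rather than a statement of the authors' procedure.

```python
import math

def normal_cdf(z, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

def cell_probabilities(mean, std, z_bottom, n_cells, cell_height=1.0):
    """Probability that the interface elevation falls inside each vertical cell."""
    edges = [z_bottom + i * cell_height for i in range(n_cells + 1)]
    return [
        normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
        for lo, hi in zip(edges[:-1], edges[1:])
    ]

# Kriging stage with mean 574.7 m and variance 210.3 m^2 in a hypothetical column.
probs = cell_probabilities(574.7, math.sqrt(210.3), z_bottom=474.0, n_cells=200)
peak_cell = max(range(len(probs)), key=probs.__getitem__)
total = sum(probs)
```

For a non-Gaussian PDF, the CDF differences would be replaced by numerical integration of the gridded density over each cell. Subtracting the cell probabilities of two stages gives a signed change map of the kind shown in Fig. 11.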

Fig. 10 Probability density distribution of the interface between strata T2b2 and T2b1 in different stages of integration: (a) the interface model; (b) perspective view of probability density distribution; (c) side view of probability density distribution

Fig. 11 Probability changes in the uncertainty integration: (a) Probability changes after integrating data errors. The area near observations is enlarged and displayed. (b) Probability changes after considering the cognitive uncertainty

Probability field of geological models

After the uncertainties of each interface had been integrated separately, the probability of each stratigraphic type was calculated with Eq. (A1). In this study area, the contact relationships between adjacent strata are depositional or erosional. According to these contact relationships, a conditional probability field of all the stratigraphic types was obtained with the iterative update Eqs. (A2) and (A3). Each location in this conditional probability field stores a vector of the 22 conditional probabilities of all the stratigraphic types. This multi-stratigraphic type probability field represents the comprehensive uncertainty of every stratum.
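The link between an interface PDF and a stratigraphic type probability can be illustrated with the simplest possible case: one uncertain interface separating a lower and an upper stratum. The Python sketch below is a deliberately reduced, hypothetical example (Gaussian interface, mean 570 m, standard deviation 15 m); it does not reproduce Eqs. (A1) to (A3), which handle many interfaces and their contact relationships.

```python
import math

def normal_cdf(z, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

def two_type_probabilities(z, interface_mean, interface_std):
    """Return (P(lower stratum), P(upper stratum)) at elevation z."""
    # The point lies in the upper stratum when the interface is below z.
    p_upper = normal_cdf(z, interface_mean, interface_std)
    return 1.0 - p_upper, p_upper

# Hypothetical interface: mean elevation 570 m, standard deviation 15 m.
at_interface = two_type_probabilities(570.0, 570.0, 15.0)
deep_inside = two_type_probabilities(500.0, 570.0, 15.0)
```

In this reduced case, the probability of the lower stratum is 0.5 at the expected interface elevation and approaches 1 well below it, which matches the qualitative pattern of probabilities decreasing from the interior of a stratum toward its boundary.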

As shown in Fig. 12, for each stratigraphic type, high probabilities mainly occur at the outcrop and in the interior of the stratum, and the probability gradually decreases from the interior toward the boundary between the stratum and its neighbor. At the observation locations, because of the measurement error, the probability of each stratigraphic type in the sample is close to, but not exactly, 0 or 1. Around the section area, the interpretation error of the section is much higher than the borehole error; therefore, the reliability there is lower than in the drilling area, and the probability is slightly farther from 0 or 1.

Fig. 12 Probability distribution of the four stratigraphic types (T2b1, T2b2, Slip Zone and \( {\mathrm{Q}}_{{\mathrm{T}}_2{\mathrm{b}}^2}^{\mathrm{del}\hbox{-} \mathrm{su}} \)). In the overlay area of the strata, a location is selected to show the profile of the probability from the base to top of each stratum

Comprehensive uncertainty field

We calculated the information entropy from the conditional probability of each stratigraphic type to quantify the structural uncertainty of the entire model. The entropy value quantifies the comprehensive uncertainty of the model at any given location. To demonstrate the influence of data error on the uncertainty assessment, we also calculated the model uncertainty considering only spatial variation and cognitive bias (without data error). The 3D structural model and its uncertainty fields are shown in Fig. 13.

Fig. 13 3D structural model of Huangtupo slope and its uncertainty field (unit of entropy: bit): (a) subsurface model of Huangtupo slope; (b) uncertainty field considering spatial variation and cognitive bias; (c) comprehensive uncertainty field considering data error, spatial variation, and cognitive bias; (d) influence of data error on model uncertainty

In the comprehensive uncertainty field (Fig. 13c), high entropy mainly appears in areas where more than one stratum may exist. Low entropy mostly occurs near the observations or where there is only a single stratum. Sample information from the observations reduces the uncertainty of the spatial variation, and a location with only a single stratum has low entropy because there are few other possibilities. The maximum entropy in the study area is approximately 2.359 bits, which occurs in the area where six different strata converge. In the interlayer between adjacent interfaces, if the stratum is thin, more stratigraphic types become possible in this complex area. Entropy values of 1 to 1.58 bits are common in the interlayer area. Low entropy usually appears at the surface because field data are much easier to collect on the ground. Most of the high entropy regions near the ground are concentrated around the stratigraphic boundaries. Some other high entropy areas are caused by a lack of observations and a reliance on extrapolations from the modeler’s experience in the modeling software. By contrast, Fig. 13b shows the uncertainty field without considering data error. Fig. 13d clearly shows the influence of data error on the model uncertainty: entropy increases at the locations of the geological interfaces in the model and decreases in areas that are far away (in height) from the geological interfaces. The complex errors from diverse data affect the distribution of the uncertainty of the subsurface structure. When data error is considered, the uncertainty of the geological interfaces increases.
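For reference when reading these values, Eq. (9) implies an upper bound on the entropy of n candidate stratigraphic types, reached when all n are equally probable:

$$ {H}_{\max }=-\sum \limits_{i=1}^n\frac{1}{n}{\log}_2\frac{1}{n}={\log}_2n,\kern1em {\log}_22=1,\kern1em {\log}_23\approx 1.585,\kern1em {\log}_26\approx 2.585. $$

Under this bound, the range of 1 to 1.58 bits is what two to three equally probable types would produce, and a value of about 2.359 bits lies below the 2.585 bit ceiling for six types, as would be expected if the six types are not all equally probable at that location.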

Discussion

In the prediction of a geological variable, the PDF calculated by BME is more credible than that calculated by kriging (Christakos and Li 1998). However, to estimate the likelihood function L(Z| K), we still need the kriging variance. In the result of p(Z| D) (as shown in Fig. 10), the local maximum probabilities from the kriging prediction are located around the observations, and the model is deterministic at the observation positions. As the distance from the observations increases, the variance of Z(u) increases, which indicates that the randomness of the spatial variation is increasing and the occurrence probability of the interface is decreasing. After the data error is integrated by BME, the uncertainties at the observations increase: the observed deterministic value becomes a random variable distributed according to the observation error. As the uncertainty is propagated, the occurrence probabilities of the interfaces decrease near the observations and increase far away from the observations (as shown in Fig. 11a). When the modeler’s subjective cognition is considered, the occurrence probabilities of the interface increase in the corresponding model region and decrease outside of this region (as shown in Fig. 11b). In the area where the model approximates the BME prediction, the probability remains approximately unchanged. These results are consistent with our expectations for the propagation of multi-source uncertainties in the modeling process. Relative to previous work, our approach additionally considers the data error and the modeler’s cognitive bias.

The uncertainty analysis is essentially a model-based spatial analysis. This spatial analysis process may introduce, propagate, or even amplify the uncertainty of the original model (Shi 2009). Usually, the modeler and the analyst of a model’s uncertainty are not the same person. In the uncertainty analysis of an established model, the original parameter settings and interactive modifications employed by the modeler are difficult to obtain and to reproduce. A discrepancy between the uncertainty assessment and the theoretical accuracy of modeling will inevitably appear when different methods and parameters are adopted in the analysis versus the modeling process. Different uncertainty analysis methods will lead to different assessment results for the same model (Bárdossy and Fodor 2004). The selection of methods depends on the subjective cognition and the empirical knowledge of the analyst. Similarly, the established model contains the modeler’s cognitive bias. Therefore, the uncertainty assessment of the established model is not only affected by the data error and the randomness of spatial variation, but is also influenced by the cognitive uncertainty of both the modeler and the analyst (Caers 2011). Whether in modeling or in uncertainty analysis, the influence of a human’s subjective cognition is hard to avoid (Shi 2009).

In this research, the modeler’s empirical knowledge is assumed to be conditionally independent of the sample data (whether or not data error is considered). In fact, the Bayesian inference method in Section 2.3.2 is a Naive Bayes model. In practical applications, the conditional independence assumption does not always hold. This is a problem of estimation under incomplete information, because we have no further information about the correlation of the variables. Some studies (Rish 2001; Zhang 2004; Kupervasser 2014) indicate that, when the correlation is unknown, the Naive Bayes model may not be exact but can still be an optimal solution. In our case, the Naive Bayes model is applicable. In addition, we assume that the object of our analysis is the best guess model according to mathematical or empirical criteria. If misconceptions or inaccurate knowledge exist in the modeler’s cognition, the “best guess” assumption may no longer be valid. In this situation, other methods should be considered to quantify cognitive uncertainties.

Conclusions

We developed a new method for performing uncertainty assessments of geological models. This method applies to the comprehensive evaluation of spatial uncertainty in structural models. The multi-source uncertainties derived from data errors, spatial variations, and cognitive bias are considered. Under the constraints of observations and geological rules, the various uncertainties of a geological model are quantified as probabilities. Using this method, multi-source uncertainties are integrated into the posterior probability of each interface and assembled according to the contact relationships. To determine the uncertainty of the integral structure, a comprehensive uncertainty field is calculated based on the posterior probability of the interfaces and their contact relationships. The assessment result can be treated as a representation of the modeler’s confidence in the model after integrating all available information.

In this paper, we employ BME to integrate multi-source uncertainties. The BME method can integrate information that contains a variety of error distributions and prior knowledge; however, the information sources for BME need to satisfy a conditional independence assumption (Allard et al. 2012). Information integration methods suitable for more general conditions will be considered in future studies. Expert knowledge is useful for evaluating the possibility of geological phenomena (Guillen et al. 2008; Howard et al. 2009; Wellmann et al. 2014). More types of expert knowledge, such as the stratum thickness and conceptual graphs of the geological structure, will be considered in the next stage.