Introduction

Geostatistics provides tools to construct numerical models and support mineral and petroleum resource estimates. Geostatistical modeling requires parameters and assumptions based on the limited data available for a particular deposit. The parameters include global probability distributions, variograms and training images. A common assumption of geostatistical modeling is stationarity of these parameters, that is, they are independent of location. For example, the expected value or mean value is assumed constant for all locations within each domain. The decision of stationarity is made prior to any geostatistical prediction (Deutsch and Journel 1998; Davis and Sampson 2002; Wackernagel 2003; Pyrcz and Deutsch 2014). Most decisions of stationarity are implicitly made with the application of a particular algorithm.

Simulation plays an important role in geostatistical modeling. Simulation draws multiple realizations to characterize the geological heterogeneity and quantify the uncertainty of the regionalized variable. The realizations reflect the statistical characteristics of the observed data and unsampled locations. The central step in simulation is to draw simulated values from conditional distributions (Deutsch and Journel 1998). The variation between multiple realizations represents the geological uncertainty (Goovaerts 2001; Rossi and Deutsch 2014). This uncertainty is considered with other aspects of a project to support decision-making. Sequential Gaussian simulation is widely used (Deutsch and Journel 1998). Such simulation-based techniques make a strong assumption of stationarity. However, real geological data often exhibit trends or non-stationary, location-dependent features (Wang et al. 2012; Boisvert et al. 2013). When the assumption of stationarity is violated, the local accuracy of the predicted uncertainty may be unreliable.

Non-stationary geostatistical methods have been developed. One approach is decomposition of the regionalized variable into a deterministic component with large-scale features and a stochastic component with small-scale variations. Several methods are available for modeling the deterministic component (Journel and Huijbregts 2003; Machuca-Mory 2010; Rossi and Deutsch 2014). Conventional geostatistical algorithms are then applied only to the stochastic component, under the assumption that it is stationary, and the modeled deterministic component is added back to the simulated result (Wackernagel 2003; Chiles and Delfiner 2012). The naive approach of modeling with residuals and adding the trend model back in the final models is straightforward; however, the variogram from the residuals is biased downward compared with the underlying variogram (Delfiner 1976; Sabourin 1976; Chiles and Delfiner 2012). Additionally, some constraints should be considered to ensure nonnegative simulated values in the final model when adding the trend at the end (Leuangthong 2003). Another approach is a conditional transformation, such as the stepwise conditional transformation (Leuangthong and Deutsch 2003, 2004) or a locally varying transformation (Gonzales et al. 2006). The complex features of the trend can be removed in the forward transformation, but artifacts could also be introduced due to the bin selection or the use of few data in the transformation. Other approaches include intrinsic random functions of order k (Matheron 1973), non-stationary covariance functions (Sampson and Guttorp 1992), moving window averages (Brunsdon et al. 2002), the spatially varying linear model of coregionalization (Gelfand et al. 2004) and the local random function (Machuca-Mory 2010). However, these non-stationary techniques encounter difficulties in practice (Rossi and Deutsch 2014).

This paper develops a geostatistical modeling algorithm that accounts for the deterministic features of continuous regionalized variables in an artifact-free fashion. A methodology similar to the nonparametric stepwise conditional transformation proposed by Leuangthong and Deutsch (2003) is considered. The conditional distributions are calculated by a Gaussian mixture model fitted to the deterministic trend and the data. The trend-like features in the regionalized variable are removed by the conditional transformation. A porphyry copper deposit is considered where the grade shows an obvious trend. The trend is assumed known without uncertainty. A comparison to conventional geostatistical calculations is made. The results show that geostatistical modeling with trend modeling outperforms conventional geostatistical modeling, with lower error and better reproduction of important features of the regionalized variable.

Background

The stepwise conditional transformation technique was first introduced by Rosenblatt (1952) as an extension of the normal score transformation. Leuangthong and Deutsch (2003) introduced this technique to geostatistics and developed practical applications (Leuangthong and Deutsch 2004). This technique removes some complex features from the data.

Let \(\left\{ Z_{k}({\varvec{u}}), k = 1, \ldots ,K \right\} \) be a set of K stationary random functions. \(\left\{ {\varvec{u}}_{i}, i = 1, \ldots ,n \right\} \) represents a set of n data locations. The observations of the random functions at location \({\varvec{u}}_{i}\) are denoted by \({\varvec{z}}_{i}=\left\{ {\varvec{z}}_{i,1}, \ldots , {\varvec{z}}_{i,K} \right\} \). The first variable \( \left\{ {\varvec{z}}_{i,1}, i = 1, \ldots ,n \right\} \) is transformed independently to Gaussian units through a normal score transformation, the second variable \( \left\{{\varvec{z}}_{i,2}, i = 1, \ldots ,n \right\} \) is transformed conditional to the first variable, and so on:

$$\begin{aligned} {\varvec{y}}_{i,1}&= G^{-1}\left( F_{1}\left( {\varvec{z}}_{i,1}\right) \right) \\ {\varvec{y}}_{i,2}&= G^{-1}\left( F_{ 2 \mid 1 }\left( {\varvec{z}}_{i,2} \mid {\varvec{z}}_{i,1} \right) \right) \\ \vdots&\\ {\varvec{y}}_{i,k}&= G^{-1}\left( F_{ k \mid 1, \ldots , k-1 }\left( {\varvec{z}}_{i,k} \mid {\varvec{z}}_{i,1}, \ldots , {\varvec{z}}_{i,k-1} \right) \right) \\ \vdots&\\ {\varvec{y}}_{i,K}&= G^{-1}\left( F_{ K \mid 1, \ldots , K-1 }\left( {\varvec{z}}_{i,K} \mid {\varvec{z}}_{i,1}, \ldots , {\varvec{z}}_{i,K-1} \right) \right) \quad i = 1, \ldots , n \end{aligned}$$
(1)

where \( \left\{{\varvec{y}}_{i,k}, i = 1, \ldots ,n \;\text{and}\; k = 1, \ldots ,K \right\} \) are transformed multivariate Gaussian variables. \( G^{-1} (\cdot ) \) represents the inverse Gaussian cumulative distribution, and \( F(\cdot ) \) indicates a cumulative distribution function derived from the data. The co-located transformed values are independent although there is no guarantee of decorrelation at nonzero lag distances (Leuangthong and Deutsch 2003). The co-located complex features are removed in the forward transformation and are brought back in the back transformation. The transformed variables are simulated, then back transformed in reverse order.
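
The first step of Eq. 1, the univariate normal score transformation, can be sketched as follows. This is a minimal illustration; the function name, the use of SciPy, and the midpoint-CDF convention are assumptions, not the authors' software:

```python
import numpy as np
from scipy.stats import norm

def normal_score_transform(z, weights=None):
    """Map data to standard normal scores through the (weighted) empirical CDF.

    z       : 1-D array of observations
    weights : optional declustering weights (equal weights by default)
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float) / np.sum(weights)
    order = np.argsort(z)
    # Cumulative probability at each sorted datum, centered to avoid F = 0 or 1
    cum = np.cumsum(w[order]) - 0.5 * w[order]
    y = np.empty(n)
    y[order] = norm.ppf(cum)
    return y

rng = np.random.default_rng(0)
z = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # positively skewed, like grade data
y = normal_score_transform(z)
print(round(float(np.mean(y)), 3), round(float(np.std(y)), 2))
```

The transform is rank-preserving, so the back transformation is the inverse lookup through the same empirical distribution.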

The original stepwise proposal considered nonparametric conditional distributions for the transformation. This approach, however, suffers from artifacts due to the bins used for the conditional distributions and becomes difficult to apply with more than \( K=3 \) variables; there are rarely enough data to reliably inform the conditional distributions.

The Gaussian distribution is fully parameterized by a mean vector and a covariance matrix. A single Gaussian model cannot capture all the complex features of geological data, while one Gaussian kernel per observation is computationally expensive with a large number of data (Silverman 1986; Gray and Moore 2003). Mixture models with a small number of Gaussian kernels could be considered. Pearson (1894) proposed the initial approach to mixture models. A number of authors including Gilardi et al. (2002) and Silva and Deutsch (2016) have used them in geostatistics. The expectation maximization algorithm is used to fit a Gaussian mixture model (McLachlan and Peel 2004; McLachlan and Krishnan 2007; Silva and Deutsch 2016). The benefits of Gaussian mixture models are that complex features can be captured and any conditional distribution can be easily calculated.

Consider the same set of K variables at n data locations \({\varvec{z}}_{k} = \left\{ {\varvec{z}}_{1,k}, \ldots , {\varvec{z}}_{n,k} \right\} ^{T} \). \( {\varvec{y}}_{k} = \left\{ {\varvec{y}}_{1,k},\ldots , {\varvec{y}}_{n,k} \right\} ^{T} \) represents the set of the normal score transformed observations where each variable is transformed independently. The Gaussian mixture model is a multivariate probability density function. The probability density function is written as a sum of g components or mixtures:

$$\begin{aligned} f^{'}\left( {\varvec{y}}_{k}; {\varvec{\varPsi}}\right) = \sum _{j=1}^{g} {\pi _{j} \phi \left({\varvec{y}}_{k}; {\varvec{\mu}}_{j},\,{\varvec{\varSigma}}_{j}\right) } \quad k = 1, \ldots ,K \end{aligned}$$
(2)

where \( f^{'} (\cdot ) \) is the estimated distribution. \({\varvec{\varPsi }}\) is the set of unknown parameters \(\left\{ \pi _1,\ldots ,\pi _{g}, {\varvec{\mu}}_1,\ldots, {\varvec{\mu}}_{g}, {\varvec{\varSigma}}_1,\ldots, {\varvec{\varSigma}}_{g} \right\} \). \(\left\{ \pi _1, \ldots , \pi _{g} \right\}\) are the nonnegative weights assigned to each mixture, \( \left\{ {\varvec{\mu }}_1,\ldots ,{\varvec{\mu }}_{g} \right\} \) indicates the mean vectors of all variables and \( \left\{ {\varvec{\varSigma }}_1,\ldots ,{\varvec{\varSigma }}_{g} \right\} \) refers to the set of covariance matrices between variables for each mixture. \( \phi (\cdot ) \) is the multivariate Gaussian probability density function. The expectation maximization algorithm maximizes the log likelihood, \(\log \lbrace L\left( {\varvec{\varPsi }}\right) \rbrace \). The parameters of the mixtures are iteratively fitted so that \( f^{'}(\cdot ) \) closely fits the experimental data. Any marginal or conditional distribution is easy to compute once \( f^{'}(\cdot ) \) is fit.
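
The conditioning calculation can be illustrated for the bivariate case: each Gaussian component is conditioned analytically, and the component weights are re-weighted by the marginal density of the conditioning value. This sketch uses scikit-learn's expectation maximization fit; the library choice and the synthetic data are assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Synthetic correlated pair standing in for two normal-score variables
rng = np.random.default_rng(1)
ym = rng.standard_normal(1000)
yz = 0.5 * ym + np.sqrt(0.75) * rng.standard_normal(1000)
Y = np.column_stack([ym, yz])

# Fit a g = 2 component mixture by expectation maximization (Eq. 2)
gmm = GaussianMixture(n_components=2, random_state=0).fit(Y)

def conditional_cdf(gmm, yz_val, ym_val):
    """F_{z|m}(yz | ym): a re-weighted mixture of conditional Gaussians."""
    w, F = [], []
    for pi, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        # Standard Gaussian conditioning within each component
        mu_c = mu[1] + cov[0, 1] / cov[0, 0] * (ym_val - mu[0])
        var_c = cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0]
        w.append(pi * norm.pdf(ym_val, mu[0], np.sqrt(cov[0, 0])))
        F.append(norm.cdf(yz_val, mu_c, np.sqrt(var_c)))
    w = np.asarray(w) / np.sum(w)
    return float(np.dot(w, F))

print(conditional_cdf(gmm, yz_val=0.0, ym_val=1.0))  # a probability in (0, 1)
```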

Proposed Method

The stepwise conditional transformation proposed by Leuangthong and Deutsch (2003) transforms the residuals from the trend conditional to the trend. This approach has binning artifacts due to the nonparametric conditional distributions and creates a small number of negative estimates due to variations within the bins. A revised methodology is proposed. The first change is to transform the variable conditional to the trend, not the residual conditional to the trend. The second change is to use a Gaussian mixture model to avoid any binning artifacts. The objective is to remove the trend-like features from the data in a bin-free manner that accounts for the spatial structure and the multivariate relationship between the data and the trend.

Figure 1

Overview of the proposed approach: (a) normal score transform the trend model and data individually; (b) crossplot of the transformed data and the co-located trend; (c) fit with a Gaussian mixture model; and (d) transform the normal scored data with the Gaussian mixture model

Consider a set of n observations, \( \left\{ {\varvec{z}}_{i}, i =1,\ldots , n \right\} \). The trend is assumed exhaustive and known, and is represented by \( \left\{ {\varvec{m}}_{i}, i = 1, \ldots , N \right\} \). Figure 1 shows a schematic illustration of the proposed transformation sequence. The steps for the stepwise conditional transformation using the Gaussian mixture model are as follows:

  1.

    Normal score transformation: the trend and data are individually transformed into standard normal score units through the normal score transformation. The trend model is exhaustive, so there is no need to consider declustering, while the data should be transformed with declustering weights if they are unequally sampled. The normal score transformations are written as:

    $$\begin{aligned} \begin{aligned} {\varvec{y}}_{m_{i}}&= G^{-1}\left( F_m ( {\varvec{m}}_{i} ) \right) \quad i = 1, \ldots ,N \\ {\varvec{y}}_{z_{i}}&= G^{-1}\left( F_z ( {\varvec{z}}_{i} ) \right) \quad i = 1, \ldots ,n \end{aligned} \end{aligned}$$
    (3)

    where \( \left\{ {\varvec{y}}_{m_{i}}, i = 1, \ldots ,N \right\} \) denotes the Gaussian transformed trend values. Such trend values are known everywhere. N is the number of grid nodes of the exhaustively sampled trend. \( F_m (\cdot ) \) represents the cumulative distribution function of the exhaustive trend. \( \left\{ {\varvec{y}}_{z_{i}}, i = 1, \ldots ,n \right\} \) is the Gaussian transformed data, while \( F_z (\cdot ) \) represents its cumulative distribution function. n represents the number of data, where \( n \le N \).

  2.

    Review the transformed variables: the transformed data and the co-located transformed trend are crossplotted. This crossplot is used to help choose the number of Gaussian mixture components, g, for the bivariate fitting and conditional transformation. Too many components will over-fit the complexity of the data, while too few would fail to reproduce the important complexity. The number of components should be reasonable, such that it gives reliable conditional distributions from the bivariate distribution of the data and the trend in an artifact-free fashion. It is common to choose between 2 and 5.

  3.

    Multivariate density estimation: the expectation maximization algorithm is considered to fit the bivariate distribution of the transformed variables. The estimated multivariate density function is calculated by Eq. 2.

  4.

    Conditional transform the normal score data: the normal score data, \( \left\{ {\varvec{y}}_{z_{i}}, i = 1, \ldots ,n \right\} \), are transformed by the conditional distribution of \( {\varvec{y}}_{z_{i}} \) given \( \left\{ {\varvec{y}}_{m_{i}}, i = 1, \ldots ,n \right\} \). The equation is given as:

    $$\begin{aligned} {\varvec{y}}^{'}_{z_{i}} = G ^{-1}\left( F _{z \mid m} ( {\varvec{y}}_{z_{i}} \mid {\varvec{y}}_{m_{i}} ) \right) \quad i = 1, \ldots ,n \end{aligned}$$
    (4)

    where the random variable \( {\varvec{y}}^{'}_{z_{i}} \) indicates the data transformed by the Gaussian mixture model. \( F _{z \mid m} (\cdot ) \) represents the cumulative distribution function of the data given the exhaustive trend. The cumulative distribution function of the trend does not enter any calculations, but the transformed data account for the trend at each location. The bivariate distribution of the transformed data and the co-located normal score trend shows no correlation.

The proposed parametric conditional transformation removes the trend-like features that would be problematic if the raw data were modeled directly. Gaussian simulation can now be used, and several realizations are generated with the transformed data. The back transformation ensures that the trend model is used everywhere. The trend is reproduced in original units.
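
The four steps above can be sketched end to end on synthetic normal-score values; the check at the bottom confirms the stated property that the transformed data are uncorrelated with the co-located trend. The libraries, names, and synthetic data are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Steps 1-2: synthetic normal-score trend and data at n locations (already Gaussian)
rng = np.random.default_rng(2)
n = 2000
ym = rng.standard_normal(n)                   # normal-score trend at data locations
yz = 0.6 * ym + 0.8 * rng.standard_normal(n)  # normal-score data, correlated with trend

# Step 3: fit a g = 2 Gaussian mixture to the bivariate distribution
gmm = GaussianMixture(n_components=2, random_state=0).fit(np.column_stack([ym, yz]))

# Step 4: y'_z = G^{-1}(F_{z|m}(y_z | y_m)) at every datum (Eq. 4)
def stepwise_transform(gmm, ym, yz):
    out = np.empty_like(yz)
    for i, (m, z) in enumerate(zip(ym, yz)):
        w, F = [], []
        for pi, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
            mu_c = mu[1] + cov[0, 1] / cov[0, 0] * (m - mu[0])
            var_c = cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0]
            w.append(pi * norm.pdf(m, mu[0], np.sqrt(cov[0, 0])))
            F.append(norm.cdf(z, mu_c, np.sqrt(var_c)))
        w = np.asarray(w) / np.sum(w)
        out[i] = norm.ppf(np.clip(np.dot(w, F), 1e-12, 1 - 1e-12))
    return out

yz_prime = stepwise_transform(gmm, ym, yz)
print(round(float(np.corrcoef(ym, yz)[0, 1]), 2),        # correlated before
      round(float(np.corrcoef(ym, yz_prime)[0, 1]), 2))  # near zero after
```

The back transformation applies the same per-location conditional distributions in reverse, which is how the trend re-enters the simulated realizations.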

Application

The data shown in Figure 2a (top view) and b (3D view) comprise 121 drillholes with 3302 grade measurements from a porphyry copper deposit. The location coordinates range from 34,200 to 36,200 meters East, from 27,400 to 28,800 meters North, and from 600 to 1,300 meters Elevation with 9-meter intervals. The grade of copper ranges from 0.0 to \( 3.4\% \) with a mean of \( 0.262\% \) and a standard deviation of \( 0.266\% \); the histogram is shown in Figure 2c. Despiking was considered due to tied grade values (Rossi and Deutsch 2014). The histogram after despiking is shown in Figure 2d. The mean is \( 0.263\% \) and the standard deviation is \( 0.265\% \).

Figure 2

The location maps and histograms of 3302 data. (a) Location map in a top view, (b) Location map in a 3D view, (c) Histogram with raw data and (d) Histogram with despiking

Figure 3

Visualization of the global kriging estimates of 3302 data. (a) Estimation at 775.5 meters and (b) Estimation in a 3D view

Although kriging strongly depends on the decision of stationarity, it can still be used for mapping the large-scale trend-like features. Global kriging was performed with a variogram with a \( 20\% \) nugget effect and a range of 1000 meters. The global kriging result in Figure 3 reveals an obvious trend where high values are concentrated in the center. The most continuous direction in the horizontal plane is at an azimuth of \( 110^{\circ } \).

Figure 4

Histograms of 2496 data with different weights. (a) Histogram with despiking and (b) Histogram with declustering

The 3302 copper grade data from 121 drillholes were divided randomly into a modeling set of 2496 data from 88 drillholes and a test set of 806 data from 33 drillholes. The modeling data were used for geostatistical modeling, and the test data were used to check the simulated results.

Figure 5

The isotropic variogram model of 2496 data in normal score units. The sizes of the dots represent the relative number of pairs in each direction

Figure 6

The first three realizations and the average over one hundred realizations in original units. (a) Realization 1, (b) Realization 2, (c) Realization 3 and (d) Etype

Figure 7

The variance over one hundred realizations in original units. (a) A slice at 775.5 meters and (b) A volume in a 3D view

Figure 8

The histogram reproduction of 2496 data with the conventional geostatistical modeling in original units

Figure 9

The variogram reproduction of 2496 data with the conventional geostatistical modeling in original units. Directional experimental variograms are plotted with points. Light gray lines are the variograms of each realization, while the dark gray line represents the average of all realizations. The black line is the directional variogram from the original 2496 values

The histogram of the 2496 modeling data is shown in Figure 4a. The data are clustered, so declustering was needed (Deutsch and Journel 1998). Cell declustering was performed with a 400-meter cell size, and the corrected histogram with a mean of \( 0.201\% \) and a standard deviation of \( 0.210\% \) is shown in Figure 4b. The data were transformed into a normal distribution with the declustering weights. The directional variograms in normal score units were plotted with an isotropic variogram model in Figure 5. Sequential Gaussian simulation was considered and 100 realizations were generated. A normal score back transformation was considered to bring all realizations back to the original units. Figure 6 shows the back transformed results with the first three realizations and the average of all 100 realizations in original units. The variance of all 100 values at each location is shown in Figure 7. There is low variance in the low-valued zones and high variance in the high-valued zones, as expected with a positively skewed distribution. The variance is high around the margins because of the few conditioning data.
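
Cell declustering assigns each datum a weight inversely proportional to the number of data in its cell and normalizes the weights to average one. A minimal 2-D sketch with hypothetical coordinates rather than the deposit data:

```python
import numpy as np

def cell_decluster_weights(x, y, cell):
    """Weights inversely proportional to the count of data sharing a cell,
    normalized so the weights sum to the number of data."""
    ix = np.floor(np.asarray(x, float) / cell).astype(int)
    iy = np.floor(np.asarray(y, float) / cell).astype(int)
    cells = np.stack([ix, iy], axis=1)
    _, inverse, counts = np.unique(cells, axis=0, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    return w * len(w) / w.sum()

# Five clustered data in one 400 m cell and one isolated datum in another
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 450.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 450.0])
w = cell_decluster_weights(x, y, cell=400.0)
print(w)  # clustered data are down-weighted, the isolated datum is up-weighted
```

In practice the cell size is varied and chosen where the declustered mean stabilizes, or is minimized when high grades are preferentially sampled.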

Histogram reproduction can be checked because geostatistical realizations are intended to reproduce the input histogram. The histograms of the 100 realizations are shown with black lines, while the 2496 conditioning data are shown with a red line in Figure 8. The mean of the realizations, \( 0.200\% \), is close to the reference mean, \( 0.201\% \); the standard deviation, \( 0.196\% \), is lower than the conditioning standard deviation, \( 0.210\% \). The realizations successfully reproduce the global mean and the global distribution of the data. The histogram reproduction appears reasonable. Variogram reproduction should theoretically be honored in simulation; it checks the spatial correlation in the final model. Figure 9 shows the variogram reproduction in original units. The variograms of all realizations are slightly more continuous than the original isotropic variogram.

Figure 10

The trend model and the scatter plot of 2496 data in original units. (a) Trend model and (b) Scatter plot

Figure 11

The transformed trend model and the scatter plot of 2496 data in normal score units. (a) Trend model and (b) Scatter plot

Figure 12

The bivariate and univariate distributions of the Gaussian mixture model. (a) Univariate distribution of the exhaustive trend, (b) Scatter plot of transformed variable, (c) Bivariate distribution and (d) Univariate distribution of data

Figure 13

The location map and the variogram model of 2496 data in stepwise units. The sizes of the dots in the variogram model represent the relative number of pairs in each direction. (a) Location map and (b) Variogram model

Figure 14

The first three realizations and the average over one hundred realizations in stepwise units. (a) Realization 1, (b) Realization 2, (c) Realization 3 and (d) Etype

Figure 15

The first three realizations and the average over one hundred realizations in normal score units. (a) Realization 1, (b) Realization 2, (c) Realization 3 and (d) Etype

Figure 16

The first three realizations and the average over one hundred realizations in original units (a) Realization 1, (b) Realization 2, (c) Realization 3 and (d) Etype

Figure 17

The variance over one hundred realizations in original units. (a) A slice at 775.5 m and (b) A volume in a 3D view

Figure 18

The histogram reproduction of 2496 data with the proposed geostatistical modeling in original units

Figure 19

The variogram reproduction of 2496 data with the proposed geostatistical modeling in original units. Directional experimental variograms are plotted with points. Light gray lines are the variograms of each realization, while the dark gray line represents the average of all realizations. The black line is the directional variogram from the original 2496 values

The proposed methodology was implemented with a trend model. The trend model is constructed to avoid under- or over-fitting the data. The trend model contains the large-scale variability and is shown in Figure 10a. The scatter plot is shown in Figure 10b, indicating that the correlation between the trend and the data is 0.51. The exhaustive trend model was transformed into normal score units, while the 2496 data were independently transformed into normal score units with the declustering weights. Figure 11 shows the transformed results, indicating a direct relationship with a correlation of 0.52 between the trend and the data in normal score units. The stepwise conditional transformation with a Gaussian mixture model was considered to remove the complexity of the data. The decision on the number of mixture components is subjective. In this case study, two components were determined by visual inspection to fit the scatter plot. Figure 12 shows the Gaussian mixture model. The univariate distributions of the trend model and the data are shown in Figure 12a and d, respectively. The marginal distributions from the Gaussian mixture model are not exactly normal; however, the deviation appears to be very small in Figure 12a, where the combined mixture distribution and an exact normal distribution overlap almost perfectly. The bivariate distribution is shown on a 2D probability density plot in Figure 12c. The transformed variables are uncorrelated (Fig. 12b). The data after the stepwise conditional transformation in Figure 13a appear random, and the trend is removed. The directional variograms in stepwise units are fitted with an isotropic variogram model and shown in Figure 13b. Sequential Gaussian simulation was conducted on the transformed variable. Figure 14 shows the first three realizations and the average of 100 realizations. No trend-like features exist in the simulated results. A stepwise conditional back transformation with the trend was performed.
Figure 15 shows the first three realizations and the average of 100 realizations in normal score units. The simulated results show that the trend-like features are restored by the back transformation. The initial normal score transformation was also reversed. Figure 16 shows the first three realizations and the average of 100 realizations in original units. The local variance is calculated and shown in Figure 17. The map shows high variance in the central zone and low variance around the margins.

The histogram of the realizations must be consistent with the histogram of the 2496 conditioning data. The realizations over all locations are considered. The histogram is reasonably reproduced in original units (Fig. 18). The mean over 100 realizations is \( 0.198\% \). The value is slightly lower than the conditioning mean, \( 0.201\% \). The standard deviation is \( 0.201\% \), which is lower than that of the conditioning data, \( 0.210\% \), but higher than the \( 0.196\% \) obtained with the conventional method. Figure 19 shows the variogram reproduction. The overall variogram reproduction from the realizations appears better than that from the conventional method in Figure 9.

Figure 20

The location maps with the test data in normal score units. (a) Location map in a top view and (b) Location map in a 3D view

Figure 21

The cross-validations of the test data with different methods in normal score units. The grid of light lines shows the probability intervals, while the red lines and bullets show the deviations of the actual proportions from the predicted probability intervals (Deutsch 2010). (a) Conventional method and (b) Developed method

The first validation step was to compare 806 true values with the simulated average values in normal score units. The test data were transformed into a normal distribution with the reference distribution of the 2496 data. The locations of the test data were labeled with the drillhole IDs and shown in Figure 20. The distributions of the local uncertainty were specified by a conditional mean and variance in normal score units. The plot in Figure 21 shows the accuracy of the simulated distributions of uncertainty with the conventional method and the developed method in normal score units. The mean of the variance over 100 realizations at the 806 checking locations represents the local uncertainty of the model. The conventional method underestimates the local uncertainty at 0.589, while the developed method gives a fairer value of 0.742. This highlights that the numerical model from the developed method contains more variance than the model from the conventional method, because the values from the conventional method are smooth and close to the global mean. The accuracy of the developed method is better than that of the conventional method.

Figure 22

The location maps with the test data in original units. (a) Location map in a top view and (b) Location map in a 3D view

Figure 23

Comparisons from drillholes with different methods in original units: (a, c, e, g) the mean squared error value between true data and simulated values with the conventional method; and (b, d, f, h) the mean squared error value between true data and simulated values with the proposed method. (a) 806 data from 33 DHs, (b) 806 data from 33 DHs, (c) 29 data from DH 31, (d) 29 data from DH 31, (e) 43 data from DH 75, (f) 43 data from DH 75, (g) 10 data from DH 1 and (h) 10 data from DH 1

The second validation step was to compare 806 true values with the simulated average values using mean squared error values in original units. Figure 22 shows the location maps of the test data labeled with drillhole IDs. The mean of the developed method, \( 0.247\% \), matches the true mean, \( 0.247\% \). The standard deviation of the average measures the smoothing effect. The standard deviation of the average values with the developed method, \( 0.128\% \), contains more variability than that with the conventional method, \( 0.108\% \). The mean squared error measures the difference between the truth and what is being estimated and summarizes the prediction performance. The minimized mean squared error is used to identify the best method for modeling with a trend. The mean squared error values between the true values and the average values are 0.0631 for the conventional method and 0.0618 for the developed method, a \( 2.06\% \) improvement with the developed method. Three drillholes extracted from high-, medium- and low-valued zones are compared in Figure 23, indicating 8.53, 2.89, and \( 46.21\% \) improvements, respectively. The developed method shows a significant improvement.
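
The reported improvement follows directly from the two error values; a short check of the arithmetic (the helper function is illustrative):

```python
import numpy as np

def mse(truth, etype):
    """Mean squared error between true values and the e-type (average) estimate."""
    truth, etype = np.asarray(truth, float), np.asarray(etype, float)
    return float(np.mean((truth - etype) ** 2))

# Relative improvement of the developed method over the conventional one,
# using the error values reported in the text
mse_conventional = 0.0631
mse_developed = 0.0618
improvement = (mse_conventional - mse_developed) / mse_conventional * 100
print(round(improvement, 2))  # 2.06
```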

Discussion

A practical framework for non-stationary geostatistical modeling using a Gaussian mixture model was established. The data were divided into a modeling set and a test set. The modeling set was used for geostatistical modeling, and the test data were used for checking the results. The assumption of stationarity is made in the conventional geostatistical prediction and relaxed in the developed method. The proposed method is more accurate but with greater uncertainty. The mean squared error comparisons show a modest yet important \( 2.06\% \) improvement with the developed method. Drillholes close to the margins of the deposit show the greatest improvement.

A significant assumption in the case study is that the trend model is optimal and known. The trend model is a part of characterizing the natural resources. The uncertainty in the trend model is ignored, so the overall uncertainty might be underestimated. Data with an apparent trend were transformed conditional to the trend, so the trend plays a central role in the stepwise transformation. The parameterization and optimization of the trend is an important area of future work.

Another assumption is that two components are optimal for the Gaussian mixture model. Visual inspection is a common approach, but this decision is subjective and depends on the practitioner. A criterion for the number of Gaussian mixture components should be proposed in future research.
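
One objective alternative, offered here as a suggestion rather than the paper's method, is to choose g by minimizing the Bayesian information criterion; scikit-learn exposes this directly:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy bivariate data drawn from two well-separated Gaussian clusters
rng = np.random.default_rng(3)
a = rng.multivariate_normal([-2.0, -2.0], np.eye(2), size=400)
b = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=400)
Y = np.vstack([a, b])

# Fit g = 1..5 components and keep the g that minimizes BIC
bics = {g: GaussianMixture(n_components=g, random_state=0).fit(Y).bic(Y)
        for g in range(1, 6)}
best_g = min(bics, key=bics.get)
print(best_g)  # 2 is expected for two well-separated clusters
```

BIC penalizes the extra parameters of each additional component, so it guards against the over-fitting discussed above, though any automatic criterion should still be checked against the crossplot.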

Figure 24

The mean squared error value of 806 data from 33 drillholes in the final model with the naive approach of modeling with residuals using Gaussian mixture models and adding the trend model back in final models

The approach of modeling with residuals using Gaussian mixture models and adding the trend model back in the final models, that is, \( R( {\varvec{u}} ) = Z({\varvec{u}})-m({\varvec{u}}) \) then \( Z({\varvec{u}}) = m({\varvec{u}}) + R({\varvec{u}}) \), was also implemented. The mean squared error value between the truth and the simulated results is 0.0639 in Figure 24, indicating a \( 3.29\% \) loss relative to the proposed method. The performance of modeling with residuals using Gaussian mixture models was not as good as that of the proposed method, which models the data more accurately. In addition, the constraint for nonnegative simulated values (\( Z({\varvec{u}}) \ge 0.0 \)) is not required in the proposed method.

There is still room for improvement in the stepwise conditional transformation with the Gaussian mixture model. The covariance after the stepwise conditional transformation is zero at the lag distance \( {\varvec{h}} = 0 \) but may not be zero at other lag distances, which could affect the result (Leuangthong and Deutsch 2003). The use of minimum/maximum autocorrelation factors (MAF) (Desbarats and Dimitrakopoulos 2000) may be considered on the transformed variables if remnant cross-spatial correlation is present. MAF could assist with variogram fitting, improve the performance of the mixture models and lead to a better result.

Multiple non-stationary variables could also be considered simultaneously in a hierarchical workflow. Each variable could be processed according to the proposed workflow in Figure 1, and then another Gaussian mixture model could be fit to the detrended variables. A second stepwise conditional transform would remove the dependency between the variables. Gaussian simulation of the independent factors would proceed; then, the back transformation would be performed in reverse order to account for multivariate dependencies and the non-stationary trend models.

Conclusion

Geostatistics has been used for predicting spatial variability. Geostatistical methods depend on stationary statistics. Real geological data often exhibit trend-like features that represent the large-scale variability of the regionalized variable. The assumption of stationarity is not satisfied in the presence of trends. Accounting for the trend should lead to more accurate estimates than ignoring it.

A modified stepwise conditional transformation for geostatistical modeling is proposed. Data with an apparent trend were transformed conditional to the trend by a parametric transformation. The use of the Gaussian mixtures removes the trend-like features from the regionalized variable, eliminates the artifacts from the data binning of the conventional stepwise conditional transformation, and brings more variation to numerical models. The improved performance of the geostatistical algorithm is attributed to the stationarity of the transformed result.

A real dataset with an obvious trend was used to demonstrate the proposal. Comparisons between the conventional prediction and the developed prediction were made. The performances of numerical models, the reproduction of geological characterizations and the analysis of the local uncertainty were compared. The case study shows that the geostatistical modeling with trend modeling performs better than conventional geostatistical modeling, especially around the margins of the domain.