1 Introduction

Collection of exact observations of the sub-surface is usually costly and time-consuming since it involves drilling of vertical wells from which samples can be obtained. Hence, indirect measurements about the sub-surface are frequently collected, and predictions of the characteristics of interest are based on these indirect observations. The measurements can be collected by log-responses in narrow drill holes or by active geophysical procedures. In order to define a prediction rule for the sub-surface characteristics of interest based on the indirect measurements, both exact observations of the characteristic and indirect measurements are collected in sub-surface profiles in a nearby location. Only a small number of such calibration profiles are usually available due to the cost of obtaining them.

The objective of the study is to define a prediction rule for the facies characteristics along a vertical sub-surface profile in which one has indirect measurements in a narrow drill hole. Calibration data containing both facies occurrences and indirect measurements are available at a small number of locations in the neighborhood of the narrow drill hole. The ultimate aim is to predict the facies distribution in a three-dimensional sub-surface volume based on data in a large number of drill holes and a small number of calibration wells. The current study provides an important intermediate result on the path towards this three-dimensional sub-surface volume prediction.

Classification of soil classes in the sub-surface of an off-shore windmill park is used as a case study. The log-responses are resistance and friction of a bore head along the vertical profile, and the soil classification is traditionally made by using standardized charts (Robertson 2010). In the current study, a more formal statistical classification approach is used.

The inversion problem is cast in a Bayesian setting. The posterior probability distribution of the categorical facies variable is defined by a likelihood and a prior model. The likelihood model links the indirect measurements and the facies variable, while the prior model represents the experience with the facies variable. The current model, with a conditionally independent, single-site response likelihood and a Markov chain prior, falls in the class of hidden Markov models. This model class is extensively studied and frequently applied in statistics; see Dymarski (2011) for an overview. In sub-surface modelling, facies/lithology characterization is often based on a hidden Markov model. The prior model for the categorical facies variable along a vertical profile is a Markov chain, in a long tradition following Krumbein and Dacey (1969) and Harbaugh and Bonham-Carter (1970). Traditionally, this Markov chain is assumed to be stationary and parametrized by one facies transition matrix, which is usually assessed by a counting estimator in one vertical well (Weissman and Fogg 1999). In Eidsvik et al. (2004) the assessment of the facies transition matrix is cast in a hierarchical Bayesian setting.

The use of a stationary prior Markov chain along the profile causes the prior marginal distributions of facies at all depths to be identical. For sub-surface phenomena this assumption often appears unrealistic, since the facies occurrences vary with depth (Avseth et al. 2005). A quantitative assessment of the vertical facies proportions is discussed in Ravenne et al. (2002), but the inclusion of these proportion curves into a Markov model for facies classification remains a challenge, which is addressed in the current study. A non-stationary prior Markov model is defined, hence with facies transition matrices that vary with depth. A similar non-stationary Markov chain prior model is used in Ulvmoen et al. (2010), but the assessment of the model parameters is made in a rather ad hoc manner.

This depth-dependence complicates the estimation of the transition matrices. The problem is parametrized by a reference transition matrix, defined as the transition matrix of the best stationary model, and by geologist-specified facies proportion curves along the vertical profile. Based on this parametrization, a criterion for identifying the facies transition matrix at each depth is specified and an algorithm for assessing these matrices is defined.

The likelihood model is defined as conditionally independent with single-site response. Hence, the posterior model appears as a non-stationary Markov chain, which can be exactly assessed by the recursive Forward–Backward and Viterbi algorithms (Viterbi 1967; Baum et al. 1970; Scott 2002).

Bold lower-case, \(\ \varvec{\mathbf {x}}, \ \) and upper-case, \(\ \varvec{\mathbf {X}}, \ \) letters denote vectors and matrices, respectively. The vector \(\ \varvec{\mathbf {x}}_{-i} \ \) represents the vector \(\ \varvec{\mathbf {x}} \ \) with element \(\ i \ \) removed. Similarly, the set \( {\mathcal {T}}\) with element \(\{i\}\) removed is denoted \( {\mathcal {T}}_{-i}\). The term \(\ p(\cdot ) \ \) is used for probability density/mass functions, and the Gaussian n-vector \(\ \varvec{\mathbf {x}} \ \) has probability density function \(\ \varphi _n\left( \varvec{\mathbf {x}}; \ \varvec{\mathbf {\mu }}_x, \ \varvec{\mathbf {\Sigma }}_x \right) \ \) with expectation n-vector \(\ \varvec{\mathbf {\mu }}_{x} \ \) and covariance \(\ \ \left( n\times n \right) \)-matrix \(\ \varvec{\mathbf {\Sigma }}_{x}. \ \) The identity n-vector and \(\ \ \left( n\times n \right) \)-matrix are denoted \(\ \varvec{\mathbf {i}}_n \ \text{ and } \ \varvec{\mathbf {I}}_n,\ \) respectively. Lastly, inequality operators between vectors or matrices entail element-wise inequality.

In Sect. 2, the problem definition and notation are discussed in more detail. In Sect. 3, the likelihood and the two prior and posterior models are defined, and the depth-dependent transition matrices together with an algorithm for assessing them are specified. In Sect. 4, the categorical Bayesian inversion method is tested on both synthetic and real data sets. Finally, in Sect. 5, conclusions and recommendations for further work are presented.

2 Problem Definition and Notation

The sub-surface characteristic of interest is represented along a vertical profile discretized to \( {\mathcal {T}}: \left\{ 1, \ldots , {T} \right\} \). At each depth \( t \in {\mathcal {T}} \) the categorical facies class \( \kappa _t \in \varOmega _\kappa : \left\{ 1,\ldots ,K \right\} \) belongs to one of K classes, which may be unordered. The facies profile is represented by the vector \(\varvec{\mathbf {\kappa }} = \left( \kappa _1,\ldots ,\kappa _T\right) ^{'}\), and assessing this vector is the objective of the study. Along the profile, one real-valued variable associated with the facies class is observed, \( \ \varvec{\mathbf {d}} = \left( d_1,\ldots ,d_T\right) ^{{'}}, \ \ d_t \in {\mathcal {R}}^{1}\). Hence, the objective of the study is to predict \(\ \varvec{\mathbf {\kappa }} \ \ \text{ given } \ \ \varvec{\mathbf {d}},\) that is, to predict \([\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}]\).

Prediction is approached in a probabilistic setting using Bayesian inversion

$$\begin{aligned} p(\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}) = \ \text{ const } \ \times p(\varvec{\mathbf {d}}|\varvec{\mathbf {\kappa }}) \ p(\varvec{\mathbf {\kappa }}) \end{aligned}$$

where the likelihood function \( \ p(\varvec{\mathbf {d}}|\varvec{\mathbf {\kappa }}) \ \) defines the link between the observations \(\ \varvec{\mathbf {d}} \ \) and variable of interest \( \ \varvec{\mathbf {\kappa }}, \ \) the prior probability density function (pdf) \(\ p(\varvec{\mathbf {\kappa }}) \ \) represents the general experience with the variable of interest \(\ \varvec{\mathbf {\kappa }}, \ \) while “\( \text{ const }\)” is a normalizing constant. The posterior pdf \(\ p\left( \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right) \ \) is uniquely defined by the likelihood function and the prior pdf, and it constitutes the ultimate solution to a Bayesian inversion problem.

The posterior pdf \(\ p\left( \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right) \ \) must usually be explored by simulation, ideally based on a set of realizations

$$\begin{aligned} \left[ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right] _1, \ldots , \left[ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right] _s \ \ \text{ iid } \ \ p\left( \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right) . \end{aligned}$$

These realizations are most often generated approximately by Markov chain Monte Carlo (MCMC) simulation, while under very particular model assumptions the generation can be done exactly by repeating a sequential algorithm. In the current study, model assumptions that make efficient sequential generation possible are defined.

The prediction of \( \ [\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}] \ \) can be based on a maximum posterior (MAP) criterion since the sample space \(\ \varOmega _\kappa \ \) is categorical; hence the MAP predictor is

$$\begin{aligned} \hat{\varvec{\mathbf {\kappa }}}_{MAP}= \text{ MAP }\left\{ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right\} = \mathrm {arg}\ {\underset{\varvec{\mathbf {\kappa }}}{\mathrm {max}}} \ \left\{ p(\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}})\right\} . \end{aligned}$$

Computing the MAP predictor will generally require complex discrete optimization in high dimension, while particular model assumptions might simplify the calculations. The model assumptions defined later make sequential optimization possible. Moreover, it is of interest to present posterior probability profiles for each class \(\ \kappa \in \varOmega _\kappa , \ \) which are defined as

$$\begin{aligned} \varvec{\mathbf {p}}_\kappa = \left[ p(\kappa _t = \kappa |\varvec{\mathbf {d}}) \right] _{t \in {\mathcal {T}}}; \ \ \ \kappa \in \varOmega _\kappa . \end{aligned}$$

From these profiles, the uncertainty associated with the MAP prediction \(\ \hat{\varvec{\mathbf {\kappa }}}_{MAP} \ \) can be assessed. Lastly, an alternative marginal maximum posterior (MMAP) predictor can be defined

$$\begin{aligned} \hat{\varvec{\mathbf {\kappa }}}_\mathrm{MMAP}= \text{ MMAP }\left\{ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}\right\} = \ \left[ \ \mathrm {arg}\ {\underset{\kappa }{\mathrm {max}}} \ \left\{ p(\kappa _t=\kappa |\varvec{\mathbf {d}}) \right\} \ \right] _{t \in {\mathcal {T}}}. \end{aligned}$$
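As a minimal numerical illustration (with hypothetical probabilities; the actual marginal posterior pdfs are assessed in Sect. 3), the MMAP predictor reduces to a depth-wise argmax over the marginal posterior probabilities:

```python
import numpy as np

# Hypothetical (T = 3, K = 2) marginal posterior probabilities p(kappa_t = kappa | d),
# one row per depth t; these numbers are purely illustrative.
post_marginals = np.array([[0.7, 0.3],
                           [0.4, 0.6],
                           [0.2, 0.8]])
kappa_mmap = post_marginals.argmax(axis=1) + 1   # classes labelled 1, ..., K
print(kappa_mmap)                                # -> [1 2 2]
```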

3 Model Definition

The posterior pdf, which is the focus of this study, is defined by the likelihood function and the prior pdf. The posterior also involves the normalizing constant, which is usually very demanding to calculate. In this section, model assumptions are defined such that the posterior pdf can be assessed very efficiently.

3.1 Observation Likelihood

The likelihood function \(\ p(\varvec{\mathbf {d}}|\varvec{\mathbf {\kappa }}) \ \) defines the link between the observations, \(\ \varvec{\mathbf {d}} \ \) and the characteristic in focus, \(\ \varvec{\mathbf {\kappa }}. \ \) In the function, \(\ \varvec{\mathbf {d}} \ \) is a fixed vector of observations while the vector of facies classes \(\ \varvec{\mathbf {\kappa }} \ \) is the variable. If it is reasonable to assume that the likelihood can be factorized as follows

$$\begin{aligned} p(\varvec{\mathbf {d}}|\varvec{\mathbf {\kappa }}) \ = \ \prod _{t \in {\mathcal {T}}} p(d_t|\varvec{\mathbf {\kappa }}) \ = \ \prod _{t \in {\mathcal {T}} } p(d_t|\kappa _t) \end{aligned}$$
(1)

the calculations of the posterior pdf might be dramatically simplified. The assumption entails conditional independence and single-site response in the observations. Observations with independent errors from one depth \(\ t \ \) to another cause the former, while responses without spatial convolutions cause the latter. These assumptions are often justifiable at an initial stage of modeling, and they may be generalized at the expense of higher computational demands. The likelihood function is defined by the partial likelihood functions

$$\begin{aligned} p(d_t|\kappa _t); \ \kappa _t \in \varOmega _\kappa \ ; \ \ t \in {\mathcal {T}}. \end{aligned}$$
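As a minimal sketch of such partial likelihood functions, assuming Gaussian class-conditional responses as in the synthetic example of Sect. 4.1 (the function name and array layout are illustrative and reused in later sketches):

```python
import numpy as np

def gaussian_loglik(d, mu, sigma):
    """Single-site log-likelihoods loglik[t, k] = log p(d_t | kappa_t = k) under
    the conditional independence assumption of Eq. (1), with Gaussian
    class-conditional responses. d: (T,), mu and sigma: (K,)."""
    d = np.asarray(d)[:, None]            # shape (T, 1)
    mu = np.asarray(mu)[None, :]          # shape (1, K)
    sigma = np.asarray(sigma)[None, :]    # shape (1, K)
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((d - mu) / sigma) ** 2
```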

3.2 Prior Experience

The prior pdf \(\ p(\varvec{\mathbf {\kappa }}) \ \) represents the initial knowledge about the variable in focus \( \ \varvec{\mathbf {\kappa }}. \ \) This sub-surface characteristic will normally be spatially coupled and represented by a Markov chain

$$\begin{aligned} p(\varvec{\mathbf {\kappa }}) \ = \ p(\kappa _{1}) \ \prod _{t \in {\mathcal {T}}_{-1}} p(\kappa _t|\kappa _{t-1},\ldots ,\kappa _1) \ = \ p(\kappa _{1}) \ \prod _{t \in {\mathcal {T}}_{-1}} p(\kappa _t|\kappa _{t-1}). \end{aligned}$$
(2)

The first-order Markov assumption entails that knowledge of the previous state screens out the influence of the history, causing the latter factorized expression. Note that this assumption entails that the spatial dependence is \( \ p(\kappa _t|\varvec{\mathbf {\kappa }}_{-t}) \ = \ p(\kappa _t|\kappa _{t-1},\kappa _{t+1}). \ \) These screening assumptions introduce geometrically decaying spatial dependence in the variable, and they may be generalized at the expense of higher computational demands.

The prior pdf is defined using the uni- and bi-variate properties, which define the initial pdf and the set of transition matrices

$$\begin{aligned} \begin{aligned}&\varvec{\mathbf {p}}_{1} \ = \ [ p(\kappa _1) ]_{\kappa _1 \in \varOmega _\kappa } \\&\varvec{\mathbf {P}}_{t-1, \ t} \ = \ [ p(\kappa _t|\kappa _{t-1}) ]_{\kappa _{t-1}, \ \kappa _t \in \varOmega _\kappa } \ ; \ \ t \in {\mathcal {T}}_{-1} \end{aligned} \end{aligned}$$
(3)

which in turn defines the set of marginal pdfs by the recursion

$$\begin{aligned} \begin{aligned}&\varvec{\mathbf {p}}_{t} \ = \ \left[ p(\kappa _t) \right] _{\kappa _t \in \varOmega _\kappa } \ = \ \varvec{\mathbf {P}}_{t-1, \ t}' \ \varvec{\mathbf {p}}_{t-1} \ ; \ \ t \in {\mathcal {T}}_{-1}. \end{aligned} \end{aligned}$$
(4)

Note that the set of marginal pdfs \( \ \varvec{\mathbf {p}}_{t} \ ; \ t \in {\mathcal {T}} \ \) does not uniquely define the prior pdf since many sets of transition matrices will provide identical sets of marginal pdfs.
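As a minimal illustration of this non-uniqueness for \(K=2\), both

$$\begin{aligned} \varvec{\mathbf {P}} \ = \ \begin{bmatrix} 0.8&\quad 0.2 \\ 0.2&\quad 0.8 \end{bmatrix} \quad \text{ and } \quad \varvec{\mathbf {P}} \ = \ \begin{bmatrix} 0.5&\quad 0.5 \\ 0.5&\quad 0.5 \end{bmatrix} \end{aligned}$$

map the uniform marginal pdf \( \ (0.5, \ 0.5)' \ \) onto itself through Eq. (4); hence identical marginal pdf profiles may arise from Markov chains with very different spatial coupling.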

3.3 Posterior Model

The posterior pdf is fully defined by the likelihood function and the prior pdf. The assumptions enforced in the two previous sub-sections entail that the model can be displayed as a directed graph (Fig. 1). The resulting posterior pdf can be expressed as

$$\begin{aligned} \begin{aligned} p(\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}})&= \ \text{ const } \ \times \prod _{t \in {\mathcal {T}}} \ p(d_t|\kappa _t) \times p(\kappa _1) \ \prod _{t \in {\mathcal {T}}_{-1}} \ p(\kappa _t|\kappa _{t-1}) \\&= \ \text{ const } \ \times p(d_1|\kappa _1) \ p(\kappa _1) \ \times \prod _{t \in {\mathcal {T}}_{-1}} \ p(d_t|\kappa _t) \ p(\kappa _t|\kappa _{t-1}) \\&= p(\kappa _1|\varvec{\mathbf {d}}) \ \prod _{t \in {\mathcal {T}}_{-1}} p(\kappa _t|\kappa _{t-1}, \ \varvec{\mathbf {d}}). \end{aligned} \end{aligned}$$
(5)

Hence, the posterior pdf will also be a first-order Markov chain with initial pdf and transition matrices dependent on the observation vector \( \ \varvec{\mathbf {d}}; \ \) see “Appendix A”. The corresponding spatial expression is \( \ p(\kappa _t|\varvec{\mathbf {\kappa }}_{-t}, \ \varvec{\mathbf {d}}) \ = \ p(\kappa _t|\kappa _{t-1}, \ \kappa _{t+1}, \ \varvec{\mathbf {d}}). \ \)

Fig. 1 Dependence structure of the hidden Markov model

The posterior pdf, under the assumptions specified, can be assessed by the highly efficient, recursive Forward–Backward algorithm, see “Appendix B”, and the solution is the initial posterior pdf and the set of posterior transition matrices (Baum et al. 1970; Scott 2002)

$$\begin{aligned} \begin{aligned}&\varvec{\mathbf {p}}_{1|\mathrm{d}} \ = \ [ p(\kappa _1|\varvec{\mathbf {d}}) ]_{\kappa _1 \in \varOmega _\kappa } \\&\varvec{\mathbf {P}}_{t-1, \ t|\mathrm{d}} \ = \ [ p(\kappa _t|\kappa _{t-1}, \ \varvec{\mathbf {d}}) ]_{\kappa _{t-1}, \ \kappa _t \in \varOmega _\kappa } \ ; \ \ t \in {\mathcal {T}}_{-1}. \end{aligned} \end{aligned}$$
(6)
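A minimal NumPy sketch of one way to assess the posterior chain parameters of Eq. (6) is given below, using a single backward recursion; the actual Forward–Backward formulation is given in “Appendix B”, and the function name and array layout here are illustrative. The sketch assumes the conditionally independent single-site likelihood of Eq. (1), supplied as log-values, for instance from the gaussian_loglik sketch in Sect. 3.1.

```python
import numpy as np

def posterior_markov_chain(p1, P, loglik):
    """Sketch: assess the posterior initial pdf and transition matrices of Eq. (6)
    from the prior initial pdf p1 (K,), the prior transition matrices
    P[t-1] = P_{t-1,t} with shape (T-1, K, K), and the single-site
    log-likelihoods loglik[t, k] = log p(d_t | kappa_t = k) with shape (T, K)."""
    T, K = loglik.shape
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))  # per-depth rescaling, underflow-safe

    # Backward recursion: beta[t, k] proportional to p(d_{t+1}, ..., d_T | kappa_t = k)
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = P[t] @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()                              # rescale for numerical stability

    # Posterior initial pdf p(kappa_1 | d)
    p1_post = p1 * lik[0] * beta[0]
    p1_post /= p1_post.sum()

    # Posterior transition matrices p(kappa_t | kappa_{t-1}, d)
    P_post = np.empty_like(P)
    for t in range(T - 1):
        M = P[t] * (lik[t + 1] * beta[t + 1])[None, :]        # rows: kappa_t, columns: kappa_{t+1}
        P_post[t] = M / M.sum(axis=1, keepdims=True)
    return p1_post, P_post
```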

The corresponding set of marginal posterior pdfs can also be recursively assessed

$$\begin{aligned} \varvec{\mathbf {p}}_{t|\mathrm{d}} \ = \ [ p(\kappa _t|\varvec{\mathbf {d}}) ]_{\kappa _t \in \varOmega _\kappa } \ = \ \varvec{\mathbf {P}}_{t-1, \ t|\mathrm{d}}' \ \varvec{\mathbf {p}}_{t-1|\mathrm{d}} \ ; \ t \in {\mathcal {T}}_{-1}. \end{aligned}$$

Since the posterior pdf is a first-order Markov chain with easily assessable initial pdf and set of transition matrices, generating a set of realizations from the posterior pdf, \(\ [\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}]_s \ ; \ s = 1, \ldots , S \ \) is trivial and very fast. Moreover, determining the MAP state, \(\ \hat{\varvec{\mathbf {\kappa }}}_{MAP}, \ \) in a first-order Markov chain can be done recursively by the very efficient Viterbi algorithm (Viterbi 1967); see “Appendix C”.
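Continuing the sketch above (same assumed array layout), the marginal posterior pdfs, posterior realizations and the MAP predictor can all be obtained directly from the posterior chain parameters; the MMAP predictor is the row-wise argmax of the marginals.

```python
import numpy as np

def posterior_marginals(p1_post, P_post):
    """Marginal posterior pdfs by the recursion p_{t|d} = P'_{t-1,t|d} p_{t-1|d}."""
    T, K = P_post.shape[0] + 1, p1_post.size
    marg = np.empty((T, K))
    marg[0] = p1_post
    for t in range(1, T):
        marg[t] = P_post[t - 1].T @ marg[t - 1]
    return marg

def sample_posterior(p1_post, P_post, rng=None):
    """One realization drawn sequentially from the posterior Markov chain."""
    rng = np.random.default_rng() if rng is None else rng
    T, K = P_post.shape[0] + 1, p1_post.size
    x = np.empty(T, dtype=int)
    x[0] = rng.choice(K, p=p1_post)
    for t in range(1, T):
        x[t] = rng.choice(K, p=P_post[t - 1][x[t - 1]])
    return x

def map_viterbi(p1_post, P_post):
    """MAP predictor by dynamic programming (Viterbi-type recursion) on the posterior chain.
    Zero transition probabilities give -inf log-scores and are simply never selected."""
    T, K = P_post.shape[0] + 1, p1_post.size
    delta = np.empty((T, K))
    back = np.empty((T, K), dtype=int)
    delta[0] = np.log(p1_post)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(P_post[t - 1])   # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    x = np.empty(T, dtype=int)
    x[-1] = delta[-1].argmax()
    for t in range(T - 1, 0, -1):
        x[t - 1] = back[t, x[t]]
    return x
```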

Lastly, the posterior probability profile for each class, \( \ \varvec{\mathbf {p}}_{\kappa } \ ; \ \kappa \in \varOmega _\kappa , \ \) can be calculated from the set of marginal posterior pdfs, \( \ \varvec{\mathbf {p}}_{t|\mathrm{d}} \ ; \ t \in {\mathcal {T}}. \ \) Consequently, all features of the posterior pdf of interest can be assessed extremely efficiently by these recursive algorithms.

3.4 Alternative Prior Formulations

The prior pdf is assumed to be a first-order Markov chain, possibly a non-stationary one

$$\begin{aligned} p(\varvec{\mathbf {\kappa }}) \ = \ p(\kappa _{1}) \ \times \prod _{t \in {\mathcal {T}}_{-1}} p(\kappa _t|\kappa _{t-1}). \end{aligned}$$
(7)

The model is defined by the initial pdf \(\ \varvec{\mathbf {p}}_1 \ \) and the set of transition matrices \(\ \varvec{\mathbf {P}}_{t-1, \ t} \ ; \ t \in {\mathcal {T}}_{-1}. \ \) The resulting set of marginal prior pdfs is \(\ \varvec{\mathbf {p}}_{t} \ ; \ t \in {\mathcal {T}}. \ \)

3.4.1 Traditional Prior Model

The prior pdf is usually defined to be spatially stationary (Krumbein and Dacey 1969; Eidsvik et al. 2004). This definition entails that the elements in the set of transition matrices are identical, \(\ \varvec{\mathbf {P}}_{t-1, t} \ = \ \varvec{\mathbf {P}}_{{t^{'}-1}, t^{'}} \ = \ \varvec{\mathbf {P}} {;} \ t, t^{'} \in {\mathcal {T}}_{-1}, \ \) and the initial pdf is \(\ \varvec{\mathbf {p}}_1=\varvec{\mathbf {p}}_s \ \) where \( \ \varvec{\mathbf {p}}_s \ \) is the associated stationary pdf, that is \( \ \varvec{\mathbf {p}}_s = \varvec{\mathbf {P}}' \ \varvec{\mathbf {p}}_s. \ \) Consequently, all marginal pdfs are identical, \(\ \varvec{\mathbf {p}}_t=\varvec{\mathbf {p}}_s {;} \ t \in {\mathcal {T}}.\) Hence, spatial stationarity entails translation-invariant marginal pdfs and translation-invariant spatial dependence (Billard and Meshkani 1995).
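A minimal sketch of recovering the stationary pdf from a given transition matrix, that is solving \( \ \varvec{\mathbf {p}}_s = \varvec{\mathbf {P}}' \ \varvec{\mathbf {p}}_s \ \) together with the elements of \( \ \varvec{\mathbf {p}}_s \ \) summing to one (the function name is illustrative):

```python
import numpy as np

def stationary_pdf(P):
    """Solve p_s = P' p_s subject to sum(p_s) = 1 for a (K x K) transition matrix P."""
    K = P.shape[0]
    # Stack the homogeneous system (P' - I) p_s = 0 with the normalization row 1' p_s = 1
    A = np.vstack([P.T - np.eye(K), np.ones((1, K))])
    b = np.concatenate([np.zeros(K), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]

# e.g. stationary_pdf(np.array([[0.8, 0.2], [0.2, 0.8]])) -> [0.5, 0.5]
```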

This prior stationarity assumption is often suitable, since prior models are experience-based and globally anchored, while local characteristics should be a consequence of the site-specific observations introduced by the likelihood function.

3.4.2 Trend Prior Model

Spatial stationarity is often a suitable assumption in the horizontal dimensions, while the vertical dimension is often known to exhibit trends. For example, the proportions of facies are known to vary with depth along a vertical sub-surface profile.

In the current study of the categorical variable \( \ \varvec{\mathbf {\kappa }} \ \) along a vertical sub-surface profile, one may want to specify varying vertical proportions a priori. A suitable prior model parameterization would be the varying marginal pdfs along the profile, \( \ \varvec{\mathbf {p}}_{t}^{0} {;} \ t \in {\mathcal {T}}. \ \) This specification does not, however, uniquely define the prior Markov chain model \( \ p(\varvec{\mathbf {\kappa }}) \), since the set of marginal pdfs does not fully characterize the multivariate pdf. One possible solution to this challenge is to define a reference transition matrix \( \ \varvec{\mathbf {P}}_{r} \ \) and to specify the multivariate pdf \( \ p(\varvec{\mathbf {\kappa }}) \ \) through transition matrices that deviate as little as possible from \( \ \varvec{\mathbf {P}}_{r} \ \) while still reproducing the marginal pdfs \(\ \varvec{\mathbf {p}}_{t}^{0} { ;} \ t \in {\mathcal {T}}. \ \) This approach is used in defining the trend prior model in this study; it is outlined below, with a more detailed presentation in “Appendix D”.

Specify the marginal pdf profile \( \ \varvec{\mathbf {p}}_{t}^{0}; \ t \in {\mathcal {T}} \ \) and a reference transition matrix \( \ \varvec{\mathbf {P}}_{r} \ = [p_{r}(\kappa |\kappa ^{'})]_{\kappa , \ \kappa ^{'} \ \in \varOmega _{\kappa }}. \ \) The required prior model parameters are the initial pdf \( \ \varvec{\mathbf {p}}_1 \ \) and the set of transition matrices \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ = \ \left[ \ p(\kappa _t|\kappa _{t-1}) \ \right] _{\kappa _{t-1}, \ \kappa _t \in \varOmega _\kappa }; \ \ t \in {\mathcal {T}}_{-1}, \ \) which are defined as follows.

Let

$$\begin{aligned} \varvec{\mathbf {p}}_1 \ = \ \varvec{\mathbf {p}}_{1}^{0} \end{aligned}$$

\(\text{ For } \ \ t=2,\ldots ,T \)

$$\begin{aligned} \varvec{\mathbf {P}}_{t-1, \ t} = \mathrm {arg}\ {\underset{\varvec{\mathbf {P}}}{\mathrm {min}}} \ \left\{ \Arrowvert \varvec{\mathbf {P}} - \varvec{\mathbf {P}}_r \Arrowvert _{wL^{2}} \right\} \end{aligned}$$

constrained by

$$\begin{aligned} \begin{aligned}&\varvec{\mathbf {p}}_{t}^{0} \ = \ \varvec{\mathbf {P}}^{'} \ \varvec{\mathbf {p}}_{t-1}^{0} \\&\varvec{\mathbf {i}}_K \ = \ \varvec{\mathbf {P}} \ \varvec{\mathbf {i}}_K \\&\varvec{\mathbf {P}} \ \ge \ 0 \ \varvec{\mathbf {I}}_K \end{aligned} \end{aligned}$$
(8)

where \(\ \varvec{\mathbf {P}} \ \) is a \(\ (K \times K)\)-matrix and the deviation measure \( \ \ \Arrowvert \cdot \Arrowvert _{wL^{2}} \ \) is an element-wise weighted squared distance metric. The first set of constraints ensures reproduction of the marginal pdfs \(\ \varvec{\mathbf {p}}^{0}_{t} \ ; \ t \in {\mathcal {T}}_{-1}, \ \) the second set ensures that the rows of \(\ \varvec{\mathbf {P}} \ \) add to unity, while the third set ensures that all elements in \(\ \varvec{\mathbf {P}} \ \) are non-negative. Consequently, \(\ \varvec{\mathbf {p}}_1 \ \) and the set \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ ; \ t \in {\mathcal {T}}_{-1}, \ \) define a first-order Markov chain with marginal pdfs \( \ \varvec{\mathbf {p}}_{t} \ ; \ t \in {\mathcal {T}} \ \) that reproduces \( \ \varvec{\mathbf {p}}^{0}_{t} \ ; \ t \in {\mathcal {T}}. \ \)

The minimization system at each \( \ t \ \) contains a quadratic objective function with two sets of linear equality constraints and one set of linear inequality constraints. An optimization problem of this type cannot generally be solved analytically.

A sequential optimization procedure involving analytical solutions in each step is therefore prescribed.

  1. Solve the minimization system with the two sets of linear equality constraints analytically by Lagrange minimization. If the solution \(\ (K \times K)\)-matrix \(\ \varvec{\mathbf {P}}_{t-1, t} \ \) contains only non-negative elements, the solution is identified. If, however, some elements in \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ \) are negative, proceed to Step 2.

  2. Set the negative elements in \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ \) to zero, and solve the minimization system with the linear equality constraints for the remaining elements. If the updated solution \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ \) has only non-negative elements, the solution is identified. If, however, some elements are negative, iterate Step 2.

Note that if the solution is identified in Step 1, it is the optimal solution since the inequality constraints are inactive. Otherwise, neither the existence of a solution nor the optimality of the obtained solution is guaranteed. Note, however, that if a solution \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ \) is identified, it is a valid transition matrix and it reproduces the set of marginal pdfs \( \ \varvec{\mathbf {p}}_{t}^{0} \ , \ t \in {\mathcal {T}}. \ \) The computational demands of the algorithm amount to decomposing a \( \left( (K+1)^{2}-2\right) \times \left( (K+1)^{2}-2\right) \) matrix at each t; the algorithm is therefore linear in T and very efficient.
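A minimal NumPy sketch of this two-step procedure is given below, under the stated assumptions: the equality-constrained quadratic minimization is solved through its KKT system, negative elements are clamped to zero, and the reduced system is re-solved. The function name, the unit default weights and the tolerances are illustrative; the weighting of the objective function can be supplied through the matrix W.

```python
import numpy as np

def fit_transition_matrix(P_r, p_prev, p_cur, W=None, tol=1e-12, max_iter=25):
    """Sketch of the two-step procedure: find the transition matrix P closest
    (element-wise weighted squared distance) to the reference matrix P_r such
    that P' p_prev = p_cur, each row of P sums to one, and P >= 0.
    Unit weights are used unless a (K x K) weight matrix W is supplied."""
    K = P_r.shape[0]
    w = (np.ones((K, K)) if W is None else W).ravel()
    r = P_r.ravel()                              # x = vec(P), row-major

    # Linear equality constraints A x = b
    A = np.zeros((2 * K, K * K))
    b = np.zeros(2 * K)
    for j in range(K):                           # marginal reproduction, first line of Eq. (8)
        A[j, j::K] = p_prev
        b[j] = p_cur[j]
    for i in range(K):                           # row sums equal one, second line of Eq. (8)
        A[K + i, i * K:(i + 1) * K] = 1.0
        b[K + i] = 1.0

    free = np.ones(K * K, dtype=bool)            # entries not yet clamped to zero
    for _ in range(max_iter):
        nf = int(free.sum())
        # Step 1 (and reduced Step 2): equality-constrained quadratic minimization
        # solved through its KKT system; lstsq copes with the one redundant constraint.
        KKT = np.zeros((nf + 2 * K, nf + 2 * K))
        KKT[:nf, :nf] = np.diag(2.0 * w[free])
        KKT[:nf, nf:] = A[:, free].T
        KKT[nf:, :nf] = A[:, free]
        rhs = np.concatenate([2.0 * w[free] * r[free], b])
        sol = np.linalg.lstsq(KKT, rhs, rcond=None)[0]

        x = np.zeros(K * K)
        x[free] = sol[:nf]
        if not np.allclose(A @ x, b):            # clamping has made the constraints infeasible
            break
        if (x[free] >= -tol).all():
            return np.clip(x, 0.0, None).reshape(K, K)
        free &= (x >= -tol)                      # Step 2: clamp negative elements, re-solve the rest
    raise RuntimeError("no valid transition matrix identified")
```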

The current application, involving varying class proportions along a vertical sub-surface profile, will naturally have smoothly varying marginal pdf curves. The reference transition matrix \( \ \varvec{\mathbf {P}}_r \ \) can be interpreted as an average transition matrix of the Markov chain. By defining \( \ \varvec{\mathbf {P}}_r \ \) consistently with the marginal pdf profile, and by using natural binomial weighting in the objective function, one expects the appearance of negative elements in the solution \( \ \varvec{\mathbf {P}}_{t-1, \ t} \ \) to be unlikely; see “Appendix C”.

3.5 Assessment of Posterior Model

The posterior models based on both the traditional and the trend prior models can be assessed exactly by the recursive Forward–Backward algorithm; see “Appendix B”. The computational demands are proportional to \(\ T \times K^{2} \ \) for both models, hence linear in the length of the profile. The solution identifies

$$\begin{aligned} \begin{aligned}&\varvec{\mathbf {p}}_{1|d} \ = \ [ p(\kappa _1|\varvec{\mathbf {d}}) ]_{\kappa _1 \in \varOmega _\kappa } \\&\varvec{\mathbf {P}}_{t-1, \ t|d} \ = \ [ p(\kappa _t|\kappa _{t-1}, \varvec{\mathbf {d}}) ]_{\kappa _{t-1}, \ \kappa _t \in \varOmega _\kappa } \ ; \ \ t \in {\mathcal {T}}_{-1}. \end{aligned} \end{aligned}$$
(9)

4 Results and Discussion

To compare the traditional and trend prior models, the MAP predictor, the MMAP predictor, the marginal probability profiles, and realizations simulated from the posterior model are used. The methods are applied to two different data sets, one synthetic example and one real case study.

4.1 Synthetic Example

Consider an example with a vertical profile discretized to \( \ {\mathcal {T}} =\left\{ \ 1, \ldots , 100 \right\} \ \) with categorical variable of interest \( \ \kappa _t \in \varOmega _\kappa : \ \{\text{ black, } \text{ gray } \}; \ \) see the reference profile \( \ \varvec{\mathbf {\kappa }}^{r} \ \) in Fig. 2. Note that black appears to dominate at the top of the profile and gray at the bottom. The likelihood model is defined by \( \ p(d_t|\kappa _t) \ = \ \varphi _1(d_t; \ \mu _{\kappa _t}, \ \sigma ^{2}_{\kappa _t} ) \ \) with \( \ (\mu _{\kappa }, \ \sigma ^{2}_{\kappa } ); \ \kappa \in \varOmega _\kappa \ \) equal to \( \ (1, \ 1.5^{2} ) \ \) and \( \ (-1, \ 1.5^{2} ) \ \) for \( \ \kappa \ \) being black and gray, respectively; see Fig. 2.

Fig. 2 Reference profile, likelihood model and reference observations

The synthetic reference observations \( \ \varvec{\mathbf {d}}^{0} \ \) are also displayed in Fig. 2. The objective of the example is to assess \( \varvec{\mathbf {\kappa }} \ \) given the observations \( \ \varvec{\mathbf {d}}^{0}, \ \) hence to assess \( \ [ \ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}^{0} \ ]. \ \) Bayesian inversion as described in Sect. 2 is applied, and the correct likelihood function displayed in Fig. 2 is used. Two alternative prior models \( \ p(\varvec{\mathbf {\kappa }}) \ \) are evaluated. The first is a traditional prior model with stationary transition matrix and stationary pdf given by

$$\begin{aligned} \varvec{\mathbf {P}}_{t-1, \ t} \ = \ \begin{bmatrix} 0.8&0.2 \\ 0.2&0.8 \end{bmatrix} \quad \varvec{\mathbf {p}}_{s}=( 0.5, \ 0.5)'. \end{aligned}$$

The second is a trend prior model with the marginal pdf profile displayed in Fig. 3b (top, third from left) and a reference transition matrix \( \ \varvec{\mathbf {P}}_r \ \) identical to the transition matrix of the traditional prior model. Note that the prior marginal profile assigns higher probability to black at the top and to gray at the bottom, and that this profile is subjectively assigned by the geoscientists. Based on this pdf profile and the reference matrix, unique model parameters for the prior non-stationary Markov chain model are defined by the approach outlined in Sect. 3.4.2. The elements of the transition matrices must be non-negative. In this example, negative values occurred in Step 1 of the algorithm, see “Appendix D”, at two depths, 79 and 80, with the values \(-\,0.008\) and \(-\,0.002\). Hence, the absolute values are very small compared to the range of probabilities [0, 1], and they occur at break points close to 0 in the prior marginal profile. In this example, with the two classes \(\{\text{ black,gray }\}\) only, Step 2 of the algorithm assigns a binary transition for the corresponding facies at these depths. Experience indicates that negative values are less likely to occur if the prior marginal profile is smoothed. Note also that in this example with two classes the transition matrix to be determined is a \((2\times 2)\)-matrix with four unknown elements. Each row of the matrix must sum to unity, hence only two unknown elements remain. The reproduction of the marginal probabilities defines two linear equality constraints, which are in fact linearly dependent and hence enforce only one active linear equality constraint. This reduces the number of unknown elements to one, which is identified by the quadratic minimization. In cases with more classes, hence larger K, the number of free parameters in the optimization will be larger.
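Under the model parameters stated above, the earlier sketches can be combined into a small end-to-end illustration. The reference profile and the geoscientists' marginal pdf profile are assumed available as arrays; the block profile and linear trend used here are only placeholders for the profiles in Figs. 2 and 3b.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 100, 2                                    # profile length, classes {black, gray} coded 0, 1
kappa_ref = np.repeat([0, 1], [50, 50])          # placeholder for the reference profile of Fig. 2
mu, sigma = np.array([1.0, -1.0]), np.array([1.5, 1.5])

# Synthetic observations d^0 drawn from the Gaussian single-site likelihood
d0 = rng.normal(mu[kappa_ref], sigma[kappa_ref])
loglik = gaussian_loglik(d0, mu, sigma)          # sketch from Sect. 3.1

# Trend prior: placeholder linear proportion profile and the reference matrix above
p_profile = np.column_stack([np.linspace(0.8, 0.2, T), np.linspace(0.2, 0.8, T)])
P_r = np.array([[0.8, 0.2], [0.2, 0.8]])
P_prior = np.stack([fit_transition_matrix(P_r, p_profile[t - 1], p_profile[t])
                    for t in range(1, T)])       # sketch from Sect. 3.4.2

# Posterior assessment, predictions and realizations (sketches from Sect. 3.3)
p1_post, P_post = posterior_markov_chain(p_profile[0], P_prior, loglik)
kappa_map = map_viterbi(p1_post, P_post)
kappa_mmap = posterior_marginals(p1_post, P_post).argmax(axis=1)
realization = sample_posterior(p1_post, P_post, rng)
```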

The solutions to the inversion problem \( \ [ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}^{0} ] \ \) are presented in Fig. 3. The upper display contains results based on the traditional prior model, while the lower display contains results from the trend prior model. In both displays the reference profile \( \ \varvec{\mathbf {\kappa }}^{r} \ \text{ and } \text{ observations } \ \varvec{\mathbf {d}}^{0} \ \) are identical, and they are presented at the upper left. Next to these, the actual prior marginal pdf profiles for the two prior models are presented in each display; the stationary and trend features are easily observable. The upper right profiles in each display are the predictions of \( \ [\varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}], \ \) that is \(\ \hat{\varvec{\mathbf {\kappa }}}_{MAP} \ \) and \(\ \hat{\varvec{\mathbf {\kappa }}}_{MMAP} \ \) respectively, and the posterior probability profile \(\ \varvec{\mathbf {p}}_{\kappa }\ ; \kappa \in \varOmega _{\kappa }, \ \) for each prior model. The impact of the trend is easily observable in the posterior probability profile of the lower display: there is a much clearer vertical black-gray drift than for the stationary model in the upper display. Note also that the MAP and MMAP predictions deviate at the very bottom of the profile. The predictions based on both prior models have a vertical black-gray drift due to conditioning on \( \ \varvec{\mathbf {d}}^{0}, \ \) but the drift is emphasized by the trend in the prior model. The lower part of each display contains six realizations from the posterior pdf for each prior model. These realizations naturally expose higher heterogeneity than the corresponding predictions. Note that the trend prior model in the lower display tends to reduce the heterogeneity at the top and bottom of the profile. The emphasized vertical black-gray drift can also be observed in the realizations based on the trend prior model.

Note lastly that the introduction of trends in the prior models should be justified by external geoscience experience with the phenomenon under study; without such experience, a stationary prior model would be the natural choice.

Fig. 3 Results from Bayesian inversion of the synthetic example. a Results from the traditional prior model. b Results from the trend prior model

4.2 Case Study

Focus is on predicting the soil classes in the off-shore sub-surface in order to plan the construction of an off-shore windmill park. Consider a vertical sub-surface profile discretized with \( \ T=1186\). The true profile at one location, \( \ \varvec{\mathbf {\kappa }}^{r} \ = \ \left( \kappa _1^{r},\ldots ,\kappa _T^{r}\right) ^{'} \ \text{ with } \ \kappa _t^{r}\in \varOmega _\kappa : \left\{ \text{ black, } \text{ red, } \text{ green, } \text{ blue }\right\} , \ \) is displayed in Fig. 4a. The classes represent various sand/clay types with different geomechanical characteristics. At the same location, the log-responses, cone resistance and sleeve friction, are collected.

These responses are normalized to obtain the actual observations \( \ \varvec{\mathbf {d}}^{0} \ = \ (\varvec{\mathbf {d}}_1^{0},\ldots , \varvec{\mathbf {d}}_T^{0}) \ \text{ with } \ \varvec{\mathbf {d}}_t^{0}\in {\mathcal {R}}^{2} \ \) containing the normalized cone resistance and normalized sleeve friction, respectively (Fig. 4). In this case study the aim is to assess the true profile based on the observed log-responses, that is, to assess \( \ \left[ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}^{0}\right] . \ \)

Fig. 4 Case study data and estimated likelihood model

Fig. 5 Results from Bayesian inversion of the case study. a Results from the traditional prior model. b Results from the trend prior model

The data in the well make it possible to present a bi-variate plot of the likelihood function \( \ p(\varvec{\mathbf {d}}_\mathrm{t}^{0}|\kappa _t), \ \) and a likelihood model is fitted,

$$\begin{aligned} p(\varvec{\mathbf {d}}_\mathrm{t}^{0}|\kappa _t) \ = \ \varphi _2\left( \varvec{\mathbf {d}}_t^{0}; \ \varvec{\mathbf {\mu }}_{\kappa _t}, \ \varvec{\mathbf {\Sigma }}_{\kappa _t} \right) \ \ \kappa _t \in \varOmega _\kappa \end{aligned}$$

which is also displayed in Fig. 4b. Note that the model is very rough and does not over-fit the observations, which actually appear multi-modal; see, for example, the green class.
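A minimal sketch of how such a bi-variate Gaussian likelihood model can be fitted from the calibration well, using class-wise sample means and covariances (function and variable names are illustrative; the actual fit in the paper may differ):

```python
import numpy as np

def fit_gaussian_likelihood(d, kappa, classes):
    """Class-wise sample mean vectors and covariance matrices for the bi-variate
    Gaussian likelihood p(d_t | kappa_t) = phi_2(d_t; mu_k, Sigma_k).
    d: (T, 2) normalized cone resistance and sleeve friction; kappa: (T,) class labels."""
    return {k: (d[kappa == k].mean(axis=0), np.cov(d[kappa == k], rowvar=False))
            for k in classes}

def bivariate_gaussian_loglik(d, params, classes):
    """Log-likelihood matrix loglik[t, j] = log phi_2(d_t; mu_k, Sigma_k) for class k = classes[j]."""
    loglik = np.empty((d.shape[0], len(classes)))
    for j, k in enumerate(classes):
        mu, Sigma = params[k]
        diff = d - mu
        inv, det = np.linalg.inv(Sigma), np.linalg.det(Sigma)
        loglik[:, j] = (-np.log(2.0 * np.pi) - 0.5 * np.log(det)
                        - 0.5 * np.einsum('ti,ij,tj->t', diff, inv, diff))
    return loglik
```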

The Bayesian inversion results based on two alternative prior models are compared. The first is a traditional Markov chain model with stationary class proportions along the profile, having stationary transition matrix and stationary proportions,

$$\begin{aligned} \varvec{\mathbf {P}}=\begin{bmatrix} 0.85&\quad 0.05&\quad 0.05&\quad 0.05 \\ 0.05&\quad 0.85&\quad 0.05&\quad 0.05\\ 0.05&\quad 0.05&\quad 0.85&\quad 0.05\\ 0.05&\quad 0.05&\quad 0.05&\quad 0.85 \end{bmatrix} \quad \varvec{\mathbf {p}}_{s} = (0.25 \ \ 0.25 \ \ 0.25 \ \ 0.25 )^{'}. \end{aligned}$$

The second is a trend prior model with depth-varying class proportions, \( \ \varvec{\mathbf {p}}_t^{0}; \ t \in {\mathcal {T}}, \ \) presented in Fig. 5b. No class has a prior probability of occurrence below 0.05 or above 0.85. The reference transition matrix \( \ \varvec{\mathbf {P}}_r \ \) is identical to the transition matrix above. These two parameters do not uniquely define a prior Markov model, but the approach outlined in Sect. 3.4.2 is used to specify the prior pdf \( \ p(\varvec{\mathbf {\kappa }}). \ \) In this study there are nine free parameters in the minimization of the objective function, and the objective function is centered close to zero, at 0.05, for twelve of the 16 elements.

In the algorithm, see “Appendix D”, \( T-1=1185\) transition \((4 \times 4)\)-matrices are identified, hence 18,960 transition probabilities. After Step 1, 666 negative values appeared; these values are displayed in the histogram in Fig. 6a. The absolute values of all these negative values are very small. In Step 2 of the algorithm these transition probabilities are assigned the value zero, and a reduced optimization is solved. The solution after this step contained no negative values.

Fig. 6 Results from the optimization procedure for the case study. a Histogram of negative values after Step 1 of the optimization procedure. b Prior transition matrices after the optimization procedure

These results are encouraging since the final solution is very close to the initial one after Step 1. Figure 6b displays the calculated transition matrices \( \ \varvec{\mathbf {P}}_{t-1, t} \ \) for \(t=200,\ 600,\ 1000\). Note that the matrix for \(t=200\) has one zero-valued element, which is assigned in Step 2 of the algorithm. The transition matrices appear as expected, with relatively large diagonal elements ensuring high spatial dependence of the classes. The identical off-diagonal elements are caused by the marginal probabilities for several classes being identical in sections of the prior specification.

Note that the two prior models are fairly general and based on expert experience about the soil distribution in the sub-surface in the area. The results from the Bayesian inversion of \( \ \left[ \varvec{\mathbf {\kappa }}|\varvec{\mathbf {d}}^{0}\right] \ \) based on the two alternative prior models are displayed in Fig. 5, in a format similar to Fig. 3 for the synthetic example. The MAP, MMAP and posterior marginal pdfs based on the traditional prior model, in Fig. 5a upper right, largely have the correct ordering of the classes compared to the true profile at the far left in the figure. The predictions appear with many more class transitions than the true profile, however. The associated posterior realizations in the bottom row of Fig. 5a are even more heterogeneous. The reason for this heterogeneity can be seen in the likelihood display, Fig. 4b. Observe that the likelihood model for the blue class, for example, is very wide due to the appearance of two blue clusters in the data, and hence it overlaps the likelihood for the red class. A more adaptive likelihood model could of course be defined, but this could cause over-fitting. Alternatively, a more informative prior transition matrix could be defined, also at the cost of possible over-fitting.

The MAP, MMAP and posterior marginal pdfs based on the trend prior model are displayed in Fig. 5b upper right, with the posterior realizations in the bottom row of the figure. These results naturally have many features in common with the results based on the traditional prior model, since both draw on the same likelihood model. Observe, however, that the predictions and posterior realizations based on the trend prior model appear with significantly less heterogeneity than the ones based on the traditional prior model. The varying proportion profile enforced through the trend prior model reduces the number of class transitions in the posterior model.

This case study is based on one calibration well only, hence the model inference and the posterior model evaluation are based on the same data. This coupling favors over-fitted models, which are here avoided by the use of rough likelihood models. A fair comparison should involve posterior model evaluation on data from a blind calibration well, which will be made in a later study.

5 Conclusions

A categorical inversion method for predicting the sequence of classes along a sub-surface profile based on an observed logging response is presented. The inversion is cast in a Bayesian framework as a hidden Markov model and assessed by the Forward–Backward and Viterbi algorithms. Two different prior model formulations are studied, one traditional stationary model and one trend prior model. The trend prior model is parametrized by a vertical class-proportion profile and a reference transition matrix. By using an efficient sequential analytical procedure, depth-dependent transition matrices reproducing the class proportions are identified. The two prior models are tested on both synthetic and real data sets.

User-specified class proportions along the vertical profile may have a large impact on the class predictions and simulations. The number of class transitions is often lower than the corresponding number for a traditional stationary model. The algorithm for estimating the depth-dependent transition matrices from the proportion profiles and the reference matrix appears efficient and reliable.

The possibility for geologists to constrain the categorical inversion by class-proportion curves is expected to have large user potential.