Definition

Forward model. Quantitative tool for simulation of observables for a given set of model parameters.

Forward problem. System of differential equations with initial and/or boundary conditions, the solution of which, the forward solution, is used for simulation of observables.

Forward solution. Solution of the forward problem.

Instrument model. Quantitative procedure for evaluation of observables from the forward solution, which models the characteristics of the instrument and the procedure for measurements.

Inverse problem. Mathematical problem, usually in the form of a matrix equation, the solution of which, the inverse solution, represents an estimate of the state vector.

Model parameters. Parameters of the forward problem contained in its differential equations and initial and/or boundary conditions.

Observables. Output parameters of the forward model, which simulate the quantities measured in the geophysical retrieval.

Sensitivities. Partial or variational derivatives of observables with respect to model parameters.

State vector. Subset of the set of model parameters, which is intended to be retrieved.

Introduction

In contrast to forward modeling, which proceeds from assumed values of model parameters of the subject of study to simulated observables, the input in inverse modeling consists of measured observables, and the output consists of retrieved model parameters. Also, in contrast to forward modeling, where modeling of observables includes two major components – forward problem (conversion of model parameters into the forward solution) and instrument model (conversion of the forward solution into simulated observables) – inverse modeling consists of only one major component, inverse problem, which converts the measured observables into estimates of the state vector, a subset of model parameters subject to the retrieval.

The value of any physical measurement – direct, like in situ measurements, or indirect, like in geophysical retrievals – has to be associated with its uncertainty, the measurement error. Thus, it is highly desirable to retrieve the values of model parameters along with their retrieval errors (aka error bars). The existing methods of solution of the inverse problem deliver the solution along with its retrieval errors, which are derived from the measurement errors and are ultimately driven by the instrument design.

Formulation and information analysis of the inverse problem

As stated in the accompanying article “Forward Modeling,” the quantitative relation between model parameters \( \mathbf{ p} \) and simulated observables \( \mathbf{ R} \) is, in general, nonlinear. The inverse problem is formulated, essentially, as inversion of this relation. Using high-level notations, we have:

$$ \mathbf{ F}(\mathbf{ p}) = \mathbf{ R} $$
(1)

It is assumed that all continuous model parameters are represented by their values on suitable grids of arguments and, correspondingly, \( \mathbf{ F} \) is a vector function of a vector argument.

The necessary premise for the formulation of the inverse problem is the availability of a computational procedure, which, for a given set of n model parameters described by a vector \( \mathbf{ p} \), provides a capability to compute a set of m observables described by a vector \( \mathbf{ R} \) along with their partial derivatives \( {{{\partial \mathbf{ R}}} \left/ {{\partial \mathbf{ p}}} \right.} \) treated as a single m × n matrix of sensitivities \( \mathbf{ K} \). Another premise, which is necessary for the error analysis of the resulting inverse problem, as well as for its solution afterward, is knowledge of measurement errors – uncertainties of measured observables described by an m − vector ε. Computation of the corresponding covariance matrix is a necessary attribute of the algorithm of solution of the inverse problems in geophysical retrievals.
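When analytic derivatives are unavailable, the sensitivity matrix can be approximated by finite differences of the forward model. The Python sketch below uses a hypothetical two-parameter stand-in forward model F and an assumed step size h; both are illustrative choices, not part of any particular retrieval system.

```python
import numpy as np

# Hypothetical stand-in forward model F(p) with m = 2 observables
# and n = 2 model parameters (illustration only).
def F(p):
    return np.array([p[0] + p[1]**2, np.sin(p[0]) * p[1]])

def sensitivities(F, p, h=1e-6):
    """Approximate K = dR/dp by one-sided finite differences."""
    R0 = F(p)
    K = np.empty((R0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = h
        K[:, j] = (F(p + dp) - R0) / h
    return R0, K

# Observables and the full m x n matrix of sensitivities at one point.
R0, K = sensitivities(F, np.array([0.5, 2.0]))
```

In practice the step h trades truncation error against round-off error; analytic or adjoint derivatives are preferred when the forward model provides them.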

Differences between measured and simulated observables are commonly referred to as residuals. The nonlinear inverse problem is solved iteratively, and at each iteration, the solution of the corresponding linearized inverse problem yields corrections to the current approximation of the set of model parameters. These corrections are commonly referred to as increments.

In high-level notations, the linearized inverse problem has the form:

$$ \mathbf{ Kx} = \mathbf{ y} $$
(2)

where \( \mathbf{ y}={{\mathbf{ R}}_{obs }} - \mathbf{ R} \) is the m − vector of residuals and \( \mathbf{ x} = {{\mathbf{ p}}_{next }} - \mathbf{ p} \) is the n − vector of increments.

Based on the sensitivity matrix \( \mathbf{ K} \), a preliminary information analysis of the inverse problem can be conducted. In general, not all elements of the measurement set are independent from each other, and, as a result, the number of linearly independent rows of \( \mathbf{ K} \) may be less than the total number of the rows. The number \( r \leq m \) of linearly independent rows is referred to as the rank of the matrix \( \mathbf{ K} \).

The rank of the matrix \( \mathbf{ K} \) can be determined using a technique called singular value decomposition (SVD). An eigenvalue problem is considered in the form:

$$\begin{array}{ll}\left( \begin{array}{ll} \mathbf{ 0} & \mathbf{ K} \cr {{{\mathbf{ K}}^T}} & \mathbf{ 0} \cr \end{array} \right)\,\,\left( \begin{array}{ll} \mathbf{ u} \cr \mathbf{ v} \cr \end{array} \right) = \,\,\lambda \left( \begin{array}{ll} \mathbf{ u} \cr \mathbf{ v} \cr \end{array} \right)\,\,\,\mathrm{ or},\,\mathrm{ in}\,\mathrm{ explicit}\,\mathrm{ form}:\\ \begin{array}{ll} {\mathbf{ K}\mathbf{ v} = \lambda \mathbf{ u}} \cr {{{\mathbf{ K}}^T}\mathbf{ u}=\lambda \mathbf{ v}} \cr \end{array} \end{array}$$
(3)

Here \( \mathbf{ u} \) is an m − vector in the space of measurement vectors and \( \mathbf{ v} \) is an n − vector in the space of state vectors. This problem can be rewritten in the form of two eigenvalue problems for matrices \( {{\mathbf{ K}}^T}\mathbf{ K}(n \times n) \) and \( \mathbf{ K}{{\mathbf{ K}}^T}(m \times m) \):

$$ {{\mathbf{ K}}^T}\mathbf{ Kv} = {\lambda^2}\mathbf{ v} $$
(4)
$$ \mathbf{ K}{{\mathbf{ K}}^T}\mathbf{ u} = {\lambda^2}\mathbf{ u} $$
(5)

which are solved by standard methods. The numbers of resulting nonzero eigenvalues of the matrices \( {{\mathbf{ K}}^T}\mathbf{ K} \) and \( \mathbf{ K}{{\mathbf{ K}}^T} \) coincide and yield the rank r of the matrix \( \mathbf{ K} \).
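The rank determination described above can be sketched in a few lines of Python (NumPy). The matrix \( \mathbf{ K} \) below is a hypothetical 4 × 3 example constructed so that two of its rows are linear combinations of the other two.

```python
import numpy as np

# Hypothetical 4 x 3 sensitivity matrix K: rows 2 and 3 are
# linear combinations of rows 0 and 1, so only 2 rows are independent.
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0],    # row 0 + row 1
              [2.0, 0.0, 4.0]])   # 2 * row 0

# The singular values are the square roots of the common nonzero
# eigenvalues of K^T K and K K^T; the rank is the number of
# singular values above a numerical tolerance.
s = np.linalg.svd(K, compute_uv=False)
tol = max(K.shape) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))
print(rank)  # 2
```

Note that in floating-point arithmetic "nonzero" must be interpreted relative to a tolerance, as above; `np.linalg.matrix_rank` applies the same convention internally.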

The number of degrees of freedom (DOF) for signal ds is another quantitative measure of quality of the measurements selected for the retrieval. It indicates how many independent quantities can be measured. In terms of the diagonal matrix \( \boldsymbol{\Lambda} \) of the singular values \( {\lambda_i} \) of the matrix \( \mathbf{ K} \) normalized to unit measurement noise, ds is, in general, not an integer number:

$$ {d_{\mathrm{ s}}} = \mathrm{ tr}\left( {{{\boldsymbol{\Lambda}}^2}{{{({{\boldsymbol{\Lambda}}^2}+{{\mathbf{ I}}_m})}}^{-1 }}} \right) = \sum\limits_{i=1}^m {\frac{{\lambda_i^2}}{{\lambda_i^2 + 1}}} $$
(6)

An indicator complementary to ds is the number of degrees of freedom for noise dn:

$$ {d_{\mathrm{ n}}} = \mathrm{ tr}\,\left( {{{{({{\boldsymbol{\Lambda}}^2} + {{\mathbf{ I}}_m})}}^{-1 }}} \right) = \sum\limits_{i=1}^m {\frac{1}{{\lambda_i^2 + 1}}} $$
(7)

Shannon information content H of measurements used in retrievals with a given matrix \( \mathbf{ K} \) can be estimated assuming Gaussian noise:

$$ H = \frac{1}{2}\,\sum\limits_{i=1}^m {\ln \left( {1 + \lambda_i^2} \right)} $$
(8)
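Equations 6–8 can be evaluated directly once the singular values are known. The sketch below assumes a hypothetical set of noise-normalized singular values \( {\lambda_i} \):

```python
import numpy as np

# Hypothetical noise-normalized singular values lambda_i of K.
lam = np.array([10.0, 3.0, 0.5, 0.05])

d_s = np.sum(lam**2 / (lam**2 + 1.0))    # Eq. 6: DOF for signal
d_n = np.sum(1.0 / (lam**2 + 1.0))       # Eq. 7: DOF for noise
H = 0.5 * np.sum(np.log1p(lam**2))       # Eq. 8: information content (nats)

# d_s + d_n equals the total number of singular values; here only the
# two singular values well above 1 contribute substantially to d_s.
```

Singular values much larger than 1 contribute almost a full degree of freedom to the signal, while those much smaller than 1 contribute almost entirely to noise.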

Solution of inverse problems

Unlike solution of forward problems, which, in principle, can be accomplished with any numerical accuracy, solution of inverse problems is associated with inherent uncertainty, for two general reasons: presence of measurement errors, and the indirect, inference-like nature of the process of retrieval, which is implemented by solution of the inverse problem.

The presence of measurement errors forces one to consider the residual vector \( \mathbf{ y} \) as a random quantity. It is customary to replace \( \mathbf{ y}\to \mathbf{ y}+\boldsymbol{\varepsilon} \), where the vector of measurement errors \( \boldsymbol{\varepsilon} \) is assumed to obey the Gaussian distribution with an average (mathematical expectation) \( \bar{\boldsymbol{\varepsilon}} =0 \) and a covariance matrix \( {{\mathbf{ C}}_{\mathbf{ y}}} \). The corresponding probability distribution function (PDF) has the form:

$$ P(\boldsymbol{\varepsilon} )\propto \exp \left( {-\frac{1}{2}\,{\boldsymbol{\varepsilon}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\boldsymbol{\varepsilon} } \right) $$
(9)

where the symbol “\( \propto \)” means proportionality. In the case of uncorrelated measurement errors with standard deviations \( {s_i}\;(i = 1,\ldots, m) \), Eq. 9 can be rewritten as

$$ P(\boldsymbol{\varepsilon} ) \propto \exp\,\left( {-\frac{1}{2}\,\sum\limits_{i=1}^m {\frac{{\varepsilon_i^2}}{{s_i^2}}} } \right) $$
(10)

With measurement errors included explicitly, the inverse problem Eq. 2 takes the form

$$ \mathbf{ Kx}=\mathbf{ y}+\boldsymbol{\varepsilon} $$
(11)

where the vector \( \mathbf{ x} \) is treated as a random quantity, which also obeys the Gaussian distribution with the PDF:

$$ P(\mathbf{ x}) \propto \exp \left( {-\frac{1}{2}\,{{{(\mathbf{ x}-\bar{\mathbf{ x}} )}}^T}\mathbf{ C}_{\mathbf{ x}}^{-1 }(\mathbf{ x}-\bar{\mathbf{ x}} )} \right) $$
(12)

The solution of the inverse problem Eq. 2 is sought as the average \( \bar{\mathbf{ x}} \) of this distribution, whereas the covariance matrix \( {{\mathbf{ C}}_{\mathbf{ x}}} \) describes uncertainty (retrieval errors) of this solution.

By definition of the covariance matrix, \( {{\mathbf{ C}}_{\mathbf{ x}}} \) is symmetric, i.e., \( {{\mathbf{ C}}_{\mathbf{ x}}} = \mathbf{ C}_{\mathbf{ x}}^T \), and \( \mathbf{ C}_{\mathbf{ x}}^{-1} = {{\left( {\mathbf{ C}_{\mathbf{ x}}^{-1 }} \right)}^T} \). Then, the PDF Eq. 12 can be transformed as

$$ P(\mathbf{ x}) \propto \exp \left( {-\frac{1}{2}\,{{\mathbf{ x}}^T}\mathbf{ Bx} + {{\mathbf{ b}}^T}\mathbf{ x}} \right) $$
(13)

where

$$ \mathbf{ B} = \mathbf{ C}_{\mathbf{ x}}^{-1 } $$
(14)

and

$$ \mathbf{ b} = \mathbf{ B}\bar{\mathbf{ x}} $$
(15)

On the other hand, the PDF \( P(\mathbf{ x}) \) can be derived from the PDF \( P(\boldsymbol{\varepsilon} ) \) Eq. 9 by the substitution \( \boldsymbol{\varepsilon} = \mathbf{ Kx} - \mathbf{ y} \). Essentially, this is an a posteriori PDF for the random quantity \( \mathbf{ x} \) with the quantity \( \mathbf{ y} \) known:

$$ P\left( {\mathbf{ x}\left| \mathbf{ y} \right.} \right) \propto \exp\,\left( {-\frac{1}{2}\,{{{(\mathbf{ Kx} - \mathbf{ y})}}^T}\,\mathbf{ C}_y^{-1 }(\mathbf{ Kx}-\mathbf{ y})} \right) $$
(16)

We have:

$$\begin{array}{ll}{{(\mathbf{ Kx} - \mathbf{ y})}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,(\mathbf{ Kx} - \mathbf{ y})\cr = {{(\mathbf{ Kx})}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ Kx} - {{(\mathbf{ Kx})}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ y} - {{\mathbf{ y}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ Kx} + {{\mathbf{ y}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y}\end{array}$$
(17)

Note that all terms in Eq. 17 are scalars, and thus are equal to themselves transposed. In particular

$$ {{\mathbf{ y}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ Kx} = {{(\mathbf{ Kx})}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y} $$

Observing also that the term \( {{\mathbf{ y}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y} \) in Eq. 17 does not depend on \( \mathbf{ x} \), we can rewrite Eq. 16 as:

$$ P\left( {\mathbf{ x}\left| \mathbf{ y} \right.} \right) \propto \exp \left( {-\frac{1}{2}\,{{\mathbf{ x}}^T}{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ Kx} + {{\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y}} \right)}^T}\mathbf{ x}} \right) $$
(18)

Comparing Eq. 18 with the general form of PDF Eq. 13, we obtain:

$$ \mathbf{ B} = {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K} $$
(19)

and

$$ \mathbf{ b} = {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y} $$
(20)

Comparison with the equality Eq. 15 yields a matrix equation for the solution \( \bar{\mathbf{ x}} \) of the inverse problem Eq. 2:

$$ {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K}\,\bar{\mathbf{ x}} = {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y} $$
(21)

along with the expression for the covariance matrix of this solution:

$$ {{\mathbf{ C}}_{\mathbf{ x}}} = {{\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ K}} \right)}^{-1 }} $$
(22)
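Equations 21 and 22 amount to a weighted least-squares solve. A minimal Python sketch, using a hypothetical sensitivity matrix and an assumed AR(1)-type correlated error covariance, might look like:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical linear problem K x = y + eps with m = 30, n = 3.
m, n = 30, 3
K = rng.standard_normal((m, n))
x_true = np.array([0.5, 1.5, -1.0])

# Assumed AR(1)-type correlated measurement-error covariance C_y.
s, rho = 0.1, 0.6
idx = np.arange(m)
C_y = s**2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Simulate correlated measurement errors and noisy observables.
eps = np.linalg.cholesky(C_y) @ rng.standard_normal(m)
y = K @ x_true + eps

# Solve Eq. 21 and evaluate the retrieval-error covariance, Eq. 22.
Cy_inv = np.linalg.inv(C_y)
B = K.T @ Cy_inv @ K
x_bar = np.linalg.solve(B, K.T @ Cy_inv @ y)
C_x = np.linalg.inv(B)
```

Forming the explicit inverse of \( {{\mathbf{ C}}_{\mathbf{ y}}} \) is acceptable for small m; for large m one would instead work with a Cholesky factorization of the covariance.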

In the particular case of non-correlated equal measurement errors, when \( {s_i} \equiv s \), the matrix equation Eq. 21 and the expression for the covariance matrix of its solution Eq. 22 reduce to:

$$ {{\mathbf{ K}}^T}\mathbf{ K}\bar{\mathbf{ x}} = {{\mathbf{ K}}^T}\mathbf{ y} $$
(23)

and

$$ {{\mathbf{ C}}_{\mathbf{ x}}} = {s^2}{{({{\mathbf{ K}}^T}\mathbf{ K})}^{-1 }} $$
(24)

If the PDF of \( \mathbf{ x} \) can be sufficiently constrained based on some additional, a priori information, then the number of measurements m can be less than the dimension n of \( \mathbf{ x} \). This is the case, e.g., in atmospheric remote sensing, when information about correlation between values of the atmospheric parameter to be retrieved can be invoked in the form of a covariance matrix

$$ {{\mathbf{ C}}_{{\mathbf{ x},a}}} = \overline{{\mathbf{ x}{{\mathbf{ x}}^T}}} $$
(25)

Then the a priori PDF of \( \mathbf{ x} \) can be represented in the form of the corresponding Gaussian distribution

$$ {P_a}(\mathbf{ x}) \propto \exp \left( {-\frac{1}{2}\,{{\mathbf{ x}}^T}\mathbf{ C}_{{\mathbf{ x},a}}^{-1}\,\mathbf{ x}} \right) $$
(26)

and the resulting constrained PDF is a product of PDFs Eqs. 18 and 26:

$$\begin{array}{ll}{P_c}\left( {\mathbf{ x}\left| \mathbf{ y} \right.} \right) = P\left( {\mathbf{ x}\left| \mathbf{ y} \right.} \right) \cdot {P_a}(\mathbf{ x})\\ \propto \exp \left( {-\frac{1}{2}\,{{\mathbf{ x}}^T}\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K} + \mathbf{ C}_{{\mathbf{ x},a}}^{-1 }} \right)\,\mathbf{ x} + {{\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ y}} \right)}^T}\mathbf{ x}} \right)\end{array}$$
(27)

Correspondingly, the matrix equation for the solution \( \bar{\mathbf{ x}} \) of the regularized inverse problem Eq. 2 and expression for the covariance matrix \( {{\mathbf{ C}}_{\mathbf{ x}}} \) of this solution, respectively, take the form:

$$ \left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ K} + \mathbf{ C}_{{\mathbf{ x},a}}^{-1 }} \right)\,\bar{\mathbf{ x}} = {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y} $$
(28)
$$ {{\mathbf{ C}}_{\mathbf{ x}}} = {{\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K} + \mathbf{ C}_{{\mathbf{ x},a}}^{-1 }} \right)\;\;}^{-1 }} $$
(29)
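A minimal sketch of the regularized solution Eqs. 28 and 29, with hypothetical \( \mathbf{ K} \), \( {{\mathbf{ C}}_{\mathbf{ y}}} \), and \( {{\mathbf{ C}}_{{\mathbf{ x},a}}} \) in an underdetermined case (m < n), could read:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical underdetermined retrieval: m = 4 measurements,
# n = 10 unknowns, constrained by an assumed a priori covariance.
m, n = 4, 10
K = rng.standard_normal((m, n))
C_y = 0.01 * np.eye(m)    # measurement-error covariance
C_xa = np.eye(n)          # a priori covariance of the state vector
y = rng.standard_normal(m)

Cy_inv = np.linalg.inv(C_y)
# Regularized normal equations, Eq. 28, and covariance, Eq. 29.
B = K.T @ Cy_inv @ K + np.linalg.inv(C_xa)
x_bar = np.linalg.solve(B, K.T @ Cy_inv @ y)
C_x = np.linalg.inv(B)
```

The a priori term makes the system matrix positive definite even though \( {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K} \) alone is singular for m < n; this is precisely the regularizing effect of the constraint.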

In practice, the number of measurements m in atmospheric remote sensing can be larger, and even much larger, than n, e.g., in retrievals from high-resolution spectral measurements with the Tropospheric Emission Spectrometer (TES) flown on the Aura spacecraft. But the rank r of the corresponding sensitivity matrix \( \mathbf{ K}(m \times n) \) is of the order of the number of atmospheric scale heights over the altitude range covered by the retrievals. Of course, the number of necessary grid points n is substantially larger than r, and, thus, invoking a priori information is necessary.

Recall that, in general, the inverse problem \( \mathbf{ F}(\mathbf{ p})=\mathbf{ R} \) is nonlinear and has to be solved using the linearized inverse problem \( \mathbf{ Kx} = \mathbf{ y} \). If the first guess of the state vector \( {{\mathbf{ p}}_0} \) is too far from the solution, then a straightforward application of the above approach at each iteration may not converge to the solution. A more robust approach, the Levenberg–Marquardt method, which has a long history of development (see Rodgers (2000) for a review), makes it possible to cope with this difficulty. At each iteration, the size of the step \( \bar{\mathbf{ x}} \) is regulated by introducing an additional matrix term proportional to a scalar γ, which is chosen based on some semiempirical rules. If no regularization is necessary, this matrix term is \( \gamma \mathbf{ D} \), where \( \mathbf{ D} \) is a diagonal scaling matrix with elements corresponding to magnitudes and dimensions of the elements of the state vector \( \mathbf{ p} \). If the vector \( \mathbf{ p} \) consists of elements of the same magnitude and dimension, then \( \mathbf{ D} \) reduces to the identity matrix \( \mathbf{ I} \). If regularization is necessary, Rodgers (2000) suggests this term in the form \( \gamma \mathbf{ C}_{{\mathbf{ x},a}}^{-1 } \). Thus, the step \( \bar{\mathbf{ x}} \) is sought as a solution of the above matrix equations modified accordingly:

$$ \left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ K} + \gamma \mathbf{ D}} \right)\,\bar{\mathbf{ x}} = {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ y} $$
(30)

or

$$ \left( {{{\mathbf{ K}}^T}{{\mathbf{ C}}_{\mathbf{ y}}^{-1 }}\mathbf{ K} + (1+\gamma )\mathbf{ C}\,_{{\mathbf{ x},a}}^{-1 }} \right)\,\bar{\mathbf{ x}} = {{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ y} $$
(31)
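The Levenberg–Marquardt iteration described above can be sketched as follows. The forward model \( \mathbf{ F}(\mathbf{ p}) = \exp(\mathbf{ Kp}) \), the step-control rule for γ, and the choices \( {{\mathbf{ C}}_{\mathbf{ y}}} = \mathbf{ I} \), \( \mathbf{ D} = \mathbf{ I} \) are illustrative assumptions, not Rodgers' specific recipe:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical nonlinear forward model F(p) = exp(K p) (elementwise),
# so the Jacobian (sensitivity matrix) is J_ij = R_i * K_ij.
def forward(K, p):
    R = np.exp(K @ p)
    return R, R[:, None] * K

m, n = 20, 2
K = 0.3 * rng.standard_normal((m, n))
p_true = np.array([0.8, -0.5])
R_obs, _ = forward(K, p_true)      # noiseless synthetic observables

p = np.zeros(n)                    # first guess
gamma = 1.0                        # damping parameter
for _ in range(50):
    R, J = forward(K, p)
    y = R_obs - R                  # residuals
    # Eq. 30 with C_y = I and D = I (illustrative choices).
    x = np.linalg.solve(J.T @ J + gamma * np.eye(n), J.T @ y)
    if np.sum((R_obs - forward(K, p + x)[0])**2) < np.sum(y**2):
        p = p + x                  # accept the step, relax damping
        gamma *= 0.5
    else:
        gamma *= 2.0               # reject the step, damp harder
```

Large γ shortens the step toward a scaled gradient direction; small γ recovers the undamped Gauss–Newton step, which converges rapidly near the solution.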

Error analysis of inverse problems

The covariance matrix \( {{\mathbf{ C}}_{\mathbf{ x}}} \) provides a straightforward way to estimate the retrieval errors (error bars) of retrieved profiles, defined as the square roots of the variances on the diagonal of the covariance matrix:

$$ {\sigma_j} = \sqrt{{\overline{{{{{({x_j} - {{\bar{x}}_j})}}^2}}}}} = \sqrt{{{{{({{\mathbf{ C}}_{\mathbf{ x}}})}}_{jj }}}},\,(j=1,\,\ldots\,n) $$
(32)

When additional information about the solution in the form of the a priori covariance matrix \( {{\mathbf{ C}}_{{\mathbf{ x},a}}} \) is used, additional, smoothing errors emerge. Whereas error bars represent uncertainties of retrieved values, the smoothing errors represent uncertainties in attributing retrieved values, e.g., retrieved atmospheric profiles, to specific reference arguments, e.g., altitude in the atmosphere. This smoothing error is represented by the n × n matrix \( \mathbf{ A} = {{{\partial \bar{\mathbf{ x}}}} \left/ {{\partial \mathbf{ x}}} \right.} \) called the averaging kernel. Using the substitution \( \mathbf{ y} = \mathbf{ Kx} \) on the right-hand side of the regularized inverse problem Eq. 28, and solving it for \( \bar{\mathbf{ x}} \), we have:

$$ \bar{\mathbf{ x}} = {{\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\,\mathbf{ K} + \mathbf{ C}_{{\mathbf{ x},a}}^{-1 }} \right)}^{-1 }}\,{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ Kx} $$

Thus the averaging kernel \( \mathbf{ A} \) has the form:

$$ \mathbf{ A} = {{\left( {{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K} + \mathbf{ C}_{{\mathbf{ x},a}}^{-1 }} \right)}^{-1 }}{{\mathbf{ K}}^T}\mathbf{ C}_{\mathbf{ y}}^{-1}\mathbf{ K} $$
(33)
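The averaging kernel of Eq. 33 is straightforward to evaluate numerically. The sketch below uses hypothetical matrices; its trace illustrates that the effective number of independently retrieved quantities cannot exceed the rank of \( \mathbf{ K} \):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical underdetermined case: m = 4 measurements, n = 10 unknowns.
m, n = 4, 10
K = rng.standard_normal((m, n))
Cy_inv = np.eye(m) / 0.01          # inverse measurement-error covariance
Cxa_inv = np.eye(n)                # inverse a priori covariance

# Averaging kernel, Eq. 33: A = (K^T Cy^-1 K + Cxa^-1)^-1 K^T Cy^-1 K.
G = K.T @ Cy_inv @ K
A = np.linalg.solve(G + Cxa_inv, G)

# trace(A) estimates the number of independently retrieved quantities;
# it cannot exceed the rank of K (here at most m = 4).
```

With the constraint removed (\( \mathbf{ C}_{{\mathbf{ x},a}}^{-1 } = 0 \)) and \( \mathbf{ K} \) of full rank n, the same expression reduces to the identity matrix, consistent with the limiting case discussed below.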

It should be emphasized that the accuracy of estimation of the smoothing error represented by the matrix \( \mathbf{ A} \) depends on the accuracy of the knowledge of a priori covariance matrix \( {{\mathbf{ C}}_{{\mathbf{ x},a}}} \). On the other hand, if the solution does not need to be constrained, then a priori PDF \( P(\mathbf{ x}) \propto 1 \) and correspondingly \( \mathbf{ C}_{{\mathbf{ x},a}}^{-1 } = 0 \), and the averaging kernel \( \mathbf{ A} \) reduces to the n × n identity matrix \( {{\mathbf{ I}}_n} \).

Finally, retrieval errors may be associated with uncertainties in parameters of the forward model, which are not being retrieved and have to be assumed based on some independent information. General analysis of associated uncertainties is given in the monograph of Clive Rodgers (2000).

Summary

Formulation of the inverse problem, analysis of its information content, choice of the method of its solution, and analysis of the resulting retrieval errors represent the main phases of development of an inversion algorithm in geophysical retrievals. The key premise is the availability of an adequate forward model, which, for a given set of model parameters, provides accurate values of simulated observables along with the matrix of sensitivities of observables with respect to elements of the state vector. This makes it possible to formulate the linearized inverse problem. If this problem is ill posed, then the algorithm of its solution involves regularization using some a priori information about this solution. The solution is sought as an average over a probability distribution, which is driven by the probability distribution of measurement errors around the measured observables and by the matrix of sensitivities provided by the forward model. The variance of the probability distribution of the solution provides the estimate of retrieval errors. If necessary, this estimate needs to be complemented by estimates of errors due to uncertainties of the model parameters outside the state vector and due to the approximate nature of the forward model.

There is a vast amount of literature describing the formulation and solution of inverse problems in geophysical retrievals, and the reader is encouraged, once the big picture is clear, to search this literature independently. The monograph by Clive Rodgers (2000) is a rich source of information on the formulation and solution of inverse problems in atmospheric remote sensing, which, as a rule, are ill posed and need regularization. Two other monographs provide valuable information on various aspects of practical implementation of algorithms of solution of inverse problems.