
1 Introduction

Geodetic adjustment theory has been developed by assuming the following model of measurements:

$$\displaystyle{ \left.\begin{array}{l} \mathbf{y} = \mathbf{f}(\boldsymbol{\beta })+\boldsymbol{\epsilon } \\ E(\mathbf{y}) = \mathbf{f}(\boldsymbol{\beta }) \\ D(\mathbf{y}) = \mathbf{W}^{-1}\sigma ^{2} \end{array} \right \}, }$$
(1)

where y is a vector of measurements, \(\mathbf{f}(\boldsymbol{\beta })\) is the mathematical or functional model which describes the physical or geometrical relationships between the measurements, \(\boldsymbol{\beta }\) is the real-valued vector of unknown parameters to be estimated, and \(\boldsymbol{\epsilon }\) is the random error vector of the measurements. Very often, we also assume that \(\boldsymbol{\epsilon }\) has zero mean and variance-covariance matrix \(\mathbf{W}^{-1}\sigma^{2}\), with W being a given weight matrix of the measurements and \(\sigma^{2}\) an unknown positive scalar (the variance of unit weight); \(E(\cdot)\) and \(D(\cdot)\) stand for the expectation and the variance-covariance matrix of the measurements, respectively. The most important feature of the adjustment model (1) is that the random errors \(\boldsymbol{\epsilon }\) are added to the functional model \(\mathbf{f}(\boldsymbol{\beta })\). In other words, the sizes or magnitudes of the random errors are independent of the true values of the measured quantities.

However, in geodetic practice, we know that this assumption does not always hold. For example, the accuracy of an EDM, VLBI and/or GPS baseline varies with the length of the baseline itself, namely,

$$\displaystyle{ \sigma _{L}^{2} = a^{2} + b^{2}L^{2}, }$$
(2)

(see e.g., Ewing and Mitchell 1970; MacDoran 1979; Seeber 2003; Petrov et al. 2010), where both a and b are constants. Physically, the constant a may be more specific to the local environment of the stations and b more to the propagation path of the light/electromagnetic waves (see e.g., Xu et al. 2013). From the statistical point of view, the accuracy formula (2) is equivalent to the following representation of random errors:

$$\displaystyle{ \epsilon _{L} =\epsilon _{a} + L\,\epsilon _{b}, }$$
(3)

where \(\epsilon_{L}\) is the random error of L, and \(\epsilon_{a}\) and \(\epsilon_{b}\) stand for random errors with zero means and variances \(a^{2}\) and \(b^{2}\), respectively, provided that \(\epsilon_{a}\) and \(\epsilon_{b}\) are statistically independent. The error representation (3) clearly indicates that the random error \(\epsilon_{L}\) contains a component proportional to the length of the measured baseline. In geodetic practice, both \(\epsilon_{a}\) and \(\epsilon_{b}\) are generally assumed to be normally distributed. Other modern space observation technologies such as SLR (see e.g. Pearlman et al. 2002; Seeber 2003) and DORIS (see e.g. Willis et al. 2010) essentially utilize electromagnetic waves for observation and go through the same physical media as VLBI and GPS; we therefore conjecture that the errors of SLR and DORIS baselines should also show multiplicative behavior, which will be a topic of future research.

Modern geodetic technology also fully utilizes coherent imaging systems such as Synthetic Aperture Radar (SAR) and Light Detection And Ranging (LiDAR). As is well known, SAR images are contaminated by speckle noise (see e.g. Goodman 1976; Ulaby et al. 1986; Oliver 1991; López-Martínez et al. 2011) and the corresponding observational equation can be represented as follows:

$$\displaystyle{ y_{ij} = s_{ij}(1 +\epsilon _{ij}), }$$
(4)

where \(y_{ij}\) is the measurement, \(s_{ij}\) the true (or noiseless) value of the signal and \(\epsilon_{ij}\) the random error with zero mean and variance \(\sigma^{2}\). Intensity measurements of SAR type are usually assumed to follow a gamma distribution. Other imaging systems can also produce Gaussian multiplicative random errors (see e.g., Tian et al. 2001). Range measurements of LiDAR have likewise been shown to be contaminated by multiplicative speckle errors (see e.g., Flamant et al. 1984; Wang and Pruitt 1992; Hill et al. 2003).

The paper is organized as follows. Section 2 will first define mixed additive and multiplicative error models. In Sects. 3 and 4, we will discuss two important classes of methods for parameter estimation in mixed additive and multiplicative error models, namely, quasi-likelihood and least squares (LS). Computational algorithms will be briefly given. Readers interested in other methods such as cumulant moment and variational methods may refer, for example, to Swami (1994) and Aubert and Aujol (2008). Actually, variational methods assume gamma distributions for intensity measurements and then add an extra smoothness or regularization term to the log-likelihood of the gamma distributions for de-speckling or de-noising multiplicative random errors, as can be seen in Xu (1999) and Aubert and Aujol (2008). We will then extend the bias-corrected LS method to the case with prior information in Sect. 5. Finally, we will finish the paper with some concluding remarks in Sect. 6.

2 Mixed Additive and Multiplicative Error Models

We will now extend the conventional Gauss-Markov adjustment model (1) to account for both additive and multiplicative errors. The new starting model of adjustment becomes

$$\displaystyle{ \left.\begin{array}{l} \mathbf{y} = \mathbf{f}(\boldsymbol{\beta }) \odot (\mathbf{1} +\boldsymbol{\epsilon } _{m}) +\boldsymbol{\epsilon } _{a} \\ E(\mathbf{y}) = \mathbf{f}(\boldsymbol{\beta }) \\ E(\boldsymbol{\epsilon }_{m}) = \mathbf{0},\hspace{2.84526pt} D(\boldsymbol{\epsilon }_{m}) =\boldsymbol{ \Sigma }_{m} \\ E(\boldsymbol{\epsilon }_{a}) = \mathbf{0},\hspace{2.84526pt} D(\boldsymbol{\epsilon }_{a}) =\boldsymbol{ \Sigma }_{a} \end{array} \right \}, }$$
(5)

where y and \(\mathbf{f}(\boldsymbol{\beta })\) have been defined in (1), ⊙ stands for the Hadamard product of matrices and/or vectors, 1 for the vector with all its elements equal to unity, and both random error vectors \(\boldsymbol{\epsilon }_{m}\) and \(\boldsymbol{\epsilon }_{a}\) have zero means and variance-covariance matrices \(\boldsymbol{\Sigma }_{m}\) and \(\boldsymbol{\Sigma }_{a}\), respectively. Since the random vector \(\boldsymbol{\epsilon }_{m}\) multiplies the true values of the measurements \(\mathbf{f}(\boldsymbol{\beta })\), \(\boldsymbol{\epsilon }_{m}\) is naturally called the vector of multiplicative errors. As in the case of the additive error model (1), \(\boldsymbol{\epsilon }_{a}\) in (5) is additive and is accordingly called the vector of additive errors. To illustrate additive and multiplicative random errors, we simulate the random errors of baselines, with the constants a and b set to 0.05 m and 10 ppm, respectively. The generated errors are plotted against baseline length in Fig. 1. It is obvious from the simulated random errors in the upper panel of Fig. 1 that the additive random errors scatter uniformly over the different lengths of baselines, whereas the multiplicative errors in the middle panel of the same figure show a clear fan shape, with the amplitudes of the errors increasing with the lengths of the baselines.

Fig. 1 Illustration of the additive and multiplicative random errors of baselines: the upper panel – the additive errors; the middle panel – the multiplicative errors; and the lower panel – the mixed additive and multiplicative errors
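As a minimal sketch of the simulation behind Fig. 1, one might generate the three error sets as follows, assuming normally distributed \(\epsilon_{a}\) and \(\epsilon_{b}\) as stated above; the baseline lengths, sample size and random seed are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed for repeatability
a, b = 0.05, 10e-6                        # a in metres, b = 10 ppm
L = rng.uniform(1e3, 50e3, 500)           # hypothetical baseline lengths [m]

eps_a = rng.normal(0.0, a, L.size)        # additive errors (upper panel)
eps_m = L * rng.normal(0.0, b, L.size)    # multiplicative errors (middle panel)
eps_L = eps_a + eps_m                     # mixed errors per (3) (lower panel)
```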

Assuming that \(\boldsymbol{\epsilon }_{m}\) and \(\boldsymbol{\epsilon }_{a}\) are statistically independent and applying the error propagation law to each of the measurements y, we have

$$\displaystyle{ \sigma _{y_{i}}^{2} = f_{ i}^{2}(\boldsymbol{\beta })\sigma _{ mi}^{2} +\sigma _{ ai}^{2}, }$$
(6)

where \(\sigma _{y_{i}}^{2}\) is the variance of the ith measurement \(y_{i}\), and \(\sigma _{mi}^{2}\) and \(\sigma _{ai}^{2}\) are the ith diagonal elements of \(\boldsymbol{\Sigma }_{m}\) and \(\boldsymbol{\Sigma }_{a}\), respectively. It is obvious from (6) that the larger the true value \(f_{i}(\boldsymbol{\beta })\), the noisier the corresponding measurement \(y_{i}\). When applying the same error propagation law to the measurement vector y, we obtain the variance-covariance matrix of the measurements y as follows:

$$\displaystyle{ \boldsymbol{\Sigma }_{y}(\boldsymbol{\beta }) = \mathbf{D}_{f\beta }\boldsymbol{\Sigma }_{m}\mathbf{D}_{f\beta } +\boldsymbol{ \Sigma }_{a}, }$$
(7)

where \(\mathbf{D}_{f\beta }\) is a diagonal matrix with its ith diagonal element equal to \(f_{i}(\boldsymbol{\beta })\). The elements of \(\boldsymbol{\Sigma }_{y}(\boldsymbol{\beta })\) are obviously functions of the parameters \(\boldsymbol{\beta }\). For simplicity, we will use \(\boldsymbol{\Sigma }_{y}\) to denote the variance-covariance matrix of y. If necessary, the correlation between \(\boldsymbol{\epsilon }_{m}\) and \(\boldsymbol{\epsilon }_{a}\) can also readily be taken into account, which will, however, not be discussed in this paper.
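For later reference, the construction (7) can be sketched as a small helper; the function name and interface are ours, not from the paper:

```python
import numpy as np

def sigma_y(f_beta, Sigma_m, Sigma_a):
    """Variance-covariance matrix (7) of y for given true values f(beta)."""
    D = np.diag(f_beta)                # D_{f beta}: diagonal of the f_i(beta)
    return D @ Sigma_m @ D + Sigma_a
```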

If \(\mathbf{f}(\boldsymbol{\beta })\) is linear, then the error model (5) becomes

$$\displaystyle{ \left.\begin{array}{l} \mathbf{y} = (\mathbf{A}\boldsymbol{\beta }) \odot (\mathbf{1} +\boldsymbol{\epsilon } _{m}) +\boldsymbol{\epsilon } _{a} \\ E(\mathbf{y}) = \mathbf{A}\boldsymbol{\beta } \\ E(\boldsymbol{\epsilon }_{m}) = \mathbf{0},\hspace{2.84526pt} D(\boldsymbol{\epsilon }_{m}) =\boldsymbol{ \Sigma }_{m} \\ E(\boldsymbol{\epsilon }_{a}) = \mathbf{0},\hspace{2.84526pt} D(\boldsymbol{\epsilon }_{a}) =\boldsymbol{ \Sigma }_{a} \end{array} \right \}, }$$
(8)

where A is a given design matrix, which will be assumed to be of full rank. The model (5) will be called the mixed additive and multiplicative error model in the remainder of this paper. In accordance with (8), we can rewrite \(\mathbf{D}_{f\beta }\) as \(\mathbf{D}_{a\beta }\), whose ith diagonal element is equal to \(\mathbf{a}_{i}\boldsymbol{\beta }\), where \(\mathbf{a}_{i}\) is the ith row of the matrix A.

3 The Quasi-Likelihood Method

The quasi-likelihood method was first proposed by Wedderburn (1974). It has since become a standard statistical method for estimating the parameters in models of type (4) and has been widely applied in many areas of science and engineering. Actually, if the signal \(s_{ij}\) can be represented linearly by a number of unknown parameters \(\boldsymbol{\beta }\), the multiplicative error model (4) is better known in statistics as a generalized linear model and is well documented in statistical books (see e.g. McCullagh and Nelder 1989; Heyde 1997, chapter 5.3).

Wedderburn (1974) started with a set of independent measurements \(y_{i}\) (i = 1, 2, …, n), with expectations \(\overline{y}_{i}\) and variances \(\sigma _{i}^{2}(\overline{y}_{i})\), and then defined the quasi-likelihood function \(\text{QLF}(y_{i},\overline{y}_{i})\) as follows:

$$\displaystyle{ \frac{\partial \text{QLF}(y_{i},\overline{y}_{i})} {\partial \overline{y}_{i}} = \frac{y_{i} -\overline{y}_{i}} {\sigma _{i}^{2}(\overline{y}_{i})}, }$$
(9)

where the variance of each \(y_{i}\) is assumed to be a function of \(\overline{y}_{i}\). By setting the expression (9) to zero over all the measurements \(y_{i}\), Wedderburn (1974) was then able to estimate the unknown parameters from the measurements y. If \(\overline{y}_{i}\) can further be represented linearly by a number of unknown parameters \(\boldsymbol{\beta }\) and if the measurements y are assumed to be correlated, then (9) can be rewritten as follows:

$$\displaystyle{ \frac{\partial \text{QLF}(\boldsymbol{\beta })} {\partial \boldsymbol{\beta }} = \mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}(\boldsymbol{\beta })(\mathbf{y} -\mathbf{A}\boldsymbol{\beta }), }$$
(10)

where \(\boldsymbol{\Sigma }_{y}(\boldsymbol{\beta })\) is the variance-covariance matrix of y, whose elements are all functions of the unknown parameters \(\boldsymbol{\beta }\). The quasi-likelihood function can be proved to be equal to the log-likelihood function if the distribution of \(y_{i}\) belongs to the exponential family. In general, however, quasi-likelihood is different from maximum likelihood.

Although Wedderburn (1974) defined the quasi-likelihood function \(\text{QLF}(y_{i},\overline{y}_{i})\) through the differential equation (9), \(\text{QLF}(y_{i},\overline{y}_{i})\) itself is not required for parameter estimation. Actually, all that we need for parameter estimation is the expression on the right hand side of (9), which is completely defined by \(y_{i}\), its expectation \(\overline{y}_{i}\) and its variance \(\sigma _{i}^{2}(\overline{y}_{i})\). When the quasi-likelihood method is applied to the mixed additive and multiplicative error model (8), we have the following system of normal equations:

$$\displaystyle{ \mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}(\hat{\boldsymbol{\beta }}_{ ql})(\mathbf{y} -\mathbf{A}\hat{\boldsymbol{\beta }}_{ql}) = \mathbf{0}, }$$
(11)

where \(\hat{\boldsymbol{\beta }}_{ql}\) stands for the quasi-likelihood estimate of \(\boldsymbol{\beta }\). Obviously, the system of normal equations (11) is nonlinear and must generally be solved by numerical methods; very often, the Gauss-Newton method can be used to find the solution to (11). The quasi-likelihood estimator \(\hat{\boldsymbol{\beta }}_{ql}\) is asymptotically unbiased (see e.g., McCullagh 1983) and its variance-covariance matrix, denoted by \(D(\hat{\boldsymbol{\beta }}_{ql})\), is then given approximately by

$$\displaystyle{ D(\hat{\boldsymbol{\beta }}_{ql}) = (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A})^{-1}. }$$
(12)

It is seen from (11) that the equation system (11) completely defines the quasi-likelihood estimator \(\hat{\boldsymbol{\beta }}_{ql}\), no matter whether the quasi-likelihood function \(\text{QLF}(\mathbf{y},\boldsymbol{\beta })\) can be solved for through a differential equation of type (9) or not. The system of normal equations clearly indicates that an estimator can simply be constructed through a system of equations. As a result, systems of equations like (11) have been called generalized estimating equations (see e.g., Crowder 1995; Desmond 1997; Heyde 1997; Kukusha et al. 2010; Fitzmaurice 1995).
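As a sketch of how the generalized estimating equations (11) might be solved in practice for the linear model (8), the following fixed-point iteration re-evaluates \(\boldsymbol{\Sigma }_{y}\) at the current estimate and then solves the resulting weighted normal equations; the tolerance, iteration limit and convergence test are our own choices:

```python
import numpy as np

def quasi_likelihood(A, y, Sigma_m, Sigma_a, beta0, tol=1e-10, maxit=100):
    """Solve (11) by fixed-point iteration; returns the estimate and (12)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(maxit):
        D = np.diag(A @ beta)                          # D_{a beta} at beta_k
        W = np.linalg.inv(D @ Sigma_m @ D + Sigma_a)   # Sigma_y^{-1}(beta_k)
        beta_new = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new, np.linalg.inv(A.T @ W @ A)        # estimate, accuracy (12)
```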

4 Least-Squares-Based Methods

Although quasi-likelihood has become a standard method for parameter estimation in multiplicative error models, its associated quasi-likelihood function can hardly be derived for a general nonlinear function \(\mathbf{f}(\boldsymbol{\beta })\). Even if such a function can indeed be found by solving the corresponding differential equations, it may not be connected with any physically meaningful distribution function. As a result, Xu and Shimada (2000) alternatively proposed LS-based methods to estimate the unknown parameters in the mixed additive and multiplicative error model (8). In this section, we will briefly discuss the ordinary LS, the weighted LS and the bias-corrected weighted LS methods. Readers interested in the error analysis of adjusted measurements, the corrections of measurements and/or the estimation of the variance of unit weight in multiplicative error models are referred to Shi et al. (2014).

4.1 The Ordinary LS Method

When applying the ordinary LS method to estimate the unknown parameters \(\boldsymbol{\beta }\) in the model (8), we have the following optimization objective function:

$$\displaystyle{ \text{min:}\hspace{5.69054pt} F_{1}(\boldsymbol{\beta }) = (\mathbf{y} -\mathbf{A}\boldsymbol{\beta })^{T}(\mathbf{y} -\mathbf{A}\boldsymbol{\beta }). }$$
(13)

The solution to (13) is the ordinary LS estimate of \(\boldsymbol{\beta }\), which is denoted by \(\hat{\boldsymbol{\beta }}_{LS}\) and given by

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{LS} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{y}. }$$
(14)

The variance-covariance matrix of \(\hat{\boldsymbol{\beta }}_{LS}\) is then given as follows:

$$\displaystyle{ D(\hat{\boldsymbol{\beta }}_{LS}) = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}\mathbf{A}(\mathbf{A}^{T}\mathbf{A})^{-1}, }$$
(15)

where \(\boldsymbol{\Sigma }_{y}\) is the variance-covariance matrix of the measurements y.

If we assume that the signal within a small area of pixels in a coherent image with multiplicative noise is identical, namely, that \(s_{ij}\) remains unchanged in such a small area, then all the corresponding measurements \(y_{ij}\) have the same variance. In other words, the weights of the measurements \(y_{ij}\) are all equal to each other. As a result, the estimate of \(s_{ij}\) is simply equal to the mean value of all \(y_{ij}\) in the area. This is exactly the local mean filter for de-noising images contaminated by multiplicative noise.
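A minimal sketch of this local mean filter, with a square window whose half-width is our own assumption, could read:

```python
import numpy as np

def local_mean_filter(img, half=1):
    """De-speckle an image by averaging over (2*half+1)^2 pixel windows."""
    out = np.empty(img.shape, dtype=float)
    n, m = img.shape
    for i in range(n):
        for j in range(m):
            win = img[max(i - half, 0):i + half + 1,
                      max(j - half, 0):j + half + 1]
            out[i, j] = win.mean()     # ordinary LS estimate of s_ij locally
    return out
```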

4.2 The Weighted LS Method

When applying the weighted LS method to the model (8), we have the following minimization problem:

$$\displaystyle{ \text{min:}\hspace{5.69054pt} F_{2}(\boldsymbol{\beta }) = (\mathbf{y} -\mathbf{A}\boldsymbol{\beta })^{T}\boldsymbol{\Sigma }_{ y}^{-1}(\mathbf{y} -\mathbf{A}\boldsymbol{\beta }). }$$
(16)

To derive the weighted LS estimate of \(\boldsymbol{\beta }\), we can differentiate \(F_{2}(\boldsymbol{\beta })\) of (16) with respect to \(\boldsymbol{\beta }\) and set the derivative to zero. After some lengthy derivations, we finally obtain the system of normal equations as follows:

$$\displaystyle{ (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{A})\hat{\boldsymbol{\beta }} -\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{y} -\mathbf{G}_{ 1}(\mathbf{A}\hat{\boldsymbol{\beta }} -\mathbf{y}) = \mathbf{0}, }$$
(17)

where

$$\displaystyle{\mathbf{G}_{1} = \left [\begin{array}{c} (\mathbf{A}\hat{\boldsymbol{\beta }} -\mathbf{y})^{T}\hat{\boldsymbol{\Sigma }}_{y}^{-1}\mathbf{D}_{ae_{1}}\boldsymbol{\Sigma }_{m}\hat{\mathbf{D}}_{a\beta }\hat{\boldsymbol{\Sigma }}_{y}^{-1} \\ (\mathbf{A}\hat{\boldsymbol{\beta }} -\mathbf{y})^{T}\hat{\boldsymbol{\Sigma }}_{y}^{-1}\mathbf{D}_{ae_{2}}\boldsymbol{\Sigma }_{m}\hat{\mathbf{D}}_{a\beta }\hat{\boldsymbol{\Sigma }}_{y}^{-1}\\ \vdots \\ (\mathbf{A}\hat{\boldsymbol{\beta }} -\mathbf{y})^{T}\hat{\boldsymbol{\Sigma }}_{y}^{-1}\mathbf{D}_{ae_{t}}\boldsymbol{\Sigma }_{m}\hat{\mathbf{D}}_{a\beta }\hat{\boldsymbol{\Sigma }}_{y}^{-1} \end{array} \right ],}$$

\(\mathbf{D}_{ae_{i}}\) is a diagonal matrix with its kth diagonal element equal to \(\mathbf{a}_{k}\mathbf{e}_{i}\), \(\mathbf{e}_{i}\) is the ith natural basis vector of dimension t, and \(\hat{\mathbf{D}}_{a\beta }\) is the estimate of \(\mathbf{D}_{a\beta }\) obtained by replacing \(\boldsymbol{\beta }\) with its weighted LS estimate \(\hat{\boldsymbol{\beta }}\). Following Xu et al. (2013), we can solve for the weighted LS estimate of \(\boldsymbol{\beta }\) through the following iterative procedure:

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{k+1} = (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ yk}^{-1}\mathbf{A})^{-1}\{\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ yk}^{-1}\mathbf{y} + \mathbf{G}_{ 1k}(\mathbf{A}\hat{\boldsymbol{\beta }}_{k} -\mathbf{y})\},\hspace{2.84526pt} k = 0,1,\ldots }$$
(18)

where \(\hat{\boldsymbol{\Sigma }}_{yk}\) and \(\mathbf{G}_{1k}\) stand for \(\hat{\boldsymbol{\Sigma }}_{y}\) and \(\mathbf{G}_{1}\) evaluated at the point \(\hat{\boldsymbol{\beta }}_{k}\).
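The iteration (18) might be sketched as follows. Since \(\mathbf{D}_{ae_{i}}\) is simply the diagonal matrix formed from the ith column of A, each row of \(\mathbf{G}_{1}\) can be built by elementwise scaling; the starting value, tolerance and iteration limit are our own choices:

```python
import numpy as np

def weighted_ls(A, y, Sigma_m, Sigma_a, beta0, tol=1e-10, maxit=100):
    """Weighted LS estimate via the iteration (18) with G_1 from (17)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(maxit):
        D = np.diag(A @ beta)                            # \hat{D}_{a beta}
        Sy_inv = np.linalg.inv(D @ Sigma_m @ D + Sigma_a)
        r = (A @ beta - y) @ Sy_inv                      # (A beta - y)^T Sy^{-1}
        core = Sigma_m @ D @ Sy_inv
        # i-th row of G_1: r D_{ae_i} Sigma_m D_{a beta} Sy^{-1}
        G1 = np.vstack([(r * A[:, i]) @ core for i in range(A.shape[1])])
        rhs = A.T @ Sy_inv @ y + G1 @ (A @ beta - y)
        beta_new = np.linalg.solve(A.T @ Sy_inv @ A, rhs)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new
```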

It is obvious from (17) that the weighted LS estimate \(\hat{\boldsymbol{\beta }}\) is nonlinear and is expected to be biased. Xu et al. (2013) derived the bias of \(\hat{\boldsymbol{\beta }}\) in the mixed additive and multiplicative error model (8), which is denoted by \(\mathbf{b}(\hat{\boldsymbol{\beta }})\) and given as follows:

$$\displaystyle{ \mathbf{b}(\hat{\boldsymbol{\beta }}) = E(\mathbf{b}_{\beta }) = \mathbf{N}^{-1}\mathbf{g}_{ 2}, }$$
(19)

where \(\mathbf{N} = \mathbf{A}^{T}\boldsymbol{\Sigma }_{y}^{-1}\mathbf{A}\) and \(\mathbf{g}_{2}\) is given by

$$\displaystyle{\mathbf{g}_{2} = \left [\begin{array}{c} \text{tr}\{\mathbf{D}_{ae_{1}}\boldsymbol{\Sigma }_{m}\mathbf{D}_{a\beta }\boldsymbol{\Sigma }_{y}^{-1}\} \\ \text{tr}\{\mathbf{D}_{ae_{2}}\boldsymbol{\Sigma }_{m}\mathbf{D}_{a\beta }\boldsymbol{\Sigma }_{y}^{-1}\}\\ \vdots \\ \text{tr}\{\mathbf{D}_{ae_{t}}\boldsymbol{\Sigma }_{m}\mathbf{D}_{a\beta }\boldsymbol{\Sigma }_{y}^{-1}\} \end{array} \right ].}$$

By limiting themselves to the terms of \(\hat{\boldsymbol{\beta }}\) linear in the random errors \(\boldsymbol{\epsilon }_{m}\) and \(\boldsymbol{\epsilon }_{a}\), Xu et al. (2013) also derived the first order approximation of the variance-covariance matrix of the weighted LS estimate \(\hat{\boldsymbol{\beta }}\), which is denoted by \(D(\hat{\boldsymbol{\beta }})\) and given as follows:

$$\displaystyle{ D(\hat{\boldsymbol{\beta }}) = (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A})^{-1}. }$$
(20)

After taking the bias (19) into account, we obtain the approximate mean squared error (MSE) matrix of \(\hat{\boldsymbol{\beta }}\) as follows:

$$\displaystyle\begin{array}{rcl} \mathbf{M}(\hat{\boldsymbol{\beta }})& =& D(\hat{\boldsymbol{\beta }}) + \mathbf{b}(\hat{\boldsymbol{\beta }})[\mathbf{b}(\hat{\boldsymbol{\beta }})]^{T} \\ & =& (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A})^{-1} + (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A})^{-1}\mathbf{g}_{ 2}\mathbf{g}_{2}^{T}(\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A})^{-1},{}\end{array}$$
(21)

where \(\mathbf{M}(\hat{\boldsymbol{\beta }})\) stands for the MSE matrix of \(\hat{\boldsymbol{\beta }}\).
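A sketch evaluating the bias (19) and the MSE matrix (21) at a converged estimate might read as follows; it exploits the fact that \(\text{tr}\{\mathbf{D}_{ae_{i}}\mathbf{M}\}\) equals the sum of the ith column of A weighted elementwise by the diagonal of M:

```python
import numpy as np

def bias_and_mse(A, beta, Sigma_m, Sigma_a):
    """Bias (19) and MSE matrix (21) of the weighted LS estimate at beta."""
    D = np.diag(A @ beta)
    Sy_inv = np.linalg.inv(D @ Sigma_m @ D + Sigma_a)
    core = Sigma_m @ D @ Sy_inv                     # Sigma_m D_{a beta} Sy^{-1}
    g2 = np.array([np.sum(A[:, i] * np.diag(core))  # tr{D_{ae_i} core}
                   for i in range(A.shape[1])])
    N_inv = np.linalg.inv(A.T @ Sy_inv @ A)
    bias = N_inv @ g2                               # b(beta_hat) of (19)
    return bias, N_inv + np.outer(bias, bias)       # MSE matrix of (21)
```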

4.3 The Bias-Corrected Weighted LS Method

Bias analysis in Xu and Shimada (2000) and Xu et al. (2013) clearly indicates that the bias of the weighted LS estimate is solely caused by the non-zero term involving the derivatives of the variance-covariance matrix \(\boldsymbol{\Sigma }_{y}\) with respect to \(\boldsymbol{\beta }\). Thus both works propose deleting the corresponding term, namely the third term on the left hand side of the normal equations (17), to remove the bias from the weighted LS estimate \(\hat{\boldsymbol{\beta }}\). As a result, they are able to construct the bias-corrected weighted LS estimate of \(\boldsymbol{\beta }\).

When this idea is applied to the mixed additive and multiplicative error model (8), one obtains the bias-corrected weighted LS estimate of \(\boldsymbol{\beta }\), which is denoted by \(\hat{\boldsymbol{\beta }}_{bc}\) and solved through the following system of normal equations:

$$\displaystyle{ (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{A})\hat{\boldsymbol{\beta }}_{ bc} -\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{y} = \mathbf{0}, }$$
(22)

Equivalently, \(\hat{\boldsymbol{\beta }}_{bc}\) can formally be rewritten as follows:

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{bc} = (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{A})^{-1}\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{y}, }$$
(23)

which is unbiased up to the second order approximation (Xu et al. 2013). The variance-covariance matrix of \(\hat{\boldsymbol{\beta }}_{bc}\) is denoted by \(D(\hat{\boldsymbol{\beta }}_{bc})\) and given by

$$\displaystyle{ D(\hat{\boldsymbol{\beta }}_{bc}) = (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A})^{-1}. }$$
(24)

Because the matrix \(\hat{\boldsymbol{\Sigma }}_{y}\) depends on \(\hat{\boldsymbol{\beta }}_{bc}\), (23) is actually a nonlinear system of equations and can, in general, be solved numerically. If the Gauss-Newton method is used to solve for the bias-corrected weighted LS estimate, we have the following iterative formula:

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{bc}^{\mathit{k}+1} =\hat{\boldsymbol{\beta }}_{ bc}^{\mathit{k}} - (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ yk}^{-1}\mathbf{A})^{-1}\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ yk}^{-1}(\mathbf{A}\hat{\boldsymbol{\beta }}_{ bc}^{\mathit{k}} -\mathbf{y}),\hspace{2.84526pt} k = 0,1,\ldots }$$
(25)

(see also McCullagh 1983; McCullagh and Nelder 1989; Dennis and Schnabel 1996; Xu et al. 2013).
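A sketch of the iteration (25) follows; note that each Gauss-Newton update reduces algebraically to the same fixed point used in the quasi-likelihood sketch of Sect. 3, reflecting the fact that (22) is of the same form as (11) for the linear model (8):

```python
import numpy as np

def bias_corrected_wls(A, y, Sigma_m, Sigma_a, beta0, tol=1e-10, maxit=100):
    """Bias-corrected weighted LS estimate via the Gauss-Newton update (25)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(maxit):
        D = np.diag(A @ beta)
        Sy_inv = np.linalg.inv(D @ Sigma_m @ D + Sigma_a)
        N = A.T @ Sy_inv @ A
        step = np.linalg.solve(N, A.T @ Sy_inv @ (A @ beta - y))
        beta = beta - step                        # the update (25)
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(N)                 # estimate and accuracy (24)
```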

We would like to note that, given some approximate values, say \(\boldsymbol{\beta }_{0}\), \(\boldsymbol{\Sigma }_{m0}\) and \(\boldsymbol{\Sigma }_{a0}\), we can replace \(\hat{\boldsymbol{\Sigma }}_{y}\) of (22) with \(\boldsymbol{\Sigma }_{y0}\) (computed at \(\boldsymbol{\beta }_{0}\), \(\boldsymbol{\Sigma }_{m0}\) and \(\boldsymbol{\Sigma }_{a0}\)), which is exactly the conventional practice in the adjustment of geodetic networks such as EDM, VLBI and GPS baseline networks. In other words, the conventional weighted LS adjustment of baseline networks can be interpreted as a special case of the bias-corrected weighted LS method with given approximate values. Nevertheless, the effectiveness of using approximate values depends on how far these approximate values deviate from the true values, as also pointed out and demonstrated in Xu et al. (2013).

5 Mixed Additive and Multiplicative Random Error Models with Prior Information

In this section, we will extend parameter estimation in mixed additive and multiplicative random error models to the case with prior information. Prior information will only be assumed in the form of the first two moments of the unknown parameters \(\boldsymbol{\beta }\), i.e. its prior mean \(\boldsymbol{\mu }\) and its prior variance-covariance matrix. Bearing the concept of additive and multiplicative random errors in mind, we will accordingly assume two types of prior variance-covariance matrices. As in the case of Gauss-Markov models with additive random errors, the first type of prior variance-covariance matrix is assumed to be independent of \(\boldsymbol{\beta }\) and is symbolically denoted by \(\boldsymbol{\Sigma }_{0}\). If prior information on \(\boldsymbol{\beta }\) is obtained from measurements contaminated by multiplicative errors other than the measurements y in the mixed additive and multiplicative error model (8), then the prior variance-covariance matrix will surely depend on \(\boldsymbol{\beta }\), as can readily be seen in (24), for example. Thus, the second type of prior variance-covariance matrix is assumed to be a function of \(\boldsymbol{\beta }\) and is denoted by \(\boldsymbol{\Sigma }_{\beta }\). Of course, prior information can also be presented in the form of prior distributions, in which case one can use Bayesian inference to estimate the unknown parameters. For more information, the reader is referred to Xu (1999).

If the model (8) is combined with the first type of prior variance-covariance matrices, namely, \(\boldsymbol{\Sigma }_{0}\), then the corresponding generalized (weighted) LS objective function will become

$$\displaystyle{ \text{min:}\hspace{5.69054pt} F_{3}(\boldsymbol{\beta }) = (\mathbf{y} -\mathbf{A}\boldsymbol{\beta })^{T}\boldsymbol{\Sigma }_{ y}^{-1}(\mathbf{y} -\mathbf{A}\boldsymbol{\beta }) + (\boldsymbol{\beta }-\boldsymbol{\mu })^{T}\boldsymbol{\Sigma }_{ 0}^{-1}(\boldsymbol{\beta }-\boldsymbol{\mu }). }$$
(26)

According to the bias analysis in Xu and Shimada (2000) and Xu et al. (2013), we know that \(\boldsymbol{\Sigma }_{y}\) will create a bias in the solution to (26). Since the prior variance-covariance matrix \(\boldsymbol{\Sigma }_{0}\) is independent of \(\boldsymbol{\beta }\), it will not contribute extra terms to the bias of the solution. By following the same rationale as in Sect. 4.3, we can ignore the dependence of \(\boldsymbol{\Sigma }_{y}\) on \(\boldsymbol{\beta }\) as if it were independent of \(\boldsymbol{\beta }\) and, as a result, readily construct the bias-corrected estimator of \(\boldsymbol{\beta }\) with prior information, denoted by \(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p0}}\), as follows:

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{bc}^{\mathit{p0}} = (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{A} +\boldsymbol{ \Sigma }_{ 0}^{-1})^{-1}(\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{y} +\boldsymbol{ \Sigma }_{ 0}^{-1}\boldsymbol{\mu }). }$$
(27)

The first order accuracy of \(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p0}}\) is then given by

$$\displaystyle{ D(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p0}}) = (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A} +\boldsymbol{ \Sigma }_{ 0}^{-1})^{-1}. }$$
(28)
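A sketch of (27) and (28) could look as follows; since \(\hat{\boldsymbol{\Sigma }}_{y}\) depends on the estimate, the solution is again iterated, and using the prior mean as the starting value is our own choice:

```python
import numpy as np

def bc_wls_prior(A, y, Sigma_m, Sigma_a, mu, Sigma_0, tol=1e-10, maxit=100):
    """Bias-corrected estimate (27) with parameter-free prior (mu, Sigma_0)."""
    beta = np.asarray(mu, dtype=float)            # start at the prior mean
    S0_inv = np.linalg.inv(Sigma_0)
    for _ in range(maxit):
        D = np.diag(A @ beta)
        Sy_inv = np.linalg.inv(D @ Sigma_m @ D + Sigma_a)
        N = A.T @ Sy_inv @ A + S0_inv
        beta_new = np.linalg.solve(N, A.T @ Sy_inv @ y + S0_inv @ mu)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new, np.linalg.inv(N)             # estimate (27), accuracy (28)
```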

If the second type of prior information is combined with the measurements from the model (8), we should then have the following optimization problem:

$$\displaystyle{ \text{min:}\hspace{5.69054pt} F_{4}(\boldsymbol{\beta }) = (\mathbf{y} -\mathbf{A}\boldsymbol{\beta })^{T}\boldsymbol{\Sigma }_{ y}^{-1}(\mathbf{y} -\mathbf{A}\boldsymbol{\beta }) + (\boldsymbol{\beta }-\boldsymbol{\mu })^{T}\boldsymbol{\Sigma }_{\beta }^{-1}(\boldsymbol{\beta }-\boldsymbol{\mu }). }$$
(29)

Obviously, both \(\boldsymbol{\Sigma }_{y}\) and \(\boldsymbol{\Sigma }_{\beta }\) will now directly contribute terms to the bias of the optimal solution to the optimization problem (29), according to Xu and Shimada (2000) and Xu et al. (2013). In a similar manner to (27), we can construct the bias-corrected estimator of \(\boldsymbol{\beta }\) with prior mean \(\boldsymbol{\mu }\) and prior variance-covariance matrix \(\boldsymbol{\Sigma }_{\beta }\), which is denoted by \(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p1}}\) and given as follows:

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{bc}^{\mathit{p1}} = (\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{A} +\hat{\boldsymbol{ \Sigma }}_{\beta }^{-1})^{-1}(\mathbf{A}^{T}\hat{\boldsymbol{\Sigma }}_{ y}^{-1}\mathbf{y} +\hat{\boldsymbol{ \Sigma }}_{\beta }^{-1}\boldsymbol{\mu }), }$$
(30)

which is unbiased up to the second order approximation, and its first order accuracy is given by

$$\displaystyle{ D(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p1}}) = (\mathbf{A}^{T}\boldsymbol{\Sigma }_{ y}^{-1}\mathbf{A} +\boldsymbol{ \Sigma }_{\beta }^{-1})^{-1}. }$$
(31)

As in the case of the bias-corrected weighted LS estimation, the bias-corrected stochastic inference (or LS collocation) with prior information in mixed multiplicative and additive error models is essentially of the same form as in the case of purely additive error models, as correctly pointed out by one of the reviewers. However, unlike stochastic inference in additive error models, the bias-corrected stochastic inference with prior information in mixed multiplicative and additive error models requires that \(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p1}}\) of (30) be computed iteratively, since the right hand side of (30) also contains the unknown \(\hat{\boldsymbol{\beta }}_{bc}^{\mathit{p1}}\). Nevertheless, if one simply applied the conventional principle of stochastic inference to mixed multiplicative and additive error models, one would end up with a biased estimator, which would not be of the same form as in the case of purely additive error models.

In the one-dimensional case, namely,

$$\displaystyle{y_{ij} = (1 +\varepsilon _{mij})s_{ij} +\varepsilon _{aij},}$$

the bias-corrected LS estimate of \(s_{ij}\) with prior information can be rewritten as follows:

$$\displaystyle{ \hat{s}_{ij} =\mu _{ij} + \frac{\sigma _{\mu }^{2}} {\sigma _{m}^{2}s_{0ij}^{2} +\sigma _{ a}^{2} +\sigma _{ \mu }^{2}}(y_{ij} -\mu _{ij}), }$$
(32)

where \(\mu_{ij}\) is the prior mean of \(s_{ij}\), \(\sigma _{\mu }^{2}\) is the corresponding prior variance, \(s_{0ij}\) is some approximate value of \(s_{ij}\), and \(\sigma _{m}^{2}\) and \(\sigma _{a}^{2}\) are the variances of the multiplicative and additive errors \(\varepsilon _{mij}\) and \(\varepsilon _{aij}\), respectively. By properly choosing the values of \(\mu_{ij}\), \(\sigma _{\mu }^{2}\), \(s_{0ij}\), \(\sigma _{m}^{2}\) and \(\sigma _{a}^{2}\), one can then construct the filter of Kuan et al. (1985) for image de-noising.
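As a one-line sketch of the estimator (32), with variable names that are ours:

```python
def bc_ls_pixel(y_ij, mu_ij, var_mu, s0_ij, var_m, var_a):
    """Bias-corrected LS estimate of s_ij with prior information, Eq. (32)."""
    gain = var_mu / (var_m * s0_ij**2 + var_a + var_mu)
    return mu_ij + gain * (y_ij - mu_ij)
```

With \(\mu_{ij}\) taken, for example, as a local window mean and \(s_{0ij}=\mu_{ij}\), this reduces to an adaptive filter of the Kuan et al. (1985) type, consistent with the remark above.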

6 Concluding Remarks

Geodetic adjustment has been developed on the basis of Gauss-Markov models with additive random errors. The most important feature of such a Gauss-Markov model is that the accuracy of a measurement has nothing to do with the true value of the measurement. However, geodetic practice has clearly demonstrated that the random errors of EDM, VLBI and GPS baselines indeed change with the length of a baseline. In other words, random errors of this type usually consist of two parts: one behaves more or less constantly and may reflect only random effects of a local nature, while the other is proportional to the length of the baseline and could, very likely, reflect the total effect of the propagation path between the two stations. Multiplicative error characteristics are likewise inherent in modern geodetic coherent imaging systems such as SAR and LiDAR. Obviously, the conventional adjustment theory that has been developed under the assumption of additive random errors cannot theoretically meet the need to process geodetic measurements that are contaminated by multiplicative and/or mixed additive and multiplicative random errors.

In this paper, we have briefly reviewed two types of methods for parameter estimation in mixed additive and multiplicative error models, namely, quasi-likelihood and least-squares-based methods, with and without prior information. Quasi-likelihood, though theoretically connected with distributions, can be used directly for parameter estimation without any distributional assumption. If there exist multiple solutions to the generalized estimating equations, however, no criterion is available for quasi-likelihood to pick out the correct solution. The LS-based methods have a clearly defined objective function, so the sense of optimality of LS-based estimates is well defined. For the linear model (8), quasi-likelihood and the ordinary and bias-corrected weighted LS methods can all warrant an unbiased estimate of the unknown parameters, while the weighted LS method will generally lead to a biased estimate. Quasi-likelihood and the bias-corrected weighted LS method are more efficient than the ordinary LS method. We have also extended the bias-corrected LS estimate to the case with prior information, which can be given in the form of a prior mean with either a parameter-free or a parameter-dependent prior variance-covariance matrix.