
1 Introduction

The linear least squares (LS) estimation approach is an important concept in electrical engineering, especially in digital signal processing. Example applications include localization [1], positioning [2], robotics [3], power and battery applications [4], biomedical applications [5], and image processing [6].

For LS estimation we assume the following system model:

$$\begin{aligned} \mathbf{y} = \mathbf{Hx} + \mathbf{n} \end{aligned}$$
(1)

where \(\mathbf y\) is the measured vector of length m, \(\mathbf H\) is a known system matrix of dimension \(m \times p\), \(\mathbf n\) is a noise vector of length m, and \(\mathbf x\) is the parameter vector of length p that we want to estimate.
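For concreteness, the following Python snippet generates a small synthetic instance of model (1). The dimensions, the random seed, and the distributions of \(\mathbf H\), \(\mathbf x\) and \(\mathbf n\) are illustrative assumptions only; the uniform entries of \(\mathbf H\) anticipate the simulation setup of Sect. 4.

```python
import numpy as np

# Illustrative instance of model (1): m equations, p unknowns, additive noise.
rng = np.random.default_rng(0)
m, p, sigma = 100, 10, 1e-2
H = rng.uniform(0.0, 1.0, size=(m, p))   # known system matrix (m x p)
x = rng.standard_normal(p)               # parameter vector to be estimated
n = sigma * rng.standard_normal(m)       # white Gaussian noise vector
y = H @ x + n                            # measured vector
```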

The linear least squares solution to this estimation problem is well known as

$$\begin{aligned} \hat{\mathbf{x}}_{LS} = \mathbf{H}^\dag \mathbf{y}, \end{aligned}$$
(2)

with the pseudoinverse \(\mathbf{H}^\dag = (\mathbf{H}^T\mathbf{H})^{-1} \mathbf{H}^T\). Such a direct calculation of the solution is often called the batch solution in the literature [7].
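As a reference point for the iterative methods discussed below, a minimal Python sketch of the batch solution (2) is given here; the function name is illustrative.

```python
import numpy as np

def batch_ls(H, y):
    """Batch LS solution (2): x_LS = (H^T H)^{-1} H^T y.

    Solving the normal equations directly is shown only for illustration;
    in practice a QR/SVD-based solver such as np.linalg.lstsq is
    numerically preferable.
    """
    return np.linalg.solve(H.T @ H, H.T @ y)
```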

For many real-time applications a low-complexity implementation is preferred over an exact solution. For this reason the calculation of the batch solution is usually avoided in such applications, due to its computational complexity and its large memory requirements. To provide a low-complexity approximate approach to the LS estimation problem, we developed a method that we call Approximate Least Squares (ALS) [8]. It can be seen as a variant of the Kaczmarz algorithm [9] for overdetermined and inconsistent systems of linear equations.

Fig. 1. Schematic drawing of the gradient and the partial gradients.

ALS is based on the iterative least squares (ILS) approach that iteratively calculates

$$\begin{aligned} \hat{\mathbf{x}}^{(k)}&= \hat{\mathbf{x}}^{(k-1)} - \mu \mathbf{d} (\hat{\mathbf{x}}^{(k-1)}). \end{aligned}$$
(3)

Here \(\mathbf{h}_i^T\) denotes the \(i^{th}\) row of \(\mathbf H\), and the function

$$\begin{aligned} \mathbf{d} (\hat{\mathbf{x}}^{(k-1)}) = \sum _{i=1}^{m} 2 \mathbf{h}_{i}( \mathbf{h}_{i}^T \hat{\mathbf{x}}^{(k-1)} - {y_{i}}) \end{aligned}$$
(4)

is the gradient of the least squares cost function

$$\begin{aligned} J(\hat{\mathbf{x}}) = \sum _{i=1}^m (y_i - \mathbf{h}_i^T \hat{\mathbf{x}})^2 \end{aligned}$$
(5)

that has its minimum at \(\hat{\mathbf{x}}_{LS}\). It can be shown that for \(k \rightarrow \infty \), \(\hat{\mathbf{x}}^{(k)}\) converges to \(\hat{\mathbf{x}}_{LS}\), given that the iteration step width \(\mu \) fulfills \({0<\mu <1/(2 s_1^2(\mathbf{H}))}\) [12], with \(s_1(\mathbf{H})\) as the largest singular value of \(\mathbf H\). Figure 1 schematically shows one iteration of ILS. The gradient \(\mathbf{d} (\hat{\mathbf{x}}^{(k-1)})\) can be seen as the sum of the partial gradients

$$\begin{aligned} d_i(\hat{\mathbf{x}}^{(k-1)}) = 2\mathbf{h}_{i} ( \mathbf{h}_{i}^{T}\hat{\mathbf{x}}^{(k-1)} - {y_{i}}). \end{aligned}$$
(6)
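For illustration, a minimal Python sketch of the ILS iteration (3)-(4) follows; note that the gradient (4) can be written compactly as \(2\mathbf{H}^T(\mathbf{H}\hat{\mathbf{x}} - \mathbf{y})\). Function and variable names are illustrative.

```python
import numpy as np

def ils(H, y, mu, num_iters, x0=None):
    """Iterative LS (3)-(4): gradient descent on the cost function (5).

    Convergence to x_LS requires 0 < mu < 1 / (2 * s1(H)**2), with s1(H)
    the largest singular value of H.
    """
    m, p = H.shape
    x = np.zeros(p) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        d = 2.0 * H.T @ (H @ x - y)   # gradient (4), written as 2 H^T (Hx - y)
        x = x - mu * d                # iteration step (3)
    return x
```

A valid step width can, for example, be chosen as mu = 0.9 / (2 * np.linalg.norm(H, 2)**2), which satisfies the convergence condition stated above.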

2 Approximate Least Squares

To reduce the complexity of this approach we proposed to use only one of these partial gradients per iteration, leading to the ALS iteration:

$$\begin{aligned} \hat{\mathbf{x}}^{(k)}&= \hat{\mathbf{x}}^{(k-1)} + 2 \mu \mathbf{h}_{k\urcorner } ({y_{k\urcorner }} - \mathbf{h}_{k\urcorner }^{T} \hat{\mathbf{x}}^{(k-1)}). \end{aligned}$$
(7)

In the above equation the operator “\(\urcorner \)” is defined as \({k\urcorner = ( (k-1)\text { mod } m ) + 1}\) for a positive natural number k. For better readability we do not indicate the dependence of this operator on m in its symbol; for ALS, m is always the number of rows of the matrix \(\mathbf H\). Using this operator naturally allows more iterations than the number of rows of \(\mathbf H\), which is typically required for ALS to obtain a good performance.
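For illustration, a minimal Python sketch of this cyclic index (the function name is illustrative; the operator itself is one-based):

```python
def cyc(k, m):
    """Cyclic row index k¬ = ((k - 1) mod m) + 1 for k = 1, 2, 3, ..."""
    return (k - 1) % m + 1

# For m = 3 the index cycles through the rows: 1, 2, 3, 1, 2, 3, 1
print([cyc(k, 3) for k in range(1, 8)])
```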

As we described in [8], the error \(\mathbf{e}^{(k)}= \hat{\mathbf{x}}^{(k)} - \mathbf{x}\) of the above iteration can be split into two parts

$$\begin{aligned} \mathbf{e}^{(k)} = \mathbf{e}^{(k)}_0 + \mathbf{e}_\mathbf{\Delta }^{(k)}, \end{aligned}$$
(8)

with \(\mathbf{e}^{(k)}_0\) as the error depending on the initial value \(\hat{\mathbf{x}}^{(0)}\) of the algorithm before the first iteration, and \(\mathbf{e}_\mathbf{\Delta }^{(k)}\) as the error depending on the noise vector \(\mathbf n\). As we showed in [8], the error \(\mathbf{e}^{(k)}_0\) goes to zero as the number of iterations goes to infinity, while the noise dependent error \(\mathbf{e}_\mathbf{\Delta }^{(k)}\) persists even if \(k \rightarrow \infty \). Due to the cyclic re-use of the rows of \(\mathbf H\) as well as the measurement values in \(\mathbf y\), the errors \(\mathbf{e}^{(k)}\) converge to m different values for \(k \rightarrow \infty \):

$$\begin{aligned} \lim _{k \rightarrow \infty } \mathbf{e}^{(mk+i)} = \lim _{k \rightarrow \infty } \mathbf{e}_{\mathbf{\Delta }}^{(mk+i)} = \mathbf{e}_{\mathbf{\Delta }_\infty }^{(i)} \quad \text { for } i=1, \ldots , m. \end{aligned}$$
(9)

In Fig. 2 we plot a typical case of the error norm of \(\hat{\mathbf{x}}^{(k)}\) over the iterations k for an example 100 \(\times \) 10 matrix \(\mathbf H\), as well as the error norm of the algorithm's output \(\hat{\mathbf{x}}_{ALS}\), as described below. For better visibility, the error norm of \(\hat{\mathbf{x}}_{ALS}\) has been depicted as a horizontal line, although it is available only at the end of the algorithm. When analyzing the norm of the error \(\mathbf{e}^{(k)}\) over the iterations, one can see an oscillatory behavior. This comes from the fact that for large k the ALS algorithm produces approximately the same m recurring error vectors (up to the vanishing deviation \(\mathbf{e}^{(k)}_0\)). To reduce the final error norm of the vector output by the ALS algorithm, we introduced an averaging step over the final m iterations of the algorithm.

Fig. 2. ALS error norms.

The basic ALS algorithm is summarized in the following pseudocode (Algorithm: ALS).

Algorithm: ALS

Here N denotes the number of iterations of the algorithm and \(\hat{\mathbf{x}}_{ALS}\) is the approximation of \(\hat{\mathbf{x}}_{LS}\) that is output by the algorithm; \(\mathbf 0\) denotes the zero vector. As one can see from the algorithm's description, the averaging only has to be done once; it therefore presents only a minor complexity increase (overall, only pm additional additions and p additional multiplications with the constant 1/m have to be performed).

As one can see in the above algorithm, once k reaches m, the first rows of \(\mathbf H\) and the first elements of \(\mathbf y\) are re-used in a cyclic manner in the following iterations. Compared to ILS, this approach has the advantage that about m times fewer multiplications per iteration are required.
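A minimal Python sketch of this procedure is given below, assuming \(N \ge m\) and using zero-based indexing in place of the \(\urcorner \) operator; names are illustrative and this is not the reference implementation.

```python
import numpy as np

def als(H, y, mu, N):
    """Minimal sketch of basic ALS with a fixed step width mu (assumes N >= m).

    Performs N partial-gradient iterations (7) with cyclic re-use of the
    rows of H and averages the estimates of the final m iterations.
    """
    m, p = H.shape
    x = np.zeros(p)          # initial value x^(0) = 0
    x_als = np.zeros(p)      # accumulator for the final averaging step
    for k in range(1, N + 1):
        i = (k - 1) % m                               # zero-based cyclic row index
        x = x + 2.0 * mu * H[i] * (y[i] - H[i] @ x)   # ALS iteration (7)
        if k > N - m:                                 # average the last m estimates
            x_als += x / m
    return x_als
```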

However, ALS also has the drawback – as we show in [8] and as discussed above – that for a constant step width \(\mu \) a persistent oscillating error remains, even if \(k \rightarrow \infty \). This error depends on the noise as well as on the parameter \(\mu \) of the algorithm. While a large value of \(\mu \) leads to a fast decrease of the error \(\mathbf{e}^{(k)}_0\) in early iterations, it also leads to a higher error \(\mathbf{e}_\mathbf{\Delta }^{(k)}\) and thus to a high final error. Choosing a small value of \(\mu \) leads to a small final error but requires a large number of iterations, because the error \(\mathbf{e}^{(k)}_0\) decreases more slowly. Supporting these findings, it has been shown in [10] that the ALS iteration converges to the LS solution if \(\mu \rightarrow 0\) and \(k \rightarrow \infty \). For a practical application of the algorithm, however, one is interested in a step width reduction method providing fast convergence close to \(\hat{\mathbf{x}}_{LS}\).

For this reason we proposed to adjust the step width \(\mu \) of the algorithm during the iterations. We call this approach Step-Adaptive Approximate Least Squares (SALS) [11].

3 Step-Adaptive Approximate Least Squares (SALS)

In [11] we proposed to adjust the step width \(\mu = \mu _k\) at every iteration. For this we divide the overall ALS iteration process into two phases, the reduction phase and the oscillation phase. In the reduction phase the error norm \(\Vert \mathbf{e}^{(k)}\Vert _2\) decreases, while in the oscillation phase the error norm repeats approximately cyclically, as described above. In Fig. 2 the reduction phase lasts for the first 500 iterations. During the reduction phase, the error part \(\mathbf{e}^{(k)}_0\) (due to the initial value \(\hat{\mathbf{x}}^{(0)}\)) contributes most to \(\mathbf{e}^{(k)}\) in (8); thus a high value of \(\mu _k\) should be used to decrease the overall error. In the oscillation phase, the error \(\mathbf{e}_\mathbf{\Delta }^{(k)}\) contributes most to \(\mathbf{e}^{(k)}\). Here a high value of \(\mu \) would prevent a further decrease of the error, thus a low value of \(\mu _k\) is beneficial in this phase. For the reduction phase we propose to use

$$\begin{aligned} \mu _k = \frac{1}{2\Vert \mathbf{h}^T_{{k\urcorner }}\Vert _2^2}. \end{aligned}$$
(10)

As one can show [11], this is the largest value of \(\mu \) for which \(\mathbf{e}^{(k)}_0 \rightarrow \mathbf{0}\) as \(k \rightarrow \infty \). To detect whether the algorithm has already entered the oscillation phase, we use the following method. The oscillation phase is characterized by the recurrence of approximately the same error vectors every m iterations. Inspecting (7), one can see that the error vector is only influenced by the term \({({y_{k\urcorner }} - \mathbf{h}_{k\urcorner }^{T} \hat{\mathbf{x}}^{(k-1)})}\). By comparing this value with the value obtained m iterations earlier, the oscillation phase is detected once the difference falls below a predefined threshold \(v_{th}\). After the oscillation phase is detected, the idea of SALS is to reduce the step width \(\mu \) in order to also reduce the final error of the ALS solution.

The overall SALS algorithm is presented in the following pseudocode (Algorithm: SALS).

Algorithm: SALS

The function \(f(\mu )\) is used to reduce \(\mu \) once the oscillation phase is detected. The first reduction is done by setting \(\mu \) to the minimum of the m step width values used in the preceding iterations. In every following iteration, \(\mu \) is further reduced via a reduction function \(f(\mu )\). In the following section we present and compare two low-complexity variants of \(f(\mu )\).
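A minimal Python sketch of this procedure under the description above is given below; the bookkeeping of the last m step widths and residual terms is an illustrative choice, not the reference implementation of [11]. The argument f is one of the reduction functions discussed in the next section.

```python
import numpy as np

def sals(H, y, N, v_th, f):
    """Minimal sketch of SALS (assumes N >= m).

    Uses the reduction-phase step width (10), detects the oscillation phase
    via the residual term of (7), and then shrinks mu with the reduction
    function f.
    """
    m, p = H.shape
    x = np.zeros(p)
    x_sals = np.zeros(p)
    mus = np.zeros(m)            # step widths of the previous m iterations
    vs = np.full(m, np.inf)      # residual terms of the previous m iterations
    oscillating = False
    mu = 0.0
    for k in range(1, N + 1):
        i = (k - 1) % m
        v = y[i] - H[i] @ x                       # term driving the error, see (7)
        if not oscillating:
            mu = 1.0 / (2.0 * (H[i] @ H[i]))      # reduction-phase step width (10)
            if abs(v - vs[i]) < v_th:             # same residual as m iterations ago?
                oscillating = True
                mu = mus.min()                    # first reduction: min of last m values
        else:
            mu = f(mu)                            # further reduction via f(mu)
        mus[i], vs[i] = mu, v
        x = x + 2.0 * mu * H[i] * v               # ALS update (7)
        if k > N - m:                             # average the final m estimates
            x_sals += x / m
    return x_sals
```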

4 Simulation Results

The aim of Approximate Least Squares is to provide a low-complexity approximate solution of the linear least squares problem. For this reason we restrict ourselves to reduction functions causing only a negligible complexity overhead compared to the basic ALS algorithm. We specifically compare the two reduction functions

$$\begin{aligned} \mu _{k} = \mu _{k-1} - s = f_1(\mu _{k-1}) \end{aligned}$$
(11)

as well as

$$\begin{aligned} \mu _{k} = (1 - 2^{-c})\, \mu _{k-1} = f_2(\mu _{k-1}) , \end{aligned}$$
(12)

with a positive fractional s and a positive integer c. For (11) only one subtraction per iteration is required, while for (12) only one shift operation and one subtraction per iteration are required. Figure 3 shows simulation results for random \(\mathbf{H}\) matrices for ALS and for SALS with these reduction functions. The entries of these matrices have been sampled from a uniform distribution on [0, 1]. Every simulation has been carried out for white Gaussian noise with zero mean and standard deviation \(\sigma \in S=\{ 10^{-4},10^{-3},10^{-2},10^{-1}, 1\}\). In the SALS algorithms \(v_{th}\) was set to \(10^{-6}\).

For (11), we used \(s = \mu _i / (N-i)\), with i as the iteration at which the oscillation phase was detected and \(\mu _i\) as the corresponding step width. For (12), c was set to \(\lfloor \log _2(N) \rfloor \). These values of s and c have been found and optimized by extensive simulations.
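For illustration, the two reduction functions with these parameter choices could be constructed as follows; the factory functions are an assumption made for readability. Note that \(f_1\) depends on \(\mu _i\) and i and can therefore only be constructed once the oscillation phase has been detected, whereas \(f_2\) depends only on N.

```python
import math

def make_f1(mu_i, i, N):
    """Reduction function (11): subtract a constant s = mu_i / (N - i)."""
    s = mu_i / (N - i)
    return lambda mu: mu - s

def make_f2(N):
    """Reduction function (12): scale by (1 - 2^(-c)) with c = floor(log2(N))."""
    c = math.floor(math.log2(N))
    return lambda mu: (1.0 - 2.0 ** (-c)) * mu
```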

Fig. 3. Simulation results.

The figure shows the relative error of ALS and SALS, respectively, compared to the optimum least squares solution: the maximum relative increase of the averaged error norms of ALS and SALS over the averaged error norm of LS, where the maximization is done over the elements of S: \({r_{ALS} = \underset{S}{\text {max}} \left( \frac{\overline{||\hat{\mathbf{x}}_{ALS}-\mathbf{x}||_2}}{\overline{||\hat{\mathbf{x}}_{LS}-\mathbf{x}||_2}} -1 \right) }\) and \({r_{SALS} = \underset{S}{\text {max}} \left( \frac{\overline{||\hat{\mathbf{x}}_{SALS}-\mathbf{x}||_2}}{\overline{||\hat{\mathbf{x}}_{LS}-\mathbf{x}||_2}} -1 \right) }\). As one can see from these results, SALS performs significantly better than the basic ALS algorithm: its deviation from the least squares error norm lies in the single-digit percent range. For large \(\mathbf H\) matrices, \(f_1\) performs significantly better than \(f_2\). For the simulated matrices with 1000 rows, the error norm deviation \(r_{SALS}\) was below \(2\,\%\). For a practical application, especially when thinking of an implementation in fixed-point precision, such a deviation is typically considered negligible.
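For reference, the evaluation metric can be computed from per-trial error norms as sketched below; the array shapes and the function name are illustrative assumptions.

```python
import numpy as np

def max_relative_increase(err_approx, err_ls):
    """r = max over noise levels of (mean approximate error norm divided by
    mean LS error norm) minus 1, cf. the definitions of r_ALS and r_SALS.

    err_approx, err_ls: arrays of shape (len(S), num_trials) containing the
    per-trial error norms ||x_hat - x||_2 for each noise level in S.
    """
    return float(np.max(err_approx.mean(axis=1) / err_ls.mean(axis=1) - 1.0))
```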

5 Conclusion

We present and compare different low-complexity approaches for reducing the step width of Step-Adaptive Approximate Least Squares. We show that with Step-Adaptive Approximate Least Squares an error norm performance can be achieved that lies within a few percent of the optimal least squares error performance. For many applications such an error norm can be considered practically equivalent to the optimum least squares solution, which is obtainable only with a much higher computational effort.