
1 Introduction

We consider the overdetermined linear least squares (LLS) problem

$$\begin{aligned} \min _{x \in \mathbb {R}^n} \Vert {Ax - b} \Vert _2, \end{aligned}$$
(1)

with \(A \in \mathbb {R}^{m \times n}, m \ge n\) and \(b \in \mathbb {R}^m\). We assume throughout this paper that \(A\) has full column rank; as a result, Eq. (1) has a unique solution \(x = A^+b\), where \(A^+\) is the Moore-Penrose pseudoinverse of the matrix \(A\), given by \(A^+ = (A^TA)^{-1}A^T\). Comprehensive surveys of methods for solving LLS problems efficiently and accurately can be found, for instance, in [7, 13, 19].
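As a small illustration of this characterization of the solution (the dimensions and data below are arbitrary), the following NumPy sketch compares a standard dense least squares solver with the explicit normal-equations pseudoinverse formula; both yield the same \(x\) up to rounding errors.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 20                       # arbitrary dimensions with m >= n
A = rng.standard_normal((m, n))      # random matrix, full column rank with probability 1
b = rng.standard_normal(m)

# Solution via a standard dense least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Solution via the pseudoinverse A^+ = (A^T A)^{-1} A^T (full column rank case)
x_pinv = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(x_lstsq, x_pinv))  # True up to rounding errors
```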

The condition number is a measure of the sensitivity of a mapping to perturbations. It was initially defined in [23] as the maximum amplification factor between a small perturbation in the data and the resulting change in the problem solution. Namely, if the solution \(x\) of a given problem can be expressed as a function \(g(y)\) of the data \(y\), and if \(g\) is differentiable (which is the case for many linear algebra problems), then the absolute condition number of \(g\) at \(y\) can be defined as (see e.g. [12])

$$\begin{aligned} \kappa (y)=\max _{z \ne 0} \frac{\Vert {g'(y).z} \Vert }{\Vert {z} \Vert }. \end{aligned}$$
(2)

From this definition, \(\kappa (y)\) is a quantity that, for a given perturbation size on the data \(y\), allows us to predict to first order the perturbation size on the solution \(x\). Combined with a backward error [26], condition numbers are useful to assess the numerical quality of a computed solution. Indeed, numerical algorithms are always subject to errors, although their sensitivity to these errors may vary. Such errors can have various origins, for instance data uncertainty due to instrumental measurements, or the rounding and truncation errors inherent to finite precision arithmetic.

LLS problems can be very sensitive to perturbations in the data, and it is crucial to be able to assess the quality of the solution in practical applications [4]. It was shown in [14] that the 2-norm condition number \({{\mathrm{cond}}}(A)\) of the matrix \(A\) plays a significant role in LLS sensitivity analysis. It was later proved in [25] that the sensitivity of LLS problems is proportional to \({{\mathrm{cond}}}(A)\) when the residual vector is small and to \({{\mathrm{cond}}}(A)^2\) otherwise. Then [12] provided a closed formula for the condition number of LLS problems, using the Frobenius norm to measure the perturbations of \(A\). Since then, many results on normwise LLS condition numbers have been published (see e.g. [2, 7, 11, 15, 16]).

It was observed in [18] that normwise condition numbers can lead to a loss of information since they consolidate all sensitivity information into a single number. Indeed, in some cases this sensitivity can vary significantly among the different solution components (some examples for LLS are presented in [2, 21]). To overcome this issue, the notion of “componentwise” condition numbers, i.e., condition numbers for the individual solution components, was proposed in [9]. Note that this approach must be distinguished from the componentwise metric also applied to LLS, for instance in [5, 10]. It was generalized by the notion of partial or subspace condition numbers, which corresponds to the conditioning of \(L^Tx\) with \(L~\in ~\mathbb {R}^{n \times k}, k~\le ~n\), proposed for instance in [2, 6] for least squares and total least squares, or in [8] for linear systems. The motivation for computing the conditioning of \(L^Tx\) can be found for instance in [2, 3] for normwise LLS condition numbers.

Even though condition numbers provide valuable information about the quality of the computed solution, they should be obtainable in a time that is acceptable compared to the cost of computing the solution itself. Computing the exact (subspace or not) condition number requires \(\mathcal {O}(n^3)\) flops when the LLS solution \(x\) has already been computed (e.g., using a QR factorization) and can be reused to compute the conditioning [2, 3]. This cost is affordable when compared to the cost of solving the problem (\(\mathcal {O}(2mn^2)\) flops when \(m\gg n\)). However, statistical estimates can reduce this cost to \(\mathcal {O}(n^2)\) [17, 20]. The theoretical quality of a statistical estimate can be formally measured by the probability that it lies within a certain range around the exact value. In this paper we summarize closed formulas for the condition numbers of the LLS solution and of its components, and we propose practical algorithms to compute statistical estimates of these quantities. In particular, we derive a new expression for the statistical estimate of the conditioning of \(x\). We also present numerical experiments that compare LLS conditioning with the corresponding statistical estimates.

Notations. The notation \(\left\| {\cdot } \right\| _2\) applied to a matrix (resp. a vector) refers to the spectral norm (resp. the Euclidean norm) and \(\left\| {\cdot } \right\| _F\) denotes the Frobenius norm of a matrix. The matrix \(I\) is the identity matrix and \(e_i\) is the \(i\)th canonical vector. The uniform continuous distribution between \(a\) and \(b\) is abbreviated \(\mathcal {U}(a,b)\) and the normal distribution of mean \(\mu \) and variance \(\sigma ^2\) is abbreviated \(\mathcal {N}(\mu ,\sigma ^2)\). \({{\mathrm{cond}}}(A)\) denotes the 2-norm condition number of a matrix \(A\), defined as \({{\mathrm{cond}}}(A) = \Vert {A} \Vert _2 \Vert {A^+} \Vert _2\). The notation \(|\cdot |\) applied to a matrix or a vector is understood componentwise.

2 Condition Estimation for Linear Least Squares

In Sect. 2.1 we are concerned with computing the condition number of the LLS solution \(x\), and in Sect. 2.2 we compute or estimate the conditioning of the components of \(x\). We suppose that the LLS problem has already been solved using a QR factorization (the normal equations method is also possible, but the condition number is then proportional to \({{\mathrm{cond}}}(A)^2\) [7, p. 49]). Then the solution \(x\), the residual \(r=b-Ax\), and the factor \(R \in \mathbb {R}^{n \times n}\) of the QR factorization of \(A\) are readily available (we recall that the Cholesky factor of the normal equations is, in exact arithmetic, equal to \(R\) up to some signs). We also assume that both \(A\) and \(b\) can be perturbed, these perturbations being measured with the weighted product norm \(\left\| {(\varDelta A,\varDelta b)} \right\| _F = \sqrt{\left\| {\varDelta A} \right\| _F^2 + \left\| {\varDelta b} \right\| _2^2}\), where \(\varDelta A\) and \(\varDelta b\) are absolute perturbations of \(A\) and \(b\). In addition to providing us with simplified formulas, this product norm has the advantage, mentioned in [15], of being appropriate for estimating the forward error obtained when the LLS problem is solved via the normal equations.
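The following NumPy/SciPy sketch illustrates this setting on a small random problem (dimensions and data are arbitrary): it computes \(x\), \(r\) and \(R\) from a thin QR factorization, checks that the Cholesky factor of the normal equations matches \(R\) up to signs, and evaluates the weighted product norm of a perturbation. It is only an illustration of the quantities assumed available in the sequel.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
m, n = 200, 50                        # arbitrary problem size with m >= n
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Thin QR factorization A = Q1 R: x, r and R are then readily available.
Q1, R = np.linalg.qr(A)               # Q1: m x n, R: n x n upper triangular
x = solve_triangular(R, Q1.T @ b)     # LLS solution
r = b - A @ x                         # residual

# The Cholesky factor of the normal equations equals R up to signs.
L = np.linalg.cholesky(A.T @ A)       # A^T A = L L^T
print(np.allclose(np.abs(L.T), np.abs(R)))   # True up to rounding errors

# Weighted product norm used to measure the perturbations (Delta A, Delta b)
dA = 1e-8 * rng.standard_normal((m, n))
db = 1e-8 * rng.standard_normal(m)
prod_norm = np.hypot(np.linalg.norm(dA, 'fro'), np.linalg.norm(db))
```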

2.1 Conditioning of the Least Squares Solution

Exact formula. We can obtain from [3] a closed formula for the absolute condition number of the LLS solution as

$$\begin{aligned} \kappa _{LS} = \Vert {R^{-1}} \Vert _2 \left( \Vert {R^{-1}} \Vert _2^2 \Vert {r} \Vert _2^2 + \Vert {x} \Vert _2^2 + 1 \right) ^{\frac{1}{2}}, \end{aligned}$$
(3)

where \(x\), \(r\) and \(R\) are exact quantities.

Evaluating this formula mainly requires computing the smallest singular value of the matrix \(A\) (or \(R\)). This can be done using iterative procedures such as the inverse power iteration on \(R\), or, more expensively, via the full SVD of \(R\) (\(\mathcal {O}(n^3)\) flops). Note that \(\Vert {R^{-1}} \Vert _2 = \Vert {R^{-T}} \Vert _2\) can also be approximated by other matrix norms (see [19, p. 293]).
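A direct transcription of Eq. (3), given as a minimal NumPy sketch; it assumes that \(R\), \(x\) and \(r\) are already available (e.g., from a QR-based solve) and obtains \(\Vert {R^{-1}} \Vert _2\) from the full SVD of \(R\), which is the simple but more expensive option mentioned above.

```python
import numpy as np

def kappa_ls(R, x, r):
    """Absolute condition number of the LLS solution, Eq. (3).

    R is the n x n triangular QR factor of A, x the LLS solution and
    r = b - A x the residual, all assumed already computed.
    """
    # ||R^{-1}||_2 = 1 / sigma_min(R); here via the full SVD of R (O(n^3) flops).
    # An inverse power iteration on R would be a cheaper alternative.
    sigma_min = np.linalg.svd(R, compute_uv=False)[-1]
    inv_norm = 1.0 / sigma_min
    return inv_norm * np.sqrt(inv_norm**2 * np.dot(r, r) + np.dot(x, x) + 1.0)
```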

Statistical estimate. Similarly to [8] for linear systems, we can estimate the condition number of the LLS solution using the so-called small-sample theory [20], which provides statistical condition estimates for matrix functions.

Let us denote by \(x(A,b)\) the expression of \(x\) as a function of the data \(A\) and \(b\). Since \(A\) has full column rank, \(x(A,b)\) is continuously Fréchet differentiable in a neighborhood of \((A,b)\). If \(x'(A,b)\) is the derivative of this function, then \(x'(A,b).(\varDelta A,\varDelta b)\) denotes the image of \((\varDelta A,\varDelta b)\) under the linear map \(x'(A,b)\). By Taylor’s theorem, the forward error \(\varDelta x\) on the solution \(x(A,b)\) can be expressed as

$$\begin{aligned} \varDelta x= x'(A,b).(\varDelta A,\varDelta b)+\mathcal {O}(\left\| {(\varDelta A,\varDelta b)} \right\| _F^2). \end{aligned}$$
(4)

Following the definition given in Eq. (2), the condition number of \(x\) corresponds to the operator norm of \(x'(A,b)\), which bounds to first order the sensitivity of \(x\) at \((A,b)\), and we have

$$\left\| {\varDelta x} \right\| _2 \le \kappa _{LS} \left\| {(\varDelta A,\varDelta b)} \right\| _F.$$

We now use [20] to estimate \(\left\| {\varDelta x} \right\| _2\) by

$$\begin{aligned} \xi (q)= \frac{\omega _q}{\omega _n} \sqrt{|z_1^T \varDelta x|^2+\cdots +|z_q^T \varDelta x|^2}, \end{aligned}$$
(5)

where \(z_1,\cdots ,z_q\) are orthogonal vectors selected uniformly at random from the unit sphere of \(\mathbb {R}^n\), and \(\omega _q\) is the Wallis factor defined by

$$\begin{aligned} \omega _1&= 1,\\ \omega _q&= \frac{1\cdot 3\cdot 5\cdots (q-2)}{2\cdot 4\cdot 6\cdots (q-1)}~\mathrm{for}~q~\mathrm{odd},\\ \omega _q&= \frac{2}{\pi }~\frac{2\cdot 4\cdot 6\cdots (q-2)}{1\cdot 3\cdot 5\cdots (q-1)}~\mathrm{for}~q~\mathrm{even}. \end{aligned}$$

\(\omega _q\) can be approximated by \(\sqrt{\frac{2}{\pi (q-\frac{1}{2})}}\).
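The Wallis factor is straightforward to evaluate; the short sketch below (illustrative only) compares the exact products above with the approximation \(\sqrt{\frac{2}{\pi (q-\frac{1}{2})}}\) for a few values of \(q\).

```python
import numpy as np

def wallis(q):
    """Exact Wallis factor omega_q as defined above."""
    if q == 1:
        return 1.0
    if q % 2 == 1:                                            # q odd
        num = np.prod(np.arange(1, q - 1, 2, dtype=float))    # 1*3*...*(q-2)
        den = np.prod(np.arange(2, q, 2, dtype=float))        # 2*4*...*(q-1)
        return num / den
    num = np.prod(np.arange(2, q - 1, 2, dtype=float))        # 2*4*...*(q-2)
    den = np.prod(np.arange(1, q, 2, dtype=float))            # 1*3*...*(q-1)
    return 2.0 / np.pi * num / den

def wallis_approx(q):
    """Approximation omega_q ~ sqrt(2 / (pi (q - 1/2)))."""
    return np.sqrt(2.0 / (np.pi * (q - 0.5)))

for q in (2, 3, 10, 100):
    print(q, wallis(q), wallis_approx(q))
```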

It follows from [20] that if, for instance, we take \(q=2\), then the probability that \(\xi (q)\) lies within a factor \(\alpha \) of \(\left\| {\varDelta x} \right\| _2\) is

$$\begin{aligned} Pr(\frac{\left\| {\varDelta x} \right\| _2}{\alpha }\le \xi (q) \le \alpha \left\| {\varDelta x} \right\| _2) \approx 1-\frac{\pi }{4 \alpha ^2}. \end{aligned}$$
(6)

For \(\alpha =10\), we obtain a probability of \(99.2\,\%\).
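The approximation in Eq. (6) can be checked empirically: the sketch below (with arbitrary \(n\) and a fixed vector playing the role of \(\varDelta x\)) draws pairs of random orthonormal vectors, forms \(\xi (2)\), and counts how often it falls within a factor \(\alpha =10\) of \(\left\| {\varDelta x} \right\| _2\). The observed frequency should be at least of the order of the \(99.2\,\%\) predicted above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, alpha, trials = 100, 2, 10.0, 20000

def wallis_approx(k):
    return np.sqrt(2.0 / (np.pi * (k - 0.5)))

v = rng.standard_normal(n)             # plays the role of Delta x
norm_v = np.linalg.norm(v)

hits = 0
for _ in range(trials):
    # q random orthonormal vectors on the unit sphere of R^n
    Z, _ = np.linalg.qr(rng.standard_normal((n, q)))
    xi = wallis_approx(q) / wallis_approx(n) * np.linalg.norm(Z.T @ v)
    hits += norm_v / alpha <= xi <= alpha * norm_v

print(hits / trials, 1.0 - np.pi / (4.0 * alpha**2))   # empirical vs. ~0.992
```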

For each \(i \in \{1,\cdots ,q\}\), using Eq. (2) we have the first-order bound

$$\begin{aligned} |z_i^T\varDelta x| \le \kappa _i \left\| { (\varDelta A,\varDelta b)} \right\| _F, \end{aligned}$$
(7)

where \(\kappa _i\) denotes the condition number of the function \(z_i^Tx(A,b)\). Then using (5) and (7) we get

$$\xi (q) \le \frac{\omega _q}{\omega _n} \left( \sum _{i=1}^{q} \kappa _i^2 \right) ^{\frac{1}{2}} \left\| {(\varDelta A,\varDelta b)} \right\| _F.$$

Since \(\xi (q)\) is an estimate of \(\left\| {\varDelta x} \right\| _2\), we will use the quantity \(\bar{\kappa }_{LS}\) defined by

$$\begin{aligned} \bar{\kappa }_{LS} = \frac{\omega _q}{\omega _n} \left( \sum _{i=1}^{q} {\kappa _i}^2\right) ^{\frac{1}{2}} \end{aligned}$$
(8)

as an estimate for \(\kappa _{LS}\).

We point out that \(\bar{\kappa }_{LS}\) is a scalar quantity that must be distinguished from the estimate given in [21], which is a vector. Indeed, the small-sample theory is used here to derive an estimate of the condition number of \(x\), whereas it is used in [21] to derive estimates of the condition numbers of the components of \(x\) (see Sect. 2.2). We can now state Algorithm 1, which computes \(\bar{\kappa }_{LS}\) as expressed in Eq. (8) using the condition numbers of the \(z_i^Tx\). The vectors \(z_1,\cdots ,z_q\) are obtained for instance via a QR factorization of a random matrix \(Z \in \mathbb {R}^{n\times q}\). The condition number of \(z_i^Tx\) can be computed using the expression given in [3] as

$$\begin{aligned} \kappa _i = \left( \Vert {R^{-1} R^{-T} z_i} \Vert _2^2 \Vert {r} \Vert _2^2 + \Vert {R^{-T} z_i} \Vert _2^2 ( \Vert {x} \Vert _2^2 + 1) \right) ^{\frac{1}{2}}. \end{aligned}$$
(9)

The accuracy of the estimate can be adjusted by varying the number \(q\) of random samples. The computation of \(\bar{\kappa }_{LS}\) requires the QR factorization of an \(n \times q\) matrix, which costs \(\mathcal {O}(nq^2)\) flops. It also involves, for each of the \(q\) samples, solving two \(n \times n\) triangular linear systems, each solved in \(\mathcal {O}(n^2)\) flops. The resulting computational cost is \(\mathcal {O}(2qn^2)\) flops (when \(n\gg q\)).

Algorithm 1. Computation of the statistical condition estimate \(\bar{\kappa }_{LS}\) of Eq. (8).
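Since the original listing of Algorithm 1 is not reproduced here, the following NumPy/SciPy sketch shows one possible realization of the steps described above: \(q\) random orthonormal vectors \(z_i\) obtained from the QR factorization of a random \(n\times q\) matrix, the condition numbers \(\kappa _i\) of Eq. (9) computed through triangular solves with \(R\), and the combination of Eq. (8). The Wallis factors are evaluated with the approximation given above; the actual Algorithm 1 may differ in such details.

```python
import numpy as np
from scipy.linalg import solve_triangular

def wallis_approx(k):
    # omega_k ~ sqrt(2 / (pi (k - 1/2)))
    return np.sqrt(2.0 / (np.pi * (k - 0.5)))

def kappa_ls_statistical(R, x, r, q=2, rng=None):
    """Statistical estimate of Eq. (8), a sketch in the spirit of Algorithm 1.

    R: n x n upper triangular QR factor of A, x: LLS solution, r: residual.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = R.shape[0]
    # q random orthonormal vectors z_1, ..., z_q (QR of a random n x q matrix)
    Z, _ = np.linalg.qr(rng.standard_normal((n, q)))

    # Eq. (9): kappa_i^2 = ||R^{-1}R^{-T}z_i||^2 ||r||^2 + ||R^{-T}z_i||^2 (||x||^2 + 1)
    Y = solve_triangular(R, Z, trans='T')   # Y = R^{-T} Z
    W = solve_triangular(R, Y)              # W = R^{-1} R^{-T} Z
    kappa2 = (np.sum(W**2, axis=0) * np.dot(r, r)
              + np.sum(Y**2, axis=0) * (np.dot(x, x) + 1.0))

    # Eq. (8): combine the kappa_i with the Wallis factors
    return wallis_approx(q) / wallis_approx(n) * np.sqrt(np.sum(kappa2))
```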

2.2 Componentwise Condition Estimates

In this section, we focus on computing the condition number of each component of the LLS solution \(x\). We consider two approaches. The first is based on the results from [3] and enables us to compute the exact value of the condition number of the \(i\)th component of \(x\). The second is a statistical estimate from [21].

Exact formula. By considering in Eq. (9) the special case where \(z_i=e_i\), we obtain in Eq. (10) the condition number of the component \(x_i=e_i^Tx\). We can then form a vector \(\kappa _{CW} \in \mathbb {R}^n\) whose \(i\)th entry \(\kappa _i\) is the exact condition number of the \(i\)th component, given by

$$\begin{aligned} \kappa _i = \left( \Vert {R^{-1} R^{-T} e_i} \Vert _2^2 \Vert {r} \Vert _2^2 + \Vert {R^{-T} e_i} \Vert _2^2 ( \Vert {x} \Vert _2^2 + 1) \right) ^{\frac{1}{2}}. \end{aligned}$$
(10)

The computation of a single \(\kappa _i\) requires two triangular solves (\(R^Ty=e_i~\mathrm{and}~ Rz=y\)), corresponding to \(2n^2\) flops. When all the \(\kappa _i\) are needed, it is more efficient to solve \(RY=I\) and then compute \(YY^T\), which requires about \(2n^3/3\) flops.
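The computation of all the \(\kappa _i\) through \(RY=I\) can be written compactly: with \(Y=R^{-1}\), \(\Vert {R^{-T}e_i} \Vert _2^2\) is the squared norm of the \(i\)th row of \(Y\) and \(\Vert {R^{-1}R^{-T}e_i} \Vert _2^2\) is the squared norm of the \(i\)th column of \(YY^T\). A minimal NumPy/SciPy sketch (illustrative, not a library implementation):

```python
import numpy as np
from scipy.linalg import solve_triangular

def kappa_componentwise(R, x, r):
    """Exact componentwise condition numbers kappa_i of Eq. (10), all at once."""
    n = R.shape[0]
    Y = solve_triangular(R, np.eye(n))      # Y = R^{-1}
    G = Y @ Y.T                             # G = R^{-1} R^{-T} = (A^T A)^{-1}
    term1 = np.sum(G**2, axis=0) * np.dot(r, r)          # ||R^{-1}R^{-T}e_i||^2 ||r||^2
    term2 = np.sum(Y**2, axis=1) * (np.dot(x, x) + 1.0)  # ||R^{-T}e_i||^2 (||x||^2 + 1)
    return np.sqrt(term1 + term2)
```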

Statistical condition estimate. Three different algorithms to compute statistical componentwise condition estimates for LLS problems can be found in [21]. Algorithm 2 is the variant that uses unstructured perturbations, and its output can be compared with the exact values given by Eq. (10). Algorithm 2 computes a vector \(\bar{\kappa }_{CW} =\left( \bar{\kappa }_1,\cdots ,\bar{\kappa }_n \right) ^T\) containing a statistical estimate of each \(\kappa _i\). The number of random perturbations \(q \ge 1\) applied to the input data in Algorithm 2 can be adjusted depending on the required accuracy of the estimate. This algorithm involves two \(n \times n\) triangular solves with \(q\) right-hand sides, which requires about \(qn^2\) flops.

Algorithm 2. Componentwise statistical condition estimation of the LLS solution [21].
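The listing of Algorithm 2 is likewise not reproduced here. As a heavily hedged substitute, the sketch below implements a generic small-sample componentwise estimate with unstructured perturbations in the spirit of [20, 21]: \(q\) orthonormal perturbation directions of the data \((A,b)\) are drawn, the first-order change of \(x\) is computed for each of them from the standard expression of the derivative \(x'(A,b)\), and the results are scaled with Wallis factors. It is not claimed to coincide with Algorithm 2 of [21], whose operation count given above differs, and it is written for readability rather than efficiency.

```python
import numpy as np

def wallis_approx(k):
    return np.sqrt(2.0 / (np.pi * (k - 0.5)))

def kappa_cw_statistical(A, x, r, q=2, rng=None):
    """Generic small-sample componentwise condition estimate (unstructured perturbations)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    p = m * n + m                              # dimension of the data space (A, b)

    # q orthonormal perturbation directions of the pair (A, b)
    D, _ = np.linalg.qr(rng.standard_normal((p, q)))

    AtA = A.T @ A
    dxs = np.empty((q, n))
    for j in range(q):
        dA = D[: m * n, j].reshape(m, n)
        db = D[m * n:, j]
        # First-order change: dx = A^+(db - dA x) + (A^T A)^{-1} dA^T r
        dxs[j] = np.linalg.solve(AtA, A.T @ (db - dA @ x) + dA.T @ r)
    # Estimate of the gradient norm of each component x_i
    return wallis_approx(q) / wallis_approx(p) * np.sqrt(np.sum(dxs**2, axis=0))
```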

3 Numerical Experiments

In the following experiments, random LLS problems are generated using the method given in [22], which constructs LLS test problems with prescribed solution \(x\) and residual norm. Random problems are obtained from the quantities \(m\), \(n\), \(\rho \), \(l\) such that \(A \in \mathbb {R}^{m \times n}\), \(\left\| {r} \right\| _2 = \rho \) and \({{\mathrm{cond}}}(A) = n^l\). The matrix \(A\) is generated using

$$\begin{aligned} A = Y \left( \begin{array}{c} D\\ 0\end{array}\right) Z^T,~ Y = I - 2yy^T,~Z = I - 2zz^T \end{aligned}$$

where \(y \in \mathbb {R}^m\) and \(z \in \mathbb {R}^n\) are random unit vectors and \(D = n^{-l} \mathrm{diag}(n^l, (n - 1)^l, (n - 2 )^l, \cdots , 1)\). We set \(x = (1, 2^2,\cdots , n^2)^T\); the residual vector is given by \(r = Y \left( \begin{array}{c} 0\\ v\end{array}\right) \), where \(v \in \mathbb {R}^{m-n}\) is a random vector of norm \(\rho \), and the right-hand side by \(b = Y \left( \begin{array}{c} DZx\\ v\end{array}\right) \). In Sect. 3.1, we consider LLS problems of size \(m \times n\) with \(m=9984\) and \(n=2496\). All the experiments were performed using the library LAPACK 3.2 [1] from Netlib.
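For completeness, a NumPy sketch of this problem generator (intended for modest dimensions, since it forms the reflectors \(Y\) and \(Z\) explicitly); it follows the formulas above and is not the code actually used in the experiments.

```python
import numpy as np

def generate_lls_problem(m, n, rho, l, rng=None):
    """Random LLS test problem with known solution and residual norm (after [22])."""
    rng = np.random.default_rng() if rng is None else rng
    y = rng.standard_normal(m)
    y /= np.linalg.norm(y)
    z = rng.standard_normal(n)
    z /= np.linalg.norm(z)
    Y = np.eye(m) - 2.0 * np.outer(y, y)                 # Householder reflectors
    Z = np.eye(n) - 2.0 * np.outer(z, z)
    d = n ** (-float(l)) * np.arange(n, 0, -1, dtype=float) ** l   # diagonal of D
    A = Y[:, :n] @ (d[:, None] * Z.T)                    # A = Y [D; 0] Z^T, cond(A) = n^l

    x = np.arange(1, n + 1, dtype=float) ** 2            # x = (1, 2^2, ..., n^2)^T
    v = rng.standard_normal(m - n)
    v *= rho / np.linalg.norm(v)
    r = Y[:, n:] @ v                                     # r = Y [0; v], ||r||_2 = rho
    b = Y[:, :n] @ (d[:, None] * (Z.T @ x)) + r          # b = Y [D Z^T x; v] = A x + r
    return A, b, x, r
```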

3.1 Accuracy of Statistical Estimates

Conditioning of LLS Solution. In this section we compare the statistical estimate \(\overline{\kappa }_{LS}\) obtained via Algorithm 1 with the exact condition number \(\kappa _{LS}\) computed using Eq. (3). In our experiments, the statistical estimate is computed using two samples (\(q = 2\)). For seven different values of \({{\mathrm{cond}}}(A)=n^l\) (\(l\) ranging from \(0\) to \(3\), \(n=2496\)) and several values of \(\left\| {r} \right\| _2\), we report in Table 1 the ratio \(\bar{\kappa }_{LS}/\kappa _{LS}\), averaged over \(100\) random problems.

Table 1. Ratio between statistical and exact condition numbers (\(q = 2\))

The results in Table 1 show the relevance of the statistical estimate presented in Sect. 2.1. For \(l \ge \frac{1}{2}\) the averaged estimates never differ from the exact value by more than one order of magnitude. We observe that when \(l\) tends to \(0\) (i.e., \({{\mathrm{cond}}}(A)\) gets close to 1) the estimate becomes less accurate. This can be explained by the fact that the statistical estimate \(\overline{\kappa }_{LS}\) is based on evaluating the Frobenius norm of the Jacobian matrix [17]. Indeed, additional experiments showed that \(\overline{\kappa }_{LS}/\kappa _{LS}\) behaves exactly like \(\left\| {R^{-1}} \right\| _F^2/\left\| {R^{-1}} \right\| _2^2\). For this particular LLS problem we have

$$\begin{aligned} \left\| {R^{-1}} \right\| _F^2/\left\| {R^{-1}} \right\| _2^2&= \left( 1 + (n/(n-1))^{2l}+(n/(n-2))^{2l} +\cdots +n^{2l} \right) /n^{2l}\\&= \sum _{k=1}^{n} \frac{1}{k^{2l}}. \end{aligned}$$

Then when \(l\) tends towards 0, \(\left\| {R^{-1}} \right\| _F/\left\| {R^{-1}} \right\| _2 \sim \sqrt{n}\), whereas this ratio gets closer to 1 when \(l\) increases. This is consistent with the well-known inequality \(1\le \left\| {R^{-1}} \right\| _F/\left\| {R^{-1}} \right\| _2 \le \sqrt{n}\). Note that the accuracy of the statistical estimate does not vary with the residual norm.
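This identity is easy to verify numerically for the problem family above, using the fact that the singular values of \(R\) are \(((n-k+1)/n)^l\), \(k=1,\dots ,n\); the values of \(n\) and \(l\) below are arbitrary.

```python
import numpy as np

n, l = 50, 1.5                                             # arbitrary illustration values
sigma_inv = (n / np.arange(n, 0, -1, dtype=float)) ** l    # singular values of R^{-1}
lhs = np.sum(sigma_inv**2) / np.max(sigma_inv)**2          # ||R^{-1}||_F^2 / ||R^{-1}||_2^2
rhs = np.sum(1.0 / np.arange(1, n + 1, dtype=float) ** (2 * l))
print(np.isclose(lhs, rhs))                                # True
```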

Componentwise Condition Estimation. Figure 1 depicts the conditioning of all the LLS solution components, computed as \(\kappa _i/|x_i|\) where \(\kappa _i\) is obtained using Eq. (10). Figure 1(a) and (b) correspond to random LLS problems with respectively \({{\mathrm{cond}}}(A)=2.5 \cdot 10^3\) and \({{\mathrm{cond}}}(A)=2.5 \cdot 10^9\). These figures show the value of the componentwise approach, since the sensitivity to perturbations varies significantly from one solution component to another (from \(10^2\) to \(10^8\) for \({{\mathrm{cond}}}(A)=2.5 \cdot 10^3\), and from \(10^7\) to \(10^{16}\) for \({{\mathrm{cond}}}(A) = 2.5 \cdot 10^9\)). The normalized condition number of the solution computed using Eq. (3) is \(\kappa _{LS}/\left\| {x} \right\| _2 = 2.5 \cdot 10^3\) for \({{\mathrm{cond}}}(A)=2.5 \cdot 10^3\) and \(\kappa _{LS}/\left\| {x} \right\| _2 = 4.5 \cdot 10^{10}\) for \({{\mathrm{cond}}}(A) = 2.5 \cdot 10^9\), which in both cases greatly overestimates or underestimates the conditioning of some components. Note that the LLS sensitivity is here well measured by \({{\mathrm{cond}}}(A)\), since \(\left\| {r} \right\| _2\) is small compared to \(\left\| {A} \right\| _2\) and \(\left\| {x} \right\| _2\), as expected from [25] (otherwise it would be measured by \({{\mathrm{cond}}}(A)^2\)).

Fig. 1. Componentwise condition numbers of LLS (problem size \(9984 \times 2496\))

In Fig. 2 we plot, for each solution component, the ratio between the statistical condition estimate computed via Algorithm 2 with two samples (\(q=2\)) and the exact value computed using Eq. (10). The ratio is averaged over 100 random problems. We observe that this ratio is lower than 1.2 for the case \({{\mathrm{cond}}}(A) = 2.5 \cdot 10^3\) (Fig. 2(a)) and close to 1 for the case \({{\mathrm{cond}}}(A) = 2.5 \cdot 10^9\) (Fig. 2(b)), which also confirms that, similarly to \(\overline{\kappa }_{LS}\) in Sect. 3.1, the statistical condition estimate is more accurate for larger values of \({{\mathrm{cond}}}(A)\).

Fig. 2. Comparison between componentwise exact and statistical condition numbers

4 Conclusion

We illustrated how condition numbers of a full column rank LLS problem can be computed using exact formulas or statistical estimates at an affordable flop count. Numerical experiments on random LLS problems showed that the statistical estimates provide good accuracy using only two random orthogonal vectors (\(q=2\)). As a follow-up to this work, new routines will be proposed in the public domain libraries LAPACK and MAGMA [24] to compute exact values and statistical estimates of LLS conditioning.