1 Introduction

To quickly construct low-rank matrix approximations, it is advantageous to use CUR (or CGR) cross approximation based on a small number of columns \(C \in {\mathbb {R}}^{M \times r}\) and rows \(R \in {\mathbb {R}}^{r \times N}\) of the original matrix \(A \in {\mathbb {R}}^{M\times N}.\) The generator U is commonly chosen as \(U = {\hat{A}}^{-1},\) where \({\hat{A}} \in {\mathbb {R}}^{r\times r}\) is the submatrix at the intersection of the rows R and columns C. The approximation \(C {\hat{A}}^{-1} R\) has an important interpretation: it is equivalent to an incomplete LU decomposition of the matrix A with rows and columns rearranged accordingly. In other words, it is given by the first r steps of Gaussian elimination, with the initial rows and columns chosen to correspond to the submatrix \({\hat{A}}.\) In practice, \(C {\hat{A}}^{-1} R\) approximations are built using the adaptive cross approximation (ACA) [1] or the maxvol [2] algorithm. They also play an important role in the construction of tensor trains [3]. If the resulting submatrix turns out to be a submatrix of locally maximum volume (we will give the exact definition later), then there exist error bounds for the \(C {\hat{A}}^{-1} R\) approximation [4, 5].

Note that in many papers [4, 6, 7] the emphasis is placed on the submatrices of maximum volume. On the other hand, in order to achieve the corresponding estimates, it already suffices to find a submatrix of locally maximum volume. The difference between them is very significant, since finding a submatrix of the maximum volume or a volume close to the maximum is an NP-hard problem [8, 9]. Until now, it was not clear whether there is an algorithm that can find a submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) of almost locally maximum volume in the matrix \(A \in {\mathbb {R}}^{M \times N}\) of a general form in polynomial time. In this paper, such an algorithm is presented for the first time, estimates for the number of steps are given and the accuracy of the obtained approximation is proved. Obtaining a submatrix of almost locally maximum volume allows us to construct cross and incomplete LU decompositions with guaranteed accuracy in the spectral norm and Chebyshev norms.

Before we go any further, it is important to mention that for symmetric positive semi-definite (SPSD) matrices the search for a locally maximum volume submatrix is much easier, as one can consider only principal submatrices. A \(\rho \)-locally maximum volume principal submatrix can be found using the strong rank revealing Cholesky factorization algorithm [10]. Since any SPSD matrix \(A \in {\mathbb {R}}^{N \times N}\) can be written in the form \(B^T B,\) \(B \in {\mathbb {R}}^{N \times N},\) the search for the locally maximum volume principal submatrix is equivalent to the search for an \(N \times r\) submatrix of locally maximum volume columns in B,  which requires \(O \left( r \log _{\rho } r \right) \) column exchanges [11, Theorem 2]. Consequently, strong rank revealing Cholesky factorization can also be achieved in \(O \left( r \log _{\rho } r \right) \) exchanges, which require in total \(O \left( N r^2 \log _{\rho } r \right) \) operations when using the algorithm from [10]. This complexity bound has been proven directly for SPSD matrices in [12] (Section 2.2.3).

2 Definitions

We will consider the problem of finding the submatrix \({\hat{A}}\) in terms of constructing the corresponding incomplete LU decomposition. Incomplete LU decomposition can serve as a criterion to estimate the rank of a matrix if the rows and columns are chosen appropriately. Such LU decompositions are called rank revealing LU (RRLU).

Similarly, RRQR is an abbreviation for the rank revealing QR decomposition. Strong RRQR decomposition plays an important role in constructing estimates and algorithms for RRLU decompositions.

Definition 1

Incomplete QR decomposition with column pivoting

$$\begin{aligned} A \Pi = Q\left[ {\begin{array}{ll} {R}&{}B \\ {}&{}C \end{array}} \right] \in {\mathbb {R}}^{M \times N}, \end{aligned}$$

where \(Q \in {\mathbb {R}}^{M \times M}\) is orthogonal and \(R \in {\mathbb {R}}^{r \times r}\) is upper triangular, is called “rank revealing” if

$$\begin{aligned} \sigma _r (R) \geqslant \sigma _r (A) / p_1(r,N) \end{aligned}$$

and

$$\begin{aligned} \sigma _1 (C) \leqslant \sigma _{r+1} (A) p_2(r, N), \end{aligned}$$

where \(p_1(r, N)\) and \(p_2(r, N)\) are polynomials or some functions which can be bounded by polynomials in r and N.

It is called “spectrum revealing” if also

$$\begin{aligned} \sigma _i (R) \geqslant \sigma _i (A) / p_1(r,N), \quad i = \left\{ 1, \ldots , r-1 \right\} . \end{aligned}$$

It is called “strong” [13] if also

$$\begin{aligned} \sigma _i (C) \leqslant \sigma _{r+i} (A) p_2(r, N), \quad i = \left\{ 1, \ldots , \min (M-r, N-r) \right\} \end{aligned}$$

and

$$\begin{aligned} \left\| R^{-1} B \right\| _C \leqslant f = {{\textrm{const}}}, \end{aligned}$$

where

$$\begin{aligned} \left\| A \right\| _C = \max \limits _{i,j} \left|A_{ij} \right|\end{aligned}$$

is the Chebyshev norm.

Similar definitions apply for RRLU.

Definition 2

Incomplete LU decomposition with row and column pivoting

$$\begin{aligned} \Pi _1 A \Pi _2 = \left[ {\begin{array}{ll} {L_{11}}&{} \\ {L_{21}}&{}I_{M-r} \end{array}} \right] \left[ {\begin{array}{ll} {U_{11}}&{}U_{12} \\ {}&{}U_{22} \end{array}} \right] \in {\mathbb {R}}^{M \times N}, \end{aligned}$$
(1)

where \(L_{11} \in {\mathbb {R}}^{r \times r}\) is unit lower triangular and \(U_{11} \in {\mathbb {R}}^{r \times r}\) is upper triangular, is called “rank revealing” if

$$\begin{aligned} \sigma _r (L_{11} U_{11}) \geqslant \sigma _r (A) / p_1(r,M,N) \end{aligned}$$

and

$$\begin{aligned} \sigma _1 (U_{22}) \leqslant \sigma _{r+1} (A) p_2(r, M, N), \end{aligned}$$

where \(p_1(r, M, N)\) and \(p_2(r, M, N)\) are polynomials or some functions which can be bounded by polynomials in r,  M and N.

It is called “spectrum revealing” [14] if also

$$\begin{aligned} \sigma _i (L_{11} U_{11}) \geqslant \sigma _i (A) / p_1(r,M,N), \quad i = \left\{ 1, \ldots , r-1 \right\} . \end{aligned}$$

It is called “strong” [15] if also

$$\begin{aligned}{} & {} \sigma _i (U_{22}) \leqslant \sigma _{r+i} (A) p_2(r, M, N), \quad i = \left\{ 1, \ldots , \min (M-r, N-r) \right\} , \end{aligned}$$
(2)
$$\begin{aligned}{} & {} \left\| L_{21} L_{11}^{-1} \right\| _C \leqslant f = {\textrm{const}} \end{aligned}$$
(3)

and

$$\begin{aligned} \left\| U_{11}^{-1} U_{12} \right\| _C \leqslant f = {\textrm{const}}. \end{aligned}$$
(4)

Construction of RRQR and RRLU decompositions can be considered equivalent to choosing r columns in the case of RRQR and r rows and columns in the case of RRLU, which are given by the permutations \(\Pi ,\) \(\Pi _1\) and \(\Pi _2,\) respectively. From an algorithmic perspective, searching for these permutations usually takes more time than constructing RRLU or RRQR based on rows and columns that are already given. In particular, after choosing columns, constructing an incomplete QR decomposition takes \(O \left( \text {nnz} \left( A \right) r + M r^2 \right) \) operations, where \(\text {nnz} \left( A \right) \) is the number of nonzero elements of the matrix A. The construction of the factors \(L = \left[ {\begin{array}{ll} {L_{11}} \\ {L_{21}} \end{array}} \right] \) and \(U = \left[ {\begin{array}{ll} {U_{11}}&U_{12} \end{array}} \right] \) in the incomplete LU decomposition requires only \(O \left( \left( M + N \right) r^2 \right) \) arithmetic operations (when the rows and columns corresponding to \(L_{11}\) and \(U_{11}\) are known). Due to the linear complexity of RRLU in terms of matrix sizes, algorithms for finding a large volume submatrix are often used, which also have linear complexity [1, 2]. They do not, however, guarantee high approximation accuracy precisely because that linear complexity does not allow them to consider the entire matrix and select close-to-optimal rows and columns.
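As an illustration of this last step, the sketch below (Python with NumPy/SciPy, added here for illustration and not part of the paper's Fortran code; the index sets `I` and `J` are assumed to be already chosen) assembles the cross approximation \(C {\hat{A}}^{-1} R,\) which coincides with the product LU of the factors above; the dominant cost is applying the factored \(r \times r\) block to the selected rows.

```python
import numpy as np
import scipy.linalg as sla

def cross_factors(A, I, J):
    # Cross (CUR) factors built on the rows I and columns J of A:
    # A ~ C @ U with C = A[:, J] and U = Ahat^{-1} @ A[I, :], i.e. the result
    # of r steps of Gaussian elimination with pivots taken inside Ahat = A[I, J].
    C = A[:, J]                          # M x r selected columns
    R = A[I, :]                          # r x N selected rows
    Ahat = A[np.ix_(I, J)]               # r x r block at the intersection
    lu, piv = sla.lu_factor(Ahat)        # O(r^3)
    U = sla.lu_solve((lu, piv), R)       # Ahat^{-1} R, costs O(N r^2)
    return C, U                          # factors assembled in O((M + N) r^2)

# hypothetical usage on a random matrix of exact rank 10
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 200))
C, U = cross_factors(A, np.arange(10), np.arange(10))
print(np.linalg.norm(A - C @ U))         # close to machine precision here
```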

As will be shown, to achieve strong RRLU decomposition, it suffices to choose permutations \(\Pi _1\) and \(\Pi _2\) so that the first r rows and columns correspond to a submatrix of \(\rho \)-locally maximum volume, \(\rho = {\textrm{const}}.\) Thus, the problem of constructing a strong RRLU decomposition is reduced to finding a submatrix of \(\rho \)-locally maximum volume and proving the “strong” properties of the corresponding decomposition.

Definition 3

The volume of the matrix \({\hat{A}} \in {\mathbb {R}}^{n \times r},\) \(n \geqslant r,\) is the following quantity:

$$\begin{aligned} {{\mathcal {V}}}\left( {\hat{A}} \right) = \sqrt{\det \left( {\hat{A}}^T {\hat{A}} \right) }. \end{aligned}$$

In the case of a square matrix \((n = r),\)

$$\begin{aligned} {{\mathcal {V}}}\left( {\hat{A}} \right) = \left|\det {\hat{A}} \right|. \end{aligned}$$

Definition 4

A full rank submatrix \({\hat{A}} \in {\mathbb {R}}^{m \times n}\) of a matrix \(A \in {\mathbb {R}}^{M \times N}\) is said to have \(\rho \)-locally maximum volume (hereinafter, estimates with \(\rho \geqslant 1\) are considered) if the replacement of its arbitrary i-th row and/or j-th column with any other row k and/or column l of the matrix A increases its volume by at most \(\rho \) times.

A submatrix is said to have locally maximum volume if \(\rho = 1.\)

Note that, unlike the definition in [16], one row and one column can be replaced at the same time.
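For reference, Definition 3 translates directly into code; the following minimal NumPy sketch (an illustration added here, with `slogdet` used to avoid under- and overflow for larger r) computes the volume.

```python
import numpy as np

def volume(A_hat):
    # Volume of an n x r matrix (n >= r): sqrt(det(A_hat^T A_hat));
    # for a square matrix this reduces to |det(A_hat)|.
    sign, logdet = np.linalg.slogdet(A_hat.T @ A_hat)
    return 0.0 if sign <= 0 else np.exp(0.5 * logdet)
```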

Taking into account the definitions above, we can now formulate the main result of this paper.

Theorem 1

Strong RRLU decomposition of rank r of the matrix \(A \in {\mathbb {R}}^{M \times N}\) can be constructed on the basis of a submatrix of \(3 \rho \)-locally maximum volume of size \(r \times r.\) Moreover,  such a submatrix can be found in \(O \left( MNr \left( \log r + \log _{\rho } r \right) \right) \) operations,  and \(\rho \)-locally maximum volume submatrix with \(\rho \leqslant 3\) can be found in \(O \left( MN r^3 \log _{\rho } r \right) \) operations.

Before we move on to construct a decomposition, which satisfies Theorem 1, it is worth paying attention to other existing methods for constructing RRQR and RRLU decompositions with proven bounds on the number of steps and approximation error, since they can also be used to obtain the required spectral norm approximation accuracy.

3 Strong RRQR

We start with RRQR decomposition, since it can be used as a first step to constructing RRLU decomposition. The following theorem was proved in [13].

Theorem 2

[13] Let the submatrix \(A_1 \in {\mathbb {R}}^{M \times r}\) consisting of r columns of the matrix \(A \in {\mathbb {R}}^{M \times N}\) have \(\rho \)-locally maximum volume.

Then,  based on the columns \(A_1,\) one can construct strong RRQR decomposition with

$$\begin{aligned} \left\| R^{-1} B \right\| _C \leqslant \rho \end{aligned}$$

and

$$\begin{aligned} p_1(r, N) = p_2(r, N) = \sqrt{1 + \rho ^2 r (N-r)}. \end{aligned}$$

Moreover,  a \(\rho \)-locally maximum volume submatrix can be found in \(O\left( \left( nnz(A)r + \left( M+N \right) r^2 \right) \left( 1 + \log _{\rho } r \right) \right) \) operations,  where nnz(A) is the number of nonzero elements in the matrix A.

The complexity bound was first proved in [17] and then improved in [11].

4 Rank revealing LU

Now we move on to constructing RRLU decomposition. Recall the main theorem from [16] and an algorithm for obtaining bounds from it, based on strong RRQR [13]. Note that an alternative algorithm suggested in [16] does not have a bound on the number of steps.

Theorem 3

[16] Let the submatrix \(A_1 \in {\mathbb {R}}^{M \times r}\) consisting of r columns of the matrix \(A \in {\mathbb {R}}^{M \times N}\) have \(\rho \)-locally maximum volume in A. Let the submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) have \(\rho \)-locally maximum volume in \(A_1.\)

Then,  based on the columns \(A_1\) and the rows corresponding to \({\hat{A}},\) one can construct a spectrum revealing LU decomposition with

$$\begin{aligned} p_1(r, M, N) = p_2(r, M, N) = \sqrt{1 + \rho ^2 r (N-r)}\sqrt{1 + \rho ^2 r (M-r)}. \end{aligned}$$

In addition, 

$$\begin{aligned} \left\| L_{21} L_{11}^{-1} \right\| _C \leqslant \rho \end{aligned}$$

and

$$\begin{aligned} \left\| A_1^{\dagger } A \right\| _C \leqslant \rho , \end{aligned}$$

where \(A_1^{\dagger }\) denotes the Moore–Penrose pseudoinverse of \(A_1.\)

Moreover,  submatrices \(A_1\) and \({\hat{A}}\) can be found in \(O\left( (nnz(A)r + (M+N)r^2) \left( 1 + \log _{\rho } r \right) \right) \) operations,  where nnz(A) is the number of nonzero elements of the matrix A.

The bound on the complexity follows directly from Theorem 2.

Since in [16] it was not proved that the decomposition is spectrum revealing, we prove it here. To do this, we will apply spectrum revealing estimates for RRQR twice.

Using the spectrum revealing property of QR based on the submatrix \(A_1\) of \(\rho \)-locally maximum volume (Theorem 2), we obtain the inequality

$$\begin{aligned} \sigma _i(A_1) \geqslant \sigma _i \left( A \right) / \sqrt{1 + \rho ^2 r (N-r)}. \end{aligned}$$
(5)

Next, the columns of \(A_1^T \in {\mathbb {R}}^{r \times M}\) contain the submatrix \({\hat{A}}^T \in {\mathbb {R}}^{r \times r}\) of \(\rho \)-locally maximum volume, and the QR decomposition \(A_1^T \Pi = Q_1 \left[ R_1 \; B_1 \right] ,\) where \(Q_1 R_1 = {\hat{A}}^T,\) according to Theorem 2 has the following spectrum revealing property:

$$\begin{aligned} \sigma _i \left( {\hat{A}}^T \right) \geqslant \sigma _i \left( A_1^T \right) / \sqrt{1 + \rho ^2 r (M-r)}. \end{aligned}$$
(6)

Since \({\hat{A}} = L_{11} U_{11},\) and \(A_1 = \left[ {\begin{array}{ll} {{L_{11}}} \\ {{L_{21}}} \end{array}} \right] {U_{11}},\) then, combining (6) and (5) together leads to the following inequality:

$$\begin{aligned} \sigma _i (L_{11} U_{11})&\geqslant \sigma _i \left( \left[ {\begin{array}{ll} {{L_{11}}} \\ {{L_{21}}} \end{array}} \right] {U_{11}}\right) / \sqrt{1 + \rho ^2 r (M-r)} \nonumber \\&= \sigma _i \left( A_1 \right) / \sqrt{1 + \rho ^2 r (M-r)} \nonumber \\&\geqslant \sigma _i \left( A \right) / \left( \sqrt{1 + \rho ^2 r (M-r)} \sqrt{1 + \rho ^2 r (N-r)} \right) , \end{aligned}$$
(7)

which proves the spectrum revealing property of the resulting RRLU decomposition.

The columns \(A_1\) can be found numerically using the RRQR algorithm from [13], while the submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) in the already fixed columns \(A_1 \in {\mathbb {R}}^{M \times r}\) can also be found using the maxvol [2] algorithm.
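A rough sketch of this two-stage selection is given below (an illustration only: ordinary column-pivoted QR is used as a simplified stand-in for the strong RRQR of [13], and a short maxvol-style sweep as a stand-in for the full maxvol algorithm of [2]; the function name and tolerance are ours).

```python
import numpy as np
import scipy.linalg as sla

def columns_then_rows(A, r, max_sweeps=100):
    # Stage 1: pick r columns by column-pivoted QR
    # (a simplified stand-in for strong RRQR).
    _, _, col_piv = sla.qr(A, mode='economic', pivoting=True)
    J = col_piv[:r].copy()
    A1 = A[:, J]                                   # M x r selected columns
    # Stage 2: pick r rows inside A1 by maxvol-style exchanges,
    # starting from a pivoted-QR guess on A1^T.
    _, _, row_piv = sla.qr(A1.T, mode='economic', pivoting=True)
    I = row_piv[:r].copy()
    for _ in range(max_sweeps):
        B = A1 @ np.linalg.inv(A1[I, :])           # equals C Ahat^{-1}
        k, i = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[k, i]) <= 1.0 + 1e-12:            # no row swap grows the volume
            break
        I[i] = k                                   # swap multiplies the volume by |B[k, i]|
    return I, J
```

Each swap in the second stage multiplies the volume of the current \(r \times r\) submatrix by \(\left| \left( C {\hat{A}}^{-1} \right) _{ki} \right|,\) in line with Lemma 1 below.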

Although we are not aware of an example showing that Theorem 3 may fail to provide the strong RRLU bounds (2), it can definitely violate the condition (4), which, as we will see later, is crucial for proving the bounds (2).

For example, consider the matrix

$$\begin{aligned} A = \left[ {\begin{array}{llll} {1 + \varepsilon } &{} {\sqrt{N} } &{} \cdots &{} {\sqrt{N} } \\ 1 &{} 0 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 1 &{} 0 &{} \cdots &{} 0 \\ \end{array} } \right] \in {{\mathbb {R}}^{N \times N}} \end{aligned}$$

with arbitrarily small \(\varepsilon \) and its rank 1 approximation, based on Theorem 3. First, we select the locally maximum volume column, i.e., the column with the maximum length. This is always going to be the first column. Then we select the \(1 \times 1\) maximum volume submatrix in this column, which is going to be the element \(1 + \varepsilon .\) It differs from the locally maximum volume submatrix in the first row by a factor arbitrarily close to \(\sqrt{N}.\) Namely, we have \(f = \sqrt{N}/(1 + \varepsilon ),\) which depends on the matrix size and thus contradicts the “strong” property (4). Nevertheless, it does not directly prevent (2), which may or may not hold.
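The effect is easy to reproduce numerically; the short sketch below (with illustrative values \(N = 1000\) and \(\varepsilon = 10^{-6}\)) builds the matrix above and evaluates the resulting f for the rank-1 cross built on the \((1,1)\) entry.

```python
import numpy as np

N, eps = 1000, 1e-6
A = np.zeros((N, N))
A[0, 0] = 1.0 + eps
A[0, 1:] = np.sqrt(N)          # first row: 1 + eps, sqrt(N), ..., sqrt(N)
A[1:, 0] = 1.0                 # first column below the pivot

# rank-1 cross on the (1, 1) entry, selected as described above
f = np.max(np.abs(A[0, :])) / A[0, 0]       # equals ||Ahat^{-1} R||_C
print(f, np.sqrt(N) / (1.0 + eps))          # both are approximately sqrt(N)
```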

5 Strong RRLU

As we have seen, rank revealing LU from the previous section provides reasonable approximation accuracy in the spectral norm, from which the Chebyshev norm bound can also be derived. Nevertheless, finding close to locally maximum volume submatrix not only additionally guarantees the “strong” properties (2)–(4), but also provides a much better Chebyshev norm bound, which does not depend on matrix sizes [5].

The idea of searching for a submatrix of locally maximum volume in the entire matrix was first proposed in [15]. In addition, the following criterion was proved. Here Eqs. (8) and (9) provide the same bounds as Eqs. (3) and (4).

Lemma 1

[15] Consider a matrix \(A \in {\mathbb {R}}^{M \times N}\) with the following block structure

$$\begin{aligned} A = \left[ {\begin{array}{ll} {{A_{11}}}&{}{{A_{12}}} \\ {{A_{21}}}&{}{{A_{22}}} \end{array}} \right] , \end{aligned}$$

with

$$\begin{aligned} C = \left[ {\begin{array}{ll} {{A_{11}}} \\ {{A_{21}}} \end{array}} \right] \in {\mathbb {R}}^{M \times r}, \end{aligned}$$

and

$$\begin{aligned} R = \left[ {\begin{array}{ll} {{A_{11}}}&{{A_{12}}} \end{array}} \right] \in {\mathbb {R}}^{r \times N}. \end{aligned}$$

Then the submatrix \({\hat{A}} = A_{11} \in {\mathbb {R}}^{r \times r}\) has \(\rho \)-locally maximum volume if and only if the following inequalities hold:

$$\begin{aligned}{} & {} \left\| {\hat{A}}^{-1} R \right\| _C \leqslant \rho , \end{aligned}$$
(8)
$$\begin{aligned}{} & {} \left\| C {\hat{A}}^{-1} \right\| _C \leqslant \rho , \end{aligned}$$
(9)
$$\begin{aligned}{} & {} \mathop {\max }\limits _{i,j,k,l} \left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{jl}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{ki}} + {\hat{A}}_{ji}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{kl}}} \right|\leqslant \rho . \end{aligned}$$
(10)

Moreover,  when the i-th row of the submatrix \({\hat{A}}\) is replaced by the k-th row of the matrix A,  the volume of the submatrix is multiplied by the factor \(\left|\left( C {\hat{A}}^{-1} \right) _{ki} \right|;\) when the j-th column of the submatrix \({\hat{A}}\) is replaced by the l-th column of the matrix A,  the volume is multiplied by the factor \(\left|\left( {\hat{A}}^{-1} R \right) _{jl} \right|;\) when replacing a row and a column,  the volume is multiplied by the factor \(\left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{jl}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{ki}} + {\hat{A}}_{ji}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{ kl}}} \right|.\) Hereinafter \({\hat{A}}_{ji}^{-1}\) denotes the ji-th element of \({\hat{A}}^{-1}.\)
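The exchange factors in Lemma 1 are straightforward to check numerically. The following small random experiment (an added illustration, with the submatrix taken in the top-left corner and arbitrary illustrative indices) compares the predicted row-exchange factor \(\left| \left( C {\hat{A}}^{-1} \right) _{ki} \right|\) with the actual ratio of volumes.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, r = 8, 7, 3
A = rng.standard_normal((M, N))
I, J = np.arange(r), np.arange(r)          # take the leading r x r block as Ahat
Ahat, C = A[np.ix_(I, J)], A[:, J]

k, i = 5, 1                                # replace the i-th row of Ahat by row k of A
I_new = I.copy(); I_new[i] = k
ratio = abs(np.linalg.det(A[np.ix_(I_new, J)])) / abs(np.linalg.det(Ahat))
predicted = abs((C @ np.linalg.inv(Ahat))[k, i])
print(ratio, predicted)                    # agree up to rounding errors
```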

Finally, the following theorem was proved.

Theorem 4

[15] Based on rows \(R \in {\mathbb {R}}^{r \times N}\) and columns \(C \in {\mathbb {R}}^{M \times r}\) corresponding to submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) of matrix \(A \in {\mathbb {R}}^{M \times N}\) such that

$$\begin{aligned}{} & {} \left\| {\hat{A}}^{-1} R \right\| _C \leqslant \sqrt{\rho },\\{} & {} \left\| C {\hat{A}}^{-1} \right\| _C \leqslant \sqrt{\rho } \end{aligned}$$

and

$$\begin{aligned} \left\| {\hat{A}}^{-1} \right\| _C \left\| A - C {\hat{A}}^{-1} R \right\| _C \leqslant 2 \rho \end{aligned}$$
(11)

(it is also sufficient that \({\hat{A}}\) has \(\sqrt{\rho }\)-locally maximum volume in the entire matrix) one can construct a strong RRLU with

$$\begin{aligned} f= & {} \sqrt{\rho },\\ p_1(r, M, N)= & {} p_2(r, M, N) = \sqrt{(1 + 3 \rho r (M-r))(1 + 3 \rho r (N-r))}. \end{aligned}$$

In [15] the result was formulated less rigorously, so we repeat its proof here.

Proof

First, we prove that

$$\begin{aligned} \sigma _i \left( U_{22} \right) \leqslant \sqrt{ \left( 1 + 3 \rho r \left( M-r \right) \right) \left( 1 + 3 \rho r \left( N-r \right) \right) } \sigma _{r+i} \left( A \right) = \sigma _{r+i} \left( A \right) p_2 \left( r, M, N \right) . \end{aligned}$$

Like in Lemma 1, we can assume \({\hat{A}}\) is on the intersection of the first r rows and columns, \({\hat{A}} = A_{11},\) where the entire matrix A can be written in block form as

$$\begin{aligned} A = \left[ {\begin{array}{ll} {{A_{11}}}&{}{{A_{12}}} \\ {{A_{21}}}&{}{{A_{22}}} \end{array}} \right] . \end{aligned}$$

Then \(U_{22} = A_{22} - A_{21} A_{11}^{-1} A_{12}\) and submatrix \({\hat{A}} = A_{11}\) corresponds to the rows \(R = \left[ {\begin{array}{ll} {{A_{11}}}&{{A_{12}}} \end{array}} \right] \) and columns \(C =\left[ {\begin{array}{ll} {{A_{11}}} \\ {{A_{21}}} \end{array}} \right] .\) The error of the corresponding cross approximation is

$$\begin{aligned} A - C {\hat{A}}^{-1} R = \left[ {\begin{array}{ll} {0}&{}{0} \\ {0}&{}{A_{22} - A_{21} A_{11}^{-1} A_{12}} \end{array}} \right] = \left[ {\begin{array}{ll} {0}&{}{0} \\ {0}&{}{U_{22}} \end{array}} \right] . \end{aligned}$$

Consider the matrix

$$\begin{aligned} D = \left[ {\begin{array}{ll} {ab A_{11}}&{}{0} \\ {0}&{}{A_{22} - A_{21} A_{11}^{-1} A_{12}} \end{array}} \right] , \end{aligned}$$
(12)

where

$$\begin{aligned} a= & {} \sqrt{2\rho r \left( M-r \right) }, \\ b= & {} \sqrt{2\rho r \left( N-r \right) }. \end{aligned}$$

Observe that

$$\begin{aligned} \left\| A_{11}^{-1} \right\| _2 \Vert A_{22} - A_{21} A_{11}^{-1} A_{12} \Vert _2= & {} \left\| {\hat{A}}^{-1} \right\| _2 \Vert A - C {\hat{A}}^{-1} R \Vert _2\nonumber \\\leqslant & {} r \left\| {\hat{A}}^{-1} \right\| _C \cdot \sqrt{\left( M-r \right) \left( N-r \right) } \Vert A - C {\hat{A}}^{-1} R \Vert _C \nonumber \\\leqslant & {} 2 \rho r \sqrt{\left( M-r \right) \left( N-r \right) } \nonumber \\= & {} ab, \end{aligned}$$
(13)

where we used condition (11). Hereinafter

$$\begin{aligned} \left\| A \right\| _2 = \sigma _1 \left( A \right) \end{aligned}$$

denotes the spectral norm.

Then, using (13), we can write the following inequalities for the singular values, corresponding to the submatrices of D:

$$\begin{aligned} \sigma _{\min } \left( ab A_{11} \right) = ab \sigma _{\min } \left( A_{11} \right) = ab / \left\| A_{11}^{-1} \right\| _2 \geqslant \Vert A_{22} - A_{21} A_{11}^{-1} A_{12} \Vert _2. \end{aligned}$$

Since \(\textrm{rank}\hspace{2.0pt}A_{11} = r,\) the first r largest singular values of D (12) correspond to the submatrix \(ab A_{11},\) so for the rest of its singular values

$$\begin{aligned} \sigma _{r+i} \left( D \right) = \sigma _i \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right) . \end{aligned}$$

Now, let us factorize matrix D as follows:

$$\begin{aligned} D = \left[ {\begin{array}{ll} {ab A_{11}}&{}{0} \\ {0}&{}{A_{22} - A_{21} A_{11}^{-1} A_{12}} \end{array}} \right] = \left[ {\begin{array}{ll} {a I}&{}{0} \\ {-A_{21} A_{11}^{-1}}&{}{I} \end{array}} \right] \left[ {\begin{array}{ll} {{A_{11}}}&{}{{A_{12}}} \\ {{A_{21}}}&{}{{A_{22}}} \end{array}} \right] \left[ {\begin{array}{ll} {b I}&{}{-A_{11}^{-1} A_{12}} \\ {0}&{}{I} \end{array}} \right] . \end{aligned}$$

Then

$$\begin{aligned}{} & {} \sigma _i \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right) = \sigma _{r+i} \left( D \right) \nonumber \\{} & {} \quad \leqslant \left\| \left[ {\begin{array}{ll} {a I}&{}{0} \\ {-A_{21} A_{11}^{-1}}&{}{I} \end{array}} \right] \right\| _2 \sigma _{r+i} \left( \left[ {\begin{array}{ll} {{A_{11}}}&{}{{A_{12}}} \\ {{A_{21}}}&{}{{A_{22}}} \end{array}} \right] \right) \left\| \left[ {\begin{array}{ll} {b I}&{}{-A_{11}^{-1} A_{12}} \\ {0}&{}{I} \end{array}} \right] \right\| _2. \end{aligned}$$
(14)

Spectral norms of the block matrices can be estimated as

$$\begin{aligned} \left\| \left[ {\begin{array}{ll} {a I}&{}{0} \\ {-A_{21} A_{11}^{-1}}&{}{I} \end{array}} \right] \right\| _2^2\leqslant & {} \left\| a I \right\| _2^2 + \left\| -A_{21} A_{11}^{-1} \right\| _2^2 + \left\| I \right\| _2^2\nonumber \\\leqslant & {} 1 + a^2 + r(M-r) \left\| A_{21} A_{11}^{-1} \right\| _C^2\nonumber \\\leqslant & {} 1 + 2 \rho r \left( M-r \right) + r \left( M-r \right) \rho \nonumber \\= & {} 1 + 3 \rho r \left( M-r \right) . \end{aligned}$$
(15)

Similarly,

$$\begin{aligned} \left\| \left[ {\begin{array}{ll} {b I}&{}{- A_{11}^{-1} A_{12}} \\ {0}&{}{I} \end{array}} \right] \right\| _2^2 \leqslant 1 + 3 \rho r \left( N-r \right) . \end{aligned}$$
(16)

Substituting (15) and (16) into (14) leads us to the desired inequality:

$$\begin{aligned} \sigma _i \left( U_{22} \right)= & {} \sigma _i \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right) \\\leqslant & {} \sqrt{ \left( 1 + 3 \rho r \left( M-r \right) \right) \left( 1 + 3 \rho r \left( N-r \right) \right) } \sigma _{r+i} \left( A \right) \\= & {} p_2 \left( r, M, N \right) \sigma _{r+i} \left( A \right) . \end{aligned}$$

Next, we prove the “spectrum revealing” property for \(p_1 \left( r, M, N \right) = p_2 \left( r, M, N \right) \):

$$\begin{aligned} \sigma _i \left( L_{11} U_{11} \right) = \sigma _i \left( A_{11} \right) \geqslant \sigma _i \left( A \right) / \sqrt{ \left( 1 + 3 \rho r \left( M-r \right) \right) \left( 1 + 3 \rho r \left( N-r \right) \right) }. \end{aligned}$$
(17)

To do that, we can use the following factorization:

$$\begin{aligned} A = \left[ {\begin{array}{ll} {{A_{11}}}&{}{{A_{12}}} \\ {{A_{21}}}&{}{{A_{22}}} \end{array}} \right] = \left[ {\begin{array}{ll} {I}&{}{0} \\ {A_{21} A_{11}^{-1}}&{}{aI} \end{array}} \right] \left[ {\begin{array}{ll} {A_{11}}&{}{0} \\ {0}&{}{\frac{1}{ab} \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right) } \end{array}} \right] \left[ {\begin{array}{ll} {I}&{}{A_{11}^{-1} A_{12}} \\ {0}&{}{bI} \end{array}} \right] . \end{aligned}$$

Then

$$\begin{aligned}{} & {} \sigma _i \left( A \right) \nonumber \\{} & {} \quad \leqslant \left\| \left[ {\begin{array}{ll} {I}&{}{0} \\ {A_{21} A_{11}^{-1}}&{}{aI} \end{array}} \right] \right\| _2 \sigma _i \left( \left[ {\begin{array}{ll} {A_{11}}&{}{0} \\ {0}&{}{\frac{1}{ab} \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right) } \end{array}} \right] \right) \left\| \left[ {\begin{array}{ll} {I}&{}{A_{11}^{-1} A_{12}} \\ {0}&{}{bI} \end{array}} \right] \right\| _2. \end{aligned}$$
(18)

We again estimate the spectral norms, similar to (15) and (16):

$$\begin{aligned} \left\| \left[ {\begin{array}{ll} {I}&{}{0} \\ {A_{21} A_{11}^{-1}}&{}{aI} \end{array}} \right] \right\| _2^2 \leqslant 1 + 3 \rho r \left( M-r \right) \end{aligned}$$
(19)

and

$$\begin{aligned} \left\| \left[ {\begin{array}{ll} {I}&{}{- A_{11}^{-1} A_{12}} \\ {0}&{}{bI} \end{array}} \right] \right\| _2^2 \leqslant 1 + 3 \rho r \left( N-r \right) . \end{aligned}$$
(20)

Moreover,

$$\begin{aligned} \sigma _i \left( \left[ {\begin{array}{ll} {A_{11}}&{}{0} \\ {0}&{}{\frac{1}{ab} \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right) } \end{array}} \right] \right) = \sigma _i \left( \frac{1}{ab} D \right) = \sigma _i \left( A_{11} \right) \end{aligned}$$
(21)

as now we have \(i \leqslant r,\) and we have already shown that the first r singular values correspond to the submatrix \(A_{11}.\) Combining together Eqs. (18) through (21), we obtain

$$\begin{aligned} \sigma _i \left( A \right) \leqslant \sqrt{ \left( 1 + 3 \rho r \left( M-r \right) \right) \left( 1 + 3 \rho r \left( N-r \right) \right) } \sigma _i \left( A_{11} \right) . \end{aligned}$$
(22)

Equation (17) follows directly from (22). \(\square \)

However, it is still not clear how to find a submatrix of \(\rho \)-locally maximum volume fast enough. The authors themselves state in [14] that there are still no effective algorithms for its search. Let us show that it is possible to reach the bounds of Theorem 4 fairly quickly.

We will start from the initial \(r \times r\) submatrix, constructed using Gaussian elimination with complete pivoting. For our purposes, we will require the following result, which is also useful for adaptive cross approximation [18].

Lemma 2

[18] Let the rows of the submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) be obtained by Gaussian elimination with partial pivoting in the columns \(C \in {\mathbb {R}}^{M \times r}.\) Then

$$\begin{aligned} \left\| C {\hat{A}}^{-1} \right\| _C \leqslant 2^{r-1}. \end{aligned}$$

Obviously, the same result holds for complete pivoting. With complete pivoting, we also have the analogous inequality \(\left\| {\hat{A}}^{-1} R \right\| _C \leqslant 2^{r-1}.\) Moreover, in [19] the following result was proved.

Lemma 3

[19] Let the rows \(R \in {\mathbb {R}}^{r \times N}\) and columns \(C \in {\mathbb {R}}^{M \times r}\) corresponding to submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) be the first r rows and columns selected by Gaussian elimination with complete pivoting for the matrix \(A \in {\mathbb {R}}^{M \times N}.\) Then

$$\begin{aligned} \left\| A - C {\hat{A}}^{-1} R \right\| _C \leqslant 4^r \gamma _r \sigma _{r+1} \left( A \right) , \end{aligned}$$

where \(\gamma _r \leqslant 2 \sqrt{r+1} (r+1)^{\frac{\ln (r+1)}{4}}\) is the growth factor.

Corollary 1

Under the conditions of Lemma 3,

$$\begin{aligned} \left\| A - C {\hat{A}}^{-1} R \right\| _C \leqslant 4^{r-1/2} r^{\frac{2 + \ln r}{4}} \left\| {\hat{A}}^{-1} \right\| _2^{-1}. \end{aligned}$$

Proof

Consider the submatrix \({\bar{A}} \in {\mathbb {R}}^{(r-1) \times (r-1)},\) obtained after \(r-1\) steps of Gaussian elimination. Applying Lemma 3 to the matrix \({\hat{A}}\) (instead of A), after \(r-1\) steps (instead of r) we have

$$\begin{aligned} \left\| {\hat{A}} - {\hat{C}} {\bar{A}}^{-1} {\hat{R}} \right\| _C \leqslant 4^{r-1} \gamma _{r-1} \sigma _r \left( {\hat{A}} \right) , \end{aligned}$$

where \({\hat{R}} \in {\mathbb {R}}^{(r-1) \times r}\) and \({\hat{C}} \in {\mathbb {R}}^{r \times (r-1)}\) are rows and columns of \({\hat{A}},\) corresponding to \({\bar{A}}.\)

Then, by definition of the growth factor, we can estimate the error at the r-th step of Gaussian elimination:

$$\begin{aligned} \left\| A - C {\hat{A}}^{-1} R \right\| _C\leqslant & {} \gamma _1 \left\| {\hat{A}} - {\hat{C}} {\bar{A}}^{-1} {\hat{R}} \right\| _C \\\leqslant & {} 2 \left\| {\hat{A}} - {\hat{C}} {\bar{A}}^{-1} {\hat{R}} \right\| _C \\\leqslant & {} 4^{r-1/2} \gamma _{r-1} \sigma _r \left( {\hat{A}} \right) \\\leqslant & {} 4^{r-1/2} r^{\frac{2 + \ln r}{4}} \sigma _r \left( {\hat{A}} \right) \\= & {} 4^{r-1/2} r^{\frac{2 + \ln r}{4}} \left\| {\hat{A}}^{-1} \right\| _2^{-1}. \end{aligned}$$

\(\square \)

Combining Lemma 2 and Corollary 1, we see that Gaussian elimination produces a \(\rho \)-locally maximum volume submatrix, albeit with quite a large \(\rho .\)

Corollary 2

Let the rows \(R \in {\mathbb {R}}^{r \times N}\) and columns \(C \in {\mathbb {R}}^{M \times r}\) corresponding to submatrix \({\hat{A}} \in {\mathbb {R}}^{r \times r}\) be the first r rows and columns selected by Gaussian elimination with complete pivoting for the matrix \(A \in {\mathbb {R}}^{M \times N}.\) Then \({\hat{A}}\) is a \(\rho \)-locally maximum volume submatrix with

$$\begin{aligned} \rho \leqslant 3 \cdot 4^{r-1} r^{\frac{2 + \ln r}{4}}. \end{aligned}$$

Proof

Comparing Lemma 2 with the conditions (8) and (9) of Lemma 1, we see that they are satisfied for \(\rho = 2^{r-1}.\) Let us now check the last condition, where we substitute the inequalities, provided by Lemma 2 and Corollary 1:

$$\begin{aligned}{} & {} \mathop {\max }\limits _{i,j,k,l} \left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{ik}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{jl}} + {\hat{A}}_{ij}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{lk}}} \right|\\{} & {} \quad \leqslant \left\| C {\hat{A}}^{-1} \right\| _C \left\| {\hat{A}}^{-1} R \right\| _C + \left\| {\hat{A}}^{-1} \right\| _C \left\| A - C {\hat{A}}^{-1} R \right\| _C \\{} & {} \quad \leqslant 4^{r-1} + 4^{r-1/2} r^{\frac{2 + \ln r}{4}} \\{} & {} \quad \leqslant 3 \cdot 4^{r-1} r^{\frac{2 + \ln r}{4}}. \end{aligned}$$

\(\square \)
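For concreteness, the complete-pivoting selection discussed above (which later serves as the starting point of the exchange algorithms) can be sketched as follows; this is a plain dense implementation on a working copy of A, added for illustration, not the paper's Fortran code.

```python
import numpy as np

def complete_pivoting_indices(A, r):
    # r steps of Gaussian elimination with complete pivoting on a working
    # copy of A; returns the row and column indices of the chosen pivots.
    E = np.array(A, dtype=float)           # running residual A - C Ahat^{-1} R
    I, J = [], []
    for _ in range(r):
        k, l = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        piv = E[k, l]
        if piv == 0.0:                      # numerical rank reached early
            break
        E = E - np.outer(E[:, l], E[k, :]) / piv   # eliminate the chosen cross
        I.append(k)
        J.append(l)
    return np.array(I), np.array(J)
```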

Finally, we will need a result on the closeness of the volume of a submatrix of locally maximum volume to the volume of the maximum volume submatrix.

Lemma 4

Let \({\hat{A}} \in {\mathbb {C}}^{r \times r}\) be a \(\rho \)-locally maximum volume in the matrix \(A \in {\mathbb {C}}^{M \times N}.\) Let \(A_M \in {\mathbb {C}}^{r \times r}\) be a submatrix of maximum volume in A. Then

$$\begin{aligned} {{\mathcal {V}}}(A_M) \leqslant {{\mathcal {V}}}({\hat{A}}) \left( 1 + 3 \rho ^2 r^2 \right) ^r. \end{aligned}$$

Proof

Let us start with the observation that there exists a submatrix \({\tilde{A}} \in {\mathbb {C}}^{m \times n}\) with \(m, n \leqslant 2r\) which contains both submatrices \({\hat{A}}\) and \(A_M.\)

According to Theorem 4, when constructing LU for \({\tilde{A}}\) based on \({\hat{A}},\)

$$\begin{aligned} p_1 \left( r,m,n \right) \leqslant \sqrt{\left( 1 + 3 \rho ^2 r (m-r) \right) \left( 1 + 3 \rho ^2 r (n-r) \right) } \leqslant 1 + 3 \rho ^2 r ^2, \end{aligned}$$

and therefore

$$\begin{aligned} \sigma _i({\hat{A}}) \geqslant \sigma _i( {\tilde{A}})/p_1 \left( r, m, n \right) \geqslant \sigma _i( {\tilde{A}}) / (1 + 3 \rho ^2 r^2), \quad i = \left\{ 1, \ldots , r \right\} . \end{aligned}$$

Taking the product over all singular values, we get

$$\begin{aligned} {{\mathcal {V}}}(A_M)= & {} \prod \limits _{i = 1}^r {{\sigma _i}\left( A_M \right) } \\\leqslant & {} \prod \limits _{i = 1}^r {{\sigma _i}\left( {\tilde{A}} \right) } \\\leqslant & {} \left( 1 + 3 \rho ^2 r^2 \right) ^r \prod \limits _{i = 1}^r {{\sigma _i}\left( {\hat{A}} \right) } \\\leqslant & {} {{\mathcal {V}}}({\hat{A}}) \left( 1 + 3 \rho ^2 r^2 \right) ^r. \end{aligned}$$

\(\square \)

We are now ready to prove the main Theorem 1. To do that, we are going to use the following Algorithm 1.

Algorithm 1

\(3 \rho \)-locally maximum volume search

It starts from Gaussian elimination, and then exchanges rows and columns based on the best of three values \(\rho _1,\) \(\rho _2\) and \(\rho _3.\) They correspond to the estimated volume increase after exchanging a row, a column, or both a row and a column, respectively. Note that if it stops, the following inequalities are satisfied:

$$\begin{aligned} \left\| {\hat{A}}^{-1} R \right\| _C\leqslant & {} \sqrt{\rho },\nonumber \\ \left\| C {\hat{A}}^{-1} \right\| _C\leqslant & {} \sqrt{\rho },\nonumber \\ \left\| {\hat{A}}^{-1} \right\| _C \left\| A - C {\hat{A}}^{-1} R \right\| _C\leqslant & {} 2 \rho . \end{aligned}$$
(23)

The first two follow directly from the fact that \(\max \left( \rho _1, \rho _2 \right) \leqslant \rho _n \leqslant \sqrt{\rho }\) (inequality on line 15 is false if “break” happened), and the third one follows from

$$\begin{aligned} \rho _3= & {} \left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{j_3 l_3}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{k_3 i_3}} + {\hat{A}}_{j_3 i_3}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{k_3 l_3}}} \right|\\\geqslant & {} \left|{\hat{A}}_{j_3 i_3}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{k_3 l_3}} \right|- \left|{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{j_3 l_3}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{k_3 i_3}} \right|\\= & {} \max \limits _{i,j} \left|{\hat{A}}_{ji}^{-1} \right|\max \limits _{k,l} \left|\left( A - C {\hat{A}}^{-1} R \right) _{kl} \right|- \left|{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{j_3 l_3}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{k_3 i_3}} \right|\\\geqslant & {} \max \limits _{i,j} \left|{\hat{A}}_{ji}^{-1} \right|\max \limits _{k,l} \left|\left( A - C {\hat{A}}^{-1} R \right) _{kl} \right|- \max \limits _{j_2 l_2} \left|\left( {\hat{A}}^{-1} R \right) _{j_2 l_2} \right|\max \limits _{i_1 k_1} \left|\left( C {\hat{A}}^{-1} \right) _{i_1 k_1} \right|\\= & {} \max \limits _{i,j} \left|{\hat{A}}_{ji}^{-1} \right|\max \limits _{k,l} \left|\left( A - C {\hat{A}}^{-1} R \right) _{kl} \right|- \rho _2 \rho _1 \\= & {} \left\| {\hat{A}}^{-1} \right\| _C \left\| A - C {\hat{A}}^{-1} R \right\| _C - \rho _2 \rho _1, \end{aligned}$$

so

$$\begin{aligned} \left\| {\hat{A}}^{-1} \right\| _C \left\| A - C {\hat{A}}^{-1} R \right\| _C \leqslant \rho _2 \rho _1 + \rho _3 \leqslant \rho + \sqrt{\rho } \leqslant 2 \rho . \end{aligned}$$

From the conditions (23) we also find that the output submatrix has \(3 \rho \)-locally maximum volume. According to Lemma 1, it is enough to satisfy the conditions (8)–(10). First two follow directly from (23) and the last condition (with \(3 \rho \) in place of \(\rho \)) follows from

$$\begin{aligned}{} & {} \mathop {\max }\limits _{i,j,k,l} \left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{jl}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{ki}} + {\hat{A}}_{ji}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{kl}}} \right|\\{} & {} \quad \leqslant \mathop {\max }\limits _{i,j,k,l} \left|{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{jl}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{ki}} \right|+ \mathop {\max }\limits _{i,j,k,l} \left|{\hat{A}}_{ji}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{kl}} \right|\\{} & {} \quad \leqslant \left\| {\hat{A}}^{-1} R \right\| _C \left\| C {\hat{A}}^{-1} \right\| _C + \left\| {\hat{A}}^{-1} \right\| _C \left\| A - C {\hat{A}}^{-1} R \right\| _C \leqslant \rho + 2 \rho \leqslant 3 \rho . \end{aligned}$$

Algorithm 1 also does eventually stop, as the volume is multiplied at least by a factor \(\max \left( \rho _1, \rho _2, \rho _3 \right) > \sqrt{\rho } \geqslant 1\) at each iteration. Thus, it can’t visit the same submatrix twice, and there is a finite number of submatrices to check.
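The published pseudocode of Algorithm 1 is given as a figure; the sketch below is our simplified reading of the exchange loop just described, recomputing \(C {\hat{A}}^{-1},\) \({\hat{A}}^{-1} R\) and the residual from scratch at every step rather than via the low-rank updates used in the complexity analysis below.

```python
import numpy as np

def local_maxvol_sweep(A, I, J, rho=3.0, max_iter=10_000):
    # Simplified reading of the exchange loop: compute rho_1, rho_2, rho_3
    # and exchange a row, a column, or a row-column pair while the best
    # estimated volume gain exceeds sqrt(rho).
    I, J = np.array(I, dtype=int), np.array(J, dtype=int)
    for _ in range(max_iter):
        Ahat_inv = np.linalg.inv(A[np.ix_(I, J)])
        CAinv = A[:, J] @ Ahat_inv               # M x r, C Ahat^{-1}
        AinvR = Ahat_inv @ A[I, :]               # r x N, Ahat^{-1} R
        E = A - A[:, J] @ AinvR                  # residual A - C Ahat^{-1} R
        k1, i1 = np.unravel_index(np.argmax(np.abs(CAinv)), CAinv.shape)
        j2, l2 = np.unravel_index(np.argmax(np.abs(AinvR)), AinvR.shape)
        j3, i3 = np.unravel_index(np.argmax(np.abs(Ahat_inv)), Ahat_inv.shape)
        k3, l3 = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        rho1, rho2 = abs(CAinv[k1, i1]), abs(AinvR[j2, l2])
        rho3 = abs(AinvR[j3, l3] * CAinv[k3, i3] + Ahat_inv[j3, i3] * E[k3, l3])
        best = max(rho1, rho2, rho3)
        if best <= np.sqrt(rho):                 # conditions (23) now hold
            break
        if best == rho1:
            I[i1] = k1                           # row exchange
        elif best == rho2:
            J[j2] = l2                           # column exchange
        else:
            I[i3], J[j3] = k3, l3                # simultaneous row and column exchange
    return I, J
```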

If we want to reach a \(\rho \)-locally maximum volume submatrix, we first run Algorithm 1 with \(\rho = \sqrt{3}\) (although any constant greater than 1 would suffice) and then improve it making exchanges following Lemma 1. According to Lemma 1, Algorithm 2 eventually outputs a \(\rho \)-locally maximum volume submatrix. And it also eventually stops for the same reason.

Algorithm 2

\(\rho \)-locally maximum volume search

We now estimate the number of steps required to reach \(3 \rho \)-locally maximum volume and \(\rho \)-locally maximum volume submatrices in the following proposition.

Proposition 1

Algorithm 1 stops after \(O \left( r \left( 1 + \frac{1}{\log \rho } \right) \log r \right) \) iterations and has the total computational cost of \(O \left( MNr \left( 1 + \frac{1}{\log \rho } \right) \log r \right) .\)

Algorithm 2 performs no more than \(O \left( r \log _{\rho } r \right) \) additional iterations and has the total computational cost of \(O \left( MNr \left( 1 + \frac{r^2}{\log \rho } \right) \log r \right) .\)

Proof

The starting submatrix \(A_{(0)}\) is chosen using Gaussian elimination with complete pivoting. By Corollary 2, \(A_{(0)}\) has \(\rho _{(0)}\)-locally maximum volume with

$$\begin{aligned} \rho _{(0)} \leqslant 3 \cdot 4^{r-1} r^{\frac{2 + \ln r}{4}}. \end{aligned}$$
(24)

Consider now the sequence of submatrices \({\hat{A}} = A_{(s)}\) obtained after s steps of column and/or row exchanges. According to Lemma 4 (together with the bound \(1 + 3 \rho ^2 r^2 \leqslant 3 \rho ^2 \left( r^2 + 1 \right) \) for \(\rho \geqslant 1\)),

$$\begin{aligned} {{\mathcal {V}}}(A_M)\leqslant & {} {{\mathcal {V}}}(A_{(s)}) \left( 3 \rho _{(s)}^2 (r^2+1) \right) ^r, \nonumber \\ \rho _{(s)}\geqslant & {} \frac{1}{\sqrt{3(r^2+1)}} \left( \frac{{{\mathcal {V}}}(A_M)}{{{\mathcal {V}}}(A_{(s)})} \right) ^{\frac{1}{2r}}, \end{aligned}$$
(25)

where \(A_M \in {\mathbb {R}}^{r \times r}\) is the maximum volume submatrix.

At each step of Algorithm 1, we make an exchange according to the following rule. We select a new row k and/or a new column l attaining the value of

$$\begin{aligned} \rho '= & {} \max \left( \left|\left( {\hat{A}}^{-1} R \right) _{j_2 l_2} \right|, \left|\left( C {\hat{A}}^{-1} \right) _{k_1 i_1} \right|, \right. \nonumber \\{} & {} \left. \left|{ {\hat{A}}_{j_3 i_3}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{k_3 l_3}} + {{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{j_3 l_3}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{k_3 i_3}}} \right|\right) \nonumber \\\geqslant & {} \max \left( \max \limits _{j,l} \left|\left( {\hat{A}}^{-1} R \right) _{jl} \right|, \max \limits _{i,k} \left|\left( C {\hat{A}}^{-1} \right) _{ki} \right|, \right. \nonumber \\{} & {} \left. \max \limits _{i,j} \left|{\hat{A}}_{ji}^{-1} \right|\max \limits _{k,l} \left|\left( A - C {\hat{A}}^{-1} R \right) _{kl} \right|- \left|\left( {\hat{A}}^{-1} R \right) _{j_3 l_3} \right|\left|\left( C {\hat{A}}^{-1} \right) _{k_3 i_3} \right|\right) \nonumber \\\geqslant & {} \max \left( \max \limits _{j,l} \left|\left( {\hat{A}}^{-1} R \right) _{jl} \right|, \max \limits _{i,k} \left|\left( C {\hat{A}}^{-1} \right) _{ki} \right|, \right. \nonumber \\{} & {} \left. \max \limits _{i,j} \left|{\hat{A}}_{ji}^{-1} \right|\max \limits _{k,l} \left|\left( A - C {\hat{A}}^{-1} R \right) _{kl} \right|- \max \limits _{j_2, l_2} \left|\left( {\hat{A}}^{-1} R \right) _{j_2 l_2} \right|\max \limits _{i_1, k_1} \left|\left( C {\hat{A}}^{-1} \right) _{k_1 i_1} \right|\right) , \end{aligned}$$
(26)

where we substituted the definitions of the indices \(i_1, k_1, j_2, l_2, i_3, j_3, k_3, l_3\) from Algorithm 1.

Such a choice increases the volume by at least \(\rho '\) times, since we have chosen \(\rho ' = \max \left( \rho _1, \rho _2, \rho _3 \right) .\)

Moreover, if at the s-th step we have exactly \(\rho _{(s)}\)-locally maximum volume submatrix with \(\rho _{(s)} \geqslant 3,\) then \(\rho ' \geqslant \sqrt{\rho _{(s)}/3}.\) Indeed, if \(\max \limits _{j,l} \left|{\hat{A}}^{-1} R \right|_{jl} < \sqrt{\rho _{(s)}/3}\) and \(\max \limits _{i,k} \left|C {\hat{A}}^{-1} \right|_{ki} < \sqrt{\rho _{(s)}/3},\) then

$$\begin{aligned}{} & {} \max \limits _{i,j} \left|{\hat{A}}^{-1} \right|_{ji} \max \limits _{k,l} \left|A - C {\hat{A}}^{-1} R \right|_{kl} - \max \limits _{j_2, l_2} \left|\left( {\hat{A}}^{-1} R \right) _{j_2 l_2} \right|\max \limits _{i_1, k_1} \left|\left( C {\hat{A}}^{-1} \right) _{k_1 i_1} \right|\\{} & {} \quad \geqslant \max \limits _{i,j,k,l} \left|{\hat{A}}_{ji}^{-1} \left( A - C {\hat{A}}^{-1} R \right) _{kl} + \left( {\hat{A}}^{-1} R \right) _{jl} \left( C {\hat{A}}^{-1} \right) _{ki} \right|\\{} & {} \qquad - 2 \max \limits _{j_2, l_2} \left|\left( {\hat{A}}^{-1} R \right) _{j_2 l_2} \right|\max \limits _{i_1, k_1} \left|\left( C {\hat{A}}^{-1} \right) _{k_1 i_1} \right|\\{} & {} \quad \geqslant \rho _{(s)} - 2\frac{\rho _{(s)}}{3} = \rho _{(s)}/3 \geqslant \sqrt{\rho _{(s)}/3}. \end{aligned}$$

Taking (25) into account, we get that the ratio to the maximum volume changes as

$$\begin{aligned} \frac{{{\mathcal {V}}}(A_M)}{{{\mathcal {V}}}(A_{(s+1)})} \leqslant \frac{{{\mathcal {V}}}(A_M)}{{{\mathcal {V}}}(A_{(s)})} \cdot \sqrt{\frac{3}{\rho _{(s)}}} \leqslant \sqrt{3\sqrt{3(r^2+1)}} \left( \frac{{{\mathcal {V}}}(A_M)}{{{\mathcal {V}}}(A_{(s)})} \right) ^{1 - \frac{1}{4r}}. \end{aligned}$$
(27)

Let us denote

$$\begin{aligned} \alpha _{(s)} = \frac{\left( \frac{{{\mathcal {V}}}(A_M)}{{{\mathcal {V}}}(A_{(s)})} \right) ^{\frac{1}{4r}}}{\sqrt{3\sqrt{3(r^2+1)}}}. \end{aligned}$$
(28)

From (24) and Lemma 4

$$\begin{aligned} \alpha _{(0)} \leqslant \sqrt{\rho _{(0)}/3} \leqslant 2^{r-1} r^{\frac{2 + \ln r}{8}} \leqslant 2^r r^{\frac{2 + \ln r}{8}}. \end{aligned}$$
(29)

Substituting \(\alpha _{(s)}\) (28) into (27), we can estimate how \(\alpha _{(s)}\) changes after each step:

$$\begin{aligned} \left( 27 (r^2+1) \right) ^{r} \alpha _{(s+1)}^{4r}\leqslant & {} \left( 27 (r^2+1) \right) ^{r} \alpha _{(s)}^{4r-1},\nonumber \\ \alpha _{(s+1)}\leqslant & {} \alpha _{(s)}^{1-\frac{1}{4r}}, \nonumber \\ \alpha _{(s)}\leqslant & {} \alpha _{(0)}^{\left( 1-\frac{1}{4r} \right) ^s}, \nonumber \\ \ln \ln \alpha _{(s)} - \ln \ln \alpha _{(0)}\leqslant & {} s \ln \left( 1-\frac{1}{4r} \right) \leqslant -\frac{s}{4r}. \end{aligned}$$
(30)

Thus, to reach, for instance, \(\alpha _{(s)} \leqslant 3^{1/4} < 2^r r^{\frac{2 + \ln r}{8}},\) one would need

$$\begin{aligned} s_1\leqslant & {} 4r \ln \ln \left( 2^r r^{\frac{2 + \ln r}{8}} \right) - 4r \ln \ln 3^{1/4} + 1 \nonumber \\= & {} O \left( r \log r \right) \end{aligned}$$
(31)

steps. After that

$$\begin{aligned} \frac{{{\mathcal {V}}}(A_M)}{{{\mathcal {V}}}(A_{(s_1)})} \leqslant \left( \sqrt{3 \sqrt{3(r^2+1)}} \alpha _{(s_1)} \right) ^{4r} \leqslant \left( 81 \left( r^2+1 \right) \right) ^r. \end{aligned}$$

Then, while \(\rho ' \geqslant \sqrt{\rho },\) the Algorithm 1 increases the volume by at least a factor of \(\sqrt{\rho }\) at every step. Since we cannot exceed the maximum volume, the algorithm requires at most

$$\begin{aligned} s_2 \leqslant \log _{\sqrt{\rho }} \left( 81 \left( r^2+1 \right) \right) ^{r} = O \left( r \frac{\log r}{\log \rho } \right) \end{aligned}$$
(32)

steps until termination.

Each step of the algorithm is an update of rank no greater than 2 for the submatrices \({\hat{A}},\) C,  and R,  since no more than one row and no more than one column is replaced. It follows that the recalculation of \(C {\hat{A}}^{-1},\) \({\hat{A}}^{-1} R\) and \(C {\hat{A}}^{-1} R\) is also of low rank and requires a total of at most O(MN) operations per step, after which the choice of indices \(i, j, k, l\) based on \(\rho '\) (26) also takes at most \(O \left( MN \right) \) operations. Given the constraints (31) and (32), the total number of steps is \(s_1 + s_2 = O \left( r \left( 1 + \frac{1}{\log \rho } \right) \log r \right) ,\) and the total computational complexity of the algorithm is \(O \left( MNr \left( 1 + \frac{1}{\log \rho } \right) \log r \right) .\) Gaussian elimination with complete pivoting takes \(O \left( M N r \right) \) operations, which does not affect the general asymptotics.
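The rank-2 refresh of \({\hat{A}}^{-1}\) mentioned here can be carried out with the Sherman–Morrison–Woodbury identity; the generic form below is an illustration of this point (not the paper's exact update formulas) and costs \(O(r^2)\) operations.

```python
import numpy as np

def woodbury_rank2_update(Ahat_inv, U, V):
    # inv(Ahat + U @ V.T) from the already known Ahat_inv, with U, V of
    # size r x 2; costs O(r^2) instead of the O(r^3) of a fresh inversion.
    S = np.eye(U.shape[1]) + V.T @ Ahat_inv @ U          # 2 x 2 capacitance matrix
    return Ahat_inv - Ahat_inv @ U @ np.linalg.solve(S, V.T @ Ahat_inv)
```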

If it is necessary to find a \(\rho \)-locally maximum volume submatrix with \(\rho \leqslant 3,\) we can continue making row and column exchanges after a \(3 \rho \)-locally maximum volume submatrix is reached for \(\rho = \sqrt{3}\) (which will take \(O \left( M N r \log r \right) \) operations), but now checking at each step whether

$$\begin{aligned} \max \limits _{i,j,k,l} \left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{jl}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{ki}} + {\hat{A}}_{ji}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{kl}}}\right|> \rho , \end{aligned}$$

which is what Algorithm 2 does.

Since in this case we guarantee an increase in volume by at least \(\rho \) times, with Lemma 4 the number of additional steps can be bounded by

$$\begin{aligned} s_3 \leqslant \log _{\rho } \left( 1 + 3 \left( 3 \sqrt{3} \right) ^2 r^2 \right) ^r = \log _{\rho } \left( 1 + 81 r^2 \right) ^r = O \left( r \log _{\rho } r \right) . \end{aligned}$$

However, now at each step it is required to check all \(MN r^2\) variants of the quadruple \(i, j, k, l,\) so these (at most) \(s_3\) steps are \(r^2\) times more expensive. Thus, the total cost of the search for \(\rho \)-locally maximum volume is \(O \left( MNr \left( 1 + \frac{r^2}{\log \rho } \right) \log r \right) .\) For \(\rho \leqslant 3\) it simplifies to \(O \left( MNr^3 \log _{\rho } r \right) .\) \(\square \)

Note that, as shown in [5], cross approximation based on a \(\rho \)-locally maximum volume submatrix also allows achieving high accuracy in the Chebyshev norm (which does not depend on matrix sizes). Let us formulate this fact in terms of the LU decomposition.

Theorem 5

[5] Let an incomplete LU decomposition (1) be constructed based on the submatrix \({\hat{A}} = L_{11} U_{11} \in {\mathbb {R}}^{r \times r}\) of \(\rho \)-locally maximum volume. Then

$$\begin{aligned} \left\| A - LU \right\| _C \leqslant \rho \left( r + 1 \right) ^2 \left\| E \right\| _C, \end{aligned}$$
(33)

where E is the error of the best rank r approximation in the Chebyshev norm.

So, finding \(3 \rho \)-locally maximum volume submatrix also guarantees a good Chebyshev norm bound.

An algorithm that guarantees the bound (33) was recently suggested in [20]. Namely, it was shown that a factor of \(\rho \) in the Chebyshev norm can be reached without considering all \(MN r^2\) possible exchanges. Following [20, Lemma 3.15], we show that it is possible to quickly reach a factor of \(\rho \) instead of \(3 \rho \) in the Chebyshev norm, even when we only guarantee finding a \(3 \rho \)-locally maximum volume submatrix. It should be noted, however, that the algorithm described in [20] is asymptotically slower and that reaching the Chebyshev norm bound does not, by itself, guarantee either the “spectrum revealing” or the “strong” RRLU properties (see Definition 2), for which a \(\rho \)-locally maximum volume submatrix is needed (see Theorem 4).

To guarantee the bound (33), we calculate the value \(\rho _3\) in Algorithm 1 more efficiently, by directly finding the maximum possible volume multiplier over all i and j. We write the result as Algorithm 3.

Algorithm 3

\(3 \rho \)-locally maximum volume search with better Chebyshev norm error guarantees

Proposition 2

Algorithm 3 requires at most \(O \left( MNr \left( 1 + \frac{1}{\log \rho } \right) \log r \right) \) operations,  guarantees the same bounds as Algorithm 1 and also allows one to construct an incomplete LU decomposition such that

$$\begin{aligned} \Vert A - LU \Vert _C \leqslant \rho (r+1)^2 \Vert E\Vert _C. \end{aligned}$$
(34)

Proof

In Algorithm 3, calculating \(\rho _3\) allows additionally checking the condition

$$\begin{aligned} \left|{{{\left( {{{{\hat{A}}}^{ - 1}}R} \right) }_{jl_3}}{{\left( {C{{{\hat{A}}}^{ - 1}}} \right) }_{k_3 i}} + {\hat{A}}_{ji}^{ - 1}{{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{k_3 l_3}}} \right|\leqslant \sqrt{\rho } \end{aligned}$$
(35)

for all i and j and for

$$\begin{aligned} k_3, l_3 = \mathop {\arg \max }\limits _{k,l} \left|{\left( {A - C{{{\hat{A}}}^{ - 1}}R} \right) }_{k l} \right|. \end{aligned}$$

If, given \(k_3\) and \(l_3,\) it is possible to find i and j such that the condition (35) is violated, then the increase in volume with the corresponding replacement is at least \(\sqrt{\rho }\) times, according to Lemma 1. Only if this growth is higher than all other options for \(\rho '\) (26) do we make this replacement. We note right away that this will not increase the cost of the algorithm, since going through all pairs of i and j requires only \(O \left( r^2 \right) \) arithmetic operations, which does not increase the asymptotic cost of one step of the algorithm. In addition, the bound on the number of steps also does not change, as each increase in volume is not lower than before.

Now we show that when the algorithm stops, the inequality (34) holds. We argue by contradiction: if the inequality for the Chebyshev norm fails, then, in particular, it fails for those k and l where the error reaches its maximum. These are precisely the new row \(k_3\) and new column \(l_3\) that we check. Consider the submatrix \({\tilde{A}} \in {\mathbb {R}}^{(r+1) \times (r+1)},\) which is the extension of the submatrix \({\hat{A}}\) by the found row \(k_3\) and column \(l_3.\) If inequality (34) is false, then

$$\begin{aligned} \left\| {\tilde{A}} - {\tilde{L}} {\tilde{U}} \right\| _C > \rho (r+1)^2 \Vert E\Vert _C \geqslant \rho (r+1)^2 \Vert {\tilde{E}}\Vert _C, \end{aligned}$$

where \(\Vert {\tilde{E}}\Vert _C\) is the error of the best rank r approximation of \({\tilde{A}}\) in the Chebyshev norm, and the submatrices \({\tilde{L}} \in {\mathbb {R}}^{(r+1) \times r}\) and \({\tilde{U}} \in {\mathbb {R}}^{r \times (r+1)}\) of L and U correspond to the rows and columns of \(\tilde{A}.\) Then from Theorem 5 we get that the submatrix \({\hat{A}}\) does not have \(\rho \)-locally maximum volume inside \({\tilde{A}}.\) Hence there is either a replacement involving only the row \(k_3\) or only the column \(l_3\) that increases the volume by more than \(\rho \) times (which contradicts (23)), or a simultaneous replacement of some row i and column j by \(k_3\) and \(l_3\) that increases the volume by more than \(\rho \) times and contradicts the additional check (35) that we introduced into the algorithm. This contradiction shows that after termination the inequality (34) must be satisfied. \(\square \)

6 Numerical experiments

In this section, we demonstrate the effectiveness of various methods for constructing rank revealing LU decompositions. The purpose of this section is to compare the proposed method for finding submatrices of locally maximum volume with other methods that search for a large volume submatrix.

The fastest method based on volume maximization is the adaptive cross approximation (ACA) [1]. It uses an incomplete Gaussian elimination with partial pivoting. Each new row and column is chosen by searching for the maximum element in the current row/column of the error. At each step, only one row and one column are examined, and the final row is chosen as the starting row for the next step. The rows and columns added in this way are used to construct the \(C {\hat{A}}^{-1} R\) decomposition, also known as the cross or skeleton decomposition [6].

As an alternative to ACA, one can also use an incomplete Gaussian elimination with the choice of the next maximum element in the entire matrix (complete pivoting). This guarantees the stability of the method, but greatly increases its computational complexity.

The maxvol algorithm [2] is more efficient, but also slightly more expensive compared to the ACA method. It allows replacing the chosen rows and columns of the submatrix. After termination, it guarantees that the found submatrix will have \(\rho \)-locally maximum volume inside its corresponding rows and columns. To limit its running time, one can directly prohibit more than r row and column swaps.

Finally, we will look at the rank revealing LU decompositions, based on Theorems 3 and 4.

Complexities and known estimates for all the above methods are shown in Table 1 for the case of a square matrix \(A \in {\mathbb {R}}^{N \times N}.\) From Theorem 3 one can derive the following bound on the Chebyshev norm

$$\begin{aligned} \left\| A - LU \right\| _C\leqslant & {} \left\| A - LU \right\| _2 \\\leqslant & {} \left( 1 + \rho ^2 r (N-r) \right) \sigma _{r+1} \left( A \right) \\\leqslant & {} \left( 1 + \rho ^2 r (N-r) \right) \left\| E_{\textrm{best}} \right\| _2 \\\leqslant & {} \left( 1 + \rho ^2 r (N-r) \right) N \left\| E_{\textrm{best}} \right\| _C, \end{aligned}$$

where \(E_{\textrm{best}}\) is the error matrix of the best rank r approximation in the Chebyshev norm.

Table 1 Estimates of the complexity and accuracy of approximation for various methods of incomplete LU decomposition of the square matrix \(A \in {\mathbb {R}}^{N \times N}\)

To test these methods, we will use the RANDSVD ensemble [21], which consists of matrices of the form

$$\begin{aligned} A = U S V, \quad U,S,V \in {\mathbb {R}}^{N \times N}, \end{aligned}$$
(36)

where S is a fixed diagonal matrix of singular values and U,  V are independent random orthogonal matrices.

Since in practice the algorithms stop after a small number of replacements even for \(\rho = 1,\) we will use \(\rho = 1,\) although the obtained estimates for the number of steps do not hold in this case.

Table 2 Average errors and computation times of rank \(r = 20\) approximation for 100 random matrices of the form (36) of size \(N = 1000\) with singular values as in (37)

Table 2 shows the results of rank 20 approximations of random RANDSVD matrices with singular values

$$\begin{aligned} \sigma _k(A) = 1/2^k. \end{aligned}$$
(37)
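For reproducibility, one way to draw such test matrices is sketched below (an illustration using SciPy's `ortho_group`; the seeds and the helper name are our own).

```python
import numpy as np
from scipy.stats import ortho_group

def randsvd(N, seed=0):
    # One sample of the RANDSVD ensemble (36) with sigma_k = 1/2^k as in (37).
    s = 0.5 ** np.arange(1, N + 1)
    U = ortho_group.rvs(N, random_state=seed)
    V = ortho_group.rvs(N, random_state=seed + 1)
    return U @ np.diag(s) @ V

A = randsvd(1000)       # one of the matrices averaged over in Table 2
```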

It can be seen that, in order to achieve a high-quality approximation, it is usually not necessary to iterate over all rows and columns, and the faster ACA and maxvol algorithms are sufficient. At the same time, although the estimate of Theorem 3 does not imply achieving high accuracy in the Chebyshev norm, and the found submatrix may differ from the submatrix of locally maximum volume, the corresponding algorithm allows one to achieve approximations of higher accuracy, and in less time, than the search for a locally maximum volume submatrix in the entire matrix. At the moment, we do not have a theoretical justification for this behavior.

Despite the efficiency of ACA and maxvol on average, it is easy to give a lot of examples where it is necessary to consider the entire matrix to build a high accuracy approximation. Consider, for example, an arbitrary matrix with a hidden block structure

$$\begin{aligned} B = \Pi _1 \left[ {\begin{array}{ll} {{B_1}} &{} 0 \\ 0 &{} {{B_2}} \\ \end{array} } \right] \Pi _2 + E, \end{aligned}$$
(38)

where E is some noise and \(\Pi _1\) and \(\Pi _2\) are permutations of rows and columns. In the worst case, when the starting submatrix is located entirely in one of the blocks, replacing the row or column will not allow leaving the current block and moving to another. This is due to the fact that any column in the rows corresponding to \(B_1\) submatrix will either be inside \(B_1\) or consist entirely of noise. Only the simultaneous replacement of a row and a column allows the algorithm to “see” elements from another block. Even if the initial submatrix is not entirely in one of the blocks, such a structure still significantly complicates the search for the optimal submatrix and leads to an increase in the approximation error.

Table 3 Average errors and computation times of rank \(r = 20\) approximations for 100 random matrices of the form (38) of size \(N = 1000\) with singular values of blocks as in (39)

We will take the blocks \(B_1\) and \(B_2\) also from the RANDSVD ensemble with singular values

$$\begin{aligned} \sigma _k \left( B_1 \right) = \sigma _k \left( B_2 \right) = 1/2^k, \end{aligned}$$
(39)

and E as a random Gaussian matrix in which each element has zero mean and variance \({\mathbb {E}} \left|E_{ij}\right|^2 = 10^{-14},\) so that \(\left\| E \right\| _F \approx 10^{-4}.\) Let us again look for an approximation of rank 20 of such a matrix. The optimum in this case is to take 10 rows and columns from the submatrix \(B_1\) and 10 more from the submatrix \(B_2,\) so that the error in the spectral norm and the Frobenius norm will be about \(10^{-3}.\) The results of the same algorithms for the matrix B are presented in Table 3. As we can see, the adaptive cross and maxvol methods in this case work much worse and do not allow building an approximation with high accuracy.
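A matrix of the form (38) with these parameters can be generated, for example, as follows (a sketch assuming the `randsvd` helper from the previous listing; the permutations and the noise level follow the description above).

```python
import numpy as np

def hidden_block_matrix(N=1000, seed=0):
    # Test matrix (38): two RANDSVD blocks of size N/2 with singular values (39),
    # hidden by random row/column permutations and Gaussian noise with
    # elementwise variance 1e-14 (so that ||E||_F is about 1e-4).
    rng = np.random.default_rng(seed)
    n = N // 2
    B1, B2 = randsvd(n, seed=seed), randsvd(n, seed=seed + 2)
    B = np.zeros((N, N))
    B[:n, :n], B[n:, n:] = B1, B2
    B = B[rng.permutation(N), :][:, rng.permutation(N)]   # hide the block structure
    return B + 1e-7 * rng.standard_normal((N, N))
```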

Overall, the choice of the rank revealing LU decomposition algorithm depends on the requirements for the speed and accuracy of the method. Adaptive cross approximation [1] is significantly faster than all other methods, but also leads to the largest error. The maxvol algorithm [2] reduces the error several times, while maintaining the same asymptotic cost as the adaptive cross. On average, it can even outperform complete pivoting in accuracy (see Table 2). However, like the adaptive cross, it does not provide any guarantees for the accuracy of the approximation, and can partially lose its accuracy if the approximated matrix has some hidden structure. The algorithm corresponding to Theorem 3 guarantees the best accuracy in terms of the spectral norm and also shows the best results in approximation accuracy among all the given algorithms. However, the presence of the factor N in the Chebyshev norm error estimate does not make this bound very useful. The algorithm for finding a locally maximum volume submatrix, proposed in this paper, although it looks a little less efficient in practice, allows for high accuracy of the approximation in the Chebyshev norm, as well as the “strong” property (Definition 2) of the resulting RRLU decomposition.

The algorithms from Tables 2 and 3 are realised in Fortran and are available at https://github.com/RodniO/Projective-volume-low-rank (file ExampleLU.f90).

7 Conclusion

The obtained estimates show that, in contrast to submatrices of almost maximum volume, the search for which is an NP-hard problem [8, 9], submatrices close to locally maximum volume can be found in polynomial time, which ensures that the various bounds that are often associated with maximum volume submatrices are satisfied. In particular, it is possible to quickly construct cross approximations with guarantees of accuracy in the Chebyshev norm and the spectral norm.