1 Introduction

We are interested in computing low-rank approximate solutions of linear systems \(Au = b\) with the Kronecker-product structure

$$\begin{aligned} \left( \sum _{i=0}^{m} G_i \otimes K_i \right) u = \sum _{i=0}^{{\hat{m}}} g_i \otimes f_i, \end{aligned}$$
(1.1)

where \(A =\sum _{i=0}^{m} G_i \otimes K_i\) is symmetric positive definite, \(\otimes \) is the Kronecker product, \(\{K_i\}_{i=0}^{m}\in {\mathbb {R}}^{n_1 \times n_1}\), \(\{G_i\}_{i=0}^{m} \in {\mathbb {R}}^{n_2 \times n_2}\), \(\{f_i\}_{i=0}^{{\hat{m}}} \in {\mathbb {R}}^{n_1}\), and \(\{g_i\}_{i=0}^{{\hat{m}}} \in {\mathbb {R}}^{n_2}\), and we assume that \(m, {\hat{m}} \ll n_1, n_2\). The solution vector \(u \in {\mathbb {R}}^{n_1 n_2}\) consists of \(n_2\) subvectors of dimension \(n_1\), i.e., \(u = [u_1^{\mathsf {T}}, \ldots , u_{n_2}^{\mathsf {T}}]^{\mathsf {T}}\), where \(\{u_i\}_{i=1}^{n_2} \in {\mathbb {R}}^{n_1}\). The solution also has an alternative representation in matrix format, \(U = [u_1, \ldots , u_{n_2}] \in {\mathbb {R}}^{n_1 \times n_2}\). Exploiting this, and using standard properties of the Kronecker product, one can show that the linear system (1.1) is equivalent to a linear multi-term matrix equation (LMTME) [36]

$$\begin{aligned} \sum _{i=0}^{m} K_i U G_i^{\mathsf {T}}= B, \quad \text {where} \quad B = \sum _{i=0}^{{\hat{m}}} f_i g_i^{\mathsf {T}}\in {\mathbb {R}}^{n_1 \times n_2}. \end{aligned}$$
(1.2)
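For concreteness, the following minimal NumPy sketch verifies this equivalence numerically. The names `n1`, `n2`, `m`, `Ks`, `Gs` and the random test data are illustrative only; `vec` is taken column-major to match the column ordering of \(u\) above.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m = 4, 3, 2
Ks = [rng.standard_normal((n1, n1)) for _ in range(m + 1)]
Gs = [rng.standard_normal((n2, n2)) for _ in range(m + 1)]
U = rng.standard_normal((n1, n2))

# vec(U) stacks the columns of U, matching u = [u_1^T, ..., u_{n2}^T]^T
u = U.flatten(order="F")
lhs = sum(np.kron(G, K) for K, G in zip(Ks, Gs)) @ u                # (1.1)
rhs = sum(K @ U @ G.T for K, G in zip(Ks, Gs)).flatten(order="F")   # (1.2)
assert np.allclose(lhs, rhs)
```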

Systems with such structure arise, for example, in the discretization of deterministic linear elliptic PDEs on high-dimensional domains [2, 20,21,22] as well as in the discretization, via stochastic Galerkin finite element methods (SGFEMs) [13, 18, 25, 28, 47], of linear elliptic PDEs parameterized with random or unknown coefficients (see Eqs. (1.4)–(1.5) below). When the matrices \(K_i\) and \(G_i\) are sparse, it is feasible, for moderately large values of \(n_{1}\) and \(n_{2}\), to solve (1.1) using standard iterative methods. Indeed, in the case of parameterized PDEs, standard Krylov subspace methods [34, 35] and multigrid methods [4, 10, 24] have been considered. However, the dimensions of the system matrices can grow rapidly when the discretization is refined and, in the case of parameterized PDEs, when the number m of input random variables is increased.

For large \(n_1\) and \(n_2\), direct application of standard iterative methods may be computationally prohibitive, and storing or explicitly forming the matrix U may be prohibitive in terms of memory. Motivated by this, we are interested in the inexpensive computation of low-rank approximate solutions of LMTMEs of the form (1.2), using methods that do not require the construction of matrices of size \(n_1\times n_2\). We begin by introducing a factored representation of \(U \in {\mathbb {R}}^{n_1 \times n_2}\),

$$\begin{aligned} U = VW^{\mathsf {T}}, \end{aligned}$$

where, if U is of full rank \(r:=\min (n_1,n_2)\), \(V \in {\mathbb {R}}^{n_1 \times r}\) and \(W \in {\mathbb {R}}^{n_2 \times r}\). Our aim is then to find a low-rank approximation to this factored matrix of the form

$$\begin{aligned} U_p= V_{p}W_{p}^{\mathsf {T}}\in {\mathbb {R}}^{n_1 \times n_2}, \end{aligned}$$
(1.3)

where \(V_p =[v_1,\ldots ,v_p] \in {\mathbb {R}}^{n_1 \times p}\), \(W_p =[w_1,\ldots ,w_p] \in {\mathbb {R}}^{n_2 \times p}\), and \(p \ll r\), and we want to derive solution algorithms for computing \(U_p\) that operate only on the smaller factors \(V_p\) and \(W_p\) without explicitly forming the large matrix \(U_p\).

One such solution algorithm has been developed for matrix completion/sensing problems [16, 17], which, at the pth iteration, computes \(V_p\) and \(W_p\) by alternately solving certain minimization problems. Although the algorithm computes highly accurate approximations, it can become very expensive as p increases (see Sect. 2.4). Another approach is to use successive rank-one approximations and successively compute pairs of vectors \(\{(v_i\),\(w_i)\}_{i=1}^{p}\) to build the factors \(V_p\) and \(W_p\) of (1.3) until a stopping criterion is satisfied. The pth iteration starts with \(V_{p-1}\) and \(W_{p-1}\) and constructs \(v_p\) and \(w_p\) as the solutions of certain minimization problems. This approach for solving parameterized PDEs is one component of a methodology known as Proper Generalized Decomposition (PGD) [30, 31, 46]. As observed in those works, using only successive rank-one approximations is less expensive but may not be practical because it typically results in approximations with an unnecessarily large value of p.

Our goal in this study is to develop solution algorithms that retain only the good properties of the two types of solution strategies described above, i.e., algorithms that compute an accurate low-rank approximate solution in a computationally efficient way. In developing such algorithms, we take our cue from PGD methods, in which, to improve accuracy, the successive rank-one constructions are supplemented with an updating procedure that is performed intermittently during the iteration. Inspired by this approach, we propose a solution algorithm that adaptively computes approximate solutions in an inexpensive way via the successive rank-one approximation method. This is supplemented by an enhancement procedure, which effectively improves the accuracy of the resulting approximate solutions. We propose two novel enhancement procedures, developed by modifying ideas from [17] that were originally used for matrix completion problems [38]. An algebraic formulation of PGD methods corresponds to alternating minimizations of errors in a particular norm, e.g., the energy norm or the \(\ell ^2\) norm. Since we are considering linear systems with symmetric positive definite matrices, we use the energy norm, and refer to the resulting methods as alternating energy minimization (AEM) methods.

Some other rank-adaptive approaches for approximating solutions of LMTMEs in low-rank format are as follows. An approach close to the ideas considered in this paper is a greedy low-rank algorithm, developed in [20], where the successive rank-one algorithm is followed by an enhancement procedure. This method is also used in [23] for a regression problem arising in a neuroscientific model of synaptic connections. A method in [5], called the AMEn algorithm, uses AEM techniques in combination with tensor-train (TT) decompositions [32]; AMEn is designed for high-dimensional problems with multiple terms in the Kronecker product format. Another class of approaches incrementally computes rank-p solution pairs by solving a residual minimization problem, an approach known as alternating least-squares (ALS). This has been used to compute low-rank approximate solutions of parameterized PDEs in [6, 7], and to solve matrix-recovery problems, i.e., matrix sensing and completion problems [15,16,17, 38]. In [36], an adaptive iterative procedure to solve LMTMEs (1.2) associated with SGFEM discretizations is given, which incrementally computes a set of orthonormal basis vectors to form \(V_p\), the factor representing the spatial part of the solution. Two other classes of iterative low-rank algorithms are low-rank Krylov subspace methods [2, 3, 21, 22, 29, 33, 40, 45] and low-rank multigrid methods [12, 44]. These methods operate on iterates represented in Kronecker product format (e.g., \(\sum _{i=1}^{{\tilde{p}}} w_i \otimes v_i\) for representing a vector with \({\tilde{p}}\) terms) and employ so-called truncation operators (e.g., based on singular value decomposition) to keep the iterates in low-rank form. See also [43] for a comprehensive overview of computational approaches for solving linear matrix equations.

Although we only exploit the abstract structure (1.2) to derive solution algorithms, we are motivated by the need to solve LMTMEs associated with SGFEM discretizations of parameterized PDEs arising in forward uncertainty quantification. Here we briefly mention their key features; more details are given in Sect. 4. Consider the model problem

$$\begin{aligned} -\nabla \cdot (a(x,\xi ) \nabla u(x,\xi ) ) = f(x), \quad (x, \xi ) \in D \times \varGamma , \end{aligned}$$
(1.4)

where \(D \subset {\mathbb {R}}^{2,3}\) is the spatial domain and the diffusion coefficient has the form

$$\begin{aligned} a(x,\xi ) = a_{0}(x) + \sum _{i=1}^{m} a_i(x) \xi _i, \end{aligned}$$
(1.5)

where \(\xi =[\xi _{1}, \ldots , \xi _{m}]\) is a vector of m independent random variables taking values in a parameter domain \(\varGamma \subset {\mathbb {R}}^{m}\). In the SGFEM approach, the solution \(u(x, \xi )\) to (1.4) is approximated in a finite-dimensional space with tensor product structure \(X_{h}~\otimes ~S_{d}\) where \(X_{h}\) is a standard finite element space associated with D and \(S_{d}\) is a space of (usually, global) polynomials on \(\varGamma \). Applying such a scheme leads to a LMTME (1.2) in which the \(K_{i}\) matrices are associated with the chosen finite element discretization, and the \(G_{i}\) matrices are associated with the polynomial approximation on the m-dimensional parameter domain. In particular, \(K_{i}\) is a finite element stiffness matrix weighted by the coefficient \(a_{i}(x)\) appearing in (1.5). These matrices are ill-conditioned with respect to the mesh parameter h, and due to the properties of the coefficients in (1.5), they have decaying importance in terms of their contribution to the sum in (1.1). Moreover, the first term \(G_{0} \otimes K_{0}\) usually dominates and serves as an effective and computationally efficient preconditioner [35]. We will exploit this fact in numerical experiments in Sect. 4.

An outline of the rest of the paper is as follows. In Sect. 2, we introduce and derive alternating energy minimization (AEM) methods for (1.2) using the well-known general projection framework and discuss a collection of methods developed for constructing low-rank approximate solutions of the form (1.3). In Sect. 3, we discuss enhancement procedures and derive two new approaches for performing such updates. In Sect. 4, we perform extensive numerical experiments and measure the effectiveness and the efficiency of the enhanced AEM methods for LMTMEs associated with SGFEM discretizations of parameterized elliptic PDEs of the form (1.4). We also compare our methods with the greedy low-rank algorithm [20] on the same benchmark problems as that method shares many features with the proposed methods and can easily be described within the enhanced AEM framework. Finally, in Sect. 5, we draw some conclusions.

2 Alternating energy minimization (AEM) methods

In this section, we derive AEM methods for solving the matrix Eq. (1.2) from the optimal projection framework and review two variants of such methods. We first introduce some notation. Upper-case and lower-case letters are used to denote matrices and vectors, respectively. An inner product between two matrices \(X, Y \in {\mathbb {R}}^{n_1 \times n_2}\) is defined as \(\langle X, Y \rangle \equiv \text {tr}(X^{\mathsf {T}}Y) = \text {tr}(XY^{\mathsf {T}}) = \sum _{i,j} X_{ij}Y_{ij}\), where tr is the trace operator, and \(\text {tr}(X) = \sum _{i=1}^{n} x_{ii}\) if \(X\in {\mathbb {R}}^{n\times n}\). The norm induced by \(\langle \cdot , \cdot \rangle \) is the Frobenius norm \(\Vert X \Vert _\text {F}= \sqrt{\langle X, X \rangle }\). For shorthand notation, we introduce a linear operator \({\mathscr {A}}( X ) = \sum _{i=0}^{m} K_i X G_i^{\mathsf {T}}\) for \(X \in {\mathbb {R}}^{n_1 \times n_2}\). Using this, we can define the weighted inner product \(\langle X, Y\rangle _A = \langle {\mathscr {A}}(X), Y\rangle = \langle X, \mathscr {A}(Y)\rangle \) and the induced A-norm \(\Vert \cdot \Vert _A\). Finally, vec denotes a vectorization operator, \(\text {vec}(X)=x\), where \(X = [x_1,\ldots ,x_{n_2}] \in {\mathbb {R}}^{n_1 \times n_2}\) and \(x=[x_1^{\mathsf {T}},\ldots ,x_{n_2}^{\mathsf {T}}]^{\mathsf {T}}\in {\mathbb {R}}^{n_1n_2}\), for \(x_i \in {\mathbb {R}}^{n_1}\), \(i=1,\ldots ,n_2\).
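As a small illustration of this notation, the operator \({\mathscr {A}}\) and the inner products can be realized as follows; this is a sketch, with `Ks` and `Gs` assumed to be Python lists holding the coefficient matrices \(K_i\) and \(G_i\):

```python
import numpy as np

def A_op(X, Ks, Gs):
    """The linear operator A(X) = sum_i K_i X G_i^T."""
    return sum(K @ X @ G.T for K, G in zip(Ks, Gs))

def inner(X, Y):
    """<X, Y> = tr(X^T Y) = sum_ij X_ij * Y_ij."""
    return np.sum(X * Y)

def inner_A(X, Y, Ks, Gs):
    """Weighted inner product <X, Y>_A = <A(X), Y>."""
    return inner(A_op(X, Ks, Gs), Y)

def norm_A(X, Ks, Gs):
    """Induced A-norm ||X||_A."""
    return np.sqrt(inner_A(X, X, Ks, Gs))
```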

2.1 General projection framework

For the computation of \(V_p\) and \(W_p\) in (1.3), we rely on the classical theory of orthogonal (Galerkin) projection methods [39]. Let \({\mathscr {K}} \subset {\mathbb {R}}^{n_1 \times n_2}\) be a search space in which an approximate solution \(U_p\in {\mathbb {R}}^{n_1 \times n_2}\) is sought, and let \({\mathscr {L}}\) be a constraint space onto which the residual \(B -{\mathscr {A}} (U_p)\) is projected. Following [39, Proposition 5.2], if the system matrix A is symmetric positive definite and \({\mathscr {L}} = {\mathscr {K}}\), then a matrix \(U_p^*\) is the result of an orthogonal projection onto \({\mathscr {L}}\) if and only if it minimizes the A-norm of the error over \({\mathscr {K}}\), i.e.,

$$\begin{aligned} U_p^*= \underset{U_p\in {\mathscr {K}}}{\arg \min } \,J_A(U_p), \end{aligned}$$

where the objective function is

$$\begin{aligned} J_A(U_p) = \frac{1}{2} \Vert U - U_p\Vert _A^2. \end{aligned}$$
(2.1)

Because we seek a factored representation of \(U_p\), we slightly modify (2.1) to give

$$\begin{aligned} J_A( V_p,W_p) =\frac{1}{2} \Vert U - V_pW_p^{\mathsf {T}}\Vert _A^2, \end{aligned}$$
(2.2)

and a new minimization problem

$$\begin{aligned} \min _{V_p \in {\mathbb {R}}^{n_1 \times p}, W_p \in {\mathbb {R}}^{n_2 \times p}} J_A(V_p,W_p). \end{aligned}$$
(2.3)

Since \(J_A\) is quadratic, gradients with respect to \(V_p\) and \(W_p\) can be easily obtained as

$$\begin{aligned} \nabla _{V_p} J_A&= \left( {\mathscr {A}} (V_p W_p^{\mathsf {T}}) - B \right) W_p = \sum _{i=0}^m (K_i V_p W_p^{\mathsf {T}}G_i^{\mathsf {T}}) W_p - BW_p,\end{aligned}$$
(2.4)
$$\begin{aligned} \nabla _{W_p} J_A&= \left( {\mathscr {A}} (V_p W_p^{\mathsf {T}}) - B \right) ^{\mathsf {T}}V_p = \sum _{i=0}^m (K_i V_pW_p^{\mathsf {T}}G_i^{\mathsf {T}})^{\mathsf {T}}V_p - B^{\mathsf {T}}V_p. \end{aligned}$$
(2.5)

Employing the first-order optimality condition on (2.4)–(2.5) (i.e., setting (2.4) and (the transpose of) (2.5) to be zero) results in the set of equations

$$\begin{aligned} \sum _{i=0}^m (K_i V_p W_p^{\mathsf {T}}G_i^{\mathsf {T}}) W_p&= BW_p \in {\mathbb {R}}^{n_1 \times p},\end{aligned}$$
(2.6)
$$\begin{aligned} \sum _{i=0}^m V_p^{\mathsf {T}}(K_i V_p W_p^{\mathsf {T}}G_i ^{\mathsf {T}})&= V_p^{\mathsf {T}}B \in {\mathbb {R}}^{ p \times n_2 } . \end{aligned}$$
(2.7)

These equations can be interpreted as projections of the residual \(B - {\mathscr {A}} (V_p W_p^{\mathsf {T}}) \) onto the spaces spanned by the columns of \(W_p\) and \(V_p\), respectively.

Given (2.6)–(2.7), a widely used strategy for solving the minimization problem (2.3) is to compute each component of the solution pair \((V_p,W_p)\) alternately [5,6,7, 15,16,17]. That is, one can fix \(W_p\) and solve the system of equations of order \(n_1 p\) in (2.6) for \(V_p\), and then one can fix \(V_p\) and solve the system of equations of order \(n_2 p\) in (2.7) for \(W_p\). However, in this approach, suitable choices of p for satisfying a fixed error tolerance are typically not known a priori. Thus, adaptive schemes that incrementally compute solution pairs (\(v_i\),\(w_i\)) have been introduced [17, 30, 31, 46]. All of these schemes are based on alternately solving two systems of equations for two types of variables in an effort to minimize a certain error measure. In this study, we employ alternating methods for minimizing the energy norm of the error (2.3) and, thus, we refer to approaches of this type as alternating energy minimization (AEM) methods. In the following sections, we present two adaptive variants of AEM methods: a Stage-\(p\) AEM method and a successive rank-one AEM method.

2.2 Stage-\(p\) AEM method

In [17], an ALS method that entails solving a sequence of least-squares problems whose dimensions increase with p was developed for solving matrix-recovery problems [15,16,17]. We adapt this approach to the energy minimization problem (2.3) and refer to it as the Stage-\(p\) AEM method. It is an iterative method that runs until an approximate solution satisfies a stopping criterion (e.g., the relative difference of two consecutive iterates, \(\Vert V_pW_p^{\mathsf {T}}- V_{p-1}W_{p-1}^{\mathsf {T}}\Vert _\text {F}\le \epsilon \Vert V_pW_p^{\mathsf {T}}\Vert _\text {F}\), with a user-specified stopping tolerance \(\epsilon \)). At the pth iteration, called a “stage” in [17], this method seeks p-column factors \(V_p\) and \(W_p\) determining an approximate solution by initializing \(W_p^{(0)}\) and solving the following systems of equations in sequence:

$$\begin{aligned} \sum _{i=0}^m (K_i) V_p^{(k)} (W_p^{(k-1)}{}^{\mathsf {T}}G_i W_p^{(k-1)})^{\mathsf {T}}&= BW_p^{(k-1)}, \end{aligned}$$
(2.8)
$$\begin{aligned} \sum _{i=0}^m (V_p^{(k)}{}^{\mathsf {T}}K_i V_p^{(k)}) W_p^{(k)}{}^{\mathsf {T}}(G_i ^{\mathsf {T}})&= V_p^{(k)}{}^{\mathsf {T}}B, \end{aligned}$$
(2.9)

for \(k=1,\ldots ,k_{\max }\), where the superscript indicates the number of alternations between the two systems of Equations (2.8)–(2.9). Note that the method can also begin by initializing \(V_p^{(0)}\) and alternating between (2.9) and (2.8). Algorithm 1 summarizes the entire procedure. The CheckConvergence procedure (line 9) is detailed in Sect. 3. Terms of the form \(V_{p-1}\) or \(W_{p-1}\) that appear in several places for \(p=1\) (for example, in line 3 of Algorithm 1) correspond to null or “zero-column” matrices.

[Algorithm 1: Stage-\(p\) AEM method]

Systems of equations for “vectorized” versions of the matrix factors \(V_p\) and \(W_p\) can be derived from (2.8) and (2.9) as follows:

$$\begin{aligned} \sum _{i=0}^{m} [ (W_p^{(k-1)}{}^{\mathsf {T}}G_i W_p^{(k-1)})\otimes K_i ]\,\text {vec}(V_p^{(k)})&= \text {vec}(B W_p^{(k-1)}),\end{aligned}$$
(2.10)
$$\begin{aligned} \sum _{i=0}^{m} [ (V_p^{(k)}{}^{\mathsf {T}}K_i V_p^{(k)}) \otimes G_i ]\,\text {vec}(W_p^{(k)})&= \text {vec}( B^{\mathsf {T}}V_p^{(k)}{}). \end{aligned}$$
(2.11)

Thus, solving (2.8) entails solving a linear system of dimension \(n_1 p \times n_1 p\), and solving (2.9) entails solving a system of dimension \(n_2 p \times n_2 p\). Both systems are smaller than the original system (1.2) when p is small. However, the \(p\times p\) blocks of the reduced matrices, such as \(W_p^{\mathsf {T}}G_i W_p\) and \(V_p^{\mathsf {T}}K_i V_p\), are dense even if the original matrices are sparse, and so as p increases, the computational costs of solving (2.8)–(2.9) increase and the Stage-\(p\) AEM method may be impractical for large-scale problems.
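A dense sketch of the two half-sweeps (2.10)–(2.11) is given below. It is illustrative only: a practical implementation would use sparse matrices and a preconditioned Krylov solver rather than assembling the Kronecker sums explicitly, and the names `Ks`, `Gs` are assumed lists of \(K_i\), \(G_i\).

```python
import numpy as np

def stage_p_solve_V(Ks, Gs, B, W):
    """Solve (2.10) for vec(V) with W fixed; the p x p blocks W^T G_i W
    are dense even when G_i is sparse."""
    n1, p = Ks[0].shape[0], W.shape[1]
    A_red = sum(np.kron(W.T @ G @ W, K) for K, G in zip(Ks, Gs))
    v = np.linalg.solve(A_red, (B @ W).flatten(order="F"))
    return v.reshape((n1, p), order="F")

def stage_p_solve_W(Ks, Gs, B, V):
    """Solve (2.11) for vec(W) with V fixed."""
    n2, p = Gs[0].shape[0], V.shape[1]
    A_red = sum(np.kron(V.T @ K @ V, G) for K, G in zip(Ks, Gs))
    w = np.linalg.solve(A_red, (B.T @ V).flatten(order="F"))
    return w.reshape((n2, p), order="F")
```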

2.3 Successive rank-one AEM method

We now describe a successive rank-one (S-rank-\(1\)) approximation method which, at each iteration, adds a rank-one correction to the current iterate. This is a basic component of PGD methods [30, 31, 46] for solving parameterized PDEs. The method only requires solutions of linear systems with coefficient matrices of size \(n_1 \times n_1\) and \(n_2 \times n_2\) rather than coupled systems like those in the Stage-\(p\) AEM method that grow in size with the step counter p.

Assume that \(p-1\) pairs of solutions are computed, giving \(V_{p-1}\) and \(W_{p-1}\). The next step is to compute a new solution pair \((v_p,w_p)\) by choosing the objective function

$$\begin{aligned} J_A( v_p, w_p) =\frac{1}{2} \Vert U - V_{p-1}W_{p-1}^{\mathsf {T}}- v_p w_p^{\mathsf {T}}\Vert _A^2, \end{aligned}$$

and solving the following minimization problem

$$\begin{aligned} \min _{v_p \in {\mathbb {R}}^{n_1}, w_p \in {\mathbb {R}}^{n_2}} J_A(v_p, w_p). \end{aligned}$$

The gradients of \(J_A\) with respect to \(v_p\) and \(w_p\) are

$$\begin{aligned} \nabla _{v_p} J_A&= \left( {\mathscr {A}}(v_{p} w_{p}^{\mathsf {T}}) + {\mathscr {A}}(V_{p-1} W_{p-1}^{\mathsf {T}}) - B\right) w_p,\end{aligned}$$
(2.12)
$$\begin{aligned} \nabla _{w_p} J_A&= \left( {\mathscr {A}}(v_{p} w_{p}^{\mathsf {T}}) + {\mathscr {A}}(V_{p-1} W_{p-1}^{\mathsf {T}}) - B\right) ^{\mathsf {T}}v_p. \end{aligned}$$
(2.13)

Employing the first-order optimality conditions (setting (2.12) and (the transpose of) (2.13) to zero) results in systems of equations for which, in a succession of steps \(k=1,\ldots ,k_{\max }\), \(v_p\) is updated using fixed \(w_p\) and then \(w_p\) is updated using fixed \(v_p\):

$$\begin{aligned} \sum _{i=0}^m (K_i) v_p^{(k)} (w_p^{(k-1)}{}^{\mathsf {T}}G_i w_p^{(k-1)})^{\mathsf {T}}&= Bw_p^{(k-1)} - {\mathscr {A}}( V_{p-1}W_{p-1}^{\mathsf {T}}) w_p^{(k-1)}, \end{aligned}$$
(2.14)
$$\begin{aligned} \sum _{i=0}^m (v_p^{(k)}{}^{\mathsf {T}}K_i v_p^{(k)}) w_p^{(k)}{}^{\mathsf {T}}(G_i^{\mathsf {T}})&= v_p^{(k)}{}^{\mathsf {T}}B - v_p^{(k)}{}^{\mathsf {T}}{\mathscr {A}}( V_{p-1} W_{p-1}^{\mathsf {T}}). \end{aligned}$$
(2.15)

Algorithm 2 summarizes this procedure, which randomly initializes \(w_p^{(0)}\) and then alternately solves (2.14)–(2.15). Like the Stage-\(p\) AEM method, the algorithm can start with either \(w_p^{(0)}\) or \(v_p^{(0)}\).

[Algorithm 2: S-rank-\(1\) AEM method]
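A minimal sketch of the inner loop of Algorithm 2, alternating between (2.14) and (2.15), might look as follows. The dense handling of B and the residual, and the random initialization, are simplifying assumptions; in practice the residual would be kept in low-rank form.

```python
import numpy as np

def rank_one_correction(Ks, Gs, B, V, W, k_max=5, seed=0):
    """Compute a new pair (v, w) given the current factors V, W
    (zero-column matrices when p = 1), per (2.14)-(2.15)."""
    R = B - sum((K @ V) @ (G @ W).T for K, G in zip(Ks, Gs))  # B - A(V W^T)
    w = np.random.default_rng(seed).standard_normal(B.shape[1])
    for _ in range(k_max):
        # fix w: (sum_i (w^T G_i w) K_i) v = R w, an n1 x n1 system
        v = np.linalg.solve(sum((w @ G @ w) * K for K, G in zip(Ks, Gs)), R @ w)
        # fix v: (sum_i (v^T K_i v) G_i) w = R^T v, an n2 x n2 system
        w = np.linalg.solve(sum((v @ K @ v) * G for K, G in zip(Ks, Gs)), R.T @ v)
    return v, w
```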

2.4 Algebraic interpretation of the methods

Algorithms 1 and 2 both entail an “outer iteration” with counter p and an “inner iteration” with counter k, and both are designed to minimize the objective function (2.2). It is instructive to see the difference between the two methods in vectorized format. To this end, let

$$\begin{aligned} {{\mathscr {A}}_w}(w_i,w_j) = \sum _{{l=0}}^m K_{l} (w_j^{\mathsf {T}}G_{l}^{\mathsf {T}}w_i) \in {\mathbb {R}}^{n_1 \times n_1},\quad {{\mathscr {A}}_v}(v_i,v_j) = \sum _{{l=0}}^m G_{l} (v_j^{\mathsf {T}}K_{l}^{\mathsf {T}}v_i) \in {\mathbb {R}}^{n_2 \times n_2}, \end{aligned}$$

and let us assume \(p=2\) to simplify the presentation.

Both methods seek solution pairs \((V_2, W_2) =(\left[ v_1, v_2\right] , \left[ w_1, w_2\right] )\) satisfying the systems of Eqs. (2.6)–(2.7), which can be written in a vectorized form:

$$\begin{aligned} \left[ \begin{array}{ll} {\mathscr {A}}_w(w_1,w_1) &{} {\mathscr {A}}_w(w_1,w_2)\\ {\mathscr {A}}_w(w_2,w_1) &{} {\mathscr {A}}_w(w_2,w_2)\\ \end{array} \right] \left[ \begin{array}{l} v_1 \\ v_2 \end{array} \right]&= \left[ \begin{array}{l} B w_1\\ B w_2 \end{array} \right] , \end{aligned}$$
(2.16)
$$\begin{aligned} \left[ \begin{array}{ll} {\mathscr {A}}_v(v_1,v_1) &{} {\mathscr {A}}_v(v_1,v_2) \\ {\mathscr {A}}_v(v_2,v_1) &{} {\mathscr {A}}_v(v_2,v_2) \\ \end{array} \right] \left[ \begin{array}{l} w_1 \\ w_2 \end{array} \right]&= \left[ \begin{array}{l} B^{\mathsf {T}}v_1\\ B^{\mathsf {T}}v_2 \end{array} \right] . \end{aligned}$$
(2.17)

In the second outer iteration, the Stage-\(p\) AEM method alternately solves the fully coupled linear systems (2.8)–(2.9), specified by \(W_2^{(k-1)}\) and \(V_2^{(k)}\) respectively, which can be written in vectorized form as in (2.16) and (2.17):

$$\begin{aligned}&\left[ \begin{array}{ll} {\mathscr {A}}_w(w_1^{(k-1)},w_1^{(k-1)}) &{} {\mathscr {A}}_w(w_1^{(k-1)},w_2^{(k-1)})\\ {\mathscr {A}}_w(w_2^{(k-1)},w_1^{(k-1)}) &{} {\mathscr {A}}_w(w_2^{(k-1)},w_2^{(k-1)})\\ \end{array} \right] \left[ \begin{array}{l} v_1^{(k)} \\ v_2^{(k)} \end{array} \right] = \left[ \begin{array}{l} B w_1^{(k-1)}\\ B w_2^{(k-1)} \end{array} \right] , \nonumber \\&\quad \left[ \begin{array}{ll} {\mathscr {A}}_v(v_1^{(k)},v_1^{(k)}) &{} {\mathscr {A}}_v(v_1^{(k)},v_2^{(k)}) \\ {\mathscr {A}}_v(v_2^{(k)},v_1^{(k)}) &{} {\mathscr {A}}_v(v_2^{(k)},v_2^{(k)}) \\ \end{array} \right] \left[ \begin{array}{l} w_1^{(k)} \\ w_2^{(k)} \end{array} \right] = \left[ \begin{array}{l} B^{\mathsf {T}}v_1^{(k)}\\ B^{\mathsf {T}}v_2^{(k)} \end{array} \right] . \end{aligned}$$
(2.18)

In contrast, the S-rank-\(1\) method seeks approximate solutions of (2.16)–(2.17) by solving systems of equations associated with the diagonal blocks. In the first outer iteration, the method alternates between the following equations to find \(v_1\) and \(w_1\):

$$\begin{aligned}&\left[ {\mathscr {A}}_w(w_1^{(k-1)},w_1^{(k-1)}) \right] \left[ v_1^{(k)} \right] = \left[ B w_1^{(k-1)} \right] , \\&\quad \left[ {\mathscr {A}}_v(v_1^{(k)},v_1^{(k)}) \right] \left[ w_1^{(k)} \right] = \left[ B^{\mathsf {T}}v_1^{(k)} \right] . \end{aligned}$$

In the second outer iteration, the method alternately solves the systems of equations in the second rows of the following equations to find \(v_2\) and \(w_2\):

$$\begin{aligned}&\left[ \begin{array}{ll} {\mathscr {A}}_w(w_1,w_1) &{} \\ {\mathscr {A}}_w(w_2^{(k-1)},w_1) &{} {\mathscr {A}}_w(w_2^{(k-1)},w_2^{(k-1)}) \end{array} \right] \left[ \begin{array}{l} v_1 \\ v_2^{(k)} \end{array} \right] = \left[ \begin{array}{l} B w_1\\ B w_2^{(k-1)} \end{array} \right] , \\&\quad \left[ \begin{array}{ll} {\mathscr {A}}_v(v_1,v_1) &{} \\ {\mathscr {A}}_v(v_2^{(k)},v_1) &{} {\mathscr {A}}_v(v_2^{(k)},v_2^{(k)}) \end{array} \right] \left[ \begin{array}{l} w_1 \\ w_2^{(k)} \end{array} \right] = \left[ \begin{array}{l} B^{\mathsf {T}}v_1\\ B^{\mathsf {T}}v_2^{(k)} \end{array} \right] . \end{aligned}$$

Because \(v_1\) and \(w_1\) are fixed, the (2,1)-block matrices are multiplied by \(v_1\) and \(w_1\) and the resulting vectors are moved to the right-hand sides. Then solving the equations associated with the (2,2)-block matrices gives \(v_2^{(k)}\) and \(w_2^{(k)}\). As illustrated in this example, the S-rank-\(1\) AEM method approximately solves (2.16)–(2.17) by taking the matrices in the lower-triangular blocks to the right-hand sides and solving only the systems associated with the diagonal blocks, as opposed to solving fully coupled systems as in the Stage-\(p\) AEM method.

The system matrices that arise in Algorithm 1 have reduced components that are dense but small (of size \(p \times p\)), whereas the “non-reduced” components are large but sparse. In Algorithm 2, the system matrices are sparse and of order \(n_1\) and \(n_2\) (as the reduced components are of size \(1 \times 1\)). Thus, in both cases, we may use Krylov subspace methods to solve the systems. Then, with the iteration counter p, the cost of the Stage-\(p\) AEM method grows quadratically (since the reduced components are dense), whereas that of the S-rank-\(1\) AEM method grows linearly with p. Thus, using the Stage-\(p\) AEM method can be impractical for large-scale applications. On the other hand, as the S-rank-\(1\) AEM method employs only the lower-triangular part of the system matrices, convergence tends to be slow and the level of accuracy that can be achieved in a small number of steps is limited. To overcome these shortcomings, we will consider several ways to modify and enhance the S-rank-\(1\) method to improve its accuracy.

Remark 2.1

The Stage-\(p\) AEM and S-rank-\(1\) AEM methods can be seen as two extreme versions of AEM methods. The former solves fully coupled systems and the latter sequentially solves systems associated with the diagonal blocks. Although it has not been explored in this study, in an intermediate approach, more than one consecutive pair of solution vectors \((\{v_{p},\ldots ,v_{p+\ell }\},\{w_{p},\ldots ,w_{p+\ell }\})\), with \(\ell \in {\mathbb {N}}\), can be computed in a coupled manner at each outer iteration.

3 Enhancements

We now describe variants of the S-rank-\(1\) AEM method that perform extra computations to improve accuracy. The general strategy is to compute an enhancement of the approximate solution every \(n_{\text {update}}\) outer iterations, as specified in Algorithms 3–5.

[Algorithm 3: EnhancedAEM]
[Algorithm 4: RankOneCorrection]
[Algorithm 5: CheckConvergence]

We present three enhancement procedures, one taken from the literature and two new ones. These are (i) a procedure adopted from an updating technique developed in [46, Section 2.5], which defines one variant of PGD methods; (ii) a refined version of this approach, which only solves systems associated with the diagonal blocks of the system matrices but incorporates information (upper-triangular blocks) in a manner similar to Gauss–Seidel iterations; and (iii) an adaptive enhancement of the Stage-\(p\) AEM method that decreases costs with negligible impact on accuracy. In discussing these ideas, we distinguish updated solutions using the notation \({\overline{v}}_i\), \({\overline{w}}_i\) (for vectors), and \({\overline{V}}_p = [{\overline{v}}_1,\ldots ,{\overline{v}}_p]\), \({\overline{W}}_p = [{\overline{w}}_1,\ldots ,{\overline{w}}_p]\) (for matrices). We also review the method proposed in [23].

Before detailing each method, we elaborate on the \(\textsc {CheckConvergence}\) procedure in Algorithm 5. This checks the relative difference between the current and previous iterates, \(\Vert V_pW_p^{\mathsf {T}}- V_{p-1}W_{p-1}^{\mathsf {T}}\Vert _\text {F}\le \epsilon \Vert V_pW_p^{\mathsf {T}}\Vert _\text {F}\), in the Frobenius norm. To compute \( \Vert V_pW_p^{\mathsf {T}}\Vert _\text {F}^{2}\) without explicitly forming the large matrix \(V_pW_p^{\mathsf {T}}\), we form \(X = (V_p^{\mathsf {T}}V_p) \odot (W_p^{\mathsf {T}}W_p) \in {\mathbb {R}}^{p\times p}\), where \(\odot \) is the Hadamard product, and sum all the elements of X. If this condition is met, we apply the \(\textsc {Enhancement}\) procedure and check convergence again with the same criterion. This extra enhancement helps prevent Algorithm 3 from terminating prematurely (i.e., the stopping condition can be met when Algorithm 3 stagnates).
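A sketch of these two computations follows: the Hadamard-product trick for the norm, and the relative-difference check written with the stacked factors \([V_p, V_{p-1}]\) and \([W_p, -W_{p-1}]\), so that the difference is itself a low-rank product and no \(n_1\times n_2\) matrix is formed.

```python
import numpy as np

def fro_norm_lowrank(V, W):
    """||V W^T||_F via sum of (V^T V) hadamard (W^T W)."""
    return np.sqrt(np.sum((V.T @ V) * (W.T @ W)))

def relative_change(Vp, Wp, Vm, Wm):
    """||Vp Wp^T - Vm Wm^T||_F / ||Vp Wp^T||_F, since
    Vp Wp^T - Vm Wm^T = [Vp, Vm] [Wp, -Wm]^T."""
    num = fro_norm_lowrank(np.hstack([Vp, Vm]), np.hstack([Wp, -Wm]))
    return num / fro_norm_lowrank(Vp, Wp)
```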

3.1 PGD-updated AEM

Suppose the factors \(V_p\) and \(W_p\) obtained from RankOneCorrection do not satisfy the first-order optimality conditions (2.6)–(2.7). An enhancement like that of the PGD update [30, 31, 46] modifies one of these factors (e.g., the one corresponding to the smaller dimension \(n_1\) or \(n_2\)) by solving the associated minimization problem for \(V_p\) (given \(W_p\), when \(n_1<n_2\)) or for \(W_p\) (given \(V_p\), when \(n_1>n_2\)) so that one of the first-order conditions holds. We outline the procedure for approximating \(W_p\); the procedure for \(V_p\) is analogous. The basic procedure is to solve the optimization problem \(\min _{W_p \in {\mathbb {R}}^{n_2 \times p}} J_A\left( V_p, W_p\right) \) every \(n_{\text {update}}\) steps. In place of \(V_p\), an orthonormal matrix \({{\widetilde{V}}}_p\) is used, so that the construction entails solving

$$\begin{aligned} {\overline{W}}_p = \underset{W_p \in {\mathbb {R}}^{n_2 \times p}}{\arg \min }J_A\left( {\widetilde{V}}_p, W_p\right) , \end{aligned}$$
(3.1)

where \(J_A\) is the quadratic objective function defined in (2.2). The gradient of the objective function \(J_A\) with respect to \(W_p\) can be computed as

$$\begin{aligned} \nabla _{W_p} J_A&= \left( {\mathscr {A}} ({\widetilde{V}}_p W_p^{\mathsf {T}}) - B \right) ^{\mathsf {T}}{\widetilde{V}}_p = \sum _{i=0}^m (K_i {\widetilde{V}}_p W_p^{\mathsf {T}}G_i^{\mathsf {T}})^{\mathsf {T}}{\widetilde{V}}_p - B^{\mathsf {T}}{\widetilde{V}}_p. \end{aligned}$$

Thus, solving the minimization problem (3.1) by employing the first-order optimality condition is equivalent to solving a system of equations similar in structure to (2.7),

$$\begin{aligned} \sum _{i=0}^m ({\widetilde{V}}_p^{\mathsf {T}}K_i {\widetilde{V}}_p) {\overline{W}}_p^{\mathsf {T}}(G_i ^{\mathsf {T}})&= {\widetilde{V}}_p^{\mathsf {T}}B \in {\mathbb {R}}^{ p \times n_2 }. \end{aligned}$$
(3.2)

Compared to the original system (1.2), the dimension of this matrix is reduced via a “single-sided” reduction; in (3.2), the reduction is on the side of the first dimension, i.e., \(n_1\) is reduced to p. The vectorized form of this system, for \(p=2\), is

$$\begin{aligned} \left[ \begin{array}{ll} {\mathscr {A}}_v({{\tilde{v}}}_1,{{\tilde{v}}}_1) &{} {\mathscr {A}}_v({{\tilde{v}}}_1,{{\tilde{v}}}_2) \\ {\mathscr {A}}_v({{\tilde{v}}}_2,{{\tilde{v}}}_1) &{} {\mathscr {A}}_v({{\tilde{v}}}_2,{{\tilde{v}}}_2) \\ \end{array} \right] \left[ \begin{array}{l} {\overline{w}}_1 \\ {\overline{w}}_2 \end{array} \right] = \left[ \begin{array}{l} B^{\mathsf {T}}{{\tilde{v}}}_1\\ B^{\mathsf {T}}{{\tilde{v}}}_2 \end{array} \right] , \end{aligned}$$

which has structure like that of the second system in (2.18) of the Stage-\(p\)  AEM method. We summarize this single-sided enhancement method in Algorithm 6.

[Algorithm 6: PGD-update enhancement]
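A dense sketch of this enhancement is given below, assuming `Ks` and `Gs` hold the coefficient matrices and using NumPy's reduced QR for the skinny QR factorization; the dense solve stands in for the PCG iteration used in practice. The assembled system has the same structure as (2.11), with \({\widetilde{V}}_p\) in place of \(V_p\).

```python
import numpy as np

def pgd_update_W(Ks, Gs, B, V):
    """Orthonormalize V and solve (3.2) for the updated factor W."""
    Vt, _ = np.linalg.qr(V)  # skinny QR; Vt has orthonormal columns
    n2, p = Gs[0].shape[0], V.shape[1]
    A_red = sum(np.kron(Vt.T @ K @ Vt, G) for K, G in zip(Ks, Gs))
    w = np.linalg.solve(A_red, (B.T @ Vt).flatten(order="F"))
    return Vt, w.reshape((n2, p), order="F")
```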

Remark 3.1

Another approach for computing a set of orthonormal basis vectors and computing a low-rank solution by solving a reduced system of type (3.2) is given in [36]. The MultiRB method of [36] incrementally computes a set of orthonormal basis vectors for the spatial part of the solution (i.e., \({\widetilde{V}}_p \in {\mathbb {R}}^{n_1 \times p}\)) using rational Krylov subspace methods and solves a reduced system for \({\overline{W}}_p\) and, consequently, \(U_p= {\widetilde{V}}_p {\overline{W}}_p^{\mathsf {T}}\).

3.2 PGD/Gauss–Seidel-updated AEM

The second strategy for enhancement, like Algorithm 2 (and in contrast to PGD-updated AEM), only requires solutions of linear systems with coefficient matrices of dimensions \(n_1 \times n_1\) and \(n_2 \times n_2\), independent of p. As observed in Sect. 2.4, the S-rank-\(1\) AEM method loosely corresponds to solving lower block-triangular systems of equations. We modify these computations by using more information (from the upper triangular part), as soon as it becomes available. This leads to a method that resembles the (block) Gauss–Seidel method for linear systems [14]. Suppose \(\{(v_i, w_i)\}_{i=1}^{p}\) are obtained from p iterations of Algorithm 3. When the condition on line 5 of Algorithm 3 is met, these quantities will be updated in sequence to produce \(\{({\overline{v}}_i, {\overline{w}}_i)\}_{i=1}^{p}\) using the most recently computed quantities. In particular, suppose the updated pairs \(\{({\overline{v}}_i, {\overline{w}}_i)\}_{i=1}^{l-1}\) have been computed. Then the lth pair \((v_l, w_l)\) is updated as follows. First, given \(w_l\), the update \({\overline{v}}_l\) is computed by solving

$$\begin{aligned} {\mathscr {A}}_w(w_l, w_l) {\overline{v}}_l = B w_l - {\sum _{i=1}^{l-1} {\mathscr {A}}_w( w_l, {\overline{w}}_i) {\overline{v}}_i} - \sum _{i=l+1}^{p} {\mathscr {A}}_w( w_l, w_i) v_i. \end{aligned}$$
(3.3)

Then given \({\overline{v}}_l\), \({\overline{w}}_l\) is computed by solving

$$\begin{aligned} {\mathscr {A}}_v({\overline{v}}_l,{\overline{v}}_l) {\overline{w}}_l = B^{\mathsf {T}}{\overline{v}}_l - \sum _{i=1}^{l-1} {\mathscr {A}}_v( {\overline{v}}_l, {\overline{v}}_i) {\overline{w}}_i - \sum _{i=l+1}^{p} {\mathscr {A}}_v( {\overline{v}}_l, {v}_i) {w}_i. \end{aligned}$$
(3.4)
[Algorithm 7: PGD/GS enhancement]
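One sweep of this procedure can be sketched as follows, with dense solves as stand-ins for the preconditioned iterative solves; the in-place column updates realize the Gauss–Seidel ordering, since updated columns (\(i<l\)) and old columns (\(i>l\)) both sit in V and W.

```python
import numpy as np

def pgd_gs_sweep(Ks, Gs, B, V, W):
    """Update each pair (v_l, w_l) in turn via (3.3)-(3.4)."""
    p = V.shape[1]
    for l in range(p):
        w = W[:, l].copy()
        # (3.3): diagonal-block solve for v_l; all other pairs go to the rhs
        rhs = B @ w - sum((w @ G @ W[:, i]) * (K @ V[:, i])
                          for K, G in zip(Ks, Gs) for i in range(p) if i != l)
        V[:, l] = np.linalg.solve(sum((w @ G @ w) * K for K, G in zip(Ks, Gs)), rhs)
        v = V[:, l].copy()
        # (3.4): analogous solve for w_l, given the updated v_l
        rhs = B.T @ v - sum((v @ K @ V[:, i]) * (G @ W[:, i])
                            for K, G in zip(Ks, Gs) for i in range(p) if i != l)
        W[:, l] = np.linalg.solve(sum((v @ K @ v) * G for K, G in zip(Ks, Gs)), rhs)
    return V, W
```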

With \(p=2\) as an example, in vector format, the first step of this enhancement is to update \((v_1, w_1)\) to \(({\overline{v}}_1,{\overline{w}}_1)\) by solving the following equations:

$$\begin{aligned}&\left[ \begin{array}{ll} {\mathscr {A}}_w({w}_1,{w}_1) &{} {\mathscr {A}}_w({w}_1,w_2)\\ \end{array} \right] \left[ \begin{array}{l} {\overline{v}}_1 \\ {v}_2 \end{array} \right] = \left[ B w_1 \right] , \nonumber \\&\quad \left[ \begin{array}{ll} {\mathscr {A}}_v({\overline{v}}_1,{\overline{v}}_1) &{} {\mathscr {A}}_v({\overline{v}}_1,v_2)\\ \end{array} \right] \left[ \begin{array}{l} {\overline{w}}_1 \\ w_2 \end{array} \right] = \left[ B^{\mathsf {T}}{\overline{v}}_1 \right] , \end{aligned}$$

and the second step is to update \((v_2, w_2)\) to \(({\overline{v}}_2,{\overline{w}}_2)\) by solving the second row of the following equations:

$$\begin{aligned} \begin{aligned} \left[ \begin{array}{ll} {\mathscr {A}}_w({\overline{w}}_1,{\overline{w}}_1) &{} {\mathscr {A}}_w({\overline{w}}_1,{w}_2) \\ {\mathscr {A}}_w(w_2,{\overline{w}}_1) &{} {\mathscr {A}}_w(w_2,w_2) \\ \end{array} \right] \left[ \begin{array}{l} {\overline{v}}_1 \\ {\overline{v}}_2 \end{array} \right]&= \left[ \begin{array}{l} B {\overline{w}}_1\\ B w_2 \end{array} \right] , \\ \left[ \begin{array}{ll} {\mathscr {A}}_v({\overline{v}}_1,{\overline{v}}_1) &{} {\mathscr {A}}_v({\overline{v}}_1,{\overline{v}}_2) \\ {\mathscr {A}}_v({\overline{v}}_2,{\overline{v}}_1) &{} {\mathscr {A}}_v({\overline{v}}_2,{\overline{v}}_2) \end{array} \right] \left[ \begin{array}{l} {\overline{w}}_1 \\ {\overline{w}}_2 \end{array} \right]&= \left[ \begin{array}{l} B^{\mathsf {T}}{\overline{v}}_1\\ B^{\mathsf {T}}{\overline{v}}_2 \end{array} \right] . \end{aligned} \end{aligned}$$

This strategy, which we call the PGD/GS enhancement, is summarized in Algorithm 7. It is an alternative to Algorithm 6 and is also applied every \(n_{\text {update}}\) outer iterations. For a comparison of Algorithms 6 and 7, note that Algorithm 6 (PGD-update) works with a larger system, but it can exploit the matricized representation (3.2). Once the system matrices \({\widetilde{G}}_i = {\widetilde{W}}_p^{\mathsf {T}}G_i \widetilde{W}_p\) or \({\widetilde{K}}_i= {\widetilde{V}}_p^{\mathsf {T}}K_i {\widetilde{V}}_p\) are formed, the system in (3.2) (of order \(n_2p\) in this example), if not too large, can be approximately solved using an iterative method such as the preconditioned conjugate gradient (PCG) method. In contrast, Algorithm 7 (PGD/GS) requires sequential updates of individual components via Eqs. (3.3)–(3.4), but with smaller blocks, of order \(n_1\) and \(n_2\). As we will show in Sect. 4, the PGD/GS-updated AEM method exhibits better performance in some error measures.

We have found that in practice, the enhancement procedure can be improved by updating only a chosen subset of solution pairs rather than all the solution pairs \(\{(v_i,w_i)\}_{i=1}^{p}\). We discuss a criterion to choose such a subset next.

3.3 Reduced stage-\(p\) AEM method

The third enhancement procedure excerpts and modifies certain computations in the Stage-\(p\)  AEM method (Lines 5 and 6 in Algorithm 1) in a computationally efficient way. The procedure adaptively chooses solution pairs to be updated and solves reduced systems to update only those pairs. Let us assume for now that a subset of the solution pairs to be updated has been chosen. Denote the set of indices of those solution pairs by \(\ell (p) \subseteq \{1,\ldots ,p-1\}\) and the remaining indices by \(\ell ^{\text {c}}(p) = \{1,\ldots ,p-1\} \setminus \ell (p)\). Then the update is performed by solving the following equations for \({\overline{V}}_{\ell (p)}\) and \({\overline{W}}_{\ell (p)}\):

$$\begin{aligned} \sum _{i=0}^{m} (K_i) {\overline{V}}_{\ell (p)} ({\widetilde{W}}_{\ell (p)}^{\mathsf {T}}G_i {\widetilde{W}}_{\ell (p)})^{\mathsf {T}}&= B {\widetilde{W}}_{\ell (p)} - \sum _{i=0}^m (K_i) V_{\ell ^{\text {c}}(p)} ({\widetilde{W}}_{\ell (p)}^{\mathsf {T}}G_i W_{\ell ^{\text {c}}(p)})^{\mathsf {T}}, \end{aligned}$$
(3.5)

where \({\widetilde{W}}_{\ell (p)}\) is obtained by orthonormalizing the columns of \(W_{\ell (p)}\), and

$$\begin{aligned} \sum _{i=0}^{m} ({\widetilde{V}}_{\ell (p)}^{\mathsf {T}}K_i {\widetilde{V}}_{\ell (p)}) {\overline{W}}_{\ell (p)}^{\mathsf {T}}( G_i ^{\mathsf {T}})&= {\widetilde{V}}_{\ell (p)}^{\mathsf {T}}B - \sum _{i=0}^m ({\widetilde{V}}_{\ell (p)}^{\mathsf {T}}K_i V_{\ell ^{\text {c}}(p)}) W_{\ell ^{\text {c}}(p)}^{\mathsf {T}}( G_i ^{\mathsf {T}}), \end{aligned}$$
(3.6)

where \({\widetilde{V}}_{\ell (p)}\) is obtained by orthonormalizing the columns of \({\overline{V}}_{\ell (p)}\). Then, \({V}_{\ell (p)}\) and \({W}_{\ell (p)}\) are updated to \({\overline{V}}_{\ell (p)}\) and \({\overline{W}}_{\ell (p)}\), while \({V}_{\ell ^{\text {c}}(p)}\) and \({W}_{\ell ^{\text {c}}(p)}\) remain the same.

We now describe a criterion to choose a subset of the solution pairs to be updated. Let us assume that \(p-1\) iterations of Algorithm 3 have been performed, and \(V_{p-1}\) and \(W_{p-1}\) have been computed. The pth solution pair (\(v_p\), \(w_p\)) is then computed via Algorithm 4. If \(p \bmod n_{\text {update}}=0\), then a subset of the previous \(p-1\) solution pairs is chosen by inspecting the angles between \(v_p\) and the columns of \(V_{p-1}\), and similarly for \(w_{p}\) and \(W_{p-1}\). We normalize all vectors, \({\tilde{v}}_i = \frac{v_i}{\Vert v_i\Vert _2}\), and compute \(\beta _V = {\widetilde{V}}_{p-1}^{\mathsf {T}}{{\tilde{v}}}_p \in {\mathbb {R}}^{p-1}\) (the vector of cosines of the angles), and an analogous vector \(\beta _W\) using \(w_p\) and \(W_{p-1}\). The entries of \(\beta _V\) and \(\beta _W\) indicate how far from orthogonal all previous vectors are to \(v_p\) and \(w_p\). Ideally, we want the method to compute p left and right singular vectors of the solution U (i.e., \(\beta _V = \beta _W = 0\)). As the aim is to find good basis vectors for approximating U, it is undesirable to keep vectors that are far from being orthogonal to \(v_p\) and \(w_p\). To resolve this, we choose the subset of columns of \(V_{p-1}\) and \(W_{p-1}\) for which the entries of \(\beta _V\) or \(\beta _W\) are too large; we fix \(\tau >0\) and choose

$$\begin{aligned} \ell (p) = \{ i \in \{1,\ldots , p-1 \} \mid | [\beta _V]_i |> \tau \text { or } | [\beta _W]_i | > \tau \}. \end{aligned}$$
(3.7)

Algorithm 8 summarizes the resulting reduced stage-\(p\) (R-stage-\(p\)) enhancement.

[Algorithm 8: R-stage-\(p\) enhancement]
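The selection step (3.7) itself is simple; a sketch, assuming the newest pair \((v_p, w_p)\) is stored in the last columns of `V` and `W`, is:

```python
import numpy as np

def select_indices(V, W, tau):
    """Return the indices i whose cosine with v_p or w_p exceeds tau
    in magnitude, per (3.7)."""
    Vt = V / np.linalg.norm(V, axis=0)  # normalize columns
    Wt = W / np.linalg.norm(W, axis=0)
    beta_V = Vt[:, :-1].T @ Vt[:, -1]   # cosines against v_p
    beta_W = Wt[:, :-1].T @ Wt[:, -1]   # cosines against w_p
    return np.where((np.abs(beta_V) > tau) | (np.abs(beta_W) > tau))[0]
```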

3.4 PGD-greedy AEM

As a baseline for comparison, we review the greedy low-rank method proposed in [20] and further examined in [23], which can be interpreted as another variant of the S-rank-\(1\) AEM method with Enhancement. We denote this method as PGD-greedy AEM in this study. The method seeks an approximate solution in a three-factor form \(U_p = {\widetilde{V}}_p Z_p {\widetilde{W}}_p^{\mathsf {T}}\), where the columns of \({\widetilde{V}}_p\) and \({\widetilde{W}}_p\) are orthonormal; this can be achieved by (i) slightly modifying RankOneCorrection (Algorithm 4) and (ii) employing a particular enhancement procedure. The modified version of RankOneCorrection computes a new basis pair (\({{\tilde{v}}}_p,{{\tilde{w}}}_p\)), where each vector has unit norm, by setting \({{\tilde{w}}}_p^{(0)}\) to have unit norm and alternately performing the following procedure:

$$\begin{aligned}&\mathrm {Solve}\;\; {{\mathscr {A}}_w}({{\tilde{w}}}_p^{(k-1)},{{\tilde{w}}}_p^{(k-1)}) v_p^{(k)} = B{{\tilde{w}}}_p^{(k-1)} - {\mathscr {A}}( U_{p-1}) {{\tilde{w}}}_p^{(k-1)}, \quad&{{\tilde{v}}}_p^{(k)}&\leftarrow { v_p^{(k)}}/{\Vert v_p^{(k)}\Vert _2}, \\&\mathrm {Solve}\;\; {{\mathscr {A}}_v}({{\tilde{v}}}_p^{(k)},{{\tilde{v}}}_p^{(k)}) w_p^{(k)} = B^{\mathsf {T}}{{\tilde{v}}}_p^{(k)} - {\mathscr {A}}( U_{p-1})^{\mathsf {T}}{{\tilde{v}}}_p^{(k)}, \quad&{{\tilde{w}}}_p^{(k)}&\leftarrow { w_p^{(k)}}/{\Vert w_p^{(k)}\Vert _2}. \end{aligned}$$

This inner iteration continues until a stopping criterion \( \left| \Vert v_p^{(k)}\Vert _2 / \Vert w_p^{(k)}\Vert _2 - 1\right| \le \delta \) is satisfied, where \(\delta \) is a stopping tolerance. Then (at each outer iteration) the approximate solution \(U_p\) is computed by solving a reduced linear system of equations, which is obtained via a “double-sided” reduction, for \(Z_p\) such that

$$\begin{aligned} {\widetilde{V}}_p^{\mathsf {T}}{\mathscr {A}}( {\widetilde{V}}_p Z_p {\widetilde{W}}_p^{\mathsf {T}}) {\widetilde{W}}_p&= {\widetilde{V}}_p^{\mathsf {T}}B{\widetilde{W}}_p \in {\mathbb {R}}^{p \times p}. \end{aligned}$$
(3.8)

We refer readers to [20, 23] for more details on this method.
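The core step (3.8) amounts to a small Kronecker-structured solve; a dense sketch (illustrative only, with `Ks`, `Gs` as before and orthonormal factors `Vt`, `Wt`) is:

```python
import numpy as np

def solve_core(Ks, Gs, B, Vt, Wt):
    """Solve sum_i (Vt^T K_i Vt) Z (Wt^T G_i Wt)^T = Vt^T B Wt, i.e.,
    the p^2 x p^2 vectorized form of (3.8), for the core Z."""
    p = Vt.shape[1]
    A_red = sum(np.kron(Wt.T @ G @ Wt, Vt.T @ K @ Vt) for K, G in zip(Ks, Gs))
    z = np.linalg.solve(A_red, (Vt.T @ B @ Wt).flatten(order="F"))
    return z.reshape((p, p), order="F")
```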

At the pth outer iteration, this double-sided reduction technique requires the computation of the solution of Eq. (3.8) which is of size \(p\times p\), whereas the PGD-update method employs a single-sided reduction requiring the computation of solutions of Eq. (2.7) which are of size \(\min (n_1,n_2) \times p\). The R-stage-\(p\) method, on the other hand, updates only the solution pairs chosen based on the criterion (3.7) and, thus, the size of problems arising in that approach is affected by the value of \(\tau \).

4 Numerical experiments

In this section, we present the results of numerical experiments with the algorithms described in Sects. 2 and 3. For benchmark problems, we consider stochastic diffusion problems, where the stochasticity is assumed to be characterized by a prescribed set of m real-valued random variables. We apply suitable SGFEM discretizations to these problems, resulting in LMTMEs of the form (1.2) whose system matrices are symmetric positive-definite. All experiments are performed on an Intel 3.1 GHz i7 CPU, with 16 GB RAM, using Matlab R2019b.

4.1 Stochastic diffusion problems

Let \((\varOmega ,{\mathscr {F}}, P)\) be a probability space and let \(D=[0, 1]\times [0,1]\) be the spatial domain. Next, let \(\xi _{i}: \varOmega \rightarrow \varGamma _{i} \subset {\mathbb {R}}\), for \(i=1,\ldots , m,\) be independent and identically distributed random variables and define \(\xi =[\xi _{1}, \ldots , \xi _{m}]\). Then, \(\xi : \varOmega \rightarrow \varGamma \) where \(\varGamma \equiv \prod _{i=1}^{m} \varGamma _i\) denotes the image. Given a second-order random field \(a: D\times \varGamma \rightarrow {\mathbb {R}}\), we consider the following boundary value problem with constant forcing term \(f(x)=1\). Find \(u: D \times \varGamma \rightarrow {\mathbb {R}}\) such that

$$\begin{aligned} \left\{ \begin{array}{r l l} &{}-\nabla \cdot (a(x,\xi ) \nabla u(x,\xi ) ) = f(x) &{}\text { in } D \times \varGamma ,\\ &{}u(x,\xi ) = 0 &{}\text { on } \partial D \times \varGamma , \end{array} \right. \end{aligned}$$
(4.1)

where \(a(x,\xi )\) has the form (1.5) and the \(\xi _i\) are chosen to be independent uniform random variables. Note that (1.5) has the same structure as a truncated Karhunen-Loève (KL) expansion [27]. If we denote the joint probability density function of \(\xi \) by \(\rho (\xi )\) then the expected value of a random function \(v(\xi )\) on \(\varGamma \) is \(\langle v \rangle _\rho = \int _\varGamma v(\xi ) \rho (\xi ) d\xi .\)

For the discretization, we consider the stochastic Galerkin method [1, 13, 28, 47], which seeks an approximation to the solution of the following weak formulation of (4.1): Find \(u(x,\xi )\) in \(V = H_0^1(D) \otimes L_{\rho }^2(\varGamma )\) such that

$$\begin{aligned} \left\langle \int _D a({x},\,\xi ) \nabla u({x},\,\xi ) \cdot \nabla v({x},\,\xi ) d{x} \right\rangle _\rho = \left\langle \int _D f(x) v({x},\,\xi ) dx \right\rangle _\rho \!\!, \quad \forall v \in V.\qquad \quad \end{aligned}$$
(4.2)

In particular, we seek an approximation of the form \({\tilde{u}}({x},\,{\xi }) = \sum _{s=1}^{n_\xi } \sum _{r=1}^{n_x} u_{rs} \phi _r({x}) \psi _s({\xi })\), where \(\{\phi _r\}_{r=1}^{n_x}\) is a set of standard finite element basis functions, which arises from using continuous piecewise bilinear approximation on a uniform mesh of square elements (Q1 elements), and \(n_x\) is related to the refinement level of the spatial mesh (our implementation uses the Incompressible Flow & Iterative Solver Software (IFISS) [11, 42]). In addition, \(\{ \psi _s \}_{s=1}^{n_\xi }\) is chosen to be a finite subset of the set of orthonormal polynomials that provides a basis for \(L_{\rho }^{2}(\varGamma )\) (also known as a generalized polynomial chaos (gPC) [48]). As the random variables are uniformly distributed, we use m-variate normalized Legendre polynomials \(\{\psi _s \}_{s=1}^{n_\xi }\), which are constructed as products of univariate Legendre polynomials, \(\psi _s(\xi ) = \prod _{i=1}^{m} \pi _{d_i(s)}(\xi _i)\). Here, \(d(s) = (d_1(s),\ldots ,d_m(s))\) is a multi-index and \(\pi _{d_i(s)}\) is the univariate Legendre polynomial of degree \(d_i(s)\) in \(\xi _i\). The set of multi-indices \(\{d(s)\}_{s=1}^{n_\xi }\) is specified as \(\varLambda _{m,\, d_\text {tot}} = \{d(s) \in {\mathbb {N}}^{m}_0: \Vert {d}(s)\Vert _1 \le d_\text {tot}\}\), where \({\mathbb {N}}_0\) is the set of non-negative integers and \(\Vert {d}(s)\Vert _1 = \sum _{j=1}^{m} d_j(s)\). With this construction, \(\text {span}(\{\psi _s(\xi )\}_{s=1}^{n_\xi })\) is the set of polynomials in \(\xi \) of total degree \(d_{\text {tot}}\) or less, with dimension \(n_\xi = \text {dim}(\varLambda _{m,\, d_\text {tot}}) = \frac{(m+d_\text {tot})!}{m!\,d_\text {tot}!}\).
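The dimension formula is easy to evaluate; for instance (a minimal sketch):

```python
from math import comb

def n_xi(m, d_tot):
    """Dimension of the total-degree space: (m + d_tot)! / (m! d_tot!)."""
    return comb(m + d_tot, d_tot)

assert n_xi(5, 3) == 56      # the setting used in Sect. 4.2.1
assert n_xi(20, 4) == 10626  # a larger setting from Sect. 4.2.3
```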

Galerkin projection of (4.2) onto the chosen finite-dimensional space (i.e., using the same test basis functions as the trial basis functions), with the coefficients of the solution expansion ordered as \(u = [u_{11}, \ldots , u_{n_x1},u_{12}, \ldots , u_{n_x n_\xi }]^{\mathsf {T}}\), results in

$$\begin{aligned} \left( \sum _{i=0}^m G_i \otimes K_i \right) u = g_0 \otimes f_0, \end{aligned}$$
(4.3)

where the system matrices are defined as

$$\begin{aligned}{}[G_0]_{st}&= \left\langle \psi _s ({\xi }) \psi _t ({\xi }) \right\rangle _\rho ,&\quad [K_0]_{k\ell }= \int _D a_{0}(x) \nabla \phi _k ({x}) \cdot \nabla \phi _\ell ({x}) d{x}, \\ [G_i]_{st}&= \left\langle \xi _i\, \psi _s ({\xi }) \psi _t ({\xi }) \right\rangle _\rho ,&\quad [K_i]_{k\ell } = \int _D a_i({x}) \nabla \phi _k ({x}) \cdot \nabla \phi _\ell ({x}) d{x}, \end{aligned}$$

for \(i = 1,\,\ldots ,\,m\), \(s,t=1,\ldots , n_{\xi }\), and \(k,\ell =1,\ldots , n_{x}\). Due to the deterministic forcing term \(f(x)=1\), the right-hand side has a rank-one structure (i.e., \(\hat{m}=0\) in (1.1)), with \([f_0]_k = \int _D f(x) \phi _k({x}) d{x}\) and \([g_0]_s = \left\langle \psi _s ({\xi }) \right\rangle _\rho \). Matricizing (4.3) gives an LMTME of the form (1.2) with \(n_1 = n_x\) and \(n_2 = n_\xi \), where m is the number of terms in the sum in Eq. (1.5), and we can now apply the AEM methods to compute an approximate solution.

4.2 Benchmark problem 1: separable exponential covariance

In this problem, we assume that the random field \(a(x,\xi )\) is a truncated KL expansion

$$\begin{aligned} a(x,\xi ) = \mu + \sigma \sum _{i=1}^{m} \sqrt{\lambda _i} \varphi _i(x) \xi _i, \end{aligned}$$
(4.4)

where \(\mu \) is the mean of \(a(x,\xi )\), \(\{(\varphi _i(x),\lambda _i)\}_{i=1}^{m}\) are eigenpairs of the integral operator associated with the covariance kernel \(C({x},\, {y}) \equiv \exp \left( - \frac{|x_1 - y_1|}{c} - \frac{|x_2 - y_2|}{c} \right) \), c is the associated correlation length, and \(\sigma ^2\) is the variance of the untruncated random field. In addition, each \(\xi _i \sim U(-\sqrt{3},\sqrt{3})\) and so has mean zero and variance one.

In the following sections, we compare the six AEM variants: Stage-\(p\) (Algorithm 1), S-rank-\(1\) (Algorithm 2), PGD-updated (Algorithm 6), PGD/GS-updated (Algorithm 7), reduced stage-\(p\) (Algorithm 8), and PGD-greedy from [20, 23]. For orthonormalization in PGD-updated (Algorithm 6) and reduced stage-\(p\) (Algorithm 8), we use Matlab’s qr function to implement the so-called skinny QR method. For assessing performance, we explore two key aspects. The first is the accuracy of the computed solutions, which we assess by computing two error metrics: cosines of angles between the truth singular vectors and the columns of the computed factors (Sect. 4.2.1), and errors between the truth solution and the computed solution measured in three different norms (Sect. 4.2.2). The second aspect is timings and scalability (Sect. 4.2.3). As the assessment of the first aspect requires the ground truth solution of (4.3), which is computed using Matlab’s backslash operator, and its singular vectors, we choose small-sized problems in Sects. 4.2.1–4.2.2. When making comparisons with the truth solution, we force all the AEM methods to execute \(p_{\max }=\min (n_x,n_\xi )=56\) iterations. Larger problems are considered in Sect. 4.2.3, where scalability matters and finding the truth solution is impossible with the available resources.

In producing the experimental results in Sects. 4.2.1–4.2.2, we attempt to see each method’s best possible results without considering the computational costs. Hence, we set \(k_{\max }=5\) and \(n_\text {update}=1\) (i.e., enhancements are performed at every outer iteration) in Algorithm 3. For the same reason, we set PGD/GS to update all the solution pairs and, for R-stage-\(p\), we set \(\tau =.001\). For PGD-greedy, we use \(\delta =.1\) (the inner iteration stopping tolerance) following [23]. All linear systems that arise in each AEM method are solved using PCG. The stopping criterion for these iterations is for the relative residual to be less than the stopping tolerance \(10^{-12}\). We also apply so-called mean-based preconditioners, which we will discuss in detail in Sect. 4.2.3.

Fig. 1: Cosines of angles (plotted in \(\log \) scale) between the left singular vectors \(V^*\) and \({\widetilde{V}}_p\), where \({\widetilde{V}}_p\) are computed using the Stage-\(p\) and S-rank-\(1\) AEM methods, and the EnhancedAEM methods with PGD-update, PGD/GS, R-stage-\(p\) enhancements, and PGD-greedy

4.2.1 Relation to singular vectors

We begin by exploring how the factors in the approximate solutions constructed by each of the methods compare with the left and right singular vectors of the true solution matrix U. This is important because (i) singular vectors represent the most effective choice with respect to the Frobenius norm for approximating a matrix U. That is, the minimum error over all rank-p approximations is \(\Vert U - \widetilde{V}_p\varSigma _p \widetilde{W}_p^{\mathsf {T}}\Vert _\text {F}\), where \(U=\widetilde{V}\varSigma \widetilde{W}^{\mathsf {T}}\) is the singular value decomposition [8], and (ii) in some applications such as collaborative filtering for recommendation systems, computing singular vectors accurately is very important for precise predictions [17, 19, 49]. For these tests, the diffusion coefficient is given by (4.4) with \((\mu ,\sigma ) = (1,.1)\) and \(c=2\). We use a spatial discretization with grid level 4 (i.e., grid spacing \(\frac{1}{2^4}\), and \(n_x=225\)) and we truncate the expansion (4.4) at \(m=5\). For the SGFEM approximation, we choose \(d_{\text {tot}}=3\) which gives \(n_\xi = 56\).

Fig. 2: Cosines of angles (plotted in \(\log \) scale) between the right singular vectors \(W^*\) and \({\widetilde{W}}_p\), where \({\widetilde{W}}_p\) are computed using the Stage-\(p\) and S-rank-\(1\) AEM methods, and the EnhancedAEM methods with PGD-update, PGD/GS, R-stage-\(p\) enhancements, and PGD-greedy

For any approximation of the form (1.3), let \({\widetilde{V}}_p\) and \({\widetilde{W}}_p\) be normalized versions of the factors, i.e., each column of \({\widetilde{V}}_p\) and \({\widetilde{W}}_p\) is scaled to have unit norm. From the ground truth solution U, the matrices \(V^*\) and \(W^*\) of left and right singular vectors are computed. The entries of \(V^*{}^{\mathsf {T}}{\widetilde{V}}_p\), the cosines of the angles between the left singular vectors of the true solution and the left vectors defining the approximate solution, together with the analogous angles for the right vectors, \(W^{*}{}^{\mathsf {T}}{\widetilde{W}}_p\), give insight into the quality of the approximate solution. Figs. 1a and 2a and Figs. 1b and 2b depict the cosines of the angles between the singular vectors and the columns of \({\widetilde{V}}_p\) and \({\widetilde{W}}_p\) computed using the Stage-\(p\) AEM and S-rank-\(1\) AEM methods discussed in Sect. 2. It can be seen from these results (in Figs. 1a and 2a) that the Stage-\(p\) AEM method does a good job of approximating the singular vectors of the solution. That is, the values of the diagonal entries are close to one and the values of the off-diagonal entries are close to zero. On the other hand, the S-rank-\(1\) AEM method (see Figs. 1b and 2b) is far less effective. The \(2\times 2\) blocks on the diagonals in Figs. 1a and 2a reflect the presence of equal singular values.

Figures 1c–f and 2c–f show analogous results for the four enhanced AEM methods. With PGD-update, the spatial component gets reduced (i.e., we form \({\widetilde{K}}_i= {\widetilde{V}}_p^{\mathsf {T}}K_i{\widetilde{V}}_p\)) and \(W_p\) is updated. Figures 1c and 2c show that this computation improves the quality of the resulting factor \(W_p\) (and \(V_p\) as well) as approximate singular vectors, compared to those obtained with the S-rank-\(1\) method. It is evident that PGD/GS further improves the quality of \({\widetilde{V}}_p\) and \({\widetilde{W}}_p\) (Figs. 1d and 2d) as approximate singular vectors, and R-stage-\(p\) is nearly as effective as Stage-\(p\) (Figs. 1e and 2e). The PGD-greedy AEM method is less effective in finding the left and the right singular vectors (Figs. 1f and 2f) than PGD/GS and R-stage-\(p\).

4.2.2 Assessment of solution accuracy

We now compare the convergence behavior of the variants of the AEM methods introduced in Sects. 2 and 3. We use two different settings for the stochastic diffusion coefficient: [exp1] \((\mu ,\sigma ) = (1, .1)\), \(c=2\) and [exp2] \((\mu ,\sigma ) = (1, .2)\), \(c=.5\). We again truncate the series (4.4) at \(m=5\) and, for the Legendre basis polynomials, we consider \(d_{\text {tot}}=3\), which gives \(n_\xi = 56\). We deliberately keep the same values of m and \(d_{\text {tot}}\) for both settings so that the dimensions of the problem stay the same and we can directly compare the behavior of each method in the two problem settings.

Fig. 3 Solution errors measured in the energy norm

For each method, the approximate solution \(U_p\) is computed and its accuracy is measured against the reference solution U. We do this using three different metrics: the energy norm error \(\Vert U - U_p\Vert _A\), the error in the Frobenius norm \(\Vert U-U_p\Vert _\text {F}\), and the residual in the Frobenius norm \(\Vert B - {\mathscr {A}}(U_p)\Vert _\text {F}\). Note that direct computation of the residual norm is only feasible here because of the small size of the problem; see [22] for a method to compute this norm using a skinny QR factorization when m, p, and \(\hat{m}\) are not too large. We report only the energy norm errors (in Fig. 3), as the behavior for the other two metrics is virtually identical. For comparison, a rank-p reference solution (referred to as “full” in Fig. 3) is obtained directly from the first p singular values and singular vectors of U.
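To fix ideas, a minimal dense sketch of these three metrics (our function names; only feasible at benchmark sizes where U can be formed explicitly) is:

```python
import numpy as np

def apply_A(Ks, Gs, X):
    """Matricized operator: A(X) = sum_i K_i X G_i^T."""
    return sum(K @ X @ G.T for K, G in zip(Ks, Gs))

def error_metrics(Ks, Gs, B, U, U_p):
    """Energy-norm error, Frobenius-norm error, and Frobenius
    residual norm for an approximation U_p of a reference U."""
    E = U - U_p
    # ||E||_A^2 = <E, A(E)>_F, nonnegative since A is SPD.
    energy = np.sqrt(np.sum(E * apply_A(Ks, Gs, E)))
    frob = np.linalg.norm(E, 'fro')
    resid = np.linalg.norm(B - apply_A(Ks, Gs, U_p), 'fro')
    return energy, frob, resid
```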

For both settings, as expected, the convergence behavior of the S-rank-\(1\) AEM method is significantly worse than that of the rank-p reference solution, whereas that of the Stage-\(p\) AEM method is virtually identical to it. The EnhancedAEM method with the PGD-update enhancement converges well until a certain level of accuracy is reached, but it fails to progress beyond that level. The EnhancedAEM methods with the PGD/GS and R-stage-\(p\) enhancements are more effective than the one with PGD-update; the accuracy they achieve is virtually the same as that of the Stage-\(p\) AEM method and the rank-p reference solution.

4.2.3 Computational timings

The above results do not account for computational costs; we now investigate timings under various experimental settings with a more practical outer stopping criterion. This is important for large-scale applications, so we now consider a finer spatial grid, with grid level 6 (i.e., grid spacing \(\frac{1}{2^6}\), and \(n_x=3969\)), as well as larger parameter spaces, with \(m=\{20,24\}\) (the number of random variables in (4.4)) and \(d_{\text {tot}}=4\), which results in \(n_\xi = \{10626,20475\}\). We use the same settings for the stochastic diffusion coefficient as before: [exp1] \((\mu ,\sigma ) = (1, .1)\), \(c=2\), and [exp2] \((\mu ,\sigma ) = (1, .2)\), \(c=.5\). Again, we use the same m and \(d_{\text {tot}}\) for both problems to keep the dimensions fixed and allow direct and fair comparisons.

Before presenting these results, we summarize the systems of equations to be solved for each of the EnhancedAEM methods and the adjustable parameters that affect their performance. We first describe how we solve the systems arising at the pth outer iteration when the condition for applying the enhancement is met, as well as the systems arising in RankOneCorrection (Algorithm 4). We use PCG to solve each system of equations with mean-based preconditioners [35], which are constructed from reduced versions of the matrices \(K_0\) and \(G_0\) adapted to each method. For all systems, each PCG iteration requires matrix-vector products in the matricized form (see [2, 22, 26] for detailed matrix operations)

$$\begin{aligned} \sum _{i=0}^m (M_x^{-1} \widetilde{K}_i) X (M_{\xi }^{-1} \widetilde{G}_i)^{\mathsf {T}}, \end{aligned}$$

where X is a quantity to be updated, \(\widetilde{K}_i\) and \(\widetilde{G}_i\) are reduced matrices, and \(M_x\) and \(M_\xi \) are the preconditioner factors. Table 1 summarizes each system matrix and preconditioner.

Table 1 System matrices and preconditioners for each Enhancement procedure
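A minimal sketch of this preconditioned matricized product (our names; dense SciPy LU in place of the sparse factorizations one would use in practice, with \(\widetilde{K}_i\), \(\widetilde{G}_i\), \(M_x\), and \(M_\xi \) chosen per method as in Table 1) is:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def make_precond_matvec(K_red, G_red, M_x, M_xi):
    """Return X -> sum_i (M_x^{-1} K~_i) X (M_xi^{-1} G~_i)^T,
    the preconditioned matricized product applied inside PCG.
    The preconditioner factors are decomposed once, up front."""
    lu_x, lu_xi = lu_factor(M_x), lu_factor(M_xi)
    # Precompute M_x^{-1} K~_i and M_xi^{-1} G~_i (illustrative;
    # for large problems these solves stay inside the loop).
    MK = [lu_solve(lu_x, K) for K in K_red]
    MG = [lu_solve(lu_xi, G) for G in G_red]
    def matvec(X):
        return sum(K @ X @ G.T for K, G in zip(MK, MG))
    return matvec
```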

Now, we discuss the adjustable parameters. The EnhancedAEM methods (Algorithms 3–5) require parameters \(p_{\max }\), \(k_{\max }\), \(n_{\text {update}}\), and \(\epsilon \). We set \(p_{\max }=1000\) to prevent excessive computations. We found that choosing \(k_{\max } > 2\) makes a negligible difference in accuracy but requires extra computation; thus, we use \(k_{\max }=\{1,2\}\). We vary \(n_{\text {update}}\), which determines how often the enhancement procedure is called, over \(\{5,10,20,30\}\). Next, we use \(\epsilon \) to check the convergence of the outer iteration in Algorithm 5, and we vary \(\epsilon \) over \(\{10^{-10},10^{-9},10^{-8},10^{-7}\}\). For PGD/GS and R-stage-\(p\), we found empirically that choosing \(\tau > 0.05\) reduces the accuracy of the approximate solution; thus, we set \(\tau = 0.05\). Finally, for PGD-greedy we use \(\delta =.1\) (the inner iteration stopping tolerance) following [23]. An illustrative configuration is sketched below.
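For orientation, one representative configuration from the ranges above might be written as follows (a hypothetical snippet; the parameter names are ours, not from any released code):

```python
# One representative parameter set-up (names are illustrative).
aem_params = {
    "p_max": 1000,    # cap on the outer iteration count
    "k_max": 2,       # inner iterations per enhancement (1 or 2)
    "n_update": 10,   # outer steps between enhancement calls
    "eps": 1e-8,      # outer-iteration stopping tolerance
    "tau": 0.05,      # truncation tolerance for PGD/GS, R-stage-p
    "delta": 0.1,     # inner stopping tolerance for PGD-greedy
}
```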

Next, we set parameters for the PCG method. For all systems, the stopping criterion uses the relative residual in the Frobenius norm. We use two different tolerances: \(\tau _{\text {basis}}\) for solving the systems that arise in RankOneCorrection and PGD/GS, and \(\tau _{\text {coupled}}\) for solving the systems that arise in PGD-update, R-stage-\(p\), and PGD-greedy. Table 2 summarizes the parameters used for the experiments. We found that larger values of the tolerances \(\tau _{\text {basis}}\) and \(\tau _{\text {coupled}}\) led to poor performance, while smaller values did not improve accuracy.

Table 2 Parameters used in the experiments for measuring timings
Fig. 4 Computational timings (in seconds) of the four EnhancedAEM methods for varying \(k_{\max }\) and \(n_{\text {update}}\). Timings of each method with each parameter set-up are averaged over 5 testing runs

In Fig. 4, we plot elapsed time (in seconds) against the relative residual error in the Frobenius norm of the final iterate for both [exp1] and [exp2]. Recall that we use the stopping condition in Algorithm 5 for the outer iteration, which has lower computational cost and storage requirements than monitoring the residual directly. Here, we compute the final relative residual error in a separate post-processing step, simply for comparison. For these experiments, the relative residual error is observed to be up to three orders of magnitude larger than the value of \(\epsilon \) used for the chosen stopping condition (see Algorithm 5). A discussion of using the relative residual error and the backward error as stopping criteria for smaller-scale stochastic Galerkin matrix equations is given in [37].

Results obtained with the EnhancedAEM methods with PGD-update, PGD/GS, R-stage-\(p\), and PGD-greedy are marked in red, green, blue, and magenta, respectively, and each configuration of \(n_{\text {update}}\) and \(k_{\max }\) is marked with a different symbol. It can be seen from the figures that

  • the costs of R-stage-\(p\) and PGD/GS are less sensitive to \(n_{\text {update}}\) and \(k_{\max }\) than those of PGD-update;

  • the costs of PGD-greedy, which solves a reduced system at every outer iteration, tend to be larger than those of the other methods;

  • R-stage-\(p\) is more efficient for smaller values of \(n_{\text {update}}\) whereas PGD/GS and PGD-update are better with larger \(n_{\text {update}}\);

  • for PGD-update and PGD/GS, relatively large \(n_{\text {update}} >10\) with \(k_{\max }=2\) results in better performance, while for R-stage-\(p\), relatively small \(n_{\text {update}} \le 10\) with \(k_{\max }=1\) results in better performance.

Table 3 reports the number of outer iterations p required to achieve the stopping tolerance \(\epsilon \) for problems [exp1] and [exp2] when PGD-update, PGD/GS, and R-stage-\(p\) are used. The benefit of using R-stage-\(p\) becomes more pronounced as we seek highly accurate solutions with smaller \(\epsilon \). Our general observation is that among the four enhancement approaches, R-stage-\(p\) is the least sensitive to the choice of algorithm parameters, scales best for larger problem sizes, and is the most effective overall.

Table 3 The number of outer iterations p required to achieve the stopping tolerance \(\epsilon \) for solving the problems [exp1] and [exp2] when PGD-update, PGD/GS, and R-stage-\(p\) are used. The reported values of p are computed by averaging values of p obtained with the eight different combinations of \(n_\text {update}\) and \(k_{\max }\) shown in the legend of Fig. 4

We now briefly consider a second benchmark problem whose solution matrix has different rank characteristics and for which low-rank solvers ought to perform well.

4.3 Benchmark problem 2: fast decay coefficients

We define the random field \(a(x,\xi )\) as in (1.5) but now we choose \(\xi _{i} \sim U(-1,1)\) and the functions \(a_{i}(x)\) have coefficients that decay more rapidly than in the first benchmark problem. The details of this problem can be found in [9]. Specifically, the coefficients of the expansion are

$$\begin{aligned} a_{0}=1, \qquad a_i(x) = \alpha _i \cos (2\pi \varrho _1 (i) x_1) \cos (2\pi \varrho _2(i) x_2), \quad i=1,2,\ldots , m \end{aligned}$$

where \(\alpha _i ={\bar{\alpha }}i^{-\sigma }\) with \(\sigma >1\), and \({\bar{\alpha }}\) satisfies \(0<{\bar{\alpha }}<1/\zeta (\sigma )\), where \(\zeta \) is the Riemann zeta function. Furthermore, \(\varrho _1(i) = i - k(i) (k(i)+1)/2\) and \(\varrho _2(i) = k(i) - \varrho _1(i)\), where \(k(i)=\lfloor -1/2 + \sqrt{1/4+2i} \rfloor \). For computing the coefficients, we use the Matlab software package S-IFISS [41]. In the following experiment, we choose \(m=20\), \(\sigma =4\), and \({\bar{\alpha }}=0.832\). The parameter \(\sigma \) controls the rate of algebraic decay of the coefficients; the choice \(\sigma =4\) leads to fast decay, which causes the true solution matrix to have lower numerical rank than in the first benchmark problem.
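A short sketch of these formulas (our implementation, independent of the S-IFISS package) is:

```python
import numpy as np

def coeff(i, x1, x2, sigma=4.0, alpha_bar=0.832):
    """Evaluate a_i(x) = alpha_i cos(2 pi rho1(i) x1) cos(2 pi rho2(i) x2)
    with alpha_i = alpha_bar * i**(-sigma); a_0 = 1."""
    if i == 0:
        return np.ones_like(np.asarray(x1, dtype=float))
    # k(i) = floor(-1/2 + sqrt(1/4 + 2i)), then the index pair.
    k = int(np.floor(-0.5 + np.sqrt(0.25 + 2 * i)))
    rho1 = i - k * (k + 1) // 2
    rho2 = k - rho1
    alpha_i = alpha_bar * i ** (-sigma)
    return alpha_i * np.cos(2 * np.pi * rho1 * x1) * np.cos(2 * np.pi * rho2 * x2)
```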

We investigate computational timings of the EnhancedAEM methods with the same experimental settings used in Sect. 4.2.3. Here, we vary the stopping tolerance for the outer iterations as \(\epsilon =\{10^{-9},10^{-8},10^{-7},10^{-6}\}\) and choose the same values of \(n_\text {update}\) and \(k_{\max }\) as before. Figure 5 reports elapsed time (in seconds) against relative residual error. In nearly all cases, our observations agree with the findings in Fig. 4, although the impact of \(n_\text {update}\) is slightly less clear for these tests. The R-stage-\(p\) method is generally still less sensitive than the other two methods to the choices of \(n_\text {update}\) and \(k_{\max }\), with one exception, indicated by the right-most blue triangle marker in Fig. 5. With \(n_\text {update}=30\), \(k_{\max }=2\), and \(\epsilon =10^{-9}\) (giving that marker), the R-stage-\(p\) method does not meet the stopping criterion until \(p\approx 125\), which is larger than the value \(p\approx 90\) needed for the other choices of algorithm inputs. We attribute this to the large number of steps (30) between enhancements; in this case, the method fell just short of the stopping criterion after 90 steps. Finally, we report the number of outer iterations p required to achieve the stopping tolerance \(\epsilon \) in Table 4. As the true solution matrix has an intrinsic low-rank structure, the reported values of p are much smaller than those in Table 3.

Fig. 5 Computational timings (in seconds) of the four EnhancedAEM methods for varying \(k_{\max }\) and \(n_{\text {update}}\). Timings of each method with each parameter set-up are averaged over 5 testing runs

Remark 4.1

We found the solvers to be largely insensitive to the choice of the parameters \(k_{\max }\) and \(n_\text {update}\); a similar observation was made in [30].

Table 4 The number of outer iterations p required to achieve the stopping tolerance \(\epsilon \) for solving the second benchmark problem when PGD-update, PGD/GS, R-stage-\(p\), and PGD-greedy are used. The reported values of p are computed by averaging values of p obtained with the eight different combinations of \(n_\text {update}\) and \(k_{\max }\) shown in the legend of Fig. 5

4.4 Further extensions

We tested all the AEM methods on matrix equations obtained from SGFEM discretizations of stochastic convection-diffusion problems [26, Section 5.2], where the randomness is in the diffusion coefficient as in Sect. 4.2. Although the energy norm is not defined for this problem, since the operator is nonsymmetric, the same projection framework described herein can be applied to compute approximate solutions. Experiments similar to the ones in Sects. 4.2.1–4.2.2 were conducted (not reported here). We observed that the R-stage-\(p\) method produces qualitatively better approximate factors \(V_{p}\) and \(W_{p}\), as measured by the error metrics used in Sects. 4.2.1–4.2.2, than the S-rank-\(1\) AEM method and the other two EnhancedAEM methods. We also note that for problems with a nonsymmetric operator, rank-one update ALS algorithms such as the ones in [20, 23] can be used.

5 Conclusions

In this study, we have investigated several variants of alternating minimization methods for computing low-rank solutions of matrix equations that arise from SGFEM discretizations of parameterized elliptic PDEs. Using a general formulation of alternating minimization of errors in the energy norm, derived from the well-known general projection method, our starting point was a variant of the stagewise ALS method, a technique for building rank-p approximate solutions originally developed for matrix completion and matrix sensing. Our main contribution is a combination of this approach with so-called enhancement procedures of the type used in PGD methods [30, 31], in which rank-one approximate solutions are enhanced using higher-rank quantities; these improve solution quality while limiting costs by adaptively restricting the rank of the updates. Experimental results demonstrate that the proposed PGD/GS and R-stage-\(p\) methods produce accurate low-rank approximate solutions built from good approximations of the singular vectors of the matricized parameter-dependent solutions. Moreover, the results show that the R-stage-\(p\) method scales better for larger problems, is less sensitive to algorithm inputs, and produces approximate solutions in the fastest times.