
1 Introduction

In the present paper we discuss the construction of all-at-once multigrid solvers for two model problems. The first model problem is a standard Poisson control problem: find a state \(y \in H^1(\varOmega )\) and a control \(u\in L^2(\varOmega )\) that minimize the cost functional

$$\begin{aligned} J(y,u) := \tfrac{1}{2} \Vert y-y_D\Vert _{L^2(\varOmega )}^2 + \tfrac{\alpha }{2} \Vert u\Vert _{L^2(\varOmega )}^2, \end{aligned}$$

subject to the elliptic boundary value problem (BVP)

$$\begin{aligned} -\varDelta y + y = u \text{ in } \varOmega \qquad \text{ and } \qquad \tfrac{\partial y}{\partial n} = 0 \text{ on } \partial \varOmega . \end{aligned}$$

The desired state \(y_D\) and the regularization parameter \(\alpha >0\) are assumed to be given. Here and in what follows, \(\varOmega \subseteq \mathbb {R}^2\) is a polygonal domain. We want to solve the finite element discretization of this problem using a fast linear solver whose convergence behavior is robust in the grid size and the regularization parameter. For solving this problem, we use the method of Lagrange multipliers, cf. [5, 6]. We obtain a linear system in the state \(y\), the control \(u\) and the Lagrange multiplier \(\lambda \). In this linear system we eliminate the control, as was done in [6, 9]. We discretize the resulting system using the Courant element and obtain the linear system

$$\begin{aligned} \underbrace{ \left( \begin{array}{cc} M_k &{} K_k \\ K_k &{} -\alpha ^{-1} M_k \end{array} \right) }_{\displaystyle \mathcal {A}_k:=} \underbrace{\left( \begin{array}{c} \underline{y}_k \\ \underline{\lambda }_k \end{array} \right) }_{\displaystyle \underline{x}_k:=} = \underbrace{\left( \begin{array}{c} \underline{f}_k \\ 0 \end{array} \right) }_{\displaystyle \underline{f}_k:=}. \end{aligned}$$
(1)

Here, \(M_k\) and \(K_k\) are the standard mass and stiffness matrices, respectively. The control can be recovered from the Lagrange multiplier using the simple relation \(\underline{u}_k = \alpha ^{-1} \underline{\lambda }_k\), cf. [6]. In [6, 12] it was shown that there are constants \(\underline{C}>0\) and \(\overline{C}>0\) (independent of the grid size \(h_k\) and the choice of \(\alpha \)) such that the stability estimate

$$\begin{aligned} \Vert \mathcal {Q}_k^{-1/2} \mathcal {A}_k \mathcal {Q}_k^{-1/2}\Vert \le \overline{C}\qquad \text{ and }\qquad \Vert \mathcal {Q}_k^{1/2} \mathcal {A}_k^{-1} \mathcal {Q}_k^{1/2}\Vert \le \underline{C}^{-1} \end{aligned}$$
(2)

holds for the symmetric and positive definite matrix

$$\begin{aligned} \mathcal {Q}_k := \left( \begin{array}{cc} M_k + \alpha ^{1/2} K_k &{} \\ &{} \alpha ^{-1} M_k + \alpha ^{-1/2} K_k \end{array} \right) . \end{aligned}$$
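To make the meaning of (2) concrete, the following sketch evaluates both norms numerically. As an assumption for illustration only, it uses one-dimensional P1 matrices on the unit interval in place of the two-dimensional discretization, with \(K_k\) taken as the matrix of the full operator \(-\varDelta + I\) from the BVP:

```python
import numpy as np

def p1_matrices(n):
    """1D P1 mass matrix M_k and the matrix K_k of -Delta + I on [0,1],
    n elements, Neumann boundary conditions (all n+1 nodes kept)."""
    h = 1.0 / n
    M = np.zeros((n + 1, n + 1))
    K = np.zeros((n + 1, n + 1))
    for e in range(n):
        M[e:e + 2, e:e + 2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        K[e:e + 2, e:e + 2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return M, K + M              # K + M discretizes -Delta + I

def stability_norms(n, alpha):
    """Return ||Q^{-1/2} A Q^{-1/2}|| and ||Q^{1/2} A^{-1} Q^{1/2}|| from (2)."""
    M, Kk = p1_matrices(n)
    Z = np.zeros_like(M)
    A = np.block([[M, Kk], [Kk, -M / alpha]])
    W = M + np.sqrt(alpha) * Kk               # then Q = block-diag(W, W / alpha)
    Q = np.block([[W, Z], [Z, W / alpha]])
    w, V = np.linalg.eigh(Q)                  # Q is SPD
    Q_half = V @ np.diag(w ** 0.5) @ V.T
    Q_invhalf = V @ np.diag(w ** -0.5) @ V.T
    upper = np.linalg.norm(Q_invhalf @ A @ Q_invhalf, 2)
    lower = np.linalg.norm(Q_half @ np.linalg.inv(A) @ Q_half, 2)
    return upper, lower
```

In this 1D setting the two norms should stay below small constants (close to \(1\) and \(\sqrt{2}\)) for all grid sizes and all values of \(\alpha\), which is exactly the robustness that (2) expresses.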

The second model problem is a standard Stokes control problem (velocity tracking problem): Find a velocity field \(v\in [H^1(\varOmega )]^d\), a pressure distribution \(p\in L^2(\varOmega )\) and a control \(u\in [L^2(\varOmega )]^d\) such that

$$\begin{aligned} J(v,p,u) = \tfrac{1}{2} \Vert v-v_D\Vert _{L^2(\varOmega )}^2 + \tfrac{\alpha }{2} \Vert u\Vert _{L^2(\varOmega )}^2 \end{aligned}$$

is minimized subject to the Stokes equations

$$\begin{aligned} -\varDelta v + \nabla p = u \text{ in } \varOmega , \qquad \nabla \cdot v = 0 \text{ in } \varOmega , \qquad v = 0 \text{ on } \partial \varOmega . \end{aligned}$$

The regularization parameter \(\alpha >0\) and the desired state (desired velocity field) \(v_D\in [L^2(\varOmega )]^d\) are assumed to be given. To enforce uniqueness of the solution, we additionally require \(\int _{\varOmega } p \text{ d }x=0\).

Similarly to the above, we can set up the optimality system and eliminate the control, cf. [7, 12]. The discretization can be done using the Taylor-Hood element. After these steps, we end up with the following linear system:

$$\begin{aligned} \underbrace{\left( \begin{array}{cccc} M_k &{} &{} K_k &{} D_k^T \\ &{} 0 &{} D_k &{} \\ K_k &{} D_k^T &{} -\alpha ^{-1} M_k &{} \\ D_k &{} &{} &{} 0 \end{array} \right) }_{\displaystyle \mathcal {A}_k:=} \underbrace{\left( \begin{array}{c} \underline{v}_k \\ \underline{p}_k \\ \underline{\lambda }_k \\ \underline{\mu }_k \end{array} \right) }_{\displaystyle \underline{x}_k:=} = \underbrace{\left( \begin{array}{c} \underline{f}_k \\ 0 \\ 0 \\ 0 \end{array} \right) }_{\displaystyle \underline{f}_k:=}, \end{aligned}$$
(3)

where \(M_k\) and \(K_k\) are standard mass and stiffness matrices and \(D_k^T\) is the discretization of the gradient operator, see, e.g., [7, 12]. Again, we are interested in a fast solver which is robust in the regularization parameter and the grid size. As in the previous example, the control \(\underline{u}_k\) can be recovered from the Lagrange multiplier: \(\underline{u}_k=\alpha ^{-1}\underline{\lambda }_k\). In [12] it was shown that the stability estimate (2) is satisfied for

$$\begin{aligned} \mathcal {Q}_k = \text{ block-diag }\left( W_k,\; \alpha D_kW_k^{-1}D_k^T,\; \alpha ^{-1} W_k,\; D_kW_k^{-1}D_k^T \right) , \end{aligned}$$

where \(W_k:=M_k + \alpha ^{1/2} K_k\).

2 An All-at-once Multigrid Method

The linear systems (1) and (3) shall be solved by a multigrid method, which reads as follows. Starting from an initial approximation \(\underline{x}^{(0)}_k\), one iterate of the multigrid method is given by the following two steps:

  • Smoothing procedure: Compute

    $$\begin{aligned} \underline{x}^{(0,m)}_k := \underline{x}^{(0,m-1)}_k + \hat{\mathcal {A}}_k^{-1} \left( \underline{ f}_k -\mathcal {A}_k\;\underline{x}^{(0,m-1)}_k\right) \qquad \text{ for } m=1,\ldots ,\nu \end{aligned}$$

    with \(\underline{x}^{(0,0)}_k=\underline{x}^{(0)}_k\). The choice of the smoother (or, in other words, of the matrix \(\hat{\mathcal {A}}_k^{-1}\)) will be discussed below.

  • Coarse-grid correction:

    • Compute the defect \(\underline{ f}_k -\mathcal {A}_k\;\underline{x}^{(0,\nu )}_k\) and restrict it to grid level \(k-1\) using a restriction matrix \(I_k^{k-1}\): \( \underline{r}_{k-1}^{(1)} := I_k^{k-1} \left( \underline{ f}_k -\mathcal {A}_k \;\underline{x}^{(0,\nu )}_k\right) . \)

    • Solve the following coarse-grid problem approximately:

      $$\begin{aligned} \mathcal {A}_{k-1} \,\underline{p}_{k-1}^{(1)} =\underline{r}_{k-1}^{(1)} \end{aligned}$$
      (4)
    • Prolongate \(\underline{p}_{k-1}^{(1)}\) to grid level \(k\) using a prolongation matrix \(I_{k-1}^k\) and add the result to the previous iterate: \( \underline{x}_{k}^{(1)} := \underline{x}^{(0,\nu )}_k + I_{k-1}^k \, \underline{p}_{k-1}^{(1)}. \)

Since we have assumed nested spaces, the intergrid-transfer matrices can be chosen in a canonical way: \(I_{k-1}^k\) is the canonical embedding and the restriction \(I_k^{k-1}\) is its (properly scaled) transpose. If the problem (4) is solved exactly, we obtain the two-grid method. In practice, the problem (4) is solved approximately by applying one step (V-cycle) or two steps (W-cycle) of the multigrid method, recursively. Only on the coarsest grid level is (4) solved exactly.
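The two steps above can be sketched as a short recursive routine. The sketch below is illustrative only: it uses a 1D Poisson model problem with a damped Jacobi smoother (an assumption for the test, not one of the smoothers discussed in this paper), together with the canonical P1 intergrid transfers:

```python
import numpy as np

def p1_interpolation(nc):
    """Canonical P1 embedding: nc coarse interior nodes -> 2*nc+1 fine nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] += 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] += 0.5
    return P

def jacobi(A, b, x, tau=0.5):
    """Damped Jacobi step, standing in for the smoothers discussed below."""
    return x + tau * (b - A @ x) / np.diag(A)

def multigrid(A, P, b, x, lvl, nu=2, cycle=1):
    """One multigrid iterate for A[lvl] x = b (cycle=1: V-cycle, 2: W-cycle)."""
    if lvl == 0:                              # coarsest level: solve exactly
        return np.linalg.solve(A[0], b)
    for _ in range(nu):                       # smoothing procedure
        x = jacobi(A[lvl], b, x)
    r = P[lvl - 1].T @ (b - A[lvl] @ x)       # restrict defect; I_k^{k-1} = (I_{k-1}^k)^T
    p = np.zeros(A[lvl - 1].shape[0])
    for _ in range(cycle):                    # approximate coarse-grid solve
        p = multigrid(A, P, r, p, lvl - 1, nu, cycle)
    return x + P[lvl - 1] @ p                 # prolongate and correct

# hierarchy for -u'' = f on (0,1), homogeneous Dirichlet, P1 elements
levels, n = 5, 3
A, P = [], []
for lvl in range(levels):
    h = 1.0 / (n + 1)
    A.append((2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h)
    if lvl < levels - 1:
        P.append(p1_interpolation(n))
        n = 2 * n + 1

rng = np.random.default_rng(0)
x = rng.standard_normal(A[-1].shape[0])       # random start; exact solution is 0
e0 = np.linalg.norm(x)
for _ in range(10):                           # ten V-cycles
    x = multigrid(A, P, np.zeros_like(x), x, levels - 1)
reduction = np.linalg.norm(x) / e0
```

For this simple model problem the error contracts by a roughly grid-independent factor per cycle, which is the behavior one aims to reproduce for the saddle point systems (1) and (3) with the smoothers below.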

The only part of the multigrid algorithm that has not been specified yet is the smoother. Its choice is guided by a convergence theory, which we develop based on Hackbusch’s splitting of the analysis into smoothing property and approximation property:

  • Smoothing property:

    $$\begin{aligned} \sup _{\underline{\tilde{x}}_k\in X_k} \frac{\left( \mathcal {A}_k(\underline{x}_k^{(0,\nu )}-\underline{x}_k^*), \underline{\tilde{x}}_k\right) _{\ell ^2}}{\Vert \underline{\tilde{x}}_k\Vert _{\mathcal {L}_k}} \le \eta (\nu ) \Vert \underline{x}_k^{(0)}-\underline{x}_k^*\Vert _{\mathcal {L}_k} \end{aligned}$$
    (5)

    should hold for some function \(\eta (\nu )\) with \(\lim _{\nu \rightarrow \infty }\eta (\nu )= 0\). Here and in what follows, \(\underline{x}_k^* := \mathcal {A}_k^{-1} \underline{ f }_k\) is the exact solution, \(\Vert \cdot \Vert _{\mathcal {L}_k}:= (\cdot ,\cdot )_{\mathcal {L}_k}^{1/2} := (\mathcal {L}_k \cdot ,\cdot )_{\ell ^2}^{1/2}\) for some symmetric positive definite matrix \(\mathcal {L}_k\) and \((\cdot ,\cdot )_{\ell ^2}\) is the standard Euclidean scalar product.

  • Approximation property:

    $$\begin{aligned} \Vert \underline{x}_k^{(1)}-\underline{x}_k^*\Vert _{\mathcal {L}_k}\le C_A \sup _{\underline{\tilde{x}}_k\in X_k} \frac{\left( \mathcal {A}_k(\underline{x}_k^{(0,\nu )}-\underline{x}_k^*), \underline{\tilde{x}}_k\right) _{\ell ^2}}{\Vert \underline{\tilde{x}}_k\Vert _{\mathcal {L}_k}} \end{aligned}$$

    should hold for some constant \(C_A>0\).

Combining both conditions shows that the two-grid method converges in the norm \(\Vert \cdot \Vert _{\mathcal {L}_k}\) for \(\nu \) large enough. The convergence of the W-cycle multigrid method can be shown under mild assumptions, see e.g. [3].

For the smoothing analysis, it is convenient to rewrite the smoothing property in pure matrix notation: (5) is equivalent to

$$\begin{aligned} \Vert \mathcal {L}_k^{-1/2}\mathcal {A}_k(I-\hat{\mathcal {A}}_k^{-1}\mathcal {A}_k)^{\nu }\mathcal {L}_k^{-1/2}\Vert \le \eta (\nu ). \end{aligned}$$
(6)

For the Poisson control problem, it was shown in [6] that the approximation property is satisfied for the following choice of the matrix \(\mathcal {L}_k\) (note that this matrix represents the norm \(\Vert \cdot \Vert _{X^-}\) used in the mentioned paper):

$$\begin{aligned} \mathcal {L}_k = \left( \begin{array}{cc} \text {diag}(M_k + \alpha ^{1/2} K_k) &{} \\ &{} \text {diag}(\alpha ^{-1} M_k + \alpha ^{-1/2} K_k) \end{array} \right) , \end{aligned}$$

i.e., \(\mathcal {L}_k = \text {diag}(\mathcal {Q}_k)\). Here and in what follows, \(\text {diag}(M)\) is the diagonal matrix containing the diagonal of a matrix \(M\). For the Stokes control problem it was shown in [7] that the approximation property is satisfied for the following choice of \(\mathcal {L}_k\):

$$\begin{aligned} \mathcal {L}_k = \left( \begin{array}{cccc} \hat{W}_k &{} &{} &{} \\ &{} \hat{P}_k &{} &{} \\ &{} &{} \alpha ^{-1} \hat{W}_k &{} \\ &{} &{} &{} \alpha ^{-1} \hat{P}_k \end{array} \right) , \end{aligned}$$

where \(\hat{W}_k := \text {diag}(M_k+\alpha ^{1/2} K_k)\) and \(\hat{P}_k := \alpha \;\text {diag}( D_k \hat{W}_k^{-1} D_k^T ).\)

Still, we have not specified the choice of the smoother, which now can be done using the convergence theory. We have seen for which choices of \(\mathcal {L}_k\) the approximation property is satisfied. We are interested in a smoother such that the smoothing property is satisfied for the same choice of \(\mathcal {L}_k\).

In [7, 9] a normal equation smoother was proposed. This approach is applicable to a quite general class of problems, cf. [2] and others. In our notation, the normal equation smoother reads as follows:

$$\begin{aligned} \underline{x}^{(0,m)}_k := \underline{x}^{(0,m-1)}_k + \tau \underbrace{\mathcal {L}_k^{-1} \mathcal {A}_k^T \mathcal {L}_k^{-1}}_{\displaystyle \hat{\mathcal {A}}_k^{-1}:=} \left( \underline{ f}_k -\mathcal {A}_k \;\underline{x}^{(0,m-1)}_k\right) \quad \text{ for } m=1,\ldots ,\nu . \end{aligned}$$

Here, a fixed \(\tau >0\) has to be chosen such that the spectral radius \(\rho (\tau \hat{\mathcal {A}}_k^{-1}\mathcal {A}_k)\) is bounded away from \(2\) on all grid levels \(k\) and for all choices of the parameters. It was shown that it is possible to find such a uniform \(\tau \) for the Poisson control problem, e.g., in [9] and for the Stokes control problem, e.g., in [7]. For the normal equation smoother, the smoothing property can be shown using a simple eigenvalue analysis, cf. [2]. Numerical experiments show that the normal equation smoother works rather well for the mentioned model problems. However, there are smoothers such that the overall multigrid method converges much faster. Note that the normal equation smoother is basically a Richardson iteration scheme, applied to the normal equation. It is well-known for elliptic problems that Gauss Seidel iteration schemes are typically much better smoothers than Richardson iteration schemes. In the context of saddle point problems, the idea of Gauss Seidel smoothers has been applied, e.g., in the context of collective smoothers, see below. In the context of normal equation smoothers, however, the idea of Gauss Seidel smoothers has not gained much attention. The setup of such an approach is straightforward: in compact notation this approach, which we call the least squares Gauss Seidel (LSGS) approach, reads as follows:

$$\begin{aligned} \underline{x}^{(0,m)}_k := \underline{x}^{(0,m-1)}_k + \underbrace{ \text {trig}(\mathcal {N}_k)^{-1} \mathcal {A}_k^T \mathcal {L}_k^{-1}}_{\displaystyle \hat{\mathcal {A}}_k^{-1}:=} \left( \underline{ f}_k -\mathcal {A}_k \;\underline{x}^{(0,m-1)}_k\right) \;\; \text{ for } m=1,\ldots ,\nu , \end{aligned}$$

where \(\mathcal {N}_k:=\mathcal {A}_k^T\mathcal {L}_k^{-1} \mathcal {A}_k\) and \(\text {trig}(M)\) is a matrix whose coefficients coincide with the coefficients of \(M\) on the diagonal and the left-lower triangular part and vanish elsewhere. The author provides a possible realization of that approach as Algorithm 2 to convince the reader that the computational complexity of the LSGS approach is equal to the computational complexity of the normal equation smoother, where a possible realization is given as Algorithm 1.
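A rough dense-algebra sketch of both smoothing steps follows (an illustration only: the realizations in Algorithms 1 and 2 work matrix-free on sparse data; here \(\mathcal {L}_k\) is passed as the vector of its inverse diagonal entries):

```python
import numpy as np

def normal_equation_step(A, L_inv, b, x, tau):
    """Damped Richardson step on the normal equations:
    x <- x + tau * L^{-1} A^T L^{-1} (b - A x); L_inv holds 1/diag(L)."""
    return x + tau * L_inv * (A.T @ (L_inv * (b - A @ x)))

def lsgs_step(A, L_inv, b, x):
    """LSGS step: x <- x + trig(N)^{-1} A^T L^{-1} (b - A x) with
    N = A^T L^{-1} A; a practical realization would apply a sparse
    forward substitution instead of forming N densely."""
    N = (A.T * L_inv) @ A                 # N = A^T L^{-1} A
    g = A.T @ (L_inv * (b - A @ x))       # least squares residual
    return x + np.linalg.solve(np.tril(N), g)
```

Both steps touch the same data (one residual, one transposed product), which is why one LSGS sweep costs essentially the same as one normal equation smoothing step.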

We will see below that the LSGS approach works very well in the numerical experiments. However, there is no proof of the smoothing property known to the author. This is due to the fact that the matrix \(\hat{\mathcal {A}}_k\) is not symmetric. One possibility to overcome this difficulty is to consider the symmetric version (symmetric least squares Gauss Seidel approach, sLSGS approach). This is analogous to the case of elliptic problems: For elliptic problems the smoothing property for the symmetric Gauss Seidel iteration can be shown for general cases but for the standard Gauss Seidel iteration the analysis is restricted to special cases, cf. Section 6.2.4 in [3].

Algorithm 1. A possible realization of the normal equation smoother.
Algorithm 2. A possible realization of the LSGS smoother.

One step of the sLSGS iteration consists of one step of the LSGS iteration, followed by one step of the LSGS iteration with reversed order of the variables. (So the computational complexity of one step of the sLSGS iteration is equal to the computational complexity of two steps of the standard LSGS iteration.) One step of the sLSGS iteration reads as follows in compact notation:

$$\begin{aligned}&\underline{x}^{(0,m)}_k := \underline{x}^{(0,m-1)}_k + \hat{\mathcal {N}}_k^{-1} \mathcal {A}_k^T \mathcal {L}_k^{-1} \left( \underline{ f}_k -\mathcal {A}_k \;\underline{x}^{(0,m-1)}_k\right) \qquad \text{ for } m=1,\ldots ,\nu ,\nonumber \\&\text{ where } \hat{\mathcal {N}}_k:=\text {trig}(\mathcal {N}_k)\; \text {diag}(\mathcal {N}_k)^{-1}\; \text {trig}(\mathcal {N}_k)^T\text{. } \end{aligned}$$
(7)
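That one forward LSGS sweep followed by one reversed sweep yields exactly the preconditioner \(\hat{\mathcal {N}}_k\) of (7) is the standard symmetric Gauss Seidel identity, resting on \(\text {trig}(\mathcal {N}_k) + \text {trig}(\mathcal {N}_k)^T - \mathcal {N}_k = \text {diag}(\mathcal {N}_k)\). A quick numerical check, with a random symmetric positive definite stand-in for \(\mathcal {N}_k\):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
N = B @ B.T + 8.0 * np.eye(8)            # SPD stand-in for N_k = A^T L^{-1} A

T = np.tril(N)                            # trig(N): lower triangle incl. diagonal
N_hat = T @ np.diag(1.0 / np.diag(N)) @ T.T   # the matrix from (7)

I = np.eye(8)
E_forward = I - np.linalg.solve(T, N)     # error propagation of one LSGS sweep
E_backward = I - np.linalg.solve(T.T, N)  # ... of the sweep with reversed order
E_sym = I - np.linalg.solve(N_hat, N)     # ... of one sLSGS step

assert np.allclose(E_backward @ E_forward, E_sym)
```

The check also confirms \(\hat{\mathcal {N}}_k \succeq \mathcal {N}_k\), since \(\hat{\mathcal {N}}_k - \mathcal {N}_k\) is a Gram matrix of the strictly lower triangular part.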

For our needs, the following convergence lemma is sufficient.

Lemma 1

Assume that \(\mathcal {A}_k\) is sparse, (2) is satisfied and let \(\mathcal {L}_k\) be a positive definite diagonal matrix such that

$$\begin{aligned} \Vert \mathcal {Q}_k^{1/2}\underline{x}_k\Vert \le \Vert \mathcal {L}_k^{1/2}\underline{x}_k\Vert \quad \text{ for } \text{ all } \underline{x}_k. \end{aligned}$$
(8)

Then the sLSGS approach satisfies the smoothing property (6), i.e.,

$$\begin{aligned} \Vert \mathcal {L}_k^{-1/2} \mathcal {A}_k (I-\hat{\mathcal {N}}_k^{-1} \mathcal {N}_k)^{\nu } \mathcal {L}_k^{-1/2}\Vert \le \frac{2^{-1/2}\;\overline{C}\;\text {nnz}(\mathcal {A}_k)^{5/2} }{\sqrt{\nu }}, \end{aligned}$$

where \(\text {nnz}(M)\) is the maximum number of non-zero entries per row of \(M\).

Note that (8) is a standard inverse inequality, which is satisfied for both model problems, cf. [6, 7, 9]. Note moreover that this assumption also has to be satisfied to show the smoothing property for the normal equation smoother, cf. [7, 9].

Proof of Lemma  1. The combination of (2) and (8) yields \(\Vert \mathcal {L}_k^{-1/2}\mathcal {A}_k \mathcal {L}_k^{-1/2}\Vert \le \overline{C}\). Property 6.2.27 in [3] states that for any symmetric positive definite matrix \(\mathcal {N}_k\)

$$\begin{aligned} \Vert \hat{\mathcal {N}}_k^{-1/2} \mathcal {N}_k (I-\hat{\mathcal {N}}_k^{-1} \mathcal {N}_k)^{\nu } \hat{\mathcal {N}}_k^{-1/2}\Vert \le \nu ^{-1} \end{aligned}$$
(9)

holds, where \(\hat{\mathcal {N}}_k\) is as in (7). Using \(\mathcal {D}_k:= \text{ diag }(\mathcal {N}_k)\), we obtain

$$\begin{aligned}&\Vert \mathcal {L}_k^{-1/2} \hat{\mathcal {N}}_k^{1/2}\Vert ^2 = \rho ( \mathcal {L}_k^{-1/2}\hat{\mathcal {N}}_k\mathcal {L}_k^{-1/2}) \le \Vert \mathcal {L}_k^{-1/2}\text {trig}(\mathcal {N}_k) \mathcal {D}_k^{-1/2}\Vert ^2\\&\quad \le \Vert \mathcal {L}_k^{-1/2}\mathcal {D}_k^{1/2}\Vert ^2 \Vert \mathcal {D}_k^{-1/2}\text {trig}(\mathcal {N}_k) \mathcal {D}_k^{-1/2}\Vert ^2. \end{aligned}$$

Let \(\mathcal {A}_k=(\mathcal {A}_{i,j})_{i,j=1}^N\), \(\mathcal {N}_k=(\mathcal {N}_{i,j})_{i,j=1}^N\), \(\mathcal {L}_k=(\mathcal {L}_{i,j})_{i,j=1}^N\) and \(\psi (i):=\{j\in \mathbb {N}:\mathcal {N}_{i,j}\not =0\}\). Using Gerschgorin’s theorem, the fact that the infinity norm is monotone in the matrix entries, the symmetry of \(\mathcal {N}_k\) and \(\mathcal {A}_k\) and the Cauchy-Schwarz inequality, we obtain:

$$\begin{aligned}&\Vert \mathcal {D}_k^{-1/2}\text {trig}(\mathcal {N}_k) \mathcal {D}_k^{-1/2}\Vert \nonumber \\&\le \Vert \mathcal {D}_k^{-1/2}\text {trig}(\mathcal {N}_k) \mathcal {D}_k^{-1/2}\Vert _{\infty }^{1/2} \Vert \mathcal {D}_k^{-1/2}\text {trig}(\mathcal {N}_k)^T \mathcal {D}_k^{-1/2}\Vert _{\infty }^{1/2} \le \Vert \mathcal {D}_k^{-1/2}\mathcal {N}_k\mathcal {D}_k^{-1/2}\Vert _{\infty } \nonumber \\&= \max _{i=1,\ldots , N} \sum _{k\in \psi (i)} \left( \sum _{n=1}^N \frac{\mathcal {A}_{i,n}^2}{\mathcal {L}_{n,n}} \right) ^{-1/2} \left( \sum _{j=1}^N \frac{\mathcal {A}_{i,j} \mathcal {A}_{j,k}}{\mathcal {L}_{j,j}}\right) \left( \sum _{n=1}^N \frac{\mathcal {A}_{k,n}^2}{\mathcal {L}_{n,n}} \right) ^{-1/2}\nonumber \\&\le \max _{i=1,\ldots , N} \sum _{k\in \psi (i)} 1 = \text {nnz}(\mathcal {N}_k) \le \text {nnz}(\mathcal {A}_k)^2. \end{aligned}$$
(10)

Further, we obtain

$$\begin{aligned}&\Vert \mathcal {L}_k^{-1/2}\mathcal {D}_k^{1/2}\Vert ^2 =\Vert \mathcal {L}_k^{-1/2}\mathcal {D}_k^{1/2}\Vert _{\infty }^2 = \Vert \mathcal {L}_k^{-1/2}\mathcal {D}_k\mathcal {L}_k^{-1/2}\Vert _{\infty } = \max _{i=1,\ldots ,N} \sum _{j=1}^N \frac{\mathcal {A}_{i,j}^2}{\mathcal {L}_{i,i}\mathcal {L}_{j,j}} \nonumber \\&\quad \le \text {nnz}(\mathcal {A}_k) \max _{i,j=1,\ldots ,N}\frac{\mathcal {A}_{i,j}^2}{\mathcal {L}_{i,i}\mathcal {L}_{j,j}} = \text {nnz}(\mathcal {A}_k) \Vert \mathcal {L}^{-1/2}\mathcal {A}\mathcal {L}^{-1/2}\Vert ^2 \le \text {nnz}(\mathcal {A}_k) \;\overline{C}^2. \end{aligned}$$
(11)

By combining (9), (10) and (11), we obtain

$$\begin{aligned}&\Vert \mathcal {L}_k^{-1/2} \mathcal {A}_k (I-\hat{\mathcal {N}}_k^{-1} \mathcal {N}_k)^{\nu } \mathcal {L}_k^{-1/2}\Vert ^2 \\&\quad \le \Vert \mathcal {L}_k^{-1/2}(I- \mathcal {N}_k\hat{\mathcal {N}}_k^{-1})^{\nu }\mathcal {A}_k\mathcal {L}_k^{-1} \mathcal {A}_k (I-\hat{\mathcal {N}}_k^{-1} \mathcal {N}_k)^{\nu } \mathcal {L}_k^{-1/2}\Vert \\&\quad = \Vert \mathcal {L}_k^{-1/2}\mathcal {N}_k(I- \hat{\mathcal {N}}_k^{-1}\mathcal {N}_k)^{2\nu } \mathcal {L}_k^{-1/2}\Vert \le \frac{\overline{C}^2 \text {nnz}(\mathcal {A}_k)^5}{2\nu }, \end{aligned}$$

which finishes the proof. \(\square \)
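As a sanity check of the lemma, the decay of the left-hand side of (6) in \(\nu \) can be observed numerically. The matrices below are small symmetric tridiagonal stand-ins (an assumption for illustration, not an actual finite element discretization):

```python
import numpy as np

# small symmetric tridiagonal stand-ins for mass and stiffness matrices
n = 6
M = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
K = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
alpha = 1e-2
A = np.block([[M, K], [K, -M / alpha]])

W_diag = np.diag(M + np.sqrt(alpha) * K)
L_diag = np.concatenate([W_diag, W_diag / alpha])       # L_k = diag(Q_k)
L_invhalf = np.diag(L_diag ** -0.5)

N = (A.T * (1.0 / L_diag)) @ A                          # N_k = A^T L^{-1} A
T = np.tril(N)
N_hat = T @ np.diag(1.0 / np.diag(N)) @ T.T             # as in (7)
S = np.eye(2 * n) - np.linalg.solve(N_hat, N)           # sLSGS error propagation

def smoothing_lhs(nu):
    """|| L^{-1/2} A (I - N_hat^{-1} N)^nu L^{-1/2} ||, left-hand side of (6)."""
    return np.linalg.norm(L_invhalf @ A @ np.linalg.matrix_power(S, nu) @ L_invhalf, 2)

vals = [smoothing_lhs(nu) for nu in (1, 4, 16, 64)]
```

The computed values decrease monotonically in \(\nu \), in accordance with the bound \(\eta (\nu )=\mathcal {O}(\nu ^{-1/2})\) of the lemma.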

We want to compare the numerical behavior of the LSGS approach with the behavior of a standard smoother. One class of standard smoothers for saddle point problems is the class of Vanka type smoothers, which has been originally introduced for Stokes problems, cf. [11]. Such smoothers have also gained interest for optimal control problems, see, e.g., [1, 8, 10].

The idea of Vanka type smoothers is to compute updates in subspaces directly for the whole saddle point problem and to combine these updates in an additive or a multiplicative way to compute the next update. Here, the variables are not grouped based on the block-structure of \(\mathcal {A}_k\); instead, the grouping is based on the location of the corresponding degrees of freedom in the domain \(\varOmega \). The easiest such idea for the Poisson control problem is to do the grouping point-wise, which leads to the idea of point smoothing. Here, we group for each node \(\delta _i\) of the discretization (each degree of freedom of the Courant element) the value \(y_i\) of the state and the value \(\lambda _i\) of the Lagrange multiplier and compute an update in the corresponding subspace. The multiplicative variant of such a smoother is a collective Gauss Seidel (CGS) smoother:

$$\begin{aligned} \underline{x}^{(0,m,i)}_k&:= \underline{x}^{(0,m,i-1)}_k + \mathcal {P}_k^{(i)} \left( \left. \mathcal {P}_k^{(i)}\right. ^T \mathcal {A}_k\mathcal {P}_k^{(i)}\right) ^{-1} \left. \mathcal {P}_k^{(i)}\right. ^T \left( \underline{f}_k -\mathcal {A}_k \;\underline{x}^{(0,m,i-1)}_k\right) , \end{aligned}$$

where \(\underline{x}^{(0,m,0)}_k:=\underline{x}^{(0,m-1)}_k\) and \(\underline{x}^{(0,m)}_k :=\underline{x}^{(0,m,N_k)}_k\). For each \(i=1,\ldots , N_k\), the matrix \(\mathcal {P}_k^{(i)}\in \mathbb {R}^{2 N_k\times 2}\) takes the value \(1\) on the positions \((i,1)\) and \((i+N_k,2)\) and the value \(0\) elsewhere. For the Poisson control problem, we obtain

$$\begin{aligned} \left. \mathcal {P}_k^{(i)}\right. ^T \mathcal {A}_k\mathcal {P}_k^{(i)} = \left( \begin{array}{cc} M_{i,i} &{} K_{i,i} \\ K_{i,i} &{} - \alpha ^{-1} M_{i,i} \end{array} \right) , \end{aligned}$$

where \(M_{i,i}\) and \(K_{i,i}\) are the entries of the matrices \(M_k\) and \(K_k\).
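One sweep of this CGS smoother can be sketched as follows (dense numpy for clarity; the 2-by-2 local systems are exactly the ones displayed above):

```python
import numpy as np

def cgs_sweep(M, K, alpha, f, y, lam):
    """One collective Gauss Seidel sweep for system (1): for every node i,
    solve the local 2x2 saddle point system in (y_i, lambda_i)."""
    for i in range(len(y)):
        r1 = f[i] - (M[i] @ y + K[i] @ lam)           # residual, first block row
        r2 = -(K[i] @ y - M[i] @ lam / alpha)         # residual, second block row
        a = np.array([[M[i, i], K[i, i]],
                      [K[i, i], -M[i, i] / alpha]])   # the 2x2 system above
        dy, dl = np.linalg.solve(a, np.array([r1, r2]))
        y[i] += dy
        lam[i] += dl
    return y, lam
```

Right after node \(i\) is processed, the two residual components associated with it vanish exactly; later updates at neighboring nodes perturb them again, which is the usual Gauss Seidel behavior.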

Fig. 1. Patches for the Vanka-type smoother applied to a Taylor-Hood discretization. The dots are the degrees of freedom of \(v\) and \(\lambda \); the rectangles are the degrees of freedom of \(p\) and \(\mu \).

For the Stokes control problem, it is not reasonable to use exactly the same approach. This is basically due to the fact that the degrees of freedom for \(v\) and \(\lambda \) are not located on the same positions as the degrees of freedom for \(p\) and \(\mu \). However, we can introduce an approach based on patches: for each vertex of the triangulation, we consider the subspace that consists of the degrees of freedom located on the vertex itself and the degrees of freedom located on all edges which have one end at the chosen vertex, cf. Fig. 1. Note that these subspaces are much larger than the subspaces chosen in the case of the CGS approach for the Poisson control problem (whose dimension was just \(2\)), which increases the computational cost of applying the method significantly. For Vanka type smoothers there are only a few convergence results known, cf. [1] for a Fourier analysis and an analysis based on a compactness argument, and [8] for a proof based on Hackbusch’s splitting of the analysis into smoothing property and approximation property, which shows convergence in the case of a collective Richardson smoother.

3 Numerical Results

In this section we give numerical results to illustrate quantitatively the convergence behavior of the proposed methods. The number of iterations was measured as follows: we start with a random initial guess and iterate until the relative error in the norm \({\Vert \cdot \Vert _{\mathcal {L}_k}}\) is reduced by a factor of \(10^{-6}\). Without loss of generality, the right-hand side was chosen to be \(0\). For both model problems, the normal equation smoother, the LSGS smoother, the sLSGS smoother and a Vanka type smoother have been applied, with \(2\) pre- and \(2\) post-smoothing steps each. Only for the sLSGS smoother, just \(1\) pre- and \(1\) post-smoothing step has been applied, since one step of the symmetric version has basically the same computational cost as two steps of the standard version. The normal equation smoother was damped with \(\tau =0.4\) for the Poisson control problem and \(\tau =0.35\) for the Stokes control problem, cf. [7, 9]. For the Gauss Seidel-like approaches, damping was not used.

In Table 1, we give the results for the standard Poisson control problem. Here, we see that all smoothers lead to convergence rates that are well bounded for a wide range of \(h_k\) and \(\alpha \). Compared to the normal equation smoother, the LSGS smoother leads to a speedup by a factor of about two without any additional work. The symmetric version (sLSGS) is a bit slower than the LSGS method. For the first model problem, the (popular) CGS method is significantly faster. However, for this method no convergence theory is known.

Table 1. Number of iterations for the Poisson control model problem

In Table 2, we give the convergence results for the Stokes control problem. Also here we observe that the LSGS and the sLSGS approach lead to a speedup of a factor of about two compared to the normal equation smoother. Here, the Vanka type smoother shows slightly smaller iteration numbers than the LSGS approach. In terms of computational costs, however, the LSGS smoother seems to be much better than the patch-based Vanka type smoother, because the latter requires the solution of relatively large subproblems to compute the updates. This is different from the case of the CGS smoother, where the subproblems are just \(2\)-by-\(2\) linear systems. Numerical experiments have shown that the undamped version of the patch-based Vanka type method does not lead to a convergent multigrid method, so this smoother was damped with \(\tau =0.4\). Due to the lack of a convergence theory, the author cannot explain why this approach – although it is a multiplicative approach – needs damping.

Table 2. Number of iterations for the Stokes control model problem

For completeness, the author wants to mention that for cases where a (closed form of a) matrix \(\mathcal {Q}_k\) satisfying (2) robustly is not known, the normal equation smoother does not show as good results as methods where such information is not needed, like Vanka type methods. This was discussed in [8] for a boundary control problem, but it is also true for the linearization of optimal control problems with inequality constraints as discussed in [4] and others. The same is true for the Gauss Seidel-like variants of the normal equation smoother.

Concluding, we have observed that accelerating the idea of normal equation smoothing with a Gauss Seidel approach leads to a speedup of a factor of about two without any further work. The fact that a convergence theory is known for the sLSGS approach also helps in numerical practice (unlike the case of Vanka type smoothers).