1 Introduction

We will use the following notation

  • \(\varOmega \subset \mathbb {R}^n\) is a bounded domain,

  • \(d_h\) is the observation data,

  • \(\rho > 1\) and \(\kappa \ge 0\) are parameters,

  • A, B and T are linear operators,

  • \(P^n_h\) represents an \(n\)-th order scalar FEM space,

  • \(V_h = H^1_h(\varOmega )\), i.e. \(P^1_h\) equipped with the \(H^1\)-norm, is the control space,

  • U and Z are Hilbert spaces,

  • \(U_h = U \cap P^n_h\) is the state space,

  • \(Z_h = Z \cap P^n_h\) is the observation space,

and we note that \(1 \le \text {dim}(V_h), \, \text {dim}(U_h), \, \text {dim}(Z_h) < \infty \). The objective of this paper is to propose and analyze an efficient algorithm for solving PDE-constrained optimization problems which can be written in the form

$$\begin{aligned} \min _{(v_h,u_h) \in V_h \times U_h} \left\{ \frac{1}{2}\rho \Vert Tu_h-d_h\Vert _{Z_h}^2 + \frac{1}{2}\kappa \Vert v_h\Vert _{L^2(\varOmega )}^2 + \int _\varOmega |Dv_h| \ dx \right\} \!, \end{aligned}$$
(1)

subject to

$$\begin{aligned} Au_h + Bv_h = 0. \end{aligned}$$
(2)

Here, (2) is a discretized PDE, and A, B and T will be discussed in detail in Sect. 2. From the results published in [4], it follows that (1)–(2) has a unique solution if \(\kappa >0\), or if

$$\begin{aligned} K = - TA^{-1}B \end{aligned}$$
(3)

is injective.
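For intuition, K can be assembled explicitly for a small fully discrete example. The sketch below is illustrative only: A, B and T are random matrices standing in for the discretized operators, with dimensions chosen arbitrarily. It forms \(K = -TA^{-1}B\) and checks injectivity via the rank:

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_v, n_z = 6, 4, 5   # dim(U_h), dim(V_h), dim(Z_h); illustrative sizes

A = rng.standard_normal((n_u, n_u)) + n_u * np.eye(n_u)  # invertible state operator
B = rng.standard_normal((n_u, n_v))                      # control-to-state coupling
T = rng.standard_normal((n_z, n_u))                      # observation operator

# K maps a control v_h to the predicted observation T u_h, where u_h
# solves the state equation A u_h + B v_h = 0.
K = -T @ np.linalg.solve(A, B)

# Full column rank means K is injective, so the problem has a unique
# solution even when kappa = 0.
print(np.linalg.matrix_rank(K) == min(K.shape))
```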

In order to design a fast method for solving (1)–(2), we must not only use an efficient algorithm for handling the nonlinear total variation (TV) term in (1), but also a scheme suitable for solving the inner systems that arise in each iteration of the outer algorithm. This will be achieved by combining the approach analyzed in [11], which is suitable for linear optimality systems, with a successful method for solving deblurring problems, namely the split Bregman method [5].

The main result of this paper shows that the spectrum of the linear systems, arising in each Bregman iteration, has a nice structure. Therefore, the Minimal Residual (MINRES) method handles these subproblems very well: Theoretically, one can prove that only \(O([\ln (\alpha ^{-1})]^2)\) MINRES iterations are required, where \(\alpha \) is a small regularization parameter. (In [11] it is also explained why iteration numbers of order \(O(\ln (\alpha ^{-1}))\) often occur in practice.)

Frequently, Tikhonov regularization is employed to stabilize PDE-constrained optimization problems. This approach yields a linear-quadratic optimization problem, which is mathematically appealing, but will produce smooth solutions. In many inverse problems, the control parameter is some physical property, e.g. a heat source, the density of a medium or an electrical potential. When we try to identify such quantities, it might be desirable to make a sharp separation between regions with different values of the physical property. In other words, we want “jumps” in the solution. Thus, one might argue that the smooth solutions produced with Tikhonov regularization are of limited value, and that TV-regularization should be employed. The inverse problem of electrocardiography is a problem of this type, where one seeks to locate the ischemic region of the heart. This can be achieved with a PDE-constrained optimization problem, where the control is the electrical potential of the myocardium, and the data is given in terms of ECG recordings. The ischemic area can be determined from the fact that the electrical potential is (approximately) piecewise constant, with different values in the ischemic and healthy regions. From an imaging point of view, it would be beneficial to properly separate these areas [12, 18].

Note that we limit our study to discretized problems posed in terms of finite dimensional spaces. This simplifies the discussion of the TV-regularization and enables the use of the results published in [11].

This paper is organized as follows:

  • In Sect. 2 we show how the PDE-constrained optimization problem (1)–(2) can be modified in such a way that we can apply the split Bregman algorithm.

  • In Sect. 3 we prove that the KKT systems that arise in each iteration of the split Bregman algorithm have a spectrum almost contained in three bounded intervals, with a very limited number of isolated eigenvalues. Hence, Krylov subspace algorithms will handle these systems very well.

  • Sect. 4 presents an alternative version of the split Bregman algorithm.

  • In Sect. 5 we illuminate the theoretical results with some numerical experiments.

  • Finally, the conclusions are presented in Sect. 6.

2 Split Bregman algorithm for PDE-constrained optimization problems

The split Bregman method has its roots in the Bregman iteration, which is an algorithm for computing extrema of convex functionals [2]. Later, it was used in [13] as a new regularization procedure for inverse problems, and in [5] the authors used this approach to derive an effective solution method for \(L^1\)-regularization problems. In particular, they demonstrated why this method works well for total variation problems. Motivated by these papers, we write (1)–(2) in the form

$$\begin{aligned} \min _{(v_h,p_h) \in V_h \times \mathbf {P}_h^0} \left\{ \frac{1}{2}\rho \Vert Kv_h-d_h\Vert _{Z_h}^2 + \frac{1}{2}\kappa \Vert v_h\Vert _{L_h^2(\varOmega )}^2 + \int _\varOmega |p_h| \ dx \right\} \!, \end{aligned}$$
(4)

subject to

$$\begin{aligned} \nabla v_h = p_h, \end{aligned}$$
(5)

where \(\mathbf {P}_h^0\) is a vector space of piecewise constant functions and K is defined in (3).

We will not go into details on how the split Bregman algorithm is derived; instead, we refer to [5] and [3]. In this method, see Algorithm 1, we note that the parameter \(\lambda \) is fixed. Instead, it is the “data” that varies with the introduction of \(b^k\), which can be interpreted as the Lagrange multiplier estimate associated with (5). The authors of [20] and [19] explain why the split Bregman scheme can be considered as an augmented Lagrangian method [6, 15].

[Algorithm 1: the split Bregman method]
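As a concrete illustration of the loop structure of Algorithm 1, the following minimal sketch applies the unconstrained split Bregman iteration to a 1-D total variation denoising problem, i.e. (4)–(5) with K equal to the identity and \(\kappa = 0\). All parameter values are illustrative:

```python
import numpy as np

def split_bregman_tv(d, rho, lam, iters=200):
    """Unconstrained split Bregman sketch for
    min_v  rho/2 ||v - d||^2 + sum_i |(D v)_i|,   D = forward differences,
    i.e. K = identity and kappa = 0 in (4)-(5)."""
    n = len(d)
    D = np.diff(np.eye(n), axis=0)        # discrete gradient
    p = np.zeros(n - 1)                   # auxiliary variable p^k
    b = np.zeros(n - 1)                   # Bregman ("multiplier") variable b^k
    H = rho * np.eye(n) + lam * D.T @ D   # system matrix of the Step 3 subproblem
    for _ in range(iters):
        # Step 3: quadratic minimization in v
        v = np.linalg.solve(H, rho * d + lam * D.T @ (p - b))
        # Step 4: componentwise shrinkage with threshold 1/lambda
        q = D @ v + b
        p = np.sign(q) * np.maximum(np.abs(q) - 1.0 / lam, 0.0)
        # Step 5: Bregman update
        b = q - p
    return v

# A noisy step: TV regularization flattens the noise but keeps the jump.
rng = np.random.default_rng(1)
d = np.concatenate([np.zeros(20), np.ones(20)]) + 0.05 * rng.standard_normal(40)
v = split_bregman_tv(d, rho=20.0, lam=5.0)
```

The quadratic Step 3 subproblem is trivial here precisely because K is the identity; for a genuine PDE constraint it must be replaced by the KKT system derived in Sect. 2.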

At this point, we would like to present one important theorem from [3]:

Theorem 1

Assume that there exists at least one solution \(v_h^*\) of (4)–(5). Then the split Bregman algorithm satisfies

$$\begin{aligned}&\lim _{k\rightarrow \infty } \frac{1}{2}\rho \Vert Kv_h^k - d_h\Vert _{Z_h}^2 +\frac{1}{2}\kappa \Vert v_h^k\Vert _{L_h^2(\varOmega )}^2 + \int _\varOmega |\nabla v_h^k|\,dx\\&\quad = \frac{1}{2}\rho \Vert Kv_h^* - d_h\Vert _{Z_h}^2 + \frac{1}{2}\kappa \Vert v_h^*\Vert _{L_h^2(\varOmega )}^2 + \int _\varOmega |\nabla v_h^*|\,dx. \end{aligned}$$

If the solution \(v_h^*\) is unique, we also have

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert v_h^k-v_h^*\Vert _{L^2_h(\varOmega )} = 0. \end{aligned}$$

Recall that our main objective is to derive an efficient solution method for (1)–(2), i.e. for rather general PDE-constrained optimization problems subject to TV-regularization. We will restrict our analysis to problems that satisfy the assumptions

  • \(\mathcal {A}\mathbf {1:}\,A: U_h\rightarrow U_h'\) is bounded and linear.

  • \(\mathcal {A}\mathbf {2:}\,A^{-1}\) exists and is bounded.

  • \(\mathcal {A}\mathbf {3:}\,B:V_h\rightarrow U_h'\) is bounded and linear.

  • \(\mathcal {A}\mathbf {4:}\,T:U_h\rightarrow Z_h\) is bounded and linear.

Here, bounded should be interpreted as having operator norm which is bounded independently of the mesh parameter h. Due to assumption \(\mathcal {A}\mathbf {2}\), we can write (2) in the form

$$\begin{aligned} u_h = -A^{-1}Bv_h, \end{aligned}$$
(6)

and the operator K, see (3), is well-defined.

We want to apply the split Bregman algorithm to (4)–(5). Unfortunately, the explicit computation of the operator K is difficult in practical applications; if (2) is a PDE, then the inverse of A is typically too expensive to compute explicitly. This issue has been handled, in the case of Tikhonov regularization, by solving the associated KKT system. The purpose of this paper is to adapt the KKT approach to the framework of the split Bregman algorithm. As we will see below, this yields an efficient and practical solution method for PDE-constrained optimization problems subject to TV-regularization.

We proceed by applying Algorithm 1 to the minimization problem (4)–(5). Step 5 in Algorithm 1 is straightforward. Furthermore, since \(\nabla v_h^{k+1}, p_h^k, b_h^k \in \mathbf {P}^0_h\), Step 4 is very cheap to solve by the shrinkage operator

$$\begin{aligned} p_{h,x_i}^{k+1}(x) = \text {shrink}\left( \nabla _{x_i} v_{h}^{k+1}(x) + b_{h,x_i}^k(x), \frac{1}{\lambda }\right) , \end{aligned}$$
(7)

where

$$\begin{aligned} \text {shrink}(a,b) = \frac{a}{|a|}\max (|a|-b,0), \end{aligned}$$

see [5]. Hence, the challenge is to find the minimizer of Step 3. That is, we must solve the minimization problem

$$\begin{aligned} \min _{v_h \in V_h} \left\{ \frac{1}{2}\rho \Vert Kv_h-d_h\Vert _{Z_h}^2 + \frac{1}{2}\kappa \Vert v_h\Vert _{L_h^2(\varOmega )}^2 + \frac{1}{2} \lambda \Vert \nabla v_h - p_h^k + b_h^k\Vert _{L_h^2(\varOmega )}^2\right\} , \end{aligned}$$

where \(d_h, p_h^k\) and \(b_h^k\) are given quantities. By combining this minimization problem with Eqs. (6) and (3), we get the equivalent constrained minimization problem:

$$\begin{aligned} \min _{(v_h,u_h) \in V_h \times U_h} \left\{ \frac{1}{2}\rho \Vert Tu_h-d_h\Vert _{Z_h}^2 + \frac{1}{2}\kappa \Vert v_h\Vert _{L_h^2(\varOmega )}^2 + \frac{1}{2}\lambda \Vert \nabla v_h - p_h^k + b_h^k\Vert _{L_h^2(\varOmega )}^2\right\} \nonumber \\ \end{aligned}$$
(8)

subject to

$$\begin{aligned} Au_h + Bv_h = 0. \end{aligned}$$
(9)

For the sake of simplicity, we want our optimality system to be as similar as possible to the optimality system analyzed in [11]. Thus, we need to scale the cost-functional in (8) such that we get

$$\begin{aligned} \min _{(v_h,u_h) \in V_h \times U_h} \left\{ \frac{1}{2}\Vert Tu_h-d_h\Vert _{Z_h}^2 + \frac{1}{2}\gamma \Vert v_h\Vert _{L_h^2(\varOmega )}^2 + \frac{1}{2}\alpha \Vert \nabla v_h - p_h^k + b_h^k\Vert _{L_h^2(\varOmega )}^2\right\} \nonumber \\ \end{aligned}$$
(10)

subject to (9), where

$$\begin{aligned} \alpha = \frac{\lambda }{\rho } \text { and } \gamma = \frac{\kappa }{\rho }. \end{aligned}$$
(11)

Next, we can introduce the Lagrangian associated with (9)–(10):

$$\begin{aligned} \mathcal {L}(v_h,u_h,w_h) = {}&\frac{1}{2}\Vert Tu_h-d_h\Vert _{Z_h}^2 + \frac{1}{2}\gamma \Vert v_h\Vert _{L_h^2(\varOmega )}^2 \\&+\frac{1}{2}\alpha \Vert \nabla v_h - p_h^k + b_h^k\Vert _{L_h^2(\varOmega )}^2 + \langle Au_h + Bv_h, w_h \rangle . \end{aligned}$$

The first-order optimality conditions can be found by computing the derivatives of the Lagrangian with respect to \(v_h\), \(u_h\) and \(w_h\). These conditions can be expressed by the KKT system

$$\begin{aligned} \underbrace{\begin{bmatrix} - \alpha \varDelta +\gamma E&0&B'\\ 0&T'T&A' \\ B&A&0 \end{bmatrix}}_{\widehat{\mathcal {A}}_{\alpha }} \begin{bmatrix} v_h \\ u_h \\ w_h \end{bmatrix}=\begin{bmatrix} -\alpha \nabla \cdot p_h^k + \alpha \nabla \cdot b_h^k \\ T'd_h \\ 0 \end{bmatrix}, \end{aligned}$$
(12)

where “\('\)” is used to denote dual operators, and \(E:V_h\rightarrow V_h'\) is defined by

$$\begin{aligned} \langle Ev_h,\phi _h \rangle = (v_h,\phi _h)_{L_h^2(\varOmega )},\quad \phi _h \in V_h. \end{aligned}$$

We have thus derived a new system of equations to be solved in Step 3 in Algorithm 1, which does not require the explicit inverse of A. Also note the form of the operator \(-\varDelta : V_h \rightarrow V_h'\) in the top left corner of the KKT system (12). In an infinite dimensional setting, this operator must be replaced with the more involved operator \(D'D: BV(\varOmega ) \rightarrow BV(\varOmega )'\), see [4] for a thorough discussion of this operator.Footnote 4 The operator \(D'D\) is much more challenging to analyze, but it coincides with the operator \(-\varDelta \) in a finite dimensional setting, which follows from the fact that \(Dv = \nabla v\) for all elements in \(W^{1,1}(\varOmega )\), see [1]. This concludes the discussion of Step 3 in Algorithm 1.
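To make the structure of (12) concrete, the following sketch assembles and solves the KKT system for a small synthetic problem. The operators are illustrative stand-ins (a 1-D stiffness matrix for \(-\varDelta \), random matrices for A, B and T), not the paper's concrete PDE:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5                       # dimension of each discrete space (illustrative)
alpha, gamma = 1e-2, 0.0    # alpha = lambda/rho, gamma = kappa/rho, cf. (11)

Lap = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # -Delta (1-D stiffness)
E = np.eye(n)                                           # L^2 mass matrix
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))       # invertible state operator
B = rng.standard_normal((n, n))
T = rng.standard_normal((n, n))
O = np.zeros((n, n))

KKT = np.block([[alpha * Lap + gamma * E, O,       B.T],
                [O,                       T.T @ T, A.T],
                [B,                       A,       O]])

# Right hand side for the first Bregman step, where p^0 = b^0 = 0:
d = rng.standard_normal(n)
rhs = np.concatenate([np.zeros(n), T.T @ d, np.zeros(n)])

v, u, w = np.split(np.linalg.solve(KKT, rhs), 3)
# The third block row enforces the state equation A u_h + B v_h = 0:
print(np.linalg.norm(A @ u + B @ v))
```

Note that the assembled system is symmetric and indefinite; this saddle-point structure is what motivates the MINRES analysis of Sect. 3.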

We can now formulate the full algorithm for solving the PDE-constrained optimization problem (1)–(2), see Algorithm 2.

[Algorithm 2: the split Bregman method for PDE-constrained optimization]

The efficiency of the split Bregman algorithm has been demonstrated earlier, see e.g. [3, 5]. Of the three inner steps of the for-loop in Algorithm 2, the update of \(b_h^k\) is obviously cheap, and the update of \(p_h^k\) is accomplished by the simple shrinkage operator (7). What remains, however, is to analyze the spectrum of the KKT system (12), see Step 3 in Algorithm 2: The efficiency of the algorithm is highly dependent on how fast we can solve these KKT systems with, e.g., Krylov subspace solvers.

3 Spectrum of the KKT system

In its current form, the operator \(\widehat{\mathcal {A}}_\alpha \) in (12) is a mapping from the product space \(V_h \times U_h \times U_h\) onto the dual space \(V_h' \times U_h' \times U_h'\). Since this operator maps to the dual space, and not to the space itself, it is not possible to use the MINRES method directly. A remedy exists, however, in the form of Riesz maps. In this case, we must introduce the two Riesz maps

$$\begin{aligned} R_{V_h}:\,&V_h \rightarrow V_h',\\ R_{U_h}:\,&U_h \rightarrow U_h'. \end{aligned}$$

This enables us to use the MINRES algorithm, since the KKT system (12) can be written as

$$\begin{aligned}&\underbrace{\begin{bmatrix} R_{V_h}^{-1}&0&0 \\ 0&R_{U_h}^{-1}&0 \\ 0&0&R_{U_h}^{-1} \end{bmatrix}}_{\mathcal {R}^{-1}} \underbrace{\begin{bmatrix} -\alpha \varDelta +\gamma E&0&B'\\ 0&T'T&A' \\ B&A&0 \end{bmatrix}}_{\widehat{\mathcal {A}}_\alpha } \begin{bmatrix} v_h \\ u_h \\ w_h \end{bmatrix} \nonumber \\&\quad = \begin{bmatrix} R_{V_h}^{-1}&0&0 \\ 0&R_{U_h}^{-1}&0 \\ 0&0&R_{U_h}^{-1} \end{bmatrix} \begin{bmatrix} -\alpha \nabla \cdot p_h^k + \alpha \nabla \cdot b_h^k \\ T'd_h \\ 0 \end{bmatrix}, \end{aligned}$$
(13)

where

$$\begin{aligned} \mathcal {R}^{-1}\widehat{\mathcal {A}}_\alpha : V_h \times U_h \times U_h \rightarrow V_h \times U_h \times U_h. \end{aligned}$$

The operator \(\mathcal {R}^{-1}\) can be considered to be a preconditioner. See [10, 11] for a more thorough analysis.
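A sketch of this preconditioned MINRES setup, using SciPy on the same kind of toy operators as before (all choices illustrative; \(R_{V_h}\) is taken as the \(H^1\) matrix, identity plus stiffness, and the remaining Riesz maps as identities):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(3)
n = 5
alpha = 1e-2

Lap = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D stiffness, -Delta
A = np.eye(n)
B = rng.standard_normal((n, n))
T = rng.standard_normal((n, n))
O = np.zeros((n, n))

KKT = np.block([[alpha * Lap, O, B.T],
                [O, T.T @ T, A.T],
                [B, A, O]])            # symmetric and indefinite

# Block-diagonal Riesz map: H^1 inner product on V_h, L^2 (identity) on U_h.
R = np.block([[np.eye(n) + Lap, O, O],
              [O, np.eye(n), O],
              [O, O, np.eye(n)]])
Rinv = np.linalg.inv(R)
M = LinearOperator((3 * n, 3 * n), matvec=lambda x: Rinv @ x)  # preconditioner

b = rng.standard_normal(3 * n)
x, info = minres(KKT, b, M=M)   # info == 0 signals convergence
print(info, np.linalg.norm(KKT @ x - b) / np.linalg.norm(b))
```

MINRES requires a symmetric system and a symmetric positive definite preconditioner; the Riesz map \(\mathcal {R}\) provides exactly that, which is why it is used instead of a general-purpose method such as GMRES.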

We performed an experimental investigation that suggested the use of small values of \(\alpha \) to obtain good convergence results for the outer split Bregman algorithm. That is, \(\lambda /\rho \) should be small, see (11). According to standard theory for Krylov subspace methods, the number of iterations needed by the MINRES algorithm is of the same order as the spectral condition number of the involved operator. In our case, this corresponds to iteration numbers of order \(O(\alpha ^{-1})\) when \(\gamma = 0\). We will now show that this estimate is very pessimistic.

Since the case \(\gamma = 0\) is the most challenging, and also the most interesting, we will for the rest of the analysis assume that this is the case, i.e. \(\gamma = 0\). Let us first simplify the notation in (13), and write the operator \(\mathcal {R}^{-1}\widehat{\mathcal {A}}_\alpha \) in the form

$$\begin{aligned} \mathcal {R}^{-1}\widehat{\mathcal {A}}_\alpha = \mathcal {A}_\alpha = \begin{bmatrix} \alpha Q&0&\tilde{B}^* \\ 0&T^*T&\tilde{A}^* \\ \tilde{B}&\tilde{A}&0 \end{bmatrix}, \end{aligned}$$
(14)

where we have the following definitions:

  • \(Q = -R_{V_h}^{-1}\varDelta : V_h \rightarrow V_h\),

  • \(\tilde{B} = R_{U_h}^{-1}B : V_h \rightarrow U_h\),

  • \(\tilde{A} = R_{U_h}^{-1}A : U_h \rightarrow U_h\),

  • \(T^*T = R_{U_h}^{-1}T'T: U_h \rightarrow U_h\).

In this new form, the operator \(\mathcal {A}_\alpha \) in (14) is very similar to the operator analyzed in [11]. In fact, they analyzed the operator \(\mathcal {B}_\alpha : V_h \times U_h \times U_h \rightarrow V_h \times U_h \times U_h\), defined as

$$\begin{aligned} \mathcal {B}_\alpha = \begin{bmatrix} \alpha I&0&\tilde{B}^* \\ 0&T^*T&\tilde{A}^* \\ \tilde{B}&\tilde{A}&0 \end{bmatrix}. \end{aligned}$$
(15)

The main result in [11] is that the spectrum of \(\mathcal {B}_\alpha \) is of the form

$$\begin{aligned} \text {sp}(\mathcal {B}_\alpha ) \subset [-b,-a] \cup [c\alpha ,2\alpha ] \cup \{ \tau _1, \tau _2, ..., \tau _{N(\alpha )} \} \cup [a,b], \end{aligned}$$

where

$$\begin{aligned} N(\alpha ) = O(\ln (\alpha ^{-1})) \end{aligned}$$

and the constants \(a\), \(b\) and \(c > 0\) are independent of the parameter \(\alpha \). The analysis in [11] is roughly performed as follows:

  • The negative eigenvalues are shown to be bounded away from zero, regardless of the size of the regularization parameter \(\alpha \ge 0\); in particular, this holds even for \(\alpha = 0\). Hence, the negative eigenvalues of \(\mathcal {A}_\alpha \), defined in (14), are bounded away from zero: The argument in [11] can be adapted to the present situation in a straightforward manner.

  • For the positive eigenvalues, the Courant-Fischer-Weyl min-max principle is used to show that the difference between the eigenvalues of \(\mathcal {B}_0\) and \(\mathcal {B}_\alpha \) is “small”, where \(\mathcal {B}_0\) denotes the operator \(\mathcal {B}_\alpha \) with zero regularization \(\alpha =0\). More specifically, they prove that the difference between the eigenvalues of \(\mathcal {B}_0\) and \(\mathcal {B}_\alpha \), properly sorted, is less than the size of the regularization parameter \(0 < \alpha \ll 1\). It is easy to verify that a similar property holds for \(\mathcal {A}_0\) and \(\mathcal {A}_\alpha \): the difference between the eigenvalues of \(\mathcal {A}_0\) and \(\mathcal {A}_\alpha \) is less than \(\tilde{c}\alpha \), where \(\tilde{c} = \Vert Q\Vert \).

  • Finally, the analysis in [11] requires that

    $$\begin{aligned} \alpha (v_h,v_h)_{V_h} + (T^*Tu_h,u_h)_{U_h} \end{aligned}$$

    must be coercive whenever

    $$\begin{aligned} \tilde{A}u_h + \tilde{B}v_h = 0. \end{aligned}$$

    It is proven in [11] that this property holds for the operator \(\mathcal {B}_\alpha \). For the operator \(\mathcal {A}_\alpha \), defined in (14), this analysis is more involved, and it will therefore be explored in detail here. More specifically, we must show, provided that suitable assumptions hold, that

    $$\begin{aligned} \alpha (Qv_h,v_h)_{V_h} + (T^*Tu_h,u_h)_{U_h} \end{aligned}$$

    is coercive for all \((v_h,u_h)\) satisfying

    $$\begin{aligned} \tilde{A}u_h+\tilde{B}v_h = 0. \end{aligned}$$
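These three properties can be observed numerically. The sketch below builds a toy analogue of \(\mathcal {A}_\alpha \) (illustrative operators again; T is given rapidly decaying singular values to mimic an ill-posed problem, and Q is replaced by the identity) and verifies that the negative eigenvalues stay bounded away from zero as \(\alpha \rightarrow 0\):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

A = np.eye(n)                                   # toy state operator
B = rng.standard_normal((n, n))
U, _, Vt = np.linalg.svd(rng.standard_normal((n, n)))
T = U @ np.diag(10.0 ** -np.arange(n)) @ Vt     # singular values 1, 0.1, 0.01, ...
O = np.zeros((n, n))

def spectrum(alpha):
    # Toy analogue of A_alpha in (14), with Q replaced by the identity
    Aa = np.block([[alpha * np.eye(n), O, B.T],
                   [O, T.T @ T, A.T],
                   [B, A, O]])
    return np.sort(np.linalg.eigvalsh(Aa))

for alpha in [1e-2, 1e-4, 1e-6]:
    ev = spectrum(alpha)
    # Largest negative eigenvalue is bounded away from 0, uniformly in alpha;
    # the smallest positive eigenvalue, in contrast, shrinks with alpha.
    print(alpha, ev[ev < 0].max(), ev[ev > 0].min())
```

The shrinking positive eigenvalues are what produce the interval \([c\alpha ,2\alpha ]\) and the isolated cluster \(\{\tau _i\}\) in the theory.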

To further investigate the coercivity problem associated with (14), we introduce the notation

$$\begin{aligned} X_h&= V_h \times U_h, \Vert x_h \Vert _{X_h} = \Vert (v_h,u_h) \Vert _{X_h} = \sqrt{\Vert v_h\Vert ^2_{V_h} + \Vert u_h\Vert ^2_{U_h}}, \nonumber \\ M_\alpha&= \begin{bmatrix} \alpha Q&0 \\ 0&T^*T \end{bmatrix}:X_h \rightarrow X_h, \end{aligned}$$
(16)
$$\begin{aligned} N&= \begin{bmatrix} \tilde{B}&\tilde{A} \end{bmatrix} : X_h \rightarrow U_h. \end{aligned}$$
(17)

Since we work with finite dimensional spaces, we employ the control space \(V_h\) with the norm

$$\begin{aligned} \Vert \cdot \Vert ^2_{V_h} = \Vert \cdot \Vert _{L_h^2(\varOmega )}^2 + |\cdot |^2_{H_h^1(\varOmega )}, \end{aligned}$$
(18)

i.e. \(V_h=H^1_h(\varOmega ) \subset H^1(\varOmega )\). Note that, for the analysis presented below, we must assume that the operator B satisfies assumption \(\mathcal {A}\mathbf {3}\) with the norm (18), i.e. that

$$\begin{aligned} B:V_h \rightarrow U'_h \end{aligned}$$

is bounded, which along with assumptions \(\mathcal {A}\mathbf {2}\) and \(\mathcal {A}\mathbf {4}\) imply that

$$\begin{aligned} K_h = K = -TA^{-1}B = -T\tilde{A}^{-1}\tilde{B}:V_h \rightarrow Z_h \end{aligned}$$

is bounded, in the sense that the operator norm is bounded independently of h. We must also assume that the discrete solutions converge toward the correct solution as \(h \rightarrow 0\):

$$\begin{aligned} \lim _{h\rightarrow 0} \Vert v_h - v\Vert _{H^1(\varOmega )} = 0 \Rightarrow \lim _{h\rightarrow 0} \Vert K_h v_h - \hat{K}v\Vert _Z = 0, \end{aligned}$$
(19)

where \(\hat{K}:H^1(\varOmega ) \rightarrow Z\) denotes the associated mapping between the infinite dimensional spaces.

We are now ready to formulate the result concerning the coercivity issue for the operator \(\mathcal {A}_\alpha \) defined in (14).

Lemma 1

Let \(M_\alpha \) and N be defined as in (16) and (17), respectively. Assume that (19) holds and that \(\hat{K}\) does not annihilate constants, i.e. the constant function \(k \notin \mathcal {N}(\hat{K})\). Then there exists \(\bar{h} > 0\) such that the operator \(M_\alpha \) is coercive on the kernel of N, i.e. for \(\alpha \in (0,1)\):

$$\begin{aligned} (M_\alpha x_h,x_h)_{X_h} \ge c \alpha \Vert x_h \Vert ^2_{X_h} \end{aligned}$$
(20)

for all \(h \in (0,\bar{h})\) and for all \(x_h=(v_h,u_h) \in X_h\) satisfying

$$\begin{aligned} \tilde{A}u_h + \tilde{B}v_h = 0. \end{aligned}$$
(21)

The constant c is independent of \(h \in (0,\bar{h})\) and \(\alpha > 0\).

Proof

We will first show that, if \(\hat{K}\) does not annihilate constants, then there exist constants \(\bar{h} > 0\) and \(c \in (0,1)\), which is independent of \(h \in (0, \bar{h})\), such that

$$\begin{aligned} (K_h v_h,K_h v_h)_Z&\ge (c-1)(\nabla v_h,\nabla v_h)_{L^2(\varOmega )} \nonumber \\&\quad + c(v_h,v_h)_{L^2(\varOmega )},\quad \forall v_h \in V_h,\,\forall h \in (0, \bar{h}). \end{aligned}$$
(22)

Thereafter, we will use this result to prove (20)–(21).

Assume that there do not exist \(\bar{h} > 0\) and \(c \in (0,1)\) such that (22) holds. We will show that this implies that the constant function k must belong to the null-space of \(\hat{K}\). If (22) is not satisfied, then it follows that there exists a sequence

$$\begin{aligned} \lim _{i \rightarrow \infty } h_i = 0 \end{aligned}$$

and a sequence of functions

$$\begin{aligned} \{ v_{h_i} \} \subset H^1(\varOmega ), \quad v_{h_i} \in H^1_{h_i}(\varOmega ), \quad \Vert v_{h_i} \Vert ^2_{L^2(\varOmega )} = 1, \end{aligned}$$

such that

$$\begin{aligned} 0 \le (K_{h_i} v_{h_i},K_{h_i} v_{h_i})_{Z}&< \left( \frac{1}{i} - 1 \right) |v_{h_i}|_{H^1(\varOmega )}^2 + \frac{1}{i}(v_{h_i},v_{h_i})_{L^2(\varOmega )}\\&= \left( \frac{1}{i} - 1 \right) |v_{h_i}|_{H^1(\varOmega )}^2 + \frac{1}{i}. \end{aligned}$$

We may choose a sequence with the property \(\Vert v_{h_i} \Vert ^2_{L^2(\varOmega )}=1\) because the operator \(K_h\) is linear. Since \((1/i - 1) \rightarrow -1\) as \(i \rightarrow \infty \), we can conclude that

$$\begin{aligned}&|v_{h_i}|_{H^1(\varOmega )} \rightarrow 0 \quad \text {as } i \rightarrow \infty , \end{aligned}$$
(23)
$$\begin{aligned}&\Vert K_{h_i}v_{h_i} \Vert _{Z} \rightarrow 0 \quad \text {as } i \rightarrow \infty . \end{aligned}$$
(24)

We will now show that \(\{v_{h_i}\}\) has a limit in \(H^1(\varOmega )\). Let

$$\begin{aligned} S = \left\{ s \in H^1(\varOmega ): \int _\varOmega s \, dx = 0\right\} . \end{aligned}$$

It is well known that \(H^1(\varOmega ) = S \oplus \mathbb {R}\), i.e. every function in \(H^1(\varOmega )\) can be (uniquely) expressed as a sum of a function in S and a constant. Hence,

$$\begin{aligned}&v_{h_i} = s_{h_i} + r_i, \quad \text{ where } \\&s_{h_i} \in S, \\&r_i \in \mathbb {R} \text{ is } \text{ a } \text{ constant }. \end{aligned}$$

From this splitting, we obtain

$$\begin{aligned} 0 \le |s_{h_i}|_{H^1(\varOmega )} = |s_{h_i} + r_i|_{H^1(\varOmega )} = |v_{h_i}|_{H^1(\varOmega )} \rightarrow 0 \text{ as } i \rightarrow \infty , \end{aligned}$$
(25)

see (23).

This enables us to use the Poincaré inequality to conclude that

$$\begin{aligned} 0 \le \Vert s_{h_i}\Vert _{L^2(\varOmega )} \le C|s_{h_i}|_{H^1(\varOmega )} \rightarrow 0 \quad \text { as } i \rightarrow \infty , \end{aligned}$$

i.e.

$$\begin{aligned} \Vert s_{h_i}\Vert _{L^2(\varOmega )} \rightarrow 0 \quad \text { as } i \rightarrow \infty . \end{aligned}$$
(26)

Furthermore, recall that \(\Vert v_{h_i}\Vert _{L^2(\varOmega )}^2 = 1\) and that \(\int _\varOmega s_{h_i} \, dx = 0\). Thus, it follows that

$$\begin{aligned} 1&= \Vert v_{h_i}\Vert _{L^2(\varOmega )}^2 \\&= \Vert s_{h_i} + r_i\Vert _{L^2(\varOmega )}^2 \\&= \Vert s_{h_i}\Vert _{L^2(\varOmega )}^2 + 2(s_{h_i},r_i)_{L^2(\varOmega )} + \Vert r_i\Vert _{L^2(\varOmega )}^2 \\&= \Vert s_{h_i}\Vert _{L^2(\varOmega )}^2 + 2r_i\int _\varOmega s_{h_i} \ dx + \Vert r_i\Vert _{L^2(\varOmega )}^2 \\&= \Vert s_{h_i}\Vert _{L^2(\varOmega )}^2 + |\varOmega | (r_i)^2, \end{aligned}$$

which yields

$$\begin{aligned} (r_i)^2 = \frac{1}{|\varOmega |}\left( 1-\Vert s_{h_i}\Vert ^2_{L^2(\varOmega )}\right) . \end{aligned}$$

By using (26) we get

$$\begin{aligned} r_i = \frac{1}{\sqrt{|\varOmega |}}\sqrt{1 - \Vert s_{h_i}\Vert _{L^2(\varOmega )}^2} \rightarrow r^*=\frac{1}{\sqrt{|\varOmega |}} \quad \text { as } i \rightarrow \infty . \end{aligned}$$

We claim that also the sequence \(\{ v_{h_i} \}\) converges toward \(r^*\) in \(H^1(\varOmega )\):

$$\begin{aligned} v_{h_i} \rightarrow r^* = \frac{1}{\sqrt{|\varOmega |}} \quad \text { in } H^1(\varOmega ). \end{aligned}$$

This follows from the fact that \(s_{h_i} = v_{h_i} - r_i\) and (25)–(26):

$$\begin{aligned} \Vert v_{h_i} - r^*\Vert _{H^1(\varOmega )}&= \Vert v_{h_i} - r_i + r_i - r^*\Vert _{H^1(\varOmega )} \\&\le \Vert v_{h_i} - r_i\Vert _{H^1(\varOmega )} + \Vert r_i - r^*\Vert _{H^1(\varOmega )} \\&= \Vert s_{h_i}\Vert _{H^1(\varOmega )} + \Vert r_i-r^*\Vert _{H^1(\varOmega )} \\&= \Vert s_{h_i}\Vert _{H^1(\varOmega )} + \Vert r_i-r^*\Vert _{L^2(\varOmega )} \xrightarrow {i \rightarrow \infty } 0. \end{aligned}$$

Here, we have used that \(\{ r_i \}\) is a sequence of constants and that \(r^*\) is a constant, which implies that \(\Vert r_i-r^*\Vert _{H^1(\varOmega )} = \Vert r_i-r^*\Vert _{L^2(\varOmega )}\) and that

$$\begin{aligned} r_i \rightarrow r^* \text{ in } \mathbb {R} \Rightarrow \Vert r_i-r^*\Vert _{L^2(\varOmega )} \rightarrow 0, \end{aligned}$$

provided that \(\varOmega \) has finite measure.

Since \(\{ v_{h_i} \}\) converges toward \(r^*\) in \(H^1(\varOmega )\), we may employ assumption (19) to find that

$$\begin{aligned} \lim _{i\rightarrow \infty } \Vert K_{h_i} v_{h_i} \Vert _Z = \Vert \hat{K} r^* \Vert _Z. \end{aligned}$$

By combining these observations with (24), we conclude that

$$\begin{aligned} r^* \in \mathcal {N}(\hat{K}). \end{aligned}$$

To summarize, if (22) does not hold, then \(\hat{K}\) annihilates constants. By contraposition, if \(\hat{K}\) does not annihilate constants, then (22) must hold.

We are now ready to show that (20)–(21) does indeed hold. Note that (22) can be written in the form: There exists \(c \in (0,1)\) such that

$$\begin{aligned}&(\nabla v_h,\nabla v_h)_{L^2(\varOmega )}+(K_h v_h,K_hv_h)_Z \nonumber \\&\quad \ge c[(\nabla v_h,\nabla v_h)_{L^2(\varOmega )} + (v_h,v_h)_{L^2(\varOmega )}], \nonumber \\&\quad \forall v_h \in V_h \text{ and } \forall h \in (0,\bar{h}). \end{aligned}$$
(27)

Assume that \(x=(v_h,u_h) \in X_h\) satisfies the state equation, i.e.

$$\begin{aligned} \tilde{A}u_h + \tilde{B}v_h = 0. \end{aligned}$$

Then,

$$\begin{aligned} u_h= -\tilde{A}^{-1} \tilde{B}v_h, \end{aligned}$$
(28)

and since \(\tilde{A}^{-1}\tilde{B}\) is assumed to be bounded independently of h,

$$\begin{aligned} \Vert u_h \Vert _{U_h} \le \bar{c} \Vert v_h \Vert _{V_h}. \end{aligned}$$
(29)

In addition,

$$\begin{aligned} T u_h= - T \tilde{A}^{-1} \tilde{B}v_h = K_h v_h. \end{aligned}$$
(30)

Therefore, see (16), for \(\alpha \in (0,1)\) and \(h \in (0,\bar{h})\),

$$\begin{aligned} (M_\alpha x_h,x_h)_{X_h}&= \alpha (\nabla v_h,\nabla v_h)_{L_h^2(\varOmega )} + (T^*Tu_h,u_h)_{U_h} \\&\ge \alpha [(\nabla v_h,\nabla v_h)_{L_h^2(\varOmega )} + (Tu_h,Tu_h)_{Z_h}] \\&= \alpha [(\nabla v_h,\nabla v_h)_{L_h^2(\varOmega )} + (K_hv_h,K_hv_h)_{Z_h}], \end{aligned}$$

where we have used (30). Next, by invoking (27) and (29) we can conclude that

$$\begin{aligned} (M_\alpha x_h,x_h)_{X_h}&\ge \alpha [(\nabla v_h,\nabla v_h)_{L_h^2(\varOmega )} + (K_hv_h,K_hv_h)_{Z_h}], \nonumber \\&\ge \alpha c [(\nabla v_h,\nabla v_h)_{L_h^2(\varOmega )} + (v_h,v_h)_{L_h^2(\varOmega )}], \nonumber \\&\ge \alpha c [0.5(\nabla v_h,\nabla v_h)_{L_h^2(\varOmega )} + 0.5(v_h,v_h)_{L_h^2(\varOmega )} + 0.5 \bar{c}^{-2} \Vert u_h \Vert _{U_h}^2 ], \nonumber \\&\ge \alpha c \min \{0.5,0.5 \bar{c}^{-2}\} \Vert (v_h,u_h) \Vert _{X_h}^2. \end{aligned}$$
(31)

That is, \(M_\alpha \) is coercive on the kernel of N, cf. (17). \(\square \)

We will now use this lemma to establish the main result of this section. First, however, we note that:

Remark 1

From \(\mathcal {A}\mathbf {1}\)\(\mathcal {A}\mathbf {3} \ \) it follows that the inf-sup condition holds, i.e.

$$\begin{aligned} \inf _{w_h\in {U_h}}\sup _{(v_h,u_h)\in V_h \times U_h} \frac{(\tilde{B}v_h,w_h)_{U_h}+(\tilde{A}u_h,w_h)_{U_h}}{\sqrt{\Vert v_h\Vert _{V_h}^2+\Vert u_h\Vert _{U_h}^2}\Vert w_h\Vert _{U_h}}\ge c > 0. \end{aligned}$$
(32)

Remark 2

Since we consider finite dimensional problems, there always exist positive constants \(\tilde{c}, \, \tilde{C}\) such that

$$\begin{aligned} |\tau _i(\mathcal {A}_{0}) | \le \tilde{c} e^{-\tilde{C} i}, \quad i=1,2,\ldots ,n, \end{aligned}$$
(33)

where \(\tau _i(\mathcal {A}_{0})\) denotes the ith eigenvalue of \(\mathcal {A}_{0}\) sorted in decreasing order according to their absolute value. Here, \(\mathcal {A}_{0}\) is \(\mathcal {A}_{\alpha }\) with \(\alpha =0\), i.e. without regularization. (\(\mathcal {A}_{\alpha }\) is defined in (14)).

We consider ill-posed PDE-constrained optimization tasks. For such problems, \(\tilde{c} e^{-\tilde{C} n}\) will typically be extremely small, i.e. much smaller than practical choices of the size of the regularization parameter.

Let us state the theorem:

Theorem 2

Assume that all assumptions of Lemma 1 hold and that \(h \in (0,\bar{h})\). Then there exist constants \(a, \, b, \, c > 0\) such that, for \(\alpha \in (0,1)\), the spectrum of \(\mathcal {A}_\alpha \), defined in (14), satisfies

$$\begin{aligned} \text {sp}(\mathcal {A}_\alpha ) \subset [-b,-a] \cup [c\alpha ,2\alpha ] \cup \{ \tau _1, \tau _2, \ldots , \tau _{N(\alpha )} \} \cup [a,b], \end{aligned}$$
(34)

where

$$\begin{aligned} N(\alpha ) \le \left\lceil \frac{\ln (\tilde{c})-\ln (\alpha )}{\tilde{C}} \right\rceil =O(\ln (\alpha ^{-1})). \end{aligned}$$

Here \(\tilde{c}, \, \tilde{C}\) are the constants in (33).

Proof

Since we consider finite dimensional problems, the theorem follows from Lemma 1 and the analysis presented in [11]. \(\square \)

Since the spectrum of \(\mathcal {A}_\alpha \) is of the form (34), we can conclude that the MINRES method will handle the KKT systems (13) very well. More precisely, the number of iterations needed by the MINRES scheme to solve (13) cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\) as \(\alpha \rightarrow 0\), see [11]. In fact, iteration counts of order \(O(\ln (\alpha ^{-1}))\) will occur in many practical situations, which is also explained in [11].

Note that, while the optimality system (12) requires that either \(K = -TA^{-1}B: V_h \rightarrow Z_h\) is injective or that \(\gamma > 0\) to obtain a unique solution, see [4], the inner MINRES algorithm only requires that the constant k does not belong to the null-space of \(\hat{K}\).

4 Constrained split Bregman algorithm

The split Bregman algorithm we have analyzed is in [5] referred to as the unconstrained split Bregman method. For some applications, the related constrained split Bregman algorithm, also introduced in [5], produces better convergence rates. In order to discuss the latter method, we observe that the problem (4)–(5) can be formulated in the related, constrained form

$$\begin{aligned} \min _{v_h \in V_h} \left\{ \frac{1}{2}\kappa \Vert v_h\Vert _{L^2(\varOmega )}^2 + \int _\varOmega |p_h| \ dx \right\} , \end{aligned}$$

subject to

$$\begin{aligned} Kv_h&= d_h \quad \text {on } \varOmega _{\text {observe}}, \\ Dv_h&= p_h \quad \text {on } \varOmega . \end{aligned}$$

Here, \(\varOmega _{\text {observe}}\) is the domain on which the observation data \(d_h\) is defined. The constraints are “implicit” in the sense that they are not necessarily satisfied in each step of the split Bregman algorithm, see [5]. Instead, the scheme generates approximations which converge toward functions satisfying these constraints, and a natural stopping criterion is thus

$$\begin{aligned} \Vert K v_h^k-d_h\Vert _{Z_h} < \text {TOL}. \end{aligned}$$

Details about the constrained split Bregman algorithm associated with this problem can be found in [3].

It turns out that this constrained approach can also be applied to PDE-constrained optimization problems, and our experimental investigations showed better convergence with this latter approach. We will therefore present the constrained split Bregman algorithm for discretized PDE-constrained optimization problems of the form

$$\begin{aligned} \min _{v_h \in V_h} \left\{ \frac{1}{2}\kappa \Vert v_h\Vert _{L_h^2(\varOmega )}^2 + \int _\varOmega |p_h| \ dx \right\} , \end{aligned}$$

subject to

$$\begin{aligned} Au_h + Bv_h&= 0, \\ Tu_h&= d_h \quad \text {on } \varOmega _{\text {observe}}, \\ Dv_h&= p_h \quad \text {on } \varOmega . \end{aligned}$$

Note that the first constraint here is “explicit”, i.e. it must be satisfied in each step of the algorithm. The latter two constraints are “implicit”.

Recall the KKT system (12) that we derived in connection with Algorithm 2. For the constrained split Bregman method, we get the very similar optimality system

$$\begin{aligned} \underbrace{\begin{bmatrix} - \alpha \varDelta +\gamma E&0&B'\\ 0&T'T&A' \\ B&A&0 \end{bmatrix}}_{\widehat{\mathcal {A}}_{\alpha }} \begin{bmatrix} v_h \\ u_h \\ w_h \end{bmatrix}=\begin{bmatrix} -\alpha \nabla \cdot p_h^k + \alpha \nabla \cdot b_h^k \\ T'd_h - T'c_h^k \\ 0 \end{bmatrix}, \end{aligned}$$
(35)

where “\('\)” is used to denote dual operators, and \(E:V_h\rightarrow V_h'\) is defined by

$$\begin{aligned} \langle Ev_h,\phi _h \rangle = (v_h,\phi _h)_{L_h^2(\varOmega )},\quad \phi _h \in V_h. \end{aligned}$$

Compared with (12), only the term \(- T'c_h^k\) has been added to the second row of the right hand side of (35). The operator \(\widehat{\mathcal {A}}_{\alpha }\) on the left hand side is unchanged, and our analysis of the MINRES method, presented above, also applies to this KKT system. The associated algorithm is, of course, similar to Algorithm 2, see Algorithm 3.

Algorithm 3 (figure)

We observe that Algorithm 3 requires only one more simple update than Algorithm 2: the update of \(c_h^{k+1}\). This extra computational effort is negligible, and since the constrained variant gave better convergence results, our numerical experiments employ Algorithm 3 only.
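The structure of Algorithm 3 can be sketched on a toy problem. The sketch below is an assumption-laden 1D illustration, not the paper's setup: \(K\) is taken as the identity, \(D\) as forward differences, and the parameters `kappa`, `lam`, `mu` and the tolerance are chosen arbitrarily. It shows the four updates per Bregman step, including the extra c-update that distinguishes the constrained variant from Algorithm 2.

```python
import numpy as np

# Hypothetical 1D toy problem: K = identity observation, D = forward differences.
n = 64
x = np.linspace(0.0, 1.0, n)
d = np.where(x < 0.5, -5.0, 7.0)                 # piecewise-constant data

D = np.diff(np.eye(n), axis=0)                   # (n-1) x n difference matrix
kappa, lam, mu = 1e-3, 10.0, 1.0                 # illustrative parameters

def shrink(z, t):
    """Soft-thresholding: closed-form minimizer of the |p| subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

v = np.zeros(n)
p = np.zeros(n - 1)
b = np.zeros(n - 1)                              # Bregman variable for Dv = p
c = np.zeros(n)                                  # Bregman variable for Kv = d
A = kappa * np.eye(n) + lam * np.eye(n) + mu * D.T @ D
for k in range(200):
    # v-update: linear solve (the toy analogue of the KKT system (35))
    v = np.linalg.solve(A, lam * (d - c) + mu * D.T @ (p - b))
    p = shrink(D @ v + b, 1.0 / mu)              # p-update
    b = b + D @ v - p                            # b-update
    c = c + v - d                                # the extra c-update of Algorithm 3
    if np.linalg.norm(v - d) < 1e-2 * np.linalg.norm(d):  # ||K v - d|| < TOL
        break
```

Note how the "implicit" constraints \(Kv = d\) and \(Dv = p\) are not enforced in each step; the c- and b-updates drive the iterates toward them, matching the stopping criterion discussed above.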

5 Numerical experiments

5.1 Example 1

Let \(\varOmega = (0,1) \times (0,1)\). We consider the standard example in PDE-constrained optimization, but with TV-regularization instead of Tikhonov regularization. That is,

$$\begin{aligned} \min _{(v_h,u_h) \in V_h \times U_h} \left\{ \frac{1}{2} \rho \Vert Tu_h-d_h\Vert ^2_{L_h^2(\varOmega )} + \int _\varOmega |Dv_h|\right\} , \end{aligned}$$
(36)

subject to

$$\begin{aligned} -\varDelta u_h + u_h&= v_h \text { in } \varOmega , \end{aligned}$$
(37)
$$\begin{aligned} \nabla u_h \cdot n&= 0 \text { on } \partial \varOmega , \end{aligned}$$
(38)

where the control space \(V_h\), the state space \(U_h\) and the observation space \(Z_h\) are

$$\begin{aligned} V_h&= H_h^1(\varOmega ) = H^1(\varOmega ) \cap P^1_h , \end{aligned}$$
(39)
$$\begin{aligned} U_h&= H_h^1(\varOmega ), \end{aligned}$$
(40)
$$\begin{aligned} Z_h&= L_h^2(\varOmega ) = L^2(\varOmega ) \cap P^1_h, \end{aligned}$$
(41)

respectively. Furthermore, the operator T is the embedding \(T: H^1(\varOmega ) \hookrightarrow L^2(\varOmega )\). Hence, assumption \(\mathcal {A}\mathbf {4}\) is satisfied.

Recall that our objective is to solve this system with Algorithm 3. The main challenge is the efficient solution of the KKT systems (35). To derive this optimality system, we need the weak formulation of the boundary value problem (37–38).

5.1.1 Computational details

The weak formulation reads: Find \(u_h \in U_h\) such that

$$\begin{aligned} \langle Au_h,\psi _h \rangle = -\langle Bv_h,\psi _h \rangle \quad \forall \psi _h \in U_h, \end{aligned}$$

where

$$\begin{aligned} A: U_h \rightarrow U_h', \quad&u_h \rightarrow \int _{\varOmega } \nabla u_h \cdot \nabla \psi _h + u_h \psi _h\,dx,\,\forall \psi _h \in U_h, \end{aligned}$$
(42)
$$\begin{aligned} B: V_h \rightarrow U_h', \quad&v_h \rightarrow \int _{\varOmega } v_h \psi _h\,dx,\forall \psi _h \in U_h. \end{aligned}$$
(43)

From standard PDE theory, we find that A and \(A^{-1}\) have operator norms which are bounded independently of h. The boundedness of

$$\begin{aligned} B:V_h \rightarrow U'_h, \end{aligned}$$

where one employs the \(H^1\)-topology (18) on \(V_h\), follows from the inequalities

$$\begin{aligned} \int _{\varOmega } v_h \psi _h \ dx&\le \Vert v_h \Vert _{L^2_h(\varOmega )} \cdot \Vert \psi _h \Vert _{L^2_h(\varOmega )} \\&\le \sqrt{\Vert v_h \Vert ^2_{L^2_h(\varOmega )} + | v_h |^2_{H^1_h(\varOmega )}} \cdot \sqrt{\Vert \psi _h \Vert ^2_{L^2_h(\varOmega )} + | \psi _h |^2_{H^1_h(\varOmega )}}\\&= \Vert v_h \Vert _{V_h} \cdot \Vert \psi _h \Vert _{U_h}. \end{aligned}$$

We conclude that assumptions \(\mathcal {A}\mathbf {1}\), \(\mathcal {A}\mathbf {2}\) and \(\mathcal {A}\mathbf {3}\) are satisfied.

The KKT system to be solved in Algorithm 3 now takes the form

$$\begin{aligned}&\underbrace{\begin{bmatrix} R_{V_h}^{-1}&0&0 \\ 0&R_{U_h}^{-1}&0 \\ 0&0&R_{U_h}^{-1} \end{bmatrix}}_{\mathcal {R}^{-1}} \underbrace{\begin{bmatrix} - \alpha \varDelta&0&B'\\ 0&T'T&A' \\ B&A&0 \end{bmatrix}}_{\widehat{\mathcal {A}}_\alpha } \begin{bmatrix} v_h \\ u_h \\ w_h \end{bmatrix} \nonumber \\&\quad = \begin{bmatrix} R_{V_h}^{-1}&0&0 \\ 0&R_{U_h}^{-1}&0 \\ 0&0&R_{U_h}^{-1} \end{bmatrix} \begin{bmatrix} -\alpha \nabla \cdot p_h^k + \alpha \nabla \cdot b_h^k \\ T'd_h - T'c_h^k \\ 0 \end{bmatrix}. \end{aligned}$$
(44)

Recall that \(\alpha =\lambda /\rho \), where \(\rho \) is the regularization parameter in (36) and \(\lambda \) is the parameter employed in the Bregman scheme, see the discussion of (8–11).

The discretization of the operator \(\mathcal {R}\) in (44) is rather straightforward. Recall that the finite dimensional space \(V_h\) was equipped with the norm \(\Vert \cdot \Vert _{H_h^1(\varOmega )}\). Furthermore, since \(U = H^1(\varOmega )\) in this particular example, it follows that the discretization of both of the Riesz maps \(R_{V_h}\) and \(R_{U_h}\) yields the sum of the mass matrix M and stiffness matrix S.
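As a minimal 1D analogue of this step (an illustration, not the 2D assembly used in the experiments), the following sketch assembles P1 mass and stiffness matrices element by element and forms the matrix \(M + S\) representing the discrete \(H^1\) Riesz map.

```python
import numpy as np

# Assemble 1D P1 mass (M) and stiffness (S) matrices on a uniform mesh of
# [0, 1] with n elements; a 1D stand-in for the 2D assembly used in the paper.
def p1_matrices(n):
    h = 1.0 / n
    M = np.zeros((n + 1, n + 1))
    S = np.zeros((n + 1, n + 1))
    Mloc = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])    # element mass matrix
    Sloc = 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element stiffness matrix
    for e in range(n):                    # loop over elements [x_e, x_{e+1}]
        idx = np.array([e, e + 1])
        M[np.ix_(idx, idx)] += Mloc
        S[np.ix_(idx, idx)] += Sloc
    return M, S

M, S = p1_matrices(8)
R = M + S    # matrix of the discrete H^1 Riesz map: (v, w)_{H^1} = w^T (M + S) v
```

The design point carries over to 2D unchanged: because both \(V_h\) and \(U_h\) carry the \(H^1\)-norm here, one and the same matrix \(M + S\) represents both Riesz maps.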

For the operator \({\widehat{\mathcal {A}}_\alpha }\) in (44), the discretization is more challenging, but a general recipe can be found in [10]. The end result can be summarized as follows:

  • A, defined in (42), yields the matrix \(M + S\), which is the sum of the mass and stiffness matrices associated with the domain \(\varOmega \).

  • B, defined in (43), yields the mass matrix M.

  • \(-\varDelta \) yields the stiffness matrix S.

  • \(T'T = R_{U_h}^{-1}T^*T\) yields the mass matrix M.

  • The functions \(v_h\), \(u_h\), \(w_h\), \(p_h^k\), \(b_h^k\), \(c_h^k\) and \(d_h\) yield the corresponding vectors \(\bar{v}\), \(\bar{u}\), \(\bar{w}\), \(\bar{p}^k\), \(\bar{b}^k\), \(\bar{c}^k\) and \(\bar{d}\), respectively.

Hence, the matrix “version” of (44) is

$$\begin{aligned}&\underbrace{\begin{bmatrix} (M+S)^{-1}&0&0 \\ 0&(M+S)^{-1}&0 \\ 0&0&(M+S)^{-1} \end{bmatrix}}_{\bar{\mathcal {R}}^{-1}} \underbrace{\begin{bmatrix} \alpha S&0&M\\ 0&M&M+S \\ M&M+S&0 \end{bmatrix}}_{\bar{\mathcal {A}}_\alpha } \underbrace{\begin{bmatrix} \bar{v}^{k+1} \\ \bar{u}^{k+1} \\ \bar{w}^{k+1} \end{bmatrix}}_{\bar{q}^{k+1}} \nonumber \\&\quad = \begin{bmatrix} (M+S)^{-1}&0&0 \\ 0&(M+S)^{-1}&0 \\ 0&0&(M+S)^{-1} \end{bmatrix} \underbrace{\begin{bmatrix} -\alpha \nabla \cdot \bar{p}^k + \alpha \nabla \cdot \bar{b}^k \\ M\bar{d} - M\bar{c}^k \\ 0 \end{bmatrix}}_{\bar{g}^k}. \end{aligned}$$
(45)

The preconditioner thus reads

$$\begin{aligned} \begin{bmatrix} (M+S)^{-1}&0&0 \\ 0&(M+S)^{-1}&0 \\ 0&0&(M+S)^{-1} \end{bmatrix}, \end{aligned}$$
(46)

and involves the inverse of the matrix \(M + S\). This inverse is computed approximately using algebraic multigrid (AMG); we discuss this in more detail in the numerical setup below.

5.1.2 Numerical setup

  • We wrote the code using cbc.block, a FEniCS-based Python library for block operators. See [9] for details.

  • The PyTrilinos package was used to compute an approximation of the preconditioner (46). We approximated the inverse using AMG with a symmetric Gauss–Seidel smoother and three smoothing sweeps. All tables containing iteration counts for the MINRES method were generated with this approximate inverse Riesz map. The eigenvalues of the KKT systems \([\bar{\mathcal {R}}]^{-1}\bar{\mathcal {A}}_\alpha \), see (45), were, on the other hand, computed with an exact inverse \([\bar{\mathcal {R}}^k]^{-1}\) in Octave.

  • To discretize the domain, we divided \(\varOmega = (0,1) \times (0,1)\) into \(N \times N\) squares, and each square was divided into two triangles.

  • The MINRES iteration process was stopped as soon as

    $$\begin{aligned} \frac{\Vert r^k_n\Vert }{\Vert r^k_0\Vert } = \left[ \frac{\left( \ \bar{\mathcal {A}}_\alpha \bar{q}_n^k - \bar{g}^k,[\bar{\mathcal {R}}]^{-1} \left[ \bar{\mathcal {A}}_\alpha \bar{q}_n^k - \bar{g}^k\right] \ \right) }{\left( \ \bar{\mathcal {A}}_\alpha \bar{q}_0^k - \bar{g}^k,[\bar{\mathcal {R}}]^{-1} \left[ \bar{\mathcal {A}}_\alpha \bar{q}_0^k - \bar{g}^k\right] \ \right) } \right] ^{1/2} < \epsilon . \end{aligned}$$
    (47)

    Here, \(\epsilon \) is a small positive parameter. Note that the superscript k is the iteration index for the “outer” split Bregman method, while the subscript n is the iteration index for the “inner” MINRES algorithm (at each step of the split Bregman method).

  • No noise was added to the input data \(d_h\), see (36).
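Under assumed vertex-ordering and diagonal conventions (neither is specified above), the triangulation described in the setup can be sketched as:

```python
import numpy as np

# Divide (0,1)x(0,1) into N x N squares and split each square into two
# triangles along its diagonal; ordering conventions are illustrative choices.
def unit_square_mesh(N):
    xs = np.linspace(0.0, 1.0, N + 1)
    verts = np.array([(x, y) for y in xs for x in xs])   # (N+1)^2 vertices
    tris = []
    for j in range(N):
        for i in range(N):
            v0 = j * (N + 1) + i        # lower-left corner of square (i, j)
            v1, v2, v3 = v0 + 1, v0 + N + 1, v0 + N + 2
            tris.append((v0, v1, v3))   # lower-right triangle
            tris.append((v0, v3, v2))   # upper-left triangle
    return verts, np.array(tris)

verts, tris = unit_square_mesh(32)      # h = 1/32: 1089 vertices, 2048 triangles
```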

5.1.3 Results

We are now ready to solve the problem (36–38). The synthetic data \(d_h\) was produced by setting

$$\begin{aligned} v_h(x) ={\left\{ \begin{array}{ll} -5 &{} \text {if}\,x_2<0.5,\\ 7 &{}\text {if}\,x_2>0.5, \end{array}\right. } \end{aligned}$$
(48)

and then we solved the boundary value problem (37–38) with (48) as input. The data \(d_h\) was thereafter set equal to the solution \(u_h\) throughout the entire domain \(\varOmega =(0,1) \times (0,1)\).

Theorem 2 states that the KKT system (13–14) arising in each iteration of the split Bregman method has a spectrum of the form (34). In Fig. 1, we see the spectrum of such a KKT system, and it is clearly of the form (34). Hence, we should expect the MINRES algorithm to solve the problem efficiently.

Fig. 1

The eigenvalues of \([\bar{\mathcal {R}}^k]^{-1}\bar{\mathcal {A}}_\alpha \) in Example 1. Here, \(\alpha = 0.0001\) and \(N = 32\), i.e. \(h=1/32\). (\([\bar{\mathcal {R}}^k]^{-1}\) denotes the exact inverse of the preconditioner—not its AMG approximation)

Table 1 illuminates the theoretically established convergence behavior of the MINRES algorithm. As previously mentioned, the authors of [11] proved that the number of iterations cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\), and showed why iteration growth of order \(O(\ln (\alpha ^{-1}))\) often occurs in practice. For \(\epsilon = 10^{-6}\), see (47), and \(N = 256\), we get the following estimate for the iteration growth

$$\begin{aligned} 40.2 - 21.6\log _{10}(\alpha ), \end{aligned}$$

where the coefficients are computed by the least squares method. The growth is very well modeled by this formula. Similarly, for \(\epsilon = 10^{-10}\) and \(N = 256\), we can model the growth by the formula

$$\begin{aligned} 57.6 - 43.5\log _{10}(\alpha ). \end{aligned}$$
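Fits of this kind are ordinary least squares lines through the pairs \((\log_{10}\alpha, \text{iteration count})\). The sketch below uses made-up iteration counts, not the measurements behind Table 1, to show how such coefficients are obtained.

```python
import numpy as np

# Hypothetical iteration counts for decreasing alpha (illustration only).
alphas = np.array([1e-2, 1e-3, 1e-4, 1e-5, 1e-6])
iters = np.array([83, 105, 127, 148, 170])

# Least squares fit of the model: iterations ~ intercept + slope * log10(alpha).
slope, intercept = np.polyfit(np.log10(alphas), iters, 1)
# A negative slope of roughly constant magnitude corresponds to the
# O(ln(1/alpha)) growth predicted in [11].
```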

In Fig. 2, two approximate solutions of the optimization problem (36–38) are displayed: after 10 and 70 Bregman iterations. The “true” control/source is defined in (48).

Table 1 The average number of MINRES iterations required to solve the KKT systems arising in the first ten steps of the split Bregman algorithm in Example 1

Remark

As mentioned above, the problem (36–38), with Tikhonov regularization instead of TV-regularization, has been analyzed by many scientists. In fact, for Tikhonov regularization a number of numerical schemes have been developed that are completely robust with respect to the size of the regularization parameter [14, 16, 17]: even logarithmic growth in iteration counts is avoided. As far as the authors know, it is not known whether these techniques can be adapted to the saddle point problem (44).

Fig. 2

The solution of the problem (36–38). Here, \(\epsilon = 10^{-6}\), \(\alpha = 10^{-6}\) and \(N = 128\), i.e. \(h = 1/128\). a Approximate inverse solution generated with 10 split Bregman iterations, i.e. \(v_h^{10}\). b Approximate inverse solution generated with 70 split Bregman iterations, i.e. \(v_h^{70}\)

5.2 Example 2

We will now explore a more challenging problem. Let the domain \(\varOmega \) still be the unit square. Furthermore, define

$$\begin{aligned} \tilde{\varOmega } = (1/4, 3/4) \times (1/4, 3/4). \end{aligned}$$

The problem we want to study is

$$\begin{aligned} \min _{(v_h,u_h) \in V_h \times U_h} \left\{ \frac{1}{2} \rho \Vert Tu_h-d_h\Vert _{L_h^2(\partial \varOmega )}^2 + \int _{\tilde{\varOmega }} |Dv_h|\right\} , \end{aligned}$$
(49)

subject to

$$\begin{aligned} -\varDelta u_h + u_h&= {\left\{ \begin{array}{ll} -v_h &{} \text {if}\,x \in \tilde{\varOmega },\\ 0 &{}\text {if}\,x \in \varOmega \setminus \tilde{\varOmega }, \end{array}\right. } \end{aligned}$$
(50)
$$\begin{aligned} \nabla u_h\cdot n&= 0 \text { on }\partial \varOmega , \end{aligned}$$
(51)

where the control space \(V_h\), the state space \(U_h\) and the observation space \(Z_h\) are

$$\begin{aligned} V_h&= H^1_h(\tilde{\varOmega }), \end{aligned}$$
(52)
$$\begin{aligned} U_h&= H_h^1(\varOmega ) = H^1(\varOmega ) \cap P_h^1, \end{aligned}$$
(53)
$$\begin{aligned} Z_h&= L_h^2(\partial \varOmega ) = L^2(\partial \varOmega ) \cap T(P_h^1), \end{aligned}$$
(54)

respectively. Furthermore, the operator \(T: H^1(\varOmega ) \rightarrow L^2(\partial \varOmega )\) is the trace operator. Hence, assumption \(\mathcal {A}\mathbf {4}\) is satisfied.

We observe two differences between Examples 1 and 2. First, the control domain \(\tilde{\varOmega }\) is now a subdomain of the entire domain \(\varOmega \), bounded strictly away from the boundary \(\partial \varOmega \). Second, the observation domain is reduced from the entire domain \(\varOmega \) to the boundary \(\partial \varOmega \).

In this model problem, \(V_h\) does not coincide with the control space defined in bullet point 1 in Sect. 1. Nevertheless, the proof of Lemma 1 can be adapted to the present situation in a straightforward manner, and Theorem 2 therefore also holds for this example.

Since the discretization of (49–51) is very similar to that of (36–38), we do not enter into all the details. Instead, we focus only on the differences.

The weak formulation of the state equations (50–51) reads: Find \(u_h \in U_h\) such that

$$\begin{aligned} \langle Au_h,\psi _h \rangle = -\langle Bv_h,\psi _h \rangle \quad \forall \psi _h \in U_h, \end{aligned}$$

where the operator A is still defined as in (42). The operator B, however, is no longer as in (43), but is here defined by

$$\begin{aligned} B: V_h \rightarrow U_h', \quad v_h \rightarrow \int _{\tilde{\varOmega }} v_h \psi _h \, dx, \forall \psi _h \in U_h, \end{aligned}$$
(55)

where we can employ the norm

$$\begin{aligned} \Vert \cdot \Vert ^2_{V_h} = \Vert \cdot \Vert _{L_h^2(\tilde{\varOmega })}^2 + |\cdot |^2_{H_h^1(\tilde{\varOmega })} \end{aligned}$$

on the control space \(V_h\). From standard PDE theory, we can guarantee that A and \(A^{-1}\) are bounded, and the boundedness of B is verified in a manner very similar to the argument presented in connection with Example 1:

$$\begin{aligned} \int _{\tilde{\varOmega }} v_h \psi _h \ dx&\le \Vert v_h \Vert _{V_h} \cdot \Vert \psi _h \Vert _{H^1_h(\tilde{\varOmega })}\\&\le \Vert v_h \Vert _{V_h} \cdot \Vert \psi _h \Vert _{U_h} \end{aligned}$$

because \(\tilde{\varOmega }\) is a subdomain of \(\varOmega \). We conclude that assumptions \(\mathcal {A}\mathbf {1}\), \(\mathcal {A}\mathbf {2}\) and \(\mathcal {A}\mathbf {3}\) are satisfied.

The new control domain \(\tilde{\varOmega }\) and the redefined operators B and T lead to some changes in the discretization of the optimality system (35), which must be solved repeatedly in Algorithm 3. These can be summarized as follows:

  • B, defined in (55), yields the mass matrix \(\tilde{M}\) associated with the subdomain \(\tilde{\varOmega }\).

  • \(-\varDelta \) yields the stiffness matrix \(\tilde{S}\) associated with the subdomain \(\tilde{\varOmega }\).

  • \(T'T = R_{U_h}^{-1}T^*T\) yields the “boundary” mass matrix \(M_\partial \).

  • The Riesz map \(R_{V_h}\) now yields the sum of the mass matrix \(\tilde{M}\) and stiffness matrix \(\tilde{S}\).

All other operators are discretized in the same fashion as in Example 1. Hence, the matrix “version” of the optimality system in Algorithm 3, associated with (49–51), takes the form

$$\begin{aligned}&\underbrace{\begin{bmatrix} (\tilde{M}+\tilde{S})^{-1}&0&0 \\ 0&(M+S)^{-1}&0 \\ 0&0&(M+S)^{-1}\end{bmatrix}}_{\bar{\mathcal {R}}^{-1}} \underbrace{\begin{bmatrix} \alpha \tilde{S}&0&\tilde{M}\\ 0&M_\partial&M+S \\ \tilde{M}&M+S&0 \end{bmatrix}}_{\bar{\mathcal {A}}_\alpha } \underbrace{\begin{bmatrix} \bar{v}^{k+1} \\ \bar{u}^{k+1} \\ \bar{w}^{k+1} \end{bmatrix}}_{\bar{q}^{k+1}} \nonumber \\&\quad = \begin{bmatrix} (\tilde{M}+\tilde{S})^{-1}&0&0 \\ 0&(M+S)^{-1}&0 \\ 0&0&(M+S)^{-1} \end{bmatrix} \underbrace{\begin{bmatrix} -\alpha \nabla \cdot \bar{p}^k + \alpha \nabla \cdot \bar{b}^k \\ M_\partial \bar{d} - M_\partial \bar{c}^k \\ 0 \end{bmatrix}}_{\bar{g}^k}. \end{aligned}$$
(56)

The preconditioner thus reads

$$\begin{aligned} \begin{bmatrix} (\tilde{M}+\tilde{S})^{-1}&0&0 \\ 0&(M+S)^{-1}&0 \\ 0&0&(M+S)^{-1} \end{bmatrix}. \end{aligned}$$
(57)

5.2.1 Results

The synthetic data \(d_h\) was produced in the same manner as in Example 1. We computed the synthetic data from the function \(v_h \in V_h\), where

$$\begin{aligned} v_h(x)={\left\{ \begin{array}{ll} 5 &{} \text {if}\,x_1<0.5,\\ -5&{}\text {if}\,x_1>0.5. \end{array}\right. } \end{aligned}$$
(58)
Fig. 3

The eigenvalues of \([\bar{\mathcal {R}}^k]^{-1}\bar{\mathcal {A}}_\alpha \) in Example 2. Here, \(\alpha = 0.0001\) and \(N = 32\). (\([\bar{\mathcal {R}}^k]^{-1}\) denotes the exact inverse of the preconditioner - not its AMG approximation)

Table 2 The average number of MINRES iterations required to solve the KKT systems arising in the first ten steps of the split Bregman algorithm in Example 2
Fig. 4

The solution of the problem (49–51). Here, \(\epsilon = 10^{-6}\), \(\alpha = 10^{-6}\) and \(N = 128\), i.e. \(h = 1/128\). a Approximate inverse solution generated with 10 split Bregman iterations, i.e. \(v_h^{10}\). b Approximate inverse solution generated with 70 split Bregman iterations, i.e. \(v_h^{70}\)

Note that the forward operator \(K = -TA^{-1}B\) does not guarantee a unique solution of (49–51), since the trace operator is not injective, see [4]. Nevertheless, the forward operator K does not annihilate constants, and from Theorem 2 it then follows that the MINRES algorithm should handle the KKT systems, arising in each Bregman iteration, very well.

Figure 3 shows the spectrum of \([\bar{\mathcal {R}}^k]^{-1}\bar{\mathcal {A}}_\alpha \) for this example. This eigenvalue distribution is clearly of the form (34). Hence, in accordance with Theorem 2, we obtain such a spectrum even though \(K = -TA^{-1}B\) is not injective (and \(\kappa =0\) in these computations).

Table 2 displays the iteration counts for Example 2. We see that the MINRES algorithm handles the growth in iteration numbers well as \(\alpha \) decreases. For example, for \(N = 256\) and \(\epsilon = 10^{-6}\), the growth can be modeled by the formula

$$\begin{aligned} 40.8 - 16.2\log _{10}(\alpha ). \end{aligned}$$

Similarly, for \(N = 256\) and \(\epsilon = 10^{-10}\), the least squares method gives us the formula

$$\begin{aligned} 58.2 - 35.6\log _{10}(\alpha ), \end{aligned}$$

as the best logarithmic fit of iteration growth.

The approximate solutions, seen in Fig. 4, are close to the “input solution” (58). We thus get good approximations even though we cannot guarantee a unique solution (\(\kappa =0\), see [4]).

6 Conclusions

We have studied PDE-constrained optimization problems subject to TV-regularization. The main purpose of this text was to adapt the split Bregman algorithm, frequently used in image analysis, to this class of problems.

In each iteration of the split Bregman scheme, a large KKT system

$$\begin{aligned} {\mathcal {A}}_\alpha q = g \end{aligned}$$
(59)

must be solved. Here, \(0< \alpha \ll 1\) is a regularization parameter, and the spectral condition number of \({\mathcal {A}}_\alpha \) tends to \(\infty \) as \(\alpha \rightarrow 0\). We investigated the performance of the MINRES algorithm applied to these indefinite systems. In particular, we analyzed the spectrum of \({\mathcal {A}}_\alpha \), and our main result shows that this spectrum is almost contained in three bounded intervals, with a small number of isolated eigenvalues. More specifically, we found that

$$\begin{aligned} \text {sp}(\mathcal {A}_\alpha ) \subset [-b,-a] \cup [c\alpha ,2\alpha ] \cup \{ \tau _1, \tau _2, \ldots , \tau _{N(\alpha )}\} \cup [a,b], \end{aligned}$$
(60)

where \(N(\alpha ) = O(\ln (\alpha ^{-1}))\). Krylov subspace solvers therefore handle (59) very well: the number of iterations required by the MINRES method cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\) as \(\alpha \rightarrow 0\), and in practice one will often encounter growth rates of order \(O(\ln (\alpha ^{-1}))\).

Our theoretical findings were illuminated by numerical experiments. In both examples we observed approximately logarithmic growth in iteration numbers as \(\alpha \rightarrow 0\). This is in accordance with our theoretical results.