Abstract
We derive an efficient solution method for ill-posed PDE-constrained optimization problems with total variation regularization. This regularization technique allows discontinuous solutions, which is desirable in many applications. Our approach is to adapt the split Bregman technique to handle such PDE-constrained optimization problems. This leads to an iterative scheme where we must solve a linear saddle point problem in each iteration. We prove that the spectra of the corresponding saddle point operators are almost contained in three bounded intervals, not containing zero, with a very limited number of isolated eigenvalues. Krylov subspace methods handle such operators very well and thus provide an efficient algorithm. In fact, we can guarantee that the number of iterations needed cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\) as \(\alpha \rightarrow 0\), where \(\alpha \) is a small regularization parameter. Moreover, in our numerical experiments we demonstrate that one can expect iteration numbers of order \(O(\ln (\alpha ^{-1}))\).
1 Introduction
We will use the following notation:
-
\(\varOmega \subset \mathbb {R}^n\) is a bounded domain,
-
\(d_h\) is the observation data,
-
\(\rho > 1\) and \(\kappa \ge 0\) are parameters,
-
A, B and T are linear operators,
-
\(P^n_h\) represents an n-th order scalar FEM space,
-
\(V_h = H^1_h(\varOmega )\), i.e. \(P^1_h\) equipped with the \(H^1\)-norm, is the control space,
-
U and Z are Hilbert spaces,
-
\(U_h = U \cap P^n_h\) is the state space,
-
\(Z_h = Z \cap P^n_h\) is the observation space,
and we note that \(1 \le \text {dim}(V_h), \, \text {dim}(U_h), \, \text {dim}(Z_h) < \infty \). The objective of this paper is to propose and analyze an efficient algorithm for solving PDE-constrained optimization problems which can be written in the form
subject to
Here, (2) is a discretized PDE, and A, B and T will be discussed properly in Section 2. From the results published in [4], it follows that (1–2) has a unique solution if \(\kappa >0\), or if
is injective.
In order to design a fast method for solving (1–2), we must not only use an efficient algorithm for handling the nonlinear total variation (TV) term in (1), but also a scheme suitable for solving the inner systems that arise in each iteration of the outer algorithm. This will be achieved by combining the approach analyzed in [11], which is suitable for linear optimality systems, with a successful method for solving deblurring problems, namely the split Bregman method [5].
The main result of this paper shows that the spectrum of the linear systems, arising in each Bregman iteration, has a nice structure. Therefore, the Minimal Residual (MINRES) method handles these subproblems very well: Theoretically, one can prove that only \(O([\ln (\alpha ^{-1})]^2)\) MINRES iterations are required, where \(\alpha \) is a small regularization parameter. (In [11] it is also explained why iteration numbers of order \(O(\ln (\alpha ^{-1}))\) often occur in practice.)
Frequently, Tikhonov regularization is employed to stabilize PDE-constrained optimization problems. This approach yields a linear-quadratic optimization problem, which is mathematically appealing, but will produce smooth solutions. In many inverse problems, the control parameter is some physical property, e.g. a heat source, density of a medium or an electrical potential. When we try to identify such quantities, it might be desirable to make a sharp separation between regions with different qualities of the physical property. In other words, we want “jumps” in the solution. Thus, one might argue that the smooth solutions produced with Tikhonov regularization are of limited value, and that TV-regularization should be employed. The inverse problem of electrocardiography is a problem of this type, where one seeks to locate the ischemic region of the heart. This can be achieved with a PDE-constrained optimization problem, where the control is the electrical potential of the myocardium, and the data is given in terms of ECG recordings. The ischemic area can be determined from the fact that the electrical potential is (approximately) piecewise constant, with different values in the ischemic and healthy regions. From an imaging point of view, it would be beneficial to properly separate these areas [12, 18].
Note that we limit our study to discretized problems posed in terms of finite dimensional spaces. This simplifies the discussion of the TV-regularization and enables the use of the results published in [11].
This paper is organized as follows:
-
In Sect. 2 we show how the PDE-constrained optimization problem (1–2) can be modified in such a way that we can apply the split Bregman algorithm.
-
In Sect. 3 we prove that the KKT systems that arise in each iteration of the split Bregman algorithm have a spectrum almost contained in three bounded intervals, with a very limited number of isolated eigenvalues. Hence, Krylov subspace algorithms will handle these systems very well.
-
Sect. 4 presents an alternative version of the split Bregman algorithm.
-
In Sect. 5 we illuminate the theoretical results with some numerical experiments.
-
Finally, the conclusions are presented in Sect. 6.
2 Split Bregman algorithm for PDE-constrained optimization problems
The split Bregman method has its roots in the Bregman iteration, which is an algorithm for computing extrema of convex functionals [2]. Later, it was used in [13] as a new regularization procedure for inverse problems, and in [5] the authors used this approach to find an effective solution method for \(L^1\)-regularization problems. In particular, they demonstrated why this method works well for total variation problems. Motivated by these papers, we write (1–2) in the form
subject to
where \(\mathbf {P}_h^0\) is a vector space of piecewise constant functions and K is defined in (3).
We will not go into details on how the split Bregman algorithm is derived; instead, we refer to [5] and [3]. In this method, see Algorithm 1, we note that the parameter \(\lambda \) is fixed. Instead, it is the “data” that varies with the introduction of \(b^k\), where \(b^k\) can be interpreted as the Lagrange multiplier estimate associated with (5). The authors of [20] and [19] explain why the split Bregman scheme can be considered as an augmented Lagrangian method [6, 15].
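To make the structure of Algorithm 1 concrete, here is a minimal sketch of the split Bregman iteration applied to a toy 1D TV-denoising problem. This is our own illustrative example: the forward-difference operator, the parameters, and the dense direct solve are stand-ins, not the paper's PDE-constrained setting.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding (shrinkage), applied componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_bregman_tv_1d(f, mu=10.0, lam=2.0, iters=300):
    """Split Bregman for min_u (mu/2)*||u - f||^2 + ||D u||_1 on a 1D grid,
    where D is the forward-difference operator.  Mirrors Algorithm 1:
    Step 3 is a quadratic solve, Step 4 a shrinkage, Step 5 a Bregman update."""
    n = f.size
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)     # forward differences
    A = mu * np.eye(n) + lam * D.T @ D               # SPD system for Step 3
    u, p, b = f.copy(), np.zeros(n - 1), np.zeros(n - 1)
    for _ in range(iters):
        u = np.linalg.solve(A, mu * f + lam * D.T @ (p - b))  # Step 3
        p = shrink(D @ u + b, 1.0 / lam)                      # Step 4
        b = b + D @ u - p                                     # Step 5
    return u

# Noisy piecewise-constant signal; TV regularization preserves the jump.
rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(50), np.ones(50)])
noisy = truth + 0.05 * rng.standard_normal(100)
u = split_bregman_tv_1d(noisy)
```

The three updates mirror Steps 3–5 of Algorithm 1: a quadratic solve with fixed \(\lambda \), a cheap shrinkage, and the update of the Bregman variable \(b^k\).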
At this point, we would like to present one important theorem from [3]:
Theorem 1
Assume that there exists at least one solution \(v_h^*\) of (4–5). Then the split Bregman algorithm satisfies
If the solution \(v_h^*\) is unique, we also have
Recall that our main objective is to derive an efficient solution method for (1–2), i.e. for rather general PDE-constrained optimization problems subject to TV-regularization. We will restrict our analysis to problems that satisfy the assumptions
-
\(\mathcal {A}\mathbf {1:}\,A: U_h\rightarrow U_h'\) is bounded and linear.
-
\(\mathcal {A}\mathbf {2:}\,A^{-1}\) exists and is bounded.
-
\(\mathcal {A}\mathbf {3:}\,B:V_h\rightarrow U_h'\) is bounded and linear.
-
\(\mathcal {A}\mathbf {4:}\,T:U_h\rightarrow Z_h\) is bounded and linear.
Here, bounded should be interpreted as having operator norm which is bounded independently of the mesh parameter h. Due to assumption \(\mathcal {A}\mathbf {2}\), we can write (2) in the form
and the operator K, see (3), is well-defined.
We want to apply the split Bregman algorithm to (4–5). Unfortunately, the explicit computation of the operator K is difficult in practical applications; if (2) is a PDE, then the inverse of A is typically too expensive to compute explicitly. This issue has been handled, in the case of Tikhonov regularization, by solving the associated KKT system. The purpose of this paper is to adapt the KKT approach to the framework of the split Bregman algorithm. As we will see below, this yields an efficient and practical solution method for PDE-constrained optimization problems subject to TV-regularization.
We proceed by applying Algorithm 1 to the minimization problem (4–5). Step 5 in Algorithm 1 is straightforward. Furthermore, since \(\nabla v_h^{k+1}, p_h^k, b_h^k \in \mathbf {P}^0_h\), Step 4 is very cheap to solve by the shrinkage operator
where
see [5]. Hence, the challenge is to find the minimizer of Step 3. That is, we must solve the minimization problem
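Componentwise, the shrinkage operator (7) is the familiar soft-thresholding map. A minimal sketch follows; the isotropic vector form for gradient fields is our assumption based on the variant used in [5], not spelled out above.

```python
import numpy as np

def shrink(x, t):
    """Scalar/componentwise shrinkage: (x/|x|) * max(|x| - t, 0), with the
    convention shrink(0, t) = 0.  Closed-form minimizer used in Step 4."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def shrink_vec(z, t):
    """Isotropic (vector) shrinkage for gradient fields: each vector z_i is
    shortened by t in Euclidean norm, or mapped to zero."""
    nrm = np.linalg.norm(z, axis=-1, keepdims=True)
    return np.maximum(nrm - t, 0.0) / np.maximum(nrm, 1e-15) * z

# shrink applied to [3, -0.5, 1.5] with t = 1 gives [2, 0, 0.5]
```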
where \(d_h, p_h^k\) and \(b_h^k\) are given quantities. By combining this minimization problem with Eqs. (6) and (3), we get the equivalent constrained minimization problem:
subject to
For the sake of simplicity, we want our optimality system to be as similar as possible to the optimality system analyzed in [11]. Thus, we need to scale the cost-functional in (8) such that we get
subject to (9), where
Next, we can introduce the Lagrangian associated with (9–10):
The first-order optimality conditions can be found by computing the derivatives of the Lagrangian with respect to \(v_h\), \(u_h\) and \(w_h\). These conditions can be expressed by the KKT system
where “\('\)” is used to denote dual operators, and \(E:V_h\rightarrow V_h'\) is defined by
We have thus derived a new system of equations to be solved in Step 3 in Algorithm 1, which does not require the explicit inverse of A. Also note the form of the operator \(-\varDelta : V_h \rightarrow V_h'\) in the top left corner of the KKT system (12). In an infinite dimensional setting, this operator must be replaced with the more involved operator \(D'D: BV(\varOmega ) \rightarrow BV(\varOmega )'\), see [4] for a thorough discussion of this operator. The operator \(D'D\) is much more challenging to analyze, but it coincides with the operator \(-\varDelta \) in a finite dimensional setting, which follows from the fact that \(Dv = \nabla v\) for all elements in \(W^{1,1}(\varOmega )\), see [1]. This concludes the discussion of Step 3 in Algorithm 1.
We might now formulate the full algorithm for solving the PDE-constrained optimization problem (1–2), see Algorithm 2.
The efficiency of the split Bregman algorithm has been demonstrated earlier, see e.g. [3, 5]. Of the three inner steps of the for-loop in Algorithm 2, the update of \(b_h^k\) is obviously cheap, and the update of \(p_h^k\) is accomplished by the simple shrinkage operator (7). What remains, however, is to analyze the spectrum of the KKT system (12), see Step 3 in Algorithm 2: The efficiency of the algorithm is highly dependent on how fast we can solve these KKT systems with, e.g., Krylov subspace solvers.
3 Spectrum of the KKT system
In its current form, the operator \(\widehat{\mathcal {A}}_\alpha \) in (12) is a mapping from the product space \(V_h \times U_h \times U_h\) onto the dual space \(V_h' \times U_h' \times U_h'\). Since this operator maps to the dual space, and not to the space itself, it is not possible to use the MINRES method directly. A remedy exists, however, in the form of Riesz maps. In this case, we must introduce the two Riesz maps
This enables us to use the MINRES algorithm, since the KKT system (12) can be written as
where
The operator \(\mathcal {R}^{-1}\) can be considered to be a preconditioner. See [10, 11] for a more thorough analysis.
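Operationally, using \(\mathcal {R}^{-1}\) as a preconditioner means supplying the action of the inverse Riesz map to MINRES. A minimal SciPy sketch on a toy symmetric saddle-point system follows; the blocks below are illustrative stand-ins for \(\widehat{\mathcal {A}}_\alpha \) and \(\mathcal {R}\), not the paper's operators.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy symmetric saddle-point system (stand-in for (13)): H plays the role of
# an SPD (1,1)-type block, Bc an invertible constraint block.
n = 20
off = -1.0 * np.ones(n - 1)
H = sp.diags([off, 3.0 * np.ones(n), off], [-1, 0, 1])   # SPD, tridiagonal
Bc = sp.eye(n) + sp.diags(0.5 * np.ones(n - 1), 1)       # nonsingular
K = sp.bmat([[H, Bc.T], [Bc, None]]).tocsc()             # symmetric, indefinite
g = np.ones(2 * n)

# SPD "Riesz map" R; its (factorized) inverse is passed to MINRES as the
# preconditioner, mimicking R^{-1} applied to the KKT operator.
R = sp.block_diag([H, sp.eye(n)]).tocsc()
Rinv = spla.splu(R)
M = spla.LinearOperator((2 * n, 2 * n), matvec=Rinv.solve)

x, info = spla.minres(K, g, M=M, maxiter=500)
```

Note that MINRES requires the preconditioner `M` to be symmetric positive definite, which is exactly what the Riesz map provides.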
We performed an experimental investigation that suggested the use of small values of \(\alpha \) to obtain good convergence results for the outer split Bregman algorithm. That is, \(\lambda /\rho \) should be small, see (11). According to standard theory for Krylov subspace methods, the number of iterations needed by the MINRES algorithm is of the same order as the spectral condition number of the involved operator. In our case, this corresponds to iteration numbers of order \(O(\alpha ^{-1})\), when \(\gamma = 0\). We will now show that this estimate is very pessimistic.
Since the case \(\gamma = 0\) is the most challenging, and also the most interesting, we will for the rest of the analysis assume that this is the case, i.e. \(\gamma = 0\). Let us first simplify the notation in (13), and write the operator \(\mathcal {R}^{-1}\widehat{\mathcal {A}}_\alpha \) in the form
where we have the following definitions:
-
\(Q = -R_{V_h}^{-1}\varDelta : V_h \rightarrow V_h\),
-
\(\tilde{B} = R_{U_h}^{-1}B : V_h \rightarrow U_h\),
-
\(\tilde{A} = R_{U_h}^{-1}A : U_h \rightarrow U_h\),
-
\(T^*T = R_{U_h}^{-1}T'T: U_h \rightarrow U_h\).
In this new form, the operator \(\mathcal {A}_\alpha \) in (14) is very similar to the operator analyzed in [11]. In fact, they analyzed the operator \(\mathcal {B}_\alpha : V_h \times U_h \times U_h \rightarrow V_h \times U_h \times U_h\), defined as
The main result in [11] is that the spectrum of \(\mathcal {B}_\alpha \) is of the form
where
and the constants a, b and \(c > 0\) are independent of the parameter \(\alpha \). The analysis in [11] is roughly performed as follows:
-
The negative eigenvalues are shown to be bounded away from zero, regardless of the size of the regularization parameter \(\alpha \ge 0\). That is, it even holds for \(\alpha = 0\). Hence, the negative eigenvalues of \(\mathcal {A}_\alpha \), defined in (14), are bounded away from zero: The argument in [11] can be adapted to the present situation in a straightforward manner.
-
For the positive eigenvalues, the Courant-Fischer-Weyl min-max principle is used to show that the difference between the eigenvalues of \(\mathcal {B}_0\) and \(\mathcal {B}_\alpha \) is “small”, where \(\mathcal {B}_0\) denotes the operator \(\mathcal {B}_\alpha \) with zero regularization \(\alpha =0\). More specifically, they prove that the difference between the eigenvalues of \(\mathcal {B}_0\) and \(\mathcal {B}_\alpha \), properly sorted, is less than the size of the regularization parameter \(0 < \alpha \ll 1\). It is easy to verify that a similar property will hold for \(\mathcal {A}_0\) and \(\mathcal {A}_\alpha \). More specifically, the difference between the eigenvalues of \(\mathcal {A}_0\) and \(\mathcal {A}_\alpha \) is less than \(\tilde{c}\alpha \), where \(\tilde{c} = \Vert Q\Vert \).
-
Finally, the analysis in [11] requires that
$$\begin{aligned} \alpha (v_h,v_h)_{V_h} + (T^*Tu_h,u_h)_{U_h} \end{aligned}$$
must be coercive whenever
$$\begin{aligned} \tilde{A}u_h + \tilde{B}v_h = 0. \end{aligned}$$
It is proven in [11] that this property holds for the operator \(\mathcal {B}_\alpha \). For the operator \(\mathcal {A}_\alpha \), defined in (14), this analysis is more involved, and it will therefore be explored in detail here. More specifically, we must show, provided that suitable assumptions hold, that
$$\begin{aligned} \alpha (Qv_h,v_h)_{V_h} + (T^*Tu_h,u_h)_{U_h} \end{aligned}$$
is coercive for all \((v_h,u_h)\) satisfying
$$\begin{aligned} \tilde{A}u_h+\tilde{B}v_h = 0. \end{aligned}$$
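The perturbation step above (the sorted eigenvalues of the regularized and unregularized operators differ by at most \(\tilde{c}\alpha \)) is an instance of Weyl's inequality for symmetric operators. A quick numerical illustration on random symmetric matrices (not the paper's operators; here the eigenvalues are sorted in ascending order, whereas the paper sorts by decreasing absolute value):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = rng.standard_normal((n, n))
A0 = (X + X.T) / 2.0                  # symmetric "unregularized" matrix
Y = rng.standard_normal((n, n))
Q = Y @ Y.T / n                       # symmetric positive semidefinite
alpha = 1e-3
A_alpha = A0 + alpha * Q              # regularized perturbation

# Weyl's inequality: sorted eigenvalues differ by at most alpha * ||Q||_2.
t0 = np.sort(np.linalg.eigvalsh(A0))
ta = np.sort(np.linalg.eigvalsh(A_alpha))
bound = alpha * np.linalg.norm(Q, 2)
```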
To further investigate the coercivity problem associated with (14), we introduce the notation
Since we work with finite dimensional spaces, we employ the control space \(V_h\) with the norm
i.e. \(V_h=H^1_h(\varOmega ) \subset H^1(\varOmega )\). Note that, for the analysis presented below, we must assume that the operator B satisfies assumption \(\mathcal {A}\mathbf {3}\) with the norm (18), i.e. that
is bounded, which along with assumptions \(\mathcal {A}\mathbf {2}\) and \(\mathcal {A}\mathbf {4}\) imply that
is bounded (in the sense that the operator norm is bounded independently of h). We must also assume that the discrete solutions converge toward the correct solution as \(h \rightarrow 0\):
where \(\hat{K}:H^1(\varOmega ) \rightarrow Z\) denotes the associated mapping between the infinite dimensional spaces.
We are now ready to formulate the result concerning the coercivity issue for the operator \(\mathcal {A}_\alpha \) defined in (14).
Lemma 1
Let \(M_\alpha \) and N be defined as in (16) and (17), respectively. Assume that (19) holds and that \(\hat{K}\) does not annihilate constants, i.e. the constant function \(k \notin \mathcal {N}(\hat{K})\). Then there exists \(\bar{h} > 0\) such that the operator \(M_\alpha \) is coercive on the kernel of N, i.e. for \(\alpha \in (0,1)\):
for all \(h \in (0,\bar{h})\) and for all \(x_h=(v_h,u_h) \in X_h\) satisfying
The constant c is independent of \(h \in (0,\bar{h})\) and \(\alpha > 0\).
Proof
We will first show that, if \(\hat{K}\) does not annihilate constants, then there exist constants \(\bar{h} > 0\) and \(c \in (0,1)\), with c independent of \(h \in (0, \bar{h})\), such that
Thereafter, we will use this result to prove (20–21).
Assume that there do not exist \(\bar{h} > 0\) and \(c \in (0,1)\) such that (22) holds. We will show that this implies that the constant function k must belong to the null-space of \(\hat{K}\). If (22) is not satisfied, then it follows that there exists a sequence
and a sequence of functions
such that
We may choose a sequence with the property \(\Vert v_{h_i} \Vert ^2_{L^2(\varOmega )}=1\) because the operator \(K_h\) is linear. Since \((1/i - 1) \rightarrow -1\) as \(i \rightarrow \infty \), we can conclude that
We will now show that \(\{v_{h_i}\}\) has a limit in \(H^1(\varOmega )\). Let
It is well known that \(H^1(\varOmega ) = S \oplus \mathbb {R}\), i.e. every function in \(H^1(\varOmega )\) can be (uniquely) expressed as a sum of a function in S and a constant. Hence,
From this splitting, we obtain
see (23).
This enables us to use the Poincaré inequality to conclude that
i.e.
Furthermore, recall that \(\Vert v_{h_i}\Vert _{L^2(\varOmega )}^2 = 1\) and that \(\int _\varOmega s_{h_i} \, dx = 0\). Thus, it follows that
which yields
By using (26) we get
We claim that also the sequence \(\{ v_{h_i} \}\) converges toward \(r^*\) in \(H^1(\varOmega )\):
This follows from the fact that \(s_{h_i} = v_{h_i} - r_i\) and (25–26):
Here, we have used that \(\{ r_i \}\) is a sequence of constants and that \(r^*\) is a constant, which implies that \(\Vert r_i-r^*\Vert _{H^1(\varOmega )} = \Vert r_i-r^*\Vert _{L^2(\varOmega )}\) and that
provided that \(\varOmega \) has finite measure.
Since \(\{ v_{h_i} \}\) converges toward \(r^*\) in \(H^1(\varOmega )\), we may employ assumption (19) to find that
By combining these observations with (24), we conclude that
To summarize, if (22) does not hold, then \(\hat{K}\) annihilates constants. Conversely, if \(\hat{K}\) does not annihilate constants, (22) must hold.
We are now ready to show that (20–21) does indeed hold. Note that (22) can be written in the form: There exists \(c \in (0,1)\) such that
Assume that \(x=(v_h,u_h) \in X_h\) satisfies the state equation, i.e.
Then,
and since \(\tilde{A}^{-1}\tilde{B}\) is assumed to be bounded independently of h,
In addition,
Therefore, see (16), for \(\alpha \in (0,1)\) and \(h \in (0,\bar{h})\),
where we have used (30). Next, by invoking (27) and (29) we can conclude that
That is, \(M_\alpha \) is coercive on the kernel of N, cf. (17). \(\square \)
We will now use this lemma to establish the main result of this section. First, however, we note that:
Remark 1
From \(\mathcal {A}\mathbf {1}\)–\(\mathcal {A}\mathbf {3} \ \) it follows that the inf-sup condition holds, i.e.
Remark 2
Since we consider finite dimensional problems, there always exist positive constants \(\tilde{c}, \, \tilde{C}\) such that
where \(\tau _i(\mathcal {A}_{0})\) denotes the ith eigenvalue of \(\mathcal {A}_{0}\) sorted in decreasing order according to their absolute value. Here, \(\mathcal {A}_{0}\) is \(\mathcal {A}_{\alpha }\) with \(\alpha =0\), i.e. without regularization. (\(\mathcal {A}_{\alpha }\) is defined in (14)).
We consider ill-posed PDE-constrained optimization tasks. For such problems, \(\tilde{c} e^{-\tilde{C} n}\) will typically be extremely small, i.e. much smaller than practical choices of the size of the regularization parameter.
Let us state the theorem:
Theorem 2
Assume that all assumptions of Lemma 1 hold and that \(h \in (0,\bar{h})\). Then there exist constants \(a, \, b, \, c > 0\) such that, for \(\alpha \in (0,1)\), the spectrum of \(\mathcal {A}_\alpha \), defined in (14), satisfies
where
Here \(\tilde{c}, \, \tilde{C}\) are the constants in (33).
Proof
Since we consider finite dimensional problems, the theorem follows from Lemma 1 and the analysis presented in [11]. \(\square \)
Since the spectrum of \(\mathcal {A}_\alpha \) is of the form (34), we can conclude that the MINRES method will handle the KKT systems (13) very well. More precisely, the number of iterations needed by the MINRES scheme to solve (13) cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\) as \(\alpha \rightarrow 0\), see [11]. In fact, in practice, iteration counts of order \(O(\ln (\alpha ^{-1}))\) will in many situations occur, which is also explained in [11].
Note that, while the optimality system (1–2) requires that either \(K = -TA^{-1}B: V_h \rightarrow Z_h\) is injective or that \(\gamma > 0\) to obtain a unique solution, see [4], the inner MINRES algorithm only requires that the constant k does not belong to the null-space of \(\hat{K}\).
4 Constrained split Bregman algorithm
The split Bregman algorithm we have analyzed is in [5] referred to as the unconstrained split Bregman method. For some applications, the related constrained split Bregman algorithm, also introduced in [5], produces better convergence rates. In order to discuss the latter method, we observe that the problem (4–5) can be formulated in the related constrained form
subject to
Here, \(\varOmega _{\text {observe}}\) is the domain on which the observation data \(d_h\) is defined. The constraints are “implicit” in the sense that they are not necessarily satisfied in each step of the split Bregman algorithm, see [5]. Instead, the scheme generates approximations which converge toward functions satisfying these constraints, and a natural stopping criterion is thus
Details about the constrained split Bregman algorithm associated with this problem can be found in [3].
It turns out that this constrained approach can also be applied to a PDE-constrained optimization problem, and an experimental investigation showed better convergence results with this latter approach. We will therefore present the constrained split Bregman algorithm for discretized PDE-constrained optimization problems of the form
subject to
Note that the first constraint here is “explicit”, i.e. it must be satisfied in each step of the algorithm. The latter two constraints are “implicit”.
Recall the KKT system (12) that we derived in connection with Algorithm 2. For the constrained split Bregman method, we get the very similar optimality system
where “\('\)” is used to denote dual operators, and \(E:V_h\rightarrow V_h'\) is defined by
Compared with (12), only the term \(- T'c_h^k\) has been added to the second row of the right hand side of (35). The operator \(\widehat{\mathcal {A}}_{\alpha }\) on the left hand side is unchanged, and our analysis of the MINRES method, presented above, also applies to this KKT system. The associated algorithm is, of course, similar to Algorithm 2, see Algorithm 3.
We observe that Algorithm 3 only requires one more simple update compared with Algorithm 2: The update for \(c_h^{k+1}\). This extra computational effort is negligible, and since we obtain better convergence results, we will present numerical experiments with the use of Algorithm 3 only.
5 Numerical experiments
5.1 Example 1
Let \(\varOmega = (0,1) \times (0,1)\). We consider the standard example in PDE-constrained optimization, but with TV-regularization instead of Tikhonov regularization. That is,
subject to
where the control space \(V_h\), the state space \(U_h\) and the observation space \(Z_h\) are
respectively. Furthermore, the operator T is the embedding \(T: H^1(\varOmega ) \hookrightarrow L^2(\varOmega )\). Hence, assumption \(\mathcal {A}\mathbf {4}\) is satisfied.
Recall that our objective is to solve this system with Algorithm 3. The main challenge is the efficient solution of the KKT systems (35). To derive this optimality system, we need the weak formulation of the boundary value problem (37–38).
5.1.1 Computational details
The weak formulation reads: Find \(u_h \in U_h\) such that
where
From standard PDE theory, we find that A and \(A^{-1}\) have operator norms which are bounded independently of h. The boundedness of
where one employs the \(H^1\)-topology (18) on \(V_h\), follows from the inequalities
We conclude that assumptions \(\mathcal {A}\mathbf {1}\), \(\mathcal {A}\mathbf {2}\) and \(\mathcal {A}\mathbf {3}\) are satisfied.
The KKT system to be solved in Algorithm 3 now takes the form
Recall that \(\alpha =\lambda /\rho \), where \(\rho \) is the regularization parameter in (36) and \(\lambda \) is the parameter employed in the Bregman scheme, see the discussion of (8–11).
The discretization of the operator \(\mathcal {R}\) in (44) is rather straightforward. Recall that the finite dimensional space \(V_h\) was equipped with the norm \(\Vert \cdot \Vert _{H_h^1(\varOmega )}\). Furthermore, since \(U = H^1(\varOmega )\) in this particular example, it follows that the discretization of both of the Riesz maps \(R_{V_h}\) and \(R_{U_h}\) yields the sum of the mass matrix M and stiffness matrix S.
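For concreteness, here is a 1D analogue of this assembly: P1 mass and stiffness matrices on a uniform mesh, whose sum \(M + S\) plays the role of the discrete \(H^1\) Riesz map. This is our own sketch; the paper's setting is the 2D unit square.

```python
import numpy as np

def p1_mass_stiffness_1d(n, length=1.0):
    """Mass matrix M and stiffness matrix S for P1 elements on a uniform
    mesh of [0, length] with n elements (n + 1 nodes)."""
    h = length / n
    M = np.zeros((n + 1, n + 1))
    S = np.zeros((n + 1, n + 1))
    Mloc = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])    # element mass
    Sloc = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element stiffness
    for e in range(n):
        idx = np.ix_([e, e + 1], [e, e + 1])
        M[idx] += Mloc
        S[idx] += Sloc
    return M, S

M, S = p1_mass_stiffness_1d(8)
R = M + S   # discrete H^1 Riesz map: mass plus stiffness
```

Note that S annihilates constants (row sums are zero), while M alone integrates them, so the sum \(M + S\) is symmetric positive definite, as a Riesz map must be.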
For the operator \({\widehat{\mathcal {A}}_\alpha }\) in (44), the discretization is more challenging, but a general recipe can be found in [10]. The end result can be summarized as follows:
-
A, defined in (42), yields the matrix \(M + S\), which is the sum of the mass and stiffness matrices associated with the domain \(\varOmega \).
-
B, defined in (43), yields the mass matrix M.
-
\(-\varDelta \) yields the stiffness matrix S.
-
\(T'T = R_{U_h}^{-1}T^*T\) yields the mass matrix M.
-
The functions \(v_h\), \(u_h\), \(w_h\), \(p_h^k\), \(b_h^k\), \(c_h^k\) and \(d_h\) yield the corresponding vectors \(\bar{v}\), \(\bar{u}\), \(\bar{w}\), \(\bar{p}^k\), \(\bar{b}^k\), \(\bar{c}^k\) and \(\bar{d}\), respectively.
Hence, the matrix “version” of (44) is
The preconditioner thus reads
and involves the inverse of the matrix \(M + S\). This inverse is computed approximately by using algebraic multigrid (AMG). We discuss this in some more detail in the numerical setup.
5.1.2 Numerical setup
-
We wrote the code using cbc.block, a FEniCS-based Python library for block operators. See [9] for details.
-
The PyTrilinos package was used to compute an approximation of the preconditioner (46). We approximated the inverse using AMG with a symmetric Gauss-Seidel smoother and three smoothing sweeps. All tables containing iteration counts for the MINRES method were generated with this approximate inverse Riesz map. The eigenvalues of the KKT systems \([\bar{\mathcal {R}}]^{-1}\bar{\mathcal {A}}_\alpha \), see (45), were, on the other hand, computed with an exact inverse \([\bar{\mathcal {R}}]^{-1}\) obtained in Octave.
-
To discretize the domain, we divided \(\varOmega = (0,1) \times (0,1)\) into \(N \times N\) squares, and each of these squares was divided into two triangles.
-
The MINRES iteration process was stopped as soon as
$$\begin{aligned} \frac{\Vert r^k_n\Vert }{\Vert r^k_0\Vert } = \left[ \frac{\left( \ \bar{\mathcal {A}}_\alpha \bar{q}_n^k - \bar{g}^k,[\bar{\mathcal {R}}]^{-1} \left[ \bar{\mathcal {A}}_\alpha \bar{q}_n^k - \bar{g}^k\right] \ \right) }{\left( \ \bar{\mathcal {A}}_\alpha \bar{q}_0^k - \bar{g}^k,[\bar{\mathcal {R}}]^{-1} \left[ \bar{\mathcal {A}}_\alpha \bar{q}_0^k - \bar{g}^k\right] \ \right) } \right] ^{1/2} < \epsilon . \end{aligned}$$
(47)
Here, \(\epsilon \) is a small positive parameter. Note that the superscript k is the iteration index for the “outer” split Bregman method, while the subscript n is the iteration index for the “inner” MINRES algorithm (at each step of the split Bregman method).
-
No noise was added to the input data \(d_h\), see (36).
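The stopping criterion (47) can be sketched as a small helper; the names are hypothetical, and `Rinv_solve` stands for the (approximate) inverse Riesz map applied to a vector.

```python
import numpy as np

def relative_residual(A, Rinv_solve, q_n, q_0, g):
    """Relative MINRES residual measured in the R^{-1} inner product, as in
    (47): ||A q_n - g||_{R^{-1}} / ||A q_0 - g||_{R^{-1}}."""
    r_n = A @ q_n - g
    r_0 = A @ q_0 - g
    num = np.sqrt(r_n @ Rinv_solve(r_n))
    den = np.sqrt(r_0 @ Rinv_solve(r_0))
    return num / den
```

The iteration is stopped once this quantity drops below the tolerance \(\epsilon \).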
5.1.3 Results
We are now ready to solve the problem (36–38). The synthetic data \(d_h\) was produced by setting
and then we solved the boundary value problem (37–38) with (48) as input. The data \(d_h\) was thereafter set equal to the solution \(u_h\) throughout the entire domain \(\varOmega =(0,1) \times (0,1)\).
Theorem 2 states that the KKT system (13–14) arising in each iteration of the split Bregman iteration has a spectrum of the form (34). In Fig. 1, we see a spectrum of such a KKT system, and it is clearly of the form (34). Hence, we should expect the MINRES algorithm to solve the problem efficiently.
Table 1 illuminates the theoretically established convergence behavior of the MINRES algorithm. As previously mentioned, in [11] the authors proved that the number of iterations cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\), and showed why iteration growth of \(O(\ln (\alpha ^{-1}))\) often occurs in practice. For \(\epsilon = 10^{-6}\), see (47), and \(N = 256\), we get the following estimate for the iteration growth
where the coefficients are computed by the least squares method. The growth is very well modeled by this formula. Similarly, for \(\epsilon = 10^{-10}\) and \(N = 256\), we can model the growth by the formula
In Fig. 2, two approximate solutions of the optimization problem (36–38) are displayed: After 10 and 70 Bregman iterations. The “true” control/source is defined in (48).
Remark
As mentioned above, the problem (36–38), with Tikhonov regularization instead of TV-regularization, has been analyzed by many scientists. In fact, for Tikhonov regularization a number of numerical schemes that are completely robust with respect to the size of the regularization parameter have been developed [14, 16, 17]: Even logarithmic growth in iteration counts is avoided. As far as the authors know, it is not known whether these techniques can be adapted to the saddle point problem (44).
5.2 Example 2
We will now explore a more challenging problem. Let the domain \(\varOmega \) still be the unit square. Furthermore, define
The problem we want to study is
subject to
where the control space \(V_h\), the state space \(U_h\) and the observation space \(Z_h\) are
respectively. Furthermore, the operator \(T: H^1(\varOmega ) \rightarrow L^2(\partial \varOmega )\) is the trace operator. Hence, assumption \(\mathcal {A}\mathbf {4}\) is satisfied.
We observe two differences between examples 1 and 2. First, the control domain \(\tilde{\varOmega }\) is now a subdomain of the entire domain \(\varOmega \), bounded strictly away from the boundary \(\partial \varOmega \). Secondly, the observation domain is reduced from the entire domain \(\varOmega \) to the boundary \(\partial \varOmega \).
In this model problem, \(V_h\) does not coincide with the control space defined in bullet point 1 in Sect. 1. Nevertheless, the proof of Lemma 1 can be adapted to the present situation in a straightforward manner, and Theorem 2 therefore also holds for this example.
Since the discretization of (49–51) is very similar to the discretization of (36–38), we do not enter into all the details. Instead, we only focus on the differences.
The weak formulation of the state equations (50–51) reads: Find \(u_h \in U_h\) such that
where the operator A is still defined as in (42). The operator B, however, is no longer as in (43), but is here defined by
where we can employ the norm
on the control space \(V_h\). From standard PDE theory, we can guarantee that A and \(A^{-1}\) are bounded, and the boundedness of B is verified in a manner very similar to the argument presented in connection with Example 1:
because \(\tilde{\varOmega }\) is a subdomain of \(\varOmega \). We conclude that assumptions \(\mathcal {A}\mathbf {1}\), \(\mathcal {A}\mathbf {2}\) and \(\mathcal {A}\mathbf {3}\) are satisfied.
The new control domain \(\tilde{\varOmega }\) and the redefined operators B and T lead to some changes in the discretization of the optimality system (35), which must be solved repeatedly in Algorithm 3. These can be summarized as follows:
-
B, defined in (55), yields the mass matrix \(\tilde{M}\) associated with the subdomain \(\tilde{\varOmega }\).
-
\(-\varDelta \) yields the stiffness matrix \(\tilde{S}\) associated with the subdomain \(\tilde{\varOmega }\).
-
\(T'T = R_{U_h}^{-1}T^*T\) yields the “boundary” mass matrix \(M_\partial \).
-
The Riesz map \(R_{V_h}\) now yields the sum of the mass matrix \(\tilde{M}\) and the stiffness matrix \(\tilde{S}\).
All other operators are discretized in the same fashion as in Example 1. Hence, the matrix “version” of the optimality system in Algorithm 3, associated with (49–51), takes the form
The preconditioner thus reads
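To make the block structure concrete, the following SciPy sketch assembles a saddle point matrix and a block-diagonal preconditioner from 1D P1 stand-in matrices and solves the system with MINRES. The matrices, the block layout, and the preconditioner blocks are illustrative assumptions, not the paper's exact discretization:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, minres, splu

# 1D P1 stand-ins for the mass and stiffness matrices; the paper's
# matrices come from a FEM assembly over the subdomain and the boundary.
def p1_matrices(n):
    h = 1.0 / (n + 1)
    M = sp.diags([h / 6, 2 * h / 3, h / 6], [-1, 0, 1], shape=(n, n), format="csr")
    S = sp.diags([-1 / h, 2 / h, -1 / h], [-1, 0, 1], shape=(n, n), format="csr")
    return M, S

n, alpha = 40, 1e-2
M, S = p1_matrices(n)
A = M + S       # stand-in for the state operator from (42)
B = M           # stand-in for the operator B of (55)
Mb = M          # stand-in for the "boundary" mass matrix M_partial
R = M + S       # Riesz map on V_h: mass matrix plus stiffness matrix

# Symmetric saddle point matrix with a three-by-three block layout
# mimicking the optimality system (blocks and signs are a sketch).
K = sp.bmat([[alpha * R, None, B.T],
             [None, Mb, A.T],
             [B, A, None]], format="csr")

# Block-diagonal SPD preconditioner built from the same matrices.
P = sp.block_diag([alpha * R, Mb, R], format="csc")
P_factor = splu(P)
P_inv = LinearOperator(K.shape, matvec=P_factor.solve)

rhs = np.ones(K.shape[0])
x, info = minres(K, rhs, M=P_inv, maxiter=2000)
print("MINRES converged:", info == 0)
```

Note that MINRES requires a symmetric (possibly indefinite) system matrix and a symmetric positive definite preconditioner, which is exactly the situation the block-diagonal construction above is designed to produce.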
5.2.1 Results
The synthetic data \(d_h\) was produced in the same manner as in Example 1. We computed the synthetic data from the function \(v_h \in V_h\), where
Note that the forward operator \(K = -TA^{-1}B\) does not guarantee a unique solution of (49–51), since the trace operator is not injective, see [4]. Nevertheless, the forward operator K does not annihilate constants, and from Theorem 2 it then follows that the MINRES algorithm should handle the KKT systems, arising in each Bregman iteration, very well.
Figure 3 shows the spectrum of \([\bar{\mathcal {R}}^k]^{-1}\bar{\mathcal {A}}_\alpha \) for this example. This eigenvalue distribution is clearly of the form (34). Hence, in accordance with Theorem 2, we obtain such a spectrum even though \(K = -TA^{-1}B\) is not injective (and \(\kappa =0\) in these computations).
Table 2 displays the iteration counts for Example 2. We see that the iteration numbers grow only slowly as \(\alpha \) decreases, i.e., the MINRES algorithm handles the vanishing regularization well. For example, in the case \(N = 256\) and \(\epsilon = 10^{-6}\), the growth can be modeled by the formula
Similarly, for \(N = 256\) and \(\epsilon = 10^{-10}\), the least squares method gives us the formula
as the best logarithmic fit of iteration growth.
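Such logarithmic fits can be computed by ordinary linear least squares in \(\ln (\alpha ^{-1})\). The sketch below uses hypothetical iteration counts (illustrative stand-ins, not the actual entries of Table 2):

```python
import numpy as np

# Hypothetical iteration counts for decreasing alpha; these numbers are
# illustrative stand-ins, not the actual entries of Table 2.
alphas = np.array([1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6])
iters = np.array([14.0, 21.0, 29.0, 36.0, 44.0, 51.0])

# Fit iters ~ a + b * ln(1/alpha) by linear least squares, as in the
# logarithmic growth models quoted above.
X = np.column_stack([np.ones_like(alphas), np.log(1.0 / alphas)])
(a, b), *_ = np.linalg.lstsq(X, iters, rcond=None)
print(f"iterations ~ {a:.1f} + {b:.2f} * ln(1/alpha)")
```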
The approximate solutions, seen in Fig. 4, are close to the “input solution” (58). We thus get good approximations even though we cannot guarantee a unique solution (\(\kappa =0\), see [4]).
6 Conclusions
We have studied PDE-constrained optimization problems with TV regularization. The main purpose of this paper was to adapt the split Bregman algorithm, frequently used in image analysis, to this class of problems.
In each iteration of the split Bregman scheme, a large KKT system
must be solved. Here, \(0< \alpha \ll 1\) is a regularization parameter, and the spectral condition number of \({\mathcal {A}}_\alpha \) tends to \(\infty \) as \(\alpha \rightarrow 0\). We investigated the performance of the MINRES algorithm applied to these indefinite systems. In particular, we analyzed the spectrum of \({\mathcal {A}}_\alpha \), and our main result shows that this spectrum is almost contained in three bounded intervals, with a small number of isolated eigenvalues. More specifically, we found that
where \(N(\alpha ) = O(\ln (\alpha ^{-1}))\). Krylov subspace solvers therefore handle (59) very well: the number of iterations required by the MINRES method cannot grow faster than \(O([\ln (\alpha ^{-1})]^2)\) as \(\alpha \rightarrow 0\), and in practice one will often encounter growth rates of order \(O(\ln (\alpha ^{-1}))\).
Our theoretical findings were illustrated by numerical experiments. In both examples we observed approximately logarithmic growth in the iteration numbers as \(\alpha \rightarrow 0\), in accordance with our theoretical results.
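The effect of such a spectrum on MINRES can be illustrated with a toy experiment. The diagonal operators below have three fixed intervals of eigenvalues bounded away from zero plus \(N(\alpha ) \sim \ln (\alpha ^{-1})\) isolated eigenvalues tending to zero; this construction is purely an assumption for illustration and is not the actual KKT operator \({\mathcal {A}}_\alpha \):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import minres

# Toy spectra: three fixed intervals bounded away from zero, plus
# N(alpha) ~ ln(1/alpha) isolated eigenvalues approaching zero.
def toy_spectrum(alpha, m=60, seed=0):
    rng = np.random.default_rng(seed)
    intervals = np.concatenate([
        -2.0 + rng.random(m),        # interval [-2, -1]
        0.5 + 0.5 * rng.random(m),   # interval [0.5, 1]
        2.0 + rng.random(m),         # interval [2, 3]
    ])
    n_iso = int(np.ceil(np.log(1.0 / alpha)))
    isolated = 0.5 ** np.arange(2, n_iso + 2)  # isolated eigenvalues -> 0
    return np.concatenate([intervals, isolated])

for alpha in [1e-2, 1e-4, 1e-6]:
    d = toy_spectrum(alpha)
    A = sp.diags(d)
    b = np.ones(d.size)
    counter = []
    x, info = minres(A, b, maxiter=5000,
                     callback=lambda xk: counter.append(1))
    print(f"alpha = {alpha:.0e}: {len(counter)} MINRES iterations")
```

Each isolated eigenvalue costs at most a few extra iterations, while the fixed intervals contribute a roughly constant amount, so the iteration counts printed above should grow slowly (roughly logarithmically) as \(\alpha \) decreases.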
Notes
1. The exact form of the parameter \(\alpha \) is presented in the next section.
2. Ischemia is a precursor of heart infarction.
3. For higher order discretizations, the problem in Step 4 becomes more difficult to solve.
4. Except for the presentation and discussion of Lemma 1, we simply write K instead of \(K_h\).
References
Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 10, 1217–1229 (1994)
Brègman, L.M.: A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Zh. Vychisl. Mat. Mat. Fiz. 7, 620–631 (1967)
Cai, J.F., Osher, S., Shen, Z.: Split Bregman methods and frame based image restoration. Multiscale Model. Simul. 8, 337–369 (2009)
Chavent, G., Kunisch, K.: Regularization of linear least squares problems by total bounded variation. ESAIM 2, 359–376 (1997)
Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2, 323–343 (2009)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 302–320 (1969)
Hintermüller, M., Kunisch, K.: Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math. 64(4), 1311–1333 (2004)
Hintermüller, M., Stadler, G.: A primal-dual algorithm for TV-based inf-convolution-type image restoration. SIAM J. Sci. Comput. 28, 1–23 (2006)
Mardal, K.A., Haga, J.B.: Block preconditioning of systems of PDEs. In: Logg, A., Mardal, K.A., Wells, G. (eds.) Automated Solution of Differential Equations, pp. 635–654. Springer, New York (2012)
Mardal, K.A., Winther, R.: Preconditioning discretizations of systems of partial differential equations. Numer. Linear Algebr. Appl. 18(1), 1–40 (2011)
Nielsen, B.F., Mardal, K.A.: Analysis of the minimal residual method applied to ill-posed optimality systems. SIAM J. Sci. Comput. 35(2), 785–814 (2013)
Nielsen, B.F., Lysaker, M., Grøttum, P.: Computing ischemic regions in the heart with the bidomain model—first steps towards validation. IEEE Trans. Med. Imaging 32, 1085–1096 (2013)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)
Pearson, J.W., Wathen, A.J.: A new approximation of the Schur complement in preconditioners for PDE-constrained optimization. Numer. Linear Algebr. Appl. 19(5), 816–829 (2012)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, London (1969)
Schöberl, J., Zulehner, W.: Symmetric indefinite preconditioners for saddle point problems with applications to PDE-constrained optimization problems. SIAM J. Matrix Anal. Appl. 29(3), 752–773 (2007)
Schöberl, J., Simon, R., Zulehner, W.: A robust multigrid method for elliptic optimal control problems. SIAM J. Numer. Anal. 49(4), 1482–1503 (2011)
Wang, D., Kirby, R.M., MacLeod, R.S., Johnson, C.R.: Inverse electrocardiographic source localization of ischemia: an optimization framework and finite element solution. J. Comput. Phys. 250, 403–424 (2013)
Wu, C., Tai, X.C.: Augmented Lagrangian method, dual method and split-Bregman iterations for ROF, vectorial TV and higher order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for L1-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1(1), 143–168 (2008)
Acknowledgments
The authors would like to thank the FEniCS community for their work on the automated solution of PDEs. We are also grateful for the comments provided by the referees, which significantly improved this article. This work was supported by The Research Council of Norway, Project Number 239070.
Elvetun, O.L., Nielsen, B.F. The split Bregman algorithm applied to PDE-constrained optimization problems with total variation regularization. Comput Optim Appl 64, 699–724 (2016). https://doi.org/10.1007/s10589-016-9823-3