Abstract
This work is concerned with linear inverse problems where a distributed parameter is known a priori to only take on values from a given discrete set. This property can be promoted in Tikhonov regularization with the aid of a suitable convex but nondifferentiable regularization term. This allows applying standard approaches to show well-posedness and convergence rates in Bregman distance. Using the specific properties of the regularization term, it can be shown that convergence (albeit without rates) actually holds pointwise. Furthermore, the resulting Tikhonov functional can be minimized efficiently using a semi-smooth Newton method. Numerical examples illustrate the properties of the regularization term and the numerical solution.
1 Introduction
We consider Tikhonov regularization of inverse problems, where the unknown parameter to be reconstructed is a distributed function that only takes on values from a given discrete set (i.e., the values are known, but not in which points they are attained). Such problems can occur, e.g., in nondestructive testing or medical imaging; a similar task also arises as a sub-step in segmentation or labelling problems in image processing. The question we wish to address here is the following: If such strong a priori knowledge is available, how can it be incorporated in an efficient manner? Specifically, if X and Y are function spaces, F : X → Y denotes the parameter-to-observation mapping, and y δ ∈ Y is the given noisy data, we wish to minimize the constrained Tikhonov functional

$$\displaystyle \begin{aligned} \min_{u\in U}\ \frac{1}{2}\|F(u)-y^\delta\|_Y^2 \end{aligned} \tag{1}$$

for

$$\displaystyle \begin{aligned} U := \left\{u\in X : u(x)\in\{u_1,\dots,u_d\}\ \text{almost everywhere}\right\}, \end{aligned} \tag{2}$$
where \(u_1,\dots ,u_d\in \mathbb {R}\) are the known parameter values. However, this set is nonconvex, and hence the functional in (1) is not weakly lower-semicontinuous and can therefore not be treated by standard techniques. (In particular, it will in general not admit a minimizer.) A common strategy to deal with such problems is by convex relaxation, i.e., replacing U by its convex hull

$$\displaystyle \begin{aligned} \operatorname{co} U = \left\{u\in X : u(x)\in[u_1,u_d]\ \text{almost everywhere}\right\}. \end{aligned} $$
This turns (1) into a classical bang-bang problem, whose solution is known to generically take on only the values u 1 or u d ; see, e.g., [4, 24]. If d > 2, intermediate parameter values are therefore lost in the reconstruction. (Here we would like to remark that a practical regularization should not only converge as the noise level tends to zero but also yield informative reconstructions for fixed—and ideally, a large range of—noise levels.) As a remedy, we propose to add a convex regularization term that promotes reconstructions in U (rather than merely in coU) for the convex relaxation. Specifically, we choose the convex integral functional

$$\displaystyle \begin{aligned} \mathcal{G}(u) := \int_\Omega g(u(x))\,dx \end{aligned} $$
for a convex integrand \(g:\mathbb {R}\to \mathbb {R}\) with a polyhedral epigraph whose vertices correspond to the known parameter values u 1, …, u d . Just as in L 1 regularization for sparsity (and in linear optimization), it can be expected that minimizers are found at the vertices, thus yielding the desired structure.
This approach was first introduced in [8] in the context of linear optimal control problems for partial differential equations, where the so-called multi-bang (as a generalization of bang-bang) penalty \(\mathcal {G}\) was obtained as the convex envelope of a (nonconvex) L 0 penalization of the constraint u ∈ U. The application to nonlinear control problems and the limit as the L 0 penalty parameter tends to infinity were considered in [9], and our particular choice of \(\mathcal {G}\) is based on this work. The extension of this approach to vector-valued control problems was carried out in [10].
Our goal here is therefore to investigate the use of the multi-bang penalty from [9] as a regularization term in inverse problems, in particular addressing convergence and convergence rates as the noise level and the regularization parameter tend to zero. Due to the convexity of the penalty, these follow from standard results on convex regularization if convergence is considered with respect to the Bregman distance. The main contribution of this work is to show that due to the structure of the pointwise penalty, this convergence can be shown to actually hold pointwise. Since the focus of our work is the novel convex regularization term, we restrict ourselves to linear problems for the sake of presentation. However, all results carry over in a straightforward fashion to nonlinear problems. Finally, we describe following [8, 9] the computation of Tikhonov minimizers using a path-following semismooth Newton method.
Let us briefly mention other related literature. Regularization with convex nonsmooth functionals is now a widely studied problem, and we only refer to the monographs [17, 21, 23] as well as the seminal works [6, 13, 15, 20]. To the best of our knowledge, this is the first work treating regularization of general inverse problems with discrete-valued distributed parameters. As mentioned above, similar problems occur frequently in image segmentation or, more generally, image labelling problems. The former are usually treated by (multi-phase) level set methods [27] or by a combination of total variation minimization and thresholding [7]. More general approaches to image labelling problems are based on graph-cut algorithms [1, 16] or, more recently, vector-valued convex relaxation [14, 19]. Both multi-phase level sets and vector-valued relaxations, however, have the disadvantage that the dimension of the parameter space grows quickly with the number of admissible values, which is not the case in our approach. On the other hand, our approach assumes, similar to [16], a linear ordering of the desired values which is not necessary in the vector-valued case; see also [10].
This work is organized as follows. In Sect. 2, we give the concrete form of the pointwise multi-bang penalty g and summarize its relevant properties. Section 3 is concerned with well-posedness, convergence, and convergence rates of the corresponding Tikhonov regularization. Our main result, the pointwise convergence of the regularized solutions to the true parameter, is the subject of Sect. 4. We also briefly discuss the structure of minimizers for given y δ and fixed α > 0 in Sect. 5. Finally, we address the numerical solution of the Tikhonov minimization problem using a semismooth Newton method in Sect. 6 and apply this approach to an inverse source problem for a Poisson equation in Sect. 7.
2 Multi-Bang Penalty
Let \(u_1<\cdots <u_d\in \mathbb {R}\), d ≥ 2, be the given admissible parameter values and \(\Omega \subset \mathbb {R}^n\), \(n\in \mathbb {N}\), be a bounded domain. Following [9, § 3], we define the corresponding multi-bang penalty

$$\displaystyle \begin{aligned} \mathcal{G}(u) := \int_\Omega g(u(x))\,dx \end{aligned} $$

for \(g:\mathbb {R}\to \overline {\mathbb {R}}\) defined by

$$\displaystyle \begin{aligned} g(v) := \begin{cases} \tfrac12\left((u_i+u_{i+1})v - u_iu_{i+1}\right) & v\in[u_i,u_{i+1}],\ 1\le i<d,\\ \infty & \text{else}. \end{cases} \end{aligned} $$
(Note that we have now included the convex constraint u ∈coU in the definition of \(\mathcal {G}\).) This choice can be motivated as the convex hull of \(\frac 12\| \cdot \|{ }_{L^2(\Omega )}^2 + \delta _U\), where δ U denotes the indicator function of the set U defined in (2) in the sense of convex analysis, i.e., δ U (u) = 0 if u ∈ U and ∞ else; see [9, § 3]. Setting

$$\displaystyle \begin{aligned} g_i(v) := \tfrac12\left((u_i+u_{i+1})v - u_iu_{i+1}\right) + \delta_{[u_1,u_d]}(v), \qquad 1\le i<d, \end{aligned} $$

it is straightforward to verify that

$$\displaystyle \begin{aligned} g(v) = \max_{1\le i<d} g_i(v), \end{aligned} $$
and hence g is the pointwise supremum of affine functions and therefore convex and continuous on the interior of its effective domain domg = [u 1, u d ].
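The pointwise structure of g is easy to exercise numerically. The following sketch (with `multibang_g` our own helper name) evaluates g as the maximum of the affine pieces with vertices at the admissible values, assuming the piecewise affine form above; at each u_i the penalty coincides with ½u_i², as expected from the convex-envelope construction.

```python
import numpy as np

def multibang_g(v, u):
    """Multi-bang integrand g(v): pointwise maximum of the affine pieces
    g_i(v) = ((u_i + u_{i+1}) v - u_i u_{i+1}) / 2 on [u_1, u_d], and
    infinity outside (the indicator of the convex hull co U)."""
    u = np.asarray(u, dtype=float)
    if v < u[0] or v > u[-1]:
        return float('inf')
    return 0.5 * np.max((u[:-1] + u[1:]) * v - u[:-1] * u[1:])

u = [0.0, 0.1, 0.15]
# At the admissible values, g coincides with v^2/2, as expected for the
# convex hull of the quadratic plus the indicator of U.
print([multibang_g(ui, u) for ui in u])
```

Between two neighboring vertices g is affine, so the midpoint value equals the average of the vertex values there; this flatness is what makes vertices energetically "cheap".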
We can thus apply the sum rule and maximum rule of convex analysis (see, e.g., [22, Props. 4.5.1 and 4.5.2, respectively]), and obtain for the convex subdifferential at v ∈domg that

$$\displaystyle \begin{aligned} \partial g(v) = \operatorname{co}\bigcup_{\{i\,:\,g_i(v)=g(v)\}} \partial g_i(v). \end{aligned} $$
Using the definition of g i together with the classical characterization of the subdifferential of an indicator function via its normal cone yields the explicit characterization

$$\displaystyle \begin{aligned} \partial g(v) = \begin{cases} \left(-\infty,\tfrac12(u_1+u_2)\right] & v=u_1,\\ \left\{\tfrac12(u_i+u_{i+1})\right\} & v\in(u_i,u_{i+1}),\ 1\le i<d,\\ \left[\tfrac12(u_{i-1}+u_i),\tfrac12(u_i+u_{i+1})\right] & v=u_i,\ 1<i<d,\\ \left[\tfrac12(u_{d-1}+u_d),\infty\right) & v=u_d. \end{cases} \end{aligned} \tag{3}$$
In Sects. 5 and 6, we will also make use of the subdifferential of the Fenchel conjugate g ∗ of g. Here we can use the fact that g is convex and hence q ∈ ∂g(v) if and only if v ∈ ∂g ∗(q) (see, e.g., [22, Prop. 4.4.4]) to obtain

$$\displaystyle \begin{aligned} \partial g^*(q) = \begin{cases} \{u_1\} & q<\tfrac12(u_1+u_2),\\ \{u_i\} & q\in\left(\tfrac12(u_{i-1}+u_i),\tfrac12(u_i+u_{i+1})\right),\ 1<i<d,\\ [u_i,u_{i+1}] & q=\tfrac12(u_i+u_{i+1}),\ 1\le i<d,\\ \{u_d\} & q>\tfrac12(u_{d-1}+u_d). \end{cases} \end{aligned} \tag{4}$$
(Note that subdifferentials are always closed.) We illustrate these characterizations for a simple example in Fig. 1.
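The piecewise-constant nature of ∂g* is what drives reconstructions to the admissible values: away from the midpoint thresholds ½(u_i + u_{i+1}) it is a singleton {u_i}, and only exactly at a threshold does it contain a whole segment. A minimal sketch (our own helper name `subdiff_g_star`; intervals returned as pairs):

```python
import numpy as np

def subdiff_g_star(q, u):
    """Subdifferential of the Fenchel conjugate g* at q, returned as an
    interval (lo, hi): a singleton {u_i} between consecutive midpoint
    thresholds (u_i + u_{i+1})/2, and the whole segment [u_i, u_{i+1}]
    exactly at a threshold (the singular case)."""
    u = np.asarray(u, dtype=float)
    thresholds = 0.5 * (u[:-1] + u[1:])
    for i, t in enumerate(thresholds):
        if q < t:
            return (float(u[i]), float(u[i]))
        if q == t:
            return (float(u[i]), float(u[i + 1]))
    return (float(u[-1]), float(u[-1]))

# away from the thresholds the value snaps to an admissible u_i:
print(subdiff_g_star(0.06, [0.0, 0.1, 0.15]))
```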
Finally, since g is proper, convex, and lower semi-continuous by construction, the corresponding integral functional \(\mathcal {G}:L^2(\Omega )\to \overline {\mathbb {R}}\) is proper, convex and weakly lower semicontinuous as well; see, e.g., [2, Proposition 2.53]. Furthermore, the subdifferential can be computed pointwise as

$$\displaystyle \begin{aligned} \partial\mathcal{G}(u) = \left\{p\in L^2(\Omega) : p(x)\in\partial g(u(x))\ \text{for almost every } x\in\Omega\right\}; \end{aligned} \tag{5}$$
see, e.g., [2, Prop. 2.53]. The same is true for the Fenchel conjugate \(\mathcal {G}^*:L^2(\Omega )\to \overline {\mathbb {R}}\) and hence for \(\partial \mathcal {G}^*\) (which is thus an element of L ∞( Ω) instead of L 2( Ω)); see, e.g., [12, Props. IV.1.2, IX.2.1].
3 Multi-Bang Regularization
We consider for a linear operator K : X → Y between the Hilbert spaces X = L 2( Ω) and Y and exact data y †∈ Y the inverse problem of finding u ∈ X such that

$$\displaystyle \begin{aligned} Ku = y^\dag. \end{aligned} \tag{6}$$
We assume that K is weakly closed, i.e., \(u_n\rightharpoonup u\) and \(Ku_n\rightharpoonup y\) imply y = Ku. For the sake of presentation, we also assume that (6) admits a solution u †∈ X. Let now y δ ∈ Y be given noisy data with ∥ y δ − y †∥ Y ≤ δ for some noise level δ > 0. The multi-bang regularization of (6) for α > 0 then consists in solving

$$\displaystyle \begin{aligned} \min_{u\in X}\ \frac{1}{2}\|Ku-y^\delta\|_Y^2 + \alpha\,\mathcal{G}(u). \end{aligned} \tag{7}$$
Since \(\mathcal {G}\) is proper, convex, and weakly lower semicontinuous with bounded effective domain coU, and K is weakly closed, the following results can be proved by standard lower semicontinuity arguments; see also [9, 10].
Proposition 1 (Existence and Uniqueness)
For every α > 0, there exists a minimizer \(u_\alpha ^\delta \) to (7). If K is injective, this minimizer is unique.
Proposition 2 (Stability)
Let \(\{y_n\}_{n\in \mathbb {N}}\subset Y\) be a sequence converging strongly to y δ ∈ Y and α > 0 be fixed. Then the corresponding sequence of minimizers \(\{u_n\}_{n\in \mathbb {N}}\) to (7) contains a subsequence converging weakly to a minimizer \(u^\delta _\alpha \).
We now address convergence for δ → 0. Recall that an element u †∈ X is called a \(\mathcal {G}\)-minimizing solution to (6) if it is a solution to (6) and \(\mathcal {G}(u^\dag )\le \mathcal {G}(u)\) for all solutions u to (6). The following result is standard as well; see, e.g., [17, 21, 23].
Proposition 3 (Convergence)
Let \(\{y^{\delta _n}\}_{n\in \mathbb {N}}\subset Y\) be a sequence of noisy data with \(\|\,y^{\delta _n}-y^\dag \|{ }_Y\leq \delta _n \to 0\), and choose α n := α n (δ n ) satisfying

$$\displaystyle \begin{aligned} \alpha_n\to 0 \qquad\text{and}\qquad \frac{\delta_n^2}{\alpha_n}\to 0. \end{aligned} $$
Then the corresponding sequence of minimizers \(\{u_{\alpha _n}^{\delta _n}\}_{n\in \mathbb {N}}\) to (7) contains a subsequence converging weakly to a \(\mathcal {G}\) -minimizing solution u †.
For convex nonsmooth regularization terms, convergence rates are usually derived in terms of the Bregman distance [5], which is defined for u 1, u 2 ∈ X and \(p_1\in \partial \mathcal {G}(u_1)\) as

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p_1}(u_2,u_1) := \mathcal{G}(u_2)-\mathcal{G}(u_1)-\langle p_1,u_2-u_1\rangle_X. \end{aligned} $$
From the convexity of \(\mathcal {G}\), it follows that \(d_{\mathcal {G}}^{p_1}(u_2,u_1)\geq 0\) for all u 2 ∈ X. Furthermore, we have from, e.g., [17, Lem. 3.8] the so-called three-point identity

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p_1}(u_3,u_1) = d_{\mathcal{G}}^{p_2}(u_3,u_2) + d_{\mathcal{G}}^{p_1}(u_2,u_1) + \langle p_2-p_1,u_3-u_2\rangle_X \end{aligned} \tag{8}$$
for any u 1, u 2, u 3 ∈ X and \(p_1\in \partial\mathcal {G}(u_1)\) and \(p_2\in \partial \mathcal {G}(u_2)\). Finally, we point out that due to the pointwise characterization (5) of the subdifferential of the integral functional \(\mathcal {G}\), we have that

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p_1}(u_2,u_1) = \int_\Omega d_g^{p_1(x)}(u_2(x),u_1(x))\,dx \end{aligned} \tag{9}$$

for

$$\displaystyle \begin{aligned} d_g^{q}(v_2,v_1) := g(v_2)-g(v_1)-q(v_2-v_1), \qquad q\in\partial g(v_1). \end{aligned} $$
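These pointwise formulas are easy to check numerically. The following sketch (assuming the piecewise affine integrand from Sect. 2; all concrete values are our own illustrative choices) evaluates the scalar Bregman distance and verifies the three-point identity to machine precision:

```python
import numpy as np

u = np.array([0.0, 0.1, 0.15])

def g(v):
    # piecewise affine multi-bang integrand (assumed form, valid on [u_1, u_d])
    return 0.5 * np.max((u[:-1] + u[1:]) * v - u[:-1] * u[1:])

def bregman(v2, v1, q):
    # pointwise Bregman distance d_g^q(v2, v1) = g(v2) - g(v1) - q (v2 - v1)
    return g(v2) - g(v1) - q * (v2 - v1)

q1 = 0.06   # interior subgradient at v1 = u_2 = 0.1
q2 = 0.13   # subgradient at u_3 = 0.15
v3, v2, v1 = 0.12, 0.15, 0.1
lhs = bregman(v3, v1, q1)
rhs = bregman(v3, v2, q2) + bregman(v2, v1, q1) + (q2 - q1) * (v3 - v2)
print(abs(lhs - rhs))   # three-point identity holds up to rounding
```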
Standard arguments can then be used to show convergence rates for a priori and a posteriori parameter choice rules under the usual source conditions; see, e.g., [6, 17, 20, 21, 23]. Here we follow the latter and assume that there exists a w ∈ Y such that

$$\displaystyle \begin{aligned} K^*w \in \partial\mathcal{G}(u^\dag). \end{aligned} \tag{10}$$
Under the a priori choice rule

$$\displaystyle \begin{aligned} \alpha(\delta) \sim \delta, \end{aligned} \tag{11}$$
we obtain the following convergence rate from, e.g., [17, Cor. 3.4].
Proposition 4 (Convergence Rate, A Priori)
Assume that the source condition (10) holds and that α = α(δ) is chosen according to (11). Then there exists a C > 0 such that

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{K^*w}(u_\alpha^\delta,u^\dag) \le C\delta. \end{aligned} $$
We obtain the same rate under the classical Morozov discrepancy principle

$$\displaystyle \begin{aligned} \delta \le \|Ku_\alpha^\delta - y^\delta\|_Y \le \tau\delta \end{aligned} \tag{12}$$
for some τ > 1 from, e.g., [17, Thm. 3.15].
Proposition 5 (Convergence Rate, A Posteriori)
Assume that the source condition (10) holds and that α = α(δ) is chosen according to (12). Then there exists a C > 0 such that

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{K^*w}(u_\alpha^\delta,u^\dag) \le C\delta. \end{aligned} $$
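The discrepancy principle can be implemented by decreasing α until the residual drops below τδ. The sketch below uses a plain quadratic Tikhonov solve as a stand-in for the multi-bang problem (7) (solving (7) itself requires the Newton method of Sect. 6); the operator, data, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((40, 30)) / 10          # toy forward operator
u_true = rng.uniform(0.0, 0.15, 30)
noise = rng.standard_normal(40)
delta = 1e-2
y_delta = K @ u_true + delta * noise / np.linalg.norm(noise)  # ||y_delta - y|| = delta

def tikhonov(K, y, alpha):
    # quadratic Tikhonov stand-in: minimize 0.5||Ku - y||^2 + 0.5*alpha*||u||^2
    return np.linalg.solve(K.T @ K + alpha * np.eye(K.shape[1]), K.T @ y)

tau, alpha = 1.1, 1.0
while alpha > 1e-14:
    u_alpha = tikhonov(K, y_delta, alpha)
    if np.linalg.norm(K @ u_alpha - y_delta) <= tau * delta:
        break                                   # discrepancy principle satisfied
    alpha /= 2
```

Halving α is the simplest search; a bisection or Newton iteration on the value function is more common in practice.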
4 Pointwise Convergence
The pointwise definition (9) of the Bregman distance together with the explicit pointwise characterization (3) of subgradients allows us to show that the convergence in Proposition 3 is actually pointwise if u †(x) ∈{u 1, …, u d } almost everywhere. The following lemma provides the central argument for pointwise convergence.
Lemma 1
Let v †∈{u 1, …, u d } and q †∈ ∂g(v †) satisfy

$$\displaystyle \begin{aligned} q^\dag \in \operatorname{int} \partial g(v^\dag). \end{aligned} \tag{13}$$
Furthermore, let \(\{v_n\}_{n\in \mathbb {N}}\subset [u_1,u_d]\) be a sequence with

$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag) \to 0. \end{aligned} $$
Then, v n → v †.
Proof
We argue by contraposition: Assume that v n does not converge to v † = u i for some 1 ≤ i ≤ d. Then there exists an ε > 0 such that for every \(n_0\in \mathbb {N}\), there is an n ≥ n 0 with |v n − v †| > ε, i.e., either v n > u i + ε or v n < u i − ε. We now further discriminate these two cases. (Note that some cases cannot occur if i = 1 or i = d.)
(i)
v n > u i+1: Then, v n ∈ (u k , u k+1] for some k ≥ i + 1. The three point identity (8) yields that
$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)=d_g^{q_{i+1}}(v_n,u_{i+1})+d_g^{q^\dag}(u_{i+1},v^\dag)+(q_{i+1}-q^\dag)(v_n-u_{i+1})\end{aligned} $$for q i+1 ∈ ∂g(u i+1). We now estimate each term separately. The first term is nonnegative by the properties of Bregman distances. For the last term, we can use the assumption (13) and the pointwise characterization (3) to obtain
$$\displaystyle \begin{aligned} q^\dag\in\left(\tfrac{1}{2}(u_i+u_{i-1}),\tfrac{1}{2}(u_i+u_{i+1})\right)\quad\text{and}\quad q_{i+1}\in\left[\tfrac{1}{2}(u_{i+1}+u_i),\tfrac{1}{2}(u_{i+1}+u_{i+2})\right],\end{aligned} $$which implies that q i+1 − q † > 0. By assumption we have v n − u i+1 > 0, which together implies that the last term is strictly positive. For the second term, we can use that v †, u i+1 ∈ [u i , u i+1] to simplify the Bregman distance to
$$\displaystyle \begin{aligned} d_g^{q^\dag}(u_{i+1},v^\dag)=\frac{1}{2}(u_{i+1}-u_i)(u_{i+1}+u_i-2q^\dag) >0,\end{aligned} $$again by assumption (13). Since this term is independent of n, we obtain the estimate
$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)>d_g^{q^\dag}(u_{i+1},v^\dag)=:\varepsilon_1>0.\end{aligned} $$
(ii)
u i < v n ≤ u i+1: In this case, we can again simplify
$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)=\frac{1}{2}(u_{i+1}+u_i-2q^\dag)(v_n-v^\dag)>C_1\varepsilon,\end{aligned} $$since \(C_1:=\frac {1}{2}(u_{i+1}+u_i-2q^\dag )>0\) by assumption (13) and v n − v † > ε by hypothesis.
(iii)
v n < u i : We argue similarly to either obtain
$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)&>d_g^{q^\dag}(u_{i-1},v^\dag)=:\varepsilon_2>0 \end{aligned} $$or
$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)&>C_2\varepsilon \end{aligned} $$for \(C_2:=-\frac {1}{2}(u_{i-1}+u_i-2q^\dag )>0\).
Thus if we set \(\tilde \varepsilon :=\min \{\varepsilon _1,\varepsilon _2,C_1\varepsilon ,C_2\varepsilon \}\), for every \(n_0\in \mathbb {N}\) we can find n ≥ n 0 such that \(d_g^{q^\dag }(v_n,v^\dag )>\tilde \varepsilon >0\). Hence, \(d_g^{q^\dag }(v_n,v^\dag )\) cannot converge to 0. □
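The mechanism of the proof can also be checked numerically: for an interior subgradient q†, the pointwise Bregman distance to v† = u_i is bounded away from zero outside any ε-ball. A small sketch with assumed values (u_1, u_2, u_3) = (0, 0.1, 0.15):

```python
import numpy as np

u = np.array([0.0, 0.1, 0.15])

def g(v):
    # piecewise affine multi-bang integrand (assumed form, valid on [u_1, u_d])
    return 0.5 * np.max((u[:-1] + u[1:]) * v - u[:-1] * u[1:])

v_dag, q_dag = 0.1, 0.06   # v† = u_2; q† strictly between the midpoints 0.05 and 0.125
eps = 0.02
grid = np.linspace(0.0, 0.15, 3001)
far = grid[np.abs(grid - v_dag) > eps]
dists = [g(v) - g(v_dag) - q_dag * (v - v_dag) for v in far]
print(min(dists) > 0)   # Bregman distance bounded away from zero off the eps-ball
```

With q† on the boundary of ∂g(v†) instead (e.g., q† = 0.05), the distance degenerates on one side, which is exactly why the strict complementarity (13) is needed.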
Assumption (13) can be interpreted as a strict complementarity condition for q † and v †. Comparing (13) to (3), we point out that such a choice of q † is always possible. If v †∉{u 1, …, u d }, on the other hand, convergence in Bregman distance is uninformative.
Lemma 2
Let v †∈ (u i , u i+1) for some 1 ≤ i < d and q †∈ ∂g(v †). Then we have

$$\displaystyle \begin{aligned} d_g^{q^\dag}(v,v^\dag) = 0 \qquad\text{for all } v\in[u_i,u_{i+1}]. \end{aligned} $$
Proof
By the definition of the Bregman distance and the characterization (3) of ∂g(v †) (which is single-valued under the assumption on v †), we directly obtain

$$\displaystyle \begin{aligned} d_g^{q^\dag}(v,v^\dag) = g(v)-g(v^\dag)-\tfrac12(u_i+u_{i+1})(v-v^\dag) = 0 \end{aligned} $$

for any v ∈ [u i , u i+1], since g is affine with slope \(\tfrac12(u_i+u_{i+1})\) on this interval. □
Lemma 1 allows us to translate the weak convergence from Proposition 3 to pointwise convergence, which is the main result of our work.
Theorem 1
Assume the conditions of Proposition 3 hold. If u †(x) ∈{u 1, …, u d } almost everywhere, then the subsequence from Proposition 3 converges to u † pointwise almost everywhere.
Proof
From Proposition 3, we obtain a subsequence \(\{u_n\}_{n\in \mathbb {N}}\) of \(\{u_{\alpha _n}^{\delta _n}\}_{n\in \mathbb {N}}\) converging weakly to u †. Since \(\mathcal {G}\) is convex and lower semicontinuous, we have that

$$\displaystyle \begin{aligned} \mathcal{G}(u^\dag) \le \liminf_{n\to\infty}\mathcal{G}(u_n). \end{aligned} \tag{14}$$
By the minimizing properties of \(\{u_n\}_{n\in \mathbb {N}}\) and the nonnegativity of the discrepancy term, we further obtain that

$$\displaystyle \begin{aligned} \alpha_n\mathcal{G}(u_n) \le \frac12\delta_n^2 + \alpha_n\mathcal{G}(u^\dag). \end{aligned} $$
Dividing this inequality by α n and passing to the limit n →∞, the assumption on α n from Proposition 3 yields that

$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\mathcal{G}(u_n) \le \mathcal{G}(u^\dag), \end{aligned} $$
which combined with (14) gives \(\lim _{n\to \infty }\mathcal {G}(u_n)=\mathcal {G}(u^\dag )\). Hence, \(u_n\rightharpoonup u^\dag \) implies that \(d_{\mathcal {G}}^{p^\dag }(u_n,u^\dag )\to 0\) for any \(p^\dag \in \partial \mathcal {G}(u^\dag )\). By the pointwise characterization (9) and the nonnegativity of Bregman distances, this implies that \(d_g^{p^\dag (x)}(u_n(x),u^\dag (x))\to 0\) for almost every x ∈ Ω. Choosing now \(p^\dag \in \partial \mathcal {G}(u^\dag )\) such that (13) holds for q † = p †(x) and v † = u †(x) almost everywhere, the claim follows from Lemma 1. □
Since u n (x) ∈ [u 1, u d ] by construction, the subsequence \(\{u_n\}_{n\in \mathbb {N}}\) is bounded in L ∞( Ω) and hence also converges strongly in L p( Ω) for any 1 ≤ p < ∞ by Lebesgue’s dominated convergence theorem. We remark that since Lemma 1 applied to u n (x) and u †(x) does not hold uniformly in Ω, we cannot expect that the convergence rates from Propositions 4 and 5 hold pointwise or strongly as well.
5 Structure of Minimizers
We now briefly discuss the structure of reconstructions obtained by minimizing the Tikhonov functional in (7) for given y δ ∈ Y and fixed α > 0, based on the necessary optimality conditions for (7). Since the discrepancy term is convex and differentiable, we can apply the sum rule for convex subdifferentials. Furthermore, the standard calculus for Fenchel conjugates and subdifferentials (see, e.g., [22]) yields for \(\mathcal {G}_\alpha :=\alpha \mathcal {G}\) that \(\mathcal {G}_\alpha ^*(p)=\alpha \mathcal {G}^*(\alpha ^{-1}p)\) and hence that \(p\in \partial \mathcal {G}_\alpha (u)\) if and only if \(u\in \partial \mathcal {G}_\alpha ^*(p)=\partial \mathcal {G}^*(\tfrac 1\alpha p)\). We thus obtain as in [8] that \(\bar u:=u_\alpha ^\delta \in L^2(\Omega )\) is a solution to (7) if and only if there exists a \(\bar p\in L^2(\Omega )\) satisfying

$$\displaystyle \begin{aligned} \begin{cases} \bar p = K^*(y^\delta - K\bar u),\\[4pt] \bar u(x) \in \begin{cases} \{u_1\} & \bar p(x) < \tfrac\alpha2(u_1+u_2),\\ \{u_i\} & \bar p(x)\in\left(\tfrac\alpha2(u_{i-1}+u_i),\tfrac\alpha2(u_i+u_{i+1})\right),\ 1<i<d,\\ \{u_d\} & \bar p(x) > \tfrac\alpha2(u_{d-1}+u_d),\\ [u_i,u_{i+1}] & \bar p(x) = \tfrac\alpha2(u_i+u_{i+1}),\ 1\le i<d, \end{cases} \end{cases} \end{aligned} \tag{15}$$

for almost every x ∈ Ω.
Here we have made use of the pointwise characterization in (4) and reformulated the case distinction in terms of \(\bar p(x)\) instead of \(\frac 1\alpha \bar p(x)\).
First, we obtain directly from (15) the desired structure of the reconstruction \(\bar u\): Apart from a singular set

$$\displaystyle \begin{aligned} \mathcal{S} := \left\{x\in\Omega : \bar p(x) = \tfrac\alpha2(u_i+u_{i+1})\ \text{for some } 1\le i<d\right\}, \end{aligned} $$
we always have \(\bar u(x) \in \{u_1,\dots ,u_d\}\). For operators K where K ∗ w cannot be constant on a set of positive measure unless w = 0 locally (as is the case for many operators involving solutions to partial differential equations; see [8, Prop. 2.3]) and y δ∉ranK, the singular set \(\mathcal {S}\) has zero measure and hence the “multi-bang” structure \(\bar u\in \{u_1,\dots ,u_d\}\) almost everywhere can be guaranteed a priori for any α > 0.
Furthermore, we point out that the regularization parameter α only enters via the case distinction. In particular, increasing α shifts the conditions on \(\bar u(x)\) such that the smaller values among the u i become more preferred. In fact, if \(\bar p\) is bounded, we can expect that there exists an α 0 > 0 such that \(\bar u\equiv u_1\) for all α > α 0. Conversely, for α → 0, the second line of (15) reduces to

$$\displaystyle \begin{aligned} \bar u(x) \in \begin{cases} \{u_1\} & \bar p(x) < 0,\\ [u_1,u_d] & \bar p(x) = 0,\\ \{u_d\} & \bar p(x) > 0, \end{cases} \end{aligned} $$
i.e., (15) coincides with the well-known optimality conditions for bang-bang control problems; see, e.g., [25, Lem. 2.26]. Since in the context of inverse problems, we only have α = α(δ) → 0 if δ → 0, the limit system (15) will contain consistent data and hence \(\bar p\equiv 0\). This allows recovery of u †(x) ∈{u 2, …, u d−1} on a set of positive measure, consistent with Theorem 1. However, if u †(x) ∈{u 1, …, u d } does not hold almost everywhere, we can only expect weak and not strong convergence, cf. [10, Prop. 5.10 (ii)].
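The case distinction can be turned directly into a reconstruction map from the dual variable, which also makes the role of α visible: with the thresholds scaled by α, the same dual value selects a smaller admissible value for larger α. A sketch (assuming the midpoint thresholds α/2 (u_i + u_{i+1}); `multibang_value` is our own name, and we simply assign the lower value on the singular thresholds):

```python
import numpy as np

def multibang_value(p, alpha, u):
    """Map a dual value p(x) to the reconstruction value via the scaled
    midpoint thresholds alpha/2 (u_i + u_{i+1}); exactly on a threshold
    (the singular set) the lower value is assigned for simplicity."""
    u = np.asarray(u, dtype=float)
    thresholds = 0.5 * alpha * (u[:-1] + u[1:])
    return float(u[np.searchsorted(thresholds, p)])

u = [0.0, 0.1, 0.15]
# the same dual value selects a smaller admissible value for larger alpha:
print(multibang_value(0.04, 1.0, u), multibang_value(0.04, 0.1, u))  # → 0.0 0.15
```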
6 Numerical Solution
In this section we address the numerical solution of the Tikhonov minimization problem (7) for given y δ ∈ Y and α > 0, following [9]. For the sake of presentation, we omit the dependence on α and δ from here on. We start from the necessary (and, due to convexity, sufficient) optimality conditions (15). To apply a semismooth Newton method, we replace the subdifferential inclusion \(\bar u\in \partial \mathcal {G}_\alpha ^*(\bar p)\) by its single-valued Moreau–Yosida regularization, i.e., we consider for γ > 0 the regularized optimality conditions

$$\displaystyle \begin{aligned} \begin{cases} p_\gamma = K^*(y^\delta - Ku_\gamma),\\ u_\gamma = \left(\partial\mathcal{G}_\alpha^*\right)_\gamma(p_\gamma). \end{cases} \end{aligned} \tag{16}$$
The Moreau–Yosida regularization can also be expressed as

$$\displaystyle \begin{aligned} \left(\partial\mathcal{G}_\alpha^*\right)_\gamma = \partial\mathcal{G}_{\alpha,\gamma}^* \end{aligned} $$

for

$$\displaystyle \begin{aligned} \mathcal{G}_{\alpha,\gamma}(u) := \alpha\mathcal{G}(u) + \frac\gamma2\|u\|_{L^2(\Omega)}^2; \end{aligned} $$
see, e.g., [3, Props. 13.21, 12.29]. This implies that for (u γ , p γ ) satisfying (16), u γ is a solution to the strictly convex problem

$$\displaystyle \begin{aligned} \min_{u\in L^2(\Omega)}\ \frac12\|Ku-y^\delta\|_Y^2 + \alpha\mathcal{G}(u) + \frac\gamma2\|u\|_{L^2(\Omega)}^2, \end{aligned} $$
so that existence of a solution can be shown by the same arguments as for (7). Note that by regularizing the conjugate subdifferential, we have not smoothed the nondifferentiability but merely made the functional (more) strongly convex. The regularization of \(\mathcal {G}_\alpha ^*\) instead of \(\mathcal {G}^*\) also ensures that the regularization is robust for α → 0. From [9, Prop. 4.1], we obtain the following convergence result.
Proposition 6
The family {u γ } γ>0 satisfying (16) contains at least one subsequence \(\{u_{\gamma _n}\}_{n\in \mathbb {N}}\) converging to a global minimizer of (7) as n →∞. Furthermore, for any such subsequence, the convergence is strong.
From [11, Appendix A.2] we further obtain the pointwise characterization

$$\displaystyle \begin{aligned} \left[\left(\partial\mathcal{G}_\alpha^*\right)_\gamma(p)\right](x) = H_\gamma(p(x)) \qquad\text{for almost every } x\in\Omega, \end{aligned} $$

where

$$\displaystyle \begin{aligned} H_\gamma(q) := \begin{cases} u_i & q\in\left[\tfrac\alpha2(u_{i-1}+u_i)+\gamma u_i,\ \tfrac\alpha2(u_i+u_{i+1})+\gamma u_i\right],\\[2pt] \tfrac1\gamma\left(q-\tfrac\alpha2(u_i+u_{i+1})\right) & q\in Q^\gamma_{i,i+1} := \left(\tfrac\alpha2(u_i+u_{i+1})+\gamma u_i,\ \tfrac\alpha2(u_i+u_{i+1})+\gamma u_{i+1}\right), \end{cases} \end{aligned} $$

with the convention that the first interval is unbounded below for i = 1 and unbounded above for i = d.
Since H γ is a superposition operator defined by a Lipschitz continuous and piecewise differentiable scalar function, H γ is Newton-differentiable from L r( Ω) → L 2( Ω) for any r > 2; see, e.g., [18, Example 8.12] or [26, Theorem 3.49]. A Newton derivative at p in direction h is given pointwise almost everywhere by

$$\displaystyle \begin{aligned} \left[D_NH_\gamma(p)h\right](x) = \begin{cases} \frac1\gamma h(x) & p(x)\in Q^\gamma_{i,i+1}\ \text{for some } 1\le i<d,\\ 0 & \text{else}. \end{cases} \end{aligned} $$
Hence if the range of K ∗ embeds into L r( Ω) for some r > 2 (which is the case, e.g., for many convolution operators and solution operators for partial differential equations) and the semismooth Newton step is uniformly invertible, the corresponding Newton iteration converges locally superlinearly. We address this for the concrete example considered in the next section. In practice, the local convergence can be addressed by embedding the Newton method into a continuation strategy, i.e., starting for γ large and then iteratively reducing γ, using the previous solution as a starting point.
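To make the iteration concrete, the following toy finite-dimensional sketch treats the simplest case of two admissible values {0, 1}, for which the regularized pointwise map reduces to a single clipped ramp (our assumed stand-in for H_γ). It solves the optimality condition, after eliminating u, by a semismooth Newton method with the continuation in γ described above and a simple residual-based backtracking; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
K = rng.standard_normal((n, n)) / np.sqrt(n)
y = K @ (rng.uniform(size=n) > 0.5).astype(float)   # data from a {0,1}-valued parameter
alpha = 1e-2

def H(p, gamma):
    # clipped ramp: stand-in for the Moreau-Yosida regularized subdifferential
    # for two admissible values {0, 1} (threshold alpha/2, slope 1/gamma)
    return np.clip((p - alpha / 2) / gamma, 0.0, 1.0)

def dH(p, gamma):
    # Newton derivative: 1/gamma on the ramp, 0 elsewhere
    return ((p > alpha / 2) & (p < alpha / 2 + gamma)) / gamma

def F(p, gamma):
    # substituted optimality condition p - K^T (y - K H(p)) = 0
    return p - K.T @ (y - K @ H(p, gamma))

p = np.zeros(n)
for gamma in [1.0, 0.3, 0.1]:          # continuation: reduce gamma, warm start
    for _ in range(30):
        r = F(p, gamma)
        if np.linalg.norm(r) < 1e-10:
            break
        d = -np.linalg.solve(np.eye(n) + K.T @ K * dH(p, gamma), r)
        s = 1.0                        # backtracking on the residual norm
        while np.linalg.norm(F(p + s * d, gamma)) >= np.linalg.norm(r) and s > 1e-10:
            s /= 2
        p = p + s * d
print(np.linalg.norm(F(p, 0.1)))
```

Since the Newton matrix I + KᵀK D has spectrum bounded below by one for any diagonal D ⪰ 0, each step is well defined; the Newton direction is always a descent direction for the residual norm, which is why the simple backtracking suffices here.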
7 Numerical Examples
We illustrate the proposed approach for an inverse source problem for the Poisson equation, i.e., we choose K = A −1 : L 2( Ω) → L 2( Ω) for Ω = [0, 1]2 and A = − Δ together with homogeneous boundary conditions. We note that since Ω is convex, we have that \({{\mathrm {ran}}} A^{-*}={{\mathrm {ran}}} A^{-1} = H^2(\Omega )\cap H^1_0(\Omega )\), and hence this operator satisfies the conditions discussed in Sect. 5 that guarantee that \(u_\alpha ^\delta (x)\in \{u_1,\dots ,u_d\}\) almost everywhere if y δ∉ranK; see [8, Prop. 2.3]. For the computational results below, we use a finite element discretization on a uniform triangular grid with 256 × 256 vertices.
The specific form of K can be used to reformulate the optimality condition (and hence the Newton system) into a more convenient form. Introducing y γ = A −1 u γ and eliminating u γ using the second relation of (16), we obtain as in [8] the equivalent system

$$\displaystyle \begin{aligned} \begin{cases} -\Delta y_\gamma = H_\gamma(p_\gamma),\\ -\Delta p_\gamma = y^\delta - y_\gamma. \end{cases} \end{aligned} \tag{17}$$
Setting \(V:=H^1_0(\Omega )\), we can consider this as an equation from V × V to V ∗× V ∗, which due to the embedding V ↪L p( Ω) for p > 2 provides the necessary norm gap for Newton differentiability of H γ . By the chain rule for Newton derivatives from, e.g., [18, Lem. 8.4], the corresponding Newton step therefore consists of solving for (δy, δp) ∈ V × V given (y k, p k) ∈ V × V in

$$\displaystyle \begin{aligned} \begin{pmatrix} \mathrm{Id} & -\Delta\\ -\Delta & -D_NH_\gamma(p^k) \end{pmatrix} \begin{pmatrix} \delta y\\ \delta p \end{pmatrix} = -\begin{pmatrix} y^k - \Delta p^k - y^\delta\\ -\Delta y^k - H_\gamma(p^k) \end{pmatrix} \end{aligned} \tag{18}$$

and setting

$$\displaystyle \begin{aligned} y^{k+1} = y^k + \delta y, \qquad p^{k+1} = p^k + \delta p. \end{aligned} $$
Note that the reformulated Newton matrix is symmetric, which in general is not the case for nonsmooth equations. Following [8, Prop. 4.3], the Newton step (18) is uniformly boundedly invertible, from which local superlinear convergence to a solution of (17) follows.
In practice, we include the continuation strategy described above as well as a simple backtracking line search based on the residual norm in (17) to improve robustness. Since the forward operator is linear and H γ is piecewise linear, the semi-smooth Newton method has the following finite termination property: If H γ (p k+1) = H γ (p k), then (y k+1, p k+1) satisfy (17); cf. [18, Rem. 7.1.1]. We then recover u k+1 = H γ (p k+1). In the implementation, we also terminate if more than 100 Newton iterations are performed, in which case the continuation is also terminated and the last successful iterate is returned. Otherwise we terminate if γ < 10−12. In all results reported below, the continuation is terminated successfully. The implementation of this approach used to obtain the following results can be downloaded from https://github.com/clason/discreteregularization.
The first example illustrates the convergence behavior of the Tikhonov regularization. Here, the true parameter is chosen as
for (u 1, u 2, u 3) = (0, 0.1, 0.15); see Fig. 2a. (This might correspond to, e.g., material properties of background, healthy tissue, and tumor, respectively.) The noisy data is constructed pointwise via

$$\displaystyle \begin{aligned} y^\delta = y^\dag + \tilde\delta\,\xi, \end{aligned} $$
where ξ is a vector of independent and identically normally distributed random variables with mean 0 and variance 1, and \(\tilde \delta \in \{2^0,\dots ,2^{-20}\}\). For each value of \(\tilde \delta \), the corresponding regularization parameter α is chosen according to the discrepancy principle (12) with τ = 1.1. Details on the convergence history are reported in Table 1, which shows the effective noise level δ := ∥ y δ − y †∥2, the parameter α selected by the Morozov discrepancy principle, the L 2-error \(e_2:=\| u_{\alpha }^\delta -u^\dag \|{ }_2\) and the L ∞-error \(e_\infty :=\| u_\alpha ^\delta -u^\dag \|{ }_\infty \). First, we note that the a posteriori choice approximately follows the a priori choice α ∼ δ. Similarly, for larger values of δ, the L 2-error behaves as e 2 ∼ δ, which is no longer true for δ → 0 (and cannot be expected due to the nonsmooth regularization). The L ∞-error e ∞ is initially dominated by the jump in admissible parameter values: As long as there is a single point x ∈ Ω with \(u_\alpha ^\delta (x) = u_i \neq u_j = u^\dag (x)\), we necessarily have \(e_\infty \geq \min_{1\leq i<d}(u_{i+1}-u_i)\). (Recall that we do not have a convergence rate and thus an error bound for pointwise convergence.) Later, e ∞ becomes smaller than this threshold value, which indicates that apart from points in the regularized singular set (i.e., where \(p_\gamma (x)\in Q^\gamma _{i,i+1}\), which in these cases happens for 20 out of 256 × 256 vertices), the reconstruction is exact. Here we point out that since γ is independent of α, the Moreau–Yosida regularization for fixed γ becomes more and more active as α → 0. Nevertheless, in all cases γ ≪ α, and hence the multi-bang regularization dominates.
The pointwise convergence can also be seen clearly from Fig. 2, which shows the true parameter u † together with three representative reconstructions for different noise levels. It can be seen that for large noise, the corresponding large regularization suppresses the smaller inclusion; see Fig. 2b. This is consistent with the discussion at the end of Sect. 5. For smaller noise, the inclusion is recovered well (Fig. 2c), and for δ ≈ 3.69 × 10−4, the reconstruction is visually indistinguishable from the true parameter (Fig. 2d).
The behavior is essentially the same if we set (u 1, u 2, u 3) = (0, 0.1, 0.11) in (19) (i.e., a contrast of 10% instead of 50% for the inner inclusion), demonstrating the robustness of the multi-bang regularization; see Fig. 3 and Table 2.
To illustrate the behavior if the true parameter does not satisfy the assumption u †∈{u 1, …, u d } almost everywhere, we repeat the above for
with (u 1, u 2, u 3) = (0, 0.1, 0.12); see Fig. 4a. While for large noise level and regularization parameter value, the multi-bang regularization behaves as before (see Fig. 4b), the reconstruction for smaller noise and regularization (Fig. 4c) shows the typical checkerboard pattern expected from weak but not strong convergence; cf. [8, Rem. 4.2]. Nevertheless, as δ → 0, we still observe convergence to the true parameter; see Fig. 4d and Table 3.
Finally, we address the qualitative dependence of the reconstruction on the regularization parameter α. Figure 5 shows reconstructions for the true parameter u † from (19) again with (u 1, u 2, u 3) = (0, 0.1, 0.15) for an effective noise level δ ≈ 0.759 and different values of α. First, Fig. 5b presents the reconstruction for the value α = 1.25 × 10−3, where as before the volume corresponding to u 2 is reduced and the inner inclusion corresponding to u 3 is suppressed completely. If the parameter is chosen smaller as α = 10−4, however, the reconstruction of the outer volume is essentially correct, while the inner inclusion—although reduced—is also localized well; see Fig. 5c. Visually, this value yields a better reconstruction than the one obtained by the discrepancy principle. The trade-off is a loss of spatial regularity, manifested in more irregular level lines, which becomes even more pronounced for smaller α = 10−5; see Fig. 5d. This behavior is surprising insofar as the pointwise definition of the multi-bang penalty itself imposes no spatial regularity on the reconstruction at all; as is evident from (15), any regularity of the solution \(\bar u\) is solely due to that of the level sets of \(\bar p\) (which in this case has the regularity of a solution to a Poisson equation).
8 Conclusion
Reconstructions in inverse problems that take on values from a given discrete admissible set can be promoted via a convex penalty that leads to a convergent regularization method. While convergence rates can be shown with respect to the usual Bregman distance, if the true parameter to be reconstructed takes on values only from the admissible set, the convergence (albeit without rates) is actually pointwise. A semismooth Newton method allows the efficient and robust computation of Tikhonov minimizers.
This work can be extended in several directions. First, Fig. 5 demonstrates that regularization parameters chosen according to the discrepancy principle are not optimal with respect to the visual reconstruction quality. This motivates the development of new heuristic parameter choice rules that are adapted to the discrete-valued, pointwise nature of the multi-bang penalty. It would also be interesting to investigate whether an active set condition in the spirit of [28, 29] based on (13) can be used to obtain strong or pointwise convergence rates. A natural further step is the extension to nonlinear parameter identification problems, making use of the results of [9]. Finally, Fig. 5c, d suggest combining the multi-bang penalty with a total variation penalty to also promote regularity of the level lines of the reconstruction. The resulting problem is challenging both analytically and numerically, but would open up the possibility of application to electrical impedance tomography, which can be formulated as a parameter identification problem for the diffusion coefficient in an elliptic equation.
References
E. Bae, X.C. Tai, Graph cut optimization for the piecewise constant level set method applied to multiphase image segmentation (Springer, Berlin/Heidelberg, 2009), pp. 1–13, http://doi.org/10.1007/978-3-642-02256-2_1
V. Barbu, T. Precupanu, Convexity and Optimization in Banach Spaces. Springer Monographs in Mathematics, 4th edn. (Springer, Dordrecht, 2012). http://doi.org/10.1007/978-94-007-2247-7
H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC (Springer, New York, 2011). http://doi.org/10.1007/978-1-4419-9467-7
M. Bergounioux, F. Tröltzsch, Optimality conditions and generalized bang-bang principle for a state-constrained semilinear parabolic problem. Numer. Funct. Anal. Optim. 17(5–6), 517–536 (1996). http://doi.org/10.1080/01630569608816708
L.M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
M. Burger, S. Osher, Convergence rates of convex variational regularization. Inverse Prob. 20(5), 1411 (2004). http://doi.org/10.1088/0266-5611/20/5/005
X. Cai, R. Chan, T. Zeng, A two-stage image segmentation method using a convex variant of the Mumford–Shah model and thresholding. SIAM J. Imag. Sci. 6(1), 368–390 (2013). http://doi.org/10.1137/120867068
C. Clason, K. Kunisch, Multi-bang control of elliptic systems. Ann. Inst. H. Poincaré Anal. Non Linéaire 31(6), 1109–1130 (2014). http://doi.org/10.1016/j.anihpc.2013.08.005
C. Clason, K. Kunisch, A convex analysis approach to multi-material topology optimization. ESAIM: Math. Model. Numer. Anal. 50(6), 1917–1936 (2016). http://doi.org/10.1051/m2an/2016012
C. Clason, C. Tameling, B. Wirth, Vector-valued multibang control of differential equations (2016). arXiv:1611.07853. http://www.arxiv.org/abs/1611.07853
C. Clason, K. Ito, K. Kunisch, A convex analysis approach to optimal controls with switching structure for partial differential equations. ESAIM Control, Optimisation and Calculus of Variations 22(2), 581–609 (2016). http://doi.org/10.1051/cocv/2015017
I. Ekeland, R. Témam, Convex Analysis and Variational Problems. Classics in Applied Mathematics, vol. 28 (SIAM, Philadelphia, 1999). http://doi.org/10.1137/1.9781611971088
J. Flemming, B. Hofmann, Convergence rates in constrained Tikhonov regularization: equivalence of projected source conditions and variational inequalities. Inverse Prob. 27(8), 085001 (2011). http://doi.org/10.1088/0266-5611/27/8/085001
B. Goldluecke, D. Cremers, Convex relaxation for multilabel problems with product label spaces (Springer, Berlin/Heidelberg, 2010), pp. 225–238. http://doi.org/10.1007/978-3-642-15555-0_17
B. Hofmann, B. Kaltenbacher, C. Pöschl, O. Scherzer, A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators. Inverse Prob. 23(3), 987–1010 (2007). http://doi.org/10.1088/0266-5611/23/3/009
H. Ishikawa, Exact optimization for Markov random fields with convex priors. IEEE Trans. Pattern Anal. Mach. Intell. 25(10), 1333–1336 (2003). http://doi.org/10.1109/TPAMI.2003.1233908
K. Ito, B. Jin, Inverse Problems: Tikhonov Theory and Algorithms. Series on Applied Mathematics, vol. 22 (World Scientific, Singapore, 2014). http://doi.org/10.1142/9789814596206_0001
K. Ito, K. Kunisch, Lagrange Multiplier Approach to Variational Problems and Applications, Advances in Design and Control, vol. 15 (SIAM, Philadelphia, PA, 2008). http://doi.org/10.1137/1.9780898718614
J. Lellmann, C. Schnörr, Continuous multiclass labeling approaches and algorithms. SIAM J. Imag. Sci. 4(4), 1049–1096 (2011). http://doi.org/10.1137/100805844
E. Resmerita, Regularization of ill-posed problems in Banach spaces: convergence rates. Inverse Prob. 21(4), 1303 (2005). http://doi.org/10.1088/0266-5611/21/4/007
O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, F. Lenzen, Variational Methods in Imaging. Applied Mathematical Sciences, vol. 167 (Springer, Cham, 2009). http://doi.org/10.1007/978-0-387-69277-7
W. Schirotzek, Nonsmooth Analysis. Universitext (Springer, Berlin, 2007). http://doi.org/10.1007/978-3-540-71333-3
T. Schuster, B. Kaltenbacher, B. Hofmann, K.S. Kazimierski, Regularization Methods in Banach Spaces. Radon Series on Computational and Applied Mathematics, vol. 10 (De Gruyter, Berlin, 2012). http://doi.org/10.1515/9783110255720
F. Tröltzsch, A minimum principle and a generalized bang-bang principle for a distributed optimal control problem with constraints on control and state. Z. Angew. Math. Mech. 59(12), 737–739 (1979). http://doi.org/10.1002/zamm.19790591208
F. Tröltzsch, Optimal Control of Partial Differential Equations: Theory, Methods and Applications, Translated from the German by Jürgen Sprekels (American Mathematical Society, Providence, 2010). http://doi.org/10.1090/gsm/112
M. Ulbrich, Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces. MOS-SIAM Series on Optimization, vol. 11 (SIAM, Philadelphia, PA, 2011). http://doi.org/10.1137/1.9781611970692
L.A. Vese, T.F. Chan, A multiphase level set framework for image segmentation using the Mumford and Shah model. Int. J. Comput. Vis. 50(3), 271–293 (2002). http://doi.org/10.1023/A:1020874308076
D. Wachsmuth, G. Wachsmuth, Regularization error estimates and discrepancy principle for optimal control problems with inequality constraints. Control. Cybern. 40(4), 1125–1158 (2011)
G. Wachsmuth, D. Wachsmuth, Convergence and regularization results for optimal control problems with sparsity functional. ESAIM Control Optim. Calc. Var. 17(3), 858–886 (2011). http://doi.org/10.1051/cocv/2010027
Acknowledgements
This work was supported by the German Science Fund (DFG) under grant CL 487/1-1. The authors also wish to thank Daniel Wachsmuth for several helpful remarks.
© 2018 Springer International Publishing AG
Clason, C., Do, T.B.T. (2018). Convex Regularization of Discrete-Valued Inverse Problems. In: Hofmann, B., Leitão, A., Zubelli, J. (eds) New Trends in Parameter Identification for Mathematical Models. Trends in Mathematics. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-70824-9_2
Print ISBN: 978-3-319-70823-2
Online ISBN: 978-3-319-70824-9