1 Introduction

We consider Tikhonov regularization of inverse problems where the unknown parameter to be reconstructed is a distributed function that only takes on values from a given discrete set (i.e., the values are known, but not at which points they are attained). Such problems occur, e.g., in nondestructive testing or medical imaging; a similar task also arises as a sub-step in segmentation or labelling problems in image processing. The question we wish to address here is the following: If such strong a priori knowledge is available, how can it be incorporated in an efficient manner? Specifically, if X and Y are function spaces, F : X → Y denotes the parameter-to-observation mapping, and \(y^\delta \in Y\) is the given noisy data, we wish to minimize the constrained Tikhonov functional

$$\displaystyle \begin{aligned} \min_{u\in U} \frac 12\|\,F(u) - y^\delta \|{}_Y^2 \end{aligned} $$
(1)

for

$$\displaystyle \begin{aligned} U := \left\{u\in X:u \in \{u_1,\dots,u_d\} \text{ pointwise}\right\}, \end{aligned} $$
(2)

where \(u_1,\dots ,u_d\in \mathbb {R}\) are the known parameter values. However, this set is nonconvex, and hence the functional in (1) is not weakly lower-semicontinuous and can therefore not be treated by standard techniques. (In particular, it will in general not admit a minimizer.) A common strategy to deal with such problems is by convex relaxation, i.e., replacing U by its convex hull

$$\displaystyle \begin{aligned} {{\mathrm{co}}} U = \left\{u\in X:u \in [u_1,u_d] \text{ pointwise}\right\}. \end{aligned}$$

This turns (1) into a classical bang-bang problem, whose solution is known to generically take on only the values u 1 or u d ; see, e.g., [4, 24]. If d > 2, intermediate parameter values are therefore lost in the reconstruction. (Here we would like to remark that a practical regularization should not only converge as the noise level tends to zero but also yield informative reconstructions for fixed—and ideally, a large range of—noise levels.) As a remedy, we propose to add a convex regularization term that promotes reconstructions in U (rather than merely in coU) for the convex relaxation. Specifically, we choose the convex integral functional

$$\displaystyle \begin{aligned} \mathcal{G}:X\to\mathbb{R},\qquad \mathcal{G}(u) := \int g(u(x))\,dx, \end{aligned}$$

for a convex integrand \(g:\mathbb {R}\to \mathbb {R}\) with a polyhedral epigraph whose vertices correspond to the known parameter values u 1, …, u d . Just as in L 1 regularization for sparsity (and in linear optimization), it can be expected that minimizers are found at the vertices, thus yielding the desired structure.

This approach was first introduced in [8] in the context of linear optimal control problems for partial differential equations, where the so-called multi-bang (as a generalization of bang-bang) penalty \(\mathcal {G}\) was obtained as the convex envelope of a (nonconvex) L 0 penalization of the constraint u ∈ U. The application to nonlinear control problems and the limit as the L 0 penalty parameter tends to infinity were considered in [9], and our particular choice of \(\mathcal {G}\) is based on this work. The extension of this approach to vector-valued control problems was carried out in [10].

Our goal here is therefore to investigate the use of the multi-bang penalty from [9] as a regularization term in inverse problems, in particular addressing convergence and convergence rates as the noise level and the regularization parameter tend to zero. Due to the convexity of the penalty, these follow from standard results on convex regularization if convergence is considered with respect to the Bregman distance. The main contribution of this work is to show that due to the structure of the pointwise penalty, this convergence can be shown to actually hold pointwise. Since the focus of our work is the novel convex regularization term, we restrict ourselves to linear problems for the sake of presentation. However, all results carry over in a straightforward fashion to nonlinear problems. Finally, we describe following [8, 9] the computation of Tikhonov minimizers using a path-following semismooth Newton method.

Let us briefly mention other related literature. Regularization with convex nonsmooth functionals is now a widely studied problem, and we only refer to the monographs [17, 21, 23] as well as the seminal works [6, 13, 15, 20]. To the best of our knowledge, this is the first work treating regularization of general inverse problems with discrete-valued distributed parameters. As mentioned above, similar problems occur frequently in image segmentation or, more generally, image labelling problems. The former are usually treated by (multi-phase) level set methods [27] or by a combination of total variation minimization and thresholding [7]. More general approaches to image labelling problems are based on graph-cut algorithms [1, 16] or, more recently, vector-valued convex relaxation [14, 19]. Both multi-phase level sets and vector-valued relaxations, however, have the disadvantage that the dimension of the parameter space grows quickly with the number of admissible values, which is not the case in our approach. On the other hand, our approach assumes, similar to [16], a linear ordering of the desired values which is not necessary in the vector-valued case; see also [10].

This work is organized as follows. In Sect. 2, we give the concrete form of the pointwise multi-bang penalty g and summarize its relevant properties. Section 3 is concerned with well-posedness, convergence, and convergence rates of the corresponding Tikhonov regularization. Our main result, the pointwise convergence of the regularized solutions to the true parameter, is the subject of Sect. 4. We also briefly discuss the structure of minimizers for given y δ and fixed α > 0 in Sect. 5. Finally, we address the numerical solution of the Tikhonov minimization problem using a semismooth Newton method in Sect. 6 and apply this approach to an inverse source problem for a Poisson equation in Sect. 7.

2 Multi-Bang Penalty

Let \(u_1<\cdots <u_d\in \mathbb {R}\), d ≥ 2, be the given admissible parameter values and \(\Omega \subset \mathbb {R}^n\), \(n\in \mathbb {N}\), be a bounded domain. Following [9, § 3], we define the corresponding multi-bang penalty

$$\displaystyle \begin{aligned} \mathcal{G}:L^2(\Omega)\to\overline{\mathbb{R}},\qquad \mathcal{G}(u) = \int_\Omega g(u(x))\,dx, \end{aligned}$$

for \(g:\mathbb {R}\to \overline {\mathbb {R}}\) defined by

$$\displaystyle \begin{aligned} g(v) = \begin{cases} \frac 12 \left((u_{i}+u_{i+1})v - u_iu_{i+1}\right) & \text{if }v \in [u_i,u_{i+1}], \quad1\leq i < d,\\ \infty & \text{else}. \end{cases} \end{aligned}$$

(Note that we have now included the convex constraint u ∈coU in the definition of \(\mathcal {G}\).) This choice can be motivated as the convex hull of \(\frac 12\| \cdot \|{ }_{L^2(\Omega )}^2 + \delta _U\), where δ U denotes the indicator function of the set U defined in (2) in the sense of convex analysis, i.e., δ U (u) = 0 if u ∈ U and δ U (u) = ∞ else; see [9, § 3]. Setting

$$\displaystyle \begin{aligned} g_i(v):= \frac 12 \left((u_{i}+u_{i+1})v - u_iu_{i+1}\right),\qquad 1\leq i<d,\end{aligned} $$

it is straightforward to verify that

$$\displaystyle \begin{aligned} g(v) = \max_{1\leq i <d} g_i(v),\qquad v\in [u_1,u_d],\end{aligned} $$

and hence g is the pointwise supremum of affine functions and therefore convex and continuous on the interior of its effective domain \(\operatorname{dom} g = [u_1, u_d]\).
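To make this concrete, the following short Python sketch (an illustration only, not part of the implementation discussed in Sect. 6) evaluates g as the maximum of the affine pieces g i on [u 1, u d ] and returns ∞ outside this interval; the admissible values are passed as an input.

```python
import numpy as np

def multibang_g(v, u):
    """Pointwise multi-bang integrand g for admissible values u[0] < ... < u[-1].

    On [u[0], u[-1]], g is the maximum of the affine pieces
    g_i(v) = 0.5 * ((u[i] + u[i+1]) * v - u[i] * u[i+1]);
    outside this interval, g is +infinity.
    """
    u = np.asarray(u, dtype=float)
    if v < u[0] or v > u[-1]:
        return np.inf
    pieces = 0.5 * ((u[:-1] + u[1:]) * v - u[:-1] * u[1:])
    return pieces.max()

if __name__ == "__main__":
    u = [0.0, 1.0, 2.0]                    # the admissible values from Fig. 1
    for v in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
        print(f"g({v}) = {multibang_g(v, u)}")
    # g(0)=0, g(0.5)=0.25, g(1)=0.5, g(1.5)=1.25, g(2)=2, g(2.5)=inf;
    # note that g(u_i) = 0.5*u_i**2, consistent with g being the convex
    # envelope of 0.5*v**2 + delta_U
```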

We can thus apply the sum rule and maximum rule of convex analysis (see, e.g., [22, Props. 4.5.1 and 4.5.2, respectively]), and obtain for the convex subdifferential at v ∈domg that

$$\displaystyle \begin{aligned} \begin{aligned} \partial g(v) &= \partial\left(\max_{1\leq i<d}g_i + \delta_{[u_1,u_d]}\right)(v)\\ &= \partial \left(\max_{1\leq i< d} g_i\right)(v) + \partial\delta_{[u_1,u_d]}(v) \\ &= {{\mathrm{co}}} \left(\bigcup_{i:g(v)=g_i(v)}g^{\prime}_i(v)\right) + \partial\delta_{[u_1,u_d]}(v). \end{aligned}\end{aligned} $$

Using the definition of g i together with the classical characterization of the subdifferential of an indicator function via its normal cone yields the explicit characterization

$$\displaystyle \begin{aligned} \partial g(v) = \begin{cases} \left(-\infty,\tfrac12(u_1+u_2)\right] & \text{if }v = u_1,\\ \left\{\tfrac12(u_{i}+u_{i+1})\right\} &\text{if } v \in (u_i,u_{i+1}),\qquad\qquad\quad 1\leq i<d,\\ \left[\tfrac12(u_{i-1}+u_{i}),\tfrac12(u_{i}+u_{i+1})\right] &\text{if } v = u_i,\qquad\qquad\qquad\qquad 1<i<d,\\ \left[\tfrac12(u_{d-1}+u_d),\infty\right) &\text{if } v = u_d,\\ \emptyset &\text{else.} \end{cases} \end{aligned} $$
(3)

In Sects. 5 and 6, we will also make use of the subdifferential of the Fenchel conjugate g ∗ of g. Here we can use the fact that g is proper, convex, and lower semicontinuous and hence q ∈ ∂g(v) if and only if v ∈ ∂g ∗(q) (see, e.g., [22, Prop. 4.4.4]) to obtain

$$\displaystyle \begin{aligned} \partial g^*(q) = \begin{cases} \{u_1\} & \text{if }q \in \left(-\infty,\tfrac12(u_1+u_2)\right),\\ {} [u_i,u_{i+1}] &\text{if } q = \tfrac12(u_{i}+u_{i+1}),\qquad\qquad\qquad\quad 1\leq i<d,\\ \{u_i\} &\text{if } q \in \left(\tfrac12(u_{i-1}+u_{i}),\tfrac12(u_{i}+u_{i+1})\right), \quad 1< i<d,\\ \{u_d\} &\text{if } q \in \left(\tfrac12(u_{d-1}+u_d),\infty\right),\\ \emptyset &\text{else.} \end{cases} \end{aligned} $$
(4)

(Note that subdifferentials are always closed.) We illustrate these characterizations for a simple example in Fig. 1.

Fig. 1

Structure of the pointwise multi-bang penalty for the choice (u 1, u 2, u 3) = (0, 1, 2). (a) g, (b) ∂g, (c) ∂g ∗
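The case distinction (4) can also be evaluated directly. The following sketch (illustration only) returns the single value of ∂g ∗(q) away from the transition points and the corresponding interval at a transition point, reproducing the structure shown in Fig. 1c for (u 1, u 2, u 3) = (0, 1, 2).

```python
import numpy as np

def dg_star(q, u):
    """Subdifferential of the Fenchel conjugate g* at q, following (4).

    Returns the single value u[i] when q lies strictly between the midpoints
    0.5*(u[i-1]+u[i]) and 0.5*(u[i]+u[i+1]), and the whole interval
    [u[i], u[i+1]] (as a tuple) when q equals a midpoint 0.5*(u[i]+u[i+1])."""
    u = np.asarray(u, dtype=float)
    mid = 0.5 * (u[:-1] + u[1:])           # transition values 0.5*(u_i + u_{i+1})
    for i, m in enumerate(mid):
        if np.isclose(q, m):
            return (u[i], u[i + 1])        # set-valued: any value in [u_i, u_{i+1}]
    # otherwise q falls into exactly one of the open intervals between midpoints
    i = np.searchsorted(mid, q)            # number of midpoints strictly below q
    return u[i]

if __name__ == "__main__":
    u = [0.0, 1.0, 2.0]                    # the example of Fig. 1
    for q in [-1.0, 0.25, 0.5, 1.0, 1.5, 3.0]:
        print(q, "->", dg_star(q, u))
    # -1.0 -> 0.0, 0.25 -> 0.0, 0.5 -> (0.0, 1.0), 1.0 -> 1.0, 1.5 -> (1.0, 2.0), 3.0 -> 2.0
```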

Finally, since g is proper, convex, and lower semi-continuous by construction, the corresponding integral functional \(\mathcal {G}:L^2(\Omega )\to \overline {\mathbb {R}}\) is proper, convex and weakly lower semicontinuous as well; see, e.g., [2, Proposition 2.53]. Furthermore, the subdifferential can be computed pointwise as

$$\displaystyle \begin{aligned} \partial\mathcal{G}(u) = \left\{v\in L^2(\Omega):v(x) \in \partial g(u(x))\quad\text{for almost every }x\in\Omega\right\}, \end{aligned} $$
(5)

see, e.g., [2, Prop. 2.53]. The same is true for the Fenchel conjugate \(\mathcal {G}^*:L^2(\Omega )\to \overline {\mathbb {R}}\) and hence for \(\partial \mathcal {G}^*\) (whose elements are thus in \(L^\infty(\Omega)\) instead of merely \(L^2(\Omega)\)); see, e.g., [12, Props. IV.1.2, IX.2.1].

3 Multi-Bang Regularization

We consider for a linear operator K : X → Y between the Hilbert spaces X = L 2( Ω) and Y and exact data \(y^\dag \in Y\) the inverse problem of finding u ∈ X such that

$$\displaystyle \begin{aligned} Ku=y^\dag. \end{aligned} $$
(6)

We assume that K is weakly closed, i.e., \(u_n\rightharpoonup u\) and \(Ku_n\rightharpoonup y\) imply y = Ku. For the sake of presentation, we also assume that (6) admits a solution u ∈ X. Let now y δ ∈ Y be given noisy data with \(\|\,y^\delta - y^\dag\|{}_Y \leq \delta\) for some noise level δ > 0. The multi-bang regularization of (6) for α > 0 then consists in solving

$$\displaystyle \begin{aligned} \min_{u\in X} \frac 12\| Ku-y^\delta \|{}_Y^2 + \alpha \mathcal{G}(u). \end{aligned} $$
(7)

Since \(\mathcal {G}\) is proper, convex, and lower semicontinuous with bounded effective domain coU, and K is weakly closed, the following results can be proved by standard lower semicontinuity arguments; see also [9, 10].

Proposition 1 (Existence and Uniqueness)

For every α > 0, there exists a minimizer \(u_\alpha ^\delta \) to (7). If K is injective, this minimizer is unique.

Proposition 2 (Stability)

Let \(\{y_n\}_{n\in \mathbb {N}}\subset Y\) be a sequence converging strongly to y δ ∈ Y and α > 0 be fixed. Then the corresponding sequence of minimizers \(\{u_n\}_{n\in \mathbb {N}}\) to (7) contains a subsequence converging weakly to a minimizer \(u^\delta _\alpha \).

We now address convergence as δ → 0. Recall that an element \(u^\dag \in X\) is called a \(\mathcal {G}\)-minimizing solution to (6) if it is a solution to (6) and \(\mathcal {G}(u^\dag )\le \mathcal {G}(u)\) holds for all solutions u to (6). The following result is standard as well; see, e.g., [17, 21, 23].

Proposition 3 (Convergence)

Let \(\{y^{\delta _n}\}_{n\in \mathbb {N}}\subset Y\) be a sequence of noisy data with \(\|\,y^{\delta _n}-y^\dag \|{ }_Y\leq \delta _n \to 0\) , and choose α n  := α n (δ n ) satisfying

$$\displaystyle \begin{aligned} \lim_{n\to\infty}\frac{\delta_n^2}{\alpha_n}=0 \qquad \mathit{\text{and}}\qquad \lim_{n\to\infty}\alpha_n=0.\end{aligned} $$

Then the corresponding sequence of minimizers \(\{u_{\alpha _n}^{\delta _n}\}_{n\in \mathbb {N}}\) to (7) contains a subsequence converging weakly to a \(\mathcal {G}\)-minimizing solution \(u^\dag\).

For convex nonsmooth regularization terms, convergence rates are usually derived in terms of the Bregman distance [5], which is defined for u 1, u 2 ∈ X and \(p_1\in \partial \mathcal {G}(u_1)\) as

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p_1}(u_2,u_1)=\mathcal{G}(u_2)-\mathcal{G}(u_1)-\langle \,p_1,u_2-u_1 \rangle_X.\end{aligned} $$

From the convexity of \(\mathcal {G}\), it follows that \(d_{\mathcal {G}}^{p_1}(u_2,u_1)\geq 0\) for all u 2 ∈ X. Furthermore, we have from, e.g., [17, Lem. 3.8] the so-called three-point identity

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p_1}(u_3,u_1)=d_{\mathcal{G}}^{p_2}(u_3,u_2)+d_{\mathcal{G}}^{p_1}(u_2,u_1)+\langle\,p_2-p_1,u_3-u_2\rangle_X\end{aligned} $$
(8)

for any u 1, u 2, u 3 ∈ X and \(p_1\in \partial\mathcal {G}(u_1)\) and \(p_2\in \partial \mathcal {G}(u_2)\). Finally, we point out that due to the pointwise characterization (5) of the subdifferential of the integral functional \(\mathcal {G}\), we have that

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p}(u_2,u_1) = \int_\Omega d_g^{p(x)}(u_2(x),u_1(x))dx\end{aligned} $$
(9)

for

$$\displaystyle \begin{aligned} d_g^q(v_2,v_1) = g(v_2)-g(v_1) - q(v_2-v_1).\end{aligned} $$

Standard arguments can then be used to show convergence rates for a priori and a posteriori parameter choice rules under the usual source conditions; see, e.g., [6, 17, 20, 21, 23]. Here we follow the latter and assume that there exists a w ∈ Y such that

$$\displaystyle \begin{aligned} p^\dag:=K^*w\in\partial \mathcal{G}(u^\dag). \end{aligned} $$
(10)

Under the a priori choice rule

$$\displaystyle \begin{aligned} \alpha = c \delta\qquad \text{for some }c>0, \end{aligned} $$
(11)

we obtain the following convergence rate from, e.g., [17, Cor. 3.4].

Proposition 4 (Convergence Rate, A Priori)

Assume that the source condition (10) holds and that α = α(δ) is chosen according to (11). Then there exists a C > 0 such that

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p^\dag}(u_{\alpha}^\delta,u^\dag) \leq C\delta. \end{aligned}$$

We obtain the same rate under the classical Morozov discrepancy principle

$$\displaystyle \begin{aligned} \delta<\| Ku_\alpha^\delta-y^\delta \|{}_Y\le \tau\delta, \end{aligned} $$
(12)

for some τ > 1 from, e.g., [17, Thm. 3.15].

Proposition 5 (Convergence Rate, A Posteriori)

Assume that the source condition (10) holds and that α = α(δ) is chosen according to (12). Then there exists a C > 0 such that

$$\displaystyle \begin{aligned} d_{\mathcal{G}}^{p^\dag}(u_{\alpha}^\delta,u^\dag) \leq C\delta. \end{aligned}$$

4 Pointwise Convergence

The pointwise definition (9) of the Bregman distance together with the explicit pointwise characterization (3) of subgradients allows us to show that the convergence in Proposition 3 is actually pointwise if u (x) ∈{u 1, …, u d } almost everywhere. The following lemma provides the central argument for pointwise convergence.

Lemma 1

Let \(v^\dag \in \{u_1,\dots,u_d\}\) and \(q^\dag \in \partial g(v^\dag)\) satisfy

$$\displaystyle \begin{aligned} q^\dag \in \begin{cases} \left(-\infty,\tfrac12(u_1+u_2)\right) & \text{if } v^\dag = u_1,\\ \left(\tfrac12(u_{i-1}+u_{i}),\tfrac12(u_{i}+u_{i+1})\right) &\text{if } v^\dag = u_i,\qquad 1<i<d,\\ \left(\tfrac12(u_{d-1}+u_d),\infty\right) &\text{if } v^\dag = u_d. \end{cases} \end{aligned} $$
(13)

Furthermore, let \(\{v_n\}_{n\in \mathbb {N}}\subset [u_1,u_d]\) be a sequence with

$$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)\rightarrow 0. \end{aligned}$$

Then \(v_n \to v^\dag\).

Proof

We argue by contraposition: Assume that \(v_n\) does not converge to \(v^\dag = u_i\) for some 1 ≤ i ≤ d. Then there exists an ε > 0 such that for every \(n_0\in \mathbb {N}\), there is an n ≥ n 0 with \(|v_n - v^\dag| > \varepsilon\), i.e., either \(v_n > u_i + \varepsilon\) or \(v_n < u_i - \varepsilon\). We now further discriminate these two cases. (Note that some cases cannot occur if i = 1 or i = d.)

  1. (i)

    v n  > u i+1: Then, v n  ∈ (u k , u k+1] for some k ≥ i + 1. The three point identity (8) yields that

    $$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)=d_g^{q_{i+1}}(v_n,u_{i+1})+d_g^{q^\dag}(u_{i+1},v^\dag)+(q_{i+1}-q^\dag)(v_n-u_{i+1})\end{aligned} $$

    for q i+1 ∈ ∂g(u i+1). We now estimate each term separately. The first term is nonnegative by the properties of Bregman distances. For the last term, we can use the assumption (13) and the pointwise characterization (3) to obtain

    $$\displaystyle \begin{aligned} q^\dag\in\left(\tfrac{1}{2}(u_i+u_{i-1}),\tfrac{1}{2}(u_i+u_{i+1})\right)\quad\text{and}\quad q_{i+1}\in\left[\tfrac{1}{2}(u_{i+1}+u_i),\tfrac{1}{2}(u_{i+1}+u_{i+2})\right],\end{aligned} $$

    which implies that q i+1 − q  > 0. By assumption we have v n  − u i+1 > 0, which together implies that the last term is strictly positive. For the second term, we can use that v , u i+1 ∈ [u i , u i+1] to simplify the Bregman distance to

    $$\displaystyle \begin{aligned} d_g^{q^\dag}(u_{i+1},v^\dag)=\frac{1}{2}(u_{i+1}-u_i)(u_{i+1}+u_i-2q^\dag) >0,\end{aligned} $$

    again by assumption (13). Since this term is independent of n, we obtain the estimate

    $$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)>d_g^{q^\dag}(u_{i+1},v^\dag)=:\varepsilon_1>0.\end{aligned} $$
  2. (ii)

    u i  < v n  ≤ u i+1: In this case, we can again simplify

    $$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)=\frac{1}{2}(u_{i+1}+u_i-2q^\dag)(v_n-v^\dag)>C_1\varepsilon,\end{aligned} $$

    since \(C_1:=\frac {1}{2}(u_{i+1}+u_i-2q^\dag )>0\) by assumption (13) and v n  − v  > ε by hypothesis.

  3. (iii)

    v n  < u i : We argue similarly to either obtain

    $$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)&>d_g^{q^\dag}(u_{i-1},v^\dag)=:\varepsilon_2>0 \end{aligned} $$

    or

    $$\displaystyle \begin{aligned} d_g^{q^\dag}(v_n,v^\dag)&>C_2\varepsilon \end{aligned} $$

    for \(C_2:=-\frac {1}{2}(u_{i-1}+u_i-2q^\dag )>0\).

Thus if we set \(\tilde \varepsilon :=\min \{\varepsilon _1,\varepsilon _2,C_1\varepsilon ,C_2\varepsilon \}\), for every \(n_0\in \mathbb {N}\) we can find n ≥ n 0 such that \(d_g^{q^\dag }(v_n,v^\dag )>\tilde \varepsilon >0\). Hence, \(d_g^{q^\dag }(v_n,v^\dag )\) cannot converge to 0. □

Assumption (13) can be interpreted as a strict complementarity condition for \(q^\dag\) and \(v^\dag\). Comparing (13) to (3), we point out that such a choice of \(q^\dag\) is always possible. If \(v^\dag \notin \{u_1,\dots,u_d\}\), on the other hand, convergence in Bregman distance is uninformative.

Lemma 2

Let \(v^\dag \in (u_i, u_{i+1})\) for some 1 ≤ i < d and \(q^\dag \in \partial g(v^\dag)\). Then we have

$$\displaystyle \begin{aligned} d_{g}^{q^\dag}(v,v^\dag)=0\qquad\mathit{\text{for any}}\quad v\in[u_i,u_{i+1}]. \end{aligned}$$

Proof

By the definition of the Bregman distance and the characterization (3) of \(\partial g(v^\dag)\) (which is single-valued under the assumption on \(v^\dag\)), we directly obtain

$$\displaystyle \begin{aligned} d_g^{q^\dag}(v,v^\dag)=\frac{1}{2}\left[(u_i+u_{i+1})v-u_iu_{i+1}\right]-\frac{1}{2}[(u_i+u_{i+1})v^\dag-u_iu_{i+1}]\\ -\frac{1}{2}(u_i+u_{i+1})(v-v^\dag)=0 \end{aligned} $$

for any v ∈ [u i , u i+1]. □
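The dichotomy between Lemma 1 and Lemma 2 can be illustrated numerically. The following sketch (illustration only) evaluates \(d_g^{q^\dag}(\cdot,v^\dag)\) for the example values of Fig. 1, once for a vertex \(v^\dag\) with a strictly complementary \(q^\dag\) as in (13), where the distance vanishes only at \(v^\dag\), and once for an interior \(v^\dag\), where it vanishes on the whole interval.

```python
import numpy as np

def g(v, u):
    """Multi-bang integrand from Sect. 2 (finite on [u[0], u[-1]])."""
    return max(0.5 * ((u[i] + u[i + 1]) * v - u[i] * u[i + 1]) for i in range(len(u) - 1))

def bregman(v, vdag, qdag, u):
    """Pointwise Bregman distance d_g^{qdag}(v, vdag)."""
    return g(v, u) - g(vdag, u) - qdag * (v - vdag)

if __name__ == "__main__":
    u = [0.0, 1.0, 2.0]
    vs = np.linspace(0.0, 2.0, 9)

    # Lemma 1: vdag = u_2 = 1 with qdag = 1 strictly inside (0.5, 1.5), cf. (13);
    # the distance vanishes only at v = 1, so d_g -> 0 forces v -> vdag
    print([round(bregman(v, 1.0, 1.0, u), 3) for v in vs])

    # Lemma 2: vdag = 0.5 lies in (u_1, u_2), so qdag = 0.5*(u[0] + u[1]) = 0.5;
    # the distance is identically zero on [u_1, u_2] and thus uninformative there
    print([round(bregman(v, 0.5, 0.5, u), 3) for v in vs])
```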

Lemma 1 allows us to translate the weak convergence from Proposition 3 to pointwise convergence, which is the main result of our work.

Theorem 1

Assume the conditions of Proposition 3 hold. If \(u^\dag(x) \in \{u_1,\dots,u_d\}\) almost everywhere, then the subsequence from Proposition 3 converges to \(u^\dag\) pointwise almost everywhere.

Proof

From Proposition 3, we obtain a subsequence \(\{u_n\}_{n\in \mathbb {N}}\) of \(\{u_{\alpha _n}^{\delta _n}\}_{n\in \mathbb {N}}\) converging weakly to u . Since \(\mathcal {G}\) is convex and lower semicontinuous, we have that

$$\displaystyle \begin{aligned} \mathcal{G}(u^\dag)\leq \liminf_{n\to\infty}\mathcal{G}(u_n)\leq \lim_{n\to\infty}\mathcal{G}(u_n). \end{aligned} $$
(14)

By the minimizing properties of \(\{u_n\}_{n\in \mathbb {N}}\) and the nonnegativity of the discrepancy term, we further obtain that

$$\displaystyle \begin{aligned} \alpha_n\mathcal{G}(u_n) \leq \frac{1}{2}\| Ku_n-y^{\delta_n} \|{}_{Y}^2+\alpha_n\mathcal{G}(u_n)\leq \frac{\delta_n^2}{2}+\alpha_n\mathcal{G}(u^\dag). \end{aligned}$$

Dividing this inequality by α n and passing to the limit n →∞, the assumption on α n from Proposition 3 yields that

$$\displaystyle \begin{aligned} \lim_{n\to\infty}\mathcal{G}(u_n)\leq \mathcal{G}(u^\dag), \end{aligned}$$

which combined with (14) gives \(\lim _{n\to \infty }\mathcal {G}(u_n)=\mathcal {G}(u^\dag )\). Hence, together with \(u_n\rightharpoonup u^\dag \), we obtain that \(d_{\mathcal {G}}^{p^\dag }(u_n,u^\dag )\to 0\) for any \(p^\dag \in \partial \mathcal {G}(u^\dag )\). By the pointwise characterization (9) and the nonnegativity of Bregman distances, this implies that \(d_g^{p^\dag (x)}(u_n(x),u^\dag (x))\to 0\) for almost every x ∈ Ω. Choosing now \(p^\dag \in \partial \mathcal {G}(u^\dag )\) such that (13) holds for \(q^\dag = p^\dag(x)\) and \(v^\dag = u^\dag(x)\) almost everywhere, the claim follows from Lemma 1. □

Since \(u_n(x) \in [u_1, u_d]\) by construction, the subsequence \(\{u_n\}_{n\in \mathbb {N}}\) is bounded in \(L^\infty(\Omega)\) and hence also converges strongly in \(L^p(\Omega)\) for any 1 ≤ p < ∞ by Lebesgue’s dominated convergence theorem. We remark that since Lemma 1 applied to \(u_n(x)\) and \(u^\dag(x)\) does not hold uniformly in Ω, we cannot expect that the convergence rates from Propositions 4 and 5 hold pointwise or strongly as well.

5 Structure of Minimizers

We now briefly discuss the structure of reconstructions obtained by minimizing the Tikhonov functional in (7) for given y δ ∈ Y and fixed α > 0, based on the necessary optimality conditions for (7). Since the discrepancy term is convex and differentiable, we can apply the sum rule for convex subdifferentials. Furthermore, the standard calculus for Fenchel conjugates and subdifferentials (see, e.g., [22]) yields for \(\mathcal {G}_\alpha :=\alpha \mathcal {G}\) that \(\mathcal {G}_\alpha ^*(p)=\alpha \mathcal {G}^*(\alpha ^{-1}p)\) and hence that \(p\in \partial \mathcal {G}_\alpha (u)\) if and only if \(u\in \partial \mathcal {G}_\alpha ^*(p)=\partial \mathcal {G}^*(\tfrac 1\alpha p)\). We thus obtain as in [8] that \(\bar u:=u_\alpha ^\delta \in L^2(\Omega )\) is a solution to (7) if and only if there exists a \(\bar p\in L^2(\Omega )\) satisfying

$$\displaystyle \begin{aligned} \left\{\begin{aligned} \bar p & = K^{*}(y^\delta - K\bar u)\\ \bar u &\in \partial\mathcal{G}_\alpha^*(\bar p) := \begin{cases} \{u_i\} & \bar p(x) \in Q_i,\qquad 1\leq i \leq d,\\ {} [u_i,u_{i+1}] &\bar p(x) \in Q_{i,i+1}\quad\ 1\leq i< d. \end{cases} \end{aligned}\right.\end{aligned} $$
(15)

for

$$\displaystyle \begin{aligned} Q_1 &=\left\{q:q <\tfrac\alpha2 (u_1+u_2)\right\},\\ Q_i &=\left\{q:\tfrac\alpha2(u_{i-1}+u_i) < q < \tfrac\alpha2(u_{i}+u_{i+1})\right\},\quad 1<i<d,\\ Q_d &=\left\{q:q >\tfrac\alpha2 (u_{d-1}+u_d)\right\},\\ Q_{i,i+1}&= \left\{q:q = \tfrac\alpha2(u_{i}+u_{i+1})\right\}, \qquad\qquad\qquad\quad 1\leq i < d.\end{aligned} $$

Here we have made use of the pointwise characterization in (4) and reformulated the case distinction in terms of \(\bar p(x)\) instead of \(\frac 1\alpha \bar p(x)\).
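The second relation of (15) is a simple pointwise lookup. The following sketch (illustration only) evaluates it for given \(\bar p\), admissible values, and α, and flags points of the singular set, where the value is only determined up to an interval (the sketch then returns the lower endpoint u i ).

```python
import numpy as np

def multibang_selection(p, u, alpha):
    """Evaluate the pointwise relation u(x) in dG_alpha^*(p(x)) from (15).

    For p strictly between consecutive scaled midpoints alpha/2*(u_i + u_{i+1})
    the value u_i is uniquely determined; exactly at a midpoint any value in
    [u_i, u_{i+1}] is admissible (the singular set S): the sketch then returns
    u_i and sets the corresponding flag."""
    u = np.asarray(u, dtype=float)
    thresholds = 0.5 * alpha * (u[:-1] + u[1:])       # alpha/2 * (u_i + u_{i+1})
    p = np.atleast_1d(np.asarray(p, dtype=float))
    idx = np.searchsorted(thresholds, p)              # which set Q_i the value p falls into
    ubar = u[idx]
    singular = np.isclose(p[:, None], thresholds[None, :]).any(axis=1)
    return ubar, singular

if __name__ == "__main__":
    u = [0.0, 0.1, 0.15]
    alpha = 1e-3
    p = np.array([-1e-3, 2e-5, 5e-5, 1e-4, 1.25e-4, 2e-4])
    val, sing = multibang_selection(p, u, alpha)
    print(val)    # [0.   0.   0.   0.1  0.1  0.15]
    print(sing)   # [False False  True False  True False]: the two midpoints are singular
```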

First, we obtain directly from (15) the desired structure of the reconstruction \(\bar u\): Apart from a singular set

$$\displaystyle \begin{aligned} \mathcal{S}:=\left\{x\in \Omega:\bar p(x) = \tfrac\alpha2(u_i+u_{i+1})\text{ for some }1\leq i <d\right\}, \end{aligned}$$

we always have \(\bar u(x) \in \{u_1,\dots ,u_d\}\). For operators K where \(K^*w\) cannot be constant on a set of positive measure unless w = 0 locally (as is the case for many operators involving solutions to partial differential equations; see [8, Prop. 2.3]) and \(y^\delta\notin{{\mathrm{ran}}} K\), the singular set \(\mathcal {S}\) has zero measure and hence the “multi-bang” structure \(\bar u(x)\in \{u_1,\dots ,u_d\}\) almost everywhere can be guaranteed a priori for any α > 0.

Furthermore, we point out that the regularization parameter α only enters via the case distinction. In particular, increasing α shifts the conditions on \(\bar u(x)\) such that the smaller values among the u i become more preferred. In fact, if \(\bar p\) is bounded, we can expect that there exists an α 0 > 0 such that \(\bar u\equiv u_1\) for all α > α 0. Conversely, for α → 0, the second line of (15) reduces to

$$\displaystyle \begin{aligned} \bar u(x) \in \begin{cases} \{u_1\} &\text{if } \bar p(x)<0,\\ \{u_d\} &\text{if } \bar p(x)>0,\\ {} [u_1,u_d] &\text{if }\bar p(x)=0, \end{cases} \end{aligned}$$

i.e., (15) coincides with the well-known optimality conditions for bang-bang control problems; see, e.g., [25, Lem. 2.26]. Since in the context of inverse problems, we only have α = α(δ) → 0 if δ → 0, the limit system (15) will contain consistent data and hence \(\bar p\equiv 0\). This allows recovery of \(u^\dag(x) \in \{u_2,\dots,u_{d-1}\}\) on a set of positive measure, consistent with Theorem 1. However, if \(u^\dag(x) \in \{u_1,\dots,u_d\}\) does not hold almost everywhere, we can only expect weak and not strong convergence, cf. [10, Prop. 5.10 (ii)].

6 Numerical Solution

In this section we address the numerical solution of the Tikhonov minimization problem (7) for given y δ ∈ Y and α > 0, following [9]. For the sake of presentation, we omit the dependence on α and δ from here on. We start from the necessary (and, due to convexity, sufficient) optimality conditions (15). To apply a semismooth Newton method, we replace the subdifferential inclusion \(\bar u\in \partial \mathcal {G}_\alpha ^*(\bar p)\) by its single-valued Moreau–Yosida regularization, i.e., we consider for γ > 0 the regularized optimality conditions

$$\displaystyle \begin{aligned} \left\{\begin{aligned} p_\gamma & = K^{*}(y^\delta - Ku_\gamma)\\ u_\gamma & = (\partial\mathcal{G}_\alpha^*)_\gamma(p_\gamma). \end{aligned}\right. \end{aligned} $$
(16)

The Moreau–Yosida regularization can also be expressed as

$$\displaystyle \begin{aligned} H_\gamma:=(\partial\mathcal{G}_\alpha^*)_\gamma = \partial(\mathcal{G}_{\alpha,\gamma})^* \end{aligned}$$

for

$$\displaystyle \begin{aligned} \mathcal{G}_{\alpha,\gamma}(u) := \alpha\mathcal{G}(u) + \frac\gamma2\| u \|{}_{L^2(\Omega)}^2, \end{aligned}$$

see, e.g., [3, Props. 13.21, 12.29]. This implies that for (u γ , p γ ) satisfying (16), u γ is a solution to the strictly convex problem

$$\displaystyle \begin{aligned} \min_{u\in L^2(\Omega)} \frac 12\| Ku-y^\delta \|{}_Y^2 + \alpha\mathcal{G}(u) + \frac\gamma2\| u \|{}_{L^2(\Omega)}^2, \end{aligned}$$

so that existence of a solution can be shown by the same arguments as for (7). Note that by regularizing the conjugate subdifferential, we have not smoothed the nondifferentiability but merely made the functional (more) strongly convex. The regularization of \(\mathcal {G}_\alpha ^*\) instead of \(\mathcal {G}^*\) also ensures that the regularization is robust for α → 0. From [9, Prop. 4.1], we obtain the following convergence result.

Proposition 6

The family {u γ } γ>0 satisfying (16) contains at least one subsequence \(\{u_{\gamma _n}\}_{n\in \mathbb {N}}\) converging to a global minimizer of (7) as n →∞. Furthermore, for any such subsequence, the convergence is strong.

From [11, Appendix A.2] we further obtain the pointwise characterization

$$\displaystyle \begin{aligned}[H_\gamma(p)](x) = \begin{cases} u_i & \text{if }p(x)\in Q^\gamma_i,\qquad 1\leq i\leq d,\\ \tfrac1\gamma(p(x)-\tfrac\alpha2(u_i+u_{i+1})) & \text{if }p(x) \in Q^\gamma_{i,i+1},\quad 1\leq i < d, \end{cases} \end{aligned}$$

where

$$\displaystyle \begin{aligned} Q_1^\gamma &= \left\{q:q< \tfrac\alpha2\left((1+ {2\gamma})u_1+u_2\right)\right\},\\ Q_i^\gamma &= \left\{q:\tfrac\alpha2\left(u_{i-1} + (1 + {2\gamma})u_i\right) < q < \tfrac\alpha2\left((1+ {2\gamma})u_i+u_{i+1}\right)\right\} \quad \text{ for } 1<i<d,\\ Q_d^\gamma &= \left\{q:\tfrac\alpha2\left(u_{d-1} + (1 + {2\gamma})u_d\right) < q\right\},\\ Q_{i,i+1}^\gamma &= \left\{q:\tfrac\alpha2\left((1 + {2\gamma})u_i+u_{i+1}\right) \leq q \leq \tfrac\alpha2\left(u_i+(1+ {2\gamma})u_{i+1}\right)\right\} \quad\text{for } 1 \leq i < d. \end{aligned} $$

Since H γ is a superposition operator defined by a Lipschitz continuous and piecewise differentiable scalar function, H γ is Newton-differentiable from L r( Ω) → L 2( Ω) for any r > 2; see, e.g., [18, Example 8.12] or [26, Theorem 3.49]. A Newton derivative at p in direction h is given pointwise almost everywhere by

$$\displaystyle \begin{aligned}[D_N H_\gamma(p)h](x) = \begin{cases} \frac 1\gamma h(x) & \text{if }p(x)\in Q^\gamma_{i,i+1},\quad 1\leq i < d,\\ 0 &\text{else.} \end{cases} \end{aligned}$$

Hence if the range of \(K^*\) embeds into L r( Ω) for some r > 2 (which is the case, e.g., for many convolution operators and solution operators for partial differential equations) and the semismooth Newton step is uniformly invertible, the corresponding Newton iteration converges locally superlinearly. We address this for the concrete example considered in the next section. In practice, the local convergence can be addressed by embedding the Newton method into a continuation strategy, i.e., starting for γ large and then iteratively reducing γ, using the previous solution as a starting point.
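For reference, H γ can also be evaluated pointwise directly from the definition of \(\mathcal{G}_{\alpha,\gamma}\) above, by minimizing \(v\mapsto \alpha g(v) + \tfrac\gamma2 v^2 - p(x)v\) over each interval [u i , u i+1] and keeping the best candidate. The following sketch (illustration only, not the linked implementation) does exactly that and can serve as a cross-check of the closed-form case distinction.

```python
import numpy as np

def H_gamma(p, u, alpha, gamma):
    """Evaluate [H_gamma(p)](x) pointwise as argmin_v alpha*g(v) + gamma/2*v^2 - p*v.

    On each interval [u_i, u_{i+1}] the objective uses the affine piece
    alpha/2*((u_i+u_{i+1})*v - u_i*u_{i+1}) of g; its minimizer there is
    (p - alpha/2*(u_i+u_{i+1}))/gamma projected onto the interval. Since the
    full objective is convex, the global minimizer is the best such candidate."""
    u = np.asarray(u, dtype=float)
    p = np.atleast_1d(np.asarray(p, dtype=float))
    best_val = np.full(p.shape, np.inf)
    best_arg = np.zeros_like(p)
    for i in range(len(u) - 1):
        mid = 0.5 * (u[i] + u[i + 1])
        cand = np.clip((p - alpha * mid) / gamma, u[i], u[i + 1])
        obj = alpha * (mid * cand - 0.5 * u[i] * u[i + 1]) + 0.5 * gamma * cand**2 - p * cand
        better = obj < best_val
        best_val = np.where(better, obj, best_val)
        best_arg = np.where(better, cand, best_arg)
    return best_arg

if __name__ == "__main__":
    u = [0.0, 0.1, 0.15]
    alpha, gamma = 1e-3, 1e-6
    p = np.linspace(-1e-4, 3e-4, 9)
    print(H_gamma(p, u, alpha, gamma))
    # for gamma << alpha this is close to the selection from (15), with narrow
    # linear transition regions around the midpoints alpha/2*(u_i + u_{i+1})
```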

7 Numerical Examples

We illustrate the proposed approach for an inverse source problem for the Poisson equation, i.e., we choose K = A −1 : L 2( Ω) → L 2( Ω) for Ω = [0, 1]2 and A = − Δ together with homogeneous boundary conditions. We note that since Ω is convex, we have that \({{\mathrm {ran}}} A^{-*}={{\mathrm {ran}}} A^{-1} = H^2(\Omega )\cap H^1_0(\Omega )\), and hence this operator satisfies the conditions discussed in Sect. 5 that guarantee that \(u_\alpha ^\delta (x)\in \{u_1,\dots ,u_d\}\) almost everywhere if \(y^\delta\notin{{\mathrm{ran}}} K\); see [8, Prop. 2.3]. For the computational results below, we use a finite element discretization on a uniform triangular grid with 256 × 256 vertices.

The specific form of K can be used to reformulate the optimality condition (and hence the Newton system) into a more convenient form. Introducing y γ  = A −1 u γ and eliminating u γ using the second relation of (16), we obtain as in [8] the equivalent system

$$\displaystyle \begin{aligned} \left\{\begin{aligned} A^*p_\gamma + y_\gamma - y^\delta &=0,\\ Ay_\gamma - H_\gamma(p_\gamma) &=0. \end{aligned}\right.\end{aligned} $$
(17)

Setting \(V:=H^1_0(\Omega )\), we can consider this as an equation from V × V to \(V^*\times V^*\), which due to the embedding V ↪L p( Ω) for p > 2 provides the necessary norm gap for Newton differentiability of H γ . By the chain rule for Newton derivatives from, e.g., [18, Lem. 8.4], the corresponding Newton step therefore consists of solving for (δy, δp) ∈ V × V given (y k, p k) ∈ V × V in

$$\displaystyle \begin{aligned} \begin{pmatrix} {{\mathrm{Id}}} & A^*\\ A & -D_N H_\gamma(p^k) \end{pmatrix} \begin{pmatrix}\delta y \\ \delta p\end{pmatrix} = -\begin{pmatrix} A^*p^k + y^k - y^\delta\\ Ay^k - H_\gamma(p^k) \end{pmatrix}\end{aligned} $$
(18)

and setting

$$\displaystyle \begin{aligned} y^{k+1} = y^k + \delta y,\qquad p^{k+1} = p^k + \delta p.\end{aligned} $$

Note that the reformulated Newton matrix is symmetric, which in general is not the case for nonsmooth equations. Following [8, Prop. 4.3], the Newton step (18) is uniformly boundedly invertible, from which local superlinear convergence to a solution of (17) follows.

In practice, we include the continuation strategy described above as well as a simple backtracking line search based on the residual norm in (17) to improve robustness. Since the forward operator is linear and H γ is piecewise linear, the semismooth Newton method has the following finite termination property: If H γ (p k+1) = H γ (p k), then (y k+1, p k+1) satisfies (17); cf. [18, Rem. 7.1.1]. We then recover u k+1 = H γ (p k+1). In the implementation, we also terminate if more than 100 Newton iterations are performed, in which case the continuation is stopped and the last successful iterate is returned; otherwise, the continuation is terminated once γ < 10−12. In all results reported below, the continuation is terminated successfully. The implementation of this approach used to obtain the following results can be downloaded from https://github.com/clason/discreteregularization.
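To indicate the structure of the method, the following simplified Python sketch assembles and solves the Newton step (18) for a small finite-difference discretization of A = −Δ. It is an illustration only and differs from the linked implementation (which uses a finite element discretization, continuation in γ, and a line search); the values of α, γ, and the noise level are chosen arbitrarily, and H γ is evaluated from its variational definition as in the previous sketch.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def laplacian_2d(n):
    """Five-point finite-difference approximation of A = -Laplace on (0,1)^2
    with homogeneous Dirichlet conditions and n x n interior nodes."""
    h = 1.0 / (n + 1)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    return (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))) / h**2

def H_and_newton_deriv(p, u, alpha, gamma):
    """Pointwise H_gamma(p), evaluated by minimizing alpha*g(v) + gamma/2*v^2 - p*v
    over each interval [u_i, u_{i+1}], together with the diagonal of a Newton
    derivative (1/gamma where the minimizer is not an admissible value, 0 else)."""
    u = np.asarray(u, float)
    best = np.full(p.shape, np.inf)
    Hp = np.zeros_like(p)
    for i in range(len(u) - 1):
        mid = 0.5 * (u[i] + u[i + 1])
        cand = np.clip((p - alpha * mid) / gamma, u[i], u[i + 1])
        obj = alpha * (mid * cand - 0.5 * u[i] * u[i + 1]) + 0.5 * gamma * cand**2 - p * cand
        Hp = np.where(obj < best, cand, Hp)
        best = np.minimum(obj, best)
    transition = ~np.isclose(Hp[:, None], u[None, :]).any(axis=1)
    return Hp, transition / gamma

def newton_step(A, y, p, ydelta, u, alpha, gamma):
    """Assemble and solve one semismooth Newton step (18) for the system (17)."""
    N = A.shape[0]
    Hp, dH = H_and_newton_deriv(p, u, alpha, gamma)
    r = np.concatenate([A @ p + y - ydelta, A @ y - Hp])   # A is symmetric, so A* = A
    M = sp.bmat([[sp.identity(N), A], [A, -sp.diags(dH)]], format="csc")
    step = spsolve(M, -r)
    return y + step[:N], p + step[N:], np.linalg.norm(r)

if __name__ == "__main__":
    n, u, alpha, gamma = 32, [0.0, 0.1, 0.15], 1e-6, 1e-9   # illustrative values only
    A = laplacian_2d(n).tocsc()
    x = np.linspace(0, 1, n + 2)[1:-1]
    X, Y = np.meshgrid(x, x, indexing="ij")
    utrue = np.where((X - 0.45)**2 + (Y - 0.55)**2 < 0.1, u[1], u[0])
    utrue = np.where((X - 0.4)**2 + (Y - 0.6)**2 < 0.02, u[2], utrue).ravel()
    ydelta = spsolve(A, utrue) + 1e-5 * np.random.default_rng(0).standard_normal(n * n)
    y, p = np.zeros(n * n), np.zeros(n * n)
    for k in range(10):                       # a few Newton steps, for illustration only
        y, p, res = newton_step(A, y, p, ydelta, u, alpha, gamma)
        print(k, res)                         # residual norm of (17) at each iterate
    print(np.unique(np.round(H_and_newton_deriv(p, u, alpha, gamma)[0], 4)))
    # the recovered parameter values lie in [u_1, u_d] by construction of H_gamma
```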

The first example illustrates the convergence behavior of the Tikhonov regularization. Here, the true parameter is chosen as

$$\displaystyle \begin{aligned} \begin{aligned}[b] u^\dag(x) = u_1 &+ u_2 \,\chi_{\{x:(x_1 - 0.45)^2 + (x_2 - 0.55)^2 < 0.1\}}(x)\\ &+ (u_3-u_2)\, \chi_{\{x:(x_1 - 0.4)^2 + (x_2 - 0.6)^2 < 0.02\}}(x) \end{aligned} \end{aligned} $$
(19)

for (u 1, u 2, u 3) = (0, 0.1, 0.15); see Fig. 2a. (This might correspond to, e.g., material properties of background, healthy tissue, and tumor, respectively.) The noisy data is constructed pointwise via

$$\displaystyle \begin{aligned} y^\delta = y^\dag + (\tilde\delta \|\,y^\dag \|{}_{\infty})\xi, \end{aligned}$$

where ξ is a vector of independent, identically distributed normal random variables with mean 0 and variance 1, and \(\tilde \delta \in \{2^0,\dots ,2^{-20}\}\). For each value of \(\tilde \delta \), the corresponding regularization parameter α is chosen according to the discrepancy principle (12) with τ = 1.1. Details on the convergence history are reported in Table 1, which shows the effective noise level \(\delta := \|\,y^\delta - y^\dag\|{ }_2\), the parameter α selected by the Morozov discrepancy principle, the L 2-error \(e_2:=\| u_{\alpha }^\delta -u^\dag \|{ }_2\) and the L ∞-error \(e_\infty :=\| u_\alpha ^\delta -u^\dag \|{ }_\infty \). First, we note that the a posteriori choice approximately follows the a priori choice α ∼ δ. Similarly, for larger values of δ, the L 2-error behaves as e 2 ∼ δ, which is no longer true for δ → 0 (and cannot be expected due to the nonsmooth regularization). The L ∞-error e ∞ is initially dominated by the jump in admissible parameter values: As long as there is a single point x ∈ Ω with \(u_\alpha ^\delta (x) = u_i \neq u_j = u^\dag (x)\), we necessarily have \(e_\infty \geq \min_{1\leq i<d} (u_{i+1} - u_i)\). (Recall that we do not have a convergence rate and thus an error bound for pointwise convergence.) Later, e ∞ becomes smaller than this threshold value, which indicates that apart from points in the regularized singular set (i.e., where \(p_\gamma (x)\in Q^\gamma _{i,i+1}\), which in these cases happens for 20 out of 256 × 256 vertices), the reconstruction is exact. Here we point out that since γ is independent of α, the Moreau–Yosida regularization for fixed γ becomes more and more active as α → 0. Nevertheless, in all cases γ ≪ α, and hence the multi-bang regularization dominates.
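For orientation, the noise construction and the α-scan for the discrepancy principle can be sketched as follows; the function reconstruct is a hypothetical placeholder for the Tikhonov solver of (7) (e.g., the semismooth Newton method of Sect. 6) and is not defined here, and K stands for the discretized forward operator.

```python
import numpy as np

def add_noise(ydag, delta_tilde, rng):
    """Noisy data y^delta = y^dag + (delta_tilde * ||y^dag||_inf) * xi with xi ~ N(0, 1)."""
    xi = rng.standard_normal(ydag.shape)
    return ydag + delta_tilde * np.abs(ydag).max() * xi

def choose_alpha_discrepancy(K, ydag, ydelta, reconstruct, tau=1.1):
    """Select alpha by the Morozov discrepancy principle (12).

    `reconstruct(alpha, ydelta)` is a hypothetical placeholder for the Tikhonov
    solver of (7); alpha is reduced geometrically until the residual drops to
    tau * delta (a finer grid may be needed so that it also stays above delta,
    cf. (12))."""
    delta = np.linalg.norm(ydelta - ydag)            # effective noise level
    for alpha in 2.0 ** -np.arange(0, 40):
        u = reconstruct(alpha, ydelta)
        if np.linalg.norm(K @ u - ydelta) <= tau * delta:
            return alpha, u
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ydag = np.sin(np.pi * np.linspace(0, 1, 100))    # stand-in for exact data y^dag
    ydelta = add_noise(ydag, delta_tilde=2.0**-5, rng=rng)
    print(np.linalg.norm(ydelta - ydag))             # effective noise level delta
```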

Fig. 2

True parameter u for u 3 = 0.15 and reconstructions \(u_\alpha ^\delta \) for different values of δ. (a) u . (b) \(u_\alpha ^\delta \) for δ ≈ 1.89 × 10−1. (c) \(u_\alpha ^\delta \) for δ ≈ 2.37 × 10−2. (d) \(u_\alpha ^\delta \) for δ ≈ 3.69 × 10−4

Table 1 Convergence behavior as δ → 0 for u 3 = 0.15: noise level δ, regularization parameter α, L 2-error e 2, L ∞-error e ∞

The pointwise convergence can also be seen clearly from Fig. 2, which shows the true parameter u together with three representative reconstructions for different noise levels. It can be seen that for large noise, the corresponding large regularization suppresses the smaller inclusion; see Fig. 2b. This is consistent with the discussion at the end of Sect. 5. For smaller noise, the inclusion is recovered well (Fig. 2c), and for δ ≈ 3.69 × 10−4, the reconstruction is visually indistinguishable from the true parameter (Fig. 2d).

The behavior is essentially the same if we set (u 1, u 2, u 3) = (0, 0.1, 0.11) in (19) (i.e., a contrast of 10% instead of 50% for the inner inclusion), demonstrating the robustness of the multi-bang regularization; see Fig. 3 and Table 2.

Fig. 3

True parameter u for u 3 = 0.11 and reconstructions \(u_\alpha ^\delta \) for different values of δ. (a) u . (b) \(u_\alpha ^\delta \) for δ ≈ 1.68 × 10−1. (c) \(u_\alpha ^\delta \) for δ ≈ 2.17 × 10−2. (d) \(u_\alpha ^\delta \) for δ ≈ 3.29 × 10−4

Table 2 Convergence behavior as δ → 0 for u 3 = 0.11: noise level δ, regularization parameter α, L 2-error e 2, L ∞-error e ∞

To illustrate the behavior if the true parameter does not satisfy the assumption u ∈{u 1, …, u d } almost everywhere, we repeat the above for

$$\displaystyle \begin{aligned} \begin{aligned}[b] u^\dag(x) = u_1 &+ u_2 \,\chi_{\{x:(x_1 - 0.45)^2 + (x_2 - 0.55)^2 < 0.1\}}(x)\\ &+ (u_3-u_2)(1-x_1)\, \chi_{\{x:(x_1 - 0.4)^2 + (x_2 - 0.6)^2 < 0.02\}}(x) \end{aligned} \end{aligned}$$

with (u 1, u 2, u 3) = (0, 0.1, 0.12); see Fig. 4a. While for large noise level and regularization parameter value, the multi-bang regularization behaves as before (see Fig. 4b), the reconstruction for smaller noise and regularization (Fig. 4c) shows the typical checkerboard pattern expected from weak but not strong convergence; cf. [8, Rem. 4.2]. Nevertheless, as δ → 0, we still observe convergence to the true parameter; see Fig. 4d and Table 3.

Fig. 4

True parameter u and reconstructions \(u_\alpha ^\delta \) for different values of δ. (a) u . (b) \(u_\alpha ^\delta \) for δ ≈ 2.11 × 10−2. (c) \(u_\alpha ^\delta \) for δ ≈ 3.29 × 10−4. (d) \(u_\alpha ^\delta \) for δ ≈ 1.29 × 10−6

Table 3 Convergence behavior as δ → 0 for \(u^\dag\): noise level δ, regularization parameter α, L 2-error e 2, L ∞-error e ∞

Finally, we address the qualitative dependence of the reconstruction on the regularization parameter α. Figure 5 shows reconstructions for the true parameter \(u^\dag\) from (19) again with (u 1, u 2, u 3) = (0, 0.1, 0.15) for an effective noise level δ ≈ 0.759 and different values of α. First, Fig. 5b presents the reconstruction for the value α = 1.25 × 10−3, where as before the volume corresponding to u 2 is reduced and the inner inclusion corresponding to u 3 is suppressed completely. If the parameter is chosen smaller, e.g., α = 10−4, the reconstruction of the outer volume is essentially correct, while the inner inclusion—although reduced—is also localized well; see Fig. 5c. Visually, this value yields a better reconstruction than the one obtained by the discrepancy principle. The trade-off is a loss of spatial regularity, manifested in more irregular level lines, which becomes even more pronounced for smaller α = 10−5; see Fig. 5d. This behavior is surprising insofar as the pointwise definition of the multi-bang penalty itself imposes no spatial regularity on the reconstruction at all; as is evident from (15), any regularity of the solution \(\bar u\) is solely due to that of the level sets of \(\bar p\) (which in this case has the regularity of a solution to a Poisson equation).

Fig. 5

True parameter u and reconstructions \(u_\alpha ^\delta \) for u 3 = 0.15, δ ≈ 7.59 × 10−1, and different α. (a) u . (b) \(u_\alpha ^\delta \) for α = 1.25 × 10−3. (c) \(u_\alpha ^\delta \) for α = 10−4. (d) \(u_\alpha ^\delta \) for α = 10−5

8 Conclusion

Reconstructions in inverse problems that take on values from a given discrete admissible set can be promoted via a convex penalty that leads to a convergent regularization method. While convergence rates can be shown with respect to the usual Bregman distance, if the true parameter to be reconstructed takes on values only from the admissible set, the convergence (albeit without rates) is actually pointwise. A semismooth Newton method allows the efficient and robust computation of Tikhonov minimizers.

This work can be extended in several directions. First, Fig. 5 demonstrates that regularization parameters chosen according to the discrepancy principle are not optimal with respect to the visual reconstruction quality. This motivates the development of new heuristic parameter choice rules that are adapted to the discrete-valued, pointwise nature of the multi-bang penalty. It would also be interesting to investigate whether an active set condition in the spirit of [28, 29] based on (13) can be used to obtain strong or pointwise convergence rates. A natural further step is the extension to nonlinear parameter identification problems, making use of the results of [9]. Finally, Fig. 5c, d suggest combining the multi-bang penalty with a total variation penalty to also promote regularity of the level lines of the reconstruction. The resulting problem is challenging both analytically and numerically, but would open up the possibility of application to electrical impedance tomography, which can be formulated as a parameter identification problem for the diffusion coefficient in an elliptic equation.