1 Introduction

We consider linear-quadratic optimal control problems where the optimal control is only allowed to take values in a discrete set \(\{u_1,\dots ,u_d\}\subset \mathbb {R}\) with \(u_1< \dots < u_d\) and \(d \in \mathbb {N}\). Such problems occur, e.g., in topology optimization, nondestructive testing, or medical imaging; a similar task also arises as a sub-step in segmentation or labeling problems in image processing. However, such problems are inherently nonconvex and, more importantly, not weakly lower semicontinuous and hence cannot be treated by standard techniques. A classical remedy is convex relaxation, where the nonconvex constraint \(u(x) \in \{u_1,\dots ,u_d\}\) is replaced by the convex constraint \(u(x)\in [u_1,u_d]\), but this ignores the desired intermediate values \(u_2,\dots ,u_{d-1}\). In [3, 5,6,7,8], it was therefore proposed to promote all desired control values using a convex multibang penalty

$$\begin{aligned} G(u): L^2(\Omega ) \rightarrow \mathbb {R}, \qquad u \mapsto \int \limits _\Omega g(u(x)) \mathrm {d}x, \end{aligned}$$

for a suitable convex integrand \(g: \mathbb {R}\rightarrow \mathbb {R}\) with a polyhedral epigraph whose vertices correspond to the desired control values \(u_1,\dots ,u_d\). We thus consider the multibang control problem

$$\begin{aligned} \min \limits _{u \in L^2(\Omega )} \frac{1}{2}\Vert Ku - z\Vert _Y^2 + \alpha G(u) \end{aligned}$$
(1.1)

with \(\alpha > 0\), \(z \in Y\) for a Hilbert space Y, and \(K:L^2(\Omega )\rightarrow Y\) a linear and continuous operator (e.g., the solution operator for a linear elliptic partial differential equation). Just as in \(L^1\) regularization for sparsity (and in linear optimization), it can be expected that minimizers are found at the vertices of G, thus yielding the desired structure. Furthermore, it was shown in [3, 4, 7] that this leads to a primal-dual optimality system that can be solved by a superlinearly convergent semismooth Newton method in function space [14, 22] if a suitable Moreau–Yosida approximation (of the Fenchel conjugate \(G^*\), see Proposition 2.2 below) is introduced. It turns out that this approximation can be expressed in primal form as

$$\begin{aligned} \min \limits _{u \in L^2(\Omega )} \frac{1}{2}\Vert Ku - z\Vert _Y^2 + \alpha G(u) + \frac{\gamma }{2} \Vert u\Vert ^2_{L^2(\Omega )} \end{aligned}$$
(1.2)

for a parameter \(\gamma >0\). We remark that this approach (i.e., applying the approximation to \(G^*\) instead of G) does not destroy the non-differentiability of G and hence preserves the structural properties of (1.1). Standard lower semicontinuity techniques can then be applied to show that the solutions to (1.2) converge weakly to the solution to (1.1) as \(\gamma \rightarrow 0\); see [7, § 4.1]. The aim of this paper is to establish strong convergence and in particular approximation error estimates for \(\Vert \bar{u} - u_\gamma \Vert _{L^2(\Omega )}\).

Let us recall some related literature and known results. For the case \(d=2\), we obtain the minimization problem

$$\begin{aligned} \min \limits _{u_1 \le u \le u_2} \frac{1}{2}\Vert Ku - z\Vert _Y^2. \end{aligned}$$
(1.3)

If the associated adjoint state satisfies \(\bar{p}(x) \ne 0\) almost everywhere, it is well known that \(\bar{u}\) exhibits a bang-bang structure, i.e., \(\bar{u}(x) \in \{u_1, u_2\}\) almost everywhere. This problem has been studied intensively in the literature; see [20, 21, 23, 24, 26] and the references therein (this list being far from complete). For this problem, a structural assumption was established in [24, 26] which controls the behavior of the adjoint state around a singular set and guarantees that the optimal control \(\bar{u}\) exhibits a bang-bang structure. Using this assumption, error estimates for the approximation of (1.3) can be proven; see [24]. A related question is the Moreau–Yosida approximation of state constraints; see [10, 11].

If \(d=3\) and \(u_1< u_2 = 0 < u_3\), the problem (1.1) resembles the minimization problem

$$\begin{aligned} \min \limits _{u_1 \le u \le u_3} \frac{1}{2}\Vert Ku - z\Vert _Y^2 + \alpha \Vert u\Vert _{L^1(\Omega )}, \end{aligned}$$
(1.4)

see, e.g., [20]. The structural assumption used to prove error rates for the approximation of (1.3) can be generalized to problem (1.4). Again, approximation error estimates can be proven; see [23, 24, 26] and the references therein.

We generalize this structural assumption to the multibang control problem (1.1) and show that it is sufficient to guarantee that an optimal control \(\bar{u}\) of (1.1) satisfies \(\bar{u}(x) \in \{u_1,\dots ,u_d\}\) for almost all \(x \in \Omega \). Furthermore, we use this condition to prove approximation error estimates of the form

$$\begin{aligned} \Vert \bar{u} - u_\gamma \Vert _{L^2(\Omega )} = \mathcal {O} \left( \gamma ^{\frac{\kappa }{2}} \right) \end{aligned}$$

with a constant \(\kappa >0\) depending only on the structural assumption.

The paper is organized as follows. In Sect. 2 we recall some preliminary results which are needed for the convergence analysis. Our structural assumption is introduced in Sect. 3 and used to derive the approximation error estimates, which constitute the main result of this paper. In Sect. 4, we establish discretization error estimates under our structural assumption. We introduce an active set method for the solution of (1.2) and show its equivalence to a semismooth Newton method in Sect. 5. Finally, numerical results supporting our theoretical findings can be found in Sect. 6.

2 Preliminary results

Let \(u_1< u_2< \dots < u_d\) be some given real numbers with \(d \ge 2\), and let \(\Omega \subset \mathbb {R}^n\) be a bounded domain. Following [3, 5,6,7], we define the piecewise linear function

$$\begin{aligned} g(v) := {\left\{ \begin{array}{ll} \frac{1}{2}( (u_i + u_{i+1})v - u_i u_{i+1} ) &{}\text {if } v \in [u_i, u_{i+1}], \quad 1 \le i < d,\\ \infty &{} \text {else}. \end{array}\right. } \end{aligned}$$

As the pointwise supremum of affine functions, g is convex and lower semicontinuous, and it is continuous on its domain \(\mathrm {dom}(g) = [u_1,u_d]\). Hence, the corresponding integral functional

$$\begin{aligned} G: L^2(\Omega ) \rightarrow \mathbb {R}, \qquad u \mapsto \int \limits _\Omega g(u(x)) \mathrm {d}x, \end{aligned}$$

is proper, convex and weakly lower semicontinuous as well; see, e.g., [2, Proposition 2.53].
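For illustration, for \(d = 3\) and \((u_1,u_2,u_3) = (-1,0,1)\) the definition reduces to \(g(v) = \frac{1}{2}|v|\) on \([-1,1]\), so that (1.1) recovers (up to scaling) the \(L^1\)-penalized problem (1.4). The following Python sketch (ours, for illustration only; it is not part of the implementation cited in Sect. 6) evaluates g pointwise on an array of control values:

```python
import numpy as np

def multibang_g(v, ubar):
    """Evaluate the multibang integrand g pointwise.

    v    : array of control values
    ubar : increasing numpy array (u_1, ..., u_d) of desired values
    Returns g(v), with np.inf outside [u_1, u_d].
    """
    v = np.asarray(v, dtype=float)
    g = np.full_like(v, np.inf)
    # g is the pointwise maximum of the affine pieces on [u_1, u_d];
    # assigning piece by piece is consistent since g is continuous there
    for ui, uj in zip(ubar[:-1], ubar[1:]):
        mask = (v >= ui) & (v <= uj)
        g[mask] = 0.5 * ((ui + uj) * v[mask] - ui * uj)
    return g

print(multibang_g([-1.0, -0.5, 0.0, 0.5, 1.0], np.array([-1.0, 0.0, 1.0])))
# approximately [0.5, 0.25, 0., 0.25, 0.5], i.e., g(v) = |v|/2 at these nodes
```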

We now consider the problem

$$\begin{aligned} \min \limits _{u \in L^2(\Omega )} \frac{1}{2}\Vert Ku - z\Vert _Y^2 + \alpha G(u) \end{aligned}$$
(2.1)

with a parameter \(\alpha > 0\). Standard semi-continuity methods then yield existence of a minimizer \(\bar{u}\), which is unique if K is injective; see [7]. We will later impose a condition which guarantees that \(\bar{u}\) exhibits a multibang structure, i.e., \(\bar{u}(x) \in \{u_1,\dots ,u_d\}\) for almost every \(x\in \Omega \).

Let us further define the set

$$\begin{aligned} {U_{\mathrm {ad}}}:= \{u \in L^2(\Omega ): u_1 \le u(x) \le u_d \text { for a.e. } x \in \Omega \} = \mathrm {co} \left\{ u \in L^2(\Omega ): u(x) \in \{u_1, \dots ,u_d\} \text { for a.e. } x \in \Omega \right\} , \end{aligned}$$

where \(\mathrm {co}\) denotes the convex hull. It is clear that (2.1) is equivalent to the problem

$$\begin{aligned} \min \limits _{u \in {U_{\mathrm {ad}}}} \frac{1}{2}\Vert Ku - z\Vert _Y^2 + \alpha G(u). \end{aligned}$$
(P)

We will use this equivalent formulation to derive variational inequalities which will be useful in the convergence analysis. Standard convex analysis techniques then yield primal–dual optimality conditions; see, e.g., [3, 7].

Proposition 2.1

Define the sets

$$\begin{aligned} Q_1&:= \left\{ q: q< \frac{\alpha }{2}(u_1 + u_2)\right\} ,\\ Q_i&:= \left\{ q: \frac{\alpha }{2}(u_{i-1} + u_i)< q< \frac{\alpha }{2}(u_i + u_{i+1}) \right\} , \quad 1< i < d,\\ Q_d&:= \left\{ q: q > \frac{\alpha }{2}(u_{d-1} + u_d) \right\} ,\\ Q_{i,i+1}&:= \left\{ q: q = \frac{\alpha }{2}(u_i + u_{i+1}) \right\} , \quad 1 \le i < d. \end{aligned}$$

Let \(\bar{u} \in {U_{\mathrm {ad}}}\) with associated adjoint state \(\bar{p} := K^*(z - K \bar{u})\). Then \(\bar{u}\) is a solution to (P) if and only if

$$\begin{aligned} \bar{u}(x) \in {\left\{ \begin{array}{ll} \{u_i\} \quad &{}\text {if } \bar{p}(x) \in Q_i, \quad 1 \le i \le d,\\ {[}u_i, u_{i+1}] \quad &{} \text {if } \bar{p}(x) \in Q_{i,i+1}, \quad 1 \le i < d. \end{array}\right. } \end{aligned}$$
(2.2)

It is clear that the optimal solution \(\bar{u}\) is uniquely determined by the adjoint state on the sets \(\{x\in \Omega : \bar{p}(x)\in Q_i\}\). We see furthermore that \(\bar{u}(x) \in \{ u_1, \dots , u_d \}\) almost everywhere on \(\Omega \) if \({\text {meas}}\{x\in \Omega :\bar{p}(x)\in Q_{i,i+1}\} = 0\) for all \(1 \le i < d\). Hence \(\bar{u}\) has a multibang structure in this case. In the following, we will make use of this relation to construct a suitable regularity condition on these sets.
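To make this relation concrete, the following sketch (a hypothetical helper of ours, not taken from the works cited above) recovers the multibang control from given adjoint values via the thresholds \(\frac{\alpha }{2}(u_i + u_{i+1})\); points where \(\bar{p}(x)\) lies exactly on such a threshold are not determined by (2.2) and are assigned arbitrarily here:

```python
import numpy as np

def control_from_adjoint(p, ubar, alpha):
    """Pointwise evaluation of (2.2) away from the singular levels.

    p     : array of adjoint-state values
    ubar  : increasing numpy array (u_1, ..., u_d)
    alpha : multibang penalty parameter
    """
    # singular levels alpha/2 (u_i + u_{i+1}) separating Q_i and Q_{i+1}
    levels = 0.5 * alpha * (ubar[:-1] + ubar[1:])
    # p lies in Q_i exactly when i of the levels lie below p
    return ubar[np.searchsorted(levels, p)]

ubar = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(control_from_adjoint(np.array([-4.0, -2.0, 0.0, 2.0, 4.0]), ubar, alpha=2.0))
# [-2. -1.  0.  1.  2.]
```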

Remark 2.1

Although the dependence of the optimal controls on \(\alpha \) is not the focus of this work – see instead the earlier works [5,6,7,8] and, in particular, [3, Section 5] – let us recall the essential features for the sake of completeness. First, note that \(\alpha \) enters the optimality conditions (2.2) only via the case distinction for the sets \(Q_i\) and \(Q_{i,i+1}\). Specifically, increasing the value of \(\alpha \) shifts the conditions on \(\bar{p}\) so that desired control values \(u_i\) of smaller magnitude are preferred. Conversely, for \(\alpha \rightarrow 0\), these conditions coincide with the well-known optimality conditions for bang-bang control problems, where only \(Q_1\), \(Q_d\), and \(Q_{1,d}\) are relevant; see, e.g., [21, Lemma 2.26]. This implies that apart from singular cases where \({\text {meas}}\{x\in \Omega :\bar{p}(x) = c\}\ne 0\) for some \(c\in \mathbb {R}\), the value of \(\alpha \) does not influence the “strength” of the multibang penalty in enforcing the desired control values but only the specific selection among these values.

We next introduce the Moreau–Yosida approximation of (P) with a regularization parameter \(\gamma > 0\),

$$\begin{aligned} \min \limits _{u \in {U_{\mathrm {ad}}}} \frac{1}{2}\Vert Ku - z\Vert _Y^2 + \alpha G(u) + \frac{\gamma }{2} \Vert u\Vert ^2_{L^2(\Omega )}. \end{aligned}$$
(\(P_\gamma \))

As for (P), arguments from convex analysis lead to the following optimality conditions; see [3, 7].

Proposition 2.2

Define the sets

$$\begin{aligned} Q_1^\gamma&:= \left\{ q: q< \frac{\alpha }{2} \left( \left( 1+2 \frac{\gamma }{\alpha } \right) u_1 + u_2 \right) \right\} ,\\ Q_i^\gamma&:= \left\{ q: \frac{\alpha }{2}\left( u_{i-1} + \left( 1 + 2 \frac{\gamma }{\alpha } \right) u_i \right)< q< \frac{\alpha }{2} \left( \left( 1+2 \frac{\gamma }{\alpha } \right) u_i + u_{i+1} \right) \right\} , \quad 1< i < d,\\ Q_{i,i+1}^\gamma&:= \left\{ q: \frac{\alpha }{2}\left( \left( 1+ 2 \frac{\gamma }{\alpha } \right) u_i + u_{i+1} \right) \le q \le \frac{\alpha }{2} \left( u_i + \left( 1 + 2 \frac{\gamma }{\alpha } \right) u_{i+1} \right) \right\} , \quad 1 \le i < d,\\ Q_d^\gamma&:= \left\{ q: \frac{\alpha }{2} \left( u_{d-1} + \left( 1+2 \frac{\gamma }{\alpha } \right) u_d \right) < q \right\} . \end{aligned}$$

Let \(u_\gamma \in {U_{\mathrm {ad}}}\) with associated adjoint state \(p_\gamma := K^*(z - K u_\gamma )\). Then \(u_\gamma \) is a solution to (\(P_\gamma \)) if and only if

$$\begin{aligned} u_\gamma (x) = {\left\{ \begin{array}{ll} u_i \quad &{}\text {if } p_\gamma (x) \in Q_i^\gamma , \quad 1 \le i \le d,\\ \frac{1}{\gamma }\left( p_\gamma (x) -\tfrac{\alpha }{2}(u_i+u_{i+1})\right) \quad &{} \text {if } p_\gamma (x) \in Q_{i,i+1}^\gamma , \quad 1 \le i < d. \end{array}\right. } \end{aligned}$$
(2.3)

We remark that (2.3) is the explicit pointwise characterization of \(u_\gamma = (\partial G^*)_\gamma (p_\gamma )\), where \((\partial G^*)_\gamma \) denotes the Yosida approximation of the convex subdifferential of the Fenchel conjugate of G (which coincides with the Fréchet derivative of the Moreau envelope of \(G^*\)); this justifies the term Moreau–Yosida approximation; see, e.g., [3, § 4.1].
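For concreteness, a pointwise NumPy sketch of the map \(p_\gamma \mapsto u_\gamma \) from (2.3) could look as follows (an illustration under our own naming, not the implementation cited in Sect. 6):

```python
import numpy as np

def H_gamma(p, ubar, alpha, gamma):
    """Pointwise evaluation of (2.3), i.e., the map p -> u_gamma.

    p : array of adjoint values; ubar : increasing numpy array (u_1, ..., u_d).
    """
    p = np.asarray(p, dtype=float)
    # Q_{i,i+1}^gamma = [c_i + gamma u_i, c_i + gamma u_{i+1}] with the
    # singular levels c_i = alpha/2 (u_i + u_{i+1})
    c = 0.5 * alpha * (ubar[:-1] + ubar[1:])
    lo, hi = c + gamma * ubar[:-1], c + gamma * ubar[1:]
    # on the sets Q_i^gamma (between transitions), u_gamma is the value u_i
    u = ubar[np.minimum(np.searchsorted(hi, p), len(ubar) - 1)]
    # on Q_{i,i+1}^gamma, u_gamma interpolates linearly with slope 1/gamma
    for i in range(len(c)):
        mask = (p >= lo[i]) & (p <= hi[i])
        u[mask] = (p[mask] - c[i]) / gamma
    return u
```

As \(\gamma \rightarrow 0\), the transition regions \([c_i + \gamma u_i, c_i + \gamma u_{i+1}]\) shrink to the singular levels \(c_i\), and the map reduces to the multibang relation (2.2).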

We can also derive purely primal first-order optimality conditions for (P) and (\(P_\gamma \)) in terms of variational inequalities using standard arguments as in, e.g., [21, Thm. 2.22].

Proposition 2.3

Let \(\bar{u}\) and \(u_\gamma \) be solutions of (P) and (\(P_\gamma \)) with associated adjoint states \(\bar{p} := K^*(z - K\bar{u})\) and \(p_\gamma := K^*(z - K u_\gamma )\), respectively. Then,

$$\begin{aligned} \left( -\bar{p}, u - \bar{u} \right) _{L^2(\Omega )} + \alpha G'(\bar{u}; u - \bar{u})&\ge 0 \quad \text {for all } u \in {U_{\mathrm {ad}}},\\ \left( -p_\gamma + \gamma u_\gamma , u - u_\gamma \right) _{L^2(\Omega )} + \alpha G'(u_\gamma ; u - u_\gamma )&\ge 0 \quad \text {for all } u \in {U_{\mathrm {ad}}}. \end{aligned}$$

Here, \(G'(\bar{u}; u - \bar{u})\) denotes the directional derivative of G at \(\bar{u}\) in direction \(u - \bar{u}\), which will be characterized in the following lemma. Note that for \(\bar{u}, u \in {U_{\mathrm {ad}}}\) we have \(u- \bar{u} \in T_{U_{\mathrm {ad}}}(\bar{u})\) for

$$\begin{aligned} T_{U_{\mathrm {ad}}}(u) := \left\{ v\in L^2(\Omega ): {\left\{ \begin{array}{ll} v(x) \ge 0 &{} \text {if } u(x) = u_1,\\ v(x) \le 0 &{} \text {if } u(x) = u_d \end{array}\right. } \right\} , \end{aligned}$$

i.e., the tangent cone to \({U_{\mathrm {ad}}}\) at the point u. It thus suffices to consider directional derivatives for directions in \(T_{{U_{\mathrm {ad}}}}(u)\), which helps to avoid unnecessary case distinctions in the proof. Furthermore, since \({U_{\mathrm {ad}}}\subset L^\infty (\Omega )\), we only have to consider directions in \(L^\infty (\Omega )\). In the following, all pointwise expressions and calculations are understood in an almost everywhere sense.

Lemma 2.1

Let \(u \in {U_{\mathrm {ad}}}\) and define the sets

$$\begin{aligned} S_i&:= \{x \in \Omega : u(x) = u_i\}, \quad i =1,\dots ,d,\\ T_i&:= \{ x \in \Omega : u_i< u(x) < u_{i+1} \}, \quad i=1,\dots ,d-1. \end{aligned}$$

The directional derivative of G in direction \(v \in T_{U_{\mathrm {ad}}}(u) \cap L^\infty (\Omega )\) is then given as follows, where the terms involving the undefined values \(u_0\) and \(u_{d+1}\) vanish since \(v \ge 0\) on \(S_1\) and \(v \le 0\) on \(S_d\):

$$\begin{aligned} \begin{aligned} G'(u;v)&= \sum \limits _{i=1}^{d-1} \int \limits _{T_i} \frac{1}{2}(u_i + u_{i+1})v(x) \ \mathrm {d}x\\&\quad + \sum \limits _{i=1}^{d} \left[ \ \int \limits _{S_i \cap \{v \ge 0\}} \frac{1}{2}(u_i + u_{i+1})v(x) \mathrm {d}x+ \int \limits _{ S_i \cap \{ v < 0 \} } \frac{1}{2}(u_{i-1} + u_i) v(x) \mathrm {d}x\right] . \end{aligned} \end{aligned}$$

Proof

We use the definition of the directional derivative and of the sets \(S_i\) and \(T_i\) to obtain

$$\begin{aligned} \begin{aligned} G'(u;v) :=&\lim \limits _{\rho \rightarrow 0} \frac{1}{\rho } \left( G(u+ \rho v) - G(u) \right) \\ =&\lim \limits _{\rho \rightarrow 0} \frac{1}{\rho } \left[ \sum \limits _{i=1}^{d-1} \int \limits _{T_i} ( g(u(x)+\rho v(x)) - g(u(x)) ) \mathrm {d}x\right. \\&\left. + \sum \limits _{i=1}^d \int \limits _{S_i}( g(u(x)+\rho v(x)) - g(u(x)) ) \mathrm {d}x\right] . \end{aligned} \end{aligned}$$

We now make use of our assumption that \(v \in T_{{U_{\mathrm {ad}}}}(u) \cap L^\infty (\Omega )\). For such a v, we have \(u + \rho v \in {U_{\mathrm {ad}}}\) for all \(\rho > 0\) small enough. Note that this is a pointwise condition, which we will exploit in the following. We distinguish several cases.

  (i)

    First, assume that \(x \in T_i\) with \(1 \le i \le d-1\). For \(\rho \) small enough we then get \(u(x) + \rho v(x) \in [u_i, u_{i+1}]\). Hence we obtain

    $$\begin{aligned} g(u(x)+\rho v(x)) - g(u(x))&= \frac{1}{2}( (u_i + u_{i+1})( u(x)+ \rho v(x)) - u_i u_{i+1} )\nonumber \\&- \frac{1}{2}( (u_i + u_{i+1})u(x) - u_i u_{i+1} )\nonumber \\&= \frac{\rho }{2} (u_i + u_{i+1}) v(x), \end{aligned}$$
    (2.4)

    which yields

    $$\begin{aligned} \lim \limits _{\rho \rightarrow 0} \frac{1}{\rho } \int \limits _{T_i} (g(u(x)+\rho v(x)) - g(u(x))) \mathrm {d}x= \int \limits _{T_i} \frac{1}{2}(u_i + u_{i+1}) v(x) \mathrm {d}x. \end{aligned}$$
  (ii)

    Now assume that \(x \in S_i\) with \(1< i < d\). Then by definition, \(u(x) = u_i\). Here we have to distinguish three further cases.

    • \(v(x)=0\): Here we obtain \(u(x) + \rho v(x) = u(x)\), leading to

      $$\begin{aligned} g(u(x) + \rho v(x)) - g(u(x)) = 0. \end{aligned}$$
    • \(v(x)>0\): Here we obtain \(u(x) + \rho v(x) \in [u_i, u_{i+1}]\) for \(\rho \) small enough, leading to

      $$\begin{aligned} g(u(x)+\rho v(x)) - g(u(x)) = \frac{\rho }{2} (u_i + u_{i+1}) v(x). \end{aligned}$$
    • \(v(x)<0\): Here we obtain \(u(x) + \rho v(x) \in [u_{i-1}, u_{i}]\) for \(\rho \) small enough, leading as in (2.4) to

      $$\begin{aligned} g(u(x)+\rho v(x)) - g(u(x)) = \frac{\rho }{2} ( u_{i-1} + u_i ) v(x). \end{aligned}$$

    Combining all three cases yields

    $$\begin{aligned}&\lim \limits _{\rho \rightarrow 0} \frac{1}{\rho } \int \limits _{S_i} (g(u(x)+\rho v(x)) - g(u(x))) \mathrm {d}x\\&\quad = \int \limits _{ S_i \cap \{v \ge 0\} } \frac{1}{2}(u_i +u_{i+1} ) v(x) \mathrm {d}x+ \int \limits _{ S_i \cap \{v < 0\} } \frac{1}{2} (u_{i-1} + u_i) v(x) \mathrm {d}x. \end{aligned}$$
  (iii)

    We are left with the special cases \(x\in S_i\) for \(i=1\) and \(i = d\). We only consider the case \(i=1\), as the case \(i=d\) is similar. Hence we assume \(x \in S_1\), which implies \(u(x) = u_1\). Since \(v \in T_{U_{\mathrm {ad}}}(u)\), we have \(v(x) \ge 0\). If \(v(x) > 0\), we obtain for \(\rho \) small enough that \(u(x) + \rho v(x) \in (u_1, u_2)\) holds, leading to

    $$\begin{aligned} g(u(x) + \rho v(x)) - g(u(x)) = \frac{\rho }{2}(u_1 + u_2) v(x) \end{aligned}$$

    and similarly if \(v(x) = 0\). This leads to

    $$\begin{aligned} \lim \limits _{\rho \rightarrow 0} \frac{1}{\rho } \int \limits _{S_1} ( g(u(x)+\rho v(x)) - g(u(x)) ) \mathrm {d}x= \int \limits _{S_1} \frac{1}{2}(u_1 + u_2) v(x) \mathrm {d}x. \end{aligned}$$

    A similar argument for the remaining case \(i=d\) finishes the proof.

\(\square \)
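As a quick sanity check of Lemma 2.1, the formula can be compared with a difference quotient; a minimal sketch (reusing multibang_g from the sketch above, with all concrete values chosen by us):

```python
import numpy as np

# Take d = 3, ubar = (-1, 0, 1), u = u_2 = 0 on all of Omega = (0, 1) and
# v = 1, so that S_2 = Omega and v >= 0; Lemma 2.1 then predicts
# G'(u; v) = (u_2 + u_3)/2 * meas(Omega) = 1/2.
ubar = np.array([-1.0, 0.0, 1.0])
n, h, rho = 1000, 1.0 / 1000, 1e-8
u, v = np.zeros(n), np.ones(n)
G = lambda w: h * multibang_g(w, ubar).sum()  # simple quadrature of G
print((G(u + rho * v) - G(u)) / rho)  # approximately 0.5
```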

3 Regularity assumption and error estimates

We now extend the active set condition from [24, 26] to the multibang control problem. From Proposition 2.1, we see that the optimal control \(\bar{u}\) is not uniquely determined by the adjoint state \(\bar{p}\) on the sets where \(\bar{p}(x)\) attains one of the singular values in \(Q_{i,i+1}\). We therefore need to control the way in which \(\bar{p}\) “detaches” from these values. This motivates the following assumption.

Assumption REG For the solution \(\bar{u}\) to (P) with adjoint state \(\bar{p} = K^*(z-K \bar{u})\) there exist constants \(c > 0\) and \(\kappa > 0\) such that

$$\begin{aligned} {\text {meas}}\left( \bigcup \limits _{i=1}^{d-1} \left\{ x \in \Omega : \left| \bar{p}(x) - \frac{\alpha }{2}(u_i + u_{i+1}) \right| < \varepsilon \right\} \right) \le c \varepsilon ^\kappa \end{aligned}$$

holds for all \(\varepsilon > 0\) small enough.

Note that if \(\bar{u}\) satisfies this assumption, the sets \(\{x\in \Omega : \bar{p}(x) \in Q_{i,i+1}\}\) have Lebesgue measure zero. Hence, \(\bar{u}\) is multibang by Proposition 2.1. In addition, we have the following result, which is a direct consequence of \({\text {meas}}\{x\in \Omega :\bar{p}(x)\in Q_{i,i+1}\} = 0\).

Lemma 3.1

Assume \(\bar{u}\) satisfies Assumption REG. Then for almost every \(x \in \Omega \), we have \(\bar{p}(x) \in Q_i\) if and only if \(\bar{u}(x) = u_i\).

Following [9, Lemma 1.3], we can derive a sufficient condition for Assumption REG.

Theorem 3.1

Suppose that the adjoint state satisfies \(\bar{p}\in C^1(\bar{\Omega })\) and

$$\begin{aligned} \min \limits _{x \in K_i} |\nabla \bar{p}(x)| > 0\qquad \text {for all } i= 1,\dots ,d-1, \end{aligned}$$

where

$$\begin{aligned} K_i := \left\{ x \in \bar{\Omega }: \bar{p}(x) = \frac{\alpha }{2}(u_i + u_{i+1}) \right\} . \end{aligned}$$

Then Assumption REG holds with \(\kappa = 1\).

Proof

Define for \(t\in \mathbb {R}\) the level sets \(F_t := \{ x \in \bar{\Omega }: \bar{p}(x) = t \}\). Now we use a continuity argument to obtain constants \(\varepsilon _0, c_0, C > 0\) such that for all \(|t - \frac{\alpha }{2}(u_i + u_{i+1})|\le \varepsilon _0\) and all \(1\le i < d\) there holds

$$\begin{aligned} |\nabla \bar{p}(x)| \ge c_0>0 \quad \text {for all } x \in F_t, \qquad \mathcal {H}^{n-1}(F_t) \le C, \end{aligned}$$

where \(\mathcal {H}^{n-1}\) is the \((n-1)\)-dimensional Hausdorff measure. In the following, we denote by \(\mathbb {1}_C\) the characteristic function of the set C, i.e., \(\mathbb {1}_C(x) =1\) if \(x\in C\) and 0 else. We now use the co-area formula

$$\begin{aligned} \int \limits _\Omega h(x) |\nabla \bar{p}(x)| \mathrm {d}x= \int \limits _{- \infty }^\infty \left( \ \int \limits _{\bar{p}^{-1}(t)} h(x) \mathrm {d} {\mathcal {H}^{n-1}}(x) \right) \mathrm {d}t\end{aligned}$$

with the function

$$\begin{aligned} h(x) := \mathbb {1}_{ E_i }(x), \quad E_i := \left\{ x \in \Omega : \left| \bar{p}(x) - \frac{\alpha }{2}(u_i + u_{i+1}) \right| \le \varepsilon \right\} , \end{aligned}$$

to obtain for all \(1\le i < d\) and \(0 < \varepsilon \le \varepsilon _0\) that

$$\begin{aligned} c_0 {\text {meas}}\left( E_i \right) \le \int \limits _{E_i} |\nabla \bar{p}(x)| \mathrm {d}x= \int \limits _{-\varepsilon }^\varepsilon \mathcal {H}^{n-1} \left( F_{t + \frac{\alpha }{2}(u_i + u_{i+1})}\right) \mathrm {d}t\le 2 C \varepsilon \end{aligned}$$

holds. Since this holds for all \(1 \le i < d\), Assumption REG follows with \(\kappa = 1\). \(\square \)
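In practice, Assumption REG (and in particular the exponent \(\kappa \)) can be probed numerically for a given adjoint state by measuring the \(\varepsilon \)-neighborhoods of the singular levels on a grid. A rough sketch for a one-dimensional domain (function and variable names are ours; \(\varepsilon \) must stay well above the grid resolution for the measured sets to be meaningful):

```python
import numpy as np

def estimate_kappa(pbar, x, levels, eps_list):
    """Estimate kappa in Assumption REG by a log-log fit on a uniform grid.

    pbar     : callable evaluating the adjoint state
    x        : uniform grid on Omega, spacing h
    levels   : the singular values alpha/2 (u_i + u_{i+1})
    eps_list : decreasing epsilons, all well above the grid resolution
    """
    h = x[1] - x[0]
    p = pbar(x)
    meas = []
    for eps in eps_list:
        near = np.zeros_like(x, dtype=bool)
        for c in levels:
            near |= np.abs(p - c) < eps
        meas.append(h * near.sum())
    # slope of log(meas) against log(eps) approximates kappa
    slope, _ = np.polyfit(np.log(eps_list), np.log(meas), 1)
    return slope

# e.g., an adjoint state crossing each level with nonvanishing slope yields
# meas ~ eps and hence an estimated kappa close to 1, as in Theorem 3.1
```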

We now establish error estimates for the approximation (\(P_\gamma \)) of (P). For this purpose, we first derive a stronger version of Proposition 2.3. The next result, which is similar to ones in [18, 19], is the most important tool in the convergence analysis.

Lemma 3.2

Assume that the solution \(\bar{u}\) to (P) satisfies Assumption REG. Then,

$$\begin{aligned} (-\bar{p},u - \bar{u})_{L^2(\Omega )} + \alpha G'(\bar{u}; u - \bar{u}) \ge c_A \Vert u-\bar{u}\Vert ^{1 + \frac{1}{\kappa }}_{L^1(\Omega )} \qquad \forall u \in {U_{\mathrm {ad}}}\end{aligned}$$

with a constant \(c_A:=c_A(\kappa ) > 0\).

Proof

First, recall that Assumption REG implies that \(\bar{u}\) has a multibang structure. Furthermore, using Lemma 3.1 together with the definitions of \(Q_i\) and \(S_i\) in Proposition 2.1 and Lemma 2.1, respectively, we obtain that \(x \in S_i\) if and only if \(\bar{p}(x) \in Q_i\). Now we use Lemma 2.1 and the fact that \(u - \bar{u} \in T_{U_{\mathrm {ad}}}(\bar{u})\) to compute

$$\begin{aligned}&(-\bar{p},u - \bar{u})_{L^2(\Omega )} + \alpha G'(\bar{u}; u-\bar{u}) \\&\quad = \int \limits _{\{\bar{p} \in Q_1\} } \left( -\bar{p}(x) + \frac{\alpha }{2} (u_1 + u_2) \right) (u(x) - \bar{u}(x)) \mathrm {d}x\\&\qquad + \int \limits _{\{\bar{p} \in Q_d\} } \left( - \bar{p}(x)+ \frac{\alpha }{2} (u_{d-1} + u_d) \right) (u(x) - \bar{u}(x)) \mathrm {d}x\\&\qquad +\sum \limits _{i=2}^{d-1} \int \limits _{ \{\bar{p} \in Q_i\} \cap \{ u-\bar{u} \ge 0 \} } \left( -\bar{p}(x) + \frac{\alpha }{2}(u_i + u_{i+1}) \right) (u(x) - \bar{u}(x)) \mathrm {d}x\\&\qquad + \sum \limits _{i=2}^{d-1} \int \limits _{ \{\bar{p} \in Q_i\} \cap \{ u-\bar{u} < 0 \} } \left( -\bar{p}(x) + \frac{\alpha }{2}(u_{i-1} + u_{i}) \right) (u(x) - \bar{u}(x)) \mathrm {d}x. \end{aligned}$$

Here we have abbreviated the sets \(\{\bar{p} \in Q_1\} :=\{ x \in \Omega : \bar{p}(x) \in Q_1 \}\) and similar for the other sets. Recall that by definition, \(\bar{p}(x) \in Q_1\) implies that \(-\bar{p}(x) + \frac{\alpha }{2}(u_1 + u_2) > 0\). Furthermore, we know that \(\bar{u}(x) = u_1\), leading to \(u(x) - \bar{u}(x) = u(x)-u_1 \ge 0\). We similarly obtain on \(Q_d\) that \(-\bar{p}(x) + \frac{\alpha }{2}(u_{d-1} + u_d) < 0\) and \(u(x) - \bar{u}(x) = u(x) - u_d \le 0\). Finally, if \(\bar{p}(x) \in Q_i\) for \(1< i < d\), we obtain that

$$\begin{aligned} \frac{\alpha }{2}(u_{i-1} + u_i)< \bar{p}(x) < \frac{\alpha }{2}(u_i + u_{i+1}), \end{aligned}$$

which leads to

$$\begin{aligned} - \bar{p}(x) + \frac{\alpha }{2}(u_i + u_{i+1}) > 0 \quad \text {and} \quad - \bar{p}(x) + \frac{\alpha }{2}(u_{i-1} + u_i) < 0. \end{aligned}$$

This allows us to write

$$\begin{aligned}&(-\bar{p},u - \bar{u})_{L^2(\Omega )} + \alpha G'(\bar{u}; u-\bar{u})\\&\quad = \int \limits _{\{\bar{p} \in Q_1\} } \left| -\bar{p}(x) + \frac{\alpha }{2} (u_1 + u_2) \right| |u(x) - \bar{u}(x)| \mathrm {d}x\\&\qquad + \int \limits _{\{\bar{p} \in Q_d\} } \left| - \bar{p}(x)+ \frac{\alpha }{2} (u_{d-1} + u_d) \right| |u(x) - \bar{u}(x)| \mathrm {d}x\\&\qquad + \sum \limits _{i=2}^{d-1} \int \limits _{ \{\bar{p} \in Q_i\} \cap \{ u-\bar{u} \ge 0 \} } \left| -\bar{p} (x)+ \frac{\alpha }{2}(u_i + u_{i+1}) \right| |u(x) - \bar{u}(x)| \mathrm {d}x\\&\qquad + \sum \limits _{i=2}^{d-1} \int \limits _{ \{\bar{p} \in Q_i\} \cap \{ u-\bar{u} < 0 \} } \left| -\bar{p}(x) + \frac{\alpha }{2}(u_{i-1} + u_{i}) \right| |u(x) - \bar{u}(x)| \mathrm {d}x. \end{aligned}$$

Now let \(\varepsilon > 0\) and consider the set

$$\begin{aligned} Q_1^\varepsilon :=\left\{ q: q \le \frac{\alpha }{2}(u_1 + u_2) - \varepsilon \right\} \subset Q_1. \end{aligned}$$

Let \(\bar{p}(x) \in Q_1^\varepsilon \). Together with \(-\bar{p}(x) + \frac{\alpha }{2}(u_1 + u_2) > 0\), this implies that

$$\begin{aligned} \left| -\bar{p}(x) + \frac{\alpha }{2}(u_1 + u_2) \right| = -\bar{p}(x) + \frac{\alpha }{2}(u_1 + u_2) \ge \varepsilon , \end{aligned}$$

leading to

$$\begin{aligned} \begin{aligned} \int \limits _{\{\bar{p} \in Q_1\} } \left| -\bar{p} + \frac{\alpha }{2} (u_1 + u_2) \right| |u - \bar{u}| \mathrm {d}x&\ge \int \limits _{\{\bar{p} \in Q_1^\varepsilon \} } \left| -\bar{p} + \frac{\alpha }{2} (u_1 + u_2) \right| |u - \bar{u}| \mathrm {d}x\\&\ge \varepsilon \int \limits _{\{\bar{p} \in Q_1^\varepsilon \} } |u - \bar{u}| \mathrm {d}x. \end{aligned} \end{aligned}$$

We similarly define

$$\begin{aligned} Q_d^\varepsilon := \left\{ q: q \ge \frac{\alpha }{2}(u_{d-1} + u_d) + \varepsilon \right\} \subset Q_d, \end{aligned}$$

leading to

$$\begin{aligned} \int \limits _{\{\bar{p} \in Q_d\} } \left| - \bar{p}(x)+ \frac{\alpha }{2} (u_{d-1} + u_d) \right| |u(x) - \bar{u}(x)| \mathrm {d}x\ge \varepsilon \int \limits _{\{\bar{p} \in Q_d^\varepsilon \} } |u(x) - \bar{u}(x)| \mathrm {d}x, \end{aligned}$$

as well as for \(1<i<d\)

$$\begin{aligned} Q_i^\varepsilon := \left\{ q: \varepsilon + \frac{\alpha }{2}(u_{i-1} + u_i) \le q \le \frac{\alpha }{2}(u_i + u_{i+1}) - \varepsilon \right\} \subset Q_i. \end{aligned}$$

The latter leads to

$$\begin{aligned} \left| -\bar{p}(x) + \frac{\alpha }{2}(u_i + u_{i+1})\right|&= -\bar{p}(x) + \frac{\alpha }{2}(u_i + u_{i+1}) \ge \varepsilon ,\\ \left| -\bar{p}(x) + \frac{\alpha }{2}(u_{i-1} + u_i)\right|&= \phantom {-}\bar{p}(x) - \frac{\alpha }{2}(u_{i-1} + u_i) \ge \varepsilon \end{aligned}$$

and therefore

$$\begin{aligned}&\int \limits _{ \{\bar{p} \in Q_i\} \cap \{ u-\bar{u} \ge 0 \} } \left| -\bar{p}(x) + \frac{\alpha }{2}(u_i + u_{i+1}) \right| |u(x) - \bar{u}(x)| \mathrm {d}x\\&\qquad + \int \limits _{ \{\bar{p} \in Q_i\} \cap \{ u-\bar{u}< 0 \} } \left| -\bar{p}(x) + \frac{\alpha }{2}(u_{i-1} + u_{i}) \right| |u(x) - \bar{u}(x)| \mathrm {d}x\\&\quad \ge \varepsilon \int \limits _{ \{\bar{p} \in Q_i^\varepsilon \} \cap \{ u-\bar{u} \ge 0 \} } |u(x) - \bar{u}(x)| \mathrm {d}x+ \varepsilon \int \limits _{ \{\bar{p} \in Q_i^\varepsilon \} \cap \{ u-\bar{u} < 0 \} } |u(x)-\bar{u}(x)| \mathrm {d}x\\&\quad = \varepsilon \int \limits _{ \{\bar{p} \in Q_i^\varepsilon \} } |u(x)-\bar{u}(x)| \mathrm {d}x. \end{aligned}$$

We now combine all these estimates to obtain

$$\begin{aligned}&(-\bar{p},u - \bar{u})_{L^2(\Omega )} + \alpha G'(\bar{u}; u - \bar{u}) \\&\quad \ge \varepsilon \sum \limits _{i=1}^d \int \limits _{ \{ \bar{p} \in Q_i^\varepsilon \} } |u(x) - \bar{u}(x)| \mathrm {d}x\\&\quad =\varepsilon \sum \limits _{i=1}^d \left( \int \limits _{ \{ \bar{p} \in Q_i \} } |u(x) - \bar{u}(x)| \mathrm {d}x- \int \limits _{ \{ \bar{p} \in Q_i \} \setminus \{ \bar{p} \in Q_i^\varepsilon \} } |u(x) - \bar{u}(x)| \mathrm {d}x\right) \\&\quad = \varepsilon \Vert u - \bar{u}\Vert _{L^1(\Omega )} - \varepsilon \sum \limits _{i=1}^d \int \limits _{ \{ \bar{p} \in Q_i \} \setminus \{ \bar{p} \in Q_i^\varepsilon \} } |u(x) - \bar{u}(x)| \mathrm {d}x\\&\quad \ge \varepsilon \Vert u - \bar{u}\Vert _{L^1(\Omega )} - \varepsilon \Vert u - \bar{u}\Vert _{L^\infty (\Omega )} \sum \limits _{i=1}^d \int \limits _{ \{ \bar{p} \in Q_i \} \setminus \{ \bar{p} \in Q_i^\varepsilon \} } 1 \mathrm {d}x, \end{aligned}$$

where we have used the \(L^\infty \)-boundedness of \(u - \bar{u}\) in the last step. We now use Assumption REG to estimate the remaining sum, yielding

$$\begin{aligned} \sum \limits _{i=1}^d \int \limits _{ \{ \bar{p} \in Q_i \} \setminus \{ \bar{p} \in Q_i^\varepsilon \} } 1 \mathrm {d}x= {\text {meas}}\left( \bigcup \limits _{i=1}^{d-1} \left\{ x\in \Omega : \left| \bar{p}(x) - \frac{\alpha }{2}(u_i + u_{i+1}) \right| < \varepsilon \right\} \right) \le c \varepsilon ^\kappa . \end{aligned}$$

Summarizing, we have for a constant \(c > 1\) (enlarging c if necessary) that

$$\begin{aligned} (-\bar{p},u - \bar{u})_{L^2(\Omega )} + \alpha G'(\bar{u}; u - \bar{u}) \ge \varepsilon \Vert u - \bar{u}\Vert _{L^1(\Omega )} -c \varepsilon ^{\kappa + 1}, \end{aligned}$$

and hence setting

$$\begin{aligned} \varepsilon := c^{- \frac{2}{\kappa }} \Vert u - \bar{u}\Vert _{L^1(\Omega )}^{\frac{1}{\kappa }} \end{aligned}$$

finishes the proof with \(c_A := c^{-\frac{2}{\kappa }}\left( 1 - c^{-1}\right) > 0\), since then \(\varepsilon \Vert u - \bar{u}\Vert _{L^1(\Omega )} - c \varepsilon ^{\kappa + 1} = c_A \Vert u - \bar{u}\Vert _{L^1(\Omega )}^{1 + \frac{1}{\kappa }}\). \(\square \)

We now have everything at hand to prove approximation error estimates.

Theorem 3.2

Let \(\bar{u}\) be a solution of (P) with corresponding state \(\bar{y} := K\bar{u}\) and assume that Assumption REG is satisfied. Furthermore, let \(u_\gamma \) be the solution of (\(P_\gamma \)) for \(\gamma >0\) with corresponding state \(y_\gamma := K u_\gamma \). Then there exists a constant \(c> 0\) such that

$$\begin{aligned} \frac{1}{\gamma }\Vert y_\gamma - \bar{y}\Vert _Y^2 + \frac{1}{\gamma }\Vert u_\gamma - \bar{u}\Vert _{L^1(\Omega )}^{1 + \frac{1}{\kappa }} + \Vert u_\gamma - \bar{u}\Vert _{L^2(\Omega )}^2 \le c \gamma ^\kappa . \end{aligned}$$

Proof

First note that G is a convex function and hence that

$$\begin{aligned} G'(\bar{u}; u_\gamma - \bar{u}) + G'(u_\gamma ; \bar{u} - u_\gamma ) \le 0. \end{aligned}$$

We thus obtain from Proposition 2.3 and Lemma 3.2 that

$$\begin{aligned} (-\bar{p},u - \bar{u})_{L^2(\Omega )} + \alpha G'(\bar{u}; u - \bar{u}) \ge c_A \Vert u - \bar{u}\Vert _{L^1(\Omega )}^{1 + \frac{1}{\kappa }} \quad&\forall u \in {U_{\mathrm {ad}}},\\ (-p_\gamma ,u - u_\gamma )_{L^2(\Omega )}+ \alpha G'(u_\gamma ; u - u_\gamma ) + \gamma (u_\gamma , u - u_\gamma )_{L^2(\Omega )} \ge 0 \quad&\forall u \in {U_{\mathrm {ad}}}. \end{aligned}$$

Inserting \(u=u_\gamma \) and \(u=\bar{u}\) into the first and second of the above inequalities, respectively, and adding both yields

$$\begin{aligned}&(-\bar{p} + p_\gamma , u_\gamma - \bar{u} )_{L^2(\Omega )} + \alpha ( G'(\bar{u}; u_\gamma - \bar{u}) + G'(u_\gamma ; \bar{u} - u_\gamma )) + \gamma (u_\gamma , \bar{u} - u_\gamma )_{L^2(\Omega )}\\&\quad \ge c_A \Vert u_\gamma - \bar{u}\Vert _{L^1(\Omega )}^{1 + \frac{1}{\kappa }}. \end{aligned}$$

We now use the definition of \(\bar{p} = K^*(z - K \bar{u})\) and \(p_\gamma = K^*(z - K u_\gamma )\) to deduce that

$$\begin{aligned} (-\bar{p} + p_\gamma , u_\gamma - \bar{u} )_{L^2(\Omega )} = - \Vert y_\gamma - \bar{y}\Vert ^2_{Y}. \end{aligned}$$

Hence, by adding \(\gamma \Vert \bar{u} - u_\gamma \Vert ^2_{L^2(\Omega )}\) to the inequality above and rearranging terms, we obtain that

$$\begin{aligned} \begin{aligned} \Vert y_\gamma - \bar{y}\Vert _Y^2 + c_A \Vert u_\gamma - \bar{u}\Vert _{L^1(\Omega )}^{1 + \frac{1}{\kappa }} + \gamma \Vert u_\gamma - \bar{u}\Vert _{L^2(\Omega )}^2 \le&\alpha ( G'(\bar{u}; u_\gamma - \bar{u}) + G'(u_\gamma ; \bar{u} - u_\gamma )) \\&+ \gamma (\bar{u}, \bar{u} - u_\gamma )_{L^2(\Omega )}\\ \le&\gamma (\bar{u}, \bar{u} - u_\gamma )_{L^2(\Omega )}\\ \le&c \gamma \Vert u_\gamma - \bar{u}\Vert _{L^1(\Omega )}\\ \le&\frac{c_A}{2}\Vert u_\gamma - \bar{u}\Vert _{L^1(\Omega )}^{1 + \frac{1}{\kappa }} + c \gamma ^{\kappa +1}, \end{aligned} \end{aligned}$$

where we have used Young’s inequality in the last step. The stated inequality now follows immediately. \(\square \)

4 Discretization error estimates

In practice, the exact operator K is not realizable, and a discretization \(K_h: L^2(\Omega ) \rightarrow Y_h\) with finite dimensional range \(Y_h\) must be employed. Denote by \(u_{\gamma ,h}\) the solution of the discrete problem

$$\begin{aligned} \min \limits _{u \in {U_{\mathrm {ad}}}} \frac{1}{2}\Vert K_h u - z\Vert _Y^2 + \alpha G(u) + \frac{\gamma }{2} \Vert u\Vert ^2_{L^2(\Omega )} \end{aligned}$$
(\(P_{\gamma ,h}\))

with corresponding state \(y_{\gamma ,h}:=K_hu_{\gamma ,h}\) and adjoint state \(p_{\gamma ,h}:=K_h^*(z-y_{\gamma ,h})\). If K is the solution operator of an elliptic partial differential equation and \(K_h\) its finite element discretization as in the next section, (\(P_{\gamma ,h}\)) can be interpreted as a variational discretization [12, 13].

We assume that for all \(h>0\), the estimate

$$\begin{aligned} \Vert (K -K_h) u_{\gamma ,h}\Vert _Y + \Vert (K^*- K_h^*)(y_{\gamma ,h} - z)\Vert _{L^2(\Omega )} \le \delta (h), \end{aligned}$$
(4.1)

holds uniformly for all \(\gamma >0\) with a monotonically increasing function \(\delta : \mathbb {R}^+_0 \rightarrow \mathbb {R}\) such that \(\delta (0) = 0\). Note that this approximation condition only needs to be satisfied for the solutions to the discretized problem (\(P_{\gamma ,h}\)). However, as in [23] the condition can also be replaced by a corresponding uniform condition for the solution to the continuous problem (\(P_\gamma \)).

Now, we follow [23, Proposition 1.8] and estimate the discretization error for the solution to (\(P_\gamma \)).

Theorem 4.1

For all \(\gamma >0\) and \(h\ge 0\) there holds

$$\begin{aligned} \Vert y_\gamma - y_{\gamma ,h}\Vert _Y^2 + \gamma \Vert u_\gamma - u_{\gamma ,h}\Vert _{L^2(\Omega )}^2 \le (1 + \gamma ^{-1})\delta (h)^2. \end{aligned}$$

Proof

With \(u_{\gamma ,h}\) and \(u_\gamma \) solutions to (\(P_{\gamma ,h}\)) and (\(P_\gamma \)), respectively, we have from Proposition 2.3 that

$$\begin{aligned} \left( -p_{\gamma ,h} + \gamma u_{\gamma ,h}, u_\gamma - u_{\gamma ,h} \right) _{L^2(\Omega )} + \alpha G'(u_{\gamma ,h}; u_\gamma - u_{\gamma ,h})&\ge 0 ,\\ \left( -p_\gamma + \gamma u_\gamma , u_{\gamma ,h} - u_\gamma \right) _{L^2(\Omega )} + \alpha G'(u_\gamma ; u_{\gamma ,h} - u_\gamma )&\ge 0. \end{aligned}$$

Adding these two inequalities, substituting

$$\begin{aligned} p_{\gamma ,h} = -K_h^*(K_hu_{\gamma ,h} - z),\quad p_\gamma =-K^*(Ku_\gamma - z), \end{aligned}$$

and using the convexity of G then yields

$$\begin{aligned}&\left( K_h^*(K_h u_{\gamma ,h} - z) + \gamma u_{\gamma ,h}, u_\gamma - u_{\gamma ,h}\right) + \left( K^*(K u_\gamma - z) + \gamma u_\gamma , u_{\gamma ,h} - u_\gamma \right) \\&\quad \ge -\alpha \left( G'(u_{\gamma ,h}; u_\gamma - u_{\gamma ,h}) + G'(u_\gamma ; u_{\gamma ,h} - u_\gamma )\right) \ge 0. \end{aligned}$$

We thus obtain that

$$\begin{aligned} \begin{aligned} \gamma \Vert u_{\gamma ,h}-u_\gamma \Vert _{L^2(\Omega )}^2 \le&\left( K_h^*(y_{\gamma ,h} - z) - K^*(y_\gamma - z),u_\gamma - u_{\gamma ,h}\right) \\ =&\left( (K_h^* - K^*)(y_{\gamma ,h} - z),u_\gamma - u_{\gamma ,h}\right) \\&+ \left( K^*(y_{\gamma ,h} - y_\gamma ),u_\gamma - u_{\gamma ,h}\right) . \end{aligned} \end{aligned}$$

The rest of the proof follows similarly to the proof of [23, Proposition 1.6]. The first term on the right-hand side is estimated using the Cauchy–Schwarz inequality, the approximation condition (4.1), and Young's inequality as

$$\begin{aligned} \left( (K_h^* - K^*)(y_{\gamma ,h} - z),u_\gamma - u_{\gamma ,h}\right) \le \frac{\gamma }{2}\Vert u_{\gamma ,h}-u_\gamma \Vert _{L^2(\Omega )}^2 + \frac{1}{2\gamma }\delta (h)^2. \end{aligned}$$

Rewriting the second term and using again the Cauchy–Schwarz inequality combined with the inequality (4.1) and Young's inequality, we obtain

$$\begin{aligned} \begin{aligned} \left( K^*(y_{\gamma ,h} - y_\gamma ),u_\gamma - u_{\gamma ,h}\right)&= -\Vert y_\gamma - y_{\gamma ,h}\Vert _Y^2 + (y_\gamma - y_{\gamma ,h}, (K_h - K)u_{\gamma ,h})\\&\le -\frac{1}{2}\Vert y_\gamma - y_{\gamma ,h}\Vert _Y^2 + \frac{1}{2}\delta (h)^2. \end{aligned} \end{aligned}$$

Adding these two estimates, we finally arrive at

$$\begin{aligned} \frac{1}{2}\Vert y_\gamma - y_{\gamma ,h}\Vert _Y^2 + \frac{\gamma }{2} \Vert u_\gamma - u_{\gamma ,h}\Vert _{L^2(\Omega )}^2 \le \left( \frac{1}{2} + \frac{1}{2\gamma }\right) \delta (h)^2. \end{aligned}$$

\(\square \)

Combining the approximation error estimate from Theorem 3.2 and the discretization error estimate from Theorem 4.1, we immediately obtain the following result.

Theorem 4.2

If \(\bar{u}\) satisfies Assumption REG, then

$$\begin{aligned} \frac{1}{\gamma }\Vert y_{\gamma , h} - \bar{y}\Vert _Y^2 + \Vert u_{\gamma , h} - \bar{u}\Vert _{L^2(\Omega )}^2 \le c\left( \gamma ^{-1}(1+\gamma ^{-1}) \delta (h)^2 + \gamma ^\kappa \right) \end{aligned}$$

holds for all \(\gamma >0\) and \(h \ge 0\).

5 Active set method for the regularized problem

Let us now consider the special case where \(y = Ku\) is given as the unique solution of the partial differential equation

$$\begin{aligned} \left\{ \begin{aligned} Ay = u \quad&\text {in} \quad \Omega ,\\ y = 0 \quad&\text {on} \quad \partial \Omega , \end{aligned} \right. \end{aligned}$$
(5.1)

with A being a linear second-order elliptic differential operator, e.g., \(A = - \Delta \). In this case, the optimality conditions from Proposition 2.2 can be solved using a superlinearly convergent semi-smooth Newton method in function space; see [3, 6, 7].

We recall that (2.3) can be written as \(u_\gamma = H_\gamma (p_\gamma )\) for \(H_\gamma :L^r(\Omega )\rightarrow L^2(\Omega )\) with \(r\ge 2\),

$$\begin{aligned} {[}H_\gamma (p)](x) = {\left\{ \begin{array}{ll} u_i &{} \text {if } p(x) \in Q_i^\gamma , \quad 1 \le i \le d,\\ \frac{1}{\gamma }\left( p(x) - \frac{\alpha }{2}(u_i + u_{i+1}) \right) &{} \text {if } p(x) \in Q_{i,i+1}^\gamma , \quad 1 \le i < d, \end{array}\right. } \end{aligned}$$

where \(p_\gamma \in H^1_0(\Omega )\) is the solution to the adjoint equation

$$\begin{aligned} \left\{ \begin{aligned} A^*p&= z-y_\gamma \quad \text {in} \quad \Omega ,\\ p&= 0 \quad \quad \qquad \text {on} \quad \partial \Omega , \end{aligned} \right. \end{aligned}$$
(5.2)

and \(y_\gamma \) is the solution to (5.1) with \(u=u_\gamma \). From the natural \(H^1_0(\Omega )\) regularity of solutions to (5.2), the Sobolev embedding \(H^1_0(\Omega )\hookrightarrow L^r(\Omega )\) for some \(r>2\), and the general theory of semi-smooth Newton methods in function space [22], we deduce that the superposition operator \(H_\gamma \) is Newton differentiable from \(L^r(\Omega )\) to \(L^2(\Omega )\) with

$$\begin{aligned} {[}D_N H_\gamma (p) h](x) = {\left\{ \begin{array}{ll} \frac{1}{\gamma }h(x) &{} \text {if } p(x) \in Q_{i,i+1}^\gamma , \quad 1 \le i < d,\\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$
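Pointwise, \(D_N H_\gamma (p)\) thus acts as multiplication by \(\frac{1}{\gamma }\) on the transition regions and by 0 elsewhere. A NumPy sketch of this multiplier, complementing the sketch of \(H_\gamma \) given after Proposition 2.2 (again our own naming):

```python
import numpy as np

def DN_H_gamma(p, ubar, alpha, gamma):
    """Pointwise Newton derivative multiplier of H_gamma:
    1/gamma on the transition regions Q_{i,i+1}^gamma, zero elsewhere."""
    p = np.asarray(p, dtype=float)
    c = 0.5 * alpha * (ubar[:-1] + ubar[1:])
    lo, hi = c + gamma * ubar[:-1], c + gamma * ubar[1:]
    on_trans = np.zeros(p.shape, dtype=bool)
    for l, r in zip(lo, hi):
        on_trans |= (p >= l) & (p <= r)
    return np.where(on_trans, 1.0 / gamma, 0.0)

# the action of D_N H_gamma(p) on a direction hvec is the pointwise product
# DN_H_gamma(p, ubar, alpha, gamma) * hvec
```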

A Newton step for the solution of (\(P_\gamma \)) can therefore be formulated as

$$\begin{aligned} \begin{pmatrix} -\text {Id} &{} A &{} 0 \\ 0 &{} \text {Id} &{} A^*\\ 0 &{} A &{} -D_N H_\gamma (p^k) \end{pmatrix} \begin{pmatrix} u^{k+1} - u^k\\ y^{k+1} - y^k \\ p^{k+1} - p^k \end{pmatrix} = - \begin{pmatrix} Ay^k - u^k\\ A^*p^k + y^k - z \\ Ay^k - H_\gamma (p^k) \end{pmatrix} \end{aligned}$$
(5.3)

In [3], this was reduced to a symmetric system in \((y, p)\). Here, we instead consider an equivalent primal active set formulation that has proven to be more robust for small values of \(\gamma \) and h. In a slight abuse of notation, we introduce

$$\begin{aligned} Q_i^k:=\left\{ x\in \Omega :p^k(x)\in Q_i^\gamma \right\} ,\qquad 1\le i\le d, \end{aligned}$$

and similarly for \(Q_{i,i+1}^k\). The following algorithm is an extension of the one proposed in [20] for \(G(u)=\Vert u\Vert _{L^1(\Omega )}\).

Algorithm 1

Choose initial data \(u^0, p^0\) and parameters \(\alpha , \gamma \), set \(k=0\) and compute the sets \(Q_i^0\) for \(1 \le i \le d\) and \(Q_{i,i+1}^0\) for \(1 \le i < d\).

  1.

    Solve for \((u^{k+1},y^{k+1}, p^{k+1}, \lambda ^{k+1})\) satisfying

    $$\begin{aligned} \left\{ \begin{aligned} Ay^{k+1} -u^{k+1}&=0 ,\\ A^*p^{k+1} + y^{k+1}-z&= 0,\\ - p^{k+1}+\gamma u^{k+1} + \alpha \lambda ^{k+1}&= 0, \end{aligned} \right. \end{aligned}$$
    (5.4a)
    $$\begin{aligned} \left( 1- \sum \limits _{i=1}^d \mathbb {1}_{Q_i^k} \right) \lambda ^{k+1}&+ \left( 1-\sum \limits _{i=1}^{d-1} \mathbb {1}_{Q_{i,i+1}^k} \right) u^{k+1}\\&=\sum \limits _{i=1}^d \mathbb {1}_{Q_i^k} u_i + \frac{1}{2}\sum \limits _{i=1}^{d-1} \mathbb {1}_{Q_{i,i+1}^k} (u_i + u_{i+1}). \end{aligned}$$
    (5.4b)
  2.

    Compute the sets \(Q_i^{k+1}\) for \(1 \le i \le d\) and \(Q_{i,i+1}^{k+1}\) for \(1 \le i < d\).

  3.

    If \(Q_i^k = Q_i^{k+1}\) for \(1 \le i \le d\) and \(Q_{i,i+1}^k = Q_{i,i+1}^{k+1}\) for \(1 \le i < d\), then go to step 4. Otherwise set \(k = k+1\) and go to step 1.

  4.

    STOP: \(u^{k+1}\) is a solution of (\(P_\gamma \)).
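To illustrate the structure of Algorithm 1, the following self-contained sketch implements it for a one-dimensional model problem with \(A = -\Delta \) (so that \(A^* = A\)) discretized by finite differences. This is our own illustration and not the FEniCS implementation referenced in Sect. 6; in each iteration, u and \(\lambda \) are eliminated from (5.4) via (5.4b), and the remaining linear system in (y, p) is solved:

```python
import numpy as np

def active_set_multibang_1d(n, z, ubar, alpha, gamma, maxit=50):
    """Sketch of Algorithm 1 for Omega = (0, 1), A = -d^2/dx^2 with homogeneous
    Dirichlet conditions, on n interior finite difference nodes.

    z : array of length n (target on the grid); ubar : increasing numpy array.
    """
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    # transition regions Q_{i,i+1}^gamma = [c_i + gamma u_i, c_i + gamma u_{i+1}]
    c_mid = 0.5 * alpha * (ubar[:-1] + ubar[1:])
    lo, hi = c_mid + gamma * ubar[:-1], c_mid + gamma * ubar[1:]
    p, sets_old = np.zeros(n), None
    for k in range(maxit):
        # compute the active sets Q_i^k and Q_{i,i+1}^k from the current adjoint
        trans_idx = -np.ones(n, dtype=int)
        for j in range(len(ubar) - 1):
            trans_idx[(p >= lo[j]) & (p <= hi[j])] = j
        on_trans = trans_idx >= 0
        val_idx = np.minimum(np.searchsorted(hi, p), len(ubar) - 1)
        u_fix = np.where(on_trans, 0.0, ubar[val_idx])
        c = np.where(on_trans, c_mid[np.maximum(trans_idx, 0)], 0.0)
        sets = np.where(on_trans, -(trans_idx + 1), val_idx)
        if sets_old is not None and np.array_equal(sets, sets_old):
            break  # stopping criterion of step 3
        sets_old = sets
        # (5.4) with u and lambda eliminated: u = u_i on Q_i^k and
        # u = (p - c_i)/gamma on Q_{i,i+1}^k; solve for (y, p)
        D = np.diag(on_trans / gamma)
        K = np.block([[A, -D], [np.eye(n), A]])
        rhs = np.concatenate([u_fix - on_trans * c / gamma, z])
        y, p = np.split(np.linalg.solve(K, rhs), 2)
    u = np.where(on_trans, (p - c) / gamma, u_fix)
    return x, u, y, p
```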

The stopping criterion yields solutions of (\(P_\gamma \)).

Lemma 5.1

If

$$\begin{aligned} Q_i^k&= Q_i^{k+1} \quad 1 \le i \le d,\\ Q_{i,i+1}^k&= Q_{i,i+1}^{k+1} \quad 1 \le i < d, \end{aligned}$$

then the solution \((u^{k+1}, p^{k+1})\) computed from (5.4) satisfies (2.3). In particular, \(u^{k+1}\) is a solution to (\(P_\gamma \)).

Proof

Since for fixed \(Q_i^k\) and \(Q_{i,i+1}^k\) the solution of (5.4) is unique, we have \((u^k, y^k, p^k) = (u^{k+1}, y^{k+1}, p^{k+1})\). Inserting this into (5.4b) and comparing with (2.3) yields the claim. \(\square \)

We now show that Algorithm 1 coincides with a semi-smooth Newton method, which implies locally superlinear convergence.

Theorem 5.1

The active set step (5.4) is equivalent to the semi-smooth Newton step (5.3).

Proof

Clearly, the first two equations of (5.3) are equivalent to the first two equations of (5.4a). It therefore remains to consider the last equation, which is given by

$$\begin{aligned} A(y^{k+1} - y^k) - D_N H_\gamma (p^k) (p^{k+1} - p^k) = - Ay^k + H_\gamma (p^k). \end{aligned}$$
(5.5)

Let us define the function

$$\begin{aligned} \lambda ^{k+1}(x) := {\left\{ \begin{array}{ll} -\frac{1}{\alpha } \left( - p^{k+1}(x) + \gamma u^{k+1}(x) \right) &{} \text {if } x \in Q_i^k,\\ \frac{1}{2}(u_i + u_{i+1}) &{} \text {if } x \in Q_{i,i+1}^k. \end{array}\right. } \end{aligned}$$

We now make a case distinction pointwise almost everywhere.

  (i)

    If \(x\in Q_i^k\), (5.5) reduces to \([Ay^{k+1}](x) = u_i\), and from the first line of (5.3) we obtain \(u^{k+1}(x) = u_i\).

  (ii)

    If \(x\in Q_{i,i+1}^k\), (5.5) shows that

    $$\begin{aligned} \gamma u^{k+1}(x) - p^{k+1}(x) + \frac{\alpha }{2}(u_i + u_{i+1}) = \gamma u^{k+1}(x) - p^{k+1}(x) + \alpha \lambda ^{k+1}(x)=0. \end{aligned}$$

Hence the third row of (5.3) is equivalent to (5.4b). In both cases, we obtain from the definition of \(\lambda ^{k+1}\) that

$$\begin{aligned} -p^{k+1} + \gamma u^{k+1} + \alpha \lambda ^{k+1} = 0, \end{aligned}$$

which finally gives (5.4a) and therefore the claimed equivalence. \(\square \)

6 Numerical results

In this section we present some numerical results and the observed convergence rates. Let \(\Omega \subset \mathbb {R}^n\) be a bounded Lipschitz domain and K be the operator mapping u to the weak solution y of

$$\begin{aligned} \left\{ \begin{aligned} - \Delta y = u \quad&\text {in} \quad \Omega ,\\ y = 0 \quad&\text {on} \quad \partial \Omega . \end{aligned} \right. \end{aligned}$$
(6.1)

The operator \(K_h\) is correspondingly defined via the Galerkin approximation of (6.1) using linear finite elements on a triangulation of \(\Omega \), which is chosen in such a way that the approximation condition (4.1) is satisfied; see [23]. For the multibang penalty, we take \((u_1,\dots ,u_5) = (-2, -1, 0, 1, 2)\) and \(\alpha = 2\). We implemented Algorithm 1 in Python using DOLFIN [16, 17], which is part of the open-source computing platform FEniCS [1, 15]. The linear system (5.4) arising from the active set step is solved using the sparse direct solver spsolve from SciPy. The code used to obtain the following results can be downloaded from https://github.com/clason/multibangestimates.
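Since that implementation relies on FEniCS, we do not reproduce it here; as a quick self-contained experiment, the finite difference sketch of Algorithm 1 from Sect. 5 can be driven as follows (all data chosen by us; the target is scaled so that the adjoint state crosses all singular levels):

```python
import numpy as np

ubar = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # desired values, alpha = 2
n = 255
x = np.linspace(1.0 / (n + 1), 1.0 - 1.0 / (n + 1), n)
z = 1e3 * np.sin(2 * np.pi * x)
x, u, y, p = active_set_multibang_1d(n, z, ubar, alpha=2.0, gamma=1e-3)
# away from the (very thin) transition regions, u takes the values u_i
```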

Example 1: \(\kappa = 1\) We first consider \(\Omega =(0,1)\) and define

$$\begin{aligned} \bar{p}(x) :=&\left( \tfrac{27}{2} x\right) \mathbb {1}_{[0,\frac{2}{9})}(x)\\&+\left( -72 + \tfrac{3123 x}{2} - 13122 x^2 + 54675 x^3 - 111537 x^4\right. \\&\left. + \tfrac{177147}{2} x^5\right) \mathbb {1}_{[\frac{2}{9}, \frac{3}{9})}(x)\\&+\left( 9 - 18 x\right) \mathbb {1}_{[\frac{3}{9}, \frac{6}{9})}(x)\\&+\left( -20079 + 136062 x - 367416 x^2 + 494262 x^3 \right. \\&\left. - \tfrac{662661}{2} x^4 +\tfrac{177147}{2} x^5\right) \mathbb {1}_{[\frac{6}{9}, \frac{7}{9})}(x)\\&+\left( -\tfrac{27}{2} + \tfrac{27}{2} x\right) \mathbb {1}_{[\frac{7}{9},1]}(x),\\ \bar{u}(x) :=&\mathbb {1}_{[\frac{2}{27}, \frac{2}{9})}(x) + 2\mathbb {1}_{[\frac{2}{9}, \frac{3}{9})}(x) + \mathbb {1}_{[\frac{3}{9}, \frac{4}{9})}(x)\\&-\mathbb {1}_{[\frac{5}{9}, \frac{6}{9})}(x) -2\mathbb {1}_{[\frac{6}{9}, \frac{7}{9})}(x) -\mathbb {1}_{[\frac{7}{9}, \frac{25}{27})}(x),\\ \bar{y}(x) :=&\sin (2 \pi x)\\ e_\Omega :=&-\Delta \bar{y} - \bar{u},\\ z :=&- Ke_\Omega - \Delta \bar{p} + \bar{y}, \end{aligned}$$

see Fig. 1a and b. Note that \(\bar{p}, \bar{y} \in C^2(\overline{\Omega })\) and that \(\bar{u}\) and \(\bar{p}\) satisfy the optimality conditions in Proposition 2.1. Hence, \(\bar{u}\) is a solution to (P) with associated adjoint state \(\bar{p}\). From Theorem 3.1 we further deduce that Assumption REG is satisfied with \(\kappa = 1\).

Fig. 1

Constructed optimal adjoint states \(\bar{p}\) and optimal control \(\bar{u}\). a adjoint state, Example 1. b control, Examples 1 and 2. c adjoint state, Example 2

We now compute the solution of (\(P_{\gamma ,h}\)) for different values of h, where \(\Omega \) is divided into equidistant elements with mesh size h. From Theorem 3.2 we expect that the numerical convergence rate

$$\begin{aligned} \kappa _{\gamma ,h} := \frac{1}{\log (2)} \log \left( \frac{\Vert u_{\gamma ,h} - \bar{u}\Vert _{L^2(\Omega )}^2}{\Vert u_{\gamma /2,h} - \bar{u}\Vert _{L^2(\Omega )}^2} \right) \end{aligned}$$

satisfies \(\kappa _{\gamma ,h}\ge \kappa =1\). We compute \(\kappa _{\gamma ,h}\) for different but fixed mesh sizes h. Due to the discretization error, we expect a certain saturation effect for small \(\gamma \); see Theorem 4.2. Note that for \(d=2\), it is known that Assumption REG is not only sufficient for convergence rates similar to Theorem 3.2 but also necessary for high convergence rates; see [25]. Hence, we expect that \(\kappa _{\gamma ,h} \approx 1\), which can be observed from Table 1a and Fig. 2a. In addition, the discretization error dominates for small \(\gamma \) as expected.
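The numerical rates can be tabulated from the computed errors; a small sketch matching the definition of \(\kappa _{\gamma ,h}\) above (the data layout is our assumption):

```python
import numpy as np

def numerical_rates(errors):
    """Numerical order kappa_{gamma, h} from a dict
    {gamma: squared L2 error of u_{gamma, h}} with gamma halved between entries."""
    return {g: np.log(errors[g] / errors[g / 2]) / np.log(2)
            for g in sorted(errors, reverse=True) if g / 2 in errors}

# errors behaving like c * gamma^kappa yield rates close to kappa:
print(numerical_rates({g: 0.3 * g for g in (1.0, 0.5, 0.25, 0.125)}))
# -> rates approximately 1.0 for gamma = 1.0, 0.5, 0.25
```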

Table 1 Computed numerical order of convergence for different h
Fig. 2

Discretization and approximation error \(\Vert u_{\gamma ,h} - \bar{u}\Vert ^2_{L^2(\Omega )}\) for different \(\gamma \) and h. a Example 1. b Example 2

Example 2: \(\kappa < 1\) We also consider an example where Assumption REG is only satisfied with \(\kappa < 1\). The idea is to violate the assumption of the sufficient condition presented in Theorem 3.1. We modify the adjoint state \(\bar{p}\) from Example 1 to

$$\begin{aligned} \bar{p}(x):= & {} \left( \tfrac{27}{2} x \right) \mathbb {1}_{[0,\tfrac{3}{27})}(x)\\&\quad +\left( 266085 x^5 - \tfrac{433593}{2} x^4 + \tfrac{135765}{2} x^3 - \tfrac{20437}{2} x^2 + \tfrac{6812}{9} x - \tfrac{1703}{81} \right) \mathbb {1}_{[\tfrac{3}{27}, \tfrac{2}{9})}(x)\\&\quad +\left( 11334492 x^5 - 14168034 x^4 + 7054821x^3 - \tfrac{3498235}{2} x^2\right. \\&\quad \left. + \tfrac{1943450}{9}x - \tfrac{860051}{81} \right) \mathbb {1}_{[\tfrac{2}{9}, \tfrac{5}{18})}(x)\\&\quad +\left( -11334492 x^5 + 17316666 x^4 - 10553301 x^3 + \tfrac{6413635}{2} x^2\right. \\&\quad \left. - \tfrac{1457650}{3} x + \tfrac{528697}{18} \right) \mathbb {1}_{[\tfrac{5}{18}, \tfrac{3}{9})}(x)\\&\quad +\left( -\tfrac{709317}{2} x^5 + 696195 x^4 - \tfrac{1085913}{2} x^3 + 210182 x^2 - \tfrac{121150}{3} x + \tfrac{27761}{9} \right) \mathbb {1}_{[\tfrac{3}{9}, \tfrac{4}{9})}(x)\\&\quad +\left( - 18x + 9 \right) \mathbb {1}_{[\tfrac{4}{9},\tfrac{5}{9})}(x)\\&\quad +\left( - \tfrac{707859}{2} x^5 + \tfrac{2149821}{2}x^4 - \tfrac{2604285}{2} x^3 + \tfrac{1573075}{2}x^2 - \tfrac{710804}{3}x + \tfrac{256331}{9} \right) \mathbb {1}_{[\tfrac{5}{9},\tfrac{6}{9})}(x)\\&\quad +\left( -11340324 x^5 + 39376206 x^4 - 54660123 x^3 + \tfrac{75835981}{2} x^2 \right. \\&\quad \left. - \tfrac{39434798}{3} x + \tfrac{16396175}{9} \right) \mathbb {1}_{[\tfrac{6}{9},\tfrac{13}{18})}(x)\\&\quad +\left( 11340324 x^5 - 42526134 x^4 + 63759915 x^3 - \tfrac{95552197}{2} x^2 \right. \\&\quad \left. + \tfrac{161022862}{9} x - \tfrac{433967467}{162} \right) \mathbb {1}_{[\tfrac{13}{18},\tfrac{7}{9})}(x)\\&\quad +\left( 265356 x^5 - \tfrac{2221101}{2} x^4 + \tfrac{3712707}{2} x^3 - 1549124 x^2\right. \\&\quad \left. + \tfrac{11616563}{18} x - \tfrac{17395339}{162} \right) \mathbb {1}_{[\tfrac{7}{9},\tfrac{8}{9})}(x)\\&\quad +\left( \tfrac{27}{2} x -\tfrac{27}{2} \right) \mathbb {1}_{[\tfrac{8}{9},1]}(x), \end{aligned}$$

see Fig. 1c, while the remaining functions remain unchanged. Note that for, e.g., \(\hat{x} :=\frac{2}{9}\), we obtain \(\bar{p}'(\hat{x})=0\) and \(\bar{p}(\hat{x}) = 3\), which violates the assumption of Theorem 3.1. Hence we expect that Assumption REG only holds with some \(\kappa < 1\), resulting in a slower convergence rate; see Theorem 3.2. This is corroborated by our numerical results: we obtain \(\kappa _{\gamma ,h} \approx 0.35 < 1\), as can be seen in Table 1b and Fig. 2b. Due to the slower convergence, we do not observe a saturation effect for the chosen range of \(\gamma \) and h.

7 Conclusions

For optimal control problems with a convex penalty promoting minimizers that pointwise almost everywhere take values from a given discrete set, Moreau–Yosida approximation allows the solution by a superlinearly convergent semi-smooth Newton method. Under a structural assumption on the behavior of the adjoint state near the singular sets, convergence rates as the approximation parameter \(\gamma \rightarrow 0\) can be derived. The same assumption also yields discretization error estimates for fixed \(\gamma >0\). Numerical experiments corroborate the predicted rates.

This work can be extended in a number of directions. First, an active set condition similar to Assumption REG was derived in [19] for the approximation of bang-bang control of a semilinear equation and could be adapted to the multibang control setting. Of particular interest would be the extension to problems where the control enters into the principal part of an elliptic equation as in the case of topology optimization problems [5, 7].

On the other hand, the applicability of the multibang penalty G to the regularization of inverse problems was demonstrated in [3]. There, a condition related to Assumption REG was used to derive strong convergence as \(\alpha \rightarrow 0\), albeit without rates. A natural question is whether the more quantitative Assumption REG would allow obtaining such rates, at least in \(L^2(\Omega )\). Finally, combined regularization, approximation, and discretization estimates for the convergence \((\alpha ,\gamma ,h)\rightarrow 0\) would be highly useful.