1 Introduction

This paper is concerned with optimal control problems governed by semilinear parabolic partial differential equations (PDEs) subject to pointwise in time constraints on mean values in space of the solution to the PDE. We derive convergence rates for a space-time discretization of the problem based on conforming finite elements in space and a discontinuous Galerkin discretization method in time. The control variable is a vector-valued function depending on time, acting distributed in the domain. The inequality constraint on the solution of the PDE is imposed pointwise in time and averaged in space. The precise problem formulation will be given in the next section.

The class of problems under consideration is a simplified model motivated by applications from industrial processes like cooling/heating in steel manufacturing, or tumor therapy in biomathematics. For an extended overview of the possible applications, we refer, e.g., to [1, 2]. In many of these applications, the control variable depends on finitely many parameters with fixed spatial influence but varying in time. Further, especially in cooling processes and material optimization, bounds on the state variable and its derivatives are prescribed to avoid material failure and to preserve product quality.

Despite all these interesting applications, there are only a few contributions to the literature on a priori error estimates for semilinear parabolic optimal control, even without state constraints. This can be explained by the rather low regularity properties of parabolic equations compared to the more frequently discussed case of elliptic problems. This already becomes apparent when considering convex linear-quadratic examples, and it becomes even more severe when nonconvexity, caused by nonlinear state equations, needs to be addressed. To start with the latter issue, it is well known that, in contrast to convex problems, the first-order necessary conditions are no longer sufficient for optimality; hence, second-order sufficient optimality conditions (SSCs) are of interest, also for a priori discretization error estimates. These, or rather the resulting quadratic growth condition, can for instance be used to prove finite element error estimates. When discussing SSCs, one is often interested in proving or using conditions with minimal gap to the second-order necessary optimality conditions, as they are more likely to hold.

One of our main aims, and a novelty of this paper, is to use weak SSCs for the continuous problem as derived in [3] in order to prove discretization error estimates. This is not straightforward if at the same time a clear separation of the influence of spatial and temporal discretization is desired. For elliptic state-constrained problems, it is known that the proof of convergence requires only the quadratic growth condition for the continuous problem. Once quadratic growth for the continuous problem is given, it does not matter whether it is due to weak or strong SSCs; see [4] in the elliptic case with pointwise state constraints.

In the parabolic setting, if one wants to achieve a clear separation of the spatial and temporal errors, the numerical analysis of both convex and nonconvex problems has previously been done in two steps by introducing an intermediate time-discrete problem, cf. [5,6,7,8] for convex or [9] for nonconvex problems. As a consequence of this approach, a quadratic growth condition has to hold for a time-discrete version of a nonconvex problem in order to prove an error estimate for the fully discrete problem. Rather than relying on an additional assumption for each time-discrete problem, SSCs can be transferred from the continuous to the semidiscrete level if one uses a rather strong SSC; see [9]. In contrast, weak SSCs have not been shown to be stable with respect to time discretization, and it is not clear at all if this is possible without further assumptions.

In this paper, in favor of the more general weak SSCs, we will derive error estimates without the use of an intermediate auxiliary problem. Our main result is the error estimate of Theorem 5.3, whose orders of convergence coincide with those obtained for convex problems in [10].

Our technique allows us to derive an estimate for the error between the continuous and semidiscrete solution that depends only on the time step size. The price we pay for not transferring the SSC to the time-discrete level is that proving an analogous error estimate between the time-discrete and fully discrete problem is a difficult open problem. Let us note that error estimates for the control immediately imply an error estimate of the same order for the state due to Lipschitz properties of the control-to-state mapping. However, numerical experiments often suggest a better order of convergence for the state. Proving this can be rather involved and has only been done in special settings, cf. [11, 12], even in the case of elliptic equations.

As already mentioned, there are rather few contributions to the error analysis for problems governed by semilinear parabolic equations. Among them is [9], discussing a setting including bilateral control constraints. The authors also discussed several control discretization approaches. Error estimates were obtained in [13, 14] for a problem without control and state constraints. Overall, the literature on state-constrained semilinear elliptic problems is less sparse, and we refer the reader to [4, 15, 16] and the references therein.

The lack of results for semilinear parabolic problems in the presence of state constraints is also explained by the sparsity of results for the corresponding linear theory, also due to rather low regularity properties of parabolic equations. Only recently, error estimates for the maximal error in time and different norms in space were derived for a space-time discretization of a linear parabolic state equation in [10] and [7]. Indeed, such estimates are necessary for the consideration of constraints pointwise in time on the mean value of the state variable and its first derivative. We would also like to point out the new result [17], where a pointwise (quasi)-best-approximation result for the maximal space-time error for the discretization of linear parabolic PDEs has been derived.

Confining ourselves to the linear parabolic setting, error estimates for pointwise in time and space state constraints are derived in [18], while [19] is concerned with the variational discretization approach. For control constraints, we refer to [5, 6].

Let us end this introduction with some remarks concerning second-order optimality conditions for state-constrained parabolic optimal control problems, which have recently attracted attention. A well-written survey on the state of the art can be found in [20].

For the case at hand, we will rely on second-order sufficient conditions (SSCs) that were introduced in [3]. The authors, inspired by techniques from nonlinear optimization in finite-dimensional spaces, obtained SSCs that are very close to the necessary ones. Their analysis was limited to the one-dimensional case and has been extended in [1] to domains of arbitrary dimensions considering, as in our case, vector-valued control functions depending on time only. Due to the nature of the problem, the resulting cone of critical directions can also be recast in terms of the theory of semi-infinite optimization [21].

Seminal papers for the theory of SSCs in the presence of integral state constraints are [22, 23]. The former deals with boundary controls and handles the state constraints using Ekeland’s principle. The latter considers a nonlinearity in the boundary conditions and uses concepts of semigroup theory to cope with the limitations on the dimension of the domain. More recently, [24] has overcome the limitation in the dimension using concepts of maximal parabolic regularity. For other contributions to the theory of SSCs in the presence of state constraints, we refer to [25,26,27]. Parts of this manuscript are considered in the PhD thesis of Francesco Ludovici [28].

The paper is organized as follows: in Sect. 2, we give a precise definition of the model problem sketched in (1), introduce the operators and functionals involved in the analysis, and state first- and second-order optimality conditions. Section 3 is devoted to the time and space discretization of the problem. After these preliminary sections, collecting ‘essentially known’ results, we derive new a priori estimates for the discretization error between the solution of the continuous, semidiscrete and discrete state equation, extending techniques from [10] for linear parabolic problems to the semilinear case at hand, in Sect. 4. The core of the paper is Sect. 5. Extending techniques for the elliptic case presented in [4] to the parabolic case, we derive the rate of convergence for the optimal control problem.

2 Problem Formulation, Assumptions, and Analytic Setting

In this section, we introduce our model problem, discuss its precise analytic setting, introduce the main assumptions, and fix the notation. Extending the result of [10] to the case of semilinear parabolic PDEs, we consider, for a time interval \( I=]0,T[ \) and a convex bounded domain \( \varOmega \subset {\mathbb {R}}^{2} \) with boundary \( \partial \varOmega \), the following problem

$$\begin{aligned} \min \; J(q,u):=\frac{1}{2}\int _{I}\int _{\varOmega }(u(t,x)-u_{\mathrm{d}}&(t,x))^{2}\,\mathrm {d}x\,\mathrm {d}t+\frac{\alpha }{2}\int _{I}q(t)^Tq(t)\,\mathrm {d}t, \end{aligned}$$
(1a)

where the state \(u(t,x)\) and the control \(q(t)=(q_{i}(t))_{i=1}^m\) are coupled by the semilinear parabolic PDE

$$\begin{aligned} \partial _{t}u(t,x)-\varDelta u(t,x) +d(t,x,u(t,x))&=\sum \limits _{i=1}^{m}q_{i}(t)g_{i}(x),\qquad \,\text {in} \;\,I \times \varOmega , \nonumber \\ u(t,x)&= 0,\qquad \qquad \qquad \,\text {on} \,I\times \partial \varOmega ,\nonumber \\ u(0,x)&= u_{0},\qquad \qquad \qquad \text {in}\;\lbrace 0 \rbrace \times \varOmega , \end{aligned}$$
(1b)

with a monotone and smooth nonlinearity d. Further, we consider control constraints

$$\begin{aligned} q_{\min }\le q(t)&\le \, q_{\max }, \qquad \text {a.e. in}\,I, \end{aligned}$$
(1c)

and, for a given weighting function \( \omega (x) \), pointwise in time state constraints

$$\begin{aligned} \int _{\varOmega }u(t,x)\omega (x)\mathrm {d}x&\le 0, \qquad \forall t \in [0,T]. \end{aligned}$$
(1d)

For \( i = 1,\ldots ,m \), we consider controls \( q_{i}\! \in \! L^{2}(I) \) and fixed functions \( g_{i} \!\in \! L^{\infty }(\varOmega ) \). We assume that the desired state satisfies \( u_{\mathrm{d}} \in L^{2}(I \times \varOmega ) \) and the initial data satisfies \( u_{0}\in H^{1}_{0}(\varOmega )\cap H^{2}(\varOmega )\hookrightarrow C({\text {cl}}(\varOmega )) \).

In the following, we set \( V:=H^{1}_{0}(\varOmega ) \), \( H:=L^{2}(\varOmega ) \); \( (\cdot ,\cdot )_{I} \) denotes the standard inner product in \( L^{2}(I,H) \), i.e., \( (\cdot ,\cdot )_{I} = \int _{I}(\cdot ,\cdot )\mathrm {d}t \) with associated norm \( \Vert \cdot \Vert _{I} \), while \( (\cdot , \cdot ) \) and \( \Vert \cdot \Vert \) are used for the inner product and norm of \( L^{2}(\varOmega ) \). The state constraint is denoted by \( F(u):=(u, \omega ) \), where \( \omega \in L^{\infty }(\varOmega ) \) is a weighting function. Throughout the paper, c will denote a generic constant that is independent of the discretization parameters and may take different values at each appearance.

With appropriate discretization of the problem, we will derive our main result, Theorem 5.3, namely the convergence

$$\begin{aligned} \Vert \bar{q}-\bar{q}_{kh}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\le c\Bigg (k^\frac{1}{2}\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{4}}+h\Big (\log \frac{T}{k}+1\Big )^\frac{1}{2}\Bigg ) \end{aligned}$$

between a locally optimal control \(\bar{q}\) of the model problem, and its discrete counterpart \(\bar{q}_{kh}\). The semidiscrete analogue

$$\begin{aligned} \Vert \bar{q}-\bar{q}_{k}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le ck^\frac{1}{2}\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{4}}, \end{aligned}$$

which depends only on the time step size k, could be shown with the same technique. As mentioned in the introduction, the price we pay for not transferring the SSC to the time-discrete level is that proving an analogous error estimate between the time-discrete and fully discrete problem, i.e.,

$$\begin{aligned} \Vert \bar{q}_k-\bar{q}_{kh}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le ch\Big (\log \frac{T}{k}+1\Big )^\frac{1}{2} \end{aligned}$$

is delicate, as it would require an SSC for the time-discrete problem in which all constants are independent of the time discretization.
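To get a feeling for how the temporal and spatial contributions in the fully discrete estimate compare, the following minimal sketch evaluates the right-hand side of the bound. The function name `control_error_bound` and the choice c = 1 are assumptions made purely for illustration; the analysis only asserts the existence of a generic constant and does not quantify it.

```python
import math

def control_error_bound(k, h, T=1.0, c=1.0):
    """Right-hand side of the fully discrete estimate with generic constant c.

    c = 1 is a placeholder; the analysis only asserts that some c exists.
    """
    if not (0.0 < k < T) or h <= 0.0:
        raise ValueError("need 0 < k < T and h > 0")
    log_term = math.log(T / k) + 1.0
    temporal = math.sqrt(k) * log_term ** 0.25   # k^(1/2) (log(T/k) + 1)^(1/4)
    spatial = h * math.sqrt(log_term)            # h (log(T/k) + 1)^(1/2)
    return c * (temporal + spatial)

# Coupling k ~ h^2 balances both terms up to logarithmic factors.
for h in (0.1, 0.05, 0.025):
    print(h, control_error_bound(k=h * h, h=h))
```

With the coupling k = h², both terms are of order h up to logarithmic factors, so refining only one of the two discretization parameters does not improve the bound beyond the other contribution.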

Before discussing the problem in detail, we impose the following usual assumptions on the nonlinearity; see, e.g., [29, Chapter 5, Assumption 5.6].

Assumption 2.1

The nonlinearity \( d :I \times \varOmega \times {\mathbb {R}}\rightarrow {\mathbb {R}}\), \( (t,x,u) \mapsto d(t,x,u) \), is assumed to satisfy the following:

(i):

For all \(u \in {\mathbb {R}}\), the nonlinearity is measurable with respect to \( (t, x) \in I \times \varOmega \). Further, for almost every \( (t, x) \in I \times \varOmega \) it is twice continuously differentiable with respect to u.

(ii):

For \( u = 0 \), there is \( c > 0 \) such that \( d(t,x,u) \) satisfies, together with its derivatives up to order two, the boundedness condition

$$\begin{aligned} \Vert d(\cdot ,\cdot ,0)\Vert _{L^\infty (I \times \varOmega )}+\Vert \partial _{u} d(\cdot ,\cdot ,0)\Vert _{L^\infty (I \times \varOmega )} + \Vert \partial _{u}^2d(\cdot ,\cdot ,0)\Vert _{L^\infty (I \times \varOmega )} \le c. \end{aligned}$$

Further, each of these derivatives satisfies a local Lipschitz condition with respect to u, i.e., for any \( M>0 \) there exists a constant \( L(M)>0 \) such that for any \(|u_{j}| \le M\), \(j=1,2\), there holds

$$\begin{aligned} \Vert \partial _{u}^id(\cdot , \cdot , u_{1})-\partial _{u}^id(\cdot , \cdot , u_{2})\Vert _{L^\infty (I \times \varOmega )} \le L(M)\vert u_{1} - u_{2} \vert , \end{aligned}$$

for every \(i = 0,1,2\).

(iii):

For all \( u \in {\mathbb {R}}\) and for almost every \( (t, x) \in I \times \varOmega \), there holds the monotonicity condition

$$\begin{aligned} \partial _{u}d(t, x, u) \ge 0. \end{aligned}$$
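As an illustration of Assumption 2.1, the following sketch checks the conditions numerically for the cubic nonlinearity d(u) = u³. This particular choice of d, the bound M = 2, and the helper `lipschitz_estimate` are assumptions made here for illustration only and are not taken from the analysis; a grid check of this kind is a sanity test, not a proof.

```python
import numpy as np

# Illustrative nonlinearity (an assumption of this sketch): d(u) = u^3.
def d(u):
    return u ** 3

def d_u(u):
    return 3.0 * u ** 2

def d_uu(u):
    return 6.0 * u

def lipschitz_estimate(f, M, n=2001):
    """Largest difference quotient of f on a uniform grid of [-M, M]."""
    u = np.linspace(-M, M, n)
    return float(np.max(np.abs(np.diff(f(u)) / np.diff(u))))

M = 2.0
grid = np.linspace(-M, M, 2001)
monotone = bool(np.all(d_u(grid) >= 0.0))                        # condition (iii) on the grid
bounded_at_zero = abs(d(0.0)) + abs(d_u(0.0)) + abs(d_uu(0.0))   # condition (ii) at u = 0
L_est = [lipschitz_estimate(f, M) for f in (d, d_u, d_uu)]       # local Lipschitz constants, i = 0, 1, 2
print(monotone, bounded_at_zero, L_est)
```

For this d, the estimated constants stay below 3M², 6M and 6, respectively, which reflects that L(M) in condition (ii) is allowed to grow with M.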

When no confusion arises, we shorten the notation for the semilinearity from \( d(\cdot , \cdot , u) \) to \( d(u) \). We now focus on the well-posedness of the state equation (1b). We introduce the Hilbert space

$$\begin{aligned} W(0,T)= \lbrace u \in L^{2}(I, V):\partial _{t} u \in L^{2}(I, V^{*}) \rbrace , \end{aligned}$$

and the space of admissible controls

$$\begin{aligned} Q_{\mathrm{ad}}= \left\{ q \in L^{2}(I, {\mathbb {R}}^{m}):q_{\min }\le q(t) \le q_{\max },\, \text {a.e. in}\,I \right\} , \end{aligned}$$

with \( q_{\min } < q_{\max } \in \mathbb {R}^{m} \).

Denoting with \( V^{*} \) the dual space of V, we recall that the triplet

$$\begin{aligned} V \hookrightarrow H \hookrightarrow V^{*} \end{aligned}$$

forms a Gelfand triple. Then, for \( u, \varphi \in W(0,T) \), we define a bilinear form

$$\begin{aligned} b(u, \varphi ) := (\partial _{t}u, \varphi )_{I}+(\nabla u, \nabla \varphi )_{I}+(u(0),\varphi (0)) \end{aligned}$$

and the weak formulation of (1b) reads: for given \( q \in L^{2}(I,{\mathbb {R}}^{m})\) and initial data \( u_{0} \in V \cap H^{2}(\varOmega ) \hookrightarrow C({\text {cl}}(\varOmega )) \), find \( u \in W(0,T) \) satisfying

$$\begin{aligned} b(u, \varphi ) + (d(\cdot ,\cdot ,u),\varphi )_{I} = (qg,\varphi )_{I} + (u_{0},\varphi (0)),\,\,\forall \varphi \in W(0,T). \end{aligned}$$
(2)

It is well known that the PDE (2) admits a unique solution \( u \in W(0,T)\) satisfying the additional regularity \( u \in C({\text {cl}}(I) \times {\text {cl}}(\varOmega )) \); see, e.g., [29, Theorem 5.5]. Further, thanks to the monotonicity assumption on \( d(\cdot ,\cdot ,u) \), the solution u of (2) satisfies the additional regularity

$$\begin{aligned} u \in L^{2}(I,V \cap H^{2}(\varOmega ))\cap H^{1}(I,H)\hookrightarrow L^{\infty }(I\times \varOmega ), \end{aligned}$$

and the following stability estimates hold, cf., [9, Proposition 2.1] for a weak formulation with explicit initial condition which is equivalent to the form considered here, justifying the use of the \(L^2\) inner product in the notation of (2).

Proposition 2.1

Let \( u \in W(0,T) \) be the solution of (2) for given data q, g, \(u_0\), and d. Then, there holds

$$\begin{aligned} \begin{aligned}&\Vert u\Vert _{L^{\infty }(I\times \varOmega )}\le c\big (\Vert qg\Vert _{L^{\infty }(I\times \varOmega )} +\Vert u_{0}\Vert _{L^{\infty }(\varOmega )}+\Vert d(0)\Vert _{L^{\infty }(I\times \varOmega )}\big ),\\&\Vert u\Vert _{L^{2}(I,V\cap H^2(\varOmega ))}+\Vert u\Vert _{L^{\infty }(I,V)}+\Vert \partial _{t}u\Vert _{I}\le c\big (\Vert qg\Vert _{I}+\Vert u_{0}\Vert _{V}+\Vert d(0)\Vert _{I} \big ). \end{aligned} \end{aligned}$$

Remark 2.1

We observe that the regularity of \( u \in W(0,T) \) is enough to treat the state constraint. Indeed, there holds the embedding \( W(0,T) \hookrightarrow C(I, H) \) and we have \( F :W(0,T)\rightarrow C({\text {cl}}(I)) \), where

$$\begin{aligned} F(u)(t) := \int _\varOmega u(t,x)\omega (x)\mathrm {d}x. \end{aligned}$$

On the other hand, we require more regularity for the solution of (2), because stability estimates in the norms of \( L^{\infty }(I \times \varOmega )\), \(L^{\infty }(I,H) \) will come into play to ensure Lipschitz continuity for the control-to-state map. Further, we note that \( u_{0} \in V \cap C({\text {cl}}(\varOmega )) \) is enough to ensure well-posedness of the problem. The assumption \( u_{0} \in H^{2}(\varOmega ) \) is posed to use results from [6, 10], where this regularity is required to fully exploit the approximation property of the discontinuous Galerkin method.
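The pointwise-in-time constraint on the spatial mean value can be checked numerically once the state is available on a grid. The sketch below approximates F(u)(t) by a trapezoidal rule. The one-dimensional domain Ω = (0, 1), the toy state u(t, x) = −exp(−π²t) sin(πx), the weight ω ≡ 1, and the helper names are all assumptions of this sketch and do not come from the setting of the paper, which considers a convex domain in two dimensions.

```python
import numpy as np

def trapezoid_weights(x):
    """Composite trapezoidal weights on a (possibly nonuniform) grid x."""
    w = np.zeros_like(x)
    dx = np.diff(x)
    w[:-1] += 0.5 * dx
    w[1:] += 0.5 * dx
    return w

def mean_value_constraint(u, omega, x):
    """Approximate F(u)(t_j) = int u(t_j, x) omega(x) dx for each row of u."""
    return u @ (omega * trapezoid_weights(x))

x = np.linspace(0.0, 1.0, 401)
t = np.linspace(0.0, 0.5, 51)
# Toy state (not from the paper): u(t, x) = -exp(-pi^2 t) sin(pi x).
u = -np.exp(-np.pi ** 2 * t)[:, None] * np.sin(np.pi * x)[None, :]
omega = np.ones_like(x)

F = mean_value_constraint(u, omega, x)   # one value per time point
feasible = bool(np.all(F <= 0.0))        # discrete check of F(u)(t) <= 0
print(feasible, F[0])
```

Note that such a check only tests the constraint at the sampled time points, whereas (1d) is required for all t in [0, T].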

Thanks to (1c), we can regard the control variable q as an element of \( L^{\infty }(I, {\mathbb {R}}^{m}) \). Then, the following definitions are justified. We introduce the control-to-state map

$$\begin{aligned} S :L^{\infty }(I, {\mathbb {R}}^{m}) \rightarrow W(0,T)\cap C(I \times \varOmega ), \end{aligned}$$

associating with any given q the solution \( u(q):=S(q) \) of (2). We denote the concatenation of the control-to-state map and the state constraint F by

$$\begin{aligned} G = (F \circ S) :L^{\infty }(I, \mathbb {R}^{m})\rightarrow C({\text {cl}}(I)) . \end{aligned}$$

In the subsequent analysis, we will need G to be of class \( C^{2} \). This is indeed the case; see [3].

In order to formulate the optimal control problem in reduced form, we introduce the set of feasible controls

$$\begin{aligned} Q_{\mathrm{feas}} := \lbrace q \in Q_{\text {ad}} :G(q) \le 0\rbrace . \end{aligned}$$

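Then, (1) can be written in the reduced form

$$\begin{aligned} \min \; j(q):=J(q,S(q)) \qquad \text {subject to}\quad q \in Q_{\mathrm{feas}}. \end{aligned}$$
(\(\mathbb {P}\))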

We remark that the problem at hand is nonconvex, due to the presence of the nonlinear term in the state equation. As a consequence, it is suitable to consider local solutions, as defined below.

Definition 2.1

A control \( \bar{q} \in Q_{\mathrm{feas}} \) is a local solution, in the sense of \( L^{2}(I,{\mathbb {R}}^{m}) \), if there exists some \( \epsilon > 0 \) such that there holds

$$\begin{aligned} j(\bar{q}) \le j(q) \end{aligned}$$

for all \( q \in Q_{\mathrm{feas}} \) with \( \Vert q-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le \epsilon \).

The existence of a local solution follows by standard arguments; see, e.g., [29, Theorem 5.7].

Proposition 2.2

Assuming the existence of a feasible point, problem (\(\mathbb {P}\)) admits at least one solution

$$\begin{aligned} (\bar{q}, \bar{u}) \in L^{\infty }(I, \mathbb {R}^{m}) \times \big (W(0,T)\cap C({\text {cl}}(I) \times {\text {cl}}(\varOmega ))\cap H^1(I,H)\cap L^2(I,H^2(\varOmega ))\big ), \end{aligned}$$

where \( \bar{u} = S(\bar{q}) \).

We conclude the section with well-known differentiability properties of the operators and functionals involved in the analysis, referring to [29, Chapter 5] for details.

Lemma 2.1

The map \( S :L^{\infty }(I, {\mathbb {R}}^{m}) \rightarrow W(0,T)\cap L^{\infty }(I \times \varOmega ) \) is of class \( C^{2} \) from \( L^{\infty }(I, \mathbb {R}^{m})\) to \( W(0,T) \). For \( p \in L^{\infty }(I, \mathbb {R}^{m})\), its first derivative \(S^{'}(q)p =: v_{p}\), in direction p, is the solution of

$$\begin{aligned} b(v_{p}, \varphi ) + (\partial _{u}d(\cdot ,\cdot ,u(q))v_{p}, \varphi )_{I} = (pg, \varphi )_{I}, \qquad \forall \;\varphi \in W(0,T), \end{aligned}$$
(3)

with zero-initial condition. For \( p_{1}, p_{2} \in L^{\infty }(I, \mathbb {R}^{m})\) the second derivative \( S^{''} (q)p_{1}p_{2} =: v_{p_{1}p_{2}} \), in the directions \(p_1, p_2\), solves

$$\begin{aligned} b(v_{p_{1}p_{2}},\varphi )+(\partial _{u}d(\cdot ,\cdot ,u(q))v_{p_{1}p_{2}}, \varphi )_{I}=-(\partial _{uu}d(\cdot ,\cdot ,u(q))v_{p_{1}}v_{p_{2}}, \varphi )_{I}, \end{aligned}$$

for all \(\varphi \in W(0,T)\), again with zero-initial condition, where \(v_{p_{1}}, v_{p_{2}}\) are given by (3).

For S and its first derivative, the following Lipschitz properties hold.

Lemma 2.2

For \(p, q_{1}, q_{2} \in Q_\mathrm{ad}\), there exists a constant \( c > 0 \) such that

$$\begin{aligned} \Vert S(q_{1})-S(q_{2})\Vert _{L^{\infty }(I,V)}&\le c \Vert q_{1}-q_{2}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}, \end{aligned}$$
(4a)
$$\begin{aligned} \Vert S^{'}(q_{1})p-S^{'}(q_{2})p\Vert _{I}&\le c \Vert q_{1}-q_{2}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}\Vert p\Vert _{L^{2}(I, {\mathbb {R}}^{m})}, \end{aligned}$$
(4b)
$$\begin{aligned} \Vert S^{'}(q_{1})p-S^{'}(q_{2})p\Vert _{L^{\infty }(I,H)}&\le c \Vert q_{1}-q_{2}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}\Vert p\Vert _{L^{2}(I, {\mathbb {R}}^{m})}. \end{aligned}$$
(4c)

Proof

The claim of (4a)–(4b) is given in [9, Lemma 2.3]. To show (4c), we consider \( \xi :=S^{'}(q_{1})p-S^{'}(q_{2})p \) and define \( \tilde{u}:=S^{'}(q_{2})p \). We note that, for any \( \varphi \in W(0,T)\), \( \xi \) fulfills

$$\begin{aligned} b(\xi , \varphi )+\big (\partial _{u}d(u(q_{1}))\xi ,\varphi \big )_{I}= -\big (\partial _{u}d(u(q_{1}))\tilde{u}-\partial _{u}d(u(q_{2}))\tilde{u} ,\varphi \big )_{I}. \end{aligned}$$
(5)

Clearly, due to the boundedness of \( \partial _{u}d(\cdot ) \), for \( S^{'}(q)p \) there hold the same stability estimates as for \( S(q) \); compare with Lemma 2.1. Then, by means of such a stability estimate in \( L^{\infty }(I,H) \) in combination with the Lipschitz continuity of \( \partial _{u}d(\cdot ) \), we obtain

$$\begin{aligned} \Vert \xi \Vert _{L^{\infty }(I,H)}&\le c\left\| \big (\partial _{u}d(u(q_{1}))-\partial _{u}d(u(q_{2}))\big )\tilde{u}\right\| _{I}\\&\le c\left\| u(q_{1})-u(q_{2})\right\| _{L^{4}(I\times \varOmega )}\left\| \tilde{u}\right\| _{L^{4}(I\times \varOmega )}\\&\le c\left\| u(q_{1})-u(q_{2})\right\| _{L^{\infty }(I, V)}\left\| \tilde{u}\right\| _{L^{\infty }(I,V)}\\&\le c\Vert q_{1}-q_{2}\Vert _{I}\Vert p\Vert _{I}, \end{aligned}$$

where we used the embedding \( L^{\infty }(I,V) \hookrightarrow L^{4}(I\times \varOmega ) \). \(\square \)

Corollary 2.1

The functional \( j(q):L^{\infty }(I,{\mathbb {R}}^{m}) \rightarrow {\mathbb {R}}\) is of class \( C^{2} \) in the topology of \( L^{\infty }(I, \mathbb {R}^{m})\) and for \( q, p, p_{1}, p_{2} \in L^{\infty }(I,{\mathbb {R}}^{m}) \) there holds

$$\begin{aligned} j^{'}(q)(p)&= \int _{I}\sum _{i=1}^{m}\Big (\alpha q_{i}(t)+\int _{\varOmega }z_{0}(q)(t,x)g_{i}(x)\,\mathrm {d}x\Big )p_{i}(t)\,\mathrm {d}t, \\ j^{''}(q)p_{1}p_{2}&= \int _{I}\int _{\varOmega }\big (v_{p_{1}}v_{p_{2}}-z_{0}(q)\partial _{uu}d(t,x,u(q))v_{p_{1}}v_{p_{2}}\big )\,\mathrm {d}x\,\mathrm {d}t+\alpha \int _{I}p_{1}(t)^Tp_{2}(t)\,\mathrm {d}t, \end{aligned}$$

where \( z_{0}(q) \in W(0,T) \) is the adjoint state associated with q and j, defined as the unique solution \( z = z_{0}(q) \) of

$$\begin{aligned} b(\varphi , z) + (\partial _{u}d(\cdot ,\cdot ,u(q))z, \varphi )_{I} = (u(q)-u_{\mathrm{d}}, \varphi )_{I}, \qquad \forall \;\varphi \in W(0,T), \end{aligned}$$
(6)

and \( v_{p_{i}}\), \(i=1,2 \), is defined by (3).

Remark 2.2

As observed in [29, Section 5.7.4], when the control appears quadratically in the cost functional and linearly in the state equation, then the reduced cost functional is of class \( C^{2} \) not only in \( L^{\infty }(I, {\mathbb {R}}^{m}) \) but also in \( L^{2} (I,{\mathbb {R}}^{m}) \); see also [3, Remark 2.8]. In particular, this allows the introduction of a quadratic growth condition without two-norm discrepancy as it is stated later in Theorem 2.5.

2.1 Optimality Conditions

In this section, we discuss the optimality conditions for our optimal control problem. In a first step, we state standard first-order necessary conditions in KKT form.

For the rest of the paper, we rely on the following linearized Slater’s regularity condition.

Assumption 2.2

Given a local solution \( \bar{q} \) of (\(\mathbb {P}\)), we assume the existence of \( q_{\gamma } \in Q_{\mathrm{ad}} \) such that

$$\begin{aligned} G(\bar{q}) + G^{'}(\bar{q})(q_{\gamma }-\bar{q}) \le -\gamma < 0, \end{aligned}$$
(7)

for some \( \gamma \in \mathbb {R}_{+} \).

Based on the Slater condition, we obtain first-order necessary optimality conditions in KKT form; see, e.g., [25].

Theorem 2.3

Let \( \bar{q} \in Q_{\mathrm{feas}} \) be a local solution of (\(\mathbb {P}\)) such that Assumption 2.2 is satisfied, and let \( \bar{u} \) be the associated state. Then, there exist a Lagrange multiplier \( \bar{\mu } \in C({\text {cl}}(I))^{*} \) and an adjoint state \( \bar{z} \in L^{2}(I, V) \) such that

$$\begin{aligned} \begin{aligned}&b(\bar{u},\varphi )+(d(\cdot ,\cdot ,\bar{u}), \varphi )_{I} = (\bar{q}g,\varphi )_{I} + (u_{0}, \varphi (0)),&\forall \varphi \in W(0,T),\\&b(\varphi ,\bar{z})+(\varphi , \partial _{u}d(\cdot ,\cdot ,\bar{u})\bar{z})_{I} = (\bar{u}-u_{\mathrm{d}},\varphi )_{I} + \langle \bar{\mu }, F(\varphi ) \rangle ,&\forall \varphi \in W(0,T), \\&\alpha (\bar{q}, q-\bar{q})_{L^{2}(I)} + (\bar{z},(q-\bar{q})g)_{I} \ge 0,&\forall q \in Q_{\mathrm{ad}}, \\&\langle \bar{\mu }, F(\bar{u}) \rangle = 0,\,\, \bar{\mu } \ge 0,\,\,F(\bar{u})\le 0, \end{aligned} \end{aligned}$$
(8)

where \(\langle \cdot , \cdot \rangle \) denotes the duality pairing between \( C({\text {cl}}(I)) \) and \( C({\text {cl}}(I))^{*} \).

Since the problem at hand is nonconvex, we introduce second-order sufficient conditions. The following results can be obtained by combining several ideas from the literature. As it is not the purpose of this paper to show these second-order conditions, we will refrain from providing the lengthy details and refer to [28]. We only remark that the SSCs stated here can be obtained using the approach developed in [30] for semilinear elliptic problems. Their analysis was extended to semilinear parabolic problems in [3], for the one-dimensional case. In higher dimensions, the control-to-state map is, in general, not twice continuously differentiable from \( L^{2}(I \times \varOmega ) \) to \(C({\text {cl}}(I) \times {\text {cl}}(\varOmega ))\). This restriction to one space dimension has been circumvented in [1], considering, as in this paper, controls depending on time only.

To discuss SSCs, we introduce the Hamiltonian \(H :{\mathbb {R}}\times \varOmega \times {\mathbb {R}}\times {\mathbb {R}}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} H(q,u,z) = H(t,x,q,u,z) = \frac{1}{2}(u-u_{\mathrm{d}})^{2}+\frac{\alpha }{2}q^{2}+z\left( \sum _{i=1}^{m}q_{i}g_{i}-d(u)\right) , \end{aligned}$$

suppressing the first two arguments t, x in the exposition. Moreover, the reduced Lagrangian function is given by

$$\begin{aligned} \mathcal {L}(q,\mu )= j(q) + \langle \mu , G(q) \rangle . \end{aligned}$$

Remark 2.3

For better readability, at each \( (t,x) \in I\times \varOmega \), we denote by \( \bar{H}, \bar{\mathcal {L}} \) the Hamiltonian and Lagrangian function, when evaluated at \( (\bar{q}, \bar{u}, \bar{z}) \). We note that \(\frac{\partial H}{\partial q}, \frac{\partial ^{2} H}{\partial q^{2}} \) are, respectively, an \( {\mathbb {R}}^{m} \)-vector and an \( {\mathbb {R}}^{m\times m} \)-matrix. When referring to the ith component and the \((i,j)\)-entry, we abbreviate \( \partial _{q}H_{i}, \partial _{q}^{2}H_{i,j} \), respectively.

We now give the cone of critical directions associated with \( \bar{q} \in Q_{\mathrm{feas}} \), following [3]. Introducing the conditions

$$\begin{aligned}&p_{i}(t) = \left\{ \begin{array}{l l l} \ge 0, &{} \quad \text {if } \bar{q}_{i}=q_{\min } ,\\ \le 0, &{} \quad \text {if } \bar{q}_{i}=q_{\max } ,\\ =0, &{} \quad \text {if } \int _{\varOmega }\partial _q\bar{H}_{i}\,\mathrm {d}x \not = 0 , \end{array} \right. \text {for all}\,\, i=1,\ldots ,m \end{aligned}$$
(9)
$$\begin{aligned}&F(v_{p})=\frac{\partial F}{\partial u}(\bar{u})v_{p} \le 0 \,\,\text {if}\,\, F(\bar{u}) =0, \end{aligned}$$
(10)
$$\begin{aligned}&\int _{{\text {cl}}(I)}F(v_{p})\,\mathrm {d}\bar{\mu }=0, \end{aligned}$$
(11)

where \( v_{p} \) is defined by (3), the cone of critical directions is given by

$$\begin{aligned} C_{\bar{q}} = \{ p \in L^{2}(I, {\mathbb {R}}^{m}):p\,\,\text {satisfies }(9),~(10),~(11)\}. \end{aligned}$$
(12)

After this preparation, we postulate the following second-order sufficient condition.

Assumption 2.4

Let \( \bar{q} \in Q_{\mathrm{feas}} \) fulfill, together with the associated state \( \bar{u} \), adjoint state \( \bar{z} \), and Lagrange multiplier \( \bar{\mu } \), the first-order optimality conditions (8). Then, we assume

$$\begin{aligned} \frac{\partial ^{2}\bar{\mathcal {L}}}{\partial q^{2}}p^{2} >0 \qquad \forall p \in C_{\bar{q}} \setminus \lbrace 0 \rbrace . \end{aligned}$$
(13)

Remark 2.4

Comparing the second-order sufficient condition of Assumption 2.4 with the one of [3], we observe that the assumption

$$\begin{aligned} \partial _{q}^{2}\bar{H}_{i,i} \ge \xi , \qquad \forall t \in I\setminus E_{i}^{\nu },\,\forall i =1,\ldots ,m, \end{aligned}$$

where

$$\begin{aligned} E_{i}^{\nu } = \Big \lbrace t \in I:\Big \vert \int _{\varOmega }\partial _{q}\bar{H}_{i}\,\mathrm {d}x \Big \vert \ge \nu \Big \rbrace \end{aligned}$$

is the set of sufficiently active control constraints and \( \xi , \nu \) are positive constants, is implicitly satisfied in our setting. Indeed, since the control appears quadratically in the cost functional and linearly in the state equation, it trivially follows \( \partial _{q}^{2}\bar{H}_{i,i} = \alpha \mathbb {I} > 0 \), where \( \mathbb {I} \) denotes the identity operator.

With the second-order conditions at hand, we obtain the following quadratic growth condition; see [1, Theorem 5] and Remark 2.2.

Theorem 2.5

Let \( \bar{q} \in Q_{\mathrm{feas}} \) satisfy the first-order necessary optimality conditions (8) and let Assumption 2.4 hold. Then, there exist constants \( \delta ,\eta > 0 \) such that

$$\begin{aligned} j(q) \ge j(\bar{q}) + \delta \Vert q-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2}, \end{aligned}$$
(14)

for any \( q \in Q_{\mathrm{feas}} \) with \( \Vert q-\bar{q}\Vert _{L^{2}(I, {\mathbb {R}}^{m})} \le \eta \).

3 Discretization

We briefly describe the discretization of our problem in time and space. We use the dG(0)cG(1) method, i.e., a Galerkin method that is discontinuous (piecewise constant) in time and continuous (piecewise linear) in space, referring to [31] for additional details.

The control variable is discretized implicitly by the optimality conditions through the variational discretization approach, attributed to [32].

3.1 Time Discretization

We consider a partitioning of \({\text {cl}}(I)\) consisting of time intervals \( I_{n}=]t_{n-1},t_{n} ] \), for \( n=1,\ldots ,N \) and \( I_{0}=\lbrace 0 \rbrace \), where the times \( t_{i} \) are such that

$$\begin{aligned} 0=t_{0}<t_{1}<\cdots<t_{N-1}<t_{N}=T . \end{aligned}$$

The length of the interval \( I_{n} \) is \( k_{n} \) and we set \( k=\max _{n}k_{n} \) imposing that \( k<T \). Further, we assume the existence of strictly positive constants \( a,b,\tilde{k} \) such that the following technical conditions hold:

$$\begin{aligned} \min _{n> 0}k_{n} \ge ak^{b},&\tilde{k}^{-1} \le \frac{k_{n}}{k_{n+1}}\le \tilde{k} \quad \forall n > 0. \end{aligned}$$
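The technical conditions on the time partition are easy to verify for a concrete mesh. The following is a minimal sketch; the function name `check_time_mesh`, the floating-point tolerance `tol`, and the sample meshes are assumptions introduced here for illustration.

```python
import numpy as np

def check_time_mesh(t, a, b, k_tilde, tol=1e-12):
    """Check min_n k_n >= a k^b, 1/k_tilde <= k_n / k_{n+1} <= k_tilde, and k < T."""
    t = np.asarray(t, dtype=float)
    steps = np.diff(t)                   # k_n = t_n - t_{n-1}, n = 1..N
    if np.any(steps <= 0.0):
        raise ValueError("time points must be strictly increasing")
    k = steps.max()                      # maximal step size
    ratios = steps[:-1] / steps[1:]      # k_n / k_{n+1}
    ok_min = steps.min() >= a * k ** b - tol
    ok_ratio = np.all((ratios >= 1.0 / k_tilde - tol) & (ratios <= k_tilde + tol))
    ok_size = k < t[-1] - t[0]
    return bool(ok_min and ok_ratio and ok_size)

uniform = np.linspace(0.0, 1.0, 11)
graded = np.array([0.0, 0.1, 0.3, 0.7, 1.5])   # steps double from one interval to the next
print(check_time_mesh(uniform, a=1.0, b=1.0, k_tilde=1.0))
print(check_time_mesh(graded, a=0.1, b=1.0, k_tilde=2.0))
```

A uniform mesh satisfies the conditions with a = b = k̃ = 1, while graded meshes are admissible as long as neighboring step sizes remain comparable.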

We denote by \( \mathcal {P}_{0}(I_{n},V) \) the space of constant polynomials on \( I_{n} \) with values in V. The semidiscrete state and trial space is

$$\begin{aligned} U_{k}=U_{k}(V)=\big \lbrace \varphi _{k} \in L^{2}(I,V):\varphi _{k,n} = \varphi _{k}\vert _{I_{n}} \in \mathcal {P}_{0}(I_{n},V), \,n=1,\ldots ,N \big \rbrace , \end{aligned}$$

with inner product \( (\cdot ,\cdot )_{I_{n}} \) and norm \( \Vert \cdot \Vert _{I_{n}} \) given by the restriction of the usual inner product and norm of \( L^{2}(I,H) \) onto the interval \(I_{n}\), i.e., we have \( (\cdot , \cdot )_{I_{n}} = \int _{I_{n}}(\cdot , \cdot )\,\mathrm {d}t \).

Our functions are piecewise constant on each interval. Thus, we can simplify standard notation and, for functions \( \varphi _{k} \in U_{k}\), we write

$$\begin{aligned} \varphi _{k,n+1}=\varphi _{k,n}^{+} = \lim _{t \rightarrow 0^{+}} \varphi _{k}(t_{n}+t) =\lim _{t \rightarrow 0^{+}} \varphi _{k}(t_{n+1}-t),\quad [\varphi _{k}]_{n} = \varphi _{k,n+1}-\varphi _{k,n}. \end{aligned}$$

For \( u_{k},\varphi _{k} \in U_{k}+ W(0,T) \), the semidiscrete bilinear form is defined, in general, as

$$\begin{aligned} B(u_{k},\varphi _k):=\sum _{n=1}^{N}(\partial _{t}u_{k},\varphi _k)_{I_{n}}\!+\!\, (\nabla u_{k}, \nabla \varphi _k)_{I}\!+\!\sum _{n=2}^{N}([u_{k}]_{n-1},\varphi _{k,n})\!\,+\!\,(u_{k,1}, \varphi _{k,1}). \end{aligned}$$

As long as only piecewise constants in time are considered, the bilinear form can be simplified, noting that \(\partial _t u_k\bigl |_{I_{n}} \equiv 0\) for any \(u_k \in U_{k}\). Indeed, for any \(u_k,\varphi _k \in U_{k}\) we have

$$\begin{aligned} B(u_{k},\varphi _k)&= (\nabla u_{k}, \nabla \varphi _k)_{I}+\sum _{n=2}^{N}\left( [u_{k}]_{n-1},\varphi _{k,n}\right) +(u_{k,1}, \varphi _{k,1}), \end{aligned}$$

and the semidiscrete state equation reads: given \( q \in L^{2}(I,{\mathbb {R}}^{m}) \) as well as \( u_{0}\in H^{2}(\varOmega )\cap V \), find \( u_{k} = u_{k}(q) \in U_{k}\) such that

$$\begin{aligned} B(u_{k}, \varphi _{k}) + (d(\cdot , \cdot ,u_{k}),\varphi _{k})_{I}=(qg,\varphi _{k})_{I}+(u_{0},\varphi _{k,1}),\quad \forall \varphi _{k} \in U_{k}. \end{aligned}$$
(15)

Note that this is a variant of the implicit Euler scheme with averaging on the right-hand side, where the partial derivatives with respect to time are piecewise zero. For the existence and regularity of a unique solution for (15), the following proposition, from [9, Theorem 3.1 and 3.2], holds true.
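To make the connection to the implicit Euler scheme explicit, one can test (15) with a function supported on a single interval \( I_{n} \). With the convention \( u_{k,0}:=u_{0} \), this gives

$$\begin{aligned} (u_{k,n}-u_{k,n-1},\varphi ) + k_{n}(\nabla u_{k,n},\nabla \varphi ) + \int _{I_{n}}(d(t,\cdot ,u_{k,n}),\varphi )\,\mathrm {d}t = \int _{I_{n}}(q(t)g,\varphi )\,\mathrm {d}t,\quad \forall \varphi \in V,\; n=1,\ldots ,N. \end{aligned}$$

The following Python sketch mimics this time stepping on a scalar model problem. It is an illustration under simplifying assumptions, not the scheme analyzed here: the operator \( -\varDelta \) is replaced by a number `lam`, the nonlinearity is autonomous, and each step is solved with Newton's method, which is a choice made for this sketch. All names are hypothetical.

```python
import numpy as np

def dg0_scalar(t, u0, lam, d, d_prime, q_integrals, tol=1e-12, max_iter=50):
    """dG(0) time stepping for the scalar model u' + lam*u + d(u) = q(t), u(0) = u0.

    q_integrals[n-1] holds the integral of q over I_n, i.e. the averaged
    right-hand side. Returns the values u_{k,1}, ..., u_{k,N}.
    """
    steps = np.diff(np.asarray(t, dtype=float))
    values = np.empty(len(steps))
    u_prev = float(u0)
    for n, k_n in enumerate(steps):
        u = u_prev                                    # Newton start value
        for _ in range(max_iter):
            residual = u - u_prev + k_n * (lam * u + d(u)) - q_integrals[n]
            slope = 1.0 + k_n * (lam + d_prime(u))    # derivative of the residual
            update = residual / slope
            u -= update
            if abs(update) <= tol:
                break
        values[n] = u
        u_prev = u
    return values

t = np.linspace(0.0, 1.0, 11)
k = np.diff(t)

# Linear case (d = 0, q = 0): the scheme reduces to implicit Euler.
linear = dg0_scalar(t, 1.0, 1.0, lambda u: 0.0, lambda u: 0.0, np.zeros(10))

# Monotone cubic nonlinearity d(u) = u^3 with q = 1.
cubic = dg0_scalar(t, 1.0, 1.0, lambda u: u**3, lambda u: 3.0 * u**2, k * 1.0)
```

In the linear case the computed values coincide with the implicit Euler values \( (1+k)^{-n} \), which is the sense in which (15) is a variant of that scheme.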

Proposition 3.1

For the solution \( u_{k} \in U_{k}\) of (15), the following stability estimates hold

$$\begin{aligned}&\Vert u_{k}\Vert _{L^{\infty }(I \times \varOmega )} \le c\big (\Vert qg\Vert _{L^{p}(I \times \varOmega )}+ \Vert u_{0}\Vert _{L^{\infty }} + \Vert d(\cdot , \cdot ,0)\Vert _{L^{p}(I \times \varOmega )}\big ),\nonumber \\&\Vert u_{k}\Vert _{L^{\infty }(I,V)} \le c\big (\Vert qg\Vert _{I}^{2} + \Vert u_{0}\Vert _{V} + \Vert d(\cdot , \cdot ,0)\Vert _{I} \big ), \end{aligned}$$
(16)

where \( p > 2 \).

As for the continuous case, we now introduce the semidiscrete control-to-state map

$$\begin{aligned} S_{k} :L^{\infty }(I, {\mathbb {R}}^{m}) \rightarrow U_{k}, \end{aligned}$$

associating with any given q the solution \( u_{k}(q):=S_{k}(q) \) of (15). As in the continuous case, we have that \( S_{k} \) is of class \( C^{2} \).

Lemma 3.1

The operator \( S_{k} :L^{\infty }(I, {\mathbb {R}}^{m}) \rightarrow U_{k}\) is of class \( C^{2} \). Further, for \( u_{k}=S_{k}(q) \) and \( p~\in ~L^{\infty }(I, \mathbb {R}^{m})\), its first derivative \( S_{k}^{'}(q)p:=v_{k,p} \), in direction p, is the solution of

$$\begin{aligned} B(v_{k,p},\varphi _{k}) +(\partial _{u}d(\cdot , \cdot , u_{k})v_{k,p}, \varphi _{k})_{I} = (pg,\varphi _{k})_{I},\quad \forall \varphi _{k}\in U_{k}. \end{aligned}$$

For \( p_{1}, p_{2} \in L^{\infty }(I, \mathbb {R}^{m})\), its second derivative \( S^{''}_{k}(q)p_{1}p_{2}=v_{k,p_{1}p_{2}} \), in the directions \(p_1, p_2\), is the solution of

$$\begin{aligned} B(v_{k,p_{1}p_{2}},\varphi _{k})+(\partial _{u}d(\cdot , \cdot , u_{k})v_{k,p_{1}p_{2}}, \varphi _{k})_{I} =-(\partial _{uu}d(\cdot ,\cdot ,u_{k})v_{k,p_{1}}v_{k,p_{2}},\varphi _{k})_{I}, \end{aligned}$$

for all \(\varphi _{k}\in U_{k}\).

As for S, the operator \( S_{k} \) and its first derivative satisfy the following Lipschitz properties; compare [9, Lemma 3.1] and Lemma 2.2.

Lemma 3.2

For \( q_{1}, q_{2}, p \in L^{\infty }(I, \mathbb {R}^{m})\) there holds

$$\begin{aligned} \begin{aligned} \Vert S_{k}(q_{1})-S_{k}(q_{2})\Vert _{I}&\le c \Vert q_{1}-q_{2}\Vert _{L^{2}(I, {\mathbb {R}}^{m})},\\ \Vert S^{'}_{k}(q_{1})p-S^{'}_{k}(q_{2})p\Vert _{I}&\le c \Vert q_{1}-q_{2}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}\Vert p\Vert _{L^{2}(I, {\mathbb {R}}^{m})}, \\ \Vert S^{'}_{k}(q_{1})p-S^{'}_{k}(q_{2})p\Vert _{L^{\infty }(I,H)}&\le c \Vert q_{1}-q_{2}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}\Vert p\Vert _{L^{2}(I, {\mathbb {R}}^{m})}. \end{aligned} \end{aligned}$$
(17)

3.2 Space Discretization

We consider a family \(\mathcal {T}_{h}\) of subdivisions of \( \varOmega \) consisting of closed triangles or quadrilaterals (tetrahedra or hexahedra in dimension three) T which are affine equivalent to their reference elements. The union \(\varOmega _h = \text {int}\bigl ( \bigcup _{T \in \mathcal T_h} T\bigr )\) of these elements is assumed to be such that the vertices on \(\partial \varOmega _h\) are located on \(\partial \varOmega \). We assume the family \(\mathcal T_h\) to be quasi-uniform and shape regular in the sense of [33], denoting by \( h_{T} \) the diameter of T and setting \( h:=\max _{T\in \mathcal {T}_{h}} h_{T}\). Then, we define the conforming finite element space \( V_{h} \subset V\) as the space of piecewise linear functions with respect to \( \mathcal {T}_{h}\) with the canonical extension \(v\bigl |_{\varOmega \setminus \varOmega _h} \equiv 0\) for any \(v \in V_h\). Moreover, we assume that the sequence of spatial meshes is such that the \(L^2\)-projection \(\varPi _h\) onto \(V_h\) is stable with respect to the \(H^1\)-norm; for conditions ensuring this stability, see, e.g., [34]. Then, the discrete state and trial spaces are given by

$$\begin{aligned} U_{kh}=U_{kh}(V_{h})=\big \lbrace \varphi _{kh} \in L^{2}(I,V_{h}):\varphi _{kh,n}=\varphi _{kh}\vert _{I_{n}} \in \mathcal {P}_{0}(I_{n},V_{h}), n=1,\ldots ,N \big \rbrace , \end{aligned}$$

and the discrete state equation reads: for given \( q \in L^{\infty }(I, \mathbb {R}^{m})\), find the state \( u_{kh} = u_{kh}(q) \in U_{kh}\) such that

$$\begin{aligned} B(u_{kh}, \varphi _{kh}) + (d(\cdot , \cdot ,u_{kh}),\varphi _{kh})_{I}=(qg,\varphi _{kh})_{I}+(u_{0},\varphi _{kh,1}),\,\, \forall \varphi _{kh} \in U_{kh}. \end{aligned}$$
(18)

Just as in the semidiscrete case, we have the following stability estimates; see [9, Theorem 4.1]. We remark again that the uniform boundedness of \( u_{kh} \), independent of the discretization parameters kh, will play a crucial role.

Proposition 3.2

For the solution \( u_{kh} \in U_{kh}\) of (18), the following stability estimates hold

$$\begin{aligned} \Vert u_{kh}\Vert _{L^{\infty }(I \times \varOmega )}&\le c\big (\Vert qg\Vert _{L^{p}(I \times \varOmega )}+ \Vert \varPi _{h}u_{0}\Vert _{L^{\infty }(\varOmega )} + \Vert d(\cdot , \cdot ,0)\Vert _{L^{p}(I \times \varOmega )}\big ),\nonumber \\ \Vert u_{kh}\Vert _{L^{\infty }(I,V)}&\le c\big (\Vert qg\Vert _{I}^{2} + \Vert \varPi _{h}u_{0}\Vert _{V} + \Vert d(\cdot , \cdot ,0)\Vert _{I} \big ), \end{aligned}$$
(19)

where \( p > 2 \) and \( \varPi _{h} :V \rightarrow V_{h} \) is the \(L^{2}\)-projection in space.

Next, we introduce the discrete control-to-state map \( S_{kh}:L^{\infty }(I, \mathbb {R}^{m})\rightarrow U_{kh}\), the discrete state constraint

$$\begin{aligned} F_{kh}:=(\cdot , w) :U_{kh}\rightarrow U_{kh}(\mathbb {R}), \end{aligned}$$

where \(U_{kh}(\mathbb {R})\) denotes the space of piecewise constant functions \(]0,T[ \rightarrow {\mathbb {R}}\). Further, we introduce the \( C^{2} \)-functional \( G_{kh}=(F_{kh} \circ S_{kh} ) \), and the set of feasible controls \( Q_{kh, \mathrm{feas}} := \lbrace q \in Q_{\mathrm{ad}}:G_{kh}(q) \le 0 \rbrace \).

The discrete problem reads

[Display of problem (\(\mathbb {P}_{kh}\)); the formulation is not recoverable from the source.]

Similar to the semidiscrete case, first and second derivatives of the discrete control-to-state map \( S_{kh} \) are defined via Lemma 3.1, with test functions from \( U_{kh}\) instead of \(U_{k}\). Further, \( S_{kh} \) and its first derivative satisfy Lipschitz properties analogous to those in Lemma 3.2; compare [9, Lemma 4.1].

We formulate standard KKT optimality conditions for problem (\(\mathbb {P}_{kh}\)). These conditions will be justified after the introduction of an auxiliary problem in Sect. 5. In particular, we will show in Lemma 5.1 that, for kh small enough, the Slater point for (1) is also a Slater point for (\(\mathbb {P}_{kh}\)).

Theorem 3.1

Let \( \bar{q}_{kh} \in Q_{kh,\mathrm{feas}} \) be a local solution of (\(\mathbb {P}_{kh}\)) with \( \bar{u}_{kh} \in U_{kh}\) the associated state. Then, under Assumption 2.2, for kh sufficiently small there exists a Lagrange multiplier \( \bar{\mu }_{kh} \in U_{kh}(\mathbb {R})^{*}\cap C({\text {cl}}(I))^{*} \) and an adjoint state \( \bar{z}_{kh} \in U_{kh}\) such that

$$\begin{aligned}&B(\bar{u}_{kh},\varphi ) + (d(\cdot ,\cdot ,\bar{u}_{kh}), \varphi )_{I} = (\bar{q}_{kh}g,\varphi )_{I} + (u_{0}, \varphi _{1}),&\forall \varphi \in U_{kh}, \\&B(\varphi , \bar{z}_{kh}) + (\varphi , \partial _{u}d(\cdot ,\cdot ,\bar{u}_{kh})\bar{z}_{kh})_{I} = (\bar{u}_{kh}-u_{\mathrm{d}},\varphi )_{I} + \langle \bar{\mu }_{kh}, F_{kh}(\varphi ) \rangle ,&\forall \varphi \in U_{kh}, \\&\alpha (\bar{q}_{kh}, q-\bar{q}_{kh})_{L^{2}(I)} + (\bar{z}_{kh},(q-\bar{q}_{kh})g)_{I} \ge 0,&\forall q \in Q_{kh,\mathrm{feas}},\\&\langle F_{kh}(\bar{u}_{kh}),\bar{\mu }_{kh} \rangle = 0,\,\, \bar{\mu }_{kh} \ge 0, \end{aligned}$$

where \(\langle \cdot , \cdot \rangle \) denotes the duality pairing between \( U_{kh}({\mathbb {R}}) \) and \( U_{kh}({\mathbb {R}})^{*} \). Further, the Lagrange multiplier can be represented as an element of \( C({\text {cl}}(I))^{*} \) by

$$\begin{aligned} \langle v, \bar{\mu }_{kh} \rangle = \sum _{n=1}^{N}\frac{\mu _{kh,n}}{k_{n}}\int _{I_{n}}v(t)\mathrm {d}t,\,\,\forall v \in C({\text {cl}}(I))\cup U_{kh}(\mathbb {R}). \end{aligned}$$
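The representation above states that the discrete multiplier acts on a function v by weighting its interval averages with the coefficients \( \mu _{kh,n} \). A minimal Python sketch of this action follows. The function name, the array layout of the coefficients, and the use of Gauss-Legendre quadrature for the interval integrals are assumptions made for illustration.

```python
import numpy as np

def apply_multiplier(mu, t, v, quad_order=4):
    """Evaluate <v, mu_kh> = sum_n mu_n / k_n * int_{I_n} v(t) dt.

    mu[n-1] is the coefficient on I_n, t holds the nodes t_0, ..., t_N,
    and v is a vectorized function of time.
    """
    t = np.asarray(t, dtype=float)
    mu = np.asarray(mu, dtype=float)
    nodes, weights = np.polynomial.legendre.leggauss(quad_order)  # rule on [-1, 1]
    total = 0.0
    for n in range(len(mu)):
        left, right = t[n], t[n + 1]
        k_n = right - left
        points = 0.5 * k_n * nodes + 0.5 * (left + right)  # map nodes to I_n
        integral = 0.5 * k_n * np.sum(weights * v(points))
        total += mu[n] / k_n * integral                     # weighted interval average
    return total

grid = [0.0, 0.5, 1.0]
coeffs = [1.0, 3.0]
on_constant = apply_multiplier(coeffs, grid, lambda s: np.ones_like(s))
on_identity = apply_multiplier(coeffs, grid, lambda s: s)
```

For \( v \equiv 1 \) the pairing returns the sum of the coefficients, and for \( v(t)=t \) it returns the coefficients weighted by the interval midpoints. For piecewise constant v the interval average is just the value on \( I_{n} \), so both cases in the representation are covered by the same formula.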

4 The State Equation

In this section, we are interested in the derivation of \( L^{\infty }(I,H) \) error estimates for the solutions of the continuous, semidiscrete and discrete state equation, which are not available for semilinear parabolic equations and are required for our final result in Sect. 5. The technique behind these estimates is based on a duality argument requiring, at any level of discretization, the introduction of auxiliary linearized problems. This approach has been used in [10] for a linear parabolic state equation. We now intend to extend it to the semilinear parabolic case adapting an idea of [35] for semilinear elliptic equations.

4.1 Error Estimates for the Temporal Discretization

In a first step, we introduce the backward uncontrolled linearized counterpart of the state equation. For a given fixed \( q \in L^{\infty }(I, \mathbb {R}^{m})\), we consider u and \( u_{k} \) solutions of (2) and (15), respectively, and we define

$$\begin{aligned} \tilde{d}={\left\{ \begin{array}{ll} \frac{d(u(t,x))-d(u_{k}(t,x))}{u(t,x)-u_{k}(t,x)}, &{} \text {if}\,\,u(t,x) \ne u_{k}(t,x), \\ 0,&{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
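The coefficient \( \tilde{d} \) is a pointwise difference quotient of d between the two states, set to zero where they coincide. The following Python sketch evaluates it on sampled values. The function name and the sample nonlinearity \( d(u)=u^{3} \) are assumptions chosen for illustration.

```python
import numpy as np

def difference_quotient(d, u, u_k):
    """Pointwise (d(u) - d(u_k)) / (u - u_k), and 0 where u == u_k."""
    u = np.asarray(u, dtype=float)
    u_k = np.asarray(u_k, dtype=float)
    diff = u - u_k
    mask = diff != 0.0
    safe = np.where(mask, diff, 1.0)          # dummy denominator where u == u_k
    return np.where(mask, (d(u) - d(u_k)) / safe, 0.0)

# Sample values with a cubic nonlinearity (illustrative assumption).
d_tilde = difference_quotient(lambda s: s**3, [2.0, 1.0, 0.0], [1.0, 1.0, -1.0])
```

For the cubic example the quotient equals \( u^{2}+uu_{k}+u_{k}^{2} \) wherever \( u\ne u_{k} \), so it stays bounded whenever u and \( u_{k} \) are bounded, in line with the boundedness of \( \tilde{d} \) used in the analysis.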

Then, we consider the linear backward problem

$$\begin{aligned} \begin{aligned} -(\varphi , \partial _{t}w)_{I} +(\nabla \varphi , \nabla w )_{I} + (\varphi , \tilde{d}w)_{I}&= 0, \\ w(T)&= w_{T}, \end{aligned} \end{aligned}$$
(20)

for any \( \varphi \in W(0,T) \cap H^1(I,H)\), with \( w_{T}\in H \).

Denoting by \( \hat{I}= ]0, \hat{t}[ \), \( \hat{t} \in ]0,T[ \), a truncated time interval, we introduce

$$\begin{aligned} \begin{aligned} -(\varphi , \partial _{t}\hat{w})_{\hat{I}} +(\nabla \varphi , \nabla \hat{w} )_{\hat{I}} + (\varphi , \tilde{d}\hat{w})_{\hat{I}}&= 0, \\ \hat{w}(\hat{t})&= w_{T}. \end{aligned} \end{aligned}$$
(21)

Further, the semidiscrete counterpart of (20), for any \( \varphi _{k}\in U_{k}\), reads

$$\begin{aligned} B(\varphi _{k}, w_{k}) + (\varphi _{k}, \tilde{d}w_{k})_{I} = (\varphi _{k,N}, w_{T}). \end{aligned}$$
(22)

Before starting, we observe that, for any \( \varphi _{k} \in U_{k} \), the following relations hold

$$\begin{aligned} B(u-u_{k}, \varphi _{k})= & {} -(d(u)-d(u_{k}), \varphi _{k})_{I} = -((u-u_{k})\tilde{d}, \varphi _{k})_{I}, \end{aligned}$$
(23)
$$\begin{aligned} B(\varphi _{k}, w-w_{k})= & {} -(\varphi _{k}, (w-w_{k})\tilde{d})_{I}. \end{aligned}$$
(24)

In the following analysis, we will need negative norm estimates for the error between the solutions of (20), (21), and (22). These estimates will be used to derive the error at the time nodal points and inside the time intervals \( I_{n} \). Their derivation follows exactly as in [10, Lemma 5.1, Lemma 5.2], with minor changes due to the presence of the linearization \( \tilde{d} \) of the semilinear term, and therefore it is omitted. The crucial point is the boundedness of \( \tilde{d} \) in \( L^{\infty }(I\times \varOmega ) \) which follows from the Lipschitz continuity of \( d(\cdot ) \) and the regularity of \( u, u_{k} \in L^{\infty }(I \times \varOmega ) \).

For the convenience of the reader, the analog to [10, Lemma 5.1, Lemma 5.2] in our case reads as follows.

Lemma 4.1

For the error between the solutions w, \( \hat{w} \), and \( w_{k} \) of (20), (21), and (22), respectively, there holds

$$\begin{aligned} \Vert w-\hat{w}\Vert _{L^{1}(\hat{I},H)}+\Vert w(0)-\hat{w}(0)\Vert _{H^{-2}(\varOmega )}&\le ck\Big (\log \frac{T}{k}\Big )^{\frac{1}{2}}\Vert w_{T}\Vert ,\\ \Vert w-w_{k}\Vert _{L^{1}(I,H)}+\Vert w(0)-w_{k,1}\Vert _{H^{-2}(\varOmega )}&\le ck\Big (\log \frac{T}{k}\Big )^{\frac{1}{2}}\Vert w_{T}\Vert . \end{aligned}$$

With these estimates at hand, we are ready to derive the main result of the section.

Theorem 4.1

For given \( qg \in L^{\infty }(I,H) \) and \( u_{0} \in V \cap H^{2}(\varOmega ) \), let \( u \in U \) and \( u_{k} \in U_{k} \) be the solution of (2) and (15), respectively. Then, there holds

$$\begin{aligned} \Vert u-u_{k}\Vert _{L^{\infty }(I,H)}\le ck\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}\Big (\Vert qg\Vert _{L^{\infty }(I,H)} +\Vert u_{0}\Vert _{H^{2}(\varOmega )}+\Vert d(0)\Vert _{L^{\infty }(I\times \varOmega )}\Big ). \end{aligned}$$

Proof

Let \( e_{k}=u-u_{k} \) denote the error arising from the dG(0)-time discretization. In every time interval, we split the error into

$$\begin{aligned} \Vert e_{k}\Vert _{L^{\infty }(I_{n},H)} \le \underbrace{\Vert u(\cdot )-u(t_{n})\Vert _{L^{\infty }(I_{n},H)}}_{(a_{1})} +\underbrace{\Vert u(t_{n})-u_{k}(\cdot )\Vert _{L^{\infty }(I_{n},H)}}_{(a_{2})}, \end{aligned}$$

and we analyze the two terms \( (a_{1}), (a_{2}) \) separately. Then, taking the maximum over all \( n=1,\ldots ,N \), we obtain the assertion. Without loss of generality, we consider the last time interval \( I_{N} \). For an arbitrary time interval \( I_{n} \), we consider (20) on \( I=]0,t_{n}[ \) and (21) on \( \hat{I}=]0,\hat{t}[ \) for \( \hat{t}\in ]t_{n-1},t_{n}] \), and the proof follows mutatis mutandis, observing that \( 0\le \log (t_{n}/k)\le \log (T/k) \).

\((a_{1})\) :

For a generic fixed time \( \hat{t} \in I_{N} \), we start the derivation considering the interpolation error \( u(\hat{t})- u(t_{N}) \).

Consider the solutions w and \( \hat{w} \) to (20) and (21) on \(\hat{I}=]0,\hat{t}[\), respectively, with terminal value \( w_{T} = u(\hat{t}) - u(t_N)\). Integration by parts in time of (20) and (21) gives

$$\begin{aligned} -(\varphi (T), w(T)) + (\varphi (0), w(0)) +(\partial _{t}\varphi ,w)_{I}+(\nabla \varphi , \nabla w)_{I} + (\varphi , \tilde{d}w)_{I}&= 0, \\ -(\varphi (\hat{t}),\hat{w}(\hat{t}))+(\varphi (0),\hat{w}(0))+(\partial _{t}\varphi ,\hat{w})_{\hat{I}}+(\nabla \varphi , \nabla \hat{w})_{\hat{I}}+ (\varphi , \tilde{d}\hat{w})_{\hat{I}}&=0, \end{aligned}$$

for any \( \varphi \in W(0,T) \cap H^1(I,H)\).

In particular, setting \( \varphi = u \), the state equation (2) yields

$$\begin{aligned} -(u(T), w(T)) + (u(0), w(0)) +(qg,w)_{I} -(d(u), w)_{I}+ (u, \tilde{d}w)_{I}&= 0, \\ -(u(\hat{t}),\hat{w}(\hat{t})) + (u(0),\hat{w}(0))+ (qg,\hat{w})_{\hat{I}} -(d(u), \hat{w})_{\hat{I}}+ (u, \tilde{d}\hat{w})_{\hat{I}}&= 0. \end{aligned}$$

By definition, \( w(T)=\hat{w}(\hat{t})=w_{T} \). Subtracting the first equality from the second, we get

$$\begin{aligned} \begin{aligned} (u(\hat{t})-u(T),w_{T})&= (u(0), \hat{w}(0)-w(0))+(qg,\hat{w}-w)_{\hat{I}}-(qg,w)_{I \setminus \hat{I}} \\&\quad \, +\underbrace{\big (u,(\hat{w}-w)\tilde{d}\big )_{\hat{I}}}_{(b_{1})} \underbrace{-(u, \tilde{d}w)_{I\setminus \hat{I}}}_{(b_{2})}\\&\quad \, +\underbrace{(d(u), w-\hat{w})_{\hat{I}}}_{(b_{3})} +\underbrace{(d(u), w)_{I\setminus \hat{I}}}_{(b_{4})}. \end{aligned} \end{aligned}$$
(25)

We abbreviate \(\hat{e}^w = w-\hat{w}\) and analyze the terms separately.

\((b_{1})\) :

Due to the stability in \( L^{\infty }(I \times \varOmega ) \) of the solutions of (2) and (15) and the Lipschitz continuity of d, we observe that \( \Vert \tilde{d}\Vert _{L^{\infty } (I\times \varOmega )}\le c \). Therefore,

$$\begin{aligned} |(u,\hat{e}^w\tilde{d})_{\hat{I}}| \le c\Vert u\Vert _{L^{\infty }(I,H)}\Vert \hat{e}^w\Vert _{L^{1}(\hat{I},H)}. \end{aligned}$$
\((b_{2})\) :

Exploiting again the boundedness of \( \tilde{d} \) in \( L^{\infty }(I\times \varOmega ) \), and \( \vert T-\hat{t} \vert \le k \), we have

$$\begin{aligned} -(u,\tilde{d}w)_{I\setminus \hat{I}}&\le \Big \vert \int _{\hat{t}}^{T}(u,\tilde{d}w)\mathrm {d}t\Big \vert \\&\le ck\Vert u\Vert _{L^{\infty }(I,H)}\Vert w\Vert _{L^{\infty }(I,H)}. \end{aligned}$$
\((b_{3})\) :

The Lipschitz continuity of d and the boundedness of d(0) in \( L^{\infty }(\hat{I},H) \) yield

$$\begin{aligned} (d(u), \hat{e}^w)_{\hat{I}}&=(d(u)-d(0), \hat{e}^w)_{\hat{I}}+(d(0), \hat{e}^w)_{\hat{I}}\\&\le \Vert d(u)-d(0)\Vert _{L^{\infty }(\hat{I},H)}\Vert \hat{e}^w\Vert _{L^{1}(\hat{I},H)}\\&\quad +\Vert d(0)\Vert _{L^{\infty }(\hat{I},H)}\Vert \hat{e}^w\Vert _{L^{1}(\hat{I},H)}\\&\le c\big (\Vert u\Vert _{L^{\infty }(\hat{I},H)} +\Vert d(0)\Vert _{L^{\infty }(\hat{I},H)} \big )\Vert \hat{e}^w\Vert _{L^{1}(\hat{I},H)}. \end{aligned}$$
\((b_{4})\) :

Using the same argument as for \( (b_{3}) \), we conclude

$$\begin{aligned} (d(u), w)_{I\setminus \hat{I}}&= (d(u)-d(0), w)_{I\setminus \hat{I}}+(d(0), w)_{I\setminus \hat{I}}\\&\le c k\big (\Vert u\Vert _{L^{\infty }(I\times \varOmega )}+\Vert d(0)\Vert _{L^{\infty }(I\times \varOmega )} \big )\Vert w\Vert _{L^{\infty }(I,H)}. \end{aligned}$$

Going back to (25), but now utilizing that \( w_{T}=u(\hat{t})-u(T) \), we obtain

$$\begin{aligned} \Vert u(\hat{t})-u(T) \Vert ^{2}&\le c\Big (\Vert \hat{e}^w\Vert _{L^{1}(\hat{I},H)}+\Vert \hat{e}^w(0)\Vert _{H^{-2}(\varOmega )} +k\Vert w\Vert _{L^{\infty }(I,H)}\Big )\\&\quad \cdot \Big (\Vert qg\Vert _{L^{\infty }(I,H)}+\Vert u_{0}\Vert _{H^{2}(\varOmega )} +\Vert d(0)\Vert _{L^{\infty }(I\times \varOmega )}\\&\quad \quad +\Vert u\Vert _{L^{\infty }(I,H)} +\Vert u\Vert _{L^{\infty }(I\times \varOmega )}\Big ). \end{aligned}$$

Using the stability of the solution w of (20), i.e., \( \Vert w\Vert _{L^{\infty }(I,H)}\le c\Vert w_{T}\Vert \); see, e.g., [10, Theorem 5.3], Proposition 2.1, Lemma 4.1, and division by \( \Vert w_{T}\Vert = \Vert u(\hat{t})-u(T)\Vert \), we conclude

$$\begin{aligned} \begin{aligned} \Vert u(\hat{t})-u(T) \Vert \le ck\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}\Big (&\Vert q\Vert _{L^{\infty }(I, \mathbb {R}^{m})}\Vert g\Vert _{H} +\Vert u_{0}\Vert _{H^{2}(\varOmega )}\\&+\Vert d(0)\Vert _{L^{\infty }(I\times \varOmega )}\Big ). \end{aligned} \end{aligned}$$
(26)
\((a_{2})\) :

To obtain the error of the dG(0)-discretization inside the time interval \( I_{N} \), we set \( w_{T}=u(t_{N})-u_{k,N}=u(T)-u_{k,N} \) in (20) and in (22). Then, for any \(\varphi \in U_k + (L^2(I,V)\cap H^1(I,H))\), it holds that

$$\begin{aligned} B(\varphi ,w)+(\varphi ,\tilde{d}w)_{I}=(\varphi _{N},u(T)-u_{k,N}). \end{aligned}$$

In particular, testing the relation above with \( \varphi =u-u_{k} \) and making use of (23) and (24), we have

$$\begin{aligned} \Vert u(T)-u_{k,N}\Vert ^{2}&= B(u-u_{k},w)+(u-u_{k},\tilde{d}w)_{I}\\&=B(u-u_{k},w-w_{k})-((u-u_{k})\tilde{d},w_{k})_{I}+((u-u_{k})\tilde{d},w)_{I}\\&=B(u,w-w_{k})+(u_{k},(w-w_{k})\tilde{d})_{I}+((u-u_{k})\tilde{d},w-w_{k})_{I}\\&=(qg,w-w_{k})_{I}+(u_{0},w(0)-w_{k}(0))\underbrace{-(d(u),w-w_{k})_{I}}_{(c_{1})} \\&\quad +\underbrace{(u_{k},(w-w_{k})\tilde{d})_{I}}_{(c_{2})} + \underbrace{((u-u_{k})\tilde{d},w-w_{k})_{I}}_{(c_{3})}, \end{aligned}$$

where, in the last step, we used (2).

We abbreviate \(e^w_k = w-w_{k}\) and consider the three terms \( (c_{1})-(c_{3}) \) separately.

\((c_{1})\) :

Observing that \( L^{\infty }(I,V) \hookrightarrow L^{\infty }(I,H) \), the stability result in Lemma 2.1 of the solution u of (2), the Lipschitz continuity of \( d(\cdot ) \), and the boundedness of d(0) in \( L^{\infty }(I,H)\) yield

$$\begin{aligned} -(d(u),e^w_k)_{I}&\le \Big (\Vert d(u)-d(0)\Vert _{L^{\infty }(I,H)} + \Vert d(0)\Vert _{L^{\infty }(I,H)}\Big ) \Vert e^w_k\Vert _{L^{1}(I,H)}\\&\le c\Big (\Vert u\Vert _{L^{\infty }(I,H)}+\Vert d(0)\Vert _{L^{\infty }(I,H)}\Big )\Vert e^w_k\Vert _{L^{1}(I,H)}\\&\le c\Big (\Vert qg\Vert _{I}+ \Vert u_{0}\Vert _{V}+\Vert d(0)\Vert _{L^{\infty }(I,H)}\Big )\Vert e^w_k\Vert _{L^{1}(I,H)}. \end{aligned}$$
\((c_{2})\) :

The boundedness of \( \tilde{d} \) in \( L^{\infty }(I\times \varOmega ) \) and the stability result of the semidiscrete equation of Proposition 3.1 yield

$$\begin{aligned} (u_{k},\tilde{d}e^w_k)_{I}&\le c\Vert u_{k}\Vert _{L^{\infty }(I,H)}\Vert e^w_k\Vert _{L^{1}(I,H)}\\&\le c\Big (\Vert qg\Vert _{I}+ \Vert u_{0}\Vert _{V} +\Vert d(0)\Vert _{I} \Big ) \Vert e^w_k\Vert _{L^{1}(I,H)}. \end{aligned}$$
\((c_{3})\) :

From the Lipschitz continuity of \( d(\cdot ) \), as well as the definition and boundedness of \( \tilde{d} \), it follows

$$\begin{aligned} (\tilde{d}(u-u_{k}),e^w_k)_{I}&= (d(u)-d(u_{k}),e^w_k)_{I}\\&=(d(u)-d(0),e^w_k)_{I}+(d(0)-d(u_{k}),e^w_k)_{I}\\&\le c\Big (\Vert u\Vert _{L^{\infty }(I,H)}+\Vert u_{k}\Vert _{L^{\infty }(I,H)}\Big )\Vert e^w_k\Vert _{L^{1}(I,H)}\\&\le c\Big (\Vert qg\Vert _{I}+ \Vert u_{0}\Vert _{V} +\Vert d(0)\Vert _{I} \Big ) \Vert e^w_k\Vert _{L^{1}(I,H)}, \end{aligned}$$

where, in the last step, we used the stability of the solutions u and \( u_{k} \) of (2) and (15), from Proposition 2.1 and Proposition 3.1, respectively.

Summing up, for the error inside the time interval, we obtain

$$\begin{aligned} \begin{aligned} \Vert u(T)-u_{k,N}\Vert ^{2}\le c&\Big (\Vert e^w_k\Vert _{L^{1}(I,H)}+\Vert e^w_k(0)\Vert _{H^{-2}(\varOmega )}\Big )\\&\cdot \Big (\Vert qg\Vert _{L^{\infty }(I,H)}+\Vert u_{0}\Vert _{H^{2}(\varOmega )} +\Vert d(0)\Vert _{L^{\infty }(I,H)}\Big ). \end{aligned} \end{aligned}$$
(27)

In conclusion, combining (26) with (27) and thanks to Lemma 4.1, we obtain the assertion dividing by \(\Vert w_T\Vert = \Vert u(T)-u_{k,N}\Vert \). \(\square \)
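The first-order behavior in k can be observed numerically on a simple model. The following Python sketch is a sanity check only, under strong simplifying assumptions: it uses the scalar linear equation \( u'+u=0 \), \( u(0)=1 \), on a uniform grid, for which the dG(0) values are \( (1+k)^{-n} \). It does not involve the semilinear term or the spatial variable, and the function name is hypothetical.

```python
import numpy as np

def sup_error_dg0(N, T=1.0):
    """Sup-in-time error of dG(0) for u' + u = 0, u(0) = 1, on a uniform grid."""
    k = T / N
    t = np.linspace(0.0, T, N + 1)
    u_k = (1.0 + k) ** (-np.arange(1, N + 1))   # value of u_k on I_n
    exact = np.exp(-t)
    # The exact solution is monotone, so on I_n = ]t_{n-1}, t_n] the error
    # |u(t) - u_{k,n}| is largest towards one of the two endpoints.
    err_left = np.abs(exact[:-1] - u_k)
    err_right = np.abs(exact[1:] - u_k)
    return max(err_left.max(), err_right.max())

errors = [sup_error_dg0(N) for N in (50, 100, 200, 400)]
orders = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
```

Halving k roughly halves the error in this example, so the observed orders are close to one. The logarithmic factor in the estimate is not visible for such a smooth scalar problem.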

4.2 Error Estimates for the Spatial Discretization

We develop error estimates for the spatial discretization of the problem using similar steps as in the semidiscrete case. The linearization of d now reads

$$\begin{aligned} \hat{d}= {\left\{ \begin{array}{ll} \frac{d(u_{k}(t,x))-d(u_{kh}(t,x))}{u_{k}(t,x)-u_{kh}(t,x)}, &{} \text {if}\,\,u_{k}(t,x) \ne u_{kh}(t,x), \\ 0,&{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

We remark that, thanks to the Lipschitz continuity of \( d(\cdot ) \), the linearized term \( \hat{d} \) is bounded in \( L^{\infty }(I\times \varOmega ) \).

We introduce the discrete counterpart of (20) with \(\hat{d}\) instead of \(\tilde{d}\). Find \( w_{kh}\in U_{kh}\) such that

$$\begin{aligned} B(\varphi _{kh}, w_{kh})+(\varphi _{kh},\hat{d}w_{kh})_{I}= (\varphi _{kh,N}, w_{T}), \end{aligned}$$
(28)

for any \( \varphi _{kh} \in U_{kh}\), with \( w_{T} \in H \).

We also consider the auxiliary problem (22) with \( \hat{d} \) instead of \( \tilde{d} \). Namely, find \( w_{k}\in U_{k}\) such that

$$\begin{aligned} B(\varphi _{k}, w_{k})+(\varphi _{k},\hat{d}w_{k})_{I}= (\varphi _{k,N}, w_{T}), \end{aligned}$$
(29)

for any \( \varphi _{k}\in U_{k}\).

We observe that for any \( \varphi _{kh} \in U_{kh} \) the following relations hold

$$\begin{aligned} B(u_{k}-u_{kh}, \varphi _{kh})= & {} -(d(u_{k})-d(u_{kh}), \varphi _{kh})_{I} = -((u_{k}-u_{kh})\hat{d}, \varphi _{kh})_{I}, \end{aligned}$$
(30)
$$\begin{aligned} B(\varphi _{kh}, w_{k}-w_{kh})= & {} -(\varphi _{kh}, (w_{k}-w_{kh})\hat{d})_{I}. \end{aligned}$$
(31)

As for the error in the dG(0)-semidiscretization, also here we will employ a duality argument requiring estimates for the error between the solutions of (29) and (28). Again, the proof is analogous to [10, Lemma 5.8 and Lemma 5.9] with the obvious modifications due to the presence of \( \hat{d} \). Consequently, we only state the following Lemma regarding the required error estimates.

Lemma 4.2

For the error between the solutions \( w_{k} \) and \( w_{kh} \) of (29) and (28), respectively, there holds

$$\begin{aligned} \Vert w_{k,1}-w_{kh,1}\Vert _{H^{-2}(\varOmega )} + T\Vert w_{k,1}-w_{kh,1}\Vert \le ch^{2}\Vert w_{T}\Vert . \end{aligned}$$
(32)

Theorem 4.2

For given \( qg \in L^{\infty }(I,H) \) and \( u_{0} \in H^{2}(\varOmega ) \cap V \), let \( u_{k} \in U_{k} \) and \( u_{kh} \in U_{kh} \) be the solutions of (15) and (18), respectively. Then, there holds

$$\begin{aligned} \Vert u_{k}-u_{kh}\Vert _{L^{\infty }(I,H)} \le ch^{2}\Big (\log \frac{T}{k}+1\Big )\Big (\Vert qg\Vert _{L^{\infty }(I,H)}+\Vert u_{0}\Vert _{H^{2}(\varOmega )} + \Vert d(0)\Vert _{L^{\infty }(I \times \varOmega )} \Big ). \end{aligned}$$

Proof

Since both \( u_{k} \), \( u_{kh} \) are constant on each time interval \( I_{n} \), we can equivalently show the estimate on a single time interval \( I_{n} \) and, with no loss of generality, we consider the last time interval only. For an arbitrary time interval \( I_{n} \), we consider (28) and (29) on \( I=]0,t_{n}[ \) and, noting that

$$\begin{aligned} 0\le \log (t_{n}/k)\le \log (T/k), \end{aligned}$$

the proof follows mutatis mutandis.

Proceeding as in the proof of Theorem 4.1, we set \(w_{T} = u_{k,N}-u_{kh,N} \) in (28) and (29). Then, using (30) and (31), we have

$$\begin{aligned} \Vert u_{k,N}&-u_{kh,N}\Vert ^{2} = B(u_{k}-u_{kh}, w_{k}) + (u_{k}-u_{kh}, \hat{d}w_{k})_{I}\\&= B(u_{k}-u_{kh}, w_{k}-w_{kh})-(\hat{d}(u_{k}- u_{kh}), w_{kh})_{I} + (\hat{d}(u_{k}-u_{kh}), w_{k})_{I}\\&= B(u_{k}, w_{k}-w_{kh}) + (u_{kh}, \hat{d}(w_{k}-w_{kh}))_{I} +(\hat{d}(u_{k}-u_{kh}),w_{k}- w_{kh})_{I}\\&= (qg,w_{k}-w_{kh})_{I}+(u_{0}, w_{k,1}-w_{kh,1})\underbrace{-(d(u_{k}),w_{k}-w_{kh})_{I}}_{(a_{1})}\\&\quad + \underbrace{(u_{kh}, \hat{d}(w_{k}-w_{kh}))_{I}}_{(a_{2})} +\underbrace{(\hat{d}(u_{k}-u_{kh}),w_{k}-w_{kh})_{I}}_{(a_{3})}, \end{aligned}$$

where, in the last step, we used (15). We analyze the three terms separately, abbreviating \(e^w_{kh} = w_k - w_{kh}\).

\((a_{1})\) :

The Lipschitz continuity of \( d(\cdot ) \) and the boundedness of d(0) in \( L^{\infty }(I,H) \), give

$$\begin{aligned} -(d(u_{k}),e^w_{kh})_{I}&\le c\Big (\Vert d(u_{k})-d(0)\Vert _{L^{\infty }(I,H)}+\Vert d(0)\Vert _{L^{\infty }(I,H)}\Big )\cdot \Vert e^w_{kh}\Vert _{L^{1}(I,H)}\\&\le c\Big (\Vert u_{k}\Vert _{L^{\infty }(I,H)}+\Vert d(0)\Vert _{L^{\infty }(I,H)}\Big )\Vert e^w_{kh}\Vert _{L^{1}(I,H)}. \end{aligned}$$
\((a_{2})\) :

Recalling that \( \hat{d} \) is bounded, we have

$$\begin{aligned} (u_{kh}, \hat{d}e^w_{kh})_{I} \le c\Vert u_{kh}\Vert _{L^{\infty }(I,H)} \Vert e^w_{kh}\Vert _{L^{1}(I,H)}. \end{aligned}$$
\((a_{3})\) :

For the last term, we rely again on the Lipschitz continuity of \( d(\cdot ) \) to conclude

$$\begin{aligned} (\hat{d}(u_{k}-u_{kh}),e^w_{kh})_{I}&=(d(u_{k})-d(u_{kh}),e^w_{kh})_{I}\\&\le c\Big (\Vert u_{k}\Vert _{L^{\infty }(I,H)}+\Vert u_{kh}\Vert _{L^{\infty }(I,H)}\Big )\Vert e^w_{kh}\Vert _{L^{1}(I,H)}. \end{aligned}$$

We now combine the previous inequalities and, thanks to the stability estimates (16) and (19), we obtain

$$\begin{aligned} \Vert u_{k,N}-u_{kh,N}\Vert ^{2}&\le c\Big (\Vert e^w_{kh}\Vert _{L^{1}(I,H)} + \Vert w_{k,1}-w_{kh,1}\Vert _{H^{-2}(\varOmega )} \Big )\\&\quad \cdot \Big (\Vert qg\Vert _{L^{\infty }(I,H)}+\Vert u_{0}\Vert _{H^{2}(\varOmega )}+\Vert d(0)\Vert _{L^{\infty }(I\times \varOmega )}\Big ). \end{aligned}$$

Noting that the \(L^2\)-estimate in (32) remains true on shorter intervals, it follows with \(\tau _{k,n} = T-t_{n-1}\) that

$$\begin{aligned} \Vert w_{k}-w_{kh}\Vert _{L^{1}(I,H)}&\le \sum _{n=1}^{N}k_{n}\tau _{k,n}^{-1}\max _{m=1,\ldots ,N} \big (\tau _{k,m}\Vert w_{k,m}-w_{kh,m}\Vert \big )\\&\le ch^{2}\Big (\log \frac{T}{k}+1\Big )\Vert w_{T}\Vert . \end{aligned}$$

Using Lemma 4.2, dividing by \( \Vert w_{T}\Vert = \Vert u_{k,N}-u_{kh,N}\Vert \), we obtain the assertion. \(\square \)
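The logarithmic factor in the proof stems from the weighted sum \( \sum _{n}k_{n}\tau _{k,n}^{-1} \) with \( \tau _{k,n}=T-t_{n-1} \). On a uniform grid with N steps this sum is the harmonic number \( \sum _{j=1}^{N}1/j \), which is bounded by \( \log N+1=\log (T/k)+1 \). The Python sketch below checks this numerically. The function name is hypothetical and the uniform grid is an assumption made for simplicity.

```python
import numpy as np

def weighted_step_sum(t):
    """Return sum_n k_n / tau_n with tau_n = T - t_{n-1}."""
    t = np.asarray(t, dtype=float)
    steps = np.diff(t)            # k_n
    tau = t[-1] - t[:-1]          # tau_n = T - t_{n-1}, n = 1, ..., N
    return float(np.sum(steps / tau))

T = 1.0
checks = []
for N in (10, 100, 1000):
    total = weighted_step_sum(np.linspace(0.0, T, N + 1))
    bound = np.log(N) + 1.0       # log(T / k) + 1 with k = T / N
    checks.append((N, total, bound))
```

On nonuniform grids, comparing each term with \( \int _{I_{n}}(T-t)^{-1}\,\mathrm {d}t \) for \( n<N \) gives the bound \( \log (T/k_{N})+1 \), which is where a lower bound on the step sizes in terms of k enters.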

5 Convergence Analysis

In this section, we focus on the main result of this paper. We show that for any local solution \( \bar{q} \) of the continuous problem satisfying KKT-conditions and SSCs, there exists a sequence of local solutions \( \bar{q}_{kh} \) of (\(\mathbb {P}_{kh}\)) converging to \( \bar{q} \). To analyze the errors induced by the discretization, we use the so-called two-way feasibility argument; see, e.g., [36, 37]. In this method, the linearized Slater point \( q_{\gamma } \) from Assumption 2.2 is used to construct sequences of controls (competitors) which are feasible for the continuous and discrete problem, respectively. If the problem is linear, these sequences of feasible competitors can be used in the first-order necessary and sufficient conditions to obtain convergence of the discrete problem. In the semilinear case, due to the presence of the linearized term, the complementary slackness condition cannot be used as in the linear setting. Therefore, the feasible controls have to be used in combination with second-order information, in particular in the quadratic growth condition (14) arising from the second-order sufficient conditions. This approach has been used in the recent paper [4] for the semilinear elliptic case in combination with a localization argument, as in [38]. We now intend to extend that approach to our semilinear parabolic optimal control problem with state constraints.

In the following analysis, we will introduce auxiliary problems in a neighborhood of the optimal local solution \( \bar{q} \). To this end, we denote with \( r > 0 \) a radius, to be chosen conveniently later, and we define

$$\begin{aligned} Q^{r}&:=\lbrace q \in Q_{\mathrm{ad}}:\Vert q-\bar{q}\Vert _{L^{2}(I, {\mathbb {R}}^{m})} \le r \rbrace ,\\ Q^{r}_\mathrm{feas}&:=\lbrace q \in Q^{r}:G(q)\le 0 \rbrace . \end{aligned}$$

Then, the continuous auxiliary problem reads

[Display of problem (\(\mathbb {P}^{r}\)); the formulation is not recoverable from the source.]

Due to the SSCs, for r sufficiently small, the unique global solution of (\(\mathbb {P}^{r}\)) coincides with the selected local solution \(\bar{q}\) of (\(\mathbb {P}\)). The value of introducing the auxiliary problem lies in the fact that a discretization of (\(\mathbb {P}^{r}\)), defined below, will provide a sequence of solutions converging to the selected local optimum. For \( G_{kh} = F_{kh} \circ S_{kh } \), we introduce the discrete auxiliary problem (\(\mathbb {P}_{kh}^{r}\))

[Display of problem (\(\mathbb {P}_{kh}^{r}\)); the formulation is not recoverable from the source.]

We remark again that the control is not discretized; the index kh only indicates the association with problem (\(\mathbb {P}_{kh}^{r}\)), i.e., the use of the discretized state equation.

Assumption 5.1

We assume that \( q_{\gamma } \) satisfying Slater’s condition (7) is close enough to \( \bar{q} \in Q_{\mathrm{feas}} \), meaning

$$\begin{aligned} \Vert q_{\gamma } - \bar{q}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}\le \frac{r}{2}. \end{aligned}$$
(34)

The fact that \( q_{\gamma } \) is in a neighborhood of \( \bar{q} \) is a reasonable assumption. Indeed, as observed in [4, Section 2], given any Slater point \(q_\gamma \) with parameter \(\gamma \), one can construct a Slater point \(q_\gamma ^r=\bar{q}+t(q_{\gamma }-\bar{q})\) close to \(\bar{q}\) with parameter \( \gamma (r) = t\gamma \simeq r\gamma \), where \( t=\min \{1,r/(2\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})})\} \). Hence, (7) holds with \( q_{\gamma } \) replaced by \( q_{\gamma }^{r} \) and \( \gamma \) replaced by \( t\gamma \). Further, after showing that (\(\mathbb {P}_{kh}\)) admits local solutions, and thanks to Lemma 5.1 below, we will see that it is reasonable to assume that (34) also holds for the discrete problem (\(\mathbb {P}_{kh}\)), namely

$$\begin{aligned} \Vert q_{\gamma } - \bar{q}_{kh}\Vert _{L^{2}(I, {\mathbb {R}}^{m})} \le \frac{r}{2}. \end{aligned}$$
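The rescaling argument behind Assumption 5.1 can be checked directly. As a sketch, write the linearized Slater condition (7) in the form \( G(\bar{q})+G^{'}(\bar{q})(q_{\gamma }-\bar{q})\le -\gamma \), which is the form used in the proof of Proposition 5.1, and recall \( G(\bar{q})\le 0 \). For \( t=\min \{1,r/(2\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})})\}\in (0,1] \) and \( q_{\gamma }^{r}=\bar{q}+t(q_{\gamma }-\bar{q}) \), one then obtains

$$\begin{aligned} G(\bar{q})+G^{'}(\bar{q})(q_{\gamma }^{r}-\bar{q})&= (1-t)G(\bar{q})+t\big (G(\bar{q})+G^{'}(\bar{q})(q_{\gamma }-\bar{q})\big ) \le -t\gamma ,\\ \Vert q_{\gamma }^{r}-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}&= t\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le \frac{r}{2}, \end{aligned}$$

so that \( q_{\gamma }^{r} \) satisfies both the Slater condition with parameter \( t\gamma \) and the bound (34).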

In what follows, we abbreviate the derived convergence rate for the space-time discretization by

$$\begin{aligned} c(k,h) := k\Bigl (\ln \frac{T}{k}+1\Bigr )^{\frac{1}{2}} + h^2 \Bigl (\ln \frac{T}{k}+1\Bigr ). \end{aligned}$$
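For orientation, the rate function can be evaluated numerically. The following minimal Python sketch computes \( c(k,h) \) for a short refinement sequence; the function name `rate`, the choice \( T=1 \), the coupling \( k=h^{2} \), and the sample mesh sizes are assumptions made only for illustration and are not part of the analysis.

```python
import math

def rate(k: float, h: float, T: float = 1.0) -> float:
    """Evaluate c(k,h) = k (ln(T/k) + 1)^(1/2) + h^2 (ln(T/k) + 1)."""
    if not (0.0 < k <= T) or h < 0.0:
        raise ValueError("expected 0 < k <= T and h >= 0")
    log_term = math.log(T / k) + 1.0  # >= 1 because k <= T
    return k * math.sqrt(log_term) + h**2 * log_term

# Illustrative refinement sequence with the (assumed) coupling k = h^2.
for h in (0.2, 0.1, 0.05, 0.025):
    print(f"h = {h:<6} k = {h * h:<9.6f} c(k,h) = {rate(h * h, h):.6e}")
```

With this coupling, halving \( h \) reduces \( c(k,h) \) by a factor somewhat below four, since the logarithmic terms grow slowly as \( k \rightarrow 0 \).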

We now define three constants \( c_{1}, c_{2}, c_{3} \), independent of the discretization parameters \(k, h\) and of the Tikhonov parameter \(\alpha \). They are characterized by the bounds

$$\begin{aligned}&\sup _{q\in B_{\frac{r}{2}}(\bar{q})} \Vert (\omega ,u_{kh}(q)-u(q))\Vert _{L^\infty (I)} \le c_1 c(k,h),\\&\sup _{q\in B_{\frac{r}{2}}(\bar{q})}\Vert G^{''}(q)\Vert _{\mathcal L (L^2(I,{\mathbb {R}}^m)^2;L^\infty (I))}, \sup _{q\in B_{\frac{r}{2}}(\bar{q})}\Vert G^{''}_{kh}(q)\Vert _{\mathcal L (L^2(I,{\mathbb {R}}^m)^2;L^\infty (I))}\le c_{2},\\&\sup _{q\in B_{\frac{r}{2}}(\bar{q})} \Vert (G'_{kh}(q)-G'(\bar{q}))(q_\gamma -\bar{q})\Vert _{L^\infty (I)} \le c_3 \Bigl (c(k,h) + \frac{r^2}{2}\Bigr ), \end{aligned}$$

where \( B_{\frac{r}{2}}(\bar{q}) \) denotes an \( L^{2}(I, {\mathbb {R}}^{m}) \) ball centered in \( \bar{q} \) with radius \(\frac{r}{2}\).

Remark 5.1

To see that these constants are independent of \(k, h\), we proceed as follows:

  • For the constant \(c_1\), we notice that this error can be estimated by the discretization errors obtained by Theorems 4.1 and 4.2, noting that by the proof of these theorems the constant in the error estimates remains bounded on \(B_{\frac{r}{2}}(\bar{q})\).

  • The constant \(c_2\) is a consequence of G being a \(C^2\) functional together with a discretization error bound for \(G''_{kh}\).

  • For the constant \(c_3\), we notice that

    $$\begin{aligned} F(\varphi )=F_{kh}(\varphi )=\int _{\varOmega }\varphi (t,x)\omega (x) \mathrm {d}x, \quad \varphi \in W(0,T) \cup U_{kh} \end{aligned}$$

    is linear and consequently the error satisfies

    $$\begin{aligned} (G'_{kh}(q)&-G'(\bar{q}))(q_\gamma -\bar{q})=F_{kh}\big (S^{'}_{kh}(q)(q_{\gamma }-\bar{q})\big ) -F\big (S^{'}(\bar{q})(q_{\gamma }-\bar{q})\big )\\&=\Big (\omega ,\big (S^{'}_{kh}(q) - S^{'}(\bar{q})\big )(q_{\gamma }-\bar{q})\Big )\\&= \Big (\omega ,\big (S^{'}_{kh}(q)-S^{'}(q)+S^{'}(q)-S^{'}(\bar{q})\big )(q_{\gamma }-\bar{q})\Big )\\&\le c\Big (\Vert \big (S^{'}_{kh}(q)-S^{'}(q)\big )(q_{\gamma }-\bar{q})\Vert _{L^{\infty }(I,H)}\\&\quad +\Vert q-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\Big ), \end{aligned}$$

    where, in the last step, we used the stability of \(S'\), i.e., (4c). The remaining term is a discretization error that can be estimated by [10, Corollaries 5.5 and 5.11]. Namely, we have

    $$\begin{aligned} \Vert \big (S^{'}_{kh}(q)-S^{'}(q) \big )(q_{\gamma }-\bar{q})\Vert _{L^{\infty }(I,H)}&\le c\, c(k,h)\, \Vert g\Vert _{L^{\infty }(\varOmega )}\Vert q_{\gamma }-\bar{q}\Vert _{L^{\infty }(I,{\mathbb {R}}^{m})}. \end{aligned}$$

    By virtue of the control constraints, we have

    $$\begin{aligned} \Vert q_{\gamma }-\bar{q}\Vert _{L^{\infty }(I,{\mathbb {R}}^{m})}\le |q_{\max }-q_{\min } |. \end{aligned}$$

    Then, thanks to (34) and \(\Vert q-\bar{q}\Vert _{L^2(I,{\mathbb {R}}^m)} \le \frac{r}{2}\), we conclude

    $$\begin{aligned} |(G'_{kh}(q)-G'(\bar{q}))(q_\gamma -\bar{q})|\le c_{3}\Bigg (c(k,h)+\frac{r^{2}}{2}\Bigg ). \end{aligned}$$

Moreover, by the above arguments, clearly, \(c_1, c_2, c_3\) remain bounded as \(r \rightarrow 0\).

As we have seen in the discussion after Assumption 5.1, we have \( \gamma (r)\simeq r\gamma \). Hence, there exists \( \tilde{r}\le r\) such that

$$\begin{aligned} -\gamma (\tilde{r})+\Big (c_{2}+\frac{c_{3}}{2}\Big )\tilde{r}^{2} \le -\frac{3}{4}\gamma (\tilde{r}). \end{aligned}$$
(35)
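The existence of such an \( \tilde{r} \) follows from comparing a linear with a quadratic term in \( \tilde{r} \). As a sketch, assume \( \tilde{r} \) is small enough that \( t=\tilde{r}/(2\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})})\le 1 \), and write \( \gamma (\tilde{r})=\kappa \tilde{r} \) with \( \kappa :=\gamma /(2\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}) \), where the symbol \( \kappa \) is introduced here only for illustration. Provided \( c_{2}+\frac{c_{3}}{2}>0 \), condition (35) is then equivalent to

$$\begin{aligned} \Big (c_{2}+\frac{c_{3}}{2}\Big )\tilde{r}^{2} \le \frac{1}{4}\kappa \tilde{r} \quad \Longleftrightarrow \quad \tilde{r} \le \frac{\kappa }{4\big (c_{2}+\frac{c_{3}}{2}\big )}. \end{aligned}$$

Since \( c_{2}, c_{3} \) remain bounded as \( r\rightarrow 0 \), the right-hand side stays bounded away from zero, so a sufficiently small \( \tilde{r} \) satisfies the condition.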

We can now summarize our requirements on r. Throughout the rest of the paper we rely on the following.

Assumption 5.2

Let the radius \( r > 0 \) be small enough, such that (35) holds and the quadratic growth condition (14) holds for elements in \( Q^{r}_{\mathrm{feas}} \). Namely,

$$\begin{aligned} j(q) \ge j(\bar{q}) + \delta \Vert q-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2}, \end{aligned}$$

for any \( q \in Q^{r}_{\mathrm{feas}} \).

After this preparation, we construct feasible competitors for (\(\mathbb {P}_{kh}^{r}\)).

Proposition 5.1

Let \( \bar{q} \) be a local solution of (\(\mathbb {P}\)) and \( q_{\gamma } \) be the Slater point from Assumption 2.2. Let

$$\begin{aligned} t(k,h)=\frac{c_{1} \cdot c(k,h)}{c_{4}r^{2}-\gamma } \end{aligned}$$

be given with \( c_{4} \) such that \( c_{4}r^{2}-\gamma = \gamma /4\). Then, the sequence of controls defined by

$$\begin{aligned} q_{t(k,h)}= \bar{q}+t(k,h)(q_{\gamma }-\bar{q}) \end{aligned}$$

is feasible for (\(\mathbb {P}_{kh}^{r}\)) for \(k, h\) sufficiently small such that \( 0< t(k,h) < 1 \).

Proof

To verify the feasibility of \(q_{t(k,h)}\), we use a Taylor expansion argument. The definition of \( q_{t(k,h)} \) suggests expanding \( G(q_{t(k,h)}) \) at \( \bar{q} \), which gives

$$\begin{aligned} G(q_{t(k,h)}) = G(\bar{q})+G^{'}(\bar{q})(q_{t(k,h)}-\bar{q})+\frac{1}{2}G^{''}(q_{\zeta })(q_{t(k,h)}-\bar{q})^{2}, \end{aligned}$$

where \( q_{\zeta } \) is a convex combination of \( q_{t(k,h)}\) and \( \bar{q}\).

We insert this expansion in the following calculations

$$\begin{aligned} G_{kh}(q_{t(k,h)})&= G_{kh}(q_{t(k,h)})-G(q_{t(k,h)})+G(q_{t(k,h)})\\&=G_{kh}(q_{t(k,h)})-G(q_{t(k,h)})+G(\bar{q})+G^{'}(\bar{q})(q_{t(k,h)}-\bar{q})\\&\quad +\frac{1}{2}G^{''}(q_{\zeta })(q_{t(k,h)}-\bar{q})^{2}\\&=G_{kh}(q_{t(k,h)})-G(q_{t(k,h)})+G(\bar{q})+t(k,h)G(\bar{q})-t(k,h)G(\bar{q})\\&\quad +t(k,h)G^{'}(\bar{q})(q_{\gamma }-\bar{q})+\frac{1}{2}G^{''}(q_{\zeta })(q_{t(k,h)}-\bar{q})^{2}\\&=\underbrace{G_{kh}(q_{t(k,h)})-G(q_{t(k,h)})}_{(a_{1})}\\&\quad +\underbrace{(1-t(k,h))G(\bar{q})+t(k,h)(G(\bar{q})+G^{'}(\bar{q})(q_{\gamma }-\bar{q}))}_{(a_{2})}\\&\quad +\underbrace{\frac{1}{2}G^{''}(q_{\zeta })(q_{t(k,h)}-\bar{q})^{2}}_{(a_{3})}. \end{aligned}$$
\((a_{1})\) :

By definition of \(c_1\), it holds

$$\begin{aligned} G_{kh}(q_{t(k,h)})-G(q_{t(k,h)})&= (u_{kh}(q_{t(k,h)})-u(q_{t(k,h)}), \omega (x))_{I}\\&\le c_{1}\cdot c(k,h). \end{aligned}$$
\((a_{2})\) :

This part is handled using the feasibility of \( \bar{q}\) for (\(\mathbb {P}\)) and the Slater regularity condition of Assumption 2.2. Indeed, for \(k, h\) sufficiently small such that \( 0<t(k,h)<1 \), we have

$$\begin{aligned} (1-t(k,h))G(\bar{q})&\le 0,\\ t(k,h)(G(\bar{q})+G^{'}(\bar{q})(q_{\gamma }-\bar{q}))&\le -t(k,h)\gamma , \end{aligned}$$

from which we obtain

$$\begin{aligned} (a_{2}) \le -t(k,h)\gamma . \end{aligned}$$
\((a_{3})\) :

By definition of \(c_2\), it follows

$$\begin{aligned} G^{''}(q_{\zeta })(q_{t(k,h)}-\bar{q})^{2}&\le c_{2}t(k,h)^{2}\Vert q_{\gamma }-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2}\le c_{2}t(k,h)^{2}\frac{r^{2}}{4}. \end{aligned}$$

Combining the three parts and using the definition of \( t(k,h) \) together with \( \frac{r^{2}}{4}\le r^{2} \), we have

$$\begin{aligned} G_{kh}(q_{t(k,h)})&\le c_{1}\cdot c(k,h)+t(k,h)\left( c_{2}t(k,h)\frac{r^{2}}{4}-\gamma \right) \\&= t(k,h)(c_4r^2-\gamma )+t(k,h)\left( c_{2}t(k,h)\frac{r^{2}}{4}-\gamma \right) \\&\le t(k,h) \Bigl ( c_{4} r^2 - 2\gamma + c_{2} t(k,h) r^2\Bigr ). \end{aligned}$$

Hence, for \(k, h\) sufficiently small such that \( 0<t(k,h)<1 \), we obtain from (35) and the definition of \(c_{4}\) that

$$\begin{aligned} G_{kh}(q_{t(k,h)})&\le t(k,h) \Bigl ( c_{4} r^2 - 2\gamma + c_{2} r^2\Bigr )\\&= t(k,h) \Bigl ( (c_{4}r^2 - \gamma ) + (c_{2}r^2 - \gamma )\Bigr )\\&\le t(k,h) \Bigl (\frac{\gamma }{2} - \frac{3}{4} \gamma \Bigr )\\&= -\frac{1}{4}\, t(k,h)\, \gamma < 0, \end{aligned}$$

and the feasibility of \( q_{t(k,h)}\) is verified. \(\square \)
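With the normalization \( c_{4}r^{2}-\gamma =\gamma /4 \), the step size reduces to \( t(k,h)=4c_{1}c(k,h)/\gamma \), so the requirement \( 0<t(k,h)<1 \) is an explicit smallness condition on \(k\) and \(h\). The following Python sketch checks this condition for hypothetical values of \( c_{1} \) and \( \gamma \); these constants are not known explicitly in practice, and the helper names `rate`, `competitor_step`, and `is_admissible` are assumptions made for illustration.

```python
import math

def rate(k, h, T=1.0):
    """c(k,h) = k (ln(T/k) + 1)^(1/2) + h^2 (ln(T/k) + 1)."""
    log_term = math.log(T / k) + 1.0
    return k * math.sqrt(log_term) + h**2 * log_term

def competitor_step(k, h, c1, gamma, T=1.0):
    """Step t(k,h) = c1 c(k,h) / (gamma / 4), using c4 r^2 - gamma = gamma / 4."""
    return c1 * rate(k, h, T) / (gamma / 4.0)

def is_admissible(k, h, c1, gamma, T=1.0):
    """True if 0 < t(k,h) < 1, i.e. q_t is a convex combination of q_bar and q_gamma."""
    return 0.0 < competitor_step(k, h, c1, gamma, T) < 1.0

# Hypothetical constants, chosen only for illustration.
c1, gamma = 2.0, 0.5
for h in (0.4, 0.2, 0.1, 0.05):
    k = h * h  # assumed coupling k = h^2
    print(h, competitor_step(k, h, c1, gamma), is_admissible(k, h, c1, gamma))
```

For coarse discretizations the step exceeds one and the construction does not apply, which is why the proposition is stated only for \(k, h\) sufficiently small.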

In particular, the proposition above ensures that \( Q^{r}_{kh,\mathrm{feas}} \) is nonempty once \(k, h\) are small enough. Hence, we can assert the following:

Corollary 5.1

For \(k, h\) sufficiently small, there exists at least one global solution \( \bar{q}_{kh}^{r}\in Q^{r}_{kh,\mathrm{feas}} \) of (\(\mathbb {P}_{kh}^{r}\)).

In a second step, we show that the linearized regularity condition of Assumption 2.2 continues to hold in the discrete setting.

Lemma 5.1

Under Assumption 2.2, for \(k, h\) small enough, it holds

$$\begin{aligned} G_{kh}(\bar{q}_{kh}^{r})+G^{'}_{kh}(\bar{q}_{kh}^{r})(q_{\gamma }-\bar{q}_{kh}^{r}) \le -\frac{1}{2}\gamma < 0\quad \text {on }{\text {cl}}(I). \end{aligned}$$
(36)

Proof

In view of Assumption 2.2, we add and subtract \( G(\bar{q}), G_{kh}(\bar{q})\), as well as \(G^{'}(\bar{q})(q_{\gamma }-\bar{q}) \), to obtain

$$\begin{aligned} G_{kh}(\bar{q}_{kh}^{r})+G^{'}_{kh}(\bar{q}_{kh}^{r})(q_{\gamma }-\bar{q}_{kh}^{r})&= G(\bar{q}) + G^{'}(\bar{q})(q_{\gamma }-\bar{q}) +G_{kh}(\bar{q}_{kh}^{r})\\&\quad +G^{'}_{kh}(\bar{q}_{kh}^{r})(q_{\gamma }-\bar{q}_{kh}^{r})-G(\bar{q})-G^{'}(\bar{q})(q_{\gamma }-\bar{q})\\&\le -\gamma + \underbrace{G_{kh}(\bar{q}_{kh}^{r})+G^{'}_{kh}(\bar{q}_{kh}^{r}) (\bar{q}-\bar{q}_{kh}^{r}) - G_{kh}(\bar{q})}_{(b_{1})} \\&\quad + \underbrace{G_{kh}(\bar{q})-G(\bar{q})}_{(b_{2})} + \underbrace{\big (G^{'}_{kh}(\bar{q}_{kh}^{r})-G^{'}(\bar{q})\big )(q_{\gamma }-\bar{q})}_{(b_{3})}. \end{aligned}$$
\((b_{1})\) :

Taylor expansion of \( G_{kh}(\bar{q}) \) at \( \bar{q}_{kh}^{r}\) reads

$$\begin{aligned} G_{kh}(\bar{q})=G_{kh}(\bar{q}_{kh}^{r})+G^{'}_{kh}(\bar{q}_{kh}^{r})(\bar{q}-\bar{q}_{kh}^{r})+\frac{1}{2}G^{''}_{kh}(q_{\zeta })(\bar{q}-\bar{q}_{kh}^{r})^{2}, \end{aligned}$$

with \( q_{\zeta } \) a convex combination of \(\bar{q}\) and \(\bar{q}_{kh}^{r}\), yielding

$$\begin{aligned} (b_{1}) = -\frac{1}{2}G_{kh}^{''}(q_{\zeta })(\bar{q}-\bar{q}_{kh}^{r})^{2}\le c_{2}\Vert \bar{q}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2} \le c_{2}r^{2}, \end{aligned}$$

where we used \( G_{kh} \) being a \( C^{2} \)-functional together with the feasibility of \( \bar{q}_{kh}^{r}\) for (\(\mathbb {P}_{kh}^{r}\)).

\( (b_{2}) \) :

By definition of \(c_1\), it holds

$$\begin{aligned} G_{kh}(\bar{q})-G(\bar{q})&= \int _{\varOmega }\big (u_{kh}(\bar{q})-u(\bar{q}) \big )\omega (x)\mathrm {d}x \le c_{1}\cdot c(k,h). \end{aligned}$$
\( (b_{3}) \) :

By definition of \(c_3\) it follows

$$\begin{aligned} (G'_{kh}(\bar{q}^r_{kh})-G'(\bar{q}))(q_\gamma -\bar{q}) \le c_3 \Bigl ( c(k,h)+ \frac{r^2}{2}\Bigr ). \end{aligned}$$

In conclusion, for \(k, h\) sufficiently small and thanks to (35), the three estimates for \( (b_{1}), (b_{2}), (b_{3}) \) yield

$$\begin{aligned} G_{kh}(\bar{q}_{kh}^{r})+G^{'}_{kh}(\bar{q}_{kh}^{r})(q_{\gamma }-\bar{q}_{kh}^{r})&\le -\gamma +c_{2}r^{2} +c_{1}\cdot c(k,h)+ c_{3}\Big (c(k,h)+\frac{r^{2}}{2}\Big )\\&= -\gamma +\Big (c_{2}+\frac{c_{3}}{2}\Big )r^{2}+(c_{1}+c_{3})\cdot c(k,h)\\&\le -\frac{3}{4}\gamma +(c_{1}+c_{3}) c(k,h)\\&\le -\frac{1}{2}\gamma . \end{aligned}$$

\(\square \)

We now introduce the feasible competitors for the continuous auxiliary problem (\(\mathbb {P}^{r}\)).

Proposition 5.2

Let \( \bar{q}_{kh}^{r} \) be a global optimum for (\(\mathbb {P}_{kh}^{r}\)) and \( q_{\gamma } \) be the Slater point from Assumption 2.2. Further, let

$$\begin{aligned} \tau (k,h)=\frac{c_{1}\cdot c(k,h)}{c_{4}r^{2}-\gamma } \end{aligned}$$

be given with a constant \( c_{4} \) such that \( 0< c_{4}r^{2}-\gamma < \gamma /2 \). Then, the sequence of controls defined by

$$\begin{aligned} q_{\tau (k,h)}= \bar{q}_{kh}^{r}+ \tau (k,h)(q_{\gamma }-\bar{q}_{kh}^{r}) \end{aligned}$$

is feasible for (\(\mathbb {P}^{r}\)) for \(k, h\) sufficiently small.

Proof

The proof is analogous to the one of Proposition 5.1. \(\square \)

With these results at hand, we now show that global solutions of (\(\mathbb {P}_{kh}^{r}\)) converge to the considered local solution of (\(\mathbb {P}\)).

Proposition 5.3

Let \(k, h\) be small enough such that Propositions 5.1 and 5.2 apply. Let \( \bar{q} \) be a local solution of (\(\mathbb {P}\)) satisfying the assumptions of Theorem 2.3 and Assumption 2.4, and let \( \bar{q}_{kh}^{r}\) be a global solution of (\(\mathbb {P}_{kh}^{r}\)). Then the following error estimate holds:

$$\begin{aligned} \Vert \bar{q}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2}\le c\Big (k\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}+h^{2}\Big (\log \frac{T}{k}+1\Big )\Big ). \end{aligned}$$
(37)

Proof

Let \( q_{t(k,h)}\) and \( q_{\tau (k,h)}\) be defined as in Propositions 5.1 and 5.2, respectively, and let \(k, h\) be small enough such that \( 0<t(k,h), \tau (k,h)<1 \). We have

$$\begin{aligned} \Vert \bar{q}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le \Vert \bar{q}-q_{\tau (k,h)}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}+\Vert q_{\tau (k,h)}-\bar{q}_{kh}^{r}\Vert _{L^{2} (I,{\mathbb {R}}^{m})}. \end{aligned}$$

For the second term, we have

$$\begin{aligned} \Vert q_{\tau (k,h)}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le c\Big (k\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}+h^{2}\Big (\log \frac{T}{k}+1\Big )\Big ), \end{aligned}$$

since, by the definition in Proposition 5.2, \( q_{\tau (k,h)}= \bar{q}_{kh}^{r}+ \tau (k,h)(q_{\gamma }-\bar{q}_{kh}^{r})\). Consequently, \(q_{\tau (k,h)}- \bar{q}_{kh}^{r}= \tau (k,h)(q_{\gamma }-\bar{q}_{kh}^{r})\), and convergence with order \( \tau (k,h) \) follows because \(\Vert q_{\gamma } -\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le 2r\) by (34) and the definition of \(Q^{r}_{kh,\mathrm{feas}} \). It therefore remains to estimate the first term.

The competitor \( q_{\tau (k,h)}\) is feasible for (\(\mathbb {P}^{r}\)) and, using the quadratic growth condition (14), we obtain

$$\begin{aligned} \delta \Vert \bar{q}-q_{\tau (k,h)}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2}&\le j(q_{\tau (k,h)})-j(\bar{q})\\&= j(q_{\tau (k,h)})-j_{kh}(\bar{q}_{kh}^{r})+j_{kh}(\bar{q}_{kh}^{r})-j_{kh}(q_{t(k,h)})\\ {}&\;\;\;\;+j_{kh}(q_{t(k,h)})-j(\bar{q})\\&\le \underbrace{j(q_{\tau (k,h)})-j_{kh}(\bar{q}_{kh}^{r})}_{(d_{1})} + \underbrace{j_{kh}(q_{t(k,h)})-j(\bar{q})}_{(d_{2})}, \end{aligned}$$

where, in the last step, we have used that \( q_{t(k,h)}\in Q_{kh,\mathrm{feas}}^{r} \) and \( \bar{q}_{kh}^{r}\) is a global optimum for (\(\mathbb {P}_{kh}^{r}\)).

We now analyze the two terms separately.

\((d_{1})\) :

With simple algebraic manipulations, using the definition of the objective function j, its discrete counterpart \(j_{kh}\), the Cauchy-Schwarz inequality and binomial formulas, we have

$$\begin{aligned} j(q_{\tau (k,h)})-j_{kh}(\bar{q}_{kh}^{r})&\le \frac{1}{2}\Vert u(q_{\tau (k,h)})+u_{kh}(\bar{q}_{kh}^{r})-2u_{\mathrm{d}}\Vert _{I}\Vert u(q_{\tau (k,h)})-u_{kh}(\bar{q}_{kh}^{r})\Vert _{I}\\&\quad \, +\frac{\alpha }{2}\Vert q_{\tau (k,h)}+\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\Vert q_{\tau (k,h)}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}. \end{aligned}$$
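The binomial step can be spelled out as follows. As a sketch, assume the tracking-type form \( j(q)=\frac{1}{2}\Vert u(q)-u_{\mathrm{d}}\Vert _{I}^{2}+\frac{\alpha }{2}\Vert q\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2} \), which is the form suggested by the estimate above (the precise definition is given with the problem formulation), and abbreviate \( a=u(q_{\tau (k,h)})-u_{\mathrm{d}} \) and \( b=u_{kh}(\bar{q}_{kh}^{r})-u_{\mathrm{d}} \), where \( a \) and \( b \) are auxiliary symbols used only here. Then

$$\begin{aligned} \frac{1}{2}\Vert a\Vert _{I}^{2}-\frac{1}{2}\Vert b\Vert _{I}^{2} = \frac{1}{2}(a+b,a-b)_{I} \le \frac{1}{2}\Vert a+b\Vert _{I}\Vert a-b\Vert _{I}, \end{aligned}$$

with \( a+b=u(q_{\tau (k,h)})+u_{kh}(\bar{q}_{kh}^{r})-2u_{\mathrm{d}} \) and \( a-b=u(q_{\tau (k,h)})-u_{kh}(\bar{q}_{kh}^{r}) \). The control term is treated in the same way with \( q_{\tau (k,h)} \) and \( \bar{q}_{kh}^{r} \) in place of \( a \) and \( b \).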

Then, by means of the stability of the solution u and \( u_{kh} \) of (2) and (18), respectively, together with the boundedness of \( Q_{\mathrm{ad}} \), and with the help of the Cauchy-Schwarz inequality, we get

$$\begin{aligned} j(q_{\tau (k,h)})-j_{kh}(\bar{q}_{kh}^{r})&\le c\Big (\Vert u(q_{\tau (k,h)})-u(\bar{q}_{kh}^{r})\Vert _{I}+\Vert u(\bar{q}_{kh}^{r})-u_{kh}(\bar{q}_{kh}^{r})\Vert _{I}\\&\quad +\Vert q_{\tau (k,h)}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \Big )\\&\le c \Big (\Vert u(\bar{q}_{kh}^{r})-u_{kh}(\bar{q}_{kh}^{r})\Vert _{I}+\Vert q_{\tau (k,h)}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \Big ), \end{aligned}$$

where, in the last step, we have used (4a).

The first term is a discretization error that can be estimated by [9, Theorems 3.3 and 4.2] together with the regularity of the solution of (2), obtaining

$$\begin{aligned} \Vert u(\bar{q}_{kh}^{r})-u_{kh}(\bar{q}_{kh}^{r})\Vert _{I} \le c(k+h^{2}). \end{aligned}$$

The estimate for the second term, \(\Vert q_{\tau (k,h)}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\), follows directly from the definition of \( q_{\tau (k,h)} \) in Proposition 5.2. Summing up, we conclude

$$\begin{aligned} j(q_{\tau (k,h)})-j_{kh}(\bar{q}_{kh}^{r})&\le c\Big (k+h^{2} + k\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}+h^{2}\Big (\log \frac{T}{k}+1\Big )\Big )\\&\le c\Big (k\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}+h^{2}\Big (\log \frac{T}{k}+1\Big )\Big ). \end{aligned}$$
\( (d_{2}) \) :

We proceed exactly as for \( (d_{1}) \) and obtain

$$\begin{aligned} j_{kh}(q_{t(k,h)})-j(\bar{q})&\le \frac{1}{2}\Vert u_{kh}(q_{t(k,h)})+u(\bar{q})-2u_{\mathrm{d}}\Vert _{I}\Vert u_{kh}(q_{t(k,h)})-u(\bar{q})\Vert _{I}\\&\quad +\frac{\alpha }{2}\Vert q_{t(k,h)}+\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\Vert q_{t(k,h)}-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}\\&\le c\Big (\Vert u_{kh}(q_{t(k,h)})-u(q_{t(k,h)})\Vert _{I}+\Vert q_{t(k,h)}-\bar{q}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \Big )\\&\le c\Big (k\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}+h^{2}\Big (\log \frac{T}{k}+1\Big )\Big ). \end{aligned}$$

Combining \( (d_{1}) \) with \( (d_{2}) \), we have the assertion. \(\square \)

It is readily seen that, for \(k, h\) small enough, global solutions of (\(\mathbb {P}_{kh}^{r}\)) are local solutions of (\(\mathbb {P}_{kh}\)), as the constraint \( \Vert \bar{q}-\bar{q}_{kh}^{r}\Vert _{L^{2}(I, {\mathbb {R}}^{m})}\le r \) is not active. In particular, this ensures the existence of a sequence \( \bar{q}_{kh} \) of local solutions to (\(\mathbb {P}_{kh}\)) converging to \( \bar{q} \). We formalize this in the main result of the paper.

Theorem 5.3

Let \( \bar{q}\) be a local solution of (\(\mathbb {P}\)) satisfying the assumptions of Theorem 2.3 and Assumption 2.4. Then, for \(k, h\) sufficiently small, there exists a sequence \( (\bar{q}_{kh}) \) of local solutions of (\(\mathbb {P}_{kh}\)) converging to \( \bar{q} \) as \(k,h \rightarrow 0\). Further, the following error estimate holds:

$$\begin{aligned} \Vert \bar{q}-\bar{q}_{kh}\Vert _{L^{2}(I,{\mathbb {R}}^{m})}^{2} \le c\Bigg (k\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}+h^{2}\Big (\log \frac{T}{k}+1\Big )\Bigg ). \end{aligned}$$
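Note that the estimate bounds the squared error, so the error itself converges with half of this order. Taking square roots and using \( \sqrt{a+b}\le \sqrt{a}+\sqrt{b} \) for \( a,b\ge 0 \), one obtains, with a possibly different generic constant \( c \),

$$\begin{aligned} \Vert \bar{q}-\bar{q}_{kh}\Vert _{L^{2}(I,{\mathbb {R}}^{m})} \le c\Bigg (k^{\frac{1}{2}}\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{4}}+h\Big (\log \frac{T}{k}+1\Big )^{\frac{1}{2}}\Bigg ). \end{aligned}$$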

6 Conclusions

In this paper, we have shown that a weak second-order sufficient condition, together with its implied quadratic growth condition, holding at a local minimizer of a quadratic optimization problem constrained by a semilinear parabolic equation, is sufficient to assert that this minimizer can be approximated by a space-time finite element approximation. The result relies heavily on the fact that no two-norm discrepancy is present for such problems. The extension to more general optimization problems, where this discrepancy cannot be avoided, remains an open problem.