1 Introduction

In this paper we consider the optimal control problem

$$ \min_{y\in K_Y,\ u\in K_U} J(y,u)={1\over 2}\|y-y_d \|^2_{L^2(\varOmega_T)}+{\alpha\over 2}\|u\|^2_U $$
(1.1)

subject to

$$ \left. \begin{array}{l@{\quad}l} y_t -\varDelta y+y=\mathcal{B}u & \mbox{in } \varOmega_T, \\[5pt] \dfrac{\partial y}{\partial n}=0 &\mbox{on } \varGamma_T,\\[8pt] y(0)=y_0 &\mbox{in } \varOmega, \end{array} \right\} $$
(1.2)

where \(\varOmega_{T}=\varOmega\times(0,T]\), \(\varGamma_{T}=\partial\varOmega\times(0,T]\), \(\varOmega\) is an open bounded domain in \(\mathbb{R}^{2}\) with sufficiently smooth boundary \(\varGamma=\partial\varOmega\), and \(\alpha>0\), \(T>0\), \(y_{d}\in L^{2}(\varOmega_{T})\) are fixed. The precise smoothness requirements on \(\varGamma\) are given in the next section. The initial value \(y_{0}\) is specified in Sect. 2. The constraints on control and state are specified through the closed and convex subsets

$$ u\in K_U:=\bigl\{u\in L^2(\varOmega_T): a \leq u(x,t)\leq b,\ \mbox{for a.a. } (x,t)\in \varOmega_T\bigr\} \subset U $$
(1.3)

for controls with \(U:=L^{2}(\varOmega_{T})\) and

$$ y\in K_Y:=\bigl\{y\in L^{\infty}(\varOmega_T): y(x,t)\leq \phi,\ \mbox{for a.a. } (x,t)\in \varOmega_T\bigr\} $$
(1.4)

for states, where for simplicity we assume that a, b and ϕ denote constants with a<b. Furthermore, \(\mathcal{B}:L^{2}(\varOmega_{T})\rightarrow L^{2}(0,T;H^{1}(\varOmega)^{*})\) denotes the injection.

State constrained optimal control problems are important from a practical point of view. Their numerical analysis is involved, since the multipliers associated with constraints on the state are in general Borel measures. To the best of the authors’ knowledge, there are only a few contributions to the numerical analysis of parabolic optimal control problems with state constraints. Lavrentiev regularization of state constrained parabolic optimal control problems is studied in [24]. Recently, error estimates for state constrained parabolic control problems with controls of the form

$$ (\mathcal{B}u) (x,t):=\sum_{i=1}^{m}u_i(t)f_i(x) \quad (x,t)\in \varOmega_T, $$
(1.5)

are derived in [9], where \(f_{1},\ldots,f_{m}\in H^{1}(\varOmega)\cap L^{\infty}(\varOmega)\) are given functions. Error analysis for optimal control problems with final state constraints and control constraints is considered in [29]. Finally, in [22] a priori error estimates for parabolic optimal control problems with pointwise state constraints in time are considered. Other related work on state constrained optimal control problems and parabolic control problems can be found in, e.g., [1, 3, 4, 7, 18, 21].

In this paper we consider an optimal control problem for the heat equation with distributed control and pointwise control and state constraints. The optimization problem is approximated using the variational discretization proposed in [14], combined with linear finite elements in space and the dG(0) scheme in time for the discretization of the state equation. Based on an improved maximum norm error estimate for the state equation, we derive, among other things, in Theorem 4 the \(L^{2}\)-norm error estimate

(1.6)

where \((u,y)\) and \((u_{h,k},Y_{h,k})\) denote the continuous and discrete optimal controls and states, \(4<s<\infty\) denotes a real number related to the regularity of \(y\), and \(h\), \(k\) denote the space and time discretization parameters, which have to be coupled appropriately. In the analysis we use results of [26] concerning the parabolic projection, which are only proved for two spatial dimensions. Therefore we restrict our analysis to the case \(\varOmega\subset\mathbb{R}^{2}\).

The rest of this paper is organized as follows. In Sect. 2 we present the state constrained optimal control problem and the corresponding optimality conditions. In Sect. 3 we establish the fully discrete approximation for the state equation and derive uniform estimates for the discretization error of the state. We obtain the a priori error estimates for the optimal control problem in Sect. 4. We also present some numerical experiments to support our theoretical findings.

2 Optimal control problem

Let \(\varOmega\subset\mathbb{R}^{2}\) be a convex domain which matches the smoothness requirements in [17, Chap. 4, § 4]. Specifically, we require that \(\varGamma\in O^{2}\). We say that a surface \(\varGamma\subset\mathbb{R}^{n}\) belongs to class \(O^{l}\) (\(l\geq 1\)) if there exists a number \(\rho>0\) such that the intersection of \(\varGamma\) with a ball \(K_{\rho}\) of radius \(\rho\) with center at an arbitrary point \(x_{0}\in\varGamma\) is a connected surface, which locally can be represented as the graph of a function \(\omega\) of class \(O^{l}\), i.e., in the local coordinate system \((y_{1},\ldots,y_{n})\) with origin at \(x_{0}\) one has \(y_{n}=\omega(y_{1},\ldots,y_{n-1})\) for \((y_{1},\ldots,y_{n})\in\varGamma\cap K_{\rho}\), and \(\omega\) is a function of class \(O^{l}\) on the projection of \(\varGamma\cap K_{\rho}\) onto the subspace \(y_{n}=0\). Here \(O^{l}(\bar{\varOmega})\) is the set of all continuous functions in \(\bar{\varOmega}\) having continuous derivatives in \(\bar{\varOmega}\) up to order \(l-1\), with the derivatives of order \(l-1\) having a first differential at each point of \(\bar{\varOmega}\) and the derivative of order \(l\) being bounded in \(\bar{\varOmega}\) (see [17, Chap. 1, pp. 9–10] for more details). For example, convex domains \(\varOmega\) with \(\mathcal{C}^{2,1}\) boundary \(\varGamma\) meet the above mentioned requirements. Here we note that convex polygonal domains do not meet our regularity assumptions on the domain. However, convex polygonal domains could be included in our analysis by assuming that the states appearing in our analysis satisfy the required regularity assumptions.

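For a nonnegative integer \(m\) we adopt the standard notation \(W^{m,s}(\varOmega)\) for Sobolev spaces on \(\varOmega\) with norm \(\|\cdot\|_{m,s,\varOmega}\) and seminorm \(|\cdot|_{m,s,\varOmega}\), and the standard modification for \(s=\infty\). For \(s=2\) we write \(H^{m}(\varOmega)\) with norm \(\|\cdot\|_{m,\varOmega}\) and seminorm \(|\cdot|_{m,\varOmega}\). Note that \(H^{0}(\varOmega)=L^{2}(\varOmega)\). For non-integer real numbers \(m\) we define the Sobolev space \(W^{m,s}(\varOmega)\) by interpolation, i.e., \(W^{m,s}(\varOmega):=(W^{k,s}(\varOmega),W^{k+1,s}(\varOmega))_{\theta,s}\), \(k\in\mathbb{N}\), \(m\in(k,k+1)\), \(\theta=m-\lfloor m\rfloor\). We denote the \(L^{2}\)-inner product on \(L^{2}(\varOmega)\) by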

$$(v,w)=\int_{\varOmega}vwdx\quad \forall v,w\in L^2( \varOmega). $$

With \(\varOmega_{T}=\varOmega\times(0,T]\) let \(H^{s,r}(\varOmega_{T})=L^{2}(0,T;H^{s}(\varOmega))\cap H^{r}(0,T;L^{2}(\varOmega))\) be equipped with the norm

$$\|w\|_{s,r,\varOmega_T}= \biggl(\int_0^T\bigl\|w( \cdot,t)\bigr\|^2_{s,\varOmega}dt + \int_{\varOmega}\bigl\|w(x, \cdot)\bigr\|^2_{r,[0,T]}dx \biggr)^{1\over 2}, $$

where \(\|\cdot\|_{r,[0,T]}\) denotes the norm on \(H^{r}([0,T])\). For a real positive number \(l\), \(C^{2l,l}(\bar{\varOmega}_{T})\) is the space of functions which are continuous in \(\bar{\varOmega}_{T}\), together with all derivatives of the form \(D_{t}^{r}D_{x}^{s}\) for \(2r+s<l\) (see [17], p. 7), where \(D_{t}\) and \(D_{x}\) denote the derivatives w.r.t. time and space, respectively. Throughout the presentation \(c\) and \(C\) denote generic positive constants.

The variational form of problem (1.2) reads: Given \(u\in U\) and \(y_{0}\in L^{2}(\varOmega)\), find \(y\in L^{2}(0,T;H^{1}(\varOmega))\cap H^{1}(0,T;H^{-1}(\varOmega))\) such that

$$ \left . \begin{array}{l@{\quad}l} (y_t,v)+(\nabla y,\nabla v)+(y,v)=(\mathcal{B}u,v)& \forall v\in H^1(\varOmega), \mbox{a.e. } t\in (0,T],\\[3pt] y(x,0)=y_0(x)& x\in \varOmega. \end{array} \right \} $$
(2.1)

We denote by \(y=\mathcal{G}(\mathcal{B}u)\) the solution to problem (2.1). It is well known that if \(\mathcal{B}u\in L^{2}(\varOmega_{T})\) and \(y_{0}\in H^{1}(\varOmega)\), problem (2.1) admits a unique solution \(y=\mathcal{G}(\mathcal{B}u)\in H^{2,1}(\varOmega_{T}):=L^{2}(0,T;H^{2}(\varOmega))\cap H^{1}(0,T;L^{2}(\varOmega))\hookrightarrow C([0,T];H^{1}(\varOmega))\).

We define \(W^{2,1}_{s}(\varOmega_{T})\) (1≤s<∞) as

$$W^{2,1}_s(\varOmega_T):= \bigl\{y\in L^s\bigl(0,T;W^{2,s}(\varOmega)\bigr),\ y_t\in L^s\bigl(0,T;L^s(\varOmega)\bigr) \bigr\} $$

and use \(\|\cdot\|_{2,1,s}\) to denote the norm defined on \(W^{2,1}_{s}(\varOmega_{T})\). Thanks to the control constraints given by (1.3), we have \(\mathcal{B}u\in L^{\infty}(\varOmega_{T})\). If in addition \(y_{0}\in W^{2-{2\over s},s}(\varOmega)\) for some \(2<s<\infty\), then [17, Chap. 4, Thm. 9.1] together with our assumptions on the domain \(\varOmega\) yields the improved regularity \(y\in W^{2,1}_{s}(\varOmega_{T})\) such that

$$\|y\|_{2,1,s}\leq C\bigl(\|\mathcal{B}u\|_{0,\infty,\varOmega}+\|y_0 \|_{2-{2\over s},s,\varOmega}\bigr) $$

with a positive constant \(C\) depending on \(\varGamma\) but not on \(s\). We note that since \(\mathcal{B}u\in L^{\infty}(\varOmega_{T})\), we obtain for \(y_{0}\in W^{2,\infty}(\varOmega)\) that \(y\in W^{2,1}_{s}(\varOmega_{T})\) for all \(1<s<\infty\). In what follows we fix \(y_{0}\in W^{2-{2\over s},s}(\varOmega)\) for some \(2<s<\infty\), which will be specified later, and assume \(y_{0}<\phi\) and \(\frac{\partial y_{0}}{\partial n}=0\) on \(\varGamma\) throughout the paper.

Our optimal control problem reads:

$$ \left . \begin{array}{l@{\quad}l} \min & J(y,u)=\dfrac{1}{2}\|y-y_d\|^2_{L^2(\varOmega_T)}+\dfrac{\alpha}{2}\|u\|^2_{L^2(\varOmega_T)}\\[8pt] \mbox{s.t. } & y=\mathcal{G}(\mathcal{B}u),\quad \mbox{and } y\in K_Y,\ u\in K_U. \end{array} \right \} $$
(2.2)

Since \(J\) is quadratic and \(K_{U}\) and \(K_{Y}\) are closed and convex, problem (2.2) admits a unique solution \((y,u)\in W^{2,1}_{s}(\varOmega_{T})\times K_{U}\). Note that \(W^{2,1}_{s}(\varOmega_{T})\hookrightarrow C(\bar{\varOmega}_{T})\) holds for \(s>2\), so that it is meaningful to require

Assumption 1

(Slater condition)

There exists \(\hat{u}\in K_{U}\) such that the associated state \(\hat{y}\) fulfills (1.4) strictly, i.e. \(\hat{y}(x,t)< \phi\) holds for all \((x,t)\in \bar{\varOmega}_{T}\).

With this assumption it then follows from e.g., [5, 10, 24] that the first order optimality conditions for the optimal control problem (2.2) are given by

Theorem 1

Assume that \(u\in L^{\infty}(\varOmega_{T})\) is the solution of problem (1.1) and let \(y\) be the corresponding state given by (2.1). Let \(\mathcal{M}(\bar{\varOmega}_{T})\) denote the space of regular Borel measures on \(\bar{\varOmega}_{T}\). Then there exist an adjoint state \(p\in L^{q}(0,T;W^{1,\sigma}(\varOmega))\) for all \(q,\sigma\in[1,2)\) with \({2\over q}+{2\over \sigma}>3\), and a Lagrange multiplier \(\mu\in \mathcal{M}(\bar{\varOmega}_{T})\) such that

$$ \left . \begin{array}{l@{\quad}l} -p_t -\varDelta p+p=y-y_d+\mu_{\varOmega_T} &\mbox{\textit{in} }\varOmega_T, \\[5pt] \dfrac{\partial p}{\partial n}=\mu_{\varGamma_T} &\mbox{\textit{on} } \varGamma_T,\\[8pt] p(T)=\mu_T &\mbox{\textit{in} } \varOmega \end{array} \right \} $$
(2.3)

is satisfied in the sense of transposition, and

(2.4)
(2.5)

holds. Here \(\mu_{\varOmega_{T}}:=\mu|_{\varOmega_{T}}\), \(\mu_{\varGamma_{T}}:=\mu|_{\varGamma_{T}}\) and \(\mu_{T}:=\mu|_{\bar{\varOmega}\times \{T\}}\).

Proof

For a proof we refer to, e.g., [5, 9]. We note that in [9] a proof is provided for a slightly different setting, whose adaptation to the present situation is obvious. □

Let us note that (2.3) is satisfied in the sense of transposition (see [19]), if

(2.6)

holds, where

3 Finite element discretization and error estimates for the state equation

Let \(\varOmega^{h}\subset\varOmega\) be a polygonal approximation to \(\varOmega\) with boundary \(\varGamma_{h}=\partial\varOmega^{h}\). From the assumption on the boundary \(\partial\varOmega\) we have \(|\varOmega\setminus\varOmega^{h}|\leq Ch^{2}\). For the spatial discretization let \(\mathcal{T}^{h}\) be a quasi-uniform partitioning of \(\varOmega^{h}\) into disjoint regular triangles or rectangles \(\tau\), so that \(\bar{\varOmega}^{h}=\bigcup_{\tau\in \mathcal{T}^{h}}\bar{\tau}\). We assume that all vertices of \(\mathcal{T}^{h}\) that are on the boundary \(\varGamma_{h}\) lie on the boundary \(\partial\varOmega\). Let \(h_{\tau}\) denote the diameter of \(\tau\) and set \(h=\max_{\tau\in \mathcal{T}^{h}}h_{\tau}\). Associated with \(\mathcal{T}^{h}\) is a finite dimensional space \(V^{h}\) consisting of continuous piecewise linear functions. We note that functions in \(V^{h}\) can be extended continuously to \(\varOmega\) such that \(V^{h}\subset C(\bar{\varOmega})\) holds, i.e., for a boundary element \(\tau\), the function \(v_{h}\) defined on the part of \(\varOmega\setminus\varOmega^{h}\) which shares one edge with the element \(\tau\) is the linear extension of \(v_{h}|_{\tau}\). It is easy to see that \(V^{h}\subset H^{1}(\varOmega)\). Since \(\mathcal{T}^{h}\) is quasi-uniform, the following inverse estimates (see [6])

(3.1)
(3.2)

hold for all \(v_{h}\in V^{h}\).

Remark 1

In [8] a quasi-uniform partition with curved boundary elements is proposed for the approximation of elliptic optimal control problems with state constraints. This approach also would be applicable in the present situation.

Let \(\varPi_{h}:C(\bar{\varOmega})\rightarrow V^{h}\) denote the standard Lagrange interpolation operator. Interpolation error estimates imply that for \(y\in W^{m,r}(\varOmega)\), \(r>2\) (see, e.g., [6])

(3.3)

Let \(\mathcal{R}_{h}:H^{1}(\varOmega)\rightarrow V^{h}\) denote the Ritz projection operator defined as

(3.4)

Lemma 1

Let \(\mathcal{R}_{h}\) be the Ritz projection operator defined above. Then there holds:

(3.5)
(3.6)

Proof

A result related to (3.5) is proved by Rannacher and Scott in [28] for Dirichlet boundary conditions, but the arguments can be adapted to the present situation; we omit the details here. The result (3.6) can be found in [30]. □

Then the semi-discrete finite element approximation of (2.1) reads: Given \(u\in U\) and \(y_{0}^{h}\in V^{h}\), find \(y_{h}\in H^{1}(0,T;V^{h})\) such that

$$ \left . \begin{array}{l} \biggl(\dfrac{\partial y_h}{\partial t},w_h\biggr)+(\nabla y_h,\nabla w_h)+(y_h,w_h)=(\mathcal{B}u,w_h)\quad\forall w_h\in V^h,\ t\in(0,T],\\[3pt] y_h(x,0)=y_0^h(x)\quad x\in\varOmega \end{array} \right \} $$
(3.7)

is satisfied. Here \(y_{0}^{h}=\varPi_{h} y_{0}\in V^{h}\) is an approximation to \(y_{0}\).

We next consider the fully discrete approximation of the above semidiscrete problem. Let \(0=t_{0}<t_{1}<\cdots<t_{N-1}<t_{N}=T\) be a time grid with \(t_{n}=nk\), \(n=1,2,\ldots,N\), where \(k:=\frac{T}{N}\). Let \(I_{n}=(t_{n-1},t_{n}]\) and

$$V_{h,k}:= \bigl\{\phi:\bar{\varOmega}\times [0,T]\rightarrow \mathbb{R},\ \phi(\cdot,t)|_{\bar{\varOmega}}\in V^h,\ \phi(x, \cdot)|_{I_n}\in \mathbb{P}_0\ \mbox{for } n=1,\ldots,N \bigr\}, $$

i.e., \(\phi\in V_{h,k}\) is piecewise constant w.r.t. time. For \(Y,\varPhi\in V_{h,k}\) we set

$$A(Y,\varPhi):=\sum_{n=1}^N \bigl( \bigl(Y^n-Y^{n-1},\varPhi^n\bigr)+k\bigl(\nabla Y^n,\nabla\varPhi^n\bigr)+k\bigl(Y^n, \varPhi^n\bigr) \bigr)+\bigl(Y_+^0,\varPhi_+^0 \bigr), $$

where \(\varPhi^{n}:=\varPhi^{n}_{-}\), \(\varPhi^{n}_{\pm}=\lim_{s\rightarrow 0^{\pm}}\varPhi(t_{n}+s)\).

The fully discrete dG(0)–cG(1) approximation scheme for (3.7) now reads: Given \(u\in U\), find \(Y_{h,k}\in V_{h,k}\) such that

$$ A(Y_{h,k},\varPhi)=\sum_{n=1}^N \int_{I_n}\bigl( \mathcal{B}u,\varPhi^n\bigr)+ \bigl(y_0,\varPhi_+^0\bigr),\quad \forall \varPhi\in V_{h,k}. $$
(3.8)

Note that on each time interval \(I_{n}\), the solution \(Y_{h,k}^{n}\in V^{h}\) satisfies

$$ \left . \begin{array}{l} \biggl(\dfrac{Y_{h,k}^n-Y_{h,k}^{n-1}}{k},w_h\biggr)+\bigl(\nabla Y_{h,k}^n,\nabla w_h\bigr)+ \bigl(Y_{h,k}^n, w_h\bigr)=\bigl((\overline{\mathcal{B}u})^n,w_h\bigr),\\[8pt] \quad\forall w_h\in V^h;\ n=1,\ldots,N,\ Y_{h,k}^0(x)=y_0^h(x)\ x\in\varOmega, \end{array} \right \} $$
(3.9)

where \(\overline{\mathcal{B}u}= (\frac{1}{k}\int_{t_{n-1}}^{t_{n}}\mathcal{B}u )_{n=1}^{N}\) and \(y_{0}^{h}\in V^{h}\) is an approximation to \(y_{0}\).
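The time-stepping form (3.9) is an implicit Euler step in the finite element space. The following is a minimal sketch only, under simplifying assumptions that are not part of the paper: the domain is the interval (0,1) instead of a two-dimensional domain, the mesh is uniform, and the time-averaged right-hand side is represented by its nodal values, so that the load vector is approximated by the mass matrix applied to these values. The function names `p1_matrices_1d` and `implicit_euler` are hypothetical.

```python
import numpy as np

def p1_matrices_1d(n_elems, length=1.0):
    """Assemble P1 mass and stiffness matrices on a uniform 1D mesh.

    No boundary rows are modified, which corresponds to the natural
    (homogeneous Neumann) boundary condition.
    """
    h = length / n_elems
    n = n_elems + 1
    M = np.zeros((n, n))
    K = np.zeros((n, n))
    m_loc = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])    # local mass matrix
    k_loc = 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])  # local stiffness matrix
    for e in range(n_elems):
        idx = np.ix_([e, e + 1], [e, e + 1])
        M[idx] += m_loc
        K[idx] += k_loc
    return M, K

def implicit_euler(y0, rhs_avg, k, M, K):
    """March (M + k*(K + M)) Y^n = M Y^{n-1} + k * M * fbar^n for n = 1..N.

    rhs_avg[n-1] holds nodal values of the time-averaged right-hand side
    on the n-th interval (an assumption made for this sketch).
    """
    A = M + k * (K + M)
    Y = [np.asarray(y0, dtype=float)]
    for f_n in rhs_avg:
        Y.append(np.linalg.solve(A, M @ Y[-1] + k * (M @ f_n)))
    return np.array(Y)

# Illustration: zero right-hand side and constant initial value 1.
M, K = p1_matrices_1d(n_elems=8)
N, T = 10, 1.0
k = T / N
y0 = np.ones(M.shape[0])
zero_rhs = [np.zeros(M.shape[0]) for _ in range(N)]
Y = implicit_euler(y0, zero_rhs, k, M, K)
```

For constant initial data and vanishing right-hand side the stiffness term drops out, so each step divides the constant by \(1+k\); this gives a simple consistency check for the sketch.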

Now we are in a position to estimate the error between the solutions of problems (2.1) and (3.8). The following result is a standard consequence of error estimates for parabolic equations (see, e.g., [12]).

Theorem 2

Let \(\mathcal{B}u\in L^{2}(\varOmega_{T})\), let \(y\in H^{2,1}(\varOmega_{T})\) be the solution to problem (2.1), and let \(Y_{h,k}\in V_{h,k}\) be the solution to problem (3.8). Then we have

$$ \|y-Y_{h,k}\|_{L^2(0,T;L^2(\varOmega))}\leq C\bigl(h^{2}+k\bigr)\|y \|_{2,1,\varOmega_T}. $$
(3.10)

We also need the maximum norm estimates for the state equation. Using ideas of [26] it is convenient to introduce the weighted-norm technique. For this purpose let

$$\rho(x):=\bigl(|x-z|^2+\omega^2\bigr)^{1\over 2} \quad \forall x\in \varOmega, $$

where \(z\in\varOmega\) and \(\omega=Ch|\log h|\). The choice of \(C\geq 1\) and \(z\) will be specified later. It is easy to verify (see, e.g., Lemma 4.3 in [26] or p. 216 in [2]) that

$$ \int_{\varOmega}\rho(x)^{-m}dx\leq C\bigl((m-2) \omega^{m-2}\bigr)^{-1}\quad \mbox{for } m>2. $$
(3.11)
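For orientation, the dependence on \(m\) and \(\omega\) in (3.11) can be seen by enlarging the domain of integration to \(\mathbb{R}^{2}\) and passing to polar coordinates centered at \(z\). This is only a sketch; the constant \(2\pi\) obtained in this way is one admissible choice and not necessarily the constant used in [26]:

$$ \int_{\varOmega}\rho(x)^{-m}dx\leq \int_{\mathbb{R}^2}\bigl(|x-z|^2+\omega^2\bigr)^{-{m\over 2}}dx =2\pi\int_0^{\infty}\bigl(r^2+\omega^2\bigr)^{-{m\over 2}}r\,dr ={2\pi\over (m-2)\omega^{m-2}}\quad \mbox{for } m>2. $$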

Theorem 3

Let \(\mathcal{B}u\in L^{\infty}(\varOmega_{T})\), \(y\in W^{2,1}_{s}(\varOmega_{T})\) (4<s<∞) be the solution of problem (2.1), and Y h,k V h,k be the solution of problem (3.8). There exists C ≥1 such that if ω=C h|logh| and kC h 2|logh|3, then

(3.12)

holds, where \(Y_{h,k}^{n}\) (n=1,…,N) is the solution of (3.8).

Proof

The proof follows [26] (see, e.g., pp. 501–504), where an error estimate for the parabolic projection in the case of Dirichlet boundary conditions is presented. The proof in the present situation is slightly different. We sketch it for the convenience of the reader. We also note that we impose Neumann boundary conditions for the state.

With

$$\bar{y}^n:=\frac{1}{k}\int_{t_{n-1}}^{t_n}y( \cdot,t)dt $$

we have

Thus

(3.13)

Note that (see [17, 26])

with m=4<s<∞. From (3.3) and (3.6) we deduce

(3.14)

where interpolation error estimates in space and time are used (see e.g., [2]).

It remains to estimate \(\|\mathcal{R}_{h}\bar{y}^{n}-Y_{h,k}^{n}\|_{0,\infty,\varOmega}\). Suppose that \(z\in\varOmega\) is the point where \(\|\mathcal{R}_{h}\bar{y}^{n}-Y_{h,k}^{n}\|_{0,\infty,\varOmega}\) achieves its maximal value. Using Thm. 3.3.3 (p. 151) in [6] (see also [25]) we have

(3.15)

Integration of (2.1) from \(t_{i-1}\) to \(t_{i}\) yields

where \(y^{i}=y(x,t_{i})\), \(i=0,1,\ldots,N\), so that using (3.9) we have for all \(v\in V^{h}\)

(3.16)

Let \(Z^{i-1}\in V^{h}\), \(i=1,2,\ldots,n\), be the unique solution of the following backward fully discrete problem

$$ \bigl(Z^{i-1}-Z^i,w_h\bigr)+k\bigl(\nabla Z^{i-1},\nabla w_h\bigr)+k\bigl(Z^{i-1}, w_h\bigr)=0\quad \forall w_h\in V^h $$
(3.17)

with \(Z^{n}=\zeta\in V^{h}\), \(1\leq n\leq N\). Now let \(v=Z^{i-1}\) in (3.16). Summing from 1 to \(n\) we find

$$ \bigl(\eta^n,\zeta\bigr)=\bigl(\xi^n,\zeta\bigr)+\sum _{i=1}^n\bigl(\xi^i,Z^{i-1}-Z^i \bigr)+\bigl(Y_h^0-y_0,Z^0\bigr), $$
(3.18)

where we have used (3.17) and the fact that \(\eta^{i}\in V^{h}\). Setting \(\zeta=\mathcal{P}_{h}(\rho^{-2}\eta^{n})\) in (3.18), where \(\mathcal{P}_{h}: L^{2}(\varOmega)\rightarrow V^{h}\) is the \(L^{2}\)-projection operator, we deduce from \(\eta^{n}\in V^{h}\) that

(3.19)

Since \(\|\rho \mathcal{P}_{h}v\|_{0,\varOmega}\leq C\|\rho v\|_{0,\varOmega}\) we have

Thus Young’s inequality implies

(3.20)

Now we need a priori estimates for Z i−1Z i and Z 0 in weighted norms. According to [11] there exists C ≥1 such that if ω=C h|logh| and kC h 2|logh|3, the estimate

(3.21)

holds. We note that the estimate (3.21) was proved in [11] for problems with Dirichlet boundary conditions posed on polygonal domains. The technique used there exploits the properties of the \(L^{2}\)-projection in weighted norms and can be adapted to our case, in which a smooth domain and homogeneous Neumann boundary conditions are considered.

By exploiting the property of the weight function ρ, the following estimates can be found in, e.g., [26], and generalized to our case:

(3.22)
(3.23)

where Hölder’s inequality and property (3.11) are used. Then the interpolation error estimate (3.3), (3.11) and Hölder’s inequality with \(q={s\over{s-2}}\) lead to

(3.24)

as well as to

$$ \Biggl(k\sum_{i=1}^n\bigl\|\xi^i \bigr\|_{0,s,\varOmega}^s\Biggr)^{1\over s}\leq Cs\bigl(h^2+k \bigr)\|y\|_{2,1,s}, $$
(3.25)

see [26, Lemma 4.9]. The latter uses the \(L^{s}\)-norm error estimate (3.5) for the Ritz projection \(\mathcal{R}_{h}\). From (3.20)–(3.25) we then have that

$$ \bigl\|\rho^{-1}\eta^n\bigr\|_{0,\varOmega}\leq Cs^2| \log h|\bigl(h^{2-{4\over s}}+k^{1-{1\over s}}\bigr) \bigl(\|\mathcal{B}u \|_{0,\infty,\varOmega}+\|y_0\|_{2-{2\over s},2,\varOmega}\bigr). $$
(3.26)

With (3.15) and (3.26) we conclude that

(3.27)

Combining (3.14) and (3.27) we complete the proof of the theorem. □

Remark 2

A uniform error estimate for the discretization error between (2.1) and (3.8) is derived in [9] under the condition that the right hand side, and hence the time derivative of the solution, is only square integrable in time. Here the right hand side is uniformly bounded w.r.t. space and time, which guarantees an improved regularity of the solution and thus an improved error estimate.

4 Error estimates for optimal control problem

In this section we consider the finite element approximation and error estimates for optimal control problem (1.1)–(1.2).

We consider the variational discretization approach proposed in [9, 14]. Then the fully discrete optimization problem reads

$$ \min_{u\in K_U}J_h(Y_{h,k},u)=\sum ^{N}_{i=1}{ 1\over 2}k\int _{\varOmega}\bigl(Y_{h,k}^i-\bar{y}_d^i\bigr)^2+{\alpha\over 2}\int _0^T\int_{\varOmega}u^2 $$
(4.1)

subject to

$$ \left . \begin{array}{l} \displaystyle A(Y_{h,k},\varPhi)=\sum_{n=1}^N\int_{I_n}( \mathcal{B}u,\varPhi)+\bigl(y_0,\varPhi_+^0\bigr),\quad \forall \varPhi\in V_{h,k}.\\[13pt] Y_{h,k}^i(x_j)\leq \phi, \quad i=1,\ldots,N,\ j=1,\ldots, m, \end{array} \right \} $$
(4.2)

where m denotes the number of nodes in the triangulation \(\mathcal{T}^{h}\). Let \(\hat{u}\) denote the control satisfying Assumption 1, i.e., there exists δ>0 such that the corresponding state \(\hat{y}=\mathcal{G}(\mathcal{B}\hat{u})\) satisfies

$$ \hat{y}=\mathcal{G}(\mathcal{B}\hat{u})\leq \phi-\delta\quad \mbox{in } \bar{\varOmega}_T. $$
(4.3)

It follows from Theorem 3 that there exist \(h_{0},k_{0}>0\) such that \(\hat{Y}_{h,k}:=\mathcal{G}_{h,k}(\mathcal{B}\hat{u})\in V_{h,k}\) satisfies

$$ \hat{Y}_{h,k}^n(x_j)\leq \phi- {\delta\over 2}<\phi,\quad 1\leq n\leq N,\ 0<h\leq h_0,\ 0<k \leq k_0. $$
(4.4)

Thus the pair \((\hat{u},\hat{Y}_{h,k})\) is a discrete feasible Slater point for problem (4.1) for h and k small enough.

As a minimization problem for a quadratic functional over a closed convex set, the variational discrete optimization problem (4.1)–(4.2) admits a unique solution \(u_{h,k}\in K_{U}\) with corresponding state \(Y_{h,k}\in V_{h,k}\). Furthermore, it follows from [5] again that the discrete Slater condition (4.4) guarantees the existence of a discrete co-state \(P_{h,k}\in V_{h,k}\) and discrete Lagrange multipliers \(\mu_{j}^{i}\in \mathbb{R}\), \(i=1,\ldots,N\), \(j=1,\ldots,m\), such that the triplet \((Y_{h,k},P_{h,k},u_{h,k})\in V_{h,k}\times V_{h,k}\times K_{U}\) satisfies the following optimality conditions:

(4.5)
(4.6)
(4.7)
(4.8)

It follows from (4.7) that \(u_{h,k}\) is piecewise constant w.r.t. time, but in general \(u_{h,k}\) is not a finite element function w.r.t. space. It is easy to show that

$$u_{h,k}=\mathcal{P}_{K_{U}}\biggl(-{1\over\alpha} \mathcal{B}^* P_{h,k}\biggr), $$

where \(\mathcal{P}_{K_{U}}\) denotes the orthogonal projection in \(U\) onto \(K_{U}\). Let us define the measure \(\mu_{h,k}\in \mathcal{M}(\bar{\varOmega}_{T})\) by

(4.9)
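For the box set \(K_{U}\), the orthogonal projection \(\mathcal{P}_{K_{U}}\) in the formula for \(u_{h,k}\) acts pointwise as a cut-off at the bounds \(a\) and \(b\). The following minimal sketch illustrates this, assuming that the discrete adjoint state has already been evaluated at a set of sample points and that \(\mathcal{B}^{*}\) acts as the identity on these samples; the function names and the sample values are hypothetical and serve for illustration only.

```python
import numpy as np

def project_onto_box(v, a, b):
    """Pointwise projection onto {a <= u <= b}; for box sets this coincides
    with the L2-orthogonal projection onto K_U."""
    return np.clip(v, a, b)

def control_from_adjoint(p_values, alpha, a, b):
    """Evaluate u = P_{K_U}(-(1/alpha) * p) at sampled points."""
    return project_onto_box(-np.asarray(p_values, dtype=float) / alpha, a, b)

# Hypothetical adjoint samples, for illustration only.
p_samples = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
u_samples = control_from_adjoint(p_samples, alpha=1.0, a=-1.0, b=1.0)
```

Since the cut-off is applied to a finite element function, the result has kinks along the boundary of the active set, which is why \(u_{h,k}\) is in general not a finite element function in space.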

For \(v_{h,k}\in V_{h,k}\) we use the notation

(4.10)

As a first result for (4.1)–(4.2) we prove that the sequences of optimal controls, states and measures \(\mu_{h,k}\) are uniformly bounded.

Lemma 2

Let \((Y_{h,k},u_{h,k})\in V_{h,k}\times K_{U}\) be the solution of problem (4.1)–(4.2), and let \(P_{h,k}\in V_{h,k}\) and \(\mu_{h,k}\in \mathcal{M}(\bar{\varOmega}_{T})\) be the corresponding adjoint state and measure, respectively. Then there exists \(h_{0}>0\) such that

$$\|u_{h,k}\|_{L^2(0,T;L^2(\varOmega))}^2+\|Y_{h,k} \|^2_{L^2(0,T;L^2(\varOmega))}+\sum_{i=1}^N \sum_{j=1}^m\mu^i_j \leq C\quad \mbox{\textit{for all} } 0<h\leq h_0. $$

Proof

The proof follows [9]. From (4.4) and (4.8) we obtain

(4.11)

where we have used (4.5) and (4.6). This completes the proof of the Lemma. □

Now we are in a position to prove the main result of this paper. We use a proof technique developed in Chap. 3 of [15] which only relies on uniform a priori error estimates of the state approximation.

Theorem 4

Let \((y,u)\in W^{2,1}_{s}(\varOmega_{T})\times L^{\infty}(\varOmega_{T})\), \(4<s<\infty\), and \((Y_{h,k},u_{h,k})\in V_{h,k}\times K_{U}\) be the solutions to problems (2.2) and (4.1)–(4.2), respectively. Assume that the assumptions of Theorem 3 are all satisfied. Then we have the following a priori error estimate:

(4.12)

Proof

From (2.4) and (4.7) we have

$$\int_{\varOmega_T}\bigl(\alpha u+\mathcal{B}^*p\bigr) (v-u)\geq 0 \quad \forall v\in K_{U} $$

and

$$\int_{\varOmega_T}\bigl(\alpha u_{h,k}+\mathcal{B}^* P_{h,k}\bigr) (v_h-u_{h,k})dx\geq 0\quad \forall v_h\in K_U. $$

Choosing \(v=u_{h,k}\) as well as \(v_{h}=u\), and adding these inequalities gives

(4.13)

In the following we need to introduce some auxiliary problems. Let \(y^{h}:=\mathcal{G}(\mathcal{B}u_{h,k})\in W^{2,1}_{s}(\varOmega_{T})\) be the variational solution of

$$ \left . \begin{array}{l@{\quad}l} y^h_t -\varDelta y^h+y^h=\mathcal{B}u_{h,k} &\mbox{in } \varOmega_T, \\[5pt] \dfrac{\partial y^h}{\partial n}=0 &\mbox{on } \varGamma_T,\\[8pt] y^h(0)=y_0 &\mbox{in } \varOmega, \end{array} \right \} $$
(4.14)

i.e.

$$\biggl(\frac{\partial y^h}{\partial t},v\biggr)+\bigl(\nabla y^h,\nabla v\bigr)+ \bigl(y^h,v\bigr)=(\mathcal{B}u_{h,k},v)\quad \mbox{for a.a. } t\in (0,T],\ \forall v\in H^1(\varOmega) $$

with \(y^{h}(\cdot,0)=y_{0}\), and let \(Y_{h,k}(u):=\mathcal{G}_{h,k}(\mathcal{B}u)\in V_{h,k}\) be the solution of

$$ A\bigl(Y_{h,k}(u),\varPhi\bigr)=\sum _{n=1}^N\int_{I_n}\bigl( \mathcal{B}u,\varPhi^n\bigr)+\bigl(y_0, \varPhi_+^0\bigr),\quad \forall \varPhi\in V_{h,k}. $$
(4.15)

Note that \(Y_{h,k}\) and \(Y_{h,k}(u)\) are the fully discrete approximations to \(y^{h}\) and \(y\), respectively. Then from (2.1) and (4.14) we have \(y^{h}-y\in H^{2,1}(\varOmega_{T})\cap C(\bar{\varOmega}_{T})\), \((y^{h}-y)_{t}-\varDelta (y^{h}-y)+(y^{h}-y)=\mathcal{B}(u_{h,k}-u)\in L^{\infty}(\varOmega_{T})\), \((y^{h}-y)(x,0)=0\) and \(\frac{\partial (y^{h}-y)}{\partial n}=0\). Thus \(y^{h}-y\in W_{0}^{\infty}\). Now (2.6) and (4.14) imply that

(4.16)

Similarly, from (4.6) and (4.15) we have

(4.17)

Thus

This implies

(4.18)

where we have used Theorem 2. From (2.5) we have

Since \(\mu\geq 0\), with \((y^{h})^{+}=\max(y^{h},0)\) we have

$$\int_{\bar{\varOmega}_T}\bigl(y^h-\phi\bigr)d\mu\leq \int _{\bar{\varOmega}_T}\bigl(y^h-\phi\bigr)^+d\mu. $$

Define \(y^{h,n}=y^{h}(x,t_{n})\), so that for \(t_{n-1}<t\leq t_{n}\) we have

(4.19)

where we have used an interpolation estimate and Theorem 3. From \(Y_{h,k}^{n}(x_{j})\leq \phi\) (\(j=1,\ldots,m\)) we conclude \((Y_{h,k}^{n}-\phi)^{+}(x)=0\) on \(\varOmega^{h}\). Theorem 3 implies that \(Y_{h,k}^{n}\) is uniformly bounded in \(\varOmega\), so we conclude from \(|\varOmega\setminus\varOmega^{h}|\leq Ch^{2}\) that

(4.20)

Similarly, from (4.8), (4.9), (4.10), Theorem 3 and the fact that \(\mu_{h,k}\geq 0\), \(y\leq\phi\) and \(\int_{\bar{\varOmega}_{T}}(\phi-Y_{h,k})d\mu_{h,k}=0\) we have

(4.21)

We note that \(\|\mathcal{B}u_{h,k}\|_{0,\infty,\varOmega}\leq C|\varOmega|\cdot\max\{|a|,|b|\}\). Combining (4.18), (4.19)–(4.21) the claim is proved. □

In [9], the authors obtain the convergence order of \(O(|\log h|^{1\over 4}(h^{1\over 2}+k^{1\over 4}))\) in 2d and \(O(h^{1\over 4}+h^{-{1\over 4}}k^{1\over 4})\) in 3d for problems with state constraints pointwise in space and time where controls only act in time as in (1.5). In [22], the authors obtain the convergence order of \(O(\log({T\over k})^{1\over 2}(k^{1\over 2}+h))\) for problems with state constraints acting only pointwise in time and distributed control. In the present paper, for some fixed \(s\in(4,\infty)\) which depends on the regularity of \(y_{0}\in W^{2-{2\over s},s}(\varOmega)\) and the regularity of the domain \(\varOmega\), we obtain the convergence order \(s|\log h|(k^{{1\over 2}-{1\over s}}+h^{1-{2\over s}})\) in two space dimensions. If the assumptions of Theorem 3 are satisfied for all \(4<s<\infty\), we get the order \(h+k^{1\over 2}\) up to a logarithmic factor by setting, e.g., \(s=|\log h|\). This is comparable to the results of [22] in the 2d case. Thus, if the regularity of the boundary of the domain \(\varOmega\) allows for \(y\in W^{2,1}_{s}(\varOmega_{T})\) for all \(4<s<\infty\), we obtain in this case an improvement over the results of [9], which appears to be quasi-optimal for problems with pointwise state constraints in space and time and distributed control when compared with the elliptic case, see, e.g., [8, 9, 23].
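The effect of the choice \(s=|\log h|\) can be checked directly. The following short computation is a sketch under two assumptions: \(h\) and \(k\) are smaller than 1 with \(|\log h|>4\), and, as an additional assumption made only for this illustration, \(k\geq h^{\gamma}\) for some fixed \(\gamma>0\), so that \(|\log k|\leq \gamma|\log h|\):

$$ h^{1-{2\over s}}=h\,e^{{2\over s}|\log h|}=e^{2}h,\qquad k^{{1\over 2}-{1\over s}}=k^{1\over 2}e^{{|\log k|\over |\log h|}}\leq e^{\gamma}k^{1\over 2}. $$

Under these assumptions the bound \(s|\log h|(k^{{1\over 2}-{1\over s}}+h^{1-{2\over s}})\) reduces to \(C|\log h|^{2}(h+k^{1\over 2})\).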

5 Numerical examples

In this section we will carry out some numerical experiments to support our theoretical findings. We consider the following parabolic optimal control problem:

$$\min J(y,u)={1\over 2}\|y-y_d\|^2_{L^2(\varOmega_T)}+ {\alpha\over 2}\|u\|^2_{L^2(\varOmega_T)} $$

subject to

$$\left \{ \begin{array}{l@{\quad}l} y_t -\varDelta y+ y=f+u &\mbox{in }\varOmega_T, \\[5pt] \dfrac{\partial y}{\partial n}=0 &\mbox{on } \varGamma_T,\\[8pt] y(0)=y_0&\mbox{in } \varOmega \end{array} \right . $$

with box type control constraints

$$u\in K_U:=\bigl\{u\in L^{\infty}(\varOmega_T): a \leq u(x,t)\leq b,\ (x,t)\in \varOmega_T\bigr\} $$

and state constraints

$$y\in K_Y:=\bigl\{y\in L^{\infty}(\varOmega_T): y(x,t)\leq \phi(x,t) \mbox{ or } y(x,t)\geq \phi(x,t),\ (x,t)\in \varOmega_T\bigr\}. $$

For constructing an example with exact solution we allow additional data f.

The numerical solution of the two examples is performed with the method proposed in [13, 20], which goes back to an idea of Pierre and Sokolowsky [27].

Example 1

Let \(\varOmega_{T}=[0,1]^{2}\times[0,1]\), \(\alpha=1\). Following the ideas of [9] we set

$$y(x,t)=\cos(\pi x_1)\cos(\pi x_2)\sin(\pi t),\qquad \phi(x,t)=\max \bigl(0.7, y(x,t)\bigr), $$

while control and adjoint state are given by

We omit control constraints since \(u\in L^{\infty}(\varOmega_{T})\). We note that \(y\in W^{2,1}_{s}(\varOmega_{T})\) for all \(1\leq s<\infty\) although the domain is a polygon. In this example we have a regular multiplier associated with the state constraint, namely

$$\mu(x,t)=\max(y-0.7,0). $$

A simple calculation shows

and

To support our theoretical results we test the convergence order with respect to space and time discretization. We choose the time step \(k=O(h^{2})\), where \(h\) denotes the mesh size of the space triangulation. The results are listed in Table 1.

Table 1 Error of control u and state y for Example 1

It is observed that in this numerical example the convergence orders for the optimal control u and the state y are better than expected, which may be caused by the fact that the multiplier associated with the state constraints is continuous. However, in this context it should be mentioned that in the control of parabolic PDEs it is often difficult to construct numerical examples with exactly known solutions which deliver the exact predicted order of convergence, see, e.g., [22, 24].
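Observed convergence orders such as those discussed above are commonly computed from the errors on two consecutive refinement levels as \(\log(e_{i}/e_{i+1})/\log(h_{i}/h_{i+1})\). The following is a minimal sketch of this computation; the function name `observed_orders` is hypothetical, and the input data are synthetic values with exact order one, not the entries of Table 1.

```python
import math

def observed_orders(mesh_sizes, errors):
    """Return observed convergence orders between consecutive refinement levels."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        for i in range(len(errors) - 1)
    ]

# Synthetic data with exact order 1; not taken from Table 1.
hs = [2.0 ** (-j) for j in range(2, 6)]
errs = [0.3 * h for h in hs]
orders = observed_orders(hs, errs)
```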

Example 2

This example is chosen from [16]. We set \(\varOmega_{T}=B_{1}(0)\times[0,1]\),

Then again \(y\in W^{2,1}_{s}(\varOmega_{T})\) for all 1≤s<∞. We impose the state constraint y(x,t)≥0 such that the active set \(\mathcal{A}=\{(x,t)\in \varOmega_{T}:\ y=0\}\) has the form

$$\mathcal{A}= \biggl\{(x,t)\in \varOmega_T: \frac{1}{4}\leq t \leq\frac{3}{4},\ \bigl\|x(t)\bigr\|=\frac{t}{2} \biggr\}. $$

We define the adjoint state p as

$$p(x,t)= \left \{ \begin{array}{l@{\quad}l} ((t-\frac{1}{2})^2-\frac{1}{16} )(2-\frac{t}{2}-\|x\|)(\|x\|-\frac{t}{2}) &\|x\|>\frac{t}{2},\ \frac{1}{4}< t < \frac{3}{4}, \\[5pt] 0 &\mbox{otherwise}. \end{array} \right . $$
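The piecewise definition of \(p\) can be transcribed directly for pointwise evaluation, e.g. when computing errors at quadrature points. The following is a minimal sketch; the function name `adjoint_example2` is hypothetical and the code is not taken from the authors' implementation.

```python
import math

def adjoint_example2(x1, x2, t):
    """Evaluate the adjoint state p of Example 2 at the point (x1, x2) and time t."""
    r = math.hypot(x1, x2)  # Euclidean norm of x
    if 0.25 < t < 0.75 and r > t / 2.0:
        return ((t - 0.5) ** 2 - 1.0 / 16.0) * (2.0 - t / 2.0 - r) * (r - t / 2.0)
    return 0.0
```

The factor \(\|x\|-\frac{t}{2}\) makes \(p\) vanish on the circle \(\|x\|=\frac{t}{2}\), so the two branches match continuously there.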

The Lagrange multiplier μ associated with the state constraints satisfies

for any \(w\in C(\bar{\varOmega}_{T})\).

To accurately resolve the quadrature for this example we need to construct special meshes. This is done by congruent refinement of the initial grid formed by 8 sectors of the unit circle. For each \(i=1,2,\ldots,N\) with \(t_{i}\in [\frac{1}{4},\frac{3}{4}]\) we need to ensure that the circle with radius \(\|x\|=\frac{t_{i}}{2}\) is well triangulated. Here \(t_{i}\) (\(i=1,2,\ldots,N\)) denote the time grid points. To do so, the time stepping has to be coupled with the grid size such that \(h=O(k)\). Figure 1 depicts two such meshes with 289 (left) and 1089 (right) nodes.

Fig. 1 The congruently refined meshes for Example 2 with 289 (left) and 1089 (right) nodes

We observe first order convergence for the optimal control and state from Table 2. Since the time step \(k\) is coupled with the mesh size \(h\) via \(h=O(k)\), the numerical results mainly show the convergence order related to the time discretization. This order is better than \(O(k^{{1\over 2}})\), which we would expect from our estimates. Although this numerical example does not exactly meet the predictions of our theory, it certainly shows, when compared to Example 1, that less regularity of the exact solution results in a decrease of the numerically observed convergence order.

Table 2 Error of control u and state y for Example 2