1 Introduction

We consider a class of optimal control problems for a general system of nonlinear reaction–diffusion equations that covers a variety of particular cases with important applications in mathematical physics. Control problems of this type have received increasing attention in the recent past; we refer, for instance, to [16, 24, 29, 33] for control methods of theoretical physics or to [3, 4, 11, 12, 13] for a more mathematical perspective.

The general system is posed in \(Q:= \varOmega \times (0,T)\), where \(\varOmega \subset {\mathbb {R}}^d\) is a bounded spatial domain and \(T > 0\) is a finite time. It has the form

$$\begin{aligned} E \, \displaystyle \frac{\partial y}{\partial t} - D\varDelta y + R(x,t,y) + A(x,t)\, y = B(x,t)\, u \end{aligned}$$
(1.1)

subject to appropriate initial and boundary conditions. Here, \(y: Q \mapsto {\mathbb {R}}^{n}\) is the state vector function and \(u: Q \mapsto {\mathbb {R}}^{n_c}\) is the control vector function. Moreover, constant diagonal \((n,n)\)-matrices E, D, and matrix-valued functions A, B of suitable dimensions are given, where E has positive diagonal entries and those of D are non-negative. The nonlinearity of the system is defined by the function \(R: Q\times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n} \) that we suppose to have a diagonal structure, i.e. the j-th component of the vector function R depends only on \((x,t,y_j)\). The assumptions on A, B, and R are detailed in the next section.

In particular, the system (1.1) includes the following special cases that we recall with increasing order of complexity.

(i) The Schlögl model

    This model is defined by the cubic nonlinearity \(R: {\mathbb {R}} \rightarrow {\mathbb {R}}\),

    $$\begin{aligned} R(y) = \rho (y-y_1)(y-y_2)(y-y_3) \end{aligned}$$
    (1.2)

    with given real numbers \(y_1 \le y_2 \le y_3\) and \(\rho > 0\). The equation has the form

    $$\begin{aligned} \frac{\partial y}{\partial t} - \varDelta y + R(y)= u \end{aligned}$$
    (1.3)

    Here, we have \(n=n_c = 1\); the matrices \(E=D=B\) reduce to the real number 1, and A is equal to the real number 0. The system (1.3) is known to exhibit traveling waves as typical solutions. We refer to [28] and to the standard textbooks [14, 22, 31] on reaction–diffusion equations. The optimal control of (1.3) was investigated in [4] with a focus on the numerical analysis and in [16] from a physical point of view; we also refer to various examples in [29].

(ii) The FitzHugh–Nagumo system

    A standard form of this well-known system is

    $$\begin{aligned} \sigma _y \displaystyle \frac{\partial y}{\partial t} - D_y \varDelta y + R(y) + \alpha \, z= & {} u\nonumber \\ \sigma _z \displaystyle \frac{\partial z}{\partial t} + \beta \, z - \gamma \, y + \delta= & {} 0, \end{aligned}$$
    (1.4)

    where \(\sigma _y,\, \sigma _z,\) and \(D_y\) are positive and \( \alpha , \, \beta , \, \delta , \, \gamma \) are real numbers. This system fits into (1.1) with \(n=2,\, n_c=1\); A has the rows \((0 \ \alpha )\) and \((-\gamma \ \beta )\), \(B = (1\ 0)^\top \), and \(R(x,t,y,z) = (R(y),\delta )^\top \); the corresponding matrix form of (1.1) is written out after this list.

    For spatial dimension \(d = 1\), the system has traveling impulses as typical solutions, while it develops rotating spirals or scroll rings for \(d=2\) and \(d=3\), respectively. We refer to [14, 22, 31] for general results on the equation, to [24, 33] for applications of control methods in theoretical physics, and to [11, 12] for the optimal control from a mathematical perspective.

(iii) Coupling of (1.3) with a linear system of ODEs

    More general than the FitzHugh–Nagumo system and more diverse in the behavior of the solutions is the system

    $$\begin{aligned} \sigma _y \displaystyle \frac{\partial y}{\partial t} - D_y \varDelta y + R(y) + \sum _{j=1}^m \alpha _j \, z_j= & {} u, \end{aligned}$$
    (1.5)
    $$\begin{aligned} \sigma _j \displaystyle \frac{\partial z_j}{\partial t} + \beta _j \, z_j - \gamma _j \, y + \delta _j= & {} 0,\quad j = 1,\ldots ,m, \end{aligned}$$
    (1.6)

    for the state vector \((y,z_1,\ldots ,z_m)\). Here, we have \(n = m+1\), \(n_c = 1\), \(A \in {\mathbb {R}}^{n\times n}\) has the first row \((0,\alpha _1,\ldots ,\alpha _m)\), the first column \(- (0,\gamma _1,\ldots ,\gamma _m)^\top \), and the \((m,m)\)-submatrix \(\mathrm{diag}(\beta _1,\ldots ,\beta _m)\) in the lower right corner. The other matrices are \(E = \mathrm{diag}(\sigma _y,\sigma _1,\ldots ,\sigma _m)\), \(D = \mathrm{diag}(D_y,0,\ldots ,0)\), and \(B = (1,0,\ldots ,0)^\top \). Moreover, we have \(R(\cdot ,y,z_1,\ldots ,z_m)= (R(y),\delta _1,\ldots ,\delta _m)^\top \); we refer to the Ph.D. thesis [23] for the treatment of this system.

(iv) Coupling of (1.3) with a linear system of PDEs

    In this more general system, the Eq. (1.5) is coupled with the linear equations

    $$\begin{aligned} \sigma _j \displaystyle \frac{\partial z_j}{\partial t} -D_j \varDelta z_j + \beta _j \, z_j - \gamma _j \, y + \delta _j = 0,\quad j = 1,\ldots ,m. \end{aligned}$$
    (1.7)

    Here, we have almost the same quantities as in (iii), but D has to be defined by \(D= \mathrm{diag}(D_y,D_1,\ldots ,D_m)\) with positive numbers \(D_j\).

    This system was discussed in the Ph.D. thesis [23], too.
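
For illustration, the identifications in (ii) can be written out explicitly: with them, the FitzHugh–Nagumo system (1.4) takes exactly the form (1.1),

$$\begin{aligned} \underbrace{\begin{pmatrix} \sigma _y &{} 0\\ 0 &{} \sigma _z\end{pmatrix}}_{E} \frac{\partial }{\partial t}\begin{pmatrix} y\\ z\end{pmatrix} - \underbrace{\begin{pmatrix} D_y &{} 0\\ 0 &{} 0\end{pmatrix}}_{D}\varDelta \begin{pmatrix} y\\ z\end{pmatrix} + \underbrace{\begin{pmatrix} R(y)\\ \delta \end{pmatrix}}_{R} + \underbrace{\begin{pmatrix} 0 &{} \alpha \\ -\gamma &{} \beta \end{pmatrix}}_{A}\begin{pmatrix} y\\ z\end{pmatrix} = \underbrace{\begin{pmatrix} 1\\ 0\end{pmatrix}}_{B}\, u. \end{aligned}$$

This only restates (1.4) with the matrices given in item (ii); the analogous identifications for (iii) and (iv) follow the same pattern.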

Other similar reaction–diffusion equations covered by the general system (1.1), and associated control strategies, have been of great interest during recent years. We mention, for example, [1, 20, 21, 25, 27, 32].

We will present several numerical examples for the different types of equations listed above. All of them are adapted from the thesis [23]. Our mathematical analysis is developed for the more general system (1.1), which includes all these particular cases; for the analysis, we take E and D to be unit matrices (cf. Remark 1.1).

Our paper contains the following novelties: we extend the analysis of existence, uniqueness, and differentiability of the control-to-state mapping to the general state Eq. (1.1). Here, we apply a different method of proof than in [23] and avoid the use of analytic semigroups, which turns out to be too technical for the general case. We prove the existence of optimal controls under weaker assumptions than usually expected: in the unconstrained case \(a = -\,\infty \) or \(b = \infty \), existence cannot be obtained by the standard argument. The \(L^2\)-Tikhonov term ensures only the \(L^2(Q)^{n_c}\)-boundedness of the set of controls on which the infimum of the problem can be attained. However, with the exception of \(d=1\), the control-to-state mapping G is not continuous from \(L^2(Q)^{n_c}\) to the state space, so this boundedness in \(L^2(Q)^{n_c}\) is of little use. Nevertheless, we are able to prove the existence of optimal controls.

Moreover, we consider problems with pointwise state constraints and prove associated necessary optimality conditions.

The consideration of the state constraints is motivated by an interesting application in theoretical physics, namely the problem of preventing localized moving spots from reaching the boundary of the spatial domain. This issue is discussed in Example 3 of Sect. 4.

Remark 1.1

We will develop our analysis for \((n,n)\)-unit matrices \(E = D = I_d\), since this is less technical. If E and D have positive diagonal entries, then the results remain true with E and D substituted for \(I_d\) at the associated positions. The proofs need only minor modifications. If some of the diagonal entries of D vanish, as in (1.4) or (1.6), then some equations of (1.1) are ordinary differential equations with respect to t. In these equations, x plays the role of a parameter and boundary conditions are not imposed. If these equations are linear, as in (1.6), then the associated vector function z can be represented by the variation-of-constants formula in terms of y and hereafter eliminated. Then a system for y of the form (1.1) is obtained, where D has positive diagonal elements again. If z and y appear nonlinearly in the ordinary differential equations, then the situation is more complicated; this case is not covered by our theory. The matrices E and D will be needed in our numerical examples.
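
To make the elimination step concrete, consider the linear equations (1.6). For each j, the variation-of-constants formula gives (a standard computation, stated here only for orientation)

$$\begin{aligned} z_j(x,t) = {\mathrm{e}}^{-(\beta _j/\sigma _j)\,t}\, z_j(x,0) + \frac{1}{\sigma _j}\int _0^t {\mathrm{e}}^{-(\beta _j/\sigma _j)(t-s)}\left( \gamma _j\, y(x,s) - \delta _j\right) ds, \end{aligned}$$

so that \(z_1,\ldots ,z_m\) can be expressed in terms of y and eliminated from (1.5), as described above.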

2 Control constrained problem

2.1 State equation

We shall consider the following system of parabolic partial differential equations

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial y}{\partial t} - \varDelta y + R(x,t,y) + A(x,t)y = B(x,t)u&{}\quad \text { in }Q = \varOmega \times (0,T),\\ \displaystyle \partial _\nu y = 0&{}\quad \text { on } \varSigma = \varGamma \times (0,T),\\ y(x,0)=y_0(x)&{}\quad \text { in } \varOmega ,\end{array}\right. \end{aligned}$$
(2.1)

where \(\varOmega \) is an open bounded domain in \({\mathbb {R}}^d\) with \(d \in \{1,2,3\}\), \(\varGamma \) is the boundary of \(\varOmega \) that we assume to be Lipschitz, \(\nu \) is the unit outward normal vector to \(\varGamma \), \(0< T < \infty \) is fixed, \(A \in L^\infty (Q,{\mathbb {R}}^{n \times n})\), \(B \in L^\infty (Q,{\mathbb {R}}^{n \times n_c})\), \(y_0 \in L^\infty (\varOmega )^n\), and the function \(R:Q \times {\mathbb {R}}^n \longrightarrow {\mathbb {R}}^n\) is defined by \(R(x,t,y) = (R_1(x,t,y_1),\ldots , R_n(x,t,y_n))^\top \) with Carathéodory functions \(R_j:Q \times {\mathbb {R}} \longrightarrow {\mathbb {R}}\) for \(1 \le j \le n\). Given a control \(u:Q \longrightarrow {\mathbb {R}}^{n_c}\), we look for the corresponding solution \(y_u:Q \longrightarrow {\mathbb {R}}^n\) of (2.1). The analysis of this equation will be done below under the following assumptions on R. We assume that every function \(R_j\) is of class \(C^1\) with respect to the last variable and satisfies for \(1 \le j \le n\)

$$\begin{aligned}&R_j(\cdot , 0) \in L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega )) \text { with } {{\hat{p}}}, {{\hat{q}}} \in [2, \infty ] \text { and } \frac{1}{{{\hat{p}}}} + \frac{d}{2{{\hat{q}}}} < 1, \end{aligned}$$
(2.2)
$$\begin{aligned}&\exists C_R \in {\mathbb {R}} : \frac{\partial R_j}{\partial y_j}(x,t,y_j) \ge C_R \ \text { for a.a. } (x,t) \in Q, \quad \forall y _j\in {\mathbb {R}}, \end{aligned}$$
(2.3)
$$\begin{aligned}&\forall M > 0\ \exists C_M : \left| \frac{\partial R_j}{\partial y_j}(x,t,y_j)\right| \le C_M \ \text { for a.a. } (x,t) \in Q, \quad \forall |y_j| \le M. \end{aligned}$$
(2.4)
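
As a simple consistency check (not carried out in this form in the text), the Schlögl nonlinearity (1.2) satisfies (2.2)–(2.4): \(R(0) = -\rho \, y_1y_2y_3\) is constant, hence \(R(\cdot ,0)\) belongs to \(L^\infty (Q)\) and in particular to \(L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))\) for any admissible \({{\hat{p}}}, {{\hat{q}}}\); moreover,

$$\begin{aligned} R'(y) = \rho \left( 3y^2 - 2(y_1+y_2+y_3)\,y + y_1y_2 + y_1y_3 + y_2y_3\right) \ge \rho \left( y_1y_2 + y_1y_3 + y_2y_3 - \frac{(y_1+y_2+y_3)^2}{3}\right) =: C_R, \end{aligned}$$

which shows (2.3), and, being a polynomial, \(R'\) is bounded on every bounded interval, so (2.4) holds as well.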

Throughout this paper, we take \(\Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n\times n})}:= \text {ess sup}_{(x,t) \in Q} \Vert A(x,t)\Vert \), where \(\Vert A(x,t)\Vert \) denotes the matrix norm induced by the Euclidean norm in \({\mathbb {R}}^n\). Analogously, we define \(\Vert B\Vert _{L^\infty (Q,{\mathbb {R}}^{n\times n_c})}\).

We will use the function space

$$\begin{aligned} W(0,T) = \left\{ y \in L^2\left( 0,T;H^1(\varOmega )\right) : \partial _t y \in L^2\left( 0,T;H^1(\varOmega )^*\right) \right\} \end{aligned}$$

and set \(Y = W(0,T)^n \cap L^\infty (Q)^n\). This is a Banach space when endowed with the norm

$$\begin{aligned} \Vert y\Vert _Y = \left\{ \sum _{j = 1}^n\Vert y_j\Vert ^2_{L^2(0,T;H^1(\varOmega ))} + \Vert \partial _t y\Vert _{L^2\left( 0,T;H^1(\varOmega )^*\right) }^2\right\} ^{1/2}+ \max _{1 \le j \le n}\Vert y_j\Vert _{L^\infty (Q)}. \end{aligned}$$

Theorem 2.1

Under the above assumptions, for every \(u \in L^{{{\bar{p}}}}(0,T;L^{\bar{q}}(\varOmega ))^{n_c}\) with \({{\bar{p}}}, {{\bar{q}}} \in [2,+\infty ]\) and \(\frac{1}{{{\bar{p}}}} + \frac{d}{2{{\bar{q}}}} < 1\), (2.1) has a unique solution \(y \in Y\). Furthermore, there exists a constant \(C_Y\) independent of u such that

$$\begin{aligned} \Vert y\Vert _Y \le C_Y\Big (\Vert y_0\Vert _{L^\infty (\varOmega )^n} + \Vert u\Vert _{L^{\bar{p}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^{n_c}} + \Vert R(\cdot ,0)\Vert _{L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^{n}}\Big ). \end{aligned}$$
(2.5)

For \(y_0 \in C({{\bar{\varOmega }}})\), we have that \(y \in C({{\bar{Q}}})\).

Proof

For every \(M > 0\), we set \(R_M(x,t,y) = R(x,t,{\text {Proj}}_{[-M,+M]^n}(y))\). Now, given \(w \in L^2(Q)^n\), we consider the system

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial y}{\partial t} - \varDelta y + A(x,t)y = B(x,t)u - R_M(x,t,w(x,t))&{}\quad \text {in }Q,\\ \displaystyle \partial _\nu y = 0&{}\quad \text {on } \varSigma ,\\ y(x,0)=y_0(x)&{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$
(2.6)

For every \(\eta > \Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n\times n})}\), the operator \(-\varDelta + A + \eta I\) is coercive in \(H^1(\varOmega )^n\), hence there exists a unique solution \(y_w \in L^2(0,T;H^1(\varOmega ))^n\) of (2.6); see, for instance, [30, p. 112]. Further, from the partial differential equation we get that \(\partial _t y_w \in L^2(0,T;H^1(\varOmega )^*)^n\) as well, hence \(y_w \in W(0,T)^n\). For fixed \(M > 0\), the right hand side of the partial differential Eq. (2.6) is bounded in \(L^2(Q)^n\), hence the Schauder fixed point theorem, applied to the mapping \(w \in L^2(Q)^n \rightarrow y_w \in L^2(Q)^n\) and combined with the compactness of the embedding \(W(0,T) \subset L^2(Q)\), implies the existence of a fixed point \(y_M \in L^2(0,T;H^1(\varOmega ))^n\) that satisfies the equation

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial y_M}{\partial t} - \varDelta y_M + R_M(x,t,y_M(x,t))+ A(x,t)y_M = B(x,t)u&{}\quad \text {in }Q,\\ \displaystyle \partial _\nu y_M = 0&{}\quad \text {on } \varSigma ,\\ y_M(x,0) = y_0(x)&{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$
(2.7)

Next, we derive some estimates for \(y_M\). Let us set \(\eta = |C_R| + \Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n})} + 1\) and take \(0 < T' \le T\). Multiplying (2.7) by \({\mathrm{e}}^{-2\eta t}y_M\) and integrating in \(Q' = \varOmega \times (0,T')\), we get

$$\begin{aligned}&\int _0^{T'}{\mathrm{e}}^{-2\eta t}\left( \frac{\partial y_M}{\partial t},y_M\right) _{[H^1(\varOmega )^n]^*,H^1(\varOmega )^n}\, dt + \int _0^{T'}{\mathrm{e}}^{-2\eta t}\int _\varOmega |\nabla y_M|^2\, dx\, dt\nonumber \\&\quad + \int _{Q'}{\mathrm{e}}^{-2\eta t}\left[ R_M(x,t,y_M) - R(x,t,0)\right] \cdot y_M\, dxdt + \int _{Q'}{\mathrm{e}}^{-2\eta t}y_M^\top A(x,t)y_M\, dxdt\nonumber \\&\qquad = \int _{Q'}{\mathrm{e}}^{-2\eta t}\left[ B(x,t)u - R(x,t,0)\right] \cdot y_M\, dx\, dt. \end{aligned}$$
(2.8)

For the first term of the left hand side of the above identity we have

$$\begin{aligned} \int _0^{T'}{\mathrm{e}}^{-2\eta t}\left( \frac{\partial y_M}{\partial t},y_M\right) _{[H^1(\varOmega )^n]^*,H^1(\varOmega )^n}\, dt =&\, \frac{1}{2}\int _0^{T'}{\mathrm{e}}^{-2\eta t}\frac{d}{dt}\Vert y_M(t)\Vert ^2_{L^2(\varOmega )^n}\, dt\nonumber \\ =&\,\frac{1}{2}\left\{ {\mathrm{e}}^{-2\eta T'}\left\| y_M(T')\right\| ^2_{L^2(\varOmega )^n} - \Vert y_0\Vert ^2_{L^2(\varOmega )^n}\right\} \nonumber \\&+ \eta \int _0^{T'}{\mathrm{e}}^{-2\eta t}\Vert y_M(t)\Vert ^2_{L^2(\varOmega )^n}\, dt. \end{aligned}$$
(2.9)

Moreover, applying the mean value theorem and using (2.3) we obtain

$$\begin{aligned} \left[ R_M(x,t,y_M) - R(x,t,0)\right] \cdot y_M \ge C_R|y_M|^2. \end{aligned}$$

We also have that \(y_M^\top A(x,t) y_M \ge -\Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n})}|y_M|^2\). Using these two inequalities, inserting (2.9) in (2.8), and taking into account the definition of \(\eta \), we infer by the Schwarz inequality that

$$\begin{aligned}&\frac{1}{2}{\mathrm{e}}^{-2\eta T'}\left\| y_M(T')\right\| ^2_{L^2(\varOmega )^n} + \int _0^{T'}{\mathrm{e}}^{-2\eta t}\Vert y_M(t)\Vert ^2_{L^2(\varOmega )^n}\, dt\\&\quad + \int _0^{T'}{\mathrm{e}}^{-2\eta t}\int _\varOmega |\nabla y_M|^2\, dx\, dt \le \frac{1}{2}\Vert y_0\Vert ^2_{L^2(\varOmega )^n}\\&\quad + \left( \int _{Q'}{\mathrm{e}}^{-2\eta t}|Bu - R(x,t,0)|^2\, dx\, dt \right) ^{1/2}\left( \int _{Q'}{\mathrm{e}}^{-2\eta t}|y_M|^2\, dx\, dt\right) ^{1/2}. \end{aligned}$$

Applying Young’s inequality and multiplying the resulting inequality by 2 we get

$$\begin{aligned}&{\mathrm{e}}^{-2\eta T'}\Vert y_M(T')\Vert ^2_{L^2(\varOmega )^n} + \int _0^{T'}{\mathrm{e}}^{-2\eta t}\Vert y_M(t)\Vert ^2_{L^2(\varOmega )^n}\, dt\\&\quad + 2\int _0^{T'}{\mathrm{e}}^{-2\eta t}\int _\varOmega |\nabla y_M|^2\, dx\, dt \\&\qquad \le \Vert y_0\Vert ^2_{L^2(\varOmega )^n} + \Vert B\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n_c})}\Vert u\Vert _{L^2(Q)^{n_c}} + \Vert R(\cdot ,\cdot ,0)\Vert _{L^2(Q)^n}. \end{aligned}$$

Therefore, dropping the second term, using \({\mathrm{e}}^{-2\eta t} \ge {\mathrm{e}}^{-2\eta T'}\) with \(0 < T' \le T\), and multiplying the inequality by \({\mathrm{e}}^{2\eta T'}\), we conclude

$$\begin{aligned}&\Vert y_M(T')\Vert ^2_{L^2(\varOmega )^n} + 2\int _0^{T'}\int _\varOmega |\nabla y_M|^2\, dx\, dt\nonumber \\&\quad \le {\mathrm{e}}^{2\eta T}\left( \Vert y_0\Vert ^2_{L^2(\varOmega )^n} + \Vert B\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n_c})}\Vert u\Vert _{L^2(Q)^{n_c}} + \Vert R(\cdot ,\cdot ,0)\Vert _{L^2(Q)^n}\right) . \end{aligned}$$
(2.10)

Since \(T'\) was arbitrary, we deduce that \(\{y_M\}_M\) is uniformly bounded in the space \(L^\infty (0,T;L^2(\varOmega ))^n \cap L^2(0,T;H^1(\varOmega ))^n\).

It remains to prove the boundedness of \(y_M\) in \(L^\infty (Q)^n\). To this end, consider the functions \(f_j:Q \times {\mathbb {R}} \longrightarrow {\mathbb {R}}\) and \(g_j\), \(1 \le j \le n\), defined by

$$\begin{aligned} f_j(x,t,y)= & {} R_{M,j}(x,t,y) - R_j(x,t,0) + (1+|C_R|)y,\\ g_j= & {} (Bu)_j - R_j(\cdot ,\cdot ,0) + (1+ |C_R|)y_{M,j} - (Ay_M)_j. \end{aligned}$$

From (2.7), it is obvious that \(y_{M,j}\) satisfies the equation

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial y_{M,j}}{\partial t} - \varDelta y_{M,j} + f_j(x,t,y_{M,j}(x,t)) = g_j&{}\quad \text {in }Q,\\ \displaystyle \partial _\nu y_{M,j} = 0&{}\quad \text {on } \varSigma ,\\ y_{M,j}(x,0) = y_{0,j}(x)&{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$
(2.11)

Due to the regularity of \(g_j\) and the fact that \(f_j\) is monotone non-decreasing with respect to the last variable, we can use the methods of [15, §III.7] to infer the existence of a constant \(C_j\) independent of M such that

$$\begin{aligned} \Vert y_{M,j}\Vert _{L^\infty (Q)} \le&\,C_j\big (\Vert y_{0,j}\Vert _{L^\infty (\varOmega )} + \Vert (Bu)_j\Vert _{L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))}\nonumber \\&+ \Vert R_j(\cdot ,\cdot ,0)\Vert _{L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))}\nonumber \\&+ \Vert (1+|C_R|)y_{M,j} - (Ay_M)_j\Vert _{L^\infty (0,T;L^2(\varOmega ))}\big ) \nonumber \\ \le&\, C'_j\big (\Vert y_0\Vert _{L^\infty (\varOmega )^n} + \Vert u\Vert _{L^{\bar{p}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^{n_c}}\nonumber \\&+ \Vert R(\cdot ,\cdot ,0)\Vert _{L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^{n}} + \Vert y_M\Vert _{L^\infty (0,T;L^2(\varOmega ))^n}\big ). \end{aligned}$$
(2.12)

Combining (2.10) and (2.12), we obtain the boundedness in \(L^\infty (Q)^n\) of every \(y_M\) by a constant C independent of M. Hence, taking \(M > C\), we see that \(R_M(x,t,y_M(x,t)) = R(x,t,y_M(x,t))\). This implies that \(y_M\) is a solution of (2.1). Moreover, (2.5) follows from (2.10) and (2.12).

Now, we prove the uniqueness of a solution. Let us assume that \(y^1, y^2 \in Y\) are two solutions of (2.1). We set \(y = y^2 - y^1\). Subtracting the equations satisfied by these functions, we obtain

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial y}{\partial t} - \varDelta y + R(x,t,y^2) - R(x,t,y^1) + A(x,t)y = 0&{}\quad \text {in }Q,\\ \displaystyle \partial _\nu y = 0&{}\quad \text {on } \varSigma ,\\ y(x,0)=0&{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$

Setting again \(\eta = 1 + |C_R| + \Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n})}\), multiplying this equation by \({\mathrm{e}}^{-2\eta t}y\), integrating in Q, using that

$$\begin{aligned} \left[ R(x,t,y^2) - R(x,t,y^1)\right] \cdot y \ge C_R|y|^2, \end{aligned}$$

and arguing as above, we get

$$\begin{aligned}&\frac{1}{2}{\mathrm{e}}^{-2\eta T}\Vert y(T)\Vert ^2_{L^2(\varOmega )^n} + \int _0^{T}{\mathrm{e}}^{-2\eta t}\Vert y(t)\Vert ^2_{L^2(\varOmega )^n}\, dt\\&\quad + \int _0^{T}{\mathrm{e}}^{-2\eta t}\int _\varOmega |\nabla y|^2\, dx\, dt \le 0. \end{aligned}$$

This shows that \(y = 0\).

Finally, if \(y_0 \in C({{\bar{\varOmega }}})^n\), then we can apply the results of [15, §III.7] to deduce that \(y \in C(\bar{Q})^n\). \(\square \)

Remark 2.1

Of course, the previous theorem remains valid if we consider a more general elliptic operator with \(L^\infty (Q)\) coefficients. Moreover, the space \(Y = W(0,T)^n \cap L^\infty (Q)^n\) introduced previously can be substituted by \(Y = W(0,T)^n \cap C({{\bar{Q}}})^n\), provided that \(y_0 \in C({{\bar{\varOmega }}})^n\). Then, Theorem 2.1 and the next results remain true under this new definition of Y. We also mention that the diagonal structure of R could be avoided under some additional assumptions. For instance, Theorem 2.1 is valid assuming that R is of polynomial order with respect to y and that a certain kind of monotonicity is satisfied: there exist constants \(C_i > 0\), \(i = 1, 2, 3\), such that for almost all \((x,t) \in Q\) and all \(y, y' \in {\mathbb {R}}^n\)

$$\begin{aligned}&|R(x,t,y)| \le C_1(1+ |y|^r) \text { with } r> 0 \text { arbitrary if } d \le 2 \text { and } r < 2 \text { if } d =3,\\&\exists C_2> 0 : (R(x,t,y) - R(x,t,y'))\cdot (y-y') \ge -C_2|y-y'|^2,\\&\exists C_3 > 0 : R(x,t,y)\cdot y \ge -C_3|y|^2. \end{aligned}$$

The difficulty of dealing with more general functions is due to the lack of \(L^\infty \)-estimates, which are crucial in the above analysis.

Remark 2.2

As usual, assuming more regularity of \(y_0\) and \(\varOmega \), we get extra regularity of y: if \(y_0 \in H^1(\varOmega )^n\), then y belongs to \(H^1(Q)^n\), and an associated estimate is valid. If, in addition to this regularity of \(y_0\), \(\varGamma \) is of class \(C^{1,1}\) or \(\varOmega \) is convex, then \(y \in H^{2,1}(Q)^n\).

Due to Theorem 2.1, we can define a mapping \(G:L^{\bar{p}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^{n_c} \longrightarrow Y\) by \(G(u) = y_u\), where \(y_u\) is the solution of (2.1) associated with u. The next theorem states the differentiability of this mapping.

Theorem 2.2

The mapping G is of class \(C^1\) and the derivatives \(z_v = G'(u)v\) are the solutions of the system

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial z}{\partial t} - \varDelta z + \frac{\partial R}{\partial y}(x,t,y_u)z + A(x,t)z = B(x,t)v&{}\quad \text {in }Q,\\ \displaystyle \partial _\nu z = 0&{}\quad \text {on } \varSigma ,\\ z(x,0)=0&{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$
(2.13)

Moreover, we have that \(z_v \in [H^1(Q) \cap C({{\bar{Q}}})]^n\).

Proof

We define the space

$$\begin{aligned} V = \Big \{y \in Y : \frac{\partial y}{\partial t} - \varDelta y \in L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^n + L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^n, \, \partial _\nu y = 0 \Big \} \end{aligned}$$

endowed with the norm

$$\begin{aligned} \Vert y\Vert _V = \Vert y\Vert _Y + \left\| \frac{\partial y}{\partial t} - \varDelta y\right\| _{L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^n + L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^n}; \end{aligned}$$

then V is a Banach space. Moreover, we introduce a mapping

$$\begin{aligned} {\mathcal {F}}:V \times L^{{{\bar{p}}}}\left( 0,T;L^{{{\bar{q}}}}(\varOmega )\right) ^{n_c}\longrightarrow & {} \left[ L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^n + L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^n\right] \\&\times L^\infty (\varOmega )^n \end{aligned}$$

given by

$$\begin{aligned} {\mathcal {F}}(y,u) = \left( \frac{\partial y}{\partial t} - \varDelta y + R(x,t,y) + A(x,t)y - B(x,t)u,y(\cdot ,0) - y_0\right) . \end{aligned}$$

By the definition of V and assumption (2.2)–(2.4), it is clear that \({\mathcal {F}}\) is well defined and is of class \(C^1\). Moreover, the identity \({\mathcal {F}}(G(u),u) = 0\) holds for all \(u \in L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^{n_c}\). We want to apply the implicit function theorem to deduce the differentiability of G and to obtain \(z_v = G'(u)v\) as the solution of (2.13). To this end, we only need to prove that the linear operator

$$\begin{aligned} \frac{\partial {\mathcal {F}}}{\partial y}(G(u),u):V \longrightarrow \left[ L^{{{\bar{p}}}}\left( 0,T;L^{{{\bar{q}}}}(\varOmega )\right) ^n + L^{{{\hat{p}}}}\left( 0,T;L^{{{\hat{q}}}}(\varOmega )\right) ^n\right] \times L^\infty (\varOmega )^n \end{aligned}$$

is an isomorphism. From the expression

$$\begin{aligned} \frac{\partial {\mathcal {F}}}{\partial y}\left( G(u),u\right) z = \left( \frac{\partial z}{\partial t} - \varDelta z + \frac{\partial R}{\partial y}(x,t,y_u)z + A(x,t)z\, ,\, z(\cdot ,0)\right) , \end{aligned}$$

we have that \(\frac{\partial {\mathcal {F}}}{\partial y}(G(u),u)\) is an isomorphism if and only if for every element \((f+g,z_0) \in [L^{\bar{p}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^n + L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^n] \times L^\infty (\varOmega )^n\), with \(f \in L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^n\) and \(g \in L^{{{\hat{p}}}}(0,T;L^{{{\hat{q}}}}(\varOmega ))^n\), the system

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial z}{\partial t} - \varDelta z + \frac{\partial R}{\partial y}(x,t,y_u)z + A(x,t)z = f + g&{}\quad \text {in }Q,\\ \displaystyle \partial _\nu z = 0&{}\quad \text {on } \varSigma ,\\ z(x,0) = z_0&{}\quad \text {in } \varOmega \end{array}\right. \end{aligned}$$

has a unique solution in V and the solution z depends continuously on the data \((f,g,z_0)\). This existence, uniqueness, and continuity follow from Theorem 2.1. Therefore, G is of class \(C^1\). Finally, (2.13) follows from the identity

$$\begin{aligned} \frac{\partial {\mathcal {F}}}{\partial y}\left( G(u),u\right) \left( G'(u)v\right) + \frac{\partial {\mathcal {F}}}{\partial u}\left( G(u),u\right) v = (0,0)\quad \forall v \in L^{{{\bar{p}}}}\left( 0,T;L^{{{\bar{q}}}}(\varOmega )\right) ^{n_c} \end{aligned}$$

and the fact that \(\frac{\partial {\mathcal {F}}}{\partial u}(G(u),u)v = -Bv\).

The additional regularity of \(z_v\) is a consequence of the initial condition \(z_v(0) = 0\), because \(z_v(0)\) is continuous and belongs to \(H^1(\varOmega )\). \(\square \)

2.2 Optimal control problem

In this section, associated with the state Eq. (2.1), we consider the following control problem

$$\begin{aligned} {\mathrm{(P)}} \quad \min _{u \in {{\mathcal {U}}_{a,b}}} J(u), \end{aligned}$$

where

$$\begin{aligned} J(u) =&\frac{1}{2}\int _Q|C_Q(x,t)y_u(x,t) - y_Q(x,t)|^2\, dx\, dt\\&+ \frac{1}{2}\int _\varOmega |C_\varOmega (x)y_u(x,T) - y_\varOmega (x)|^2\, dx + \frac{\lambda }{2}\int _Q|u(x,t)|^2\, dx\, dt \end{aligned}$$

with \(\lambda \ge 0\), matrix functions \(C_Q \in L^\infty (Q,{\mathbb {R}}^{n_Q\times n})\), \(C_\varOmega \in L^\infty (\varOmega ,{\mathbb {R}}^{n_\varOmega \times n})\), functions \(y_Q \in L^2(Q)^{n_Q}\), \(y_\varOmega \in L^2(\varOmega )^{n_\varOmega }\), and the admissible set

$$\begin{aligned} {{\mathcal {U}}_{a,b}}= \left\{ u \in L^2(Q)^{n_c} : a \le u(x,t) \le b \text { a.e. in } Q\right\} , \end{aligned}$$

where \(a = (a_1,\ldots ,a_{n_c})^\top \), \(b = (b_1,\ldots ,b_{n_c})^\top \), and \(- \infty \le a_j < b_j\le +\infty \), \(1 \le j \le n_c\). Of course, the relations \(a\le u(x,t) \le b\) are to be understood componentwise. Obviously, this assumption implies that \({{\mathcal {U}}_{a,b}}\ne \emptyset \).
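
For orientation, a direct discrete evaluation of J on a uniform space–time grid may be sketched as follows. The scalar setting \(n = n_c = n_Q = n_\varOmega = 1\) with \(C_Q \equiv C_\varOmega \equiv 1\), the simple rectangle-rule quadrature, and all variable names are illustrative assumptions and not part of the paper.

```python
import numpy as np

def cost_functional(y, y_final, u, y_Q, y_Omega, lam, dx, dt):
    """Discrete analogue of J(u) from Sect. 2.2 (scalar case, C_Q = C_Omega = 1).

    y, u, y_Q : arrays of shape (n_time, n_space) on a grid of Q
    y_final, y_Omega : arrays of shape (n_space,) at the final time T
    """
    tracking_Q = 0.5 * np.sum((y - y_Q) ** 2) * dx * dt        # distributed tracking term
    tracking_T = 0.5 * np.sum((y_final - y_Omega) ** 2) * dx   # terminal tracking term
    tikhonov = 0.5 * lam * np.sum(u ** 2) * dx * dt            # Tikhonov regularization term
    return tracking_Q + tracking_T + tikhonov
```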

As an immediate consequence of Theorem 2.2, we deduce the following corollary just by an application of the chain rule.

Corollary 2.1

The mapping \(J:L^{{{\bar{p}}}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^{n_c} \longrightarrow {\mathbb {R}}\) is of class \(C^1\) and we have

$$\begin{aligned} J'(u)v = \int _Q\left( B^\top \varphi _u + \lambda u\right) \cdot v\, dx\, dt\ \quad \forall u, v \in L^{{{\bar{p}}}}\left( 0,T;L^{{{\bar{q}}}}(\varOmega )\right) ^{n_c}, \end{aligned}$$
(2.14)

where \(\varphi _u \in L^2(0,T;H^1(\varOmega ))^n\) is the unique solution of the adjoint state equation

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle -\frac{\partial \varphi }{\partial t} - \varDelta \varphi + \frac{\partial R}{\partial y}(x,t,y_u)\varphi + A(x,t)^\top \varphi = C_Q^\top \left[ C_Qy_u - y_Q\right] &{}\quad \text {in }Q,\\ \displaystyle \partial _\nu \varphi = 0&{}\quad \text {on } \varSigma ,\\ \varphi (\cdot ,T) = C^\top _\varOmega \left[ C_\varOmega y_u(\cdot ,T) - y_\varOmega \right] &{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$
(2.15)

Using this result, we can derive the optimality condition for a local solution to (P).

Theorem 2.3

Assume that \(a, b \in {\mathbb {R}}^{n_c}\). Then there exists at least one solution of the control problem (P). Any local solution \(\bar{u}\) satisfies the variational inequality

$$\begin{aligned} \int _Q\left( B^\top {{\bar{\varphi }}} + \lambda {{\bar{u}}}\right) \cdot (u - {{\bar{u}}})\, dx\, dt \ge 0 \quad \forall u \in {{\mathcal {U}}_{a,b}}, \end{aligned}$$
(2.16)

where \({{\bar{\varphi }}} = \varphi _{{{\bar{u}}}}\) is the adjoint state associated with \({{\bar{u}}}\).

Since \({{\mathcal {U}}_{a,b}}\) is assumed to be bounded in \(L^\infty (Q)^{n_c}\), the proof of the existence of an optimal control is standard: one takes a minimizing sequence that converges weakly\(^*\) in \(L^\infty (Q)^{n_c}\) to some \({{\bar{u}}} \in {{\mathcal {U}}_{a,b}}\). Moreover, we mention that \(u_k {\mathop {\rightharpoonup }\limits ^{*}} u\) weakly\(^*\) in \(L^\infty (Q)^{n_c}\) implies that \(y_{u_k} \rightarrow y_u\) strongly in \(L^2(Q)^n\); this follows from Theorem 2.1 and the Aubin–Lions theorem.

Finally, we take \({{\bar{p}}} = {{\bar{q}}} = +\infty \) and apply Corollary 2.1 to deduce the optimality conditions.
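
For \(\lambda > 0\), the variational inequality (2.16) is equivalent to a pointwise projection formula of the type \({{\bar{u}}} = {\text {Proj}}_{{\mathcal {U}}_{a,b}}\big (-\frac{1}{\lambda }B^\top {{\bar{\varphi }}}\big )\) (see Remark 2.3 below), which suggests, for instance, a projected gradient iteration based on (2.14). The following is only a conceptual sketch for the scalar Schlögl case \(n = n_c = 1\), \(B \equiv 1\); the routines solve_state and solve_adjoint stand for discretizations of (2.1) and (2.15) and are hypothetical placeholders, not part of the paper.

```python
import numpy as np

def projected_gradient(u0, solve_state, solve_adjoint, lam, a, b,
                       step=1.0, max_iter=100, tol=1e-8):
    """Projected gradient sketch for (P) in the scalar case n = n_c = 1, B = 1."""
    u = np.clip(u0, a, b)                        # start in the admissible set U_{a,b}
    for _ in range(max_iter):
        y = solve_state(u)                       # state y_u solving (2.1)
        phi = solve_adjoint(y)                   # adjoint state solving (2.15)
        grad = phi + lam * u                     # Riesz representative of J'(u), cf. (2.14)
        u_next = np.clip(u - step * grad, a, b)  # projection onto [a, b], cf. (2.16)
        if np.linalg.norm(u_next - u) <= tol:
            break
        u = u_next
    return u
```

In practice, the step size would be chosen by a line search, but such details are beyond the scope of this sketch.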

Let us now analyze the case where \({{\mathcal {U}}_{a,b}}\) is not bounded, which in particular includes the case without control constraints. Here, we slightly extend ideas of [8]. First, we observe that Theorem 2.1 cannot be applied to deduce the existence of a solution of the state equation for arbitrary elements of \({{\mathcal {U}}_{a,b}}\), because the \(L^\infty \)-estimates for the states fail. We need to assume more regularity for the elements of \({{\mathcal {U}}_{a,b}}\); the \(L^2(Q)^{n_c}\) regularity assumed in the definition of \({{\mathcal {U}}_{a,b}}\) is not enough. Therefore, we have to work with a control space of type \(L^{\bar{p}}(0,T;L^{{{\bar{q}}}}(\varOmega ))^{n_c}\) with \({{\bar{p}}}, {{\bar{q}}} \in [2,+\infty ]\) and \(\frac{1}{{{\bar{p}}}} + \frac{d}{2{{\bar{q}}}} < 1\).

The reader can easily identify the difficulty of proving the existence of a solution of the corresponding control problem: the cost functional lacks coercivity in these spaces. Notice that the Tikhonov regularization term yields coercivity only in the space \(L^2(Q)^{n_c}\), but not in the control space introduced above. Let us redefine

$$\begin{aligned} {{\mathcal {U}}_{a,b}}= \left\{ u \in L^\infty \left( 0,T;L^2(\varOmega )\right) ^{n_c} : a \le u(x,t) \le b \text { for a.a.} \ (x,t) \in Q\right\} . \end{aligned}$$

Theorem 2.4

Let \(\lambda \) be strictly positive. Then, the optimal control problem (P) has at least one solution \({{\bar{u}}}\). Any local solution in this space satisfies the variational inequality (2.16).

Proof

For every \(1 \le j \le n_c\), we select an element \(\xi _j \in {\mathbb {R}}\) such that \(a_j< \xi _j < b_j\). We set \(\xi = (\xi _j)_{j = 1}^{n_c} \in {\mathbb {R}}^{n_c}\) and \(M_0 = |\xi |\sqrt{|\varOmega |}\), where \(|\varOmega |\) denotes the Lebesgue measure of \(\varOmega \). For every \(M \ge M_0\), we define

$$\begin{aligned} {{\mathcal {U}}_M}= \left\{ u \in {{\mathcal {U}}_{a,b}}: \Vert u\Vert _{L^\infty \left( 0,T;L^2(\varOmega )\right) ^{n_c}} \le M\right\} . \end{aligned}$$

Take \(u_0(x,t) = \xi \;\forall (x,t) \in Q\). Then we have that \(u_0 \in {{\mathcal {U}}_M}\;\forall M \ge M_0\) and, consequently, \({{\mathcal {U}}_M}\ne \emptyset \). According to Theorem 2.1, we know that (2.1) has a unique solution \(y_u \in Y\) for every \(u \in L^\infty (0,T;L^2(\varOmega ))^{n_c}\). We formulate the following optimal control problem

$$\begin{aligned} {(\mathrm{P}_{M})}\quad \min _{u \in {{\mathcal {U}}_M}} J(u). \end{aligned}$$

For every \(M \ge M_0\), this problem has at least one solution \(u_M\). Indeed, any minimizing sequence \(\{u_k\}\) is bounded in \(L^\infty (0,T;L^2(\varOmega ))^{n_c}\), hence the sequence of corresponding states \(\{y_k\}\) is bounded in Y. Moreover, from (2.1) we also obtain the boundedness of \(\{y_k\}\) in \(W(0,T)^n\). Then we can select subsequences, denoted in the same way, such that \(u_k {\mathop {\rightharpoonup }\limits ^{*}}u_M\) in \(L^\infty (0,T;L^2(\varOmega ))^{n_c}\) and \(y_k \rightarrow y_M\) strongly in \(L^2(Q)^n\) due to the compactness of the embedding \(W(0,T)\subset L^2(Q)\). Now, it is easy to pass to the limit in the state equation and to deduce that \(y_M\) is the state associated with \(u_M\). Finally, we have that

$$\begin{aligned} u_M \in {{\mathcal {U}}_M}\ \text { and }\ J(u_M) \le \liminf _{k \rightarrow \infty }J(u_k) = \inf {(\mathrm{P}_{M})}. \end{aligned}$$

Therefore, \(u_M\) is a solution of \({(\mathrm{P}_{M})}\). Then from (2.14) we infer that

$$\begin{aligned} \int _Q\left( B^\top \varphi _M + \lambda u_M\right) \cdot (u - u_M)\, dx\, dt \ge 0 \quad \forall u \in {{\mathcal {U}}_M}, \end{aligned}$$

where \(\varphi _M\) is the adjoint state associated with \(u_M\). This relation implies that

$$\begin{aligned} u_M = {\text {Proj}}_{{\mathcal {U}}_M}\left( -\frac{1}{\lambda }B^\top \varphi _M\right) , \end{aligned}$$
(2.17)

where the projection is taken in the \(L^2(Q)^{n_c}\) norm. Let us prove that there exists \({{\bar{M}}} \ge M_0\) such that \(\{u_M\}_{M \ge {{\bar{M}}}}\) is bounded in \(L^\infty (0,T;L^2(\varOmega ))^{n_c}\). To this end, we first observe that \(J(u_M) \le J(u_0) < \infty \;\forall M \ge M_0\). This implies that

$$\begin{aligned} \Vert u_M\Vert _{L^2(Q)^{n_c}}^2 \le \frac{2}{\lambda }J(u_0)\quad \forall M \ge M_0. \end{aligned}$$
(2.18)

Since \(y_M \in L^\infty (Q)^n\), we can multiply the state equations (2.1) by \({\mathrm{e}}^{-2\eta t}y_M\) with \(\eta = 1 + |C_R| + \Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n})}\) and argue as in the proof of Theorem 2.1, without any truncation of R (replace \(R_M\) by R in that proof), to obtain (2.10). Then, using (2.18), we deduce from (2.10) the existence of a constant \(C_1\) such that

$$\begin{aligned} \Vert y_M\Vert _{L^\infty (0,T;L^2(\varOmega ))^n} \le C_1 \quad \forall M \ge M_0. \end{aligned}$$

Since \(y_M \in W(0,T)^n \subset C([0,T];L^2(\varOmega ))^n\), we infer from the above inequality

$$\begin{aligned} \Vert y_M\Vert _{L^2(Q)^n} \le C_1\sqrt{T}\ \text { and }\ \Vert y_M(T)\Vert _{L^2(\varOmega )^n} \le C_1 \quad \forall M \ge M_0. \end{aligned}$$
(2.19)

Now, from the adjoint state equation satisfied by \(\varphi _M\), we deduce the existence of a constant \(C_2\) such that

$$\begin{aligned}&\Vert \varphi _M\Vert _{L^\infty (0,T;L^2(\varOmega ))^n}\\&\quad \le C_2\Bigg (\Vert C_Q\Vert _{L^\infty (Q,{\mathbb {R}}^{n_Q \times n})}\left[ \Vert C_Q\Vert _{L^\infty (Q,{\mathbb {R}}^{n_Q \times n})}\Vert y_M\Vert _{L^2(Q)^n}+ \Vert y_Q\Vert _{L^2(Q)^{n_Q}}\right] \\&\qquad + \Vert C_\varOmega \Vert _{L^\infty (\varOmega ,{\mathbb {R}}^{n_\varOmega \times n})}\left[ \Vert C_\varOmega \Vert _{L^\infty (\varOmega ,{\mathbb {R}}^{n_\varOmega \times n})}\Vert y_M(T)\Vert _{L^2(\varOmega )^n} + \Vert y_\varOmega \Vert _{L^2(\varOmega )^{n_\varOmega }}\right] \Bigg ). \end{aligned}$$

Combining this inequality and (2.19), we conclude that

$$\begin{aligned} \exists C_\infty > 0 : \frac{1}{\lambda }\Vert B\Vert _{L^\infty \left( Q,{\mathbb {R}}^{n \times n_c}\right) }\Vert \varphi _M\Vert _{L^\infty \left( 0,T;L^2(\varOmega )\right) ^n} \le C_\infty \quad \forall M \ge M_0. \end{aligned}$$
(2.20)

Let us introduce the index sets

$$\begin{aligned} I_a = \{j \in \{1,\ldots ,n_c\} : a_j \in {\mathbb {R}}\}\ \text { and }\ I_b = \{j \in \{1,\ldots ,n_c\} : b_j \in {\mathbb {R}}\}, \end{aligned}$$

and define

$$\begin{aligned} {{\bar{M}}} = \max \{C_\infty ,M_0\} + \big (\max _{j \in I_a}|a_j| + \max _{j \in I_b}|b_j|\big )\sqrt{|\varOmega |}. \end{aligned}$$

Notice that \(I_a\) or \(I_b\) or both can be empty. In any of these cases, the corresponding maximum is taken as 0. If we define \(\tilde{u} \in {{\mathcal {U}}_{a,b}}\) by

$$\begin{aligned} {{\tilde{u}}}_j(x,t) = \max \left\{ a_j,\min \left\{ b_j,\left( -\frac{1}{\lambda }B^\top \varphi _M(x,t)\right) _j\right\} \right\} , \end{aligned}$$

we have

$$\begin{aligned} |{{\tilde{u}}}_j(x,t)| \le \left| \left( -\frac{1}{\lambda }B^\top \varphi _M(x,t)\right) _j\right| + \min \left\{ 0,|a_j|\right\} + \min \{0,|b_j|\}, \end{aligned}$$

therefore

$$\begin{aligned} \Vert {{\tilde{u}}}\Vert _{L^\infty (0,T;L^2(\varOmega ))^{n_c}} \le C_\infty + \left( \max _{j \in I_a}|a_j| + \max _{j \in I_b}|b_j|\right) \sqrt{|\varOmega |} \le {{\bar{M}}}. \end{aligned}$$

This, along with (2.17), implies

$$\begin{aligned} u_M = {\text {Proj}}_{{\mathcal {U}}_M}\left( -\frac{1}{\lambda }B^\top \varphi _M\right) = {\text {Proj}}_{{\mathcal {U}}_{a,b}}\left( -\frac{1}{\lambda }B^\top \varphi _M\right) = {{\tilde{u}}}. \end{aligned}$$
(2.21)

In this way, we have proved that \(\Vert u_M\Vert _{L^\infty (0,T;L^2(\varOmega ))^{n_c}} \le {{\bar{M}}} \;\forall M \ge {{\bar{M}}}\).

Now, we show that, for every \(M\ge {{\bar{M}}}\), \(u_M\) is a solution of \({\mathrm{(P)}} \). Let us take u in the space \(L^\infty (0,T;L^2(\varOmega ))^{n_c}\) and set \(M'=\Vert u\Vert _{L^\infty (0,T;L^2(\varOmega ))^{n_c}}\). If \(M'\le M\), then \(u\in {{\mathcal {U}}_M}\) and \(J(u_M)\le J(u)\). If \(M'> M\), consider \(u_{M'}\), a solution of \(\mathrm {(P}_{M'}\mathrm {)}\). We have that \(\Vert u_{M'}\Vert _{L^\infty (0,T;L^2(\varOmega ))^{n_c}}\le {{\bar{M}}}\le M\), and hence \(u_{M'}\in {{\mathcal {U}}_M}\), so \(J(u_M)\le J(u_{M'})\le J(u)\), and the proof is complete.

Finally, an application of Corollary 2.1 with \({{\bar{p}}} = +\infty \) and \({{\bar{q}}} = 2\) leads to (2.16). \(\square \)

Remark 2.3

Let us notice that the optimal controls, whose existence is proved in Theorems 2.3 and 2.4, belong to \(C([0,T];L^2(\varOmega ))^{n_c} \cap L^2(0,T;H^1(\varOmega ))^{n_c}\) provided that \(b_{ij} \in C([0,T];W^{1,\infty }(\varOmega ))\) for every entry of B. Indeed, this regularity is well known for the adjoint state. Since the associated optimal control is obtained as a projection onto \({{\mathcal {U}}_{a,b}}\), this regularity is transferred to the control. The projection formula follows from (2.16) in the first case and is given in (2.21) in the second case.

3 State constrained control problem

3.1 Optimal control problem and existence of a solution

In this section we analyze the following state constrained control problem

$$\begin{aligned} {\mathrm{(PS)}} \left\{ \begin{array}{l}\min J(u)\\ \text{ subject } \text{ to } \ \ u \in {{\mathcal {U}}_{a,b}}\ \ \text {and}\\ g_1(x,t) \le g(x,t,y_u(x,t)) \le g_2(x,t) \quad \forall (x,t) \in K. \end{array}\right. \end{aligned}$$

We impose the following assumptions on the data of the control problem.

(A1) :

The control u is related to the state \(y_u\) through the system (2.1). We assume that (2.2)–(2.4) hold and \(y_0 \in C({{\bar{\varOmega }}})\). We set \(Y = L^2(0,T;H^1(\varOmega ))^n \cap C({{\bar{Q}}})^n\). Notice that, according to Theorem 2.1, the states satisfy \(y_u \in Y\) due to the continuity of \(y_0\); see Remark 2.1.

(A2) :

The cost functional is given as in Sect. 2.2 with \(\lambda \ge 0\) and the same regularity for matrix functions \(C_Q\), \(C_\varOmega \), and functions \(y_Q\), and \(y_\varOmega \).

(A3) :

We assume that \(a, b \in {\mathbb {R}}^{n_c}\) with \(a_j < b_j\) for \(1 \le j \le n_c\).

(A4) :

K is a compact subset of \({{\bar{Q}}}\) and the function \(g:K \times {\mathbb {R}}^n \longrightarrow {\mathbb {R}}\) is continuous, together with its partial derivatives \(\partial g/\partial y_j \in C(K \times {\mathbb {R}}^n), j = 1, \ldots , n\). We also assume that \(g_1, g_2:K \longrightarrow {\mathbb {R}}\) are continuous functions with \(g_1(x,t) < g_2(x,t) \;\forall (x,t) \in K\), and that either \(K \cap ({{\bar{\varOmega }}} \times \{0\}) = \emptyset \) or \(g_1(x,0)< g(x,0,y_0(x)) < g_2(x,0)\) holds for every \((x,0) \in K \cap ({{\bar{\varOmega }}} \times \{0\})\). We introduce the sets

$$\begin{aligned} {\mathcal {Y}}_{g} = \{y \in C(K) : g_1(x,t) \le y(x,t) \le g_2(x,t)\quad \forall (x,t) \in K\} \end{aligned}$$

and

$$\begin{aligned} {{\mathcal {U}}_{\mathrm{ad}}}=\{u \in {{\mathcal {U}}_{a,b}}: g_1(x,t) \le g(x,t,y_u(x,t)) \le g_2(x,t) \quad \forall (x,t) \in K\}. \end{aligned}$$

Under the above assumptions, we have that \({{\mathcal {U}}_{a,b}}\) is a closed, convex and bounded subset of \(L^\infty (Q)^{n_c}\) and the mapping \(G:L^\infty (Q)^{n_c} \longrightarrow Y\), defined as in Sect. 2.1 by \(G(u) = y_u\), is of class \(C^1\). The derivative \(z_v = G'(u)v \in Y\) is the solution of (2.13). From Corollary 2.1 we know that \(J:L^\infty (Q)^{n_c} \longrightarrow {\mathbb {R}}\) is of class \(C^1\) and its derivative \(J'(u)v\) is given by (2.14).

By using the above assumptions we can prove the following theorem on existence of an optimal control.

Theorem 3.1

Under the assumptions (A1)–(A4), if \({{\mathcal {U}}_{\mathrm{ad}}}\ne \emptyset \), then (PS) has at least one solution.

Proof

The proof follows classical arguments. Indeed, since \({{\mathcal {U}}_{\mathrm{ad}}}\ne \emptyset \) there exists a minimizing sequence \(\{u_k\}_{k=1}^\infty \subset {{\mathcal {U}}_{\mathrm{ad}}}\) of (PS). This sequence is bounded in \(L^\infty (Q)^{n_c}\) and the associated states \(\{y_k\}_{k = 1}^\infty \) are bounded in Y. Additionally, this boundedness along with the partial differential Eq. (2.1) implies that \(\{\partial _ty_k\}_{k = 1}^\infty \) is bounded in \(L^2(0,T;H^1(\varOmega )^*)^n\). Hence, \(\{y_k\}_{k = 1}^\infty \) is bounded in \(W(0,T)^n\), which is compactly embedded in \(L^2(Q)^n\).

Therefore, we can take subsequences, denoted in the same way, such that \(u_k {\mathop {\rightharpoonup }\limits ^{*}} {{\bar{u}}}\) in \(L^\infty (Q)^{n_c}\) with \({{\bar{u}}} \in {{\mathcal {U}}_{a,b}}\), and \(y_k \rightarrow {{\bar{y}}}\) strongly in \(L^2(Q)^n\) and \(y_k(x,t) \rightarrow {{\bar{y}}}(x,t)\) for almost all \((x,t) \in Q\). Now, it is easy to pass to the limit in the state Eq. (2.1) as \(k \rightarrow \infty \) and to prove that \({{\bar{y}}} \in Y\) is the state associated with \({{\bar{u}}}\). Moreover, using the continuity of g we can pass to the limit in the inequality \(g_1(x,t) \le g(x,t,y_k(x,t)) \le g_2(x,t)\) to obtain \(g_1(x,t) \le g(x,t,{{\bar{y}}}(x,t)) \le g_2(x,t)\) for almost all \((x,t) \in K\). But the continuity of the functions \({{\bar{y}}}\), \(g_1\), \(g_2\) and g implies that the above inequality holds for all \((x,t) \in K\). Hence, we have that \({{\bar{u}}} \in {{\mathcal {U}}_{\mathrm{ad}}}\). Finally, it is obvious that

$$\begin{aligned} J({{\bar{u}}}) \le \liminf _{k \rightarrow \infty }J(u_k) = \inf {\mathrm{(PS)}}, \end{aligned}$$

which proves that \({{\bar{u}}}\) is a solution of (PS). \(\square \)

3.2 Optimality system

Hereafter, \({{\bar{u}}}\) will denote a local minimum of (PS) with associated state \({{\bar{y}}}\). In order to get the optimality conditions satisfied by \({{\bar{u}}}\) in a qualified form we assume the following linearized Slater condition.

(A5) :

There exists \(u_0 \in {{\mathcal {U}}_{a,b}}\) such that

$$\begin{aligned} g_1(x,t)< g(x,t,{{\bar{y}}}(x,t)) + \frac{\partial g}{\partial y}(x,t,\bar{y}(x,t))z_0(x,t) < g_2(x,t) \quad \forall (x,t) \in K, \end{aligned}$$
(3.1)

where \(z_0 \in Y\) is the unique solution of the linearized equation

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{\partial z}{\partial t} - \varDelta z + \frac{\partial R}{\partial y}(x,t,{{\bar{y}}})z + A(x,t)z = B(x,t)(u_0 - {{\bar{u}}})&{} \quad \text {in }Q,\\ \displaystyle \partial _\nu z = 0&{}\quad \text {on } \varSigma ,\\ z(x,0) = 0&{}\quad \text {in } \varOmega .\end{array}\right. \end{aligned}$$
(3.2)

This section is devoted to the proof of the following theorem.

Theorem 3.2

Let \({{\bar{u}}}\) be a local solution of (PS) and suppose that the assumptions (A1)–(A4) hold. Then there exist a real number \({{\bar{\mu }}}_0 \ge 0\), a regular Borel measure \({{\bar{\mu }}} \in {{\mathcal {M}}(K)}\), and a function \({{\bar{\varphi }}}\in L^r(0,T;W^{1,s}(\varOmega ))^n\), for all \(s, r \in [1,2)\) with \(\frac{2}{r} + \frac{d}{s} > d + 1\), such that

$$\begin{aligned}&{{\bar{\mu }}}_0 + \Vert {{\bar{\mu }}}\Vert _{{\mathcal {M}}(K)}> 0, \end{aligned}$$
(3.3)
$$\begin{aligned}&\left\{ \begin{array}{ll} \displaystyle -\frac{\partial {{\bar{\varphi }}}}{\partial t} - \varDelta {{\bar{\varphi }}} + \frac{\partial R}{\partial y}(x,t,{{\bar{y}}}){{\bar{\varphi }}} + A(x,t)^\top {{\bar{\varphi }}}&{}\\ \qquad \qquad = {{\bar{\mu }}}_0C_Q^\top [C_Q{{\bar{y}}} - y_Q] + \nabla _yg(x,t,{{\bar{y}}}){{\bar{\mu }}}_Q&{} \text {in }Q,\\ \displaystyle \partial _\nu {{\bar{\varphi }}} = \displaystyle \nabla _yg(x,t,{{\bar{y}}}){{\bar{\mu }}}_\varSigma &{}\text {on } \varSigma , \\ {{\bar{\varphi }}}(\cdot ,T)={{\bar{\mu }}}_0C^\top _\varOmega [C_\varOmega {{\bar{y}}}(\cdot ,T) - y_\varOmega ] \displaystyle + \nabla _yg(x,t,{{\bar{y}}}){{\bar{\mu }}}_\varOmega &{}\text {in } {{\bar{\varOmega }}}, \end{array}\right. \end{aligned}$$
(3.4)
$$\begin{aligned}&\int _K\left( z(x,t) - g(x,t,{{\bar{y}}}(x,t))\right) \, d{{\bar{\mu }}}(x,t) \le 0 \quad \forall z \in {\mathcal {Y}}_{g}, \end{aligned}$$
(3.5)
$$\begin{aligned}&\int _Q(B^\top {{\bar{\varphi }}} + {{\bar{\mu }}}_0\lambda {{\bar{u}}})\cdot (u - {{\bar{u}}})\, dx\, dt \ge 0 \quad \forall u \in {{\mathcal {U}}_{a,b}}. \end{aligned}$$
(3.6)

If in addition the Slater assumption (A5) holds, then the above optimality system is satisfied with \({{\bar{\mu }}}_0 = 1\).

In the optimality system, the measures \({{\bar{\mu }}}_Q\), \({{\bar{\mu }}}_\varSigma \) and \({{\bar{\mu }}}_\varOmega \) are the restrictions of \({{\bar{\mu }}}\) to \(K \cap Q\), \(K \cap \varSigma \) and \(K \cap ({{\bar{\varOmega }}} \times \{T\})\), respectively.

Before proving this theorem, we recall what a solution of (3.4) is. We will do it in an abstract framework. We consider a vector of real and regular Borel measures \(\mu \in {{\mathcal {M}}({{\bar{Q}}})}^n\) such that \(|\mu _j|({{\bar{\varOmega }}} \times \{0\}) = 0\) for \(1 \le j \le n\). We decompose \(\mu = \mu _Q + \mu _\varSigma + \mu _\varOmega \) by taking the restrictions of \(\mu \) to Q, \(\varSigma \) and \({{\bar{\varOmega }}} \times \{T\}\), respectively. Now, we consider the system

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle -\frac{\partial \varphi }{\partial t} - \varDelta \varphi + D(x,t)\varphi = \mu _Q&{} \quad \text {in }Q,\\ \displaystyle \partial _\nu \varphi = \mu _\varSigma &{}\quad \text {on } \varSigma ,\\ \varphi (\cdot ,T)=\mu _\varOmega &{}\quad \text {in } {{\bar{\varOmega }}},\end{array}\right. \end{aligned}$$
(3.7)

where \(D \in L^\infty (Q,{\mathbb {R}}^{n \times n})\).

Definition 3.1

We say that a function \(\varphi \in L^1(Q)^n\) is a weak solution of (3.7) if

$$\begin{aligned} \int _Q\varphi \cdot \left[ \frac{\partial \phi }{\partial t} - \varDelta \phi + D(x,t)^\top \phi \right] \, dx \, dt = \int _{{{\bar{Q}}}}\phi \cdot \, d\mu \quad \forall \phi \in \varPhi \end{aligned}$$
(3.8)

with

$$\begin{aligned} \varPhi = \left\{ \phi \in \left[ H^1(Q) \cap C({{\bar{Q}}})\right] ^n : \frac{\partial \phi }{\partial t} - \varDelta \phi \in L^\infty (Q)^n, \partial _\nu \phi = 0, \phi (\cdot ,0) = 0\right\} . \end{aligned}$$

Notice that the last integral in \({{\bar{Q}}}\) of (3.8) can be expanded as

$$\begin{aligned} \int _{{{\bar{Q}}}}\phi \cdot \, d\mu = \int _Q\phi \cdot \, d\mu _Q + \int _\varSigma \phi \cdot \, d\mu _\varSigma + \int _\varOmega \phi (T)\cdot \, d\mu _\varOmega . \end{aligned}$$

Theorem 3.3

System (3.7) has a unique solution \(\varphi \) in \(L^1(0,T;W^{1,1}(\varOmega ))^n\). Moreover, \(\varphi \) belongs to \(L^r(0,T;W^{1,s}(\varOmega ))^n\) for all \(s, r \in [1,2)\) with \(\frac{2}{r} + \frac{d}{s} > d + 1\), and there exists \(C_{r,s}\) such that

$$\begin{aligned} \Vert \varphi \Vert _{L^r(0,T;W^{1,s}(\varOmega ))^n} \le C_{r,s}\Vert \mu \Vert _{{\mathcal {M}}({{\bar{Q}}})^n}. \end{aligned}$$

The reader is referred to [6] or [10] for the proof of this theorem in the case of a scalar equation. The arguments are identical for the above system. In the proof of the theorem, some regularity results for the adjoint system to (3.7) are required; they are deduced from Theorem 2.1 by simply taking \(R \equiv 0\). From the regularity of \(\varphi \) established in the above theorem and using a density argument, it is easy to prove that

$$\begin{aligned} \int _Q\varphi \cdot \left[ \frac{\partial \phi }{\partial t} - \varDelta \phi + D(x,t)^\top \phi \right] \, dx \, dt + \int _\varSigma \varphi \cdot \partial _\nu \phi \, dx\, dt= \int _{{{\bar{Q}}}}\phi \cdot \, d\mu \end{aligned}$$

for every \(\phi \in [H^1(Q) \cap C({{\bar{Q}}})]^n\) such that \(\partial _\nu \phi \in L^\infty (\varSigma )^n\) and \(\phi (\cdot ,0) = 0\) in \(\varOmega \).

Now, we can apply Theorem 3.3 to the system (3.4) with

$$\begin{aligned} D(x,t) = \frac{\partial R}{\partial y}(x,t,{{\bar{y}}}(x,t)) + A(x,t)^\top \end{aligned}$$

taking into account that \({{\bar{y}}} \in C({{\bar{Q}}})^n\) and using (2.4). For the right hand side of the equations we consider the measures

$$\begin{aligned} \mu _Q(A)&= \int _A{{\bar{\mu }}}_0C^\top _Q\left[ C_Q{{\bar{y}}} - y_Q\right] \, dx\, dt + \int _{K \cap A}\nabla _yg\left( x,t,{{\bar{y}}}(x,t)\right) \, d{{\bar{\mu }}}_Q,\\ \mu _\varSigma (B)&= \int _{K \cap B}\nabla _yg\left( x,t,{{\bar{y}}}(x,t)\right) \, d{{\bar{\mu }}}_\varSigma ,\\ \mu _\varOmega (C)&= \int _C{{\bar{\mu }}}_0C^\top _\varOmega \left[ C_\varOmega {{\bar{y}}}(\cdot ,T) - y_\varOmega \right] \, dx + \int _{K \cap C}\nabla _yg\left( x,T,{{\bar{y}}}(x,T)\right) \, d{{\bar{\mu }}}_\varOmega , \end{aligned}$$

for arbitrary Borel sets \(A \subset Q\), \(B \subset \varSigma \) and \(C \subset {{\bar{\varOmega }}} \times \{T\}\).

The optimality conditions (3.3)–(3.6) can be deduced from the following abstract theorem, whose proof can be found in [5, Theorem 5.2].

Theorem 3.4

Let U and Z be two Banach spaces and \(K \subset U\) and \(C \subset Z\) two convex subsets, C having a nonempty interior. Let \({{\bar{u}}} \in K\) be a solution of the optimization problem:

$$\begin{aligned} (Q)\left\{ \begin{array}{l} \text{ Min } J(u) \\ u \in K \text{ and } F(u) \in C,\end{array}\right. \end{aligned}$$

where \(J:U \longrightarrow (-\infty , +\infty ]\) and \(F:U \longrightarrow Z\) are two mappings that are Gâteaux differentiable at \({{\bar{u}}}\). Then there exist a real number \({{\bar{\mu }}}_0 \ge 0\) and an element \({{\bar{\mu }}} \in Z^*\) such that

$$\begin{aligned}&{{\bar{\mu }}}_0 + \Vert {{\bar{\mu }}}\Vert _{Z^*} > 0, \end{aligned}$$
(3.9)
$$\begin{aligned}&\langle {\bar{\mu }},z - F({\bar{u}})\rangle _{Z^*,Z} \ \le 0 \quad \forall z \in C,\end{aligned}$$
(3.10)
$$\begin{aligned}&\langle {{\bar{\mu }}}_0J'({\bar{u}}) + [DF({\bar{u}})]^{\star }{\bar{\mu }}, u - {\bar{u}}\rangle _{U^*,U} \ \ge 0 \quad \forall u \in K. \end{aligned}$$
(3.11)

Moreover \({{\bar{\mu }}}_0\) can be taken equal to 1 if the following linearized Slater condition is satisfied:

$$\begin{aligned} \exists u_0 \in K \text{ such } \text{ that } F({\bar{u}}) + DF({\bar{u}})\cdot (u_0 - {\bar{u}}) \in \, {\mathrm{int}}\, C. \end{aligned}$$
(3.12)

Now, Theorem 3.2 follows by taking \(U = L^\infty (Q)^{n_c}\), \(K = {{\mathcal {U}}_{a,b}}\), \(Z= C(K)\), \(C = {\mathcal {Y}}_{g}\), J the cost functional of (PS), and \(F(u) = g(\cdot ,\cdot ,G(u))\). Then, we have that (3.9) and (3.10) coincide with (3.3) and (3.5), respectively. Let us prove that (3.11) is the same as (3.6). To this end, we first introduce \({{\bar{\varphi }}}\) as the solution of the system (3.4). We observe that the chain rule along with the expression of \(z_v = G'({{\bar{u}}})v\) provided in (2.13) leads to

$$\begin{aligned} \langle [DF({{\bar{u}}})]^*{{\bar{\mu }}},v\rangle _{[L^\infty (Q)^{n_c}]^*,L^\infty (Q)^{n_c}}= & {} \langle {{\bar{\mu }}},[DF({{\bar{u}}})]v\rangle _{{{\mathcal {M}}(K)},C(K)}\\= & {} \left\langle {{\bar{\mu }}},\frac{\partial g}{\partial y}(\cdot ,\cdot ,{{\bar{y}}})[G'({{\bar{u}}})v]\right\rangle _{{{\mathcal {M}}(K)},C(K)}\\= & {} \int _K\frac{\partial g}{\partial y}(x,t,{{\bar{y}}}(x,t))z_v(x,t)\, d{{\bar{\mu }}}. \end{aligned}$$

From here we get, once again with the chain rule,

$$\begin{aligned}&\langle {{\bar{\mu }}}_0J'({\bar{u}}) + [DF({\bar{u}})]^{\star }{\bar{\mu }}, v\rangle _{[L^\infty (Q)^{n_c}]^*,L^\infty (Q)^{n_c}}\\&\quad = {{\bar{\mu }}}_0\Big (\int _Q\Big [C_Q^\top (C_Q{{\bar{y}}} - y_Q)\Big ]\cdot z_v\, dx\, dt + \int _\varOmega \Big [C_\varOmega ^\top (C_\varOmega {{\bar{y}}}(T) - y_\varOmega )\Big ]\cdot z_v(T)\, dx \\&\qquad + \lambda \int _Q{{\bar{u}}}\cdot v\, dx\, dt\Big ) + \int _K\frac{\partial g}{\partial y}(x,t,{{\bar{y}}}(x,t))z_v(x,t)\, d{{\bar{\mu }}}\\&\quad = {{\bar{\mu }}}_0\lambda \int _Q{{\bar{u}}}\cdot v\, dx\, dt + \int _{{{\bar{Q}}}}z_v \cdot d\mu , \end{aligned}$$

where \(\mu = \mu _Q + \mu _\varSigma + \mu _\varOmega \) is the measure introduced above. Now, we notice that Theorem 2.2 implies \(z_v \in \varPhi \) and \(\partial _\nu z_v = 0\). Then, from Definition 3.1 applied to the system (3.4) with \(\phi = z_v\), and recalling that \(D = \frac{\partial R}{\partial y} + A^\top \), we obtain with (2.13)

$$\begin{aligned} \int _{{{\bar{Q}}}}z_v\cdot d\mu&= \int _Q{{\bar{\varphi }}}\cdot \left[ \frac{\partial z_v}{\partial t} - \varDelta z_v + \frac{\partial R}{\partial y}(x,t,{{\bar{y}}})z_v + Az_v\right] \, dx\, dt\\&= \int _Q{{\bar{\varphi }}}\cdot Bv\, dx \, dt = \int _Q\left( B^\top {{\bar{\varphi }}}\right) \cdot v\, dx \, dt. \end{aligned}$$

The last two identities lead to

$$\begin{aligned} \langle {{\bar{\mu }}}_0J'({\bar{u}}) + [DF({\bar{u}})]^{\star }{\bar{\mu }}, v\rangle _{[L^\infty (Q)^{n_c}]^*,L^\infty (Q)^{n_c}} = \int _Q(B^\top {{\bar{\varphi }}} + {{\bar{\mu }}}_0\lambda {{\bar{u}}})\cdot v\, dx \, dt. \end{aligned}$$

Hence, taking \(v = u - {{\bar{u}}}\) with \(u \in {{\mathcal {U}}_{a,b}}\), we conclude that (3.11) and (3.6) are identical. It is obvious that (3.1) and (3.12) coincide; hence, the possibility of taking \({{\bar{\mu }}}_0 = 1\) under the Slater assumption follows from Theorem 3.4.

After having proved Theorem 3.2, let us draw some conclusions, namely some information on \({{\bar{\mu }}}\) that follows from (3.5).

Theorem 3.5

Assume that (A4) holds and \({{\bar{\mu }}} \in {{\mathcal {M}}(K)}\) satisfies (3.5). Then the following inclusions hold

$$\begin{aligned} \left\{ \begin{array}{l} {\text {supp}}({{\bar{\mu }}}^+) \subset \{(x,t) \in K : g(x,t,{{\bar{y}}}(x,t)) = g_2(x,t)\},\\ {\text {supp}}({{\bar{\mu }}}^-) \subset \{(x,t) \in K : g(x,t,{{\bar{y}}}(x,t)) = g_1(x,t)\}, \end{array}\right. \end{aligned}$$
(3.13)

where \({{\bar{\mu }}} = {{\bar{\mu }}}^+ - {{\bar{\mu }}}^-\) is the Jordan decomposition of \({{\bar{\mu }}}\).

The proof follows the lines of [9, Proposition 2.5] with obvious modifications. As a consequence of this theorem and the assumption (A4) we have that \(|{{\bar{\mu }}}|(K\cap ({{\bar{\varOmega }}} \times \{0\}) ) = 0\). Hence, the identity \({{\bar{\mu }}} = {{\bar{\mu }}}_Q + {{\bar{\mu }}}_\varSigma + {{\bar{\mu }}}_\varOmega \) holds.

3.3 A regularity result for local solutions

As in the previous section, \({{\bar{u}}}\) will denote a local minimum of (PS) with associated state \({{\bar{y}}}\) and adjoint state \({{\bar{\varphi }}}\). In this section, we impose the following additional assumption on the problem (PS).

(A6) :

The following structure is assumed for \(B = (b_{ij}): n_c \le n\) and

$$\begin{aligned} b_{ij}(x,t) = \left\{ \begin{array}{ll} 0&{}\quad \text {if } i \ne j,\\ b_j(x,t)&{}\quad \text {if } i = j,\end{array}\right. \text { with }b_j \in L^\infty (0,T;W^{1,\infty }(\varOmega )),\quad 1 \le j \le n_c. \end{aligned}$$

Moreover, there exists a constant \({{\bar{b}}} > 0\) such that \(|b_j(x,t)| \ge {{\bar{b}}}\) for almost all \((x,t) \in Q\) and \(1 \le j \le n_c\). We also assume that \(\lambda > 0\) and (3.4)–(3.6) holds with \({{\bar{\mu }}}_0 = 1\).

Under this assumption, the well known regularity \({{\bar{u}}} \in L^r(0,T;W^{1,s}(\varOmega ))^{n_c}\) for all \(r, s \in [1,2)\) with \(\frac{2}{r}+\frac{d}{s} > d+1\) follows from the projection formula

$$\begin{aligned} {{\bar{u}}}(x,t) = {\text {Proj}}_{[a,b]}\left( -\frac{1}{\lambda } B(x,t)^\top {{\bar{\varphi }}}(x,t)\right) \end{aligned}$$

equivalent to

$$\begin{aligned} {{\bar{u}}}_j(x,t) = {\text {Proj}}_{[a_j,b_j]}\left( -\frac{1}{\lambda } b_j(x,t){{\bar{\varphi }}}_j(x,t)\right) , \quad 1 \le j \le n_c, \end{aligned}$$
(3.14)

which is deduced from (3.6) in the standard way. Under assumption (A6), however, this projection formula yields even higher regularity, namely \({{\bar{u}}} \in L^2(0,T;H^1(\varOmega ))^{n_c}\). The next lemma, proved in the Appendix, is the key tool to establish this regularity.
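The projection formula (3.14) is straightforward to evaluate numerically once the adjoint state is available. The following Python sketch shows the componentwise computation of \({{\bar{u}}}\) on a space–time grid; the array shapes, names, and the tiny example data are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def control_from_adjoint(phi, b, a_bounds, b_bounds, lam):
    """Evaluate the projection formula (3.14) componentwise:
    u_j = Proj_[a_j, b_j]( -(1/lambda) * b_j(x,t) * phi_j(x,t) ).

    phi      : array of shape (n_c, nt, nx, ny) -- adjoint components 1..n_c
    b        : array of shape (n_c, nt, nx, ny) -- diagonal entries b_j(x,t) of B
    a_bounds : array of shape (n_c,)            -- lower control bounds a_j
    b_bounds : array of shape (n_c,)            -- upper control bounds b_j
    lam      : Tikhonov parameter lambda > 0
    """
    unconstrained = -(b * phi) / lam
    lo = a_bounds[:, None, None, None]
    hi = b_bounds[:, None, None, None]
    return np.clip(unconstrained, lo, hi)   # pointwise projection onto [a_j, b_j]

# Tiny illustrative example (all data are made up):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_c, nt, nx, ny = 1, 5, 8, 8
    phi = rng.standard_normal((n_c, nt, nx, ny))
    b   = np.ones((n_c, nt, nx, ny))        # b_j == 1, so |b_j| >= b_bar holds trivially
    u   = control_from_adjoint(phi, b, np.array([-1.0]), np.array([1.0]), 1e-6)
    print(u.min(), u.max())                 # values lie in [-1, 1]
```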

Lemma 3.1

Assume that (A6) holds, and let \({{\bar{\varphi }}}\) be the solution of (3.4). Given \(M > 0\), we set

$$\begin{aligned} \varphi _M(x,t) = {\text {Proj}}_{[-M,+M]^n}({{\bar{\varphi }}}(x,t)). \end{aligned}$$

Then, \(\varphi _M \in L^2(0,T;H^1(\varOmega ))^n\) and there exists a constant C depending on \(\varOmega \), \(C_R\), and \(\Vert A\Vert _{L^\infty (Q,{\mathbb {R}}^{n \times n})}\), but independent of M, such that

$$\begin{aligned} \Vert \varphi _M\Vert _{L^2(0,T;H^1(\varOmega ))^n} \le&\, C\bigg [\Vert C_Q^\top [C_Q{{\bar{y}}} - y_Q]\Vert _{L^2(Q)^n}\nonumber \\&+ \Vert C^\top _\varOmega [C_\varOmega {{\bar{y}}}(\cdot ,T) - y_\varOmega ]\Vert _{L^2(\varOmega )^n}\nonumber \\&+ \sqrt{M\Vert \nabla _yg (\cdot ,\cdot ,{{\bar{y}}})\Vert _{C(K)^n}\Vert {{\bar{\mu }}}\Vert _{{\mathcal {M}}(K)}}\,\bigg ]. \end{aligned}$$
(3.15)

Theorem 3.6

Assume that (A6) holds. Then, \({{\bar{u}}}\) belongs to \(L^2(0,T;H^1(\varOmega ))^{n_c}\).

Proof

Let us take

$$\begin{aligned} M > \frac{\lambda }{{{\bar{b}}}}\max \{|a|,|b|\}\ \text { and }\ \varphi _M(x,t) = {\text {Proj}}_{[-M,+M]^n}({{\bar{\varphi }}}(x,t)). \end{aligned}$$

Then, from Lemma 3.1 we know that \(\varphi _M \in L^2(0,T;H^1(\varOmega ))^n\). Due to the regularity of the functions \(b_j\) we also have that \(b_j\varphi _{M,j} \in L^2(0,T;H^1(\varOmega ))\) for every \(1 \le j \le n_c\). Now, from (3.14) we have

$$\begin{aligned}&{{\bar{u}}}_j(x,t) = {\text {Proj}}_{[a_j,b_j]}\left( -\frac{b_j(x,t) {{\bar{\varphi }}}_j(x,t)}{\lambda }\right) = {\text {Proj}}_{[a_j,b_j]} \left( -\frac{b_j(x,t)\varphi _{M,j}(x,t)}{\lambda }\right) . \end{aligned}$$

This implies that \({{\bar{u}}}_j \in L^2(0,T;H^1(\varOmega ))\) for \(1 \le j \le n_c\) as well. It remains to prove that the two projections in the last identity coincide. This is obviously the case if \(|{{\bar{\varphi }}}_j(x,t)| \le M\). If \(|{{\bar{\varphi }}}_j(x,t)| > M\), the reader can easily verify the following facts:

  1. (1)

    \(-\frac{b_j(x,t)}{\lambda }{{\bar{\varphi }}}_j(x,t) \not \in [a_j,b_j],\)

  2. (2)

    \(\text { if } -\frac{b_j(x,t)}{\lambda }{{\bar{\varphi }}}_j(x,t)< a_j,\, \text {then}\,-\frac{b_j(x,t)}{\lambda }{{\bar{\varphi }}}_j(x,t)< -\frac{b_j(x,t)}{\lambda }\varphi _{M,j}(x,t) < a_j\).

  3. (3)

    \(\text { if } -\frac{b_j(x,t)}{\lambda }{{\bar{\varphi }}}_j(x,t)> b_j,\, \text {then}\,-\frac{b_j(x,t)}{\lambda }{{\bar{\varphi }}}_j(x,t)> -\frac{b_j(x,t)}{\lambda }\varphi _{M,j}(x,t) > b_j\).

In either case the equality of the projections follows. \(\square \)
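As a small numerical sanity check of this argument (not contained in the paper), the following sketch verifies on random data that the two projections indeed coincide whenever \(M > \frac{\lambda }{{{\bar{b}}}}\max \{|a|,|b|\}\). The concrete numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, b_bar = 1e-2, 0.5                  # lambda > 0 and the lower bound b_bar on |b_j(x,t)|
a_low, b_up = -1.0, 1.0                 # control bounds [a_j, b_j] of the j-th component
M = 1.1 * lam / b_bar * max(abs(a_low), abs(b_up))   # M > (lambda / b_bar) * max{|a|, |b|}

phi = 10.0 * rng.standard_normal(100_000)             # arbitrary values of the adjoint component
bj  = np.where(rng.random(100_000) < 0.5, -1.0, 1.0)  # coefficients b_j with |b_j| = 1 >= b_bar
phi_M = np.clip(phi, -M, M)                           # truncated adjoint, as in Lemma 3.1

u_full  = np.clip(-bj * phi   / lam, a_low, b_up)
u_trunc = np.clip(-bj * phi_M / lam, a_low, b_up)
assert np.array_equal(u_full, u_trunc)                # the two projections coincide
print("projections coincide on all samples")
```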

4 Examples

As applications, we consider systems of equations in two-dimensional spatial domains (\(d=2\)) that develop spiral waves or moving localized spots as solutions. Spiral waves appear for the FitzHugh–Nagumo equations, a system of two equations, while localized spots arise for a system of three equations. In all examples, the aim is to move the emerging state pattern in a prescribed way. All examples are numerically very challenging but show, on the other hand, the geometrical beauty of solutions to the selected reaction–diffusion equations.

Example 1

Translation of a spiral wave along a circle

We consider the FitzHugh–Nagumo system (1.4) in \(\varOmega = (0, L_\varOmega )^2\) with \(L_\varOmega := 75\), for \(T = 1000\), and subject to homogeneous Neumann boundary conditions. The parameters of the system read \(\sigma _y = \sigma _z = D_y = 1\), \(\alpha = 1\), \(\beta = 0.05\), \(\gamma = 0.0125\), \(\delta = 0\), \(\lambda = 10^{-6}\), and the nonlinearity is \(R(y) = y(y-0.01)(y-1)\).

As pointed out in Remark 1.1, the system (1.4) does not directly fit into (1.1), since the second diagonal element of D is zero. However, our theory remains true with obvious modifications. For the necessary optimality conditions and the adjoint equation we refer to [12], where this example was not considered. Here, our control task is to translate a naturally developed spiral wave pattern along a given circle. By a standard method explained in [23, p. 48], a rotation of the states is triggered: spiral waves \(y_0\) and \(z_0\) are computed in \(\varOmega \) as initial states for the system (1.4); they are depicted in Fig. 1.

The area of the center of the spiral, the so-called “core”, is located around the position \((3/4\,L_\varOmega , 1/2\,L_\varOmega ) = (56.25, 37.5)\).

Fig. 1  Initial states \(y_0\) (left) and \(z_0\) (right) for Eq. (1.4)

The desired trajectory \(y_Q\) equals the uncontrolled natural state y evolving from \((y_0,z_0)\), translated counter-clockwise along a circle of radius \(1/4\,L_\varOmega \) around the center of the domain \((1/2\,L_\varOmega ,1/2\,L_\varOmega )\). Due to the Neumann boundary conditions, this is a delicate issue.

However, the position of a spiral pattern is basically determined by the location of its core. If the core is translated, the arm of the spiral follows with some delay. Hence, we consider the desired trajectory only in a circular area of radius 15 around the desired center X(t) of the spiral, given by \( X(t) := (1/2\,L_\varOmega + 1/4\,L_\varOmega \,\cos (2\pi t/T), 1/2\,L_\varOmega + 1/4\,L_\varOmega \,\sin (2\pi t/T))\). We set

$$\begin{aligned} { C_Q(x,t) := \left\{ \begin{array}{ll} 1 &{}\quad \text{ if } |x - X(t)| \le 7.5,\\ |x - X(t)|/7.5 - 1 &{}\quad \text{ if } 7.5< |x - X(t)| < 15,\\ 0 &{}\quad \text{ if } |x - X(t)| \ge 15.\\ \end{array}\right. } \end{aligned}$$
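The moving center X(t) and the weight \(C_Q\) are easy to implement. The following Python sketch evaluates the definitions above on a grid; the \(101 \times 101\) resolution and the array layout are our own choices, and the piecewise definition of \(C_Q\) is implemented exactly as stated above.

```python
import numpy as np

L, T = 75.0, 1000.0                     # domain length and final time of Example 1

def X(t):
    """Center X(t) of the moving circular support, as defined above."""
    return (L/2 + L/4*np.cos(2*np.pi*t/T), L/2 + L/4*np.sin(2*np.pi*t/T))

def C_Q(x1, x2, t):
    """Support function C_Q(x,t), following the piecewise definition above
    (value 1 on the inner disc of radius 7.5, value 0 outside radius 15)."""
    c1, c2 = X(t)
    r = np.hypot(x1 - c1, x2 - c2)
    return np.where(r <= 7.5, 1.0,
           np.where(r >= 15.0, 0.0, r/7.5 - 1.0))

# Evaluate C_Q(., 0) on a 101 x 101 grid over Omega = (0, L)^2:
xs = np.linspace(0.0, L, 101)
X1, X2 = np.meshgrid(xs, xs, indexing="ij")
print(C_Q(X1, X2, 0.0).shape)           # (101, 101)
```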

Figure 2 displays \(C_Q\) at \(t = 0\) as well as the product \(C_Q\,y_Q\) for some times t. The remaining parameters of the optimal control problem (P) are set to \(u_a = -\,1\), \(u_b = 1\), and \(C_\varOmega = y_\varOmega \equiv 0\).

Fig. 2  Support function \(C_Q(\cdot ,0)\) (left) and product \(C_Q(\cdot ,t)\,y_Q(\cdot ,t)\) (right) for \(t \in \{0, 210, 420, 630, 840\}\), where the most faded pattern corresponds to the earliest time \(t = 0\) and so on. The dashed white line illustrates the center X(t) of the circle-shaped support function \(C_Q\) for \(t \in (0,840)\)

As optimization algorithm, a projected gradient method with a nonlinear CG step was chosen. Due to the long time horizon with 3001 time steps, and since an entire desired trajectory \(y_Q\) is given, we employed model predictive control (MPC) with 4 time steps of length \(\tau \) in each subproblem. This means that we solve a sequence of (discretized) short-time optimal control problems on a horizon of length \(4 \tau \), starting with the interval \([0,4 \tau ]\). From the 4 values computed for the discretized optimal control on this short interval, we keep the first value for the final suboptimal control. Next, we move the time horizon one time step to the right, compute the optimal control for this shifted horizon, and again keep its first value. After having solved a finite number of small optimal control problems, we arrive at a suboptimal control on [0, T]; the receding-horizon loop is sketched below.
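Schematically, the strategy can be written as the following Python loop. The routine `solve_short_horizon`, which would carry out the projected gradient/nonlinear CG iteration on a 4-step subproblem, is only assumed here and is not part of the paper; the sketch is meant to show the loop structure rather than an actual implementation.

```python
def mpc_suboptimal_control(y0, n_steps, horizon, solve_short_horizon):
    """Receding-horizon (MPC) loop as described in the text.

    y0                  -- discretized initial state of the PDE system
    n_steps             -- total number of time steps on [0, T]
    horizon             -- number of time steps per subproblem (here: 4)
    solve_short_horizon -- assumed user-supplied routine: given the current state
                           and the index k of the current time step, it solves the
                           short-time optimal control problem on [t_k, t_k + horizon*tau]
                           and returns the computed control values u_0, ..., u_{horizon-1}
                           together with the state obtained by applying u_0.
    """
    controls = []
    y = y0
    for k in range(n_steps):
        u_window, y_next = solve_short_horizon(y, k, horizon)
        controls.append(u_window[0])   # keep only the first control value ...
        y = y_next                     # ... apply it, and shift the horizon by one step
    return controls                    # suboptimal control on the whole interval [0, T]
```

In the computations reported above, each call of `solve_short_horizon` corresponds to solving the discretized 4-step subproblem by the projected gradient method with a nonlinear CG step.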

Moreover, since the chosen discretization with \(101 \times 101\) grid points in space still leads to fairly high computation times, only a semi-implicit Euler scheme was applied for solving the discrete systems. Even so, the 13.42 h needed to compute the suboptimal control \({\bar{u}}\) are still considerable.
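A common reading of "semi-implicit Euler scheme" in this context is to treat the diffusion implicitly and the nonlinearity explicitly, so that only one linear system has to be solved per time step. The following minimal sketch illustrates this for a scalar equation of the form (1.3); the finite-difference Neumann Laplacian and the step size are our own illustrative assumptions, not the paper's actual code.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def neumann_laplacian_1d(n, h):
    """1D finite-difference Laplacian with homogeneous Neumann boundary conditions."""
    main = -2.0 * np.ones(n)
    main[0] = main[-1] = -1.0                       # mirror condition at the boundary
    off = np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1]) / h**2

def semi_implicit_euler_step(y, u, lap, tau, R):
    """One step of  (y_new - y)/tau - Lap y_new + R(y) = u :
    diffusion treated implicitly, reaction R and control u treated explicitly."""
    m = y.size
    lhs = sp.identity(m) - tau * lap
    rhs = y + tau * (u - R(y))
    return spla.spsolve(lhs.tocsc(), rhs)

# Tiny demonstration on the 101 x 101 grid of Example 1 (step size tau chosen arbitrarily):
n, L = 101, 75.0
h = L / (n - 1)
lap1 = neumann_laplacian_1d(n, h)
lap2 = sp.kron(sp.identity(n), lap1) + sp.kron(lap1, sp.identity(n))   # 2D Laplacian
R = lambda y: y * (y - 0.01) * (y - 1.0)            # cubic nonlinearity of Example 1
y0 = np.full(n * n, 0.5)
y1 = semi_implicit_euler_step(y0, np.zeros(n * n), lap2, tau=0.1, R=R)
print(y1.min(), y1.max())
```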

Figure 3 illustrates the computed (sub-)optimal control \({\bar{u}}\) with the associated activator state \({\bar{y}}\) at several times t. As shown, the control task is accomplished; the suboptimal value of the objective functional was \(f({\bar{u}},{\bar{y}}) = 9.899 \times 10^{-3}\).

Fig. 3  Example 1: (sub)optimal control \({\bar{u}}\) (top row) and associated activator state \({\bar{y}}\) (bottom row) for \(t=210,420,630,840\)

One might expect the optimal control to concentrate on the support of the function \(C_Q\). However, the largest control amplitudes appear at the boundary of the circumcircle of the support. The reason is that the profile of the given desired trajectory is that of an uncontrolled “standing” spiral wave. A translation of the pattern naturally leads to some deformation of the profile, and the control aims to suppress this deformation where \(C_Q\) is positive.

Example 2

Translation of a propagating spot along a circle

The realization of this and the next example involves the system (1.5), (1.7) with three components, i.e. with \(m=2\). We mainly adopt the system parameters from [2] and [26], namely \(\sigma _y= \sigma _2 = 1\), \(\sigma _1 = 48\), \(D_y= 15 \times 10^{-5}\), \(D_1 = 186 \times 10^{-6}\), \(D_2 = 96 \times 10^{-4}\), \(R(x,t,y) = R(y) = y(y + \sqrt{2})(y - \sqrt{2}) + 6.92\), \(\alpha _1 = 1\), \(\alpha _2 = 8.5\), \(\beta _1 = \beta _2 = \gamma _1 = \gamma _2 = 1\), and \(\delta _1 = \delta _2 = 0\) in (1.5), (1.7). The spatial domain is \(\varOmega = (0,L_\varOmega )^2\) with \(L_\varOmega = 0.5\).

Similarly to Example 1, the control task is to translate a naturally developed pattern along a circular curve in \(\varOmega \) in counter-clockwise direction. For this purpose, we first construct a naturally developed spot profile: we take as auxiliary initial states

$$\begin{aligned} {\tilde{y}}_0(x)&:= \left\{ \begin{array}{ll} 1.2 &{}\quad \text{ if } x \in [0.09,0.13] \times [0.29,0.31],\\ -\,0.8 &{}\quad \text{ elsewhere }, \end{array}\right. \\ {\tilde{z}}^1_0(x)&:= \left\{ \begin{array}{rl} -\,0.3 &{}\quad \text{ if } x \in [0.05,0.1] \times [0.29,0.31],\\ -\,0.8 &{}\quad \text{ elsewhere }, \end{array}\right. \\ {\tilde{z}}^2_0(x)&:= \left\{ \begin{array}{ll} -\,0.65 &{}\quad \text{ if } x \in [0.09,0.13] \times [0.29,0.31],\\ -\,0.8 &{}\quad \text{ elsewhere }, \end{array}\right. \end{aligned}$$

and solve the system (1.5), (1.7) for \(u \equiv 0\) with initial data \(({\tilde{y}}_0,{\tilde{z}}^1_0,{\tilde{z}}^2_0)\) subject to periodic boundary conditions. Eventually, after a finite time, a stable spot profile is generated. As soon as the center of mass of the pattern reaches the center of the domain \(\varOmega \), we replace the boundary conditions by homogeneous Neumann conditions. The violation of the latter is negligible and disappears after a few further time steps. In fact, we let enough time pass to have the center of mass of the spot profile in the activator state y situated at \((3/4\,L_\varOmega , 1/2\,L_\varOmega ) = (0.45,0.3)\). This state is taken as initial state, cf. Fig. 4.
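The switching criterion in this construction relies on tracking the center of mass of the spot in the activator field. A possible way to compute it is sketched below; the thresholding against the background level is our own assumption and not taken from the paper.

```python
import numpy as np

def center_of_mass(y, xs, level=-0.5):
    """Center of mass of the spot in a 2D activator field y.

    The field is thresholded at `level` (an assumption made here; the background
    of the uncontrolled pattern is roughly -0.8) so that only the spot contributes.
    `xs` contains the grid coordinates in each direction.
    """
    w = np.maximum(y - level, 0.0)                  # nonnegative weight localizing the spot
    total = w.sum()
    if total == 0.0:
        raise ValueError("no spot above the threshold")
    X1, X2 = np.meshgrid(xs, xs, indexing="ij")
    return np.array([(w * X1).sum(), (w * X2).sum()]) / total

# Illustration: a synthetic bump at the position (0.45, 0.3) mentioned above
xs = np.linspace(0.0, 0.5, 81)
X1, X2 = np.meshgrid(xs, xs, indexing="ij")
y = -0.8 + 2.0 * np.exp(-((X1 - 0.45)**2 + (X2 - 0.3)**2) / (2 * 0.02**2))
print(center_of_mass(y, xs))                        # approximately [0.45, 0.3]
```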

Fig. 4  Initial states \(y_0\) (left), \(z^1_0\) (middle), and \(z^2_0\) (right)

Analogously to Example 1, we define the support function

$$\begin{aligned} C_Q(x,t) := \left\{ \begin{array}{ll} 1 &{}\quad \text{ if } |x - X(t)| \le 0.05,\\ 20\,|x - X(t)| - 1 &{}\quad \text{ if } 0.05< |x - X(t)| < 0.1,\\ 0 &{}\quad \text{ if } |x - X(t)| \ge 0.1,\\ \end{array}\right. \end{aligned}$$

where X(t) is defined as in Example 1. The desired state \(y_Q\) is defined by

$$\begin{aligned} y_Q(x,t) := \left\{ \begin{array}{ll} y_0(x + (3/4\,L_\varOmega , 1/2\,L_\varOmega ) - X(t)) &{}\quad \text{ if } |x - X(t)| \le 0.1,\\ 0 &{}\quad \text{ elsewhere },\\ \end{array}\right. \end{aligned}$$

and we fix \(C_\varOmega = y_\varOmega \equiv 0\), \(T = 250.3\), \(\lambda = 10^{-6}\), as well as \(u_a = -\,1\) and \(u_b = 1\).
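The desired state \(y_Q\) thus consists of the stored spot profile \(y_0\), translated so that its spot follows X(t) and cut off outside a disc of radius 0.1 around X(t). One way to implement this, using grid interpolation of \(y_0\) (our own choice, not prescribed by the paper), is the following sketch:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

L, T = 0.5, 250.3
xs = np.linspace(0.0, L, 81)
X1, X2 = np.meshgrid(xs, xs, indexing="ij")

def X(t):
    """Circular target path, defined as in Example 1 (with the present L and T)."""
    return np.array([L/2 + L/4*np.cos(2*np.pi*t/T), L/2 + L/4*np.sin(2*np.pi*t/T)])

def desired_state(y0_grid, t, spot_center=(0.45, 0.3), radius=0.1):
    """Evaluate y_Q(., t): the stored profile y_0, shifted so that the spot located
    at `spot_center` in y_0 is moved to X(t), and set to 0 outside the disc of
    radius `radius` around X(t), following the definition above."""
    interp = RegularGridInterpolator((xs, xs), y0_grid,
                                     bounds_error=False, fill_value=0.0)
    shift = np.asarray(spot_center) - X(t)
    pts = np.stack([X1 + shift[0], X2 + shift[1]], axis=-1)   # y_0(x + spot_center - X(t))
    y_q = interp(pts)
    mask = np.hypot(X1 - X(t)[0], X2 - X(t)[1]) <= radius
    return np.where(mask, y_q, 0.0)

# Illustration with a synthetic bump standing in for the stored profile y_0:
y0_grid = np.exp(-((X1 - 0.45)**2 + (X2 - 0.3)**2) / (2 * 0.02**2))
print(desired_state(y0_grid, t=0.0).max())          # close to 1: the bump now sits at X(0)
```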

The problem is solved numerically in the same way as for Example 1, this time with 2504 time steps. The computed suboptimal objective value is \(f({\bar{u}},{\bar{y}}) = 2.49 \times 10^{-6}\). The associated (sub-)optimal computed control \({\bar{u}}\) is shown in Fig. 5 along with the state \({\bar{y}}\) for various times t.

Fig. 5  Example 2: (sub)optimal control \({\bar{u}}\) (upper row) and associated activator state \({\bar{y}}\) (bottom row) for \(t =50,\, 100,\, 150, \, 200\)

An interesting property of the spot can be observed in this example. The pattern is oriented; hence, in the uncontrolled case, a natural translation of the spot in the positive \(x_1\)-direction would occur. However, the desired trajectory is determined by a spot profile whose orientation and direction of natural movement do not match. They only match around \(t \approx 3/4\,T\), which causes the computed control to have much lower amplitudes during these times. At all other times, the control not only forces the spot to change its position but also to keep its “non-matching” profile unchanged.

For control tasks of this type, namely translating a naturally developed pattern in a reaction–diffusion equation without changing its profile, we also refer to [17, 18, 19, 24].

Example 3

Keeping a spot solution away from the boundary

The spot solution to (1.5), (1.7) can get trapped by the boundary \(\partial \varOmega \). Figure 6 illustrates the natural propagation of such a pattern for \(\varOmega = (0,0.4)^2\), \(T = 100\), \(\sigma _y= \sigma _2 = 1\), \(\sigma _1 = 48\), \(D_y= 1 \times 10^{-4}\), \(D_1 = 186 \times 10^{-6}\), \(D_2 = 96 \times 10^{-4}\), \(R(x,t,y) = R(y) = y(y + \sqrt{2})(y - \sqrt{2}) + 6.92\), \(\alpha _1 = 1\), \(\alpha _2 = 8.5\), \(\beta _1 = \beta _2 = \gamma _1 = \gamma _2 = 1\), and \(\delta _1 = \delta _2 = 0\).

Fig. 6  Example 3: activator component of the initial state \({\tilde{y}}_0\) (left) and natural development of this component in (0, 100) for \(u \equiv 0\) (right). The white dashed line indicates the position of the center of mass of the pattern. In \(t \approx 60\), the spot gets trapped by the Neumann-boundary

Now, our task is to prevent the spot from touching the boundary. For this purpose, we define \( {\tilde{c}}_y(x) := 0.2-\max \{|x_1-0.2|,|x_2-0.2|\}\), \(g_2(\cdot ,t) := \min \{2,40\,{\tilde{c}}_y\}-0.5\), and consider the pointwise state constraint

$$\begin{aligned} y(x,t) \le g_2(x,t) \quad \forall (x,t) \in {\overline{Q}}. \end{aligned}$$

Notice that \(0 \le {\tilde{c}}_y(x) \le 0.2\), that \({\tilde{c}}_y(x) = 0\) holds on \(\partial \varOmega \), and that \({\tilde{c}}_y(x) = 0.2\) at the midpoint of \(\varOmega \). Therefore, the bound attains the value \(g_2 = -\,0.5\) at the boundary, which y would violate if the spot hit the boundary, whereas the value \(g_2 = 1.5\) at the midpoint is sufficiently large. The distance of the curve \(g_2 = 0\) to \(\partial \varOmega \) is \(0.0125 = 1/80\). The core of the uncontrolled spot is mainly positive, and the principal form of the spots is known to be quite stable. Therefore, it was sufficient to bound y from above by \(g_2\); this keeps the spot away from the boundary. An additional lower bound would have made the computations a bit more demanding.
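The quantities quoted in this paragraph are easy to verify numerically; the following sketch evaluates \({\tilde{c}}_y\) and \(g_2\) on the finite-difference grid used below and checks the boundary value \(-0.5\), the midpoint value 1.5, and the distance 1/80 of the zero level set of \(g_2\) to \(\partial \varOmega \).

```python
import numpy as np

h = 1 / 200
xs = np.arange(0.0, 0.4 + h / 2, h)               # grid on [0, 0.4] with step h = 1/200
X1, X2 = np.meshgrid(xs, xs, indexing="ij")

c_y = 0.2 - np.maximum(np.abs(X1 - 0.2), np.abs(X2 - 0.2))   # tilde c_y(x)
g2  = np.minimum(2.0, 40.0 * c_y) - 0.5                      # upper bound g_2(x)

mid = len(xs) // 2                                 # index of the midpoint x = (0.2, 0.2)
print(g2[0, 0], g2[mid, mid])                      # -0.5 on the boundary, 1.5 at the midpoint
# g_2 = 0  <=>  tilde c_y = 0.5/40 = 1/80, i.e. distance 1/80 to the boundary:
print(np.isclose(0.5 / 40.0, 1 / 80))              # True
```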

To complete the setup of the optimal control problem, we set \(C_\varOmega = y_\varOmega = C_Q = y_Q \equiv 0\), \(\lambda = 10^{-6}\), \(u_a = -\,1\), and \(u_b = 1\).

The numerical solution of this example is based on a finite difference discretization with step sizes \(h = 1/200\) in space and \(\tau = 0.1\) in time and is performed as in the two preceding examples.

Again, we applied model predictive control with 4 time steps and a nonlinear CG optimization method for solving each subproblem. The state constraints were included via an associated penalty term.
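The paper does not specify the exact form of the penalty term; a standard choice, shown here only as a plausible sketch, is a quadratic penalization of the constraint violation \(\max \{y - g_2, 0\}\) added to the discretized objective.

```python
import numpy as np

def penalized_violation(y, g2, rho):
    """Quadratic penalty  (rho/2) * || max(y - g2, 0) ||^2  for the constraint y <= g2.
    y, g2 : arrays of state values and upper bounds on the space-time grid
    rho   : penalty parameter (to be increased if the remaining violation is too large)
    """
    viol = np.maximum(y - g2, 0.0)
    value = 0.5 * rho * np.sum(viol**2)
    gradient = rho * viol              # derivative with respect to y (per grid value)
    return value, gradient

# Example: a state that slightly violates the bound at a few points
y  = np.array([0.1, 1.4, 1.6, -0.2])
g2 = np.array([1.5, 1.5, 1.5, -0.5])
val, grad = penalized_violation(y, g2, rho=1e3)
print(val, grad)                       # only the violating entries contribute
```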

It turns out that a small negative impulse with a maximal amplitude of 0.007 in \(t \in (49,55)\) is sufficient to push the spot away from the boundary. Figure 7 shows the described behaviour. Let us also emphasize that similar examples with multiple spots in the domain instead of a single one lead to analogous results.

Fig. 7  Computed (sub-)optimal control \({\bar{u}}\) in \(t = 50\) (left) and the activator component \({\bar{y}}\) of its associated state for \(t=0, 20, 40, 60, 80, 100\) (right)

Remark 4.1

By techniques of sparse control, the support of the control functions can be reduced significantly. For associated examples, we refer the reader to [11] and the thesis [23], where the analysis of sparse optimal control for reaction–diffusion equations is developed up to second-order sufficient optimality conditions.