
1 Introduction

Optimal control problems (OCPs) governed by diffusion-convection-reaction equations arise in environmental control problems, optimal control of fluid flow and in many other applications. It is well known that the standard Galerkin finite element discretization produces non-physical oscillations when convection dominates. For steady state OCPs, stable and accurate numerical solutions can be achieved by various effective stabilization techniques such as the streamline upwind/Petrov-Galerkin (SUPG) finite element method [10], the local projection stabilization [4] and the edge stabilization [19]. Recently, discontinuous Galerkin (dG) methods have gained importance in OCPs due to their better convergence behaviour, local mass conservation, flexibility in approximating rough solutions on complicated meshes, mesh adaptation and weak imposition of the boundary conditions (see, e.g., [21, 22, 36, 37]).

In recent years, much effort has been spent on parabolic OCPs (see, e.g., [2, 24]). However, there are few publications dealing with OCPs governed by nonstationary diffusion-convection-reaction equations. In many publications, conforming finite elements are used for space discretization. In [16, 17], a priori error estimates are derived for the characteristic finite element method, with the backward Euler method used for time discretization. Crank-Nicolson time discretization is applied to the OCP of the diffusion-convection equation in [5]. In the study of Chrysafinos [7], a priori error estimates for an unconstrained parabolic OCP, where conforming finite elements are combined with dG time discretization, are presented by decoupling the optimality system. In [17], error analysis concerning the characteristic finite element solution of the OCP with control constraints is discussed. The local dG approximation of the OCP which is discretized by backward Euler in time is derived in [38] and a priori error estimates for the semi-discrete OCP are provided in [30]. We present a priori error analysis for SIPG discretization combined with the backward Euler method in [1].

In this paper, we solve the OCP governed by the diffusion-convection-reaction equation without control constraints by applying the symmetric interior penalty Galerkin (SIPG) method in space and dG discretization in time [14]. We adapt the error analysis of [7] to the space-time dG discretization. For this purpose, we divide the error analysis into three parts as in [17] and use the error estimates for dG bilinear forms.

Discontinuous Galerkin discretization schemes have the pleasant property that discretization and dualization interchange, i.e. discretization and optimization commute. There are two different approaches for solving OCPs: optimize-then-discretize (OD) and discretize-then-optimize (DO). In the OD approach, the infinite dimensional optimality system, containing the state equation, the adjoint equation and the variational inequality, is derived first. Then, the optimality system is discretized by using a suitable discretization method in space and time. In the DO approach, the infinite dimensional OCP is discretized first and then the finite-dimensional optimality system is derived. The OD and DO approaches do not commute in general for OCPs governed by diffusion-convection-reaction equations [10]. However, commutativity is achieved in the case of SIPG discretization for steady state problems [21, 36]. Recently, dG time discretization has been applied to PDE constrained optimization problems, which are solved by the indirect multiple shooting method [18]. The multiple shooting method was formulated in function spaces and discretized afterwards, where dG time discretization commutes for both approaches. The spatial mesh was adapted at each constant time step using a dual weighted residual error estimate.

The rest of the paper is organized as follows. In Sect. 2, we define the model problem and then derive the optimality system. In Sect. 3, dG discretization and the semi-discrete optimality system follow. In Sect. 4, the fully discrete optimality systems, obtained by the space-time dG method and the \(\theta\)-scheme, are presented. Under dG time discretization, we show that the OD and DO approaches commute for time-dependent problems, too. In Sect. 5, some auxiliary results accompanied with a priori error estimates for the fully discrete optimality system follow. In Sect. 6, numerical results are shown in order to evaluate the performance of the suggested method. Additionally, we give some numerical results for the \(\theta\)-method and compare them with the dG time discretization. The paper ends with some conclusions.

2 The Optimal Control Problem

We consider the following distributed optimal control problem governed by the unsteady diffusion-convection-reaction equation

$$\displaystyle\begin{array}{rcl} \mathop{\text{ minimize }}\limits_{u \in L^{2}(0,T;L^{2}(\varOmega ))}\;J(y,u):= \frac{1} {2}\int _{0}^{T}& \big( & \left \|y - y_{ d}\right \|_{L^{2}(\varOmega )}^{2}\ +\alpha \left \|u\right \|_{ L^{2}(\varOmega )}^{2}\big)\;dt, \\ \text{ subject to }\partial _{t}y -\epsilon \varDelta y +\beta \cdot \nabla y + ry& =& f + u\quad (x,t) \in \varOmega \times (0,T], \\ y(x,t)& =& 0\quad \quad \;\;(x,t) \in \partial \varOmega \times [0,T], \\ y(x,0)& =& y_{0}(x)\qquad \;\;x \in \varOmega. {}\end{array}$$
(1)

We adopt the standard notations for Sobolev spaces on computational domains and their norms. Let \(\varOmega\) be a bounded convex polygonal domain in \(\mathbb{R}^{2}\) with Lipschitz boundary \(\partial \varOmega\). The inner product in \(L^{2}(\varOmega )\) is denoted by \((\cdot,\cdot )\). The source function and the desired state are denoted by \(f \in L^{2}(0,T;L^{2}(\varOmega ))\) and \(y_{d} \in L^{2}(0,T;L^{2}(\varOmega ))\), respectively. The initial condition is \(y_{0}(x) \in H_{0}^{1}(\varOmega )\). The diffusion and the reaction coefficients are \(\epsilon > 0\) and \(r \in L^{\infty }(\varOmega )\), respectively. The velocity field \(\boldsymbol{\beta }\in (W^{1,\infty }(\varOmega ))^{2}\) satisfies the incompressibility condition, i.e. \(\nabla \cdot \boldsymbol{\beta } = 0\). Furthermore, we assume the existence of a constant \(C_{0}\) such that \(r \geq C_{0}\) a.e. in \(\varOmega\) so that the well-posedness of the OCP (1) is guaranteed. The trial and the test spaces are \(Y = V = H_{0}^{1}(\varOmega ),\;\forall t \in (0,T]\).

It is well known that the functions \((y,u) \in H^{1}(0,T;L^{2}(\varOmega )) \cap L^{2}(0,T;Y ) \times L^{2}(0,T;L^{2}(\varOmega ))\) solve (1) if and only if there is an adjoint \(p \in H^{1}(0,T;L^{2}(\varOmega )) \cap L^{2}(0,T;Y )\) such that (y, u, p) is the unique solution of the following optimality system [23, 32],

$$\displaystyle\begin{array}{rcl} (\partial _{t}y,v) + a(y,v)& =& (f + u,v),\quad \forall v \in V,\quad \;\;y(x,0) = y_{0}, \\ -(\partial _{t}p,\psi ) + a(\psi,p)& =& -(y - y_{d},\psi ),\quad \forall \psi \in V,\quad p(x,T) = 0, \\ \int _{0}^{T}(\alpha u - p,w - u)\;dt& =& 0,\quad \forall w \in L^{2}(0,T;L^{2}(\varOmega )) {}\end{array}$$
(2)

with the bilinear form

$$\displaystyle{a(y,v) =\int _{\varOmega }(\epsilon \nabla y \cdot \nabla v +\beta \cdot \nabla yv + ryv)\;dx,}$$

where \((\cdot,\cdot )\) is the inner product in \(L^{2}(\varOmega )\).
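Since no control constraints are imposed, the third relation in (2) can be read pointwise. As a short worked step (not spelled out above, and assuming \(\alpha > 0\)), choosing \(w = u + z\) with an arbitrary \(z \in L^{2}(0,T;L^{2}(\varOmega ))\) gives

$$\displaystyle{\int _{0}^{T}(\alpha u - p,z)\;dt = 0\quad \forall z \in L^{2}(0,T;L^{2}(\varOmega ))\quad \Longrightarrow \quad \alpha u = p\quad \text{a.e. in }\varOmega \times (0,T),}$$

so that, under this assumption, the control can be eliminated through \(u = p/\alpha\) and the state and adjoint equations form a coupled forward-backward system.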

3 Discontinuous Galerkin Semidiscretization

Let \(\{\mathcal{T}_{h}\}_{h}\) be a family of shape regular meshes such that \(\overline{\varOmega } = \cup _{K\in \mathcal{T}_{h}}\overline{K}\), \(K_{i} \cap K_{j} =\emptyset\) for \(K_{i},K_{j} \in \mathcal{T}_{h}\), \(i\neq j\). The diameter of an element \(K\) is denoted by \(h_{K}\). The maximum diameter is \(h =\max \limits _{K\in \mathcal{T}_{h}}h_{K}\). In addition, the length of an edge \(E\) is denoted by \(h_{E}\).

In this paper, we consider discontinuous piecewise finite element spaces to define the discrete test, state and control spaces

$$\displaystyle{ V _{h,p} = Y _{h,p} = U_{h,p} = \left \{y \in L^{2}(\varOmega )\,:\ y\mid _{ K} \in \mathbb{P}^{p}(K)\quad \forall K \in \mathcal{T}_{ h}\right \}. }$$
(3)

Here, \(\mathbb{P}^{p}(K)\) denotes the set of all polynomials on \(K \in \mathcal{T}_{h}\) of degree at most \(p\).

We split the set of all edges \(\mathcal{E}_{h}\) into the set \(\mathcal{E}_{h}^{0}\) of interior edges and the set \(\mathcal{E}_{h}^{\partial }\) of boundary edges so that \(\mathcal{E}_{h} = \mathcal{E}_{h}^{\partial } \cup \mathcal{E}_{h}^{0}\). Let n denote the unit outward normal to ∂ Ω. We define the inflow boundary

$$\displaystyle{\varGamma ^{-} = \left \{x \in \partial \varOmega \,:\ \beta \cdot \mathbf{n}(x) < 0\right \}}$$

and the outflow boundary \(\varGamma ^{+} = \partial \varOmega \setminus \varGamma ^{-}\). The boundary edges are decomposed into edges \(\mathcal{E}_{h}^{-} = \left \{E \in \mathcal{E}_{h}^{\partial }\,:\ E \subset \varGamma ^{-}\right \}\) that correspond to inflow boundary and edges \(\mathcal{E}_{h}^{+} = \mathcal{E}_{h}^{\partial }\setminus \mathcal{E}_{h}^{-}\) that correspond to outflow boundary. The inflow and outflow boundaries of an element \(K \in \mathcal{T}_{h}\) are defined by

$$\displaystyle{ \partial K^{-} = \left \{x \in \partial K\,:\ \beta \cdot \mathbf{n}_{ K}(x) < 0\right \},\quad \partial K^{+} = \partial K\setminus \partial K^{-}, }$$

where \(\mathbf{n}_{K}\) is the unit normal vector on the boundary \(\partial K\) of an element \(K\).
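The inflow/outflow splitting can be sketched in a few lines. The snippet below is only an illustration: the function name `classify_boundary_edges`, the unit-square geometry and the constant velocity \((1, 0.5)\) are assumptions made for this example and do not come from the paper. Edges with \(\beta \cdot \mathbf{n} = 0\) are counted as outflow, in line with \(\varGamma ^{+} = \partial \varOmega \setminus \varGamma ^{-}\).

```python
def classify_boundary_edges(beta, outward_normals):
    """Split boundary edges into inflow (beta . n < 0) and outflow (all others)."""
    inflow, outflow = [], []
    for name, n in outward_normals.items():
        flux = beta[0] * n[0] + beta[1] * n[1]  # beta . n on this edge
        (inflow if flux < 0 else outflow).append(name)
    return inflow, outflow


# Hypothetical example: the four sides of the unit square, constant velocity field.
normals = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "bottom": (0.0, -1.0),
    "top": (0.0, 1.0),
}
inflow, outflow = classify_boundary_edges((1.0, 0.5), normals)
```

The same sign test applied with \(\mathbf{n}_{K}\) on the edges of a single element yields \(\partial K^{-}\) and \(\partial K^{+}\).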

Let the edge \(E\) be a common edge for two elements \(K\) and \(K^{e}\). For a piecewise continuous scalar function \(y\), there are two traces of \(y\) along \(E\), denoted by \(y\vert _{E}\) from the interior of \(K\) and \(y^{e}\vert _{E}\) from the interior of \(K^{e}\). Then, the jump and average of \(y\) across the edge \(E\) are defined by:

$$\displaystyle{ \left [\!\left [y\right ]\!\right ] = y\vert _{E}\mathbf{n}_{K} + y^{e}\vert _{ E}\mathbf{n}_{K^{e}},\quad \left \{\!\!\left \{y\right \}\!\!\right \} = \frac{1} {2}\big(y\vert _{E} + y^{e}\vert _{ E}\big). }$$
(4)

Similarly, for a piecewise continuous vector field ∇y, the jump and average across an edge E are given by

$$\displaystyle{ \left [\!\left [\nabla y\right ]\!\right ] = \nabla y\vert _{E} \cdot \mathbf{n}_{K} + \nabla y^{e}\vert _{ E} \cdot \mathbf{n}_{K^{e}},\quad \left \{\!\!\left \{\nabla y\right \}\!\!\right \} = \frac{1} {2}\big(\nabla y\vert _{E} + \nabla y^{e}\vert _{ E}\big). }$$
(5)

For a boundary edge \(E \in K\cap \varGamma\), we set \(\left \{\!\!\left \{\nabla y\right \}\!\!\right \} = \nabla y\) and \(\left [\!\left [y\right ]\!\right ] = y\mathbf{n}\) where n is the outward normal unit vector on Γ.
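The definitions (4) and (5) can be written down directly for point values on an interior edge. The helper names below (`scalar_jump`, `scalar_average`, `gradient_jump`) are hypothetical and chosen only for this sketch; it assumes \(\mathbf{n}_{K^{e}} = -\mathbf{n}_{K}\), which holds on an edge shared by two elements.

```python
def scalar_jump(y_K, y_Ke, n_K):
    """[[y]] = y|_E n_K + y^e|_E n_{K^e}: a vector, with n_{K^e} = -n_K."""
    n_Ke = tuple(-c for c in n_K)
    return tuple(y_K * a + y_Ke * b for a, b in zip(n_K, n_Ke))


def scalar_average(y_K, y_Ke):
    """{{y}} = (y|_E + y^e|_E) / 2."""
    return 0.5 * (y_K + y_Ke)


def gradient_jump(g_K, g_Ke, n_K):
    """[[grad y]] = grad y . n_K + grad y^e . n_{K^e}: a scalar."""
    return sum((a - b) * c for a, b, c in zip(g_K, g_Ke, n_K))
```

For a function that is continuous across the edge both traces coincide, so `scalar_jump` returns the zero vector; this is why the jump terms in the dG forms vanish for sufficiently smooth exact solutions.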

For a fixed control \(u\), the state equation in (1) is discretized in space by the symmetric interior penalty Galerkin (SIPG) method. The convective term is discretized by the upwind method [3]. This leads to the following semi-discrete state equation

$$\displaystyle{ (\partial _{t}y_{h},v_{h}) + a_{h}^{s}(y_{ h},v_{h}) + b_{h}(u_{h},v_{h}) = (f,v_{h})\quad \forall v_{h} \in V _{h,p},\quad t \in (0,T], }$$
(6)

with the (bi-)linear forms

$$\displaystyle\begin{array}{rcl} a^{d}(y,v)& =& \sum \limits _{ K\in \mathcal{T}_{h}}\int \limits _{K}\epsilon \nabla y \cdot \nabla v\;dx \\ & -& \sum \limits _{E\in \mathcal{E}_{h}}\int \limits _{E}\big(\left \{\!\!\left \{\epsilon \nabla y\right \}\!\!\right \} \cdot \left [\!\left [v\right ]\!\right ] + \left \{\!\!\left \{\epsilon \nabla v\right \}\!\!\right \} \cdot \left [\!\left [y\right ]\!\right ] -\epsilon \overbrace{ \frac{\sigma } {h_{E}}\left [\!\left [y\right ]\!\right ] \cdot \left [\!\left [v\right ]\!\right ]\big)\;ds}^{J_{\sigma }(y,v)}{}\end{array}$$
(7)

and

$$\displaystyle\begin{array}{rcl} a_{h}^{s}(y,v)& =& a^{d}(y,v) +\sum \limits _{ K\in \mathcal{T}_{h}}\int \limits _{K}\big(\beta \cdot \nabla yv + ryv\big)\;dx \\ & +& \sum \limits _{K\in \mathcal{T}_{h}}\;\int \limits _{\partial K^{-}\setminus \varGamma ^{-}}\beta \cdot \mathbf{n}(y^{e} - y)v\;ds -\sum \limits _{ K\in \mathcal{T}_{h}}\;\int \limits _{\partial K^{-}\cap \varGamma ^{-}}\beta \cdot \mathbf{n}yv\;ds,{}\end{array}$$
(8)
$$\displaystyle\begin{array}{rcl} b_{h}(u,v)& =& -\sum \limits _{K\in \mathcal{T}_{h}}\int \limits _{K}uv\;dx.{}\end{array}$$
(9)

The penalty parameter \(\sigma > 0\) should be sufficiently large to ensure the stability of the dG discretization [26, Sect. 2.7.1] with a lower bound depending only on the polynomial degree.
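To see the structure of the diffusion form (7) and the role of the penalty parameter, a one-dimensional analogue can be assembled by hand. The sketch below is an illustration under stated assumptions rather than the implementation used in this paper: the domain is \((0,1)\), the mesh is uniform, the elements are discontinuous \(\mathbb{P}^{1}\), "edges" are mesh nodes with \(h_{E} = h\), and the function names and the value \(\sigma = 10\) are chosen only for this example.

```python
def sipg_diffusion_matrix_1d(n_elems, eps=1.0, sigma=10.0):
    """Matrix of a^d(.,.) for discontinuous P1 elements on a uniform mesh of (0, 1).

    Unknowns 2k and 2k+1 are the left and right endpoint values on element k.
    """
    h = 1.0 / n_elems
    n = 2 * n_elems
    A = [[0.0] * n for _ in range(n)]
    grad = (-1.0 / h, 1.0 / h)  # derivatives of the two local basis functions

    # Volume term: eps * int_K y' v' dx on every element.
    for k in range(n_elems):
        for i in range(2):
            for j in range(2):
                A[2 * k + i][2 * k + j] += eps * h * grad[i] * grad[j]

    # Node ("edge") terms: consistency, symmetry and penalty contributions.
    for node in range(n_elems + 1):
        dofs, jump, avg = [], [], []
        if node > 0:  # element on the left of the node, outward normal +1
            dofs += [2 * node - 2, 2 * node - 1]
            jump += [0.0, 1.0]
            avg += list(grad)
        if node < n_elems:  # element on the right of the node, outward normal -1
            dofs += [2 * node, 2 * node + 1]
            jump += [-1.0, 0.0]
            avg += list(grad)
        # The average is one-sided on the boundary and a mean value inside.
        weight = 0.5 if 0 < node < n_elems else 1.0
        avg = [eps * weight * g for g in avg]
        for a, i in enumerate(dofs):      # i: test function index
            for b, j in enumerate(dofs):  # j: trial function index
                A[i][j] += (-avg[b] * jump[a] - avg[a] * jump[b]
                            + eps * sigma / h * jump[a] * jump[b])
    return A


def is_positive_definite(A):
    """Try a Cholesky factorisation of a symmetric matrix; fail on a non-positive pivot."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


A = sipg_diffusion_matrix_1d(8)
```

With this choice of \(\sigma\) the assembled matrix is symmetric and the Cholesky factorisation succeeds, which mirrors the symmetry of (7) and the coercivity that a sufficiently large penalty parameter is meant to guarantee.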

Let \(f_{h}\), \(y_{h}^{d}\) and \(y_{h}^{0}\) be approximations of the source function \(f\), the desired state function \(y_{d}\) and the initial condition \(y_{0}\), respectively. Then, the semi-discrete approximation of the OCP (1) can be defined as follows:

$$\displaystyle\begin{array}{rcl} \mathop{\text{ minimize }}\limits_{u_{h} \in L^{2}(0,T;U_{ h,p})}\int _{0}^{T}\big(\frac{1} {2}\sum \limits _{K\in \mathcal{T}_{h}}\|y_{h} - y_{h}^{d}\|_{ L^{2}(K)}^{2}& +& \frac{\alpha } {2}\sum \limits _{K\in \mathcal{T}_{h}}\|u_{h}\|_{L^{2}(K)}^{2}\big)\;dt, \\ \text{subject to }(\partial _{t}y_{h},v_{h}) + a_{h}^{s}(y_{ h},v_{h}) + b_{h}(u_{h},v_{h})& =& (f_{h},v_{h}),\ t \in (0,T],v_{h} \in V _{h,p} \\ y_{h}(x,0)& =& y_{h}^{0}. {}\end{array}$$
(10)

The semi-discrete optimality system is written as follows:

$$\displaystyle\begin{array}{rcl} & & (\partial _{t}y_{h},v_{h}) + a_{h}^{s}(y_{ h},v_{h}) + b_{h}(u_{h},v_{h}) = (f_{h},v_{h}),\;\forall v_{h} \in V _{h,p},\quad y_{h}(x,0) = y_{h}^{0}, \\ & & -(\partial _{t}p_{h},\psi _{h}) + a_{h}^{a}(p_{ h},\psi _{h}) = -(y_{h} - y_{h}^{d},\psi _{ h}),\;\forall \psi _{h} \in V _{h,p},\quad p_{h}(x,T) = 0, \\ & & \int _{0}^{T}(\alpha u_{ h} - p_{h},w_{h} - u_{h})\;dt = 0,\quad \forall w_{h} \in L^{2}(0,T;U_{ h,p}), {}\end{array}$$
(11)

where

$$\displaystyle\begin{array}{rcl} a_{h}^{a}(p,\psi )& =& \sum \limits _{ K\in \mathcal{T}_{h}}\int \limits _{K}\epsilon \nabla p \cdot \nabla \psi \;dx {}\\ & -& \sum \limits _{E\in \mathcal{E}_{h}}\int \limits _{E}\big(\left \{\!\!\left \{\epsilon \nabla p\right \}\!\!\right \} \cdot \left [\!\left [\psi \right ]\!\right ] + \left \{\!\!\left \{\epsilon \nabla \psi \right \}\!\!\right \}\cdot \left [\!\left [p\right ]\!\right ] - \frac{\sigma \epsilon } {h_{E}}\left [\!\left [p\right ]\!\right ] \cdot \left [\!\left [\psi \right ]\!\right ]\big)\;ds {}\\ & +& \sum \limits _{K\in \mathcal{T}_{h}}\int \limits _{K}\big(-\beta \cdot \nabla p\psi + rp\psi \big)\;dx {}\\ & -& \sum \limits _{K\in \mathcal{T}_{h}}\;\int \limits _{\partial K^{+}\setminus \varGamma ^{+}}\beta \cdot \mathbf{n}(p^{e} - p)\psi \;ds +\sum \limits _{ K\in \mathcal{T}_{h}}\;\int \limits _{\partial K^{+}\cap \varGamma ^{+}}\beta \cdot \mathbf{n}p\psi \;ds. {}\\ \end{array}$$

4 Time Discretization of the Optimal Control Problem

In this section, we derive the fully-discrete optimality system using \(\theta\)-method and dG time stepping method [14]. The fully discrete optimality systems are compared for optimize-then-discretize (OD) and discretize-then-optimize (DO) approaches.

4.1 Time Discretization Using θ-Method

Let \(0 = t_{0} < t_{1} < \cdots < t_{N_{T}} = T\) be a subdivision of \(I = (0,T)\) with time intervals \(I_{m} = (t_{m-1},t_{m}]\) and time steps \(k_{m} = t_{m} - t_{m-1}\) for \(m = 1,\ldots,N_{T}\) and \(k =\max _{1\leq m\leq N_{T}}k_{m}\).

We start with OD approach, so we discretize the semi-discrete optimality system (11) using \(\theta\)-method as follows:

$$\displaystyle\begin{array}{rcl} & & (y_{h,m+1} - y_{h,m},v) + ka_{h}^{s}((1-\theta )y_{ h,m} +\theta y_{h,m+1},v) = \\ & & k((1-\theta )f_{h,m} +\theta f_{h,m+1},v) + k((1-\theta )u_{h,m} +\theta u_{h,m+1},v),\quad m = 0,\cdots \,,N - 1, \\ & & y_{h,0}(x,0) = y_{0}, \\ & & (p_{h,m} - p_{h,m+1},q) + ka_{h}^{a}(\theta p_{ h,m} + (1-\theta )p_{h,m+1},q) = \\ & & -k\left (\theta (y_{h,m} - y_{h,m}^{d},q) + (1-\theta )(y_{ h,m+1} - y_{h,m+1}^{d},q)\right ),\quad m = N - 1,\cdots \,,0, \\ & & p_{h,N} = 0, \\ & & (\alpha u_{h,m} - p_{h,m},w - u_{h,m}) = 0,\quad m = 0,1,\ldots,N. {}\end{array}$$
(12)

In DO approach, the first and the second parts of the cost functional are discretized by the rectangle rule and the trapezoidal rule, respectively, so that the value of the adjoint at the final time becomes zero as in [29]. Then, we have the following fully-discrete OCP:

$$\displaystyle\begin{array}{rcl} & & \text{ minimize }\frac{k} {2}\sum \limits _{m=0}^{N-1}(y_{ h,m} - y_{h,m}^{d})^{T}M(y_{ h,m} - y_{h,m}^{d}) {}\\ & & +\alpha \frac{k} {2}\left (\frac{1} {2}u_{h,0}^{T}Mu_{ h,0} +\sum \limits _{ m=1}^{N-1}u_{ h,m}^{T}Mu_{ h,m} + \frac{1} {2}u_{h,N}^{T}Mu_{ h,N}\right ) {}\\ & & \text{ subject to } {}\\ & & (y_{h,m+1} - y_{h,m},v) + ka_{h}^{s}((1-\theta )y_{ h,m} +\theta y_{h,m+1},v) = {}\\ & & k((1-\theta )f_{h,m} +\theta f_{h,m+1},v) + k((1-\theta )u_{h,m} +\theta u_{h,m+1},v),\quad m = 0,\cdots \,,N - 1, {}\\ & & (y_{h,0},v) = (y_{0},v), {}\\ \end{array}$$

where M is the mass matrix.

Now, we construct the discrete Lagrangian

$$\displaystyle\begin{array}{rcl} & \mathcal{L}& (y_{h,1},\ldots,y_{h,N},p_{h,0},\ldots,p_{h,N},u_{h,0},\ldots,u_{h,N}) \\ & =& \frac{k} {2}\sum \limits _{m=0}^{N-1}(y_{ h,m} - y_{h,m}^{d})^{T}M(y_{ h,m} - y_{h,m}^{d}) \\ & +& \alpha \frac{k} {2}\left (\frac{1} {2}u_{h,0}^{T}Mu_{ h,0} +\sum \limits _{ m=1}^{N-1}u_{ h,m}^{T}Mu_{ h,m} + \frac{1} {2}u_{h,N}^{T}Mu_{ h,N}\right ) + (y_{h,0} - y_{0},p_{h,0}) \\ & +& \sum \limits _{m=0}^{N-1}((y_{ h,m+1} - y_{h,m},p_{h,m+1}) + ka_{h}^{s}((1-\theta )y_{ h,m} +\theta y_{h,m+1},p_{h,m+1}) \\ & -& k((1-\theta )f_{h,m} +\theta f_{h,m+1},p_{h,m+1}) - k((1-\theta )u_{h,m} +\theta u_{h,m+1},p_{h,m+1})). {}\end{array}$$
(13)

By differentiating the Lagrangian (13), we derive the fully-discrete optimality system

$$\displaystyle\begin{array}{rcl} & & (y_{h,m+1} - y_{h,m},v) + ka_{h}^{s}((1-\theta )y_{ h,m} +\theta y_{h,m+1},v) = \\ & & k((1-\theta )f_{h,m} +\theta f_{h,m+1},v) + k((1-\theta )u_{h,m} +\theta u_{h,m+1},v),\quad m = 0,\cdots \,,N - 1, \\ & & y_{h,0}(x,0) = y_{0}, \\ & & (q,p_{h,N}) + ka_{h}^{s}(q,\theta p_{ h,N}) = 0, \\ & & (p_{h,m} - p_{h,m+1},q) + ka_{h}^{s}(q,\theta p_{ h,m} + (1-\theta )p_{h,m+1}) = \\ & & -k(y_{h,m} - y_{h,m}^{d},q),\quad m = N - 1,\ldots,1, \\ & & (q,p_{h,0} - p_{h,1}) + ka_{h}^{s}(q,(1-\theta )p_{ h,1}) = -k(y_{h,0} - y_{h,0}^{d},q), \\ & & ( \frac{\alpha } {2}u_{h,0} - (1-\theta )p_{h,1},w - u_{h,0}) = 0, \\ & & (\alpha u_{h,m} - (\theta p_{h,m} + (1-\theta )p_{h,m+1}),w - u_{h,m}) = 0,\quad m = 1,\ldots,N - 1, \\ & & ( \frac{\alpha } {2}u_{h,N} -\theta p_{h,N},w - u_{h,N}) = 0. {}\end{array}$$
(14)

In the case of the backward Euler method (\(\theta = 1\)), the value \(u_{h,0}\) is not needed, as we observe from (14). As we mentioned before, approximating the first integral in the cost functional by using the rectangle rule leads to \(p_{h,N} = 0\), \(u_{h,N} = 0\), as we see from (14). Due to SIPG, we obtain \(a_{h}^{s}(\psi _{\delta },p_{\delta }) = a_{h}^{a}(p_{\delta },\psi _{\delta })\) [36]. Therefore, variational formulations (12) and (14) are the same.

In the case of the Crank-Nicolson method (\(\theta = 1/2\)), we observe that some differences occur in the adjoint equation. In (12), the right-hand side of the adjoint equation is evaluated at two successive points, while it is evaluated at just one point in (14). Additional differences are seen in the variational inequalities of (12) and (14), too. Thus, the OD and DO approaches lead to different weak forms. Several variants of the Crank-Nicolson method are used for optimal control of the heat equation in [2]. For the DO approach, the cost functional is discretized by using the midpoint rule. On the other hand, for the OD approach, the semi-discrete state equation is discretized by using the midpoint rule and a variant of the trapezoidal rule is applied to the semi-discrete adjoint equation to obtain the fully-discrete optimality system. Then, the resulting optimality systems commute.
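The effect of \(\theta\) on the state update in (12) can be illustrated on a scalar model problem. The sketch below is not taken from the paper: it replaces the bilinear form \(a_{h}^{s}\) by multiplication with a hypothetical constant `lam`, sets \(f = u = 0\), and uses a uniform step, so that only the time-stepping formula itself is exercised.

```python
import math


def theta_scheme(lam, y0, T, n_steps, theta):
    """Apply the theta-method to y' + lam * y = 0, y(0) = y0, with a uniform step."""
    k = T / n_steps
    y = y0
    for _ in range(n_steps):
        # (y_new - y) + k * lam * ((1 - theta) * y + theta * y_new) = 0
        y = y * (1.0 - k * lam * (1.0 - theta)) / (1.0 + k * lam * theta)
    return y


exact = math.exp(-1.0)
err_be = abs(theta_scheme(1.0, 1.0, 1.0, 100, 1.0) - exact)  # backward Euler
err_cn = abs(theta_scheme(1.0, 1.0, 1.0, 100, 0.5) - exact)  # Crank-Nicolson
```

On this model problem the Crank-Nicolson error is markedly smaller than the backward Euler error for the same step, reflecting second versus first order accuracy in time; the price, as discussed above, is that the OD and DO optimality systems no longer coincide without further modification.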

4.2 Discontinuous Galerkin Time Discretization

We define the space-time finite element space of piecewise discontinuous functions for test function, state and control as

$$\displaystyle\begin{array}{rcl} V _{h,p}^{k,q} = Y _{ h,p}^{k,q} = U_{ h,p}^{k,q}& =& \left \{v \in L^{2}(0,T;L^{2}(\varOmega ))\,:\ v\vert _{ I_{m}}\right. {}\\ & =& \left.\sum _{s=0}^{q}t^{s}\phi _{ s},t \in I_{m},\phi _{s} \in V _{h,p},m = 1,\ldots,N\right \}. {}\\ \end{array}$$

We define the temporal jump of \(v \in V _{h,p}^{k,q}\) as \([v]_{m} = v_{+}^{m} - v_{-}^{m}\), where \(v_{\pm }^{m} =\lim \limits _{\varepsilon \rightarrow 0\pm }v(t_{m}+\varepsilon )\).

Let \(f_{\delta }\) and \(y_{\delta }^{d}\) be approximations of the source function \(f\) and the desired state function \(y_{d}\) on each interval \(I_{m}\). Then, the fully-discrete OCP is written as

$$\displaystyle\begin{array}{rcl} \mathop{\text{ minimize }}\limits_{u_{\delta } \in U_{h,p}^{k,q}}\int _{ 0}^{T}\big(\frac{1} {2}\sum \limits _{K\in \mathcal{T}_{h}}& & \|y_{\delta } - y_{\delta }^{d}\|_{ L^{2}(K)}^{2} + \frac{\alpha } {2}\sum \limits _{K\in \mathcal{T}_{h}}\|u_{\delta }\|_{L^{2}(K)}^{2}\big)\;dt, \\ \text{ subject to }\sum \limits _{m=1}^{N_{T} }\int _{I_{m}}(\partial _{t}y_{\delta },v_{\delta })\;dt& +& \int _{0}^{T}a_{ h}^{s}(y_{\delta },v_{\delta })\;dt +\sum \limits _{ m=1}^{N_{T} }([y_{\delta }]_{m-1},v_{\delta,+}^{m-1}), \\ & =& \int _{0}^{T}(f_{\delta } + u_{\delta },v_{\delta })\;dt,\;\forall v_{\delta } \in V _{ h,p}^{k,q},\quad y_{\delta,-}^{0} = (y_{ 0})_{\delta }.{}\end{array}$$
(15)

The OCP (15) has a unique solution \((y_{\delta },u_{\delta })\), and a pair \((y_{\delta },u_{\delta }) \in V _{h,p}^{k,q} \times U_{h,p}^{k,q}\) is the solution of (15) if and only if there is an adjoint \(p_{\delta } \in V _{h,p}^{k,q}\) such that \((y_{\delta },u_{\delta },p_{\delta }) \in V _{h,p}^{k,q} \times U_{h,p}^{k,q} \times V _{h,p}^{k,q}\) is the unique solution of the fully-discrete optimality system

$$\displaystyle\begin{array}{rcl} \sum \limits _{m=1}^{N_{T} }\int _{I_{m}}(\partial _{t}y_{\delta },v_{\delta })\;dt& +& \int _{0}^{T}a_{ h}^{s}(y_{\delta },v_{\delta })\;dt +\sum \limits _{ m=1}^{N_{T} }([y_{\delta }]_{m-1},v_{\delta,+}^{m-1}) \\ & =& \int _{0}^{T}(f_{\delta } + u_{\delta },v_{\delta })\;dt,\forall v_{\delta } \in V _{ h,p}^{k,q}, \\ y_{\delta,-}^{0}& =& (y_{ 0})_{\delta }, \\ \sum \limits _{m=1}^{N_{T} }\int _{I_{m}}(-\partial _{t}p_{\delta },\psi _{\delta })\;dt& +& \int _{0}^{T}a_{ h}^{a}(p_{\delta },\psi _{\delta })\;dt -\sum \limits _{ m=1}^{N_{T} }([p_{\delta }]_{m},\psi _{\delta,-}^{m}) \\ & =& -\int _{0}^{T}(y_{\delta } - y_{\delta }^{d},\psi _{\delta })\;dt,\forall \psi _{\delta }\in V _{ h,p}^{k,q}, \\ p_{\delta,+}^{N}& =& 0, \\ \int _{0}^{T}(\alpha u_{\delta } - p_{\delta },w_{\delta } - u_{\delta })\;dt& =& 0\quad \forall w_{\delta } \in U_{ h,p}^{k,q}. {}\end{array}$$
(16)

In DO approach, firstly, we construct the discrete Lagrangian

$$\displaystyle\begin{array}{rcl} \mathcal{L}(y_{\delta },u_{\delta },p_{\delta })& =& \frac{1} {2}\int _{0}^{T}\left (\big(\sum \limits _{ K\in \mathcal{T}_{h}}\|y_{\delta } - y_{\delta }^{d}\|_{ L^{2}(K)}^{2} +\alpha \sum \limits _{ K\in \mathcal{T}_{h}}\|u_{\delta }\|_{L^{2}(K)}^{2}\big)\right )\;dt {}\\ & +& \sum _{m=1}^{N_{T} }\big(\int _{I_{m}}\left ((\partial _{t}y_{\delta },p_{\delta }) + a_{h}^{s}(y_{\delta },p_{\delta })\right )\;dt + ([y_{\delta }]_{ m-1},p_{\delta,+}^{m-1})\big) {}\\ & -& \sum _{m=1}^{N_{T} }\int _{I_{m}}(f_{\delta } + u_{\delta },p_{\delta })\;dt\big) + ((y_{0})_{\delta } - y_{\delta,-}^{0},p_{\delta,-}^{0}). {}\\ \end{array}$$

Differentiating \(\mathcal{L}\) with respect to y δ and applying integration by parts, we obtain

$$\displaystyle\begin{array}{rcl} \sum _{m=1}^{N_{T} }\int \limits _{I_{m}}\big((\psi _{\delta },-\partial _{t}p_{\delta })& +& a_{h}^{s}(\psi _{\delta },p_{\delta })\big)\;dt +\sum \limits _{ m=1}^{N_{T}-1}(\psi _{\delta,-}^{m},-[p_{\delta }]_{ m}) + (\psi _{\delta,-}^{N_{T} },p_{\delta,-}^{N_{T} }) \\ & =& -\sum _{m=1}^{N_{T} }\int _{I_{m}}(y_{\delta } - y_{\delta }^{d},\psi _{\delta })\;dt,\quad \forall \psi _{\delta }\in V _{ h,p}^{k,q}. {}\end{array}$$
(17)

Now, we add and subtract \((\psi _{\delta,-}^{N_{T}},p_{\delta,+}^{N_{T}})\) in (17) and obtain

$$\displaystyle\begin{array}{rcl} \sum _{m=1}^{N_{T} }\int _{I_{m}}\big(-(\partial _{t}p_{\delta },\psi _{\delta })& +& a_{h}^{s}(\psi _{\delta },p_{\delta })\big)\;dt -\sum \limits _{ m=1}^{N_{T} }([p_{\delta }]_{m},\psi _{\delta,-}^{m}) + (\psi _{\delta,-}^{N_{T} },p_{\delta,+}^{N_{T} }) \\ & =& -\sum _{m=1}^{N_{T} }\int _{I_{m}}(y_{\delta } - y_{\delta }^{d},\psi _{\delta })\;dt,\quad \forall \psi _{\delta }\in V _{ h,p}^{k,q}. {}\end{array}$$
(18)

On each subinterval \(I_{m}\), the adjoint equation reads as

$$\displaystyle{\int _{I_{m}}\left (-(\partial _{t}p_{\delta },\psi _{\delta }) + a_{h}^{s}(\psi _{\delta },p_{\delta })\right )\;dt - ([p_{\delta }]_{ m},\psi _{\delta,-}^{m}) = -\int _{ I_{m}}(y_{\delta } - y_{\delta }^{d},\psi _{\delta })\;dt.}$$

However, \((\psi _{\delta,-}^{N_{T}},p_{\delta,+}^{N_{T}})\) does not match the right-hand side of (18), so it is set to zero, i.e. \(p_{\delta,+}^{N} = 0\). Now, we use \(a_{h}^{s}(\psi _{\delta },p_{\delta }) = a_{h}^{a}(p_{\delta },\psi _{\delta })\). Thus, we arrive at (16). We note that the OD and DO approaches lead to the same optimality conditions, which can be observed by differentiating the discrete Lagrangian with respect to \(u_{\delta }\). Therefore, both approaches commute.
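The commutation property can be checked numerically on a scalar surrogate. The sketch below is an illustration under stated assumptions, not the discretization used in this paper: the spatial form \(a_{h}^{s}\) is replaced by multiplication with a hypothetical constant `a`, \(f = 0\), \(q = 0\) (piecewise constants in time) and the step `k` is uniform. Under these assumptions the state equation in (16) reduces to \((1 + ka)Y_{m} = Y_{m-1} + kU_{m}\) and the adjoint equation to \((1 + ka)P_{m} = P_{m+1} - k(Y_{m} - y_{m}^{d})\) with zero terminal value. The function names are chosen for this example only.

```python
def solve_state(u, y0, a, k):
    """dG(0) state steps: (1 + k a) Y_m = Y_{m-1} + k U_m."""
    ys, y = [], y0
    for um in u:
        y = (y + k * um) / (1.0 + k * a)
        ys.append(y)
    return ys


def cost(u, yd, y0, a, k, alpha):
    """Discrete cost: k/2 * sum (Y_m - yd_m)^2 + alpha k/2 * sum U_m^2."""
    ys = solve_state(u, y0, a, k)
    tracking = 0.5 * k * sum((y - d) ** 2 for y, d in zip(ys, yd))
    control = 0.5 * alpha * k * sum(um ** 2 for um in u)
    return tracking + control


def adjoint_gradient(u, yd, y0, a, k, alpha):
    """Backward adjoint sweep, then the gradient k * (alpha U_m - P_m)."""
    ys = solve_state(u, y0, a, k)
    p_next, ps = 0.0, [0.0] * len(u)  # the adjoint value after the final time is zero
    for m in reversed(range(len(u))):
        p_next = (p_next - k * (ys[m] - yd[m])) / (1.0 + k * a)
        ps[m] = p_next
    return [k * (alpha * um - pm) for um, pm in zip(u, ps)]
```

Comparing `adjoint_gradient` with central finite differences of `cost` shows agreement up to rounding: the adjoint recursion obtained by discretizing the continuous optimality system is also the exact gradient of the discrete cost, which is what commutation of OD and DO means in this simplified setting.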

5 Error Estimates

In this section, firstly, we give the norms used in the analysis and mention some estimates in the literature. Secondly, the discrete characteristic function which enables us to provide error estimates at arbitrary time points is explained. Then, we prove some useful lemmas and state the main estimate of this study.

We introduce the \(L^{2}\) inner product on the inflow or outflow boundaries as follows

$$\displaystyle{(w,v)_{\varGamma ^{-}} =\int _{\varGamma ^{-}}\vert \boldsymbol{\beta }\cdot \mathbf{n}\vert wv\;ds}$$

with analogous definition of \((\cdot,\cdot )_{\varGamma ^{+}}\) and associated norms \(\|\cdot \|_{\varGamma ^{-}}\) and \(\|\cdot \|_{\varGamma ^{+}}\).

The broken Sobolev space is defined as

$$\displaystyle{H^{k}(\varOmega,\mathcal{T}_{ h}) = \left \{v\,:\ v\mid _{K} \in H^{k}(K)\quad \forall K \in \mathcal{T}_{ h}\right \},}$$

with the semi-norm defined by

$$\displaystyle{\vert v\vert _{H^{k}(\varOmega,\mathcal{T}_{h})} = \left (\sum \limits _{K\in \mathcal{T}_{h}}\vert v\vert _{H^{k}(K)}^{2}\right )^{1/2},\quad v \in H^{k}(\varOmega,\mathcal{T}_{ h}).}$$

The Bochner space of functions whose kth time derivative is bounded almost everywhere on (0, T) with values in X is denoted by \(W^{k,\infty }(0,T;X)\). We use the dG energy norm in [33, Sect. 4]

$$\displaystyle{ \vert \vert \vert v\vert \vert \vert _{DG}^{2} = \vert v\vert _{ H^{1}(\varOmega,\mathcal{T}_{h})}^{2} + J_{\sigma }(v,v). }$$
(19)

We give the multiplicative trace inequality for all \(K \in \mathcal{T}_{h}\), for all \(v \in H^{1}(K)\) as follows:

$$\displaystyle{ \|v\|_{L^{2}(\partial K)}^{2} \leq C_{ M}\left (\|v\|_{L^{2}(K)}\vert v\vert _{H^{1}(K)} + h_{K}^{-1}\|v\|_{ L^{2}(K)}^{2}\right ), }$$
(20)

where \(C_{M}\) is a positive constant independent of \(v\), \(h\) and \(K\). We refer the reader to the study [12, Lemma 3.1] for the proof.

In addition, the generalization of Poincaré inequality to the broken Sobolev space \(H^{1}(\varOmega,\mathcal{T}_{h})\) is given as [26, Sect. 3.1.4]

$$\displaystyle{ \|v\|_{L^{2}(\varOmega )}^{2} \leq C_{ S}\left (\vert v\vert _{H^{1}(\varOmega,\mathcal{T}_{h})}^{2} +\sum \limits _{ E\in \mathcal{E}_{h}} \frac{1} {h_{E}}\|\left [\!\left [v\right ]\!\right ]\|_{L^{2}(E)}^{2}\right ). }$$
(21)

We proceed with the standard estimates derived for finite element methods [9]. Consider the \(L^{2}\)-projection \(\varPi _{h}: L^{2}(\varOmega ) \rightarrow V _{h,p}\) so that

$$\displaystyle\begin{array}{rcl} \|\varPi _{h}v - v\|_{L^{2}(K)} \leq C_{\varPi }h^{p+1}\vert v\vert _{ H^{p+1}(K)},\quad \vert \varPi _{h}v - v\vert _{H^{1}(K)} \leq C_{\varPi }h^{p}\vert v\vert _{ H^{p+1}(K)},& &{}\end{array}$$
(22)

for all \(v \in H^{p+1}(K)\), \(K \in \mathcal{T}_{h}\), where \(C_{\varPi }\) is a positive constant independent of \(v\) and \(h\). In addition, as suggested in [33, Sect. 4], using the study [13], the following estimate holds for all \(v \in H^{p+1}(\varOmega,\mathcal{T}_{h})\)

$$\displaystyle{ \vert \vert \vert \varPi _{h}v - v\vert \vert \vert _{DG} \leq (2C_{M} + 1)C_{\varPi }h^{p}\vert v\vert _{ H^{p+1}(\varOmega,\mathcal{T}_{h})}, }$$
(23)

where \(C_{M}\) and \(C_{\varPi }\) are positive constants from (20) and (22), respectively. In the following we introduce the parabolic projection for \(m = 0,\ldots,N_{T}\) and mention the properties given in [33]. Suppose that \(X \subset L^{2}(\varOmega )\) is a Hilbert space. Let us denote the space of polynomial functions depending on time as follows:

$$\displaystyle{ P^{\alpha }(I_{m},X) = \left \{v \in L^{2}(0,T;L^{2}(\varOmega ))\,:\ v =\sum _{ s=0}^{\alpha }t^{s}\phi _{ s,m},t \in I_{m},\phi _{s,m} \in X\right \}. }$$
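Before turning to the time projection, the first bound in (22) can be observed numerically in the simplest case. The sketch below is only an illustration and makes several assumptions not taken from the paper: one space dimension, the interval \((0,1)\), a uniform mesh, \(p = 0\) (so that \(\varPi _{h}v\) is the elementwise mean) and the test function \(v(x) =\sin (\pi x)\), for which all integrals are available in closed form.

```python
import math


def p0_projection_error(n_elems):
    """L2(0,1) error of the elementwise mean (P0 L2-projection) of sin(pi x)."""
    h = 1.0 / n_elems
    err_sq = 0.0
    for i in range(n_elems):
        a, b = i * h, (i + 1) * h
        # Mean value of sin(pi x) over (a, b).
        mean = (math.cos(math.pi * a) - math.cos(math.pi * b)) / (math.pi * h)
        # Exact integral of sin^2(pi x) over (a, b).
        int_v_sq = 0.5 * h - (math.sin(2 * math.pi * b) - math.sin(2 * math.pi * a)) / (4 * math.pi)
        # ||v - mean||^2 = ||v||^2 - h * mean^2 on each element.
        err_sq += int_v_sq - h * mean * mean
    return math.sqrt(err_sq)


# Observed convergence rate when the mesh size is halved.
rate = math.log2(p0_projection_error(16) / p0_projection_error(32))
```

The observed rate is close to one, in agreement with the exponent \(p + 1\) in (22) for \(p = 0\).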

A space-time projection \(\pi\) of \(y \in C(0,T;H^{1}(\varOmega ))\) into \(V _{h,p}^{k,q}\) is employed for the convergence estimates. The time projection \(P\) of \(y \in C(0,T;H^{1}(\varOmega ))\) is defined as

$$\displaystyle\begin{array}{rcl} & & Py \in \left \{v \in L^{2}(Q_{ T})\,:\ v\vert _{I_{m}} \in P^{q}(I_{ m},L^{2}(\varOmega ))\right \}, {}\\ & & \int _{I_{m}}(Py - y,t^{j}v)\;dt = 0,\quad \forall v \in L^{2}(\varOmega ),j = 0,\ldots,q - 1, {}\\ & & (Py)_{-}^{m} = y(t^{m}). {}\\ \end{array}$$

In addition, for \(m = 0,\ldots,N_{T}\), with \(y \in C(0,T;H^{1}(\varOmega ))\), \(\pi y \in V _{h,p}^{k,q}\) is defined as

$$\displaystyle\begin{array}{rcl} & & \pi y =\varPi _{h}(Py)\Longleftrightarrow\left ((\pi y)(t),v\right ) = \left ((Py)(t),v\right ),\quad \forall v \in V _{h,p},\forall t \in I_{m}, \\ & & \int _{I_{m}}(\pi y - y,v)\;dt =\int _{I_{m}}((Py,v) - (y,v))\;dt = 0,\quad \forall v \in V _{h,p}^{k,q-1}, \\ & & ((\pi y)_{-}^{m} - y(t^{m}),v) = (((Py)_{ -}^{m},v) - (y(t^{m}),v)) = 0,\quad \forall v \in V _{ h,p}.{}\end{array}$$
(24)

We note that the definition of the projection \(\pi\) is similar to the one used in the study [28].

We give some estimates from [33, Lemmas 4.3, 4.5], which we need in the proofs.

Lemma 1

Suppose that \(y \in W^{q+1,\infty }(I_{m},H^{1}(\varOmega ))\) such that y = 0 on ∂Ω. Then,

$$\displaystyle\begin{array}{rcl} \|y(t) - Py(t)\|& \leq & C_{P}k_{m}^{q+1}\vert y\vert _{ W^{q+1,\infty }(I_{m},L^{2}(\varOmega ))}\quad \forall t \in I_{m}, \\ \vert y(t) - Py(t)\vert _{H^{1}(\varOmega )}& \leq & C_{P}k_{m}^{q+1}\vert y\vert _{ W^{q+1,\infty }(I_{m},H^{1}(\varOmega ))}\quad \forall t \in I_{m}, \\ \vert \vert \vert y(t) - Py(t)\vert \vert \vert _{DG}& \leq & C_{P}k_{m}^{q+1}\vert y\vert _{ W^{q+1,\infty }(I_{m},H^{1}(\varOmega ))}\quad \forall t \in I_{m}.{}\end{array}$$
(25)

Lemma 2

Suppose that \(y \in W^{q+1,\infty }(I_{m},H^{1}(\varOmega )) \cap L^{\infty }(I_{m},H^{p+1}(\varOmega ))\) such that y = 0 on ∂Ω. Then,

$$\displaystyle\begin{array}{rcl} \|y(t) -\pi y(t)\|& \leq & C_{\pi }(h^{p+1} + k_{ m}^{q+1})\|y\|_{ R}\quad \forall t \in I_{m}, \\ \vert \vert \vert y(t) -\pi y(t)\vert \vert \vert _{DG}& \leq & C_{\pi }(h^{p} + k_{ m}^{q+1})\|y\|_{ R}\quad \forall t \in I_{m},{}\end{array}$$
(26)

where \(\|y\|_{R} =\max (\vert y\vert _{W^{q+1,\infty }(I_{m},H^{1}(\varOmega ))},\vert y\vert _{L^{\infty }(I_{m},H^{p+1}(\varOmega ))})\) and \(C_{\pi }\) is a positive constant independent of \(h\), \(k_{m}\), \(m\) and \(y\).

Lemma 3

There exists a positive constant \(C_{A}\) which is independent of \(h\), \(v_{h}\), \(w_{h}\), \(\epsilon\) such that

$$\displaystyle\begin{array}{rcl} a^{d}(y(t) -\varPi _{ h}y(t),v_{h})& \leq & C_{A}\epsilon h^{p}\|y(t)\|_{ H^{p+1}(\varOmega )}\vert \vert \vert v_{h}\vert \vert \vert _{DG} \\ \quad \text{a.e. }t& \in & (0,T),y \in L^{2}(0,T;H^{p+1}(\varOmega )),v_{ h} \in V _{h,p}.{}\end{array}$$
(27)

Proof

The proof in [11, Lemma 3.8] is adapted to the bilinear form (7) using the estimate (23). ⊓⊔

Remark 1

A similar estimate for the bilinear form arising from the nonsymmetric interior penalty Galerkin method can be found in [33, Lemma 4.2].

Lemma 4

The bilinear form \(a^{d}(\cdot,\cdot )\) satisfies the coercivity inequality

$$\displaystyle\begin{array}{rcl} a^{d}(v_{ h},v_{h}) \geq \frac{\epsilon } {2}\vert \vert \vert v_{h}\vert \vert \vert _{DG}^{2},\quad \forall v_{ h} \in V _{h,p}.& &{}\end{array}$$
(28)

Proof

The proof in [11, Corollary 3.10] is adapted to the bilinear form (7) using the norm (19). ⊓⊔

5.1 Discrete Characteristic Function

We use the discrete characteristic function in order to provide error estimates at arbitrary time points as suggested in [8]. We can work on \([0,k)\) instead of \(I_{m}\), since the construction of the discrete characteristic function is invariant under translation. We consider polynomials \(s \in \mathcal{P}_{q}(0,k)\) and the discrete approximation of \(\chi _{[0,t)}s\), which is a polynomial

$$\displaystyle{\tilde{s} \in \left \{\tilde{s} \in \mathcal{P}_{q}(0,k)\,:\ \tilde{s}(0) = s(0)\right \}\text{ such that }\int _{0}^{k}\tilde{s}z =\int _{ 0}^{t}sz,\quad \forall z \in \mathcal{P}_{ q-1}(0,k).}$$

This definition can be extended from \(\mathcal{P}_{q}(0,k)\) to \(V_{h,p}^{k,q}\). The discrete approximation of \(\chi _{[0,t)}v\) for \(v \in V_{h,p}^{k,q}\) is written as \(\tilde{v} =\sum _{ i=0}^{q}\tilde{s}_{i}(t)v_{i}\). Based on this construction, the following estimate is given in [33]:

$$\displaystyle{ \int _{I_{m}}\vert \vert \vert \tilde{w}\vert \vert \vert _{DG}^{2}\;dt \leq C_{ D}\int _{I_{m}}\vert \vert \vert w\vert \vert \vert _{DG}^{2}\;dt,\quad C_{ D} = C_{D}(q). }$$
(29)

We mention that a suitable discrete approximation \(\chi _{(t,t^{n}]}v_{h}\) must be constructed for the adjoint problem, as it is noted in the proof of [7, Theorem 3.8]. The discrete approximation of \(\chi _{ (t,t^{\,N_{T}}]}s\) is a polynomial

$$\displaystyle{\tilde{s} \in \{\tilde{ s} \in \mathcal{P}_{q}(t^{\,N_{T} -1},t^{\,N_{T} }):\tilde{ s}(t^{\,N_{T} }) = s(t^{\,N_{T} })\}\text{ such that }\int _{t^{\,N_{T}-1}}^{t^{\,N_{T}} }\tilde{s}z =\int _{ t}^{t^{\,N_{T}} }sz,}$$

\(\forall z \in \mathcal{P}_{q-1}(t^{\,N_{T}-1},t^{\,N_{T}})\). This definition can be extended from \(\mathcal{P}_{q}(t^{\,N_{T}-1},t^{\,N_{T}})\) to \(V_{h,p}^{k,q}\) and the estimates above can be modified for the adjoint [7, Theorem 3.8].
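The construction on [0, k) can be made concrete with a small computation. The following sketch is only an illustration: it represents \(s\) and \(\tilde{s}\) by their monomial coefficients and solves the q + 1 defining conditions exactly with rational arithmetic. The function names `solve_linear` and `discrete_characteristic` are hypothetical and do not come from the cited references.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [list(map(Fraction, row)) + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick the first row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]


def discrete_characteristic(s, t, k):
    """Monomial coefficients (lowest degree first) of the polynomial s~ in P_q(0, k)
    with s~(0) = s(0) and int_0^k s~ z = int_0^t s z for all z in P_{q-1}(0, k)."""
    q = len(s) - 1
    t, k = Fraction(t), Fraction(k)
    # Row 0 enforces s~(0) = s(0).
    A = [[1] + [0] * q]
    b = [s[0]]
    # Rows 1..q test against the monomials z = tau^j, j = 0, ..., q - 1.
    for j in range(q):
        A.append([k ** (i + j + 1) / (i + j + 1) for i in range(q + 1)])
        b.append(sum(Fraction(s[i]) * t ** (i + j + 1) / (i + j + 1)
                     for i in range(q + 1)))
    return solve_linear(A, b)


# s(tau) = 1 on (0, 1) with q = 1 and t = 1/2 gives s~(tau) = 1 - tau.
coefficients = discrete_characteristic([1, 0], Fraction(1, 2), 1)
```

For t = k the conditions are satisfied by \(s\) itself, so the sketch returns the input coefficients unchanged, which is a convenient sanity check.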

5.2 A Priori Error Estimates

We proceed with the derivation of the convergence estimates for the optimality system and its space-time dG approximation. We define the auxiliary state and adjoint equation which are needed for a priori error analysis

$$\displaystyle\begin{array}{rcl} \sum \limits _{m=1}^{N_{T} }\int _{I_{m}}(\partial _{t}y_{\delta }^{u},v_{\delta })\;dt& +& \int _{ 0}^{T}a_{ h}^{s}(y_{\delta }^{u},v_{\delta })\;dt +\sum \limits _{ m=1}^{N_{T} }([y_{\delta }^{u}]_{ m-1},v_{\delta,+}^{m-1}) \\ & =& \int _{0}^{T}(f_{\delta } + u,v_{\delta })\;dt, \\ y_{\delta,-}^{u,0}& =& (y_{ 0})_{\delta }, \\ \sum \limits _{m=1}^{N_{T} }\int _{I_{m}}(-\partial _{t}p_{\delta }^{u},\psi _{\delta })\;dt& +& \int _{ 0}^{T}a_{ h}^{a}(p_{\delta }^{u},\psi _{\delta })\;dt -\sum \limits _{ m=1}^{N_{T} }([p_{\delta }^{u}]_{ m},\psi _{\delta,-}^{m}) \\ & =& -\int _{0}^{T}(y_{\delta }^{u} - y_{\delta }^{d},\psi _{\delta })\;dt, \\ p_{\delta,+}^{u,N}& =& 0. {}\end{array}$$
(30)

Following [15], we assume that the reaction term satisfies | r | ≤ C r a.e. in Ω; the velocity field is bounded by a constant \(C_{\boldsymbol{\beta }}\) a.e. in Ω.

We prove some useful lemmas before stating the main theorem of this study.

Lemma 5

Let \((y_{\delta },p_{\delta })\) and \((y_{\delta }^{u},p_{\delta }^{u})\) be the solutions of (16) and (30), respectively. Then, there exists a constant C independent of h and k such that

$$\displaystyle{ \sup _{t\in I_{n}}\|y_{\delta }^{u}(t) - y_{\delta }(t)\| +\sup _{ t\in I_{n}}\|p_{\delta }^{u}(t) - p_{\delta }(t)\| \leq C\int _{ 0}^{t_{n} }\|u - u_{\delta }\|\;dt. }$$
(31)

Proof

Firstly, we study the fully discrete state equation on each subinterval \(I_{m}\). We subtract (16) from (30) to obtain

$$\displaystyle{ \int _{I_{m}}(\partial _{t}\theta,v_{\delta })\;dt + ([\theta ]_{m-1},v_{\delta,+}^{m-1}) +\int _{ I_{m}}a_{h}^{s}(\theta,v_{\delta })\;dt =\int _{ I_{m}}(u - u_{\delta },v_{\delta })\;dt, }$$
(32)

where \(\theta = y_{\delta }^{u} - y_{\delta }\). We substitute \(v_{\delta } = 2\theta\) in (32). Then,

$$\displaystyle{ \int _{I_{m}}2(\partial _{t}\theta,\theta )\;dt + 2([\theta ]_{m-1},\theta _{+}^{m-1}) =\|\theta _{ -}^{m}\|^{2} -\|\theta _{ -}^{m-1}\|^{2} +\| [\theta ]_{ m-1}\|^{2}, }$$
(33)

is achieved. For the right-hand side, we employ the Cauchy-Schwarz and Young inequalities, the Poincaré inequality (21) and the definition of the dG norm (19). For the left-hand side, we use (28) for the diffusion term and follow the technique in [15, Theorem 5.1] for the convection and reaction terms. Then, we derive the estimate formed by the middle and the last expressions of (34)

$$\displaystyle\begin{array}{rcl} \|\theta _{-}^{m}\|^{2}& -& \|\theta _{ -}^{m-1}\|^{2} + \frac{\epsilon } {2}\int _{I_{m}}\vert \vert \vert \theta \vert \vert \vert _{DG}^{2}\;dt + 2C_{ 0}\int _{I_{m}}\|\theta \|^{2}\;dt \\ & +& \frac{\epsilon } {2}\int _{I_{m}}\left (\sum \limits _{K\in \mathcal{T}_{h}}\left (\|\theta \|_{\partial K^{-}\cap \varGamma ^{-}}^{2} +\| \left [\!\left [\theta \right ]\!\right ]\|_{ \partial K^{-}\setminus \varGamma ^{-}}^{2} +\|\theta \|_{ \partial K^{+}\cap \varGamma ^{+}}^{2}\right )\right )\;dt \\ \leq \|\theta _{-}^{m}\|^{2}& -& \|\theta _{ -}^{m-1}\|^{2} + \frac{\epsilon } {2}\int _{I_{m}}\vert \vert \vert \theta \vert \vert \vert _{DG}^{2}\;dt + 2C_{ 0}\int _{I_{m}}\|\theta \|^{2}\;dt \\ & +& \int _{I_{m}}\left (\sum \limits _{K\in \mathcal{T}_{h}}\left (\|\theta \|_{\partial K^{-}\cap \varGamma ^{-}}^{2} +\| \left [\!\left [\theta \right ]\!\right ]\|_{ \partial K^{-}\setminus \varGamma ^{-}}^{2} +\|\theta \|_{ \partial K^{+}\cap \varGamma ^{+}}^{2}\right )\right )\;dt \\ & \leq & C\int _{I_{m}}\|u - u_{\delta }\|^{2}\;dt. {}\end{array}$$
(34)

We note that the lower bound on the left-hand side of (34) has been added after deriving the estimate in the middle for clarity of the proof and will be used later. Now, we proceed by substituting \(v_{\delta } = 2\tilde{\theta }\) into (32). We employ the discrete characteristic function as in the proof of [33, Theorem 5.2] to obtain an estimate at arbitrary points and use the properties given there. With \(z =\arg \sup _{\bar{I}_{m}}\|\theta (t)\|\), the discrete characteristic function defined in Sect. 5.1 leads to

$$\displaystyle\begin{array}{rcl} & & \int _{I_{m}}(\partial _{t}\theta,\tilde{\theta })\;dt =\int _{ t_{m-1}}^{z}(\partial _{ t}\theta,\theta )\;dt,\quad \tilde{\theta }_{+}^{m-1} =\theta _{ +}^{m-1},\quad [\tilde{\theta }]_{ m-1} = [\theta ]_{m-1},{}\end{array}$$
(35)
$$\displaystyle\begin{array}{rcl} & & \int _{I_{m}}2(\partial _{t}\theta,\tilde{\theta })\;dt + 2([\theta ]_{m-1},\tilde{\theta }_{+}^{m-1}) =\|\theta (z)\|^{2} -\|\theta _{ -}^{m-1}\|^{2} +\| [\theta ]_{ m-1}\|^{2}.{}\end{array}$$
(36)

We use (35) and (36) and the inequality \(\|\theta _{-}^{m-1}\| \leq \sup _{t\in I_{m-1}}\|\theta (t)\|\) to bound the terms arising in the time derivative. We proceed by moving \(2\int _{I_{m}}a_{h}(\theta,\tilde{\theta })\;dt\) to the right-hand side. We employ (27) for the diffusion term and the proof of [15, Theorem 5.1] for the convection term. The reaction term and the control on the right-hand side are bounded by using the Cauchy-Schwarz and Young inequalities together with (21) and (19), so that \(\|\cdot \|^{2} \leq C\vert \vert \vert \cdot \vert \vert \vert _{DG}^{2}\) is satisfied for a positive constant C. We eliminate the term \(\vert \vert \vert \tilde{\theta }\vert \vert \vert _{DG}^{2}\) on the right-hand side by using (29). Then, we obtain the following inequality

$$\displaystyle\begin{array}{rcl} & & \sup _{t\in I_{m}}\|\theta (t)\|^{2} -\sup _{ t\in I_{m-1}}\|\theta (t)\|^{2} \\ & & \leq C_{b}\int _{I_{m}}\vert \vert \vert \theta \vert \vert \vert _{DG}^{2}\;dt +\int _{ I_{m}}\sum \limits _{K\in \mathcal{T}_{h}}\left (\|\theta \|_{\partial K^{+}\cap \varGamma ^{+}}^{2} +\| \left [\!\left [\theta \right ]\!\right ]\|_{ \partial K^{-}\setminus \varGamma ^{-}}^{2}\right )\;dt \\ & & \quad + C\int _{I_{m}}\|u - u_{\delta }\|^{2}\;dt \\ & & \leq C_{b}^{{\prime}}\int _{ I_{m}}\left (\vert \vert \vert \theta \vert \vert \vert _{DG}^{2} +\sum \limits _{ K\in \mathcal{T}_{h}}\left (\|\theta \|_{\partial K^{+}\cap \varGamma ^{+}}^{2} +\| \left [\!\left [\theta \right ]\!\right ]\|_{ \partial K^{-}\setminus \varGamma ^{-}}^{2}\right )\right )\;dt \\ & & \quad + C\int _{I_{m}}\|u - u_{\delta }\|^{2}\;dt, {}\end{array}$$
(37)

where \(C_{b} = C(1 + C_{D})(\epsilon C_{A} + C_{S}(C_{r} + C_{\boldsymbol{\beta }})),C_{b}^{{\prime}} =\max \{ 1,C_{b}\}\). In order to eliminate the terms \(\theta\) on the right-hand side of (37), we use (34) multiplying it by \(C_{b}^{{\prime\prime}} = \frac{2} {\epsilon } C_{b}^{{\prime}}\). By adding these inequalities and denoting \(\varTheta _{m}\ =\ \sup _{t\in I_{m}}\|\theta (t)\|^{2} + C_{b}^{{\prime\prime}}\|\theta _{-}^{m}\|^{2}\), we arrive at

$$\displaystyle{ \varTheta _{m} -\varTheta _{m-1} \leq C(1 + C_{b}^{{\prime\prime}})\int _{ I_{m}}\|u - u_{\delta }\|^{2}\;dt. }$$
(38)

We sum (38) over \(m = 1,\ldots,n \leq N_{T}\) and use \(\theta = 0\) at t = 0 to derive the estimate

$$\displaystyle{ \sup _{t\in I_{n}}\|\theta (t)\|^{2} =\sup _{ t\in I_{n}}\|y_{\delta }^{u}(t) - y_{\delta }(t)\|^{2} \leq C\int _{ 0}^{t_{n} }\|u - u_{\delta }\|^{2}\;dt. }$$
(39)

Secondly, we proceed with the adjoint equation, subtracting (16) from (30) and using \(\zeta = p_{\delta }^{u} - p_{\delta }\). A discrete approximation to \(\chi _{(t,t_{m}]}v_{h}\) specified for the adjoint problem must be used, as we discussed in Sect. 5.1. Then, this leads to

$$\displaystyle{ \int _{I_{m}}2(-\partial _{t}\zeta,\tilde{\zeta })\;dt - 2([\zeta ]_{m},\tilde{\zeta }_{-}^{m}) =\|\zeta (z)\|^{2} -\|\zeta ^{m}\|^{2} +\| [\zeta ]_{ m}\|^{2}, }$$
(40)

where \(z =\arg \sup _{\bar{I}_{m}}\|\zeta (t)\|\). In addition, the inequalities \(\|\zeta ^{m}\|^{2} \leq \sup _{I_{N_{ T}-m+2}}\|\zeta (t)\|^{2}\) and \(\|\zeta (z)\|^{2} =\sup _{I_{N_{ T}-m+1}}\|\zeta (t)\|^{2}\) are needed. Then, we follow the same idea used to derive (39) to reach the inequality

$$\displaystyle{ \sup _{t\in I_{N_{ T}-m+1}}\|\zeta (t)\|^{2} -\sup _{ t\in I_{N_{T}-m+2}}\|\zeta (t)\|^{2} \leq Ck_{ m}\int _{t\in I_{m}}\|u - u_{\delta }\|^{2}\;dt. }$$
(41)

We sum (41) over \(m = N_{T},\ldots,n \geq 1\) and use \(\zeta = 0\) at \(t = t_{N_{T}}\). The final result (31) follows from standard algebra, (39) and (41). ⊓⊔

We proceed with the estimate between the exact and the approximate control.

Lemma 6

Let \((y,p,u)\) and \((y_{\delta },p_{\delta },u_{\delta })\) be the solutions of (2) and (16), respectively. Then, we have

$$\displaystyle{ \|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))} \leq \frac{1} {\alpha } \|p - p_{\delta }^{u}\|_{ L^{2}(0,T;L^{2}(\varOmega ))}. }$$
(42)

Proof

We apply the technique used for the steady-state optimal control problem in [21, Sect. 4.2]. We start using the continuous and the fully-discrete optimality conditions (2)–(16) to obtain the following equation

$$\displaystyle\begin{array}{rcl} & & \alpha \|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))}^{2} =\alpha \int \limits _{ 0}^{T}(u - u_{\delta },u - u_{\delta })\;dt \\ & =& \int \limits _{0}^{T}(\alpha u - p,u - u_{\delta })\;dt -\int \limits _{ 0}^{T}(\alpha u_{\delta } - p_{\delta },u - u_{\delta })\;dt +\int \limits _{ 0}^{T}(p - p_{\delta },u - u_{\delta })\;dt \\ & =& \int \limits _{0}^{T}(p - p_{\delta }^{u},u - u_{\delta })\;dt +\int \limits _{ 0}^{T}(p_{\delta }^{u} - p_{\delta },u - u_{\delta })\;dt = J_{ 1} + J_{2}. {}\end{array}$$
(43)

We use Cauchy-Schwarz and Young inequalities to show that

$$\displaystyle{ 0 \leq J_{1} \leq \frac{1} {2\alpha }\|p - p_{\delta }^{u}\|_{ L^{2}(0,T;L^{2}(\varOmega ))}^{2} + \frac{\alpha } {2}\|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))}^{2}. }$$
(44)

We proceed with \(J_{2}\) and use the auxiliary state equation (30) to obtain

$$\displaystyle\begin{array}{rcl} J_{2}& =& \int \limits _{0}^{T}(p_{\delta }^{u} - p_{\delta },u - u_{\delta })\;dt {}\\ & =& \sum _{m=1}^{N_{T} }\int _{I_{m}}(\partial _{t}(y_{\delta }^{u} - y_{\delta }),p_{\delta }^{u} - p_{\delta })\;dt +\int \limits _{ 0}^{T}a_{ h}^{s}(y_{\delta }^{u} - y_{\delta },p_{\delta }^{u} - p_{\delta })\;dt {}\\ & +& \sum \limits _{m=1}^{N}\big([y_{\delta }^{u} - y_{\delta }]_{ m-1},(p_{\delta }^{u} - p_{\delta })_{ +}^{m-1}\big). {}\\ \end{array}$$

We proceed applying integration by parts in time and use the auxiliary adjoint equation (30) to arrive at

$$\displaystyle\begin{array}{rcl} J_{2}& =& -\sum _{m=1}^{N_{T} }\int _{I_{m}}\big(p_{\delta }^{u} - p_{\delta },\partial _{ t}(y_{\delta }^{u} - y_{\delta })\big)\;dt +\sum \limits _{ m=1}^{N}\big(y_{\delta }^{u} - y_{\delta },p_{\delta }^{u} - p_{\delta }\big)\vert _{ t_{m-1}}^{t_{m} } \\ & +& \int \limits _{0}^{T}a_{ h}^{s}(y_{\delta }^{u} - y_{\delta },p_{\delta }^{u} - p_{\delta })\;dt +\sum \limits _{ m=1}^{N}\big([y_{\delta }^{u} - y_{\delta }]_{ m-1},(p_{\delta }^{u} - p_{\delta })_{ +}^{m-1}\big) \\ & =& -\sum _{m=1}^{N_{T} }\int _{I_{m}}\big(p_{\delta }^{u} - p_{\delta },\partial _{ t}(y_{\delta }^{u} - y_{\delta })\big)\;dt +\int \limits _{ 0}^{T}a_{ h}^{s}(y_{\delta }^{u} - y_{\delta },p_{\delta }^{u} - p_{\delta })\;dt \\ & -& \sum \limits _{m=1}^{N}\big((y_{\delta }^{u} - y_{\delta })_{ -}^{m},[p_{\delta }^{u} - p_{\delta }]_{ m}\big) \\ & =& -\int \limits _{0}^{T}\big(y_{\delta }^{u} - y_{\delta },y_{\delta }^{u} - y_{\delta }\big)\;dt \leq 0. {}\end{array}$$
(45)

Then, using (43)–(45), we derive the final result (42). ⊓ ⊔
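The last step can be written out as follows; this is a sketch of the standard absorption argument, using only (43), (44) and \(J_{2} \leq 0\) from (45):

$$\displaystyle\begin{array}{rcl} \alpha \|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))}^{2}& =& J_{1} + J_{2} \leq J_{1} \leq \frac{1} {2\alpha }\|p - p_{\delta }^{u}\|_{ L^{2}(0,T;L^{2}(\varOmega ))}^{2} + \frac{\alpha } {2}\|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))}^{2}, \\ \frac{\alpha } {2}\|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))}^{2}& \leq & \frac{1} {2\alpha }\|p - p_{\delta }^{u}\|_{ L^{2}(0,T;L^{2}(\varOmega ))}^{2}. \end{array}$$

Multiplying by \(2/\alpha\) and taking square roots gives (42).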

Lemma 7

Let \((y,p)\) and \((y_{\delta }^{u},p_{\delta }^{u})\) be the solutions of (2) and (30), respectively. Assume that \(y,p \in W^{q+1,\infty }(0,T;H^{1}(\varOmega )) \cap L^{\infty }(0,T;H^{p+1}(\varOmega ))\). Then, there exists a constant C independent of h and k such that

$$\displaystyle{ \sup _{t\in I_{n}}\|y - y_{\delta }^{u}\| +\sup _{ t\in I_{n}}\|p - p_{\delta }^{u}\| \leq \mathcal{O}(h^{p} + k^{q+1}). }$$
(46)

Proof

Firstly, we integrate (2) over \(I_{m}\) and subtract the result from (30) in order to obtain the following equation

$$\displaystyle\begin{array}{rcl} & & \int _{I_{m}}(\partial _{t}\xi,v_{\delta })\;dt + ([\xi ]_{m-1},v_{\delta,+}^{m-1}) +\int _{ I_{m}}a_{h}^{s}(\xi,v_{\delta })\;dt \\ & & = -\left (\int _{I_{m}}(\partial _{t}\eta,v_{\delta })\;dt + ([\eta ]_{m-1},v_{\delta,+}^{m-1})\right ) -\int _{ I_{m}}a_{h}(\eta,v_{\delta })\;dt,{}\end{array}$$
(47)

where \(y - y_{\delta }^{u} = (y -\pi y) + (\pi y - y_{\delta }^{u}) =\eta +\xi\).

Since we use the same mesh on each time interval, (24) leads to the following identity.

$$\displaystyle{ \int _{I_{m}}(\partial _{t}\eta,v_{\delta })\;dt + ([\eta ]_{m-1},v_{\delta,+}^{m-1}) = 0,\quad \forall v_{\delta } \in V _{ h}^{k,q}. }$$
(48)

We proceed as in the proof of Lemma 5 and the proof of [15, Theorem 5.1] by inserting the estimate (26) to obtain

$$\displaystyle\begin{array}{rcl} & & \int _{I_{m}}(\partial _{t}\xi,v_{\delta })\;dt + ([\xi ]_{m-1},v_{\delta,+}^{m-1}) +\int _{ I_{m}}a_{h}^{s}(\xi,v_{\delta })\;dt \\ & & \leq \frac{\epsilon } {4}\int _{I_{m}}\vert \vert \vert v_{\delta }\vert \vert \vert _{DG}^{2}\;dt + \frac{C_{0}} {2} \int _{I_{m}}\|v_{\delta }\|^{2}\;dt \\ & & +\frac{1} {2}\int _{I_{m}}\sum \limits _{K\in \mathcal{T}_{h}}\big(\|v_{\delta }\|_{\partial K^{+}\cap \varGamma ^{+}}^{2} +\| \left [\!\left [v_{\delta }\right ]\!\right ]\|_{ \partial K^{-}\setminus \varGamma ^{-}}^{2}\big)\;dt \\ & & +k_{m}C_{A}C_{\pi }(h^{2p} + k^{2q+2})\vert y\vert _{ R}^{2} + k_{ m}2C_{\boldsymbol{\beta }}C_{\pi }C_{M}(h^{2p+1} + k^{2q+2})\vert y\vert _{ R}^{2} \\ & & +k_{m}C_{\pi }\frac{C_{\boldsymbol{\beta }}C_{r}} {C_{0}} (h^{2p+2} + k^{2q+2})\vert y\vert _{ R}^{2}, {}\end{array}$$
(49)

where \(\vert y\vert _{R} =\max (\vert y\vert _{W^{q+1,\infty }(I_{m};H^{1}(\varOmega ))},\vert y\vert _{L^{\infty }(I_{m};H^{p+1}(\varOmega ))})\).

Firstly, we shall substitute \(v_{\delta } = 2\xi\) into (49) to obtain

$$\displaystyle\begin{array}{rcl} & & \|\xi _{-}^{m}\|^{2} -\|\xi _{ -}^{m-1}\|^{2} + \frac{\epsilon } {2}\int _{I_{m}}\vert \vert \vert \xi \vert \vert \vert _{DG}^{2}\;dt + C_{ 0}\int _{I_{m}}\|\xi \|^{2}\;dt \\ & & +\int _{I_{m}}\sum \limits _{K\in \mathcal{T}_{h}}\left (\|\xi \|_{\partial K^{-}\cap \varGamma ^{-}}^{2} + \frac{1} {2}\|\left [\!\left [\xi \right ]\!\right ]\|_{\partial K^{-}\setminus \varGamma ^{-}}^{2} + \frac{1} {2}\|\xi \|_{\partial K^{+}\cap \varGamma ^{+}}^{2}\right )\;dt \\ & & \leq k_{m}C_{b}(h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})\vert y\vert _{ R}^{2}, {}\end{array}$$
(50)

where \(C_{b} =\max \{ C_{A}C_{\pi },2C_{\boldsymbol{\beta }}C_{\pi }C_{M},C_{\pi }\frac{C_{\boldsymbol{\beta }}C_{r}} {C_{0}} \}\).

Secondly, we substitute \(v_{\delta } = 2\tilde{\xi }\) into (49) to obtain

$$\displaystyle\begin{array}{rcl} & & \sup _{t\in I_{m}}\|\xi (t)\|^{2} -\sup _{ t\in I_{m-1}}\|\xi (t)\|^{2} \\ & & \leq C_{b}^{{\prime}}\int _{ I_{m}}\vert \vert \vert \xi \vert \vert \vert _{DG}^{2}\;dt +\int _{ I_{m}}\sum \limits _{K\in \mathcal{T}_{h}}\left (\|\left [\!\left [\xi \right ]\!\right ]\|_{\partial K^{-}\setminus \varGamma ^{-}}^{2} +\|\xi \|_{ \partial K^{+}\cap \varGamma ^{+}}^{2}\right )\;dt \\ & & +k_{m}C_{b}(h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})\vert y\vert _{ R}^{2} \\ & & \leq C_{b}^{{\prime\prime}}\int _{ I_{m}}\left (\vert \vert \vert \xi \vert \vert \vert _{DG}^{2} +\sum \limits _{ K\in \mathcal{T}_{h}}\left (\|\left [\!\left [\xi \right ]\!\right ]\|_{\partial K^{-}\setminus \varGamma ^{-}}^{2} +\|\xi \|_{ \partial K^{+}\cap \varGamma ^{+}}^{2}\right )\right )\;dt \\ & & +k_{m}C_{b}(h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})\vert y\vert _{ R}^{2}, {}\end{array}$$
(51)

where \(C_{b}^{{\prime}} = C(1 + C_{D})(\epsilon C_{A} + C_{S}(C_{\boldsymbol{\beta }} + C_{r})),C_{b}^{{\prime\prime}} =\max \{ 1,C_{b}^{{\prime}}\}\). Now, we proceed as in the proof of Lemma 5. We multiply (50) by \(C_{b}^{{\prime\prime\prime}} = \frac{2} {\epsilon } C_{b}^{{\prime\prime}}\) in order to eliminate the terms \(\xi\) on the right-hand side of (51). Then, we add it to (51) and denote \(\varTheta _{m} =\sup _{t\in I_{m}}\|\xi (t)\|^{2} + C_{b}^{{\prime\prime\prime}}\|\xi _{-}^{m}\|^{2}\) in order to obtain

$$\displaystyle{ \varTheta _{m} -\varTheta _{m-1} \leq k_{m}2C_{b}^{{\prime\prime\prime}}(h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})\vert y\vert _{ R}^{2}. }$$
(52)

We sum (52) over \(m = 1,\ldots,n \leq N_{T}\) to obtain

$$\displaystyle{ \sup _{t\in I_{n}}\|\xi (t)\|^{2} \leq \mathcal{O}(h^{2p} + k^{2q+2}). }$$
(53)

Thirdly, we integrate (2) over \(I_{m}\), subtract it from (30) and denote \(p - p_{\delta }^{u} = (p -\pi p) + (\pi p - p_{\delta }^{u}) =\varphi +\mu\). Then, we use the idea in the proof of (53) in order to derive

$$\displaystyle{ \sup _{t\in I_{N-m+1}}\|\mu (t)\|^{2} -\sup _{ t\in I_{N-m+2}}\|\mu (t)\|^{2} \leq Ck_{ m}\sup _{t\in I_{m}}\|\xi (t)\|^{2}\;dt + \mathcal{O}(h^{2p} + k^{2q+2}), }$$
(54)

for C > 0. The resulting inequality is summed over \(m = N_{T},\ldots,n \geq 1\). Then, it is combined with (53) to derive the final result (46). ⊓⊔

Remark 2

For guaranteeing the assumptions on the exact solution, it is necessary to require a higher regularity of the data of the problem.

We state the main estimate of this study by combining Lemmas 5, 6, and 7.

Theorem 1

Suppose that \((y,p,u)\) and \((y_{\delta },p_{\delta },u_{\delta })\) are the solutions of (2) and (16), respectively. We assume that all conditions of Lemmas 5, 6 and 7 are satisfied. Then, there exists a constant C independent of h and k such that

$$\displaystyle{ \|y - y_{\delta }\|_{L^{\infty }(0,T;L^{2}(\varOmega ))} +\| p - p_{\delta }\|_{L^{\infty }(0,T;L^{2}(\varOmega ))} +\| u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))} \leq C\left (h^{p} + k^{q+1}\right ). }$$
(55)
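A sketch of how the three lemmas may be combined is given below; C denotes a generic constant and the intermediate steps are an illustration under the stated assumptions, not a full proof. Lemmas 6 and 7 bound the control error,

$$\displaystyle{ \|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))} \leq \frac{1} {\alpha } \|p - p_{\delta }^{u}\|_{ L^{2}(0,T;L^{2}(\varOmega ))} \leq \frac{\sqrt{T}} {\alpha } \sup _{t\in (0,T)}\|p - p_{\delta }^{u}\| \leq C(h^{p} + k^{q+1}). }$$

For the state, the triangle inequality, Lemmas 5 and 7 and the Cauchy-Schwarz inequality in time yield

$$\displaystyle\begin{array}{rcl} \sup _{t\in I_{n}}\|y - y_{\delta }\|& \leq & \sup _{t\in I_{n}}\|y - y_{\delta }^{u}\| +\sup _{t\in I_{n}}\|y_{\delta }^{u} - y_{\delta }\| \\ & \leq & C(h^{p} + k^{q+1}) + C\sqrt{T}\,\|u - u_{\delta }\|_{L^{2}(0,T;L^{2}(\varOmega ))} \leq C(h^{p} + k^{q+1}), \end{array}$$

and the adjoint error \(p - p_{\delta }\) is treated in the same way.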

In Theorem 1, the error in the state and control is measured with respect to the norms \(L^{\infty }(0,T;L^{2}(\varOmega ))\) and \(L^{2}(0,T;L^{2}(\varOmega ))\), respectively. The same norms are used, for example, in the study of Fu [16]. The former norm is due to the discrete characteristic function, which is used to provide error estimates at arbitrary time points. The latter norm arises from the optimality condition, as shown in Lemma 6. On the other hand, we observe that the estimate in Theorem 1 is optimal in time but suboptimal in space in the \(L^{\infty }(0,T;L^{2}(\varOmega ))\) norm for the state and the \(L^{2}(0,T;L^{2}(\varOmega ))\) norm for the control, i.e. \(\mathcal{O}(h^{p},k^{q+1})\), using p-degree spatial and q-degree temporal polynomial approximation. However, for example, the optimal spatial convergence rate for SIPG discretization combined with backward Euler is achieved using an elliptic projection in [1]. The first reason behind the order reduction in this study is the estimate (26) for the space-time projection, which is employed to bound the continuity estimate of the bilinear form in Lemma 3. The convection term also has an influence on the spatial order reduction, since we follow the proof of [15, Theorem 5.1]. After eliminating the effect of the space-time projection in the bilinear form of the diffusion term, this suboptimal estimate can be improved as in [1].

6 Numerical Results

In this section, we present some numerical results. We measure the error in the state and the control in terms of the \(L^{\infty }(0,1;L^{2}(\varOmega ))\) and \(L^{2}(0,1;L^{2}(\varOmega ))\) norms, respectively. We have used discontinuous piecewise linear polynomials in space. In all numerical examples, we have taken h = k.

We note that, in the case of dG(0) method, the approximating polynomials are piecewise constant in time and the resulting scheme is a version of the backward Euler method with a modified right-hand side [31, Chap. 12]:

$$\displaystyle\begin{array}{rcl} (M + kA^{s})y_{ h,m}& =& My_{h,m-1} + \frac{k} {2}(f_{h,m} + f_{h,m-1}) + \frac{k} {2}M(u_{h,m} + u_{h,m-1}), {}\\ (M + kA^{a})p_{ h,m-1}& =& Mp_{h,m} -\frac{k} {2}M(y_{h,m} + y_{h,m-1}) + \frac{k} {2}(y_{h,m}^{d} + y_{ h,m-1}^{d}). {}\\ \end{array}$$
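To illustrate the state update above, the following sketch applies it to a scalar model problem \(y' + ay = f + u\), so that the mass matrix reduces to M = 1 and the stiffness matrix to \(A^{s} = a\). The model problem, the function names `dg0_step` and `integrate` and the chosen data are assumptions made only for this illustration.

```python
import math


def dg0_step(y_prev, k, a, f_prev=0.0, f_now=0.0, u_prev=0.0, u_now=0.0):
    """One dG(0) step for the scalar model y' + a*y = f + u (M = 1, A^s = a).

    The right-hand side uses the averaged source and control values,
    as in the state update displayed above.
    """
    rhs = y_prev + 0.5 * k * (f_now + f_prev) + 0.5 * k * (u_now + u_prev)
    return rhs / (1.0 + k * a)


def integrate(n_steps, a=1.0, final_time=1.0, y0=1.0):
    """March the homogeneous problem y' + a*y = 0 from t = 0 to final_time."""
    k = final_time / n_steps
    y = y0
    for _ in range(n_steps):
        y = dg0_step(y, k, a)
    return y


# Halving the step size should roughly halve the error (first order in time).
errors = [abs(integrate(n) - math.exp(-1.0)) for n in (40, 80)]
ratio = errors[0] / errors[1]
```

The ratio of the two errors is close to 2, which is the behaviour expected from a first order method in time.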

For dG(1) method, we use piecewise linear polynomials in time. The resulting linear system for the state on each time interval is given as follows [31, Chap. 12]:

$$\displaystyle{ \left (\begin{array}{cc} M + kA^{s}& M + \frac{k} {2} A^{s} \\ \frac{k} {2} A^{s} &\frac{1} {2}M + \frac{k} {3} A^{s} \\ \end{array} \right )\left (\begin{array}{c} Y _{0} \\ Y _{1}\\ \end{array} \right ) = \left (\begin{array}{c} My_{h,m-1} + \frac{k} {2} (f_{h,m} + f_{h,m-1}) + \frac{k} {2} M(u_{h,m} + u_{h,m-1}) \\ \frac{k} {2} (f_{h,m} + Mu_{h,m})\\ \end{array} \right ), }$$
(56)

where \(A^{s}\) and M are the stiffness and the mass matrices of the state equation, respectively. We obtain the solution at the time step \(t_{m}\) as \(y_{h,m} = Y_{0} + Y_{1}\). For the adjoint equation, we have the following linear system:

$$\displaystyle{ \left (\begin{array}{cc} M + kA^{a}& M + \frac{k} {2} A^{a} \\ \frac{k} {2} A^{a} &\frac{1} {2}M + \frac{k} {3} A^{a} \\ \end{array} \right )\left (\begin{array}{c} P_{1} \\ P_{0}\\ \end{array} \right ) = \left (\begin{array}{c} Mp_{h,m} -\frac{k} {2} M(y_{h,m} + y_{h,m-1}) + \frac{k} {2} (y_{h,m}^{d} + y_{ h,m-1}^{d}) \\ -\frac{k} {2} (My_{h,m-1} - y_{h,m-1}^{d})\\ \end{array} \right ), }$$
(57)

where \(A^{a}\) is the stiffness matrix for the adjoint equation. We obtain the adjoint at the time step \(t_{m-1}\) as \(p_{h,m-1} = P_{0} + P_{1}\).
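The structure of the state system (56) can be checked on a scalar model problem \(y' + ay = f + u\) with M = 1 and \(A^{s} = a\). The sketch below solves the resulting 2 × 2 system by Cramer's rule; the model problem and the function name `dg1_step` are assumptions for illustration only. For f = u = 0 the update reduces to multiplication by \((1 + z/3)/(1 - 2z/3 + z^{2}/6)\) with \(z = -ka\), which is used as a check.

```python
from fractions import Fraction


def dg1_step(y_prev, k, a, f_prev=0, f_now=0, u_prev=0, u_now=0):
    """One dG(1) step of system (56) for the scalar model y' + a*y = f + u.

    Scalar stand-ins: M = 1 and A^s = a. Returns y_m = Y0 + Y1.
    """
    # Entries of the 2x2 coefficient matrix in (56).
    a11 = 1 + k * a
    a12 = 1 + k * a / 2
    a21 = k * a / 2
    a22 = Fraction(1, 2) + k * a / 3
    # Right-hand side of (56).
    b1 = y_prev + k * (f_now + f_prev) / 2 + k * (u_now + u_prev) / 2
    b2 = k * (f_now + u_now) / 2
    # Cramer's rule for the two unknowns Y0 and Y1.
    det = a11 * a22 - a12 * a21
    y0_coeff = (b1 * a22 - a12 * b2) / det
    y1_coeff = (a11 * b2 - a21 * b1) / det
    return y0_coeff + y1_coeff


# Homogeneous check with k = 1/2, a = 1, so that z = -1/2.
k, a = Fraction(1, 2), Fraction(1)
z = -k * a
amplification = (1 + z / 3) / (1 - 2 * z / 3 + z * z / 6)
one_step = dg1_step(Fraction(1), k, a)
```

In this scalar setting `one_step` coincides with `amplification`, so the linear system (56) reproduces the expected rational approximation of \(\exp (z)\).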

The main drawback of dG time discretization is the solution of large coupled linear systems in block form. Because we are using constant time steps, the coupled block matrices on the left-hand side of (56) and (57) need to be decomposed (LU block factorization) only once, at the beginning of the integration. Then, the state and the adjoint equations are solved at each time step by forward elimination and back substitution using the block factorized matrices.
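The factor-once, solve-many strategy can be sketched as follows. The code is a minimal dense LU factorization with partial pivoting in exact arithmetic, applied to the scalar version of (56) with M = 1 and \(A^{s} = a\); it is an illustration only, and the names `lu_factor`, `lu_solve` and `march` are hypothetical. A practical implementation would use a sparse or block factorization instead.

```python
from fractions import Fraction


def lu_factor(A):
    """Return (LU, perm): in-place LU factors of A with partial pivoting."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(LU[r][col]))
        if LU[pivot][col] == 0:
            raise ValueError("matrix is singular")
        LU[col], LU[pivot] = LU[pivot], LU[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            LU[r][col] = LU[r][col] / LU[col][col]
            for c in range(col + 1, n):
                LU[r][c] -= LU[r][col] * LU[col][c]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b using the factors from lu_factor."""
    n = len(LU)
    x = [b[p] for p in perm]
    # Forward elimination with the unit lower triangular factor.
    for r in range(n):
        for c in range(r):
            x[r] -= LU[r][c] * x[c]
    # Back substitution with the upper triangular factor.
    for r in reversed(range(n)):
        for c in range(r + 1, n):
            x[r] -= LU[r][c] * x[c]
        x[r] = x[r] / LU[r][r]
    return x


def march(n_steps, k=Fraction(1, 2), a=Fraction(1), y0=Fraction(1)):
    """dG(1) time stepping for y' + a*y = 0, reusing a single factorization."""
    matrix = [[1 + k * a, 1 + k * a / 2],
              [k * a / 2, Fraction(1, 2) + k * a / 3]]
    LU, perm = lu_factor(matrix)  # done once, before the time loop
    y = y0
    for _ in range(n_steps):
        coeffs = lu_solve(LU, perm, [y, Fraction(0)])
        y = coeffs[0] + coeffs[1]
    return y
```

Only `lu_solve` is called inside the time loop, which mirrors the forward elimination and back substitution performed at each time step.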

Example 1

The first example is a convection dominated OCP with smooth solutions. It is converted to an unconstrained optimal control problem [17, Ex. 1] by adding the reaction term with

$$\displaystyle{Q = (0,1]\times \varOmega,\;\varOmega = (0,1)^{2},\;\epsilon = 10^{-5},\;\beta = (1,0)^{T},\;r = 1,\;\alpha = 1.}$$

The source function f, the desired state \(y^{d}\) and the initial condition \(y_{0}\) are computed from (2) using the following exact solutions of the state and the control, respectively,

$$\displaystyle\begin{array}{rcl} y(x_{1},x_{2},t)& =& \exp (-t)\sin (2\pi x_{1})\sin (2\pi x_{2}), {}\\ u(x_{1},x_{2},t)& =& \exp (-t)(1 - t)\sin (2\pi x_{1})\sin (2\pi x_{2}). {}\\ \end{array}$$

In Table 1, errors and convergence rates for the dG(0) and backward Euler methods are shown. We observe that the first order convergence rate is achieved in time, due to the dominance of temporal errors.

Table 1 Example 1 by dG(0) and backward Euler (in parentheses) method
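Convergence rates of this kind are usually obtained from the errors on two consecutive meshes as \(\log (e_{i}/e_{i+1})/\log (h_{i}/h_{i+1})\). The following sketch shows the computation; the function name `experimental_orders` is hypothetical and the error values are made up for illustration, they are not taken from the tables of this study.

```python
import math


def experimental_orders(mesh_sizes, errors):
    """Experimental order of convergence between consecutive refinement levels."""
    return [
        math.log(errors[i] / errors[i + 1])
        / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        for i in range(len(errors) - 1)
    ]


# Hypothetical data: the error drops by a factor of 4 when h = k is halved.
mesh_sizes = [1 / 10, 1 / 20, 1 / 40]
errors = [4.0e-2, 1.0e-2, 2.5e-3]
rates = experimental_orders(mesh_sizes, errors)
```

With these hypothetical values both computed rates are approximately 2, i.e. second order.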

In Table 2, errors and convergence rates for the Crank-Nicolson method obtained by the OD and DO approaches are shown. For the Crank-Nicolson method with the OD approach, the second order convergence rate is achieved. However, for the DO approach, the discretization of the right-hand side of the adjoint equation (14) by a one-step method is reflected in the numerical results and the quadratic order of convergence is not observed.

Table 2 Example 1 by Crank-Nicolson method, OD and DO approach (in parentheses)

In Table 3, we present numerical results for dG(1) time discretization. The numerical results indicate a higher experimental order of convergence, namely \(\mathcal{O}(h^{2})\), than the one shown in Theorem 1, which is \(\mathcal{O}(h)\) with h = k. The error in the state is smaller than for the Crank-Nicolson method with OD approach, while the error in the control is close to that of the Crank-Nicolson method with OD approach.

Table 3 Example 1 by dG(1) method

Example 2

The second example is a convection dominated OCP adapted from [16, Ex. 2] with

$$\displaystyle{Q = (0,1]\times \varOmega,\;\varOmega = (0,1)^{2},\;\epsilon = 10^{-5},\;\beta = (0.5,0.5)^{T},\;r = 3,\;\alpha = 1.}$$

The source function f, the desired state \(y^{d}\) and the initial condition \(y_{0}\) are computed from (2) using the following exact solutions of the control and state, respectively,

$$\displaystyle\begin{array}{rcl} u(x_{1},x_{2},t)& =& \sin (\pi t)\sin (2\pi x_{1})\sin (2\pi x_{2})\exp \left (\frac{-1 +\cos (t_{x})} {\sqrt{\varepsilon }} \right ), {}\\ y(x_{1},x_{2},t)& =& u\left ( \frac{1} {2\sqrt{\varepsilon }}\sin (t_{x}) + 8\varepsilon \pi ^{2} + \frac{\sqrt{\varepsilon }} {2} \cos (t_{x}) -\frac{1} {2}\sin ^{2}(t_{ x})\right ) {}\\ & -& \pi \cos (\pi t)\sin (2\pi x_{1})\sin (2\pi x_{2})\exp \left (\frac{-1 +\cos (t_{x})} {\sqrt{\varepsilon }} \right ), {}\\ \end{array}$$

where \(t_{x} = t - 0.5(x_{1} + x_{2})\). As opposed to the previous example, the exact solution of the PDE constraint depends on the diffusion explicitly and the problem is highly convection dominated. This example cannot be solved properly by using the dG(0) and backward Euler methods. Therefore, we present numerical results for the Crank-Nicolson method in Table 4, where the differences between OD and DO can be seen clearly. The DO approach causes order reduction in the control. However, due to the convection dominated nature of the problem, the quadratic convergence rate cannot be achieved with the OD approach, in contrast to Example 1. The orders of convergence correspond to those in [5].

Table 4 Example 2 by Crank-Nicolson method, OD and DO approach (in parentheses)

In Table 5, we present numerical results for dG(1) discretization. As opposed to the results in Table 4, the error in the state and the control are smaller than in the case of Crank-Nicolson. Numerical results indicate a better experimental order of convergence, namely \(\mathcal{O}(h^{2})\), than the theoretical error estimate in Theorem 1. Similar observations are made for nonstationary non-linear diffusion-convection equations for the SIPG spatial discretization in [20].

Table 5 Example 2 by dG(1) method

In Figs. 1 and 2, we present the error between the exact and the approximate solution at t = 0.5 obtained using the Crank-Nicolson DO approach and dG(1) discretization. These figures also show that dG(1) discretization solves the problem well.

Fig. 1 Example 2: Error at t = 0.5 with Crank-Nicolson (DO approach), h = k = 1∕80

Fig. 2 Example 2: Error at t = 0.5 with dG(1) method, h = k = 1∕80

7 Conclusion

For dG time discretization, the numerical results show that linear and quadratic convergence rates are achieved using piecewise discontinuous constant and linear polynomials in time, respectively, and DO and OD approaches commute. In a future work, we will study control constrained problem and derive the optimal convergence rates under lower regularity assumptions.

8 Outlook: Efficient Solvers for DG Time Discretization

Discontinuous Galerkin time stepping is used for solving linear and nonlinear OCPs by multiple shooting methods in [6, 18] because of the commutativity property of discretization and optimization. At each subinterval of multiple shooting, a very large system of linear or nonlinear equations has to be solved, which can be handled by iterative methods, such as Krylov subspace methods. In the references mentioned above, the first order dG(0) method is used, where for nonlinear problems at each Newton iteration step, a linear system of equations with the same structure as that of the implicit Euler method has to be solved. Higher order dG methods lead to coupled block systems and the number of unknowns grows linearly with increasing order. Therefore, for OCPs constrained by linear and nonlinear parabolic PDEs in several space dimensions, efficient solution techniques are needed. In the following, we will give an overview of the existing approaches by narrowing our discussion to 2 × 2 coupled block systems arising from different dG discretizations.

In the last decade, several variational time discretization methods were developed. The test spaces always consist of piecewise discontinuous polynomials. When the solution space consists of continuous piecewise polynomials of degree k and the test functions are piecewise discontinuous polynomials of degree k − 1, the resulting method is called continuous Galerkin discretization cGP(k). For the discontinuous Galerkin dG(k) method, both test and trial spaces are piecewise discontinuous polynomials of degree k. Advantages of variational time discretization are stability, convergence and space-time adaptivity. Both continuous and discontinuous Galerkin methods are A-stable; the discontinuous Galerkin methods are even L-stable (strongly stable). The convergence order of cGP(k) methods is one order higher than that of the dG(k) methods. Both of these methods are super-convergent at the nodal points, namely of order 2r + 1, when the order of the method is r and the solution of the problem is sufficiently regular [31, Chap. 12]. The time-space adaptivity can be easily implemented, because time is discretized with finite elements in the same way as space. Using a posteriori error estimates, adaptive hp time stepping and dynamic meshes (the use of different spatial discretization for each time step) can directly be incorporated in the discrete formulation [25]. We want to mention that dynamic meshes (meshes changing with time) were used by combining dG(0) time discretization with the multiple shooting method for linear and nonlinear OCPs in [18], whereas Carraro et al. [6] use fixed meshes for all discrete time levels.

As we have mentioned, the main disadvantage of variational time discretization is the large system of coupled equations as a result of space-time discretization. To illustrate this, we consider the semilinear parabolic initial value problem

$$\displaystyle{ \frac{du} {dt} = Au + f(u),\quad u(0) = u_{0}, }$$
(58)

where A is a linear second order elliptic differential operator and f(u) is locally Lipschitz continuous and monotone.

The 2×2 block system associated to dG discretization of (58) can be written in the following form:

$$\displaystyle{ \begin{array}{lll} \alpha _{1,1}MU_{n}^{1} +\alpha _{1,2}MU_{n}^{2} +\varDelta t\beta _{1,1}F(U_{n}^{1}) +\varDelta t\beta _{1,2}F(U_{n}^{2})& =&c_{1}MU_{0} + d_{1}F(U_{0}), \\ \alpha _{2,1}MU_{n}^{1} +\alpha _{2,2}MU_{n}^{2} +\varDelta t\beta _{2,1}F(U_{n}^{1}) +\varDelta t\beta _{2,2}F(U_{n}^{2})& =&c_{2}MU_{0} + d_{2}F(U_{0}), \end{array} }$$
(59)

where M is the mass matrix and F(⋅ )’s are dGFEM semi-discretized nonlinear terms of the right hand side of (58).

One step of the Newton iteration for solving the coupled system in (59) corresponds to solving the following 2 × 2 block system:

$$\displaystyle{ \left (\begin{array}{cc} \varDelta t\alpha _{1,1}M +\varDelta t\beta _{1,1}\bar{A}&\varDelta t\alpha _{1,2}M +\varDelta t\beta _{1,2}\bar{A} \\ \varDelta t\alpha _{2,1}M +\varDelta t\beta _{2,1}\bar{A}&\varDelta t\alpha _{2,2}M +\varDelta t\beta _{2,2}\bar{A} \end{array} \right )\left (\begin{array}{c} W_{n}^{1} \\ W_{n}^{2} \end{array} \right ) = \left (\begin{array}{c} R_{n}^{1} \\ R_{n}^{2} \end{array} \right ), }$$
(60)

where the vectors $W_{n}^{i}$ and $R_{n}^{i}$, for i = 1, 2, denote the Newton correction and the residual for a temporal basis function, respectively [25].
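To make the size of the coupled system concrete, the block matrix can be assembled with Kronecker products. The following is a minimal sketch for the linear case, assuming that the block entries have the form α_{i,j}M + Δt β_{i,j}Ā obtained by linearizing (59), that Ā is a stiffness matrix `K` (so that the semi-discrete problem is M u' + K u = 0), and that d_1 = d_2 = 0. The coefficient values belong to dG(1) with exact integration and a Lagrange basis at the interval endpoints; they are an illustrative choice, not the coefficients used in [25], and the helper name `dg1_block_step` is hypothetical.

```python
import numpy as np

# Assumed dG(1) coefficients (exact integration, endpoint Lagrange basis).
alpha = np.array([[0.5, 0.5],
                  [-0.5, 0.5]])
beta = np.array([[1.0 / 3.0, 1.0 / 6.0],
                 [1.0 / 6.0, 1.0 / 3.0]])
c = np.array([1.0, 0.0])

def dg1_block_step(M, K, dt, u0):
    """Assemble and solve the coupled 2x2 block system for one time step
    of M u' + K u = 0 (linear case, so one Newton step is exact)."""
    n = M.shape[0]
    # Block (i, j) equals alpha[i, j] * M + dt * beta[i, j] * K.
    block = np.kron(alpha, M) + dt * np.kron(beta, K)
    rhs = np.kron(c, M @ u0)
    U = np.linalg.solve(block, rhs)
    return block, U[:n], U[n:]

# Small 1D heat-equation example (finite differences, lumped mass M = I).
n = 5
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
M = np.eye(n)
u0 = np.sin(np.pi * x)
block, U1, U2 = dg1_block_step(M, K, 0.01, u0)
```

The matrix `block` has twice as many rows and columns as the spatial matrices `M` and `K`, and its four blocks are coupled, which is why the solution strategies discussed in the text aim at reducing each step to systems with the structure of the implicit Euler method.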

In [35], the linear systems of equations arising from the dG(k) discretization of linear parabolic equations are decoupled into complex-valued linear systems that have the same structure as those of the implicit Euler discretization. Because existing finite element codes do not support complex arithmetic, the implementation would be difficult and costly. To avoid complex arithmetic, Richter et al. [25] developed an inexact Newton method for solving nonlinear parabolic PDEs discretized by dG(k) methods. At each time step, several linear systems with the same structure as for the implicit Euler discretization are solved.

Weller and Basting [34] suggest a different solution strategy for linear parabolic PDEs discretized by the dG(2) method approximated at Gauss-Radau points. The essential component $U_{n}^{2}$, which is the solution of the problem at the next time step, can be obtained by an inexact factorization of the Schur complement, owing to the property $\beta_{1,2} = \beta_{2,1} = 0$ in (59) and (60). Because the Schur complement is of fourth order, its condition number is worse than that of the original system. They apply a symmetric preconditioned conjugate gradient method, so that a number of linear systems with the same structure as in the implicit Euler discretization must be solved at each step. An attractive property of the method is that it can be applied to linear parabolic PDEs with non-self-adjoint operators, such as the diffusion-convection-reaction equation, because the Schur complement is symmetric. The efficiency of this solution technique for nonlinear parabolic problems remains to be tested.

Schieweck [27] introduced a continuous dG method in which the solution space consists of piecewise continuous polynomials of degree k ≥ 1 and the test space of piecewise discontinuous polynomials of degree k − 1, approximated at Gauss-Lobatto nodes. This technique is called the discontinuous Galerkin-Petrov dGP(k) method. Because the time derivative of the discrete solution is contained in the discrete test space, the method has an energy-decreasing property, so that it can be applied to gradient systems such as the Allen-Cahn and Cahn-Hilliard equations. Again, the essential unknown is $U_{n}^{2}$ for the dGP(2) method, owing to $\beta_{1,1} = 0$ in (59) and (60), and the solution can be determined by fixed point iteration. However, the linear system that must be solved at each time level involves powers of the mass and stiffness matrices and could be difficult to solve. Instead, a defect correction algorithm was introduced in [27], so that at each defect correction step, linear systems like those of the implicit Euler discretization have to be solved again.
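The decoupling into complex-valued systems of implicit Euler type, mentioned above for linear problems, can be illustrated on a small example. The following is only a sketch of the idea and not the algorithm of [35]: it assumes the linear semi-discrete problem M u' + K u = 0, dG(1) coefficients for exact integration with an endpoint Lagrange basis, and d_1 = d_2 = 0. The name `dg1_step_decoupled` is hypothetical.

```python
import numpy as np

# Assumed dG(1) coefficients (exact integration, endpoint Lagrange basis).
alpha = np.array([[0.5, 0.5],
                  [-0.5, 0.5]])
beta = np.array([[1.0 / 3.0, 1.0 / 6.0],
                 [1.0 / 6.0, 1.0 / 3.0]])
c = np.array([1.0, 0.0])

def dg1_step_decoupled(M, K, dt, u0):
    """One dG(1) step for M u' + K u = 0 via two decoupled complex solves.

    The coupled system (alpha x M + dt * beta x K) U = c x (M u0) is
    multiplied by alpha^{-1} and then diagonalized in the 2x2 time part.
    """
    G = np.linalg.solve(alpha, beta)   # alpha^{-1} beta
    g = np.linalg.solve(alpha, c)      # alpha^{-1} c
    lam, V = np.linalg.eig(G)          # complex conjugate eigenvalue pair
    w = np.linalg.solve(V, g)          # V^{-1} alpha^{-1} c
    Mu0 = M @ u0
    # Each solve has the structure of an implicit Euler step, but with a
    # complex "time step" dt * lam[i].
    Ut = np.array([np.linalg.solve(M + dt * lam[i] * K, w[i] * Mu0)
                   for i in range(2)])
    U = V @ Ut                         # transform back to the nodal basis
    # The exact solution of the real system is real; the imaginary parts
    # are round-off only.
    return lam, U[0].real, U[1].real

# Small 1D example with a linear finite element mass matrix.
n = 6
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
M = h / 6.0 * (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
u0 = x * (1.0 - x)
lam, U1, U2 = dg1_step_decoupled(M, K, 0.05, u0)
```

For these coefficients the eigenvalues in `lam` are 1/3 ± i√2/6, so the two decoupled systems cannot be solved in real arithmetic. This is the implementation obstacle that the real-valued strategies described in the text are designed to avoid.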