1 Introduction

The adaptive finite element method (AFEM for short), which can be attributed to the pioneering work of Babuška and Rheinboldt [2], has nowadays become a popular approach in the engineering and scientific computing community. It aims at distributing more mesh nodes in the areas where singularities occur, so as to save computational cost. Various types of reliable and efficient a posteriori error estimators, which are used to detect the location of singularities and are essential for the success of AFEM, have been developed in the last decades for different kinds of problems; we refer to [36] for an overview.

Although AFEM has been successfully applied for more than three decades, its convergence analysis is rather recent: it started with Dörfler [13] and was further studied in [6, 7, 31–33]. Besides convergence, optimality is another important issue in AFEM, which was first addressed by Binev et al. [6] and further studied by Stevenson [34, 35]. The so-called Dörfler marking proposed in [13] and the quasi-error introduced in [7], consisting of the sum of the energy error and the scaled estimator, are crucial to prove the contraction of the errors and the quasi-optimal cardinality of the standard AFEM, which avoids marking for oscillation [13] and circumvents the interior node property of mesh refinement [32, 33].

AFEM has also found successful application in optimal control problems governed by partial differential equations, starting with Liu and Yan [26] and Becker, Kapp and Rannacher [3]. In [3] the authors proposed a dual-weighted goal-oriented adaptivity for optimal control problems, while in [26] residual type a posteriori error estimates were derived. We refer to [17, 18, 24, 27–30] for more details on recent advances. Recently, Kohls, Rösch and Siebert derived in [22] an error equivalence property which enables one to derive reliable and efficient a posteriori error estimators for optimal control problems with either variational discretization or full control discretization.

There also exist some attempts to prove the convergence of AFEM for optimal control problems. In [14] the authors considered the piecewise constant approximation of the control variable and gave an error reduction property for the quadruplet \((u,y,p,\sigma )\), where u, y and p denote the optimal control, state and adjoint state variables and \(\sigma \) the associated co-control variable. However, additional requirements on the strict complementarity of the continuous problem and the non-degeneracy property of the discrete control problem were assumed, and the marking strategy was extended to include the discrete free boundary between the active and inactive control sets. In [4] the authors viewed the control problem as a nonlinear elliptic system in the state and adjoint state variables and gave a convergence proof for an adaptive algorithm involving the marking of data oscillations. In [23] the authors proved that the sequence of adaptively generated discrete solutions converges to the true solutions of optimal control problems, but obtained only plain convergence of the adaptive algorithm, without convergence rate or optimality. In this paper we intend to give a rigorous convergence proof for the adaptive finite element algorithm for elliptic optimal control problems in an optimal control framework. Compared to [4], the AFEM adopted in the current paper uses Dörfler's marking [13] and is a standard algorithm in that it employs only the error indicators and does not use the oscillation indicators. Moreover, for the convergence analysis we use the standard convergence results of AFEM for elliptic boundary value problems, so that the proof is more clear and rigorous.

Inspired by the work [11] of Dai, Xu and Zhou, where the convergence and optimality of AFEM for elliptic eigenvalue problems are proved by exploiting a certain relationship between the finite element eigenvalue approximation and the associated finite element boundary value approximation, in this paper we provide a rigorous convergence analysis of the adaptive finite element algorithm for optimal control problems governed by a linear elliptic equation. Under mild assumptions on the initial mesh from which the adaptive algorithm starts, we show that the energy norm errors of the state and adjoint state variables are equivalent to the boundary value approximation errors of the state and adjoint state equations up to a higher order term. Then, based on the well-known convergence result of AFEM for elliptic boundary value problems, we are able to prove the convergence of AFEM for optimal control problems (OCPs for short). To be more specific, the AFEM for OCPs is a contraction, between two consecutive adaptive loops, for the sum of the energy errors and the scaled error estimators of the state y and the adjoint state p. We also show that the AFEM yields a decay rate for the energy errors of the state y and the adjoint state p plus oscillations of the state and adjoint state equations, in terms of the number of degrees of freedom. This result is an improvement over the plain convergence result presented in [23].

We remark that we study the AFEM for OCPs with respect to energy norm errors for the state and adjoint state. In view of the a priori error estimates for optimal control problems [20], it may seem more natural to work with \(L^2\)-norm errors for the control, the state and the adjoint state, both for a posteriori error estimates and for the convergence analysis. However, the motivation to study AFEM in the energy norm in this paper is twofold. Firstly, up to now almost all AFEMs with guaranteed convergence are based on the energy norm error; the only contribution on convergent AFEM in the \(L^2\)-norm, to our knowledge, is [12] by Demlow and Stevenson, where the convergence result of AFEM in the energy norm is used to prove the convergence of AFEM in the \(L^2\)-norm by establishing a certain equivalence between the \(L^2\)-norm error and a weighted energy norm error. Secondly, to work with the \(L^2\)-norm we would have to assume \(H^2\)-regularity for the solution of the elliptic equation; this excludes the most interesting cases, where the domain may be non-convex or the coefficients of the elliptic operator may be discontinuous. Moreover, if we have \(H^2\)-regularity for the solution of the elliptic equation we can already achieve optimal a priori error estimates for the optimal control problems, and AFEM is then less attractive.

The rest of the paper is organised as follows. In Sect. 2 we recall some well-known results on the convergence analysis of AFEM for elliptic boundary value problems. In Sect. 3 we introduce the finite element approximation of the optimal control problems and derive a posteriori error estimates. We also present Dörfler's marking strategy and the adaptive algorithm for the optimal control problems. In Sect. 4 we give a rigorous convergence analysis of the AFEM for optimal control problems, and the quasi-optimal cardinality is proved in Sect. 5. Numerical experiments are carried out in Sect. 6 to validate our theoretical results. Finally, we draw conclusions in Sect. 7 and outline possible extensions and future work.

Let \(\Omega \subset \mathbb {R}^d\) (\(d=2,3\)) be a bounded polygonal or polyhedral domain which is not necessarily convex. We denote by \(W^{m,q}(\Omega )\) the usual Sobolev space of order \(m\geqslant 0\), \(1\leqslant q<\infty \) with norm \(\Vert \cdot \Vert _{m,q,\Omega }\) and seminorm \(|\cdot |_{m,q,\Omega }\). For \(q=2\) we denote \(W^{m,2}(\Omega )\) by \(H^m(\Omega )\) and \(\Vert \cdot \Vert _{m,\Omega }=\Vert \cdot \Vert _{m,2,\Omega }\); this is a Hilbert space. Note that \(H^0(\Omega )=L^2(\Omega )\) and \(H_0^1(\Omega )=\{v\in H^1(\Omega ):\ v=0\ \mathrm{on}\ \partial \Omega \}\). We denote by C a generic positive constant which may take different values at different occurrences but does not depend on the mesh size. We use the symbol \(A\lesssim B\) to denote \(A\leqslant CB\) for some constant C that is independent of the mesh size.

2 Preliminaries

In this section, we recall some well-known results on the adaptive finite element approximation to a linear elliptic boundary value problem, which are then used for the convergence analysis of AFEM for optimal control problems. Some of the results are collected from [7, 11], see also [16].

Consider the following second order elliptic equation

$$\begin{aligned} \left\{ \begin{array}{ll} Ly=f &{}\quad \mathrm{in}\ \Omega , \\ \ y=0 &{}\quad \mathrm{on}\ \partial \Omega , \end{array}\right. \end{aligned}$$
(2.1)

where \(f\in L^2(\Omega )\) and L is a linear second order elliptic operator of the following form:

$$\begin{aligned} Ly:= -\sum \limits _{i,j=1}^d\frac{\partial }{\partial x_j}\left( a_{ij}\frac{\partial y}{\partial x_i}\right) +a_0y. \end{aligned}$$

We denote \(L^*\) the adjoint operator of L

$$\begin{aligned} L^*y:= -\sum \limits _{i,j=1}^d\frac{\partial }{\partial x_j}\left( a_{ji}\frac{\partial y}{\partial x_i}\right) +a_0y. \end{aligned}$$

Here \(0\leqslant a_0<\infty \), \(a_{ij}\in W^{1,\infty }(\Omega )\) \((i,j=1,\ldots ,d)\) and \((a_{ij})_{d\times d}\) is symmetric and positive definite. We denote \(A=(a_{ij})_{d\times d}\) and \(A^*\) its adjoint. Let

$$\begin{aligned} a(y,v)=\int _\Omega \left( \sum \limits _{i,j=1}^da_{ij}\frac{\partial y}{\partial x_i}\frac{\partial v}{\partial x_j}+a_0yv\right) dx,\quad \forall y,v\in H_0^1(\Omega ). \end{aligned}$$

It is clear that \(a(\cdot ,\cdot )\) is a bounded bilinear form over \(H_0^1(\Omega )\) and defines a norm \(\Vert \cdot \Vert _{a,\Omega }=\sqrt{a(\cdot ,\cdot )}\) which is equivalent to \(\Vert \cdot \Vert _{1,\Omega }\).

The standard weak form of (2.1) reads as follows: Find \(y\in H_0^1(\Omega )\) such that

$$\begin{aligned} a(y,v)=(f,v)\quad \forall v\in H_0^1(\Omega ). \end{aligned}$$
(2.2)

For each \(f\in H^{-1}(\Omega )\) the above problem admits a unique solution by the well-known Lax–Milgram theorem. Since the elliptic equation (2.2) is linear with respect to the right-hand side f, we can define a linear and bounded solution operator \(S : L^2(\Omega ) \rightarrow H_0^1(\Omega )\) such that \(y = Sf\).

Let \(\mathcal {T}_h\) be a regular triangulation of \(\Omega \) such that \(\bar{\Omega }=\cup _{T\in \mathcal {T}_h}\bar{T}\). We assume that \(\mathcal {T}_h\) is shape regular in the following sense: there exists a constant \(\gamma ^*>0\) such that \(\frac{h_T}{\rho _T}\leqslant \gamma ^{*}\) for all \(T\in \mathcal {T}_h\), where \(h_T\) denotes the diameter of T and \(\rho _T\) the diameter of the largest ball contained in T. We set \(h=\max _{T\in \mathcal {T}_h}h_T\). In this paper, we use \(\mathcal {E}_h\) to denote the set of interior faces (edges or sides) of \(\mathcal {T}_h\) and \(\# \mathcal {T}_h\) to denote the number of elements of \(\mathcal {T}_h\).
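As a concrete illustration (not part of the analysis), the shape regularity ratio of a single triangle can be checked numerically. The following sketch, with a hypothetical helper name, computes \(h_T/\rho _T\) from the vertex coordinates, using the classical identity that the inradius of a triangle equals its area divided by its semi-perimeter:

```python
from math import dist, sqrt

def shape_regularity(p0, p1, p2):
    """Return h_T / rho_T for the triangle T with vertices p0, p1, p2.

    h_T is the diameter of T (its longest edge) and rho_T the diameter of
    the largest inscribed ball, i.e. twice the inradius Area / semi-perimeter.
    """
    a, b, c = dist(p1, p2), dist(p0, p2), dist(p0, p1)
    s = 0.5 * (a + b + c)                          # semi-perimeter
    area = sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula
    rho = 2.0 * area / s                           # inball diameter = 2 * (area / s)
    return max(a, b, c) / rho
```

For an equilateral triangle this ratio equals \(\sqrt{3}\), while thin, degenerating triangles yield arbitrarily large ratios, which is exactly what the bound \(h_T/\rho _T\leqslant \gamma ^*\) excludes.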

On \(\mathcal {T}_h\) we construct a family of nested finite element spaces \(V_{h}\) consisting of piecewise linear and continuous polynomials such that \(V_h\subset C(\bar{\Omega })\cap H_0^1(\Omega )\). We define the standard Galerkin projection operator \(\mathcal {R}_h: H_0^1(\Omega )\rightarrow V_h\) by [9]

$$\begin{aligned} a(y-\mathcal {R}_hy,v_h)=0\quad \forall v_h\in V_h, \end{aligned}$$
(2.3)

which satisfies the following stability result

$$\begin{aligned} \Vert \mathcal {R}_hy\Vert _{a,\Omega }\lesssim \Vert y\Vert _{a,\Omega }\quad \forall y\in H_0^1(\Omega ). \end{aligned}$$
(2.4)

A standard finite element approximation to (2.2) can then be formulated as: Find \(y_h\in V_h\) such that

$$\begin{aligned} a(y_h,v_h)=(f,v_h)\quad \forall v_h\in V_h. \end{aligned}$$
(2.5)

Similarly, we can define a discrete solution operator \(S_h : L^2(\Omega ) \rightarrow V_h\) such that \(y_h = S_hf\). Thus, we have \(y_h=\mathcal {R}_hy=\mathcal {R}_hSf\).

Following the idea of [11], we introduce the quantity \(\kappa (h)\) defined by

$$\begin{aligned} \kappa (h)=\sup \limits _{f\in L^2(\Omega ),\Vert f\Vert _{0,\Omega }=1}\inf \limits _{v_h\in V_h}\Vert Sf-v_h\Vert _{a,\Omega }. \end{aligned}$$
(2.6)

We note that the quantity \(\kappa (h)\) is determined by the regularity of Sf, which in turn depends on the properties of the domain \(\Omega \). Indeed, if the boundary of \(\Omega \) is smooth, e.g. of class \(\mathcal {C}^1\), the additional regularity \(Sf\in H^2(\Omega )\) holds and thus \(\kappa (h)=O(h)\). This is still true for polygonal or polyhedral boundaries if the domain is convex. The regularity is reduced, however, in the vicinity of non-convex portions of polygonal or polyhedral boundaries. Grisvard proved in [15] precise regularity results (Theorem 2.4.3 for the two-dimensional case and Corollary 2.6.7 for the three-dimensional case): there exists an \(\varepsilon \in (0,{1\over 2}]\), which depends on the shape of the domain, such that \(Sf\in H^{{3\over 2}+\varepsilon }(\Omega )\) for each \(f\in L^2(\Omega )\). In particular, \(\kappa (h)\ll 1\) for \(h\in (0,h_0)\) if \(h_0\ll 1\).
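For piecewise linear elements, the Grisvard regularity just quoted also yields an explicit (though possibly non-sharp) rate for \(\kappa (h)\). The following sketch assumes the a priori bound \(\Vert Sf\Vert _{{3\over 2}+\varepsilon ,\Omega }\lesssim \Vert f\Vert _{0,\Omega }\) and the standard interpolation error estimate for a quasi-interpolation operator \(I_h\) onto \(V_h\):

$$\begin{aligned} \kappa (h)\leqslant \sup \limits _{\Vert f\Vert _{0,\Omega }=1}\Vert Sf-I_hSf\Vert _{a,\Omega }\lesssim h^{{1\over 2}+\varepsilon }\sup \limits _{\Vert f\Vert _{0,\Omega }=1}\Vert Sf\Vert _{{3\over 2}+\varepsilon ,\Omega }\lesssim h^{{1\over 2}+\varepsilon }. \end{aligned}$$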

The following results are standard and can be found in, e.g., [9, 11].

Proposition 2.1

For each \(f\in L^2(\Omega )\), there hold

$$\begin{aligned} \Vert Sf-S_hf\Vert _{a,\Omega }\lesssim \kappa (h)\Vert f\Vert _{0,\Omega } \end{aligned}$$
(2.7)

and

$$\begin{aligned} \Vert Sf-S_hf\Vert _{0,\Omega }\lesssim \kappa (h)\Vert Sf-S_hf\Vert _{a,\Omega }. \end{aligned}$$
(2.8)

Now we are in a position to review the residual type a posteriori error estimator for the finite element approximation of an elliptic boundary value problem. We define the element residual \({\tilde{r}}_T(y_h)\) and the jump residual \({\tilde{j}}_E(y_h)\) by

$$\begin{aligned} {\tilde{r}}_T(y_h):= & {} f-Ly_h=f+\nabla \cdot (A\nabla y_h)-a_0y_h\quad \mathrm{in}\ T\in \mathcal {T}_h, \end{aligned}$$
(2.9)
$$\begin{aligned} {\tilde{j}}_E(y_h):= & {} [A\nabla y_h]_E\cdot n_E\quad \mathrm{on}\ E\in \mathcal {E}_h, \end{aligned}$$
(2.10)

where \([A\nabla y_h]_E\cdot n_E\) denotes the jump of \(A\nabla y_h\) across the common side E of elements \(T^+\) and \(T^-\), and \(n_E\) denotes the unit normal to E oriented towards \(T^-\). For each element \(T\in \mathcal {T}_h\), we define the local error indicator \({\tilde{ \eta }_{h}}(y_{h},T)\) by

$$\begin{aligned} {\tilde{\eta }}_{h}(y_{h},T):= \left( h_T^2\Vert \tilde{r}_{T}(y_{h})\Vert _{0,T}^{2}+\sum \limits _{E\in \mathcal {E}_h,E\subset \partial T}h_E\Vert \tilde{j}_{E}(y_{h})\Vert _{0,E}^{2}\right) ^{1\over 2}. \end{aligned}$$
(2.11)

Then on a subset \(\omega \subset \Omega \), we define the error estimator \({\tilde{\eta }}_{h}(y_h,\omega )\) by

$$\begin{aligned} {\tilde{\eta }}_{h}(y_h,\omega ):=\left( \sum \limits _{T\in \mathcal {T}_h,T\subset \omega }{\tilde{\eta }}_{h}^2(y_h,T)\right) ^{1\over 2}. \end{aligned}$$
(2.12)

Thus, \({\tilde{\eta }}_{h}(y_h,\Omega )\) constitutes the error estimator on \(\Omega \) with respect to \(\mathcal {T}_h\).
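In an implementation, (2.11) and (2.12) reduce to simple weighted sums once the residual norms are available. The sketch below uses hypothetical helper names and assumes the element residual norm \(\Vert \tilde r_T\Vert _{0,T}\) and the jump norms \(\Vert \tilde j_E\Vert _{0,E}\) have been computed elsewhere by quadrature:

```python
from math import sqrt

def local_indicator(h_T, r_norm_T, edge_data):
    """Local indicator (2.11): eta_T^2 = h_T^2 ||r_T||^2 + sum_E h_E ||j_E||^2.

    edge_data is a list of (h_E, jump_norm_E) pairs, one for each interior
    face E of the element T (boundary faces contribute nothing here).
    """
    eta2 = h_T**2 * r_norm_T**2
    for h_E, j_norm_E in edge_data:
        eta2 += h_E * j_norm_E**2
    return sqrt(eta2)

def global_estimator(local_etas):
    """Estimator (2.12) on a subset: the l2-sum of the local indicators."""
    return sqrt(sum(eta**2 for eta in local_etas))
```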

For \(f\in L^2(\Omega )\) we also need to define the data oscillation as (see [32, 33])

$$\begin{aligned} \mathrm{osc}(f,T):=\Vert h_T(f-\bar{f}_T)\Vert _{0,T},\quad \mathrm{osc}(f,\mathcal {T}_h):=\left( \sum \limits _{T\in \mathcal {T}_h}\mathrm{osc}^{2}(f,T)\right) ^{1\over 2}, \end{aligned}$$
(2.13)

where \(\bar{f}_T\) denotes the \(L^2\)-projection of f onto the space of constant functions on T. It is easy to see that

$$\begin{aligned} \mathrm{osc}(f_1+f_2,\mathcal {T}_h)\leqslant \mathrm{osc}(f_1,\mathcal {T}_h)+\mathrm{osc}(f_2,\mathcal {T}_{h}),\quad \forall f_1,f_2\in L^2(\Omega ). \end{aligned}$$
(2.14)
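Since \(\bar f_T\) is simply the (weighted) mean value of f over T, the data oscillation (2.13) is straightforward to evaluate with a quadrature rule. The sketch below uses hypothetical helper names and assumes the quadrature weights sum to |T|:

```python
from math import sqrt

def osc_element(h_T, f_vals, weights):
    """osc(f, T) = || h_T (f - f_bar_T) ||_{0,T} via a quadrature rule.

    f_vals are the values of f at the quadrature points of T and weights
    the corresponding quadrature weights; f_bar_T, the L2 projection of f
    onto constants on T, is the weighted mean of f over T.
    """
    vol = sum(weights)
    f_bar = sum(w * f for w, f in zip(weights, f_vals)) / vol
    return h_T * sqrt(sum(w * (f - f_bar)**2 for w, f in zip(weights, f_vals)))

def osc_mesh(per_element):
    """osc(f, T_h): the l2-sum of the element contributions, as in (2.13)."""
    return sqrt(sum(o**2 for o in per_element))
```

As expected from the definition, elementwise constant data produces zero oscillation.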

For the above defined data oscillation we have the following lemma whose proof can be found in [11, Lemma 2.4].

Lemma 2.2

There exists a constant \(C_*\), depending on A, the mesh regularity constant \(\gamma ^*\) and the coefficient \(a_0\), such that

$$\begin{aligned} \mathrm{osc}(Lv,\mathcal {T}_h)\leqslant C_{*}\Vert v\Vert _{a,\Omega },\quad \mathrm{osc}(L^{*}v,\mathcal {T}_h)\leqslant C_{*}\Vert v\Vert _{a,\Omega }\quad \forall v\in V_h. \end{aligned}$$
(2.15)

Now we can formulate the following global upper and lower bounds for the a posteriori error estimators of elliptic boundary value problems (see, e.g., [13, 36]):

$$\begin{aligned} \Vert y-y_{h}\Vert ^{2}_{a,\Omega }\leqslant & {} {\tilde{C}}_{1}{\tilde{\eta }}^{2}_{h}(y_{h},\Omega ),\end{aligned}$$
(2.16)
$$\begin{aligned} {\tilde{C}}_{2}{\tilde{\eta }}_{h}^{2}(y_h,\Omega )\leqslant & {} \Vert y-y_{h}\Vert ^{2}_{a,\Omega }+{\tilde{C}}_{3} \mathrm{osc}^{2}(f-Ly_{h},\mathcal {T}_{h}). \end{aligned}$$
(2.17)

For our following purpose we also need to study the adjoint equation of the elliptic boundary value problem (2.1). For each \(g\in L^2(\Omega )\), let \(p\in H_{0}^{1}(\Omega )\) be the solution of the following adjoint equation

$$\begin{aligned} a(v,p)=(g,v)\quad \forall v\in H_{0}^{1}(\Omega ) \end{aligned}$$
(2.18)

with its finite element approximation

$$\begin{aligned} a(v_{h},p_{h})=(g,v_{h})\quad \forall v_h\in V_{h}. \end{aligned}$$
(2.19)

We can also give the a posteriori global upper and lower error bounds:

$$\begin{aligned} \Vert p-p_{h}\Vert ^{2}_{a,\Omega }\leqslant & {} {\tilde{C}}_{1}{\tilde{\eta }}^{2}_{h}(p_h,\Omega ),\end{aligned}$$
(2.20)
$$\begin{aligned} {\tilde{C}}_2{\tilde{\eta }}_{h}^2(p_h,\Omega )\leqslant & {} \Vert p-p_{h}\Vert ^{2}_{a,\Omega }+{\tilde{C}}_{3} \mathrm{osc}^{2}(g-L^{*}p_{h},\mathcal {T}_{h}). \end{aligned}$$
(2.21)

To analyze the adaptive finite element approximation of the optimal control problem, we introduce the AFEM for a system of two source problems associated with the state and adjoint state equations, which is a straightforward extension of the existing results on the adaptive finite element approximation of scalar problems (see [7]). Specifically, we introduce an adaptive finite element algorithm to solve the system of elliptic boundary value problems (2.2) and (2.18) based on the error estimator \({\tilde{\eta }}_{h}^{2}(y_{h},\Omega )+{\tilde{\eta }}_h^{2}(p_{h},\Omega )\).

The adaptive finite element procedure consists of the following loop

$$\begin{aligned} \mathrm{SOLVE}\rightarrow \mathrm{ESTIMATE}\rightarrow \mathrm{MARK}\rightarrow \mathrm{REFINE}. \end{aligned}$$

The ESTIMATE step is based on the a posteriori error estimators derived above, while the REFINE step can be carried out by iterative or recursive bisection of elements with the minimal refinement condition (see [34, 36]). Thanks to [7], the procedure REFINE here is not required to satisfy the interior node property of [32].
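A minimal sketch of this loop, with the four modules passed in as problem-specific callables (all names here are hypothetical placeholders, not part of the analysis), might read:

```python
def afem(mesh, solve, estimate, mark, refine, tol, max_iter=50):
    """Skeleton of the adaptive loop SOLVE -> ESTIMATE -> MARK -> REFINE.

    solve/estimate/mark/refine are problem-specific callables; the loop
    stops once the global estimator drops below the tolerance tol.
    """
    for _ in range(max_iter):
        sol = solve(mesh)
        indicators = estimate(mesh, sol)                  # {element: eta_T}
        total = sum(eta**2 for eta in indicators.values()) ** 0.5
        if total <= tol:
            break
        marked = mark(indicators)                         # e.g. Doerfler marking
        mesh = refine(mesh, marked)                       # bisect marked elements
    return mesh, sol
```

With a toy one-dimensional "mesh" (a list of element sizes), uniform marking, and the element size as indicator, the loop halves the elements until the l2-sum of sizes falls below the tolerance.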

There are different kinds of adaptive algorithms, which differ in their marking strategies (see [31–33]). Here we apply Dörfler's marking strategy introduced in [13], which uses only the error estimator \({\tilde{\eta }}_h^{2}(y_{h},\Omega )+{\tilde{\eta }}_h^2(p_h,\Omega )\) to obtain the set of marked elements \({{\mathcal {M}}}_{h}\subset \mathcal {T}_{h}\) and avoids marking for oscillation (compare Algorithm 3.7).
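Dörfler's strategy selects a set \(\mathcal {M}_h\) whose indicators carry at least a fixed fraction \(\theta \in (0,1)\) of the total estimator; a set of nearly minimal cardinality is obtained greedily by sorting the indicators. A minimal sketch (hypothetical name):

```python
def doerfler_mark(indicators, theta):
    """Doerfler marking: a small set M of elements with
    sum_{T in M} eta_T^2 >= theta * sum_T eta_T^2,
    built greedily from the largest indicators downwards."""
    total = sum(eta**2 for eta in indicators.values())
    marked, acc = [], 0.0
    for elem, eta in sorted(indicators.items(), key=lambda kv: -kv[1]):
        if acc >= theta * total:
            break
        marked.append(elem)
        acc += eta**2
    return marked
```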

Then the adaptive algorithm for solving elliptic boundary value problems is also standard, see e.g. [7, Section 2.7], except that we solve instead a system of elliptic boundary value problems (2.2) and (2.18) (compare Algorithm 3.8).

We denote \(\mathbb {T}\) the class of all conforming refinements by bisection of \(\mathcal {T}_{h_0}\) (see [7] for more details). Given a fixed number \(b\geqslant 1\), for any \(\mathcal {T}_{h_{k}}\in \mathbb {T}\) and \(\mathcal {M}_{h_{k}}\subset \mathcal {T}_{h_{k}}\) of marked elements,

$$\begin{aligned} \mathcal {T}_{h_{k+1}}=\mathrm{REFINE}(\mathcal {T}_{h_{k}},\mathcal {M}_{h_k}) \end{aligned}$$

outputs a conforming triangulation \(\mathcal {T}_{h_{k+1}}\in \mathbb {T}\), where at least all elements of \(\mathcal {M}_{h_k}\) are bisected b times. We define \(R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}=\mathcal {T}_{h_k}{\backslash }(\mathcal {T}_{h_k}\cap \mathcal {T}_{h_{k+1}})\) as the set of refined elements, which satisfies \(\mathcal {M}_{h_k}\subset R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}\).

Then we can formulate the following standard result on the complexity of the refinement, see [7, Lemma 2.3] and [35] for more details.

Lemma 2.3

Assume that \(\mathcal {T}_{h_0}\) verifies condition (b) of Sect. 4 in [35]. Let \(\mathcal {T}_{h_k}\) (\(k\geqslant 0\)) be a sequence of conforming and nested triangulations of \(\Omega \) generated by REFINE starting from the initial mesh \(\mathcal {T}_{h_0}\). Assume that \(\mathcal {T}_{h_{k+1}}\) is generated from \(\mathcal {T}_{h_k}\) by \(\mathcal {T}_{h_{k+1}}=\mathrm{REFINE}(\mathcal {T}_{h_k},\mathcal {M}_{h_k})\) with a subset \(\mathcal {M}_{h_k}\subset \mathcal {T}_{h_k}\). Then there exists a constant \({\hat{C}}_0\) depending on \(\mathcal {T}_{h_0}\) and b such that

$$\begin{aligned} \#\mathcal {T}_{h_{k+1}}-\#\mathcal {T}_{h_0}\leqslant {\hat{C}}_0\sum \limits _{i=0}^k\# \mathcal {M}_{h_{i}}\quad \forall k\geqslant 1. \end{aligned}$$
(2.22)

We define

$$\begin{aligned} \Vert (y,p)\Vert _a^2=a(y,y)+a(p,p). \end{aligned}$$

The convergence of the adaptive algorithm based on Dörfler's marking strategy is proven in [7], and the techniques have since been used extensively in the convergence analysis of AFEM for different kinds of boundary value problems. The following Theorem 2.4, Lemma 2.5 and Lemma 2.6 are extensions of the corresponding results for a single elliptic equation in [7], obtained by some elementary manipulations. We remark that in [10] the authors used a similar idea to prove the convergence of adaptive finite element computations for multiple eigenvalues.

Theorem 2.4

Let \((y_{h_k},p_{h_k})\in V_{h_k}\times V_{h_k}\) be a sequence of finite element solutions of problems (2.2) and (2.18) based on the adaptively refined mesh \(\mathcal {T}_{h_k}\) produced by AFEM. Then there exist constants \({\tilde{\gamma }}>0\) and \({\tilde{\beta }}\in (0,1)\), depending only on the shape regularity of meshes, the data and the parameters used in Dörfler’s marking algorithm, such that for any two consecutive iterates k and \(k+1\) we have

$$\begin{aligned}&\Vert (y-y_{h_{k+1}},p-p_{h_{k+1}})\Vert _{a}^2+{\tilde{\gamma }}\big ({\tilde{\eta }}^{2}_{h_{k+1}}(y_{h_{k+1}},\Omega )+{\tilde{\eta }}^{2}_{h_{k+1}}(p_{h_{k+1}},\Omega )\big )\nonumber \\&\qquad \leqslant {\tilde{\beta }}^{2}\Big (\Vert (y-y_{h_{k}},p-p_{h_{k}})\Vert _{a}^2+{\tilde{\gamma }}\big ({\tilde{\eta }}^{2}_{h_{k}}(y_{h_{k}},\Omega )+{\tilde{\eta }}^{2}_{h_{k}}(p_{h_{k}},\Omega )\big )\Big ). \end{aligned}$$
(2.23)

Here

$$\begin{aligned} {\tilde{\gamma }} :=\frac{1}{(1+\delta ^{-1})C_*^2} \end{aligned}$$
(2.24)

with some constant \(\delta \in (0,1)\).

To prove the optimal complexity of the adaptive algorithm we need further results. The following lemma presents a localized upper bound estimate for the distance between two nested solutions of the elliptic boundary value problems (2.2) and (2.18) (see [7, Lemma 3.6] and [11, Lemma 6.2]).

Lemma 2.5

Let \((y_{h_k},p_{h_k})\in V_{h_k}\times V_{h_k}\) and \((y_{h_{k+1}},p_{h_{k+1}})\in V_{h_{k+1}}\times V_{h_{k+1}}\) be the discrete solutions of problems (2.2) and (2.18) over a mesh \(\mathcal {T}_{h_k}\) and its refinement \(\mathcal {T}_{h_{k+1}}\) with marked elements \(\mathcal {M}_{h_{k}}\subset \mathcal {T}_{h_k}\). Let \(R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}\) be the set of refined elements. Then the following localized upper bound is valid

$$\begin{aligned} \Vert (y_{h_k}-y_{h_{k+1}},p_{h_k}-p_{h_{k+1}})\Vert _{a}^2\leqslant {\tilde{C}}_{1}\sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\big ({\tilde{\eta }}_{h_k}^2(y_{h_k},T)+{\tilde{\eta }}_{h_k}^2(p_{h_k},T)\big ).\nonumber \\ \end{aligned}$$
(2.25)

Consequently, we can show the optimality of Dörfler’s marking strategy in the following lemma (see [7, Lemma 5.9] and [11, Proposition 6.3] for the proof).

Lemma 2.6

Let \((y_{h_k},p_{h_k})\in V_{h_k}\times V_{h_k}\) and \((y_{h_{k+1}},p_{h_{k+1}})\in V_{h_{k+1}}\times V_{h_{k+1}}\) be the discrete solutions of problems (2.2) and (2.18) over a mesh \(\mathcal {T}_{h_k}\) and its refinement \(\mathcal {T}_{h_{k+1}}\) with marked elements \(\mathcal {M}_{h_{k}}\subset \mathcal {T}_{h_k}\). Suppose that they satisfy the energy decrease property

$$\begin{aligned}&\Vert (y-y_{h_{k+1}},p-p_{h_{k+1}})\Vert _{a}^2+{\tilde{\gamma }}_{0}\left( \mathrm{osc}^2(f-Ly_{h_{k+1}},\mathcal {T}_{h_{k+1}})+\mathrm{osc}^2(g-L^*p_{h_{k+1}},\mathcal {T}_{h_{k+1}})\right) \nonumber \\&\quad \leqslant {\tilde{\beta }}_0^2\left( \Vert (y-y_{h_{k}},p-p_{h_{k}})\Vert _{a}^2+{\tilde{\gamma }}_{0}\big (\mathrm{osc}^2(f-Ly_{h_k},\mathcal {T}_{h_k})+\mathrm{osc}^2(g-L^*p_{h_k},\mathcal {T}_{h_k})\big )\right) \nonumber \\ \end{aligned}$$
(2.26)

with a constant \({\tilde{\gamma }}_{0}>0\) and \({\tilde{\beta }}_{0}^{2}\in (0,{1\over 2})\). Then the set \(R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}\) of refined elements satisfies the Dörfler property

$$\begin{aligned} \sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\big ({\tilde{\eta }}_{h_{k}}^{2}(y_{h_{k}},T)+{\tilde{\eta }}_{h_k}^{2}(p_{h_k},T)\big )\geqslant {\tilde{\theta }} \sum \limits _{T\in \mathcal {T}_{h_k}}\big ({\tilde{\eta }}_{h_k}^2(y_{h_k},T)+{\tilde{\eta }}_{h_k}^2(p_{h_k},T)\big )\nonumber \\ \end{aligned}$$
(2.27)

with \({\tilde{\theta }}=\frac{{\tilde{C}}_{2}(1-2{\tilde{\beta }}_{0}^{2})}{{\tilde{C}}_{0}({\tilde{C}}_{1}+(1+2C_{*}^{2}{\tilde{C}}_{1}){\tilde{\gamma }}_{0})}\), where \({\tilde{C}}_{0}=\max (1,\frac{{\tilde{C}}_{3}}{{\tilde{\gamma }}_{0}})\).

3 Adaptive finite element method for the optimal control problem

In this section we consider the following elliptic optimal control problem:

$$\begin{aligned} \min \limits _{u\in U_{ad}}\ \ J(y,u)={1\over 2}\Vert y-y_d\Vert _{0,\Omega }^2 + \frac{\alpha }{2}\Vert u\Vert _{0,\Omega }^2 \end{aligned}$$
(3.1)

subject to

$$\begin{aligned} \left\{ \begin{array}{ll} Ly=u &{}\quad \mathrm{in}\ \Omega , \\ \ y=0 &{}\quad \mathrm{on}\ \partial \Omega , \end{array}\right. \end{aligned}$$
(3.2)

where \(\alpha >0\) is a fixed parameter, \(y_d\in L^2(\Omega )\) is the desired state and \(U_{ad}\) is the admissible control set with bilateral control constraints:

$$\begin{aligned} U_{ad}:= \Big \{u\in L^2(\Omega ), \quad a\leqslant u\leqslant b\ \mathrm{a.e.}\ \mathrm{in}\ \Omega \Big \}, \end{aligned}$$

where \(a,b\in \mathbb {R}\) and \(a<b\).

Remark 3.1

We remark that all the theory presented below can be generalized to the case where the control acts on a subdomain \(\omega \subset \Omega \). In this case the control operator \(B:L^2(\omega )\rightarrow L^2(\Omega )\) is the extension-by-zero operator and the governing equation reads \(Ly=Bu\).

With the solution operator S of the elliptic equation (3.2) introduced in the last section, we can formulate the reduced optimization problem

$$\begin{aligned} \min \limits _{u\in U_{ad}}\ \ {\hat{J}}(u):=J(Su,u)={1\over 2}\Vert Su-y_d\Vert _{0,\Omega }^2 + \frac{\alpha }{2}\Vert u\Vert _{0,\Omega }^2. \end{aligned}$$
(3.3)

Since the state equation is linear and the objective functional is strictly convex, the above optimization problem admits a unique solution \(u\in U_{ad}\) by standard arguments (see [25]). Moreover, the first order necessary and sufficient optimality condition can be stated as follows:

$$\begin{aligned} {\hat{J}}'(u)(v-u)=(\alpha u+S^*(Su-y_d),v-u)\geqslant 0, \ \ \ \ \ \forall v\in U_{ad}, \end{aligned}$$
(3.4)

where \(S^*\) is the adjoint of S [21]. Introducing the adjoint state \(p:=S^*(Su-y_d)\in H_0^1(\Omega )\), we are led to the following optimality system:

$$\begin{aligned} \left\{ \begin{array}{ll}a(y,v)=(u,v),&{}\quad \forall v\in H_0^1(\Omega ),\\ a(w,p)=(y-y_d,w), &{}\quad \forall w\in H_0^1(\Omega ),\\ (\alpha u+p,v-u)\geqslant 0, &{}\quad \forall v\in U_{ad}. \end{array} \right. \end{aligned}$$
(3.5)

Hereafter, we call u, y and p the optimal control, state and adjoint state, respectively. From the last inequality of (3.5) we have the pointwise representation of u (see [25]):

$$\begin{aligned} u=P_{[a,b]}\left\{ -\frac{1}{\alpha }p\right\} , \end{aligned}$$
(3.6)

where \(P_{[a,b]}\) is the orthogonal projection operator from \(L^2(\Omega )\) to \(U_{ad}\).
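Numerically, the pointwise representation (3.6) is a clamp of \(-p/\alpha \) to the interval [a, b]. A minimal sketch (hypothetical name), acting on nodal or quadrature point values of the adjoint state:

```python
def project_control(p_vals, alpha, a, b):
    """Pointwise projection (3.6): u = P_[a,b](-p/alpha) = min(b, max(a, -p/alpha)),
    applied at each sample point of the adjoint state p."""
    return [min(b, max(a, -p / alpha)) for p in p_vals]
```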

Next, let us consider the finite element approximation of (3.1)–(3.2). In this paper, we use piecewise linear finite elements to approximate the state y and variational discretization for the control u (see [20]). Based on the finite element space \(V_h\), we can define the finite dimensional approximation of the optimal control problem (3.1)–(3.2) as follows: Find \((u_h,y_h)\in U_{ad}\times V_h\) such that

$$\begin{aligned} \min \limits _{u_h\in U_{ad}}J_h( y_h,u_h)={1\over 2}\Vert y_h-y_d\Vert _{0,\Omega }^2 + \frac{\alpha }{2}\Vert u_h\Vert _{0,\Omega }^2 \end{aligned}$$
(3.7)

subject to

$$\begin{aligned} a(y_h,v_h)=(u_h,v_h),\quad \forall v_h\in V_h. \end{aligned}$$
(3.8)

Similar to the continuous case we have \(y_h=S_hu_h\). With this notation we can formulate a reduced discrete optimization problem

$$\begin{aligned} \min \limits _{u_h\in U_{ad}} {\hat{J}}_h(u_h):=J_h(S_hu_h,u_h)={1\over 2}\Vert S_hu_h-y_d\Vert _{0,\Omega }^2 + \frac{\alpha }{2}\Vert u_h\Vert _{0,\Omega }^2. \end{aligned}$$
(3.9)

We note that the above optimization problem can be solved by the projected gradient method or the semi-smooth Newton method, see [5, 19, 21, 30] for more details.
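To illustrate the projected gradient method for the reduced problem (3.9), the sketch below replaces the discrete solution operator \(S_h\) by the scalar surrogate \(S_hu=u/2\) (a toy with a single degree of freedom, not the PDE solve; all names are hypothetical). Each step takes a gradient step for \({\hat{J}}_h\) and projects back onto \(U_{ad}=[a,b]\):

```python
def projected_gradient(y_d, alpha, a, b, step=1.0, tol=1e-12, max_iter=1000):
    """Projected gradient iteration for a toy reduced problem.

    As a stand-in for S_h we take S_h u = u / 2, so the reduced gradient is
    J_h'(u) = alpha * u + S_h^*(S_h u - y_d) = alpha * u + 0.5 * (0.5 * u - y_d).
    Each iterate is projected onto the admissible interval [a, b].
    """
    u = 0.0
    for _ in range(max_iter):
        grad = alpha * u + 0.5 * (0.5 * u - y_d)
        u_new = min(b, max(a, u - step * grad))     # projection onto [a, b]
        if abs(u_new - u) <= tol:
            return u_new
        u = u_new
    return u
```

For a strictly convex quadratic in one variable, the constrained minimizer is the projection of the unconstrained one onto [a, b], so active bounds are picked up exactly; in the full discrete problem the same iteration is applied with \(S_h\) and \(S_h^*\) realized by finite element solves.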

Similar to the continuous problem (3.1)–(3.2), the above discretized optimization problem also admits a unique solution \({u}_h\in U_{ad}\). Moreover, the first order necessary and sufficient optimality condition can be stated as follows:

$$\begin{aligned} {\hat{J}}'_h( u_h)(v_h- u_h)=(\alpha u_h+S^*_h(S_h u_h-y_d),v_h-u_h)\geqslant 0, \ \forall v_h\in U_{ad}, \end{aligned}$$
(3.10)

where \(S^*_h\) is the adjoint of \(S_h\). Introducing the adjoint state \(p_h:=S^*_h(S_hu_h-y_d)\in V_h\), the discretized first order necessary and sufficient optimality condition is equivalent to:

$$\begin{aligned} \left\{ \begin{array}{ll}a(y_h,v_h)=( u_h,v_h), &{}\quad \forall v_h\in V_h,\\ a(w_h,p_h)=(y_h-y_d,w_h), &{}\quad \forall w_h\in V_h,\\ (\alpha u_h+ p_h,v_h-u_h)\geqslant 0, &{}\quad \forall v_h\in U_{ad}. \end{array} \right. \end{aligned}$$
(3.11)

Hereafter, we call \(u_h\), \(y_h\) and \(p_h\) the discrete optimal control, state and adjoint state, respectively. Similar to the continuous case (3.6) we have

$$\begin{aligned} u_h=P_{[a,b]}\left\{ -\frac{1}{\alpha }p_h\right\} . \end{aligned}$$
(3.12)

It should be noticed that, due to the variational discretization, \(u_h\) is in general not a finite element function in \(V_h\).

For convenience we define \(y^h:=Su_h\) and \(p^h:=S^*(S_hu_h-y_d)\). It is obvious that \(y_h\) and \(p_h\) are the standard Galerkin projections of \(y^h\) and \(p^h\), i.e., \(y_h=\mathcal {R}_hy^h\) and \(p_h=\mathcal {R}_hp^h\). The following equivalence property is established in [22].

Theorem 3.2

Let \((u,y,p)\in U_{ad}\times H_0^1(\Omega )\times H_0^1(\Omega )\) and \(( u_h, y_h, p_h)\in U_{ad}\times V_h\times V_h\) be the solutions of problems (3.1)–(3.2) and (3.7)–(3.8), respectively. Then the following equivalence property holds:

$$\begin{aligned} \Vert u-u_h\Vert _{0,\Omega }+\Vert y-y_h\Vert _{a,\Omega } +\Vert p- p_h\Vert _{a,\Omega }\approx \Vert y^h- y_h\Vert _{a,\Omega } +\Vert p^h- p_h\Vert _{a,\Omega }.\nonumber \\ \end{aligned}$$
(3.13)

Proof

For completeness we include a brief proof. Setting \(v= u_h\) in (3.4) and \(v_h = u\) in (3.10) we are led to

$$\begin{aligned} (\alpha u+S^*(Su-y_d),u_h-u)\geqslant 0, \end{aligned}$$
(3.14)
$$\begin{aligned} (\alpha u_h+S^*_h(S_hu_h-y_d),u-u_h)\geqslant 0. \end{aligned}$$
(3.15)

Adding the above two inequalities, we conclude from (3.5) that

$$\begin{aligned} \alpha \Vert u- u_h\Vert ^2_{0,\Omega }\leqslant & {} (S^*_h(S_hu_h-y_d)-S^*(Su-y_d),u- u_h)\nonumber \\= & {} (S^*_h(S_h u_h-y_d)-S^*(S_hu_h-y_d),u- u_h)\nonumber \\&+\,(S^*(S_hu_h-y_d)-S^*(Su-y_d),u-u_h)\nonumber \\= & {} (S^*_h(S_hu_h-y_d)-S^*(S_hu_h-y_d),u-u_h)\nonumber \\&+\,(S_hu_h-Su,Su-Su_h)\nonumber \\= & {} (S^*_h(S_h u_h-y_d)-S^*(S_hu_h-y_d),u-u_h)\nonumber \\&+\,(S_hu_h-Su,Su-S_hu_h)\nonumber \\&+\,(S_h u_h-Su,S_h u_h-Su_h). \end{aligned}$$
(3.16)

It follows from Young’s inequality that

$$\begin{aligned} \alpha \Vert u-u_h\Vert ^2_{0,\Omega }\leqslant C\Vert Su_h-S_h u_h\Vert ^2_{a,\Omega }+C\Vert S^*(S_hu_h-y_d)-S^*_h(S_hu_h-y_d)\Vert _{a,\Omega }^2.\nonumber \\ \end{aligned}$$
(3.17)

Moreover, from (3.5) we have

$$\begin{aligned} \Vert y-y_h\Vert _{a,\Omega }\leqslant & {} \Vert y-Su_h\Vert _{a,\Omega }+\Vert Su_h-y_h\Vert _{a,\Omega }\\\leqslant & {} C\Vert u-u_h\Vert _{0,\Omega }+\Vert Su_h-y_h\Vert _{a,\Omega } \end{aligned}$$

and

$$\begin{aligned} \Vert p- p_h\Vert _{a,\Omega }\leqslant & {} \Vert p-S^*(S_hu_h-y_d)\Vert _{a,\Omega }+\Vert S^*(S_hu_h-y_d)- p_h\Vert _{a,\Omega }\\\leqslant & {} C\Vert Su-S_hu_h\Vert _{0,\Omega }+\Vert S^*(S_hu_h-y_d)-p_h\Vert _{a,\Omega }\\\leqslant & {} C\Vert u-u_h\Vert _{0,\Omega }+\Vert S^*(S_hu_h-y_d)-p_h\Vert _{a,\Omega }+C\Vert S u_h-y_h\Vert _{a,\Omega }. \end{aligned}$$

Combining the above estimates we prove the upper bound.

Now we prove the lower bound. Note that

$$\begin{aligned} \Vert S u_h-S_hu_h\Vert _{a,\Omega }\leqslant & {} \Vert Su_h-Su\Vert _{a,\Omega }+\Vert Su-S_hu_h\Vert _{a,\Omega }\nonumber \\\leqslant & {} C\Vert u- u_h\Vert _{0,\Omega }+\Vert y- y_h\Vert _{a,\Omega }. \end{aligned}$$
(3.18)

Similarly, we can derive that

$$\begin{aligned}&\Vert S^*(S_hu_h-y_d)-S^*_h(S_hu_h-y_d)\Vert _{a,\Omega }\nonumber \\&\quad \leqslant \Vert S^*(S_hu_h-y_d)-S^*(Su-y_d)\Vert _{a,\Omega }+\Vert S^*(Su-y_d)-S^*_h(S_hu_h-y_d)\Vert _{a,\Omega }\nonumber \\&\quad \leqslant C\Vert S_hu_h-Su\Vert _{0,\Omega }+\Vert p- p_h\Vert _{a,\Omega }\nonumber \\&\quad =C\Vert y-y_h\Vert _{a,\Omega }+\Vert p- p_h\Vert _{a,\Omega }. \end{aligned}$$
(3.19)

Thus, the lower bound follows from the above estimates. This completes the proof. \(\square \)

Next, we will prove a compact equivalence property which relates the finite element optimal control approximation to the associated finite element boundary value approximation.

Theorem 3.3

Let \(h\in (0,h_0)\), \((u,y,p)\in U_{ad}\times H_0^1(\Omega )\times H_0^1(\Omega )\) and \((u_h,y_h,p_h)\in U_{ad}\times V_h\times V_h\) be the solutions of problems (3.1)–(3.2) and (3.7)–(3.8), respectively. Then the following equivalence properties hold

$$\begin{aligned} \Vert y-y_h\Vert _{a,\Omega }= & {} \Vert y^h-y_h\Vert _{a,\Omega }+O(\kappa (h))\Big (\Vert y-y_h\Vert _{a,\Omega } +\Vert p- p_h\Vert _{a,\Omega }\Big ),\end{aligned}$$
(3.20)
$$\begin{aligned} \Vert p-p_h\Vert _{a,\Omega }= & {} \Vert p^h- p_h\Vert _{a,\Omega } +O(\kappa (h))\Big (\Vert y-y_h\Vert _{a,\Omega } +\Vert p- p_h\Vert _{a,\Omega }\Big ) \end{aligned}$$
(3.21)

provided \(h_0\ll 1\).

Proof

It is obvious that

$$\begin{aligned} y-y_h=y^h- y_h+y-y^h,\quad p-p_h=p^h- p_h+p-p^h. \end{aligned}$$
(3.22)

Moreover, it follows from the stability results of the elliptic equation that

$$\begin{aligned} \Vert y-y^h\Vert _{a,\Omega }\leqslant C\Vert u-u_h\Vert _{0,\Omega },\quad \Vert p- p^h\Vert _{a,\Omega }\leqslant C\Vert y-y_h\Vert _{0,\Omega }. \end{aligned}$$
(3.23)

In the following we estimate \(\Vert y-y_h\Vert _{0,\Omega }\). Let \(\psi \in H_0^1(\Omega )\) be the solution of the following auxiliary problem

$$\begin{aligned} \left\{ \begin{array}{ll} L^*\psi = y-y_h &{}\quad \mathrm{in}\ \Omega , \\ \ \psi =0 &{}\quad \mathrm{on}\ \partial \Omega . \end{array}\right. \end{aligned}$$
(3.24)

Let \(\psi _h\in V_h\) be the finite element approximation of \(\psi \). In the following proof we use the duality argument (see, e.g., [9]). Multiplying both sides of (3.24) by \(y-y_h\) and integrating by parts, we conclude from (2.7) that

$$\begin{aligned} \Vert y-y_h\Vert _{0,\Omega }^2= & {} a(y-y_h,\psi )\\= & {} a(y-y_h,\psi -\psi _h)+a(y-y_h,\psi _h)\\= & {} a(y-y_h,\psi -\psi _h)+(u-u_h,\psi _h-\psi )+(u-u_h,\psi )\\\leqslant & {} C\kappa (h)\Vert y-y_h\Vert _{a,\Omega }\Vert y-y_h\Vert _{0,\Omega }\\&+\,C(1+\kappa (h))\Vert u-u_h\Vert _{0,\Omega }\Vert y-y_h\Vert _{0,\Omega }\\\leqslant & {} C\Big (\kappa (h)\Vert y-y_h\Vert _{a,\Omega }+\Vert u-u_h\Vert _{0,\Omega }\Big )\Vert y-y_h\Vert _{0,\Omega }, \end{aligned}$$

where \(\kappa (h)\) is defined in (2.6). This in turn implies

$$\begin{aligned} \Vert y- y_h\Vert _{0,\Omega }\leqslant C\kappa (h)\Vert y-y_h\Vert _{a,\Omega }+C\Vert u-u_h\Vert _{0,\Omega }. \end{aligned}$$
(3.25)

Considering (3.23) we have

$$\begin{aligned} \Vert p- p^h\Vert _{a,\Omega }\leqslant C\kappa (h)\Vert y-y_h\Vert _{a,\Omega }+C\Vert u-u_h\Vert _{0,\Omega }. \end{aligned}$$
(3.26)

It remains to estimate \(\Vert u-u_h\Vert _{0,\Omega }\). Note that it follows from (3.14), (3.15) and the definition of \(S^*_h\) that

$$\begin{aligned} \alpha \Vert u- u_h\Vert ^2_{0,\Omega }\leqslant & {} (S^*_h(S_hu_h-y_d)-S^*(Su-y_d),u- u_h)\\= & {} (S^*_h(S_h u_h-y_d)-S^*_h(S_hu-y_d),u- u_h)\\&+\,(S^*_h(S_hu-y_d)-S^*(Su-y_d),u-u_h)\\= & {} (S_h(u_h-u),S_h(u-u_h))+(S^*_h(S_hu-y_d)-S^*(Su-y_d),u-u_h)\\\leqslant & {} (S^*_h(S_hu-y_d)-S^*(Su-y_d),u-u_h), \end{aligned}$$

which yields

$$\begin{aligned} \Vert u-u_h\Vert _{0,\Omega }\leqslant C\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{0,\Omega }. \end{aligned}$$
(3.27)

Let \(\phi \in H_0^1(\Omega )\) be the solution of the following auxiliary problem

$$\begin{aligned} \left\{ \begin{array}{ll} L\phi = S^*_h(S_hu-y_d)-S^*(Su-y_d) &{}\quad \mathrm{in}\ \Omega , \\ \ \phi =0 &{}\quad \mathrm{on}\ \partial \Omega . \end{array}\right. \end{aligned}$$
(3.28)

Now we use the duality argument again. It follows from the continuous and discrete adjoint state equations that

$$\begin{aligned}&\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{0,\Omega }^2=a(\phi ,S^*_h(S_hu-y_d)-S^*(Su-y_d))\nonumber \\&\quad =a(\phi -\phi _h,S^*_h(S_hu-y_d)-S^*(Su-y_d))+a(\phi _h,S^*_h(S_hu-y_d)-S^*(Su-y_d))\nonumber \\&\quad =a(\phi -\phi _h,S^*_h(S_hu-y_d)-S^*(Su-y_d))+(\phi _h,S_hu-Su)\nonumber \\&\quad =a(\phi -\phi _h,S^*_h(S_hu-y_d)-S^*(Su-y_d))+(\phi _h-\phi ,S_hu-Su)+(\phi ,S_hu-Su),\nonumber \\ \end{aligned}$$
(3.29)

where \(\phi _h\in V_h\) is the finite element approximation of \(\phi \). We can conclude from (2.7), (2.8) that

$$\begin{aligned}&a(\phi -\phi _h,S^*_h(S_hu-y_d)-S^*(Su-y_d))\nonumber \\&\quad \leqslant C\kappa (h)\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{0,\Omega }\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{a,\Omega }\nonumber \\ \end{aligned}$$
(3.30)

and

$$\begin{aligned} (\phi _h-\phi ,S_hu-Su)\leqslant & {} C\kappa ^2(h)\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{0,\Omega }\Vert S_hu-Su\Vert _{a,\Omega },\nonumber \\\end{aligned}$$
(3.31)
$$\begin{aligned} (\phi ,S_hu-Su)\leqslant & {} C\kappa (h)\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{0,\Omega }\Vert S_hu-Su\Vert _{a,\Omega }.\nonumber \\ \end{aligned}$$
(3.32)

Using the fact that \(\kappa ^2(h)< \kappa (h)\ll 1\) when \(h_0\ll 1\), we are able to derive that

$$\begin{aligned}&\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{0,\Omega }\nonumber \\&\qquad \leqslant C\kappa (h)(\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{a,\Omega }+\Vert S_hu-Su\Vert _{a,\Omega }).\quad \end{aligned}$$
(3.33)

Combining (3.27) and (3.33) we are led to

$$\begin{aligned}&\Vert u-u_h\Vert _{0,\Omega }\nonumber \\&\quad \lesssim \kappa (h)(\Vert S^*_h(S_hu-y_d)-S^*(Su-y_d)\Vert _{a,\Omega }+\Vert S_hu-Su\Vert _{a,\Omega })\nonumber \\&\quad \lesssim \kappa (h)(\Vert p_h-p\Vert _{a,\Omega }+\Vert S^*_h(S_hu-y_d)-S^*_h(S_hu_h-y_d)\Vert _{a,\Omega }+\Vert S_hu-Su\Vert _{a,\Omega })\nonumber \\&\quad \lesssim \kappa (h)(\Vert p_h-p\Vert _{a,\Omega }+\Vert S_hu-S_hu_h\Vert _{a,\Omega }+\Vert S_hu-Su\Vert _{a,\Omega })\nonumber \\&\quad \lesssim \kappa (h)(\Vert p_h-p\Vert _{a,\Omega }+\Vert S_hu_h-Su\Vert _{a,\Omega }+\Vert S_hu-S_hu_h\Vert _{a,\Omega })\nonumber \\&\quad \lesssim \kappa (h)(\Vert p_h-p\Vert _{a,\Omega }+\Vert y_h-y\Vert _{a,\Omega }+\Vert u-u_h\Vert _{0,\Omega }). \end{aligned}$$
(3.34)

If \(h_0\ll 1\) then \(\kappa (h)\ll 1\) for all \(h\in (0,h_0)\), and we arrive at

$$\begin{aligned} \Vert u-u_h\Vert _{0,\Omega }\lesssim \kappa (h)(\Vert p_h-p\Vert _{a,\Omega }+\Vert y_h-y\Vert _{a,\Omega }). \end{aligned}$$
(3.35)

Inserting the above estimate into (3.23) and (3.26), we can conclude from (3.22) the desired results (3.20), (3.21). This completes the proof. \(\square \)

Now we are in a position to consider the adaptive finite element method for the optimal control problem (3.1)–(3.2). First we derive a posteriori error estimates for the above optimal control problem. To begin with, we introduce some notation. Similar to the definitions (2.9) and (2.10) we define the element residuals \(r_{y,T}(y_h)\), \(r_{p,T}(p_h)\) and the jump residuals \(j_{y,E}(y_h)\), \(j_{p,E}(p_h)\) by

$$\begin{aligned} r_{y,T}(y_h):= & {} u_h-Ly_h=u_h+\nabla \cdot (A\nabla y_h)-a_0y_h\quad \mathrm{in}\ T\in \mathcal {T}_h,\end{aligned}$$
(3.36)
$$\begin{aligned} r_{p,T}(p_h):= & {} y_h-y_d-L^*p_h=y_h-y_d+\nabla \cdot (A^*\nabla p_h)-a_0p_h\quad \mathrm{in}\ T\in \mathcal {T}_h,\nonumber \\\end{aligned}$$
(3.37)
$$\begin{aligned} j_{y,E}(y_h):= & {} [A\nabla y_h]_E\cdot n_E\quad \mathrm{on}\ E\in \mathcal {E}_h,\end{aligned}$$
(3.38)
$$\begin{aligned} j_{p,E}(p_h):= & {} [A^*\nabla p_h]_E\cdot n_E\quad \mathrm{on}\ E\in \mathcal {E}_h. \end{aligned}$$
(3.39)

For each element \(T\in \mathcal {T}_h\), we define the local error indicators \(\eta _{y,h}(y_h,T)\) and \(\eta _{p,h}(p_h,T)\) by

$$\begin{aligned} \eta _{y,h}(y_h,T):= \left( h_T^2\Vert r_{y,T}(y_h)\Vert _{0,T}^2+\sum \limits _{E\in \mathcal {E}_h,E\subset \partial T}h_E\Vert j_{y,E}(y_h)\Vert _{0,E}^2\right) ^{1\over 2},\end{aligned}$$
(3.40)
$$\begin{aligned} \eta _{p,h}(p_h,T):= \left( h_T^2\Vert r_{p,T}(p_h)\Vert _{0,T}^2+\sum \limits _{E\in \mathcal {E}_h,E\subset \partial T}h_E\Vert j_{p,E}(p_h)\Vert _{0,E}^2\right) ^{1\over 2}. \end{aligned}$$
(3.41)

Then on a subset \(\omega \subset \Omega \), we define the error estimators \(\eta _{y,h} (y_h,\omega )\) and \(\eta _{p,h}(p_h,\omega )\) by

$$\begin{aligned} \eta _{y,h}(y_h,\omega ):=\left( \sum \limits _{T\in \mathcal {T}_h,T\subset \omega }\eta _{y,h}^2(y_h,T)\right) ^{1\over 2}, \end{aligned}$$
(3.42)
$$\begin{aligned} \eta _{p,h}(p_h,\omega ):=\left( \sum \limits _{T\in \mathcal {T}_h,T\subset \omega }\eta _{p,h}^2( p_h,T)\right) ^{1\over 2}. \end{aligned}$$
(3.43)

Thus, \(\eta _{y,h}(y_h,\Omega )\) and \(\eta _{p,h}(p_h,\Omega )\) constitute the error estimators for the state equation and the adjoint state equation on \(\Omega \) with respect to \(\mathcal {T}_h\).
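In the simplest setting the indicator (3.40) can be sketched directly. The code below assumes a 1D model problem \(-y''=u\) with P1 elements, \(A=I\) and \(a_0=0\), so that the element residual reduces to \(u_h\) (the term \(\nabla \cdot (A\nabla y_h)\) vanishes elementwise) and the jump residual is the kink \([y_h']\) at interior nodes; the midpoint quadrature and the splitting of each jump between its two neighbouring elements are simplifying choices.

```python
import numpy as np

def residual_indicators(x, y_h, u_h):
    """Sketch of the local indicators (3.40) for -y'' = u in 1D with
    P1 elements (A = I, a_0 = 0): element residual r_y = u_h, jump
    residual j_y = [y_h'] at interior nodes."""
    h = np.diff(x)                       # element sizes h_T
    grad = np.diff(y_h) / h              # piecewise-constant y_h'
    u_mid = 0.5 * (u_h[:-1] + u_h[1:])   # midpoint values of u_h
    # h_T^2 * ||r_y||_{0,T}^2 with the L2 norm by the midpoint rule
    eta2 = h**2 * u_mid**2 * h
    # jump contribution [y_h']^2 at interior nodes, scaled by the
    # local mesh size and split between the two adjacent elements
    jumps = np.diff(grad)**2
    eta2[:-1] += 0.5 * h[:-1] * jumps
    eta2[1:] += 0.5 * h[1:] * jumps
    return np.sqrt(eta2)
```

For a hat function with a kink at the midpoint and zero source, the entire indicator comes from the jump term, as expected.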

Note that \(S_hu_h\) and \(S^*_h(S_hu_h-y_d)\) are the standard Galerkin projections of \(Su_h\) and \(S^*(S_hu_h-y_d)\), respectively. Similar to (2.16)–(2.17), standard a posteriori error estimates for elliptic boundary value problems give the following upper bounds (see, e.g., [36]), which show the reliability of the error estimators.

Lemma 3.4

Let S and \(S_h\) be the continuous and discrete solution operators defined above. Then the following a posteriori error estimates hold

$$\begin{aligned} \Vert S u_h-S_hu_h\Vert _{a,\Omega }^2\leqslant & {} {\tilde{C}}_{1}{\eta _{y,h}}^{2}(y_h,\Omega ),\end{aligned}$$
(3.44)
$$\begin{aligned} \Vert S^*(S_hu_h-y_d)-S^*_h(S_hu_h-y_d)\Vert ^2_{a,\Omega }\leqslant & {} {\tilde{C}}_{1}{\eta _{p,h}}^{2}(p_h,\Omega ). \end{aligned}$$
(3.45)

Then we can also derive the following global a posteriori error lower bounds, i.e., the global efficiency of the error estimators.

Lemma 3.5

Let S and \(S_h\) be the continuous and discrete solution operators defined above. Then the following a posteriori error lower bounds hold

$$\begin{aligned} {\tilde{C}}_{2}{\eta _{y,h}}^{2}(y_{h}, \Omega )\leqslant & {} \Vert S u_h-S_hu_h\Vert ^2_{a,\Omega }+ {\tilde{C}}_{3}\mathrm{osc}^2(u_h-Ly_h,\mathcal {T}_h),\end{aligned}$$
(3.46)
$$\begin{aligned} {\tilde{C}}_{2}{\eta _{p,h}}^{2}( p_h,\Omega )\leqslant & {} \Vert S^*(S_h u_h-y_d)-S^*_h(S_hu_h-y_d)\Vert ^2_{a,\Omega }\nonumber \\&+\, {\tilde{C}}_{3}\mathrm{osc}^2(y_h-y_d-L^*p_h,\mathcal {T}_h). \end{aligned}$$
(3.47)

Let \(h_0\in (0,1)\) be the mesh size of the initial mesh \(\mathcal {T}_{h_0}\) and define

$$\begin{aligned} {\tilde{\kappa }}(h_0):=\sup \limits _{h\in (0,h_0]}\kappa (h). \end{aligned}$$

It is obvious that \({\tilde{\kappa }}(h_0)\ll 1\) if \(h_0\ll 1\). For ease of exposition we also define the following quantities:

$$\begin{aligned} \eta _h^2((y_h,p_h),T)= & {} \eta _{y,h}^2(y_{h},T)+\eta _{p,h}^2(p_{h},T),\\ \mathrm{osc}^2((y_h,p_h),T)= & {} \mathrm{osc}^2(u_h-Ly_h,T)+\mathrm{osc}^2(y_h-y_d-L^*p_h,T), \end{aligned}$$

and the straightforward modifications for \(\eta _h^2((y_h,p_h),\Omega )\) and \(\mathrm{osc}^2((y_h,p_h),\mathcal {T}_h)\).

Now we state the following a posteriori error estimates for the finite element approximation of the optimal control problem.

Theorem 3.6

Let \(h\in (0,h_0)\). Assume that \((u,y,p)\in U_{ad}\times H_0^1(\Omega )\times H_0^1(\Omega )\) and \((u_h,y_h,p_h)\in U_{ad}\times V_h\times V_h\) are the solutions of problems (3.1)–(3.2) and (3.7)–(3.8), respectively. Then there exist positive constants \(C_1\), \(C_2\) and \(C_3\), independent of the mesh size h, such that

$$\begin{aligned} \Vert (y-y_h,p- p_h)\Vert _{a}^2\leqslant C_1\eta _{h}^2( (y_h,p_h),\Omega ) \end{aligned}$$
(3.48)

and

$$\begin{aligned} C_2\eta _{h}^2((y_h,p_h),\Omega )\leqslant \Vert (y-y_h,p-p_h)\Vert _{a}^2+C_3\mathrm{osc}^2((y_h,p_h),\mathcal {T}_h) \end{aligned}$$
(3.49)

provided \(h_0\ll 1\).

Proof

Note that \(y^h = Su_h\), \(y_h=S_hu_h\), \(p^h=S^*(S_h u_h-y_d)\) and \(p_h=S^*_h(S_h u_h-y_d)\). From the estimates (3.20), (3.21) and Lemmas 3.4 and 3.5 we have

$$\begin{aligned}&\Vert (y-y_h,p-p_h)\Vert _{a}^2\nonumber \\&\quad \leqslant 2(\Vert y^h-y_h\Vert _{a,\Omega }^2+\Vert p^h-p_h\Vert _{a,\Omega }^2)+{\hat{C}}_1\kappa ^2(h)\Vert (y-y_h,p- p_h)\Vert _{a}^2\\&\quad \leqslant 2{\tilde{C}}_1\eta _{h}^2((y_h,p_h),\Omega )+{\hat{C}}_1{\tilde{\kappa }}^{2}(h_0)\Vert (y-y_h,p- p_h)\Vert _{a}^2 \end{aligned}$$

and

$$\begin{aligned} {\tilde{C}}_{2}{\eta _{h}}^{2}((y_h,p_h),\Omega )\leqslant & {} (\Vert y^h-y_h\Vert _{a,\Omega }^2+\Vert p^h-p_h\Vert _{a,\Omega }^2) +{\tilde{C}}_{3}\mathrm{osc}^{2}((y_h,p_h),\mathcal {T}_h)\\\leqslant & {} \Vert (y-y_h,p-p_h)\Vert _{a,\Omega }^2 +{\tilde{C}}_{3}\mathrm{osc}^2((y_h,p_h),\mathcal {T}_h)\\&+\,{\hat{C}}_2{\tilde{\kappa }}^{2}(h_0)\Vert (y-y_h,p- p_h)\Vert _{a}^2. \end{aligned}$$

We obtain the desired results by choosing

$$\begin{aligned} C_1=\frac{2{\tilde{C}}_1}{1-{\hat{C}}_1{\tilde{\kappa }}^{2}(h_0)},\quad C_2=\frac{{\tilde{C}}_{2}}{1+{\hat{C}}_2{\tilde{\kappa }}^{2}(h_0)},\quad C_3=\frac{{\tilde{C}}_{3}}{1+{\hat{C}}_2{\tilde{\kappa }}^{2}(h_0)}. \end{aligned}$$
(3.50)

\(\square \)

Now we present the adaptive algorithm for solving optimal control problems. Note that there are two error estimators \(\eta _{y,h}(y_h,T)\) and \(\eta _{p,h}(p_h,T)\), corresponding to the state approximation and the adjoint state approximation, respectively. We use the sum of the two estimators as the indicator for the marking strategy. The marking algorithm, based on Dörfler’s strategy for optimal control problems, reads as follows:

Algorithm 3.7

Dörfler’s marking strategy for OCPs

  1. Given a parameter \(0<\theta <1\);

  2. Construct a minimal subset \({{\mathcal {M}}}_{h}\subset \mathcal {T}_h\) such that

     $$\begin{aligned} \sum \limits _{T\in {{\mathcal {M}}}_{h}}\eta _{h}^2((y_h,p_h),T)\geqslant \theta \eta _{h}^2((y_h,p_h),\Omega ). \end{aligned}$$

  3. Mark all the elements in \({{\mathcal {M}}}_{h}\).
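A minimal subset \({{\mathcal {M}}}_{h}\) in step (2) can be built greedily by sorting the squared indicators in decreasing order and marking until the \(\theta \)-fraction is reached. A small sketch, with purely illustrative indicator values:

```python
import numpy as np

def dorfler_mark(eta2, theta):
    """Greedy construction of a minimal set M_h with
    sum_{T in M_h} eta_T^2 >= theta * eta^2(Omega) (Algorithm 3.7)."""
    order = np.argsort(eta2)[::-1]            # largest indicators first
    cumsum = np.cumsum(eta2[order])
    k = np.searchsorted(cumsum, theta * eta2.sum()) + 1
    return order[:k]                          # indices of marked elements

# illustrative squared indicators on five elements
eta2 = np.array([0.1, 4.0, 0.5, 2.0, 0.4])
marked = dorfler_mark(eta2, theta=0.6)        # marks elements 1 and 3
```

Sorting first guarantees minimality of the marked set, which is what the quasi-optimality theory for Dörfler’s strategy requires.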

Then we can present the adaptive finite element algorithm for the optimal control problem (3.7)–(3.8) as follows:

Algorithm 3.8

Adaptive finite element algorithm for OCPs:

  1. Given an initial mesh \(\mathcal {T}_{h_0}\) with mesh size \(h_0\), construct the finite element space \(V_{h_0}\).

  2. Set \(k=0\) and solve the optimal control problem (3.7)–(3.8) to obtain \((u_{h_k},y_{h_k},p_{h_k})\in U_{ad}\times V_{h_k}\times V_{h_k}\).

  3. Compute the local error indicators \(\eta _{h_k}((y_{h_k},p_{h_k}),T)\).

  4. Construct \({{\mathcal {M}}}_{h_k}\subset \mathcal {T}_{h_k}\) by the marking Algorithm 3.7.

  5. Refine \({{ \mathcal {M}}}_{h_k}\) to get a new conforming mesh \(\mathcal {T}_{h_{k+1}}\) by procedure REFINE.

  6. Construct the finite element space \(V_{h_{k+1}}\) and solve the optimal control problem (3.7)–(3.8) to obtain \((u_{h_{k+1}},y_{h_{k+1}},p_{h_{k+1}})\in U_{ad}\times V_{h_{k+1}}\times V_{h_{k+1}}\).

  7. Set \(k=k+1\) and go to Step (3).
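The loop of Algorithm 3.8 has the classical SOLVE–ESTIMATE–MARK–REFINE structure. The driver sketch below is an assumption-laden skeleton: the mesh representation, the stopping tolerance and the four callables are placeholders for a real finite element backend, not part of the algorithm in the paper.

```python
def afem_loop(mesh, solve, estimate, mark, refine, tol=1e-3, max_iter=20):
    """Skeleton of Algorithm 3.8: SOLVE -> ESTIMATE -> MARK -> REFINE.
    The four callables stand in for a real finite element backend."""
    for k in range(max_iter):
        u, y, p = solve(mesh)                # steps (2)/(6): discrete OCP
        eta = estimate(mesh, y, p)           # step (3): local indicators
        if sum(e * e for e in eta) ** 0.5 < tol:
            break                            # estimated error small enough
        marked = mark(eta)                   # step (4): Dorfler marking
        mesh = refine(mesh, marked)          # step (5): conforming REFINE
    return mesh, (u, y, p)
```

Note that the algorithm in the paper has no stopping criterion; the tolerance test above is the usual practical termination rule.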

4 Convergence of AFEM for the optimal control problem

In this section we intend to prove the convergence of the adaptive Algorithm 3.8. The proof uses some ideas of [11, 16] and some results of [7]. Following Theorem 3.3, we first establish some relationships between the two-level approximations, which will be used in our analysis of both convergence and optimal complexity.

Theorem 4.1

Let \(h,H\in (0,h_0)\) and \((u,y,p)\in U_{ad}\times H_0^1(\Omega )\times H_0^1(\Omega )\) be the solution of problem (3.1)–(3.2). Assume that \((u_h, y_h, p_h)\in U_{ad}\times V_h\times V_h\) and \((u_H, y_H, p_H)\in U_{ad}\times V_H\times V_H\) are the corresponding solutions of problem (3.7)–(3.8). Define \(y^H:=Su_H\) and \(p^H:=S^*(S_Hu_H-y_d)\). Then the following properties hold

$$\begin{aligned} \Vert y-y_h\Vert _{a,\Omega }= & {} \left\| y^H-\mathcal {R}_h y^H\right\| _{a,\Omega } +O({\tilde{\kappa }}(h_0))\big (\Vert y-y_h\Vert _{a,\Omega }\nonumber \\&+\,\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\big ),\end{aligned}$$
(4.1)
$$\begin{aligned} \Vert p-p_h\Vert _{a,\Omega }= & {} \left\| p^H-\mathcal {R}_h p^H\right\| _{a,\Omega }+O({\tilde{\kappa }}(h_0))\big (\Vert y-y_h\Vert _{a,\Omega }\nonumber \\&+\,\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\big ),\end{aligned}$$
(4.2)
$$\begin{aligned} \mathrm{osc}(u_h-Ly_h,\mathcal {T}_h)= & {} \mathrm{osc}\left( u_H-L\mathcal {R}_hy^H,\mathcal {T}_h\right) +O({\tilde{\kappa }}(h_0))\big (\Vert y-y_h\Vert _{a,\Omega }\nonumber \\&+\,\Vert p-p_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\big ),\end{aligned}$$
(4.3)
$$\begin{aligned} \mathrm{osc}(y_h-y_d-L^*p_h,\mathcal {T}_h)= & {} \mathrm{osc}\left( y_H-y_d-L^*\mathcal {R}_hp^H,\mathcal {T}_h\right) \nonumber \\&+\,O({\tilde{\kappa }}(h_0))\big (\Vert y-y_h\Vert _{a,\Omega }\nonumber \\&+\,\Vert p-p_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\big )\qquad \qquad \end{aligned}$$
(4.4)

and

$$\begin{aligned} \eta _{y,h}(y_h,\Omega )= & {} {\tilde{\eta }}_{h}(\mathcal {R}_hy^H,\Omega )+O({\tilde{\kappa }}(h_0))\big (\Vert y-y_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }\nonumber \\&+\,\Vert p-p_h\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\big ),\end{aligned}$$
(4.5)
$$\begin{aligned} \eta _{p,h}(p_h,\Omega )= & {} {\tilde{\eta }}_{h}(\mathcal {R}_hp^H,\Omega )+O({\tilde{\kappa }}(h_0))\big (\Vert y-y_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }\nonumber \\&+\,\Vert p-p_h\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\big ) \end{aligned}$$
(4.6)

provided \(h_0\ll 1\).

Proof

Note that

$$\begin{aligned} y-y_h=y^H-\mathcal {R}_hy^H+\mathcal {R}_h(y^H-y^h)+y-y^H \end{aligned}$$
(4.7)

and

$$\begin{aligned} p-p_h=p^H-\mathcal {R}_hp^H+\mathcal {R}_h(p^H-p^h)+p-p^H. \end{aligned}$$
(4.8)

On the other hand, it follows from (2.4), the triangle inequality and the stability results for the elliptic equation that

$$\begin{aligned} \Vert \mathcal {R}_h(y^H-y^h)\Vert _{a,\Omega }+\Vert y-y^H\Vert _{a,\Omega }\lesssim & {} \Vert y^H-y^h\Vert _{a,\Omega }+\Vert y-y^H\Vert _{a,\Omega }\nonumber \\\lesssim & {} \Vert y-y^h\Vert _{a,\Omega }+\Vert y-y^H\Vert _{a,\Omega }\nonumber \\\lesssim & {} \Vert u-u_h\Vert _{0,\Omega }+\Vert u-u_H\Vert _{0,\Omega } \end{aligned}$$
(4.9)

and

$$\begin{aligned} \Vert \mathcal {R}_h(p^H-p^h)\Vert _{a,\Omega }+\Vert p-p^H\Vert _{a,\Omega }\lesssim & {} \Vert p^H-p^h\Vert _{a,\Omega }+\Vert p-p^H\Vert _{a,\Omega }\nonumber \\\lesssim & {} \Vert y-y_h\Vert _{0,\Omega }+\Vert y-y_H\Vert _{0,\Omega }\nonumber \\\lesssim & {} \Vert u-u_h\Vert _{0,\Omega }+\kappa (h)\Vert y-y_h\Vert _{a,\Omega }\nonumber \\&+\Vert u-u_H\Vert _{0,\Omega }+\kappa (H)\Vert y-y_H\Vert _{a,\Omega },\nonumber \\ \end{aligned}$$
(4.10)

where in the last inequality we used (3.25). It follows from (3.35) that

$$\begin{aligned}&\Vert \mathcal {R}_h(y^H-y^h)\Vert _{a,\Omega }+\Vert y-y^H\Vert _{a,\Omega }+\Vert \mathcal {R}_h(p^H-p^h)\Vert _{a,\Omega }+\Vert p-p^H\Vert _{a,\Omega }\nonumber \\&\quad \lesssim \kappa (h)\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }\Big )+\kappa (H)\Big (\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big )\nonumber \\&\quad \lesssim {\tilde{\kappa }}(h_0)\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big ) \end{aligned}$$
(4.11)

provided \(h_0\ll 1\). Combining this with (4.7), (4.8) yields (4.1) and (4.2).

Then we prove (4.3), (4.4). Note that

$$\begin{aligned} u_h-Ly_h= & {} u_H-L\mathcal {R}_hy^H+L\mathcal {R}_h(y^H-y^h)+(u_h-u_H),\end{aligned}$$
(4.12)
$$\begin{aligned} y_h-y_d-L^*p_h= & {} y_H-y_d-L^*\mathcal {R}_hp^H+L^*\mathcal {R}_h(p^H-p^h)+(y_h-y_H).\qquad \qquad \end{aligned}$$
(4.13)

From Lemma 2.2 we have

$$\begin{aligned} \mathrm{osc}(L\mathcal {R}_h(y^H-y^h),\mathcal {T}_h)\lesssim \Vert \mathcal {R}_h(y^H-y^h)\Vert _{a,\Omega },\\ \mathrm{osc}(L^*\mathcal {R}_h(p^H-p^h),\mathcal {T}_h)\lesssim \Vert \mathcal {R}_h(p^H-p^h)\Vert _{a,\Omega }, \end{aligned}$$

which together with (4.11) imply

$$\begin{aligned}&\mathrm{osc}(L\mathcal {R}_h(y^H-y^h),\mathcal {T}_h)+ \mathrm{osc}(L^*\mathcal {R}_h(p^H-p^h),\mathcal {T}_h)\nonumber \\&\quad \lesssim {\tilde{\kappa }}(h_0)\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big ).\qquad \qquad \end{aligned}$$
(4.14)

Moreover, since \(\bar{f}_T\) is the \(L^2\)-projection of f onto piecewise polynomials on T, there holds

$$\begin{aligned} \mathrm{osc}(f,\mathcal {T}_h)=\left( \sum \limits _{T\in \mathcal {T}_h}\Vert h_T(f-\bar{f}_T)\Vert _{0,T}^2\right) ^{1\over 2}\lesssim \Vert f\Vert _{0,\Omega }. \end{aligned}$$

By using the triangle inequality and (3.25) we thus have

$$\begin{aligned} \mathrm{osc}(u_h-u_H,\mathcal {T}_h)\lesssim \Vert u_h-u_H\Vert _{0,\Omega }\lesssim & {} \Vert u-u_h\Vert _{0,\Omega }+\Vert u-u_H\Vert _{0,\Omega },\\ \mathrm{osc}(y_h-y_H,\mathcal {T}_h)\lesssim \Vert y_h-y_H\Vert _{0,\Omega }\lesssim & {} \Vert u-u_h\Vert _{0,\Omega }+\Vert u-u_H\Vert _{0,\Omega }\\&+\,\kappa (H)\Vert y-y_H\Vert _{a,\Omega }+\kappa (h)\Vert y-y_h\Vert _{a,\Omega }, \end{aligned}$$

which together with (3.35) yield

$$\begin{aligned} \mathrm{osc}(u_h-u_H,\mathcal {T}_h)\lesssim & {} {\tilde{\kappa }}(h_0)\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }\nonumber \\&+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big ),\end{aligned}$$
(4.15)
$$\begin{aligned} \mathrm{osc}(y_h-y_H,\mathcal {T}_h)\lesssim & {} {\tilde{\kappa }}(h_0)\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }\nonumber \\&+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big ). \end{aligned}$$
(4.16)

We can conclude the desired results (4.3), (4.4) from the definition of the data oscillation and (4.12)–(4.16).

Now it remains to prove (4.5) and (4.6). From the definition of \(y^H\) and \(y^h\) we know that \(y^h-y^H\) is the solution of an elliptic boundary value problem with right hand side \(u_h-u_H\). It follows from (2.17) and (4.9) that

$$\begin{aligned} {\tilde{\eta }}_{h}(\mathcal {R}_h(y^h-y^H),\Omega )\lesssim & {} \left\| (y^h-y^H)-\mathcal {R}_h(y^h-y^H)\right\| _{a,\Omega }\nonumber \\&+\mathrm{osc}\left( u_h-u_H-L\mathcal {R}_h(y^h-y^H),\mathcal {T}_h\right) \nonumber \\\lesssim & {} \Vert u-u_h\Vert _{0,\Omega }+\Vert u-u_H\Vert _{0,\Omega }\nonumber \\&+\mathrm{osc}\left( u_h-u_H-L\mathcal {R}_h(y^h-y^H),\mathcal {T}_h\right) . \end{aligned}$$
(4.17)

From (2.14), (3.35), (4.14) and (4.15) we are led to

$$\begin{aligned} \mathrm{osc}(u_h-u_H-L\mathcal {R}_h(y^h-y^H),\mathcal {T}_h)\lesssim & {} {\tilde{\kappa }}(h_0)\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }\nonumber \\&+\Vert y-y_H\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big ).\qquad \qquad \end{aligned}$$
(4.18)

Note that

$$\begin{aligned} \eta _{y,h}(y_h,\Omega )={\tilde{\eta }}_{h}(\mathcal {R}_hy^h,\Omega )={\tilde{\eta }}_{h}(\mathcal {R}_hy^H+\mathcal {R}_h(y^h-y^H),\Omega ). \end{aligned}$$

Combining this with (3.35), (4.17) and (4.18) gives

$$\begin{aligned} \eta _{y,h}(y_h,\Omega )= & {} {\tilde{\eta }}_{h}(\mathcal {R}_hy^H,\Omega )+O({\tilde{\kappa }}(h_0))\Big (\Vert y-y_h\Vert _{a,\Omega }+\Vert y-y_H\Vert _{a,\Omega }\\&+\Vert p-p_h\Vert _{a,\Omega }+\Vert p-p_H\Vert _{a,\Omega }\Big ), \end{aligned}$$

which proves (4.5). Similarly we can prove (4.6). Thus, we complete the proof of the theorem. \(\square \)

Now we are ready to prove the error reduction, between two consecutive adaptive loops, for the sum of the energy errors and the scaled error estimators of the state y and the adjoint state p.

Theorem 4.2

Let \((u,y,p)\in U_{ad}\times H_0^1(\Omega )\times H_0^1(\Omega )\) be the solution of problem (3.1)–(3.2) and \((u_{h_k},y_{h_k}, p_{h_k})\in U_{ad}\times V_{h_k}\times V_{h_k}\) be a sequence of solutions to problem (3.7)–(3.8) produced by Algorithm 3.8. Then there exist constants \(\gamma >0\) and \(\beta \in (0,1)\) depending only on the shape regularity of meshes and the parameter \(\theta \) used by Algorithm 3.7, such that for any two consecutive iterates k and \(k+1\), we have

$$\begin{aligned}&\Vert (y-y_{h_{k+1}},p-p_{h_{k+1}})\Vert _{a}^2+\gamma \eta _{h_{k+1}}^2((y_{h_{k+1}},p_{h_{k+1}}),\Omega )\nonumber \\&\quad \leqslant \beta ^2\Big (\Vert (y-y_{h_{k}},p-p_{h_{k}})\Vert _{a}^2+\gamma \eta _{h_{k}}^2((y_{h_{k}},p_{h_{k}}),\Omega )\Big ) \end{aligned}$$
(4.19)

provided \(h_0\ll 1\). Therefore, Algorithm 3.8 converges with a linear rate \(\beta \), namely, the kth iterate solution \((u_{h_k},y_{h_k},p_{h_k})\) of Algorithm 3.8 satisfies

$$\begin{aligned} \Vert (y-y_{h_{k}},p-p_{h_{k}})\Vert _{a}^2+\gamma \eta _{h_{k}}^2((y_{h_{k}},p_{h_{k}}),\Omega )\leqslant C_0\beta ^{2k}, \end{aligned}$$
(4.20)

where \(C_0=\Vert (y-y_{h_{0}},p-p_{h_{0}})\Vert _{a}^2+\gamma \eta _{h_{0}}^2((y_{h_{0}},p_{h_{0}}),\Omega )\).
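Before turning to the proof, the geometric decay in (4.20) can be used to predict how many adaptive iterations a prescribed tolerance requires. A small sketch under illustrative values of \(C_0\), \(\beta \) and the tolerance (none of which are computed in the paper):

```python
import math

def iterations_to_tol(C0, beta, tol):
    """Smallest k with C0 * beta**(2*k) <= tol, from the decay bound
    (4.20); beta in (0,1) is the contraction factor of Theorem 4.2."""
    if tol >= C0:
        return 0
    return math.ceil(math.log(tol / C0) / (2.0 * math.log(beta)))
```

For instance, with \(C_0=1\), \(\beta =0.9\) and tolerance \(10^{-4}\), the bound guarantees the quasi-error drops below tolerance after 44 iterations.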

Proof

For convenience, we use \((u_H,y_H,p_H)\) and \((u_h,y_h,p_h)\) to denote \((u_{h_k},y_{h_k},p_{h_k})\) and \((u_{h_{k+1}}, y_{h_{k+1}},p_{h_{k+1}})\), respectively. So it suffices to prove that for \((u_H,y_H,p_H)\) and \((u_h,y_h,p_h)\), there holds

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+\gamma \eta _{h}^2((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant \beta ^2\Big (\Vert (y-y_{H},p-p_{H})\Vert _{a}^2+\gamma \eta _{H}^2((y_{H},p_{H}),\Omega )\Big ). \end{aligned}$$
(4.21)

Recall that \(y^H:=Su_H\), \(y^h:=Su_h\) and \(p^H:=S^*(S_Hu_H-y_d)\), \(p^h:=S^*(S_hu_h-y_d)\). It follows from Algorithm 3.7 that Dörfler’s marking strategy is satisfied for \((y^H,p^H)\). So we conclude from Theorem 2.4 that there exist constants \({\tilde{\gamma }}\) as defined in (2.24) and \({\tilde{\beta }}\in (0,1)\) satisfying

$$\begin{aligned}&\Vert (y^H-\mathcal {R}_hy^H,p^H-\mathcal {R}_hp^H)\Vert _{a}^2+{\tilde{\gamma }}\big ({\tilde{\eta }}_h^{2}(\mathcal {R}_hy^H,\Omega )+{\tilde{\eta }}_{h}^{2}(\mathcal {R}_{h}p^{H},\Omega )\big )\nonumber \\&\quad \leqslant {\tilde{\beta }}^{2}\Big (\Vert (y^H-\mathcal {R}_Hy^H,p^H-\mathcal {R}_Hp^H)\Vert _{a}^2\nonumber \\&\qquad +\,{\tilde{\gamma }}\big ({\tilde{\eta }}_H^2(\mathcal {R}_Hy^H,\Omega )+{\tilde{\eta }}_{H}^{2}(\mathcal {R}_Hp^H,\Omega )\big )\Big ). \end{aligned}$$
(4.22)

Since \(\mathcal {R}_Hy^H=y_H\) and \(\mathcal {R}_Hp^H=p_H\), we thus have

$$\begin{aligned}&\Vert (y^H-\mathcal {R}_{h}y^{H},p^{H}-\mathcal {R}_{h}p^{H})\Vert _{a}^2+{\tilde{\gamma }}\big ({\tilde{\eta }}_h^{2}(\mathcal {R}_hy^H,\Omega )+{\tilde{\eta }}_h^{2}(\mathcal {R}_{h}p^{H},\Omega )\big )\nonumber \\&\quad \leqslant {\tilde{\beta }}^{2}\Big (\Vert (y^H-y_H,p^H-p_H)\Vert _{a}^2+{\tilde{\gamma }}\big ({\eta _{y,H}}^2(y_H,\Omega )+{\eta _{p,H}}^2(p_H,\Omega )\big )\Big ).\qquad \qquad \end{aligned}$$
(4.23)

We conclude from (4.1), (4.2) and (4.5), (4.6) that there exists a constant \({\hat{C}}_4>0\) such that

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{h}}^{2}((y_{h},p_{h}),\Omega )\\&\quad \leqslant (1+\delta _1)\Vert (y^H-\mathcal {R}_hy^{H},p^H-\mathcal {R}_hp^{H})\Vert _{a}^2+(1+\delta _1){\tilde{\gamma }}\big ({\tilde{\eta }}_{h}^2(\mathcal {R}_hy^{H},\Omega )\\&\qquad +\,{\tilde{\eta }}_{h}^2(\mathcal {R}_hp^{H},\Omega )\big )\\&\qquad +\,{\hat{C}}_4(1+\delta _1^{-1}){\tilde{\kappa }}^2(h_0)\Big (\Vert (y-y_h,p-p_h)\Vert ^2_{a}+\Vert (y-y_H,p-p_H)\Vert ^2_{a}\Big )\\&\qquad +\,{\hat{C}}_4(1+\delta _1^{-1}){\tilde{\kappa }}^{2}(h_{0}){\tilde{\gamma }}\Big (\Vert (y-y_h,p-p_h)\Vert ^2_{a}+\Vert (y-y_H,p-p_H)\Vert ^2_{a}\Big ), \end{aligned}$$

where Young’s inequality with \(\delta _1\) is used and \(\delta _1\in (0,1)\) satisfies

$$\begin{aligned} (1+\delta _1){\tilde{\beta }}^{2}<1. \end{aligned}$$
(4.24)

Thus, there exists a positive constant \({\hat{C}}_5\) depending on \({\hat{C}}_4\) and \({\tilde{\gamma }}\) such that \({\hat{C}}_4(1+\delta _1^{-1})(1+{\tilde{\gamma }})\leqslant {\hat{C}}_{5}\delta _1^{-1}\) and

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{h}}^{2}((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant (1+\delta _1)\left( \left\| (y^H-\mathcal {R}_hy^{H},p^{H}-\mathcal {R}_{h}p^{H})\right\| _{a}^2\right. \nonumber \\&\qquad \left. +\,{\tilde{\gamma }}\left( {\tilde{\eta }}_{h}^{2}(\mathcal {R}_{h}y^{H},\Omega )+{\tilde{\eta }}_{h}^2(\mathcal {R}_hp^{H},\Omega )\right) \right) \nonumber \\&\qquad +\,{\hat{C}}_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)\Big (\Vert (y-y_h,p-p_h)\Vert ^2_{a}+\Vert (y-y_H,p-p_H)\Vert ^2_{a}\Big ). \end{aligned}$$
(4.25)

It follows from (4.23) and (4.25) that

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{h}}^{2}((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant (1+\delta _1){\tilde{\beta }}^{2}\left( \left\| (y^H-y_{H},p^H-p_{H})\right\| _{a}^2+{\tilde{\gamma }}{\eta _{H}}^2((y_{H},p_{H}),\Omega )\right) \nonumber \\&\qquad +\,{\hat{C}}_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_{0})\Big (\Vert (y-y_h,p-p_h)\Vert ^2_{a}+\Vert (y-y_H,p-p_H)\Vert ^2_{a}\Big ). \end{aligned}$$
(4.26)

Then using Theorem 3.3 we arrive at

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{h}}^{2}((y_{h},p_{h}),\Omega )\\&\quad \leqslant (1+\delta _1){\tilde{\beta }}^{2}\Big ((1+{\hat{C}}_{6}{\tilde{\kappa }}(h_0))^2\Vert (y-y_{H},p-p_{H})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{H}}^{2}((y_{H},p_{H}),\Omega )\Big )\\&\qquad +\,{\hat{C}}_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)\Big (\Vert (y-y_h,p-p_h)\Vert ^2_{a}+\Vert (y-y_H,p-p_H)\Vert ^2_{a}\Big ),\\&\quad \leqslant (1+\delta _1){\tilde{\beta }}^{2}\Big (\Vert (y-y_{H},p-p_{H})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{H}}^{2}((y_{H},p_{H}),\Omega )\Big )\\&\qquad +\,\Big ({\hat{C}}_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)+(1+\delta _{1}){\tilde{\beta }}^{2}\big (2{\hat{C}}_{6}{\tilde{\kappa }}(h_0)+{\hat{C}}_{6}^{2}{\tilde{\kappa }}^{2}(h_0)\big )\Big )\Vert (y-y_H,p-p_H)\Vert ^2_{a}\\&\qquad +\,{\hat{C}}_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)\Vert (y-y_h,p-p_h)\Vert ^2_{a}, \end{aligned}$$

and thus

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{h}}^{2}((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant (1+\delta _1){\tilde{\beta }}^{2}\Big (\Vert (y-y_{H},p-p_{H})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{H}}^{2}((y_{H},p_{H}),\Omega )\Big )\nonumber \\&\qquad +\,C_4{\tilde{\kappa }}(h_0)\Vert (y-y_H,p-p_H)\Vert ^2_{a}+C_4\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)\Vert (y-y_h,p-p_h)\Vert ^2_{a},\qquad \qquad \end{aligned}$$
(4.27)

where \(C_4\) is a positive constant depending on \({\hat{C}}_5\) and \({\hat{C}}_6\) provided \(h_0\ll 1\). Hence we can derive

$$\begin{aligned}&(1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0))\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+{\tilde{\gamma }}{\eta _{h}}^2((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant \Big ((1+\delta _1){\tilde{\beta }}^2+C_4{\tilde{\kappa }}(h_0)\Big )\Vert (y-y_{H},p-p_{H})\Vert _{a}^2\nonumber \\&\qquad +\,(1+\delta _1){\tilde{\beta }}^2{\tilde{\gamma }}{\eta _{H}}^2((y_{H},p_{H}),\Omega ), \end{aligned}$$
(4.28)

or equivalently,

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+\frac{{\tilde{\gamma }}}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}{\eta _{h}}^2((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant \frac{(1+\delta _1){\tilde{\beta }}^2+C_4{\tilde{\kappa }}(h_0)}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\Vert (y-y_{H},p-p_{H})\Vert _{a}^2\nonumber \\&\qquad +\,\frac{(1+\delta _1){\tilde{\beta }}^2\tilde{\gamma }}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\eta _{H}^2((y_{H},p_{H}),\Omega ). \end{aligned}$$
(4.29)

Since \({\tilde{\kappa }}(h_0)\ll 1\) provided that \(h_0\ll 1\), we can define the constant \(\beta \) as

$$\begin{aligned} \beta :=\left( \frac{(1+\delta _1){\tilde{\beta }}^2+C_4{\tilde{\kappa }}(h_0)}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\right) ^{1\over 2}, \end{aligned}$$
(4.30)

which satisfies \(\beta \in (0,1)\) if \(h_0\ll 1\) in view of (4.24). Then

$$\begin{aligned}&\Vert (y-y_{h},p-p_{h})\Vert _{a}^2+\frac{{\tilde{\gamma }}}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\eta _{h}^2((y_{h},p_{h}),\Omega )\nonumber \\&\quad \leqslant \beta ^2\Big (\Vert (y-y_{H},p-p_{H})\Vert _{a}^2+\frac{(1+\delta _1){\tilde{\beta }}^2\tilde{\gamma }}{(1+\delta _1){\tilde{\beta }}^2+C_4{\tilde{\kappa }}(h_0)}{\eta _{H}}^2((y_{H},p_{H}),\Omega )\Big ).\nonumber \\ \end{aligned}$$
(4.31)

Now we choose

$$\begin{aligned} \gamma :=\frac{{\tilde{\gamma }}}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}, \end{aligned}$$
(4.32)

Then it is obvious that

$$\begin{aligned} \frac{(1+\delta _1){\tilde{\beta }}^2\tilde{\gamma }}{(1+\delta _1){\tilde{\beta }}^2+C_4{\tilde{\kappa }}(h_0)}= & {} \frac{(1+\delta _1){\tilde{\beta }}^2(1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0))}{(1+\delta _1){\tilde{\beta }}^2+C_4{\tilde{\kappa }}(h_0)}\gamma \\ < & {} (1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0))\gamma <\gamma . \end{aligned}$$

Then we obtain (4.21), which completes the proof. \(\square \)

Remark 4.3

We remark that the requirement \(h_0\ll 1\) on the initial mesh \(\mathcal {T}_{h_0}\) is not restrictive for the convergence analysis of AFEM for nonlinear problems such as the optimal control problems studied in this paper; see, e.g., [14]. Similar requirements appear in [10, 11] for the convergence analysis of adaptive finite element eigenvalue computations and in [31] for adaptive finite element computations for nonsymmetric boundary value problems. We also mention [16] for an adaptive finite element method for semilinear elliptic equations.

Remark 4.4

In the adaptive Algorithm 3.8 we use the sum of the error estimators \(\eta _{y,h}(y_h,T)\), associated with the state approximation, and \(\eta _{p,h}(p_h,T)\), associated with the adjoint state approximation, as the indicator to select the subset \({{\mathcal {M}}}_h\) for refinement. This marking strategy enables us to prove the convergence and quasi-optimality (see Sect. 5) of AFEM for optimal control problems. We remark that it is also possible to apply Dörfler’s strategy to the contributions \(\eta _{y,h}(y_h,T)\) and \(\eta _{p,h}(p_h,T)\) separately, as follows:

  • Construct a minimal subset \({{\mathcal {M}}}_{h,1}\subset \mathcal {T}_h\) such that \(\sum \nolimits _{T\in {{\mathcal {M}}}_{h,1}} \eta _{y,h}^2(y_h,T)\geqslant \theta \eta _{y,h}^2(y_h,\Omega )\).

  • Construct another minimal subset \({{\mathcal {M}}}_{h,2}\subset \mathcal {T}_h\) such that \(\sum \nolimits _{T\in {{\mathcal {M}}}_{h,2}} \eta _{p,h}^2(p_h,T)\geqslant \theta \eta _{p,h}^2(p_h,\Omega )\).

  • Set \({{\mathcal {M}}}_{h}:={{\mathcal {M}}}_{h,1}\cup {{\mathcal {M}}}_{h,2}\) and mark all the elements in \({{\mathcal {M}}}_{h}\).

With this marking strategy we can also prove the convergence of AFEM for optimal control problems by using the results of [7, 11] for single boundary value problems. To be more specific, the error reduction (4.22) can be derived separately for the state and adjoint state approximations. However, the resulting over-refinement of this marking strategy prevents us from proving the quasi-optimality of the adaptive algorithm.
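As a concrete illustration, the componentwise strategy above can be sketched in a few lines of Python. This is a minimal sketch with hypothetical function names, assuming the per-element indicator values are available as plain lists; greedily taking the largest indicators first yields a minimal marked set.

```python
def dorfler_mark(indicators, theta):
    """Greedy Dörfler marking: return a minimal set of element indices
    whose squared indicators sum to at least theta times the total."""
    total = sum(eta ** 2 for eta in indicators)
    # Sort elements by decreasing indicator so the marked set is minimal.
    order = sorted(range(len(indicators)), key=lambda T: -indicators[T])
    marked, acc = set(), 0.0
    for T in order:
        if acc >= theta * total:
            break
        marked.add(T)
        acc += indicators[T] ** 2
    return marked

def mark_separately(eta_y, eta_p, theta):
    """Alternative strategy of Remark 4.4: mark the state and adjoint
    contributions separately and refine the union M_h = M_{h,1} ∪ M_{h,2}."""
    return dorfler_mark(eta_y, theta) | dorfler_mark(eta_p, theta)
```

Since each component must satisfy the Dörfler property on its own, the union is typically larger than a single marked set for the summed indicator, which is the source of the over-refinement mentioned above.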

We would also like to point out that our convergence analysis uses the same mesh for the state and adjoint state approximations. Another possibility is to use different meshes for the state and the adjoint state: starting from the same initial mesh, one obtains different adaptive meshes for the state and adjoint state approximations, where the refinement of each mesh is driven by the error indicators of the respective state and adjoint state equations. Proving the convergence of AFEM based on such separate discretizations for the primal and dual variables is an interesting topic for future work.

5 Complexity of AFEM for the optimal control problem

In this section we intend to analyze the complexity of the adaptive finite element algorithm for optimal control problems based on known results on the complexity for elliptic boundary value problems. The proof uses some ideas of [11, 16] and some results of [7].

Similar to [7, 11], to analyze the complexity of AFEM for optimal control problems we introduce the following function approximation class:

$$\begin{aligned} \mathcal {A}_\gamma ^s:=\Big \{(y,p,y_d)\in H_0^1(\Omega )\times H_0^1(\Omega )\times L^2(\Omega ):\quad |(y,p,y_d)|_{s,\gamma }<+\infty \Big \}, \end{aligned}$$

where \(\gamma >0\) is some constant and

$$\begin{aligned} |(y,p,y_d)|_{s,\gamma }=\sup \limits _{\varepsilon >0}\varepsilon \mathop {\inf }_{\begin{array}{c} \mathcal {T}\subset \mathcal {T}_{h_0}:\ \big (\Vert (y-y_{\mathcal {T}},p-p_{\mathcal {T}})\Vert _{a}^2 \\ +\,(\gamma +1)\mathrm{osc}^2((y_{\mathcal {T}},p_{\mathcal {T}}),{\mathcal {T}})\big )^{1/2}\leqslant \varepsilon \end{array}}(\#{\mathcal {T}}-\#{\mathcal {T}_{h_0}})^s. \end{aligned}$$

Here \(\mathcal {T}\subset \mathcal {T}_{h_0}\) means that \(\mathcal {T}\) is a refinement of \(\mathcal {T}_{h_0}\), and \(y_{\mathcal {T}}\), \(p_{\mathcal {T}}\) are elements of the finite element space corresponding to the partition \(\mathcal {T}\). It is seen from the definition that \(\mathcal {A}_\gamma ^s=\mathcal {A}_1^s\) for all \(\gamma >0\), so we write \(\mathcal {A}^s\) throughout the paper, with corresponding norm \(|\cdot |_{s}\). Thus \(\mathcal {A}^s\) is the class of triplets that can be approximated to a given tolerance \(\varepsilon \) by continuous piecewise linear finite element functions over a partition \(\mathcal {T}\) with \(\#{\mathcal {T}}-\#{\mathcal {T}_{h_0}}\lesssim \varepsilon ^{-1/s}|(y,p,y_d)|_s^{1/s}\).
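One way to see the independence of \(\mathcal {A}_\gamma ^s\) from \(\gamma \) is the following elementary equivalence: for \(0<\gamma _1\leqslant \gamma _2\),

$$\begin{aligned} \Vert (y-y_{\mathcal {T}},p-p_{\mathcal {T}})\Vert _{a}^2+(\gamma _1+1)\mathrm{osc}^2((y_{\mathcal {T}},p_{\mathcal {T}}),\mathcal {T})&\leqslant \Vert (y-y_{\mathcal {T}},p-p_{\mathcal {T}})\Vert _{a}^2+(\gamma _2+1)\mathrm{osc}^2((y_{\mathcal {T}},p_{\mathcal {T}}),\mathcal {T})\\&\leqslant \frac{\gamma _2+1}{\gamma _1+1}\Big (\Vert (y-y_{\mathcal {T}},p-p_{\mathcal {T}})\Vert _{a}^2+(\gamma _1+1)\mathrm{osc}^2((y_{\mathcal {T}},p_{\mathcal {T}}),\mathcal {T})\Big ), \end{aligned}$$

so the tolerance in the definition of \(|\cdot |_{s,\gamma }\) changes only by a fixed constant when \(\gamma \) varies, and the decay exponent \(s\) is unchanged.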

Now we are in a position to prepare for the proof of optimal complexity of Algorithm 3.8 for the optimal control problem (3.1)–(3.2). First, we define \(y^{h_k}:=Su_{h_k}\) and \(p^{h_k}:=S^*(S_{h_k}u_{h_k}-y_d)\). Then we have the following result.

Lemma 5.1

Let \((u_{h_k},y_{h_k},p_{h_k})\in U_{ad}\times V_{h_k}\times V_{h_k}\) and \((u_{h_{k+1}},y_{h_{k+1}},p_{h_{k+1}})\in U_{ad}\times V_{h_{k+1}}\times V_{h_{k+1}}\) be discrete solutions of problem (3.7)–(3.8) over mesh \(\mathcal {T}_{h_k}\) and its refinement \(\mathcal {T}_{h_{k+1}}\) with marked elements \(\mathcal {M}_{h_{k}}\). Suppose they satisfy the following property

$$\begin{aligned}&\Vert (y-y_{h_{k+1}},p-p_{h_{k+1}})\Vert _{a}^2+ \gamma _*\mathrm{osc}^2((y_{h_{k+1}},p_{h_{k+1}}),\mathcal {T}_{h_{k+1}})\nonumber \\&\quad \leqslant \beta _*^2\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+\gamma _*\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big ) \end{aligned}$$
(5.1)

with \(\gamma _*\) and \(\beta _*\) some positive constants. Then for the associated state and adjoint state approximations we have

$$\begin{aligned}&\Vert (y^{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p^{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k})\Vert _{a}^2+ {\tilde{\gamma }}_{*}\mathrm{osc}^2((\mathcal {R}_{h_{k+1}}y^{h_k},\mathcal {R}_{h_{k+1}}p^{h_k}),\mathcal {T}_{h_{k+1}})\nonumber \\&\quad \leqslant {\tilde{\beta }}_{*}^{2}\Big (\Vert (y^{h_k}-\mathcal {R}_{h_k}y^{h_k},p^{h_k}-\mathcal {R}_{h_k}p^{h_k})\Vert _{a}^2+{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big ) \end{aligned}$$
(5.2)

with

$$\begin{aligned} {\tilde{\beta }}_{*}:=\left( \frac{(1+\delta _1)\beta ^2_*+C_5{\tilde{\kappa }}(h_0)}{1-C_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)}\right) ^{1\over 2},\quad {\tilde{\gamma }}_{*}:=\frac{\gamma _{*}}{1-C_5\delta _1^{-1}{\tilde{\kappa }}^{2}(h_0)}, \end{aligned}$$

where \(C_5\) is some constant depending on \(C_*\), \({\hat{C}}_5\) and \({\hat{C}}_6\). \({\hat{C}}_5\), \({\hat{C}}_6\) and \(\delta _1\in (0,1)\) are some constants as in the proof of Theorem 4.2.

Proof

The proof follows along the lines of the proof of Theorem 4.2, with (4.5), (4.6) replaced by (4.3), (4.4). Specifically, in the proof of Theorem 4.2 we used (4.22) together with Theorems 3.3 and 4.1 to prove (4.21); here, conversely, we derive (4.22) from (4.21) together with Theorems 3.3 and 4.1. The definitions of \({\tilde{\beta }}_{*}\) and \({\tilde{\gamma }}_{*}\) are analogous to (4.30) and (4.32). \(\square \)

Next, we derive a result similar to Lemma 2.6 concerning the optimality of Dörfler’s marking strategy for optimal control problems.

Corollary 5.2

Let \((u_{h_k},y_{h_k},p_{h_k})\in U_{ad}\times V_{h_k}\times V_{h_k}\) and \((u_{h_{k+1}},y_{h_{k+1}},p_{h_{k+1}})\in U_{ad}\times V_{h_{k+1}}\times V_{h_{k+1}}\) be discrete solutions of problem (3.7)–(3.8) over the mesh \(\mathcal {T}_{h_k}\) and its refinement \(\mathcal {T}_{h_{k+1}}\) with marked elements \(\mathcal {M}_{h_k}\). Suppose they satisfy the following property

$$\begin{aligned}&\Vert (y-y_{h_{k+1}},p-p_{h_{k+1}})\Vert _{a}^2+ \gamma _*\mathrm{osc}^2((y_{h_{k+1}},p_{h_{k+1}}),\mathcal {T}_{h_{k+1}})\\&\quad \leqslant \beta _*^2(\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+\gamma _*\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})) \end{aligned}$$

with constants \(\gamma _*>0\) and \(\beta _*\in (0,\sqrt{1\over 2})\). Then the set \(R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}\) of refined elements satisfies the Dörfler property

$$\begin{aligned} \sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)\geqslant {\hat{\theta }} \sum \limits _{T\in \mathcal {T}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T) \end{aligned}$$
(5.3)

with \({\hat{\theta }}=\frac{{\tilde{C}}_{2}(1-2{\tilde{\beta }}_{*}^{2})}{{\tilde{C}}_{0}(\tilde{C}_1+(1+2C_*^2{\tilde{C}}_{1}){\tilde{\gamma }}_{*})}\) and \({\tilde{C}}_{0}=\max (1,\frac{{\tilde{C}}_{3}}{{\tilde{\gamma }}_{*}})\).

Proof

From the statement of this corollary, assumption (5.1) of Lemma 5.1 is satisfied, so Lemma 5.1 yields (5.2). Note that \(y_{h_k}=\mathcal {R}_{h_k}y^{h_k}\) and \(p_{h_k}=\mathcal {R}_{h_k}p^{h_k}\). By the lower bounds in Lemma 3.5 and the definition of \({\tilde{C}}_{0}\) we have

$$\begin{aligned} (1-2{\tilde{\beta }}_{*}^{2}){\tilde{C}}_{2}{\eta _{h_k}}^2((y_{h_k},p_{h_k}),\Omega )\leqslant & {} (1-2{\tilde{\beta }}_{*}^{2})\Big (\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2\\&+\,{\tilde{C}}_{3}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )\\= & {} (1-2{\tilde{\beta }}_{*}^{2})\Big (\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2\\&+\,\frac{{\tilde{C}}_{3}}{\tilde{\gamma }_*}{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )\\\leqslant & {} {\tilde{C}}_{0}(1-2\tilde{\beta }_*^2)\Big (\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2\\&+\,{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big ). \end{aligned}$$

Thus, it follows from (5.2) that

$$\begin{aligned}&\frac{{\tilde{C}}_2}{{\tilde{C}}_0}(1-2{\tilde{\beta }}_{*}^{2})\sum \limits _{T\in \mathcal {T}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)\nonumber \\&\quad \leqslant (1-2{\tilde{\beta }}_{*}^2)\Big (\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2 +{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )\nonumber \\&\quad =\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2 +{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\nonumber \\&\qquad -\,2{\tilde{\beta }}_{*}^2\Big (\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2 +{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )\nonumber \\&\quad \leqslant \Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2 +{\tilde{\gamma }}_{*}\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\nonumber \\&\qquad -\,2\Big (\Vert (y^{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p^{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k})\Vert _{a}^2\nonumber \\&\qquad +\,{\tilde{\gamma }}_{*}\mathrm{osc}^2((\mathcal {R}_{h_{k+1}}y^{h_k},\mathcal {R}_{h_{k+1}}p^{h_k}),\mathcal {T}_{h_{k+1}})\Big )\nonumber \\&\quad \leqslant \Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2-\Vert (y^{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p^{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k})\Vert _{a}^2\nonumber \\&\qquad +\,{\tilde{\gamma }}_{*}\Big (\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})-2\mathrm{osc}^2((\mathcal {R}_{h_{k+1}}y^{h_k},\mathcal {R}_{h_{k+1}}p^{h_k}),\mathcal {T}_{h_{k+1}})\Big ). \end{aligned}$$
(5.4)

Note that \(y_{h_k}\) and \(\mathcal {R}_{h_{k+1}}y^{h_k}\) are the Galerkin projections of \(y^{h_k}\) on \(V_{h_k}\) and \(V_{h_{k+1}}\), respectively. From the standard Galerkin orthogonality we have

$$\begin{aligned}&\Vert (y^{h_k}-y_{h_k},p^{h_k}-p_{h_k})\Vert _{a}^2-\Vert (y^{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p^{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k})\Vert _{a}^2\nonumber \\&\quad =\Vert (y_{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p_{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k})\Vert _{a}^2. \end{aligned}$$
(5.5)

By (2.15), the triangle and Young’s inequalities we have

$$\begin{aligned} \mathrm{osc}^2((y_{h_k},p_{h_k}),T)\leqslant & {} 2 \mathrm{osc}^2((\mathcal {R}_{h_{k+1}}y^{h_k},\mathcal {R}_{h_{k+1}}p^{h_k}),T)\\&+\,2C_*^2\Vert (y_{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p_{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k})\Vert _{a}^2, \end{aligned}$$

which together with the dominance of the indicator over oscillation (see [7, Remark 2.1])

$$\begin{aligned} \mathrm{osc}^2(u_{h_k}-Ly_{h_k},T)\leqslant & {} \eta _{y,h_k}^2(y_{h_k},T),\end{aligned}$$
(5.6)
$$\begin{aligned} \mathrm{osc}^2(y_{h_k}-y_d-L^*p_{h_k},T)\leqslant & {} \eta _{p,h_k}^2(p_{h_k},T) \end{aligned}$$
(5.7)

implies

$$\begin{aligned}&\mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})-2\mathrm{osc}^2\left( \left( \mathcal {R}_{h_{k+1}}y^{h_k},\mathcal {R}_{h_{k+1}}p^{h_k}\right) ,\mathcal {T}_{h_{k+1}}\right) \nonumber \\&\quad \leqslant \sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\mathrm{osc}^2((y_{h_k},p_{h_k}),T)+\mathrm{osc}^2\left( (y_{h_k},p_{h_k}),\mathcal {T}_{h_k}\cap \mathcal {T}_{h_{k+1}}\right) \nonumber \\&\qquad -\,2\mathrm{osc}^2\left( \left( \mathcal {R}_{h_{k+1}}y^{h_k},\mathcal {R}_{h_{k+1}}p^{h_k}\right) ,\mathcal {T}_{h_k}\cap \mathcal {T}_{h_{k+1}}\right) \nonumber \\&\quad \leqslant \sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)+2C_*^2\left\| \left( y_{h_k}-\mathcal {R}_{h_{k+1}}y^{h_k},p_{h_k}-\mathcal {R}_{h_{k+1}}p^{h_k}\right) \right\| _{a}^2\nonumber \\&\quad \leqslant (1+2C_*^2{\tilde{C}}_1)\sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T), \end{aligned}$$
(5.8)

where we used (2.25) in the last inequality. Combining (5.4)–(5.8) and (2.25) we obtain

$$\begin{aligned}&\frac{{\tilde{C}}_2}{{\tilde{C}}_0}(1-2{\tilde{\beta }}_{*}^2)\sum \limits _{T\in \mathcal {T}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)\nonumber \\&\qquad \leqslant ({\tilde{C}}_1+(1+2C_{*}^2{\tilde{C}}_1){\tilde{\gamma }}_{*})\sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_{k+1}}}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T). \end{aligned}$$
(5.9)

By choosing

$$\begin{aligned} {\hat{\theta }}:=\frac{\frac{{\tilde{C}}_2}{{\tilde{C}}_0}(1-2{\tilde{\beta }}_{*}^2)}{{\tilde{C}}_1+(1+2C_{*}^2{\tilde{C}}_1){\tilde{\gamma }}_{*}}=\frac{{\tilde{C}}_2(1-2{\tilde{\beta }}_{*}^2)}{{\tilde{C}}_0({\tilde{C}}_1+(1+2C_{*}^2{\tilde{C}}_1){\tilde{\gamma }}_{*})} \end{aligned}$$

we complete the proof. \(\square \)

Lemma 5.3

Let \((y,p,y_d)\in \mathcal {A}^s\) and \(\mathcal {T}_{h_k}\) (\(k\geqslant 0\)) be a sequence of meshes generated by Algorithm 3.8 starting from the initial mesh \(\mathcal {T}_{h_0}\). Let \(\mathcal {T}_{h_{k+1}}=\mathrm{REFINE}(\mathcal {T}_{h_k},\mathcal {M}_{h_k})\) where \(\mathcal {M}_{h_k}\) is produced by Algorithm 3.7 with \(\theta \) satisfying \(\theta \in (0,\frac{ C_2\gamma }{C_3(C_1+(1+2C_*^2 C_1)\gamma )})\). Then

$$\begin{aligned} \#\mathcal {M}_{h_{k}} \leqslant C_5\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )^{-{1\over 2s}}|(y,p,y_d)|_s^{1\over s},\nonumber \\ \end{aligned}$$
(5.10)

where the constant \(C_5\) depends on the discrepancy between \(\theta \) and \(\frac{ C_2\gamma }{C_3(C_1+(1+2C_*^2 C_1)\gamma )}\).

Proof

Let \(\rho ,\rho _1\in (0,1)\) satisfy \(\rho _1\in (0,\rho )\) and

$$\begin{aligned} \theta <\frac{ C_2\gamma }{C_3(C_1+(1+2C_*^2 C_1)\gamma )}(1-\rho ^2). \end{aligned}$$

Choose \(\delta _1\in (0,1)\) to satisfy (4.24) and

$$\begin{aligned} (1+\delta _1)^2\rho _1^2\leqslant \rho ^2, \end{aligned}$$
(5.11)

which, since \(1+\delta _1\geqslant 1\) and \(\rho <1\), implies

$$\begin{aligned} (1+\delta _1)\rho _1^2<1. \end{aligned}$$
(5.12)

Set

$$\begin{aligned} \varepsilon = \frac{1}{\sqrt{2}}\rho _1\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )^{1\over 2} \end{aligned}$$

and let \(\mathcal {T}_{h_\varepsilon }\) be a refinement of \(\mathcal {T}_{h_0}\) with minimal degrees of freedom satisfying

$$\begin{aligned} \Vert (y-y_{h_\varepsilon },p-p_{h_\varepsilon })\Vert _{a}^2+ (\gamma +1)\mathrm{osc}^2((y_{h_\varepsilon },p_{h_\varepsilon }),\mathcal {T}_{h_\varepsilon })\leqslant \varepsilon ^2. \end{aligned}$$
(5.13)

We can conclude from the definition of \(\mathcal {A}^s\) that

$$\begin{aligned} \#\mathcal {T}_{h_\varepsilon }-\#\mathcal {T}_{h_0}\lesssim \varepsilon ^{-{1\over s}} |(y,p,y_d)|_s^{1\over s}. \end{aligned}$$

Let \(\mathcal {T}_{h_*}:=\mathcal {T}_{h_\varepsilon }\oplus \mathcal {T}_{h_k}\) be the smallest common refinement of \(\mathcal {T}_{h_\varepsilon }\) and \(\mathcal {T}_{h_k}\), and let \(V_{h_\varepsilon }\subset H_0^1(\Omega )\) and \(V_{h_*}\subset H_0^1(\Omega )\) be the finite element spaces defined on \(\mathcal {T}_{h_\varepsilon }\) and \(\mathcal {T}_{h_*}\), respectively. Let \((u_{h_\varepsilon },y_{h_\varepsilon },p_{h_\varepsilon })\in U_{ad}\times V_{h_\varepsilon }\times V_{h_\varepsilon }\) be the solution of problem (3.7)–(3.8) over \(\mathcal {T}_{h_\varepsilon }\).

Define \(y^{h_\varepsilon }:=Su_{h_\varepsilon }\) and \(p^{h_\varepsilon }:=S^*(S_{h_\varepsilon }u_{h_\varepsilon }-y_d)\). From the definition of oscillation and Lemma 2.2 we conclude that

$$\begin{aligned}&\mathrm{osc}\left( u_{h_\varepsilon }-L\mathcal {R}_{h_*}y^{h_\varepsilon },\mathcal {T}_{h_*}\right) \\&\quad \leqslant \mathrm{osc}\left( u_{h_\varepsilon }-L\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },\mathcal {T}_{h_*}\right) +\mathrm{osc}\left( L(\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })y^{h_\varepsilon },\mathcal {T}_{h_*}\right) \\&\quad \leqslant \mathrm{osc}\left( u_{h_\varepsilon }-L\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },\mathcal {T}_{h_*}\right) +C_*\left\| (\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })y^{h_\varepsilon }\right\| _{a,\Omega } \end{aligned}$$

and

$$\begin{aligned} \mathrm{osc}\left( y_{h_\varepsilon }-y_d-L^*\mathcal {R}_{h_*}p^{h_\varepsilon },\mathcal {T}_{h_*}\right)\leqslant & {} \mathrm{osc}\left( y_{h_\varepsilon }-y_d-L^*\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon },\mathcal {T}_{h_*}\right) \\&+\,\mathrm{osc}\left( L^*(\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })p^{h_\varepsilon },\mathcal {T}_{h_*}\right) \\\leqslant & {} \mathrm{osc}\left( y_{h_\varepsilon }-y_d-L^*\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon },\mathcal {T}_{h_*}\right) \\&+\,C_*\left\| (\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })p^{h_\varepsilon }\right\| _{a,\Omega }. \end{aligned}$$

Then from Young’s inequality we have

$$\begin{aligned} \mathrm{osc}^2\left( (\mathcal {R}_{h_*}y^{h_\varepsilon },\mathcal {R}_{h_*}p^{h_\varepsilon }),\mathcal {T}_{h_*}\right)\leqslant & {} 2\mathrm{osc}^2\left( (\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }),\mathcal {T}_{h_*}\right) \\&+\,2C_*^2\left\| \left( (\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })y^{h_\varepsilon },(\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })p^{h_\varepsilon }\right) \right\| _{a}^2. \end{aligned}$$

Due to the orthogonality

$$\begin{aligned} \left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_*}y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_*}p^{h_\varepsilon }\right) \right\| _{a}^2= & {} \left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) \right\| _{a}^2\\&-\left\| \left( (\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })y^{h_\varepsilon },(\mathcal {R}_{h_*}-\mathcal {R}_{h_\varepsilon })p^{h_\varepsilon }\right) \right\| _{a}^2, \end{aligned}$$

we arrive at

$$\begin{aligned}&\left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_*}y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_*}p^{h_\varepsilon }\right) \right\| _{a}^2+\frac{1}{2C_*^2}\mathrm{osc}^2\left( \left( \mathcal {R}_{h_*}y^{h_\varepsilon },\mathcal {R}_{h_*}p^{h_\varepsilon }\right) ,\mathcal {T}_{h_*}\right) \\&\quad \leqslant \left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) \right\| _{a}^2+\frac{1}{C_*^2}\mathrm{osc}^2\left( \left( \mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) ,\mathcal {T}_{h_*}\right) . \end{aligned}$$

From (2.24) of Theorem 2.4 we can see that \({\tilde{\gamma }}\leqslant \frac{1}{2C_*^2}\), which implies

$$\begin{aligned}&\left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_*}y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_*}p^{h_\varepsilon }\right) \right\| _{a}^2+{\tilde{\gamma }}\mathrm{osc}^2\left( \left( \mathcal {R}_{h_*}y^{h_\varepsilon },\mathcal {R}_{h_*}p^{h_\varepsilon }\right) ,\mathcal {T}_{h_*}\right) \\&\quad \leqslant \left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) \right\| _{a}^2+\frac{1}{C_*^2}\mathrm{osc}^2\left( \left( \mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) ,\mathcal {T}_{h_*}\right) \\&\quad \leqslant \left\| \left( y^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },p^{h_\varepsilon }-\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) \right\| _{a}^2+({\tilde{\gamma }}+\sigma )\mathrm{osc}^2\left( \left( \mathcal {R}_{h_\varepsilon }y^{h_\varepsilon },\mathcal {R}_{h_\varepsilon }p^{h_\varepsilon }\right) ,\mathcal {T}_{h_*}\right) \end{aligned}$$

with \(\sigma = \frac{1}{C_*^2}-{\tilde{\gamma }}\in (0,1)\). Following a similar procedure as in the proof of Theorem 4.2, with (4.5), (4.6) replaced by (4.3), (4.4), we are led to

$$\begin{aligned}&\Vert (y-y_{h_*},p-p_{h_*})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_*},p_{h_*}),\mathcal {T}_{h_*})\nonumber \\&\quad \leqslant \beta _0^2\Big (\Vert (y-y_{h_\varepsilon },p-p_{h_\varepsilon })\Vert _{a}^2+ (\gamma +\sigma )\mathrm{osc}^2((y_{h_\varepsilon },p_{h_\varepsilon }),\mathcal {T}_{h_\varepsilon })\Big )\nonumber \\&\quad \leqslant \beta _0^2\Big (\Vert (y-y_{h_\varepsilon },p-p_{h_\varepsilon })\Vert _{a}^2+ (\gamma +1)\mathrm{osc}^2((y_{h_\varepsilon },p_{h_\varepsilon }),\mathcal {T}_{h_\varepsilon })\Big ), \end{aligned}$$
(5.14)

where

$$\begin{aligned} \beta _0:=\Big (\frac{(1+\delta _1)+C_4{\tilde{\kappa }}(h_0)}{1-C_4\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\Big )^{1\over 2} \end{aligned}$$

and \(C_4\) is the constant that appeared in the proof of Theorem 4.2. Thus, it follows from (5.13), (5.14) and the definition of \(\varepsilon \) that

$$\begin{aligned}&\Vert (y-y_{h_*},p-p_{h_*})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_*},p_{h_*}),\mathcal {T}_{h_*})\nonumber \\&\quad \leqslant \beta _1^2\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big ) \end{aligned}$$
(5.15)

with \(\beta _1=\frac{1}{\sqrt{2}}\beta _0\rho _1\).

In view of (5.12) and the definitions of \(\beta _0\) and \(\beta _1\) we have \(\beta _1^2\in (0,{1\over 2})\) provided \(h_0\ll 1\). It follows from Corollary 5.2 that

$$\begin{aligned} \sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_*}}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)\geqslant \theta _1 \sum \limits _{T\in \mathcal {T}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T), \end{aligned}$$
(5.16)

where \(\theta _1=\frac{{\tilde{C}}_2(1-2{\tilde{\beta }}_1^2)}{{\tilde{C}}_5({\tilde{C}}_1+(1+2C_{*}^2{\tilde{C}}_1){\tilde{\gamma }}_1)}\), \({\tilde{\gamma }}_1=\frac{\gamma }{1-C_5\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\), \({\tilde{C}}_5=\max (1,\frac{{\tilde{C}}_3}{{\tilde{\gamma }}_1})\) and

$$\begin{aligned} {\tilde{\beta }}_1=\left( \frac{(1+\delta _1)\beta _1^2+C_5{\tilde{\kappa }}(h_0)}{1-C_5\delta _1^{-1}{\tilde{\kappa }}^2(h_0)}\right) ^{1\over 2}. \end{aligned}$$

It follows from the definition of \(\gamma \) in (2.24) and \({\tilde{\gamma }}\) in (4.32) that \({\tilde{\gamma }}_1<1\), which together with \({\tilde{C}}_3>1\) (see [11]) implies \({\tilde{C}}_5=\frac{{\tilde{C}}_3}{{\tilde{\gamma }}_1}\). Since \(h_0\ll 1\), we obtain that \({\tilde{\gamma }}_1>\gamma \) and \({\tilde{\beta }}_1\in (0,\frac{1}{\sqrt{2}}\rho )\) from (5.11). It is easy to see from (3.50) and \({\tilde{\gamma }}_1>\gamma \) that

$$\begin{aligned} \theta _1= & {} \frac{{\tilde{C}}_2(1-2{\tilde{\beta }}_1^2)}{\frac{{\tilde{C}}_3}{{\tilde{\gamma }}_1}({\tilde{C}}_1+(1+2C_{*}^2{\tilde{C}}_1){\tilde{\gamma }}_1)}\geqslant \frac{{\tilde{C}}_2}{{\tilde{C}}_3\left( \frac{{\tilde{C}}_1}{{\tilde{\gamma }}_1}+1+2C_*^2{\tilde{C}}_1\right) } (1-\rho ^2)\nonumber \\= & {} \frac{ C_2(1+{\hat{C}}_2{\tilde{\kappa }}^2(h_0))}{C_3(1+{\hat{C}}_2{\tilde{\kappa }}^2(h_0))\left( \frac{C_1(1-{\hat{C}}_1{\tilde{\kappa }}^2(h_0))}{2{\tilde{\gamma }}_1}+1+C_{*}^2 C_1(1-{\hat{C}}_1{\tilde{\kappa }}^2(h_0))\right) } (1-\rho ^2)\nonumber \\\geqslant & {} \frac{ C_2}{C_3\left( \frac{C_1}{\gamma }+1+2C_*^2 C_1\right) } (1-\rho ^2)=\frac{ C_2\gamma }{C_3(C_1+(1+2C_*^2 C_1)\gamma )} (1-\rho ^2)>\theta ,\nonumber \\ \end{aligned}$$
(5.17)

provided \(h_0\ll 1\). This implies

$$\begin{aligned} \sum \limits _{T\in R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_*}}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)>\theta \sum \limits _{T\in \mathcal {T}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T). \end{aligned}$$

Note that Algorithm 3.7 selects a minimal set \(\mathcal {M}_{h_k}\) satisfying

$$\begin{aligned} \sum \limits _{T\in \mathcal {M}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T)\geqslant \theta \sum \limits _{T\in \mathcal {T}_{h_k}}\eta _{h_k}^2((y_{h_k},p_{h_k}),T). \end{aligned}$$

Thus,

$$\begin{aligned} \#\mathcal {M}_{h_k}\leqslant & {} \#R_{\mathcal {T}_{h_k}\rightarrow \mathcal {T}_{h_*}}\leqslant \#\mathcal {T}_{h_*}-\#\mathcal {T}_{h_k}\leqslant \#\mathcal {T}_{h_\varepsilon }-\#\mathcal {T}_{h_0}\\\leqslant & {} \left( \frac{1}{\sqrt{2}}\rho _1\right) ^{-{1\over s}}\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )^{-{1\over 2s}}\\&\times |(y,p,y_d)|_s^{1\over s}, \end{aligned}$$

which is the desired result, with an explicit dependence on the discrepancy between \(\theta \) and \(\frac{ C_2\gamma }{C_3(C_1+(1+2C_*^2 C_1)\gamma )}\). \(\square \)

We are now ready to prove that Algorithm 3.8 possesses optimal complexity for the state and adjoint state approximations.

Theorem 5.4

Let \((u,y,p)\in U_{ad}\times H_0^1(\Omega )\times H_0^1(\Omega )\) be the solution of problem (3.1)–(3.2) and \((u_{h_n},y_{h_n},p_{h_n})\in U_{ad}\times V_{h_n}\times V_{h_n}\) be a sequence of solutions of problem (3.7)–(3.8) corresponding to a sequence of finite element spaces \(V_{h_n}\) with partitions \(\mathcal {T}_{h_n}\) produced by Algorithm 3.8. Then the \(n\)th iterate solution \((y_{h_n},p_{h_n})\) of Algorithm 3.8 satisfies the optimal bound

$$\begin{aligned} \Vert (y-y_{h_n},p-p_{h_n})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_n},p_{h_n}),\mathcal {T}_{h_n}) \lesssim (\#\mathcal {T}_{h_n}-\#\mathcal {T}_{h_0})^{-2s}, \end{aligned}$$
(5.18)

where the hidden constant depends on the exact solution \((u,y,p)\) and on the discrepancy between \(\theta \) and \(\frac{ C_2\gamma }{C_3(C_1+(1+2C_*^2 C_1)\gamma )}\).

Proof

It follows from (2.22) and (5.10) that

$$\begin{aligned}&\#\mathcal {T}_{h_n}-\#\mathcal {T}_{h_0}\lesssim \sum \limits _{k=0}^{n-1}\#\mathcal {M}_{h_k}\nonumber \\&\quad \lesssim \sum \limits _{k=0}^{n-1}\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big )^{-{1\over 2s}} |(y,p,y_d)|_s^{1\over s}.\qquad \qquad \end{aligned}$$
(5.19)

From the lower bound (3.49) we have

$$\begin{aligned} \Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \eta ^2_{h_k}((y_{h_k},p_{h_k}),\Omega )\leqslant & {} C_6\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2\\&+\, \gamma \mathrm{osc}^2((y_{h_k},p_{h_k}),\mathcal {T}_{h_k})\Big ), \end{aligned}$$

where \(C_6=\max (1+\frac{\gamma }{C_2},\frac{ C_3}{C_2})\). Then we arrive at

$$\begin{aligned} \#\mathcal {T}_{h_n}-\#\mathcal {T}_{h_0}\lesssim & {} \sum \limits _{k=0}^{n-1}\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2+ \gamma \eta _{h_k}^2((y_{h_k},p_{h_k}),\Omega )\Big )^{-{1\over 2s}} \nonumber \\&\times |(y,p,y_d)|_s^{1\over s}. \end{aligned}$$
(5.20)

Due to (4.19), we obtain for \(0\leqslant k< n\) that

$$\begin{aligned} \Vert (y-y_{h_n},p-p_{h_n})\Vert _{a}^2+ \gamma \eta _{h_n}^2((y_{h_n},p_{h_n}),\Omega )\leqslant & {} \beta ^{2(n-k)}\Big (\Vert (y-y_{h_k},p-p_{h_k})\Vert _{a}^2\\&+ \gamma \eta _{h_k}^2((y_{h_k},p_{h_k}),\Omega )\Big ). \end{aligned}$$

Thus,

$$\begin{aligned} \#\mathcal {T}_{h_n}-\#\mathcal {T}_{h_0}\lesssim & {} \Big (\Vert (y-y_{h_n},p-p_{h_n})\Vert _{a}^2+ \gamma \eta _{h_n}^2((y_{h_n},p_{h_n}),\Omega )\Big )^{-{1\over 2s}} |(y,p,y_d)|_s^{1\over s}\sum \limits _{k=0}^{n-1}\beta ^{\frac{n-k}{s}}\nonumber \\\lesssim & {} \Big (\Vert (y-y_{h_n},p-p_{h_n})\Vert _{a}^2+ \gamma \eta _{h_n}^2((y_{h_n},p_{h_n}),\Omega )\Big )^{-{1\over 2s}}|(y,p,y_d)|_s^{1\over s},\nonumber \\ \end{aligned}$$
(5.21)

where the last inequality holds since \(\beta <1\).
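For completeness, the geometric series bound used in the last step reads: since \(0<\beta <1\),

$$\begin{aligned} \sum \limits _{k=0}^{n-1}\beta ^{\frac{n-k}{s}}=\sum \limits _{j=1}^{n}\beta ^{\frac{j}{s}}\leqslant \sum \limits _{j=1}^{\infty }\beta ^{\frac{j}{s}}=\frac{\beta ^{1/s}}{1-\beta ^{1/s}}, \end{aligned}$$

a constant independent of \(n\) that can be absorbed into the hidden constant.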

From (5.6), (5.7) we have

$$\begin{aligned} \mathrm{osc}^2\left( (y_{h_n},p_{h_n}),\mathcal {T}_{h_n}\right) \leqslant \eta _{h_n}^2\left( (y_{h_n},p_{h_n}),\Omega \right) , \end{aligned}$$

which together with (5.21) yields

$$\begin{aligned} \#\mathcal {T}_{h_n}-\#\mathcal {T}_{h_0} \lesssim \Big (\Vert (y-y_{h_n},p-p_{h_n})\Vert _{a}^2+ \gamma \mathrm{osc}^2((y_{h_n},p_{h_n}),\mathcal {T}_{h_n})\Big )^{-{1\over 2s}}, \end{aligned}$$
(5.22)

which completes the proof. \(\square \)

Remark 5.5

From (3.35) and the equivalence property (3.13) we can conclude that Theorem 4.2 also implies the convergence of \(\Vert u-u_{h_n}\Vert _{0,\Omega }\), namely, for the n-th iterate solution \(u_{h_n}\) of Algorithm 3.8 there holds

$$\begin{aligned} \Vert u-u_{h_n}\Vert _{0,\Omega }^2\lesssim \beta ^{2n}. \end{aligned}$$
(5.23)

We remark that the control variable can also be included in the complexity analysis of AFEM for optimal control problems to obtain

$$\begin{aligned} \Vert u-u_{h_n}\Vert _{0,\Omega }^2\lesssim (\#\mathcal {T}_{h_n}-\#\mathcal {T}_{h_0})^{-2s}. \end{aligned}$$
(5.24)

However, the above results are sub-optimal for the optimal control, as illustrated by the numerical results in Sect. 6. To prove the optimality of the AFEM for the control variable, it seems that we need to work with an AFEM based on \(L^2\)-norm error estimators; we refer to [20] for an optimal a priori error estimate. We expect that the results in [12] will enable us to prove the optimal convergence of the AFEM for the optimal control u; this is postponed to future work.
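Predicted rates such as (5.23), (5.24) are usually checked against computed errors via experimental orders of convergence. A small helper for this purpose (a sketch only; the function name `eoc` and the sample data are hypothetical, not from the paper's implementation) could read:

```python
import numpy as np

def eoc(dofs, errs):
    """Experimental order of convergence w.r.t. the number of DOFs:
    assuming err ~ C * N^{-r}, the rate between two consecutive meshes is
    r = -log(e_{k+1}/e_k) / log(N_{k+1}/N_k)."""
    dofs = np.asarray(dofs, dtype=float)
    errs = np.asarray(errs, dtype=float)
    return -np.diff(np.log(errs)) / np.diff(np.log(dofs))
```

For linear elements one expects \(r\approx 1/2\) for the energy errors of state and adjoint state, matching the optimal decay \((\#\mathcal {T}_{h_n})^{-1/2}\).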

6 Numerical experiments

In this section we carry out some numerical tests in two dimensions to support our theoretical results obtained in this paper. We take the elliptic operator L as \(-\Delta \) with homogeneous Dirichlet boundary conditions for all the examples.
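All examples are computed with the usual adaptive loop SOLVE, ESTIMATE, MARK, REFINE underlying Algorithm 3.8, where the MARK step uses Dörfler's strategy with bulk parameter \(\theta \). As a minimal illustration of that step only (a sketch, not the paper's implementation; `eta2` is a hypothetical array of squared element indicators \(\eta _T^2\)), Dörfler marking selects a smallest set of elements carrying the fraction \(\theta \) of the total estimator:

```python
import numpy as np

def dorfler_mark(eta2, theta):
    """Return indices of a minimal set M of elements such that
    sum_{T in M} eta_T^2 >= theta * sum_T eta_T^2 (Dorfler marking)."""
    order = np.argsort(eta2)[::-1]        # largest indicators first
    cumsum = np.cumsum(eta2[order])
    # smallest m with cumsum[m-1] >= theta * total
    m = np.searchsorted(cumsum, theta * cumsum[-1]) + 1
    return order[:m]
```

Smaller \(\theta \) marks fewer elements per loop and yields more, but more strongly localised, refinement steps.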

Example 6.1

This example is taken from [1]. The domain \(\Omega \) can be described in polar coordinates by

$$\begin{aligned} \Omega =\left\{ (r,\vartheta ):\ \ 0<r<1,\ \ \ 0<\vartheta <{3\over 2}\pi \right\} . \end{aligned}$$

We take the exact solutions as

$$\begin{aligned} y(r,\vartheta )&= (r^\lambda -r^{\nu _1})\sin (\lambda \vartheta ),\\ p(r,\vartheta )&= \alpha (r^\lambda -r^{\nu _2})\sin (\lambda \vartheta ),\\ u(r,\vartheta )&= P_{[a,b]}\left\{ -\frac{p}{\alpha }\right\} \end{aligned}$$

with \(\lambda ={2\over 3}\) and \(\nu _1=\nu _2={5\over 2}\). We set \(\alpha = 0.1\), \(a=-0.3\) and \(b=1\). An additional right-hand side f for the state equation is chosen so that y is the exact state.
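For illustration, the exact adjoint state and the projected control of this example can be evaluated pointwise as follows (a NumPy sketch under the parameter values just given; \(P_{[a,b]}\) is the pointwise projection onto the interval \([a,b]\)):

```python
import numpy as np

ALPHA, A, B = 0.1, -0.3, 1.0      # alpha, control bounds a, b of Example 6.1
LAM, NU = 2.0 / 3.0, 5.0 / 2.0    # lambda and nu_1 = nu_2

def p_exact(r, theta):
    """Exact adjoint state p(r, theta) = alpha*(r^lam - r^nu)*sin(lam*theta)."""
    return ALPHA * (r**LAM - r**NU) * np.sin(LAM * theta)

def u_exact(r, theta):
    """Exact optimal control u = P_[a,b]{-p/alpha}: clip -p/alpha to [a, b]."""
    return np.clip(-p_exact(r, theta) / ALPHA, A, B)
```

Wherever \(-p/\alpha \) drops below \(a=-0.3\) the control is clamped, which produces the active set shown in Fig. 3.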

We give the numerical results for the optimal control approximation by Algorithm 3.8 with parameters \(\theta = 0.4\) and \(\theta = 0.5\). Figure 1 shows the profiles of the numerically computed optimal state and adjoint state. We present in Fig. 2 the triangulations generated by Algorithm 3.8 after 8 and 10 adaptive iterations. We can see that the meshes are concentrated at the reentrant corner where the singularities are located. We also illustrate the active sets of the continuous solution and of the discrete solutions with variational control discretization and with piecewise linear control discretization after 9 adaptive iterations with \(\theta =0.4\). In this example only the lower bound \(u\geqslant -0.3\) is active. Figure 3 clearly shows that with our variational discretization of the control u the border of the active set crosses element interiors and is not restricted to finite element edges, and that it is much closer to the border for the continuous solution than that obtained with full control discretization.

Fig. 1
figure 1

The profiles of the discretised optimal state \(y_h\) (left) and adjoint state \(p_h\) (right) for Example 6.1 on the adaptively refined mesh

Fig. 2
figure 2

The meshes after 8 (left) and 10 (right) adaptive iterations for Example 6.1 generated by Algorithm 3.8 with \(\theta =0.4\)

Fig. 3
figure 3

Left The red line depicts the border of the active set of the continuous solution, the blue line depicts the border of the active set when using variational discretization, and the cyan line depicts the border of the active set obtained by using piecewise linear, continuous controls for Example 6.1. Right Zoom near the corner

To illustrate the efficiency of the adaptive finite element method for solving optimal control problems, we show in the left plot of Fig. 4 the error histories of the optimal control, state and adjoint state with uniform refinement. We can only observe reduced orders of convergence: less than one for the energy norms of the state and adjoint state, and less than two for the \(L^2\)-norm of the control. In the right plot of Fig. 4 we present the convergence behaviours of the optimal control, state and adjoint state, as well as the error estimators \(\eta _{y,h}(y_h,\Omega )\) and \(\eta _{p,h}(p_h,\Omega )\) for the state and adjoint state equations with adaptive refinement. In Fig. 5 we present the convergence of the error \(\Vert (y-y_h,p-p_h)\Vert _a\) and the error indicator \(\eta _h((y_h,p_h),\Omega )\) with \(\theta = 0.4\) and \(\theta = 0.5\), respectively. Figure 5 shows that the error \(\Vert (y-y_h,p-p_h)\Vert _a\) is proportional to the a posteriori error estimator, which confirms the efficiency of the a posteriori error estimators given in Sect. 3. Moreover, we can also observe that the convergence curve of the error \(\Vert (y-y_h,p-p_h)\Vert _a\) is approximately parallel to the line with slope \(-1/2\), which is the optimal convergence rate we can expect using linear finite elements; this coincides with our theory in Sect. 5. For the error \(\Vert u-u_h\Vert _{0,\Omega }\) we observe a reduction with slope \(-1\), which is better than the result (5.24) presented in Remark 5.5 and strongly suggests that the convergence rate predicted there for the optimal control is not optimal.

Fig. 4
figure 4

The convergence history of the optimal control, state and adjoint state on uniformly refined meshes (left), and the convergence of the errors and estimators on adaptively refined meshes (right) for Example 6.1 generated by Algorithm 3.8

Fig. 5
figure 5

The convergence history of the optimal control, the state and adjoint state and error indicator on adaptively refined meshes with \(\theta =0.4\) (left) and \(\theta =0.5\) (right) for Example 6.1 generated by Algorithm 3.8

Example 6.2

In the second example we consider an optimal control problem without explicit solutions. We set \(\Omega =(-1,1)^2\), \(\alpha = 10^{-3}\), \(a=-10\) and \(b=10\). The desired state \(y_d\) is chosen as 10, 1, \(-10\) and \(-1\) in the first, second, third and fourth quadrant, respectively.
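The piecewise constant desired state can be written down directly (a sketch; the quadrants are numbered counterclockwise as usual, and the value on the coordinate axes, a set of measure zero, is irrelevant):

```python
import numpy as np

def y_d(x1, x2):
    """Desired state of Example 6.2: 10, 1, -10, -1 in the first, second,
    third and fourth quadrant of (-1,1)^2, respectively."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    return np.select(
        [(x1 > 0) & (x2 > 0), (x1 < 0) & (x2 > 0),
         (x1 < 0) & (x2 < 0), (x1 > 0) & (x2 < 0)],
        [10.0, 1.0, -10.0, -1.0],
        default=0.0)
```

The sign changes of \(y_d\) across the axes drive the sharp transition between the active sets of the two bounds visible in Fig. 8.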

Similar to the above example, Fig. 6 shows the profiles of the numerically computed optimal state and adjoint state. We present in the left plot of Fig. 7 the triangulation generated by Algorithm 3.8 after 8 adaptive iterations with parameter \(\theta =0.5\). From Fig. 6 we can see that the state and adjoint state are very smooth, so the AFEM should produce a quasi-uniform mesh in this case, and indeed we can observe this phenomenon in the left plot of Fig. 7. Since there are no explicit solutions we cannot show the convergence of the error \(\Vert (y-y_h,p-p_h)\Vert _a\) as in Example 6.1. Instead we show in the right plot of Fig. 7 the convergence of the error indicator \(\eta _h((y_h,p_h),\Omega )\) and the error estimators \(\eta _{y,h}(y_h,\Omega )\) and \(\eta _{p,h}(p_h,\Omega )\) for the state and adjoint state equations. We can observe an error reduction with slope \(-1/2\). We also plot in Fig. 8 the borders of the active sets when using variational control discretization and piecewise linear control discretization. In this example both the upper and lower bounds are active, and we can observe a very sharp boundary between the active sets of the upper and lower bounds.

Fig. 6
figure 6

The profiles of the discretised optimal state \(y_h\) (left) and adjoint state \(p_h\) (right) for Example 6.2 on adaptively refined mesh

Fig. 7
figure 7

The mesh (left) after 8 adaptive iterations and the convergence history of the error estimators on adaptively refined meshes (right) with \(\theta =0.5\) for Example 6.2 generated by Algorithm 3.8

Fig. 8
figure 8

Left The blue (upper bound) and yellow (lower bound) lines depict the borders of the active sets when using variational discretization, and the red (upper bound) and cyan (lower bound) lines depict the borders of the active sets obtained by using piecewise linear, continuous controls for Example 6.2. Right Zoom near the origin

Fig. 9
figure 9

The profiles of the discretised optimal state \(y_h\) (left) and adjoint state \(p_h\) (right) for Example 6.3 on adaptively refined mesh

Fig. 10
figure 10

The mesh (left) after 10 adaptive iterations and the convergence history of the error estimators on adaptively refined meshes (right) with \(\theta =0.4\) for Example 6.3 generated by Algorithm 3.8

Fig. 11
figure 11

Left The blue line depicts the border of the active set when using variational discretization and the red line depicts the border of the active set obtained by using piecewise linear, continuous controls for Example 6.3. Right Zoom near the corner

Example 6.3

In this example we consider an optimal control problem without explicit solutions defined on the L-shaped domain \(\Omega =(-1,1)^2{\backslash }\big ([0,1)\times (-1,0]\big )\). We set \(\alpha = 10^{-2}\), \(a=0\) and \(b=8\). We take the desired state \(y_d=2\).

We show in Fig. 9 the profiles of the numerically computed optimal state and adjoint state; singularities of both the state and the adjoint state can be observed around the reentrant corner. We present in the left plot of Fig. 10 the triangulation generated by Algorithm 3.8 after 10 adaptive iterations with \(\theta =0.4\), which is locally refined around the corner. Similar to Examples 6.1 and 6.2 we also illustrate the active sets of the discrete solutions with both variational discretization and piecewise linear discretization after 9 adaptive iterations with \(\theta =0.4\). In this example only the upper bound \(u\leqslant 8\) is active. Figure 11 clearly shows the advantage of variational discretization over full discretization for the control u. Since there are no explicit solutions we show in the right plot of Fig. 10 the convergence of the error indicator \(\eta _h((y_h,p_h),\Omega )\) and the error estimators \(\eta _{y,h}(y_h,\Omega )\) and \(\eta _{p,h}(p_h,\Omega )\) for the state and adjoint state equations. We can also observe an error reduction with slope \(-1/2\).

7 Conclusion and outlook

In this paper we give a rigorous convergence analysis of the adaptive finite element algorithm for optimal control problems governed by linear elliptic equations. We prove that the AFEM is a contraction between two consecutive adaptive loops for the sum of the energy errors and the scaled error estimators of the state y and the adjoint state p. We also show that the AFEM yields a decay rate, in terms of the number of degrees of freedom, for the energy errors of the state y and the adjoint state p plus the oscillations of the state and adjoint state equations.

We expect that the results should also be valid for optimal Neumann boundary control problems (see [27]) by the following observations. The key point for the convergence analysis is the equivalence property presented in Theorem 3.3 where the relation between the finite element optimal control approximation and the standard finite element boundary value approximation is established. Consider the governing equation of the Neumann boundary control problem:

$$\begin{aligned} \left\{ \begin{array}{ll} Ly=f &{}\quad \mathrm{in}\ \Omega , \\ \ A\nabla y\cdot n=u &{}\quad \mathrm{on}\ \partial \Omega . \end{array}\right. \end{aligned}$$

Similar to the proof of Theorem 3.3 we can conclude from the trace theorem that

$$\begin{aligned} \Vert u-u_h\Vert _{0,\partial \Omega }\lesssim \kappa ^{1\over 2}(h)(\Vert y-y_h\Vert _{a,\Omega }+\Vert p-p_h\Vert _{a,\Omega }), \end{aligned}$$

where \(u_h\) is the discrete optimal control. Then we can obtain the counterpart of (3.20), (3.21) for Neumann boundary control problems

$$\begin{aligned} \Vert y-y_h\Vert _{a,\Omega }&= \Vert y^h-y_h\Vert _{a,\Omega }+O(\kappa ^{1\over 2}(h))\Big (\Vert y-y_h\Vert _{a,\Omega } +\Vert p- p_h\Vert _{a,\Omega }\Big ),\\ \Vert p-p_h\Vert _{a,\Omega }&= \Vert p^h- p_h\Vert _{a,\Omega } +O(\kappa ^{1\over 2}(h))\Big (\Vert y-y_h\Vert _{a,\Omega } +\Vert p- p_h\Vert _{a,\Omega }\Big ) \end{aligned}$$

provided \(h_0\ll 1\). Thus, the convergence and complexity analysis of AFEM carries over to Neumann boundary control problems.

Many important issues remain unsolved for the convergence analysis of AFEM for optimal control problems compared to AFEM for boundary value problems. Firstly, at this moment we only prove the optimality of AFEM for the energy errors of the state and adjoint state variables; the convergence for the optimal control u is sub-optimal. To prove the optimality of AFEM for the optimal control u it seems that we should work on the optimality of AFEM for boundary value problems under \(L^2\)-norms, as done in [12]. This complicates the convergence analysis with additional restrictions on the adaptive algorithms and is postponed to future work.

Secondly, the convergence analysis of the adaptive finite element algorithm for other kinds of optimal control problems, such as Stokes control problems (see [28]), and for non-standard finite element methods, such as mixed finite element methods (see [8]), remains open and will be addressed in forthcoming papers.

Thirdly, we only prove the convergence of AFEM for optimal control problems with control constraints using a variational control discretization. The full control discretization concept, using piecewise constant or piecewise linear finite elements, is also very important among the numerical methods for control problems. This kind of control discretization results in an additional discretised control space and an additional contribution to the a posteriori error estimators (see [22]), which should be incorporated into the adaptive algorithm and the corresponding convergence analysis. We also intend to generalise the approach of this paper to analyse the convergence of AFEM for optimal control problems with full control discretization in the future.