1 Introduction

Drift–diffusion equations play an important role in modeling the movement of charged particles particularly in semiconductor physics [1, 2, 10, 28, 45,46,47, 53]. Besides the applications to semiconductors, these kinds of PDEs have many applications in the simulation of batteries [54, 64], charged particles in biology [52, 65] and physical chemistry [7, 30, 43, 44].

We consider the following model time dependent drift–diffusion equation posed on a Lipschitz polyhedral domain \(\varOmega \subset {\mathbb {R}}^d (d = 2,3)\): we seek to determine the unknown electron density u and the electric potential \(\phi \) that satisfy

$$\begin{aligned} u_t- \varDelta u+\nabla \cdot (u\nabla {\phi })&=0&\text { in }\varOmega \times (0,T], \end{aligned}$$
(1a)
$$\begin{aligned} -\varepsilon \varDelta {\phi }+u&=0&\text { in }\varOmega \times (0,T],\end{aligned}$$
(1b)
$$\begin{aligned} u&=g_u&\text { on }\partial \varOmega \times (0,T],\end{aligned}$$
(1c)
$$\begin{aligned} {\phi }&=g_{{\phi }}&\text { on }\partial \varOmega \times (0,T],\end{aligned}$$
(1d)
$$\begin{aligned} u(\cdot ,0)&=u_0&\text { in }\varOmega , \end{aligned}$$
(1e)

where \(\varepsilon \) is a constant that is typically small in real applications. Our analysis does not track the \(\varepsilon \) dependence of the coefficients; this will be considered in future work. To simplify the presentation, we set \(\varepsilon =1\) in the rest of the paper. We shall discuss the smoothness assumptions on \(g_u\), \(g_{\phi }\) and \(u_0\) needed for our analysis later in the paper. Applications of the drift–diffusion model often involve more complicated versions of the above model, for example including additional particle transport equations (for example, for holes) and recombination terms. However, the above system contains the principal difficulty from the point of view of proving convergence: the term \(\nabla \cdot (u\nabla \phi )\).

Theoretical and numerical studies for this type of partial differential equation (PDE) have a long history. For the theoretical analysis of the drift–diffusion system we refer to [5, 6, 34, 35, 46, 56] and the references therein. Computational studies started in the 1960s [29, 39] and many discretization methods have been used for the drift–diffusion system in the past decades. For the extensive body of literature devoted to this subject we refer to, e.g., the finite difference method [31, 40, 50, 55], the finite volume method [3, 4, 12,13,14], the standard finite element method (FEM) [36, 52, 62], and mixed FEM [37, 41]. Furthermore, there are many new models in which the drift–diffusion equation is coupled with other PDEs, such as Stokes [42], Navier–Stokes [61] and Darcy flow [32]. However, these extensions are outside the scope of this paper.

The product of the gradient of the electric potential, \(\nabla \phi \), with the electron concentration u in (1a) can cause a reduction in the convergence rate of the solution if the numerical schemes for the two equations are not properly devised. In [62], the authors obtained an optimal convergence rate in the \(H^1\) norm but a suboptimal rate in the \(L^2\) norm by using the standard FEM. To overcome the convergence order reduction, a new method was proposed to discretize the system (1): mixed FEM for the Poisson equation (1b) and standard FEM for (1a). This scheme provides optimal error estimates for u and \(\phi \) in both the \(H^1\) and \(H(\mathrm{{}div})\) norms. Very recently, the authors in [37] obtained an optimal convergence rate by using mixed FEM for both (1a) and (1b).

In the drift–diffusion model, the magnitude of \(\nabla \phi \) is typically huge (see [9]). Therefore, it is natural to consider the discontinuous Galerkin (DG) method to discretize the system (1). In [51], a local DG (LDG) method was used to study a 1D drift–diffusion equation; the authors obtained an optimal convergence rate by using an important relationship between the gradient and interface jump of the numerical solution with the independent numerical solution of the gradient in the LDG methods; see [63, Lemma 2.4] and [51, Lemma 4.3]. However, to the best of our knowledge, the inequality in [63, Lemma 2.4] is not straightforward to extend to higher dimensions.

Moreover, the number of degrees of freedom for the DG or LDG methods is much larger than for standard FEM; this is the main drawback of DG methods. Hybridizable discontinuous Galerkin (HDG) methods were originally proposed in [25] to remedy this issue. The global system of an HDG method only involves the degrees of freedom on the interfaces between elements. Therefore, HDG methods have a significantly smaller number of degrees of freedom in the global system compared to DG methods, LDG methods or mixed FEM. Moreover, HDG methods keep the advantages of DG methods, which are suitable for the drift term if \(\nabla \phi \) is large. For more information about HDG methods for convection diffusion problems, see, e.g., [17,18,19, 33, 59].

There are many different HDG schemes; see for example [20,21,22,23,24,25, 48]. Among all of these methods, two are most popular; following standard terminology, we call them \(\hbox {HDG}_k\) and HDG(A) in the rest of the paper. The \(\hbox {HDG}_k\) method uses polynomials of degree k to approximate the solution, the flux, and the trace on the interfaces between elements, together with a positive stabilization parameter chosen to be \({\mathcal {O}}(1)\). The HDG(A) method uses polynomials of degree \(k+1\) to approximate the solution, polynomials of degree k to approximate the flux, and the so-called Lehrenfeld–Schöberl stabilization function; see [48, Remark 1.2.4]. These two methods were used to study the Poisson equation in [27, 49, 57], linear elasticity in [22, 58], the convection diffusion equation in [18, 19, 59], the Stokes equation in [26, 38] and the Navier–Stokes equation in [11, 60].

The goal of this paper is to design an HDG scheme by the appropriate choice of HDG spaces such that the overall scheme is optimally convergent and to prove semi-discrete optimal convergence rates in d spatial dimensions (\(d=2,3\)). The result is a new HDG scheme for the drift–diffusion system with attractive convergence properties. We shall assume that a suitably regular solution of the drift–diffusion system exists. For existence theory, see for example the book of Markowich [53].

To develop our HDG method, we write the drift–diffusion system as a first order system by introducing new variables \({\varvec{q}}\) and \({\varvec{p}}\) such that \({\varvec{q}}+\nabla u=0\), \({\varvec{p}}+\nabla {\phi }=0\). Then (1) becomes the problem of finding \((u,{{\varvec{q}}},\phi ,{{\varvec{p}}})\) such that

$$\begin{aligned} {\varvec{q}}+\nabla u&=0&\text { in }\varOmega \times (0,T], \end{aligned}$$
(2a)
$$\begin{aligned} {\varvec{p}}+\nabla {\phi }&=0&\text { in }\varOmega \times (0,T],\end{aligned}$$
(2b)
$$\begin{aligned} u_t+ \nabla \cdot {\varvec{q}} - \nabla \cdot ({\varvec{p}} u)&=0&\text { in }\varOmega \times (0,T],\end{aligned}$$
(2c)
$$\begin{aligned} {\varepsilon }\nabla \cdot {\varvec{p}}+u&=0&\text { in }\varOmega \times (0,T],\end{aligned}$$
(2d)
$$\begin{aligned} u&=g_u&\text { on }\partial \varOmega \times (0,T],\end{aligned}$$
(2e)
$$\begin{aligned} {\phi }&=g_{{\phi }}&\text { on }\partial \varOmega \times (0,T],\end{aligned}$$
(2f)
$$\begin{aligned} u(\cdot ,0)&=u_0&\text { in }\varOmega . \end{aligned}$$
(2g)
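As a quick consistency check (a worked substitution added for illustration), inserting \({\varvec{q}}=-\nabla u\) and \({\varvec{p}}=-\nabla \phi \) from (2a) and (2b) into the left-hand sides of (2c) and (2d) recovers the left-hand sides of (1a) and (1b):

```latex
\begin{aligned}
u_t + \nabla\cdot\boldsymbol{q} - \nabla\cdot(\boldsymbol{p}\,u)
  &= u_t - \Delta u + \nabla\cdot(u\,\nabla\phi), \\
\varepsilon\,\nabla\cdot\boldsymbol{p} + u
  &= -\varepsilon\,\Delta\phi + u .
\end{aligned}
```

The sign of the drift term flips twice (once from \({\varvec{p}}=-\nabla \phi \) and once from the minus sign in (2c)), which is why it enters (2c) with a minus sign but (1a) with a plus sign.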

In this work, we only

We can now introduce our HDG formulation by first defining the mesh. Let \(\mathcal {T}_h\) denote a collection of disjoint simplices K that partition \(\varOmega \) and let \(\partial \mathcal {T}_h\) be the set \(\{\partial K: K\in \mathcal {T}_h\}\). Here h denotes the maximum diameter of the simplices in \(\mathcal {T}_h\). Since we will need to use an inverse inequality in our analysis, we assume that the mesh is shape regular and quasi-uniform.

We denote by \(\mathcal {E}_h\) the set of all faces (or edges when \(d=2\)) in the mesh, and by \(\mathcal {E}_h^o \) and \(\mathcal {E}_h^\partial \) the sets of interior and boundary faces, respectively. From now on, to simplify terminology, we shall refer to elements of \(\mathcal {E}_h\) as faces, even if \(d=2\). We say \(e\in \mathcal {E}_h^o \) is an interior face if \(e = \partial K^+ \cap \partial K^-\) for some pair of elements \(K^+,K^-\in \mathcal {T}_h\) and the Lebesgue measure of e is non-zero; similarly, \(e \in \mathcal {E}_h^{\partial }\) is a boundary face if \(e = \partial K \cap \partial \varOmega \) for some \(K\in \mathcal {T}_h\) and the Lebesgue measure of e is non-zero. We set

$$\begin{aligned} (w,v)_{\mathcal {T}_h} := \sum _{K\in \mathcal {T}_h} (w,v)_K, \quad \left\langle \zeta ,\rho \right\rangle _{\partial \mathcal {T}_h} := \sum _{K\in \mathcal {T}_h} \left\langle \zeta ,\rho \right\rangle _{\partial K}, \end{aligned}$$

where \((\cdot ,\cdot )_K\) denotes the \(L^2(K)\) inner product and \( \langle \cdot , \cdot \rangle _{\partial K} \) denotes the \(L^2\) inner product on \(\partial K\).
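The mesh-wise pairings above are plain sums of element contributions. A minimal sketch, assuming a one-dimensional analogue (the paper works in \(d=2,3\)) in which each element is an interval and \(\partial K\) consists of its two endpoints; the function names `element_inner`, `mesh_inner` and `mesh_boundary_inner` are hypothetical:

```python
import math

# Two-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 3.
GAUSS_NODES = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))
GAUSS_WEIGHTS = (1.0, 1.0)


def element_inner(w, v, a, b):
    """Approximate (w, v)_K on the 1D element K = [a, b] by Gauss quadrature."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(
        wt * w(mid + half * x) * v(mid + half * x)
        for x, wt in zip(GAUSS_NODES, GAUSS_WEIGHTS)
    )


def mesh_inner(w, v, nodes):
    """(w, v)_{T_h}: sum of the element inner products over all elements."""
    return sum(element_inner(w, v, a, b) for a, b in zip(nodes, nodes[1:]))


def mesh_boundary_inner(zeta, rho, nodes):
    """<zeta, rho>_{dT_h} in 1D: every element contributes both of its endpoints,
    so each interior node is visited twice (once from each neighbouring element)."""
    return sum(zeta(a) * rho(a) + zeta(b) * rho(b) for a, b in zip(nodes, nodes[1:]))


nodes = [0.0, 0.25, 0.5, 1.0]                 # three elements on [0, 1]
one = lambda x: 1.0
ident = lambda x: x
total = mesh_inner(one, ident, nodes)         # integral of x over [0, 1]
bdry = mesh_boundary_inner(one, one, nodes)   # two endpoints per element
```

The double counting of interior nodes in `mesh_boundary_inner` mirrors the fact that \(\langle \cdot ,\cdot \rangle _{\partial \mathcal {T}_h}\) sums over element boundaries, not over faces, so each interior face appears once per adjacent element.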

The HDG method uses discontinuous finite element spaces \({\varvec{Q}}_h\), \( V_h\), \({\widehat{V}}_{h}\), \({\varvec{S}}_h\), \(\varPsi _h\), \({\widehat{\varPsi }}_{h}\) that we shall discuss shortly. Assuming these are given, the HDG method seeks \(({\varvec{q}}_h ,u_h ,{\widehat{u}}_h )\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}({g_u})\) and \(({\varvec{p}}_h ,{\phi }_h ,\widehat{\phi }_h ) \in {\varvec{S}}_h\times \varPsi _h\times \widehat{\varPsi }_{h}(g_{\phi })\) satisfying

$$\begin{aligned} ({\varvec{q}}_h, {\varvec{r}}_1)_{{\mathcal {T}}_h} - (u_h,\nabla \cdot {\varvec{r}}_1)_{{\mathcal {T}}_h} + \langle {\widehat{u}}_h, {\varvec{r}}_1\cdot {\varvec{n}} \rangle _{\partial {\mathcal {T}}_h}&=0, \end{aligned}$$
(3a)
$$\begin{aligned} ({\varvec{p}}_h, {\varvec{r}}_2)_{{\mathcal {T}}_h} - (\phi _h,\nabla \cdot {\varvec{r}}_2)_{{\mathcal {T}}_h} + \langle {\widehat{\phi }}_h, {\varvec{r}}_2\cdot {\varvec{n}} \rangle _{\partial {\mathcal {T}}_h}&=0, \end{aligned}$$
(3b)

for all \(({\varvec{r}}_1, {\varvec{r}}_2) \in {\varvec{Q}}_h\times {\varvec{S}}_h\), together with

$$\begin{aligned}&(u_{h,t}, w_1)_{{\mathcal {T}}_h} - ({\varvec{q}}_h, \nabla w_1)_{{\mathcal {T}}_h} + \langle \widehat{{\varvec{q}}}_h\cdot {\varvec{n}}, w_1\rangle _{\partial {\mathcal {T}}_h} + ({\varvec{p}}_h u_h, \nabla w_1)_{{\mathcal {T}}_h}\nonumber \\&\quad - \,\langle \widehat{{\varvec{p}}}_h\cdot {\varvec{n}} {\widehat{u}}_h, w_1\rangle _{\partial {\mathcal {T}}_h}=0, \end{aligned}$$
(3c)
$$\begin{aligned}&-\, ({\varvec{p}}_h, \nabla w_2)_{{\mathcal {T}}_h} + \langle \widehat{{\varvec{p}}}_h\cdot {\varvec{n}}, w_2\rangle _{\partial {\mathcal {T}}_h} + (u_h, w_2)_{{\mathcal {T}}_h} =0 \end{aligned}$$
(3d)

for all \((w_1,w_2)\in V_h\times \varPsi _h\). The boundary fluxes must satisfy

$$\begin{aligned} \langle \widehat{{\varvec{q}}}_h\cdot {\varvec{n}},\mu _1\rangle _{\partial {\mathcal {T}}_h\backslash \partial \varOmega }&=0, \end{aligned}$$
(3e)
$$\begin{aligned} \langle \widehat{{\varvec{p}}}_h\cdot {\varvec{n}},\mu _2\rangle _{\partial {\mathcal {T}}_h\backslash \partial \varOmega }&=0 \end{aligned}$$
(3f)

for all \((\mu _1,\mu _2)\in {\widehat{V}}_h(0)\times {\widehat{\varPsi }}_h(0)\). The numerical fluxes \(\widehat{{\varvec{q}}}_h\) and \(\widehat{{\varvec{p}}}_h\) will be specified later.

As in [11, 51], we shall need the following energy estimate

$$\begin{aligned}&\Vert \nabla u_h\Vert _{{\mathcal {T}}_h}^2 + \Vert h_K^{-1/2}(u_h - {\widehat{u}}_h)\Vert _{\partial {\mathcal {T}}_h}^2 \nonumber \\&\quad \le C\left( \Vert {\varvec{q}}_h\Vert _{{\mathcal {T}}_h}^2 + \Vert h_K^{-1/2}(\varPi _k^\partial u_h - {\widehat{u}}_h)\Vert _{\partial {\mathcal {T}}_h}^2\right) , \end{aligned}$$
(4)

where \(\varPi _k^\partial \) is the \(L^2\) projection defined in (12). Inequality (4) cannot hold for the \(\hbox {HDG}_k\) method unless we take the stabilization function to be \(h_K^{-1}\). However, in this case we only have a suboptimal convergence rate for the flux \({\varvec{q}} \). Hence we need to use the HDG(A) method to discretize Eq. (1a), i.e., we choose

$$\begin{aligned} {\varvec{Q}}_h&:=\{{\varvec{v}}_h\in [L^2(\varOmega )]^d:{\varvec{v}}_h|_K\in [\mathcal {P}^{k}(K)]^{d},\quad \forall K\in \mathcal {T}_h\},\\ V_h&:=\{v_h\in L^2(\varOmega ):v_h|_K\in \mathcal {P}^{k+1}(K),\quad \forall K\in \mathcal {T}_h\},\\ {\widehat{V}}_h(g)&:=\{\widehat{v}_h\in L^2(\mathcal {E}_h):{\widehat{v}}_h|_{{e}}\in \mathcal {P}^{k}({e}),\quad \forall {e}\in \mathcal {E}_h, {\widehat{v}}_h|_{\mathcal {E}_h^{\partial }}=\varPi _k^{\partial }g\}, \end{aligned}$$

where \(\mathcal {P}^k(K)\) denotes the set of polynomials of degree at most k on the element K (similarly \( \mathcal {P}^{k}(\mathcal {E}_h)\) denotes the set of polynomials of degree at most k on the faces in the mesh). Moreover, the numerical trace of the flux on \(\partial {\mathcal {T}}_h\) is defined as

$$\begin{aligned} \widehat{{\varvec{q}}}_h \cdot {\varvec{n}} = {{\varvec{q}}}_h \cdot {\varvec{n}} + h_K^{-1}(\varPi _k^\partial u_h - {\widehat{u}}_h), \end{aligned}$$
(5)

where \(\varPi _k^\partial \) denotes \(L^2\) projection onto \( \mathcal {P}^{k}(\mathcal {E}_h)\) which can be done face by face.
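Because the trace space is discontinuous from face to face, the projection \(\varPi _k^\partial \) decouples into independent small problems. A minimal sketch for a single edge (the case \(d=2\)), assuming the edge has been mapped to the reference interval \([-1,1]\) and using a Legendre basis so that the local mass matrix is diagonal; the names `l2_project` and `evaluate` are hypothetical, and only degrees \(k\le 2\) are tabulated:

```python
import math

# Three-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5.
NODES = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)

# Legendre polynomials P_0, P_1, P_2. They are orthogonal on [-1, 1] and the
# integral of P_n^2 over [-1, 1] equals 2 / (2n + 1).
LEGENDRE = (
    lambda s: 1.0,
    lambda s: s,
    lambda s: 0.5 * (3.0 * s * s - 1.0),
)


def l2_project(f, k):
    """Legendre coefficients c_0..c_k of the L2 projection of f onto P^k([-1, 1]).

    The moments are computed by quadrature, so the result is exact only when
    f * P_n is a polynomial of degree at most 5; otherwise it is approximate.
    """
    if not 0 <= k <= 2:
        raise ValueError("this sketch only tabulates Legendre polynomials up to degree 2")
    coeffs = []
    for n in range(k + 1):
        moment = sum(w * f(s) * LEGENDRE[n](s) for s, w in zip(NODES, WEIGHTS))
        coeffs.append(moment * (2 * n + 1) / 2.0)
    return coeffs


def evaluate(coeffs, s):
    """Evaluate the projected polynomial at a reference coordinate s."""
    return sum(c * LEGENDRE[n](s) for n, c in enumerate(coeffs))


# Project s^2 onto P^1: the result is the constant 1/3 (the mean of s^2).
coeffs = l2_project(lambda s: s * s, 1)
```

Applying `l2_project` to the restriction of \(u_h\in \mathcal {P}^{k+1}(K)\) to each edge with degree k is what removes the highest-order mode in the stabilization term of (5).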

To avoid a reduction in the convergence rate for the solution \(u_h\), the polynomial degrees of the space \(V_h\) for \(u_h\) and the space \({\varvec{S}}_h\) for \({\varvec{p}}_h\) need to be the same, i.e.,

$$\begin{aligned} {\varvec{S}}_h:=\{{\varvec{v}}_h\in [L^2(\varOmega )]^d:{\varvec{v}}_h|_K\in [\mathcal {P}^{k+1}(K)]^{d},\quad \forall K\in \mathcal {T}_h\}. \end{aligned}$$

If we choose the HDG(A) method to discretize (1b) we would need to use polynomials of degree \(k+2\) to approximate \(\phi \), but in this case, we get a suboptimal convergence rate for \(\phi \). Therefore, we use \(\hbox {HDG}_{k+1}\) to discretize (1b) and so choose

$$\begin{aligned} \varPsi _h&:=\{v_h\in L^2(\varOmega ):v_h|_K\in \mathcal {P}^{k+1}(K),\quad \forall K\in \mathcal {T}_h\},\\ {\widehat{\varPsi }}_h(g)&:=\{\widehat{v}_h\in L^2(\mathcal {E}_h):{\widehat{v}}_h|_{{e}}\in \mathcal {P}^{k+1}({e}),\quad \forall {e}\in \mathcal {E}_h, {\widehat{v}}_h|_{\mathcal {E}_h^{\partial }}=\varPi _{k+1}^{\partial }g\}. \end{aligned}$$

The numerical trace of the flux \(\widehat{{\varvec{p}}}_h\) on \(\partial {\mathcal {T}}_h\) is defined as

$$\begin{aligned} \widehat{{\varvec{p}}}_h \cdot {\varvec{n}} = {{\varvec{p}}}_h \cdot {\varvec{n}} + \tau (\phi _h - {\widehat{\phi }}_h), \end{aligned}$$
(6)

where \(\tau \) is a positive \({\mathcal {O}}(1)\) function and the initial condition \(u_h(0)\) will be specified in Sect. 3.1. If needed, \(\tau \) can be chosen to provide upwind stabilization as in [59].
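To make the mixed choice of degrees concrete, the local dimensions of the six spaces can be tabulated from the formula \(\dim \mathcal {P}^m = \binom{m+n}{n}\) for polynomials of degree at most m in n variables. The following is a minimal sketch; the helper names `dim_poly` and `local_dimensions` and the dictionary keys are hypothetical labels for the spaces defined above:

```python
from math import comb


def dim_poly(degree, dim):
    """Dimension of the polynomials of degree <= `degree` in `dim` variables."""
    return comb(degree + dim, dim)


def local_dimensions(k, d):
    """Per-element (or per-face) dimensions of the HDG spaces for given k and d."""
    return {
        "Q_h": d * dim_poly(k, d),             # [P^k(K)]^d, flux q_h
        "V_h": dim_poly(k + 1, d),             # P^{k+1}(K), density u_h
        "V_hat_h": dim_poly(k, d - 1),         # P^k(e), trace of u_h on one face
        "S_h": d * dim_poly(k + 1, d),         # [P^{k+1}(K)]^d, flux p_h
        "Psi_h": dim_poly(k + 1, d),           # P^{k+1}(K), potential phi_h
        "Psi_hat_h": dim_poly(k + 1, d - 1),   # P^{k+1}(e), trace of phi_h on one face
    }


dims = local_dimensions(k=1, d=2)
```

Only the two trace spaces contribute to the globally coupled system, so for \(k=1\), \(d=2\) each interior edge carries \(2+3=5\) global unknowns under this count.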

The organization of the paper is as follows. In Sect. 2, we present our main results and some useful projections. Then the proof of the main results is given in Sect. 3. In Sect. 4, we provide some numerical experiments to support our theoretical results.

2 Main Result and Preliminary Material

In this section, we first present the main result in Sect. 2.1 for the semidiscrete HDG formulation (3). Next, we provide preliminary material in Sect. 2.2, which is required for the analysis.

We use the standard notation \(W^{m,p}(D)\) for Sobolev spaces on D with norm \(\Vert \cdot \Vert _{m,p,D}\) and seminorm \(|\cdot |_{m,p,D}\). We also write \(H^m(D)\) instead of \(W^{m,2}(D)\), and we omit the index p in the corresponding norms and seminorms. Moreover, we omit the index m when \(m=0\).

Throughout, we assume the data and the solution of (1) are smooth enough for our analysis.

2.1 Main Result

The proof of our main error estimate relies on the use of duality arguments and requires sufficient regularity for the solution of the corresponding “adjoint” problem. In particular:

Assumption 2.1

Assume that the component \({\varvec{p}}\) of the solution of (2) is such that \({\varvec{p}}\in H^1((0,T),{\varvec{W}}^{1,\infty }(\varOmega ))\). Let \(M>0\) be such that for all time \(t\in (0,T)\)

$$\begin{aligned} M\ge \Vert \nabla \cdot {\varvec{p}}(t)\Vert _{0,\infty }+2\Vert \partial _t{\varvec{p}}(t)\Vert _{0,\infty }. \end{aligned}$$
(7)

If \({{\varvec{p}}}=0\), set \(M=0\). Then, for \(\varTheta \in L^2(\varOmega \times (0,T))\), let \(({\varvec{\varPhi }},\varPsi )\) be the solution of

$$\begin{aligned} \begin{aligned} {\varvec{\varPhi }} + \nabla \varPsi&=0 \ \quad \text{ in } \varOmega ,\\ M\varPsi +\nabla \cdot {\varvec{\varPhi }}+ {\varvec{p}} \cdot \nabla \varPsi&=\varTheta \quad \text{ in } \varOmega ,\\ \varPsi&=0\quad \ \ \text{ on } \partial \varOmega . \end{aligned} \end{aligned}$$
(8)

We assume the solution \(({\varvec{\varPhi }},\varPsi )\) has the following regularity

$$\begin{aligned} \Vert {\varvec{\varPhi }}\Vert _{H^1(\varOmega )}+\Vert \varPsi \Vert _{H^2(\varOmega )}&\le C_{\text {reg}}\Vert \varTheta \Vert _{{\mathcal {T}}_h}. \end{aligned}$$
(9)

Remark 2.2

It is well known that the above regularity holds if the domain is convex, which is usually the case in solar cell applications.

We can now state our main result for the HDG method.

Theorem 2.3

Suppose that Assumption 2.1 holds and that the mesh is quasi-uniform. Assume in addition that

$$\begin{aligned} ({\varvec{q}}, u)&\in H^2((0,T),{\varvec{H}}^{k+1}(\varOmega ))\times H^2((0,T), H^{k+2}(\varOmega )), \\ ({\varvec{p}}, \phi )&\in H^2((0,T),{\varvec{H}}^{k+2}(\varOmega ))\times H^2((0,T), H^{k+3}(\varOmega )) \end{aligned}$$

for \(k\ge 0\) solve (2). Let \(({\varvec{q}}_h, u_h, {\varvec{p}}_h, \phi _h) {\in {\varvec{Q}}_h\times V_h\times {\varvec{S}}_h\times \varPsi _h}\) be the solution of the semi-discrete HDG equations (3). Then we have

$$\begin{aligned} \Vert u- u_h\Vert _{{\mathcal {T}}_h} + \Vert \phi - \phi _h\Vert _{{\mathcal {T}}_h} + \Vert {\varvec{p}}- {\varvec{p}}_h\Vert _{{\mathcal {T}}_h} \le C h^{k+2} \end{aligned}$$

for all \(t\in [0,T]\), and

$$\begin{aligned} \sqrt{\int _0^T \Vert {\varvec{q}} - {\varvec{q}}_h\Vert _{{\mathcal {T}}_h}^2 dt } \le C h^{k+1}. \end{aligned}$$

Remark 2.4

The error estimates in Theorem 2.3 are optimal for the variables \({\varvec{q}}\), u, \({\varvec{p}}\) and \(\phi \). Since the global degrees of freedom are the numerical traces, the error estimate for the variable u is superconvergent from the point of view of the global degrees of freedom; to our knowledge, this is the first time this has been proved in the literature.
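Rates such as \(h^{k+2}\) and \(h^{k+1}\) are usually checked numerically by comparing errors on successively refined meshes. A minimal sketch of that bookkeeping, assuming the mesh sizes and errors have already been computed; the function name `observed_rates` is hypothetical and the data below are synthetic, not results from the paper:

```python
import math


def observed_rates(mesh_sizes, errors):
    """Estimated orders log(e_i / e_{i+1}) / log(h_i / h_{i+1}) between successive meshes."""
    if len(mesh_sizes) != len(errors):
        raise ValueError("mesh_sizes and errors must have the same length")
    pairs = list(zip(mesh_sizes, errors))
    return [
        math.log(e0 / e1) / math.log(h0 / h1)
        for (h0, e0), (h1, e1) in zip(pairs, pairs[1:])
    ]


# Synthetic errors that behave exactly like C * h^(k+2) with k = 1 and C = 3.
k = 1
hs = [0.5, 0.25, 0.125]
synthetic_errors = [3.0 * h ** (k + 2) for h in hs]
rates = observed_rates(hs, synthetic_errors)
```

For data that follow \(Ch^{k+2}\) exactly, every entry of `rates` equals \(k+2\) up to rounding; with real computations the estimates typically approach the theoretical order only as h decreases.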

2.2 Preliminary Material

We first introduce the \(\hbox {HDG}_k\) projection operator \(\varPi _h({\varvec{p}},\phi ) := ({\varvec{\varPi }}_{V} {\varvec{p}},\varPi _{W}\phi )\) defined in [27], where \({\varvec{\varPi }}_{V} {\varvec{p}}\) and \(\varPi _{W}\phi \) denote components of the projection of \({\varvec{p}}\) and \(\phi \) into \({\varvec{S}}_h\) and \(\varPsi _h\), respectively. For each element \(K\in {\mathcal {T}}_h\), the projection is determined by the equations

$$\begin{aligned} ({\varvec{\varPi }}_V{\varvec{p}},{\varvec{r}})_K&= ({\varvec{p}},{\varvec{r}})_K,\quad \forall {\varvec{r}}\in [{\mathcal {P}}_{k}(K)]^d, \end{aligned}$$
(10a)
$$\begin{aligned} (\varPi _W\phi , w)_K&= (\phi , w)_K,\quad \forall w\in {\mathcal {P}}_{k}(K ),\end{aligned}$$
(10b)
$$\begin{aligned} \langle {\varvec{\varPi }}_V{\varvec{p}}\cdot {\varvec{n}}+\tau \varPi _W\phi ,\mu \rangle _{e}&= \langle {\varvec{p}}\cdot {\varvec{n}}+\tau \phi ,\mu \rangle _{e},~\quad \forall \mu \in {\mathcal {P}}_{k+1}(e) \end{aligned}$$
(10c)

for all faces e of the simplex K. The approximation properties of the \(\hbox {HDG}_k\) projection (10) are given in the following result from [27]:

Lemma 2.5

Suppose \(k\ge 0\), \(\tau |_{\partial K}\) is nonnegative and \(\tau _K^{\max }:=\max \tau |_{\partial K}>0\). Then the system (10) is uniquely solvable for \({\varvec{\varPi }}_V{\varvec{p}}\) and \(\varPi _W \phi \). Furthermore, there is a constant C independent of K and \(\tau \) such that

$$\begin{aligned} \Vert {{\varvec{\varPi }}_V}{\varvec{p}}-{\varvec{p}}\Vert _K&\le Ch_{K}^{\ell _{{\varvec{p}}}+1}|{\varvec{p}}|_{\ell _{{\varvec{p}}}+1,K}+Ch_{K}^{\ell _{{\phi }}+1}\tau _{K}^{*}{|\phi |}_{\ell _{{\phi }}+1,K}, \end{aligned}$$
(11a)
$$\begin{aligned} \Vert {{\varPi }_W}{\phi }-\phi \Vert _K&\le Ch_{K}^{\ell _{{\phi }}+1}|{\phi }|_{\ell _{{\phi }}+1,K}+C\frac{h_{K}^{\ell _{{{\varvec{p}}}}+1}}{\tau _K^{\max }}{|\nabla \cdot {\varvec{p}}|}_{\ell _{{\varvec{p}}},K} \end{aligned}$$
(11b)

for \(\ell _{{\varvec{p}}},\ell _{\phi }\) in \([0,k+1]\). Here \(\tau _K^{*}:=\max \tau |_{{\partial K}\backslash e^{*}}\), where \(e^{*}\) is a face of K at which \(\tau |_{\partial K}\) is maximum.

We next define the standard \(L^2\) projections \({\varvec{\varPi }}_{k}^o : [L^2(\varOmega )]^d \rightarrow {\varvec{Q}}_h\), \(\varPi _{k+1}^o : L^2(\varOmega ) \rightarrow V_h\), and \(\varPi _k^{\partial }: L^2({\mathcal {E}}_h) \rightarrow {\widehat{V}}_h\), which satisfy

$$\begin{aligned} \begin{aligned} ({\varvec{\varPi }}_k^o {\varvec{q}},{\varvec{r}}_1)_{K}&= ({\varvec{q}},{\varvec{r}}_1)_{K} ,\quad \forall {\varvec{r}}_1\in [{{\mathcal {P}}}_{k}(K)]^d,\\ (\varPi _{k+1}^o u,w_1)_{K}&= (u,w_1)_{K} ,\quad \forall w_1\in {\mathcal {P}}_{k+1}(K),\\ \langle \varPi _k^\partial u, \mu _1\rangle _{ e}&= \left\langle u, \mu _1\right\rangle _{e }, \quad \forall \mu _1\in {\mathcal {P}}_{k}(e). \end{aligned} \end{aligned}$$
(12)

In the analysis, we use the following classical results [16, Lemma 3.3]:

$$\begin{aligned} \Vert {{\varvec{q}} -{\varvec{\varPi }}_k^o {\varvec{q}}}\Vert _{{\mathcal {T}}_h}&\le C h^{k+1} \Vert {{\varvec{q}}}\Vert _{k+1,\varOmega },\quad \Vert {u -{\varPi _{k+1}^o u}}\Vert _{{\mathcal {T}}_h} \le C h^{k+2} \Vert {u}\Vert _{k+2,\varOmega },\end{aligned}$$
(13a)
$$\begin{aligned} \Vert {u - {\varPi _{k+1}^o u}}\Vert _{\partial {\mathcal {T}}_h}&\le C h^{k+\frac{3}{2}} \Vert {u}\Vert _{k+2,\varOmega }, \quad \Vert {w}\Vert _{\partial {\mathcal {T}}_h} \le C h^{-\frac{1}{2}} \Vert {w}\Vert _{ {\mathcal {T}}_h}, \quad \forall w\in V_h. \end{aligned}$$
(13b)

To shorten lengthy equations, we rewrite the HDG formulation (3) in the following compact form: find \(({\varvec{q}}_h ,u_h ,{\widehat{u}}_h )\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}(g_u)\) and \(({\varvec{p}}_h ,{\phi }_h ,\widehat{\phi }_h ) \in {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(g_{\phi })\) such that

(14a)
(14b)

for all \(({\varvec{r}}_1, {\varvec{r}}_2, w_1,w_2,\mu _1,\mu _2)\in {\varvec{Q}}_h\times {\varvec{S}}_h\times V_h\times \varPsi _h\times {\widehat{V}}_h(0)\times {\widehat{\varPsi }}_h(0)\), where the HDG bilinear forms , and the trilinear form are defined by

(14c)

for all \(({\varvec{q}}_h ,u_h ,{\widehat{u}}_h, {\varvec{r}}_1, w_1,\mu _1)\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}(g_u)\times {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}(0)\),

(14d)

for all \(({\varvec{p}}_h ,\phi _h ,{\widehat{\phi }}_h, {\varvec{r}}_2, w_2,\mu _2)\in {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(g_{\phi })\times {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(0)\),

(14e)

for all \((u_h, {\widehat{u}}_h, w_1,\mu _1)\in V_h\times {\widehat{V}}_{h}(g_{u})\times V_h\times {\widehat{V}}_{h}(0)\).

Next, we present basic properties of the operators and .

Lemma 2.6

For any \(({\varvec{q}}_h ,u_h ,{\widehat{u}}_h, {\varvec{r}}_1, w_1,\mu _1)\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}(0)\times {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}(0)\) and \(({\varvec{p}}_h ,\phi _h ,{\widehat{\phi }}_h, {\varvec{r}}_2, w_2,\mu _2)\in {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(0)\times {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(0)\), we have

and

The proof of Lemma 2.6 is straightforward, hence we omit it here.

The proofs of the following two lemmas can be found in [59, Lemma 3.2] and [8, Equation (1.3)], respectively.

Lemma 2.7

If \(({\varvec{q}}_h, u_h, {\widehat{u}}_h)\) satisfies Eq. (3a), then we have

$$\begin{aligned} \Vert \nabla u_h\Vert _{\mathcal {T}_h} +\Vert h_K^{-1/2}(u_h-{\widehat{u}}_h)\Vert _{\partial \mathcal {T}_h} \le C \left( \Vert {\varvec{q}}_h\Vert _{\mathcal {T}_h}+\Vert h_K^{-1/2}(\varPi _{k}^{\partial } u_h-{\widehat{u}}_h)\Vert _{\partial \mathcal {T}_h} \right) . \end{aligned}$$

Lemma 2.8

(Piecewise Poincaré–Friedrichs inequality) Let \(v_h\in H^1(\mathcal {T}_h)\), then we have

$$\begin{aligned} \Vert v_h\Vert _{\mathcal {T}_h}^2 \le C\left( \Vert \nabla v_h\Vert _{\mathcal {T}_h}^2 +|\langle v_h,1 \rangle _{\partial \varOmega }|^2+\sum _{e\in \mathcal {E}^o_h} |e|^{d/(1-d)}\left( \int _e[[v_h]]\,ds\right) ^2 \right) , \end{aligned}$$

where |e| denotes the measure of e.

By Lemma 2.8, we immediately have the following lemma.

Lemma 2.9

(HDG Poincaré inequality) If \((v_h,{\widehat{v}}_h)\in {V}_h\times {\widehat{V}}_h(0)\), then we have

$$\begin{aligned} \Vert v_h\Vert ^2_{\mathcal {T}_h}\le C \left( \Vert \nabla v_h\Vert ^2_{\mathcal {T}_h}+ \Vert h_K^{-1/2}(\varPi _k^{\partial }v_h-{\widehat{v}}_h)\Vert ^2_{\partial \mathcal {T}_h} \right) . \end{aligned}$$

Proof

By Lemma 2.8 and the fact that \({\widehat{v}}_h\) is zero on \(\partial \varOmega \) and single valued on interior faces, we have

$$\begin{aligned} \Vert v_h\Vert ^2_{\mathcal {T}_h}&\le C \left( \Vert \nabla v_h\Vert ^2_{\mathcal {T}_h}+ \Vert h_K^{-1/2}[[v_h]] \Vert ^2_{{\mathcal {E}}_h} \right) \\&= C \left( \Vert \nabla v_h\Vert ^2_{\mathcal {T}_h}+ \Vert h_K^{-1/2}[[v_h-\varPi _k^\partial v_h +\varPi _k^\partial v_h -{\widehat{v}}_h]] \Vert ^2_{{\mathcal {E}}_h} \right) \\&\le C \left( \Vert \nabla v_h\Vert ^2_{\mathcal {T}_h}+ \Vert h_K^{-1/2}(v_h-\varPi _k^\partial v_h)\Vert _{\partial \mathcal {T}_h}^2 +\Vert h_K^{-1/2}(\varPi _k^\partial v_h -{\widehat{v}}_h)\Vert ^2_{\partial \mathcal {T}_h} \right) \\&\le C \left( \Vert \nabla v_h\Vert ^2_{\mathcal {T}_h}+ \Vert h_K^{-1/2}(\varPi _k^{\partial }v_h-{\widehat{v}}_h)\Vert ^2_{\partial \mathcal {T}_h} \right) . \end{aligned}$$

\(\square \)
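The last inequality in the proof of Lemma 2.9 absorbs the term \(\Vert h_K^{-1/2}(v_h-\varPi _k^\partial v_h)\Vert _{\partial \mathcal {T}_h}\) into \(\Vert \nabla v_h\Vert _{\mathcal {T}_h}\). A sketch of the standard argument on a single element K, assuming the usual discrete trace and Poincaré inequalities on a shape-regular mesh, and writing \(\bar{v}_h\) (a symbol introduced here for illustration) for the mean value of \(v_h\) over K:

```latex
\begin{aligned}
\|v_h - \Pi_k^{\partial} v_h\|_{\partial K}
  &\le \|v_h - \bar{v}_h\|_{\partial K}
  && \text{(best approximation on each face, since the constant } \bar{v}_h \in \mathcal{P}^{k}(e)\text{)} \\
  &\le C\, h_K^{-1/2}\, \|v_h - \bar{v}_h\|_{K}
  && \text{(discrete trace inequality)} \\
  &\le C\, h_K^{1/2}\, \|\nabla v_h\|_{K}
  && \text{(Poincar\'e inequality on } K\text{)} .
\end{aligned}
```

Multiplying by \(h_K^{-1/2}\), squaring, and summing over all \(K\in \mathcal {T}_h\) bounds the middle term of the third line of the proof by \(C\Vert \nabla v_h\Vert ^2_{\mathcal {T}_h}\).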

3 Proof of Theorem 2.3

To prove Theorem 2.3, we follow a similar strategy to that in [15]. We first bound the error between the solution of an HDG elliptic projection defined in (15) and the solution of the system (1a). Then we bound the error between the solution of the HDG elliptic projection (15) and the HDG formulation (14a) and the error between the solution of the system (1b) and the solution of the HDG formulation (14b). A simple application of the triangle inequality then gives a bound on the error between the solution of the HDG formulation (14) and the system (1). First, we present the HDG elliptic projection.

3.1 HDG Elliptic Projection and Basic Estimates

For \(t\in [0,T]\), let \(({{\varvec{q}}}_{Ih},u_{Ih}, {\widehat{u}}_{Ih})\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_h({g_u})\) be the solution of

(15)

for all \(({\varvec{r}}_1,w_1,\mu _1)\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_h(0)\), where M is a given constant such that (7) holds.

Taking the partial derivative of (15) with respect to t, we see that \((\partial _t {\varvec{q}}_{Ih}, \partial _t u_{Ih}, \partial _t {\widehat{u}}_{Ih})\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_h(\partial _t {g_u})\) is the solution of

(16)

for all \(({\varvec{r}}_1,w_1,\mu _1)\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_h({0})\).

We choose the initial condition \(u_h(0) = u_{Ih}(0)\) for the purposes of analysis. In fact, the initial condition \(u_h(0)\) can be chosen to be the \(L^2\) projection of \(u_0\) onto \(V_h\), i.e., \(\varPi _{k+1}^o u_0\).

The following result, Theorem 3.1, bounds the error between the solution of the HDG elliptic projection (15) and the solution of the system (1a); its proof is given in Sect. 6.

Theorem 3.1

For any \(t\in [0,T]\), if the elliptic regularity inequality (9) holds and h is small enough, then we have the following error estimates

$$\begin{aligned} \Vert u-u_{Ih}\Vert _{\mathcal {T}_h}&\le Ch^{k+2}\Vert u\Vert _{k+2}, \end{aligned}$$
(17a)
$$\begin{aligned} \Vert {\varvec{q}}-{\varvec{q}}_{Ih}\Vert _{\mathcal {T}_h}+\Vert h_K^{-1/2}(\varPi _k^{\partial }u_{Ih}-{\widehat{u}}_{Ih})\Vert _{\partial \mathcal {T}_h}&\le Ch^{k+1}\Vert u\Vert _{k+2}. \end{aligned}$$
(17b)

In addition, we have

$$\begin{aligned} \Vert \partial _tu-\partial _tu_{Ih}\Vert _{\mathcal {T}_h}&\le Ch^{k+2 }\Vert \partial _t u\Vert _{k+2}. \end{aligned}$$
(17c)

3.2 Error Equation Between the HDG Formulation (14) and the HDG Elliptic Projection (15)

To bound the error between the solution of the HDG elliptic projection (15) and the system (14a), and the error between the solution of the HDG formulation (14b) and the system (1b), we first derive the error equations summarized in the next lemma. To simplify notation, we define

$$\begin{aligned}&\xi _h^{{\varvec{q}}}={\varvec{q}}_{Ih}-{\varvec{q}}_h,\quad \xi _h^{u}=u_{Ih}-u_h,\quad \xi _h^{{\widehat{u}}}={\widehat{u}}_{Ih}-{\widehat{u}}_{h},\\&\xi _h^{{\varvec{p}}}={\varvec{\varPi }}_{V}{\varvec{p}}-{\varvec{p}}_h,\quad \xi _h^{{\phi }}=\varPi _{W}{\phi }-{\phi }_h,\quad \xi _h^{\widehat{\phi }}=\varPi _{k+1}^{\partial }{\phi }-\widehat{\phi }_{h}. \end{aligned}$$

Lemma 3.2

For any \(({\varvec{r}}_1, w_1,\mu _1, {\varvec{r}}_2, w_2,\mu _2)\in {\varvec{Q}}_h\times V_h\times {\widehat{V}}_{h}(0)\times {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(0)\), we have the following error equation

(18a)
(18b)

Proof

We first prove (18a). Subtracting Eq. (14a) from (15) and using the definition of and we get

This gives

We note that the nonlinear operator is linear in each of its variables, hence we have

This implies

Next, we prove (18b). By the definition of in (14d), we have

By the definition of \({\varvec{\varPi }}_{V}\) and \(\varPi _{W}\) in (10) we get

Since

$$\begin{aligned} (\nabla \cdot ({\varvec{\varPi }}_{V}{\varvec{p}} - {\varvec{p}}),w_2)_{\mathcal {T}_h}&= \langle ({\varvec{\varPi }}_V {\varvec{p}} - {\varvec{p}})\cdot {\varvec{n}}, w_2 \rangle _{\partial {\mathcal {T}}_h}- ({\varvec{\varPi }}_{V}{\varvec{p}} - {\varvec{p}},\nabla w_2)_{\mathcal {T}_h} \\&= \langle ({\varvec{\varPi }}_{V}{\varvec{p}} - {\varvec{p}})\cdot {\varvec{n}}, w_2 \rangle _{\partial {\mathcal {T}}_h}, \end{aligned}$$

where the last step uses (10a) with \({\varvec{r}}=\nabla w_2\in [{\mathcal {P}}_{k}(K)]^d\), we have

Using the analogue of Eq. (14b) for the exact solution, and (10) we get

Therefore, subtracting Eq. (14b) we have the following error equation

\(\square \)

3.2.1 \(L^2\) Error Estimates for p and \(\phi \)

Lemma 3.3

We have the following estimate

$$\begin{aligned} \Vert \xi _h^{{\varvec{p}}}\Vert ^2_{\mathcal {T}_h} +\Vert \sqrt{\tau }(\varPi _{k+1}^{\partial }\xi _h^{{\phi }}-\xi _h^{\widehat{\phi }})\Vert ^2_{\partial \mathcal {T}_h} \le \Vert u-u_h\Vert _{\mathcal {T}_h}\Vert \xi _h^{{\phi }}\Vert _{\mathcal {T}_h}. \end{aligned}$$

Proof

We take \(({\varvec{r}}_2,w_2,\mu _2)=(\xi _h^{{\varvec{p}}},\xi _h^{{\phi }},\xi _h^{\widehat{{\phi }}} )\) in (18b) to get

On the other hand, by Lemma 2.6 we have

$$\begin{aligned} \Vert \xi _h^{{\varvec{p}}}\Vert ^2_{\mathcal {T}_h} +\Vert \sqrt{\tau }(\xi _h^{{\phi }}-\xi _h^{\widehat{\phi }})\Vert ^2_{\partial \mathcal {T}_h} \le \Vert u-u_h\Vert _{\mathcal {T}_h}\Vert \xi _h^{{\phi }}\Vert _{\mathcal {T}_h}. \end{aligned}$$

\(\square \)

If we directly apply Lemma 2.9 to get the estimate of \(\Vert \xi _h^{{\phi }}\Vert _{\mathcal {T}_h}\), we will obtain only suboptimal convergence rates. To obtain optimal rates we use the dual problem introduced in Eq. (8) with \({\varvec{p}}=0\) and \(M=0\) and assume the regularity estimate (9).

We follow the proof of Lemma 3.2 to get the following lemma.

Lemma 3.4

Let \(({\varvec{\varPhi }},\varPsi )\) solve (8) with \({\varvec{p}}=0\) and \(M=0\) having data \(\varTheta \). Then for any \(({\varvec{r}}_2, w_2,\mu _2)\in {\varvec{S}}_h\times \varPsi _h\times {\widehat{\varPsi }}_{h}(0)\), we have the following equation

Using this lemma we can now estimate \(\xi _h^{\phi }\) in terms of \(u-u_h\) and other consistency terms.

Lemma 3.5

For any \(t\in [0,T]\), if the elliptic regularity inequality (9) holds, then we have the following error estimate

$$\begin{aligned} \Vert \xi _h^{{\phi }}\Vert ^2_{\mathcal {T}_h} \le Ch^2 \Vert {\varvec{\varPi }}_V {\varvec{p}} - {\varvec{p}}\Vert _{{\mathcal {T}}_h}^2 +C\Vert u -u_h \Vert _{{\mathcal {T}}_h}^2. \end{aligned}$$

Proof

Consider the dual problem (8) with \({\varvec{p}}=0\) and \(M=0\) and \(\varTheta = \xi _h^{{\phi }}\). We take \(({\varvec{r}}_2,w_2,\mu _2) = (-{\varvec{\varPi }}_V{\varvec{\varPhi }},\varPi _W\varPsi ,\varPi _{k+1}^\partial \varPsi )\) in Eq. (18b) of Lemma 3.2 to get

(19)

On the other hand, by Lemmas 2.6 and 3.4, we have

(20)

Comparing the above two equalities (19) and (20) gives

$$\begin{aligned} \Vert \xi _h^{{\phi }}\Vert ^2_{\mathcal {T}_h}&=({\varvec{\varPi }}_V {\varvec{\varPhi }} - {\varvec{\varPhi }},\xi _h^{{\varvec{p}}})_{\mathcal {T}_h}-({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},{\varvec{\varPi }}_V{\varvec{\varPhi }})_{{\mathcal {T}}_h}- {( u - u_h, \varPi _W\varPsi )_{{\mathcal {T}}_h}}\\&=({\varvec{\varPi }}_V {\varvec{\varPhi }} - {\varvec{\varPhi }},\xi _h^{{\varvec{p}}})_{\mathcal {T}_h}-({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},{\varvec{\varPi }}_V{\varvec{\varPhi }}- {\varvec{\varPhi }})_{{\mathcal {T}}_h} \\&\quad -\,({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},{\varvec{\varPhi }})_{{\mathcal {T}}_h}- {( u - u_h, \varPi _W\varPsi )_{{\mathcal {T}}_h}}\\&=({\varvec{\varPi }}_V {\varvec{\varPhi }} - {\varvec{\varPhi }},\xi _h^{{\varvec{p}}})_{\mathcal {T}_h}-({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},{\varvec{\varPi }}_V{\varvec{\varPhi }}- {\varvec{\varPhi }})_{{\mathcal {T}}_h} \\&\quad +\,({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},\nabla \varPsi )_{{\mathcal {T}}_h}- {( u - u_h, \varPi _W\varPsi )_{{\mathcal {T}}_h}}\\&=({\varvec{\varPi }}_V {\varvec{\varPhi }} - {\varvec{\varPhi }},\xi _h^{{\varvec{p}}})_{\mathcal {T}_h}-({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},{\varvec{\varPi }}_V{\varvec{\varPhi }}- {\varvec{\varPhi }})_{{\mathcal {T}}_h} \\&\quad +\,({\varvec{\varPi }}_V{\varvec{p}} - {\varvec{p}},\nabla (\varPsi -\varPi _W\varPsi ))_{{\mathcal {T}}_h}- {( u - u_h, \varPi _W\varPsi )_{{\mathcal {T}}_h}}\\&\le C h^2 \Vert \xi _h^{{{\varvec{p}}}}\Vert _{{\mathcal {T}}_h}^2 + Ch^2 \Vert {\varvec{\varPi }}_V {\varvec{p}} - {\varvec{p}}\Vert _{{\mathcal {T}}_h}^2 +\Vert u -u_h \Vert _{{\mathcal {T}}_h} {\Vert \varPi _W \varPsi \Vert _{\mathcal {T}_h}}. \end{aligned}$$

By the continuous dependence result (9) and the projection property of \(\varPi _W\) in (11b) we get

$$\begin{aligned} \Vert \varPi _W \varPsi \Vert _{{\mathcal {T}}_h}\le \Vert \varPi _W \varPsi - \varPsi \Vert _{{\mathcal {T}}_h}+\Vert \varPsi \Vert _{{\mathcal {T}}_h} \le C\Vert \varPsi \Vert _{H^2(\varOmega )} \le C \Vert \varTheta \Vert _{{\mathcal {T}}_h} = C\Vert \xi _h^{{\phi }}\Vert _{{\mathcal {T}}_h}. \end{aligned}$$

By Lemma 3.3 and the Cauchy–Schwarz inequality we obtain the result of the lemma:

$$\begin{aligned} \Vert \xi _h^{{\phi }}\Vert ^2_{\mathcal {T}_h} \le Ch^2 \Vert {\varvec{\varPi }}_V {\varvec{p}} - {\varvec{p}}\Vert _{{\mathcal {T}}_h}^2 +C\Vert u -u_h \Vert _{{\mathcal {T}}_h}^2. \end{aligned}$$

\(\square \)
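For the reader's convenience, the final absorption step can be sketched as follows. This is our reading of the argument, not a statement taken from the proof: C denotes a generic constant, we assume \(h\le 1\), and we use Young's inequality in the form \(ab\le \frac{1}{2}a^2+\frac{1}{2}b^2\). Inserting Lemma 3.3 and the bound \(\Vert \varPi _W \varPsi \Vert _{{\mathcal {T}}_h}\le C\Vert \xi _h^{{\phi }}\Vert _{{\mathcal {T}}_h}\) into the last inequality of the chain above gives

$$\begin{aligned} \Vert \xi _h^{{\phi }}\Vert ^2_{\mathcal {T}_h}&\le Ch^2 \Vert u-u_h\Vert _{{\mathcal {T}}_h}\Vert \xi _h^{{\phi }}\Vert _{{\mathcal {T}}_h} + Ch^2 \Vert {\varvec{\varPi }}_V {\varvec{p}} - {\varvec{p}}\Vert _{{\mathcal {T}}_h}^2 + C\Vert u-u_h\Vert _{{\mathcal {T}}_h}\Vert \xi _h^{{\phi }}\Vert _{{\mathcal {T}}_h}\\&\le Ch^2 \Vert {\varvec{\varPi }}_V {\varvec{p}} - {\varvec{p}}\Vert _{{\mathcal {T}}_h}^2 + 2C^2\Vert u-u_h\Vert _{{\mathcal {T}}_h}^2 + \frac{1}{2}\Vert \xi _h^{{\phi }}\Vert ^2_{\mathcal {T}_h}. \end{aligned}$$

Moving the last term to the left-hand side yields the estimate of Lemma 3.5 with a different generic constant.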

As a consequence of the above result, a simple application of the triangle inequality and Lemmas 3.3 and 3.5 gives the following bounds on \(\Vert \phi -\phi _h\Vert _{{\mathcal {T}}_h}\) and \(\Vert {\varvec{p}} - {\varvec{p}}_h\Vert _{{\mathcal {T}}_h}\):

Lemma 3.6

Let \(( {\varvec{p}}, \phi )\) and \(({\varvec{p}}_h, \phi _h)\) be the solutions of (2) and (3), respectively. For any \(t\in [0,T]\), if the elliptic regularity inequality (9) holds, then we have the following error estimate

$$\begin{aligned} \Vert \phi - \phi _h\Vert _{{\mathcal {T}}_h} + \Vert {\varvec{p}}- {\varvec{p}}_h\Vert _{{\mathcal {T}}_h} \le C_1 h^{k+2} + C \Vert u - u_h\Vert _{{\mathcal {T}}_h}, \end{aligned}$$

where \(C_1\) depends on the \(H^{k+1}(\varOmega )\) norm of \({\varvec{p}}\) at each time.

3.3 \(L^2\) Error Estimates for u

Having the result of Lemma 3.6 it remains to estimate \(u-u_h\). The fundamental estimate is contained in the next lemma.

Lemma 3.7

If h is small enough, then there exists \(t_h^\star \in [0,T]\) such that for all \(t\in [0, t_h^\star ]\) we have

$$\begin{aligned} \Vert \xi _h^u\Vert ^2_{\mathcal {T}_h}+\int _{0}^t \left( \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h} +\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h}\right) dt \le Ch^{2k+4}. \end{aligned}$$

Proof

We take \(({\varvec{r}}_1,w_1,\mu _1)=(\xi _h^{{\varvec{q}}},\xi _h^u,\xi _h^{{\widehat{u}}})\) in (18a) to get

$$\begin{aligned} \begin{aligned}&(\partial _t\xi _h^u,\xi _h^u)_{\mathcal {T}_h}+ \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h}+\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h}\\&\quad =(\partial _t(u_{Ih}-u),\xi ^u_h)_{\mathcal {T}_h}+M(u-u_{Ih},\xi ^u_h)_{\mathcal {T}_h}\\&\qquad -\,( ({\varvec{p}}-{\varvec{p}}_h) u_h, \nabla \xi ^u_h)_{\mathcal {T}_h} + \langle ({\varvec{p}} - \widehat{{\varvec{p}}}_h)\cdot {\varvec{n}} {\widehat{u}}_h, \xi ^u_h - \xi _h^{{\widehat{u}}} \rangle _{\partial \mathcal {T}_h}\\&\qquad -\,( {\varvec{p}} \xi ^u_h, \nabla \xi ^u_h)_{\mathcal {T}_h} + \langle {\varvec{p}} \cdot {\varvec{n}} \xi _h^{{\widehat{u}}}, \xi ^u_h \rangle _{\partial \mathcal {T}_h}\\&\quad =:R_1+R_2+R_3+R_{4} + R_{5} +R_{6}. \end{aligned} \end{aligned}$$
(21)

We note that \(\xi _h^u(0) = u_h(0) - u_{Ih}(0) = 0\). Let \(t=0\) in (21) to get

$$\begin{aligned} \Vert \xi _h^{{\varvec{q}}}(0)\Vert ^2_{\mathcal {T}_h}+\Vert h_K^{-1/2} (\varPi _k^{\partial }\xi _h^u(0)-\xi _h^{{\widehat{u}}}(0))\Vert ^2_{\partial \mathcal {T}_h} = 0. \end{aligned}$$

This implies \(\xi _h^{{\widehat{u}}}(0) = \xi _h^u(0)=0\). Hence we have \({\widehat{u}}_h(0) = {\widehat{u}}_{Ih}(0)\). By Theorem 3.1 we have

$$\begin{aligned} \Vert \varPi _{k+1}^o u(0) -u_h(0)\Vert _{{\mathcal {T}}_h}&= \Vert \varPi _{k+1}^o u(0) - u_{Ih}(0)\Vert _{{\mathcal {T}}_h} \le Ch^{k+2},\\ \Vert \varPi _k^\partial u(0) - {\widehat{u}}_h(0)\Vert _{\partial {\mathcal {T}}_h}&= \Vert \varPi _k^\partial u(0) - {\widehat{u}}_{Ih}(0)\Vert _{\partial {\mathcal {T}}_h} \le Ch^{k+3/2}. \end{aligned}$$

For h small enough these estimates imply that

$$\begin{aligned} \Vert u(t) - \varPi _{k+1}^o u(t)\Vert _{L^\infty (\varOmega )}\le 1/2 \text{ and } \Vert u(t) - \varPi _{k}^\partial u(t)\Vert _{L^\infty ({\mathcal {E}}_h)} \le 1/2 \text{ for } \text{ all } t\in [0,T]. \end{aligned}$$
(22)

Let \({\mathcal {M}} = \max _{(t,x)\in [0,T]\times \varOmega } |u(t,x)|\). Then the inverse inequality gives

$$\begin{aligned} \Vert u_h(0)\Vert _{L^\infty (\varOmega )}&\le Ch^{-d/2}\Vert \varPi _{k+1}^o u(0) -u_h(0)\Vert _{{\mathcal {T}}_h} \\&\quad +\, \Vert \varPi _{k+1}^o u(0) - u(0)\Vert _{L^\infty (\varOmega )} + \Vert u(0)\Vert _{L^\infty (\varOmega )}\\&\le Ch^{k+2-d/2} + {\mathcal {M}} + 1/2,\\ \Vert {\widehat{u}}_h(0)\Vert _{L^\infty ({\mathcal {E}}_h)}&\le Ch^{(1-d)/2}\Vert \varPi _k^\partial u(0) -{\widehat{u}}_h(0)\Vert _{\partial {\mathcal {T}}_h} \\&\quad + \,\Vert \varPi _k^\partial u(0) - u(0)\Vert _{L^\infty ({\mathcal {E}}_h)} + \Vert u(0)\Vert _{L^\infty ({\mathcal {E}}_h)}\\&\le Ch^{k+2-d/2} + {\mathcal {M}} + 1/2. \end{aligned}$$

Moreover, since the solution of the error equation (18a) is continuous with respect to the time t, for h small enough there exists \(t_h^\star \in [0,T]\) such that, for all \(t\in [0,t_h^\star ]\),

$$\begin{aligned} \Vert u_h\Vert _{L^\infty (\varOmega )} + \Vert {\widehat{u}}_h\Vert _{L^\infty ({\mathcal {E}}_h)} \le 2 {\mathcal {M}}+2. \end{aligned}$$
(23)
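The powers of h appearing in the two initial-time bounds can be checked with a few lines of exact arithmetic. The sketch below is only illustrative: it assumes the element inverse inequality contributes \(h^{-d/2}\), the inverse inequality on the \((d-1)\)-dimensional faces contributes \(h^{(1-d)/2}\), and the initial errors are of order \(h^{k+2}\) and \(h^{k+3/2}\) as in the estimates obtained from Theorem 3.1. The function names are ours.

```python
from fractions import Fraction


def volume_exponent(k: int, d: int) -> Fraction:
    """Power of h from h^(-d/2) (element inverse inequality) times h^(k+2)."""
    return Fraction(-d, 2) + (k + 2)


def face_exponent(k: int, d: int) -> Fraction:
    """Power of h from h^((1-d)/2) (face inverse inequality) times h^(k+3/2)."""
    return Fraction(1 - d, 2) + k + Fraction(3, 2)


# Both products reduce to h^(k+2-d/2), which is a positive power for d = 2, 3.
for d in (2, 3):
    for k in (0, 1, 2):
        print(f"d={d}, k={k}: volume {volume_exponent(k, d)}, face {face_exponent(k, d)}")
```

Under these assumptions both exponents equal \(k+2-d/2\), which is positive for \(d=2,3\) and every \(k\ge 0\), so the corresponding terms become small as h decreases.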

By the Cauchy–Schwarz inequality, Theorem 3.1 and Lemma 2.7 we get

$$\begin{aligned} R_1 +R_2&\le Ch^{k+2}\Vert \xi _h^u\Vert _{\mathcal {T}_h}\\&\le Ch^{2k+4}+\frac{1}{8}\left( \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h} +\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h} \right) . \end{aligned}$$

For the term \(R_3\), by the Cauchy–Schwarz inequality, the bound (23), and Lemmas 3.6, 2.9 and 2.7 we get

$$\begin{aligned} R_3&\le C\Vert {\varvec{p}}-{\varvec{p}}_h\Vert _{\mathcal {T}_h}\Vert \nabla \xi ^u_h\Vert _{\mathcal {T}_h}\\&\le C\Vert {\varvec{p}}-{\varvec{p}}_h\Vert _{\mathcal {T}_h}^2 + \frac{1}{C}\Vert \nabla \xi ^u_h\Vert _{\mathcal {T}_h}^2\\&\le Ch^{2k+4} + C\Vert u - u_h\Vert _{\mathcal {T}_h}^2 + \frac{1}{C}\Vert \nabla \xi ^u_h\Vert _{\mathcal {T}_h}^2\\&\le Ch^{2k+4} +C \Vert \xi _h^u\Vert _{{\mathcal {T}}_h}^2+\frac{1}{8}\left( \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h}+\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h}\right) . \end{aligned}$$

Similarly, applying Lemma 2.7 again we obtain

$$\begin{aligned} R_4&=\langle ({\varvec{p}}-\widehat{{\varvec{p}}}_h) \cdot {\varvec{n}} {\widehat{u}}_h, \xi ^u_h-\xi ^{{\widehat{u}}}_h \rangle _{\partial \mathcal {T}_h}\\&\le C \Vert h_K^{1/2}({\varvec{p}}-\widehat{{\varvec{p}}}_h)\Vert _{\partial \mathcal {T}_h} \Vert h_K^{-1/2}(\xi _h^u- \xi ^{{\widehat{u}}}_h)\Vert _{\partial \mathcal {T}_h}\\&\le Ch^{2k+4}+C \Vert \xi _h^u\Vert _{{\mathcal {T}}_h}^2+\frac{1}{8}\left( \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h}+\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h}\right) . \end{aligned}$$

For the last two terms \(R_5+R_6\), we know from Assumption 2.1 that \({\varvec{p}}\) is bounded. Integrating by parts, we get

$$\begin{aligned} R_5 + R_6&= -( {\varvec{p}} \xi ^u_h, \nabla \xi ^u_h)_{\mathcal {T}_h} + \langle {\varvec{p}} \cdot {\varvec{n}} \xi _h^{{\widehat{u}}}, \xi ^u_h \rangle _{\partial \mathcal {T}_h}\\&= -\frac{1}{2} \langle {\varvec{p}}\cdot {\varvec{n}} (\xi ^u_h - \xi _h^{{\widehat{u}}}), \xi ^u_h - \xi _h^{{\widehat{u}}} \rangle _{\partial {\mathcal {T}}_h} +\frac{1}{2}( \nabla \cdot {\varvec{p}} \xi ^u_h, \xi ^u_h)_{\mathcal {T}_h} \\&\le \frac{1}{8} \Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h} + \Vert \nabla \cdot {\varvec{p}}\Vert _{L^\infty (\varOmega )} \Vert \xi ^u_h\Vert _{{\mathcal {T}}_h}^2. \end{aligned}$$

Summing the above estimates of \(\{R_i\}_{i=1}^6\), we get

$$\begin{aligned} (\partial _t\xi _h^u,\xi _h^u)_{\mathcal {T}_h}+ \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h}+\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h} \le C h^{2k+4} + C \Vert \xi ^u_h\Vert _{{\mathcal {T}}_h}^2. \end{aligned}$$
(24)

Integrating both sides of (24) on \([0, t_h^*]\) we finally obtain

$$\begin{aligned}&\Vert \xi _h^u(t_h^*)\Vert ^2_{\mathcal {T}_h}+\int _{0}^{t_h^*} \left( \Vert \xi _h^{{\varvec{q}}}\Vert ^2_{\mathcal {T}_h} +\Vert h_K^{-1/2}(\varPi _k^{\partial }\xi _h^u-\xi _h^{{\widehat{u}}})\Vert ^2_{\partial \mathcal {T}_h}\right) dt\\&\quad \le Ch^{2k+4} + C \int _0^{t_h^*} \Vert \xi _h^u\Vert _{{\mathcal {T}}_h}^2 dt. \end{aligned}$$

The use of Gronwall’s inequality gives the desired result. \(\square \)
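The Gronwall step can be illustrated numerically. The sketch below assumes the integral form of the inequality: if \(y(t)\le a + C\int _0^t y(s)\,ds\) with \(y\ge 0\), then \(y(t)\le a e^{Ct}\). Here y plays the role of \(\Vert \xi _h^u(t)\Vert ^2_{\mathcal {T}_h}\) and a the role of the \(Ch^{2k+4}\) term. The function names and the constants are illustrative assumptions, not quantities from the analysis.

```python
import math


def gronwall_bound(a: float, c: float, t: float) -> float:
    """Gronwall bound a*exp(c*t) for y(t) <= a + c * int_0^t y(s) ds."""
    return a * math.exp(c * t)


def extremal_solution(a: float, c: float, t: float, steps: int = 10_000) -> float:
    """Forward Euler approximation of the extremal case y' = c*y, y(0) = a."""
    dt = t / steps
    y = a
    for _ in range(steps):
        y += dt * c * y
    return y


def consistency_term(h: float, k: int, const: float = 1.0) -> float:
    """Illustrative stand-in for the C*h^(2k+4) term."""
    return const * h ** (2 * k + 4)


# The exponential factor does not depend on h, so the order h^(2k+4) survives.
k, c, T = 1, 2.0, 1.0
for h in (0.1, 0.05, 0.025):
    a = consistency_term(h, k)
    print(h, extremal_solution(a, c, T), gronwall_bound(a, c, T))
```

The point of the sketch is that the factor \(e^{CT}\) is independent of h, so halving h reduces the bound by the factor \(2^{2k+4}\), as in the statement of Lemma 3.7.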

Lemma 3.8

For h small enough, the result in Lemma 3.7 holds on the whole time interval [0, T].

Proof

Fix \( h^* > 0 \) so that Lemma 3.7 is true for all \( h \le h^* \), and assume \(t^*_h\) is the largest value for which (23) is true for all \( h \le h^* \). Define the set \( {\mathbb {A}} = \{ h \in [0,h^*] : t^*_h \ne T \} \). If the result is not true, then \( {\mathbb {A}} \) is nonempty, \( \inf \{ h : h\in {\mathbb {A}} \} = 0 \), and also

$$\begin{aligned} \Vert u_h\Vert _{L^\infty (\varOmega )}+ \Vert {\widehat{u}}_h\Vert _{L^\infty ({\mathcal {E}}_h)} = 2{\mathcal {M}}+2 \quad \text{ for } \text{ all } h \in {\mathbb {A}} \text{. } \end{aligned}$$
(25)

However, by the inverse inequality and since Lemma 3.7 holds, we have

$$\begin{aligned} \Vert u_h\Vert _{L^\infty (\varOmega )}+ \Vert {\widehat{u}}_h\Vert _{L^\infty ({\mathcal {E}}_h)} \le C h^{2-d/2} + 2{\mathcal {M}} +1 \quad \text{ for } \text{ all } h \in {\mathbb {A}} \text{. } \end{aligned}$$

Since C does not depend on h, there exists \( h^*_1 \le h^* \) such that \( \Vert u_h\Vert _{L^\infty (\varOmega )}+ \Vert {\widehat{u}}_h\Vert _{L^\infty ({\mathcal {E}}_h)} <2{\mathcal {M}} +2 \) for all \( h \in {\mathbb {A}} \) such that \( h \le h^*_1 \). This contradicts (25), and therefore \(t^*_h = T\) for all h small enough. \(\square \)

The above lemma, the triangle inequality, and Lemma 3.3 complete the proof of Theorem 2.3.

4 Numerical Results

In this section we present some numerical results in two spatial dimensions.

Example 4.1

We begin with an example with an exact solution in order to illustrate the convergence theory. The domain is the unit square \(\varOmega = [0,1]\times [0,1]\subset {\mathbb {R}}^2\) and homogeneous Dirichlet boundary conditions are applied on the boundary. We take \(\varepsilon = 0.1\) and choose the source terms \(f_1\), \(f_2\) and the initial condition so that the exact solution is \(u = \cos (t)\sin (x)\cos (y)\) and \(\phi = \sin (t)\cos (x)\sin (y)\). The second order backward differentiation formula (BDF2) is applied for the time discretization, and for the space discretization we choose polynomial degrees \( k = 0 \) or \( k = 1 \) (used in the definition of the discrete spaces in Sect. 1). The time step is chosen to be \(\varDelta t = h\) when \(k=0\) and \(\varDelta t = h^{3/2}\) when \(k=1\). We report the errors at the final time \( T = 1 \). The observed convergence rates match our theory (Tables 1, 2).

Table 1 History of convergence for \({\varvec{q}}_h\) and \({\varvec{p}}_h\) for Example 4.1 under uniform mesh refinement
Table 2 History of convergence for \(u_h\) and \(\phi _h\) for Example 4.1 under uniform mesh refinement
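The two time step choices in Example 4.1 can both be written as \(\varDelta t = h^{(k+2)/2}\), so that the formal \(O(\varDelta t^2)\) temporal error of BDF2 is of order \(h^{k+2}\); this is our reading of the choice, not a statement from the analysis. The sketch below encodes that rule and the usual formula for observed convergence rates between successive meshes. The error values are synthetic and serve only to exercise the formula; they are not the data of Tables 1 and 2, and the function names are ours.

```python
import math


def time_step(h: float, k: int) -> float:
    """Time step dt = h^((k+2)/2), so that dt^2 = h^(k+2)."""
    return h ** ((k + 2) / 2)


def observed_rates(hs, errors):
    """Observed orders log(e_i / e_{i+1}) / log(h_i / h_{i+1})."""
    rates = []
    for i in range(len(hs) - 1):
        rates.append(math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1]))
    return rates


# Synthetic errors behaving exactly like 3*h^(k+2) with k = 1 (illustration only).
k = 1
hs = [0.5 ** j for j in range(2, 6)]
errors = [3.0 * h ** (k + 2) for h in hs]
print([time_step(h, k) for h in hs])
print(observed_rates(hs, errors))
```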

Next, we test an example for which we do not report convergence rates but which shows the performance of the HDG method. We take \(k=0\); the domain is again the unit square \(\varOmega = [0,1]\times [0,1]\subset {\mathbb {R}}^2\), partitioned into 20,000 triangles, i.e., \(h=\sqrt{2}/100\). BDF2 is applied for the time discretization with time step \(\varDelta t = 1/1000\), and at each time step we use Newton's method to solve the nonlinear system.
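The combination of BDF2 time stepping with a Newton solve at each step can be sketched on a scalar model problem. This is not the HDG solver: it assumes a scalar ODE \(y' = f(y)\), starts with one backward Euler step, and uses the hypothetical function name `bdf2_newton`.

```python
def bdf2_newton(f, dfdy, y0, dt, n_steps, tol=1e-12, max_iter=20):
    """Integrate y' = f(y) with BDF2 (n_steps >= 1); Newton solves each implicit step."""

    def newton(residual, jacobian, guess):
        y = guess
        for _ in range(max_iter):
            step = residual(y) / jacobian(y)
            y -= step
            if abs(step) < tol:
                break
        return y

    # Start-up: one backward Euler step, (y1 - y0) / dt = f(y1).
    y_prev = y0
    y_curr = newton(lambda y: y - y0 - dt * f(y),
                    lambda y: 1.0 - dt * dfdy(y),
                    y0)
    # BDF2: (3*y_new - 4*y_curr + y_prev) / (2*dt) = f(y_new).
    for _ in range(n_steps - 1):
        yc, yp = y_curr, y_prev
        y_new = newton(lambda y: 3.0 * y - 4.0 * yc + yp - 2.0 * dt * f(y),
                       lambda y: 3.0 - 2.0 * dt * dfdy(y),
                       yc)
        y_prev, y_curr = y_curr, y_new
    return y_curr


# Model problem y' = -y^2, y(0) = 1, with exact solution y(t) = 1 / (1 + t).
y_end = bdf2_newton(lambda y: -y * y, lambda y: -2.0 * y, 1.0, 1.0e-3, 1000)
print(abs(y_end - 0.5))
```

In the HDG setting the scalar residual and derivative would be replaced by the discrete nonlinear system and its Jacobian, but the structure of the time loop is the same.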

Example 4.2

This example has non-homogeneous Dirichlet data and demonstrates that our HDG scheme can handle this case. We take \(\varepsilon =10^{-2}\) and the source terms \(f_1=0\) and

$$\begin{aligned} f_2 = {\left\{ \begin{array}{ll} -\,0.8&{} (0,0.5)\times (1/2,1),\\ 0.8&{} \text {else}. \end{array}\right. } \end{aligned}$$

The Dirichlet boundary conditions are \(g_{u}=0.9, g_{\phi }=1.1\) on \(\{y=0\}\), and \(g_{u}=0.1, g_{\phi }=-1.1\) on \(\{y=1, 0\le x\le 0.25\}\). Elsewhere we impose homogeneous Neumann boundary conditions. The initial condition is \(u_0 = (1+f_2)/2\). A similar example was studied in [3] by a finite volume method. We plot the solutions \(u_h\) and \(\phi _h\) at several times T; see Figs. 1 and 2.
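The data of Example 4.2 are simple enough to write down directly. The sketch below assumes that the region \((0,0.5)\times (1/2,1)\) is open and that points on its boundary take the value 0.8; the function names `f2` and `u0` are ours.

```python
def f2(x: float, y: float) -> float:
    """Piecewise-constant source: -0.8 on (0, 0.5) x (0.5, 1), 0.8 elsewhere."""
    return -0.8 if (0.0 < x < 0.5 and 0.5 < y < 1.0) else 0.8


def u0(x: float, y: float) -> float:
    """Initial density u_0 = (1 + f_2) / 2."""
    return (1.0 + f2(x, y)) / 2.0


# Sample the initial density in the upper-left region and in the lower half.
print(u0(0.25, 0.75), u0(0.75, 0.25))
```

With these definitions \(u_0\) is approximately 0.1 in the upper-left region and 0.9 elsewhere, which agrees with the Dirichlet values \(g_u=0.1\) on \(\{y=1, 0\le x\le 0.25\}\) and \(g_u=0.9\) on \(\{y=0\}\).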

Fig. 1 Contour plots of \(u_h\) at times \(T=0.01, 0.4, 0.7, 1\) (from left to right, from top to bottom) for Example 4.2

Fig. 2 Contour plots of \(\phi _h\) at times \(T=0.01, 0.4, 0.7, 1\) (from left to right, from top to bottom) for Example 4.2

5 Conclusion

In this work, we proposed an HDG method for the drift–diffusion equation. We proved optimal semi-discrete error estimates for all variables; moreover, from the point of view of degrees of freedom, we obtained a superconvergent rate for the variable u. As far as we are aware, this is the first such result in the literature.

Clearly it would be desirable to prove convergence without the need for an inverse assumption. Equally, it would be useful to prove fully discrete estimates using, for example, BDF2 in time.

This is the first of a series of papers in which we develop efficient HDG methods for the drift–diffusion equation, including HDG methods for the case when \(\varepsilon \) approaches zero. We are also interested in the numerical solution of the steady-state drift–diffusion equation, and we will explore this problem in future papers.