13.1 Introduction

When we deal with the simulation of physical problems like transient diffusion problems, heat-conduction problems, or electromagnetic eddy current problems, the governing partial differential equations (PDEs) are often of parabolic type. Thus, the development of efficient numerical schemes for solving parabolic equations is of great importance. The standard approach to the numerical solution of parabolic PDEs uses some time-stepping method applied to the large-scale system of ordinary differential equations arising from a semi-discretization in the spatial variables, e.g., by means of the Finite Element Method (FEM); see, e.g., [42]. Another approach first discretizes the parabolic problem with respect to time by some time-stepping method, and then performs a discretization of the resulting elliptic problems in the spatial variables. This approach is sometimes also called Rothe’s method; see, e.g., [25]. There are many papers on the more recent continuous or discontinuous Galerkin or Galerkin-Petrov (cG, cGP, dG, dGP) methods based on time-slices; see, e.g., [1, 4, 6, 14, 27, 31, 32, 33, 34, 37] and the references therein. These methods are closely related to classical time-integration methods that can be solved in a time-stepping procedure. This is a sequential procedure that is not suited for parallel computing. In order to overcome this drawback on massively parallel computers, time-parallel solvers, see, e.g., [12, 15, 18, 45], or time-parallel time-integration methods like PARAREAL [28] have been developed. An excellent historical overview of 50 years of time-parallel integration methods is given in [16]. An alternative approach consists in a full space-time discretization at once by treating time just as another space variable, i.e., we solve a problem with one dimension more.
In fact, this approach to the numerical solution of transient problems in space and time simultaneously is not new, but it has now become a hot topic in connection with the availability of massively parallel computers with many thousands of cores. Besides the overview paper [16] that focuses on time-parallel integration methods, we refer to [41] that provides an overview of the latest developments in this field with focus on completely unstructured space-time methods and simultaneous space-time adaptivity that treats time just as another variable.

In this paper, we will focus on space-time finite element methods that use fully unstructured simplicial space-time meshes. The motivation behind this is that, for elliptic problems, there exist plenty of efficient and, most importantly, parallel solving methods. If we are able to derive a stable discrete bilinear form, for which we can prove coercivity (ellipticity) with respect to some mesh-dependent norm in the space-time FE-space, then we can expect that we can efficiently solve the space-time problem fully in parallel as in the elliptic case. In this way, we can overcome the curse of sequentiality of the time-stepping methods. Another reason for the space-time approach is the fact that we are not restricted to a special structure of the mesh. This means that we can apply adaptive mesh refinement in space and time simultaneously. Last but not least, we can easily deal with moving interfaces and computational domains, where the coefficients of the PDE and/or the spatial domain Ω(t) depend on the time as well. Moreover, optimization problems constrained by a parabolic initial-boundary value problem lead to optimality conditions that can very efficiently be solved by space-time methods.

As already mentioned above, these advantages of space-time methods together with the common availability of massively parallel computers have led to a revival of space-time methods. This especially concerns space-time methods that are based on completely unstructured space-time meshes produced, e.g., by simultaneous space-time adaptivity; see [41] for a review of recent publications on this topic. For instance, Steinbach introduced an inf-sup-stable Petrov-Galerkin method [39], whereas Toulopoulos used bubble functions to stabilize a space-time finite element method [43]. In the context of using Isogeometric Analysis (IgA) as space-time discretization method, Langer et al. proposed an upwind-stabilized space-time method for parabolic evolution equations [26]; see also the PhD thesis [29] by Moore. Similar stabilized space-time finite element schemes have recently been developed in [5, 10, 30]. In [5], besides the upwind-stabilization scheme, Bank, Vassilevski and Zikatanov proposed and analyzed new EAFE (edge average finite element) schemes for parabolic convection-diffusion-reaction problems, whereas Devaud and Schwab [10] introduced upwind-stabilized schemes with mesh grading in time dealing with time singularities and hp schemes leading to exponential convergence.

The main aim of this paper is to generalize the results for the stabilized space-time scheme proposed in [26], where the authors use IgA for the discretization, and the corresponding stabilized space-time FE scheme considered by Moore [30] to the case of moving interfaces, i.e., t-dependent, discontinuous diffusion coefficients, and the possibility to choose local (element-wise) upwind test functions of the form \( v_h + \theta_K h_K \partial_t v_h \) depending on the mesh-size \( h_K \) of an element K from the finite element mesh. This localization of the upwind-stabilization is very important for adaptivity that produces a family of shape regular meshes. We will use an unstructured conforming FEM to discretize the parabolic initial-boundary value problem, which we specify in the following. Let Q := Ω × (0, T) be the space-time cylinder, with \( \varOmega \subset {\mathbf{R}}^d \), \( d \in \{1, 2, 3\} \), being a sufficiently smooth and bounded spatial domain, and T > 0 being the final time. Furthermore, let Σ := ∂Ω × (0, T), \( {\overline {\varSigma }_0} := \overline {\varOmega }\times \{0\} \) and \( {\overline {\varSigma }_T} := \overline {\varOmega }\times \{T\} \) such that \( \partial Q = \varSigma \cup {\overline {\varSigma }_0}\cup {\overline {\varSigma }_T} \). Then we consider the model problem that can formally be written as follows: Given f, g, ν and \( u_0 \), find u such that (s.t.)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\partial u}{\partial t}(x,t) - \mathrm{div}_x(\nu(x,t) \nabla_x u(x,t)) &\displaystyle =&\displaystyle f(x,t), \quad (x,t)\in Q, {} \end{array} \end{aligned} $$
(13.1)
$$\displaystyle \begin{aligned} \begin{array}{rcl} u(x,t) &\displaystyle =&\displaystyle g(x,t) = 0, \quad (x,t)\in\varSigma, \end{array} \end{aligned} $$
(13.2)
$$\displaystyle \begin{aligned} \begin{array}{rcl} u(x,0) &\displaystyle =&\displaystyle u_0(x), \quad x\in\overline{\varOmega}, {} \end{array} \end{aligned} $$
(13.3)

where the diffusion coefficient (reluctivity in electromagnetics) ν is a given uniformly positive and bounded coefficient function. The dependence of ν not only on space but also on time enables us to model moving interfaces. Note that we do not require ν to be smooth. In fact, we will admit discontinuities for ν. For simplicity, we assume homogeneous Dirichlet boundary conditions.
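To make the notion of a moving interface concrete, the following minimal Python sketch defines a piecewise constant coefficient whose jump travels with constant speed. The function name `nu_moving_interface`, the interface position `x0 + speed * t`, and the two coefficient values are illustrative assumptions and are not taken from the test cases of this paper.

```python
import numpy as np

def nu_moving_interface(x, t, x0=0.5, speed=0.25, nu_left=1.0, nu_right=100.0):
    """Hypothetical coefficient for d = 1: nu jumps across the moving
    interface x = x0 + speed * t (all parameters are illustrative)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.where(x < x0 + speed * t, nu_left, nu_right)

# Sample nu on a grid in Q = (0, 1) x (0, 1) and read off bounds
# playing the role of the constants in the uniform bound on nu.
xs, ts = np.meshgrid(np.linspace(0.0, 1.0, 11), np.linspace(0.0, 1.0, 5), indexing="ij")
samples = nu_moving_interface(xs, ts)
nu_lower, nu_upper = samples.min(), samples.max()
```

Such a coefficient is discontinuous in both x and t, but it is uniformly positive and bounded, which is all that is required of ν here.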

The paper is structured in the following way: In Sect. 13.2, we provide a space-time variational formulation of the parabolic initial boundary value problem (13.1)–(13.3), and we recall some existence, uniqueness and regularity results for weak solutions in appropriate space-time Sobolev spaces. Section 13.3 is devoted to the derivation and analysis of a new locally stabilized space-time finite element scheme. Moreover, we derive a priori discretization error estimates. In Sect. 13.4, we present four typical test cases for which we have performed extensive numerical studies, and we discuss the numerical results. Section 13.5 draws some conclusions, and provides an outlook on the future work.

13.2 The Space-Time Variational Formulation

In order to derive well-posed space-time variational formulations in space-time Sobolev spaces, we follow the classical approach developed in the monograph [24] by Ladyžhenskaya, Solonnikov and Uraltseva, and in the lecture notes [23] by Ladyžhenskaya. Let us first define the proper function spaces.

Definition 13.1

Let \( L_2(Q) \) be the space of square integrable functions in the space-time cylinder Q. Then we define the following Sobolev (Hilbert) spaces

$$\displaystyle \begin{aligned} \begin{array}{rcl} H^1_0(Q) &\displaystyle =&\displaystyle W_{2,0}^1(Q) := \{ u\in L_2(Q) : \nabla u \in [L_2(Q)]^{d+1} \;\mbox{and}\; u = 0 \:\mbox{on}\; \varSigma\}, \\ H^{1,0}(Q) &\displaystyle =&\displaystyle W_2^{1,0}(Q) := \{ u\in L_2(Q) : \nabla_x u \in [L_2(Q)]^d \},\\ {H}^{1,0}_{0}(Q) &\displaystyle =&\displaystyle {W}^{1,0}_{2,0}(Q) := \{u\in H^{1,0}(Q) : u = 0 \:\mbox{on}\; \varSigma\}, \end{array} \end{aligned} $$

equipped with the usual scalar products and norms, as well as the Banach space

$$\displaystyle \begin{aligned} \begin{array}{rcl} V_2(Q) &\displaystyle :=&\displaystyle \{u\in H^{1,0}(Q) : | u |{}_{Q} < \infty\}, \end{array} \end{aligned} $$

with the subspaces

$$\displaystyle \begin{aligned} \begin{array}{rcl} {V}_{2,0}(Q) &\displaystyle :=&\displaystyle \{u\in{H}^{1,0}_{0}(Q) : | u |{}_{Q} < \infty\},\\ V^{1,0}_2(Q) &\displaystyle :=&\displaystyle \{u\in V_2(Q) : \lim\limits_{\varDelta t \rightarrow 0} \| u(\cdot,t+\varDelta t) - u(\cdot,t) \|{}_{L_2(\varOmega)} = 0 , \mbox{uniformly on } [0,T] \},\\ {V}^{1,0}_{2,0}(Q) &\displaystyle :=&\displaystyle V^{1,0}_2(Q) \cap {H}^{1,0}_{0}(Q), \end{array} \end{aligned} $$

where the norm \( | \cdot |_{Q} := | \cdot |_{Q_T} \) is defined by

$$\displaystyle \begin{aligned} | u |{}_{Q_t} := \max_{0\leq\tau\leq t}\| u(\cdot,\tau) \|{}_{L_2(\varOmega)} + \| \nabla_x u \|{}_{Q_t}, \end{aligned}$$

and \( Q_t = \varOmega \times (0, t) \) denotes a truncated space-time cylinder. Here, the appearing differential operators are defined as follows:

$$\displaystyle \begin{aligned} \nabla = (\nabla_x,\nabla_t)^T, \quad \nabla_x = (\partial_{x_1},\dots,\partial_{x_d})^T\quad \mbox{and}\quad \nabla_t = (\partial_t). \end{aligned}$$

Multiplying the PDE (13.1) by a test function \(v \in \hat {H}^1_0(Q) := \{ v\in H^1_0(Q) : v=0\mbox{ on } \varSigma _T \}\), integrating over the complete space-time domain (cylinder) Q = Ω × (0, T), integrating by parts with respect to time and space once, and incorporating the initial and boundary conditions, we immediately arrive at the following space-time variational formulation of the initial-boundary value problem (13.1)–(13.3): find a function \( u\in {H}^{1,0}_{0}(Q) \) such that

$$\displaystyle \begin{aligned} a(u,v) = l(v), \quad \forall v \in \hat{H}^1_0(Q), \end{aligned} $$
(13.4)

where the bilinear form a(⋅, ⋅) and the linear form l(⋅) are defined by the identities

$$\displaystyle \begin{aligned} a(u,v) =\int_{Q} ( -u \partial_{t} v \,+\, \nu(x,t) \nabla_x u\nabla_x v) \; \mathrm{d}x\mathrm{d}t \end{aligned}$$

and

$$\displaystyle \begin{aligned} l(v) = \int_{\varOmega} u_0(x)\,v(x,0) \; \mathrm{d}x + \int_{Q} f v \; \mathrm{d}x\mathrm{d}t, \end{aligned}$$

respectively. A solution u of the space-time variational problem (13.4) is called a generalized (weak) solution of the parabolic initial-boundary value problem (13.1)–(13.3) in the space \( {H}^{1,0}_{0}(Q) \).
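For orientation, the integration by parts behind (13.4) can be spelled out as follows. This is only a formal sketch that assumes u is smooth enough for all terms to make sense. Since the test function v vanishes on Σ and on \( \varSigma_T \), and since \( u(\cdot,0) = u_0 \) by (13.3), we have

$$\displaystyle \begin{aligned} \int_{Q} \partial_{t} u\, v \; \mathrm{d}x\mathrm{d}t &= -\int_{Q} u\, \partial_{t} v \; \mathrm{d}x\mathrm{d}t - \int_{\varOmega} u_0(x)\, v(x,0) \; \mathrm{d}x, \\ -\int_{Q} \mathrm{div}_x(\nu \nabla_x u)\, v \; \mathrm{d}x\mathrm{d}t &= \int_{Q} \nu \nabla_x u \cdot \nabla_x v \; \mathrm{d}x\mathrm{d}t. \end{aligned}$$

Adding both identities and using (13.1) on the left-hand side gives a(u, v) = l(v).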

Under the assumptions that

$$\displaystyle \begin{aligned} u_0\in L_2(\varOmega)\quad \mbox{and}\quad f\in L_{2,1}(Q):=\{ v: Q \rightarrow \mathbf{R} : \int_{0}^{T} \| v(\cdot,t) \|{}_{L_2(\varOmega)} \; \mathrm{d}t < \infty \}, \end{aligned} $$
(13.5)

and that

$$\displaystyle \begin{aligned} 0 < \underline{\nu} \leq \nu(x,t) \leq \overline{\nu},\quad \mbox{for almost all }(x,t)\in Q, \end{aligned} $$
(13.6)

with positive constants \( \underline {\nu }\) and \(\overline {\nu }\), the following theorem was proven by means of Galerkin’s method and appropriate a priori estimates in [23]:

Theorem 13.1 ([23, Chapter III, Thm. 3.1])

Under the conditions (13.5) and (13.6), the space-time variational problem (13.4) has at least one generalized (weak) solution in \( {H}^{1,0}_{0}(Q) \).

Definition 13.2 ([23, Chapter III])

A generalized solution \( u \in {H}^{1,0}_{0}(Q) \) of the space-time variational problem (13.4) is called a generalized solution in \( {V}^{1,0}_{2,0}(Q) \), if \( u\in {V}^{1,0}_{2,0}(Q) \) and if it fulfills the energy-balance equation

$$\displaystyle \begin{aligned} \frac{1}{2}\| u(\cdot,t) \|{}_{L_2(\varOmega)}^2 + \int_{Q_t} \nu(x,{\tau})| \nabla_x u |{}^2 \; \mathrm{d}x\mathrm{d}{\tau} = \frac{1}{2} \| u(\cdot,0) \|{}_{L_2(\varOmega)}^2 + \int_{Q_t} f\,u \; \mathrm{d}x\mathrm{d}{\tau} \end{aligned}$$

and the identity

$$\displaystyle \begin{aligned} \begin{array}{rcl} \int_{\varOmega} u(x,t)&\displaystyle &\displaystyle v(x,t) \; \mathrm{d}x - \int_{\varOmega} u_{0} v(x,0) \; \mathrm{d}x\\ &\displaystyle &\displaystyle + \int_{Q_t} -u\partial_{t} v + \nu\nabla_x u\nabla_x v \; \mathrm{d}x\mathrm{d}{\tau} =\int_{Q_t} f\,v \; \mathrm{d}x\mathrm{d}{\tau}, \end{array} \end{aligned} $$

for all \( v\in H^1_0(Q) \) and any t ∈ (0, T).

Theorem 13.2 ([23, Chapter III, Thm. 3.2])

If the assumptions (13.5) and (13.6) are fulfilled, then any generalized solution of the space-time variational problem (13.4) in \( {H}^{1,0}_{0}(Q) \) is the generalized solution in \( {V}^{1,0}_{2,0}(Q) \) and it is unique in \( {H}^{1,0}_{0}(Q) \).

Corollary 13.1

If the assumptions (13.5) and (13.6) hold, then there exists a unique generalized solution \( u\in {{V}^{1,0}_{2,0}(Q)} \) to the space-time variational problem (13.4).

Remark 13.1

For the case ν = 1, \( f \in L_2(Q) \) and \(u_0\in H^1_0(\varOmega )\), Ladyžhenskaya proved in [23, Chapter III, Thm. 2.1] that the generalized solution u of (13.4) belongs to the space \(H^{\varDelta ,1}_0(Q) = W^{\varDelta ,1}_{2,0}(Q) = \{v \in H^1_0(Q) : \varDelta _x v \in L_2(Q) \}\), and u continuously depends on t in the norm of the space \(H^1_0(\varOmega )\). If \( \partial\varOmega \in C^2 \), then \(u \in W^{2,1}_{2,0}(Q)\).

More regularity results can already be found in the classical monograph [24] and in the more recent references [46] and [22]. The last reference provides an overview on maximal parabolic regularity results; see also [27]. The space-time finite element scheme that we are going to derive is consistent for solutions u of (13.4) that have at least a piecewise partial time derivative \( \partial_t u \) in \( L_2 \) and fluxes \( \nu\nabla_x u \) in \( H(\mathrm{div}_x) = \{ v = (v_1,\dots,v_d) \in [L_2(Q)]^d : \mathrm{div}_x v \in L_2(Q) \} \). This is ensured in the case of maximal parabolic regularity, where \( \partial_t u \in L_2(Q) \) and \( \mathrm{div}_x(\nu\nabla_x u) \in L_2(Q) \), i.e., \( \partial_t u - \mathrm{div}_x(\nu\nabla_x u) = f \) in \( L_2(Q) \). We emphasize that we need this property only element-wise for deriving a consistent scheme.

13.3 The Space-Time Finite Element Scheme

From the previous section, we know that there exists a unique generalized solution of the initial-boundary value problem (13.1) in \( {H}^{1,0}_{0}(Q)\cap {V}^{1,0}_{2,0}(Q) \) that may have more regularity due to more regularity of the data; see Remark 13.1 and the references mentioned above. The goal of this section is to derive a consistent and stable space-time finite element scheme with a discrete (mesh-dependent) bilinear form \( a_h(\cdot,\cdot) \) that is coercive (elliptic) on the space-time finite element spaces and bounded on extended spaces with respect to appropriately chosen, mesh-dependent norms. These properties ensure existence and uniqueness of a finite element solution, and, together with appropriate interpolation or approximation error estimates, a priori discretization error estimates for sufficiently smooth solutions.

Similar to Langer et al. in [26], we use special time-upwind test functions, but in contrast to [26] the time-upwind test functions are now locally scaled by the element mesh-size in order to handle adaptivity. First, we need a regular or, at least, a shape regular triangulation \( \mathcal {T}_h \) of the space-time cylinder Q; see, e.g., [7, 9] for details. We now formally define this triangulation as \(\mathcal {T}_h := \{ K : K \subset Q, K\mbox{ open}\}\) such that \(\overline {Q} = \bigcup _{K\in \mathcal {T}_h} \overline {K}\), with K ∩ K′ = ∅ for \(K \neq K'\in \mathcal {T}_h\), and the usual conditions imposed on a regular or a shape regular triangulation are fulfilled [7, 9]. On each of these elements K, we now define individual time-upwind test functions

$$\displaystyle \begin{aligned} v_{h,K}(x,t) := v_h(x,t) + \theta_K h_K \partial_{t} v_h(x,t), \mbox{ for all } (x,t) \in K,\end{aligned} $$

where \( \theta_K \) is a local positive parameter that will be defined later, and \( h_K := \mathrm{diam}(K) \). Here, \( v_h \) is some test function from a standard conforming space-time finite element space \( V_{0h} = \{ v\in C(\overline {Q}) : v(x_K(\cdot )) \in {\mathbf {P}}_p(\hat {K}),\, \forall K \in \mathcal {T}_h,\, v=0\;\mbox{on}\; {\overline \varSigma } \cup {\overline \varSigma _0} \}\), where \( x_K(\cdot) \) is the map from the reference element \(\hat {K}\) to the finite element \(K \in \mathcal {T}_h\), and \({\mathbf {P}}_p(\hat {K})\) is the space of polynomials of degree p on the reference element \(\hat {K}\). For simplicity, throughout this paper and, in particular, in our numerical experiments in Sect. 13.4, we use affine-linear mappings \( x_K(\cdot) \) and simplicial elements. From now on, unless specified otherwise, all functions depend on both space and time variables, so we omit the arguments.
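The local time-upwind test function can be illustrated by a small Python sketch for affine-linear elements (p = 1), where the time derivative of the test function is constant on each simplex. The helper names `p1_gradient`, `diameter` and `upwind_test_function` are hypothetical, and the value of the parameter `theta_K` is simply passed in, since its choice is only fixed later in the analysis.

```python
import numpy as np
from itertools import combinations

def p1_gradient(vertices, values):
    """Constant space-time gradient (d/dx_1, ..., d/dx_d, d/dt) of the affine
    function that takes the given nodal values at the simplex vertices."""
    V = np.asarray(vertices, dtype=float)
    u = np.asarray(values, dtype=float)
    return np.linalg.solve(V[1:] - V[0], u[1:] - u[0])

def diameter(vertices):
    """h_K = diam(K): the longest edge of the simplex."""
    V = np.asarray(vertices, dtype=float)
    return max(np.linalg.norm(a - b) for a, b in combinations(V, 2))

def upwind_test_function(vertices, values, theta_K, point):
    """Evaluate v_h + theta_K * h_K * d_t v_h at a point of the element K."""
    V = np.asarray(vertices, dtype=float)
    u = np.asarray(values, dtype=float)
    grad = p1_gradient(V, u)
    v_h = u[0] + grad @ (np.asarray(point, dtype=float) - V[0])
    return v_h + theta_K * diameter(V) * grad[-1]   # last component is d_t v_h

# Usage on a triangle in the (x, t)-plane with nodal values of v_h = t
K = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
value = upwind_test_function(K, [0.0, 0.0, 1.0], theta_K=0.5, point=(1.0 / 3.0, 1.0 / 3.0))
```

Because the shift depends on K through both the parameter and the mesh-size, the resulting test function is in general discontinuous across element boundaries even though \( v_h \) itself is continuous.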

In this section, we will also use the following spaces:

$$\displaystyle \begin{aligned} \begin{array}{rcl} H^{1,1}_{0,\underline{0}}(Q) &\displaystyle :=&\displaystyle \{ u\in L_2(Q) : \nabla_x u\in [L_2(Q)]^d,\ \partial_{t} u\in L_2(Q)\mbox{ and }u|{}_{\varSigma\cup\varSigma_0}=0 \},\\ H^{2,1}_{0,\underline{0}}(\mathcal{T}_h) &\displaystyle :=&\displaystyle \{ v\in H^{1,1}_{0,\underline{0}}(Q) : v|{}_{K} \in H^{2,1}(K),\ \forall K\in\mathcal{T}_h \},\\ W_{\infty}^{1}(\mathcal{T}_h) &\displaystyle :=&\displaystyle \{ v\in L_{\infty}(Q) : v|{}_K\in W_{\infty}^{1}(K),\ \forall K\in\mathcal{T}_h \},\vspace{-2pt} \end{array} \end{aligned} $$

where \( H^{2,1}(K) := \{ v\in L_2(K) : \partial _t v, \partial _{x_i} v, \partial _{x_i}\partial _{x_j} v \in L_2(K),\ i,j = 1,\dots,d \} \). For the sake of convenience, we now consider homogeneous initial conditions, i.e., \( u_0 = 0 \) on Ω. Furthermore, we assume that \( \nu \in W_{\infty }^1(\mathcal {T}_h) \), and that the PDE has a sufficiently smooth solution u, e.g., \( u\in H^{2,1}_{0, \underline {0}}(\mathcal {T}_h) \); cf. also our discussion in Sect. 13.2. Now we first multiply the PDE (13.1) by the space-time test function \( v_{h,K} \), and then integrate over a single element K. Summing up over all elements and applying integration by parts in the principal term, we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{K\in\mathcal{T}_h}&\displaystyle &\displaystyle \int_{K} \left(\partial_{t} u v_h {+} \theta_{K} h_K\partial_{t} u \partial_{t} v_h +\nu \nabla_x u\cdot \nabla_x v_h + \theta_{K} h_K\nu\nabla_x u\cdot\nabla_x(\partial_{t} v_h)\right) \, \mathrm{d}(x,t) \\ &\displaystyle &\displaystyle - \sum_{K\in\mathcal{T}_h}\int_{\partial K}\left( \nu\nabla_x u \cdot n_{x} v_h + \theta_{K} h_K\nu\nabla_x u\cdot n_{x} \partial_{t} v_h \right) \, \mathrm{d}s_{(x,t)} = l_h(v_h)\vspace{-2pt} {} \end{array} \end{aligned} $$
(13.7)

with the linear form

$$\displaystyle \begin{aligned} l_h(v_h) := \sum_{K\in\mathcal{T}_h} \int_{K} f\,( v_h + \theta_{K} h_K\partial_{t}v_h) \; \mathrm{d}(x,t).\vspace{-2pt} {} \end{aligned} $$
(13.8)

For the exact solution u of (13.1), we know that the fluxes have to be continuous across the boundaries of the elements \(K\in \mathcal {T}_h\). This observation means that

$$\displaystyle \begin{aligned} \sum_{K\in\mathcal{T}_h}\int_{\partial K} \nu\nabla_x u \cdot n_{x} v_h \, \mathrm{d}s_{(x,t)} = 0 \end{aligned}$$

for all test functions \( v_h \in V_{0h} \). We mention that \( v_h \) is zero on Σ, and that \( n_x \) vanishes on \({\overline \varSigma _0} \cup {\overline \varSigma _T}\). Therefore, the first boundary term completely disappears from (13.7), but, in general, not the second term, since \( \theta_K h_K \) varies from element to element. We have now arrived at the consistency identity

$$\displaystyle \begin{aligned} a_h(u,v_h) = l_h(v_h), \quad \forall v_h\in V_{0h}, \end{aligned} $$
(13.9)

that holds for a sufficiently smooth solution u, e.g., \( u\in H^{2,1}_{0, \underline {0}}(\mathcal {T}_h) \), where the discrete (mesh-dependent) bilinear form \( a_h(\cdot,\cdot) \) is defined by the identity

$$\displaystyle \begin{aligned} \begin{array}{rcl} a_h(u,v_h) &\displaystyle :=&\displaystyle \sum_{K\in\mathcal{T}_h} \int_{K} \left(\partial_{t} u\,v_h + \theta_{K} h_K\,\partial_{t} u\,\partial_{t}v_h \right)\, \mathrm{d}(x,t)\\ &\displaystyle &\displaystyle + \sum_{K\in\mathcal{T}_h} \int_{K} \left(\nu \nabla_{x} u\cdot\nabla_{x}v_h + \theta_{K} h_K\,\nu\nabla_{x} u\cdot\nabla_{x}(\partial_{t}v_h) \right)\, \mathrm{d}(x,t){}\\ &\displaystyle &\displaystyle - \sum_{K\in\mathcal{T}_h} \int_{\partial K} \theta_{K} h_K\,\nu\nabla_x u\cdot n_{x}\,\partial_{t}v_h \; \mathrm{d}s_{(x,t)}, \end{array} \end{aligned} $$
(13.10)

and the linear form \( l_h(\cdot) \) is defined by (13.8), with given \( \nu \in W_{\infty }^{1}(\mathcal {T}_h) \) and \( f \in L_2(Q) \).

Remark 13.2

We can derive a scheme that is equivalent to (13.10). In particular, instead of applying integration by parts on both principal terms, we only apply it to the first principal term and keep the second. Hence, we obtain another consistency identity

$$\displaystyle \begin{aligned} \tilde{a}_h(u,v_h) = l_h(v_h), \quad \forall v_h\in V_{0h}, \end{aligned}$$

that holds for a solution u of (13.4) that only belongs to \(H^{L,1}_{0, \underline 0}(\mathcal {T}_h) := \{ u\in L_2(Q): (\partial _tu)|{ }_K\in L_2(K), (\nu \nabla _x u)|{ }_K\in H(\mathrm {div}_x,K),\ u = 0 \mbox{ on } K \cap (\varSigma \cup \varSigma _0) \ \forall K\in \mathcal T_h \}\), where

$$\displaystyle \begin{aligned} \begin{array}{rcl} \tilde{a}_h(u,v_h) &\displaystyle :=&\displaystyle \sum_{K\in\mathcal{T}_h} \int_{K} \left(\partial_{t} u\,v_h + \theta_{K} h_K\,\partial_{t} u\,\partial_{t}v_h\right)\, \mathrm{d}(x,t) \\ &\displaystyle &\displaystyle + \sum_{K\in\mathcal{T}_h} \int_{K} \left(\nu\nabla_x u\cdot\nabla_xv_h - \theta_{K} h_K\,\mathrm{div}_x(\nu\nabla_x u)\partial_{t}v_h \right)\, \mathrm{d}(x,t) \\ {} \end{array} \end{aligned} $$
(13.11)

with given \( \nu \in W_{\infty }^{1}(\mathcal {T}_h) \) and \( f \in L_2(Q) \), and \( l_h \) as in (13.8). We mention that \(u \in H^{L,1}_{0, \underline 0}(\mathcal {T}_h)\) is ensured in the case of maximal parabolic regularity, where u belongs to \( H^{L,1}(Q) := \{ v \in H^1(Q) : Lv := \mathrm{div}_x(\nu\nabla_x v) \in L_2(Q) \} \).

Remark 13.3

If the test functions \( v_h \in V_{0h} \) are continuous and piecewise linear (p = 1), then the term in (13.10) containing \( \nabla_x(\partial_t v_h) \) vanishes in all elements \( K\in \mathcal {T}_h \), since it only contains mixed second order derivatives of the test functions.
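As an illustration of Remarks 13.2 and 13.3, the following Python sketch assembles the element matrix of the stabilized bilinear form for d = 1 and p = 1 on a single triangle in the (x, t)-plane. It relies on two assumptions that are not made in the text: ν is taken to be constant on K, so that the term with \( \mathrm{div}_x(\nu\nabla_x u_h) \) in (13.11) vanishes for piecewise linear functions, and the value of `theta_K` is supplied by the caller. The function name `element_matrix_p1` is hypothetical.

```python
import numpy as np

def element_matrix_p1(vertices, nu_K, theta_K):
    """Element matrix A[i, j] = a_K(phi_j, phi_i) for P1 shape functions on a
    triangle K in the (x, t)-plane, assuming nu = nu_K is constant on K:
        a_K(u, v) = int_K ( d_t u * v + theta_K*h_K * d_t u * d_t v
                            + nu_K * d_x u * d_x v ) d(x,t).
    """
    V = np.asarray(vertices, dtype=float)        # shape (3, 2), columns (x, t)
    E = V[1:] - V[0]                             # edge vectors from vertex 0
    area = 0.5 * abs(np.linalg.det(E))
    G = np.empty((3, 2))                         # constant gradients of phi_0..phi_2
    G[1:] = np.linalg.inv(E).T
    G[0] = -G[1] - G[2]
    h_K = max(np.linalg.norm(V[i] - V[j]) for i in range(3) for j in range(i))
    dx, dt = G[:, 0], G[:, 1]
    mean = np.full(3, 1.0 / 3.0)                 # int_K phi_i = area / 3
    return area * (np.outer(mean, dt)
                   + theta_K * h_K * np.outer(dt, dt)
                   + nu_K * np.outer(dx, dx))

A = element_matrix_p1([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], nu_K=2.0, theta_K=0.1)
```

The matrix is nonsymmetric because of the convective term with the time derivative, and constants lie in its kernel, since all derivatives of a constant function vanish.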

Now we look for a Galerkin approximation \( u_h \in V_{0h} \) to the generalized solution u of the initial-boundary value problem (13.1)–(13.3) using the variational identity (13.9), i.e., find \( u_h \in V_{0h} \) such that

$$\displaystyle \begin{aligned} a_h(u_{h},v_h) = l_h(v_h), \quad \forall v_h\in V_{0h},\end{aligned} $$
(13.12)

with \( a_h(\cdot,\cdot) \) and \( l_h(\cdot) \) as defined above by (13.10) and (13.8), respectively. In Sect. 13.2, we already showed existence and uniqueness of a weak solution to the initial-boundary value problem (13.1)–(13.3). However, our finite element scheme (13.12) is based on a mesh-dependent bilinear form \( a_h(\cdot,\cdot) \). Thus, we have to investigate the stability of the space-time finite element scheme. More precisely, we will show ellipticity of the bilinear form \( a_h(\cdot,\cdot) : V_{0h} \times V_{0h} \rightarrow \mathbf{R} \) w.r.t. the mesh-dependent norm

$$\displaystyle \begin{aligned} \| v_h \|{}_h^2 := \sum_{K\in\mathcal{T}_h} \big[\| \nu^{1/2}\nabla_x v_h \|{}_{L_2(K)}^2 + \theta_K h_K \| \partial_{t} v_h \|{}_{L_2(K)}^2 \big] + \frac{1}{2} \| v_h \|{}_{L_2(\varSigma_T)}^2 .\end{aligned} $$
(13.13)

This implies existence and uniqueness of the finite element solution \( u_h \in V_{0h} \) of (13.12). For the following derivations, we assume that our triangulation \( \mathcal {T}_h \) of Q is shape regular such that local approximation error estimates are available [7, 9]. A shape regular triangulation \( \mathcal {T}_h \) of Q is called quasi-uniform, if there exists a constant \( c_u \) such that

$$\displaystyle \begin{aligned} h_K \leq h \leq c_u h_K, \quad \mbox{for all } K\in\mathcal{T}_h, \end{aligned}$$

where \( h = \max _{K\in \mathcal {T}_h}h_K \). Moreover, we introduce localized bounds for our coefficient function ν, i.e.,

$$\displaystyle \begin{aligned} \underline{\nu}_K \leq \nu(x,t) \leq \overline{\nu}_K,\quad \mbox{for almost all }(x,t)\in K\mbox{ and for all }K\in\mathcal{T}_h,\end{aligned} $$
(13.14)

where \( \underline {\nu }_K \ge \underline {\nu }\) and \(\overline {\nu }_K \le \overline {\nu } \) are positive constants on every \(K \in \mathcal {T}_h\). In the following, we need some inverse inequalities for functions from finite element spaces.

Lemma 13.1

There exist generic positive constants \( c_{I,1} \) and \( c_{I,2} \) such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v_h \|{}_{L_2(\partial K)} \leq c_{I,1} h_K^{-1/2} \| v_h \|{}_{L_2(K)},{} \end{array} \end{aligned} $$
(13.15)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \| \nabla v_h \|{}_{L_2(K)} \leq c_{I,2} h_K^{-1} \| v_h \|{}_{L_2(K)} {} \end{array} \end{aligned} $$
(13.16)

for all \( v_h \in V_{0h} \) and for all \( K\in \mathcal {T}_h \).

Proof

For (13.15), see, e.g., [11, 35], and for (13.16), see, e.g., [7, 9, 11]. □

From \( \nabla = (\nabla_x, \partial_t)^T \) and (13.16), we can immediately deduce

$$\displaystyle \begin{aligned} \| \partial_{t}v_h \|{}_{L_2(K)} \leq c_{I,2} h_K^{-1}\| v_h \|{}_{L_2(K)}.\vspace{-2pt} {} \end{aligned} $$
(13.17)

The above inequalities hold for the standard \( L_2 \)-norms. However, we will also need such a result in some scaled norm.

Lemma 13.2

Let \( \nu \in W_{\infty }^{1}(\mathcal {T}_h) \) be a given, uniformly positive function. Then

$$\displaystyle \begin{aligned} \| v \|{}_{L_2^{\nu}(K)}^2 = \int_{K} \nu(x,t)\,| v(x,t) |{}^2 \; \mathrm{d}(x,t) \end{aligned}$$

is a norm, and the inverse estimate

$$\displaystyle \begin{aligned} \| \partial_{t}v_h \|{}_{L_2^{\nu}(K)}\leq\| \nabla v_h \|{}_{L_2^{\nu}(K)}\leq c_{I,\nu}h_K^{-1} \| v_h \|{}_{L_2^{\nu}(K)} \end{aligned}$$

holds for all \( v_h \in V_{0h} \) and for all \( K\in \mathcal {T}_h \), with \( c_{I,\nu } := (\overline {\nu }_K/ \underline {\nu }_K)^{1/2} c_{I,2} \).

Proof

See [36]. □

We note that, in practical applications, the ratio \( \overline {\nu }_K / \underline {\nu }_K \geq 1 \) is usually close to 1. Below, we will also need the estimate

$$\displaystyle \begin{aligned} \| \partial_{t}\partial_{x_i} v_h \|{}_{L_2^{\nu}(K)} \leq c_{I,\nu} h_K^{-1}\| \partial_{x_i} v_h \|{}_{L_2^{\nu}(K)},\vspace{-2pt} {} \end{aligned} $$
(13.18)

which obviously holds for all \( v_h \in V_{0h} \) and for all \( K\in \mathcal {T}_h \). Moreover, we need the following inverse inequality.

Lemma 13.3

Let \( \nu \in W_{\infty }^{1}(\mathcal {T}_h) \) be a given uniformly positive function. Let \( W_h|_K := \{ w_h : w_h = \nabla_x v_h,\ v_h \in V_{0h}|_K \} \). Then the inverse estimate

$$\displaystyle \begin{aligned} \| \mathrm{div}_x(\nu w_h) \|{}_{L_2(K)} \leq c_{I,3} h_K^{-1} \| \nu w_h \|{}_{L_2(K)}, \forall w_h\in W_h|{}_{K}\end{aligned} $$
(13.19)

holds, where \( c_{I,3} \) is a positive constant that is independent of \( h_K \).

Proof

First, we know that \( V_{0h}|_K \) is a finite-dimensional space spanned by the local shape functions \( \{ p^{(i)} \}_{i\in \overline \omega _{K}} \), where \( \overline {\omega }_K \) is the index set of local degrees of freedom. Hence, the space \( W_h|_K \) is also finite-dimensional and spanned by the generating system \( \{ \nabla _xp^{(i)} \}_{i\in \overline \omega _{K}} \). Moreover, for a fixed ν, each product \( z_h := \nu w_h \) can be represented as a not necessarily unique linear combination of the functions \( \{ \nu \,\nabla _xp^{(i)} \}_{i\in \overline \omega _{K}} \) on K. We denote the space spanned by these functions by \( Z_h(K):=\mathrm {span}_{i\in \overline \omega _{K}}\{ \nu \,\nabla _xp^{(i)} \} \). Using Cauchy’s inequality, we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| \mathrm{div}_x z_h \|{}_{L_2(K)}^2 &\displaystyle =&\displaystyle \int_{K} | \mathrm{div}_x z_h |{}^2 \; \mathrm{d}(x,t) = \int_{K} | \sum_{i=1}^{d}\partial_{x_i} z_{h,i} |{}^2 \; \mathrm{d}(x,t) \\ &\displaystyle \leq&\displaystyle d \int_{K} \sum_{i=1}^{d}| \partial_{x_i} z_{h,i} |{}^2 \; \mathrm{d}(x,t) = d\sum_{i=1}^{d}\| \partial_{x_i} z_{h,i} \|{}_{L_2(K)}^2, \end{array} \end{aligned} $$

for all \( z_h \in Z_h(K) \). Now, by a simple scaling argument, we can estimate each term in the sum, and obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} d\sum_{i=1}^{d}\| \partial_{x_i} z_{h,i} \|{}_{L_2(K)}^2 &\displaystyle \leq&\displaystyle d\sum_{i=1}^{d}C^2h_K^{-2}\| z_{h,i} \|{}_{L_2(K)}^2\\ &\displaystyle =&\displaystyle d C^2 h_K^{-2} \| z_h \|{}_{L_2(K)}^2,\vspace{-2pt} \end{array} \end{aligned} $$

where C is a positive constant that is independent of \( h_K \). Taking the square root and setting \( c_{I,3} := C\sqrt {d} \) closes the proof. □

Lemma 13.3 shows how the two norms in (13.19) scale w.r.t. the mesh-size \( h_K \). However, the estimate (13.19) is not sharp w.r.t. the constant.

Lemma 13.4

Let the assumptions of Lemma 13.3 hold. Then

$$\displaystyle \begin{aligned} \| \mathrm{div}_x(\nu w_h) \|{}_{L_2(K)} \leq c_{opt} \| \nu w_h \|{}_{L_2(K)}, \forall w_h\in W_h|{}_K, \end{aligned}$$

with \( c_{opt}^2 = \sup _{0\neq z_h\in Z_h(K)} \frac {\| \mathrm {div}_x(z_h) \|{ }_{L_2(K)}^2}{\| z_h \|{ }_{L_2(K)}^2} \le C^2 d h_K^{-2}\).

Proof

See [36]. □

Remark 13.4

We note that the constant \( c_{opt} \) in Lemma 13.4 is not only optimal, but also computable. If \( z_h \in Z_h(K) \), then, by definition, we have the representation

$$\displaystyle \begin{aligned} z_h(x,t) = \sum_{j\in\tilde{\omega}_{K}} \tilde{z}_j \tilde{q}^{(j)},\end{aligned} $$
(13.20)

where \( \{\tilde {q}^{(j)}\}_{j\in \tilde {\omega }_K} \) forms some basis of \( Z_h(K) \). Once some basis is chosen, we can rewrite

$$\displaystyle \begin{aligned} \| z_h \|{}_{L_2(K)}^2 = (z_h,z_h)_{L_2(K)}\quad \mbox{and}\quad \| \mathrm{div}_x z_h \|{}_{L_2(K)}^2 = \underbrace{(\mathrm{div}_x z_h,\mathrm{div}_x z_h)_{L_2(K)}}_{=:b(z_h,z_h)}\end{aligned}$$

in the form

$$\displaystyle \begin{aligned} (y_h,z_h)_{L_2(K)} = (M_h\underline{y},\underline{z}) \quad \mbox{and}\quad b(y_h,z_h) = (B_h \underline{y},\underline{z}),\end{aligned} $$

with the element mass matrix \(M_h = (M_{ij} = (\tilde {q}^{(j)},\tilde {q}^{(i)})_{L_2(K)})_{i,j \in \tilde {\omega }_K}\) and the element \( \mathrm{div}_x \)-stiffness matrix \(B_h = (B_{ij} = b(\tilde {q}^{(j)},\tilde {q}^{(i)}))_{i,j \in \tilde {\omega }_K}\), respectively. Here, the vectors \( \underline {y} \) and \( \underline {z} \) are the vectors of coefficients in the representation (13.20) w.r.t. the chosen basis \( \{\tilde {q}^{(j)}\}_{j\in \tilde {\omega }_K} \). Using this matrix representation, we immediately get

$$\displaystyle \begin{aligned} c_{opt}^2 = \sup_{0\neq z_h\in Z_h(K)} \frac{\| \mathrm{div}_x(z_h) \|{}_{L_2(K)}^2}{\| z_h \|{}_{L_2(K)}^2} = \sup_{\underline{0}\neq\underline{z}\in{\mathbf{R}}^{N_{K}=| \tilde{\omega}_K |}} \frac{(B_h \underline{z},\underline{z})_{\ell_2} }{(M_h \underline{z},\underline{z})_{\ell_2}}. \end{aligned}$$

Hence, \( c_{opt}^2 \) is the largest eigenvalue of the generalized eigenvalue problem

$$\displaystyle \begin{aligned} B_h\underline{z} = \lambda M_h\underline{z}, \end{aligned}$$

that can easily be computed.
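A minimal sketch of this computation in Python is given below. It reduces the generalized eigenvalue problem to a standard symmetric one via a Cholesky factorization of the mass matrix. The function names and the toy data are assumptions made for illustration only: `toy_matrices` uses the basis {1, x} on the interval (0, h), i.e., a purely spatial one-dimensional stand-in for \( Z_h(K) \), not an actual space-time element of this paper.

```python
import numpy as np

def c_opt(M, B):
    """Square root of the largest eigenvalue of B z = lambda M z,
    for symmetric positive definite M and symmetric positive semidefinite B."""
    L = np.linalg.cholesky(M)                 # M = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ B @ L_inv.T                   # same eigenvalues as the pencil (B, M)
    lam_max = np.linalg.eigvalsh(C)[-1]       # eigvalsh returns ascending eigenvalues
    return np.sqrt(max(lam_max, 0.0))

def toy_matrices(h):
    """Toy data: basis {1, x} on (0, h); M is the L2 Gram matrix and
    B the Gram matrix of the derivatives {0, 1}."""
    M = np.array([[h, h**2 / 2.0], [h**2 / 2.0, h**3 / 3.0]])
    B = np.array([[0.0, 0.0], [0.0, h]])
    return M, B

c_coarse = c_opt(*toy_matrices(0.5))
c_fine = c_opt(*toy_matrices(0.25))
```

In this toy setting the largest eigenvalue is \( 12/h^2 \), so halving h doubles the computed constant, in line with the \( h_K^{-1} \) scaling of Lemma 13.3.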

Now, we are in a position to prove the following coercivity lemma that is crucial for our approach.

Lemma 13.5

There exists a positive constant \( \mu_c \) such that

$$\displaystyle \begin{aligned} a_h(v_h,v_h) \geq \mu_c \| v_h \|{}_h^2, \quad \forall v_h\in V_{0h}, \end{aligned}$$

with \( \mu _c = \min _{K\in \mathcal {T}_h} \big \{ 1-c_{I,3}\sqrt {\frac {\overline {\nu }_{K}\theta _K}{4h_K}} \big \} \geq \frac {1}{2}\) provided that \( \theta _K \leq \frac {h_K}{c_{I,3}^2 \overline {\nu }_{K}} \) . For instance, \( \theta _K = \frac {h_K}{c_{I,3}^2 \overline {\nu }_{K}} \) yields \( \mu _c = \frac {1}{2} \).
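The parameter choice in Lemma 13.5 is easy to evaluate element by element. The following Python sketch is only an illustration under assumed inputs: the arrays of mesh-sizes and local upper bounds of ν, the value of the constant from Lemma 13.3, and the `safety` factor (a hypothetical scaling with values in (0, 1]) are all made up for the example.

```python
import numpy as np

def theta_and_mu_c(h, nu_bar, c_I3, safety=1.0):
    """Return theta_K = safety * h_K / (c_I3^2 * nu_bar_K) for every element and
    mu_c = min_K ( 1 - c_I3 * sqrt(nu_bar_K * theta_K / (4 * h_K)) )."""
    h = np.asarray(h, dtype=float)
    nu_bar = np.asarray(nu_bar, dtype=float)
    theta = safety * h / (c_I3**2 * nu_bar)
    mu_c = np.min(1.0 - c_I3 * np.sqrt(nu_bar * theta / (4.0 * h)))
    return theta, mu_c

# Two elements with different mesh-sizes and coefficient bounds
theta, mu_c = theta_and_mu_c(h=[0.1, 0.2], nu_bar=[1.0, 4.0], c_I3=2.0)
theta_small, mu_c_small = theta_and_mu_c(h=[0.1, 0.2], nu_bar=[1.0, 4.0], c_I3=2.0, safety=0.25)
```

With `safety=1.0` the sketch reproduces the limiting case \( \mu_c = 1/2 \) of the lemma, while a smaller parameter increases the coercivity constant at the price of a weaker weight on the time derivative in the norm (13.13).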

Proof

Integration by parts in the last term of (13.10) yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} a_h(v_h,v_h) &\displaystyle =&\displaystyle \sum_{K\in\mathcal{T}_h} \left[\int_{K} \frac{1}{2}\partial_{t}(v_h^2) \; \mathrm{d}(x,t) + \theta_{K} h_K \| \partial_{t}v_h \|{}_{L_2(K)}^2 {+} \int_{K} \nu| \nabla_xv_h |{}^2 \; \mathrm{d}(x,t)\right. \\ &\displaystyle &\displaystyle \left. - \int_{K} \theta_{K} h_K\,\mathrm{div}_x(\nu\nabla_xv_h)\partial_{t}v_h \; \mathrm{d}(x,t)\right]. \end{array} \end{aligned} $$

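Now using Gauss’ theorem and the facts that \( v_h \) is continuous across the element boundaries and that \( n_t = 0 \) on \( \varSigma \), we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} a_h(v_h,v_h) &\displaystyle =&\displaystyle \frac{1}{2}\| v_h \|{}_{L_2(\varSigma_T)}^2 - \frac{1}{2}\| v_h \|{}_{L_2(\varSigma_0)}^2 + \sum_{K\in\mathcal{T}_h} \big[\theta_{K} h_K \| \partial_{t}v_h \|{}_{L_2(K)}^2 + \| \nabla_xv_h \|{}_{L_2^{\nu}(K)}^2 \\ &\displaystyle &\displaystyle \qquad - \theta_{K} h_K \int_{K} \mathrm{div}_x(\nu\nabla_xv_h)\partial_{t}v_h \; \mathrm{d}(x,t)\big]. \end{array} \end{aligned} $$

Apart from the \( \varSigma_0 \)-term and the last term, all terms already appear in the definition of our mesh-dependent norm (13.13). It remains to estimate the last term. Using the Cauchy-Schwarz inequality, Lemma 13.3, and a scaled Young’s inequality, we arrive at the estimate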

$$\displaystyle \begin{aligned} \begin{array}{rcl} | \theta_{K} h_K\,\int_{K} &\displaystyle \mathrm{div}_x(\nu\nabla_xv_h)&\displaystyle \partial_{t}v_h \; \mathrm{d}(x,t) | \\ &\displaystyle &\displaystyle \leq c_{I,3}\big(\frac{\varepsilon\overline{\nu}_{K}\theta_K}{2h_K}\| \nabla_xv_h \|{}_{L_2^{\nu}(K)}^2+\frac{1}{2\varepsilon}\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big). \end{array} \end{aligned} $$

This estimate and the fact that \( v_h = 0 \) on \( \varSigma_0 \) immediately yield the estimate

$$\displaystyle \begin{aligned} \begin{array}{rcl} a_h(v_h,v_h) \geq \frac{1}{2}\| v_h \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h}&\displaystyle &\displaystyle \big[\big(1-\frac{c_{I,3}}{2\varepsilon}\big)\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2 \\ &\displaystyle &\displaystyle + \big(1-\varepsilon\frac{c_{I,3}\overline{\nu}_{K}\theta_K}{2h_K}\big)\| \nabla_xv_h \|{}_{L_2^{\nu}(K)}^2\big]. \end{array} \end{aligned} $$

We now choose \( \varepsilon = \sqrt {h_K/(\theta _K\overline {\nu }_{K})} \) and obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} a_h(v_h,v_h) &\displaystyle \geq &\displaystyle \min_{K\in\mathcal{T}_h} \left(1-c_{I,3}\sqrt{\frac{\theta_K\,\overline{\nu}_{K}}{4h_K}}\right)\\ &\displaystyle &\displaystyle \qquad \times\bigg(\sum_{K\in\mathcal{T}_h}\big[\| \nabla_xv_h \|{}_{L_2^{\nu}(K)}^2 + \theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big] + \frac{1}{2}\| v_h \|{}_{L_2(\varSigma_T)}^2\bigg)\\ &\displaystyle \geq&\displaystyle \mu_c \| v_h \|{}_h^2, \end{array} \end{aligned} $$

which concludes the proof. □
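The dependence of the coercivity constant on \( \theta_K \) can be made concrete with a few lines of code. The sketch below simply evaluates the formulas of Lemma 13.5; the function names, the value of \( c_{I,3} \) and the element data are hypothetical and serve only as an illustration.

```python
import math

def theta_upper_bound(h_K, nu_bar_K, c_I3):
    """Largest theta_K admitted by Lemma 13.5: h_K / (c_I3^2 * nu_bar_K)."""
    return h_K / (c_I3 ** 2 * nu_bar_K)

def coercivity_constant(elements, c_I3):
    """mu_c = min over K of 1 - c_I3 * sqrt(nu_bar_K * theta_K / (4 h_K)).

    elements: iterable of tuples (h_K, nu_bar_K, theta_K).
    """
    return min(
        1.0 - c_I3 * math.sqrt(nu_bar * theta / (4.0 * h))
        for h, nu_bar, theta in elements
    )

c_I3 = 3.0                              # hypothetical inverse-inequality constant
mesh = [(0.1, 2.0), (0.05, 1.0)]        # hypothetical (h_K, nu_bar_K) pairs

# theta_K at the upper bound gives mu_c = 1/2 ...
at_bound = [(h, nu, theta_upper_bound(h, nu, c_I3)) for h, nu in mesh]
mu_c_at_bound = coercivity_constant(at_bound, c_I3)

# ... and a quarter of the bound gives mu_c = 3/4.
quarter = [(h, nu, 0.25 * theta_upper_bound(h, nu, c_I3)) for h, nu in mesh]
mu_c_quarter = coercivity_constant(quarter, c_I3)
```

Smaller values of \( \theta_K \) move \( \mu_c \) closer to 1, in agreement with the square-root dependence in the lemma.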

Remark 13.5

The above proof holds for any polynomial degree p ≥ 1 and any fixed, uniformly positive \( \nu \in L_{\infty}(Q) \). However, for the special case p = 1 and \( \nu|_K = \mathrm{const} \), the proof is trivial since \( \partial_t(\nabla_x v_h) \equiv 0 \) and \( \nu|_K \varDelta_x v_h \equiv 0 \). Hence, the identity

$$\displaystyle \begin{aligned} a_h(v_h,v_h) = \sum_{K\in\mathcal{T}_h} \frac{1}{2} \int_{\partial K} v_h^2\,n_{t} \; \mathrm{d}s_{(x,t)} + \theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2 + \| \nu^{1/2} \nabla_x v_h \|{}_{L_2(K)}^2 = \| v_h \|{}_h^2 \end{aligned}$$

holds, i.e., \( \mu_c = 1 \). Thus, for this special case, the choice of \( \theta_K \) has no influence on the coercivity (ellipticity) of the space-time finite element method.

Lemma 13.5 already ensures uniqueness of the finite element solution \( u_h \in V_{0h} \). Furthermore, the space \( V_{0h} \) is finite-dimensional. Hence, uniqueness implies existence of the finite element solution \( u_h \in V_{0h} \) of (13.9). For the special case of uniform meshes and uniform θ, i.e., \( h_K = h \) and \( \theta_K = \theta \) for all \( K\in \mathcal {T}_h \), and ν ≡ 1, ellipticity with a mesh-independent constant was proved by Langer, Moore and Neumüller in [26] and by Moore in [29]. For a second special case, where \( \theta_K \) vanishes, i.e., \( \theta_K = \theta = 0 \) for all \( K\in \mathcal {T}_h \), Steinbach has shown existence and uniqueness of solutions to both the continuous and the discrete version of (13.9) on the basis of the Banach-Nečas-Babuška theorem in [39]. In addition, both papers also include a priori discretization error estimates, where Steinbach’s estimate is based on a discrete \( \inf \)-\(\sup \) condition.

To derive an a priori error estimate w.r.t. the mesh-dependent norm (13.13), we need to show that our bilinear form \( a_h(\cdot,\cdot) \) is uniformly bounded on \( V_{0h,*} \times V_{0h} \), where \( V_{0h,*} = H^{1,0}_0(Q)\cap H^{2}(\mathcal {T}_h) + V_{0h} \) is equipped with the norm

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v \|{}_{h,*}^2 &\displaystyle =&\displaystyle \frac{1}{2}\| v \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[ \theta_{K} h_K\| \partial_{t} v \|{}_{L_2(K)}^2 + \| \nabla_x v \|{}_{L_2^{\nu}(K)}^2 \\ &\displaystyle &\displaystyle \qquad \qquad \qquad + (\theta_{K} h_K)^{-1}\| v \|{}_{L_2(K)}^2 + \theta_{K} h_K | v |{}_{H^{2}(K)}^2 \big]{} \end{array} \end{aligned} $$
(13.21)

Moreover, we will make use of the following scaled trace inequality.

Lemma 13.6

There exists a positive constant \( c_{Tr} \) such that

$$\displaystyle \begin{aligned} \| v \|{}_{L_2(\partial K)}^2 \leq 2c_{Tr}^2 h_K^{-1} \big(\| v \|{}_{L_2(K)}^2 + h_K^2 \| \nabla v \|{}_{L_2(K)}^2 \big) {} \end{aligned} $$
(13.22)

for all \( v \in H^{1}(K) \) and for all \(K\in \mathcal {T}_h \).

Proof

See, e.g., [35]. □

Lemma 13.7

The bilinear form \( a_h(\cdot,\cdot) \) is uniformly bounded on \( V_{0h,*} \times V_{0h} \), i.e.,

$$\displaystyle \begin{aligned} | a_h(u,v_h) | \leq \mu_b \| u \|{}_{h,*}\,\| v_h \|{}_h, \quad \forall \; u \in V_{0h,*}, v_h \in V_{0h}, \end{aligned}$$

where \( \mu _b = \max _{K\in \mathcal {T}_h}\big \{2(1+\theta _{K} h_K^{-1}c_{Tr}^2 \overline {\nu }_K^2 \underline {\nu }_K^{-1}), 2c_{Tr}^2\overline {\nu }_K^2,2+c_{I,1}^2, 1+(c_{I,\nu }\theta _K)^2 \big \}^{1/2} \) that is uniformly bounded provided that \( \theta _K = \mathcal {O}(h_K) \).

Proof

We will estimate the bilinear form (13.10) term by term. Since \( V_{0h}\subset H^{1,1}_{0, \underline {0}}(Q) \), we can apply integration by parts and the Cauchy-Schwarz inequality to the first term, and obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left| \sum_{K\in\mathcal{T}_h}\int_{K} \partial_{t} u v_h \; \mathrm{d}(x,t) \right| &\displaystyle \leq&\displaystyle \sum_{K\in\mathcal{T}_h}\big[\big((\theta_{K} h_K)^{-1}\| u \|{}_{L_2(K)}^2\big)^{1/2}\big(\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big)^{1/2}\big] \\ &\displaystyle &\displaystyle \quad + \big(\| u \|{}_{L_2(\varSigma_T)}^2\big)^{1/2}\big(\| v_h \|{}_{L_2(\varSigma_T)}^2\big)^{1/2}. \end{array} \end{aligned} $$

For the second and third term, applying the Cauchy-Schwarz inequality to each term of the sum, we immediately get the estimates

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left| \theta_{K} h_K\int_{K} \partial_{t} u\partial_{t}v_h \; \mathrm{d}(x,t)\right| &\displaystyle \leq&\displaystyle \big(\theta_{K} h_K\| \partial_{t} u \|{}_{L_2(K)}^2\big)^{1/2}\big(\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big)^{1/2},\\ \left|\int_{K} \nu\nabla_x u\nabla_xv_h \; \mathrm{d}(x,t)\right| &\displaystyle \leq&\displaystyle \big(\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2\big)^{1/2}\big(\| \nabla_x v_h \|{}_{L_2^{\nu}(K)}^2\big)^{1/2}, \end{array} \end{aligned} $$

respectively. For the fourth term, we again use the Cauchy-Schwarz inequality together with the inverse estimate (13.18), and obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \left|\theta_{K}h_{K}\int_{K} \nu \nabla_x u\nabla_x(\partial_{t}v_h) \; \mathrm{d}(x,t) \right| \\ &\displaystyle \leq&\displaystyle \left(\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2\right)^{1/2}\left((\theta_{K} h_K)^2 \sum_{i=1}^{d}c_{I,\nu}^2h_K^{-2}\| \partial_{x_i}v_h \|{}_{L_2^{\nu}(K)}^2\right)^{1/2}\\ &\displaystyle =&\displaystyle \big(\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2\big)^{1/2}\big((c_{I,\nu}\theta_K)^2 \| \nabla_xv_h \|{}_{L_2^{\nu}(K)}^2\big)^{1/2}. \end{array} \end{aligned} $$

For the last term, we apply the Cauchy-Schwarz inequality and the trace inequalities (13.15) and (13.22), and get

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \left|\theta_{K}h_{K}\int_{\partial K} \nu\nabla_x u \cdot n_{x} \partial_{t}v_h \; \mathrm{d}s_{(x,t)}\right| \\ &\displaystyle \leq&\displaystyle \big(2\theta_K\overline{\nu}_{K}^2c_{Tr}^2h_K^{-1} \big[\| \nabla_x u \|{}_{L_2(K)}^2 + h_K^2 \sum_{i=1}^{d}\| \nabla\partial_{x_i} u \|{}_{L_2(K)}^2\big]\big)^{1/2} \big(\theta_{K} h_Kc_{I,1}^2 \| \partial_{t}v_h \|{}_{L_2(K)}^2\big)^{1/2}\\ &\displaystyle \leq&\displaystyle \left(2\theta_Kc_{Tr}^2\frac{\overline{\nu}_K^2}{\underline{\nu}_K}h_K^{-1}\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2 + 2c_{Tr}^2 \overline{\nu}_K^2\theta_{K} h_K| u |{}^2_{H^{2}(K)}\right)^{1/2} \left(c_{I,1}^2\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\right)^{1/2}. \end{array} \end{aligned} $$

Now combining the above estimates, applying Cauchy’s inequality and gathering all similar items, we finally arrive at the estimate

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle | a_h(u,&\displaystyle v_h) | \\ &\displaystyle \leq&\displaystyle \bigg(\| u \|{}_{L_2(\varSigma_T)}^2 {+} \sum_{K\in\mathcal{T}_h} \big[\theta_{K} h_K\| \partial_{t} u \|{}_{L_2(K)}^2 +2(1+\theta_Kc_{Tr}^2\frac{\overline{\nu}_K^2}{\underline{\nu}_K}h_K^{-1})\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2 \\ &\displaystyle &\displaystyle \qquad + (\theta_{K} h_K)^{-1} \| u \|{}_{L_2(K)}^2 + 2c_{Tr}^2 \overline{\nu}_K^2\theta_{K} h_K| u |{}^2_{H^{2}(K)} \big]\bigg)^{1/2}\\ &\displaystyle &\displaystyle \quad \times \bigg(\| v_h \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[ (2+c_{I,1}^2)\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2 \\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad +\big(1+(c_{I,\nu}\theta_K)^2\big)\| \nabla_x v_h \|{}_{L_2^{\nu}(K)}^2 \big]\bigg)^{1/2} \\ &\displaystyle \leq&\displaystyle \mu_b\| u \|{}_{h,*}\| v_h \|{}_h, \end{array} \end{aligned} $$

with \( \mu _b := \max _{K\in \mathcal {T}_h}\big \{2(1+\theta _{K} h_K^{-1}c_{Tr}^2\frac {\overline {\nu }_K^2}{ \underline {\nu }_K}),2c_{Tr}^2\overline {\nu }_K^2,2+c_{I,1}^2, 1+(c_{I,\nu }\theta _K)^2 \big \}^{1/2} \). Choosing now \( \theta _K = \mathcal {O}(h_K) \) ensures the uniform boundedness of the constant \( \mu_b \). □

Remark 13.6

Choosing \( \theta_K \) as in Lemma 13.5, i.e., \( \theta _K = h_K/(c_{I,3}^2\overline {\nu }_K) \), we obtain \( \mu_c = 1/2 \) and \( \mu _b = \max _{K\in \mathcal {T}_h}\big \{2(1+\frac {\overline {\nu }_Kc_{Tr}^2}{ \underline {\nu }_Kc_{I,3}^2}),2c_{Tr}^2\overline {\nu }_K^2,2+c_{I,1}^2, 1+(\frac {c_{I,\nu }h_K}{c_{I,3}^2\overline {\nu }_K})^2 \big \}^{1/2}\).
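To see why \( \theta_K = \mathcal{O}(h_K) \) matters for the boundedness constant, one can evaluate the formula for \( \mu_b \) from Lemma 13.7 on a sequence of shrinking elements. The following sketch does this; the function name, the values of the constants \( c_{Tr} \), \( c_{I,1} \), \( c_{I,\nu} \) and the element data are hypothetical.

```python
import math

def boundedness_constant(elements, c_Tr, c_I1, c_Inu):
    """mu_b of Lemma 13.7.

    elements: iterable of tuples (h_K, theta_K, nu_upper_K, nu_lower_K).
    """
    worst = 0.0
    for h, theta, nu_up, nu_low in elements:
        candidates = (
            2.0 * (1.0 + theta / h * c_Tr ** 2 * nu_up ** 2 / nu_low),
            2.0 * c_Tr ** 2 * nu_up ** 2,
            2.0 + c_I1 ** 2,
            1.0 + (c_Inu * theta) ** 2,
        )
        worst = max(worst, *candidates)
    return math.sqrt(worst)

# Hypothetical constants and element sizes, with nu = 1 on every element.
c_Tr, c_I1, c_Inu = 1.0, 2.0, 1.0
sizes = [0.5, 0.25, 0.125]

# theta_K = h_K: the ratio theta_K / h_K stays fixed, so mu_b does not grow.
mu_b_scaled = boundedness_constant([(h, h, 1.0, 1.0) for h in sizes], c_Tr, c_I1, c_Inu)

# theta_K = 1 independent of h_K: the first candidate grows like 1 / h_K.
mu_b_fixed = boundedness_constant([(h, 1.0, 1.0, 1.0) for h in sizes], c_Tr, c_I1, c_Inu)
```

With these numbers the scaled choice yields \( \sqrt{6} \), whereas the fixed choice yields \( \sqrt{18} \) on the finest element and keeps growing under further refinement.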

Remark 13.7

If we consider the bilinear form from Remark 13.2, we can derive an analogous statement, but in a different norm \( \| \cdot \|_{h,*} \) defined as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v \|{}_{h,*}^2 &\displaystyle =&\displaystyle \frac{1}{2}\| v \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[ \theta_{K} h_K\| \partial_{t} v \|{}_{L_2(K)}^2 + \| \nabla_x v \|{}_{L_2^{\nu}(K)}^2\\ &\displaystyle &\displaystyle \qquad \qquad \qquad + (\theta_{K} h_K)^{-1}\| v \|{}_{L_2(K)}^2 + \theta_{K} h_K \| \mathrm{div}_x(\nu\nabla_{x}v) \|{}_{L_{2}(K)}^2 \big].\vspace{-3pt} \end{array} \end{aligned} $$

By the same arguments as in the proof above, we estimate the first three terms in (13.11) by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left| \sum_{K\in\mathcal{T}_h}\int_{K} \partial_{t} u v_h \; \mathrm{d}(x,t) \right| &\displaystyle \leq&\displaystyle \sum_{K\in\mathcal{T}_h}\big[\big((\theta_{K} h_K)^{-1}\| u \|{}_{L_2(K)}^2\big)^{1/2}\big(\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big)^{1/2}\big] \\ &\displaystyle &\displaystyle \quad + \big(\| u \|{}_{L_2(\varSigma_T)}^2\big)^{1/2}\big(\| v_h \|{}_{L_2(\varSigma_T)}^2\big)^{1/2},\\ \left| \theta_{K} h_K\int_{K} \partial_{t} u\partial_{t}v_h \; \mathrm{d}(x,t)\right| &\displaystyle \leq&\displaystyle \big(\theta_{K} h_K\| \partial_{t} u \|{}_{L_2(K)}^2\big)^{1/2}\big(\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big)^{1/2},\\ \left|\int_{K} \nu\nabla_x u\nabla_xv_h \; \mathrm{d}(x,t)\right| &\displaystyle \leq&\displaystyle \big(\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2\big)^{1/2}\big(\| \nabla_x v_h \|{}_{L_2^{\nu}(K)}^2\big)^{1/2}.\vspace{-3pt} \end{array} \end{aligned} $$

For the fourth term, we just apply the Cauchy-Schwarz inequality to each term of the sum to obtain

$$\displaystyle \begin{aligned} \left|\theta_{K}h_{K}\int_{K} \mathrm{div}_x\left(\nu\nabla_{x}u\right)\partial_{t}v_{h}\;\mathrm{d}(x,t) \right| \leq \left(\theta_{K}h_{K}\|\mathrm{div}_x\left(\nu\nabla_{x}u\right)\|{}_{L_2(K)}^2\right)^{1/2}\left(\theta_{K}h_{K}\|\partial_{t}v_h\|{}_{L_2(K)}^2\right)^{1/2}.\end{aligned} $$

Now, combining the above estimates, applying Cauchy’s inequality and reordering the terms, we finally obtain the estimate

$$\displaystyle \begin{aligned} \begin{array}{rcl} | a_h(u,&\displaystyle v_h) |&\displaystyle \\ &\displaystyle \leq&\displaystyle \bigg(\| u \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[\theta_{K} h_K\| \partial_{t} u \|{}_{L_2(K)}^2 + \| \nabla_x u \|{}_{L_2^{\nu}(K)}^2 \\ &\displaystyle &\displaystyle \qquad + (\theta_{K} h_K)^{-1} \| u \|{}_{L_2(K)}^2 + \theta_{K}h_{K} \|\mathrm{div}_x\left(\nu\nabla_{x}u\right)\|{}_{L_2(K)}^2 \big]\bigg)^{1/2}\\ &\displaystyle &\displaystyle \quad \times \bigg(\| v_h \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[ 3\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2 + \| \nabla_x v_h \|{}_{L_2^{\nu}(K)}^2 \big]\bigg)^{1/2} \\ &\displaystyle \leq&\displaystyle 3 \| u \|{}_{h,*}\| v_h \|{}_h. \end{array} \end{aligned} $$

Thus, the bilinear form (13.11) is bounded for all choices of \( \theta_K \).
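The constant 3 in the last step can be checked directly. Assuming that \( \| \cdot \|_h \) is the norm (13.13), i.e., the norm \( \| \cdot \|_{h,*} \) of this remark without its last two terms, the two factors compare with the norms as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| u \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[\theta_{K} h_K\| \partial_{t} u \|{}_{L_2(K)}^2 + \| \nabla_x u \|{}_{L_2^{\nu}(K)}^2 + (\theta_{K} h_K)^{-1} \| u \|{}_{L_2(K)}^2 + \theta_{K}h_{K} \|\mathrm{div}_x(\nu\nabla_{x}u)\|{}_{L_2(K)}^2 \big] &\displaystyle \leq&\displaystyle 2\,\| u \|{}_{h,*}^2,\\ \| v_h \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[ 3\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2 + \| \nabla_x v_h \|{}_{L_2^{\nu}(K)}^2 \big] &\displaystyle \leq&\displaystyle 3\,\| v_h \|{}_h^2. \end{array} \end{aligned} $$

The factor 2 in the first line stems from the weight 1/2 of the \( \varSigma_T \)-term in both norms. Hence, the product of the two square roots is at most \( \sqrt{6}\,\| u \|_{h,*}\| v_h \|_h \leq 3\,\| u \|_{h,*}\| v_h \|_h \).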

Remark 13.8

As in Remark 13.5, we can provide a simplified estimate for the special case p = 1 and \( \nu|_K = \nu_K = \mathrm{const} \). The first three terms can be estimated as in the above proof. The fourth term vanishes completely, since \( \nabla_x(\partial_t v_h) = 0 \). For the fifth term, we use the fact that \( \partial_t v_h = \mathrm{const} \) on \(K \in \mathcal {T}_h\), Gauss’ theorem and the Cauchy-Schwarz inequality, obtaining

$$\displaystyle \begin{aligned} |\theta_{K} h_K\int_{\partial K} \nu_K \nabla_x u\cdot n_{x} \partial_{t}v_h \; \mathrm{d}s_{(x,t)} | \leq\big(\theta_{K} h_K\nu_K^2 \| \varDelta_x u \|{}_{L_2(K)}^2\big)^{1/2}\big(\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2\big)^{1/2}.\end{aligned} $$

Gathering the terms from the proof and the above estimate, we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} | a_h(u,v_h) | &\displaystyle \leq&\displaystyle \bigg(\| u \|{}_{L_2(\varSigma_T)}^2 + \sum_{K\in\mathcal{T}_h} \big[\theta_{K} h_K\| \partial_{t} u \|{}_{L_2(K)}^2 +\| \nabla_x u \|{}_{L_2^{\nu}(K)}^2 \\ &\displaystyle &\displaystyle \qquad + (\theta_{K} h_K)^{-1} \| u \|{}_{L_2(K)}^2 + \nu_K^2\theta_{K} h_K\| \varDelta_x u \|{}_{L_2(K)}^2 \big]\bigg)^{1/2}\\ &\displaystyle &\displaystyle \quad \times \bigg(\| v_h \|{}_{L_2(\varSigma_T)}^2\\ &\displaystyle &\displaystyle \qquad \qquad + \sum_{K\in\mathcal{T}_h} \big[ 3\theta_{K} h_K\| \partial_{t}v_h \|{}_{L_2(K)}^2+\| \nabla_x v_h \|{}_{L_2^{\nu}(K)}^2 \big]\bigg)^{1/2}\\ &\displaystyle \leq&\displaystyle \max_{K\in\mathcal{T}_h}\{3,\nu_K^2 \}^{1/2} \| u \|{}_{h,*}\| v_h \|{}_h.\vspace{-3pt} \end{array} \end{aligned} $$

We immediately deduce that this new constant \( \tilde {\mu }_b = \max _{K\in \mathcal {T}_h}\{3,\nu _K^2 \}^{1/2} \) is also independent of \( h_K \) for all choices of positive \( \theta_K \), \(K \in \mathcal {T}_h\).

Coercivity, boundedness, and consistency of the bilinear form \( a_h(\cdot,\cdot) \) immediately yield a Céa-like estimate of the discretization error in the norm \( \| \cdot \|_h \) by the best approximation error in the norm \( \| \cdot \|_{h,*} \).

Lemma 13.8

Let the assumptions of the coercivity Lemma 13.5 and the boundedness Lemma 13.7 hold, and let the solution u of the space-time variational problem (13.4) belong to \(H^{2,1}_{0, \underline {0}}(\mathcal {T}_h)\) . Then the discretization error estimate

$$\displaystyle \begin{aligned} \| u - u_h \|{}_h \le \left( 1 + \frac{\mu_b}{\mu_c} \right) \inf_{v_h \in V_{0h}} \| u - v_h \|{}_{h,*} \end{aligned} $$
(13.23)

holds, where \( u_h \in V_{0h} \) denotes the solution of the space-time finite element scheme (13.12), and the norms \( \| \cdot \|_h \) and \( \| \cdot \|_{h,*} \) are defined by (13.13) and (13.21), respectively.

Proof

First, from the consistency identity (13.9) and the space-time finite element scheme (13.12), we immediately deduce Galerkin orthogonality, i.e.,

$$\displaystyle \begin{aligned} a_h(u-u_{h},v_h) = 0,\quad \forall v_h\in V_{0h}. \end{aligned} $$
(13.24)

We start with the triangle inequality for the discretization error, i.e.,

$$\displaystyle \begin{aligned} \| u-u_{h} \|{}_h \leq \| u- v_h \|{}_h + \| v_h - u_{h} \|{}_h. \end{aligned} $$

Applying ellipticity proved in Lemma 13.5, the Galerkin orthogonality (13.24) and the generalized boundedness from Lemma 13.7 to the second term, we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mu_c \| v_h - u_{h} \|{}_h^2 &\displaystyle \leq&\displaystyle a_h(v_h - u_{h}, v_h - u_{h}) = a_h( v_h - u, v_h - u_{h})\\ &\displaystyle \leq&\displaystyle \mu_b \| v_h - u \|{}_{h,*}\| v_h - u_{h} \|{}_h. \end{array} \end{aligned} $$

Inserting this estimate in the triangle inequality above, we obtain

$$\displaystyle \begin{aligned} \| u-u_{h} \|{}_h \leq \| u- v_h \|{}_h + \frac{\mu_b}{\mu_c}\| v_h - u \|{}_{h,*}.\end{aligned} $$
(13.25)

Since \( \| u - v_h \|_h \leq \| v_h - u \|_{h,*} \), we immediately get the Céa-like estimate (13.23). □

Remark 13.9

Remark 13.7 immediately implies that the Céa-like estimate (13.23) is also valid for solutions u from \(H^{L,1}_{0, \underline 0}(\mathcal {T}_h)\) provided that the norm ∥⋅∥h,∗ is now defined as in Remark 13.7.

To obtain a priori error estimates w.r.t. the mesh-dependent norm (13.13), we need approximation or interpolation error estimates for the finite element spaces \( V_{0h} \) w.r.t. the norm (13.21), which we summarize in the following lemmas. Moreover, we need the broken Sobolev space

$$\displaystyle \begin{aligned} H^{l}(\mathcal{T}_h) := \{ v\in L_2(Q) : v|{}_{K} \in H^{l}(K)\; \forall K \in \mathcal{T}_h \}, \end{aligned}$$

equipped with the broken Sobolev (semi-)norm

$$\displaystyle \begin{aligned} | v |{}_{H^{l}(\mathcal{T}_h)}^2:=\sum_{K\in\mathcal{T}_h}| v |{}_{H^{l}(K)}^2\quad \mbox{and}\quad \| v \|{}_{H^l(\mathcal{T}_h)}^2 := \sum_{K\in\mathcal{T}_h}\| v \|{}_{H^{l}(K)}^2, \end{aligned}$$

where l is some positive integer. For further details on such spaces, we refer to [11, 35].

Lemma 13.9

Let l and k be positive integers with l ≥ k > (d + 1)∕2, and let \( v \in V_0\cap H^k(Q)\cap H^{l}(\mathcal {T}_h) \), where \( \{\mathcal {T}_h\}_{h>0} \) is a shape-regular family of subdivisions of Q. Then there exists an interpolation operator \( \varPi_h \), mapping from \( V_0 \cap H^k(Q) \) to \( V_{0h} \), such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_{L_2(K)} &\displaystyle \leq&\displaystyle C\,h_K^{s}\ | v |{}_{H^{s}(K)}, {} \end{array} \end{aligned} $$
(13.26)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \| \nabla(v-\varPi_h v) \|{}_{L_2(K)} &\displaystyle \leq&\displaystyle C\,h_K^{s-1}\ | v |{}_{H^{s}(K)}, {} \end{array} \end{aligned} $$
(13.27)
$$\displaystyle \begin{aligned} \begin{array}{rcl} | v-\varPi_h v |{}_{H^{2}(K)} &\displaystyle \leq&\displaystyle C\,h_K^{s-2}\ | v |{}_{H^{s}(K)}, {} \end{array} \end{aligned} $$
(13.28)

where C is some generic constant independent of \( h_K \) and v, \(s = \min \{l,p+1\}\), p denotes the polynomial degree of the finite element shape functions on the reference element, and \( V_0 = H^{1,1}_{0, \underline {0}}(Q) \).

Proof

See, e.g., [8, Theorem 4.4.4] or [9, Theorem 3.1.6]. □

Lemma 13.10

Let the assumptions of Lemma 13.9 hold. Then the following interpolation error estimates are valid:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_{L_2(\varSigma_T)} &\leq& c_1\big(\sum_{\begin{array}{c} K\in\mathcal{T}_h\\ \partial K\cap\varSigma_T\neq\emptyset\end{array}}h_K^{2s-1}| v |{}_{H^{s}(K)}^2\big)^{1/2}, {} \end{array} \end{aligned} $$
(13.29)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_h &\leq& c_2 \big(\sum_{K\in\mathcal{T}_h}h_K^{2(s-1)} | v |{}_{H^{s}(K)}^2\big)^{1/2}, {} \end{array} \end{aligned} $$
(13.30)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_{h,*} &\leq& c_3 \big(\sum_{K\in\mathcal{T}_h}h_K^{2(s-1)} | v |{}_{H^{s}(K)}^2\big)^{1/2}, {} \end{array} \end{aligned} $$
(13.31)

with positive constants \( c_1 \), \( c_2 \) and \( c_3 \) that do not depend on v or \( h_K \) provided that \( \theta _K=\mathcal {O}(h_K) \) for all \(K \in \mathcal {T}_h\).

Proof

We start with the first estimate (13.29). We use the scaled trace inequality (13.22), and the interpolation error estimates (13.26) and (13.27), obtaining

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_{L_2(\varSigma_T)}^2 &\displaystyle =&\displaystyle \sum_{\begin{array}{c}K\in\mathcal{T}_h\\ \partial K\cap\varSigma_T\neq\emptyset\end{array}} \| v-\varPi_h v \|{}_{L_2(\partial K\cap\varSigma_T)}^2 \leq \sum_{\begin{array}{c}K\in\mathcal{T}_h\\ \partial K\cap\varSigma_T\neq\emptyset\end{array}} \| v-\varPi_h v \|{}_{L_2(\partial K)}^2 \\ &\displaystyle \leq&\displaystyle \sum_{\begin{array}{c}K\in\mathcal{T}_h\\ \partial K\cap\varSigma_T\neq\emptyset\end{array}} \big[ 2c_{Tr}^2h_K^{-1}(\| v-\varPi_h v \|{}_{L_2(K)}^2 + h_K^2 \| \nabla (v-\varPi_h v) \|{}_{L_2(K)}^2) \big] \\ &\displaystyle \leq&\displaystyle 4c_{Tr}^2\,C^2 \sum_{\begin{array}{c}K\in\mathcal{T}_h\\ \partial K\cap\varSigma_T\neq\emptyset\end{array}} \big[h_K^{2s-1} | v |{}_{H^{s}(K)}^2\big]. \end{array} \end{aligned} $$

To prove (13.30), we use definition (13.13), assumption (13.14), the interpolation error estimate (13.27), and the above estimate (13.29), and obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_h^2 &\displaystyle =&\displaystyle \sum_{K\in\mathcal{T}_h} \big[\theta_{K} h_K\| \partial_{t}(v-\varPi_h v) \|{}_{L_2(K)}^2 + \| \nabla_x(v-\varPi_h v) \|{}_{L_2^{\nu}(K)}^2\big] \\ &\displaystyle &\displaystyle \quad + \frac{1}{2} \| v-\varPi_h v \|{}_{L_2(\varSigma_T)}^2 \\ &\displaystyle \leq&\displaystyle \sum_{K\in\mathcal{T}_h} \bigg[(C^2\theta_{K} h_K + \overline{\nu}_K C^2 + c_1^2 h_K) h_K^{2(s-1)} | v |{}_{H^{s}(K)}^2 \bigg]. \end{array} \end{aligned} $$

For the last estimate (13.31), we use definition (13.21), the above estimate (13.30), and the interpolation error estimate (13.28), obtaining

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| v-\varPi_h v \|{}_{h,*}^2 &\displaystyle =&\displaystyle \| v-\varPi_h v \|{}_h^2 \\ &\displaystyle &\displaystyle + \sum_{K\in\mathcal{T}_h} \big[(\theta_{K} h_K)^{-1}\| v-\varPi_h v \|{}_{L_2(K)}^2 + \theta_{K} h_K| v-\varPi_h v |{}_{H^{2}(K)}^2\big]\\ &\displaystyle \leq&\displaystyle \sum_{K\in\mathcal{T}_h} \big(c_2^2 + h_K\theta_K^{-1} C^2 + \theta_{K} h_K^{-1} C^2\big) h_K^{2(s-1)} | v |{}_{H^{s}(K)}^2. \end{array} \end{aligned} $$

The special choice \( \theta _K = \mathcal {O}(h_K) \) ensures that the constant \( c_3 \) is independent of \( h_K \). □

Remark 13.10

The strong assumption \( v \in H^{k}(Q) \) with k > (d + 1)∕2 is needed for the Lagrangian interpolation operator. However, in practical applications, where different materials usually occur, this requirement is too restrictive. In this case, the space-time cylinder \( \overline {Q} = \bigcup _{i=1}^{M} \overline {Q_i} \) can be split into subdomains \( Q_i \), which correspond to different materials. On each such subdomain \( Q_i \), we can assume some regularity for the solution u, e.g., \( u \in H^{\mathbf {l}}(\mathcal {T}(Q)) := \{ v\in L_2(Q) : v|{ }_{Q_i}\in H^{l_i}(Q_i), \mbox{ for all } i=1,\dots , M \} \) with some \( \mathbf{l} = (l_1, \dots, l_M) > 1 \). For a similar case, Duan, Li, Tan and Zheng have shown a localized interpolation error estimate of the form

$$\displaystyle \begin{aligned} \| \nabla(v - I_h v) \|{}_{L_2(Q)} \leq C \sum_{i=1}^{M} h_i^{s_i-1}\| v \|{}_{H^{s_i}(Q_i)}, \end{aligned}$$

in [13], where \( I_h \) is a special quasi-interpolation operator, and \(s_i = \min \{l_i,p+1\}\).

Now we can formulate the following a priori estimate for the discretization error.

Theorem 13.3

Let l and k be positive integers with l ≥ k > (d + 1)∕2, let \( u\in V_0 \cap H^k(Q) \cap H^{l}(\mathcal {T}_h) \) be the exact solution, and let \( u_h \in V_{0h} \) be the solution of the finite element scheme (13.12). Furthermore, let the assumptions of Lemmas 13.5 (coercivity), 13.7 (boundedness) and 13.10 (interpolation error estimates) be fulfilled. Then the a priori error estimate

$$\displaystyle \begin{aligned} \| u-u_h \|{}_h \leq c\bigg(\sum_{K\in\mathcal{T}_h}h_K^{2(s-1)}| u |{}_{H^{s}(K)}^2\bigg)^{1/2} {} \end{aligned} $$
(13.32)

holds with \(s = \min \{l,p+1\}\) and some generic positive constant c.

Proof

Setting \( v_h = \varPi_h u \) in (13.25), and using the interpolation error estimates (13.30) and (13.31), we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| u-u_{h} \|{}_h &\displaystyle \leq&\displaystyle \| u-\varPi_h u \|{}_h + \frac{\mu_b}{\mu_c}\| \varPi_h u - u \|{}_{h,*}\\ &\displaystyle \leq&\displaystyle \big( c_2 + c_3 \frac{\mu_b}{\mu_c} \big) \big(\sum_{K\in\mathcal{T}_h} h_K^{2(s-1)} | u |{}_{H^{s}(K)}^2\big)^{1/2}, \end{array} \end{aligned} $$

which proves estimate (13.32) with \( c = c_2 + c_3\,\mu_b/\mu_c \). □
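For a quasi-uniform mesh family with mesh size h and a solution with \( l \geq p+1 \), estimate (13.32) predicts an error of order \( h^{s-1} = h^{p} \) in the norm \( \| \cdot \|_h \). In computations, such a rate is usually checked via the experimental order of convergence between consecutive meshes. The sketch below shows this check; the function name is hypothetical and the error values are synthetic numbers of the form \( 3h^2 \), not results from the chapter.

```python
import math

def observed_orders(mesh_sizes, errors):
    """Experimental orders log(e_i / e_{i+1}) / log(h_i / h_{i+1})."""
    pairs = list(zip(mesh_sizes, errors))
    return [
        math.log(e0 / e1) / math.log(h0 / h1)
        for (h0, e0), (h1, e1) in zip(pairs, pairs[1:])
    ]

# Synthetic data (assumption): errors behaving exactly like 3 * h^2,
# i.e. the rate predicted for p = 2 and a smooth solution.
mesh_sizes = [0.5, 0.25, 0.125]
errors = [3.0 * h ** 2 for h in mesh_sizes]
orders = observed_orders(mesh_sizes, errors)
```

For real data the computed orders only approach the predicted rate asymptotically, so the values on the coarsest meshes should be interpreted with care.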

Now we proceed with solving the discrete variational problem (13.12), which is nothing but one huge system of linear algebraic equations. Indeed, let \( \{p^{(i)} : i = 1, \ldots, N_h\} \) be the finite element nodal basis of \( V_{0h} \), i.e., \(V_{0h} = \mbox{span}\{p^{(1)},\ldots ,p^{(N_h)} \}\), where \( N_h \) is the number of all space-time unknowns (dofs). Then we can express the approximate solution \( u_h \) in terms of this basis, i.e., \( u_{h}(x,t) = \sum _{i=1}^{N_h} u_i\,p^{(i)}(x,t) \). Furthermore, each basis function is a valid test function. Thus, we obtain \( N_h \) equations from (13.12). We can rewrite these equations as a system of linear algebraic equations

$$\displaystyle \begin{aligned} {\mathbf{K}}_{h} {\mathbf{u}}_{h}= {\mathbf{f}}_{h}, \end{aligned} $$
(13.33)

with \( {\mathbf {K}}_{h} = (a_h(p^{(j)},p^{(i)}))_{i,j=1,\ldots ,N_h} \), \({\mathbf {u}}_{h}= (u_j)_{j=1,\ldots ,N_h} \), \( {\mathbf {f}}_{h} = (l_h(p^{(i)}))_{i=1,\ldots ,N_h}\). The system matrix is non-symmetric, but positive definite due to Lemma 13.5. Indeed,

$$\displaystyle \begin{aligned} ({\mathbf{K}}_{h}{\mathbf{v}}_h, {\mathbf{v}}_h) = a_h(v_h,v_h)\geq \mu_c\| v_h \|{}_h^2 > 0 \end{aligned}$$

for all \( V_{0h}\ni v_h \leftrightarrow {\mathbf {v}}_h\in {\mathbf {R}}^{N_h} : {\mathbf {v}}_h \neq 0 \). Depending on the dimension \( N_h \), the linear system (13.33) of algebraic equations can efficiently be solved by means of a sparse direct solver (e.g., sparse LU factorization) or an iterative solver (e.g., preconditioned GMRES). In particular, it turns out that parallel versions of GMRES preconditioned by algebraic multigrid can solve large-scale systems with several millions of unknowns on distributed memory computers with several hundreds of cores in a few seconds; see also Example 13.1 in Sect. 13.4.

13.4 Implementation and Numerical Results

We implemented our conforming space-time finite element scheme with the help of MFEM [21], a C++ library for finite elements. The resulting linear systems were then solved by means of the GMRES method, preconditioned by one algebraic multigrid (AMG) V-cycle of BoomerAMG. As a stopping criterion, we used the reduction of the initial residual by a factor of \( 10^{-8} \). These methods were provided by the solver library hypre. We note that both libraries are already fully parallelized with MPI. All numerical tests were performed on the RADON1 high performance computing cluster at Linz. The initial (spatial) meshes were created by NETGEN [38], and the space-time meshes were obtained by means of an algorithm provided by Karabelas and Neumüller [19]. For visualization, we use either GLVis [20] or ParaView [2].
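To make the solver setting more tangible, the following self-contained sketch implements a plain GMRES iteration with the relative residual stopping criterion mentioned above. It is only an illustration under simplifying assumptions: there is no preconditioner, no restart and no parallelism, it does not use or mimic the hypre or MFEM interfaces, and the 3×3 matrix is a hypothetical non-symmetric, positive definite toy example.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmres(A, b, rel_tol=1e-8, max_iter=None):
    """Unpreconditioned, unrestarted GMRES with zero initial guess."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    beta = math.sqrt(dot(b, b))            # norm of the initial residual r0 = b
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]          # orthonormal Krylov basis
    R = []                                 # columns of the rotated Hessenberg matrix
    cs, sn = [], []                        # Givens rotations
    g = [beta]                             # rotated right-hand side
    for k in range(max_iter):
        w = matvec(A, V[k])
        col = []
        for vj in V:                       # modified Gram-Schmidt (Arnoldi step)
            hjk = dot(w, vj)
            col.append(hjk)
            w = [wi - hjk * vji for wi, vji in zip(w, vj)]
        h_next = math.sqrt(dot(w, w))
        col.append(h_next)
        for j in range(k):                 # apply the previous rotations
            tmp = cs[j] * col[j] + sn[j] * col[j + 1]
            col[j + 1] = -sn[j] * col[j] + cs[j] * col[j + 1]
            col[j] = tmp
        rho = math.hypot(col[k], col[k + 1])
        cs.append(col[k] / rho)
        sn.append(col[k + 1] / rho)
        col[k], col[k + 1] = rho, 0.0
        g.append(-sn[k] * g[k])            # |g[k+1]| is the current residual norm
        g[k] = cs[k] * g[k]
        R.append(col)
        if abs(g[k + 1]) <= rel_tol * beta or h_next == 0.0:
            break
        V.append([wi / h_next for wi in w])
    m = len(R)
    y = [0.0] * m
    for i in reversed(range(m)):           # back substitution R y = g[:m]
        rest = sum(R[j][i] * y[j] for j in range(i + 1, m))
        y[i] = (g[i] - rest) / R[i][i]
    return [sum(y[j] * V[j][i] for j in range(m)) for i in range(n)]

# Toy matrix (assumption): non-symmetric, but (K v, v) = 4 |v|^2 > 0.
K = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
x_true = [1.0, 2.0, 3.0]
rhs = matvec(K, x_true)
x = gmres(K, rhs)
```

For large sparse systems one would of course rely on an optimized, preconditioned library implementation; the sketch only shows the structure of the iteration and of the stopping test.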

Example 13.1

For the first example, we consider the unit (hyper-)cube \( Q = (0,1)^{d+1} \), with d = 2, 3, as space-time cylinder, and choose the diffusion coefficient ν ≡ 1. The manufactured function

$$\displaystyle \begin{aligned} u(x,t) = \prod_{i=1}^{d}\sin{}(x_i \pi) \sin{}(t\pi) \end{aligned}$$

is chosen as the exact solution, where the right-hand side is computed accordingly. This solution is smooth, and thus fulfills all regularity assumptions made for deriving the a priori error estimate (13.32) with optimal rates. Hence, we can expect optimal convergence rates provided that we choose \( \theta_K \) as in Remark 13.4, i.e., on each element \( K\in \mathcal {T}_h \), we numerically solve a small generalized eigenvalue problem with LAPACK [3]. Indeed, Fig. 13.1b shows optimal convergence rates for all tested polynomial degrees and spatial dimensions. Moreover, we can observe from Fig. 13.1c that the preconditioned GMRES method has an optimal strong scaling behavior for systems with \( N_h \) = 4 601 025 (p = 1, 2) and \( N_h \) = 5 764 801 (p = 3) unknowns in the case d = 3, i.e., \( Q = (0,1)^4 \). The stagnation of the scaling rate at 256 cores is due to an increased communication overhead since the problems become too small on each processor (only ∼15 000 dofs).
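For this manufactured solution and ν ≡ 1, the right-hand side of the equation \( \partial_t u - \varDelta_x u = f \) can be written down in closed form, namely \( f = \prod_{i=1}^{d}\sin (x_i\pi)\,\big(\pi\cos (t\pi) + d\pi^2\sin (t\pi)\big) \). The sketch below encodes this formula and compares it with a finite difference approximation of \( \partial_t u - \varDelta_x u \); the function names, the sample point and the step size are arbitrary choices made for this illustration.

```python
import math

def u_exact(x, t):
    """Manufactured solution prod_i sin(pi x_i) * sin(pi t)."""
    return math.prod(math.sin(math.pi * xi) for xi in x) * math.sin(math.pi * t)

def f_rhs(x, t):
    """Closed-form right-hand side f = du/dt - Laplace_x u for nu = 1."""
    d = len(x)
    space = math.prod(math.sin(math.pi * xi) for xi in x)
    return space * (math.pi * math.cos(math.pi * t)
                    + d * math.pi ** 2 * math.sin(math.pi * t))

def finite_difference_defect(x, t, delta=1e-3):
    """f minus a central finite-difference approximation of du/dt - Laplace_x u."""
    du_dt = (u_exact(x, t + delta) - u_exact(x, t - delta)) / (2 * delta)
    laplace = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += delta
        x_minus[i] -= delta
        laplace += (u_exact(x_plus, t) - 2 * u_exact(x, t) + u_exact(x_minus, t)) / delta ** 2
    return f_rhs(x, t) - (du_dt - laplace)

# Arbitrary interior sample point for d = 3.
defect = finite_difference_defect([0.3, 0.6, 0.2], 0.4)
```

The remaining defect is of the size of the finite difference truncation error, which supports the closed-form expression for f.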

Fig. 13.1

Decomposition of the space-time cylinder into 64 subdomains for parallel computing (left); error rates in the \( \| \cdot \|_h \)-norm (right); strong scaling of the solver for d = 3 and \( N_h \) = 4 601 025, 4 601 025, 5 764 801 for p = 1, 2, 3 (below)

Example 13.2

Let us now consider an example with a moving interface in the unit hyper-cube \( Q := (0,1)^{d+1} \), with d = 2, 3. The moving interface is defined by the discontinuous diffusion coefficient

$$\displaystyle \begin{aligned} \nu(x,t) = \left\{ \begin{array}{ll} 1\times10^2, &\ \mbox{for }2x_1-t < \frac{1}{2},\\ 7\times10^5, &\ \mbox{for }2x_1-t > \frac{1}{2}; \end{array} \right. \end{aligned}$$

see Fig. 13.2 (left). We choose the function

$$\displaystyle \begin{aligned} u(x,t) =\left\{ \begin{array}{ll} \sin\left(9\pi \left(2x_1-t-\frac{1}{2}\right)^2 (x_1-x_1^2)\right)\sin{}(4\pi t)g(x), &\ \mbox{for }2x_1-t \leq \frac{1}{2},\\ {} \sin\left(40\pi\left(2x_1-t-\frac{1}{2}\right)^2 (x_1-x_1^2)(t-t^2) \right)g(x), &\ \mbox{else}, \end{array} \right. \end{aligned}$$

with \(g(x) = \prod _{i=2}^{d}\sin {}(\pi x_i) \), as our exact solution, and compute the corresponding right-hand side and initial data. This manufactured solution fulfills the interface conditions since the function and its first derivatives vanish at the interface. Since this function is smooth on both sides of the moving interface, we expect optimal convergence rates; cf. Theorem 13.3. Indeed, for linear, quadratic and cubic shape functions, we observe optimal rates provided that we choose \( \theta_K \) according to Remark 13.4; see Fig. 13.2 (right).
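The data of this example are easy to encode, and the vanishing of the solution and of its first derivative at the interface can be checked numerically. The sketch below does so for d = 2; the function names, the sample point on the interface and the step size are choices made for this illustration, and the value assigned to ν on the interface itself is an assumption, since the definition above only covers both sides.

```python
import math

def nu(x, t):
    """Piecewise constant diffusion coefficient.
    The value on the interface 2*x_1 - t = 1/2 is a choice of this sketch."""
    return 1.0e2 if 2 * x[0] - t < 0.5 else 7.0e5

def u_exact(x, t):
    """Manufactured solution of the moving-interface example."""
    s = 2 * x[0] - t - 0.5                         # signed position relative to the interface
    g = math.prod(math.sin(math.pi * xi) for xi in x[1:])
    bubble = x[0] - x[0] ** 2
    if s <= 0:
        return math.sin(9 * math.pi * s ** 2 * bubble) * math.sin(4 * math.pi * t) * g
    return math.sin(40 * math.pi * s ** 2 * bubble * (t - t ** 2)) * g

# Sample point on the interface: 2 * 0.3125 - 0.125 = 0.5.
x_if, t_if, delta = [0.3125, 0.3], 0.125, 1e-4
value_on_interface = u_exact(x_if, t_if)

# One-sided difference quotients in x_1 from both sides of the interface.
left = (u_exact(x_if, t_if) - u_exact([x_if[0] - delta, x_if[1]], t_if)) / delta
right = (u_exact([x_if[0] + delta, x_if[1]], t_if) - u_exact(x_if, t_if)) / delta
```

Both one-sided quotients are of order δ, consistent with the quadratic factor \( (2x_1-t-\frac{1}{2})^2 \) inside the sine on either side of the interface.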

Fig. 13.2

Initial space-time mesh and diffusion coefficient ν(x, t) in color (left); error rates in the \( \| \cdot \|_h \)-norm (right)

Example 13.3

For the third example, we consider the exact solution

$$\displaystyle \begin{aligned} u(x,t) = \left(x_1^2-x_1\right)\left(x_2^2-x_2\right)\left(t^2-t\right)\mathrm{e}^{-100\left((x_1-0.25)^2 +(x_2-0.25)^2 +(t-0.25)^2 \right)} \end{aligned}$$

in the unit cube \( Q = (0,1)^3 \), i.e., d = 2, and compute the initial and boundary conditions as well as the right-hand side accordingly, where we set ν ≡ 1. This function is almost zero everywhere in the space-time cylinder Q except in a small region around (0.25, 0.25, 0.25); see Fig. 13.3 (left). This motivates the use of an a posteriori error estimator. In particular, we use the residual-based error indicator proposed by Steinbach and Yang in [40]. For each element \( K \in \mathcal {T}_h \), we compute the error indicator

$$\displaystyle \begin{aligned} \eta_K := \left( h_K^2 \|R_h(u_h)\|{}_{L_2(K)}^2 + h_K \|J_h(u_h) \|{}_{L_2(\partial K)}^2 \right)^{1/2}, \end{aligned}$$

where \( u_h \) is the solution of the finite element scheme (13.12), and

$$\displaystyle \begin{aligned} \begin{array}{rcl} R_h(u_h) &\displaystyle :=&\displaystyle f + \mathrm{div}_x(\nu \nabla_x u_h) - \partial_t u_h \quad \mbox{in }K,\\ J_h(u_h) &\displaystyle :=&\displaystyle [\nu \nabla_x u_h]_e\qquad \qquad \ \mbox{on }e\subset\partial K. \end{array} \end{aligned} $$

Here, \([\,\cdot\,]_e\) denotes the jump across a face e ⊂ ∂K. We use a maximum marking strategy, i.e., given a parameter Θ ∈ [0, 1], we mark all elements whose error indicator fulfills the condition

$$\displaystyle \begin{aligned} \eta_K \geq \varTheta \max_{K\in\mathcal{T}_h} \eta_K. \end{aligned}$$

Unless stated otherwise, we set Θ = 0.5. Note that setting Θ = 0 yields uniform refinement. The exact solution in this example is smooth. Hence, we expect optimal convergence rates for uniform refinement after some pre-asymptotic range, which we indeed observe for all tested polynomial degrees; cf. Fig. 13.3 (right). Adaptive refinement yields smaller absolute errors while retaining the optimal convergence rates.
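The indicator and the maximum marking strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the experiments: the function names `error_indicator` and `maximum_marking` are hypothetical, the squared norms of \(R_h(u_h)\) and \(J_h(u_h)\) are assumed to be already available per element, and the element data are made up.

```python
import math

def error_indicator(h_K, residual_norm_sq, jump_norm_sq):
    """eta_K = (h_K^2 ||R_h||^2_{L2(K)} + h_K ||J_h||^2_{L2(dK)})^(1/2)."""
    return math.sqrt(h_K**2 * residual_norm_sq + h_K * jump_norm_sq)

def maximum_marking(eta, theta=0.5):
    """Return the indices of all elements with eta_K >= theta * max(eta)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    threshold = theta * max(eta)
    return [k for k, eta_K in enumerate(eta) if eta_K >= threshold]

# Made-up data for four elements: (h_K, ||R_h||^2, ||J_h||^2).
elements = [
    (0.5, 0.04, 0.02),
    (0.25, 1.0, 0.5),
    (0.25, 0.16, 0.0),
    (0.125, 4.0, 2.0),
]
eta = [error_indicator(*data) for data in elements]

marked = maximum_marking(eta)              # default theta = 0.5
uniform = maximum_marking(eta, theta=0.0)  # theta = 0 marks every element
```

With these made-up values, only the two elements with the largest indicators are marked for Θ = 0.5, whereas Θ = 0 marks all four elements, which corresponds to uniform refinement.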

Fig. 13.3 Plot of the exact solution at t = 0.25 (left); convergence rates in the \(\|\cdot\|_h\)-norm (right)
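The strong localization of the exact solution of Example 13.3 can be checked by evaluating it at a few points. The following sketch is purely illustrative; the function name `u_exact` and the sample points are our own choices.

```python
import math

def u_exact(x1, x2, t):
    """Exact solution of Example 13.3."""
    bubble = (x1**2 - x1) * (x2**2 - x2) * (t**2 - t)
    r2 = (x1 - 0.25)**2 + (x2 - 0.25)**2 + (t - 0.25)**2
    return bubble * math.exp(-100.0 * r2)

# Value at the centre of the Gaussian factor.
peak = u_exact(0.25, 0.25, 0.25)

# Same polynomial factor, but far away from the centre.
far = u_exact(0.75, 0.75, 0.75)

# The polynomial factor vanishes for x_i in {0, 1} and for t in {0, 1}.
on_boundary = u_exact(0.0, 0.3, 0.4)
at_initial_time = u_exact(0.3, 0.4, 0.0)
```

At (0.25, 0.25, 0.25) the value is (−0.1875)³ ≈ −6.6 · 10⁻³, whereas at (0.75, 0.75, 0.75) the same polynomial factor is damped by e⁻⁷⁵, so the solution is zero there up to rounding for all practical purposes.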

Example 13.4

For the fourth and last example, we consider the exact solution

$$\displaystyle \begin{aligned} u(x,t) = \sin\left(\frac{1}{\frac{1}{10\pi}+\sqrt{x_1^2+x_2^2+t^2}}\right), \end{aligned}$$

in the unit cube \(Q = (0, 1)^3\), i.e., d = 2, and compute the initial and boundary conditions as well as the right-hand side accordingly, where we set ν ≡ 1. This function is highly oscillatory near the origin (0, 0, 0) and smooth everywhere else in the space-time cylinder Q; see Fig. 13.4 (left). This again motivates the use of an a posteriori error estimator. We use the same setup as in Example 13.3, i.e., the residual-based error indicator by Steinbach and Yang with a maximum marking strategy. For adaptive refinement, we recover the optimal rates for all polynomial degrees tested, whereas uniform refinement yields only reduced rates for p = 2, 3; cf. Fig. 13.4 (right). Moreover, in the case p = 1, we only need 47 330 dofs to obtain an energy error of the same magnitude as for 135 005 697 dofs after uniform refinement.
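Convergence rates such as those reported here are commonly estimated from pairs of dof counts and errors. The sketch below assumes that the mesh size behaves like \(h \sim N^{-1/(d+1)}\) for N dofs in the (d + 1)-dimensional space-time cylinder, which is only a heuristic on adaptively refined meshes. The function name `estimated_rates` is hypothetical and the error values are synthetic, not measured data; only the two dof counts are taken from the text.

```python
import math

def estimated_rates(dofs, errors, dim=3):
    """Estimate p in error ~ h^p between consecutive refinement levels,
    assuming h ~ N^(-1/dim) with N the number of dofs."""
    rates = []
    for i in range(len(dofs) - 1):
        error_ratio = errors[i] / errors[i + 1]
        dof_ratio = dofs[i + 1] / dofs[i]
        rates.append(dim * math.log(error_ratio) / math.log(dof_ratio))
    return rates

# Synthetic errors that behave exactly like h^2, i.e. like N^(-2/3) for dim = 3.
dofs = [1_000, 8_000, 64_000, 512_000]
errors = [2.0 * n ** (-2.0 / 3.0) for n in dofs]
rates = estimated_rates(dofs, errors)

# Ratio of the dof counts reported for p = 1 in Example 13.4.
dof_ratio = 135_005_697 / 47_330
```

For the synthetic data, every estimated rate equals 2 up to rounding. The ratio of the two reported dof counts shows that, for p = 1, uniform refinement needs roughly 2850 times as many dofs as adaptive refinement to reach an error of the same magnitude.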

Fig. 13.4 Plot of the exact solution (left); convergence rates in the \(\|\cdot\|_h\)-norm (right)

13.5 Conclusions and Future Work

In this paper, following the classical books [23] and [24], we recalled that the parabolic initial boundary value problem (13.1)–(13.3) has a unique generalized (weak) solution in \( {H}^{1,0}_{0}(Q) \) that even belongs to \( {V}^{1,0}_{2,0}(Q) \). Ladyžhenskaya already proved that, in the case ν = 1, the solution u even belongs to \(H^{\Delta,1}(Q)\) provided that the right-hand side satisfies \(f \in L_2(Q)\) and the initial data satisfy \(u_0 \in H^1_0(\varOmega )\); see [23]. This setting of the so-called maximal parabolic regularity was also considered in this paper. We again mention that we only need this property element-wise to construct a consistent and stable space-time finite element scheme. We proceeded with deriving a stable space-time finite element scheme, for which we showed coercivity (ellipticity) and boundedness on the finite element spaces and on the extended finite element spaces, respectively. These properties, together with consistency and standard interpolation or quasi-interpolation error estimates, led to a priori discretization error estimates in the corresponding mesh-dependent norm with optimal rates. We performed several numerical experiments with four test problems possessing different features. The first example has a smooth solution that led to optimal convergence rates as predicted by the theory. Moreover, due to the ellipticity of the bilinear form \(a_h(\cdot ,\cdot )\), the AMG-preconditioned GMRES method is a very efficient parallel solver. The second example has a moving interface that is given by a discontinuous diffusion coefficient ν(x, t) depending on both x and t. In the third and fourth examples, we studied adaptivity based on the a posteriori residual error indicator proposed in [40]. It is clear that the interplay of adaptivity and fast parallel iterative solvers will lead to the most efficient completely unstructured adaptive space-time solvers for complicated initial-boundary value problems for linear and even non-linear parabolic partial differential equations.
Adaptive space-time finite element methods and solvers can be useful for solving eddy current problems with moving and non-moving parts, as in electrical machines. In many practical applications, one is interested in optimal control or in optimal design of electrical machines; see, e.g., [17]. Adaptive space-time finite element methods are especially suited for solving the optimality system, which is nothing but a coupled PDE system living in the space-time cylinder Q; see, e.g., [44].