1 Introduction

There are many ways that a parabolic second-order PDE can fail to be uniformly parabolic. Many results for such PDEs concern equations that fail to be uniformly parabolic on specified lower-dimensional sets, such as when the dependent variable equals zero (e.g. the porous media equation [1]) or tends to infinity [2], or the time variable equals zero [3]. Other results concern partially parabolic systems, which are uniformly parabolic with respect to some of the spatial variables and contain no second derivatives with respect to the remaining spatial variables [4], or hyperbolic-parabolic systems [5], which are uniformly parabolic with respect to certain equations and contain no second derivatives in the remaining equations. The results presented here place no restrictions on the locations where uniform parabolicity fails and do not assume uniform parabolicity with respect to certain spatial or dependent variables. Linear non-uniformly parabolic equations and systems without such restrictions are a special case of the degenerate elliptic equations and systems treated in [6,7,8], but the results there cannot be generalized to the nonlinear equations and systems considered here because, as detailed in Appendix B, the estimates require more derivatives of the coefficients than are estimated for the solution.

Methods specific to parabolic equations and systems [9] can often be utilized even when uniform parabolicity fails at specified locations [10], but they do not seem to be applicable under the conditions considered here. The energy method, which applies to a wide variety of evolutionary PDEs, will be used instead. The difficulty in deriving Sobolev energy estimates for parabolic equations lies in the troublesome contributions that arise from differentiating the coefficients of the second-order terms. For uniformly parabolic, partially parabolic, or hyperbolic-parabolic equations [11, Theorem 4] those troublesome terms can be estimated by cancellation against the helpful contribution obtained on account of full or partial uniform parabolicity. The main point of this paper is that the troublesome terms can be controlled even when the parabolic terms are nonlinear and make no helpful contribution whatsoever.

Initial-value problems will be considered here for PDEs having one of the forms

$$\begin{aligned} A^0(t,x,u) u_t+\sum _{j=1}^d A^j(t,x,u)u_{x_j}=\sum _{j,k=1}^d D^{j,k}(t,x,u)\partial _{x_j}\partial _{x_k}u +F(t,x,u) \end{aligned}$$
(1)

or

$$\begin{aligned} A^0(\varepsilon u) u_t+\sum _{j=1}^d A^j(t,x,u)u_{x_j}+\tfrac{1}{\varepsilon }\sum _{j=1}^d C^ju_{x_j} =\sum _{j,k=1}^d D^{j,k}(t,x,u)\partial _{x_j}\partial _{x_k}u +F(t,x,u), \end{aligned}$$
(2)

where the first-order part is symmetric hyperbolic and the second-order terms are symmetric and satisfy the non-strict Legendre condition, as detailed in Assumption 2.1 below. In contrast to the corresponding hyperbolic systems in which the second-order spatial derivatives are absent, when the second-derivative terms are present but make no helpful contribution it is not possible in general to obtain a closed energy estimate for just spatial derivatives of solutions to (1) or (2). The problem is that the time derivative appears in the equations for the spatial derivatives, and when that time derivative is eliminated by solving the original PDE for \(u_t\) and substituting the result wherever \(u_t\) appears on the right side of the energy estimates for the spatial derivatives, the resulting expressions involve one more spatial derivative than is being estimated. The main results of this paper, Theorems 2.7 and 3.1, say that under the hypotheses mentioned above a local-in-time closed energy estimate can be obtained when space and time derivatives are estimated together, where for (2) the norms involving time derivatives are weighted by appropriate powers of \(\varepsilon \) and the resulting estimate is uniform in \(\varepsilon \).

The difficulty in proving estimates for the solution and its derivatives when the parabolic terms provide no helpful contribution is that those parabolic terms involve one more derivative than is dealt with in classical estimates for hyperbolic systems. The key estimates (13)–(15) provide bounds for the parabolic terms for the case when more derivatives have been applied to a single factor of the solution than are being estimated. Although those estimates, like the key estimate [6, (3.4)–(3.5)] in the theory of degenerate elliptic systems, involve repeated integrations by parts, the two sets of estimates are otherwise different in every possible manner.

However, a problem remains even for the terms in which the maximum number of derivatives applied to any one factor of the solution is at most the number of derivatives being estimated, because the total number of derivatives applied to all factors of the solution remains larger than for the hyperbolic case. The standard proof of the commutator estimate [11, Lemma A.1] for the product of two functions used in hyperbolic estimates yields the slightly more general result

$$\begin{aligned}&\sum _{\sum _{\ell } \arrowvert \alpha _\ell \arrowvert \le s+1,\max _\ell \arrowvert \alpha _\ell \arrowvert \le s} \int \prod _{\ell } [D^{\alpha _{\ell }} w^{(\ell )}]^2\,dx\le c\prod _{\ell }\Vert w^{(\ell )}\Vert _{H^s}^2 \nonumber \\&\quad \text {provided}\ s\ge s_0+1, \end{aligned}$$
(3)

where \(D^{\alpha _\ell }\) is a spatial derivative of order \(\arrowvert \alpha _\ell \arrowvert \) and

$$\begin{aligned} s_0\mathrel {:=}\lfloor \tfrac{d}{2}\rfloor +1 \end{aligned}$$
(4)

is the Sobolev embedding index, i.e., the smallest integer such that \(H^{s_0}\subset L^\infty \). The requirement that s be at least \(s_0+1\) is therefore a standard requirement in the local existence theory for nonlinear symmetric hyperbolic systems. In order to deal with the parabolic terms and in light of the fact that we need to estimate time as well as spatial derivatives, we require a generalization of (3) to the case when the total number of derivatives is \(s+2\) rather than \(s+1\), and moreover some of those derivatives may be with respect to time, although the integral and norms still involve integration only over the spatial variables. The required calculus estimate, which more generally allows the total number of derivatives to be \(s+r\) provided that \(s\ge s_0+r\), is presented in Lemma A.1 in Appendix A. The estimate proven there is also used in [12]. Since \(s+2\) derivatives appear in the parabolic terms when s derivatives are taken of the PDE, the condition on s in Lemma A.1 requires that s be at least \(s_0+2\), which will therefore be assumed in both Theorems 2.7 and 3.1.
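For example, in the cases \(d=2\) and \(d=3\),

$$\begin{aligned} s_0=\lfloor \tfrac{d}{2}\rfloor +1=2, \end{aligned}$$

so the standard hyperbolic requirement \(s\ge s_0+1\) becomes \(s\ge 3\), while the condition \(s\ge s_0+2\) assumed in Theorems 2.7 and 3.1 below becomes \(s\ge 4\).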

Since the presentation of the assumptions and the norms used in the theorems is somewhat lengthy, the full statements of the theorems on uniform bounds for (1) and for (2) are given in the same section as the corresponding proof. The uniform bound for (1) is proven in Sect. 2; for emphasis, the key estimates for the nonuniformly parabolic term are presented there in Lemma 2.3. The modifications needed to obtain uniform bounds for (2) using \(\varepsilon \)-weighted norms of time derivatives are shown in Sect. 3, both for the case when the first time derivative is uniformly bounded at time zero (known as the case of “well-prepared” initial data) and for the general case. Obtaining uniform estimates for (2) is more complicated than for purely hyperbolic systems having large constant-coefficient terms, on account of the need to estimate time derivatives in order to obtain bounds for spatial derivatives. Moreover, in contrast to the purely hyperbolic case, the estimates here for well-prepared initial data are more complicated than those for the general case. The key estimate (14) for the parabolic terms is the source of the chief difficulty, since it makes the second time derivative \(u_{tt}\), which is not uniformly bounded, appear in the estimate for the first time derivative \(u_t\) that we want to show to be uniformly bounded. In order to prove that \(u_t\) nevertheless remains uniformly bounded for positive times it is necessary to use an intricate system of \(\varepsilon \)-dependent weights.

Once the energy estimates are obtained, local-in-time existence of solutions to the initial-value problem for those equations, for a time independent of \(\varepsilon \) in the case of (2), then follows by a mollification argument as in [13, Section 5.2, Section 7.1]. Furthermore, the local-in-time convergence of solutions of (2) to corresponding solutions of appropriate limit or profile equations as \(\varepsilon \rightarrow 0+\) then follows as in [14] since the convergence parts of the theorems there only require uniform estimates and times of existence plus an easy \(L^2\) estimate for the difference of sufficiently smooth solutions of the limit system. The statements of those theorems and some indications of the proofs are presented in Sect. 4. The full details of the existence and convergence proofs are omitted since they are minor variations of those in the references mentioned above.

In addition to the existence and convergence results just described, several additional results follow from the methods here. First, uniform estimates can be obtained for two variants of (1)–(2):

  1.

    If the diagonal terms \((D^{j,k})_{\ell \ell }\) of the second-order coefficients depend only on t, x, and \(u_\ell \), the off-diagonal terms depend only on t and x, and all second-order coefficients \(D^{j,k}\) have one more bounded derivative than assumed below, then the second-order terms can be allowed to have the divergence form \(\sum _{j,k}\partial _{x_j}\!\left[ D^{j,k} \partial _{x_k}u\right] \). In particular, divergence form is allowed for scalar PDEs. The additional assumptions just mentioned are needed in order to eliminate via integration by parts the term in the Sobolev energy estimates in which \(\partial _{x_j}\) and all derivatives applied to the PDE are placed on the variable u appearing inside \(D^{j,k}\).

  2.

    The solution u can be required to satisfy a linear constant-coefficient constraint \(Lu=0\), enforced by adding the adjoint Lagrange multiplier term \(L^*\phi \) to the equation. This situation is familiar from the incompressible Navier-Stokes equations.

For brevity, further details of the extension to such variant equations are omitted. Applying both extensions yields uniform estimates for the incompressible Navier-Stokes equations with eddy viscosity considered in [15], even when the variable eddy-viscosity coefficient is merely non-negative, since the incompressibility constraint can be used to write the viscosity terms in symmetric form.

Finally, the results here also make it possible to generalize geometric optics results of [16] to the case when the second-order terms are not uniformly parabolic, by using scaled norms \(\Vert u\Vert _{L^2}+\sum _{1\le \ell +m\le s}\varepsilon ^{\ell +m-1}\Vert \partial _t^m u\Vert _{H^{\ell }}\). The verification that the bound obtained for the scaled norms is independent of \(\varepsilon \) is similar to the calculations in [16, Section IV] or [17, Proof of Theorem 3.6].

2 Uniform bounds

Since \(\partial _{x_j}\partial _{x_k}=\partial _{x_k}\partial _{x_j}\) we can ensure without loss of generality that

$$\begin{aligned} D^{j,k}=D^{k,j} \end{aligned}$$
(5)

by replacing \(D^{j,k}\) with \(\tfrac{1}{2}(D^{j,k}+D^{k,j})\) if necessary. The following assumptions will be made on \(D^{j,k}\) and the other coefficient matrices appearing in (1) and (2).

Assumption 2.1

  1.

    For every \(k<\infty \) there is a constant b(k) such that

    $$\begin{aligned} {\left\{ \begin{array}{ll} \Vert A^0(t,x,v)\Vert _{C^s([0,\infty )\times {\mathbb {R}}^d\times \{|v|\le k\})}\le b(k) &{}\text {for (1)} \\ \Vert A^0(v)\Vert _{C^s(\{|v|\le k\})}\le b(k) &{}\text {for (2)} \end{array}\right. } \end{aligned}$$
    (6:s)

    and

    $$\begin{aligned}&\Vert \{A^j(t,x,v)\}_{j=1}^d, \{D^{j,k}(t,x,v)\}_{j,k=1}^d\Vert _{C^s([0,\infty )\times {\mathbb {R}}^d\times \{|v|\le k\})}\\&\quad + \Vert F(t,x,v), {\textstyle \int _0^1\! \tfrac{\partial F}{\partial u}(t,x,r v)\,dr} \Vert _{C^s([0,\infty )\times {\mathbb {R}}^d\times \{|v|\le k\})} \le b(k), \end{aligned}$$
    (7:s)

    where s is a positive integer to be specified.

  2.

    The matrices \(A^j\) for \(0\le j\le d\), the matrices \(C^j\) for \(1\le j\le d\), and the matrices \(D^{j,k}\) for \(1\le j,k\le d\) are symmetric, and (5) holds.

  3.

    The matrix \(A^0\) is positive definite, i.e., for every \(k<\infty \) there is a \(\delta (k)>0\) such that

    $$\begin{aligned} {\left\{ \begin{array}{ll} w^T\!A^0(t,x,v)w\ge \delta (k) |w|^2 \quad \hbox { for}\ |v|\le k &{}\text {for (1)} \\ w^T\!A^0(v)w\ge \delta (k) |w|^2 \quad \hbox { for}\ |v|\le k &{}\text {for (2).} \end{array}\right. } \end{aligned}$$
    (8)
  4.

    The matrices \(D^{j,k}\) satisfy the non-strict Legendre condition

    $$\begin{aligned}&\sum _{j,k=1}^d (w^{(j)})^{\!T}\!D^{j,k}(t,x,u) w^{(k)}\ge 0 \nonumber \\&\text {for all}\ (t,x,u)\ \text {and all sets of vectors}\ \{w^{(j)}\}_{j=1}^d. \end{aligned}$$
    (9)

Remark 2.2

The non-strict Legendre condition (9) in Assumption 2.1 ensures that

$$\begin{aligned} -\int _{{\mathbb {R}}^d} \sum _{j,k=1}^d(\partial _{x_j}v(t,x))^TD^{j,k}(t,x,u(t,x))\partial _{x_k}v(t,x)\le 0, \end{aligned}$$
(10)

which will be vital for the proof of Theorem 2.7 below. In contrast, if the \(D^{j,k}\) are not constant then the non-strict Legendre-Hadamard condition

$$\begin{aligned} \sum _{j,k=1}^d\xi _j\xi _k w^TD^{j,k}w\ge 0 \qquad \text {for all real numbers}\ \{\xi _j\}_{j=1}^d\ \text {and vectors}\ w \end{aligned}$$
(11)

only implies the sharp Gårding inequality (e.g., [13, Section 0.7])

$$\begin{aligned} -\int _{{\mathbb {R}}^d}\sum _{j,k=1}^d (\partial _{x_j}v(t,x))^TD^{j,k}(t,x,u(t,x))\partial _{x_k}v(t,x)\le c\Vert v\Vert _{H^{1/2}}^2, \end{aligned}$$
(12)

which is not sufficient for the proof of Lemma 2.3 because there is no helpful contribution to compensate for the extra half derivative on the right side of (12).
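Indeed, (10) follows from (9) simply by taking \(w^{(j)}\mathrel {:=}\partial _{x_j}v(t,x)\) pointwise in x and integrating:

$$\begin{aligned} \int _{{\mathbb {R}}^d} \sum _{j,k=1}^d(\partial _{x_j}v)^TD^{j,k}(t,x,u)\,\partial _{x_k}v\ge 0, \end{aligned}$$

which is (10) after multiplication by \(-1\). No such pointwise substitution is available under (11), since the Legendre–Hadamard condition constrains only rank-one collections \(w^{(j)}=\xi _jw\).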

The following lemma contains the key estimates involving the nonuniformly parabolic terms that will be used for the troublesome case when more derivatives are applied to the solution than are being estimated.

Lemma 2.3

Let the spatial domain be the torus \({\mathbb {T}}^d\) or the whole space \({\mathbb {R}}^d\). Suppose that Assumption 2.1 holds, with (6:s)–(7:s) holding for \(s\mathrel {:=}s_0+2\). In addition, let u and v be functions in \(\cap _{m=0}^2 C^m([0,T_u];H^{s-m})\) and \(\cap _{m=0}^1 C^m([0,T_u];H^{2-m})\), respectively. Then

  1.
    $$\begin{aligned} \int \sum _{j,k}v^TD^{j,k}(t,x,u(t,x))\partial _{x_j}\partial _{x_k} v \le b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+2}})\Vert v\Vert _{L^2}^2, \end{aligned}$$
    (13)

    where here and later P denotes a polynomial, which may be different in different occurrences.

  2.
    $$\begin{aligned} \begin{aligned} 2&\int \sum _{j,k} (\partial _tv)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_j}\partial _{x_k}v \\&\le -\tfrac{d}{dt}\bigg [ \int \sum _{j,k} (\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v\bigg ] \\&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+1}})(1+\Vert u_t\Vert _{H^{s_0+1}})\Vert v_t\Vert _{L^2}\Vert v\Vert _{H^1} \\&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0}})(1+\Vert u_t\Vert _{H^{s_0}}^2+\Vert u_{tt}\Vert _{H^{s_0}})\Vert v\Vert _{H^1}^2, \end{aligned} \end{aligned}$$
    (14)

    where here and later \(\partial _t D^{j,k}(t,x,u(t,x))\) means

    $$\begin{aligned}{}[\partial _t(D^{j,k}(t,x,w)) +(u_t\cdot \nabla _w)(D^{j,k}(t,x,w))]{\big \arrowvert _{w=u(t,x)}} \end{aligned}$$

    and \(\partial _{x_\ell }D^{j,k}\) has an analogous meaning.

  3.
    $$\begin{aligned}&\int \sum _{j,k} (\partial _{x_\ell }v)^T (\partial _{x_m} D^{j,k}(t,x,u(t,x)))\partial _{x_j}\partial _{x_k}v \nonumber \\&\quad \le b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+2}})\Vert v\Vert _{H^1}^2. \end{aligned}$$
    (15)

Proof

To estimate the left side of (13), the two derivatives \(\partial _{x_j}\) and \(\partial _{x_k}\) must both be transferred from v to \(D^{j,k}\). That can be accomplished by using integration by parts, the product rule for derivatives, the consequence (10) of the assumed Legendre condition (9), the assumed symmetry of the coefficient matrices \(D^{j,k}\), the integration-by-parts identity

$$\begin{aligned} abc'+a'bc=(abc)'-ab'c, \end{aligned}$$
(16)

and the facts that the integral over \({\mathbb {R}}^d\) or \({\mathbb {T}}^d\) of a spatial derivative vanishes and that \(H^{s_0+2}\) is an algebra. This yields

$$\begin{aligned}&2\int \sum _{j,k}v^TD^{j,k}\partial _{x_j}\partial _{x_k} v = -2 \int \sum _{j,k}\partial _{x_j}\!\left[ v^TD^{j,k}\right] \! \partial _{x_k} v \nonumber \\&\quad =-2 \bigg \{\int \sum _{j,k}\left[ \partial _{x_j} v \right] ^T\!D^{j,k}\partial _{x_k} v +\int \sum _{j,k} v^T\!\left[ \partial _{x_j}D^{j,k}\right] \! \partial _{x_k} v \bigg \} \nonumber \\&\quad \le -2 \int \sum _{j,k}v^T\left[ \partial _{x_j}D^{j,k}\right] \partial _{x_k} v \nonumber \\&\quad =- \int \sum _{j,k} v^T\!\left[ \partial _{x_j}D^{j,k}\right] \!\partial _{x_k} v -\int \sum _{j,k} \left\{ \partial _{x_k} v^T\right\} \!\left[ \partial _{x_j}D^{j,k}\right] \! v \nonumber \\&\quad =- \sum _{j,k}\int \partial _{x_k} \left\{ v^T\left[ \partial _{x_j}D^{j,k}\right] v \right\} + \int v^T\bigg [ \sum _{j,k}\partial _{x_k}\partial _{x_j}D^{j,k}\bigg ] v \nonumber \\&\quad = \int v^T\bigg [ \sum _{j,k}\partial _{x_k}\partial _{x_j}D^{j,k}\bigg ] v \le \bigg \Vert \sum _{j,k}\partial _{x_k}\partial _{x_j}D^{j,k}\bigg \Vert _{L^\infty } \Vert v\Vert _{L^2}^2 \nonumber \\&\quad \le b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+2}})\Vert v\Vert _{L^2}^2, \end{aligned}$$
(17)

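In the passage from the fourth to the fifth displayed line of (17), the identity (16) is applied componentwise with \(a\mathrel {:=}v^T\), \(b\mathrel {:=}\partial _{x_j}D^{j,k}\), \(c\mathrel {:=}v\), and with the prime denoting \(\partial _{x_k}\), giving

$$\begin{aligned} v^T\!\left[ \partial _{x_j}D^{j,k}\right] \!\partial _{x_k} v +(\partial _{x_k} v)^T\!\left[ \partial _{x_j}D^{j,k}\right] \! v =\partial _{x_k}\!\left\{ v^T\!\left[ \partial _{x_j}D^{j,k}\right] \! v\right\} -v^T\!\left[ \partial _{x_k}\partial _{x_j}D^{j,k}\right] \! v; \end{aligned}$$

the exact derivative \(\partial _{x_k}\{\cdot \}\) then integrates to zero over \({\mathbb {R}}^d\) or \({\mathbb {T}}^d\).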

To estimate the left side of (14), first integrate by parts, using the fact that \(H^{s}\) is an algebra for \(s\ge s_0\) to estimate the term in which two derivatives are applied to the \(D^{j,k}\). Next, use the symmetry of the matrices \(D^{j,k}\), the symmetry (5) of \(D^{j,k}\) under interchange of the indices j, k, and a relabeling of those indices to write the term in which a time derivative is applied to \(\partial _{x_j}v\) as half the sum of terms in which a time derivative is applied once to \(\partial _{x_j}v\) and once to \(\partial _{x_k}v\). Then use the identity (16) to obtain a term in which a time derivative is applied to an entire integral, minus a term in which both time derivatives are applied to the coefficients \(D^{j,k}\). Upon estimating that last term in similar fashion to the previous term in which two derivatives were applied to \(D^{j,k}\), this yields

$$\begin{aligned} 2&\int \sum _{j,k} (\partial _tv)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_j}\partial _{x_k}v \\ {}&=-2 \int \sum _{j,k} (\partial _t\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\quad -2 \int \sum _{j,k} (\partial _tv)^T (\partial _{x_j}\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\le -2 \int \sum _{j,k} (\partial _t\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+1}})(1+\Vert u_t\Vert _{H^{s_0+1}})\Vert v_t\Vert _{L^2}\Vert v\Vert _{H^1} \\ {}&=- \int \sum _{j,k} (\partial _t\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\quad - \int \sum _{j,k} (\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _t\partial _{x_k}v \\ {}&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+1}})(1+\Vert u_t\Vert _{H^{s_0+1}})\Vert v_t\Vert _{L^2}\Vert v\Vert _{H^1} \\ {}&=-\tfrac{d}{dt}\int \sum _{j,k} (\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\quad +\int \sum _{j,k} (\partial _{x_j}v)^T (\partial _t^2 D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+1}})(1+\Vert u_t\Vert _{H^{s_0+1}})\Vert v_t\Vert _{L^2}\Vert v\Vert _{H^1} \\ {}&\le -\tfrac{d}{dt}\int \sum _{j,k} (\partial _{x_j}v)^T (\partial _t D^{j,k}(t,x,u(t,x)))\partial _{x_k}v \\ {}&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0}})(1+\Vert u_t\Vert _{H^{s_0}}^2+\Vert u_{tt}\Vert _{H^{s_0}})\Vert v\Vert _{H^1}^2 \\ {}&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+1}})(1+\Vert u_t\Vert _{H^{s_0+1}})\Vert v_t\Vert _{L^2}\Vert v\Vert _{H^1}. \end{aligned}$$

The process of estimating the left side of (15) is similar, but since spatial rather than time derivatives are involved, the term \(-\int \partial _{x_\ell } \left[ (\partial _{x_j}v)^T (\partial _{x_m} D^{j,k}(t,x,u(t,x)))\partial _{x_k}v\right] \), which vanishes upon integration, is obtained instead of the time derivative of an integral, and spatial derivatives rather than time derivatives are applied to v and u. Hence the estimate on the right side of (15) is obtained. \(\square \)

Before stating the theorem on uniform bounds for (1) it is necessary to define the expression that will satisfy a differential inequality and verify that it is the square of a norm.

Definition 2.4

For any nonnegative integer r and function \(u\in L^\infty \) define a weighted \(H^r\) norm by

$$\begin{aligned} \Vert v\Vert _{r,u}\mathrel {:=}\sqrt{\sum _{0\le \arrowvert \alpha \arrowvert \le r}\int _{{\mathbb {R}}^d} (D^\alpha v)^TA^0(t,x,u)(D^\alpha v)\,dx}, \end{aligned}$$
(18)

and for any \((u,u_t)\in L^\infty \) and any set \(W\mathrel {:=}\{w_m\}_{m=0}^s\) of positive constants, define

$$\begin{aligned} {\left| \left| \left| v \right| \right| \right| }_{s,u,W}\mathrel {:=}\sqrt{Q_{s,u,W}(v,v_t,\ldots , \partial _t^s v)}\,, \end{aligned}$$
(19)

where

$$\begin{aligned} \begin{aligned} Q_{s,u,W}(v^{(0)},\ldots , v^{(s)}) \mathrel {:=}\sum _{m=0}^{s-1} \Vert v^{(m)}\Vert _{s-1-m,u}^2 + \sum _{m=0}^s w_m \sum _{\arrowvert \alpha \arrowvert =s-m}\Vert D^\alpha v^{(m)}\Vert _{0,u}^2 \\+\sum _{m=0}^{s-1} (m+1)w_{m+1}\sum _{j,k=1}^d\sum _{\arrowvert \gamma \arrowvert = s-1-m} \int (\partial _{x_j}D^\gamma v^{(m)})^T [\partial _t D^{j,k}]\partial _{x_k}D^\gamma v^{(m)}. \end{aligned} \end{aligned}$$
(20)

Although the inclusion of the term involving \(\partial _t D^{j,k}\) in (20) may seem strange, that term will arise naturally in the proof of Theorem 2.7 on account of the use of estimate (14).
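For orientation, in the simplest case \(s=1\) (below the range allowed in the theorems, but convenient for display) the definition (20) reads

$$\begin{aligned} Q_{1,u,W}(v^{(0)},v^{(1)})&=\Vert v^{(0)}\Vert _{0,u}^2 +w_0\sum _{\arrowvert \alpha \arrowvert =1}\Vert D^\alpha v^{(0)}\Vert _{0,u}^2 +w_1\Vert v^{(1)}\Vert _{0,u}^2 \\&\quad +w_1\sum _{j,k=1}^d\int (\partial _{x_j}v^{(0)})^T[\partial _t D^{j,k}]\partial _{x_k}v^{(0)}, \end{aligned}$$

so the indefinite \(\partial _tD^{j,k}\) term is paired with the top-order spatial term \(w_0\sum _{\arrowvert \alpha \arrowvert =1}\Vert D^\alpha v^{(0)}\Vert _{0,u}^2\), against which it will be absorbed by means of the weight condition (22) below.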

Lemma 2.5

Let s be a positive integer, and assume that the positivity condition (8) holds and the boundedness conditions (6:s)–(7:s) hold for that value of s. Assume in addition that

$$\begin{aligned} \Vert u\Vert _{L^\infty }\le M_1 \qquad \text {and}\qquad \Vert u_t\Vert _{L^\infty }\le M_2, \end{aligned}$$
(21)

and that \(W\mathrel {:=}\{w_m\}_{m=0}^s\) are positive constants satisfying

$$\begin{aligned}&\max _{0\le m\le s-1}\frac{(m+1)w_{m+1}}{w_m}\le \frac{\mu }{d^2(1+NM_2)b(M_1)} \nonumber \\&\text {for some}\ \mu \ \text {satisfying}\ 0<\mu <\delta (M_1), \end{aligned}$$
(22)

where N denotes the number of components of the vector u, d is the number of spatial variables, and \(\delta (\cdot )\) and \(b(\cdot )\) are defined in Assumption 2.1. Then

  1.

    \(\Vert \cdot \Vert _{s,u}\) is a norm on \(H^s\) equivalent to the standard \(H^s\) norm; specifically,

    $$\begin{aligned} \delta (M_1)\Vert v\Vert _{H^s}\le \Vert v\Vert _{s,u}\le b(M_1)\Vert v\Vert _{H^s}. \end{aligned}$$
    (23)
  2.

    \(\sqrt{Q_{s,u,W}(v^{(0)},\ldots , v^{(s)})}\) is a norm on \(\prod _{m=0}^{s} H^{s-m}\) equivalent to the sum of the standard \(H^{s-m}\) norms of the components. Quantitatively,

    $$\begin{aligned} \begin{aligned} Q_{s,u,W}(v^{(0)},\ldots , v^{(s)})&\ge \delta (M_1) \sum _{m=0}^{s-1}\Vert v^{(m)}\Vert _{H^{s-1-m}}^2 +\delta (M_1) w_s \Vert v^{(s)}\Vert _{L^2}^2 \\ {}&\qquad +(\delta (M_1)-\mu )\sum _{m=0}^{s-1}w_m \sum _{\arrowvert \alpha \arrowvert =s-m}\Vert D^\alpha v^{(m)}\Vert _{L^2}^2 \end{aligned} \end{aligned}$$
    (24)

    and

    $$\begin{aligned} \begin{aligned} Q_{s,u,W}(v^{(0)},\ldots , v^{(s)})&\le b(M_1) \sum _{m=0}^{s-1}\Vert v^{(m)}\Vert _{H^{s-1-m}}^2+b(M_1) w_s\Vert v^{(s)}\Vert _{L^2}^2 \\ {}&\qquad +(b(M_1)+\mu )\sum _{m=0}^{s-1}w_m \sum _{\arrowvert \alpha \arrowvert =s-m}\Vert D^\alpha v^{(m)}\Vert _{L^2}^2. \end{aligned} \end{aligned}$$
    (25)

Proof

The bounds (23) follow directly from (21), (8), and (6:s). Straightforward estimation of the term involving \(D^{j,k}\) in (20) using (21) and (7:s) shows that

$$\begin{aligned} \begin{aligned}&\sum _{j,k=1}^d\sum _{\arrowvert \gamma \arrowvert = s-1-m}\bigg \arrowvert \int (\partial _{x_j}D^\gamma v^{(m)})^T [\partial _t D^{j,k}]\partial _{x_k}D^\gamma v^{(m)}\bigg \arrowvert \\&\quad \le d^2 \Vert \partial _t D^{j,k}\Vert _{L^\infty }\sum _{\arrowvert \alpha \arrowvert =s-m} \Vert D^\alpha v^{(m)}\Vert _{L^2}^2 \\&\quad \le d^2 (1+NM_2)b(M_1)\sum _{\arrowvert \alpha \arrowvert =s-m} \Vert D^\alpha v^{(m)}\Vert _{L^2}^2, \end{aligned} \end{aligned}$$
(26)

and (24)–(25) follow directly from (26), formula (20) for \(Q_{s,u,W}\), and the bounds (22), (8), and (6:s). \(\square \)
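Concretely, the absorption in the last step combines (26) with the weight condition (22): for each \(0\le m\le s-1\),

$$\begin{aligned} (m+1)w_{m+1}\bigg \arrowvert \sum _{j,k=1}^d\sum _{\arrowvert \gamma \arrowvert = s-1-m} \int (\partial _{x_j}D^\gamma v^{(m)})^T [\partial _t D^{j,k}]\partial _{x_k}D^\gamma v^{(m)}\bigg \arrowvert \le \mu \,w_m\sum _{\arrowvert \alpha \arrowvert =s-m}\Vert D^\alpha v^{(m)}\Vert _{L^2}^2, \end{aligned}$$

which is the source of the shifts by \(\mu \) in the coefficients in (24)–(25).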

In order to simplify the statement of the theorem, it will be convenient to state and prove a lemma estimating \(Q_{s,u,W}(u,u_t,\ldots ,\partial _t^s u)\) in terms of \(\Vert u\Vert _{H^{2s}}\) when u is a solution of (1). This result is only used to bound the initial value of Q.

Lemma 2.6

Let the assumptions of Lemma 2.5 hold, and assume that \(s\ge s_0\), where \(s_0\) is defined in (4). Then there exists a continuous increasing function \(C_{\tiny init }\) such that any solution u of (1) satisfies

$$\begin{aligned} {\left| \left| \left| u \right| \right| \right| }_{s,u,W}^2\le C_{{\tiny init }}(\Vert u\Vert _{H^{2s}}). \end{aligned}$$
(27)

Proof

Since \(H^r\) is an algebra for \(r\ge s_0\), solving the PDE (1) for \(u_t\), and then repeatedly differentiating with respect to t, substituting into the result the formulas for lower-order time derivatives obtained at earlier stages, and estimating an appropriate norm of the result yields a bound \(\sum _{m=0}^s \Vert \partial _t^mu\Vert _{H^{s-m}}\le C(\Vert u\Vert _{H^{2s}})\), because each time derivative adds at most two spatial derivatives. Combining this with (25) yields (27). \(\square \)
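For instance, the first step of the substitution scheme just described is to use the positivity (8) of \(A^0\) to write

$$\begin{aligned} u_t=(A^0)^{-1}\bigg (\sum _{j,k=1}^d D^{j,k}\partial _{x_j}\partial _{x_k}u+F(t,x,u)-\sum _{j=1}^d A^ju_{x_j}\bigg ), \end{aligned}$$

which, using the bounds on the coefficients and the algebra property, yields \(\Vert u_t\Vert _{H^{s-1}}\le C(\Vert u\Vert _{H^{s+1}})\); iterating yields \(\Vert \partial _t^mu\Vert _{H^{s-m}}\le C(\Vert u\Vert _{H^{s+m}})\le C(\Vert u\Vert _{H^{2s}})\) for \(0\le m\le s\).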

The theorem on bounds for (1) includes two parts: an estimate for \(Q_{s,u,W}\), and an estimate for \(\Vert u\Vert _{s,u}^2\) when the equation satisfies additional conditions that make it possible to eliminate \(u_t\) from that estimate. In the latter case an estimate for \(u_t\) can then be obtained by solving the PDE for \(u_t\) and estimating the expression so obtained. Since that expression contains second derivatives of u, the Sobolev index of the bound for \(u_t\) in that case is two less than the index of the bound for u. Although one might expect the same to hold in the first case as well, the expression \(Q_{s,u,W}\) actually includes only one less derivative of \(u_t\) than of u. The reason is that the estimate for derivatives of order s of u involves derivatives of order \(s-1\) of \(u_t\), so it is necessary to estimate \(u_t\) in a Sobolev space having index only one less than the index for u. As a consequence, in order to obtain an \(H^s\) estimate for u its initial data must belong to \(H^{2s}\), in accordance with the estimate (27).

Theorem 2.7

Let the spatial domain be the torus \({\mathbb {T}}^d\) or the whole space \({\mathbb {R}}^d\). Let s be an integer satisfying

$$\begin{aligned} s\ge s_0+2, \end{aligned}$$
(28)

where \(s_0\) is defined in (4). Suppose that Assumption 2.1 holds, with (6:s)–(7:s) holding for the given value of s, and that \(\sum _{m=0}^s \Vert \partial _t^m F(t,x,0)\Vert _{C^0([0,\infty );H^{s-m})}\) is finite.

  1.

    Let \(m_1\), \(m_2\), and \(m_3\) be arbitrary positive constants, and let \(M_1\), \(M_2\), and \(M_3\) satisfy

    $$\begin{aligned} M_1>m_1,\qquad M_2>m_2,\qquad \text {and}\qquad M_3>C_{{\tiny init }}(m_3). \end{aligned}$$
    (29)

    Then there exist positive weights \(W\mathrel {:=}\{w_m\}_{m=0}^s\) satisfying (22), a continuously differentiable nondecreasing function G, and a positive time T such that every solution u of (1) that satisfies

    $$\begin{aligned} \Vert u(0,\cdot )\Vert _{L^\infty }\le m_1, \qquad \Vert u_t(0,\cdot )\Vert _{L^\infty }\le m_2, \qquad \text {and}\qquad \Vert u(0,x)\Vert _{H^{2s}}\le m_3, \end{aligned}$$
    (30)

    and belongs to \(\cap _{m=0}^s C^m([0,T_u];H^{s-m})\) also satisfies the estimates

    $$\begin{aligned} \Vert u\Vert _{L^\infty }\le M_1 \qquad \text {and}\qquad \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$

    from (21) and the differential and integrated energy estimates

    $$\begin{aligned} \tfrac{d}{dt}{\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,W}^2\le G({\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,W}^2) \end{aligned}$$
    (31)

    and

    $$\begin{aligned} {\left| \left| \left| u \right| \right| \right| }_{s,u,W}^2\le U(t,C_{{\tiny init }}(m_3)) \le M_3 \end{aligned}$$
    (32)

    for \(0\le t\le \min (T,T_u)\), where \(U(t,U_0)\) is the unique solution of the ODE \(U'=G(U)\) satisfying \(U(0)=U_0\). The constant T, the weights W, and the function G are all independent of the particular solution u satisfying the above conditions.

  2.

    Assume that in addition either the PDE (1) is scalar or the matrix \(A^0\) is independent of x and u. Let \(m_1\) and \(m_3\) be arbitrary constants, and let \(M_1\) and \(M_3\) satisfy \(M_1>m_1\), and \(M_3>b(M_1)^2m_3^2\). Then there exist a constant \(M_2\), a continuous nondecreasing function \({{\widetilde{G}}}\), and a positive time \({{\widetilde{T}}}\) such that any sufficiently smooth solution u of (1) that satisfies

    $$\begin{aligned} \Vert u(0,\cdot )\Vert _{L^\infty }\le m_1 \qquad \qquad \text {and}\qquad \Vert u(0,x)\Vert _{H^{s}}\le m_3 \end{aligned}$$
    (33)

    and belongs to \(\cap _{m=0}^s C^m([0,T_u];H^{s-m})\) also satisfies the estimates

    $$\begin{aligned} \Vert u\Vert _{L^\infty }\le M_1 \qquad \text {and}\qquad \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$

    from (21) and the estimates

    $$\begin{aligned} \tfrac{d}{dt}\Vert u(t,\cdot )\Vert _{s,u}^2\le \widetilde{G}(\Vert u(t,\cdot )\Vert _{s,u}^2) \end{aligned}$$
    (34)

    and

    $$\begin{aligned} \Vert u\Vert _{s,u}^2\le \widetilde{U}(t,b(m_1)^2m_3^2) \le M_3 \end{aligned}$$
    (35)

    for \(0\le t\le \min ({{\widetilde{T}}},T_u)\), where \({{\widetilde{U}}}(t,U_0)\) is the unique solution of the ODE \({{\widetilde{U}}}'=\widetilde{G}({{\widetilde{U}}})\) satisfying \({{\widetilde{U}}}(0)=U_0\). The constants \(M_2\) and \({{\widetilde{T}}}\) and the function \({{\widetilde{G}}}\) are independent of the particular solution u satisfying the above conditions.

Proof

Let the assumptions of the first part hold. The standard method of mollifying u allows us to assume that as many derivatives of it as needed exist and belong to \(L^2\), which justifies the calculations below that involve placing more than s derivatives on u or use the fact that the integral of a spatial derivative of any expression vanishes even when that expression contains a derivative of order \(s+1\) of u.

The difference \(F(t,x,u)-F(t,x,0)\) can be written as \(\int _0^1 \partial _r F(t,x,r u)\,dr\), and the derivative inside the integral equals \(u\cdot \frac{\partial F}{\partial u}(t,x,r u)\), which yields the identity

$$\begin{aligned} F(t,x,u)=F(t,x,0)+H(t,x,u)u,\qquad \text {where}\ H(t,x,u)\mathrel {:=}\int _0^1 \left( \frac{\partial F}{\partial u}(t,x,r u)\right) ^T\,dr.\nonumber \\ \end{aligned}$$
(36)

Substitute (36) into (1) and take m time derivatives and a spatial derivative \(D^\alpha \) of the result, with \(0\le m+\arrowvert \alpha \arrowvert \le s\). Rewrite the expression so obtained as the sum of the terms in which all derivatives are applied to the derivatives of u already appearing in (1) plus commutator terms, multiply the result on the left by \(2(D^\alpha \partial _t^m u)^T\), integrate over the spatial domain, and use the symmetry of the \(A^j\) to obtain

$$\begin{aligned} 2\int v^TA(Dv)=\int v^TA(Dv)+\int (Dv)^TAv=\int D(v^TAv)-v^T(DA)v, \end{aligned}$$
(37)

where \(v=D^\alpha \partial _t^m u\), A is any of the matrices \(A^j\) for \(0\le j\le d\), and the operator D is \(\partial _t\) if \(j=0\) and is \(\partial _{x_j}\) if \(j>0\). After noting that \(\int \partial _{x_j}(v^TA^jv)=0\) by the periodicity of u and its derivatives or their decay at infinity, this yields

$$\begin{aligned} \begin{aligned}&\tfrac{d}{dt}\int (D^\alpha \partial _t^m u)^TA^0(D^\alpha \partial _t^m u) \\&\quad =\int (D^\alpha \partial _t^m u)^{\!T}\left[ \partial _t A^0+\sum _{j=1}^d \partial _{x_j}A^{j}\right] (D^\alpha \partial _t^mu) +2\int (D^\alpha \partial _t^m u)^T(D^\alpha \partial _t^mF(t,x,0)) \\&\qquad +2\int (D^\alpha \partial _t^m u)^T(D^\alpha \partial _t^m(Hu)) +2\int \sum _{j,k}(D^\alpha \partial _t^m u)^TD^{j,k}\partial _{x_j}\partial _{x_k}(D^\alpha \partial _t^m u) \\&\qquad -2\int (D^\alpha \partial _t^m u)^{\!T}\! \Big \{ [D^\alpha \partial _t^m,A^0]u_t+\sum _j [D^\alpha \partial _t^m,A^j]u_{x_j} -\sum _{j,k} [D^\alpha \partial _t^m,D^{j,k}]\partial _{x_j}\partial _{x_k}u \Big \}. \end{aligned} \end{aligned}$$
(38)
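For instance, the case \(j=0\) of (37), applied with \(v=D^\alpha \partial _t^m u\), \(A=A^0\), and \(D=\partial _t\), gives

$$\begin{aligned} 2\int v^TA^0v_t=\tfrac{d}{dt}\int v^TA^0v-\int v^T(\partial _tA^0)v, \end{aligned}$$

which produces the time-derivative term on the left of (38) and, after the \(\partial _tA^0\) term is moved to the right side, the first term on the right of (38).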

By the assumption on \(F(t,x,0)\), the Cauchy–Schwarz inequality and the elementary estimate \(2ab\le a^2+b^2\) can be used to obtain

$$\begin{aligned} 2\int (D^\alpha \partial _t^m u)^T D^\alpha \partial _t^m F(t,x,0)\le 2 \Vert \partial _t^mu\Vert _{H^{\arrowvert \alpha \arrowvert }}\Vert \partial _t^m F(t,x,0)\Vert _{H^{\arrowvert \alpha \arrowvert }} \le c+\Vert \partial _t^mu\Vert _{H^{\arrowvert \alpha \arrowvert }}^2. \end{aligned}$$
(39)

All the remaining terms on the right side of (38) have the form

$$\begin{aligned} c\int (D^\alpha \partial _t^m u_i) g(t,x,u)\prod _{\ell } D^{\alpha _\ell }\partial _t^{m_\ell } u_{i_{\ell }} \end{aligned}$$
(40)

where \(u_i\) is a component of u and g is some component of some derivative of a coefficient \(A^0\), \(A^j\), \(D^{j,k}\), or H. Because each term of the PDE (1) contains at most two derivatives, and at most s derivatives were applied to that equation,

$$\begin{aligned} \sum _\ell \arrowvert \alpha _\ell \arrowvert +m_\ell \le s+2. \end{aligned}$$
(41)

The terms in (40) that satisfy in addition

$$\begin{aligned} \arrowvert \alpha _\ell \arrowvert +m_{\ell }\le s\qquad \hbox { for all}\ \ell \end{aligned}$$
(42)

will be estimated using Lemma A.1. Since no derivatives are applied to the coefficients in the original PDE, the factors g in (40) involve at most derivatives of order s of the coefficients. Hence by the assumed bounds (6:s)–(7:s) for the coefficients and the Cauchy–Schwarz inequality,

$$\begin{aligned} \begin{aligned}&\text {[sum of all terms on right side of (38) having form (40) with (42) holding]} \\&\quad \le \sum _{{\tiny \text {terms (40) satisfying (42)}}} c \Vert g\Vert _{L^\infty } \int \arrowvert D^\alpha \partial _t^mu_i\arrowvert \,\bigg \arrowvert \prod _{\ell } D^{\alpha _\ell }\partial _t^{m_\ell } u_{i_{\ell }}\bigg \arrowvert \\&\quad \le \sum _{\{i_\ell \}, \{m_\ell \},\{\alpha _\ell \} \ \tiny {\text {satisfying }} (41),(42) } c\,b(\Vert u\Vert _{L^\infty }) \Vert D^\alpha \partial _t^mu\Vert _{L^2} \left\{ \int \prod _\ell [ D^{\alpha _\ell }\partial _t^{m_\ell } u_{i_{\ell }}]^2\right\} ^{1/2}. \end{aligned} \end{aligned}$$
(43)

Since the conditions (41)–(42) hold and s satisfies (28), applying Lemma A.1 with \(r\mathrel {:=}2\) shows that for all the terms appearing in (43)

$$\begin{aligned} \left[ \int \prod _\ell [ D^{\alpha _\ell }\partial _t^{m_\ell } u_{i_{\ell }}]^2\right] ^{1/2} \le \prod _\ell \Vert \partial _t^{m_\ell }u\Vert _{H^{s-m_\ell }}. \end{aligned}$$
(44)

Substituting (44) back into (43) yields the estimate

$$\begin{aligned} \begin{aligned}&\text {[sum of all terms on right side of (38) having form (40) with (42) holding]} \\&\quad \le c\,b(\Vert u\Vert _{L^\infty }) \Vert \partial _t^m u\Vert _{H^{\arrowvert \alpha \arrowvert }} \sum _{\{m_\ell \} \tiny \text {satisfying } m_\ell \le s, \sum m_\ell \le s+1} \prod _\ell \Vert \partial _t^{m_\ell }u\Vert _{H^{s-m_\ell }}, \end{aligned} \end{aligned}$$
(45)

where the restriction \(\sum m_\ell \le s+1\) in the final sum comes from the fact that only one time derivative appears in the PDE and at most s were applied to that equation.

The remaining terms on the right side of (38) have at least \(s+1\) derivatives applied to some occurrence of u, and all such terms involve the viscosity matrix \(D^{j,k}\) since the presence of the commutators in the terms involving \(A^0\) and \(A^j\) prevents all \(s+1\) derivatives in those terms from being applied to one factor. To treat the term in which \(D^{j,k}\) appears undifferentiated, apply (13) with \(v\mathrel {:=}D^\alpha \partial _t^m u \), which yields

$$\begin{aligned} 2\int \sum _{j,k}(D^\alpha \partial _t^m u)^TD^{j,k}\partial _{x_j}\partial _{x_k} (D^\alpha \partial _t^m u)\le b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+2}})\Vert \partial _t^m u\Vert _{H^{s-m}}^2. \end{aligned}$$
(46)

The other term involving \(D^{j,k}\) is the commutator term, which has the form

$$\begin{aligned} 2\sum _{j,k} \sum _{\begin{array}{c} 0\le \ell \le m,0\le \beta _i\le \alpha _i\\ \ell +\arrowvert \beta \arrowvert \ge 1 \end{array}}{\textstyle \left( {\begin{array}{c}m\\ \ell \end{array}}\right) \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) } \int (D^\alpha \partial _t^m u)^T (\partial _t^\ell D^\beta D^{j,k})\partial _t^{m-\ell }D^{\alpha -\beta }\partial _{x_j}\partial _{x_k}u, \end{aligned}$$
(47)

where \(\left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) \) is the product binomial coefficient \(\prod _{j=1}^d \left( {\begin{array}{c}\alpha _j\\ \beta _j\end{array}}\right) \). Since \(\ell +\arrowvert \beta \arrowvert \le m+\arrowvert \alpha \arrowvert \le s\), the terms for which \(m-\ell +\arrowvert \alpha \arrowvert -\arrowvert \beta \arrowvert +2\le s\) have already been estimated in (45), so only the cases for which

$$\begin{aligned} m+\arrowvert \alpha \arrowvert -[\ell +\arrowvert \beta \arrowvert ]\ge s-1 \end{aligned}$$
(48)

need be considered. Since \(m+\arrowvert \alpha \arrowvert \le s\) and \(\ell +\arrowvert \beta \arrowvert \ge 1\), (48) can only hold when \(m+\arrowvert \alpha \arrowvert =s\) and \(\ell +\arrowvert \beta \arrowvert =1\), i.e., when a total of s derivatives have been applied to the PDE and only one of those derivatives is applied to the coefficient \(D^{j,k}\). When the derivative applied to \(D^{j,k}\) is a spatial derivative, use (15) with \(v\mathrel {:=}D^{\alpha -\beta } \partial _t^m u\), which yields

$$\begin{aligned} \begin{aligned} 2&\sum _{j,k}\sum _{\begin{array}{c} 0\le \beta _i\le \alpha _i\\ \arrowvert \beta \arrowvert =1 \end{array}} {\textstyle \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) } \int (D^{\alpha } \partial _t^m u)^T (D^\beta D^{j,k})\partial _{x_j}\partial _{x_k}D^{\alpha -\beta }\partial _t^m u \\ {}&\le b(\Vert u\Vert _{L^\infty }) P(\Vert u\Vert _{H^{s_0+2}})\Vert \partial _t^m u\Vert _{H^{\arrowvert \alpha \arrowvert }}^2. \end{aligned} \end{aligned}$$
(49)

When the derivative applied to \(D^{j,k}\) is a time derivative, use (14) with \(v\mathrel {:=}D^\alpha \partial _t^{m-1} u\), which yields

$$\begin{aligned} \begin{aligned} 2&m\sum _{j,k} \int (D^\alpha \partial _t^m u)^T (\partial _t D^{j,k})\partial _{x_j}\partial _{x_k}D^\alpha \partial _t^{m-1} u \\ {}&\le - m\sum _{j,k}\tfrac{d}{dt}\int (\partial _{x_j} D^\alpha \partial _t^{m-1} u)^T (\partial _t D^{j,k})\partial _{x_k}D^\alpha \partial _t^{m-1} u \\ {}&\quad +b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0+1}}) (1+\Vert u_t\Vert _{H^{s_0+1}} )\Vert \partial _t^m u\Vert _{H^{\arrowvert \alpha \arrowvert }}\Vert \partial _t^{m-1}u\Vert _{H^{\arrowvert \alpha \arrowvert +1}} \\ {}&\quad + b(\Vert u\Vert _{L^\infty })P(\Vert u\Vert _{H^{s_0}})(1+\Vert u_t\Vert _{H^{s_0}}^2+\Vert u_{tt}\Vert _{H^{s_0}})\Vert \partial _t^{m-1} u\Vert _{H^{\arrowvert \alpha \arrowvert +1}}^2. \end{aligned} \end{aligned}$$
(50)

As noted above, only the case when \(\arrowvert \alpha \arrowvert +m=s\) needs to be estimated by (50).

Now pick \(\mu \) satisfying \(0<\mu <\delta (M_1)\). Then define \(w_0\mathrel {:=}1\) and successively choose the \(w_m\) for \(1\le m\le s\) satisfying (22) for the given values of \(M_1\), \(M_2\), and \(\mu \), so the conclusions of Lemma 2.5 will hold for as long as (21) holds. Multiply (38) by \(w_m\) if \(\arrowvert \alpha \arrowvert +m=s\) and by 1 otherwise, add the resulting equations, use the estimates (39), (45), (46), (49), and (50) to estimate the right side of the result, and move the time derivative terms arising from (50) to the left side. By the definition (19)–(20), the left side of the combined estimate is the time derivative of \({\left| \left| \left| u \right| \right| \right| }_{s,u,W}^2\). Hence, after replacing \(b(\Vert u\Vert _{L^\infty })\) by its upper bound \(b(M_1)\) and using the elementary estimate \(x\le 1+x^2\) to eliminate odd powers of norms, the combined estimate has the form

$$\begin{aligned} \tfrac{d}{dt}{\left| \left| \left| u \right| \right| \right| }_{s,u,W}^2 \le b(M_1)P(\textstyle {\sum _{m=0}^s} \Vert \partial _t^m u\Vert _{H^{s-m}}^2). \end{aligned}$$
(51)

The right side of (24) with \(v^{(m)}\) replaced by \(\partial _t^m u\) is bounded from above and below by constants times \(\sum _{m=0}^s \Vert \partial _t^mu\Vert _{H^{s-m}}^2\). Hence by Lemma 2.5 and the definition (19) of \({\left| \left| \left| u \right| \right| \right| }_{s,u,W}\), the right side of (51) can be bounded by a polynomial of \({\left| \left| \left| u \right| \right| \right| }_{s,u,W}^2\) as long as (21) holds. Once (21) has been shown to hold for some positive time this will yield (31).
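For reference, the comparison principle will be used in the following form: if E is continuously differentiable with \(E(0)\le U_0\) and \(E'(t)\le G(E(t))\), and G is continuously differentiable and nondecreasing, then

$$\begin{aligned} E(t)\le U(t,U_0)\qquad \text {for all}\ t\ge 0\ \text {for which both sides are defined}, \end{aligned}$$

where \(U(t,U_0)\) is, as in the statement of the theorem, the solution of \(U'=G(U)\) with \(U(0)=U_0\); here E is \({\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,W}^2\).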

By the comparison principle for ODEs, also known as the fence theorem [18, Theorem 4.7.1], as long as (21) holds, (31) together with (27) and (30) implies the first inequality in (32). In view of the last condition in (29), there exists a time \(T_3\) such that \(U(t,C_{{\tiny init }}(m_3))\le M_3\) for \(0\le t\le T_3\), which is the second inequality in (32). Moreover, the bounds

$$\begin{aligned} \Vert u\Vert _{L^\infty } \le m_1+t \sup _{[0,t]}\Vert u_t\Vert _{L^\infty } \quad \text {and}\quad \Vert u_t\Vert _{L^\infty }\le m_2+c t \sup _{[0,t]}{\left| \left| \left| u \right| \right| \right| }_{s,u,W} \end{aligned}$$
(52)

yield times \(T_1\mathrel {:=}\frac{M_1-m_1}{M_2}\) and \(T_2\mathrel {:=}\frac{M_2-m_2}{c \sqrt{M_3}}\) such that

$$\begin{aligned} \Vert u_t\Vert _{L^\infty }&\le M_2 \quad \text {on the time interval}\ 0\le t\le \min (T_2,T_u)\ \text {for as long as (32) holds,} \end{aligned}$$
(53)
$$\begin{aligned} \Vert u\Vert _{L^\infty }&\le M_1 \quad \text {on the time interval}\ 0\le t\le \min (T_1,T_u)\ \text {for as long as (53) holds,} \end{aligned}$$
(54)

where \(T_u\) is the time of existence of u defined in the statement of the theorem. Since the estimates (52) and (32) imply bounds smaller than \((M_1,M_2,M_3)\) for

$$\begin{aligned} (\Vert u\Vert _{L^\infty },\Vert u_t\Vert _{L^\infty }, {\left| \left| \left| u \right| \right| \right| }_{s,u,W}^2) \end{aligned}$$
(55)

for times less than the minimum of \(T\mathrel {:=}\min (T_1,T_2,T_3)\) and \(T_u\) as long as the bounds \((M_1,M_2,M_3)\) hold, the continuity of the expressions (55) implies that the bounds \((M_1,M_2,M_3)\), and hence also (31)–(32), remain valid on \([0,\min (T,T_u)]\).

Now assume that the additional hypothesis of the second part of the theorem holds. In this case we take only spatial derivatives of the PDE (1), so (38) is replaced by

$$\begin{aligned} \begin{aligned} \tfrac{d}{dt}&\int (D^\alpha u)^TA^0(D^\alpha u) \\ =&\int (D^\alpha u)^T\left[ \partial _t A^0+\sum _{j=1}^d \partial _{x_j}A^{j}\right] (D^\alpha u) +2\int (D^\alpha u)^T(D^\alpha F(t,x,0)) \\&+2\int (D^\alpha u)^T(D^\alpha (Hu)) +2\int \sum _{j,k}(D^\alpha u)^TD^{j,k}\partial _{x_j}\partial _{x_k}(D^\alpha u) \\&-2\int (D^\alpha u)^T \left\{ [D^\alpha ,A^0]u_t+\sum _j [D^\alpha ,A^j]u_{x_j} -\sum _{j,k} [D^\alpha ,D^{j,k}]\partial _{x_j}\partial _{x_k}u \right\} . \end{aligned} \end{aligned}$$
(56)

The terms on the right side of (56) that do not contain the time derivative \(u_t\) can be estimated in the same way as was done for (38), which yields a bound \(C(\Vert u\Vert _{H^s})\) since no time derivatives are present. When \(A^0\) is independent of x and u then \([D^\alpha ,A^0]\) vanishes, so there is no term on the right of (56) that involves a time derivative of u. When the equation is scalar, modify (56) by using the PDE (1) and the identity (36) to replace \([D^\alpha ,A^0]u_t\) in (56) with the equivalent expression

$$\begin{aligned}{}[D^\alpha ,A^0]\bigg \{ (A^0)^{-1}\bigg ( \sum _{j,k}D^{j,k}\partial _{x_j}\partial _{x_k}u+F(t,x,0)+H(t,x,u)u-\sum _j A^j u_{x_j}\bigg )\bigg \}. \end{aligned}$$
(57)

Since (57) is only used to eliminate \(u_t\) when the equation is scalar, the term \([D^\alpha ,A^0](A^0)^{-1}\partial _{x_j}D^{j,k}\) appearing in (57) is a \(1\times 1\) matrix, and so is automatically symmetric. Hence the term in the modified (56) in which that expression appears can be estimated in the same way that the term in (38) containing \([D^\alpha ,D^{j,k}]\partial _{x_j}\partial _{x_k}u\) was estimated. The terms in (56) arising from the remaining terms in (57) can also be estimated in the same manner as for the terms in (38). Since no time derivatives appear, summing the result over \(0\le \arrowvert \alpha \arrowvert \le s\) and proceeding in similar fashion to the general case yields (34) and (35). In particular, (52) is replaced by

$$\begin{aligned} \Vert u_t\Vert _{L^\infty }\le c \Vert u_t\Vert _{H^{s_0}}\le C(\Vert u\Vert _{H^{s_0+2}})\le C\left( \tfrac{\Vert u\Vert _{s,u}}{\delta (M_1)}\right) \le M_2\mathrel {:=}C\left( \frac{\sqrt{M_3}}{\delta (M_1)}\right) , \end{aligned}$$

where \(u_t\) has been estimated in terms of u in similar fashion to (27), and

$$\begin{aligned} \Vert u\Vert _{L^\infty }\le m_1+t\sup _{[0,t]}\Vert u_t\Vert _{L^\infty }\le m_1+ctM_2. \end{aligned}$$

\(\square \)

3 Uniform \(H^s\) bounds and weighted time-derivative bounds for singular limit equations

For the PDE (2) the norm (18) is naturally replaced by

$$\begin{aligned} \Vert v\Vert _{r,u,\varepsilon }\mathrel {:=}\sqrt{\sum _{0\le \arrowvert \alpha \arrowvert \le r}\int _{{\mathbb {R}}^d} (D^\alpha v)^TA^0(\varepsilon u)(D^\alpha v)\,dx}\,. \end{aligned}$$
(58)

In order to obtain uniform spatial bounds for solutions of (2), the expression Q whose time derivative is estimated must be modified to include powers of \(\varepsilon \) in the terms involving time derivatives. Just as for hyperbolic singular limits, for general initial data one factor of \(\varepsilon \) is applied to each time derivative, so \(\partial _t^mu\) is multiplied by \(\varepsilon ^m\). For hyperbolic singular limits the well-prepared case is treated by simply replacing that factor \(\varepsilon ^m\) by \(\varepsilon ^{\max (m-1,0)}\), but this does not quite work for the PDE (2), because the energy equation (38) for the case when \(m=1\) and \(\arrowvert \alpha \arrowvert =s-1\) would not be multiplied by a power of \(\varepsilon \), yet its estimate (50) contains a term with a factor \(u_{tt}\) that needs to be multiplied by \(\varepsilon \) to be uniformly bounded. Hence the term \(\sum _{\arrowvert \alpha \arrowvert =s-1}\Vert D^\alpha u_t\Vert _{0,u,\varepsilon }^2\) in Q must be multiplied by \(\varepsilon \), which will necessitate a more careful treatment of some of the estimates. Thus, for general initial data (19)–(20) are replaced by

$$\begin{aligned} {\left| \left| \left| v \right| \right| \right| }_{s,u,\varepsilon ,W}\mathrel {:=}\sqrt{Q_{s,u,\varepsilon ,W}(v,v_t,\ldots , \partial _t^s v)}\,, \end{aligned}$$
(59)

where

$$\begin{aligned} \begin{aligned}&Q_{s,u,\varepsilon ,W}(v^{(0)},\ldots , v^{(s)}) \mathrel {:=}\sum _{m=0}^{s-1} \varepsilon ^{2m}\Vert v^{(m)}\Vert _{s-1-m,u,\varepsilon }^2 \\&\quad + \sum _{m=0}^s w_m\varepsilon ^{2m} \sum _{\arrowvert \alpha \arrowvert =s-m}\Vert D^\alpha v^{(m)}\Vert _{0,u,\varepsilon }^2 \\&\quad +\sum _{m=0}^{s-1} (m+1)w_{m+1}\varepsilon ^{2(m+1)}\sum _{j,k=1}^d\sum _{\begin{array}{c} \arrowvert \gamma \arrowvert =\\ s-1-m \end{array}} \int (\partial _{x_j}D^\gamma v^{(m)})^T [\partial _t D^{j,k}]\partial _{x_k}D^\gamma v^{(m)}, \end{aligned} \end{aligned}$$
(60)

while for well-prepared initial data

$$\begin{aligned}&{\left| \left| \left| v \right| \right| \right| }_{s,u,\varepsilon ,W,{\tiny alt }}\mathrel {:=}\sqrt{Q_{s,u,\varepsilon ,W,{\tiny alt }}(v,v_t,\ldots , \partial _t^s v)}\,, \end{aligned}$$
(61)
$$\begin{aligned}&Q_{s,u,\varepsilon ,W,{\tiny alt }}(v^{(0)},\ldots , v^{(s)}) \mathrel {:=}\Vert v^{(0)}\Vert _{s-1,u,\varepsilon }^2+\sum _{m=1}^{s-1} \varepsilon ^{2(m-1)}\Vert v^{(m)}\Vert _{s-1-m,u,\varepsilon }^2\nonumber \\&\qquad + w_0 \sum _{\arrowvert \alpha \arrowvert =s}\Vert D^\alpha v^{(0)}\Vert _{0,u,\varepsilon }^2+\varepsilon w_1\sum _{\arrowvert \alpha \arrowvert =s-1}\Vert D^\alpha v^{(1)}\Vert _{0,u,\varepsilon }^2 \nonumber \\&\qquad +\sum _{m=2}^s w_m\varepsilon ^{2(m-1)} \sum _{\arrowvert \alpha \arrowvert =s-m}\Vert D^\alpha v^{(m)}\Vert _{0,u,\varepsilon }^2 \nonumber \\&\qquad +w_1\varepsilon \sum _{j,k=1}^d\sum _{\arrowvert \gamma \arrowvert = s-1} \int (\partial _{x_j}D^\gamma v^{(0)})^T [\partial _t D^{j,k}]\partial _{x_k}D^\gamma v^{(0)} \nonumber \\&\qquad +\sum _{m=1}^{s-1} (m+1)w_{m+1}\varepsilon ^{2m} \sum _{j,k=1}^d\sum _{\begin{array}{c} \arrowvert \gamma \arrowvert =\\ s-1-m \end{array}} \int (\partial _{x_j}D^\gamma v^{(m)})^T [\partial _t D^{j,k}]\partial _{x_k}D^\gamma v^{(m)} \end{aligned}$$
(62)

will be used. Calculations similar to those in the proof of Lemma 2.5 show that if

$$\begin{aligned} \varepsilon \Vert u\Vert _{L^\infty }\le M_1 \end{aligned}$$
(63)

then (23) remains valid when \(\Vert \,\Vert _{s,u}\) is replaced by \(\Vert \,\Vert _{s,u,\varepsilon }\), and if in addition, for positive \((M_2, \varepsilon _0,\mu )\) satisfying \(\mu <\delta (M_1)\),

$$\begin{aligned} \varepsilon \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$
(64)

and

$$\begin{aligned} \max _{0\le m\le s-1}\frac{(m+1)w_{m+1}}{w_m}\le \frac{\mu }{d^2\varepsilon _0(\varepsilon _0+NM_2)b(M_1)}, \end{aligned}$$
(65)

then (24)–(25) remain valid for \(0<\varepsilon \le \varepsilon _0\) when \(Q_{s,u,W}\) is replaced by \(Q_{s,u,\varepsilon ,W}\) and \(v^{(m)}\) for \(0\le m\le s\) is replaced by \(\varepsilon ^m v^{(m)}\) on the right sides of those inequalities. Similarly, if instead

$$\begin{aligned} \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$
(66)

and

$$\begin{aligned}&\max _{0\le m\le s-1}\frac{\varepsilon _0^{p(m)}(m+1)w_{m+1}}{w_m}\le \frac{\mu }{d^2(1+NM_2)b(M_1)} \nonumber \\&\text { with}\ p(m)\mathrel {:=}{\left\{ \begin{array}{ll} 1&{}m\le 1\\ 2 &{}m\ge 2\end{array}\right. }, \end{aligned}$$
(67)

then (24)–(25) remain valid for \(0<\varepsilon \le \varepsilon _0\) when \(Q_{s,u,W}\) is replaced by \(Q_{s,u,\varepsilon ,W,\tiny \text { alt}}\) and powers of \(\varepsilon \) matching the corresponding powers on the first two lines of (62) are inserted into the right sides of those inequalities.

In similar fashion to (27), there exist continuous increasing functions \(C_{\tiny \text {init},\varepsilon }\), \(C_*\), and \(C_{\tiny \text { init},\varepsilon ,\tiny \text {alt}}\) such that when Assumption 2.1, (63), (64), and (65) hold and \(s\ge s_0\), where \(s_0\) is defined in (4), solutions of (2) satisfy

$$\begin{aligned} {\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2\le C_{\tiny \text {init},\varepsilon }(\Vert u\Vert _{H^{2s}}), \end{aligned}$$
(68)

while if Assumption 2.1, (63), (66), and (67) hold then solutions of (2) satisfy

$$\begin{aligned} \Vert u_t\Vert _{H^{2(s-1)}}\le C_*\bigg (\Vert \tfrac{1}{\varepsilon }\textstyle {\sum _j} C^j\partial _{x_j} u\Vert _{H^{2(s-1)}},\Vert u\Vert _{H^{2s}}\bigg ) \end{aligned}$$
(69)

and

$$\begin{aligned} {\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W,\tiny alt }^2\le C_{\tiny \text {init},\varepsilon ,\tiny \text {alt}}(\Vert u\Vert _{H^{2s}},\Vert u_t\Vert _{H^{2(s-1)}}). \end{aligned}$$
(70)

Like (27), (68) is obtained by solving the PDE for \(u_t\) and then taking time derivatives of the result and using the formulas obtained previously to express time derivatives in terms of u and its spatial derivatives. Similarly, (70) is obtained by refraining from substituting for \(u_t\) in terms of u but substituting for \(u_{tt}\) and higher time derivatives in terms of u and \(u_t\), while (69) is obtained by solving (2) for \(u_t\) and estimating the result while keeping the large term intact.
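As a minimal sketch of the derivation of (69): solving (2) for \(u_t\) while keeping the large term intact gives

$$\begin{aligned} u_t=A^0(\varepsilon u)^{-1}\bigg (\sum _{j,k=1}^d D^{j,k}\partial _{x_j}\partial _{x_k}u+F(t,x,u) -\sum _{j=1}^d A^ju_{x_j}-\tfrac{1}{\varepsilon }\sum _{j=1}^d C^ju_{x_j}\bigg ), \end{aligned}$$

and estimating the right side in \(H^{2(s-1)}\) without expanding the factor \(\tfrac{1}{\varepsilon }\sum _j C^ju_{x_j}\) yields (69).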

The theorem for (2) includes three parts: The first part is an estimate for \(Q_{s,u,\varepsilon ,W}\) when \(u_t\) is not assumed to be uniformly bounded at time zero, and the second is an estimate for \(Q_{s,u,\varepsilon ,W,\tiny \text {alt}}\) assuming well-prepared initial data. The final part of the theorem is an estimate for \(\Vert u\Vert _{s,u,\varepsilon }^2\) when the equation satisfies additional conditions that make it possible to eliminate \(u_t\) from that estimate. In that case, a uniform estimate for \(\varepsilon u_t\) can then be obtained by solving the PDE for \(u_t\) and estimating the expression so obtained. If the initial data is well prepared, then a uniform estimate for \(u_t\) without a power of \(\varepsilon \) can be obtained by differentiating the PDE with respect to t and using the result to estimate \(\Vert u_t\Vert _{s-2,u,\varepsilon }^2\) in similar fashion to the estimate for \(\Vert u\Vert _{s,u,\varepsilon }^2\).

Theorem 3.1

Let the spatial domain be the torus \({\mathbb {T}}^d\) or the whole space \({\mathbb {R}}^d\). Let s be an integer satisfying \(s\ge s_0+2\), where the Sobolev embedding index \(s_0\) is defined in (4). Suppose also that Assumption 2.1 holds, with (6:s)–(7:s) holding for the given value of s, and that \(\sum _{m=0}^s \Vert \partial _t^m F(t,x,0)\Vert _{C^0([0,\infty );H^{s-m})}\) is finite.

  1.

    Let \(\varepsilon _0\), \(m_1\), and \(m_3\) be arbitrary positive constants, and let \(M_1\) and \(M_3\) satisfy \(M_1>m_1\) and \(M_3>C_{\tiny \text {init},\varepsilon }(m_3)\). Then there exist a positive \(M_2\), positive weights \(W\mathrel {:=}\{w_m\}_{m=0}^s\) satisfying (65), a continuously differentiable nondecreasing function G, and a positive time T such that every solution u of (2) with \(0<\varepsilon \le \varepsilon _0\) that satisfies

    $$\begin{aligned} \varepsilon \Vert u(0,\cdot )\Vert _{L^\infty }\le m_1 \qquad \text {and}\qquad \Vert u(0,x)\Vert _{H^{2s}}\le m_3 \end{aligned}$$
    (71)

    and belongs to \(\cap _{m=0}^s C^m([0,T_u];H^{s-m})\) also satisfies

    $$\begin{aligned} \varepsilon \Vert u\Vert _{L^\infty }\le M_1\qquad \text {and}\qquad \varepsilon \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$

from (63)–(64),

    $$\begin{aligned} \tfrac{d}{dt}{\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,\varepsilon ,W}^2\le G({\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,\varepsilon ,W}^2), \end{aligned}$$
    (72)

    and

    $$\begin{aligned} {\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2\le U(t,C_{\tiny \text {init},\varepsilon }(m_3) ) \le M_3 \end{aligned}$$
    (73)

    for \(0\le t\le \min (T,T_u)\), where \(U(t,U_0)\) is the unique solution of the ODE \(U'=G(U)\) satisfying \(U(0)=U_0\). The constants \(M_2\) and T, the weights W, and the function G are all independent of \(\varepsilon \in (0,\varepsilon _0]\) and of the particular solution u satisfying the above conditions.

  2.

    Let \(\varepsilon _0\), \(m_1\), \(m_3\), and \(m_4\) be arbitrary positive constants, and let \(M_1\) and \(M_3\) satisfy \(M_1>m_1\) and \(M_3>C_{\tiny \text {init},\varepsilon ,\tiny \text {alt}}(m_3,C_*(m_4,m_3))\). Then there exist a positive \(M_2\), positive weights \(W\mathrel {:=}\{w_m\}_{m=0}^s\) satisfying (67), a continuously differentiable nondecreasing function G, and a positive time T such that every solution u of (2) with \(0<\varepsilon \le \varepsilon _0\) that satisfies (71) and the well-preparedness condition

    $$\begin{aligned} \Vert \textstyle {\sum _j} C^j \partial _{x_j}u(0,x)\Vert _{H^{2(s-1)}}\le m_4\varepsilon \end{aligned}$$
    (74)

    and belongs to \(\cap _{m=0}^s C^m([0,T_u];H^{s-m})\) also satisfies

    $$\begin{aligned} \varepsilon \Vert u\Vert _{L^\infty }\le M_1\qquad \text {and}\qquad \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$

    which are conditions (63) and (66),

    $$\begin{aligned} \tfrac{d}{dt}{\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,\varepsilon ,W}^2\le G({\left| \left| \left| u(t,\cdot ) \right| \right| \right| }_{s,u,\varepsilon ,W}^2), \end{aligned}$$
    (75)

    and

    $$\begin{aligned} {\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2\le U(t,C_{\tiny \text {init},\varepsilon ,\tiny \text {alt}}(m_3,C_*(m_4,m_3)))\le M_3 \end{aligned}$$
    (76)

    for \(0\le t\le \min (T,T_u)\), where \(U(t,U_0)\) is the unique solution of the ODE \(U'=G(U)\) satisfying \(U(0)=U_0\). The constants \(M_2\) and T, the weights W, and the function G are all independent of \(\varepsilon \in (0,\varepsilon _0]\) and of the particular solution u satisfying the above conditions.

  3.

    Assume in addition that either the PDE (2) is scalar or the matrix \(A^0\) is constant. Let \(m_1\) and \(m_3\) be arbitrary positive constants, and let \(M_1\) and \(M_3\) satisfy \(M_1>m_1\) and \(M_3>b(M_1)^2m_3^2\). Then there exist a constant \(M_2\), a continuous nondecreasing function \({{\widetilde{G}}}\), and a positive time \({{\widetilde{T}}}\) such that every sufficiently smooth solution u of (2) that satisfies (71) and belongs to \(\cap _{m=0}^s C^m([0,T_u];H^{s-m})\) also satisfies

    $$\begin{aligned} \varepsilon \Vert u\Vert _{L^\infty }\le M_1\qquad \text {and}\qquad \varepsilon \Vert u_t\Vert _{L^\infty }\le M_2 \end{aligned}$$

    which are conditions (63) and (64),

    $$\begin{aligned} \tfrac{d}{dt}\Vert u(t,\cdot )\Vert _{s,u,\varepsilon }^2\le \widetilde{G}(\Vert u(t,\cdot )\Vert _{s,u,\varepsilon }^2), \end{aligned}$$
    (77)

    and

    $$\begin{aligned} \Vert u\Vert _{s,u,\varepsilon }^2\le \widetilde{U}(t,b(m_1)^2m_3^2) \le M_3 \end{aligned}$$
    (78)

    for \(0\le t\le \min ({{\widetilde{T}}},T_u)\), where \({{\widetilde{U}}}(t,U_0)\) is the unique solution of the ODE \({{\widetilde{U}}}'=\widetilde{G}({{\widetilde{U}}})\) satisfying \({{\widetilde{U}}}(0)=U_0\). The constants \(M_2\) and \({{\widetilde{T}}}\) and the function \({{\widetilde{G}}}\) are independent of the particular solution u satisfying the above conditions.
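In each part, the function \(U(t,U_0)\) (or \({{\widetilde{U}}}(t,U_0)\)) is obtained by comparison with a scalar ODE. As an illustration of how the existence time arises, suppose G were affine, say \(G(U)=a+bU\) with positive constants a and b (a model case only; the proofs produce a nonlinear G): then

$$\begin{aligned} U(t,U_0)=\Big (U_0+\frac{a}{b}\Big )e^{bt}-\frac{a}{b}, \end{aligned}$$

so \(U(t,U_0)\) increases continuously from \(U_0\), and since the initial value lies below \(M_3\) in each part, a positive time T with \(U(T,U_0)\le M_3\) exists and shrinks as \(U_0\) approaches \(M_3\) or as a and b grow.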

Proof

Assume that the conditions of the first part hold. The derivation of the estimate (72) for solutions of (2) proceeds in similar fashion to the derivation of estimate (31) for solutions of (1). Since the matrices \(C^j\) are symmetric and constant, (37) implies that the terms involving them drop out of the energy estimates, so (38) still holds. The terms on the right side of (38) are estimated in the same way as before. This time choose the \(w_m=w_m(M_2)\) to satisfy (65) for the given \(M_1\) and for a value \(\mu =\mu (M_2)\) to be determined later. Multiply (38) by \(w_m\varepsilon ^{2m}\) if \(\arrowvert \alpha \arrowvert +m=s\) and by \(\varepsilon ^{2m}\) otherwise. As before, add the resulting equations, use the estimates shown in the proof of Theorem 2.7 to estimate the right side of the result, and move the time derivatives to the left side. On account of the powers of \(\varepsilon \), this time the resulting left side is the time derivative of \({\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2\). The key point is that the total number of time derivatives of u appearing in each term on the right side is at most the power of \(\varepsilon \) by which that term is multiplied: every term in (38) either had 2m time derivatives, or had \(2m+1\) time derivatives multiplied by a derivative of \(A^0\), which contributes a factor of \(\varepsilon \) thanks to the assumed form of that matrix, so the multiplication by \(\varepsilon ^{2m}\) makes the powers of \(\varepsilon \) balance the number of time derivatives (some of which may be applied to the explicit time dependence of coefficients rather than to u). Hence instead of (51) we obtain, provided (63)–(64) hold,

$$\begin{aligned} \tfrac{d}{dt}{\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2&\le b(M_1)P(\textstyle {\sum _{m=0}^s}\varepsilon ^{2m}\Vert \partial _t^m u\Vert _{H^{s-m}}^2) \\&\le b(M_1){{\widetilde{P}}}_{M_1,\mu }({\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2)\mathrel {:=}G({\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2), \end{aligned}$$

which is (72), from which (73) follows for some time \(T_3\). To define \(M_2\) so that (64) holds, note that

$$\begin{aligned} \varepsilon \Vert u_t\Vert _{L^\infty }\le c \varepsilon \Vert u_t\Vert _{H^{s_0}}\le \tfrac{c\varepsilon }{\delta (M_1)}\Vert u_t\Vert _{s-2,u}\le \tfrac{c}{\delta (M_1)}{\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}\le \tfrac{c\sqrt{M_3}}{\delta (M_1)} \end{aligned}$$

with the final constant c independent of the W and hence of \(M_2\), because the weights W do not appear in the terms of \({\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}\) involving at most \(s-1\) space and time derivatives. Hence defining \(M_2\mathrel {:=}\tfrac{c\sqrt{M_3}}{\delta (M_1)}\) ensures that (64) holds as long as (63) and (73) hold. Finally, since it is only necessary to bound \(\varepsilon \Vert u\Vert _{L^\infty }\), in similar fashion to the proof of Theorem 2.7,

$$\begin{aligned} \varepsilon \Vert u\Vert _{L^\infty }\le m_1+\varepsilon t\sup _{[0,t]}\Vert u_t\Vert _{L^\infty }\le m_1+ctM_2, \end{aligned}$$

so (63) holds for some positive time \(T_1\). Since that estimate and (73) show that \(\varepsilon \Vert u\Vert _{L^\infty }\) and \({\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2\) remain strictly below \(M_1\) and \(M_3\) for such times, all the claimed results hold up to the minimum of \(T\mathrel {:=}\min (T_1,T_3)\) and \(T_u\).
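The balancing mechanism in the argument above can be seen in a model term: a contribution carrying \(2m+1\) time derivatives of u, split as \(\partial _t^{m+1}u\) times \(\partial _t^m u\), together with the factor of \(\varepsilon \) produced by the differentiated \(A^0\) and the multiplier \(\varepsilon ^{2m}\), satisfies

$$\begin{aligned} \varepsilon ^{2m+1}\Vert \partial _t^{m+1}u\Vert _{H^{s-m-1}}\Vert \partial _t^m u\Vert _{H^{s-m}}\le \tfrac{1}{2}\varepsilon ^{2(m+1)}\Vert \partial _t^{m+1}u\Vert _{H^{s-m-1}}^2+\tfrac{1}{2}\varepsilon ^{2m}\Vert \partial _t^m u\Vert _{H^{s-m}}^2, \end{aligned}$$

and both terms on the right are controlled by \(\sum _{m=0}^s\varepsilon ^{2m}\Vert \partial _t^m u\Vert _{H^{s-m}}^2\) and hence by \({\left| \left| \left| u \right| \right| \right| }_{s,u,\varepsilon ,W}^2\). This term is schematic only; the actual terms in (38) also carry coefficient factors, which are absorbed into G.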

Now assume that the conditions of the second part hold. Proceed as for the first case, but multiply (38) by the factors of \(\varepsilon \) corresponding to those in definition (62). Since the terms were all estimated in the proof of Theorem 2.7, in order to obtain (75) it suffices to verify that the powers of \(\varepsilon \) balance appropriately. In the estimates (39), (46), and (49) the only expressions on the right sides involving time derivatives have the same number of time and space derivatives as appear on the left sides, so the powers of \(\varepsilon \) balance exactly. In (50) the factor of \(\Vert \partial _t^{m-1}u\Vert _{H^{\arrowvert \alpha \arrowvert +1}}^2\) on the right side requires at least one fewer power of \(\varepsilon \) than the equation was multiplied by, because that estimate is used only when \(\arrowvert \alpha \arrowvert +m=s\), which implies that the equation was multiplied by \(\varepsilon \) when \(m=1\); indeed, that factor is used in the terms involving spatial derivatives of order \(s-1\) of \(u_t\) precisely for this reason. This factor of \(\varepsilon \) balances the factor \(u_{tt}\) that is also present on the right side of (50), so that estimate is also \(\varepsilon \)-balanced. This leaves only (45), which is more complicated and will be separated into a number of cases. Note that the ratio of the power of \(\varepsilon \) to the number of time derivatives in the weights in \(Q_{s,u,\varepsilon ,W,\tiny \text {alt}}\) is nondecreasing as a function of the number of time derivatives, even when the weight of \(u_t\) is taken to be \(\tfrac{1}{2}\) as it is in the terms with \(s-1\) space derivatives. Hence splitting the time derivative \(\partial _t^m\) over several factors of u never increases the power of \(\varepsilon \) needed to balance the result (see the identity displayed below). That implies that the powers of \(\varepsilon \) in the terms in (45) arising from the \(A^j\), H, or the \(D^{j,k}\) balance properly when either \(m\ne 1\), or \(m=1\) and \(\arrowvert \alpha \arrowvert =s-1\). The case when \(m=1\) and \(\arrowvert \alpha \arrowvert <s-1\) must be considered separately, because the powers of \(\varepsilon \) balance in that case only if the number of spatial derivatives applied to \(u_t\) in the estimate is at most \(s-1\), so that no power of \(\varepsilon \) is needed for balance. That indeed holds for the terms in (45) arising from the \(A^j\) or H, since at most one derivative is added by the PDE and the derivatives are split over at least two factors. The terms in (45) arising from \(A^0\) have one more time derivative and one less spatial derivative than the terms arising from \(A^j\), but also have an extra power of \(\varepsilon \) since \(A^0\) is always differentiated, and that ensures that the terms balance, since adding a time derivative to \(D^\gamma \partial _t^ku\) increases the required power of \(\varepsilon \) by at most one. This leaves to consider only the terms in (45) with \(m=1\) and \(\arrowvert \alpha \arrowvert \le s-2\) that arise from \(D^{j,k}\). For those terms, the product \(g\prod D^{\alpha _\ell }\partial _t^{m_\ell }u_{i_\ell }\) appearing in (40) comes from \(D^\alpha \partial _t(D^{j,k}\partial _{x_j}\partial _{x_k}u)\). Suppose that the time derivative is applied to \(D^{j,k}\). If \(\partial _t\) is applied to the explicit time dependence in \(D^{j,k}\) then there are no factors of \(u_t\), so no power of \(\varepsilon \) is needed.
If \(\partial _t\) is applied to u in \(D^{j,k}\) then define \(v=\partial _{x_k}u\) and apply Lemma A.1 with \(s_*\mathrel {:=}s-1\) and \(r\mathrel {:=}1\) to obtain the bound \(\Vert u_t\Vert _{H^{s_*-1}}\Vert v\Vert _{H^{s_*}}\Vert u\Vert _{H^{s_*}}^{L-2}\), which is at most \(\Vert u_t\Vert _{H^{s-2}}\Vert u\Vert _{H^s}\Vert u\Vert _{H^{s-1}}^{L-2}\) and so does not require a power of \(\varepsilon \) to balance it, since the number of spatial derivatives applied to \(u_t\) is less than \(s-1\). The case when \(\partial _t\) is applied to \(\partial _{x_j}\partial _{x_k}u\) but either \(\arrowvert \alpha \arrowvert \le s-4\), or else \(\arrowvert \alpha \arrowvert =s-3\) and at least one spatial derivative is applied to \(D^{j,k}\), can be estimated similarly. The case when \(\arrowvert \alpha \arrowvert =s-2\) and all derivatives are applied to \(\partial _{x_j}\partial _{x_k}u\) is always estimated by (17) with \(v\mathrel {:=}D^\alpha \partial _t^m u\) rather than by (45), since \(D^\alpha \partial _{x_j}\partial _{x_k}\partial _tu\) has more than s derivatives applied to u. The case when \(\arrowvert \alpha \arrowvert =s-3\) and all derivatives are applied to \(\partial _{x_j}\partial _{x_k}u\) can also be estimated by (17) with \(v\mathrel {:=}D^\alpha \partial _t^m u\), and as already noted that estimate is \(\varepsilon \)-balanced. Finally, the case when \(\arrowvert \alpha \arrowvert =s-2\) and only one spatial derivative is applied to \(D^{j,k}\) can be estimated by (49), which is also \(\varepsilon \)-balanced. Since all terms have now been estimated in balanced fashion, choosing weights \(w_m(M_2)\) to satisfy (67), adding the estimates, and moving the time derivative term from (50) to the left side of the result yields (75), provided (63) and (66) hold. Those bounds can be established as in the proof of the first part.
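The splitting assertion used above reduces to an elementary identity: if \(\partial _t^m\) is distributed over factors as \(\partial _t^{m_1},\dots ,\partial _t^{m_L}\) with \(\sum _\ell m_\ell \le m\), then

$$\begin{aligned} \varepsilon ^{m}\prod _{\ell =1}^L\Vert \partial _t^{m_\ell }u\Vert _{H^{r_\ell }}=\varepsilon ^{\,m-\sum _\ell m_\ell }\prod _{\ell =1}^L\Big (\varepsilon ^{m_\ell }\Vert \partial _t^{m_\ell }u\Vert _{H^{r_\ell }}\Big ), \end{aligned}$$

where the \(r_\ell \) denote whatever spatial indices arise in a given estimate; since \(0<\varepsilon \le \varepsilon _0\), the leftover power \(\varepsilon ^{\,m-\sum _\ell m_\ell }\) is harmless, and each remaining factor carries exactly the weight it has in the norms being estimated.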

Finally, let the additional conditions of the final part hold. Since \(A^0\) in (2) depends on \(\varepsilon u\), and in the commutator term \([D^\alpha , A^0]u_t\) at least one derivative is applied to \(A^0\), that term contains a factor \(\varepsilon \). Solving (2) for \(u_t\) and multiplying the result by \(\varepsilon \) yields an expression that is bounded uniformly in \(\varepsilon \), so after substituting for \(\varepsilon u_t\) the resulting terms can be estimated as for (1), which yields the desired bounds. Also, solving the PDE (2) for \(u_t\) and using the bound (78) to estimate the result yields the required bound (64) for \(\varepsilon \Vert u_t\Vert _{L^\infty }\) after defining \(M_2\) appropriately. \(\square \)
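The expression used twice in that last step can be written out: solving (2) for \(u_t\) and multiplying by \(\varepsilon \) gives

$$\begin{aligned} \varepsilon u_t=A^0(\varepsilon u)^{-1}\bigg [-\sum _{j=1}^d C^ju_{x_j}+\varepsilon \bigg (\sum _{j,k=1}^d D^{j,k}(t,x,u)\partial _{x_j}\partial _{x_k}u+F(t,x,u)-\sum _{j=1}^d A^j(t,x,u)u_{x_j}\bigg )\bigg ], \end{aligned}$$

where \(A^0\) is invertible because the first-order part is symmetric hyperbolic. The singular term has become O(1), and the bound (78) controls every factor on the right uniformly in \(\varepsilon \), which is why the resulting estimate for \(\varepsilon \Vert u_t\Vert _{L^\infty }\) is uniform.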

4 Existence and convergence results

Theorem 4.1

Under the conditions of any part of Theorem 2.7 or Theorem 3.1, there exists a unique solution of the PDE (1) or (2), respectively, with a specified initial value \(u(0,x)=u_0(x)\), at least up to the time T or \({{\widetilde{T}}}\) determined in those theorems, satisfying the bounds specified there.

Idea of the proof

The solution to the initial value problem for (1) can be obtained as the limit as \(\delta \rightarrow 0+\) of the solution \(u^\delta \) of

$$\begin{aligned} \begin{aligned} A^0(t,x,J_\delta u^\delta ) (u^\delta )_t&+J_\delta \sum _{j=1}^d A^j(t,x,J_\delta u^\delta )(J_\delta u^\delta )_{x_j} \\ {}&=J_\delta \sum _{j,k=1}^d D^{j,k}(t,x,J_\delta u^\delta )\partial _{x_j}\partial _{x_k}(J_\delta u^\delta ) +J_\delta \left[ \phi _\delta F(t,x,J_\delta u^\delta )\right] , \\ u^\delta (0,\cdot )=J_\delta u_0, \end{aligned} \end{aligned}$$

with an analogous formula holding for (2). Here \(u_0\) is the initial data for (1) or (2) that belongs to a Sobolev space with sufficiently high index as detailed in the assumptions of Theorem 2.7, \(J_\delta \) is a symmetric spatial mollification operator tending to the identity as \(\delta \rightarrow 0\), \(\phi _\delta \) is a smooth compactly-supported function equal to one for \(\delta |x|\le 1\) whose derivatives all tend uniformly to zero as \(\delta \rightarrow 0\), and the mollification parameter has been denoted \(\delta \) rather than \(\varepsilon \) as in [13, Section 5.2, Section 7.1] to avoid confusion with the singular perturbation parameter of (2). Since the mollified equations are ODEs in Hilbert spaces, the existence of solutions to their initial-value problems follows from the ODE existence theorem for as long as the solutions remain bounded in an appropriate norm.
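To see the ODE structure concretely: since the first-order part is symmetric hyperbolic, \(A^0\) is positive definite and hence invertible, so the mollified equation for (1) can be rewritten (schematically) as

$$\begin{aligned} (u^\delta )_t=A^0(t,x,J_\delta u^\delta )^{-1}J_\delta \bigg [-\sum _{j=1}^d A^j(t,x,J_\delta u^\delta )(J_\delta u^\delta )_{x_j}+\sum _{j,k=1}^d D^{j,k}(t,x,J_\delta u^\delta )\partial _{x_j}\partial _{x_k}(J_\delta u^\delta )+\phi _\delta F(t,x,J_\delta u^\delta )\bigg ]. \end{aligned}$$

For each fixed \(\delta >0\) every occurrence of \(u^\delta \) on the right is smoothed by \(J_\delta \), so the right side is a locally Lipschitz function of \(u^\delta \) in \(H^s\), and the Picard existence theorem for ODEs in Banach spaces applies for as long as \(\Vert u^\delta \Vert _{H^s}\) remains bounded.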

The energy estimates derived in Theorems 2.7 and 3.1 remain valid for the mollified equations, since the spatial cutoff applied to the undifferentiated term ensures that it is rapidly decaying. The energy estimate for the mollified equations ensures that the solutions exist and are bounded in the appropriate norm for a time and with a bound independent of the mollification parameter \(\delta \) and, for (2), also independent of the small parameter \(\varepsilon \). As in [13, Section 5.2, Section 7.1], the convergence of \(u^\delta \) to the solution u of (1) or (2) satisfying the same bound for the same time then follows provided that sufficiently smooth solutions of (1) and (2) having specified initial values are unique.

As for symmetric hyperbolic systems [19, Theorem 2.1] and many other PDEs, in order to prove the uniqueness of sufficiently smooth solutions to the initial-value problem for (1) and (2) it suffices to obtain an \(L^2\) energy estimate

$$\begin{aligned} \tfrac{d}{dt}\int v^TA^0v \,dx\le C \int v^TA^0v\,dx \end{aligned}$$
(79)

for the difference \(v\mathrel {:=}u^{(1)}-u^{(2)}\) of two solutions having the same initial data, where C depends on a norm of \(u^{(1)}\) and \(u^{(2)}\) whose boundedness is known, since (79) plus the fact that \(\int v^TA^0v\,dx\) vanishes at time zero implies that \(v\equiv 0\). The fact that the hyperbolic terms in the PDEs yield estimates of the form (79) is standard, so it suffices to estimate the second-order terms in (1) or (2). The difference \(D^{j,k}(t,x,u^{(1)})\partial _{x_j}\partial _{x_k}u^{(1)}-D^{j,k}(t,x,u^{(2)})\partial _{x_j}\partial _{x_k}u^{(2)}\) can be written as the sum of the terms \(\left[ D^{j,k}(t,x,u^{(2)}+v)-D^{j,k}(t,x,u^{(2)})\right] \partial _{x_j}\partial _{x_k}u^{(1)}\) and \(D^{j,k}(t,x,u^{(2)})\partial _{x_j}\partial _{x_k}v\). Since the first term is O(|v|) and the estimates obtained for the \(u^{(j)}\) bound their \(C^2\) norms, the first term makes a contribution of the desired form to the \(L^2\) energy estimate. By (17) with u replaced by \(u^{(2)}\), the second term above also yields a contribution of the desired form. Hence (79) indeed holds. \(\square \)
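The last implication in the preceding argument is the standard Gronwall step: writing \(E(t)\mathrel {:=}\int v^TA^0v\,dx\), the estimate (79) says \(E'\le CE\), and hence

$$\begin{aligned} 0\le E(t)\le e^{Ct}E(0)=0, \end{aligned}$$

where the lower bound holds because \(A^0\) is positive definite; therefore \(E\equiv 0\) and \(v\equiv 0\).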

Theorem 4.2

  1.

    If part 2 of Theorem 3.1 holds, then as \(\varepsilon \rightarrow 0\) the solution of (2) with initial data converging in \(H^s\) tends to the unique solution of the limit equations

    $$\begin{aligned} \begin{aligned} A^0(0) u_t+\sum _{j=1}^d A^j(t,x,u)u_{x_j}+\sum _{j=1}^d C^jV_{x_j}&=\sum _{j,k=1}^d D^{j,k}(t,x,u)\partial _{x_j}\partial _{x_k}u \\ {}&\qquad +F(t,x,u), \\ \sum _{j=1}^d C^ju_{x_j}&=0 \end{aligned} \end{aligned}$$
    (80)

    having the limit initial data, where V is the Lagrange multiplier variable that enforces the constraint in the second equation of (80).

  2.

    Suppose that part 1 or part 3 of Theorem 3.1 holds, that the spatial domain is periodic, and that the initial data converges to \(u^0(x)\) in \(H^s\) as \(\varepsilon \rightarrow 0\). As in [14, Section 2], make the following definitions: First normalize the matrix \(A^0\) to satisfy \(A^0(0)=I\) by making the change of variables \(u\mapsto (A^0(0))^{-1/2}u\) and multiplying the resulting equation by \((A^0(0))^{-1/2}\). Let \({\mathcal {S}}(\tau )\) be the solution operator of the fast equation \(u_\tau +\sum _{j=1}^d C^j u_{x_j}=0\), which can be written in Fourier space as

    $$\begin{aligned} \widehat{[{\mathcal {S}}(\tau )u]}(t,\tau ,k)=e^{-i\tau \sum _{j=1}^d k_j C^j}{{\widehat{u}}}(t,\tau ,k). \end{aligned}$$

    Define the averaging operator

    $$\begin{aligned}{}[{\mathbb {M}}(g)](t,x)\mathrel {:=}\lim _{\tau \rightarrow \infty }\frac{1}{\tau }\int _0^\tau g(t,\tau _1,x)\,d\tau _1 \end{aligned}$$

    and the projection operator

    $$\begin{aligned} {\mathcal {E}}\mathrel {:=}{\mathcal {S}}(\tau ){\mathbb {M}}(\mathcal S(-\tau )\cdot ), \end{aligned}$$

    which is the orthogonal projection with respect to the inner product \([u,v]\mathrel {:=}\lim _{\tau \rightarrow \infty }\frac{1}{\tau }\int _0^\tau u\cdot v\,d\tau _1\) onto terms \(f(t,\tau ,x)\) having the form \(\mathcal S(\tau )g(t,x)\) (a simple example is given below). Then

    $$\begin{aligned} u(t,x)=v^0\big (t,\tfrac{t}{\varepsilon },x\big )+o(1)\qquad \text {in}\ H^{s-1}, \end{aligned}$$
    (81)

    where \(v^0\) is the unique solution of the limit profile equations

    $$\begin{aligned} v^0&={\mathcal {E}}v^0,\qquad v^0(0,\tau ,x)={\mathcal {S}}(\tau )u^0(x),\nonumber \\ v^0_t&+{\mathcal {E}}\bigg \{ (v^0\cdot A^0_u(0))v^0_\tau +\sum _{j=1}^d A^j(t,x,v^0)v^0_{x_j}\nonumber \\&\qquad \qquad -\sum _{j,k=1}^d D^{j,k}(t,x,v^0)\partial _{x_j}\partial _{x_k}v^0-F(t,x,v^0)\bigg \}=0. \end{aligned}$$
    (82)

The proof of the first part of Theorem 4.2 is essentially the same as the proof of [20, Theorem 2], and the proof of the second part is essentially the same as the proof of the convergence part of [14, Theorem 2.1], since that proof uses only the uniform bounds, some general results on averaging, and the fact that an \(L^2\) estimate for the difference of two solutions of the limit profile equations can be shown in similar fashion to the \(L^2\) estimate for the difference of two solutions of the original system.
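As an illustration of the operators \({\mathbb {M}}\) and \({\mathcal {E}}\) appearing in part 2 of Theorem 4.2, consider a profile oscillating at a single temporal frequency, \(g(t,\tau ,x)=e^{i\lambda \tau }h(t,x)\) with \(\lambda \) real and h smooth. Then

$$\begin{aligned} {\mathbb {M}}(g)=\Big (\lim _{\tau \rightarrow \infty }\frac{1}{\tau }\int _0^\tau e^{i\lambda \tau _1}\,d\tau _1\Big )h={\left\{ \begin{array}{ll} h &{} \text {if }\lambda =0,\\ 0 &{} \text {if }\lambda \ne 0, \end{array}\right. } \end{aligned}$$

so \({\mathbb {M}}\) retains exactly the non-oscillatory part of a profile. In particular, if \(f={\mathcal {S}}(\tau )g(t,x)\) then \({\mathcal {S}}(-\tau )f=g\) is independent of \(\tau \) and \({\mathcal {E}}f=f\), while profiles whose conjugates \({\mathcal {S}}(-\tau )f\) have zero average are annihilated by \({\mathcal {E}}\).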