1 Introduction and main result

In the paper [5] we proved the first existence result of quasi-periodic solutions for autonomous quasi-linear PDEs (also called “strongly nonlinear” in [23]), in particular of small amplitude quasi-periodic solutions of the KdV equation subject to a Hamiltonian quasi-linear perturbation. The approach developed in [5] (see also [4]) is of wide applicability for quasi-linear PDEs in one space dimension. In this paper we take the opportunity to explain the general strategy of [5] applied to a model which is slightly simpler than KdV.

We consider the cubic, focusing or defocusing, mKdV equation

$$\begin{aligned} u_t + u_{xxx} + \varsigma \, \partial _x(u^3) + {\mathcal N}_4 (x, u, u_x, u_{xx}, u_{xxx}) = 0, \quad \varsigma = \pm 1, \end{aligned}$$
(1.1)

under periodic boundary conditions \( x \in \mathbb T:= \mathbb R/ 2 \pi \mathbb Z\), where

$$\begin{aligned} {\mathcal N}_4 (x, u, u_x, u_{xx}, u_{xxx}) := - \partial _x [ (\partial _u f)(x, u,u_x) - \partial _{x} ((\partial _{u_x} f)(x, u,u_x)) ] \end{aligned}$$
(1.2)

is the most general quasi-linear Hamiltonian (local) nonlinearity. Note that \( \mathcal{N}_4 \) contains as many derivatives as the linear vector field \( \partial _{xxx} \). It is a quasi-linear perturbation because \( \mathcal{N}_4 \) depends linearly on the highest derivative \( u_{xxx} \) multiplied by a coefficient which is a nonlinear function of the lower order derivatives \( u, u_x, u_{xx} \). Equation (1.1) is the Hamiltonian PDE

$$\begin{aligned} u_t = X_H (u) , \quad X_H (u) := \partial _x \nabla H (u) , \end{aligned}$$
(1.3)

where \( \nabla H \) denotes the \(L^2(\mathbb T_x)\) gradient of the Hamiltonian

$$\begin{aligned} H(u) = \frac{1}{2} \int _\mathbb Tu_x^2 \, dx - \frac{\varsigma }{4} \int _\mathbb Tu^4 \, dx + \int _\mathbb Tf(x, u,u_x) \, dx \end{aligned}$$
(1.4)

on the real phase space

$$\begin{aligned} H^1_0 (\mathbb T_x) := \left\{ u(x ) \in H^1 (\mathbb T, \mathbb R) :\int _{\mathbb T} u(x) \, dx = 0 \right\} \end{aligned}$$
(1.5)

endowed with the non-degenerate symplectic form

$$\begin{aligned} \Omega (u, v) := \int _{\mathbb T} (\partial _x^{-1 } u) \, v \, dx , \quad \forall u, v \in H^1_0 (\mathbb T_x) , \end{aligned}$$
(1.6)

where \( \partial _x^{-1} u \) is the periodic primitive of u with zero average. The phase space \( H^1_0 (\mathbb T_x) \) is invariant for the evolution of (1.1) because the integral \( \int _{\mathbb T} u(x) \, dx \) is a first integral (the mass). For simplicity we fix its value to \( \int _{\mathbb T} u(x) \, dx = 0 \). We recall that the Poisson bracket between two functions \( F, G :H^1_0(\mathbb T_x) \rightarrow \mathbb R\) is defined as

$$\begin{aligned} \{ F, G \}(u) := \Omega (X_F(u), X_G(u) ) = \int _{\mathbb T} \nabla F(u) \partial _x \nabla G (u) dx . \end{aligned}$$
(1.7)
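
As a consistency check, one can verify that (1.3) and (1.4) reproduce (1.1). The following is a short sketch, assuming f is regular enough to integrate by parts on \( \mathbb T\); the direction \( h \in H^1_0 (\mathbb T_x) \) is an auxiliary test function introduced here only for this computation.

$$\begin{aligned} dH(u)[h]&= \int _\mathbb Tu_x h_x \, dx - \varsigma \int _\mathbb Tu^3 h \, dx + \int _\mathbb T\big ( (\partial _u f) \, h + (\partial _{u_x} f) \, h_x \big ) \, dx \nonumber \\&= \int _\mathbb T\big ( - u_{xx} - \varsigma u^3 + (\partial _u f)(x,u,u_x) - \partial _x ((\partial _{u_x} f)(x,u,u_x)) \big ) \, h \, dx . \end{aligned}$$

Hence, up to an additive constant (which is annihilated by \( \partial _x \)), \( \nabla H(u) = - u_{xx} - \varsigma u^3 + (\partial _u f) - \partial _x ((\partial _{u_x} f)) \), and by (1.2)

$$\begin{aligned} \partial _x \nabla H(u) = - u_{xxx} - \varsigma \, \partial _x (u^3) - {\mathcal N}_4 (x, u, u_x, u_{xx}, u_{xxx}) , \end{aligned}$$

so that \( u_t = X_H(u) \) is exactly (1.1).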

We assume that the “Hamiltonian density” f is of class \(C^q (\mathbb T\times \mathbb R\times \mathbb R; \mathbb R) \) for some q large enough (otherwise, as it is well known, we cannot expect the existence of smooth invariant KAM tori). We also assume that f vanishes at \( u = u_x = 0\) and

$$\begin{aligned} |f(x,u,v)| \le C (|u| + |v|)^5 \quad \forall (u,v) \in \mathbb R^2 , \quad |u|+|v| \le 1. \end{aligned}$$
(1.8)

As a consequence the nonlinearity \( {\mathcal N}_4 \) vanishes at order 4 at \( u = 0 \) and (1.1) may be seen, close to the origin, as a “small” perturbation of the cubic mKdV equation

$$\begin{aligned} u_t + u_{xxx} + 3 \varsigma u^2 u_x = 0 . \end{aligned}$$
(1.9)

This equation is known to be completely integrable. Actually it is mapped into KdV by a Miura transform, and it may be described by global analytic action-angle variables, as proved by Kappeler and Topalov [19]. We also remark that, among the generalized KdV equations \( u_t + u_{xxx} \pm \partial _x (u^p) = 0\), \( p \in \mathbb N\), the only known completely integrable ones are the KdV \( p=2\) and the cubic mKdV \( p = 3 \).

It is natural to ask whether the periodic, quasi-periodic or almost periodic solutions of (1.9) persist under small perturbations. This is the content of KAM theory. It is a difficult problem because of small-divisor resonance phenomena, which are especially strong in the presence of quasi-linear perturbations like \(\mathcal{N}_4\).

In this paper (as well as in [5]) we restrict the analysis to the search of small amplitude quasi-periodic solutions. It is also a very interesting question to investigate possible extensions of this result to perturbations of finite gap solutions. A difficulty which arises in the search of small amplitude solutions is that the mKdV Eq. (1.1) is a completely resonant PDE at \( u = 0 \), namely the linearized equation at the origin is the linear Airy equation

$$\begin{aligned} u_t + u_{xxx} = 0 \end{aligned}$$

which possesses only the \( 2 \pi \)-periodic in time, real solutions

$$\begin{aligned} u(t,x) = \sum _{j \in \mathbb Z{\setminus } \{0\} } u_j e^{{\mathrm i}j^3 t} e^{{\mathrm i}jx } , \quad u_{- j} = {\bar{u}}_j . \end{aligned}$$
(1.10)

Thus the existence of small amplitude quasi-periodic solutions of (1.1) is entirely due to the nonlinearity. Indeed, the nonlinear term \(\varsigma \partial _x (u^3)\) is the one that produces the main modulation of the frequency vector of the solution with respect to its amplitude (the well-known frequency-to-action map, or frequency-amplitude relation, or “twist”, see (4.10)) and that allows one to “tune” the action parameters \(\xi \) so that the frequencies become rationally independent and diophantine. Note that the mKdV Eq. (1.1) does not depend on other external parameters which may influence the frequencies. This is a further difficulty in the study of autonomous PDEs with respect to the forced cases studied in [3]. Actually, in [3] we considered non-autonomous quasi-linear (and fully nonlinear) perturbations of the Airy equation and we used the forcing frequencies as independent parameters.

The core of the matter is to understand the perturbative effect of the quasi-linear term \( \mathcal{N}_4 \) over infinite times. By (1.8), close to the origin, the quartic term \( \mathcal{N}_4 \) is smaller than the pure cubic mKdV (1.9). Therefore, when we restrict the equation to finitely many space-Fourier indices \( |j| \le C \), we essentially enter the range of applicability of finite dimensional KAM theory close to an elliptic equilibrium. The new problem is to understand what happens to the dynamics on the high frequencies \( |j| \rightarrow + \infty \), since \( \mathcal{N}_4 \) is a nonlinear differential operator of the same order (i.e. 3) as the constant coefficient linear (and integrable) vector field \(\partial _{xxx}\).

Does such a strongly nonlinear perturbation give rise to the formation of singularities for a solution in finite time, as it happens for the quasi-linear wave equations considered by Lax [25] and Klainerman and Majda [20]? Or, on the contrary, does the KAM phenomenon persist nevertheless for the mKdV Eq. (1.1)? The answer to these questions has been controversial for several years. For example, Kappeler and Pöschel [18, Remark 3, p. 19] wrote: “It would be interesting to obtain perturbation results which also include terms of higher order, at least in the region where the KdV approximation is valid. However, results of this type are still out of reach, if true at all”.

We think that these are very important dynamical questions to be investigated, especially because many of the equations arising in physics are quasi-linear or even fully nonlinear.

The main result of this paper proves that the KAM phenomenon actually persists, at least close to the origin, for quasi-linear Hamiltonian perturbations of mKdV (the same result is proved in [5] for KdV). More precisely, Theorem 1 proves the existence of Cantor families of small amplitude, linearly stable, quasi-periodic solutions of the mKdV Eq. (1.1) subject to quasi-linear Hamiltonian perturbations. It is not surprising that the same result applies for both the focusing and the defocusing mKdV because we are looking for small amplitude solutions. Thus the different sign \( \varsigma = \pm 1 \) only affects the branch of the bifurcation.

From a dynamical point of view, note that the parameters \(\xi \) selected by the KAM Theorem 1 give rise to solutions of (1.1) and (1.2) which are global in time. This is interesting information because, as far as we know, there are no results of global or even local solutions of the Cauchy problem for (1.1) and (1.2), and such PDEs are in general believed to be ill-posed in Sobolev spaces (for a rough result of local well-posedness for (1.1) and (1.2) see [6]).

The iterative procedure we are going to present is able to select many parameters \(\xi \) which give rise to quasi-periodic solutions (hence defined for all times). This procedure works for parameters belonging to a finite dimensional Cantor like set which becomes asymptotically dense at the origin.

How can this kind of result be achieved? The proof of Theorem 1, which we shall discuss in more detail later, is based on an iterative Nash–Moser scheme. As is well known, the main step of this procedure is to invert the linearized operators obtained at each step of the iteration and to prove that the inverse operators, although they lose derivatives (because of small divisors), satisfy tame estimates in high Sobolev norms. The linearized equations are non-autonomous linear PDEs which depend quasi-periodically on time. The key point of this paper (and [5]) is that, using the symplectic decoupling of [10], some techniques of pseudo-differential operators adapted to the symplectic structure, and a linear Birkhoff normal form analysis, we are able to construct, for most diophantine frequencies, a time dependent (quasi-periodic) change of variables which conjugates each linearized equation into another one that is diagonal and has constant coefficients, that is, in “normal form”. This means that, in the new coordinates, we have integrated the equations. Then we easily invert the linearized operator (recall that the inverse loses derivatives because of small divisors) and we conjugate it back to solve the linear equation in the original set of variables. We remark that these quasi-periodic Floquet changes of variable map Sobolev spaces of arbitrarily high norms into themselves and satisfy tame estimates. Hence the inverse operator also loses derivatives, but it satisfies tame estimates as well.

In the dynamical systems literature, this strategy is called “reducibility” of the equation and it is a quasi-periodic KAM perturbative extension of Floquet theory (Floquet theory deals with periodic solutions of finite dimensional systems). The difficulty in making it work in the present setting is due to the quasi-linear character of the nonlinearity in (1.1).

Before stating our main result precisely, we briefly present some related literature. In recent years much interest has been devoted to understanding the effect of derivatives in the nonlinearity in KAM theory. For unbounded perturbations the first KAM results have been proved by Kuksin [22] and Kappeler and Pöschel [18] for KdV (see also Bourgain [13]), and more recently by Liu and Yuan [26], Zhang et al. [30] for derivative NLS, and by Berti et al. [7, 8] for derivative NLW. For a recent survey of known results for KdV, we refer to [15]. Actually all these results still concern semi-linear perturbations.

The KAM theorems in [18, 22] prove the persistence of the finite-gap solutions of the integrable KdV under semilinear Hamiltonian perturbations \( \varepsilon \partial _{x} (\partial _u f) (x, u) \), namely when the density f is independent of \( u_x \), so that (1.2) is a differential operator of order 1. The key idea in [22] is to exploit the fact that the frequencies of KdV grow as \( \sim j^3 \) and the difference \( |j^3 - i^3| \ge \frac{1}{2} (j^2 + i^2) \), \(i \ne j \), so that KdV gains (outside the diagonal) two derivatives. This approach also works for Hamiltonian pseudo-differential perturbations of order 2 (in space), using the improved version of Kuksin’s lemma proved by Liu and Yuan [26]. However, it does not work for the general quasi-linear perturbation in (1.2), which is a nonlinear differential operator of the same order as the constant coefficient linear operator \( \partial _{xxx}\).
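
The elementary inequality behind this gain of two derivatives follows from the factorization \( j^3 - i^3 = (j - i)(j^2 + ij + i^2) \) together with \( j^2 + ij + i^2 - \frac{1}{2} (j^2 + i^2) = \frac{1}{2} (j+i)^2 \ge 0 \) and \( |j - i| \ge 1 \). A minimal Python sketch checks it by brute force on a finite window of integers; the function name and the window size are illustrative choices made here, not taken from [22].

```python
def kuksin_gap_holds(bound: int) -> bool:
    """Check |j^3 - i^3| >= (j^2 + i^2) / 2 for all integers i != j in [-bound, bound]."""
    for i in range(-bound, bound + 1):
        for j in range(-bound, bound + 1):
            if i == j:
                continue
            # Compare 2|j^3 - i^3| with j^2 + i^2 to stay in exact integer arithmetic.
            if 2 * abs(j**3 - i**3) < j**2 + i**2:
                return False
    return True


if __name__ == "__main__":
    print(kuksin_gap_holds(200))
```

Such a finite check is of course no substitute for the two-line algebraic argument above; it only serves as a sanity test of the stated constant \( \frac{1}{2} \).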

Now we state precisely the main result of the paper. The solutions we find are, at the first order of amplitude, localized in Fourier space on finitely many “tangential sites”

$$\begin{aligned} S^+ := \{ \bar{\jmath }_1, \ldots , \bar{\jmath }_\nu \}, \quad S := \{ \pm j : j \in S^+ \}, \quad {\bar{\jmath }}_i \in \mathbb N{\setminus } \{0\} \quad \forall i =1, \ldots , \nu . \end{aligned}$$
(1.11)

The set S is required to be even because the solutions u of (1.1) have to be real valued. Moreover, we also assume the following explicit hypothesis on S:

$$\begin{aligned} \frac{2}{2\nu -1} \, \sum _{i=1}^\nu \bar{\jmath }_i^{\,2} \, \notin \, \{ j^2 + kj + k^2 :j,k \in \mathbb Z{\setminus } S, \ \, j \ne k \}. \end{aligned}$$
(1.12)

Assumption (1.12) is a “non-degeneracy” condition. We assume it to prove that the Cantor-like set of amplitudes \( \xi \in \mathbb R^\nu _+ \) for which the quasi-periodic solution (1.13) exists has positive measure, see Lemmata 28, 29 and Remark 13.
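
Hypothesis (1.12) is explicit and can be tested mechanically for a given choice of tangential sites. The following Python sketch does so; the function name is hypothetical, and the search is restricted to a finite window using the elementary bound \( j^2 + jk + k^2 \ge \frac{3}{4} \max \{ j^2, k^2 \} \), which is an observation made here and not part of the proof in the paper.

```python
from fractions import Fraction
from math import isqrt


def satisfies_condition_1_12(s_plus):
    """Return True if the tangential sites S^+ satisfy hypothesis (1.12)."""
    sites = sorted(set(s_plus))
    if not sites or sites[0] < 1:
        raise ValueError("S^+ must be a non-empty set of positive integers")
    nu = len(sites)
    target = Fraction(2 * sum(j * j for j in sites), 2 * nu - 1)
    # j^2 + jk + k^2 is an integer, so a non-integer target is never attained
    # (this is the stronger condition (1.15)).
    if target.denominator != 1:
        return True
    n = target.numerator
    excluded = set(sites) | {-j for j in sites}  # the symmetric set S
    # j^2 + jk + k^2 >= (3/4) max(j^2, k^2), hence |j|, |k| <= sqrt(4n/3).
    bound = isqrt(4 * n // 3) + 1
    candidates = [j for j in range(-bound, bound + 1) if j not in excluded]
    for j in candidates:
        for k in candidates:
            if j != k and j * j + j * k + k * k == n:
                return False  # the forbidden value is attained: (1.12) fails
    return True


if __name__ == "__main__":
    for s_plus in ([1], [1, 2], [3, 15]):
        print(s_plus, satisfies_condition_1_12(s_plus))
```

For instance, \( S^+ = \{ 3, 15 \} \) violates (1.12), since \( \frac{2}{3} (3^2 + 15^2) = 156 = 10^2 + 10 \cdot 4 + 4^2 \) with \( 10, 4 \notin S \).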

Theorem 1

(KAM for quasi-linear perturbations of mKdV) Given \( \nu \in \mathbb N\), let \( f \in C^q \) [with \( q := q(\nu ) \) large enough] satisfy (1.8). Then, for all the tangential sites S as in (1.11) satisfying (1.12), the mKdV Eq. (1.1) possesses small amplitude quasi-periodic solutions with diophantine frequency vector \(\omega := \omega (\xi ) = (\omega _j)_{j \in S^+} \in \mathbb R^\nu \) of the form

$$\begin{aligned} u(t,x) = \sum _{j \in S^+} 2 \sqrt{\xi _j} \, \cos ( \omega _j t + j x) + o( \sqrt{|\xi |} ), \end{aligned}$$
(1.13)

where

$$\begin{aligned} \omega _j := j^3 + 3 \varsigma \left[ \xi _j - 2 \left( \sum _{j' \in S^+} \xi _{j'} \right) \right] j, \quad j \in S^+, \end{aligned}$$
(1.14)

for a “Cantor-like” set of small amplitudes \( \xi \in \mathbb R^\nu _+ \) with density 1 at \( \xi = 0 \). The term \(o(\sqrt{|\xi |})\) in (1.13) is a function \(u_1(t,x) = \tilde{u}_1(\omega t, x)\), with \(\tilde{u}_1\) in the Sobolev space \(H^s(\mathbb T^{\nu +1},\mathbb R)\) of periodic functions, and Sobolev norm \(\Vert \tilde{u}_1 \Vert _s = o(\sqrt{|\xi |})\) as \(\xi \rightarrow 0\), for some \(s < q\). These quasi-periodic solutions are linearly stable.

If the density \( f(u, u_x) \) is independent of x, a similar result holds for all the choices of the tangential sites, without assuming (1.12).
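
To make the frequency-amplitude relation (1.14) concrete, here is a minimal Python sketch that evaluates the leading-order frequencies for given tangential sites and amplitudes. The function name and the sample values are illustrative assumptions; the sketch ignores the Cantor-set restriction on \( \xi \) and the higher order corrections.

```python
def mkdv_frequencies(sites, xi, sigma=1):
    """Leading-order frequencies (1.14): omega_j = j^3 + 3*sigma*(xi_j - 2*sum(xi))*j."""
    if sigma not in (1, -1):
        raise ValueError("sigma must be +1 (focusing) or -1 (defocusing)")
    if len(sites) != len(xi):
        raise ValueError("one amplitude xi_j is needed for each site j in S^+")
    total = sum(xi)  # sum of all the actions xi_{j'}, j' in S^+
    return [j**3 + 3 * sigma * (x - 2 * total) * j for j, x in zip(sites, xi)]


if __name__ == "__main__":
    # Hypothetical sample: S^+ = {1, 2} with small amplitudes.
    print(mkdv_frequencies([1, 2], [0.01, 0.02], sigma=1))
```

The map \( \xi \mapsto \omega (\xi ) \) in (1.14) is affine, \( \omega = \bar{\omega } + 3 \varsigma D (I - 2U) \xi \) with \( D = \mathrm{diag} (j)_{j \in S^+} \) and U the \( \nu \times \nu \) matrix with all entries equal to 1; one can check that \( \det (I - 2U) = 1 - 2\nu \ne 0 \), so the amplitudes can be used as parameters to move the frequencies.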

This result is deduced from Theorem 2. It was announced also in [4, 5] under the stronger condition on the tangential sites

$$\begin{aligned} \frac{2}{2\nu -1} \, \sum _{i=1}^\nu \bar{\jmath }_i^{\,2} \, \notin \mathbb Z. \end{aligned}$$
(1.15)

Let us make some comments.

  1.

    In the case \(\nu = 1\) (time-periodic solutions), the condition (1.12) is always satisfied. Indeed, suppose, by contradiction, that there exist integers \(\bar{\jmath }_1 \ge 1\), \(j,k \in \mathbb Z\) such that

    $$\begin{aligned} 2 \bar{\jmath }_1^{\,2} = j^2 + jk + k^2. \end{aligned}$$
    (1.16)

    Then \(j^2 + jk + k^2\) is even, and therefore both j and k are even (if at least one of them were odd, \(j^2 + jk + k^2\) would be odd), say \(j = 2n\), \(k = 2m\) with \(n,m \in \mathbb Z\). Hence \(2 \bar{\jmath }_1^{\,2} = 4(n^2 + nm + m^2)\), and this implies that \(\bar{\jmath }_1\) is even, say \(\bar{\jmath }_1 = 2p\) for some positive integer p. It follows that \(2 p^2 = n^2 + nm + m^2\), namely p, n, m satisfy (1.16). Then, iterating the argument, we deduce that \(\bar{\jmath }_1\) can be divided by 2 infinitely many times in \(\mathbb N\), which is impossible.

  2.

    When the density \( f(u, u_x )\) is independent of x, the squared \(L^2\)-norm

    $$\begin{aligned} M(u) := \int _\mathbb Tu^2 \, dx = \Vert u \Vert _{L^2(\mathbb T)}^2 \end{aligned}$$
    (1.17)

    is a first integral of the Hamiltonian equation (1.1). Hence the solutions of (1.1) are in one-to-one correspondence with those of the Hamiltonian equation

    $$\begin{aligned} v_t = \partial _x \nabla K(v) \ \quad \text {with } K := H + \lambda M^2 , \quad \lambda \in \mathbb R. \end{aligned}$$
    (1.18)

    More precisely, if \(u(t,x)\) is a solution of (1.1), then \(v(t,x) := u(t, x-ct)\), with \(c := -4\lambda M(u)\), is a solution of (1.18). Vice versa, if \(v(t,x)\) solves (1.18), then the function \(u(t,x) := v(t, x+ct)\), with \(c := -4\lambda M(v)\), is a solution of (1.1) (\(M(v)\) is also a first integral of Eq. (1.18)). The advantage of looking for quasi-periodic solutions of (1.18) is that, for \( \lambda = 3\varsigma /4 \), the fourth order Birkhoff normal form of K is diagonal (Remark 1) and therefore no conditions on the tangential sites S are required (Remark 13).

  3.

    The diophantine frequency vector \( \omega (\xi ) = (\omega _j)_{j \in S^+} \in \mathbb R^\nu \) of the quasi-periodic solutions of Theorem 1 is \( O(|\xi |) \)-close as \( \xi \rightarrow 0 \) (see (1.14)) to the integer vector of the unperturbed linear frequencies

    $$\begin{aligned} \bar{\omega }:= (\bar{\jmath }_1^3, \ldots , \bar{\jmath }_\nu ^3) \in \mathbb N^\nu . \end{aligned}$$
    (1.19)

    This makes perturbation theory more difficult. This is the difficulty due to the fact that the mKdV Eq. (1.1) is completely resonant at \( u = 0 \).

  4.

    As shown by (1.13) the expected quasi-periodic solutions are mainly supported in Fourier space on the tangential sites S . The dynamics of the Hamiltonian PDE (1.1) restricted (and projected) to the symplectic subspaces

    $$\begin{aligned} H_S := \left\{ v = \sum _{j \in S} u_j e^{{\mathrm i}jx} \right\} , \quad H_S^\bot := \left\{ z = \sum _{j \in S^c} u_j e^{{\mathrm i}jx} \in H^1_0(\mathbb T_x) \right\} , \end{aligned}$$
    (1.20)

    where \(S^c := \{ j \in \mathbb Z{\setminus } \{ 0 \} : j \notin S \}\), is quite different. We call v the tangential variable and z the normal one. On \( H_S \) the dynamics is mainly governed by a finite dimensional integrable system (see Proposition 1), and we find it convenient to describe the dynamics in this subspace by introducing action-angle variables, see Sect. 4. On the infinite dimensional subspace \( H_S^\bot \) the solution will stay forever close to the elliptic equilibrium \( z = 0 \).
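
The correspondence between (1.1) and (1.18) stated in comment 2 can be checked directly. The following sketch assumes that f does not depend on x, so that H is invariant under space translations and M is conserved along both flows. Since \( \nabla M(v) = 2 v \),

$$\begin{aligned} \nabla K(v) = \nabla H(v) + 2 \lambda M(v) \nabla M(v) = \nabla H(v) + 4 \lambda M(v) \, v , \quad \text {hence} \quad v_t = \partial _x \nabla H(v) + 4 \lambda M(v) \, v_x . \end{aligned}$$

On the other hand, if u solves (1.1) and \( v(t,x) := u(t, x - ct) \) for a constant c, then \( M(v) = M(u) \) and, by translation invariance,

$$\begin{aligned} v_t = \partial _x \nabla H(v) - c \, v_x . \end{aligned}$$

The two expressions coincide exactly when \( c = - 4 \lambda M(u) \).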

In Theorem 1 it is stated that the quasi-periodic solutions are linearly stable. This information is not only an important complement of the result, but also an essential ingredient for the existence proof. Let us explain better what we mean. By the general procedure in [10] we prove that, around each invariant torus, there exist symplectic coordinates (see (6.13))

$$\begin{aligned} (\psi , \eta , w) \in \mathbb T^{\nu } \times \mathbb R^{\nu } \times H_{S}^{\bot } \end{aligned}$$

in which the mKdV Hamiltonian (1.4) assumes the normal form

$$\begin{aligned} K (\psi , \eta , w)&= \omega \cdot \eta + \frac{1}{2} K_{2 0}(\psi ) \eta \cdot \eta + ( K_{11}(\psi ) \eta , w )_{L^2(\mathbb T)} + \frac{1}{2} (K_{02}(\psi ) w , w )_{L^2(\mathbb T)} \nonumber \\&\quad + K_{\ge 3}(\psi , \eta , w) \end{aligned}$$
(1.21)

where \( K_{\ge 3} \) collects the terms at least cubic in the variables \( (\eta , w )\), see Remark 4. In these coordinates the quasi-periodic solution reads \( t \mapsto (\omega t , 0, 0 ) \) and the corresponding linearized equations are

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\psi }} = K_{20}(\omega t) {\eta } + K_{11}^T (\omega t ) w\\ {\dot{\eta }} = 0\\ \dot{w} - \partial _x K_{0 2}(\omega t ) w = \partial _x K_{11}(\omega t) \eta . \end{array}\right. } \end{aligned}$$
(1.22)

Thus the actions \( \eta (t) = \eta (0) \) do not evolve in time and the third equation reduces to the forced PDE

$$\begin{aligned} {\dot{w}} = \partial _x K_{02}(\omega t)[w] + \partial _x K_{11}(\omega t)[ \eta _0] . \end{aligned}$$
(1.23)
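
For the reader's convenience, here is a brief sketch of how (1.22) arises from (1.21). It assumes the standard form of Hamilton's equations associated with the symplectic structure \( d\psi \wedge d\eta \) on the first two components and with (1.6) on \( H_S^\bot \); the precise conventions are those of (6.13).

$$\begin{aligned} {\dot{\psi }} = \partial _\eta K , \quad {\dot{\eta }} = - \partial _\psi K , \quad {\dot{w}} = \partial _x \nabla _w K . \end{aligned}$$

Since \( \omega \cdot \eta \) is independent of \( \psi \) and all the other terms in (1.21) are at least quadratic in \( (\eta , w) \), the derivative \( \partial _\psi K \) vanishes at second order at \( (\eta , w) = (0,0) \), which gives \( {\dot{\eta }} = 0 \) at the linear level. Differentiating the quadratic terms of (1.21) with respect to \( \eta \) and w, and evaluating the coefficients along \( \psi = \omega t \), yields the first and the third equations in (1.22); the term \( K_{\ge 3} \) contributes to the vector field only terms which are at least quadratic in \( (\eta , w) \) and therefore drops out.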

Ignoring the forcing term \(\partial _x K_{11}(\omega t)[ \eta _0]\) for a moment, we note that the equation \(\dot{w} = \partial _x K_{02}(\omega t)[w]\) is, up to a finite dimensional remainder (Proposition 3), the restriction to \( H_{S}^\bot \) of the “variational equation”

$$\begin{aligned} h_t = \partial _x \, (\partial _u \nabla H)( u(\omega t, x) ) [h ] = X_{K}(h) , \end{aligned}$$

where \( X_K \) is the KdV Hamiltonian vector field with quadratic Hamiltonian \( K = \frac{1}{2} ((\partial _u \nabla H)(u)[h], h)_{L^2(\mathbb T_x)} \) \(= \frac{1}{2} (\partial _{uu} H)(u)[h, h] \). This is a linear PDE with quasi-periodically time-dependent coefficients of the form

$$\begin{aligned} h_t = \partial _{xx} (a_1 (\omega t, x) \partial _x h) + \partial _x ( a_0 (\omega t, x) h ) . \end{aligned}$$
(1.24)

In Sect. 8 we prove the reducibility of the linear operator \( {\dot{w}} - \partial _x K_{0 2}(\omega t ) w \), which conjugates (1.23) to the diagonal system (see (8.64))

$$\begin{aligned} \partial _t v = - {\mathrm i}\mathcal{D}_\infty v + f (\omega t) \end{aligned}$$
(1.25)

where \(\mathcal{D}_\infty := \mathrm{Op} \{ \mu _j^\infty \}_{j \in S^c}\) is a Fourier multiplier operator acting in \( H^s_\bot \),

$$\begin{aligned} \mu _j^\infty := {\mathrm i}(-m_3 j^3 + m_1 j) + r_j^\infty \in {\mathrm i}\mathbb R, \quad j \in S^c , \end{aligned}$$

with \( m_3 = 1 + O(\varepsilon ^3) \), \( m_1 = O(\varepsilon ^2) \), \( \sup _{j \in S^c} r_j^\infty = o(\varepsilon ^2) \), see (8.61) and (8.62). The eigenvalues \( \mu _j^\infty \) are the Floquet exponents of the quasi-periodic solution. The solutions of the scalar non-homogeneous equations

$$\begin{aligned} {\dot{v}}_j + \mu _j^\infty v_j = f_j (\omega t) , \quad j \in S^c , \quad \mu _j^\infty \in {\mathrm i}\mathbb R, \end{aligned}$$

are

$$\begin{aligned} v_j (t) = c_j e^{ - \mu _j^\infty t} + {\tilde{v}}_j (t) , \quad \text {where } {\tilde{v}}_j (t) := \sum _{l \in \mathbb Z^\nu } \frac{f_{jl} \, e^{{\mathrm i}\omega \cdot l t } }{ {\mathrm i}\omega \cdot l + \mu _j^\infty } \end{aligned}$$

(recall that the first Melnikov conditions (8.66) hold at a solution). As a consequence, the Sobolev norm of the solution of (1.25) satisfies

$$\begin{aligned} \Vert v(t) \Vert _{H^s_x} \le C \Vert v(0) \Vert _{H^s_x} , \quad \forall t \in \mathbb R, \end{aligned}$$

i.e. it does not increase in time.
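
The formula for \( {\tilde{v}}_j \) is obtained by matching Fourier coefficients: inserting \( {\tilde{v}}_j (t) = \sum _l {\tilde{v}}_{jl} \, e^{{\mathrm i}\omega \cdot l t} \) into the scalar equation gives \( ({\mathrm i}\omega \cdot l + \mu _j^\infty ) {\tilde{v}}_{jl} = f_{jl} \). A minimal Python sketch illustrates this for a trigonometric polynomial forcing and checks the equation by a finite difference in time. All names, the sample frequency vector and the sample exponent are assumptions made for illustration only.

```python
import cmath


def phase(omega, l):
    """Return omega . l for a frequency vector omega and an integer vector l."""
    return sum(w * k for w, k in zip(omega, l))


def particular_solution(f_coeffs, omega, mu, tol=1e-12):
    """Fourier coefficients f_l / (i omega.l + mu) of the quasi-periodic solution."""
    coeffs = {}
    for l, f_l in f_coeffs.items():
        divisor = 1j * phase(omega, l) + mu  # small divisor of the first Melnikov type
        if abs(divisor) < tol:
            raise ZeroDivisionError(f"vanishing divisor at l = {l}")
        coeffs[l] = f_l / divisor
    return coeffs


def evaluate(coeffs, omega, t):
    """Evaluate sum_l c_l exp(i omega.l t) at time t."""
    return sum(c * cmath.exp(1j * phase(omega, l) * t) for l, c in coeffs.items())


def residual(f_coeffs, omega, mu, t, h=1e-5):
    """|v' + mu v - f| at time t, with v' approximated by a central difference."""
    v = particular_solution(f_coeffs, omega, mu)
    dv = (evaluate(v, omega, t + h) - evaluate(v, omega, t - h)) / (2 * h)
    return abs(dv + mu * evaluate(v, omega, t) - evaluate(f_coeffs, omega, t))


if __name__ == "__main__":
    omega = (1.0, 2 ** 0.5)          # hypothetical rationally independent frequencies
    mu = 0.3j                        # hypothetical purely imaginary Floquet exponent
    forcing = {(1, 0): 1.0, (0, -1): 0.5j, (2, 1): 0.25}
    print(residual(forcing, omega, mu, t=0.7))
```

In the infinite dimensional setting the sum runs over all \( l \in \mathbb Z^\nu \), and the lower bounds on the divisors provided by the first Melnikov conditions are what control the loss of derivatives.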

We now describe in detail the strategy of proof of Theorem 1. Many of the arguments that we use are quite general and of wide applicability to other PDEs. Nevertheless, we think that a unique abstract theorem of existence and stability of quasi-periodic solutions applicable to all quasi-linear PDEs cannot be expected. Indeed the suitable pseudo-differential operators that are required to conjugate the highest order of the linearized operator to constant coefficients highly depend on the PDE at hand, see the discussion after (1.29).

There are two main issues in the proof:

  1.

    Bifurcation analysis. Find approximate quasi-periodic solutions of (1.1) up to a sufficiently small remainder [which, in our case, should be \( O( u^4 ) \)]. In this step we also find the approximate modulation of the frequency with respect to the amplitude (the “frequency-to-amplitude” relation), see (4.10). This is the goal of Sects. 3 and 4.

  2.

    Nash–Moser implicit function theorem. Prove that, close to the above approximate solutions, there exist exact quasi-periodic solutions of (1.1). By means of a Nash–Moser iteration, we construct a sequence of approximate solutions that converges to a quasi-periodic solution of (1.1) (Sects. 5–9). The key step consists in proving the invertibility of the linearized operator and tame estimates for its inverse. This is achieved in two main steps.

    (a)

      Symplectic decoupling procedure. The method in Berti and Bolle [10] allows one to approximately decouple the “tangential” and the “normal” dynamics around an approximate invariant torus (Sect. 6). It reduces the problem to the one of inverting a quasi-periodically forced PDE restricted to the normal subspace \( H_S^\bot \). Its precise form is found in Sect. 7.2.

    (b)

      Analysis of the linearized operator in the normal directions. In Sects. 7 and 8 we reduce the linearized equations to constant coefficients. This involves three steps:

      (i)

        Reduction in decreasing symbols. Sections 8.1–8.3 and 8.5.

      (ii)

        Linear Birkhoff normal form. Section 8.4.

      (iii)

        KAM reducibility. Section 8.6.

    All the changes of variables used in the steps (i)–(iii) are \( \varphi \)-dependent families of symplectic maps \( \Phi (\varphi ) \) which act on the phase space \( H^1_0 (\mathbb T_x) \). Therefore they preserve the Hamiltonian dynamical systems structure of the conjugated linear operators.

Let us discuss these issues in detail.

Weak Birkhoff normal form. According to the orthogonal splitting

$$\begin{aligned} H^1_0 (\mathbb T_x) := H_S \oplus H_S^\bot \end{aligned}$$

into the symplectic subspaces defined in (1.20), we decompose

$$\begin{aligned} u = v + z, \quad v = \Pi _S u := \sum _{j \in S} u_j \, e^{{\mathrm i}jx}, \quad z = \Pi _S^\bot u := \sum _{j \in S^c} u_j \, e^{{\mathrm i}jx}, \end{aligned}$$
(1.26)

where \(\Pi _S \), \(\Pi _S^\bot \) denote the orthogonal projectors on \( H_S \), \( H_S^\bot \).

We perform a “weak” Birkhoff normal form (weak BNF), whose goal is to find an invariant manifold of solutions of the third order approximate mKdV Eq. (1.1), on which the dynamics is completely integrable, see Sect. 3. We construct in Proposition 1 a symplectic map \( \Phi _B \) such that the transformed Hamiltonian \(\mathcal {H}:= H \circ \Phi _B\) possesses the invariant subspace \( H_S \) (see (1.20)). For this purpose we have to eliminate the term \( \int v^3 z \, dx \) (which is linear in z). Then we check that its dynamics on \( H_S \) is integrable and non-isochronous. For that we perform the classical finite dimensional Birkhoff normalization of the Hamiltonian term \( \int v^4 \, dx \), which turns out to be integrable and non-isochronous.

Since the present weak Birkhoff map has to remove only finitely many monomials, it is the time-1 flow map of a Hamiltonian system whose Hamiltonian is supported on only finitely many Fourier indices. Therefore it is close to the identity up to finite dimensional operators, see Proposition 1. The key advantage is that it modifies \( {\mathcal N}_4 \) very mildly, only up to finite dimensional operators (see for example Lemma 12), and thus the spectral analysis of the linearized equations (that we shall perform in Sect. 8) is essentially the same as if we were in the original coordinates.

The weak normal form (3.7) does not remove (nor normalize) the monomials \( O(z^2) \). We point out that a stronger normal form that removes/normalizes the monomials \(O(z^2)\) is also well-defined (it is called “partial Birkhoff normal form” in Kuksin and Pöschel [24] and Pöschel [27]). However, we do not use it because, for such a stronger normal form, the corresponding Birkhoff map is close to the identity only up to an operator of order \( O(\partial _x^{-1}) \), and so it would produce terms of order \( \partial _{xx} \) and \( \partial _x \). For the same reason, we do not use the global nonlinear Fourier transform in [19] (Birkhoff coordinates), which is close to the Fourier transform up to smoothing operators of order \( O(\partial _x^{-1}) \) (this is explicitly proved for KdV).

We remark that mKdV is simpler than KdV because the nonlinearity in (1.1) is cubic rather than quadratic, and, as a consequence, fewer steps of Birkhoff normal form are required to reach the smallness needed for the Nash–Moser scheme to converge (see Remark 11).

Action-angle and rescaling. At this point we introduce action-angle variables on the tangential sites (Sect. 4) and, after the rescaling (4.5), we look for quasi-periodic solutions of the Hamiltonian (4.9). Note that the coefficients of the normal form \( \mathcal{N } \) in (4.13) depend on the angles \( \theta \), unlike the usual KAM theorems [21, 27], where the whole normal form is reduced to constant coefficients. This is because the weak BNF of Sect. 3 did not normalize the quadratic terms \( O(z^2) \). These terms are dealt with by the “linear Birkhoff normal form” (linear BNF) in Sect. 8.4. In some sense the “partial” Birkhoff normal form of [27] is split into the weak BNF of Sect. 3 and the linear BNF of Sect. 8.4.

The present functional formulation with the introduction of the action-angle variables allows one to prove the stability of the solutions (unlike the Lyapunov–Schmidt reduction approach).

Nonlinear functional setting and approximate inverse. We look for a zero of the nonlinear operator (5.6), where the unknown is the torus embedding \( \varphi \mapsto i(\varphi ) \), and where the frequency \( \omega \) is seen as an “external” parameter. This formulation is convenient in order to verify the Melnikov non-resonance conditions required to invert the linearized operators at each step. The solution is obtained by a Nash–Moser iterative scheme in Sobolev scales. The key step is to construct (for \( \omega \) restricted to a suitable Cantor-like set) an approximate inverse (à la Zehnder [31]) of the linearized operator at any approximate solution. Roughly, this means finding a linear operator which is an inverse at an exact solution. A major difficulty is that the tangential and the normal dynamics near an invariant torus are strongly coupled.

Symplectic approximate decoupling. The above difficulty is overcome by implementing the abstract procedure in Berti and Bolle [10], which was developed in order to prove the existence of quasi-periodic solutions for autonomous NLW (and NLS) with a multiplicative potential. This approach reduces the search for an approximate inverse for (5.6) to the invertibility of a quasi-periodically forced PDE restricted to the normal directions. This method approximately decouples the tangential and the normal dynamics around an approximate invariant torus, introducing a suitable set of symplectic variables

$$\begin{aligned} (\psi , \eta , w) \in \mathbb T^\nu \times \mathbb R^\nu \times H_S^\bot \end{aligned}$$

near the torus, see (6.13). Note that, in the first line of (6.13), \( \psi \) is the “natural” angle variable which coordinates the torus, and, in the third line, the normal variable z is only translated by the component \( z_0 (\psi )\) of the torus. The second line completes this transformation to a symplectic one. The canonicity of this map is proved in [10] using the isotropy of the approximate invariant torus \( i_\delta \), see Lemma 8. In these new variables the torus \( \psi \mapsto i_\delta (\psi ) \) reads \( \psi \mapsto (\psi , 0, 0 )\). The main advantage of these coordinates is that the second equation in (6.22) (which corresponds to the action variables of the torus) can be immediately solved, see (6.24). Then it remains to solve the third equation (6.25), i.e. to invert the linear operator \( \mathcal{L}_\omega \). This is a quasi-periodic Hamiltonian perturbed linear Airy equation of the form

$$\begin{aligned} h \mapsto \mathcal{L}_\omega h := \Pi _S^\bot ( \omega \! \cdot \! \partial _\varphi h + \partial _{xx} (a_1 \partial _x h) + \partial _x ( a_0 h ) + \partial _x \mathcal {R}h ) , \quad \forall h \in H_S^\bot , \end{aligned}$$
(1.27)

where \( \mathcal {R}\) is a finite dimensional remainder. The exact form of \( \mathcal{L}_\omega \) is obtained in Proposition 3, see (7.23).

Reduction to constant coefficients of the linearized operator in the normal directions. In Sect. 8 we conjugate the variable coefficients operator \( \mathcal{L}_\omega \) to a diagonal operator with constant coefficients which describes infinitely many harmonic oscillators

$$\begin{aligned} {\dot{v}}_j + \mu _j^\infty v_j = 0 , \quad \mu _j^\infty := {\mathrm i}(-m_3 j^3 + m_1 j) + r_j^\infty \in {\mathrm i}\mathbb R, \quad j \notin S , \end{aligned}$$
(1.28)

where the constants \( m_3 -1 \), \( m_1 \in \mathbb R\) and \( \sup _j |r_j^\infty | \) are small, see Theorem 4. The main perturbative effect to the spectrum (and the eigenfunctions) of \( \mathcal{L}_\omega \) is due to the term \( a_1 (\omega t, x ) \partial _{xxx} \) (see (1.27)), and it is too strong for the usual reducibility KAM techniques to work directly. The conjugacy of \( \mathcal{L}_\omega \) with (1.28) is obtained in several steps. The first task (carried out in Sects. 8.1-8.5) is to conjugate \( \mathcal{L}_\omega \) to another Hamiltonian operator of \( H_S^\bot \) with constant coefficients

$$\begin{aligned} \mathcal{L}_5 := \Pi _S^\bot (\omega \cdot \partial _\varphi + m_3 \partial _{xxx} + m_1 \partial _x + R_5 ) \Pi _S^\bot , \quad m_1, m_3 \in \mathbb R, \end{aligned}$$
(1.29)

up to a small bounded remainder \( R_5 = O(\partial _x^0 ) \), see (8.56). This expansion of \( \mathcal{L}_\omega \) in “decreasing symbols” with constant coefficients follows [3], and it is somewhat in the spirit of the works of Iooss et al. [16, 17] in water wave theory, and Baldi [2] for Benjamin–Ono. It is obtained by transformations which are very different from the usual KAM changes of variables. We underline that the specific form of these transformations depends on the structure of mKdV. For other quasi-linear PDEs the analogous reduction requires different transformations, see for example Alazard and Baldi [1], Berti and Montalto [12] for recent developments of these techniques for gravity-capillary water waves, and Feola and Procesi [14] for quasi-linear forced perturbations of Schrödinger equations.

The transformation of (1.27) into (1.29) is made in several steps.

  1.

    Reduction of the highest order The first step (Sect. 8.1) is to eliminate the x-dependence from the coefficient \( a_1 (\omega t, x ) \) of the highest order term \( a_1 (\omega t, x ) \partial _{xxx} \) of the Hamiltonian operator \( \mathcal{L}_\omega \). For this purpose, we have to construct a symplectic diffeomorphism of \( H_S^\bot \) near \( \mathcal{A}_\bot := \Pi _S^\bot \mathcal {A}\Pi _S^\bot \), where \(\mathcal {A}\) is a diffeomorphism of the form

    $$\begin{aligned} u \mapsto (\mathcal{A} u)(\varphi ,x) := (1 + \beta _x(\varphi ,x)) u(\varphi ,x + \beta (\varphi ,x)) , \end{aligned}$$

    see (8.1). The starting point is to observe that \(\mathcal {A}\) is, for each \( \varphi \in \mathbb T^\nu \), the time-one flow map of the time dependent Hamiltonian transport linear PDE

    $$\begin{aligned} \partial _\tau u = \partial _x (b(\varphi , \tau , x) u) , \quad b (\varphi , \tau , x) := \frac{\beta (\varphi , x)}{1 + \tau \beta _x(\varphi , x)}. \end{aligned}$$
    (1.30)

    Actually the flow of (1.30) is the path of symplectic diffeomorphisms

    $$\begin{aligned} u (\varphi , x) \mapsto (1+ \tau \beta _x (\varphi , x) ) u (\varphi , x+ \tau \beta (\varphi , x) ), \quad \tau \in [0,1] . \end{aligned}$$

    Thus, as in [5], we conjugate \( \mathcal{L}_\omega \) with the symplectic time-one flow map of the projected Hamiltonian equation

    $$\begin{aligned} \partial _\tau u = \Pi _S^\bot \partial _x (b(\tau , x) u) = \partial _x (b(\tau , x) u) - \Pi _S \partial _x (b(\tau , x) u) , \quad u \in H_S^\bot \end{aligned}$$
    (1.31)

    generated by the quadratic Hamiltonian \( \frac{1}{2} \int _{\mathbb T} b(\tau , x) u^2 dx \) restricted to \( H_S^\bot \). By Lemma 15 (which was proved in [5]) such a symplectic map differs from \( \mathcal{A}_\bot \) only by finite dimensional operators. This step may be seen as a quantitative application of the Egorov theorem, see [29], which describes how the principal symbol of a pseudo-differential operator [here \( a_1 (\omega t, x) \partial _{xxx} \)] transforms under the flow of a linear hyperbolic PDE (here (1.31)). Because of the Hamiltonian structure, the previous step also eliminates the term \( O( \partial _{xx} )\), see (8.13). In Sect. 8.2 we eliminate the time-dependence of the coefficient at the order \( \partial _{xxx} \).

  2.

    Linear Birkhoff normal form In Sect. 8.4 we eliminate the variable coefficient terms at the order \( O(\varepsilon ^2 )\), which are present in the operator \( \mathcal{L}_\omega \), see (7.23) and (7.24). This is a consequence of the fact that the weak BNF procedure of Sect. 3 did not touch the quadratic terms \( O(z^2 ) \). These terms cannot be reduced to constants by the perturbative scheme in Sect. 8.6 (developed in [3]) which applies to terms R such that \( R \gamma ^{ -1} \ll 1 \) where \( \gamma \) is the diophantine constant of the frequency vector \( \omega \) (the case in [3] is simpler because the diophantine constant is \( \gamma = O(1) \)). Here, as well as in [5], since mKdV is completely resonant, the constant \( \gamma = o(\varepsilon ^2 ) \), see (5.3). The terms of size \(\varepsilon ^2\) are reduced to constant coefficients in Sect. 8.4 by means of purely algebraic arguments (linear BNF), which, ultimately, stem from the complete integrability of the fourth order BNF of the mKdV Eq. (1.9). More general nonlinearities should be dealt with by the normal form arguments of Procesi and Procesi [28] for generic choices of the tangential sites.
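The claim in step 1 that the path \( u \mapsto (1+ \tau \beta _x ) u ( x+ \tau \beta ) \) is the flow of (1.30) can be checked numerically. The following is only a sketch for a frozen \( \varphi \): the functions `beta` and `u0` are hypothetical sample data, not taken from the paper, and the derivatives are approximated by central finite differences.

```python
import math

def beta(x):
    # Hypothetical small displacement beta(x) (assumption, phi is frozen).
    return 0.1 * math.sin(x)

def beta_x(x):
    # Exact derivative of the sample beta above.
    return 0.1 * math.cos(x)

def u0(x):
    # Hypothetical initial datum at tau = 0.
    return math.cos(2.0 * x) + 0.5 * math.sin(3.0 * x)

def flow(tau, x):
    # Candidate flow: (1 + tau*beta_x(x)) * u0(x + tau*beta(x)).
    return (1.0 + tau * beta_x(x)) * u0(x + tau * beta(x))

def b(tau, x):
    # Coefficient b of the transport equation (1.30).
    return beta(x) / (1.0 + tau * beta_x(x))

def residual(tau, x, h=1e-5):
    # Central-difference value of d/dtau u - d/dx (b u) at the point (tau, x).
    d_tau = (flow(tau + h, x) - flow(tau - h, x)) / (2.0 * h)
    d_x = (b(tau, x + h) * flow(tau, x + h)
           - b(tau, x - h) * flow(tau, x - h)) / (2.0 * h)
    return d_tau - d_x

# Largest residual on a grid of tau in (0, 1) and x in [0, 2*pi).
max_residual = max(abs(residual(t / 10.0, 2.0 * math.pi * k / 50.0))
                   for t in range(1, 10) for k in range(50))
```

The residual is zero only up to the finite-difference error, so the sketch illustrates the identity rather than proving it.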

Complete diagonalization of (1.29) In Sect. 8.6 we apply the abstract KAM reducibility Theorem 4.2 of [3], which completely diagonalizes the linearized operator, obtaining (1.28). The required smallness condition (8.58) for \( R_5 \) holds once the linear BNF of Sect. 8.4 has reduced to constant coefficients the unbounded terms of nonperturbative size \(\varepsilon ^2\), and the conjugation procedure of Sects. 8.1-8.3 and 8.5 has produced a bounded and small remainder \(R_5\).

The Nash–Moser iteration to an invariant torus embedding In Sect. 9 we perform the nonlinear Nash–Moser iteration which finally proves Theorem 2 and, therefore, Theorem 1. The smallness condition that is required for the convergence of the scheme is \( \varepsilon ^2 \Vert \mathcal{F}(\varphi , 0, 0 ) \Vert _{s_0+ \mu } \gamma ^{-2}\) sufficiently small, see (9.5). It is verified because \( \Vert X_P(\varphi , 0 , 0 ) \Vert _s \le _s \varepsilon ^{5 - 2b} \) (Lemma 5) and \( \gamma = \varepsilon ^{2+a}\) with \( a > 0 \) small. See also Remark 11 for a comparison between the smallness condition required here and the one in [5].

Notation We shall use the notation

$$\begin{aligned} a \le _s b \Longleftrightarrow a \le C(s) b \quad \text {for some constant } C(s) > 0 . \end{aligned}$$

We denote by \( \pi _0 \) the operator

$$\begin{aligned} u \mapsto \pi _0(u) := u - \frac{1}{2\pi } \int _\mathbb Tu \, dx . \end{aligned}$$
(1.32)

2 Functional setting

For a function \(u :\Omega _o \rightarrow E\), \(\omega \mapsto u(\omega )\), where \((E, \Vert \ \Vert _E)\) is a Banach space and \( \Omega _o \) is a subset of \(\mathbb R^\nu \), we define the sup-norm and the Lipschitz semi-norm

$$\begin{aligned} \begin{array}{lll} \displaystyle \Vert u \Vert ^{\sup }_E &{} := \displaystyle \Vert u \Vert ^{\sup }_{E,\Omega _o} := \sup _{ \omega \in \Omega _o } \Vert u(\omega ) \Vert _E, \\ \displaystyle \Vert u \Vert ^{\mathrm {lip}}_E &{} := \displaystyle \Vert u \Vert ^{\mathrm {lip}}_{E,\Omega _o} := \sup _{\omega _1 \ne \omega _2 } \frac{ \Vert u(\omega _1) - u(\omega _2) \Vert _E }{ | \omega _1 - \omega _2 | }, \end{array} \end{aligned}$$
(2.1)

and, for \( \gamma > 0 \), the Lipschitz norm

$$\begin{aligned} \Vert u \Vert ^{{\mathrm {Lip}(\gamma )}}_E := \Vert u \Vert ^{{\mathrm {Lip}(\gamma )}}_{E,\Omega _o} := \Vert u \Vert ^{\sup }_E + \gamma \Vert u \Vert ^{\mathrm {lip}}_E . \end{aligned}$$
(2.2)

If \( E = H^s \) we simply denote \( \Vert u \Vert ^{{\mathrm {Lip}(\gamma )}}_{H^s} := \Vert u \Vert ^{{\mathrm {Lip}(\gamma )}}_s \).
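As an illustration of (2.1) and (2.2), the following sketch evaluates the sup-norm, the Lipschitz semi-norm and the \( \mathrm{Lip}(\gamma ) \) norm on a finite sample of parameters. It assumes, for simplicity, scalar parameters \( \omega \) and scalar values (that is, \( \nu = 1 \) and \( E = \mathbb R \)); the function name `lip_gamma_norm` is hypothetical. On a finite sample the result is only a lower bound for the true norm over \( \Omega _o \).

```python
from itertools import combinations

def lip_gamma_norm(samples, gamma):
    """samples: list of (omega, value) pairs with pairwise distinct omega.

    Returns sup|u| + gamma * (largest difference quotient) over the sample,
    a finite-sample version of (2.2).
    """
    sup_norm = max(abs(v) for _, v in samples)
    lip_seminorm = max(
        (abs(v1 - v2) / abs(w1 - w2)
         for (w1, v1), (w2, v2) in combinations(samples, 2)),
        default=0.0,
    )
    return sup_norm + gamma * lip_seminorm

# Sample of u(omega) = 3*omega at omega = 0, 0.5, 1 with gamma = 0.1.
example = lip_gamma_norm([(0.0, 0.0), (0.5, 1.5), (1.0, 3.0)], gamma=0.1)
```

For the linear sample above the sup-norm and the Lipschitz semi-norm are both 3, so the weight \( \gamma \) controls how much the parameter dependence contributes.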

Sobolev norms We denote by

$$\begin{aligned} \Vert u \Vert _s := \Vert u \Vert _{H^s( \mathbb T^{\nu + 1})} := \Vert u \Vert _{H^s_{\varphi ,x} } \end{aligned}$$
(2.3)

the Sobolev norm of functions \( u = u(\varphi ,x) \) in the Sobolev space \( H^{s} (\mathbb T^{\nu + 1} ) \). We denote by \( \Vert \ \Vert _{H^s_x} \) the Sobolev norm in the phase space of functions \( u := u(x) \in H^{s} (\mathbb T) \). Moreover \( \Vert \ \Vert _{H^s_\varphi } \) denotes the Sobolev norm of scalar functions, like the Fourier components \( u_j (\varphi ) \).

We fix \( s_0 := (\nu +2) \slash 2 \) so that \( H^{s_0} (\mathbb T^{\nu + 1} ) \hookrightarrow L^{\infty } (\mathbb T^{\nu + 1} ) \) and any space \( H^s (\mathbb T^{\nu + 1} ) \), \( s \ge s_0 \), is an algebra and satisfies the interpolation inequalities: for \(s \ge s_0\),

$$\begin{aligned} \Vert uv \Vert _s \le C(s_0) \Vert u\Vert _s \Vert v\Vert _{s_0} + C(s) \Vert u\Vert _{s_0} \Vert v \Vert _s , \quad \forall u,v \in H^s(\mathbb T^{\nu + 1}) . \end{aligned}$$

The above inequalities also hold for the norms \(\Vert \ \Vert _s^{\mathrm{Lip}(\gamma )}\).

We also denote

$$\begin{aligned} H^s_{S^\bot } (\mathbb T^{\nu +1})&:= \{ u \in H^s(\mathbb T^{\nu + 1} ) :u (\varphi , \cdot ) \in H_S^\bot \quad \forall \varphi \in \mathbb T^\nu \} , \nonumber \\ H^s_{S} (\mathbb T^{\nu +1})&:= \{ u \in H^s(\mathbb T^{\nu + 1} ) :u (\varphi , \cdot ) \in H_{S} \quad \forall \varphi \in \mathbb T^\nu \} . \end{aligned}$$

Matrices with off-diagonal decay A linear operator can be identified, as usual, with its matrix representation. We recall the definition of the s-decay norm (introduced in [9]) of an infinite dimensional matrix.

Definition 1

Let \( A := (A_{i_1}^{i_2} )_{i_1, i_2 \in \mathbb Z^b } \), \(b \ge 1\), be an infinite dimensional matrix. Its s-decay norm \(|A|_s\) is defined by

$$\begin{aligned} \left| A \right| _{s}^2 := \sum _{i \in \mathbb Z^b} \left\langle i \right\rangle ^{2s} \left( \sup _{ \begin{array}{c} i_{1} - i_{2} =i \end{array}} | A^{i_2}_{i_1}| \right) ^{2}. \end{aligned}$$
(2.4)

For parameter dependent matrices \( A := A(\omega ) \), \(\omega \in \Omega _o \subseteq \mathbb R^\nu \), the definitions (2.1) and (2.2) become

$$\begin{aligned} | A |^{\sup }_s := \sup _{ \omega \in \Omega _o } | A(\omega ) |_s, \quad | A |^{\mathrm {lip}}_s := \sup _{\omega _1 \ne \omega _2} \frac{ | A(\omega _1) - A(\omega _2) |_s }{ | \omega _1 - \omega _2 | }, \end{aligned}$$
(2.5)

and \(| A |^{{\mathrm {Lip}(\gamma )}}_s := | A |^{\sup }_s + \gamma | A |^{\mathrm {lip}}_s\).

Such a norm is modeled on the behavior of matrices representing the multiplication operator by a function. Actually, given a function \( p \in H^s(\mathbb T^b) \), the multiplication operator \( h \mapsto p h \) is represented by the Töplitz matrix \( T_i^{i'} = p_{i - i'} \) and \( |T|_s = \Vert p \Vert _s \). If \(p = p(\omega )\) is a Lipschitz family of functions, then

$$\begin{aligned} |T|_s^{\mathrm {Lip}(\gamma )}= \Vert p \Vert _s^{\mathrm {Lip}(\gamma )}. \end{aligned}$$
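The identity \( |T|_s = \Vert p \Vert _s \) for a multiplication operator can be illustrated on a finite truncation. The sketch below takes \( b = 1 \), a hypothetical trigonometric polynomial `p` given by its Fourier coefficients, and assumes the convention \( \langle i \rangle := \max (1, |i|) \), which is not stated in this section. For a trigonometric polynomial whose degree is smaller than the size of the truncation window, the supremum on each diagonal is attained inside the window.

```python
def bracket(i):
    # Assumed convention: <i> = max(1, |i|).
    return max(1, abs(i))

def decay_norm_sq(A, s):
    """Squared s-decay norm (2.4) of a finite matrix, b = 1.

    A is a dict {(i1, i2): entry}; the entry at (i1, i2) lies on the
    diagonal i1 - i2.
    """
    sup_on_diagonal = {}
    for (i1, i2), entry in A.items():
        d = i1 - i2
        sup_on_diagonal[d] = max(sup_on_diagonal.get(d, 0.0), abs(entry))
    return sum(bracket(d) ** (2 * s) * m ** 2
               for d, m in sup_on_diagonal.items())

def sobolev_norm_sq(p, s):
    # Squared H^s norm of a function given by Fourier coefficients {k: p_k}.
    return sum(bracket(k) ** (2 * s) * abs(c) ** 2 for k, c in p.items())

# Hypothetical Fourier coefficients of a real trigonometric polynomial.
p = {-2: 0.5j, -1: 1.0, 0: 0.25, 1: 1.0, 2: -0.5j}

# Truncated Toeplitz matrix T_i^{i'} = p_{i - i'} on the window |i|, |i'| <= N.
N = 6
T = {(i, ip): p[i - ip]
     for i in range(-N, N + 1) for ip in range(-N, N + 1)
     if (i - ip) in p}

toeplitz_norm_sq = decay_norm_sq(T, s=2)
function_norm_sq = sobolev_norm_sq(p, s=2)
```

Both quantities equal \( \sum _k \langle k \rangle ^{2s} |p_k|^2 \), since every diagonal of a Toeplitz matrix is constant.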

The s-norm satisfies classical algebra and interpolation inequalities proved in [3].

Lemma 1

Let \(A = A(\omega ), B = B(\omega )\) be matrices depending in a Lipschitz way on the parameter \(\omega \in \Omega _o \subset \mathbb R^\nu \). Then for all \(s \ge s_0 > b/2 \) there are \( C(s) \ge C(s_0) \ge 1 \) such that

$$\begin{aligned} |A B |_s^{{\mathrm {Lip}(\gamma )}}&\le C(s) |A|_s^{{\mathrm {Lip}(\gamma )}} |B|_s^{{\mathrm {Lip}(\gamma )}} , \nonumber \\ |A B|_{s}^{{\mathrm {Lip}(\gamma )}}&\le C(s) |A|_{s}^{{\mathrm {Lip}(\gamma )}} |B|_{s_0}^{{\mathrm {Lip}(\gamma )}} + C(s_0) |A|_{s_0}^{{\mathrm {Lip}(\gamma )}} |B|_{s}^{{\mathrm {Lip}(\gamma )}} . \end{aligned}$$

The s-decay norm controls the Sobolev norm, namely

$$\begin{aligned} \Vert A h \Vert _s^{\mathrm {Lip}(\gamma )}\le C(s) \big (|A|_{s_0}^{\mathrm {Lip}(\gamma )}\Vert h \Vert _s^{\mathrm {Lip}(\gamma )}+ |A|_{s}^{\mathrm {Lip}(\gamma )}\Vert h \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\big ). \end{aligned}$$

Let now \( b := \nu + 1 \). An important sub-algebra is formed by the Töplitz in time matrices defined by

$$\begin{aligned} A^{(l_2, j_2)}_{(l_1, j_1)} := A^{j_2}_{j_1}(l_1 - l_2 ), \end{aligned}$$

whose decay norm (2.4) is

$$\begin{aligned} |A|_s^2 = \sum _{j \in \mathbb Z, l \in \mathbb Z^\nu } \left( \sup _{j_1 - j_2 = j} |A_{j_1}^{j_2}(l)| \right) ^2 \langle l,j \rangle ^{2 s} . \end{aligned}$$

These matrices are identified with the \( \varphi \)-dependent family of operators

$$\begin{aligned} A(\varphi ) := ( A_{j_1}^{j_2} (\varphi ))_{j_1, j_2 \in \mathbb Z} , \quad A_{j_1}^{j_2} (\varphi ) := \sum _{l \in \mathbb Z^\nu } A_{j_1}^{j_2}(l) e^{{\mathrm i}l \cdot \varphi } \end{aligned}$$

which act on functions of the x-variable as

$$\begin{aligned} A(\varphi ) : h(x) = \sum _{j \in \mathbb Z} h_j e^{{\mathrm i}jx} \mapsto A(\varphi ) h(x) = \sum _{j_1, j_2 \in \mathbb Z} A_{j_1}^{j_2} (\varphi ) h_{j_2} e^{{\mathrm i}j_1 x} . \end{aligned}$$

Transformations of this kind were also used in [3, 9, 11]. All the transformations that we construct in this paper are of this type [with \( j, j_1, j_2 \ne 0 \) because they act on the phase space \( H^1_0 (\mathbb T_x) \)].

Definition 2

We say that

  1.

    an operator \((A h)(\varphi , x) := A(\varphi ) h(\varphi , x)\) is symplectic if each \( A (\varphi ) \), \( \varphi \in \mathbb T^\nu \), is a symplectic map of the phase space (or of a symplectic subspace like \( H_S^\bot \));

  2.

    an operator is real if it maps real-valued functions into real-valued functions;

  3.

    the real operator \(\omega \cdot \partial _{\varphi } - \partial _x G( \varphi )\) is Hamiltonian if each \( G (\varphi ) \), \( \varphi \in \mathbb T^\nu \), is self-adjoint with respect to the \(L^2(\mathbb T)\) complex scalar product.

A Hamiltonian operator is transformed, under a symplectic map, into another Hamiltonian operator, see [3, section 2.3].

We conclude this preliminary section recalling the following well known lemmata about composition of functions (see, e.g., Appendix of [3]).

Lemma 2

(Composition) Assume \( f \in C^s (\mathbb T^d \times B_1)\), \(B_1 := \{ y \in \mathbb R^m :|y| \le 1 \}\). Then \( \forall u \in H^{s}(\mathbb T^d, \mathbb R^m) \) such that \( \Vert u \Vert _{L^\infty } < 1 \), the composition operator \(\tilde{f}(u)(x) := f(x, u(x))\) satisfies \( \Vert \tilde{f}(u) \Vert _s \le C \Vert f \Vert _{C^s} (\Vert u\Vert _{s} + 1) \) where the constant C depends on s, d. If \( f \in C^{s+2} \) and \( \Vert u + h \Vert _{L^\infty } < 1\), then for \(k=0,1\)

$$\begin{aligned} \left\| \tilde{f}(u+h) - \sum _{i = 0}^k \frac{\tilde{f}^{(i)}(u)}{i !} [h^i] \right\| _s \le C \Vert f \Vert _{C^{s+ 2}} \, \Vert h \Vert _{L^\infty }^k ( \Vert h \Vert _{s} + \Vert h \Vert _{L^\infty } \Vert u \Vert _{s}). \end{aligned}$$

The statement also holds replacing \(\Vert \ \Vert _s\) with the norms \(| \ |_{s, \infty }\) of \(W^{s,\infty }(\mathbb T^d)\).

Lemma 3

(Change of variable) Let \(p \in W^{s,\infty } (\mathbb T^d,\mathbb R^d) \), \( s \ge 1\), with \( \Vert p \Vert _{W^{1, \infty }} \le 1/2 \). Then the function \(f(x) = x + p(x)\) is invertible, with inverse \( f^{-1}(y) = y + q(y)\) where \(q \in W^{s,\infty }(\mathbb T^d,\mathbb R^d)\), and \( \Vert q \Vert _{W^{s, \infty }} \le C \Vert p \Vert _{ W^{s, \infty }} \).

If, moreover, p depends in a Lipschitz way on a parameter \(\omega \in \Omega \subset \mathbb R^\nu \), and \(\Vert D_x p \Vert _ {L^\infty } \le 1/2 \) for all \(\omega \), then \( \Vert q \Vert _{W^{s, \infty }}^{\mathrm{Lip}(\gamma )} \le C \Vert p \Vert _{W^{s+1, \infty }}^{\mathrm{Lip}(\gamma )} \). The constant \(C := C (d, s) \) is independent of \(\gamma \).

If \(u \in H^s (\mathbb T^d,\mathbb C)\), then \( (u\circ f)(x) := u(x+p(x))\) satisfies

$$\begin{aligned} \Vert u \circ f \Vert _s&\le C (\Vert u\Vert _s + \Vert p \Vert _{W^{s, \infty }} \Vert u\Vert _1), \\ \Vert u \circ f - u \Vert _s&\le C ( \Vert p \Vert _{L^\infty } \Vert u \Vert _{s + 1} + \Vert p \Vert _{W^{s, \infty }} \Vert u \Vert _{2} ) , \\ \Vert u \circ f \Vert _{s}^{{\mathrm{Lip}(\gamma )}}&\le C \, \big ( \Vert u \Vert _{s+1}^{{\mathrm{Lip}(\gamma )}} + \Vert p \Vert _{W^{s, \infty }}^{\mathrm{Lip}(\gamma )}\Vert u \Vert _2^{\mathrm{Lip}(\gamma )} \big ). \nonumber \end{aligned}$$

The function \(u \circ f^{-1} \) satisfies the same bounds.
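The inverse \( f^{-1}(y) = y + q(y) \) of Lemma 3 is characterized by the fixed-point equation \( q(y) = - p(y + q(y)) \), which is a contraction when \( \Vert D_x p \Vert _{L^\infty } \le 1/2 \). The sketch below solves it by Picard iteration in dimension \( d = 1 \); the displacement `p` is a hypothetical example, and the iteration is one standard way to compute \( q \), not necessarily the argument used in the cited proof.

```python
import math

def p(x):
    # Hypothetical 2*pi-periodic displacement with |p| <= 0.2 and |p'| <= 0.2.
    return 0.2 * math.sin(x)

def q(y, iterations=60):
    # Picard iteration for q = -p(y + q); the contraction factor is at most 0.2.
    value = 0.0
    for _ in range(iterations):
        value = -p(y + value)
    return value

def f(x):
    # The change of variable x -> x + p(x).
    return x + p(x)

def f_inv(y):
    # Its inverse y -> y + q(y).
    return y + q(y)

# Largest error of f(f_inv(y)) - y on a grid of [0, 2*pi).
grid = [2.0 * math.pi * k / 200.0 for k in range(200)]
max_inverse_error = max(abs(f(f_inv(y)) - y) for y in grid)
# q inherits the size of p, in line with the bound on q in Lemma 3 for s = 0.
max_q = max(abs(q(y)) for y in grid)
```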

3 Weak Birkhoff normal form

In this section it is convenient to analyze the mKdV equation in the Fourier representation

$$\begin{aligned} u(x) = \sum \limits _{j \in \mathbb Z{\setminus } \{0\} } u_j e^{{\mathrm i}j x}, \quad u(x) \longleftrightarrow u := (u_j)_{j \in \mathbb Z{\setminus } \{0\} }, \quad u_{-j} = \overline{u}_j, \end{aligned}$$
(3.1)

where the Fourier indices are nonzero integers j, by the definition (1.5) of the phase space, and \(u_{-j} = \overline{u}_j\) because u(x) is real-valued. The symplectic structure (1.6) reads

$$\begin{aligned} \Omega = \frac{1}{2} \sum _{j \ne 0} \frac{1}{{\mathrm i}j} du_j \wedge d u_{-j}, \quad \Omega ( u, v ) = \sum _{j \ne 0} \frac{1}{{\mathrm i}j} u_j v_{-j}, \end{aligned}$$
(3.2)

the Hamiltonian vector field \(X_H\) in (1.3) and the Poisson bracket \(\{ F, G \}\) in (1.7) are respectively

$$\begin{aligned}{}[X_H (u)]_j = {\mathrm i}j \partial _{u_{-j}} H(u), \quad \{ F, G \}(u) = - \sum _{j \ne 0} {\mathrm i}j (\partial _{u_{-j}} F) (u) (\partial _{u_j} G) (u). \end{aligned}$$
(3.3)

We shall sometimes identify \( v \equiv (v_j)_{j \in S } \) and \( z \equiv (z_j)_{j \in S^c } \).

The Hamiltonian of the perturbed cubic mKdV Eq. (1.1) is \( H = H_2 + H_4 + H_{\ge 5} \) (see (1.4)) where

$$\begin{aligned} H_2(u) := \int _{\mathbb T} \frac{u_x^{2}}{2} dx, \quad H_4(u) := - \varsigma \int _\mathbb T\frac{u^4}{4} dx, \quad H_{\ge 5}(u) := \int _\mathbb Tf(x, u,u_x) dx, \end{aligned}$$
(3.4)

\(\varsigma = \pm 1\) and f satisfies (1.8). According to the splitting (1.26) \( u = v + z \), where \( v \in H_S \) and \( z \in H_S^\bot \), we have \(H_2(u) = H_2(v) + H_2(z)\) and

$$\begin{aligned} H_4(u) = - \frac{\varsigma }{4} \int _{\mathbb T} v^4 \, dx - \varsigma \int _{\mathbb T} v^3 z \, dx - \frac{3\varsigma }{2} \int _{\mathbb T} v^2 z^2 \, dx - \varsigma \int _{\mathbb T} v z^3 \, dx - \frac{\varsigma }{4} \int _{\mathbb T} z^4 \, dx. \end{aligned}$$

For a finite-dimensional space

$$\begin{aligned} E := E_{C} := \mathrm {span} \{ e^{{\mathrm i}jx} :0 < |j| \le C \}, \quad C > 0, \end{aligned}$$
(3.5)

let \(\Pi _E \) denote the corresponding \( L^2 \)-projector on E.

In the next proposition we construct a symplectic map \( \Phi _B \) such that the transformed Hamiltonian \(\mathcal {H}:= H \circ \Phi _B\) possesses the invariant subspace \( H_S \) defined in (1.20), and its dynamics on \( H_S \) is integrable and non-isochronous. To this purpose we have to eliminate the term \( \int v^3 z \, dx \) (which is linear in z) and to normalize the term \( \int v^4 \, dx \) (which is independent of z) in the quartic component of the Hamiltonian.

Proposition 1

(Weak Birkhoff normal form) There exists an analytic invertible symplectic transformation of the phase space \( \Phi _B : H^1_0 (\mathbb T_x) \rightarrow H^1_0 (\mathbb T_x) \) of the form

$$\begin{aligned} \Phi _B(u) = u + \Psi (u), \quad \Psi (u) = \Pi _E \Psi (\Pi _E u), \end{aligned}$$
(3.6)

where E is a finite-dimensional space as in (3.5), such that the transformed Hamiltonian is

$$\begin{aligned} \mathcal{H} := H \circ \Phi _B = H_2 + \mathcal {H}_4 + \mathcal{H}_{\ge 5} , \end{aligned}$$
(3.7)

where \(H_2\) is defined in (3.4),

$$\begin{aligned} \mathcal {H}_4&:= \frac{3\varsigma }{4} \left( \sum _{j \in S} |u_j|^4 - \sum _{j,j' \in S} |u_j|^2 |u_{j'}|^2 \right) - \frac{3\varsigma }{2} \int _\mathbb Tv^2 z^2 \, dx \nonumber \\&\quad \ \ - \varsigma \int _\mathbb Tv z^3 \, dx - \frac{\varsigma }{4} \int _\mathbb Tz^4 \, dx, \end{aligned}$$
(3.8)

and \(\mathcal{H}_{\ge 5}\) collects all the terms of order at least five in (v, z).

Proof

In Fourier coordinates (3.1) we have (see (3.4))

$$\begin{aligned} H_2(u) = \frac{1}{2} \sum _{j \ne 0} j^2 |u_j|^2, \quad H_4(u) = - \frac{\varsigma }{4} \, \sum _{j_1 + j_2 + j_3 + j_4 = 0} u_{j_1} u_{j_2} u_{j_3} u_{j_4} . \end{aligned}$$
(3.9)

We look for a symplectic transformation \(\Phi \) of the phase space which eliminates or normalizes the monomials \( u_{j_1} u_{j_2} u_{j_3} u_{j_4} \) of \( H_4 \) with at most one index outside S. By the relation \( j_1 + j_2 + j_3 + j_4 = 0 \), they are finitely many. Thus, we look for a map \(\Phi := (\Phi _{F}^t)_{|t=1}\) which is the time 1-flow map of an auxiliary quartic Hamiltonian

$$\begin{aligned} F(u) := \sum _{j_1 + j_2 + j_3 + j_4 = 0} F_{j_1 j_2 j_3 j_4} u_{j_1} u_{j_2} u_{j_3} u_{j_4} . \end{aligned}$$

The transformed Hamiltonian is

$$\begin{aligned} \mathcal {H}:= H \circ \Phi = H_2 + \mathcal {H}_4 + \mathcal {H}_{\ge 5}, \quad \mathcal {H}_4 = \{ H_2, F \} + H_4, \end{aligned}$$
(3.10)

where \( \mathcal {H}_{\ge 5} \) collects all the terms in \(\mathcal {H}\) of order at least five. By (3.3) and (3.9) we calculate

$$\begin{aligned} \mathcal {H}_4 = \sum _{j_1 + j_2 + j_3 + j_4 = 0} \left\{ - \frac{\varsigma }{4} \, - {\mathrm i}(j_1^3 + j_2^3 + j_3^3 + j_4^3) F_{j_1 j_2 j_3 j_4} \right\} \, u_{j_1} u_{j_2} u_{j_3} u_{j_4}. \end{aligned}$$

In order to eliminate or normalize only the monomials with at most one index outside S, we choose

$$\begin{aligned} F_{j_1 j_2 j_3 j_4} := {\left\{ \begin{array}{ll} \dfrac{{\mathrm i}\varsigma }{4 (j_1^3 + j_2^3 + j_3^3 + j_4^3)} &{}\quad \text {if} \,\,(j_1,j_2,j_3,j_4) \in \mathcal{A},\\ 0 &{}\quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(3.11)

where

$$\begin{aligned} \mathcal{A}:= & {} \{ (j_1 , j_2 , j_3, j_4) \in (\mathbb Z{\setminus } \{ 0 \})^4 : \ j_1 + j_2 + j_3 + j_4= 0, \quad j_1^{3} + j_2^{3} + j_3^{3} + j_4^3 \ne 0, \\&\text { and at least three among} \ j_1 , j_2 , j_3, j_4 \ \text {belong to } S \}. \end{aligned}$$

We recall the following elementary identity (Lemma 13.4 in [18]).

Lemma 4

Let \(j_1, j_2, j_3, j_4 \in \mathbb Z\) such that \( j_1 + j_2 + j_3 + j_4 = 0 \). Then

$$\begin{aligned} j_1^3 + j_2^3 + j_3^3 + j_4^3 = -3 (j_1 + j_2) (j_1 + j_3) (j_2 +j_3). \end{aligned}$$
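The identity of Lemma 4 is a polynomial identity in \( (j_1, j_2, j_3) \) once \( j_4 = -(j_1 + j_2 + j_3) \) is substituted. As a sanity check, it can be verified by brute force in exact integer arithmetic; the range of integers below is an arbitrary choice.

```python
from itertools import product

def cube_sum(j1, j2, j3):
    # Left-hand side, with j4 fixed by the constraint j1 + j2 + j3 + j4 = 0.
    j4 = -(j1 + j2 + j3)
    return j1 ** 3 + j2 ** 3 + j3 ** 3 + j4 ** 3

def factored(j1, j2, j3):
    # Right-hand side of Lemma 4.
    return -3 * (j1 + j2) * (j1 + j3) * (j2 + j3)

# Exhaustive check over a finite box of integers.
R = range(-12, 13)
identity_holds = all(cube_sum(*j) == factored(*j) for j in product(R, repeat=3))
```

In particular, under the zero-sum constraint the cubic sum vanishes exactly when one of the pair sums \( j_1 + j_2 \), \( j_1 + j_3 \), \( j_2 + j_3 \) vanishes, which is how the lemma is used in the sequel.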

By definition (3.11), \(\mathcal {H}_4\) does not contain any monomial \(u_{j_1} u_{j_2} u_{j_3} u_{j_4}\) with three indices in S and one outside, because there exist no integers \( j_1, j_2 , j_3 \in S\), \( j_4 \in S^c \) satisfying \( j_1 + j_2 + j_3 + j_4 = 0 \) and \( j_1^3 + j_2^3 + j_3^3 + j_4^3 = 0 \), by Lemma 4 and the fact that S is symmetric.

By construction, the quartic monomials with at least two indices outside S are not changed by \(\Phi \). Also, by construction, the monomials \(u_{j_1} u_{j_2} u_{j_3} u_{j_4}\) in \(\mathcal {H}_4\) with all integers in S are those for which \(j_1 + j_2 + j_3 + j_4 = 0\) and \(j_1^3 + j_2^3 + j_3^3 + j_4^3 = 0\). By Lemma 4, we split

$$\begin{aligned} \sum _{\begin{array}{c} j_1, j_2, j_3, j_4 \in S \\ j_1 + j_2 + j_3 + j_4 = 0 \\ j_1^3 + j_2^3 + j_3^3 + j_4^3 = 0 \end{array}} u_{j_1} u_{j_2} u_{j_3} u_{j_4} = A_1 + A_2 + A_3 \end{aligned}$$

where \(A_1\) is given by the sum over \(j_1, j_2, j_3, j_4 \in S\), \(j_1 + j_2 + j_3 + j_4 = 0\) with the restriction \(j_1 + j_2 = 0\), \(A_2\) with the restriction \(j_1 + j_2 \ne 0\) and \(j_1 + j_3 = 0\), and \(A_3\) with the restriction \(j_1 + j_2 \ne 0\), \(j_1 + j_3 \ne 0\) and \(j_2 + j_3 = 0\). We get

$$\begin{aligned} A_2&= \sum _{\begin{array}{c} j, j' \in S \\ j' \ne -j \end{array}} |u_j|^2 |u_{j'}|^2 = \sum _{j, j' \in S} |u_j|^2 |u_{j'}|^2 - \sum _{j \in S} |u_j|^4 , \quad A_1 = \sum _{j, j' \in S} |u_j|^2 |u_{j'}|^2 ,\\ A_3&= \sum _{\begin{array}{c} j, j' \in S \\ j' \ne \pm j \end{array}} |u_j|^2 |u_{j'}|^2 = \sum _{j, j' \in S} |u_j|^2 |u_{j'}|^2 - 2 \sum _{j \in S} |u_j|^4 , \end{aligned}$$

whence (3.8) follows.
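The count \( A_1 + A_2 + A_3 = 3 \sum _{j, j' \in S} |u_j|^2 |u_{j'}|^2 - 3 \sum _{j \in S} |u_j|^4 \) behind (3.8) can be cross-checked by enumerating the resonant quadruples directly. In the sketch below the set of tangential sites and the Fourier coefficients are hypothetical sample data; only the constraints \( u_{-j} = \overline{u}_j \) and the symmetry of S are taken from the text.

```python
import cmath
import random
from itertools import product

random.seed(0)

# Hypothetical tangential sites S^+ and the symmetric set S.
S_plus = [1, 3, 4]
S = S_plus + [-j for j in S_plus]

# Random Fourier coefficients with u_{-j} = conj(u_j).
u = {}
for j in S_plus:
    u[j] = cmath.rect(random.uniform(0.5, 1.5),
                      random.uniform(0.0, 2.0 * cmath.pi))
    u[-j] = u[j].conjugate()

# Sum of all monomials with indices in S, zero sum and zero sum of cubes.
resonant_sum = sum(
    u[j1] * u[j2] * u[j3] * u[j4]
    for j1, j2, j3, j4 in product(S, repeat=4)
    if j1 + j2 + j3 + j4 == 0
    and j1 ** 3 + j2 ** 3 + j3 ** 3 + j4 ** 3 == 0
)

sum_sq = sum(abs(u[j]) ** 2 for j in S)   # sum over S of |u_j|^2
sum_4 = sum(abs(u[j]) ** 4 for j in S)    # sum over S of |u_j|^4
expected = 3.0 * sum_sq ** 2 - 3.0 * sum_4
```

Multiplying the common value by \( - \varsigma / 4 \) gives the first term of (3.8).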

Remark 1

In the Birkhoff normal form for the Hamiltonian \( K = H + \lambda M^2 \) defined in (1.18), three additional terms appear in (3.8), which are

$$\begin{aligned} \lambda \sum _{j, j' \in S} |u_j|^2 |u_{j'}|^2 + 2 \lambda M(v) M(z) + \lambda M^2(z). \end{aligned}$$

Then in (3.8) the sum \((\lambda - \frac{3\varsigma }{4}) \sum _{j, j' \in S} |u_j|^2 |u_{j'}|^2\) vanishes if we choose \(\lambda := 3 \varsigma /4\).

4 Action-angle variables

We introduce action-angle variables on the tangential directions by the change of coordinates

$$\begin{aligned} u_j := \sqrt{\tilde{\xi }_j + |j| \tilde{y}_j} \, e^{{\mathrm i}\tilde{\theta }_j} \quad \text {for} \ j \in S ; \quad u_j := \tilde{z}_j \quad \text {for} \ j \in S^c , \end{aligned}$$
(4.1)

where (recall that \( u_{-j} = {\overline{u}}_j \))

$$\begin{aligned} \tilde{\xi }_{-j} = \tilde{\xi }_j , \quad \tilde{\xi }_j > 0 , \quad \tilde{y}_{-j} = \tilde{y}_j , \quad \tilde{\theta }_{-j} = - \tilde{\theta }_j , \quad \tilde{\theta }_j, \, \tilde{y}_j \in \mathbb R, \quad \forall j \in S . \end{aligned}$$
(4.2)

To simplify notation, for the tangential sites \( S^+ := \{ {\bar{\jmath }_1}, \ldots , {\bar{\jmath }_\nu } \} \) we also denote \(\tilde{\theta }_{\bar{\jmath }_i} := \tilde{\theta }_i \), \( \tilde{y}_{\bar{\jmath }_i} := \tilde{y}_i \), \( \tilde{\xi }_{\bar{\jmath }_i} := \tilde{\xi }_i \), \( i = 1, \ldots , \nu \).

The symplectic 2-form \( \Omega \) in (3.2) (i.e. (1.6)) becomes

$$\begin{aligned} \mathcal{W} := \sum _{i=1}^\nu d \tilde{\theta }_i \wedge d \tilde{y}_i + \frac{1}{2} \sum _{j \in S^c {\setminus } \{ 0 \} } \frac{1}{{\mathrm i}j} \, d \tilde{z}_j \wedge d \tilde{z}_{-j} = \left( \sum _{i=1}^\nu d \tilde{\theta }_i \wedge d \tilde{y}_i \right) \oplus \Omega _{S^\bot } = d \Lambda \end{aligned}$$
(4.3)

where \( \Omega _{S^\bot } \) denotes the restriction of \( \Omega \) to \( H_S^\bot \) (see (1.20)) and \( \Lambda \) is the Liouville 1-form on \( \mathbb T^\nu \times \mathbb R^\nu \times H_S^\bot \) defined by \( \Lambda _{(\tilde{\theta }, \tilde{y}, \tilde{z})} : \mathbb R^\nu \times \mathbb R^\nu \times H_S^\bot \rightarrow \mathbb R\),

$$\begin{aligned} \Lambda _{(\tilde{\theta }, \tilde{y}, \tilde{z})}[{\widehat{\theta }}, {\widehat{y}}, {\widehat{z}}] := - \tilde{y} \cdot {\widehat{\theta }} + \frac{1}{2} ( \partial _x^{-1} \tilde{z}, {\widehat{z}} )_{L^2 (\mathbb T)} . \end{aligned}$$
(4.4)

We rescale the “unperturbed actions” \( \tilde{\xi }\) and the variables \(\tilde{\theta }, \tilde{y}, \tilde{z}\) as

$$\begin{aligned} \tilde{\xi }= \varepsilon ^2 \xi , \quad \tilde{y} = \varepsilon ^{2b} y , \quad \tilde{z} = \varepsilon ^b z , \quad b > 1, \end{aligned}$$
(4.5)

where \(b>1\) will be fixed below (see (5.9) and Remark 3). The symplectic 2-form in (4.3) transforms into \( \varepsilon ^{2b} \mathcal{W } \). Hence the Hamiltonian system generated by \( \mathcal{H} \) in (3.7) transforms into the new Hamiltonian system

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\theta }} = \partial _y H_{\varepsilon } (\theta , y, z), \\ {\dot{y}} = - \partial _\theta H_{\varepsilon } (\theta , y, z), \\ {\dot{z}} = \partial _x \nabla _z H_{\varepsilon } (\theta , y, z), \end{array}\right. } \quad H_{\varepsilon } := \varepsilon ^{-2b} \mathcal {H}\circ A_\varepsilon , \end{aligned}$$
(4.6)

where

$$\begin{aligned} A_\varepsilon (\theta , y, z) := \varepsilon v_\varepsilon (\theta , y) + \varepsilon ^b z, \ \ \ v_\varepsilon (\theta ,y) := \sum _{j \in S} \sqrt{\xi _j + \varepsilon ^{2(b-1)} |j| y_j} \, e^{{\mathrm i}\theta _j} e^{{\mathrm i}j x}. \end{aligned}$$
(4.7)

We still denote by

$$\begin{aligned} X_{H_\varepsilon } = (\partial _y H_\varepsilon , - \partial _\theta H_\varepsilon , \partial _x \nabla _z H_\varepsilon ) \end{aligned}$$

the Hamiltonian vector field in the variables \( (\theta , y, z ) \in \mathbb T^\nu \times \mathbb R^\nu \times H_S^\bot \).

We now write explicitly the Hamiltonian \( H_{\varepsilon } (\theta , y, z) \) defined in (4.6). Recall the expression of \( \mathcal{H } \) given in (3.7). The quadratic Hamiltonian \( H_2 \) in (3.4) transforms into

$$\begin{aligned} \varepsilon ^{-2b} H_2 \circ A_\varepsilon = const + \sum _{j \in S^+} j^3 y_j + \frac{1}{2} \int _{\mathbb T} z_x^2 \, dx , \end{aligned}$$
(4.8)

and, by (3.7) and (3.8) we get [writing, in short, \( v_\varepsilon := v_\varepsilon (\theta , y) \)]

$$\begin{aligned} H_{\varepsilon } (\theta , y, z)= & {} e(\xi ) + \alpha (\xi ) \cdot y + \frac{1}{2} \int _\mathbb Tz_x^2 \, dx - \frac{3 \varsigma }{2}\, \varepsilon ^2 \int _\mathbb Tv_\varepsilon ^2 z^2 \, dx \nonumber \\&+ 3 \varsigma \varepsilon ^{2b} \left( \frac{1}{2} \sum _{j \in S^+} j^2 y_j^2 - \sum _{j,j' \in S^+} j y_j j' y_{j'} \right) - \varsigma \varepsilon ^{1+b} \int _\mathbb Tv_\varepsilon z^3 \, dx \nonumber \\&- \frac{\varsigma }{4}\, \varepsilon ^{2b} \int _\mathbb Tz^4 \, dx + \varepsilon ^{-2b} \mathcal{H}_{\ge 5} (\varepsilon v_\varepsilon (\theta ,y) + \varepsilon ^b z ) \end{aligned}$$
(4.9)

where \(e(\xi )\) is a constant, and \(\alpha (\xi ) \in \mathbb R^\nu \) is the vector of components

$$\begin{aligned} \alpha _i(\xi ) := \bar{\jmath }_i^3 + 3 \varsigma \varepsilon ^2 [ \xi _i - 2 (\xi _1 + \cdots + \xi _\nu ) ] \bar{\jmath }_i , \quad i = 1, \ldots , \nu . \end{aligned}$$

This is the “frequency-to-amplitude” map which describes, at the main order, how the tangential frequencies are shifted by the amplitudes \( \xi := ( \xi _1, \ldots , \xi _\nu ) \). It can be written in compact form as

$$\begin{aligned} \alpha (\xi ) := \bar{\omega }+ \varepsilon ^2 {\mathbb A} \xi , \quad {\mathbb A} := 3\varsigma D_S (I - 2 U), \end{aligned}$$
(4.10)

where \( \bar{\omega }:= (\bar{\jmath }_1^3, \ldots , \bar{\jmath }_\nu ^3) \in \mathbb N^\nu \) (see (1.19)) is the vector of the unperturbed linear frequencies of oscillations on the tangential sites, \( D_S \) is the diagonal matrix

$$\begin{aligned} D_S := \mathrm {diag}(\bar{\jmath }_1, \ldots , \bar{\jmath }_\nu ) \in \mathrm{Mat}(\nu \times \nu ) , \end{aligned}$$

I is the \(\nu \times \nu \) identity matrix, and U is the \(\nu \times \nu \) matrix with all entries equal to 1. The matrix \(\mathbb A\) is often called the “twist” matrix. It turns out to be invertible. Indeed, since \(U^2 = \nu U\), one has \((I - 2 U)( I - \frac{2}{2\nu -1}\, U ) = I\), and therefore

$$\begin{aligned} \mathbb A^{-1} = \frac{1}{3\varsigma }\, \left( I - \frac{2}{2\nu -1}\, U \right) D_S^{-1} . \end{aligned}$$
(4.11)
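Formula (4.11) can be confirmed in exact rational arithmetic for a concrete choice of sites. The sketch below builds \( \mathbb A = 3\varsigma D_S (I - 2U) \) and the candidate inverse for a hypothetical set of tangential sites and checks that both products give the identity; the helper names are illustrative only.

```python
from fractions import Fraction

def matmul(X, Y):
    # Product of two square matrices stored as lists of rows.
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def twist_and_inverse(sites, sigma):
    """Return (A, A_inv, I) for A = 3*sigma*D_S*(I - 2U) and (4.11).

    sites: positive integers playing the role of the tangential sites,
    sigma: +1 or -1.
    """
    nu = len(sites)
    I = [[Fraction(int(i == j)) for j in range(nu)] for i in range(nu)]
    D = [[Fraction(sites[i]) if i == j else Fraction(0)
          for j in range(nu)] for i in range(nu)]
    D_inv = [[Fraction(1, sites[i]) if i == j else Fraction(0)
              for j in range(nu)] for i in range(nu)]
    # U has all entries equal to 1, so (I - c*U)[i][j] = I[i][j] - c.
    I_minus_2U = [[I[i][j] - 2 for j in range(nu)] for i in range(nu)]
    c = Fraction(2, 2 * nu - 1)
    I_minus_cU = [[I[i][j] - c for j in range(nu)] for i in range(nu)]
    A = [[3 * sigma * x for x in row] for row in matmul(D, I_minus_2U)]
    A_inv = [[x / (3 * sigma) for x in row]
             for row in matmul(I_minus_cU, D_inv)]
    return A, A_inv, I

# Hypothetical sites (1, 2, 5) in the focusing or defocusing case sigma = -1.
A, A_inv, identity = twist_and_inverse([1, 2, 5], sigma=-1)
left_product = matmul(A, A_inv)
right_product = matmul(A_inv, A)
```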

With this notation, one can also write

$$\begin{aligned} \frac{1}{2} \sum _{j \in S^+} j^2 y_j^2 - \sum _{j,j' \in S^+} j y_j j' y_{j'} = \frac{1}{2} (I-2U) (D_S y) \cdot (D_S y). \end{aligned}$$
(4.12)
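The tangential part of (4.9), namely the frequency shift in \( \alpha (\xi ) \) and the quadratic form (4.12), comes from expanding the first term of (3.8) with \( |u_j|^2 = \varepsilon ^2 \xi _j + \varepsilon ^{2b} |j| y_j \). The following sketch checks this expansion exactly with rational numbers. The symbols `A` and `B` stand for \( \varepsilon ^2 \) and \( \varepsilon ^{2b} \) and are treated as independent rationals, and the numerical values are hypothetical.

```python
from fractions import Fraction as Fr

def tangential_quartic(sites, xi, y, A, B, sigma):
    """(3*sigma/4) * (sum_S |u_j|^4 - (sum_S |u_j|^2)^2) from (3.8).

    |u_j|^2 = A*xi_j + B*j*y_j for j in S^+, and the sums over the
    symmetric set S are twice the sums over S^+.
    """
    m = [A * x + B * j * yy for j, x, yy in zip(sites, xi, y)]
    return Fr(3 * sigma, 4) * (2 * sum(t * t for t in m) - (2 * sum(m)) ** 2)

def predicted(sites, xi, y, A, B, sigma):
    """eps^{-2b} * (T(y) - T(0)) as read off from (4.9) and (4.12)."""
    total_xi = sum(xi)
    # Components of alpha(xi) - omega_bar = eps^2 * (twist matrix) * xi.
    shift = [3 * sigma * A * (x - 2 * total_xi) * j
             for j, x in zip(sites, xi)]
    Dy = [j * yy for j, yy in zip(sites, y)]
    # (1/2) (I - 2U)(D_S y) . (D_S y)
    quad = Fr(1, 2) * sum(t * t for t in Dy) - sum(Dy) ** 2
    return sum(s * yy for s, yy in zip(shift, y)) + 3 * sigma * B * quad

# Hypothetical data.
sites = [1, 3, 4]
xi = [Fr(1), Fr(3, 2), Fr(2)]
y = [Fr(1, 3), Fr(-2, 5), Fr(1, 7)]
zero = [Fr(0)] * 3
A, B, sigma = Fr(1, 100), Fr(1, 100000), 1

lhs = (tangential_quartic(sites, xi, y, A, B, sigma)
       - tangential_quartic(sites, xi, zero, A, B, sigma)) / B
rhs = predicted(sites, xi, y, A, B, sigma)
```

The term \( \bar{\omega } \cdot y \) of \( \alpha (\xi ) \cdot y \) is not included here, because it comes from \( H_2 \) through (4.8).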

Remark 2

By Remark 1, for the Hamiltonian \( K = H + \lambda M^2 \), \(\lambda := 3 \varsigma /4\), defined in (1.18) the twist matrix in the frequency-amplitude relation (4.10) becomes \(\mathbb A = 3 \varsigma D_S\), which is diagonal.

We write the Hamiltonian in (4.9) [eliminating the constant \(e(\xi )\) which is irrelevant for the dynamics] as \(H_{\varepsilon } = \mathcal{N} + P\), where

$$\begin{aligned} \begin{array}{lll} &{}\displaystyle \quad \mathcal{N}(\theta , y, z) = \alpha (\xi ) \cdot y + \frac{1}{2} (N(\theta ) z , z )_{L^2(\mathbb T)} , \\ &{}\displaystyle (N(\theta ) z, z )_{L^2(\mathbb T)} := \int _\mathbb Tz_x^2 dx - 3\varsigma \varepsilon ^2 \int _\mathbb Tv_\varepsilon ^2(\theta ,0) z^2 \, dx , \end{array} \end{aligned}$$
(4.13)

describes the linear dynamics, and \( P := H_{\varepsilon } - \mathcal{N} \), namely

$$\begin{aligned} P&:= \frac{3\varsigma }{2}\, \varepsilon ^{2b} (I-2U) (D_S y) \cdot (D_S y) - \frac{3 \varsigma }{2}\, \varepsilon ^2 \int _\mathbb T[v_\varepsilon ^2(\theta ,y) - v_\varepsilon ^2(\theta ,0)] z^2 \, dx \nonumber \\&\quad - \varsigma \varepsilon ^{1+b} \int _\mathbb Tv_\varepsilon (\theta ,y) z^3 \, dx - \frac{\varsigma }{4}\, \varepsilon ^{2b} \int _\mathbb Tz^4 \, dx + \varepsilon ^{-2b} \mathcal{H}_{\ge 5} (\varepsilon v_\varepsilon (\theta ,y) + \varepsilon ^b z ) , \end{aligned}$$
(4.14)

collects the nonlinear perturbative effects.

5 The nonlinear functional setting

We look for an embedded invariant torus

$$\begin{aligned} i :\mathbb T^\nu \rightarrow \mathbb T^\nu \times \mathbb R^\nu \times H_S^\bot , \quad \varphi \mapsto i (\varphi ) := ( \theta (\varphi ), y (\varphi ), z (\varphi )) \end{aligned}$$
(5.1)

of the Hamiltonian vector field \( X_{H_\varepsilon } \) filled by quasi-periodic solutions with diophantine frequency \( \omega \in \mathbb R^\nu \), which we regard as an independent parameter. We require that \( \omega \) belongs to the set

$$\begin{aligned} \Omega _\varepsilon := \alpha ( [1,2]^\nu ) = \{ \alpha (\xi ) : \xi \in [1,2]^\nu \} \end{aligned}$$
(5.2)

where \( \alpha \) is the affine diffeomorphism (4.10). Since any \( \omega \in \Omega _\varepsilon \) is \( \varepsilon ^2 \)-close to the integer vector \( \bar{\omega }\in \mathbb N^\nu \) (see (1.19), (4.10)), we require that the constant \(\gamma \) in the diophantine inequality

$$\begin{aligned} |\omega \cdot l | \ge \gamma \langle l \rangle ^{-\tau } , \quad \forall l \in \mathbb Z^\nu {\setminus } \{0\} , \quad \text {satisfies} \ \gamma = \varepsilon ^{2+a} \quad \text {for some} \ a > 0 . \end{aligned}$$
(5.3)

In (5.9) we will fix \(a \in (0,1/6)\) (see also the discussion in Remark 3). Note that the definition of \(\gamma \) in (5.3) is slightly stronger than the minimal condition, which is \( \gamma \le c \varepsilon ^2 \) with c small enough. We assume \( a > 0 \) just for simplicity. In addition to (5.3) we shall also require that \( \omega \) satisfies the first and second order Melnikov-non-resonance conditions (8.63).

We fix the amplitude \(\xi \) as a function of \(\omega \) and \( \varepsilon \), as

$$\begin{aligned} \xi := \varepsilon ^{-2} \mathbb {A}^{-1} [\omega - \bar{\omega }], \end{aligned}$$
(5.4)

so that \(\alpha (\xi ) = \omega \) (see (4.10)).
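
Indeed, assuming that the affine map in (4.10) has the form \( \alpha (\xi ) = \bar{\omega } + \varepsilon ^2 \mathbb A \xi \) (which is consistent with the \( \varepsilon ^2 \)-closeness of \( \Omega _\varepsilon \) to \( \bar{\omega } \) noted above), the choice (5.4) gives

$$\begin{aligned} \alpha (\xi ) = \bar{\omega } + \varepsilon ^2 \mathbb A \, \varepsilon ^{-2} \mathbb A^{-1} [\omega - \bar{\omega }] = \bar{\omega } + (\omega - \bar{\omega }) = \omega . \end{aligned}$$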

Now we look for an embedded invariant torus of the modified Hamiltonian vector field \( X_{H_{\varepsilon , \zeta }} = X_{H_\varepsilon } - (0, \zeta , 0) \), \( \zeta \in \mathbb R^\nu \), which is generated by the Hamiltonian

$$\begin{aligned} H_{\varepsilon , \zeta } (\theta , y, z) := H_\varepsilon (\theta , y, z) + \zeta \cdot \theta ,\quad \zeta \in \mathbb R^\nu . \end{aligned}$$
(5.5)

Note that the vector field \( X_{H_{\varepsilon , \zeta }} \) is periodic in \(\theta \) (unlike the Hamiltonian \( H_{\varepsilon , \zeta } \)). We introduce \(\zeta \) in order to adjust the average in the second equation of the linearized system (6.22), see (6.23). The vector \( \zeta \) has however no dynamical consequences. Indeed it turns out that an invariant torus for the Hamiltonian vector field \( X_{H_{\varepsilon , \zeta }} \) is actually invariant for \( X_{H_\varepsilon } \) itself, see Lemma 6. Hence we look for zeros of the nonlinear operator

$$\begin{aligned} \mathcal{F} (i, \zeta )&:= \mathcal{F} (i, \zeta , \omega , \varepsilon ) := \mathcal{D}_\omega i (\varphi ) - X_{H_\varepsilon } (i(\varphi )) + (0, \zeta , 0 ) \nonumber \\&= \begin{pmatrix} \mathcal{D}_\omega \theta (\varphi ) - \partial _y H_\varepsilon ( i(\varphi ) ) \\ \mathcal{D}_\omega y (\varphi ) + \partial _\theta H_\varepsilon ( i(\varphi ) ) + \zeta \nonumber \\ \mathcal{D}_\omega z (\varphi ) - \partial _x \nabla _z H_\varepsilon ( i(\varphi )) \end{pmatrix} \nonumber \\&= \begin{pmatrix} \mathcal{D}_\omega \Theta (\varphi ) - \partial _y P (i(\varphi ) ) \\ \mathcal{D}_\omega y (\varphi ) + \frac{1}{2} \partial _\theta ( N(\theta (\varphi )) z(\varphi ), z(\varphi ) )_{L^2(\mathbb T)} + \partial _\theta P ( i(\varphi ) ) + \zeta \\ \mathcal{D}_\omega z (\varphi ) - \partial _x N ( \theta (\varphi )) z (\varphi ) - \partial _x \nabla _z P ( i(\varphi ) ) \end{pmatrix} \end{aligned}$$
(5.6)

where \( \Theta (\varphi ) := \theta (\varphi ) - \varphi \) is \( (2 \pi )^\nu \)-periodic and we use (here and everywhere in the paper) the short notation

$$\begin{aligned} \mathcal{D}_\omega := \omega \cdot \partial _\varphi . \end{aligned}$$
(5.7)

The Sobolev norm of the periodic component of the embedded torus

$$\begin{aligned} {\mathfrak I}(\varphi ) := i (\varphi ) - (\varphi ,0,0) := ( {\Theta } (\varphi ), y(\varphi ), z(\varphi )), \quad \Theta (\varphi ) := \theta (\varphi ) - \varphi , \end{aligned}$$
(5.8)

is \(\Vert {\mathfrak I} \Vert _s := \Vert \Theta \Vert _{H^s_\varphi } + \Vert y \Vert _{H^s_\varphi } + \Vert z \Vert _s \) where \( \Vert z \Vert _s := \Vert z \Vert _{H^s_{\varphi ,x}} \) is defined in (2.3). We link the rescaling (4.5) with the diophantine constant \( \gamma = \varepsilon ^{2+a} \) by choosing

$$\begin{aligned} \gamma = \varepsilon ^{2+a} = \varepsilon ^{2b}, \quad b = 1 + ( a \slash 2 ) , \quad a \in (0, 1/6). \end{aligned}$$
(5.9)

Other choices are possible, see Remark 3.
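
The exponents appearing below follow from (5.9) by elementary arithmetic. The following lines only unpack the definitions:

$$\begin{aligned}&2 + a = 2b \ \Longleftrightarrow \ b = 1 + \frac{a}{2} , \qquad a \in (0, 1/6) \ \Longrightarrow \ 1< b < \frac{13}{12} , \nonumber \\&\varepsilon ^{5-2b} \gamma ^{-1} = \varepsilon ^{5-4b} = \varepsilon ^{5 - 4 - 2a} = \varepsilon ^{1-2a} , \qquad 1 - 2a \in (2/3, 1) . \end{aligned}$$

In particular every quantity bounded by \( C \varepsilon ^{5-2b} \gamma ^{-1} \) tends to zero as \( \varepsilon \rightarrow 0 \).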

Theorem 2

Let the tangential sites S in (1.11) satisfy (1.12). For all \( \varepsilon \in (0, \varepsilon _0 ) \), where \( \varepsilon _0 \) is small enough, there exist a constant \(C>0\) and a Cantor-like set \( \mathcal{C}_\varepsilon \subset \Omega _\varepsilon \), with asymptotically full measure as \( \varepsilon \rightarrow 0 \), namely

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \, \frac{|\mathcal{C}_\varepsilon |}{|\Omega _\varepsilon |} = 1 , \end{aligned}$$
(5.10)

such that, for all \( \omega \in \mathcal{C}_\varepsilon \), there exists a solution \( i_\infty (\varphi ) := i_\infty (\omega , \varepsilon )(\varphi ) \) of the equation \(\mathcal {F}(i_\infty , 0, \omega , \varepsilon ) = 0\) (the nonlinear operator \(\mathcal {F}(i,\zeta ,\omega ,\varepsilon )\) is defined in (5.6)). Hence the embedded torus \( \varphi \mapsto i_\infty (\varphi ) \) is invariant for the Hamiltonian vector field \( X_{H_\varepsilon } \), and it is filled by quasi-periodic solutions with frequency \( \omega \). The torus \(i_\infty \) satisfies

$$\begin{aligned} \Vert i_\infty (\varphi ) - (\varphi ,0,0) \Vert _{s_0 + \mu }^{\mathrm {Lip}(\gamma )}\le C \varepsilon ^{5-2b} \gamma ^{-1} = C \varepsilon ^{1-2a} \end{aligned}$$
(5.11)

for some \( \mu := \mu (\nu ) > 0 \). Moreover, the torus \( i_\infty \) is linearly stable.

Theorem 2 is proved in Sects. 6–9. It implies Theorem 1, where the \( \xi _j \) in (1.13) are the components of the vector \(\mathbb {A}^{-1}[\omega - \bar{\omega }]\). By (5.11), going back to the variables before the rescaling (4.5), we get \( \tilde{\Theta }_\infty = O( \varepsilon ^{5-4b}) \), \( \tilde{y}_\infty = O( \varepsilon ^{5-2b} ) \), \( \tilde{z}_\infty = O( \varepsilon ^{5-3b} ) \).
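
These sizes can be read off as follows, assuming that the rescaling (4.5) has the form \( (\theta , y, z) \mapsto (\theta , \varepsilon ^{2b} y, \varepsilon ^b z) \), as suggested by the argument \( \varepsilon v_\varepsilon + \varepsilon ^b z \) and the factor \( \varepsilon ^{2b} \) in (4.14). Since each component of \( i_\infty (\varphi ) - (\varphi ,0,0) \) is \( O(\varepsilon ^{5-4b}) \) by (5.11),

$$\begin{aligned} \tilde{\Theta }_\infty = \Theta _\infty = O(\varepsilon ^{5-4b}) , \quad \tilde{y}_\infty = \varepsilon ^{2b} y_\infty = O(\varepsilon ^{2b} \varepsilon ^{5-4b}) = O(\varepsilon ^{5-2b}) , \quad \tilde{z}_\infty = \varepsilon ^{b} z_\infty = O(\varepsilon ^{b} \varepsilon ^{5-4b}) = O(\varepsilon ^{5-3b}) . \end{aligned}$$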

Remark 3

The way to link the amplitude-rescaling (4.5) with the diophantine constant \( \gamma = \varepsilon ^{2+a} \) in (5.3) is not unique.

The choice \( \varepsilon ^{2b} < \gamma \) (i.e. “\( b > 1 \) large”) reduces to studying the Hamiltonian \( H_\varepsilon \) in (4.9) as a perturbation of an isochronous system (as in [21, 23, 27]). We can take \( b = 4 / 3 \) in order to minimize the size of the perturbation \( P = O( \varepsilon ^{7/3}) \), estimating uniformly all the terms in the last two lines of (4.9). As a counterpart, we have to regard in (4.9) the constants \( \alpha := \alpha (\xi ) \in \mathbb R^\nu \) (or \( \xi \) in (4.7)) as independent variables. This is the perspective described for example in [10]. Then the Nash–Moser scheme produces iteratively a sequence of \( \xi _n = \xi _n (\omega ) \) and embeddings \( \varphi \mapsto i_n (\varphi ) := (\theta _n (\varphi ), y_n (\varphi ), z_n (\varphi ) )\) at the same time.

The case \( \varepsilon ^{2b} > \gamma \) (i.e. “\( b \ge 1 \) small”), in particular if \( b = 1 \), reduces to studying the Hamiltonian \( H_\varepsilon \) in (4.9) as a perturbation of a non-isochronous system à la Arnold–Kolmogorov (note that the quadratic Hamiltonian in (4.12) satisfies the usual Kolmogorov non-degeneracy condition). In this case, the constant \( \xi _j \) in (4.7) and the average of \( |j| y_j (\varphi ) \) have the same size and therefore the same role. Then we may consider \( \xi _j \) as fixed, and tune the average of the action component \( y_j (\varphi ) \) in order to solve the linear Eq. (6.28), which corresponds to the angle component. We use the invertible (averaged) “twist”-matrix (6.30) to impose that the right hand side in (6.28) has zero average.

The intermediate case \(\varepsilon ^{2b} = \gamma \), adopted in this paper (as well as in [5]), has the advantage of avoiding the introduction of \( \xi (\omega ) \) as an independent variable, and it also makes it possible to estimate uniformly the sizes of the components of \( (\Theta (\varphi ) , y (\varphi ) , z (\varphi ) ) \) with no distinctions.
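
The value \( b = 4/3 \) mentioned in the first case of Remark 3 can be recovered by a simple balancing of exponents. As a sketch, assume that \( v_\varepsilon \), y, z are of size O(1) and read off the powers of \( \varepsilon \) in front of the terms of (4.14); the difference \( v_\varepsilon ^2(\theta ,y) - v_\varepsilon ^2(\theta ,0) \) is taken to be \( O(\varepsilon ^{2(b-1)}) \), and \( \mathcal{H}_{\ge 5} \) to be of order at least five in its argument:

$$\begin{aligned} P&= O( \varepsilon ^{2b} ) + O( \varepsilon ^{2} \varepsilon ^{2(b-1)} ) + O( \varepsilon ^{1+b} ) + O( \varepsilon ^{2b} ) + O( \varepsilon ^{-2b} \varepsilon ^{5} ) = O\big ( \varepsilon ^{\min \{ 1+b , \, 5-2b \}} \big ) \quad \text {for} \ b \ge 1 , \nonumber \\&\quad 1 + b = 5 - 2b \ \Longleftrightarrow \ b = \frac{4}{3} , \qquad \min \{ 1+b, 5-2b \} \big |_{b = 4/3} = \frac{7}{3} . \end{aligned}$$

Since \( 1 + b \) increases and \( 5 - 2b \) decreases in b, the exponent \( 7/3 \) is the largest one attainable in this way.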

Now we prove tame estimates for the composition operator induced by the Hamiltonian vector fields \( X_\mathcal{N} \) and \( X_P \) in (5.6), which are used in the next sections. Since the functions \( y \mapsto \sqrt{\xi + \varepsilon ^{2(b - 1)}|j| y} \), \(\theta \mapsto e^{{\mathrm i}\theta }\) are analytic for \(\varepsilon \) small enough, \(j \in S\) and \(|y| \le C\), the composition Lemma 2 implies that, for all \( \Theta , y \in H^s(\mathbb T^\nu , \mathbb R^\nu )\) with \(\Vert \Theta \Vert _{s_0}\), \(\Vert y \Vert _{s_0} \le 1\), setting \(\theta (\varphi ) := \varphi + \Theta (\varphi )\), one has the tame estimate

$$\begin{aligned} \Vert v_\varepsilon (\theta (\varphi ) ,y(\varphi ) ) \Vert _s \le _s 1 + \Vert \Theta \Vert _s + \Vert y \Vert _s . \end{aligned}$$

Hence the map \( A_\varepsilon \) in (4.7) satisfies, for all \( \Vert {\mathfrak I} \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\le 1 \) (see (5.8))

$$\begin{aligned} \Vert A_\varepsilon (\theta (\varphi ),y(\varphi ),z(\varphi )) \Vert _s^{{\mathrm {Lip}(\gamma )}} \le _s \varepsilon (1 + \Vert {\mathfrak I} \Vert _s^{\mathrm {Lip}(\gamma )}) . \end{aligned}$$
(5.12)

In the following lemma we collect tame estimates for the Hamiltonian vector fields \( X_\mathcal{N} \), \( X_P \), \( X_{H_\varepsilon } \) (see (4.13), (4.14)) whose proof is a direct application of classical tame product and composition estimates.

Lemma 5

Let \( {\mathfrak {I}}(\varphi ) \) in (5.8) satisfy \( \Vert {\mathfrak I} \Vert _{s_0 + 3}^{\mathrm {Lip}(\gamma )}\le C \varepsilon ^{5-2b} \gamma ^{-1} = C \varepsilon ^{5-4b}\). Then, writing in short \(\Vert \ \Vert _s\) to indicate \(\Vert \ \Vert _s^{\mathrm {Lip}(\gamma )}\), one has

$$\begin{aligned} \begin{array}{llllll} &{}\Vert \partial _y P(i) \Vert _s \le _s \varepsilon ^3 + \varepsilon ^{2b} \Vert {\mathfrak I}\Vert _{s+3} &{} \Vert \partial _\theta P(i) \Vert _s \le _s \varepsilon ^{5-2b} (1 + \Vert {\mathfrak I} \Vert _{s+3}) \\ &{}\Vert \nabla _z P(i) \Vert _s \le _s \varepsilon ^{4-b} + \varepsilon ^{6-3b} \Vert {\mathfrak I} \Vert _{s+3} &{} \Vert X_P(i)\Vert _s \le _s \varepsilon ^{5-2b} + \varepsilon ^{2b} \Vert {\mathfrak I}\Vert _{s+3} \\ &{}\Vert \partial _{\theta } \partial _y P(i)\Vert _s \le _s \varepsilon ^3 + \varepsilon ^{5-2b} \Vert {\mathfrak I}\Vert _{s+3} &{} \Vert \partial _y \nabla _z P(i)\Vert _s \le _s \varepsilon ^{2+b} + \varepsilon ^{2b} \Vert {\mathfrak I} \Vert _{s+3} \\ &{} &{} \Vert \partial _{yy} P(i) - \varepsilon ^{2b} \mathbb {A} D_S \Vert _s \le _s \varepsilon ^{1+2b} + \varepsilon ^3 \Vert {\mathfrak {I}}\Vert _{s+3} \end{array} \end{aligned}$$

(\(\mathbb {A}, D_S\) are defined in (4.10)) and, for all \( {\widehat{\imath }} := ({\widehat{\Theta }}, {\widehat{y}}, {\widehat{z}}) \),

$$\begin{aligned} \Vert \partial _y d_{i} X_P(i)[{\widehat{\imath }} \,] \Vert _s&\le _s \varepsilon ^{2b} \big ( \Vert {\widehat{\imath }} \,\Vert _{s + 3} + \Vert {\mathfrak I}\Vert _{s + 3} \Vert {\widehat{\imath }} \,\Vert _{s_0 + 3}\big )\end{aligned}$$
(5.13)
$$\begin{aligned} \Vert d_i X_{H_\varepsilon }(i) [{\widehat{\imath }} \, ] + (0,0, \partial _{xxx} \hat{z})\Vert _s&\le _s \varepsilon ^2 \big ( \Vert {\widehat{\imath }} \,\Vert _{s + 3} + \Vert {\mathfrak I} \Vert _{s + 3} \Vert {\widehat{\imath }} \,\Vert _{s_0 + 3} \big )\end{aligned}$$
(5.14)
$$\begin{aligned} \Vert d_i^2 X_{H_\varepsilon }(i) [{\widehat{\imath }}, {\widehat{\imath }} \,]\Vert _s&\le _s \varepsilon ^2 \big ( \Vert {\widehat{\imath }} \,\Vert _{s + 3} \Vert {\widehat{\imath }} \,\Vert _{s_0 + 3} + \Vert {\mathfrak I}\Vert _{s + 3} \Vert {\widehat{\imath }} \,\Vert _{s_0 + 3}^2 \big ) . \end{aligned}$$
(5.15)

In the sequel we also use that, by the diophantine condition (5.3), the operator \( \mathcal{D}_\omega ^{-1} \) (see (5.7)) is defined for all functions u with zero \( \varphi \)-average, and satisfies

$$\begin{aligned} \Vert \mathcal{D}_\omega ^{-1} u \Vert _s \le C \gamma ^{-1} \Vert u \Vert _{s+ \tau } , \quad \Vert \mathcal{D}_\omega ^{-1} u \Vert _s^{{\mathrm {Lip}(\gamma )}} \le C \gamma ^{-1} \Vert u \Vert _{s+ 2 \tau +1}^{{\mathrm {Lip}(\gamma )}} . \end{aligned}$$
(5.16)
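
The first bound in (5.16) can be seen on Fourier coefficients. The following sketch treats a function \( u(\varphi ) = \sum _{l \ne 0} u_l e^{{\mathrm i}l \cdot \varphi } \) depending on \( \varphi \) only, and assumes the normalization \( \Vert u \Vert _s^2 = \sum _{l} \langle l \rangle ^{2s} |u_l|^2 \) for the Sobolev norm (the case with x-dependence is analogous):

$$\begin{aligned} \mathcal{D}_\omega ^{-1} u = \sum _{l \ne 0} \frac{u_l}{{\mathrm i}\, \omega \cdot l} \, e^{{\mathrm i}l \cdot \varphi } , \qquad \Vert \mathcal{D}_\omega ^{-1} u \Vert _s^2 = \sum _{l \ne 0} \frac{\langle l \rangle ^{2s} |u_l|^2}{|\omega \cdot l|^2} \le \gamma ^{-2} \sum _{l \ne 0} \langle l \rangle ^{2(s + \tau )} |u_l|^2 = \gamma ^{-2} \Vert u \Vert _{s+\tau }^2 . \end{aligned}$$

The Lipschitz bound loses \( 2 \tau + 1 \) derivatives because, by (5.3), \( | (\omega _1 \cdot l)^{-1} - (\omega _2 \cdot l)^{-1} | \le |\omega _1 - \omega _2| \, |l| \, \gamma ^{-2} \langle l \rangle ^{2\tau } \); here we assume that the Lipschitz seminorm enters \( \Vert \ \Vert _s^{\mathrm {Lip}(\gamma )} \) with the weight \( \gamma \), which turns \( \gamma ^{-2} \) into \( \gamma ^{-1} \).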

6 Approximate inverse

In order to implement a convergent Nash–Moser scheme that leads to a solution of \( \mathcal {F}(i, \zeta ) = 0 \), we now construct an approximate right inverse (which satisfies tame estimates) of the linearized operator

$$\begin{aligned} d_{i, \zeta }\mathcal{F}(i_0, \zeta _0)[{\widehat{\imath }}, {\widehat{\zeta }} ] = \mathcal{D}_\omega {\widehat{\imath }}- d_i X_{H_\varepsilon } ( i_0 (\varphi ) ) [{\widehat{\imath }} ] + (0, {\widehat{\zeta }}, 0 ) , \end{aligned}$$
(6.1)

see Theorem 3. Note that \( d_{i, \zeta } \mathcal{F}(i_0, \zeta _0 ) \) is independent of \( \zeta _0 \) (see (5.6)).

The notion of approximate right inverse is introduced in [31]. It denotes a linear operator which is an exact right inverse at a solution \( (i_0, \zeta _0) \) of \( \mathcal{F}(i_0, \zeta _0) = 0 \). We implement the general strategy in [10] which reduces the search for an approximate right inverse of (6.1) to the search for an approximate inverse on the normal directions only.

It is well known that an invariant torus \( i_0 \) with diophantine flow is isotropic (see e.g. [10]), namely the pull-back 1-form \( i_0^* \Lambda \) is closed, where \( \Lambda \) is the Liouville 1-form in (4.4). This is tantamount to saying that the 2-form \( \mathcal W \) (see (4.3)) vanishes on the torus \( i_0 (\mathbb T^\nu )\), because \( i_0^* \mathcal{W} = i_0^* d \Lambda = d i_0^* \Lambda \). For an “approximately invariant” torus \( i_0 \) the 1-form \( i_0^* \Lambda \) is only “approximately closed”. In order to make this statement quantitative we consider

$$\begin{aligned} i_0^* \Lambda&= \sum _{k = 1}^\nu a_k (\varphi ) d \varphi _k , \nonumber \\ a_k(\varphi )&:= - ( [\partial _\varphi \theta _0 (\varphi )]^T y_0 (\varphi ) )_k + \frac{1}{2} ( \partial _{\varphi _k} z_0(\varphi ), \partial _{x}^{-1} z_0(\varphi ) )_{L^2(\mathbb T)} \end{aligned}$$
(6.2)

and we quantify how small is

$$\begin{aligned} i_0^* \mathcal{W} = d \, i_0^* \Lambda = \mathop {\sum }_{1 \le k < j \le \nu } A_{k j}(\varphi ) d \varphi _k \wedge d \varphi _j, \quad A_{k j} := \partial _{\varphi _k} a_j - \partial _{\varphi _j} a_k. \end{aligned}$$
(6.3)

Throughout this section we always assume the following hypothesis (which will be verified at each step of the Nash–Moser iteration):

  • Assumption The map \(\omega \mapsto i_0(\omega )\) is a Lipschitz function defined on some subset \(\Omega _o \subset \Omega _\varepsilon \), where \(\Omega _\varepsilon \) is defined in (5.2), and, for some \({\mu } := {\mu } ({{\tau }}, {\nu }) > 0 \),

$$\begin{aligned} \Vert {\mathfrak I}_0 \Vert _{s_0+\mu }^{{\mathrm {Lip}(\gamma )}}\le & {} C \varepsilon ^{5-2b} \gamma ^{-1} = C \varepsilon ^{5-4b}, \quad \Vert Z \Vert _{s_0 + \mu }^{{\mathrm {Lip}(\gamma )}} \le C \varepsilon ^{5-2b}, \nonumber \\ \gamma= & {} \varepsilon ^{2 + a} = \varepsilon ^{2b}, \quad b := 1 + (a/2) , \quad a \in (0, 1/6), \end{aligned}$$
(6.4)

where \({\mathfrak {I}}_0(\varphi ) := i_0(\varphi ) - (\varphi ,0,0)\), and

$$\begin{aligned} Z(\varphi ) := (Z_1, Z_2, Z_3) (\varphi ) := \mathcal{F}(i_0, \zeta _0) (\varphi ) = \omega \cdot \partial _\varphi i_0(\varphi ) - X_{H_{\varepsilon , \zeta _0}}(i_0(\varphi )) \end{aligned}$$
(6.5)

is the “error” function.

Lemma 6

(Lemma 6.1 in [5]) \( |\zeta _0|^{{\mathrm {Lip}(\gamma )}} \le C \Vert Z \Vert _{s_0}^{{\mathrm {Lip}(\gamma )}}\) . If \( \mathcal{F}(i_0, \zeta _0) = 0 \), then \( \zeta _0 = 0 \), and the torus \(i_0(\varphi )\) is invariant for \(X_{H_\varepsilon }\).

Now we estimate the size of \( i_0^* \mathcal{W} \) in terms of Z. From (6.2) and (6.3) one has \(\Vert A_{kj} \Vert _s^{\mathrm {Lip}(\gamma )}\le _s \Vert {\mathfrak {I}}_0 \Vert _{s+2}^{\mathrm {Lip}(\gamma )}\). Moreover, \(A_{kj}\) also satisfies the following bound.

Lemma 7

(Lemma 6.2 in [5]) The coefficients \(A_{kj} (\varphi ) \) in (6.3) satisfy

$$\begin{aligned} \Vert A_{k j} \Vert _s^{{\mathrm {Lip}(\gamma )}} \le _s \gamma ^{-1} \big (\Vert Z \Vert _{s+2{\tau }+2}^{{\mathrm {Lip}(\gamma )}} + \Vert Z \Vert _{s_0+1}^{{\mathrm {Lip}(\gamma )}} \Vert {\mathfrak I}_0 \Vert _{s+ 2 {\tau }+ 2}^{{\mathrm {Lip}(\gamma )}} \big ). \end{aligned}$$
(6.6)

As in [10], we first modify the approximate torus \( i_0 \) to obtain an isotropic torus \( i_\delta \) which is still approximately invariant. We denote the Laplacian \( \Delta _\varphi := \sum _{k=1}^\nu \partial _{\varphi _k}^2 \).

Lemma 8

(Isotropic torus) The torus \( i_\delta (\varphi ) := (\theta _0(\varphi ), y_\delta (\varphi ), z_0(\varphi ) ) \) defined by

$$\begin{aligned} y_\delta := y_0 + [\partial _\varphi \theta _0(\varphi )]^{- T} \rho (\varphi ) , \quad \rho _j(\varphi ) := \Delta _\varphi ^{-1} \mathop {\sum }_{ k = 1}^\nu \partial _{\varphi _k} A_{k j}(\varphi ) \end{aligned}$$
(6.7)

is isotropic. If (6.4) holds, then, for some \( \sigma := \sigma (\nu ,{\tau }) \),

$$\begin{aligned} \Vert y_\delta - y_0 \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \Vert {\mathfrak {I}}_0 \Vert _{s+\sigma }^{{\mathrm {Lip}(\gamma )}} ,\end{aligned}$$
(6.8)
$$\begin{aligned} \Vert y_\delta - y_0 \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \gamma ^{-1} \big \{ \Vert Z \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}+ \Vert Z \Vert _{s_0+\sigma }^{\mathrm {Lip}(\gamma )}\Vert {\mathfrak {I}}_0 \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}\big \} ,\end{aligned}$$
(6.9)
$$\begin{aligned} \Vert \mathcal{F}(i_\delta , \zeta _0) \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \Vert Z \Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} + \Vert {\mathfrak {I}}_0 \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}\Vert Z \Vert _{s_0 + \sigma }^{{\mathrm {Lip}(\gamma )}} ,\end{aligned}$$
(6.10)
$$\begin{aligned} \Vert \partial _i [ i_\delta ][ {\widehat{\imath }} ] \Vert _s&\le _s \Vert {\widehat{\imath }} \Vert _s + \Vert {\mathfrak I}_0\Vert _{s + \sigma } \Vert {\widehat{\imath }} \Vert _s . \end{aligned}$$
(6.11)

In the paper we denote equivalently the differential by \( \partial _i \) or \( d_i \). Moreover we denote by \( \sigma := \sigma (\nu , {\tau }) \) possibly different (larger) “loss of derivatives” constants.

Proof

It is sufficient to closely follow the proof of Lemma 6.3 of [5]. We mention the only difference: equation (6.11) of [5] is \(\Vert \mathcal{F}(i_\delta , \zeta _0) \Vert _s^{{\mathrm {Lip}(\gamma )}} \le _s \Vert Z \Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} + \varepsilon ^{2b-1} \gamma ^{-1} \Vert {\mathfrak {I}}_0 \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}\Vert Z \Vert _{s_0 + \sigma }^{{\mathrm {Lip}(\gamma )}}\), which contains the additional large factor \(\varepsilon ^{2b-1} \gamma ^{-1} = \varepsilon ^{-1}\) compared with the present bound (6.10). In (6.10) there is no such factor because, by the estimates for \(\partial _\theta \partial _y P, \partial _{yy}P, \partial _y \nabla _z P\) in Lemma 5, here we have \(\Vert \partial _y X_P (i) \Vert _s \le _s \varepsilon ^{2b} (1 + \Vert {\mathfrak I}\Vert _{s+3})\). Hence (6.4), (6.8) and (6.9) imply that

$$\begin{aligned} \Vert X_ P(i_\delta ) - X_P(i_0 )\Vert _s \le _s \Vert Z \Vert _{s + \sigma } + \Vert {\mathfrak I}_0 \Vert _{s + \sigma } \Vert Z \Vert _{s_0 + \sigma }. \end{aligned}$$
(6.12)

Then the proof goes on as in [5], without the large factor \(\varepsilon ^{2b-1} \gamma ^{-1}\). \(\square \)
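
The following sketch indicates why a correction of the form (6.7) makes the torus isotropic. It reads the sum in (6.7) as the divergence \( \sum _k \partial _{\varphi _k} A_{kj} \) (the interpretation adopted for this computation), and it uses only that the 2-form \( i_0^* \mathcal{W} = d \, i_0^* \Lambda \) is closed and that each \( A_{kj} \) has zero \( \varphi \)-average. By (6.2), replacing \( y_0 \) with \( y_\delta \) changes \( a_k \) into \( a_k - \rho _k \), so \( i_\delta ^* \Lambda \) is closed if and only if \( \partial _{\varphi _k} \rho _j - \partial _{\varphi _j} \rho _k = A_{kj} \). Now

$$\begin{aligned}&\partial _{\varphi _k} A_{m j} + \partial _{\varphi _m} A_{j k} + \partial _{\varphi _j} A_{k m} = 0 \quad \Longrightarrow \quad \partial _{\varphi _k} A_{m j} - \partial _{\varphi _j} A_{m k} = \partial _{\varphi _m} A_{k j} , \nonumber \\&\partial _{\varphi _k} \rho _j - \partial _{\varphi _j} \rho _k = \Delta _\varphi ^{-1} \sum _{m=1}^\nu \partial _{\varphi _m} \big ( \partial _{\varphi _k} A_{m j} - \partial _{\varphi _j} A_{m k} \big ) = \Delta _\varphi ^{-1} \sum _{m=1}^\nu \partial _{\varphi _m}^2 A_{k j} = A_{k j} , \end{aligned}$$

where we used the antisymmetry \( A_{mk} = - A_{km} \) and, in the last step, that \( \Delta _\varphi ^{-1} \Delta _\varphi \) is the identity on functions with zero average.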

In order to find an approximate inverse of the linearized operator \(d_{i, \zeta } \mathcal{F}(i_\delta )\) we introduce a suitable set of symplectic coordinates near the isotropic torus \( i_\delta \). We consider the map \( G_\delta : (\psi , \eta , w) \rightarrow (\theta , y, z)\) of the phase space \(\mathbb T^\nu \times \mathbb R^\nu \times H_S^\bot \) defined by

$$\begin{aligned} \begin{pmatrix} \theta \\ y \\ z \end{pmatrix} := G_\delta \begin{pmatrix} \psi \\ \eta \\ w \end{pmatrix} := \begin{pmatrix} \theta _0(\psi ) \\ y_\delta (\psi ) + [\partial _\psi \theta _0(\psi )]^{-T} \eta + \big [ (\partial _\theta \tilde{z}_0) (\theta _0(\psi )) \big ]^T \partial _x^{-1} w \\ z_0(\psi ) + w \end{pmatrix} \end{aligned}$$
(6.13)

where \(\tilde{z}_0 (\theta ) := z_0 (\theta _0^{-1} (\theta ))\). It is proved in [10] that \( G_\delta \) is symplectic, using that the torus \( i_\delta \) is isotropic (Lemma 8). In the new coordinates, \( i_\delta \) is the trivial embedded torus \( (\psi , \eta , w ) = (\psi , 0, 0 ) \). The transformed Hamiltonian \( K := K(\psi , \eta , w, \zeta _0) \) is (recall (5.5))

$$\begin{aligned} K&:= H_{\varepsilon , \zeta _0} \circ G_\delta \nonumber \\&= \theta _0(\psi ) \cdot \zeta _0 + K_{00}(\psi ) + K_{10}(\psi ) \cdot \eta + (K_{0 1}(\psi ), w)_{L^2(\mathbb T)} + \tfrac{1}{2} K_{2 0}(\psi ){\eta } \cdot {\eta } \nonumber \\&\quad + ( K_{11}(\psi ) {\eta } , w )_{L^2(\mathbb T)} + \tfrac{1}{2} (K_{02}(\psi ) w , w )_{L^2(\mathbb T)} + K_{\ge 3}(\psi , \eta , w) \end{aligned}$$
(6.14)

where \( K_{\ge 3} \) collects the terms at least cubic in the variables \( (\eta , w )\). At any fixed \(\psi \), the Taylor coefficients are \(K_{00}(\psi ) \in \mathbb R\), \(K_{10}(\psi ) \in \mathbb R^\nu \), \(K_{01}(\psi ) \in H_S^\bot \) (it is a function of \( x \in \mathbb T\)), while \(K_{20}(\psi ) \) is a \(\nu \times \nu \) real matrix, \(K_{02}(\psi )\) is a linear self-adjoint operator of \( H_S^\bot \) and \(K_{11}(\psi ) :\mathbb R^\nu \rightarrow H_S^\bot \). Note that the above Taylor coefficients do not depend on the parameter \( \zeta _0 \).

The Hamilton equations associated to (6.14) are

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\psi }} = K_{10}(\psi ) + K_{20}(\psi ) \eta + K_{11}^T (\psi ) w + \partial _{\eta } K_{\ge 3}(\psi , \eta , w)\\ {\dot{\eta }} = - [\partial _\psi \theta _0(\psi )]^T \zeta _0 - \partial _\psi K_{00}(\psi ) - [\partial _{\psi }K_{10}(\psi )]^T \eta - [ \partial _{\psi } K_{01}(\psi )]^T w\\ \qquad - \partial _\psi \{ \frac{1}{2} K_{2 0}(\psi )\eta \cdot \eta + ( K_{11}(\psi ) \eta , w )_{L^2(\mathbb T)} + \frac{1}{2} ( K_{02}(\psi ) w , w )_{L^2(\mathbb T)}\\ \qquad + K_{\ge 3}(\psi , \eta , w) \} \\ {\dot{w}} = \partial _x ( K_{01}(\psi ) + K_{11}(\psi ) \eta + K_{0 2}(\psi ) w + \nabla _w K_{\ge 3}(\psi , \eta , w) ) \end{array}\right. } \end{aligned}$$
(6.15)

where \( [\partial _{\psi }K_{10}(\psi )]^T \) is the \( \nu \times \nu \) transposed matrix and the operators \( [\partial _{\psi }K_{01}(\psi )]^T \) and \( K_{11}^T(\psi ) :{H_S^\bot \rightarrow \mathbb R^\nu } \) are defined by the duality relation \(( \partial _{\psi } K_{01}(\psi ) [\hat{\psi } ], w)_{L^2} = \hat{\psi } \cdot [\partial _{\psi }K_{01}(\psi )]^T w \), for all \(\hat{\psi } \in \mathbb R^\nu \), \(w \in H_S^\bot \), and similarly for \( K_{11} \). Explicitly, for all \( w \in H_S^\bot \), and denoting by \(\underline{e}_k\) the k-th unit vector of \(\mathbb R^\nu \),

$$\begin{aligned} K_{11}^T(\psi ) w = \sum _{k=1}^\nu (K_{11}^T(\psi ) w \cdot \underline{e}_k) \underline{e}_k = \sum _{k=1}^\nu ( w, K_{11}(\psi ) \underline{e}_k )_{L^2(\mathbb T)} \underline{e}_k \, \in \mathbb R^\nu . \end{aligned}$$

In the next lemma we estimate the coefficients \( K_{00}, K_{10}, K_{01} \) of the Taylor expansion (6.14). Note that on an exact solution we have \( Z = 0 \) and therefore \( K_{00} (\psi ) = \mathrm{const} \), \( K_{10} = \omega \) and \( K_{01} = 0 \).

Lemma 9

Assume (6.4). Then there is \( \sigma := \sigma (\tau , \nu )\) such that

$$\begin{aligned} \Vert \partial _\psi K_{00} \Vert _s^{{\mathrm {Lip}(\gamma )}}, \Vert K_{10} - \omega \Vert _s^{{\mathrm {Lip}(\gamma )}}, \Vert K_{0 1} \Vert _s^{{\mathrm {Lip}(\gamma )}} \le _s \Vert Z \Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} + \Vert Z \Vert _{s_0 + \sigma }^{{\mathrm {Lip}(\gamma )}} \Vert {\mathfrak I}_0 \Vert _{s+\sigma }^{{\mathrm {Lip}(\gamma )}}. \end{aligned}$$

Proof

Follow the proof of Lemma 6.4 in [5]. The fact that here there is no factor \(\varepsilon ^{2b-1} \gamma ^{-1}\) is a consequence of the better estimate (6.10) for \(\mathcal {F}(i_\delta ,\zeta _0)\) compared to the analogous estimate in [5]. \(\square \)

Remark 4

If \( \mathcal{F} (i_0, \zeta _0) = 0 \), then \(\zeta _0 = 0\) by Lemma 6, and Lemma 9 implies that (6.14) simplifies to the normal form

$$\begin{aligned} K = const + \omega \cdot \eta + \frac{1}{2} K_{2 0}(\psi )\eta \cdot \eta + ( K_{11}(\psi ) \eta , w )_{L^2(\mathbb T)} + \frac{1}{2} ( K_{02}(\psi ) w , w )_{L^2(\mathbb T)} + K_{\ge 3} . \end{aligned}$$
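
This normal form makes the invariance of the torus \( \{ \eta = 0, w = 0 \} \) transparent. Setting \( \zeta _0 = 0 \), \( K_{00} = \mathrm{const} \), \( K_{10} = \omega \), \( K_{01} = 0 \) in the Hamilton equations (6.15) and evaluating at \( (\eta , w) = (0,0) \), all the remaining terms are at least linear in \( (\eta , w) \) (recall that \( K_{\ge 3} \) is at least cubic), so that

$$\begin{aligned} {\dot{\psi }} \big |_{\eta = 0, w = 0} = \omega , \qquad {\dot{\eta }} \big |_{\eta = 0, w = 0} = 0 , \qquad {\dot{w}} \big |_{\eta = 0, w = 0} = 0 . \end{aligned}$$

Hence \( t \mapsto (\psi _0 + \omega t, 0, 0) \) solves (6.15) for every \( \psi _0 \in \mathbb T^\nu \).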

We now estimate \( K_{20}, K_{11}\) in (6.14). The norm of \(K_{20}\) is the sum of the norms of its matrix entries.

Lemma 10

Assume (6.4). Then

$$\begin{aligned} \Vert K_{20} - \varepsilon ^{2b} \mathbb {A} D_S \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \varepsilon ^{2b+1} + \varepsilon ^{2b} \Vert {\mathfrak I}_0\Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} ,\end{aligned}$$
(6.16)
$$\begin{aligned} \Vert K_{11} \eta \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \varepsilon ^{5-2b} \Vert \eta \Vert _s^{{\mathrm {Lip}(\gamma )}} + \varepsilon ^{2b} \Vert {\mathfrak I}_0\Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} \Vert \eta \Vert _{s_0}^{{\mathrm {Lip}(\gamma )}} , \end{aligned}$$
(6.17)
$$\begin{aligned} \Vert K_{11}^T w \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \varepsilon ^{5-2b} \Vert w \Vert _{s + 2}^{{\mathrm {Lip}(\gamma )}} + \varepsilon ^{2b} \Vert {\mathfrak I}_0\Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} \Vert w \Vert _{s_0 + 2}^{{\mathrm {Lip}(\gamma )}} . \end{aligned}$$
(6.18)

In particular \( \Vert K_{20} - \varepsilon ^{2b} \mathbb {A} D_S \Vert _{s_0 }^{{\mathrm {Lip}(\gamma )}} \le C \varepsilon ^{5-2b}\), and

$$\begin{aligned} \Vert K_{11} \eta \Vert _{s_0}^{{\mathrm {Lip}(\gamma )}} \le C \varepsilon ^{5-2b} \Vert \eta \Vert _{s_0}^{{\mathrm {Lip}(\gamma )}} , \quad \Vert K_{11}^T w \Vert _{s_0}^{{\mathrm {Lip}(\gamma )}} \le C \varepsilon ^{5-2b} \Vert w \Vert _{s_0+2}^{{\mathrm {Lip}(\gamma )}} . \end{aligned}$$

Proof

See the proof of Lemma 6.6 in [5]. \(\square \)
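
The passage from (6.16)–(6.18) to the bounds at \( s = s_0 \) is only a matter of exponents. Assuming, as in (6.4), that \( \Vert {\mathfrak I}_0 \Vert _{s_0 + \sigma }^{\mathrm {Lip}(\gamma )} \le C \varepsilon ^{5-4b} \) (which requires \( \mu \ge \sigma \)), one has

$$\begin{aligned} \varepsilon ^{2b} \Vert {\mathfrak I}_0 \Vert _{s_0 + \sigma }^{\mathrm {Lip}(\gamma )} \le C \varepsilon ^{2b} \varepsilon ^{5-4b} = C \varepsilon ^{5-2b} , \qquad \varepsilon ^{2b+1} \le \varepsilon ^{5-2b} \ \Longleftrightarrow \ 2b + 1 \ge 5 - 2b \ \Longleftrightarrow \ b \ge 1 , \end{aligned}$$

and \( b \ge 1 \) holds by (5.9).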

Consider the linear change of variables \(({\widehat{\theta }}, {\widehat{y}}, {\widehat{z}}) = D G_\delta (\varphi , 0, 0) [{\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}]\), where \(D G_\delta (\varphi ,0,0)\) is obtained by linearizing \(G_\delta \) in (6.13) at \((\varphi ,0,0)\), and it is represented by the matrix

$$\begin{aligned} D G_\delta (\varphi , 0, 0) = \begin{pmatrix} \partial _\psi \theta _0(\varphi ) &{} 0 &{} 0 \\ \partial _\psi y_\delta (\varphi ) &{} \quad [\partial _\psi \theta _0(\varphi )]^{-T} &{} \quad [(\partial _\theta {\tilde{z}}_0)(\theta _0(\varphi ))]^T \partial _x^{-1} \\ \partial _\psi z_0(\varphi ) &{} 0 &{} I \end{pmatrix}. \end{aligned}$$
(6.19)

The linearized operator \(d_{i, \zeta }\mathcal{F}(i_\delta , \zeta _0)\) transforms (approximately, see (6.40)) into the operator obtained linearizing (6.15) at \((\psi , \eta , w, \zeta ) = (\varphi , 0, 0, \zeta _0 )\) (with \( \partial _t \rightsquigarrow \mathcal{D}_\omega \)), which is the linear operator

$$\begin{aligned} B[{\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }}] = \begin{pmatrix} B_1 [{\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }}] \\ B_2 [{\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }}] \\ B_3 [{\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }}] \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} B_1&:= \mathcal{D}_\omega {\widehat{\psi }} - \partial _\psi K_{10}(\varphi )[{\widehat{\psi }} \, ] - K_{2 0}(\varphi ){\widehat{\eta }} - K_{11}^T (\varphi ) {\widehat{w}}, \nonumber \\ B_2&:= \mathcal{D}_\omega {\widehat{\eta }} + [\partial _\psi \theta _0(\varphi )]^T {\widehat{\zeta }} + \partial _\psi [\partial _\psi \theta _0(\varphi )]^T [ {\widehat{\psi }}, \zeta _0] + \partial _{\psi \psi } K_{00}(\varphi )[{\widehat{\psi }}] \nonumber \\&\quad + [\partial _\psi K_{10}(\varphi )]^T {\widehat{\eta }} + [\partial _\psi K_{01}(\varphi )]^T {\widehat{w}}, \nonumber \\ B_3&:= \mathcal{D}_\omega {\widehat{w}} - \partial _x \{ \partial _\psi K_{01}(\varphi )[{\widehat{\psi }}] + K_{11}(\varphi ) {\widehat{\eta }} + K_{02}(\varphi ) {\widehat{w}} \}. \end{aligned}$$
(6.20)

Lemma 11

(Lemma 6.7 in [5]) Assume (6.4) and let \( {\widehat{\imath }} := ({\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}})\). Then

$$\begin{aligned} \Vert DG_\delta (\varphi ,0,0) [{\widehat{\imath }}] \Vert _s + \Vert DG_\delta (\varphi ,0,0)^{-1} [{\widehat{\imath }}] \Vert _s&\le _s \Vert {\widehat{\imath }} \Vert _{s} + \Vert {\mathfrak I}_0 \Vert _{s + \sigma } \Vert {\widehat{\imath }} \Vert _{s_0}, \nonumber \\ \Vert D^2 G_\delta (\varphi ,0,0)[{\widehat{\imath }}_1, {\widehat{\imath }}_2] \Vert _s&\le _s \Vert {\widehat{\imath }}_1\Vert _s \Vert {\widehat{\imath }}_2 \Vert _{s_0} + \Vert {\widehat{\imath }}_1\Vert _{s_0} \Vert {\widehat{\imath }}_2 \Vert _{s} \nonumber \\&\quad + \Vert {\mathfrak I}_0 \Vert _{s + \sigma } \Vert {\widehat{\imath }}_1 \Vert _{s_0} \Vert {\widehat{\imath }}_2\Vert _{s_0} \end{aligned}$$
(6.21)

for some \(\sigma := \sigma (\nu ,{\tau })\). The same estimates hold for the \(\Vert \ \Vert _s^{{\mathrm {Lip}(\gamma )}}\) norm.

In order to construct an approximate inverse of (6.20) it is sufficient to solve the equation

$$\begin{aligned} {\mathbb D} [{\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }} ] := \begin{pmatrix} \mathcal{D}_\omega {\widehat{\psi }} - K_{20}(\varphi ) {\widehat{\eta }} - K_{11}^T(\varphi ) {\widehat{w}}\\ \mathcal{D}_\omega {\widehat{\eta }} + [\partial _\psi \theta _0(\varphi )]^T {\widehat{\zeta }} \\ \mathcal{D}_\omega {\widehat{w}} - \partial _x K_{11}(\varphi ){\widehat{\eta }} - \partial _x K_{0 2}(\varphi ) {\widehat{w}} \end{pmatrix} = \begin{pmatrix} g_1 \\ g_2 \\ g_3 \end{pmatrix} \end{aligned}$$
(6.22)

which is obtained by neglecting in \(B_1, B_2, B_3\) in (6.20) the terms \( \partial _\psi K_{10} \), \( \partial _{\psi \psi } K_{00} \), \( \partial _\psi K_{00} \), \( \partial _\psi K_{01} \) and \( \partial _\psi [\partial _\psi \theta _0(\varphi )]^T [ \cdot , \zeta _0] \) (these terms vanish at a solution by Lemmas 6 and 9).

First we solve the second equation in (6.22), namely \( \mathcal{D}_\omega {\widehat{\eta }} = g_2 - [\partial _\psi \theta _0(\varphi )]^T {\widehat{\zeta }} \). We choose \( {\widehat{\zeta }} \) so that the \(\varphi \)-average of the right hand side is zero, namely

$$\begin{aligned} {\widehat{\zeta }} = \langle g_2 \rangle \end{aligned}$$
(6.23)

(we denote \( \langle g \rangle := (2 \pi )^{- \nu } \int _{\mathbb T^\nu } g (\varphi ) d \varphi \)). Note that the \(\varphi \)-averaged matrix \( \langle [\partial _\psi \theta _0 ]^T \rangle \) \( = \langle I + [\partial _\psi \Theta _0]^T \rangle = I \) because \(\theta _0(\varphi ) = \varphi + \Theta _0(\varphi )\) and \(\Theta _0(\varphi )\) is a periodic function. Therefore

$$\begin{aligned} {\widehat{\eta }} := \mathcal{D}_\omega ^{-1} \big ( g_2 - [\partial _\psi \theta _0(\varphi ) ]^T \langle g_2 \rangle \big ) + \langle {\widehat{\eta }} \rangle , \quad \langle {\widehat{\eta }} \rangle \in \mathbb R^\nu , \end{aligned}$$
(6.24)

where the average \(\langle {\widehat{\eta }} \rangle \) will be fixed below. Then we consider the third equation

$$\begin{aligned} \mathcal{L}_\omega {\widehat{w}} = g_3 + \partial _x K_{11}(\varphi ) {\widehat{\eta }}, \ \quad \mathcal{L}_\omega := \omega \cdot \partial _\varphi - \partial _x K_{0 2}(\varphi ) . \end{aligned}$$
(6.25)

  • Inversion assumption There exists a set \( \Omega _\infty \subset \Omega _o\) such that, for all \( \omega \in \Omega _\infty \) and every function \( g \in H^{s+\mu }_{S^\bot } (\mathbb T^{\nu +1}) \), there exists a solution \( h := \mathcal{L}_\omega ^{- 1} g \in H^{s}_{S^\bot } (\mathbb T^{\nu +1}) \) of the linear equation \( \mathcal{L}_\omega h = g \), which satisfies

$$\begin{aligned} \Vert \mathcal{L}_\omega ^{- 1} g \Vert _s^{{\mathrm {Lip}(\gamma )}} \le C(s) \gamma ^{-1} \big ( \Vert g \Vert _{s + \mu }^{{\mathrm {Lip}(\gamma )}} + \varepsilon ^2 \gamma ^{-1} \Vert {\mathfrak I}_0 \Vert _{s + \mu }^{{\mathrm {Lip}(\gamma )}} \Vert g \Vert _{s_0}^{{\mathrm {Lip}(\gamma )}} \big ) \end{aligned}$$
(6.26)

for some \( \mu := \mu ({\tau }, \nu ) > 0 \).

By the above assumption there exists a solution

$$\begin{aligned} {\widehat{w}} := \mathcal{L}_\omega ^{-1} [ g_3 + \partial _x K_{11}(\varphi ) {\widehat{\eta }} \, ] \end{aligned}$$
(6.27)

of (6.25). Finally, we solve the first equation in (6.22), which, substituting (6.24) and (6.27), becomes

$$\begin{aligned} \mathcal{D}_\omega {\widehat{\psi }} = g_1 + M_1(\varphi ) \langle {\widehat{\eta }} \rangle + M_2(\varphi ) g_2 + M_3(\varphi ) g_3 - M_2(\varphi )[\partial _\psi \theta _0]^T \langle g_2 \rangle , \end{aligned}$$
(6.28)

where

$$\begin{aligned} M_1(\varphi ):= & {} K_{2 0}(\varphi ) + K_{11}^T(\varphi ) \mathcal {L}_\omega ^{-1} \partial _x K_{11}(\varphi ), \quad M_2(\varphi ) := M_1 (\varphi ) \mathcal{D}_\omega ^{-1} ,\nonumber \\ M_3(\varphi ):= & {} K_{11}^T (\varphi ) \mathcal{L}_\omega ^{-1} . \end{aligned}$$
(6.29)

To solve Eq. (6.28) we have to choose \(\langle {\widehat{\eta }} \rangle \) such that the right-hand side of (6.28) has zero average. By Lemma 10 and (6.4), the \(\varphi \)-averaged matrix satisfies

$$\begin{aligned} \langle M_1 \rangle = \varepsilon ^{2 b} \mathbb {A} D_S + O(\varepsilon ^{5-2b}) . \end{aligned}$$
(6.30)

Therefore, for \( \varepsilon \) small, \(\langle M_1 \rangle \) is invertible and \(\langle M_1 \rangle ^{-1} = O(\varepsilon ^{-2 b}) = O(\gamma ^{- 1})\) (recall (5.9)). Thus we define

$$\begin{aligned} \langle {\widehat{\eta }} \rangle := - \langle M_1 \rangle ^{-1} [ \langle g_1 \rangle + \langle M_2 g_2 \rangle + \langle M_3 g_3 \rangle - \langle M_2 [\partial _\psi \theta _0]^T \rangle \langle g_2 \rangle ]. \end{aligned}$$
(6.31)
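The invertibility of \(\langle M_1 \rangle \) used in (6.31) is a Neumann series argument. The following numerical sketch is only an illustration and not part of the proof: the values of `eps` and `b`, the diagonal matrix standing in for \(\mathbb {A} D_S\), and the random remainder are all assumptions, chosen so that \(\varepsilon ^{5-4b}\) is small (which requires \(b < 5/4\)).

```python
import numpy as np

# Illustrative values only (assumptions): eps**(5 - 4*b) must be small.
eps, b = 1e-2, 1.1

A = np.diag([1.0, 2.0, 3.0])      # stands in for the invertible matrix A D_S
rng = np.random.default_rng(0)
R = rng.standard_normal((3, 3))
R *= eps ** (5 - 2 * b) / np.linalg.norm(R, 2)   # remainder of size eps^(5-2b)

M = eps ** (2 * b) * A + R        # toy version of <M_1> in (6.30)
M_inv = np.linalg.inv(M)

# Neumann series: M = eps^{2b} A (I + X) with ||X|| <= q < 1
norm_A_inv = np.linalg.norm(np.linalg.inv(A), 2)
q = eps ** (-2 * b) * norm_A_inv * np.linalg.norm(R, 2)
bound = eps ** (-2 * b) * norm_A_inv / (1.0 - q)

print("q =", q)
print("||M^-1|| =", np.linalg.norm(M_inv, 2), "<= bound =", bound)
```

In this toy setting the ratio `q` equals \(\varepsilon ^{5-4b}\) up to the factor \(\Vert A^{-1} \Vert \), so the inverse has norm \(O(\varepsilon ^{-2b})\), in agreement with the size stated above.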

With this choice of \(\langle {\widehat{\eta }} \rangle \), Eq. (6.28) has the solution

$$\begin{aligned} {\widehat{\psi }} := \mathcal{D}_\omega ^{-1} [ g_1 + M_1(\varphi ) \langle {\widehat{\eta }} \rangle + M_2(\varphi ) g_2 + M_3(\varphi ) g_3 - M_2(\varphi )[\partial _\psi \theta _0]^T \langle g_2 \rangle ]. \end{aligned}$$
(6.32)

In conclusion, we have constructed a solution \(({\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }})\) of the linear system (6.22).
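The operator \(\mathcal{D}_\omega ^{-1}\) in (6.24) and (6.32) acts on zero-average functions by dividing the \(\ell \)-th Fourier coefficient by \({\mathrm i}\, \omega \cdot \ell \). The following sketch illustrates this on \(\mathbb T^2\) with an FFT. The function names, the grid size and the frequency vector are hypothetical choices made for the illustration, and no Diophantine condition is checked: the code simply skips the zero mode.

```python
import numpy as np

def _omega_dot_l(shape, omega):
    """Array of omega . l over the integer Fourier lattice of an FFT grid."""
    axes = [np.fft.fftfreq(n, d=1.0 / n) for n in shape]   # integer frequencies
    grids = np.meshgrid(*axes, indexing="ij")
    return sum(w * l for w, l in zip(omega, grids))

def average(g):
    """phi-average <g> of a function sampled on a uniform grid of T^nu."""
    return g.mean()

def d_omega(h, omega):
    """Apply D_omega = omega . d_phi spectrally."""
    h_hat = np.fft.fftn(h)
    return np.real(np.fft.ifftn(1j * _omega_dot_l(h.shape, omega) * h_hat))

def d_omega_inv(g, omega):
    """Zero-average solution h of D_omega h = g, for g with zero average."""
    div = _omega_dot_l(g.shape, omega)
    g_hat = np.fft.fftn(g)
    h_hat = np.zeros_like(g_hat)
    nonzero = np.abs(div) > 1e-12          # skip the mode l = 0
    h_hat[nonzero] = g_hat[nonzero] / (1j * div[nonzero])
    return np.real(np.fft.ifftn(h_hat))

n = 16
phi = 2 * np.pi * np.arange(n) / n
p1, p2 = np.meshgrid(phi, phi, indexing="ij")
omega = (1.0, (1 + np.sqrt(5)) / 2)        # rationally independent frequencies

g = 3.0 + np.cos(p1 + 2 * p2)              # right-hand side with nonzero average
h = d_omega_inv(g - average(g), omega)     # subtract the average first, as in (6.23)
```

Here `h` agrees with \(\sin (\varphi _1 + 2 \varphi _2) / (\omega _1 + 2 \omega _2)\). Subtracting the average before inverting plays the role of the choice of \({\widehat{\zeta }}\) in (6.23) and of \(\langle {\widehat{\eta }} \rangle \) in (6.31).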

Proposition 2

Assume (6.4) and (6.26). Then, \(\forall \omega \in \Omega _\infty \), \( \forall g := (g_1, g_2, g_3) \), the system (6.22) has a solution \( {\mathbb D}^{-1} g := ({\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }} ) \) where \(({\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }})\) are defined in (6.23), (6.24), (6.27), (6.31), (6.32), and satisfy

$$\begin{aligned} \Vert {\mathbb D}^{-1} g \Vert _s^{\mathrm{Lip}(\gamma )} \le _s \gamma ^{-1} \big ( \Vert g \Vert _{s + \mu }^{\mathrm{Lip}(\gamma )} + \varepsilon ^2 \gamma ^{-1} \Vert {\mathfrak I}_0 \Vert _{s + \mu }^{\mathrm{Lip}(\gamma )} \Vert g \Vert _{s_0 + \mu }^{\mathrm{Lip}(\gamma )} \big ). \end{aligned}$$
(6.33)

Proof

Recalling (6.29), by Lemma 10, (6.26) and (6.4) we get \( \Vert M_2 h \Vert _{s_0} + \Vert M_3 h \Vert _{s_0} \le C \Vert h \Vert _{s_0 + \sigma } \). Then, by (6.31) and \(\langle M_1 \rangle ^{-1} = O(\varepsilon ^{-2 b}) = O(\gamma ^{-1}) \), we deduce \( |\langle {\widehat{\eta }}\rangle |^{{\mathrm {Lip}(\gamma )}} \le C\gamma ^{-1} \Vert g \Vert _{s_0+ \sigma }^{{\mathrm {Lip}(\gamma )}} \). Consequently, (6.24) and (5.16) imply \( \Vert {\widehat{\eta }} \Vert _s^{{\mathrm {Lip}(\gamma )}} \le _s \gamma ^{-1} \big ( \Vert g \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )} + \Vert {\mathfrak {I}}_0 \Vert _{s + \sigma } \Vert g \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\big )\). The bound (6.33) is sharp for \( {\widehat{w}} \) because \( \mathcal{L}_\omega ^{-1} g_3 \) in (6.27) is estimated using (6.26). Finally \( {\widehat{\psi }} \) satisfies (6.33) using (6.26), (6.29), (6.32), (5.16), and Lemma 10. \(\square \)

Let \(\widetilde{G}_\delta (\psi , \eta , w, \zeta ) := ( G_\delta (\psi , \eta , w), \zeta )\). Let \(\Vert (\psi , \eta , w, \zeta ) \Vert _s^{\mathrm {Lip}(\gamma )}\) denote the maximum between \(\Vert (\psi , \eta , w) \Vert _s^{\mathrm {Lip}(\gamma )}\) and \(| \zeta |^{\mathrm {Lip}(\gamma )}\). We prove that the operator

$$\begin{aligned} \mathbf{T}_0 := (D { \widetilde{G}}_\delta )(\varphi ,0,0) \circ {\mathbb D}^{-1} \circ (D G_\delta ) (\varphi ,0,0)^{-1} \end{aligned}$$
(6.34)

is an approximate right inverse for \(d_{i,\zeta } \mathcal{F}(i_0 )\).

Theorem 3

(Approximate inverse) Assume (6.4) and the inversion assumption (6.26). Then there exists \( \mu := \mu (\tau , \nu ) > 0 \) such that, for all \( \omega \in \Omega _\infty \), for all \( g := (g_1, g_2, g_3) \), the operator \( \mathbf{T}_0 \) defined in (6.34) satisfies

$$\begin{aligned} \Vert \mathbf{T}_0 g \Vert _{s}^{\mathrm{Lip}(\gamma )} \le _s \gamma ^{-1} \big (\Vert g \Vert _{s + \mu }^{\mathrm{Lip}(\gamma )} + \varepsilon ^2 \gamma ^{-1} \Vert {\mathfrak I}_0 \Vert _{s + \mu }^{\mathrm{Lip}(\gamma )} \Vert g \Vert _{s_0 + \mu }^{\mathrm{Lip}(\gamma )} \big ). \end{aligned}$$
(6.35)

The operator \(\mathbf {T}_0\) is an approximate inverse of \(d_{i, \zeta } \mathcal{F}(i_0 )\), namely

$$\begin{aligned}&\Vert ( d_{i, \zeta } \mathcal{F}(i_0) \circ \mathbf{T}_0 - I ) g \Vert _s^{\mathrm{Lip}(\gamma )}\nonumber \\&\quad \le _s \gamma ^{- 1}\Vert \mathcal{F}(i_0, \zeta _0) \Vert _{s_0 + \mu }^{\mathrm {Lip}(\gamma )}\Vert g \Vert _{s + \mu }^{\mathrm {Lip}(\gamma )}\nonumber \\&\quad \quad + \gamma ^{- 1} \big \{ \Vert \mathcal{F}(i_0, \zeta _0) \Vert _{s + \mu }^{\mathrm {Lip}(\gamma )}+ \varepsilon ^2 \gamma ^{-1} \Vert \mathcal{F}(i_0, \zeta _0) \Vert _{s_0 + \mu }^{\mathrm {Lip}(\gamma )}\Vert {\mathfrak I}_0 \Vert _{s + \mu }^{\mathrm {Lip}(\gamma )}\big \} \Vert g \Vert _{s_0 + \mu }^{\mathrm {Lip}(\gamma )}. \end{aligned}$$
(6.36)

Proof

In this proof we write \(\Vert \ \Vert _s\) instead of \(\Vert \ \Vert _s^{{\mathrm {Lip}(\gamma )}}\). The bound (6.35) follows from (6.21), (6.33) and (6.34). By (5.6), since \( X_\mathcal {N}\) does not depend on y, and \( i_\delta \) differs from \( i_0 \) only in the y component, we have

$$\begin{aligned}&d_{i, \zeta } \mathcal{F}(i_0 )[\, {\widehat{\imath }}, {\widehat{\zeta }} \, ] - d_{i, \zeta } \mathcal{F}(i_\delta ) [\, {\widehat{\imath }}, {\widehat{\zeta }} \, ] = d_i X_P (i_\delta ) [\, {\widehat{\imath }} \, ] - d_i X_P (i_0) [\, {\widehat{\imath }} \, ] \nonumber \\&\quad = \int _0^1 \partial _y d_i X_P (\theta _0, y_0 + s(y_\delta -y_0), z_0) [y_\delta -y_0, {\widehat{\imath }}\,] ds =: \mathcal{E}_0 [\, {\widehat{\imath }}, {\widehat{\zeta }} \, ]. \end{aligned}$$
(6.37)

By (5.13), (6.4), (6.8) and (6.9), we estimate

$$\begin{aligned} \Vert \mathcal{E}_0 [\, {\widehat{\imath }}, {\widehat{\zeta }} \, ] \Vert _s \le _s \Vert Z \Vert _{s_0 + \sigma } \Vert {\widehat{\imath }} \Vert _{s + \sigma } + (\Vert Z \Vert _{s + \sigma } + \Vert Z \Vert _{s_0 + \sigma } \Vert {\mathfrak {I}}_0 \Vert _{s+\sigma }) \Vert {\widehat{\imath }} \Vert _{s_0 + \sigma } \end{aligned}$$
(6.38)

where \(Z := \mathcal {F}(i_0, \zeta _0)\) (recall (6.5)). Note that \(\mathcal {E}_0[{\widehat{\imath }}, {\widehat{\zeta }}]\) is, in fact, independent of \({\widehat{\zeta }}\). Denote the set of variables \( (\psi , \eta , w) =: {\mathtt u} \). Under the transformation \(G_\delta \), the nonlinear operator \(\mathcal{F}\) in (5.6) transforms into

$$\begin{aligned} \mathcal{F}(G_\delta ( {\mathtt u} (\varphi ) ), \zeta ) = D G_\delta ( {\mathtt u} (\varphi ) ) ( \mathcal{D}_\omega {\mathtt u} (\varphi ) - X_K ( {\mathtt u} (\varphi ), \zeta ) ), \end{aligned}$$
(6.39)

where \(K = H_{\varepsilon , \zeta } \circ G_\delta \), see (6.14) and (6.15). Differentiating (6.39) at the trivial torus \( {\mathtt u}_\delta (\varphi ) = G_\delta ^{-1}(i_\delta ) (\varphi ) = (\varphi , 0 , 0 ) \), at \( \zeta = \zeta _0 \), in the direction \(({\widehat{\mathtt u}}, {\widehat{\zeta }}\,)\) \(= (D G_\delta ({\mathtt u}_\delta )^{-1} [\, {\widehat{\imath }} \, ], {\widehat{\zeta }}) = D {\widetilde{G}}_\delta ({\mathtt u}_\delta )^{-1} [\, {\widehat{\imath }} , {\widehat{\zeta }} \, ] \), we get

$$\begin{aligned} d_{i , \zeta } \mathcal{F}(i_\delta ) [\, {\widehat{\imath }}, {\widehat{\zeta }} \, ]&= D G_\delta ( {\mathtt u}_\delta ) ( \mathcal{D}_\omega {\widehat{\mathtt u}} - d_{\mathtt u, \zeta } X_K( {\mathtt u}_\delta , \zeta _0) [{\widehat{\mathtt u}}, {\widehat{\zeta }} \, ] ) + \mathcal{E}_1 [ \, {\widehat{\imath }} , {\widehat{\zeta }} \, ],\end{aligned}$$
(6.40)
$$\begin{aligned} \mathcal{E}_1 [\, {\widehat{\imath }} , {\widehat{\zeta }} \, ]&:= D^2 G_\delta ( {\mathtt u}_\delta ) [ D G_\delta ( {\mathtt u}_\delta )^{-1} \mathcal{F}(i_\delta , \zeta _0), \, D G_\delta ({\mathtt u}_\delta )^{-1} [ \, {\widehat{\imath }} \, ] ] , \end{aligned}$$
(6.41)

where \( d_{\mathtt u, \zeta } X_K( {\mathtt u}_\delta , \zeta _0) \) is expanded in (6.20). In fact, \(\mathcal{E}_1\) is independent of \({\widehat{\zeta }}\). We split

$$\begin{aligned} \mathcal{D}_\omega {\widehat{\mathtt u}} - d_{\mathtt u, \zeta } X_K( {\mathtt u}_\delta , \zeta _0) [{\widehat{\mathtt u}}, {\widehat{\zeta }}] = \mathbb {D} [{\widehat{\mathtt u}}, {\widehat{\zeta }} \, ] + R_Z [ {\widehat{\mathtt u}}, {\widehat{\zeta }} \, ], \end{aligned}$$

where \( {\mathbb D} [{\widehat{\mathtt u}}, {\widehat{\zeta }}] \) is defined in (6.22) and \(R_Z [ {\widehat{\psi }}, {\widehat{\eta }}, {\widehat{w}}, {\widehat{\zeta }}]\) is defined by difference, so that its first component is \( - \partial _\psi K_{10}(\varphi ) [{\widehat{\psi }} ]\), its second component is

$$\begin{aligned} \partial _\psi [\partial _\psi \theta _0(\varphi )]^T [{\widehat{\psi }}, \zeta _0] + \partial _{\psi \psi } K_{00} (\varphi ) [{\widehat{\psi }}] + [\partial _\psi K_{10}(\varphi )]^T {\widehat{\eta }} + [\partial _\psi K_{01}(\varphi )]^T {\widehat{w}}, \end{aligned}$$

and its third component is \(- \partial _x \{ \partial _{\psi } K_{01}(\varphi )[{\widehat{\psi }}] \} \) (in fact, \(R_Z\) is independent of \({\widehat{\zeta }}\)). By (6.37) and (6.40),

$$\begin{aligned} d_{i, \zeta } \mathcal{F}(i_0 )&= D G_\delta ({\mathtt u}_\delta ) \circ {\mathbb D} \circ D {\widetilde{G}}_\delta ({\mathtt u}_\delta )^{-1} + \mathcal{E}_0 + \mathcal{E}_1 + \mathcal {E}_2, \nonumber \\ \mathcal {E}_2&:= D G_\delta ( {\mathtt u}_\delta ) \circ R_Z \circ D {\widetilde{G}}_\delta ({\mathtt u}_\delta )^{-1}. \end{aligned}$$
(6.42)

By Lemmata 6, 9, 11, and (6.4) and (6.10), the terms \(\mathcal {E}_1, \mathcal {E}_2\) satisfy the same bound (6.38) as \(\mathcal {E}_0\). Thus the sum \(\mathcal {E}:= \mathcal {E}_0 + \mathcal {E}_1 + \mathcal {E}_2\) satisfies (6.38). Applying \( \mathbf{T}_0 \) defined in (6.34) to the right in (6.42), since \( {\mathbb D} \circ {\mathbb D}^{-1} = I \) (see Proposition 2), we get \(d_{i, \zeta } \mathcal{F}(i_0 ) \circ \mathbf{T}_0 - I = \mathcal {E}\circ \mathbf{T}_0\). Then (6.36) follows from (6.35) and the bound (6.38) for \(\mathcal {E}\). \(\square \)
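The algebra behind the last step of the proof can be checked in a finite dimensional toy model. In the sketch below all matrices are random stand-ins (assumptions made for the illustration, not the operators of the paper): `D` is a \(3 \times 4\) matrix with a right inverse, `G` and `G_tilde` play the role of \(D G_\delta ({\mathtt u}_\delta )\) and \(D {\widetilde{G}}_\delta ({\mathtt u}_\delta )\), and `E` is a small error term.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3                                   # g = (g_1, g_2, g_3) has 3 components
D = rng.standard_normal((m, m + 1))     # toy stand-in for D: 4 unknowns, 3 equations
D_rinv = np.linalg.pinv(D)              # a right inverse: D @ D_rinv = I_m

G = np.eye(m) + 0.1 * rng.standard_normal((m, m))     # toy D G_delta
# G_tilde acts as G on (psi, eta, w) and as the identity on zeta
G_tilde = np.block([[G, np.zeros((m, 1))],
                    [np.zeros((1, m)), np.ones((1, 1))]])

E = 1e-3 * rng.standard_normal((m, m + 1))            # toy E_0 + E_1 + E_2

L = G @ D @ np.linalg.inv(G_tilde) + E      # analogue of (6.42)
T0 = G_tilde @ D_rinv @ np.linalg.inv(G)    # analogue of (6.34)

defect = L @ T0 - np.eye(m)                 # equals E @ T0
print(np.linalg.norm(defect, 2), np.linalg.norm(E, 2) * np.linalg.norm(T0, 2))
```

The defect `L @ T0 - I` coincides with `E @ T0`, so its norm is at most the product of the norms of `E` and `T0`. This is the finite dimensional counterpart of deducing (6.36) from (6.35) and (6.38), without the loss of derivatives that appears in the tame estimates.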

7 The linearized operator in the normal directions

The goal of this section is to derive an explicit expression for the linearized operator \(\mathcal {L}_\omega \) defined in (6.25), see Proposition 3. To this end, we compute \( \frac{1}{2} ( K_{02}(\psi ) w, w )_{L^2(\mathbb T)} \), \( w \in H_S^\bot \), which collects all the terms of \((H_\varepsilon \circ G_\delta )(\psi , 0, w)\) that are quadratic in w, see (6.14). We first recall some preliminary lemmata.

Lemma 12

(Lemma 7.1 in [5]) Let H be a Hamiltonian function of class \( C^2 ( H^1_0(\mathbb T_x), \mathbb R)\) and consider a map \( \Phi (u) := u + \Psi (u) \) satisfying \(\Psi (u) = \Pi _E \Psi (\Pi _E u)\), for all u, where E is a finite dimensional subspace as in (3.5). Then

$$\begin{aligned} \partial _u [\nabla ( H \circ \Phi )] (u) [h] = (\partial _u \nabla H )(\Phi (u)) [h] + \mathcal{R}(u)[h], \end{aligned}$$
(7.1)

where \( \mathcal{R}(u) \) has the “finite dimensional” form

$$\begin{aligned} \mathcal{R}(u)[h] = {\mathop \sum }_{|j| \le C} \big ( h , g_j(u) \big )_{L^2(\mathbb T)} \chi _j(u) \end{aligned}$$
(7.2)

with \( \chi _j (u) = e^{{\mathrm i}j x} \) or \( g_j(u) = e^{{\mathrm i}j x} \). The remainder in (7.2) is \( \mathcal{R} (u) = \mathcal{R}_0 (u) + \mathcal{R}_1 (u) + \mathcal{R}_2 (u) \) with

$$\begin{aligned} \mathcal{R}_0 (u)&:= (\partial _u \nabla H)(\Phi (u)) \partial _u \Psi (u), \quad \mathcal{R}_1 (u) := [\partial _{u }\{ \Psi '(u)^T\}] [ \cdot , \nabla H(\Phi (u)) ], \nonumber \\ \, \mathcal{R}_2 (u)&:= [\partial _u \Psi (u)]^T (\partial _u \nabla H)(\Phi (u)) \partial _u \Phi (u). \end{aligned}$$
(7.3)

Lemma 13

(Lemma 7.3 in [5]) Let \( \mathcal{R} \) be an operator of the form

$$\begin{aligned} \mathcal{R} h = \sum _{|j| \le C } \int _0^1 (h,\,g_j({\tau }) )_{L^2(\mathbb T)} \chi _j ({\tau })\,d {\tau }, \end{aligned}$$
(7.4)

where the functions \(g_j({\tau }),\,\chi _j({\tau }) \in H^s\), \({\tau }\in [0, 1]\) depend in a Lipschitz way on the parameter \(\omega \). Then its matrix s-decay norm (see (2.4), (2.5)) satisfies

$$\begin{aligned} | \mathcal{R} |_s^{\mathrm {Lip}(\gamma )}\le _s \sum _{|j| \le C} \sup _{{\tau }\in [0,1]} \big ( \Vert \chi _j({\tau }) \Vert _s^{\mathrm {Lip}(\gamma )}\Vert g_j({\tau }) \Vert _{s_0}^{\mathrm {Lip}(\gamma )}+ \Vert \chi _j({\tau }) \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\Vert g_j({\tau }) \Vert _s^{\mathrm {Lip}(\gamma )}\big ). \end{aligned}$$

7.1 Composition with the map \(G_\delta \)

In the sequel we use the fact that \({\mathfrak {I}}_\delta := {\mathfrak {I}}_\delta (\varphi ; \omega ) := i_\delta (\varphi ; \omega ) - (\varphi ,0,0) \) satisfies, by (6.4) and (6.8),

$$\begin{aligned} \Vert {\mathfrak I}_\delta \Vert _{s_0+\mu }^{{\mathrm {Lip}(\gamma )}} \le C\varepsilon ^{5 - 2b} \gamma ^{-1} = C \varepsilon ^{5-4b}. \end{aligned}$$
(7.5)

In this section we study the Hamiltonian \( K := H_\varepsilon \circ G_\delta = \varepsilon ^{-2b} \mathcal {H}\circ A_\varepsilon \circ G_\delta \) defined in (4.6) and (6.14). Recalling (4.7) and (6.13), \(A_\varepsilon \circ G_\delta \) has the form

$$\begin{aligned} A_\varepsilon (G_\delta (\psi , \eta , w)) = \varepsilon v_\varepsilon ( \theta _0(\psi ), \, y_\delta (\psi ) + L_1(\psi ) \eta + L_2(\psi ) w ) + \varepsilon ^b (z_0(\psi ) +w) \end{aligned}$$
(7.6)

where \(v_\varepsilon \) is defined in (4.7), and

$$\begin{aligned} L_1(\psi ) := [\partial _\psi \theta _0(\psi )]^{-T} , \quad L_2(\psi ) := \big [ (\partial _\theta \tilde{z}_0) (\theta _0(\psi )) \big ]^T \partial _x^{-1} . \end{aligned}$$
(7.7)

By Taylor’s formula, we expand (7.6) in w at \((\eta ,w)=(0,0)\), and we get

$$\begin{aligned} (A_\varepsilon \circ G_\delta )(\psi , 0, w) = T_\delta (\psi ) + T_1(\psi ) w + T_2(\psi )[w,w] + T_{\ge 3}(\psi , w) , \end{aligned}$$

where

$$\begin{aligned} T_\delta (\psi ) := A_\varepsilon (G_\delta (\psi , 0, 0)) = \varepsilon v_\delta (\psi ) + \varepsilon ^b z_0(\psi ), \ \ v_\delta (\psi ) := v_\varepsilon (\theta _0(\psi ), y_\delta (\psi )) \end{aligned}$$
(7.8)

is the approximate isotropic torus in the phase space \( H^1_0 (\mathbb T) \) (it corresponds to \( i_\delta \) in Lemma 8),

$$\begin{aligned} T_1(\psi ) w := \varepsilon ^{2b-1} U_1 (\psi ) w + \varepsilon ^b w, \quad T_2(\psi )[w,w] := \varepsilon ^{4b - 3} U_2(\psi )[w,w] \end{aligned}$$
$$\begin{aligned} U_1(\psi ) w&= \sum _{j \in S} \frac{|j| [ L_2(\psi ) w ]_j \, e^{{\mathrm i}[\theta _0(\psi )]_j}}{2 \sqrt{ \xi _j + \varepsilon ^{2(b-1)} |j| [ y_\delta (\psi ) ]_j }} \, e^{{\mathrm i}jx}, \end{aligned}$$
(7.9)
$$\begin{aligned} U_2(\psi )[w,w]&= - \sum _{j \in S} \frac{j^2 [ L_2(\psi ) w ]_j^2 \, e^{{\mathrm i}[\theta _0(\psi )]_j}}{8 \{ \xi _j + \varepsilon ^{2(b-1)} |j| [ y_\delta (\psi ) ]_j \}^{3/2} } \, e^{{\mathrm i}jx}, \end{aligned}$$
(7.10)

and \(T_{\ge 3}(\psi , w)\) collects all the terms of order at least cubic in w. The terms \(U_1, U_2\) are \(O(1)\) in \(\varepsilon \). Moreover, since \( L_2 (\psi ) \) in (7.7) vanishes when \( z_0 = 0 \), they satisfy

$$\begin{aligned} \Vert U_1 w \Vert _s&\le _s \Vert {\mathfrak {I}}_\delta \Vert _s \Vert w \Vert _{s_0} + \Vert {\mathfrak {I}}_\delta \Vert _{s_0} \Vert w \Vert _s , \nonumber \\ \Vert U_2 [w,w] \Vert _s&\le _s \Vert {\mathfrak {I}}_\delta \Vert _s \Vert {\mathfrak {I}}_\delta \Vert _{s_0} \Vert w \Vert _{s_0}^2 + \Vert {\mathfrak {I}}_\delta \Vert _{s_0}^2 \Vert w \Vert _{s_0} \Vert w \Vert _s \end{aligned}$$
(7.11)

and also in the \( \Vert \ \Vert _s^{\mathrm {Lip}(\gamma )}\)-norm. We expand \( \mathcal{H} \) by Taylor’s formula

$$\begin{aligned} \mathcal {H}(u+h) = \mathcal {H}(u) + ( (\nabla \mathcal {H})(u), h )_{L^2(\mathbb T)} + \tfrac{1}{2} ( (\partial _u \nabla \mathcal {H})(u) [h], h )_{L^2(\mathbb T)} + O(h^3). \end{aligned}$$

Specializing to \(u = T_\delta (\psi )\) and \( h = T_1(\psi ) w + T_2(\psi )[w,w] + T_{\ge 3}(\psi ,w)\), we obtain that the sum of all the components of \( K = \varepsilon ^{-2b} (\mathcal {H}\circ A_\varepsilon \circ G_\delta )(\psi , 0, w) \) that are quadratic in w is

$$\begin{aligned} \begin{array}{ll} \tfrac{1}{2} ( K_{02}w, w )_{L^2(\mathbb T)} &{} = \varepsilon ^{-2b} ( (\nabla \mathcal {H})(T_\delta ), T_2 [w,w] )_{L^2(\mathbb T)} \\ &{} \quad + \varepsilon ^{-2b} \tfrac{1}{2} ( (\partial _u \nabla \mathcal {H})(T_\delta ) [T_1 w], T_1 w )_{L^2(\mathbb T)} . \end{array} \end{aligned}$$

Inserting the expressions (7.9) and (7.10) in the last equality we get

$$\begin{aligned} K_{02}(\psi ) w&= (\partial _u \nabla \mathcal {H})(T_\delta ) [w] + 2 \varepsilon ^{b-1} (\partial _u \nabla \mathcal {H})(T_\delta ) [U_1 w] \nonumber \\&\quad \, + \varepsilon ^{2(b-1)} U_1^T (\partial _u \nabla \mathcal {H})(T_\delta ) [U_1 w] + 2 \varepsilon ^{2b- 3} U_2[w, \cdot ]^T (\nabla \mathcal {H})(T_\delta ). \end{aligned}$$
(7.12)
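The powers of \(\varepsilon \) in (7.12) can be checked against the scalings of \(T_1\) and \(T_2\) in a finite dimensional toy model. In the sketch below the symmetric matrix `A`, the vector `grad`, the matrix `U1` and the symmetric tensor `U2` are random stand-ins for \((\partial _u \nabla \mathcal {H})(T_\delta )\), \((\nabla \mathcal {H})(T_\delta )\), \(U_1\) and \(U_2\), and the values of `eps` and `b` are arbitrary (all of these are assumptions made for the illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
eps, b = 0.3, 1.1                        # illustrative values (assumptions)

A = rng.standard_normal((n, n))
A = A + A.T                              # symmetric: plays (d_u grad H)(T_delta)
grad = rng.standard_normal(n)            # plays (grad H)(T_delta)
U1 = rng.standard_normal((n, n))         # plays U_1(psi)
U2 = rng.standard_normal((n, n, n))
U2 = U2 + U2.transpose(0, 2, 1)          # symmetric bilinear map U_2[w, w']
w = rng.standard_normal(n)

# T_1 w and T_2[w, w] with the scalings stated before (7.9)
T1w = eps ** (2 * b - 1) * (U1 @ w) + eps ** b * w
T2ww = eps ** (4 * b - 3) * np.einsum("ijk,j,k->i", U2, w, w)

# Quadratic part of K: eps^{-2b} [ (grad, T_2[w,w]) + 1/2 (A T_1 w, T_1 w) ]
lhs = eps ** (-2 * b) * (grad @ T2ww + 0.5 * (T1w @ A @ T1w))

# K_02 w as written in (7.12)
K02w = (A @ w
        + 2 * eps ** (b - 1) * (A @ U1 @ w)
        + eps ** (2 * (b - 1)) * (U1.T @ A @ U1 @ w)
        + 2 * eps ** (2 * b - 3) * np.einsum("i,ijk,j->k", grad, U2, w))
rhs = 0.5 * (K02w @ w)

print(lhs, rhs)
```

The two numbers agree, which confirms that the exponents \(b-1\), \(2(b-1)\) and \(2b-3\) in (7.12) are the ones produced by \(\varepsilon ^{-2b}\), \(\varepsilon ^{2b-1}\), \(\varepsilon ^{b}\) and \(\varepsilon ^{4b-3}\). Only the quadratic forms are compared: the matrix in (7.12) is not written in symmetrized form.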

Lemma 14

The operator \(K_{02}\) reads

$$\begin{aligned} ( K_{02}(\psi ) w, w )_{L^2(\mathbb T)} = ( (\partial _u \nabla \mathcal {H})(T_\delta ) [w], w )_{L^2(\mathbb T)} + ( R(\psi ) w, w )_{L^2(\mathbb T)} \end{aligned}$$
(7.13)

where \(R(\psi )w \) has the “finite dimensional” form

$$\begin{aligned} R(\psi ) w = {\mathop \sum }_{|j| \le C} ( w , g_j(\psi ) )_{L^2(\mathbb T)} \chi _j(\psi ). \end{aligned}$$
(7.14)

The functions \(g_j, \chi _j\) satisfy, for some \(\sigma := \sigma (\nu , {\tau }) > 0\),

$$\begin{aligned}&\Vert g_j \Vert _s^{\mathrm {Lip}(\gamma )}\Vert \chi _j \Vert _{s_0}^{\mathrm {Lip}(\gamma )}+ \Vert g_j \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\Vert \chi _j \Vert _s^{\mathrm {Lip}(\gamma )}\le _s \varepsilon ^{b+1} \Vert {\mathfrak I}_\delta \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )}, \end{aligned}$$
(7.15)
$$\begin{aligned}&\Vert \partial _i g_j [{\widehat{\imath }} ]\Vert _s \Vert \chi _j \Vert _{s_0} + \Vert \partial _i g_j [{\widehat{\imath }} ]\Vert _{s_0} \Vert \chi _j \Vert _{s} + \Vert g_j \Vert _{s_0} \Vert \partial _i \chi _j [{\widehat{\imath }} ] \Vert _s + \Vert g_j \Vert _{s} \Vert \partial _i \chi _j [{\widehat{\imath }} ]\Vert _{s_0} \nonumber \\&\quad \le _s \varepsilon ^{b + 1} ( \Vert {\widehat{\imath }} \Vert _{s + \sigma } + \Vert {\mathfrak I}_\delta \Vert _{s + \sigma } \Vert {\widehat{\imath }} \Vert _{s_0 + \sigma }) , \end{aligned}$$
(7.16)

where \(i = (\theta , y, z)\) (see (5.1)) and \({\widehat{\imath }} = ({\widehat{\theta }}, {\widehat{y}}, {\widehat{z}})\).

Proof

Since \( U_1 = \Pi _S U_1 \) and \( U_2 = \Pi _S U_2 \), the last three terms in (7.12) have all the form (7.14). We have to prove that they are also small in size.

By (4.8), (6.13) and (7.7), the only term in \(\varepsilon ^{-2b} H_2(A_\varepsilon (G_\delta (\psi , \eta , w)))\) that is quadratic in w is \(\frac{1}{2} \int _\mathbb Tw_x^2 \, dx\), so this is the only contribution to (7.12) coming from \(H_2\).

It remains to consider all the terms coming from \(\mathcal {H}_{\ge 4} := \mathcal {H}_4 + \mathcal{H}_{\ge 5} = O(u^4)\). The term \(\varepsilon ^{b - 1} \partial _u \nabla \mathcal{H}_{\ge 4}(T_\delta ) U_1\), the term \(\varepsilon ^{2(b - 1)} U_1^T (\partial _u \nabla \mathcal{H}_{\ge 4})(T_\delta ) U_1\) and the term \(\varepsilon ^{2 b - 3} U_2^T \nabla \mathcal{H}_{\ge 4}(T_\delta ) \) have all the form (7.14) and, using the inequality \( \Vert T_\delta \Vert _s^{\mathrm {Lip}(\gamma )}\le \varepsilon (1 + \Vert {\mathfrak {I}}_\delta \Vert _s^{\mathrm {Lip}(\gamma )}) \), (6.4) and (7.11), the bound (7.15) holds. By (6.11) and using explicit formulae (7.7)–(7.10) we get (7.16). \(\square \)

The conclusion of this section is that, after the composition with the action-angle variables, the rescaling (4.5), and the transformation \( G_\delta \), the linearized operator to analyze is \(w \mapsto (\partial _u \nabla \mathcal {H})(T_\delta ) [w] \), \(w \in H_S^\bot \), up to finite dimensional operators which have the form (7.14) and size (7.15).

7.2 The linearized operator in the normal directions

In view of (7.13) we now compute \( ( (\partial _u \nabla \mathcal {H})(T_\delta ) [w], w )_{L^2(\mathbb T)} \), \( w \in H_S^\bot \), where \( \mathcal {H}= H \circ \Phi _B \) and \(\Phi _B \) is the Birkhoff map of Proposition 1. We recall that \(\Phi _B(u) = u + \Psi (u)\) where \(\Psi \) satisfies (3.6) and \(\Psi (u) = O(u^3)\). It is convenient to estimate separately the terms in

$$\begin{aligned} \mathcal {H}= H \circ \Phi _B = H_2 \circ \Phi _B + H_4 \circ \Phi _B + H_{\ge 5} \circ \Phi _B \end{aligned}$$
(7.17)

where \( H_2, H_4, H_{\ge 5}\) are defined in (3.4).

We first consider \( H_{\ge 5} \circ \Phi _B \). By (3.4) we get \( \nabla H_{\ge 5}(u) = \pi _0[ (\partial _u f)(x, u, u_x) ]\) \(- \partial _x \{ (\partial _{u_x} f)(x, u,u_x) \} \) where \( \pi _0 \) is the operator defined in (1.32). Since \( \Phi _B \) has the form (3.6), Lemma 12 (at \( u = T_\delta \), see (7.8)) implies that

$$\begin{aligned} \partial _u \nabla ( H_{\ge 5} \circ \Phi _B ) (T_\delta ) [h]&= (\partial _u \nabla H_{\ge 5})(\Phi _B(T_\delta )) [h] + \mathcal{R}_{H_{\ge 5}}(T_\delta )[h] \nonumber \\&= \partial _x (r_1(T_\delta ) \partial _x h ) + r_0(T_\delta ) h + \mathcal{R}_{H_{\ge 5}}(T_\delta )[h] \end{aligned}$$
(7.18)

where the multiplicative functions \(r_0(T_\delta )\), \(r_1(T_\delta )\) are

$$\begin{aligned} r_0 (T_\delta )&:= \sigma _0(\Phi _B(T_\delta )), \quad r_1 (T_\delta ) := \sigma _1(\Phi _B(T_\delta )), \nonumber \\ \sigma _0(u)&:= (\partial _{uu} f)(x, u, u_x) - \partial _x \{ (\partial _{u u_x} f)(x, u, u_x) \}, \nonumber \\ \sigma _1(u)&:= - (\partial _{u_x u_x} f)(x, u, u_x), \end{aligned}$$
(7.19)

the remainder \( \mathcal{R}_{H_{\ge 5}}(u) \) has the form (7.2) with \(\chi _j = e^{{\mathrm i}jx}\) or \(g_j = e^{{\mathrm i}jx}\) and, using (7.3), it satisfies, for some \( \sigma := \sigma (\nu , {\tau }) > 0\),

$$\begin{aligned}&\Vert g_j \Vert _s^{\mathrm {Lip}(\gamma )}\Vert \chi _j \Vert _{s_0}^{\mathrm {Lip}(\gamma )}+ \Vert g_j \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\Vert \chi _j \Vert _s^{\mathrm {Lip}(\gamma )}\le _s \varepsilon ^5 (1 + \Vert {\mathfrak {I}}_\delta \Vert _{s+2}^{\mathrm {Lip}(\gamma )}), \nonumber \\&\Vert \partial _i g_j [{\widehat{\imath }} ]\Vert _s \Vert \chi _j \Vert _{s_0} + \Vert \partial _i g_j [{\widehat{\imath }} ]\Vert _{s_0} \Vert \chi _j \Vert _{s} + \Vert g_j \Vert _{s_0} \Vert \partial _i \chi _j [{\widehat{\imath }} ] \Vert _s + \Vert g_j \Vert _{s} \Vert \partial _i \chi _j [{\widehat{\imath }} ]\Vert _{s_0} \nonumber \\&\quad \le _s \varepsilon ^5 ( \Vert {\widehat{\imath }} \Vert _{s+\sigma } + \Vert {\mathfrak {I}}_\delta \Vert _{s+2} \Vert {\widehat{\imath }} \Vert _{s_0 + 2} ). \end{aligned}$$
(7.20)
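The structure \(\partial _x (\sigma _1 \partial _x h) + \sigma _0 h\) of the linearized gradient in (7.18) and (7.19) can be tested numerically on an example. The sketch below uses the hypothetical density \(f(u, u_x) = u^3 u_x^2\) (an assumption chosen for the illustration, of degree 5 as in \(H_{\ge 5}\)), ignores the projection \(\pi _0\) and the Birkhoff map, and compares formula (7.19) with a centered finite difference of \(\nabla H\).

```python
import numpy as np

n = 128
x = 2 * np.pi * np.arange(n) / n
k = np.fft.fftfreq(n, d=1.0 / n)           # integer wave numbers

def dx(v):
    """Spectral derivative of a real periodic grid function."""
    return np.real(np.fft.ifft(1j * k * np.fft.fft(v)))

# Hypothetical density f(u, u_x) = u^3 u_x^2
def grad_H(u):
    """L^2 gradient: f_u - d_x(f_{u_x})."""
    ux = dx(u)
    return 3 * u**2 * ux**2 - dx(2 * u**3 * ux)

def linearized(u, h):
    """d_x(sigma_1 h_x) + sigma_0 h with sigma_0, sigma_1 as in (7.19)."""
    ux = dx(u)
    sigma0 = 6 * u * ux**2 - dx(6 * u**2 * ux)   # f_uu - d_x(f_{u u_x})
    sigma1 = -2 * u**3                           # -f_{u_x u_x}
    return dx(sigma1 * dx(h)) + sigma0 * h

u = 0.3 * np.cos(x) + 0.1 * np.sin(2 * x)
h = np.cos(3 * x) - 0.5 * np.sin(x)

t = 1e-4
finite_diff = (grad_H(u + t * h) - grad_H(u - t * h)) / (2 * t)
err = np.max(np.abs(finite_diff - linearized(u, h)))
print("max discrepancy:", err)
```

The discrepancy is of the order of the finite difference error \(O(t^2)\), so in this example no first order term in \(\partial _x h\) appears beyond the one contained in \(\partial _x (\sigma _1 \partial _x h)\).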

Now we consider the contributions from \( H_2 \circ \Phi _B\) and \(H_4 \circ \Phi _B \). By Lemma 12 and the expressions of \( H_2, H_4 \) in (3.4) we deduce that

$$\begin{aligned} \partial _u \nabla ( H_2 \circ \Phi _B) (T_\delta ) [h]&= - \partial _{xx} h + \mathcal{R}_{H_2}(T_\delta )[h] , \end{aligned}$$
(7.21)
$$\begin{aligned} \partial _u \nabla ( H_4 \circ \Phi _B) (T_\delta ) [h]&= -3\varsigma (\Phi _B (T_\delta ))^2 h + \mathcal{R}_{H_4}(T_\delta )[h] , \end{aligned}$$
(7.22)

where \( \mathcal{R}_{H_2}(u) \), \( \mathcal{R}_{H_4}(u) \) have the form (7.2). By (7.3), they have size \(\mathcal{R}_{H_2}(T_\delta ) = O(\varepsilon ^2)\), \(\mathcal{R}_{H_4}(T_\delta ) = O(\varepsilon ^4)\). More precisely, the functions \(g_j, \chi _j\) in \(\mathcal {R}_{H_4}(T_\delta )\) satisfy the bounds in (7.20) with \(\varepsilon ^5\) replaced by \(\varepsilon ^4\). Regarding \(\mathcal {R}_{H_2}(T_\delta )\), we need to find an exact formula for the terms of order \(\varepsilon ^2\).

The sum of (7.18), (7.21) and (7.22) gives a formula for \(\partial _u \nabla \mathcal {H}(T_\delta )[h]\), where the terms of form (7.2) and order \(\varepsilon ^2\) are confined in \(\mathcal {R}_{H_2}(T_\delta )\). On the other hand, recalling (3.7), \(\mathcal {H}= H_2 + \mathcal {H}_4 + \mathcal {H}_{\ge 5}\), and \(\partial _u \nabla H_2(T_\delta ) = -\partial _{xx}\), while \(\partial _u \nabla \mathcal {H}_{\ge 5}(T_\delta ) = O(\varepsilon ^3)\). Therefore all the terms of order \(\varepsilon ^2\) in \(\partial _u \nabla \mathcal {H}(T_\delta )\) can only come from \(\partial _u \nabla \mathcal {H}_4(T_\delta )\). Using formula (3.8) for \(\mathcal {H}_4\), we calculate

$$\begin{aligned} \Pi _S^\bot \big ( \partial _u \nabla \mathcal {H}_4(T_\delta )[h] \big ) = - 3\varsigma \Pi _S^\bot (T_\delta ^2 h) \quad \forall h \in H_{S^\bot }^s . \end{aligned}$$

Hence all the terms of order \(\varepsilon ^2\) in \(\Pi _S^\bot (\partial _u \nabla \mathcal {H}(T_\delta )[h] )\) are contained in the term \(- 3\varsigma \Pi _S^\bot (T_\delta ^2 h)\) (and the term \(- 3\varsigma \Pi _S^\bot (T_\delta ^2 h)\) is included in \(-3\varsigma \Pi _S^\bot [ (\Phi _B (T_\delta ))^2 h]\) because \(\Phi _B(T_\delta ) = T_\delta + \Psi (T_\delta )\)). As a consequence, \(\Pi _S^\bot \mathcal {R}_{H_2}(T_\delta )\) is of size \(O(\varepsilon ^3)\), and its functions \(g_j, \chi _j\) (see (7.2)) satisfy (7.20) with \(\varepsilon ^5\) replaced by \(\varepsilon ^3\).

By Lemma 14 and the results of this section we deduce:

Proposition 3

Assume (7.5). Then the Hamiltonian operator \( \mathcal{L}_\omega \) has the form, \( \forall h \in H_{S^\bot }^s ( \mathbb T^{\nu +1}) \),

$$\begin{aligned} \mathcal{L}_\omega h := \mathcal {D}_\omega h - \partial _x K_{02} h = \Pi _S^\bot \big ( \mathcal {D}_\omega h + \partial _{xx} (a_1 \partial _x h) + \partial _x ( a_0 h ) - \partial _x \mathcal {R}_* h \big ) \end{aligned}$$
(7.23)

where \( {\mathcal {R}}_* := \mathcal {R}_{H_2}(T_\delta ) + \mathcal {R}_{H_4}(T_\delta ) + \mathcal {R}_{H_{\ge 5}}(T_\delta ) + R(\psi ) \) (with \(R(\psi )\) defined in Lemma 14, and \(\mathcal {R}_{H_2}(T_\delta )\), \(\mathcal {R}_{H_4}(T_\delta )\), \(\mathcal {R}_{H_{\ge 5}}(T_\delta )\) defined in (7.21), (7.22) and (7.18), respectively), the functions

$$\begin{aligned} a_1 := 1 - r_1 ( T_\delta ) , \quad a_0 := 3\varsigma (\Phi _B(T_\delta ))^2 - r_0(T_\delta ) , \quad \end{aligned}$$
(7.24)

\( r_0, r_1 \) are defined in (7.19), and \( T_\delta \) in (7.8). They satisfy

$$\begin{aligned} \Vert a_1 -1 \Vert _s^{\mathrm {Lip}(\gamma )}+ \Vert a_0 - 3\varsigma T_\delta ^2 \Vert _s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^3 ( 1 + \Vert {\mathfrak I}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}) , \end{aligned}$$
(7.25)
$$\begin{aligned} \Vert \partial _i a_1[{\widehat{\imath }} ] \Vert _s + \Vert \partial _i (a_0 - 3\varsigma T_\delta ^2) [{\widehat{\imath }} ] \Vert _s&\le _s \varepsilon ^3 ( \Vert {\widehat{\imath }} \Vert _{s+\sigma } + \Vert {\mathfrak I}_\delta \Vert _{s+\sigma } \Vert {\widehat{\imath }} \Vert _{s_0+\sigma } ) \end{aligned}$$
(7.26)

where \( {\mathfrak {I}}_\delta (\varphi ) := (\theta _0(\varphi ) - \varphi , y_\delta (\varphi ), z_0(\varphi )) \) corresponds to \(T_\delta \). The remainder \( \mathcal{R}_* \) has the form (7.2), and its coefficients \(g_j, \chi _j\) satisfy bounds (7.15) and (7.16).

Remark 5

For \( K = H + \lambda M^2\), \( \lambda = 3 \varsigma / 4 \), the coefficient \(a_0\) in (7.24) becomes

$$\begin{aligned} a_0 = 3 \varsigma \pi _0 [ (\Phi _B (T_\delta ))^2 ] - r_0(T_\delta ), \end{aligned}$$

where \( \pi _0\) is defined in (1.32). Thus the space average of \( a_0\) has size \(O(\varepsilon ^3)\).

The bounds (7.15) imply, by Lemma 13, estimates for the s-decay norms of \(\mathcal{R}_*\). The linearized operator \( \mathcal{L}_\omega := \mathcal{L}_\omega (\omega , i_\delta (\omega ))\) depends on the parameter \( \omega \) both directly and through its dependence on the torus \(i_\delta (\omega )\). We have also estimated the partial derivative \( \partial _i \) with respect to the variables i (see (5.1)) in order to control, along the nonlinear Nash–Moser iteration, the Lipschitz variation of the eigenvalues of \( \mathcal{L}_\omega \) with respect to \( \omega \) and the approximate solution \( i_\delta \).

8 Reduction of the linearized operator in the normal directions

The goal of this section is to conjugate the Hamiltonian linear operator \( \mathcal{L}_\omega \) in (7.23) to the constant-coefficient linear operator \( \mathcal{L}_\infty \) defined in (8.64). The proof is obtained by applying different kinds of symplectic transformations. We shall always assume (7.5).

8.1 Space reduction at the order \( \partial _{xxx} \)

As a first step, we symplectically conjugate the operator \( \mathcal{L}_\omega \) in (7.23) to \( \mathcal{L}_1 \) in (8.13), whose coefficient of \(\partial _{xxx}\) is independent of the space variable. Because of the Hamiltonian structure, this step also eliminates the terms \( O( \partial _{xx} )\).

We look for a \( \varphi \)-dependent family of symplectic diffeomorphisms \(\Phi (\varphi ) \) of \( H_S^\bot \) which differ from

$$\begin{aligned} \mathcal{A}_{\bot } := \Pi _S^\bot \mathcal{A} \Pi _S^\bot , \quad (\mathcal{A} h)(\varphi ,x) := (1 + \beta _x(\varphi ,x)) h(\varphi ,x + \beta (\varphi ,x)) , \end{aligned}$$
(8.1)

up to a small “finite dimensional” remainder, see (8.3). For each \( \varphi \in \mathbb T^\nu \), the map \( \mathcal{A}(\varphi ) \) is a symplectic map of the phase space, see Remark 3.3 in [3]. If \( \Vert \beta \Vert _{W^{1,\infty }} < 1/2\), then \( \mathcal{A} \) is invertible (see Lemma 3), and its inverse and adjoint maps are

$$\begin{aligned} (\mathcal{A}^{-1} h)(\varphi ,y)&= (1 + \tilde{\beta }_y(\varphi ,y)) h(\varphi , y + \tilde{\beta }(\varphi ,y)) , \nonumber \\ (\mathcal{A}^T h) (\varphi ,y)&= h(\varphi , y + \tilde{\beta }(\varphi ,y)) \end{aligned}$$
(8.2)

where \(x = y + \tilde{\beta } (\varphi , y) \) is the inverse diffeomorphism (of \(\mathbb T\)) of \( y = x + \beta (\varphi , x) \).
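Formulas (8.1) and (8.2) can be verified numerically at a fixed \(\varphi \). In the sketch below the function \(\beta (x) = 0.1 \sin x\), the test functions and the fixed-point iteration used to compute \(\tilde{\beta }\) are hypothetical choices made for the illustration; the derivative \(\tilde{\beta }_y\) is obtained from the chain rule identity \((1 + \tilde{\beta }_y(y))(1 + \beta _x(x)) = 1\) at \(x = y + \tilde{\beta }(y)\).

```python
import numpy as np

n = 256
grid = 2 * np.pi * np.arange(n) / n

# Hypothetical small diffeomorphism x -> x + beta(x) of the circle
beta = lambda x: 0.1 * np.sin(x)
beta_x = lambda x: 0.1 * np.cos(x)

def beta_tilde(y, iters=60):
    """Solve beta_tilde(y) = -beta(y + beta_tilde(y)) by fixed-point iteration."""
    bt = np.zeros_like(y)
    for _ in range(iters):
        bt = -beta(y + bt)
    return bt

def beta_tilde_y(y):
    """Derivative of beta_tilde, from (1 + beta_tilde_y(y)) (1 + beta_x(x)) = 1."""
    xx = y + beta_tilde(y)
    return -beta_x(xx) / (1.0 + beta_x(xx))

def A(h):
    return lambda x: (1.0 + beta_x(x)) * h(x + beta(x))

def A_inv(h):
    return lambda y: (1.0 + beta_tilde_y(y)) * h(y + beta_tilde(y))

def A_T(h):
    return lambda y: h(y + beta_tilde(y))

def inner(f, g):
    """L^2(T) inner product by the periodic trapezoidal rule."""
    return 2 * np.pi * np.mean(f(grid) * g(grid))

h = lambda x: np.cos(2 * x) + 0.3 * np.sin(x)
g = lambda x: np.sin(3 * x) + np.cos(x)

inverse_error = np.max(np.abs(A_inv(A(h))(grid) - h(grid)))
adjoint_error = abs(inner(A(h), g) - inner(h, A_T(g)))
print(inverse_error, adjoint_error)
```

Both errors are at the level of rounding, consistent with (8.2). The fixed-point iteration converges here because \(|\beta _x| \le 0.1 < 1\), in line with the smallness assumption on \(\Vert \beta \Vert _{W^{1,\infty }}\).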

The restricted map \( \mathcal{A}_\bot (\varphi ): H_S^\bot \rightarrow H_S^\bot \) is not symplectic. We have already observed in the introduction that \( \mathcal{A }(\varphi ) \) is the time-1 flow map of the linear Hamiltonian PDE (1.30). Equation (1.30) is a linear transport equation, whose characteristic curves are the solutions of the ODE

$$\begin{aligned} \frac{d}{d{\tau }} x = - b(\varphi , {\tau }, x) . \end{aligned}$$

To obtain a symplectic transformation close to \(\mathcal {A}_\bot \), we define a symplectic map \(\Phi \) of \( H_S^\bot \) as the time 1 flow of the Hamiltonian PDE (1.31). The linear operator \( \Pi _S^\bot \partial _x (b({\tau }, x) u) \) is the Hamiltonian vector field generated by the quadratic Hamiltonian \( \frac{1}{2} \int _{\mathbb T} b({\tau }, x) u^2 dx \) restricted to \( H_S^\bot \). The flow of (1.31) is well defined in the Sobolev spaces \( H^s_{S^\bot } (\mathbb T_x) \) for \( b(\varphi , {\tau }, x) \) smooth enough, by standard theory of linear hyperbolic PDEs (see e.g. section 0.8 in [29]). The difference between the time 1 flow map \( \Phi \) and \( \mathcal{A}_\bot \) is a “finite-dimensional” remainder of size \(O(\beta )\).

Lemma 15

(Lemma 8.1 of [5]) For \( \Vert \beta \Vert _{W^{s_0 + 1,\infty }} \) small, there exists an invertible symplectic transformation \(\Phi = \mathcal{A}_\bot + \mathcal{R}_\Phi \) of \(H_{S^\bot }^s\), where \( \mathcal{A}_\bot \) is defined in (8.1) and \( \mathcal{R}_\Phi \) is a “finite-dimensional” remainder

$$\begin{aligned} \mathcal{R}_\Phi h= \sum _{j \in S} \int _0^1 (h, g_j ({\tau }) )_{L^2(\mathbb T)} \, \chi _j ({\tau }) \, d {\tau }+ \sum _{j \in S} \big (h, \psi _j \big )_{L^2(\mathbb T)} e^{{\mathrm i}j x} \end{aligned}$$
(8.3)

for some functions \( \chi _j ({\tau }), g_j ({\tau }) , \psi _j \in H^s \) satisfying for all \({\tau }\in [0,1]\)

$$\begin{aligned} \Vert \psi _j\Vert _s + \Vert g_j({\tau })\Vert _s \le _s \Vert \beta \Vert _{W^{s + 2, \infty }}, \quad \Vert \chi _j({\tau })\Vert _s \le _s 1 + \Vert \beta \Vert _{W^{s + 1, \infty }} . \end{aligned}$$
(8.4)

Moreover

$$\begin{aligned} \Vert \Phi h \Vert _s + \Vert \Phi ^{-1} h \Vert _s \le _s \Vert h \Vert _s + \Vert \beta \Vert _{W^{s + 2, \infty }} \Vert h \Vert _{s_0} \quad \forall h \in H^s_{S^\bot } . \end{aligned}$$
(8.5)

We conjugate \( \mathcal{L}_\omega \) in (7.23) via the symplectic map \( \Phi = \mathcal{A}_\bot + \mathcal{R}_\Phi \) of Lemma 15. Using the splitting \( \Pi _S^{\bot } = I - \Pi _S \), we compute

$$\begin{aligned} \mathcal{L}_\omega \Phi = \Phi \mathcal{D}_\omega + \Pi _S^\bot \mathcal{A} ( b_3 \partial _{yyy} + b_2 \partial _{yy} + b_1 \partial _{y} + b_0 ) \Pi _S^\bot + \mathcal{R}_I , \end{aligned}$$
(8.6)

where the coefficients \(b_i(\varphi ,y)\), \(i=0,1,2,3\), are

$$\begin{aligned} b_3&:= \mathcal{A}^T [ a_1 ( 1 + \beta _x)^3 ], \quad b_2 := \mathcal{A}^T [ 2 (a_1)_x (1 + \beta _x )^2 + 6 a_1 \beta _{xx} (1 + \beta _x )], \nonumber \\ b_1&:= \mathcal{A}^T \left[ (\mathcal{D}_\omega \beta ) + \frac{3 a_1 \beta _{xx}^2 }{1 + \beta _x} + 4 a_1 \beta _{xxx} + 6 (a_1)_x \beta _{xx} + ((a_1)_{xx} + a_0) (1 + \beta _x) \right] , \nonumber \\ b_0&:= \mathcal{A}^T \left[ \frac{1}{1 + \beta _x} \left( \mathcal {D}_\omega \beta _x + a_1 \beta _{xxxx} + 2 (a_1)_{x} \beta _{xxx} + ((a_1)_{xx} + a_0) \beta _{xx} \right) + (a_0)_x \right] , \end{aligned}$$
(8.7)

and the remainder

$$\begin{aligned} \mathcal{R}_I&:= - \Pi _S^\bot \big ( a_1 \partial _{xxx} + 2 (a_1)_x \partial _{xx} + ( (a_{1})_{xx} + a_0)\partial _x + (a_0)_x \big ) \Pi _{S} \mathcal{A} \Pi _S^\bot \, \nonumber \\&\quad - \Pi _S^\bot \partial _x \mathcal {R}_{*} \mathcal{A}_\bot + [\mathcal{D}_\omega , \mathcal{R}_\Phi ] + (\mathcal{L}_\omega - \mathcal{D}_\omega ) \mathcal{R}_\Phi . \end{aligned}$$
(8.8)

The commutator \([\mathcal{D}_\omega , \mathcal{R}_\Phi ] \) has the form (8.3) with \(\mathcal{D}_\omega g_j\), \(\mathcal{D}_\omega \chi _j\) or \(\mathcal{D}_\omega \psi _j\) instead of \(g_j\), \(\chi _j\), \(\psi _j\) respectively. Also the last term \((\mathcal{L}_\omega - \mathcal{D}_\omega ) \mathcal{R}_\Phi \) in (8.8) has the form (8.3) (note that \(\mathcal{L}_\omega - \mathcal{D}_\omega \) does not contain derivatives with respect to \(\varphi \)). By (8.6), and decomposing \( I = \Pi _S + \Pi _S^\bot \), we get

$$\begin{aligned} \mathcal{L}_\omega \Phi = {}&\Phi ( \mathcal{D}_\omega + b_3 \partial _{yyy} + b_2 \partial _{yy} + b_1 \partial _{y} + b_0 ) \Pi _S^\bot + \mathcal{R}_ {II} , \end{aligned}$$
(8.9)
$$\begin{aligned} \mathcal{R}_{II} := {}&\{\Pi _S^\bot (\mathcal{A} - I) \Pi _{S} - \mathcal{R}_\Phi \} ( b_3 \partial _{yyy} + b_2 \partial _{yy} + b_1 \partial _{y} + b_0 ) \Pi _S^\bot + \mathcal{R}_I . \end{aligned}$$
(8.10)

Now we choose the function \( \beta = \beta (\varphi , x) \) such that

$$\begin{aligned} a_1(\varphi , x) (1 + \beta _x (\varphi , x))^3 = b_3 (\varphi ) \end{aligned}$$
(8.11)

so that the coefficient \( b_3 \) in (8.7) depends only on \( \varphi \) (note that \( \mathcal{A}^T [b_3 (\varphi )] = b_3 (\varphi )\)). The only solution of (8.11) with zero space average is (see e.g. [3, section 3.1]) \(\beta := \partial _x^{-1} \rho _0\), where \(\rho _0 := b_3 (\varphi )^{1/3} (a_1 (\varphi , x))^{-1/3} - 1\), and

$$\begin{aligned} b_3 (\varphi ) = \left( \frac{1}{2 \pi } \int _{\mathbb T} (a_1 (\varphi , x))^{-1/3} dx \right) ^{-3}. \end{aligned}$$
(8.12)
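For the reader's convenience, here is a short check of how (8.12) follows from (8.11). It uses only that \(\beta _x\) has zero space average and, as an assumption consistent with \(a_1 = 1 + O(\varepsilon ^3)\), that \(a_1 > 0\). Taking cube roots in (8.11) and averaging in x gives

$$\begin{aligned} 1 + \beta _x = b_3(\varphi )^{1/3} \, a_1^{-1/3} \quad \Longrightarrow \quad 1 = \frac{1}{2\pi } \int _{\mathbb T} (1 + \beta _x) \, dx = b_3(\varphi )^{1/3} \, \frac{1}{2\pi } \int _{\mathbb T} (a_1(\varphi , x))^{-1/3} \, dx , \end{aligned}$$

which is (8.12) after raising both sides to the power \(-3\). The same identity shows that \(\rho _0\) has zero space average, so that \(\partial _x^{-1} \rho _0\) is well defined.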

Applying the symplectic map \( \Phi ^{-1} \) in (8.9) we obtain the Hamiltonian operator (see Definition 2)

$$\begin{aligned} \mathcal{L}_1 := \Phi ^{-1} \mathcal{L}_\omega \Phi = \Pi _S^\bot ( \omega \cdot \partial _\varphi + b_3(\varphi ) \partial _{yyy} + b_1 \partial _y + b_0 ) \Pi _S^\bot + {\mathfrak R}_1 \end{aligned}$$
(8.13)

where \( {\mathfrak R}_1 := \Phi ^{-1} \mathcal{R}_{II} \). Note that the term \(b_2 \partial _{yy}\) has disappeared from (8.13) because, by the Hamiltonian nature of \( \mathcal{L}_1 \), the coefficient \( b_2 = 2 (b_3)_y \) (see [3, Remark 3.5]) and therefore, by (8.12), \( b_2 = 2 (b_3)_y = 0 \).
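The identity \( b_2 = 2 (b_3)_y \) can also be checked by hand from (8.7). This is only a sanity check and does not replace the structural argument of [3, Remark 3.5]. Differentiating the formula for \(\mathcal{A}^T\) in (8.2) and using \((1 + \tilde{\beta }_y(\varphi ,y)) (1 + \beta _x(\varphi ,x)) = 1\) at \(x = y + \tilde{\beta }(\varphi ,y)\), one finds \(\partial _y \mathcal{A}^T g = \mathcal{A}^T [ (1 + \beta _x)^{-1} g_x ]\), and hence

$$\begin{aligned} (b_3)_y = \mathcal{A}^T \left[ \frac{\partial _x \big ( a_1 (1 + \beta _x)^3 \big )}{1 + \beta _x} \right] = \mathcal{A}^T \big [ (a_1)_x (1 + \beta _x)^2 + 3 a_1 \beta _{xx} (1 + \beta _x) \big ] = \frac{b_2}{2} . \end{aligned}$$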

Lemma 16

(Lemma 8.2 of [5]) The operator \( {\mathfrak R}_1 \) in (8.13) has the form (7.4).

Since \(a_1 = 1 + O(\varepsilon ^3)\) and \(a_0 = 3\varsigma T_\delta ^2 + O(\varepsilon ^3)\) (see (7.25), (7.26) for the precise estimates), by the usual composition estimates we deduce the following lemma.

Lemma 17

There is \(\sigma = \sigma ({\tau },\nu ) > 0\) such that

$$\begin{aligned}&\Vert \beta \Vert _s^{\mathrm {Lip}(\gamma )}+ \Vert b_3 -1 \Vert _s^{\mathrm {Lip}(\gamma )}+ \Vert b_1 - 3 \varsigma T_\delta ^2 \Vert _s^{\mathrm {Lip}(\gamma )}+ \Vert b_0 - 3 \varsigma (T_\delta ^2)_x \Vert _s^{\mathrm {Lip}(\gamma )}\nonumber \\&\quad \le _s \varepsilon ^3 (1 + \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}), \end{aligned}$$
(8.14)
$$\begin{aligned}&\Vert \partial _i \beta [{\widehat{\imath }}] \Vert _s + \Vert \partial _i b_3 [{\widehat{\imath }}] \Vert _s + \Vert \partial _i(b_1 - 3 \varsigma T_\delta ^2) [{\widehat{\imath }}] \Vert _s + \Vert \partial _i(b_0 - 3 \varsigma (T_\delta ^2)_x) [{\widehat{\imath }}] \Vert _s \nonumber \\&\quad \le _s \varepsilon ^3 \big ( \Vert {\widehat{\imath }} \Vert _{s+\sigma } + \Vert {\mathfrak I}_\delta \Vert _{s+\sigma } \Vert {\widehat{\imath }} \Vert _{s_0+\sigma } \big ), \end{aligned}$$
(8.15)

where \(T_\delta \) is defined in (7.8). The transformations \(\Phi \), \(\Phi ^{-1}\) satisfy

$$\begin{aligned} \Vert \Phi h \Vert _s^{{\mathrm {Lip}(\gamma )}} + \Vert \Phi ^{-1} h \Vert _s^{{\mathrm {Lip}(\gamma )}}&\le _s \Vert h \Vert _{s + 1}^{{\mathrm {Lip}(\gamma )}} + \Vert {\mathfrak I}_\delta \Vert _{s + \sigma }^{{\mathrm {Lip}(\gamma )}} \Vert h \Vert _{s_0 + 1}^{{\mathrm {Lip}(\gamma )}}\end{aligned}$$
(8.16)
$$\begin{aligned} \Vert \partial _i (\Phi h) [{\widehat{\imath }}] \Vert _s + \Vert \partial _i (\Phi ^{-1} h) [\widehat{\imath }] \Vert _s&\le _s \Vert h \Vert _{s + \sigma } \Vert {\widehat{\imath }} \Vert _{s_0 + \sigma } + \Vert h\Vert _{s_0 + \sigma } \Vert {\widehat{\imath }} \Vert _{s + \sigma } \nonumber \\&\quad + \Vert {\mathfrak I}_\delta \Vert _{s + \sigma } \Vert h\Vert _{s_0 + \sigma } \Vert {\widehat{\imath }} \Vert _{s_0 + \sigma }. \end{aligned}$$
(8.17)

Moreover the remainder \({\mathfrak R}_1\) has the form (7.4), where the functions \(\chi _j({\tau })\), \(g_j({\tau })\) satisfy the estimates (7.15) and (7.16) uniformly in \({\tau }\in [0, 1]\).

8.2 Time reduction at the order \( \partial _{xxx}\)

The goal of this section is to get a constant coefficient in front of \( \partial _{yyy} \), using a quasi-periodic reparametrization of time. We consider the change of variable

$$\begin{aligned} (B w)(\varphi , y) := w(\varphi + \omega \alpha (\varphi ), y), \quad ( B^{-1} h)(\vartheta , y ) := h(\vartheta + \omega \tilde{\alpha }(\vartheta ), y), \end{aligned}$$
(8.18)

where \(\mathbb T^\nu \rightarrow \mathbb T^\nu \), \(\vartheta \mapsto \varphi = \vartheta + \omega \tilde{\alpha }(\vartheta )\) is the inverse diffeomorphism of \( \vartheta = \varphi + \omega \alpha (\varphi ) \) in \(\mathbb T^\nu \). By conjugation, the differential operators become

$$\begin{aligned} B^{-1} \omega \cdot \partial _\varphi B = \rho (\vartheta )\, \omega \cdot \partial _{\vartheta } , \quad B^{-1} \partial _y B = \partial _y, \quad \rho := B^{-1} (1 + \omega \cdot \partial _{\varphi } \alpha ). \end{aligned}$$
(8.19)
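The first identity in (8.19) is the chain rule. Writing \(\vartheta = \varphi + \omega \alpha (\varphi )\), a direct computation (spelled out here only as a sketch) gives

$$\begin{aligned} \omega \cdot \partial _\varphi \big [ w(\varphi + \omega \alpha (\varphi ), y) \big ] = \sum _{k, m = 1}^{\nu } \omega _k \big ( \delta _{km} + \omega _m \partial _{\varphi _k} \alpha (\varphi ) \big ) (\partial _{\vartheta _m} w)(\vartheta , y) = \big ( 1 + \omega \cdot \partial _\varphi \alpha (\varphi ) \big ) (\omega \cdot \partial _\vartheta w)(\vartheta , y) , \end{aligned}$$

and applying \(B^{-1}\), namely expressing \(\varphi \) through \(\vartheta \), turns the prefactor into \(\rho (\vartheta )\).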

By (8.13), using also that B and \( B^{-1} \) commute with \( \Pi _S^\bot \), the conjugate operator \(B^{-1} \mathcal{L}_1 B\) is equal to

$$\begin{aligned} \Pi _S^\bot [ \rho \omega \cdot \partial _{\vartheta } + (B^{-1} b_3) \partial _{yyy} + ( B^{-1} b_1 ) \partial _y + ( B^{-1} b_0 ) ] \Pi _S^\bot + B^{-1} {\mathfrak R}_1 B. \end{aligned}$$
(8.20)

We choose \( \alpha \) such that \((B^{-1}b_3 )(\vartheta ) = m_3 \rho (\vartheta )\) for some constant \(m_3 \in \mathbb R\), namely

$$\begin{aligned} b_3 (\varphi ) = m_3 ( 1 + \omega \cdot \partial _\varphi \alpha (\varphi ) ) \end{aligned}$$
(8.21)

(recall (8.19)). The unique solution with zero average of (8.21) is

$$\begin{aligned} \alpha (\varphi ) := \frac{1}{m_3} ( \omega \cdot \partial _\varphi )^{-1} ( b_3 - m_3 ) (\varphi ) , \quad m_3 := \frac{1}{(2 \pi )^\nu } \int _{\mathbb T^\nu } b_3 (\varphi ) d \varphi . \end{aligned}$$
(8.22)
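Formula (8.22) is explicit in Fourier series: the operator \((\omega \cdot \partial _\varphi )^{-1}\) divides the coefficient of \(e^{{\mathrm i}l \cdot \varphi }\), \(l \ne 0\), by \({\mathrm i}\, \omega \cdot l\), and these small divisors are the source of the factor \(\gamma ^{-1}\) in the estimate of \(\alpha \) in Lemma 18 below. The Python sketch that follows uses hypothetical data (\(\nu = 2\), \(\omega = (1, \sqrt{2})\), a trigonometric polynomial \(b_3\)) and illustrative function names; it solves (8.21) in this way and evaluates the residual of (8.21) at a point.

```python
import cmath
import math

def solve_time_reparametrization(b3_modes, omega):
    """Return (m3, alpha_modes) from the Fourier coefficients of b3.

    b3_modes maps a frequency vector l (tuple of ints) to the coefficient of
    exp(i l . phi). Following (8.22), m3 is the l = 0 coefficient and
    alpha_hat(l) = b3_hat(l) / (i * m3 * omega . l) for l != 0.
    """
    zero = tuple(0 for _ in omega)
    m3 = b3_modes[zero].real
    alpha_modes = {}
    for l, coeff in b3_modes.items():
        if l == zero:
            continue
        divisor = sum(w * k for w, k in zip(omega, l))  # omega . l, the small divisor
        alpha_modes[l] = coeff / (1j * m3 * divisor)
    return m3, alpha_modes

def evaluate(modes, phi):
    """Real part of the trigonometric polynomial sum_l c_l exp(i l . phi)."""
    total = sum(c * cmath.exp(1j * sum(k * p for k, p in zip(l, phi)))
                for l, c in modes.items())
    return total.real

def omega_dot_grad(modes, omega):
    """Fourier coefficients of omega . d_phi applied to the given polynomial."""
    return {l: 1j * sum(w * k for w, k in zip(omega, l)) * c
            for l, c in modes.items()}

# Hypothetical sample data: b3 = 1 + 0.2 cos(phi_1) + 0.1 cos(phi_1 - phi_2).
OMEGA = (1.0, math.sqrt(2.0))
B3_MODES = {(0, 0): 1.0, (1, 0): 0.1, (-1, 0): 0.1, (1, -1): 0.05, (-1, 1): 0.05}

def residual_of_8_21(phi):
    """b3(phi) - m3 * (1 + omega . d_phi alpha(phi)), which should vanish."""
    m3, alpha_modes = solve_time_reparametrization(B3_MODES, OMEGA)
    derivative = evaluate(omega_dot_grad(alpha_modes, OMEGA), phi)
    return evaluate(B3_MODES, phi) - m3 * (1.0 + derivative)
```

The sketch handles only finitely many harmonics, so no Diophantine condition is needed; for a general \(b_3\) the convergence of the series for \(\alpha \) relies on a lower bound on \(|\omega \cdot l|\) such as the one in (5.3).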

Hence, by (8.20),

$$\begin{aligned}&\begin{array}{llllll}&B^{-1} \mathcal{L}_1 B = \rho \mathcal{L}_2,&\,&\mathcal{L}_2 := \Pi _S^\bot ( \omega \cdot \partial _{\vartheta } + m_3 \partial _{yyy} + c_1 \partial _y + c_0 ) \Pi _S^\bot + {\mathfrak R}_2 \end{array}\end{aligned}$$
(8.23)
$$\begin{aligned}&\begin{array}{llllll}&c_1 := \rho ^{-1} (B^{-1} b_1 ), \quad&\,&c_0 := \rho ^{-1} (B^{-1} b_0 ), \quad {\mathfrak R}_2 := \rho ^{-1} B^{-1}{\mathfrak R}_1 B . \end{array} \end{aligned}$$
(8.24)

The transformed operator \(\mathcal{L}_2\) in (8.23) is still Hamiltonian, because the reparametrization of time preserves the Hamiltonian structure (see Section 2.2 and Remark 3.7 in [3]).

Lemma 18

There is \( \sigma = \sigma (\nu ,{\tau }) > 0 \) (possibly larger than \( \sigma \) in Lemma 17) such that

$$\begin{aligned} | m_3 - 1 |^{\mathrm {Lip}(\gamma )}&\le C \varepsilon ^3, \quad | \partial _i m_3 [\widehat{\imath }]| \le C \varepsilon ^3 \Vert \widehat{\imath }\Vert _{s_0 + \sigma } , \nonumber \\ \Vert \alpha \Vert _s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^3 \gamma ^{-1} (1 + \Vert {\mathfrak {I}}_\delta \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )}) , \nonumber \\ \Vert \partial _i \alpha [\widehat{\imath }] \Vert _s&\le _s \varepsilon ^3 \gamma ^{-1} ( \Vert \widehat{\imath }\Vert _{s+\sigma } + \Vert {\mathfrak I}_\delta \Vert _{s+\sigma } \Vert \widehat{\imath }\Vert _{s_0+\sigma } ) , \nonumber \\ \Vert \rho -1 \Vert _s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^3 (1 + \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}) , \nonumber \\ \Vert \partial _i \rho [\widehat{\imath }] \Vert _s&\le _s \varepsilon ^3 ( \Vert \widehat{\imath }\Vert _{s+\sigma } + \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma } \Vert \widehat{\imath }\Vert _{s_0 +\sigma } ) , \end{aligned}$$
(8.25)
$$\begin{aligned}&\Vert c_1 - 3\varsigma T_\delta ^2 \Vert _s^{\mathrm {Lip}(\gamma )}+ \Vert c_0 - 3\varsigma (T_\delta ^2)_x \Vert _s^{\mathrm {Lip}(\gamma )}\le _s \varepsilon ^5 \gamma ^{-1} (1 + \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}), \nonumber \\&\quad \quad \Vert \partial _i (c_1 - 3 \varsigma T_\delta ^2) [\widehat{\imath }] \Vert _s + \Vert \partial _i (c_0 - 3 \varsigma (T_\delta ^2)_x) [\widehat{\imath }] \Vert _s \nonumber \\&\quad \le _s \varepsilon ^5 \gamma ^{-1} ( \Vert \widehat{\imath }\Vert _{s+\sigma } + \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma } \Vert \widehat{\imath }\Vert _{s_0 +\sigma } ). \end{aligned}$$
(8.26)

The transformations B, \(B^{-1}\) satisfy the estimates (8.16) and (8.17). The remainder \( \mathfrak {R}_2 \) has the form (7.4), and the functions \(g_j({\tau })\), \(\chi _j({\tau })\) satisfy the estimates (7.15) and (7.16) for all \({\tau }\in [0,1]\).

Proof

To estimate \(\Vert \alpha \Vert _s^{\mathrm {Lip}(\gamma )}\) we also differentiate (8.22) with respect to the parameter \( \omega \). Note that \(c_1 - 3 \varsigma B^{-1}(T_\delta ^2) = O(\varepsilon ^3)\), and similarly \(c_0 - 3 \varsigma B^{-1}((T_\delta ^2)_x) = O(\varepsilon ^3)\). The factor \(\varepsilon ^5 \gamma ^{-1}\) in (8.26) comes from the estimate of the difference \(B^{-1}(T_\delta ^2) - T_\delta ^2 \simeq (T_\delta ^2)_\varphi \alpha = O(\varepsilon ^2 \varepsilon ^3 \gamma ^{-1})\). \(\square \)

8.3 Translation of the space variable

In this section we remove the space average from the coefficient in front of \( \partial _y \). Consider the change of the space variable \( z = y + p(\vartheta ) \) which induces on \( H^s_{S^\bot } (\mathbb T^{\nu +1}) \) the operators

$$\begin{aligned} (\mathcal{T} w)(\vartheta , y ) := w(\vartheta , y + p(\vartheta )) , \quad (\mathcal{T}^{-1} h) (\vartheta ,z ) = h(\vartheta , z - p(\vartheta )) \end{aligned}$$
(8.27)

(which are a particular case of those used in Sect. 8.1). The differential operators become \( \mathcal{T}^{-1} \omega \cdot \partial _{\vartheta } \mathcal{T} = \omega \cdot \partial _{\vartheta } + \{ \omega \cdot \partial _{\vartheta }p (\vartheta ) \} \partial _z \), \( \mathcal{T}^{-1} \partial _{y} \mathcal{T} = \partial _{z} \). Since \(\mathcal {T}, \mathcal {T}^{-1}\) commute with \( \Pi _S^\bot \), we get

$$\begin{aligned} \mathcal {L}_3&:= \mathcal{T}^{-1}\mathcal{L}_2 \mathcal{T} = \Pi _S^\bot (\omega \cdot \partial _{\vartheta } + m_3 \partial _{zzz} + d_1 \partial _z + d_0 ) \Pi _S^\bot + {\mathfrak R}_3 ,\end{aligned}$$
(8.28)
$$\begin{aligned} d_1&:= (\mathcal{T}^{-1} c_1) + \omega \cdot \partial _{\vartheta } p , \quad d_0 := \mathcal{T}^{-1} c_0 , \quad {\mathfrak R}_3 := \mathcal{T}^{-1} {\mathfrak R}_2 \mathcal{T}. \end{aligned}$$
(8.29)

We choose

$$\begin{aligned} m_1 := \frac{1}{(2\pi )^{\nu + 1}} \int _{\mathbb T^{\nu + 1}} c_1 d\vartheta dy , \quad p := (\omega \cdot \partial _\vartheta )^{-1} \left( m_1 - \frac{1}{2 \pi } \int _{\mathbb T} c_1 d y \right) , \end{aligned}$$
(8.30)

so that

$$\begin{aligned} \frac{1}{2 \pi } \int _{\mathbb T} d_1 (\vartheta , z) \, dz = m_1 \quad \forall \vartheta \in \mathbb T^\nu . \end{aligned}$$
(8.31)
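Indeed, a translation in the space variable does not change the space average and p depends only on \(\vartheta \), so that (8.29) and (8.30) give

$$\begin{aligned} \frac{1}{2 \pi } \int _{\mathbb T} d_1 (\vartheta , z) \, dz = \frac{1}{2 \pi } \int _{\mathbb T} c_1 (\vartheta , y) \, dy + \omega \cdot \partial _\vartheta p (\vartheta ) = \frac{1}{2 \pi } \int _{\mathbb T} c_1 \, dy + m_1 - \frac{1}{2 \pi } \int _{\mathbb T} c_1 \, dy = m_1 . \end{aligned}$$

Note also that the function to which \((\omega \cdot \partial _\vartheta )^{-1}\) is applied in (8.30) has zero average in \(\vartheta \), by the definition of \(m_1\).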

Recalling (8.26), we analyze the space average of \(c_1\) in more detail. To avoid ambiguity between the space variable \(y \in \mathbb T\) and the action \(y_\delta : \mathbb T^\nu \rightarrow \mathbb R^\nu \) of (7.8), from now on we denote the space variable by \(x \in \mathbb T\) and the variable on the torus (time variable) by \(\varphi \in \mathbb T^\nu \). Let

$$\begin{aligned} {\bar{v}} (\varphi , x) := \sum _{j \in S} \sqrt{\xi _j} e^{{\mathrm i}\ell (j) \cdot \varphi } e^{{\mathrm i}j x}, \end{aligned}$$
(8.32)

where \(\ell :S \rightarrow \mathbb Z^\nu \) is the odd injective map (see (1.11))

$$\begin{aligned} \ell (\bar{\jmath }_i) := e_i , \quad \ell (-\bar{\jmath }_i) := - e_i , \quad i = 1,\ldots ,\nu \end{aligned}$$
(8.33)

and \(e_i = (0,\ldots ,1, \ldots ,0)\) denotes the i-th vector of the canonical basis of \(\mathbb R^\nu \). In view of the next linear Birkhoff normal form step (whose goal is to normalize the term of size \(\varepsilon ^2\)), we observe that the component of order \(\varepsilon ^2\) in \(T_\delta ^2\) (see (7.8)) is \(\varepsilon ^2 \bar{v}^2\), with

$$\begin{aligned} \Vert T_\delta ^2 - \varepsilon ^2 \bar{v}^2 \Vert _s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^2 \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}, \nonumber \\ \Vert \partial _i (T_\delta ^2 - \varepsilon ^2 \bar{v}^2)[\widehat{\imath }\,] \Vert _s&\le _s \varepsilon ^2 ( \Vert \widehat{\imath }\Vert _{s+\sigma } + \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma } \Vert \widehat{\imath }\Vert _{s_0+\sigma } ) . \end{aligned}$$
(8.34)

Moreover, from (7.8), since \((v_\delta , z_0)_{L^2(\mathbb T)} = 0\), and \((\theta _0)_{-j} = - (\theta _0)_j\) for all \(j \in S\), we have

$$\begin{aligned} \int _\mathbb TT_\delta ^2 \, dx = \varepsilon ^2 \int _\mathbb Tv_\delta ^2 \, dx + \varepsilon ^{2b} \int _\mathbb Tz_0^2 \, dx = \varepsilon ^2 \sum _{j \in S} \xi _j + \varepsilon ^{2b} \sum _{j \in S} |j| (y_\delta )_j + \varepsilon ^{2b} \int _\mathbb Tz_0^2 \, dx. \end{aligned}$$

We define

$$\begin{aligned} \widetilde{d}_1 := d_1 - 3\varsigma \varepsilon ^2 \bar{v}^2, \quad \widetilde{d}_0 := d_0 - 3\varsigma \varepsilon ^2 (\bar{v}^2)_x, \end{aligned}$$
(8.35)

and note that, by (8.31) and (8.32),

$$\begin{aligned} \frac{1}{2\pi } \int _\mathbb T\tilde{d}_1 \, dx = m_1 - \frac{3\varsigma \varepsilon ^2}{2\pi } \int _\mathbb T\bar{v}^2 \, dx = m_1 - \varepsilon ^2 c(\xi ), \quad c(\xi ) := 3\varsigma \sum _{j \in S} \xi _j . \end{aligned}$$
(8.36)

Using the explicit formulae above, and Lemma 13 for the estimate of \(\mathfrak R_3\), we get the following bounds.

Lemma 19

There is \( \sigma := \sigma (\nu ,{\tau }) > 0 \) (possibly larger than in Lemma 18) such that

$$\begin{aligned} | m_1 - \varepsilon ^2 c(\xi ) |^{\mathrm {Lip}(\gamma )}&\le C \varepsilon ^5 \gamma ^{-1}, \quad | \partial _i m_1 [\widehat{\imath }]| \le C \varepsilon ^{2b} \Vert {\widehat{\imath }} \Vert _{s_0 + \sigma } , \nonumber \\ \Vert p \Vert _s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^5 \gamma ^{-2} + \Vert {\mathfrak {I}}_\delta \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )}, \nonumber \\ \Vert \partial _i p [\widehat{\imath }] \Vert _s&\le _s \Vert \widehat{\imath }\Vert _{s+\sigma } + \varepsilon ^5 \gamma ^{-2} \Vert {\mathfrak I}_\delta \Vert _{s+\sigma } \Vert \widehat{\imath }\Vert _{s_0+\sigma } , \end{aligned}$$
(8.37)
$$\begin{aligned} \Vert \widetilde{d}_k \Vert _s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^7 \gamma ^{-2} + \varepsilon ^2 \Vert {\mathfrak {I}}_\delta \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )}, \quad k = 0,1, \nonumber \\ \Vert \partial _i \widetilde{d}_k [\widehat{\imath }] \Vert _s&\le _s \varepsilon ^5 \gamma ^{-1} (\Vert \widehat{\imath }\Vert _{s + \sigma } + \Vert {\mathfrak I}_\delta \Vert _{s + \sigma } \Vert \widehat{\imath }\Vert _{s_0 + \sigma } ) , \quad k=0,1. \end{aligned}$$
(8.38)

The matrix s-decay norm (see (2.4)) of the operator \({\mathfrak R}_3\) satisfies

$$\begin{aligned} |{\mathfrak R}_3|_s^{{\mathrm {Lip}(\gamma )}}&\le _s \varepsilon ^{1+b} \Vert {\mathfrak I}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}, \nonumber \\ | \partial _i {\mathfrak R}_3 [\widehat{\imath }] |_s&\le _s \varepsilon ^{1+b} ( \Vert \widehat{\imath }\Vert _{s+\sigma } + \Vert {\mathfrak I}_\delta \Vert _{s+\sigma } \Vert \widehat{\imath }\Vert _{s_0+\sigma } ). \end{aligned}$$
(8.39)

The transformations \(\mathcal{T}\), \(\mathcal{T}^{-1}\) satisfy (8.16) and (8.17).

Remark 6

When \( K = H + \lambda M^2 \), \( \lambda = 3 \varsigma / 4 \), the constant coefficient \(m_1\) in (8.30) becomes of size

$$\begin{aligned} | m_1 |^{\mathrm {Lip}(\gamma )}\le C \varepsilon ^5 \gamma ^{-1}. \end{aligned}$$
(8.40)

The inequality (8.40) is the key difference between the cases \(H + (3\varsigma /4) M^2\) and H (compare (8.40) with (8.37), where \(m_1\) contains the non-perturbative term \(\varepsilon ^2 c(\xi )\)).

It is sufficient to estimate \( \mathfrak R_3 \) (which has the form (7.4)) only in the s-decay norm (see (8.39)) because the next transformations will preserve it. Such norms will be used in the reducibility scheme of Sect. 8.6.

8.4 Linear Birkhoff normal form

Now we normalize the terms of order \( \varepsilon ^2 \) of \( \mathcal{L}_3 \). This step is different from the reducibility steps that we shall perform in Sect. 8.6: the Diophantine constant \(\gamma \) in (5.3) is \( \gamma = o(\varepsilon ^2 ) \), and therefore the terms of order \( \varepsilon ^2 \) are not perturbative, because \(\varepsilon ^2 \gamma ^{-1}\) is not small (in fact, it is large). The reduction of this section is possible thanks to the special form of the term \( \varepsilon ^2 \mathcal{B} \) defined in (8.41): the harmonics of \( \varepsilon ^2 \mathcal {B}\) corresponding to a possible small divisor are zero, except \(\mathcal {B}_j^j(0)\), see Lemma 20. Note that, since the previous linear transformations \( \Phi \), \(B\), \( \mathcal{T} \) are \( O(\varepsilon ^5 \gamma ^{-2} ) \)-close to the identity, the terms of order \( \varepsilon ^2 \) in \( \mathcal{L}_3 \) are the same as in the original linearized operator.

First, we collect all the terms of order \( \varepsilon ^2 \) in the operator \( \mathcal{L}_3 \) in (8.28). We have

$$\begin{aligned} \mathcal {L}_3 = \Pi _S^\bot ( \omega \cdot \partial _\varphi + m_3 \partial _{xxx} + \varepsilon ^2 \mathcal{B} + {\tilde{d}}_1 \partial _x + {\tilde{d}}_0 ) \Pi _S^\bot + {\mathfrak R}_3 \end{aligned}$$

where \( \widetilde{d}_1, \widetilde{d}_0, {\mathfrak R}_3 \) are defined in (8.29), (8.35) and (recall (8.32))

$$\begin{aligned} \mathcal{B} h := 3 \varsigma \bar{v}^2 \partial _x h + 3 \varsigma (\bar{v}^2)_x h = \partial _x (3 \varsigma \bar{v}^2 h ). \end{aligned}$$
(8.41)

Note that \(\mathcal{B}\) is the linear Hamiltonian vector field of \( H_{S}^\bot \) generated by the Hamiltonian \( z \mapsto \frac{3\varsigma }{2} \int _\mathbb T\bar{v}^2 z^2 \, dx \).

We transform \( \mathcal{L}_3 \) by a symplectic operator \( \Phi _2 : H_{S^\bot }^s(\mathbb T^{\nu + 1}) \rightarrow H_{S^\bot }^s(\mathbb T^{\nu + 1}) \) of the form

$$\begin{aligned} \Phi _2 := \mathrm{exp}(\varepsilon ^2 A) = I_{H_S^\bot } + \varepsilon ^2 A + \varepsilon ^4 \widehat{A}, \quad \widehat{A} := \sum _{k \ge 2} \frac{\varepsilon ^{2(k-2)}}{k!} A^k , \end{aligned}$$
(8.42)

where \( A(\varphi ) h = {\mathop \sum }_{j,j' \in S^c} A_j^{j'}(\varphi ) h_{j'} e^{{\mathrm i}j x} \) is a Hamiltonian vector field. The map \( \Phi _2 \) is symplectic, because it is the time 1 flow of a Hamiltonian vector field. We calculate

$$\begin{aligned}&\mathcal{L}_3 \Phi _2 - \Phi _2 \Pi _S^\bot ( \mathcal{D}_\omega + m_3 \partial _{xxx} ) \Pi _S^\bot \nonumber \\&\quad = \varepsilon ^2 \Pi _S^\bot \{ \mathcal {B}+ (\mathcal{D}_\omega A) + m_3 [\partial _{xxx}, A] \} \Pi _S^\bot + \Pi _S^\bot \tilde{d}_1 \partial _x \Pi _S^\bot + R_3 \end{aligned}$$
(8.43)

where

$$\begin{aligned} R_3&:= \varepsilon ^4 \Pi _S^\bot \{ (\mathcal {D}_\omega \widehat{A}) + m_3 [\partial _{xxx}, \widehat{A}] + \mathcal {B}(A + \varepsilon ^2 \widehat{A}) \} \Pi _S^\bot \nonumber \\&\quad + \Pi _S^\bot \tilde{d}_1 \partial _x \Pi _S^\bot (\Phi _2 - I) + (\Pi _S^\bot \tilde{d}_0 \Pi _S^\bot + \mathfrak R_3 ) \Phi _2 . \end{aligned}$$
(8.44)

Remark 7

\( R_3 \) no longer has the form (7.4). However \( R_3 = O( \partial _x^0 ) \) because \(A = O(\partial _x^{-1})\) (see Lemma 22), and therefore \(\Phi _2 - I_{H_S^\bot } = O(\partial _x^{-1})\). Moreover the matrix decay norm of \( R_3 \) is \( o(\varepsilon ^2) \).

In order to normalize the term of order \(\varepsilon ^2\) of (8.43), we expand \(A_j^{j'}(\varphi ) = \sum _{l \in \mathbb Z^\nu } A_j^{j'}(l) e^{{\mathrm i}l \cdot \varphi }\), and for each \(j, j' \in S^c\), \(l \in \mathbb Z^\nu \), we choose

$$\begin{aligned} A_j^{j'}(l) := {\left\{ \begin{array}{ll} - \dfrac{\mathcal {B}_j^{j'}(l)}{{\mathrm i}(\omega \cdot l + m_3( j'^3 - j^3))} &{}\quad \text {if} \ \bar{\omega } \cdot l + j'^3 - j^3 \ne 0 , \\ 0 &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(8.45)

This definition is well posed. Indeed, by (8.32) and (8.41),

$$\begin{aligned} \mathcal{B}_{j}^{j'}(l) := 3 \varsigma {\mathrm i}j \sum _{\begin{array}{c} j_1, j_2 \in S \\ j_1 + j_2 = j - j' \\ \ell (j_1) + \ell (j_2) = l \end{array}} \sqrt{\xi _{j_1} \xi _{j_2}} . \end{aligned}$$
(8.46)

In particular \( \mathcal{B}_{j}^{j'}(l) = 0 \) unless \( |l| \le 2 \). For \(|l| \le 2\) and \( \bar{\omega } \cdot l + j'^3 - j^3 \ne 0 \), the denominators in (8.45) satisfy

$$\begin{aligned} |\omega \cdot l +m_3( j'^3 - j^3)|&= | m_3 (\bar{\omega }\cdot l + j'^3 - j^3) + ( \omega - m_3 \bar{\omega }) \cdot l | \nonumber \\&\ge |m_3| | \bar{\omega }\cdot l + j'^3 - j^3 | - | \omega - m_3 \bar{\omega }| |l| \ge 1/2 \end{aligned}$$
(8.47)

for \( \varepsilon \) small, because \( |\bar{\omega } \cdot l + j'^3 - j^3| \ge 1 \) (\(\bar{\omega } \cdot l + j'^3 - j^3\) is a nonzero integer), \( \omega = \bar{\omega }+ O(\varepsilon ^2) \) and by (8.25).
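The choice (8.45) can be read off entrywise. In the exponential basis, with the convention \( A(\varphi ) h = \sum _{j,j' \in S^c} A_j^{j'}(\varphi ) h_{j'} e^{{\mathrm i}j x} \) used above, a short computation (a sketch for orientation) gives

$$\begin{aligned} (\mathcal{D}_\omega A)_j^{j'}(l) = {\mathrm i}\, \omega \cdot l \, A_j^{j'}(l) , \quad [\partial _{xxx}, A]_j^{j'}(l) = \big ( ({\mathrm i}j)^3 - ({\mathrm i}j')^3 \big ) A_j^{j'}(l) = {\mathrm i}( j'^3 - j^3 ) A_j^{j'}(l) , \end{aligned}$$

so that

$$\begin{aligned} \big ( \mathcal {B}+ (\mathcal{D}_\omega A) + m_3 [\partial _{xxx}, A] \big )_j^{j'}(l) = \mathcal {B}_j^{j'}(l) + {\mathrm i}\big ( \omega \cdot l + m_3 ( j'^3 - j^3 ) \big ) A_j^{j'}(l) , \end{aligned}$$

which vanishes exactly when \(A_j^{j'}(l)\) is given by the first line of (8.45).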

Remark 8

The operator A defined in (8.45) is Hamiltonian, because \(\mathcal {B}\) is Hamiltonian. The reason is a general fact: the denominators \( \delta _{l,j,k} := {\mathrm i}(\omega \cdot l + m_3( k^3 - j^3)) \) satisfy \( \overline{ \delta _{l,j,k} } = \delta _{-l,k,j} \) and an operator \(G(\varphi )\) is self-adjoint with respect to the \(L^2(\mathbb T)\) scalar product if and only if its matrix elements satisfy \( \overline{ G_j^k(l) } = G_k^j(-l) \), see [3, Remark 4.5]. Alternatively, we could solve the homological equation of this Birkhoff step directly for the Hamiltonian function whose flow generates \( \Phi _2 \).

By the definition (8.45), the term of order \(\varepsilon ^2\) in (8.43) is zero on the Fourier indices \((l,j,j')\) such that \(\bar{\omega }\cdot l + j'^3 - j^3 \ne 0\), while it is equal to \(\varepsilon ^2 \mathcal {B}_j^{j'}(l)\) for \((l,j,j')\) such that \(\bar{\omega }\cdot l + j'^3 - j^3 = 0\). Now we prove that the only nonzero components of \(\mathcal {B}\) that remain in (8.43) are \(\mathcal {B}_j^j(0)\).

Lemma 20

If \(\bar{\omega }\cdot l + j'^3 - j^3 = 0\) and \(\mathcal {B}_j^{j'}(l) \ne 0\), then \(l=0\) and \(j=j'\).

Proof

If \(\mathcal {B}_j^{j'}(l) \ne 0\), then, by (8.46), there exist \(j_1, j_2 \in S\) such that \(j_1 + j_2 = j - j'\) and \(\ell (j_1) + \ell (j_2) = l\). Hence, recalling (1.19) and (8.33),

$$\begin{aligned} 0 = \bar{\omega }\cdot l + j'^3 - j^3 = \bar{\omega }\cdot \ell (j_1) + \bar{\omega }\cdot \ell (j_2) + j'^3 - j^3 = j_1^3 + j_2^3 + j'^3 - j^3. \end{aligned}$$

This equality, together with \(j_1 + j_2 + j' - j = 0\), implies that \((j_1 + j_2) (j_1 + j') (j_2 + j') = 0\) by Lemma 4. Since \(j_1, j_2 \in S\), \(j' \in S^c\), the set S is symmetric, and \(0 \notin S\), we deduce that the factors \(j_1 + j'\) and \(j_2+j'\) are nonzero. Hence \(j_1 + j_2 = 0\), and therefore \(j = j'\) (because \(j - j' = j_1 + j_2\)) and \(l=\ell (j_1) + \ell (-j_1) = 0\) (because \(\ell \) is odd). \(\square \)
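The statement of Lemma 20 can be tested by brute force on a sample. The Python sketch below is only an illustration: the tangential set \(S = \{ \pm 1, \pm 3, \pm 4 \}\) is a hypothetical choice (the hypotheses on S made earlier in the paper are not checked), the function names are ours, and we assume that the factorization \((a+b+c)^3 - a^3 - b^3 - c^3 = 3 (a+b)(b+c)(c+a)\) is the elementary identity behind the use of Lemma 4 in the proof.

```python
from itertools import product

def cubic_resonance_counterexamples(tangential_sites, max_index):
    """Search for violations of the conclusion of Lemma 20.

    tangential_sites: positive integers generating the symmetric set S.
    Returns all (j1, j2, j_prime, j) with j1, j2 in S, j_prime and j in S^c,
    |j_prime| <= max_index, j1 + j2 = j - j_prime and
    j1^3 + j2^3 + j_prime^3 - j^3 = 0, but j1 + j2 != 0.
    """
    S = {s for t in tangential_sites for s in (t, -t)}
    normal = [k for k in range(-max_index, max_index + 1)
              if k != 0 and k not in S]
    found = []
    for j1, j2, j_prime in product(S, S, normal):
        j = j_prime + j1 + j2
        if j == 0 or j in S:
            continue  # j must also be a normal site
        if j1**3 + j2**3 + j_prime**3 - j**3 == 0 and j1 + j2 != 0:
            found.append((j1, j2, j_prime, j))
    return found

def factorization_holds(a, b, c):
    """Check (a + b + c)^3 - a^3 - b^3 - c^3 == 3 (a + b)(b + c)(c + a)."""
    return (a + b + c)**3 - a**3 - b**3 - c**3 == 3 * (a + b) * (b + c) * (c + a)
```

By the proof above, `cubic_resonance_counterexamples([1, 3, 4], 30)` should return an empty list: every resonant triple found in the search has \(j_1 + j_2 = 0\).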

Thus, the only nonzero term of order \(\varepsilon ^2\) in (8.43) is \(\mathcal {B}_j^j(0)\). By (8.46), we calculate \(\mathcal {B}_j^j(0) = {\mathrm i}j c(\xi )\), where \(c(\xi )\) is defined in (8.36). Hence, by (8.36), (8.45) and Lemma 20, the term of order \(\varepsilon ^2\) in (8.43) is

$$\begin{aligned} \varepsilon ^2 \Pi _S^\bot \{ \mathcal {B}+ (\mathcal{D}_\omega A) + m_3 [\partial _{xxx}, A] \} \Pi _S^\bot = \varepsilon ^2 c(\xi ) \partial _x \Pi _S^\bot . \end{aligned}$$
(8.48)
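For completeness, the value of \(\mathcal {B}_j^j(0)\) follows from (8.46) with \(j = j'\) and \(l = 0\): the constraints \(j_1 + j_2 = 0\), \(\ell (j_1) + \ell (j_2) = 0\) reduce the sum to \(j_2 = - j_1\), so that, assuming the convention \(\xi _{-j} = \xi _j\) implicit in the formula for \(\int _\mathbb TT_\delta ^2 \, dx\) above,

$$\begin{aligned} \mathcal {B}_j^j(0) = 3 \varsigma \, {\mathrm i}j \sum _{j_1 \in S} \sqrt{\xi _{j_1} \xi _{-j_1}} = 3 \varsigma \, {\mathrm i}j \sum _{j_1 \in S} \xi _{j_1} = {\mathrm i}j \, c(\xi ) . \end{aligned}$$

A \(\varphi \)-independent diagonal operator with eigenvalues \({\mathrm i}j \, c(\xi )\) on \(e^{{\mathrm i}j x}\), \(j \in S^c\), is \(c(\xi ) \partial _x\) restricted to \(H_S^\bot \), which gives the right-hand side of (8.48).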

Remark 9

When \( K = H + \lambda M^2 \), \( \lambda = 3 \varsigma / 4 \), the operator in (8.41) becomes \(\mathcal {B}h = \partial _x (3 \varsigma \pi _0(\bar{v}^2) h)\). Hence \(\mathcal {B}_j^j(0) = 0\), and the right-hand side term in (8.48) is zero, namely the first step of linear Birkhoff normal form completely eliminates all the terms of order \(\varepsilon ^2\).

We now estimate the transformation A.

Lemma 21

(i) For all \(l \in \mathbb Z^\nu \), \(j,j' \in S^c\),

$$\begin{aligned} | A_j^{j'}(l)| \le C (| j | + | j' |)^{-1}, \quad | A_j^{j'}(l)|^\mathrm{lip} \le \varepsilon ^{-2} (|j| + |j'|)^{-1} . \end{aligned}$$
(8.49)

(ii) \( A_j^{j'}(l) = 0\) for all \(l \in \mathbb Z^\nu \), \(j,j' \in S^c\) such that \(|j - j'| > 2 C_S \), where \(C_S := \max \{ |j| : j \in S\}\).

Proof

(i) As already observed, for all \(|l| > 2\) one has \( \mathcal {B}_j^{j'}(l) = 0\), and therefore \( A_j^{j'}(l) = 0\). For \(|l| \le 2\), \( j \ne j' \), one has (since \( | \omega | \le |\bar{\omega }| + 1 \))

$$\begin{aligned} |\omega \cdot l + m_3 (j'^3 - j^3)| \ge |m_3||j'^3 - j^3| - |\omega \cdot l| \ge \tfrac{1}{4} (j'^2 + j^2) - 2 |\omega | \ge \tfrac{1}{8} (j'^2 + j^2) \end{aligned}$$

for \((j'^2 + j^2) \ge C\), for some constant C. Since also (8.47) holds, we deduce that, for all \( j \ne j' \),

$$\begin{aligned} A_j^{j'}(l) \ne 0 \Rightarrow |\omega \cdot l + m_3 (j'^3 - j^3)| \ge c ( | j | + | j' | )^2 . \end{aligned}$$
(8.50)

On the other hand, if \( j = j' \in S^c\), and \(l \ne 0\), then \(\mathcal {B}_j^{j'}(l) = 0\), and therefore \(A_j^{j'}(l) = 0\). For \(j=j'\) and \(l=0\) we also have \(A_j^{j'}(l) = 0\) because \(\bar{\omega }\cdot l + j'^3 - j^3 = 0\). Hence (8.50) holds for all \( j, j' \). By (8.45), (8.46) and (8.50) we deduce the first bound in (8.49). The Lipschitz bound follows similarly (use also \( |j - j'| \le 2 C_S \)). (ii) follows by (8.45) and (8.46). \(\square \)

The previous lemma means that \( A = O(| \partial _x|^{-1})\). More precisely, we deduce the following bound.

Lemma 22

(Lemma 8.19 of [5]) \( | A \partial _x |_s^{\mathrm {Lip}(\gamma )}+ | \partial _x A |_s^{\mathrm {Lip}(\gamma )}\le C(s) \).

It follows that the symplectic map \( \Phi _2 \) in (8.42) is invertible for \( \varepsilon \) small, with inverse

$$\begin{aligned} \begin{array}{ll} &{} \Phi _2^{-1} = \mathrm{exp}(-\varepsilon ^2 A) = I_{H_S^\bot } + \varepsilon ^2 {\check{A}} , \quad {\check{A}} := \sum \limits _{n \ge 1} \frac{\varepsilon ^{2n-2}}{n!} \, (-A)^n , \\ &{} | {\check{A}} \partial _x |_s^{\mathrm {Lip}(\gamma )}+ | \partial _x {\check{A}} |_s^{\mathrm {Lip}(\gamma )}\le C(s) . \end{array} \end{aligned}$$
(8.51)

By (8.43) and (8.48) we get the Hamiltonian operator

$$\begin{aligned} \mathcal{L}_4&:= \Phi _2^{-1} \mathcal{L}_3 \Phi _2 = \Pi _S^\bot ( \mathcal{D}_\omega + m_3 \partial _{xxx} + (\varepsilon ^2 c(\xi ) + {\tilde{d}}_1) \partial _x ) \Pi _S^\bot + R_4 ,\end{aligned}$$
(8.52)
$$\begin{aligned} R_4&:= (\Phi _2^{-1} - I) \Pi _S^\bot (\varepsilon ^2 c(\xi ) + {\tilde{d}}_1) \partial _x \Pi _S^\bot + \Phi _2^{-1} R_3 . \end{aligned}$$
(8.53)

Lemma 23

There is \( \sigma = \sigma (\nu ,{\tau }) > 0 \) (possibly larger than in Lemma 19) such that

$$\begin{aligned} | R_4 |_s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^7 \gamma ^{-2} + \varepsilon ^2 \Vert {\mathfrak {I}}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}, \nonumber \\ | \partial _i R_4 [\widehat{\imath }] |_s&\le _s \varepsilon ^{1+b} \Vert \widehat{\imath }\Vert _{s + \sigma } + \varepsilon ^2 \Vert {\mathfrak {I}}_\delta \Vert _{s + \sigma } \Vert \widehat{\imath }\Vert _{ s_0 + \sigma } . \end{aligned}$$
(8.54)

Proof

Use (8.25), (8.38), (8.39), (8.42), (8.44), and Lemma 22. \(\square \)

8.5 Space reduction at the order \( \partial _x \)

The goal of this section is to transform \( \mathcal{L}_4 \) in (8.52) so that the coefficient of \( \partial _x \) becomes constant. We conjugate \( \mathcal{L}_4 \) via a symplectic map of the form

$$\begin{aligned} \mathcal{S} := \exp (\Pi _S^\bot (w \partial _x^{-1})) \Pi _S^\bot = \Pi _S^\bot \big ( I + w \partial _x^{-1} \big ) \Pi _S^\bot + \widehat{\mathcal{S}}, \end{aligned}$$
(8.55)

where \(\widehat{\mathcal{S}} := \sum _{k \ge 2} \frac{1}{k!} [\Pi _S^\bot (w \partial _x^{-1})]^k \Pi _S^\bot \) and \( w :\mathbb T^{\nu +1} \rightarrow \mathbb R\) is a function. Note that the linear operator \(\Pi _S^\bot (w \partial _x^{-1}) \Pi _S^\bot \) is the Hamiltonian vector field generated by the Hamiltonian \( - \frac{1}{2} \int _\mathbb Tw (\partial _x^{-1} h)^2\,dx\), \(h \in H_S^\bot \). We calculate

$$\begin{aligned}&\mathcal{L}_4 \mathcal{S} - \mathcal{S} \Pi _S^\bot ( \mathcal{D}_\omega + m_3 \partial _{xxx} + m_1 \partial _x ) \Pi _S^\bot \\&\quad = \Pi _S^\bot ( 3 m_3 w_x + \varepsilon ^2 c(\xi ) + \tilde{d}_1 - m_1 ) \partial _x \Pi _S^\bot + \tilde{R}_5 , \end{aligned}$$
$$\begin{aligned} {\tilde{R}}_{5}&:= \Pi _{S}^{\bot } \{ ( 3 m_{3} w_{xx} + (\varepsilon ^2 c(\xi ) + {\tilde{d}}_{1} - m_{1}) \Pi _{S}^{\bot } w ) \pi _{0}\\&\quad + ( ({\mathcal {D}}_{\omega } w) + m_{3} w_{xxx} + (\varepsilon ^2 c(\xi ) + {\tilde{d}}_{1}) \Pi _{S}^\bot w_{x} ) \partial _{x}^{-1}\\&\quad + ({\mathcal {D}}_{\omega } {\widehat{\mathcal {S}}}) + m_3 [\partial _{xxx}, {\widehat{\mathcal {S}}}] + (\varepsilon ^2 c(\xi ) + {\tilde{d}}_{1}) \partial _{x} {\widehat{\mathcal {S}}} - m_{1} {\widehat{\mathcal {S}}} \partial _{x} + R_4 {\mathcal {S}} \} \Pi _S^\bot , \end{aligned}$$

where \(\tilde{R}_5\) collects all the terms of order at most \(\partial _x^0\). By (8.36), we solve \( 3 m_3 w_x + \varepsilon ^2 c(\xi ) + \tilde{d}_1 - m_1 = 0 \) by choosing \(w := - (3 m_3)^{-1} \partial _x^{-1} ( \varepsilon ^2 c(\xi ) + \tilde{d}_1 - m_1 )\). For \( \varepsilon \) small the operator \( \mathcal{S} \) is invertible, and we get

$$\begin{aligned} \mathcal {L}_5 := \mathcal {S}^{-1} \mathcal {L}_4 \mathcal {S}= \Pi _S^\bot ( \mathcal{D}_\omega + m_3 \partial _{xxx} + m_1 \partial _x ) \Pi _S^\bot + R_5 , \quad R_5 := \mathcal{S}^{-1} \tilde{R}_5 . \end{aligned}$$
(8.56)

Since \( \mathcal{S} \) is symplectic, \(\mathcal{L}_5\) is Hamiltonian (recall Definition 2). By (8.25), (8.37) and (8.38), one has \(\Vert w \Vert _s^{\mathrm {Lip}(\gamma )}\le _s \varepsilon ^7 \gamma ^{-2} + \varepsilon ^2 \Vert {\mathfrak I}_\delta \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )}\).
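As a quick check of the choice of \(w\), write \( p := \varepsilon ^2 c(\xi ) + \tilde{d}_1 - m_1 \) (the symbol \(p\) is introduced here only for this check). Assuming that \(m_3\) does not depend on \(x\) and that \(p\) has zero average in \(x\), which is how we read the reference to (8.36), one has \( \partial _x \partial _x^{-1} p = p \), and therefore

$$\begin{aligned} 3 m_3 w_x = - 3 m_3 (3 m_3)^{-1} \partial _x \partial _x^{-1} p = - p , \quad \text {namely} \quad 3 m_3 w_x + \varepsilon ^2 c(\xi ) + \tilde{d}_1 - m_1 = 0 . \end{aligned}$$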

Lemma 24

There is \( \sigma = \sigma (\nu ,{\tau }) > 0 \) (possibly larger than in Lemma 23) such that

$$\begin{aligned} |\mathcal{S}^{\pm 1} - I|_s^{\mathrm {Lip}(\gamma )}&\le _s \varepsilon ^7 \gamma ^{-2} + \varepsilon ^2 \Vert {\mathfrak I}_\delta \Vert _{s+\sigma }^{\mathrm {Lip}(\gamma )}, \\ |\partial _i \mathcal{S}^{\pm 1} [\widehat{\imath }]|_s&\le _s \varepsilon ^{2b} \Vert \widehat{\imath }\Vert _{s+\sigma } + \varepsilon ^5 \gamma ^{-1} \Vert {\mathfrak I}_\delta \Vert _{s + \sigma } \Vert \widehat{\imath }\Vert _{s_0 + \sigma } . \end{aligned}$$

The remainder \(R_5\) satisfies the same estimates (8.54) as \(R_4\).

8.6 KAM reducibility and inversion of \( \mathcal{L}_{\omega } \)

The coefficients \( m_3, m_1 \) of the operator \( \mathcal{L}_5 \) in (8.56) are constants, and the remainder \( R_5 \) is a bounded operator of order \( \partial _x^0 \) with small matrix decay norm, see (8.59). Then we can diagonalize \( \mathcal{L}_5 \) by applying the iterative KAM reducibility Theorem 4.2 in [3] along the sequence of scales

$$\begin{aligned} N_n := N_{0}^{\chi ^n}, \quad n = 0,1,2,\ldots , \quad \chi := 3/2, \quad N_0 > 0 . \end{aligned}$$
(8.57)

In Sect. 9, the initial \( N_0 \) will (slightly) increase to infinity as \( \varepsilon \rightarrow 0 \), see (9.5). The required smallness condition (see (4.14) in [3]) is (written in the present notations)

$$\begin{aligned} N_0^{C_0} | R_5 |_{s_0 + \beta }^{{\mathrm {Lip}(\gamma )}} \gamma ^{-1} \le 1 \end{aligned}$$
(8.58)

where \( \beta := 7 {\tau }+ 6 \) (see (4.1) in [3]), \( {\tau }\) is the diophantine exponent in (5.3) and (8.63), and the constant \( C_0 := C_0 ({\tau }, \nu ) > 0 \) is fixed in Theorem 4.2 in [3]. By Lemma 24, the remainder \( R_5 \) satisfies the bound (8.54), and using (7.5) we get (recall (5.9))

$$\begin{aligned} | R_5|_{s_0 + \beta }^{{\mathrm {Lip}(\gamma )}} \le C \varepsilon ^7 \gamma ^{-2} = C \varepsilon ^{3-2a}, \quad | R_5 |_{s_0 + \beta }^{{\mathrm {Lip}(\gamma )}} \gamma ^{-1} \le C \varepsilon ^7 \gamma ^{-3} = C \varepsilon ^{1 - 3 a}. \end{aligned}$$
(8.59)

We use that \( \mu \) in (7.5) is assumed to satisfy \( \mu \ge \sigma + \beta \) where \( \sigma := \sigma ({\tau }, \nu ) \) is given in Lemma 24.
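The exponents in (8.59) can be checked directly from the relation \( \gamma = \varepsilon ^{2+a} \) stated in (8.60):

$$\begin{aligned} \varepsilon ^7 \gamma ^{-2}&= \varepsilon ^{7 - 2(2+a)} = \varepsilon ^{3 - 2a} , \\ \varepsilon ^7 \gamma ^{-3}&= \varepsilon ^{7 - 3(2+a)} = \varepsilon ^{1 - 3a} . \end{aligned}$$

In particular \( 1 - 3a > 0 \) for \( a \in (0,1/6) \), so both quantities tend to zero as \( \varepsilon \rightarrow 0 \).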

Theorem 4

(Reducibility) Assume that \(\omega \mapsto i_\delta (\omega ) \) is a Lipschitz function defined on some subset \(\Omega _o \subset \Omega _\varepsilon \) (recall (5.2)), satisfying (7.5) with \( \mu \ge \sigma + \beta \), where \( \sigma := \sigma ({\tau }, \nu ) \) is given in Lemma 24 and \( \beta := 7 {\tau }+ 6 \). Then there exists \( \delta _{0} \in (0,1) \) such that, if

$$\begin{aligned} N_0^{C_0} \varepsilon ^7 \gamma ^{-3} = N_0^{C_0} \varepsilon ^{1 - 3 a} \le \delta _{0} , \quad \gamma := \varepsilon ^{2b}:= \varepsilon ^{2 + a} , \quad a \in (0,1/6) , \end{aligned}$$
(8.60)

then:

  1. (i)

    (Eigenvalues) For all \( \omega \in \Omega _\varepsilon \) there exists a sequence

    $$\begin{aligned} \mu _j^\infty (\omega ) := \mu _j^\infty (\omega , i_\delta (\omega )) := {\mathrm i}( - {\tilde{m}}_3 (\omega ) j^3 + {\tilde{m}}_1(\omega ) j ) + r_j^\infty (\omega ), \quad j \in S^c , \end{aligned}$$
    (8.61)

    where \( {\tilde{m}}_3, {\tilde{m}}_1\) coincide with the coefficients \(m_3, m_1\) of \( \mathcal{L}_5 \) in (8.56) for all \( \omega \in \Omega _o \), and

    $$\begin{aligned} | {\tilde{m}}_3 - 1 |^{\mathrm {Lip}(\gamma )}&\le C \varepsilon ^3, \quad | {\tilde{m}}_1 - \varepsilon ^2 c(\xi ) |^{\mathrm {Lip}(\gamma )}\le C \varepsilon ^5 \gamma ^{-1}, \nonumber \\ | r^{\infty }_j |^{\mathrm {Lip}(\gamma )}&\le C \varepsilon ^{3 - 2 a} \quad \forall j \in S^c \end{aligned}$$
    (8.62)

    for some \( C > 0 \) (and \(c(\xi )\) is defined in (8.36)). All the eigenvalues \(\mu _j^{\infty }\) are purely imaginary. We define, for convenience, \(\mu _0^\infty (\omega ) := 0\).

  2. (ii)

    (Conjugacy) For all \(\omega \) in the set

    $$\begin{aligned} \Omega _\infty ^{2\gamma } := \Omega _\infty ^{2\gamma } (i_\delta )&:= \Bigg \{ \omega \in \Omega _o : \, | {\mathrm i}\omega \cdot l + \mu ^{\infty }_j (\omega ) - \mu ^{\infty }_{k} (\omega ) | \ge \frac{2 \gamma | j^{3} - k^{3} |}{ \langle l \rangle ^{{\tau }}} \nonumber \\&\qquad \quad \forall l \in \mathbb Z^{\nu }, \ \forall j ,k \in S^c \cup \{0\} \Bigg \} \end{aligned}$$
    (8.63)

    there is a real, bounded, invertible linear operator \(\Phi _\infty (\omega ) : H^s_{S^\bot } (\mathbb T^{\nu +1}) \rightarrow H^s_{S^\bot } (\mathbb T^{\nu +1}) \), with bounded inverse \(\Phi _\infty ^{-1}(\omega )\), that conjugates \(\mathcal {L}_5\) in (8.56) to constant coefficients, namely

    $$\begin{aligned} \begin{array}{ll} \mathcal{L}_{\infty }(\omega ) &{} := \Phi _{\infty }^{-1}(\omega ) \circ \mathcal {L}_5(\omega ) \circ \Phi _{\infty }(\omega ) = \omega \cdot \partial _{\varphi } + \mathcal{D}_{\infty }(\omega ), \\ \mathcal{D}_{\infty }(\omega ) &{} := \mathrm{diag}_{j \in S^c} \{ \mu ^{\infty }_{j}(\omega ) \} . \end{array} \end{aligned}$$
    (8.64)

    The transformations \(\Phi _\infty , \Phi _\infty ^{-1}\) are close to the identity in matrix decay norm, with

    $$\begin{aligned} | \Phi _{\infty } - I |_{s,\Omega _\infty ^{2\gamma }}^{\mathrm{Lip}(\gamma )} + | \Phi _{\infty }^{- 1} - I |_{s,\Omega _\infty ^{2\gamma }}^{\mathrm {Lip}(\gamma )}\le _s \varepsilon ^7 \gamma ^{-3} + \varepsilon ^2 \gamma ^{-1} \Vert {\mathfrak I}_\delta \Vert _{s + \sigma }^{\mathrm {Lip}(\gamma )}. \end{aligned}$$
    (8.65)

    Moreover \(\Phi _{\infty }, \Phi _{\infty }^{-1}\) are symplectic, and \(\mathcal {L}_\infty \) is a Hamiltonian operator.

Proof

The proof closely follows the one of Theorem 4.1 in [3], which is based on Theorem 4.2, Corollaries 4.1, 4.2 and Lemmata 4.1, 4.2 of [3]. Here \(\omega \in \mathbb R^\nu \), while in [3] the parameter \(\lambda \in \mathbb R\), but Kirszbraun’s theorem on Lipschitz extension also holds in \(\mathbb R^\nu \). The bound (8.65) follows by Corollary 4.1 of [3] and the estimate of \( R_5 \) in Lemma 24 above.

To adapt the proof of [3] to the present case, the only changes in the statement of Theorem 4.2 of [3] are: \(\varepsilon ^{3-2a}\) instead of \(\varepsilon \) in (4.18) of [3], and \(\varepsilon ^{1+b}\) instead of \(\varepsilon \) in (4.23), (4.25) and (4.26) of [3]. The factor \(\varepsilon ^{1+b}\) comes from the bound for \(\partial _i R_5\), see Lemma 24 and (8.54). \(\square \)

Remark 10

Theorem 4.2 in [3] also provides the Lipschitz dependence of the (approximate) eigenvalues \( \mu _j^n \) with respect to the unknown \( i_0 (\varphi ) \), which is used for the measure estimate (Lemma 25).

All the parameters \( \omega \in \Omega _\infty ^{2 \gamma } \) satisfy (specialize (8.63) for \( k = 0 \))

$$\begin{aligned} |{\mathrm i}\omega \cdot l + \mu _j^\infty (\omega )| \ge 2 \gamma | j |^3 \langle l \rangle ^{-{\tau }} , \quad \forall l \in \mathbb Z^\nu , \ j \in S^c, \end{aligned}$$
(8.66)

and the diagonal operator \( \mathcal{L}_\infty \) is invertible.
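The invertibility of \( \mathcal{L}_\infty \) can be made explicit on Fourier coefficients. The following is a sketch for fixed \( \omega \in \Omega _\infty ^{2\gamma } \), assuming the standard Sobolev norm \( \Vert h \Vert _s^2 = \sum _{l,j} \langle l,j \rangle ^{2s} |h_{lj}|^2 \) with \( \langle l,j \rangle \ge \langle l \rangle \) (this normalization is our assumption), and using that \( |j| \ge 1 \) for \( j \in S^c \), because \( \mu _0^\infty \) is defined separately. Writing \( g = \sum _{l \in \mathbb Z^\nu , j \in S^c} g_{lj} e^{{\mathrm i}(l \cdot \varphi + jx)} \), by (8.64) and (8.66) one gets

$$\begin{aligned} \mathcal{L}_\infty ^{-1} g = \sum _{l \in \mathbb Z^\nu , \, j \in S^c} \frac{g_{lj}}{{\mathrm i}\omega \cdot l + \mu _j^\infty (\omega )} \, e^{{\mathrm i}(l \cdot \varphi + jx)} , \quad \left| \frac{g_{lj}}{{\mathrm i}\omega \cdot l + \mu _j^\infty (\omega )} \right| \le \frac{\langle l \rangle ^{{\tau }}}{2 \gamma |j|^3} \, |g_{lj}| \le \frac{\langle l \rangle ^{{\tau }}}{2 \gamma } \, |g_{lj}| , \end{aligned}$$

so that \( \Vert \mathcal{L}_\infty ^{-1} g \Vert _s \le (2\gamma )^{-1} \Vert g \Vert _{s + {\tau }} \): the inverse loses \( {\tau }\) derivatives and a factor \( \gamma ^{-1} \).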

In the following theorem we verify the inversion assumption (6.26) for \(\mathcal{L}_\omega \).

Theorem 5

(Inversion of \( \mathcal{L}_\omega \)) Assume the hypotheses of Theorem 4 and (8.60). Then there exists \( \sigma _1 := \sigma _1 ( {\tau }, \nu ) > 0 \) such that, \( \forall \omega \in \Omega ^{2 \gamma }_\infty (i_\delta )\) (see (8.63)), for any function \( g \in H^{s+\sigma _1}_{S^\bot } (\mathbb T^{\nu +1}) \) the equation \(\mathcal{L}_\omega h = g\) has a solution \(h = \mathcal{L}_\omega ^{-1} g \in H^s_{S^\bot } (\mathbb T^{\nu +1})\), satisfying

$$\begin{aligned} \Vert \mathcal{L}_\omega ^{-1} g \Vert _s^{\mathrm{Lip}(\gamma )}&\le _s \gamma ^{-1} \big ( \Vert g \Vert _{s +\sigma _1}^{\mathrm{Lip}(\gamma )} + \varepsilon ^2 \gamma ^{-1} \Vert {\mathfrak I}_0 \Vert _{s + \sigma _1}^{\mathrm {Lip}(\gamma )}\Vert g \Vert _{s_0}^{\mathrm{Lip}(\gamma )} \big ) . \end{aligned}$$
(8.67)

Proof

See the proof of Theorem 8.16 in [5]. \(\square \)

9 The Nash–Moser nonlinear iteration

In this section we prove Theorem 2. It will be a consequence of the Nash–Moser Theorem 6 below.

Consider the finite-dimensional subspaces

$$\begin{aligned} E_n := \big \{ {\mathfrak {I}}(\varphi ) = ( \Theta , y, z )(\varphi ) :\Theta = \Pi _n \Theta , \ y = \Pi _n y, \ z = \Pi _n z \big \} \end{aligned}$$

where \( N_n := N_0^{\chi ^n} \) are introduced in (8.57), and \( \Pi _n \) are the projectors (which, with a small abuse of notation, we denote with the same symbol)

$$\begin{aligned} \Pi _n \Theta (\varphi ) := \sum _{|l| < N_n} \Theta _l e^{{\mathrm i}l \cdot \varphi }, \quad \Pi _n z(\varphi ,x) := \sum _{|(l,j)| < N_n} z_{lj} e^{{\mathrm i}(l \cdot \varphi + jx)}, \end{aligned}$$
(9.1)

where \(\Theta (\varphi ) = \sum _{l \in \mathbb Z^\nu } \Theta _l e^{{\mathrm i}l \cdot \varphi }\) and \(z(\varphi ,x) = \sum _{l \in \mathbb Z^\nu , j \in S^c} z_{lj} e^{{\mathrm i}(l \cdot \varphi + jx)}\) [\(\Pi _n y(\varphi )\) is defined in the same way as \(\Pi _n \Theta (\varphi )\)]. We define \( \Pi _n^\bot := I - \Pi _n \). The classical smoothing properties hold: for all \(\alpha , s \ge 0\),

$$\begin{aligned} \Vert \Pi _{n} {\mathfrak {I}}\Vert _{s + \alpha }^{\mathrm {Lip}(\gamma )}&\le N_{n}^{\alpha } \Vert {\mathfrak {I}}\Vert _{s}^{\mathrm {Lip}(\gamma )}\quad \forall {\mathfrak {I}}(\omega ) \in H^{s}, \nonumber \\ \Vert \Pi _{n}^\bot {\mathfrak {I}}\Vert _{s}^{\mathrm {Lip}(\gamma )}&\le N_{n}^{-\alpha } \Vert {\mathfrak {I}}\Vert _{s + \alpha }^{\mathrm {Lip}(\gamma )}\quad \forall {\mathfrak {I}}(\omega ) \in H^{s + \alpha }. \end{aligned}$$
(9.2)
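For the reader's convenience, here is a sketch of why (9.2) holds, written for the \( \Theta \) component at fixed \( \omega \). We assume the standard normalization \( \Vert \Theta \Vert _s^2 = \sum _{l} \langle l \rangle ^{2s} |\Theta _l|^2 \) with \( \langle l \rangle := \max \{ 1, |l| \} \) and \( N_n \ge 1 \); these conventions are our assumptions. Then

$$\begin{aligned} \Vert \Pi _n \Theta \Vert _{s + \alpha }^2&= \sum _{|l| < N_n} \langle l \rangle ^{2 \alpha } \langle l \rangle ^{2s} |\Theta _l|^2 \le N_n^{2 \alpha } \Vert \Theta \Vert _s^2 , \\ \Vert \Pi _n^\bot \Theta \Vert _{s}^2&= \sum _{|l| \ge N_n} \langle l \rangle ^{-2 \alpha } \langle l \rangle ^{2(s + \alpha )} |\Theta _l|^2 \le N_n^{-2 \alpha } \Vert \Theta \Vert _{s + \alpha }^2 . \end{aligned}$$

Since \( \Pi _n \) does not depend on \( \omega \), the same inequalities apply to the difference quotients in \( \omega \), which gives the Lipschitz version.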

We define the constants

$$\begin{aligned} \mu _1:= & {} 3 \mu + 9,\quad \alpha := 3 \mu _1 + 1,\quad \alpha _1 := (\alpha - 3 \mu )/2 ,\end{aligned}$$
(9.3)
$$\begin{aligned} \kappa:= & {} 3 (\mu _1 + \rho ^{-1} )+ 1,\quad \beta _1 := 6 \mu _1+ 3 \rho ^{-1} + 3 , \quad 0 < \rho < \frac{1 - 3 a}{C_1(2 + 3 a)}, \end{aligned}$$
(9.4)

where \( \mu := \mu ({\tau }, \nu ) \) is the “loss of regularity” defined in Theorem 3 (see (6.35)) and \( C_1 \) is fixed below.

Theorem 6

(Nash–Moser) Assume that \( f \in C^q \) with \( q > s_0 + \beta _1 + \mu + 3 \). Let \( {\tau }\ge \nu + 2 \). Then there exist \( C_1 > \max \{ \mu _1 + \alpha , C_0 \} \) [where \( C_0 := C_0 ({\tau }, \nu ) \) is the one in Theorem 4], \( \delta _0 := \delta _0 ({\tau }, \nu ) > 0 \) such that, if

$$\begin{aligned} N_0^{C_1} \varepsilon ^{b_* + 2} \gamma ^{-2}< \delta _0, \quad \gamma := \varepsilon ^{2 + a} = \varepsilon ^{2b} ,\quad N_0 := (\varepsilon ^4 \gamma ^{-3})^\rho ,\quad b_* := 5 - 2 b , \end{aligned}$$
(9.5)

then, for all \( n \ge 0 \):

  • \((\mathcal{P}1)_{n}\) there exists a function \(({\mathfrak {I}}_n, \zeta _n) : \mathcal{G}_n \subseteq \Omega _\varepsilon \rightarrow E_{n-1} \times \mathbb R^\nu \), \(\omega \mapsto ({\mathfrak {I}}_n(\omega ), \zeta _n(\omega ))\), \( ({\mathfrak {I}}_0, \zeta _0) := 0 \), \( E_{-1} := \{ 0 \} \), satisfying \( | \zeta _n |^{\mathrm {Lip}(\gamma )}\le C \Vert \mathcal{F}(U_n) \Vert _{s_0}^{\mathrm {Lip}(\gamma )}\),

    $$\begin{aligned} \Vert {\mathfrak {I}}_n \Vert _{s_0 + \mu }^{\mathrm{Lip}(\gamma )} \le C_* \varepsilon ^{b_*} \gamma ^{-1}, \quad \Vert \mathcal{F}(U_n)\Vert _{s_0 + \mu + 3}^{\mathrm{Lip}(\gamma )} \le C_*\varepsilon ^{b_*} , \end{aligned}$$
    (9.6)

    where \(U_n := (i_n, \zeta _n)\) with \(i_n(\varphi ) = (\varphi ,0,0) + {\mathfrak {I}}_n(\varphi )\). The sets \(\mathcal{G}_{n} \) are defined inductively by:

    $$\begin{aligned} \mathcal{G}_{0}:= & {} \Bigg \{\omega \in \Omega _\varepsilon \, : \, |\omega \cdot l| \ge \frac{2 \gamma }{\langle l \rangle ^{{\tau }}} \, \ \forall l \in \mathbb Z^\nu {\setminus } \{0\} \Bigg \} ,\nonumber \\ \mathcal{G}_{n+1}:= & {} \Bigg \{ \omega \in \mathcal{G}_{n} \, : \, |{\mathrm i}\omega \cdot l + \mu _j^\infty ( i_n) - \mu _k^\infty ( i_n )| \ge \frac{2\gamma _{n} |j^{3}-k^{3}|}{\left\langle l\right\rangle ^{{\tau }}} \nonumber \\&\quad \forall j , k \in S^c \cup \{0\}, \ l \in \mathbb Z^{\nu } \Bigg \}, \end{aligned}$$
    (9.7)

    where \( \gamma _{n}:=\gamma (1 + 2^{-n}) \) and \(\mu _j^\infty (\omega ) := \mu _j^\infty (\omega , i_n(\omega )) \) are defined in (8.61) [and \( \mu _0^\infty (\omega ) = 0 ]\). The difference \(\widehat{\mathfrak I}_n := {\mathfrak I}_n - {\mathfrak I}_{n - 1} \) (where we set \( \widehat{\mathfrak {I}}_0 := 0 \)) is defined on \(\mathcal {G}_n\), and it satisfies

    $$\begin{aligned} \Vert \widehat{\mathfrak I}_1 \Vert _{ s_0 + \mu }^{{\mathrm {Lip}(\gamma )}} \le C_* \varepsilon ^{b_*} \gamma ^{-1} , \quad \Vert \widehat{\mathfrak I}_n \Vert _{ s_0 + \mu }^{{\mathrm {Lip}(\gamma )}} \le C_* \varepsilon ^{b_*} \gamma ^{-1} N_{n - 1}^{-\alpha _1} \quad \forall n > 1. \end{aligned}$$
    (9.8)
  • \((\mathcal{P}2)_{n}\) \( \Vert \mathcal{F}(U_n) \Vert _{ s_{0}}^{\mathrm{Lip}(\gamma )} \le C_* \varepsilon ^{b_*} N_{n - 1}^{- \alpha }\) where we set \(N_{-1} := 1\).

  • \((\mathcal{P}3)_{n}\) (High norms). \( \Vert {\mathfrak {I}}_n \Vert _{ s_{0}+ \beta _1}^{\mathrm{Lip}(\gamma )} \le C_* \varepsilon ^{b_*} \gamma ^{-1} N_{n - 1}^{\kappa } \) and \( \Vert \mathcal{F}(U_n ) \Vert _{ s_{0}+\beta _1}^{\mathrm{Lip}(\gamma )} \le C_* \varepsilon ^{b_*} N_{n - 1}^{\kappa } \).

  • \((\mathcal{P}4)_{n}\) (Measure). The measure of the “Cantor-like” sets \( \mathcal{G}_n \) satisfies

    $$\begin{aligned} | \Omega _\varepsilon {\setminus } \mathcal{G}_0 | \le C_* \varepsilon ^{2(\nu - 1)} \gamma , \quad \big | \mathcal{G}_n {\setminus } \mathcal{G}_{n+1} \big | \le C_* \varepsilon ^{2(\nu - 1)} \gamma N_{n - 1}^{-1} . \end{aligned}$$
    (9.9)

All the Lip norms are defined on \( \mathcal{G}_{n} \), namely \(\Vert \ \Vert _s^{\mathrm{Lip}(\gamma )} = \Vert \ \Vert _{s,\mathcal {G}_n}^{\mathrm{Lip}(\gamma )}\).
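The smallness condition in (9.5) can be rewritten as a single power of \( \varepsilon \), which explains the upper bound on \( \rho \) in (9.4). Using \( 2b = 2 + a \), one has \( b_* = 5 - 2b = 3 - a \), and

$$\begin{aligned} \varepsilon ^{b_* + 2} \gamma ^{-2}&= \varepsilon ^{5 - a - 2(2 + a)} = \varepsilon ^{1 - 3a} , \qquad N_0 = (\varepsilon ^4 \gamma ^{-3})^\rho = \varepsilon ^{- \rho (2 + 3a)} , \\ N_0^{C_1} \varepsilon ^{b_* + 2} \gamma ^{-2}&= \varepsilon ^{(1 - 3a) - \rho C_1 (2 + 3a)} . \end{aligned}$$

The last exponent is positive exactly when \( \rho < (1 - 3a) / (C_1 (2 + 3a)) \), so the condition in (9.5) holds for all \( \varepsilon \) small enough. Moreover, since \( C_1 > C_0 \) and \( N_0 \ge 1 \), one has \( N_0^{C_0} \varepsilon ^{1 - 3a} \le N_0^{C_1} \varepsilon ^{1 - 3a} \), which is the quantity appearing in (8.60).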

Proof

To simplify notations, in this proof we denote \(\Vert \, \Vert ^{\mathrm{Lip}(\gamma )}\) by \(\Vert \, \Vert \).

Step 1: Proof of \((\mathcal{P}1, 2, 3)_0\). Recalling (5.6) we have \( \Vert \mathcal{F}( U_0 ) \Vert _s = \Vert \mathcal{F}(\varphi , 0 , 0, 0 ) \Vert _s = \Vert X_P(\varphi , 0 , 0 ) \Vert _s \le _s \varepsilon ^{5-2b} \) by Lemma 5. Hence (recall that \( b_* := 5 - 2 b \)) the smallness conditions in \((\mathcal{P}1)_0\)–\((\mathcal{P}3)_0\) hold taking \( C_* := C_* (s_0 + \beta _1) \) large enough.

Step 2: Assume that \((\mathcal{P}1,2,3)_n\) hold for some \(n \ge 0\), and prove \((\mathcal{P}1,2,3)_{n+1}\). The proof of this step closely follows Step 2 in the proof of Theorem 9.1 of [5]. We just mention the main changes: here it is convenient to define

$$\begin{aligned} w_n := \varepsilon ^2 \gamma ^{-2} \Vert \mathcal{F}(U_n) \Vert _{s_0},\quad B_n := \varepsilon ^2 \gamma ^{-1}\Vert {\mathfrak {I}}_n \Vert _{s_0 + \beta _1} + \varepsilon ^2 \gamma ^{-2} \Vert \mathcal{F}(U_n) \Vert _{s_0 + \beta _1} , \end{aligned}$$
(9.10)

while the corresponding quantities defined in (9.18) of [5] have \(\varepsilon \) instead of \(\varepsilon ^2\) (and then, with definition (9.10), the bounds (9.19) of [5] are also valid here without changes). In the present case, the estimates (9.20) and (9.21) of [5] for the quadratic Taylor remainder have to be adapted by replacing the factor \(\varepsilon \) with \(\varepsilon ^2\). The reason for this improvement is that the nonlinearity in the mKdV equation is cubic, whereas in the KdV equation considered in [5] the nonlinearity is just quadratic. \(\square \)

Remark 11

Since the KdV, respectively mKdV, nonlinearity is quadratic, respectively cubic, the smallness condition required in [5] for the convergence of the Nash–Moser scheme is stronger than for Theorem 6: it is \( \varepsilon \Vert \mathcal{F}(\varphi , 0, 0 ) \Vert _{s_0+ \mu } \gamma ^{-2} \ll 1 \) instead of \( \varepsilon ^2 \Vert \mathcal{F}(\varphi , 0, 0 ) \Vert _{s_0+ \mu } \gamma ^{-2} \ll 1 \). As a consequence, fewer steps of Birkhoff normal form are required here (namely, fewer monomials to work out in the original Hamiltonian) to reach the smallness \(\mathcal {F}(U_0) = O( \varepsilon ^{5-2b}) \) that suffices for the Nash–Moser scheme to converge (in [5] one needs \(\mathcal {F}(U_0) = O( \varepsilon ^{6-2b}) \)).

Step 3: Prove \((\mathcal{P}4)_n\) for all \(n \ge 0\). For all \(n \ge 0\), the difference \(\mathcal {G}_n {\setminus } \mathcal {G}_{n+1}\) is the union over \(l \in \mathbb Z^\nu \), \(j,k \in S^c \cup \{ 0 \}\) of the sets \(R_{ljk}(i_n)\), where

$$\begin{aligned} R_{ljk}(i_n) := \{ \omega \in \mathcal{G}_n :|{\mathrm i}\omega \cdot l + \mu _j^\infty (i_{n}) - \mu _k^\infty (i_{n})| < 2\gamma _{n} |j^{3}-k^{3}|\left\langle l\right\rangle ^{- {\tau }}\}. \end{aligned}$$
(9.11)

Since \(R_{ljk}(i_n) = \emptyset \) for \(j = k\), in the sequel we assume that \(j \ne k\).

Lemma 25

For \(n \ge 1\), \(|l| \le N_{n - 1}\), one has the inclusion \(R_{ljk}(i_n) \subseteq R_{ljk}(i_{n - 1}) \).

Proof

The proof closely follows the one of Lemma 5.2 in [3]. The differences are that here the vector \(\omega \) is not confined along a fixed direction, here we have \(N_{n-1}\) instead of \(N_n\), and the factor \(\varepsilon \) in (5.28) and (5.33) of [3] is replaced here by \(\varepsilon ^7 \gamma ^{-2} = \varepsilon ^{3-2a}\).

In the proof we use (8.25), (8.37), (8.59) and (9.8), and the bounds (4.25), (4.26) and (4.34) of [3] adapted to the present case (the bounds (4.25) and (4.26) of [3] hold here with \(\varepsilon ^{1+b}\) instead of \(\varepsilon \), as already pointed out in the proof of Theorem 4; the bound (4.34) of [3] holds here with no change). \(\square \)

By definition, \( R_{ljk} (i_n) \subseteq \mathcal{G}_n \) (see (9.11)). By Lemma 25, for \(n \ge 1\) and \( |l| \le N_{n-1} \) we also have \(R_{ljk}(i_n) \subseteq R_{ljk}(i_{n - 1}) \). On the other hand, \( R_{ljk}(i_{n-1}) \cap \mathcal{G}_{n} = \emptyset \) (see (9.7)). As a consequence, \( R_{ljk} (i_n) = \emptyset \) for all \( |l| \le N_{n-1} \), and

$$\begin{aligned} \mathcal{G}_n {\setminus } \mathcal{G}_{n+1} \subseteq \bigcup _{\begin{array}{c} j, k \in S^c \cup \{0\} \\ |l| > N_{n-1} \end{array}} R_{ljk} ( i_n) \quad \forall n \ge 1. \end{aligned}$$
(9.12)

Lemma 26

Let \(n \ge 0\). If \(R_{ljk}(i_n) \ne \emptyset \), then \(|l| \ge C_1 |j^3 - k^3| \ge \frac{1}{2} C_1 (j^2 + k^2) \) for some constant \(C_1 > 0\) (independent of \(l,j,k,n,i_n,\omega \)).

Proof

Follow the proof of Lemma 5.3 of [3], also using (8.62). Note that \(|\omega | \le 2 |\bar{\omega }|\) for all \(\omega \in \Omega _\varepsilon \), for \(\varepsilon \) small enough, by (4.10) and (5.2). \(\square \)

Now we study the measure of the resonant sets \(R_{ljk}(i_n)\) defined in (9.11). We have to analyze in more detail the sublevels of the function

$$\begin{aligned} \omega \mapsto \phi (\omega ) := {\mathrm i}\omega \cdot l + \mu _j^\infty (\omega ) - \mu _k^\infty (\omega ), \end{aligned}$$
(9.13)

appearing in (9.11) (\(\phi \) also depends on \(l,j,k,i_n\)).

Lemma 27

There exists \(C_0 > 0\) such that for all \(j \ne k\), with \(j^2 + k^2 > C_0\), the set \(R_{ljk}(i_n)\) has Lebesgue measure \(|R_{ljk}(i_n)| \le C \varepsilon ^{2(\nu -1)} \gamma \langle l \rangle ^{-{\tau }}\).

Proof

For \(l \ne 0\), decompose \(\omega = s \hat{l} + v\), where \(\hat{l} := l / |l|\), \(s \in \mathbb R\), and \(l \cdot v = 0\) (so that \(\omega \cdot l = s |l|\)). Let \(\psi (s) := \phi (s \hat{l} + v)\). The eigenvalues \(\mu _j^\infty \) are given in (8.61). By (5.4) and (8.36), \( \varepsilon ^2 |c(\xi )|^\mathrm {lip}\le C_2 \) for some constant \(C_2 > 0 \) depending only on the set S of the tangential sites. Then, by (2.2) and (8.62),

$$\begin{aligned} |\tilde{m}_3(s_1) - \tilde{m}_3(s_2)|&\le C \varepsilon ^3 \gamma ^{-1} |s_1 - s_2|, \\ |\tilde{m}_1(s_1) - \tilde{m}_1(s_2)|&\le (C_2 + C \varepsilon ^5 \gamma ^{-2})|s_1 - s_2| \le 2 C_2 |s_1 - s_2|, \\ |r_j^\infty (s_1) - r_j^\infty (s_2)|&\le C \varepsilon ^{3-2a} \gamma ^{-1} |s_1 - s_2| \end{aligned}$$

for some \(C > 0\) and \(\varepsilon \) small enough, where, with a slight abuse of notations, we have written

$$\begin{aligned} \tilde{m}_i(s) = \tilde{m}_i (s \hat{l} + v), \quad i = 1, 3 \quad \text {and} \quad r_j^\infty (s) = r_j^\infty (s \hat{l} + v), \quad j \in S^c. \end{aligned}$$

By (8.61) and Lemma 26,

$$\begin{aligned} |\psi (s_1) - \psi (s_2)|&\ge \big ( |l| - C \varepsilon ^3 \gamma ^{-1} |j^3 - k^3| - 2 C_2 |j-k| - 2C \varepsilon ^{3-2a} \gamma ^{-1} \big ) |s_1 - s_2| \\&\ge |j^3 - k^3| \left( C_1 - C \varepsilon ^3 \gamma ^{-1} - \frac{2 C_2 |j-k|}{|j^3 - k^3|} \, - \frac{2C \varepsilon ^{3-2a} \gamma ^{-1}}{|j^3 - k^3|} \right) |s_1 - s_2| \\&\ge \frac{C_1}{2} \,|j^3 - k^3| |s_1 - s_2| \end{aligned}$$

for \(\varepsilon \) small enough and \(j^2 + k^2 + jk > C_0 := 12 C_2 / C_1\). As a consequence, the set \(\Delta _{ljk}(i_n) := \{ s :s \hat{l} + v \in R_{ljk}(i_n) \}\) has Lebesgue measure

$$\begin{aligned} | \Delta _{ljk}(i_n) | \le \frac{2}{C_1 |j^3 - k^3|} \, \frac{4 \gamma _n |j^3 - k^3|}{\langle l \rangle ^{\tau }} \le \frac{C \gamma }{\langle l \rangle ^{\tau }} \end{aligned}$$

for some \(C > 0\). The lemma follows by Fubini’s theorem. \(\square \)
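The last two steps of the proof can be spelled out as follows. This is a sketch in which \( \eta \) is an auxiliary symbol introduced here, and the bound on the range of \( v \) relies on our reading of (5.2) that \( \Omega _\varepsilon \) has diameter \( O(\varepsilon ^2) \). Set \( \eta := 2 \gamma _n |j^3 - k^3| \langle l \rangle ^{-{\tau }} \). If \( s_1, s_2 \in \Delta _{ljk}(i_n) \), then \( |\psi (s_1)|, |\psi (s_2)| < \eta \), and the lower bound on \( |\psi (s_1) - \psi (s_2)| \) gives

$$\begin{aligned} \frac{C_1}{2} \, |j^3 - k^3| \, |s_1 - s_2| \le |\psi (s_1) - \psi (s_2)| < 2 \eta , \quad \text {hence} \quad |s_1 - s_2| < \frac{4 \eta }{C_1 |j^3 - k^3|} = \frac{8 \gamma _n}{C_1 \langle l \rangle ^{{\tau }}} \le \frac{16 \gamma }{C_1 \langle l \rangle ^{{\tau }}} , \end{aligned}$$

where we used \( \gamma _n = \gamma (1 + 2^{-n}) \le 2 \gamma \). Thus \( \Delta _{ljk}(i_n) \) is contained in an interval of that length. Integrating over \( v \), which ranges in a subset of the \( (\nu - 1) \)-dimensional space \( l^\bot \) of diameter \( O(\varepsilon ^2) \), one obtains

$$\begin{aligned} |R_{ljk}(i_n)| \le \int | \Delta _{ljk}(i_n) | \, dv \le C \varepsilon ^{2(\nu - 1)} \gamma \langle l \rangle ^{-{\tau }} . \end{aligned}$$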

Remark 12

When \( K = H + \lambda M^2 \), \( \lambda = 3 \varsigma / 4 \), using (8.40), the conclusion of Lemma 27 holds without restrictions on \(j, k\).

It remains to estimate the measure of the finitely many resonant sets \(R_{ljk}(i_n)\) for \(j^2 + k^2 \le C_0\). Recalling (8.36) and the parity \(\xi _{-j} = \xi _j\), we write \(c(\xi ) = 6 \varsigma \mathbf {1} \cdot \xi \) where \(\mathbf {1}\) is the vector \((1, \ldots , 1) \in \mathbb R^\nu \) and \(\xi = (\xi _j)_{j \in S^+} \in \mathbb R^\nu \). Hence, by (5.4),

$$\begin{aligned} \varepsilon ^2 c(\xi ) = 6 \varsigma \mathbf {1} \cdot \mathbb {A}^{-1} [\omega - \bar{\omega }] = 6 \varsigma \mathbb {A}^{-T} \mathbf {1} \cdot [\omega - \bar{\omega }] \end{aligned}$$
(9.14)

where \(\mathbb {A}^{-T}\) is the transpose of \(\mathbb {A}^{-1}\). We write the function \( \phi (\omega ) \) in (9.13) as

$$\begin{aligned} \phi (\omega ) = a_{jk} + b_{ljk} \cdot \omega + q_{jk}(\omega ) , \end{aligned}$$

where

$$\begin{aligned} a_{jk}&:= - {\mathrm i}\big ( j^3 - k^3 + 6 \varsigma (j-k) \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }\big ) , \\ b_{ljk}&:= {\mathrm i}\big ( l + 6 \varsigma (j-k) \mathbb {A}^{-T} \mathbf {1} \big ) , \\ q_{jk}(\omega )&:= - {\mathrm i}(\tilde{m}_3 -1) (j^3 - k^3) + {\mathrm i}(\tilde{m}_1 -\varepsilon ^2 c(\xi )) (j-k) + r_j^\infty - r_k^\infty \end{aligned}$$

(and \(\tilde{m}_3, \tilde{m}_1, \xi , r_j^\infty , r_k^\infty \) all depend on \(\omega \)). By (8.62) and since \( j^2 + k^2 \le C_0 \) we deduce that \( |q_{jk}|^{\mathrm {Lip}(\gamma )}\le C \varepsilon ^{3-2a} \). Recalling (2.2) we get

$$\begin{aligned} |q_{jk}|^\mathrm{sup} \le C \varepsilon ^{3-2a} , \quad |q_{jk}|^\mathrm {lip}\le \gamma ^{-1} |q_{jk}|^{\mathrm {Lip}(\gamma )}\le C \varepsilon ^{1-3a} \end{aligned}$$
(9.15)

so that \( \phi (\omega ) \) is a small perturbation of the affine function \( \omega \mapsto a_{jk} + b_{ljk} \cdot \omega \). By the next lemma, the hypothesis (1.12) on the tangential sites S allows us to verify that this affine function does not vanish identically.
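The decomposition \( \phi = a_{jk} + b_{ljk} \cdot \omega + q_{jk} \) can be checked term by term from (8.61), (9.13) and (9.14), with the convention \( r_0^\infty := 0 \) when \( j \) or \( k \) is zero (this convention is our notation, consistent with \( \mu _0^\infty = 0 \)):

$$\begin{aligned} \phi (\omega )&= {\mathrm i}\omega \cdot l - {\mathrm i}\tilde{m}_3 (j^3 - k^3) + {\mathrm i}\tilde{m}_1 (j - k) + r_j^\infty - r_k^\infty \\&= {\mathrm i}\omega \cdot l - {\mathrm i}(j^3 - k^3) + {\mathrm i}\varepsilon ^2 c(\xi ) (j - k) + q_{jk}(\omega ) \\&= - {\mathrm i}\big ( j^3 - k^3 + 6 \varsigma (j-k) \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }\big ) + {\mathrm i}\big ( l + 6 \varsigma (j-k) \mathbb {A}^{-T} \mathbf {1} \big ) \cdot \omega + q_{jk}(\omega ) . \end{aligned}$$

In the last step we used \( \varepsilon ^2 c(\xi ) = 6 \varsigma \mathbb {A}^{-T} \mathbf {1} \cdot \omega - 6 \varsigma \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }\), which is (9.14).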

Lemma 28

Assume (1.12). Then, for all \(j \ne k\) with \(j^2 + k^2 \le C_0\), one has \(a_{jk} \ne 0\).

Proof

Using formulae (1.19) and (4.11), we calculate

$$\begin{aligned} \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }= - \frac{1}{3\varsigma (2\nu -1)} \, \sum _{i=1}^\nu \bar{\jmath }_i^{\,2}. \end{aligned}$$

Hence

$$\begin{aligned} a_{jk} = - {\mathrm i}(j-k) \left( j^2 + jk + k^2 - \frac{2}{2\nu -1} \, \sum _{i=1}^\nu \bar{\jmath }_i^{\,2} \right) \ne 0 \end{aligned}$$

by assumption (1.12) on the set S. \(\square \)
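The intermediate step between the two displayed formulas in the proof is the substitution of \( \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }\) into the definition of \( a_{jk} \), together with the factorization \( j^3 - k^3 = (j-k)(j^2 + jk + k^2) \):

$$\begin{aligned} 6 \varsigma (j-k) \, \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }&= - \frac{6 \varsigma }{3 \varsigma (2\nu - 1)} \, (j-k) \sum _{i=1}^\nu \bar{\jmath }_i^{\,2} = - \frac{2}{2\nu - 1} \, (j-k) \sum _{i=1}^\nu \bar{\jmath }_i^{\,2} , \\ j^3 - k^3 + 6 \varsigma (j-k) \, \mathbf {1} \cdot \mathbb {A}^{-1} \bar{\omega }&= (j-k) \left( j^2 + jk + k^2 - \frac{2}{2\nu -1} \, \sum _{i=1}^\nu \bar{\jmath }_i^{\,2} \right) . \end{aligned}$$

Note that the sign \( \varsigma \) cancels, so the resulting condition on the tangential sites is the same in the focusing and defocusing cases.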

Lemma 28 implies that \(\delta := \min \{ |a_{jk}| :j^2 + k^2 \le C_0, \ j \ne k \} > 0\).

Lemma 29

Assume (1.12). If \(j^2 + k^2 \le C_0\), then \(|R_{ljk}(i_n)| \le C \varepsilon ^{2(\nu -1)} \gamma \langle l \rangle ^{-{\tau }}\).

Proof

Denote \(b := b_{ljk}\) for brevity. For \( j^2 + k^2 \le C_0\), \(\omega \in R_{ljk}(i_n)\), one has, by (9.11) and (9.15),

$$\begin{aligned} | b \cdot \omega | \ge |a_{jk}| - |\phi (\omega )| - |q_{jk}(\omega )| \ge \delta - 2 \gamma _n |j^3 - k^3| \langle l \rangle ^{-{\tau }} - C \varepsilon ^{3-2a} \ge \delta /2 \end{aligned}$$

for \(\varepsilon \) small enough. On the other hand, \(| b \cdot \omega | \le 2 | \bar{\omega }| |b|\) because \(|\omega | \le 2 |\bar{\omega }|\) (see (4.10) and (5.2)). Hence \(|b| \ge \delta _1\) where \(\delta _1 := \delta / (4 |\bar{\omega }|) > 0\). Split \(\omega = s \hat{b} + v\) where \(\hat{b} := b / |b|\) and \(v \cdot b = 0\). Let \(\psi (s) := \phi ( s \hat{b} + v )\). By (9.15), for \(\varepsilon \) small enough, we get

$$\begin{aligned} | \psi (s_1) - \psi (s_2)| \ge (|b| - |q_{jk}|^\mathrm {lip}) |s_1 - s_2| \ge \frac{\delta _1}{2} |s_1 - s_2| . \end{aligned}$$

Then we proceed similarly as in the proof of Lemma 27. \(\square \)

The proof of (9.9) follows from Lemmata 25–29, proceeding as in [3] (see the conclusion of the proof of Theorem 5.1 in [3]).
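A sketch of the counting argument for \( n \ge 1 \), as we understand it (the argument in [3] may be organized differently): by (9.12) only the indices \( |l| > N_{n-1} \) contribute; by Lemma 26 a nonempty \( R_{ljk}(i_n) \) requires \( j^2 + k^2 \le C |l| \), so that for each \( l \) there are at most \( C |l| \) such pairs \( (j,k) \); and each set has measure at most \( C \varepsilon ^{2(\nu -1)} \gamma \langle l \rangle ^{-{\tau }} \) by Lemmata 27 and 29. Therefore

$$\begin{aligned} | \mathcal{G}_n {\setminus } \mathcal{G}_{n+1} | \le \sum _{|l| > N_{n-1}} C |l| \, \varepsilon ^{2(\nu -1)} \gamma \langle l \rangle ^{-{\tau }} \le C \varepsilon ^{2(\nu -1)} \gamma \sum _{|l| > N_{n-1}} |l|^{-({\tau }- 1)} \le C \varepsilon ^{2(\nu -1)} \gamma N_{n-1}^{-({\tau }- \nu - 1)} , \end{aligned}$$

where the constant \( C \) may change from one inequality to the next. The last sum over \( l \in \mathbb Z^\nu \) converges because \( {\tau }- 1 > \nu \), and the assumption \( {\tau }\ge \nu + 2 \) of Theorem 6 gives \( N_{n-1}^{-({\tau }- \nu - 1)} \le N_{n-1}^{-1} \), in agreement with (9.9).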

Proof of Theorem 2 concluded. The conclusion of the proof of Theorem 2 follows exactly as in [5] (see “Proof of Theorem 5.1 concluded” in [5]).

Remark 13

By Remark 12, Lemma 28 (which is the only point in the paper where assumption (1.12) is used) is not needed any more. Thus Theorem 1 applies to \( K = H + (3\varsigma /4) M^2\) without assuming hypothesis (1.12).