1 Introduction

With the development of KAM theory, many well-known KAM theorems have been established [1, 4, 10, 12,13,14, 16, 21, 22]. The classical KAM theorem [1, 10, 16] asserts that if the frequency mapping satisfies the Kolmogorov non-degeneracy condition, then the Lagrangian invariant tori with Diophantine frequencies persist under small perturbations. The Kolmogorov non-degeneracy condition can be weakened to the Bruno non-degeneracy condition and the Rüssmann non-degeneracy condition [6, 19, 23, 26]; in particular, the Rüssmann non-degeneracy condition is the sharpest one for KAM theorems. Moreover, the Diophantine condition can be weakened to the Bruno-Rüssmann condition [2, 8, 17,18,19,20]. In addition, a similar problem for non-Hamiltonian vector fields with Bruno frequency vectors is studied in [9]. In particular, as an alternative to the KAM method, the renormalization method is used in [8, 9].

In this paper we are concerned with lower dimensional invariant tori with Bruno frequency vectors in Hamiltonian systems. Consider the following real analytic nearly integrable Hamiltonians

$$\begin{aligned} H_{\pm }(x,y,u, v)=\langle \omega , y\rangle +\frac{1}{2}\mathop \sum \limits _{j=1}^m\Omega _j \left( u_j^2\pm v_j^2 \right) +P(x,y,u,v). \end{aligned}$$
(1.1)

The phase space is \(T^n\times {\mathbb {R}}^n\times {\mathbb {R}}^m\times {\mathbb {R}}^m\) associated with the symplectic structure

$$\begin{aligned} \mathop \sum \limits _{i=1}^n dx_i\wedge dy_i+\mathop \sum \limits _{j=1}^m du_j\wedge d v_j, \end{aligned}$$

where \(T^n={\mathbb {R}}^n/2\pi {\mathbb {Z}}^n\) is the n-torus. The tangential frequency \(\omega \) is regarded as a parameter and is usually suppressed to simplify notation. Assume \(\Omega _j\ne 0\) for all \(j=1, 2, \ldots , m\); the \(\Omega _j\) usually depend on \(\omega \). P is a small perturbation. If \(P=0\), then the Hamiltonian \(H_+\) (\(H_-\)) is a normal form and has a parameterized family of elliptic (hyperbolic) lower dimensional invariant tori \({\mathcal {T}}_\omega =T^n\times \{0\}\times \{0\}\times \{0\}\) with frequencies \(\omega \).

Melnikov [12, 13] concluded that if P is sufficiently small, for most of the frequency parameters \(\omega \), the invariant tori \({\mathcal {T}}_\omega \) for Hamiltonian \(H_+\) can persist under the following non-resonance conditions:

$$\begin{aligned}{} & {} \langle \omega , k\rangle +\Omega _j(\omega ) \quad \ne 0,\qquad \qquad \quad \forall {k}\in {{\mathbb {Z}}}^n,\ {j}=1,2,\cdots m, \end{aligned}$$
(1.2)
$$\begin{aligned}{} & {} \langle \omega , k\rangle +\Omega _i(\omega )+ \Omega _j(\omega )\ne 0,\quad \, \forall {k}\in {{\mathbb {Z}}}^n,\ i,j=1,2,\cdots m, \end{aligned}$$
(1.3)
$$\begin{aligned}{} & {} \langle \omega , k\rangle +\Omega _i(\omega )-\Omega _j(\omega )\ne 0,\quad \, \forall {k}\in {{\mathbb {Z}}}^n,\ |k|+|i-j|\ne 0, \end{aligned}$$
(1.4)

where (1.2) is called the first Melnikov condition, while (1.3) and (1.4) are called the second Melnikov conditions. The result was later improved by Pöschel and Bourgain [3, 17].

As for hyperbolic invariant tori of the Hamiltonian \(H_-\), there are many well-known KAM theorems [5, 7, 11, 15], which are essentially extensions of the results for Lagrangian invariant tori. In fact, the hyperbolic case is much simpler than the elliptic case, since no Melnikov conditions are needed.

Recently, Xu and Lu [24] developed new KAM techniques to prove two formal KAM theorems, which can be used to prove various kinds of KAM theorems for Lagrangian tori and elliptic lower dimensional tori. Note that the frequency considered in [24] is Diophantine. Motivated by [24], in this paper we give a formal KAM theorem for hyperbolic invariant tori under the Bruno-Rüssmann non-resonance condition. Many previous results follow from this formal KAM theorem as direct corollaries.

2 Main Result

For s, \(r>0\), let \(T_{s}=\bigl \{x \in {\mathbb {C}}^n/2\pi {\mathbb {Z}}^{n}\ | \ |\textrm{Im}x|\le s \bigr \}\) and

$$\begin{aligned} D_{s,r}=\bigl \{w \in {\mathbb {C}}^n/2\pi {\mathbb {Z}}^{n} \times {\mathbb {C}}^n\times {\mathbb {C}}^m \times {\mathbb {C}}^m:|\textrm{Im}x|\le s, |y|_1\le r^2, |u|_2\le r, |v|_2\le r\bigr \}, \end{aligned}$$

where \(|\cdot |\) is the sup-norm, \(|\cdot |_1\) is the \(l^1\)-norm, and \(|\cdot |_2\) is the Euclidean norm. Let \(U\subset {\mathbb {R}}^n\) be a domain and \(\ell \ge 0\) be an integer.

Consider a parameterized Hamiltonian

$$\begin{aligned} H(\xi ;w)=\langle \omega (\xi ),y\rangle +\langle \Omega u,v\rangle +P(\xi ;w), \end{aligned}$$
(2.1)

where \(w=(x,y,u,v)\) is the phase variable and \(\xi \) is a parameter. It is easy to see that \((u, v)=(0,0)\) is a hyperbolic equilibrium of the Hamiltonian H if \(P=0\). Note that under the symplectic change of variables \({\tilde{u}}=\frac{u-v}{\sqrt{2}}\), \({\tilde{v}}=\frac{u+v}{\sqrt{2}}\), we have \(\langle \Omega {\tilde{u}},{\tilde{v}}\rangle =\frac{1}{2}\mathop \sum \nolimits _{j=1}^m\Omega _j(u_j^2- v_j^2),\) so we use the normal form in (2.1) for convenience.
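The claim above can be checked directly. As a short verification (nothing beyond the definitions of \({\tilde{u}}\) and \({\tilde{v}}\) is assumed), for each j the change of variables preserves the symplectic form and produces the hyperbolic quadratic term of \(H_-\) in (1.1):

$$\begin{aligned} d{\tilde{u}}_j\wedge d{\tilde{v}}_j&=\frac{1}{2}(du_j-dv_j)\wedge (du_j+dv_j)=\frac{1}{2}\left( du_j\wedge dv_j-dv_j\wedge du_j\right) =du_j\wedge dv_j,\\ \Omega _j{\tilde{u}}_j{\tilde{v}}_j&=\frac{\Omega _j}{2}(u_j-v_j)(u_j+v_j)=\frac{\Omega _j}{2}\left( u_j^2-v_j^2\right) . \end{aligned}$$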

Assume that \(H(\xi ;w)\) is analytic in w on \(D_{s,r}\) and \(C^{\ell }\)-smooth in \(\xi \) on U. Then \(P(\xi ;w)\) can be expanded as a Fourier series with respect to x,

$$\begin{aligned} P(\xi ; w)=\sum \limits _{k\in {\mathbb {Z}}^n}P_k(\xi ;{\bar{w}})e^{\sqrt{-1}\langle k,x\rangle }, \end{aligned}$$

where \(P_k(\xi ;{\bar{w}})=\sum \nolimits _{i\in {\mathbb {Z}}_+^n,j,l\in {\mathbb {Z}}_+^m}P_{ijlk}(\xi )y^iu^jv^l\), where \({\mathbb {Z}}_+^n\) is composed of all the integer vectors with nonnegative components, and \({\mathbb {Z}}_+^m\) has the same meaning.

Denote by \(C^{\ell ;a}(U\times D_{s,r})\) the set which consists of functions that are analytic in w on \(D_{s,r}\) and \(C^{\ell }\)-smooth in \(\xi \) on U. For \(P\in C^{\ell ;a}(U\times D_{s,r}),\) we define

$$\begin{aligned} \Vert P\Vert _{U\times D_{s,r}}=\sum \limits _k \Vert P_k\Vert _{U;r}e^{|k|s}, \end{aligned}$$

where

$$\begin{aligned} \Vert P_k\Vert _{U;r}=\mathop {\sup }_{|y|_1\le r^2, |u|_2\le r, |v|_2\le r} \left| \sum \limits _{i\in {\mathbb {Z}}_+^n,j,l\in {\mathbb {Z}}_+^m}\Vert P_{ijlk}\Vert _{\alpha ,C^{\ell }(U)}y^iu^jv^l \right| , \end{aligned}$$

with the weighted norm

$$\begin{aligned} \Vert P_{ijlk}\Vert _{\alpha ,C^{\ell }(U)}=\mathop {\textrm{max}}\limits _{|\beta |\le {\ell }}\alpha ^{|\beta |}\mathop {\textrm{max}}\limits _{\xi \in U} \left| \frac{\partial ^{\beta }P_{ijlk}(\xi )}{\partial \xi ^{\beta }}\right| , \end{aligned}$$

where \(\beta \in {\mathbb {Z}}_+^n\) and \(\alpha \) is the constant in (2.4).

2.1 Bruno-Rüssmann Condition

Let \(\Xi :[0,+\infty )\rightarrow [1,+\infty )\) be a nondecreasing unbounded function. \(\Xi \) is called an approximating function if

$$\begin{aligned} \Xi (0)=1, \ \ \frac{{\textrm{log}}(\Xi (t))}{t}\rightarrow 0,\ \ 0\le t\rightarrow \infty , \end{aligned}$$
(2.2)

and

$$\begin{aligned} \int ^{+\infty }t^{-2}{\textrm{log}}(\Xi (t))\, dt<\infty . \end{aligned}$$
(2.3)

Moreover, assume that the approximating function \(\Xi (t)\) is sufficiently increasing, absolutely continuous, and satisfies condition (5.3) in the Appendix.

If

$$\begin{aligned} |\big \langle k,\omega \big \rangle |\ge \frac{\alpha }{\Xi (|k|)},\ \ 0\ne k\in {\mathbb {Z}}^n, \end{aligned}$$
(2.4)

where \(0<\alpha \le 1\), we say that \(\omega \) satisfies the Bruno-Rüssmann condition.
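For orientation, here is a standard example. The specific choice of \(\Xi \), the exponent \(\tau \), and the lower limit 1 in the integral are illustrative assumptions, not taken from the text. For \(\tau >0\), the function \(\Xi (t)=(1+t)^{\tau }\) is nondecreasing and unbounded with \(\Xi (0)=1\), and

$$\begin{aligned} \frac{{\textrm{log}}(\Xi (t))}{t}=\frac{\tau \, {\textrm{log}}(1+t)}{t}\rightarrow 0 \ \ (t\rightarrow \infty ), \qquad \int _1^{+\infty }t^{-2}{\textrm{log}}(\Xi (t))\, dt=\tau \int _1^{+\infty }\frac{{\textrm{log}}(1+t)}{t^2}\, dt=2\tau \, {\textrm{log}}\, 2<\infty , \end{aligned}$$

so (2.2) and (2.3) hold. With this choice, (2.4) reads \(|\langle k,\omega \rangle |\ge \alpha (1+|k|)^{-\tau }\), a condition of Diophantine type. Whether this \(\Xi \) also meets condition (5.3) of the Appendix is not checked here.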

Theorem 2.1

(The formal KAM theorem) Let \(H\in C^{\ell ;a}(U\times D_{s,r})\) be given in (2.1). Then for \(0<\sigma \le s/2\), there exists a sufficiently small \(\gamma >0\), such that if

$$\begin{aligned} \Vert P\Vert _{ U\times D_{s,r}}\le \epsilon =\alpha \gamma r^2, \end{aligned}$$
(2.5)

there exist a \(C^{\ell }(U)\)-smooth family of parameterized symplectic mappings \(\{\Psi _*(\xi ;\cdot )\}_{\xi \in U}\) and a family of Hamiltonians \(\{H_*(\xi ;\cdot )\}_{\xi \in U}\) such that the following conclusions hold:

  1. (1)

    \(\Psi _*\in C^{\ell ;a}(U\times D_{s/2,r/2})\) with

    $$\begin{aligned} \Vert W(\Psi _*-id)\Vert _{U\times D_{s/2,r/2}}\le c\Delta (\sigma )\gamma , \end{aligned}$$

    where \(W=diag(\sigma ^{-1}Id,r^{-2}Id,r^{-1}Id,r^{-1}Id)\), and \(\Delta (\sigma )\) is as shown in (5.2).

  2. (2)
    $$\begin{aligned} H_*(\xi ;w)=N_*(\xi ;w)+P_*(\xi ;w), \end{aligned}$$
    (2.6)

    where \(N_*(\xi ;w)=\big \langle \omega _*(\xi ),y\big \rangle +\big \langle \Omega u,v\big \rangle +\big \langle Q_*(\xi ;x) z, z\big \rangle \) with \(z=(u, v)^T\), and

    $$\begin{aligned} P_*(w)=\mathop {\sum }\limits _{2|i|+|j|+|l| > 2}P_{*\beta }(x){\bar{w}}^\beta , \ \ {\bar{w}}^\beta =y^iu^jv^l. \end{aligned}$$

    Furthermore,

    $$\begin{aligned} \Vert \omega _*-\omega \Vert _{{\mathcal {C}}^{\ell }(U)}\le 2\alpha \gamma , \ \ \Vert Q_*\Vert _{{\mathcal {C}}^{\ell }(U)\times T_{s/2}}\le c\Delta (\sigma )\gamma . \end{aligned}$$
    (2.7)
  3. (3)

    If for some \(\xi \in U\), \(\omega _*(\xi )\) satisfies (2.4), then

    $$\begin{aligned} H\circ \Psi _*(\xi ;w)=H_*(\xi ;w), \end{aligned}$$

    therefore, \(H(\xi ;\cdot )\) has an invariant torus \(\Psi _*(\xi ; T^n\times \{0\}\times \{0\}\times \{0\})\) with frequencies \(\omega _*(\xi )\).

Remark 2.1

Note that in Theorem 2.1 we use the Bruno-Rüssmann condition, which is a little weaker than the Diophantine condition in [24]. Moreover, a similar result holds for elliptic lower dimensional tori. For simplicity, we do not treat the elliptic case in this paper.

3 Applications of Theorem 2.1

In this section we give some applications of Theorem 2.1 in two non-degenerate cases and defer the proof to the next section.

  1. (1)

    Bruno non-degenerate case. Consider a real analytic Hamiltonian

    $$\begin{aligned} H(q, p, u,v)=h(p)+ \langle \Omega u, v\rangle +f(q,p, u,v), \end{aligned}$$
    (3.1)

    where \(\Omega =\text{ diag }(\Omega _1, \cdots \Omega _m)\) with \(\Omega _j\ne 0\) for all \(j=1,2,\cdots m\), and f is a sufficiently small perturbation. The phase space is \( T^n\times D\times {\mathbb {R}}^m\times {\mathbb {R}}^m\), where \(D\subset {\mathbb {R}}^n\) is an open domain. By introducing parameters, we consider an equivalent system. Let \(q=x\), \(p=y+\xi \), \(w=(x, y, u, v)\); then

    $$\begin{aligned} H(q,p, u, v)&=h(y+\xi )+\big \langle \Omega u,v\big \rangle +f(x, \xi +y, u, v)\nonumber \\&=e+\big \langle \omega (\xi ),y\big \rangle +\big \langle \Omega u,v\big \rangle +P(\xi ; w), \end{aligned}$$
    (3.2)

    where \(e=h(\xi )\) is an energy constant, which is usually ignored, \(\omega (\xi )=h_p(\xi ),\) and \(P(\xi ; w)=O(y^2)+f(x,\xi +y, u, v)\) with \(O(y^2)= h(\xi +y)-h(\xi )-\big \langle \omega (\xi ),y\big \rangle .\) Consider the parameterized Hamiltonian (3.2), which is real analytic in w on \(D_{s,r}\) and \(C^{\ell }\)-smooth in \(\xi \) on U, where \(U=\{\xi \in D \ | \ \text{ dist } (\xi , \partial D)\ge \delta _0>0\}\). Suppose the Bruno non-degeneracy condition holds:

    $$\begin{aligned} \text{ rank }(\partial _{\xi }\omega )=n-1, \ \ \text{ rank } (\partial _{\xi }\omega ^T,\omega ^T)=n, \ \ \forall \xi \in U. \end{aligned}$$
    (3.3)

    Let

    $$\begin{aligned} | f(q,p, u,v)|\le \varepsilon , \ \forall q\in T_s, \ p\in D, \ |u|\le \delta ,\ |v|\le \delta . \end{aligned}$$

    Let \(r=\varepsilon ^{\frac{1}{4}}\le \min \{\delta _0, \delta \}.\) Then

    $$\begin{aligned} \Vert P\Vert _{U\times D_{s,r}}\le \varepsilon +cr^4\le c\varepsilon =\epsilon =\alpha \gamma r^2, \end{aligned}$$

    where \(\gamma =\frac{c\varepsilon ^{\frac{1}{2}}}{\alpha }.\) Obviously, \(\gamma \) is sufficiently small if \(\varepsilon \) is sufficiently small, so Theorem 2.1 applies to Hamiltonian (3.2). By a measure estimate, for most \(\xi \in U\), \(\omega (\xi )\) satisfies (2.4). Moreover, since \(\omega (\xi )\) is Bruno non-degenerate and \(\omega _*\) is a small perturbation of \(\omega \), a measure estimate as in [17, 24] shows that for most \(\xi \) in the sense of Lebesgue measure, \(\omega _*(\xi )\) satisfies (2.4). By Theorem 2.1, for each \(\xi \in U\) such that \(\omega _*(\xi )\) satisfies (2.4), the original Hamiltonian

    $$\begin{aligned} H(\xi ;w)=\big \langle \omega (\xi ),y\big \rangle +\big \langle \Omega u,v\big \rangle +P(\xi ;w), \ \ \xi \in U \end{aligned}$$

    can be normalized to

    $$\begin{aligned} H_*(\xi ;w)=\big \langle \omega _*(\xi ),y\big \rangle +\big \langle \Omega u,v\big \rangle +\big \langle Q_*(\xi ;x)z,z\big \rangle +P_*(\xi ;w), \ \ \xi \in U, \end{aligned}$$

    and then it admits a lower dimensional invariant torus with frequencies \(\omega _*(\xi )\). However, in this paper we are interested in the persistence of an invariant torus of the unperturbed system with frequency \(\omega _0=\omega (\xi _0)\) (\(\xi _0\in U\)). If \(\omega _0\) satisfies (2.4), then, since \(\omega (\xi )\) is Bruno non-degenerate in the sense of (3.3), in the same way as in [24] (see Proposition 1 in [24] for details) there exist a \(\xi \in U\) and a small constant \(\lambda =O(\varepsilon )\) such that \(\omega _*(\xi )=(1+\lambda )\omega _0.\) By Theorem 2.1, the Hamiltonian H has an invariant torus with frequency \(\omega _*(\xi )\), which is a small dilation of \(\omega _0\).

  2. (2)

    Rüssmann non-degenerate case. In this case we can also obtain many invariant tori by the standard KAM method if f is sufficiently small, but we cannot get more information about their frequencies. Here we are concerned with the persistence of KAM tori with prescribed frequencies. Consider the Hamiltonian H in (3.1) with

    $$\begin{aligned} h(y)=\langle \omega _0, y\rangle + y_1^{2l_1}+\cdots +y_n^{2l_n},\ \ |y|\le 2\delta _0, \ \ l_1,l_2,\cdots l_n\ge 2. \end{aligned}$$

    Then \(\omega (\xi )=\omega _0+(2l_1\xi _1^{2l_1-1}, \cdots 2l_n\xi _n^{2l_n-1})\). Assume that \(\omega _0\) satisfies (2.4). Obviously, \(\text{ deg }(\omega , U,\omega _0)\ne 0,\) where \(U=\{\xi \in {\mathbb {R}}^n \ | \ |\xi |\le \delta _0 \}\). By Theorem 2.1, if f is sufficiently small, \(\text{ deg }(\omega _*, U,\omega _0)\ne 0.\) Then there exists \(\xi _*\in U\), such that \(\omega _*(\xi _*)=\omega _0\) and so \(H(\xi _*;\cdot )\) has a hyperbolic lower dimensional invariant torus with frequencies \(\omega _0\).
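To see why the degree in the Rüssmann non-degenerate case is nonzero even though \(\partial _{\xi }\omega \) degenerates, a short check may help. This is a sketch: the map g and the homotopy \(g_t\) are introduced here only for illustration, and deg is assumed to denote the Brouwer degree. Write \(g(\xi )=\omega (\xi )-\omega _0=(2l_1\xi _1^{2l_1-1}, \cdots 2l_n\xi _n^{2l_n-1})\) and \(g_t(\xi )=(1-t)g(\xi )+t\xi \) for \(0\le t\le 1\). Then

$$\begin{aligned} \det \partial _{\xi }\omega (\xi )&=\mathop {\Pi }\limits _{j=1}^{n}2l_j(2l_j-1)\xi _j^{2l_j-2}, \ \ \text{ which } \text{ vanishes } \text{ at } \ \xi =0 \ \text{ since } \ l_j\ge 2,\\ \langle g_t(\xi ),\xi \rangle&=\sum \limits _{j=1}^{n}\left( (1-t)2l_j\xi _j^{2l_j}+t\xi _j^2\right) >0, \ \ \forall \xi \ne 0,\ 0\le t\le 1. \end{aligned}$$

Hence \(g_t\) has no zeros on \(\partial U\), and by homotopy invariance \(\text{ deg }(\omega , U,\omega _0)=\text{ deg }(g, U,0)=\text{ deg }(id, U,0)=1\).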

4 Proof of Theorem 2.1

In this section we prove Theorem 2.1. Our KAM iteration is divided into several parts. Let

$$\begin{aligned} \Gamma (\sigma )=\sup \limits _{t\ge 0}(1+t)^{\ell +2}\Xi ^{\ell +2}(t)e^{-\sigma t}. \end{aligned}$$

By the properties of approximating functions, \(\Gamma (\sigma )\) is well defined.
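The reason is the following short estimate (a sketch using only (2.2) and the monotonicity of \(\Xi \)). For \(t>0\),

$$\begin{aligned} (1+t)^{\ell +2}\Xi ^{\ell +2}(t)e^{-\sigma t}=\exp \left( t\left[ (\ell +2)\left( \frac{{\textrm{log}}(1+t)}{t}+\frac{{\textrm{log}}(\Xi (t))}{t}\right) -\sigma \right] \right) , \end{aligned}$$

and by (2.2) the bracket tends to \(-\sigma <0\) as \(t\rightarrow \infty \). Hence the function tends to 0 at infinity, and since \(\Xi \) is nondecreasing it is bounded on every bounded interval, so the supremum is finite for each \(\sigma >0\).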

4.1 KAM Step and Iteration Lemma

Our KAM step is summarized in the following iteration lemma.

Lemma 4.1

(Iteration Lemma) Consider \(H(\xi ;w)=N(\xi ; w)+P(\xi ;w), \) where

$$\begin{aligned} N(\xi ; w)=\big \langle \omega (\xi ),y\big \rangle +\big \langle \Omega u,v\big \rangle +\big \langle Q(\xi ;x)z, z\big \rangle \end{aligned}$$

is a normal form, with \( z=(u,v)^T\), \( Q(\xi ; x)\) is a small symmetric matrix of order 2m, and P is a perturbation.

Let \(H\in C^{\ell ;a}(U\times D_{s,r})\), and \(\Vert Q\Vert _{U\times T_s}\ll 1.\) Suppose

$$\begin{aligned} \Vert P\Vert _{U\times D_{s,r}}\le \epsilon =\alpha r^2E. \end{aligned}$$

Let \(r_+=\eta r\), \(s_+=s-4\sigma \). If \(\epsilon >0\) is sufficiently small, then the following results hold true:

  1. (1)

    There exists a parameterized family of symplectic mappings \(\{\Phi (\xi ;\cdot ), \ \xi \in U\}\), such that \(\Phi \in C^{\ell ;a}(U\times D_{s_+,r_+})\) with

    $$\begin{aligned} \Phi (\xi ;\cdot ): D_{s_+,r_+}\rightarrow D_{s,r}. \end{aligned}$$

    Moreover,

    $$\begin{aligned} ||W(\Phi -id)||_{U\times D_{s_+,r_+}}\le c\Gamma E \end{aligned}$$

    and

    $$\begin{aligned} ||W({\mathcal {D}}\Phi -Id)W^{-1}||_{U\times D_{s_+,r_+}} \le c\Gamma E, \end{aligned}$$

    where \(W=diag(\sigma ^{-1}Id,r^{-2}Id,r^{-1}Id,r^{-1}Id)\) and \({\mathcal {D}}\) denotes the differential operator with respect to w.

  2. (2)

    There exists a Hamiltonian \(H_+\in C^{\ell ;a}(U\times D_{s_+,r_+})\) with

    $$\begin{aligned} H_+(\xi ;w)=N_+(\xi ; w)+P_+(\xi ;w), \end{aligned}$$

    where \(N_+(\xi ; w)=\langle \omega _+(\xi ),y\rangle +\langle \Omega u, v\rangle + \big \langle Q_+(\xi ; x) z, z\big \rangle \), and \(\omega _+=\omega +{\hat{\omega }}\). Moreover,

    $$\begin{aligned} \Vert {\hat{\omega }}\Vert \le \frac{\epsilon }{r^2}, \ \ \ \Vert Q_+-Q\Vert _{U\times T_{s_+}}\le c \Gamma E. \end{aligned}$$
    (4.1)

    Furthermore, \(P_+\) satisfies

    $$\begin{aligned} \Vert P_+\Vert _{U\times D_{s-4\sigma ,\eta r}}\le c\Gamma E\epsilon +ce^{-K\sigma }\epsilon +c{\eta ^3}\epsilon . \end{aligned}$$
  3. (3)

    Set

    $$\begin{aligned} R_{\alpha }^K=\left\{ \omega \in {\mathbb {R}}^n \ | \ |\langle k,\omega \rangle |\ge \frac{\alpha }{\Xi (|k|)}, \ 0< |k|\le K \right\} \end{aligned}$$

    and

    $$\begin{aligned} {\tilde{U}}=\{\xi \in U \ | \ \omega (\xi )\in R_{\alpha }^K\}. \end{aligned}$$
    (4.2)

    Then,

    $$\begin{aligned} H\circ \Phi (\xi ;w)=H_+(\xi ;w)=N_+(\xi ;w)+P_+(\xi ;w), \ \forall \ \xi \in {\tilde{U}}. \end{aligned}$$

    Moreover, define

    $$\begin{aligned} \ {\tilde{U}}_+=\left\{ \xi \in U \ | \ \omega _+(\xi )\in R_{\alpha _+}^{K_+}\right\} , \end{aligned}$$
    (4.3)

    where \(K_+>K\). If \(2K\Xi (K)\epsilon \le (\alpha _+-\alpha )r^2\), then \({\tilde{U}}_+\subset {\tilde{U}}\).
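The inclusion in item (3) can be motivated as follows. This is a sketch, assuming the bound \(|\langle k,{\hat{\omega }}\rangle |\le |k|\cdot \Vert {\hat{\omega }}\Vert \) in the norms used for k and \({\hat{\omega }}\); the full proof may organize the constants differently. Let \(\xi \in {\tilde{U}}_+\) and \(0<|k|\le K\). Since \(K<K_+\) and \(\Vert {\hat{\omega }}\Vert \le \epsilon /r^2\) by (4.1),

$$\begin{aligned} |\langle k,\omega (\xi )\rangle |\ge |\langle k,\omega _+(\xi )\rangle |-|\langle k,{\hat{\omega }}(\xi )\rangle | \ge \frac{\alpha _+}{\Xi (|k|)}-\frac{K\epsilon }{r^2} \ge \frac{\alpha _+}{\Xi (|k|)}-\frac{\alpha _+-\alpha }{2\Xi (K)} \ge \frac{\alpha }{\Xi (|k|)}, \end{aligned}$$

where the third inequality is the hypothesis \(2K\Xi (K)\epsilon \le (\alpha _+-\alpha )r^2\) and the last one uses \(\Xi (|k|)\le \Xi (K)\) and \(\alpha _+\ge \alpha \). Thus \(\omega (\xi )\in R_{\alpha }^K\), that is, \(\xi \in {\tilde{U}}\).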

4.1.1 Proof of Iteration Lemma

1. Truncation. Let

$$\begin{aligned} P=\sum \limits _{i,j,l}P_{ijl}(\xi ;x)y^iu^jv^l. \end{aligned}$$

Make a truncation for the perturbation P and let

$$\begin{aligned} R&=P_{000}(\xi ;x)+\big \langle P_{100}(\xi ;x),y\big \rangle +\big \langle P_{010}(\xi ;x),u\big \rangle +\big \langle P_{001}(\xi ;x),v\big \rangle \end{aligned}$$

and

$$\begin{aligned} \big \langle {\hat{Q}}_1(\xi ;x)z, z\big \rangle =\big \langle P_{020}(x)u,u\big \rangle +\big \langle P_{011}(x)u,v\big \rangle +\big \langle P_{002}(x)v,v\big \rangle , \end{aligned}$$

where, here and below, the dependence on \(\xi \) is suppressed when no confusion can arise.

Let

$$\begin{aligned} R^K= P_{000}^K(x)+\big \langle P_{100}^K(x),y\big \rangle +\big \langle P_{010}^K(x),u\big \rangle +\big \langle P_{001}^K(x),v\big \rangle , \end{aligned}$$

where

$$\begin{aligned} P_{ijl}^K(x)=\sum \limits _{k\in {\mathbb {Z}}^n,|k|\le K}P_{ijlk}e^{\textrm{i}\big \langle k, x\big \rangle }, \ \ \mathrm i=\sqrt{-1}. \end{aligned}$$

Since R is composed of the zeroth-order and first-order terms of P, by Cauchy’s estimate we have \(\Vert R\Vert _{U\times D_{s,r}}\le 4\epsilon \). Then we truncate the Fourier series of R at order K to obtain \(R^K\). By the definition of the norm, we have

$$\begin{aligned} \Vert R^K\Vert _{U\times D_{s,r}}\le 4\epsilon , \end{aligned}$$

and

$$\begin{aligned} \Vert R-R^K\Vert _{U\times D_{s-\sigma ,r}}&\le \sum \limits _{|k|>K}\Vert R_k\Vert e^{|k|(s-\sigma )}\nonumber \\&\le e^{-K\sigma }\sum \limits _{|k|>K}\Vert R_k\Vert e^{|k|s} \le 4e^{-K\sigma }\epsilon . \end{aligned}$$
(4.4)

2. Construction of symplectic transformations. The symplectic mapping \(\Phi \) is the time-1 map of the flow \(X_F^t\), where F will be determined later. Let

$$\begin{aligned} F=F_{000}(x)+\big \langle F_{100}(x),y\big \rangle +\big \langle F_{010}(x),u\big \rangle +\big \langle F_{001}(x),v\big \rangle . \end{aligned}$$

Let \(G=(F_{010}, F_{001})^T\) and let J be the standard symplectic matrix of order 2m. Writing \(H =N+R+(P-R)\), it follows that

$$\begin{aligned} N\circ \Phi= & {} N+\{ N,F\}+\int _0^1\{(1-t)\{ N, F\},F\}\circ X^t_F\,dt,\\ R\circ \Phi= & {} R+\int _0^1\{R, F\}\circ X^t_F\,dt,\\ \big \langle {\hat{Q}}_1(\xi ;x)z, z\big \rangle \circ \Phi= & {} \big \langle {\hat{Q}}_1(\xi ;x)z, z\big \rangle +\int _0^1\{\big \langle {\hat{Q}}_1(\xi ;x)z, z\big \rangle , F\}\circ X^t_F\,dt, \end{aligned}$$

where \(\{\cdot ,\cdot \}\) denotes the Poisson bracket.

Then

$$\begin{aligned} \{ N,F\}=\big \langle \big \langle Q_x,F_{100}\big \rangle z,z\big \rangle -\big \langle \omega , F_x\big \rangle +\big \langle \Omega v,F_{001}\big \rangle -\big \langle \Omega u, F_{010}\big \rangle +\big \langle JG, 2Q z\big \rangle . \end{aligned}$$

It follows that

$$\begin{aligned} H\circ X_{F}^1&=N-\partial _{\omega } F -\big \langle \Omega u,F_{010}\big \rangle +\big \langle \Omega v,F_{001}\big \rangle \\&\quad + R^K+\big \langle {\hat{Q}}_1z, z\big \rangle +\big \langle \big \langle Q_x,F_{100}\big \rangle z, z\big \rangle +\big \langle JG, 2Q z\big \rangle + P_+, \end{aligned}$$

where \(\partial _{\omega }F\overset{\text {def}}{=}\big \langle \omega ,F_x\big \rangle \) and

$$\begin{aligned} P_+= \left( R-R^K \right) + \left( P-R-\big \langle {\hat{Q}}_1(\xi ;x)z, z\big \rangle \right) \circ X_{F}^1 +{\tilde{P}}, \end{aligned}$$
(4.5)

where

$$\begin{aligned} {\tilde{P}}=\int _0^1\{(1-t)\{N, F\}+R +\big \langle {\hat{Q}}_1(\xi ;x)z, z\big \rangle , F\}\circ X^t_F\,dt. \end{aligned}$$

Then, we need to solve the equations:

$$\begin{aligned} \left\{ \begin{aligned}&\partial _{\omega }F_{000}=P^K_{000}-[P_{000}]\\&\partial _{\omega } F_{100}= P^K_{100}-[P_{100}]\\&\partial _{\omega }G- M G-2Q(\xi ; x) J G=g\\ \end{aligned}\right. , \end{aligned}$$
(4.6)

where \(M=\text{ diag }(-\Omega , \Omega )\), \(g=(P_{010}, P_{001})^T\), \([ \ \cdot \ ]\) denotes the mean value over \(T^n\).
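For the first two equations of (4.6), the solution can be written coefficientwise. As a sketch (the Fourier coefficients \(F_{000k}\) are notation introduced here, following the convention for \(P_{ijlk}\)), expand \(F_{000}(x)=\sum \nolimits _{0<|k|\le K}F_{000k}e^{\mathrm i\langle k,x\rangle }\). Then

$$\begin{aligned} \partial _{\omega }F_{000}=\sum \limits _{0<|k|\le K}\mathrm i\langle k,\omega \rangle F_{000k}e^{\mathrm i\langle k,x\rangle }, \qquad P^K_{000}-[P_{000}]=\sum \limits _{0<|k|\le K}P_{000k}e^{\mathrm i\langle k,x\rangle }, \end{aligned}$$

so formally \(F_{000k}=\frac{P_{000k}}{\mathrm i\langle k,\omega \rangle }\) for \(0<|k|\le K\) whenever \(\langle k,\omega \rangle \ne 0\). The same computation applies to \(F_{100}\). The denominators are the small divisors controlled by (2.4).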

3. Extension of small divisors. Take a \(C^{\infty }({\mathbb {R}})\)-smooth function \(\psi (t)\) such that

$$\begin{aligned} \psi (t)= \left\{ \begin{aligned}&~ 0, \ \ \ |t| \le \frac{1}{2}, \\&~1, \ \ \ |t| \ge 1. \end{aligned}\right. \end{aligned}$$

For \(h>0\), set \(\psi _h(t)= \psi (\frac{t}{h})\). Then \(\psi _h(t)\in C^{\infty }({\mathbb {R}})\) with the estimate:

$$\begin{aligned} |\frac{{d}^{l} \ }{d t^{l}}\psi _h(t)|\le \frac{c_{l}}{h^{l}}, \quad \forall t\in {\mathbb {R}}, \ \ \forall l\ge 1, \end{aligned}$$

where \(c_{l}\) is a constant depending on l.
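This estimate is the chain rule applied to the rescaling (a brief check; the explicit value of \(c_l\) below is one admissible choice, not necessarily the one intended in the text):

$$\begin{aligned} \frac{{d}^{l} \ }{d t^{l}}\psi _h(t)=\frac{1}{h^{l}}\psi ^{(l)}\left( \frac{t}{h}\right) , \qquad c_l=\sup \limits _{\frac{1}{2}\le |\tau |\le 1}|\psi ^{(l)}(\tau )|<\infty , \end{aligned}$$

where the supremum is finite because, for \(l\ge 1\), \(\psi ^{(l)}\) is continuous and vanishes outside the compact set \(\{\frac{1}{2}\le |\tau |\le 1\}\).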

Set

$$\begin{aligned} h=\frac{\alpha }{\Xi (|k|)}, \ \ t_k(\xi )=\langle k,\omega (\xi )\rangle ,\ \ f_k(\xi )=\frac{\psi _{h}(t_k(\xi ))}{\mathrm i\langle k,\omega (\xi )\rangle }. \end{aligned}$$

Recalling the definition of \({\tilde{U}}\), it follows easily that for \(\xi \in {\tilde{U}}\), \(f_k(\xi )=\frac{1}{\mathrm i\langle k,\omega (\xi )\rangle }\). Note that even if \({\tilde{U}}=\emptyset \), the extension \(f_k(\xi )\) is still well defined on U. Moreover, \(f_k(\xi )\in C^{\ell }(U)\) and satisfies

$$\begin{aligned} \bigl |\frac{\partial ^{\beta }f_k}{\partial \xi ^{\beta }} (\xi )\bigr |\le c h^{-|\beta |-1}|k|^{|\beta |}, \quad \xi \in U,\ \ \forall |\beta |\le \ell . \end{aligned}$$
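For \(\beta =0\) the bound can be seen directly (a sketch, under the additional assumption, not stated in the text, that \(0\le \psi \le 1\)). Since \(\psi _h(t)=0\) for \(|t|\le \frac{h}{2}\),

$$\begin{aligned} |f_k(\xi )|=\frac{|\psi _h(t_k(\xi ))|}{|t_k(\xi )|}\le \frac{2}{h}=\frac{2\, \Xi (|k|)}{\alpha }, \quad \xi \in U, \end{aligned}$$

so the cutoff removes the singularity of \(\frac{1}{\langle k,\omega (\xi )\rangle }\) at the cost of a factor of 2. Roughly speaking, each derivative in \(\xi \) then contributes a factor of order \(|k|/h\) by the chain rule, which is consistent with the form \(h^{-|\beta |-1}|k|^{|\beta |}\), provided the derivatives of \(\omega \) up to order \(\ell \) are bounded on U.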

Set

$$\begin{aligned} F_{\flat k}(\xi ;{\bar{w}})=f_k(\xi )(P_{\flat k}-[P_{\flat k}])=\frac{\psi _{h}(t_k(\xi ))}{\mathrm i\langle k,\omega (\xi )\rangle }(P_{\flat k}-[P_{\flat k}]), \end{aligned}$$

where the subscript \(\flat =000,100, \ 0<|k|\le K\). Then we extend \( F_{\flat k}(\xi ;{\bar{w}})\) for \(\xi \) from \({\tilde{U}}\) to the whole set U.

4. Solving the homological equations. The first two equations of (4.6) are standard. By the extension of small divisors, in the same way as in [24, 25], we obtain \(F_{000}\) and \(F_{100}\) such that

$$\begin{aligned} \Vert F_{000}\Vert _{U\times D_{s-2\sigma ,r}}&\le c\alpha ^{-1} \sum \limits _{k} \Xi ^{\ell +1}(|k|)|k|^{\ell }e^{-|k|\sigma }\Vert P_{000k}\Vert e^{|k|s}\\&\le c\alpha ^{-1}\sup \limits _{t\ge 0}t^{\ell }\Xi ^{\ell +1}(t)e^{-\sigma t}\Vert P_{000}\Vert _{U\times D_{s-\sigma ,r}}\\&\le c\alpha ^{-1}\epsilon \sup \limits _{t\ge 0}(1+t)^{\ell }\Xi ^{\ell +1}(t)e^{-\sigma t}\\&\le c\alpha ^{-1} \Gamma _{\ell +1}\epsilon \end{aligned}$$

and

$$\begin{aligned} \Vert F_{100}\Vert _{U\times D_{s-2\sigma ,r/2}}\le c \alpha ^{-1}r^{-2}\Gamma _{\ell +1}\epsilon , \end{aligned}$$

where \(\Gamma _{\ell +1}(\sigma )=\sup \limits _{t\ge 0}(1+t)^{\ell +1}\Xi ^{\ell +1}(t)e^{-\sigma t}\). Moreover, for \(\xi \in {\tilde{U}}\), where \({\tilde{U}}\) is given in (4.2), \(F_{000}\) and \( F_{100}\) are solutions of the equations.

For \(G=(F_{010}, F_{001})^T\), we apply Lemma 5.1 with \(Q_0=M,\ {\hat{Q}}=2QJ\) to have G satisfying

$$\begin{aligned} \Vert F_{010} \Vert _{U\times D_{s-2\sigma ,r/2}}\le c r^{-1}\epsilon ,\ \ \Vert F_{001} \Vert _{U\times D_{s-2\sigma ,r/2}}\le c r^{-1}\epsilon . \end{aligned}$$
(4.7)

Therefore,

$$\begin{aligned} \Vert F\Vert _{U\times D_{s-2\sigma ,r/2}}\le c\alpha ^{-1}\Gamma _{\ell +1}\epsilon , \end{aligned}$$
(4.8)

and

$$\begin{aligned} \Vert \partial _x F\Vert _{U\times D_{s-2\sigma ,r/2}}&\le \sum \limits _{k} |k|\cdot \Vert F_{k}\Vert e^{|k|(s-\sigma )}\\&\le c\alpha ^{-1}\sup \limits _{t\ge 0}t\cdot (1+t)^{\ell +1}\Xi ^{\ell +1}(t)e^{-\sigma t}\epsilon \le c \alpha ^{-1} \Gamma \epsilon , \end{aligned}$$

where \(\Gamma (\sigma )=\sup \limits _{t\ge 0}(1+t)^{\ell +2}\Xi ^{\ell +2}(t)e^{-\sigma t}\).

5. Estimates of the symplectic mapping. Write the symplectic mapping as

$$\begin{aligned} \Phi (\xi ; w)=X_{F}^1=({\tilde{a}}(\xi ;x), {\tilde{b}}(\xi ;w), {\tilde{d}} (\xi ;x,u), {\tilde{e}} (\xi ;x,v)). \end{aligned}$$

By the construction of F and \(\Phi \), it follows that \({\tilde{b}}\) is affine in (y, u, v), and \({\tilde{d}}\) and \({\tilde{e}}\) are translations in u and v, respectively. Moreover, by the estimates of F, we have

$$\begin{aligned}{} & {} \Vert {\tilde{a}}-id\Vert _{U\times D_{ s-2\sigma ,r/2}}\le c \Gamma _{\ell +1} E,\ \ \Vert {\tilde{b}}-id\Vert _{U\times D_{ s-2\sigma ,r/4}}\le c \alpha ^{-1}\Gamma \epsilon ,\\{} & {} \Vert {\tilde{d}}-id\Vert _{U\times D_{s-2\sigma ,r/4}}\le c r^{-1}\epsilon ,\ \ \Vert {\tilde{e}}-id\Vert _{U\times D_{ s-2\sigma ,r/4}}\le c r^{-1}\epsilon . \end{aligned}$$

And

$$\begin{aligned} {\mathcal {D}}\Phi =\begin{pmatrix} {\tilde{a}}_x &{} \quad 0 &{} \quad 0 &{} \quad 0 \\ {\tilde{b}}_x&{} \quad {\tilde{b}}_y&{} \quad {\tilde{b}}_u &{} \quad {\tilde{b}}_{v}\\ {\tilde{d}}_x&{} \quad 0 &{} \quad Id&{} \quad 0\\ {\tilde{e}}_x&{} \quad 0 &{} \quad 0 &{} \quad Id \end{pmatrix}. \ \ \end{aligned}$$

By Lemma 5.4, we have

$$\begin{aligned} \frac{\Gamma _{\ell +1}(\sigma )}{\sigma }\le \frac{\Gamma _{{\ell +2}}(\sigma )}{2({\ell +1})}= \frac{\Gamma (\sigma )}{2({\ell +1})}. \end{aligned}$$

Assume that

$$\begin{aligned} c \Gamma E\le \sigma <\eta ^2\le \frac{1}{4}. \end{aligned}$$
(4.9)

Then the symplectic mapping \(\Phi \) maps \(D_{s-4\sigma ,\eta r}\) into \(D_{ s-3\sigma ,2\eta r}\), with the estimates

$$\begin{aligned} \Vert W(\Phi -id)\Vert _{U\times D_{s-4\sigma ,\eta r}} \le c\Gamma E \end{aligned}$$
(4.10)

and

$$\begin{aligned} \Vert W({\mathcal {D}}\Phi -Id)W^{-1}\Vert _{U\times D_{s-4\sigma ,\eta r}} \le c\Gamma E, \end{aligned}$$
(4.11)

where the weight matrix \(W=\text{ diag }(\sigma ^{-1}Id,r^{-2}Id,r^{-1}Id,r^{-1}Id)\).

6. Estimates of the new error terms. Recall that \(N=\big \langle \omega (\xi ),y\big \rangle +\big \langle \Omega u,v\big \rangle +\big \langle Q z, z\big \rangle \). Let \({\hat{Q}}_2=Q_x\cdot F_{100}.\) Then it follows that \(H\circ \Phi =N_++P_+,\) where \(N_+=N+{\hat{N}}\), with

$$\begin{aligned} {\hat{\omega }}(\xi )=[P_{100}],\ \ {\hat{Q}}(\xi ; x)={\hat{Q}}_1+ {\hat{Q}}_2. \end{aligned}$$

By standard estimate, we get

$$\begin{aligned} \Vert {\hat{\omega }}\Vert _{U}\le \alpha E, \ \ \Vert {\hat{Q}}\Vert _{U\times T_{s-4\sigma }}\le c\Gamma E. \end{aligned}$$

Also note that \(P_+\) is given in (4.5). By (4.4), we have

$$\begin{aligned} \left\| R-R^K \right\| _{U\times D_{s-\sigma ,r}}\le 4 e^{-K\sigma }\epsilon . \end{aligned}$$

By Taylor’s formula with remainder and Cauchy’s estimate, we have

$$\begin{aligned} \left\| P-R- \left\langle {\hat{Q}}_1(\xi ;x)z, z \right\rangle \right\| _{U\times D_{s,2\eta r}}\le c\eta ^3\epsilon . \end{aligned}$$

Combining with the estimates of F, R and \({\hat{Q}}_1\), we get

$$\begin{aligned} \Vert P_+\Vert _{U\times D_{s-4\sigma ,\eta r}}\le c\Gamma E\epsilon +ce^{-K\sigma }\epsilon +c{\eta ^3}\epsilon , \end{aligned}$$
(4.12)

where c is a constant independent of KAM steps.
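The factor \(\eta ^3\) in (4.12) can be traced to a scaling argument (a sketch, assuming the majorant form of the norm defined in Section 2). Every monomial \(y^iu^jv^l\) of \(P-R-\langle {\hat{Q}}_1z,z\rangle \) has weighted degree \(2|i|+|j|+|l|\ge 3\), and passing from \(D_{s,r}\) to \(D_{s,2\eta r}\) replaces the bounds \(|y|_1\le r^2\), \(|u|_2\le r\), \(|v|_2\le r\) by \((2\eta )^2r^2\), \(2\eta r\), \(2\eta r\). Each such monomial is therefore multiplied by at most

$$\begin{aligned} (2\eta )^{2|i|+|j|+|l|}\le (2\eta )^3=8\eta ^3, \ \ \text{ since } \ 2\eta \le 1 \ \text{ by } \text{(4.9) }, \end{aligned}$$

which gives a bound of the form \(c\eta ^3\epsilon \).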

As in [17], we choose the iteration parameters so that the KAM step can be iterated. The idea is as follows. With suitable choices of \(K,\ \eta , \ \epsilon _+,\ r_+\) such as

$$\begin{aligned} e^{-K\sigma }\sim \Gamma E, \ \ \eta ^3\sim \Gamma E, \ \ \epsilon _+\sim \Gamma E\epsilon , \ \ r_+\sim \eta r, \end{aligned}$$

we can have

$$\begin{aligned} \Vert P_+\Vert _{U\times D_{s-4\sigma ,\eta r}}\le c\Gamma E\epsilon +ce^{-K\sigma }\epsilon +c{\eta ^3}\epsilon \le \epsilon _+=\alpha _+ r_+^2 E_+. \end{aligned}$$

Moreover, it follows that

$$\begin{aligned} \frac{\epsilon _+}{r_+^2}\sim \frac{\Gamma E\epsilon }{\eta ^2r^2}\sim \frac{\Gamma E^2}{\eta ^2}\sim \Gamma ^{\frac{1}{3}}E^{\frac{4}{3}}\sim E_+. \end{aligned}$$

In each KAM step, E decreases rapidly, and once E is small enough, \(\Gamma ^{\frac{1}{3}}E^{\frac{4}{3}}\) is much smaller than E.

4.1.2 KAM Iteration

Recall that

$$\begin{aligned} \Gamma (\sigma )=\sup \limits _{t\ge 0}(1+t)^{\ell +2}\Xi ^{\ell +2}(t)e^{-\sigma t}. \end{aligned}$$

By Lemma 5.3, for \(\sigma =s/2\), there exists a sequence \(\sigma _0\ge \sigma _1\ge \sigma _2\ge \cdots >0\), such that \(\sigma _0+\sigma _1+\sigma _2+\cdots =\sigma \) and

$$\begin{aligned} \Delta (\sigma )=\mathop {\Pi }\limits _{j=0}^{\infty }\Gamma (\sigma _j)^{\kappa _j},\ \ \kappa _j=\frac{\kappa -1}{\kappa ^{j+1}} \ \ \text{ with }\ \ \kappa =\frac{4}{3}. \end{aligned}$$

At the initial step, let \(H_0=H\) and set \(s_0=s\), \(r_0=r, E_0=\gamma \). For \(i\ge 0\), define

$$\begin{aligned} \alpha _{i+1}=(1-\frac{1}{2^{i+3}})\alpha , \ \ \Theta _i=\mathop {\Pi }\limits _{j=0}^{i-1}\bigl (a2^{j}\Gamma (\sigma _j)\bigr )^{\kappa _j}, \ \ E_i=(\Theta _iE_0)^{\kappa ^i}, \end{aligned}$$

where \(\Theta _0=1\), \(a=(2c)^3\), and c is the constant in the estimate of \(P_+\). Let \( \epsilon _i=\alpha _ir_i^2E_i.\) Moreover, define \(K_i\) and \(\eta _i\) by

$$\begin{aligned} e^{-K_i\sigma _i}=2^{i+4}\Gamma (\sigma _i) E_i,\ \ \eta _i^3=2^{i}\Gamma (\sigma _i)E_i. \end{aligned}$$

Here the multipliers \(2^{i+4}\) and \(2^i\) are required for small divisor conditions. Define \( r_{i+1}=\eta _ir_i, \ s_{i+1}=s_i-4\sigma _i. \)

Note that

$$\begin{aligned} \Delta (\sigma )=\mathop {\Pi }\limits _{j=0}^{\infty }\Gamma (\sigma _j)^{\kappa _j}, \ \sum \limits _{j=0}^{\infty }\kappa _j=1, \ \ \sum \limits _{j=0}^{\infty }j\kappa _j=\frac{1}{\kappa -1}. \end{aligned}$$

Then we get

$$\begin{aligned} \Theta _i\rightarrow 8a \Delta (\sigma ), \ \ i\rightarrow \infty . \end{aligned}$$
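The limit can be verified from the two sums above with \(\kappa =\frac{4}{3}\) (a direct computation using the geometric series):

$$\begin{aligned} \sum \limits _{j=0}^{\infty }\kappa _j&=\frac{\kappa -1}{\kappa }\sum \limits _{j=0}^{\infty }\kappa ^{-j}=1, \qquad \sum \limits _{j=0}^{\infty }j\kappa _j=\frac{\kappa -1}{\kappa }\cdot \frac{\kappa ^{-1}}{(1-\kappa ^{-1})^2}=\frac{1}{\kappa -1}=3,\\ \lim \limits _{i\rightarrow \infty }\Theta _i&=a^{\sum _j\kappa _j}\cdot 2^{\sum _jj\kappa _j}\cdot \mathop {\Pi }\limits _{j=0}^{\infty }\Gamma (\sigma _j)^{\kappa _j}=a\cdot 2^3\cdot \Delta (\sigma )=8a\Delta (\sigma ). \end{aligned}$$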

Note that \(\Gamma (\sigma _i)\le \Gamma (\sigma _j)\) for all \(j\ge i\). It follows that

$$\begin{aligned} a2^{i}\Gamma (\sigma _i) =\mathop {\Pi }\limits _{j=i}^{\infty }\bigl (a2^{i}\Gamma (\sigma _i)\bigr ) ^{\kappa _j\kappa ^i} \le \left( \mathop {\Pi }\limits _{j=i}^{\infty } \left( a2^{j}\Gamma (\sigma _j)\right) ^{\kappa _j} \right) ^{\kappa ^i} \end{aligned}$$

and then

$$\begin{aligned} a2^{i}\Gamma (\sigma _i)\cdot E_i&\le \left( \mathop {\Pi }\limits _{j=0}^{\infty }\left( a2^{j}\Gamma (\sigma _j)\right) ^{\kappa _j} E_0\right) ^{\kappa ^i}=(8a\Delta (\sigma ) E_0)^{\kappa ^i}. \end{aligned}$$
(4.13)

Denote \(D_i=D_{s_i,r_i}\). By Lemma 4.1, there exists a sequence of Hamiltonians \(\{H_i(\xi ; w),\ \xi \in U,\ w\in D_i\}\) such that \(H_i\in C^{\ell ;a}(U\times D_i)\) and \(H_{i}=N_{i}+P_{i},\) where

$$\begin{aligned} N_{i}=\langle \omega _{i},y\rangle +\langle \Omega u, v\rangle +\langle Q_{i}(x)z, z\rangle , \end{aligned}$$

and \(P_i\) satisfies that

$$\begin{aligned} \Vert P_{i}\Vert _{U\times D_{i}}\le \epsilon _i=\alpha _ir_i^2E_i. \end{aligned}$$
(4.14)

Moreover, there exists a sequence of parameterized symplectic transformations \(\{\Phi _i(\xi ; w), \ \xi \in U,\ w\in D_{i+1}\}\) such that for each \(\xi \in U\), \(\Phi _i(\xi ;\cdot ):D_{i+1}\rightarrow D_i\), and \(\Phi _i\in C^{\ell ;a}(U\times D_{i+1})\) with the estimates

$$\begin{aligned} \Vert W_i(\Phi _i-id)\Vert _{ U\times D_{i+1}}\le c \Gamma (\sigma _i) E_i \end{aligned}$$
(4.15)

and

$$\begin{aligned} \Vert W_i({\mathcal {D}}\Phi _i-Id)W_i^{-1}\Vert _{ U\times D_{i+1}}\le c \Gamma (\sigma _i)E_i. \end{aligned}$$
(4.16)

Let

$$\begin{aligned} {\tilde{U}}_i=\left\{ \xi \in U \ | \ |\langle k,\omega _i(\xi )\rangle |\ge \frac{\alpha _i}{\Xi (|k|)}, \ 0< |k|\le K_i \right\} . \end{aligned}$$

Then for \(\xi \in {\tilde{U}}_i\),

$$\begin{aligned} H_{i+1}=H_i\circ \Phi _i=N_{i+1}+P_{i+1}, \end{aligned}$$
(4.17)

where

$$\begin{aligned} N_{i+1}=\langle \omega _{i+1},y\rangle +\langle \Omega u, v\rangle +\langle Q_{i+1}(x)z, z\rangle , \end{aligned}$$

and by (4.9) and (4.12),

$$\begin{aligned} \Vert P_{i+1}\Vert _{ U\times D_{i+1}}&\le c\Gamma (\sigma _i) E_i\epsilon _i+ce^{-K_i\sigma _i}\epsilon _i+c{\eta _i^3}\epsilon _i\\&\le c\cdot 2^{i}\Gamma (\sigma _i) E_i\epsilon _i. \end{aligned}$$
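To make the second inequality explicit, substitute the definitions \(e^{-K_i\sigma _i}=2^{i+4}\Gamma (\sigma _i)E_i\) and \(\eta _i^3=2^{i}\Gamma (\sigma _i)E_i\). In the sketch below the factor 18 is only illustrative; it is absorbed into the generic constant \(c\).

$$\begin{aligned} c\Gamma (\sigma _i)E_i\epsilon _i+ce^{-K_i\sigma _i}\epsilon _i+c\eta _i^3\epsilon _i =c\bigl (1+2^{i+4}+2^{i}\bigr )\Gamma (\sigma _i)E_i\epsilon _i \le 18c\cdot 2^{i}\Gamma (\sigma _i)E_i\epsilon _i, \ \ i\ge 0. \end{aligned}$$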

By the definitions of \(\eta _i\), \(\epsilon _i\), \(E_{i}\) and \(r_{i+1}=\eta _ir_i\), it follows that

$$\begin{aligned} \frac{c2^i\Gamma (\sigma _i)E_i\epsilon _i}{r^2_{i+1}\alpha _{i+1}} \le \frac{2c2^i\Gamma (\sigma _i)E_i^2}{\eta _i^2} \le \bigl ((2c)^32^i\Gamma (\sigma _i)\bigr )^{\frac{1}{3}}E_i^{\frac{4}{3}} \le \bigl (a2^i\Gamma (\sigma _i)\bigr )^{\frac{1}{3}}E_i^{\frac{4}{3}}\le E_{i+1}, \end{aligned}$$

where \(a=(2c)^3\). Then

$$\begin{aligned} \Vert P_{i+1}\Vert _{ U\times D_{i+1}} \le \epsilon _{i+1}=\alpha _{i+1}r_{i+1}^2E_{i+1}. \end{aligned}$$
(4.18)
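The chain of inequalities leading to (4.18) can be unpacked as follows. This is a sketch assuming \(\alpha _i\le 2\alpha _{i+1}\), a property of the sequence \(\{\alpha _i\}\) whose definition lies outside this passage. Using \(\epsilon _i=\alpha _ir_i^2E_i\), \(r_{i+1}=\eta _ir_i\) and \(\eta _i^{3}=2^{i}\Gamma (\sigma _i)E_i\),

$$\begin{aligned} \frac{\epsilon _i}{r_{i+1}^2\alpha _{i+1}}&=\frac{\alpha _i}{\alpha _{i+1}}\cdot \frac{E_i}{\eta _i^2}\le \frac{2E_i}{\eta _i^2}, \qquad \eta _i^{-2}=\bigl (2^{i}\Gamma (\sigma _i)E_i\bigr )^{-\frac{2}{3}},\\ \frac{2c2^{i}\Gamma (\sigma _i)E_i^2}{\eta _i^2}&=2c\bigl (2^{i}\Gamma (\sigma _i)\bigr )^{\frac{1}{3}}E_i^{\frac{4}{3}} =\bigl ((2c)^32^{i}\Gamma (\sigma _i)\bigr )^{\frac{1}{3}}E_i^{\frac{4}{3}}. \end{aligned}$$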

In addition,

$$\begin{aligned} |{\hat{\omega }}_i|\le \alpha _i E_i, \ \ \ \Vert {\hat{Q}}_{i}\Vert \le c\Gamma (\sigma _i)E_i, \end{aligned}$$
(4.19)

where \({\hat{\omega }}_i=\omega _{i+1}-\omega _i\) and \({\hat{Q}}_i=Q_{i+1}-Q_i\).

Let \(\Psi _0=id\), \(\Psi _i=\Phi _0\circ \Phi _1\circ \cdots \circ \Phi _{i-1}\), \(i\ge 1\). By Lemma 4.1 again, if \(2K_i\Xi (K_i)\epsilon _i\le (\alpha _{i+1}-\alpha _i)r_i^2\) for all \(i\ge 0\), then \({\tilde{U}}_i\supset {\tilde{U}}_{i+1}\) for all \(i\ge 0\). The monotonicity of \(\{{\tilde{U}}_i\}\) implies that for \(\xi \in {\tilde{U}}_i\), \(H_i=H\circ \Psi _i\).

Now we verify the assumption \(2K_i\Xi (K_i)\epsilon _i\le (\alpha _{i+1}-\alpha _i)r_i^2\), which is equivalent to \(2^{i+4}K_i\Xi (K_i)\epsilon _i/r_i^2\le \alpha \). By the definition of \(\epsilon _i\), we need to prove

$$\begin{aligned} E_i\le \frac{1}{(2^{i+4}-4)K_i\Xi (K_i)}. \end{aligned}$$
(4.20)

Recall \(e^{-K_{i}\sigma _{i}}=2^{i+4}\Gamma (\sigma _i) E_i.\) By the definition of \(\Gamma (\sigma _i)\), it follows that

$$\begin{aligned} \frac{1}{K_{i}\Xi (K_{i})}&=\frac{e^{-K_{i}\sigma _{i}}}{K_{i}\Xi (K_{i})e^{-K_{i}\sigma _{i}}} =\frac{2^{i+4}\Gamma (\sigma _i)E_{i}}{K_{i}\Xi (K_{i})e^{-K_{i}\sigma _{i}}}\\&\ge \frac{2^{i+4}\Gamma (\sigma _i)E_{i}}{\Gamma (\sigma _i)}=2^{i+4}E_i, \end{aligned}$$

then

$$\begin{aligned} E_i\le \frac{1}{2^{i+4}K_i\Xi (K_i)}\le \frac{1}{(2^{i+4}-4)K_i\Xi (K_i)}, \end{aligned}$$

which shows (4.20).
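The stated equivalence and the form of (4.20) are consistent with the choice \(\alpha _i=\alpha (1-2^{-(i+2)})\). This explicit formula is an assumption made only for the following check, since \(\alpha _i\) is defined outside this passage. Under it, and with \(\epsilon _i=\alpha _ir_i^2E_i\),

$$\begin{aligned} \alpha _{i+1}-\alpha _i&=\alpha \bigl (2^{-(i+2)}-2^{-(i+3)}\bigr )=\frac{\alpha }{2^{i+3}},\\ \frac{2^{i+4}K_i\Xi (K_i)\epsilon _i}{r_i^2}&=2^{i+4}\bigl (1-2^{-(i+2)}\bigr )\alpha K_i\Xi (K_i)E_i=(2^{i+4}-4)\alpha K_i\Xi (K_i)E_i, \end{aligned}$$

so the requirement \(2^{i+4}K_i\Xi (K_i)\epsilon _i/r_i^2\le \alpha \) is exactly (4.20).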

4.1.3 Convergence

Now we consider the convergence of the KAM iteration. Note that \(r_i\rightarrow 0\) as \(i\rightarrow \infty \), and write \(D_*=D_{s/2,0}\) for the limit of \(D_i\). We first consider the convergence of \(\{\Psi _i\}.\) The proof proceeds in the same way as in [17]. First we have

$$\begin{aligned} \Vert W_0 {\mathcal {D}}\Psi _{i-1}W_{i-1}^{-1}\Vert&\le \Vert W_0 {\mathcal {D}}\Phi _{0}W_0^{-1}\Vert \Vert W_0W_1^{-1}\Vert \Vert W_1{\mathcal {D}}\Phi _{1}W_1^{-1}\Vert \cdots \Vert W_{i-3}W_{i-2}^{-1}\Vert \\&\quad \Vert W_{i-2}{\mathcal {D}}\Phi _{i-2}W_{i-2}^{-1}\Vert \Vert W_{i-2}W_{i-1}^{-1}\Vert \\&\le \prod _{j=0}^{i-2} \left( 1+c\Gamma \left( \sigma _j \right) E_j \right) . \end{aligned}$$

By (4.13), if \(a\Delta (\sigma )E_0<1\), then \(\prod _{j=0}^{\infty } (1+c\Gamma (\sigma _j)E_j)<\infty \). Estimates (4.15) and (4.16) then imply that

$$\begin{aligned} \Vert W_0(\Psi _i-\Psi _{i-1})\Vert _{ U\times D_{i}}&=\Vert W_0(\Psi _{i-1}\circ \Phi _{i-1}-\Psi _{i-1})\Vert _{ U\times D_{i}}\\&\le \Vert W_0 {\mathcal {D}}\Psi _{i-1}W_{i-1}^{-1}\Vert _{ U\times D_{i}}\cdot \Vert W_{i-1}(\Phi _{i-1}-id)\Vert _{ U\times D_{i}}\\&\le c\Gamma (\sigma _{i-1})E_{i-1}, \end{aligned}$$
(4.21)

and so \(\{\Psi _i\}\) is convergent on \(D_*\).
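The convergence can be made explicit by a summability argument. The sketch below assumes \(a\ge 1\), so that \(\Gamma (\sigma _j)E_j\le a2^{j}\Gamma (\sigma _j)E_j\), and writes \(\theta =a\Delta (\sigma )E_0<1\), a symbol introduced only here. By (4.13) and Bernoulli's inequality \(\kappa ^j\ge 1+j(\kappa -1)\),

$$\begin{aligned} \sum _{j=0}^{\infty }\Gamma (\sigma _j)E_j&\le \sum _{j=0}^{\infty }\theta ^{\kappa ^j}\le \theta \sum _{j=0}^{\infty }\theta ^{j(\kappa -1)}<\infty , \qquad \prod _{j=0}^{\infty }\bigl (1+c\Gamma (\sigma _j)E_j\bigr )\le \exp \Bigl (c\sum _{j=0}^{\infty }\Gamma (\sigma _j)E_j\Bigr )<\infty ,\\ \Vert W_0(\Psi _{i+p}-\Psi _i)\Vert _{U\times D_{i+p}}&\le \sum _{j=i}^{i+p-1}c\Gamma (\sigma _j)E_j\rightarrow 0 \ \ \text {as } i\rightarrow \infty , \text { uniformly in } p\ge 1. \end{aligned}$$

Hence \(\{\Psi _i\}\) is a Cauchy sequence in the weighted norm.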

Note that \(\Psi _i\) has the same structure as \(\Phi _i\), and recall \(z=(u,v)^T\). Let

$$\begin{aligned} \Psi _i(\xi ; w)=(A_i(x), y+B_i(x)+C_i(x)y+D_i(x)z, z+E_i(x)). \end{aligned}$$

Since \(\{\Psi _i(\xi ; w)\}\) is convergent for \(x\in T_{s/2}, y=0, z=0,\) the sequences \(\{A_i(x)\}\), \(\{B_i(x)\}\) and \(\{E_i(x)\}\) are convergent as \(i\rightarrow \infty \) for \(x\in T_{s/2}.\) Below we prove that \(\{C_i(x)\}\) and \(\{D_i(x)\}\) are also convergent on \(T_{s/2}.\)

Let

$$\begin{aligned} \Phi _i(\xi ; w)=(a_i(x), y+b_i(x)+c_i(x)y+d_i(x)z, z+e_i(x)). \end{aligned}$$

Then \(a_i: x\in T_{s_{i+1}}\rightarrow a_i(x)\in T_{s_i}.\) By the estimate for \({\mathcal {D}}\Phi \) in Lemma 4.1, it follows that

$$\begin{aligned} \Vert c_i(x)\Vert \le c \Gamma (\sigma _i)E_i,\ \Vert d_i(x)\Vert \le c \Gamma (\sigma _i)E_ir_i, \ x\in T_{s_{i+1}}. \end{aligned}$$

Moreover,

$$\begin{aligned} \begin{pmatrix} Id_{n}+C_{i}(x)&{} \quad D_i(x)\\ 0 &{} \quad Id_{2m} \end{pmatrix}= \prod _{j=0}^{i-1} \begin{pmatrix} Id_{n}+c_{j}({\tilde{x}}_j(x))&{} \quad d_j({\tilde{x}}_j(x))\\ 0 &{} \quad Id_{2m} \end{pmatrix}, \ \ \end{aligned}$$

where \(Id_{k}\) denotes the \(k\times k\) identity matrix and, with the convention \({\tilde{x}}_{i-1}(x)=x\),

$$\begin{aligned} {\tilde{x}}_j(x)=a_{j+1}\circ a_{j+2}\circ \cdots \circ a_{i-1}(x), \ \ {\tilde{x}}_j: \ x\in T_{s_i}\rightarrow {\tilde{x}}_j(x)\in T_{s_{j+1}}. \end{aligned}$$

Then we have

$$\begin{aligned} \Vert c_j({\tilde{x}}_j(x))\Vert \le c \Gamma (\sigma _j)E_j, \ \Vert d_j({\tilde{x}}_j(x))\Vert \le c \Gamma (\sigma _j)E_jr_j, \ x\in T_{s_{i}}. \end{aligned}$$

Thus, as \(i\rightarrow \infty \), \(\{C_i(x)\}\) and \(\{D_i(x)\}\) are convergent on \(T_{s/2}.\) Since \(\Psi _i\) is affine in \((y,z)\), it follows that \(\Psi _i\) is in fact convergent on \(D_{s/2,r/2}\). Let \(\Psi _*=\lim _{i\rightarrow \infty }\Psi _i\).
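The uniform bound behind the statement on \(\{C_i\}\) and \(\{D_i\}\) can be sketched with the elementary estimate for products of near-identity matrices. Here \(M_j\) is shorthand, introduced only for this sketch, for the block matrix with entries \(c_j({\tilde{x}}_j(x))\) and \(d_j({\tilde{x}}_j(x))\) and zeros elsewhere, and the matrix norm is assumed to be submultiplicative:

$$\begin{aligned} \Bigl \Vert \prod _{j=0}^{i-1}(Id+M_j)-Id\Bigr \Vert \le \prod _{j=0}^{i-1}\bigl (1+\Vert M_j\Vert \bigr )-1\le \exp \Bigl (\sum _{j=0}^{i-1}\Vert M_j\Vert \Bigr )-1. \end{aligned}$$

Since \(\sum _j\Gamma (\sigma _j)E_j<\infty \), the right-hand side is bounded uniformly in \(i\). This gives only boundedness; the convergence itself additionally requires comparing consecutive products.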

Note that \(\omega _i=\omega _0+\sum \nolimits _{j=0}^{i-1}{\hat{\omega }}_j\). By (4.19), it follows that \(\omega _i\rightarrow \omega _*\) as \(i\rightarrow \infty \); moreover,

$$\begin{aligned} |\omega _*-\omega _i|\le \sum \limits _{j=i}^{\infty }\alpha _jE_j\le 2\alpha _i E_i. \end{aligned}$$

In particular, noting \(\omega _0=\omega \), we have

$$\begin{aligned} |\omega _*-\omega |\le 2\alpha E_0. \end{aligned}$$
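The factor 2 in the tail estimate comes from a geometric comparison. As a sketch, assume the ratio condition \(\alpha _{j+1}E_{j+1}\le \frac{1}{2}\alpha _jE_j\) for all \(j\ge 0\); this is a hypothetical condition, plausible for small \(E_0\) because \(E_j\) decays super-exponentially by (4.13). Then

$$\begin{aligned} \sum \limits _{j=i}^{\infty }\alpha _jE_j\le \alpha _iE_i\sum \limits _{m=0}^{\infty }2^{-m}=2\alpha _iE_i. \end{aligned}$$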

Also note that \(Q_i=\sum \nolimits _{j=0}^{i-1}{\hat{Q}}_j.\) By (4.19), \( Q_i\rightarrow Q_*\) as \(i\rightarrow \infty \). Recall that \(E_0=\gamma .\) If \(\gamma \) is sufficiently small, then

$$\begin{aligned} \Vert Q_i\Vert _{ U\times T_{s_i}}\le \sum \limits _{j=0}^{i-1}c\Gamma (\sigma _j) E_j\le c\Delta (\sigma )E_0=c\Delta (\sigma )\gamma \ll 1. \end{aligned}$$

Thus \(\lim _{i\rightarrow \infty }N_i=N_*\), where

$$\begin{aligned} N_*= \big \langle \omega _*,y\big \rangle +\big \langle \Omega u,v\big \rangle + \big \langle Q_*(\xi ; x)z, z\big \rangle . \end{aligned}$$
(4.22)

Let \(P_*\) denote the limit of \(P_i\); then \(P_*\in C^{\ell ;a}(U\times D_{s/2,r/2}).\) By (4.14) and Cauchy’s estimate we have \(\partial _y P_*=0, \ \partial _z P_*=0, \ \partial ^2_{zz} P_*=0\) at \((y, z)=(0,0).\) Thus,

$$\begin{aligned} P_*(\xi ; w)=\mathop {\sum }\limits _{2|i|+|j|+|l| > 2}P_{*\beta }(\xi ;x){\bar{w}}^\beta , \ \ {\bar{w}}^\beta =y^iu^jv^l. \end{aligned}$$
(4.23)

Let \( {\tilde{U}}_*=\{\xi \in {\tilde{U}} \ | \ \omega _*(\xi )\in R_{\alpha }^K\}\). We are going to prove that \( {\tilde{U}}_*\subset {\tilde{U}}_i\) for all \(i\ge 0\). Recall that \(2^{i+4}K_i\Xi (K_i)\epsilon _i/r_i^2\le \alpha \). For \(\xi \in {\tilde{U}}_*\) and \(0<|k|\le K_i\),

$$\begin{aligned} |\big \langle k,\omega _i(\xi )\big \rangle |\ge |\big \langle k,\omega _*(\xi )\big \rangle |-|\big \langle k,\omega _*(\xi )-\omega _i(\xi )\big \rangle |\ge \frac{\alpha }{\Xi (|k|)}-\frac{2\epsilon _i}{r_i^2}K_i\ge \frac{\alpha _i}{\Xi (|k|)}, \end{aligned}$$

thus \(\omega _i(\xi )\in R_{\alpha _i}^{K_i}\) for all \(i\ge 0\), and so \({\tilde{U}}_*\subset {\tilde{U}}_i\) for all \(i\ge 0\).
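The two estimates used in the last display can be unpacked as follows. The sketch assumes that \(\Xi \) is non-decreasing and that \(\alpha -\alpha _i\ge \alpha /2^{i+3}\); the latter holds, for instance, for the hypothetical choice \(\alpha _i=\alpha (1-2^{-(i+2)})\). Using \(|\omega _*-\omega _i|\le 2\alpha _iE_i=2\epsilon _i/r_i^2\) and \(|k|\le K_i\),

$$\begin{aligned} |\langle k,\omega _*(\xi )-\omega _i(\xi )\rangle |&\le |k|\cdot |\omega _*-\omega _i|\le \frac{2\epsilon _i}{r_i^2}K_i\le \frac{\alpha }{2^{i+3}\Xi (K_i)}\le \frac{\alpha }{2^{i+3}\Xi (|k|)},\\ \frac{\alpha }{\Xi (|k|)}-\frac{2\epsilon _i}{r_i^2}K_i&\ge \frac{\alpha \bigl (1-2^{-(i+3)}\bigr )}{\Xi (|k|)}\ge \frac{\alpha _i}{\Xi (|k|)}. \end{aligned}$$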

By (4.17), it follows that

$$\begin{aligned} H\circ \Psi _{i}=N_i+P_i,\ (\xi ; w)\in {\tilde{U}}_*\times D_i. \end{aligned}$$

Taking the limit (as \(i\rightarrow \infty \)) in the above equation, we get

$$\begin{aligned} H\circ \Psi _{*}=N_*+P_*, \ (\xi ; w)\in {\tilde{U}}_*\times D_{s/2,r/2}, \end{aligned}$$

where \(N_*\) and \(P_*\) are given in (4.22) and (4.23). This completes the proof.