1 Introduction

The problem of casting an analytic nearly-integrable Hamiltonian system into normal form is deeply related to Poincaré’s challenging problème général de la dynamique, Poincaré (1892). Nowadays, normal forms are still one of the main technical tools used to deal with the issue raised by Poincaré in this context.

The particular case in which the unperturbed part is supposed to be linear in the actions (isochronous case), already investigated by Birkhoff (and for this reason also known as the Birkhoff problem), Birkhoff (1927), has a peculiar interest. The first rigorous statement concerning its stability can be found in Gallavotti (1986). The possibility of casting the considered Hamiltonian in normal form up to some finite order r, and of obtaining, as a consequence, a stability time estimate “à la Nekhoroshev”, is directly related to a particularly simple small-divisors analysis: the non-resonant (Diophantine) hypothesis on the frequency vector \(\omega \) of the unperturbed system is sufficient to ensure the solvability of the (standard) homological equation arising in the normalization algorithm. An extensive bibliography on this problem goes beyond the purposes of this paper; we only mention the recent generalisations of Pinzari (2013) for the planetary problem and of Bambusi (2005) for infinite-dimensional systems.

It is well known that the extension to the non-isochronous case requires a careful analysis (geometric part, see Nekhoroshev 1977, 1979; Benettin and Gallavotti 1986) on the regions of the phase space in which the actions I are such that \(\omega =\omega (I)\) is non-resonant (non-resonant domains).

The problem of dealing with time-dependent perturbations without any hypothesis on the time dependence (e.g. periodic or quasi-periodic) has peculiar technical difficulties. After the pioneering works of Pustyl’nikov (1974) and Giorgilli and Zehnder (1992), interest in this class of problems has recently been renewed in Bounemoura (2013), Fortunati and Wiggins (2014a) and subsequent papers. Examples of more general (i.e. aperiodic) non-autonomous perturbations in the context of Lagrangian transport theory for fluids have been pointed out in Wiggins and Mancho (2014). Although in a different direction, it is worth mentioning the stochastic perturbations of the Kepler problem discussed in Cresson et al. (2015), which arise naturally in some Celestial Mechanics models.

From a technical point of view, the presence of an aperiodic time dependence requires a different treatment of the homological equation, which takes the form of a linear PDE. A first approach consists in keeping the terms involving the time derivative of the generating function (also called extra-terms) in the normal form and then providing a bound for them. This approach, originally suggested in Giorgilli and Zehnder (1992) and then used in Fortunati and Wiggins (2014a), yields a normal form result in the case of a slow time dependence. This hypothesis provides a smallness condition for the mentioned extra-terms. Alternatively, those terms can be removed by including them in the homological equation, which turns out to be, in this way, a linear ODE in time. This has been profitably used in Fortunati and Wiggins (2014b), Fortunati and Wiggins (2015a) and Fortunati and Wiggins (2015b), but requires (except for a particular case described in Fortunati and Wiggins 2015b) an important assumption. More precisely, it is necessary to suppose that the perturbation, as a function of t, belongs to the class of summable functions over the real semi-axis. As in (3), functions exhibiting a (slow) exponential decay will be used as a paradigmatic case. It will be shown that the consequences of this assumption in the isochronous case are remarkable: the normalization algorithm can be iterated an infinite number of times by means of a superconvergent method borrowed from KAM-type arguments, see e.g. Chierchia (2009). The procedure leads to the so-called strong normal form, i.e. one in which the normalized Hamiltonian has the same form as the integrable part of the initial problem. Furthermore, no restrictions are imposed on \(\omega \); hence flows with arbitrary frequencies persist in the transformed system.

As one might expect, this phenomenon has an important consequence also in the non-isochronous case. The possibility of disregarding the problems related to the small divisors implies that the well-known analysis of the geography of the resonances, a key step of the Nekhoroshev theorem, is not necessary in this case, and the results that can be stated are purely “analytic”. In this way, the classical assumptions on the unperturbed part of the Hamiltonian (such as steepness, convexity, etc.) are no longer required. As a common feature with the isochronous case, the obtained normal form does not exhibit resonant terms, as these have been annihilated in the normalization by using the time-dependent homological equation. This implies that, in this case, the plane of fast drift (see e.g. Giorgilli 2003) degenerates to a point. The paper uses, in a concise but self-contained form, the tools developed in the above mentioned papers of the same authors, especially Fortunati and Wiggins (2015b), in which the concept of a “family” of canonical transformations parametrised by t is introduced. The proofs are entirely constructed by using the language and the tools of the Lie series and Lie transform methods developed by Giorgilli et al., see e.g. Giorgilli (2003).

2 Setting and main results

Consider the following nearly integrable Hamiltonian

$$\begin{aligned} H(I,\varphi ,\eta ,t)=h(I) + \eta + \hat{\varepsilon } f(I,\varphi ,t), \end{aligned}$$
(1)

with \((I,\varphi ,\eta ,t) \in G \times {\mathbb {T}}^n \times {\mathbb {R}}\times {\mathbb {R}}^+\), where \( G \subset {\mathbb {R}}^n\) and \(\hat{\varepsilon }>0\) is a small parameter. Hamiltonian (1) is the “autonomous equivalent” in the extended phase space of the Hamiltonian \(\mathcal {H}(I,\varphi ,t)=h(I) + \hat{\varepsilon } f(I,\varphi ,t)\).

We define, for all \(t \in {\mathbb {R}}^+:=[0,+\infty )\), the following complexified domain \(\mathcal {D}_{\rho ,\sigma }:=\mathcal {G}_{\rho } \times {\mathbb {T}}_{\sigma }^n \times \mathcal {S}_{\rho }\), where \(\mathcal {G}_{\rho }:=\bigcup _{I \in G} \Delta _{\rho }(I)\) and

$$\begin{aligned} \Delta _{\rho }(I):=\left\{ \hat{I} \in {\mathbb {C}}^n{:}|\hat{I}-I| \le \rho \right\} , \quad {\mathbb {T}}_{\sigma }^n := \left\{ \varphi \in {\mathbb {C}}^n{:} |\mathfrak {I}\varphi | \le \sigma \right\} , \quad \mathcal {S}_{\rho }:=\left\{ \eta \in {\mathbb {C}}: |\mathfrak {I}\eta | \le \rho \right\} , \end{aligned}$$

with \(\rho ,\sigma \in (0,1)\). For all \(g:\mathcal {G}_{\rho } \times {\mathbb {T}}_{\sigma }^n \times {\mathbb {R}}^+ \rightarrow {\mathbb {C}}\), write \(g=\sum _{k \in {\mathbb {Z}}^n} g_k(I,t) e^{i k \cdot \varphi }\), then define the Fourier norm (parametrized by t)

$$\begin{aligned} \left\| g \right\| _{\rho ,\sigma }:=\sum _{k \in {\mathbb {Z}}^n} \left| g_k(I,t)\right| _{\rho } e^{|k|\sigma }, \end{aligned}$$
(2)

where \(|\cdot |_{\rho }\) is the usual supremum norm over \(\mathcal {G}_{\rho }\) and \(|k|:=\sum _{l=1}^n |k_l|\). For all \(w : \mathcal {G}_{\rho } \times {\mathbb {T}}_{\sigma }^n \times {\mathbb {R}}^+ \rightarrow {\mathbb {C}}^n\) we shall set \(\left\| w \right\| _{\rho ,\sigma }:=\sum _{l=1}^n\left\| w_l \right\| _{\rho ,\sigma }\). The standard framework (see e.g. Benettin et al. 1984) is the space \({\mathfrak {C}}_{\rho ,\sigma }\) of continuous functions on \(\mathcal {G}_{\rho } \times {\mathbb {T}}_{\sigma }^n\), holomorphic in its interior for some \(\rho ,\sigma \) and real on \(G \times {\mathbb {T}}^n\) for all \(t \in {\mathbb {R}}^+\). We shall suppose \(h(I) \in {\mathfrak {C}}_{\rho ,\cdot }\) and \(f \in {\mathfrak {C}}_{\rho ,\sigma }\), while it is sufficient to assume that, for all \(I \in \mathcal {G}_{\rho }\), \(f_k(I,\cdot ) \in \mathcal {C}^1({\mathbb {R}}^+)\).

Similarly to Fortunati and Wiggins (2015b), we introduce the following

Hypothesis 2.1

(Time decay) There exist \(M_f>0\) and \(a \in (0,1)\) such that

$$\begin{aligned} \left\| f(I,\varphi ,t) \right\| _{\rho ,\sigma } \le M_f e^{-a t}. \end{aligned}$$
(3)
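A simple perturbation satisfying Hypothesis 2.1 is, for \(n=1\), \(f(I,\varphi ,t)=e^{-at}\cos \varphi \): its only Fourier coefficients are \(f_{\pm 1}=e^{-at}/2\), so the norm (2) equals \(e^{\sigma }e^{-at}\), i.e. (3) holds with \(M_f=e^{\sigma }\). A quick numerical sketch (all sample values are hypothetical):

```python
import math

# Hypothetical example perturbation: f(I, phi, t) = exp(-a*t) * cos(phi), n = 1.
# Its Fourier coefficients are f_{+1} = f_{-1} = exp(-a*t)/2, so the weighted
# norm (2) is exp(sigma) * exp(-a*t): Hypothesis 2.1 holds with M_f = exp(sigma).
a, sigma, t = 0.1, 0.5, 3.0

coeffs = {1: 0.5 * math.exp(-a * t), -1: 0.5 * math.exp(-a * t)}
fourier_norm = sum(abs(c) * math.exp(abs(k) * sigma) for k, c in coeffs.items())

M_f = math.exp(sigma)
assert abs(fourier_norm - M_f * math.exp(-a * t)) < 1e-12
```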

Set \(\varepsilon :=\hat{\varepsilon } M_f\). We first state the following

Theorem 2.2

(Strong aperiodic Birkhoff) Consider Hamiltonian (1) with \(h(I):=\omega \cdot I\), under the Hypothesis 2.1 and the described regularity assumptions. Then, for all \(a \in (0,1)\) there exists \(\varepsilon _a>0\) such that the following statement holds true. For all \(\varepsilon \in (0,\varepsilon _a]\), it is possible to find \(0<\rho _*<\rho _0<\rho \) and \(0<\sigma _*<\sigma _0<\sigma \) and an analytic, canonical, \(\varepsilon -\)close and asymptotic to the identity change of variables \((I,\varphi ,\eta ) = \mathcal {B}(I^{(\infty )},\varphi ^{(\infty )},\eta ^{(\infty )}), \mathcal {B}:\mathcal {D}_{\rho _*,\sigma _*} \rightarrow \mathcal {D}_{\rho _0,\sigma _0}\) for all \(t \in {\mathbb {R}}^+\), casting Hamiltonian (1) into the strong Birkhoff normal form

$$\begin{aligned} H^{(\infty )}\left( I^{(\infty )},\varphi ^{(\infty )},\eta ^{(\infty )}\right) =\omega \cdot I^{(\infty )} + \eta ^{(\infty )}. \end{aligned}$$
(4)

Hence, in the new variables, the flow with frequency \(\omega \) persists for all \(\omega \), regardless of the numerical features of this vector, i.e., more specifically, no matter whether it is resonant or not. Note that the absence of a non-resonance hypothesis on \(\omega \) also implies that (4) holds even if \(\omega \) has an arbitrary number of zero components.

With a straightforward adaptation of the notational setting, the result in the general case reads as follows:

Theorem 2.3

There exist \(\varepsilon _a^*>0\) and \(r \in {\mathbb {N}}\setminus \{0\}\) such that, for all \(\varepsilon \in (0,\varepsilon _a^*]\) it is possible to find an analytic, canonical, \(\varepsilon -\)close and asymptotic to the identity change of variables \((I,\varphi ,\eta ) = \mathcal {N}_r\left( I^{(r)},\varphi ^{(r)},\eta ^{(r)}\right) , \mathcal {N}_r:\mathcal {D}_{\tilde{\rho _*},\tilde{\sigma _*}} \rightarrow \mathcal {D}_{\tilde{\rho _0},\tilde{\sigma _0}}\) for all \(t \in {\mathbb {R}}^+\), casting Hamiltonian (1) under the Hypothesis 2.1, into the normal form of order r

$$\begin{aligned} H^{(r)}\left( I^{(r)},\varphi ^{(r)},\eta ^{(r)},t\right) =h\left( I^{(r)}\right) + \eta ^{(r)} + \mathcal {R}^{(r+1)}\left( I^{(r)},\varphi ^{(r)},t\right) , \end{aligned}$$
(5)

where \(\mathcal {R}^{(r+1)}\) is “exponentially small” with respect to r and vanishes as \(t \rightarrow +\infty \). Moreover, for all \(I(0) \in G\) one has in (1): \(|I(t)-I(0)| \le \sqrt{\varepsilon } \tilde{\rho }_0/8\) for all \(t \in {\mathbb {R}}^+\).

Similarly to Fortunati and Wiggins (2015b) (and the previously mentioned papers), no lower bounds are imposed on a, so that the decay can be arbitrarily slow. The (natural) consequence is that both \(\varepsilon _a\) and \(\varepsilon _a^*\) decrease with a, see (15) and (56).

We stress that, in contrast with the classical (non-autonomous) case, the stability property following from the above stated results is an easy consequence of (3), and it could have been shown directly from the equations of motion, by elementary methods, without the use of the normal form approach.

3 Part I

4 Proof of Theorem 2.2

5 The normalization algorithm

Given a function \(G:=G (I,\varphi ,t)\), define the Lie series operator \(\exp ({\mathcal {L}}_{G}):={{\mathrm{Id}}}+\sum _{s \ge 1} (1/s!) \mathcal {L}_{G}^s\), where \(\mathcal {L}_{G} F :=\{F,G\} \equiv F_{\varphi } \cdot G_I - G_{\varphi } \cdot F_I- F_{\eta } G_t \). The aim is to construct a generating sequence \(\{\chi ^{(j)}\}_{j \in {\mathbb {N}}}\), such that the formal limit

$$\begin{aligned} \mathcal {B}:=\lim _{j \rightarrow \infty } \mathcal {B}^{(j)} \circ \mathcal {B}^{(j-1)} \circ \ldots \circ \mathcal {B}^{(0)}, \end{aligned}$$
(6)

where \(\mathcal {B}^{(j)}:=\exp (\mathcal {L}_{\chi ^{(j)}})\), is such that \(\mathcal {B} \circ H\) is of the form (4). The following statement shows that this is possible, at least at a formal level.

Proposition 3.1

Suppose that for some \(j \in {\mathbb {N}}\) Hamiltonian (1) is of the form

$$\begin{aligned} H^{(j)}=\omega \cdot I + \eta + F^{(j)} (I,\varphi ,t). \end{aligned}$$
(7)

Then \(H^{(j+1)} :=\mathcal {B}^{(j)} \circ H^{(j)} \) is still of the form (7) with

$$\begin{aligned} F^{(j+1)}=\sum _{s \ge 1} \frac{s}{(s+1)!}\mathcal {L}_{\chi ^{(j)}}^s F^{(j)}, \end{aligned}$$
(8)

provided that \(\chi ^{(j)}\) solves the homological equation

$$\begin{aligned} \chi _t^{(j)} + \omega \cdot \chi _{\varphi }^{(j)}=F^{(j)}. \end{aligned}$$
(9)

Since Hamiltonian (1) is of the form (7), one can set \(H^{(0)}:=H\) with \(F^{(0)}:=\hat{\varepsilon } f\). Thus, by induction, the form (7) holds for all \(j \in {\mathbb {N}}\). Clearly, this does not guarantee that the objects involved in the algorithm are meaningful for all j: as is well known, their sizes can grow unboundedly as j increases, as a consequence of small-divisor phenomena. The aim of Sect. 4 (and in particular of Lemma 4.5) is to show that this is not the case: the key ingredient is the time decay of f.

Proof

We get \(\exp (\mathcal {L}_{\chi ^{(j)}})H^{(j)}=I \cdot \omega + \eta + F^{(j)}(I,\varphi ,t)+\mathcal {L}_{\chi ^{(j)}}( \omega \cdot I + \eta )+\sum _{s \ge 1} (1/s!) \mathcal {L}_{\chi ^{(j)}}^s F^{(j)}+ \sum _{s \ge 2} (1/s!) \mathcal {L}_{\chi ^{(j)}}^s (\omega \cdot I + \eta )\). The sum between the third and fourth terms of the r.h.s. of the latter equation vanishes due to (9). As for the last two terms, by setting \(F^{(j+1)}\) as the sum of them, one gets \(F^{(j+1)}=\sum _{s \ge 1} (1/s!) \mathcal {L}_{\chi ^{(j)}}[F^{(j)}+(s+1)^{-1}\mathcal {L}_{\chi ^{(j)}}(\omega \cdot I +\eta )]\), which immediately yields (8) by using (9). \(\square \)
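The cancellation used in the proof can be checked numerically in a toy case: with the bracket defined above, \(\mathcal {L}_{\chi }(\omega \cdot I+\eta )=-\omega \cdot \chi _{\varphi }-\chi _t\), so any solution of (9) removes the perturbation at first order. A sketch for \(n=1\) with the hypothetical sample \(F=\cos \varphi \, e^{-at}\), whose decaying solution of (9) is \(\chi =2\,\mathfrak {R}[c_1(t)e^{i\varphi }]\) with \(c_1(t)=-\tfrac{1}{2}e^{-at}/(a-i\omega )\):

```python
import cmath
import math

# Toy check (n = 1) of the cancellation in the proof of Proposition 3.1:
# if chi solves the homological equation chi_t + omega*chi_phi = F, then
# L_chi(omega*I + eta) = -omega*chi_phi - chi_t = -F, killing the O(eps) term.
# Hypothetical sample data: F(phi, t) = cos(phi) * exp(-a*t).
omega, a = 1.3, 0.2

def chi(phi, t):
    # decaying Fourier solution of (10) for k = +1 (k = -1 is its conjugate)
    c1 = -0.5 * math.exp(-a * t) / (a - 1j * omega)
    return 2 * (c1 * cmath.exp(1j * phi)).real

def F(phi, t):
    return math.cos(phi) * math.exp(-a * t)

h = 1e-6
for phi, t in [(0.3, 0.0), (1.1, 2.5), (2.7, 5.0)]:
    chi_t = (chi(phi, t + h) - chi(phi, t - h)) / (2 * h)     # central differences
    chi_phi = (chi(phi + h, t) - chi(phi - h, t)) / (2 * h)
    # homological equation (9), hence L_chi(omega*I + eta) = -F
    assert abs(chi_t + omega * chi_phi - F(phi, t)) < 1e-6
```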

The (formal) expansions \(\chi ^{(j)}=\sum _{k \in {\mathbb {Z}}^n} c_k^{(j)}(I,t)e^{i k \cdot \varphi }\) and \(F^{(j)}=\sum _{k \in {\mathbb {Z}}^n} f_k^{(j)}(I,t)e^{i k \cdot \varphi }\) yield (9) in terms of Fourier components

$$\begin{aligned} \partial _t c_k^{(j)}(I,t)+i \lambda (k) c_k^{(j)} (I,t)= f_k^{(j)}(I,t), \end{aligned}$$
(10)

with \(\lambda (k):=\omega \cdot k\). The solution of (10) is

$$\begin{aligned} c_k^{(j)}(I,t)=e^{-i \lambda (k) t}\left[ c_k^{(j)}(I,0)+\int _0^t e^{i \lambda (k) s} f_k^{(j)}(I,s) ds\right] , \end{aligned}$$
(11)

where \( c_k^{(j)}(I,0)\) will be chosen later.
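The variation-of-constants formula (11) can be sanity-checked against a direct numerical integration of (10); the data below (\(\lambda \), the coefficient f and the initial value) are hypothetical sample values:

```python
import cmath
import math

# Numerical sanity check that formula (11),
#   c(t) = e^{-i*lam*t} [ c0 + \int_0^t e^{i*lam*s} f(s) ds ],
# solves the homological equation in Fourier form (10): c' + i*lam*c = f.
# All sample data below are hypothetical.
lam = 2.0
c0 = 0.4 + 0.1j
f = lambda s: math.exp(-0.3 * s) * math.cos(s)   # a sample decaying coefficient
T = 2.0

# evaluate (11) at t = T by trapezoidal quadrature
N = 20000
ds = T / N
integral = sum(0.5 * ds * (cmath.exp(1j * lam * (k * ds)) * f(k * ds)
                           + cmath.exp(1j * lam * ((k + 1) * ds)) * f((k + 1) * ds))
               for k in range(N))
c_formula = cmath.exp(-1j * lam * T) * (c0 + integral)

# integrate c' = -i*lam*c + f(t) directly with classical RK4
c = c0
steps = 4000
h = T / steps
rhs = lambda t, c: -1j * lam * c + f(t)
for k in range(steps):
    t = k * h
    k1 = rhs(t, c)
    k2 = rhs(t + h / 2, c + h / 2 * k1)
    k3 = rhs(t + h / 2, c + h / 2 * k2)
    k4 = rhs(t + h, c + h * k3)
    c += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

assert abs(c - c_formula) < 1e-6
```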

6 Convergence

The classical argument requires the construction of a sequence of nested domains \(\mathcal {D}_{\rho _{j+1},\sigma _{j+1}} \subset \mathcal {D}_{\rho _j,\sigma _j} \ni (I^{(j)},\varphi ^{(j)},\eta ^{(j)})\), such that \(\mathcal {B}^{(j)}: \mathcal {D}_{\rho _{j+1},\sigma _{j+1}} \rightarrow \mathcal {D}_{\rho _j,\sigma _j}\). The resulting progressive restriction is essential in order to use standard Cauchy tools, see Proposition 4.1. The estimates found in Lemma 4.2, concerning the solution of Eq. (9), will be used to prove Lemma 4.5, providing in this way the bound on \(F^{(j)}\) defined in Proposition 3.1. This is achieved for a suitable sequence of domains constructed in Lemma 4.4 via \(\{\rho _j\}\) and \(\{\sigma _j\}\). This allows us to conclude that the perturbation term is actually removed in the limit (6).

The final step consists of showing that \(\mathcal {B}\) defines an analytic map \(\mathcal {B} : \mathcal {D}_{\rho _*,\sigma _*} \ni (I^{(\infty )},\varphi ^{(\infty )},\eta ^{(\infty )}) \rightarrow \mathcal {D}_{\rho _0,\sigma _0} \ni (I^{(0)},\varphi ^{(0)},\eta ^{(0)}) \equiv (I,\varphi ,\eta )\), where \(\rho _* \le \rho _j\) and \(\sigma _* \le \sigma _j\) for all \(j \in {\mathbb {N}}\). This property is shown in Lemma 4.6. As \(\mathcal {D}_{\rho _*,\sigma _*}\) will be the domain of analyticity of the transformed Hamiltonian via \(\mathcal {B}\), it will be essential to require that \(\rho _*,\sigma _*>0\).

6.1 Some preliminary results

Proposition 4.1

Let \(F,G: \mathcal {G}_{\rho } \times {\mathbb {T}}_{\sigma }^n \times {\mathbb {R}}^+ \rightarrow {\mathbb {C}}\) be such that \(\left\| F \right\| _{(1-d^{\prime })(\rho ,\sigma )}\) and \(\left\| G \right\| _{(1-d^{\prime \prime })(\rho ,\sigma )}\) are bounded for some \(d^{\prime },d^{\prime \prime } \in [0,1)\). Then, defining \(\delta :=|d^{\prime }-d^{\prime \prime }|\) and \({\hat{d}}:=\max \{d^{\prime },d^{\prime \prime }\}\), for all \(\tilde{d} \in (0,1-{\hat{d}})\) and all \(s \in {\mathbb {N}}\setminus \{0\}\) one has

$$\begin{aligned} \left\| \mathcal {L}_G^s F \right\| _{\left( 1-\tilde{d}-{\hat{d}}\right) (\rho ,\sigma )} \le \frac{s!}{e^2} \left( \frac{2e}{\tilde{d}\left( \tilde{d}+\tilde{\delta }_s\right) \rho \sigma } \left\| G \right\| _{(1-d^{\prime \prime })(\rho ,\sigma )} \right) ^s \left\| F \right\| _{(1-d^{\prime })(\rho ,\sigma )}, \end{aligned}$$
(12)

where \(\tilde{\delta }_s=\delta \) if \(s=1\) and is zero otherwise.

Proof

Straightforward from (Giorgilli 2003, Lemmas 4.1, 4.2). \(\square \)

Lemma 4.2

Suppose that \(F^{(j)}\) satisfies \(\left\| F^{(j)} \right\| _{{\hat{\rho }},{\hat{\sigma }}} \le M^{(j)}\exp (-at)\) for some \(M^{(j)}>0\), \({\hat{\rho }} \le \rho \) and \({\hat{\sigma }} \le \sigma \). Define \(C_{\omega }:=1+|\omega |\); then for all \(\delta \in (0,1)\) the solution of (9) satisfies

$$\begin{aligned} \left\| \chi ^{(j)} \right\| _{(1-\delta )({\hat{\rho }},{\hat{\sigma }})} \le \frac{M^{(j)}}{a} \left( \frac{e}{\delta {\hat{\sigma }}} \right) ^{2n} e^{-at}, \quad \left\| \chi _t^{(j)} \right\| _{(1-\delta )({\hat{\rho }},{\hat{\sigma }})} \le C_{\omega } \frac{M^{(j)}}{a} \left( \frac{e}{\delta {\hat{\sigma }}} \right) ^{2n} e^{-at}.\nonumber \\ \end{aligned}$$
(13)

Proof

First of all, by hypothesis \(\left| f_k^{(j)}(I,t)\right| \le M^{(j)} \exp (-|k|{\hat{\sigma }}-at)\); in particular, by choosing \(c_k^{(j)}(I,0):=-\int _{{\mathbb {R}}^+} \exp (i \lambda (k)s) f_k^{(j)}(I,s) ds\) we have that \(|c_k^{(j)}(I,0)|<+\infty \) for all \(I \in \mathcal {G}_{\rho }\). Substituting \(c_k^{(j)}(I,0)\) in (11) one gets \(|c_k^{(j)}(I,t)| \le \int _t^{\infty }|f_k^{(j)}(I,s)|ds \le (M^{(j)} /a) \exp (-|k|{\hat{\sigma }} -at)\), which yields the first of (13). As for the second of (13), it is sufficient to use (10), which implies \(|\partial _t c_k^{(j)}(I,t)| \le (M^{(j)}/a)(1+|\omega ||k|) \exp (-|k|{\hat{\sigma }}-a t)\), and then proceed similarly. \(\square \)
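The choice of \(c_k^{(j)}(I,0)\) and the resulting decay can be illustrated numerically (all parameter values below are hypothetical): with this choice, (11) becomes \(c_k^{(j)}(I,t)=-e^{-i\lambda (k)t}\int _t^{\infty }e^{i\lambda (k)s}f_k^{(j)}(I,s)\,ds\), and a coefficient obeying \(|f_k^{(j)}| \le M e^{-|k|{\hat{\sigma }}-as}\) yields \(|c_k^{(j)}(I,t)| \le (M/a)e^{-|k|{\hat{\sigma }}-at}\):

```python
import cmath
import math

# Numerical illustration of the estimate in Lemma 4.2 (hypothetical data):
# with c_k(0) = -\int_0^infty e^{i*lam*s} f_k(s) ds, formula (11) reduces to
#   c_k(t) = -e^{-i*lam*t} \int_t^infty e^{i*lam*s} f_k(s) ds,
# and |f_k(s)| <= M e^{-|k|*sig - a*s} gives |c_k(t)| <= (M/a) e^{-|k|*sig - a*t}.
M, a, sig, k, lam = 1.0, 0.25, 0.4, 3, 1.7
f_k = lambda s: M * math.exp(-abs(k) * sig - a * s) * math.cos(5 * s)

def c_k(t, tail=200.0, N=100000):
    # Riemann-sum approximation of the tail integral (exp(-a*s) makes the
    # truncation at t + tail negligible)
    ds = tail / N
    integral = sum(cmath.exp(1j * lam * (t + j * ds)) * f_k(t + j * ds) * ds
                   for j in range(N))
    return -cmath.exp(-1j * lam * t) * integral

for t in (0.0, 2.0, 8.0):
    bound = (M / a) * math.exp(-abs(k) * sig - a * t)
    assert abs(c_k(t)) <= bound + 1e-3   # small allowance for quadrature error
```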

Remark 4.3

It is immediate to see that a non-resonance hypothesis on \(\omega \) does not substantially improve the bounds (13). A more careful computation yields

$$\begin{aligned} \left| c_k^{(j)}(I,t)\right| \le M^{(j)} \left( a^2+\left( \omega \cdot k\right) ^2\right) ^{-\frac{1}{2}}e^{- |k| \sigma _j-a t}. \end{aligned}$$

Hence the estimate cannot be refined, due to the presence of \(|c_0^{(j)}(I,t)|\), no matter what the minimum value of \(|\omega \cdot k|\) is.

6.2 A suitable sequence of domains

Lemma 4.4

Let \(\{d_j\}_{j \in {\mathbb {N}}}\) be a (real valued) sequence such that \(0 \le d_j \le 1/6\). Consider, for all \(j \in {\mathbb {N}}\), the following sequences

$$\begin{aligned} \epsilon _{j+1}:=K a^{-1} d_j^{-\tau } \epsilon _j^2, \quad \left( \rho _{j+1},\sigma _{j+1}\right) :=\left( 1-3d_j\right) \left( \rho _j,\sigma _j\right) , \end{aligned}$$
(14)

with \(K>0\) and \(\tau :=2 n + 3\). Then, for all \(0<\rho _0 \le \rho , 0<\sigma _0 \le \sigma \) and \(\epsilon _0 \le \varepsilon _a\) where

$$\begin{aligned} \varepsilon _a \le a K^{-1} (2 \pi )^{-2 \tau }, \end{aligned}$$
(15)

it is possible to construct \(\{d_j\}_{j \in {\mathbb {N}}}\) such that \((\rho _*,\sigma _*)=(1/2)(\rho _0,\sigma _0)\), in particular they are strictly positive. Furthermore \(\lim _{j \rightarrow \infty } \epsilon _j=0\).

Proof

Choose \(\epsilon _j:=\epsilon _0 (j+1)^{-2 \tau }\) (so that \(\lim _{j \rightarrow \infty } \epsilon _j=0\) by construction). By the first of (14) one gets

$$\begin{aligned} d_j = \left( \epsilon _0 K a^{-1}\right) ^{\frac{1}{\tau }} (j+2)^2/(j+1)^4, \end{aligned}$$
(16)

hence, by (15), \(d_j \le \pi ^{-2} (j+1)^{-2}\). This implies \(\sum _{j \ge 0} d_j \le 1/6\) and then, trivially, \(d_j \le 1/6\) for all \(j \in {\mathbb {N}}\). Now we have \( \ln \Pi _{j \ge 0} (1-3 d_j)=\sum _{j \ge 0} \ln (1-3 d_j) \ge -6 \ln 2 \sum _{j \ge 0} d_j \ge - \ln 2\), hence \(\lim _{j \rightarrow \infty } \rho _j=\rho _0 \Pi _{j \ge 0} (1-3 d_j) \ge \rho _0/2 =: \rho _*\). Analogously \(\sigma _*:=\sigma _0/2\). \(\square \)
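The construction in the proof is fully explicit and can be checked numerically for sample (hypothetical) values of n, a and K: the sequence \(d_j\) given by (16) satisfies the recursion (14), \(\sum _j d_j \le 1/6\) and \(\Pi _j (1-3d_j) \ge 1/2\):

```python
import math

# Numerical check of Lemma 4.4 with hypothetical sample values: taking
# eps_j = eps_0 (j+1)^{-2*tau} and d_j as in (16), the first of (14) holds,
# sum_j d_j <= 1/6 and prod_j (1 - 3 d_j) >= 1/2 (so rho_* >= rho_0 / 2).
n = 2
tau = 2 * n + 3
a, K = 0.1, 5.0                               # hypothetical values
eps0 = a / K * (2 * math.pi) ** (-2 * tau)    # the threshold (15)

J = 5000
eps = [eps0 * (j + 1) ** (-2 * tau) for j in range(J + 1)]
d = [(eps0 * K / a) ** (1 / tau) * (j + 2) ** 2 / (j + 1) ** 4 for j in range(J)]

# the first of (14): eps_{j+1} = K a^{-1} d_j^{-tau} eps_j^2
for j in range(J):
    assert abs(eps[j + 1] - K / a * d[j] ** (-tau) * eps[j] ** 2) < 1e-12 * eps[j + 1]

assert sum(d) <= 1 / 6
prod = math.exp(sum(math.log(1 - 3 * dj) for dj in d))
assert prod >= 0.5
```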

6.3 Bounds on the formal algorithm

Lemma 4.5

There exists \(K=K(\rho _0,\sigma _0)>0\) such that, if \(\varepsilon \le \varepsilon _a\) where \(\varepsilon _a\) satisfies (15), then

$$\begin{aligned} \left\| F^{(j)} \right\| _{(\rho _j,\sigma _j)}\le \epsilon _j e^{-at}, \end{aligned}$$
(17)

for all \(j \in {\mathbb {N}}\). Hence, the transformed Hamiltonian \(\mathcal {B} \circ H\) is of the form (4).

Proof

By induction. Note that (17) is true for \(j=0\) setting \(\epsilon _0:=\varepsilon \). The condition on \(\varepsilon \) ensures the validity of Lemma 4.4. Hence, supposing (17), by Lemmas 4.2 and 4.4, we get

$$\begin{aligned} \left\| \chi ^{(j)} \right\| _{(1- d_j)(\rho _j,\sigma _j)} \le \epsilon _j (e/\sigma _*)^{2n} a^{-1} d_j^{-2n} e^{-a t}. \end{aligned}$$
(18)

By (8) and Proposition 4.1 with \(d^{\prime }=d_j, d^{\prime \prime }=0\) and \(\tilde{d}=d_j\) (the condition \(d_j \le 1-d_j\) holds as \(d_j \le 1/6\))

$$\begin{aligned} \left\| F^{(j+1)} \right\| _{(1-2 d_j)(\rho _j,\sigma _j)} \le \sum _{s \ge 1} \frac{1}{s!} \left\| \mathcal {L}_{\chi ^{(j)}}^s F^{(j)} \right\| _{(1-2 d_j)(\rho _j,\sigma _j)} \le 2^{-1} \Theta \left\| F^{(j)} \right\| _{(\rho _j,\sigma _j)}, \end{aligned}$$
(19)

where

$$\begin{aligned} \Theta :=2 \epsilon _j n C_{\omega } \left( e/\sigma _*\right) ^{\tau } \rho _*^{-1} a^{-1} d_j^{-2n-2} e^{- a t} \le 1/2 \end{aligned}$$
(20)

is a sufficient condition for the convergence of the operator \(\exp (\mathcal {L}_{\chi ^{(j)}})\), from which \(\sum _{s \ge 1} \Theta ^s \le 2 \Theta \). Hence, by (19), (20) and then by (18), one gets (using also \(\sigma _*,\rho _*,d_j<1\))

$$\begin{aligned} \left\| F^{(j+1)} \right\| _{(1-2 d_j)(\rho _j,\sigma _j)} \le \epsilon _j^2 n C_{\omega } (e/\sigma _*)^{\tau } \rho _*^{-1} a^{-1} d_j^{-\tau } e^{-at}. \end{aligned}$$
(21)

The latter is valid a fortiori in \(\mathcal {D}_{(1-3d_j)(\rho _j,\sigma _j)}\).

In conclusion, by choosing \(K:=n C_{\omega } (e/\sigma _*)^{\tau } \rho _*^{-1} =2^{\tau +1} n C_{\omega } (e/\sigma _0)^{\tau } \rho _0^{-1}\), from the first of (14), we have that (17) is satisfied for \(j \rightarrow j+1\). Furthermore, by the first of (14), condition (20) yields \(1 \ge 4 \epsilon _j K d_j a^{-1} d_j^{-\tau }e^{-at} =4 d_j (\epsilon _{j+1}/\epsilon _j) e^{-a t}\). The latter is trivially true for all \(t \in {\mathbb {R}}^+\) by the monotonicity of \(\epsilon _j\) and as \(d_j \le 1/6\). Furthermore this implies

$$\begin{aligned} \Theta \le 2 d_j e^{-at}. \end{aligned}$$
(22)

Hence \(\exp (\mathcal {L}_{\chi ^{(j)}})\) is well defined for all \(j \in {\mathbb {N}}\). \(\square \)

In this way the value of \(\varepsilon _a\) mentioned in the statement of Theorem 2.2 is determined once and for all.

6.4 Estimates on the transformation of coordinates

Lemma 4.6

The limit (6) exists, it is \(\varepsilon -\)close to the identity and satisfies

$$\begin{aligned} \left| I^{(\infty )}-I\right| ,\left| \eta ^{(\infty )}-\eta \right| \le \left( \rho _0/6\right) e^{-at}, \quad \left| \varphi ^{(\infty )}-\varphi \right| \le \left( \sigma _0/6\right) e^{-at}, \end{aligned}$$
(23)

in particular it defines an analytic map \(\mathcal {B}:\mathcal {D}_{\rho _*,\sigma _*} \rightarrow \mathcal {D}_{\rho _0,\sigma _0}\) and \(H^{(\infty )}\) is an analytic function on \(\mathcal {D}_{\rho _*,\sigma _*} \) for all \(t \in {\mathbb {R}}^+\).

Proof

Let us start with I. Note that \(\left\| \mathcal {L}_{\chi ^{(j)}} I^{(j+1)} \right\| _{(1-2d_j)(\rho _j,\sigma _j)} \le n (e d_j \rho _j)^{-1}\left\| \chi ^{(j)} \right\| _{(1- d_j)(\rho _j,\sigma _j)}\) by a Cauchy estimate [see (Giorgilli 2003, Lemma 4.1)], so that the presence of n in (20) is justified. Hence use Proposition 4.1 with \(F \leftarrow \mathcal {L}_{\chi ^{(j)}} I^{(j+1)}, s \leftarrow s-1\), obtaining \(\left\| \mathcal {L}_{\chi ^{(j)}}^{s} I^{(j+1)} \right\| _{(1-3d_j)(\rho _j,\sigma _j)} \le e^{-2}s!\Theta ^{s} \rho _0\). This implies

$$\begin{aligned} |I^{(j+1)}-I^{(j)}|\le e^{-2}\sum _{s \ge 1} (1/s!) \left\| \mathcal {L}_{\chi ^{(j)}}^{s} I^{(j+1)} \right\| _{(1-3d_j)(\rho _j,\sigma _j)} \le 2^{-1} \Theta \rho _0 \le d_j \rho _0 e^{-at}, \end{aligned}$$

by (22). In particular, by (16), each step is \(\varepsilon -\)close to the identity for all \(j \in {\mathbb {N}}\), and hence so is the limit, as \(|I^{(\infty )}-I| \le \sum _{j \ge 0} |I^{(j+1)}-I^{(j)}|\). It is now sufficient to recall \(\sum _{j \ge 0} d_j \le 1/6\) in order to conclude.

The argument for \(\varphi \) is analogous while the variable \(\eta \) requires a slight modification. In particular, as one needs to set \(F \leftarrow \mathcal {L}_{\chi ^{(j)}} \eta =-\chi _t^{(j)}\), the use of the second of (13) requires the contribution of \(C_{\omega }\) in (20).

In conclusion, the obtained composition of analytic maps is uniformly convergent on any compact subset of \(\mathcal {D}_{\rho _*,\sigma _*}\). This implies that \(\mathcal {B}\) is analytic on \(\mathcal {D}_{\rho _*,\sigma _*}\) by the Weierstraß theorem, and hence the image of H via \(\mathcal {B}\) is an analytic function on the same domain. \(\square \)

7 Further perturbation examples

In this section we consider two alternative examples of perturbation. The main purpose is to show that the hypothesis of summability in time over the semi-axis is the only key requirement for the argument behind the proof of Theorem 2.2.

In particular, we shall first consider a decay which is quadratic in time, while in the second example a perturbation exhibiting a finite number of (differentiable) bumps is examined. The procedure is fully analogous, with the exception of some bounds that will be explicitly given below.

7.1 Quadratic decay

Let us suppose that (3) is modified as

$$\begin{aligned} \left\| f(I,\varphi ,t) \right\| _{\rho ,\sigma } \le M_f (t+1)^{-2}. \end{aligned}$$

In the same framework, it is immediate to show that the analogue of Lemma 4.2 yields the following estimates

$$\begin{aligned}&\left\| \chi ^{(j)} \right\| _{(1-\delta )\left( {\hat{\rho }},{\hat{\sigma }}\right) } \le M^{(j)} \left( e \delta ^{-1} {\hat{\sigma }}^{-1}\right) ^{2n} (t+1)^{-1}, \quad \left\| \chi _t^{(j)} \right\| _{(1-\delta )\left( {\hat{\rho }},{\hat{\sigma }}\right) } \\&\quad \le M^{(j)} C_{\omega } \left( e \delta ^{-1} {\hat{\sigma }}^{-1}\right) ^{2n} (t+1)^{-1}. \end{aligned}$$

Clearly, in this case, the integration has led to a “loss of a power” in the decay. This is harmless since, by (19), \(\left\| F^{(j+1)} \right\| _{(1-2d_j)(\rho _j,\sigma _j)}=O(F^{(j)}) O(\chi ^{(j)})+h.o.t.\), and then \(F^{(j+1)} \sim (t+1)^{-3} \le (t+1)^{-2}\), so that the scheme can be iterated.
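The mechanism behind the loss of a power is just the elementary identity \(\int _t^{T}(s+1)^{-2}\,ds=(t+1)^{-1}-(T+1)^{-1}\rightarrow (t+1)^{-1}\) as \(T\rightarrow \infty \): integrating the tail of the perturbation costs one power of decay, which the quadratic product in (8) then restores. A quick numerical confirmation (sample values are hypothetical):

```python
# Check of the "loss of a power" under quadratic decay:
#   \int_t^T (s+1)^{-2} ds = (t+1)^{-1} - (T+1)^{-1},
# so chi^{(j)} decays like (t+1)^{-1} when f decays like (t+1)^{-2},
# while F * chi in (8) regains decay (t+1)^{-3} <= (t+1)^{-2}.
def tail_integral(t, T, N):
    """Trapezoidal approximation of the integral of (s+1)^(-2) over [t, T]."""
    ds = (T - t) / N
    g = lambda s: (s + 1.0) ** -2
    return sum((g(t + j * ds) + g(t + (j + 1) * ds)) * ds / 2 for j in range(N))

for t in (0.0, 1.0, 10.0):
    T = t + 4000.0
    exact = 1 / (t + 1) - 1 / (T + 1)
    assert abs(tail_integral(t, T, 400000) - exact) < 1e-4
```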

The rest of the proof is analogous provided that the term \(e^{-at}\) is replaced with 1 in the remaining estimates.

7.2 Differentiable bumps

Let \(L \in {\mathbb {N}}\setminus \{0\}\) and \(h>0\). Consider an increasing sequence \(\{t_l\}_{l=1,\ldots ,L} \subset {\mathbb {R}}^+\) such that \(t_{l+1}-t_l>2h\), then define the following function

$$\begin{aligned} \xi _l(t):= \left\{ \begin{array}{l@{\quad }c@{\quad }l} \left( a_l/h^4\right) \left[ \left( t-t_l+h\right) \left( t-t_l-h\right) \right] ^2 &{} \quad &{} t \in \left[ t_l-h,t_l+h\right] \\ 0 &{} &{} \text{ otherwise } \end{array} \right. \end{aligned}$$

where \(a_l \in {\mathbb {R}}\). Given a function \(\tilde{f}(I,\varphi ) \in {\mathfrak {C}}_{\rho ,\sigma }\), we set

$$\begin{aligned} f(I,\varphi ,t) := \tilde{f}(I,\varphi ) \sum _{l =1}^L \xi _l(t). \end{aligned}$$

In such case we find

$$\begin{aligned} \left\| \chi ^{(j)} \right\| _{(1-\delta )({\hat{\rho }},{\hat{\sigma }})} \le 2 A M^{(j)} h \left( e \delta ^{-1} {\hat{\sigma }}^{-1}\right) ^{2n} , \quad \left\| \chi _t^{(j)} \right\| _{(1-\delta )({\hat{\rho }},{\hat{\sigma }})} \le M^{(j)} C_{\omega } \left( e \delta ^{-1} {\hat{\sigma }}^{-1}\right) ^{2n}, \end{aligned}$$

with \(A:=\sum _{l=1}^L |a_l|\). The remaining part of the proof is straightforward with the obvious modifications. In particular, as for the proof of Lemma 4.5, one finds \(K=2 n C_{\omega } (e/\sigma _*)^{\tau } h A \rho _*^{-1}\).
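The properties of \(\xi _l\) used here — a \(\mathcal {C}^1\) bump of height \(|a_l|\) supported on an interval of width 2h, with \(\int |\xi _l|=16|a_l|h/15 \le 2|a_l|h\), consistent with the factor 2Ah in the bound above — can be verified numerically for sample (hypothetical) parameters:

```python
# Numerical check of the bump xi_l of Sect. 7.2 (hypothetical sample values):
# C^1 gluing at t_l +/- h, maximum |a_l| at the centre, and
# \int |xi_l| = 16*|a_l|*h/15 <= 2*|a_l|*h.
a_l, t_l, h = 1.5, 10.0, 0.5

def xi(t):
    if t_l - h <= t <= t_l + h:
        return (a_l / h ** 4) * ((t - t_l + h) * (t - t_l - h)) ** 2
    return 0.0

# value and derivative vanish at the endpoints of the support
eps = 1e-7
for t in (t_l - h, t_l + h):
    assert abs(xi(t)) < 1e-12
    assert abs((xi(t + eps) - xi(t - eps)) / (2 * eps)) < 1e-5

# maximum |a_l| attained at the centre
assert abs(xi(t_l) - a_l) < 1e-12

# midpoint rule for the integral; exact value is 16*|a_l|*h/15
N = 200000
ds = 2 * h / N
integral = sum(abs(xi(t_l - h + (j + 0.5) * ds)) * ds for j in range(N))
assert abs(integral - 16 * abs(a_l) * h / 15) < 1e-6
assert integral <= 2 * abs(a_l) * h
```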

8 Part II

9 Proof of Theorem 2.3

In order to simplify the notation, we shall use \((\rho _H,\sigma _H)\) in place of \((\rho ,\sigma )\) and \((\rho ,\sigma )\) in place of \((\tilde{\rho }_0,\tilde{\sigma }_0)\) from now on.

10 Formal algorithm

As in Giorgilli (2003), we write Hamiltonian (1) in the form

$$\begin{aligned} H(I,\varphi ,\eta ,t)=H_0(I,\eta )+H_1(I,\varphi ,t)+H_2(I,\varphi ,t)+\ldots \end{aligned}$$

where

$$\begin{aligned} H_0(I,\eta ):=h(I)+\eta , \quad H_s(I,\varphi ,t):=\sum _{k \in \Lambda _s} f_k(I,t)e^{i k \cdot \varphi }, \end{aligned}$$

where \(\Lambda _s:=\{k \in {\mathbb {Z}}^n: (s-1)N \le |k| < sN\}\) and \(N \in {\mathbb {N}}\setminus \{0\}\) is to be determined.

Given a sequence of functions \(\{\chi ^{(s)}\}_{s\ge 1} \subset {\mathfrak {C}}_{\rho ,\sigma }\), the Lie transform operator is defined as

$$\begin{aligned} T_{\chi }:=\displaystyle \sum _{s \ge 0} E_s,\quad E_s:= \left\{ \begin{array}{l@{\quad }l@{\quad }l} {{\mathrm{Id}}}&{} \quad &{} s=0\\ \displaystyle \frac{1}{s} \sum _{j=1}^s j \mathcal {L}_{\chi ^{(j)}} E_{s-j} &{} \quad &{} s \ge 1 \end{array} \right. . \end{aligned}$$
(24)

Let \(r \in {\mathbb {N}}\setminus \{0\}\) be determined later. A finite generating sequence of order r, denoted by \(\chi ^{[r]}\), is one such that \(\chi ^{(s)} \equiv 0\) for all \(s >r\). Our aim is to determine it in such a way that the effect of \(H_1,\ldots ,H_r\) is removed, i.e.

$$\begin{aligned} H^{(r)}:=T_{\chi ^{[r]}} H=H_0+\mathcal {R}^{(r+1)}(I,\varphi ,t), \end{aligned}$$
(25)

where the remainder \(\mathcal {R}^{(r+1)}\) contains \(H_{> r}\) and a multitude of terms produced during the normalization, whose Fourier harmonics lie in \(\Lambda _{>r}\). The smallness of the remainder is an immediate consequence of the decay property of the coefficients of an analytic function. The procedure is standard: condition (25), with the use of (24), yields a well-known diagram whose \(s-\)th level is of the form

$$\begin{aligned} \mathcal {E}_s:=E_s H_0+\sum _{l=1}^{s-1} E_{s-l} H_l + H_s=0, \end{aligned}$$
(26)

if \(s = 2,\ldots ,r\) and \(E_1 H_0+H_1=0\) if \(s=1\). As the sum of all the “non-normalised” levels, the remainder reads

$$\begin{aligned} \mathcal {R}^{(r+1)}=\sum _{s > r} \mathcal {E}_s. \end{aligned}$$
(27)

By writing the first term of (26) in the form \(E_s=\mathcal {L}_{\chi ^{(s)}}+\sum _{j=1}^{s-1} (j/s) \mathcal {L}_{\chi ^{(j)}}E_{s-j}\) and using the manipulation described in (Giorgilli 2003, Chapter 5), one obtains a remarkable cancellation of the contribution of \(H_0\). In this way, the generating sequence is determined as a solution of

$$\begin{aligned} \mathcal {L}_{H_0} \chi ^{(s)} = \Psi _s,\quad \Psi _s:= \left\{ \begin{array}{l@{\quad }l@{\quad }l} H_1 &{} \quad &{} s=1\\ \displaystyle H_s+\sum _{j=1}^{s-1} \frac{j}{s} E_{s-j}H_j &{} &{} s \ge 2 \end{array} \right. . \end{aligned}$$
(28)

A formal expansion of \(\chi ^{(s)}=\sum _{k \in {\mathbb {Z}}^n} c_k^{(s)}(I,t)e^{i k \cdot \varphi }\) and of \(\Psi _s:=\sum _{k \in {\mathbb {Z}}^n} \psi _k^{(s)}(I,t)e^{i k \cdot \varphi }\) yields, for all \(s=1,\ldots ,r\),

$$\begin{aligned} \partial _t c_k^{(s)}(I,t)+i(\omega (I)\cdot k) c_k^{(s)}(I,t) = \psi _k^{(s)}(I,t), \quad k \in \Lambda _s , \end{aligned}$$
(29)

where, as usual, \(\omega (I):=\partial _I h(I)\).
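A scalar model of Eq. (29) can be checked numerically. The sketch below (hypothetical sample values: a real \(\lambda \) standing in for \(\omega (I)\cdot k\) and forcing \(\psi (t)=e^{-at}\)) computes the particular solution obtained by integrating "from \(+\infty \)", as in the proof of Lemma 7.1 below, and verifies it against the closed form and the expected exponential decay:

```python
# Scalar model of the homological equation (29): c'(t) + i*lam*c(t) = psi(t),
# with the particular solution c(t) = -int_t^infty e^{i lam (tau-t)} psi(tau) dtau.
# All numerical values (lam, a, psi) are hypothetical sample choices.
import cmath
import math

lam, a = 2.0, 1.0                      # stand-ins for omega(I).k and the decay rate
psi = lambda t: cmath.exp(-a * t)      # forcing with exponential time decay

def c(t, T=60.0, n=60000):
    # trapezoidal rule on [t, t+T]; the neglected tail is of size e^{-a T}
    h = T / n
    s = 0.5 * (psi(t) + cmath.exp(1j * lam * T) * psi(t + T))
    for k in range(1, n):
        s += cmath.exp(1j * lam * k * h) * psi(t + k * h)
    return -h * s

exact = lambda t: cmath.exp(-a * t) / (1j * lam - a)   # closed form for this psi

vals = {t: c(t) for t in (0.0, 0.5, 1.0)}
err = max(abs(vals[t] - exact(t)) for t in vals)
# decay of the solution, in the spirit of (37): |c(t)| <= (1/a) e^{-a t}
decay_ok = all(abs(vals[t]) <= math.exp(-a * t) / a + 1e-6 for t in vals)
```

In the paper \(\omega (I)\cdot k\) is complex on \(\mathcal {G}_{\rho }\), which is precisely the difficulty discussed in Remark 6.1; the real-\(\lambda \) model only illustrates the mechanism.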

Remark 6.1

As a substantial difference with the isochronous case, the function \(\omega (I)\) is a complex-valued vector for \(I \in \mathcal {G}_{\rho }\). In this way the exponent \(\lambda (k)t\) appearing in formula (11) is no longer purely imaginary. More precisely, one finds a term of the form \(\exp ((\omega _C (I)\cdot k) t)\), having denoted \(\omega (I)=\omega _R(I)+i \omega _C(I)\), \(\omega _{R,C}(I) \in {\mathbb {R}}^n\). The size of this term cannot be controlled without a cut-off on k. Restricting the analysis to the levels \(\Lambda _s\) and using the fact that \(|\omega _C(I)| \rightarrow 0 \) as \( \rho \rightarrow 0\), the sacrifice of part of the time decay at each step (see Lemma 7.1) will be the key ingredient to overcome this difficulty. These elements are clear obstructions to the limit \(r \rightarrow \infty \).

7 Convergence

7.1 Set-up and some preliminary results

The use of the analytic tools requires the usual construction of a sequence of nested domains. We shall choose, for all \(s=1,\ldots ,r\), the rule

$$\begin{aligned} d_s:=d (s-1)/r, \end{aligned}$$
(30)

with \(d \in (0,1/4]\). Clearly \(d_s < d\) for all \(s=1,\ldots ,r\). Consider also the monotonically decreasing sequence of non-negative real numbers \(\{a_s\}\) defined as follows

$$\begin{aligned} a_{s+1}:=a_s(2r-s)/(2r), \quad a_1:=a. \end{aligned}$$
(31)
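The two bookkeeping sequences (30) and (31) can be tabulated directly. The snippet below (hypothetical sample values of a, d, r) checks the monotonicity properties stated here, together with the lower bound \(a_{r+1}\ge a\,2^{-r}\) used later in the text:

```python
# Tabulation of the sequences (30) and (31) for sample values of a, d, r
# (hypothetical choices, only meant to illustrate the bookkeeping).
a, d, r = 1.0, 0.25, 10

d_seq = [d * (s - 1) / r for s in range(1, r + 1)]   # d_1, ..., d_r
a_seq = [a]                                          # a_1 = a
for s in range(1, r + 1):
    a_seq.append(a_seq[-1] * (2 * r - s) / (2 * r))  # a_{s+1} = a_s (2r-s)/(2r)

props = (all(x < d for x in d_seq)                           # d_s < d
         and all(a_seq[i + 1] < a_seq[i] for i in range(r))  # a_s decreasing
         and a_seq[r] >= a * 2.0 ** (-r) - 1e-15)            # a_{r+1} >= a 2^{-r}
```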

Given the analyticity domain of H expressed by \((\rho _H,\sigma _H)\), set \(\sigma :=\sigma _H/2\). Now consider the function \(\Omega (\rho ):=\sup _{I \in \mathcal {G}_{\rho }}|\omega _C(I)|\), clearly \(\Omega (0)=0\). From now on we shall suppose that \(\rho \) satisfies the following condition

$$\begin{aligned} 4 r N \Omega (\rho ) \le a. \end{aligned}$$
(32)

The analyticityFootnote 10 of h(I) implies the existence of \(C_h \in [1,+\infty )\) such that the value of \(\rho \) can be determined as

$$\begin{aligned} \rho :=\min \left\{ \rho _H,a(4 r N C_h)^{-1}\right\} , \end{aligned}$$
(33)

once r and N have been chosen.

The scheme is constructed in such a way that one can set \((\tilde{\rho }_*,\tilde{\sigma }_*):=(1-d)(\rho ,\sigma )\).

As a consequence of Hypothesis 2.1 and of the standard properties of analytic functions, one has

$$\begin{aligned} \left\| H_m \right\| _{\rho ,\sigma } \le \mathcal {F} h^{m-1} e^{-a t},\quad m \ge 1, \end{aligned}$$
(34)

with \(\mathcal {F}:=\varepsilon \tilde{\mathcal {F}}\), where [see (Giorgilli 2003, Lemma 5.2)] \(\tilde{\mathcal {F}}:=[(1+\exp (-\sigma /2))/(1-\exp (-\sigma /2))]^n\) and

$$\begin{aligned} h:=\exp (-N\sigma /2). \end{aligned}$$
(35)

Lemma 7.1

Suppose that \(\left\| \Psi _s \right\| _{(1-d_{s})(\rho ,\sigma )} \le M^{(s)} \exp (-a_s t)\), for some \(M^{(s)}>0\). Then the solution of (28) satisfies

$$\begin{aligned} 4a \left\| \chi ^{(s)} \right\| _{(1-d_{s+1/2})(\rho ,\sigma )} , 4 \left\| \partial _t \chi ^{(s)} \right\| _{(1-d_{s+1/2})(\rho ,\sigma )} \le C_r M^{(s)} e^{-a_{s+1}t}, \end{aligned}$$
(36)

where \(C_r:=2^{2n+4}(r/d)^n\).

Proof

Use (29). Similarly to Lemma 4.2, we choose \(c_k^{(s)}(I,0):=-\int _{{\mathbb {R}}^+} \exp (-(\omega (I)\cdot k)\tau ) \psi _k^{(s)}(I,\tau )d \tau \). Note that \(|c_k^{(s)}(I,0)| \le M^{(s)} \exp (-(1-d_s)|k|\sigma ) \int _{{\mathbb {R}}^+} \exp ((|\omega _C(I)||k|-a_s)\tau ) d \tau < +\infty \) on \(\Lambda _s\) by (32). By using again (32) one gets

$$\begin{aligned} |c_k^{(s)}(I,t)| \le M^{(s)} e^{-(1-d_s)|k|\sigma } e^{\frac{a_s s}{4r}t} \int _t^{\infty } e^{a_s \left( \frac{s-4r}{4r} \right) \tau } d \tau \le \frac{4}{a} M^{(s)} e^{-(1-d_s)|k|\sigma } e^{-a_s \left( 1-\frac{s}{2r}\right) t}\text{. }\nonumber \\ \end{aligned}$$
(37)

The first of (36) is easily recognisedFootnote 11 by (31). The second of (36) follows from (37) and from (29). \(\square \)

Lemma 7.2

Let \(A,\Gamma ,\tau > 0\) and consider the real-valued sequences \(\{\kappa _s\}_{s \ge 1}\) and \(\{\gamma _l\}_{l \ge 0}\) defined as

$$\begin{aligned} \kappa _s:=A \tau ^{s-1} + \Gamma \sum _{j=1}^{s-1} \tau ^{j-1} \kappa _{s-j},\quad \gamma _l := \Gamma \sum _{j=1}^l \tau ^{j-1} \gamma _{l-j}, \end{aligned}$$
(38)

where \(\kappa _1\) and \(\gamma _0\) are given. Define \(\Delta :=\tau +\Gamma \), then for all \(s \ge 2\) and \( l \ge 1\)

$$\begin{aligned} \kappa _s = \left( \Gamma \kappa _1+\tau A\right) \Delta ^{s-2}, \quad \gamma _l = \gamma _0 \Gamma \Delta ^{l-1}. \end{aligned}$$
(39)

Proof

We shall denote by (38a) and (38b) the first and the second of (38), respectively, and similarly for (39). Let us suppose for a moment that (39a) is proven; then choose \(A=\Gamma \gamma _0\) and \(\kappa _1=\Gamma \gamma _0 = \gamma _1\). By substituting in (39a) one immediately gets (39b). Hence we need only to prove (39a).

For this purpose we use the well-known generating function method (see e.g. Wilf 2006). Namely, define \(g(z):=\sum _{n=1}^{\infty } \kappa _n z^n\), multiply each equation obtained from (38a) by \(z^s\) as s varies, then “sum” all the equations. This leads to \(g(z)=[1-\Delta z]^{-1}(\kappa _1 (z-\tau z^2)+A \tau z^2)=(1+\Delta z + \Delta ^2 z^2+\ldots )(\kappa _1 (z-\tau z^2)+A \tau z^2)=\kappa _1 z + (\Gamma \kappa _1 + \tau A) \sum _{n \ge 2} \Delta ^{n-2} z^n\), which is (39a). \(\square \)
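Lemma 7.2 also lends itself to a direct numerical verification. The sketch below (hypothetical sample values of \(A,\Gamma ,\tau ,\kappa _1,\gamma _0\)) iterates the recursions (38) and compares the results with the closed forms (39):

```python
# Numerical verification of Lemma 7.2 with hypothetical parameter values:
# iterate the recursions (38) and compare with the closed forms (39).
A, Gamma, tau = 0.7, 0.2, 0.3
kappa1, gamma0 = 1.1, 0.9
Delta = tau + Gamma

kappa = [None, kappa1]                 # kappa[1] = kappa_1 (index 0 unused)
for s in range(2, 12):
    kappa.append(A * tau ** (s - 1)
                 + Gamma * sum(tau ** (j - 1) * kappa[s - j]
                               for j in range(1, s)))

gamma = [gamma0]                       # gamma[0] = gamma_0
for l in range(1, 12):
    gamma.append(Gamma * sum(tau ** (j - 1) * gamma[l - j]
                             for j in range(1, l + 1)))

err = max(max(abs(kappa[s] - (Gamma * kappa1 + tau * A) * Delta ** (s - 2))
              for s in range(2, 12)),
          max(abs(gamma[l] - gamma0 * Gamma * Delta ** (l - 1))
              for l in range(1, 12)))
```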

7.2 Bounds on the generating function

Proposition 7.3

For all \(s \le r\), the following estimate holds

$$\begin{aligned} \left\| \chi ^{(s)} \right\| _{(1-d_{s+1/2})(\rho ,\sigma )} \le (4a)^{-1} C_r \beta _s \mathcal {F} e^{-a_{s+1} t} \text{, } \end{aligned}$$
(40)

where the sequence \(\{\beta _s\}_{s=1,\ldots ,r} \in {\mathbb {R}}^+\) is determined by the following system

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l@{\quad }l} \beta _s &{} = &{} \displaystyle h^{s-1} + \frac{\Gamma }{s} \sum _{j=1}^{s-1} j \theta _{s-j} \\ \theta _l &{} = &{} \displaystyle \frac{\Gamma }{l} \sum _{j=1}^l j \beta _j \theta _{l-j} \end{array} \right. \end{aligned}$$
(41)

with \(\{\theta _l\}_{l=0,\ldots ,r-1} \in {\mathbb {R}}^+\) and

$$\begin{aligned} \Gamma :=16 n r^2 C_r \mathcal {F} (a d^2 \rho \sigma )^{-1} \text{, } \end{aligned}$$
(42)

under the conditionsFootnote 12 \(\beta _1=\theta _0=1\).

First of all note that by (24) and (34), one has \(\left\| \Psi _1 \right\| _{(1-d_{1})(\rho ,\sigma )} \le \mathcal {F}\exp (-a_1 t)\) and \(\left\| E_0 H_m \right\| _{(1-d_{1})(\rho ,\sigma )} \le \mathcal {F} h^{m-1} \exp (-a_1 t)\) (recall (31)). Hence, given \(s \le r\), we can suppose by induction to know \(\beta _1,\ldots ,\beta _{s-1}\) and \({\tilde{\theta }}_{0,m},\ldots ,{\tilde{\theta }}_{s-2,m}\), for all \(m \ge 1\), with \(\beta _1=1\) and \({\tilde{\theta }}_{0,m}=h^{m-1}\), such that the following bounds hold for all \(j=1,\ldots ,s-1\) and \(l=0,\ldots ,s-2\)

$$\begin{aligned} \left\| \Psi _j \right\| _{(1-d_{j})(\rho ,\sigma )}&\le \beta _j \mathcal {F} e^{-a_j t}, \end{aligned}$$
(43a)
$$\begin{aligned} \left\| E_l H_m \right\| _{(1-d_{l+1})(\rho ,\sigma )}&\le {\tilde{\theta }}_{l,m} \mathcal {F} e^{-a_{l+1}t}, \end{aligned}$$
(43b)

By (43a) and Lemma 7.1, the bound (40) holds with j in place of s. Hence by Proposition 4.1 with \(G=\chi ^{(j)}, F=E_{s-j-1}H_m\) then \({\hat{d}}=\max _{j=1,\ldots ,s-1}\{d_{j+1/2},d_{s-j}\}=d_{s-1/2}\) and finally \(\tilde{d}:=d_s-d_{s-1/2}=d/(2r)\), one has (by setting \(\delta =0\))

$$\begin{aligned} \begin{array}{l@{\quad }l@{\quad }l} \left\| \mathcal {L}_{\chi ^{(j)}} E_{s-j-1} H_m \right\| _{(1-d_{s})(\rho ,\sigma )} &{} \le &{} 8 r^2 (e d^2 \rho \sigma )^{-1} \left\| \chi ^{(j)} \right\| _{(1-d_{j+1/2})(\rho ,\sigma )}\\ &{}&{}\quad \left\| E_{l-j} H_m \right\| _{(1-d_{l-j+1})(\rho ,\sigma )}\\ &{} \le &{} \Gamma \mathcal {F} \beta _j {\tilde{\theta }}_{l-j,m} e^{-a_{l+1}t} \end{array} \end{aligned}$$
(44)

where the property \(a_{j+1} + a_{l-j+1} \ge a_{l+1}\) has been used. Recalling (24), we have that (43b) holds also for \(l=s-1\), where

$$\begin{aligned} {\tilde{\theta }}_{l,m} = \frac{\Gamma }{l} \sum _{j=1}^l j \beta _j {\tilde{\theta }}_{l-j,m} \text{. } \end{aligned}$$
(45)

Furthermore, it is easy to show from the latter that \({\tilde{\theta }}_{l,m}=h^{m-1} {\tilde{\theta }}_{l,1}\), so that, defining \(\theta _l:={\tilde{\theta }}_{l,1}\), one gets \({\tilde{\theta }}_{l,m}=h^{m-1}\theta _l\), and then the second of (41), provided \(\theta _0=1\). In conclusion, by using (34) and the second of (41) in the definition of \(\Psi _s\) as in (28), we get that (43a) is satisfied if \(\beta _s\) is defined as in the first of (41). Bound (40) follows from Lemma 7.1.

Proposition 7.4

The sequence \(\beta _s\) defined by (41) satisfies

$$\begin{aligned} \beta _s \le \tau ^{s-1}/s, \end{aligned}$$
(46)

for \(s=1,\ldots ,r\), if

$$\begin{aligned} \tau :=eh, \quad \Gamma \le h/(2r^2). \end{aligned}$$
(47)

Proof

The property (46) is trivially true for \(s=1\); hence let us suppose it for \(j=1,\ldots ,s-1\) and proceed by induction, with \(\tau \) to be determined. Define \({\tilde{\theta }}_l:=\theta _l(\beta _j)|_{\beta _j=\tau ^{j-1}/j}\), then let \({\hat{\theta }}_l\) be defined by the same recursion with the factor 1 / l removed, i.e. \( {\hat{\theta }}_l=\Gamma \sum _{j=1}^{l} \tau ^{j-1} {\hat{\theta }}_{l-j} \). Clearly \(\theta _l \le {\tilde{\theta }}_l \le {\hat{\theta }}_l/l\), furthermore \(\theta _0 = {\tilde{\theta }}_0={\hat{\theta }}_0=1\). Hence, by Lemma 7.2 we have

$$\begin{aligned} \theta _l \le \Gamma \Delta ^{l-1}/l. \end{aligned}$$
(48)

Now choose \(\tau ,\Gamma \) as in (47). By using (47) and (48) in the first of (41) one gets that (46) is satisfied simply by checking that the inequality

$$\begin{aligned} y(s):=s+\frac{(s-1)}{2 r^2} \left( e+ \frac{1}{2 r^2} \right) ^{s-1} \le e^{s-1} \end{aligned}$$
(49)

holds true for allFootnote 13 \(s=1,\ldots ,r\). \(\square \)

7.3 Estimates on the coordinates transformation

From now on we shall suppose that h and \(\varepsilon \) are chosen in such a way that

$$\begin{aligned} 8 e h&\le 1 \end{aligned}$$
(50a)
$$\begin{aligned} 2 r^ 2 \Gamma&\le \sqrt{\varepsilon } h \end{aligned}$$
(50b)

In particular, by definition and by (47), this immediately implies that

$$\begin{aligned} 4 \Delta \le 1 \end{aligned}$$
(51)

As in Giorgilli (2003), although the generating sequence is finite, one can use the bound obtained from Proposition 7.3

$$\begin{aligned} \left\| \chi ^{(s)} \right\| _{(1-d)(\rho ,\sigma )}\le (4a)^{-1} C_r \mathcal {F} \beta _s e^{-a_{r+1}t}, \end{aligned}$$
(52)

with \(\beta _s\) satisfying (46) for all s, since, trivially, \(\beta _{s}=0\) for \(s>r\).

Proposition 7.5

Define \((I^{(r)},\varphi ^{(r)},\eta ^{(r)}):=T_{\chi ^{[r]}}(I,\varphi ,\eta )\). Then the following estimates hold

$$\begin{aligned}&\left\| I-I^{(r)} \right\| _{(1-d)(\rho ,\sigma )}, \left\| \eta -\eta ^{(r)} \right\| _{(1-d)(\rho ,\sigma )} \le \frac{d \rho }{8}e^{-a_{r+1}t},\nonumber \\&\quad \left\| \varphi -\varphi ^{(r)} \right\| _{(1-d)(\rho ,\sigma )} \le \frac{d \sigma }{8}e^{-a_{r+1}t} \text{. } \end{aligned}$$
(53)

Proof

Let us start from the variable I. Firstly, note that \(\left\| I-T_{\chi ^{[r]}}I \right\| _{(1-d)(\rho ,\sigma )}\le \sum _{s \ge 1} \left\| E_s I \right\| _{(1-d)(\rho ,\sigma )}\). In addition

$$\begin{aligned}&\left\| E_1 I \right\| _{(1-d_{2})(\rho ,\sigma )} = \left\| \partial _{\varphi }\chi ^{(1)} \right\| _{(1-d_{2})(\rho ,\sigma )} \le 2 n r (e d \sigma )^{-1} \left\| \chi ^{(1)} \right\| _{(1-d_{3/2})(\rho ,\sigma )}\\&\quad \le D_{\sigma } \mathcal {F} \exp (-a_{r+1}t) \text{, } \end{aligned}$$

with \(D_{\sigma }:=n r C_r/(2 d \sigma a)\) by Prop. 7.3. Hence suppose \(\left\| E_l I \right\| _{(1-d_{l+1})(\rho ,\sigma )} \le \mathcal {F} u_l \exp (-a_{r+1}t)\) for all \(l=1,\ldots ,s-1\) with \(u_1=D_{\sigma }\) and proceed by induction.

The bound of \(E_l I\) can be treated in the same way as (43b), with the difference that in this case the term \(\mathcal {L}_{\chi ^{(l)}} I\) appearing in \(E_l I\) needs to be bounded separately by using (40) and a Cauchy estimate. This leads to \(u_l=\beta _l D_{\sigma }+ (\Gamma /l) \sum _{j=1}^{l-1} j \beta _j u_{l-j}\). By the same procedure used in the proof of Proposition 7.4 for \(\theta _l\), one gets \(u_l \le (D_\sigma /l) \Delta ^{l-1}\). The required bound easily follows as \( \mathcal {F} \sum _{s \ge 1 } u_s \le 2 \mathcal {F} D_{\sigma } \le \Gamma d \rho \le \sqrt{\varepsilon } d \rho /8\), where the first inequality follows from (51), the second from the definitions of \(D_{\sigma }\) and \(\Gamma \), and the last one from (50b) and then (50a). The procedure for the variables \(\varphi \) and \(\eta \) is similar. The analyticity of the transformation \(\mathcal {N}_r:=T_{\chi ^{[r]}}^{-1}\) easily follows from the bounds (53) and the invertibility of the Lie transform operator, see Giorgilli (2003). \(\square \)

7.4 Bound on the remainder

Proposition 7.6

Define \(A:=10 \tilde{\mathcal {F}}\); then, for all \(r \ge 1\),

$$\begin{aligned} \left\| \mathcal {R}^{(r+1)} \right\| _{(1-2d)(\rho ,\sigma )} \le \varepsilon A e^{-(r+a_{r+1}t)} \text{. } \end{aligned}$$
(54)

Proof

Define \((\rho ^{\prime },\sigma ^{\prime }):=(1-d)(\rho ,\sigma )\). Now recall (27) and suppose by induction that, for all \(l=1,\ldots ,s-1\) and \(m=0,\ldots ,s-2\), with \(s \in {\mathbb {N}}\),

$$\begin{aligned}&\left\| E_l H_0 \right\| _{(1-(l/s)d)(\rho ^{\prime },\sigma ^{\prime })} \le \mathcal {F} \epsilon _l \exp (-a_{r+1}t), \nonumber \\&\left\| E_m H_n \right\| _{(1-(m/s)d)(\rho ^{\prime },\sigma ^{\prime })} \le \mathcal {F} \zeta _{m,n} \exp (-a_{r+1}t). \end{aligned}$$
(55)

Indeed one can set \(\zeta _{0,n}=h^{n-1}\) and \(\epsilon _1=\beta _1=1\), as \(\mathcal {L}_{\chi ^{(1)}}H_0=-\Psi _1\) by (28). We stress that, although based on the same computations, the argument is conceptually different from the previous estimates, as \(s \in (r,+\infty )\), and the use of \(\delta \) in (12) plays here a key role. More precisely, use Proposition 4.1 with \(G=\chi ^{(j)}\) and \(F=E_{s-j}H_0\), hence \(d^{\prime \prime }=0\), then \({\hat{d}}=d^{\prime }=\delta =d(s-j)/s\), from which \(\tilde{d}=(j/s)d\). This leads to \(\left\| \mathcal {L}_{\chi ^{(j)}} E_{s-j} H_0 \right\| _{(1-d)(\rho ^{\prime },\sigma ^{\prime })} \le \Gamma (s/j)\beta _j \epsilon _{s-j} \exp (-a_{r+1}t)\), implyingFootnote 14 that the first of (55) holds for \(l=s\) provided \(\epsilon _s=\beta _s+\Gamma \sum _{j=1}^{s-1} \beta _j \epsilon _{s-j} = \Delta ^{s-1}\), the latter by Lemma 7.2. This implies \(\left\| \sum _{l=1}^s E_{s-l}H_l \right\| _{(1-d)(\rho ^{\prime },\sigma ^{\prime })} \le \mathcal {F} (s+1) \Delta ^{s-1} \exp (-a_{r+1}t)\) by using (34) and the trivial bound \(h \le \Delta \). Similarly one finds \(\zeta _{s,n}=h^{n-1}\Delta ^{s-1}\), hence

$$\begin{aligned} \left( \mathcal {F} e^{-a_{r+1}t}\right) ^{-1} \mathcal {R}^{(r+1)} \le \sum _{s > r} (2+s) \Delta ^{s-1} \le \Delta ^r \left( \frac{r+3}{1-\Delta }+ \frac{1}{1-\Delta ^2} \right) \le 2 (r+4) \Delta ^r , \end{aligned}$$

by (51). Noticing that \(\mathcal {D}_{(1-2d)(\rho ,\sigma )} \subset \mathcal {D}_{(1-d)^2(\rho ,\sigma )}\), the bound (54) easily follows from (51) and from the simple inequality \((r+4)e^r \le 5 (4^r)\). \(\square \)
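The final chain of inequalities can be tested numerically. The snippet below (sample values of \(\Delta \) and r, all with \(4\Delta \le 1\) as in (51)) checks the tail bound together with the elementary inequality \((r+4)e^r \le 5\cdot 4^r\):

```python
# Numerical check: for 4*Delta <= 1,
# sum_{s>r} (2+s) Delta^{s-1} <= 2 (r+4) Delta^r,
# together with (r+4) e^r <= 5 * 4^r (sample values of Delta, r).
import math

ok = True
for r in range(1, 30):
    for Delta in (0.05, 0.1, 0.25):
        # truncated tail; the neglected part is of size Delta^(r+398)
        tail = sum((2 + s) * Delta ** (s - 1) for s in range(r + 1, r + 400))
        ok = ok and tail <= 2 * (r + 4) * Delta ** r
    ok = ok and (r + 4) * math.exp(r) <= 5.0 * 4.0 ** r
```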

7.5 Choice of the parameters

Let us discuss a possible choice of the parameters such that the convergence conditions are satisfied. More precisely, by (35), condition (50a) holds if \(N=\lceil 2 \sigma ^{-1}(1+3 \log 2) \rceil \), where \(\lceil \cdot \rceil \) denotes rounding up to the nearest integer. This implies that \( h \ge 1/(16 e) \), hence (50b) holds if \(2^5 e r^2 \Gamma \le \sqrt{\varepsilon }\). Recalling (32) and (42), this condition is achieved by choosing (see also Giorgilli and Galgani 1985)

$$\begin{aligned} r:=\left\lfloor \left( \frac{\varepsilon _a^*}{\varepsilon } \right) ^{\frac{1}{2 \gamma }} \right\rfloor , \quad \sqrt{\varepsilon _a^*}:=\frac{a^2 d^{n+2} \rho _H \sigma ^2}{2^{2n+19} e n C_h \tilde{\mathcal {F}} }, \end{aligned}$$
(56)

whereFootnote 15 \(\gamma =5+n\) and \(\lfloor \cdot \rfloor \) denotes rounding down to the nearest integer. The condition \(\varepsilon \le \varepsilon _a^*\), as in the statement of Theorem 2.3, clearly ensures that \( r \ge 1\). The final value of \(\rho \) is determined by (33).
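The choices of N and h can be traced in a few lines. The sketch below (sample values of \(\sigma \); in the paper \(\sigma =\sigma _H/2\) depends on the Hamiltonian) computes \(N=\lceil 2\sigma ^{-1}(1+3\log 2)\rceil \) and \(h=e^{-N\sigma /2}\) and checks condition (50a) together with the claimed lower bound \(h\ge 1/(16e)\):

```python
# Sketch of the parameter choices for sample values of sigma (hypothetical):
# N = ceil(2/sigma * (1 + 3 log 2)) gives h = exp(-N sigma/2) <= 1/(8e),
# i.e. condition (50a), and h >= 1/(16e) for these moderate values of sigma.
import math

ok = True
for sigma in (0.1, 0.5, 1.0):
    N = math.ceil(2.0 / sigma * (1.0 + 3.0 * math.log(2.0)))
    h = math.exp(-N * sigma / 2.0)
    ok = ok and 8.0 * math.e * h <= 1.0        # condition (50a)
    ok = ok and h >= 1.0 / (16.0 * math.e)     # claimed lower bound on h
```

Note that the lower bound on h relies on \(\sigma \) not being too large (roughly \(\sigma \le 2\log 2\)), which is the regime of interest here.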

Let us write the usual bound \(|I(t)-I(0)| \le |I(t)-I^{(r)}(t)|+|I^{(r)}(t)-I^{(r)} (0)|+|I^{(r)}(0)-I(0)|\). The first and third terms of the r.h.s. are bounded by \(\sqrt{\varepsilon }d \rho /8\) by (53). As for the second term, the equations of motion give \(\dot{I}^{(r)}=-\partial _{\varphi }H^{(r)}=-\partial _{\varphi }\mathcal {R}^{(r+1)}\); moreover, \(\left\| \partial _{\varphi } \mathcal {R}^{(r+1)} \right\| _{(1-2d)(\rho ,\sigma )} \le \varepsilon A (e d \sigma )^{-1} \exp (-(r+a_{r+1}t))\) by a Cauchy estimate and by (54). Hence

$$\begin{aligned} \left| I^{(r)}(t)-I^{(r)} (0)\right| \le \varepsilon A (e d \sigma )^{-1} e^{-r} \int _0^t e^{-a_{r+1}s}ds \le \varepsilon A (a d e \sigma )^{-1} (2/e)^{r}, \end{aligned}$$
(57)

as \(a_{r+1}=a(2r-1)(2r-2)\ldots (r)/(2r)^r \ge a2^{-r}\).

Remark 7.7

Although it relies on a normal form of finite order, the bound (57) is the key difference from the standard Nekhoroshev theorem. The remainder, which is bounded by a constant in the classical Nekhoroshev estimate and hence produces a linearly growing bound for the quantity \(|I^{(r)}(t)-I^{(r)} (0)|\), is now summable over \({\mathbb {R}}^+\). Hence, a restriction to exponentially large times is no longer necessary.

It is immediate from (57) that for all \(\varepsilon \le \varepsilon _a^*\) one has \(|I^{(r)}(t)-I^{(r)} (0)|\le 2 \varepsilon _a^* A (a d e^2 \sigma )^{-1}\) which is clearly smaller than \(\sqrt{\varepsilon } d \rho /4\) by (56). Hence \(|I(t)-I(0)| \le \sqrt{\varepsilon } d \rho /2\).