1 Introduction

The theory of normal forms of maps plays an important role in dynamics and goes back to the works of Poincaré and Sternberg [20]. Normal forms at fixed points and invariant manifolds have been extensively studied [1]. More recently, the theory of non-stationary linearizations and, more generally, normal forms was developed in the context of diffeomorphisms with contracting foliations [5, 8, 9, 12, 19]. It has proved useful in the study of smooth dynamics and rigidity for dynamical systems and group actions exhibiting various forms of hyperbolicity, see for example [2, 3, 6, 7, 11, 13–15].

Let f be a \(C^\infty \) diffeomorphism of a compact smooth manifold \(\mathcal {M}\), and let W be a continuous invariant foliation with \(C^\infty \) leaves which is contracted by f, that is \(\Vert Df|_{TW}\Vert <1\) in some Riemannian metric. The goal of the normal form theory in this setting is to find a family of diffeomorphisms \({\mathcal {H}}_x: W_x \rightarrow T_xW\) such that the maps

$$\begin{aligned} \tilde{f}_x ={\mathcal {H}}_{fx} \circ f \circ {\mathcal {H}}_x ^{-1}: \;T_x W \rightarrow T_{fx}W \end{aligned}$$
(1.1)

are as simple as possible, for example linear maps or polynomial maps in a finite-dimensional Lie group. The maps \({\mathcal {H}}_x\) should depend regularly on x and ideally form a good atlas on each leaf of W. For technical reasons, it is often easier to operate with marked leaves, which can be identified with the tangent spaces \(T_xW\), producing an extension of the original system to TW. The results in [8, 9, 12] are stated in such a setting, but most applications come from foliated systems.

Non-stationary linearization, i.e. existence of \({\mathcal {H}}_x\) so that \(\tilde{f}_x\) in (1.1) are linear, was first established by Katok and Lewis for one-dimensional W [12]. For higher-dimensional foliations under a uniform 1/2 pinching assumption, non-stationary linearization follows from results of Guysinsky and Katok [9] or from results of Feres in [4], where a differential geometric point of view was developed. Under a weaker assumption of pointwise 1/2 pinching, it was obtained in [19] and additional properties were established in [15]. These results are summarized below.

Non-stationary linearization

Suppose that \(\Vert Df|_{TW}\Vert <1\), and there exist \(C>0\) and \(\gamma <1\) such that for all \(x\in \mathcal {M}\) and \(n\in \mathbb N\),

$$\begin{aligned} \left\| \,\left( Df^n|_{T_xW}\right) ^{-1}\,\right\| \cdot \left\| \,Df^n|_{T_xW}\,\right\| ^2 \le C\gamma ^n. \end{aligned}$$

Then for every \(x\in \mathcal {M}\) there exists a \(C^\infty \) diffeomorphism \({\mathcal {H}}_x: W_x \rightarrow T_xW\) such that

  (i) \({\mathcal {H}}_{fx}\circ f \circ {\mathcal {H}}_x^{-1}=Df|_{T_xW},\)

  (ii) \({\mathcal {H}}_x(x)=0\) and \(D_x{\mathcal {H}}_x\) is the identity map,

  (iii) \({\mathcal {H}}_x\) depends continuously on \(x\in \mathcal {M}\) in \(C^\infty \) topology,

  (iv) such a family \({\mathcal {H}}_x\) is unique and depends smoothly on x along the leaves of W,

  (v) the map \({\mathcal {H}}_y \circ {\mathcal {H}}_x^{-1}: T_xW\rightarrow T_yW\) is affine for any \(x\in \mathcal {M}\) and \(y\in W_x\). Hence the non-stationary linearization \({\mathcal {H}}\) defines affine structures on the leaves of W.

For higher-dimensional W without 1/2 pinching there may be no smooth non-stationary linearization, and so a polynomial normal form is sought. Under the narrow band spectrum assumption, such forms were introduced in [8, 9]. The assumption is satisfied, for example, by perturbations of algebraic systems. It ensures that the polynomial maps involved belong to a finite dimensional Lie group, which is important for applications. In Refs. [8, 9] Katok and Guysinsky proved existence of normal form coordinates \(\{ {\mathcal {H}}_x\}\) as well as a centralizer theorem. This was sufficient for some applications, such as local rigidity of higher rank actions [13]. However, smooth dependence of \({\mathcal {H}}_x\) along the leaves of W was obtained under a strong extra assumption that the splitting of TW into the spectral subspaces is smooth along W, rather than just continuous. This assumption is not typically satisfied by perturbations of algebraic systems. There were also no results on the coherence of maps \({\mathcal {H}}_x\) on the leaves of W. A geometric point of view on normal forms was developed by Feres in [5], where he established existence of an f-invariant infinitesimal structure. If this structure, a certain generalized connection, is smooth then it can be integrated to recover normal forms. The smoothness, however, again relied on the smoothness of the spectral splitting.

In this paper we overcome these difficulties and obtain a system of normal form coordinates \(\{ {\mathcal {H}}_x\}\) which are smooth along the leaves of W and on each leaf give a coherent atlas with transition maps in a finite-dimensional Lie group. This gives homogeneous space structures on the leaves of W which are invariant under f. Our results hold for any narrow band spectrum system without any extra assumptions. In particular, they apply to perturbations of algebraic systems. We also note that, in contrast to the case of the non-stationary linearization, the system \(\{ {\mathcal {H}}_x\}\) is not unique. In fact, the construction in [8, 9] produces \({\mathcal {H}}_x\) which may not be smooth if the spectral splitting is not. We give a modified construction that allows us to obtain the desired properties of \({\mathcal {H}}_x\).

2 Definitions and results

Let \(\mathcal {M}\) be a smooth compact connected manifold and let f be a \(C^\infty \) diffeomorphism of \(\mathcal {M}\). Let W be an f-invariant topological foliation of \(\mathcal {M}\) with uniformly \(C^\infty \) leaves. The latter means that all leaves are \(C^\infty \) submanifolds and that all their derivatives are also continuous transversely to the leaves. Slightly more generally, we can consider a homeomorphism f which is uniformly smooth along the leaves of W. We assume that f contracts W, i.e. \(\Vert Df |_{TW}\Vert <1\) in some metric.

Let \({\mathcal {E}}= TW\) be the tangent bundle of the foliation W. We denote by F the automorphism of \({\mathcal {E}}\) given by the derivative of f:

$$\begin{aligned} F_x = Df|_{T_xW}: \,{\mathcal {E}}_x \rightarrow {\mathcal {E}}_{fx}. \end{aligned}$$

F induces a bounded linear operator \(F^*\) on the space of continuous sections of \({\mathcal {E}}\) by \(F^*v(x)=F(v(f^{-1}x))\). The spectrum of the complexification of \(F^*\) is called the Mather spectrum of \({F}\). Under the mild assumption that non-periodic points of f are dense in \(\mathcal {M}\), the Mather spectrum consists of finitely many closed annuli centered at 0, see e.g. [17].

Definition 2.1

We say that \({F}\) has narrow band spectrum if its Mather spectrum is contained in a finite union of closed annuli \(A_i, i=1, \ldots , l\), bounded by circles of radii \(e^{{\lambda }_i}\) and \(e^{{\mu }_i}\), where the numbers \({\lambda }_1 \le {\mu }_1 <\cdots <{\lambda }_l \le {\mu }_l<0\) satisfy

$$\begin{aligned} {\mu }_i + {\mu }_l < {\lambda }_i \quad \text {for }\quad i=1, \ldots , l. \end{aligned}$$
(2.1)

This condition can be written as \({\mu }_i - {\lambda }_i < - {\mu }_l\) for \(i=1, \ldots , l,\,\) so it means that the length of each of the intervals \([ {\lambda }_i , {\mu }_i ]\) is smaller than that of \([{\mu }_l, 0]\). When there is only one spectral interval, (2.1) is the uniform 1/2 pinching condition which yields non-stationary linearizations and was used in [4] to construct an invariant affine connection.
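As a quick sanity check of Definition 2.1, the following Python sketch tests the ordering of given endpoints \(\lambda_i, \mu_i\) and condition (2.1). The function name `is_narrow_band` and the sample exponents are illustrative assumptions, not taken from the text. For a single interval the test reduces to \(2\mu_1<\lambda_1\), the uniform 1/2 pinching condition.

```python
def is_narrow_band(lam, mu):
    """Check lam_1 <= mu_1 < ... < lam_l <= mu_l < 0 and condition (2.1).

    lam, mu: lists of the endpoints lambda_i, mu_i (log-radii of the annuli A_i).
    """
    l = len(lam)
    if l == 0 or len(mu) != l:
        return False
    # ordering of the spectral intervals [lambda_i, mu_i]
    for i in range(l):
        if not lam[i] <= mu[i]:
            return False
        if i + 1 < l and not mu[i] < lam[i + 1]:
            return False
    if not mu[-1] < 0:
        return False
    # narrow band condition (2.1): mu_i + mu_l < lambda_i for every i
    return all(mu[i] + mu[-1] < lam[i] for i in range(l))


# two intervals [-3, -2.8] and [-1, -0.9]: each is shorter than [mu_l, 0]
print(is_narrow_band([-3.0, -1.0], [-2.8, -0.9]))
```

With \(\mu_1=-2\) instead, the first interval has length 1, which exceeds \(-\mu_2=0.9\), and the check fails.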

For the given spectral intervals \(\{[\lambda _i, \mu _i]\}\), the bundle \({\mathcal {E}}\) splits into a direct sum

$$\begin{aligned} {\mathcal {E}}={\mathcal {E}}^{1} \oplus \cdots \oplus {\mathcal {E}}^{l} \end{aligned}$$
(2.2)

of continuous F-invariant sub-bundles so that Mather spectrum of \(F|_{{\mathcal {E}}^i}\) is contained in the annulus \(A_i\). This can be expressed using a convenient metric [9]: for each \(i=1,\ldots ,l\) and each \(\varepsilon >0\) there exists a continuous metric \(\Vert \cdot \Vert _{x,\varepsilon }\) on \({\mathcal {E}}^i\) such that

$$\begin{aligned} e^{{\lambda }_i -\varepsilon } \Vert t \Vert _{x,\varepsilon } \le \Vert {F}_x (t)\Vert _{fx,\varepsilon } \le e^{{\mu }_i +\varepsilon } \Vert t\Vert _{x,\varepsilon } \quad \text {for every } \quad t \in {\mathcal {E}}^i_x. \end{aligned}$$
(2.3)

Definition 2.2

A sub-resonance relation for \(\,({\lambda },{\mu })=({\lambda }_1, \ldots , {\lambda }_l,\, {\mu }_1, \ldots , {\mu }_l )\,\) with \({\lambda }_1 \le {\mu }_1 <\cdots <{\lambda }_l \le {\mu }_l<0\,\) is a relation of the form

$$\begin{aligned} {\lambda }_i\le \sum s_j {\mu }_j, \quad \text {where } s_1,\ldots ,s_l\, \hbox {are non-negative integers.} \end{aligned}$$
(2.4)

Clearly, \(s_j=0\) for \(j<i\), and \(\sum s_j \le \lambda _1/ \mu _l\). The narrow band condition (2.1) implies that if \(s_i\ne 0\), then \(s_i=1\) and \(s_j=0\) for \(j>i\), and hence all sub-resonance relations are of the form \( {\lambda }_i \le {\mu }_i \,\text { or } \,{\lambda }_i\le \sum _{j \ge i+1} s_j {\mu }_j. \)
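The structure of the sub-resonance relations can be checked by brute force. The sketch below, a minimal illustration with hypothetical exponents, lists all relations (2.4) with \(s\ne 0\) (indices are 0-based), using the bound \(\sum s_j \le \lambda_1/\mu_l\) to make the search finite, and then tests the two claims above: \(s_j=0\) for \(j<i\), and \(s_i\ne 0\) forces \(s=e_i\) under (2.1).

```python
from itertools import product
from math import floor


def subresonance_relations(lam, mu):
    """All (i, s) with s != 0 and lam[i] <= sum_j s_j mu[j] (0-based i).

    Since every mu_j <= mu_l < 0, such a relation forces |s| <= lam_1 / mu_l,
    so the search over s is restricted to |s| <= floor(lam_1 / mu_l).
    """
    l = len(lam)
    d = floor(lam[0] / mu[-1])
    rels = []
    for i in range(l):
        for s in product(range(d + 1), repeat=l):
            if 1 <= sum(s) <= d and lam[i] <= sum(sj * mj for sj, mj in zip(s, mu)):
                rels.append((i, s))
    return rels


def structure_holds(rels):
    """s_j = 0 for j < i, and s_i != 0 implies s_i = 1 and s_j = 0 for j > i."""
    for i, s in rels:
        if any(s[j] != 0 for j in range(i)):
            return False
        if s[i] != 0 and (s[i] != 1 or any(s[j] != 0 for j in range(i + 1, len(s)))):
            return False
    return True


rels = subresonance_relations([-3.0, -1.0], [-2.8, -0.9])
print(rels, structure_holds(rels))
```

Without the narrow band condition the second claim can fail: for \(\mu=(-2,-0.9)\) the relation \(\lambda_1\le\mu_1+\mu_2\) holds with \(s_1=s_2=1\).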

For any vector spaces E and \(\bar{E}\) we say that a map \(P:E \rightarrow \bar{E}\) is polynomial if for some, and hence every, bases of E and \(\bar{E}\) each component of P is a polynomial. A polynomial map P is homogeneous of degree n if \(P(a v)=a^n P(v)\) for all \(v \in E\) and \(a \in \mathbb {R}\). More generally, for a given splitting \(E=E^{1} \oplus \cdots \oplus E^{l}\) we say that \(P:E \rightarrow \bar{E}\) has homogeneous type \(s= (s_1, \ldots , s_l)\) if for any real numbers \(a_1, \ldots , a_l\) and vectors \(t_j\in E^j, j=1,\ldots , l,\) we have

$$\begin{aligned} P(a_1 t_1+ \cdots + a_l t_l)= a_1^{s_1}\cdot \cdots \cdot a_l^{s_l} \, P( t_1+ \cdots + t_l). \end{aligned}$$
(2.5)

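Any polynomial map can be written uniquely as a sum of polynomial maps of distinct homogeneous types; in coordinates adapted to the splitting, the type of a monomial is obtained by adding its exponents over the coordinates of each \(E^j\).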
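The decomposition into homogeneous types is mechanical once coordinates adapted to the splitting are fixed. In the sketch below (an illustrative representation, not from the text) a scalar polynomial is a dict from exponent tuples to coefficients, and `dims` lists \(\dim E^1,\ldots ,\dim E^l\). The usage line also checks the scaling rule (2.5) on one homogeneous part.

```python
from collections import defaultdict


def homogeneous_type(alpha, dims):
    """Type s of the monomial with exponents alpha: s_j sums the exponents
    of the coordinates lying in the block E^j."""
    s, start = [], 0
    for n in dims:
        s.append(sum(alpha[start:start + n]))
        start += n
    return tuple(s)


def split_by_type(poly, dims):
    """Group the monomials of a scalar polynomial {exponents: coeff} by type."""
    parts = defaultdict(dict)
    for alpha, coeff in poly.items():
        if coeff != 0:
            parts[homogeneous_type(alpha, dims)][alpha] = coeff
    return dict(parts)


def evaluate(poly, v):
    """Value of the polynomial at the point v (coordinates as a tuple)."""
    total = 0
    for alpha, coeff in poly.items():
        term = coeff
        for x, a in zip(v, alpha):
            term *= x ** a
        total += term
    return total


# E = R^2 + R with coordinates (x, y | z); poly = xy + 3y^2 - z^2 + 2xz
poly = {(1, 1, 0): 1, (0, 2, 0): 3, (0, 0, 2): -1, (1, 0, 1): 2}
parts = split_by_type(poly, [2, 1])
# (2.5) for the part of type (2, 0): scaling t_1 by 2 and t_2 by 5 multiplies it by 2^2 * 5^0
print(evaluate(parts[(2, 0)], (2, 4, 15)) == 2 ** 2 * evaluate(parts[(2, 0)], (1, 2, 3)))
```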

Definition 2.3

Suppose \(E=E^{1} \oplus \cdots \oplus E^{l}, \bar{E}=\bar{E}^{1} \oplus \cdots \oplus \bar{E}^{l}\) and \(P : E \rightarrow \bar{E}\) is a polynomial map. We split P into components \(P_i : E \rightarrow \bar{E}^i, P=(P_1,\ldots ,P_l)\). We say that P is of \(({\lambda },{\mu })\) sub-resonance type if each component \(P_i\) has only terms of homogeneous types \(s= (s_1, \ldots , s_l)\) satisfying sub-resonance relations \({\lambda }_i\le \sum s_j {\mu }_j\).

We denote by \({\mathcal {S}}^{{\lambda },{\mu }}(E, \bar{E})\) the space of all polynomials \(E\rightarrow \bar{E}\) of \(({\lambda },{\mu })\) sub-resonance type.

It follows from the definition that polynomials in \({\mathcal {S}}^{{\lambda },{\mu }}(E, \bar{E})\) have degree at most

$$\begin{aligned} d=d({\lambda },{\mu })= \lfloor {\lambda }_1/{\mu }_l \rfloor . \end{aligned}$$
(2.6)
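Definition 2.3 and the degree bound (2.6) can be illustrated with a small membership test. The sketch assumes coordinates adapted to the splittings of \(E\) and \(\bar{E}\) and represents a polynomial map as a dict from target coordinate index to a dict of monomials; these representation choices and the sample exponents are hypothetical.

```python
from math import floor


def block_owners(dims):
    """Index of the block containing each coordinate (adapted coordinates)."""
    owners = []
    for j, n in enumerate(dims):
        owners.extend([j] * n)
    return owners


def is_subresonance(P, dims, dims_bar, lam, mu):
    """Test Definition 2.3 for a polynomial map P: E -> E_bar.

    P maps each target coordinate index to {exponent tuple: coefficient}.
    A term in a component with values in E_bar^i is allowed only if its
    homogeneous type s satisfies lam_i <= sum_j s_j mu_j.
    """
    src, tgt = block_owners(dims), block_owners(dims_bar)
    for k, component in P.items():
        i = tgt[k]
        for alpha, coeff in component.items():
            if coeff == 0:
                continue
            s = [0] * len(dims)
            for c, a in enumerate(alpha):
                s[src[c]] += a
            if not lam[i] <= sum(sj * mj for sj, mj in zip(s, mu)):
                return False
    return True


def degree_bound(lam, mu):
    """d(lam, mu) = floor(lam_1 / mu_l), as in (2.6)."""
    return floor(lam[0] / mu[-1])


# E = E_bar = R + R, lam = (-3, -1), mu = (-2.8, -0.9): P(u, v) = (u + 5 v^3, v)
P = {0: {(1, 0): 1.0, (0, 3): 5.0}, 1: {(0, 1): 1.0}}
print(is_subresonance(P, [1, 1], [1, 1], [-3.0, -1.0], [-2.8, -0.9]))
```

Here \(d=3\) and the term \(5v^3\) attains it; a term \(uv\) in the first component or \(v^2\) in the second would violate the sub-resonance relations.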

It was shown in [9] that in the case of point spectrum, that is \({\lambda }_i = {\mu }_i\) for \(i=1, \ldots ,l\), the elements of \({\mathcal {S}}^{{\lambda },{\lambda }}(E,E)\) with \(P(0)=0\) and invertible derivative at the origin form a group \(G^{{\lambda },{\lambda }}(E)\) with respect to composition. More generally, if \(({\lambda },{\mu })\) satisfies the narrow band condition, they generate (under composition) a finite-dimensional Lie group which we denote by \(G^{{\lambda },{\mu }}(E)\). The maps in \(G^{{\lambda },{\mu }}\) are called sub-resonance generated and can be described by adding finitely many relations to the set of sub-resonance ones. In fact, \(G^{{\lambda },{\mu }}(E)\) is contained in \(G^{{\lambda }',{\lambda }'}(E)\) for a certain \({\lambda }'\), explicitly written in terms of \(({\lambda },{\mu })\) [9]. This larger group may be used in place of \(G^{{\lambda },{\mu }}(E)\), for simplicity, in all arguments.

We return to our setting with fixed \(F, {\mathcal {E}}\), and \(({\lambda },{\mu })\). We denote by \({\mathcal {S}}_{x,y}= {\mathcal {S}}^{{\lambda },{\mu }}({\mathcal {E}}_x,{\mathcal {E}}_y)\) the space of sub-resonance polynomials and by \(G_x= G^{{\lambda },{\mu }}({\mathcal {E}}_x)\) the group of sub-resonance generated polynomial maps. For any x and y in \(\mathcal {M}\), any invertible linear map \(A:{\mathcal {E}}_y \rightarrow {\mathcal {E}}_x\) that respects the splitting induces an isomorphism between the groups \(G_x\) and \(G_y\). The set \(G_{x,y}\) of invertible sub-resonance generated polynomial maps from \({\mathcal {E}}_x\) to \({\mathcal {E}}_y\) is also naturally defined either by specifying homogeneity types using relations or simply by identifying \({\mathcal {E}}_x\) with \({\mathcal {E}}_y\): \(G_{x,y} =\{ P: {\mathcal {E}}_x \rightarrow {\mathcal {E}}_y \; |\; A\circ P \in G_x \}\) for any A as above. Of course, if the bundle \({\mathcal {E}}\) and the sub-bundles \({\mathcal {E}}^i\) are trivial, then all \(G_x\) and \(G_{x,y}\) can be identified with a single group \(G^{{\lambda },{\mu }}(E)\).

Theorem 2.4

Let \(\mathcal {M}\) be a smooth compact connected manifold and let f be a \(C^\infty \) diffeomorphism of \(\,\mathcal {M}\). Let W be an f-invariant topological foliation of \(\mathcal {M}\) with uniformly \(C^\infty \) leaves. Suppose that W is contracted by f, i.e. \(\Vert Df |_{TW}\Vert <1\) for some metric, and that \(Df |_{TW}\) has narrow band spectrum, see Definition 2.1.

Then there exists a family \(\{ {\mathcal {H}}_x \} _{x\in \mathcal {M}}\) of \(C^\infty \) diffeomorphisms

$$\begin{aligned} {\mathcal {H}}_x: W_x \rightarrow {\mathcal {E}}_x =T_xW \quad \text {such that } \end{aligned}$$

  (i) \({\mathcal {P}}_x ={\mathcal {H}}_{fx} \circ f \circ {\mathcal {H}}_x ^{-1}:{\mathcal {E}}_{x} \rightarrow {\mathcal {E}}_{fx}\) is a polynomial map of sub-resonance type for each \(x \in \mathcal {M}\),

  (ii) \({\mathcal {H}}_x(x)=0\) and \(D_x{\mathcal {H}}_x \) is the identity map for each \(x \in \mathcal {M}\),

  (iii) \({\mathcal {H}}_x\) depends continuously on \(x \in \mathcal {M}\) in \(C^\infty \) topology and it depends smoothly on x along the leaves of W,

  (iv) \({\mathcal {H}}_y \circ {\mathcal {H}}_x^{-1}: {\mathcal {E}}_x \rightarrow {\mathcal {E}}_y\) is a sub-resonance generated polynomial for each \(x \in \mathcal {M}\) and each \(y \in W_x\).

Another way to interpret (iv) is to view \({\mathcal {H}}_x\) as a coordinate chart on \(W_x\), identifying it with \({\mathcal {E}}_x\), see Sect. 3.2 for more details. In this coordinate chart, (iv) yields that all transition maps \({\mathcal {H}}_y \circ {\mathcal {H}}_x^{-1}\) for \(y\in W_x\) are in \(\bar{G}_x\), the group generated by \(G_x\) and the translations of \({\mathcal {E}}_x\). Thus \({\mathcal {H}}_x\) gives the leaf a structure of homogeneous space \(W_x\sim \bar{G}_x/G_x\), which is consistent with other coordinate charts \({\mathcal {H}}_y\) for \(y\in W_x\) and is preserved by the normal form \({\mathcal {P}}_x\) according to (i).

Remark 2.5

An important and useful feature of normal forms for contractions was established in [9]. Let g be a homeomorphism of \(\mathcal {M}\) which commutes with f, preserves W, and is smooth along the leaves of W. Then the maps \({\mathcal {H}}_x\) bring g to a normal form as well, i.e. the map \(Q_x={\mathcal {H}}_{gx} \circ g \circ {\mathcal {H}}_x ^{-1}: {\mathcal {E}}_x \rightarrow {\mathcal {E}}_{gx}\) is a sub-resonance generated polynomial for each \(x \in \mathcal {M}\).

3 Proof of Theorem 2.4

3.1 Proof of (i), (ii), (iii)

Let us fix a small tubular neighborhood U of the zero section in TW such that for each x in \(\mathcal {M}\), we can identify the open set \(U_x=U \cap T_xW\) in \(T_xW\) with a neighborhood of x in the leaf \(W_x\) using the exponential map. The size of this neighborhood can be chosen the same for all \(x\in \mathcal {M}\). Thus we can view the diffeomorphism f as a family of maps \({\mathcal {F}}_x=f|_{U_x}\) from \(U_x \subset T_xW\) to \(U_{fx} \subset T_{fx}W\). Our first goal is to obtain a formal power series for \({\mathcal {H}}_x\).

For each map \({\mathcal {F}}_x : {\mathcal {E}}_{x} \rightarrow {\mathcal {E}}_{fx}\) we consider its formal power series at \(t=0\):

$$\begin{aligned} {\mathcal {F}}_x(t)= \sum _{n=1}^\infty {F}^{(n)}_x(t). \end{aligned}$$
(3.1)

For a fixed x, the term \({F}^{(n)}_x :{\mathcal {E}}_{x} \rightarrow {\mathcal {E}}_{fx}\) is a vector-valued homogeneous polynomial map of degree n in t (that is, in any bases of \({\mathcal {E}}_x\) and \({\mathcal {E}}_{fx}\), each coordinate in \({\mathcal {E}}_{fx}\) is a homogeneous polynomial of degree n of the coordinates in \({\mathcal {E}}_x\)).

First we construct the formal power series at \(t=0\) for the desired coordinate change \({\mathcal {H}}_x(t)\) as well as the finite power series at \(t=0\) for the resulting polynomial extension \({\mathcal {P}}_x(t)\). We use notations for these series similar to (3.1):

$$\begin{aligned} {\mathcal {H}}_x(t)= \sum _{n=1}^\infty H^{(n)}_x(t)\quad \text {and} \quad {\mathcal {P}}_x(t)= \sum _{n=1}^d {P}^{(n)}_x(t). \end{aligned}$$

We use the notation \({F}_x={F}^{(1)}_x\) for the linear term of \({\mathcal {F}}_x\), that is, its derivative at \(t=0\), which is \(Df|_{T_xW}\). For \({\mathcal {H}}\) and \({\mathcal {P}}\) we choose

$$\begin{aligned} H^{(1)}_x= \text {Id}: {\mathcal {E}}_x \rightarrow {\mathcal {E}}_x \quad \text {and} \quad {P}^{(1)} _x ={F}_x \quad \text {for all}\quad x \in \mathcal {M}. \end{aligned}$$

We construct the terms \(H^{(n)}_x\) inductively to “eliminate” the part of \({F}^{(n)}_x\) which is not of sub-resonance type. More precisely, we ensure that the terms \({P}^{(n)}_x\) determined by the conjugacy equation are of sub-resonance type. The base of the induction is given by the linear terms chosen above. We will obtain \(H^{(n)}_x\) and \({P}^{(n)}_x\) which are continuous in x on \(\mathcal {M}\) and smooth in x along the leaves of W.

Using the notations above we can write the conjugacy equation

$$\begin{aligned} {\mathcal {H}}_{fx} \circ {\mathcal {F}}_{x} ={\mathcal {P}}_{x} \circ {\mathcal {H}}_{x} \end{aligned}$$

as follows:

$$\begin{aligned} \left( \text {Id}+\sum _{k=2}^\infty H^{(k)} _{fx} \right) \circ \left( F_x+\sum _{k=2}^\infty F^{(k)} _{x} \right) =\left( F_{x} + \sum _{k=2}^\infty P^{(k)}_{x} \right) \circ \left( \text {Id}+\sum _{k=2}^\infty H^{(k)}_{x} \right) \end{aligned}$$

for all \(x\in \mathcal {M}\). Considering the terms of degree n, we obtain for \(n=2\)

$$\begin{aligned} F^{(2)}_{x}+H^{(2)}_{fx}\circ F_{x} = F_{x} \circ H^{(2)}_{x}+P^{(2)}_{x}, \end{aligned}$$

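and in general, for \(n\ge 2\),

$$\begin{aligned} F^{(n)}_{x}\, + \, H^{(n)}_{fx} \circ F_{x}\,+ \, \Big [ \sum _{k=2}^{n-1} H^{(k)}_{fx} \circ {\mathcal {F}}_{x} \Big ]_n \,= \,F_{x} \circ H^{(n)}_{x}+ P^{(n)}_{x}+\, \Big [ \sum _{k=2}^{n-1} P^{(k)}_{x} \circ {\mathcal {H}}_{x} \Big ]_n, \end{aligned}$$

where \([\,\cdot \,]_n\) denotes the homogeneous part of degree n of a formal power series. Since \(k\ge 2\), the degree n part of \(H^{(k)}_{fx} \circ {\mathcal {F}}_{x}\) involves only the terms \(F^{(j)}_{x}\) with \(j\le n-k+1<n\), and the degree n part of \(P^{(k)}_{x} \circ {\mathcal {H}}_{x}\) involves only the terms \(H^{(j)}_{x}\) with \(j\le n-k+1<n\). For \(n=2\) both sums are empty. We rewrite the equation as

$$\begin{aligned} F_{x}^{-1} \circ P^{(n)}_{x} = - H^{(n)}_{x} + F_{x}^{-1} \circ H^{(n)}_{fx} \circ F_{x} + Q_{x}, \; \end{aligned}$$
(3.2)

where

$$\begin{aligned} Q_{x}= F_{x}^{-1} \circ \left( F^{(n)}_{x} + \Big [ \sum _{k=2}^{n-1} H^{(k)}_{fx} \circ {\mathcal {F}}_{x} \Big ]_n - \Big [ \sum _{k=2}^{n-1} P^{(k)}_{x} \circ {\mathcal {H}}_{x} \Big ]_n \right) . \end{aligned}$$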

We note that \(Q_x\) is composed only of terms \(H^{(k)}\) and \(P^{(k)}\) with \(1<k<n\), which are already constructed, and terms \(F^{(k)}\) with \(1<k\le n\), which are given. Thus by the inductive assumption \(Q_x\) is continuous in x on \(\mathcal {M}\) and smooth along the leaves of W.
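The inductive step can be tested in the simplest case. The sketch below is a hypothetical one-dimensional, constant-coefficient example (a contracting fixed point with \(f(t)=at+f_2t^2+\cdots\), \(0<|a|<1\)). With a single exponent \(\lambda=\mu=\log |a|\), the relation \(\lambda\le n\lambda\) fails for \(n\ge 2\), so no term of degree \(n\ge 2\) is of sub-resonance type and \({\mathcal {P}}=F\) is linear. Solving degree by degree, the degree n coefficient of \({\mathcal {H}}\circ {\mathcal {F}}\) computed with the unknown \(h_n\) set to 0 plays the role of the known terms (it contains cross terms such as \(2h_2a f_2\) at degree 3), and the degree n equation \(c+h_na^n=ah_n\) is solved for \(h_n\).

```python
def mul(p, q, N):
    """Product of two coefficient lists, truncated at degree N."""
    r = [0.0] * (N + 1)
    for i, a in enumerate(p):
        if i > N:
            break
        for j, b in enumerate(q):
            if i + j > N:
                break
            r[i + j] += a * b
    return r


def compose(h, f, N):
    """Coefficients of h(f(t)) up to degree N; assumes f(0) = 0."""
    out = [0.0] * (N + 1)
    power = [1.0] + [0.0] * N            # f(t)**0
    for k, hk in enumerate(h):
        if k > 0:
            power = mul(power, f, N)     # f(t)**k, truncated at degree N
        for m in range(N + 1):
            out[m] += hk * power[m]
    return out


def linearize_1d(f, N):
    """Coefficients h with h_1 = 1 and h(f(t)) = a h(t) mod t^(N+1), a = f'(0)."""
    a = f[1]
    h = [0.0, 1.0] + [0.0] * (N - 1)
    for n in range(2, N + 1):
        # degree-n coefficient of h(f(t)) while h_n is still 0; it only
        # involves h_1, ..., h_{n-1} (already found) and f_1, ..., f_n (given)
        c = compose(h, f, N)[n]
        # degree-n equation: c + h_n a^n = a h_n
        h[n] = c / (a - a ** n)
    return h


f = [0.0, 0.5, 0.3, -0.2]                # hypothetical f(t) = 0.5t + 0.3t^2 - 0.2t^3
h = linearize_1d(f, 6)
print(h[2])                              # f_2 / (a - a^2) = 1.2, up to rounding
```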

We denote by \({\mathcal {R}}_x^{(n)}\) the vector space of all homogeneous polynomial maps of degree n from \({\mathcal {E}}_x\) to \({\mathcal {E}}_x\). Identifying these polynomial maps with symmetric n-linear maps, one can view this space as \(Sym_n({\mathcal {E}}_x^*) \otimes {\mathcal {E}}_x\), where \(Sym_n({\mathcal {E}}_x^*)\) is the space of symmetric elements in the \(n^{th}\) tensor power of the dual space of \({\mathcal {E}}_x\), see [5]. Let \({\mathcal {R}}^{(n)}\) be the vector bundle over \(\mathcal {M}\) whose fiber at x is \({\mathcal {R}}_x^{(n)}\). We denote by \({\mathcal {S}}_x^{(n)}\) and \({\mathcal {N}}_x^{(n)}\) the subspaces of \({\mathcal {R}}_x^{(n)}\) consisting of the polynomial maps whose components have only terms of sub-resonance type and only terms of non-sub-resonance type, respectively. These subspaces depend continuously on x, and thus \({\mathcal {R}}^{(n)}\) splits into the direct sum of the continuous sub-bundles \({\mathcal {S}}^{(n)} \oplus {\mathcal {N}}^{(n)}\).
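The ranks of these bundles can be counted directly. In the sketch below (hypothetical exponents; `dims` lists \(\dim {\mathcal {E}}^1,\ldots ,\dim {\mathcal {E}}^l\)), \(\dim {\mathcal {R}}^{(n)}_x=\binom{m+n-1}{n}\,m\) with \(m=\dim {\mathcal {E}}_x\); the number of monomials of type s is \(\prod _j\binom{n_j+s_j-1}{s_j}\), and each sub-resonance pair \((i;s)\) contributes \(\dim {\mathcal {E}}^i\) times that number to \(\dim {\mathcal {S}}^{(n)}_x\).

```python
from itertools import product
from math import comb


def monomials_of_type(s, dims):
    """Number of monomials of homogeneous type s in adapted coordinates."""
    count = 1
    for sj, nj in zip(s, dims):
        count *= comb(nj + sj - 1, sj)
    return count


def dims_RSN(n, dims, lam, mu):
    """Return (dim R^(n), dim S^(n), dim N^(n)) for the given splitting."""
    m = sum(dims)
    R = comb(m + n - 1, n) * m
    S = 0
    for s in product(range(n + 1), repeat=len(dims)):
        if sum(s) != n:
            continue
        for i, ni in enumerate(dims):
            if lam[i] <= sum(sj * mj for sj, mj in zip(s, mu)):
                S += ni * monomials_of_type(s, dims)
    return R, S, R - S


for n in range(1, 5):
    print(n, dims_RSN(n, [1, 1], [-3.0, -1.0], [-2.8, -0.9]))
```

In this example \({\mathcal {S}}^{(n)}=0\) for \(n=4>d=3\), in agreement with (2.6).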

Our goal is to find a section \(H^{(n)}\) of \({\mathcal {R}}^{(n)}\) so that the right side of (3.2) is a section of \({\mathcal {S}}^{(n)}\), and hence so is \(P^{(n)}\) when defined by this equation. The sub-bundle \({\mathcal {N}}^{(n)}\) in general is only continuous. To construct \(H^{(n)}_{x}\) which depends smoothly on x along the leaves of W we will work with the factor bundle \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) rather than with the splitting \({\mathcal {S}}^{(n)} \oplus {\mathcal {N}}^{(n)}\). The fiber of \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) at x is the vector space \({\mathcal {R}}^{(n)}_x / {\mathcal {S}}^{(n)}_x\) and, as a continuous vector bundle, \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) is isomorphic to \({\mathcal {N}}^{(n)}\) via the natural identification. A local trivialization of \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) can be obtained by fixing locally a constant transversal to \({\mathcal {S}}^{(n)}\) in any trivialization of \({\mathcal {R}}^{(n)}\). We will show in Lemma 3.3 below that the subspace \({\mathcal {S}}^{(n)}_x\) depends smoothly on x along the leaves of W, and hence the bundle \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) and the projection to it from \({\mathcal {R}}^{(n)}\) are smooth along the leaves of W.

Projecting (3.2) to the factor bundle \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\), our goal is to solve the equation

$$\begin{aligned} 0 = - \bar{H}^{(n)}_{x} + F_{x}^{-1} \circ \bar{H}^{(n)}_{fx} \circ F_{x} + \bar{Q}_{x}, \; \end{aligned}$$
(3.3)

where \(\bar{H}^{(n)}\) and \( \bar{Q}\) are the projections of \(H^{(n)}\) and Q respectively.

We consider the bundle automorphism \(\Phi : {\mathcal {R}}^{(n)} \rightarrow {\mathcal {R}}^{(n)}\) covering \(f^{-1}: \mathcal {M}\rightarrow \mathcal {M}\) given by the maps \(\Phi _x : {\mathcal {R}}^{(n)} _{fx} \rightarrow {\mathcal {R}}^{(n)} _{x}\)

$$\begin{aligned} \Phi _x (R)= {F}_{x}^{-1} \circ R \circ {F}_{x}. \end{aligned}$$
(3.4)

Since \({F}\) preserves the splitting \({\mathcal {E}}={\mathcal {E}}^1\oplus \cdots \oplus {\mathcal {E}}^l\), it follows from the definition that the sub-bundles \({\mathcal {S}}^{(n)}\) and \({\mathcal {N}}^{(n)}\) are \(\Phi \)-invariant. We denote by \(\bar{\Phi }\) the induced automorphism of \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) and conclude that (3.3) is equivalent to

$$\begin{aligned} \bar{H}^{(n)}_{x} = \tilde{\Phi }_x \left( \bar{H}^{(n)}_{fx} \right) , \quad \text {where }\quad \tilde{\Phi }_x (R)= \bar{\Phi }_x (R) + \bar{Q}_{x}. \end{aligned}$$
(3.5)

Thus a solution of (3.3) is a \(\tilde{\Phi }\)-invariant section of \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\). Lemma 3.1 below shows that \(\tilde{\Phi }\) is a contraction and hence has a unique continuous invariant section, which can be explicitly written as

$$\begin{aligned} \bar{H} _x^{(n)} = \sum _{k=0}^\infty \left( F^k_{x}\right) ^{-1} \circ \bar{Q}_{f^k x} \circ F^k_{x}, \quad \text {where }\quad F^k_{x}= F_{f^{k-1}x}\circ \ldots \circ F_{fx} \circ F_{x}. \end{aligned}$$
(3.6)
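In dimension one, (3.5) and (3.6) can be checked along a single orbit. The sketch assumes \({\mathcal {E}}_x=\mathbb R\), hypothetical multipliers \(a(k)\) standing for \(F\) at \(f^kx\) with \(0<|a(k)|<1\), and \(\bar Q_{f^kx}(t)=q(k)t^n\). Then \(\Phi _x(rt^n)=a^{n-1}rt^n\), all terms of degree \(n\ge 2\) are non-sub-resonance, and (3.6) becomes \(h(0)=\sum _j q(j)\,(a(0)\cdots a(j-1))^{n-1}\). The series is truncated after K terms, an illustrative choice.

```python
def invariant_section(a, q, n, K=200):
    """Truncated series (3.6) along one orbit x, f x, f^2 x, ...

    a(k): scalar derivative F at f^k x, with 0 < |a(k)| < 1;
    q(k): coefficient of Q at f^k x, i.e. Q(t) = q(k) t^n.
    Returns h with h(k) approximating the coefficient of H^(n) at f^k x.
    """
    def h(k):
        total, A = 0.0, 1.0              # A = a(k) * ... * a(k + j - 1)
        for j in range(K):
            total += q(k + j) * A ** (n - 1)
            A *= a(k + j)
        return total
    return h


# non-stationary example: the multiplier alternates between 0.6 and 0.4
h = invariant_section(lambda k: 0.5 + 0.1 * (-1) ** k, lambda k: 1.0 / (1 + k), 3)
# invariance (3.5) at x: h(0) = a(0)^(n-1) * h(1) + q(0)
print(h(0) - (0.6 ** 2 * h(1) + 1.0))
```

The printed difference is of the order of the truncation error \(q(K)\,(a(0)\cdots a(K-1))^{n-1}\), which is negligible here.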

To show that the maps \(\bar{H}_x^{(n)}\) depend smoothly on x along the leaves of W we use the version of the \(C^r\) Section Theorem of Hirsch, Pugh, and Shub formulated below. Since \(\tilde{\Phi }\) covers \(f^{-1}\), we apply the theorem to \(f^{-1}\) in place of f, with \(\mathcal B={\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) and \(\Psi = \tilde{\Phi }\). Since \(f^{-1}\) expands W, the quantity in (3.8) computed for \(f^{-1}\) is \(\alpha _x=\Vert (Df^{-1}|_{T_xW})^{-1}\Vert =\Vert Df|_{T_{f^{-1}x}W}\Vert <1\), so \(k_x\alpha _x^r\le k_x\) and (3.8) holds for every r. Hence \(\bar{H} _x\) is \(C^r\) along the leaves of W for any r. We conclude that (3.6) gives the unique continuous solution of (3.3), and that this solution is smooth along the leaves of W. Now we take a lift \(H^{(n)}\) of \(\bar{H}^{(n)}\) to \({\mathcal {R}}^{(n)}\); such a lift is not unique. We take a continuous sub-bundle \(\tilde{{\mathcal {N}}}^{(n)}\) which is smooth along the leaves of W and such that the subspace \(\tilde{{\mathcal {N}}}_x^{(n)}\) is sufficiently close to \({\mathcal {N}}_x^{(n)}\), and hence transverse to \({\mathcal {S}}_x^{(n)}\), for each \(x \in \mathcal {M}\). Then we define \(H_x^{(n)}\) as the unique element of \(\tilde{{\mathcal {N}}}_x^{(n)}\) that projects to \(\bar{H}_x^{(n)}\). This lift \(H^{(n)}\) is smooth along the leaves of W, and it is the desired section of \({\mathcal {R}}^{(n)}\) for which the right-hand side of (3.2) is a section of \({\mathcal {S}}^{(n)}\). Finally, we define \(P^{(n)}_{x}\) from (3.2) and note that it is in \({\mathcal {S}}_{x,fx}^{(n)}\), since \(F_x\) preserves the splitting, and that it is smooth along the leaves of W, since \(H_x^{(n)}\) and \(Q_x\) are.

\(C^r\) Section Theorem [10]

Let f be a \(C^r\), \(r\ge 1\), diffeomorphism of a compact smooth manifold \(\mathcal {M}\). Let W be an f-invariant topological foliation with uniformly \(C^r\) leaves. Let \(\mathcal B\) be a normed vector bundle over \(\mathcal {M}\) and \(\Psi :\mathcal B\rightarrow \mathcal B\) be an extension of f such that both \(\mathcal B\) and \(\Psi \) are uniformly \(C^r\) along the leaves of W.

Suppose that \(\Psi \) contracts fibers of \(\mathcal B\), i.e. for any \(x \in \mathcal {M}\) and any \(u,w \in \mathcal B(x)\)

$$\begin{aligned} \Vert \Psi (u)-\Psi (w)\Vert _{fx} \le k_x \Vert u-w\Vert _x \quad \text { with }\quad \sup \{\, k_x:\, x \in \mathcal {M}\,\} <1. \end{aligned}$$
(3.7)

Then there exists a unique continuous \(\Psi \)-invariant section of \(\mathcal B\). Moreover, if

$$\begin{aligned} \sup \{ \,k_x \alpha _x ^r:\, x \in \mathcal {M}\,\} <1, \quad \text { where }\quad \alpha _x = \Vert (Df|_{T_xW})^{-1}\Vert , \end{aligned}$$
(3.8)

then the unique invariant section is uniformly \(C^r\) smooth along the leaves of W.

This version of the \(C^r\) Section Theorem (see [16, Theorem 3.7]) summarizes for our context Theorems 3.1, 3.2, and 3.5, and Remarks 1 and 2 after Theorem 3.8 in [10]. In this theorem the smoothness of the invariant section is obtained only along W, so the smoothness of \(\mathcal B\) and \(\Psi \) is required along W only. It also follows from the proof in [10] that the contraction in the base (3.8) needs to be estimated only along W (formally, this can be obtained by applying the theorem with the base manifold \(\mathcal {M}\) considered as the disjoint union of the leaves of W). This has been observed in the study of partially hyperbolic systems, see for example Introduction, Theorem 3.1 and remarks in [18].

Lemma 3.1

The map \(\Phi : {\mathcal {N}}^{(n)} \rightarrow {\mathcal {N}}^{(n)}\) given by (3.4) is a contraction in the sense of (3.7), and hence so is \(\,\tilde{\Phi } : {\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)} \rightarrow {\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) given by (3.5).

Proof

We recall that the norm of a homogeneous polynomial map \(R: E \rightarrow \bar{E}\) of degree n is defined as \(\Vert R\Vert =\sup \{\,\Vert R(v)\Vert :\; v\in E, \; \Vert v\Vert =1 \,\}\). If P is a linear or a homogeneous polynomial map \(\tilde{E} \rightarrow E\), then for the norm of the composition we have

$$\begin{aligned} \Vert \, R \circ P \,\Vert \le \Vert R\Vert \cdot \Vert P\Vert ^n. \end{aligned}$$
(3.9)

Suppose that \(E=E^1\oplus \cdots \oplus E^l\) and \(\Vert v\Vert = \max _i \Vert v_i\Vert \) for \(v=(v_1,\ldots , v_l)\). If R is of homogeneous type \(s=(s_1,\ldots , s_l)\) then by (2.5) we get \(\Vert R(v)\Vert \le \Vert R\Vert \, \Vert v_1\Vert ^{s_1} \ldots \Vert v_l\Vert ^{s_l}\).

We will use the norm on \({\mathcal {E}}\) which is the maximum of the norms on \({\mathcal {E}}^i\) used in (2.3): \(\Vert t\Vert =\Vert t\Vert _{x,\varepsilon }= \max _i \Vert t_i\Vert _{x,\varepsilon }\) for any \(t=(t_1,\ldots , t_l)\in {\mathcal {E}}_x\).

Then for a polynomial \(R: {\mathcal {E}}_{f x} \rightarrow {\mathcal {E}}_{f x}^i \) of homogeneous type \(s=(s_1,\ldots , s_l)\) we have

$$\begin{aligned} \Vert \, R \circ F_x \,\Vert \le \Vert R\Vert \cdot \Vert F|_{{\mathcal {E}}^1_x}\Vert ^{s_1}\cdots \Vert F|_{{\mathcal {E}}^l_x}\Vert ^{s_l} \end{aligned}$$
(3.10)

Let \(R \in {\mathcal {N}}^{(n)}_{f x}\) be a term of homogeneous type \(s=(s_1,\ldots , s_l)\), \(\sum s_j=n\), with values in \({\mathcal {E}}^i_{fx}\). Then, by definition, \({\lambda }_i > \sum s_j {\mu }_j\). Since there are finitely many such relations, we can choose a sufficiently small \(\varepsilon >0\) so that \( {\lambda }_i > \sum s_j {\mu }_j + (n+2)\varepsilon \) for all of them. It follows from (2.3) that \(\Vert F|_{{\mathcal {E}}^j_x}\Vert \le e^{{\mu }_j +\varepsilon }\) and \(\Vert F|_{{\mathcal {E}}^i_x}^{-1}\Vert \le e^{-{\lambda }_i +\varepsilon }\), so we conclude that for each \(x \in \mathcal {M}\)

$$\begin{aligned} \begin{aligned} \Vert \Phi _x (R) \Vert&\le \left\| F|_{{\mathcal {E}}^i_x}^{-1}\right\| \cdot \Vert R \Vert \cdot \prod _j \left\| F|_{{\mathcal {E}}^j_x}\right\| ^{s_j} \le e^{-{\lambda }_i +\varepsilon } \cdot \Vert R \Vert \cdot \prod _j (e^{{\mu }_j +\varepsilon })^{s_j} \\&= e^{-{\lambda }_i + \sum s_j {\mu }_j +(n+1)\varepsilon } \cdot \Vert R \Vert \le e^{-\varepsilon } \Vert R \Vert . \end{aligned} \end{aligned}$$

Since \(\Phi \) maps the terms of a fixed homogeneous type with values in a fixed \({\mathcal {E}}^i\) to terms of the same kind, and there are finitely many such types, \(\Phi \) is a contraction on \({\mathcal {N}}^{(n)}\) with respect to the norm given by the maximum of the norms of these components. The second statement follows since the linear part \(\bar{\Phi }\) of \(\tilde{\Phi }\) is a contraction. Indeed, \(\bar{\Phi }\) corresponds to \(\Phi \) under the identification of \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\) and \({\mathcal {N}}^{(n)}\) as continuous vector bundles given by the natural linear isomorphisms between \({\mathcal {R}}^{(n)}_x / {\mathcal {S}}^{(n)}_x\) and \({\mathcal {N}}^{(n)}_x\), which depend continuously on the base point. Hence \(\Vert \bar{\Phi }_x \Vert \le e^{-\varepsilon }\) with respect to a continuous family of norms on the fibers of \({\mathcal {R}}^{(n)} / {\mathcal {S}}^{(n)}\). \(\square \)
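The mechanism of Lemma 3.1 is transparent for a hypothetical constant diagonal \(F=\mathrm{diag}(e^{c_1},\ldots ,e^{c_l})\) with one dimension per block (point spectrum \(\lambda _i=\mu _i=c_i\), so \(\varepsilon \) can be taken to be 0). Then \(\Phi (R)=F^{-1}\circ R\circ F\) multiplies a monomial of type s in the i-th component by \(e^{\sum s_jc_j-c_i}\). The sketch checks that this factor is below 1 exactly for the non-sub-resonance pairs \((i;s)\); resonances \(c_i=\sum s_jc_j\) give factor 1, which is why such terms cannot be eliminated.

```python
from itertools import product
from math import exp


def phi_factors(c, n):
    """Multipliers of Phi(R) = F^{-1} o R o F on monomials of degree n.

    F = diag(e^{c_1}, ..., e^{c_l}) on R^l (one coordinate per block).
    Returns {(i, s): exp(sum_j s_j c_j - c_i)} over all types s with |s| = n.
    """
    l = len(c)
    out = {}
    for s in product(range(n + 1), repeat=l):
        if sum(s) == n:
            for i in range(l):
                out[(i, s)] = exp(sum(sj * cj for sj, cj in zip(s, c)) - c[i])
    return out


c = [-3.0, -1.0]                          # narrow band: -3 - 1 < -3 and -1 - 1 < -1
for (i, s), fac in sorted(phi_factors(c, 3).items()):
    non_subres = c[i] > sum(sj * cj for sj, cj in zip(s, c))
    print(i, s, round(fac, 4), "contracted" if fac < 1 else "not contracted", non_subres)
```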

Proposition 3.2

The vector space of sub-resonance polynomials \({\mathcal {S}}(E)={\mathcal {S}}^{({\lambda },{\mu })} (E,E)\) for the given \(({\lambda },{\mu })\) and splitting \(E= E^1 \oplus \cdots \oplus E^l\) depends only on the fast flag V:

$$\begin{aligned} E^1=V^1 \subset V^2 \subset \cdots \subset V^l =E, \quad \mathrm {where }\quad { V}^{ i}= { E}^1 \oplus \cdots \oplus { E}^{ i}. \end{aligned}$$

More precisely, \({\mathcal {S}}(E)={\mathcal {S}}(\tilde{E})\) if \(\tilde{E}\) is E equipped with a splitting that generates the same fast flag: \(V^i= \tilde{E}^1 \oplus \cdots \oplus \tilde{E}^i\), for \(i=1,\ldots ,l\).

Proof

Let \(A: E \rightarrow \tilde{E}\) be a linear isomorphism with \(A(E^i)= \tilde{E}^i\), for \(i=1,\ldots ,l\). Then \(R \in {\mathcal {S}}(E)\) if and only if \(\tilde{R} = A \circ R \circ A^{-1} \in {\mathcal {S}}(\tilde{E})\). Also, A and \(A^{-1}\) are block triangular for the splitting \(E=E^1 \oplus \cdots \oplus E^l\), that is their matrices are block triangular in any basis adapted to this splitting or, equivalently, they belong to \({\mathcal {S}}(E)\). To complete the proof we need to show that if \(R \in {\mathcal {S}}(E)\) then so are \(A \circ R\) and \(R \circ A^{-1}\).

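By splitting R into components and homogeneity types, it suffices to check this for \(R: E \rightarrow E^i\) corresponding to a sub-resonance relation \({\lambda }_i \le \sum s_j {\mu }_j\), whose combinatorial type we denote by \((i; s_1,\ldots , s_l)\). We call a relation \((m; s_1',\ldots , s_l')\) with \(|s'|=|s|\), where \(|s|=s_1+\cdots +s_l\), subordinate to \((i; s_1,\ldots , s_l)\) if \(m\le i\) and \(\sum _{j \le k} s_j' \le \sum _{j \le k} s_j\) for each k (this gives a partial order on relations). Summation by parts gives \(\sum s_j' {\mu }_j - \sum s_j {\mu }_j = \sum _{k<l} \big (\sum _{j \le k} s_j - \sum _{j \le k} s_j'\big )({\mu }_{k+1}-{\mu }_k) \ge 0\), so \({\lambda }_m \le {\lambda }_i \le \sum s_j {\mu }_j \le \sum s_j' {\mu }_j\), and hence all subordinate relations are also of sub-resonance type. The homogeneity type of any term in \(A \circ R\) or \(R \circ A^{-1}\) corresponds to a relation subordinate to \((i; s_1,\ldots , s_l)\): indeed, \(A\circ R\) takes values in \(A(E^i)\subset V^i\), and since \(A^{-1}(E^k)\subset V^k\), substituting \(A^{-1}\) replaces each variable from \(E^j\) by a linear combination of variables from \(E^k\) with \(k \ge j\). Hence these polynomials are in \({\mathcal {S}}(E)\).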

In fact, the subspace spanned by all sub-resonance polynomials corresponding to the relations subordinate to \((i; s_1,\ldots , s_l)\) can be described explicitly in terms of the flag V alone: it consists of all polynomial maps \(R: E \rightarrow V^i\) such that for each k the degree of R along the subspace \(V^k\) is at most \(\sum _{j \le k} s_j\). This gives another way to see that \({\mathcal {S}}(E)\) depends only on the fast flag. \(\square \)

Lemma 3.3

The fibers \({\mathcal {S}}_x ^{(n)}\) of the vector sub-bundle \({\mathcal {S}}^{(n)} \subset {\mathcal {R}}^{(n)}\) depend smoothly on x along the leaves of W.

Proof

We consider the fast flag \({\mathcal {V}}_x\) in \({\mathcal {E}}_x\):

$$\begin{aligned} {\mathcal {E}}_x^1={\mathcal {V}}_x^1 \subset {\mathcal {V}}_x^2 \subset \cdots \subset {\mathcal {V}}_x^l ={\mathcal {E}}_x, \quad \text {where }\quad {\mathcal {V}}^i_x= {\mathcal {E}}^1_x \oplus \cdots \oplus {\mathcal {E}}^i_x \end{aligned}$$
(3.11)

First we note that the subspaces \({\mathcal {V}}^i_x\) depend smoothly on x along W. This is well known and can be proved by applying the \(C^r\) Section Theorem to a suitable graph transform extension; see [16, Proposition 3.9] for an explicit reference. In fact, in our setting, smoothness of the fast flag along W follows directly from the normal forms, since \({\mathcal {H}}_x\) maps the fast subfoliations of W to linear foliations of \({\mathcal {E}}_x\); see Sect. 3.2 for details. Smooth dependence of \({\mathcal {H}}_x\) on x is not needed for this argument, so the reasoning is not circular.

It follows that there exists a splitting \( {\mathcal {E}}_x = \tilde{{\mathcal {E}}}^1_x \oplus \cdots \oplus \tilde{{\mathcal {E}}}^l_x\) which is smooth in x along W and defines the same flag \({\mathcal {V}}_x\); for instance, one can take \(\tilde{{\mathcal {E}}}^i_x\) to be the orthogonal complement of \({\mathcal {V}}^{i-1}_x\) in \({\mathcal {V}}^i_x\) with respect to an inner product on \({\mathcal {E}}_x\) that depends smoothly on x along W. Then Proposition 3.2 implies that \( {\mathcal {S}}_x ^{(n)}=\tilde{{\mathcal {S}}}_x ^{(n)}\), the space of all sub-resonance homogeneous polynomials of degree n for the same \(({\lambda }, {\mu })\) and the new splitting. The latter clearly depends smoothly on x along W. \(\square \)
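One concrete way to produce such a splitting, assuming an inner product on \({\mathcal {E}}_x\) that depends smoothly on x along W, is to take \(\tilde{{\mathcal {E}}}^i_x = {\mathcal {V}}^i_x \cap ({\mathcal {V}}^{i-1}_x)^\perp \). In coordinates this is Gram-Schmidt orthonormalization of a frame adapted to the flag; since Gram-Schmidt is a smooth function of linearly independent input vectors, a smooth adapted frame yields a smooth splitting. The Python sketch below (hypothetical function names, Euclidean inner product assumed) shows the construction at a single point.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize vectors in order (Euclidean inner product assumed)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            raise ValueError("frame vectors must be linearly independent")
        basis.append([a / norm for a in w])
    return basis


def splitting_from_flag(frame, dims):
    """frame: vectors whose first dims[i] span V^{i+1}; dims is increasing.
    Returns orthonormal bases of the blocks V^{i+1} intersected with (V^i)^perp."""
    q = gram_schmidt(frame)
    blocks, start = [], 0
    for d in dims:
        blocks.append(q[start:d])
        start = d
    return blocks


def residual(v, orthonormal):
    """Norm of the component of v orthogonal to span(orthonormal)."""
    w = list(v)
    for q in orthonormal:
        c = dot(v, q)
        w = [a - c * b for a, b in zip(w, q)]
    return dot(w, w) ** 0.5


# Hypothetical flag in R^3: V^1 = span(e1 + e2), V^2 = span(e1 + e2, e1), V^3 = R^3.
blocks = splitting_from_flag([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1, 2, 3])
print(residual([1.0, 0.0, 0.0], blocks[0] + blocks[1]))  # close to 0, since e1 lies in V^2
```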

Thus we have constructed the formal series \({\mathcal {H}}_x(t)= \sum _{n=1}^\infty H^{(n)}_x(t)\) for the coordinate change and the polynomial map \({\mathcal {P}}_x(t)= \sum _{n=1}^d {P}^{(n)}_x(t)\), where \(d=\lfloor {\lambda }_1/{\mu }_l \rfloor \). Now we obtain the actual coordinate change function. We fix \(N > d\) and conjugate \({\mathcal {F}}\) by the polynomial coordinate change \(\bar{{\mathcal {H}}}^N\) given by

$$\begin{aligned} \bar{{\mathcal {H}}}^N_x(t)= \sum _{n=1}^N H^{(n)}_x(t)\quad \text {and denote}\quad \tilde{{\mathcal {F}}}_x(t)=\bar{{\mathcal {H}}}_{fx}^N \circ {\mathcal {F}}_{x} \circ \left( \bar{{\mathcal {H}}}_x^N\right) ^{-1}. \end{aligned}$$
(3.12)

We note that \(\bar{{\mathcal {H}}}^N_x(t)\) is a diffeomorphism on a neighborhood \(\tilde{U}_x \subset U_x\) of \(0\in {\mathcal {E}}_x\), since its differential at 0 is \(\text {Id}\); moreover, by compactness of \(\mathcal {M}\), the size of \(\tilde{U}_x\) can be bounded away from 0. By the construction of \(\bar{{\mathcal {H}}}^N\), the maps \(\tilde{{\mathcal {F}}}_x\) and \({\mathcal {P}}_x \) have the same derivatives at \(t=0\) up to order N for each \(x \in \mathcal {M}\). We now look for a coordinate change conjugating \(\tilde{{\mathcal {F}}}_x\) and \({\mathcal {P}}_x\) in the space

$$\begin{aligned} {\mathcal {C}}_x = \left\{ R_x \in C^N\left( \tilde{U}_x,{\mathcal {E}}_x\right) : \;R_x(0)=0, \quad D_0 R_x=\text {Id}, \quad D^{(k)}_0 R_x=0, \quad k=2,\ldots ,N\right\} . \end{aligned}$$

It is a closed affine subspace of \(C^N(\tilde{U}_x,{\mathcal {E}}_x)\) with the standard norm

$$\begin{aligned} \Vert R \Vert _N=\max \left\{ \,\Vert D^{(k)}_t R \Vert : \; t \in \tilde{U}_x, \quad 0\le k \le N\right\} . \end{aligned}$$

Each element in \({\mathcal {C}}_x\) is a diffeomorphism on a neighborhood of \(0\in {\mathcal {E}}_x\). We will fix \(\delta <1\) and then choose \(\tilde{U}_x\) to be convex and small enough so that for any \(R_1, R_2\) in \({\mathcal {C}}_x\) and \(R = R_1 -R_2\) we have

$$\begin{aligned} \Vert D^{(k)} R \Vert _{0}\le \delta \, \Vert D^{(N)} R \Vert _{0} \quad \text {for }\quad 0 \le k<N \quad \text {and hence} \quad \Vert R \Vert _{N} = \Vert R^{(N)} \Vert _{0}. \end{aligned}$$
(3.13)

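This is possible since \(D^{(k)}_0 R=0\) for \(0 \le k\le N\). Iterating the estimate below gives \(\Vert D^{(k)} R \Vert _{0}\le (\text {diam}\, \tilde{U}_x)^{N-k}\, \Vert D^{(N)} R \Vert _{0}\), so (3.13) holds as soon as \(\text {diam}\, \tilde{U}_x \le \delta \). For any \(t \in \tilde{U}_x\) and \(0 \le k < N\), the mean value theorem on the convex set \(\tilde{U}_x \ni 0\) gives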

$$\begin{aligned} \left\| D^{(k)}_t R \right\| \le \Vert t\Vert \cdot \sup \left\{ \left\| D^{(k+1)}_s R\right\| : \Vert s\Vert \le \Vert t\Vert \right\} \le \text {diam}\, \tilde{U}_x \cdot \Vert R^{(k+1)} \Vert _{0}. \end{aligned}$$
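As a quick one-dimensional sanity check of the iterated bound \(\Vert D^{(k)} R \Vert _{0}\le r^{N-k}\Vert D^{(N)} R \Vert _{0}\) on \([-r,r]\), the Python sketch below uses the hypothetical test function \(R(t)=t^{N+1}\), whose derivatives of order at most N vanish at 0, and compares sampled sup-norms of its derivatives. It illustrates the inequality for one function only and is not a proof.

```python
from math import factorial


def sup_derivative(m, k, r, samples=2001):
    """Sampled sup over [-r, r] of |d^k/dt^k t^m| = m!/(m-k)! * |t|^(m-k)."""
    coeff = factorial(m) // factorial(m - k)
    grid = [-r + 2 * r * j / (samples - 1) for j in range(samples)]
    return max(coeff * abs(t) ** (m - k) for t in grid)


def flat_estimate_holds(N, r):
    """Check sup|R^(k)| <= r^(N-k) * sup|R^(N)| for R(t) = t^(N+1) and 0 <= k < N."""
    top = sup_derivative(N + 1, N, r)
    return all(sup_derivative(N + 1, k, r) <= r ** (N - k) * top
               for k in range(N))


print(flat_estimate_holds(4, 0.5), flat_estimate_holds(6, 0.9))  # True True
```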

We consider the space \(\mathcal S\) of continuous sections of the bundle \({\mathcal {C}}\) and equip it with the distance \(\text {dist}(R_1,R_2)=\Vert R_1 - R_2 \Vert _N= \sup _x \Vert (R_1 - R_2)_x \Vert _N \). We consider the operator

$$\begin{aligned} T[R]_x= ({\mathcal {P}}_x)^{-1} \circ R_{fx} \circ \tilde{{\mathcal {F}}}_{x} \end{aligned}$$

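on \(\mathcal S\), so that a fixed point of T, which satisfies \({\mathcal {P}}_x \circ R_x = R_{fx} \circ \tilde{{\mathcal {F}}}_{x}\), is the desired coordinate change. Note that T[R] is in \(\mathcal S\): since \(R_{fx}\) agrees with the identity up to order N at 0, and \({\mathcal {P}}_x\) and \(\tilde{{\mathcal {F}}}_x\) have the same derivatives at 0 up to order N, the composition \(({\mathcal {P}}_x)^{-1} \circ R_{fx} \circ \tilde{{\mathcal {F}}}_{x}\) also agrees with the identity up to order N at 0.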
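In the simplest autonomous, one-dimensional case, the fixed point of T can be computed by iteration. If \({\mathcal {F}}_x=F\) and \({\mathcal {P}}_x=P\) do not depend on x, with \(P(t)=at\) and \(F(t)=at+bt^2\), \(0<a<1\), then \(T^n[\text {Id}]=P^{-n}\circ F^n\), and the limit H satisfies \(H\circ F=P\circ H\). This is the classical Koenigs linearization, shown here only as a hypothetical illustration (the values of a and b are made up; in one dimension there are no sub-resonance terms of degree at least 2, so P is linear). It uses pointwise iteration rather than the \(C^N\) section space of the proof.

```python
def F(t, a=0.5, b=0.3):
    """Hypothetical contracting map with linear part P(t) = a * t."""
    return a * t + b * t * t


def linearizing_map(t, n=60, a=0.5, b=0.3):
    """Approximate the fixed point H of T[R] = P^{-1} o R o F by T^n[Id](t) = a^{-n} F^n(t)."""
    for _ in range(n):
        t = F(t, a, b)
    return t / a ** n


H = linearizing_map
t = 0.2
print(abs(H(F(t)) - 0.5 * H(t)) < 1e-9)  # True: H o F = P o H up to truncation
```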

To show that T is a contraction on \(\mathcal S\), we denote \(R = R_1 -R_2\) for \(R_1, R_2\) in \(\mathcal S\) and estimate \(\Vert T[R] \Vert _N = \sup _x \Vert T[R]_x \Vert _N \), which by (3.13) equals \( \sup _x \Vert T[R]_x^{(N)} \Vert _0\):

$$\begin{aligned} \begin{aligned} D_t^{(N)} T[R]_x&= D_t^{(N)} \left( ({\mathcal {P}}_x)^{-1} \circ R_{fx} \circ \tilde{{\mathcal {F}}}_{x} \right) \\&= D_{R_{fx}(\tilde{{\mathcal {F}}}_{x}(t))}^{(1)} \, ({\mathcal {P}}_x)^{-1} \circ D_{\tilde{{\mathcal {F}}}_{x}(t)}^{(N)} R_{fx} \circ D_t^{(1)}\tilde{{\mathcal {F}}}_{x} \,+ \,J, \end{aligned} \end{aligned}$$
(3.14)

where J consists of a fixed number of terms of the type

$$\begin{aligned} D_{R_{fx}\left( \tilde{{\mathcal {F}}}_{x}(t)\right) }^{(i)} \, ({\mathcal {P}}_x)^{-1} \circ D_{\tilde{{\mathcal {F}}}_{x}(t)}^{(j)} R_{fx} \circ D_t^{(k)}\tilde{{\mathcal {F}}}_{x}, \quad i\cdot j\cdot k=N, \quad j<N, \end{aligned}$$

whose norm can be estimated using (3.9) as

$$\begin{aligned} \left\| D_{R_{fx}(\tilde{{\mathcal {F}}}_{x}(t))}^{(i)} \, ({\mathcal {P}}_x)^{-1} \right\| \cdot \left\| D_{\tilde{{\mathcal {F}}}_{x}(t)}^{(j)} R_{fx} \right\| ^i\cdot \left\| D_t^{(k)}\tilde{{\mathcal {F}}}_{x}\right\| ^{ij}. \end{aligned}$$

By (3.13), \( \Vert D_{\tilde{{\mathcal {F}}}_{x}(t)}^{(j)} R_{fx}\Vert \le \delta \, \Vert D^{(N)} R_{fx} \Vert _{0}<1\) if \(\delta \) is small enough. Therefore, there exists a constant \(M=M({\mathcal {F}}, {\mathcal {P}}, N)\) such that

$$\begin{aligned} \Vert J\Vert \le M \cdot \delta \left\| D^{(N)} R_{fx} \right\| _{0} \le M \delta \cdot \Vert R \Vert _{N}. \end{aligned}$$
(3.15)

We estimate the main term in (3.14) as follows

$$\begin{aligned} \begin{aligned}&\left\| D_{R_{fx} (\tilde{{\mathcal {F}}}_{x}(t))}^{(1)} \, ({\mathcal {P}}_x)^{-1} \circ D_{\tilde{{\mathcal {F}}}_{x}(t)}^{(N)} R_{fx} \circ D_t^{(1)}\tilde{{\mathcal {F}}}_{x} \right\| \, \\&\quad \le \, \left\| D_{R_{fx}(\tilde{{\mathcal {F}}}_{x}(t))}^{(1)} ({\mathcal {P}}_x)^{-1} \right\| \cdot \Vert R \Vert _N \cdot \left\| D_t^{(1)}\tilde{{\mathcal {F}}}_{x} \right\| ^N \\&\quad \le \, e^ {-\lambda _1+\varepsilon } \cdot \Vert R \Vert _N \cdot e^ {N\mu _l +N\varepsilon } \,= \,k'\cdot \Vert R \Vert _N, \end{aligned} \end{aligned}$$
(3.16)

where \(k'=\exp ((N\mu _l - \lambda _1) +(N+1)\varepsilon )\). Since \(\lambda _1> N \mu _l\) by the choice of \(N>d=\lfloor {\lambda }_1/{\mu }_l \rfloor \), we can take \(\varepsilon \) small enough so that \(k'<1\). Combining the estimates (3.15) and (3.16), we get

$$\begin{aligned} \Vert T[R_1-R_2] \Vert _N = \sup _x \left\| T[R_1-R_2]_x^{(N)} \right\| _0 \le (k'+M \delta ) \cdot \Vert R_1-R_2 \Vert _{N} \end{aligned}$$

so that T is a contraction if \(\delta \) is small enough that \(k'+M\delta <1\). By the contraction mapping principle, T has a unique fixed point, which is a continuous family \(\tilde{{\mathcal {H}}}^N_x\) of coordinate changes conjugating \({\mathcal {P}}_x\) and \(\tilde{{\mathcal {F}}}_x\) given by (3.12).
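As a small numerical illustration with hypothetical exponent values (not from the text), the following Python lines compute \(d=\lfloor {\lambda }_1/{\mu }_l \rfloor \), check that \(N=d+1\) satisfies \({\lambda }_1 > N{\mu }_l\), and find the range of \(\varepsilon \) for which the constant \(k'=\exp ((N\mu _l - \lambda _1) +(N+1)\varepsilon )\) from (3.16) is below 1.

```python
import math


def resonance_degree(lam1, mu_l):
    """d = floor(lambda_1 / mu_l); both exponents are negative, so the ratio is >= 1."""
    return math.floor(lam1 / mu_l)


def contraction_factor(lam1, mu_l, N, eps):
    """k' = exp((N * mu_l - lambda_1) + (N + 1) * eps)."""
    return math.exp((N * mu_l - lam1) + (N + 1) * eps)


def eps_threshold(lam1, mu_l, N):
    """k' < 1 exactly when eps < (lambda_1 - N * mu_l) / (N + 1)."""
    return (lam1 - N * mu_l) / (N + 1)


lam1, mu_l = -3.5, -1.0           # hypothetical values
d = resonance_degree(lam1, mu_l)  # 3
N = d + 1                          # smallest admissible N
assert lam1 > N * mu_l             # holds since N > lambda_1 / mu_l and mu_l < 0
print(d, N, eps_threshold(lam1, mu_l, N))  # 3 4 0.1
```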

We conclude that the maps \({\mathcal {H}}^N_x=\tilde{{\mathcal {H}}}^N_x \circ \bar{{\mathcal {H}}}^N_x\) give a family of \(C^N\) diffeomorphisms defined on a neighborhood of the zero section of \({\mathcal {E}}\) which depend continuously on \(x\in \mathcal {M}\) in the \(C^N\) topology and satisfy (i) and (ii) of Theorem 2.4. Since f contracts W, this family extends uniquely to a family of \(C^N\) global diffeomorphisms \({\mathcal {H}}^N_x : W_x \rightarrow {\mathcal {E}}_x\) that satisfy (i). Once we fix a formal power series for \({\mathcal {H}}\), the construction works for each \(N>d\), and \({\mathcal {H}}^N\) is unique among \(C^N\) diffeomorphisms whose derivatives up to order N are given by the series. Since for \(N'>N\) the map \({\mathcal {H}}^{N'}\) is also such a \(C^N\) diffeomorphism, all \({\mathcal {H}}^N\) coincide and give a family of \(C^\infty \) diffeomorphisms.

Thus we have proved parts (i), (ii), and (iii), although so far smoothness along W has been established only for the Taylor coefficients of \({\mathcal {H}}_x\) at 0. The smoothness of \({\mathcal {H}}_x\) in x along W will follow once we establish part (iv). Indeed, in the coordinates on \(W_x\) given by \({\mathcal {H}}_x\) for a fixed x, the maps \({\mathcal {H}}_y\) for all other \(y \in W_x\) are polynomials and thus coincide with their finite Taylor series.

3.2 Consistency of the fast foliations

First we describe some properties of the fast flag (3.11) and related properties of \({\mathcal {H}}_x\) and of sub-resonance generated maps. The leaves of W are subfoliated by the unique fast foliations \(U^{k}\) tangent to \(V^k_x= {\mathcal {E}}_x^{1} \oplus \cdots \oplus {\mathcal {E}}_x^{k}.\) We denote by \(\bar{U}^{k}\) the corresponding foliations of \({\mathcal {E}}_x\) obtained via the identification \({\mathcal {H}}_x :W_x \rightarrow {\mathcal {E}}_x\). Thus we obtain foliations \(\bar{U}^{k}\) of \({\mathcal {E}}\) which are invariant under the polynomial normal form maps \({\mathcal {P}}_x\). Since the maps \({\mathcal {H}}_x\) are diffeomorphisms, \(\bar{U}^{k}\) are also unique fast foliations with the same contraction rates. They are characterized by

$$\begin{aligned} \text {for } \quad y, z \in {\mathcal {E}}_x \quad z \in \bar{U}^{k} (y) \Leftrightarrow \text {dist}\left( {\mathcal {P}}^n _x(y),{\mathcal {P}}^n_x (z)\right) \le C e^{n ({\mu }_k +\varepsilon )} \quad \text { for all } \quad n \in \mathbb {N}\end{aligned}$$

for any \(\varepsilon \) sufficiently small so that \({\mu }_k +\varepsilon < {\lambda }_{k+1}\).
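For a concrete, hypothetical example of this characterization, take two one-dimensional blocks with \(e^{{\lambda }_1}=e^{{\mu }_1}=0.1\), \(e^{{\lambda }_2}=e^{{\mu }_2}=0.5\), and the autonomous sub-resonance polynomial \(P(t_1,t_2)=(0.1\,t_1+t_2^2,\ 0.5\,t_2)\); the term \(t_2^2\) is allowed because \({\lambda }_1=\log 0.1 \le 2\log 0.5 = 2{\mu }_2\). The fast leaves \(\bar{U}^1\) are the lines \(t_2=\text {const}\): points on the same line converge at the rate \(0.1^n\), while the distance between points on different lines is at least a constant times \(0.5^n\). The Python sketch below checks this numerically.

```python
A1, A2, C = 0.1, 0.5, 1.0  # hypothetical: e^{lambda_1}, e^{lambda_2}, coefficient of t2^2


def P(t):
    """Block triangular sub-resonance polynomial on R^2 = E^1 + E^2."""
    t1, t2 = t
    return (A1 * t1 + C * t2 * t2, A2 * t2)


def dist_after(y, z, n):
    """Max-norm distance between P^n(y) and P^n(z)."""
    for _ in range(n):
        y, z = P(y), P(z)
    return max(abs(y[0] - z[0]), abs(y[1] - z[1]))


y = (0.3, 0.2)
same_leaf = dist_after(y, (0.7, 0.2), 10)   # same t2: approximately 0.4 * 0.1**10
other_leaf = dist_after(y, (0.3, 0.6), 10)  # t2 differs: at least 0.4 * 0.5**10
print(same_leaf < 1e-10 < other_leaf)       # True
```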

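It follows from Definition 2.3 that sub-resonance polynomials \(R \in {\mathcal {S}}_{x,y}\) are block triangular in the sense that the \({\mathcal {E}}^i\) component of R does not depend on the \({\mathcal {E}}^j\) components for \(j<i\). Indeed, if \(s_j \ge 1\) for some \(j<i\), then \(\sum s_m {\mu }_m \le {\mu }_j < {\lambda }_{j+1} \le {\lambda }_i\), so the relation \((i; s_1,\ldots ,s_l)\) is not of sub-resonance type. Consequently, R maps affine subspaces parallel to the subspaces \({\mathcal {V}}^i_x\) of the fast flag in \({\mathcal {E}}_x\) to affine subspaces parallel to those in \({\mathcal {E}}_y\). By considering compositions, we obtain that any sub-resonance generated polynomial in \(G _{x,y}\) is also block triangular.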
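The block triangularity can also be checked mechanically. The hypothetical Python sketch below (0-based block indices, made-up exponents satisfying \({\lambda }_1 \le {\mu }_1 < {\lambda }_2 \le \cdots \le {\mu }_l < 0\)) lists, for each target block \({\mathcal {E}}^i\), the source blocks that can appear in sub-resonance terms of degree at least 2, and confirms that only blocks \(j \ge i\) occur.

```python
from itertools import product


def is_subresonance(i, s, lam, mu):
    """Relation (i; s): lam[i] <= sum_j s_j * mu[j]."""
    return lam[i] <= sum(sj * mj for sj, mj in zip(s, mu))


def allowed_blocks(i, lam, mu, max_degree):
    """Source blocks j appearing in some sub-resonance term of the E^i component
    with total degree between 2 and max_degree."""
    l = len(lam)
    blocks = set()
    for s in product(range(max_degree + 1), repeat=l):
        if 2 <= sum(s) <= max_degree and is_subresonance(i, s, lam, mu):
            blocks.update(j for j in range(l) if s[j] > 0)
    return blocks


def block_triangular(lam, mu, max_degree):
    """True if the E^i component never depends on E^j with j < i."""
    return all(min(allowed_blocks(i, lam, mu, max_degree), default=i) >= i
               for i in range(len(lam)))


lam, mu = [-6.0, -3.0, -1.2], [-5.0, -2.5, -1.0]  # hypothetical exponents
print([sorted(allowed_blocks(i, lam, mu, 6)) for i in range(3)])  # [[0, 1, 2], [2], []]
print(block_triangular(lam, mu, 6))                                 # True
```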

It is easy to see that all derivatives of a sub-resonance polynomial are sub-resonance polynomials. In particular, the derivative \(D_y {\mathcal {P}}_x\) at any point \(y \in {\mathcal {E}}_x\) is a sub-resonance linear map, hence block triangular, and thus maps \(V^{k}_x\) to \(V^{k}_{fx}\). Hence \({\mathcal {P}}_x\) maps affine subspaces parallel to \(V^{k}_x\) to affine subspaces parallel to \(V^{k}_{fx}\), so the foliation of \({\mathcal {E}}\) whose leaves in each \({\mathcal {E}}_x\) are the affine subspaces parallel to \(V^{k}_x\) is invariant under the maps \({\mathcal {P}}_x\) and therefore coincides with \(\bar{U}^{k}\) by uniqueness of the fast foliation. It follows that for any \(x\in \mathcal {M}\) and any \(y\in W_x\) the diffeomorphism

$$\begin{aligned} \mathcal G_{x,y}:={\mathcal {H}}_y \circ {\mathcal {H}}_x^{-1} : {\mathcal {E}}_x \rightarrow {\mathcal {E}}_y \end{aligned}$$
(3.17)

maps the fast flag of linear foliations of \({\mathcal {E}}_x\) to that of \({\mathcal {E}}_y\). In particular, the same holds for its derivative \( D_0 \mathcal G_{x,y}=D_x {\mathcal {H}}_y : {\mathcal {E}}_x \rightarrow {\mathcal {E}}_y\), which therefore maps \(V^k_x\) to \(V^k_y\) for each k. We conclude that \( D_0 \mathcal G_{x,y}\) is block triangular and thus is a sub-resonance linear map.

3.3 Proof of (iv): consistency of normal form coordinates

In this section we show that the map \(\mathcal G_{x,y}\) in (3.17) is a sub-resonance generated polynomial. First we note that

$$\begin{aligned} \mathcal G_{x,y} (0) ={\mathcal {H}}_y (x) =:\bar{x} \in {\mathcal {E}}_y \quad \text {and}\quad D_0 \,\mathcal G_{x,y}=D_x {\mathcal {H}}_y. \end{aligned}$$

Since \( {\mathcal {H}}_{f^n x}^{-1} \circ {\mathcal {P}}^n_x \circ {\mathcal {H}}_{x} = f^n = {\mathcal {H}}_{f^n y}^{-1} \circ {\mathcal {P}}^n_y \circ {\mathcal {H}}_{y}\) we obtain that

$$\begin{aligned} {\mathcal {H}}_{f^n y} \circ {\mathcal {H}}_{f^n x}^{-1} \circ {\mathcal {P}}^n_x&= {\mathcal {H}}_{f^n y} \circ f^n \circ {\mathcal {H}}_x^{-1} = {\mathcal {P}}^n_y \circ {\mathcal {H}}_y \circ {\mathcal {H}}_x^{-1} \quad \text {and hence}\\ \mathcal G_{f^n x,f^n y} \circ {\mathcal {P}}^n_x&= {\mathcal {P}}^n_y \circ \mathcal G_{x,y}. \end{aligned}$$
(3.18)

Now we consider the formal power series for \(\mathcal G_{x,y} : {\mathcal {E}}_x \rightarrow {\mathcal {E}}_y\) at \(t=0 \in {\mathcal {E}}_x\):

$$\begin{aligned} \mathcal G_{x,y}(t) \sim G_{x,y}(t)=\bar{x} + \sum _{m=1}^\infty G^{(m)}_{x,y}(t). \end{aligned}$$

Our first goal is to show that all its terms are sub-resonance generated. We proved in Sect. 3.2 that the first derivative \( G^{(1)}_{x,y}=D_x {\mathcal {H}}_y \) is a sub-resonance linear map.

Inductively, we assume that \( G^{(m)}_{x,y}\) has only sub-resonance generated terms for all \(x\in \mathcal {M}\), \(y \in W_x\), and \(m=1,\ldots ,k-1\), and we show that the same holds for \( G^{(k)}_{x,y}\). We split \(G^{(k)}_{x,y}=S_{x,y}+N_{x,y}\) into the sub-resonance generated part and the rest. Using invariance under the contracting maps \({\mathcal {P}}_x\), it suffices to show that \(N_{x,y}= 0\) for all \(y \in W_x\) that are sufficiently close to x. Assume the contrary, and fix such x and y with \(N_{x,y}\ne 0\). We write \(N_{x}\) for \(N_{x,y}\) and \(N_{f^nx}\) for \(N_{f^nx,f^ny}\). Now we consider the order k terms in the Taylor series of both sides of (3.18) at \(0\in {\mathcal {E}}_x\). The Taylor series of the polynomial \({\mathcal {P}}_x^n\) at \(0\in {\mathcal {E}}_x\) is the polynomial itself, \({\mathcal {P}}_x^n(t)=\sum _{m=1}^M {P}^{(m)}_{x}(t)\), where M is an upper bound for the degrees of \({\mathcal {P}}_x^n\) and \({\mathcal {P}}_y^n\), and the homogeneous terms \({P}^{(m)}_{x}\) depend on n, which we suppress from the notation. We also consider the Taylor series of \({\mathcal {P}}_y^n\) at \(\mathcal G_{x,y}(0) = \bar{x} \in {\mathcal {E}}_y\):

$$\begin{aligned} {\mathcal {P}}_y^n(z)= \bar{x}_n + \sum _{m=1}^M Q^{(m)}_{y}(z-\bar{x}), \quad \text {where}\quad \bar{x}_n ={\mathcal {P}}_y^n (\bar{x}) \end{aligned}$$

All terms \(Q^{(m)}_{y}\) are sub-resonance, being derivatives of a sub-resonance polynomial. Consider the formal power series for

$$\begin{aligned} \mathcal G_{f^n x,f^n y}(t) \sim G_{f^n x,f^n y}(t) = \bar{x}_n + \sum _{m=1}^\infty G^{(m)}_{f^n x,f^n y}(t). \end{aligned}$$

Now we obtain from (3.18)

$$\begin{aligned} \bar{x}_n + \sum _{j=1}^\infty G^{(j)}_{f^n x,f^n y}\left( \sum _{m=1}^M {P}^{(m)}_{x}(t) \right) = \,\bar{x}_n + \sum _{m=1}^M Q^{(m)}_{y}\left( \sum _{j=1}^\infty G^{(j)}_{x,y}(t) \right) . \end{aligned}$$

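By the inductive assumption, the terms \(G^{(j)}\) with \(j<k\) are sub-resonance generated, and so are all \({P}^{(m)}_{x}\) and \(Q^{(m)}_{y}\). Since any composition of sub-resonance generated terms is sub-resonance generated, the only order k terms in the above equation that need not be sub-resonance generated are \(N_{f^n x}\circ {P}^{(1)}_{x}\) and \(Q^{(1)}_{y}\circ N_{x}\). Comparing these terms yields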

$$\begin{aligned} N_{f^n x}\left( {P}^{(1)}_{x}(t) \right) = Q^{(1)}_{y}\left( N_{x}(t) \right) . \end{aligned}$$
(3.19)

We decompose N into components \(N=(N^1,\ldots , N^l)\) and let i be the largest index such that \(N^i _x \ne 0\); that is, there exists \(t' \in {\mathcal {E}}_x\) such that \(z'= N_x (t')\) has a non-zero component in \({\mathcal {E}}^i_y\), while, by the choice of i, it has no components in \({\mathcal {E}}^j_y\) for \(j>i\). We denote

$$\begin{aligned} w= Q^{(1)}_{y}(z') = D_{\bar{x}} {\mathcal {P}}_y^n (z') \in {\mathcal {E}}_{f^n y} \quad \text { and let }\quad w_i =Pr_i (w) \in {\mathcal {E}}^i_{f^n y} \end{aligned}$$
(3.20)

be its i component. We claim that

$$\begin{aligned} \Vert w_i \Vert \ge C e^{n ({\lambda }_i -\varepsilon )} \end{aligned}$$
(3.21)

where the constant C does not depend on n. This follows from (2.3) and the fact that

$$\begin{aligned} D_{\bar{x}} {\mathcal {P}}_y^n = D_{f^n x}{\mathcal {H}}_{f^n y} \circ {F}^n_x \circ (D_x {\mathcal {H}}_y) ^{-1} . \end{aligned}$$

Indeed, the differentials \(D_{f^n x}{\mathcal {H}}_{f^n y}\) and \((D_{x}{\mathcal {H}}_y) ^{-1}\) preserve the fast flag and are close to the identity, since x is close to y by our assumption and hence \(f^n x\) is close to \(f^n y\). Then \(z=(D_x {\mathcal {H}}_y) ^{-1} (z')\) lies in \({\mathcal {V}}^i_x\) and has a non-zero component \(z_i\) in \({\mathcal {E}}^i_x\), which is transformed by \( {F}^n_x\) according to (2.3): \(\Vert {F}^n_x (z_i) \Vert \ge e^{n ({\lambda }_i -\varepsilon )} \Vert z_i \Vert \). Finally, \(\Vert w_i \Vert \ge C' \Vert {F}^n_x (z_i) \Vert \), since \(D_{f^n x}{\mathcal {H}}_{f^n y}\) is close to the identity and \({F}^n_x (z)\) has no \({\mathcal {E}}^j\) components for \(j>i\).

Now we estimate from above the i component of the left side of (3.19) at \(t'\). First, writing \(t'_j\) for the \({\mathcal {E}}^j_x\) component of \(t'\) and using that \({P}^{(1)}_{x}={F}^n_x\), we have

$$\begin{aligned} \left\| {P}^{(1)}_{x}(t'_j) \right\| = \left\| {F}^n_x (t'_j) \right\| \le \Vert t' \Vert e^{n ({\mu }_j +\varepsilon )} \quad \text {for any }\quad j. \end{aligned}$$

Let \(N^s_{f^n x}\) be a term of homogeneity type \(s=(s_1, \ldots , s_l)\) in the component \(N_{f^n x}^i\). We estimate as in (3.10)

$$\begin{aligned} \left\| N^s_{f^n x} \left( {P}^{(1)}_{x}(t')\right) \right\| \le \left\| N_{f^n x}\right\| \cdot \Vert t' \Vert ^k \cdot e^{n \sum s_j ({\mu }_j +\varepsilon )} \end{aligned}$$

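Since no term in \(N_{f^n x}^i\) is of sub-resonance type, we have \({\lambda }_i > \sum s_j {\mu }_j\) for each such s. As there are only finitely many homogeneity types with \(\sum s_j = k\), we can choose \(\varepsilon >0\) so small that \( {\lambda }_i > \sum s_j {\mu }_j + (k+2)\varepsilon \) for all such relations, that is, \(\sum s_j ({\mu }_j +\varepsilon ) < {\lambda }_i - 2\varepsilon \). Hence each term of the i component of the left side of (3.19) at \(t'\) can be estimated as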

$$\begin{aligned} \left\| N^s_{f^n x} \left( {P}^{(1)}_{x}(t')\right) \right\| \le C' e^{n ({\lambda }_i -2\varepsilon )}, \end{aligned}$$

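where the constant \(C'\) does not depend on n, since the norms of \(G^{(k)}_{f^n x,f^n y}\), and hence those of \(N_{f^n x}\), are uniformly bounded. Summing over the finitely many types s, the i component of the left side of (3.19) at \(t'\) has norm at most a constant times \(e^{n ({\lambda }_i -2\varepsilon )}\), while by (3.20) the i component of the right side is \(w_i\), whose norm is at least \(C e^{n ({\lambda }_i -\varepsilon )}\) by (3.21). This contradicts (3.19) for large n.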
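The admissible \(\varepsilon \) depends only on the degree k (through \(\sum s_j = k\)) and on the exponents, not on n. The following hypothetical Python sketch computes the supremum of admissible \(\varepsilon \), namely the smallest gap \({\lambda }_i - \sum s_j {\mu }_j > 0\) over the non-sub-resonance types of degree k, divided by \(k+2\). Taking all such types is a conservative simplification, since only those actually occurring in \(N^i_{f^n x}\) matter; the exponent values are made up.

```python
import math
from itertools import product


def eps_bound(lam, mu, k):
    """Supremum of eps with lam[i] > sum_j s_j mu[j] + (k + 2) * eps for every
    non-sub-resonance type (i; s) of degree k."""
    l = len(lam)
    gaps = []
    for i in range(l):
        for s in product(range(k + 1), repeat=l):
            if sum(s) != k:
                continue
            gap = lam[i] - sum(sj * mj for sj, mj in zip(s, mu))
            if gap > 0:  # not sub-resonance
                gaps.append(gap)
    return min(gaps) / (k + 2) if gaps else math.inf


print(eps_bound([-4.0, -1.5], [-3.5, -1.0], 2))  # 0.125
```

With any \(\varepsilon \) below this bound, each such term satisfies \(\sum s_j({\mu }_j+\varepsilon ) < {\lambda }_i - 2\varepsilon \), which is the inequality used in the estimate.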

Thus we have shown that the Taylor series \(G_{x,y}\) of \(\,\mathcal G_{x,y}\) at 0 contains only sub-resonance generated terms; in particular, it is a finite polynomial. It remains to show that \(\mathcal G_{x,y}\) coincides with its Taylor series.

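In addition to (3.18), the same relation holds for the finite Taylor series at 0, since the Taylor series of a composition is the composition of the Taylor series and \({\mathcal {P}}^n_x\), \({\mathcal {P}}^n_y\) are polynomials: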

$$\begin{aligned} G_{f^n x,f^n y} \circ {\mathcal {P}}^n_x = {\mathcal {P}}^n_y \circ G_{x,y} \end{aligned}$$
(3.22)

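and hence, denoting \(\Delta _n= \mathcal G_{f^n x,f^n y}-G_{f^n x,f^n y}\), we obtain from (3.18) and (3.22)

$$\begin{aligned} \Delta _0 = ({\mathcal {P}}^n_y ) ^{-1} \circ \mathcal G_{f^n x,f^n y} \circ {\mathcal {P}}^n_x - ({\mathcal {P}}^n_y ) ^{-1} \circ G_{f^n x,f^n y} \circ {\mathcal {P}}^n_x , \end{aligned}$$
(3.23)

where, at each point t, the two arguments of \(({\mathcal {P}}^n_y ) ^{-1}\) differ by \(\Delta _n ({\mathcal {P}}^n_x (t))\). Below we show that the right-hand side of (3.23) tends to 0 as \(n \rightarrow \infty \), and thus \(\mathcal G_{x,y}=G_{x,y}\), completing the proof.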

Since \(\Delta _n\) is infinitely flat at \(0\in {\mathcal {E}}_{f^n x}\), for each k there exists \(C_{k,\delta }\) such that

$$\begin{aligned} \Vert \Delta _n (z)\Vert \le C_{k,\delta } \Vert z\Vert ^k \quad \text {for all } \quad z \in {\mathcal {E}}_{f^n x} \quad \text { with } \quad \Vert z\Vert \le \delta . \end{aligned}$$
(3.24)

We can choose the constant \(C_{k,\delta }\) independent of n, since \(\mathcal G_{x,y}\) depends continuously on \((x,y)\) in the \(C^\infty \) topology and \(f^n x\) remains close to \(f^n y\). We fix k and \(\varepsilon \) such that

$$\begin{aligned} k {\mu }_l +2(k+1)\varepsilon < {\lambda }_1. \end{aligned}$$
(3.25)

We estimate \( ({\mathcal {P}}^n_y ) ^{-1}\) and \( {\mathcal {P}}^n_x \) as follows:

$$\begin{aligned} \left\| ({\mathcal {P}}^n_y ) ^{-1} (t) \right\| \le e^{-n ({\lambda }_1 -2\varepsilon )} \Vert t\Vert \quad \text {and} \quad \left\| ({\mathcal {P}}^n_x ) (t) \right\| \le e^{n ({\mu }_l +2\varepsilon )} \Vert t\Vert \end{aligned}$$
(3.26)

for any \(n \in \mathbb {N}\), provided that the input t and all points along the corresponding trajectories, in particular the outputs \({\mathcal {P}}^n_x (t)\) and \(({\mathcal {P}}^n_y ) ^{-1} (t)\), have norm at most \(\delta \). Both estimates follow from the fact that for any \(x \in \mathcal {M}\)

$$\begin{aligned} \Vert D_0 {\mathcal {P}}_x \Vert = \Vert {F}_x \Vert \le e^{{\mu }_l +\varepsilon } \quad \text {and} \quad \left\| D_0 ({\mathcal {P}}_x ) ^{-1} \right\| = \left\| ({F}_x) ^{-1} \right\| \le e^{-{\lambda }_1 +\varepsilon } \end{aligned}$$

by (2.3). Hence, by continuity and compactness of \(\mathcal {M}\), we have similar estimates for the derivatives at all points \(t \in {\mathcal {E}}_x\) with \(\Vert t\Vert \le \delta \), for sufficiently small \(\delta >0\):

$$\begin{aligned} \Vert D_t {\mathcal {P}}_x \Vert \le e^{{\mu }_l +2\varepsilon } \quad \text {and} \quad \left\| D_t ({\mathcal {P}}_x ) ^{-1}\right\| \le e^{-{\lambda }_1 +2\varepsilon }. \end{aligned}$$

Then the bounds (3.26) follow by estimating the derivatives of \({\mathcal {P}}^n_x\) and \(({\mathcal {P}}^n_y ) ^{-1}\) as products of the one-step derivatives along the trajectory, as long as all points involved are within \(\delta \) of 0.
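A one-dimensional, autonomous sketch of this step, with hypothetical values: for \(P(t)=at+bt^2\) with \(a,b \ge 0\), the derivative satisfies \(|P'(t)| \le L := a+2b\delta \) on \(|t|\le \delta \), and if \(L<1\) the trajectory stays in the \(\delta \)-ball, so \(|P^n(t)| \le L^n |t|\). The Python code below checks this bound along sampled trajectories; in the non-stationary setting, \(L^n\) is replaced by the product of the one-step bounds along the orbit.

```python
def P(t, a=0.5, b=0.3):
    """Hypothetical one-dimensional contracting polynomial."""
    return a * t + b * t * t


def trajectory_bound_holds(a=0.5, b=0.3, delta=0.1, steps=30):
    """Check |P^n(t0)| <= L^n |t0| with L = a + 2*b*delta for sampled |t0| <= delta."""
    L = a + 2 * b * delta  # bound for |P'| on the delta-ball
    if L >= 1:
        return False
    for j in range(-10, 11):
        t0 = delta * j / 10
        t = t0
        for n in range(1, steps + 1):
            t = P(t, a, b)
            if abs(t) > delta or abs(t) > L ** n * abs(t0) * (1 + 1e-12):
                return False
    return True


print(trajectory_bound_holds())  # True
```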

Finally using (3.23), (3.24), and (3.26) we estimate

$$\begin{aligned} \begin{aligned} \Vert \Delta _0 (t) \Vert&\,\le \, \left\| ({\mathcal {P}}^n_y ) ^{-1} \right\| \cdot \left\| \Delta _n \circ {\mathcal {P}}^n_x (t)\right\| \,\le \, e^{-n ({\lambda }_1 -2\varepsilon )} C_{k,\delta } \left( e^{n ({\mu }_l +2\varepsilon )} \Vert t\Vert \right) ^k \,\\&\,= \, e^{n \left( -{\lambda }_1 + k {\mu }_l +2(k+1)\varepsilon \right) } \,C_{k,\delta }\, \Vert t\Vert ^k \, \rightarrow 0 \quad \text {as } \quad n\rightarrow \infty \end{aligned} \end{aligned}$$

by (3.25). Thus, \(\Delta _0=0\), i.e. the map \(\mathcal G_{x,y}\) coincides with its Taylor series.
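For concreteness, the following hypothetical Python sketch finds the smallest k satisfying (3.25) for given \({\lambda }_1\), \({\mu }_l\), and \(\varepsilon \) (it requires \({\mu }_l + 2\varepsilon < 0\)), and evaluates the exponent \(-{\lambda }_1 + k {\mu }_l +2(k+1)\varepsilon \) that governs the decay of \(\Vert \Delta _0(t)\Vert \) in the final estimate. The numerical values are made up.

```python
def smallest_k(lam1, mu_l, eps, k_max=10_000):
    """Smallest k >= 1 with k*mu_l + 2*(k+1)*eps < lam1, i.e. condition (3.25)."""
    if mu_l + 2 * eps >= 0:
        raise ValueError("need mu_l + 2*eps < 0")
    for k in range(1, k_max + 1):
        if k * mu_l + 2 * (k + 1) * eps < lam1:
            return k
    raise ValueError("no admissible k up to k_max")


def decay_exponent(lam1, mu_l, eps, k):
    """Exponent -lambda_1 + k*mu_l + 2*(k+1)*eps multiplying n in the final bound."""
    return -lam1 + k * mu_l + 2 * (k + 1) * eps


k = smallest_k(-4.0, -1.0, 0.05)
print(k, decay_exponent(-4.0, -1.0, 0.05, k) < 0)  # 5 True
```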

This completes the proof of Theorem 2.4. \(\square \)