1 Introduction and results

1.1 Main result

Throughout this article, a system is a probability space \((X,\mathcal {X},\mu )\) together with invertible, measure preserving transformations \(T_1,\ldots , T_\ell :X\rightarrow X\) that commute. A multiple correlation sequence is a sequence of the form

$$\begin{aligned} \int T_1^{n_1}f_1\cdot \ldots \cdot T_\ell ^{n_\ell }f_\ell \, d\mu \end{aligned}$$

where \((X,\mathcal {X},\mu , T_1,\ldots , T_\ell )\) is a system, \(f_1,\ldots , f_\ell \in L^\infty (\mu )\), and \(n_1,\ldots , n_\ell \in {\mathbb Z}\). The study of the limiting behavior of averages of such sequences, where the iterates are restricted to certain subsets of \({\mathbb Z}^\ell \), has been an indispensable tool in ergodic Ramsey theory and, in particular, in proving various far-reaching extensions of Szemerédi’s theorem on arithmetic progressions. Although the precise structure of the multiple correlation sequences is unknown even when \(n_1=\cdots =n_\ell =n\), there is a widespread belief that, modulo negligible terms, the building blocks are sequences with algebraic structure (see [7, Problem 1] for a related conjecture).

Definition

([5]) For \(\ell \in {\mathbb N}\), an \(\ell \) -step nilsequence is a complex valued sequence of the form \((F(g^n\Gamma ))\), where \(F\in C(X)\), \(X=G/\Gamma \), \(G\) is an \(\ell \)-step nilpotent Lie group, \(\Gamma \) is a discrete cocompact subgroup, and \(g\in G\). A \(0\) -step nilsequence is a constant sequence.
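As an illustration of the definition (the specific choices below are ours and are not taken from [5]), consider the abelian, hence \(1\)-step nilpotent, group \(G={\mathbb R}\) written additively, with \(\Gamma ={\mathbb Z}\), \(g=\alpha \in {\mathbb R}\), and \(F(x+{\mathbb Z}):=e^{2\pi i x}\). Then \(g^n\Gamma =n\alpha +{\mathbb Z}\) and

$$\begin{aligned} F(g^n\Gamma )=e^{2\pi i n\alpha }, \quad n\in {\mathbb N}, \end{aligned}$$

so every exponential sequence \((e^{2\pi i n\alpha })\) is a \(1\)-step nilsequence. Working on \({\mathbb R}^d/{\mathbb Z}^d\) instead, the same computation shows that every trigonometric polynomial \(\sum _{j=1}^d c_je^{2\pi i n\alpha _j}\) is a \(1\)-step nilsequence.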

When \(T_i=T^i\), \(i=1,\ldots , \ell ,\) following the discovery of characteristic factors with algebraic structure for some closely related multiple ergodic averages, Bergelson et al. proved the following beautiful result (see also [17] for related work for \(\ell =3\)):

Theorem

([5, Theorem 1.9]) For \(\ell \in {\mathbb N}\), let \((X,\mathcal {X},\mu , T)\) be an ergodic system and \(f_1,\ldots ,f_\ell \in L^{\infty }(\mu )\) be functions with \(\left\| f_i\right\| _\infty \le 1\). Then we have the decomposition

$$\begin{aligned} \int T^{n}f_1\cdot \ldots \cdot T^{\ell n}f_\ell \ d\mu =a_{st}(n)+a_{er}(n), \quad n\in {\mathbb N}, \end{aligned}$$

where

  1. (i)

    \((a_{st}(n))\) is a uniform limit of \((\ell -1)\)-step nilsequences with \(\left\| a_{st}\right\| _{\infty }\le 1\);

  2. (ii)

    \(\lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} |a_{er}(n)|^2=0\).

This result was extended by Leibman to cover polynomial iterates in [14] and not necessarily ergodic transformations in [15]. The proofs of these results depend in an essential way on the fact that characteristic factors for some suitable multiple ergodic averages are inverse limits of nilsystems. This is no longer true for correlation sequences involving actions of commuting transformations, which is why efforts to prove decomposition results for such sequences have so far been unsuccessful. In fact, characteristic factors for commuting actions are known to be extremely complex (for related work see [2, 3]), which has raised suspicions that decomposition results in this more general setup may involve sequences very different from nilsequences. Our main result settles this rather elusive problem: we show that, modulo error terms that are small in uniform density, correlation sequences of actions of commuting transformations are nilsequences.

Theorem 1.1

For \(\ell \in {\mathbb N}\) let \((X,\mathcal {X},\mu , T_1,\ldots , T_\ell )\) be a system and \(f_1,\ldots ,f_\ell \in L^{\infty }(\mu )\) be functions with \(\left\| f_i\right\| _\infty \le 1\). Then for every \(\varepsilon >0\) we have the decomposition

$$\begin{aligned} \int T_1^{n}f_1\cdot \ldots \cdot T_\ell ^{n}f_\ell \ d\mu =a_{st}(n)+a_{er}(n), \quad n\in {\mathbb N}, \end{aligned}$$
(1)

where

  1. (i)

    \((a_{st}(n))\) is an \((\ell -1)\)-step nilsequence with \(\left\| a_{st}\right\| _{\infty }\le 1\);

  2. (ii)

    \(\lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} |a_{er}(n)|^2\le \varepsilon \).

Remark

We do not know if a strengthening similar to the one in [5, Theorem 1.9] holds where one uses uniform limits of nilsequences in \((i)\) and takes \(\varepsilon =0\) in \((ii)\).

Our argument is rather versatile and does not rely on the theory of characteristic factors; instead, we focus on some distinctive properties that correlation sequences as in (1) satisfy (see Theorem 1.3). The idea that starts the proof comes from answering the following natural question: “Can a multiple correlation sequence as in (1) be asymptotically orthogonal to all \((\ell -1)\)-step nilsequences?”

On the one hand, using an inverse theorem of Host and Kra (see Theorem 2.1), one gets that any such sequence has to be \(U_{\ell }\)-uniform. On the other hand, by successively applying van der Corput’s lemma one sees that a sequence of the form (1) is asymptotically orthogonal to all \(U_{\ell }\)-uniform sequences. Hence, any sequence that provides a positive answer to our question has to be asymptotically orthogonal to itself, that is, has to converge to \(0\) in density.

With this idea in mind, we prove our main result as follows: Given a sequence \((a(n))\) as in (1), we consider the \((\ell -1)\)-step nilsequence, call it \(a_{st}\), that lies “closest” to \((a(n))\) with respect to the semi-norm \(\left\| \cdot \right\| _2\) defined in (3). Then \(a_{er}:=a-a_{st}\) is asymptotically orthogonal to all \((\ell -1)\)-step nilsequences, and arguing as before, we get that \(a_{st}\) and \(a_{er}\) have the asserted properties. A slight complication appears because for \(\ell \ge 2\) the space of \((\ell -1)\)-step nilsequences (or uniform limits of such sequences) is not \(\left\| \cdot \right\| _2\)-complete; this is the reason why we are led to an error term \(a_{er}\) that is small, but not zero, in uniform density. For our argument to work we also have to make sure that various limits of uniform Cesàro averages exist; to guarantee this, we use a result of Austin [1].

Using a variant of the previous argument and a result of Walsh [19] we get:

Theorem 1.2

Let \(\ell ,m\in {\mathbb N}\) and \(p_{i,j}\in {\mathbb Z}[t]\), \(i=1,\ldots , \ell , j=1,\ldots , m\), be polynomials. Then there exists \(k\in {\mathbb N}\), \(k=k(\ell ,m,\max {\deg (p_{i,j})})\), such that for every system \((X,\mathcal {X},\mu , T_1,\ldots , T_\ell )\), functions \(f_1,\ldots ,f_m\in L^{\infty }(\mu )\) with \(\left\| f_i\right\| _\infty \le 1\), and \(\varepsilon >0\), we have the decomposition

$$\begin{aligned} \int \left( \prod _{i=1}^\ell T_i^{p_{i,1}(n)}\right) f_1\cdot \ldots \cdot \left( \prod _{i=1}^\ell T_i^{p_{i,m}(n)}\right) f_m\, d\mu =a_{st}(n)+a_{er}(n), \quad n\in {\mathbb N}, \end{aligned}$$
(2)

where

  1. (i)

    \((a_{st}(n))\) is a \(k\)-step nilsequence with \(\left\| a_{st}\right\| _{\infty }\le 1\);

  2. (ii)

    \(\lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} |a_{er}(n)|^2\le \varepsilon \).

1.2 A more general framework

It turns out that Theorem 1.1 is a manifestation of a more general principle which asserts that if a sequence is asymptotically orthogonal to all \(U_{\ell }\)-uniform sequences and satisfies some necessary regularity conditions, then it admits a decomposition like the one in Theorem 1.1. To make this more precise we introduce some notation (see Sect. 2.1 for the definition of the uniformity seminorms).

Definition

Let \(\ell \in {\mathbb N}\). We say that the bounded sequence \(a:{\mathbb N}\rightarrow {\mathbb C}\) is

  1. (i)

    \(\ell \)-anti-uniform if there exists \(C:=C(\ell ,a)\) such that

    $$\begin{aligned} \limsup _{N-M\rightarrow \infty } \Big |\frac{1}{N-M}\sum _{n=M}^{N-1} a(n)b(n)\Big |\le C \left\| b\right\| _{U_{\ell }(\mathbb {N})} \end{aligned}$$

    for every \(b\in \ell ^{\infty }\).

  2. (ii)

    \(\ell \)-regular if the limit

    $$\begin{aligned} \lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} a(n)\psi (n) \end{aligned}$$

    exists for every \((\ell -1)\)-step nilsequence \((\psi (n))\).

Theorem 1.3

For \(\ell \in {\mathbb N}\) let \(a:{\mathbb N}\rightarrow {\mathbb C}\) be a sequence with \(\left\| a\right\| _\infty \le 1\) that is \(\ell \)-anti-uniform and \(\ell \)-regular. Then for every \(\varepsilon >0\) we have the decomposition

$$\begin{aligned} a(n)=a_{st}(n)+a_{er}(n), \quad n\in {\mathbb N}, \end{aligned}$$

where

  1. (i)

    \((a_{st}(n))\) is an \((\ell -1)\)-step nilsequence with \(\left\| a_{st}\right\| _{\infty }\le 1\);

  2. (ii)

    \(\lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} |a_{er}(n)|^2\le \varepsilon \).

Remark

For general \(\ell \)-regular sequences a similar result is proved in [12, Theorem 2.19] with an error term that is small with respect to the seminorm \(\left\| \cdot \right\| _{U_\ell ({\mathbb N})}\).

A sequence \((a(n))\) that satisfies the asserted decomposition has to be \(\ell \)-regular. It also has to satisfy the estimate defining the \(\ell \)-anti-uniformity property if one introduces an arbitrarily small error term \(\varepsilon \) on the right hand side and allows \(C\) to depend on \(\varepsilon \) (this follows from [12, Theorem 2.14]).

Theorem 1.3 fails if we use standard Cesàro averages to define the notions of anti-uniformity and regularity (and leave the definition of \(\left\| \cdot \right\| _{U_\ell ({\mathbb N})}\) as is); the sequence \((e^{ i \sqrt{n}})\) illustrates this. The same sequence shows that anti-uniformity does not imply regularity (\((e^{ i \sqrt{n}})\) is \(2\)-anti-uniform but not \(1\)-regular).
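The following numerical sketch (ours, not part of the argument) shows how differently the two averaging schemes treat \((e^{i\sqrt{n}})\). The helper name `window_average` and the particular window positions and lengths are illustrative assumptions. The standard Cesàro average over \([1,N]\) is small, while the average over a short window placed far out stays close to \(1\) in modulus, because \(\sqrt{n}\) is almost constant there.

```python
import cmath
import math


def window_average(a, start, length):
    """Average of a(n) over the interval [start, start + length)."""
    return sum(a(n) for n in range(start, start + length)) / length


def a(n):
    # The sequence e^{i sqrt(n)} discussed in the text.
    return cmath.exp(1j * math.sqrt(n))


# Standard Cesaro average over [1, N]: small in modulus.
standard = abs(window_average(a, 1, 200_000))

# Average over a short window far from the origin: the phase sqrt(n)
# barely moves on this window, so the modulus stays close to 1.
far_window = abs(window_average(a, 10**10, 1_000))

print(standard, far_window)
```

This only compares finitely many averages and so proves nothing by itself; it is meant to make plausible why uniform averages of this sequence do not converge.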

1.3 Applications

On \(\ell ^\infty ({\mathbb N})\) we define the seminorm \(\left\| \cdot \right\| _2\) by

$$\begin{aligned} \left\| a\right\| _2^2:=\limsup _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1}|a(n)|^2. \end{aligned}$$
(3)

For \(\ell \in {\mathbb N}\) we consider the following subspaces of \(\ell ^\infty ({\mathbb N})\):

$$\begin{aligned} \mathcal {A}_\ell :=\Big \{(\psi (n)) :\psi \text { is an } (\ell -1)\text {-step nilsequence}\Big \}; \end{aligned}$$
$$\begin{aligned} \mathcal {B}_\ell&:= \Big \{\Big (\int T^{k_1n}f_1\cdot \ldots \cdot T^{k_{\ell } n}f_\ell \, d\mu \Big ):(X,\mathcal {X}, \mu ,T) \text { is a system},\\&f_i\in L^\infty (\mu ), k_i=\ell !/i \Big \}; \end{aligned}$$
$$\begin{aligned} \mathcal {C}_\ell&:= \Big \{\Big (\int T_1^nf_1\cdot \ldots \cdot T_{\ell }^{n}f_\ell \, d\mu \Big ):(X,\mathcal {X}, \mu , T_1,\ldots , T_\ell ) \text { is a system and } \\&f_i\in L^\infty (\mu )\Big \}. \end{aligned}$$

After Proposition 2.4 we explain why in the definition of \(\mathcal {B}_\ell \) we use the exponents \(k_1,\ldots , k_\ell \) instead of \(1,\ldots , \ell \). The space \(\mathcal {A}_\ell \) is linear since if for \(i=1,2\), \((F_i(g_i^n\Gamma _i))\) are \((\ell -1)\)-step nilsequences on \(G_i/\Gamma _i\), then their sum is the \((\ell -1)\)-step nilsequence \((F(g^n\Gamma ))\) on \(G/\Gamma \), where \(G=G_1\times G_2\), \(\Gamma :=\Gamma _1\times \Gamma _2\), \(g:=(g_1,g_2)\), \(F(g\Gamma ):=F_1(g_1\Gamma _1)+F_2(g_2\Gamma _2)\). To see that the space \(\mathcal {C}_\ell \) is linear (similarly for \(\mathcal {B}_\ell \)), let \(a,b\in \mathcal {C}_\ell \) be defined by the systems \((X_i, \mathcal {X}_i, \mu _i, T_i)\) and the functions \(f^i_1,\ldots , f^i_\ell \), \(i=1,2\). Then \(c:=(a+b)/2\) is also a multiple correlation sequence defined by the system \((X, \mathcal {X}, \mu , T)\), where \(X=X_1\cup X_2\) (considered as disjoint subsets) with the corresponding \(\sigma \)-algebra \(\mathcal {X}\), \(\mu :=(\mu _1+\mu _2)/2\), \(T\) equals \(T_1\) on \(X_1\) and \(T_2\) on \(X_2\), and \(f_i:=f^1_i\mathbf{1}_{X_1}+f^2_i\mathbf{1}_{X_2}\), \(i=1,\ldots , \ell \).
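The claim about \(c=(a+b)/2\) can be verified directly. In the following sketch we write (notation introduced only here) \(S^{(i)}_1,\ldots , S^{(i)}_\ell \) for the commuting transformations of the \(i\)-th system and \(S_j\) for the transformation of \(X\) that equals \(S^{(i)}_j\) on \(X_i\). Since \(X_1\) and \(X_2\) are invariant under every \(S_j\) and \(\mu =(\mu _1+\mu _2)/2\), we get

$$\begin{aligned} \int _X S_1^nf_1\cdot \ldots \cdot S_\ell ^nf_\ell \, d\mu&=\frac{1}{2}\int _{X_1} (S^{(1)}_1)^nf^1_1\cdot \ldots \cdot (S^{(1)}_\ell )^nf^1_\ell \, d\mu _1 +\frac{1}{2}\int _{X_2} (S^{(2)}_1)^nf^2_1\cdot \ldots \cdot (S^{(2)}_\ell )^nf^2_\ell \, d\mu _2\\&=\frac{a(n)+b(n)}{2}. \end{aligned}$$

Replacing \(f_1\) by \(2f_1\) then shows that \(a+b\) itself belongs to \(\mathcal {C}_\ell \); closure under scalar multiples is obtained in the same way.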

It is a rather striking fact that, modulo sequences that are small in uniform density, the three subspaces \(\mathcal {A}_\ell \), \(\mathcal {B}_\ell \), \(\mathcal {C}_\ell \) coincide.

Theorem 1.4

For every \(\ell \in {\mathbb N}\) we have

$$\begin{aligned} \overline{\mathcal {A}_{\ell }}=\overline{\mathcal {B}_\ell }=\overline{\mathcal {C}_\ell } \end{aligned}$$

where the closure is taken with respect to the seminorm \(\left\| \cdot \right\| _2\) defined in (3).

It is not hard to see that the first equality fails if we consider closures with respect to the \(\left\| \cdot \right\| _\infty \) norm. The second equality may still hold under such circumstances but this is not something we can prove with the methods developed so far.

The next two results illustrate some rather surprising principles: \((i)\) convergence results for actions of a single transformation automatically imply stronger convergence results for actions of commuting transformations; and \((ii)\) convergence results involving linear iterates automatically imply stronger convergence results involving polynomial iterates.

Theorem 1.5

Let \((r_n)\) be a strictly increasing sequence of integers such that \(r_n=O(n)\). Then for every \(\ell \in {\mathbb N}\) the following statements are equivalent:

  1. (i)

    For every \((\ell \!-\!1)\)-step nilsequence \((\psi (n))\) the limit \(\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n\!=\!1}^{N}\psi (r_n)\) exists.

  2. (ii)

    For every system \((X,\mathcal {X},\mu , T)\), functions \(f_1,\ldots , f_\ell \in L^\infty (\mu )\), and for \(k_i=\ell !/i\), \(i=1,\ldots , \ell \), the following limit exists

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=1}^{N} \int T^{k_1r_n}f_1 \cdot \ldots \cdot T^{k_\ell r_n}f_\ell \, d\mu . \end{aligned}$$
  3. (iii)

    For every system \((X,\mathcal {X},\mu , T_1,\ldots , T_\ell )\) and functions \(f_1,\ldots , f_\ell \in L^\infty (\mu )\) the following limit exists

    $$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N}\sum _{n=1}^{N} \int T_1^{r_n}f_1\cdot \ldots \cdot T_{\ell }^{r_n} f_\ell \, d\mu . \end{aligned}$$

Remark

Equivalently, the growth condition \(r_n=O(n)\) holds if and only if the set \(R:=\{r_1,r_2,\ldots \}\) has positive lower natural density.

In the previous result we established an equivalence for every fixed \(\ell \in {\mathbb N}\); in the next result we have to assume that a certain property is known for every \(\ell \in {\mathbb N}\) in order to establish an equivalence [this is needed for the equivalence of (ii) and (iii)].

Theorem 1.6

Let \((r_n)\) be a strictly increasing sequence of integers such that \(r_n=O(n)\). Then the following statements are equivalent:

  1. (i)

    For every \(\ell \in {\mathbb N}\) and \(\ell \)-step nilsequence \((\psi (n))\) the limit \(\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=1}^{N}\psi (r_n)\) exists.

  2. (ii)

    For every \(\ell \in {\mathbb N}\), system \((X,\mathcal {X},\mu , T)\), and functions \(f_1,\ldots , f_\ell \in L^\infty (\mu )\), the following limit exists

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=1}^{N} \int T^{r_n}f_1\cdot \ldots \cdot T^{\ell r_n}f_\ell \, d\mu . \end{aligned}$$
  3. (iii)

    For every \(\ell \in {\mathbb N}\), polynomials \(p_1,\ldots , p_\ell \in {\mathbb Z}[t]\), system \((X,\mathcal {X},\mu , T_1,\ldots , T_\ell )\), and functions \(f_1,\ldots , f_\ell \in L^\infty (\mu )\), the following limit exists

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=1}^{N} \int T_1^{p_1(r_n)}f_1\cdot \ldots \cdot T_{\ell }^{p_\ell (r_n)} f_\ell \, d\mu . \end{aligned}$$

Similar results hold if in (i)–(iii) of Theorems 1.5 and 1.6 one replaces the limit \(\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=1}^{N}\) with the limit \(\lim _{N-M\rightarrow \infty }\frac{1}{N-M}\sum _{n=M}^{N-1}\) and the growth assumption on \((r_n)\) with the assumption that the range of this sequence has positive lower Banach density. Furthermore, the same method can be used to prove convergence criteria for weighted averages where for a given bounded sequence of complex numbers \((w_n)\) one replaces in (i)–(iii) of Theorems 1.5 and 1.6 the averaging operation \(\frac{1}{N}\sum _{n=1}^{N}\) with the averaging operation \(\frac{1}{N}\sum _{n=1}^{N} w_n\).

1.4 Conjectures

The growth assumption on \((r_n)\) in Theorems 1.5 and 1.6 is crucial for our argument, as the proofs use Theorem 1.1, which is not helpful for sequences that grow faster than linearly. Nevertheless, we believe that the following is true:

Conjecture 1

In Theorems 1.5 and 1.6 the growth assumption on \((r_n)\) is superfluous.

We also believe in the following strengthening of the second identity in Theorem 1.4:

Conjecture 2

For every \(\ell \in \mathbb {N}\) we have \(\overline{\mathcal {B}_\ell }=\overline{\mathcal {C}_\ell }\) where the closure is taken with respect to the norm \(\left\| \cdot \right\| _\infty \).

1.5 Notation

We denote by \({\mathbb N}\) the set of positive integers.

If \((a(n))\) is a bounded sequence we denote by \(\limsup _{N-M\rightarrow \infty } |\frac{1}{N-M}\sum _{n=M}^{N-1}a(n)|\) the limit (it exists by subadditivity) \( \lim _{N\rightarrow \infty } \sup _{M\in {\mathbb N}} \Big |\!\frac{1}{N}\sum _{n=M}^{M+N-1}\!a(n)\Big |. \)
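To make the notation concrete, here is a small sketch (ours) that evaluates a finite truncation of the quantity \(\sup _{M}|\frac{1}{N}\sum _{n=M}^{M+N-1}a(n)|\). The function name `uniform_average_sup` and the cutoff `M_max` are assumptions made for illustration; the actual definition takes the supremum over all \(M\in {\mathbb N}\) and then the limit as \(N\rightarrow \infty \).

```python
from itertools import accumulate


def uniform_average_sup(a, N, M_max):
    """Max over 1 <= M <= M_max of |(1/N) * sum_{n=M}^{M+N-1} a(n)|."""
    # prefix[k] = a(1) + ... + a(k), with prefix[0] = 0.
    prefix = [0] + list(accumulate(a(n) for n in range(1, M_max + N)))
    return max(
        abs(prefix[M + N - 1] - prefix[M - 1]) / N
        for M in range(1, M_max + 1)
    )


# For a(n) = (-1)^n every window of odd length N sums to +1 or -1,
# so the truncated supremum equals 1/N.
print(uniform_average_sup(lambda n: (-1) ** n, 101, 500))
```

Prefix sums are used so that each window average costs constant time once the sequence has been tabulated.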

2 Proofs of results

2.1 Uniformity seminorms and the Host-Kra inverse theorem

We give a slight variant of the uniformity seminorms defined by Host and Kra [12].

Definition

Let \(\ell \in {\mathbb N}\) and \(a:{\mathbb N}\rightarrow {\mathbb C}\) be a bounded sequence.

  1. (i)

    Given a sequence of intervals \(\mathbf{I}=(I_N)\) with lengths tending to infinity, we say that the sequence \((a(n))\) is distributed regularly along \(\mathbf{I}\) if the limit

    $$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{|I_N|}\sum _{n\in I_N} a_1(n+h_1)\cdot \ldots \cdot a_r(n+h_r) \end{aligned}$$

    exists for every \(r\in {\mathbb N}\) and \(h_1,\ldots , h_r\in {\mathbb N}\), where \(a_i\) is either \(a\) or \(\bar{a}\).

  2. (ii)

    If \(\mathbf{I}\) is as in (i) and \((a(n))\) is distributed regularly along \(\mathbf{I}, \) we define inductively

    $$\begin{aligned} \left\| a\right\| _{\mathbf{I}, 1}:= \lim _{N\rightarrow \infty } \Big |\frac{1}{|I_N|}\sum _{n\in I_N} a(n)\Big |; \end{aligned}$$

    and for \(\ell \ge 2\) (one can show as in [12, Proposition 4.3] that the next limit exists)

    $$\begin{aligned} \left\| a\right\| _{\mathbf{I}, \ell }^{2^{\ell }} :=\lim _{H\rightarrow \infty } \frac{1}{H}\sum _{h=1}^H \left\| \sigma _ha\cdot \bar{a}\right\| ^{2^{\ell -1}}_{\mathbf{I}, \ell -1} \end{aligned}$$

    where \(\sigma _h\) is the shift transformation defined by \((\sigma _ha)(n):=a(n+h)\).

  3. (iii)

    If \((a(n))\) is a bounded sequence we let

    $$\begin{aligned} \left\| a\right\| _{U_\ell ({\mathbb N})}:=\sup _{\mathbf{I}}\left\| a\right\| _{\mathbf{I}, \ell } \end{aligned}$$

    where the sup is taken over all sequences of intervals \(\mathbf{I}\) with lengths tending to infinity along which the sequence \((a(n))\) is distributed regularly.
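As a quick sanity check of these definitions (a worked example of ours, not taken from [12]), let \(a(n):=e^{2\pi i n\alpha }\) with \(\alpha \) irrational. This sequence is distributed regularly along every \(\mathbf{I}\), and summing the geometric series gives

$$\begin{aligned} \left\| a\right\| _{\mathbf{I}, 1}=\lim _{N\rightarrow \infty } \Big |\frac{1}{|I_N|}\sum _{n\in I_N} e^{2\pi i n\alpha }\Big |\le \lim _{N\rightarrow \infty }\frac{1}{|I_N|\, |\sin (\pi \alpha )|}=0. \end{aligned}$$

On the other hand, \((\sigma _ha\cdot \bar{a})(n)=e^{2\pi i h\alpha }\) is constant in \(n\), so \(\left\| \sigma _ha\cdot \bar{a}\right\| _{\mathbf{I}, 1}=1\) for every \(h\in {\mathbb N}\). Every term in the average defining \(\left\| a\right\| _{\mathbf{I}, 2}^{4}\) is therefore equal to \(1\), and we get

$$\begin{aligned} \left\| a\right\| _{U_1({\mathbb N})}=0, \quad \left\| a\right\| _{U_2({\mathbb N})}=1. \end{aligned}$$

In other words, a non-trivial \(1\)-step nilsequence is invisible to \(\left\| \cdot \right\| _{U_1({\mathbb N})}\) but is detected by \(\left\| \cdot \right\| _{U_2({\mathbb N})}\).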

An application of Lemma 2.2 shows that \(\left\| a\right\| _{\mathbf{I}, 1}\), as defined here, is smaller than the corresponding quantity defined in [12] (they can be different though). Furthermore, the inductive formula is identical in both cases (see [12, Proposition 4.4]), hence \(\left\| \cdot \right\| _{U_\ell ({\mathbb N})}\), as defined here, is a seminorm that is smaller than the corresponding seminorm defined in [12]. In fact, it can be shown that the two seminorms coincide but we will not need this.

Using the main structural result in [11], Host and Kra proved an inverse theorem that will be a key ingredient in the proof of Theorem 1.3. We state a slight variant of it next ([12, Theorem 2.16] gives a stronger lower bound but it does not allow one to assume that \(\left\| b\right\| _\infty \le 1\)). Its proof amounts to a simple modification of the argument given in [12, Theorem 2.16]; we give the details for completeness.

Theorem 2.1

([12, Theorem 2.16]) Let \(a:{\mathbb N}\rightarrow {\mathbb C}\) be a sequence of complex numbers with \(\left\| a\right\| _\infty \le 1\) and \(\ell \in {\mathbb N}\). Then for every \(\varepsilon >0\) there exists an \((\ell -1)\)-step nilsequence \((b(n))\) with \(\left\| b\right\| _\infty \le 1\) such that

$$\begin{aligned} \limsup _{N-M\rightarrow \infty }\Big |\frac{1}{N-M}\sum _{n=M}^{N-1} a(n)b(n)\Big |\ge \left\| a\right\| _{U_{\ell }({\mathbb N})}^{2^\ell }-\varepsilon . \end{aligned}$$

Remark

It is crucial that the seminorms were defined using uniform and not standard Cesàro averages as in the latter case it is shown in [12, Paragraph 2.4.3] that the corresponding inverse theorem fails. For standard Cesàro averages a finitary inverse theorem was proved in [10] but it is not clear whether it has an infinitary variant that is useful for our purposes.

Proof

We refer the reader to [12] for notation used in this argument. In what follows we assume that the seminorms \(\left\| a\right\| _{\mathbf{I}, \ell }\) are defined as in [12].

Let \(0<\varepsilon <1\). By [12, Proposition 6.2] there exists a sequence of intervals \(\mathbf{I}=(I_N)\) with lengths tending to infinity and an \((\ell -1)\)-step nilsequence \((c(n))\) of the form \(c(n)=F(g^n\Gamma )\), where \(F\) is a continuous function on an \((\ell -1)\)-step nilmanifold \(X=G/\Gamma \) and \(g\in G\) is an element that acts ergodically on \(X\), such that the sequences \(a-c\) and \(a\) satisfy property \(\mathcal {P}(\ell )\) on \(\mathbf{I}\) and moreover we have the estimates

$$\begin{aligned} \left\| a-c\right\| _{\mathbf{I}, \ell }\le \varepsilon , \quad \left\| a\right\| _{\mathbf{I}, \ell }\ge \left\| a\right\| _{U_\ell ({\mathbb N})}-\varepsilon . \end{aligned}$$
(4)

Furthermore, we have \(\left\| F\right\| _\infty \le 1\); this is because in the proof of [12, Proposition 6.2] the function \(F\) is defined as a conditional expectation of a function bounded by \(1\). We let \(b(n):=H(g^n\Gamma )\), where \(H:=\mathcal {D}_\ell F\), and check that the asserted properties are satisfied.

First note that \((b(n))\) is an \((\ell -1)\)-step nilsequence and since \(\left\| F\right\| _\infty \le 1\) we have \(\left\| H\right\| _\infty \le 1\), hence \(\left\| b\right\| _\infty \le 1\). Furthermore, by [12, Corollary 5.3] we have \(H\in C(X)\), hence \(F\cdot H\in C(X)\), and since \(g\) acts ergodically on \(X\) we have

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{|I_N|}\sum _{n\in I_N} c(n)b(n)= \int F\cdot H\, dm_X= \left\| F\right\| _\ell ^{2^\ell }=\left\| c\right\| _{\mathbf{I}, \ell }^{2^\ell } \end{aligned}$$

where we used the identity \(\int F\cdot \mathcal {D}_\ell F\, dm_X=\left\| F\right\| _\ell ^{2^\ell }\) and [12, Corollary 3.11] to justify the last two identities. By (4) and the triangle inequality this is greater than or equal to

$$\begin{aligned} (\left\| a\right\| _{\mathbf{I}, \ell }-\varepsilon )^{2^\ell }\ge (\left\| a\right\| _{U_\ell ({\mathbb N})}-2\varepsilon )^{2^\ell }\ge \left\| a\right\| _{U_\ell ({\mathbb N})}^{2^\ell }-k_\ell \varepsilon \end{aligned}$$

for some positive integer \(k_\ell \). On the other hand, by [12, Theorem 2.13] we have

$$\begin{aligned} \limsup _{N\rightarrow \infty }\Big |\frac{1}{|I_N|}\sum _{n\in I_N}(a(n)-c(n))b(n) \Big |\le \left\| a-c\right\| _{\mathbf{I}, \ell } \left\| b\right\| ^*_{\ell }\le \varepsilon \end{aligned}$$

where we used (4) and that \(\left\| b\right\| ^*_{\ell }=\left\| \mathcal {D}_\ell F\right\| _\ell ^*=\left\| F\right\| _\ell ^{2^\ell -1}\le 1\) (the second identity follows from [12, Equation (14)]). Combining the previous bounds we get the asserted result. \(\square \)

2.2 Proof of Theorem 1.3

Let \(\ell \in {\mathbb N}\) and \((a(n))\) be an \(\ell \)-regular and \(\ell \)-anti-uniform sequence with \(\left\| a\right\| _\infty \le 1\). We first remark that the limit

$$\begin{aligned} \lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} |a(n)|^2 \quad \text { exists}. \end{aligned}$$
(5)

This follows from our anti-uniformity assumption and [12, Theorem 2.19] (it applies since \((a(n))\) is \(\ell \)-regular) which states that for every \(\epsilon >0\) we have a decomposition \(a=a_1+a_2\) where \(a_1\) is an \((\ell -1)\)-step nilsequence and \(\left\| a_2\right\| _{U_\ell ({\mathbb N})}\le \epsilon \). Writing \(|a(n)|^2=a\bar{a}_1+a\bar{a}_2\) one checks the asserted convergence at once.
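The verification of (5) can be spelled out as follows (a sketch of ours, with the notation of the preceding paragraph). Since \(\bar{a}=\bar{a}_1+\bar{a}_2\), we have

$$\begin{aligned} \frac{1}{N-M}\sum _{n=M}^{N-1} |a(n)|^2= \frac{1}{N-M}\sum _{n=M}^{N-1} a(n)\bar{a}_1(n)+\frac{1}{N-M}\sum _{n=M}^{N-1} a(n)\bar{a}_2(n). \end{aligned}$$

The first term converges as \(N-M\rightarrow \infty \), to a limit that we call \(L_\epsilon \), because \(\bar{a}_1\) is again an \((\ell -1)\)-step nilsequence and \(a\) is \(\ell \)-regular. For the second term, anti-uniformity, together with the fact (which one checks from the definition) that conjugation does not change \(\left\| \cdot \right\| _{U_\ell ({\mathbb N})}\), gives

$$\begin{aligned} \limsup _{N-M\rightarrow \infty } \Big |\frac{1}{N-M}\sum _{n=M}^{N-1} a(n)\bar{a}_2(n)\Big |\le C \left\| a_2\right\| _{U_{\ell }({\mathbb N})}\le C\epsilon . \end{aligned}$$

Hence all limit points of the averages of \(|a(n)|^2\) lie within distance \(C\epsilon \) of \(L_\epsilon \), so any two of them differ by at most \(2C\epsilon \). As \(\epsilon \) is arbitrary, the limit in (5) exists.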

We let

$$\begin{aligned} Y:=\Big \{(\psi (n)) :\psi \text { is an } (\ell -1)\text {-step nilsequence}\Big \} \end{aligned}$$

and

$$\begin{aligned} X:=\text {span}\{Y,a\}. \end{aligned}$$

On \(X\times X\) we define the bilinear form

$$\begin{aligned} \langle f, g \rangle :=\lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} f(n)\overline{g}(n). \end{aligned}$$

Note that the limit exists for \(f,g\in X\). This is the case if \(f\) or \(g\) is equal to \(a\) because of our regularity assumption and (5), and when both \(f\) and \(g\) are in \(Y\) because limits of uniform Cesàro averages of nilsequences exist [13, 16]. This bilinear form induces the seminorm

$$\begin{aligned} \left\| f\right\| _2:=\sqrt{\langle f, f \rangle }. \end{aligned}$$

This is the restriction on \(X\) of the seminorm (3) defined on \(\ell ^\infty ({\mathbb N})\).

Let \(\varepsilon >0\). There exists \(y_0\in Y\) such that

$$\begin{aligned} \left\| a-y_0\right\| _2^2\le d^2+\delta ^2 \end{aligned}$$
(6)

where

$$\begin{aligned} d:=\inf \{ \left\| a-y\right\| _2:y\in Y\}, \quad \delta :=(\varepsilon /(4C))^{2^\ell }, \end{aligned}$$
(7)

and \(C:=C(\ell ,a)\) is the constant determined by our \(\ell \)-anti-uniformity assumption on \(a\). We can assume that \(C\ge 1\). Furthermore, we can assume without loss of generality that

$$\begin{aligned} \left\| y_0\right\| _\infty \le 1. \end{aligned}$$
(8)

Indeed, let \(y_0:=(F(g^n\Gamma ))\) where \(X=G/\Gamma \) is a nilmanifold, \(g\in G\), and \(F\in C(X)\). Then the sequence \(\tilde{y}_0:=(\tilde{F}(g^n\Gamma ))\), where \(\tilde{F}:=F\cdot \mathbf {1}_{|F|\le 1}+e^{2\pi i \arg (F)}\cdot \mathbf {1}_{|F|> 1} \in C(X)\), is a nilsequence, \(\left\| \tilde{y}_0\right\| _\infty \le 1\), and as \(\left\| a\right\| _\infty \le 1\) we get that \(|a(n)-\tilde{y}_0(n)|\le |a(n)-y_0(n)|\) for every \(n\in {\mathbb N}\), hence \(\left\| a-\tilde{y}_0\right\| _2\le \left\| a-y_0\right\| _2\).

It follows from (6) that for every \(y\in Y\) we have

$$\begin{aligned} -\delta ^2\le \left\| a-(y_0+\delta y)\right\| _2^2-\left\| a-y_0\right\| _2^2=-2\delta \text {Re}(\langle a-y_0,y\rangle )+\delta ^2\left\| y\right\| _2^2. \end{aligned}$$

Hence,

$$\begin{aligned} \text {Re}(\langle a-y_0,y\rangle ) \le \delta \ \text { for every } y\in Y \text { with } \left\| y\right\| _2\le 1. \end{aligned}$$
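For the reader's convenience, here is how the last two displays can be justified (a sketch of ours that uses only (6) and (7)). Since \(y_0+\delta y\in Y\), the definition of \(d\) and (6) give

$$\begin{aligned} \left\| a-(y_0+\delta y)\right\| _2^2\ge d^2\ge \left\| a-y_0\right\| _2^2-\delta ^2, \end{aligned}$$

which is the lower bound \(-\delta ^2\). Expanding the square we get

$$\begin{aligned} \left\| (a-y_0)-\delta y\right\| _2^2=\left\| a-y_0\right\| _2^2-2\delta \text {Re}(\langle a-y_0,y\rangle )+\delta ^2\left\| y\right\| _2^2. \end{aligned}$$

Combining the two, for \(\left\| y\right\| _2\le 1\) we obtain

$$\begin{aligned} 2\delta \text {Re}(\langle a-y_0,y\rangle )\le \delta ^2+\delta ^2\left\| y\right\| _2^2\le 2\delta ^2, \end{aligned}$$

and dividing by \(2\delta \) gives the asserted bound.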

Inserting \(-y\) and \(\pm i y\) in place of \(y\) we deduce that

$$\begin{aligned} \sup _{y \in Y:\left\| y\right\| _2\le 1} |\langle a-y_0,y\rangle | \le 2\delta . \end{aligned}$$
(9)

Since the set \(\{y\in Y:\left\| y\right\| _2\le 1\}\) contains all \((\ell -1)\)-step nilsequences that are bounded by \(1\), we deduce from Theorem 2.1 that

$$\begin{aligned} \left\| a-y_0\right\| _{U_{\ell }({\mathbb N})}\le (2\delta )^{2^{-\ell }}. \end{aligned}$$
(10)

We let

$$\begin{aligned} a_{st}:=y_0, \quad a_{er}:=a-y_0. \end{aligned}$$

Then

$$\begin{aligned} a=a_{st}+a_{er} \end{aligned}$$

and \((a_{st}(n))\) is an \((\ell -1)\)-step nilsequence with \(\left\| a_{st}\right\| _\infty \le 1\) by (8). Since \(a\) is \(\ell \)-anti-uniform we get using (10) and the definition of \(\delta \) in (7) that

$$\begin{aligned} |\langle a,a_{er}\rangle |\le C\left\| a_{er}\right\| _{U_{\ell }({\mathbb N})}\le \varepsilon /2. \end{aligned}$$

Furthermore, (9) gives

$$\begin{aligned} |\langle a_{st},a_{er}\rangle |\le \varepsilon /2. \end{aligned}$$

Combining the last two estimates we deduce that

$$\begin{aligned} \left\| a_{er}\right\| _2^2 =\langle a_{er},a_{er}\rangle \le |\langle a,a_{er}\rangle |+ |\langle a_{st},a_{er}\rangle | \le \varepsilon . \end{aligned}$$
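The constants in the last three displays can be checked as follows (a sketch of ours, assuming, as we may, that \(\varepsilon \le 1\)). By (10) and the choice of \(\delta \) in (7),

$$\begin{aligned} C\left\| a_{er}\right\| _{U_{\ell }({\mathbb N})}\le C(2\delta )^{2^{-\ell }}=C\cdot 2^{2^{-\ell }}\cdot \frac{\varepsilon }{4C}\le \frac{\varepsilon }{2}. \end{aligned}$$

Moreover, \(\left\| y_0\right\| _2\le \left\| y_0\right\| _\infty \le 1\) by (8), and \(\delta \le \varepsilon /(4C)\le \varepsilon /4\) because \(\varepsilon /(4C)\le 1\) and \(C\ge 1\), so (9) applied with \(y=y_0\) gives

$$\begin{aligned} |\langle a_{st},a_{er}\rangle |=|\langle a-y_0,y_0\rangle |\le 2\delta \le \frac{\varepsilon }{2}. \end{aligned}$$

Finally, the identity \(\langle a_{er},a_{er}\rangle =\langle a,a_{er}\rangle -\langle a_{st},a_{er}\rangle \) and the triangle inequality give the bound on \(\left\| a_{er}\right\| _2^2\).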

This completes the proof of Theorem 1.3.

2.3 Proof of Theorem 1.1

In view of Theorem 1.3, it suffices to prove that for every \(\ell \in {\mathbb N}\) the sequence \(a:{\mathbb N}\rightarrow {\mathbb C}\) defined by

$$\begin{aligned} a(n):=\int T_1^{n}f_1\cdot \ldots \cdot T_\ell ^{n}f_\ell \ d\mu , \quad n\in {\mathbb N}, \end{aligned}$$
(11)

is \(\ell \)-anti-uniform and \(\ell \)-regular.

2.3.1 Anti-uniformity

Throughout, we can and will assume that \(\left\| f_i\right\| _\infty \le 1\) for \(i=1,\ldots , \ell \). The \(\ell \)-anti-uniformity follows by successive applications of the following Hilbert space variant of van der Corput’s estimate (for a proof see [4]).

Lemma 2.2

Let \((v_n)\) be a bounded sequence of vectors in an inner product space and \((I_N)\) be a sequence of intervals with lengths tending to infinity. Then

$$\begin{aligned} \limsup _{N\rightarrow \infty } \left\| \frac{1}{|I_N|}\sum _{n\in I_N} v_n\right\| ^2\le 4 \ \! \limsup _{H\rightarrow \infty } \frac{1}{H}\sum _{h=1}^H \limsup _{N\rightarrow \infty }\Big | \frac{1}{|I_N|}\sum _{n\in I_N} \langle v_{n+h},v_{n}\rangle \Big |. \end{aligned}$$
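A scalar toy case may help to see the mechanism behind this estimate. The sketch below (ours; the choice \(\alpha =\sqrt{2}\), the value of `N`, and the function names are illustrative assumptions) takes \(v_n=e^{2\pi i n^2\alpha }\), for which \(\langle v_{n+h},v_n\rangle =e^{2\pi i (2nh+h^2)\alpha }\) is a linear exponential in \(n\). The averaged correlations are tiny for each fixed \(h\ge 1\), and, in line with the lemma, so is the average of \(v_n\).

```python
import cmath
import math

ALPHA = math.sqrt(2)  # an irrational frequency, chosen only for illustration


def v(n):
    # Scalar "vectors" v_n = e^{2 pi i n^2 alpha}; the inner product of two
    # scalars z, w is z * conj(w).
    return cmath.exp(2j * math.pi * ((n * n * ALPHA) % 1.0))


def average(N):
    """|(1/N) * sum_{n=1}^{N} v_n|."""
    return abs(sum(v(n) for n in range(1, N + 1)) / N)


def correlation(h, N):
    """|(1/N) * sum_{n=1}^{N} <v_{n+h}, v_n>|."""
    return abs(sum(v(n + h) * v(n).conjugate() for n in range(1, N + 1)) / N)


N = 50_000
print([correlation(h, N) for h in range(1, 6)], average(N))
```

The computation uses the interval \([1,N]\) only, whereas the lemma allows arbitrary intervals \(I_N\) and vectors in any inner product space; the sketch is a finite illustration, not a verification.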

It suffices to show that for every \(\ell \in {\mathbb N}\) and every sequence of intervals \(\mathbf{I}:=(I_N)\) with lengths tending to infinity, any sequence \((a(n))\) given by (11) satisfies the estimate

$$\begin{aligned} \limsup _{N\rightarrow \infty } \Big |\frac{1}{|I_N|}\sum _{n\in I_N} a(n)b(n)\Big |\le 4 \left\| b\right\| _{U_{\ell }({\mathbb N})} \end{aligned}$$

for every \(b\in \ell ^{\infty }({\mathbb N})\). Using a diagonal argument and passing to a subsequence of \((I_N)\) (if necessary) we can and will assume that the sequence \((b(n))\) is distributed regularly along the sequence \(\mathbf{I}\). It suffices to establish that for any sequence \((a(n))\) as in (11) which is bounded by \(1\) and any \(b\in \ell ^\infty ({\mathbb N})\) which is distributed regularly along a sequence of intervals \(\mathbf{I}\), we have

$$\begin{aligned} \limsup _{N\rightarrow \infty } \Big |\frac{1}{|I_N|}\sum _{n\in I_N} a(n)b(n)\Big |\le 4 \left\| b\right\| _{\mathbf{I}, \ell }. \end{aligned}$$
(12)

We prove this by induction on \(\ell \). For \(\ell =1\) the result holds trivially. Suppose that \(\ell \ge 2\) and the statement holds for \(\ell -1\).
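The base case can be checked as follows (a short verification of ours). For \(\ell =1\), since \(T_1\) preserves \(\mu \), the sequence in (11) is constant:

$$\begin{aligned} a(n)=\int T_1^{n}f_1\, d\mu =\int f_1\, d\mu , \quad n\in {\mathbb N}. \end{aligned}$$

As \(|\int f_1\, d\mu |\le \left\| f_1\right\| _\infty \le 1\), we get

$$\begin{aligned} \limsup _{N\rightarrow \infty } \Big |\frac{1}{|I_N|}\sum _{n\in I_N} a(n)b(n)\Big |=\Big |\int f_1\, d\mu \Big |\cdot \lim _{N\rightarrow \infty } \Big |\frac{1}{|I_N|}\sum _{n\in I_N} b(n)\Big |\le \left\| b\right\| _{\mathbf{I}, 1}, \end{aligned}$$

which is stronger than (12) for \(\ell =1\).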

We compose with \(T_\ell ^{-n}\), use the Cauchy-Schwarz inequality, and then Lemma 2.2 (on the space \(L^2(\mu )\)) for the sequence

$$\begin{aligned} v_n := b(n)\cdot \tilde{T}_1^nf_1\cdot \tilde{T}_2^nf_2\cdot \ldots \cdot \tilde{T}_{\ell -1}^nf_{\ell -1}, \quad n\in {\mathbb N}, \end{aligned}$$

where \(\tilde{T}_i:=T_iT_\ell ^{-1}\) for \(i=1,\ldots ,\ell -1\). We deduce that the square of the left hand side in (12) is bounded by

$$\begin{aligned} \limsup _{N\rightarrow \infty }\Bigl ||\frac{1}{|I_N|}\sum _{n\in I_N}v_{n}\Bigr ||_{L^2(\mu )}^2\!\le \! 4 \limsup _{H\rightarrow \infty }\frac{1}{H}\sum _{h=1}^{H} \limsup _{N\rightarrow \infty }\Big | \frac{1}{|I_N|}\sum _{n\in I_N} \langle { v_{n+h}, v_n \rangle }\Big |. \end{aligned}$$
(13)

A simple computation gives that

$$\begin{aligned} \frac{1}{|I_N|}\sum _{n\in I_N} \langle { v_{n+h}, v_n \rangle } \!=\! \frac{1}{|I_N|}\sum _{n\in I_N} b(n\!+\!h)\cdot \bar{b}(n) \!\int \tilde{T}_1^n \tilde{f}_{1,h} \cdot \ldots \cdot \tilde{T}_{\ell \!-\!1}^n \tilde{f}_{\ell \!-\!1,h}\,d\mu \end{aligned}$$

where \(\tilde{f}_{j,h}=\tilde{T}_{j}^hf_{j}\cdot \bar{f}_{j}\) for \(j=1,\ldots , \ell -1\). Note that the maps \(\tilde{T}_1, \ldots , \tilde{T}_{\ell -1}\) commute, for \(h\in {\mathbb N}\) the sequence \((b(n+h) \bar{b}(n))\) is distributed regularly along \(\mathbf{I}\), and \(\left\| \tilde{f}_{j,h}\right\| _\infty \le 1\) for \(j=1,\ldots , \ell -1\). Using the induction hypothesis and the defining property of the seminorms we can bound the right hand side in (13) by \(16\) times

$$\begin{aligned} \lim _{H\rightarrow \infty } \frac{1}{H}\sum _{h=1}^{H}\left\| \sigma _hb\cdot \bar{b}\right\| _{\mathbf{I}, \ell -1 } \le \lim _{H\rightarrow \infty } \left( \frac{1}{H} \sum _{h=1}^{H}\left\| \sigma _hb\cdot \bar{b}\right\| _{\mathbf{I}, \ell -1 }^{2^{\ell -1}}\right) ^{1/2^{\ell -1}} =\left\| b\right\| _{\mathbf{I}, \ell }^2 \end{aligned}$$

where \((\sigma _h b)(n):=b(n+h)\). Taking square roots we get the asserted estimate.
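
For the reader's convenience, the "simple computation" behind the identity displayed after (13) can be spelled out as follows. This is only a sketch: it assumes the convention \(\langle u, v\rangle =\int u\cdot \bar{v}\, d\mu \) for the inner product on \(L^2(\mu )\), and it uses that each \(\tilde{T}_j^n\) acts multiplicatively on functions. For fixed \(n,h\in {\mathbb N}\) we have

$$\begin{aligned} \langle v_{n+h}, v_n \rangle&= b(n+h)\cdot \bar{b}(n) \int \prod _{j=1}^{\ell -1} \tilde{T}_j^{n+h}f_j\cdot \overline{\tilde{T}_j^{n}f_j}\, d\mu \\&= b(n+h)\cdot \bar{b}(n) \int \prod _{j=1}^{\ell -1} \tilde{T}_j^{n}\bigl (\tilde{T}_j^{h}f_j\cdot \bar{f}_j\bigr )\, d\mu \\&= b(n+h)\cdot \bar{b}(n) \int \prod _{j=1}^{\ell -1} \tilde{T}_j^{n}\tilde{f}_{j,h}\, d\mu . \end{aligned}$$

Averaging over \(n\in I_N\) then gives the stated identity.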

2.3.2 Regularity

Let \(\ell \in {\mathbb N}\). To prove that \((a(n))\) is \(\ell \)-regular we will use a known mean convergence result for multiple ergodic averages and Proposition 2.4 below. We start with the following result of Green and Tao:

Lemma 2.3

([9, Lemma 14.2]) For \(\ell \in {\mathbb N}\) let \(X=G/\Gamma \) be an \((\ell -1)\)-step nilmanifold. Then there exists a continuous map \(P:X^{\ell }\rightarrow X\) such that

$$\begin{aligned} P(hg\Gamma , h^2g\Gamma , \ldots , h^{\ell } g\Gamma )=g\Gamma , \quad \text { for every } g,h\in G. \end{aligned}$$
(14)

The result in [9, Lemma 14.2] gives \(P(g\Gamma ,hg\Gamma , h^2g\Gamma , \ldots , h^{\ell -1} g\Gamma )=h^{\ell } g\Gamma \). Inserting \(h^{-\ell }g \) in place of \(g\), then \(h^{-1}\) in place of \(h\), and rearranging coordinates, we get (14).
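
The two substitutions can be traced step by step. In the sketch below, \(P_0\) is notation introduced here (it does not appear in [9]) for the map provided by [9, Lemma 14.2], so that \(P_0(g\Gamma ,hg\Gamma , \ldots , h^{\ell -1} g\Gamma )=h^{\ell } g\Gamma \) for every \(g,h\in G\):

$$\begin{aligned} g\mapsto h^{-\ell }g:&\quad P_0(h^{-\ell }g\Gamma , h^{-(\ell -1)}g\Gamma , \ldots , h^{-1} g\Gamma )=g\Gamma , \\ h\mapsto h^{-1}:&\quad P_0(h^{\ell }g\Gamma , h^{\ell -1}g\Gamma , \ldots , h g\Gamma )=g\Gamma . \end{aligned}$$

Reversing the order of the coordinates, that is, setting \(P(x_1,\ldots , x_\ell ):=P_0(x_\ell ,\ldots , x_1)\), gives a continuous map satisfying (14).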

Proposition 2.4

For \(\ell \in {\mathbb N}\) let \((\psi (n))\) be an \((\ell -1)\)-step nilsequence. Then for every \(\varepsilon >0\) there exists a system \((X,\mathcal {X},\mu ,T)\) and functions \(f_1,\ldots , f_{\ell }\in L^\infty (\mu )\), such that the sequence \((b(n))\), defined by

$$\begin{aligned} b(n):=\int T^{k_1n}f_1 \cdot \ldots \cdot T^{k_{\ell } n}f_{\ell }\ d\mu , \quad n\in {\mathbb N}, \end{aligned}$$
(15)

where \(k_i:=\ell !/i\) for \(i=1,\ldots , \ell \), satisfies

$$\begin{aligned} \left\| \psi -b\right\| _\infty \le \varepsilon . \end{aligned}$$
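
As an illustration of the exponents in (15), included here only as a worked instance of the definition \(k_i:=\ell !/i\), take \(\ell =3\). Each \(k_i\) is an integer because \(i\) divides \(\ell !\) for \(i=1,\ldots , \ell \), and in this case

$$\begin{aligned} k_1=\frac{3!}{1}=6,\quad k_2=\frac{3!}{2}=3,\quad k_3=\frac{3!}{3}=2, \qquad b(n)=\int T^{6n}f_1 \cdot T^{3n}f_2\cdot T^{2n}f_3\ d\mu . \end{aligned}$$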

Remark

To prove a variant of this result that uses the integers \(1, \ldots , \ell \) in place of \(k_1,\ldots , k_\ell \), one would have to prove a non-trivial variant of Lemma 2.3 that establishes in place of (14) the identity \(P(h^{k_1}g\Gamma , h^{k_2}g\Gamma , \ldots , h^{k_\ell } g\Gamma )=g\Gamma \) for every \(g,h\in G\).

Combining [6, Theorem A (ii)] with Proposition 2.4 one deduces that for every bounded generalized polynomial \(p:{\mathbb N}\rightarrow {\mathbb R}\) (see definition in [6]) the sequences \((p(n))\) and \((e^{ip(n)})\) can be approximated arbitrarily well in \(\left\| \cdot \right\| _2\) by a sequence of the form (15).

Proof

Let \(\varepsilon >0\) and

$$\begin{aligned} \psi (n):=F(g^n\Gamma ) \end{aligned}$$

where \(F\in C(X)\), \(X=G/\Gamma \) is an \((\ell -1)\)-step nilmanifold, and \(g\in G\).

By [13, Paragraph 1.11] we have that \(X\) is isomorphic to a subnilmanifold of a nilmanifold \(\tilde{X}=\tilde{G}/\tilde{\Gamma }\), where \(\tilde{G}\) is a connected and simply connected \((\ell -1)\)-step nilpotent Lie group, \(\tilde{\Gamma }\) is a discrete cocompact subgroup of \(\tilde{G}\), and all elements of \(G\) are represented in \(\tilde{G}\). Then \(\psi (n)=\tilde{F}(\tilde{b}^n\tilde{\Gamma })\) for some \(\tilde{b}\in \tilde{G}\) and \(\tilde{F}\in C(\tilde{X})\). Hence, in what follows we can and will assume that the group \(G\) is connected.

Using Lemma 2.3 with \(g^n\) in place of \(g\) and \(h:=g^m\), \(m,n\in {\mathbb N}\), we get that there exists a continuous map \(P:X^{\ell }\rightarrow X\) such that

$$\begin{aligned} g^n\Gamma =P(g^{m+n}\Gamma , g^{2m+n}\Gamma ,\ldots , g^{\ell m+n}\Gamma ) \quad \text { for every } m,n\in {\mathbb N}. \end{aligned}$$
(16)

Let \(g_0\in G\) be such that \(g_0^{\ell !}=g\) (such a \(g_0\) exists since \(G\) is connected, hence divisible) and for \(i=1,\ldots , \ell \) let \(g_i:=g_0^i\). Applying (16) with \(g_0\) in place of \(g\) and \(\ell ! n\) in place of \(n\) (we need a multiple of \(n\) that is divisible by each of the coefficients \(1,\ldots , \ell \) of \(m\) appearing in (16)) we get

$$\begin{aligned} \psi (n)\!=\!\!F(g_0^{\ell ! n}\Gamma )\!=\!\tilde{F}(g_1^{m+k_1n}\Gamma , g_2^{m+k_2n}\Gamma ,\ldots , g_{\ell }^{ m+k_{\ell } n}\Gamma ) \quad \text {for every}\, m,n\!\in \! {\mathbb N}, \end{aligned}$$

where \(\tilde{F}:=F\circ P\in C(X^{\ell })\). Averaging over \(m\in {\mathbb N}\) we get

$$\begin{aligned} \psi (n)\!=\!\lim _{M\rightarrow \infty } \frac{1}{M}\sum _{m=1}^M \tilde{F}(g_1^{m+k_1n}\Gamma , g_2^{m+k_2n}\Gamma ,\ldots , g_{\ell }^{ m+k_{\ell } n}\Gamma ) \quad \text {for every} \, n\!\in \! {\mathbb N}. \end{aligned}$$
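
The identity that is being averaged rests on a short piece of exponent bookkeeping, sketched here using only \(g_i=g_0^i\) and \(k_i=\ell !/i\). For \(i=1,\ldots , \ell \) and \(m,n\in {\mathbb N}\) we have

$$\begin{aligned} g_0^{im+\ell ! n}=(g_0^{i})^{m+(\ell !/i) n}=g_i^{m+k_in}, \end{aligned}$$

so (16), applied with \(g_0\) in place of \(g\) and \(\ell ! n\) in place of \(n\), reads

$$\begin{aligned} g^n\Gamma =g_0^{\ell ! n}\Gamma =P(g_0^{m+\ell ! n}\Gamma , g_0^{2m+\ell ! n}\Gamma ,\ldots , g_0^{\ell m+\ell ! n}\Gamma ) =P(g_1^{m+k_1n}\Gamma , g_2^{m+k_2n}\Gamma ,\ldots , g_{\ell }^{m+k_{\ell } n}\Gamma ). \end{aligned}$$

Applying \(F\) to both sides gives the expression for \(\psi (n)\). Since that expression does not depend on \(m\), averaging over \(m\) leaves it unchanged.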

Since \(\tilde{F}\) can be approximated uniformly by linear combinations of functions of the form \(\tilde{f}_1\otimes \cdots \otimes \tilde{f}_{\ell }\), where for \(i=1,\ldots , \ell \) the function \(\tilde{f}_i \in C(X^\ell )\) depends on the coordinate \(x_i\) only, we get that \((\psi (n))\) can be approximated in the \(\left\| \cdot \right\| _\infty \) norm within \(\varepsilon \) by a finite linear combination of sequences \((a(n))\) of the form

$$\begin{aligned} a(n):=\lim _{M\rightarrow \infty } \frac{1}{M}\sum _{m=1}^M \tilde{f}_1(\tilde{g}^{m+k_1n}\tilde{\Gamma })\cdot \tilde{f}_2(\tilde{g}^{m+k_2n}\tilde{\Gamma })\cdot \ldots \cdot \tilde{f}_{\ell }(\tilde{g}^{ m+k_{\ell } n}\tilde{\Gamma }), \quad n\in {\mathbb N}, \end{aligned}$$
(17)

where \(\tilde{X}:=X^{\ell }\), \(\tilde{\Gamma }:=\Gamma \times \cdots \times \Gamma \), \(\tilde{f}_i\in C(\tilde{X})\), and \(\tilde{g}:=(g_1, \ldots , g_{\ell })\). It is known (see [13] for example) that the limit in (17) is equal to

$$\begin{aligned} \int _{\tilde{Y}} \tilde{f}_1(\tilde{g}^{k_1n}\tilde{y})\cdot \tilde{f}_2(\tilde{g}^{k_2n}\tilde{y})\cdot \ldots \cdot \tilde{f}_{\ell }(\tilde{g}^{k_{\ell } n}\tilde{y}) \, dm_{\tilde{Y}}, \quad n\in {\mathbb N}, \end{aligned}$$

where \(\tilde{Y}\) is the subnilmanifold of \(\tilde{X}\) defined by the closure of the set \(\{\tilde{g}^m \tilde{\Gamma }:m\in {\mathbb N}\}\). This proves that the sequence \((a(n))\) has the form (15). Since finite linear combinations of sequences of the form (15) still have the form (15) (see Sect. 1.3) the proof is complete. \(\square \)

We are now ready to verify that if \((a(n))\) is as in (11), then it is \(\ell \)-regular for every \(\ell \in {\mathbb N}\).

By Proposition 2.4, in order to check that the limit \(\lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1}a(n)\psi (n) \) exists for every \((\ell -1)\)-step nilsequence \((\psi (n))\), it suffices to check that the limit

$$\begin{aligned} \lim _{N-M\rightarrow \infty } \frac{1}{N-M}\sum _{n=M}^{N-1} a(n)b(n) \end{aligned}$$
(18)

exists for every sequence \((b(n))\) of the form \(\int S^{k_1n}g_1\cdot \ldots \cdot S^{k_{\ell } n}g_{\ell }\ d\nu \), where \(k_1,\ldots , k_\ell \in {\mathbb N}\), \((Y,\mathcal {Y},\nu , S)\) is a system, and \(g_1,\ldots , g_{\ell }\in L^\infty (\nu )\). This follows from the mean convergence result of Austin [1] (which strengthens the convergence result of Tao [18] to uniform averages) applied to the transformations \(\tilde{T}_i:=T_i\times S^{k_i}\) acting on \(X\times Y\) with the measure \(\tilde{\mu }:=\mu \times \nu \) and the functions \(\tilde{f_i}:=f_i\otimes g_i \in L^\infty (\tilde{\mu })\), \(i=1,\ldots , \ell \).
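
The reduction to a product system can be made explicit. The sketch below assumes that the sequence in (11) has the form \(a(n)=\int T_1^{n}f_1\cdot \ldots \cdot T_\ell ^{n}f_\ell \, d\mu \), which is what the choice of \(\tilde{T}_i\) and \(\tilde{f}_i\) suggests. Under this assumption, Fubini's theorem gives

$$\begin{aligned} a(n)\, b(n)&=\int _X \prod _{i=1}^{\ell } T_i^{n}f_i\, d\mu \cdot \int _Y \prod _{i=1}^{\ell } S^{k_in}g_i\, d\nu \\&=\int _{X\times Y} \prod _{i=1}^{\ell } (T_i\times S^{k_i})^{n}(f_i\otimes g_i)\, d(\mu \times \nu ) =\int \prod _{i=1}^{\ell } \tilde{T}_i^{n}\tilde{f}_i\, d\tilde{\mu }. \end{aligned}$$

The transformations \(\tilde{T}_1,\ldots , \tilde{T}_\ell \) commute because \(T_1,\ldots , T_\ell \) commute and powers of \(S\) commute with each other. Integrating the mean convergent averages of \(\prod _{i=1}^{\ell } \tilde{T}_i^{n}\tilde{f}_i\) with respect to \(\tilde{\mu }\) then yields the existence of the limit (18).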

2.4 Proof of Theorem 1.2

Modulo a known convergence result of Walsh [19], the argument is similar to the one used to prove Theorem 1.1; we explain next the minor modifications needed.

To verify \(k\)-anti-uniformity for some \(k\in {\mathbb N}\) that depends only on \(\ell , m\) and the maximum degree of the polynomials \(p_{i,j}\), one has to make successive uses of Lemma 2.2 and apply an inductive argument, often called PET induction, introduced by V. Bergelson in [4]. The details are very similar to those in the proof of [8, Lemma 3.5] and so we omit them.

To verify regularity, we can argue as in the case of linear iterates, using the convergence result of Walsh [19] for averages of expressions of the form (2). At the very last step one needs to verify that if \((a(n))\) is as in (2), then the limit (18) exists for every sequence \((b(n))\) of the form \(\int S^{k_1n}g_1\cdot \ldots \cdot S^{k_r n}g_r\ d\nu \), where \(r\in {\mathbb N}\) is arbitrary, \(k_1,\ldots , k_r\in {\mathbb N}\), \((Y,\mathcal {Y},\nu , S)\) is a system, and \(g_1,\ldots , g_r\in L^\infty (\nu )\). The only change needed is to use Walsh’s convergence result for the \(\ell +r\) commuting measure preserving transformations \(T_i\times \text {id}\), \(i=1,\ldots , \ell \), and \(\text {id}\times S^{k_j}\), \(j=1,\ldots , r\), acting on \(X\times Y\) with the measure \(\tilde{\mu }:=\mu \times \nu \), and the functions \(f_i\otimes 1\), \(i=1,\ldots , \ell \), and \(1\otimes g_j\), \(j=1,\ldots , r\). If the polynomial iterates are chosen appropriately, one verifies that \(a(n)b(n)\) is also a multiple correlation sequence with polynomial iterates; hence, by Walsh’s convergence result [19], the limit (18) exists.

2.5 Extension to nilpotent groups

Essentially the same argument can be used when the transformations \(T_1,\ldots , T_\ell \) generate a nilpotent group; the only extra difficulty occurs in proving \(k\)-anti-uniformity for some \(k\in {\mathbb N}\) that depends also on the degree of nilpotency of the group generated by \(T_1,\ldots , T_\ell \). In this case, the PET induction is somewhat more complicated, but can be handled by modifying the PET induction used in [8, Lemma 3.5] along the lines of the argument used to prove [19, Theorem 4.2].

2.6 Proof of Theorem 1.4

The inclusion \(\overline{\mathcal {A}_{\ell }}\subset \overline{\mathcal {B}_\ell }\) follows from Proposition 2.4. The inclusion \(\overline{\mathcal {B}_{\ell }}\subset \overline{\mathcal {C}_\ell }\) is obvious. The inclusion \(\overline{\mathcal {C}_{\ell }}\subset \overline{\mathcal {A}_\ell }\) follows from Theorem 1.1.

2.7 Proof of Theorems 1.5 and 1.6

The implication \((ii)\Rightarrow (i)\) follows from Proposition 2.4 (for Theorem 1.6, in order to get property (i) for some fixed \(\ell \in {\mathbb N}\) we use property (ii) for \(\ell !\)). The implication \((i) \Rightarrow (iii)\) follows from Theorem 1.1 and the remarks following this theorem. The implication \((iii) \Rightarrow (ii)\) is obvious.

The same argument applies for the extensions mentioned after Theorem 1.6 related to uniform and weighted Cesàro averages.