1 Introduction

For a permutation w in the symmetric group \(S_n\) written in one-line notation \(w = w_1 \dots w_n\), the inversion and major index statistics are given by

$$\begin{aligned} {{\,\mathrm{inv}\,}}(w) := \#\{i < j : w_i> w_j\} \quad \text {and}\quad {{\,\mathrm{maj}\,}}(w) := \sum _{\begin{array}{c} 1 \le i \le n-1\\ w_i > w_{i+1} \end{array}} i. \end{aligned}$$
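To make the definitions concrete, here is a minimal Python sketch computing both statistics directly from their definitions (the helper names are ours):

```python
# Minimal sketch: inv and maj computed directly from their definitions.
def inv(w):
    """Number of pairs of positions i < j with w[i] > w[j] (0-indexed)."""
    return sum(1 for i in range(len(w)) for j in range(i + 1, len(w)) if w[i] > w[j])

def maj(w):
    """Sum of the 1-indexed descent positions i with w_i > w_{i+1}."""
    return sum(i + 1 for i in range(len(w) - 1) if w[i] > w[i + 1])

# Example: w = 3142 has inversions 31, 32, 42 and descents at positions 1 and 3.
assert inv((3, 1, 4, 2)) == 3 and maj((3, 1, 4, 2)) == 4
```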

It is well known that \({{\,\mathrm{inv}\,}}\) and \({{\,\mathrm{maj}\,}}\) are equidistributed on \(S_n\) [Mac, §1], with common mean \(\mu _n\) and variance \(\sigma _n^2\) given by

$$\begin{aligned} \mu _n = \frac{n(n-1)}{4} \quad \text {and}\quad \sigma _n^2 = \frac{2n^3 + 3n^2 - 5n}{72}. \end{aligned}$$
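Both facts are easily confirmed by brute force for small \(n\); the sketch below (our helper code) checks equidistribution and the moment formulas exactly, using rational arithmetic for the uniform distribution on \(S_n\):

```python
# Brute-force check, for small n, that inv and maj are equidistributed on S_n
# and that their common mean and variance match mu_n and sigma_n^2.
from itertools import permutations
from fractions import Fraction
from statistics import mean, pvariance

inv = lambda w: sum(w[i] > w[j] for i in range(len(w)) for j in range(i + 1, len(w)))
maj = lambda w: sum(i + 1 for i in range(len(w) - 1) if w[i] > w[i + 1])

for n in range(2, 8):
    Sn = list(permutations(range(1, n + 1)))
    invs = sorted(Fraction(inv(w)) for w in Sn)
    majs = sorted(Fraction(maj(w)) for w in Sn)
    assert invs == majs                                            # equidistribution
    assert mean(invs) == Fraction(n * (n - 1), 4)                  # mu_n
    assert pvariance(invs) == Fraction(2*n**3 + 3*n**2 - 5*n, 72)  # sigma_n^2
```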

These results also follow easily from our arguments; see Remark 2.7. In [BZ10], Baxter and Zeilberger proved that \({{\,\mathrm{inv}\,}}\) and \({{\,\mathrm{maj}\,}}\) are jointly independently asymptotically normally distributed as \(n \rightarrow \infty \). More precisely, define normalized random variables on \(S_n\)

$$\begin{aligned} X_n := \frac{{{\,\mathrm{inv}\,}}- \mu _n}{\sigma _n}, \quad Y_n := \frac{{{\,\mathrm{maj}\,}}- \mu _n}{\sigma _n}. \end{aligned}$$
(1)

Theorem 1.1

(Baxter–Zeilberger [BZ10]). For each \(u, v \in \mathbb {R}\), we have

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb {P}[X_n \le u, Y_n \le v] = \frac{1}{2\pi } \int _{-\infty }^u \int _{-\infty }^v e^{-x^2/2} e^{-y^2/2}\,\mathrm{d}y\,\mathrm{d}x. \end{aligned}$$

See [BZ10] for further historical background. Baxter and Zeilberger’s argument involves mixed moments and recurrences based on combinatorial manipulations with permutations. Romik suggested that a generating function due to Roselle, quoted as Theorem 2.2 below, should provide another approach. Zeilberger subsequently offered a $300 reward for such an argument, which has happily now been collected. Our overarching motivation, and Romik’s original impetus for suggesting Roselle’s formula, is to give a local limit theorem, i.e. a formula for the counts \(\#\{w \in S_n : {{\,\mathrm{inv}\,}}(w) = u, {{\,\mathrm{maj}\,}}(w) = v\}\) with an explicit error term, which will be the subject of a future article. For further context, see [Zei] and [Thi16].

2 Consequences of Roselle’s Formula

Here we recall Roselle’s formula, originally stated in different but equivalent terms, and derive a generating function expression which quickly motivates Theorem 1.1.

Definition 2.1

Let \(H_n\) be the bivariate \({{\,\mathrm{inv}\,}}, {{\,\mathrm{maj}\,}}\) generating function on \(S_n\), i.e.

$$\begin{aligned} H_n(p, q) := \sum _{w \in S_n} p^{{{\,\mathrm{inv}\,}}(w)} q^{{{\,\mathrm{maj}\,}}(w)}. \end{aligned}$$

Theorem 2.2

(Roselle [Ros74]). We have

$$\begin{aligned} \sum _{n \ge 0} \frac{H_n(p, q) z^n}{(p)_n (q)_n} = \prod _{a, b \ge 0} \frac{1}{1 - p^a q^b z}, \end{aligned}$$
(2)

where \((p)_n := (1-p)(1-p^2) \cdots (1-p^n)\).
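Formula (2) can be spot-checked by computer algebra for small \(n\). Multiplying by \((p)_n (q)_n\), the identity asserts that \((p)_n (q)_n\) times the \(z^n\) coefficient of the right-hand side equals \(H_n(p, q)\). In the sketch below (our code; the truncation bound \(D\) is ours), the multisets are enumerated from a truncated grid, so any discrepancy must lie in degrees above \(D\):

```python
# Sketch verifying Roselle's formula (2) for n = 3: the z^n coefficient of the
# double product is the generating function for size-n multisets of pairs
# (a, b), enumerated here from the truncated grid 0 <= a, b <= D.
from itertools import permutations, combinations_with_replacement
import sympy as sp

p, q = sp.symbols('p q')

def H(n):  # H_n(p, q) by brute force over S_n
    inv = lambda w: sum(w[i] > w[j] for i in range(n) for j in range(i + 1, n))
    maj = lambda w: sum(i + 1 for i in range(n - 1) if w[i] > w[i + 1])
    return sum(p**inv(w) * q**maj(w) for w in permutations(range(1, n + 1)))

def poch(x, n):  # (x)_n = (1 - x)(1 - x^2) ... (1 - x^n)
    return sp.prod(1 - x**k for k in range(1, n + 1))

n, D = 3, 4
grid = [(a, b) for a in range(D + 1) for b in range(D + 1)]
hn = sum(p**sum(a for a, _ in m) * q**sum(b for _, b in m)
         for m in combinations_with_replacement(grid, n))
diff = sp.Poly(sp.expand(poch(p, n) * poch(q, n) * hn - H(n)), p, q)
# every surviving term comes from the truncation, i.e. has degree > D
assert all(a > D or b > D for (a, b), c in diff.terms())
```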

The following is the main result of this section. An integer partition \(\mu \vdash n\) is a weakly decreasing list of positive integers \(\mu _1 \ge \mu _2 \ge \cdots \ge \mu _k > 0\) summing to n. The length of \(\mu \) is \(\ell (\mu ) = k\).

Theorem 2.3

There are constants \(c_\mu \in \mathbb {Z}\) indexed by integer partitions \(\mu \vdash n\) such that

$$\begin{aligned} \frac{H_n(p, q)}{n!} = \frac{[n]_p! [n]_q!}{n!^2} F_n(p, q), \end{aligned}$$
(3)

where

$$\begin{aligned} F_n(p, q) = \sum _{d=0}^n [(1-p)(1-q)]^d \sum _{\begin{array}{c} \mu \vdash n\\ \ell (\mu ) = n-d \end{array}} \frac{c_\mu }{\prod _i [\mu _i]_p [\mu _i]_q} \end{aligned}$$
(4)

and \([n]_p! := [n]_p [n-1]_p \cdots [1]_p\), \([c]_p := 1 + p + \cdots + p^{c-1} = (1-p^c)/(1-p)\).
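Before turning to the proof, here is a small symbolic sanity check of (3) and (4) in the first nontrivial case \(n = 2\), where one computes from (12) that \(c_{(1,1)} = c_{(2)} = 1\) (our code; the helper names bracket and brfact are ours):

```python
# Symbolic check of (3)-(4) for n = 2: F_2 = 1 + (1-p)(1-q)/([2]_p [2]_q).
from itertools import permutations
import sympy as sp

p, q = sp.symbols('p q')
bracket = lambda c, x: sum(x**j for j in range(c))                     # [c]_x
brfact = lambda n, x: sp.prod(bracket(c, x) for c in range(1, n + 1))  # [n]_x!

def F(n):  # correction factor, computed from (3)
    inv = lambda w: sum(w[i] > w[j] for i in range(n) for j in range(i + 1, n))
    maj = lambda w: sum(i + 1 for i in range(n - 1) if w[i] > w[i + 1])
    Hn = sum(p**inv(w) * q**maj(w) for w in permutations(range(1, n + 1)))
    return sp.factorial(n) * Hn / (brfact(n, p) * brfact(n, q))

expected = 1 + (1 - p) * (1 - q) / (bracket(2, p) * bracket(2, q))
assert sp.simplify(F(2) - expected) == 0
```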

An explicit expression for \(c_\mu \) is given below in (12). The rest of this section is devoted to proving Theorem 2.3. Straightforward manipulations with (2) immediately yield (3), where

$$\begin{aligned} F_n(p, q) := (1-p)^n (1-q)^n n! \cdot \{z^n\} \left( \prod _{a, b \ge 0} \frac{1}{1 - p^a q^b z}\right) \end{aligned}$$
(5)

and \(\{z^n\}\) here refers to extracting the coefficient of \(z^n\). Thus it suffices to show (5) implies (4). By standard arguments, the \(z^n\) coefficient of the product over \(a, b\) in (5) is the bivariate generating function of size-\(n\) multisets of pairs \((a, b) \in \mathbb {Z}_{\ge 0}^2\), where the weight of a multiset \(\{(a_1, b_1), \ldots , (a_n, b_n)\}\) is \(p^{\sum _i a_i} q^{\sum _i b_i}\). We will use this same componentwise-sum weight throughout.

Definition 2.4

For \(\lambda \vdash n\), let \(M_\lambda \) be the bivariate generating function for multisets of pairs \((a, b) \in \mathbb {Z}_{\ge 0}^2\) of type \(\lambda \), meaning some element has multiplicity \(\lambda _1\), another element has multiplicity \(\lambda _2\), etc.

We clearly have

$$\begin{aligned} \{z^n\}\left( \prod _{a, b \ge 0} \frac{1}{1 - p^a q^b z}\right) = \sum _{\lambda \vdash n} M_\lambda (p, q), \end{aligned}$$
(6)

though the \(M_\lambda \) are inconvenient to work with directly, and so we perform a change of basis.

Definition 2.5

Let P[n] denote the lattice of set partitions of \([n] := \{1, 2, \ldots , n\}\) with minimum \(\widehat{0} = \{\{1\}, \{2\}, \ldots , \{n\}\}\) and maximum \(\widehat{1} = \{\{1, 2, \ldots , n\}\}\). Here \(\Lambda \le \Pi \) means that \(\Pi \) can be obtained from \(\Lambda \) by merging blocks of \(\Lambda \). The type of a set partition \(\Lambda \) is the integer partition obtained by rearranging the list of the block sizes of \(\Lambda \) in weakly decreasing order. For \(\lambda \vdash n\), set

$$\begin{aligned} \Pi (\lambda ) := \{\{1, 2, \ldots , \lambda _1\}, \{\lambda _1+1, \lambda _1+2, \ldots , \lambda _1+\lambda _2\}, \ldots \}, \end{aligned}$$

which has type \(\lambda \).

Definition 2.6

For \(\Pi \in P[n]\), let \(R_\Pi \) denote the bivariate generating function for lists \(L \in (\mathbb {Z}_{\ge 0}^2)^n\) in which, for each block of \(\Pi \), the entries of \(L\) indexed by that block are all equal. Similarly, let \(S_\Pi \) denote the bivariate generating function of lists \(L\) where, in addition to entries indexed by the same block being equal, entries indexed by two different blocks are distinct.

We easily see that

$$\begin{aligned} R_\Lambda = \prod _{A \in \Lambda } \frac{1}{(1 - p^{\#A})(1 - q^{\#A})} \end{aligned}$$
(7)

and that

$$\begin{aligned} R_\Lambda = \sum _{\Pi : \Lambda \le \Pi } S_\Pi , \end{aligned}$$
(8)

so that, by Möbius inversion on P[n],

$$\begin{aligned} S_\Pi = \sum _{\Lambda : \Pi \le \Lambda } \mu (\Pi , \Lambda ) R_\Lambda , \end{aligned}$$
(9)

where \(\mu (\Pi , \Lambda )\) is the Möbius function of the lattice of set partitions. Under the “forgetful” map from lists to multisets, a multiset of type \(\lambda \vdash n\) has a fiber of size \(\left( {\begin{array}{c}n\\ \lambda \end{array}}\right) = n!/\lambda !\). It follows that

$$\begin{aligned} S_{\Pi (\lambda )} = \frac{n!}{\lambda !} M_\lambda , \end{aligned}$$
(10)

where \(\lambda ! := \lambda _1! \lambda _2! \cdots \). Combining in order (5), (6), (10), (9), and (7) gives

$$\begin{aligned} F_n(p, q) = \sum _{d=0}^n [(1-p)(1-q)]^d \sum _{\lambda \vdash n} \lambda ! \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} \frac{\mu (\Pi (\lambda ), \Lambda )}{\prod _{A \in \Lambda } [\#A]_p [\#A]_q}. \end{aligned}$$
(11)

Now (4) follows from (11), where

$$\begin{aligned} c_\mu = \sum _{\lambda \vdash n} \lambda ! \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ {{\,\mathrm{type}\,}}(\Lambda )=\mu \end{array}} \mu (\Pi (\lambda ), \Lambda ). \end{aligned}$$
(12)

This completes the proof of Theorem 2.3.

Remark 2.7

From (12), \(c_{(1^n)} = 1\), where \((1^n)\) refers to the partition with \(n\) parts of size 1, since the sum only involves \(\Lambda = \widehat{0}\). Letting \(p \rightarrow 1\) (or, symmetrically, \(q \rightarrow 1\)) in (4), the only surviving term is \(d=0\) with \(\mu = (1^n)\), so \(F_n(1, q) = 1\). Consequently, by (3), \(H_n(1, q) = [n]_q!\), recovering a classic result of MacMahon [Mac, §1]. Explicitly,

$$\begin{aligned} \sum _{w \in S_n} q^{{{\,\mathrm{inv}\,}}(w)} = [n]_q! = \sum _{w \in S_n} q^{{{\,\mathrm{maj}\,}}(w)}. \end{aligned}$$

The mean and variance may be extracted by recognizing \([n]_q!/n! = \prod _{c=1}^n [c]_q/c\) as the probability generating function of a sum of independent random variables, where the \(c\)th summand is uniform on \(\{0, 1, \ldots , c-1\}\).
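Concretely, \([c]_q/c\) is the probability generating function of a uniform random variable on \(\{0, 1, \ldots , c-1\}\), which has mean \((c-1)/2\) and variance \((c^2-1)/12\); summing these recovers \(\mu _n\) and \(\sigma _n^2\). A quick exact check (our code):

```python
# Check that summing the means and variances of independent uniforms on
# {0, ..., c-1}, c = 1, ..., n, recovers mu_n and sigma_n^2 exactly.
from fractions import Fraction

for n in range(1, 30):
    mean = sum(Fraction(c - 1, 2) for c in range(1, n + 1))
    var = sum(Fraction(c * c - 1, 12) for c in range(1, n + 1))
    assert mean == Fraction(n * (n - 1), 4)
    assert var == Fraction(2 * n**3 + 3 * n**2 - 5 * n, 72)
```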

Remark 2.8

Using (3), we see that the probability generating function \(H_n(p, q)/n!\) (discussed below in Example 4.3) differs from \([n]_p! [n]_q!/n!^2\) by precisely the correction factor \(F_n(p, q)\). Using (5), this factor has the following combinatorial interpretation:

$$\begin{aligned} F_n = \frac{n! \cdot \text {g.f. of size-}n\text { multisets from } \mathbb {Z}_{\ge 0}^2}{\text {g.f. of size-}n\text { lists from } \mathbb {Z}_{\ge 0}^2}. \end{aligned}$$

Intuitively, the numerator and denominator should be the same “up to first order.” Theorem 3.1 will give one precise sense in which they are asymptotically equal.

3 Estimating the Correction Factor

This section is devoted to showing that the correction factor \(F_n(p, q)\) from Theorem 2.3 is negligible in an appropriate sense, Theorem 3.1. Recall that \(\sigma _n\) denotes the standard deviation of \({{\,\mathrm{inv}\,}}\) or \({{\,\mathrm{maj}\,}}\) on \(S_n\).

Theorem 3.1

Uniformly on compact subsets of \(\mathbb {R}^2\), we have

$$\begin{aligned} F_n(e^{is/\sigma _n}, e^{it/\sigma _n}) \rightarrow 1 \quad \text {as} \quad n \rightarrow \infty . \end{aligned}$$

We begin with some simple estimates starting from (11) which motivate the rest of the inequalities in this section. We may assume \(|s|, |t| \le M\) for some fixed M. Setting \(p=e^{is/\sigma _n}, q=e^{it/\sigma _n}\), we have \(|1-p| = |1-\exp (is/\sigma _n)| \le |s|/\sigma _n\). For n sufficiently large compared to M and \(1 \le c \le n\), we claim that \(|[c]_p| \ge 1\). Assuming this for the moment, for n sufficiently large, (11) gives

$$\begin{aligned} |F_n(e^{is/\sigma _n}, e^{it/\sigma _n}) - 1| \le \sum _{d=1}^n \frac{|st|^d}{\sigma _n^{2d}} \sum _{\lambda \vdash n} \lambda ! \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} |\mu (\Pi (\lambda ), \Lambda )|. \end{aligned}$$
(13)

As for the claim, one finds \(|[c]_p| = |\sin (cs/2\sigma _n)/\sin (s/2\sigma _n)|\). Since \(1 \le c \le n\) and \(\sigma _n \sim n^{3/2}/6\), for sufficiently large \(n\) we have \(cs/2\sigma _n \ll 1\), and \(\sin (z)\) is increasing near 0, which gives the claim. We now simplify the inner sum on the right-hand side of (13).
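Before doing so, we record a quick numeric spot check of the claim \(|[c]_p| \ge 1\); the particular \(M\), \(n\), and sample values of \(s\) below are ours:

```python
# Spot check that |[c]_p| >= 1 for p = e^{is/sigma_n}, 1 <= c <= n, |s| <= M,
# once n is large relative to M.
import cmath, math

M, n = 3.0, 60
sigma = math.sqrt((2 * n**3 + 3 * n**2 - 5 * n) / 72)
for s in (-M, -1.0, 0.5, M):
    p = cmath.exp(1j * s / sigma)
    for c in range(1, n + 1):
        assert abs(sum(p**j for j in range(c))) >= 1 - 1e-12
```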

Lemma 3.2

Suppose \(\lambda \vdash n\) with \(\ell (\lambda ) = n-k\), and fix d. Then

$$\begin{aligned} \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} \mu (\Pi (\lambda ), \Lambda ) = (-1)^{d-k} \sum _{\begin{array}{c} \Lambda \in P[n-k]\\ \#\Lambda = n-d \end{array}} \prod _{A \in \Lambda } (\#A-1)! \end{aligned}$$
(14)

and the terms on the left all have the same sign \((-1)^{d-k}\). The sums are empty unless \(n \ge d \ge k \ge 0\).

Proof

The upper order ideal \(\{\Lambda \in P[n] : \Pi (\lambda ) \le \Lambda \}\) is isomorphic to \(P[n-k]\) by collapsing the \(n-k\) blocks of \(\Pi (\lambda )\) to singletons. This isomorphism preserves the number of blocks. Furthermore, recall that in P[n] the Möbius function satisfies

$$\begin{aligned} \mu (\widehat{0}, \widehat{1}) = (-1)^{n-1} (n-1)!, \end{aligned}$$

from which it follows easily that

$$\begin{aligned} \mu (\widehat{0}, \Lambda ) = \prod _{A \in \Lambda } (-1)^{\#A - 1} (\#A - 1)!. \end{aligned}$$
(15)

The result follows immediately upon combining these observations. \(\square \)
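The product formula (15) is easy to verify by machine for small \(n\); the sketch below builds \(P[n]\), computes its Möbius function from the defining recursion, and compares (all code and names are ours):

```python
# Check (15) for n = 4: mu(0-hat, Lambda) computed from the recursive
# definition of the Moebius function agrees with the product over blocks.
from math import factorial, prod

def set_partitions(s):
    """All set partitions of the list s, as lists of sorted blocks."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [sorted(smaller[i] + [first])] + smaller[i + 1:]
        yield [[first]] + smaller

def leq(P, L):  # refinement order: every block of P lies inside a block of L
    return all(any(set(a) <= set(b) for b in L) for a in P)

def mobius(P, L, lattice):  # defining recursion for the Moebius function
    if P == L:
        return 1
    return -sum(mobius(P, X, lattice) for X in lattice
                if leq(P, X) and leq(X, L) and X != L)

n = 4
Pn = [sorted(x) for x in set_partitions(list(range(1, n + 1)))]
zero_hat = [[i] for i in range(1, n + 1)]
for Lam in Pn:
    predicted = prod((-1)**(len(A) - 1) * factorial(len(A) - 1) for A in Lam)
    assert mobius(zero_hat, Lam, Pn) == predicted
```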

Lemma 3.3

Let \(\lambda \vdash n\) with \(\ell (\lambda ) = n-k\) and \(n \ge d \ge k \ge 0\). Then

$$\begin{aligned} \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} |\mu (\Pi (\lambda ), \Lambda )| \le (n-k)^{2(d-k)}. \end{aligned}$$
(16)

Proof

Using (14), we can interpret the sum as the number of permutations of \([n-k]\) with \(n-d\) cycles, which is a Stirling number of the first kind. There are well-known asymptotics for these numbers, though the stated elementary bound suffices for our purposes. We induct on d. At \(d=k\), the result is trivial. Given a permutation of \([n-k]\) with \(n-d\) cycles, choose \(i, j \in [n-k]\) from different cycles. Suppose the cycles are of the form \((i'\ \cdots \ i)\) and \((j\ \cdots \ j')\). Splice the two cycles together to obtain

$$\begin{aligned} (i'\ \cdots \ i\ j\ \cdots \ j'). \end{aligned}$$

This procedure constructs every permutation of \([n-k]\) with \(n-(d+1)\) cycles and requires no more than \((n-k)^2\) choices. The result follows. \(\square \)
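The bound is crude but easy to test directly; the following sketch compares the signless Stirling numbers of the first kind with \(m^{2j}\), writing \(m = n-k\) and \(j = d-k\) (code is ours):

```python
# Check Lemma 3.3's bound: c(m, m-j), the number of permutations of [m] with
# m-j cycles, is at most m^(2j). Standard recurrence for signless Stirling
# numbers of the first kind: c(m, k) = c(m-1, k-1) + (m-1) c(m-1, k).
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1(m, k):
    if m == 0:
        return 1 if k == 0 else 0
    return stirling1(m - 1, k - 1) + (m - 1) * stirling1(m - 1, k)

for m in range(1, 13):
    for j in range(m):
        assert stirling1(m, m - j) <= m**(2 * j)
```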

Lemma 3.4

For \(n \ge d \ge k \ge 0\), we have

$$\begin{aligned} \sum _{\begin{array}{c} \lambda \vdash n\\ \ell (\lambda ) = n-k \end{array}} \lambda ! \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} |\mu (\Pi (\lambda ), \Lambda )| \le (n-k)^{2d-k} (k+1)!. \end{aligned}$$
(17)

Proof

For \(\lambda \vdash n\) with \(\ell (\lambda ) = n-k\), \(\lambda !\) can be thought of as the product of the entries obtained from filling the \(i\)th row of the Young diagram of \(\lambda \) with \(1, 2, \ldots , \lambda _i\). Alternatively, we may fill the cells of \(\lambda \) as follows: put \(n-k\) ones in the first column, and fill the remaining \(k\) cells with the numbers \(2, 3, \ldots , k+1\) starting at the largest row and proceeding left to right. It is easy to see that the labels of the first filling are bounded above by the labels of the second filling, so that \(\lambda ! \le (k+1)!\). Furthermore, each \(\lambda \vdash n\) with \(\ell (\lambda ) = n-k\) can be constructed by first placing \(n-k\) cells in the first column and then deciding on which of the \(n-k\) rows to place each of the remaining \(k\) cells, so there are no more than \((n-k)^k\) such \(\lambda \). The result follows from combining these bounds with (16). \(\square \)
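Both combinatorial bounds in this proof are easily confirmed by exhaustive search for small \(n\) (our code):

```python
# Exhaustive check, for small n, of the two bounds in the proof of Lemma 3.4:
# lambda! <= (k+1)! and #{lambda |- n : l(lambda) = n-k} <= (n-k)^k.
from math import factorial, prod

def partitions(n, max_part=None):
    """All partitions of n as weakly decreasing tuples."""
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

for n in range(1, 13):
    for k in range(n):
        lams = [lam for lam in partitions(n) if len(lam) == n - k]
        assert len(lams) <= (n - k)**k
        for lam in lams:
            assert prod(factorial(part) for part in lam) <= factorial(k + 1)
```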

Lemma 3.5

For n sufficiently large, for all \(0 \le d \le n\) we have

$$\begin{aligned} \sum _{\lambda \vdash n} \lambda ! \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} |\mu (\Pi (\lambda ), \Lambda )| \le 3n^{2d}. \end{aligned}$$

Proof

For \(n\) sufficiently large and all \(n \ge k \ge 2\), we have \((k+1)! < n^{k-1}\). Using (17) gives

$$\begin{aligned} \sum _{\lambda \vdash n} \lambda ! \sum _{\begin{array}{c} \Lambda : \Pi (\lambda ) \le \Lambda \\ \#\Lambda = n-d \end{array}} |\mu (\Pi (\lambda ), \Lambda )|&\le \sum _{k=0}^d (n-k)^{2d-k} (k+1)! \\&\le n^{2d} + 2(n-1)^{2d-1} + \sum _{k=2}^d (n-k)^{2d-k} n^{k-1} \\&\le n^{2d} + 2n^{2d-1} + \sum _{k=2}^d n^{2d-1} \\&= n^{2d} + 2n^{2d-1} + (d-1) n^{2d-1} \le 3n^{2d}. \end{aligned}$$

\(\square \)
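For small \(n\), the statement of Lemma 3.5 can also be checked exhaustively: by (14), the inner sum over \(\Lambda \) with \(n-d\) blocks equals, in absolute value, a Stirling number depending only on \(\ell (\lambda )\) (our code, reusing the helpers above):

```python
# Check Lemma 3.5 for small n: by (14), the inner sum over Lambda with n-d
# blocks equals the Stirling number c(l(lambda), n-d) in absolute value, so
# the left-hand side reduces to a sum over partitions lambda of n.
from functools import lru_cache
from math import factorial, prod

@lru_cache(maxsize=None)
def stirling1(m, k):
    if m == 0:
        return 1 if k == 0 else 0
    return stirling1(m - 1, k - 1) + (m - 1) * stirling1(m - 1, k)

def partitions(n, max_part=None):
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

for n in range(2, 10):
    for d in range(n + 1):
        lhs = sum(prod(factorial(x) for x in lam) * stirling1(len(lam), n - d)
                  for lam in partitions(n))
        assert lhs <= 3 * n**(2 * d)
```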

We may now complete the proof of Theorem 3.1. Combining Lemma 3.5 and (13) gives

$$\begin{aligned} |F_n(e^{is/\sigma _n}, e^{it/\sigma _n}) - 1| \le 3\sum _{d=1}^n \frac{(Mn)^{2d}}{\sigma _n^{2d}}. \end{aligned}$$

Since \(\sigma _n^2 \sim n^3/36\) and \(M\) is fixed, \((Mn)^{2d}/\sigma _n^{2d} \sim (36M^2/n)^d\). Summing the resulting geometric series, it follows that

$$\begin{aligned} \lim _{n \rightarrow \infty } \sum _{d=1}^n \frac{(Mn)^{2d}}{\sigma _n^{2d}} = 0, \end{aligned}$$

completing the proof of Theorem 3.1.
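Numerically, the decay is plain (the fixed bound \(M\) and sample values of \(n\) are ours):

```python
# Numeric illustration: the common ratio (Mn/sigma_n)^2 is ~ 36 M^2 / n, so
# the geometric-series bound r/(1-r) for the sum tends to 0 as n grows.
from math import inf

M = 1.0
for n in (10**2, 10**3, 10**4, 10**5):
    sigma2 = (2 * n**3 + 3 * n**2 - 5 * n) / 72
    r = (M * n)**2 / sigma2
    bound = r / (1 - r) if r < 1 else inf
    print(f"n = {n:>6}: ratio = {r:.2e}, geometric bound = {bound:.2e}")
```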

Remark 3.6

Indeed, the argument shows that \(|F_n(e^{is/\sigma _n}, e^{it/\sigma _n}) - 1| = O(1/n)\). The above estimates are particularly far from sharp for large d, though for small d they are quite accurate. Working directly with (11), one finds the \(d=1\) contribution to be

$$\begin{aligned} (1-p)(1-q) \frac{2 - \left( {\begin{array}{c}n\\ 2\end{array}}\right) }{[2]_p [2]_q}. \end{aligned}$$

Letting \(p = e^{is/\sigma _n}, q = e^{it/\sigma _n}\), straightforward estimates show that this term is \(\Omega (1/n)\) for fixed nonzero \(s, t\). Consequently, the preceding arguments are strong enough to identify the leading term, and in particular

$$\begin{aligned} |F_n(e^{is/\sigma _n}, e^{it/\sigma _n}) - 1| = \Theta (1/n). \end{aligned}$$

4 Deducing Baxter and Zeilberger’s Result

We next summarize enough of the standard theory of characteristic functions to prove Theorem 1.1 using (3) and Theorem 3.1.

Definition 4.1

The characteristic function of an \(\mathbb {R}^k\)-valued random variable \(X = (X_1, \ldots , X_k)\) is the function \(\phi _X :\mathbb {R}^k \rightarrow \mathbb {C}\) given by

$$\begin{aligned} \phi _X(s_1, \ldots , s_k) := \mathbb {E}[\exp (i(s_1 X_1 + \cdots + s_k X_k))]. \end{aligned}$$

Example 4.2

It is well known that the characteristic function of the standard normal random variable with density \(\frac{1}{\sqrt{2\pi }} e^{-x^2/2}\) is \(e^{-s^2/2}\). Similarly, the characteristic function of a pair of independent standard normal random variables, with joint density \(\frac{1}{2\pi } e^{-x^2/2 - y^2/2}\), is \(e^{-s^2/2 - t^2/2}\).

Example 4.3

If W is a finite set and \({{\,\mathrm{stat}\,}}= ({{\,\mathrm{stat}\,}}_1, \ldots , {{\,\mathrm{stat}\,}}_k) :W \rightarrow \mathbb {Z}_{\ge 0}^k\) is some statistic, the probability generating function of \({{\,\mathrm{stat}\,}}\) on W is

$$\begin{aligned} P(x_1, \ldots , x_k) := \frac{1}{\#W} \sum _{w \in W} x_1^{{{\,\mathrm{stat}\,}}_1(w)} \cdots x_k^{{{\,\mathrm{stat}\,}}_k(w)}. \end{aligned}$$

The characteristic function of the corresponding random variable \(X\), where \(w\) is chosen uniformly at random from \(W\), is

$$\begin{aligned} \phi _X(s_1, \ldots , s_k) = P(e^{is_1}, \ldots , e^{is_k}). \end{aligned}$$

From Example 4.3, Remark 2.7, and an easy calculation, it follows that the characteristic functions of the random variables \(X_n\) and \(Y_n\) from (1) are

$$\begin{aligned} \phi _{X_n}(s) = e^{-i\mu _n s/\sigma _n} \frac{[n]_{e^{is/\sigma _n}}!}{n!} \quad \text {and}\quad \phi _{Y_n}(t) = e^{-i\mu _n t/\sigma _n} \frac{[n]_{e^{it/\sigma _n}}!}{n!}. \end{aligned}$$
(18)
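Formula (18) is easy to evaluate numerically; the following sketch compares \(\phi _{X_n}(s)\) with the limiting Gaussian characteristic function \(e^{-s^2/2}\) from Example 4.2 (the choice of \(n\) and sample points is ours):

```python
# Evaluate phi_{X_n}(s) from (18) and compare with exp(-s^2/2).
import cmath, math

def phi_X(n, s):
    mu = n * (n - 1) / 4
    sigma = math.sqrt((2 * n**3 + 3 * n**2 - 5 * n) / 72)
    p = cmath.exp(1j * s / sigma)
    qfact_over_nfact = 1.0 + 0j
    for c in range(1, n + 1):
        qfact_over_nfact *= sum(p**j for j in range(c)) / c  # [c]_p / c
    return cmath.exp(-1j * mu * s / sigma) * qfact_over_nfact

for s in (0.5, 1.0, 2.0):
    print(f"s = {s}: |phi_X(200, s) - e^(-s^2/2)| = "
          f"{abs(phi_X(200, s) - math.exp(-s**2 / 2)):.1e}")
```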

An analogous calculation for the random variable \((X_n, Y_n)\) together with (18) and (3) gives

$$\begin{aligned} \begin{aligned} \phi _{(X_n, Y_n)}(s, t)&= e^{-i(\mu _n s/\sigma _n + \mu _n t/\sigma _n)} \frac{H_n(e^{is/\sigma _n}, e^{it/\sigma _n})}{n!} \\&= \phi _{X_n}(s) \phi _{Y_n}(t) F_n(e^{is/\sigma _n}, e^{it/\sigma _n}). \end{aligned} \end{aligned}$$
(19)

Theorem 4.4

(Multivariate Lévy Continuity [Bil95, p. 383]). Suppose that \(X^{(1)}\), \(X^{(2)}\), \(\ldots \) is a sequence of \(\mathbb {R}^k\)-valued random variables and X is an \(\mathbb {R}^k\)-valued random variable. Then \(X^{(1)}, X^{(2)}, \ldots \) converges in distribution to X if and only if \(\phi _{X^{(n)}}\) converges pointwise to \(\phi _X\).

If the distribution function of X is continuous everywhere, convergence in distribution means that for all \(u_1, \ldots , u_k\) we have

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb {P}[X^{(n)}_i \le u_i, 1 \le i \le k] = \mathbb {P}[X_i \le u_i, 1 \le i \le k]. \end{aligned}$$

Many techniques are available for proving that \({{\,\mathrm{inv}\,}}\) and \({{\,\mathrm{maj}\,}}\) on \(S_n\) are asymptotically normal. The result is typically attributed to Feller.

Theorem 4.5

([Fel68, p. 257]). The sequences of random variables \(X_n\) and \(Y_n\) from (1) each converge in distribution to the standard normal random variable.

We may now complete the proof of Theorem 1.1. From Theorem 4.5 and Example 4.2, we have for all \(s, t \in \mathbb {R}\)

$$\begin{aligned} \lim _{n \rightarrow \infty } \phi _{X_n}(s) = e^{-s^2/2} \quad \text {and}\quad \lim _{n \rightarrow \infty } \phi _{Y_n}(t) = e^{-t^2/2}. \end{aligned}$$
(20)

Combining in order (20), (19), and Theorem 3.1 gives

$$\begin{aligned} \lim _{n \rightarrow \infty } \phi _{(X_n, Y_n)}(s, t) = e^{-s^2/2 - t^2/2}. \end{aligned}$$

Theorem 1.1 now follows from Example 4.2 and Theorem 4.4.
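As an illustration of Theorem 1.1, a quick Monte Carlo experiment (the sample size, \(n\), and the test point \((u, v) = (0, 0)\) are ours) shows the empirical joint distribution function approaching \(\Phi (0)^2 = 1/4\):

```python
# Monte Carlo illustration of Theorem 1.1 at (u, v) = (0, 0): the empirical
# probability P[X_n <= 0, Y_n <= 0] should approach 1/4 for large n.
import random

def inv_maj(w):
    n = len(w)
    i = sum(w[a] > w[b] for a in range(n) for b in range(a + 1, n))
    m = sum(a + 1 for a in range(n - 1) if w[a] > w[a + 1])
    return i, m

n, trials = 30, 4000
mu = n * (n - 1) / 4  # X_n <= 0 iff inv <= mu_n, and likewise for maj
hits = 0
for _ in range(trials):
    w = list(range(1, n + 1))
    random.shuffle(w)
    i, m = inv_maj(w)
    hits += (i <= mu) and (m <= mu)
print(hits / trials)  # should be near 0.25
```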