1 Introduction

Consider the complex unitary group U(N) endowed with the probability Haar measure. The nth secular coefficient of \(U\in U(N)\) is defined through the expansion

$$\begin{aligned} \det (zI+U) = \sum _{n=0}^N z^{N-n} \textrm{Sc}_n(U). \end{aligned}$$

If \(A=(a_{i,j})\) is an \(m\times n\) matrix with nonnegative integer entries, Diaconis and Gamburd [7] define the row-sum vector \(\textrm{row}(A)\in \mathbb {Z}^m\) and column-sum vector \(\textrm{col}(A) \in \mathbb {Z}^n\) by

$$\begin{aligned} \textrm{row}(A)_i = \sum _{j=1}^{n} a_{i,j}, \qquad \textrm{col}(A)_j = \sum _{i=1}^{m} a_{i,j}. \end{aligned}$$

Given two partitions \(\mu = (\mu _1,\ldots ,\mu _m)\) and \(\widetilde{\mu } = (\widetilde{\mu }_1,\ldots , \widetilde{\mu }_n)\) they denote by \(N_{\mu , \widetilde{\mu }}\) the number of nonnegative \(m \times n\) integer matrices A with \(\textrm{row}(A)=\mu \) and \(\textrm{col}(A) = \widetilde{\mu }\). When \(m=n=k\) and \(\mu _1=\ldots =\mu _k=\widetilde{\mu }_1=\ldots =\widetilde{\mu }_k\), matrices counted by \(N_{\mu ,\widetilde{\mu }}\) are known as magic squares of order k, see [7, §2.2] for a review. Given sequences \((a_1,\ldots ,a_{\ell })\) and \((b_1,\ldots ,b_{\ell })\) of nonnegative integers, they proved the following equality [7, Thm. 2]:

$$\begin{aligned} \int _{U(N)} \prod _{j=1}^{\ell } \textrm{Sc}_j(U)^{a_j} \overline{\textrm{Sc}_j(U)^{b_j}}\, dU = N_{\mu , \widetilde{\mu }} \end{aligned}$$
(1.1)

as long as \(\max \big \{\sum _{j=1}^{\ell } ja_j, \sum _{j=1}^{\ell } jb_j\big \} \le N\), where \(\mu \) and \(\widetilde{\mu }\) are the partitions with \(a_j\) and \(b_j\) parts of size j, respectively.
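For small partitions, the quantity \(N_{\mu ,\widetilde{\mu }}\) on the right-hand side of (1.1) can be computed by brute force. The following is a minimal sketch, not taken from [7]; the function names `compositions_bounded` and `count_matrices` are hypothetical and introduced only for illustration.

```python
from functools import lru_cache


def compositions_bounded(total, bounds):
    """Yield tuples c with 0 <= c[j] <= bounds[j] and sum(c) == total."""
    if not bounds:
        if total == 0:
            yield ()
        return
    for first in range(min(total, bounds[0]) + 1):
        for rest in compositions_bounded(total - first, bounds[1:]):
            yield (first,) + rest


def count_matrices(row_sums, col_sums):
    """Count nonnegative integer matrices with prescribed row and column sums."""
    if sum(row_sums) != sum(col_sums):
        return 0

    @lru_cache(maxsize=None)
    def go(i, remaining):
        # `remaining` holds the column sums still to be filled by rows i, i+1, ...
        if i == len(row_sums):
            return 1 if all(r == 0 for r in remaining) else 0
        return sum(
            go(i + 1, tuple(r - c for r, c in zip(remaining, row)))
            for row in compositions_bounded(row_sums[i], remaining)
        )

    return go(0, tuple(col_sums))
```

For instance, `count_matrices((1, 1, 1), (1, 1, 1))` counts the \(3\times 3\) permutation matrices, and `count_matrices((2, 2), (2, 2))` counts magic squares of order 2 with line sum 2.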

Identity (1.1) answered a question raised in [11, 26], where it was shown that \(\int _{U(N)}\textrm{Sc}_n(U)dU = 0\) and \(\int _{U(N)}|\textrm{Sc}_n(U)|^2dU = 1\) hold for \(1 \le n \le N\). The results in [7] inspired the study of pseudomoments of the Riemann zeta function [5] and were used in [19] to study the variance of the k-fold divisor function in short intervals. Recently, Najnudel, Paquette and Simm studied the distribution of \(\textrm{Sc}_n\) with n growing with N [22].

In §2 we give a new combinatorial proof of (1.1), which makes use of the characteristic map. This is in the spirit of Bump’s derivation [2, Prop. 40.4] of the Diaconis–Shahshahani moment computation [8].

In §3 we show that a result similar to (1.1) holds for traces of symmetric powers in place of secular coefficients, with substantially relaxed conditions. These traces are also the complete homogeneous symmetric polynomials \(h_n\) evaluated on the eigenvalues of the matrix. This result can be derived from a theorem of Baxter [16, Prop. 2.11] but again, our proof is combinatorial in nature.

In §4 we give two evaluations of (1.1) without any restriction on N. One evaluation uses the RSK correspondence and generalizes a result of Rains [23], and the second evaluation uses Gelfand–Tsetlin patterns and generalizes an argument of Rodgers [24]. These evaluations extend to moments of traces of symmetric powers. As an application, we give a new formula for a matrix integral that was considered by Keating, Rodgers, Roditty-Gershon and Rudnick in their study of the k-fold divisor function [19].

In §5 we show how an analogue of our proof of (1.1) has appeared in number-theoretic works of Vaughan and Wooley and of Granville and Soundararajan. The number-theoretic analogues of (1.1) concern moments of character sums on which there are several unconditional results in the literature. However, our result on symmetric traces corresponds to a conjecture we make about moments of sums of the Liouville (or the Möbius) function twisted by a Dirichlet character, which seems very difficult and reflects the random nature of Möbius, see Conjecture 5.1. As we explain in §5, the conjecture suggests that the Steinhaus random multiplicative function is a good model for the Liouville (or the Möbius) function twisted by a random Dirichlet character: \(\lambda \cdot \chi \) where \(\chi \) is chosen uniformly at random from the group of Dirichlet characters modulo q (\(q \rightarrow \infty \)) and \(\lambda (n)= (-1)^{\sum _{p,\,k\ge 1: p^k \mid n} 1}\).

2 Proof of the Diaconis–Gamburd Theorem

2.1 The symmetric group

For a permutation \(\pi \) we say that S is an invariant set for \(\pi \) if \(\pi (S)=S\). Equivalently, S is a union of cycles of \(\pi \). Given a sequence \(\lambda =(\lambda _1,\ldots ,\lambda _{\ell })\) of nonnegative integers that sum to n, we define the following function on the symmetric group \(S_{n}\) acting on \([n]:=\{1,2,\ldots ,n\}\):

$$\begin{aligned} d_{\lambda }(\pi ):= \#\big \{ (A_1,\ldots ,A_{\ell }): A_1 \sqcup \cdots \sqcup A_{\ell } = [n],\ \forall i:\, |A_i|=\lambda _i,\ \pi (A_i)=A_i \big \}, \end{aligned}$$

where \(\sqcup \) means disjoint union. We use the letter d here as short for divisor, as invariant sets for \(\pi \) are analogous to divisors of an integer m, and \(d_{\lambda }\) is analogous to a generalized divisor function over the integers, with divisors localized at certain scales (in the integers we might define \(m\mapsto \#\{ (m_1,m_2,\ldots ,m_{\ell }): m_1\cdots m_{\ell }=m,\, \log m_i \in [\lambda _i,\lambda _i+1)\}\)). The simplest examples are \(d_{(n)}\) which is identically 1, and \(d_{(a,n-a)}\) which counts invariant sets of size a.

Remark 2.1

The function \(d :S_n \rightarrow \mathbb {C}\) given by \(d:= \sum _{a=0}^{n} d_{(a,n-a)}\) equals the number of invariant sets, namely \(d(\pi )=2^{C(\pi )}\) where \(C(\pi )\) is the number of cycles of \(\pi \); this is the permutation analogue of the divisor function.
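The identity \(d(\pi )=2^{C(\pi )}\) of Remark 2.1 can be checked by brute force for small n. The sketch below assumes permutations of \(\{0,\ldots ,n-1\}\) stored as tuples with `perm[x]` the image of `x`; the helper names are hypothetical.

```python
from itertools import combinations, permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple of images."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return cycles


def invariant_set_count(perm):
    """Number of subsets S (including the empty set) with perm(S) = S."""
    n = len(perm)
    return sum(
        1
        for size in range(n + 1)
        for subset in combinations(range(n), size)
        if {perm[x] for x in subset} == set(subset)
    )


# Compare both sides of d(pi) = 2^{C(pi)} over all of S_4.
n = 4
assert all(
    invariant_set_count(p) == 2 ** cycle_count(p)
    for p in permutations(range(n))
)
```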

Remark 2.2

See [9] for a recent application of the analogy between invariant sets for a permutation and divisors of an integer.

Given sequences \(\mu \) and \(\widetilde{\mu }\) of nonnegative integers summing to n, let us define

$$\begin{aligned} N_{\mu ,\widetilde{\mu }}':= \frac{1}{|S_{n}|}\sum _{\pi \in S_{n}} d_{\mu }(\pi )d_{\widetilde{\mu }}(\pi ). \end{aligned}$$

Proposition 2.3

Suppose \(\mu ,\widetilde{\mu }\vdash n\). We have \(N_{\mu ,\widetilde{\mu }}' =N_{\mu ,\widetilde{\mu }}\).

Proof

By definition, given a partition \(\lambda = (\lambda _1,\ldots ,\lambda _{\ell })\vdash n\) we may express \(d_{\lambda }(\pi )\) as a sum over ordered set partitions:

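$$\begin{aligned} d_{\lambda }(\pi ) = \sum _{\begin{array}{c} A_1 \sqcup \cdots \sqcup A_{\ell } = [n] \\ \forall i:\, |A_i|=\lambda _i \end{array}} \alpha _{A_1,\ldots ,A_{\ell }}(\pi ), \end{aligned}$$
(2.1)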

where \(\alpha _{A_1,\ldots ,A_{\ell }}\) is the indicator function of permutations \(\pi \in S_n\) with \(\pi (A_i)=A_i\) for all i. Applying (2.1) with \(\lambda =\mu \) and multiplying by (2.1) with \(\lambda =\widetilde{\mu }\) we obtain

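$$\begin{aligned} d_{\mu }(\pi )d_{\widetilde{\mu }}(\pi ) = \sum _{\begin{array}{c} A_1 \sqcup \cdots \sqcup A_{\ell (\mu )} = [n] \\ \forall i:\, |A_i|=\mu _i \end{array}} \sum _{\begin{array}{c} B_1 \sqcup \cdots \sqcup B_{\ell (\widetilde{\mu })} = [n] \\ \forall j:\, |B_j|=\widetilde{\mu }_j \end{array}} \alpha _{A_1,\ldots ,A_{\ell (\mu )}}(\pi )\, \alpha _{B_1,\ldots ,B_{\ell (\widetilde{\mu })}}(\pi ), \end{aligned}$$

where \(\ell (\lambda )\) is the number of parts in a partition \(\lambda \). Averaging this over \(S_n\) and interchanging the order of summation, we find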

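$$\begin{aligned} N_{\mu ,\widetilde{\mu }}' = \frac{1}{n!} \sum _{\begin{array}{c} A_1 \sqcup \cdots \sqcup A_{\ell (\mu )} = [n] \\ \forall i:\, |A_i|=\mu _i \end{array}} \sum _{\begin{array}{c} B_1 \sqcup \cdots \sqcup B_{\ell (\widetilde{\mu })} = [n] \\ \forall j:\, |B_j|=\widetilde{\mu }_j \end{array}} \sum _{\pi \in S_n} \alpha _{A_1,\ldots ,A_{\ell (\mu )}}(\pi )\, \alpha _{B_1,\ldots ,B_{\ell (\widetilde{\mu })}}(\pi ). \end{aligned}$$
(2.2)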

The inner sum in the right-hand side of (2.2) counts permutations \(\pi \in S_n\) for which all the \(A_i\) and all the \(B_j\) are invariant sets. In particular \(\pi (A_i \cap B_j) \subseteq A_i \cap B_j\), and since \(\pi \) is injective this forces \(\pi (A_i \cap B_j) = A_i \cap B_j\). Conversely, a permutation such that \(\pi (A_i \cap B_j) = A_i\cap B_j\) for all i and j necessarily satisfies \(\pi (A_i)=A_i\) and \(\pi (B_j)=B_j\) for all i and j. Thus, the inner sum counts the permutations \(\pi \) with \(\pi (A_i \cap B_j) = A_i \cap B_j\) for all i and j. The sets \(A_i \cap B_j\) (\(1 \le i \le \ell (\mu )\), \(1 \le j \le \ell (\widetilde{\mu })\)) are pairwise disjoint and their union is [n], and so such permutations are determined uniquely by their restrictions to the sets \(A_i \cap B_j\), which may be arbitrary, proving that the inner sum is \(\prod _{i,j} |A_i \cap B_j|!\), the product being over \(1 \le i \le \ell (\mu )\) and \(1 \le j \le \ell (\widetilde{\mu })\). Hence,

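$$\begin{aligned} N_{\mu ,\widetilde{\mu }}' = \frac{1}{n!} \sum _{\begin{array}{c} A_1 \sqcup \cdots \sqcup A_{\ell (\mu )} = [n] \\ \forall i:\, |A_i|=\mu _i \end{array}} \sum _{\begin{array}{c} B_1 \sqcup \cdots \sqcup B_{\ell (\widetilde{\mu })} = [n] \\ \forall j:\, |B_j|=\widetilde{\mu }_j \end{array}} \prod _{i,j} |A_i \cap B_j|!. \end{aligned}$$
(2.3)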

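Observe that the \(\ell (\mu ) \times \ell (\widetilde{\mu })\) matrix \(C=(|A_i \cap B_j|)\) has \(\textrm{row}(C) =\mu \) and \(\textrm{col}(C) = \widetilde{\mu }\). Hence, grouping the terms of (2.3) according to C,

$$\begin{aligned} N_{\mu ,\widetilde{\mu }}' = \frac{1}{n!} \sum _{\begin{array}{c} C=(c_{i,j}) \text { a matrix } \\ \text {counted by }N_{\mu ,\widetilde{\mu }} \end{array}} \Big ( \prod _{i,j} c_{i,j}! \Big ) \cdot \#\big \{ (A_i)_i,\, (B_j)_j \text { as in (2.3)}: \forall i,j:\, |A_i \cap B_j| = c_{i,j} \big \}. \end{aligned}$$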

The inner expression in the right-hand side is the number of ordered set partitions of [n] into subsets \(C_{i,j}\) of size \(c_{i,j}\) (these sets correspond to \(A_i \cap B_j\) and one reconstructs \(A_i\) by \(A_i = \cup _{j} C_{i,j}\) and similarly \(B_j = \cup _{i} C_{i,j}\)). This is just the multinomial

$$\begin{aligned}\left( {\begin{array}{c}n\\ (c_{i,j}): 1 \le i \le \ell (\mu ),\, 1 \le j \le \ell (\widetilde{\mu })\end{array}}\right) = \frac{n!}{\prod _{i,j} c_{i,j}!},\end{aligned}$$

so that (2.3) simplifies to

$$\begin{aligned} N_{\mu ,\widetilde{\mu }}' = \sum _{\begin{array}{c} C \text { a matrix } \\ \text {counted by }N_{\mu ,\widetilde{\mu }} \end{array}} 1 = N_{\mu ,\widetilde{\mu }} \end{aligned}$$

as claimed. \(\square \)

In the simple case \(\widetilde{\mu }=(n)\), Proposition 2.3 reduces to \(\sum _{\pi \in S_n} d_{\mu }(\pi )/|S_n|=1\).
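Proposition 2.3 lends itself to a direct numerical check for small n. The sketch below computes \(d_{\lambda }(\pi )\) by distributing the cycles of \(\pi \) among the blocks (an invariant set is a union of cycles) and then averages over \(S_n\). The names `cycle_lengths`, `d` and `n_prime` are hypothetical, and permutations are assumed to be tuples of images on \(\{0,\ldots ,n-1\}\).

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return lengths


def d(lam, perm):
    """d_lam(perm): ordered set partitions into invariant blocks with |A_i| = lam[i]."""
    lengths = cycle_lengths(perm)
    count = 0
    # Each cycle lies entirely in one block, so try every cycle-to-block assignment.
    for labels in product(range(len(lam)), repeat=len(lengths)):
        sizes = [0] * len(lam)
        for length, block in zip(lengths, labels):
            sizes[block] += length
        if sizes == list(lam):
            count += 1
    return count


def n_prime(mu, mu_tilde):
    """Average of d_mu * d_mu_tilde over S_n, where n = sum(mu) = sum(mu_tilde)."""
    n = sum(mu)
    total = sum(d(mu, p) * d(mu_tilde, p) for p in permutations(range(n)))
    return Fraction(total, factorial(n))
```

With \(\mu =\widetilde{\mu }=(2,2)\) the average is 3, matching the three \(2\times 2\) nonnegative integer matrices with all row and column sums equal to 2.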

2.2 The characteristic map

Endow \(S_n\) with the uniform probability measure. The characteristic (or Frobenius) map \(\textrm{Ch}^{(N)}\) is a linear map from class functions on \(S_n\) to class functions on U(N), with the property that if \(n \le N\) then it is an isometry with respect to the \(L_2\)-norm, see [2, Thm. 40.1]. It may be given by

$$\begin{aligned} \textrm{Ch}^{(N)}(f) = \frac{1}{n!} \sum _{ \pi \in S_n} f(\pi ) p_{\lambda (\pi )}, \end{aligned}$$

see [2, Thm. 39.1]. Here \(\lambda (\pi )\) is the partition associated with \(\pi \) (the nondecreasing sequence, summing to n, of integers corresponding to the cycle sizes in \(\pi \)), and \(p_{\lambda }\) is the power sum symmetric polynomial associated with \(\lambda \), evaluated at the eigenvalues of \(U \in U(N)\).

Lemma 2.4

Suppose \(\lambda \vdash n\). We have

$$\begin{aligned} \textrm{Ch}^{(N)}(\textrm{sgn}\cdot d_{\lambda }) = e_{\lambda }, \end{aligned}$$

where \(\textrm{sgn}\) is the sign representation and \(e_{\lambda }\) is the elementary symmetric polynomial associated with the partition \(\lambda \).

Proof

Given \(\pi \in S_{n}\), we set \(p_{\pi } = p_{\lambda (\pi )}\). We denote by \(\ell (\lambda )\) the number of parts in \(\lambda \). We then have, by plugging (2.1) in the definition of \( \textrm{Ch}^{(N)}(\textrm{sgn}\cdot d_{\lambda })\) and interchanging order of summation,

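$$\begin{aligned} \textrm{Ch}^{(N)}(\textrm{sgn}\cdot d_{\lambda }) = \frac{1}{n!} \sum _{\begin{array}{c} A_1 \sqcup \cdots \sqcup A_{\ell (\lambda )} = [n] \\ \forall i:\, |A_i|=\lambda _i \end{array}} \sum _{\begin{array}{c} \pi \in S_{n}\\ \forall i:\,\pi (A_i)=A_i \end{array}} \textrm{sgn}(\pi )p_{\pi }. \end{aligned}$$

We claim that the inner sum is \(\big (\prod _{i} \lambda _i!\big ) e_{\lambda }\). Indeed, since \(\pi \) is determined by the restrictions \(\pi |_{A_i}\), and since \(p_{\lambda } = \prod _i p_{\lambda _i}\), we have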

$$\begin{aligned} \sum _{\begin{array}{c} \pi \in S_{n}\\ \forall i:\,\pi (A_i)=A_i \end{array}} \textrm{sgn}(\pi )p_{\pi } = \prod _{i=1}^{\ell (\lambda )} \left( \sum _{\pi _i \in S_{A_i}} \textrm{sgn}(\pi _i) p_{\pi _i} \right) = \prod _{i=1}^{\ell (\lambda )} \lambda _i! e_{\lambda _i}, \end{aligned}$$

where the last equality follows from the Newton–Girard identity \(\sum _{\pi \in S_m} \textrm{sgn}(\pi )p_{\pi }/m! = e_{m}\). To finish, note that the number of ordered set partitions of [n] into \(\ell (\lambda )\) sets of sizes \((\lambda _i)_{i=1}^{\ell (\lambda )}\) is exactly the multinomial coefficient \(\left( {\begin{array}{c}n\\ \lambda _1,\ldots , \lambda _{\ell (\lambda )}\end{array}}\right) = n!/\prod _i \lambda _i!\), so that the factors \(n!\) and \(\prod _i \lambda _i!\) cancel. \(\square \)
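The Newton–Girard identity used above can be tested numerically by evaluating both sides at integer points. This is a minimal sketch with hypothetical helper names; `xs` plays the role of the eigenvalues, and exact integer arithmetic is used to avoid rounding.

```python
from itertools import combinations, permutations
from math import factorial, prod


def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return lengths


def sign(perm):
    """Sign of a permutation: (-1)^(n - number of cycles)."""
    return (-1) ** (len(perm) - len(cycle_lengths(perm)))


def power_sum(k, xs):
    return sum(x ** k for x in xs)


def elementary(m, xs):
    """Elementary symmetric polynomial e_m evaluated at xs."""
    return sum(prod(c) for c in combinations(xs, m))


def signed_power_sum_total(m, xs):
    """Sum over pi in S_m of sgn(pi) * p_pi(xs); should equal m! * e_m(xs)."""
    return sum(
        sign(p) * prod(power_sum(k, xs) for k in cycle_lengths(p))
        for p in permutations(range(m))
    )
```

For example, with `xs = (1, 2, 3)` and `m = 2` the identity contributes \(p_1^2 = 36\) and the transposition contributes \(-p_2 = -14\), giving \(22 = 2!\, e_2(1,2,3)\).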

2.3 Conclusion of proof

Here we establish (1.1). Let \((a_1,\ldots ,a_{\ell })\) and \((b_1,\ldots ,b_{\ell })\) be sequences of nonnegative integers satisfying the condition \(\max \{\sum _{j=1}^{\ell } ja_j, \sum _{j=1}^{\ell } jb_j\} \le N\). Let \(\mu \) and \(\widetilde{\mu }\) be the partitions with \(a_j\) and \(b_j\) parts of size j, respectively.

If \(\sum _j j a_j \ne \sum _j j b_j\), it is easy to see that both sides of (1.1) vanish. Indeed, for the left-hand side, note that replacing U by \(e^{i\theta }U\) multiplies the integrand by \(e^{i\theta (\sum _j ja_j - \sum _j jb_j)}\), since \(\textrm{Sc}_j\) is a homogeneous polynomial of degree j in the eigenvalues of U; as the exponent is nonzero, the integral must vanish by translation-invariance of the Haar measure. For the right-hand side, if \(N_{\mu ,\widetilde{\mu }}\) is nonzero, we must have that \(\mu \) and \(\widetilde{\mu }\) sum to the same number (if \(A=(a_{i,j})\) is a matrix counted by \(N_{\mu ,\widetilde{\mu }}\) then both \(\mu \) and \(\widetilde{\mu }\) sum to \(\sum _{i,j} a_{i,j}\)).

Now assume \(\sum _j j a_j = \sum _j j b_j =n \le N\). As \(\prod _{j}\textrm{Sc}_j(U)^{a_j} \overline{\textrm{Sc}_j(U)^{b_j}} = {e}_{\mu } \overline{{e}_{\widetilde{\mu }}}\) by definition, the fact that \(\textrm{Ch}^{(N)}\) is an isometry if \(n \le N\) shows, through Lemma 2.4, that the integral in (1.1) is equal to

$$\begin{aligned} \frac{1}{|S_{n}|}\sum _{\pi \in S_{n}} (\textrm{sgn}\cdot d_{\mu })(\pi )\overline{\textrm{sgn}\cdot d_{\widetilde{\mu }}}(\pi ) = \frac{1}{|S_{n}|}\sum _{\pi \in S_{n}} d_{\mu }(\pi ) d_{\widetilde{\mu }}(\pi )=N_{\mu ,\widetilde{\mu }}', \end{aligned}$$

and the proof is concluded by applying Proposition 2.3.

3 Symmetric powers

Let \(\textrm{Tr}\textrm{Sym}^n(U)\) be the trace of the nth symmetric power of \(U \in U(N)\). This is also the nth complete homogeneous symmetric polynomial \(h_n\) evaluated on the eigenvalues of U.
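As a concrete illustration, if \(x_1,\ldots ,x_N\) are the eigenvalues of U then \(\textrm{Sc}_n(U)=e_n(x_1,\ldots ,x_N)\) and \(\textrm{Tr}\textrm{Sym}^n(U)=h_n(x_1,\ldots ,x_N)\). The sketch below evaluates both directly from a list of eigenvalues (the function names are hypothetical) and checks the standard relation \(\sum _{k=0}^{n}(-1)^k e_k h_{n-k}=0\) for \(n\ge 1\).

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def elementary(n, xs):
    """e_n(xs): sum of products over n distinct indices (secular coefficient)."""
    return sum(prod(c) for c in combinations(xs, n))


def complete_homogeneous(n, xs):
    """h_n(xs): sum of products over n indices with repetition (trace of Sym^n)."""
    return sum(prod(c) for c in combinations_with_replacement(xs, n))


def alternating_convolution(n, xs):
    """sum_{k=0}^{n} (-1)^k e_k h_{n-k}; vanishes for every n >= 1."""
    return sum(
        (-1) ** k * elementary(k, xs) * complete_homogeneous(n - k, xs)
        for k in range(n + 1)
    )
```

Integer inputs keep the arithmetic exact; eigenvalues on the unit circle can be passed as Python complex numbers at the cost of floating-point error.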

Lemma 3.1

Let \(( a_j)_{j=1}^{\ell }\), \((b_j)_{j=1}^{\ell }\) be sequences of nonnegative integers. We have

$$\begin{aligned} \int _{U(N)} \prod _{j=1}^{\ell } (\textrm{Tr}\textrm{Sym}^j(U))^{a_j} \overline{(\textrm{Tr}\textrm{Sym}^j(U))^{b_j}}\, dU = N_{\mu , \widetilde{\mu }} \end{aligned}$$
(3.1)

as long as \(\min \{\sum _{j=1}^{\ell } a_j,\sum _{j=1}^{\ell } b_j\} \le N\), where \(\mu \) and \(\widetilde{\mu }\) are the partitions with \(a_j\) and \(b_j\) parts of size j, respectively.

We start with the following corollary of Lemma 2.4.

Corollary 3.2

Suppose \(\lambda \vdash n\). We have \( \textrm{Ch}^{(N)}( d_{\lambda }) = h_{\lambda }\).

Proof

This follows from Lemma 2.4 through the existence of an involution \(\iota \) on the space of symmetric polynomials, with the properties \(\iota (\textrm{Ch}^{(N)}(f))=\textrm{Ch}^{(N)}(\textrm{sgn}\cdot f) \) [2, Thm. 39.3] and \(\iota (e_{\lambda }) = h_{\lambda }\) [2, Thm. 36.3]. Alternatively, one may repeat the proof of Lemma 2.4 with the Newton–Girard identity \(\sum _{\pi \in S_m} p_{\pi }/m! = h_{m}\). \(\square \)

At this point we can deduce (3.1) in the restricted range \(\max \{\sum _{j=1}^{\ell } ja_j, \sum _{j=1}^{\ell } jb_j \}\le N\), in the same way we proved (1.1).

Next we prove the following well-known identity, often proved as a consequence of the RSK correspondence. Recall that for given partitions \(\lambda \) and \(\mu \), the Kostka number \(K_{\lambda ,\mu }\) is defined as the number of semistandard Young tableaux (SSYTs) of shape \(\lambda \) and weight \(\mu \) [7, §2.3].

Lemma 3.3

Given \(\mu ,\widetilde{\mu } \vdash n\) we have

$$\begin{aligned} \sum _{\lambda \vdash n} K_{\lambda ,\mu }K_{\lambda ,\widetilde{\mu }} = N_{\mu ,\widetilde{\mu }}. \end{aligned}$$
(3.2)

Proof

We may expand \(e_{\mu }\) and \(e_{\widetilde{\mu }}\) in the Schur basis, see [27, p. 335]:

$$\begin{aligned} e_{\mu } = \sum _{\lambda \vdash n} K_{\lambda ', \mu } s_{\lambda }, \qquad e_{\widetilde{\mu }} = \sum _{\lambda \vdash n} K_{\lambda ', \widetilde{\mu }} s_{\lambda }, \end{aligned}$$
(3.3)

where \(\lambda '\) is the conjugate of \(\lambda \). Let \(a_j\) and \(b_j\) be the number of js in \(\mu \) and \(\widetilde{\mu }\), respectively. Orthogonality of Schur functions [7, Eq. (22)] implies that

$$\begin{aligned} \int _{U(n)} \prod _{j=1}^{\ell } \textrm{Sc}_j(U)^{a_j} \overline{\textrm{Sc}_j(U)^{b_j}}\, dU = \sum _{\lambda \vdash n} K_{\lambda ',\mu }K_{\lambda ',\widetilde{\mu }}=\sum _{\lambda \vdash n} K_{\lambda ,\mu }K_{\lambda ,\widetilde{\mu }}. \end{aligned}$$

On the other hand, this integral was shown to equal \(N_{\mu ,\widetilde{\mu }}\) in (1.1). \(\square \)
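Identity (3.2) can be verified for small n by computing Kostka numbers recursively: in an SSYT of weight \(\mu \), the cells containing the largest label form a horizontal strip, and removing them leaves an SSYT of smaller weight. The sketch below uses this standard recursion; all function names are hypothetical, and partitions are assumed to be tuples of positive integers in weakly decreasing order.

```python
from functools import lru_cache


def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def horizontal_strip_removals(lam, size):
    """Yield nu inside lam such that lam/nu is a horizontal strip with `size` cells."""
    def go(i, remaining):
        if i == len(lam):
            if remaining == 0:
                yield ()
            return
        lower = lam[i + 1] if i + 1 < len(lam) else 0
        for nu_i in range(lower, lam[i] + 1):
            removed = lam[i] - nu_i
            if removed <= remaining:
                for rest in go(i + 1, remaining - removed):
                    yield (nu_i,) + rest

    for nu in go(0, size):
        yield tuple(part for part in nu if part > 0)


@lru_cache(maxsize=None)
def kostka(lam, mu):
    """Number of SSYTs of shape lam and weight mu."""
    if not mu:
        return 1 if not lam else 0
    return sum(
        kostka(nu, mu[:-1]) for nu in horizontal_strip_removals(lam, mu[-1])
    )


def kostka_square_sum(mu, mu_tilde):
    """Left-hand side of (3.2)."""
    return sum(
        kostka(lam, mu) * kostka(lam, mu_tilde) for lam in partitions(sum(mu))
    )
```

For \(\mu =\widetilde{\mu }=(1,1,1)\) the sum is \(1^2+2^2+1^2=6\), the number of \(3\times 3\) permutation matrices.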

We now prove Lemma 3.1.

Proof

The case \(\sum _{j=1}^{\ell } ja_j\ne \sum _{j=1}^{\ell } jb_j\) is treated as in the secular coefficients case. Next, assume that \(\sum _j ja_j= \sum _j jb_j =n\) and \(\min \{\sum _{j=1}^{\ell } a_j,\sum _{j=1}^{\ell } b_j\} \le N\). The multiset of eigenvalues of \(\textrm{Sym}^j (U)\) consists of the products of j eigenvalues of U (repetitions allowed), so \(\textrm{Tr}\textrm{Sym}^j(U)=h_j\) and the integrand in the left-hand side of (3.1) is \(h_{\mu }\overline{h_{\widetilde{\mu }}}\). We may expand \(h_{\mu }\) in the Schur basis, see Stanley [27, Cor. 7.12.4]:

$$\begin{aligned} h_{\mu } = \sum _{\lambda \vdash n} K_{\lambda , \mu } s_{\lambda }. \end{aligned}$$

Orthogonality of Schur functions implies that the left-hand side of (3.1) is

$$\begin{aligned} \sum _{\begin{array}{c} \lambda \vdash n\\ \ell (\lambda )\le N \end{array}} K_{\lambda , \mu } K_{\lambda , \widetilde{\mu }}. \end{aligned}$$
(3.4)

We claim \(K_{\lambda ,\mu } \ne 0\) implies \(\ell (\lambda ) \le \ell (\mu )\) (see e.g. [27, Prop. 7.10.5]). This follows from the definition of Kostka numbers: the first column of an SSYT counted by \(K_{\lambda ,\mu }\) contains \(\ell (\lambda )\) increasing positive integers less than or equal to \(\ell (\mu )\), hence the implication.

As \(\min \{\ell (\mu ),\ell (\widetilde{\mu })\} = \min \{ \sum _j a_j, \sum _j b_j \} \le N\) by assumption, we deduce (3.4) is equal to the full sum \(\sum _{\lambda \vdash n} K_{\lambda , \mu } K_{\lambda , \widetilde{\mu }}\) and the proof is concluded by (3.2). \(\square \)

Remark 3.4

Lemma 3.1 may also be derived from a theorem of Baxter on Toeplitz determinants for certain generating functions [1] (cf. [16, Prop. 2.11]), special cases of which appeared in earlier works of Szegő and Onsager. Concretely, Baxter proved that (originally in the language of Toeplitz determinants) if \(\min \{\ell ,m\} \le N\) then

$$\begin{aligned} \int _{U(N)} \frac{1}{\prod _{j=1}^{\ell }\det (I-a_j U)}\frac{1}{\prod _{i=1}^{m}\det (I-b_i \overline{U})}dU = \prod _{i=1}^{m} \prod _{j=1}^{\ell }\frac{1}{1-a_j b_i} \end{aligned}$$
(3.5)

for complex \(|a_j|<1\) and \(|b_i|<1\). Expanding the rational functions in both sides as power series and comparing coefficients, one obtains Lemma 3.1. In a sense, the appearance of magic squares in random matrix theory could have been anticipated due to (3.5).
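To make the comparison of coefficients explicit, here is a sketch of the expansion; the exponents \(r_j\), \(s_i\) and the matrix entries \(c_{i,j}\) are notation introduced only for this illustration. By the generating function of the complete homogeneous symmetric polynomials,

$$\begin{aligned} \frac{1}{\det (I-aU)} = \sum _{n\ge 0} a^n h_n(U) = \sum _{n\ge 0} a^n\, \textrm{Tr}\textrm{Sym}^n(U), \qquad |a|<1, \end{aligned}$$

so the coefficient of \(\prod _{j} a_j^{r_j} \prod _{i} b_i^{s_i}\) in the left-hand side of (3.5) is \(\int _{U(N)} \prod _{j=1}^{\ell } \textrm{Tr}\textrm{Sym}^{r_j}(U) \prod _{i=1}^{m} \overline{\textrm{Tr}\textrm{Sym}^{s_i}(U)}\, dU\). On the right-hand side, expanding each factor as a geometric series gives

$$\begin{aligned} \prod _{i=1}^{m} \prod _{j=1}^{\ell }\frac{1}{1-a_j b_i} = \sum _{(c_{i,j})} \prod _{i,j} (a_j b_i)^{c_{i,j}}, \end{aligned}$$

where the sum is over all \(m \times \ell \) matrices with nonnegative integer entries. The coefficient of the same monomial therefore counts the matrices \((c_{i,j})\) with \(\sum _{i} c_{i,j} = r_j\) and \(\sum _{j} c_{i,j} = s_i\), that is, matrices with prescribed row and column sums.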

Remark 3.5

A weaker version of Lemma 3.1, with \(\max \) in place of \(\min \), may be derived from formulas for averages of ratios of characteristic polynomials [3, 6].

Remark 3.6

See [22, Thm. 1.6] for a variant of Lemma 3.1 and (3.5), where one works in the circular \(\beta \)-ensemble and takes \(N \rightarrow \infty \).

4 General moments

Given a matrix with nonnegative integer entries, an SE-chain is a sequence of entries in which each entry is located weakly to the right of and weakly below the preceding entry. The length of an SE-chain is defined as the sum of elements in it. An ne-chain is a sequence of nonzero entries in which each entry is strictly to the right of and strictly above the preceding entry. The length of an ne-chain is defined as the number of elements in it.

The RSK correspondence is a bijection from the set of matrices with nonnegative integers to the set \(\{ (P_1,P_2): P_i \text { are SSYTs with the same shape}\}\), see [27, Ch. 7.11] for its description. The bijection takes a matrix A to a pair \((P_1,P_2)\) where the weight of \(P_1\) is \(\textrm{row}(A)\) and the weight of \(P_2\) is \(\textrm{col}(A)\). We denote by \(\lambda \) the common shape of \(P_1\) and \(P_2\). A theorem of Schensted [25] (cf. Theorem 8 of Krattenthaler [21] with \(k=1\)) tells us that the largest part of \(\lambda \) (resp. the number of parts in \(\lambda \)) equals the length of the longest SE-chain (resp. ne-chain) of A.
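Both chain lengths can be computed by dynamic programming. The sketch below assumes a nonempty matrix stored as a list of rows indexed from top to bottom, so that "below" means a larger row index and "above" a smaller one; the function names are hypothetical.

```python
def longest_se_chain(matrix):
    """Largest sum of entries along a chain moving weakly right and weakly down."""
    rows, cols = len(matrix), len(matrix[0])
    best = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Entries are nonnegative, so an optimal chain follows a lattice path.
            from_above = best[i - 1][j] if i > 0 else 0
            from_left = best[i][j - 1] if j > 0 else 0
            best[i][j] = matrix[i][j] + max(from_above, from_left)
    return best[-1][-1]


def longest_ne_chain(matrix):
    """Largest number of nonzero entries in a chain moving strictly right and strictly up."""
    rows, cols = len(matrix), len(matrix[0])
    best = [[0] * cols for _ in range(rows)]
    for j in range(cols):  # process columns left to right
        for i in range(rows):
            if matrix[i][j] != 0:
                # The preceding entry lies strictly below and strictly to the left.
                prev = max(
                    (best[r][c] for r in range(i + 1, rows) for c in range(j)),
                    default=0,
                )
                best[i][j] = 1 + prev
    return max(max(row) for row in best)
```

For the \(2\times 2\) identity matrix the longest SE-chain has length 2 and the longest ne-chain has length 1; for the anti-diagonal permutation matrix the roles are reversed.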

Proposition 4.1

Let \(N \ge 1\). Given sequences \((a_1,\ldots ,a_{\ell })\) and \((b_1,\ldots ,b_{\ell })\) of nonnegative integers, let \(\mu \) and \(\widetilde{\mu }\) be the partitions with \(a_j\) and \(b_j\) parts of size j, respectively. The integral

$$\begin{aligned} \int _{U(N)} \prod _{j=1}^{\ell } \textrm{Sc}_j(U)^{a_j} \overline{\textrm{Sc}_j(U)^{b_j}}\, dU \end{aligned}$$
(4.1)

is equal to the number of \(\ell (\mu ) \times \ell (\widetilde{\mu })\) matrices A with nonnegative integer entries such that \(\textrm{row}(A)=\mu \), \(\textrm{col}(A) = \widetilde{\mu }\) and the longest SE-chain in A has length \(\le N\). The integral

$$\begin{aligned} \int _{U(N)} \prod _{j=1}^{\ell } (\textrm{Tr}\textrm{Sym}^j(U))^{a_j} \overline{(\textrm{Tr}\textrm{Sym}^j(U))^{b_j}}\, dU \end{aligned}$$
(4.2)

is equal to the number of \(\ell (\mu ) \times \ell (\widetilde{\mu })\) matrices A with nonnegative integer entries such that \(\textrm{row}(A)=\mu \), \(\textrm{col}(A) = \widetilde{\mu }\) and the longest ne-chain in A has length \(\le N\).

Proposition 4.1 generalizes (1.1) and Lemma 3.1. If \(a_j=b_j=0\) for \(j\ge 2\) and \(a_1=b_1=n\), Proposition 4.1 shows

$$\begin{aligned} \int _{U(N)} |\textrm{Tr}(U)^n|^2\, dU \end{aligned}$$

is equal to the number of permutations in \(S_n\) with longest increasing subsequence of length \(\le N\), a result of Rains [23].
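In this special case the Kostka numbers \(K_{\lambda ,(1^n)}\) count standard Young tableaux, so the sum (3.4) becomes \(\sum _{\lambda \vdash n,\, \ell (\lambda )\le N} (f^{\lambda })^2\) with \(f^{\lambda }\) given by the hook length formula. The sketch below compares this sum with a brute-force count of permutations by longest increasing subsequence; it does not evaluate the matrix integral itself, and the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import permutations
from math import factorial


def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def num_standard_tableaux(lam):
    """Number of standard Young tableaux of shape lam (hook length formula)."""
    conj = [sum(1 for part in lam if part > j) for j in range(lam[0])] if lam else []
    hooks = 1
    for i, part in enumerate(lam):
        for j in range(part):
            arm = part - j - 1
            leg = conj[j] - i - 1
            hooks *= arm + leg + 1
    return factorial(sum(lam)) // hooks


def lis_length(perm):
    """Length of the longest increasing subsequence (patience sorting)."""
    tails = []
    for x in perm:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def count_by_lis(n, bound):
    return sum(1 for p in permutations(range(n)) if lis_length(p) <= bound)


def count_by_tableaux(n, bound):
    return sum(
        num_standard_tableaux(lam) ** 2
        for lam in partitions(n)
        if len(lam) <= bound
    )
```

For \(n=4\) and \(N=2\) both counts equal 14, the number of permutations of four elements with no increasing subsequence of length 3.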

Proof of Proposition 4.1

The proof of the evaluation of (4.1) is almost the same as the proof of (1.1) in [7], where \(N \ge \max \{\sum _j j a_j, \sum _j jb_j\}\) was imposed. Instead of imposing this we invoke Schensted’s theorem at the end of the proof:

As in the proof of (1.1) we may assume \(\sum _j ja_j = \sum _j jb_j=n\) for some \(n\ge 0\). Since \(\prod _{j=1}^{\ell } \textrm{Sc}_j(U)^{a_j}=e_{\mu }(U)\) and \(\prod _{j=1}^{\ell } \textrm{Sc}_j(U)^{b_j}=e_{\widetilde{\mu }}(U)\), the expansions in (3.3) together with orthogonality of Schur functions [7, Eq. (22)] imply that (4.1) equals

$$\begin{aligned} \sum _{\begin{array}{c} \lambda \vdash n \\ \ell (\lambda ) \le N \end{array}} K_{\lambda ',\mu }K_{\lambda ',\widetilde{\mu }}=\sum _{\begin{array}{c} \lambda \vdash n \\ \lambda _1 \le N \end{array}} K_{\lambda ,\mu }K_{\lambda ,\widetilde{\mu }}, \end{aligned}$$

i.e. the number of pairs (P, Q) of SSYTs where P has weight \(\mu \), Q has weight \(\widetilde{\mu }\), and P and Q have a common shape \(\lambda \vdash n\) such that \(\lambda _1 \le N\). By the RSK correspondence and Schensted’s theorem [25], such pairs are in one-to-one correspondence with the matrices described in the first part of the proposition.

Now consider (4.2). As we saw in the proof of Lemma 3.1, (4.2) is equal to the sum in (3.4), i.e. the number of pairs (P, Q) of SSYTs where P has weight \(\mu \), Q has weight \(\widetilde{\mu }\), and P and Q have a common shape \(\lambda \) such that \(\lambda \vdash n\) and \(\ell (\lambda ) \le N\). By the RSK correspondence and Schensted’s theorem [25], such pairs are in one-to-one correspondence with the matrices described in the second part of the proposition. \(\square \)

Let

$$\begin{aligned} I_k(n;N):= \int _{U(N)}|[u^n]\det (I+uU)^k|^2\, dU = \int _{U(N)}\left| \sum _{j_1+\ldots +j_k=n}\prod _{i=1}^{k} \textrm{Sc}_{j_i}(U)\right| ^2\, dU. \end{aligned}$$

Summing (4.1) over all sequences \((a_j)_j\) and \((b_j)_j\) of nonnegative integers such that \(\sum _j ja_j=\sum _j jb_j=n\), \(\sum _j a_j = \sum _j b_j=k\) and applying the first part of Proposition 4.1, we obtain

Corollary 4.2

Let \(n,k,N\ge 1\). Then \(I_k(n;N)\) is equal to the number of \(k\times k\) matrices with nonnegative integer entries that sum to n and whose longest SE-chain has length \(\le N\).

The integral \(I_k(n;N)\) was studied extensively in [19]. According to Theorem 1.5 of [19],

$$\begin{aligned} I_k(n;N) = N^{k^2-1}(\gamma _k(n/N)+O_k(1/N)) \end{aligned}$$

holds uniformly for \(n,N\ge 1\), where \(\gamma _k:\mathbb {R}_{\ge 0} \rightarrow \mathbb {R}_{\ge 0}\) is an explicit function supported on [0, k] [19, Eq. (1.12)]. Thus, in view of Corollary 4.2, if we pick uniformly at random a \(k\times k\) matrix with nonnegative integer entries that sum to n, the probability that its longest SE-chain has length \(\le N\) can be shown to be

$$\begin{aligned} \frac{I_k(n;N)}{I_k(n;n)} = \frac{\gamma _k(n/N)}{\gamma _k(1)(n/N)^{k^2-1}}+O_k(1/n). \end{aligned}$$

Theorem 1.4 of [19] (cf. [24, Thm. 1.5]) shows \(I_k(n;N)\) equals the number of arrays \((x_{i,j})_{1\le i,j\le k}\) of nonnegative integers satisfying \(x_{1,1}\le N\), \(\sum _{i=1}^{k} x_{i,i} = n\), and \(x_{i,j}\) is weakly decreasing when either i or j is fixed. We give a similar description of the integrals in Proposition 4.1 using an idea of Rodgers [24, p. 1270].

Proposition 4.3

Let \(N \ge 1\). Given sequences \((a_1,\ldots ,a_{\ell })\) and \((b_1,\ldots ,b_{\ell })\) of nonnegative integers, let \(\mu \) and \(\widetilde{\mu }\) be the partitions with \(a_j\) and \(b_j\) parts of size j, respectively. The integral (4.1) (resp. (4.2)) equals the number of arrays \((x_{i,j})_{1\le i\le \ell (\mu ),\,1\le j \le \ell (\widetilde{\mu })}\) of nonnegative integers satisfying each of the following conditions:

  1. \(x_{1,1} \le N\) (resp. \(x_{i,i}=0\) for \(N+1 \le i \le \min \{\ell (\mu ),\ell (\widetilde{\mu })\}\)),

  2. \(x_{i,j}\) is weakly decreasing when either i or j is fixed,

  3. \(\sum _{j-i = r-\ell (\mu )} x_{i,j}=\mu _1+\ldots +\mu _r\) for \(1 \le r\le \ell (\mu )\) and \(\sum _{i-j = s-\ell (\widetilde{\mu })} x_{i,j}=\widetilde{\mu }_1+\ldots +\widetilde{\mu }_s\) for \(1 \le s \le \ell (\widetilde{\mu })\).

Proof

If \(\sum _j j a_j \ne \sum _j j b_j\), i.e. \(\sum _i \mu _i \ne \sum _i \widetilde{\mu }_i\), then the integrals (4.1) and (4.2) vanish as in the proof of (1.1). In this case there can be no arrays satisfying the third condition with \((r,s)=(\ell (\mu ),\ell (\widetilde{\mu }))\), which is what we needed to show. From now on we assume that \(\sum _j j a_j = \sum _j j b_j=n\) for some n.

A Gelfand–Tsetlin pattern (or GT-pattern) of k rows is a triangular array of nonnegative integers \((a_{i,j})_{1 \le i\le j \le k}\) with \(a_{i,j}\le a_{i+1,j+1}\le a_{i,j+1}\) when \(1\le i\le j\le k-1\); its largest element is \(a_{1,k}\). SSYTs with entries in \(\{1,2,\ldots ,k\}\) are in one-to-one correspondence, described in detail by Stanley [27, pp. 313–314], with such arrays. SSYTs of shape \(\lambda =(\lambda _1,\ldots ,\lambda _m)\) and weight \((w_1,\ldots ,w_r)\) (\(r \le k\)) correspond to GT-patterns of k rows such that \(a_{1,i}= \lambda _{k-i+1}\) and \(\sum _{j\in [i,k]} a_{i,j}=w_1+w_2+\ldots +w_{k-i+1}\) for all \(1 \le i \le k\) (here \(w_i\equiv 0\) if \(i>r\), \(\lambda _i\equiv 0\) if \(i>m\)). In words, the first row of the pattern recovers the shape of the SSYT (by reversing the row), and the row sums recover the partial sums of the weight.

In the proof of Proposition 4.1 we saw that (4.1) (resp. (4.2)) equals the number of pairs (P, Q) of SSYTs where P has weight \(\mu \), Q has weight \(\widetilde{\mu }\), and P and Q have the same shape. This common shape, let us call it \(\lambda \), satisfies \(\lambda \vdash n\) and \(\lambda _1 \le N\) (resp. \(\ell (\lambda ) \le N\)). Necessarily \(\ell (\lambda )\le \min \{\ell (\mu ),\ell (\widetilde{\mu })\}\), otherwise there are no such pairs. We apply the above one-to-one correspondence with GT-patterns to obtain all pairs \(((a_{i,j})_{1\le i \le j\le \ell (\mu )},(b_{i,j})_{1\le i \le j \le \ell (\widetilde{\mu })})\) of patterns, such that their (common) first row is \(\lambda \) (in reverse order), and their respective row sums encode \(\mu \) and \(\widetilde{\mu }\).

Assume without loss of generality that \(\ell (\mu )\ge \ell (\widetilde{\mu })\). We make the following observation: since \(\ell (\lambda ) \le \ell (\widetilde{\mu })\), it follows that \(a_{1,i}=0\) for all \(1 \le i \le \ell (\mu )- \ell (\widetilde{\mu })\). Due to the property \(a_{i,j}\le a_{i+1,j+1}\le a_{i,j+1}\), this forces \(a_{i,j}=0\) for all \(1 \le i \le j \le \ell (\mu )-\ell (\widetilde{\mu })\).

Next we define an array \((x_{i,j})_{1 \le i\le \ell (\mu ),\, 1\le j \le \ell (\widetilde{\mu })}\), satisfying the three conditions in the proposition, by \(x_{i,j}:=a_{1+i-j,\ell (\mu )+1-j}\) if \(i \ge j\) and \(x_{i,j}:=b_{1+j-i,\ell (\widetilde{\mu })+1-i}\) if \(i \le j\), giving the desired result.

In the secular coefficient case, the condition \(\lambda _1 \le N\) is encoded by the inequalities \(a_{1,\ell (\mu )}=b_{1,\ell (\widetilde{\mu })}\le N\) which become \(x_{1,1}\le N\). In the symmetric traces case, the condition \(\ell (\lambda )\le N\) is encoded by the relations \(a_{1,i}=0\) for \(1 \le i \le \ell (\mu )-N\) and \(b_{1,i}=0\) for \(1 \le i \le \ell (\widetilde{\mu })-N\) (recall \(\lambda \) – in reverse order – is the first row of the patterns \((a_{i,j})_{1\le i \le j \le \ell (\mu )}\) and \((b_{i,j})_{1\le i \le j \le \ell (\widetilde{\mu })}\)), which become \(x_{i,i}=0\) for \(N+1 \le i \le \ell (\widetilde{\mu })\). \(\square \)

Summing Proposition 4.3 over all sequences \((a_j)_j\) and \((b_j)_j\) of k nonnegative integers such that \(\sum _j ja_j=\sum _j jb_j = n\) and \(\sum _j a_j =\sum _j b_j= k\), we recover Theorem 1.4 of [19].

5 Number-theoretic connections

5.1 A polynomial analogue of \(d_{\lambda }\)

Let \(\lambda \vdash n\). Over \(\mathbb {F}_q[T]\), the polynomial ring over the finite field of q elements, it is straightforward to define an analogue of \(d_{\lambda }\). Letting \(\mathcal {M}_{n,q} \subseteq \mathbb {F}_q[T]\) be the subset of monic polynomials of degree n, we set

$$\begin{aligned} d_{\lambda ,q}(f):=\sum _{\begin{array}{c} f_1 \cdots f_{\ell (\lambda )}=f \\ \forall i:\, \deg (f_i) = \lambda _i,\, f_i \text { monic} \end{array}} 1. \end{aligned}$$

If \(\mu \), \(\widetilde{\mu }\) are partitions of n then we claim

$$\begin{aligned} \lim _{q\rightarrow \infty }\frac{1}{q^n} \sum _{f \in \mathcal {M}_{n,q}} d_{\mu ,q}(f) d_{\widetilde{\mu },q}(f) = \frac{1}{|S_n|} \sum _{ \pi \in S_n} d_{\mu }(\pi ) d_{\widetilde{\mu }}(\pi )=N_{\mu ,\widetilde{\mu }}. \end{aligned}$$
(5.1)

The second equality in (5.1) is just Proposition 2.3. The first equality in (5.1) is an instance of a general principle that applies to a general class of functions. However, the fact that the left-hand side of (5.1) equals the right-hand side can be established directly, as we sketch now. By definition, \(\sum _{f \in \mathcal {M}_{n,q}} d_{\mu ,q}(f) d_{\widetilde{\mu },q}(f)\) counts solutions to the equation

$$\begin{aligned} f_1 f_2 \cdots f_{\ell (\mu )} = g_1 g_2 \cdots g_{\ell (\widetilde{\mu })} \end{aligned}$$

where \(\deg f_i = \mu _i\) and \(\deg g_j = \widetilde{\mu }_j\). We explain how to count such solutions. We can associate with any pair of tuples of solutions \((f_i)_i\) and \((g_j)_j\) a gcd matrix

$$\begin{aligned} h_{i,j}:= \gcd (g_i,f_j), \end{aligned}$$

analogous to the matrix constructed in the proof of Proposition 2.3. At least if the \((g_i)_i\) are pairwise coprime and so are \((f_j)_j\), we may reconstruct them from \(h_{i,j}\) via \(g_i = \prod _{j}h_{i,j}\) and \(f_j = \prod _{i} h_{i,j}\). Fortunately, in the large-q limit we can impose these coprimality conditions and only incur an error of \(o_{q \rightarrow \infty }(q^{n})\) (the details are left to the reader).

Given a gcd matrix \((h_{i,j})_{i,j}\) (coming from pairwise coprime \((g_i)_i\) and pairwise coprime \((f_j)_j\)) we can further form a degree matrix \((\deg h_{i,j})_{i,j}\) of the same size, which is counted by \(N_{\mu ,\widetilde{\mu }}\). For each degree matrix \((d_{i,j})_{i,j}\) counted by \(N_{\mu ,\widetilde{\mu }}\) we need to count the number of gcd matrices corresponding to it, that is, monic \(h_{i,j}\) with \(\deg h_{i,j}=d_{i,j}\) and \(h_{i,j}\) being pairwise coprime (a consequence of the pairwise coprimality of \((g_i)_i\) and \((f_j)_j\)). As already mentioned, in the large-q limit these coprimality conditions do not affect asymptotics, and the number of such \(h_{i,j}\) is \(q^{\sum _{i,j} d_{i,j}}(1+o_{q \rightarrow \infty }(1))=q^{n}(1+o_{q \rightarrow \infty }(1))\) (so, always asymptotic to \(q^n\), regardless of the specific \((d_{i,j})_{i,j}\)). All in all, we obtain that the left-hand side of (5.1) is equal to the right-hand side, without using the first equality in (5.1).
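For small prime q the left-hand side of (5.1) can be computed exactly by enumerating factorizations. The sketch below assumes q is prime, so that arithmetic modulo q realizes \(\mathbb {F}_q\), and represents a polynomial by its tuple of coefficients, lowest degree first; all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def poly_mul(f, g, q):
    """Product of two coefficient tuples (lowest degree first), reduced mod q."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return tuple(out)


def monic_polys(degree, q):
    """Yield all monic polynomials of the given degree over Z/q."""
    for lower in product(range(q), repeat=degree):
        yield lower + (1,)


def d_counts(lam, q):
    """Counter mapping each monic f to d_{lam,q}(f)."""
    counts = Counter()
    for factors in product(*(list(monic_polys(part, q)) for part in lam)):
        f = (1,)
        for factor in factors:
            f = poly_mul(f, factor, q)
        counts[f] += 1
    return counts


def correlation(mu, mu_tilde, q):
    """Sum over monic f of degree n of d_{mu,q}(f) * d_{mu_tilde,q}(f)."""
    d_mu, d_mu_tilde = d_counts(mu, q), d_counts(mu_tilde, q)
    return sum(count * d_mu_tilde[f] for f, count in d_mu.items())
```

For \(\mu =\widetilde{\mu }=(1,1)\), counting pairs of multisets of roots by hand gives \(2q^2-q\), so `correlation((1, 1), (1, 1), q) / q**2` approaches \(N_{(1,1),(1,1)}=2\) as q grows.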

This idea can be made to work for fixed q as well using a clever construction of Vaughan and Wooley [28, §8], which can be used to show that in fact \(\sum _{f \in \mathcal {M}_{n,q}} d_{\mu ,q}(f) d_{\widetilde{\mu },q}(f)\) is a polynomial in q of degree n and leading coefficient \(N_{\mu ,\widetilde{\mu }}\), but we do not give details. They construct matrices which take into account common factors: they use an inductive process to associate with a solution of \(m_1 \cdots m_{\ell (\mu )} = n_1 \cdots n_{\ell (\widetilde{\mu })}\) an \(\ell (\mu ) \times \ell (\widetilde{\mu })\) matrix \((a_{i,j})_{i,j}\) such that \(m_r=\prod _{i=1}^{\ell (\widetilde{\mu })}a_{r,i}\) and \(n_r=\prod _{i=1}^{\ell (\mu )}a_{i,r}\); this also works with polynomials instead of integers. The process goes by letting \(a_{1,1}=\gcd (m_1,n_1)\) and then defining, using induction on \(i+j\), \(a_{i,j}=\gcd (m_i/\prod _{t<j}a_{i,t},n_j/\prod _{t<i}a_{t,j})\). The above process was discovered independently by Granville and Soundararajan in the proof of [10, Thm. 4]. We explain how these gcd matrices arose in their work because it is not too difficult to relate it to matrix integrals. They were interested in the order of magnitude of moments of character sums:

$$\begin{aligned} M_k(x,q) = \frac{1}{\phi (q)}\sum _{\chi \bmod q} \left| \sum _{n \le x} \chi (n)\right| ^{2k}. \end{aligned}$$

Here the sum is over all Dirichlet characters modulo q. Using orthogonality of characters, we have

$$\begin{aligned} M_k(x,q) = \#\{ n_1 n_2 \cdots n_k \equiv m_1 m_2 \cdots m_k \bmod q,\, \forall i:\, n_i,m_i \le x,\, (n_im_i,q)=1\}. \end{aligned}$$

If \(x^k \le q\), this reduces to

$$\begin{aligned} M_k(x,q) = \#\{ n_1 n_2 \cdots n_k = m_1 m_2 \cdots m_k,\, \forall i:\, n_i,m_i \le x,\, (n_im_i,q)=1\}. \end{aligned}$$
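The passage from the congruence count to the equality count can be checked numerically for tiny parameters. The Python sketch below is illustrative only; the function names `congruence_count` and `equality_count` are ours, and the enumeration is brute force.

```python
from collections import Counter
from itertools import product
from math import gcd, prod


def congruence_count(x, k, q):
    """Count tuples with n_1...n_k = m_1...m_k mod q, entries <= x, coprime to q."""
    units = [n for n in range(1, x + 1) if gcd(n, q) == 1]
    # Group the k-fold products by residue class; pairs of tuples in the same
    # class are exactly the solutions of the congruence.
    residues = Counter(prod(t) % q for t in product(units, repeat=k))
    return sum(c * c for c in residues.values())


def equality_count(x, k, q):
    """Count tuples with n_1...n_k = m_1...m_k as integers, same constraints."""
    units = [n for n in range(1, x + 1) if gcd(n, q) == 1]
    products = Counter(prod(t) for t in product(units, repeat=k))
    return sum(c * c for c in products.values())


# x^k = 9 <= 11: the two counts agree.
print(congruence_count(3, 2, 11), equality_count(3, 2, 11))
# x^k = 9 > 7: the congruence has extra solutions (for example 9 = 2 mod 7).
print(congruence_count(3, 2, 7), equality_count(3, 2, 7))
```

When \(x^k \le q\), two products that are congruent modulo q and both at most \(x^k\) must coincide, which is why the first pair of counts agrees.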

They were also interested in moments of sums of the (Steinhaus) random multiplicative function, which we denote by \(\alpha \) and whose definition we now recall. It is a random completely multiplicative function (\(\alpha (nm)=\alpha (n)\alpha (m)\) for all \(n,m\ge 1\)), chosen in such a way that \((\alpha (p))_p\) (p prime) are i.i.d. random variables taking values uniformly on \(\{z \in \mathbb {C}: |z|=1\}\). Then we have the orthogonality relation

$$\begin{aligned} \mathbb {E} \alpha (n) \overline{\alpha }(m) = \delta _{nm} \end{aligned}$$
(5.2)

and similarly,

$$\begin{aligned} M_k(x):= \mathbb {E}\left| \sum _{n \le x} \alpha (n)\right| ^{2k}= \#\{ n_1 n_2 \cdots n_k = m_1 m_2 \cdots m_k,\, \forall i:\, n_i,m_i \le x\}. \end{aligned}$$

The gcd matrices one associates with the solutions counted by \(M_k(x,q)\) (if \(x^k \le q\)) and \(M_k(x)\) can be counted and in turn lead to bounds on \(M_k(x,q)\) and \(M_k(x)\) as in [10, Thms. 4.1–4.2]. We refer the reader to Heap and Lindqvist [15] for asymptotic results for \(M_k(x,q)\) (\(x^k \le q\)) and \(M_k(x)\) (cf. Harper, Nikeghbali and Radziwiłł [14]). See also Harper [12, 13] for estimates on \(M_k(x)\) in a wide range of \(k\ge 0\), including noninteger k.
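The inductive gcd construction described earlier, \(a_{i,j}=\gcd (m_i/\prod _{\ell<j}a_{i,\ell },n_j/\prod _{\ell <i}a_{\ell ,j})\), can be sketched for integers as follows. This is a minimal illustration under our own conventions (the name `gcd_matrix` and the row-by-row order are ours); any order in which the entries to the left and above are computed first gives the same matrix.

```python
from math import gcd, prod


def gcd_matrix(ms, ns):
    """Return (a_{i,j}) with row products ms and column products ns."""
    if prod(ms) != prod(ns):
        raise ValueError("the two products must be equal")
    rows_left = list(ms)  # m_i divided by the entries already placed in row i
    cols_left = list(ns)  # n_j divided by the entries already placed in column j
    a = [[1] * len(ns) for _ in ms]
    for i in range(len(ms)):
        for j in range(len(ns)):
            g = gcd(rows_left[i], cols_left[j])
            a[i][j] = g
            rows_left[i] //= g
            cols_left[j] //= g
    return a


# Solution 6 * 10 = 4 * 15 of m_1 m_2 = n_1 n_2.
matrix = gcd_matrix([6, 10], [4, 15])
print(matrix)
print([prod(row) for row in matrix], [prod(col) for col in zip(*matrix)])
```

Prime by prime, the exponents are filled greedily, so each row is exhausted before the next one starts; this is why the row and column products recover the original \(m_i\) and \(n_j\).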

In function fields, the connection between character sums and secular coefficients is natural. If \(\chi \) is a nonprincipal Dirichlet character modulo Q, we can form its Dirichlet L-function

$$\begin{aligned} L(u,\chi ) = \sum _{f \text { monic}} \chi (f)u^{\deg f}, \end{aligned}$$

which is a polynomial whose nth coefficient is exactly a character sum. By Weil’s Riemann hypothesis, we can see that

$$\begin{aligned} \sum _{f \in \mathcal {M}_{n,q}} \chi (f) \ll _{n,\deg Q} q^{\frac{n}{2}}. \end{aligned}$$
(5.3)

If \(\chi \) is odd (i.e. \(\chi \) is not trivial on \(\mathbb {F}_q^{\times }\)) and primitive we have \(\deg L(u,\chi )= \deg Q - 1\) and the zeros of L all lie on \(|u|=q^{-1/2}\) so that we can write \(L(u,\chi )\) as a characteristic polynomial of a scaled unitary matrix:

$$\begin{aligned} L(u,\chi ) = \det (I_{\deg Q-1} - u\sqrt{q} \Theta _{\chi }),\, \Theta _{\chi } \in U(\deg Q-1). \end{aligned}$$

Comparing coefficients, we find that

$$\begin{aligned} \textrm{Sc}_n(\Theta _{\chi }) = \frac{(-1)^n}{q^{\frac{n}{2}}} \sum _{f \in \mathcal {M}_{n,q}} \chi (f). \end{aligned}$$
(5.4)
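The coefficient comparison can be spelled out as follows. Here \(N=\deg Q-1\) is our shorthand, and we use the homogeneity \(\textrm{Sc}_n(cU)=c^n\,\textrm{Sc}_n(U)\), which follows from the defining expansion of \(\det (zI+U)\) in the introduction. Taking \(z=1\) in that expansion and replacing U by \(-u\sqrt{q}\,\Theta _{\chi }\), we get

$$\begin{aligned} \det (I_{N} - u\sqrt{q}\, \Theta _{\chi }) = \sum _{n=0}^{N} \textrm{Sc}_n(-u\sqrt{q}\,\Theta _{\chi }) = \sum _{n=0}^{N} (-1)^n q^{\frac{n}{2}} u^n\, \textrm{Sc}_n(\Theta _{\chi }), \qquad L(u,\chi ) = \sum _{n \ge 0} u^n \sum _{f \in \mathcal {M}_{n,q}} \chi (f). \end{aligned}$$

Equating the coefficients of \(u^n\) gives \((-1)^n q^{\frac{n}{2}}\, \textrm{Sc}_n(\Theta _{\chi }) = \sum _{f \in \mathcal {M}_{n,q}} \chi (f)\), which is (5.4).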

If \(nk \le \deg Q\) then orthogonality relations show

$$\begin{aligned} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \chi (f) \right| ^{2k} = \#\{ f_1 f_2 \cdots f_k = g_1 g_2 \cdots g_k, \, \forall i:\,\deg f_i = \deg g_i = n,\, (f_i g_i,Q)=1\}. \end{aligned}$$
(5.5)

Since in the large-q limit a random polynomial will be coprime to Q with probability approaching 1, we can deduce from (5.5) (using gcd matrices as used in establishing (5.1)) that

$$\begin{aligned} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \chi (f) \right| ^{2k} =(1+o_{q \rightarrow \infty }(1)) q^{nk} N_{(n^k),(n^k)} \end{aligned}$$
(5.6)

holds as \(q \rightarrow \infty \), where we assume \(nk \le \deg Q\) and that n, k and \(\deg Q\) are fixed. Here \((n^k)\) stands for the partition of nk consisting of n repeated k times. The term \(o_{q \rightarrow \infty }(1)\) goes to 0 as \(q \rightarrow \infty \) and may depend on the fixed parameters. We can also package (5.5) as

$$\begin{aligned} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \chi (f) \right| ^{2k} = \mathbb {E} \left| \sum _{\begin{array}{c} f \in \mathcal {M}_{n,q}\\ (f,Q)=1 \end{array}} \alpha (f)\right| ^{2k} \end{aligned}$$
(5.7)

where now \(\alpha \) is a Steinhaus random multiplicative function in \(\mathbb {F}_q[T]\) which satisfies orthogonality relations similar to (5.2), and \(nk \le \deg Q\).

In the large-q limit almost all characters are primitive and odd [20, Eq. (3.25)] and we obtain from (5.3) and (5.4) that

$$\begin{aligned} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \chi (f) \right| ^{2k} = q^{nk} \left( \mathbb {E}_{\begin{array}{c} \chi \text { odd,}\\ \text {primitive mod }Q \end{array}} \left| \textrm{Sc}_{n}(\Theta _{\chi })\right| ^{2k} + o_{q \rightarrow \infty }(1)\right) \nonumber \\ \end{aligned}$$
(5.8)

if \(nk \le \deg Q -1\) (to handle the contribution of the principal character). Comparing (5.8) and (5.6) we find that

$$\begin{aligned} \lim _{q \rightarrow \infty }\mathbb {E}_{\begin{array}{c} \chi \text { odd,}\\ \text {primitive mod }Q \end{array}} \left| \textrm{Sc}_{n}(\Theta _{\chi })\right| ^{2k} = N_{(n^k),(n^k)} = \int _{U(\deg Q - 1)} \left| \textrm{Sc}_{n}(U)\right| ^{2k} dU\nonumber \\ \end{aligned}$$
(5.9)

where the last equality is (1.1). Here n, k and \(\deg Q\) are fixed and satisfy \(nk \le \deg Q-1\). The fact that the left-hand side of (5.9) converges to the right-hand side is essentially a special case of a deep equidistribution theorem of Katz [17]. Katz proved that, at least for squarefree Q, the ensemble \(( \Theta _{\chi })_{\chi \text { odd, primitive mod }Q}\) equidistributes in \(U(\deg Q-1)\) as \(q \rightarrow \infty \). In other words, the average of a continuous function \(f:U(\deg Q-1)\rightarrow \mathbb {C}\) over the finite ensemble \((\Theta _{\chi })_{\chi \text { odd, primitive mod }Q}\) converges, as \(q \rightarrow \infty \), to the average of f over the full group \(U(\deg Q-1)\) (see Footnote 2). For certain test functions f, Katz’s result can be proved elementarily, and in fact his proof proceeds by first (without using algebraic geometry) establishing it for a particular family of functions. The elementary proof of (5.9), which corresponds to \(f(U)=|\textrm{Sc}_n(U)|^{2k}\), certainly does not generalize to general functions.

What about traces of symmetric powers? These arise when twisting the Möbius function by a character. Given a Dirichlet character \(\chi \) modulo Q, we have

$$\begin{aligned} \frac{1}{L(u,\chi )} = \sum _{f \text { monic}} \mu (f)\chi (f)u^{\deg f}. \end{aligned}$$
(5.10)

The nth Taylor coefficient of this rational function is

$$\begin{aligned} \sum _{f \in \mathcal {M}_{n,q}} \mu (f)\chi (f) \ll _{n, \deg Q} q^{\frac{n}{2}} \end{aligned}$$
(5.11)

by Weil’s Riemann hypothesis. If \(\chi \) is odd and primitive we have

$$\begin{aligned} \frac{1}{L(u,\chi )} = \frac{1}{\det (I_{\deg Q-1}-u\sqrt{q}\Theta _{\chi })} = \sum _{n=0}^{\infty } q^{\frac{n}{2}}u^n \textrm{Tr}\textrm{Sym}^n (\Theta _{\chi }), \end{aligned}$$

i.e.

$$\begin{aligned} \sum _{f \in \mathcal {M}_{n,q}} \mu (f)\chi (f)=q^{\frac{n}{2}} \textrm{Tr}\textrm{Sym}^n (\Theta _{\chi }) \end{aligned}$$
(5.12)

for all n. In the large-q limit almost all characters are primitive and odd and we obtain from (5.10), (5.11) and (5.12) that

$$\begin{aligned}{} & {} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \mu (f)\chi (f) \right| ^{2k}\\{} & {} \quad = q^{nk} \left( \mathbb {E}_{\begin{array}{c} \chi \text { odd,}\\ \text {primitive mod }Q \end{array}} \left| \textrm{Tr}\textrm{Sym}^n(\Theta _{\chi })\right| ^{2k} + o_{q \rightarrow \infty }(1)\right) \end{aligned}$$

if \(k\le \deg Q -1\) (to handle the contribution of the principal character). In particular, by using Katz’s equidistribution result [17, Thm.],

$$\begin{aligned}{} & {} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \mu (f)\chi (f) \right| ^{2k} \sim q^{nk}\int _{U(\deg Q-1)} \left| \textrm{Tr}\textrm{Sym}^n(U)\right| ^{2k}\, dU\nonumber \\{} & {} \quad = q^{nk}N_{(n^k),(n^k)} \end{aligned}$$
(5.13)

holds as \(q \rightarrow \infty \) if \(k\le \deg Q-1\), at least if Q is squarefree (a condition required by Katz). In the second equality in (5.13) we used Lemma 3.1, which is the reason for the range \(k \le \deg Q-1\).
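The expansion behind (5.12) is the standard generating-function identity: in terms of the eigenvalues of a matrix U, \(\textrm{Sc}_n(U)\) is the elementary symmetric polynomial \(e_n\) and \(\textrm{Tr}\textrm{Sym}^n(U)\) is the complete homogeneous symmetric polynomial \(h_n\), and the two are linked by \(\big (\sum _j (-1)^j e_j u^j\big )\big (\sum _n h_n u^n\big )=1\). The Python sketch below illustrates this recurrence; the function names are ours, and we use integers as stand-ins for the eigenvalues (which in the text lie on the unit circle) so that the check is exact.

```python
def elementary(eigs):
    """Return [e_0, ..., e_N], the coefficients of prod(1 + t * x) in t."""
    e = [1]
    for x in eigs:
        # Multiply the current polynomial by (1 + t * x).
        e = [
            (e[n] if n < len(e) else 0) + (x * e[n - 1] if n >= 1 else 0)
            for n in range(len(e) + 1)
        ]
    return e


def complete(eigs, n_max):
    """Return [h_0, ..., h_{n_max}], the coefficients of 1 / prod(1 - u * x)."""
    e = elementary(eigs)
    h = [1]
    for n in range(1, n_max + 1):
        # h_n = e_1 h_{n-1} - e_2 h_{n-2} + ..., from (sum (-1)^j e_j u^j)(sum h_n u^n) = 1.
        h.append(
            sum((-1) ** (j - 1) * e[j] * h[n - j]
                for j in range(1, min(n, len(eigs)) + 1))
        )
    return h


# Stand-in eigenvalues 1 and 2: h_n = 1 + 2 + ... + 2^n.
print(elementary([1, 2]), complete([1, 2], 3))
```

Unlike the secular coefficients, which vanish for \(n > \deg Q - 1\), the coefficients \(h_n\) are defined for all n, in line with (5.12) holding for all n.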

In the restricted range \(nk \le \deg Q\) we can mimic (5.5)–(5.7) and obtain

$$\begin{aligned} \frac{1}{\phi (Q)} \sum _{\chi \bmod Q} \left| \sum _{f \in \mathcal {M}_{n,q}} \mu (f)\chi (f) \right| ^{2k} = \mathbb {E} \left| \sum _{\begin{array}{c} f \in \mathcal {M}_{n,q}\\ (f,Q)=1 \end{array}} \mu ^2(f)\alpha (f) \right| ^{2k}. \end{aligned}$$
(5.14)

The asymptotic relation (5.13) can be shown to imply that if we replace equality in (5.14) with an asymptotic then it continues to hold in the much wider range \(k \le \deg Q - 1\), at least if one takes a large-q limit and assumes Q is squarefree. We believe this should hold without taking the large-q limit and for all Q, and we suggest the following conjecture in integers.

Conjecture 5.1

Suppose q and x tend to \(\infty \) and that \(k < \log q\). We have

$$\begin{aligned}\frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x} \mu (n)\chi (n)\right| ^{2k}&\sim \mathbb {E} \left| \sum _{\begin{array}{c} n \le x\\ (n,q)=1 \end{array}} \mu ^2(n)\alpha (n)\right| ^{2k},\\ \frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x} \lambda (n)\chi (n)\right| ^{2k}&\sim \mathbb {E} \left| \sum _{\begin{array}{c} n \le x\\ (n,q)=1 \end{array}} \alpha (n)\right| ^{2k}. \end{aligned}$$

Here k is allowed to vary with x and q, and \(\lambda \) is the Liouville function \(\lambda (n)=(-1)^{\sum _{p,\, k \ge 1:\, p^k \mid n} 1}\).

This should be contrasted with the more modest range \(x^k\le q\) that occurs in the same problem but without an appearance of \(\mu \) or \(\lambda \):

$$\begin{aligned} \frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x} \chi (n)\right| ^{2k} = \mathbb {E} \left| \sum _{\begin{array}{c} n \le x\\ (n,q)=1 \end{array}} \alpha (n)\right| ^{2k}. \end{aligned}$$
(5.15)

Remark 5.2

The range \(x^k \le q\) is essentially optimal (up to \(x^{o(1)}\)), even if we replace equality with an asymptotic. In the current formulation this is trivial, since the principal character contributes \(x^{2k}/\phi (q)\) to the left-hand side of (5.15). If one removes the principal character, this still should be optimal. We do not attempt to demonstrate this here, but make two comments: 1) if x is an integer divisible by q then \(\sum _{n\le x} \chi (n)\) vanishes for any nonprincipal \(\chi \), showing the left-hand side of (5.15) vanishes for \(x=q\) if we remove the principal character, 2) the optimality claim is related to \(\int _{U(N)} |\textrm{Sc}_n(U)|^{2k}\,dU \sim N_{(n^k),(n^k)}\) failing to hold if \(N \le nk(1-\varepsilon )\) and \(n \rightarrow \infty \), which we expect can be demonstrated using Proposition 4.3 and the tools in [19].

Already for \(k=1\) Conjecture 5.1 is a very difficult open problem, related to the variance of \(\mu \) and \(\lambda \) in arithmetic progressions, see e.g. [18].

We view Conjecture 5.1 as a manifestation of Möbius randomness. One interpretation of it is that the Steinhaus random multiplicative function \(\alpha \) (resp. \(\alpha \cdot \mu ^2\)) is a good model for \(\lambda \) (resp. \(\mu \)) twisted by a random Dirichlet character \(\chi \) modulo q as \(q \rightarrow \infty \) (see Footnote 3). It certainly models \(\lambda \) times a random character better than it models just a random character, at least from the point of view of moments.

Let

$$\begin{aligned} \mu _{x,k}(n)&:= \sum _{n_1 n_2 \cdots n_k = n, \, \forall i: \, n_i \le x} \prod _{i=1}^{k} \mu (n_i),\\ \lambda _{x,k}(n)&:= \sum _{n_1 n_2 \cdots n_k = n, \, \forall i: \, n_i \le x} \prod _{i=1}^{k} \lambda (n_i)=\lambda (n)\sum _{n_1 n_2 \cdots n_k = n, \, \forall i: \, n_i \le x} 1. \end{aligned}$$

From orthogonality of characters we have the identities

$$\begin{aligned} \frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x}\mu (n)\chi (n)\right| ^{2k}= & {} \frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x^k}\mu _{x,k}(n)\chi (n)\right| ^{2}\nonumber \\= & {} \sum _{\begin{array}{c} n,m \le x^k\\ n \equiv m \bmod q \\ (nm,q)=1 \end{array}} \mu _{x,k}(n) \mu _{x,k}(m). \end{aligned}$$
(5.16)

Similarly,

$$\begin{aligned} \frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x}\lambda (n)\chi (n)\right| ^{2k}= & {} \frac{1}{\phi (q)} \sum _{\chi \bmod q} \left| \sum _{n \le x^k}\lambda _{x,k}(n)\chi (n)\right| ^{2}\nonumber \\= & {} \sum _{\begin{array}{c} n,m \le x^k\\ n \equiv m \bmod q \\ (nm,q)=1 \end{array}} \lambda _{x,k}(n) \lambda _{x,k}(m). \end{aligned}$$
(5.17)

Observe that the diagonal terms in (5.16) are \(\sum _{n \le x^k, \, (n,q)=1} \mu _{x,k}(n)^2\), and this sum satisfies (using (5.2))

$$\begin{aligned}\sum _{n \le x^k, \, (n,q)=1} \mu _{x,k}(n)^2= \mathbb {E} \left| \sum _{\begin{array}{c} n \le x\\ (n,q)=1 \end{array}} \mu ^2(n) \alpha (n)\right| ^{2k}.\end{aligned}$$

Hence, Conjecture 5.1 says that the diagonal terms \(n=m\) give (asymptotically) the main contribution to the 2kth moment in (5.16), as long as \(k< \log q\) and \(\min \{q,x\} \rightarrow \infty \). According to the conjecture, the same applies to (5.17).
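The identity between the diagonal terms of (5.16) and the Steinhaus moment can be verified by brute force for small x, k and q. The following Python sketch is illustrative only; the helper names are ours. It uses the fact that \((n,q)=1\) forces every factor \(n_i\) of n to be coprime to q, so both sides can be enumerated over the integers up to x that are coprime to q.

```python
from collections import Counter
from itertools import product
from math import gcd, prod


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def diagonal_terms(x, k, q):
    """Sum of mu_{x,k}(n)^2 over n <= x^k coprime to q."""
    allowed = [n for n in range(1, x + 1) if gcd(n, q) == 1]
    coeffs = Counter()
    for tup in product(allowed, repeat=k):
        coeffs[prod(tup)] += prod(mobius(n) for n in tup)
    return sum(c * c for c in coeffs.values())


def steinhaus_moment(x, k, q):
    """E|sum mu^2(n) alpha(n)|^{2k}, computed as a count of solutions via (5.2)."""
    squarefree = [n for n in range(1, x + 1)
                  if gcd(n, q) == 1 and mobius(n) != 0]
    counts = Counter(prod(t) for t in product(squarefree, repeat=k))
    return sum(c * c for c in counts.values())


print(diagonal_terms(6, 2, 7), steinhaus_moment(6, 2, 7))
```

The two functions agree because, for squarefree factors, \(\prod _i \mu (n_i)\) depends only on the product \(n_1 \cdots n_k\), so all representations of a given n contribute with the same sign.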