Recall from Chapter 1 that if \((\mathcal{A},\varphi )\) is a non-commutative probability space and \(\mathcal{A}_{1},\ldots,\mathcal{A}_{s}\) are subalgebras of \(\mathcal{A}\) which are free with respect to φ, then freeness gives us in principle a rule by which we can evaluate \(\varphi (a_{1}a_{2}\cdots a_{k})\) for any alternating word in random variables \(a_{1},a_{2},\ldots,a_{k}\). Thus we can in principle calculate all mixed moments for a system of free random variables. However, we do not yet have any concrete idea of the structure of this factorization rule. This situation will be greatly clarified by the introduction of free cumulants. Classical cumulants appeared in Chapter 1, where we saw that they are intimately connected with the combinatorial notion of set partitions. Our free cumulants will be linked in a similar way to the lattice of non-crossing set partitions; the latter were introduced in combinatorics by Kreweras [113]. We will motivate the appearance of free cumulants and non-crossing partition lattices in free probability theory by examining in detail a proof of the central limit theorem by the method of moments.

The combinatorial approach to free probability was initiated by Speicher in [159, 161], in order to get alternative proofs for the free central limit theorem and the main properties of the R-transform, which had been treated before by Voiculescu in [176, 177] by more analytic tools. Nica showed a bit later in [135] how this combinatorial approach connects in general to Voiculescu’s operator-theoretic approach in terms of creation and annihilation operators on the full Fock space. The combinatorial path was pursued much further by Nica and Speicher; for more details on this, we refer to the standard reference [137].

2.1 The classical and free central limit theorems

Our setting is that of a non-commutative probability space \((\mathcal{A},\varphi )\) and a sequence \((a_{i})_{i\in \mathbb{N}} \subset \mathcal{A}\) of centred and identically distributed random variables. This means that \(\varphi (a_{i}) = 0\) for all i ≥ 1 and that \(\varphi (a_{i}^{n}) =\varphi (a_{j}^{n})\) for any i, j, n ≥ 1. We assume that our random variables \(a_{i}\), i ≥ 1, are either classically independent or freely independent as defined in Chapter 1. Either form of independence gives us a factorization rule for calculating mixed moments in the random variables.

For k ≥ 1, set

$$\displaystyle{ S_{k}:= \frac{1} {\sqrt{k}}(a_{1} +\ldots +a_{k}). }$$
(2.1)

The Central Limit Theorem is a statement about the limit distribution of the random variable S k in the large k limit. Let us begin by reviewing the kind of convergence we shall be considering.

Recall that given a real-valued random variable X on a probability space, we have a probability measure μ X on \(\mathbb{R}\), called the distribution of X. The distribution of X is defined by the equation

$$\displaystyle{ \mathrm{E}(f(X)) =\int f(t)\,d\mu _{X}(t)\mbox{ for all }f \in C_{b}(\mathbb{R}) }$$
(2.2)

where \(C_{b}(\mathbb{R})\) is the \(C^{*}\)-algebra of all bounded continuous functions on \(\mathbb{R}\). We say that a probability measure μ on \(\mathbb{R}\) is determined by its moments if μ has moments \(\{\alpha _{k}\}_{k}\) of all orders and μ is the only probability measure on \(\mathbb{R}\) with moments \(\{\alpha _{k}\}_{k}\). If the moment generating function of μ has a positive radius of convergence, then μ is determined by its moments (see Billingsley [41, Theorem 30.1]).

Exercise 1. 

Show that a compactly supported measure is determined by its moments.

A more general criterion is the Carleman condition (see Akhiezer [3, p. 85]) which says that a measure μ is determined by its moments \(\{\alpha _{k}\}_{k}\) if we have \(\sum _{k\geq 1}\alpha _{2k}^{-1/(2k)} = \infty\).

Exercise 2. 

Using the Carleman condition, show that the Gaussian measure is determined by its moments.

A sequence of probability measures \(\{\mu _{n}\}_{n}\) on \(\mathbb{R}\) is said to converge weakly to μ if \(\int f\,d\mu _{n}\) converges to \(\int f\,d\mu\) for all \(f \in C_{b}(\mathbb{R})\). Given a sequence \(\{X_{n}\}_{n}\) of real-valued random variables, we say that \(\{X_{n}\}_{n}\) converges in distribution (or converges in law) if the probability measures \(\{\mu _{X_{n}}\}_{n}\) converge weakly.

If we are working in a non-commutative probability space \((\mathcal{A},\varphi )\), we call an element a of \(\mathcal{A}\) a non-commutative random variable. Given such an a, we may define \(\mu _{a}\) by \(\int p\,d\mu _{a} =\varphi (p(a))\) for all polynomials \(p \in \mathbb{C}[x]\). At this level of generality, we may not be able to define \(\int f\,d\mu _{a}\) for all functions \(f \in C_{b}(\mathbb{R})\), so we call the linear functional \(\mu _{a}: \mathbb{C}[x] \rightarrow \mathbb{C}\) the algebraic distribution of a, even if it is not a probability measure. However when it is clear from the context we shall just call \(\mu _{a}\) the distribution of a. Note that if a is a self-adjoint element of a \(C^{*}\)-algebra and φ is positive and has norm 1, then \(\mu _{a}\) extends from \(\mathbb{C}[x]\) to \(C_{b}(\mathbb{R})\) and thus \(\mu _{a}\) becomes a probability measure on \(\mathbb{R}\).

Definition 1.

Let \((\mathcal{A}_{k},\varphi _{k})\), for \(k \in \mathbb{N}\), and \((\mathcal{A},\varphi )\) be non-commutative probability spaces.

  1. 1)

    Let \((b_{k})_{k\in \mathbb{N}}\) be a sequence of non-commutative random variables with \(b_{k} \in \mathcal{A}_{k},\) and let \(b \in \mathcal{A}.\) We say that b k converges in distribution to b, denoted by \(b_k\overset {\text{distr}}{\longrightarrow} b\), if

    $$\displaystyle{ \lim _{k\rightarrow \infty }\varphi _{k}(b_{k}^{n}) =\varphi (b^{n}) }$$
    (2.3)

    for any fixed \(n \in \mathbb{N}.\)

  2. 2)

    More generally, let I be an index set. For each \(i \in I\), let \(b_{k}^{(i)} \in \mathcal{A}_{k}\) for \(k \in \mathbb{N}\) and \(b^{(i)} \in \mathcal{A}\). We say that \((b_{k}^{(i)})_{i\in I}\) converges in distribution to \((b^{(i)})_{i\in I}\), denoted by \((b_{k}^{(i)})_{i\in I}\mathop{\longrightarrow }\limits^{\text{distr}}(b^{(i)})_{i\in I}\), if

    $$\displaystyle{ \lim _{k\rightarrow \infty }\varphi _{k}(b_{k}^{(i_{1})}\cdots b_{ k}^{(i_{n})}) =\varphi (b^{(i_{1})}\cdots b^{(i_{n})}) }$$
    (2.4)

    for all \(n \in \mathbb{N}\) and all \(i_{1},\ldots,i_{n} \in I\).

Note that this definition is neither weaker nor stronger than weak convergence of the corresponding distributions. For real-valued random variables, the convergence in (2.3) is sometimes called convergence in moments. However there is an important case where the two conditions coincide. If we have a sequence of probability measures \(\{\mu _{k}\}_{k}\) on \(\mathbb{R}\), each having moments of all orders and a probability measure μ determined by its moments, such that for every n we have \(\int t^{n}\,d\mu _{k}(t) \rightarrow \int t^{n}\,d\mu (t)\) as \(k \rightarrow \infty\), then \(\{\mu _{k}\}_{k}\) converges weakly to μ (see Billingsley [41, Theorem 30.2]). To see that weak convergence does not imply convergence in moments, consider the sequence \(\{\mu _{k}\}_{k}\) where \(\mu _{k} = (1 - 1/k)\delta _{0} + (1/k)\delta _{k}\) and \(\delta _{k}\) is the probability measure with an atom at k of mass 1.

Exercise 3. 

Show that {μ k } k converges weakly to δ 0 but that we do not have convergence in moments.

We want to make a statement about convergence in distribution of the random variables \((S_{k})_{k\in \mathbb{N}}\) from (2.1) (which all come from the same underlying non-commutative probability space). Thus we need to do a moment calculation. Let [k] = {1, , k} and [n] = {1, , n}. We have

$$\displaystyle{ \varphi (S_{k}^{n}) = \frac{1} {k^{n/2}}\sum _{r:[n]\rightarrow [k]}\varphi (a_{r_{1}}\cdots a_{r_{n}}). }$$

It turns out that the fact that the random variables a 1, , a k are independent and identically distributed makes the task of calculating this sum less complex than it initially appears. The key observation is that because of (classical or free) independence of the a i ’s and the fact that they are identically distributed, the value of \(\varphi (a_{r_{1}}\cdots a_{r_{n}})\) depends not on all details of the multi-index r, but just on the information where the indices are the same and where they are different. Let us recall some notation from the proof of Theorem 1.1.

Notation 2.

Let i = (i 1, , i n ) be a multi-index. Then its kernel , denoted by keri, is that partition in \(\mathcal{P}(n)\) whose blocks correspond exactly to the different values of the indices (Fig.  2.1 ),

$$\displaystyle{k\mathit{\text{ and }}l\mathit{\text{ are in the same block of }}\ker i\quad \Longleftrightarrow\quad i_{k} = i_{l}.}$$
Fig. 2.1
figure 1

Suppose j 1 = j 3 = j 4 and j 2 = j 5 but {j 1, j 2, j 6} are distinct. Then ker(j) = {(1, 3, 4), (2, 5), (6)}

Lemma 3.

With this notation we have that keri = kerj implies \(\varphi (a_{i_{1}}\cdots a_{i_{n}}) =\varphi (a_{j_{1}}\cdots a_{j_{n}})\).

Proof:

To see this note first that ker i = ker j implies that the j-indices can be obtained from the i-indices by the application of some permutation σ, i.e. \((j_{1},\ldots,j_{n}) = (\sigma (i_{1}),\ldots,\sigma (i_{n}))\). We know that the random variables \(a_{1},\ldots,a_{k}\) are (classically or freely) independent. This means that we have a factorization rule for calculating mixed moments in \(a_{1},\ldots,a_{k}\) in terms of the moments of individual \(a_{i}\)’s. In particular this means that \(\varphi (a_{i_{1}}\cdots a_{i_{n}})\) can be written as some expression in moments \(\varphi (a_{i}^{r})\), while \(\varphi (a_{j_{1}}\cdots a_{j_{n}})\) can be written as that same expression except with \(\varphi (a_{i}^{r})\) replaced by \(\varphi (a_{\sigma (i)}^{r})\). However, since our random variables all have the same distribution, we have \(\varphi (a_{i}^{r}) =\varphi (a_{\sigma (i)}^{r})\) for all i and r, and thus \(\varphi (a_{i_{1}}\cdots a_{i_{n}}) =\varphi (a_{j_{1}}\cdots a_{j_{n}})\). □

Let us denote the common value of \(\varphi (a_{i_{1}}\cdots a_{i_{n}})\) for all i with keri = π, for some \(\pi \in \mathcal{P}(n)\), by φ(π). Consequently, we have

$$\displaystyle{ \varphi (S_{k}^{n}) = \frac{1} {k^{n/2}}\sum _{\pi \in \mathcal{P}(n)}\varphi (\pi ) \cdot \vert \{i: [n] \rightarrow [k]\mid \ker i =\pi \} \vert. }$$

It is not difficult to see that

$$\displaystyle{\#\{i: [n] \rightarrow [k]\mid \ker i =\pi \}= k(k - 1)\cdots (k - \#(\pi ) + 1)}$$

because we have k choices for the first block of π, k − 1 choices for the second block of π, and so on until the last block where we have \(k - \#(\pi ) + 1\) choices.
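
As a quick sanity check of this count, one can verify it by brute force for small n and k; the following Python sketch (an illustration only, with helper names of our choosing) groups all multi-indices by their kernel and compares each count with the falling factorial.

```python
# Sanity check: for n = 4 and k = 6, group all multi-indices i: [n] -> [k] by
# their kernel and verify that each kernel pi occurs k(k-1)...(k - #(pi) + 1) times.
from itertools import product
from collections import Counter

n, k = 4, 6

def kernel(i):
    """Return ker i as a sorted tuple of blocks (each block a tuple of positions)."""
    blocks = {}
    for pos, val in enumerate(i):
        blocks.setdefault(val, []).append(pos)
    return tuple(sorted(tuple(b) for b in blocks.values()))

counts = Counter(kernel(i) for i in product(range(k), repeat=n))

def falling_factorial(k, m):
    result = 1
    for j in range(m):
        result *= k - j
    return result

assert all(c == falling_factorial(k, len(pi)) for pi, c in counts.items())
print("number of distinct kernels:", len(counts))   # 15, the number of partitions of [4]
```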

Then what we have proved is that

$$\displaystyle{ \varphi (S_{k}^{n}) = \frac{1} {k^{n/2}}\sum _{\pi \in \mathcal{P}(n)}\varphi (\pi ) \cdot k(k - 1)\cdots (k - \#(\pi ) + 1). }$$

The great advantage of this expression over what we started with is that the number of terms does not depend on k. Thus we are in a position to take the limit as \(k \rightarrow \infty\), provided we can effectively estimate each term of the sum.

Our first observation is the most obvious one, namely, we have

$$\displaystyle{ k(k - 1)\cdots (k - \#(\pi ) + 1) \sim k^{\#(\pi )}\qquad \text{as }k \rightarrow \infty. }$$

Next observe that if π has a block of size 1, then we will have φ(π) = 0. Indeed suppose that \(\pi =\{ V _{1},\ldots,V _{m},\ldots,V _{s}\} \in \mathcal{P}(n)\) with V m = {l} for some l ∈ [n]. Then we will have

$$\displaystyle{ \varphi (\pi ) =\varphi (a_{j_{1}}\cdots a_{j_{l-1}}a_{j_{l}}a_{j_{l+1}}\cdots a_{j_{n}}) }$$

where ker(j) = π and thus j l ∉ {j 1, , j l−1, j l+1, , j n }. Hence we can write \(\varphi (\pi ) =\varphi (ba_{j_{l}}c)\), where \(b = a_{j_{1}}\cdots a_{j_{l-1}}\) and \(c = a_{j_{l+1}}\cdots a_{j_{n}}\) and thus

$$\displaystyle{ \varphi (\pi ) =\varphi (ba_{j_{l}}c) =\varphi (a_{j_{l}})\varphi (bc) = 0, }$$

since \(a_{j_{l}}\) is (classically or freely) independent of {b, c}. (For the free case, this factorization was considered in Equation (1.13) in the last chapter. In the classical case, it is obvious, too.) Of course, for this part of the argument, it is crucial that we assume our variables a i to be centred.

Thus the only partitions which contribute to the sum are those with blocks of size at least 2. Note that such a partition can have at most n∕2 blocks. Now,

$$\displaystyle{ \lim _{k\rightarrow \infty }\frac{k^{\#(\pi )}} {k^{n/2}} = \left \{\begin{array}{@{}l@{\quad }l@{}} 1,\quad &\text{if }\#(\pi ) = n/2\\ 0,\quad &\text{if } \#(\pi ) <n/2 \end{array} \right.. }$$

Hence the only partitions which contribute to the sum in the limit \(k \rightarrow \infty\) are those with exactly n∕2 blocks, i.e. partitions each of whose blocks has size 2. Such partitions are called pairings, and the set of pairings of [n] is denoted \(\mathcal{P}_{2}(n)\).

Thus we have shown that

$$\displaystyle{ \lim _{k\rightarrow \infty }\varphi (S_{k}^{n}) =\sum _{\pi \in \mathcal{P}_{2}(n)}\varphi (\pi ). }$$

Note that in particular if n is odd, then \(\mathcal{P}_{2}(n) =\emptyset,\) so that the odd limiting moments vanish. In order to determine the even limiting moments, we must distinguish between the setting of classical independence and free independence.

2.1.1 Classical central limit theorem

In the case of classical independence, our random variables commute and factorize completely with respect to φ. Thus if we denote by \(\varphi (a_{i}^{2}) =\sigma ^{2}\) the common variance of our random variables, then for any pairing \(\pi \in \mathcal{P}_{2}(n)\) we have \(\varphi (\pi ) =\sigma ^{n}\). Thus we have

$$\displaystyle{ \lim _{k\rightarrow \infty }\varphi (S_{k}^{n}) =\sum _{\pi \in \mathcal{P}_{2}(n)}\sigma ^{n} = \left \{\begin{array}{@{}l@{\quad }l@{}} \sigma ^{n}(n - 1)(n - 3)\ldots 5 \cdot 3 \cdot 1,\quad &\text{if }n\text{ even} \\ 0, \quad &\text{if }n\text{ odd} \end{array} \right.. }$$

From Section 1.1, we recognize these as exactly the moments of a Gaussian random variable of mean 0 and variance \(\sigma ^{2}\). Since by Exercise 2 the normal distribution is determined by its moments, our convergence in moments here is the same as the classical convergence in distribution, and we get the following form of the classical central limit theorem: if \((a_{i})_{i\in \mathbb{N}}\) are classically independent random variables which are identically distributed with \(\varphi (a_{i}) = 0\) and \(\varphi (a_{i}^{2}) =\sigma ^{2}\), and having all moments, then \(S_{k}\) converges in distribution to a Gaussian random variable with mean 0 and variance \(\sigma ^{2}\). Note that one can see the derivation above also as a proof of the Wick formula for Gaussian random variables if one takes the central limit theorem for granted.
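
For small n, one can also confirm the pairing count by direct enumeration; the following Python sketch (an illustration only, not part of the proof) checks that the number of pairings of [n] is (n − 1)(n − 3)⋯3 · 1, which for σ = 1 gives the Gaussian moments 1, 3, 15, 105, ….

```python
# Enumerate all pairings of [n] and compare their number with (n-1)!! = (n-1)(n-3)...3*1.
def pairings(elements):
    """Yield all pairings of the list `elements` as lists of pairs."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for j, partner in enumerate(rest):
        remaining = rest[:j] + rest[j + 1:]
        for rest_pairing in pairings(remaining):
            yield [(first, partner)] + rest_pairing

def double_factorial(n):              # (n-1)(n-3)...3*1 for n even
    result = 1
    for j in range(1, n, 2):
        result *= j
    return result

for n in (2, 4, 6, 8):
    assert len(list(pairings(list(range(n))))) == double_factorial(n)
print([double_factorial(n) for n in (2, 4, 6, 8)])   # [1, 3, 15, 105]
```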

2.1.2 Free central limit theorem

Now we want to deal with the case where the random variables are freely independent. In this case, φ(π) will not be the same for all pair partitions \(\pi \in \mathcal{P}_{2}(2n)\) (we focus on the even moments now because we already know that the odd ones are zero). Let’s take a look at some examples:

$$\displaystyle\begin{array}{rcl} \varphi (\{(1,2),(3,4)\})& =& \varphi (a_{1}a_{1}a_{2}a_{2}) =\varphi (a_{1}^{2})\varphi (a_{ 2}^{2}) =\sigma ^{4} {}\\ \varphi (\{(1,4),(2,3)\})& =& \varphi (a_{1}a_{2}a_{2}a_{1}) =\varphi (a_{1}^{2})\varphi (a_{ 2}^{2}) =\sigma ^{4} {}\\ \varphi (\{(1,3),(2,4)\})& =& \varphi (a_{1}a_{2}a_{1}a_{2}) = 0. {}\\ \end{array}$$

The last equality is just from the definition of freeness, because a 1 a 2 a 1 a 2 is an alternating product of centred free variables.

In general, we will get \(\varphi (\pi ) =\sigma ^{2n}\) if we can successively remove neighbouring pairs of identical random variables in the word corresponding to π so that we end with a single pair (see Fig. 2.2); if we cannot, we will have φ(π) = 0, as in the example \(\varphi (a_{1}a_{2}a_{1}a_{2}) = 0\) above. Thus the only partitions that give a non-zero contribution are the non-crossing ones (see [137, p. 122] for details). Non-crossing pairings were encountered already in Chapter 1, where we denoted the set of non-crossing pairings by \(NC_{2}(2n)\). Then we have as our free central limit theorem that

$$\displaystyle{ \lim _{k\rightarrow \infty }\varphi (S_{k}^{2n}) =\sigma ^{2n} \cdot \vert NC_{ 2}(2n)\vert. }$$

In Chapter 1 we already mentioned that the cardinality C n : = | NC 2(2n) | is given by the Catalan numbers. We want now to elaborate on the proof of this claim.

Fig. 2.2
figure 2

We start with the pairing {(1, 4), (2, 3), (5, 6)} and remove the pair (2, 3) of adjacent elements (middle figure). Next we remove the pair (1, 4) of adjacent elements. We are then left with a single pair; so the pairing must have been non-crossing to start with
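
The removal procedure just described is easy to turn into a test for non-crossingness (here we also remove the final remaining pair, so a non-crossing pairing reduces to the empty word). The following Python sketch (an illustration; the function names are ours) does this and, by enumerating all pairings, confirms that the number of non-crossing pairings of [2m] is the Catalan number \(C_{m}\).

```python
# Test a pairing for being non-crossing by repeatedly removing adjacent partners,
# then count the non-crossing pairings of [2m] and compare with the Catalan numbers.
from math import comb

def pairings(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for j, partner in enumerate(rest):
        remaining = rest[:j] + rest[j + 1:]
        for rest_pairing in pairings(remaining):
            yield [(first, partner)] + rest_pairing

def is_non_crossing(pairing, n):
    partner = {}
    for a, b in pairing:
        partner[a], partner[b] = b, a
    word = list(range(n))
    changed = True
    while changed:
        changed = False
        for i in range(len(word) - 1):
            if partner[word[i]] == word[i + 1]:    # adjacent letters are partners
                del word[i:i + 2]
                changed = True
                break
    return not word                                # non-crossing iff nothing is left

for m in range(1, 6):
    count = sum(is_non_crossing(p, 2 * m) for p in pairings(list(range(2 * m))))
    assert count == comb(2 * m, m) // (m + 1)      # Catalan number C_m
print([comb(2 * m, m) // (m + 1) for m in range(1, 6)])   # [1, 2, 5, 14, 42]
```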

A very simple method is to show that the pairings are in a bijective correspondence with the Dyck paths; by using André’s reflection principle, one finds that there are \(\binom{2n}{n} -\binom{2n}{n - 1} = \frac{1} {n+1}\binom{2n}{n}\) such paths (see [137, Prop. 2.11] for details).

Our second method for counting non-crossing pairings is to find a simple recurrence which they satisfy. The idea is to look at the block of a pairing which contains the number 1. In order for the pairing to be non-crossing, 1 must be paired with some even number in the set [2n], else we would necessarily have a crossing. Thus 1 must be paired with 2i for some i ∈ [n]. Now let i run through all possible values in [n], and count for each the number of non-crossing pairings that contain this pair, as in the diagram (Fig. 2.3).

Fig. 2.3
figure 3

We have \(C_{i-1}\) possible pairings on [2, 2i − 1] and \(C_{n-i}\) possible pairings on [2i + 1, 2n]

In this way we see that the cardinality C n of NC 2(2n) must satisfy the recurrence relation

$$\displaystyle{ C_{n} =\sum _{ i=1}^{n}C_{ i-1}C_{n-i}, }$$
(2.5)

with initial condition C 0 = 1. One can then check using a generating function that the Catalan numbers satisfy this recurrence; hence \(C_{n} = \frac{1} {n+1}\binom{2n}{n}\).
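
A short numerical check of recurrence (2.5) against this closed form (whose derivation is the content of Exercise 4) might look as follows; it is only an illustration.

```python
# Check that the recurrence C_n = sum_{i=1}^{n} C_{i-1} C_{n-i}, C_0 = 1,
# reproduces the closed form C_n = binom(2n, n) / (n + 1).
from math import comb

C = [1]                                            # C_0 = 1
for n in range(1, 11):
    C.append(sum(C[i - 1] * C[n - i] for i in range(1, n + 1)))

assert C == [comb(2 * n, n) // (n + 1) for n in range(11)]
print(C)   # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```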

Exercise 4. 

Let \(f(z) =\sum _{n=0}^{\infty }C_{n}z^{n}\) be the generating function for \(\{C_{n}\}_{n}\), where \(C_{0} = 1\) and \(C_{n}\) satisfies the recursion (2.5).

  1. (i)

    Show that \(1 + zf(z)^{2} = f(z)\).

  2. (ii)

    Show that f is also the power series for \(\frac{1-\sqrt{1-4z}} {2z}\).

  3. (iii)

    Show that \(C_{n} = \frac{1} {n+1}\binom{2n}{n}\).

We can also prove directly that \(C_{n} = \frac{1} {n+1}\binom{2n}{n}\) by finding a bijection between \(NC_{2}(2n)\) and some standard set of objects which we can see directly is enumerated by the Catalan numbers. A reasonable choice for this “canonical” set is the collection of 2 × n standard Young tableaux. A standard Young tableau of shape 2 × n is a filling of the squares of a 2 × n grid with the numbers 1, …, 2n which is strictly increasing in each of the two rows and each of the n columns. The number of these standard Young tableaux is very easy to calculate, using a famous and fundamental result known as the hook-length formula [167, Vol. 2, Corollary 7.21.6]. The hook-length formula tells us that the number of standard Young tableaux on the 2 × n rectangle is

$$\displaystyle{ \frac{(2n)!} {(n + 1)!n!} = \frac{1} {n + 1}\binom{2n}{n}. }$$
(2.6)

Thus we will have proved that \(\vert NC_{2}(2n)\vert = \frac{1} {n+1}\binom{2n}{n}\) if we can bijectively associate to each pairing \(\pi \in NC_{2}(2n)\) a standard Young tableau on the 2 × n rectangular grid. This is very easy to do. Simply take the “left-halves” of each pair in π and write them in increasing order in the cells of the first row. Then take the “right-halves” of each pair of π and write them in increasing order in the cells of the second row. Figure 2.4 shows the bijection between \(NC_{2}(6)\) and standard Young tableaux on the 2 × 3 rectangle.

Fig. 2.4
figure 4

In the bijection between \(NC_{2}(6)\) and 2 × 3 standard Young tableaux, the pairing {(1, 2), (3, 6), (4, 5)} gets mapped to the tableau on the right

Definition 4.

A self-adjoint random variable s with odd moments \(\varphi (s^{2n+1}) = 0\) and even moments \(\varphi (s^{2n}) =\sigma ^{2n}C_{n}\), where \(C_{n}\) is the n-th Catalan number and σ > 0 is a constant, is called a semi-circular element of variance \(\sigma ^{2}\). In the case σ = 1, we call it the standard semi-circular element.
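
As a numerical illustration of this definition (assuming the usual semi-circle density \(\sqrt{4\sigma ^{2} - x^{2}}/(2\pi \sigma ^{2})\) on \([-2\sigma,2\sigma ]\) from Chapter 1; the parameter value σ = 1.5 and the step count are arbitrary choices of ours), the following Python sketch checks by quadrature that this density has even moments \(\sigma ^{2m}C_{m}\).

```python
# Midpoint-rule quadrature of the semi-circle density, compared with sigma^(2m) * C_m.
from math import pi, sqrt, comb

sigma = 1.5

def semicircle_moment(n, steps=200_000):
    a, b = -2 * sigma, 2 * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x**n * sqrt(4 * sigma**2 - x**2) / (2 * pi * sigma**2) * h
    return total

def catalan(m):
    return comb(2 * m, m) // (m + 1)

for m in range(1, 5):
    numeric = semicircle_moment(2 * m)
    exact = sigma**(2 * m) * catalan(m)
    assert abs(numeric - exact) < 1e-3 * exact
    print(f"moment of order {2*m}: {numeric:.5f} vs sigma^(2m)*C_m = {exact}")
```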

The argument we have just provided gives us the free central limit theorem.

Theorem 5.

If \((a_{i})_{i\in \mathbb{N}}\) are self-adjoint, freely independent, and identically distributed with \(\varphi (a_{i}) = 0\) and \(\varphi (a_{i}^{2}) =\sigma ^{2}\), then \(S_{k}\) converges in distribution to a semi-circular element of variance \(\sigma ^{2}\) as \(k \rightarrow \infty\).

This free central limit theorem was proved as one of the first results in free probability theory by Voiculescu already in [176]. His proof was much more operator theoretic; the proof presented here is due to Speicher [159] and was the first hint at a relation between free probability theory and the combinatorics of non-crossing partitions. (An early concrete version of the free central limit theorem, before the notion of freeness was isolated, appeared also in the work of Bożejko [43] in the context of convolution operators on free groups.)

Recall that in Chapter 1 it was shown that for a random matrix \(X_{N}\) chosen from the N × N GUE we have that

$$\displaystyle{ \lim _{N\rightarrow \infty }E[\mathrm{tr}(X_{N}^{n})] = \left \{\begin{array}{@{}l@{\quad }l@{}} 0, \quad &\text{ if }n\text{ odd} \\ C_{n/2},\quad &\text{ if }n\text{ even} \end{array} \right. }$$
(2.7)

so that a GUE random matrix is a semi-circular element in the limit of large matrix size, \(X_{N}\mathop{\longrightarrow }\limits^{\text{distr}}s\).

We can also define a family of semi-circular random variables.

Definition 6.

Suppose \((\mathcal{A},\varphi )\) is a ∗-probability space. A self-adjoint family \((s_{i})_{i\in I} \subset \mathcal{A}\) is called a semi-circular family of covariance \(C = (c_{ij})_{i,j\in I}\) if C ≥ 0 and for any n ≥ 1 and any n-tuple \(i_{1},\ldots,i_{n} \in I\) we have

$$\displaystyle{ \varphi (s_{i_{1}}\cdots s_{i_{n}}) =\sum _{\pi \in NC_{2}(n)}\varphi _{\pi }[s_{i_{1}},\ldots,s_{i_{n}}], }$$

where

$$\displaystyle{ \varphi _{\pi }[s_{i_{1}},\ldots,s_{i_{n}}] =\prod _{(p,q)\in \pi }c_{i_{p}i_{q}}. }$$

If C is diagonal, then (s i ) iI is a free semi-circular family.

This is the free analogue of Wick’s formula. In fact, using this language and our definition of convergence in distribution from Definition 1, it follows directly from Lemma 1.9 that if \(X_{1},\ldots,X_{r}\) are matrices chosen independently from the GUE, then, in the large N limit, they converge in distribution to a semi-circular family \(s_{1},\ldots,s_{r}\) of covariance \(c_{ij} =\delta _{ij}\).

Exercise 5. 

Show that if \(\{x_{1},\ldots,x_{n}\}\) is a semi-circular family and \(A = (a_{ij})\) is an invertible matrix with real entries, then \(\{y_{1},\ldots,y_{n}\}\) is a semi-circular family where \(y_{i} =\sum _{j}a_{ij}x_{j}\).

Exercise 6. 

Let \(\{x_{1},\ldots,x_{n}\}\) be a semi-circular family such that for all i and j we have \(\varphi (x_{i}x_{j}) =\varphi (x_{j}x_{i})\). Show that by diagonalizing the covariance matrix we can find an orthogonal matrix \(O = (o_{ij})\) such that \(\{y_{1},\ldots,y_{n}\}\) is a free semi-circular family where \(y_{i} =\sum _{j}o_{ij}x_{j}\).

Exercise 7. 

Formulate and prove a multidimensional version of the free central limit theorem.

2.2 Non-crossing partitions and free cumulants

We begin by recalling some relevant definitions concerning non-crossing partitions from Section 1.8.

Definition 7.

A partition \(\pi \in \mathcal{P}(n)\) is called non-crossing if there do not exist numbers i, j, k, l ∈ [n] with i < j < k < l such that i and k are in the same block of π and j and l are in the same block of π, but i and j are not in the same block of π. The collection of all non-crossing partitions of [n] was denoted NC(n).

Figure 2.5 should make it clear what a crossing in a partition is; a non-crossing partition is a partition with no crossings.

Fig. 2.5
figure 5

A crossing in a partition

Note that \(\mathcal{P}(n)\) is partially ordered by

$$\displaystyle{ \pi _{1} \leq \pi _{2}\;\Longleftrightarrow\;\text{ each block of }\pi _{1}\text{ is contained in a block of }\pi _{2}. }$$
(2.8)

We also say that \(\pi _{1}\) is a refinement of \(\pi _{2}\). NC(n) is a subset of \(\mathcal{P}(n)\) and inherits this partial order, so NC(n) is an induced sub-poset of \(\mathcal{P}(n)\). In fact both are lattices; they have well-defined join ∨ and meet ∧ operations (though the join of two non-crossing partitions in NC(n) does not necessarily agree with their join when viewed as elements of \(\mathcal{P}(n)\)). Recall that the join \(\pi _{1} \vee \pi _{2}\) in a lattice is the smallest σ with the property that \(\sigma \geq \pi _{1}\) and \(\sigma \geq \pi _{2}\) and that the meet \(\pi _{1} \wedge \pi _{2}\) is the largest σ with the property that \(\sigma \leq \pi _{1}\) and \(\sigma \leq \pi _{2}\).

We now define the important free cumulants of a non-commutative probability space (A, φ). They were introduced by Speicher in [161]. For other notions of cumulants and the relation between them, see [11, 74, 117, 153].

Definition 8.

Let \((\mathcal{A},\varphi )\) be a non-commutative probability space. The corresponding free cumulants \(\kappa _{n}: \mathcal{A}^{n} \rightarrow \mathbb{C}\) (n ≥ 1) are defined inductively in terms of moments by the moment-cumulant formula

$$\displaystyle{ \varphi (a_{1}\cdots a_{n}) =\sum _{\pi \in NC(n)}\kappa _{\pi }(a_{1},\ldots,a_{n}), }$$
(2.9)

where, by definition, if π = {V 1, , V r }, then

$$\displaystyle{ \kappa _{\pi }(a_{1},\ldots,a_{n}) =\mathop{ \prod _{V \in \pi }}_{V =(i_{1},\ldots,i_{l})}\kappa _{l}(a_{i_{1}},\ldots,a_{i_{l}}). }$$
(2.10)

Remark 9.

In Equation (2.10) and below, we always mean that the elements i 1, , i l of V are in increasing order. Note that Equation (2.9) has a formulation using Möbius inversion which we might call the cumulant-moment formula . To present this we need the moment version of Equation (2.10). For a partition \(\pi \in \mathcal{P}(n)\) with π = {V 1, , V r }, we set

$$\displaystyle{ \varphi _{\pi }(a_{1},\ldots,a_{n}) =\mathop{ \prod _{V \in \pi }}_{V =(i_{1},\ldots,i_{l})}\varphi (a_{i_{1}}\cdots a_{i_{l}}). }$$
(2.11)

We also need the Möbius function μ for NC(n) (see [137, Lecture 10]). Then our cumulant-moment relation can be written

$$\displaystyle{ \kappa _{n}(a_{1},\ldots,a_{n}) =\sum _{\pi \in NC(n)}\mu (\pi,1_{n})\varphi _{\pi }(a_{1},\ldots,a_{n}). }$$
(2.12)

One could use Equation (2.12) as the definition of free cumulants; however for practical calculations Equation (2.9) is usually easier to work with.

Example 10.

  1. (1)

    For n = 1, we have φ(a 1) = κ 1(a 1), and thus

    $$\displaystyle{ \kappa _{1}(a_{1}) =\varphi (a_{1}). }$$
    (2.13)
  2. (2)

    For n = 2, we have

    $$\displaystyle{ \varphi (a_{1}a_{2}) =\kappa _{\{(1,2)\}}(a_{1},a_{2}) +\kappa _{\{(1),(2)\}}(a_{1},a_{2}) =\kappa _{2}(a_{1},a_{2}) +\kappa _{1}(a_{1})\kappa _{1}(a_{2}). }$$

    Since we know from the n = 1 calculation that κ 1(a 1) = φ(a 1), this yields

    $$\displaystyle{ \kappa _{2}(a_{1},a_{2}) =\varphi (a_{1}a_{2}) -\varphi (a_{1})\varphi (a_{2}). }$$
    (2.14)
  3. (3)

    For n = 3, we have

    $$\displaystyle\begin{array}{rcl} \varphi (a_{1}a_{2}a_{3})& =& \ \kappa _{\{(1,2,3)\}}(a_{1},a_{2},a_{3})\,+\,\kappa _{\{(1,2),(3)\}}(a_{1},a_{2},a_{3})\,+\,\kappa _{\{(1),(2,3)\}}(a_{1},a_{2},a_{3}) {}\\ & & +\kappa _{\{(1,3),(2)\}}(a_{1},a_{2},a_{3}) +\kappa _{\{(1),(2),(3)\}}(a_{1},a_{2},a_{3}) {}\\ & =& \ \kappa _{3}(a_{1},a_{2},a_{3}) +\kappa _{2}(a_{1},a_{2})\kappa _{1}(a_{3}) +\kappa _{2}(a_{2},a_{3})\kappa _{1}(a_{1}) {}\\ & & +\kappa _{2}(a_{1},a_{3})\kappa _{1}(a_{2}) +\kappa _{1}(a_{1})\kappa _{1}(a_{2})\kappa _{1}(a_{3}). {}\\ \end{array}$$

Thus we find that

$$\displaystyle\begin{array}{rcl} \kappa _{3}(a_{1},a_{2},a_{3})& =& \varphi (a_{1}a_{2}a_{3}) -\varphi (a_{1})\varphi (a_{2}a_{3}) \\ & & -\varphi (a_{2})\varphi (a_{1}a_{3}) -\varphi (a_{3})\varphi (a_{1}a_{2}) + 2\varphi (a_{1})\varphi (a_{2})\varphi (a_{3}).{}\end{array}$$
(2.15)

These three examples outline the general procedure of recursively defining κ n in terms of the mixed moments. It is easy to see that κ n is an n-linear function.
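
The recursive procedure these examples outline is easy to automate. The following Python sketch (single-variable case only; the function names are ours and everything is brute force) computes \(\kappa _{1},\ldots,\kappa _{N}\) from given moments by enumerating the non-crossing partitions in (2.16); fed the Catalan moments, it returns \(\kappa _{2} = 1\) and all other cumulants 0, in line with the discussion of the semi-circle law below.

```python
# Compute free cumulants from moments via the single-variable moment-cumulant formula:
# alpha_n = sum over pi in NC(n) of prod over blocks V of kappa_{|V|}.
from itertools import combinations
from math import comb

def set_partitions(elements):
    """All set partitions of `elements`, as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def is_non_crossing(partition):
    block_of = {x: i for i, block in enumerate(partition) for x in block}
    return not any(block_of[i] == block_of[k] != block_of[j] == block_of[l]
                   for i, j, k, l in combinations(sorted(block_of), 4))

def free_cumulants(moments):
    """moments[n] = alpha_n for n = 1..N (moments[0] is unused); returns {n: kappa_n}."""
    kappa = {}
    for n in range(1, len(moments)):
        rest = 0
        for pi in set_partitions(list(range(n))):
            if len(pi) == 1 or not is_non_crossing(pi):
                continue                            # skip 1_n and crossing partitions
            term = 1
            for block in pi:
                term *= kappa[len(block)]
            rest += term
        kappa[n] = moments[n] - rest                # solve (2.16) for kappa_n
    return kappa

moments = [0] * 7
for m in range(1, 4):
    moments[2 * m] = comb(2 * m, m) // (m + 1)      # Catalan moments of the semi-circle
print(free_cumulants(moments))                      # {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}
```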

Exercise 8. 

(i) Show the following: if φ is a trace, then the cumulant κ n is, for each \(n \in \mathbb{N}\), invariant under cyclic permutations, i.e. for all \(a_{1},\ldots,a_{n} \in \mathcal{A}\), we have

$$\displaystyle{ \kappa _{n}(a_{1},a_{2},\ldots,a_{n}) =\kappa _{n}(a_{2},\ldots,a_{n},a_{1}). }$$

(ii) Let us assume that all moments with respect to φ are invariant under all permutations of the entries, i.e. that we have for all \(n \in \mathbb{N}\) and all \(a_{1},\ldots,a_{n} \in \mathcal{A}\) and all \(\sigma \in S_{n}\) that \(\varphi (a_{\sigma (1)}\cdots a_{\sigma (n)}) =\varphi (a_{1}\cdots a_{n})\). Is it then true that also the free cumulants \(\kappa _{n}\) (\(n \in \mathbb{N}\)) are invariant under all permutations?

Let us also point out how the definition appears when \(a_{1} =\cdots = a_{n} = a\), i.e. when all the random variables are the same. Then we have

$$\displaystyle{ \varphi (a^{n}) =\sum _{\pi \in NC(n)}\kappa _{\pi }(a,\ldots,a). }$$

Thus if we write \(\alpha _{n}^{a}:=\varphi (a^{n})\) and \(\kappa _{\pi }^{a}:=\kappa _{\pi }(a,\ldots,a)\), this reads

$$\displaystyle{ \alpha _{n}^{a} =\sum _{\pi \in NC(n)}\kappa _{\pi }^{a}. }$$
(2.16)

Note the similarity to Equation (1.3) for classical cumulants.

Since the Catalan number is the number of non-crossing pairings of [2n] as well as the number of non-crossing partitions of [n], we can use Equation (2.16) to show that the cumulants of the standard semi-circle law are all 0 except κ 2 = 1.

Exercise 9. 

Use Equation (2.16) to show that for the standard semi-circle law all cumulants are 0, except κ 2 which equals 1.

As another demonstration of the simplifying power of the moment-cumulant formula (2.16), let us use the formula to find a simple expression for the moments and free cumulants of the Marchenko-Pastur law. This is a probability measure on \(\mathbb{R}^{+} \cup \{ 0\}\) that is as fundamental as the semi-circle law (see Section 4.5). Let 0 < c < ∞ be a positive real number. For each c we shall construct a probability measure \(\nu _{c}\). Set \(a = (1 -\sqrt{c})^{2}\) and \(b = (1 + \sqrt{c})^{2}\). For c ≥ 1, \(\nu _{c}\) has as support the interval [a, b] and the density \(\sqrt{(b - x)(x - a)}/(2\pi x)\); that is

$$\displaystyle{d\nu _{c}(x) = \frac{\sqrt{(b - x)(x - a)}} {2\pi x} dx.}$$

For 0 < c < 1, ν c has the same density on [a, b] and in addition has an atom at 0 of mass 1 − c; thus

$$\displaystyle{d\nu _{c}(x) = (1 - c)\delta _{0} + \frac{\sqrt{(b - x)(x - a)}} {2\pi x} dx.}$$

Note that when c = 1, a = 0 and the density has a “pole” of order 1/2 at 0 and thus is still integrable.

Exercise 10. 

In this exercise we shall show that \(\nu _{c}\) is a probability measure for all c. Let \(R = -x^{2} + (a + b)x - ab\), and then write

$$\displaystyle{\frac{\sqrt{R}} {x} = \frac{R} {x\sqrt{R}} = \frac{1} {2} \frac{-2x + (a + b)} {\sqrt{R}} + \frac{1} {2} \frac{a + b} {\sqrt{R}} - \frac{ab} {x\sqrt{R}}.}$$
  1. (i)

    Show that the integral of the first term on [a, b] is 0.

  2. (ii)

    Using the substitution \(t = (x - (1 + c))/\sqrt{c}\), show that the integral of the second term over [a, b] is π(a + b)∕2.

  3. (iii)

    Let \(u = (b - a)/(2ab)\), \(v = (b + a)/(2ab)\) and \(t = u^{-1}(v - x^{-1})\). With this substitution show that the integral of the third term over [a, b] is \(-\pi \sqrt{ab}\).

  4. (iv)

    Using the first three parts, show that ν c is a probability measure.

Definition 11.

The Marchenko-Pastur distribution is the law with distribution \(\nu _{c}\) with 0 < c < ∞. We shall see in Exercise 11 that all free cumulants of \(\nu _{c}\) are equal to c. By analogy with the classical cumulants of the Poisson distribution, \(\nu _{c}\) is also called the free Poisson law (of rate c). We should also note that we have chosen a different normalization than that used by other authors in order to make the cumulants simple; see Remark 12 and Exercise 12 below.
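
As a numerical cross-check (an illustration only; we take c = 2 so that there is no atom at 0, and the step count is arbitrary), the following Python sketch compares the moments of \(\nu _{c}\), computed by quadrature, with \(\sum _{\pi \in NC(n)}c^{\#(\pi )}\), which is what formula (2.16) gives for constant free cumulants \(\kappa _{n} = c\).

```python
# Compare quadrature moments of nu_c (c = 2) with sum over pi in NC(n) of c^{#(pi)}.
from itertools import combinations
from math import pi, sqrt

c = 2.0
a, b = (1 - sqrt(c))**2, (1 + sqrt(c))**2

def mp_moment(n, steps=100_000):
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h                       # midpoint rule
        total += x**n * sqrt((b - x) * (x - a)) / (2 * pi * x) * h
    return total

def set_partitions(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def is_non_crossing(partition):
    block_of = {x: i for i, blk in enumerate(partition) for x in blk}
    return not any(block_of[i] == block_of[k] != block_of[j] == block_of[l]
                   for i, j, k, l in combinations(sorted(block_of), 4))

for n in range(1, 6):
    combinatorial = sum(c**len(p) for p in set_partitions(list(range(n))) if is_non_crossing(p))
    print(n, round(mp_moment(n), 3), combinatorial)   # e.g. 1: 2.0 vs 2.0,  2: 6.0 vs 6.0
```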

Exercise 11. 

In this exercise we shall find the moments and free cumulants of the Marchenko-Pastur law.

  1. (i)

    Let α n be the n th moment. Use the substitution \(t = (x - (1 + c))/\sqrt{c}\) to show that

    $$\displaystyle{\alpha _{n} =\sum _{ k=0}^{[(n-1)/2]} \frac{1} {k + 1}\binom{n - 1}{2k}\binom{2k}{k}(1 + c)^{n-2k-1}c^{1+k}.}$$
  2. (ii)

    Expand the expression (1 + c)n−2k−1 to obtain that

    $$\displaystyle{\alpha _{n} =\sum _{ k=0}^{[(n-1)/2]}\sum _{ l=k}^{n-k-1} \frac{(n - 1)!} {k!\,(k + 1)!\,(l - k)!\,(n - k - l - 1)!}c^{l+1}.}$$
  3. (iii)

    Interchange the order of summation and use Vandermonde convolution ([79, (5.23)]) to show that

    $$\displaystyle{\alpha _{n} =\sum _{ l=1}^{n}\frac{c^{l}} {n}\binom{n}{l - 1}\binom{n}{l}.}$$
  4. (iv)

    Finally use the fact ([137, Cor. 9.13]) that \(\frac{1} {n}\binom{n}{l - 1}\binom{n}{l}\) is the number of non-crossing partitions of [n] with l blocks to show that

    $$\displaystyle{\alpha _{n} =\sum _{\pi \in NC(n)}c^{\#(\pi )}.}$$

    Use this formula to show that κ n = c for all n ≥ 1.

Remark 12.

Given y > 0, let \(a' = (1 -\sqrt{y})^{2}\) and \(b' = (1 + \sqrt{y})^{2}\). Let ρ y be the probability measure on \(\mathbb{R}\) given by \(\sqrt{(b' - t)(t - a')}/(2\pi yt)\,dt\) on [a′, b′] when y ≤ 1 and \((1 - y^{-1})\delta _{0} + \sqrt{(b' - t)(t - a')}/(2\pi yt)\,dt\) on {0} ∪ [a′, b′] when y > 1. As above δ 0 is the Dirac mass at 0. This might be called the standard form of the Marchenko-Pastur law. In the exercise below, we shall see that ρ y is related to ν c in a simple way and the cumulants of ρ y are not as simple as those of ν c .

Exercise 12. 

Show that by setting c = 1∕y and making the substitution t = x∕c we have

$$\displaystyle{\int x^{k}\,d\nu _{ c}(x) = c^{k}\int t^{k}\,d\rho _{ y}(t).}$$

Show that the free cumulants of ρ y are given by κ n = c 1−n.

There is a combinatorial formula by Krawczyk and Speicher [111] for expanding cumulants whose arguments are products of random variables. For example, consider the expansion of κ 2(a 1 a 2, a 3). This can be written as

$$\displaystyle{ \kappa _{2}(a_{1}a_{2},a_{3}) =\kappa _{3}(a_{1},a_{2},a_{3}) +\kappa _{1}(a_{1})\kappa _{2}(a_{2},a_{3}) +\kappa _{2}(a_{1},a_{3})\kappa _{1}(a_{2}). }$$
(2.17)

A more complicated example is given by:

$$\displaystyle\begin{array}{rcl} & & \kappa _{2}(a_{1}a_{2},a_{3}a_{4}) \\ & & \quad =\kappa _{4}(a_{1},a_{2},a_{3},a_{4}) +\kappa _{1}(a_{1})\kappa _{3}(a_{2},a_{3},a_{4}) +\kappa _{1}(a_{2})\kappa _{3}(a_{1},a_{3},a_{4}) \\ & & \quad +\,\kappa _{1}(a_{3})\kappa _{3}(a_{1},a_{2},a_{4}) +\kappa _{1}(a_{4})\kappa _{3}(a_{1},a_{2},a_{3}) +\kappa _{2}(a_{1},a_{4})\kappa _{2}(a_{2},a_{3}) \\ & & \quad +\,\kappa _{2}(a_{1},a_{3})\kappa _{1}(a_{2})\kappa _{1}(a_{4}) +\kappa _{2}(a_{1},a_{4})\kappa _{1}(a_{2})\kappa _{1}(a_{3}) \\ & & \quad +\,\kappa _{1}(a_{1})\kappa _{2}(a_{2},a_{3})\kappa _{1}(a_{4}) +\kappa _{1}(a_{1})\kappa _{2}(a_{2},a_{4})\kappa _{1}(a_{3}). {}\end{array}$$
(2.18)

In general, the evaluation of a free cumulant with products of entries involves summing over all π which have the property that they connect all different product strings. Here is the precise formulation; for the proof, we refer to [137, Theorem 11.12]. Note that this is the free counterpart of the formula (1.16) for classical cumulants.

Theorem 13.

Suppose \(n_{1},\ldots,n_{r}\) are positive integers and \(n = n_{1} +\cdots + n_{r}\). Consider a non-commutative probability space \((\mathcal{A},\varphi )\) and \(a_{1},a_{2},\ldots,a_{n} \in \mathcal{A}\). Let

$$\displaystyle{A_{1} = a_{1}\cdots a_{n_{1}},\quad A_{2} = a_{n_{1}+1}\cdots a_{n_{1}+n_{2}},\quad \ldots,\quad A_{r} = a_{n_{1}+\cdots +n_{r-1}+1}\cdots a_{n}.}$$

Then

$$\displaystyle{ \kappa _{r}(A_{1},\ldots,A_{r}) =\mathop{ \sum _{\pi \in NC(n)}} _{\pi \vee \tau =1_{n}}\kappa _{\pi }(a_{1},\ldots,a_{n}) }$$
(2.19)

where the summation is over those \(\pi \in NC(n)\) which connect the blocks corresponding to \(A_{1},\ldots,A_{r}\). More precisely, this means that \(\pi \vee \tau = 1_{n}\) where

$$\displaystyle{ \tau =\{ (1,\ldots,n_{1}),(n_{1} + 1,\ldots,n_{1} + n_{2}),\ldots,(n_{1} +\ldots +n_{r-1} + 1,\ldots,n)\} }$$

and \(1_{n} =\{ (1,2,\ldots,n)\}\) is the partition with only one block.
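
The condition \(\pi \vee \tau = 1_{n}\) is simply a connectivity requirement, which is easy to test by merging blocks; the following Python sketch (an illustration with example partitions of our own choosing) does this with a small union-find.

```python
# Test whether pi v tau = 1_n, i.e. whether the blocks of pi connect the blocks of tau.
def connects(pi, tau, n):
    """True if the join of pi and tau (partitions of {0,...,n-1}) has a single block."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(x, y):
        parent[find(x)] = find(y)
    for block in list(pi) + list(tau):
        for x in block[1:]:
            union(block[0], x)
    return len({find(x) for x in range(n)}) == 1

tau = [[0, 1, 2], [3, 4]]                          # an interval partition, 0-indexed
print(connects([[1, 3], [0], [2], [4]], tau, 5))   # True: this pi joins the two intervals
print(connects([[0, 1, 2], [3], [4]], tau, 5))     # False: the two intervals stay separate
```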

Exercise 13. 

(i) Let τ = {(1, 2), (3)}. List all \(\pi \in NC(3)\) such that \(\pi \vee \tau = 1_{3}\). Check that these are exactly the terms appearing on the right-hand side of Equation (2.17).

(ii) Let τ = {(1, 2), (3, 4)}. List all \(\pi \in NC(4)\) such that \(\pi \vee \tau = 1_{4}\). Check that these are exactly the terms on the right-hand side of Equation (2.18).

The most important property of free cumulants is that we may characterize free independence by the vanishing of “mixed” cumulants. Let \((\mathcal{A},\varphi )\) be a non-commutative probability space and \(\mathcal{A}_{1},\ldots,\mathcal{A}_{s} \subset \mathcal{A}\) unital subalgebras. A cumulant κ n (a 1, a 2, , a n ) is mixed if each a i is in one of the subalgebras, but a 1, a 2, , a n do not all come from the same subalgebra.

Theorem 14.

The subalgebras \(\mathcal{A}_{1},\ldots,\mathcal{A}_{s}\) are free if and only if all mixed cumulants vanish.

The proof of this theorem relies on formula (2.19) and on the following proposition which is a special case of Theorem 14. For the details of the proof of Theorem 14, we refer again to [137, Theorem 11.15].

Proposition 15.

Let \((\mathcal{A},\varphi )\) be a non-commutative probability space and let κ n , n ≥ 1 be the corresponding free cumulants. For n ≥ 2, κ n (a 1, , a n ) = 0 if 1 ∈ {a 1, , a n }. 

Proof:

We consider the case where the last argument a n is equal to 1 and proceed by induction on n. 

For n = 2, 

$$\displaystyle{ \kappa _{2}(a,1) =\varphi (a1) -\varphi (a)\varphi (1) = 0. }$$

So the base step is done.

Now assume for the induction hypothesis that the result is true for all 1 ≤ k < n. We have that

$$\displaystyle\begin{array}{rcl} \varphi (a_{1}\cdots a_{n-1}1)& =& \sum _{\pi \in NC(n)}\kappa _{\pi }(a_{1},\ldots,a_{n-1},1) {}\\ & =& \kappa _{n}(a_{1},\ldots,a_{n-1},1) +\sum _{\begin{array}{c}\pi \in NC(n) \\ \pi \neq 1_{n} \end{array}}\kappa _{\pi }(a_{1},\ldots,a_{n-1},1). {}\\ \end{array}$$

According to our induction hypothesis, a partition \(\pi \neq 1_{n}\) can have \(\kappa _{\pi }(a_{1},\ldots,a_{n-1},1)\) different from zero only if (n) is a one-element block of π, i.e. \(\pi =\sigma \cup \{ (n)\}\) for some \(\sigma \in NC(n - 1)\). For such a partition, we have

$$\displaystyle{ \kappa _{\pi }(a_{1},\ldots,a_{n-1},1) =\kappa _{\sigma }(a_{1},\ldots,a_{n-1})\kappa _{1}(1) =\kappa _{\sigma }(a_{1},\ldots,a_{n-1}), }$$

hence

$$\displaystyle\begin{array}{rcl} \varphi (a_{1}\cdots a_{n-1}1)& =& \kappa _{n}(a_{1},\ldots,a_{n-1},1) +\sum _{\sigma \in NC(n-1)}\kappa _{\sigma }(a_{1},\ldots,a_{n-1}) {}\\ & =& \kappa _{n}(a_{1},\ldots,a_{n-1},1) +\varphi (a_{1}\cdots a_{n-1}). {}\\ \end{array}$$

Since \(\varphi (a_{1}\cdots a_{n-1}1) =\varphi (a_{1}\cdots a_{n-1})\), we have proved that \(\kappa _{n}(a_{1},\ldots,a_{n-1},1) = 0\). □

Whereas Theorem 14 gives a useful characterization for the freeness of subalgebras, its direct application to the case of random variables would not yield a satisfying characterization in terms of the vanishing of mixed cumulants in the subalgebras generated by the variables. By invoking again the product formula for free cumulants, Theorem 13, it is quite straightforward to get the following much more useful characterization in terms of mixed cumulants of the variables.

Theorem 16.

Let \((\mathcal{A},\varphi )\) be a non-commutative probability space. The random variables \(a_{1},\ldots,a_{s} \in \mathcal{A}\) are free if and only if all mixed cumulants of the a 1, , a s vanish. That is, a 1, , a s are free if and only if whenever we choose i 1, , i n ∈ {1, , s} in such a way that i k ≠ i l for some k, l ∈ [n], then \(\kappa _{n}(a_{i_{1}},\ldots,a_{i_{n}}) = 0\).

2.3 Products of free random variables

We want to understand better the calculation rule for mixed moments of free variables. Thus we will now derive the basic combinatorial description for such mixed moments.

Let \(\{a_{1},\ldots,a_{r}\}\) and \(\{b_{1},\ldots,b_{r}\}\) be two families of random variables which are free from each other, and consider

$$\displaystyle{ \varphi (a_{1}b_{1}a_{2}b_{2}\cdots a_{r}b_{r}) =\sum _{\pi \in NC(2r)}\kappa _{\pi }(a_{1},b_{1},a_{2},b_{2},\ldots,a_{r},b_{r}). }$$

Since the a’s are free from the b’s, we only need to sum over those partitions π which do not connect the a’s with the b’s. Each such partition may be written as π = π a π b , where π a denotes the blocks consisting of a’s and π b the blocks consisting of b’s. Hence by the definition of free cumulants

$$\displaystyle\begin{array}{rcl} \varphi (a_{1}b_{1}a_{2}b_{2}\cdots a_{r}b_{r})& =& \sum _{\pi _{a}\cup \pi _{b}\in NC(2r)}\kappa _{\pi _{a}}(a_{1},\ldots,a_{r}) \cdot \kappa _{\pi _{b}}(b_{1},\ldots,b_{r}) {}\\ & =& \sum _{\pi _{a}\in NC(r)}\kappa _{\pi _{a}}(a_{1},\ldots,a_{r}) \cdot \Bigg (\sum _{\begin{array}{c}\pi _{b}\in NC(r) \\ \pi _{a}\cup \pi _{b}\in NC(2r)\end{array}} \kappa _{\pi _{b } } (b_{1},\ldots,b_{r})\Bigg). {}\\ \end{array}$$

It is now easy to see that, for a given \(\pi _{a} \in NC(r)\), there exists a biggest \(\sigma \in NC(r)\) with the property that \(\pi _{a} \cup \sigma \in NC(2r)\). This σ is called the Kreweras complement of \(\pi _{a}\) and is denoted by \(K(\pi _{a})\); see [137, Def. 9.21]. This \(K(\pi _{a})\) is given by connecting as many b’s as possible in a non-crossing way without getting crossings with the blocks of \(\pi _{a}\). The mapping K is an order-reversing bijection on the lattice NC(r).
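
The Kreweras complement can be computed directly from this description. The following Python sketch (a brute-force illustration for small r; the helper names are ours) interleaves the a-positions and b-positions and picks the biggest partition of the b-positions that keeps the union non-crossing.

```python
# Brute-force Kreweras complement: interleave a_1, b_1, ..., a_r, b_r and take the
# biggest partition of the b-positions whose union with pi stays non-crossing.
from itertools import combinations

def set_partitions(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def is_non_crossing(blocks):
    block_of = {x: i for i, blk in enumerate(blocks) for x in blk}
    return not any(block_of[i] == block_of[k] != block_of[j] == block_of[l]
                   for i, j, k, l in combinations(sorted(block_of), 4))

def coarser(sigma, tau):
    """sigma >= tau: every block of tau is contained in a block of sigma."""
    return all(any(set(t) <= set(s) for s in sigma) for t in tau)

def kreweras(pi, r):
    """pi: non-crossing partition of {0,...,r-1} as a list of blocks; returns K(pi)."""
    a_blocks = [[2 * x for x in blk] for blk in pi]              # a_i sits at position 2i
    admissible = [sigma for sigma in set_partitions(list(range(r)))
                  if is_non_crossing(a_blocks + [[2 * x + 1 for x in blk] for blk in sigma])]
    return next(s for s in admissible if all(coarser(s, t) for t in admissible))

print(kreweras([[0, 1], [2, 3]], 4))   # blocks {0}, {1,3}, {2}: K({(1,2),(3,4)}) = {(1),(2,4),(3)}
print(kreweras([[0, 1, 2, 3]], 4))     # four singletons: K(1_4) = 0_4
```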

But then the summation condition on the internal sum above is equivalent to the condition \(\pi _{b} \leq K(\pi _{a})\). Summing \(\kappa _{\pi }\) over all \(\pi \in NC(r)\) gives the corresponding r-th moment, which extends easily to

$$\displaystyle{\sum _{\begin{array}{c}\pi \in NC(r) \\ \pi \leq \sigma \end{array}}\kappa _{\pi }(b_{1},\ldots,b_{r}) =\varphi _{\sigma }(b_{1},\ldots,b_{r}),}$$

where φ σ denotes, in the same way as in κ π , the product of moments along the blocks of σ; see Equation (2.11).

Thus we get as the final conclusion of our calculations that

$$\displaystyle{ \varphi (a_{1}b_{1}a_{2}b_{2}\cdots a_{r}b_{r}) =\sum _{\pi \in NC(r)}\kappa _{\pi }(a_{1},\ldots,a_{r}) \cdot \varphi _{K(\pi )}(b_{1},\ldots,b_{r}). }$$
(2.20)

Let us consider some simple examples for this formula. For r = 1, there is only one πNC(1), which is its own complement, and we get

$$\displaystyle{ \varphi (a_{1}b_{1}) =\kappa _{1}(a_{1})\varphi (b_{1}). }$$

As κ 1 = φ, this gives the usual factorization formula

$$\displaystyle{ \varphi (a_{1}b_{1}) =\varphi (a_{1})\varphi (b_{1}). }$$

For r = 2, there are two elements in NC(2), namely {(1, 2)} and {(1), (2)}, with Kreweras complements K({(1, 2)}) = {(1), (2)} and K({(1), (2)}) = {(1, 2)}, and the formula above gives

$$\displaystyle{\varphi (a_{1}b_{1}a_{2}b_{2}) =\kappa _{2}(a_{1},a_{2})\varphi (b_{1})\varphi (b_{2}) +\kappa _{1}(a_{1})\kappa _{1}(a_{2})\varphi (b_{1}b_{2}).}$$

With κ 1(a) = φ(a) and κ 2(a 1, a 2) = φ(a 1 a 2) −φ(a 1)φ(a 2), this reproduces formula (1.14).

The formula above is not symmetric between the a’s and the b’s (the former appear with cumulants, the latter with moments). Of course, one can also exchange the roles of a and b, in which case one ends up with

$$\displaystyle{ \varphi (a_{1}b_{1}a_{2}b_{2}\cdots a_{r}b_{r}) =\sum _{\pi \in NC(r)}\varphi _{K^{-1}(\pi )}(a_{1},\ldots,a_{r}) \cdot \kappa _{\pi }(b_{1},\ldots,b_{r}). }$$
(2.21)

Note that \(K^{2}\) is not the identity map, but a cyclic rotation of π.

Formulas (2.20) and (2.21) are particularly useful when one of the sets of variables has simple cumulants, as is the case for semi-circular random variables b i = s. Then only the second cumulants κ 2(s, s) = 1 are non-vanishing, i.e. in effect the sum is only over non-crossing pairings. Thus, if s is semi-circular and free from {a 1, , a r }, then we have

$$\displaystyle{ \varphi (a_{1}sa_{2}s\cdots a_{r}s) =\sum _{\pi \in NC_{2}(r)}\varphi _{K^{-1}(\pi )}(a_{1},\ldots,a_{r}). }$$
(2.22)

Let us also note in passing that one can rewrite the Equations (2.20) and (2.21) above in the symmetric form (see [137, (14.4)])

$$\displaystyle{ \kappa _{r}(a_{1}b_{1},a_{2}b_{2},\ldots,a_{r}b_{r}) =\sum _{\pi \in NC(r)}\kappa _{\pi }(a_{1},\ldots,a_{r}) \cdot \kappa _{K(\pi )}(b_{1},\ldots,b_{r}). }$$
(2.23)

2.4 Functional relation between moment series and cumulant series

Notice how much more efficient the result on the description of freeness in terms of cumulants is in checking freeness of random variables than the original definition of free independence. In the cumulant framework, we can forget about centredness and weaken “alternating” to “mixed”. Also, the problem of adding two freely independent random variables becomes easy on the level of free cumulants. If \(a,b \in (\mathcal{A},\varphi )\) are free with respect to φ, then

$$\displaystyle\begin{array}{rcl} \kappa _{n}^{a+b}& =& \kappa _{ n}(a + b,\ldots,a + b) {}\\ & =& \kappa _{n}(a,\ldots,a) +\kappa _{n}(b,\ldots,b) + (\text{mixed cumulants in }a,b) {}\\ & =& \kappa _{n}^{a} +\kappa _{ n}^{b}. {}\\ \end{array}$$

Thus the problem of calculating moments is shifted to the relation between cumulants and moments. We already know that the moments are polynomials in the cumulants, according to the moment-cumulant formula (2.16), but we want to put this relationship into a framework more amenable to performing calculations.

For any \(a \in \mathcal{A}\), let us consider formal power series in an indeterminate z defined by

$$\displaystyle\begin{array}{rcl} \begin{array}{rclrcl} M(z)& =&1 +\sum _{ n=1}^{\infty }\alpha _{n}^{a}z^{n},& \mathit{moment\ series\ of \ a} \\ C(z)& =&1 +\sum _{ n=1}^{\infty }\kappa _{n}^{a}z^{n},&\mathit{cumulant\ series\ of \ a}.\end{array} & & {}\\ \end{array}$$

We want to translate the moment-cumulant formula (2.16) into a statement about the relationship between the moment and cumulant series.

Proposition 17.

The relation between the moment series M(z) and the cumulant series C(z) of a random variable is given by

$$\displaystyle{ M(z) = C(zM(z)). }$$
(2.24)

Proof:

The idea is to sum first over the possibilities for the block of π containing 1, as in the derivation of the recurrence for \(C_{n}\). Suppose that the first block of π looks like \(V =\{ 1,v_{2},\ldots,v_{s}\}\), where \(1 < v_{2} <\cdots < v_{s} \leq n\). Then we build up the rest of the partition π out of smaller “nested” non-crossing partitions \(\pi _{1},\ldots,\pi _{s}\) with \(\pi _{1} \in NC(\{2,\ldots,v_{2} - 1\})\), \(\pi _{2} \in NC(\{v_{2} + 1,\ldots,v_{3} - 1\})\), etc. Hence if we denote \(i_{1} = \vert \{2,\ldots,v_{2} - 1\}\vert\), \(i_{2} = \vert \{v_{2} + 1,\ldots,v_{3} - 1\}\vert\), etc., then we have

$$\displaystyle\begin{array}{rcl} \alpha _{n}& =& \sum _{s=1}^{n}\sum _{ \begin{array}{c}i_{1},\ldots,i_{s}\geq 0 \\ s+i_{1}+\ldots +i_{s}=n\end{array}}\sum _{\pi =V \cup \pi _{1}\cup \ldots \cup \pi _{s}}\kappa _{s}\kappa _{\pi _{1}}\cdots \kappa _{\pi _{s}} {}\\ & =& \sum _{s=1}^{n}\sum _{ \begin{array}{c}i_{1},\ldots,i_{s}\geq 0 \\ s+i_{1}+\ldots +i_{s}=n\end{array}}\kappa _{s}\bigg(\sum _{\pi _{1}\in NC(i_{1})}\kappa _{\pi _{1}}\bigg)\cdots \bigg(\sum _{\pi _{s}\in NC(i_{s})}\kappa _{\pi _{s}}\bigg) {}\\ & =& \sum _{s=1}^{n}\sum _{ \begin{array}{c}i_{1},\ldots,i_{s}\geq 0 \\ s+i_{1}+\ldots +i_{s}=n\end{array}}\kappa _{s}\alpha _{i_{1}}\cdots \alpha _{i_{s}}. {}\\ \end{array}$$

Thus we have

$$\displaystyle\begin{array}{rcl} 1 +\sum _{ n=1}^{\infty }\alpha _{ n}z^{n}& =& 1 +\sum _{ n=1}^{\infty }\sum _{ s=1}^{n}\sum _{ \begin{array}{c}i_{1},\ldots,i_{s}\geq 0 \\ s+i_{1}+\ldots +i_{s}=n\end{array}}\kappa _{s}z^{s}\alpha _{ i_{1}}z^{i_{1} }\ldots \alpha _{i_{s}}z^{i_{s} } {}\\ & =& 1 +\sum _{ s=1}^{\infty }\kappa _{ s}z^{s}\bigg(\sum _{ i=0}^{\infty }\alpha _{ i}z^{i}\bigg)^{s}. {}\\ \end{array}$$

The right-hand side is exactly \(C(zM(z))\), so this says precisely that M(z) = C(zM(z)). □
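
The recursion appearing in this proof translates directly into a computation of moments from cumulants. The following Python sketch (an illustration; the helper name is ours) implements it and, for the cumulants of the standard semi-circle law (\(\kappa _{2} = 1\), all other \(\kappa _{n} = 0\)), recovers the Catalan numbers as even moments.

```python
# Moments from cumulants via alpha_n = sum_{s=1}^{n} kappa_s *
#   sum_{i_1 + ... + i_s = n - s} alpha_{i_1} ... alpha_{i_s}, with alpha_0 = 1.
from itertools import product
from math import comb

def moments_from_cumulants(kappa, N):
    """kappa[s] = kappa_s for s = 1..N; returns the list [alpha_0, ..., alpha_N]."""
    alpha = [1] + [0] * N
    for n in range(1, N + 1):
        total = 0
        for s in range(1, n + 1):
            for sizes in product(range(n - s + 1), repeat=s):
                if sum(sizes) == n - s:
                    term = kappa[s]
                    for i in sizes:
                        term *= alpha[i]
                    total += term
        alpha[n] = total
    return alpha

N = 8
kappa = {s: (1 if s == 2 else 0) for s in range(1, N + 1)}   # standard semi-circle
alpha = moments_from_cumulants(kappa, N)
print(alpha)                        # [1, 0, 1, 0, 2, 0, 5, 0, 14]
assert alpha[8] == comb(8, 4) // 5  # C_4 = 14
```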

Now consider the Cauchy transform of a:

$$\displaystyle{ G(z):=\varphi \Big ( \frac{1} {z - a}\Big) =\sum _{ n=0}^{\infty }\frac{\varphi (a^{n})} {z^{n+1}} = \frac{1} {z}M(1/z) }$$
(2.25)

and the R-transform of a defined by

$$\displaystyle{ R(z):= \frac{C(z) - 1} {z} =\sum _{ n=0}^{\infty }\kappa _{ n+1}^{a}z^{n}. }$$
(2.26)

Also put \(K(z) = R(z) + \frac{1} {z} = \frac{C(z)} {z}.\) Then we have the relations

$$\displaystyle{ K(G(z)) = \frac{1} {G(z)}C(G(z)) = \frac{1} {G(z)}C\left (\frac{1} {z}M\left (\frac{1} {z}\right )\right ) = \frac{1} {G(z)}zG(z) = z. }$$

Note that M and C are in \(\mathbb{C}[\![z]\!]\), the ring of formal power series in z, \(G \in \mathbb{C}[\![\frac{1} {z}]\!]\), and \(K \in \mathbb{C}(\!(z)\!)\), the ring of formal Laurent series in z, i.e. \(zK(z) \in \mathbb{C}[\![z]\!]\). Thus \(K \circ G \in \mathbb{C}(\!(\frac{1} {z})\!)\) and \(G \circ K \in \mathbb{C}[\![z]\!]\). We then also have G(K(z)) = z.

Thus we recover the following theorem of Voiculescu, which is the main result on the R-transform. Voiculescu’s original proof in [177] was much more operator theoretic. One should also note that this computational machinery for the R-transform was also found independently and about the same time by Woess [204, 205], Cartwright and Soardi [49], and McLaughlin [125], in a more restricted setting of random walks on free product of groups. Our presentation here is based on the approach of Speicher in [161].

Theorem 18.

For a random variable a, let G a (z) be its Cauchy transform, and define its R-transform R a (z) by

$$\displaystyle{ G_{a}[R_{a}(z) + 1/z] = z. }$$
(2.27)

Then, for a and b freely independent, we have

$$\displaystyle{ R_{a+b}(z) = R_{a}(z) + R_{b}(z). }$$
(2.28)

Let us write, for a and b free, the above as:

$$\displaystyle{ z = G_{a+b}[R_{a+b}(z) + 1/z] = G_{a+b}[R_{a}(z) + R_{b}(z) + 1/z]. }$$
(2.29)

If we now put w: = R a+b (z) + 1∕z, then we have z = G a+b (w) and we can continue Equation (2.29) as:

$$\displaystyle{ G_{a+b}(w) = z = G_{a}[R_{a}(z) + 1/z] = G_{a}[w - R_{b}(z)] = G_{a}{\bigl [w - R_{b}[G_{a+b}(w)]\bigr ]}. }$$

Thus we get the subordination functions ω a and ω b given by

$$\displaystyle{ \omega _{a}(z) = z - R_{b}[G_{a+b}(z)]\qquad \text{and}\qquad \omega _{b}(z) = z - R_{a}[G_{a+b}(z)]. }$$
(2.30)

We have \(\omega _{a},\omega _{b} \in \mathbb{C}(\!(\frac{1} {z})\!)\), so \(G_{a} \circ \omega _{a} \in \mathbb{C}[\![\frac{1} {z}]\!]\). These satisfy the subordination relations

$$\displaystyle{ G_{a+b}(z) = G_{a}[\omega _{a}(z)] = G_{b}[\omega _{b}(z)]. }$$
(2.31)

We say that G a+b is subordinate to both G a and G b . The name comes from the theory of univalent functions; see [65, Ch. 6] for a general discussion.

Exercise 14. 

Show that ω a (z) + ω b (z) − 1∕G a (ω a (z)) = z.

Exercise 15. 

Suppose we have formal Laurent series ω a (z) and ω b (z) in \(\frac{1} {z}\) such that

$$\displaystyle{ G_{a}(\omega _{a}(z)) = G_{b}(\omega _{b}(z))\quad \text{and}\quad \omega _{a}(z) +\omega _{b}(z) - 1/G_{a}(\omega _{a}(z)) = z. }$$
(2.32)

Let G be the formal power series \(G(z) = G_{a}(\omega _{a}(z))\) and \(R(z) = G^{\langle -1\rangle }(z) - z^{-1}\). (\(G^{\langle -1\rangle }\) denotes here the inverse under composition of G.) By replacing z by \(G^{\langle -1\rangle }(z)\) in the second equation of (2.32), show that \(R(z) = R_{a}(z) + R_{b}(z)\). These equations can thus be used to define the distribution of the sum of two free random variables.

At the moment these are identities on the level of formal power series. In the next chapter, we will elaborate on their interpretation as identities of analytic functions; see Theorem 3.43.

2.5 Subordination and the non-commutative derivative

One might wonder about the relevance of the subordination formulation in (2.31). Since it has become more and more evident that the subordination formulation of free convolution is in many cases preferable to the (equivalent) description in terms of the R-transform, we want to give here some idea why subordination is a very natural concept in the context of free probability. When subordination appeared in this context first in papers of Voiculescu [181] and Biane [34], it was more an ad hoc construction – its real nature was only revealed later in the paper [190] of Voiculescu, where he related it to the non-commutative version of the derivative operation.

We will now introduce the basics of this non-commutative derivative; as before in this chapter, we will ignore all analytic questions and just deal with formal power series. In Chapter 8 we will have more to say about the analytic properties of the non-commutative derivatives.

Let \(\mathbb{C}\langle x\rangle\) be the algebra of polynomials in the variable x. Then we define the non-commutative derivative \(\partial _{x}\) as a linear mapping \(\partial _{x}: \mathbb{C}\langle x\rangle \rightarrow \mathbb{C}\langle x\rangle \otimes \mathbb{C}\langle x\rangle\) by the requirements that it satisfies the Leibniz rule

$$\displaystyle{ \partial _{x}(qp) = \partial _{x}(q) \cdot 1 \otimes p + q \otimes 1 \cdot \partial _{x}(p) }$$

and by

$$\displaystyle{ \partial _{x}1 = 0,\qquad \partial _{x}x = 1 \otimes 1. }$$

This means that it is given more explicitly as the linear extension of

$$\displaystyle{ \partial _{x}x^{n} =\sum _{ k=0}^{n-1}x^{k} \otimes x^{n-1-k}. }$$
(2.33)

We can also (and will) extend this definition from polynomials to infinite formal power series.

Exercise 16. 

(i) Let, for some \(z \in \mathbb{C}\) with z ≠ 0, f be the formal power series

$$\displaystyle{ f(x) = \frac{1} {z - x} =\sum _{ n=0}^{\infty } \frac{x^{n}} {z^{n+1}}. }$$

Show that we have then \(\partial _{x}f = f \otimes f\).

(ii) Let f be a formal power series in x with the property that \(\partial _{x}f = f \otimes f\). Show that f must then be either zero or of the form f(x) = 1∕(z − x) for some \(z \in \mathbb{C}\), with z ≠ 0.

We will now consider polynomials and formal power series in two non-commuting variables x and y. In this context, we still have the notion of \(\partial _{x}\) (and also of \(\partial _{y}\)), and now their character as “partial” derivatives becomes apparent. Namely, we define \(\partial _{x}: \mathbb{C}\langle x,y\rangle \rightarrow \mathbb{C}\langle x,y\rangle \otimes \mathbb{C}\langle x,y\rangle\) by the requirements that it should be a derivation, i.e. satisfy the Leibniz rule, and by the prescriptions:

$$\displaystyle{\partial _{x}x = 1 \otimes 1,\qquad \partial _{x}y = 0,\qquad \partial _{x}1 = 0.}$$

For a monomial \(x_{i_{1}}\cdots x_{i_{n}}\) in x and y (where we put x 1: = x and x 2: = y), this means explicitly

$$\displaystyle{ \partial _{x}x_{i_{1}}\cdots x_{i_{n}} =\sum _{ k=1}^{n}\delta _{ 1i_{k}}x_{i_{1}}\cdots x_{i_{k-1}} \otimes x_{i_{k+1}}\cdots x_{i_{n}}. }$$
(2.34)

Again it is clear that we can extend this definition also to formal power series in non-commuting variables.
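
For concreteness, here is a minimal Python sketch (an illustration only) of the action (2.34) of \(\partial _{x}\) on monomials in the two non-commuting letters x and y; a word is a string, a tensor term is a pair of strings, and the Leibniz rule is checked on one example.

```python
# The partial derivative d_x on a monomial: sum over the positions of the letter x,
# splitting the word there into a (left word, right word) tensor term.
def d_x(word):
    return [(word[:k], word[k + 1:]) for k, letter in enumerate(word) if letter == 'x']

print(d_x('xxx'))    # [('', 'xx'), ('x', 'x'), ('xx', '')], i.e. (2.33) for n = 3
print(d_x('yxyx'))   # [('y', 'yx'), ('yxy', '')]
print(d_x('yyy'))    # [] since d_x(y) = 0

# Leibniz rule on an example: d_x(qp) = d_x(q).(1 (x) p) + (q (x) 1).d_x(p)
q, p = 'xy', 'yx'
lhs = d_x(q + p)
rhs = [(l, r + p) for (l, r) in d_x(q)] + [(q + l, r) for (l, r) in d_x(p)]
assert lhs == rhs
```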

Let us note that we may define the derivation \(\partial _{x+y}\) on \(\mathbb{C}\langle x + y\rangle\) exactly as we did \(\partial _{x}\). Namely, \(\partial _{x+y}(1) = 0\) and \(\partial _{x+y}(x + y) = 1 \otimes 1\). Note that \(\partial _{x+y}\) can be extended to all of \(\mathbb{C}\langle x,y\rangle\) but not in a unique way unless we specify another basis element. Since \(\mathbb{C}\langle x + y\rangle \subset \mathbb{C}\langle x,y\rangle\), we may apply \(\partial _{x}\) to \(\mathbb{C}\langle x + y\rangle\) and observe that \(\partial _{x}(x + y) = 1 \otimes 1 = \partial _{x+y}(x + y)\). Thus

$$\displaystyle{ \partial _{x}(x + y)^{n} =\sum _{ k=1}^{n}(x + y)^{k-1} \otimes (x + y)^{n-k} = \partial _{ x+y}(x + y)^{n}. }$$

Hence

$$\displaystyle{ \partial _{x}\vert _{\mathbb{C}\langle x+y\rangle } = \partial _{x+y}. }$$
(2.35)

If we are given a polynomial \(p(x,y) \in \mathbb{C}\langle x,y\rangle\), then we will also consider E x [p(x, y)], the conditional expectation of p(x, y) onto a function of just the variable x, which should be the best approximation to p among such functions. There is no algebraic way of specifying what best approximation means; we need a state φ on the ∗-algebra generated by self-adjoint elements x and y for this. Given such a state, we will require that the difference between p(x, y) and E x [p(x, y)] cannot be detected by functions of x alone; more precisely, we ask that

$$\displaystyle{ \varphi {\bigl (q(x) \cdot \mathrm{ E}_{x}[p(x,y)]\bigr )} =\varphi {\bigl ( q(x) \cdot p(x,y)\bigr )} }$$
(2.36)

for all \(q \in \mathbb{C}\langle x\rangle\). If we are going from the polynomials \(\mathbb{C}\langle x,y\rangle\) over to the Hilbert space completion \(L^{2}(x,y,\varphi )\) with respect to the inner product given by \(\langle f,g\rangle:=\varphi (g^{\ast }f)\), then this amounts just to an orthogonal projection from the space \(L^{2}(x,y,\varphi )\) onto the subspace \(L^{2}(x,\varphi )\) generated by polynomials in the variable x. (Let us assume that φ is positive and faithful so that we get an inner product.) Thus, on the Hilbert space level, the existence and uniqueness of \(\mathrm{E}_{x}[p(x,y)]\) are clear. In general, though, it might not be the case that the projection of a polynomial in x and y is a polynomial in x – it will just be an \(L^{2}\)-function. If we assume, however, that x and y are free, then we claim that this projection maps polynomials to polynomials. In fact for this construction to work at the algebraic level, we only need assume that \(\varphi \vert _{\mathbb{C}\langle x\rangle }\) is non-degenerate, as this shows that \(\mathrm{E}_{x}\) is well defined by (2.36). It is clear from Equation (2.36) that \(\varphi (\mathrm{E}_{x}(a)) =\varphi (a)\) for all \(a \in \mathbb{C}\langle x,y\rangle\).

Let us consider some examples. Assume that x and y are free. Then it is clear that we have

$$\displaystyle{ \mathrm{E}_{x}[x^{n}y^{m}] = x^{n}\varphi (y^{m}) }$$

and more generally

$$\displaystyle{ \mathrm{E}_{x}[x^{n_{1} }y^{m}x^{n_{2} }] = x^{n_{1}+n_{2} }\varphi (y^{m}). }$$

It is not so clear what E x [yxyx] might be. Before giving the general rule, let us make some simple observations.

Exercise 17. 

Let \(\mathcal{A}_{1} = \mathbb{C}\langle x\rangle\) and \(\mathcal{A}_{2} = \mathbb{C}\langle y\rangle\) with x and y free and \(\varphi \vert _{\mathcal{A}_{1}}\) non-degenerate.

  1. (i)

    Show that \(\mathrm{E}_{x}[\mathring{\mathcal{A}}_{2}] = 0\).

  2. (ii)

    For α 1, , α n ∈ {1, 2} with α 1 ≠ ⋯ ≠ α n and n ≥ 2, show that \(\mathrm{E}_{x}[\mathring{\mathcal{A}}_{\alpha _{1}}\cdots\mathring{\mathcal{A}}_{\alpha _{n}}] = 0\).

Exercise 18. 

Let \(\mathcal{A}_{1}\) and \(\mathcal{A}_{2}\) be as in Exercise 17. Since \(\mathcal{A}_{1}\) and \(\mathcal{A}_{2}\) are free, we can use Equation (1.12) from Exercise 1.9 to write

$$\displaystyle{\mathcal{A}_{1}\vee \mathcal{A}_{2} = \mathcal{A}_{1}\oplus\mathring{\mathcal{A}}_{2}\oplus {{}{\sum }^{\oplus }}_{ n\geq 2}\,{{}{\sum }^{\oplus }}_{ \alpha _{1}\not =\cdots \not =\alpha _{n}}\mathring{\mathcal{A}}_{\alpha _{1}}\mathring{\mathcal{A}}_{\alpha _{2}}\cdots\mathring{\mathcal{A}}_{\alpha _{n}}.}$$

We have just shown that if E x is a linear map satisfying Equation (2.36), then E x is the identity on the first summand and 0 on all remaining summands. Show that by defining E x this way we get the existence of a linear mapping from \(\mathcal{A}_{1} \vee \mathcal{A}_{2}\) to \(\mathcal{A}_{1}\) satisfying Equation (2.36). An easy consequence of this is that for \(q_{1}(x),q_{2}(x) \in \mathbb{C}\langle x\rangle\) and \(p(x,y) \in \mathbb{C}\langle x,y\rangle\) we have E x [q 1(x)p(x, y)q 2(x)] = q 1(x)E x [p(x, y)]q 2(x).
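For instance, for the simplest mixed word yx the decomposition reads

$$\displaystyle{yx =\mathring{y}\,\mathring{x} +\varphi (x)\,\mathring{y} +\varphi (y)\,\mathring{x} +\varphi (y)\varphi (x)1,}$$

where \(\mathring{x} = x -\varphi (x)1\) and \(\mathring{y} = y -\varphi (y)1\); since E x is the identity on \(\mathcal{A}_{1}\) and 0 on \(\mathring{\mathcal{A}}_{2}\) and on \(\mathring{\mathcal{A}}_{2}\mathring{\mathcal{A}}_{1}\), only the last two terms survive and we get E x [yx] = φ(y)x.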

Let \(a_{1} = y^{n_{1}},a_{2} = y^{n_{2}}\) and \(b = x^{m_{1}}\). To compute \(\mathrm{E}_{x}(y^{n_{1}}x^{m_{1}}y^{n_{2}})\) we follow the same centring procedure used to compute φ(a 1 ba 2) in Section 1.12. From Exercise 17 we see that

$$\displaystyle\begin{array}{rcl} \mathrm{E}_{x}[a_{1}ba_{2}]& =& \mathrm{E}_{x}[\mathring{a}_{1}ba_{2}] +\varphi (a_{1})b\varphi (a_{2}) {}\\ & =& \mathrm{E}_{x}[\mathring{a}_{1}\mathring{b}a_{2}] +\varphi (\mathring{a}_{1}a_{2})\varphi (b) +\varphi (a_{1})b\varphi (a_{2}) {}\\ & =& \varphi (\mathring{a}_{1}a_{2})\varphi (b) +\varphi (a_{1})b\varphi (a_{2}) {}\\ & =& \varphi (a_{1}a_{2})\varphi (b) -\varphi (a_{1})\varphi (b)\varphi (a_{2}) +\varphi (a_{1})b\varphi (a_{2}). {}\\ \end{array}$$

Thus, multiplying on the right by \(x^{m_{2}}\) and using the consequence noted at the end of Exercise 18, we get

$$\displaystyle\begin{array}{rcl} \mathrm{E}_{x}[y^{n_{1} }x^{m_{1} }y^{n_{2} }x^{m_{2} }]& =& \ \varphi (y^{n_{1}+n_{2} })\varphi (x^{m_{1} })x^{m_{2} } +\varphi (y^{n_{1} })x^{m_{1} }\varphi (y^{n_{2} })x^{m_{2} } {}\\ & & -\varphi (y^{n_{1} })\varphi (x^{m_{1} })\varphi (y^{n_{2} })x^{m_{2} }. {}\\ \end{array}$$

The following theorem (essentially in the work [34] of Biane) gives the general recipe for calculating such expectations. As usual the formulas are simplified by using cumulants. To give the rule, we need the following bit of notation. Given \(\sigma \in \mathcal{P}(n)\) and \(a_{1},\ldots,a_{n} \in \mathcal{A}\), we define \(\tilde{\varphi }_{\sigma }(a_{1},\ldots,a_{n})\) in the same way as φ σ in Equation (2.11) except we do not apply φ to the last block, i.e. the block containing n. For example, if σ = {(1, 3, 4), (2, 6), (5)}, then \(\tilde{\varphi }_{\sigma }(a_{1},a_{2},a_{3},a_{4},a_{5},a_{6}) =\varphi (a_{1}a_{3}a_{4})\varphi (a_{5})a_{2}a_{6}\). More explicitly, for σ = {V 1, …, V s } ∈ NC(r) with \(r \in V_{s}\), we put

$$\displaystyle{\tilde{\varphi }_{\sigma }(a_{1},\ldots,a_{r}) =\varphi {\bigl (\prod _{i_{1}\in V _{1}}a_{i_{1}}\bigr )}\cdots \varphi {\bigl (\prod _{i_{s-1}\in V _{s-1}}a_{i_{s-1}}\bigr )} \cdot \prod _{i_{s}\in V _{s}}a_{i_{s}}.}$$
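Note that the block containing the last element may also contain earlier elements; for example, for σ = {(1, 4), (2, 3)} ∈ NC(4) we have

$$\displaystyle{\tilde{\varphi }_{\sigma }(a_{1},a_{2},a_{3},a_{4}) =\varphi (a_{2}a_{3})\,a_{1}a_{4}.}$$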

Theorem 19.

Let x and y be free. Then for r ≥ 1 and n 1, m 1, , n r , m r ≥ 0, we have

$$\displaystyle{ E_{x}[y^{n_{1} }x^{m_{1} }\cdots y^{n_{r} }x^{m_{r} }] =\sum _{\pi \in NC(r)}\kappa _{\pi }(y^{n_{1} },\ldots,y^{n_{r} }) \cdot \tilde{\varphi }_{K(\pi )}(x^{m_{1} },\ldots,x^{m_{r} }). }$$
(2.37)

Let us check that this agrees with our previous calculation of \(\mathrm{E}_{x}[y^{n_{1}}x^{m_{1}}y^{n_{2}}x^{m_{2}}]\).

$$\displaystyle\begin{array}{rcl} & & \mathrm{E}_{x}[y^{n_{1} }x^{m_{1} }y^{n_{2} }x^{m_{2} }] {}\\ & & \quad =\kappa _{\{(1,2)\}}(y^{n_{1} },y^{n_{2} }) \cdot \tilde{\varphi }_{\{(1),(2)\}}(x^{m_{1} },x^{m_{2} }) +\kappa _{\{(1),(2)\}}(y^{n_{1} },y^{n_{2} }) \cdot \tilde{\varphi }_{\{(1,2)\}}(x^{m_{1} },x^{m_{2} }) {}\\ & & \quad =\kappa _{2}(y^{n_{1} },y^{n_{2} })\varphi (x^{m_{1} })x^{m_{2} } +\kappa _{1}(y^{n_{1} })\kappa _{1}(y^{n_{2} })x^{m_{1}+m_{2} } {}\\ & & \quad ={\bigl (\varphi (y^{n_{1}+n_{2} }) -\varphi (y^{n_{1} })\varphi (y^{n_{2} })\bigr )}\varphi (x^{m_{1} }) \cdot x^{m_{2} }\mbox{ } +\varphi (y^{n_{1} })\varphi (y^{n_{2} }) \cdot x^{m_{1}+m_{2} }. {}\\ \end{array}$$
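In particular, this answers the question raised earlier about E x [yxyx]: taking n 1 = m 1 = n 2 = m 2 = 1 gives

$$\displaystyle{\mathrm{E}_{x}[yxyx] ={\bigl (\varphi (y^{2}) -\varphi (y)^{2}\bigr )}\varphi (x)\,x +\varphi (y)^{2}\,x^{2}.}$$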

The proof of the theorem is outlined in the exercise below.

Exercise 19. 

(i) Given π ∈ NC(n), let π′ be the non-crossing partition of [n′] = {0, 1, 2, 3, …, n} obtained by joining 0 to the block of π containing n. For \(a_{0},a_{1},\ldots,a_{n} \in \mathcal{A}\), show that \(\varphi _{\pi '}(a_{0},a_{1},a_{2},\ldots,a_{n}) =\varphi (a_{0}\tilde{\varphi }_{\pi }(a_{1},\ldots,a_{n}))\).

(ii) Suppose that \(\mathcal{A}_{1},\mathcal{A}_{2} \subset \mathcal{A}\) are unital subalgebras of \(\mathcal{A}\) which are free with respect to the state φ. Let \(x_{0},x_{1},\ldots,x_{n} \in \mathcal{A}_{1}\) and \(y_{1},y_{2},\ldots,y_{n} \in \mathcal{A}_{2}\). Show that

$$\displaystyle{\varphi (x_{0}y_{1}x_{1}y_{2}x_{2}\cdots y_{n}x_{n}) =\sum _{\pi \in NC(n)}\kappa _{\pi }(y_{1},\ldots,y_{n})\varphi _{K(\pi )'}(x_{0},x_{1},\ldots,x_{n}).}$$

Prove Theorem 19 by showing that with the expression given in (2.37) one has for all m ≥ 0

$$\displaystyle{ \varphi {\bigl (x^{m} \cdot \mathrm{ E}_{ x}[y^{n_{1} }x^{m_{1} }\cdots y^{n_{r} }x^{m_{r} }]\bigr )} =\varphi {\bigl ( x^{m} \cdot y^{n_{1} }x^{m_{1} }\cdots y^{n_{r} }x^{m_{r} }\bigr )}. }$$

Exercise 20. 

Use the method of Exercise 19 to work out \(\mathrm{E}_{x}[x^{m_{1}}y^{n_{1}}\cdots x^{m_{r}}y^{n_{r}}]\).

By linear extension of Equation (2.37), one can thus compute the conditional expectation onto the variable x of any non-commutative polynomial, or formal power series, in two free variables x and y. We now want to identify the conditional expectation of resolvents in x + y. To achieve this we need a crucial intertwining relation between the partial derivative and the conditional expectation.
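Here a resolvent in x + y may be understood as the formal power series

$$\displaystyle{ \frac{1} {z - (x + y)} =\sum _{n\geq 0} \frac{1} {z^{n+1}}(x + y)^{n},}$$

to which E x can be applied term by term; each \(\mathrm{E}_{x}[(x + y)^{n}]\) is a polynomial in x by linearity and Theorem 19. (If x and y live in a C ∗-probability space, the series converges in norm for | z | > ∥x + y∥.)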

Lemma 20.

Suppose φ is a state on \(\mathbb{C}\langle x,y\rangle\) such that x and y are free and \(\varphi \vert _{\mathbb{C}\langle x\rangle }\) is non-degenerate. Then

$$\displaystyle{ \mathrm{E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x+y}\vert _{\mathbb{C}\langle x+y\rangle } = \partial _{x} \circ \mathrm{ E}_{x}\vert _{\mathbb{C}\langle x+y\rangle }. }$$
(2.38)

Proof:

We let \(\mathcal{A}_{1} = \mathbb{C}\langle x\rangle\) and \(\mathcal{A}_{2} = \mathbb{C}\langle y\rangle\). We use the decomposition from Exercise 1.9

$$\displaystyle{\mathcal{A}_{1}\vee \mathcal{A}_{2}\ominus \mathcal{A}_{1} =\mathring{\mathcal{A}}_{2}\oplus \sum _{n\geq 2}^{\oplus }\ \sum _{\alpha _{1}\not =\cdots \not =\alpha _{n}}^{\oplus }\mathring{\mathcal{A}}_{\alpha _{1}}\cdots\mathring{\mathcal{A}}_{\alpha _{n}}}$$

and examine the behaviour of \(\mathrm{E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x}\) on each summand. We know that \(\partial _{x}\) is 0 on \(\mathring{\mathcal{A}}_{2}\) by definition. For n ≥ 2

$$\displaystyle\begin{array}{rcl} & & \mathrm{E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x}(\mathring{\mathcal{A}}_{\alpha _{1}}\cdots\mathring{\mathcal{A}}_{\alpha _{n}}) {}\\ & & \quad \subseteq \sum _{k=1}^{n}\delta _{ 1,\alpha _{k}}\mathrm{E}_{x}(\mathring{\mathcal{A}}_{\alpha _{1}}\cdots\mathring{\mathcal{A}}_{\alpha _{k-1}}(\mathbb{C}1\oplus\mathring{\mathcal{A}}_{\alpha _{k}})) \otimes \mathrm{ E}_{x}((\mathbb{C}1\oplus\mathring{\mathcal{A}}_{\alpha _{k}})\mathring{\mathcal{A}}_{\alpha _{k+1}}\cdots\mathring{\mathcal{A}}_{\alpha _{n}}).{}\\ \end{array}$$

By Exercise 17, in each term, one or both of the factors is 0. Thus \(\mathrm{E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x}\vert _{\mathcal{A}_{1}\vee \mathcal{A}_{2}\ominus \mathcal{A}_{1}} = 0\). Hence

$$\displaystyle{\mathrm{E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x}\vert _{\mathcal{A}_{1}\vee \mathcal{A}_{2}} =\mathrm{ E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x} \circ \mathrm{ E}_{x}\vert _{\mathcal{A}_{1}\vee \mathcal{A}_{2}} = \partial _{x} \circ \mathrm{ E}_{x}\vert _{\mathcal{A}_{1}\vee \mathcal{A}_{2}},}$$

and then by Equation (2.35) we have

$$\displaystyle{\mathrm{E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x+y}\vert _{\mathbb{C}\langle x+y\rangle } =\mathrm{ E}_{x} \otimes \mathrm{ E}_{x} \circ \partial _{x}\vert _{\mathbb{C}\langle x+y\rangle } = \partial _{x} \circ \mathrm{ E}_{x}\vert _{\mathbb{C}\langle x+y\rangle }.}$$
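As a quick illustration of (2.38), take g(x + y) = (x + y)2. Then \(\mathrm{E}_{x}[(x + y)^{2}] = x^{2} + 2\varphi (y)x +\varphi (y^{2})1\), and both sides of (2.38) yield the same element:

$$\displaystyle{\mathrm{E}_{x} \otimes \mathrm{ E}_{x}{\bigl [1 \otimes (x + y) + (x + y) \otimes 1\bigr ]} = 1 \otimes x + x \otimes 1 + 2\varphi (y)\,1 \otimes 1 = \partial _{x}{\bigl (x^{2} + 2\varphi (y)x +\varphi (y^{2})1\bigr )}.}$$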

Theorem 21.

Let x and y be free. For every \(z \in \mathbb{C}\) with z ≠ 0, there exists a \(w \in \mathbb{C}\) such that

$$\displaystyle{ \mathrm{E}_{x}\left [ \frac{1} {z - (x + y)}\right ] = \frac{1} {w - x}. }$$
(2.39)

In other words, the best approximation for a resolvent in x + y by a function of x is again a resolvent.

By applying the state φ to both sides of (2.39), one obtains the subordination for the Cauchy transforms, and thus it is clear that the w from above must agree with the subordination function from (2.31), w = ω(z).
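Explicitly, using φ(E x (a)) = φ(a), this reads

$$\displaystyle{G_{x+y}(z) =\varphi \left ( \frac{1} {z - (x + y)}\right ) =\varphi \left (\mathrm{E}_{x}\left [ \frac{1} {z - (x + y)}\right ]\right ) =\varphi \left ( \frac{1} {w - x}\right ) = G_{x}(w),}$$

where G denotes the Cauchy transform.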

Proof:

We put

$$\displaystyle{ f(x,y):= \frac{1} {z - (x + y)}. }$$

By Exercise 16, part (i), we know that \(\partial _{x+y}f = f \otimes f\). By Lemma 20 we have that for functions g of x + y

$$\displaystyle{ \partial _{x}\mathrm{E}_{x}[g(x + y)] =\mathrm{ E}_{x} \otimes \mathrm{ E}_{x}[\partial _{x+y}g(x + y)]. }$$
(2.40)

By applying (2.40) to f, we obtain

$$\displaystyle{\partial _{x}\mathrm{E}_{x}[f] =\mathrm{ E}_{x} \otimes \mathrm{ E}_{x}[\partial _{x+y}f] =\mathrm{ E}_{x} \otimes \mathrm{ E}_{x}[f \otimes f] =\mathrm{ E}_{x}[f] \otimes \mathrm{ E}_{x}[f].}$$

Thus, by the second part of Exercise 16, we know that E x [f] is a resolvent in x and we are done. □