34.1 Introduction

In this work, we review and investigate the Gaussian \(\beta \)-ensembles as the basis for a generalized Wishart density distribution and how this distribution can be optimized over various surfaces, in particular the unit sphere. We take advantage of the general properties of the Vandermonde determinant. To begin with, we give a brief outline of key terms, including but not limited to the univariate and multivariate Gaussian distributions, the Chi-squared density, the Wishart density, the occurrence of random matrices, their joint eigenvalue probability distribution, the \(\beta \)-ensembles, and the Vandermonde matrix and its determinant. We then illustrate the optimization of the joint probability density function of the \(\beta \)-ensembles over the unit sphere based on the characteristic properties of the Vandermonde determinant.

34.1.1 Univariate and Multivariate Normal Distribution

Definition 34.1

The univariate normal probability density function (Gaussian normal density) for a random variable X, which is the basis for construction of many multivariate distributions that occur in statistics, can be expressed as [4]:

$$\begin{aligned} \mathbb {P}_{X}(x) = k\exp \left\{ -\frac{1}{2}\alpha (x-\beta )^{2}\right\} \equiv k\exp \left\{ -\frac{1}{2} (x-\beta ) \alpha (x-\beta )\right\} \end{aligned}$$
(34.1)

where \(\alpha \) and k are chosen so that the integral of (34.1) over the entire \(x\)-axis is unity and \(\beta \) is equal to the expectation of X, that is, \(\mathbb {E}[X] = \beta \). It is then said that X follows a normal probability density function with parameters \(\alpha \) and \(\beta \), also expressed as \(X\sim \mathcal {N}(\alpha ,\beta )\).

The density function of the multivariate normal distribution of random variables, say \(X_{1},\, \ldots ,\, X_{p}\), is defined analogously. The scalar variable x in (34.1) is replaced by the vector \(\mathbf {X} = (X_{1}, \,\ldots ,\, X_{p})^{\top }\), the scalar constant \(\beta \) is replaced by a vector \(\mathbf {b} = (b_{1},\, \ldots , \,b_{p})^{\top }\) and the scalar \(\alpha \) is replaced by the positive definite matrix

$$\begin{aligned} \mathbf {A} = \left( \begin{array}{cccc} a_{11} &{} a_{12} &{} \cdots &{} a_{1p} \\ a_{21} &{} a_{22} &{} \cdots &{} a_{2p} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ a_{p1} &{} a_{p2} &{} \cdots &{} a_{pp} \end{array} \right) . \end{aligned}$$
(34.2)

The expression

$$\begin{aligned} \alpha (x - \beta )^{2} = (x - \beta ) \alpha (x - \beta ) \end{aligned}$$

is replaced by the quadratic form

$$\begin{aligned} (\mathbf {X} - \mathbf {b})^{\top }\mathbf {A}(\mathbf {X} - \mathbf {b}) = \sum _{i,j=1}^{p} a_{ij} (x_{i}-b_{i})(x_{j}-b_{j}). \end{aligned}$$
(34.3)

Thus, the density of the p-variate normal distribution becomes

$$\begin{aligned} \mathbb {P}(\mathbf {X}) = K \exp \left\{ -\frac{1}{2}(\mathbf {X} - \mathbf {b})^{\top }\mathbf {A}(\mathbf {X} - \mathbf {b}) \right\} \end{aligned}$$
(34.4)

where \(\top \) denotes transpose and \(K > 0\) is chosen so that the integral over the entire p-dimensional Euclidean space \(x_{1}, \ldots , x_{p}\) is unity.

Theorem 34.1

If the density of a p-dimensional random vector \(\mathbf {X}\) is

$$\begin{aligned} {\sqrt{|\mathbf {A}|}}{(2\pi )^{-\frac{1}{2}p}}\exp \left\{ -\frac{1}{2}(\mathbf {X} - \mathbf {b})^{\top }\mathbf {A}(\mathbf {X} - \mathbf {b})\right\} , \end{aligned}$$

then the expected value of \(\mathbf {X}\) is \(\mathbf {b}\) and the covariance matrix is \(\mathbf {A}^{-1}\), see [4]. Conversely, given a vector \(\pmb {\mu }\) and a positive definite matrix \(\pmb {\varSigma }\), there is a multivariate normal density

$$\begin{aligned} \mathbb {P}(\mathbf {X}) = (2\pi )^{-\frac{1}{2}p}|\pmb {\varSigma }|^{-\frac{1}{2}}\exp \left\{ -\frac{1}{2}(\mathbf {X} - \pmb {\mu })^{\top }\pmb {\varSigma }^{-1}(\mathbf {X} - \pmb {\mu }) \right\} \end{aligned}$$
(34.5)

such that the expected value of the density is \(\pmb {\mu }\) and the covariance matrix is \(\pmb {\varSigma }\).

The density (34.5) is often denoted as \(\mathbf {X} \sim \mathcal {N}_{p}(\pmb {\mu }, \pmb {\varSigma })\).
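To make the density (34.5) concrete, the following is a minimal numerical sketch; the dimension, mean vector and covariance matrix are illustrative choices rather than values from the text. It evaluates (34.5) directly and compares the result with scipy's reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (not from the text): p = 3 dimensions.
p = 3
mu = np.array([1.0, -0.5, 2.0])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])   # positive definite covariance

def mvn_density(x, mu, Sigma):
    """Evaluate the p-variate normal density (34.5) directly."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * quad)

x = np.array([0.5, 0.0, 1.0])
print(mvn_density(x, mu, Sigma))                       # direct formula
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # reference value, should agree
```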

For example, the diagonal element \(\pmb {\varSigma }_{ii}\) of the covariance matrix is the variance of the ith component of \(\mathbf {X}\), which may sometimes be denoted by \(\sigma _{i}^{2}\). The correlation between \(X_{i}\) and \(X_{j}\) is defined as

$$ \rho _{ij} = \frac{\sigma _{ij}}{\sqrt{\sigma _{ii}}\sqrt{\sigma _{jj}}} = \frac{\sigma _{ij}}{{\sigma _{i}}{\sigma _{j}}} $$

where \(\sigma _k\) denotes the standard deviation of \(X_k\) and \(\sigma _{ij} = \pmb {\varSigma }_{ij}\). This measure of association is symmetric in \(X_{i}\) and \(X_{j}\) such that \(\rho _{ij} = \rho _{ji}\). Since

$$ \left( \begin{array}{cc} \sigma _{ii} &{} \sigma _{ij} \\ \sigma _{ji} &{} \sigma _{jj} \end{array} \right) = \left( \begin{array}{cc} \sigma _{i}^{2} &{} \sigma _{i}\sigma _{j}\rho _{ij} \\ \sigma _{i}\sigma _{j}\rho _{ij} &{} \sigma _{j}^{2} \end{array} \right) $$

is positive-definite, the determinant

$$ \left| \begin{array}{cc} \sigma _{i}^{2} &{} \sigma _{i}\sigma _{j}\rho _{ij} \\ \sigma _{i}\sigma _{j}\rho _{ij} &{} \sigma _{j}^{2} \end{array} \right| = \sigma _{i}^{2}\sigma _{j}^{2}(1 - \rho _{ij}^{2}) $$

is positive. Therefore \(-1< \rho _{ij} < 1\).

34.1.2 Wishart Distribution

The matrix distribution that is now known as the Wishart distribution was first derived by Wishart in the late 1920s [56]. It is usually regarded as a multivariate extension of the \(\chi ^{2}\)-distribution.

Theorem 34.2

The sum of squares, \(\displaystyle \pmb {\chi }^{2} = Z_{1}^{2} + \cdots + Z_{n}^{2}\), of n independent standard normal variables \(Z_{i}\) of mean 0 and variance 1, that is, distributed as \(\mathcal {N}(0,1)\), has a \(\chi ^{2}\)-distribution defined by:

$$\begin{aligned} \mathbb {P}_{\pmb {\chi ^2}}(x) = \frac{1}{2^{\frac{1}{2}n}\mathrm {\Gamma }{\left( \frac{1}{2}n\right) }}e^{-\frac{1}{2}x}x^{\frac{1}{2}n-1}. \end{aligned}$$
(34.6)

where \(\mathrm {\Gamma }\left( \cdot \right) \) is the Gamma function [40].

Definition 34.2

Let \(\mathbf {X} = (\mathbf {X}_{1}, \ldots , \mathbf {X}_{n})\), where \(\mathbf {X}_{i} \sim \mathcal {N}(\pmb {\mu }_{i}, \pmb {\varSigma })\) and \(\mathbf {X}_{i}\) is independent of \(\mathbf {X}_{j}\) for \(i\not = j\). The matrix \(\mathbf {W}:p\times p\) is said to be Wishart distributed [56] if and only if \(\mathbf {W} = \mathbf {X}\mathbf {X}^{\top }\) for some matrix \(\mathbf {X}\) in a family of Gaussian matrices \(\mathbf {G}_{m \times n}, m \le n\), that is, \(\mathbf {X} \sim \mathcal {N}_{m,n}(\pmb {\mu }, \pmb {\varSigma }, \mathbf {I})\) where \(\pmb {\varSigma }\ge 0\). If \(\pmb {\mu } = 0\) we have a central Wishart distribution, which will be denoted by \(\mathbf {W} \sim \mathcal {W}_{m}(\pmb {\varSigma }, n)\), and if \(\pmb {\mu } \not =0\) we have a non-central Wishart distribution, which will be denoted \(\mathbf {W} \sim \mathcal {W}_{m}(\pmb {\varSigma }, n, \pmb {\triangle })\), where \(\pmb {\triangle } = \pmb {\mu }\pmb {\mu }^{\top }\) and n is the number of degrees of freedom.

In our study, we shall mainly focus on the central Wishart distribution, for which \(\pmb {\mu } = 0\) and \(\mathbf {X} \sim \mathcal {N}_{m,n}(\mathbf {0}, \pmb {\varSigma }, \mathbf {I})\).

Theorem 34.3

([4]) Let \(\mathbf {W}\) be a random matrix which can be expressed as \(\displaystyle \mathbf {W} = \mathbf {X}\mathbf {X}^{\top } \), where the columns \(\mathbf {X}_{1}, \cdots , \mathbf {X}_{n}, ~(n \ge p)\), are independent, each with the distribution \(\mathcal {N}_{p}(\pmb {\mu }, \pmb {\varSigma })\). Then \(\mathbf {W} \sim \mathcal {W}_{p}(\pmb {\varSigma }, n)\). If \(\pmb {\varSigma } > 0\), then the random matrix \(\mathbf {W}\) has the joint density function:

$$\begin{aligned} \mathbb {P}(\mathbf {W}) = {\left\{ \begin{array}{ll} \displaystyle \frac{1}{2^{\frac{np}{2}}|\pmb {\varSigma }|^{\frac{n}{2}}\mathrm {\Gamma }_{p}\left( \frac{n}{2}\right) } |\mathbf {W}|^{\frac{n-p-1}{2}}\exp \left( {-\frac{1}{2}\mathrm {Tr}\left( \pmb {\varSigma }^{-1}\mathbf {W} \right) }\right) , &{} \text{ if } \,\,\mathbf {W} > 0 \\ 0, &{} \text{ otherwise }. \end{array}\right. } \end{aligned}$$
(34.7)

where the multivariate Gamma function is given by

$$\begin{aligned} \displaystyle {\mathrm {\Gamma }}_{p}\left( n/2\right) = \pi ^{\frac{p(p-1)}{4}}\prod _{i=1}^{p}{\mathrm {\Gamma }}\left( \frac{1}{2}(n+1-i)\right) . \end{aligned}$$
(34.8)

If \(p=1, \pmb {\mu } = \mathbf {0}\) and \(\pmb {\varSigma } = \mathbf {1}\), then the Wishart matrix is identical to a central \(\pmb {\chi }^{2}\)-variable with n degrees of freedom as defined in (34.6).
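The last remark can be checked by simulation: for \(p = 1\), \(\pmb {\mu } = 0\) and \(\pmb {\varSigma } = 1\) the Wishart matrix reduces to a \(\chi ^{2}\)-variable with n degrees of freedom, whose mean is n and variance 2n. A minimal Monte Carlo sketch (degrees of freedom, sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
n = 5                            # degrees of freedom
samples = 100_000                # Monte Carlo sample size (arbitrary)

# For p = 1, W = X X^T is just the sum of squares of n i.i.d. N(0,1) variables.
X = rng.standard_normal((samples, n))
W = np.sum(X**2, axis=1)

# A chi-squared variable with n degrees of freedom has mean n and variance 2n.
print(W.mean(), "should be close to", n)
print(W.var(), "should be close to", 2 * n)
```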

Theorem 34.4

([24, 39]) If   \(\mathbf {X}\) is distributed as \(\mathcal {N}(\pmb {\mu }, \displaystyle \pmb {\varSigma }),\) then the probability density distribution of the eigenvalues of \(\mathbf {X}\mathbf {X}^{\top }\), denoted \(\pmb {\lambda } = (\lambda _{1}, \ldots , \lambda _{m})\), is given by:

$$\begin{aligned} \displaystyle \mathbb {P}({\pmb {\lambda }}) = \frac{\pi ^{-\frac{1}{2}n}\det (\pmb {\varSigma })^{-\frac{1}{2}n}\det (\mathbf {D})^{\frac{1}{2}(n-p-1)}}{2^{\frac{1}{2}np}\mathrm {\Gamma }_{p}{\left( \frac{1}{2}n\right) }\mathrm {\Gamma }_{p}{\left( \frac{1}{2}p\right) }}\prod _{i < j}(\lambda _{i} - \lambda _{j})\exp \left( -\frac{1}{2}\mathrm {Tr}(\pmb {\varSigma }^{-1}\mathbf {D}) \right) \end{aligned}$$
(34.9)

where \(\mathbf {D} = \mathrm {diag}(\lambda _{i})\) and \(\mathrm {\Gamma }_{p}\) is the multivariate Gamma function defined in (34.8).

It will prove useful that (34.9) contains the term \(\displaystyle \prod _{i < j}(\lambda _{i} - \lambda _{j})\), which is the determinant of a Vandermonde matrix [46]. The Vandermonde matrix is a well-known type of matrix that appears in many different applications in mathematics, physics and, more recently, multivariate statistics, most famously in curve-fitting using polynomials; for details see [46].

Definition 34.3

A square Vandermonde matrix of size \(n \times n\) is determined by n values \(\mathbf {x}=(x_1,\ldots ,x_n)\) and is defined as follows:

$$\begin{aligned} V_{n}(\mathbf {x}) = \begin{bmatrix} 1 &{} 1 &{} \cdots &{} 1 \\ x_1 &{} x_2 &{} \cdots &{} x_n \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ x_1^{n-1} &{} x_2^{n-1} &{} \cdots &{} x_n^{n-1} \end{bmatrix}. \end{aligned}$$
(34.10)

The determinant of the Vandermonde matrix is well known.

Lemma 34.1

The determinant of square Vandermonde matrices has the form

$$\begin{aligned} \det V_n(\mathbf {x}) \equiv v_n(\mathbf {x}) =\prod _{1\le i<j\le n}(x_j-x_i). \end{aligned}$$
(34.11)

This determinant is also referred to as the Vandermonde determinant or Vandermonde polynomial or Vandermondian [46].
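Lemma 34.1 is easy to verify numerically. The sketch below, with an arbitrary choice of distinct nodes, builds the matrix (34.10) and compares the numerical determinant with the product formula (34.11).

```python
import numpy as np
from itertools import combinations

def vandermonde(x):
    """Build the square Vandermonde matrix (34.10): row k holds x_j**k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.vander(x, N=n, increasing=True).T  # transpose so powers increase down the rows

def vandermonde_det(x):
    """Product formula (34.11): prod_{i<j} (x_j - x_i)."""
    return np.prod([x[j] - x[i] for i, j in combinations(range(len(x)), 2)])

x = [0.3, -1.2, 2.0, 0.7]          # arbitrary distinct nodes
V = vandermonde(x)
print(np.linalg.det(V))            # numerical determinant
print(vandermonde_det(x))          # product formula, should agree
```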

We take advantage of this property of the Vandermonde determinant to establish the relationship between products of Vandermonde matrices and the joint eigenvalue probability density functions of large random matrices that occur in various areas of classical mechanics, mathematics, statistics and many other areas of science. We also illustrate the optimization of these densities based on the extreme points of the Vandermonde determinant.

The extreme points of the Vandermonde determinant appear in random matrix theory, for example to compute the limiting value of the so-called Stieltjes transform using the method sometimes called the ‘Coulomb gas analogy’ [32]. This is also closely related to many problems in quantum mechanics and statistical mechanics. For an overview of some other applications of the extreme points see [35].

In the next section we give a brief overview of random matrix theory (RMT).

34.2 Overview of Random Matrix Theory

Random matrices were first introduced in mathematical statistics in the late 1920s [56], and today the joint probability density functions of eigenvalues of random matrices play a significant role in probability theory, mathematical physics and quantum mechanics [22]. A random matrix, in simple terms, can be defined as any matrix whose real- or complex-valued entries are random variables.

Random matrix theory primarily discusses the properties of large or complex matrices with random variables as entries by utilizing existing probability laws, in particular Gaussian distributions [4, 5]. The main motivating question in the probabilistic approach to random matrices is: what can be said about the probabilities of a few, if not all, of its eigenvalues and eigenvectors? This question is significant in many areas of science including particle physics, mathematics, statistics and finance, as highlighted below.

In nuclear physics random matrices were applied in the modelling of the nuclei of heavy atoms [55]. The main idea was to investigate the spacing between the lines in the electromagnetic spectrum of a heavy atom nucleus, e.g. Uranium 238, which resembles the separation between the eigenvalues of a random matrix [32]. These random matrices have also been employed in solid-state physics to model the chaotic behaviour of large disordered Hamiltonians in terms of mean field approximation [14]. Random matrices have also been applied in quantum chaos to characterise the spectral statistics of quantum systems [9, 12].

Random unitary matrix transformations have also appeared in theoretical physics; e.g. the boson sampling model [1] has been applied in quantum optics to describe the advantages of quantum computation over classical computation. Random unitary transformations can also be directly implemented in an optical circuit by mapping their parameters to optical circuit components [41].

Other applications in theoretical physics include analysing the chiral Dirac operator in quantum chromodynamics [28, 52] and quantum gravity in two dimensions [21]. In mesoscopic physics random matrices are used to characterise materials of intermediate length [43], spin-transfer torque [42], the fractional quantum Hall effect [10], Anderson localization [25], quantum dots [59] and superconductors [7]. Further applications include the electrodynamic properties of structural materials [58], describing the electrical conduction properties of disordered organic and inorganic materials [57], quantum gravity [15] and string theory [8].

In mathematics some applications include the distribution of the zeros of the Riemann zeta function [27], the enumeration of permutations with certain particularities, where random matrices can help to derive polynomials for permutation patterns [38], and the counting of certain knots and links as applied to folding and coloring [8].

In multivariate statistics random matrices were introduced for the statistical analysis of large samples and the estimation of covariance matrices [18,19,20, 33, 45, 56]. More recent results extend the classical scalar inequalities to finite sums of random Hermitian matrices, allowing improved analysis of structured dimension reduction based on their largest eigenvalues [48].

Random matrices have also been applied to financial modelling especially risk models and time series [6, 23, 51, 56].

Random matrices are also increasingly used in neuroscience to model the network of synaptic connections between neurons in the brain. For neuronal networks they help to construct dynamical models based on a random connectivity matrix [44]. This has also helped to establish the link relating the statistical properties of the spectrum of biologically inspired random matrix models to the dynamical behaviour of randomly connected neural networks [13, 26, 36, 47, 53].

In optimal control theory random matrices appear as coefficients in the state equation of linear evolution. In most problems the values of the parameters in these matrices are not known with certainty, in which case there are random matrices in the state equation and the problem is known as one of stochastic control [11, 49, 50].

In the next section, we will discuss some well-known ensembles that appear in the mathematical study of random matrices.

34.3 Classical Random Matrix Ensembles

The most famous classical ensembles include the Gaussian Orthogonal Ensembles (G.O.E), the Gaussian Unitary Ensembles (G.U.E), the Gaussian Symplectic Ensembles (GSE), the Wishart Ensembles (W.E), the MANOVA Ensembles (M.E) and the Circular Ensembles (C.E). These can all be derived from the multivariate Gaussian matrix \(\mathbf {G}_{\beta }, \beta = 1, 2, 4\), since the multivariate Gaussian possesses an inherent orthogonality property inherited from the standard normal distribution, that is, it remains invariant under orthogonal transformations. More detailed discussions on these ensembles can be found in [3, 4, 32, 37, 54, 56].

Definition 34.4

([29]) The Gaussian Orthogonal Ensembles (G.O.E) are characterised by the symmetric matrix obtained from a Gaussian matrix \(\mathbf {X} = \mathbf {G}_{1}(N,N)\) as \(\left( \mathbf {X} + \mathbf {X}^{\top }\right) /2\). The diagonal entries of \(\mathbf {X}\) are independent and identically distributed (i.i.d.) with a standard normal distribution \(\mathcal {N}(0,1)\) while the off-diagonal entries are i.i.d. with a normal distribution \(\mathcal {N}_{1}(0,1/2)\). That is, a random matrix \(\mathbf {X}\) is called a Gaussian Orthogonal Ensemble (GOE) matrix if it is symmetric and real-valued (\(X_{ij} = X_{ji}\)) and has

$$\begin{aligned} X_{ij} = {\left\{ \begin{array}{ll} \sqrt{2}\xi _{ii} \sim \mathcal {N}_{1}(0,1), &{} \text{ if } i = j \\ \xi _{ij} \sim \mathcal {N}_{1}(0,1/2), &{} i < j. \end{array}\right. } \end{aligned}$$
(34.12)

Definition 34.5

([16, 29]) The Gaussian Unitary Ensembles (G.U.E) are characterised by the Hermitian complex-valued matrix obtained from a Gaussian matrix \(\mathbf {H} = \mathbf {G}_{2}(N,N)\) as \(\left( \mathbf {H} + \mathbf {H}^{\top ^*}\right) /2\), where \(\top ^*\) is the operation of taking the Hermitian (conjugate) transpose of \(\mathbf {H}\), expressed as \((\mathbf {H}^{\top ^*})_{ij} = \overline{\mathbf {H}}_{ji}\). The diagonal entries of \(\mathbf {H}\) are independent and identically distributed (i.i.d.) with a standard normal distribution \(\mathcal {N}(0,1)\) while the off-diagonal entries are i.i.d. with a normal distribution \(\mathcal {N}_{2}(0,1/2)\). That is, a random matrix \(\mathbf {H}\) is called a Gaussian Unitary Ensemble (GUE) matrix if it is complex-valued, Hermitian \((\mathbf {H}_{ij} = \overline{\mathbf {H}}_{ji})\), and the entries satisfy

$$\begin{aligned} \mathbf {H}_{ij} = {\left\{ \begin{array}{ll} \sqrt{2}\xi _{ii} \sim \mathcal {N}_{2}(0,1), &{} \text{ if } i = j \\ \frac{1}{\sqrt{2}}(\xi _{ij} + \sqrt{-1}\eta _{ij}) \sim \mathcal {N}_{2}(0,1/2), &{} i < j. \end{array}\right. } \end{aligned}$$
(34.13)
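A minimal sampling sketch for Definitions 34.4 and 34.5 (matrix size and seed are arbitrary choices): a real Gaussian matrix is symmetrized to obtain a G.O.E matrix and a complex Gaussian matrix is Hermitized to obtain a G.U.E matrix, after which the symmetry properties can be checked directly. Normalization conventions vary between references, so the construction here should be read as illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
N = 4                            # matrix size (arbitrary)

# G.O.E.: X = (G + G^T)/2 with G having i.i.d. N(0,1) entries.
G = rng.standard_normal((N, N))
X = (G + G.T) / 2
assert np.allclose(X, X.T)                 # symmetric, hence real eigenvalues

# G.U.E.: H = (A + A^*)/2 with A having i.i.d. complex Gaussian entries
# (real and imaginary parts N(0,1); normalization conventions differ in the literature).
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / 2
assert np.allclose(H, H.conj().T)          # Hermitian
print(np.linalg.eigvalsh(H))               # eigenvalues are real
```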

Definition 34.6

([6, 29]) The Gaussian Symplectic Ensembles (GSE) are characterised by the self-dual matrix obtained from a Gaussian matrix \(\mathbf {S} = \mathbf {G}_{4}(N,N)\) as \(\left( \mathbf {S} + \mathbf {S}^{\top ^*}\right) /2\), where \(\top ^*\) represents the operation of taking the conjugate transpose of a quaternion matrix. The diagonal entries of \(\mathbf {S}\) are independent and identically distributed (i.i.d.) with a standard normal distribution \(\mathcal {N}(0,1)\) while the off-diagonal entries are i.i.d. with a normal distribution \(\mathcal {N}_{4}(0,1/2)\).

Definition 34.7

([6, 29]) The Wishart Ensembles (W.E), \(\mathcal {W}_{\beta }(m,n), m \ge n\), are characterised by the symmetric, Hermitian or self-dual matrix \(\mathbf {W} = \mathbf {W}_{\beta }(N,N)\) obtained as \(\mathbf {W} = \mathbf {X}\mathbf {X}^{\top }\), \(\mathbf {W} = \mathbf {H}\mathbf {H}^{\top }\) or \(\mathbf {W} = \mathbf {S}\mathbf {S}^{\top }\), where \(\top \) represents the respective transpose operation defined for the G.O.E, G.U.E and GSE above.

Definition 34.8

([6, 29]) The MANOVA Ensembles (M.E), \(\mathcal {J}_{\beta }(m_{1}, m_{2},n), m_{1}, m_{2} \ge n\), are characterised by the symmetric, Hermitian or self-dual matrix \(\mathbf {A}/(\mathbf {A} + \mathbf {B})\) where \(\mathbf {A}\) and \(\mathbf {B}\) are \(\mathbf {W}_{\beta }(m_{1},n)\) and \(\mathbf {W}_{\beta }(m_{2},n)\) respectively.

Definition 34.9

([16, 29]) The Circular Ensembles (C.E), are characterised by the special matrix \(\mathbf {U}\mathbf {U}^{\top }\) where \(\mathbf {U}_{\beta }, \beta = 1, 2\) is a uniformly distributed unitary matrix.

Lemma 34.2

([29]) Consider the Gaussian normal distribution with mean \(\mu \) and variance \(\sigma ^{2}\), that is, \(X\sim \mathcal {N}(\mu , \sigma ^{2})\), given by (34.1), and the multivariate normal distribution with mean vector \(\pmb {\mu }\) and covariance matrix \(\pmb {\varSigma }\), \(\mathcal {N}_{N}(\pmb {\mu },\pmb {\varSigma })\), given in (34.5). Then it can be verified that the joint density of the entries of \(\mathbf {A}\) can be written as:

$$ \mathbb {P}_{\mathbf {X}}(\mathbf {A}) = \frac{1}{2^{n/2}}\frac{1}{\pi ^{n(n+1)/4}}\exp \left( -\Vert \mathbf {A} \Vert _{F}^{2}/2 \right) $$

where \(\Vert \mathbf {A} \Vert _{F}\) represents the Frobenius norm of \(\mathbf {A}\).
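Lemma 34.2 can be checked numerically. Assuming the G.O.E normalization of Definition 34.4 (diagonal entries \(\mathcal {N}(0,1)\), off-diagonal entries \(\mathcal {N}(0,1/2)\)), the product of the normal densities of the independent entries should equal the Frobenius-norm expression above; the sketch below performs this comparison for an arbitrary matrix size and seed.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)   # arbitrary seed
n = 3                            # matrix size (arbitrary)

# Build a G.O.E.-type symmetric matrix: diagonal ~ N(0,1), off-diagonal ~ N(0,1/2).
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = rng.normal(0, 1)
    for j in range(i + 1, n):
        A[i, j] = A[j, i] = rng.normal(0, np.sqrt(0.5))

# Product of the densities of the independent entries (i <= j).
prod = 1.0
for i in range(n):
    prod *= norm.pdf(A[i, i], scale=1.0)
    for j in range(i + 1, n):
        prod *= norm.pdf(A[i, j], scale=np.sqrt(0.5))

# Closed form from Lemma 34.2.
closed = 2 ** (-n / 2) * np.pi ** (-n * (n + 1) / 4) * np.exp(-np.linalg.norm(A, 'fro') ** 2 / 2)
print(prod, closed)   # should agree
```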

Theorem 34.5

([4, 16]) If we let \(\mathbf {X}\) be an \(N \times N\) random matrix with entries that are independently identically distributed as \(\mathcal {N}(0,1)\), then the joint density distribution of the Gaussian ensembles is given by:

$$\begin{aligned} \text{ Gaussian: }\qquad \left| \begin{array}{cc} \text{ Orthogonal } &{} \beta = 1 \\ \text{ Unitary }&{} \beta = 2\\ \text{ Symplectic } &{} \beta = 4 \end{array}\right| \qquad \mathbb {P}_{\beta }(\mathbf {A}) = \frac{1}{2^{n/2}}\frac{1}{\pi ^{n(n+1)~\beta /4}}\exp \left( -\frac{1}{2}\Vert \mathbf {A} \Vert _{F}^{2} \right) . \end{aligned}$$

Theorem 34.6

([16, 32]) Consider a Wishart matrix \(\mathbf {W}_{\beta }(m,n) = \mathbf {X}\mathbf {X}^{\top }\) where \(\mathbf {X} = \mathbf {G}_{\beta }(m,n)\) is a multivariate Gaussian matrix. The joint density of the elements of \(\mathbf {W}_{\beta }(m,n)\) can be computed in two steps: first writing \(\mathbf {X} = \mathbf {Q}\mathbf {R}\) and integrating out \(\mathbf {Q}\), leaving \(\mathbf {R}\); and then applying the transformation \(\mathbf {W} = \mathbf {R}\mathbf {R}^{\top }\), which is the famous Cholesky factorization of matrices in numerical analysis. The joint density distribution for the Wishart ensembles of \(\mathbf {W}\) is then given by:

$$\begin{aligned} \text{ Wishart: }\qquad \left| \begin{array}{cc} \text{ Orthogonal } &{} \beta = 1\\ \text{ Unitary }&{} \beta = 2\\ \text{ Symplectic } &{} \beta = 4\end{array}\right| \qquad \mathbb {P}_{\beta }(W) = \frac{\exp \left( -\text{ tr }(\mathbf {W}/2) \right) \left( \det \mathbf {W} \right) ^{\beta (m-n+1)/2 - 1}}{2^{mn\beta /2}\Gamma _{n}^{\beta }(m\beta /2)}. \end{aligned}$$

Here we notice that the density distributions for both the Gaussian and Wishart ensembles are made up of a determinant term and an exponential trace term. For the joint eigenvalue density functions the determinant term is in fact the Vandermonde determinant (34.11). This is explained further in the next section.

34.4 The Vandermonde Determinant and Joint Eigenvalue Probability Densities for Random Matrices

To obtain the joint eigenvalue densities for random matrices, we apply the principle of matrix factorization: for instance, if the random matrix \(\mathbf {X}\) is expressed as \(\mathbf {X} = \mathbf {Q} \pmb {\varLambda } \mathbf {Q}^{\top }\), then \(\pmb {\varLambda }\) directly gives the eigenvalues of \(\mathbf {X}\) [24]. Applying the Jacobian technique for joint density transformation, see for example [4], then yields the joint densities of eigenvalues and eigenvectors.

Lemma 34.3

The three Gaussian ensembles have joint eigenvalue probability density functions [32, 37] given by

$$\begin{aligned} \text{ Gaussian: }~~\mathbb {P}_{\beta }(\pmb {\lambda }) = \displaystyle C_{N}^{\beta }\prod _{i < j}|\lambda _{i} - \lambda _{j}|^{\beta }\exp \left( -\frac{1}{2}\sum _{i=1}^{N}\lambda _{i}^{2}\right) \end{aligned}$$
(34.14)

where \(\beta = 1\) corresponds to real entries, \(\beta = 2\) to complex entries and \(\beta = 4\) to quaternion entries, and

$$ C_{N}^{\beta } = (2\pi )^{-N/2}\prod _{j=1}^{N}\frac{{\mathrm {\Gamma }}\left( 1 + \beta /2\right) }{{\mathrm {\Gamma }}\left( 1 + j\beta /2\right) }. $$
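The expression (34.14) is straightforward to evaluate numerically. The sketch below works in logarithms for numerical stability and uses the eigenvalues of a sampled G.O.E matrix (\(\beta = 1\)) as an illustrative input; it simply evaluates the formula as stated, including the constant \(C_{N}^{\beta }\).

```python
import numpy as np
from itertools import combinations
from math import lgamma, log, pi

def log_joint_density(lam, beta=1):
    """Log of the joint eigenvalue density (34.14), including the constant C_N^beta."""
    N = len(lam)
    log_vdm = sum(beta * np.log(abs(lam[j] - lam[i]))
                  for i, j in combinations(range(N), 2))
    log_weight = -0.5 * np.sum(np.asarray(lam) ** 2)
    log_const = (-N / 2) * log(2 * pi) + sum(
        lgamma(1 + beta / 2) - lgamma(1 + (j + 1) * beta / 2) for j in range(N))
    return log_const + log_vdm + log_weight

rng = np.random.default_rng(3)
N = 4
G = rng.standard_normal((N, N))
lam = np.linalg.eigvalsh((G + G.T) / 2)   # eigenvalues of a sampled G.O.E. matrix
print(log_joint_density(lam, beta=1))
```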

Lemma 34.4

([24, 32]) The Wishart (or Laguerre) ensembles have a joint eigenvalue probability density distribution given by

$$\begin{aligned} \text{ Wishart: }~~\mathbb {P}_{\beta }(\lambda ) = \displaystyle C_{N}^{\beta ,\alpha }\prod _{i < j}|\lambda _{i} - \lambda _{j}|^{\beta }\prod _{i}\lambda _{i}^{\alpha -p}\exp \left( -\frac{1}{2}\displaystyle \sum _{i=1}^{N}\lambda _{i}\right) \end{aligned}$$
(34.15)

where \(\alpha = \frac{\beta }{2}m\) and \(p = 1 + \frac{\beta }{2}(N-1)\). The \(\beta \) parameter is decided by the type of elements in the Wishart matrix: real-valued elements correspond to \(\beta = 1\), complex-valued elements correspond to \(\beta = 2\) and quaternion elements correspond to \(\beta = 4\). The normalizing constant \(C_N^{\beta ,\alpha }\) is given by

$$\begin{aligned} C_{N}^{\beta , \alpha } = 2^{-N\alpha }\prod _{j=1}^{N}\frac{{\mathrm {\Gamma }}\left( 1 + \beta /2\right) }{{\mathrm {\Gamma }}\left( 1 + j\beta /2\right) {\mathrm {\Gamma }}\left( \alpha -\frac{\beta }{2}(n-j)\right) } \end{aligned}$$
(34.16)

Thus the joint eigenvalue probability density distribution for all the ensembles can be summarized in the following theorem [18, 29, 32].

Theorem 34.7

Suppose that \(\mathbf {X}_{N} \in \mathcal {H}^{\beta }\) for \(\beta = 1,2,4\). Then, the distribution of eigenvalues of \(\mathbf {X}_{N}\) is given by

$$\begin{aligned} \mathbb {P}_{\mathbf {X}}(x_{1}, \ldots , x_{N}) = \bar{C}_{N}^{\beta }\prod _{i<j}|x_{i} - x_{j}|^{\beta }\exp \left( -\frac{\beta }{4}\sum _{i} x_{i}^{2} \right) \end{aligned}$$
(34.17)

where \(\bar{C}_{N}^{(\beta )}\) are normalizing constants that can be computed explicitly.

From (34.17) it should be noted that the defining properties of a probability density function, that is,

$$\begin{aligned} \displaystyle 0 \le \mathbb {P}(\mathbf {x}) \le 1 ~~\mathrm {and}~~ \displaystyle \int _{\mathbb {R}^{N}}\mathbb {P}(\mathbf {x})\prod _{i=1}^{N}dx_{i} = 1 \end{aligned}$$

do hold, as verified in [32]. We also notice that the term \(\displaystyle \prod _{i<j}|x_{i} - x_{j}|^{\beta }\) in the expression (34.17) is the absolute value of the determinant of the famous Vandermonde matrix (34.10) raised to the power \(\beta = 1, 2, 4\). Indeed, from (34.10) and (34.11), and applying the determinant property \(|\mathbf {A}^{\beta }| = |\mathbf {A}|^{\beta }\) for an \(N\times N\) matrix \(\mathbf {A}\), we have

$$\begin{aligned} \Big |V_{N}(\mathbf {x}) \Big |^{\beta } = \begin{vmatrix} 1&1&\cdots&1 \\ x_1&x_2&\cdots&x_N \\ \vdots&\vdots&\ddots&\vdots \\ x_1^{N-1}&x_2^{N-1}&\cdots&x_N^{N-1} \end{vmatrix}^{\beta } = \Big |\prod _{1\le i<j\le N}(x_j-x_i) \Big |^{\beta } = \prod _{1\le i<j\le N}|x_j-x_i|^{\beta } \end{aligned}$$
(34.18)

It should also be noted that the trace exponential term

$$ \displaystyle \exp \left( -\frac{\beta }{4}\sum _{i=1}^{N}x_{i}^{2} \right) $$

in (34.17) is a product of weight functions of Freud type [30, 34] of the form

$$\begin{aligned} \displaystyle \omega (x) = \exp (-\alpha x^{2}), \end{aligned}$$

where \(\alpha = 1, 1/2, 1/4\). The weights \(\omega (x)\) are bounded, that is, \(\displaystyle 0 \le |\omega (x)| \le 1\). If we assume that the random variables \(\mathbf {X} = \{x_{1}, \ldots , x_{N}\}\) have a normal probability density function with mean \(\mu = 0\) and variance \(\sigma ^{2} = 1\), that is, the \(x_{i}\) are independent and identically distributed (i.i.d.) as \(\mathcal {N}(0,1)\), then it follows that we can construct the normal density in terms of a Gaussian weight such that

$$\begin{aligned} \displaystyle \mathbb {P}_{\mathbf {X}}(x_{i}) = \frac{1}{\sqrt{2\pi }}\omega (x_{i}). \end{aligned}$$

Also, from the definition of the p-th moment of a probability density function

$$\begin{aligned} \displaystyle \mathbb {E}[X^{p}] = \int _{\mathbb {R}} x_{i}^{p}\mathbb {P}_{\mathbf {X}}(x_{i})\,dx_{i} \end{aligned}$$

and we have equivalently in terms of Gaussian weights

$$\begin{aligned} \displaystyle \mathbb {E}[X^{p}] = \int _{\mathbb {R}} x_{i}^{p}\mathbb {P}_{\mathbf {X}}(x_{i})\,dx_{i} = \int _{\mathbb {R}} x_{i}^{p} \cdot \frac{1}{\sqrt{2\pi }}\omega (x_{i})\,dx_{i}. \end{aligned}$$

Thus, focusing on the terms involving \(x_{i}\) in the integrand, that is,

$$\begin{aligned} x_{i}^{p} \cdot \frac{1}{\sqrt{2\pi }}\omega (x_{i}) = kx_{i}^{p}\omega (x_{i}), ~k =1/\sqrt{2\pi }, \end{aligned}$$

we generate a weighted Vandermonde matrix with weights \(\omega (\mathbf {x})\) as follows:

$$\begin{aligned} V_{\mathrm {N}}\left( \omega (\mathbf {x})\mathbf {x}\right) = \begin{bmatrix} \omega _{1}(x_{1}) &{} \omega _{2}(x_{2}) &{} \cdots &{} \omega _{N}(x_{N}) \\ \omega _{1}(x_{1})x_1 &{} \omega _{2}(x_{2})x_2 &{} \cdots &{} \omega _{N}(x_{N})x_N \\ \omega _{1}(x_{1})x_1^{2} &{} \omega _{2}(x_{2})x_2^{2} &{} \cdots &{} \omega _{N}(x_{N})x_N^{2}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \omega _{1}(x_{1})x_1^{N-1} &{} \omega _{2}(x_{2})x_2^{N-1} &{} \cdots &{} \omega _{N}(x_{N})x_N^{N-1} \end{bmatrix}. \end{aligned}$$
(34.19)

The determinant of the weighted Vandermonde matrix in (34.19) can be obtained by taking advantage of Gaussian weights of the form:

$$\begin{aligned} \displaystyle \omega _i = \omega (x_{i}) = \exp \left( -\frac{1}{4} x_{i}^{2}\right) \end{aligned}$$

and the multiplicative property of determinants, that is, if \(\mathbf {A}\) and \(\mathbf {B}\) are \(N \times N\) matrices, then \(|\mathbf {A}\mathbf {B}| = |\mathbf {A}||\mathbf {B}|\). Thus,

$$\begin{aligned} \begin{aligned}&\Big |\omega (\mathbf {x})V_{N}(\mathbf {x}) \Big |^{\beta } = \left| \begin{vmatrix} \omega _{1}(x_{1})&\omega _{2}(x_{2})&\cdots&\omega _{N}(x_{N}) \\ \omega _{1}(x_{1})x_1&\omega _{2}(x_{2})x_2&\cdots&\omega _{N}(x_{N})x_N \\ \omega _{1}(x_{1})x_1^{2}&\omega _{2}(x_{2})x_2^{2}&\cdots&\omega _{N}(x_{N})x_N^{2}\\ \vdots&\vdots&\ddots&\vdots \\ \omega _{1}(x_{1})x_1^{N-1}&\omega _{2}(x_{2})x_2^{N-1}&\cdots&\omega _{N}(x_{N})x_N^{N-1} \end{vmatrix} \right| ^{\beta } \\&= \left| \begin{vmatrix} \omega _{1}(x_{1})&0&0&\cdots&0 \\ 0&\omega _{2}(x_{2})&0&\cdots&0 \\ 0&0&\omega _{3}(x_{3})&\cdots&0 \\ \vdots&\vdots&\vdots&\ddots&\vdots \\ 0&0&0&\cdots&\omega _{N}(x_{N}) \end{vmatrix} \begin{vmatrix} 1&1&1&\cdots&1 \\ x_1&x_2&x_{3}&\cdots&x_N \\ x_1^{2}&x_2^{2}&x_{3}^{2}&\cdots&x_N^{2}\\ \vdots&\vdots&\vdots&\ddots&\vdots \\ x_1^{N-1}&x_2^{N-1}&x_{3}^{N-1}&\cdots&x_N^{N-1} \end{vmatrix} \right| ^{\beta } \\&= \Big |\prod _{i=1}^{N}\omega (x_{i})\prod _{1\le i<j\le N}(x_j-x_i) \Big |^{\beta }\\&= \left[ \exp \left( -\frac{1}{4}x_{1}^{2}\right) \cdots \exp \left( -\frac{1}{4}x_{N}^{2}\right) \right] ^{\beta } \Big |\prod _{1\le i<j\le N}(x_j-x_i) \Big |^{\beta }\\&= \left[ \exp \left( -\frac{1}{4}\sum _{i=1}^{N}x_{i}^{2}\right) \right] ^{\beta }\prod _{1\le i<j\le N}|x_j-x_i|^{\beta }\\&= \prod _{1\le i<j\le N}|x_j-x_i|^{\beta } \exp \left( -\frac{\beta }{4}\sum _{i=1}^{N}x_{i}^{2} \right) . \end{aligned} \end{aligned}$$
(34.20)
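The identity (34.20) can be confirmed numerically: multiplying column j of the Vandermonde matrix by \(\omega (x_{j}) = \exp (-x_{j}^{2}/4)\) scales the determinant by \(\prod _{j}\omega (x_{j})\). A minimal sketch with arbitrary nodes and the illustrative choice \(\beta = 2\):

```python
import numpy as np
from itertools import combinations

x = np.array([0.4, -0.9, 1.6, 0.1])   # arbitrary distinct nodes
n = len(x)
beta = 2                               # illustrative choice

V = np.vander(x, increasing=True).T    # plain Vandermonde matrix (34.10)
w = np.exp(-x**2 / 4)                  # Gaussian weights, one per column
WV = V * w                             # column j scaled by w_j, so det(WV) = prod(w) * det(V)

lhs = abs(np.linalg.det(WV)) ** beta
rhs = (np.prod([abs(x[j] - x[i]) for i, j in combinations(range(n), 2)]) ** beta
       * np.exp(-beta / 4 * np.sum(x**2)))
print(lhs, rhs)   # should agree, verifying (34.20)
```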

If \(\mathbf {X}\) has a central normal distribution, then for any finite non-negative integer p the plain central moments are given by

$$\begin{aligned} \mathbb {E}[\mathbf {X}^{p}] = {\left\{ \begin{array}{ll} 0, &{} \text{ if } \text{ p } \text{ is } \text{ odd }\\ \sigma ^{p}(p-1)!!, &{} \text{ if } \text{ p } \text{ is } \text{ even }. \end{array}\right. } \end{aligned}$$
(34.21)

where n!! denotes the double factorial, that is, the product of the numbers from n down to 1 that have the same parity as n.

The absolute central moments coincide with the plain moments for all even orders and are non-zero for odd orders. Thus, for any non-negative integer p

$$\begin{aligned} \mathbb {E}[|\mathbf {X}|^{p}] = \sigma ^{p}(p-1)!! \cdot \left\{ \begin{array}{cc} \sqrt{\frac{2}{\pi }}, &{} \text{ if } \text{ p } \text{ is } \text{ odd } \\ 1, &{} \text{ if } \text{ p } \text{ is } \text{ even } \end{array}\right\} = \sigma ^{p} \cdot \frac{2^{p/2}\varGamma {\left( \frac{p+1}{2}\right) }}{\sqrt{\pi }}. \end{aligned}$$
(34.22)
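Formulas (34.21) and (34.22) can be checked against numerical integration of the normal density; in the sketch below the standard deviation and the orders p are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma
from math import sqrt, pi

sigma = 1.3   # arbitrary standard deviation

def double_factorial(k):
    return 1 if k <= 0 else k * double_factorial(k - 2)

def plain_moment(p, sigma):
    """Plain central moment (34.21)."""
    return 0.0 if p % 2 else sigma**p * double_factorial(p - 1)

def abs_moment(p, sigma):
    """Absolute central moment (34.22)."""
    return sigma**p * 2**(p / 2) * gamma((p + 1) / 2) / sqrt(pi)

pdf = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
for p in range(1, 6):
    num_plain, _ = quad(lambda x: x**p * pdf(x), -np.inf, np.inf)
    num_abs, _ = quad(lambda x: abs(x)**p * pdf(x), -np.inf, np.inf)
    print(p, plain_moment(p, sigma), round(num_plain, 6),
          abs_moment(p, sigma), round(num_abs, 6))
```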

Consider \(\mathbf {X}_{1}, \ldots , \mathbf {X}_{N}\), independent normally distributed random variables with mean \(\mu = 0\). Then the p-th product moment can be expressed as

$$\begin{aligned} \displaystyle \mathbb {E}\left[ \left( \mathbf {X}_{1} \cdots \mathbf {X}_{N}\right) ^{p}\right] = \mathbb {E}\left[ \mathbf {X}_{1}^{p}\right] \cdots \mathbb {E}\left[ \mathbf {X}_{N}^{p}\right] . \end{aligned}$$

Thus from (34.22), the joint p-th product moment will be given by

$$\begin{aligned} \mathbb {E}\left[ |\mathbf {X}_{1}|^{p}\right] \cdots \mathbb {E}\left[ |\mathbf {X}_{N}|^{p}\right] = \left[ \sigma ^{p}(p-1)!! \cdot \left\{ \begin{array}{cc} \sqrt{\frac{2}{\pi }}, &{} \text{ if } \text{ p } \text{ is } \text{ odd } \\ 1, &{} \text{ if } \text{ p } \text{ is } \text{ even } \end{array}\right\} \right] ^{N} = \left[ \sigma ^{p} \cdot \frac{2^{p/2}{\mathrm {\Gamma }}{\left( \frac{p+1}{2}\right) }}{\sqrt{\pi }}\right] ^{N}. \end{aligned}$$

This, through careful examination, generates an expression equivalent to the multivariate Gamma function defined in (34.16), which is also the normalizing constant for the joint eigenvalue density of the \(\beta \)-ensembles given in (34.17). Thus, the same normalizing coefficient can be introduced in the expression (34.20) to arrive at the same result as in (34.17).

Based on this close link between the Vandermonde determinant and the joint eigenvalue densities, it is natural to consider the general optimization of the Vandermonde determinant over the polynomial constraint defined by the trace factor \(\displaystyle \sum _{i=1}^{N}x_{i}^{2}\) in the bounded exponential term. We will apply the method of Lagrange multipliers to optimize the Vandermonde determinant on the unit sphere and other surfaces, which in turn optimizes the joint eigenvalue density (34.17), as will be demonstrated in the next section.

34.5 Optimising the Joint Eigenvalue Probability Density Function

Lemma 34.5

For any symmetric \(n \times n\) matrix A with eigenvalues \(\{ \lambda _i, i = 1,\ldots ,n \}\) that are all distinct, and any polynomial P:

$$\begin{aligned} \sum _{k = 1}^{n} P(\lambda _{k}) = \mathrm {Tr}\left( P(\mathbf {A})\right) . \end{aligned}$$

Proof

By definition, for any eigenvalue \(\lambda \) and eigenvector \(\mathbf {v}\) we must have \(\mathbf {A}\mathbf {v} = \lambda \mathbf {v}\) and thus

$$\begin{aligned} \displaystyle P(\mathbf {A}) \mathbf {v} = \left( \sum _{k=0}^{m} c_k \mathbf {A}^{k}\right) \mathbf {v} = \sum _{k=0}^{m} c_k (A^{k} \mathbf {v}) = \sum _{k=0}^{m} c_k \lambda ^{k} \mathbf {v} \end{aligned}$$

and thus \(P(\lambda )\) is an eigenvalue of \(P(\mathbf {A})\). For any matrix, \(\mathbf {A}\), the sum of eigenvalues is equal to the trace of the matrix

$$\begin{aligned} \displaystyle \sum _{k=1}^{n} \lambda _k = \text{ Tr }(\mathbf {A}) \end{aligned}$$

when multiplicities are taken into account. For the matrices considered in the Lemma 34.5 all eigenvalues are distinct. Thus applying this property to the matrix \(P(\mathbf {A})\) gives the desired statement.
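A quick numerical illustration of Lemma 34.5, with an arbitrary symmetric matrix and an arbitrary polynomial:

```python
import numpy as np

rng = np.random.default_rng(4)   # arbitrary seed
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                # symmetric matrix with (almost surely) distinct eigenvalues

coeffs = [2.0, -1.0, 0.5, 3.0]   # arbitrary polynomial P(t) = 2 - t + 0.5 t^2 + 3 t^3

def P_matrix(A, coeffs):
    """Evaluate P(A) = sum_k c_k A^k for a square matrix A."""
    result = np.zeros_like(A)
    power = np.eye(len(A))
    for c in coeffs:
        result += c * power
        power = power @ A
    return result

lam = np.linalg.eigvalsh(A)
lhs = sum(np.polyval(coeffs[::-1], l) for l in lam)   # sum_k P(lambda_k)
rhs = np.trace(P_matrix(A, coeffs))                   # Tr(P(A))
print(lhs, rhs)   # should agree
```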

Lemma 34.6

A Wishart distributed matrix \(\mathbf {W}\) as defined in Definition 34.2 will be a symmetric square matrix.

Proof

From the definition \(\mathbf {W}\) is a \(p \times p\) matrix such that \(\mathbf {W} = \mathbf {X}\mathbf {X}^{\top }\). Then

$$\begin{aligned} \mathbf {W}^{\top } = (\mathbf {X}\mathbf {X}^\top )^\top = (\mathbf {X}^\top )^\top \mathbf {X}^\top = \mathbf {X}\mathbf {X}^\top = \mathbf {W} \end{aligned}$$

and thus \(\mathbf {W}\) is symmetric.

Lemma 34.7

Suppose we have a Wishart distributed matrix \(\mathbf {W}\) with the probability density function of its eigenvalues given by

$$\begin{aligned} \mathbb {P}(\mathbf {\lambda }) = C_n v_n(\mathbf {\lambda })^m \exp \left( -\frac{\beta }{2} \sum _{k=1}^{n} P(\lambda _k)\right) \end{aligned}$$
(34.23)

where \(C_n\) is a normalising constant, m is a positive integer, \(\beta > 1\) and P is a polynomial with real coefficients. Then the vector of eigenvalues of \(\mathbf {W}\) will lie on the surface defined by

$$\begin{aligned} \sum _{k=1}^{n} P(\lambda _k) = \mathrm {Tr}(P(\mathbf {W})). \end{aligned}$$
(34.24)

Proof

Since \(\mathbf {W}\) is symmetric by Lemma 34.6, it will also have real eigenvalues. By Lemma 34.5

$$\begin{aligned} \displaystyle \sum _{k=1}^{n} P(\lambda _k) = \text{ Tr }(P(\mathbf {W})) \end{aligned}$$

and thus the point given by \(\mathbf {\lambda } = (\lambda _{1}, \lambda _{2}, \ldots , \lambda _{n})\) will be on the surface defined by

$$\begin{aligned} \displaystyle \sum _{k=1}^{n} P(\lambda _k) = \text{ Tr }(P(\mathbf {W})). \end{aligned}$$

To find the maximum values we can use the method of Lagrange multipliers and find eigenvalues such that

$$\begin{aligned} \frac{\partial \mathbb {P}}{\partial \lambda _k} = \eta \frac{\partial }{\partial \lambda _k}\left( \text{ Tr }(P(\mathbf {W}))-\sum _{k=1}^{n} P(\lambda _k)\right) = -\eta \frac{\mathrm {d}P(\lambda _k)}{\mathrm {d} \lambda _k},~~k=1,\ldots ,n, \end{aligned}$$

where \(\eta \) is some real-valued constant. Computing the left-hand side gives

$$\begin{aligned} \frac{\partial \mathbb {P}^{(\beta )}}{\partial \lambda _k} = \mathbb {P}(\lambda ) \left( -\frac{\beta }{2}\frac{\mathrm {d}P(\lambda _k)}{\mathrm {d} \lambda _k} + \sum _{{\mathop {i \ne k}\limits ^{i = 1}}}^{n} \frac{m}{\lambda _k-\lambda _i}\right) . \end{aligned}$$

Thus the stationary points of (34.23) on the surface given by (34.24) are the solution to the equation system

$$\begin{aligned} \mathbb {P}(\lambda ) \left( -\frac{\beta }{2}\frac{\mathrm {d}P(\lambda _k)}{\mathrm {d} \lambda _k} + \sum _{{\mathop {i \ne k}\limits ^{i = 1}}}^{n} \frac{m}{\lambda _k-\lambda _i}\right) = -\eta \frac{\mathrm {d}P(\lambda _k)}{\mathrm {d} \lambda _k},~~k=1,\ldots ,n. \end{aligned}$$

If we denote the value of \(\mathbb {P}\) at a stationary point by \(P_s\), then the system above can be rewritten as

$$\begin{aligned} \sum _{{\mathop {i \ne k}\limits ^{i = 1}}}^{n} \frac{1}{\lambda _k-\lambda _i} = \frac{1}{m}\left( \frac{\beta }{2}-\frac{\eta }{P_s}\right) \frac{\mathrm {d}P(\lambda _k)}{\mathrm {d} \lambda _k} = \rho \, \frac{\mathrm {d}P(\lambda _k)}{\mathrm {d} \lambda _k},~~k=1,\ldots ,n. \end{aligned}$$
(34.25)

The equation system described by (34.25) appears when one tries to optimize the Vandermonde determinant on a surface defined by a univariate polynomial. This problem also appears in other settings, such as finding the Fekete points on a surface [34], certain electrostatics problems [17] and D-optimal design [35]. This equation system can be rewritten as an ordinary differential equation.

Consider the polynomial

$$\begin{aligned} f(\lambda ) = \prod _{i=1}^{n} (\lambda -\lambda _i) \end{aligned}$$

and note that

$$\begin{aligned} \frac{1}{2} \frac{f''(\lambda _j)}{f'(\lambda _j)} = \sum _{\begin{array}{c} i=1 \\ i \ne j \end{array}}^{n} \frac{1}{\lambda _j-\lambda _i}. \end{aligned}$$

Thus in each of the extreme points we will have the relation

$$\begin{aligned} \left. \frac{\mathrm {d}^2 f}{\mathrm {d}\lambda ^2}\right| _{\lambda = \lambda _j} - 2 \rho \left. \frac{\mathrm {d}P}{\mathrm {d}\lambda }\right| _{\lambda = \lambda _j} \left. \frac{\mathrm {d}f}{\mathrm {d}\lambda }\right| _{\lambda = \lambda _j} = 0, ~ j = 1,2,\ldots ,n \end{aligned}$$

for some \(\rho \in \mathbb {R}\). Since each \(\lambda _j\) is a root of \(f(\lambda )\) we see that the left hand side in the differential equation must be a polynomial with the same roots as \(f(\lambda )\), thus we can conclude that for any \(\lambda \in \mathbb {R}\)

$$\begin{aligned} \frac{\mathrm {d}^2 f}{\mathrm {d}\lambda ^{2}} - 2 \rho \frac{\mathrm {d}P}{\mathrm {d}\lambda } \frac{\mathrm {d}f}{\mathrm {d}\lambda } - Q(\lambda ) f(\lambda ) = 0 \end{aligned}$$
(34.26)

where Q is a polynomial of degree \((\deg (P)-2)\).

Consider the \(\beta \)-ensemble described by (34.17). For this ensemble the polynomial that defines the surface on which the eigenvalues lie is \(P(\lambda ) = \lambda ^2\). Thus by Lemma 34.7 the surface becomes a sphere with radius \(\sqrt{\text{ Tr }(\mathbf {W}^2)}\).

The solution to the equation system given by (34.25) on the unit sphere has been known for a long time, see [46] or [31, 35] for a more explicit description. The solution is given as the roots of a polynomial; in this case it can be written as the roots of a rescaled Hermite polynomial. The explicit expression for the polynomial whose roots give the maximum points is

$$\begin{aligned} \nonumber f(x)= & {} H_n\left( \left( \frac{n-1}{2 (r_1^2 - 2 r_0)}\right) ^{\frac{1}{2}} \frac{(x+r_1)}{2}\right) \\= & {} n! \sum _{i=0}^{\left\lfloor \frac{n}{2}\right\rfloor } \frac{(-1)^i}{i!}\left( \frac{n-1}{2(r_1^2 - 2 r_0)}\right) ^{\frac{n-2i}{2}}\frac{(x+r_1)^{n-2i}}{(n-2i)!} \end{aligned}$$
(34.27)

where \(H_n\) denotes the nth (physicist) Hermite polynomial [2].
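For the simplest case, the unit sphere, the following sketch illustrates the result. It computes the roots of the physicists' Hermite polynomial \(H_n\) with numpy, rescales them onto the unit sphere using the identity that the squared roots of \(H_n\) sum to \(n(n-1)/2\) (this rescaling, rather than the general formula (34.27) with its parameters \(r_0\) and \(r_1\), is the assumption made here), and then checks numerically that the rescaled roots satisfy the constraints \(\sum x_k = 0\), \(\sum x_k^2 = 1\) and the stationarity condition (34.25) with \(P(\lambda ) = \lambda ^2\).

```python
import numpy as np
from numpy.polynomial import hermite

n = 6                                         # number of coordinates (even, so no root at zero)
t = hermite.hermroots([0] * n + [1])          # roots of the physicists' Hermite polynomial H_n

# Rescale onto the unit sphere: the squared roots of H_n sum to n(n-1)/2.
x = t / np.sqrt(n * (n - 1) / 2)
print(np.isclose(x.sum(), 0.0), np.isclose((x**2).sum(), 1.0))

# Stationarity condition (34.25) with P(lambda) = lambda^2:
# sum_{i != k} 1/(x_k - x_i) should be proportional to x_k with the same constant for all k.
lhs = np.array([sum(1.0 / (x[k] - x[i]) for i in range(n) if i != k) for k in range(n)])
print(lhs / x)   # constant vector (equal to n(n-1)/2), confirming the extreme-point property
```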

The solution on the unit sphere can then be used to find the vector of eigenvalues that maximizes the probability density function \(\mathbb {P}(\lambda )\) given by (34.17). Since rescaling the vector of eigenvalues affects the probability density depending on the length of the original vector in the following way

$$\begin{aligned} \mathbb {P}(c\mathbf {\lambda }) = c^{\frac{n(n-1)m}{2}} \exp \left( \frac{\beta }{2}(1-c^2) |\mathbf {\lambda }|^2 \right) \mathbb {P}(\mathbf {\lambda }) \end{aligned}$$

the unit sphere solution can be rescaled so that it ends up on the appropriate sphere.

For other polynomials that define the surface on which the eigenvalues lie, similar techniques can be employed; for instance, for polynomials of the form \(P(\lambda ) = \lambda ^k\), where k is an even positive integer, techniques like the ones demonstrated in [35] or [34] can be used.

For a \(\beta \)-ensemble the extreme points of \(\mathbb {P}\) share the properties of the extreme points of the Vandermonde determinant; for example, all the extreme points will lie on the intersection of the sphere and the plane \(\displaystyle \sum _{k=1}^{n} \lambda _{k} = 0\). What this can look like for \(n=3\) is shown in Fig. 34.1 and for \(n=4\) in Fig. 34.2.

To visualize the location of the extreme points we use a technique described in detail in [31].

It can be shown that the extreme points of \(v_4(\mathbf {x})\) on the sphere all lie in the hyperplane \(x_1+x_2+x_3+x_4=0\). The intersection of this hyperplane with the unit sphere in \(\mathbb {R}^4\) can be described as a unit sphere in \(\mathbb {R}^3\), under a suitable basis, and can then be easily visualized.

This can be realized using the transformation

$$\begin{aligned} \mathbf {x}=\begin{pmatrix} -1 &{} -1 &{} 0 \\ -1 &{} 1 &{} 0 \\ 1 &{} 0 &{} -1 \\ 1 &{} 0 &{} 1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{4} &{} 0 &{} 0 \\ 0 &{} 1/\sqrt{2} &{} 0 \\ 0 &{} 0 &{} 1/\sqrt{2} \\ \end{pmatrix} \mathbf {t} \end{aligned}$$
(34.28)

where \(\mathbf {x}\) is the coordinate vector in \(\mathbb {R}^4\) and \(\mathbf {t}\) is the corresponding coordinate vector in \(\mathbb {R}^3\). This gives a new sphere that can be parametrised using angles in the usual way.
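The transformation (34.28) can be checked directly: its columns form an orthonormal basis of the hyperplane \(x_1+x_2+x_3+x_4=0\), so unit vectors \(\mathbf {t}\) in \(\mathbb {R}^3\) are mapped to points on the intersection of the unit sphere in \(\mathbb {R}^4\) with that hyperplane. A minimal sketch:

```python
import numpy as np

# The 4x3 transformation matrix from (34.28).
T = np.array([[-1, -1,  0],
              [-1,  1,  0],
              [ 1,  0, -1],
              [ 1,  0,  1]]) @ np.diag([1 / np.sqrt(4), 1 / np.sqrt(2), 1 / np.sqrt(2)])

# Columns are orthonormal ...
print(np.allclose(T.T @ T, np.eye(3)))          # True
# ... and each column sums to zero, so x = T t lies in the hyperplane sum(x) = 0.
print(np.allclose(T.sum(axis=0), 0))            # True

# A unit vector in R^3 maps to a unit vector in R^4 on that hyperplane.
theta, phi = 0.7, 1.2                            # arbitrary angles
t = np.array([np.cos(theta) * np.sin(phi), np.sin(theta) * np.sin(phi), np.cos(phi)])
x = T @ t
print(np.isclose(np.linalg.norm(x), 1.0), np.isclose(x.sum(), 0.0))
```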

Similar visualizations of the locations of extreme points on the unit sphere can be constructed up to \(n=7\), see [31] for further discussion.

Fig. 34.1

Illustration of the expression given by (34.23) on the unit sphere in three dimensions, with parameters \(n = 3\), \(m = 2\), \(\beta = 2\) and \(C_n = 1\). Note that this expression is not correctly normalized and therefore not the exact value of the probability density distribution. On the right the value of the expression on the sphere is drawn and on the left the sphere has been parametrized in such a way that the point of the sphere given by \(\left( \frac{1}{\sqrt{3}},\frac{1}{\sqrt{3}},\frac{1}{\sqrt{3}}\right) \) corresponds to the point (0, 0)

Fig. 34.2

Illustration of the expression given by (34.23) on the unit sphere in four dimensions, with parameters \(n = 4\), \(m = 2\), \(\beta = 2\) and \(C_n = 1\). Note that this expression is not correctly normalized and therefore not the exact value of the probability density distribution. In order to visualize the locations of the extreme points in four dimensions using only a two-dimensional surface the transformation given in (34.28) is used

34.6 Summary

In our study we establish that finding the extreme points of the probability distribution of the eigenvalues of a Wishart matrix can be done by finding the extreme points on a sphere with a radius related to the trace of the Wishart matrix. This close link between the Vandermonde determinant and the joint eigenvalue probability density function for \(\beta \)-ensembles helps in studying further properties and applications of the probability density functions that occur for random matrices. As illustrated in Fig. 34.1, such results can be used to explain the distribution of charges over a unit sphere, which agrees with Coulomb's theory for electrostatic charge distribution. In this case the extreme points of the probability density function of the eigenvalues happen to be the zeros of the deformed Hermite polynomials given by (34.27).