Abstract
Suppose that Xn = (X1,…,Xn) has mean 0 and a single-factor covariance Σ = (σij) with σii = 1 and σij = ρ ≥ 0 for i ≠ j. For a threshold c, let Sn be the number of components of Xn that exceed c. We express the distribution of Sn in terms of a single integral, provide the limiting distribution as \(n \rightarrow \infty \), and show that the limit resembles the Beta family. We then describe the shape of the exceedance distribution when the underlying distributions of the single-factor model satisfy a certain likelihood ratio criterion with respect to their scale parameter, and we show that it obeys a majorization ordering.
1 Introduction
The problem of determining the probability that a random process crosses a threshold has a wide range of applications and a considerable history (Castillo et al. 2005; Leadbetter et al. 2011). Control charts and the study of k-of-n systems in reliability theory (Barlow and Proschan, 1965) are classical examples. Examples of current interest include modeling flooding in hydrology (Huang et al. 2020) and the frequency of large forest fires (Alvarado et al. 1998) in climate science.
In many applications, (systems of) differential equations model the processes that underlie the exceedances. For example, stochastic differential equations driven by Brownian motion are used in fields as diverse as mathematical finance (prices of equities) (Steele, 2000), hydrology (groundwater flow) (Cushman, 1987), and neuroscience (spike generation of a neuron) (Tuckwell, 1988). In other cases, the critical events are modeled as exceedances of rather simple probabilistic models, such as the multivariate normal.
The problem that we address here arose from a discussion with Professor Shun’ichi Amari about a paper by Shadlen and Newsome (1998), which dealt with the problem of modeling the natural variability of cortical neurons using a simple integrate-and-fire model. In this model, a neuron receives thousands of synaptic inputs. The magnitudes of the inputs could be viewed as i.i.d. Gaussian random variables, with positive (negative) values corresponding to excitatory (inhibitory) inputs. If the inputs were independent, the number of excitatory inputs would have a binomial distribution. However, the independence of the inputs is often not reasonable, so a natural generalization is to consider exchangeable random variables instead: the inputs may be dependent, but they are stationary in some sense. In short, we are interested in the structure of the probabilities of exceedances.
In Section 2 we define the exceedance statistic and its distribution. Although our primary interest is in an underlying Gaussian model, we state some of our results more generally. We show that the exceedance probabilities are expressible in terms of a single integral. We use this integral to describe the different shapes of the exceedance distribution and to prove a majorization ordering for it.
2 Properties of the Exceedance Distribution
Consider i.i.d. symmetric random variables Z0,…,Zn with cdf F, pdf f, with f(x) = f(−x) > 0 for all x, and \(E({Z_{i}^{2}}) = 1\). Many of the results below are for the Gaussian, for which we denote the cdf and pdf by Φ and ϕ, respectively. For ρ ≥ 0, let \(X_{i} = \sqrt {1-\rho } Z_{i} + \sqrt {\rho } Z_{0}\), so that Xn = (X1,…,Xn) has mean 0 and covariance matrix Σ = (σij) with σii = 1 and σij = ρ for i ≠ j. For a constant c the exceedance statistic is the number of components of Xn that exceed c:
$$ S_{n} = \sum_{i=1}^{n} I(X_{i} > c), $$
where I(·) is the indicator function.
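By conditioning on Z0, we get the exceedance distribution,
$$ p_{k}^{(n)} = P(S_{n} = k) = \binom{n}{k} {\int}_{\mathbb{R}} \left[ 1 - F\left( \frac{c - \sqrt{\rho}\, z}{\sqrt{1-\rho}} \right) \right]^{k} F\left( \frac{c - \sqrt{\rho}\, z}{\sqrt{1-\rho}} \right)^{n-k} f(z)\, dz, \quad k = 0,\ldots,n. $$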
Several special cases of this expression are elementary (David, 1953): for example, for the Gaussian
which also hold for − 1/(n − 1) ≤ ρ < 0; these expressions also hold for elliptically contoured distributions (Iyengar and Tong, 1989). And for any such F,
is the probability that Z0 is the (n − k + 1)st order statistic among (Z0,…,Zn). More generally, the connection to order statistics of equicorrelated random variables is clear: writing F(−x) = 1 − F(x), we have
where Mn is the largest order statistic in an equicorrelated sample of size n.
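The conditioning argument above lends itself to a short numerical sketch. The code below is an illustration for the Gaussian case only, not the author's computation: given Z0 = z, the Xi are independent and each exceeds c with probability \(q(z) = {\Phi}((\sqrt{\rho} z - c)/\sqrt{1-\rho})\), so Sn is conditionally binomial, and the outer integral over z is approximated with a trapezoidal rule. The function name `exceedance_pmf`, the grid size, and the truncation at ±8 are assumptions made for this sketch.

```python
import math


def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def exceedance_pmf(n, rho, c=0.0, grid=4001, lim=8.0):
    """Approximate P(S_n = k), k = 0..n, for 0 <= rho < 1 (Gaussian case).

    Hypothetical helper: conditional on Z_0 = z, S_n is Binomial(n, q(z)) with
    q(z) = Phi((sqrt(rho) * z - c) / sqrt(1 - rho)); the trapezoidal rule on
    [-lim, lim] then averages the binomial probabilities over z.
    """
    step = 2.0 * lim / (grid - 1)
    binom = [float(math.comb(n, k)) for k in range(n + 1)]
    pmf = [0.0] * (n + 1)
    for j in range(grid):
        z = -lim + j * step
        # Trapezoid weight times the density of Z_0 at z.
        weight = step * norm_pdf(z) * (0.5 if j in (0, grid - 1) else 1.0)
        q = norm_cdf((math.sqrt(rho) * z - c) / math.sqrt(1.0 - rho))
        for k in range(n + 1):
            pmf[k] += weight * binom[k] * q ** k * (1.0 - q) ** (n - k)
    return pmf


if __name__ == "__main__":
    print(exceedance_pmf(5, 0.5))          # c = 0, rho = 1/2: close to uniform on 0..5
    print(exceedance_pmf(5, 0.0, c=1.0))   # rho = 0: Binomial(5, 1 - Phi(1))
```

For ρ = 0 the sketch reduces to the binomial distribution, and for c = 0 and ρ = 1/2 it returns probabilities close to 1/(n + 1), in line with the order-statistic interpretation given above.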
For large n there is an easily derived approximation.
Theorem 2.1.
For 0 ≤ t ≤ 1, as \(n \rightarrow \infty \)
or with a slight abuse of notation,
Proof.
By the strong law,
Thus, by dominated convergence
For the Gaussian, the following properties of \(G_{a,b}(t) = {\Phi }(a + b {\Phi }^{-1}(t))\) for \(a \in \mathbb {R}\) and b > 0, and its density
$$ g_{a,b}(t) = \frac{b\, \phi(a + b {\Phi}^{-1}(t))}{\phi({\Phi}^{-1}(t))}, \quad 0 < t < 1, $$
are easy to verify.
(a) \(G_{0,1}\) is the uniform distribution.
(b) If b = 1 and a > 0 (a < 0), the density decreases (increases) from \(\infty \) to 0 (0 to \(\infty \)).
(c) If b > 1 the density is bounded with \(g_{a,b}(0)=g_{a,b}(1)=0\) and it is unimodal with mode at \(t = {\Phi}(ab/(1-b^{2}))\).
(d) If b < 1 the density is unbounded with \(g_{a,b}(0)=g_{a,b}(1)=\infty \) and it is U-shaped with minimum at \(t = {\Phi}(ab/(1-b^{2}))\).
(e) The raw moments of this distribution are
$$ {{\int}_{0}^{1}} t^{k} g_{a,b}(t) dt = {\int}_{\mathbb{R}} {\Phi} \left( \frac{x-a}{b} \right)^{k} \phi(x) dx; $$
the first moment is \(1 - {\Phi }(a/\sqrt {1+b^{2}})\); the rest are easily computed, but not expressible in elementary terms.
Thus, the family (2.6) of limiting distributions resembles the Beta(α, β) family. For the special case c = 0 that resemblance holds for finite n, not only for the Gaussian but also for other latent distributions that satisfy a certain likelihood ratio ordering with respect to the scale parameter. □
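Properties (a) through (e) can be spot-checked numerically. The following is a minimal sketch, assuming the density of \(G_{a,b}\) is obtained by differentiating the cdf in t, which gives \(b\,\phi(a + b{\Phi}^{-1}(t))/\phi({\Phi}^{-1}(t))\); the function names and the quadrature settings are hypothetical choices for this illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides cdf, pdf and inv_cdf


def limit_cdf(t, a, b):
    """G_{a,b}(t) = Phi(a + b * Phi^{-1}(t)) for 0 < t < 1."""
    return STD.cdf(a + b * STD.inv_cdf(t))


def limit_pdf(t, a, b):
    """Density of G_{a,b}, assumed here to be the t-derivative of the cdf."""
    x = STD.inv_cdf(t)
    return b * STD.pdf(a + b * x) / STD.pdf(x)


def stationary_point(a, b):
    """t = Phi(a*b / (1 - b^2)): the mode if b > 1, the minimum if b < 1."""
    return STD.cdf(a * b / (1.0 - b * b))


def raw_moment(k, a, b, grid=4001, lim=8.0):
    """k-th raw moment as the integral of Phi((x - a)/b)^k * phi(x) (trapezoid)."""
    step = 2.0 * lim / (grid - 1)
    total = 0.0
    for j in range(grid):
        x = -lim + j * step
        weight = 0.5 if j in (0, grid - 1) else 1.0
        total += weight * STD.cdf((x - a) / b) ** k * STD.pdf(x)
    return total * step


if __name__ == "__main__":
    a, b = 0.4, 1.5
    t0 = stationary_point(a, b)
    print(t0, limit_pdf(t0, a, b))   # location and height of the mode for b > 1
    print(raw_moment(1, a, b), 1.0 - STD.cdf(a / math.sqrt(1.0 + b * b)))
```

The two numbers on the last line should agree closely, which is the first-moment identity in (e).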
Definition 2.2.
We say that the cdf F and its pdf f satisfy the LR condition if f(x) = f(−x) and the ratio
is decreasing (increasing) in |x| for 0 < σ ≤ 1 (\(1 \leq \sigma < \infty \)). The Gaussian, Laplace, and t-distributions all satisfy this LR condition.
We next prove the intuitively clear result that for c = 0 the exceedance distribution is either U-shaped or unimodal with mode or minimum at the middle. The proof is rather involved, requiring a detailed study of the integrands and the use of the likelihood ratio method which transfers attention from any ρ to ρ = 1/2, for which the exceedance distribution is uniform from (2.3).
Theorem 2.3.
Suppose that F and f satisfy the LR condition in (2.2), and that c = 0, so that the exceedance distribution is symmetric. Then for 0 ≤ ρ ≤ 1/2 the exceedance distribution is unimodal with probabilities decreasing away from the mode. And for 1/2 ≤ ρ ≤ 1 it is U-shaped with probabilities increasing away from the minimum. The mode or minimum is at n/2 for n even and at (n ± 1)/2 for n odd.
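As a numerical illustration of this statement in the Gaussian case (not a substitute for the proof), the sketch below computes the exceedance probabilities for c = 0 by conditioning on Z0 and classifies the shape from the signs of successive differences on the left half of the support. The helper names, the trapezoidal quadrature, and the tolerance are assumptions of this sketch.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def exceedance_pmf(n, rho, c=0.0, grid=4001, lim=8.0):
    """P(S_n = k), k = 0..n, Gaussian case, 0 <= rho < 1 (trapezoidal rule in z)."""
    step = 2.0 * lim / (grid - 1)
    binom = [float(math.comb(n, k)) for k in range(n + 1)]
    pmf = [0.0] * (n + 1)
    for j in range(grid):
        z = -lim + j * step
        weight = step * norm_pdf(z) * (0.5 if j in (0, grid - 1) else 1.0)
        q = norm_cdf((math.sqrt(rho) * z - c) / math.sqrt(1.0 - rho))
        for k in range(n + 1):
            pmf[k] += weight * binom[k] * q ** k * (1.0 - q) ** (n - k)
    return pmf


def shape(pmf, tol=1e-12):
    """Classify a symmetric pmf from its differences up to the middle index.

    The uniform case (rho = 1/2) satisfies both tests and is reported as
    'unimodal' here; that tie-break is an arbitrary choice of this sketch.
    """
    n = len(pmf) - 1
    diffs = [pmf[k] - pmf[k - 1] for k in range(1, n // 2 + 1)]
    if all(d >= -tol for d in diffs):
        return "unimodal"
    if all(d <= tol for d in diffs):
        return "U-shaped"
    return "other"


if __name__ == "__main__":
    for n in (7, 8):
        for rho in (0.2, 0.8):
            print(n, rho, shape(exceedance_pmf(n, rho)))
```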
Proof.
We prove this result for 0 ≤ ρ ≤ 1/2; the proof for 1/2 ≤ ρ ≤ 1 is similar, so we omit it. We first show that if 0 ≤ ρ ≤ 1/2, then \(p_{1}^{(n)} - p_{0}^{(n)} \geq 0\), and that for 2 ≤ k ≤ (n + 1)/2 we have \(p_{k}^{(n)} - p_{k-1}^{(n)} \geq 0\). Let \(\alpha = \sqrt {\rho /(1-\rho )}\). Then
where \(g_{n}(y) = n y^{n-1} - (n+1) y^{n}\) and \(h_{n}(y) = g_{n}(y) + g_{n}(1-y)\). We need the following facts, all of which are derived by a close examination of these polynomials. First, \(g_{n}\) has roots at 0 and n/(n + 1); its boundary values are \(g_{n}(0) = 0\), \(g_{n}(1) = -1\), \(g_{n}^{\prime }(0) = 0\), and \(g_{n}^{\prime }(1) = -2n\); \(g_{n}\) has its maximum value at y = (n − 1)/(n + 1). Next, \(h_{n}\) is strictly positive for 1/2 ≤ y ≤ n/(n + 1), and is strictly decreasing for n/(n + 1) ≤ y ≤ 1; thus, there is a unique y∗ between n/(n + 1) and 1 such that \(h_{n}(y^{*}) = 0\). Now let F(t∗) = y∗. Then
The proof of \(p_{k}^{(n)} - p_{k-1}^{(n)} \geq 0\) for 2 ≤ k ≤ (n + 1)/2 and 0 ≤ ρ ≤ 1/2 is similar, but the functions corresponding to gn and hn are more involved. Start with
where
and
Note that gn,k has roots at 0, 1, and (n − k + 1)/(n + 1), and that \(g_{n,k}^{\prime }\) has roots at
Next, we show that hn,k has a unique root y∗ between 1/2 and 1, with hn,k positive (negative) to the left (right) of y∗, so we can then use the same proof as before. Using the properties of gn,k, we see that hn,k is strictly positive for 1/2 ≤ y ≤ (n − k + 1)/(n + 1), and strictly decreasing in the interval
We must therefore show that hn,k ≤ 0 in [Un,k,1]. This is clearly true for y = 1; for y < 1 we have
Since n ≥ 2k + 1, it suffices to show that hn,k(y) ≤ 0 in the interval
In this interval we have
because the function on the left is strictly increasing in y. Thus, it is now enough to show that
for k ≥ 2 and n ≥ 2k − 1. This inequality is trivial for n = 2k − 1,2k,2k + 1. Finally, writing u = n − 2k + 1, we must verify that for all u ≥ 0,
which is an easy (if tedious) verification, and our proof is complete. A small note: the details of this proof require n ≥ 3; the result also holds for n = 2 using the expressions in Eq. 2.1. □
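The polynomial facts used in the first step of the proof can be checked numerically for small n. The sketch below takes \(g_{n}(y) = n y^{n-1} - (n+1) y^{n}\) and \(h_{n}(y) = g_{n}(y) + g_{n}(1-y)\) from the proof and locates the root y∗ of \(h_{n}\) between n/(n + 1) and 1 by bisection; the function names and the bisection tolerance are choices made for this illustration.

```python
def g(n, y):
    """g_n(y) = n * y^(n-1) - (n+1) * y^n."""
    return n * y ** (n - 1) - (n + 1) * y ** n


def h(n, y):
    """h_n(y) = g_n(y) + g_n(1 - y)."""
    return g(n, y) + g(n, 1.0 - y)


def root_of_h(n, tol=1e-13):
    """Bisection for the root of h_n in (n/(n+1), 1).

    Relies on h_n > 0 at the left endpoint and h_n(1) = -1 at the right one.
    """
    lo, hi = n / (n + 1), 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(n, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in range(3, 8):
        y_star = root_of_h(n)
        print(n, n / (n + 1), y_star, h(n, y_star))
```

For each n the printed root lies strictly between n/(n + 1) and 1, and the value of \(h_{n}\) there is numerically zero.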
Our next result concerns majorization properties of the exceedance distribution; see Marshall and Olkin (1979). Let \(x,y \in \mathbb {R}^{n}\) be nonincreasing sequences of numbers; that is, x1 ≥ x2 ≥⋯ ≥ xn, and similarly for y. Then x majorizes y, written x ≻ y, if for k = 1,…,n
$$ \sum\limits_{i=1}^{k} x_{i} \geq \sum\limits_{i=1}^{k} y_{i}, $$
with equality for k = n.
Theorem 2.4.
Suppose that the cdf F and its density f satisfy the LR condition, that c = 0, and let p(ρ) = (p0(ρ),…,pn(ρ)) be the exceedance distribution. If 0 ≤ ρ1 < ρ2 ≤ 1/2, then p(ρ1) ≻ p(ρ2); and if 1/2 ≤ ρ1 < ρ2 ≤ 1, then p(ρ2) ≻ p(ρ1).
Proof.
We prove this result for odd n = 2m + 1 and 0 ≤ ρ ≤ 1/2; the proof for even n and for 1/2 ≤ ρ ≤ 1 is similar. Because c = 0, \(p_{n,i}(\rho ) = p_{n,2m+1-i}(\rho )\), and
Thus, it suffices to show that the functions \(p_{n,m}\), \(2p_{n,m}\), \(2p_{n,m} + p_{n,m-1}\), \(2p_{n,m} + 2p_{n,m-1}\), … all decrease with ρ. To do this, we will show that all the derivatives are negative. As before, let \(\alpha = \sqrt {\rho /(1-\rho )}\), and note that α is a strictly increasing function of ρ. Thus, for j < m, we have
Next, writing h(t) = tϕ(t)ϕ(αt), and dropping the constant combinatorial coefficient
the derivative of H is
Note that because h(t) is an odd function, we have
Applying that to the expression for \(H^{\prime }(\alpha )\) we get a collapsing sum that leads to
where
We now see that \({\sum }_{i=1}^{j} p_{n,m-i}(\rho )\) is decreasing in ρ for j < m; the case of j = m is trivial because \({\sum }_{i=1}^{m} p_{n,m-i}(\rho ) = 1/2\). Finally, the same calculations show that the same result holds for
and our proof is complete. □
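The majorization ordering can be illustrated numerically in the Gaussian case with c = 0. The sketch below is an illustration under assumptions: it recomputes the exceedance probabilities by conditioning on Z0 with a trapezoidal rule, sorts each probability vector in nonincreasing order, and compares partial sums with a small tolerance. The names `exceedance_pmf` and `majorizes` and both tolerances are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def exceedance_pmf(n, rho, c=0.0, grid=4001, lim=8.0):
    """P(S_n = k), k = 0..n, Gaussian case, 0 <= rho < 1 (trapezoidal rule in z)."""
    step = 2.0 * lim / (grid - 1)
    binom = [float(math.comb(n, k)) for k in range(n + 1)]
    pmf = [0.0] * (n + 1)
    for j in range(grid):
        z = -lim + j * step
        weight = step * norm_pdf(z) * (0.5 if j in (0, grid - 1) else 1.0)
        q = norm_cdf((math.sqrt(rho) * z - c) / math.sqrt(1.0 - rho))
        for k in range(n + 1):
            pmf[k] += weight * binom[k] * q ** k * (1.0 - q) ** (n - k)
    return pmf


def majorizes(x, y, tol=1e-12):
    """True if x majorizes y: sorted partial sums of x dominate those of y,
    and the totals agree (up to a looser tolerance for quadrature error)."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    sum_x = sum_y = 0.0
    for xi, yi in zip(xs, ys):
        sum_x += xi
        sum_y += yi
        if sum_x < sum_y - tol:
            return False
    return abs(sum_x - sum_y) < 1e-9


if __name__ == "__main__":
    n = 7
    print(majorizes(exceedance_pmf(n, 0.1), exceedance_pmf(n, 0.3)))  # rho below 1/2
    print(majorizes(exceedance_pmf(n, 0.9), exceedance_pmf(n, 0.6)))  # rho above 1/2
```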
3 Discussion
Numerical examples indicate that the approximation in Eq. 2.5 is good for n ≥ 20 near the mode, but that n ≥ 50 gives better results in the tails. Of course, for the neuroscience applications that motivated this work the approximation is quite good because n is in the thousands. Our main results, the shape of the exceedance distribution and the majorization result, are limited in scope because c = 0. Extending these results to c ≠ 0 requires knowledge of the location of the mode, complicating the computations considerably. Our numerical work indicates that the beta-distribution-like shapes may well hold for any c and ρ ≥ 0. However, the majorization result does not generalize to c ≠ 0.
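A rough way to explore the quality of the large-n approximation in the Gaussian case is sketched below. It is an illustration under stated assumptions, not a reproduction of the numerical examples mentioned above: the limiting cdf is taken to be \({\Phi}(a + b{\Phi}^{-1}(t))\) with \(a = c/\sqrt{\rho}\) and \(b = \sqrt{(1-\rho)/\rho}\), which is what the single-factor construction gives when Sn/n is replaced by its conditional limit, and the error measure (the largest gap between P(Sn ≤ k) and the limiting cdf at k/n) is our own choice. The sketch requires 0 < ρ < 1.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def exceedance_cdf(n, rho, c, grid=2001, lim=8.0):
    """P(S_n <= k), k = 0..n, Gaussian case, by conditioning on Z_0 (trapezoid)."""
    step = 2.0 * lim / (grid - 1)
    binom = [float(math.comb(n, k)) for k in range(n + 1)]
    pmf = [0.0] * (n + 1)
    for j in range(grid):
        z = -lim + j * step
        weight = step * STD.pdf(z) * (0.5 if j in (0, grid - 1) else 1.0)
        q = STD.cdf((math.sqrt(rho) * z - c) / math.sqrt(1.0 - rho))
        for k in range(n + 1):
            pmf[k] += weight * binom[k] * q ** k * (1.0 - q) ** (n - k)
    cdf, running = [], 0.0
    for value in pmf:
        running += value
        cdf.append(running)
    return cdf


def limit_cdf(t, rho, c):
    """Assumed limiting cdf: Phi(a + b * Phi^{-1}(t)), a = c/sqrt(rho), b = sqrt((1-rho)/rho)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    a = c / math.sqrt(rho)
    b = math.sqrt((1.0 - rho) / rho)
    return STD.cdf(a + b * STD.inv_cdf(t))


def sup_error(n, rho, c):
    """Largest gap between the finite-n cdf and the limiting cdf on the grid k/n."""
    cdf = exceedance_cdf(n, rho, c)
    return max(abs(cdf[k] - limit_cdf(k / n, rho, c)) for k in range(n + 1))


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, sup_error(n, 0.3, 0.5))
```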
References
Alvarado, E, Sandberg, DV and Pickford, SG (1998). Modeling large forest fires as extreme events. Northwest Sci. 72, 66–75.
Barlow, RE and Proschan, F (1965). Mathematical Theory of Reliability. Wiley, Hoboken.
Castillo, E, Hadi, AS, Balakrishnan, N and Sarabia, JM (2005). Extreme Value and Related Models with Applications in Engineering and Science. Wiley, Hoboken.
Cushman, JH (1987). Development of stochastic partial differential equations for subsurface hydrology. Stochastic Hydrol. Hydraulics 1, 241–262.
David, FN (1953). A note on the evaluation of the multivariate normal integral. Biometrika 40, 458–459.
Huang, Y, Liang, Z, Hu, Y, Li, B and Wang, J (2020). Theoretical derivation for the exceedance probability of corresponding flood volume of the equivalent frequency regional composition method in hydrology. Hydrol. Res. 51, 1274–1292.
Iyengar, S and Tong, YL (1989). Convexity of elliptically contoured distributions with applications. Sankhya A 51, 13–29.
Leadbetter, ML, Lindgren, G and Rootzén, H (2011). Extremes and Related Properties of Random Sequences and Processes. Springer, Berlin.
Marshall, A and Olkin, I (1979). Inequalities: Theory of Majorization and its Applications. Academic Press, Cambridge.
Shadlen, MN and Newsome, WT (1998). The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. J Neuroscience 18, 3870–3896.
Steele, JM (2000). Stochastic Calculus and Financial Applications. Springer, Berlin.
Tuckwell, HC (1988). Introduction to Theoretical Neurobiology. Cambridge University Press, Cambridge.
Additional information
Dedicated to Dr. C.R. Rao on the occasion of his 100th birthday.
Cite this article
Iyengar, S. On the Exceedances of Exchangeable Random Variables. Sankhya B 83 (Suppl 1), 26–35 (2021). https://doi.org/10.1007/s13571-021-00252-3
DOI: https://doi.org/10.1007/s13571-021-00252-3