1 Introduction and main results

Bomze [2] coined the term “standard quadratic optimization problem” (StQP) for the problem

$$\begin{aligned}&\min \mathbf{x}^TQ \mathbf{x}, \end{aligned}$$
(1.1)
$$\begin{aligned}&\text { s.t. }{} \mathbf{e}^T\mathbf{x}=1, \quad \mathbf{x}\ge \mathbf{0}, \end{aligned}$$
(1.2)

where \(Q=[Q_{ij}]\in \mathfrak {R}^{n\times n}\) is a symmetric matrix, not necessarily positive-semidefinite, and \(\mathbf{e}\in \mathfrak {R}^n\) is the all-ones vector. We will refer to the feasible set in (1.2) as the simplex \(\Delta _n\).
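As a concrete illustration (a sketch of ours, not part of the formal development), the following Python code generates a random instance of (1.1)–(1.2) and computes the global minimizer exactly by enumerating candidate supports: on a support \(\mathcal K\), the first-order conditions reduce to the linear system \(Q_{\mathcal K}\mathbf{x}=\lambda \mathbf{e}\), \(\mathbf{e}^T\mathbf{x}=1\), whose solution has objective value \(\lambda \). The enumeration is exponential in n, so it is for small illustrative n only; all names are ours.

```python
import itertools
import numpy as np

def stqp_global_min(Q):
    """Global minimum of x^T Q x over the simplex, by enumerating supports.

    On a support K, stationarity reads Q_K x = lam * e, e^T x = 1, and then
    x^T Q_K x = lam.  Exponential in n: for small illustrative n only.
    """
    n = Q.shape[0]
    best_val, best_x = np.inf, None
    for k in range(1, n + 1):
        for K in itertools.combinations(range(n), k):
            QK = Q[np.ix_(K, K)]
            A = np.zeros((k + 1, k + 1))
            A[:k, :k] = QK
            A[:k, k] = -1.0            # the -lam column
            A[k, :k] = 1.0             # the e^T x = 1 row
            b = np.zeros(k + 1)
            b[k] = 1.0
            try:
                sol = np.linalg.solve(A, b)
            except np.linalg.LinAlgError:
                continue               # singular system: skip this support
            xK, lam = sol[:k], sol[k]
            if np.all(xK >= -1e-12) and lam < best_val:
                x = np.zeros(n)
                x[list(K)] = np.clip(xK, 0.0, None)
                best_val, best_x = lam, x
    return best_val, best_x

rng = np.random.default_rng(0)
n = 12
M = rng.standard_normal((n, n))
Q = (M + M.T) / 2                      # symmetric random instance
val, x = stqp_global_min(Q)
print("support size:", int((x > 1e-9).sum()))   # typically very small
```

On repeated runs, with various continuous entry distributions, the reported support size is almost always tiny relative to n; this is precisely the sparsity phenomenon quantified in this paper.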

The StQP appears in numerous applications, such as resource allocation [25], portfolio selection [32], machine learning [33], the maximal clique problem in discrete optimization [20], and the determination of copositivity of a matrix in linear algebra [4]. Since it is a prototype for numerous more general quadratic programming problems, it has been used to test various algorithms proposed in the literature (see [4, 35, 36] and the references therein for details).

Our subject in this paper is a random instance of the StQP, where the symmetric matrix Q is generated from a certain distribution. To put our work into perspective, we note that the study of optimization problems with random data can be traced back to the early 1980s, e.g. Goldberg and Marchetti-Spaccamela [21] (knapsack problem). See Beier [1] for more recent progress on random knapsack problems. Significant progress has also been made in the analysis of the so-called \(L_1\) minimization problem with random constraints. Notably, it was proved that when the coefficient matrix is generated from a normal distribution, then with high probability (whp) the optimal solution of the \(L_1\) minimization problem is the sparsest point in the constraint set (see [9, 10, 16]).

It is also important to note that in the optimization literature, when testing algorithms, it is not uncommon to generate optimization problem data randomly, due to the lack of testing instances. For example, Bomze and De Klerk [3] and Bundfuss and Dür [7] generate StQPs with symmetric Q whose entries are uniformly distributed. Thus, a good understanding of the behavior of the optimal solutions of randomly generated instances may shed light on the behavior of various algorithms tested on these instances. Indeed, our results, together with those in [11, 12], which establish the sparsity of the optimal solutions of randomly generated StQPs under quite general distributional assumptions, indicate that the performance of algorithms tested on these instances must be analyzed carefully before any general statement can be made. Interestingly, motivated by the sparsity of the optimal solutions, Bomze et al. [5] construct StQP instances with a rich solution structure. Another important implication of our results is that though the StQP is computationally intractable in the worst case, it might be tractable in the average case. This implication is consistent with the observation in a recent paper by Burer and Ye [8], who prove that a class of random quadratically constrained programs has exact semidefinite relaxations and thus is tractable with high probability.

The first author, together with Peng and Zhang [11], prodded by a close relation between the StQP and the \(L_1\) minimization problem and a keen interest in understanding randomly generated optimization problems, proved that, as \(n\rightarrow \infty \), the random StQP with high probability (whp) has an optimal solution \(\mathbf{X}^*\) with the number of non-zero components bounded in probability, provided that the distribution F of the \(Q_{i,j}\), (\(i\le j\)), is supported by [A, B), with finite A, and F is concave on [A, B). This family of distributions contains, for instance, the uniform distribution and the exponential distribution. However, the concavity assumption excludes \(A=-\infty \), whence the normal distribution was out. In a follow-up work, Chen and Peng [12] were still able to prove that for the GOE (Gaussian orthogonal ensemble, [31]) matrix \(Q=(M+M^T)/2\), \(M_{i,j}\) being i.i.d. normal, whp the minimum point \(\mathbf{X}^*\) has at most two non-zero components, thus being almost an extreme vertex of the simplex. The key ingredient of the proof was an upper bound \(e^{-n^2/4}\) for the probability that the n-dimensional GOE matrix is positive semidefinite (see [15]).

The goal of our study is to prove that the likely support of the minimum point is small under a much broader condition on the asymptotic behavior of the distribution of the matrix entries at just one point, the left endpoint of its support, finite or minus infinity, i.e. without any constraint on the global shape of the distribution.

Our main results are as follows. Let F stand for the cumulative distribution function (c.d.f.) of the independently distributed entries \(Q_{i,j}\), \(i\le j\).

Let \(\mathbf{X}^*\) denote a global minimum solution of the StQP with the random matrix Q. Let \(K_n\) denote the support size for \( \mathbf{X}^*\), i.e. \(K_n=|\{j\in [n]:\,X_j^*>0\}|\).

Theorem 1.1

Suppose F(x) is continuous. Let \(\alpha >e\sqrt{2}\), and \(k_n=\lceil \alpha n^{1/2}\rceil \). Then

$$\begin{aligned} P \{K_n\ge k_n\} =O\bigl (e^{\gamma (\alpha )n^{1/2}}\bigr ), \quad \gamma (\alpha ):=2\alpha \log (e\sqrt{2}/\alpha ) < 0. \end{aligned}$$

So \(K_n=O(n^{1/2})\) with probability sub-exponentially close to 1.

A surprisingly short proof of this general claim is based on Theorem 3 in [12]. It turns out that this bound can be vastly improved if one imposes some constraints going beyond simple continuity of the c.d.f. F(x).

Interestingly, our results and analysis depend on whether the support of the distribution F is left bounded or not. Note that problem (1.1)–(1.2) is shift-equivalent, i.e., adding a constant c to each entry of the matrix Q would not change the solutions, and the new \(Q_{ij}\) would remain i.i.d. If the support of the distribution F is left bounded, we can choose c to be the negative of the left endpoint of the support; that is, we can assume without loss of generality that the support of F is left bounded at 0. In this case, all entries of Q are nonnegative. Of course, even if the support of the random entries of Q is not left bounded, one could still add \(-\min Q_{ij}\) to the entries to make them nonnegative. However, the entries would no longer be independent, as \(-\min Q_{ij}\) is now random. Thus, we need to treat the two cases separately.

Theorem 1.2

(Left-bounded support) Suppose that the c.d.f. F(x) has a continuous density f(x), and satisfies the following properties.

  (1)

    \(f(x)>0\) for \(x\in [0, B)\), \(0<B\le \infty \); and \(f(x)=0\) otherwise;

  (2)

    There exist \(\nu >0\) and \(\rho >0\) such that

    $$\begin{aligned} F(x)=\rho x^{\nu } +O(x^{\nu +1}),\quad x\downarrow 0; \end{aligned}$$

  (3)

    $$\begin{aligned} \sup \left\{ \frac{f(x')}{f(x)}:\,x,x'\in (0,B),\,x'\in (x,2x)\right\} <\infty . \end{aligned}$$

Then, for \(k\le k_n\),

$$\begin{aligned} P \{K_n=k+1\}&\le \exp \bigl (-c(\max (\nu ,1)) k+o(k)\bigr ),\\ c(\mu )&:=\int _0^1\log (1+x^{\mu })\,dx. \end{aligned}$$

Notes. (i) So \( P \{K_n>k\}\) decays exponentially fast long before k gets as large as \(k_n\). In fact, the exponential decay holds much longer, up until k is, say, of order \(n\log ^{-2} n\). That is, in probability, \(K_n\) is effectively bounded. (ii) The uniform distribution and the exponential distribution are covered by this theorem, with \(\nu =1\) in both cases: for the uniform on [0, B], \(F(x)=x/B\), and for the exponential, \(F(x)=1-e^{-x}=x+O(x^2)\) as \(x\downarrow 0\). Notice also that the leading term \(\rho x^{\nu }\) in the condition (2) is concave for \(\nu \le 1\). The local nature of this condition makes Theorem 1.2 substantially more general than Theorem 3.4 in [11], which was proved for F(x) concave everywhere on the support of the density f, whence for the two distributions just mentioned. (iii) The condition (3) is an extension of the notion of a “dominatedly varying” monotone function introduced and studied by Feller [17, 18] (Ch. 8, Exer. 33). It holds, in particular, for every continuous positive density supported by (0, B], \(B<\infty \), or by \((0,\infty )\) if there exists \(C>0\) such that f(x) is decreasing for \(x>C\). So, contrary to its formulation, the condition is rather mild, with only pathological positive densities being excluded.
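For a quick numerical feel for the decay rate in Theorem 1.2, one can evaluate \(c(\nu )=\int _0^1\log (1+x^{\nu })\,dx\) directly; the following sketch (ours) confirms the closed form \(c(1)=2\log 2-1\) and the decrease of \(c(\nu )\) in \(\nu \).

```python
import numpy as np
from scipy.integrate import quad

def c(nu):
    # c(nu) = \int_0^1 log(1 + x^nu) dx: the decay rate in Theorem 1.2
    return quad(lambda x: np.log1p(x ** nu), 0.0, 1.0)[0]

print(c(1.0), 2 * np.log(2) - 1)    # both ≈ 0.3863
for nu in (2.0, 5.0, 20.0):
    print(nu, c(nu))                # c(nu) decreases toward 0 as nu grows
```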

While the class of distributions covered by Theorem 1.2 is quite broad, it does not contain, for instance, the normal distribution (supported by \((-\infty ,\infty )\)), which is typically assumed in the extensive literature on the \(L_1\) minimization problem and its various generalizations. With the normal distribution in our peripheral vision, we introduce a class of distributions supported by \((-\infty ,B)\), \(B\le \infty \), such that, for \(x\rightarrow -\infty \),

$$\begin{aligned} F(x)= (c+O(|x|^{-\kappa })) |x|^a \exp (-r |x-x_0|^b),\quad b,c,r,\kappa >0; \end{aligned}$$
(1.3)

by shifting/scaling x we make \(r=1\), \(x_0=0\). It is well known that the normal distribution meets (1.3) with \(a=-1\), \(b=2\). Among other examples are the two-sided exponential density (\(a=0\), \(b=1\)) and the \(\cosh \)-density (\(a=0\), \(b=1\)). A broad class of sub-Gaussian random variables, cf. Boucheron et al. [6], falls into this category as well. As in Theorem 1.2, the condition (1.3) restricts the behavior of the c.d.f. F(x) in the vicinity of a single point, which is \(-\infty \) this time.

Theorem 1.3

Let \(b>1\). Then, for all \(k\le k_n\), we have

$$\begin{aligned} P {\bigl \{K_n= k+1\bigr \}}\le O\left( n \left( \frac{8}{9}\right) ^{k/4}+n\exp \Biggl (-\frac{k\left( \log \frac{n}{k}\right) ^{\min \{0,a/b\}}}{2e}\Biggr )\right) . \end{aligned}$$

So, invoking Theorem 1.1, \(K_n=O_p(\log n)\) for \(a\ge 0\), meaning that \( P (K_n > \omega (n)\log n)\rightarrow 0\) for every \(\omega (n)\rightarrow \infty \), however slowly, and \(K_n=O_p\bigl ((\log n)^{1+|a|/b}\bigr )\) for \(a<0\).

Theorem 1.4

Let \(b\le 1\). Define

$$\begin{aligned} \sigma (a,b)=\left\{ \begin{aligned}&1+\frac{1+2a}{b},&\text { if }a>0,\\&1+\frac{1+|a|}{b},&\text { if }a\le 0; \end{aligned}\right. \end{aligned}$$

so \(\sigma (a,b)>2\). For every \(\sigma >\sigma (a,b)\), and \(d<\frac{b(\sigma -\sigma (a,b))}{2}\),

$$\begin{aligned} P \{K_n > \log ^{1+d} n\}< \exp \bigl (-\log ^{1+d} n\bigr ). \end{aligned}$$

Note Therefore, with probability \(1 - O(n^{-L})\), for every \(L>0\), \(K_n\) is below \(\log ^{1+d} n\). Using the term coined in Knuth et al. [29], “quite surely” (q.s.) the support size is of a moderate (poly-logarithmic) order. A pivotal technical element of the proof is the fact that the order statistics of independent [0, 1]-uniforms are jointly distributed as the normalized consecutive sums of independent exponentials [27].

Turn to the alternative model: first randomly generate an \(n\times n\) matrix M whose elements are i.i.d. random variables with the c.d.f. G; then define \(Q=(M+M^T)/2\).

Suppose G satisfies the (one-point) condition (1.3), and \(x_0=0\), \(r=1\), without loss of generality. The diagonal entries of Q have distribution G and the non-diagonal entries of Q have the distribution \(F(x)= (G\star G)(2x)\), \(\star \) standing for convolution. We prove that, for \(a>-1\) when \(b\le 1\), F satisfies the condition (1.3) as well, with the parameters \(b'=b\), \(c'>0\), \(r'=2^{\min (1,b)}\) and

$$\begin{aligned} a'=\left\{ \begin{array}{l@{\quad }l} 2a+\dfrac{b}{2},&{}\text { if }\, b>1,\\ a+b-1,&{}\text { if }\,0<b<1,\\ 2a+1,&{}\text { if }\,b=1. \end{array}\right. \end{aligned}$$

Since \(r'>1=r\), we have \(\lim _{x\rightarrow -\infty }G(x)/F(x)= \infty \). Combining this fact and the general identity proved in [11] (the proof of Theorem 2.2, second top line in first display), we easily transfer the proof of our Theorems 1.3 and 1.4 to this model. To state the resulting claims one has only to replace a with \(a'\) in Theorems 1.3 and 1.4.

We note that the theorems above have natural counterparts for the problem \(\max \{\mathbf{x}^TQ\mathbf{x}:\,\mathbf{x}\in \Delta _n\}\): the distribution F of \(Q_{i,j}\) is supported by \((A,\infty )\), \(A\ge -\infty \), and the restrictions are imposed on the behavior of F(x) in the vicinity of \(\infty \). Since \(-Q\) meets the conditions of the respective claim for the support \((-\infty ,-A)\), no additional proof is needed. So, for the quasi-normal distribution, i.e. \(a=-1\), \(b=2\), both the minimum point and the maximum point are sparse, with the likely support size of order \((\log n)^{3/2}\ll n\) in each case. As we mentioned, Chen and Peng [12] proved that for the GOE matrix \(Q=(M+M^T)/2\), \(M_{i,j}\) being i.i.d. exactly normal, whp the support size of \(\mathbf{X}^*\) is 2, at most. In the absence of a counterpart of the Dean–Majumdar probability bound on the positive semidefiniteness of the GOE matrix for the quasi-normal distributions, chances to prove that the support size for these distributions is bounded in probability appear to be exceedingly small. On the other hand, it may well be possible to show that the likely support is poly-logarithmic in n for a broader family of quasi-normal distributions. This is in sharp contrast with the stronger bounds obtained in [5] (Proposition 1), building upon [11, 12], under considerably more restrictive conditions on the distribution F.

Theorems 1.2, 1.3 and 1.4 clearly illustrate the importance of the local asymptotic behavior at the left endpoint of the support of the underlying distribution in characterizing the sparsity of global optimal solutions. Roughly speaking, the upper bounds on \(P\{K_n=k+1\}\) we derive are higher for distributions which increase faster near the left endpoints of their supports, i.e., for distributions with a larger \(\nu \) (\(\nu \ge 1\)) in Theorem 1.2 and larger b or smaller a in Theorem 1.3. This is in sharp contrast to [11] and [12]: [11] requires concavity of the c.d.f., a global property, while [12] deals primarily with normal distributions.

Our results focus on the global optimal solutions of the StQP. Interestingly, the behavior of the local optimal solutions attracted the attention of the population genetics literature some thirty years ago. Specifically, Kingman [28] initiated the study of local maxima of the random quadratic forms \( \mathbf{p}^T F\mathbf{p}\) on the simplex \(\Delta _n\), with \(\mathbf{p}\) interpreted as the distribution of the alleles \(A_1,\dots , A_n\) at a single locus, and \(F_{i,j}\in [0,1]\) as the (random) fitness, i.e. the probability that the gene pair \((A_i,A_j)\) survives to a reproductive age. Kingman’s main interest was potential coexistence of several alleles at the locus, i.e. of locally stable distributions (local maxima) \(\mathbf{p}\) with a sizable support. Concurrently, in the context of evolutionary stable strategies, Haigh [23, 24] established the counterparts of some of Kingman’s results for a non-symmetric payoff matrix; see the more recent paper by Kontogiannis and Spirakis [30]. The second author of the current paper showed that for a broad class of fitness distributions, the likely support of a local maximum point \(\mathbf{p}\) not containing a support of a local equilibrium is \(\lceil (2/3)\log _2n\rceil \), at most. And, for the uniform fitnesses, there are likely many potential local-maximum supports free of local equilibria, each of size close to \(\lceil (1/2)\log _2n\rceil \) [34]. Conjecturally, the likely size of the largest support of a local maximum is poly-logarithmic in n.

The paper is organized as follows. In Sect. 2, we provide some preliminaries useful for our analysis. In Sect. 3, we present the proofs of our key results, and finish the paper with some concluding remarks in Sect. 4.

2 Preliminaries

The analysis of the random StQP in [11] began with the formulation and the proof of the following optimality conditions. Given \(\mathbf{x}\in \Delta _n\), denote \(k(\mathbf{x})=|\{j\in [n]: x_j>0\}|\).

Proposition 2.1

Suppose that \(\mathbf{x}^*\) is the sparsest optimal solution of the problem (1.1)–(1.2) satisfying \(k(\mathbf{x}^*)=k>1\). So \(\lambda ^*:=(\mathbf{x}^*)^T Q \mathbf{x}^*\) is the absolute minimum of the quadratic form on the simplex \(\Delta _n\). Denoting \(\mathcal K=\{j\in [n]: x^*_j>0\}\), let \(Q_{\mathcal {K}}\) be the principal \(k\times k\) submatrix of Q induced by the elements of the set \(\mathcal K\). Then

  C.1

    there exists a row (whence a column) of \(Q_{\mathcal {K}}\) such that the arithmetic mean of all its elements is (strictly) less than \(\min _{j\in [n]} Q_{j,j}\);

  C.2

    with \(E_{\mathcal K}(i,j):=1_{\{i,j\in \mathcal K\}}\), \(Q_{\mathcal {K}}-\lambda ^* E_{\mathcal K}\) is positive semidefinite; in short, \(Q_{\mathcal {K}}-\lambda ^* E_{\mathcal K}\succcurlyeq \mathbf{0}\).

Properties C.1 and C.2 follow, respectively, from the first-order optimality condition and the second-order optimality condition.

Consider the random symmetric matrix \(\{Q_{i,j}\}\): (1) the diagonal entries \(Q_{i,i}\) are independent, each having the same, continuous, distribution G; (2) the above-diagonal elements \(Q_{i,j}\), \((i<j)\), are independent of each other, and of the diagonal elements, each having the same, continuous, distribution F; (3) the below-diagonal elements \(Q_{i,j}\) are set equal to \(Q_{j,i}\).

If we relabel the elements of [n] to make \(Q_{1,1}<Q_{2,2}<\cdots <Q_{n,n}\), then the above-diagonal elements in the new \(n\times n\) array will remain independent of each other and of the diagonal entries, that now form the sequence \(V_1,\dots , V_n\) of the order statistics for n independent variables, each with distribution G.

Let us use the capital \(\mathbf{X}^*\) to denote the solution of the random StQP problem. Let \(K_n\) denote the support size of \( \mathbf{X}^*\). Property C.1 was used in [11] to prove the following, crucial, estimate.

Lemma 2.1

Let \(V_1<V_2<\cdots < V_n\) (\(W_1<W_2<\cdots <W_{n-1}\) resp.) denote the order statistics of the sequence of n (\(n-1\) resp.) independent, G-distributed (F-distributed resp.) random variables. Assume that \(\mathbf{V}:=(V_1,\dots ,V_n)\) and \( \mathbf{W}:=(W_1,\dots ,W_{n-1})\) are independent of each other. Then, for \(k\ge 1\),

$$\begin{aligned}&P \{K_n=k+1\}\le \rho (n,k),\\&\rho (n,k):=\sum _{i=1}^n P \bigl \{{\bar{W}}_k\le (k+1)V_1-V_i\bigr \},\quad {\bar{W}}_k:=\sum _{j=1}^{k} W_j. \end{aligned}$$

Proof

(For completeness) Consider Q obtained from the initial \(\{Q_{i,j}\}\) via the above relabeling, so that now \(Q_{i,i}\) is the i-th smallest among the diagonal entries. For each \(i\in [n]\), let \(\mathbf{W}(i)=(W_1(i)<\cdots < W_{n-1}(i))\) stand for the \((n-1)\) order statistics of the non-diagonal entries in the i-th row of the transformed Q. \(\mathbf{W}(1),\dots , \mathbf{W}(n)\) are equi-distributed, independent of \(\mathbf{V}\), though not of each other since the matrix is symmetric. By the property C.1, there exists a row \(i^*\) in Q such that the sum of some \(k+1\) entries in this row, including its diagonal entry, is below \((k+1)\min _i Q_{i,i}=(k+1)V_1\). This sum is certainly not smaller than

$$\begin{aligned} \sum _{j=1}^k W_j(i^*)+V_{i^*}={\bar{W}}_k(i^*)+V_{i^*}. \end{aligned}$$

For a generic row \(i\in [n]\),

$$\begin{aligned} P \bigl \{{\bar{W}}_k(i)+V_i\le (k+1)V_1\}= P \bigl \{{\bar{W}}_k+V_i\le (k+1)V_1\}. \end{aligned}$$

Applying the union bound we complete the proof. \(\square \)

In the case \(k=n\) Property C.2 was utilized in [12] to prove

Lemma 2.2

For \(n\ge 2\),

$$\begin{aligned} P \left\{ K_n=n\right\} \le P \left\{ \bigcap _{i\ne j\in [n]}\bigl \{Q_{i,j}\le \max (Q_{i,i},Q_{j,j})\bigr \}\right\} \le \frac{2^n}{(n+1)!}. \end{aligned}$$
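Since the event in Lemma 2.2 depends only on the relative ranks of the entries, its probability is the same for every continuous distribution (with \(G=F\), as in Sect. 3). A quick Monte Carlo sanity check (ours) compares it with the \(2^n/(n+1)!\) bound:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
n, reps = 5, 100_000
A = rng.random((reps, n, n))
Q = np.triu(A) + np.swapaxes(np.triu(A, 1), 1, 2)   # symmetric, i.i.d. entries
d = np.diagonal(Q, axis1=1, axis2=2)
mx = np.maximum(d[:, :, None], d[:, None, :])       # max(Q_ii, Q_jj)
off = ~np.eye(n, dtype=bool)
freq = np.mean(np.all((Q <= mx)[:, off], axis=1))   # all off-diagonal comparisons hold
print(freq, 2 ** n / factorial(n + 1))              # empirical probability vs the bound
```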

Define

$$\begin{aligned}&\rho (n,k)= P \bigl \{{\bar{W}}_k\le kV_1\}+{\hat{\rho }}(n,k), \\&{\hat{\rho }}(n,k):=\sum _{i=2}^n P \{{\bar{W}}_k\le (k+1)V_1-V_i\}. \end{aligned}$$

Notice that \({\hat{\rho }}(n,k)\le (n-1) P \bigl \{{\bar{W}}_k\le kV_1\}\), since \(V_i\ge V_1\) for \(i\ge 2\). Therefore, by Lemma 2.1,

$$\begin{aligned} P \{K_n=k+1\}\le n P \bigl \{{\bar{W}}_k\le kV_1\}. \end{aligned}$$
(2.1)

Using the classic formula for the joint distribution of the order statistics, multi-dimensional integral representations of the functions \(\rho (\cdot )\) and \({{\hat{\rho }}}(\cdot )\) were found in [11] (see (11), and the top identity in the proof of Theorem 2.2 therein).

In terms of the order statistics \(W_1,\dots , W_{n-1}\) of the \((n-1)\) independent, F-distributed random variables, and \({\bar{W}}_k:=\sum _{j\in [k]}W_j\), the formulas for \( P \bigl \{{\bar{W}}_k\le kV_1\}\) and \({{\hat{\rho }}}(n,k)\) become

$$\begin{aligned} \begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\}&=E \left[ \bigl (1-G({\bar{W}}_k/k)\bigr )^n\right] ,\\ {{\hat{\rho }}}(n,k)&=(n-1)\,E \left[ H_{n,k}({\bar{W}}_k)\right] ,\\ H_{n,k}(w)&:=-\frac{[1-G(w/k)]^n}{n-1}\\&\quad \, +\frac{n(k+1)}{n-1}\int \limits _{w/k}^{\infty }g\bigl ((k+1)v-w)\,[1-G(v)]^{n-1}\,dv. \end{aligned} \end{aligned}$$
(2.2)

Here \(g(\cdot )\) is the density of G, the distribution function of the diagonal entries \(Q_{i,i}\). The top formula was the main focus of analysis in [11], and will be instrumental in our paper as well. As in [11], we switch to \(U_j=F(W_j)\), so that the \(U_j\) are the order statistics for the variables \(F(X_j)\), where the \(X_j\) are i.i.d. with the c.d.f. F. We know, of course, that the \(F(X_j)\) are [0, 1]-uniform. Thus \(U_1,\dots ,U_{n-1}\) are the order statistics of a sequence of \((n-1)\) independent, uniformly distributed, random variables. So the formula of key interest becomes

$$\begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\}=E \Biggl [\Biggl (1-G\Biggl (\frac{1}{k}\sum _{j=1}^kW_j\Biggr )\Biggr )^{n}\Biggr ]=E \Biggl [\Biggl (1-G\Biggl (\frac{1}{k}\sum _{j=1}^kF^{-1}(U_j)\Biggr )\Biggr )^{n}\Biggr ]. \end{aligned}$$
(2.3)
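The identity (2.3) is easy to test numerically: conditionally on \({\bar{W}}_k\), the probability that \(V_1\ge {\bar{W}}_k/k\) is \(\bigl (1-G({\bar{W}}_k/k)\bigr )^n\). A short Monte Carlo sketch (ours), with \(G=F\) the Exp(1) distribution, compares the direct frequency of \(\{{\bar{W}}_k\le kV_1\}\) with the averaged conditional probability:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, reps = 200, 5, 50_000
lhs = rhs = 0.0
for _ in range(reps):
    Wbar = np.sort(rng.exponential(size=n - 1))[:k].sum()  # sum of k smallest F-samples
    V1 = rng.exponential(size=n).min()                     # smallest of n G-samples
    lhs += Wbar <= k * V1
    G = 1 - np.exp(-Wbar / k)                              # G(Wbar/k) for Exp(1)
    rhs += (1 - G) ** n
print(lhs / reps, rhs / reps)   # two estimates of P{Wbar_k <= k V_1}; they agree
```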

3 Proofs

For convenience, as we go along, we will restate the claims already made in Sect. 1.

Let \(G=F\). We begin with Theorem 1.1, which provides an \(O(n^{1/2})\) upper bound for the likely size of the minimum point support under a very weak condition on the c.d.f. F.

Theorem 1.1

Let \(K_n\) be the support size for the solution of the random StQP with continuously distributed entries. Picking \(\alpha >e\sqrt{2}\) and setting \(k_n:=\lceil \alpha n^{1/2}\rceil \), we have

$$\begin{aligned} P \{K_n \ge k_n\} =O\bigl (e^{\gamma (\alpha )n^{1/2}}\bigr ), \quad \gamma (\alpha ):=2\alpha \log (e\sqrt{2}/\alpha ) <0. \end{aligned}$$

Proof

From Proposition 2.1 C.2 and Lemma 2.2, we have that

$$\begin{aligned} P \{K_n=k\}&\le P \Bigl \{\exists \text { a }k\times k\text { submatrix } {{\mathcal {K}}} \text{ s.t. } Q_{{\mathcal {K}}}-\lambda ^*E_k\succcurlyeq \mathbf{0}\Bigr \} \\&\le S(n,k):=\left( {\begin{array}{c}n\\ k\end{array}}\right) \cdot \frac{2^k}{(k+1)!}. \end{aligned}$$

Using the inequality \(b!\ge (b/e)^b\) (implied by \((1+1/b)^b<e\)) and its corollary \(\left( {\begin{array}{c}a\\ b\end{array}}\right) \le (ea/b)^b\), we obtain

$$\begin{aligned} S(n,k)\le \left( \frac{2e^2n}{k^2}\right) ^k\le \left( \frac{2e^2}{\alpha ^2}\right) ^k,\quad \forall \, k\ge \alpha n^{1/2}. \end{aligned}$$

Summing up this bound for \(k\ge k_n=\lceil \alpha n^{1/2}\rceil \), we complete the proof. \(\square \)
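Numerically, the union bound in this proof is tiny already for moderate n; here is a log-space tabulation (a sketch of ours, using lgamma to avoid overflow):

```python
from math import lgamma, log, exp, ceil, sqrt, e

def log_S(n, k):
    # log of S(n,k) = binom(n,k) * 2^k / (k+1)!
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(2.0) - lgamma(k + 2))

n = 400
alpha = 1.1 * e * sqrt(2)                # any alpha > e*sqrt(2)
kn = ceil(alpha * sqrt(n))
tail = sum(exp(log_S(n, k)) for k in range(kn, n + 1))
print(kn, tail)                          # the bound on P{K_n >= k_n}: negligible
```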

This result will enable us to use (2.3) for \(k\le k_n\) only, which severely restricts the likely range of the order statistics \(U_k\) involved. First, observe that, given \(\delta \in (0,1)\), by the union bound we have: for \(k\le k_n\), and n large,

$$\begin{aligned} \begin{aligned} P \{U_k\ge \delta \}&\le \left( {\begin{array}{c}n-1\\ n-k\end{array}}\right) (1-\delta )^{n-k}\le \left( {\begin{array}{c}n\\ k\end{array}}\right) (1-\delta )^{n-k}\\&\le (1-\delta )^{n-k} n^k\le (1-\delta )^{0.99n} n^k \le n^{-k}, \end{aligned} \end{aligned}$$
(3.1)

provided that \(\delta =\delta _n:=2.1n^{-1}k_n\log n=O(n^{-1/2}\log n)\). So the contribution to the RHS of (2.3) coming from \(U_k\ge \delta _n\) is at most \(n^{-k}\), uniformly for \(k\le k_n\). Second, let us show that we can focus on \(\mathbf{U}\) with

$$\begin{aligned} S(\mathbf{U}):=\frac{1}{k}\sum _{j=1}^k \log \frac{1}{U_j}\lesssim \log \frac{ne}{k}; \end{aligned}$$
(3.2)

(we write \(L\lesssim R\) if \(L\le (1+o(1))R\)). Here is the intuition behind (3.2). Roughly, the \(U_j\) are close to j/n; therefore whp

$$\begin{aligned} S( \mathbf{U})\approx \frac{1}{k}\sum _{j=1}^k\log \frac{n}{j}=\frac{1}{k}\log \frac{n^k}{{k!}}\approx \log \frac{ne}{k}. \end{aligned}$$

This fact will come in handy in the analysis of the case when the support of F extends all the way to \(-\infty \).

Lemma 3.1

Given \(\alpha >0\), define \(S=S(\alpha )= \log \frac{n}{\alpha k}\). If \(\alpha <e^{-1}\), then for every \(\beta \in (0, 1-\alpha e)\) we have

$$\begin{aligned} P \bigl \{S(\mathbf{U})>S\bigr \} \le O\left( \Biggl (\frac{\alpha e}{1-\beta }\Biggr )^{\beta k} \right) . \end{aligned}$$

Proof

Recall that the components of \(\mathbf{U}\) are the k first order statistics of \((n -1)\) independent, [0, 1]-uniform random variables. In view of the sum-type formula for \(S(\mathbf{U})\) in (3.2), we apply a variation of Chernoff’s method. Picking \(\lambda \in (0,k)\), the upper restriction making the coming integral finite, we define \(\mathbf{u}=(u_1<\cdots <u_k)\in (0,1)^{k}\) and write

$$\begin{aligned} P \bigl \{S(\mathbf{U})>S\bigr \}&\le e^{-\lambda S} (n-1)_k\int \limits _{S(\mathbf{u})>S}(1-u_k)^{n-1-k} \exp \left( \frac{\lambda }{k}\sum _{j=1}^k\log \frac{1}{u_j}\right) d\mathbf{u}\\&=\frac{e^{-\lambda S}(n-1)_k}{(k-1)!}\int _0^1 (1-u_k)^{n-1-k} u_k^{-\lambda /k}\left( \int _0^{u_k} w^{-\lambda /k}\,dw\right) ^{k-1}du_k\\&=\frac{e^{-\lambda S}(n-1)_k}{(1-\lambda /k)^{k-1}\,(k-1)!}\int _0^1 (1-u_k)^{n-1-k} u_k^{k-1-\lambda }\,du_k\\&=\frac{e^{-\lambda S}(n-1)_k}{(1-\lambda /k)^{k-1}\,(k-1)!}\cdot \frac{\Gamma (n-k)\,\Gamma (k-\lambda )}{\Gamma (n-\lambda )}\\&=\frac{e^{-\lambda S}}{(1-\lambda /k)^{k-1}}\cdot \frac{\Gamma (n)\,\Gamma (k-\lambda )}{\Gamma (n-\lambda )\,\Gamma (k)}{,} \end{aligned}$$

where \((n)_k:=\prod _{j=n-k+1}^n j\). Set \(\lambda =\beta k\); using Stirling’s formula for the Gamma function (see for instance [26]) in the last expression, it is easy to see that

$$\begin{aligned} P \bigl \{S(\mathbf{U})>S\bigr \}\le O\left( \left( \frac{\alpha e}{1-\beta }\right) ^{\beta k} \right) ; \end{aligned}$$

the bound is exponentially small since \(\alpha e<1-\beta \). \(\square \)
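A simulation sketch (ours) illustrates the lemma with the parameter choice \(\alpha =\frac{2}{3e}\), \(\beta =1/4\) used later in the proof of Theorem 3.6; the empirical frequency of \(\{S(\mathbf{U})>S\}\) sits far below the \((8/9)^{k/4}\) bound:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps = 10_000, 50, 2_000
alpha, beta = 2 / (3 * np.e), 0.25
S = np.log(n / (alpha * k))
exceed = 0
for _ in range(reps):
    U = np.sort(rng.random(n - 1))[:k]     # first k uniform order statistics
    exceed += np.log(1 / U).mean() > S     # is S(U) > S ?
print(exceed / reps, (alpha * np.e / (1 - beta)) ** (beta * k))  # frequency vs bound
```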

3.1 Distributions with left-bounded supports

In this section, we focus on a class of distributions satisfying the properties in Theorem 1.2.

In (3.1) we showed that, at the cost of an \(O(n^{-k})\) error term, we can neglect \(\mathbf{U}\) with \(U_k> \delta _n =2.1n^{-1}k_n\log n\), for \(k\le k_n=\lceil \alpha n^{1/2}\rceil \). We need to show that, for the remaining \(\mathbf{U}\), \(\phi ( \mathbf{U}):=F\bigl (k^{-1}\sum _{j=1}^k F^{-1}(U_j)\bigr )\) typically dwarfs 1/n for large k, and so makes \((1-\phi (\mathbf{U}))^n\) close to zero in all likelihood. Our first step is to establish an explicit lower bound for \(\phi (\mathbf{u})\). For this purpose, define

$$\begin{aligned} \sigma =\min _{x\in [0,F^{-1}(\delta _n)]} \frac{F(x)}{x^{\nu }}, \quad \eta =\max _{x\in [0, F^{-1}(\delta _n)]}\frac{F(x)}{x^{\nu }}, \end{aligned}$$

and \(\gamma =\frac{\sigma }{\eta }\le 1\).

Lemma 3.2

Assume that F meets the conditions (1) and (2) in Theorem 1.2. Then

(1):

\(\gamma =1+O\bigl (\delta _n^{1/\nu }\bigr )\) and, uniformly for \(\mathbf{u}\) with \(0< u_1\le \cdots \le u_k \le \delta _n\), we have: \(\phi (\mathbf{u})\ge \gamma \Bigl (k^{-1}\sum _{j =1}^ku_j^{1/\nu }\Bigr )^{\nu }\);

(2):

further, for \(\nu \ge 1\), we have \(\phi (\mathbf{u})\ge \sum _{j=1}^k \gamma _j u_j\), with

$$\begin{aligned} \gamma _j:=\gamma \left[ \left( 1-\frac{j-1}{k}\right) ^{\nu }-\left( 1-\frac{j}{k}\right) ^{\nu }\right] \in (0,1). \end{aligned}$$
(3.3)

Proof

(1) We will use the notation \(g=\Omega (f)\), if \(g,f>0\) and \(g\ge c f\) for an absolute constant \(c>0\) in a given range of the arguments of f and g.

Since \(F(x)=\Omega (x^{\nu })\), \((x\downarrow 0)\), we see that \(F^{-1}(u)\in [0,F^{-1}(\delta _n)]\) for \(u\in [0,\delta _n]\), and \(F^{-1}(\delta _n)=O\bigl (\delta _n^{1/\nu }\bigr )\). Since \(F(x) =\rho x^{\nu } +O(x^{\nu +1})\), we have \(\sigma , \eta =\rho +O\bigl (\delta _n^{1/\nu }\bigr )\), so \(\gamma =1+O\bigl (\delta _n^{1/\nu }\bigr )\), and furthermore

$$\begin{aligned} F(x)\ge \sigma x^{\nu },\,\,\forall \, x\in \bigl [0,F^{-1}(\delta _n)\bigr ];\quad F^{-1}(u)\ge \left( \frac{u}{\eta }\right) ^{1/\nu },\,\,\forall \,u\in [0,\delta _n]. \end{aligned}$$
(3.4)

Applying (3.4), we lower-bound

$$\begin{aligned} \phi (\mathbf{u})&\ge \sigma \Bigl (k^{-1}\sum _{j=1}^kF^{-1}(u_j)\Bigr )^{\nu } \ge \gamma \Bigl (k^{-1}\sum _{j =1}^ku_j^{1/\nu }\Bigr )^{\nu }. \end{aligned}$$

(2) Suppose \(\nu \ge 1\). Our task is to prove that, for \(v_j=u_j^{1/\nu }\), we have

$$\begin{aligned} \left( \sum _{j=1}^k v_j\right) ^{\nu }\ge \sum _{j=1}^k v_j^{\nu }\bigl [(k-j+1)^{\nu }-(k-j)^{\nu }\bigr ]. \end{aligned}$$

Clearly,

$$\begin{aligned} 0\le v_1\le v_2\le \cdots \le v_k. \end{aligned}$$
(3.5)

Without loss of generality, we may assume that \(\sum _{j=1}^k v_j=1\). We need to show that, subject to this constraint, the maximum value of the RHS function, call it \(\psi (\mathbf{v})\), is 1. Since \(\nu \ge 1\), \(\psi (\mathbf{v})\) is a convex function of \( \mathbf{v}\ge \mathbf{0}\). So, for \(\mathbf{v}\) meeting (3.5) and \(\sum _{j=1}^k v_j=1\), \(\psi (\mathbf{v})\) attains its maximum at an extreme point of the resulting polyhedron. Every such point \(\mathbf{v}\) is of the form \(\mathbf{v}=(0,\dots ,0, v,\dots ,v)\). So if the last zero is at position \(j_0\), then \((k-j_0)v=1\), i.e. \(v=(k-j_0)^{-1}\), and therefore \(\psi (\mathbf{v})\) is

$$\begin{aligned}&\sum _{j=1}^kv_j^{\nu } \bigl [(k-j+1)^{\nu }-(k-j)^{\nu }\bigr ]\\&\quad =(k-j_0)^{-\nu }\sum _{j=j_0+1}^k\bigl [(k-j+1)^{\nu }-(k-j)^{\nu }\bigr ]\\&\quad =(k-j_0)^{-\nu }\cdot (k-j_0)^{\nu }=1. \end{aligned}$$

\(\square \)

Armed with this lemma, we derive the upper bound for the truncated expectation

$$\begin{aligned} E_{n,k}:= E \Bigl [1_{\{U_k\le \delta _n\}}\bigl (1-\phi ( \mathbf{U})\bigr )^n\Bigr ], \quad \phi (\mathbf{U})=F\left( k^{-1}\sum _{j =1}^kF^{-1}(U_j)\right) . \end{aligned}$$

Let \(\nu \ge 1\). Using the notation \(\gamma _{j:k}=\sum _{\ell = j}^{k}\gamma _{\ell }\in (0,1]\), \(d\mathbf{u}_t=\prod _{j=1}^t du_j\), \(u_0=0\), we write

$$\begin{aligned} \frac{E_{n,k}}{(n-1)_k} =&\; \int \limits _0^{\delta _n}\cdots \int \limits _{u_{k-1}}^{\delta _n}(1-u_k)^{n-1-k} \bigl (1-\phi (\mathbf{u})\bigr )^n d\mathbf{u}_k\\ \le&\; \int \limits _0^{\delta _n}\cdots \int \limits _{u_{k-1}}^{\delta _n}(1-u_k)^{n-1-k} \left( 1-\sum _{j=1}^k\gamma _ju_j\right) ^n d\mathbf{u}_k\\&\qquad \text {using concavity of }\log (\cdot )\\ \le&\; \int \limits _0^{\delta _n}\cdots \int \limits _{u_{k-1}}^{\delta _n}\Biggl [1-u_k+\frac{n}{2n-1-k}\Biggl (u_k(1-\gamma _k)-\sum _{j=1}^{k-1}\gamma _ju_j\Biggr )\Biggr ]^{2n-1-k} d\mathbf{u}_k\\&\qquad \text {integrating over }u_k {\text { and dropping the negative term at }} u_k=\delta _n { \text { as } \gamma _{1:k}\le 1} \\ \le&\; (2n-k)^{-1}\left( 1-\frac{n}{2n-1-k}(1-\gamma _k)\right) ^{-1}\\ \times&\; \int \limits _0^{\delta _n}\cdots \int \limits _{u_{k-2}}^{\delta _n} \left[ 1-u_{k-1}+\frac{n}{2n-1-k}\left( u_{k-1}(1-\gamma _{k-1:k})-\sum _{j=1}^{k-2}\gamma _ju_j\right) \right] ^{2n-k} d\mathbf{u}_{k-1}. \end{aligned}$$

Repeating the integration step \((k-1)\) times, we arrive at the bound

$$\begin{aligned} E_{n,k}&\le \prod _{j=1}^k \frac{n-1-k+j}{2n-1-k+j}\prod _{j=1}^k\left( 1-\frac{n}{2n-1-k}(1-\gamma _{k-j+1:k}) \right) ^{-1}\\&=\prod _{j=1}^k \frac{n-1-k+j}{2n-1-k+j}\prod _{j=1}^k\left( 1-\frac{n}{2n-1-k}\left( 1-\gamma \left( \frac{j}{k}\right) ^{\nu }\right) \right) ^{-1}.\\ \end{aligned}$$

Taking logarithms, using \(k\le k_n\), \(\gamma =1+O(\delta _n^{1/\nu })\), and viewing the resulting sum as a Riemann sum, we easily obtain

$$\begin{aligned} \begin{aligned}&\frac{\log E_{n,k}}{k}\le o(1) -c(\nu ),\\&c(\nu )=\log 2+\int _0^1\log \left( \frac{1}{2}+\frac{x^{\nu }}{2}\right) \,dx=\int _0^1\log \bigl (1+x^{\nu }\bigr )\,dx. \end{aligned} \end{aligned}$$
(3.6)

Clearly, \(c(\nu )\) is positive and decreasing as \(\nu \) increases. Note that

$$\begin{aligned} c(1)&=2\log 2-1\approx 0.386, \\ c(\nu )&=\sum _{j\ge 1}\frac{(-1)^{j-1}}{j(\nu j+1)}\rightarrow 0 \text{ as } \nu \rightarrow \infty . \end{aligned}$$

Let \(\nu <1\). Since \(\left( \frac{1}{k}\sum _{j=1}^k u_j^{1/\nu }\right) ^{\nu }\) is decreasing in \(\nu \), (3.6) holds with \(c(\nu ):=c(1)\). Thus, for all \(\nu >0\), the truncated expectation \(E_{n,k}\) decays exponentially in \(k\le k_n\). Combining this claim with (3.1), we have proved

Lemma 3.3

Under the conditions (1-2) in Theorem 1.2, for \(k\le k_n\), we have

$$\begin{aligned} P \{{\bar{W}}_k\le kV_{1}\}\le \exp \bigl (o(k) -c(\max (\nu ,1)) k\bigr ). \end{aligned}$$

It remains to upper-bound \({{\hat{\rho }}}(n,k)\). According to (2.2),

$$\begin{aligned} {{\hat{\rho }}}(n,k)\le n(k+1)E \Biggl [\int _{{\bar{W}}_k/k}^{\infty }f\bigl ((k+1)v-{\bar{W}}_k\bigr )\,\bigl [1-F(v)\bigr ]^{n-1}\,dv\Biggr ]. \end{aligned}$$
(3.7)

Let us bound the integral. We now assume that the density f satisfies the condition (3):

$$\begin{aligned} \beta =\sup \Bigl \{f(v')/f(v):\,v' \in [v, 2v],\,f(v)>0\Bigr \}<\infty ; \end{aligned}$$

following Feller [17], one can say that the density f is dominatedly varying on the support of the distribution F. We will need the following result, cf. [17, 18].

Lemma 3.4

Introduce

$$\begin{aligned} \beta (j)=\sup \left\{ \frac{f(x')}{f(x)}: x,\,x'\in (0,B),\,x'\in [x,jx],\,f(x)>0\right\} ,\quad j>1. \end{aligned}$$
(3.8)

Under the condition (3), we have \(\beta (j)\le \beta j^{\alpha }\), with \(\alpha :=\log _2\beta \).

Proof

(For completeness.) First of all,

$$\begin{aligned} \beta (k_1k_2)\le \beta (k_1)\beta (k_2),\quad k_1,\,k_2\ge 2; \end{aligned}$$

thus \(\beta (\cdot )\) is a sub-multiplicative function. Consequently, \(\beta (2^m)\le \beta (2)^m=\beta ^m\). Second, given \(k>2\), let \(m=m(k)\) be such that \(2^{m-1}< k\le 2^{m}\). Since \(\beta (\cdot )\) is increasing, we have

$$\begin{aligned} \beta (k)&\le \beta (2^m)\le \beta ^m\le \beta ^{\frac{\log (2k)}{\log 2}} =(2k)^{\log _2\beta }=\beta k^{\log _2\beta }. \end{aligned}$$

\(\square \)

The argument of the density f in (3.7) is sandwiched between v and \((k+1)v\). So, by Lemma 3.4, we obtain

$$\begin{aligned} {{\hat{\rho }}}(n,k)&\le n (k+1)^{\log _2\beta +1}E \Biggl [\int _{{\bar{W}}_k/k}^{\infty }f(v)\,\bigl [1-F(v)\bigr ]^{n-1}\,dv\Biggr ]\\&=(k+1)^{\log _2\beta +1}E \Bigl [\bigl [1-F({\bar{W}}_k/k)\bigr ]^n\Bigr ]\\&=(k+1)^{\log _2\beta +1}\exp \bigl (-c(\max (\nu ,1))k+o(k)\bigr )\\&=\exp \bigl (-c(\max (\nu ,1))k+o(k)\bigr ), \end{aligned}$$

as \(\log k=o(k)\) for \(k\rightarrow \infty \).

This completes the proof of

Theorem 3.1

Under the properties (1–3) in Theorem 1.2, for \(k\le k_n\), we have

$$\begin{aligned} P \{K_n=k+1\}\le \exp \bigl (- c(\max (\nu ,1)) k+o(k)\bigr ). \end{aligned}$$

Note The conditions (1), (2) relate to a single point, the left endpoint of the support; i.e. they are so mild that there are scores of the classic densities meeting them. The condition (3) is different, as it concerns the ratio of the density values at pairs of comparably large/small x and \(x'\). For the density f of a c.d.f. F meeting the conditions (1) and (2), the condition (3) is met, for instance, if (i) \(f(x)>0\) for \(x\in (0,x_1)\) and (ii) there is \(x_2\in (0,x_1)\) such that f(x) is decreasing on \([x_2,x_1)\). Many commonly used distributions satisfy the properties (1–3).

3.2 Distributions whose supports are not left bounded

In this section, we turn to the case when the support of the distribution extends all the way to \(-\infty \). Two examples come to mind. One is the normal distribution, with density

$$\begin{aligned} f(x)=\frac{1}{\sqrt{2\pi }} e^{-x^2/2}. \end{aligned}$$

It is known, and can be proved via a single integration by parts, that

$$\begin{aligned} F(x)=\frac{1+O(|x|^{-1})}{|x|\sqrt{2\pi }}\,e^{-x^2/2},\quad x\rightarrow -\infty . \end{aligned}$$
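(A numerical aside of ours: the ratio of \(F(x)\) to this asymptotic expression indeed approaches 1.)

```python
import numpy as np
from scipy.stats import norm

for x in (-4.0, -8.0, -12.0):
    asym = np.exp(-x ** 2 / 2) / (abs(x) * np.sqrt(2 * np.pi))
    print(x, norm.cdf(x) / asym)   # ratio -> 1 as x -> -infinity
```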

Another example is a positive exponential, with \(F(x)=e^x\) for \(x<0\), and \(F(x)\equiv 1\) for \(x\ge 0\). Both are special cases of a distribution F(x) such that, for some \(a\in (-\infty ,\infty )\), \(b>0\), \(c>0\), and \(\kappa >0\),

$$\begin{aligned} F(x)= (c+O(|x|^{-\kappa })) |x|^a \exp (-|x|^b),\quad x\rightarrow -\infty , \end{aligned}$$
(3.9)

which will be the focus of our analysis here.

We distinguish between two cases depending on the value of b.

3.2.1 Case \(b\ge 1\)

Recall the notation

$$\begin{aligned} \phi (\mathbf{u})=F\left( \frac{1}{k}\sum _{j=1}^k F^{-1}(u_j)\right) ,\quad S(\mathbf{u})=\frac{1}{k}\sum _{j=1}^k\log \frac{1}{u_j}. \end{aligned}$$

We will write \(L\gtrsim R\) if \(L\ge (1+o(1))R\).

Lemma 3.5

As all \(u_j\downarrow 0\), it holds that \( \phi (\mathbf{u})\gtrsim S(\mathbf{u})^{\min \{0,a/b\}} e^{-S(\mathbf{u})}. \)

Proof

Since \(x=F^{-1}(u)\) iff \(u=F(x)\), we have: for \(u\downarrow 0\),

$$\begin{aligned} |x|&=\left( \log \frac{c}{u}+ a\log |x|+O(|x|^{-\kappa })\right) ^{1/b}\nonumber \\&=\left[ \log \frac{c}{u}+\frac{a}{b}\log \left( \log \frac{c}{u}+a\log |x|+O(|x|^{-\kappa })\right) \right] ^{1/b}\nonumber \\&=\left[ \log \frac{c}{u}+\frac{a}{b}\log \log \left( \frac{1}{u}\right) +O\left( \frac{\log \log (1/u)}{\log (1/u)}\right) \right] ^{1/b}. \end{aligned}$$
(3.10)

As \(y^{1/b}\) is concave for \(b\ge 1\), we obtain then

$$\begin{aligned} X&:=\frac{1}{k}\sum _{j=1}^k F^{-1}(u_j)\\&=-\frac{1}{k}\sum _{j=1}^k\left[ \log \frac{c}{u_j}+\frac{a}{b}\log \log \left( \frac{1}{u_j}\right) +O\left( \frac{\log \log (1/u_j)}{\log (1/u_j)}\right) \right] ^{1/b}\\&\ge -\left[ \frac{1}{k}\sum _{j=1}^k\left( \log \frac{c}{u_j}+\frac{a}{b}\log \log \left( \frac{1}{u_j}\right) +O\left( \frac{\log \log (1/u_j)}{\log (1/u_j)}\right) \right) \right] ^{1/b}\\&=-\left[ \log c+S(\mathbf{u}) +o(1)+\frac{a}{bk}\sum _{j=1}^k \log \log \left( \frac{1}{u_j}\right) \right] ^{1/b}. \end{aligned}$$

(i) If \(a\le 0\), then \( X\ge -\bigl [\log c +S( \mathbf{u})+o(1)\bigr ]^{1/b}. \) Since \(X\rightarrow -\infty \), we evaluate F(X) using (3.9):

$$\begin{aligned} F(X)&=(c+O(|X|^{-\kappa })) |X|^a \exp (-|X|^b)\\&\ge (1+o(1)) S(\mathbf{u})^{a/b} e^{-S(\mathbf{u})}. \end{aligned}$$

(ii) If \(a>0\), then, using concavity of \(\log (\cdot )\), we obtain

$$\begin{aligned} X&\ge -\left[ \log c+S(\mathbf{u}) +o(1)+\frac{a}{b}\log \left( \frac{1}{k}\sum _{j=1}^k \log \frac{1}{u_j}\right) \right] ^{1/b}\\&=-\left[ \log c+S(\mathbf{u}) +o(1)+\frac{a}{b}\log S( \mathbf{u})\right] ^{1/b}. \end{aligned}$$

Therefore

$$\begin{aligned} F(X)=(c+O(|X|^{-\kappa })) |X|^a \exp (-|X|^b)\ge (1+o(1)) e^{-S( \mathbf{u})}. \end{aligned}$$

\(\square \)

Theorem 3.6

Let \(b>1\). For \(k\le k_n\), we have

$$\begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\bigr \}\le O\left( \left( \frac{8}{9}\right) ^{k/4} +\exp \left( -\frac{k\left( \log \frac{n}{k}\right) ^{\min \{0,a/b\}}}{2e}\right) \right) . \end{aligned}$$
(3.11)

Consequently, by (2.1),

$$\begin{aligned} P \bigl \{K_n=k+1\bigr \}\le O\left( n \left( \frac{8}{9}\right) ^{k/4}+n\exp \left( -\frac{k\left( \log \frac{n}{k}\right) ^{\min \{0,a/b\}}}{2e}\right) \right) . \end{aligned}$$

Proof

According to (2.3) and the definition of \(\phi (\mathbf{U})\), we have: given \(S>0\),

$$\begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\bigr \}&= E \bigl [(1-\phi (\mathbf{U}))^n\bigr ]\nonumber \\&= E \bigl [1_{\{S(\mathbf{U})>S\}}(1-\phi (\mathbf{U}))^n\bigr ] +E \bigl [1_{\{S(\mathbf{U})\le S\}}(1-\phi (\mathbf{U}))^n\bigr ].\nonumber \\ \end{aligned}$$
(3.12)

By Lemma 3.1, if \(k\le k_n\) then \(S( \mathbf{U})>S:=\log \frac{n}{\alpha k}\) with probability of order \((\alpha e/(1-\beta ))^{\beta k}= (8/9)^{k/4}\), if we select \(\alpha =\frac{2}{3e}\) and \(\beta =1/4\). If \(S(\mathbf{U})\le S\), then by Lemma 3.5 we have

$$\begin{aligned} \phi (\mathbf{U})&\ge (1+o(1)) S^{\min \{0,a/b\}}e^{-S}=(1+o(1))\frac{\alpha k}{n}\left( \log \frac{n}{\alpha k}\right) ^{\min \{0,a/b\}}\\&=(1+o(1))\frac{k}{(3/2) en}\left( \log \frac{n}{k}\right) ^{\min \{0,a/b\}}. \end{aligned}$$

In that case we obtain

$$\begin{aligned} \bigl (1-\phi (\mathbf{U})\bigr )^n \le \exp \left( -(1+o(1))\frac{k\left( \log \frac{n}{k}\right) ^{\min \{0,a/b\}}}{(3/2)e}\right) . \end{aligned}$$
(3.13)

Invoking (3.12) yields the inequality (3.11). \(\square \)

3.2.2 Case \(b<1\)

We consider \(k\in [k_1,k_n]\), \(k_1=\lceil \log ^{\sigma } n\rceil \), \(\sigma >1+1/b\). (The choice of \(\sigma \) will become clear shortly.) Our starting bound for \(|X|^b\) is based on the bottom line in (3.10): for \(u_k\le \delta _n\), we have

$$\begin{aligned} \begin{aligned} |X|^b&\le \left[ \frac{1}{k}\sum _{j=1}^k\left( \log \frac{1}{u_j}\right) ^{1/b}\right] ^b\\&\quad \times \left[ 1+\left( \frac{a}{b}+O(\log ^{-1}(1/u_k))\right) \frac{\log \log (1/u_{k})}{\log (1/u_{k})}\right] . \end{aligned} \end{aligned}$$
(3.14)

Recall that \(u_1,\dots ,u_k\) are generic values of the first k order statistics \(U_1,\dots , U_k\) for the \((n-1)\) [0, 1]-uniform, independent random variables. To find a usable upper bound for the RHS in (3.14), valid for almost all \(u_1,\dots ,u_k\le \delta _n\), we need to identify a sufficiently likely upper bound for that RHS when the k-tuple \((u_1,\dots ,u_k)\) is replaced with the order statistics \(U_1,\dots ,U_k\).

To this end, we pick \(j_n=\lceil \log ^{\sigma _1} n\rceil \), such that \(1<\sigma _1<\sigma -1/b\) (a choice made possible by the condition \(\sigma >1+1/b\)), and use \(b<1\) to bound

$$\begin{aligned} \begin{aligned}&\left[ \frac{1}{k}\sum _{j=1}^k\left( \log \frac{1}{U_j}\right) ^{1/b}\right] ^b\le T_1^b+T_2^b,\\&T_1:=\frac{1}{k}\sum _{j=1}^{j_n}\left( \log \frac{1}{U_j}\right) ^{1/b},\quad T_2:=\frac{1}{k}\sum _{j=j_n+1}^{k}\left( \log \frac{1}{U_j}\right) ^{1/b}. \end{aligned} \end{aligned}$$
(3.15)

Pick \(\sigma _2\in \bigl (1,\,(\sigma -\sigma _1)b\bigr )\) and let \(\sigma _3:=(\sigma -\sigma _1)b-\sigma _2>0\). Using

$$\begin{aligned} T_1&\le \frac{j_n}{k}\left( \log \frac{1}{U_1}\right) ^{1/b},\quad P \{U_1\le u\}=1-(1-u)^{n-1}\le nu, \end{aligned}$$

we obtain:

$$\begin{aligned} \begin{aligned}&P \Bigl \{T_1^b> \log ^{-\sigma _3} n\Bigr \}\le \exp \bigl (-\log ^{\sigma _2} n+\log n\bigr )\le \exp \bigl (-0.5\log ^{\sigma _2} n\bigr ).\\ \end{aligned} \end{aligned}$$
(3.16)

We obviously need \(\sigma _2>1\) to overcome the \(\log n\) term; besides, we need all our small probabilities to be really small, i.e. of order \(\exp \bigl (-(\log n)^{1+\Delta }\bigr )\). In contrast, \(\sigma _3>0\) is good enough for the proof. Existence of the desired \(\sigma _i\) is assured by the starting constraint \(\sigma >1+1/b\). Since \(\sigma _2>1\), we see that \(T_1^b\) is vanishingly small with super-polynomially high probability, i.e. quite surely.

Turn to \(T_2\). Our estimates need to be rather sharp since we expect \(T_2^b\) to be q.s. the dominant contribution to the RHS in (3.15). Here is a key observation. If \(w_j\) are independent negative exponentials, with the same parameter, say 1, then, denoting \(\mathbf{U}=(U_1<\dots < U_{n-1})\), we have

$$\begin{aligned} \mathbf{U}\overset{{\mathcal {D}}}{\equiv }\left\{ \frac{W_i}{W_{n-1}}\right\} _{i\in [n-1]},\quad W_i:=\sum _{j=1}^iw_j, \end{aligned}$$
(3.17)

([27], Section 5.4). In particular,

$$\begin{aligned} T_1\overset{{\mathcal {D}}}{\equiv }\frac{1}{k}\sum _{j=1}^{j_n}\left( \log \frac{W_{n-1}}{W_j}\right) ^{1/b},\quad T_2\overset{{\mathcal {D}}}{\equiv }\frac{1}{k}\sum _{j=j_n+1}^{k}\left( \log \frac{W_{n-1}}{W_j}\right) ^{1/b}. \end{aligned}$$

The relation (3.17) allows us simply to define \( \mathbf{U}=\bigl \{W_i/W_{n-1}\bigr \}\). In the case of \(T_2\), both \(W_{n-1}\) and \(W_j\) are sums of a large number of independent \(w_j\), and this opens the door to the Chernoff-type estimates.
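A small simulation (ours) illustrates the representation (3.17): the first few uniform order statistics and the normalized partial sums \(W_j/W_{n-1}\) have matching distributions, both with mean j/n.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1_000, 5_000
U = np.sort(rng.random((reps, n - 1)), axis=1)[:, :5]      # first 5 order statistics
W = np.cumsum(rng.exponential(size=(reps, n - 1)), axis=1)
R = W[:, :5] / W[:, -1:]                                   # W_j / W_{n-1}, j = 1..5
print(U.mean(axis=0))   # ≈ [1/n, 2/n, 3/n, 4/n, 5/n]
print(R.mean(axis=0))   # same means, as (3.17) predicts
```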

Here is a general, Chernoff-type, claim. Let \(X_1,\dots ,X_{\nu }\) be independent copies of a random variable X such that \(f(\lambda ):=E \bigl [\exp (\lambda X)\bigr ]\) exists and is twice continuously differentiable for \(\lambda \in (-\lambda _0,\lambda _0)\), for some \(\lambda _0>0\). Let \(S_{\mu }:=\sum _{j=1}^{\mu }X_j\). Then the distribution of \(S_{\mu }\) is exponentially concentrated around \(\mu f'(0)\). More precisely, there exist \(\Delta _0\in (0,1)\) and \(\rho >0\) such that,

$$\begin{aligned} P \bigl \{|S_{\mu }-\mu f'(0)|\ge \mu f'(0)\Delta \bigr \}\le 2e^{-\mu f'(0) \rho \Delta ^2},\quad \forall \,\Delta \in (0,\Delta _0). \end{aligned}$$
(3.18)

For our case \(X=w\) we have

$$\begin{aligned} E \bigl [e^{\lambda w}\bigr ]=f(\lambda ):=\frac{1}{1-\lambda },\quad |\lambda |<1, \end{aligned}$$

so that \(\lambda _0=1\), \(f'(0)=1\). So, by (3.18), for some \(\Delta _0\in (0,1)\),

$$\begin{aligned} P \bigl \{W_i\in [(1-\Delta )i,(1+\Delta )i]\bigr \}\ge 1 - 2\exp (-i\rho \Delta ^2),\quad \forall \,\Delta \in (0,\Delta _0). \end{aligned}$$
(3.19)

In particular, with exponentially high probability, \(W_{n-1}\in [n/2,2n]\).

Further, using the arithmetic-geometric mean inequality, we write

$$\begin{aligned}&T_2=\frac{1}{k}\sum _{j=j_n+1}^k\left( \log \frac{W_{n-1}}{W_j}\right) ^{1/b}\nonumber \\&\quad = \frac{1}{k}\sum _{j=j_n+1}^k\left( \log \frac{W_{n-1}/j}{\frac{1}{j}\sum _{i=1}^j w_i}\right) ^{1/b}\le \frac{1}{k}\sum _{j=j_n+1}^k\left( \log \frac{W_{n-1}/j}{\prod _{i=1}^j w_i^{1/j}}\right) ^{1/b}\nonumber \\&\quad =\frac{1}{k}\sum _{j=j_n+1}^k\left( \log \frac{W_{n-1}}{j}\right) ^{1/b}\left( 1-\frac{\log \prod _{i=1}^jw_i^{1/j}}{\log \frac{W_{n-1}}{j}}\right) ^{1/b}\nonumber \\&\qquad \text {using } 1 + x\le e^x\nonumber \\&\quad \le \frac{1}{k}\sum _{j=j_n+1}^k\left( \log \frac{W_{n-1}}{j}\right) ^{1/b}\cdot \, \exp \left( \frac{1}{b\log (W_{n-1}/j)}\cdot \frac{\sum _{i=1}^j\log \frac{1}{w_i}}{j}\right) . \end{aligned}$$
(3.20)

So this time we are dealing with the sums of logarithms of the independent exponentials \(w_i\). Observe that

$$\begin{aligned} \phi (\lambda ):=E \left[ \exp \left( \lambda \log \frac{1}{w}\right) \right] =\int _0^{\infty }e^{-z} z^{-\lambda }\,dz =\Gamma (1-\lambda ),\quad \lambda <1. \end{aligned}$$

So

$$\begin{aligned} \phi '(\lambda )=\left( E \left[ \exp \left( \lambda \log \frac{1}{w}\right) \right] \right) '_{\lambda }=E \left[ \left( \frac{1}{w}\right) ^{\lambda }\cdot \log \frac{1}{w}\right] = \left( \Gamma (1-\lambda )\right) ', \end{aligned}$$

hence \(\phi '(\lambda )\) exists, and is continuous, for \(\lambda <1\). Letting \(\lambda \rightarrow 0\), we obtain

$$\begin{aligned} \phi '(0)=E \left[ \log \frac{1}{w}\right] =-\left. \frac{d\Gamma (z)}{dz}\right| _{z=1}=\gamma , \end{aligned}$$

where \(\gamma \) is the Euler constant. Likewise \(\phi ''(\lambda )\) exists and is continuous, for \(\lambda <1\). Using (3.18), we have: for some \(\Delta _0\in (0,1)\) and \(\rho >0\),

$$\begin{aligned} P \left\{ {\frac{1}{j}}\sum _{i=1}^j\log \frac{1}{w_i}\ge {\gamma (1+\Delta )}\right\} \le 2\exp (-j\rho \Delta ^2),\quad \forall \,\Delta \in (0, \Delta _0). \end{aligned}$$
(3.21)
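Numerically (a sketch of ours): the sample mean of \(\log (1/w_i)\) concentrates around \(\gamma \approx 0.5772\), and the deviation probability in (3.21) is small already for moderate j.

```python
import numpy as np

rng = np.random.default_rng(5)
j, reps = 400, 20_000
means = np.log(1 / rng.exponential(size=(reps, j))).mean(axis=1)
print(means.mean(), np.euler_gamma)            # ≈ Euler's constant 0.5772...
print(np.mean(means >= np.euler_gamma * 1.2))  # P{(1/j)Σ log(1/w_i) ≥ γ(1+Δ)}, Δ=0.2
```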

Observe that, outside the event in (3.21), the exponential factor in (3.20) is \(1+O(\log ^{-1}n)\), uniformly for \(j\in [j_n+1,k_n]\). Since \(j>j_n=\lceil \log ^{\sigma _1} n\rceil \), and \(\sigma _1>1\), the probability of the union of the events in (3.21) over \(j\in [j_n+1,k]\) is of order \(\exp \bigl [-\Omega (\log ^{\sigma _1} n)\bigr ]\).

Therefore, in conjunction with (3.19) for \(i=n-1\), we have: with probability \(\ge 1-\exp \bigl (-\Omega (\log ^{\sigma _1} n)\bigr )\), the bound (3.20) implies

$$\begin{aligned} T_2\le & {} \frac{1+O(\log ^{-1}n)}{k}\sum _{j=j_n+1}^k\left( \log \frac{W_{n-1}}{j}\right) ^{1/b}\\\le & {} \frac{1+O(\log ^{-1}n)}{k}(k-j_n)\left( \log \frac{W_{n-1}}{j_n}\right) ^{1/b}\\= & {} \bigl (1+O(\log ^{-1}n)\bigr )\left( \log \frac{n}{j_n}\right) ^{1/b}, \end{aligned}$$

since \(j_n/k\le O\left( (\log n)^{\sigma _1-\sigma } \right) \ll (\log n)^{-1/b}\). Therefore

$$\begin{aligned} P \left\{ T_2^b> \bigl (1+O(\log ^{-1}n)\bigr )\cdot \log \frac{n}{j_n}\right\} \le \exp \bigl (-\Omega (\log ^{\sigma _1} n)\bigr ). \end{aligned}$$
(3.22)

Combining (3.15), (3.16) and (3.22), we obtain: with \(U_j=\frac{W_j}{W_{n-1}}\),

$$\begin{aligned} P \left\{ \left[ \frac{1}{k}\sum _{j=1}^k\left( \log \frac{1}{U_j}\right) ^{1/b}\right] ^b> \bigl (1+O(\log ^{-1}n)\bigr )\log \frac{n}{j_n}\right\} \le \exp \bigl (-\Omega (\log ^{\sigma _4} n)\bigr ),\quad \sigma _4:=\min \{\sigma _1,\sigma _2\}. \end{aligned}$$
(3.23)

Finally, using (3.19), we have \(1/U_{k}=W_{n-1}/W_k=\Omega (n/k)\ge \Omega (n^{1/2})\) with probability exceeding

$$\begin{aligned} 1-\exp (-\Omega (k))-\exp (-\Omega (n))\ge 1-\exp \bigl (-\Omega (\log ^{\sigma }n)\bigr ). \end{aligned}$$

So, combining the above estimate with (3.14), we arrive at

$$\begin{aligned}&P \left\{ |X|^b \le \left[ 1+\left( \frac{a}{b}+O[\log ^{-1}(n/k)]\right) \frac{\log \log (n/k)}{\log (n/k)}\right] \log \left( \frac{n}{\log ^{\sigma _1}n}\right) \right\} \nonumber \\&\qquad \quad \qquad \ge 1-\exp \bigl (-\Omega (\log ^{\sigma _5} n)\bigr ),\quad \sigma _5:=\min \{\sigma , \sigma _4\}. \end{aligned}$$
(3.24)

Recall that \(k=O(n^{1/2})\). So for \(a\le 0\), the event on the LHS of (3.24) is contained in the event

$$\begin{aligned} C_n':=\left\{ |X|^b\le \log \frac{n}{\log ^{\sigma _1}n} +O\left( \frac{\log \log n}{\log n}\right) \right\} . \end{aligned}$$
(3.25)

For \(a>0\), the containing event is

$$\begin{aligned} C_n'':=\left\{ |X|^b\le \log \frac{n}{(\log n)^{\sigma _1-\frac{2a}{b}+O((\log \log n)^{-1})}}\right\} , \end{aligned}$$
(3.26)

because

$$\begin{aligned}&(\log n) \cdot \left( \frac{a}{b}+O[\log ^{-1}(n/k)]\right) \frac{\log \log (n/k)}{\log (n/k)}\\&\quad \le \frac{2a}{b}(\log \log n) \left( 1+O\left( \frac{\log \log n}{\log n}\right) \right) , \end{aligned}$$

with factor 2 coming from \(k_n=\lceil \alpha n^{1/2}\rceil \), the largest value of k under consideration. Using the formula (3.9), i.e.

$$\begin{aligned} F(x)= (c+O(|x|^{-\kappa })) |x|^a \exp (-|x|^b),\quad x\rightarrow -\infty , \end{aligned}$$

we see that \(nF(X)= \Omega \Bigl ((\log n)^{\sigma _1+\frac{a}{b}}\Bigr )\) on \(C_n'\) (i.e. for \(a\le 0\)), and \(nF(X)=\Omega \Bigl ((\log n)^{\sigma _1-\frac{2a}{b}}\Bigr )\) on \(C_n''\) (i.e. for \(a>0\)). We want \(nF(X)\gg \log n\) on the respective event \(C_n\). To ensure this property, we need to have \(\sigma _1 >1+\frac{|a|}{b}\) if \(a\le 0\), and \(\sigma _1>\frac{2a}{b}+1\) if \(a>0\). Recall though that for the desirable \(\sigma _i\) to exist, it was necessary and sufficient to have \(\sigma > \sigma _1 + 1/b\). So let us introduce \(\sigma (a,b)\), the infimum of admissible \(\sigma \), i.e.

$$\begin{aligned} \sigma (a,b)=\left\{ \begin{aligned}&1+\frac{|a|+1}{b},&\text { if }a\le 0,\\&1+\frac{2a+1}{b},&\text { if }a>0.\end{aligned}\right. \end{aligned}$$
(3.27)

Given \(\sigma >\sigma (a,b)\), we can choose \(\sigma _1^*=\sigma _1^*(a,b)\) as the middle point of its respective admissible range:

$$\begin{aligned} \sigma _1^*=\left\{ \begin{aligned}&\frac{1+|a|/b +\sigma -1/b}{2}=1+\frac{|a|}{b}+\frac{\sigma -\sigma (a,b)}{2},&\text { if }a\le 0,\\&\frac{1+2a/b+\sigma -1/b}{2}=1+\frac{2a}{b}+\frac{\sigma -\sigma (a,b)}{2},&\text { if } a>0. \end{aligned}\right. \end{aligned}$$

Consequently, both \(\sigma _1^*+a/b\) for \(a\le 0\) and \(\sigma _1^*-2a/b\) for \(a>0\) equal \(1+0.5(\sigma -\sigma (a,b))\). By (3.16), \(\sigma _2\) can be chosen arbitrarily close to, but strictly less than

$$\begin{aligned} \sigma _2^*=b(\sigma -\sigma _1^*) =1+\frac{b(\sigma -\sigma (a,b))}{2}, \end{aligned}$$
(3.28)

allowing \(\sigma _3\) to be positive. It turns out that \(\sigma _2^*<\sigma _1^*\), and consequently \(\min \{\sigma _j: j\in [5]{\setminus } \{3\}\}\) can be made arbitrarily close from below to \(\sigma _2^*\). Sure enough, \(\sigma _2^*>1\) since \(\sigma >\sigma (a,b)\). We have proved

Lemma 3.7

Given a, and \(b\in (0,1)\), let \(\sigma (a,b)\) be defined by (3.27). For \(\sigma > \sigma (a,b)\), let \(k_1=\lceil \log ^{\sigma }n\rceil \), \(k_n=\lceil \alpha n^{1/2}\rceil \). Then

$$\begin{aligned} P \left\{ \forall \,k\in [k_1,k_n]: nF\left( \frac{1}{k}\sum _{j=1}^k F^{-1}(U_j)\right) \ge \log ^{1+c} n\right\} \ge 1-\exp (-\log ^{1+d} n) \end{aligned}$$

if \(0<c<\frac{\sigma -\sigma (a,b)}{2}\) and \(0<d<\frac{b(\sigma -\sigma (a,b))}{2}\).

The next claim follows directly from Lemma 3.7 and

$$\begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\bigr \}=E \bigl [\bigl (1-\phi ( \mathbf{U})\bigr )^n\bigr ],\quad \phi (\mathbf{U})=F\left( \frac{1}{k}\sum _{j=1}^k F^{-1}(U_j)\right) . \end{aligned}$$

Theorem 3.8

Let \(b<1\). For all \(k\in [k_1,k_n]\),

$$\begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\bigr \} \le \exp \bigl (-\log ^{1+d} n\bigr ),\quad \forall \, d<\frac{b\,(\sigma -\sigma (a,b))}{2}. \end{aligned}$$

Consequently, by (2.1),

$$\begin{aligned} P \bigl \{K_n=k+1\bigr \} \le \exp \bigl (-\log ^{1+d} n\bigr ),\quad \forall \, d<\frac{b\,(\sigma -\sigma (a,b))}{2}. \end{aligned}$$

3.2.3 Case \(Q=(M+M^T)/2\)

Consider the following random matrix model: first randomly generate an \(n\times n\) matrix M whose elements are i.i.d. random variables with the c.d.f. \(G(\cdot )\); then define \(Q=(M+M^T)/2\). Chen and Peng [12] studied the case when G is the standard normal distribution. Let us consider the more general case when for some a and positive b, c and \(\kappa \),

$$\begin{aligned} G(x)= (c+O(|x|^{-\kappa })) |x|^a \exp (-|x|^b),\quad x\rightarrow -\infty . \end{aligned}$$
(3.29)

We will assume that this asymptotic formula can be differentiated to yield an asymptotic formula for the density g(x). The diagonal entries of Q have distribution G, while the non-diagonal entries of Q have the distribution \(F(x)= (G\star G)(2x)\). Let us show that F satisfies a condition similar to (3.29).

Lemma 3.9

Suppose G satisfies the condition (3.29). Suppose that \(a>-1\) if \(b\le 1\). Then there exist \(c'>0\), \(\kappa '>0\) such that

$$\begin{aligned}&F(x)=(c'+O(|x|^{-\kappa '}))|x|^{a'}\exp \Bigl (-2^{\min (1,b)}|x|^b\Bigr ),\quad x\rightarrow -\infty ,\\&a'=\left\{ \begin{aligned}&2a+\frac{b}{2},&\text { if }\, b>1,\\&a+b-1,&\text { if }\,0<b<1,\\&2a+1,&\text { if }\,b=1. \end{aligned}\right. \end{aligned}$$

Note. Importantly, thanks to the factor \(2^{\min (1,b)}>1\), we have

$$\begin{aligned} \lim _{x\rightarrow -\infty }\frac{G(x)}{F(x)}=\infty . \end{aligned}$$
(3.30)

Chen and Peng [12] demonstrated that, for the diagonal entries and the non-diagonal entries having respectively the distributions G and F,

$$\begin{aligned} P \bigl \{{\bar{W}}_k\le kV_1\bigr \}=E \left[ \left( 1-G\left( \frac{1}{k}\sum _{j=1}^k F^{-1}(U_j)\right) \right) ^n\right] . \end{aligned}$$

So by (3.30),

$$\begin{aligned} \left( 1-G\left( \frac{1}{k}\sum _{j=1}^kF^{-1}(u_j)\right) \right) ^n \le \left( 1-F\left( \frac{1}{k}\sum _{j=1}^kF^{-1}(u_j)\right) \right) ^n \end{aligned}$$
(3.31)

for \(\frac{1}{k}\sum _{j=1}^k F^{-1}(u_j)< -{\mathcal {S}}\), if \({\mathcal {S}}>0\) is sufficiently large. Therefore the argument in the previous section will apply to this model once we show that F meets the condition (3.9).

Proof

We will write \(f(x)\sim g(x)\) if, for some \(\omega >0\), \(f(x)/g(x)= 1+O(|x|^{-\omega })\) as \(x\rightarrow -\infty \). Differentiating the asymptotic formula (3.29), we have

$$\begin{aligned} g(x)=(cb+O(|x|^{-\kappa })) |x|^{a+b-1}\exp (-|x|^b),\quad x\rightarrow -\infty . \end{aligned}$$

Now

$$\begin{aligned} \begin{aligned} F(x)=&\; P \{Q_{i,j}\le x\}= P \{M_{i,j}+M_{j,i}\le 2x\}\\ =&\;2 P \{M_{i,j}+M_{j,i}\le 2x,\,M_{i,j}\le x\}- P ^2\{M_{i,j}\le x\}. \end{aligned} \end{aligned}$$
(3.32)

Here, by (3.29),

$$\begin{aligned} P ^2\{M_{i,j}\le x\}=(c^2+O(|x|^{-{\kappa }})) |x|^{2a} \exp (-2|x|^b),\quad x\rightarrow -\infty . \end{aligned}$$
(3.33)

Consider \( P \{M_{i,j}+M_{j,i}\le 2x,\,M_{i,j}\le x\}\).

Case \(\mathbf{b}>\mathbf{1}\). Picking \(\lambda \in (1,2)\) such that \(\lambda ^b>2\), we have: for \(x<0\),

$$\begin{aligned}&P \{M_{i,j}+M_{j,i}\le 2x,\,M_{i,j}\le x\}=\int \limits _{-\infty }^xG(2x-u) g(u)\,du\nonumber \\&\quad =\int \limits _{-\infty }^{\lambda x}G(2x-u)g(u)\,du+\int \limits _{\lambda x}^xG(2x-u)g(u)\,du=:\int _1+\int _2.\nonumber \\ \end{aligned}$$
(3.34)

Here, by (3.29), for \(x\rightarrow -\infty \),

$$\begin{aligned} \int _1\le G(\lambda x)\le 2c |\lambda x|^a e^{-|\lambda x|^b}, \end{aligned}$$
(3.35)

and

$$\begin{aligned}&\int _2\sim c^2b \int \limits _{\lambda x}^x |2x-u|^a \cdot |u|^{a+b-1}\exp \bigl (-|2x-u|^b-|u|^b\bigr )\,du\nonumber \\&\quad \,\,(\psi _x(u):=|2x-u|^b + |u|^b\text { attains its minimum over }[2x,x]\text { at }u=x)\nonumber \\&\quad \sim c^2b\, |x|^{2a+b-1}\int \limits _{\lambda x}^x\exp \bigl (-|2x-u|^b-|u|^b\bigr )\,du\nonumber \\&\qquad \bigl (\text {Taylor-expanding }\psi _x(u)\text { at } u=x\bigr )\nonumber \\&\quad \sim c^2b\, |x|^{2a+b-1}\exp \bigl (-2|x|^b\bigr )\int \limits _{\lambda x}^{x}\exp \bigl (-b(b-1)|x|^{b-2}(x-u)^2\bigr )\,du\nonumber \\&\qquad \bigl (\text {extending the integration to }(-\infty ,x]\bigr )\nonumber \\&\quad \sim \frac{c^2b}{2}\,|x|^{2a+\frac{b}{2}} \exp \bigl (-2|x|^b\bigr )\sqrt{\frac{\pi }{b(b-1)}}\nonumber \\&\quad =\frac{c^2}{2}\sqrt{\frac{b\pi }{b-1}}\,|x|^{2a+\frac{b}{2}}\,\exp \bigl (-2|x|^b\bigr ). \end{aligned}$$
(3.36)

Combining the bounds (3.33), (3.35) and (3.36), and recalling that \(\lambda ^b >2\), we complete the proof.

Case \(\mathbf{b}\in (\mathbf{0},\mathbf{1}]\). This time we pick \(\lambda >2\) in (3.34). Substituting \(u=\xi x\) in the first line of (3.36), and using \(a>-1\), we have: for \(x\rightarrow -\infty \),

$$\begin{aligned} \frac{1}{c^2b}\int _2\sim |x|^{2a+b}\int \limits _{1}^{\lambda }|2-\xi |^a\,|\xi |^{a+b-1}\exp \Bigl (-|x|^b\bigl (|2-\xi |^b+|\xi |^b\bigr )\Bigr )\,d\xi .\qquad \end{aligned}$$
(3.37)

(It is the factor \(|2-\xi |^a\) that dictates the condition \(a>-1\).) For \(b<1\), the function \(|2-\xi |^b +|\xi |^b\) attains its minimum over \([1,\lambda ]\), which is \(2^b\), at \(\xi =2\). Further

$$\begin{aligned} |2-\xi |^b +|\xi |^b=2^b+|2-\xi |^b +O(|2-\xi |),\qquad \xi \rightarrow 2. \end{aligned}$$

Therefore

$$\begin{aligned}&\frac{1}{c^2b}\int _2\sim 2^{a+b-1}|x|^{2a+b}\int \limits _{1}^{\lambda }|2-\xi |^a\exp \Bigl (-|x|^b\bigl (2^b+|2-\xi |^b\bigr )\Bigr )\,d\xi \\&\quad =2^{a+b-1}|x|^{2a+b} e^{-|2x|^b}\int \limits _{1}^{\lambda }|2-\xi |^ae^{-|x(2-\xi )|^b}\,d\xi \\&\quad =2^{a+b-1}|x|^{2a+b} e^{-|2x|^b}\int _{2-\lambda }^{1}|\eta |^ae^{-|x\eta |^b}\,d\eta \\&\quad \sim 2^{a+b}|x|^{a+b-1}e^{-|2x|^b}\int _0^{\infty }z^a e^{-z^b}\,dz\\&\quad = \frac{2^{a+b}}{b} \Gamma \Bigl (\frac{a+1}{b}\Bigr ) |x|^{a+b-1} e^{-|2x|^b}. \end{aligned}$$

In combination with (3.35), the constraint \(\lambda >2\) and (3.33), this completes the proof for \(b<1\).

Consider \(b=1\). Starting with (3.37), similar work shows that, for \(a>-1\),

$$\begin{aligned} \frac{1}{c^2b}\int _2\sim |x|^{2a+b}e^{-2|x|}\int _1^2(2-\xi )^a\xi ^{a+b-1}\,d\xi . \end{aligned}$$

So, again by (3.35), \(\int _1/\int _2\) is of order, roughly, \(e^{-(\lambda -2)|x|}\), and by (3.33), \( P ^2\{M_{i,j}\le x\}/\int _2\) is of order \(|x|^{-b}\). \(\square \)

Lemma 3.9 immediately yields the counterparts of Theorems 3.6 and 3.8.

Theorem 3.10

Let \(b\le 1\) and \(a>-1\). For

$$\begin{aligned} a'=\left\{ \begin{aligned}&a+b-1,&\text { if }\,0<b<1,\\&2a+1,&\text { if }\,b=1,\end{aligned}\right. \quad \sigma>\sigma (a',b):=\left\{ \begin{aligned}&1+\frac{|a'|+1}{b},&\text { if }a'\le 0,\\&1+\frac{2a'+1}{b},&\text { if } a'>0,\end{aligned}\right. \end{aligned}$$

and all k between \(\lfloor \log ^{\sigma }n\rfloor \) and \(k_n=\lceil \alpha n^{1/2}\rceil \), (\(\alpha >e\sqrt{2}\)), we have

$$\begin{aligned} P \bigl \{K_n=k+1\bigr \} \le \exp \bigl (-\log ^{1+d'} n\bigr ),\quad \forall \, d'<\frac{b(\sigma -\sigma (a',b))}{2}. \end{aligned}$$

Theorem 3.11

Let \(b>1\). Then, for all \(k\le k_n\), we have

$$\begin{aligned} P \bigl \{K_n=k+1\bigr \}&\le O\left( n\left( \frac{8}{9}\right) ^{k/4} +n\exp \left( -\frac{k\left( \log \frac{n}{k}\right) ^{\min \{0,a'/b\}}}{2e}\right) \right) ,\\ a'&:=2a+\frac{b}{2}. \end{aligned}$$

Consequently \(K_n=O_p\bigl ((\log n)^{\max (1,1/2-2a/b)}\bigr )\). So for the quasi-normal case \(a=-1\), \(b=2\) we have \(K_n=O_p\bigl (\log ^{3/2} n\bigr )\).

Note The reader certainly noticed that the inequality (3.31) is weaker than the inequality (3.30). It may well be possible to lower the powers of \(\log n\) in the likely order of \(K_n\) by using (3.30) fully, but the additional technicalities look to be disproportionately high.

4 Conclusion

Our results, together with [11, 12], demonstrate that the optimal solutions of randomly generated StQPs are sparse under very general distribution assumptions. It would be interesting to extend our analysis to portfolio selection problems, in which the variance of a portfolio of assets with random returns is minimized [32] so as to diversify the investment risk. However, it has been observed empirically in the literature (see for instance [14, 19, 22]) that when the empirical covariance matrices are constructed from real data, this does not lead to the diversification one would have expected, i.e., the solutions are usually quite sparse. Our results and methodology may help provide a theoretical understanding of the sparsity observed in portfolio selection problems.

The sparsity of solutions holds beyond the randomly generated StQPs [13]. See also the references to \(L_1\) minimization research in the introduction. It would be important to identify a broader class of random quadratic optimization problems with many linear constraints that are likely to have solutions close to an extreme point of the attendant polyhedron.

It would also be interesting to explore how sparsity can be employed to facilitate the design of algorithms that are efficient on average. One possibility is to sift through all possible supports whose sizes are no more than the likely (poly-logarithmic) upper bound in our theorems. Our results indicate that, typically, the running time of even such a primitive algorithm is of order \(\exp (\log ^{\alpha } n)\), i.e. well below an exponential order. In this context, we refer the reader to Ye et al. [37] who develop a homotopy method for solving a sequence of quadratic programs with slightly varying problem parameters. Their computational experiments demonstrate adaptability of the method to solution sparsity.