
1 Introduction

1.1 Entropy Measures

Entropy, as a measure of the randomness contained in a probability distribution, is a fundamental concept in information theory and cryptography. There exist many entropy definitions and they are not equally good for all applications. While the most famous (and most liberal) notion, Shannon entropy [Sha48], which quantifies the encoding length, is extremely useful in information theory, more conservative measures, like min-entropy (which quantifies unpredictability) or collision entropy (which bounds the collision probability between samples), are necessary in cryptographic applications, like extracting randomness [NZ96, HILL99, RW05] or key derivation [DY13, BDK+11]. Any misunderstanding about which entropy notion is suitable is a serious problem, and not only of theoretical concern, because it leads to vulnerabilities caused by overestimating security. In fact, when entropy is overestimated, the security of real-world applications can be broken [DPR+13]. Standards [BK12, AIS11] recommend using more conservative entropy metrics in practical designs; on the other hand, Shannon entropy is easier to evaluate [AIS11] (in particular when the distribution of the randomness source is not exactly known), and moreover Shannon entropy estimators have already been relatively well studied and are used in practice [Mau92, Cor99, BL05, LPR11].

1.2 Motivations and Goals of this Work

The aim of this paper is to provide sharp separation results between Shannon entropy and Renyi entropy (focusing on collision entropy and min-entropy). Under certain conditions, for example when consecutive bits of a given random variable are independent (produced by a memoryless source), they are comparable [RW05, Hol11] (this observation is closely related to a result in information theory known as the Asymptotic Equipartition Property [Cac97]). Such a simplifying assumption is used to argue about provable security of true random number generators [BKMS09, VSH11, LPR11], and may be enforced in certain settings, for example when certifying devices in a laboratory [BL05]. But in general (especially from a theoretical viewpoint) neither min-entropy (of fundamental importance for general randomness extraction [RW05, Sha11]) nor collision entropy (useful for key derivation [DY13, BDK+11, Shi15], randomness extraction [HILL99], and random number generation [BKMS09, BST03]) can be well estimated by Shannon entropy. Still, in practice Shannon entropy remains an important tool for testing the cryptographic quality of randomness [AIS11]. In this paper we address the natural question

How bad is Shannon entropy as an estimate of cryptographic quality of randomness?

and answer it in a series of bounds, focusing on three important cryptographic applications, which require entropy estimation: (a) uniformity testing, (b) general randomness extraction and (c) key derivation.

1.3 Our Results and Techniques

Brief Summary. We investigate in detail the gap between Shannon entropy and Renyi entropy (focusing on smooth collision entropy and smooth min-entropy) in a given entropy source. We impose no restrictions on the source and obtain general and tight bounds, identifying the worst case. Our results are mostly negative, in the sense that the gap may be very big, so that even almost full Shannon entropy does not guarantee that the given distribution is close to uniform or that it may be used to derive a secure key. This agrees with folklore. However, to the best of our knowledge, our analysis provides for the first time a comprehensive and detailed study of this problem, establishing tight bounds. Moreover, our techniques may be of independent interest and can be extended to compare Renyi entropies of different orders.

Results and Corollaries. Bounding Renyi Entropy by Shannon Entropy. Being interested in establishing a bound on the amount of extractable entropy in terms of Shannon Entropy only, we ask the following question

Q: Suppose that the Shannon Entropy \(H_1(X)\) of an n-bit random variable X is at least k. What is the best lower bound on the collision entropy \(H_2(X)\)?

We give a complete answer to this question in Sect. 3.1. It is briefly summarized in Table 1 below.

Table 1. Minimal collision entropy given Shannon entropy constraints.

The Shape of the Worst-case Distribution. Interestingly, the description of the “worst” distribution X is pretty simple: it is a combination of a one-point heavy mass with a flat distribution outside. In fact, it has already been observed in the literature that such a shape provides good separations for Shannon entropy [Cac97]. However, as far as we know, our paper is the first to provide a full proof that this shape is really the best possible.

Infeasibility of Uniformity Tests Based on Entropy Estimators. If an n-bit random variable X satisfies \(H_1(X)=n\) then it must be uniform. It might be tempting to think that a very small entropy gap \(\varDelta =n-H_1(X)\) (when entropy is very “condensed”) implies closeness to the uniform distribution. Clearly, a small gap is a necessary condition. For example, standards for random number generation [AIS11] require the Shannon entropy of raw bits to be at least 0.997 per bit on average.

Q: Suppose that the Shannon Entropy \(H_1(X)\) of an n-bit random variable X is at least \(n-\varDelta \), where \(\varDelta \approx 0\). What is the best upper bound on the distance between X and the uniform distribution \(U_n\)?

There are popular statistical randomness tests [Mau92, Cor99] which are based on the fact that a very small \(\varDelta \) is necessary for a very small statistical distance. Theoretically, they can detect any deviation at any confidence level. In this paper we quantify what is well known in folklore, namely that this approach cannot be provably secure and efficient at the same time. Based on the results summarized in Table 1, we prove that for the statistical distance (the \(\ell _1\) distance) the gap \(\varDelta \) can be as small as \(\epsilon \) while the source is still \(\epsilon /n\)-far from the uniform distribution. Putting it the other way around, to guarantee \(\epsilon \)-closeness we need to estimate the entropy up to a tiny margin \(n\epsilon \). This shows that applying entropy estimators to test sequences of truly random bits may be problematic, because estimating entropy within such a small margin is computationally inefficient. Having said this, we stress that entropy estimators like Maurer’s Universal Test [Mau92] are powerful tools capable of discovering most defects which appear within a broader margin of error.

Large Gap Between Shannon and Smooth Collision Entropy. Many constructions in cryptography require min-entropy. However, the weaker notion of collision entropy has also found many applications, especially for problems where one deals with imperfect randomness. The collision entropy of a distribution X constitutes a lower bound on the number of extractable almost-uniform bits, according to the Leftover Hash Lemma [HILL99, RW05]. Moreover, recent improvements in key derivation [DY13, BDK+11] show that for some applications we can use high collision entropy to generate secure keys, wasting much less entropy compared to extractor-based techniques (see Sect. 2.5). For example, consider the one-time MAC with a 160-bit key over \(GF(2^{80})\), where the key is written as (a, b) and the tag for a message x is \(ax+b\). The security is \(\epsilon =2^{-80}\) when the key is uniform [DY13]. We also know that it is \(\epsilon = 2^{-70}\)-secure when the key has \(150=160-10\) bits of collision entropy. Suppose that a Shannon entropy estimator indicates 159 bits of entropy. Is our scheme secure? This discussion motivates the following question

Q: Suppose that the Shannon Entropy \(H_1(X)\) of a random variable \(X\in \{0,1\}^n\) is at least \(n-\varDelta \) where \(\varDelta \leqslant 1\). What is the best lower bound on \(H_2(X)\)? Does it help if we consider only \(H_2(X')\) where \(X'\) is close to X?

As a negative result, we demonstrate that the gap between Shannon entropy and Renyi entropy can be almost as big as the length of the entropy source output (that is, almost maximal). Moreover, smoothing the entropy, even under weak security requirements, does not help. For example, we construct a 256-bit string with more than 255 bits of Shannon entropy but only 19 bits of (smooth) Renyi entropy. This is just an illustrative example; we provide a more general analysis in Corollary 4 in Sect. 4.

Large Gap Between Shannon and Extractable Entropy. Min-entropy gives only a lower bound on extractable entropy. However, its smooth version can be used to establish an upper bound on the number of almost random bits, of the required quality, that can be extracted from a given source [RW05].

Q: Suppose that the Shannon Entropy \(H_1(X)\) of a random variable \(X\in \{0,1\}^n\) is at least \(n-\varDelta \) where \(\varDelta < 1\). How many bits that are close to uniform can be extracted from X?

Again, analogously to the previous result, we provide a separation between Shannon and extractable entropy, where the gap is almost as big as the length of the random variable. For example, we construct a 256-bit string with more than 255.5 bits of Shannon entropy but only 10 bits of extractable entropy, even if we allow the extracted bits to be of very weak quality, not really close to uniform! This is just an illustrative example; we provide a more precise and general statement. To our knowledge, the concrete tight bounds we provide are new, though a similar “extreme” numerical example can be found in [Cac97]. The separation is again a straightforward application of the ideas behind the proof of the results in Table 1.

Converting Shannon Entropy into Renyi Entropy. Even though the gaps in our separations are almost as big as the length of the source output, there might still be a small amount of Renyi entropy in every distribution of high Shannon entropy.

Q: Suppose that the Shannon Entropy of an n-bit random variable X is at least \(n-\varDelta \) where \(\varDelta \geqslant 1\). Does X have some non-trivial amount of collision entropy?

This question may be relevant in settings where one would like to check whether some (not necessarily large) collision entropy is present in the source. For example, there are necessary conditions for the security of message authentication codes in terms of collision entropy [Shi15]. We establish a simple and tight bound on this amount: it is about \(2\log _2 n -2\log _2\varDelta \). For example, in the concrete case of a 256-bit string of Shannon entropy 255 we find that the guaranteed amount of Renyi entropy is 15. We also establish an interesting rule of thumb: for much more than one bit of Renyi entropy in the output of a source, its Shannon entropy must be bigger than half of its length. Again, besides this numerical example we provide detailed and general bounds.

Techniques. To prove our main technical results, we use standard convex optimization techniques combined with some calculus, which allows us to deal with implicit equations. In particular, we demonstrate that the Lambert W function is useful in studying Shannon entropy constraints.

1.4 Organization of the Paper

We start with the necessary definitions and explanations of basic concepts in Sect. 2. Our main result is discussed in Sect. 3. Further applications are given in Sect. 4. We end with conclusions in Sect. 5. The proofs of the main results, which are somewhat technical, appear in the appendices.

2 Preliminaries

2.1 Basic Notions

By \(U_S\) we denote the uniform distribution over a set S, and \(U_n\) is a shortcut for the uniform n-bit distribution. The probability mass function of a random variable X is denoted by \(P_X\).

2.2 Quantifying Closeness of Distributions

The closeness of two distributions X, Y over the same domain \(\varOmega \) is most commonly measured by the so-called statistical or variational distance \(\mathrm {SD}(X;Y)\). It is defined as half of the \(\ell _1\)-distance between the probability mass functions: \(\mathrm {SD}(X;Y) = \frac{1}{2}\mathrm {d}_1 (P_X;P_Y) = \frac{1}{2}\sum _{x}\left| \Pr [X=x]-\Pr [Y=x]\right| \). In this paper we also use the \(\ell _2\)-distance between probability distributions, defined as \( \mathrm {d}_2(P_X;P_Y) = \sqrt{\sum _{x}\left( \Pr [X=x]-\Pr [Y=x]\right) ^2} \). These two \(\ell _p\) distances are related by \(d_2(\cdot ) < d_{1}(\cdot ) \leqslant \sqrt{ |\varOmega | }\cdot d_2(\cdot ) \). In information theory the closeness of two distributions is often measured using so-called divergences. The Kullback-Leibler divergence between X and Y is defined as \(\mathrm {KL}(X\parallel Y) = \sum _{x} P_X(x)\log \frac{P_X(x)}{P_Y(x)}\), and the Renyi divergence of order 2 equals \(D_2\left( X\parallel Y\right) = \log _2\sum _{x} \frac{P_X(x)^2}{P_Y(x)}\). We have \(D_{2}\left( X\parallel U_S \right) =H_2(U_S)-H_2(X) =\log _2\left( |S|\mathrm {CP}(X)\right) \).

For convenience we also define the collision probability of X as the probability that two independent copies of X collide: \(\mathrm {CP}(X) = \sum _{x}\Pr [X=x]^2\).
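To make these quantities concrete, the following minimal sketch (in Python, on a toy four-point distribution of our own choosing) evaluates the statistical distance, the \(\ell _2\)-distance, the collision probability, and checks the identities \(\mathrm {d}_2(X;U_S)^2=\mathrm {CP}(X)-1/|S|\) and \(D_2(X\parallel U_S)=\log _2(|S|\mathrm {CP}(X))=H_2(U_S)-H_2(X)\).

```python
import math

# toy distribution X on a 4-element set and the uniform distribution U
px = [0.4, 0.3, 0.2, 0.1]
pu = [0.25] * 4

sd = 0.5 * sum(abs(a - b) for a, b in zip(px, pu))         # statistical distance SD(X;U)
d2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(px, pu)))  # l2-distance d2(X;U)
cp = sum(a * a for a in px)                                # collision probability CP(X)
div2 = math.log2(len(px) * cp)                             # Renyi divergence D2(X||U)

print(round(sd, 4), round(d2, 4), round(cp, 4), round(div2, 4))  # 0.2 0.2236 0.3 0.263
print(math.isclose(d2 ** 2, cp - 1 / len(px)))             # d2(X;U)^2 = CP(X) - 1/|S|
print(math.isclose(div2, 2 - (-math.log2(cp))))            # D2(X||U) = H2(U) - H2(X)
```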

2.3 Entropy Definitions

Below we define the three key entropy measures already mentioned in the introduction. It is worth noting that they are all special cases of a larger parametrized family of Renyi entropies. However, the common convention in cryptography, where only these three matter, is to slightly abuse the terminology and refer to collision entropy simply as Renyi entropy, keeping the names Shannon entropy and min-entropy for the other two.

Definition 1

(Entropy Notions). The Shannon Entropy \(\mathrm {H}(X)=H_1(X)\), the collision entropy (or Renyi entropy) \(H_2(X)\), and the Min-Entropy \(\mathrm {H}_{\infty }(X)\) of a distribution X are defined as follows

$$\begin{aligned} \mathrm {H}(X )&= -\sum _{x}\Pr [X=x]\log \Pr [X=x] \end{aligned}$$
(1)
$$\begin{aligned} H_2(X)&= -\log \left( \sum _{x}\Pr [X=x]^2\right) \end{aligned}$$
(2)
$$\begin{aligned} \mathrm {H}_{\infty }(X)&= -\log \max _{x}\Pr [X=x]. \end{aligned}$$
(3)

Remark 1

(Comparing Different Entropies). It is easy to see that we have

$$\begin{aligned} \mathrm {H}(X ) \geqslant H_2(X ) \geqslant \mathrm {H}_{\infty }(X ), \end{aligned}$$

with equality if and only if X is uniform.
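A minimal sketch (Python, on an arbitrary toy distribution) that evaluates the three notions of Definition 1 and illustrates the ordering of Remark 1:

```python
import math

px = [0.4, 0.3, 0.2, 0.1]                              # a toy distribution on four points
shannon = -sum(p * math.log2(p) for p in px)           # H(X)
collision = -math.log2(sum(p * p for p in px))         # H_2(X)
min_entropy = -math.log2(max(px))                      # H_inf(X)

# prints roughly 1.846 >= 1.737 >= 1.322; all three equal 2.0 for the uniform distribution
print(round(shannon, 3), round(collision, 3), round(min_entropy, 3))
```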

2.4 Entropy Smoothing

The Concept. Entropy smoothing is the very useful technique of replacing a distribution by another one which is very close in statistical distance (which preserves its most important properties, like the amount of extractable entropy) but more convenient for the application at hand (e.g. a better structure, removed singularities, more entropy).

Applications of Smooth Entropy. The smoothing technique is typically used to increase entropy by cutting off big but rare “peaks” in a probability distribution, that is, point masses that are relatively heavy compared to the others. Probably the most famous example is the so-called Asymptotic Equipartition Property (AEP). Imagine a sequence X of n independent Bernoulli trials, where 1 appears with probability \(p > 1/2\). Among all n-bit sequences the most likely ones are those with 1 in almost all places. In particular \(\mathrm {H}_{\infty }({X}) = -n\log p\). However, for most of the sequences the number of 1’s oscillates around pn (these are the so-called typical sequences). By Hoeffding’s concentration inequality, the number of 1’s is at most \(pn + h\) with probability at least \(1-\exp (-2h^2/n)\). For large n and suitably chosen h, the distribution of X approaches a distribution \({X}'\) of min-entropy \(\mathrm {H}_{\infty }({X}') \approx -n( p\log p + (1-p)\log (1-p) )\approx \mathrm {H}({X})\) (the relative error here is of order \(O(n^{-1/2})\)), much larger than the min-entropy of the original distribution! A quantitative version of this fact was used in the famous construction of a pseudorandom generator from any one-way function [HILL88]. Renner and Wolf [RW04] revisited the smoothing technique in the entropy framework and came up with new applications.
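The following back-of-envelope sketch (Python; the parameters n, p and the cut-off level are our own illustrative choices) puts numbers on this discussion: after removing a tail of probability mass \(\epsilon \), the min-entropy jumps from \(-n\log _2 p\) to almost \(n\mathrm {H}(p)\).

```python
import math

n, p, eps = 10_000, 0.75, 2.0 ** -40                 # i.i.d. bits with Pr[1] = p

h = math.sqrt(n * math.log(1 / eps) / 2)             # Hoeffding: Pr[#ones > p*n + h] <= eps
shannon = n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))
min_entropy = -n * math.log2(p)                      # the all-ones string is the heaviest one

# after cutting off the eps-tail, the heaviest remaining string has about p*n + h ones,
# so the smoothed distribution has min-entropy at least (up to renormalization):
smooth_min_entropy = -(p * n + h) * math.log2(p) - ((1 - p) * n - h) * math.log2(1 - p)

# roughly 4150 vs 7523 vs 8113 bits: smoothing recovers most of the Shannon entropy
print(round(min_entropy), round(smooth_min_entropy), round(shannon))
```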

Definition 2

(Smooth Entropy, [RW04]). Suppose that \(\alpha \in \{1,2,\infty \}\). We say that the \(\epsilon \)-smooth entropy of order \(\alpha \) of X is at least k if there exists a random variable \(X'\) such that \(\mathrm {SD}(X;X')\leqslant \epsilon \) and \(\mathrm {H}_{\alpha }(X')\geqslant k\).

For brevity, we also speak of smooth Shannon entropy, smooth Renyi entropy or smooth min-entropy. We also define the extractable entropy of X as follows.

Definition 3

(Extractable Entropy, [RW05]). The \(\epsilon \)-extractable entropy of X is defined to be

$$\begin{aligned} \mathrm {H}_{\mathrm {ext}}^{\epsilon }(X)&=\max _{\mathcal {U}:\ \exists f \in \varGamma ^{\epsilon }(X\rightarrow \mathcal {U})}\log |\mathcal {U}| \end{aligned}$$
(4)

where \(\varGamma ^{\epsilon }(X\rightarrow \mathcal {U})\) consists of all functions f such that \(\mathrm {SD}(f(X,R),R;\ U_{\mathcal {U}},R)\leqslant \epsilon \), where R is uniform and independent of X and \(U_{\mathcal {U}}\).

2.5 Randomness Extraction and Key Derivation

Roughly speaking, an extractor is a randomized function which produces an almost uniform string from a longer string that is not of full entropy. The randomization is necessary if one wants an extractor that works for all high-entropy sources; the role of this auxiliary randomness is similar to the role of catalysts in chemistry.

Definition 4

(Strong Extractors [NZ96]). A strong \((k,\epsilon )\)-extractor is a function \(\mathrm {Ext}: \{0,1\}^{n}\times \{0,1\}^{d} \rightarrow \{0,1\}^{k}\) such that for every distribution X over \(\{0,1\}^{n}\) with \(\mathrm {H}_{\infty }(X)\geqslant k\)

$$\begin{aligned} \mathrm {SD}( \mathrm {Ext}(X,U_d), U_d; U_{k+d}) \leqslant \epsilon . \end{aligned}$$
(5)

A very simple, efficient and optimal (with respect to the necessary entropy loss) extractor is based on universal hash functions. Recall that a class \(\mathcal {H}\) of functions from n bits to m bits is universal [CW79] if for any distinct x, y there are exactly \(|\mathcal {H}|/2^m\) functions \(h\in \mathcal {H}\) such that \(h(x)=h(y)\).
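A standard example of a universal class (our illustration, not necessarily the family used in the cited constructions) is the set of all GF(2)-linear maps given by random \(m\times n\) binary matrices; the sketch below samples such maps and empirically checks the defining property \(\Pr _{h}[h(x)=h(y)]=2^{-m}\) for a fixed pair \(x\ne y\).

```python
import random

def random_linear_hash(n, m, rng):
    """Sample a random m x n binary matrix A; the map h_A(x) = A*x over GF(2).
    For any fixed x != y, exactly a 2^-m fraction of matrices satisfy h_A(x) = h_A(y)."""
    rows = [rng.getrandbits(n) for _ in range(m)]
    def h(x):
        out = 0
        for i, row in enumerate(rows):
            out |= (bin(row & x).count("1") & 1) << i   # parity of <row, x> over GF(2)
        return out
    return h

n, m, trials = 16, 4, 20_000
x, y = 0b1010101010101010, 0b0101010101010101
collisions = 0
for t in range(trials):
    h = random_linear_hash(n, m, random.Random(t))
    if h(x) == h(y):
        collisions += 1
print(collisions / trials, "vs", 2 ** -m)   # both close to 0.0625
```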

Lemma 1

(Leftover Hash Lemma). Let \(\mathcal {H}\) be a universal class of functions from n bits to m bits, let H be chosen from \(\mathcal {H}\) at random and let X be an n-bit random variable. If \(H_2(X) \geqslant k\), then \(\mathrm {SD}(H(X),H; U_m,H) \leqslant \frac{1}{2}\cdot 2^{\frac{m-k}{2}}\).

By Lemma 1 and the properties of the statistical distance we obtain

Corollary 1

(Bound on Extractable Entropy, [RW05]). We have \(H^{\epsilon }_{\infty }(X) \geqslant {H}^{\epsilon }_{\mathrm {ext}}(X) \geqslant {H}^{\epsilon /2}_2(X)-2\log (1/\epsilon )-1\).

Note that to extract k bits \(\epsilon \)-close to uniform we need to invest \(k+2\log (1/\epsilon )\) bits of (collision) entropy; the loss of \(2\log (1/\epsilon )\) bits is optimal [RTS00]. While there are many other extractors, the Leftover Hash Lemma is particularly often used in TRNG designs [BST03, BKMS09, VSH11] because it is simple, efficient, and provably secure. Extractors based on the LHL are also very important in key derivation problems [BDK+11]. Note that the LHL uses only collision entropy, which is weaker than min-entropy.

To get an illustrative example, note that deriving a key which is \(\epsilon \)-close to uniform with \(\epsilon =2^{-80}\) requires losing \(L = 2\log (1/\epsilon ) = 160\) bits of entropy. Sometimes we cannot afford to lose that much. In special cases, in particular for the so-called square-friendly applications [BDK+11, DY13], we can get an improvement over Corollary 1. In particular, for these applications (which include message authentication codes and digital signatures) we can use a source X of collision entropy \(k<m\), still achieving some non-trivial security.

Theorem 1

(Beating the \(2\log (1/\epsilon )\) Entropy Loss for Some Applications [BDK+11]). Let P be an \(\epsilon \)-secure and \(\sigma \)-square-secure application (when keyed with \(U_m\)). Let \(\mathcal {H}\) be a universal class of functions from n bits to m bits, and let H be chosen from \(\mathcal {H}\) at random. Then for any X of length n and collision entropy k, the application P keyed with H(X) given H is \(\epsilon '\)-secure, where \(\epsilon ' \leqslant \epsilon +\sqrt{\sigma }\cdot 2^{\frac{m-k}{2}}\).

In particular, when \(\sigma = \epsilon \) we get around the RT-bound, achieving \(\epsilon '\approx 2\epsilon \) with only \(k=m+\log (1/\epsilon )\). This way we save \(\log (1/\epsilon )\approx 80\) bits.
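The arithmetic behind this comparison can be checked directly; the following minimal sketch (Python, with the illustrative parameters \(m=160\) and \(\epsilon =2^{-80}\) from the text) works out the collision entropy required by the Leftover Hash Lemma versus the square-friendly route of Theorem 1.

```python
import math

m = 160                       # key length
eps = 2.0 ** -80              # target security / closeness

# Leftover Hash Lemma (Lemma 1): need 0.5 * 2**((m - k)/2) <= eps, i.e.
k_lhl = m + 2 * math.log2(1 / eps) - 2        # = 318 bits (the text drops the small "-2")

# Square-friendly route (Theorem 1) with sigma = eps and k = m + log2(1/eps):
k_sq = m + math.log2(1 / eps)                 # = 240 bits
eps_prime = eps + math.sqrt(eps) * 2 ** ((m - k_sq) / 2)   # = 2*eps

print(k_lhl, k_sq, eps_prime / eps)           # 318.0 240.0 2.0
```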

3 Main Result

In this section we calculate the minimal collision entropy in a distribution having a certain amount of Shannon entropy. First, by means of convex optimization, we show in Sect. 3.1 that the uniform distribution with one extra heavy mass is the “worst” shape. Next, using some facts about the Lambert W function, in Sect. 3.2 we solve the corresponding implicit equation and derive a closed-form answer.

3.1 Maximizing Collisions Given Shannon Entropy

Below we answer the posed question on the best bound on \(H_2\) in terms of \(H_1\). The “worst case” distribution, which minimizes the gap, is pretty simple: it is a combination of a one-point mass at some point and a uniform distribution outside.

Theorem 2

Let X be a random variable with values in a d-element set. If \(\mathrm {H}(X) = k\), then

$$\begin{aligned} H_2(X) \geqslant -\log _2\left( b^2 + \frac{(1-b)^2}{d-1} \right) \end{aligned}$$
(6)

where b is the unique solution to

$$\begin{aligned} \mathrm {H}(b) + (1-b)\log _2(d-1) = k \end{aligned}$$
(7)

under the restriction \(b \geqslant \frac{1}{d}\) (here \(\mathrm{H}(b)\) denotes the entropy of a bit equal to 1 with probability b). The bound in Eq. (6) is best possible.
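Theorem 2 is easy to evaluate numerically; the sketch below (Python; the function name, the bisection and the example parameters are our own choices) solves Eq. (7) for b and plugs it into the bound (6).

```python
import math

def h2_lower_bound(d, k, iters=200):
    """Evaluate Theorem 2: solve Eq. (7), H(b) + (1-b)*log2(d-1) = k, for b in [1/d, 1]
    by bisection (the left-hand side is decreasing there), then plug b into Eq. (6)."""
    H = lambda b: 0.0 if b in (0.0, 1.0) else -b * math.log2(b) - (1 - b) * math.log2(1 - b)
    f = lambda b: H(b) + (1 - b) * math.log2(d - 1) - k   # floats treat d-1 as d for huge d
    lo, hi = 1.0 / d, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid          # entropy still above k, the solution lies to the right
        else:
            hi = mid
    b = (lo + hi) / 2
    return -math.log2(b * b + (1 - b) ** 2 / (d - 1)), b

# a 256-bit source with one bit of Shannon-entropy deficiency (k = 255):
bound, b = h2_lower_bound(2.0 ** 256, 255)
print(round(b, 4), round(bound, 2))   # roughly b = 0.0041 and H_2(X) >= 15.9
```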

Remark 2

(The Implicit Equation in Theorem 2). The number b is defined implicitly in terms of d and k. In Sect. 3.2 we show how to accurately approximate the solution of this equation.

The proof of Theorem 2 appears in Appendix A. The main idea is to write down the posed question as a constrained optimization problem and apply standard Lagrange multiplier techniques.

3.2 Closed-Form Bounds for Solutions

Below we present a tight formula approximating the solution to Eq. (7). We will substitute it into Eq. (6) in order to obtain a closed-form expression.

Lemma 2

(The solution for Moderate Gaps). Let b be the solution to Eq. (7) and let \(\varDelta = \log _2 d-k\) be the entropy gap. Suppose \(d \varDelta \geqslant 13\). Then we have

$$\begin{aligned} \frac{0.84\varDelta }{\log _2(d\varDelta ) - 1.52} \leqslant b \leqslant \frac{1.37\varDelta }{\log _2(d\varDelta ) - 1.98} \end{aligned}$$
(8)

The proof is deferred to Appendix B. The main idea is to solve Eq. (7) approximately using the so-called Lambert W function, which matches Shannon-like expressions of the form \(y\log y\). Here we discuss the lemma and its applications.

Remark 3

(Establishing Tighter Constants). The inspection of the proof shows that the numerical constants in Lemma 2 can be made sharper, if needed. Under the mild assumption that \(\varDelta ^{-1}=2^{o(\log _2 d)}\) one can get

$$\begin{aligned} b = \frac{(1+o(1))\varDelta }{\log _2(d\varDelta ) - \log _2\mathrm {e}-\log _2\log _2\mathrm {e} + o(1)} \end{aligned}$$
(9)

The gap between 1.52 and 1.98 is self-improving, in the sense that knowing in advance a better upper bound on b one can make it closer to 0. In turn, the gap between 0.84 and 1.37 can be made closer to 0 by choosing in the proof a more accurate approximation for the Lambert W function.
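To illustrate how the Lambert W function enters, here is a short sketch (Python, assuming SciPy is available; the reduction of Eq. (7) to \(b\log _2(bd/\mathrm {e})=\varDelta \) is our own first-order simplification, dropping lower-order terms) that compares the resulting closed-form approximation with the bracket of Lemma 2.

```python
import math
from scipy.special import lambertw   # principal branch W_0

def b_approx(d, delta):
    """First-order approximation of the solution of Eq. (7): for small b and large d
    the equation reduces to b*log2(b*d/e) = delta, solved by b = delta*ln2 / W(delta*d*ln2/e)."""
    z = delta * d * math.log(2) / math.e
    return delta * math.log(2) / lambertw(z).real

def lemma2_bracket(d, delta):
    x = math.log2(d * delta)
    return 0.84 * delta / (x - 1.52), 1.37 * delta / (x - 1.98)

for d, delta in [(2.0 ** 256, 1.0), (2.0 ** 20, 0.5), (1024.0, 1.0)]:
    lo, hi = lemma2_bracket(d, delta)
    print(f"d=2^{math.log2(d):.0f}, delta={delta}: {lo:.4f} <= {b_approx(d, delta):.4f} <= {hi:.4f}")
```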

Now we are ready to compute the minimal collision entropy given the Shannon entropy.

Corollary 2

(Minimal Collision Entropy, General Case). Let \(X^{*}\) minimize \(H_2(X)\) subject to \(\mathrm {H}(X) \geqslant \log _2 d-\varDelta \), where X takes its values in a given d-element set. If \(d\varDelta \geqslant 13\) then

$$\begin{aligned} \frac{0.55\varDelta }{\log _2(d\varDelta )}\leqslant \mathrm {d}_2(X^{*};U) \leqslant \frac{3.24\varDelta }{\log _2(d\varDelta )}, \end{aligned}$$
(10)

where U is uniform over the domain of X. If \(d\varDelta < 13\) then

$$\begin{aligned} \mathrm {d}_2(X^{*};U) < 0.88\varDelta . \end{aligned}$$
(11)

The collision entropy is obtained as \(H_2(X^{*})=-\log _2\left( \frac{1}{d}+\mathrm {d}_2(X^{*};U)^2\right) \).

Proof

(Proof of Corollary 2 ). We will consider two cases.

Case \(d\varDelta \geqslant 13\). By Lemma 2 we get

$$\begin{aligned} \frac{0.84\varDelta }{\log _2(d\varDelta )} \leqslant b\leqslant \frac{2.95\varDelta }{\log _2(d\varDelta )} \end{aligned}$$
(12)

By the last inequality and the fact that \(x\rightarrow \frac{x}{\log _2 x}\) is increasing for \(x\geqslant \mathrm {e}\) we get

$$\begin{aligned} bd\geqslant \frac{0.84d\varDelta }{\log _2(d\varDelta )} \geqslant 2.95 \end{aligned}$$

Let \(b_0 = \frac{1}{d}\). By the last inequality we get \(b-b_0 \geqslant 0.66 b\). Since

$$\begin{aligned} b^2 + \frac{(1-b)^2}{d-1} =b_0 +\frac{d}{d-1}\cdot (b-b_0)^2, \end{aligned}$$

by the identity \(\mathrm {d}_2(X;U)^2 = \sum _{x}\Pr [X=x]^2-\frac{1}{d}\) and the definition of collision entropy we get

$$\begin{aligned} \mathrm {d}_2(X^{*},U)^2 = \mathrm {CP}(X^{*}) -b_0 = \frac{d}{d-1}\cdot (b-b_0)^2. \end{aligned}$$

Note that \(d\varDelta \geqslant 13\) implies \(d\log _2 d\geqslant 13\) (because \(\varDelta \leqslant \log _2 d\)) and hence \(d > 5\). By this inequality and \(b-b_0 \geqslant 0.66 b\) we finally obtain

$$\begin{aligned} 0.43 b^2 \leqslant \mathrm {d}_2(X^{*};U)^2 \leqslant 1.2 b^2 \end{aligned}$$
(13)

and the result for the case \(d\varDelta \geqslant 13\) follows by combining Eqs. (12) and (13).

Case \(d\varDelta < 13\). We use a trick to “embed” our problem into a higher dimension. If \(\varvec{p}\in \mathbb {R}^{d}\) is the distribution of X, define \(\varvec{p}'\in \mathbb {R}^{d+1}\) by \(\varvec{p}'_i = (1-\gamma )\varvec{p}_i\) for \(i\leqslant d\) and \(\varvec{p}'_{d+1} = \gamma \). It is easy to check that \(H_1(\varvec{p}') =-(1-\gamma )\log _2(1-\gamma )-\gamma \log _2 \gamma + (1-\gamma )H_1(\varvec{p})\). Setting \(\gamma = \frac{1}{1+2^{H_1(\varvec{p})}}\) we get

$$\begin{aligned} H_1(\varvec{p}') -H_1(\varvec{p})&=-(1-\gamma )\log _2(1-\gamma )-\gamma \log _2 \gamma -\gamma H_1(\varvec{p}) \\&=-(1-\gamma )\log _2(1-\gamma )-\gamma \log _2 \left( 2^{H_1(\varvec{p})}\gamma \right) \\&= \log _2 \frac{2^{H_1(\varvec{p})}+1}{2^{H_1(\varvec{p})}} \\&\geqslant \log _2\frac{d+1}{d} \\&\geqslant (1-b)\log _2\frac{d}{d-1} \end{aligned}$$

where the first inequality follows from \(H_1(\varvec{p}) \leqslant \log _2 d\), and the second inequality follows because \(b\geqslant \frac{1}{d}\) implies that it suffices to prove \( \log _2\frac{d+1}{d} \geqslant \left( 1-\frac{1}{d}\right) \log _2\frac{d}{d-1}\), or equivalently that \(d\log _2\frac{d+1}{d} \geqslant (d-1)\log _2\frac{d}{d-1}\); this is true because the map \(u\rightarrow u\log _2(1+u^{-1}) \) is increasing for \(u > 0\) (we skip the easy argument, which simply checks the derivative). Since \(H_1(\varvec{p}') -H_1(\varvec{p}) = 0\) for \(\gamma = 0\) and since \(H_1(\varvec{p}') -H_1(\varvec{p}) \geqslant (1-b)\log _2\frac{d}{d-1} > (1-b)\log _2\frac{d+1}{d}\) for \(\gamma =\frac{1}{1+2^{H_1(\varvec{p})}}\), by continuity we conclude that there exists \(\gamma =\gamma _b\), between 0 and \(\frac{1}{1+2^{H_1(\varvec{p})}}\), such that \(\varvec{p}'\) satisfies

$$\begin{aligned} (1-b)\log _2\frac{d+1}{d} = H_1(\varvec{p}') -H_1(\varvec{p}). \end{aligned}$$

Adding this to Eq. (7) side by side, we conclude that b also solves (7) with the dimension d replaced by \(d'=d+1\) and the constraint k replaced by \(k' = H_1(\varvec{p}')\). By \(H_1(\varvec{p}') -H_1(\varvec{p}) \geqslant \log _2\frac{d+1}{d}\) we conclude that \(\varDelta '= \log _2 (d+1)-H_1(\varvec{p}') \leqslant \log _2 d-H_1(\varvec{p})=\varDelta \), so the entropy gap is even smaller. After a finite number of steps we end up with \(\varDelta ' \leqslant \varDelta \), the same b and \(d'\varDelta ' \geqslant 13\). Then by the first case we get that the squared distance is at most \(O(\varDelta '^2) = O(\varDelta ^2)\).

4 Applications

4.1 Negative Results

The first result we provide is motivated by uniformity testing based on Shannon entropy. One might hope that an n-bit distribution with entropy \(n-\varDelta \), where \(\varDelta \approx 0\), that is with an extremely small entropy deficiency, is close to uniform. We show that for this to be true, \(\varDelta \) has to be negligible.

Corollary 3

(Shannon Entropy Estimators are Inefficient as Uniformity Tests). Suppose that \(n \gg 1\) and \(\epsilon > 2^{-0.9n}\). Then there exists a distribution \(X\in \{0,1\}^n\) such that \(H_1(X) \geqslant n-\epsilon \) but \(\mathrm {SD}(X;U_n) = \varOmega (\epsilon /n)\).

Remark 4

Note that typically one estimates Shannon entropy within an additive error O(1). However here, to prove that the distribution is \(\epsilon \)-close to uniform, one has to estimate the entropy with an error \(O(n\epsilon )\), which is a much tighter requirement! The best known bounds on the running time for an additive error \(O(\epsilon )\) are polynomial in \(1/\epsilon \) [AOST14, Hol06]. With \(\epsilon \) secure (meaning small) enough for cryptographic purposes, such precision is simply not achievable within reasonable time.

Proof

(Proof of Corollary 3). Take \(d=2^{n}\) in Corollary 2 and \(\varDelta = \epsilon \). Suppose that \(\varDelta = \varOmega (2^{-0.9 n})\). We have \(\mathrm {d}_2(X;U_n) = \varTheta (\varDelta n^{-1})\). On the other hand we have the trivial inequality \(\mathrm {d}_2(X;U_n) \leqslant 4\cdot \mathrm {SD}(X;U_n)\) (which is a consequence of standard facts about \(\ell _p\)-norms), and the result follows.

Corollary 4

(Separating Smooth Renyi Entropy and Shannon Entropy). For any n,\(\delta \) such that \(2^{-n} <\delta <\frac{1}{6}\), there exists a distribution \(X\in \{0,1\}^n\) such that \(\mathrm {H}(X) \geqslant (1-2\delta )n +\log _2(1-2^{-n})\), \(H_2(X) \leqslant 2\log _2(1/\delta )-2\) and \(\mathrm {H}^{\epsilon }_2(X) \leqslant H_2(X)+1\) for every \(\epsilon \leqslant \delta \). For a concrete setting consider \(n=256\) and \(\delta = 2^{-10}\). We have \(\mathrm {H}(X) > 255\) but \(\mathrm {H}_{2}(X) \leqslant 18\) and \(\mathrm {H}_{{2}}^{\epsilon }(X)\leqslant 19\) for every \(\epsilon < 2^{-9}\)!

Proof

We use a distribution of the same form as the optimal distribution for problem (15). Denote \(N=2^n\) and define \(\varvec{p}_i = \frac{1-2\delta }{N-1}\) for \(i=2,\ldots ,N\), and \(\varvec{p}_1 = 2\delta \). It is easy to see that \(\mathrm {H}(\varvec{p}) \geqslant (1-2\delta )n+\log _2(1-2^{-n})\) and \(H_2(\varvec{p}) < 2\log _2(1/\delta )-2\). Consider now an arbitrary distribution \(\varvec{p}'\) such that \(\mathrm {SD}(\varvec{p}; \varvec{p}') \leqslant \epsilon \). We have \(\varvec{p}'_i = \varvec{p}_i + \epsilon _i\) where \(\sum _{i}\epsilon _i = 0\) and \(\sum _{i}|\epsilon _i| \leqslant 2\epsilon \). Note that

$$\begin{aligned} \sum _{i>1}{\varvec{p}'}_i^2 - \sum _{i>1}\varvec{p}_i^2&> 2\sum _{i>1}\varvec{p_i} \epsilon _i \\&> -\frac{2(1-2\delta )\epsilon }{N-1} \\&= -\frac{2\epsilon }{1-2\delta }\cdot \sum _{i>1}\varvec{p_i}^2, \end{aligned}$$

and \({\varvec{p}'}_1^2 - {\varvec{p}}_1^2 \geqslant -\delta ^2 = -\frac{1}{2}{\varvec{p}}_1^2\). Thus, for \(2\epsilon +\delta < \frac{1}{2}\) it follows that \(\sum _{i\geqslant 1}{\varvec{p}'}_i^2 \geqslant \left( 1-\frac{1}{2}\right) \sum _{i\geqslant 1}{\varvec{p}}_i^2\) and the result follows.
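The concrete numbers quoted in Corollary 4 (here \(n=256\) and \(\delta =2^{-10}\)) can be checked directly from the construction used in the proof; a minimal sketch:

```python
import math

n, delta = 256, 2.0 ** -10
N = 2.0 ** n
p1, rest = 2 * delta, (1 - 2 * delta) / (N - 1)   # one heavy mass, flat outside

shannon = -p1 * math.log2(p1) - (1 - p1) * math.log2(rest)
collision = -math.log2(p1 ** 2 + (1 - p1) ** 2 / (N - 1))

print(round(shannon, 2), round(collision, 2))     # about 255.52 and 18.0
```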

Corollary 5

(Separating Extractable Entropy and Shannon Entropy). For any \(n\geqslant 1\), \(\epsilon \in (0,1)\) and \(\delta > 2^{-n}\), there exists a random variable \(X\in \{0,1\}^n\) such that \(\mathrm {H}(X) \geqslant (1-\epsilon -\delta )n+\log _2(1-2^{-n})\) but \(\mathrm {H}_{\mathrm {ext}}^{\epsilon }(X) \leqslant \log (1/ \delta )\). For a concrete setting consider \(n=256\) and \(\delta = 2^{-10}\). We have \(\mathrm {H}(X) > 255.5\) but \(\mathrm {H}_{\mathrm {ext}}^{\epsilon }(X) \leqslant 10\) for every \(\epsilon < 2^{-10}\)!

Proof

(Proof of Corollary 5). We use a distribution of the same form as the optimal distribution for problem (15). Fix \(\epsilon ,\delta \) (we can assume \(\epsilon +\delta <1\)) and denote \(N=2^{n}\). We define \(\varvec{p}_i = \frac{1-\epsilon -\delta }{N-1}\) for \(i=2,\ldots ,N\), and \(\varvec{p}_1 = \epsilon +\delta \). Note that \(\varvec{p}_i < \delta \) for \(i\not =1\). It follows that \(\mathrm {H}_{\infty }^{\epsilon }(\varvec{p}) \leqslant \log (1/\delta )\). On the other hand, note that \(\varvec{p}\) is a convex combination of the uniform distribution over \(N-1\) points (with weight \(1-\epsilon -\delta \)) and a point mass (with weight \(\epsilon +\delta \)). It follows that the Shannon entropy of \(\varvec{p}\) is at least \((1-\epsilon -\delta ) \cdot \log _2(N-1)\).

4.2 Positive Results

Now we address the question of what happens when \(\varDelta > 1\). This is motivated by settings where keys with an entropy deficiency can be applied (cf. Theorem 1 and the related references).

Corollary 6

(Collision Entropy When the Shannon Gap is Moderate). Let \(k \leqslant n-1\) and let \(X^{*}\in \{0,1\}^n\) minimize \(H_2(X)\) subject to \(\mathrm {H}(X) \geqslant k\), where \(X\in \{0,1\}^n\). Then

$$\begin{aligned} 2\log _2 n - 2\log _2 (n+1-k) + 1 \leqslant H_2(X^{*}) \leqslant 2\log _2 n - 2\log _2 (n-k). \end{aligned}$$
(14)

For instance, if \(n=256\) and \(k=255\) then \(15< H_2(X^{*}) < 16\).

Proof

(Proof of Corollary 6 ). Let b be the solution to Eq. (7) (here we have \(d=2^{n}\)). Since \(0 \leqslant H(b) \leqslant 1\) we have \(\frac{k}{\log _2 (d-1)} \geqslant 1-b \geqslant \frac{k-1}{\log _2 (d-1)}\). We improve the left-hand side inequality a little bit

Claim

We have \(1-\frac{k-1}{\log _2 d} \geqslant b\geqslant 1-\frac{k}{\log _2 d}\).

Proof

(Proof of the Claim). Since \(b\geqslant \frac{1}{d}\) we have \(\log _2 (d-1) - \log _2(1-b) \geqslant \log _2 d\) and therefore

$$\begin{aligned} k&= -b\log _2 b - (1-b)\log _2(1-b) + (1-b)\log _2(d-1) \\&\geqslant - b\log _2 b + (1-b)\log _2 d \end{aligned}$$

from which it follows that \(1-b \leqslant \frac{k}{\log _2 d}\). The left inequality was already proved above.

The result now easily follows by observing that \(\frac{(1-b)^2}{d-1}\geqslant b^2\) holds true for \(b\leqslant \frac{-1+\sqrt{d-1}}{d-2}\leqslant \frac{1}{2}\) (also for \(d=2\)). This is indeed satisfied by the Claim and \(k\leqslant \log _2 d-1\).

4.3 Bounds in Terms of the Renyi Divergence

Our Corollary 2 gives a bound on the \(\ell _2\)-distance between X and U. Note that

$$\mathrm {d}_2(X;U)^2 = \mathrm {CP}(X)-d^{-1} = d^{-1}\left( d\mathrm {CP}(X)-1\right) = d^{-1}\left( 2^{D_2\left( X\parallel U\right) }-1\right) $$

and thus our bounds can be expressed in terms of the Renyi divergence \(D_2\). Since we find the distribution X with the minimal possible entropy, this gives an upper bound on the divergence in terms of the Shannon entropy.

5 Conclusion

Our results put into quantitative form the well-accepted fact that Shannon entropy does not have good cryptographic properties, unless additional strong assumptions are imposed on the entropy source. The techniques we applied may be of independent interest.