Abstract
The smooth entropy of X is defined as the maximal entropy of a distribution Y that is close to X. It has found many applications, including privacy amplification, information reconciliation, quantum information theory, and even the construction of random number generators. However, the basic question about the optimal shape of the distribution Y has not been answered yet. In this paper we solve this problem for Renyi entropies in non-quantum settings, giving a formal treatment to an approach suggested at TCC'05 and ASIACRYPT'05. The main difference is that we use a threshold cut instead of a quantile cut to rearrange the probability masses of X. As an application, we derive tight lower bounds on the number of bits extractable from memoryless Shannon sources.
M. Skorski—This work was partly supported by the WELCOME/2010-4/2 grant founded within the framework of the EU Innovative Economy Operational Programme.
1 Introduction
1.1 Entropy Smoothing
Security Based on Statistical Closeness. In most cryptographic applications, probability distributions that are close enough in the variational (statistical) distance are considered indistinguishable. Informally, they have a similar cryptographic “quality” when used as randomness sources (randomness extraction) [15] or as secure keys (key derivation [1, 6]).
Entropy Notions do not See Statistical Closeness. Unfortunately, standard entropy notions (including min-entropy and collision entropy, which are widely used as randomness measures in cryptography) are not robust with respect to small probability perturbations. Consider the AES cipher with a 256-bit key which is \(\epsilon = 2^{-80}\)-close to uniform. While such a key is considered secure nowadays, it may have no more than 81 bits of min-entropy (more precisely, fix \(x_0\in \{0,1\}^{256}\) and consider the key X which equals \(x_0\) with probability \(2^{-256}+2^{-80}\) and distributes the remaining mass uniformly over \(\{0,1\}^{256}\setminus \{x_0\}\)). This mismatches our intuitive understanding of min-entropy as a measure of how many almost-random bits can be extracted.
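To make the example concrete, a quick calculation (our illustration, not from the paper) confirms that this perturbed key has about 80 bits of min-entropy while keeping nearly full Shannon entropy:

```python
import math

# Key distribution: one key x0 has probability 2^-256 + 2^-80;
# the remaining 2^256 - 1 keys share the rest uniformly.
p_max = 2.0 ** -256 + 2.0 ** -80
p_rest = (1.0 - p_max) / (2.0 ** 256 - 1)

min_entropy = -math.log2(p_max)
# Shannon entropy: the contribution of x0 is negligible
shannon = -p_max * math.log2(p_max) - (1 - p_max) * math.log2(p_rest)

print(round(min_entropy, 3), round(shannon, 3))  # -> 80.0 256.0
```

So a distribution that no adversary can tell apart from the uniform key (distance \(2^{-80}\)) still loses 176 bits of min-entropy, which motivates the smooth variant below.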
Smooth Entropy takes Probability Perturbations into Account. To fix the issue described above, the concept of smooth min-entropy has been proposed [4, 15]. Smooth entropy is defined as the maximal possible entropy within a certain distance to a given distribution. More precisely, for a given entropy notion \(H(\cdot )\) (which is usually Renyi entropy, see Sect. 2 for its formal definition) we define the \(\epsilon \)-smooth entropy of X as the value of the following optimization program
\({H}^{\epsilon }(X) = \max _{Y:\ \mathrm {SD}(X;Y)\leqslant \epsilon } {H}(Y)\)    (1)
where \(\mathrm {SD}()\) stands for the statistical (variational) distance (see Sect. 2 for a formal definition). This definition is well-suited for cryptographic applications, because it no longer depends on negligible variations of the probability distribution. In particular, setting \(\epsilon =2^{-80}\) in our AES example we obtain the “correct” result of 256 bits of (smooth) entropy.
Importance of Smooth Entropy. Smooth Renyi entropy, formally introduced by Renner and Wolf in [15], found many applications including privacy amplification [13, 15, 19], information reconciliation [15] and quantum information theory [17, 18]. The technique of perturbing a distribution to gain more entropy was actually known before. For example, entropy smoothing is implicitly used to prove the Asymptotic Equipartition Property [10] or, more concretely, in the construction of a pseudorandom generator from one-way functions [7–9]. However, the simple question
Question 1
How does the shape of optimal Y depend on X?
has not been fully understood so far. In this paper we answer Question 1 by explicitly characterizing the shape of Y depending on X, and give some applications of the derived characterization.
1.2 Related Works and Our Contribution
Related Works. The problem of finding the optimal shape for Y has been addressed in [13, 15]. The authors argued intuitively that for min-entropy (which is a special case of Renyi entropy, particularly useful in randomness extraction) the optimal solution cuts down the biggest probabilities of X.
Our Contribution. We show that this characterization is not accurate, and the problem is more subtle: the optimal solution uses a threshold cut rather than a quantile cut (see Fig. 1).
The precise answer to Question 1 is given in Theorem 1. We provide an intuitive explanation, as a three-step algorithm, in Fig. 2.
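The threshold-cut idea can be sketched in code. The snippet below (our illustration, in the spirit of Fig. 2, shown for the special case of min-entropy; the distribution and \(\epsilon \) are purely illustrative) caps all probabilities at a threshold t chosen so that exactly \(\epsilon \) mass is cut off, then puts the cut mass back on smaller points without exceeding t:

```python
import math

def threshold_cut(probs, eps):
    """Cap probabilities at a threshold t chosen so that roughly eps
    of mass is cut off, then redistribute that mass below t."""
    # Bisection for t such that sum(max(p - t, 0)) == eps
    lo, hi = 0.0, max(probs)
    for _ in range(100):
        t = (lo + hi) / 2
        if sum(max(p - t, 0.0) for p in probs) > eps:
            lo = t  # too much mass cut: raise the threshold
        else:
            hi = t
    t = (lo + hi) / 2
    y = [min(p, t) for p in probs]
    deficit = sum(p - v for p, v in zip(probs, y))  # mass that was cut
    # Put the cut mass back on the smallest points, never exceeding t
    for i in sorted(range(len(y)), key=lambda i: y[i]):
        add = min(t - y[i], deficit)
        y[i] += add
        deficit -= add
    return t, y

probs = [0.4, 0.3, 0.15, 0.1, 0.05]
t, y = threshold_cut(probs, eps=0.1)
print(round(t, 6), round(-math.log2(max(y)), 3))  # -> 0.3 1.737
```

Here the smoothed distribution Y stays within statistical distance 0.1 of X, while its min-entropy rises from \(-\log 0.4 \approx 1.32\) to \(-\log 0.3 \approx 1.74\) bits. A quantile cut, by contrast, would remove whole probability masses of the biggest points rather than capping them.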
Theorem 1
(Optimal Renyi entropy smoothing). Let \(\alpha >1\) be fixed, X be an arbitrary distribution over a finite set and \(\epsilon \in (0,1)\). Let \(t\in (0,1)\) be such that
and Y be distributed according to
Then Y is nearly optimal, that is we have
and
Corollary 1
(Tightness of Theorem 1). Note that for fixed \(\alpha > 1\) we have \(\frac{\alpha }{\alpha -1}\log \left( \frac{1}{1-\epsilon }\right) = O(\epsilon )\). Thus, our solution differs from the ideal one by only a negligible additive constant in the entropy amount, which is good enough for almost all applications.
1.3 Tight No-Go Results for Extracting from Stateless Shannon Sources
A stateless (also called memoryless) source produces consecutive samples independently. While this is a restriction, it is often assumed by practitioners working on random number generators (cf. [2, 3, 5, 11]) and is argued to be reasonable under some circumstances (the so-called restart mode, which enforces fresh samples; see [3, 5]). An important result follows from a more general fact called the Asymptotic Equipartition Property (AEP): for a stateless source, the min-entropy rate (min-entropy per sample) is close to the Shannon entropy per sample. The convergence holds in probability, for a large number of samples.
A variant of the AEP: the min-entropy per sample in a sequence \(X_1,\ldots ,X_n\) of i.i.d. samples from X converges, as \(n\rightarrow \infty \), to the Shannon entropy of X. More precisely
where the probability is taken over \(X_1,\ldots ,X_n\).
Thus, the AEP is a bridge connecting the heuristic use of Shannon entropy as a measure of extractable randomness (practice) and the provable security (randomness extractors theory). The best known quantitative form of Eq. (6) appears in [9].
Lemma 1
(Asymptotic Equipartition Property [9]). Let \(X_1,\ldots ,X_n\) be i.i.d. samples from a distribution X of Shannon entropy k. Then the sequence \((X_1,\ldots ,X_n)\) is \(\epsilon \)-close to a distribution of min entropy \(kn - O\left( \sqrt{kn\log (1/\epsilon ) }\right) \).
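The bound in Lemma 1 can be checked numerically. The sketch below (our illustration, with illustrative parameters) takes a Bernoulli source with \(p=0.11\), removes the most likely strings (for \(p<1/2\) these are the low-weight ones) up to total mass \(\epsilon \), and lower-bounds the min-entropy of what remains; the gap to the Shannon entropy is of order \(\sqrt{kn\log (1/\epsilon )}\), as the lemma predicts:

```python
import math

def log_binom_pmf(n, k, p):
    # log Pr[Binomial(n, p) = k], computed via log-gamma to avoid overflow
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

n, p, eps = 2000, 0.11, 2.0 ** -40
q = 1 - p

# Largest w0 with Pr[weight < w0] <= eps: strings of weight < w0 are the
# most likely ones (p < 1/2), and cutting them all costs at most eps mass.
cdf, w0 = 0.0, 0
for w in range(n + 1):
    cdf += math.exp(log_binom_pmf(n, w, p))
    if cdf > eps:
        w0 = w
        break

# The largest remaining string probability is p^w0 * q^(n - w0),
# so after smoothing the sequence has min-entropy at least:
smooth_min = w0 * math.log2(1 / p) + (n - w0) * math.log2(1 / q)
shannon = n * (-p * math.log2(p) - q * math.log2(q))
print(round(shannon), round(smooth_min))  # gap ~ sqrt(kn * log(1/eps))
```

With these parameters the Shannon entropy is about 1000 bits while the smoothed min-entropy is roughly 300 bits lower, matching the \(O\left( \sqrt{kn\log (1/\epsilon )}\right) \) loss.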
Corollary 2
In particular, one can extract \(kn - O\left( \sqrt{kn\log (1/\epsilon ) }\right) -2\log (1/\epsilon )\) bits which are \(\epsilon \)-close to uniform (e.g. using pairwise independent hash functions [8] as an extractor).
Based on Theorem 1 we reprove the following result, which matches the bound in [9]. Our result can be understood as a lower bound on the convergence speed in the Asymptotic Equipartition Property given in Lemma 1.
Theorem 2
(An upper bound on the extraction rate [16]). Let \(X_1,X_2,\ldots \) be a sequence of i.i.d. random variables, each of Shannon entropy k. Then from the sequence \((X_1,X_2,\ldots ,X_n)\) no extractor can get more than
bits which are \(\epsilon \)-close (in the variation distance) to uniform (the constant under \(\varTheta (\cdot )\) depends on the source).
Remark 1
(The bound is tight for most settings). Since from N bits of min-entropy we can extract at least \(N-2\log (1/\epsilon )\) bits \(\epsilon \)-close to uniform, and since in most cases \(\log (1/\epsilon ) = o(kn)\), the bound in Theorem 2 matches Lemma 1 and is therefore tight.
From Theorem 2 we conclude that the error in Eq. (6) is significant and has to be taken into account no matter what the extractor is. It is worth noting that our separation between Shannon entropy and extractable entropy holds in the most favorable case, when the bits are independent.
Corollary 3
(A significant error in the heuristic estimate). In the above setting, the gap between the Shannon entropy and the number of extractable bits \(\epsilon \)-close to uniform is at least \(\varTheta \left( \sqrt{kn\log (1/\epsilon )}\right) \). In particular, for the recommended security level (\(\epsilon =2^{-80}\)) we obtain a loss of \(kn - N \approx \sqrt{80 k n}\) bits, no matter what extractor we use.
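For concreteness (our arithmetic, with illustrative numbers): a source with Shannon entropy \(k = 0.5\) bits per sample, sampled \(n = 10^6\) times at security level \(\epsilon = 2^{-80}\), loses about \(\sqrt{80kn}\) bits:

```python
import math

k = 0.5          # Shannon entropy per sample (illustrative)
n = 10 ** 6      # number of samples
log_eps = 80     # log2(1/eps) for eps = 2^-80

loss = math.sqrt(log_eps * k * n)
print(round(loss))  # -> 6325, out of kn = 500000 Shannon bits
```

The loss here is only about 1.3% of the Shannon entropy, but it is an absolute floor that no extractor can avoid.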
1.4 Organization
Notions we use, as well as some auxiliary technical facts, are explained in Sect. 2. We prove our main result, that is Theorem 1, in Sect. 3. The proof of Theorem 2 appears in Sect. 4.
2 Preliminaries
2.1 Basic Definitions
The most popular way of measuring how close two distributions are is the statistical distance.
Definition 1
(Statistical Distance). The statistical (or total variation) distance of two distributions X, Y over the same finite set is defined as \(\mathrm {SD}(X;Y) = \frac{1}{2}\sum _{x}\left| \Pr [X=x]-\Pr [Y=x]\right| \).
If \(\mathrm {SD}(X;Y)\leqslant \epsilon \), we also say that X and Y are \(\epsilon \)-close.
Below we recall the definition of Renyi entropy of order \(\alpha \). The logarithms are taken to base 2.
Definition 2
(Renyi Entropy). The Renyi entropy of order \(\alpha \) of a distribution X equals \(\mathbf {H}_{\alpha }(X) = \frac{1}{1-\alpha }\log \left( \sum _{x} \Pr [X=x]^{\alpha } \right) \).
Taking the limits \(\alpha \rightarrow 1\) and \(\alpha \rightarrow \infty \) we recover two important notions: Shannon entropy and min-entropy.
Definition 3
(Shannon Entropy). The Shannon Entropy of a distribution X equals \(H(X) = -\sum _{x}\Pr [X=x]\log \Pr [X=x]\).
Definition 4
(Min Entropy). The min entropy of a distribution X equals \(H_\mathrm {\infty }(X) = -\max _{x}\log \Pr [X=x]\).
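The two limits can be verified numerically. A small sketch (ours, with an illustrative dyadic distribution):

```python
import math

def renyi_entropy(probs, alpha):
    # H_alpha(X) = 1/(1-alpha) * log2(sum_x p_x^alpha), for alpha != 1
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    return -math.log2(max(probs))

probs = [0.5, 0.25, 0.125, 0.125]
# alpha -> 1 recovers Shannon entropy, alpha -> infinity recovers min-entropy
assert abs(renyi_entropy(probs, 1.000001) - shannon_entropy(probs)) < 1e-4
assert abs(renyi_entropy(probs, 200) - min_entropy(probs)) < 0.05
```

For this distribution the Shannon entropy is 1.75 bits while the min-entropy is only 1 bit; Renyi entropy interpolates monotonically between the two as \(\alpha \) grows.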
Smooth Renyi Entropy is defined as the value of the program (1).
Definition 5
(Smooth Renyi Entropy, [4]). The \(\epsilon \)-smooth Renyi entropy of order \(\alpha \) of a distribution X equals \({H}^{\epsilon }_{\alpha }(X) = \max _{Y} {H}_{\alpha }(Y)\) where the maximum is taken over Y satisfying the constraint \(\mathrm {SD}(X;Y)\leqslant \epsilon \).
Definition 6
(Extractable Entropy, [14]). We say that X has k extractable bits within distance \(\epsilon \), denoted \(H_\mathrm {ext}^{\epsilon }(X)\geqslant k\), if for some randomized function \(\mathrm {Ext}\) we have \(\mathrm {SD}\left( \mathrm {Ext}(X,S); U_k,S\right) \leqslant \epsilon \), where \(U_k\) is a uniform k-bit string and S is an independent uniform string.
2.2 Technical Facts
We will need the following simple fact on convex functions.
Proposition 1
Let f be a strictly convex differentiable real-valued function and \(x<y\). Then for any \(\delta > 0\) we have \(f(x+\delta )-f(x) < f(y+\delta )-f(y)\).
Our proof uses the following characterization of “extractable” distributions.
Theorem 3
(An Upper Bound on Extractable Entropy, [14]). If \(H_\mathrm {ext}^{\epsilon }(X) \geqslant k\) then X is \(\epsilon \)-close to Y such that \(H_\mathrm {\infty }(Y) \geqslant k\).
Another important fact we use is the sharp bound on binomial tails.
Theorem 4
(Tight Binomial Tails [12]). Let B(n, p) be a sum of independent Bernoulli trials with success probability p. Then for \(\gamma \leqslant \frac{3}{4}q\) we have
with the error term satisfies
where \(\mathrm {KL}\left( a \parallel b\right) = a\log (a/b) + (1-a)\log ((1-a)/(1-b))\) is the Kullback-Leibler divergence, and Q is the complement of the cumulative distribution function of the standard normal distribution.
3 Proof of Theorem 1
We start by rewriting Eq. (1) in the following way
where for the sake of clarity we replace \(\mathbf {P}_X\) by \(\mu _X\).
Claim
Let \(\epsilon (x)\) be optimal for Eq. (11) and define \(S^{+} = \{x:\ \epsilon (x) < 0\}\). Then \(\mu (x) + \epsilon (x) = \mu (y) + \epsilon (y)\) for all \(x,y \in S^{+}\).
We prove this by a mass-shifting argument.
Proof
(of Claim). Suppose that
for two different points \(x_1,x_2\) (the statement is trivially true when there is only one point). Take a number \(\delta \) such that
and modify \(\mu \) by shifting the mass from \(x_2\) to \(x_1\) in the following way
that is, we shift mass from the larger of the two points to the smaller one. Note that from Eqs. (11) and (13) it follows that the constraints in (11) are satisfied with \(\epsilon (x)\) replaced by \(\epsilon '(x)\). Let Y be a random variable distributed according to \(\mathbf {P}_Y(x) = \mu (x)+\epsilon (x)\) and let \(Y'\) be distributed as \(\mathbf {P}_{Y'}(x) = \mu (x)+\epsilon '(x)\). Note that we have
Note that we have
by Eqs. (12) and (13). Now from Proposition 1 applied to \(f(u) = u^{\alpha }\), \(x=\mathbf {P}_Y(x_2)+\delta \), \(y=\mathbf {P}_Y(x_2)\) and \(\delta \) (here we also use the assumption \(\alpha > 1\)), it follows that
which means \(\mathbf {H}_{\alpha }(Y') > \mathbf {H}_{\alpha }(Y)\). In other words, Y is not optimal. \(\square \)
By the last claim it is clear that there is a number \(t\in (0,1)\) such that the set \(S^{+}=\{x:\ \mathbf {P}_{Y^{*}}(x) < \mu (x) \}\) is contained in \(\{x:\ \mu (x) \geqslant t\}\) and that \(\mathbf {P}_{Y^{*}}(x) \geqslant t\) for \(x\in S^{+}\). Therefore
which, since \(\mathbf {H}^{\epsilon }_{\alpha }(X) = \mathbf {H}_{\alpha }(Y^{*})\), proves the second inequality in Eq. (5). To prove the first inequality in Eq. (5) note that
Since \(\frac{t}{1-\epsilon } > t\) we have
and therefore by Eq. (2) we obtain
which finishes the proof.
4 Proof of Theorem 2
4.1 Characterizing Extractable Entropy
We state the following fact with an explanation in Fig. 3.
Lemma 2
(Upper bound on the smooth min-entropy). Let X be a distribution. Then for every distribution Y which is \(\epsilon \)-close to X, we have \(H_{\infty }(Y) \leqslant -\log t\) where t satisfies
The proof follows from Theorem 1.
Without loss of generality, we assume from now on that X is a distribution over \(\{0,1\}\) with \(\Pr [X=1]=p\) and \(q=1-p\). Define \(X^n = \left( X_1,\ldots ,X_n\right) \). For any \(x\in \{0,1\}^n\) we have
According to the last lemma and Theorem 3, we have
where
From now on we assume that
4.2 Determining the Threshold t
The next key observation is that t is actually small and can be omitted; that is, we can simply cut at the \((1-\epsilon )\)-quantile. This is stated in the lemma below.
Lemma 3
(Replacing the threshold by the quantile). Let \(x_0\in \{0,1\}^n\) be a point such that \(\Vert x_0\Vert = pn + \gamma n\). Then we have
To prove the lemma, note that from Theorem 4 it follows that setting
we obtain
when \(\gamma \) is sufficiently small compared to p and q (formally, this is justified by computing the derivative with respect to \(\gamma \) and noticing that it grows by at most a factor of \(1+\frac{\gamma }{\sqrt{npq}}\)). But we also have
Therefore,
which finishes the proof.
4.3 Putting It All Together
Now, by combining Lemmas 2 and 3 and the estimate \(Q(x) \approx x^{-1}\exp (-x^2/2)\) for \(x\gg 0\) we obtain
which, because of the Taylor expansion \(\mathrm {KL}\left( p+\gamma \parallel p\right) = \frac{\gamma ^2}{2pq}+ O_{p,q}(\gamma ^3)\), gives us
Setting \(\gamma = c\cdot \sqrt{\frac{\log (1/\epsilon )}{pq n}} \) with a sufficiently large constant c, we obtain the claimed result.
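The Taylor expansion used above can be verified numerically. A short sketch (ours; natural logarithms are used, matching the stated formula \(\gamma ^2/2pq\)):

```python
import math

def kl(a, b):
    # KL divergence between Bernoulli(a) and Bernoulli(b), in nats
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

p = 0.3
q = 1 - p
for gamma in (0.01, 0.005, 0.001):
    exact = kl(p + gamma, p)
    approx = gamma ** 2 / (2 * p * q)
    # The error is cubic in gamma, as the O(gamma^3) term predicts
    assert abs(exact - approx) < 10 * gamma ** 3
```

Halving \(\gamma \) shrinks the approximation error by roughly a factor of eight, confirming the cubic remainder.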
References
Barak, B., Dodis, Y., Krawczyk, H., Pereira, O., Pietrzak, K., Standaert, F.-X., Yu, Y.: Leftover hash lemma, revisited. Cryptology ePrint Archive, Report 2011/088 (2011). http://eprint.iacr.org/
Bouda, J., Krhovjak, J., Matyas, V., Svenda, P.: Towards true random number generation in mobile environments. In: Jøsang, A., Maseng, T., Knapskog, S.J. (eds.) NordSec 2009. LNCS, vol. 5838, pp. 179–189. Springer, Heidelberg (2009). http://dx.doi.org/10.1007/978-3-642-04766-4_13
Bucci, M., Luzzi, R.: Design of testable random bit generators. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 147–156. Springer, Heidelberg (2005)
Cachin, C.: Smooth entropy and Rényi entropy. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 193–208. Springer, Heidelberg (1997)
Dichtl, M., Golić, J.D.: High-speed true random number generation with logic gates only. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 45–62. Springer, Heidelberg (2007)
Dodis, Y., Pietrzak, K., Wichs, D.: Key derivation without entropy waste. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 93–110. Springer, Heidelberg (2014)
Hastad, J., Impagliazzo, R., Levin, L.A., Luby, M.: Pseudo-random generation from one-way functions. In: Proceedings of the 20th STOC, pp. 12–24 (1988)
Hastad, J., Impagliazzo, R., Levin, L.A., Luby, M.: A pseudorandom generator from any one-way function. SIAM J. Comput. 28(4), 1364–1396 (1999). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.3930
Holenstein, T.: Pseudorandom generators from one-way functions: a simple construction for any hardness. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 443–461. Springer, Heidelberg (2006)
Holenstein, T., Renner, R.: On the randomness of independent experiments. IEEE Trans. Inf. Theory 57(4), 1865–1871 (2011)
Lacharme, P., Röck, A., Strubel, V., Videau, M.: The linux pseudorandom number generator revisited. Cryptology ePrint Archive, Report 2012/251 (2012). http://eprint.iacr.org/
McKay, B.D.: On Littlewood’s estimate for the binomial distribution. Adv. Appl. Probab. 21(2), 475–478 (1989)
Renner, R.S., König, R.: Universally composable privacy amplification against quantum adversaries. In: Kilian, J. (ed.) TCC 2005. LNCS, vol. 3378, pp. 407–425. Springer, Heidelberg (2005). http://dx.doi.org/10.1007/978-3-540-30576-7_22
Renner, R., Wolf, S.: Smooth Renyi entropy and applications. In: ISIT 2004, Chicago, Illinois, USA, p. 232 (2004)
Renner, R.S., Wolf, S.: Simple and tight bounds for information reconciliation and privacy amplification. In: Roy, B. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 199–216. Springer, Heidelberg (2005)
Skorski, M.: How much randomness can be extracted from memoryless Shannon entropy sources. In: WISA 2015 (2015)
Schoenmakers, B., Tjoelker, J., Tuyls, P., Verbitskiy, E.: Smooth Renyi entropy of ergodic quantum information sources. In: 2007 IEEE International Symposium on Information Theory. ISIT 2007, pp. 256–260 (2007)
Tomamichel, M.: A framework for non-asymptotic quantum information theory. Ph.D. thesis, ETH Zurich (2012)
Watanabe, S., Hayashi, M.: Non-asymptotic analysis of privacy amplification via Renyi entropy and inf-spectral entropy. In: 2013 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 2715–2719 (2013)
© 2016 Springer-Verlag Berlin Heidelberg
Skorski, M. (2016). How to Smooth Entropy?. In: Freivalds, R., Engels, G., Catania, B. (eds) SOFSEM 2016: Theory and Practice of Computer Science. SOFSEM 2016. Lecture Notes in Computer Science(), vol 9587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49192-8_32