
1 Introduction

Randomness extractors are functions that convert weakly random sources into nearly uniform bits. Though extractors were originally motivated by the problem of simulating randomized algorithms with weak random sources such as those arising in nature, they have been successfully applied to coding theory, cryptography, complexity theory, etc. [12, 14, 22]. In this paper, we focus on extractors that can be applied to privacy amplification. In this scenario, two parties Alice and Bob share a weakly random secret \(W \in \{0, 1\}^n\). W may be a human-memorizable password, biometric data, or the output of a physical source, all of which are themselves weakly random, or a uniform secret that may have been partially leaked to an adversary Eve. Thus, only the min-entropy of W is guaranteed. Alice and Bob interact over a public communication channel in order to securely agree on a nearly uniform secret key \(R \in \{0, 1\}^m\) in the presence of the adversary Eve, who sees every message transmitted over the public channel. The public seed length and the min-entropy of W are the two main measures of efficiency in this setting. If Eve is passive, a (strong) randomness extractor yields the following solution: Alice sends a uniformly random seed S to Bob, and then both compute \(R= \textsf {Ext}(W, S)\) as the nearly uniform secret key [18]. If Eve is active (i.e., she may modify messages in arbitrary ways), several protocols have been proposed to achieve this goal [4, 6–9, 13–15, 21, 23].

As a major step forward, Dodis and Wichs [9] introduced non-malleable extractors to study privacy amplification protocols in which the attacker is active and computationally unbounded. If the attacker sees a random seed S and modifies it into an arbitrarily related seed \(S'\), then the relation between \(R = \textsf {Ext}(W, S)\) and \(R' = \textsf {Ext}(W, S')\) must be bounded in order to avoid related-key attacks. More formally, a non-malleable extractor is a function \(\textsf {nmExt}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\) that takes two inputs, a weakly random secret source W with min-entropy \(\alpha \) and a uniformly random seed S, and outputs a string that is \(\gamma \)-close to uniform (see Definition 1), even given S as well as \(\textsf {nmExt}(W, \mathcal {A}(S))\), for an arbitrary function \(\mathcal {A}\) with \(\mathcal {A}(S) \ne S\). They proved that \((\alpha , 2 \gamma )\)-non-malleable extractors exist as long as \(\alpha > 2m + 3 \log \frac{1}{\gamma } + \log d + 9\) and \(d > \log (n-\alpha +1) + 2 \log \frac{1}{\gamma } +7\). The first explicit non-malleable extractor was constructed by Dodis, Li, Wooley and Zuckerman [8]. It works for any weakly random input source with min-entropy \(\alpha > \frac{n}{2}\) and a uniformly random seed of length \(d=n\) (it works even if the seed has entropy only \( \varTheta (m+ \log n) \)). However, when outputting more than a logarithmic number of bits, its efficiency relies on a longstanding conjecture on the distribution of prime numbers.

Li [14] showed that an \((\alpha , 2 \gamma )\)-non-malleable extractor \(\textsf {nmExt}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}\), where \( \alpha = (\frac{1}{2}-\delta ) \cdot n \) and \(d=O(\log n + \log (1/\gamma ))\) for any constant \(\delta > 0\), can be constructed as follows: the seed S is encoded using the parity check matrix of a BCH code, and the output is the inner product of the encoded source and the encoded seed over \(\mathbb {F}_2\). Dodis and Yu [11] observed that for a 4-wise independent hash function family \(\{h_w : \{0, 1\}^d \rightarrow \{0, 1\}^m \mid w \in \{0, 1\}^n \}\), \(\textsf {nmExt} (w, s)=h_w(s)\) is an \((\alpha , 2 \sqrt{2^{n-\alpha -d}} )\)-non-malleable extractor. In 2012, an alternative explicit construction based on the extractor of Raz [20] was given by Cohen et al. [6]. Without relying on any conjecture, their construction works for any weakly random source with min-entropy \(\alpha = (\frac{1}{2}+ \delta ) \cdot n\) and a uniformly random seed of length \(d \ge \frac{23}{\delta } \cdot m + 2 \log n\) (see Theorem 1 for details). However, their result suffers from some drawbacks: the non-malleable extractor is built on the explicit seeded extractor of Raz [20], and the error estimation in that construction is too rough. Furthermore, although one main purpose of [6] is to shorten the seed, the lower bound on the seed length is still not optimal.
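As a concrete toy illustration of the Dodis–Yu observation, the following sketch (ours; the field \(GF(2^8)\), the parameters \(n=32\), \(d=m=8\), and the function names are illustrative assumptions, not the parameters of the paper) instantiates a 4-wise independent family by degree-3 polynomials over \(GF(2^8)\), with the secret w supplying the four coefficients and the seed s acting as the evaluation point:

```python
# Toy sketch of nmExt(w, s) = h_w(s) with a 4-wise independent hash family,
# following the Dodis-Yu observation. Parameters (n = 32, d = m = 8) and the
# field GF(2^8) are illustrative choices, not those of the paper.

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf256_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) represented as integers in [0, 255]."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result

def nm_ext(w: int, s: int) -> int:
    """h_w(s) = a0 + a1*s + a2*s^2 + a3*s^3 over GF(2^8).

    The 32-bit secret w is split into the four coefficients a0..a3.
    Degree-3 polynomials over a field form a 4-wise independent family.
    """
    coeffs = [(w >> (8 * i)) & 0xFF for i in range(4)]  # a0, a1, a2, a3
    acc, power = 0, 1
    for a in coeffs:
        acc ^= gf256_mul(a, power)  # field addition is XOR
        power = gf256_mul(power, s)
    return acc

if __name__ == "__main__":
    print(hex(nm_ext(0xDEADBEEF, 0x3A)))
```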

Our Contributions and Techniques

  • By developing combination and permutation techniques, we improve the error estimation of Raz’s extractor in STOC’05 [20], a special case of which was used by Cohen et al. in CCC’12 [6]. For simplicity, let \(\gamma _1\) denote the error of the extractor in [6], and \(\gamma _2\) its counterpart in this paper. Recall that \(\gamma _1 = 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot ( 2 \epsilon )^{ \frac{1}{k}}\) in [6] under the assumptions that \( \epsilon \ge 2^{ - \frac{dk}{2}} \cdot k^k\) and \( 0 < \delta \le \frac{1}{2}\) (see Lemma 1). If \(\epsilon \ge \frac{1}{2^{(\frac{1}{2}-\delta )n+1}}\), then \(\gamma _1 = 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot ( 2 \epsilon )^{ \frac{1}{k}} \ge 1\), and the error estimation becomes meaningless. One main reason is that the partition of the sum used in [6, 20] to bound the error does not capture the essence of sequences that are biased for linear tests (see Definition 2). In this paper, we propose another partition of the sum and give a better bound on it by employing the combination and permutation formulas. In particular, the combination and permutation techniques (see Proposition 1) may be useful in future work. Correspondingly, the error becomes \( \gamma _2=2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot [ 2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{k} } \) (see Theorem 2). Since \( \epsilon \ge 2^{ - \frac{dk}{2}} \cdot k^k\) and \( 2^{ - \frac{dk}{2}} \cdot k^k > 2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \) for any even integer k, we get \(\gamma _1 > \gamma _2\). To simplify this bound, one may fix k to a specific value; for instance, for \(k=4\) the error is \(\gamma _2 = 2^{ \frac{(\frac{1}{2}- \delta )n}{4}} \cdot [2^{- 2d} \cdot 3 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{4}}\) (a small numerical comparison of \(\gamma _1\) and \(\gamma _2\) is sketched after this list).

  • Note that the error estimation of Raz’s extractor greatly impacts the constraints on the parameters, including the seed length, the weak source’s min-entropy, and the error of the non-malleable extractor. Based on the above improvement of the error estimation, we present an explicit construction of non-malleable extractors that improves on the construction of Cohen et al. in CCC’12 [6] in the sense that the seed length is shorter. More concretely, we present an explicit \((1016, \frac{1}{2})\)-non-malleable extractor \(\textsf {nmExt}: \{0, 1\}^{n} \times \{0, 1\}^d \rightarrow \{0, 1\}\) with \(n= 1024\) and \(d=19\), which beats the condition “\(2.01 \cdot \log n \le d \le n\)” in [6]: the seed length d is just \(1.9 \cdot \log n\) in our construction, while it is no less than \(66 + \frac{46}{63}\) according to [6]. Moreover, we improve the parameters of the general explicit construction given by Cohen et al.

  • We show how our non-malleable extractors are applied to privacy amplification.
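The following short script (ours) evaluates both error bounds for illustrative parameters \((\frac{1}{2}-\delta )n = 8\), \(d = 19\), \(k = 4\), and \(\epsilon = 2^{-\frac{dk}{2}} \cdot k^k\) (the smallest \(\epsilon \) permitted by Lemma 1), confirming \(\gamma _2 < \gamma _1\):

```python
# Compare the error bound gamma_1 of Lemma 1 with the bound gamma_2 of
# Theorem 2 for illustrative parameters (our choice): (1/2 - delta)*n = 8,
# d = 19, k = 4, and epsilon = 2^{-dk/2} * k^k, the boundary case of Lemma 1.
from math import prod

def double_factorial_odd(k: int) -> int:
    """(k-1)*(k-3)*...*1 for an even integer k."""
    return prod(range(k - 1, 0, -2))

def gamma_1(excess: float, d: int, k: int, eps: float) -> float:
    # excess = (1/2 - delta) * n
    return 2 ** (excess / k) * (2 * eps) ** (1 / k)

def gamma_2(excess: float, d: int, k: int, eps: float) -> float:
    inner = 2 ** (-d * k / 2) * double_factorial_odd(k) * (1 - eps) + eps
    return 2 ** (excess / k) * inner ** (1 / k)

if __name__ == "__main__":
    excess, d, k = 8, 19, 4
    eps = 2 ** (-d * k / 2) * k ** k
    g1, g2 = gamma_1(excess, d, k, eps), gamma_2(excess, d, k, eps)
    print(f"gamma_1 = {g1:.6f}, gamma_2 = {g2:.6f}, gamma_2 < gamma_1: {g2 < g1}")
```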

Organization. The remainder of the paper is organized as follows. In Sect. 2, we review some notation, concepts, and results. In Sect. 3, we recall an existing central lemma about the error estimation of Raz’s extractor and improve it by proposing a new partition method. In Sect. 4, we propose an explicit construction of non-malleable extractors with a shorter seed length compared with that in [6]. In Sect. 5, we show how the non-malleable extractors are applied to privacy amplification. Section 6 concludes the paper.

2 Preliminaries

For any positive integer n, denote \([n]=\{1,2, \ldots , n\}\). Let \(U_m\) denote the uniform distribution over \(\{0, 1\}^m\). We measure the distance between two distributions by the \( \mathcal {L}_1\) norm in order to be consistent with [6]. The statistical distance of X and Y is defined as \(\textsf {SD}(X, Y) = \frac{1}{2} \Vert X-Y\Vert _1\). It is well known that for any function f, \(\textsf {SD}(f(X), f(Y)) \le \textsf {SD}(X, Y)\). We write \(\textsf {SD}((X_1, X_2), (Y_1, Y_2) \mid Z)\) as an abbreviation of \(\textsf {SD}((X_1, X_2, Z), (Y_1, Y_2, Z))\).
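For concreteness, a minimal helper (ours) that computes the \( \mathcal {L}_1\) norm and the statistical distance of two distributions given as dictionaries:

```python
# Minimal helpers (illustrative) for the L1 norm and statistical distance of
# two distributions, each given as a dict mapping outcomes to probabilities.

def l1_distance(p: dict, q: dict) -> float:
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def statistical_distance(p: dict, q: dict) -> float:
    return 0.5 * l1_distance(p, q)

if __name__ == "__main__":
    biased = {0: 0.75, 1: 0.25}
    uniform = {0: 0.5, 1: 0.5}
    print(l1_distance(biased, uniform))           # 0.5
    print(statistical_distance(biased, uniform))  # 0.25
```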

The min-entropy of a variable W is \(H_\infty (W) = - \log \max _w \Pr (W=w).\) W over \(\{0, 1\}^n\) is called an \((n, \alpha )\)-source if \(H_\infty (W) \ge \alpha \). We say that a source (i.e., a random variable) is a weak source if its distribution is not uniform. We say W is a flat source if it is a uniform distribution over some subset \(S \subseteq \{0, 1\}^n\). Chor and Goldreich [5] observed that the distribution of any \((n, \alpha )\)-source is a convex combination of distributions of flat \((n, \alpha )\)-sources. Therefore, for general weak sources, it suffices in most cases to consider flat sources instead.

Definition 1

We say that the distribution X is \(\epsilon \)-close to the distribution Y if \(\Vert X-Y\Vert _1 = \sum _{s} |\Pr [X =s]- \Pr [Y =s]| \le \epsilon \). A function \(\textsf {Ext}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\) is an \((\alpha , \gamma )\)-seeded extractor if for every \((n, \alpha )\)-source W and an independent uniformly random variable S (called the seed) over \(\{0, 1\}^d\), the distribution of \(\textsf {Ext}(W, S)\) is \(\gamma \)-close to \(U_m\). \(\gamma \) is called the error of the seeded extractor. A seeded extractor is a strong seeded extractor if for W and S as above, \((\textsf {Ext}(W, S), S)\) is \(\gamma \)-close to \((U_m, U_d)\).
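As a sanity check of Definition 1, the toy script below (ours) takes the inner-product function \(\langle w, s\rangle \bmod 2\) with \(n=d=4\) and a flat source over a chosen subset, and computes the \( \mathcal {L}_1\) distance of \((\textsf {Ext}(W, S), S)\) from \((U_1, U_d)\) by enumeration; the extractor and all parameters are illustrative choices only:

```python
# Brute-force check of the strong-extractor condition in Definition 1 for a
# toy inner-product extractor Ext(w, s) = <w, s> mod 2 with n = d = 4.
# The flat source W' and all parameters are illustrative choices.
from itertools import product

n = d = 4

def ext(w: int, s: int) -> int:
    return bin(w & s).count("1") % 2  # inner product over F_2

def strong_extractor_error(source: list) -> float:
    """L1 distance of (Ext(W, S), S) from (U_1, U_d) for a flat source."""
    joint = {}
    for w, s in product(source, range(2 ** d)):
        key = (ext(w, s), s)
        joint[key] = joint.get(key, 0.0) + 1.0 / (len(source) * 2 ** d)
    ideal = 1.0 / (2 * 2 ** d)
    return sum(abs(joint.get((b, s), 0.0) - ideal)
               for b, s in product(range(2), range(2 ** d)))

if __name__ == "__main__":
    flat_source = list(range(1, 9))  # a flat (4, 3)-source of size 2^3
    print(strong_extractor_error(flat_source))
```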

Definition 2

A random variable Z over \(\{0, 1\}\) is \(\epsilon \)-biased if \(bias(Z)=|\Pr [Z=0]-\Pr [Z=1]| \le \epsilon \) (i.e., Z is \(\epsilon \)-close to uniform). A sequence of 0-1 random variables \(Z_1, Z_2, \ldots , Z_N\) is \(\epsilon \)-biased for linear tests of size k if for any nonempty \(\tau \subseteq [N]\) with \(|\tau | \le k\), the random variable \(Z_\tau = \oplus _{i \in \tau } Z_i\) is \(\epsilon \)-biased. We also say that the sequence \(Z_1, Z_2, \ldots , Z_N\) \(\epsilon \)-fools linear tests of size k.

For every \(k'\) and every \(N \ge 2\), variables \(Z_1, \cdots , Z_N\) as above can be explicitly constructed using \(2 \cdot \lceil \log (1/\epsilon ) + \log k' + \log \log N \rceil \) random bits [1].
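The brute-force checker below (ours) makes Definition 2 concrete: it verifies whether a given family of 0-1 variables, each represented as a function of the n underlying random bits, \(\epsilon \)-fools linear tests of size k by enumerating all seeds and all index sets \(\tau \) of size at most k (feasible only for toy parameters):

```python
# Brute-force check of Definition 2: does the family Z_1, ..., Z_N
# epsilon-fool linear tests of size k?  Each Z_i is given as a function of
# the n underlying random bits w.  Exponential in n and N; illustration only.
from itertools import combinations

def max_bias(z_vars, n: int, k: int) -> float:
    """Largest bias of XOR(Z_i, i in tau) over all nonempty tau, |tau| <= k."""
    worst = 0.0
    for size in range(1, k + 1):
        for tau in combinations(range(len(z_vars)), size):
            count_one = sum(
                sum(z_vars[i](w) for i in tau) % 2
                for w in range(2 ** n)
            )
            bias = abs(2 * count_one - 2 ** n) / 2 ** n  # |Pr[1] - Pr[0]|
            worst = max(worst, bias)
    return worst

if __name__ == "__main__":
    # Toy family over n = 4 bits: Z_i(w) = <w, a_i> mod 2 for fixed masks a_i.
    masks = [0b0011, 0b0101, 0b1001, 0b1110]
    z_vars = [lambda w, a=a: bin(w & a).count("1") % 2 for a in masks]
    print(max_bias(z_vars, n=4, k=3))
```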

The Extractor of Raz. Raz [20] constructed an extractor based on a sequence of 0-1 random variables that have small bias for linear tests of a certain size. Let \(Z_1, \cdots , Z_{m \cdot 2^d }\) be 0-1 random variables that are \(\epsilon \)-biased for linear tests of size \(k'\) and are constructed using n random bits. The set of indices \([m \cdot 2^d ]\) can be viewed as the set \(\{ (i, s): i \in [m], s \in \{ 0, 1\}^d\}\). Define \(\textsf {Ext}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\) by \(\textsf {Ext}(w, s)=Z_{(1, s)}(w)||Z_{(2, s)}(w)|| \cdots ||Z_{(m, s)}(w)\), where “||” is the concatenation operator. Raz proved that \(\textsf {Ext}\) is a seeded extractor with good parameters [20].
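Structurally, Raz’s extractor is an indexing-and-concatenation step on top of the small-bias family. The sketch below (ours) makes this explicit, treating the family as an abstract callable \(z(i, s, w)\); in an actual instantiation the family would come from a genuine \(\epsilon \)-biased construction such as the one of Alon et al. mentioned above:

```python
# Structural sketch (ours) of Raz's extractor: index the small-bias family by
# pairs (i, s) and concatenate m bits.  The family is passed in as an abstract
# callable z(i, s, w); instantiating it with a genuine epsilon-biased
# construction is what gives the extractor its guarantees.

def raz_ext(w: int, s: int, m: int, z) -> str:
    """Ext(w, s) = Z_(1,s)(w) || Z_(2,s)(w) || ... || Z_(m,s)(w)."""
    return "".join(str(z(i, s, w)) for i in range(1, m + 1))

if __name__ == "__main__":
    # Placeholder family for demonstration only (no small-bias guarantee):
    toy_z = lambda i, s, w: bin(w & (s ^ (i * 0x9E37))).count("1") % 2
    print(raz_ext(w=0b1011_0110, s=0b0101, m=3, z=toy_z))
```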

Cohen et al. [6] proved that the above extractor is in fact non-malleable. We’ll also construct non-malleable extractors based on it. The formal definition of non-malleable extractors is as follows.

Definition 3

(see [6]) We say that a function \(\mathcal {A}: \{0, 1\}^d \rightarrow \{0, 1\}^d\) is an adversarial function if \(\mathcal {A}(s) \ne s\) holds for every \(s \in \{0, 1\}^d\). A function \(\textsf {nmExt}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\) is an \((\alpha , \gamma )\)-non-malleable extractor if for every \((n, \alpha )\)-source W, every independent uniformly random variable S, and every adversarial function \(\mathcal {A}\),

$$\Vert (\textsf {nmExt}(W, S), \textsf {nmExt}(W, \mathcal {A}(S)), S) - (U_m, \textsf {nmExt}(W, \mathcal {A}(S)), S)\Vert _1 \le \gamma .$$

\(\gamma \) is called the error of the non-malleable extractor.
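For toy parameter sizes, the distance in Definition 3 can be evaluated exhaustively. The script below (ours; the candidate extractor, source, and adversarial function are arbitrary illustrative choices) makes the quantifiers concrete by enumerating all \((w, s)\) pairs:

```python
# Exhaustive evaluation (ours) of the L1 distance in Definition 3 for one
# candidate extractor, one flat source and one adversarial function A.
# Exponential enumeration; meant only to make the definition concrete.
from itertools import product

def nm_distance(nm_ext, source, d: int, adv) -> float:
    """|| (nmExt(W,S), nmExt(W,A(S)), S) - (U_1, nmExt(W,A(S)), S) ||_1."""
    real, ideal = {}, {}
    p = 1.0 / (len(source) * 2 ** d)
    for w, s in product(source, range(2 ** d)):
        out, tampered = nm_ext(w, s), nm_ext(w, adv(s))
        real[(out, tampered, s)] = real.get((out, tampered, s), 0.0) + p
        for b in range(2):  # replace the first coordinate by a uniform bit
            ideal[(b, tampered, s)] = ideal.get((b, tampered, s), 0.0) + p / 2
    keys = set(real) | set(ideal)
    return sum(abs(real.get(k, 0.0) - ideal.get(k, 0.0)) for k in keys)

if __name__ == "__main__":
    d = 3
    toy_nm_ext = lambda w, s: bin(w & (s * 0b1011 + 1)).count("1") % 2
    flip_last_bit = lambda s: s ^ 1            # adversarial: A(s) != s
    flat_source = list(range(16, 48))          # a flat (6, 5)-source
    print(nm_distance(toy_nm_ext, flat_source, d, flip_last_bit))
```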

A one-time message authentication code (MAC) is used to guarantee that a received message was indeed sent by the specified legitimate sender over an unauthenticated channel. Formally,

Definition 4

A family of functions \(\{ \textsf {MAC}_r : \{0, 1\}^{v} \rightarrow \{0, 1\}^\tau \}_{r \in \{0, 1\}^m}\) is an \(\varepsilon \)-secure (one-time) message authentication code (MAC) if for any \(\mu \in \{0, 1\}^{v}\) and any function \(f: \{ 0, 1\}^\tau \rightarrow \{ 0, 1\}^v \times \{ 0, 1\}^\tau \), it holds that

$$\Pr _{r \leftarrow U_m }[ \textsf {MAC}_r(\mu ')= \sigma ' \wedge \mu ' \ne \mu \mid (\mu ', \sigma ') = f(\textsf {MAC}_r(\mu )) ] \le \varepsilon .$$
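A classical family satisfying Definition 4 is the affine construction \(\textsf {MAC}_{(a,b)}(\mu )= a \mu + b\) over a prime field \(\mathbb {Z}_p\), which is \(\frac{1}{p}\)-secure for single-block messages \(\mu \in \mathbb {Z}_p\). The sketch below (ours) uses the Mersenne prime \(p=2^{61}-1\) purely for illustration:

```python
# One-time MAC sketch (ours): the affine family MAC_(a,b)(mu) = a*mu + b over
# Z_p with p = 2^61 - 1.  For single-block messages mu in Z_p this family is
# (1/p)-secure in the sense of Definition 4.  Parameters are illustrative.
import secrets

P = 2 ** 61 - 1  # a Mersenne prime

def keygen() -> tuple:
    return secrets.randbelow(P), secrets.randbelow(P)

def mac(key: tuple, mu: int) -> int:
    a, b = key
    return (a * (mu % P) + b) % P

def verify(key: tuple, mu: int, tag: int) -> bool:
    return mac(key, mu) == tag

if __name__ == "__main__":
    key = keygen()              # in the protocol, the key comes from nmExt
    tag = mac(key, 123456789)
    print(verify(key, 123456789, tag), verify(key, 987654321, tag))
```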

Recall that the main theorem about the explicit construction of non-malleable extractors proposed in [6] is as follows.

Theorem 1

(see [6]) For any integers n, d, and m, and for any \( 0 < \delta \le \frac{1}{2}\) such that \(d \ge \frac{23}{ \delta } \cdot m + 2 \log n, \) \(n \ge \frac{160}{\delta } \cdot m,\) and \(\delta \ge 10 \cdot \frac{\log (nd) }{n}, \) there exists an explicit \((( \frac{1}{2} + \delta ) \cdot n, 2^{-m})\)-non-malleable extractor \(\textsf {nmExt}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\).

3 Error Estimation of Raz’s Extractor and Its Improvement

In this section, we first recall the central lemma used in [6], which is a special case about the error estimation of Raz’s Extractor [20]. Then we point out the flaw in the proof and improve its error estimation. Afterwards, we compare our result with the original one and roughly show the role of the improvement.

3.1 A Special Case of Raz’s Extractor

The central lemma used in [6] is below, the proof of which is essentially the same as that in [20]. It can be considered as a special case of Raz’s Extractor [20].

Lemma 1

Let \(D = 2^d\). Let \(Z_1, \ldots , Z_D\) be 0-1 random variables that are \(\epsilon \)-biased for linear tests of size \(k'\) that are constructed using n random bits. Define \(\textsf {Ext}^{(1)}\): \(\{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}\) by \(\textsf {Ext}^{(1)}(w, s) = Z_s(w)\), that is, \(\textsf {Ext}^{(1)}(w, s)\) is the random variable \(Z_s\), when using w as the value of the n bits needed to produce \(Z_1, \ldots , Z_D\). Then, for any \(0 < \delta \le \frac{1}{2}\) and even integer \(k \le k'\) s.t. \(k \cdot ( \frac{1}{ \epsilon })^{ \frac{1}{k}} \le D^{\frac{1}{2}}\), \(\textsf {Ext}^{(1)}\) is a \((( \frac{1}{2} + \delta )\cdot n, \gamma _1)\)-seeded-extractor, with \(\gamma _1=(\epsilon \cdot 2^{(\frac{1}{2}-\delta )n+1})^{ \frac{1}{k}}\).

Proof

Let W be an \((n, ( \frac{1}{2}+ \delta ) \cdot n)\)-source. Let S be a random variable that is uniformly distributed over \(\{0, 1\}^d\) and is independent of W. We will show that the distribution of \(\textsf {Ext}^{(1)}(W, S)\) is \(\gamma _1\)-close to uniform. As in [5], it is enough to consider the case where W is uniformly distributed over a set \(W' \subseteq \{0, 1\}^n\) of size \(2^{(1/2+ \delta )n}\). For every \(w \in \{0, 1\}^n\) and \(s \in \{0, 1\}^d\), denote \(e(w, s)=(-1)^{Z_s(w)}.\)

Claim 1. For any \(r \in [k]\) and any different \(s_1, \ldots , s_r \in \{0, 1\}^d\) ,

$$ \sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{r} e(w, s_j) \le \epsilon \cdot 2^n. $$

Proof

$$\sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{r} e(w, s_j)= \sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{r} (-1)^{Z_{s_j}(w)} = \sum \limits _{w \in \{0, 1\}^n} (-1)^{Z_{s_1}(w) \oplus \cdots \oplus Z_{s_r}(w)},$$

and since \(Z_{s_1}(w) \oplus \cdots \oplus Z_{s_r}(w)\) is \(\epsilon \)-biased, the last sum is at most \(\epsilon \cdot 2^n\).    \(\square \)

The \( \mathcal {L}_1 \) distance of \(\textsf {Ext}^{(1)}(W, S)\) and U is

$$\begin{aligned}&{~~~~}\Vert \textsf {Ext}^{(1)}(W, S)- U \Vert _1 \\ {}&=| \Pr [\textsf {Ext}^{(1)}(W, S)=0]- \Pr [\textsf {Ext}^{(1)}(W, S)=1]|\\ {}&= |\frac{1}{2^{(\frac{1}{2} + \delta )n}} \cdot \frac{1}{2^d} (\sum \limits _{w \in W'} \sum \limits _{s \in \{0, 1\}^d} e(w, s))|. \end{aligned}$$

Denote \( \gamma (W, S)= \frac{1}{2^{(\frac{1}{2} + \delta )n}} \cdot \frac{1}{2^d} (\sum \limits _{w \in W'} \sum \limits _{s \in \{0, 1\}^d} e(w, s)).\)

Define \(f: [-1, 1] \rightarrow [-1, 1]\) by \(f(z)=z^k\), then f is a convex function for any even positive integer k.

Thus, by a convexity argument, we have

$$\begin{aligned} 2^{(\frac{1}{2}+ \delta )n} \cdot (2^d \cdot \gamma (W, S))^k&= 2^{(\frac{1}{2}+ \delta )n} \cdot \{ \sum \limits _{w \in W'} [ \frac{1}{ 2^{(1/2+ \delta )n}} \sum \limits _{s \in \{0, 1\}^d} e(w, s)] \} ^k\\ {}&\le 2^{(\frac{1}{2}+ \delta )n} \cdot \{ \sum \limits _{w \in W'} \frac{1}{2^{(1/2+ \delta )n} } [\sum \limits _{s \in \{0, 1\}^d} e(w, s)]^k \} \\ {}&\le \sum \limits _{w \in \{0, 1\}^n} [\sum \limits _{s \in \{0, 1\}^d} e(w, s)]^k \\ {}&= \sum \limits _{w \in \{0, 1\}^n} \sum \limits _{s_1, \ldots , s_k \in \{0, 1\}^d} \prod \limits _{j=1}^k e(w, s_j) \\ {}&= \sum \limits _{s_1, \ldots , s_k \in \{0, 1\}^d} \sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^k e(w, s_j). \end{aligned}$$

The sum over \(s_1, \ldots , s_k \in \{0, 1\}^d\) is broken into two sums. The first sum is over \(s_1, \ldots , s_k \in \{0, 1\}^d\) such that in each summand, at least one \(s_j\) is different from all other elements in the sequence \( s_1, \ldots , s_k\), and the second sum is over \(s_1, \ldots , s_k \in \{0, 1\}^d\) such that in each summand every \(s_j\) is identical to at least one other element in the sequence \( s_1, \ldots , s_k\). The number of summands in the first sum is trivially bounded by \(2^{d \cdot k}\), and by Claim 1 each summand is bounded by \(2^n \cdot \epsilon \). The number of summands in the second sum is bounded by \(2^{d \cdot \frac{k}{2}} \cdot (\frac{k}{2})^k\), and each summand is trivially bounded by \(2^n\). Therefore,

$$ 2^{(\frac{1}{2} + \delta )n} \cdot 2^{d \cdot k } \cdot \gamma (W, S)^k \le 2^n \cdot \epsilon \cdot 2^{d \cdot k} + 2^n \cdot 2^{d \cdot \frac{k}{2}} \cdot (\frac{k}{2})^k \le 2 \cdot 2^n \cdot \epsilon \cdot 2^{d \cdot k},$$

where the last inequality follows by the assumption that \(k \cdot (1/ \epsilon )^{1/k} \le D^{\frac{1}{2}}\). That is, \( \gamma (W, S) \le (\epsilon \cdot 2^{(\frac{1}{2}-\delta )n+1})^{\frac{1}{k}}.\)    \(\square \)

The above partition of the sum over \(s_1, \ldots , s_k \in \{0, 1\}^d\) is not optimal, since it does not capture the essence of a random variable sequence that is biased for linear tests (recall that \(Z_1, \ldots , Z_{2^d}\) is called \(\epsilon \)-biased for linear tests of size k if for any nonempty \(\tau \subseteq [2^d]\) with \(|\tau | \le k\), the random variable \(Z_\tau = \oplus _{i \in \tau } Z_i\) is \(\epsilon \)-biased). Moreover, the bounds on the number of summands in the two sums are too large. The same problem exists in [20].

In fact, when every \(s_j\) is identical to at least one other element in the sequence \( s_1, \ldots , s_k\) but at least one \(s_j\) appears an odd number of times in the sequence, the summand \(\sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{k} e(w, s_j)\) is still upper bounded by \(2^n \cdot \epsilon \), since \(\sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{k} e(w, s_j)= \sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{k} (-1)^{Z_{s_j}(w)}= \sum \limits _{w \in \{0, 1\}^n} (-1)^{Z_{s_1}(w) \oplus \cdots \oplus Z_{s_k}(w) }\) and \(Z_1\), \(\ldots \), \(Z_D\) are 0-1 random variables that are \(\epsilon \)-biased for linear tests of size \(k'\). However, in this case the upper bound on the summand \(\sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^{k} e(w, s_j)\) was taken to be \(2^n\) in [6, 20].

3.2 Improvement for the Error Estimation of Raz’s Extractor

We improve the error estimation of Raz’s extractor as follows. Unlike [6, 20], we use a different partition of the sum in the following proof and exploit the combination and permutation formulas to obtain a tighter bound on it. Correspondingly, the error is reduced.

Proposition 1

Consider fixed positive integers k and d, with k even. Assume that a sequence \(s_1, \ldots , s_k\) satisfies the following two conditions: (1) for every \(i \in [k]\), \(s_i \in \{0, 1\}^d\), and (2) for every \(j \in [k]\), \(s_j\) appears an even number of times in the sequence \(s_1, \ldots , s_k\). Then the number of such sequences \(s_1, \ldots , s_k\) is at most \( 2^{\frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1\).

Proof

Denote by \(C^l_r\) the number of possible combinations of r objects from a set of l objects, i.e., \(C^l_r = \frac{l!}{r!(l-r)!} = \frac{l(l-1)(l-2) \cdots (l-r+1)}{r!}\), and by \(P^l_r\) the number of possible permutations of r objects from a set of l objects, i.e., \(P^l_r = \frac{l!}{(l-r)!} =l(l-1)(l-2) \cdots (l-r+1)\). Every sequence satisfying conditions (1) and (2) can be obtained by partitioning the k positions into \(\frac{k}{2}\) unordered pairs and assigning a value in \(\{0,1\}^d\) to each pair, and each such sequence is obtained at least once in this way. Hence the number of the corresponding sequences is at most

$$ \frac{C_2^k \cdot C_2^{k-2} \cdot \cdots \cdot C_2^2}{P_{\frac{k}{2}}^{\frac{k}{2}}} \cdot 2^{ \frac{dk}{2}}=\frac{k ! \cdot \frac{1}{2^{\frac{k}{2}}}}{(\frac{k}{2})!} \cdot 2^{\frac{dk}{2}} =\frac{k!}{(\frac{k}{2})! \cdot 2^{ \frac{k}{2} } } \cdot 2^{\frac{dk}{2}} = 2^{\frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1.$$
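For small parameters, the bound in Proposition 1 can be checked by brute force. The script below (ours) enumerates all sequences and compares the exact count of even-multiplicity sequences with the pairing bound \(2^{\frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1\); for \(k=4\) and \(d=1\) it reports 8 versus 12, which illustrates why the proposition is stated as an upper bound:

```python
# Brute-force check (ours) of Proposition 1 for small parameters: count the
# sequences s_1, ..., s_k over {0,1}^d in which every value appears an even
# number of times, and compare with the pairing bound 2^{dk/2} * (k-1)!!.
from collections import Counter
from itertools import product
from math import prod

def exact_count(k: int, d: int) -> int:
    total = 0
    for seq in product(range(2 ** d), repeat=k):
        if all(c % 2 == 0 for c in Counter(seq).values()):
            total += 1
    return total

def pairing_bound(k: int, d: int) -> int:
    return 2 ** (d * k // 2) * prod(range(k - 1, 0, -2))

if __name__ == "__main__":
    for k, d in [(2, 1), (4, 1), (4, 2), (6, 1)]:
        print(k, d, exact_count(k, d), pairing_bound(k, d))
```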

Theorem 2

Let \(D = 2^d\). Let \(Z_1, \ldots , Z_D\) be 0-1 random variables that are \(\epsilon \)-biased for linear tests of size \(k'\) that are constructed using n random bits. Define \(\textsf {Ext}^{(1)}\): \(\{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}\) by \(\textsf {Ext}^{(1)}(w, s) = Z_s(w)\), that is, \(\textsf {Ext}^{(1)}(w, s)\) is the random variable \(Z_s\), when using w as the value of the n bits needed to produce \(Z_1, \ldots , Z_D\). Then, for any \(0 < \delta \le \frac{1}{2}\) and any even integer \(k \le k'\), \(\textsf {Ext}^{(1)}\) is a \(((\frac{1}{2}+ \delta ) \cdot n, \gamma _2)\)-seeded-extractor, where \(\gamma _2= 2^{ \frac{ (\frac{1}{2}-\delta ) \cdot n }{k}} \cdot [2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{k}}.\)

Proof

We improve the proof by partitioning the sum \(\sum \limits _{s_1, \ldots , s_k \in \{0, 1\}^d} \sum \limits _{w \in \{0, 1\}^n} \prod \limits _{j=1}^k e(w, s_j)\) into two sums in a different way. The first sum is over \(s_1, \ldots , s_k \in \{0, 1\}^d\) such that in each summand at least one \(s_j\) appears an odd number of times in the sequence \( s_1, \ldots , s_k\), and the second sum is over \(s_1, \ldots , s_k \in \{0, 1\}^d\) such that in each summand every \(s_j\) appears an even number of times in the sequence \( s_1, \ldots , s_k\). Let \(c\) denote the number of summands in the second sum; by Proposition 1, \(c \le 2^{\frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1\), and each such summand equals \(2^n\). The number of summands in the first sum is \(2^{dk} - c\), and by Claim 1 each of them is bounded by \(2^n \cdot \epsilon \). Hence, \( 2^{ ( \frac{1}{2}+ \delta ) \cdot n} \cdot 2^{d \cdot k } \cdot \gamma (W, S)^k \le 2^n \cdot c + 2^n \cdot \epsilon \cdot ( 2^{dk} - c) = 2^n \cdot [ c \cdot (1-\epsilon ) + \epsilon \cdot 2^{dk}] \le 2^n \cdot [ 2^{\frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon \cdot 2^{dk}].\) Correspondingly,

$$\begin{aligned}&\gamma (W, S)^k \le \frac{2^n \cdot 2^{dk}}{ 2^{( \frac{1}{2}+ \delta ) \cdot n } \cdot 2^{d \cdot k }} \cdot [ 2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ]\\ {}&{~~~~~~~~~~~~}= 2^{( \frac{1}{2}- \delta ) \cdot n} \cdot [2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ] \end{aligned}$$

That is, \(\gamma (W, S) \le 2^{ \frac{( \frac{1}{2}- \delta ) \cdot n}{k}} \cdot [2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{k}}\).    \(\square \)

3.3 Comparison

For simplicity, in the rest of the paper, denote \(\gamma _1\) as the error of the extractor in Lemma 1, and \(\gamma _2\) as the counterpart in Theorem 2.

Proposition 2

\( (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \le ( \frac{k}{2} )^k\) for any positive even integer k, and “=” holds iff \(k=2\). Furthermore, \(\lim \limits _{k \rightarrow \infty } \frac{ (k-1) \cdot (k-3)\cdot \cdots \cdot 1}{ 2^{\frac{1}{2} } \cdot {( \frac{k}{e} )}^{\frac{k}{2}} } =1 \).

Proof

When \(k=2\), it is immediate that \((k-1) \cdot (k-3)\cdot \cdots \cdot 1 = ( \frac{k}{2} )^k\). In what follows, we consider only even integers k with \(k>2\).

Since \( \frac{k!}{(\frac{k}{2})!} = k(k-1) \cdots (\frac{k}{2}+1) \le k^{\frac{k}{2}} < \frac{k^k}{2^{ \frac{k}{2}}}\) for \(k>2\), we have \(\frac{k! }{(\frac{k}{2})! \cdot 2^{\frac{k}{2}}} < \frac{k^k}{2^k}\). Hence,

$$ (k-1) \cdot (k-3)\cdot \cdots \cdot 1 = \frac{k! }{(\frac{k}{2})! \cdot 2^{\frac{k}{2}}} < \frac{k^k}{2^k}.$$

By Stirling’s formula, we have \( \lim \limits _{k \rightarrow \infty } \frac{k!}{ \sqrt{2 \pi k } ( \frac{k}{e} )^k } =1. \) Therefore,

$$ \lim \limits _{k \rightarrow \infty } \frac{ (k-1) \cdot (k-3)\cdot \cdots \cdot 1}{ 2^{\frac{1}{2} } \cdot {( \frac{k}{e} )}^{\frac{k}{2}} } = \lim \limits _{k \rightarrow \infty } [ \frac{k!}{ \sqrt{2 \pi k} \cdot ( \frac{k}{e})^k } \cdot \frac{\sqrt{2 \pi \cdot \frac{k}{2}} \cdot ( \frac{k}{2e})^{ \frac{k}{2} } }{( \frac{k}{2})! } ] =1. \quad \square $$

The error estimation of the extractor in Theorem 2 is better than that in Lemma 1. Recall that in Theorem 2, we have

$$\begin{aligned}&\gamma _2= 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot [2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{k}} \\ {}&~~~= 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot \{ 2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 + [1- 2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1] \cdot \epsilon \}^{\frac{1}{k} }, \end{aligned}$$

while in Lemma 1 (following [6]), we have \(\gamma _1 = 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot ( 2 \epsilon )^{ \frac{1}{k}}\) under the assumptions that \( \epsilon \ge 2^{ - \frac{dk}{2}} \cdot k^k\) and \( 0 < \delta \le \frac{1}{2}\).

In general, since \( \epsilon \ge 2^{ - \frac{dk}{2}} \cdot k^k\) and \( 2^{ - \frac{dk}{2}} \cdot k^k > 2^{- \frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \) for any even integer k, we get \(\gamma _1 > \gamma _2\). In particular, when k is large enough, from Proposition 2, we get that \( (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \approx 2^{\frac{1}{2} } \cdot {( \frac{k}{e} )}^{\frac{k}{2}}\). Therefore,

$$ \gamma _2 \approx 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot \{ 2^{- \frac{dk}{2}} \cdot 2^{\frac{1}{2} } \cdot {( \frac{k}{e} )}^{\frac{k}{2}} + [1- 2^{- \frac{dk}{2}} \cdot 2^{\frac{1}{2} } \cdot {( \frac{k}{e} )}^{\frac{k}{2}}] \cdot \epsilon \}^{\frac{1}{k} }.$$

Correspondingly, \( \epsilon \ge 2^{ - \frac{dk}{2}} \cdot k^k > 2^{ - \frac{dk}{2}} \cdot 2^{\frac{1}{2} } \cdot {( \frac{k}{e} )}^{\frac{k}{2}}\). Hence, \(\gamma _1 > \gamma _2\).

Remark 1

To simplify \(\gamma _2\), let k be a specific value. For instance, let \(k=4\), then the error \(\gamma _1 = 2^{ \frac{(\frac{1}{2}- \delta )n}{4}} \cdot ( 2 \epsilon )^{ \frac{1}{4}} \) and \(\gamma _2= 2^{ \frac{(\frac{1}{2}- \delta )n}{4}} \cdot [2^{- 2d} \cdot 3 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{4}}\).

Remark 2

Note that when k is large, \(( \frac{k}{2} )^k\) is much greater than \( (k-1) \cdot (k-3)\cdot \cdots \cdot 1\). For instance, when \(k=6\), we have \((\frac{k}{2} )^k=729\) and \((k-1) \cdot (k-3)\cdot \cdots \cdot 1=15\). Therefore, the bound “the number of summands in the second sum is at most \(2^{\frac{dk}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1,\) and each summand equals \(2^n\)” in the proof of Theorem 2 is a substantial improvement over the bound “the number of summands in the second sum is bounded by \(2^{d \cdot \frac{k}{2}} \cdot (\frac{k}{2})^k\), and each summand is trivially bounded by \(2^n\)” in the proof of Lemma 1.
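The gap discussed in Remark 2 is easy to tabulate numerically; the short script below (ours) prints \((k-1) \cdot (k-3)\cdot \cdots \cdot 1\), \((\frac{k}{2})^k\), and the Stirling approximation \(\sqrt{2} \cdot (\frac{k}{e})^{\frac{k}{2}}\) for a few even values of k:

```python
# Tabulate (ours) the quantities compared in Proposition 2 and Remark 2:
# the double factorial (k-1)!!, the old bound (k/2)^k, and the Stirling
# approximation sqrt(2) * (k/e)^(k/2).
from math import e, prod, sqrt

for k in (2, 4, 6, 8, 16):
    dfact = prod(range(k - 1, 0, -2))
    print(f"k={k:2d}  (k-1)!!={dfact:7d}  (k/2)^k={(k/2)**k:12.0f}  "
          f"sqrt(2)(k/e)^(k/2)={sqrt(2) * (k / e) ** (k / 2):10.2f}")
```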

Remark 3

If \(\epsilon \ge \frac{1}{2^{(\frac{1}{2}-\delta )n+1}}\), then \(\gamma _1 = 2^{ \frac{(\frac{1}{2}- \delta )n}{k}} \cdot ( 2 \epsilon )^{ \frac{1}{k}} \ge 1\). In this case, the error estimation is meaningless.

3.4 Important Role in Improving the Seed Length of Non-malleable Extractors

Note that the error of the non-malleable extractor in Theorem 1, given by Cohen et al. [6], relies on constraints among the parameters. The main idea of the proof of Theorem 1 in [6] is as follows. Assume for contradiction that \(\textsf {Ext}\) is not a non-malleable extractor; then, after several steps, an inequality \(\gamma _1 > A\) is deduced, where A denotes a certain expression. On the other hand, under the assumptions of Theorem 1, \(\gamma _1 < A\) holds, a contradiction, so \(\textsf {Ext}\) is a non-malleable extractor. Essentially, the constraints on the parameters in Theorem 1 are chosen so that the inequality \(\gamma _1 < A\) holds. From Sect. 3.3, we have \( \gamma _1 > \gamma _2\) for any positive even integer \(k \ge 4\). Therefore, we may relax the constraints on the parameters in Theorem 1 by requiring only \( \gamma _2 < A\); see the proofs of Theorems 3 and 4 below for details. Correspondingly, the seed length may be further shortened.

4 Explicit Construction of Non-malleable Extractors with Shorter Seed Length

In this section, we improve the parameters of the explicit construction of non-malleable extractors by Cohen et al. in [6]. The seed length here is shorter than that in Theorem 1.

We first review two lemmas that will be used later.

Lemma 2

(see [6]) Let X be a random variable over \(\{0, 1\}^m\). Let Y, S be two random variables. Then,

$$ \Vert (X, Y, S) - (U_m, Y, S)\Vert _1 = \mathbb {E}_{s \sim S } [\Vert (X, Y, S)|_{S=s} -(U_m, Y, S)|_{S=s} \Vert _1].$$

Lemma 3

(see [6]) Let X, Y be random variables over \(\{0, 1\}^m\) and \(\{0, 1\}^n\) respectively. Then \(\Vert (X, Y)-(U_m, Y)\Vert _1 \le \sum \limits _{ \emptyset \ne \sigma \subseteq [m], \tau \subseteq [n]} bias (X_\sigma \oplus Y_\tau ),\) where \(X_i\) is the i-th bit of X, \(Y_j\) is the jth bit of Y, \(X_\sigma = \oplus _{i \in \sigma } X_i\), and \(Y_\tau = \oplus _{j \in \tau } Y_j\).

In what follows, we give a specific explicit construction of a non-malleable extractor that improves on [6] in the sense that the seed length is shorter.

Theorem 3

There exists an explicit \((1016, \frac{1}{2})\)-non-malleable extractor \(\textsf {Ext}: \{0,1\}^{1024} \times \{0, 1\}^{19} \rightarrow \{0, 1\}\).

Proof Idea. We borrow the reductio ad absurdum approach in the proof of Theorem 1. The proof sketch is as follows. Assume by contradiction that \(\textsf {Ext}\) is not non-malleable. Then

\(\textsf {Phase~1}\): There must exist a weak source W with min-entropy at least \(\alpha \) and an adversarial function \(\mathcal {A}\) such that the statistical distance between \((\textsf {Ext} (W, S), \textsf {Ext}(W, \mathcal {A}(S)), S)\) and \((U_1, \textsf {Ext} (W, \mathcal {A}(S)), S) \) has a certain lower bound. Then there exists a set \(\mathcal {S}\subseteq \{0, 1\}^d\) such that for every \(s \in \mathcal {S}\), \(Y_s = \textsf {Ext}(W, s) \oplus \textsf {Ext}(W, \mathcal {A}(s))\) is biased. Consider the directed graph \(G= (\mathcal {S} \cup \mathcal {A}(\mathcal {S}), E)\) with \( E = \{(s, \mathcal {A}(s)): s \in \mathcal {S} \} \), which might contain cycles. By employing a graph lemma from [6], we can find a subset \( \mathcal {S}' \subseteq \mathcal {S}\) such that the subgraph of G induced by \(\mathcal {S}' \cup \mathcal {A}(\mathcal {S}')\) is acyclic.

\(\textsf {Phase~2}\): We prove that the set of variables \( \{Y_s\}_{s \in \mathcal {S}'}\) is \(\epsilon \)-biased for linear tests of size at most k/2. Consider the extractor of Raz built on the variables \( \{Y_s\}_{s \in \mathcal {S}'}\). It is a good seeded extractor, which yields a contradiction.

Phase 1 of the proof is almost the same as that in [6]. Phase 2 departs from the approach of [6]: we exploit the error estimation of Theorem 2 instead of Lemma 1, and we use the trick of taking the even integer k to be just 4 instead of the largest even integer not larger than \( \frac{\lceil 128 \delta \rceil }{2}\), where \(\delta \) is as in Theorem 1. As a result, the extractor error simplifies, and we do not need to verify \(k \cdot ( \frac{1}{ \epsilon })^{ \frac{1}{k}} \le (2^d)^{\frac{1}{2}}\) as in Lemma 1.

Proof

The explicit construction we present is the extractor constructed in [20]. Alon et al. [1] observed that for every \(k'\) and every \(N \ge 2\), a sequence of 0-1 random variables \(Z_1, \ldots , Z_N\) that is \(\epsilon \)-biased for linear tests of size \(k'\) can be explicitly constructed using \(2 \cdot \lceil \log (1/\epsilon ) + \log k' + \log \log N \rceil \) random bits. Therefore, letting \(D=2^{19}\) and \(\epsilon =2^{-\frac{1024}{2}+r}\) with \(r=1+ \log k' +\log 19\), we can construct a sequence of 0-1 random variables \(Z_1, \ldots , Z_{2^{19}}\) that is \(\epsilon \)-biased for linear tests of size \(k'\) using \(n=1024\) random bits. Let \(k'=8\). Define \(\textsf {Ext}: \{0, 1\}^{1024} \times \{0, 1\}^{19} \rightarrow \{0, 1\}\) by \(\textsf {Ext}(w, s)= Z_s(w)\).

Let S be a random variable uniformly distributed over \(\{0, 1\}^{19}\).

Assume for contradiction that \(\textsf {Ext}\) is not a \((1016, \frac{1}{2})\)-non-malleable-extractor. Then there exists a source W of length 1024 with min-entropy 1016, and an adversarial function \(\mathcal {A}: \{0, 1\}^{19} \rightarrow \{0, 1\}^{19}\) such that

$$\Vert ( \textsf {Ext}(W, S), \textsf {Ext}(W, \mathcal {A}(S)), S) - (U_1, \textsf {Ext} (W, \mathcal {A}(S)), S)\Vert _1 > \frac{1}{2}.$$

As in [5], suppose W is uniformly distributed over a set \(W' \subseteq \{0, 1\}^{1024}\) of size \(2^{1016}\).

For every \(s \in \{0, 1\}^{19}\), let \(X_s\) be the random variable \(\textsf {Ext}(W, s)\). By Lemmas 2 and 3, we have

$$\sum \limits _{\emptyset \ne \sigma \subseteq [1], \tau \subseteq [1]} \mathbb {E}_{s \sim S}[bias ((X_s)_\sigma \oplus (X_{\mathcal {A}(s) })_\tau )] > \frac{1}{2}. $$

Let \(\sigma ^*, \tau ^* \subseteq [1]\) be the indices of (one of) the largest summands in the above sum. For every \(s \in \{0, 1\}^{19}\), let \(Y_s=(X_s)_{\sigma ^*} \oplus (X_{\mathcal {A}(s) })_{\tau ^*}.\)

There is a set \(S'' \subseteq \{0, 1\}^{19}\) satisfying that

$$|S''| > \frac{\xi \cdot 2^{19-2}}{2 (1+1)^2}=2^{13}.$$

The \(S''\) here is the same as that in the proof of Theorem 1 by replacing t there with 1 and the error \(2^{-m}\) there with \( \frac{1}{2}\). Please see [6] for details.

Define a random variable \(Y_{S''}\) over \(\{0, 1\}\) as follows: To sample a bit from \(Y_{S''}\), uniformly sample a string s from \(S''\), and then independently sample a string w uniformly from \(W'\). The sampled value is \(Y_s(w)\). We have that \(bias(Y_{S''}) > \frac{ \frac{1}{2} }{ 2^{1+1} (2-1) (1+1) } = \frac{1}{2^4}\). For every \(s \in S''\), let \( Y'_s = Z_{(1, s)} \oplus ( \oplus _{j \in \tau ^* } Z_{(j, \mathcal {A}(s))}),\) where \(Z_{(1, s)}=Z_s\).

Letting \(t=1\) and \(m=1\) in Claim 7.2 of [6], we get the following claim.

Claim 2. The set of random variables \(\{Y'_s\}_{s \in S''}\) \(\epsilon \)-fools linear tests of size 4.

We apply Theorem 2 to the random variables \(\{Y'_s \}_{s \in S''}\). For simplicity of presentation, we assume \(|S''|= 2^{d'}\). By Theorem 2, the distribution of \( \textsf {Ext}^{(1)}(W, S'')\) is \(\gamma _2\)-biased for \(\gamma _2= 2^{ \frac{8}{k}} \cdot [2^{- \frac{d'k}{2}} \cdot (k-1) \cdot (k-3)\cdot \cdots \cdot 1 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{k}}\). Let \(k=\frac{k'}{2}=4\); then \(\gamma _2= 2^{ \frac{8}{4}} \cdot [2^{-2d'} \cdot 3 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{4}}\). We note that \(\textsf {Ext}^{(1)}(W, S'')\) has the same distribution as \(Y_{S''}\); in particular, both random variables have the same bias. Therefore, we get

$$ 2^{ \frac{8}{4}} \cdot [2^{-2d'} \cdot 3 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{4}} \ge bias(Y_{S''}) > \frac{1}{2^4}.$$

Moreover, since \(2^{d'} = |S''| > 2^{13}\), we have

$$ 2^2 \cdot [4 \cdot 2^{-28} \cdot 3 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{4}} > 2^2 \cdot [2^{-2d'} \cdot 3 \cdot (1-\epsilon ) + \epsilon ]^{\frac{1}{4}} > \frac{1}{2^4}.$$

That is,

$$\begin{aligned} 2^{-38} > \frac{2^{-4} \cdot 2^{-20} -\epsilon }{3(1-\epsilon ) \cdot 2^{12} }, \end{aligned}$$
(a)

where \(\epsilon = 2^{ -512 +r}\) and \(r=4+ \log 19\) (i.e., the \(\epsilon \) fixed at the beginning of the proof with \(k'=8\)).

On the other hand, a direct calculation gives \( 2^{-38} < \frac{2^{-4} \cdot 2^{-20} -\epsilon }{3(1-\epsilon ) \cdot 2^{12}},\) which contradicts inequality (a).   \(\square \)
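The two conflicting bounds at the end of the proof are easy to verify numerically; the snippet below (ours) evaluates both sides with the \(\epsilon \) fixed at the beginning of the proof and confirms that inequality (a) indeed fails:

```python
# Numerical check (ours) of the contradiction at the end of Theorem 3's proof:
# inequality (a) asserts 2^-38 > rhs, while in fact 2^-38 < rhs.
from math import log2

eps = 2.0 ** (-512 + 4 + log2(19))          # the epsilon fixed in the proof
rhs = (2.0 ** -24 - eps) / (3 * (1 - eps) * 2 ** 12)
lhs = 2.0 ** -38
print(f"lhs = 2^-38 = {lhs:.3e}, rhs = {rhs:.3e}, (a) holds: {lhs > rhs}")
```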

Comparison. In Theorem 1, the seed length d and the source length n should satisfy \( d \ge \frac{23}{\delta } m + 2 \log n\) with \(0 < \delta \le \frac{1}{2}\). However, in the above construction, we have \(d=1.9 \log n\). We compare them in detail as follows.

Let \(n=2^{10}\), \(m=1\), and \(\delta =\frac{63}{128}\) in Theorem 1; then it is easily verified that \(n \ge \frac{160}{\delta } \cdot m\). To construct an explicit \(((\frac{1}{2} + \delta ) \cdot n, 2^{-m})\)-non-malleable extractor \(\textsf {nmExt}: \{0, 1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\) (that is, an explicit \((1016, \frac{1}{2})\)-non-malleable extractor \(\textsf {nmExt}\)), Theorem 1 requires the seed length d to satisfy \(d \ge \frac{23}{\delta } \cdot m + 2 \log n = 66 + \frac{46}{63}\). Moreover, when \(d \le 2^{40}\), the precondition \(\delta \ge 10 \cdot \frac{\log (nd)}{n}\) of Theorem 1 is satisfied. Meanwhile, by Theorem 3, the seed length d can be just 19. In this sense, our construction is much better than that of [6].

Using the extractor with the improved error estimation (see Theorem 2), we can also improve the parameters of the general explicit non-malleable extractor \(\textsf {nmExt}: \{0,1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\) constructed by Cohen et al. [6], as stated below.

Theorem 4

Assume that

$$ 0 < 2^{\log 3 -2 \theta +4m +8 } - 2^{ \log 3 - \frac{n}{2}+4 + \log d -2 \theta +4m +8} \le 2^{2d + 4 \theta -8m -8 -n + \alpha } - 2^{2d - \frac{n}{2}+4 + \log d }.$$

Then there exists an explicit \(( \alpha , 2^\theta )\)-non-malleable extractor \(\textsf {nmExt}: \{0,1\}^n \times \{0, 1\}^d \rightarrow \{0, 1\}^m\).

The proof is similar to that of Theorem 3. Please see Appendix A for details.

By the analysis in Sect. 3.4, we conclude that the above theorem is indeed an improvement in the sense that the seed length is shorter. Though the constraints on the parameters in Theorem 4 are complex, we show some simplifications in Appendix B. How to further simplify the constraints is an open problem.

5 Application to Privacy Amplification

In this section, we show how the non-malleable extractor is applied to the privacy amplification protocol [8, 9] (also known as an information-theoretic key agreement protocol), the formal definition of which is given in Appendix C.

Roughly speaking, in this scenario Alice and Bob share a weak secret W, for which only a lower bound on the min-entropy is guaranteed. They communicate over a public and unauthenticated channel to securely agree on a nearly uniform secret key R, where the attacker Eve is active and computationally unbounded. To achieve this goal, the protocol is designed as follows.

Table 1. The Dodis-Wichs privacy amplification protocol.

The seed \(S_0\) is the value to be authenticated. Alice initiates the conversation by transmitting a uniformly random seed S to Bob. During this transmission, S may be modified by Eve into any value \(S'\). Then Bob samples a uniform seed \(S_0\), computes the authentication key \(R'= \textsf {nmExt}(W, S')\), and sends \(S_0\) together with the authentication tag \(T_0=\textsf {MAC}_{R'}(S_0)\) to Alice. At this point, Bob reaches the \( \textsf {KeyDerived}\) state and outputs \(R_B= \textsf {Ext}(W, S_0)\). During this transmission, \((S_0, T_0)\) may be modified by Eve into any pair \((S'_0, T'_0)\). Alice computes the authentication key \(R= \textsf {nmExt}(W, S)\) and verifies that \(T'_0=\textsf {MAC}_{R}(S'_0)\). If the verification fails, then Alice rejects and outputs \(R_A=\bot \). Otherwise, Alice reaches the \(\textsf {KeyConfirmed}\) state and outputs \(R_A= \textsf {Ext}(W, S'_0)\).
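The message flow of Table 1 can also be summarized in code. The sketch below (ours) wires together placeholder \(\textsf {nmExt}\), \(\textsf {Ext}\), and \(\textsf {MAC}\) functions (all of them stand-ins rather than the constructions of this paper) to show which party computes what and when the \(\textsf {KeyDerived}\) and \(\textsf {KeyConfirmed}\) states are reached:

```python
# Sketch (ours) of the Dodis-Wichs two-round flow of Table 1.  nm_ext, ext and
# mac are placeholder stand-ins for the actual primitives; only the message
# flow and the order of computations follow the protocol description.
import secrets

def nm_ext(w: int, s: int) -> int:      # placeholder non-malleable extractor
    return hash((w, s, "nm")) & 0xFFFF

def ext(w: int, s: int) -> int:         # placeholder strong extractor
    return hash((w, s, "ext")) & 0xFFFF

def mac(key: int, msg: int) -> int:     # placeholder one-time MAC
    return hash((key, msg)) & 0xFFFF

def run_protocol(w: int, tamper=lambda msg: msg):
    # Round 1: Alice -> Bob, possibly modified by Eve.
    s = secrets.randbits(16)
    s_prime = tamper(s)
    # Round 2: Bob derives his key and authenticates a fresh seed s0.
    s0 = secrets.randbits(16)
    r_bob_auth = nm_ext(w, s_prime)
    t0 = mac(r_bob_auth, s0)
    r_b = ext(w, s0)                    # Bob: KeyDerived
    # Delivery to Alice, again possibly modified by Eve.
    s0_prime, t0_prime = tamper(s0), t0
    r_alice_auth = nm_ext(w, s)
    if mac(r_alice_auth, s0_prime) != t0_prime:
        return r_b, None                # Alice rejects
    return r_b, ext(w, s0_prime)        # Alice: KeyConfirmed

if __name__ == "__main__":
    secret_w = secrets.randbits(64)
    print(run_protocol(secret_w))                          # passive Eve: keys agree
    print(run_protocol(secret_w, tamper=lambda m: m ^ 1))  # active Eve: Alice rejects
```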

The security can be analyzed in two cases [6, 8]. Case 1: Eve does not modify the seed S in the first round. Then Alice and Bob share the same authentication key (i.e., \(R'= R\)), which is statistically close to uniform. Therefore, Eve has only a negligible probability of producing a valid authentication tag \(T'_0\) for any seed \(S'_0 \ne S_0\). Case 2: Eve does modify the seed S into a different seed \(S'\). Since \(T_0\) is a deterministic function of \(S_0\) and \(R'\), Eve may learn information about \(R'\). Nevertheless, by the definition of non-malleable extractors, the authentication key R computed by Alice is still statistically close to uniform. Thus, again, the adversary has only a negligible probability of computing a valid authentication tag \(T'_0\) for any seed \(S'_0\) with respect to the authentication key R. Consequently, the above protocol is secure.

Theorem 5

(see [6, 9]) Assume \(\textsf {nmExt}: \{0, 1\}^n \times \{0, 1\}^{d_1} \rightarrow \{0, 1\}^{m_1}\) is an \((\alpha , \gamma _{\textsf {nmExt}})\)-non-malleable extractor, \(\textsf {Ext}: \{0, 1\}^n \times \{0, 1\}^{d_2} \rightarrow \{0, 1\}^{m_2}\) is a strong \(( \alpha - (d_1 + m_1) - \log \frac{1}{\epsilon '}, \gamma _{\textsf {Ext}})\)-extractor, and \(\{ \textsf {MAC}_r : \{0, 1\}^{d_2} \rightarrow \{0, 1\}^\tau \}_{r \in \{0, 1\}^{m_1}}\) is an \(\varepsilon _{\textsf {MAC}}\)-secure message authentication code. Then for any integers n and \(\alpha \le n\), the protocol in Table 1 is a 2-round \((n, \alpha , m_2, \eta )\)-privacy amplification protocol, with communication complexity \(d_1 + d_2 + \tau \) and \(\eta = \max \{\epsilon '+ \gamma _{\textsf {Ext}}, \gamma _{\textsf {nmExt}}+ \varepsilon _{\textsf {MAC}}\}\).

The explicit non-malleable extractor in this paper can be applied to construct the above privacy amplification protocol with low communication complexity.

6 Conclusion

Non-malleable extractors are a powerful theoretical tool for studying privacy amplification protocols in which the attacker is active and computationally unbounded. In this paper, we improved the error estimation of Raz’s extractor using combination and permutation techniques. Based on this improvement, we presented an improved explicit construction of non-malleable extractors with a shorter seed length. Similar to [6], our construction is also based on sequences of random variables with small bias for linear tests, but our parameters are better. More precisely, we presented an explicit \((1016, \frac{1}{2})\)-non-malleable extractor \(\textsf {nmExt}: \{0, 1\}^{1024} \times \{0, 1\}^{d} \rightarrow \{0, 1\}\) with seed length \(d=19\), while the seed length is no less than \(66 + \frac{46}{63}\) according to Cohen et al. in CCC’12 [6]. We also improved the parameters of the general explicit construction of non-malleable extractors proposed by Cohen et al. and analyzed how the constraints on the parameters can be simplified (see Appendix B for details); further simplification of the constraints remains an open problem. Finally, we showed how our non-malleable extractors apply to privacy amplification (or information-theoretic key agreement) protocols.