Abstract
Very recently (in CRYPTO 2017), Dai, Hoang, and Tessaro introduced the Chi-square method (χ2 method), which can be applied to obtain an upper bound on the statistical distance between two joint probability distributions. The authors applied this method to prove the pseudorandom function security (PRF-security) of the sum of two random permutations. In this work, we revisit their proof and find a non-trivial gap in it. We plug this gap for two specific cases and state the general case as an assumption whose proof is essential for the completeness of the proof by Dai et al. A complete, correct, and transparent proof of the full security of the sum of two random permutations construction is highly desirable, especially due to its importance and two-decades-old legacy. The proposed χ2 method seems to have potential for application to similar problems, where a similar gap may creep into a proof. These considerations motivate us to communicate our observation in a formal way. On the positive side, we provide a very simple proof of the PRF-security of the truncated random permutation construction (a method to construct a PRF from a random permutation) using the χ2 method. We note that a proof of the PRF-security due to Stam is already known for this construction in a purely statistical context. However, the use of the χ2 method makes the proof much simpler.
1 Introduction
Different tools from probability and statistics are now heavily used in different areas of cryptography. In this paper, we focus on a statistical tool, termed the χ2 method, which was introduced by Dai, Hoang, and Tessaro in CRYPTO 2017 [7]. Although a method essentially similar to the χ2 method has been known in statistics (since 1978), we believe that the χ2 method is new in the context of cryptography. In [7], this method has been used to show pseudorandom function security (PRF-security) of two well-known constructions, namely the sum of random permutations [1, 17, 21, 22] and encrypted Davies-Meyer (EDM) [5, 19]. Further, we feel that this method may help us to obtain tight (and simplified) proofs for certain constructions where proofs so far have evaded more classical methods, such as the H-coefficient method [20].
χ2 Method. The distinguishing advantage of a family of keyed functions is bounded by the total variation (also known as statistical distance) between the output distribution of the family and the output distribution of a random function. The total variation between two probability distributions P0 and P1 over a sample space Ω, denoted dTV(P0,P1), is defined as half of the L1-norm \(\| \mathbf {P}_{\mathbf {0}}- \mathbf {P}_{\mathbf {1}}\|_{1} := \sum _{x \in \mathrm {\Omega }} |\mathbf {P}_{\mathbf {0}}(x) - \mathbf {P}_{\mathbf {1}}(x)|\). In [7], the authors revisited a variation of the additivity property of the KL divergence between two joint distributions, which they termed the χ2 method. When P0 and P1 are joint distributions, this method provides an upper bound on ∥P0 − P1∥1 based on the χ2 distances between the conditional distributions of P0 and P1. Next, we recall the definition of the χ2 distance. In what follows, we use the convention that 0/0 = 0.
Definition 1
The χ2 distance between distributions P0 and P1 (over a sample space Ω) with P0 ≪ P1 (i.e., the support of P0 is contained in the support of P1) is defined as

$$d_{\chi^{2}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) := \sum_{x \in \mathrm{\Omega}} \frac{\left(\mathbf{P}_{\mathbf{0}}(x) - \mathbf{P}_{\mathbf{1}}(x)\right)^{2}}{\mathbf{P}_{\mathbf{1}}(x)}.$$
The χ2 distance has its origin in mathematical statistics, dating back to Pearson (see [18] for some history). It can be seen that the χ2 distance is not symmetric and hence is not a metric. However, it is useful for bounding other metrics, e.g., the total variation. In the following, we briefly describe the χ2 method (see Appendix A for details and a proof).
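To make these definitions concrete, the following is a small illustrative sketch (ours, not from [7]) that computes both distances for finite distributions given as Python dictionaries. It also checks the Cauchy-Schwarz consequence \(d_{\text{TV}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) \leq \frac{1}{2}\sqrt{d_{\chi^2}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}})}\), which is one elementary way in which the χ2 distance bounds the total variation.

```python
# Illustrative sketch (not from the paper): total variation and chi-squared
# distance for finite distributions represented as {outcome: probability} dicts.
import math

def d_tv(p0, p1):
    """Total variation: half the L1-norm of the difference."""
    support = set(p0) | set(p1)
    return 0.5 * sum(abs(p0.get(x, 0.0) - p1.get(x, 0.0)) for x in support)

def d_chi2(p0, p1):
    """Chi-squared distance; requires supp(p0) contained in supp(p1)."""
    total = 0.0
    for x in set(p0) | set(p1):
        a, b = p0.get(x, 0.0), p1.get(x, 0.0)
        if b == 0.0:
            if a != 0.0:
                raise ValueError("P0 is not absolutely continuous w.r.t. P1")
            continue  # convention 0/0 = 0
        total += (a - b) ** 2 / b
    return total

p0 = {0: 0.5, 1: 0.3, 2: 0.2}
p1 = {0: 1/3, 1: 1/3, 2: 1/3}

# the chi-squared distance is not symmetric ...
assert abs(d_chi2(p0, p1) - d_chi2(p1, p0)) > 1e-9
# ... but it dominates total variation: d_TV <= (1/2) * sqrt(d_chi2)
assert d_tv(p0, p1) <= 0.5 * math.sqrt(d_chi2(p0, p1))
```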
Let X = (X1,…,X q ) and Y = (Y1,…,Y q ) be two multivariate random variables taking values in Ωq. In order to simplify the notation, we denote by Xi− 1 the joint random variable (X1,…,Xi− 1). Let \(\mathbf {P}_{{0}_{x_{1}, \ldots , x_{i-1}}} \) denote the conditional probability distribution of X i given X1 = x1, …, Xi− 1 = xi− 1. We similarly write \(\mathbf {P}_{{1}_{x_{1},\ldots , x_{i-1}}}\) for the distribution of Y i given \(\mathsf {Y}_{1} = x_{1}\), …, Yi− 1 = xi− 1. Then the χ2 method says

$$d_{\mathrm{TV}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) \leq \left(\frac{1}{2} \sum_{i=1}^{q} \mathrm{Ex}\left[\chi^{2}(\mathsf{X}_{1}, \ldots, \mathsf{X}_{i-1})\right]\right)^{1/2},$$
where \(\chi ^{2}(x_{1}, \ldots , x_{i-1}) = d_{\chi ^{2}}(\mathbf {P}_{{0}_{x_{1},\ldots , x_{i-1}}}, \mathbf {P}_{{1}_{x_{1}, \ldots , x_{i-1}}})\) and for all x1,…,xi− 1, \(\mathbf {P}_{{0}_{x_{1},\ldots , x_{i-1}}}\ll \mathbf {P}_{{1}_{x_{1}, \ldots , x_{i-1}}}\). Note that we need this condition to define \(d_{\chi ^{2}}\).
XOR of Two Random Permutations. The XOR or sum of two random permutations is a well-known construction, proposed and studied by Hall et al. in [13], for the conversion of pseudorandom permutations (PRPs) into pseudorandom functions (PRFs).Footnote 1 Given a permutation π : {0,1}n↦{0,1}n, the construction creates a function f : {0,1}n− 1↦{0,1}n, defined as f(x) = π(0||x) ⊕ π(1||x). When π is chosen uniformly at random from Perm n , the set of all permutations of {0,1}n, how well does f resemble (in a certain well-defined sense) a random function with the same domain and range (a function chosen uniformly from the set of all functions from the domain to the range)? A satisfactory answer to this question remained elusive for over two decades. There have been attempts [1, 17, 21, 22] to prove information-theoretic security of the construction. However, the proofs either fell short of proving full security (to be made precise in the next section) of the construction [17], or were sketchy [1], or contained non-trivial gaps and were difficult to follow [21, 22], as has also been observed by the authors of [7].Footnote 2 Also, as a related problem, Cogliati, Lampe, and Patarin [4] gave weaker bounds for the case of the sum of at least three permutations. The XOR construction is important since it has been used to obtain constructions achieving beyond-birthday (or sometimes almost full) security (e.g., CENC [15], PMAC_Plus [3] and ZMAC [14]).
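As a concrete illustration (our own sketch, with toy parameters, representing the permutation as a shuffled list), the construction can be sampled as follows; injectivity of π immediately implies that f never outputs 0n.

```python
# Illustrative sketch (small parameters): the XOR construction
# f(x) = pi(0||x) xor pi(1||x) from a single random permutation pi.
import random

n = 6                      # block size; the domain of f is {0,1}^(n-1)
N = 1 << n
pi = list(range(N))
random.shuffle(pi)         # a uniformly random permutation of {0,1}^n

def f(x):
    """x is an (n-1)-bit input, encoded as an integer in [0, 2^(n-1))."""
    assert 0 <= x < N // 2
    return pi[x] ^ pi[x | (N // 2)]   # 0||x has top bit 0, 1||x has top bit 1

outputs = [f(x) for x in range(N // 2)]
# pi is injective, so pi(0||x) != pi(1||x) and f never outputs 0^n:
assert all(y != 0 for y in outputs)
```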
1.1 Main results in the paper
In [7], Dai et al. used the χ2 method to prove full security of the XOR construction (XOR of two random permutations). In this paper, we take a closer look at the proof and find a non-trivial gap in it. The gap is due to incorrect equalities involving conditional expectations. We have also made an attempt to fix the proof, and we show that the proof can be fixed under Assumption 1 (stated in Section 4). We prove the assumption for two special cases and leave the general case as an open problem.
In this note, we communicate the above observation formally. This serves two purposes: (a) to motivate a flawless proof of this problem, especially owing to its importance and two-decades-old legacy, and (b) to prevent these types of loopholes from creeping into proofs involving the χ2 method, especially since the method seems to have potential for application to similar problems.
Truncation of a Random Permutation. Although the application (in [7]) of the χ2 method to the XOR construction contains a gap, the technique can be powerful for bounding the PRF-security of other constructions. In fact, in [7], the authors applied this method to bound the PRF-security of the EDM (or encrypted Davies-Meyer) construction. In this note, we apply this technique to the truncated random permutation construction and obtain a very simple proof of the known tight bound on its PRF-security. This construction was studied by Stam (in a statistical context) in 1978 [25] and later by many others (e.g., [1, 8,9,10, 13]). Stam's proof technique is very close to the χ2 method; however, the other proofs are very different and produce different results. The difference between the proof methods of the relevant results from [1, 8, 13] and [25] is discussed in [10]. Our proof approach is more modular and uses the χ2 method explicitly. We discuss these points very briefly in Remark 1 and Remark 2.
The PRF property of the truncated random permutation construction has recently been used in the key derivation for AES-GCM-type, counter-based authenticated encryption constructions [11].
1.2 Organization of the paper
The rest of the paper is organized as follows. In the next section, we provide a brief overview of the relevant security notions and the χ2 method. There we also discuss the two constructions: the XOR of two random permutations construction and the truncated random permutation construction. Section 3 is devoted to the proof of Theorem 2. In Section 4, we discuss the proof, by Dai et al., of the full security of the XOR of two random permutations construction, where we also point out the gap in it. In Section 5, we provide the proofs of Assumption 1 for two specific cases. We conclude in Section 6 by remarking on an identical gap in the proof of full security of a related construction, termed the XOR of two independent random permutations. Finally, in Appendix A, we provide a self-contained proof of the χ2 method; the essential ingredients of the proof are the same as in [7], but we also cover the finer details (such as the proof of Pinsker's inequality).
2 Preliminaries
Notation and Convention
We use the short-hand notation Xt to denote a tuple (X1,…,X t ). We also write \({\mathcal {S}}^{t}\) to denote the t-fold Cartesian product of the set \(\mathcal {S}\) with itself. It will be clear from the context whether Xt means a t-tuple (when X is a tuple) or product set (when X is a set).
We use notations X,Y,Z etc. (possibly with suffix) to represent random variables over some sets. Following the above notational convention, Xt would represent a t-tuple of random variables or random vector (X1,…,X t ). We use \(\mathcal {E}, \mathcal {S}, \mathcal {T}\) etc. (possibly with suffix) to denote sets. \(\mathcal {A}\) will always represent an adversary.
In this paper, we fix a positive integer n, and we denote 2n by N.
2.1 PRF-security definition
The pseudorandom function (PRF) is a very popular security notion in cryptography. While analyzing a message authentication code (MAC), we mostly study its PRF-security, as it is a stronger notion than MAC-security. PRFs have also been used to define encryption schemes, authenticated encryptions, and other cryptographic algorithms.
Now we formally define the PRF-advantage of an algorithm or a keyed function. By \(\mathsf {X} \leftarrow _{\$} \mathcal {S}\) we mean that X is sampled uniformly from a finite set \(\mathcal {S}\). Let m and p be positive integers. Let RP m denote the random permutation chosen uniformly from Perm m , the set of all permutations on {0,1}m, i.e., RP m ← $ Perm m . Similarly, let RFm→p← $ Funcm→p (the set of all functions from {0,1}m to {0,1}p). Let \(\mathcal {K}\) be a finite set (it is the key space of the construction). Given a function \(f : \mathcal {K} \times \{0,1\}^{m} \to \{0,1\}^{p}\) and for every \(k \in \mathcal {K}\), we denote f k to represent the function (also called keyed function) f(k,⋅) ∈Funcm→p. We now define the PRF-advantage of an oracle adversary \(\mathcal {A}\) against f as follows.
Definition 2 (PRF-advantage)
Let \(\mathcal {A}\) be a distinguisher (oracle algorithm) and \(f : {\mathcal {K}} \times \{0,1\}^{m} \to \{0,1\}^{p}\). Then, the PRF-advantage of \(\mathcal {A}\) against f is defined as
As we restrict ourselves to deterministic keyed functions (i.e., functions which give the same output on the same input), we can assume, without loss of generality, that the adversary does not repeat its queries. In other words, if Q1,…,Q q are all the queries then these are distinct. We can also assume that \(\mathcal {A}\) is deterministic as it can always run with the best random coins which maximize the advantage. Suppose \(\mathcal {A}\) makes q distinct queries adaptively, denoted Q1,…,Q q , and obtains responses U1,…,U q . So, when \(\mathcal {A}\) is interacting with RFm→p, the outputs are uniformly and independently distributed over {0,1}p, which we denote as U1,…,U q ← $ {0,1}p.
Similarly, let X1,…,X q denote the outputs of fK where \(\textsf {K} \leftarrow _{\$} {\mathcal {K}}\). We denote the probability distributions associated with U1,…,U q and X1,…,X q by P 1 and P 0 respectively. Thus,
where \({\mathcal {E}}\) is the set of all q-tuples of responses xq := (x1,…,x q ) ∈ ({0,1}p)q for which \(\mathcal {A}\) returns 1. From the definition, the total variation (also known as the statistical distance) between P 0 and P 1 is
Hence,
Footnote 3 Thus, the main cryptographic objective (that of determining the PRF-advantage \({{\mathbf {Adv}}^{\text {prf}}_{f}}(\mathcal {A})\)) turns out to be a purely probability or statistical problem. Next, we discuss the χ2 method which provides an upper bound of total variation between two joint distributions.
2.2 χ 2 method
Let \(\mathsf {X} := (\mathsf {X}_{1}, \ldots , \mathsf {X}_{q})\) and Z := (Z1,…,Z q ) be two random vectors of size q distributed over Ωq. Let us denote the probability distributions of X and Z by P 0 and P 1 respectively. We denote the conditional probability distributions as follows.
When i = 1, \(\mathbf {P}_{\mathbf {0}|x^{i-1}}(x_{1})\) represents \(\mathbf {P}(\mathsf {X}_{1} = x_{1})\). Similarly, for \(\mathbf {P}_{\mathbf {1}|x^{i-1}}(x_{1})\). Let \(x^{i-1} \in \mathrm {\Omega }^{i-1}\), i ≥ 1. Let us denote the χ2 distance between \(\mathbf {P}_{\mathbf {0}|x^{i-1}}\) and \(\mathbf {P}_{\mathbf {1}|x^{i-1}}\) as \(\chi ^{2}(x^{i-1})\), i.e.,
Thus, χ2 is a real valued function. The next theorem is the crux of the χ2 method; it bounds the total variation between two joint distributions in terms of the χ2 distance between the corresponding conditional distributions.
Theorem 1 ([7])
Suppose P 0 and P 1 denote the probability distributions of X := (X1,…,X q ) and \(\mathsf {Z} := (\mathsf {Z}_{1}, \ldots , \mathsf {Z}_{q})\), and for all \(x_{1}, \ldots , x_{i-1}\) we have \(\mathbf {P}_{\mathbf {0}|x^{i-1}}\ll \mathbf {P}_{\mathbf {1}|x^{i-1}}\). Then

$$d_{\mathrm{TV}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) \leq \left(\frac{1}{2} \sum_{i=1}^{q} \mathrm{Ex}\left[\chi^{2}(\mathsf{X}^{i-1})\right]\right)^{1/2}.$$
For the sake of completeness, we provide a full proof of this theorem in Appendix A. In our setup, note that \(\mathsf {Z}_{1}, \ldots , \mathsf {Z}_{q} \leftarrow _{\$} {\{0,1\}}^{p}\) for some p and hence \(\mathbf {P}_{\mathbf {1}|x^{i-1}}(x_{i}) = \frac {1}{2^{p}}\) for all xi. So,

$$\chi^{2}(x^{i-1}) = \sum_{x_{i}} \frac{\left(\mathbf{P}_{\mathbf{0}|x^{i-1}}(x_{i}) - 2^{-p}\right)^{2}}{2^{-p}} = 2^{p} \sum_{x_{i}} \left(\mathbf{P}_{\mathbf{0}|x^{i-1}}(x_{i}) - 2^{-p}\right)^{2}.$$
In the following subsection, we describe two constructions for which this method was applied.
2.3 Two random permutation based constructions
In this paper, we mainly deal with two constructions based on a random permutation RP n . Similar to a random function, if all queries to a random permutation RP n are distinct and depend only on the previous responses (which is the case for an adversary), the outputs V1,…,V q behave like a random sample without replacement (WOR) from {0,1}n. We write \(\mathsf {V}_{1}, \ldots , \mathsf {V}_{q} \gets _{\text {wor}} \{0,1\}^{n}\) to denote this. More formally, for all distinct \(x_{1}, \ldots , x_{q} \in \{0,1\}^{n}\), \(\textbf {P}(\mathsf {V}_{1} = x_{1}, \ldots \mathsf {V}_{q} = x_{q}) = \frac {1}{(N)_{q}}\), where (N) q = N(N − 1)⋯(N − q + 1). Now, we briefly describe the constructions.

(1) XOR Construction. Define XOR π : {0,1}n− 1 →{0,1}n to be the construction that takes a permutation π ∈ Perm n and, on input x ∈{0,1}n− 1, returns π(x∥0) ⊕ π(x∥1). Thus, the XOR construction based on a random permutation RP n returns X1,…,X q where X1 :=V1 ⊕V2, …, X q :=V2q− 1 ⊕V2q and V1,…,V2q←wor{0,1}n.

(2) trRP Construction. Let m ≤ n and trunc m denote the truncation function which returns the first m bits of x ∈{0,1}n. A truncated random permutation is the composition of a random permutation followed by the truncation function. More formally, we define for every x ∈{0,1}n,
Note that it is a function family, keyed by random permutation, mapping the set of all n-bit sequences to the set of all m-bit sequences. Let X1,…,X q denote the q outputs of trRP m . Then X i =trunc m (V i ) for all i.
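The following small sketch (ours, with toy parameters) samples both output sequences through the WOR view described above; V is the WOR sample, and an integer shift implements trunc m .

```python
# Illustrative sketch: outputs of the two constructions expressed via a
# without-replacement (WOR) sample V_1, V_2, ... from {0,1}^n.
import random

n, m, q = 8, 3, 5
N = 1 << n
V = random.sample(range(N), 2 * q)     # V_1, ..., V_2q  <-wor  {0,1}^n

# XOR construction: X_i = V_{2i-1} xor V_{2i}
xor_outputs = [V[2 * i] ^ V[2 * i + 1] for i in range(q)]
assert all(x != 0 for x in xor_outputs)          # XOR outputs avoid 0^n

# trRP construction: X_i = trunc_m(V_i), the first m bits of V_i
trrp_outputs = [v >> (n - m) for v in V[:q]]
assert all(0 <= x < (1 << m) for x in trrp_outputs)
```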
PRF-security of this construction has been studied by Stam in 1978, though in a much broader context (see [25] for details), and later by others (e.g., [1, 8,9,10, 13]). In particular, Stam proved the following statement.
Theorem 2 ([25])
Let V1,…,V q ←wor {0,1}n, U1,…,U q ← $ {0,1}m and X i = trunc m (V i ) for all i. Then
where X = (X1,…,X q ) and U = (U1,…,U q ).
The following corollary (though not proved by Stam) is immediate from the relationship between PRF-advantage and total variation.
Corollary 1
Let M = 2m, N = 2n and m ≤ n. For any adversary \(\mathcal {A}\) making q queries we have
Remark 1
The upper bounds on the PRF-advantage of the trRP construction given in [8, 13] are different from (and weaker than) the one obtained by Stam, although the bounds are similar for some choices of parameters. In [10], all these results are mentioned, and the proofs are briefly surveyed. Moreover, [10] proves a general tight lower bound on the PRF-advantage (improving on the lower bound declared in [13]).
3 Proof of theorem 2 using the χ 2 method
Now we provide an alternative proof of Theorem 2 using the χ2 method. We briefly recall the setup. Here V1,…,V q ←wor {0,1}n and X i =trunc m (V i ). Let x ∈{0,1}m, let i ≥ 1 be an integer, and let K = N/M. Also, let H denote the number of indices j < i for which trunc m (V j ) = x. The probability distribution of H is well known as the hypergeometric distribution HG(N,K,i − 1). Writing s := i − 1, for every max(0,s + K − N) ≤ a ≤ min(K,s) we have
The following fact states the expectation and variance formulas for a hypergeometric distribution. The proof can be found in standard probability theory textbooks and hence we skip it.
Fact 1
Let H follow hypergeometric distribution HG(N,K,s) and let p denote \(\frac {K}{N}\). Then,
As an aside, we mention that the factor \(\frac {N-s}{N-1}\) is also known as the finite sampling correction factor. Up to this factor, the expression of the variance is the same as that of the binomial distribution.
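Fact 1 can be checked exactly from the probability mass function of HG(N,K,s). The sketch below (ours) does so for one arbitrary choice of parameters, using the standard formulas E[H] = sp and Var[H] = sp(1 − p)(N − s)/(N − 1) with p = K/N.

```python
# Sketch verifying Fact 1 exactly: mean and variance of HG(N, K, s)
# computed from the probability mass function via binomial coefficients.
from math import comb

def hg_pmf(N, K, s):
    lo, hi = max(0, s + K - N), min(K, s)
    return {a: comb(K, a) * comb(N - K, s - a) / comb(N, s)
            for a in range(lo, hi + 1)}

N, K, s = 64, 16, 10
p = K / N
pmf = hg_pmf(N, K, s)
mean = sum(a * pr for a, pr in pmf.items())
var = sum((a - mean) ** 2 * pr for a, pr in pmf.items())

assert abs(mean - s * p) < 1e-9                               # E[H] = s*p
assert abs(var - s * p * (1 - p) * (N - s) / (N - 1)) < 1e-9  # Fact 1 variance
```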
Now, we apply the χ2 method to bound the total variation dTV(X,U), where U1,…,U q ← $ {0,1}m. Let P 0 and P 1 denote the probability distributions of X and U respectively. Note that
Let \(N_{i,x}(x^{i-1}) := |{\mathcal {S}}_{i,x}(x^{i-1})|\) and Hi,x = Ni,x(Xi− 1). Then it is easy to see from the definition of the hypergeometric distribution that Hi,x follows HG(N,N/M,(i − 1)). Now, we compute the χ2 function evaluated at xi− 1.
Hence,
This follows from the linearity of expectation and the fact that Ex[Hi,x] = (i − 1)/M. By substituting the value of Var[Hi,x] as described in Fact 1, we obtain
Now by using Theorem 1 we have
Remark 2
In order to draw a comparison between our proof (using the χ2 method) of Theorem 2 and the proof due to Stam, we remark that the main ideas of both proofs are the same; namely, both use the chain rule of the KL divergence, the concavity of the logarithm function, and also the hypergeometric distribution. However, unlike in our case (in (6)), Stam did not make explicit use of the variance of the hypergeometric distribution; instead, he used Jensen's inequality. Moreover, our proof is simpler and more modular than Stam's, with a more direct approach.
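As a sanity check of this derivation, the following brute-force sketch (ours, with tiny parameters n = 4, m = 2, q = 3) builds the exact joint distribution of the trRP outputs, computes the exact total variation from uniform, and verifies that it is dominated by the χ2 bound \(\big (\frac {1}{2}\sum _{i} \mathrm {Ex}[\chi ^{2}(\mathsf {X}^{i-1})]\big )^{1/2}\) of Theorem 1.

```python
# Sketch: exact check of the chi-squared bound for trRP with tiny parameters.
# We enumerate all WOR sequences, build the joint output distribution P0,
# and compare d_TV(P0, uniform) against sqrt(0.5 * sum_i Ex[chi2(X^{i-1})]).
from itertools import permutations
from collections import defaultdict
from math import sqrt

n, m, q = 4, 2, 3
N, M = 1 << n, 1 << m

# Joint distribution P0 of (X_1, ..., X_q) with X_i = trunc_m(V_i)
P0 = defaultdict(float)
for vs in permutations(range(N), q):
    P0[tuple(v >> (n - m) for v in vs)] += 1 / (N * (N - 1) * (N - 2))

# Exact total variation from the uniform distribution on ({0,1}^m)^q
d_tv = 0.5 * sum(abs(P0.get((a, b, c), 0.0) - M ** -q)
                 for a in range(M) for b in range(M) for c in range(M))

# (1/2) * sum_i Ex[chi2(X^{i-1})], expectation under P0's prefix marginal
bound_sq = 0.0
for i in range(1, q + 1):
    prefix = defaultdict(float)   # marginal of (X_1, ..., X_{i-1})
    joint = defaultdict(float)    # marginal of (X_1, ..., X_i)
    for xs, pr in P0.items():
        prefix[xs[:i - 1]] += pr
        joint[xs[:i]] += pr
    for xp, prp in prefix.items():
        chi2 = sum((joint.get(xp + (x,), 0.0) / prp - 1 / M) ** 2 * M
                   for x in range(M))
        bound_sq += 0.5 * prp * chi2

assert d_tv <= sqrt(bound_sq)     # Theorem 1's guarantee holds exactly
```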
4 Overview of the proof by Dai et al. and its flaw
In this section, we provide a brief overview of the proof by Dai et al. in order to precisely point out the gap in it. We mostly follow the notation of the authors along with our own notational conventions. For example, we mostly use N instead of 2n. Moreover, for simplicity, we write the set {0,1}n ∖{0n} as [N]∗.
Theorem 3 ([7])
Fix an integern ≥ 8and letN = 2n. Forany adversary\(\mathcal {A}\)that makes\(q \leq \frac {N}{32}\)queries we have
Proof due to Dai et al. in [7]
Let \(\mathcal {A}\) be an adversary making exactly q distinct queries adaptively. As we have observed before, the output distributions of random function and XOR function do not depend on \(\mathcal {A}\). In fact, \(\mathsf {U}^{\prime }_{1}, \ldots , \mathsf {U}^{\prime }_{q} \leftarrow _{\$} \{0,1\}^{n}\) and X1 :=V1 ⊕V2,…,X q :=V2q− 1 ⊕V2q are the outputs of random function and XOR construction respectively, where V1,…,V2q←wor{0,1}n. Let P 1 and P2 denote the output distributions of X := (X1,…,X q ) and \(\mathsf {U}^{\prime } := (\mathsf {U}^{\prime }_{1}, \ldots , \mathsf {U}^{\prime }_{q})\) respectively. Thus,
Now, we note that the X i ’s cannot take the value 0n and hence it is natural to consider the q-tuple of random variables U1,…,U q ← $ [N]∗ := {0,1}n ∖{0n}. Let us denote by P 0 the probability distribution of U1,…,U q . By simple algebra, we have dTV(P 0 ,P2) ≤ q/2n. Also, using the triangle inequalityFootnote 4, we have
At this point, the χ2 method (i.e., Theorem 1) gives an upper bound on dTV(P 0 ,P 1 ). The rest of the proof is devoted to showing that \(d_{\text {TV}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}}) \leq \frac {0.5q + 3 \sqrt {q}}{2^{n}}\).
For every non-zero x1,…,x i , we clearly have \(\mathbf {P}_{\mathbf {0}|x^{i-1}}(x_{i}) = 1/(N-1)\). For simplicity, let us denote by yi,x the conditional probability \(\mathbf {P}_{\mathbf {1}|x^{i-1}}(x)\), which is also a function of xi− 1. When xi− 1 is chosen following the distribution of Xi− 1, we denote yi,x as Yi,x. From the definition of the χ2 function corresponding to (X1,…,X q ) and (U1,…,U q ), we have
Now, we give a brief description of the remaining, critical part of the proof, where the authors provide an upper bound on Ex[χ2(Xi− 1)]. We keep the authors' flow (suppressing some calculations, which will be denoted as ∗∗∗) and wording. However, we change some of their notations in order to make them consistent with ours. The authors complete the proof as described below.
We now expand Yi,x into a more expressive and convenient formula to work with. ∗∗∗ Let S = {V1,V2,…,V2i− 2}. Let Di,x be the number of pairs (u,u ⊕ x) such that both u and u ⊕ x belong to S. Note that S and Di,x are both random variables, and in fact functions of the random variables V1,V2,…,V2i− 2. ∗∗∗ Hence,
From (7),
In the last formula, it is helpful to think of each Di,x as a function of V1,V2,…,V2i− 2, and the expectation is taken over the choices of V1,V2,…,V2i− 2 sampled uniformly without replacement from {0,1}n. We will show thatFootnote 5 for any x ∈{0,1}n ∖{0n},
and thus
Summing up, from χ2-method
4.1 Flaw in the above proof and its repair under an assumption
Let us revisit (8). Let us fix distinct v1,…,v2i− 2 and define the set \({\mathcal {S}} = \{v_{1}, \ldots , v_{2i-2} \}\). Let Di,x denote the number of pairs (u,u ⊕ x) such that both u and u ⊕ x belong to \(\mathcal {S}\). Let x1 = v1 ⊕ v2,…,xi− 1 = v2i− 3 ⊕ v2i− 2. Now, it is easy to see that
which appeared in the right hand side of (8). Whereas the left hand side of the equation is P(X i = x|X1 = x1,…,Xi− 1 = xi− 1). Note that in general,
does not hold for every v1,…,v2i− 2. Hence, (8) is incorrect.
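The failure of this pointwise identity can be checked by brute force. In the following sketch (ours, with N = 8 and i = 3), conditioning on (v1,…,v4) = (0,1,2,3), so that x1 = x2 = 1, gives the conditional probability 1/3 for x = 3, whereas conditioning only on (x1,x2) gives 1/9.

```python
# Sketch (N = 8, i = 3): brute-force demonstration that
# P(X_3 = x | X_1 = x_1, X_2 = x_2) can differ from
# P(V_5 xor V_6 = x | V_1 = v_1, ..., V_4 = v_4), even when x_1 = v_1 xor v_2
# and x_2 = v_3 xor v_4.
from itertools import permutations

N = 8  # i.e., n = 3

def given_v(vs, x):
    """P(V5 xor V6 = x | V^4 = vs): V5, V6 drawn WOR from the remaining values."""
    rest = [u for u in range(N) if u not in vs]
    pairs = [(a, b) for a in rest for b in rest if a != b]
    return sum(1 for a, b in pairs if a ^ b == x) / len(pairs)

def given_x(x1, x2, x):
    """P(X3 = x | X1 = x1, X2 = x2): average of given_v over consistent v^4."""
    vals = [given_v(vs, x) for vs in permutations(range(N), 4)
            if vs[0] ^ vs[1] == x1 and vs[2] ^ vs[3] == x2]
    return sum(vals) / len(vals)

v = (0, 1, 2, 3)                             # so x1 = x2 = 1
lhs = given_x(v[0] ^ v[1], v[2] ^ v[3], 3)   # = 1/9
rhs = given_v(v, 3)                          # = 1/3
assert abs(lhs - 1/9) < 1e-12 and abs(rhs - 1/3) < 1e-12 and lhs != rhs
```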
After observing this flaw in the proof, let us see how we can fix it. If we can prove (10) in some other way, we can still continue with the rest of the proof. This can be done if we can prove a variant of (8) as follows:
where c = 1/(N − 1). In other words,
The above equation is equivalent to
We will show in the subsequent section that (14) is not actually true. This in fact shows that (13) is a strict inequality for certain choices of the v i ’s. However, the proof can still survive if we have the following weaker statement, which we place here as an assumption.
Assumption 1
Under this assumption, we can justify the inequalities appearing in (9) and (10). Hence, we can complete the proof of PRF-security of XOR construction under the above assumption. In the following section, we study the assumption for i = 2 and i = 3.
5 On the correctness of Assumption 1
In this section, we study the correctness of our assumption stated in the previous section for some small choices of i, namely for i = 2 and 3.
Theorem 4
Proof
For notational simplicity (identifying {0,1}n with {0,…,N − 1}), we will write [N]∗ to represent the set {1,2,…,N − 1}. Now, we evaluate the two sides of (16).
L.H.S. of (16): Expanding the l.h.s. of (16) we get
Now, it follows that
Next, we split the sum in (17) into the following two subcases: (i) Case 1.1, where we consider the condition x2 = x1, and (ii) Case 1.2, where we consider the condition x2≠x1. In each of the subcases, we determine the number of tuples (x2,x1) and probability P(X2 = x2,X1 = x1) of a particular tuple satisfying the conditions of the subcase.
Case 1.1: (x2 = x1).
Next, we have
Therefore,
Case 1.2: (x2≠x1).
Now,
Hence,
R.H.S. of (16): Expanding the r.h.s. of (16) we get
Next, it follows that
Similar to the l.h.s., we split the sum in the r.h.s. of (21) depending on the conditions (i) x2 = v1 + v2 (Case 2.1), and (ii) x2≠v1 + v2 (Case 2.2). In each of the subcases, we determine the number of tuples (x2,v2) and the probability P(X2 = x2,V2 = v2) of a particular tuple satisfying the conditions of the subcase.
Case 2.1: (x2 = v1 + v2).
So,
Case 2.2: (x2≠v1 + v2).
Therefore,
The theorem follows by comparing (20) and (24).
So we have shown that our assumption is valid for i = 2 (in fact, the two sides are equal). Now we show that the assumption is also valid for i = 3; in this case, we have a strict inequality.
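Consistent with the equality for i = 2, a brute-force check (ours, with N = 8) confirms that the conditional probability of X2 given X1 = x1 coincides with the conditional probability of V3 ⊕V4 given any (v1,v2) with v1 ⊕ v2 = x1: the latter does not depend on the particular choice of (v1,v2).

```python
# Sketch (N = 8, i = 2): for the first two queries, the conditional
# P(V3 xor V4 = x | V1 = v1, V2 = v2) depends on (v1, v2) only through
# x1 = v1 xor v2, so it agrees pointwise with P(X2 = x | X1 = x1).
from itertools import permutations

N = 8

def given_v(v1, v2, x):
    rest = [u for u in range(N) if u not in (v1, v2)]
    pairs = [(a, b) for a in rest for b in rest if a != b]
    return sum(1 for a, b in pairs if a ^ b == x) / len(pairs)

for x1 in range(1, N):
    for x in range(1, N):
        vals = {given_v(v1, v2, x)
                for v1, v2 in permutations(range(N), 2) if v1 ^ v2 == x1}
        assert len(vals) == 1   # the conditional does not depend on (v1, v2)
```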
Theorem 5
Proof
As in the proof of the previous theorem, we consider the two sides of (25) separately. However, in this proof, the calculations are more involved. In order to help the reader follow the steps of the proof, we show the proof structure in Fig. 1. In the figure, the root node corresponds to (25), and its two children correspond to calculation of the two sides of (25). In the remaining nodes, we show the conditions corresponding to the subcases.
L.H.S. of (25): Expanding the l.h.s. of (25) we get
We split the outer sum in the r.h.s. of (26) depending on the conditions (i) x1 = x2 (Case 1.1), and (ii) x1≠x2 (Case 1.2). For each of these subcases and their further subcases, we determine the number of tuples (x3,x2) and the probability P(X3 = x3,X2 = x2) of a particular tuple satisfying the conditions of the subcase.
Case 1.1: (x1 = x2).
In this case, we have
Next, we consider the following two subcases of this case: (i) Case 1.1.1, where we consider the condition x3 = x1, and (ii) Case 1.1.2, where we consider the condition x3≠x1.
Case 1.1.1: (x3 = x1 = x2).
Case 1.1.2:(x3≠x1 = x2)
In order to calculate the probability in this subcase, we first observe that for fixed x1, the number of possible choices for (v1,v2) is N. Since v3,v4∉{v1,v2}, the number of possible choices for (v3,v4) is N − 2. Further, we also require v5,v6∉{v1,v2,v3,v4}. This is equivalent to the condition that v5∉S = {v1,v2,v3,v4}∪{v1 + x3,v2 + x3,v3 + x3,v4 + x3}. Therefore, in order to determine the number of possible choices of (v5,v6), we need to calculate the cardinality of the set S.
Here, we observe that in order to calculate the cardinality of S, it is enough to determine |{v1}∩{v1 + x3,v2 + x3,v3 + x3,v4 + x3}|. It is clear that v1∉{v1 + x3,v2 + x3}. Hence, we are left with the following two subcases: Case 1.1.2.a and Case 1.1.2.b.
Case 1.1.2.a: (v1 ∈{v3 + x3,v4 + x3}). We have the following two possibilities.

1. v1 = v3 + x3: In this case, it also follows that v2 = v4 + x3. So, |S| = 4.

2. v1 = v4 + x3: Similar to the above subcase. Hence, |S| = 4.

Case 1.1.2.b: (v1∉{v3 + x3,v4 + x3}). In this case, |S| = 8.
Therefore, out of the total N(N − 2) choices of \(v^{4} := (v_{1}, \ldots , v_{4})\), for 2N choices we have v1 ∈{v3 + x3,v4 + x3}. For these 2N choices of \(v^{4}\), the number of possible choices of (v5,v6) is N − 4. For the remaining N(N − 4) choices of \(v^{4}\), the number of possible choices of (v5,v6) is N − 8. Hence,
Summing up the cases 1.1.1 and 1.1.2, we get
Case 1.2:(x1≠x2).
Here, we have that for fixed x1, the number of possible choices of (v1,v2) is N. Now, v3∉{v1,v2}∪{v1 + x2,v2 + x2}. Also, {v1,v2}∩{v1 + x2,v2 + x2} = ∅. Therefore, the number of possible choices of (v3,v4) is N − 4. So,
Next, we consider the following three subcases depending on (i) x3 = x1 (Case 1.2.1), (ii) x3 = x2 (Case 1.2.2), and (iii) x3∉{x1,x2} (Case 1.2.3).
Case 1.2.1: (x3 = x1)
By following the same argument as in Case 1.1.2, we get
Case 1.2.2: (x3 = x2). This subcase is the same as the previous one, i.e., Case 1.2.1.

Case 1.2.3: (x3∉{x1,x2}). Here, we have the following subcases to consider depending on (i) x3 = x1 + x2 (Case 1.2.3.a) and (ii) x3≠x1 + x2 (Case 1.2.3.b).
Case 1.2.3.a:(x3 = x1 + x2).
Following an analysis similar to Case 1.1.2 we obtain
Case 1.2.3.b:(x3≠x1 + x2).
In this case also, we follow an analysis similar to Case 1.1.2 to obtain
By adding the cases 1.2.1, 1.2.2, 1.2.3.a, and 1.2.3.b, we obtain
Finally, by adding the expressions obtained in Cases 1.1 and 1.2, we get the following expression for the l.h.s. of (25).
where the numerator and the denominator of the above expression are \(N^{5} - 22N^{4} + 195N^{3} - 870N^{2} + 1936N - 1568\) and \(N^{6} - 23N^{5} + 217N^{4} - 1073N^{3} + 2926N^{2} - 4160N + 2400\), respectively.
R.H.S. of (25): Next, we obtain the expression for the r.h.s. of (25). As in the case of the l.h.s., we expand the sum under expectation as follows.
Clearly, we have
Similar to Case 1, we split the outer sum in the r.h.s. of (31) into two subcases: (i) in Case 2.1, we consider the condition v1 + v2 = v3 + v4, and (ii) in Case 2.2, we consider the condition v1 + v2≠v3 + v4. For each of these subcases and their further subcases, we determine the number of tuples (x3,v4) and the probability P(V5 +V6 = x3,V4 = v4) of a particular tuple satisfying the conditions of the subcase.

Case 2.1: (v1 + v2 = v3 + v4). We further divide this subcase according to the conditions (i) x3 = v1 + v2 = v3 + v4 (Case 2.1.1), and (ii) x3≠v1 + v2 = v3 + v4 (Case 2.1.2).
Case 2.1.1:(x3 = v1 + v2 = v3 + v4).
Clearly, we have
And for fixed (v5,v6) and v4,
So,
Case 2.1.2: (x3≠v1 + v2 = v3 + v4). By following an argument similar to Case 1.1.2, we arrive at the following two subcases depending on the conditions (i) v1 ∈{v3 + x3,v4 + x3} (Case 2.1.2.a), and (ii) v1∉{v3 + x3,v4 + x3} (Case 2.1.2.b).

Case 2.1.2.a: (v1 ∈{v3 + x3,v4 + x3}).
The number of possible choices of the pair (v5,v6) is N − 4. Therefore,
Case 2.1.2.b:(v1∉{v3 + x3,v4 + x3}).
Also, the number of possible choices of the pair (v5,v6) is N − 8. So, in this case we have
So, summing up the cases 2.1.1, 2.1.2.a and 2.1.2.b, we get
Case 2.2: (v1 + v2≠v3 + v4). We split this case according to the conditions (i) x3 ∈{v1 + v2,v3 + v4} (Case 2.2.1), (ii) x3 = v1 + v2 + v3 + v4 (Case 2.2.2), and (iii) x3∉{v1 + v2,v3 + v4,v1 + v2 + v3 + v4} (Case 2.2.3).

Case 2.2.1: (x3 ∈{v1 + v2,v3 + v4}).
In this case, the number of possible choices for the pair (v5,v6) is N − 6. So, we have
Case 2.2.2:(x3 = v1 + v2 + v3 + v4)
The number of possible choices for the pair (v5,v6) is N − 8. Hence,
Case 2.2.3: (x3∉{v1 + v2,v3 + v4,v1 + v2 + v3 + v4}). We further divide this case into the following two subcases depending on (i) v1 ∈{v3 + x3,v4 + x3} (Case 2.2.3.a), and (ii) v1∉{v3 + x3,v4 + x3} (Case 2.2.3.b).

Case 2.2.3.a: (v1 ∈{v3 + x3,v4 + x3}).
The number of possible choices for the pair (v5,v6) is N − 6. Therefore,
Case 2.2.3.b: (v1∉{v3 + x3,v4 + x3}).
The number of possible choices of the pair (v5,v6) is N − 8. So, in this case, we have
So, summing up the subcases 2.2.1, 2.2.2, 2.2.3.a, and 2.2.3.b, we get
Finally, adding (32) and (33) we get
The theorem follows by comparing the final expressions obtained for the two sides (the difference between the two being \(\frac{16(N^{2} - 8N + 8)}{(N-2)(N-3)(N-4)^{2}(N-5)^{2}}\), which is > 0 for N ≥ 8).
6 Concluding remarks
In the concluding part of their paper [7], the authors consider the XOR of two independent random permutations construction, which is a variant of the XOR of two random permutations construction. Given two independent random permutations π1,π2 of {0,1}n, the construction creates a function g : {0,1}n↦{0,1}n as g(x) = π1(x) ⊕ π2(x). The authors show full security of this construction as well. The steps of the proof are quite similar to the previous one. We mention that the same gap exists in this proof as well (in the same step as (8) in this paper), which can be demonstrated in a similar manner as our proofs in this note. To avoid repetition, we do not show this part.
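For completeness, here is a small sketch (ours, with toy parameters) of this variant; note that, unlike the single-permutation XOR construction, g can output 0n (exactly when π1(x) = π2(x)), and its domain is all of {0,1}n.

```python
# Illustrative sketch: the variant g(x) = pi1(x) xor pi2(x) built from two
# independent random permutations of {0,1}^n (small n).
import random

n = 6
N = 1 << n
pi1, pi2 = list(range(N)), list(range(N))
random.shuffle(pi1)
random.shuffle(pi2)

g = [pi1[x] ^ pi2[x] for x in range(N)]
# Unlike the single-permutation XOR construction, g(x) = 0^n is possible here
# (exactly when pi1(x) = pi2(x)); the domain is the full set {0,1}^n.
assert len(g) == N and all(0 <= y < N for y in g)
```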
Notes
This line of work was initiated by Bellare et al. in [2] who coined the term “Luby-Rackoff backwards” for such conversion.
A quote from the paper [7]: “Patarin's tight proof is very involved, with some claims remaining open or unproved.”
In fact, in this setting, i.e., for information theoretic security, there always exists an adversary \(\mathcal {A}^{\prime }\) such that \( \mathbf {Adv}^{\text {prf}}_{f}(\mathcal {A}^{\prime }) = d_{\text {TV}}(\mathbf {P}_{\mathbf {1}}, \mathbf {P}_{\mathbf {0}})\); \(\mathcal {A}^{\prime }\) returns 1 for any \(x^{q} \in \mathcal {E}^{\prime }\), where \(\mathcal {E}^{\prime }\) is such that \(d_{\text {TV}}(\mathbf {P}_{\mathbf {1}}, \mathbf {P}_{\mathbf {0}})= \mathbf {P}_{\mathbf {0}}({\mathcal {E}}^{\prime }) - \mathbf {P}_{\mathbf {1}}({\mathcal {E}}^{\prime })\).
Triangle inequality of total variation metric can be easily shown from the triangle inequality in real numbers.
This has been shown later in the proof given by Dai et al. Since we do not provide details of that claim in this paper, we skip its proof here.
Let a1,…,a n and b1,…,b n be nonnegative numbers. We denote the sums \(\sum _{i} a_{i}\) and \(\sum _{i} b_{i}\) by a and b, respectively. The log sum inequality states that \(\sum _{{i = 1}}^{n}a_{i}\log \frac {a_{i}}{b_{i}} \geq a\log \frac {a}{b}\).
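The log sum inequality is easy to check numerically; the following sketch (helper names are our own, natural logarithms throughout) compares both sides on random nonnegative inputs.

```python
import math
import random

def log_sum_lhs(a, b):
    # sum_i a_i log(a_i / b_i)
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

def log_sum_rhs(a, b):
    # a log(a / b), where a = sum_i a_i and b = sum_i b_i
    return sum(a) * math.log(sum(a) / sum(b))

rng = random.Random(0)
for _ in range(200):
    a = [rng.uniform(0.1, 2.0) for _ in range(5)]
    b = [rng.uniform(0.1, 2.0) for _ in range(5)]
    assert log_sum_lhs(a, b) >= log_sum_rhs(a, b) - 1e-12
```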
References
Bellare, M., Impagliazzo, R.: A tool for obtaining tighter security analyses of pseudorandom function based constructions, with applications to PRP to PRF conversion. IACR Cryptol. ePrint Arch. 1999, 24 (1999)
Bellare, M., Krovetz, T., Rogaway, P.: Luby-Rackoff backwards: Increasing security by making block ciphers non-invertible, pp 266–280. Springer, Berlin (1998)
Black, J., Rogaway, P.: A block-cipher mode of operation for parallelizable message authentication. In: EUROCRYPT 2002, volume 2332 of LNCS, pp 384–397. Springer (2002)
Cogliati, B., Lampe, R., Patarin, J.: The Indistinguishability of the XOR of k Permutations. In: Fast Software Encryption - 21st International Workshop, FSE 2014, London, UK, March 3-5, 2014. Revised Selected Papers, edited by C. Cid and C. Rechberger, volume 8540 of Lecture Notes in Computer Science, pp 285–302. Springer (2014)
Cogliati, B., Seurin, Y.: EWCDM: An Efficient, Beyond-Birthday Secure, Nonce-Misuse Resistant MAC. In: CRYPTO 2016, Proceedings, Part I, pp 121–149 (2016)
Cover, T.M., Thomas, J.A.: Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), Wiley-Interscience (2006)
Dai, W., Hoang, V.T., Tessaro, S.: Information-theoretic Indistinguishability via the Chi-squared Method, Cryptology ePrint Archive, Report 2017/537. http://eprint.iacr.org/2017/537 (2017)
Gilboa, S., Gueron, S.: Distinguishing a truncated random permutation from a random function, IACR Cryptology ePrint Archive 2015, 773 (2015)
Gilboa, S., Gueron, S.: The advantage of truncated permutations, CoRR arXiv:1610.02518 (2016)
Gilboa, S., Gueron, S., Morris, B.: How Many Queries are Needed to Distinguish a Truncated Random Permutation from a Random Function? Journal of Cryptology (2017)
Gueron, S., Langley, A., Lindell, Y: AES-GCM-SIV: Specification and Analysis, IACR Cryptology ePrint Archive 2017, 168 (2017)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Hall, C., Wagner, D., Kelsey, J., Schneier, B.: Building PRFs from PRPs, pp 370–389. Springer, Berlin (1998)
Iwata, T., Minematsu, K., Peyrin, T., Seurin, Y.: ZMAC: A fast tweakable block cipher mode for highly secure message authentication, IACR Cryptology ePrint Archive 2017, 535 (2017)
Iwata, T.: New blockcipher modes of operation with beyond the birthday bound security. In: Fast Software Encryption, 13th International Workshop, FSE 2006, Graz, Austria, March 15-17, 2006, Revised Selected Papers, edited by M. J. B. Robshaw, volume 4047 of Lecture Notes in Computer Science, pp 310–327. Springer (2006)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Statist. 22(1), 79–86 (1951)
Lucks, S.: The sum of PRPs is a secure PRF. In: EUROCRYPT 2000, volume 1807 of LNCS, pp 470–484. Springer (2000)
Liese, F., Vajda, I.: Convex statistical distances. Teubner, Leipzig (1987)
Mennink, B., Neves, S.: Encrypted Davies-Meyer and its dual: Towards optimal security using Mirror theory, Cryptology ePrint Archive, Report 2017/xxx, to be published in CRYPTO 2017 (2017)
Patarin, J.: The “Coefficients H” Technique. In: Selected Areas in Cryptography, 2008, volume 5381 of LNCS, pp 328-345. Springer (2008)
Patarin, J.: A Proof of Security in O(2^n) for the Xor of Two Random Permutations. In: ICITS 2008, volume 5155 of LNCS, pp 232–248. Springer (2008)
Patarin, J.: Introduction to Mirror Theory: Analysis of Systems of Linear Equalities and Linear Non Equalities for Cryptography, Cryptology ePrint Archive, Report 2010/287. http://eprint.iacr.org/2010/287 (2010)
Reiss, R.-D.: Approximate distributions of order statistics: with applications to nonparametric statistics. Springer Science & Business Media, Berlin (2012)
Slivkins, A.: Lecture Notes CMSC 858G: Bandits, Experts and Games (Lecture 3). http://www.cs.umd.edu/slivkins/CMSC858G-fall16/Lecture3.pdf (2016)
Stam, A.J.: Distance between sampling with and without replacement. Statistica Neerlandica 32(2), 81–91 (1978)
Acknowledgments
We thank the reviewers for their comments and suggestions which have improved the quality of our manuscript.
Additional information
This article is part of the Topical Collection on Special Issue on Statistics in Design and Analysis of Symmetric Ciphers
Appendix A: Proof of the χ2 method
In this section we provide the proof of Theorem 1, which is the heart of the χ2 method. The proof is based on Lemma 1, Lemma 2, and Theorem 6. Along the way we also briefly mention some relevant facts about the KL divergence and the χ2 distance.
Kullback-Leibler Divergence. The Kullback-Leibler divergence (KL divergence), or relative entropy, from P 0 to P 1 is defined as
\[d_{\text{KL}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) = \sum\limits_{x \in {\Omega}} \mathbf{P}_{\mathbf{0}}(x) \log \frac{\mathbf{P}_{\mathbf{0}}(x)}{\mathbf{P}_{\mathbf{1}}(x)}.\]
Note that the KL divergence is defined only if P 0 ≪ P 1 (with the convention that \(0 \log {0\over 0}= 0\)). It was first defined by Kullback and Leibler in 1951 [16] as a generalization of Shannon's notion of entropy (see [6]).
It can be shown that the KL divergence between any two distributions is always non-negative (known as Gibbs’ inequality, see [6]). However, it is not symmetric (i.e., dKL(P 0 ,P 1 )≠dKL(P 1 ,P 0 ) in general) and does not satisfy the triangle inequality. Thus, the KL divergence is not a metric.
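These two properties are easy to observe on a toy example. In the following sketch (the helper `d_kl` is our own, using natural logarithms) the KL divergence is non-negative in both directions but direction-dependent.

```python
import math

def d_kl(p, q):
    # KL divergence with natural logarithm; requires p << q, with the
    # convention 0 log(0/0) = 0 handled by skipping zero-probability terms.
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p0 = [0.6, 0.3, 0.1]
p1 = [0.2, 0.3, 0.5]

# Non-negative in both directions (Gibbs' inequality) ...
assert d_kl(p0, p1) >= 0 and d_kl(p1, p0) >= 0
# ... but not symmetric.
assert abs(d_kl(p0, p1) - d_kl(p1, p0)) > 1e-6
```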
Though not a metric, the KL divergence has some useful properties. For example, the KL divergence between any two product distributions is additive over the corresponding marginals (see [6, 23]). Moreover, the KL divergence between two joint distributions can be obtained as the sum of the KL divergences of the corresponding conditional distributions. This is known as the chain rule of KL divergence, and it is one of the crucial ingredients of the χ2 method. We elaborate on it in more detail below.
Chain rule of KL divergence. Let \( \mathbf {P}_{\mathbf {0}}^{\textit {\textbf {q}}}\) and \(\mathbf {P}_{\mathbf {1}}^{\textit {\textbf {q}}}\) be two probability distributions over Ωq. For 1 ≤ i ≤ q, we write \(\mathbf {P}_{\mathbf {0}}^{ {i}}\) and \(\mathbf {P}_{\mathbf {1}}^{ {i}}\) for the marginal probability distributions of the first i coordinates of \(\mathbf {P}_{\mathbf {0}}^{\textit {\textbf {q}}}\) and \(\mathbf {P}_{\mathbf {1}}^{\textit {\textbf {q}}}\), respectively. In other words, if X := (X1,…,X q ) and Y := (Y1,…,Y q ) are two joint random variables following the probability distributions \(\mathbf {P}_{\mathbf {0}}^{q}\) and \(\mathbf {P}_{\mathbf {1}}^{q}\), then \(\textbf {P}^{i}_{0}\) and \(\textbf {P}^{i}_{1}\) represent the probability distributions of \(\mathsf {X}^{i}\) and \(\mathsf {Y}^{i}\), respectively. We recall that \(\mathbf {P}_{\mathbf {0}|x^{i-1}}(x_{i})\) denotes the conditional probability \(\textbf {P}(\mathsf {X}_{i} = x_{i}|\mathsf {X}^{i-1} = x^{i-1})\), and similarly for \(\mathbf {P}_{\mathbf {1}|x^{i-1}}(x_{i})\). Moreover, \(\text {KL}(x^{i-1}) = d_{\text {KL}}(\mathbf {P}_{\mathbf {0}|x^{i-1}}, \mathbf {P}_{\mathbf {1}|x^{i-1}})\). We can now state the chain rule of KL divergence.
Lemma 1 (Chain rule of KL divergence (see [6], Theorem 2.5.3))
Following the above notations,
\[d_{\text{KL}}\left(\mathbf{P}_{\mathbf{0}}^{q}, \mathbf{P}_{\mathbf{1}}^{q}\right) = \sum\limits_{i=1}^{q} \mathbf{E}_{x^{i-1} \sim \mathbf{P}_{\mathbf{0}}^{i-1}}\left[\text{KL}(x^{i-1})\right].\]
Proof
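The chain rule can also be verified numerically on a small example. The sketch below (all helper names are our own) checks, for q = 2 and Ω = {0, 1}, that the KL divergence of the joints equals the KL divergence of the first marginals plus the P 0-expectation of the conditional KL divergences.

```python
import itertools
import math
import random

omega = (0, 1)

def d_kl(p, q):
    # KL divergence between distributions given as {outcome: probability}.
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

def random_joint(rng):
    keys = list(itertools.product(omega, omega))
    w = [rng.uniform(0.1, 1.0) for _ in keys]
    s = sum(w)
    return {k: wi / s for k, wi in zip(keys, w)}

def marginal(p):
    # Distribution of the first coordinate.
    return {a: sum(p[(a, b)] for b in omega) for a in omega}

def conditional(p, a):
    # Distribution of the second coordinate given that the first equals a.
    m = marginal(p)[a]
    return {b: p[(a, b)] / m for b in omega}

rng = random.Random(1)
p0, p1 = random_joint(rng), random_joint(rng)
m0, m1 = marginal(p0), marginal(p1)

lhs = d_kl(p0, p1)  # KL divergence of the joint distributions
rhs = d_kl(m0, m1) + sum(  # chain rule: marginal term + expected conditional KL
    m0[a] * d_kl(conditional(p0, a), conditional(p1, a)) for a in omega
)
assert abs(lhs - rhs) < 1e-12
```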
The next inequality due to Pinsker (see [6]) gives an upper bound on the total variation distance between two distributions in terms of their KL divergence.
Theorem 6 (Pinsker’s Inequality)
For all probability distributions P 0, P 1,
\[d_{\text{TV}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) \leq \sqrt{\frac{1}{2}\, d_{\text{KL}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}})}.\]
Proof
We follow the steps of [24]. Let Ω′ = {x ∈ Ω | P 0(x) ≥ P 1(x)}. Also, let \(p_{i} = \sum _{x \in {\mathrm {\Omega }^{\prime }}} \mathbf {\textit {P}}_{\mathbf {\textit {i}}}(x)\) for i ∈ {0, 1}. So, dTV(P 0, P 1) = p0 − p1. Also, by the log sum inequality (Footnote 6), we have \(d_{\text {KL}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}}) \geq p_{0} \log {p_{0} \over p_{1}}+ (1-p_{0}) \log {(1-p_{0}) \over (1-p_{1})}\). Therefore, using the standard bound \(p_{0} \log {p_{0} \over p_{1}}+ (1-p_{0}) \log {(1-p_{0}) \over (1-p_{1})} \geq 2(p_{0} - p_{1})^{2}\), we get \(d_{\text {KL}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}}) \geq 2\, d_{\text {TV}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}})^{2}\), which is the claimed inequality.
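Pinsker's inequality can be checked empirically as well; the following sketch (helper names are our own, natural logarithms throughout) tests the bound on random distributions.

```python
import math
import random

def d_tv(p, q):
    # Total variation distance.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def d_kl(p, q):
    # KL divergence, natural logarithm.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(rng, k):
    w = [rng.uniform(0.01, 1.0) for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(1000):
    p, q = random_dist(rng, 6), random_dist(rng, 6)
    # Pinsker: d_TV(p, q) <= sqrt(d_KL(p, q) / 2).
    assert d_tv(p, q) <= math.sqrt(0.5 * d_kl(p, q)) + 1e-12
```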
χ2 distance. The χ2 distance has its origin in mathematical statistics, dating back to Pearson (see [18] for some history). The χ2 distance between P 0 and P 1, with P 0 ≪ P 1, is defined as
\[d_{\chi^{2}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) = \sum\limits_{x \in {\Omega}} \frac{\left(\mathbf{P}_{\mathbf{0}}(x) - \mathbf{P}_{\mathbf{1}}(x)\right)^{2}}{\mathbf{P}_{\mathbf{1}}(x)}.\]
It can be seen that χ2 distance is not symmetric. Therefore, it is not a metric. However, like KL-divergence, χ2 distance between product distributions can be bounded in terms of the χ2 distances between their marginals (see [23]). The following lemma shows that KL-divergence between two distributions can be upper bounded by their χ2 distance. The first inequality can also be found in earlier works (see [12] for this and many other relations among various distances used in Statistics).
Lemma 2
\(d_{\text {KL}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}}) \leq \log (1 + d_{\chi ^{2}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}})) \leq d_{\chi ^{2}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}})\) .
Proof
By the definition of the χ2 distance, and using Jensen's inequality for the concave function log, we have
\[d_{\text{KL}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}}) = \sum\limits_{x} \mathbf{P}_{\mathbf{0}}(x) \log \frac{\mathbf{P}_{\mathbf{0}}(x)}{\mathbf{P}_{\mathbf{1}}(x)} \leq \log \sum\limits_{x} \frac{\mathbf{P}_{\mathbf{0}}(x)^{2}}{\mathbf{P}_{\mathbf{1}}(x)} = \log\left(1 + d_{\chi^{2}}(\mathbf{P}_{\mathbf{0}}, \mathbf{P}_{\mathbf{1}})\right).\]
The last inequality follows by observing that \(d_{\chi ^{2}}(\mathbf {P}_{\mathbf {0}}, \mathbf {P}_{\mathbf {1}}) \geq 0\) and log(1 + t) ≤ t for t ≥ 0.
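Both inequalities of Lemma 2 can be confirmed numerically; the sketch below (helper names are our own, natural logarithms throughout) compares the KL divergence, log(1 + χ2), and the χ2 distance on random distributions.

```python
import math
import random

def d_kl(p, q):
    # KL divergence, natural logarithm.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_chi2(p, q):
    # chi^2 distance: sum_x (p(x) - q(x))^2 / q(x), assuming p << q.
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def random_dist(rng, k):
    w = [rng.uniform(0.01, 1.0) for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(1000):
    p, q = random_dist(rng, 5), random_dist(rng, 5)
    c = d_chi2(p, q)
    # d_KL <= log(1 + chi^2) <= chi^2.
    assert d_kl(p, q) <= math.log(1 + c) + 1e-12 <= c + 2e-12
```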
1.1 A.1 Proof of Theorem 1
We are now ready to show the upper bound on \(d_{\text {TV}}(\mathbf {P}_{\mathbf {0}}^{\textit {\textbf {q}}}, \mathbf {P}_{\mathbf {1}}^{\textit {\textbf {q}}})\) in terms of the expected value of the χ2 distance between the conditional distributions \(\mathbf {P}_{\mathbf {0}|x^{i-1}}\) and \(\mathbf {P}_{\mathbf {1}|x^{i-1}}\). We state and prove the χ2 method, i.e. Theorem 1.
Proof
of Theorem 1 The proof follows directly from Pinsker's inequality (Theorem 6), the chain rule of KL divergence (Lemma 1), and Lemma 2. More precisely, we have
\[d_{\text{TV}}\left(\mathbf{P}_{\mathbf{0}}^{q}, \mathbf{P}_{\mathbf{1}}^{q}\right) \leq \sqrt{\frac{1}{2}\, d_{\text{KL}}\left(\mathbf{P}_{\mathbf{0}}^{q}, \mathbf{P}_{\mathbf{1}}^{q}\right)} = \sqrt{\frac{1}{2} \sum\limits_{i=1}^{q} \mathbf{E}\left[\text{KL}(\mathsf{X}^{i-1})\right]} \leq \sqrt{\frac{1}{2} \sum\limits_{i=1}^{q} \mathbf{E}\left[\chi^{2}(\mathsf{X}^{i-1})\right]}.\]
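For q = 2 the resulting bound can be verified numerically. In the following sketch (all helper names are our own) the expectation over the first coordinate is taken under P 0, matching the chain rule stated in Lemma 1.

```python
import itertools
import math
import random

omega = (0, 1, 2)

def random_joint(rng):
    keys = list(itertools.product(omega, omega))
    w = [rng.uniform(0.05, 1.0) for _ in keys]
    s = sum(w)
    return {k: wi / s for k, wi in zip(keys, w)}

def marginal(p):
    # Distribution of the first coordinate.
    return {a: sum(p[(a, b)] for b in omega) for a in omega}

def conditional(p, a):
    # Distribution of the second coordinate given that the first equals a.
    m = marginal(p)[a]
    return {b: p[(a, b)] / m for b in omega}

def d_tv(p, q):
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def chi2(p, q):
    # chi^2 distance between distributions given as dicts.
    return sum((p[k] - q[k]) ** 2 / q[k] for k in p)

rng = random.Random(3)
p0, p1 = random_joint(rng), random_joint(rng)
m0, m1 = marginal(p0), marginal(p1)

# chi^2-method bound for q = 2: first-coordinate chi^2 plus the
# P0-expectation of the conditional chi^2 distances, halved, square-rooted.
bound = math.sqrt(0.5 * (chi2(m0, m1) + sum(
    m0[a] * chi2(conditional(p0, a), conditional(p1, a)) for a in omega
)))
assert d_tv(p0, p1) <= bound + 1e-12
```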
Cite this article
Bhattacharya, S., Nandi, M. A note on the chi-square method: A tool for proving cryptographic security. Cryptogr. Commun. 10, 935–957 (2018). https://doi.org/10.1007/s12095-017-0276-z
Keywords
- Random permutation
- Pseudorandom function
- χ2 distance
- KL divergence
- Total variation distance
- Pinsker’s inequality
- Sum of random permutations
- Truncated random permutation