1 Introduction

The failure of popular hash functions MD5 and SHA-1 [55, 56] lends an impetus to the search for new ones. The contention of our paper is that there will be a “niche” market for as-fast-as-possible hash functions proven secure under standard assumptions. We provide a general paradigm that yields such functions.

The hash functions we get are chameleon [30] and we extend the treatment to get a characterization of chameleon hash functions, on the one hand unifying and clarifying previous constructs and on the other hand yielding new and more efficient ones. Let us now look at all this in more detail.

The Need for Proven-Secure Hashing

Suppose an important document has been signed with a typical hash-then-sign scheme such as PKCS#1 [29]. If collisions are found in the underlying hash function, the public key needs to be revoked and the signature can no longer be accepted. Yet there are instances in which we want a public key and signatures under it to survive for twenty or more years. This might be the case for a central and highly disseminated certificate or an important contract. Revocation of a widely disseminated public key is simply too costly and error-prone. In such a case, we want to be able to trust that collisions in our hash function will not be found even twenty years down the line.

Given the failure of MD5 and SHA-1, it would be understandable, from this twenty-year perspective, to feel uncertain about any hash function designed by “similar” methods. On the other hand, we may be very willing to pay a (reasonable!) computational price for security because documents or certificates of the ultra-importance we are considering may not need to be signed often. In this case, hash functions with proven security are interesting, and the faster they are the better. Our contribution is a general transform that yields a plurality of such hash functions, not only providing new ones but “explaining” or improving old ones.

From Σ to Hash

We show how to construct a collision-resistant hash function from any (suitable) Σ-protocol. Recall that Σ-protocols are a class of popular 3-move identification schemes. Canonical examples are the Schnorr [46], Fiat–Shamir [21] and GQ [23] protocols, but there are many others as well [10, 24, 35, 3739, 44]. Briefly, the protocols achieve a strong form of the usual honest-verifier zero-knowledge property, and our hash function is defined using the simulator. (We stress that the computation of the hash is deterministic even though the simulator is randomized.) The collision-resistance stems from strong special soundness [8], a well-studied property of Σ-protocols. The advantage of our approach is that there is a rich history in constructing proven-secure Σ-protocols and we can now leverage this to get collision-resistant hash functions. For future reference let us refer to a hash function derived from our approach as a Σ-hash function.

Damgård [19] and Cramer, Damgård and MacKenzie [16] have previously shown that it is possible to design commitment schemes based on Σ-protocols, but prior to our work it has not been observed that one can design collision-resistant hash functions from Σ-protocols. Note that secure commitment is not known to imply collision-resistant hashing and in fact is unlikely to do so because the former can be based on one-way functions [36] and the latter probably not [49]. Perhaps as a consequence, our construction requires slightly stronger properties from the Σ-protocols than do the constructions of [16, 19].

Specific Designs

The Schnorr [46] and GQ [23] schemes are easily shown to meet our conditions, yielding collision-resistant Σ-hash functions and based, respectively, on discrete log and RSA. More interesting is the Fiat–Shamir protocol  [21]. It does not satisfy strong special soundness, but we modify it to a protocol (strong ) that we prove does under the factoring assumption, thereby obtaining a Σ-hash function . From a modified version of the Micali–Shamir protocol [35] we obtain a Σ-hash function with security based on the SRPP (Square Roots of Prime Products) assumption of [35]. We also obtain a Σ-hash from Okamoto’s protocol [38] and a pairing-based Σ-hash from an identification protocol of [9] derived from the identity-based signature scheme of Hess [24].

How Fast?

One question we consider interesting is: How fast can one hash while maintaining a proof of security under the standard factoring assumption? Figure 1 compares to the fastest known factoring-based functions and shows that the former emerges as the winner. (VSH, the Very Smooth Hash function of [14], is faster than all these, but is based on a non-standard assumption related to the difficulty of extracting modular square roots of products of small primes. We will discuss VSH, and our improvement to it, in a bit.) In Fig. 1, is the most efficient factoring-based instantiation known of Damgård’s claw-free permutation-based hash function [17, 22, 30]. is the hash function of Shamir and Tauman [48]. The table entries are the rate defined as the average number of bits of data hashed per modular multiplication in MD mode with a block size of 512 bits and a modulus and output size of 1024 bits. The figure shows that without pre-computation, is twice as fast as and 9 times as fast as . But is amenable to pre-computation-based speedup and is not, so the gap in their rates increases swiftly with storage. is also amenable to pre-computation-based speedup but remains a factor 4 faster for any given amount of storage. We also remark that additionally is amenable to parallelization, unlike the other functions. We remark that is faster than but based on a stronger assumption. In Sect. 6 we recall and and justify the numbers in Fig. 1. We also discuss implementation results.

Fig. 1.
figure 1

Performance of factoring-based hash functions. The modulus and output size are 1024 bits and the block size is 512 bits. “Pre” is the amount of pre-computation, in number of group elements stored. The table entry is the rate defined as the average number of bits of data hashed per modular multiplication.

Additional Features

Σ-hash functions are keyed. While one can, of course, simply hardwire into the code a particular key to get an unkeyed function in the style of MD5 or SHA-1, it is advantageous, as explained in [6], to allow each user to choose their own key. The reason is that damage from a collision is now limited to the user whose key is involved, and the attacker must reinvest resources to attack another key. This slows down the rate of attacks and gives users time to get patches in place or revoke keys.

The reductions underlying the security proofs of Σ-hash functions are tight, so that the proven security guarantees hold with standard values of the security parameters.

Σ-Hash Functions are Chameleon

Krawczyk and Rabin [30] introduced chameleon hash functions. The over 150 citations to date to their paper (as per Google Scholar) are an indication of the popularity and utility of the primitive.

Krawczyk and Rabin [30] presented two example constructions of chameleon hash functions, and others were found by [2, 3, 48]. The analyses, however, are ad hoc. We, in contrast, show a general result, namely, that any Σ-hash is chameleon (cf. Theorem 5.1). As a consequence, we immediately obtain that and are chameleon. In particular, in , we obtain the fastest known chameleon hash function under the standard factoring assumption.

Shamir and Tauman used chameleon hash functions to build on-line/off-line signature schemes [48]. (The concept is due to [20].) This means that when one uses a Σ-hash one can completely eliminate the on-line cost of signing. (This cost is shifted entirely to the off-line phase.) Another application is chameleon signatures [30], which provides a recipient with a non-repudiable signature of a message without allowing it to prove to a third party that the signer signed this message. As explained in [30], this is an important tool for privacy-respecting authenticity in the signing of contracts and agreements. Chameleon hash functions are also used in designated-verifier signatures to achieve privacy [28, 50]. Finally, chameleon hashing can be used to transform a weakly-secure signature scheme into a fully-secure one. This is used in many places [11, 25, 30, 47] and a full statement and proof were provided by Hohenberger and Waters [26] whose design of RSA-based signatures made crucial use of this transform. By adding new and more efficient chameleon hash functions to the pool of existing ones we enable new and more efficient ways to implement all the different applications.

Reverse Connection

As indicated above, we show that Σ-hash functions are chameleon. To complement this, we show that the converse is true as well, namely, all chameleon hash functions are Σ-hash functions (cf. Theorem 5.2). We prove this by associating with any chameleon hash function  a Σ-protocol such that applying our \(\mathsf{\varSigma2H}\) (Σ-to-hash) transform to returns . We thereby have a characterization of chameleon hash functions as Σ-hash functions, which, as we discuss below, allows us to unify previous work.

We also obtain numerous new Σ-protocols, and thus identification protocols and, via [16, 19], commitment schemes, from existing chameleon hash functions such as [17] and [48]. However, we are not aware of any practical benefit of these constructs over known ones.

Unifying Previous Work

turns out to be exactly the classical hash function of Chaum, Van Heijst and Pfitzmann [13], which was shown to be chameleon by [30]. is an extension thereof [13]. is a special case of a chameleon hash function proposed by Ateniese and de Medeiros [2, 3]. (Our other hash functions , and are new.) The re-derivation of these hash functions as Σ-hashes sheds new light on the designs and shows how the Σ paradigm explains and unifies previous constructs.

Finally we make a connection between VSH [14] and , the Σ-hash function emanating from the protocol of Micali and Shamir [35]. The latter is a more efficient version of the Fiat–Shamir protocol in which the public key, rather than consisting of random quadratic residues, consists of small primes. Interestingly, turns out to be the VSH compression function [14] modulo some details. We suggest that this provides some intuition for the VSH design. It turns out that we can exploit this connection to get some improvements to VSH.

VSH

In number-theoretic hashing there is (as elsewhere) a trade-off between speed and assumptions. We saw above that is the fastest known hash function under the standard factoring assumption. We now turn to non-standard factoring-related assumptions. Here the record-holder is VSH (Very Smooth Hash), a construct of Contini, Lenstra and Steinfeld [14] which has a proof of collision-resistance based on the VSSR assumption of the same paper [14]. We provide a modification VSH whose compression function, unlike that of the original, is collision-resistant, leading to better performance for short messages. (Our implementations show that \(\mathsf {VSH^{*}}\) is up to 5 times faster than VSH on short messages. On long messages they have the same performance.) This is important because short messages are an important case in practice. (For example, most Internet packets are short.) VSH remains provably collision-resistant under the same VSSR assumption as VSH. A different collision-resistant modification of the compression function of VSH is provided by [51].

We provide analogous improvements for the Fast-VSH variant of VSH provided by [14]. Again, we can provide Fast-\(\mathsf {VSH^{*}}\) whose underlying compression function (unlike that of Fast-VSH) is proven collision-resistant, leading to speedups in hashing short messages. However, the speed gains are smaller than in the previous case.

Overall we believe that, even putting performance aside, having a collision-resistant compression function underlying a hash function is a plus since it can be used directly and makes the hash function more misuse-resistant.

What Σ-Hash Functions Are Not

Some recent work [1, 5, 15] suggests that general-purpose hash functions should have extra properties like pseudo-randomness. Σ-hash functions are merely collision-resistant and chameleon; they do not offer these extra attributes. But as indicated above, Σ-hash functions are not intended to be general purpose. The envisaged applications are chameleon hashing and proven-secure, reasonable-cost (purely) collision-resistant hashing.

2 Related Work

Damgård [17] presents a construction of collision-resistant hash functions from claw-free permutation pairs [22]. As noted above, his factoring-based instantiation, based on [22] and also considered in [30, 48], is slower than our .

Ishai, Kushilevitz and Ostrovsky [27] show how to transform homomorphic encryption (or commitment) schemes into collision-resistant hash functions. This is an interesting theoretical connection between the primitives. As far as we can tell, however, the approach is not yet practical. Specifically, their quadratic-residuosity (QR) based instantiation has a rate of 1/40 (that is, 40 modular multiplications per bit) with a 1024-bit modulus. (Their matrix needs 80 rows to get the 80-bit security corresponding to a 1024-bit modulus.) Hence their function is much slower than the constructs of Fig. 1 in addition to being based on a stronger assumption (QR as opposed to factoring). Additionally it has a (80⋅1024)-bit output so in a practical sense is not really hashing. Other instantiations of their construction that we know (El Gamal under DDH, Paillier [40] under DCRA) are also both slower than known ones and based on stronger assumptions.

Lyubashevsky, Micciancio, Peikert and Rosen [33] present a fast hash function SWIFFT with an asymptotic security proof based on assumptions about the hardness of lattice problems [32, 41], but the proof would not seem to yield guarantees for the parameter sizes proposed in [33]. In contrast, our reductions are tight and the proofs provide guarantees for standard values of the security parameters.

Bellare and Micciancio’s construction [4] (whose goal was to achieve incrementality) uses random oracles, but these can be eliminated by using a small block size, such as one bit. In this case their MuHASH is provably collision-resistant based only on the discrete-log assumption, and runs at 0.33 bits per group operation in MD mode. In comparison, (also discrete log based) is faster, at 0.57 bits per group operation in MD mode.

Charles, Goren and Lauter [12] presented a hash function based on the assumed hardness of some problems related to elliptic curves. However, their construction was shown to not be collision-resistant [54] and in fact to not even be pre-image resistant [42]. Tillich and Zémor [53] present a hash function based on the assumed hardness of some graph problems, whose security properties and efficiency were improved by Petit, Veyrat-Charvillon, and Quisquater [43]. The hash function of [43] is slower than Fast-VSH, and thereby slower than Fast-VSH , according to the performance results reported in [14] and [43].

Heng and Kurosawa [31] define reversible Σ-protocols and show that these and trapdoor commitment schemes are equivalent. Reversibility, a property we do not assume, requires that the prover’s randomness or internal state can be reconstructed from the public key and last two messages in the protocol given the secret key. The binding property of the commitment scheme is a weak form of collision-resistance which rules out efficiently finding message-randomness pairs where the message parts are different but the corresponding commitments are the same. They do not claim or provide (chameleon) collision-resistant hash functions.

Steinfeld, Pieprzyk and Wang [51] suggest a collision-resistant modification of the VSH compression function based on restricting the domain of the second argument. This makes iterating the compression function somewhat less convenient but it can be done using the methods we discuss in Appendix A and would then appear to yield performance benefits similar to those we get via VSH . They do not consider Fast-VSH.

Subsequent to the preliminary version of our work [7], Šarinay [45] provides a variant of VSH with better performance than the original, security now being based on the k-sum problem.

3 Definitions

Notation and Conventions

We denote by a 1∥⋯∥a n a string encoding of a 1,…,a n from which the constituent objects are uniquely recoverable. We denote the empty string by ε. Unless otherwise indicated, an algorithm may be randomized. If A is a randomized algorithm then denotes the operation of running A with fresh coins on inputs x 1,… and letting y denote the output. We denote by [A(x 1,…)] the set of all y that have positive probability of being output by A on input x 1,…. If S is a (finite) set then denotes the operation of picking s uniformly at random from S. If X=x 1x 2∥⋯∥x n , then x 1x 2∥⋯x n X denotes the operation of parsing X into its constituents. Similarly, if X=(x 1,x 2,…,x n ) is an n-tuple, then (x 1,x 2,…,x n )←X denotes the operation of parsing X into its elements. We denote the security parameter by k, and by 1k its unary encoding. Vectors are denoted in boldface, for example u. If u is a vector then |u| is the number of its components and u[i] is its ith component. “PT” stands for polynomial time.

Σ-Protocols

A Σ-protocol is a three-move interactive protocol conducted by a prover and a verifier. Formally, it is a tuple , where K,P are PT algorithms and V is a deterministic boolean algorithm. The key-generation algorithm K takes input 1k and returns a pair (pk,sk) consisting of a public and secret key for the prover. The prover is initialized with pk,sk while the verifier is initialized with pk. The parties interact as depicted in Fig. 2. The prover begins by applying P to pk,sk to yield his first move Y∈CmSet(pk), called the commitment, together with state information y, called the ephemeral secret key. The commitment is sent to the verifier, who responds with a challenge c drawn at random from ChSet(pk). The prover computes its response z∈RpSet(pk) by applying P to pk,sk, the challenge c and the ephemeral secret key y. (This computation may use fresh coins although in the bulk of protocols it is deterministic.) Upon receiving c the verifier applies V to the public key and transcript Ycz of the conversation to decide whether to accept or reject. We require completeness, which means that an interaction between the honest prover and verifier is always accepting. Formally, for all \(k\in \mathbb {N}\) we have d=1 with probability 1 in the experiment

The verifier given pk,Ycz should always check that Y∈CmSet(Y) and c∈ChSet(pk) and z∈RpSet(pk) and reject otherwise. We implicitly assume this is done throughout.

Fig. 2.
figure 2

Σ-protocol. Keys pk and sk are produced using key-generation algorithm K.

Security Notions

We provide formal definitions of strong special soundness (sss) and strong honest-verifier zero-knowledge (StHVZK). Strong special soundness of Σ-protocol [8] asks that it be computationally infeasible, given only the public key, to produce a pair of accepting transcripts that are commitment-agreeing but challenge-response-disagreeing. Formally an sss-adversary, on input pk, returns a tuple (Y,c 1,z 1,c 2,z 2) such that Y∈CmSet(pk);c 1,c 2∈ChSet(pk); z 1,z 2∈RpSet(pk) and (c 1,z 1)≠(c 2,z 2). The advantage of such an adversary is defined for all \(k\in \mathbb {N}\) as the probability that V(pk,Yc 1z 1)=1 and V(pk,Yc 2z 2)=1 in the experiment where K(1k) is first executed to get (pk,sk) and then A(pk) is executed to get (Y,c 1,z 1,c 2,z 2). We say that has strong special soundness if is negligible for all PT sss-adversaries A. To define StHVZK, let be the algorithm that on input (pk,sk) executes P and V as per Fig. 2 and returns the transcript Ycz. Recall that a PT algorithm Sim is a HVZK simulator for if the outputs of the processes

and

are identically distributed. (We require perfect, not computational, ZK. This simplifies applications and there is no particular loss from assuming it since it is provided by the candidate protocols.) We say that a PT algorithm StSim is a strong HVZK (StHVZK) simulator for if StSim is deterministic and the algorithm Sim defined on input pk by

is a HVZK simulator for . We say that is StHVZK if it has a PT StHVZK simulator and also the commitment Y generated via is uniformly distributed over CmSet(pk) for all (pk,sk)∈[K(1k)]. We denote by Σ(sss) the set of all Σ-protocols that satisfy strong special soundness and by Σ(StHVZK) the set of all Σ-protocols that are strong HVZK.

Discussion

While the basic format of Σ-protocols as 3-move protocols of the type above is agreed upon, when it comes to security properties, there are different choices and variations in the literature. Our formalization of strong special soundness is from [8]. Strong HVZK seems to be new but the canonical protocols in this area have this property.

Collision-Resistant Hash Functions

A family of n-input hash functions (where n≥1 is a constant) is a tuple . The key-generation algorithm KG takes input 1k and returns a key K describing a particular function H K :D1(K)×⋯D n (K)→R(K). As this indicates, D1,…,D n ,R are functions that given K return sets. A cr-adversary, on input K returns distinct tuples (x 1,…,x n ),(y 1,…,y n ) such that x i ,y i ∈D i (K) for all 1≤in. The advantage of such an adversary B is defined for all \(k\in \mathbb {N}\) as the probability that H(K,x 1,…,x n )=H(K,y 1,…,y n ) in the experiment where KG(1k) is first executed to get K and then B(K) is executed to get ((x 1,…,x n ),(y 1,…,y n )). We say that is collision-resistant if the cr-advantage of any PT adversary B is negligible.

4 Σ-Hash Theory

This section covers the theory of Σ-hash functions. We present and justify the \(\mathsf{\varSigma2H}\) transform that turns a Σ-protocol into a collision-resistant hash function . Then we find Σ-protocols which we can prove have the required properties and derive specific Σ-hash functions. In Sect. 5 we relate Σ and chameleon hash functions. In Sect. 6 we discuss the practical and performance aspects of our Σ-hash functions.

4.1 The Transform

We show how to build a collision-resistant hash function from any Σ-protocol that satisfies strong special soundness and strong HVZK. Let StSim be a strong HVZK simulator for . Let K (1) be the algorithm that on input 1k lets and returns pk. We define the 2-input family of hash functions by KG=K (1) and H pk (c,z)=StSim(pk,c,z). In other words, the key is the prover’s public key. (The secret key is discarded.) The inputs to the hash function are regarded as the challenge and response in the Σ-protocol. The output is the corresponding commitment. The existence of a StHVZK simulator is exploited to deterministically compute this output. We refer to a family of functions defined in this way as a Σ-hash. We write to indicate that has been derived as above from Σ-protocol . The following theorem says that a Σ-hash family is collision-resistant.

Theorem 4.1

Let be a Σ-protocol. Let be the family of hash functions associated with as above. For every cr adversary B against there exists an sss-adversary A against such that for all k we have , and the running time of A is that of B.

The proof of this theorem, given below, is simple, but we note some subtleties, which is the way it relies on the (strong) HVZK and completeness of the Σ-protocol in addition to the strong special soundness.

Proof of Theorem 4.1

We define adversary A as follows.

By definition of a cr-adversary we know that (c 1,z 1)≠(c 2,z 2). Hence A satisfies the definition of an sss adversary. Let Y i =H pk (c i ,z i ) for i=1,2. The definition of a cr-adversary also implies that c i ∈ChSet(pk) and z i ∈RpSet(pk) for i=1,2. Strong HVZK now implies that the transcripts Y i c i z i have positive probability of being produced in the protocol, meaning of being output by . The completeness of the protocol now implies that V(pk,Y 1c 1z 1)=1 and V(pk,Y 2c 2z 2)=1. Finally, if B succeeds then Y 1=Y 2 so A also succeeds. □

To construct Σ-hash functions we now seek Σ-protocols which we can show are in Σ(sss)∩Σ(StHVZK).

4.2 Overview of Constructions

We begin, as illustrative examples, with the Schnorr [46] and GQ [23] Σ-protocols, which we can easily show to have the desired properties. The hash functions obtained are known [2, 3, 13] and their re-derivation as Σ-hashes sheds new light on their design and also shows how the Σ-hash paradigm unifies and explains existing work. More interesting is the Fiat–Shamir [21] Σ-protocol. It does not satisfy strong special soundness, but we modify it to a Σ-protocol that we prove is in Σ(sss)∩Σ(StHVZK) under the standard factoring assumption. With non-standard factoring-related assumptions (that it is hard to extract modular square roots of products of small primes) we get a faster Σ-hash from a modification of the Micali–Shamir Σ-protocol [35]. We also get another discrete-log-based Σ-hash from Okamoto’s protocol [38] and a pairing based one from the protocol [9, 24]. Let us now detail all this.

4.3

We fix a prime-order group generator, by which we mean a PT algorithm that on input 1k returns the description 〈G〉 of a group G of prime order p∈{2k−1,…,2k−1} together with p and a generator g of G. The key-generation process and protocol underlying the Σ-protocol of [46] are then as shown in Fig. 3. The algorithm that on input pk=(〈G〉,p,g,X) picks and returns X c g zcz is a HVZK simulator for , so and the derived Σ-hash is as shown in Fig. 3. The key observation for strong special soundness is that if and (c 1,z 1)≠(c 2,z 2) then it must be that c 1c 2. This leads us to associate with sss-adversary A the discrete log finder D that on input 〈G〉,p,g,X runs A on the same input to get (Y,c 1,z 1,c 2,z 2) and returns (z 2z 1)(c 1c 2)−1modp. Then for all k we have , where the latter is defined as the probability that x′=x in the experiment where we let . This shows that has strong special soundness as long as the discrete log problem is hard relative to . By Theorem 4.1, is collision-resistant under the same assumption.

Fig. 3.
figure 3

Σ-protocol and the derived Σ-hash family, where is a prime-order group generator.

4.4

We fix a prime-exponent RSA generator with associated challenge length L(⋅), by which we mean a PT algorithm that on input 1k returns an RSA modulus N∈{2k−1,…,2k−1} and an RSA encryption exponent e>2L(k) that is a prime. The key-generation process and protocol underlying Σ-protocol of [23] are then as shown in Fig. 4. The algorithm that on input pk=(N,e,l,X) picks and returns Ycz, where Y=X c z emodN, is a HVZK simulator for , so and the derived Σ-hash is as shown in Fig. 4. Again, observe that if and (c 1,z 1)≠(c 2,z 2) then c 1c 2. To adversary A attacking the strong special soundness, this leads us to associate the inverter that on input N,e,X runs A on input N,e,l,X, where l=L(⌊log2(N)⌋+1), to get (Y,c 1,z 1,c 2,z 2), and returns , where a,b satisfy ae+b(c 1c 2)=1 and are found via the extended gcd algorithm. (This is where we use the fact that e is prime.) Then for all k we have , where the latter is defined as the probability that x′=x in the experiment where we let . This shows that has strong special soundness if RSA is one-way relative to . By Theorem 4.1, is collision-resistant under the same assumption.

Fig. 4.
figure 4

Σ-protocol and the derived Σ-hash family, where is a prime exponent RSA generator with associated challenge length L.

4.5 and

We fix a modulus generator, namely a PT algorithm , that on input 1k returns a modulus N∈{2k−1,…,2k−1} and distinct primes p,q such that N=pq. We also fix a challenge length L(⋅). If c is an l-bit string and \(\textbf {u} \in(Z_{N}^{*})^{l}\) then we let u c=∏u[i]c[i] where the product is over 1≤il and c[i] denotes the ith bit of c. The key-generation algorithm and protocol underlying the Σ-protocol are then as shown in Fig. 5. However, this protocol does not satisfy strong special soundness because if Ycz is an accepting transcript relative to pk=(N,l,u) then so is Ycz′ where z′=Nz. We now show how to modify so that it has strong special soundness. First, some notation. For \(w\in \mathbb {Z}_{N}\) we let [w] N equal w if wN/2 and Nw otherwise. Let \(\mathbb {Z}_{N}^{+}=\mathbb {Z}_{N}^{*}\cap\{1,\ldots, N/2\}\). The modified protocol (strong ) is shown in Fig. 5. Here CmSet((N,l,u)) is the set QR N of quadratic residues in \(\mathbb {Z}_{N}^{*}\) and ChSet((N,l,u)) is {0,1}l, just as in , but RpSet((N,l,u)) is now equal to \(\mathbb {Z}_{N}^{+}\) rather than \(\mathbb {Z}_{N}^{*}\) as before. For any adversary F we define as the probability that r∈{p,q} in the experiment where we let and . The following shows that has strong special soundness under the standard hardness of factoring assumption.

Fig. 5.
figure 5

, , and protocols and the Σ-hash derived from . The upper left key-generation algorithm is that of and , while the lower left one is that of and . The upper protocol is that of and while the lower protocol is that of and . Here is a modulus generator and is a small prime modulus generator. The computations are in \(\mathbb {Z}_{N}^{*}\), meaning modulo N.

Proposition 4.2

Let be a modulus generator and L(⋅) a challenge length. Let be the associated Σ-protocol as per Fig5. If A is an sss-adversary against then there is a factoring algorithm F against such that for all k we have . The running time of F is that of A plus the time for at most L(⋅) multiplications, one inversion modulo N, and the time for one execution of the gcd algorithm.

Proof

The factoring algorithm F is shown in Fig. 6. For the analysis, consider two cases. The first is that c 1c 2 and the second is that c 1=c 2 and z 1z 2. In the first case, a simple computation shows that \(r_{1}^{2}\equiv r_{2}^{2} \ (\mathrm{mod}\ N)\). On the other hand, s[g] is chosen at random in \(\mathbb {Z}_{N}^{*}\) and the only information A gets about it is u[g]=s[g]−2modN so \(\textbf {s}[g]\not\in\{r_{1},N-r_{1}\}\) with probability 1/2. So in this case F succeeds with probability 1/2 times the probability that A succeeds. In the case c 1=c 2 but z 1z 2 we have, modulo N,

and c 1=c 2 implies . But z 1z 2 and so it must be that . So F succeeds with the same probability as A in this case.

Fig. 6.
figure 6

Factoring algorithm F for proof of Proposition 4.2 and spr-adversary B for proof of Proposition 4.3.

 □

Now, the algorithm that on input pk=(N,l,u) lets , and Yu cz 2modN and returns Ycz is a HVZK simulator for . Accordingly and we derive from the Σ-hash family shown in Fig. 5. Proposition 4.2 and Theorem 4.1 imply that is collision-resistant under the standard factoring assumption.

4.6 and

The Micali–Shamir protocol [35] is a variant of in which verification time is reduced by choosing the coordinates of u to be small primes. As with , it does not satisfy sss, but we can modify it to do so and thereby obtain a collision-resistant hash function that is faster than at the cost of a stronger assumption for security. To detail all this, let be a small prime modulus generator with challenge length L(⋅), by which we mean a PT algorithm that on input 1k returns a modulus N∈{2k−1,…,2k−1}, distinct primes p,q such that N=pq, and an L(k)-vector u each of whose coordinates is a prime in \(\mathrm {QR}(N)=\{x^{2} \bmod N: x\in Z_{N}^{*}\}\). For efficiency we would choose these primes to be as small as possible. (For example u[i] is the ith prime in QR(N).) An spr-adversary B against takes input N and \(\textbf {u}\in (\mathbb {Z}_{N}^{*})^{L(k)}\) and returns (x,S) where \(x\in \mathbb {Z}_{N}^{*}\) and S is a non-empty subset of {0,1}l. Its spr-advantage is defined for all k by

The SRPP (Square Root of Prime Products) assumption [35] says that the spr-advantage of any PT B is negligible. Now, Fig. 5 shows our modified version of the Micali–Shamir protocol. It is in Σ(StHVZK) for the same reason as and hence the derived hash function is again as shown, where SQR(⋅,p,q) takes input w∈QR(N) and returns at random one of the four square roots of w modulo N=pq, computed using the primes p,q. Strong special soundness of is proven in the following under the SRPP assumption.

Proposition 4.3

Let be a small prime modulus generator with associated challenge length L. Let be the associated Σ-protocol as per Fig5. If A is an sss-adversary against then there is a spr-adversary B such that for all k we have . The running time of B is that of A plus time t=max{t 1,t 2}, where t 1 is the time it takes to execute one inversion modulo N and L+1 multiplications modulo N and t 2 is the time it takes to execute the gcd algorithm and the SQR algorithm.

Proof

Let us explain why the adversary B shown in Fig. 6 works. If A succeeds then

(1)

Now, when c 1c 2, multiplying both sides of this equation by ∏ jT u[j]/∏ jR u[j] gives:

From c 1c 2 it follows that TM ≠∅, and therefore by outputting the adversary B succeeds.

In the other case, when c 1=c 2, it has to be z 1z 2 and Eq. (1) becomes . By finding the gcd of N and z 1z 2, B can factorize N, and so it can compute the square root of u[1] modulo N. □

Proposition 4.3 and Theorem 4.1 imply that is collision-resistant under the SRPP assumption.

4.7 Additional Functions

Okamoto’s protocol [38] is StHVZK and can be shown to have special soundness if the discrete logarithm problem is hard relative to the underlying prime-order group generator, and hence we obtain a collision-resistant Σ-hash family . The key has the form (〈G〉,p,g 1,g 2,X), where g 1,g 2G and XG, and \(\mathsf {H}_{\mathit {pk}}:\mathbb {Z}_{p}\times(\mathbb {Z}_{p}\times \mathbb {Z}_{p})\) is defined by . However, this hash function seems to offer no performance advantage over . A pairing-based identification protocol , derived from the id-based signature scheme of [24], is noted in [9]. It is shown in [9] to have special soundness under concurrent attack assuming the hardness of the one more computational Diffie–Helman problem relative to an underlying prime-order bilinear group generator. The proof can be easily extended to show strong special soundness while relaxing the assumption to the hardness of the computational Diffie–Helman problem. can also be shown to be StHVZK and hence we obtain a Σ-hash family . The key has the form (〈G 1〉,〈G 2〉,q,P,〈e〉,α) where G 1 and G 2 are groups of prime order p;e:G 1×G 1G 2, is non-degenerate bilinear map; \(P\in G_{1}^{*}\) and αG 2. The function \(\mathsf {H}_{\mathit {pk}}:\mathbb {Z}_{p}\times G_{1} \rightarrow G_{2}\) is defined by H pk (c,z)=e(z,P)⋅α c. Due to the pairing, however, this hash function is slower than .

5 Σ=Chameleon

We move from examples of Σ-hash functions to a general property of the class, namely that any Σ-hash function is chameleon and vice-versa.

5.1 Definitions

A 2-input hash family is said to be trapdoor if the following are true. First, there is a PT algorithm K, called the full key-generation algorithm, that on input 1k returns a pair (K,T), and algorithm KG is equal to K (1), meaning algorithm KG, on input 1k, runs K(1k) to get (K,T) and returns K. Second, there is a deterministic, PT algorithm, , called the inversion algorithm, such that for all (K,T)∈[K(1k)] and all c 1,c 2∈D1(K) and all Y∈R(K) the map defined by is a bijection of to , where .Footnote 1 We say that has the uniformity property if for all K and all c∈D1(K) it is the case that H K (c,⋅) is uniformly distributed over R(K) when regarded as a random variable over a random choice of its argument z from D2(K). We say that is chameleon if it is trapdoor, collision-resistant and has the uniformity property.

The (standard) completeness requirement for a Σ-protocol implies that from a secret key sk and challenge c, one can easily (in PT) compute the response z, but only if one has the ephemeral secret key y underlying the commitment. To obtain chameleon hash functions from Σ-protocols we need the latter to satisfy a strong form of completeness which says that a response, distributed identically to the response of the real prover P, can be computed even without the ephemeral secret key so long as we have access to some accepting conversation. Formally a strong HVZK Σ-protocol satisfies strong completeness if there is a deterministic PT algorithm called the strong prover such that for all (pk,sk)∈[K(1k)] and all c 1,c 2∈ChSet(pk) and all Y∈CmSet(pk) the map defined by is a bijection of to , where where StSim is the strong HVZK simulator. We let Σ(sc) be the class of all Σ-protocols that have the strong completeness property.

5.2 Sigma Is Chameleon

The following implies that any Σ-hash is chameleon.

Theorem 5.1

Let be a Σ-protocol. Then the Σ-hash family is chameleon.

Proof of Theorem 5.1

Theorem 4.1 implies that is collision-resistant. We now show that the strong HVZK property of implies uniformity of . Fix (pk,sk)∈[K(1k)] and also fix c∈ChSet(pk). We want to show that H pk (c,⋅)=StSim(pk,c,⋅) is uniformly distributed over CmSet(pk) when its argument is drawn at random from RpSet(pk). Consider the games of Fig. 7. Let D be any (computationally unbounded) adversary. Then it suffices to show that

$${\Pr\bigl[\mathrm {S}^{D}\Rightarrow1 \bigr]}={\Pr\bigl[ \mathrm {T}^{D}\Rightarrow1 \bigr]} $$

where “G D⇒1” denotes the event that D outputs 1 on input the output of game G, and the probability is over the coins of G and D. But this follows because

$$\begin{aligned} {\Pr\bigl[\mathrm {S}^{D}\Rightarrow1 \bigr]} =& {\Pr\bigl[\mathrm {R}^{D}\Rightarrow1 \bigr]}, \end{aligned}$$
(2)
$$\begin{aligned} {\Pr\bigl[\mathrm {R}^{D}\Rightarrow1 \bigr]} =&{ \Pr\bigl[\mathrm {T}^{D}\Rightarrow1 \bigr]}, \end{aligned}$$
(3)

where game T is also in Fig. 7. Equation (2) is true by the strong HVZK property. (The real and simulated conversation transcripts are equally distributed, and hence continue to be so conditioned on a particular challenge.) Equation (3) is true because our definition of strong HVZK required that a commitment generated by the prover is uniformly distributed over CmSet(pk).

Fig. 7.
figure 7

Games for proof of uniformity of in proof of Theorem 5.1. Here pk,sk,c are fixed.

To show that is trapdoor we need to exhibit the full key-generation algorithm and the inversion algorithm. The full key-generation algorithm is simply the key-generation algorithm K of , so that the trapdoor is the secret key of the protocol. The inversion algorithm is simply the strong prover from the strong completeness condition. That the trapdoor condition is met is a tautology, since the set is exactly the set . □

As a consequence, we obtain the following new chameleon hash functions: , , , , . ( was already known to be chameleon [30].) This yields numerous new and more efficient instantiations of on-line/off-line signatures [48], chameleon signatures [30] and designated-verifier signatures [28, 50]. It also provides new and more efficient ways to turn weakly-secure signatures into fully-secure ones that can improve the performance of schemes like [26].

5.3 Chameleon is Sigma

We also prove the converse. The following theorem says that any chameleon hash family is a Σ-hash family, meaning the result of applying our \(\mathsf{\varSigma2H}\) transform to some Σ-protocol.

Theorem 5.2

Let be a family of chameleon hash functions. Then there is a Σ-protocol such that is the Σ-hash family corresponding to .

Proof of Theorem 5.2

Since is trapdoor, it has a full key-generation algorithm K and an inversion algorithm . Let the former be the key-generation algorithm of . Now we define the prover algorithm as shown in Fig. 8. Define the verifier V on input pk,Ycz to output 1 if H pk (c,z)=Y and Y∈CmSet(pk) and c∈ChSet(pk) and z∈RpSet(pk), and 0 otherwise. We now need to show that satisfies strong HVZK, strong special soundness, and strong completeness. (We need to also show that satisfies completeness, but this is implied by strong completeness.)

Fig. 8.
figure 8

Prover algorithm for the proof of Theorem 5.2. In line 2 of the second column, (Y,c 1,z 1)←y means we parse y as shown.

Let StSim be defined by StSim(pk,c,z)=H pk (c,z) . We now show this is a strong HVZK simulator. Fix (pk,sk)∈K(1k) and consider the games of Fig. 9. Game R generates real protocol transcripts based on the prover algorithm of Fig. 8 while S generates a simulated transcript based on StSim. We want to show that

$$ {\Pr\bigl[\mathrm {R}^{D}\Rightarrow1 \bigr]}={\Pr\bigl[\mathrm {S}^{D}\Rightarrow1 \bigr]} $$
(4)

for any (computationally unbounded) adversary D. But the uniformity property implies that

$${\Pr\bigl[\mathrm {R}^D\Rightarrow1 \bigr]} = {\Pr\bigl[ \mathrm {T}^D\Rightarrow1 \bigr]}. $$

On the other hand, by the trapdoor property we have

$${\Pr\bigl[\mathrm {T}^{D}\Rightarrow1 \bigr]}={\Pr\bigl[ \mathrm {U}^{D}\Rightarrow1 \bigr]}. $$

Re-applying uniformity we have

$${\Pr\bigl[\mathrm {U}^{D}\Rightarrow1 \bigr]}={\Pr\bigl[ \mathrm {S}^{D}\Rightarrow1 \bigr]} $$

and so we have Eq. (4).

Fig. 9.
figure 9

Games for proof of strong HVZK in proof of Theorem 5.2. Here (pk,sk) are fixed.

The collision-resistance of directly implies strong special soundness of . Also, the trapdoor property of implies strong completeness of by simply letting the strong prover for the strong completeness condition be the inversion algorithm of the trapdoor condition. Again, the required conditions are met simply because the sets and are the same. □

Applying this to known chameleon-hash functions like [17, 30] and [48] yields new Σ-protocols and hence new identification schemes and, via [16, 19], new commitment schemes.

6 Σ-Hash Practice and Performance

In this section we cover practical issues related to Σ-hash functions, including performance, performance comparison with existing constructions and implementation results.

6.1 Extending the Domain

A Σ-hash family as defined above is actually a (keyed) compression function since the domain is relatively small. In practice however we need to hash messages of long and variable length. This would not at first appear to be much of a problem since we should be able to do MD iteration [18, 34]. In fact, this is essentially true but one has to be careful about a few things. What one would naturally like to do is use the second argument to H pk as the chaining variable. But this requires that outputs of the compression function can be regarded as chaining values, meaning CmSet(pk) be a subset of RpSet(pk). Sometimes this is true, as for , which in this way lends itself easily and naturally to MD iteration. But in the case of and we have \(\mathrm {CmSet}((N,l,\textbf {u}))=\mathbb {Z}_{N}^{*} \subsetneqq \mathbb {Z}_{N}^{+}= \mathrm {RpSet}((N,l,\textbf {u}))\). In Appendix A we show how to resolve these problems by appropriate “embeddings” that effectively allow the second input of the compression function to be used as a chaining variable at the cost of 1 bit in throughput and in particular allows us to run any of our Σ-hash functions in MD mode. We will not detail the general transform here, but it is instructive to describe the modified compression function. The public key has the form (N,l,u,v) where N,l,u are as before and v∈QR(N), and \(\mathsf {H}_{\mathit {pk}} :\{0,1\}^{l}\times \mathbb {Z}_{N}^{*}\rightarrow \mathbb {Z}_{N}^{*}\) is defined by

(5)

where f N (z)=0 if and 1 otherwise. It can be shown that this modified function is also a Σ-hash, meaning the result of applying Σ2H to a suitably modified version of the original Σ-protocol that retains the sss, StHVZK and sc properties of the original. But now \(\mathrm {CmSet}((N,l, \textbf {u}, v))=\mathbb {Z}_{N}^{*}= \mathrm {RpSet}((N,l,\textbf {u}, v))\) so MD-iteration is possible.

6.2 Metrics

We measure performance of a hash function in terms of rate, which we define as the average number of bits hashed per group operations. (By “average” we mean when the data is random.) In this measure, an exponentiation aA a costs 1.5n group operations and a twofold multi-exponentiation a,bA a B b costs 1.75n group operations where n is the length of a and also of b. We will use these estimates extensively below. We can consider two modes of operation of a given Σ-hash function , namely compression and MD. In the first case the data to be hashed by H pk is the full input c,z, while in the second case it is only c. (The second input is the chaining variable which is not part of the data.) The rate in MD mode is lower than in compression mode for most hash functions. ( is an interesting exception.) Compression mode is relevant when the function is being used as a chameleon hash, since the data can then be compressed with a standard (merely collision-resistant) hash function such as SHA-1 before applying the Σ-hash [30, Lemma 1]. MD mode is relevant when one wants to avoid conventional hash functions and get the full provable guarantees of the Σ-hash by using it alone. Our performance evaluations will consider MD mode.

6.3 Performance of Σ-Hash Functions

and can be computed with one twofold multi-exponentiation so that they use 1.75 group operations per bit of data (in MD mode). We now turn to . Since we are considering MD mode performance we refer to the MD-compatible version of the function from Eq. (5). (But in fact performance is hardly affected by the modification.) On the average about half the bits of c are 1 so comes in at about 0.5 modular multiplications per bit. This explains the claim of Fig. 1 in regard to without pre-computation. Now we look at how pre-computation speeds it up, using a block size of l=512 (the same as MD5 and SHA-1) for illustration. The method is obvious. Pick a “width” w that divides l and let t=l/w. Letting pk=(N,l,u,v) denote the public key, pre-compute and store the table T with entries

$$T[i, x] = {\prod_{j=1}^{w}} \textbf {u}\bigl[(i-1)w + j\bigr]^{x} \bmod N\quad \bigl(1\leq i \leq t,\ x\in\bigl\{0,\ldots2^w-1\bigr\}\bigr). $$

The size of the table is t2w=l2w/w group elements. Now computing takes t+2=2+l/w multiplications since

where x i is the integer with binary representation c[(i−1)w+1]…c[iw] (1≤it). The number of group operations per bit is thus [2+l/w]/l≈1/w, meaning the rate is w. Figure 1 showed the storage and this rate for w=4 and w=8.

Analytical assessment of the performance of is difficult, but we have implemented both it and (for comparison) . The implementation used a 1024-bit modulus and (for MD mode) a 512-bit block size. Table 1 shows that is about 30 times faster than the basic (no pre-computation) version of . The gap drops to a factor of 15 and 7.5 when compared with the w=4 and w=8 pre-computation levels of , respectively. Note that here is without pre-computation. (The latter does not seem to help it much.) These implementation results are on a Dual Pentium IV, 3.2 GHz machine, running Linux kernel 2.6 and using the GMP library [52].

Table 1. Implementation results. Here w is the “width” parameter determining pre-computation and the space is the number of group elements that need to be stored.

6.4 Comparisons

We now assess performance of previous schemes, justifying claims in Sect. 1. Damgård [17] shows how to construct collision-resistant hash functions from claw-free permutations [22]. Of various factoring-based instantiations of his construction, the one of [22, 30], which we denote , seems to be the most efficient. The key is a modulus N product of two primes, one congruent to 3mod8 and the other to 7mod8, and the hash function \(\mathsf {H}_{N}:\{0,1\}^{l}\times \mathbb {Z}_{N}^{*}\rightarrow \mathbb {Z}_{N}^{*}\) is defined by H N (m,r)=4mr smodN where s=2l. Since multiplying by 4 is cheap, we view it as free and the cost is then one multiplication per bit, meaning is twice as fast. But pre-computation does not help since r is not fixed, and the gap in rates increases as we allow pre-computation for as shown in Fig. 1.

The key of Shamir and Tauman’s [48] hash function is a modulus N and an \(a\in \mathbb {Z}_{N}^{*}\). With a 1024-bit modulus the chaining variable needs to be 1024 bits as well, so that with a 512-bit block size the function would take a (512+1024)-bit input, regard it as an integer s, and return a smodN. The computation takes 1.5 multiplications per bit of the full input, which is 1.5⋅(1024+512)/512=4.5 per bit of data, meaning the rate is 1/4.5≈0.22 as claimed in Fig. 1. Since a is fixed, one can use the standard pre-computation methods for exponentiation. For any v dividing 1024+512=1536, the computation takes 1536/v multiplications with a table of 2v⋅1536/v group elements. Note that per data bit the rate is 512/(1536/v)=v/3. To compare to we need to choose parameters so that the storage for the two is about the same, meaning 2w(512/w)≈2v(1536/v). This yields v=1 for w=4 and v=6 for w=8. This explains the rates shown in Fig. 1.

7 Improvements to VSH

The performance of a hash function on short inputs is important in practice. (For example, a significant fraction of Internet traffic consists of short packets.) We present a variant VSH of VSH that is up to 5 times faster in this context while remaining proven-secure under the same assumption as VSH. The improvement stems from VSH , unlike VSH, having a collision-resistant compression function.

Background

The key of Contini, Lenstra and Steinfeld’s VSH function [14] is a modulus N product of two primes. The VSH compression function \(\mathsf {vsh}_{N}: \{0,1\}^{l}\times \mathbb {Z}_{N}^{*}\rightarrow \mathbb {Z}_{N}^{*}\) is defined by

where p i is the ith prime and c[i] is the ith bit of c. The hash function VSH is obtained by MD-iteration of vsh with initial vector 1. A curious feature of VSH is that the compression function is not collision-resistant. Indeed, vsh N (c,z)=vsh N (c,Nz) for any c∈{0,1}l and \(z\in \mathbb {Z}_{N}^{*}\). Nonetheless, it is shown in [14] that the hash function VSH is collision-resistant based on the VSSR assumption. The latter states that given N,l it is hard to find \(x\in \mathbb {Z}_{N}^{*}\) and integers e 1,…,e l , not all even, such that \(x^{2}\equiv p_{1}^{e_{1}} \cdot\cdots\cdot p_{l}^{e_{l}} \pmod{N}\). The proof makes crucial use of the fact that the initial vector is set to 1.

\(\mathsf {VSH^{*}}\)

We alter the compression function of VSH so that it becomes (provably) collision-resistant and then define \(\mathsf {VSH^{*}}\) by MD iteration with the initial vector being part of the data to be hashed. The first application of the compression function thus consumes much more (1024 bits more for a 1024-bit modulus, for example) of the input, resulting in significantly improved rate for the important practical case of hashing short messages. For example, the implementation results of Table 2 show speed increases of a factor of 5 over VSH when hashing 1024-bit messages. Performance for long messages is the same as for VSH. \(\mathsf {VSH^{*}}\) and its compression function \(\mathsf {vsh^{*}}\) are provably collision-resistant under the same VSSR assumption as VSH.

Table 2. The size of the modulus used here is 1024. The block and the input size are given in bits.

The inspiration comes from which we notice is very similar to vsh but, unlike the latter, is collision-resistant. The difference is that in the primes u[1],…,u[l],v—referring to the MD-compatible version of the function from Eq. (5)—are quadratic residues. But this turns out to be important for the completeness of the Σ-protocol rather than for collision-resistance. This leads to the compression function \(\mathsf {vsh}^{*}_{N}:\{0,1\}^{l}\times \mathbb {Z}_{N}^{*}\rightarrow \mathbb {Z}_{N}^{*}\) defined by

where f N (z)=0 if and 1 otherwise, p i is the ith prime and c[i] is the ith bit of c. As a check notice that is unlikely to equal because f N (z)≠f N (Nz), meaning the attack showing vsh is not collision-resistant does not apply. Of course, this is not the only possible attack, but the proof of strong special soundness of Proposition 4.3 can be adapted to show that vsh is collision-resistant under the VSSR assumption. Finally \(\mathsf {VSH^{*}}\) is obtained by MD iteration of vsh but with the initial vector being the first k−1 bits of the input. For MD-strengthening, the standard padding method of SHA-1 is used.

The implementation results given in Table 2 were again obtained on a Pentium IV, 3 GHz machine using the GMP library [52]. We set the block size to 128 for both functions and considered hashing a 1024-bit input. In this case (even taking into account the increase in length due to MD strengthening) \(\mathsf {VSH^{*}}\) needs 1 application of its compression function. On the other hand VSH (with their own form of strengthening) needs 9. The implementation shows that \(\mathsf {VSH^{*}}\) is 5.6 times faster. We need to add that our implementations (unlike those of [14]) are not optimized, but our goal was more to assess the comparative than the absolute performance of these hash functions, and this is achieved because both are tested on the same platform.