Keywords

1 Introduction

A (cryptographic) hash function transforms a variable-length message into a fixed-length output, referred to as a “message digest,” a “hash value,” or simply a “hash.” This hash is intended to serve as a unique representative value of the message (i.e., as a “digital fingerprint”). A typical use of hash functions is in digital signature schemes, where the signature is typically applied to the hash of the message.

For such signature schemes to be secure, a hash must be uniquely identifiable by the corresponding message. Nevertheless, hash functions are many-to-one, therefore due to the pigeonhole principle, it is unavoidable that there exists a collision: two distinct messages with the same hash value.

A secure hash function is traditionally required to have three security properties: it should be computationally infeasible to find a collision, as well as to find a second preimage (another message that results in the same hash), or to find a preimage (i.e., to find a message that corresponds to a given hash). For a classical treatment of hash functions based on these three properties (preimage, second preimage, and collision resistance), we refer to the Handbook of Applied Cryptography [7, Chapter 9].

Wang et al. presented a colliding pair of messages for the Message Digest 5 (MD5) hash function at EUROCRYPT 2005 [21], and presented a collision attack for SHA-1 at CRYPTO 2005 [20]. In response to these attacks, NIST announced a competition for a new SHA-3 hash function standard in 2007 [11]. The Keccak hash function was one of the 64 hash functions submitted in 2008 and was eventually selected as the winner of the competition in 2012. In 2015, NIST published FIPS 202 [12], which specifies the SHA-3 standard.

In this paper, we will not focus on the specifications of hash functions, but on the correctness of their implementations. The source codes of the SHA-3 submissions have been subject to years of public scrutiny. Already at the beginning of the competition, Forsythe and Held of Fortify [4] performed a systematic analysis of all first-round candidates against typical programming errors and found buffer overflows, out-of-bound reads, memory leaks, and null dereferences in five reference implementations. In 2018, Mouha et al. [10] introduced a new testing strategy that showed bugs in 41 of the 86 reference implementations. Later at CT-RSA 2020, Mouha and Celi [9] announced a vulnerability in Apple’s CoreCrypto library that affected 11 out of the 12 hash functions that were implemented in the library.

In this paper, we present an undiscovered vulnerability that impacts the final-round submission of Keccak to the SHA-3 competition [13]. The vulnerability also affects the eXtended Keccak Code Package (XKCP) [2] of the Keccak team and various software projects (including Python and PHP) that are based on this source code. The vulnerability described in this paper does not affect the SHA-3 standard (as specified in FIPS 202 [12]), and not all implementations of SHA-3 are vulnerable. Most notably, the implementation of SHA-3 in OpenSSL is not affected.

Fig. 1.
figure 1

The SHA-3 sponge function, where \(M_0\), \(M_1\), \(\ldots \) are the message blocks after padding, the concatenation of \(Z_0\), \(Z_1\), \(\ldots \) is the hash value before truncation, r and c are the rate and capacity in bits, and \(0^{r+c}\) denotes an all-zero string of \(r+c\) bits.

Vulnerability Disclosure. CVE (Common Vulnerabilities and Exposures) identifiers are assigned by CVE Numbering Authorities (CNAs). The vulnerability did not seem to fit the scope of any of the regular CNAs, so the MITRE CNA of Last Resort (CNA-LR) was contacted on August 4, 2022. On August 7, CVE-2022-37454 was assigned to this vulnerability. The Keccak Team was contacted on August 21, and a series of discussions followed regarding the technical details and scope of the vulnerability, potential fixes, and a disclosure process. Proof-of-concept code was disclosed to the main projects that appeared to be impacted (Python, PHP, and pysha3) on October 11. There were no objections to publicly patching and disclosing the vulnerability on October 20. On October 21, the PyPy and SHA3 for Ruby projects were informed as well. The National Vulnerability Database (NVD) assigned the score “9.8 CRITICAL” to this vulnerability on October 25. Fixes are available for the affected projects, therefore the Python and PHP scripts in this paper may no longer produce segmentation faults even though older versions of the interpreters are vulnerable.

2 The SHA-3 Standard

SHA-3 uses the “sponge construction” to process the message in blocks of a fixed size (see Fig. 1). For the four hash functions (SHA3-224, SHA3-256, SHA3-384, and SHA3-512), the number in the suffix refers to the length of the hash value in bits. An eXtendable-Output Function (XOF) is a variant of a hash function that provides a hash value of any requested length. The two XOFs (SHAKE128 and SHAKE256) have a security strength of 128 and 256 bits respectively, assuming the requested output is sufficiently long.

The sponge construction is parameterized by a rate r and a capacity c, both given in bits. For the four hash functions and the two XOFs that are specified in the SHA-3 standard [12], the values of these parameters are given in Table 1. The message M is processed according to a specific padding ruleFootnote 1 so that the padded message becomes a positive multiple of r bits. This allows it to be split into blocks of r bits each: \(M_0\), \(M_1\), \(M_2\), \(\ldots \) As many output blocks \(Z_0\), \(Z_1\), \(Z_2\), \(\ldots \) are generated as necessary, these blocks are concatenated, and the hash value is obtained after truncating to the desired length.

Table 1. Parameters for the SHA-3 standard.

We will use the notation \(0^s\) to refer to the all-zero bit string of length s. In the figures, the numbers given next to every line represent the length of the corresponding bit string and \(\oplus \) is the bitwise eXclusive-OR (XOR) operation. The function f is a cryptographic permutation. It is easy to evaluate f and its inverse \(f^{-1}\), but the outputs should appear “random,” so that any structure in the output only occurs by chance after evaluating f on a sufficient number of inputs. In “sponge” terminology, processing the padded message is referred to as “absorbing,” and generating the hash is referred to as “squeezing.”

This paper will focus mostly on the hash function with the smallest output size: SHA3-224. This is only for convenience and simplicity, as the source code that contains the vulnerability is used by the implementations of all the hash functions and XOFs in the SHA-3 standard, as well as the SHA-3 derived functions (cSHAKE, KMAC, TupleHash, and ParallelHash) specified in SP 800-185 [6].

Two application programmer interfaces (APIs) are common for hash function implementations. More specifically, the message can be processed either at once or incrementally. In the latter case, a call to Keccak_HashInitialize() is followed by any number of calls to Keccak_HashUpdate(), and then followed by a call to Keccak_HashFinal(). In this case, the calls to Keccak_HashUpdate() are “absorbing,” while the single call to Keccak_HashFinal() performs the “squeezing.” This makes it convenient to process a message that consists of several parts: it is not necessary to store these parts in a temporary buffer, but the hash can be computed on the fly.

Many cryptographic algorithms naturally lend themselves to processing the input in blocks: for the cryptographic library HACL\(^*\) [15, 22], 17 algorithms are spread out across 40 implementations, and at least a dozen of those follow a block-based paradigm as pointed out by Protzenko and Ho [17].

3 The Vulnerability

In XKCP versions released before October 20, 2022 (and in other projects such as the Python and PHP scripting languages that included this source code before they were patched), there is a vulnerability in the KeccakSponge.inc file that implements the processing of the message in fixed-size blocks. The same vulnerability is also present in the KeccakSponge.c file of the final-round source code made available by NIST on the SHA-3 competition website. As explained in Sect. 2, the block size is also known as the “rate” and its size in bytes is denoted by rateInBytes in the source code.

The KeccakSponge.inc file contains the following code in SpongeAbsorb() to process the input of the hash function in fixed-size blocks:

figure a

On all 64-bit Windows, Linux, and macOS operating systems, size_t variables are unsigned 64-bit integers and unsigned int variables are 32-bit unsigned integers. Therefore, the variable definitions (not shown here) imply that

  • partialBlock, instance->byteIOIndex, and rateInBytes are unsigned 32-bit integers, whereas

  • dataByteLen and i are unsigned 64-bit integers.

The comparison (partialBlock+instance->byteIOIndex> rateInBytes) is intended to detect when SpongeAbsorb() encounters (partial) inputs that, when added to the instance->byteIOIndex bytes already in the buffer from previous calls (if any) to SpongeAbsorb(), will be larger than the block size (rateInBytes).

This buffer may already contain some data. If this is the case, then a subsequent call to SpongeAbsorb() with an input that is just below \(2^{32}\) bytes (4 GiB) causes partialBlock+instance->byteIOIndex to wrap around due to an integer overflow. This incorrectly results in a value that is lower than the block size, so that the if condition evaluates to false. Consequently, a large value of partialBlock will be passed on to SnP_AddBytes(), resulting in a buffer overflow when these partialBlock bytes are XORed to memory inside SnP_AddBytes().

Additionally, there is an incorrect type casting. If an input of at least \(2^{32}\) bytes (4 GiB) is provided, then the higher bits are discarded due to the cast to an unsigned int. The code will nevertheless be correct if only one call to SpongeAbsorb() is performed. If, however, the buffer already contains some data and an input of at least \(2^{32}\) bytes is provided, then the program will enter into an infinite loop. Note the similarity here with the vulnerability presented at CT-RSA 2020 by Mouha and Celi [9], which affected every implemented hash function except MD2 in Apple’s CoreCrypto library, and also caused an infinite loop.

The infinite loop can be avoided as follows. Assume that an input of x bytes is processed (where \(0<x<\texttt {rateInBytes}\)), so that instance->byteIOIndex is set to x. The buffer then contains x bytes. Then, assume that this is followed by another input of \(2^{32}-x\) bytes. This will create a situation where a large number of bytes of the input message are XORed in memory. If this involves a write operation into unwritable memory, it will cause a segmentation fault. Proof of concept Python and PHP scripts that generate a segmentation fault in this way are given in Appendix A.

If we can ensure that the write is done into writable memory, then this specific input value will avoid an infinite loop, but instead, will exit the loop before the next iteration. We will not go into the details of the techniques to avoid write operations to unwritable memory, but we note that the typical techniques for this (such as stack spraying or heap spraying, depending on the location of the internal hash function state), may also help to mitigate Address Space Layout Randomization (ASLR) if present.

In the following, we explain that if a write operation to unwritable memory can be avoided, it will be possible to generate second preimages and preimages for this specific implementation of the SHA-3 hash function. We reiterate that this is not due to a weakness in the SHA-3 standard, but rather due to the implementation producing an incorrect hash value when provided with malicious inputs. We also show how to provide an exploit payload along with the message, which will overwrite the stack return address to point to the location of the payload inside the message.

3.1 Constructing a Second Preimage

The construction of a second preimage (which also implies a collision) is rather straightforward. As shown in Fig. 2, we process an all-zero message of \(2^{32}\) bytes (4 GiB) using two calls to Keccak_HashUpdate() (which will internally call SpongeAbsorb()). The first call consists of \(0<x<\texttt {rateInBytes}\) bytes, followed by a call of \(2^{32}-x\) bytes. The value of x can be any integer within the specified range, for simplicity we use \(x=1\) in the proof of concept code given in Appendix A.

Fig. 2.
figure 2

SHA-3 second preimage for a vulnerable implementation. The second preimage consists of the following two messages that have the same hash value: the empty string, and the 4 GiB all-zero message \(M_0\Vert M_1\) that is processed using two calls to Keccak_HashUpdate(), where the length of the first call is a positive number of bytes less than \(\texttt {rateInBytes}\). Here, \(M_2\) is an extra block due to the padding of either message, and A refers to the contents of the adjacent memory region that needs to be writable but may be unknown to the attacker.

The \(2^{32}\) bytes of the message will be XORed into memory. As we are XORing all-zero values, the content of the memory will not be changed but may result in a segmentation fault if the memory region is not writable. Therefore, the adjacent memory region, beyond the \(r+c\) bits of the sponge state, does not need to be known to the attacker but needs to be writable.

A call to Keccak_HashUpdate() of \(0<x<\texttt {rateInBytes}\) bytes followed by a call of \(2^{32}-x\) bytes will conveniently result in another integer overflow: instance->byteIOIndex will overflow and end up with a value of zero. Therefore, from the point of view of the implementation, the 4 GiB message is “forgotten” and the computation continues as if nothing has been processed yet.

Now, the padding of SHA-3 becomes relevant. As explained in [12], the padding consists of adding a fixed two- or four-bit suffix to the message (to distinguish the SHA-3 hash functions from the SHA-3 XOFs), followed by the “multi-rate padding rule” which consists of a ‘1’, followed by a possibly empty string of zeros, and a ‘1’. This padding is notably different from the MD4, MD5, SHA-1, and the SHA-2 family, which include the length of the message as part of the padding, a process known as Merkle–Damgård strengthening.

Fig. 3.
figure 3

SHA-3 preimage of zero for a vulnerable implementation. The message \(M_0\Vert M_1\) is again 4 GiB in length and is processed in two calls to Keccak_HashUpdate(), but it contains a well-placed 1 that sets the squeezing variable in the hash function state to a non-zero value. This causes Keccak_Final() to return with an error and the hash value is never written but keeps the value to which it was initialized (typically zero).

Because the padding for SHA-3 does not involve the number of bytes that were processed, we can perform a third call to Keccak_HashUpdate() (and any number of subsequent calls) and the hash value will be the same as when the first two calls to Keccak_HashUpdate() were not present.

As such, we find a second preimage for the vulnerable implementation: given any message, we can prepend 4 GiB of zeros to the message (to be processed as mentioned earlier in two calls) to obtain another message that results in the same hash value.

3.2 Constructing a Preimage of Zero

At SAC 2020, Benmocha et al. [1] studied implementations of the keyed-hash message authentication code (HMAC) when the API is used in an unintended way by adding extra data after the tag has already been computed. They noted that most APIs do not raise an error when used in such a way, and that for OpenSSL it is possible to instantly find collisions and multi-collisions that are also colliding under any key.

The SHA-3 implementation does raise an error when such an API misuse happens. To achieve this, the state contains a squeezing variable that is initialized to zero, and is set to a non-zero value when the padding has been processed. Every time new data is processed, the implementation confirms that the squeezing variable is zero, otherwise the calling function returns with a non-zero value to indicate an error.

On the other hand, an implementation would typically not check for errors that cannot occur if the implementation is correct, and even if they do, such checks might be eliminated as part of a common compiler optimization called “dead code elimination.” If this is the case, we show how to construct a preimage of zero for a vulnerable implementation.

More specifically, we can provide a 4 GiB message (processed using two calls to Keccak_HashUpdate() as before) to reach beyond the \(r+c\) bits of the sponge state, and access the internal variables of the hash function state (see Fig. 3). This allows us to set the squeezing variable to a non-zero value, and when Keccak_HashFinal() calls SpongeAbsorbLastFewBits() to process the padding, it will return early with an error when it finds that squeezing has a non-zero value. In the end, the hash will not be written but will contain the value to which it was initialized, most likely zero.

In Appendix A, we provide proof-of-concept code to use this technique to obtain a preimage of zero for a vulnerable implementation.

3.3 Constructing a Preimage of Any Value

Rather than just creating a preimage of zero, we can use the vulnerability to create a preimage of an arbitrary hash value.

For this, we start with the target hash value H, and pad it to the entire \(r+c\) sponge state. The contents of the padding do not matter, so we can just use zeros for simplicity. Recall that f is a permutation, so we can invert f on any value. The code for the inverse of f is not included in XKCP [2], however, it can be found in KeccakTools [3]. As SHA-3 initializes the \(r+c\) bits of the sponge state with zeros, all we need to do now is XOR the inverse of f with two padding bytes (see [12, App. B.2]) to obtain the first \(r+c\) bits of the \(M_0\Vert M_1\), which is again a message of 4 GiB that is processed in two calls. The other bits of \(M_0\Vert M_1\) are set to zero to avoid altering the adjacent memory regions. The entire procedure is illustrated in Fig. 4.

In literature, the attack is known as the correcting block attack as applied to hash functions based on Cipher Block Chaining (CBC) [16, Sect. 5.3.1.1], such as the attack on the first-round SHA-3 candidate Khichidi-1 [8, Sect. 2.6.3].

In Appendix A, we show how for a vulnerable implementation we can generate a preimage of 000102030405060708090a0b0c0d0e0f101112131415161718191a1b in this way.

Fig. 4.
figure 4

SHA-3 preimage of any H for a vulnerable implementation. We use two calls to Keccak_HashUpdate() to process a message \(M_0\Vert M_1\) that is again 4 GiB in length. However, we now use the fact that f is invertible to determine the correct value of \(M_0\Vert M_1\), noting that we can use the vulnerability to overwrite all \(r+c\) bits of the sponge state.

3.4 Constructing a Message with an Exploit Payload

As shown in Fig. 5, a carefully constructed stack overflow allows the return address of the function to be overwritten. We illustrate this with a simple return-to-stack exploit when an attacker-provided file is hashed, which launches a Meterpreter Reverse TCP payload. This allows the attacker to download and upload files, view the webcam, run post-exploitation tools to pivot deeper into the victim’s device and/or to maintain persistence, etc. Proof-of-concept code is provided in Appendix A. Our exploit assumes that the stack is executable and that ASLR is not present. Note that these assumptions can be avoided by using more advanced exploitation techniques, such as return-oriented programming and techniques to reduce address randomization.

Fig. 5.
figure 5

SHA-3 exploit for a vulnerable implementation, where two calls to Keccak_HashUpdate() are made to provide an attack payload and to overwrite the function return address on the stack. The attacker-provided payload will be executed when the function returns.

3.5 Attacking EdDSA

The use of SHA-3 and its variants is mandatory in certain NIST and Internet Engineering Task Force (IETF) standards. For example, EdDSA [5, 14] makes the use of SHAKE256 mandatory for Ed448. The vulnerability would then work as follows. If an implementation of Ed448 verification (with the default empty-string context) places a 10-byte encoded context, a 57-byte point R, and a 57-byte public key Q in the buffer, then \(10+57+57=124\) bytes are in the buffer before the message is processed. This is less than 136 bytes, which is the rate in bytes for SHAKE256. Therefore, a message of \(2^{32}-124\) bytes can be used to cause the buffer overflow described in this paper. Note that the message does not need to be correctly signed for the buffer overflow attack on the verification function to work.

As this example shows, the use of repeated calls to Keccak_HashUpdate() can occur quite naturally, for algorithms such as EdDSA where the input consists of a concatenation of various values.

4 Discussion

The execution time to process the 4 GiB message will depend on the platform. However, no calls to the cryptographic permutation f are involved, therefore the execution time is mainly the time required to XOR a 4 GiB value into memory. In our experiments on a recent laptop, we observed an execution time of 2 to 3 s to process this input. The proof-of-concept code in Appendix A shows various techniques to avoid a large amount of RAM or swap space, such as using mmap() to create file-backed and anonymous mappings. However, it may be necessary for some attacks that the amount of RAM and swap together is at least 4 GiB to avoid an error that insufficient memory is available. It does not seem that 32-bit systems are vulnerable because the address space is insufficient to malloc() or mmap() such an input.

Not all implementations of SHA-3 are vulnerable to this bug. For example, the implementation of OpenSSL is not based on the XKCP, and incidentally, Python 3.9 has been patched to use OpenSSL’s implementation when available [18]. As explained by Christian Heimes [19], both SHA-3 and SHAKE are listed in hashlib.algorithms_guaranteed, but using OpenSSL is optional. This explains why the vulnerable code has not been removed and that it may be reachable under some configurations. Since Python 3.11, however, the XKCP implementation was replaced by Saarinen’s tiny_sha3 [19].

Projects that are derived from Python, such as PyPy3, may remain vulnerable for a longer time due to a slower adoption of Python patches. For example, the PyPy 3.8 release is vulnerable, but the latest PyPy 3.9 release incorporates the patch to use the OpenSSL implementation.

A possible suggestion to mitigate the vulnerability is to switch the default SHA-3 and SHAKE from XKCP to OpenSSL, or Saarinen’s tiny_sha. Another suggestion to mitigate this bug is to limit the maximum size of a call to \(2^{32}-\texttt {rateInBytes}\) bytes, where rateInBytes is either the corresponding value in Table 1 for the given SHA-3 hash function or XOF, or a cautious upper limit of 200 (the size of the sponge state in bytes). Lastly, the vulnerability can also be avoided by always processing the entire message at once, which may require the use of a temporary buffer.

Note that the Large Data Test (LDT) that was introduced at CT-RSA 2020 by Mouha and Celi [9] is not effective to find this bug (nor for regression testing) because a specific sequence of calls is required; a single call with a large input will not trigger the vulnerability. Bugs of this type may be difficult to find through testing because they require a very specific sequence of calls, which may explain why this bug has not been discovered since it was first introduced in 2011. Nevertheless, the bug may be triggered using only one call to higher-level algorithms that are now introducing SHA-3 or its variants, as in the Ed448 example mentioned earlier.

The bug was not present in the first- and second-round submissions of the Keccak package to the NIST SHA-3 competition, but appears in the implementation that was submitted in the final round where partialBlock was changed from a 64-bit to a 32-bit variable. Nevertheless, we note a slight difference with the bug in the Keccak package: the incorrect line contains databitlen rather than dataByteLen, and therefore a message of \(2^{32}\) bits (0.5 GiB) rather than \(2^{32}\) bytes (4 GiB) is required to trigger the bug described in this paper. Taking this change into account, all attacks described in this paper also apply to the final-round Keccak submission to the SHA-3 competition.

5 Proposing the Init-Update-Final Test (IUFT)

Within the NIST Cryptographic Algorithm Validation Program (CAVP), when testing hash functions, a single call to Keccak_HashUpdate() is performed to compute the hash value. As we have shown, this is not sufficient to cover corner cases that appear in practice. Testing must match the real-world use cases of an implementation to be effective. Currently, there is a gap in the test coverage offered by NIST. To cover this gap, we propose the Init-Update-Final Test (IUFT) as a solution in Fig. 6. The example is given in the Automated Cryptographic Validation Protocol (ACVP) JavaScript Object Notation (JSON) format, and includes an optional Large Data Test (LDT) element proposed by Mouha and Celi at CT-RSA 2020 [9].

Fig. 6.
figure 6

An example Init-Update-Final Test (IUFT) case for the ACVP JSON format. An array of messages with lengths (in bits) are passed to the Keccak_HashUpdate() function individually and in order before Keccak_Final() is called. This example test case would cause a segmentation fault when run on vulnerable implementations.

6 Conclusion

We described a buffer overflow vulnerability in the final-round Keccak submission package to the NIST SHA-3 competition, in the eXtended Keccak Code Package (XKCP), and in various projects such as the Python and PHP interpreters that incorporate this code.

The vulnerability is due to a 32-bit integer overflow that occurs when a large (around 4 GiB) call to Keccak_HashUpdate() is made after an incomplete number of blocks have been processed. Depending on the length of the call, this will result in either an infinite loop or an attacker-chosen 4 GiB value that is XORed into memory, resulting in a buffer overflow.

We showed how this buffer overflow can be leveraged to violate the cryptographic properties of the hash function (preimage, second preimage, and collision resistance), as it provides the attacker full control over the \(r+c\) bits of the sponge state. Moreover, we showed how to overwrite the stack pointer and execute an attacker-provided payload.

Lastly, we proposed the Init-Update-Final Test (IUFT) that can process an input in several parts.