Abstract
We describe new arithmetic coding techniques and side-channel blinding countermeasures for lattice-based cryptography. Using these techniques, we develop a practical, compact, and more quantum-resistant variant of the BLISS Ideal Lattice Signature Scheme. We first show how the BLISS parameters and hash-based random oracle can be modified to be more secure against quantum pre-image attacks while optimizing signature size. Arithmetic Coding offers an information-theoretically optimal compression for stationary and memoryless sources, such as the discrete Gaussian distributions often present in lattice-based cryptography. We show that this technique gives better signature sizes than the previously proposed advanced Huffman-based signature compressors. We further demonstrate that arithmetic decoding from a uniform source to a target distribution is also an optimal non-uniform sampling method in the sense that a minimal amount of true random bits is required. Performance of this new Binary Arithmetic Coding sampler is comparable to other practical samplers. The same code, tables, or circuitry can be utilized for both tasks, eliminating the need for separate sampling and compression components. We then describe simple randomized blinding techniques that can be applied to anti-cyclic polynomial multiplication to mask timing and power-consumption side channels in ring arithmetic. We further show that the Gaussian sampling process can also be blinded by a split-and-permute technique as an effective countermeasure against side-channel attacks.
1 Introduction
Recent years have seen an increased focus on Lattice-based and other “quantum-resistant” public key cryptography for classical computers, which has also been fueled by official governmental warnings and endorsements [4, 5, 7, 27].
However, standardization efforts for these new algorithms are only starting [6] and post-quantum algorithms clearly have not yet reached the level of maturity that is expected from traditional cryptosystems (RSA or Elliptic Curve). Lattice-based public key algorithms have been criticized for their key and signature/ciphertext sizes, and for their lack of resistance against side-channel attacks. This paper offers a number of novel implementation techniques that address these issues.
The Bimodal Lattice Signature Scheme (BLISS) was proposed in CRYPTO 2013 by Ducas et al. [9] and offers some of the most compact and efficient lattice-based signatures currently available. A recent survey of practical lattice-based signature schemes [17] found BLISS to offer state-of-the-art performance and recommended it for practical use. The scheme has been implemented in FPGA hardware [30], on an 8-bit AVR target [24], and has been distributed as a part of the strongSwan IPsec VPN suite. Relevant hardware implementation techniques are also considered in [33]. We use BLISS as the basis for our new lattice-based signature implementation, dubbed BLZZRD.
After describing the notation (Sect. 2.1) and the original BLISS scheme (Sect. 2.2), we move to our new discoveries:
1. Grover’s Attack on Random Oracle can be mounted with a Quantum Computer against many signature algorithms. The attack can be applied against the random oracle component of BLISS (Sect. 3). We show how to modify the algorithm and its security parameters to counter this attack with small overhead (Sect. 3.1).
2. Arithmetic Coding is ideally suited for compression of stationary distributions (Sect. 4.1) arising in Lattice cryptography, achieving near-optimal efficiency. We show how a Binary Arithmetic Coder (BAC) can be implemented with limited precision (Sect. 5), and used to compress lattice signatures (Sect. 5.2).
3. Gaussian sampling can be implemented with a BAC. The same arithmetic coding tables, code, and circuitry can also be used to efficiently implement Gaussian sampling, another major operation required in Lattice cryptosystems (Sect. 5.4). The resulting sampler is optimal in the sense that it requires a minimum number of true random bits.
4. Polynomial Blinding is an efficient ring-arithmetic side-channel countermeasure. It hides the details of time or energy consumption of individual arithmetic operations via randomization. Polynomial blinding can be implemented with a minimal impact on overall performance or footprint (Sect. 6.1).
5. Split Shuffled Sampling is a general side-channel countermeasure for Gaussian samplers. Here two or more sample vectors are randomized and combined with each other, masking the properties of individual samples during the random sampling process (Sect. 6.2). This is an effective countermeasure against attacks such as the cache attack presented against BLISS in CHES 2016 [2].
Some efficient implementation tricks for tasks such as computation of the discrete Gaussian distribution are also described. We further take advantage of recent advances in security proof techniques, which indicate a smaller precision requirement and thus lead to smaller, faster implementations.
2 BLISS and BLZZRD
BLISS is a lattice-based public key signature algorithm proposed by Ducas et al. [9]. We use the more recent BLISS-B variant [8] as the basis of our work. We refer the reader to these works for security analysis and other design rationale of the original proposals. Since BLZZRD is an improvement with quantum resistance, compressed signatures, and side-channel countermeasures, we concentrate on these specific implementation techniques in the present work.
2.1 Conventions and notation
Arithmetic is performed in a cyclotomic ring \( \mathbf {R} = {\mathbb {Z}}_q[x]/(x^n+1)\) where q is a small prime. Such a ring is anti-circulant as \(x^n \equiv -1 ~ (\bmod ~ x^n+1)\). Arithmetic in this ring can be performed efficiently via Number Theoretic Transforms (NTT, finite field FFT) when 2n divides \(q-1\) and n is a power of 2.
Non-boldface \(f_i\) denotes individual coefficients of polynomial \( \mathbf {f} = \sum _{i=0}^{n-1} f_i x^i\). Hence \( \mathbf {f} \in \mathbf {R}\) is a ring element, while \(f_i \in {\mathbb {Z}}_q\). We use \( \mathbf {a}* \mathbf {b}=\sum _{i=0}^{n-1} \sum _{j=0}^{n-1} a_i b_j x^{i+j}\) to denote the product of polynomials \( \mathbf {a}\) and \( \mathbf {b}\). For anti-circulant rings of degree n, we always have \(f_{i+n} = -f_i\). We may therefore reduce \(f_i=(-1)^{\lfloor i/n \rfloor } f_{i \bmod n}\) for all \(i \in {\mathbb {Z}}\).
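To make the anti-cyclic reduction rule concrete, here is a minimal schoolbook sketch in Python (illustrative only; the function name is ours, and the reference implementation uses NTT-based multiplication instead):

```python
def anticyclic_mul(a, b, q):
    """Schoolbook product of two polynomials in Z_q[x]/(x^n + 1).

    Terms with exponent i + j >= n wrap around with a sign flip,
    since x^n = -1 in this ring.
    """
    n = len(a)
    r = [0] * n
    for i in range(n):
        for j in range(n):
            v = a[i] * b[j]
            if i + j < n:
                r[i + j] = (r[i + j] + v) % q
            else:
                r[i + j - n] = (r[i + j - n] - v) % q
    return r
```

For example, in \({\mathbb {Z}}_q[x]/(x^2+1)\) we get \(x \cdot x = x^2 = -1\), so the coefficient wraps to position 0 with its sign flipped.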
We interchangeably use polynomials as zero-indexed vectors; the dot product is defined as \( \mathbf {a} \cdot \mathbf {b} = \sum _{i=0}^{n-1} a_i b_i\), the Euclidean norms are \(\Vert \mathbf {a}\Vert ^2 = \mathbf {a} \cdot \mathbf {a}\) and \(\Vert \mathbf {a}\Vert _2 = \sqrt{ \mathbf {a} \cdot \mathbf {a}}\), and the sup norm \(\Vert \mathbf {a}\Vert _\infty \) is the largest absolute value \(\max \{|a_0|, |a_1|, \ldots , |a_{n-1}| \}\). Rounding \(\lfloor x \rceil = \lfloor x + \frac{1}{2} \rfloor \) denotes the closest integer to x.
A discrete Gaussian distribution with mean \(\mu \), deviation \(\sigma \) and variance \(\sigma ^2\) is denoted \({\mathcal {N}}_{\mathbb {Z}}(\mu , \sigma ^2)\). We use exclusively zero-centered (\(\mu = 0\)) distributions. See Sect. 4.1 for definitions and further discussion.
In pseudocode, we use \(\wedge , \vee , \lnot \) symbols to denote bitwise Boolean manipulation of two’s complement integers. A binary modular reduction operator \(\bmod ~n\) returns an integer in range \([0, n-1]\).
2.2 Description of BLISS
Our description differs somewhat from the original description (which is even more dense). We used the original reference implementation and the strongSwan “production grade” implementation to verify the correctness of our interpretation.
Various parameters, symbols, and security claims used in the descriptions are given in Table 1. These are the parameters used by both publications and can also be found in the implementations; they are unmodified for BLZZRD apart from \(\kappa \). See Sect. 3.1 for further information.
2.3 Key generation
The BLISS-B key generation is perhaps the simplest part of the scheme; Algorithm 1 describes the process. It differs from the original BLISS key generation in that keys are not rejected based on an additional norm check.
2.4 Creating signatures
Algorithm 2 describes the signature generation method. Here we note that the purpose of the random oracle H is to take the input \(( \mathbf {w}, \mathbf {M})\) and hash it into a vector of length n that has exactly \(\kappa \) ones, with other entries being zero (the vector can also be interpreted as a polynomial in the ring). The reference implementation is an ad hoc construction based on SHA-512 [14]; the strongSwan implementation now uses the MGF1 construction [18]. We use a two-stage random oracle described in Sect. 3.1. The BLISS-B signature method differs from BLISS in steps 8–15 of Algorithm 2; this is the Greedy Sign Choices algorithm for minimizing the norm \(\Vert \mathbf {c} \cdot \mathbf {t}\Vert + \Vert \mathbf {c} \cdot \mathbf {u}\Vert \). Otherwise the algorithms are equivalent and BLISS can even be used to verify BLISS-B signatures without modification.
2.5 Verifying signatures
The BLISS signature verification process is described in Algorithm 3. We note that verification is very fast; only two quick norm-bound checks and a single ring multiplication are required in addition to a call to the random oracle.
3 Security of the random oracle
Is there a trivial way to forge a signature \(( \mathbf {t}, \mathbf {z}, \mathbf {c})\) that will pass signature verification (Algorithm 3) for some public key \( \mathbf {a}\) and message \( \mathbf {M}\)?
We can easily choose arbitrary \( \mathbf {t}\) and \( \mathbf {z}\) that pass steps 1 and 2 (norm bounds). What is left is the matching of the oracle outputs \( \mathbf {c} \mathop {=}\limits ^{?} \mathbf {c}^\star \). There are \(\left( {\begin{array}{c}n\\ \kappa \end{array}}\right) \) possible ways for \(H( \mathbf {w}, \mathbf {M})\) to construct a vector with \(\kappa \) entries as 1, and remaining \(n - \kappa \) entries being zero.
Examining the BLISS reference implementations, we find that \( \mathbf {c}\) may not be transmitted or used as a vector (or a sorted list) but as an unsorted output of the oracle; in this case there are \(\frac{n!}{(n-\kappa )!}\) possibilities. To get from list (ordered set) matching to vector (unordered set) matching complexity, one can simply rearrange the input \( \mathbf {c}\) values in the signature to match the output from the oracle for the forged message \( \mathbf {M}\) (note that this is not as easy with our revised Oracle described in Sect. 3.1). These entropies (given in Table 2) appear to be consistent with security claims.
This is reasonable in classical computing, where an \(O(2^H)\) pre-image search is required, H being the entropy of the target range; however, the signer herself can create a signature that matches two different messages with relative ease. BLISS signatures are therefore not collision resistant.
We note that pre-image search is one of the things that quantum computers do well. This leads us to the following simple theorem:
Theorem 1
(Exhaustive Quantum Forgery) A quantum adversary can forge an arbitrary BLISS signature with \(O\left( \sqrt{\left( {\begin{array}{c}n\\ \kappa \end{array}}\right) }\right) \) complexity.
Proof
With fixed \( \mathbf {t}\) and \( \mathbf {c}\) in Algorithm 3 one can mount an exhaustive pre-image search to find some message \( \mathbf {M}\) or small-norm vector \({\varDelta } \mathbf {z} = \mathbf {z} - \mathbf {z'}\) that satisfies \( \mathbf {c} = H( \mathbf {w} + {\varDelta } \mathbf {z}, \mathbf {M})\). This search with Grover’s Algorithm [15, 16] will require \(\frac{\pi }{4} \sqrt{\left( {\begin{array}{c}n\\ \kappa \end{array}}\right) }\) lookups. This is optimal for a quantum search [39]. \(\square \)
We observe that with the suggested parameters and security levels in Tables 1 and 2, the security level against a quantum adversary falls short of the target; for example, BLISS-I and BLISS-II with claimed 128-bit security would fall with complexity of roughly \(2^{66}\).
3.1 New random oracle
With \(n=512\), a trivial encoding of \( \mathbf {c}\) requires \(9\kappa \) bits for transmitting the indexes. In BLZZRD the \( \mathbf {c}\) vector itself (or the nonzero index set) is not transmitted. An intermediate function \(H_i( \mathbf {w}, \mathbf {M})\) is first used to compute a \(\theta \)-bit random hash \(c_\theta \). The intermediate then can be converted to the \( \mathbf {c}\) vector with ones at exactly \(\kappa \) positions via \(H_o\) oracle function:
A related random oracle construction was also proposed in [37], where the hash function output was extended using a PRNG. Fortunately we now have a hash standard that directly supports eXtendable-Output Functions (XOFs).
BLZZRD uses the SHA3 hash function and SHAKE XOF [13, 14]. \(H_i\) is implemented with SHA3-256 or SHA3-384, depending on corresponding \(\theta \) value:
Contents of \( \mathbf {w} = (w_0 ~|~ w_1 ~| \cdots |~ w_{n-1})\) are encoded as network byte order 16-bit integers. To produce the \( \mathbf {c}\) indexes, SHA3’s SHAKE256 extendable-output function (XOF) is used. The \(c_\theta \) intermediate hash is used to seed the XOF, from which an arbitrary number of 16-bit big-endian values can be extracted. These are masked to index range \([0, n-1]\) and rejected if already contained in \( \mathbf {c}\). This process is repeated until \(\kappa \) ones are found for the \( \mathbf {c}\) vector.
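The index-generation step can be sketched with Python's standard hashlib (the function name and the way we grow the XOF output buffer are our own illustrative choices, not the reference encoding):

```python
import hashlib

def oracle_indexes(c_theta: bytes, n: int, kappa: int) -> list:
    """Expand the intermediate hash c_theta into kappa distinct
    indexes in [0, n-1]: 16-bit big-endian words are read from the
    SHAKE256 XOF, masked to the index range, and rejected if already
    present.  n must be a power of two for masking to be uniform."""
    assert n & (n - 1) == 0 and 0 < kappa <= n
    idx, length = [], 4 * kappa   # bytes of XOF output; grown on demand
    while len(idx) < kappa:
        # SHAKE output is prefix-consistent, so re-reading with a
        # larger length just extends the same stream.
        buf = hashlib.shake_256(c_theta).digest(length)
        idx = []
        for i in range(0, len(buf) - 1, 2):
            x = int.from_bytes(buf[i:i + 2], "big") & (n - 1)
            if x not in idx:
                idx.append(x)
                if len(idx) == kappa:
                    break
        length *= 2
    return idx
```

The rejection of duplicates makes the expansion deterministic but slightly variable in the amount of XOF output consumed.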
Note that SHAKE256 accepts an arbitrary-sized input “key” and has an internal chaining capacity of 512 bits. Its collision resistance as a hash function is at the 256-bit security level, and this is also its security level against a quantum pre-image attack; more than adequate for 128-, 160-, and 192-bit security goals.
When transmitting or comparing signatures (Algorithms 2 and 3) we may use \(c_\theta \) instead of the full \( \mathbf {c}\) vector or index set. This leads to more compact signatures. See Table 2 for numerical values of \(\theta \); it has been chosen to be double the security parameter to counter the Quantum Forgery Attack of Theorem 1. The new \(\kappa \) parameter is chosen so that \(\left( {\begin{array}{c}n\\ \kappa \end{array}}\right) > 2^\theta \).
The increased \(\kappa \) has an impact on both signature creation and verification speed, and also on the rejection ratio (Step 18 in Algorithm 2). Experimentally the slowdown is less than \(30\%\) in all cases. This figure is somewhat arbitrary due to the way the constant M is determined in [9].
Resistance against general quantum attacks Even though our modification of \(\kappa \) helps to secure the scheme against this particular quantum attack, it does not secure the scheme against others. We encourage further work proposing revised parameters against more general quantum attacks.
4 Discrete Gaussians
Obtaining random numbers from a Gaussian (normal) or other non-uniform distributions is called sampling. Cryptographically secure sampling is required by many Lattice-based cryptographic algorithms; see [10] for an overview.
4.1 Gaussian sampling
A random sampler from a zero-centered discrete Gaussian distribution \({\mathcal {N}}_{\mathbb {Z}}(0, \sigma ^2)\) returns integer \(x \in {\mathbb {Z}}\) with probability given by the density function \(\rho _\sigma (x)\). The probability mass of the discrete Gaussian distribution at x is proportional to \(\rho _\sigma (x) \propto \exp (-\frac{x^2}{2 \sigma ^2})\), where \(\sigma \) is a deviation parameter (see Fig. 1). For \(\sigma \gtrapprox 2\) we can approximate it to high precision with \(\rho _\sigma (x) = \frac{1}{\sigma \sqrt{2 \pi }} \exp \left( -\frac{x^2}{2 \sigma ^2}\right) .\) (3)
4.2 Tailcut
As |x| grows, \(\rho _\sigma (x)\) eventually diminishes into statistical insignificance. We may choose a cryptographic threshold parameter such as \(\epsilon = 2^{-128}\) and simply ignore all \(|x| > \tau \) when \(2 \sum _{x > \tau } \rho _\sigma (x) \le \epsilon \). Therefore the probability that a random x from the distribution satisfies \(-\tau< x < \tau \) is greater than \(1 - \epsilon \). We call \(\tau \) the “tailcut parameter.” Its typical numerical value for the \(\epsilon = 2^{-128}\) bound is related to the deviation by roughly \(\tau = 13.2 \sigma \).
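The \(\tau = 13.2\sigma \) rule of thumb can be sanity-checked numerically; a small sketch (double-precision floats, illustrative rather than a rigorous bound):

```python
import math

def tail_mass(sigma: float, tau: int) -> float:
    """Approximate 2 * sum_{x > tau} rho_sigma(x), the probability
    mass outside the tailcut, with rho normalized as in Eq. (3)."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    s = 0.0
    # terms decay very fast; a few thousand suffice this far out
    for x in range(tau + 1, tau + 5000):
        s += c * math.exp(-x * x / (2.0 * sigma * sigma))
    return 2.0 * s

sigma = 215.0                       # BLISS-I deviation parameter
tau = math.ceil(13.2 * sigma)
eps = tail_mass(sigma, tau)         # falls below the 2^-128 threshold
```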
4.3 Efficient computation of density tables
Essentially all sampling algorithms require distribution density tables. Evaluation of transcendental functions (such as \(\exp \)) is slow if generic algorithms are used.
We derive two parameters \(b = \exp \big (-\frac{1}{2\sigma ^2}\big )\) and \(c = {1}/{\sigma \sqrt{2 \pi }}\) from deviation \(\sigma \). Equation (3) can now be written as \(\rho _\sigma (x) = c b^{(x^2)}\). Since \(x^2 \in {\mathbb {Z}}\), square-and-multiply exponentiation algorithms (analogous to those used for modular exponentiation) may be used to efficiently compute the power \(b^{(x^2)}\) at arbitrary \(x \in {\mathbb {Z}}\). Real values can be kept at fixed point range [0, 1] if floating point arithmetic is not available.
When tabulating the density, due to symmetry \(\rho _\sigma (x) = \rho _\sigma (-x)\) we consider the sequence \(t_0, t_1, t_2, \ldots \) with \(t_i = \rho _\sigma (i)\). By observing that the ratio of consecutive values satisfies \(\frac{t_{i+1}}{t_i} = u_i = b^{2i+1}\) we arrive at the following recurrence: \(t_0 = c, ~~ u_0 = b, ~~ t_{i+1} = u_i t_i, ~~ u_{i+1} = b^2 u_i.\) (4)
The algorithm of Eq. (4) computes consecutive discrete Gaussian density function values with only two multiplications per point (\(b^2\) stays constant). This new technique makes the table initialization process much faster, a great advantage for limited-resource implementations.
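The recurrence can be sketched as follows (floating point for illustration; a fixed-point version would be used on constrained targets):

```python
import math

def density_table(sigma: float, num: int) -> list:
    """Tabulate t_i = rho_sigma(i) = c * b^(i*i) for i = 0..num-1
    using t_{i+1} = u_i * t_i and u_{i+1} = b^2 * u_i, where
    u_i = b^(2i+1); only two multiplications per table entry."""
    b = math.exp(-1.0 / (2.0 * sigma * sigma))
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    bb = b * b
    t, u = c, b
    table = []
    for _ in range(num):
        table.append(t)
        t *= u          # t_{i+1} = t_i * b^(2i+1) = c * b^((i+1)^2)
        u *= bb         # u_{i+1} = b^(2i+3)
    return table
```

Each entry agrees with the direct evaluation \(c\,b^{i^2}\) up to floating-point rounding.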
4.4 Gaussian sampling algorithms
We first analyze the “inversion sampler” which is one way of using uniform random bits to select an element from the target distribution by inverting its cumulative distribution. Since \({\mathcal {N}}_{\mathbb {Z}}(0, \sigma ^2)\) is symmetric we can define a cumulative sequence \(s_i = \sum _{x=-i}^{i} \rho _\sigma (x)\). It can be computed as an extension of sequences of Eq. (4) via \(s_0 = t_0\) and \(s_i = s_{i-1} + 2 t_i\). Clearly the sum of all probabilities converges to \(s_\infty = 1\).
For cryptographic applications we can assume a source of unbiased random bits \(z_i \in \{0,1\}\). A sequence of n random bits can be viewed as real-valued \(z \in [0,1]\) via binary fraction \(z = 0.z_1z_2z_3\ldots z_n\). When n is finite, \(z_1, z_2, \ldots z_n\) only defines a range of size \(2^{-n}\). For uniformly (\(n=\infty \)) random \(z \in [0,1]\) the corresponding sampled integer x can be derived with an additional random sign bit via \(|x| = \min \{\, i \ge 0 : z < s_i \,\}.\)
This corresponds to “inversion sampling,” and can be implemented via a binary search into monotonically increasing table \(s_i\) by first randomizing a large number of bits to create a high-precision real number z.
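A floating-point sketch of such an inversion sampler (binary search into the cumulative table; the function names are ours, and a hardened implementation would use fixed-point arithmetic and a constant-time search):

```python
import bisect
import math
import random

def cdf_table(sigma: float, tau: int) -> list:
    """Cumulative table s_i = sum_{x=-i..i} rho_sigma(x), built with
    s_0 = t_0 and s_i = s_{i-1} + 2 t_i."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    def t(i):
        return c * math.exp(-i * i / (2.0 * sigma * sigma))
    s = [t(0)]
    for i in range(1, tau + 1):
        s.append(s[-1] + 2.0 * t(i))
    return s

def inversion_sample(s, rng=random):
    """Draw z uniform in [0, s_tau), binary-search the smallest i
    with z < s_i, and attach a random sign to nonzero samples."""
    z = rng.random() * s[-1]
    i = bisect.bisect_right(s, z)   # first index with s[i] > z
    if i == 0:
        return 0
    return i if rng.getrandbits(1) else -i
```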
A wide variety of sampling methods such as Inversion Sampling [28], Knuth-Yao Sampling [20], The Ziggurat Method [3, 11, 25, 26], Kahn-Karney Sampling [19], and “Bernoulli” sampling [9] have been proposed for lattice cryptography.
5 Arithmetic coding
Arithmetic Coding is a classical data compression technique [31], notable for its optimality under certain conditions. Information theory tells us that the average entropy (in bits/symbol) of a stationary and memoryless discrete source such as \({\varOmega }= {\mathcal {N}}_{\mathbb {Z}}(0, \sigma ^2)\) is \(H({\varOmega }) = -\sum _{x} P(x) \log _2 P(x).\) (6)
Equation (6) gives us a lower bound for bits required to represent any discrete Gaussian sequence; this is also the expected number of random entropy bits required in sampling.
Theorem 2
Arithmetic coding is optimal for a static and memoryless discrete source \({\varOmega }\) in the sense that it is able to encode n symbols from such a source into \(n H({\varOmega }) + O(1)\) data bits.
Proof
See Section 1.5 of [35]. \(\square \)
5.1 Implementing a Binary Arithmetic Coder
Arithmetic Coding and Decoding can be implemented in many ways. Binary Arithmetic Code (BAC) [23] is one such approach. For BLZZRD reference code, we implemented a BAC encoder and decoder for a static (non-dynamic) distribution. Range scaling details of the new codec were inspired by [35], even though that implementation isn’t a BAC. See [38] for a classic implementation of arithmetic coding.
Our implementation is in C language, uses 64-bit fixed point precision, and is highly portable. Importantly, we pose no specific limits on buffer carry bit back-propagation, such as “bit stuffing” used in Q-Coder hardware architecture [29]. Full source is distributed together with the reference implementation (see Sect. 7). The implementation is only about 250 lines.
Table 3 gives the variable conventions used in the implementation. Constants used to define the BLZZRD BAC are as follows:
- P : Bit precision used by the implementation. We used \(P=64\).
- D : Alphabet size in bits. For discrete Gaussians, we need \(2^D > 2\tau \).
With a BAC, an alphabet of size \(2^D\) can be viewed as a binary tree of depth D. For each one of the D binary decisions in encoding or decoding, a single comparison is required.
The input frequencies can be arbitrarily scaled “counts.” The probability of occurrence of x is \(P(x) = \mathsf {freq}[x] \,/ \sum _{i=0}^{2^D-1} \mathsf {freq}[i].\)
Given a table of frequencies \( \mathsf {freq}[2^D]\), function \( \mathsf {BuildDist}\) (Algorithm 4) creates a corresponding scaled BAC distribution table “tree” \( \mathsf {dist}[2^D]\) that is used by both encoding and decoding functions.
The encoding routine \( \mathsf {AriEncode}\) (Algorithm 6) requires a helper function \( \mathsf {StoreWithCarry}\) (Algorithm 5) to propagate carries. Arithmetic coding routines generally require an output buffer as these carries cannot be reliably predicted. The decoding routine \( \mathsf {AriDecode}\) (Algorithm 8) utilizes a helper function \( \mathsf {ShiftGetBit}\) (Algorithm 7) to get new bits into the working variable x.
The coding and decoding procedure is illustrated by Fig. 2. Exhaustive testing with various distributions was performed to verify the correct operation of the codec.
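The principle behind the codec can be illustrated with a toy encoder/decoder that uses exact rational arithmetic instead of the P-bit fixed-point ranges and carry propagation of Algorithms 4–8 (illustrative only; exact fractions grow without bound, which is precisely what the limited-precision renormalization avoids):

```python
import math
from fractions import Fraction

def cumulative(freq):
    """Prefix sums of the scaled frequency counts."""
    cum = [0]
    for f in freq:
        cum.append(cum[-1] + f)
    return cum

def ari_encode(symbols, freq):
    """Narrow [lo, lo + w) by each symbol's probability slice, then
    emit the bits of a dyadic interval contained in the final range."""
    total, cum = sum(freq), cumulative(freq)
    lo, w = Fraction(0), Fraction(1)
    for s in symbols:
        lo += w * Fraction(cum[s], total)
        w *= Fraction(freq[s], total)
    k = 1                               # 2^-k <= w/2 guarantees a fit
    while Fraction(1, 2 ** k) > w / 2:
        k += 1
    v = math.ceil(lo * 2 ** k)          # smallest grid point >= lo
    return [(v >> (k - 1 - i)) & 1 for i in range(k)]

def ari_decode(bits, freq, count):
    """Interpret the bits as a binary fraction and replay the
    interval narrowing to recover the symbols."""
    total, cum = sum(freq), cumulative(freq)
    z = Fraction(sum(b << (len(bits) - 1 - i) for i, b in enumerate(bits)),
                 2 ** len(bits))
    lo, w = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        r = (z - lo) / w                # position inside current range
        s = max(i for i in range(len(freq)) if Fraction(cum[i], total) <= r)
        out.append(s)
        lo += w * Fraction(cum[s], total)
        w *= Fraction(freq[s], total)
    return out
```

The emitted length is within two bits of \(-\log _2\) of the sequence probability, matching the \(n H({\varOmega }) + O(1)\) bound of Theorem 2.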
5.2 Compressing signatures
We have implemented arithmetic coding compression and decompression for the \( \mathbf {t}\) component of signatures generated with new BLZZRD parameters. The \( \mathbf {z}\) vector was also modeled as discrete Gaussian with a small \(\sigma \). See Table 4 for our parameters and experimental results. Rather than transmitting the \( \mathbf {c}\) vector indexes, an intermediate hash of \(\theta \) bits is transmitted. The last line gives the experimental average signature sizes, including overhead required to encode parameter and array sizes.
5.3 Comparison with a Huffman code
The extended version of [30] describes an advanced “block” Huffman encoding technique for the BLISS-I parameter set. A similar technique is also implemented in the strongSwan project.
The codec achieves good performance (for a Huffman code) by assigning codes to quadruplets \(\Big (\lfloor \frac{{{\mathrm{abs}}}(t_{2i})}{2^8} \rfloor , \lfloor \frac{{{\mathrm{abs}}}(t_{2i+1})}{2^8} \rfloor , z_{2i}, z_{2i+1} \Big )\) rather than individual \(t_i\) and \(z_i\) values. The lower 8 bits of the \(t_i\) values are stored as they are, and up to four sign bits are stored for nonzero entries.
We note that the 64-entry Huffman tree given in Appendix B of [30] for BLISS-I can only encode values in range \(-1023 \le t_i \le 1023\). Since \(t_i\) comes from Gaussian distribution with \(\sigma = 215\), there is a probability of \(P = 1 - \sum _{x=-1023}^{1023} \rho _\sigma (x) \approx 2^{-18.98}\) that individual \(t_i\) values will overflow and a probability of \(1-(1-P)^n \approx 2^{-9.98}\) (or roughly one in thousand) that an individual \( \mathbf {t}\) signature vector cannot be compressed using the proposed code.
Ignoring these overflow conditions, our exhaustive testing showed that the block Huffman codec compresses \(( \mathbf {t}, \mathbf {z})\) to an average of 5650.1 bits/signature. Our arithmetic coder achieves a slightly better result, 5565.2 bits/signature. A non-customized Huffman coding technique would yield significantly inferior results.
Furthermore, [30] states that “\( \mathbf {c}\) does not contain any easily removable redundancy,” and therefore \(\kappa \log _2(n) = 207\) bits are used. This is not strictly accurate, as our new oracle (Sect. 3.1) requires a smaller number of bits. With the original BLISS-I security parameters only \(\theta =128\) bits would be required, a \(38 \%\) drop. For BLZZRD-I, we have \(\kappa =58\) and \(\theta =256\), indicating a reduction to less than half of the corresponding plain BLISS encoding size.
Arithmetic Coding is somewhat slower than Huffman coding, but as can be observed in Table 7, compression is not a bottleneck in the signing or verification operation. As indicated by Theorem 2, Arithmetic Coding is the most efficient way to compress signatures with elements from non-uniform but static distribution.
5.4 Sampling with an arithmetic decoder
An interesting consequence of the information theoretic optimality of an arithmetic coder is that when the decoder is fed uniform random bits, it will output random samples in the desired target distribution. Therefore the same code, circuitry, and tables that are used to compress signatures can double as a random sampler as well. We have utilized Algorithm 8 as a Gaussian sampler in this fashion. With precision \(P=64\) we may reasonably expect to reach a security level close to 128-bit level, based on the “Valiant-Valiant” sampling Theorem and conjectures [34].
Table 5 gives performance characteristics of our initial implementation. The binary search CDF sampler used tables of size \(2^{12}\) and an additional sign bit, whereas the BAC sampler used tables of size \(2^{13}\) without special coding for sign bits. The random usage was calculated from the average number of random bytes used to sample vectors of length \(n=512\). The table also includes numbers for a blinded sampler (Sect. 6.2). Precision of \(P=64\) was used in all cases and the speeds are comparable. We observe that the main advantage of the new BAC sampler is that it uses a very small number of true random bits; in experiments with our implementation, the actual random usage was typically within \(1.3 \%\) of the theoretic minimum for vector sizes used in Lattice cryptography.
By comparison, the Knuth-Yao sampler can also be expected to use a small number of bits, shown to average \(H+2\) bits per sample [20]. However, Knuth-Yao always uses a whole number of bits per sample, whereas a BAC sampler can be utilized to create vectors of samples where bits of entropy are “shared” between individual samples. This elementary property directly follows from Theorem 2. In many ways the Knuth-Yao sampler compares to a BAC sampler like Huffman Codes compare to Arithmetic Compression.
For BLZZRD parameters the actual randomness usage of Knuth-Yao is about \(16 \%\) higher than the BAC sampler for vectors of length \(n=512\). Knuth-Yao of [32] is suitable primarily for small deviations used in Ring-LWE encryption (\(\sigma \approx 3\)), not for larger \(\sigma \) required by BLISS/BLZZRD.
6 Randomized side-channel countermeasures for ideal lattices
Blinding is a standard countermeasure against both the timing attack [21] and emissions-based attacks such as Differential Power Analysis [22] for traditional public key cryptosystems. Blinding countermeasures add randomness to private key operations, making determination of secrets from observations more difficult for the attacker. In the case of RSA, there are two kinds of blinding: base blinding and exponent blinding. In the case of ECC, scalar blinding can be used.
Examining private key operations in lattice cryptosystems, we note that there are two main components that can introduce side-channel vulnerabilities to the implementation: ring polynomial arithmetic and Gaussian sampling. We have implemented analogous blinding countermeasures against both. For BLISS/BLZZRD (Algorithm 2), special note should be made to implement the GreedySC component using side-channel resistant techniques as well. An advanced cache attack against a BLISS implementation was described in CHES 2016 [2], which attacked the samplers of a reference implementation.
6.1 Blinded polynomial multiplication
Basic arithmetic in \({\mathbb {Z}}_q\) and especially the modular reduction by small prime q may easily introduce emissions to an ideal lattice cryptography implementation. For example, the arithmetic operations described for a low-resource Ring-LWE implementation in [24] contain a number of data-conditional execution cases.
The simplest form of polynomial blinding is to multiply the polynomials with random constants \(a, b \in {\mathbb {Z}}_q\). This can be done in the regular or NTT domain; the results are equivalent. One can set \(a=b^{-1}\) or multiply the result by \(c = (ab)^{-1}\): \( \mathbf {f} * \mathbf {g} = c \left( (a \mathbf {f}) * (b \mathbf {g}) \right) .\)
Note that anti-cyclic NTT multiplication requires each polynomial to be “pre-processed” by multiplying entries with tabulated roots of unity \(\omega ^i\) anyway. Therefore by choosing \(a=\omega ^i, b=\omega ^j\), this type of blinding can be done with virtually no extra cost. The normalization constant becomes \(c = \omega ^{-i-j}\).
Another type of blinding of anti-cyclic polynomial multiplication can be achieved via circularly “shifting” the polynomials. As noted in Sect. 2.1, we may write a polynomial as \( \mathbf {f}=\sum _{i=0}^{n-1} f_i x^i\). Shifting by j positions is equivalent to computing \(x^j * \mathbf {f} = \sum _{i=0}^{n-1} (-1)^{\lfloor (i+j)/n \rfloor } f_i\, x^{(i+j) \bmod n}.\)
Here the coefficients are therefore simply rotated in anti-cyclic fashion. Both constant multiplication and shifting by c are captured by function \( \mathsf {PolyBlind}\) (Algorithm 9).
Algorithm 9, \( \mathsf {PolyBlind}\), is very fast. The inverse operation \( \mathsf {PolyBlind}'( \mathbf {v}, -s, c^{-1})\), distinguished by a negative shift value s, is equally easy to construct. With that function, we have
Due to isometries of the anti-circulant ring, we can use a total of four blinding parameters: a, b (constants) and r, s (shift values) in the blinded scheme to compute the polynomial product \( \mathbf {f} * \mathbf {g}\):
One may choose a and b from tabulated roots of unity; \(a=\omega ^i, a=\omega ^j\) and avoid computing the inverse since \((ab)^{-1} = \omega ^{-(i+j)}\). This type of blinding has a relatively small performance penalty. If roots of unity are used as constants, the total amount of “noise” entropy introduced is \(4\log _2(n) = 36\) bits.
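A sketch of the combined constant-and-shift blinding (schoolbook arithmetic and constants drawn from all of \({\mathbb {Z}}_q^*\) for simplicity; an NTT implementation would draw a, b from the tabulated roots of unity, and all function names here are illustrative):

```python
import random

def anticyclic_mul(a, b, q):
    """Schoolbook product in Z_q[x]/(x^n + 1), where x^n = -1."""
    n, r = len(a), [0] * len(a)
    for i in range(n):
        for j in range(n):
            v = a[i] * b[j]
            if i + j < n:
                r[i + j] = (r[i + j] + v) % q
            else:
                r[i + j - n] = (r[i + j - n] - v) % q
    return r

def poly_blind(v, s, c, q):
    """Multiply v by c * x^s: rotate coefficients by s positions,
    flipping signs on wraparound (x^(n+t) = -x^t), and scale by the
    constant c.  A negative s inverts the rotation."""
    n = len(v)
    out = [0] * n
    for i in range(n):
        e = (i + s) % (2 * n)
        t = c * v[i] if e < n else -c * v[i]
        out[e % n] = t % q
    return out

def blinded_mul(f, g, q):
    """Compute f * g while masking the multiplier inputs with random
    constants a, b and shifts r, s; blinding is removed from the
    result using c = (ab)^(-1) and the shift -(r + s)."""
    n = len(f)
    a, b = random.randrange(1, q), random.randrange(1, q)
    r, s = random.randrange(n), random.randrange(n)
    h = anticyclic_mul(poly_blind(f, r, a, q), poly_blind(g, s, b, q), q)
    return poly_blind(h, -(r + s), pow(a * b, -1, q), q)
```

Since \((a x^r \mathbf {f}) * (b x^s \mathbf {g}) = ab\, x^{r+s} ( \mathbf {f} * \mathbf {g})\), the unblinding step recovers the product exactly regardless of the random blinds chosen.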
The basic blinding technique is generally applicable to schemes based on ideal lattices. For example the optimized implementations of ring arithmetic for the “New Hope” Ring-LWE key exchange [1] can be blinded in straightforward fashion and with only a negligible performance penalty.
6.2 Blinded Gaussian sampling
We note that the BLISS/BLZZRD algorithm always samples vectors of n variables at once. We define a function \( \mathsf {VectorSample}(n, \sigma ) = {\mathcal {N}}^n_{\mathbb {Z}}(0, \sigma ^2)\) that produces a vector of n samples from discrete Gaussian with deviation parameter \(\sigma \). Step 1 of signing process (Algorithm 2) can be written as
If \( \mathsf {VectorSample}\) is implemented naively as a loop, emissions from the implementation may reveal information about the random numbers being sampled, and an attacker may determine which elements of the random vector have specific features, as is done in the cache attack presented in CHES 2016 [2]. This may be alleviated to some degree by using \( \mathsf {VectorShuffle}( \mathsf {VectorSample}(n, \sigma ))\), which is just a random shuffle of the Gaussian sample vector.
From probability theory we know that the sum of two independent Gaussian variables is also Gaussian; more precisely, the variances (squares of the deviations) are additive. Let X and Y have distributions \({\mathcal {N}}_{\mathbb {Z}}(\mu _X, \sigma ^2_X)\) and \({\mathcal {N}}_{\mathbb {Z}}(\mu _Y, \sigma ^2_Y)\), respectively. Then their sum \(X + Y\) has distribution \({\mathcal {N}}_{\mathbb {Z}}(\mu _X + \mu _Y, \sigma ^2_X + \sigma ^2_Y)\). With zero-centered distributions the average does not change, but the resulting deviation will be \(\sigma _{X+Y} = \sqrt{\sigma ^2_X + \sigma ^2_Y}\). By induction, this can be generalized to more variables. We use this property to create a more secure Gaussian vector sampler. Algorithm 10 constructs the target distribution from a sum of random samples with \(\sigma ' = \frac{1}{\sqrt{m}}\sigma \).
As pointed out in [30], one can also choose \(\sigma '=\frac{\sigma }{\sqrt{1+k^2}}\) and construct the final distribution as \(Z = X + kY\). The convolution parameter k must be chosen carefully so that the statistical distance to the actual target distribution does not grow too large. Table 6 gives some appropriate parameters. In the resulting construction, all elements of the second vector are simply multiplied by the integer constant k.
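A minimal sketch of this variant follows; σ = 215 and k = 11 are illustrative values only (actual parameters must come from Table 6), and the base sampler and shuffle are passed in as functions:

```python
import math

def convolved_sample(n, sigma, k, base_sample, shuffle):
    # Z = X + k*Y, with both halves drawn at sigma' = sigma/sqrt(1 + k^2).
    # Var(Z) = sigma'^2 + k^2 * sigma'^2 = sigma^2, so the target deviation
    # is recovered; shuffling each half independently prevents leakage
    # from being aligned with output vector positions.
    sp = sigma / math.sqrt(1.0 + k * k)
    x = shuffle(base_sample(n, sp))
    y = shuffle(base_sample(n, sp))
    return [a + k * b for a, b in zip(x, y)]
```

Only two base-sampler passes are needed (versus m passes for the summation variant), at the cost of the statistical-distance analysis for k.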
The algorithm slows sampling down by a factor of at most m; on the other hand, faster (if “leakier”) base samplers may be used. More significantly, the required tables are much smaller. For example, a CDF table of only 256 entries gives a secure tailcut at \(\tau = 14.34 \sigma \) for BLZZRD-I with the parameters of Table 6.
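To illustrate how small such a table can be, the following sketch builds a 256-entry CDF table for the non-negative half of a discrete Gaussian and samples by inversion. The deviation σ′ = 19.5 is a hypothetical small value of the kind produced by the convolution constructions, and the binary search is for clarity only; a side-channel hardened sampler would scan the entire table in constant time.

```python
import bisect
import math
import random

def build_cdf(sigma, size=256):
    # Cumulative table over x = 0..size-1 for rho(x) = exp(-x^2/(2 sigma^2)).
    # The x = 0 weight is halved because it is shared by both signs.
    rho = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(size)]
    rho[0] *= 0.5
    total = sum(rho)
    acc, cdf = 0.0, []
    for r in rho:
        acc += r
        cdf.append(acc / total)
    return cdf

def cdf_sample(cdf, rng):
    # Inverse-CDF lookup plus a random sign bit. NOTE: bisect leaks the
    # lookup path via timing/cache; shown here only for brevity.
    x = bisect.bisect_left(cdf, rng.random())
    return -x if rng.getrandbits(1) else x
```

With σ′ ≈ 19.5, 256 entries reach past 13σ′, comfortably beyond the secure tailcut quoted above.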
The amount of noise entropy introduced by the permutations is technically in the thousands of bits. Even though this does not directly translate to attack complexity, the countermeasure makes emissions- and cache-based attacks on the sampler significantly harder, defeating the BLISS attack of CHES 2016 [2]. It may be possible to adapt the attack to counter some blinding parameters, but probably not all. The countermeasure parameter m can be increased without significantly modifying the implementation or breaking compatibility.
7 Implementation
An experimental reference implementation that contains most of the features described in this work is available as a self-contained public domain distribution at https://github.com/mjosaarinen/blzzrd. Table 7 summarizes its performance on a 2015 laptop computer (Intel Core i7-4870HQ @ 2.50 GHz).
On the same system, OpenSSL’s (version 1.0.2g) optimized implementation of ECDSA with the NIST P-256 curve [12] required 0.039 ms to create a signature and 0.097 ms to verify one. All Reference BLZZRD variants verify signatures at least 40% faster than this NIST P-256 ECDSA implementation.
NIST P-256 is the fastest curve implemented by OpenSSL; BLZZRD outperforms all ECDSA curves in signature verification by a wide margin. For larger curves (409 bits or more), BLZZRD signature generation was also faster than ECDSA.
We note that (unlike the OpenSSL ECDSA benchmark implementation) our BLZZRD reference implementation has no assembly language optimizations and was generally written for readability and maintainability rather than for raw speed. It is also the first implementation with side-channel countermeasures, so direct performance comparisons to other Lattice-based schemes with countermeasures are difficult.
8 Conclusions
We have offered a number of techniques that can be used to improve the security and performance of lattice public key algorithms. Using these techniques, we constructed BLZZRD, a practical, evolutionary variant of the BLISS-B signature scheme.
We have described a direct Grover’s attack on the random oracle component of the BLISS signature scheme. In order to counter this attack, a new hash-based random oracle is proposed, with adjusted \(\kappa \) parameter and an “intermediate” hash which reduces signature size. It is currently not easy to estimate the security of all components of BLISS/BLZZRD against quantum attacks, but the new random oracle parameters are consistent with suggested quantum-resistant key sizes for symmetric ciphers [7].
Many algorithms in lattice cryptography utilize discrete Gaussian variables in signatures and ciphertexts. We show that arithmetic coding can be used to compress these quantities in an essentially optimal fashion. We describe an efficient Binary Arithmetic Coder (BAC) that produces smaller signatures than previous compressors. A somewhat surprising finding is that arithmetic coders can be adapted as non-uniform (Gaussian) samplers with performance characteristics comparable to other sampling algorithms (millions of samples per second), while requiring only an (optimally) small number of true random bits.
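The compression/sampling duality can be sketched in a few lines: binary arithmetic decoding of a uniformly random bit stream against a target CDF yields samples from that distribution, consuming close to its entropy in random bits. This toy model uses floating point and our own naming; a real coder works in fixed-point interval arithmetic:

```python
import bisect

def bac_sample(cdf, next_bit):
    # Narrow the interval [lo, hi) with uniform random bits until it fits
    # entirely inside one CDF bin; that bin's index is the decoded sample.
    # cdf is a list of cumulative probabilities ending in 1.0.
    lo, hi = 0.0, 1.0
    while True:
        i = bisect.bisect_right(cdf, lo)        # bin containing lo
        upper = cdf[i] if i < len(cdf) else 1.0
        if hi <= upper:
            return i
        mid = (lo + hi) / 2.0
        if next_bit():                          # consume one random bit
            lo = mid
        else:
            hi = mid
```

For a dyadic distribution such as (1/2, 1/4, 1/4), this decoder consumes exactly the entropy (1.5 bits per sample on average); for general distributions the overhead stays within a couple of bits.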
Standard “blinding” side-channel countermeasures for public key algorithms introduce randomness (noise) into private key operations, thus making determination of secrets more difficult. In ideal lattice cryptography, the main components used by private key operations typically are ring arithmetic and non-uniform (Gaussian) random sampling.
For ring arithmetic, we introduce “blinded polynomial multiplication,” a simple randomization technique based on constant multiplication and rotation of polynomials. This technique is cheap to implement and exploits the specific isometries of the anti-circulant rings often used in lattice cryptography.
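A minimal model of the idea in \(\mathbb {Z}_q[x]/(x^n+1)\), with toy parameters and schoolbook multiplication for clarity (the blinding composes equally well with NTT-based multiplication): multiplying an operand by \(x^r\) is an anti-cyclic rotation, and \((x^a c f)(x^b d g) = x^{a+b} c d \cdot f g\), so the product can be unblinded afterwards.

```python
import random

Q, N = 12289, 8   # toy parameters: q prime, ring Z_q[x]/(x^n + 1)

def negacyclic_mul(f, g):
    # Schoolbook product in Z_q[x]/(x^n + 1): x^n wraps around as -1.
    h = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                h[k] = (h[k] + f[i] * g[j]) % Q
            else:
                h[k - N] = (h[k - N] - f[i] * g[j]) % Q
    return h

def rotate(f, r):
    # Multiply by x^r (0 <= r < 2n): an anti-cyclic shift with sign
    # flips, using x^n = -1 and hence x^(2n) = 1.
    h = [0] * N
    for i in range(N):
        k = (i + r) % (2 * N)
        if k < N:
            h[k] = f[i] % Q
        else:
            h[k - N] = (-f[i]) % Q
    return h

def blinded_mul(f, g, rng):
    # Mask both operands with secret rotations and nonzero scalars,
    # multiply, then strip the blinding from the product.
    a, b = rng.randrange(2 * N), rng.randrange(2 * N)
    c, d = rng.randrange(1, Q), rng.randrange(1, Q)
    fb = rotate([c * x % Q for x in f], a)
    gb = rotate([d * x % Q for x in g], b)
    hb = negacyclic_mul(fb, gb)                 # = x^(a+b) * c*d * (f*g)
    inv = pow(c * d % Q, Q - 2, Q)              # (c*d)^(-1), q prime
    return rotate([inv * x % Q for x in hb], (2 * N - (a + b)) % (2 * N))
```

Each invocation multiplies different-looking operands, so timing or power traces of the multiplier cannot be correlated with the fixed secret polynomial, yet the unblinded result is exactly the ordinary ring product.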
For Gaussian sampling, we note that lattice cryptography algorithms typically require vectors rather than individual samples. We show that sampling processes can be blinded by shuffling these vectors and combining multiple Gaussian distributions with each other. This also results in a smaller implementation footprint since the size of required CDF tables becomes smaller. The general countermeasure is effective against side-channel attacks such as the cache attack recently described against BLISS.
Notes
Original BLISS reference implementations are available from: http://bliss.di.ens.fr/.
References
Alkim, E., Ducas, L., Pöppelmann, T., Schwabe, P.: Post-quantum key exchange—a new hope. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 327–343. USENIX Association (2016)
Bruinderink, L.G., Hülsing, A., Lange, T., Yarom, Y.: Flush, gauss, and reload—a cache attack on the BLISS lattice-based signature scheme. In: Gierlichs, B., Poschmann, A.Y. (eds.) CHES 2016, vol. 9813 of LNCS, pp. 323–345. Springer, Berlin (2016)
Buchmann, J., Cabarcas, D., Göpfert, F., Hülsing, A., Weiden, P.: Discrete ziggurat: a time-memory trade-off for sampling from a Gaussian distribution over the integers. In: Lange, T., Lauter, K., Lisonĕk, P. (eds.) SAC 2013, vol. 8282 of LNCS, pp. 402–417. Springer, Berlin. Extended version available as IACR ePrint 2014/510 (2014)
Campbell, P., Groves, M., Shepherd, D.: Soliloquy: a cautionary tale. In: ETSI 2nd Quantum-Safe Crypto Workshop in Partnership with the IQC (2014)
CESG. Quantum key distribution: a CESG white paper (2016)
Chen, L., Jordan, S., Liu, Y.-K., Moody, D., Peralta, R., Perlner, R., Smith-Tone, D.: Report on post-quantum cryptography. NISTIR 8105, April 2016
CNSS. Use of public standards for the secure sharing of information among national security systems. Committee on National Security Systems: CNSS Advisory Memorandum, Information Assurance 02-15 (2015)
Ducas, L.: Accelerating bliss: the geometry of ternary polynomials. IACR ePrint 2014/874 (2014)
Ducas, L., Durmus, A., Lepoint, T., Lyubashevsky, V.: Lattice signatures and bimodal Gaussians. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, pp. 40–56. Springer, Berlin. Extended version available as IACR ePrint 2013/383 (2013)
Dwarakanath, N.C., Galbraith, S.D.: Sampling from discrete Gaussians for lattice-based cryptography on a constrained device. Appl. Algebra Eng. Commun. Comput. 25(3), 159–180 (2014)
Edrees, H., Cheung, B., Sandora, M., Nummey, D.B., Stefan, D.: Hardware-optimized ziggurat algorithm for high-speed Gaussian random number generators. In: Plaks, T.P. (ed.) ERSA 2009, pp. 254–260. CSREA Press, Las Vegas (2009)
FIPS. Digital Signature Standard (DSS). Federal Information Processing Standards Publication 186-4 (2013)
FIPS. Secure Hash Standard (SHS). Federal Information Processing Standards Publication 180-4 (2015)
FIPS. SHA-3 standard: Permutation-based hash and extendable-output functions. Federal Information Processing Standards Publication 202 (2015)
Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC’96, pp. 212–219. ACM (1996)
Grover, L.K.: From Schrödinger’s equation to the quantum search algorithm. Am. J. Phys. 69(7), 769–777 (2001)
Howe, J., Pöppelmann, T., O’Neill, M., O’Sullivan, E., Güneysu, T.: Practical lattice-based digital signature schemes. ACM Trans. Embed. Comput. Syst. 14(3), 41:1–41:24 (2015)
Jonsson, J., Kaliski, B.: Public-key cryptography standards (PKCS) #1: RSA cryptography specifications version 2.1. IETF RFC 3447 (2003)
Karney, C.F.F.: Sampling exactly from the normal distribution. Preprint arXiv:1303.6257, Version 2 (2014)
Knuth, D.E., Yao, A.C.: Algorithms and complexity: new directions and recent results. In: Traub, J.F. (ed.) The Complexity of Nonuniform Random Number Generation, pp. 357–428. Academic Press, New York (1976)
Kocher, P.: Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems. In: Koblitz, N., (ed.) CRYPTO’96, vol. 1109 of LNCS, pp. 104–113. Springer, Berlin (1996)
Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO’99, vol. 1666 of LNCS, pp. 388–397. Springer, Berlin (1999)
Langdon Jr, G.G.: An introduction to arithmetic coding. IBM J. Res. Dev. 28(2), 135–149 (1984)
Liu, Z., Seo, H., Roy, S.S., Großschädl, J., Kim, H., Verbauwhede, I.: Efficient ring-LWE encryption on 8-bit AVR processors. In: Güneysu, T., Handschuh, H. (eds.) CHES 2015, vol. 9293 of LNCS, pp. 663–682. Springer, Berlin (2015)
Marsaglia, G., Tsang, W.W.: A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions. SIAM J. Sci. Stat. Comput. 5(2), 349–359 (1984)
Marsaglia, G., Tsang, W.W.: The ziggurat method for generating random variables. J. Stat. Softw. 5(8), 1–7 (2000)
NSA/CSS. Information assurance directorate: commercial national security algorithm suite and quantum computing FAQ (2016)
Peikert, C.: An efficient and parallel Gaussian sampler for lattices. In: Rabin, T. (ed.) CRYPTO 2010, vol. 6223 of LNCS, pp. 80–97. Springer, Berlin (2010)
Pennebaker, W.B., Mitchell, J.L., Langdon Jr, G.G., Arps, R.B.: An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM J. Res. Dev. 32(6), 717–726 (1988)
Pöppelmann, T., Ducas, L., Güneysu, T.: Enhanced lattice-based signatures on reconfigurable hardware. In: Batina, L., Robshaw, M. (eds.) CHES 2014, vol. 8731 of LNCS, pp. 353–370. Springer, Berlin. Extended version available as IACR ePrint 2014/254 (2014)
Rissanen, J.J.: Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev. 20, 198–203 (1976)
Roy, S.S., Reparaz, O., Vercauteren, F., Verbauwhede, I.: Compact and side channel secure discrete Gaussian sampling. IACR ePrint 2014/591 (2014)
Roy, S.S., Vercauteren, F., Mentens, N., Chen, D.D., Verbauwhede, I.: Compact ring-LWE cryptoprocessor. In: Batina, L., Robshaw, M. (eds.) CHES 2014, vol. 8731 of LNCS, pp. 371–391. Springer, Berlin (2014)
Saarinen, M.-J.O.: Gaussian sampling precision in lattice cryptography. IACR ePrint 2015/953 (2015)
Said, A.: Introduction to arithmetic coding—theory and practice. In: Sayood, K. (ed.) Lossless Compression Handbook. Academic Press. Also published as HP Technical Report HPL-2004-76 (2002)
Valiant, G., Valiant, P.: An automatic inequality prover and instance optimal identity testing. In: FOCS 2014, pp. 51–60. IEEE Computer Society (2014). Full version available at http://theory.stanford.edu/~valiant/papers/instanceOptFull.pdf
Weiden, P., Hülsing, A., Cabarcas, D., Buchmann, J.: Instantiating treeless signature schemes. IACR ePrint 2013/065 (2013)
Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM 30(6), 520–540 (1987)
Zalka, C.: Grover’s quantum searching algorithm is optimal. Phys. Rev. A 60, 2746–2751 (1999)
Part of this work was funded by the European Union H2020 SAFEcrypto project (Grant No. 644729) while the Author was visiting CSIT, Queen’s University Belfast, UK.
Saarinen, M.-J.O.: Arithmetic coding and blinding countermeasures for lattice signatures. J. Cryptogr. Eng. 8, 71–84 (2018). https://doi.org/10.1007/s13389-017-0149-6