Introduction

Lattice theory has gained considerable importance during the last decades [5]. Researchers have reexamined the LLL algorithm [14] and introduced variants of it, such as the Schnorr–Euchner algorithm [21], algorithms based entirely on the Gram matrix of the basis vectors [23], floating-point versions of reduction algorithms (see Chapter 1 of [18]), and an algorithm whose running time is quadratic in \(\log M\), where the input is an integer \(d\)-dimensional lattice basis with vectors of norm less than \(M\) [16, 17]. Finally, an algorithm with quasi-linear time complexity with respect to \(\beta \), where \(\beta = \log \max \Vert {\underline{b}}_i\Vert \) and the \({\underline{b}}_i\) are the basis vectors of the lattice, was recently presented in [19]. However, this algorithm is currently only theoretically justified and has not yet been implemented in practice.

The goal of lattice basis reduction is to find a basis of the same lattice whose vectors are short and nearly orthogonal to each other. The first algorithm to achieve this was the Lenstra–Lenstra–Lovász algorithm [14], also known as the LLL algorithm, introduced in 1982. In 1991, Schnorr and Euchner [21] introduced the first lattice basis reduction algorithm that could be used efficiently in practice. Since then, many variants of the LLL algorithm have been proposed [2, 5, 16–18, 23]. A survey of different aspects of the LLL algorithm is given in [18].

Furthermore, during the mid-1990s, several cryptosystems were introduced whose underlying hard problem was the shortest vector problem (SVP) and/or the closest vector problem (CVP) [10] in a lattice \(L\) of large dimension \(d\). The most important of these, in alphabetical order, were the Ajtai–Dwork cryptosystem [1], the GGH cryptosystem of Goldreich, Goldwasser and Halevi [9] and the NTRU cryptosystem proposed by Hoffstein, Pipher and Silverman [11]. However, it is only recently that lattice basis reduction algorithms have started to gain wider attention. The main reason is that cryptosystems based on classical problems have been shown to be prone to various attacks and, most importantly, if a quantum computer were built in the near future, we would have to move to post-quantum cryptography [4], of which lattice-based cryptography is one of the main techniques, in order to ensure the privacy of data.

In our work we examine floating-point arithmetic lattice basis reduction algorithms in contrast to exact arithmetic ones. In order to compare the basic lattice basis reduction algorithms, we have chosen to implement four of them, namely our variant of the Schnorr–Euchner algorithm, the recent \(L^2\) algorithm [16, 17], and their “buffered” versions. We implemented them as Sage [6, 22] scripts, based on the Python [20] programming language, so that no algorithm would gain an advantage from being implemented in a more efficient language. Moreover, in this way we make the algorithms available in their simplest form to the Sage community. We are not interested in choosing an optimal programming environment in terms of efficiency, since our aim is not to produce the fastest possible implementation but to compare the chosen algorithms in the same environment. We chose the Sage environment because it is an open-source computer algebra system (CAS) to which we wanted to contribute; in addition, it is well suited to expensive mathematical computations and is supported by a large community.

In this paper, apart from the comparison of the different lattice basis reduction algorithms, we propose a new algorithm. Our effort to design this new algorithm was initiated by [2].

The rest of the paper is organized as follows: In “Preliminaries” section we describe some mathematical preliminaries. “A Novel Algorithm Initiated by the Buffered LLL Gram” section analyzes our proposed algorithm. “Experimental Results” section describes the experimental results on the comparison of the four lattice basis reduction algorithms that we implemented. Finally, in “Conclusions and Future Work” section we draw some conclusions and propose future work.

Preliminaries

A lattice \(L \subset {\mathbb {R}}^n\) is an additive discrete subgroup of \({\mathbb {R}}^n\) such that \(L=\big \{\sum _{i=1}^d x_i {\underline{b}}_i \mid x_i \in {\mathbb {Z}}, 1 \le i \le d\big \}\) with linearly independent vectors \({\underline{b}}_1, \ldots , {\underline{b}}_d \in {\mathbb {R}}^n\) \((d \le n)\). \(B=({\underline{b}}_1, \ldots , {\underline{b}}_d) \in {\mathbb {R}}^{n \times d}\) is a basis of \(L\) of dimension \(d\). The basis of a lattice is not unique. Different lattice bases \(B\) and \(B^{\prime }\) of the same lattice \(L\) are related by a unimodular transformation \(U\), i.e., \(B^{\prime }=BU\) with \(U \in {\mathbb {Z}}^{d \times d}\) and \(|\text{ det }(U)|=1\). Unimodular transformations include the exchange of two basis vectors, usually referred to as a swap; adding an integer multiple of a basis vector to another, usually referred to as a reduction or translation; and multiplying a basis vector by \(-1\). The goal of lattice basis reduction is to start from an original lattice basis and reach, via unimodular transformations, a basis whose vectors are relatively short and nearly orthogonal to each other.
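To make these operations concrete, the following is a minimal plain-Python sketch (illustrative only; the helper names are ours, not part of our implementation) that applies a translation and a swap to a small basis, producing another basis of the same lattice.

```python
# Elementary unimodular operations on a basis given as a list of integer vectors.

def swap(B, i, j):
    """Exchange basis vectors b_i and b_j."""
    B[i], B[j] = B[j], B[i]

def translate(B, i, j, q):
    """Add q times b_j to b_i (q an integer, i != j)."""
    B[i] = [bi + q * bj for bi, bj in zip(B[i], B[j])]

def negate(B, i):
    """Multiply basis vector b_i by -1."""
    B[i] = [-x for x in B[i]]

# These operations change the basis but not the lattice it generates.
B = [[1, 0], [1, 2]]
translate(B, 1, 0, -1)   # b_2 <- b_2 - b_1
swap(B, 0, 1)
print(B)                  # [[0, 2], [1, 0]], another basis of Z x 2Z
```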

The Gram Matrix \(G\) of a lattice \(L\) with basis \(B = ({\underline{b}}_1, \ldots , {\underline{b}}_d) \in {\mathbb {R}}^{n \times d}\) is defined as \(B^T B\).

Unlike the lattice basis, the determinant of a lattice is an invariant \(\big ( \text{ det }(L)=\sqrt{\text{ det }(B^T B)} \big )\), independent of the particular basis, and has a natural geometric interpretation [5]: it is the \(d\)-dimensional volume of the parallelepiped whose edges are the basis vectors.

All lattice basis reduction algorithms are essentially based on the Gram–Schmidt orthogonalization (GSO) algorithm [5] or an equivalent technique such as Cholesky factorization [16, 17]. Let \({\underline{b}}_1, \ldots , {\underline{b}}_n\) be a basis of \({\mathbb {R}}^n\). The Gram–Schmidt orthogonalization of \({\underline{b}}_1, \ldots , {\underline{b}}_n\) produces the following basis \({\underline{b}}_1^{*}, \ldots , {\underline{b}}_n^{*}\):

$$\begin{aligned} {\underline{b}}_i^{*}={\underline{b}}_i-\sum _{j=1}^{i-1}\mu _{i,j}{\underline{b}}_j^{*}, \quad \text{ where } \mu _{i,j}={{<{\underline{b}}_i, {\underline{b}}_j^{*}>} \over {<{\underline{b}}_j^{*}, {\underline{b}}_j^{*}>}}. \end{aligned}$$

The \(\mu _{i,j}\)’s are called orthogonal projection coefficients [5].

The Gram–Schmidt basis vectors \({\underline{b}}_1^{*}, \ldots , {\underline{b}}_n^{*}\) form an orthogonal basis, but they usually do not lie in the lattice generated by \({\underline{b}}_1, \ldots , {\underline{b}}_n\). In order to obtain a basis of the lattice we use variants of the Gram–Schmidt process, resulting in vectors that are nearly orthogonal to each other.
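For concreteness, a minimal exact-arithmetic sketch of the GSO computation described above (plain Python with Fraction coefficients; illustrative only, not our Sage implementation):

```python
from fractions import Fraction

def gso(B):
    """Return the GSO vectors b_i^* and the coefficients mu[i][j] for j < i."""
    d = len(B)
    Bstar = []
    mu = [[Fraction(0)] * d for _ in range(d)]
    for i in range(d):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            # mu_{i,j} = <b_i, b_j^*> / <b_j^*, b_j^*>
            mu[i][j] = (sum(Fraction(x) * y for x, y in zip(B[i], Bstar[j]))
                        / sum(y * y for y in Bstar[j]))
            # b_i^* = b_i - sum_{j<i} mu_{i,j} b_j^*
            v = [vi - mu[i][j] * yj for vi, yj in zip(v, Bstar[j])]
        Bstar.append(v)
    return Bstar, mu
```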

In order to measure the quality of a lattice basis we use the notion of orthogonality defect. The orthogonality defect of a lattice basis \(B=({\underline{b}}_1, \ldots , {\underline{b}}_d) \in {\mathbb {R}}^{n \times d}\), defined as \(\text{ dft }(B)={{\prod _{i=1}^d\Vert {\underline{b}}_i\Vert }\over {\text{ det }(L)}}\), allows one to compare the quality of different lattice bases. In general \(\text{ dft }(B)\ge 1\), with \(\text{ dft }(B)=1\) for an orthogonal basis. The goal of lattice basis reduction is to algorithmically determine a basis with minimal defect. A basis is said to be LLL-reduced with parameters \(\delta \) and \(\eta \) [14] if it satisfies the following conditions:

$$\begin{aligned} |{\mu }_{i,j}| \le \eta \quad \text{ for } 1 \le j < i \le d, \end{aligned}$$

referred to as the “size condition”, and

$$\begin{aligned} \Vert {\underline{b}}_i^{*}+{\mu }_{i,i-1}{\underline{b}}_{i-1}^{*}\Vert ^2 \ge \delta \Vert {\underline{b}}_{i-1}^{*}\Vert ^2 \quad \text{ for } 2 \le i \le d, \end{aligned}$$

referred to as the “exchange condition”.

The reduction parameter \(\delta \) is a real number such that \({1 \over 4} < \delta < 1\) with standard value \(\delta = {3 \over 4}\) and the parameter \(\eta \) satisfies \(0.5 \le \eta < \sqrt{\delta }\).
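As an illustration, the following sketch (our own helper, relying on the gso() sketch given earlier) checks the two conditions for a basis given in exact arithmetic; it uses the orthogonality of the GSO vectors to evaluate the exchange condition as \(\Vert {\underline{b}}_i^{*}\Vert ^2 + \mu _{i,i-1}^2 \Vert {\underline{b}}_{i-1}^{*}\Vert ^2 \ge \delta \Vert {\underline{b}}_{i-1}^{*}\Vert ^2\).

```python
def is_lll_reduced(B, delta=0.75, eta=0.5):
    """Check the size and exchange conditions for the basis B."""
    Bstar, mu = gso(B)
    d = len(B)
    norms2 = [sum(x * x for x in v) for v in Bstar]   # ||b_i^*||^2
    # size condition: |mu_{i,j}| <= eta for all j < i
    if any(abs(mu[i][j]) > eta for i in range(d) for j in range(i)):
        return False
    # exchange (Lovász) condition, using that the b_i^* are orthogonal
    for i in range(1, d):
        if norms2[i] + mu[i][i - 1] ** 2 * norms2[i - 1] < delta * norms2[i - 1]:
            return False
    return True
```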

Two fundamental problems related to the security of lattice based cryptosystems are the shortest vector problem (SVP) and the closest vector problem (CVP) [10].

The Shortest Vector Problem (SVP) Find a shortest nonzero vector in a lattice \(L\), i.e., find a nonzero vector \({\underline{v}} \in L\) that minimizes the Euclidean norm \(\Vert {\underline{v}}\Vert \).

The Closest Vector Problem (CVP) Given a vector \({\underline{w}} \in {\mathbb {R}}^n\) that is not in \(L\), find a vector \({\underline{v}} \in L\) that is closest to \({\underline{w}}\), i.e. find a vector \({\underline{v}} \in L\) that minimizes the Euclidean norm \(\Vert {\underline{w}}-{\underline{v}}\Vert \).

If a lattice \(L \subset {\mathbb {R}}^n\) has a basis \({\underline{b}}_1, \ldots , {\underline{b}}_d\) consisting of vectors that are pairwise orthogonal, i.e. such that

$$\begin{aligned} {\underline{b}}_i \cdot {\underline{b}}_j = 0 \quad \text{ for } \text{ all } i \ne j, \end{aligned}$$

then both SVP and CVP are easy to solve ([10], p. 379). Thus reduction algorithms lead to bases that are well suited to approximating the hard problems on which lattice-based cryptosystems rely.
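For example, with a pairwise orthogonal basis, CVP reduces to rounding the coordinate of \({\underline{w}}\) along each basis vector independently (and a shortest vector is simply a shortest basis vector); a minimal illustrative sketch:

```python
def closest_vector_orthogonal(B, w):
    """B: list of pairwise orthogonal basis vectors, w: target vector."""
    n = len(B[0])
    v = [0.0] * n
    for b in B:
        # coefficient of w along b, rounded to the nearest integer
        c = round(sum(wi * bi for wi, bi in zip(w, b)) / sum(bi * bi for bi in b))
        v = [vi + c * bi for vi, bi in zip(v, b)]
    return v

# Example: lattice 2Z x 3Z, target (2.6, 4.4) -> closest lattice vector (2, 3)
print(closest_vector_orthogonal([[2, 0], [0, 3]], [2.6, 4.4]))
```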

A Novel Algorithm Initiated by the Buffered LLL Gram

The original LLL algorithm [14] is almost never used in practice. Instead, one applies floating-point variants of LLL, where the long-integer arithmetic required by Gram–Schmidt orthogonalization is replaced by floating-point arithmetic [17]. The reason is that the running time of the floating-point versions [2, 16, 17, 21] is significantly better than that of long-integer arithmetic.

Additionally, in recent lattice basis reduction algorithms, Gram–Schmidt orthogonalization is replaced by Cholesky factorization [16, 17], which improves the numerical stability of the algorithm. A step in this direction was proposed by Backes and Wetzel in 2007 [2]. Moreover, they proposed the use of buffered transformations, where the transformations are not applied directly to the lattice basis but are stored in a transformation matrix instead. This matrix is flushed when its maximum entry exceeds a specific limit related to the bit size of the machine-type integers used (typically 32 or 64 bits).

In this paper we propose a novel algorithm that borrows ideas from [16, 17] and uses a Gram-matrix variant to avoid recomputing costly inner products, as in [2, 16, 17, 23]. We also reexamine the buffered-transformations idea proposed by Backes and Wetzel [2].

Our Proposed Algorithm

Our proposed algorithm is Algorithm 1. In the pseudocode below, the symbol \(\diamond \) denotes the floating-point approximation, at the specified precision, of the operations performed in the parentheses that follow.

[Algorithm 1 (pseudocode figure)]

The \(\mu _{i,j}\)’s in this algorithm correspond directly to the \(\mu _{i,j}\)’s used in the Gram–Schmidt orthogonalization (GSO). The \(r_{i,j}\)’s and their relation to the \(\mu _{i,j}\)’s are given by the following equations [16, 17]:

$$\begin{aligned} r_{i,j}=<{\underline{b}}_i, {\underline{b}}_j>-\sum _{k=1}^{j-1} \mu _{j,k} \cdot r_{i,k} \end{aligned}$$

and

$$\begin{aligned} \mu _{i,j}={{r_{i,j}} \over {r_{j,j}}} \end{aligned}$$

The \(S_j^{(i)}\)’s are given by the following equations [16, 17]:

$$\begin{aligned} S_j^{(i)}=\Vert {\underline{b}}_i\Vert ^2-\sum _{k=1}^{j-1} \mu _{i,k} \cdot r_{i,k} \end{aligned}$$

and

$$\begin{aligned} S_i^{(i)}=\Vert {\underline{b}}_i^{*}\Vert ^2=r_{i,i} \end{aligned}$$

where \({\underline{b}}_i^{*}\) corresponds to the \(i\)-th vector of the GSO.
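A minimal sketch (plain Python, 0-based indices; in the actual algorithms these quantities are updated incrementally rather than recomputed) of how row \(i\) of the \(r_{i,j}\)'s, \(\mu _{i,j}\)'s and the \(S_j^{(i)}\)'s can be obtained from the Gram matrix \(G=B^TB\), assuming the previous rows of \(\mu \) and \(r\) have already been filled:

```python
def cfa_row(G, mu, r, i):
    """Fill row i of r and mu, and return the list S_1^{(i)}, ..., S_i^{(i)}.

    G is taken to be fully symmetric here for simplicity (the paper stores
    only the upper triangle).
    """
    S = [G[i][i]]                     # S_1^{(i)} = ||b_i||^2
    for j in range(i + 1):
        # r_{i,j} = <b_i, b_j> - sum_{k<j} mu_{j,k} r_{i,k}
        r[i][j] = G[i][j] - sum(mu[j][k] * r[i][k] for k in range(j))
        if j < i:
            mu[i][j] = r[i][j] / r[j][j]
            # S_{j+1}^{(i)} = S_j^{(i)} - mu_{i,j} r_{i,j}
            S.append(S[-1] - mu[i][j] * r[i][j])
    return S    # S[-1] equals r_{i,i} = ||b_i^*||^2
```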

We note that both the exact computation and the update of the \(\mu _{i,j}\)’s, \(r_{i,j}\)’s and \(S_j^{(i)}\)’s are important to ensure the correctness of the algorithm. In our initial approach, the \(r_{i,j}\)’s and \(S_j^{(i)}\)’s had to be recomputed every time a reduction was performed, since the basis had been modified. However, we can apply an additional optimization in order to avoid recomputing these quantities needlessly. The optimization works as follows: we initially compute the \(\mu _{i,j}\)’s and \(r_{i,j}\)’s for \(j \le i\) (see lines 3–11 of Algorithm 1). We then proceed to size-reduce vector \({\underline{b}}_i\) while keeping only the \(\mu _{i,j}\)’s up to date (see lines 19–20). Once \({\underline{b}}_i\) has been size-reduced, we recompute the \(r_{i,j}\)’s (see lines 23–25) and the \(S_j^{(i)}\)’s for \(j \le i\). In this way, these quantities are computed \(O(1)\) times instead of \(O(i)\) times, which would be the worst case of our initial approach. The computation of the \(S_j^{(i)}\)’s is deferred, since they are not actually needed until the loop condition of line (37).

The algorithm COMPUTE_GRAM(\(B\)) in Algorithm 1 simply computes the inner products of the basis vectors for each pair, resulting in the Gram matrix. Taking advantage of the symmetry of the Gram matrix, it suffices to compute and use only its upper triangular part. APPROX_BASIS_GRAM(\(G\)) creates an approximate version of the Gram matrix by converting its entries to floating-point approximations. Finally, the APPROX_VECTOR_GRAM(\(G^{'}, G, k\)) algorithm updates the approximate Gram matrix \(G^{'}\) after a vector has been size-reduced [2].
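Minimal sketches of these helper routines, under the upper-triangular storage convention described above (plain Python stands in for our Sage scripts; the update pattern of APPROX_VECTOR_GRAM is our reading of [2]):

```python
def compute_gram(B):
    """Exact Gram matrix of the basis B; only the upper triangle is filled."""
    d = len(B)
    G = [[0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            G[i][j] = sum(x * y for x, y in zip(B[i], B[j]))
    return G

def approx_basis_gram(G):
    """Floating-point approximation of the (upper-triangular) Gram matrix."""
    return [[float(x) for x in row] for row in G]

def approx_vector_gram(Gp, G, k):
    """Refresh the entries involving b_k in the approximate Gram matrix Gp."""
    d = len(G)
    for i in range(k):        # entries G[i][k] in the upper triangle
        Gp[i][k] = float(G[i][k])
    for j in range(k, d):     # entries G[k][j] in the upper triangle
        Gp[k][j] = float(G[k][j])
```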

Next, we present the algorithm REDUCE_GRAM, which updates the Gram matrix when an integer multiple of a basis vector is subtracted from another basis vector.

[REDUCE_GRAM (pseudocode figure)]
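In essence, the required Gram update for the translation \({\underline{b}}_k \leftarrow {\underline{b}}_k - q\,{\underline{b}}_j\) (with \(j < k\)) can be sketched as follows, using only the upper triangular part of \(G\); this is an illustrative reconstruction from the definition of the Gram matrix rather than the exact pseudocode above:

```python
def reduce_gram(G, k, j, q):
    """Update G for b_k <- b_k - q*b_j (j < k), upper-triangular storage."""
    d = len(G)
    # The diagonal entry is updated first, while G[j][k] still holds <b_j, b_k>.
    G[k][k] = G[k][k] - 2 * q * G[j][k] + q * q * G[j][j]
    for i in range(d):
        if i == k:
            continue
        # <b_i, b_k> <- <b_i, b_k> - q * <b_i, b_j>
        g_ij = G[i][j] if i <= j else G[j][i]
        if i < k:
            G[i][k] -= q * g_ij
        else:
            G[k][i] -= q * g_ij
```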

The Swap Gram Algorithm

The final algorithm we examine is the one used for the Gram matrix, when a swap is performed:

[SWAP_GRAM (pseudocode figure)]

This algorithm is an adapted version of the algorithm on page 35 of [23]. We adapted S. Wetzel’s algorithm to work with the upper triangular part of the Gram matrix.

For a short proof of why this algorithm correctly updates the Gram matrix see Appendix.
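In essence, swapping \({\underline{b}}_{k-1}\) and \({\underline{b}}_k\) amounts to swapping the corresponding rows and columns of \(G\); restricted to the upper triangular part, this can be sketched as follows (an illustrative reconstruction, not the exact pseudocode above):

```python
def swap_gram(G, k):
    """Update the upper triangle of G for the swap b_{k-1} <-> b_k."""
    d = len(G)
    for i in range(k - 1):                    # entries above the swapped pair
        G[i][k - 1], G[i][k] = G[i][k], G[i][k - 1]
    G[k - 1][k - 1], G[k][k] = G[k][k], G[k - 1][k - 1]   # diagonal entries
    # G[k-1][k] = <b_{k-1}, b_k> is unchanged by the swap
    for i in range(k + 1, d):                 # entries to the right of the pair
        G[k - 1][i], G[k][i] = G[k][i], G[k - 1][i]
```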

A Revised Buffered Transform Algorithm

In [2], Backes and Wetzel introduced an algorithm that gradually performs the transformations needed to LLL-reduce a lattice basis, using an auxiliary matrix \(T\) as a buffer. The mathematical background is that of elementary (unimodular) matrices. In addition, a \({\underline{T\_max}}\) vector keeps track of the maximum entries of \(T\) and is used to determine whether the buffer needs to be flushed. In our work we propose a revised version of this algorithm, shown below as Algorithm 4.

Furthermore, a number of modifications to Algorithm 1 are required in order to incorporate this technique. In line (15), the transformation \({\underline{b}}_i \leftarrow {\underline{b}}_i - \lceil \mu _{i,j} \rfloor \cdot {\underline{b}}_j\) is buffered instead of being applied directly to \(B\). In addition, every time two vectors of \(B\) are interchanged, this transformation is also buffered. This is done by replacing \({\underline{b}}_i \leftrightarrow {\underline{b}}_{i-1}\) in line (38) with the swap operations \({\underline{t}}_{i-1} \leftrightarrow {\underline{t}}_{i}\) and \({\underline{T\_max}}_{i-1} \leftrightarrow {\underline{T\_max}}_i\). Finally, the buffer needs to be flushed right before the termination of Algorithm 1, as it may still contain unapplied transformations.

The initializations required for Algorithm 1 to operate properly are as follows:

$$\begin{aligned}&T \leftarrow I_d\\&{\underline{T\_max}} \leftarrow {(1, \ldots , 1)}^T\\&pos\_min \leftarrow d \end{aligned}$$
[Algorithm 4 (pseudocode figure)]
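The following minimal sketch illustrates the buffered bookkeeping under our conventions (here \({\underline{t}}_i\) is read as the \(i\)-th column of \(T\), the flush bound is an illustrative value for 64-bit machine integers, and the helper names are ours):

```python
FLUSH_BOUND = 2 ** 62   # illustrative bound for 64-bit machine integers

def buffer_reduce(T, T_max, i, j, q):
    """Buffer b_i <- b_i - q*b_j as the column operation t_i <- t_i - q*t_j."""
    d = len(T)
    for row in range(d):
        T[row][i] -= q * T[row][j]
    T_max[i] = max(abs(T[row][i]) for row in range(d))
    return T_max[i] > FLUSH_BOUND          # caller flushes the buffer if True

def buffer_swap(T, T_max, i):
    """Buffer the swap b_{i-1} <-> b_i by swapping the corresponding columns."""
    for row in range(len(T)):
        T[row][i - 1], T[row][i] = T[row][i], T[row][i - 1]
    T_max[i - 1], T_max[i] = T_max[i], T_max[i - 1]

def flush(B, T, T_max):
    """Apply the buffered transformations (new b_i = sum_k T[k][i]*b_k), reset T."""
    d = len(B)
    new_B = [[sum(T[k][i] * B[k][c] for k in range(d)) for c in range(len(B[0]))]
             for i in range(d)]
    for i in range(d):
        B[i] = new_B[i]
        T_max[i] = 1
        for k in range(d):
            T[k][i] = 1 if k == i else 0
```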

Experimental Results

Lattice Bases Generation

For our experiments we focus on unimodular lattice bases. The reason for this choice is that they require an unusually high number of size reductions compared to other types of lattice bases [3, 7]. In contrast to the experiments performed in [2], we allow the determinant of a lattice basis to be \(-1\) as well as \(1\), in order to examine a broader class of unimodular lattices.

The techniques we have used are standard ones for generating unimodular lattice bases [2, 8]. Specifically, we generate three types of unimodular matrices, from now on referred to as “Type A”, “Type B” and “Type C” respectively, based on the following formulas:

where the \(L\) matrices are lower-triangular, the \(U\) matrices upper-triangular and the \(P\) matrices permutation matrices.

We selected these three specific types for the following reasons: the first category generates unimodular matrices with significantly “small” entries, thus examining the performance of the algorithms for “simple” cases. The other two categories use permutation matrices and matrix multiplication to generate unimodular matrices with much larger entries, which require significantly more unimodular operations to reach an LLL-reduced basis.
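The exact formulas and entry ranges of [2, 8] are not reproduced in the following sketch; it only illustrates the generic construction (triangular factors with \(\pm 1\) diagonal entries, optionally combined with permutation matrices), and the “Type A”-like product shown is an example composition, not necessarily the precise one used in our experiments:

```python
import random

def random_triangular(d, lower, entry_bound=5):
    """Random lower- or upper-triangular integer matrix with +/-1 diagonal."""
    M = [[0] * d for _ in range(d)]
    for i in range(d):
        M[i][i] = random.choice((-1, 1))
        cols = range(i) if lower else range(i + 1, d)
        for j in cols:
            M[i][j] = random.randint(-entry_bound, entry_bound)
    return M

def random_permutation(d):
    """Random d x d permutation matrix."""
    perm = list(range(d))
    random.shuffle(perm)
    return [[1 if perm[i] == j else 0 for j in range(d)] for i in range(d)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def type_a_like(d):
    """Example: product of one lower- and one upper-triangular factor (det = +/-1)."""
    return matmul(random_triangular(d, lower=True),
                  random_triangular(d, lower=False))
```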

Environment of the Experiments and Actual Experiments Performed

In our work we have chosen to compare four algorithms. First of all, we compare our proposed algorithm with the \(L^2\) algorithm [16, 17] of Nguyen and Stehlé. The first reason for this choice is that the \(L^2\) algorithm gives the same approximation quality for an SVP as LLL. Secondly, \(L^2\) has an improved worst-case running time analysis (see [18], Ch. 10). Additionally, we have implemented two variants of the aforementioned algorithms using the buffered-transformations heuristic [2].

We have implemented all the algorithms as Sage scripts, because we wanted to compare them in the same setting and also to contribute to the Sage community. Since timings at this high level of coding may not be fully representative, we also examine more qualitative properties. Thus, following the advice of McGeoch [15], we have compared code counters for the dominant operations, which in our case are the numbers of reductions and deep insertions performed.

Our paper belongs to category c of [13], which means that our aim is to better understand the strengths, weaknesses and operation of interesting algorithmic ideas in practice, rather than to produce an optimized implementation.

In order to ensure comparability, a precondition emphasized in [13], we have performed all our experiments on the same machine, namely a workstation with an Intel Core i5 CPU at 2.40 GHz and 4 GB of RAM.

The maximum bit size of the matrix entries used was 80 bits.

We have compared the CPU times for 1,000 bases of each of the three types defined previously, and for each dimension, in order to have a sufficiently large sample of lattice bases. We have used total execution time rather than average time to compare the different algorithms; average times can be derived directly from it.

Results of the Experiments

We have executed the four examined algorithms up to dimension 50, with parameters \((\delta , \eta )=(0.99, 0.51)\). The reason for focusing on small dimensions is that the correctness proof given by Nguyen and Stehlé [16] holds for dimensions up to 50 for these specific parameters \((\delta , \eta )\) and quadruple precision (the highest precision available in most programming languages). Since we have observed no major difference in the running times of consecutive dimensions, we have chosen 5 as the dimension step. We did not examine total (wall-clock) time, since the portion of execution time other than CPU time proved negligible in our experiments.

The running times of the four algorithms for the three different lattice basis types are presented in Figs. 1–3, where the horizontal axis represents the lattice dimension and the vertical axis the execution time in seconds.

Fig. 1 Total CPU times for the four different algorithms executed for 1,000 “type A” bases

Fig. 2 Total CPU times for the four different algorithms executed for 1,000 “type B” bases

Fig. 3 Total CPU times for the four different algorithms executed for 1,000 “type C” bases

As we can easily see, the running times for the three different types of bases increase as we move from “Type A” to “Type B” and then to “Type C”. Accordingly, the number of reductions and, to a lesser extent, the number of deep insertions show a similar increase. This can be attributed to the larger entries of the bases of the more complex types.

From these figures we can observe that the buffered heuristic does not benefit either of the two algorithms. The second observation one can make is that the “LLL Gram” algorithm outperforms \(L^2\) for unimodular lattices in the dimensions examined. Additionally, the difference in CPU times widens as the dimension increases.

The numbers of deep insertions for each type of lattice basis are presented in Figs. 4–6, while the numbers of reductions for the three different types of lattice bases are depicted in Figs. 7–9. Since the number of reductions and/or deep insertions required to obtain an LLL-reduced basis is not affected by whether they are buffered or not, we omit the buffered variants from Figs. 4–9. Although the difference in the total number of deep insertions of the two algorithms is negligible, Figs. 7–9 make it obvious that the number of reductions required to obtain an LLL-reduced basis using \(L^2\) is overwhelming compared to the number required by our algorithm. This becomes much more evident when comparing the number of reductions for the highest dimensions examined, where the difference exceeds 200%. This can be attributed to the fact that the \(L^2\) algorithm performs the reductions progressively, via the “lazy size reduction” [16, 17] based on Babai’s nearest plane algorithm, and therefore requires many more of them. The proposed algorithm performs significantly faster because it requires a much lower number of reductions.

Fig. 4 Total number of deep insertions for the four different algorithms executed for 1,000 “type A” bases

Fig. 5 Total number of deep insertions for the four different algorithms executed for 1,000 “type B” bases

Fig. 6 Total number of deep insertions for the four different algorithms executed for 1,000 “type C” bases

Fig. 7 Total number of reductions for the four different algorithms executed for 1,000 “type A” bases

Fig. 8 Total number of reductions for the four different algorithms executed for 1,000 “type B” bases

Fig. 9 Total number of reductions for the four different algorithms executed for 1,000 “type C” bases

Also, given that the total number of reductions required by the \(L^2\) algorithm is significantly larger than that of our variant of Schnorr–Euchner, we expected the “buffered” version of \(L^2\) to have a better running time than its “raw” counterpart, since the buffer needs to be flushed more frequently. However, that is not the case, as Figs. 1–3 suggest.

Additionally, every time a deep insertion is performed, both \(L^2\) and our algorithm need to backtrack to the point of insertion in order to ensure correctness, which adds a further overhead of reductions needed to obtain an LLL-reduced basis. As Figs. 4–6 suggest, the difference in the number of deep insertions required by \(L^2\) and by our algorithm widens (albeit slowly), thus burdening \(L^2\) with an additional number of reductions.

However, the \(L^2\) algorithm is asymptotically better, as the complexity analysis of [16] proves.

Conclusions and Future Work

Our results show that our algorithm outperforms the \(L^2\) algorithm for dimensions up to the largest dimension for which the correctness proof of Nguyen and Stehlé holds under quadruple precision. This does not diminish the value of proposing an algorithm that performs better asymptotically, since the \(L^2\) algorithm could be more efficient if higher precision (i.e., more than quadruple precision) were available in standard programming languages. As future work we would like to examine whether fast matrix multiplication techniques from the literature could improve the results of the buffered-transformations idea. Additionally, we would like to implement the quasi-linear time complexity reduction algorithm of [19] and perform appropriate comparisons.