1 Introduction

Lattice basis reduction plays an important role in the detection of wireless multiple-input multiple-output (MIMO) systems. For detection problems of lattice type, the optimal maximum-likelihood (ML) decoding can be modeled as the closest vector problem (CVP) [1, 16], which has been proved to be NP-hard [2]. Although many algorithms, such as the sphere decoding algorithm [5, 14], can solve the CVP exactly, their complexity increases exponentially with the number of transmit antennas [1, 5, 6]. Thus, such optimal solvers are infeasible for real-time systems, where timing is critical. To satisfy the time constraint, many suboptimal solvers with polynomial complexity, such as successive interference cancellation (SIC) decoding, have been proposed [3, 12]. However, suboptimal detectors may suffer heavy performance loss at low signal-to-noise ratio (SNR). It has been found that lattice reduction, used as an efficient preprocessor, can significantly improve the performance of suboptimal decoding algorithms. Recently, many reduction algorithms, such as the Lenstra-Lenstra-Lovász (LLL) algorithm [7], the effective LLL algorithm [10], the partial LLL algorithm [11, 17], and the diagonal reduction algorithm [19], have been proposed for SIC decoding. It has been proved in [9, 18] that SIC decoding aided by any of the above reduction notions achieves the same receive diversity order as infinite lattice decoding (ILD).

Of all the aforementioned lattice reduction algorithms, the diagonal reduction algorithm is the most efficient. From our observation in [19], the total computation of the diagonal reduction is dominated by the Givens rotations. Thus, in this paper, we propose to improve the efficiency of the diagonal reduction by replacing the Givens rotation with the more efficient and mathematically equivalent fast Givens transformation [4, p. 218]. The improvement comes from a substantial reduction in the number of multiplication operations required, because two entries of the 2-by-2 fast Givens matrix equal 1. Moreover, the fast Givens technique is general in that it can be incorporated into any LLL-type lattice reduction method to improve its efficiency.

Also, we investigate the basis reduction for dual lattices. In [9], the LLL and effective LLL algorithms for dual lattices are presented. In this paper, we investigate the diagonal reduction for dual lattices and prove that the dual basis of a diagonal reduced basis is also diagonal reduced. In addition, we derive an upper bound for the proximity factors of a family of dual LLL-type reduction aided SIC decoding. Our upper bound not only extends an existing bound for LLL reduction in [9] to a family of reduction methods, but also improves the existing one.

The rest of the paper is organized as follows. In Sect. 2, we briefly introduce the systems model and review the diagonal reduction algorithm. The new algorithm using the fast Givens is given in Sect. 3. Section 4 presents the diagonal reduction for dual lattices and our new upper bound for the proximity factors. In Sect. 5, we demonstrate our simulation results.

Notations: \(\mathbf {B}^\mathrm{T}\), \(\mathbf {B}^{\dag }\), and \(\det (\mathbf {B})\) denote the transpose, the Moore-Penrose inverse, and the determinant of a matrix \(\mathbf {B}\), respectively; \(\mathfrak {R}(z)\) and \(\mathfrak {I}(z)\) denote the real and imaginary parts of a complex number \(z\); and \(\lfloor a \rceil \) denotes the integer nearest to a real number \(a\).

2 Lattice Basis Reduction

2.1 System Model

Consider a MIMO system consisting of \(n_T\) transmit antennas and \(m_T\) receive antennas. The relationship between the \(n_T\,\times \,1\) transmitted signal vector \(\mathbf {x}\) and the \(m_T\,\times \,1\) received signal vector \(\mathbf {y}\) is given by

$$\begin{aligned} \mathbf {y}=\mathbf {H}\mathbf {x}+\mathbf {n}, \end{aligned}$$
(1)

where \(\mathbf {H}\), \(\mathbf {y}\), and \(\mathbf {n}\) represent the channel matrix, the received signal, and the additive noise, respectively. In general, the entries of both \(\mathbf {H}\) and \(\mathbf {n}\) are assumed to be complex-valued independently and identically distributed (i.i.d.) Gaussian variables. Treating the real and imaginary parts of (1) separately, an equivalent real-valued system of doubled size can be obtained:

$$\begin{aligned} \mathbf {y}=\mathbf {B}\mathbf {x}+\mathbf {n}, \end{aligned}$$
(2)

where

$$ \mathbf {y} = \left[ \begin{array}{c} \mathfrak {R}(\mathbf {y}) \\ \mathfrak {I}(\mathbf {y}) \end{array} \right] , \quad \mathbf {n} = \left[ \begin{array}{c} \mathfrak {R}(\mathbf {n}) \\ \mathfrak {I}(\mathbf {n}) \end{array} \right] , \quad \mathbf {B}= \left[ \begin{array}{rr} \mathfrak {R}(\mathbf {H}) &{} -\mathfrak {I}(\mathbf {H}) \\ \mathfrak {I}(\mathbf {H}) &{} \mathfrak {R}(\mathbf {H}) \end{array} \right] . $$
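For concreteness, the complex-to-real conversion above can be sketched in a few lines of Python; the helper name `complex_to_real` is ours, purely illustrative.

```python
import numpy as np

def complex_to_real(H, y):
    """Stack the real and imaginary parts of y = Hx + n into the
    equivalent real-valued model (2) of doubled size."""
    B = np.block([[H.real, -H.imag],
                  [H.imag,  H.real]])
    y_r = np.concatenate([y.real, y.imag])
    return B, y_r

# Small sanity check: the real model reproduces the complex product Hx.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
x = np.array([1 + 2j, -3 + 1j])
B, y_r = complex_to_real(H, H @ x)
x_r = np.concatenate([x.real, x.imag])
```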

Given a MIMO system modeled as (2), the optimum ML decoding is equivalent to the following CVP:

$$\begin{aligned} \min _{\mathbf {x}\in \fancyscript{A}}\Vert \mathbf {y}-\mathbf {B}\mathbf {x}\Vert _2, \end{aligned}$$
(3)

where the constellation \(\fancyscript{A}\) is of lattice type. Unfortunately, CVP has been proved to be NP-hard [2], and all existing algorithms for solving (3) have an exponential complexity with the lattice dimension \(n\) [5, 6]. Recently, lattice-reduction-aided SIC decoding turned out to be extremely promising, since its bit-error-rate (BER) performance can effectively approximate the ML decoding with a complexity of only \(O(n^3)\) operations [9, 15].
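To make the SIC decoder concrete, the following Python sketch implements the unconstrained version: QR-factor \(\mathbf {B}\), detect symbols from the last layer up, cancel their interference, and round. It is a minimal illustration (no constellation boundary control); the names are ours.

```python
import numpy as np

def sic_decode(B, y):
    """SIC decoding via QR: back-substitution with componentwise
    rounding to the nearest integer.  Unconstrained sketch."""
    Q, R = np.linalg.qr(B)
    z = Q.T @ y
    n = B.shape[1]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        # subtract the interference of already-detected symbols, then slice
        x[k] = round((z[k] - R[k, k + 1:] @ x[k + 1:]) / R[k, k])
    return x

# Noiseless sanity check: SIC recovers the transmitted integers exactly.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
x_true = np.array([2.0, -1.0, 0.0, 3.0])
x_hat = sic_decode(B, B @ x_true)
```

In the noiseless case the back-substitution is exact, so the rounding always returns the transmitted vector; with noise, the quality of the basis (hence lattice reduction) determines how often a layer is sliced incorrectly.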

2.2 Diagonal Reduction Algorithm

In this section, we first introduce some concepts of lattices and the SIC decoding, then we describe the diagonal reduction method [19].

Given a matrix \(\mathbf {B}\in \mathbb {R}^{m\,\times \,n}\) (\(n\le m\)) of full column rank, the lattice generated by \(\mathbf {B}\) is defined as \( L(\mathbf {B}) = \{ \mathbf {B}\mathbf {z} : \mathbf {z} \in \mathbb {Z}^n \}\). The columns of \(\mathbf {B}\) form a basis for the lattice \(L(\mathbf {B})\). An integer matrix \(\mathbf {Z}\in \mathbb {Z}^{n\,\times \,n}\) is called unimodular if \(| \det (\mathbf {Z}) |=1\). The columns of a matrix \(\mathbf {B}'\) form a basis for \(L(\mathbf {B})\) if and only if there exists a unimodular matrix \(\mathbf {Z}\) such that \(\mathbf {B}'=\mathbf {B}\mathbf {Z}\). The volume of \(L(\mathbf {B})\) is defined as \(\hbox {vol}(L(\mathbf {B}))=\sqrt{\det (\mathbf {B}^\mathrm{T}\mathbf {B})}\), which is independent of the choice of basis. Let \(\lambda (L)\) be the Euclidean length of a shortest nonzero vector in a lattice \(L\). It is well known that \(\lambda (L)/ \hbox {vol}(L)^{1/n}\) is bounded above over all \(n\)-dimensional lattices \(L\), and Hermite's constant \(\gamma _n\) is defined as the supremum of \(\lambda (L)^2/ \hbox {vol}(L)^{2/n}\) over all \(n\)-dimensional lattices. Finding the exact value of \(\gamma _n\) is very difficult; it is known only for \(1\le n \le 8\) and \(n=24\) [13, p. 33]. For an arbitrary dimension \(n\), an upper bound on Hermite's constant is given in [13, p. 35]:

$$\begin{aligned} \gamma _n \le 1 + \frac{n}{4}, \quad \hbox {for all } n\ge 1. \end{aligned}$$
(4)
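These definitions are easy to exercise numerically; the snippet below (with an arbitrary example basis and unimodular matrix of our choosing) checks that \(\hbox {vol}(L(\mathbf {B}))\) is unchanged when the basis is multiplied by a unimodular matrix.

```python
import numpy as np

# The volume is invariant under unimodular column transformations.
B = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0]])
Z = np.array([[1, 4],
              [0, 1]])        # unimodular: det(Z) = 1
vol_B  = np.sqrt(np.linalg.det(B.T @ B))
vol_BZ = np.sqrt(np.linalg.det((B @ Z).T @ (B @ Z)))
```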

A lattice reduction algorithm finds a unimodular matrix \(\mathbf {Z}\) for a given \(\mathbf {B}\) such that the columns of \(\mathbf {B}\mathbf {Z}\) are reasonably short. Lattice reduction has now become a powerful tool for enhancing the performance of suboptimal MIMO detectors, since it can significantly improve the orthogonality of the channel matrix.

Let \(\mathbf {B}\in \mathbb {R}^{m\,\times \,n}\) be a lattice generator matrix with QR decomposition \(\mathbf {B}=\mathbf {QR}\), where \(\mathbf {Q}\in \mathbb {R}^{m\,\times \,n}\) has orthonormal columns and \(\mathbf {R}\in \mathbb {R}^{n\,\times \,n}\) is upper triangular. From [8, 17], the efficiency of sphere decoding and the performance of SIC decoding are determined by the arrangement of the diagonal elements of \(\mathbf {R}\). Based on this fact, various reduction notions, such as the LLL reduction [7], effective LLL reduction [10], partial LLL reduction [11, 17], and diagonal reduction [19], have been proposed. Among all the aforementioned reduction notions, the diagonal reduction is the weakest and, consequently, the least computationally demanding.

Definition 1

(Diagonal reduction [19]) A basis matrix \(\mathbf {B}\in \mathbb {R}^{m\,\times \,n}\) is said to be diagonal reduced with the parameter \(\omega \) (\(1/4<\omega <1\)), if the entries \(r_{i,j}\) of the upper triangular factor \(\mathbf {R}\) in its QR decomposition \(\mathbf {B}=\mathbf {QR}\) satisfy

$$\begin{aligned} (r_{k-1,k}-\mu _k r_{k-1,k-1})^2 + r_{k,k}^2 \ge \omega r_{k-1,k-1}^2 , \end{aligned}$$
(5)

for all \(1< k \le n\), where \(\mu _k = \lfloor r_{k-1,k}/r_{k-1,k-1} \rceil \).
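Condition (5) is straightforward to test given the upper triangular factor \(\mathbf {R}\); a minimal Python checker (our own illustrative code, not from [19]) is:

```python
import numpy as np

def is_diagonal_reduced(R, omega=0.99):
    """Test condition (5) on an upper triangular R with nonzero
    diagonal, for a parameter 1/4 < omega < 1."""
    for k in range(1, R.shape[1]):           # pairs (k-1, k), 0-based
        mu = round(R[k - 1, k] / R[k - 1, k - 1])
        if (R[k - 1, k] - mu * R[k - 1, k - 1]) ** 2 + R[k, k] ** 2 \
                < omega * R[k - 1, k - 1] ** 2:
            return False
    return True
```

For example, the identity matrix is diagonal reduced, while an \(\mathbf {R}\) whose diagonal entries drop sharply, such as \(\mathrm{diag}(2, 0.1)\), is not.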

From the above definition, diagonal reduction imposes only one simple constraint on the diagonal entries of \(\mathbf {R}\). However, it is proved in [19] that diagonal-reduction-aided SIC decoding has performance identical to LLL-reduction-aided SIC decoding. A generic implementation of diagonal reduction is shown in Fig. 1.

Fig. 1 Diagonal reduction algorithm (DR) [19]
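The DR loop described above can be sketched in Python as follows: size-reduce \(r_{k-1,k}\), test condition (5), and on failure swap columns \(k-1\) and \(k\) and re-triangularize with a Givens rotation before stepping back. This is our reconstruction from the text, not a verbatim transcription of Fig. 1.

```python
import numpy as np

def diagonal_reduce(B, omega=0.99):
    """Return a unimodular Z such that B @ Z is diagonal reduced."""
    R = np.linalg.qr(B, mode='r').copy()
    n = R.shape[1]
    Z = np.eye(n, dtype=int)
    k = 1
    while k < n:
        # size-reduce r_{k-1,k}
        mu = round(R[k - 1, k] / R[k - 1, k - 1])
        if mu != 0:
            R[:k, k] -= mu * R[:k, k - 1]
            Z[:, k] -= mu * Z[:, k - 1]
        # condition (5); after size reduction, mu_k = 0
        if R[k - 1, k] ** 2 + R[k, k] ** 2 < omega * R[k - 1, k - 1] ** 2:
            R[:, [k - 1, k]] = R[:, [k, k - 1]]   # swap columns k-1, k
            Z[:, [k - 1, k]] = Z[:, [k, k - 1]]
            a, b = R[k - 1, k - 1], R[k, k - 1]
            r = np.hypot(a, b)                    # re-triangularize (Givens)
            G = np.array([[a / r, b / r], [-b / r, a / r]])
            R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
            R[k, k - 1] = 0.0
            k = max(k - 1, 1)
        else:
            k += 1
    return Z
```

Note that, as in [19], only the superdiagonal entry \(r_{k-1,k}\) is size-reduced; full size reduction is not required for the SIC performance guarantee.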

3 Diagonal Reduction Using Fast Givens

From Fig. 1, the computational cost of the diagonal reduction algorithm includes two parts: the size-reduction (lines 7–8) and the Givens rotation (lines 11–13). The simulation results in [19] indicate that the overall complexity of the algorithm is dominated by the Givens rotations as the lattice dimension \(n\) increases. Thus, we propose the use of the fast Givens transformation in place of the Givens rotations to speed up the diagonal reduction algorithm.

Like the Givens rotation, the fast Givens transformation can be used to introduce zeros into selected positions. Specifically, given a lattice generator matrix \(\mathbf {B}\), the fast Givens transformation is based on the following decomposition:

$$\begin{aligned} \mathbf {B} = \mathbf {F} \mathbf {D}^{-1} \mathbf {R} , \end{aligned}$$
(6)

where \(\mathbf {D}=\mathrm{diag}(d_i)\) is a positive diagonal matrix, \(\mathbf {F} \mathbf {D}^{-1/2}\) represents the orthogonal factor in the QR decomposition of \(\mathbf {B}\), and \(\mathbf {D}^{-1/2} \mathbf {R}\) represents the upper triangular factor.

How can the fast Givens introduce zeros? In the 2-by-2 case, given \(\mathbf {x} = [x_1 , x_2]^\mathrm{T}\) and the corresponding diagonal elements \(d_1, d_2 > 0\), we first compute

$$ \alpha = -x_1/x_2, \quad \beta = -\alpha d_2 / d_1, \quad \hbox {and} \ \gamma = - \alpha \beta . $$

When \(\gamma \le 1\), we have the type 1 fast Givens:

$$\begin{aligned} \mathbf {F} = \left[ \begin{array}{cc} \beta &{} 1 \\ 1 &{} \alpha \end{array} \right] \end{aligned}$$
(7)

and update \(d_1\) and \(d_2\):

$$\begin{aligned} \widehat{d}_1 \leftarrow (1 + \gamma ) d_2 \quad \hbox {and} \quad \widehat{d}_2 \leftarrow (1 + \gamma ) d_1 . \end{aligned}$$
(8)

When \(\gamma > 1\), setting

$$ \alpha \leftarrow 1 / \alpha , \quad \beta \leftarrow 1 / \beta , \quad \hbox {and} \ \gamma \leftarrow 1 / \gamma , $$

we have the type 2 fast Givens:

$$\begin{aligned} \mathbf {F} = \left[ \begin{array}{cc} 1 &{} \beta \\ \alpha &{} 1 \end{array} \right] \end{aligned}$$
(9)

and update \(d_1\) and \(d_2\):

$$\begin{aligned} \widehat{d}_1 \leftarrow (1 + \gamma ) d_1 \quad \hbox {and} \quad \widehat{d}_2 \leftarrow (1 + \gamma ) d_2. \end{aligned}$$
(10)

Then it can be verified that

$$ \mathbf {F} \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} \times \\ 0 \end{array} \right] $$

and

$$ \left[ \begin{array}{cc} \widehat{d}_1 &{} 0 \\ 0 &{} \widehat{d}_2 \end{array} \right] ^{-1/2} \mathbf {F} \, \left[ \begin{array}{cc} d_1 &{} 0 \\ 0 &{} d_2 \end{array} \right] ^{1/2} $$

is orthogonal.
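The two cases above translate directly into code; the following Python sketch (assuming \(x_2 \ne 0\), and following [4, p. 218]) computes the 2-by-2 fast Givens transformation and can be checked against the two claimed properties.

```python
import numpy as np

def fast_givens(x, d):
    """2-by-2 fast Givens transformation (assumes x[1] != 0).

    Returns (F, d_new) such that (F @ x)[1] = 0 and
    diag(d_new)**(-1/2) @ F @ diag(d)**(1/2) is orthogonal."""
    x1, x2 = x
    d1, d2 = d
    alpha = -x1 / x2
    beta = -alpha * d2 / d1
    gamma = -alpha * beta
    if gamma <= 1:                                   # type 1, Eqs. (7)-(8)
        F = np.array([[beta, 1.0], [1.0, alpha]])
        d_new = np.array([(1 + gamma) * d2, (1 + gamma) * d1])
    else:                                            # type 2, Eqs. (9)-(10)
        alpha, beta, gamma = 1 / alpha, 1 / beta, 1 / gamma
        F = np.array([[1.0, beta], [alpha, 1.0]])
        d_new = np.array([(1 + gamma) * d1, (1 + gamma) * d2])
    return F, d_new

# Example: zero out the second entry of x = [3, 4] with d = [2, 1].
x = np.array([3.0, 4.0])
d = np.array([2.0, 1.0])
F, d_new = fast_givens(x, d)
```

The case split keeps \(\gamma \le 1\), which bounds the growth factor \(1+\gamma \) by 2 and thus keeps the scaling matrix \(\mathbf {D}\) numerically well behaved.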

In our fast Givens-based diagonal reduction algorithm, all the transformations are based on the decomposition (6). In the beginning, we compute the QR decomposition \(\mathbf {B}=\mathbf {QR}\) and set \(\mathbf {F}=\mathbf {Q}\) and \(\mathbf {D}=\mathbf {I}_n\). Thus, in this case, the size-reduction in each iteration is the same as lines 7–8 of Fig. 1. But the diagonal reduction condition (5) becomes

$$\begin{aligned} d_{k-1}^{-1} (r_{k-1,k}-\mu _k r_{k-1,k-1})^2 + d_k^{-1} r_{k,k}^2 \ge \omega d_{k-1}^{-1} r_{k-1,k-1}^2, \end{aligned}$$
(11)

for \(1< k \le n\). The diagonal reduction algorithm using fast Givens (DRFG) is summarized in Fig. 2.

Fig. 2 Diagonal reduction algorithm using fast Givens (DRFG)

In comparison with the original diagonal reduction algorithm, DRFG saves a substantial number of multiplication operations, since two entries of the 2-by-2 fast Givens matrix are equal to 1. However, DRFG introduces overhead, such as the computations in line 14 and line 20. Our simulation results presented in Sect. 5 show that overall DRFG is more efficient than DR.
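The source of the savings is visible in the row update itself: because two entries of \(\mathbf {F}\) equal 1, applying it to a pair of rows costs two multiplications per column instead of the four required by a Givens rotation. A sketch of this update for a type 1 transformation (function and argument names are ours):

```python
import numpy as np

def apply_fast_givens_rows(R, k, alpha, beta):
    """Apply a type 1 fast Givens F = [[beta, 1], [1, alpha]] to rows
    k-1 and k of R in place: two multiplications per column, versus
    four for a standard Givens rotation."""
    top, bot = R[k - 1, :].copy(), R[k, :].copy()
    R[k - 1, :] = beta * top + bot    # one multiplication per entry
    R[k, :] = top + alpha * bot       # one multiplication per entry
    return R

# Check against the explicit 2-by-2 matrix product.
R = np.arange(8.0).reshape(2, 4)
F = np.array([[0.5, 1.0], [1.0, -0.25]])
updated = apply_fast_givens_rows(R.copy(), 1, -0.25, 0.5)
```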

4 Dual Diagonal Reduction

In this section, after a brief introduction to dual bases, we first investigate the diagonal reduction of a dual basis and prove that if a primal basis is diagonal reduced, then its dual basis is also diagonal reduced. We then derive an upper bound on the proximity factor of SIC decoding, which not only improves an existing bound for the dual LLL reduction in [9], but also extends it to a family of dual LLL-type reductions.

4.1 Dual Lattice Reduction

Let \(L\) be an \(n\)-dimensional lattice in \(\mathbb {R}^{m}\), then the dual lattice \(L^*\) of \(L\) is defined as the set

$$\begin{aligned} L^* = \{\mathbf {u} ~|~ \langle \mathbf {u}, \mathbf {v}\rangle \in \mathbb {Z}, \hbox {for~all} ~\mathbf {v}\in L \}, \end{aligned}$$
(12)

where \(\langle \mathbf {u}, \mathbf {v}\rangle \) is the inner product of \(\mathbf {u}\) and \(\mathbf {v}\). Suppose that \(\mathbf {B}\) is a primal basis matrix of \(L\); then the columns of \(\mathbf {B}^{\dag \mathrm T}\) form a basis for the dual lattice \(L^*\). In this paper, we adopt the definition of the dual basis \(\mathbf {B}^*\triangleq \mathbf {B}^{\dag \mathrm T}\mathbf {J}\) [9], where

$$ \mathbf {J} \triangleq \left[ \begin{array}{cccc} 0 &{} \cdots &{} 0 &{} 1 \\ 0 &{} \cdots &{} 1 &{} 0 \\ \vdots &{} &{} \vdots &{} \vdots \\ 1 &{} \cdots &{} 0 &{} 0 \end{array} \right] . $$

A dual lattice is closely related to its corresponding primal lattice. For instance, we have \(L^{**} = L\) and \(\det (L^*) = 1/ \det (L)\).

Given a primal basis matrix \(\mathbf {B}\), dual basis reduction performs a lattice reduction algorithm on the dual basis \(\mathbf {B}^*\). Like primal basis reduction, dual basis reduction can also return a well-reduced basis of the primal lattice. Suppose that \(\mathbf {Z}^*\) is the unimodular matrix that reduces the dual basis \(\mathbf {B}^*\); then the corresponding reduced primal basis is given by

$$ \mathbf {B}' = (\mathbf {B}^{\dag \mathrm T}\mathbf {J}\mathbf {Z}^*)^{\dag \mathrm T} \mathbf {J} = \mathbf {B}\mathbf {J}(\mathbf {Z}^*)^{\dag \mathrm T}\mathbf {J}, $$

where \(\mathbf {J}(\mathbf {Z}^*)^{\dag \mathrm T}\mathbf {J}\) is the unimodular matrix associated with the primal lattice.
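The defining property of the dual basis, namely that every dual basis vector has an integer inner product with every lattice vector, can be verified numerically; in fact, for a full-column-rank \(\mathbf {B}\) one has \(\mathbf {B}^{*\mathrm T}\mathbf {B}=\mathbf {J}\). A short check with a random basis:

```python
import numpy as np

# Dual basis B* = (B^dagger)^T J, where J reverses the column order.
rng = np.random.default_rng(3)
B = rng.normal(size=(5, 3))                 # full column rank a.s.
J = np.eye(3)[:, ::-1]
B_dual = np.linalg.pinv(B).T @ J
# The Gram matrix of dual against primal basis vectors is integer;
# here it equals the exchange matrix J itself.
gram = B_dual.T @ B
```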

To study the reduction properties of diagonal reduction on dual lattices, the following result is essential.

Lemma 1

Let \(\mathbf {B}=\mathbf {QR}\) and \(\mathbf {B}^*=\mathbf {Q}^*\mathbf {R}^*\) be the QR decompositions of the primal basis \(\mathbf {B}\) and its dual basis \(\mathbf {B}^*\), respectively. Then

$$\begin{aligned} \mathbf {Q}^* = \mathbf {QJ}, \quad \mathbf {R}^* = \mathbf {J}\mathbf {R}^{-\mathrm T}\mathbf {J}. \end{aligned}$$
(13)

Proof

It is easy to verify that \(\mathbf {B}^{\dag }=\mathbf {R}^{-1}\mathbf {Q}^\mathrm{T}\). Thus, we have

$$\begin{aligned} \mathbf {B}^*&= \mathbf {B}^{\dag \mathrm T}\mathbf {J} = (\mathbf {R}^{-1}\mathbf {Q}^\mathrm{T})^\mathrm{T}\mathbf {J} =\mathbf {Q}\mathbf {R}^{-\mathrm T}\mathbf {J} \nonumber \\&= (\mathbf {QJ})\cdot (\mathbf {J}\mathbf {R}^{-\mathrm T}\mathbf {J}). \end{aligned}$$
(14)

Obviously, \(\mathbf {QJ}\) has orthonormal columns and \(\mathbf {J}\mathbf {R}^{-\mathrm T}\mathbf {J}\) is an upper triangular matrix, thus the proof is completed.
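Lemma 1 can also be checked numerically; the snippet below verifies that \(\mathbf {B}^* = (\mathbf {QJ})(\mathbf {J}\mathbf {R}^{-\mathrm T}\mathbf {J})\) and that the second factor is upper triangular, for a random basis.

```python
import numpy as np

# Numerical check of Lemma 1: B* = (QJ)(J R^{-T} J).
rng = np.random.default_rng(4)
B = rng.normal(size=(6, 4))
Q, R = np.linalg.qr(B)
J = np.eye(4)[:, ::-1]
B_dual = np.linalg.pinv(B).T @ J            # dual basis B* = B^{dagger T} J
R_dual = J @ np.linalg.inv(R).T @ J         # candidate upper factor R*
```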

Based on the above lemma, we can obtain the following result.

Proposition 1

If the lattice basis matrix \(\mathbf {B}\) is diagonal reduced, then its dual basis \(\mathbf {B}^{*}\) is also diagonal reduced.

Proof

Let \(\mathbf {R}=[r_{i,j}]\) and \(\mathbf {R}^{*}\) be the upper triangular factors of \(\mathbf {B}\) and \(\mathbf {B}^{*}\), respectively. Then from Lemma 1,

$$\begin{aligned} \mathbf {R}^*&= \mathbf {J}\mathbf {R}^{-\mathrm T}\mathbf {J} \nonumber \\&= \left[ \begin{array}{ccccc} \frac{1}{r_{n,n}} &{} -\frac{r_{n-1,n}}{r_{n-1,n-1}r_{n,n}} &{} \times &{} &{} \times \\ &{} \frac{1}{r_{n-1,n-1}} &{} -\frac{r_{n-2,n-1}}{r_{n-2,n-2}r_{n-1,n-1}} &{} &{} \times \\ &{} &{} \frac{1}{r_{n-2,n-2}} &{} \ddots &{} \vdots \\ &{} &{} &{} \ddots &{} -\frac{r_{1,2}}{r_{1,1}r_{2,2}} \\ &{} &{} &{} &{} \frac{1}{r_{1,1}} \end{array} \right] \end{aligned}$$
(15)

Since \(\mathbf {B}\) is diagonal reduced, we then have

$$\begin{aligned} \left( r_{k-1,k}- \left\lfloor \frac{r_{k-1,k}}{r_{k-1,k-1}} \right\rceil \cdot r_{k-1,k-1}\right) ^2 + r_{k,k}^2 \ge \omega r_{k-1,k-1}^2 , \end{aligned}$$
(16)

for all \(1< k \le n\). Multiplying both sides of (16) by \(\frac{1}{(r_{k-1,k-1}r_{k,k})^2}\), we obtain

$$ \left( \frac{r_{k-1,k}}{r_{k-1,k-1}r_{k,k}} - \left\lfloor \frac{r_{k-1,k}}{r_{k-1,k-1}} \right\rceil \cdot \frac{1}{r_{k,k}} \right) ^2 + \left( \frac{1}{r_{k-1,k-1}} \right) ^2 \ge \omega \left( \frac{1}{r_{k,k}} \right) ^2, $$

which implies that \(\mathbf {R}^*\) is also diagonal reduced.
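As a numerical illustration of Proposition 1, take a small hand-picked \(\mathbf {R}\) that satisfies (5) for \(\omega = 0.99\) and check that its dual factor \(\mathbf {R}^* = \mathbf {J}\mathbf {R}^{-\mathrm T}\mathbf {J}\) does as well (the inline checker is our own illustrative code):

```python
import numpy as np

def reduced(R, omega=0.99):
    # condition (5) for the pairs (k-1, k), with a small float tolerance
    return all(
        (R[k-1, k] - round(R[k-1, k] / R[k-1, k-1]) * R[k-1, k-1]) ** 2
        + R[k, k] ** 2 >= omega * R[k-1, k-1] ** 2 - 1e-12
        for k in range(1, R.shape[1]))

# A diagonal reduced R; by Proposition 1 its dual factor must be too.
R = np.array([[1.0, 0.3, -0.2],
              [0.0, 1.1,  0.4],
              [0.0, 0.0,  1.2]])
J = np.eye(3)[:, ::-1]
R_dual = J @ np.linalg.inv(R).T @ J
```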

4.2 Proximity Factor

To characterize the performance gap between suboptimal decoding and ILD, a proximity factor was defined in [8] and further discussed in [9, 18]. Given a lattice generator matrix \(\mathbf {B}=[\mathbf {b}_1,\ldots ,\mathbf {b}_n]\), denote by \(\phi _i\) the acute angle between \(\mathbf {b}_i\) and the linear space spanned by the previous \(i-1\) basis vectors. Then the proximity factor of SIC decoding is defined as:

$$\begin{aligned} \rho _{i}\triangleq \sup _{\mathbf {B}\in \fancyscript{B}_{Red }} \frac{\lambda ^2(L(\mathbf {B}))}{\Vert \mathbf {b}_i\Vert _2^2 \sin ^2 \phi _i}, \end{aligned}$$
(17)

where the supremum is taken over the set \(\fancyscript{B}_{Red }\) of bases satisfying a certain reduction notion for any \(n\)-dimensional lattice \(L\). We further define \(\rho \triangleq \max _i \{\rho _{i}\}\). From [9], the average error probability of SIC decoding can be bounded by

$$ P_{e, \mathrm{SIC}}(\textsc {SNR})\le \sum _{i=1}^n P_{e, \mathrm{ILD}}\left( \frac{\textsc {SNR}}{\rho _{i}}\right) \le n P_{e, \mathrm{ILD}}\left( \frac{\textsc {SNR}}{\rho }\right) $$

for arbitrary SNR.

Denote by \(\rho _{\text {LLL}}\), \(\rho _{\text {DLLL}}\), \(\rho _{\text {DR}}\), and \(\rho _{\text {DDR}}\) the proximity factors of SIC decoding aided by the LLL reduction, dual LLL reduction, diagonal reduction, and dual diagonal reduction, respectively. An upper bound of \(\rho _{\text {LLL}}\) is given in [18]:

$$\begin{aligned} \rho _{\text {LLL}} \le \gamma _n \cdot \beta ^{\frac{n-1}{2}} \le \left( 1+\frac{n}{4}\right) \beta ^{\frac{n-1}{2}}, \end{aligned}$$
(18)

where \(\beta = 1/(\omega -1/4)\ge 4/3\). Following the argument in [19], it is easy to prove that

$$\begin{aligned} \rho _{i,\text {DR}} = \rho _{i,\text {LLL}} \le \gamma _i \cdot \beta ^{\frac{i-1}{2}}. \end{aligned}$$
(19)

Thus,

$$\begin{aligned} \rho _{\text {DR}} = \rho _{n,\text {DR}} \le \gamma _n \cdot \beta ^{\frac{n-1}{2}}. \end{aligned}$$
(20)

For dual reduction, an upper bound of \(\rho _{\text {DLLL}}\) is given in [9]:

$$\begin{aligned} \rho _{\text {DLLL}} \le \beta ^{n-1}. \end{aligned}$$
(21)

In the following, we improve the upper bound (21). From (19) and Proposition 1, we can obtain that

$$\begin{aligned} \rho _{i, \mathrm{DDR}} = \sup _{\mathbf {B^*}\in \fancyscript{B}_{\mathrm{DR}}} \frac{\lambda ^2(L(\mathbf {B}))}{r_{i,i}^2} = \sup _{\mathbf {B}\in \fancyscript{B}_{\mathrm{DR}}} \frac{\lambda ^2(L(\mathbf {B}))}{r_{i,i}^2} \le \gamma _i \cdot \beta ^{\frac{i-1}{2}}. \end{aligned}$$
(22)

Thus,

$$\begin{aligned} \rho _{\text {DDR}} = \rho _{n,\text {DDR}} \le \gamma _n \cdot \beta ^{\frac{n-1}{2}}. \end{aligned}$$
(23)

Following the above argument, it is easy to prove that the proximity factors of SIC decoding aided by all dual LLL-type reduction notions, such as the dual LLL reduction, dual effective LLL reduction, and dual partial LLL reduction, can be upper bounded by the right-hand side of (23). Comparing (23) with (20), SIC decoding aided by the primal and dual diagonal reductions is expected to have the same performance. This is confirmed by the simulation results presented in Sect. 5.
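The improvement over (21) can be quantified directly: the new bound grows roughly like the square root of the old one. A quick numerical comparison with \(\beta = 4/3\) (i.e., \(\omega \rightarrow 1\)) shows that (23) is smaller than (21) for moderate and large dimensions; for very small \(n\) the old bound can still be tighter.

```python
# Compare the dual-LLL bound (21), beta^(n-1), with the new bound (23),
# (1 + n/4) * beta^((n-1)/2), for beta = 4/3 (omega -> 1).
beta = 4.0 / 3.0
dims = (16, 32, 64)
old_bounds = {n: beta ** (n - 1) for n in dims}
new_bounds = {n: (1 + n / 4.0) * beta ** ((n - 1) / 2.0) for n in dims}
```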

5 Simulation Results

In this section, we present simulation results comparing the efficiency of the proposed algorithm DRFG with the original algorithm DR. All experiments were performed on matrices with random entries drawn from an i.i.d. zero-mean, unit-variance Gaussian distribution. For simplicity, all testing matrices were square. For each size, we generated 1,000 random matrices and report the average. The parameter \(\omega \) in the reduction algorithms was set to \(0.99\).

Fig. 3 The average complexity (in flops) of the diagonal reduction (DR), dual diagonal reduction (dual DR), DRFG, and dual DRFG algorithms

Fig. 4 Simulated BER of SIC aided by the LLL, DR, dual DR, DRFG, and dual DRFG algorithms for 64-QAM over an \(8\,\times \,8\) uncoded MIMO fading channel

Although the new algorithm DRFG is expected to be faster than the original algorithm DR, the computations in line 14 and line 20 of Fig. 2 introduce overhead. To compare the overall complexity of the algorithms, we counted the floating-point operations (flops) carried out by the algorithms. Figure 3 depicts the average numbers of flops performed by the reduction algorithms. The figure shows that in both the primal and dual lattice reduction cases, DRFG is more efficient than DR, and the gap between them widens quickly as the dimension increases. This indicates that the overhead introduced by the fast Givens is insignificant. Also note that the DR (DRFG) algorithm is slightly faster than its dual counterpart, the dual DR (dual DRFG) algorithm. This is due to the additional computation required by the dual reduction, for instance, the calculation of \(\mathbf {B}^{\dag }\).

We also investigated the reduction quality of the different reduction algorithms as measured by the BER performance of SIC decoding. Specifically, using a 64-QAM constellation, Fig. 4 depicts the simulated BER curves of lattice-reduction-aided SIC over an \(8\,\times \,8\) uncoded MIMO fading channel. We found that SIC aided by any of the four diagonal reduction algorithms has BER performance identical to that aided by the LLL algorithm. This is consistent with the theoretical analysis presented in Sect. 4.2.