1 Introduction

In telecommunication, a transmitted signal is distorted by multi-path propagation and band-limited channels, which cause inter-symbol interference (ISI) [15]. ISI severely degrades wireless channels and makes communication less reliable. An equalizer is a device that attempts to undo the distortion a channel imposes on a transmitted signal. Equalizers [16] are placed at the receiver to combat ISI and recover the transmitted signal. Their classification is detailed in Fig. 1. They are broadly classified as linear and non-linear equalizers. Linear equalizers can eliminate ISI, but they enhance noise, which degrades signal quality. When the channel distortion is too severe to be mitigated by a linear equalizer, a non-linear equalizer is used, as it compensates for channel impairments more effectively. The decision feedback equalizer (DFE) [2, 5, 16] is a non-linear equalizer that is widely used for channel equalization. Compared with a linear equalizer, a DFE removes ISI with less noise enhancement and therefore gives a better signal-to-noise ratio. The basic architecture of a DFE is shown in Fig. 2. It consists of a decision device (quantizer), a feedforward (FF) filter, and a feedback (FB) filter.

Fig. 1 Classification of equalizers

Fig. 2 Block diagram of decision feedback equalizer

The FF filter receives the data, equalizes it with the transfer function of the anti-causal part of the channel, and cancels the pre-cursor ISI. The noise enhancement in the DFE is thereby significantly reduced. The FB filter suppresses the post-cursor ISI. The coefficients of the FF and FB filters should be chosen carefully so that the DFE operates with zero errors. Until the quantizer propagates a zero value, the DFE will operate efficiently with less noise.

As the data rate of the transmission system increases, more symbols overlap at the DFE. To counter this, the order of the FF and FB filters in the DFE must be increased. DFEs are generally designed using multiply-accumulate (MAC) units. As the filter order increases, the number of multipliers required by the MAC units also increases, which makes the hardware more complex and the DFE harder to implement. To reduce the area of the DFE, the MAC units are replaced with multiplier-less architectures.

Many multiplier-less architectures have been reported in the literature [1, 4, 7, 9, 13, 20, 21], and some have proved efficient under certain conditions. Distributed arithmetic (DA) is a baseline multiplier-less architecture that pre-computes the partial inner products of two vectors and stores them in a ROM/LUT. The speed of a MAC-based design generally depends on the length of the vector, whereas the speed of a DA-based architecture depends on the bit length of the input words. Hence DA-based architectures (DAA) are faster than MAC-based ones. DSP blocks such as FIR, IIR, DCT, FFT and adaptive filters can be implemented using DAA. The memory size of a DA-based FIR filter grows exponentially with the filter order. To address this, many DAAs [3, 12] with lower memory usage have been proposed. A modified DA (MDA) architecture was described in [22], in which the LUT size is halved by replacing the adder with an adder/subtractor. The LUT size is reduced further by introducing the offset binary code (OBC) concept. DAA has also been constructed without memory: in [6], a memory-less architecture was developed by replacing the LUT/ROM with multiplexers and adders. Several applications, such as hearing aids, software-defined radio and channel equalizers, use DAA. The works in [8, 19] proposed DAA for efficient baseband processing in software receiver applications. The authors of [17, 18] proposed a block floating point (BFP) approach for the adaptive DFE; however, the performance cost of BFP is higher than that of the fixed-point approach. Basic DA-based and OBC-DA-based DFE architectures are described in [10, 11, 14]. These architectures occupy a large area because they use a ROM/LUT.

In this paper, we propose a novel memory-less distributed arithmetic (NMLDA) filter and develop a DFE based on it, which occupies less area and offers higher speed than existing DFE architectures.

The remainder of the paper is organized as follows. Section 2 presents the mathematical background of DA. The proposed NMLDA architecture is described in Section 3. Section 4 covers the implementation of the DFE with the NMLDA architecture. Section 5 provides the synthesis results of the existing and proposed designs. Finally, Section 6 concludes the paper.

2 Background of distributed arithmetic

The bit-serial multiplication of DA can be performed in a single direct step. Let \(x_k\) and \(d_k\) be the input and fixed filter coefficient vectors, with K filter input words. Their inner product can be written as

$$ y= \sum\limits_{k = 1}^{K} d_{k} x_{k} $$
(1)

where \(x_k\) is written as an N-bit signed two's-complement binary number with \(|x_k| < 1\). Then \(x_k\) can be expressed as:

$$ x_{k}= -b_{k0}+\sum\limits_{n = 1}^{N-1}b_{kn} 2^{-n} $$
(2)

where \(x_k = b_{k0}, b_{k1}, \ldots, b_{k(N-1)}\) and each bit \(b_{kn}\) takes the value 0 or 1. Substituting \(x_k\) into y and rearranging the order of summation, we finally get

$$ y= \sum\limits_{k = 1}^{K}d_{k}(-b_{k0})+\sum\limits_{n = 1}^{N-1} \left[\sum\limits_{k = 1}^{K}d_{k} b_{kn} \right]2^{-n} $$
(3)

Equation (3) gives the DA form. Writing the bit index as l and the word length as L = N, it can be rewritten as:

$$ y= -C_{0}+\sum\limits_{l = 1}^{L-1} 2^{-l} C_{l} $$
(4)

where

$$ C_{l}=\sum\limits_{k = 1}^{K} d_{k} b_{kl} $$
(5)

\(C_l\) is the pre-computed partial inner product, which is stored in memory. It has \(2^{K}\) possible values, and the LUT has size \(2 \cdot 2^{K}\). The MDA design reduces the LUT size from \(2 \cdot 2^{K}\) to \(2^{K}\), and the OBC concept reduces it further to \(2^{K-1}\). Because the LUT size of DA still grows with the filter order, the memory-less DA (MLDA) architecture was developed to reduce the area; in it, the memory units are replaced with multiplexers. In this paper, we extend the MLDA design to further reduce the area as the filter order increases.
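As a minimal sketch of the LUT-based computation in Eqs. (3)-(5), the Python below builds the table of partial inner products and accumulates one bit plane at a time. The function names (`to_bits`, `build_lut`, `da_inner_product`) and the sample values are illustrative assumptions, not taken from the referenced designs, and floating-point arithmetic stands in for the fixed-point hardware.

```python
from itertools import product


def to_bits(x, n_bits):
    """Return the N-bit two's-complement bits [b0, ..., b_{N-1}] of x, with -1 <= x < 1.

    b0 is the sign bit (weight -1) and b_n has weight 2**-n, as in Eq. (2).
    """
    scaled = round(x * 2 ** (n_bits - 1))   # signed integer representation
    scaled &= (1 << n_bits) - 1             # wrap negatives into two's complement
    return [(scaled >> (n_bits - 1 - n)) & 1 for n in range(n_bits)]


def build_lut(coeffs):
    """Pre-compute sum_k d_k * b_k for every bit pattern (b_1, ..., b_K)."""
    return {bits: sum(d * b for d, b in zip(coeffs, bits))
            for bits in product((0, 1), repeat=len(coeffs))}


def da_inner_product(coeffs, inputs, n_bits):
    """Evaluate y = sum_k d_k * x_k with table look-ups instead of multipliers."""
    lut = build_lut(coeffs)
    bit_rows = [to_bits(x, n_bits) for x in inputs]   # bit_rows[k][n] = b_kn
    y = 0.0
    for n in range(n_bits):
        address = tuple(row[n] for row in bit_rows)   # bit n of every input word
        c_n = lut[address]
        # The sign-bit plane enters with weight -1, all others with weight 2**-n.
        y += -c_n if n == 0 else c_n * 2.0 ** -n
    return y


coeffs = [0.5, -0.25, 0.125, 0.75]
inputs = [0.5, -0.625, 0.25, -0.125]
y = da_inner_product(coeffs, inputs, n_bits=4)
```

The loop runs once per bit of the input word, not once per tap, which is why the speed of DA depends on the word length. In this sketch the dictionary holds one entry per K-bit address.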

3 Proposed novel memory-less DA filter architecture

The proposed novel memory-less DA (NMLDA) filter architecture is explained in this section. It uses 2:1 multiplexers instead of memory elements, and the adders of the memory-less DAA [6] are replaced with an enhanced 4:2 compressor, as shown in Fig. 3. The 4-tap filter consists of a serial-in parallel-out shift register (SIPOSR), four 2:1 multiplexers, enhanced 4:2 compressor adders and shift accumulators. The input data \(x_k\) is fed to the SIPOSR, and the shift register outputs act as the selection lines of the four 2:1 multiplexers. One input of each 2:1 multiplexer is a filter coefficient and the other is logic '0'. If the selection line is high, the filter coefficient appears at the output; otherwise the multiplexer output is zero. The multiplexer outputs A1, A2, A3 and A4 are fed to the enhanced 4:2 compressor adder to obtain the final output.

Fig. 3 Block diagram of 4-tap NMLDA filter
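The data path described above can be modelled behaviourally as follows. This is a sketch under stated assumptions: the names `mux2` and `memoryless_da` are hypothetical, a plain `sum` stands in for the enhanced 4:2 compressor adder, bits are processed LSB first with a right-shifting accumulator, and floating-point values replace the fixed-point words.

```python
def to_bits(x, n_bits):
    """Two's-complement bits [b0, ..., b_{N-1}] of x, with -1 <= x < 1 (b0 = sign bit)."""
    scaled = round(x * 2 ** (n_bits - 1)) & ((1 << n_bits) - 1)
    return [(scaled >> (n_bits - 1 - n)) & 1 for n in range(n_bits)]


def mux2(select, coeff):
    """2:1 multiplexer: pass the filter coefficient when select is high, else logic 0."""
    return coeff if select else 0.0


def memoryless_da(coeffs, inputs, n_bits):
    """Bit-serial inner product using multiplexers instead of a LUT."""
    bit_rows = [to_bits(x, n_bits) for x in inputs]   # stands in for the SIPO register
    acc = 0.0
    # Process the magnitude bits from LSB (n = N-1) up to n = 1.
    for n in range(n_bits - 1, 0, -1):
        a = [mux2(row[n], d) for row, d in zip(bit_rows, coeffs)]   # A1..A4
        acc = (acc + sum(a)) / 2.0    # add the partial sum, then shift right by one bit
    # The sign-bit plane is subtracted last, with weight 1.
    sign_plane = [mux2(row[0], d) for row, d in zip(bit_rows, coeffs)]
    return acc - sum(sign_plane)


coeffs = [0.5, -0.25, 0.125, 0.75]
inputs = [0.5, -0.625, 0.25, -0.125]
y = memoryless_da(coeffs, inputs, n_bits=4)
```

No table is stored: each multiplexer simply gates its coefficient with the current input bit, so the storage cost does not grow exponentially with the number of taps.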

The enhanced 4:2 compressor adder is designed with dual mode logic (DML) and is shown in Fig. 4. The DML design consists of XOR/XNOR modules and MUX modules. The XOR/XNOR modules are built with CMOS logic and the MUX modules with transmission gate (TG) logic. The DML realization gives better results in area and speed. The multiplexer outputs A1, A2, A3 and A4 are the inputs of the enhanced 4:2 compressor adder. The fifth input, A5 (Cin), is the Cout of the previous-stage compressor. The four inputs A1, A2, A3, A4 and the sum output have the same weight. A1 and A2 are fed to the XOR/XNOR1 module, and A3 and A4 to the XOR/XNOR2 module. The outputs of the XOR/XNOR modules are passed to the MUX1 module, with A4 as the selection line. The outputs of MUX1 are passed to the MUX2 module, with A5 as the selection line, to generate the sum. To generate the carry, A4 and A5 are fed to MUX4, whose selection line is one of the outputs of MUX2. With DML, the 4:2 compressor adder produces the following outputs:

$$\begin{array}{@{}rcl@{}} Sum&=&\left[(\overline{(A_{3} \oplus A_{4})} *(A_{1}\oplus A_{2}))+ ((A_{3} \oplus A_{4})*\overline{(A_{1} \oplus A_{2})})\right]*\bar{A_{5}}\\ &&+ \overline{\left[(\overline{(A_{3} \oplus A_{4})}*(A_{1}\oplus A_{2}))+((A_{3} \oplus A_{4})*\overline{(A_{1} \oplus A_{2})})\right]}* A_{5} \end{array} $$
(6)
$$ Carry= (\overline{(A_{1}\oplus A_{2} \oplus A_{3} \oplus A_{4})} * A_{4}) +(A_{1}\oplus A_{2} \oplus A_{3} \oplus A_{4})*A_{5} $$
(7)
$$ C_{out}= \overline{(A_{1} \oplus A_{2})}*A_{1} + (A_{1} \oplus A_{2})* A_{3} $$
(8)

Fig. 4 Block diagram of enhanced compressor adder with DML logic

The sum and carry outputs are passed to the shift-accumulator unit to obtain the final result.
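The compressor equations can be checked exhaustively with a short bit-level model. The sketch below reads Eq. (6) as the XOR of all five inputs written in multiplexer form, which is an assumption about the intended grouping; the names `mux`, `compressor_4_2` and `check_all_inputs` are hypothetical. It verifies the counting identity A1 + A2 + A3 + A4 + A5 = Sum + 2(Carry + Cout) for all 32 input combinations.

```python
from itertools import product


def mux(select, when_low, when_high):
    """2:1 multiplexer on single bits."""
    return when_high if select else when_low


def compressor_4_2(a1, a2, a3, a4, a5):
    """Return (sum, carry, cout) for one bit slice; every argument is 0 or 1."""
    x12 = a1 ^ a2
    x34 = a3 ^ a4
    x1234 = x12 ^ x34
    sum_bit = mux(a5, x1234, 1 - x1234)   # Eq. (6): x1234 XOR a5
    carry = mux(x1234, a4, a5)            # Eq. (7)
    cout = mux(x12, a1, a3)               # Eq. (8)
    return sum_bit, carry, cout


def check_all_inputs():
    """True if the outputs encode the number of ones among the five inputs."""
    for bits in product((0, 1), repeat=5):
        sum_bit, carry, cout = compressor_4_2(*bits)
        if sum(bits) != sum_bit + 2 * (carry + cout):
            return False
    return True


all_ok = check_all_inputs()
```

Cout depends only on A1, A2 and A3, so it does not wait for the incoming A5; this is what keeps the carry chain between neighbouring compressors short.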

4 Decision feedback equalizer with proposed NMLDA

Consider the decision feedback equalizer shown in Fig. 2, with input signal x(k), where \(k \in \mathbb{Z}\), \(N_f\) FF filter coefficients, and fed-back decision r(k) with \(N_b\) FB filter coefficients. The output decision \(S_q(k)\) of the DFE is given by:

$$ S_{q}(k)=Q[S(k)] $$
(9)

where Q[.] represents the quantization operation.

$$ S(k)= \hat{x(k)}-\hat{r(k)} $$
(10)
$$ r(k)=S_{q}(k-1) $$
(11)
$$ \hat{x(k)}=\sum\limits_{i = 0}^{N_{f}-1}d_{i} x(k-i)=d^{T}x(k) $$
(12)
$$ \hat{r(k)}=\sum\limits_{j = 0}^{N_{b}-1}b_{j}r(k-j)=b^{T}r(k) $$
(13)

where

$$ d^{T}=[d_{0},d_{1},d_{2},.....d_{N_{f}-1}] $$
(14)
$$ b^{T}=[b_{0},b_{1},b_{2},.....b_{N_{b}-1}] $$
(15)

\(d^T\) and \(b^T\) are the coefficient vectors of the FF and FB filters, and \(N_f\) and \(N_b\) are the numbers of FF and FB filter coefficients, respectively. The two's-complement form of x(k) and r(k) with word length W can be expressed as

$$ x(k-i)=\sum\limits_{w = 1}^{W-1}x_{i,W-1-w}2^{-w}-x_{i,W-1} $$
(16)
$$ r(k-j)=\sum\limits_{w = 1}^{W-1}r_{j,W-1-w}2^{-w}-r_{j,W-1} $$
(17)

Substituting Eqs. (16) and (17) into Eqs. (12) and (13), respectively, and then substituting the results into Eq. (10), we finally get:

$$\begin{array}{@{}rcl@{}} S(k)&=& \left[-\sum\limits_{i = 0}^{N_{f}-1}d_{i}x_{i,W-1}+\sum\limits_{w = 1}^{W-1}2^{-w}\sum\limits_{i = 0}^{N_{f}-1}d_{i}x_{i,W-1-w}\right]\\ &-&\left[-\sum\limits_{j = 0}^{N_{b}-1}b_{j}r_{j,W-1}+\sum\limits_{w = 1}^{W-1}2^{-w}\sum\limits_{j = 0}^{N_{b}-1}b_{j}r_{j,W-1-w}\right] \end{array} $$
(18)

The DFE architecture with the proposed NMLDA is shown in Fig. 5. The DFE consists of an FF filter block, a control circuit, an FB filter block and a decision device. Both filter blocks are designed using the proposed NMLDA architecture, which consists of a serial-in parallel-out (SIPO) shift register bank, a block of multiplexers, an enhanced compressor adder bank and a shift-accumulator block. In the FF filter block, the outputs of the SIPO register bank act as selection lines for the multiplexers, the multiplexer outputs are passed to the enhanced 4:2 compressor adder, and its result is passed to the shift-accumulator to compute the output \(\hat {x(k)}\). The FB filter block performs the same operation to produce \(\hat {r(k)}\). The difference between the FF and FB filter block outputs is S(k), which is passed to the decision device. The decision device checks whether S(k) lies within the range of the signal and quantizes it according to the modulation scheme. The process continues until the DFE reaches zero error.

Fig. 5 Decision feedback equalizer with NMLDA
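The recurrence in Eqs. (9)-(13) can be sketched behaviourally as below. The sign slicer `quantize_bpsk`, the function name `dfe`, the two-tap channel `[1.0, 0.6]` and the coefficient values are all hypothetical choices for illustration; the sketch uses ordinary multiplications and does not model the NMLDA hardware or any coefficient adaptation.

```python
def quantize_bpsk(s):
    """Decision device Q[.] for BPSK, assumed here to be a simple sign slicer."""
    return 1.0 if s >= 0 else -1.0


def dfe(received, ff_coeffs, fb_coeffs, quantize=quantize_bpsk):
    """Run a decision feedback equalizer over a sequence of received samples."""
    x_hist = [0.0] * len(ff_coeffs)   # x(k), x(k-1), ...
    r_hist = [0.0] * len(fb_coeffs)   # r(k), r(k-1), ...
    prev_decision = 0.0               # no decision exists before the first sample
    decisions = []
    for x in received:
        x_hist = [x] + x_hist[:-1]
        r_hist = [prev_decision] + r_hist[:-1]                  # Eq. (11): r(k) = S_q(k-1)
        x_hat = sum(d * v for d, v in zip(ff_coeffs, x_hist))   # Eq. (12)
        r_hat = sum(b * v for b, v in zip(fb_coeffs, r_hist))   # Eq. (13)
        s = x_hat - r_hat                                       # Eq. (10)
        prev_decision = quantize(s)                             # Eq. (9)
        decisions.append(prev_decision)
    return decisions


# Noise-free toy example: each symbol leaks into the next one (post-cursor ISI).
symbols = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
channel = [1.0, 0.6]
received = [sum(h * symbols[k - m] for m, h in enumerate(channel) if k - m >= 0)
            for k in range(len(symbols))]
recovered = dfe(received, ff_coeffs=[1.0], fb_coeffs=[0.6])
```

With the FB coefficient matched to the post-cursor tap, each past decision cancels the interference it left behind, so the slicer input returns to the transmitted symbol.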

5 Results and discussion

We validated the proposed design by simulation. We simulated 4-, 8-, 16- and 32-tap NMLDA filter architectures and implemented them on an FPGA. The number of logical elements, static power consumption and maximum sampling frequency were obtained on an Altera Cyclone III device. The results of the proposed architecture are compared with those of the MAC-based filter, the OBC-DA-based filter, the MDA filter and the memory-less DA-based filter, and are tabulated.

Table 1 shows that the proposed architecture uses far fewer logical elements than the MAC architecture and the memory-based DA architectures. For the 4-tap FF filter, the OBC technique and the MAC design use the same number of logical elements, but their counts diverge widely as the filter order increases. The proposed design also uses fewer logical elements than the memory-less architecture [6], as shown in Fig. 6. Table 2 and Fig. 7 show that the maximum frequency of the proposed NMLDA is higher than that of all the other architectures. Table 3 shows that the static power consumption of NMLDA is almost the same as that of the memory-based DA architectures.

Table 1 Performance comparison of no. of logical elements for existing and proposed NMLDA architectures implemented on Cyclone III EP3C55F484C6 FPGA device
Table 2 Performance comparison of \(f_{\max}\) (MHz) for existing and proposed NMLDA architectures implemented on Cyclone III EP3C55F484C6 FPGA device
Table 3 Performance comparison of static power consumption (mW) for existing and proposed NMLDA architectures implemented on Cyclone III EP3C55F484C6 FPGA device

Fig. 6 Logical elements for proposed and existing DA architectures

Fig. 7 Frequency response for proposed and existing DA architectures

The critical path delay is the longest delay between any two registers in a system. The critical path delay of the MAC-based FB filter is \(C_{MAC} = T_{mul} + N T_{add}\), where \(T_{mul}\) is the computation time of the multiplier, \(T_{add}\) is the computation time of the adder, and N is a positive integer. The critical path delays of the MAC and MDA based architectures are almost the same for lower-order filters. The critical path delay of the MDA scheme is \(C_{MDA} = T_{memory} + N T_{add} + T_{mux}\). That of the OBC-DA-based design is \(C_{OBC} = T_{OBC} + T_{mux} + (N + 1) T_{add}\), which is higher than that of all the other architectures because of the additional EXOR gates. The critical path delay of the memory-less DA (MLDA) filter design is \(C_{MLDA} = T_{mux} + T_{add}\), and that of the proposed NMLDA design is \(C_{NMLDA} = T_{mux} + T_{compressoradder}\). Because the proposed design has a shorter critical path, it achieves higher throughput.
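To compare these expressions for a given set of component delays, they can be collected in a small helper. The function name `critical_paths` and the delay values in the example are placeholders chosen only to exercise the formulas; they are not measurements from the synthesized designs.

```python
def critical_paths(t_mul, t_add, t_mux, t_memory, t_obc, t_compressor, n):
    """Evaluate the critical path expressions given in the text for each architecture."""
    return {
        "MAC": t_mul + n * t_add,
        "MDA": t_memory + n * t_add + t_mux,
        "OBC-DA": t_obc + t_mux + (n + 1) * t_add,
        "MLDA": t_mux + t_add,
        "NMLDA": t_mux + t_compressor,
    }


# Placeholder delays in arbitrary time units (hypothetical, not measured values).
delays = critical_paths(t_mul=4.0, t_add=1.0, t_mux=0.5, t_memory=3.0,
                        t_obc=3.5, t_compressor=0.75, n=4)
shortest = min(delays, key=delays.get)
```

Only the MAC, MDA and OBC expressions contain a term that grows with N. The NMLDA path is shorter than the MLDA path whenever the compressor adder delay is smaller than the adder delay it replaces.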

The proposed DFE architecture with NMLDA has been tested in a channel equalizer system with binary phase shift keying (BPSK) and frequency shift keying (FSK) modulation. Figures 8 and 9 show the output responses of the proposed DFE-based channel equalizer for BPSK and FSK signals, respectively. Consider a set of channel impulse response signals that are modulated onto a carrier using BPSK and FSK and then transmitted. At the receiving side of the channel equalizer, the received signal contains noise and ISI. To remove them, the signal is passed to the DFE. In the DFE, the FF filter block removes the pre-cursor (anti-causal) part of the ISI, and the remaining noise and errors are removed in the FB filter block. The DFE operates until its decision device propagates a zero value. Finally, the original signal is recovered without errors. The original noise signal and the filtered ISI-free signal for BPSK and FSK are shown in Figs. 10 and 11, respectively.

Fig. 8 Waveforms of proposed DFE based channel equalizer system using BPSK

Fig. 9 Waveforms of proposed DFE based channel equalizer system using FSK

Fig. 10 Original noise signal and filtered ISI free signal of BPSK

Fig. 11 Original noise signal and filtered ISI free signal of FSK

6 Conclusion

In this paper, we proposed an NMLDA-based DFE. The NMLDA architecture uses fewer logical elements than the MAC architecture and the memory-based DA architectures. The novel enhanced 4:2 compressor adder reduces the number of adders in the proposed architecture, which shortens the critical path and increases speed. With the NMLDA-based DFE, ISI in the transmitted signal can be minimized and the original transmitted signal can be recovered without errors. The proposed architecture is well suited to high-data-rate modems.