High performance area efficient DA based FIR filter for concurrent decision feedback equalizer

Vijetha, K.; Naik, B. Rajendra

doi:10.1007/s10772-020-09695-x


Published: 03 March 2020

Volume 23, pages 297–303, (2020)

International Journal of Speech Technology


Abstract

In this paper, we propose a novel distributed arithmetic (DA) based block FIR filter for the design of decision feedback equalizers (DFEs). A block FIR filter is designed using a DA architecture and is implemented in a DFE architecture. Block processing increases the throughput rate of the design. The proposed DA architecture is implemented for application specific integrated circuits (ASICs) using the Synopsys Design Compiler tool with SAED 90 nm technology. The DFE application is implemented in MATLAB Simulink and the Xilinx System Generator tool. The results show 71% less area-delay product (ADP) and 65% less energy-delay product (EDP) than the existing architecture, together with high performance. With the proposed DA-based DFE architecture, ISI can be removed, and the design is well suited for digital communication systems.


1 Introduction

One of the major obstacles in digital communication systems (Barry et al. 2012; Proakis and Manolakis 1996) is channel distortion caused by inter-symbol interference (ISI). As data rates in transmission systems increase, high-speed DSP implementations are required; however, the speed of DSP systems is limited to about 1 GHz, so new architectures are needed to support multi-gigabit data rates. Pipelining and parallel processing are two techniques that have been successfully employed to increase the computation speed of digital communication systems (Parhi 2004). Distortions such as ISI can be reduced by placing equalizers (Lin et al. 2012, 2006; Oh and Parhi 2006) at the receiving side of the transmission system. Equalizers fall into two classes, linear and non-linear; Figure 1 shows this classification. Non-linear equalizers perform better than linear equalizers. The decision feedback equalizer (DFE) and the maximum likelihood sequence detector (MLSD) are two prominent non-linear techniques used to combat ISI. The DFE improves the signal-to-noise ratio (SNR) by removing the post-cursor ISI at its output, and it also avoids the noise amplification at spectral nulls that affects linear equalizers. Despite these advantages, the DFE is difficult to operate at high speed, because its speed is limited by the iteration bound of the feedback filter (FBF). Several techniques have been proposed to increase the speed and reduce the hardware complexity. In Lin et al. (2006) and Lin et al. (2007), the authors designed the DFE architecture by removing the feedback loop and introducing multiplexers and a comparator to increase the speed and decrease the chip area. Parhi (2005) later extended this work with parallelization and unfolding techniques to reach gigabit data rates. Lin et al. (2006) and Parhi and Messerschmitt (1989) described look-ahead pipelined multiplexer-loop-based DFEs. Khan and Ahamed (2016) implemented the DFE using a concurrent look-ahead scheme to reduce the hardware cost. In such schemes, however, the hardware complexity of the DFE increases linearly and the performance of the DFE architecture degrades. To overcome this problem, distributed arithmetic architectures (Venkatachalam and Ko 2018; Khan and Ahamed 2016; Prakash et al. 2016) have been introduced in the DFE to mitigate its speed limitation. NagaJyothi and Sridevi (2019) proposed a memoryless distributed arithmetic based FIR filter for the decision feedback equalizer.

There is growing interest in distributed arithmetic (DA) architectures in DSP (Yoo and Anderson 2005; NagaJyothi and SriDevi 2017; Grande and Sridevi 2017; Jyothi and Sriadibhatla 2019) because they are multiplierless: DA uses only look-up tables (LUTs) and a shift-accumulate unit to form the partial products. In DA, the filter coefficients and input samples are represented in two's complement or offset binary code. DA was first introduced by Croisier, and its mathematical formulation was further developed by Peled and Liu (1974) and White (1989). Chen proposed a RAM-based approach for implementing FIR filters that results in low memory requirements. Memory partitioning and multiple memory banks have been suggested to reduce the memory requirement of DA-based FIR filters. Choi proposed a DA-based FIR filter structure using offset binary coding (OBC) that reduces the memory size by a factor of 2 (Jyothi and Sridevi 2018). Yoo proposed a DA-based FIR filter structure in which the memory size is reduced at the cost of additional adders. A LUT decomposition scheme has been suggested to reduce the LUT complexity of DA-based FIR filter structures at the cost of a few adders. Several designs have been proposed over the past decades to improve the performance and reduce the area of DA architectures as the filter order increases, which makes DA an efficient architecture for the FIR filters used in DFEs.
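The multiplierless inner product described above can be illustrated with a minimal sketch of conventional bit-serial DA. It assumes B-bit two's-complement integer inputs and integer coefficients, and stores all $2^K$ partial sums of the K coefficients in a LUT; the function names are hypothetical, and this is the textbook DA technique rather than the proposed architecture.

```python
"""Minimal bit-serial distributed arithmetic (DA) inner product (illustrative sketch)."""


def build_da_lut(coeffs):
    """Precompute all 2**K partial sums: entry a sums the coeffs whose bit is set in a."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1) for addr in range(1 << k)]


def da_inner_product(coeffs, xs, bits):
    """Compute sum(c * x) with one LUT read and one shift-accumulate per bit slice.

    xs are assumed to be `bits`-bit two's-complement integers.
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # LUT address formed from bit b of every input sample (one bit slice).
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        partial = lut[addr] << b
        # The MSB of a two's-complement number has negative weight, so subtract it.
        acc = acc - partial if b == bits - 1 else acc + partial
    return acc


print(da_inner_product([3, -2, 5, 1], [7, -8, 2, -1], 4))  # 46
```

Only additions, shifts and table look-ups appear in the loop, which is the property that makes DA attractive for the FIR filters of a DFE.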

In this paper, we propose a memoryless DA-based FIR filter and implement it in a concurrent decision feedback equalizer. With the concurrent DFE architecture, area can be traded for higher throughput or a lower-power implementation.

The rest of the paper is organized as follows. Section 2 describes the mathematical formulation of the block DA-based FIR filter. Section 3 explains the proposed variable-coefficient DA-based block FIR filter. Synthesis results are discussed in Sect. 4. The proposed DFE architecture is explained in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Formulation of DA based block FIR filter

Consider a block FIR filter of length N that receives a block of L input samples and produces a block of L outputs in every cycle. The kth block of filter outputs $y_k$ is computed as:

$$y_k=X_k \cdot d$$

(1)

The filter coefficient d is written as:

$$d=[d(0), d(1), \ldots, d(N-1)]^T$$

(2)

The input matrix $X_k$ is formed from the current input block of length L and the N − 1 past samples, and is expressed as:

$$\begin{aligned} X_k= \begin{bmatrix} x(kL) & x(kL-1) & \cdots & x(kL-N+1) \\ x(kL-1) & x(kL-2) & \cdots & x(kL-N)\\ \vdots & \vdots & \ddots & \vdots\\ x(kL-L+1) & x(kL-L) & \cdots & x(kL-L-N+2) \end{bmatrix} \end{aligned}$$

The $L \times N$ input matrix $X_k$ is split into N/2 submatrices $S_k^j$ of size $L \times 2$, and the coefficient vector d is split into N/2 coefficient vectors $u_j=[d(2j), d(2j+1)]^T$ of size 2, for $0\le j\le N/2-1$. The computation in (1) can then be expressed as the sum of M = N/2 matrix-vector products:

$$y_k=\sum _{j=0}^{(N/2)-1} S_k^j u_j$$

(3)

where $S_k^j$ is given by:

$$\begin{aligned} S_k^j= \begin{bmatrix} x(kL-2j) & x(kL-2j-1) \\ x(kL-2j-1) & x(kL-2j-2)\\ \vdots & \vdots\\ x(kL-2j-L+1) & x(kL-2j-L) \end{bmatrix} \end{aligned}$$

Each filter output $y(kL-i)$, for $0\le i\le L-1$, is given as the sum of N/2 inner products:

$$y(kL-i)=\sum _{j=0}^{{N/2}-1}S_k^{ij} u_j$$

(4)

where $S_k^{ij}$, the (i+1)th row of $S_k^j$, is given by

$$S_k^{ij}=[x(kL-2j-i),\ x(kL-2j-i-1)]$$

(5)

For each bit position, the corresponding bits of the two samples in (5) serve as the select lines of the multiplexers.
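As a sanity check on equations (1) to (5), the following sketch computes one block of outputs both from the rows of $X_k$ and from the sum of 2-point inner products $S_k^{ij} u_j$, and compares them with direct convolution. It assumes zero initial conditions, an even filter length and integer samples; the helper names are hypothetical.

```python
def sample(x, n):
    """Return x(n), assuming zero initial conditions (x(n) = 0 outside the record)."""
    return x[n] if 0 <= n < len(x) else 0


def block_outputs_matrix(x, d, k, L):
    """y_k = X_k d, eq. (1): row i of X_k is [x(kL-i), ..., x(kL-i-N+1)]."""
    N = len(d)
    return [sum(sample(x, k * L - i - n) * d[n] for n in range(N)) for i in range(L)]


def block_outputs_split(x, d, k, L):
    """Eqs. (3)-(5): each y(kL-i) as the sum over j of S_k^{ij} u_j."""
    N = len(d)
    assert N % 2 == 0, "the 2-point decomposition assumes an even filter length"
    outputs = []
    for i in range(L):
        acc = 0
        for j in range(N // 2):
            s_ij = (sample(x, k * L - 2 * j - i), sample(x, k * L - 2 * j - i - 1))  # eq. (5)
            u_j = (d[2 * j], d[2 * j + 1])
            acc += s_ij[0] * u_j[0] + s_ij[1] * u_j[1]
        outputs.append(acc)
    return outputs


def fir_reference(x, d, n):
    """Direct-form FIR output y(n) = sum_m d(m) x(n - m)."""
    return sum(d[m] * sample(x, n - m) for m in range(len(d)))
```

For any block index k, both functions return the list $[y(kL), y(kL-1), \ldots, y(kL-L+1)]$ given by `fir_reference`.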

3 Proposed DA based FIR filter for DFE

To explain the idea of register sharing, consider the input sample vectors of an FIR filter of length N = 4 used to compute one block of four filter outputs {y(n), y(n − 1), y(n − 2), y(n − 3)}. The input vectors required to compute y(n), y(n − 1), y(n − 2) and y(n − 3) are {x(n), x(n − 1), x(n − 2), x(n − 3)}, {x(n − 1), x(n − 2), x(n − 3), x(n − 4)}, {x(n − 2), x(n − 3), x(n − 4), x(n − 5)} and {x(n − 3), x(n − 4), x(n − 5), x(n − 6)}, respectively. The input vectors of successive filter outputs overlap by three samples. Because of this overlap, only seven of the sixteen samples are distinct; the other nine are repeats of these seven. The repeated samples need not be stored separately, since they can be obtained by sharing register contents. In block processing, one block {x(n), x(n − 1), x(n − 2), x(n − 3)} is received in a given clock cycle, so of the seven distinct samples only the three past samples x(n − 4), x(n − 5) and x(n − 6) need to be held in registers to generate all sixteen samples of the four input vectors. The block FIR filter of length four therefore needs three registers, the same number as a conventional FIR filter of the same length. Hence the register complexity of the block structure is independent of the block size, which is an important design feature of the block structure. In a fixed-coefficient FIR filter, the arithmetic resources must increase in proportion to the block size. Because registers are shared, however, the area of the block-based FIR architecture grows slightly less than proportionately with the block size, so the area-delay efficiency of the architecture is expected to improve for larger block sizes. Motivated by this, we propose a block-based DA structure for fixed-coefficient FIR filters. A LUT decomposition factor of Q = 4, corresponding to 16-word ROMs, is considered to derive the variable-coefficient DA-based FIR architecture.
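The counting argument above can be reproduced with a short sketch. The helper below is hypothetical; it identifies each sample by its offset from the newest input x(n) and assumes that the whole current block arrives in one clock cycle.

```python
def register_sharing_summary(N, L):
    """Count sample references for one block of L outputs of a length-N FIR filter.

    Samples are identified by their offset relative to the newest input x(n),
    e.g. -4 stands for x(n - 4). Returns (total references, distinct samples,
    offsets that must be held in registers from earlier blocks).
    """
    # Input vector for output y(n - i) is [x(n - i), x(n - i - 1), ..., x(n - i - N + 1)].
    vectors = [[-i - m for m in range(N)] for i in range(L)]
    references = [s for vec in vectors for s in vec]
    distinct = set(references)
    # The current block x(n), x(n - 1), ..., x(n - L + 1) arrives in this clock cycle.
    current_block = set(range(-(L - 1), 1))
    stored = sorted(distinct - current_block)
    return len(references), len(distinct), stored


total, distinct, stored = register_sharing_summary(N=4, L=4)
print(total, distinct, stored)  # 16 7 [-6, -5, -4]
```

For other block sizes the number of stored samples stays at N − 1 (for example 31 registers for N = 32 whether L is 2, 4, 8 or 16), which is the block-size independence noted in the text.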

The architecture of the bit-parallel variable-coefficient DA-based FIR filter is shown in Fig. 2. It consists of one bit-slice generator, one partial product generator (PPG) unit, one register array, one partial product selector (PPS) unit, one adder-tree unit (ATU) and one shift-add tree (SAT). The bit-slice generator consists of $(N-1)$ B-bit registers. The PPG unit consists of $M(2^Q-Q-1)$ adders, and the register array consists of $M(2^Q-1)$ $B_0$-bit registers, where $M = N/Q$. The PPS unit consists of BM multiplexers of size $2^Q:1$. Each $2^Q:1$ multiplexer is implemented using $(2^Q-1)$ 2:1 multiplexers, so the PPS unit involves $BM(2^Q-1)$ 2:1 multiplexers. The ATU consists of B adder trees (ATs), each adding $N/Q$ words, and the SAT comprises $(B-1)$ adders and the same number of shifters. Therefore, the bit-parallel DA-based variable-coefficient FIR structure involves $BM(2^Q-1)$ 2:1 multiplexers of bit-width B each.
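The component counts listed above can be collected into a small calculator. This hypothetical helper only evaluates the formulas stated in the text, assuming $M = N/Q$; it does not count the adders inside each adder tree, which the text does not give.

```python
def da_bit_parallel_complexity(N, B, Q):
    """Component counts of the bit-parallel variable-coefficient DA FIR (formulas from the text)."""
    assert N % Q == 0, "filter length must be a multiple of the decomposition factor Q"
    M = N // Q  # number of Q-point partitions (assumed M = N / Q)
    return {
        "bit_slice_registers": N - 1,             # B-bit registers
        "ppg_adders": M * (2**Q - Q - 1),
        "register_array_words": M * (2**Q - 1),   # B0-bit registers
        "pps_muxes": B * M,                       # each a 2**Q:1 multiplexer
        "pps_2to1_muxes": B * M * (2**Q - 1),
        "adder_trees": B,                         # each adds N/Q = M words
        "sat_adders": B - 1,                      # plus the same number of shifters
    }


counts = da_bit_parallel_complexity(N=32, B=8, Q=4)
```

For N = 32, B = 8 and Q = 4 it gives M = 8, 88 PPG adders, 120 register-array words and 960 2:1 multiplexers, which shows how quickly the multiplexer count grows with $2^Q$.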

The LUT uses a multiplexer to select the stored value addressed by the bits applied to the multiplexer select lines. As the LUT size increases, the multiplexer complexity grows, which increases the area and also lengthens the critical path of the DA structure. To solve these problems, a variable-coefficient block DA-based FIR filter is needed.
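The two costs discussed above, building the $2^Q$-word LUT and selecting from it, can be sketched as follows. This is an illustrative model with hypothetical names: it builds each multi-term entry from a smaller entry plus one coefficient (so only $2^Q-Q-1$ additions are needed), and it selects a word through a binary tree of 2:1 multiplexers, which uses $2^Q-1$ multiplexers.

```python
def build_lut_counting_adders(coeffs):
    """Build the 2**Q-word LUT of partial sums, counting the adders required.

    Entry 0 is zero and the Q single-coefficient entries are the coefficients
    themselves, so only the remaining 2**Q - Q - 1 entries need an adder.
    """
    q = len(coeffs)
    lut = [0] * (1 << q)
    adders = 0
    for addr in range(1, 1 << q):
        top = addr.bit_length() - 1          # highest set address bit
        rest = addr & ~(1 << top)            # same address without that bit
        if rest == 0:
            lut[addr] = coeffs[top]          # a single coefficient: no adder
        else:
            lut[addr] = lut[rest] + coeffs[top]  # reuse a smaller entry: one adder
            adders += 1
    return lut, adders


def mux_tree_select(words, sel_bits):
    """Select words[address] through a binary tree of 2:1 multiplexers.

    sel_bits[0] is the least significant address bit. Returns the selected
    word and the number of 2:1 multiplexers in the tree (2**Q - 1).
    """
    level = list(words)
    muxes = 0
    for bit in sel_bits:
        # Each 2:1 mux picks between two neighbours that differ only in this bit.
        level = [level[2 * i + 1] if bit else level[2 * i] for i in range(len(level) // 2)]
        muxes += len(level)
    return level[0], muxes
```

With Q = 4 the tree already has 15 multiplexers per bit slice and four levels of delay, whereas Q = 2 needs only one adder and a 4:1 multiplexer, which is the trade-off that motivates the 2-point decomposition of Sect. 2.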

The proposed variable-coefficient block DA-based FIR filter is shown in Fig. 3. It consists of one delay unit, one PPG block, one register-LUT unit and L FIR blocks. The PPG block contains N/2 adders; it receives a set of coefficient vectors and computes N/2 sets of partial products to update the register-LUT unit for a particular filter. The register-LUT unit contains 3N/2 registers. The delay unit receives a block of inputs $x_k$ and generates L input vectors $x_i^k$ of length N each. Figure 4 shows the internal architecture of the delay unit. In every cycle, the L FIR blocks receive the L input vectors $x_i^k$ from the delay unit and N/2 sets of three partial product values $rr_j$ from the register-LUT unit, and generate L filter outputs $y_k$ in parallel. Figure 5 shows the internal architecture of the variable block FIR filter. It consists of N/2 multiplexer blocks, B adder trees (ATs) that accumulate the BN/2 multiplexer outputs, and one SAT block. Each multiplexer block contains B 4:1 multiplexers whose select lines are driven by the bit slices of the 2-point input vector $S_k^{ij}$; it receives one set of partial product values $rr_j$ from the register-LUT unit and retrieves B partial filter outputs in parallel, one for each of the B bit slices of the 2-point input vector. Therefore, the N/2 multiplexer blocks receive N/2 two-point input vectors from $x_i^k$ and N/2 sets of partial inner-product values $rr_j$ from the register-LUT unit, and retrieve BN/2 partial filter outputs in parallel. These BN/2 partial filter outputs are added by the B ATs to produce B partial filter outputs, which are shift-added in the SAT to obtain the filter output. The proposed design thus receives a block of L inputs in every clock cycle and generates a block of L filter outputs.
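The data flow described above can be checked with a bit-accurate behavioral sketch (not a cycle-accurate or RTL model). It assumes that each $rr_j$ holds $d(2j)$, $d(2j+1)$ and $d(2j)+d(2j+1)$, with zero as the fourth multiplexer input, which is consistent with the N/2 PPG adders and 3N/2 registers stated in the text; inputs are B-bit two's-complement integers, N is even, and all function names are hypothetical.

```python
def sample(x, n):
    """x(n) with zero initial conditions."""
    return x[n] if 0 <= n < len(x) else 0


def register_lut(d):
    """PPG + register-LUT: rr_j = (d(2j), d(2j+1), d(2j) + d(2j+1)) for each pair j.

    N/2 additions produce the sums and 3N/2 words are stored (assumed contents).
    """
    return [(d[2 * j], d[2 * j + 1], d[2 * j] + d[2 * j + 1]) for j in range(len(d) // 2)]


def mux4(rr, b0, b1):
    """4:1 multiplexer selecting 0, d(2j), d(2j+1) or their sum from two select bits."""
    return (0, rr[0], rr[1], rr[2])[b0 | (b1 << 1)]


def fir_block_output(x_vec, rr_sets, B):
    """One FIR block: x_vec = [x(kL-i), x(kL-i-1), ..., x(kL-i-N+1)], B-bit two's complement."""
    acc = 0
    for b in range(B):
        # Adder tree for bit slice b: sum the N/2 multiplexer outputs.
        at_sum = 0
        for j, rr in enumerate(rr_sets):
            b0 = (x_vec[2 * j] >> b) & 1        # bit b of x(kL-2j-i)
            b1 = (x_vec[2 * j + 1] >> b) & 1    # bit b of x(kL-2j-i-1)
            at_sum += mux4(rr, b0, b1)
        # Shift-add: weight 2**b; the sign bit (b = B-1) has negative weight.
        acc += -(at_sum << b) if b == B - 1 else (at_sum << b)
    return acc


def block_da_fir(x, d, k, L, B):
    """Outputs [y(kL), y(kL-1), ..., y(kL-L+1)] from L parallel FIR blocks."""
    rr_sets = register_lut(d)
    N = len(d)
    outputs = []
    for i in range(L):
        x_vec = [sample(x, k * L - i - m) for m in range(N)]  # delay-unit output for block i
        outputs.append(fir_block_output(x_vec, rr_sets, B))
    return outputs
```

Because the coefficients only enter through `register_lut`, changing the filter amounts to recomputing N/2 sums and rewriting the register-LUT, which is what makes the structure suitable for variable (adaptive) coefficients.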

4 Result analysis

The proposed and existing DA-based FIR filter designs were implemented in Verilog HDL, and Synopsys Design Compiler was used to obtain the ASIC implementation results. The designs were synthesized using SAED 90 nm technology. The synthesis results show that the proposed variable-coefficient FIR structure involves 71% less ADP and 65% less EDP than existing similar structures. A theoretical comparison shows that, for a block size of 8 and a filter length of 32, the proposed fixed-coefficient structure involves eight times more ROM words, eight times more adders and two fewer registers, and offers an eight times higher throughput rate with the same cycle period as the existing design. For the same block size and filter length, the proposed variable-coefficient structure involves 7.2 times more adders, the same number of registers and eight times more MUXes, and offers an eight times higher throughput rate than the existing design, as shown in Tables 1 and 2. Figures 6 and 7 show the area-delay product and power-delay product of the proposed and existing architectures.

Table 1 Hardware and time complexities of the proposed variable coefficient DA-based FIR filter for B=8


Table 2 Synopsys synthesis results for the proposed variable coefficient DA-based FIR filter using 90 nm technology


5 Implementation of proposed DA based block FIR filter for DFE

The decision feedback equalizer (DFE) is well suited and power efficient for channels with a few dominant post-cursor ISI terms; however, its power consumption can become prohibitive for channels with many post-cursor ISI terms. The block diagram of the proposed DA-based DFE is shown in Fig. 8. It consists of a feedforward (FF) filter, a feedback (FB) filter and a decision device. Both the FF and FB filters are designed using the proposed DA-based block FIR filter; that is, the proposed variable-coefficient DA-based FIR filter is inserted in the feedforward and feedback filters of the concurrent DFE. The proposed design removes the ISI present in the received signal: the feedforward filter suppresses the pre-cursor ISI, and the feedback filter removes the remaining post-cursor ISI.

The proposed design has been implemented in MATLAB Simulink and Xilinx System Generator. For the evaluation, a set of channel impulse responses is considered, and the message signal is BPSK modulated and passed through these channels. To remove the noise and ISI from the received signal, it is passed to the adaptive DFE. The pre-cursor (anti-causal) part of the ISI is removed by the FF filter block, and the remaining error terms are removed by the FB filter block. The adaptive DFE continues to adapt until the error at its decision device converges to zero. The resulting bit error rate (BER) is shown in Fig. 9.
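The paper does not state the adaptation algorithm or the channels used in the Simulink model, so the following is only a floating-point behavioral sketch under stated assumptions: a hypothetical three-tap channel (1.0, 0.7, 0.5) with post-cursor ISI only, additive Gaussian noise, a feedforward filter reduced to a single unit tap (because the assumed channel has no pre-cursor ISI), and an LMS-adapted feedback filter that is trained on known symbols and then switches to decision-directed operation. It models neither the DA hardware nor the concurrent look-ahead structure, and `simulate_dfe` and its parameters are hypothetical.

```python
import random


def simulate_dfe(num_symbols=20000, channel=(1.0, 0.7, 0.5), noise_std=0.1,
                 mu=0.01, train=2000, seed=1):
    """BPSK through an assumed ISI channel, equalized by an LMS-adapted DFE (sketch).

    Returns (BER of slicing the raw received signal, BER after the DFE, final FB taps).
    """
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(num_symbols)]
    K = len(channel) - 1                     # number of post-cursor taps
    # Received signal: symbols convolved with the channel plus Gaussian noise.
    r = [sum(channel[m] * s[n - m] for m in range(len(channel)) if n - m >= 0)
         + rng.gauss(0.0, noise_std) for n in range(num_symbols)]
    b = [0.0] * K                            # adaptive feedback-filter taps, start at zero
    past = [0.0] * K                         # past decisions (or training symbols) fed back
    errors_raw = errors_dfe = counted = 0
    for n in range(num_symbols):
        # FF filter assumed to be a single unit tap; FB filter cancels post-cursor ISI.
        y = r[n] - sum(bk * pk for bk, pk in zip(b, past))
        decision = 1 if y >= 0 else -1
        ref = s[n] if n < train else decision    # training, then decision-directed
        e = ref - y
        b = [bk - mu * e * pk for bk, pk in zip(b, past)]   # LMS update of FB taps
        past = [ref] + past[:-1]
        if n >= train:                            # measure BER after training only
            counted += 1
            errors_dfe += decision != s[n]
            errors_raw += (1 if r[n] >= 0 else -1) != s[n]
    return errors_raw / counted, errors_dfe / counted, b


raw_ber, dfe_ber, taps = simulate_dfe()
```

With these settings the feedback taps settle near the post-cursor channel taps (0.7, 0.5), and the error rate after equalization is far below that of slicing the received signal directly, where roughly one symbol in four is corrupted by the assumed ISI.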

6 Conclusion

In this paper, we carried out a complexity analysis of the variable-coefficient DA-based FIR filter to study the effect of the LUT decomposition factor on DA design. Based on this complexity analysis, we proposed a fully parallel block-based DA structure for variable-coefficient FIR filters. The proposed architecture processes one block of L input samples and produces one block of L outputs in every clock cycle.

References

Barry, J. R., Lee, E. A., & Messerschmitt, D. G. (2012). Digital communication. New York: Springer.
Grande, N. J., & Sridevi, S. (2017). ASIC implementation of shared LUT based distributed arithmetic in FIR filter. In Proceedings of the 2017 International Conference on Microelectronic Devices, Circuits and Systems (ICMDCS) (pp. 1–4). IEEE.
Jyothi, G. N., & Sriadibhatla, S. (2019). ASIC implementation of low power, area efficient adaptive FIR filter using pipelined DA. In Proceedings of the Microelectronics, Electromagnetics and Telecommunications (pp. 385–394). Springer.
Jyothi, G. N., & Sridevi, S. (2018). Low power, low area adaptive finite impulse response filter based on memory less distributed arithmetic. Journal of Computational and Theoretical Nanoscience, 15(6–7), 2003–2008.
Khan, M. T., & Ahamed, S. R. (2016). Low cost implementation of concurrent decision feedback equalizer using distributed arithmetic. In Proceedings of the 2016 1st India International Conference on Information Processing (IICIP) (pp. 1–5). IEEE.
Lin, C. H., Wu, A. Y., & Li, F. M. (2006). High-performance VLSI architecture of decision feedback equalizer for gigabit systems. IEEE Transactions on Circuits and Systems II: Express Briefs, 53(9), 911–915.
Lin, C. S., Lin, Y. C., Jou, S. J., & Shiou, M. T. (2007). Concurrent digital adaptive decision feedback equalizer for 10GBASE-LX4 Ethernet system. In Proceedings of the IEEE Custom Integrated Circuits Conference (CICC'07) (pp. 289–292). IEEE.
Lin, Y. C., Jou, S. J., & Shiue, M. T. (2012). High throughput concurrent lookahead adaptive decision feedback equaliser. IET Circuits, Devices & Systems, 6(1), 52–62.
NagaJyothi, G., & SriDevi, S. (2017). Distributed arithmetic architectures for FIR filters: A comparative review. In Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) (pp. 2684–2690). IEEE.
NagaJyothi, G., & Sridevi, S. (2019). High speed and low area decision feed-back equalizer with novel memory less distributed arithmetic filter. Multimedia Tools and Applications, 78, 1–15.
Oh, D., & Parhi, K. K. (2006). Low complexity design of high speed parallel decision feedback equalizers. In Proceedings of the International Conference on Application-specific Systems, Architectures and Processors (ASAP'06) (pp. 118–124). IEEE.
Parhi, K. K. (2004). Pipelining of parallel multiplexer loops and decision feedback equalizers. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'04) (Vol. 5, pp. V–21). IEEE.
Parhi, K. K. (2005). Design of multigigabit multiplexer-loop-based decision feedback equalizers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 4, 489–493.
Parhi, K. K., & Messerschmitt, D. G. (1989). Pipeline interleaving and parallelism in recursive digital filters. I. Pipelining using scattered look-ahead and decomposition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(7), 1099–1117.
Peled, A., & Liu, B. (1974). A new hardware realization of digital filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, 22(6), 456–462.
Prakash, M. S., Shaik, R. A., & Koorapati, S. (2016). An efficient distributed arithmetic-based realization of the decision feedback equalizer. Circuits, Systems, and Signal Processing, 35(2), 603–618.
Proakis, J. G., & Manolakis, D. G. (1996). Digital signal processing (Vol. 3). New Jersey: Prentice Hall.
Venkatachalam, S., & Ko, S. B. (2018). Approximate sum-of-products designs based on distributed arithmetic. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 99, 1–5.
White, S. A. (1989). Applications of distributed arithmetic to digital signal processing: A tutorial review. IEEE ASSP Magazine, 6(3), 4–19.
Yoo, H., & Anderson, D. V. (2005). Hardware-efficient distributed arithmetic architecture for high-order digital filters. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings.(ICASSP’05) (Vol. 5, pp. 5–125).


Author information

Authors and Affiliations

Department of Electronics Engineering, Osmania University, Hyderabad, India
K. Vijetha & B. Rajendra Naik


Corresponding author

Correspondence to K. Vijetha.


About this article

Cite this article

Vijetha, K., Naik, B.R. High performance area efficient DA based FIR filter for concurrent decision feedback equalizer. Int J Speech Technol 23, 297–303 (2020). https://doi.org/10.1007/s10772-020-09695-x


Received: 02 December 2019
Accepted: 24 February 2020
Published: 03 March 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10772-020-09695-x
