A New Approach for Parallel Discrete Fourier Transform in Multi Mesh Network

Datta, Amit; De, Mallika

doi:10.1007/978-81-322-1856-2_10

Amit Datta⁴ &
Mallika De⁵

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 266))

1046 Accesses
1 Citations

Abstract

A simple parallel algorithm is proposed in this paper to compute Discrete Fourier Transform (DFT) coefficients of N points on Multi-Mesh (MM) architecture having N ² processors where each of the processor generates a single element of the Fourier matrix based on the node position in the network i.e. Fourier elements are generated in place. The algorithm given here transforms a vector of length N in constant time multiplication and O (√N) addition and data communication time.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Parallel Implementation for Some Applications of Integral Equations Method

Article 25 May 2018

Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

Article 28 November 2023

Expandable Parallel Finite Element Methods for Linear Elliptic Problems

Article 01 March 2020

Keywords

1 Introduction

Discrete Fourier transforms (DFTs) [1, 2] are used extensively in a range of computing applications like audio processing, image processing, medical imaging etc.

A number of parallel algorithms developed for DFT/FFT in different architectures can be found in the literature [2–7].

Using an extension of the common factor algorithm authors in [3] have shown that a simple planar 2-dimensional systolic array having N = M ² processing elements can be used to compute DFT of size N in 2 M + 1 steps of pipelined operations, achieving the area-time complexity AT ² = O (N ² log ³ N).

In this paper we present an entirely different but simple direct matrix–vector multiplication approach for computing DFT in Multi-Mesh architecture. The time complexity of the algorithm is O (√N) for N point DFT. The algorithm is implemented in Multi-Mesh (MM) architecture [7–9] having N ² processors where each processor is of 4 degree. As shown in Fig. 1, there are N meshes of size √N × √N each, which themselves are again arranged in √N rows and √N columns so that there will be N ² processors in the MM network.

The important aspect of the proposed parallel algorithm is that it requires constant time multiplication. The order of time complexity is mainly governed by the data communication and addition times. The Fourier components are generated in place in parallel for each processor using the processor position co-ordinates and require three multiplication steps and two addition steps. The input of the vector to be transformed and their propagation requires (2n + 1) communication steps. The partial products are generated by a single multiplication step. After which 2(n − 1) additions and (n + 1) data communication steps are required to get the transformed vector.

2 Discrete Fourier Transform

The DFT [1, 2] of a continuous function a (t) is given by

$$ A_{f} = \sum\nolimits_{0}^{N - 1} {a_{t} e^{{\frac{2\pi ifk}{N}}} } ,\;0 \le f \le N - 1 $$

where, $ i = \sqrt { - 1} $. The variable t is used to represent time, and f is used to represent frequency. The Fourier transform is used to convert a function of time into a function of frequency.

3 The Multi Mesh (MM) Network

First, the MM network proposed by the authors [7] is described. The basic building block of the MM network is n × n mesh.

A processor inside a given block can be uniquely identified by two coordinates. Again blocks are organized as matrix form so each block can be identified by two coordinates, say α and β as M (α, β). Thus, each of the n⁴ processors in MM can be uniquely identified using a 4-tuple. If the boundary processors of P (α, β ₁) are connected to boundary processors of P (α, β ₂) for all values of α, 0 ≤ α ≤ n − 1, we denote these sets of links by an interconnection between the sets P (*, β ₁) and P (*, β ₂). Inter-block vertical connection is identified by following rule.

∀ β, 0 ≤ β ≤ n − 1, P (α, β, 0, y) are connected to P (y, β, n − 1, α), where 0 ≤ y, α ≤ n − 1, and inter block horizontal connection is identified by following rule.

∀ α, 0 ≤ α ≤ n − 1, P (α, β, x, 0) are connected to P (α, x, β, n − 1), where 0 ≤ x, β ≤ n − 1.

4 Fourier Transformation Using Matrix–Vector Multiplication

Transformation of a matrix A of size N × N by a vector B of size N is given by C = A × B, where C is the transformed vector of length N.

4.1 Parallel Implementation of Fourier Transform Using MM Network

The given matrix A is partitioned into n × n sub-matrices A _{α, β} where 0 ≤ α, β ≤ n − 1. In this way there are n ² sub-matrices which are initially mapped to M (α, β), 0 ≤ α, β ≤ n − 1, each of which contains n ² processors P (α, β, *, *). This ‘*’ indicates all possible values from 0 to n − 1. Here each processor holds a single component of matrix A and all these values are generated in place according to the position of the node P (α, β, x, y), 0 ≤ α, β, x, y ≤ n − 1. For example, ω ^{(αn + x)(βn + y)} is the value of A at the node position P (α, β, x, y) and is calculated as $ \cos \frac{{\left( {\alpha n + x} \right)\left( {\beta n + y} \right)2\pi }}{n} - i \sin \frac{{\left( {\alpha n + x} \right)\left( {\beta n + y} \right)2\pi }}{n} $ using the built in complex functionality $ \omega = e^{{\frac{2\pi i}{n}}} = \cos \frac{2\pi }{n} - i \sin \frac{2\pi }{n} $. Next B values are input through the upper boundary of the MM network and are moved to all the other positions in the MM network. The movement of the B values are shown in Fig. 2 where only first four components of B vector are shown for the first column of blocks of a 4 × 4 Multi-Mesh.

Once the B values are populated and A values are calculated at the corresponding node, the next step towards completion of Fourier Transform is to perform multiplication at each node.

To calculate the $ C_{i } \sum\nolimits_{j = 0}^{{n^{2} - 1}} {c_{ij} } $ for each i, where $ c_{ij} = a_{ij} \times b_{j} $, all c_ij ’s are to be brought in a single block M (i/n, i%n), as they are now scattered in ith rows of n different blocks. To achieve this n shift is performed along the horizontal inter block links of the MM network as shown in Fig. 3 given below.

Now the n ² values of each mesh are summed along the column direction then horizontally along the first row of each block to make an element of the result vector C. After this step one more single step data movement is performed along the horizontal inter block links to bring the values of all the blocks M (α, 0), 0 ≤ α ≤ n − 1, at the left boundaries as shown in and Fig. 4. The final output is obtained at the left boundary of the MM network.

5 Parallel Discrete Fourier Transform Using MM Network

5.1 Algorithm PDFT

A.
Initialization step

This step is subdivided into two parts. Register R1 s contain the initial values of matrix A, whereas registers R2 s contain the initial values of vector B.
Step 1::

∀ α, β, x and y, 0 ≤ α, β, x, y ≤ n − 1 do in parallel

R1 (α, β, x, y)←

Step 2::

∀ β and y, 0 ≤ α, β, y ≤ n − 1 do in parallel

R2 (0, β, 0, y)←
B.
Propagation of B vector to the other rows of MM network
1. I.
  ∀ β and y, 0 ≤ β, y ≤ n − 1 do in parallel
  
  R2 (y, β, n, 0) ← R2 (0, β, 0, y);
2. II.
  ∀ α, β, 0 ≤ α, β ≤ n − 1 do in parallel
  
  for i = 1 to n − 1 do
  
  R2 (α, β, n − 1, i) ← R1 (α, β, n − 1, i − 1);
3. III.
  ∀ α, β, i, 0 ≤ α, β, i ≤ n − 1 do in parallel
  
  R2 (i, β, 0, α) ← R2 (α, β, n − 1, i);
4. IV.
  ∀ α, β, j, 0 ≤ α, β, j ≤ n − 1 do in parallel
  
  for i = 1 to n − 1 do
  
  R1 (α, β, i, j) ← R2(α, β, i − 1, j);
C.
Multiplication Step
- ∀ α, β, x and y, 0 ≤ α, β, x, y ≤ n − 1 do in parallel
- R1 (α, β, x, y) ← R1 (α, β, x, y) × R2 (α, β, x, y);
D.
Data Movement Step

In the MM network horizontal inter block links form cycles of length 2n between the k th row of the block M (i, j) and the j th row of the block M (i, k) for j ≠ k, 0 ≤ i ≤ n − 1. For a given α, if data elements in M(α, *) are shifted through n positions along the horizontal cycles, then the ith row elements of M(α, j) will also be shifted to the jth row of B(α, i), 0 ≤ α ≤ n − 1.

/* Here, `*’ indicates all possible values from 0 to n − 1, but the same value for it must be used on both sides of the assignment operator */

6 Time Complexity of Parallel DFT

The number of steps required for initialization stage A is 6 and for propagation stage is 2n. Generation of partial products needs single step. Data movement stage D requires n steps. Addition stage E requires 2(n − 1). Data arrangement stage F is done in single step. So, total number of steps required for the entire process is 5n + 6 or 5 N ^1/2 + 6 where N ² = n ⁴ which implies time complexity of this algorithm is O (n) or O (N ^1/2).

7 Comparative Study of Time Complexities of Parallel DFT in Multi-Mesh and Other Architectures

The time complexities of parallel DFT computation in various architectures have been tabulated in Table 1.

Table 1 Time complexities of PDFT computation in various architectures are given below

Full size table

For the purpose of comparison, we are considering same mesh size (n × n) and same total number of nodes (n ⁴) for both MM and Multi-dimensional mesh. The diameters of MM and 4-D mesh are 2n and 4 (n − 1) respectively, and degree of each node in MM is uniformly 4, whereas the node degree of each internal node is 8 for 4-D mesh. If smaller meshes are used to construct the multi-dimensional mesh, the node degree will increase. Moreover, the time complexity of proposed DFT computation in MM outperforms multi-dimensional mesh for practical sizes of meshes, i.e., mesh sizes less than 2048 × 2048.

In Hypercube better time complexity is achieved through its high node degree and number of links with increasing dimensions.

8 Conclusion and Future Work

A parallel algorithm for DFT of vector of length N (=n ²) has been proposed with constant multiplication time and O (√N) addition and data communication time. Each processor of the MM network having N ² (=n ⁴) processors, generates a single component of the Fourier matrix depending on the processor position in MM network i.e. each of the components is generated in place.

The authors in [7] have implemented n ^1.42 × n ^1.42 matrix-by-matrix multiplication in O (n) time using n ⁴ processors (same as that used in the proposed work) of MM and indicated that DFT of vector length n ^1.42 can be computed in the same way. In the present work, DFT of vector length n ² is computed as linear transformation in O (n) time using n ⁴ processors of MM. The Fourier matrix is calculated in-place at each node of the MM, so there is no need to input the Fourier matrix through the left boundary of each mesh (block) as was done in [7], and vector to be transformed was input only through the upper boundary of the MM whereas the input was made through the upper boundary of each mesh (block) in [7].

As a future work, we may consider a generalized MM network where total number of processors in MM network is m ² × n ² having m × n number of processors in each block of the MM network for m, n ≥ 3.

References

Cooley, J.M., Tukey, J.W.: An algorithm for the machine computation of the complex Fourier series. Math. Comput. 19, 297–301 (1965)
Article MATH MathSciNet Google Scholar
Strong, J.P.: The Fourier transform on mesh connected processing arrays such as massively parallel processors. CAPAIDM 19, 190–196 (1986)
Google Scholar
Shousheng, H., Torkelson, M.: A systolic array implementation of common factor algorithm to compute DFT. In: International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN), pp. 374–381 (1994)
Google Scholar
Sinha, B.P., Mukherjee, A.: Parallel sorting algorithm using multiway merge and its implementation on a multi-mesh network. J. Parallel Distrib. Comput. 60, 891–960 (2000)
Article Google Scholar
Jaja, J.: An Introduction to Parallel Algorithms. Addison-Wesley (1992)
Google Scholar
Fragopoulou, P., Akl, S.G.: A parallel algorithm for computing Fourier transform on the star graph. Trans. Parallel Distrib. Syst. 5(5), 525–531 (1994)
Article Google Scholar
Das, D., De, M., Sinha, B.P.: A new network topology with multiple meshes. IEEE Trans. Comput. 48(5), 536–551 (1999)
Article MathSciNet Google Scholar
De, M., Das, D., Sinha, B.P.: An efficient sorting algorithm on the multi-mesh network. IEEE Trans. Comput. 46(10), 1132–1137 (1997)
Article MathSciNet Google Scholar
Sinha, B. P.: Multi-mesh an efficient topology for parallel processing. In: Proceedings of 9th International Parallel Processing Symposium, (IPPS), pp. 17–21 (1995)
Google Scholar

Download references

Author information

Authors and Affiliations

Research Scholar, Department of Engineering and Technological Studies, University of Kalyani, Kalyani, India
Amit Datta
Department of Engineering and Technological Studies, University of Kalyani, Kalyani, India
Mallika De

Authors

Amit Datta
View author publications
You can also search for this author in PubMed Google Scholar
Mallika De
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Amit Datta .

Editor information

Editors and Affiliations

Computer Science and Engineering, Indian School of Mines, Dhanbad, Dhanbad, Jharkhand, India
G. P. Biswas
Computer Science and Engineering, Indian School of Mines, Dhanbad, Dhanbad, Jharkhand, India
Sushanta Mukhopadhyay

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Datta, A., De, M. (2014). A New Approach for Parallel Discrete Fourier Transform in Multi Mesh Network. In: Biswas, G., Mukhopadhyay, S. (eds) Recent Advances in Information Technology. Advances in Intelligent Systems and Computing, vol 266. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1856-2_10

Download citation

DOI: https://doi.org/10.1007/978-81-322-1856-2_10
Published: 12 March 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1855-5
Online ISBN: 978-81-322-1856-2
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

A New Approach for Parallel Discrete Fourier Transform in Multi Mesh Network

Abstract

Similar content being viewed by others

Parallel Implementation for Some Applications of Integral Equations Method

Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

Expandable Parallel Finite Element Methods for Linear Elliptic Problems

Keywords

1 Introduction

2 Discrete Fourier Transform

3 The Multi Mesh (MM) Network

4 Fourier Transformation Using Matrix–Vector Multiplication

4.1 Parallel Implementation of Fourier Transform Using MM Network

5 Parallel Discrete Fourier Transform Using MM Network

5.1 Algorithm PDFT

6 Time Complexity of Parallel DFT

7 Comparative Study of Time Complexities of Parallel DFT in Multi-Mesh and Other Architectures

8 Conclusion and Future Work

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

A New Approach for Parallel Discrete Fourier Transform in Multi Mesh Network

Abstract

Similar content being viewed by others

Parallel Implementation for Some Applications of Integral Equations Method

Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

Expandable Parallel Finite Element Methods for Linear Elliptic Problems

Keywords

1 Introduction

2 Discrete Fourier Transform

3 The Multi Mesh (MM) Network

4 Fourier Transformation Using Matrix–Vector Multiplication

4.1 Parallel Implementation of Fourier Transform Using MM Network

5 Parallel Discrete Fourier Transform Using MM Network

5.1 Algorithm PDFT

6 Time Complexity of Parallel DFT

7 Comparative Study of Time Complexities of Parallel DFT in Multi-Mesh and Other Architectures

8 Conclusion and Future Work

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation