1 Introduction

Distributed Video Coding (DVC) [7] is an advanced asymmetric coding scheme that encodes individual frames independently while decoding them conditionally. This feature makes DVC well suited to mobile device environments. As capacity-approaching channel codes, Low-Density Parity-Check (LDPC) codes [6, 9] have been used to compress video under the DVC framework [15]. LDPC codes were first invented by Gallager [6] in 1962 and rediscovered by MacKay and Neal [9] in 1996. They have since become one of the hottest topics in both the research and industrial communities.

In 1998, MacKay et al. [3] generalized binary LDPC codes to finite fields \( \mathrm{GF}(Q = 2^q) \) and proposed Q-ary LDPC (QLDPC) codes. QLDPC codes are a better fit for practical multimedia problems, e.g., video and image compression, since one pixel is normally represented by at least 8 bits. Most LDPC decoders are implemented with the Belief Propagation (BP) algorithm [8] (also known as the “sum-product” algorithm). The Q-ary BP (QBP) algorithm is used to decode QLDPC codes. QBP has a computational complexity of \( O(NtQ^2) \), where N is the codeword length and t is the mean column weight of the sparse parity-check matrix H. To reduce this burden, Declercq et al. [1] proposed a fast Q-ary BP algorithm whose idea is to replace the convolutional operations with the Fast Fourier Transform (FFT), reducing the complexity to \( O(NtQ\log_2 Q) \).

To improve the performance of the BP algorithm, Fang [4] presented the Sliding-Window Belief Propagation (SWBP) algorithm, whose idea is to adaptively select optimal local bias probabilities to seed the variable nodes of BP. Many experiments [4, 5] showed that SWBP achieves better performance with fewer iterations. In addition, it is very easy to implement and insensitive to the initial settings. Recently, Q-ary SWBP (QSWBP) [14] was proposed to handle QLDPC codes and achieved better performance and robustness, although it still suffers from a heavy computational burden.

The Graphics Processing Unit (GPU) [10] invented by NVIDIA has, by means of its highly parallel structure, demonstrated powerful capabilities for high-performance computing. Inspired by this, we propose a parallel version of QSWBP and accelerate it on a GPU. In 2016, the joint-bitplane BP algorithm was accelerated by GPU [2]. In 2019, a parallel binary SWBP algorithm was accelerated by GPU and obtained a remarkable speedup ratio [12]. To the best of our knowledge, no parallel QSWBP algorithm has been presented so far. Instead of C/C++, we use MATLAB as our programming platform in this paper. As a high-level language for scientific computing and rapid prototyping of engineering problems, MATLAB has many advantages, e.g., it eliminates pointers to avoid memory access errors, offers powerful manipulation of vectors and matrices, and provides concise and efficient vectorization instructions. Since 2010, MATLAB has provided support for GPUs [11]. We use our parallel QSWBP algorithm to decode a small fraction of a video under the DVC framework. Numerical experiments are performed to investigate the speedup of the parallel algorithm over the sequential one. A brief version of this paper was published at the IoTaaS 2019 conference [13]; here we present detailed discussions and the application of this algorithm.
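For readers unfamiliar with MATLAB's GPU interface, the following minimal sketch (our illustration, not the paper's code) shows the gpuArray/gather idiom that underlies all GPU acceleration in this paper:

    % Minimal gpuArray example (Parallel Computing Toolbox). Data moved to
    % the device with gpuArray is processed by GPU-enabled built-ins;
    % gather copies the result back to host memory.
    a = gpuArray(rand(4096));   % transfer a 4096-by-4096 matrix to the GPU
    b = fft(a);                 % fft is one of many gpuArray-enabled built-ins
    c = gather(real(b));        % bring the result back to the CPU workspace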

2 QSWBP algorithm

2.1 Correlation model

Let A = [0 : Q) denote the alphabet. Let x, y ∈ A denote the realizations of X and Y, two random variables. Let \( X^n \) be the source to be compressed at the encoder, and let \( Y^n \) be the Side Information (SI) that resides only at the decoder, with \( X^n = Y^n + Z^n \). We model the correlation between input \( X^n \) and output \( Y^n \) as a virtual channel with the following properties: \( Y^n \) and \( Z^n \) are independent of each other; \( p_{Z^n}(z^n) = \prod_{i=1}^{n} p_{Z_i}(z_i) \), where \( p_X(x) \) denotes the Probability Mass Function (pmf) of the discrete random variable X; and the pmfs of the \( Z_i \)'s may differ, where i ∈ [1 : n].

We use the Truncated Discrete Laplace (TDL) distribution to model \( Z_i \):

$$ {p}_{X_i\mid {Y}_i}\left(x|y\right)\propto \frac{1}{2{b}_i}\exp\ \left(-\frac{\mid x-y\mid }{b_i}\right) $$
(1)

where bi is the local scale parameter. Since \( {\sum}_{x=0}^{Q-1}{p}_{X_i\mid {Y}_i}\left(x|y\right)=1 \), we can obtain

$$ p_{X_i \mid Y_i}(x \mid y) = \frac{\exp\left(-\frac{|x-y|}{b_i}\right)}{L_Q(b_i, y)} $$
(2)

where \( L_Q(b,y) = \sum_{x=0}^{Q-1} \exp\left(-\frac{|x-y|}{b}\right) \). To reduce the computational complexity, we approximate the summation by an integral. When b and Q are reasonably large, this approximation is sufficiently precise:

$$ L_Q(b,y) \approx \int_0^{Q-1} \exp\left(-\frac{|x-y|}{b}\right) dx = 2b\left(1 - \frac{1}{2}\exp\left(\frac{y-(Q-1)}{b}\right) - \frac{1}{2}\exp\left(-\frac{y}{b}\right)\right) $$
(3)
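As a concrete illustration, the pmf (2) with the normalizer approximated by (3) can be computed as follows (a minimal MATLAB sketch; the function names tdl_pmf and LQ are ours, not from the paper):

    function p = tdl_pmf(y, b, Q)
        % TDL pmf of (2) for one symbol: p(x) for x = 0..Q-1, given the SI
        % symbol y and local scale parameter b; normalizer per (3).
        x = (0:Q-1)';                      % alphabet column vector
        p = exp(-abs(x - y) ./ b) ./ LQ(b, y, Q);
    end

    function L = LQ(b, y, Q)
        % Closed-form approximation (3) of the normalizing sum in (2).
        L = 2*b .* (1 - 0.5*exp((y - (Q-1))./b) - 0.5*exp(-y./b));
    end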

2.2 Encoding

The encoder uses a QLDPC code to compress the source \( x \in [0:Q)^n \) into the syndrome \( s \in [0:Q)^m \). This is done by a matrix-vector multiplication over the finite field GF(Q):

$$ \mathrm{s}=\mathbf{H}\mathrm{x} $$
(4)

where \( \mathbf{H} \in [0:Q)^{m \times n} \) is the sparse parity-check matrix. In H, the i-th column corresponds to source node \( x_i \), and the j-th row corresponds to syndrome node \( s_j \). If the element \( h_{j,i} \) of H is nonzero, an edge connects \( s_j \) and \( x_i \) in the bipartite graph of H, as illustrated in Fig. 1. We define the indices of all source nodes connected to syndrome node \( s_j \) as \( N_j \triangleq \{i : h_{j,i} \ne 0\} \subset [1:n] \), and the indices of all syndrome nodes connected to source node \( x_i \) as \( M_i \triangleq \{j : h_{j,i} \ne 0\} \subset [1:m] \).

Fig. 1.

(2,4)-regular QLDPC code of length 10. (a) The bipartite graph, where circles represent source nodes and squares represent syndrome nodes; (b) parity check matrix.
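As a reference for (4), the syndrome computation can be sketched with the gf arithmetic type from MATLAB's Communications Toolbox (our illustration, not the paper's encoder; Hint and xint are assumed integer arrays with entries in [0 : Q)):

    % Syndrome computation (4) over GF(2^q) using the gf type. Note: gf
    % arrays are dense, so this is only a functional reference; a practical
    % encoder would exploit the sparsity of H.
    q = 8;                              % alphabet GF(2^q), i.e., Q = 256
    s = gf(Hint, q) * gf(xint(:), q);   % s = Hx over GF(Q)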

2.3 Decoding

The decoder seeds the source nodes x according to the SI y and runs the QBP algorithm to recover x. For the belief propagation between source nodes and syndrome nodes, we give the following definitions: \( \xi_i(x) \) is the intrinsic pmf of source node \( x_i \); \( \zeta_i(x) \) is the overall pmf of source node \( x_i \); \( r_{i,j}(x) \) is the pmf passed from source node \( x_i \) to syndrome node \( s_j \); and \( q_{j,i}(x) \) is the pmf passed from syndrome node \( s_j \) to source node \( x_i \), where j ∈ \( M_i \) and i ∈ \( N_j \).

BP includes 5 steps:

  1.

    Initializing BP:

$$ {\xi}_i(x)={\zeta}_i(x)={p}_{X_i\mid {Y}_i}\left(x|y\right) $$
(5)
$$ {q}_{j,i}(x)=1/Q $$
(6)

where \( p_{X_i \mid Y_i}(x|y) \) is calculated by (2) and (3).

  2.

    Source-to-Syndrome BP:

$$ {r}_{i,j}(x)\propto \frac{\zeta_i(x)}{q_{j,i}(x)} $$
(7)
  3.

    Syndrome-to-Source BP:

$$ {q}_{j,i}\left({h}_{j,i}x\right)={F}^{-1}\left\{\frac{\psi_j(w)}{F\left\{{r}_{i,j}\left({h}_{j,i}x\right)\right\}}\right\} $$
(8)

where \( \psi_j(w) = \prod_{i \in N_j} F\{r_{i,j}(h_{j,i}x)\} \) for j ∈ [1 : m] and w ∈ [0 : Q), F{·} denotes the Fourier Transform, and F−1{·} denotes the inverse Fourier Transform.

  4.

    Overall pmf of Source Nodes:

$$ \zeta_i(x) = \xi_i(x) \prod_{j \in M_i} q_{j,i}(x) $$
(9)
  5.

    Hard Decision and Convergence Test:

$$ \hat{x}_i = \underset{x \in [0:Q)}{\arg\max}\; \zeta_i(x) $$
(10)

If \( \mathrm{s}=\mathbf{H}\hat{\mathrm{x}} \), the decoding process finishes successfully; otherwise, more iterations are performed until either \( \mathrm{s}=\mathbf{H}\hat{\mathrm{x}} \) holds or the iteration count exceeds a prespecified threshold.
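To make step 3 concrete, the sketch below implements the update (8) for a single syndrome node over GF(2^q), where the transform F reduces to the Walsh-Hadamard transform (fwht/ifwht, Signal Processing Toolbox). This is our own illustration under simplifying assumptions: R collects the already-permuted incoming messages \( r_{i,j}(h_{j,i}x) \) as columns, and the index bookkeeping for the syndrome value \( s_j \) is omitted; bsxfun is used for compatibility with the paper's MATLAB 2014b platform.

    function Qmsg = check_node_update(R)
        % R is Q-by-d: column k holds the permuted message from the k-th
        % neighboring source node. 'hadamard' ordering matches GF(2^q).
        F    = fwht(R, [], 'hadamard');      % transform every column
        psi  = prod(F, 2);                   % psi_j(w) of (8)
        % Leave-one-out division plus inverse transform (a production
        % decoder must guard this division against zeros in F).
        Qmsg = ifwht(bsxfun(@rdivide, psi, F), [], 'hadamard');
        Qmsg = max(Qmsg, eps);               % guard against round-off
        Qmsg = bsxfun(@rdivide, Qmsg, sum(Qmsg, 1));  % renormalize to pmfs
    end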

2.4 SWBP algorithm

In QBP, the source nodes must be seeded with the local scale parameter b of the virtual correlation channel. In [1, 8], this parameter is estimated by the SWBP algorithm. In this paper, we use the expected L1 distance between each source symbol and its corresponding SI symbol, defined as

$$ \mu_i \triangleq \sum_{x=0}^{Q-1} \zeta_i(x) \cdot |x - y_i| $$
(11)

Then the estimated local scale parameter \( \hat{b}_i \) is calculated by averaging the expected L1 distances of the neighbors of symbol i within a size-(2η + 1) window:

$$ \hat{b}_i(\eta) = \frac{t_i(\eta) - \mu_i}{\min(i+\eta, n) - \max(1, i-\eta)} $$
(12)

where

$$ {t}_i\left(\eta \right)\overset{\Delta}{=}\sum \limits_{i^{\prime }=\max\ \left(1,i-\eta \right)}^{\min\ \left(i+\eta, n\right)}{\mu}_{i^{\prime }} $$
(13)

To calculate (13), we first compute \( t_1(\eta) = \sum_{i'=1}^{1+\eta} \mu_{i'} \). Then for i ∈ [2 : n],

$$ t_i(\eta) = \begin{cases} t_{i-1}(\eta) + \mu_{i+\eta}, & i \in [2 : (\eta+1)] \\ t_{i-1}(\eta) + \mu_{i+\eta} - \mu_{i-1-\eta}, & i \in [(\eta+2) : (n-\eta)] \\ t_{i-1}(\eta) - \mu_{i-1-\eta}, & i \in [(n-\eta+1) : n] \end{cases} $$
(14)
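In MATLAB, the recursion (14) amounts to a simple running-sum loop (our sketch; mu is the length-n vector of expected L1 distances from (11), and the function name is ours):

    function t = window_sums(mu, eta)
        % t(i) = sum of mu over [max(1,i-eta), min(i+eta,n)], per (13)-(14)
        n    = numel(mu);
        t    = zeros(n, 1);
        t(1) = sum(mu(1:1+eta));            % t_1(eta)
        for i = 2:n
            t(i) = t(i-1);
            if i + eta <= n                 % right edge still inside [1,n]
                t(i) = t(i) + mu(i+eta);
            end
            if i - 1 - eta >= 1             % left edge moves: drop a term
                t(i) = t(i) - mu(i-1-eta);
            end
        end
    end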

As in [1, 8], the main purpose of QSWBP is to find the best half window size \( \hat{\eta} \). We define an expected rate:

$$ \gamma(\eta) \triangleq -\sum_{i=1}^{n}\sum_{x=0}^{Q-1} \zeta_i(x) \cdot \ln \frac{\exp\left(-|x-y_i|/\hat{b}_i(\eta)\right)}{L_Q\left(\hat{b}_i(\eta), y_i\right)} = \sum_{i=1}^{n}\left(\ln L_Q\left(\hat{b}_i(\eta), y_i\right) + \frac{\mu_i}{\hat{b}_i(\eta)}\right) $$
(15)

where \( {L}_Q\left({\hat{b}}_i\left(\eta \right),{y}_i\right) \) is defined by (3). The best half window size \( \hat{\eta} \) is chosen by

$$ \hat{\eta} = \underset{\eta}{\arg\min}\; \gamma(\eta). $$
(16)

The natural idea is that the best half window size should minimize the expected rate. The flowchart of the QSWBP algorithm is illustrated in Fig. 2.

Fig. 2

Flowchart of QSWBP

3 Parallel QSWBP algorithm

3.1 Complexity analysis

As shown in Fig. 2, the decoding process includes four parts.

  • The Source-to-Syndrome BP step needs n iterations to calculate \( r_{i,j}(x) \), since there are n source nodes.

  • The Syndrome-to-Source BP step needs m iterations to calculate \( q_{j,i}(x) \), since there are m syndrome nodes. Let \( rw_j \) denote the j-th row weight; in each iteration, the Fourier Transform must be calculated \( rw_j \) times and the inverse Fourier Transform another \( rw_j \) times. We use the Fast Fourier Transform (FFT) to implement both. An FFT of length n requires \( n\log_2(n) \) real multiplications and additions.

  • The Computing Overall pmf step needs n iterations to calculate \( \zeta_i(x) \).

  • In the SWBP step, to find the best half window size, a search strategy [5] was proposed that only searches half window sizes from \( \eta \in \left\{1^2, \dots, {\left\lfloor \sqrt{\frac{n-1}{2}}\right\rfloor}^2\right\} \). Although this strategy remarkably reduces the number of search iterations, we found in our experiments that it may miss the best half window size. Therefore, we evaluate the expected rates for all \( \eta \in \left\{1, 2, \dots, \left\lfloor \frac{n-1}{2}\right\rfloor\right\} \), which needs \( \left\lfloor \frac{n-1}{2}\right\rfloor \) iterations. In practice, since γ(1) is always large enough, it is unnecessary to calculate it for η = 1. In each SWBP iteration, computing \( t_i(\eta) \) needs n iterations in (14), and computing γ(η) needs another n iterations in (15), in which the ln function is very time-consuming.

Based on the above analysis, we conclude that the Syndrome-to-Source BP step and the SWBP step are the two major time-consuming parts of the entire algorithm. To verify this conclusion, we use the profile function in MATLAB to investigate the timing details. The profile function [10] records the execution time of each function in MATLAB code. The timing result is shown in Fig. 3, where qldpc_test is the main function, which costs 5.548 s; ntt is the FFT function, which is called 8192 times and costs 3.912 s in total; and swbp_lap is the SWBP function, which is called 7 times and costs 1.607 s in total. The profile analysis shows that the FFT and SWBP are the two major bottlenecks, which agrees well with our analysis. Since a GPU has a large number of cores that can run many threads simultaneously, we exploit this feature to accelerate the above bottlenecks. In our parallel algorithm, each bottleneck is divided into many pieces that have no correlation with each other, and each piece runs on one core of the GPU. The details of the parallel algorithm are introduced in the next two subsections.

Fig. 3

Timing details of the sequential Q-ary SWBP
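The profiler invocation itself is standard MATLAB, shown here for completeness (qldpc_test is the main routine named above):

    profile on       % start collecting per-function timing statistics
    qldpc_test;      % run the sequential decoder under test
    profile viewer   % open the report summarized in Fig. 3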

3.2 Parallel syndrome-to-source BP algorithm

The sequential Syndrome-to-Source BP algorithm needs m iterations to calculate \( q_{j,i}(x) \), as there are m syndrome nodes. Since these m iterations are independent of each other, they can be executed in parallel. Take Fig. 1(b) as an example. Figure 4 depicts our parallel algorithm, where the numbers in the squares are the row positions of the non-zero elements of the parity-check matrix H in Fig. 1(b). Five threads run simultaneously on the GPU, one per \( q_{j,i}(x) \), and each thread calculates its FFTs in sequence.

Fig. 4

Parallel Syndrome-to-Source BP algorithm
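On the GPU, the m independent updates can also be batched rather than looped. The sketch below (our illustration for a regular code with row weight dc) stacks all edge messages into one array and issues a single batched transform; we use MATLAB's fft, which is gpuArray-enabled, purely as a stand-in for the transform F of (8) — the paper's own GF(2^q) transform is implemented by its ntt routine.

    % R is Q-by-(m*dc): permuted edge messages, grouped by syndrome node.
    Rg  = reshape(gpuArray(R), Q, dc, m);   % one Q-by-dc slab per node
    Fg  = fft(Rg);                          % all forward transforms at once
    psi = prod(Fg, 2);                      % psi_j(w) for every node j
    % Leave-one-out inverses (zeros in Fg must be guarded in practice).
    Qg  = real(ifft(bsxfun(@rdivide, psi, Fg)));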

3.3 Parallel SWBP algorithm

In the sequential SWBP algorithm, each window-size iteration generates an expected rate γ(η), calculated by (15). Any two expected rates γ(η1) and γ(η2) (η1 ≠ η2) are uncorrelated and can be computed in parallel. In our parallel algorithm, all γ(η) for \( \eta \in \left\{1, 2, \dots, \left\lfloor \frac{n-1}{2}\right\rfloor\right\} \) are calculated simultaneously by thousands of threads on the GPU. Once all γ(η) are obtained, we use the min() function in MATLAB to get the smallest γ and the corresponding best η from the array of γ(η). The sequential and parallel algorithms are illustrated in Fig. 5.

Fig. 5.

(a) sequential SWBP algorithm, (b) parallel SWBP algorithm.
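A vectorized realization of the parallel search (our sketch, not the paper's code) evaluates (12)–(16) for every candidate η at once via prefix sums; mu and y are length-n gpuArray columns, and LQ is the helper sketched in Section 2.1. Implicit expansion (MATLAB R2016b+) is assumed; on older releases such as the paper's 2014b, bsxfun is required.

    n    = numel(mu);
    etas = 2:floor((n-1)/2);         % gamma(1) is skipped, cf. Section 3.1
    c    = cumsum([0; mu]);          % prefix sums: c(k+1) = mu(1)+...+mu(k)
    i    = gpuArray((1:n)');         % symbol indices (column)
    e    = gpuArray(etas);           % candidate half window sizes (row)
    lo   = max(1, i - e);            % n-by-#etas window left edges
    hi   = min(n, i + e);            % n-by-#etas window right edges
    t    = c(hi+1) - c(lo);          % t_i(eta) for every (i, eta), as in (13)
    b    = (t - mu) ./ (hi - lo);    % local scale estimates (12)
    gam  = sum(log(LQ(b, y, Q)) + mu ./ b, 1);   % expected rates (15)
    [~, k]   = min(gather(gam));     % (16): pick the minimizer
    eta_best = etas(k);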

3.4 Vectorization

Thanks to the features of the MATLAB language, we can manipulate matrices and vectors with vectorized code instead of loop-based code. Take the calculation of (11) as an example. Listing 1 shows our MATLAB implementation of (11) in loop-based and vectorized form, respectively. From this code, we find that vectorization has three advantages: less code means fewer errors; it is easier to understand, since vectorized code resembles the mathematical expressions in the equations; and it runs much faster than loop-based code, since MATLAB is optimized for vectorization. With vectorization, the code is more concise and elegant: our vectorized source code is 10% shorter than the loop-based code.


Listing 1 MATLAB code of (11)
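Since the listing appears above only as an image, the following sketch conveys its substance (our reconstruction of (11), not the verbatim Listing 1; zeta is Q-by-n with rows indexed by x = 0..Q-1, and y is 1-by-n):

    % Loop-based code: two nested loops over symbols and alphabet values
    mu = zeros(1, n);
    for i = 1:n
        for x = 0:Q-1
            mu(i) = mu(i) + zeta(x+1, i) * abs(x - y(i));
        end
    end

    % Vectorized code: one expression replaces both loops
    mu = sum(zeta .* abs(bsxfun(@minus, (0:Q-1)', y)), 1);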

4 Video compression by QLDPC

Under the DVC framework, the encoder is implemented by the Pixel-Domain and Transform-Domain (PDTD) scheme. PDTD divides video frames into Key Frames, which are encoded and decoded using a conventional intraframe codec, and Wyner-Ziv Frames. A block-wise DCT is performed on the Wyner-Ziv frames to obtain the transform coefficients XDCT, which are independently quantized and grouped into coefficient bands. These bands are compressed by the LDPC encoder. At the decoder, the correlation between XDCT and the side information SDCT is modeled as a Laplacian distribution. The LDPC decoder reconstructs the coefficient bands with the corresponding side information and performs an inverse DCT to generate the reconstructed Wyner-Ziv frames.

One frame of the video is normally represented as an n × n 2D source. Let \( x^{n,n} \) and \( y^{n,n} \) be two n × n 2D sources, where

$$ {x}^{n,n}\triangleq \left(\begin{array}{ccc}{x}_{0,0}& \cdots & {x}_{0,n-1}\\ {}\vdots & \ddots & \vdots \\ {}{x}_{n-1,0}& \cdots & {x}_{n-1,n-1}\end{array}\right) $$
(17)

where xi, j ∈ [0 : Q). The correlation between them follows the setup in Section 2.1.

At the encoder, \( x^{n,n} \) is first vectorized into a Q-ary temporary vector \( v^{n,n} \). Then a matrix-vector multiplication over the finite field GF(Q) compresses the source into the syndrome \( s^m \):

$$ {s}^m=\boldsymbol{H}{v}^{n,n}. $$
(18)

where H is an m × (n × n) sparse parity-check matrix.

At the decoder, the QSWBP is performed with the help of syndrome and side information to recover the source.
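Putting (17)–(18) together, the per-frame encoding step reduces to a few lines (our sketch, reusing the gf idiom from Section 2.2; frame is an n-by-n integer block and Hint the integer parity-check matrix, both our hypothetical names):

    v = double(frame(:));          % vectorize the n-by-n source, as in (17)
    s = gf(Hint, q) * gf(v, q);    % syndrome s^m over GF(Q), as in (18)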

5 Experiment results

In our experiments, we use an Intel Core i7 at 3.60 GHz as our CPU and an NVIDIA GTX 1080 Ti as our GPU. The detailed parameters of this GPU are listed in Table 1. We used MATLAB 2014b as our development platform, since this version provides full support for GPU acceleration.

Table 1 Parameters of GPU platform

5.1 Performance of parallel QSWBP

To evaluate the performance of our parallel algorithm, we perform two experiments with different Q.

In the first experiment, we set Q = 256 and use 4 different regular LDPC codes as input. The parameters of these LDPC codes are listed in Table 2. To eliminate random errors, we perform 100 tests and average the outputs as our final results. The experimental result is illustrated in Fig. 6, which shows that the parallel QSWBP algorithm achieves a 2.9× to 30.3× speedup over the sequential QSWBP algorithm. The longer the codeword, the higher the speedup ratio.

Table 2 Different LDPC code parameters (N is the codeword length, K is the information bit number)
Fig. 6

Running time and speedup ratio under Q = 256

The second experiment uses the same LDPC codes as the first, but with Q = 2048. The experimental result is illustrated in Fig. 7, which shows that the parallel QSWBP algorithm achieves a 2.3× to 11.4× speedup over the sequential QSWBP algorithm. The trend of the speedup ratio is the same as in the first experiment.

Fig. 7

Running time and speedup ratio under Q = 2048

Normally, successful decoding needs 11 rounds of BP iterations under Q = 256, but only 3 rounds under Q = 2048. As a result, the total running time of the second experiment is shorter than that of the first. Because the speedup accumulates over BP rounds, the first experiment achieves a higher speedup than the second.

5.2 Performance of video decoding

We borrow the YUV video sequence Foreman from [16] and choose its first 4 frames as the source. Each frame is cropped to 128×128 pixels. We construct a regular LDPC code with codeword length 16,384 and information bit number 8192, so the rate is 1/2. The alphabet cardinality Q is fixed to 2^8 = 256.

Our earlier experiments demonstrated that QSWBP outperforms QBP in video decoding [14]. In this paper, only the parallel and sequential QSWBP are evaluated, by decoding the Foreman video. The running time is used as the metric to compare the two algorithms. To eliminate random errors, we perform 100 decoding processes and average the running times as our final results. The experimental results are listed in Table 3. Both the parallel and sequential QSWBP obtain the same recovered frames, which are displayed in Fig. 8.

Table 3 Performance Comparison Between Parallel and Sequential QSWBP Algorithms
Fig. 8

The 4 recovered frames of Foreman

From Table 3, we find that the parallel QSWBP achieves a 69.21× to 78.31× speedup across the frames. This is because, with a very long LDPC code (length 16,384), the sequential QSWBP needs many iterations to search for the best window size.

6 Conclusion

A parallel Q-ary SWBP algorithm for decoding regular QLDPC codes with different code lengths and Q values has been proposed. We accelerate this algorithm with the GPU and MATLAB vectorization techniques. Experimental results show that the parallel algorithm achieves a 2.9× to 30.3× speedup under Q = 256 and a 2.3× to 11.4× speedup under Q = 2048. The video decoding experiment shows that the parallel algorithm achieves a 69.21× to 78.31× speedup under Q = 256. In future work, we will implement the parallel algorithm on an FPGA platform to extend its applications.