
1 Introduction

As a class of very good error-correcting codes [1], Low-Density Parity-Check (LDPC) codes are widely used in Fifth Generation (5G) telecommunication and Internet of Things as a Service (IoTaaS). LDPC codes were first invented by Gallager [2] in 1962, but were unfortunately ignored by the information-theory community for more than 30 years. Their renaissance was triggered by MacKay and Neal in 1996.

Originally, binary LDPC codes were decoded by the belief propagation (BP) algorithm (also known as the "sum-product" algorithm) [1]. In 1998, MacKay \(et\ al.\) [3] generalized binary LDPC codes to finite fields GF(\(Q=2^q\)) and proposed Q-ary LDPC codes, which are decoded by the Q-ary BP algorithm. To improve the performance of the BP algorithm, Fang presented the Sliding-Window Belief Propagation (SWBP) algorithm. Extensive experiments [6] show that SWBP achieves better performance with fewer iterations. In addition, it is very easy to implement and is insensitive to the initial settings. Incorporating fast Q-ary BP with SWBP is a natural way to attain better performance and robustness, but it still suffers from heavy computational complexity.

Popularized by NVIDIA, the Graphics Processing Unit (GPU) [7] has demonstrated powerful general-purpose computing ability. NVIDIA also provides a C-like language interface named Compute Unified Device Architecture (CUDA), which is a useful tool for researchers to develop parallel algorithms. Inspired by the GPU's amazing ability, we propose a parallel Q-ary SWBP algorithm and accelerate it on a GTX 1080Ti. Although we have used the GPU to accelerate the parallel binary SWBP algorithm developed in CUDA C++ [8], to the best of our knowledge, no parallel Q-ary SWBP algorithm has been presented.

The rest of this paper is organized as follows: Sect. 2 describes the sequential Q-ary SWBP algorithm. Section 3 analyzes the bottlenecks of the sequential algorithm and presents the parallel algorithm. Section 4 uses the GPU to accelerate the parallel algorithm and gives the experimental results. We conclude this paper in Sect. 5.

2 Q-ary SWBP Algorithm

2.1 Correlation Model

Let \(\mathcal{A} = [0:Q)\) denote the alphabet. Let \(x,y \in \mathcal{A}\) denote the realizations of X and Y, which are two random variables. Let \({X^n}\) be the source to be compressed at the encoder, and let \({Y^n}\) be the Side Information (SI) that resides only at the decoder, where \({X^n} = {Y^n} + {Z^n}\). We model the correlation between \({X^n}\) and \({Y^n}\) as a virtual channel with three properties:

  (1) Additive: \({Z^n}\) and \({Y^n}\) are independent of each other;

  (2) Memoryless: \({p_{{Z^n}}}({z^n}) = \prod \nolimits _{i = 1}^n {{p_{{Z_i}}}({z_i})} \), where \({p_X}(x)\) denotes the Probability Mass Function (pmf) of a discrete random variable X;

  (3) Nonstationary: the pmfs of the \(Z_i\)'s may differ from each other, where \(i \in [1:n]\).

We use the Truncated Discrete Laplace (TDL) distribution to model \(Z_i\):

$$\begin{aligned} {p_{{X_i}\mathrm{{|}}{Y_i}}}\left( {x|y} \right) \propto \frac{1}{{2{b_i}}}\exp \left( { - \frac{{|x - y|}}{{{b_i}}}} \right) \end{aligned}$$
(1)

where \(b_i\) is the local scale parameter. Since \(\sum \nolimits _{x = 0}^{Q - 1} {{p_{{X_i}\mathrm{{|}}{Y_i}}}\left( {x|y} \right) } = 1\), we can obtain

$$\begin{aligned} {p_{{X_i}\mathrm{{|}}{Y_i}}}\left( {x|y} \right) = \frac{{\exp \left( { - \frac{{|x - y|}}{{{b_i}}}} \right) }}{{{\mathcal{L}_Q}({b_i},y)}} \end{aligned}$$
(2)

where \({\mathcal{L}_Q}(b,y) \buildrel \varDelta \over = \sum \nolimits _{x = 0}^{Q - 1} {\exp \left( { - \frac{{|x - y|}}{b}} \right) } \). To reduce the computing complexity, we use an integration to approximate the summation. When b and Q are reasonably big, this approximation is precise enough:

$$\begin{aligned} {\mathcal{L}_Q}(b,y) &\approx \int _0^{Q - 1} {\exp \left( { - \frac{{|x - y|}}{b}} \right) } dx \\ &= 2b\left( {1 - \frac{1}{2}\exp \left( {\frac{{y - (Q - 1)}}{b}} \right) - \frac{1}{2}\exp \left( { - \frac{y}{b}} \right) } \right) \end{aligned}$$
(3)
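
For concreteness, the closed form in (3) maps directly to a one-line helper. Below is a minimal CUDA sketch; the function name Lq_approx and the use of double precision are our own choices, not part of the original implementation.

```cuda
#include <math.h>

// Closed-form approximation (3) of the normalizer L_Q(b, y):
// 2b * (1 - 0.5*exp((y - (Q-1))/b) - 0.5*exp(-y/b)).
// __host__ __device__ lets both CPU and GPU code call it.
__host__ __device__ double Lq_approx(double b, double y, int Q) {
  return 2.0 * b * (1.0 - 0.5 * exp((y - (double)(Q - 1)) / b)
                        - 0.5 * exp(-y / b));
}
```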

The encoder uses a Q-ary LDPC code to compress the source \(\mathrm{{x}} \in {[0:Q)^n}\) into the syndrome \(\mathrm{{s}} \in {[0:Q)^m}\), where \(m < n\). The decoder seeds the source nodes x according to the SI y, and runs the Q-ary BP algorithm to recover x. For the belief propagation between source nodes and syndrome nodes, we give the following definitions: \({\xi _i}(x)\) is the intrinsic pmf of source node \(x_i\); \({\zeta _i}(x)\) is the overall pmf of source node \(x_i\); \(r_{i,j}(x)\) is the pmf passed from source node \(x_i\) to syndrome node \(s_j\); and \(q_{j,i}(x)\) is the pmf passed from syndrome node \(s_j\) to source node \(x_i\), where \(j \in {\mathcal{M}_i}\) and \(i \in {\mathcal{N}_j}\). The encoding and decoding processes of Q-ary BP are stated in [6] and omitted in this paper.
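
As an illustration of the seeding step, each intrinsic pmf can be filled from (1)-(3) with one thread per source node. This is only a sketch: the kernel name seed_intrinsic, the row-major layout xi[i*Q + x], and the reuse of the hypothetical Lq_approx helper above are all our assumptions.

```cuda
// Hypothetical kernel: thread i seeds the intrinsic pmf xi_i(x) of source
// node x_i from SI symbol y_i and local scale parameter b_i, following (2).
__global__ void seed_intrinsic(double* xi, const double* y,
                               const double* b, int n, int Q) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  double L = Lq_approx(b[i], y[i], Q);  // normalizer from (3)
  for (int x = 0; x < Q; x++)
    xi[i * Q + x] = exp(-fabs((double)x - y[i]) / b[i]) / L;
}
```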

2.2 SWBP Algorithm

In Q-ary BP, the source nodes need to be seeded with the local scale parameters b of the virtual correlation channel. In [4] and [5], the parameters of the virtual correlation channel are estimated by the SWBP algorithm. In this paper, we will use the expected \(L_1\) distance between each source symbol and its corresponding SI symbol, defined as

$$\begin{aligned} {\mu _i} \buildrel \varDelta \over = \sum \limits _{x = 0}^{Q - 1} {({\zeta _i}(x) \cdot |x - {y_i}|)} \end{aligned}$$
(4)
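
For concreteness, (4) is a per-symbol dot product that parallelizes trivially over i. Below is a minimal CUDA sketch, assuming the overall pmfs \(\zeta _i\) are stored row-major in zeta[i*Q + x]; the kernel name and layout are ours.

```cuda
// Hypothetical kernel: thread i computes the expected L1 distance mu_i in (4)
// between source symbol x_i (with overall pmf zeta_i) and SI symbol y_i.
__global__ void expected_l1(double* mu, const double* zeta,
                            const double* y, int n, int Q) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  double s = 0.0;
  for (int x = 0; x < Q; x++)
    s += zeta[i * Q + x] * fabs((double)x - y[i]);
  mu[i] = s;
}
```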

Then, the estimated local scale parameter \(\hat{b}_i\) is calculated by averaging the expected \(L_1\) distances of the neighbors of symbol i within a window of size \((2\eta + 1)\):

$$\begin{aligned} {\hat{b}_i}(\eta ) = \frac{{{t_i}(\eta ) - {\mu _i}}}{{\min (i + \eta ,n) - \max (1,i - \eta )}} \end{aligned}$$
(5)

where

$$\begin{aligned} {t_i}(\eta ) \buildrel \varDelta \over = \sum \limits _{i' = \max (1,i - \eta )}^{\min (i + \eta ,n)} {{\mu _{i'}}} \end{aligned}$$
(6)

To calculate (6) efficiently, we first calculate \({t_1}(\eta )\mathrm{{ = }}\sum \limits _{i' = 1}^{1 + \eta } {{\mu _{i'}}} \). Then, for \(i \in [2:n]\),

$$\begin{aligned} {t_i}(\eta )\mathrm{{ = }}\left\{ \begin{array}{ll} {t_{i - 1}}(\eta ) + {\mu _{i + \eta }}, &{} i \in [2:(\eta + 1)] \\ {t_{i - 1}}(\eta ) + {\mu _{i + \eta }} - {\mu _{i - 1 - \eta }}, &{} i \in [(\eta + 2):(n - \eta )] \\ {t_{i - 1}}(\eta ) - {\mu _{i - 1 - \eta }}, &{} i \in [(n - \eta + 1):n] \\ \end{array} \right. \end{aligned}$$
(7)
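
To make the recursion concrete, the following host-side sketch evaluates (5)-(7) for one candidate \(\eta \) in O(n) time. The 1-based indices of (5)-(7) are mapped to 0-based arrays, and the function name window_means is our own.

```cuda
// Host-side sketch of (5)-(7): the sliding sums t_i are updated recursively,
// then b_hat_i = (t_i - mu_i) / (window size minus one), excluding mu_i itself.
void window_means(const double* mu, double* b_hat, int n, int eta) {
  double t = 0.0;
  for (int k = 0; k <= eta && k < n; k++) t += mu[k];  // t_1(eta) in (6)
  for (int i = 0; i < n; i++) {
    if (i > 0) {
      if (i + eta < n)      t += mu[i + eta];      // right edge enters window
      if (i - eta - 1 >= 0) t -= mu[i - eta - 1];  // left edge leaves window
    }
    int lo = (i - eta > 0) ? (i - eta) : 0;
    int hi = (i + eta < n - 1) ? (i + eta) : (n - 1);
    b_hat[i] = (t - mu[i]) / (double)(hi - lo);    // (5)
  }
}
```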

As in [4] and [5], the main purpose of SWBP is to find the best half window size \(\hat{\eta }\). We define an expected rate:

$$\begin{aligned} \gamma (\eta ) \buildrel \varDelta \over = \frac{1}{n}\sum \limits _{i = 1}^n {\left( {\frac{{{\mu _i}}}{{{\hat{b}_i}(\eta )}}{{\log }_2}e + {{\log }_2}{\mathcal{L}_Q}({\hat{b}_i}(\eta ),{y_i})} \right) } \end{aligned}$$
(8)

where \({\mathcal{L}_Q}({\hat{b}_i}(\eta ),{y_i})\) is defined by (3). The best half window size \(\hat{\eta }\) is chosen by

$$\begin{aligned} \hat{\eta }\mathrm{{ = }}\mathop {\arg \min }\limits _\eta \gamma (\eta ), \end{aligned}$$
(9)

It is a natural idea that the best half window size should minimize the expected rate. The flowchart of the Q-ary SWBP algorithm is illustrated in Fig. 1.

Fig. 1. Flowchart of Q-ary SWBP

Fig. 2. (a) Sequential SWBP algorithm, (b) parallel SWBP algorithm

3 Parallel SWBP Algorithm

In the sequential SWBP algorithm, each window size setup iteration generates an expected rate \(\gamma (\eta )\), which is calculated by (8). Any two expected rates \(\gamma ({\eta _1})\) and \(\gamma ({\eta _2})\) (\({\eta _1} \ne {\eta _2}\)) are uncorrelated, and can be computed in parallel. In our parallel algorithm, all \(\gamma (\eta )\), \(\eta \in \left\{ {1,2, \ldots ,\left\lfloor {\frac{{n - 1}}{2}} \right\rfloor } \right\} \), are calculated simultaneously by thousands of threads on the GPU, as sketched below. Once all \(\gamma (\eta )\) are obtained, we use the min() function in MATLAB to get the smallest \(\gamma \) and the corresponding best \(\eta \) from the array \(\gamma (\eta )\). The sequential and parallel algorithms are illustrated in Fig. 2.
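
A condensed CUDA sketch of this thread mapping follows: each thread owns one candidate \(\eta \), runs the O(n) recursion of (7) privately, and writes its \(\gamma (\eta )\) from (8) into a shared output array. The kernel name gamma_kernel and the reuse of the hypothetical Lq_approx helper are our assumptions, not the paper's released code.

```cuda
// Hypothetical kernel: thread (eta-1) computes gamma(eta) for one candidate
// half window size, so all gamma values are produced simultaneously.
__global__ void gamma_kernel(double* gamma, const double* mu,
                             const double* y, int n, int Q) {
  int eta = blockIdx.x * blockDim.x + threadIdx.x + 1;
  if (eta > (n - 1) / 2) return;
  const double LOG2E = 1.4426950408889634;     // log2(e)
  double t = 0.0;
  for (int k = 0; k <= eta; k++) t += mu[k];   // t_1(eta) in (6)
  double g = 0.0;
  for (int i = 0; i < n; i++) {
    if (i > 0) {                               // recursion (7), 0-based
      if (i + eta < n)      t += mu[i + eta];
      if (i - eta - 1 >= 0) t -= mu[i - eta - 1];
    }
    int lo = (i - eta > 0) ? (i - eta) : 0;
    int hi = (i + eta < n - 1) ? (i + eta) : (n - 1);
    double b = (t - mu[i]) / (double)(hi - lo);           // (5)
    g += mu[i] / b * LOG2E + log2(Lq_approx(b, y[i], Q)); // term of (8)
  }
  gamma[eta - 1] = g / n;
}
```

Each thread keeps its running sum in registers, so no synchronization is needed; the final argmin of (9) is then taken on the host, e.g. with MATLAB's min().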

4 Experiment Results

Fig. 3. Running time and speedup ratio under Q=256

In our experiments, we use an Intel Core i7 CPU running at 3.60 GHz and an NVIDIA GTX 1080Ti GPU. We use MATLAB 2014b as our development platform. We perform numerical experiments to evaluate the performance of our parallel algorithm.

We set Q=256 and use 4 different regular LDPC codes as our input. The parameters of these LDPC codes are listed in Table 1. To eliminate random errors, we perform 100 tests and average the outputs as our final results. The experimental results are illustrated in Fig. 3, which shows that the parallel Q-ary SWBP algorithm achieves a 2.9\(\times \) to 30.3\(\times \) speedup over the sequential Q-ary SWBP algorithm. The longer the codeword length, the higher the speedup ratio.

Table 1. Different LDPC code parameters (N is codeword length, K is information bit number)

5 Conclusion

We propose a parallel Q-ary SWBP algorithm to decode regular LDPC codes with different code lengths and Q values. This algorithm addresses the bottlenecks of the sequential Q-ary SWBP algorithm and can be accelerated by a GPU. Experimental results show that the parallel algorithm achieves a 2.9\(\times \) to 30.3\(\times \) speedup ratio under Q=256. Longer codeword lengths lead to higher speedup ratios.