1 Introduction

Forward error correction (FEC) has become a major component in digital communication systems, allowing error recovery at the decoder whenever a data stream undergoes channel impairments. Several capacity-approaching codes driving system performance close to the theoretical Shannon limit (Shannon 1948) have been designed and widely exploited, such as low-density parity-check (LDPC) codes (Gallager 1968), turbo codes (Berrou et al. 1993) and polar codes (Arikan 2009). Turbo and LDPC codes have been used in a wide range of standards and applications, such as the third (3G) and fourth (4G) generations of mobile communication systems (e.g. UMTS, LTE, LTE-A) and digital video broadcasting (DVB) (Douillard et al. 2000), whereas the more recent polar codes are being considered for the FEC module in the air interface design of next generation (5G) systems (HUAWEI 2015).

5G is expected to be an enabling technology for applications with massive connectivity demands (Tong et al. 2015), such as the Internet of Things (IoT), where large numbers of devices with multiple sensors and actuators exchange information and control commands, thus forming a wireless sensor network (WSN) (Molano et al. 2017; Ryoo et al. 2017). The large amounts of information exchanged in a WSN raise the problem of data compression, needed to alleviate bitrate and power requirements, alongside the need for channel coding to protect transmitted data from channel errors.

In WSNs, source and channel coding incur additional computational burden at the wireless sensors, where resources (i.e. power and memory) are scarce. With distributed source compression (DSC) and joint source-channel coding (JSCC) techniques, a channel code can be used for source compression as well as for error protection, shifting the computational load to the decoder side where sufficient resources can be found (e.g. a base station or relay node). The concept of DSC is based on the Slepian–Wolf theorem (Slepian and Wolf 1973): considering two correlated sources X and Y, both sources can be independently encoded and jointly decoded with the same compression efficiency as joint encoding and decoding. In other words, if both sources are jointly encoded and decoded, the best theoretically achievable compression is the joint entropy \(H(X,Y)\); on the other hand, if Y is compressed to its entropy H(Y), X can be independently compressed to the conditional entropy \(H(X\mid Y)\) provided that Y is available as side information for decoding X (i.e. joint decoding), thus resulting in the same overall compression since \(H(X,Y)=H(Y)+H(X\mid Y)\). Source correlation can be modeled as a noisy channel with one source (X) as the channel input and the other source (Y) as the output; channel coding techniques can thus be used to recover X by observing Y as a noisy version of X (Aaron et al. 2002; Ascenso and Pereira 2009; Farah et al. 2006; Liveris et al. 2002; Sartipi and Fekri 2005; Yaacoub et al. 2007, 2008, 2009). When a transmission channel is taken into account in a DSC application, the channel code used for forward error correction over the correlation channel can also provide error protection over the transmission channel, thus allowing for joint source and channel coding.
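As a simple illustration of the Slepian–Wolf bound (our example, not part of the cited works), consider a binary symmetric correlation model in which Y differs from X with crossover probability p; the achievable rate for X with Y at the decoder then reduces to the binary entropy h(p):

```python
import numpy as np

def binary_entropy(p):
    """h(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# X ~ Bernoulli(1/2), Y = X xor E with E ~ Bernoulli(p): a BSC correlation model.
p = 0.11
H_Y = 1.0                         # Y is also uniform, so H(Y) = 1 bit
H_X_given_Y = binary_entropy(p)   # Slepian-Wolf bound for X, ~0.5 bits here
H_XY = H_Y + H_X_given_Y          # joint entropy: same total rate as joint encoding
print(H_X_given_Y, H_XY)
```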

Turbo and LDPC codes have demonstrated superior performance in multiple DSC and JSCC applications, compared to other FEC codes. For instance, Farah et al. (2006) used non-binary turbo codes for the compression of correlated sources, and extended their study to the case of joint source-channel coding. Aaron et al. (2002) and Yaacoub et al. (2008, 2009) used turbo codes in a DSC approach for video compression referred to as distributed video coding (DVC). LDPC codes have been used for the compression of binary sources with side information at the decoder (Liveris et al. 2002) as well as for DVC (Ascenso and Pereira 2009). Different schemes for DSC or JSCC in wireless sensor networks have also been proposed based on turbo (Yaacoub et al. 2007) and LDPC (Sartipi and Fekri 2005) codes.

Since their invention by Arikan (2009), polar codes have been widely investigated in the literature. The idea behind polar codes is to create J new channels from J independent copies of a channel using a linear transformation, such that the new channels are polarized: data are transmitted over the synthesized good channels, whereas only known values (frozen bits, typically zeros) are sent over the bad channels, while the overall capacity is preserved. Recent studies (Iscan et al. 2016; Zhang et al. 2016) have demonstrated the superior performance of polar codes compared to LDPC and turbo codes in 5G test scenarios. In addition to an error correction capability that outperforms both turbo and LDPC codes for short and moderate block lengths (HUAWEI 2015), polar codes can be constructed and decoded using simple algorithms that are more computationally efficient than LDPC and turbo codecs, which makes them suitable for a wide range of applications, including DSC and JSCC in wireless sensor networks.
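The polarization effect is easy to visualize for a binary erasure channel (BEC), for which one polarization step maps erasure probability \(\epsilon\) to \(2\epsilon-\epsilon^2\) (degraded channel) and \(\epsilon^2\) (upgraded channel). The following sketch (our illustration) applies the recursion n times and shows the synthesized channels splitting into nearly perfect and nearly useless ones:

```python
import numpy as np

def polarize_bec(eps, n):
    """Erasure probabilities of the 2^n channels synthesized from BEC(eps)."""
    z = np.array([eps])
    for _ in range(n):
        z = np.concatenate([2 * z - z ** 2, z ** 2])  # degraded / upgraded halves
    return z

z = polarize_bec(0.5, 10)        # 1024 synthesized channels from BEC(0.5)
print(np.mean(z < 1e-3))         # fraction of nearly noiseless channels -> capacity 0.5
print(np.mean(z > 1 - 1e-3))     # fraction of nearly useless (frozen) channels
```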

The use of polar codes in DSC applications has been studied by Lv et al. (2013), Onay (2014), Trang et al. (2012) and Korada and Urbanke (2010). Yaacoub and Sarkis (2016) proposed using polar codes in their systematic form within the context of DSC, due to their superior error correction capability compared to non-systematic codes (Arikan 2011) and to their intuitive design approach for DSC; with systematic codes, only parity bits are transmitted to the decoder, where the missing systematic bits are replaced with the side information. This study was later extended (Yaacoub and Sarkis 2017) to the case of JSCC, where a Gaussian model was considered for source correlation and additive Gaussian noise for the transmission channel. This paper is an extended version of Yaacoub and Sarkis (2017) with more in-depth theoretical calculations, simulation scenarios and practical results. The remainder of this paper is organized as follows. In Sect. 2, a detailed description of the JSCC system model is presented in the context of WSN, along with a brief review of systematic polar encoding. Simulation scenarios are presented in Sect. 3 and practical results are then discussed. Finally, conclusions are drawn in Sect. 4 with a brief discussion of future work perspectives.

2 System description

Consider a network of wireless sensors observing a common source of information, and transmitting data to a relay node or a central base station for decoding. This model fits several practical scenarios where sensor data are correlated. For example, temperature measurements from different nodes in a dense region would be spatially and temporally correlated. Similarly, streams from multiple video sensors capturing the same scene from different views or angles (e.g. multiview video) would also be correlated. The proposed block diagram for JSCC in this context is shown in Fig. 1, where a network of 2 sensors is shown for simplicity. One of the sensors (sensor 2 in Fig. 1) applies conventional source and channel encoding (CSCE) techniques to transmit its observed data (Y) to the base station, where the corresponding conventional source and channel decoders (CSCD) reside. The other sensor (sensor 1) independently encodes its data (X) using a systematic polar encoder (SPE). At the output of the SPE, systematic bits are dropped and only parity bits are transmitted over a noisy channel to the base station. Besides providing error protection over the transmission link, the code also achieves compression whenever the number of parity bits does not exceed the number of input (or, equivalently, systematic) bits. A systematic polar decoder (SPD) at the base station uses the decoded source Y as a noisy version of the systematic data needed to recover X. With a larger number of sensors, only one would employ CSCE to provide initial side information for the others. If the number of sensors is very large, they could be grouped into clusters, where each cluster contains one sensor with CSCE. In the sequel, we only consider the case of two sensors; the generalization to an arbitrary number of sensors is straightforward.

Fig. 1
figure 1

Block diagram of the proposed JSCC system model

With conventional encoding, Y can be compressed to a rate close to its entropy bound H(Y) and correctly recovered at the decoder. This can be achieved using any entropy coding scheme, e.g. Huffman coding (Huffman 1952), with a suitable FEC code. As stated earlier, by exploiting the correlation between X and Y at the decoder, X can be compressed to a rate close to the conditional entropy \(H(X\mid Y)\), thus achieving stronger compression compared to H(X), which represents the achievable compression rate when Y is not exploited for decoding X. For an (M, K) SPE, compression is achieved when \(M<2K\), and the compression rate of X is defined as:

$$\begin{aligned} R=\frac{M-K}{K}=\frac{M}{K}-1. \end{aligned}$$
(1)

The case of a binary discrete memoryless source X with equally likely symbols is considered. A virtual channel is used to model the correlation between the sources, taking at its input the source X and giving Y at its output. As Y does not necessarily need to be discrete, a Gaussian correlation model is considered. After encoding X, systematic data {\(d_s\)} is dropped while parity bits {\(d_p\)} travel along with Y through an additive noise channel, as shown in the simplified model of Fig. 2.

Fig. 2
figure 2

Simplified JSCC system model

The correlation channel is modeled as a Gaussian channel. The binary source X is fed to a binary pulse amplitude modulator (B-PAM) that outputs rectangular pulses of duration \(T_b\) and amplitudes \(\pm \sqrt{E_b/T_b}\). Gaussian noise (N) is added to the transmitted pulse, and the channel output is then sampled to obtain the source Y. While source correlation models vary depending on the application (e.g. temperature measurement, multiview video, etc.), this correlation channel model (shown in Fig. 3) is borrowed from communications theory (Haykin 2001), where \(E_b\) represents the bit energy and N represents zero-mean additive white Gaussian noise (AWGN) with power spectral density \(N_0/2\). Therefore, the correlation between X and Y can be measured by the bit energy to noise density ratio \(E_b/N_0\) (the higher the ratio, the more correlated the sources).

Fig. 3
figure 3

Source correlation model
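For simulation purposes, this model reduces to the usual discrete-time equivalent of a B-PAM/AWGN link: assuming matched-filter sampling (an assumption on our part; the text only states that the channel output is sampled), the samples of Y are \(\pm\sqrt{E_b}\) plus zero-mean Gaussian noise of variance \(N_0/2\). A minimal sketch:

```python
import numpy as np

def correlated_side_info(x_bits, Eb, N0, rng=None):
    """Generate Y from the binary source X via the Gaussian correlation channel (Fig. 3).

    Discrete-time equivalent (assumed matched-filter sampling):
    signal points +/- sqrt(Eb), AWGN samples of variance N0/2.
    """
    rng = rng or np.random.default_rng()
    s = np.sqrt(Eb) * (2.0 * np.asarray(x_bits, dtype=float) - 1.0)  # B-PAM: 0 -> -sqrt(Eb), 1 -> +sqrt(Eb)
    return s + rng.normal(0.0, np.sqrt(N0 / 2.0), size=s.shape)

x = np.random.default_rng(0).integers(0, 2, size=8)
y = correlated_side_info(x, Eb=2.0, N0=1.0)  # Eb/N0 = 3 dB
```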

According to Slepian–Wolf theory, the lower bound for R is \(H(X\mid Y)\) which can be obtained by:

$$\begin{aligned} H\left( X|Y \right) =H\left( X \right) -I\left( X,Y \right) , \end{aligned}$$
(2)

where \(I(X,Y)\) is the mutual information between X and Y, and \(H(X)=1\) since X is a discrete binary memoryless source with equally probable symbols. On the other hand, the mutual information for our binary-input, Gaussian-output channel is obtained by:

$$\begin{aligned} I\left( X,Y \right) =\sum \limits _{x=0}^{1}{p_{x}}\int _{-\infty }^{+\infty }{f_{Y|X}}\left( y|x \right) {\log _{2}}\left( \frac{f_{Y|X}\left( y|x \right) }{\sum \limits _{x'=0}^{1}{p_{x'}}\,f_{Y|X}\left( y|x' \right) } \right) dy, \end{aligned}$$
(3)

where \(p_x\) is the probability of occurrence of x and \(f_{Y|X}(y|x)\) is the probability density function (PDF) of Y knowing X, i.e. the Gaussian noise PDF centered on the transmitted B-PAM amplitude. Additionally, Eq. (3) is upper-bounded by the correlation channel capacity \(I_{max}\):

$$\begin{aligned} {{I}_{\max }}=\frac{1}{2}{{\log }_{2}}\left( 1+\frac{{{E}_{b}}}{{{N}_{0}}} \right) . \end{aligned}$$
(4)

Therefore Eq. (2) can be estimated as:

$$\begin{aligned} H\left( X|Y \right) =1-\frac{1}{2}{{\log }_{2}}\left( 1+\frac{{{E}_{b}}}{{{N}_{0}}} \right) . \end{aligned}$$
(5)

However, for this estimation to fit the distributed compression model, Eq. (4) has to be clipped to unity so that the constraint \(0\le H\left( X|Y \right) \le H\left( X \right) \le 1\) is satisfied. In practice, there is always a gap, which depends on the code design, between the theoretical bound and the achievable rate. In the case of JSCC, additional redundancy bits are required to overcome channel impairments, and the gap towards \(H(X\mid Y)\) thus increases further.
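For reference, Eq. (3) can also be evaluated numerically instead of approximating \(I(X,Y)\) by \(I_{max}\); the sketch below (our illustration) integrates on a fine grid, with \(f_{Y|X}\) a Gaussian of variance \(N_0/2\) centered on \(\pm\sqrt{E_b}\):

```python
import numpy as np

def mi_binary_awgn(Eb, N0, y_max=30.0, num=60001):
    """Numerical evaluation of Eq. (3) for equiprobable binary input over AWGN."""
    sigma2 = N0 / 2.0
    y = np.linspace(-y_max, y_max, num)
    dy = y[1] - y[0]
    pdfs = [np.exp(-(y - m) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
            for m in (-np.sqrt(Eb), np.sqrt(Eb))]   # f_{Y|X}(y|0), f_{Y|X}(y|1)
    f_y = 0.5 * (pdfs[0] + pdfs[1])                 # marginal PDF of Y
    I = sum(0.5 * np.sum(f * np.log2(np.maximum(f, 1e-300) / np.maximum(f_y, 1e-300))) * dy
            for f in pdfs)
    return I                                        # always <= I_max of Eq. (4)

print(1.0 - mi_binary_awgn(Eb=2.0, N0=1.0))         # exact H(X|Y) at Eb/N0 = 3 dB
```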

To illustrate the effect of the transmission channel on the compression bound, let \(E_c/N_0\) be the transmitted bit energy to noise density ratio, and \(H_c(X\mid Y)\) the new bound; \(H_c(X\mid Y)\) depends on \(H(X\mid Y)\) and on the transmission channel capacity \(C_{trans}\) as:

$$\begin{aligned} {{H}_{c}}\left( X|Y \right) =\frac{H\left( X|Y \right) }{{{C}_{trans}}}=\frac{2H\left( X|Y \right) }{{{\log }_{2}}\left( 1+\frac{{{E}_{c}}}{{{N}_{0}}} \right) }. \end{aligned}$$
(6)

Figure 4 shows \(H(X\mid Y)\) for the case of DSC as well as \(H_c(X\mid Y)\) for the case of JSCC, for different values of \(E_c/N_0\). It can be observed that \(H_c(X\mid Y)\) approaches \(H(X\mid Y)\) for high values of \(E_c/N_0\). When transmission channel conditions worsen (i.e. \(E_c/N_0\) decreases), \(H_c(X\mid Y)\) increases and could reach values greater than unity. When \({{H}_{c}}\left( X|Y \right) \ge 1\), this bound indicates that compression cannot be achieved, i.e. the average number of transmitted bits per input bit should be greater than 1 in order to achieve an arbitrarily low bit error rate (BER).

Fig. 4
figure 4

Compression bound for different source correlation levels (\(E_b/N_0\)) and transmission channel conditions (\(E_c/N_0=1, 2, 3\), or 4 dB)
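The curves of Fig. 4 follow directly from Eqs. (5) and (6), with the clipping of \(I_{max}\) discussed above. A minimal sketch of the computation (our illustration):

```python
import numpy as np

def dsc_bound(EbN0_dB):
    """H(X|Y) per Eq. (5), with I_max of Eq. (4) clipped to 1 bit."""
    snr = 10.0 ** (np.asarray(EbN0_dB, dtype=float) / 10.0)
    return 1.0 - np.minimum(0.5 * np.log2(1.0 + snr), 1.0)

def jscc_bound(EbN0_dB, EcN0_dB):
    """H_c(X|Y) per Eq. (6): the DSC bound divided by the transmission channel capacity."""
    C_trans = 0.5 * np.log2(1.0 + 10.0 ** (EcN0_dB / 10.0))
    return dsc_bound(EbN0_dB) / C_trans

EbN0 = np.linspace(0.0, 6.0, 61)
for EcN0 in (1.0, 2.0, 3.0, 4.0):            # the channel conditions of Fig. 4
    Hc = jscc_bound(EbN0, EcN0)
    print(EcN0, Hc.max())                    # values >= 1 mean compression is impossible
```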

Different systematic polar encoders are considered in this study. For an (M, K) SPE, the number M of output bits is chosen as a power of 2 (i.e. \(M=2^n\), for \(n = 8, 10, 12, 14, 16\), or 18), whereas the compression rate is varied by varying the number K of input data bits for a given value of M.

Define \({{x}^{in}}=\left[ {{x}_{\left\{ i \right\} }},{{0}_{{{\left\{ i \right\} }^{c}}}} \right]\), a vector of M bits that includes \({{x}_{\left\{ i \right\} }}\), the K information bits of input vector x at positions defined by the set of indices \(\{i\}\), and \({{0}_{{{\left\{ i \right\} }^{c}}}}\), a set of M−K zeros at frozen bit indices \(\{i\}^c\). In a non-systematic polar encoder, the output codeword d is obtained by computing:

$$\begin{aligned} d={{x}^{in}}\cdot {{F}^{\otimes n}}, \end{aligned}$$
(7)

where \({{F}^{\otimes n}}\) is the Kronecker product of n copies of the kernel F defined as: \(F=\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\).
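A direct, non-optimized implementation of Eq. (7) is given below (our illustration; practical encoders use the \(O(M\log M)\) butterfly structure instead of a dense matrix):

```python
import numpy as np

def polar_generator(n):
    """F^{kron n}: Kronecker product of n copies of the kernel F over GF(2)."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    return G

def encode_nonsystematic(x_bits, info_idx, n):
    """Eq. (7): information bits at indices {i}, zeros (frozen bits) elsewhere."""
    x_in = np.zeros(2 ** n, dtype=int)
    x_in[info_idx] = x_bits
    return (x_in @ polar_generator(n)) % 2

# Example with M = 8, K = 4 and an illustrative reliability-ordered information set.
info_idx = np.array([3, 5, 6, 7])
d = encode_nonsystematic([1, 0, 1, 1], info_idx, n=3)
```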

In an SPE, the output codeword consists of systematic and parity bits such that \(d= \left[ {{d}_{\left\{ i \right\} }},{{d}_{{{\left\{ i \right\} }^{c}}}} \right]\), where the systematic part is \(d_s=d_{\{i\}}=x_{\{i\}}\) and the parity component is \(d_p = d_{\{i\}^c}\). Unlike in systematic linear block codes, the systematic bits of an SPE do not appear as the first K bits of the output codeword; rather, they appear at the information bit indices of the SPE output, and the parity bits are placed at the frozen bit indices of d. Given the information vector x, the output codeword of an SPE is the solution of:

$$\begin{aligned} d={z}\cdot {{F}^{\otimes n}}, \end{aligned}$$
(8)

where \(z=\left[ z_{\{i\}}, 0_{\{i\}^c}\right]\), with \(z_{\{i\}}\) and \(d_{\{i\}^c}\) being the unknowns. Algorithms for solving Eq. (8) were proposed by Vangala et al. (2016) along with their source code (Vangala et al. 2015), whereas successive cancellation (Vangala et al. 2016) is used at the decoder.
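Since F is its own inverse over GF(2), Eq. (8) can also be solved without explicit matrix inversion by applying the polar transform twice and zeroing the frozen positions in between. This two-pass method is valid when the information set is domination-contiguous, as is the case for standard polar constructions; the sketch below is our illustration, not the exact algorithm of Vangala et al. (2016):

```python
import numpy as np

def encode_systematic(x_bits, info_idx, n):
    """Solve Eq. (8): transform, zero the frozen positions, transform again.

    Returns d with d[info_idx] = x_bits (systematic part) and parity bits at
    the frozen indices. Assumes a domination-contiguous information set.
    """
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)                   # F^{kron n}, self-inverse over GF(2)
    v = np.zeros(2 ** n, dtype=int)
    v[info_idx] = x_bits
    z = (v @ G) % 2                          # first pass
    z[np.setdiff1d(np.arange(2 ** n), info_idx)] = 0   # enforce z = [z_{i}, 0_{i^c}]
    d = (z @ G) % 2                          # second pass
    assert np.array_equal(d[info_idx], np.asarray(x_bits))  # systematic property holds
    return d

d = encode_systematic([1, 0, 1, 1], np.array([3, 5, 6, 7]), n=3)
```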

3 Practical results

In our simulations, the case of DSC (i.e. a noiseless transmission channel) is considered first. Figure 5 shows the gap between the achievable compression rate and the theoretical compression bound, for a target bit error rate (BER) of \(10^{-6}\), using SPEs with different values of \(n\in \left\{ 8,10,12,14,16,18\right\}\). It can be observed that the rate curves behave comparably to the theoretical limit, except for \(n=16\) and \(n=18\), where the rate quickly converges towards \(H(X\mid Y)\) as the correlation between X and Y (i.e. the ratio \(E_b/N_0\)) increases. In general, the gap towards \(H(X\mid Y)\) decreases as \(E_b/N_0\) increases. For example, for \(n = 10\), the gap is reduced by 0.05 bits when \(E_b/N_0\) increases from 1.5 to 3 dB. On the other hand, these curves show that for a desired compression rate, a greater bit energy to correlation noise ratio is required as n decreases in order to achieve the target BER performance.

Fig. 5
figure 5

Achievable compression rate for a target BER of \(10^{-6}\) in case of DSC

In Fig. 6, the BER obtained with different SPEs is shown as a function of the conditional entropy \(H(X\mid Y)\) for a compression rate of 0.64 with different values of n. The dotted line represents the actual compression rate and the distance between this line and any data point represents the gap towards the compression bound for a given BER. For example, 0.15 bits (per input bit) are required in addition to \(H(X\mid Y)\) to achieve a target BER of \(10^{-8}\) for \(n = 18\). This gap increases to 0.34 bits for \(n = 12\). It can be clearly observed that for this compression rate, the BER curve corresponding to \(n = 18\) is the closest to the dotted line and thus, the SPE with \(n = 18\) achieves the best compression performance in this case. On the other hand, it can be noticed that in the region where \(H(X\mid Y)\) is greater than the actual compression rate (i.e. right side of the dotted line in Fig. 6), the BER quickly increases as expected, since \(H(X\mid Y)\) represents the minimum rate required to achieve low BER.

Fig. 6
figure 6

BER with respect to \(H(X\mid Y)\) obtained with a compression rate of 0.64 represented as a dotted line

In Fig. 7, \(H(X\mid Y)\) is fixed to 0.315 (dotted line) and the BER is measured for different compression rates. As expected, stronger compression results in increased BER, regardless of the value of n. By observing the system behavior for larger values of n (i.e. \(n = 14\), 16, and 18), it can be noticed that the BER increases with n for \(R \le 0.4\) (roughly). Beyond this threshold (i.e. for \(R>0.4\)), the BER drops sharply and decreases as n increases. This is because an arbitrarily low BER cannot be achieved as the rate approaches zero (very strong compression) in an (M, K) SPE with very large M. This contrasts with polar codes used in channel coding applications, where better performance is always obtained by increasing M, whose maximum value is only bounded by physical constraints (e.g. memory requirements).

Fig. 7
figure 7

BER with respect to compression rate obtained with \(H(X\mid Y)=0.315\) represented as a dotted line

After evaluating our SPE-based DSC system, we next study the case of JSCC, i.e. the influence of transmission channel errors on system performance. In a first scenario, the side information (Y) is assumed to be successfully recovered at the decoder using conventional source and channel coding techniques (i.e. referring to Fig. 2, Y is not affected by channel noise), whereas the SPE is jointly used for both compression and forward error correction, for the transmission and reconstruction of the source X. The correlation channel is the same as in the DSC case, whereas the transmitted symbol energy to noise density ratio (\(E_c/N_0\)) is varied on the parity transmission channel (i.e. the channel carrying parity bits) in order to analyze the JSCC system performance in terms of BER.
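In this scenario, the channel observations feeding the SPD can be assembled as one log-likelihood ratio (LLR) per codeword position: systematic positions take their LLRs from the side information Y through the correlation channel, and frozen (parity) positions from the received parity symbols. The mapping below is our sketch of this step (the exact LLR computation is not spelled out in the text); it assumes the standard B-PAM/AWGN LLR \(4\sqrt{E}\,y/N_0\), up to a sign convention fixed by the bit mapping:

```python
import numpy as np

def spd_channel_llrs(y_side, r_parity, info_idx, frozen_idx, Eb, N0_corr, Ec, N0_tr):
    """Assemble channel LLRs for the systematic polar decoder (SPD) of Fig. 2.

    Illustrative mapping: side information fills the dropped systematic slots,
    noisy received parity symbols fill the frozen slots.
    """
    llr = np.zeros(len(info_idx) + len(frozen_idx))
    llr[info_idx] = 4.0 * np.sqrt(Eb) * y_side / N0_corr    # correlation channel
    llr[frozen_idx] = 4.0 * np.sqrt(Ec) * r_parity / N0_tr  # parity transmission channel
    return llr  # input to the successive cancellation decoder
```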

Fig. 8
figure 8

BER for different source correlation levels and parity channel conditions obtained in case of JSCC, with a compression rate of 0.45 and \(n=12\)

Fig. 9
figure 9

BER for different source correlation levels and parity channel conditions obtained in case of JSCC, with a compression rate of 0.64 and \(n=12\)

Fig. 10
figure 10

BER for different source correlation levels and parity channel conditions obtained in case of JSCC, with a compression rate of 0.45 and \(n=16\)

Fig. 11
figure 11

BER for different source correlation levels and parity channel conditions obtained in case of JSCC, with a compression rate of 0.64 and \(n=16\)

We consider codes with \(M=2^{12}\) and \(M=2^{16}\), compression rates of 0.45 and 0.64, and \(E_c/N_0=1, 2, 3.5\), and 5 dB. The BER is measured in each case and results are reported in Figs. 8, 9, 10, and 11. In those figures, the BER is not represented as a function of \(H(X\mid Y)\) as in Fig. 6, since the uncertainty about the transmitted source increases due to noisy transmission and, consequently, \(H(X\mid Y)\) would not be the same. Therefore, we plot BER curves as a function of the correlation parameter \(E_b/N_0\), for different values of \(E_c/N_0\). Furthermore, we keep, for reference, the BER obtained in the case of DSC, represented as a dotted curve and labeled as noise-free on the figures. These reference lines represent the best achievable performance, obtained when no noise affects the transmission of source X. Two major observations can be made from Figs. 8 through 11.

The first observation is that for the same block length (i.e. fixing \(n=12\) or \(n=16\)), the stronger the compression, the less sensitive the system is to noise. For example, for \(n=12\) at \(E_b/N_0=0\) dB, the BER increases from roughly 0.005 to 0.1 (i.e. by a factor of 20) when \(E_c/N_0\) decreases from 5 to 1 dB for a compression rate of 0.45, whereas with a compression rate of 0.64, at the same source correlation level and the same noise levels, the BER increases by a factor of 100 (i.e. from \(10^{-4}\) to \(10^{-2}\)). Although the weaker compression (rate 0.64) yields lower absolute BER values, its BER grows faster with noise, indicating a higher sensitivity to channel impairments. Similarly, for \(n=16\) at \(E_b/N_0=0\,\hbox {dB}\), when \(E_c/N_0\) decreases from 5 to 1 dB, the BER increases by a factor of 100 for a compression rate of 0.45, and by a factor of 1000 for a compression rate of 0.64. Therefore, despite the fact that the error rate increases with stronger compression, one can conclude that for a fixed value of n, the BER increases with noise at a slower rate when stronger compression is applied.

The second observation is that for the same compression rate, the system is more sensitive to noise when the block length increases. Considering the same operating points discussed above, when the block length increases from \(n=12\) to \(n=16\), the BER degradation factor goes from 100 to 1000 for a compression rate of 0.64, and from 20 to 100 for a compression rate of 0.45. In other words, for a fixed compression rate, the BER increases faster with noise for larger values of n than for smaller blocks.

In contrast with the previous simulation setup, where the side information Y was assumed perfectly recovered at the receiver, we next consider that the decoder relies on a noisy version of Y for decoding X. With reference to the system model in Fig. 2, Y undergoes additive noise with the same statistical properties as the noise on the parity bits. Codes with \(M=2^{12}\) and \(M=2^{16}\) are considered with compression rates of 0.45 and 0.64, as in the previous scenario. The BER is measured in each case for \(E_c/N_0=2\), 3.5, 5, and 7 dB, and the results are shown in Figs. 12, 13, 14, and 15, where the dotted lines represent the best achievable performance when no noise is present on either the side information or the parity bits. Significant performance degradation can be observed compared to the case where Y is perfectly recovered. For example, for \(E_c/N_0=2\) dB and \(E_b/N_0=4\) dB, with \(n=12\) and a compression rate of 0.45, the BER obtained with noisy side information is about 200 times the BER obtained when Y is recovered error-free. For a compression rate of 0.64, \(n=16\), and \(E_b/N_0=4\) dB, \(E_c/N_0\) should be increased by 6 dB with noisy SI compared to the case of ideal SI in order to obtain the same BER. Similar observations can be made for different values of \(E_b/N_0\), \(E_c/N_0\), n and compression rate. On the other hand, in the case of a noiseless SI channel, system performance (in terms of BER) converges towards the DSC case when \(E_c/N_0\) approaches 5 dB, whereas with noisy SI, BER curves seem to reach an error floor with a significant gap towards the noise-free (DSC) scenario, even for values of \(E_c/N_0\) greater than 5 dB.

Fig. 12
figure 12

BER for different source correlation levels and parity channel conditions obtained in case of JSCC with noise on SI, a compression rate of 0.45 and \(n=12\)

Fig. 13
figure 13

BER for different source correlation levels and parity channel conditions obtained in case of JSCC with noise on SI, a compression rate of 0.64 and \(n=12\)

Fig. 14
figure 14

BER for different source correlation levels and parity channel conditions obtained in case of JSCC with noise on SI, a compression rate of 0.45 and \(n=16\)

Fig. 15
figure 15

BER for different source correlation levels and parity channel conditions obtained in case of JSCC with noise on SI, a compression rate of 0.64 and \(n=16\)

4 Conclusion

In this paper, we investigated the use of systematic polar codes for the joint source-channel coding of correlated sources, in the context of wireless sensor networks. A Gaussian model was considered to represent source correlation, and a Gaussian channel was considered for transmission. A simple scenario of two correlated sources was considered for simplicity, but the generalization to an arbitrary number of sources is straightforward. It was shown that a better error rate can be obtained with less compression and longer blocks, whereas the system with stronger compression and shorter blocks is more robust to degradation due to transmission channel impairments. It was also noticed that noise on the side information significantly degrades system performance, whereas when the side information is perfectly recovered at the receiver and only parity bits undergo channel impairments, adequate power management allows the system to overcome the effect of noise and perform similarly to the case of distributed source compression.

As for future work perspectives, we aim at considering more practical scenarios with large numbers of sensors, multiple relay nodes, and fading channels.