1 Introduction

In recent times, many upstream applications, such as wireless surveillance video and mobile cameras, have emerged wherein the size and battery life of the transmitting device are of paramount importance. For such applications, a low-complexity encoder is desirable at the expense of a highly complex decoder [12]. A new video coding paradigm, known as distributed video coding (DVC), has been proposed to fulfill this requirement [17]. In this video coding paradigm, the source statistics are exploited only at the decoder. DVC is a coding scheme that is not yet fully developed, but is now receiving more attention from the research community [14]. The theoretical foundation of DVC is provided by the Slepian-Wolf [29] and Wyner-Ziv [34] theorems. According to the Slepian-Wolf theorem, if two correlated sources X and Y are encoded separately and decoded jointly, we can achieve a minimum coding rate that is the same as that of joint encoding and joint decoding. Wyner and Ziv extended the Slepian-Wolf theorem to lossy coding with side information (SI) at the decoder. The Wyner-Ziv (WZ) coding technique has been widely used in DVC. Most of the DVC schemes developed in the literature are based on the Stanford architecture [1], which is one of the first practical implementations of DVC. In these DVC schemes, the video sequences are first divided into Groups of Pictures (GOPs). The first frame of each GOP, called a key frame, is encoded using a conventional intra-frame encoder, for example, H.264/AVC in the intra-mode. The remaining frames in a GOP, called WZ frames, are encoded using channel codes such as turbo codes or LDPC codes. In the transform-domain DVC [2], a 4 × 4 discrete cosine transform (DCT) is applied on each WZ frame and the corresponding coefficients are organized into 16 DCT coefficient bands X i , i = 1, 2, ... , 16. After quantizing each coefficient band into different levels depending on the target quality, bitplanes are extracted for each quantized coefficient band. 
Subsequently, each bitplane goes through a rate-adaptive channel encoder to produce error-correcting bits (parity bits or syndrome bits). These bits are stored in a buffer and transmitted incrementally upon a request by the decoder received through a feedback channel. At the decoder, after decoding the key frames, an SI frame, as an approximation to the current WZ frame, is usually generated by a motion-compensated interpolation or extrapolation of the decoded key frames and the previously decoded WZ frames. The side information is then used in the channel decoder along with the buffered error-correcting bits requested via the feedback channel, in order to decode the bitplanes corresponding to each DCT band of the current WZ frame and finally, to reconstruct the WZ frame. In DVC, the ability to model the statistical dependency between the WZ information and the SI generated at the decoder has a significant impact on the coding efficiency and the rate-distortion (RD) performance. The DISCOVER codec, a state-of-the-art and practical implementation of the transform-domain DVC based on the Stanford approach, was developed in 2007 by Artigas et al. [3]. Figure 1 shows the architecture of the DISCOVER codec. X W Z and X represent the WZ frame and its corresponding DCT coefficient bands respectively. X P and X F are two successive decoded key frames that are used for generating the side information frame Y S I . Moreover, Y is the DCT coefficient bands of the SI frame Y S I . In the DISCOVER codec, a Laplacian distribution is used at the decoder to model the correlation noise between each DCT band of the SI and the corresponding DCT band in the WZ frame. The Laplacian distribution parameter is estimated online for each DCT band, as well as for each of the coefficients in the various bands. 
The log-likelihood ratio (LLR) for each bit is calculated by considering the previously decoded bitplanes, the side information and the correlation noise model (a Laplacian distribution in the case of the DISCOVER codec). A rate-adaptive LDPC accumulate (LDPCA) codec is employed by DISCOVER. In the LDPCA decoder, the bitplanes are decoded from the most significant one to the least significant, based on the calculated intrinsic LLR values and the syndrome bits sent by the encoder. If the decoder fails to decode the current bitplane, it requests more syndrome bits through the feedback channel. After successfully decoding all the bitplanes associated with the DCT bands, the bitplanes are grouped together to form the corresponding quantized DCT coefficients. Then, the inverse DCT is applied to the quantized coefficients to construct the decoded WZ frame.
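As a concrete sketch of the encoder-side quantization and bitplane extraction just described, the following toy example quantizes a DCT coefficient band uniformly to 2β levels and splits the quantization indices into β bitplanes (the uniform quantizer design, the band values and β = 2 are assumptions for illustration only):

```python
import numpy as np

def extract_bitplanes(band, beta):
    """Uniformly quantize a DCT coefficient band to 2**beta levels and
    split the quantization indices into beta bitplanes, from the most
    significant (c = 0) to the least significant (c = beta - 1)."""
    lo, hi = band.min(), band.max()
    step = (hi - lo) / 2 ** beta
    # Quantization indices in {0, ..., 2**beta - 1}
    q = np.clip(((band - lo) / step).astype(int), 0, 2 ** beta - 1)
    # Bitplane c collects bit (beta - 1 - c) of every index
    planes = [(q >> (beta - 1 - c)) & 1 for c in range(beta)]
    return planes, q

band = np.array([3.1, -0.4, 7.9, 2.2])       # toy "DCT band"
planes, q = extract_bitplanes(band, beta=2)  # 2 bitplanes, 4 levels
```

Each of the β bitplanes is then fed to its own channel encoder; grouping the decoded bitplanes back together recovers the quantization indices, mirroring the reconstruction step at the decoder.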

Fig. 1

DISCOVER codec [3]

In DVC, the parameters of the correlation noise model should be estimated as accurately as possible to improve the overall RD performance and the coding efficiency.

The correlation noise parameters can be computed offline at the encoder using the original WZ frame and the estimated side information; however, the encoder has to extract the side information by using a motion estimation procedure, thus making it complex. The correlation noise parameters can also be calculated offline by employing training methods using several video sequences [2, 34]; however, in this case the same correlation noise model is used for the corresponding DCT bands of any given video sequence and the non-stationary behavior of the correlation noise is not taken into account.

These parameters can also be estimated online at the decoder without having access to the original WZ frames, a realistic solution that is used in practice. For pixel-domain WZ video coding, Brites et al. [7] have proposed several online schemes that make use of the temporal correlation between frames to estimate the correlation noise at different levels: frame, block and pixel. They estimate the correlation noise parameters at the block and pixel levels by using the spatial correlation within each frame, obtaining estimates that are more accurate than those from frame-level estimation. In 2008, Brites and Pereira extended the work on pixel-domain WZ video coding [7] to transform-domain WZ video coding by estimating the correlation noise model parameters at the DCT band level as well as at the coefficient level [6]. In 2009, Huang and Forchhammer [19] improved the method proposed in [6] by considering the cross-band correlation and using a classification map that is refined after each DCT band is decoded. Esmaili and Cosman [13, 14] proposed a method to estimate the correlation noise parameters by separating and classifying the blocks of each frame based on the quality and accuracy of the side information. After determining the class of each specific block, a Laplacian parameter value is assigned to each of the blocks using a lookup table. In some of the correlation noise estimation methods, the information from the previously-decoded bands is used to improve the decoding of the current band [15, 20, 27]. Thus, the previously decoded bands are used to improve the estimation of the correlation noise in the succeeding bands.

In most of the online methods [6, 7, 13–15, 19, 20, 27], the estimation process is performed before the Slepian-Wolf decoder starts to decode the bitplanes. Therefore, the estimated parameters of the correlation noise are held constant, that is, they are not modified during the decoding of each DCT band. However, the soft information of each bitplane corresponding to a DCT band is available after every iteration of the LDPC decoder. This information can be used at the decoder to estimate and refine the correlation noise parameters during the decoding process. In [24], a parallel LDPC decoding is used to decode and estimate the correlation noise parameters on a factor graph. In this algorithm, the non-stationary characteristic of the correlation noise across the DCT bands is not taken into account, and only one parameter is estimated for each DCT band. In [28] and [33], a particle filter-based message passing algorithm for decoding and adaptively estimating the correlation noise parameters has been proposed. As a stochastic method is used for the message passing, it may lead to unpredictable results; further, it is slow, as it requires a large number of iterations.

In this paper, in order to overcome the aforementioned limitations, a new message passing algorithm based on variational Bayes (VB) is proposed for decoding the bitplanes corresponding to each DCT band in a WZ frame, and simultaneously for estimating and refining the correlation noise parameter. In Section 2, parallel decoding of several bitplanes on an augmented factor graph, together with Bayesian estimation of the correlation noise parameter, is briefly reviewed. In Section 3, the proposed message passing algorithm on the augmented factor graph is presented. The variational Bayes method is employed to approximate the posterior distribution of the correlation noise parameter, which is used to derive closed-form expressions for the messages on the augmented factor graph. In Section 4, the performance of the proposed algorithm is studied in the framework of a DVC codec using several video sequences. The conclusion is given in Section 5.

2 Bayesian estimation of the correlation noise parameter in a parallel LDPCA decoder

As the correlation noise distribution in DVC is defined at a symbol or coefficient level, all the corresponding bitplanes are required to be available so that they can be decoded simultaneously on an augmented factor graph in order to estimate the parameter of the correlation noise [24, 31]. Therefore, a parallel LDPCA decoder is used. As a consequence, the cross correlation between the bitplanes is utilized to improve the decoding performance of DVC [24]. The parameters of the correlation noise distribution are unknown and need to be estimated dynamically and progressively during the decoding process. One way of estimating the unknown parameters ϕ is the maximum likelihood estimation (MLE) method, which seeks the parameters that maximize the likelihood function P(D|ϕ) given the observation D [10]. Maximum likelihood estimation has been used for estimating the channel and correlation noise parameters in distributed source coding (DSC) [32, 36] and DVC problems [11]. In [32], it is used to estimate the cross-over probability for binary symmetric channel (BSC) modeling of the channel in DSC. It has also been used for estimating the correlation noise parameter during the decoding process in DVC [11, 36]. One of the drawbacks of MLE is that the entire probability mass is used to assign probabilities to the observed data. Further, MLE performs poorly when the sample size is small. One way of overcoming these drawbacks is to add a prior distribution for ϕ, which allows us to adjust and control how the probability mass is distributed between the observed and unobserved data. Employing the Bayes rule, we can use such a prior distribution for ϕ so that the posterior distribution, conditioned on the data D, can be derived as: P(ϕ|D) = P(ϕ)P(D|ϕ)/Z, where Z is a normalization factor. In maximum a posteriori (MAP) estimation, we look for the parameters ϕ that maximize the posterior distribution P(ϕ|D). 
MLE and MAP are point estimation methods that yield fixed values for ϕ. Consequently, any information regarding the uncertainty of the parameters is not taken into account. To address this problem, Bayesian estimation is used. In this approach, all possible values of ϕ are considered by defining a probability distribution for ϕ. Hence, parameter estimation is equivalent to calculating the posterior distribution of ϕ. Moreover, Bayesian estimation performs better than MLE when the sample size is small.
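To make the contrast concrete, the sketch below estimates the precision θ of a Laplacian source from only five samples, first by MLE and then by Bayesian estimation with a conjugate gamma prior; the Bayesian route returns a full posterior Gamma(a, b) rather than a single point (the prior parameters, sample size and true precision are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0                                  # true Laplacian precision
d = rng.laplace(scale=1.0 / theta_true, size=5)   # small sample: only 5 values

# MLE: argmax of prod_i (theta/2) exp(-theta |d_i|)  =>  N / sum |d_i|
theta_mle = len(d) / np.abs(d).sum()

# Bayesian: a conjugate Gamma(a0, b0) prior on theta yields the posterior
# Gamma(a0 + N, b0 + sum |d_i|); its mean serves as a point summary, while
# the full distribution keeps the uncertainty that MLE and MAP discard.
a0, b0 = 2.0, 1.0                                 # illustrative prior parameters
a, b = a0 + len(d), b0 + np.abs(d).sum()
theta_bayes_mean = a / b
```

With so few samples, the prior pulls the Bayesian estimate toward a 0/b 0 and prevents the overconfidence that a pure MLE fit exhibits on small blocks.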

Suppose Y = {y 1, y 2, ... , y N }, N being the number of 4 × 4 blocks in the frame, is a DCT coefficient band obtained by grouping the DCT coefficients of the side information frame, and X = {x 1, x 2, ... , x N } is the corresponding DCT coefficient band of the current WZ frame, quantized uniformly to 2β levels, where β is the number of bitplanes extracted from the quantized symbols of the DCT coefficient band X and decoded jointly using the LDPCA decoders.

To take into account the non-stationary characteristic of the correlation noise in each DCT coefficient band in the DVC problem, a parameter θ j is assigned to each block of DCT coefficients of length M, where j = 1, 2, ... , N/M [33]. As M is selected to be relatively small, Bayesian estimation is preferred for estimating the parameter θ j . Considering only the j-th block of DCT coefficients, the posterior distribution of the parameter θ j given \(Y_{j}=\{{y^{j}_{1}},{y^{j}_{2}}, ... , {y^{j}_{M}}\}\), the M DCT coefficients in the corresponding DCT band Y of the generated side information frame, can be written as [28]

$$ P(\theta_{j}|Y_{j})=\frac{1}{L_{j}}p(\theta_{j})\prod\limits_{i=1}^{M} P({y_{i}^{j}}|\theta_{j}) $$
(1)

where L j is a normalization factor. Replacing \(P({y_{i}^{j}}|\theta _{j})\) by \({\sum }_{{x_{i}^{j}}} P({y_{i}^{j}},{x_{i}^{j}}|\theta _{j})\) where \({x_{i}^{j}}\) is the coefficient in the DCT band of the WZ frame corresponding to \({y_{i}^{j}}\), (1) gets transformed to

$$ \begin{array}{lllllll} P(\theta_{j}|Y_{j}) &=\frac{1}{L_{j}}P(\theta_{j}){\prod}_{i=1}^{M} \sum_{{x_{i}^{j}}} P({y_{i}^{j}},{x_{i}^{j}}|\theta_{j})\\ &=\frac{1}{L_{j}}P(\theta_{j}){\prod}_{i=1}^{M} \sum_{{x_{i}^{j}}} P({y_{i}^{j}}|{x_{i}^{j}},\theta_{j})P({x_{i}^{j}}) \end{array} $$
(2)

where the summation is over all the values that \({x_{i}^{j}}\) can take. To find the posterior distribution, the corresponding factor graph [23] is first obtained. In the factor graph, a message along the edge from node a to node b is represented by μ ab . The likelihood function \(P({y_{i}^{j}}|{x_{i}^{j}},\theta _{j})\) in (2) is represented by the factor node \({f_{i}^{j}}({y_{i}^{j}},{x_{i}^{j}},\theta _{j})\) in the factor graph, while the prior distribution for \({x_{i}^{j}}\), \(P({x_{i}^{j}})\) by the message \(\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}})\) from the variable node \({x_{i}^{j}}\) to the factor node \({f_{i}^{j}}\). As a consequence, the posterior distribution (2) can be rewritten as

$$ P(\theta_{j}|Y_{j}) =\frac{1}{L_{j}}P(\theta_{j})\prod\limits_{i=1}^{M} \sum\limits_{{x_{i}^{j}}} {f_{i}^{j}}({y_{i}^{j}},{x_{i}^{j}},\theta_{j})\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}) $$
(3)

We can identify the sum \({S_{i}^{j}}={\sum }_{{x_{i}^{j}}} {f_{i}^{j}}({y_{i}^{j}},{x_{i}^{j}},\theta _{j})\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}})\) to be the output message \(\mu _{{f_{i}^{j}}\rightarrow \theta _{j}}(\theta _{j})\) going from the factor node \({f_{i}^{j}}({y_{i}^{j}},{x_{i}^{j}},\theta _{j})\) to the variable node θ j in the factor graph shown in Fig. 2.

Fig. 2

Factor graph whose output message \(\mu _{{f_{i}^{j}}\rightarrow \theta _{j}}(\theta _{j})\) is the sum \({S_{i}^{j}}\)

Therefore, the posterior distribution in (3) can be written as

$$ P(\theta_{j}|Y_{j}) =\frac{1}{L_{j}}P(\theta_{j})\prod\limits_{i=1}^{M} \mu_{{f_{i}^{j}}\rightarrow \theta_{j}}(\theta_{j}) $$
(4)

We now introduce a factor node g j so that the prior distribution of θ j , P(θ j ), can be denoted by the message \(\mu _{g_{j}\rightarrow \theta _{j}}(\theta _{j})\). As a consequence, (4) may be rewritten as

$$ P(\theta_{j}|Y_{j}) =\frac{1}{L_{j}}\mu_{g_{j}\rightarrow \theta_{j}}(\theta_{j})\prod\limits_{i=1}^{M} \mu_{{f_{i}^{j}}\rightarrow \theta_{j}}(\theta_{j}) $$
(5)

Without loss of generality, we assume that the above equation is normalized so that the posterior distribution in (5) may be written as

$$ P(\theta_{j}|Y_{j}) =\mu_{g_{j}\rightarrow \theta_{j}}(\theta_{j})\prod\limits_{i=1}^{M} \mu_{{f_{i}^{j}}\rightarrow \theta_{j}}(\theta_{j}) $$
(6)

The expression in (6) shows that the posterior distribution of θ j given Y j can be calculated as the product of all the M incoming messages from the factor nodes \({f_{i}^{j}}\), i = 1, 2, ... , M to variable node θ j and the message \(\mu _{g_{j}\rightarrow \theta _{j}}\) coming from the factor node g j . Hence, the posterior distribution, P(θ j |Y j ) given by (6) can be represented by the factor graph shown in Fig. 3.

Fig. 3

Factor graph representing the posterior distribution P(θ j |Y j ) given by (6)

After using the 2β-level quantizer for the DCT coefficient band X, the quantization indices of that DCT band turn into β bitplanes B c = {b 1c , b 2c , ..., b N c }, c = 1, 2, ... , β. Here, β LDPCA decoders are used in parallel to decode all the bitplanes B c . The belief propagation (BP) decoding algorithm [35], a well-known iterative decoding algorithm, is used on the factor graphs of the LDPCA decoders to obtain the log-likelihood ratio (LLR) for each bit b i c in the bitplane B c . The factor graphs of each of the LDPCA decoders used for decoding the bitplanes are augmented by the factor graph shown in Fig. 3 representing P(θ j |Y j ) for j = 1, 2, ... , N/M. The augmented LDPCA decoder so obtained is shown in Fig. 4.

Fig. 4

The augmented decoder

The boxes in Fig. 4 represent the LDPCA decoder graphs constructed for simultaneous decoding of the various bitplanes. The LDPC decoder graph for each bitplane B c consists of N source nodes and T c syndrome nodes corresponding to the T c accumulated syndrome bits sent to the decoder, as shown in Fig. 5. It should be noted that at the encoder, N accumulated syndrome bits are produced for each bitplane according to the LDPC encoder graph structure and the concatenated accumulator, as explained in [30]. These N accumulated syndrome bits are stored in a buffer and sent to the decoder incrementally at the request of the decoder. Based on the number of accumulated syndrome bits T c (T c < N) received at the decoder for each bitplane, the LDPCA decoder graph for that bitplane gets updated.
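The incremental syndrome-request protocol described above can be sketched as follows; `try_decode` is a hypothetical stand-in for one run of the belief-propagation decoder, and the request step size is an assumption of this sketch:

```python
def decode_bitplane(syndromes, try_decode, step=8):
    """Rate-adaptive LDPCA sketch: start with `step` accumulated syndrome
    bits and, on every decoding failure, request `step` more through the
    feedback channel.  `try_decode(received)` returns the decoded bitplane
    or None on failure; it stands in for the BP decoder."""
    t = step
    while t <= len(syndromes):
        result = try_decode(syndromes[:t])   # T_c bits received so far
        if result is not None:
            return result, t                 # success at rate T_c = t
        t += step                            # feedback request for more bits
    return None, len(syndromes)              # worst case: all N bits used
```

For instance, with a decoder that happens to succeed once 24 syndrome bits are available, the loop stops after three requests, which is exactly how the LDPCA scheme adapts the rate to the quality of the side information.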

Fig. 5

LDPCA decoder graph for the bitplane B c

Details of the Block j in Fig. 4 are shown in Fig. 6, where \(b_{ic}^{j}\), c = 1, 2, ... , β, represents the c-th bit corresponding to the quantized symbol of DCT coefficient \({x_{i}^{j}}\). The message \(\mu _{b_{ic}^{j}\rightarrow \ {x_{i}^{j}}}\) is calculated from LLR of the bit \(b_{ic}^{j}\) obtained using the BP algorithm that passes the messages back and forth between the source and syndrome nodes in the LDPCA decoder graph for B c using (2) and (3) of [30]. Hence, the message \(\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}\) is obtained as the product of the messages \(\mu _{b_{ic}^{j}\rightarrow \ {x_{i}^{j}}}\), c = 1, 2, ... , β.

Fig. 6

Factor graph of block j
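The formation of \(\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}\) as the product of the per-bitplane messages can be sketched as below for a single coefficient; the LLR sign convention (L = ln p(b = 0)/p(b = 1)) and the toy LLR values are assumptions of this sketch:

```python
import numpy as np

def symbol_message(llrs):
    """Combine the per-bitplane LLRs of one coefficient into the message
    mu_{x -> f} over the K = 2**beta quantized symbols: the probability of
    each symbol is the product of the probabilities of its beta bits."""
    beta = len(llrs)
    p1 = 1.0 / (1.0 + np.exp(llrs))          # p(b = 1) from each LLR
    K = 2 ** beta
    msg = np.ones(K)
    for k in range(K):
        for c in range(beta):                # bitplane c = 0 holds the MSB
            bit = (k >> (beta - 1 - c)) & 1
            msg[k] *= p1[c] if bit else 1.0 - p1[c]
    return msg / msg.sum()

msg = symbol_message(np.array([2.0, -1.0]))  # beta = 2  ->  K = 4 symbols
```

Here the positive MSB LLR favors bit 0 and the negative LSB LLR favors bit 1, so the symbol with binary index 01 receives the largest probability.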

It is prohibitively expensive to calculate the posterior distribution P(θ j |Y j ) as given by (5). Moreover, we need a simple, closed-form distribution for P(θ j |Y j ) to derive the messages for the message passing algorithm. In view of these requirements, the posterior distribution P(θ j |Y j ) needs to be approximated by a simple distribution, such as one from the exponential family.

3 New decoding algorithm based on VB

In this section, a new recursive message passing algorithm is proposed to decode all the bitplanes corresponding to each of the DCT bands. The proposed recursive algorithm consists of three main modules:

  1. VB algorithm to approximate the posterior distribution

  2. Message update

  3. Parallel LDPCA decoding process.

These three modules are explained in detail in Sections 3.1, 3.2 and 3.3, respectively, and the complete recursive algorithm is given in Section 3.4.

3.1 VB algorithm to approximate the posterior distribution

It was seen in Section 2 that the posterior distribution P(θ j |Y j ) given by (3) consists of \(2^{\beta M}\) terms and further, that it does not have a closed form. Hence, the calculation of the posterior distribution is extremely expensive. Sampling or particle methods, such as the Markov Chain Monte Carlo (MCMC) method, are frequently used to approximate the posterior distribution [18, 33]. These methods are stochastic approximation methods [9] and have high computational costs. In addition, the results of any of these methods vary from run to run on the same data set. Another class of approximation methods is deterministic in nature and is much faster than the sampling methods. The main idea behind the deterministic methods is to find a distribution that is as close as possible to the true posterior distribution. Variational Bayes is a well-known deterministic method that is used to approximate the true posterior distribution [5, 16].

In a general Bayesian problem, one of the objectives is to find P(Z|X), where Z denotes all the unknown parameters and the hidden variables and X the observed variables. Since the exact calculation of P(Z|X) is prohibitively expensive, it is necessary to find an approximation for P(Z|X). It is known that for a given distribution q(Z), the log marginal probability of X can be decomposed as [16]

$$ ln P(X) = l(q)+KL(q\mid \mid p) $$
(7)

where

$$\begin{array}{@{}rcl@{}} l(q) &=&\int q(Z)ln\left\{ \frac{P(X,Z)}{q(Z)}\right\} dZ \end{array} $$
(8a)
$$\begin{array}{@{}rcl@{}} KL(q\mid\mid p) &=&\int q(Z)ln\left\{ \frac{q(Z)}{P(Z\mid X)}\right\} dZ \end{array} $$
(8b)

KL(q∣∣p) being the Kullback-Leibler (KL) divergence that quantifies the similarity between the two distributions q(Z) and P(Z∣X), and l(q) being the lower bound for ln P(X). In order for q(Z) to be an approximation of P(Z∣X) and at the same time be a tractable distribution, a restricted family of distributions is considered for q(Z). In fact, we try to restrict q(Z) to be a tractable distribution that is flexible enough to provide a proper approximation to the true posterior distribution. Then, the member of this distribution family is found for which the KL divergence in (8b) is minimized; this is equivalent to maximizing the lower bound l(q) with respect to q(Z).
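The decomposition (7) can be verified numerically on a toy discrete model (the joint-probability values and the trial distribution below are arbitrary assumptions); for any trial q(Z), the bound l(q) and the KL term always sum to ln P(X):

```python
import numpy as np

# Toy model: one observed value x, hidden Z in {0, 1}.
joint = np.array([0.3, 0.1])      # P(X = x, Z = 0) and P(X = x, Z = 1)
pX = joint.sum()                  # evidence P(X = x)
post = joint / pX                 # true posterior P(Z | X = x)

q = np.array([0.6, 0.4])          # an arbitrary trial distribution q(Z)
lower_bound = np.sum(q * np.log(joint / q))   # l(q) of eq. (8a)
kl = np.sum(q * np.log(q / post))             # KL(q || p) of eq. (8b)
# Decomposition (7): ln P(X) = l(q) + KL(q || p) holds for any q
```

Since ln P(X) is fixed by the data, pushing l(q) up is the same as pushing KL(q∣∣p) down, which is the mechanism the VB updates of this section exploit.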

Suppose the elements of Z are partitioned into S disjoint subsets Z n , n = 1, 2, ... , S. Then, we assume that q(Z) can be factorized as [16]

$$ q(Z)=\prod\limits_{n=1}^{S} q_{n}(Z_{n}) $$
(9)

The objective is to find the distribution q(Z) that leads to the largest lower bound l(q). As shown in [23], the variational optimization of l(q) with respect to the m-th factor, q m (Z m ), can be obtained using

$$ ln q_{m}(Z_{m})=E_{n\neq m}\left[ln\, P(X,\, Z)\right]+C \quad \quad 1\leq m\leq S $$
(10)

where C is a constant, and E n≠m [ln P(X, Z)] = \(\int ln\, P(X,Z){\prod }_{n\neq m} q_{n}(Z_{n})\, dZ_{n}\).

Equation (10) represents the conditions that maximize the lower bound l(q) and consequently, minimize the KL divergence with respect to the m-th factor, q m (Z m ). Solving (10) for q m (Z m ), m = 1, 2, ... , S, leads to the distribution q(Z) as an approximation to the posterior distribution P(Z∣X).

The above method is used in our proposed algorithm to approximate the posterior distribution P(θ j |Y j ) derived in Section 2 and consequently, to simplify the message structure on the augmented LDPC decoder in the j-th block illustrated in Fig. 6. In order to use the variational Bayes method, we introduce a set of hidden variables \(H_{j}=\{{h_{1}^{j}},{h_{2}^{j}}, ..., {h_{M}^{j}}\}\), where each \({h_{i}^{j}}\), i = 1, 2, ... , M is a K-length vector (K = 2β). Let Z = {H j , θ j }, where θ j is an unknown parameter. Hence, the variational factorization given by (9) can be performed for S = 2 as follows, by letting Z 1 = H j and Z 2 = θ j in (9)

$$ q(Z)=q(H_{j},\theta_{j})=q_{1}(Z_{1})q_{2}(Z_{2})=q_{1}(H_{j})q_{2}(\theta_{j}) $$
(11)

where q 2(θ j ) is the variational approximation for P(θ j |Y j ). After the factorization, the optimization (10) is applied to both factors, with the observed variables X in (10) taken to be the side information Y j in our problem. Hence, the two equations of the VB algorithm can be written as

$$\begin{array}{@{}rcl@{}} ln q_{1}(H_{j}) &=& E_{\theta_{j}} [ln\, P(Y_{j}, H_{j}, \theta_{j})]+C_{1} \end{array} $$
(12a)
$$\begin{array}{@{}rcl@{}} ln q_{2}(\theta_{j}) &=& E_{H_{j}} [ln\, P(Y_{j}, H_{j}, \theta_{j})]+C_{2} \end{array} $$
(12b)

where the joint distribution P(Y j , H j , θ j ) in (12a) and (12b) can be written as

$$ P(Y_{j}, H_{j},\theta_{j})= P(\theta_{j})P(Y_{j}| H_{j},\theta_{j})P(H_{j})=P(\theta_{j})\prod\limits_{i=1}^{M} P({y_{i}^{j}}| {h_{i}^{j}},\theta_{j})P({h_{i}^{j}}) $$
(13)

To determine q 1(H j ) and q 2(θ j ) from (12a) and (12b), we first need to find an expression for P(Y j , H j , θ j ) in (13).

For each WZ frame in the encoder, all of the coefficients in a specific DCT coefficient band have been uniformly quantized to K = 2β levels to generate the quantized symbols. At the decoder, since the DCT coefficients of the WZ frame, \({x_{i}^{j}}\)s, are not available, we use a partially decoded coefficient obtained by minimum mean square error (MMSE) reconstruction \( w^{j}_{ik}=E({x_{i}^{j}}|{y_{i}^{j}}, I_{k},\lambda _{j})\), where k = 1, 2, ... , K, I k is the k th quantization interval and λ j is the initial value for the parameter of the correlation noise distribution. In view of this, for each side information DCT coefficient \({y_{i}^{j}}\) extracted in the decoder, a hidden variable vector \({h_{i}^{j}}\) is considered as a K-length binary vector with elements \(h_{i1}^{j}, h_{i2}^{j}, ... , h_{iK}^{j}\). This vector has only one element equal to 1 and the rest are all zero. For each observation \({y_{i}^{j}}\), the position of the 1 in the vector \({h_{i}^{j}}\) is determined by the quantization interval index (quantized symbol), so that if \({x_{i}^{j}} \in I_{d}\), 1 ≤ d ≤ K, only the d th element of the vector \({h_{i}^{j}}\) is 1, i.e., \({h_{i}^{j}}=[0, 0, ..., 1, 0,0, ..., 0]\). By considering this feature of the hidden variable vectors \({h_{i}^{j}}\), \(P({y_{i}^{j}} | {h_{i}^{j}},\theta _{j})\) and \(P({h_{i}^{j}})\) can be written as \(P({y_{i}^{j}} | {h_{i}^{j}},\theta _{j})={\prod }_{k=1}^{K} P({y_{i}^{j}} | {x_{i}^{j}}= w^{j}_{ik},\theta _{j})^{h_{ik}^{j}}\) and \(P({h_{i}^{j}})={\prod }_{k=1}^{K} (\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik}))^{h_{ik}^{j}}\), respectively. Using the expressions for \(P({h_{i}^{j}})\) and \(P({y_{i}^{j}} | {h_{i}^{j}},\theta _{j})\), (12a) can be rewritten as

$$\begin{array}{@{}rcl@{}} ln q_{1}(H_{j}) &=& E_{\theta_{j}} [ln P(Y_{j} | H_{j}, \theta_{j})+ln P(H_{j})] + C_{1} \\ &=& E_{\theta_{j}} \left[ln \prod\limits_{i=1}^{M}\prod\limits_{k=1}^{K} (P({y_{i}^{j}} | {x_{i}^{j}}=w^{j}_{ik},\theta_{j}))^{h_{ik}^{j}}\right.\\&&\left.+ln \prod\limits_{i=1}^{M}\prod\limits_{k=1}^{K}(\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik}))^{h_{ik}^{j}}\right]+ C_{1} \\ &=& E_{\theta_{j}} \left[ \sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} h_{ik}^{j}\{ln(P({y_{i}^{j}} | {x_{i}^{j}}=w^{j}_{ik},\theta_{j}))\right.\\&&\left.+ln (\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik}))\}\right]+ C_{1} \end{array} $$
(14)

Similarly, (12b) can be rewritten as

$$\begin{array}{@{}rcl@{}} ln q_{2}(\theta_{j}) & =& E_{H_{j}} [ln P(Y_{j} | H_{j}, \theta_{j})+ln P(\theta_{j})] + C_{2} \\ &=& E_{H_{j}} \left[ \sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} h_{ik}^{j} \{ln(P({y_{i}^{j}} | {x_{i}^{j}} =w^{j}_{ik},\theta_{j}))\}\right]+ ln P(\theta_{j}) + C_{2} \end{array} $$
(15)
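The MMSE reconstruction points \(w^{j}_{ik}=E({x_{i}^{j}}|{y_{i}^{j}}, I_{k},\lambda _{j})\) introduced above can be computed numerically as sketched below; the Laplacian noise model, the interval endpoints and the λ value are illustrative assumptions of this sketch:

```python
import numpy as np

def mmse_reconstruction(y, intervals, lam, n=2000):
    """Partially decoded coefficients w_k = E[x | y, I_k, lambda] under a
    Laplacian correlation-noise model p(x | y) ~ (lam/2) exp(-lam |x - y|),
    evaluated by numerical integration over each quantization interval I_k."""
    w = []
    for lo, hi in intervals:
        x = np.linspace(lo, hi, n)
        p = np.exp(-lam * np.abs(x - y))      # unnormalized density around y
        w.append(np.sum(x * p) / np.sum(p))   # E[x | y, x in I_k]
    return np.array(w)

intervals = [(-8, -4), (-4, 0), (0, 4), (4, 8)]        # K = 4 toy intervals
w = mmse_reconstruction(y=1.0, intervals=intervals, lam=0.5)
```

Each w k is pulled from the midpoint of its interval toward the side information value y, which is what gives the reconstruction its MMSE character.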

In distributed video coding, the correlation noise, which is the difference between each DCT coefficient band of the WZ frame and the corresponding one in the side information frame, is often modeled by a Gaussian [25] or a Laplacian [22] distribution. In the following subsections, we consider Gaussian and Laplacian distributions for the correlation noise model to solve (14) and (15) simultaneously in order to find q 2(θ j ) as an approximation to P(θ j |Y j ).

  • 1) Gaussian distribution for the correlation noise model

Assuming Gaussian distribution for the correlation noise, we can express the probability \(P({y_{i}^{j}} | {x_{i}^{j}} =w^{j}_{ik},\theta _{j})\) in (14) and (15) as

$$ P({y_{i}^{j}} | {x_{i}^{j}} =w^{j}_{ik},\theta_{j})=\frac{\theta_{j}^{\frac{1}{2}}}{\sqrt{2\pi}}e^{\frac{-({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\theta_{j}} $$
(16)

Substituting (16) in (14) and after some simplification, it can be shown that,

$$\begin{array}{@{}rcl@{}} ln q_{1}(H_{j})&=&E_{\theta_{j}} \left[{\sum}_{i=1}^{M}{\sum}_{k=1}^{K} h_{ik}^{j} \left\{ ln \left( \frac{\theta_{j}^{\frac{1}{2}}}{\sqrt{2\pi}}e^{\frac{-({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\theta_{j}}\right)\right.\right.\\&&\left.\left.+ln (\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})) \right\}\right] + C_{1} \\ &=&E_{\theta_{j}} \left[{\sum}_{i=1}^{M}{\sum}_{k=1}^{K} h_{ik}^{j} \left\{ \frac{1}{2}ln\theta_{j}+\frac{1}{2}ln\frac{1}{2\pi}-\frac{({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\theta_{j}\right.\right.\\&&\left.\left.+ln (\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})) \right\}\right]+ C_{1} \\ &=&{\sum}_{i=1}^{M}{\sum}_{k=1}^{K} h_{ik}^{j} ln\rho_{ik}+C_{1} \end{array} $$
(17)

where

$$ ln\rho_{ik}=\frac{1}{2}E_{\theta_{j}}\left[ln\theta_{j}\right]-\frac{1}{2}ln2\pi-E_{\theta_{j}}\left[\frac{({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\theta_{j}\right]+ln\, \mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik}) $$
(18)

Let the normalized value of ρ i k be denoted by r i k . Then,

$$ r_{ik} =\rho_{ik}/ \sum\limits_{k=1}^{K} \rho_{ik} $$
(19)

From (17), after normalization using (19), it can be concluded that

$$ q_{1}(H_{j})=\prod\limits_{i=1}^{M} \prod\limits_{k=1}^{K} r_{ik}^{h_{ik}^{j}} $$
(20)

Also, the update (15) for q 2(θ j ) can be obtained as follows

$$\begin{array}{@{}rcl@{}} ln q_{2}(\theta_{j}) & =& E_{H_{j}} \left[\sum\limits_{i=1}^{M} \sum\limits_{k=1}^{K} h_{ik}^{j} \left\{ ln P({y_{i}^{j}} | {x_{i}^{j}} =w^{j}_{ik},\theta_{j})\right\}\right]+ ln P(\theta_{j}) + C_{2} \\ & =& \sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} \frac{1}{2}E_{h_{ik}^{j}}[h_{ik}^{j}]ln (\theta_{j}) - \sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} E_{h_{ik}^{j}}[h_{ik}^{j}]\frac{({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\theta_{j}\\&&+ln P(\theta_{j}) + C_{2} \end{array} $$
(21)

As \(E_{h_{ik}^{j}}[h_{ik}^{j}]=p(h_{ik}^{j} =1)=r_{ik} \), (21) can be rewritten as

$$ ln q_{2}(\theta_{j})=\sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} \frac{1}{2} r_{ik} ln (\theta_{j}) - \sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} \frac{({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\theta_{j}+ln P(\theta_{j}) + C_{2} $$
(22)

If the prior distribution P(θ j ) is taken to be a gamma distribution with parameters a 0 and b 0, that is,

$$ P(\theta_{j})= Gamma(\theta_{j}|a_{0}, b_{0} )= \frac{1}{\Gamma(a_{0})} b_{0}^{a_{0}} \theta_{j}^{a_{0}-1}e^{-b_{0}\theta_{j}} $$
(23)

Then

$$ ln P(\theta_{j})= (a_{0}-1)ln\theta_{j} -b_{0} \theta_{j} +\alpha $$
(24)

where \(\alpha =ln\left (\frac {b_{0}^{a_{0}}}{\Gamma (a_{0})}\right )\) is a constant. Then, by substituting (24) in (22), ln q 2(θ j ) can be simplified as

$$ ln q_{2}(\theta_{j})=\left( \frac{1}{2}\sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} +a_{0}-1\right) ln (\theta_{j}) -\left( \sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} \frac{({y_{i}^{j}}-w^{j}_{ik})^{2}}{2} +b_{0} \right)\theta_{j} + C_{3} $$
(25)

By comparing (25) with (24), it is obvious that q 2(θ j ), the variational approximation of the true posterior distribution, has the form of a gamma distribution with parameters a and b as follows,

$$\begin{array}{@{}rcl@{}} a&=&\frac{1}{2}\sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} +a_{0} =\frac{1}{2} M+a_{0} \\ b&=&\sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} \frac{({y_{i}^{j}}-w^{j}_{ik})^{2}}{2} +b_{0} \end{array} $$
(26)

Then, by using the gamma distribution with parameters a and b obtained in (26), ρ i k can be calculated from (18). Consequently, after normalizing ρ i k using (19), r i k can be obtained as

$$ r_{ik}=\frac{\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})e^{\left( \frac{-({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\frac{a}{b}\right)}}{{\sum}_{k=1}^{K}\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})e^{\left( \frac{-({y_{i}^{j}}-w^{j}_{ik})^{2}}{2}\frac{a}{b}\right)}} $$
(27)

In the first iteration of the VB algorithm, we set a = a_0 and b = b_0 for the parameters of the gamma distribution. The value obtained for r_ik is then substituted in (26) to find the new value of b. The new parameters of the gamma distribution are then used in (27) to obtain a new value for r_ik. This procedure is repeated iteratively until there is almost no change in the value of b. The gamma distribution with the parameters a and b so obtained, i.e., Gamma(θ_j|a, b), is taken as the approximation of the posterior distribution.
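To make the iteration concrete, the update loop described above can be sketched as follows for a single DCT band, assuming Gaussian correlation noise. The inputs `y`, `w` and `mu_x_to_f` are hypothetical stand-ins for the side information, the candidate values \(w_{ik}^j\) and the incoming messages; this is a minimal illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def vb_gaussian(y, w, mu_x_to_f, a0=1e-3, b0=1e-3, max_iter=50, tol=1e-8):
    """Iterative VB estimation of the Gaussian correlation-noise precision
    theta_j for one DCT band, following (26) and (27).
    y         : (M,) side-information values y_i^j
    w         : (M, K) candidate coefficient values w_ik^j
    mu_x_to_f : (M, K) messages mu_{x->f}(x_i = w_ik) from the LDPCA side
    """
    M, _ = w.shape
    d2 = 0.5 * (y[:, None] - w) ** 2        # (y_i - w_ik)^2 / 2
    a, b = a0, b0                           # first iteration uses the prior
    for _ in range(max_iter):
        # responsibilities r_ik from (27), with E[theta_j] = a / b
        rho = mu_x_to_f * np.exp(-d2 * (a / b))
        r = rho / rho.sum(axis=1, keepdims=True)
        a = 0.5 * M + a0                    # shape update from (26)
        b_new = np.sum(r * d2) + b0         # rate update from (26)
        if abs(b_new - b) < tol * max(b, 1.0):
            b = b_new
            break
        b = b_new
    return a, b, r
```

With flat incoming messages and well-separated candidates, the posterior mean a/b converges to an estimate of the noise precision for the band.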

  • 2) Laplacian correlation noise model

Assuming Laplacian distribution for the correlation noise, we can express the probability \(P({y_{i}^{j}} | {x_{i}^{j}} =w^{j}_{ik},\theta _{j})\) in (14) and (15) as

$$ P({y_{i}^{j}} | {x_{i}^{j}} =w^{j}_{ik},\theta_{j})=\frac{\theta_{j}}{2}e^{-|{y_{i}^{j}}-w^{j}_{ik}|\theta_{j}} $$
(28)

The VB method explained above for the Gaussian distribution can also be applied to the Laplacian distribution. In this case, the approximation of the posterior distribution is again a gamma distribution, with parameters a and b given below.

$$\begin{array}{@{}rcl@{}} a&=&\sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} +a_{0} = M+a_{0} \\ b&=&\sum\limits_{i=1}^{M}\sum\limits_{k=1}^{K} r_{ik} | {y_{i}^{j}}-w^{j}_{ik}|+b_{0} \end{array} $$
(29)

Then, using the gamma distribution with the above parameters, r i k can be obtained as

$$ r_{ik}=\frac{\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})e^{-|{y_{i}^{j}}-w^{j}_{ik}|\frac{a}{b}}}{{\sum}_{k=1}^{K}\mu_{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})e^{-|{y_{i}^{j}}-w^{j}_{ik}|\frac{a}{b}}} $$
(30)

Just as in the Gaussian case, the values of r_ik and the parameters a and b of the gamma distribution are obtained iteratively until there is almost no change in the value of b. The gamma distribution with the parameters a and b so obtained, i.e., Gamma(θ_j|a, b), is taken as the approximation of the posterior distribution.

3.2 Message update

After obtaining the approximation for the posterior distribution P(θ_j|Y^j), the message \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}})\), representing the probability that the partially decoded coefficient is \(w^{j}_{ik}\), or equivalently that \({x_{i}^{j}} \in I_{k}\), k = 1, 2, ... , K, is calculated based only on the information from the Bayesian estimation part shown in the factor graph of Fig. 6. If the correlation noise is Gaussian, then the message \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}})\) can be calculated as

$$\begin{array}{@{}rcl@{}} \mu_{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})&=&\underset{\theta_{j}}{\int}\frac{1}{\Gamma(a)}b^{a}\theta_{j}^{a-1}e^{-b\theta_{j}}\frac{\theta_{j}^{\frac{1}{2}}}{\sqrt{2\pi}}e^{\frac{-\theta_{j}}{2}({y_{i}^{j}}-w^{j}_{ik})^{2}}d\theta_{j}\\ &=&\overset{\infty}{\underset{0}{\int}}\frac{1}{\Gamma(a)\sqrt{2\pi}}b^{a}\theta_{j}^{a-\frac{1}{2}}e^{-\theta_{j}\left[\frac{\left( {y_{i}^{j}}-w^{j}_{ik}\right)^{2}}{2}+b\right]}d\theta_{j} \end{array} $$
(31)

Then, after some mathematical simplification, \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})\) can be expressed as

$$ \mu_{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})=\frac{\Gamma(a+\frac{1}{2})}{\Gamma(a)\sqrt{2\pi}}b^{a}\left[b+\frac{\left( {y_{i}^{j}}-w^{j}_{ik}\right)^{2}}{2}\right]^{-\left( a+\frac{1}{2}\right)} $$
(32)

On the other hand, if the correlation noise has a Laplacian distribution, then the message \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})\) can be obtained as

$$\begin{array}{@{}rcl@{}} \mu_{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})&=&\underset{\theta_{j}}{\int}\frac{1}{\Gamma(a)}b^{a}\theta_{j}^{a-1}e^{-b\theta_{j}}\frac{\theta_{j}}{2}e^{-\theta_{j}|{y_{i}^{j}}-w^{j}_{ik}|}d\theta_{j}\\ &=&\overset{\infty}{\underset{0}{\int}}\frac{1}{2{\Gamma}(a)}b^{a}\theta_{j}^{a}e^{-\theta_{j}\left[|{y_{i}^{j}}-w^{j}_{ik}|+b\right]}d\theta_{j} \end{array} $$
(33)

which after simplification, can be written as

$$ \mu_{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})=\frac{\Gamma(a+1)}{2\,\Gamma(a)}b^{a}\left[b+|{y_{i}^{j}}-w^{j}_{ik}|\right]^{-(a+1)} $$
(34)
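In practice, the factors in (32) and (34) that do not depend on k (the gamma-function terms, b^a and the constant 1/2) cancel when the K messages for a coefficient are normalized, so only the bracketed terms need to be evaluated. A minimal sketch under that observation (the function name and inputs are hypothetical, not the authors' implementation):

```python
import numpy as np

def msg_f_to_x(y_i, w_i, a, b, noise="gaussian"):
    """Messages mu_{f_i -> x_i}(x_i = w_ik) of (32)/(34), computed up to the
    k-independent constants and then normalized over the K candidates."""
    w_i = np.asarray(w_i, dtype=float)
    if noise == "gaussian":
        m = (b + 0.5 * (y_i - w_i) ** 2) ** (-(a + 0.5))  # bracket of (32)
    elif noise == "laplacian":
        m = (b + np.abs(y_i - w_i)) ** (-(a + 1.0))       # bracket of (34)
    else:
        raise ValueError("noise must be 'gaussian' or 'laplacian'")
    return m / m.sum()                                    # normalize over k
```

Candidates closer to the side-information value y_i receive larger messages, with the gamma parameters a and b controlling how sharply the messages decay with distance.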

The updated messages from each of the blocks are then fed back to the LDPCA decoders for the bitplanes B_1, B_2, ... , B_β (see Fig. 4), which restart decoding with more accurate soft information. Hence, all the decoders have new and more precise knowledge of the correlation noise parameter, leading to more efficient decoding when regular belief propagation is performed in the LDPCA decoders.

3.3 Parallel LDPCA decoding process

To decode the bitplanes B_1, B_2, ... , B_β (see Fig. 4) using the BP algorithm in the LDPCA decoders, the log-likelihood ratio (LLR) for each bit \(b^{j}_{ic}\) in these bitplanes has to be obtained. First, the messages \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})\) obtained in Section 3.2 are used to calculate the messages \(\mu _{{x_{i}^{j}}\rightarrow b_{ic}^{j}}(b_{ic}^{j})\) from node \({x_{i}^{j}}\) to the corresponding bit nodes \(b_{ic}^{j}\), using the procedure given in [24]. Then, \(\mu _{{x_{i}^{j}}\rightarrow b_{ic}^{j}}(b_{ic}^{j})\) is exploited to compute the initial LLR for each bit \(b_{ic}^{j}\) as

$$ L_{ic}^{j}=log\frac{\mu_{{x_{i}^{j}}\rightarrow b_{ic}^{j}}(b_{ic}^{j}=1)}{\mu_{{x_{i}^{j}}\rightarrow b_{ic}^{j}}(b_{ic}^{j}=0)} $$
(35)

After a pre-specified number of iterations of the BP algorithm in the LDPCA decoders, the LLR for each bit \(b_{ic}^{j}\) is obtained as \(l_{ic}^{j}=L_{ic}^{j}+\sum l_{ic}^{j,v}\), where \(L_{ic}^{j}\) is calculated using (35), \(l_{ic}^{j,v}\) is the LLR value received through the v-th edge (v = 1, 2, ..., V_i) from a syndrome node to the node \(b_{ic}^{j}\) after the pre-defined number of iterations, and V_i is the number of syndrome nodes connected to the node \(b_{ic}^{j}\). Then, \(b_{ic}^{j}\) is decoded as 1 if \(l_{ic}^{j}>0\) and as 0 otherwise. Next, the LDPCA syndrome and 8-bit cyclic redundancy check (CRC) summations are used in the decoder to determine whether or not the LDPCA decoding has been successful [3].
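The LLR initialization of (35) and the final hard decision can be sketched as follows; the helper names are hypothetical, and the syndrome-node LLRs are taken as given inputs rather than computed by an actual BP pass.

```python
import math

def initial_llr(mu1, mu0):
    """Initial LLR L_ic^j of (35): log of mu_{x->b}(b=1) over mu_{x->b}(b=0)."""
    return math.log(mu1 / mu0)

def decode_bit(L_ic, syndrome_llrs):
    """Final decision after BP: l_ic^j = L_ic^j + sum of the LLRs received
    from the V_i connected syndrome nodes; decode 1 if l_ic^j > 0, else 0."""
    l_ic = L_ic + sum(syndrome_llrs)
    return 1 if l_ic > 0 else 0
```

Note that with the log(P(1)/P(0)) convention used in (35), a positive total LLR favors the bit value 1, matching the decision rule stated above.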

3.4 The recursive message passing algorithm

Figure 7 shows the proposed bitplane decoder consisting of the three modules explained in Sections 3.1, 3.2 and 3.3. The arrows in this figure indicate the interactions amongst the three modules.

Fig. 7 Proposed bitplane decoder

The recursive message passing algorithm is described below.

  • Step 1 - The messages \(\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}})\) in Fig. 6 are first calculated using the messages \(\mu _{b_{ic}^{j}\rightarrow {x_{i}^{j}}}(b_{ic}^{j})\) received by the node \({x_{i}^{j}}\) from the bit nodes \(b_{ic}^{j}\), c = 1, 2, ... , β so that \(\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}})= {\prod }_{c=1}^{\beta } \mu _{b_{ic}^{j}\rightarrow {x_{i}^{j}}}(b_{ic}^{j})\).

  • Step 2 - Using the messages \(\mu _{{x_{i}^{j}}\rightarrow {f_{i}^{j}}}({x_{i}^{j}})\) and the partially decoded coefficients \(w^{j}_{ik}\) for k = 1, 2, ..., K, an approximation for the posterior distribution of each correlation noise parameter θ_j is calculated using the VB algorithm, as explained in Section 3.1.

  • Step 3 - The approximated posterior distribution for each correlation noise parameter θ j is used to calculate the messages \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})\) from the factor nodes \({f_{i}^{j}}\) to the variable nodes \({x_{i}^{j}}\), as explained in Section 3.2.

  • Step 4 - The messages \(\mu _{{f_{i}^{j}}\rightarrow {x_{i}^{j}}}({x_{i}^{j}}=w^{j}_{ik})\) thus obtained are then used to calculate the messages \(\mu _{{x_{i}^{j}}\rightarrow b_{ic}^{j}}(b_{ic}^{j})\) from the node \({x_{i}^{j}}\) to the bit nodes \(b_{ic}^{j}\), c = 1, 2, ... , β. The initial LLRs \(L_{ic}^{j}\) are then calculated using (35) and employed in the LDPCA decoders to decode all the bitplanes, as explained in Section 3.3.

  • Step 5 - By using the LDPCA syndrome and 8-bit CRC summations as mentioned in Section 3.3, we check whether all the bitplanes have been decoded correctly.

  • Step 6 - If all the bitplanes are decoded correctly, the decoding process stops; otherwise, Steps 1-5 are repeated for a pre-specified number of iterations.
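The control flow of Steps 1-6 can be sketched as the following loop. The five callables are hypothetical placeholders for the modules described in Sections 3.1-3.3; they are injected as arguments so that only the recursion itself is shown, not any actual decoding.

```python
def recursive_decode(compute_mu_x_to_f, vb_estimate, compute_mu_f_to_x,
                     ldpca_decode, check_crc, max_outer_iters=3):
    """Control flow of the recursive message passing algorithm (Steps 1-6).
    Returns the last decoded bits and a flag indicating success; on failure,
    the caller would request more syndrome bits from the encoder."""
    bits = None
    for _ in range(max_outer_iters):
        mu_xf = compute_mu_x_to_f()       # Step 1: messages x -> f
        a, b = vb_estimate(mu_xf)         # Step 2: VB posterior of theta_j
        mu_fx = compute_mu_f_to_x(a, b)   # Step 3: messages f -> x
        bits = ldpca_decode(mu_fx)        # Step 4: LDPCA/BP decoding via (35)
        if check_crc(bits):               # Step 5: syndrome + CRC check
            return bits, True
    return bits, False                    # Step 6 exhausted: more syndromes needed
```

The outer iteration count corresponds to the pre-specified limit mentioned in Step 6.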

After applying the above algorithm, if any of the LDPCA decoders fails to decode its bitplane correctly, then the corresponding decoder requests more syndrome bits from the encoder, as is done in conventional LDPCA decoding. This decoder then modifies its factor graph, and the proposed recursive message passing algorithm is applied again. This whole process is repeated until all the LDPCA decoders successfully decode their bitplanes. The LDPCA decoder and the correlation noise estimation blocks in the DISCOVER codec shown in Fig. 1 are now replaced by the proposed decoder shown in Fig. 7, and the resulting modified architecture for the transform-domain distributed video codec is shown in Fig. 8.

Fig. 8 Modified architecture for DVC

4 Simulation results

In this section, we study through extensive experimentation the effect on the rate-distortion performance of the DISCOVER codec when it is modified as in Fig. 8 by incorporating the bitplane decoder of Fig. 7, which is based on the proposed joint estimation and decoding method. We then compare the rate-distortion (RD) performance of this modified DISCOVER codec with that of the original codec [3], which uses the online correlation noise estimation proposed in [6]. For the simulations, the Foreman, Coastguard, Hall and Soccer video sequences (150 frames each) with a 15 Hz frame rate and QCIF format are employed; one frame of each of these sequences is shown in Fig. 9. The key frames are encoded using the intra coding mode of the H.264/AVC codec, JM 9.5 [21]. Eight RD points corresponding to the eight 4 × 4 quantization matrices Q_1, Q_2, ..., Q_8, the same as the ones used in the DISCOVER codec [6], are considered. Moreover, the QP values in H.264/AVC (in intra mode) are set to the same values as used for the key frames in the DISCOVER codec [6]. Also, only the luminance component (Y) of the video frames is considered in our simulations for the rate-distortion evaluation. In our setup, the Laplacian distribution is used to model the correlation noise in each block of DCT coefficients of length M = 99 in the corresponding DCT band. Then, the proposed message passing algorithm based on VB is used to decode all the bitplanes simultaneously in each of the DCT bands. The maximum number of iterations of the recursive message passing algorithm in the proposed decoder before requesting more syndrome bits is three; carrying out further iterations would only increase the execution time without any noticeable improvement in the performance. Also, the belief propagation algorithm inside the LDPCA decoders runs for 100 iterations to decode the bitplanes in each DCT band of each of the Wyner-Ziv frames.

Fig. 9 One frame of each of the sequences Foreman, Hall, Coastguard and Soccer

Table 1 gives the relative average savings (%) in bitrate and the improvement in PSNR (dB) of the proposed codec over the DISCOVER codec for WZ frames as well as for all frames, computed using the Bjøntegaard metric [8]. Considering a GOP of size 2 (having one WZ frame between two successive key frames), it can be seen that for the Foreman sequence the proposed method leads to average bitrate savings of 5.53% and 11.45% for all frames and WZ frames, respectively. The corresponding savings are 3.21% and 6.11% for the Hall sequence, 4.79% and 9.73% for the Coastguard sequence, and 8.23% and 15.71% for the Soccer sequence. As for the PSNR, the proposed codec shows average improvements of 0.31 dB, 0.29 dB, 0.27 dB and 0.48 dB for the Foreman, Hall, Coastguard and Soccer sequences, respectively, in WZ frames. Further, there are improvements of 0.16 dB, 0.14 dB, 0.12 dB and 0.26 dB in the Foreman, Hall, Coastguard and Soccer sequences, respectively, for all frames. Hence, on average, our method leads to a 10.75% saving in bitrate for WZ frames and a 5.44% bitrate saving for all frames of the sequences. Moreover, on average, the improvements in PSNR are 0.33 dB and 0.17 dB for WZ frames and all frames, respectively.
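The quoted overall bitrate figures are simple arithmetic means of the per-sequence savings, which can be checked directly:

```python
# Per-sequence Bjontegaard bitrate savings (%) for GOP size 2, in the order
# Foreman, Hall, Coastguard, Soccer (values as reported above).
wz_savings = [11.45, 6.11, 9.73, 15.71]   # WZ frames
all_savings = [5.53, 3.21, 4.79, 8.23]    # all frames

avg_wz = sum(wz_savings) / len(wz_savings)
avg_all = sum(all_savings) / len(all_savings)
print(round(avg_wz, 2), round(avg_all, 2))  # 10.75 5.44
```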

Table 1 The relative bitrate savings (%) and improvement in PSNR values (dB) over the DISCOVER codec, computed using the Bjøntegaard metric

For a GOP of size 4 (having three WZ frames between two successive key frames), it can also be seen from Table 1 that for the Foreman sequence the proposed method leads to average bitrate savings of 8.41% and 10.68% for all frames and WZ frames, respectively. The corresponding savings are 5.26% and 7.26% for the Hall sequence, 6.67% and 9.56% for the Coastguard sequence, and 7.76% and 11.21% for the Soccer sequence. As for the PSNR, the proposed codec shows average improvements of 0.33 dB, 0.35 dB, 0.26 dB and 0.41 dB for the Foreman, Hall, Coastguard and Soccer sequences, respectively, in WZ frames. Further, there are improvements of 0.24 dB, 0.22 dB, 0.19 dB and 0.25 dB in the Foreman, Hall, Coastguard and Soccer sequences, respectively, for all frames. Hence, on average, our method leads to a 9.67% saving in bitrate for WZ frames and a 7.02% bitrate saving for all frames of the sequences. Moreover, on average, the improvements in PSNR are 0.34 dB and 0.22 dB for WZ frames and all frames, respectively.

Figures 10 and 11 show the overall RD performance for the four sequences Foreman, Coastguard, Hall and Soccer using the DISCOVER codec and the modified DISCOVER codec for GOP sizes of 2 and 4. It is seen from these figures that the modified DISCOVER codec exhibits an overall better RD performance than the original codec. Further, it is clear from these figures and Table 1 that the improvement in the RD performance is even more pronounced in the case of the Foreman and Soccer sequences, which contain fast motion.

Fig. 10 RD performance for GOP size of 2

Fig. 11 RD performance for GOP size of 4

The proposed decoder can also be used in other transform-domain distributed video coding schemes that have the same architecture as the DISCOVER codec, namely, those based on the Stanford approach. For instance, if the proposed decoder is employed in the distributed video codec with the side information refinement (SIR) proposed in [26], then, using the Bjøntegaard metric, the relative bitrate savings (%) and PSNR improvements (in dB) are obtained as shown in Table 2. It can be seen that for a GOP of size 2, the modified DVC codec, on average, leads to 8.79% and 4.11% savings in the bitrate for WZ frames and all frames, respectively, for the aforementioned sequences. Moreover, on average, there is an improvement of 0.19 dB and 0.1 dB in PSNR for WZ frames and all frames, respectively. Further, for a GOP of size 4, the corresponding savings in the bitrate are 7.51% and 5.63% for WZ frames and all frames, respectively. Also, there is an improvement of 0.17 dB and 0.11 dB in PSNR for WZ frames and all frames, respectively. The RD performance of the DVC codec with SIR [26] and its modified version using the proposed decoder are also shown in Figs. 10 and 11. It is clear from these figures that an improved RD performance is achieved by using the proposed decoder.

Table 2 The relative bit-rate savings (%) and improvement in PSNR values (dB) over that of codec with SIR [26], computed using the Bjøntegaard metric

Some screenshots of the decoded Wyner-Ziv frames of the Foreman and Soccer sequences are shown in Figs. 12 and 13, respectively, which demonstrate the improvement in the quality of the decoded frames obtained by using the DISCOVER codec modified by the proposed decoder over the original DISCOVER codec. For these video sequences, a GOP size of 2 and the quantization matrix Q_6 corresponding to the 6th RD point are considered.

Fig. 12 47th decoded frame of the Foreman sequence

Fig. 13 85th decoded frame of the Soccer sequence

In the DISCOVER codec as well as in the DVC codec with SIR [26], the correlation noise is considered to have a Laplacian distribution and its distribution parameter is obtained online using the method proposed in [6] before the LDPCA decoder starts decoding the bitplanes. Therefore, the correlation noise parameter is kept fixed during the decoding of each WZ frame. However, in our proposed decoder, the estimation of the correlation noise parameter is refined recursively during the decoding of each DCT coefficient band. This has led to a more accurate estimation of the correlation noise parameter and consequently, to a better RD performance.

In addition, we investigate the relation between the amount of motion in the video sequences and the improvement in the RD performance resulting from using the proposed decoder. The mean of the magnitudes of the forward motion vectors (MVs) between each pair of successive frames, and the average of these values over the entire video sequence, are shown in Fig. 14 for each of the Foreman, Coastguard, Hall and Soccer sequences. It is clear from this figure that the video sequences Soccer, Foreman, Coastguard and Hall have, in that order, the highest to the lowest motion content. From Figs. 10 and 11, it can be seen that the improvements in the RD performance resulting from using the two DVC codecs modified by the proposed decoder over the two original DVC codecs are more significant in video sequences having higher motion content.

Fig. 14 Measurement of motion in the video sequences

The hardware used for our simulations is a personal computer with a Core i5 CPU running at 2.7 GHz and 8 GB of RAM, under the Windows 7 operating system; the codec is implemented using the Visual Studio C++ v10.0 compiler in release mode on one CPU core. The execution time (in seconds) to decode each of the four video sequences (with a GOP size of 2 and quantization matrix Q_1) is shown in Table 3 for the original DISCOVER codec and the DISCOVER codec modified by the proposed decoder.

Table 3 Execution time (in seconds) for decoding the video sequences with GOP size of 2 and quantization matrix Q1

It should be mentioned that parallel or multi-threaded programming on a multi-core processor or on a GPU could be used to reduce the execution time of the algorithm and speed up the decoding process significantly.

5 Conclusion

Distributed video coding is a coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems. In this paper, we have investigated the problem of obtaining the correlation noise parameter in the DVC decoder in order to improve the rate-distortion performance and the coding efficiency of a distributed video codec. Having more accurate information about the correlation noise leads to a better decoding performance and consequently, a higher coding efficiency and a better rate-distortion performance of the video codec. Since the decoder does not have access to the current encoded WZ frame located at the encoder, and the correlation noise is non-stationary, it is difficult to model the correlation noise and obtain its parameter accurately. To overcome these difficulties, a recursive algorithm based on variational Bayes has been proposed to estimate and refine the correlation noise distribution parameter while simultaneously decoding all the bitplanes corresponding to the current DCT band on an augmented factor graph. Unlike most DVC schemes, in which the parameter of the correlation noise distribution is obtained before the decoding of each DCT coefficient band of the WZ frame, in our proposed decoder the estimation of the correlation noise parameter is refined during the decoding of each DCT coefficient band. This results in more accurate information about the correlation noise. The proposed decoder has then been used in the DISCOVER codec, one of the most popular codecs designed based on the Stanford approach. It has been shown through extensive simulations that the DISCOVER codec using the proposed decoder performs better than the original codec, particularly on sequences with fast motion. Such an improvement in performance has also been observed when the proposed decoder is used in the DVC codec with SIR [26], which also has an architecture designed based on the Stanford approach.
This leads us to believe that the proposed decoder can be used to improve the performance of any codec whose architecture is based on the Stanford approach.