12.1 Shannon Limit

12.1.1 Entropy and Mutual Information

Let X be a discrete random variable taking its values in a set \( {\mathcal{X}} \), according to some probability mass function (pmf) \( p\left( x \right) = { \Pr }\left\{ {X = x} \right\} \). The entropy of X is defined as

$$ H\left( X \right) = - \sum\limits_{x} {p\left( x \right)\log_{2} p\left( x \right)} . $$

Intuitively, the entropy \( H\left( X \right) \) may be thought of as the uncertainty associated with the random variable. For example, a deterministic variable is characterized by zero entropy while, for a given positive integer M, the random variable with the largest entropy among all discrete random variables whose support set \( {\mathcal{X}} \) has cardinality M is the uniform one, i.e., \( p\left( x \right) = 1/M \) for all \( x \in {\mathcal{X}} \). In this latter case we obtain \( H\left( X \right) = \log_{2} M \).
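As a quick numerical illustration of these extreme cases, the entropy of a discrete pmf can be evaluated in a few lines of Python (the pmfs below are purely illustrative):

```python
import numpy as np

def entropy(pmf):
    """Entropy H(X) in bits of a discrete pmf (zero-probability terms contribute 0)."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy([1 / 8] * 8))          # uniform over M = 8 values: log2(8) = 3.0
print(entropy([0.5, 0.25, 0.25]))    # 1.5
print(entropy([1.0, 0.0]))           # deterministic variable: 0.0
```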

Consider now a second discrete random variable \( Y \in {\mathcal{Y}} \) characterized by a pmf \( p\left( y \right) \). Let \( p\left( {y|x} \right) = { \Pr }\{ Y = y|X = x\} \) be the pmf of Y conditioned on the event \( \left\{ {X = x} \right\} \). The entropy of Y given the event \( \left\{ {X = x} \right\} \) is defined as

$$ H\left( {Y|X = x} \right) = - \mathop \sum \limits_{y} p(y|x)\log_{2} p(y|x). $$

Next, the conditional entropy \( H\left( {Y|X} \right) \) is defined as

$$ \begin{aligned} H\left( {Y|X} \right) & = \sum\limits_{x} {p\left( x \right)H\left( {Y |X = x} \right)} \\ & = - \sum\limits_{x} {\sum\limits_{y} {p(y|x)p\left( x \right)\log_{2} p(y|x)} } . \\ \end{aligned} $$

Finally, the mutual information \( I\left( {X;Y} \right) \) between X and Y is defined as

$$ I\left( {X;Y} \right) = \sum\limits_{x} {\sum\limits_{y} {p\left( {y|x} \right)p\left( x \right)\log_{2} } } \frac{{p\left( {y|x} \right)p\left( x \right)}}{p\left( x \right)p\left( y \right)}. $$
(12.1)

It can be shown that \( I\left( {X;Y} \right) = H\left( Y \right) - H\left( {Y |X} \right) = H\left( X \right) - H\left( {X |Y} \right) \). As such, \( I\left( {X;Y} \right) \) intuitively represents the reduction of uncertainty about X due to the fact that we can observe Y (equivalently, reduction of uncertainty about Y due to the fact that we can observe X). The mutual information is well-defined also for continuous random variables. In this case, \( p\left( x \right) \), \( p\left( y \right) \), and \( p\left( {y|x} \right) \) are probability density functions (pdfs), and we have

$$ I\left( {X;Y} \right) = \int {p\left( {y|x} \right)p\left( x \right)\log_{2} } \frac{{p\left( {y|x} \right)p\left( x \right)}}{p\left( x \right)p\left( y \right)}dxdy. $$
(12.2)

Moreover, if X is a discrete random variable and Y is a continuous one, \( I\left( {X;Y} \right) \) is defined as

$$ I\left( {X;Y} \right) = \sum\limits_{x} {p\left( x \right)\smallint p\left( {y|x} \right)\log_{2} } \frac{{p\left( {y|x} \right)p\left( x \right)}}{p\left( x \right)p\left( y \right)}dy. $$
(12.3)

12.1.2 System Model and Channel Capacity

The fundamental limit of point-to-point digital communication over a noisy channel was established in 1948 by C. Shannon , who showed that a vanishing error probability can be attained at a finite information rate, provided this rate is smaller than the capacity of the noisy channel.

With reference to Fig. 12.1, a source S of information generates messages that must be delivered to a destination D through a noisy channel. The generic message, denoted by W, is drawn from a set of M possible messages \( \left\{ {1,2, \ldots ,M} \right\} \), where all messages are a priori equally likely. Prior to transmission over the channel, the message W is encoded through a channel encoder, which deterministically (and uniquely) maps each message onto a codeword \( \varvec{x} = \left[ {x_{0} ,x_{1} , \ldots ,x_{n - 1} } \right] \), i.e., an n-tuple of symbols belonging to some alphabet \( {\mathcal{X}} \). The ratio

Fig. 12.1 Communication model

$$ R = \frac{{\log_{2} M}}{n} $$

is the code rate of the channel code and the code is named an \( \left( {n,2^{nR} } \right) \) code. All n codeword symbols are then transmitted sequentially over the channel, resulting in a sequence \( \varvec{y} = \left[ {y_{0} ,y_{1} , \ldots ,y_{n - 1} } \right] \) whose symbols belong to an alphabet \( {\mathcal{Y}} \). A decoding algorithm is then performed by a channel decoder to decide which codeword, out of the set of M candidate codewords, was transmitted over the channel, given the noisy observation \( \varvec{y} \). The codeword \( \hat{\varvec{x}} \) returned by the decoder is converted back to the corresponding message \( \hat{W} \) that is finally delivered to the destination. An error occurs whenever \( W \ne \hat{W} \), i.e., a wrong message is delivered.

A probability of error can be defined for each of the M transmitted messages as follows. The probability of error associated with the j-th message, \( j \in \left\{ {1,2, \ldots ,M} \right\} \), is denoted by \( P_{e,j} \) and is defined as

$$ P_{e,j} = \Pr \left\{ {\hat{W} \ne W |W = j} \right\}. $$

Furthermore, the maximum probability of error is defined as

$$ P_{e,max} = \mathop {\hbox{max} }\limits_{{j \in \left\{ {1,2, \ldots ,M} \right\}}} P_{e,j} $$
(12.4)

and the average probability of error as

$$ P_{e} = \frac{1}{M}\sum\limits_{j = 1}^{M} {P_{e,j} } . $$
(12.5)

The channel code along with its decoding algorithm shall be designed in order to make the maximum probability of error over the given channel as small as possible.

Assume that both the input alphabet \( {\mathcal{X}} \) and the output alphabet \( {\mathcal{Y}} \) are discrete. Let \( X \in {\mathcal{X}} \) and \( Y \in {\mathcal{Y}} \) be two discrete random variables, representing the input to the channel and the corresponding output. Moreover, assume that the channel is fully defined by the transition probabilities \( p\left( {y |x} \right) = { \Pr }\{ Y = y|X = x\} \). In this case, the channel is called a discrete memory-less channel (DMC). The capacity of a DMC is defined as

$$ C = \mathop {\hbox{max} }\limits_{p\left( x \right)} I\left( {X;Y} \right) $$
(12.6)

i.e. , as the maximum amount of uncertainty we can remove from the input symbol (which cannot be observed directly) by observing the output symbol, where the maximum is taken over all possible pmfs for the input symbol. The capacity is an intrinsic parameter of the channel, only depending on the cardinalities of \( {\mathcal{X}} \) and \( {\mathcal{Y}} \) and on the transition probabilities \( p\left( {y |x} \right) \). It is expressed in terms of information bits (or Shannon) per channel use.

Example 12.1

The DMC depicted in Fig. 12.2 is characterized by \( {\mathcal{X}} = {\mathcal{Y}} = \left\{ {{+}1, {-}1} \right\} \) and by \( \Pr \left\{ {Y = {+}1 |X = {+}1} \right\} = \Pr \left\{ {Y = {-}1 |X = {-}1} \right\} = 1 - p \), \( \Pr \left\{ {Y = {+}1 |X = {-}1} \right\} = \Pr \left\{ {Y = {-}1 |X = {+}1} \right\} = p \). This channel is known as binary symmetric channel (BSC) , and p is called the error (or crossover) probability. Every binary symbol input to the channel is received in error with probability p and is correctly received with probability \( 1 - p \). The capacity of the BSC is achieved for \( \Pr \left\{ {X = + 1} \right\} = \Pr \left\{ {X = {-}1} \right\} = 1/2 \) and is given by Footnote 1

Fig. 12.2 Binary symmetric channel (BSC) model

$$ C = 1 - \left[ { - p\log_{2} p - \left( {1 - p} \right)\log_{2} \left( {1 - p} \right)} \right]. $$
(12.7)

As we shall see later, the BSC is a possible channel model for SLC Flash memories. Assuming \( p \le 1/2 \), its capacity is maximum for \( p = 0 \), where we have \( C = 1 \) (every binary symbol at the channel output is reliable), and is minimum for \( p = 1/2 \), where we have \( C = 0 \) (no uncertainty is removed from X by observing Y).
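The BSC capacity in (12.7) is easy to evaluate numerically; the following Python sketch confirms the two extreme cases just discussed.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, Eq. (12.7)."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 1e-2, 1e-1, 0.5):
    print(f"p = {p:>4}: C = {bsc_capacity(p):.4f} bits per channel use")
# p = 0 gives C = 1, p = 0.5 gives C = 0, as noted above.
```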

The concept of capacity, so far introduced for a DMC, can be extended to time-discrete memory-less channels whose input symbol is either a discrete or a continuous random variable and whose output symbol is a continuous one. The capacity is still defined by (12.6), where the mutual information is now given by (12.2) if X is continuous, and by (12.3) if X is discrete. As opposed to the DMC case, however, additional constraints to the optimization problem may be introduced (for example, an upper bound on the average transmitted power). The reason is that the solution to the unconstrained optimization problem may correspond to an input variable X for which the channel is essentially noiseless.

Additive noise channels represent an important class of such channels. Here, the output symbol is obtained as \( Y = X + Z \), where Z is a continuous random variable, namely, an additive noise. If Z is independent of X and is normally distributed with zero mean and variance \( \sigma^{2} \),

$$ p\left( z \right) = \frac{1}{{\sqrt {2\pi \sigma^{2} } }}e^{{ - \frac{{z^{2} }}{{2\sigma^{2} }}}} , $$

then the corresponding channel is called an additive Gaussian channel.

Example 12.2

Consider the additive Gaussian channel depicted in Fig. 12.3, and assume that X is a Bernoulli (i.e., discrete with a binary alphabet) random variable. Without any further constraint, it is possible to achieve the capacity \( C = 1 \) (corresponding to a noiseless channel) regardless of \( \sigma^{2} \) by letting \( X \in \left\{ { - A, + A} \right\} \), where \( A > 0 \) is a real number, choosing \( \Pr \left\{ {X = - A} \right\} = \Pr \left\{ {X = + A} \right\} = 1/2 \), and letting \( A \to \infty \). On the other hand, if the maximization problem is constrained to \( \left( {1/n} \right)\sum\nolimits_{i = 0}^{n - 1} {x_{i}^{2} \le E_{s} } \) for any transmitted codeword, then the maximum is attained for \( X \in \left\{ { - \sqrt {E_{s} } , + \sqrt {E_{s} } } \right\} \) and \( \Pr \left\{ {X = - \sqrt {E_{s} } } \right\} = \Pr \left\{ {X = + \sqrt {E_{s} } } \right\} = 1/2 \). In this case (12.3) yields

Fig. 12.3 Binary-input additive white Gaussian noise channel model

$$ C = - \int {p\left( y \right)\log_{2} \left( {p\left( y \right)\sqrt {2\pi e\sigma^{2} } } \right)dy} , $$

where

$$ p\left( y \right) = \frac{1}{{\sqrt {8\pi \sigma^{2} } }}\left( {e^{{ - \frac{{\left( {y - \sqrt {E_{s} } } \right)^{2} }}{{2\sigma^{2} }}}} + e^{{ - \frac{{\left( {y + \sqrt {E_{s} } } \right)^{2} }}{{2\sigma^{2} }}}} } \right). $$

The capacity does not admit a closed-form expression and must be computed via numerical integration. This channel model is known as the binary-input additive white Gaussian noise (Bi-AWGN) channel. It is possible to show that its capacity is a function of the parameter \( E_{s} /N_{0} \), where \( N_{0} = 2\sigma^{2} \). In general, the larger \( E_{s} /N_{0} \), the higher C. Moreover, \( C \to 1 \) as \( E_{s} /N_{0} \to \infty \).
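A possible numerical evaluation is sketched below in Python (with \( E_{s} \) normalized to 1, so that \( \sigma^{2} = 1/\left( {2E_{s} /N_{0} } \right) \)); the integral is approximated by a Riemann sum over a finite interval.

```python
import numpy as np

def biawgn_capacity(es_n0_db, num_points=200001, y_max=12.0):
    """Bi-AWGN capacity (bits/channel use) by numerically integrating the
    expression above, with E_s normalized to 1 so that sigma^2 = 1/(2 Es/N0)."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    sigma2 = 1.0 / (2.0 * es_n0)
    y = np.linspace(-y_max, y_max, num_points)
    p_y = (np.exp(-(y - 1.0) ** 2 / (2 * sigma2)) +
           np.exp(-(y + 1.0) ** 2 / (2 * sigma2))) / np.sqrt(8 * np.pi * sigma2)
    integrand = np.zeros_like(y)
    mask = p_y > 0                      # skip points where the pdf underflows to zero
    integrand[mask] = -p_y[mask] * np.log2(p_y[mask] * np.sqrt(2 * np.pi * np.e * sigma2))
    return float(np.sum(integrand) * (y[1] - y[0]))

for db in (-5, 0, 5, 10):
    print(f"Es/N0 = {db:>3} dB -> C ≈ {biawgn_capacity(db):.4f}")
```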

Example 12.3

Consider a channel \( X \to Y^{{\prime }} \to Y \) composed of the cascade of a Bi-AWGN channel and a one-bit quantizer, returning \( Y = {+}1 \) if \( Y^{{\prime }} > 0 \) and \( Y = {-}1 \) otherwise (if \( Y^{{\prime }} = 0 \), +1 or −1 is returned with equal probability). It is readily shown that this channel is equivalent to a BSC whose crossover probability p is

$$ p = \frac{1}{2}{\text{erfc}}\left( {\sqrt {\frac{{E_{s} }}{{N_{0} }}} } \right) $$
(12.8)

where

$$ {\text{erfc}}\left( x \right) = \frac{2}{\sqrt \pi }\int_{x}^{\infty } {e^{{ - \theta^{2} }} d\theta } . $$

Again, the capacity is a monotonically increasing function of the parameter \( E_{s} /N_{0} \), and again \( C \to 1 \) as \( E_{s} /N_{0} \to \infty \). For the same value of \( E_{s} /N_{0} \), the capacity of the output-quantized Bi-AWGN channel is always smaller than the capacity of the corresponding unquantized channel.

With reference to the last two examples, if Y is allowed to assume \( q > 2 \) different quantized values (which corresponds to adopting \( \left\lceil {\log_{2} q} \right\rceil \) quantization bits), the capacity of the obtained channel is upper bounded by that of the unquantized Bi-AWGN channel and is lower bounded by that of the one-bit quantized channel. (Note that the \( q - 1 \) quantization thresholds shall be properly designed.) In general, the higher q the larger the capacity.

12.1.3 The Channel Coding Theorem

Adopting the formulation in [1], which makes use of the maximum error probability defined in (12.4), Shannon’s channel coding theorem can be stated as follows. “For every rate \( R < C \) there exists a sequence of \( \left( {n,2^{nR} } \right) \) codes for which \(\lim_{n \to \infty } P_{e,max} \left( n \right) = 0 \). Conversely, if \( \lim_{n \to \infty } P_{e,max} \left( n \right) = 0 \) for a sequence of \( \left( {n,2^{nR} } \right) \) codes, then \( R \le C \).” Note that \( \lim_{n \to \infty } P_{e,max} \left( n \right) = 0 \) implies \(\lim_{n \to \infty } P_{e} \left( n \right) = 0 \), where \( P_{e} \left( n \right) \) is the average error probability defined in (12.5).

Essentially, Shannon’s channel coding theorem states that communication over a noisy channel is possible with an arbitrarily small maximum error rate if and only if the code rate of the employed channel code does not exceed the channel capacity. On the other hand, from the proof of the converse, it is possible to show that, when \( R > C \), the average probability of error is bounded away from zero. Specifically, we have

$$ P_{e} \left( n \right) \ge 1 - \frac{C}{R} - \frac{1}{nR} $$
(12.9)
$$ \to 1 - \frac{C}{R} $$
(12.10)

in the limit where \( n \to \infty \). Inequality (12.9) defines a non-achievable region for the considered communication channel. No channel code of length n exists whose average probability of error over the considered channel is smaller than the right-hand side of (12.9). For \( n \to \infty \), the non-achievable region is identified by (12.10). For a channel parametrized by some parameter \( \gamma \) (e.g., the crossover probability p for a BSC, or \( E_{s} /N_{0} \) for the Bi-AWGN channel or its output-quantized version), the non-achievable region can be represented in the \( P_{e} \left( n \right) \) versus \( \gamma \) plane for a specific code rate R, as illustrated in the following example.

Example 12.4

In Fig. 12.4 the non-achievable region is depicted for both the unquantized Bi-AWGN channel and its one-bit output-quantized version, for code rate \( R = 9/10 \) and infinite codeword length. Specifically, for fixed \( R = 9/10 \) the right-hand side of (12.10) is plotted as a function of \( E_{b} /N_{0} \) (in logarithmic scale), where \( E_{b} = E_{s} /R \). If \( E_{s} \) is interpreted as the energy per transmitted binary symbol, \( E_{b} \) can be regarded as the energy per information bit. The dashed curve identifies the non-achievable region over the unquantized Bi-AWGN channel (i.e., no \( \left( {E_{b} /N_{0} ,P_{e} } \right) \) point inside the corresponding area is achievable), while the solid one identifies the non-achievable region over its one-bit output-quantized version. That the unquantized non-achievable region is contained in the quantized one is consistent with the fact that the capacity of the Bi-AWGN channel is larger than the capacity of its output-quantized version, for the same value of \( E_{s} /N_{0} \). In general, if \( q > 2 \) quantization levels are allowed, the corresponding non-achievable region is identified by a curve falling between the two plotted curves. This serves to illustrate how soft information at the decoder can be exploited to improve the system performance. The smallest value of \( E_{b} /N_{0} \) for which communication is possible with a vanishing error probability at the given rate \( R = 9/10 \) over the Bi-AWGN channel is about \( 3.198\,{\text{dB}} \). The corresponding value over the one-bit quantized Bi-AWGN channel is about \( 4.400\,{\text{dB}} \).

Fig. 12.4 Plot of the Shannon limit for code rate \( R = 9/10 \), over the Bi-AWGN channel and over the BSC obtained via one-bit quantization of the output of the Bi-AWGN channel
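The hard-decision value quoted above can be checked with a short computation: combining (12.7) and (12.8) with \( E_{s} = RE_{b} \) and solving \( C = R \) by bisection gives the Shannon limit of the one-bit quantized channel. A minimal Python sketch follows (the unquantized limit is obtained in the same way, replacing the BSC capacity with the numerically integrated Bi-AWGN capacity).

```python
import numpy as np
from scipy.special import erfc

R = 9 / 10

def binary_entropy(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def quantized_capacity(eb_n0_db):
    """Capacity of the one-bit quantized Bi-AWGN channel (a BSC with p from Eq. (12.8))."""
    es_n0_db = eb_n0_db + 10 * np.log10(R)              # E_s = R * E_b
    p = 0.5 * erfc(np.sqrt(10.0 ** (es_n0_db / 10.0)))
    return 1.0 - binary_entropy(p)

# Bisection for the smallest Eb/N0 such that C >= R (capacity is increasing in Eb/N0).
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if quantized_capacity(mid) >= R else (mid, hi)
print(f"Shannon limit with hard decisions, R = {R}: Eb/N0 ≈ {hi:.3f} dB")   # about 4.4 dB
```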

12.2 Maximum a Posteriori and Maximum Likelihood Decoding of Linear Block Codes

As discussed in Sect. 12.1.2, decoding is essentially a decision problem. Given the observation y from the communication channel, the decoder has to decide which of the M codewords was most likely transmitted, in order to minimize the maximum probability of error. Optimum decoding is based on the maximum a posteriori (MAP) decision criterion, and consists of assuming as the transmitted codeword the one maximizing the a posteriori probability:

$$ \hat{\varvec{x}} = {\text{argmax}}_{\varvec{x}} p\left( {\varvec{x} |\varvec{y}} \right). $$

When the codewords are a priori equally likely, MAP decoding is equivalent to maximum likelihood (ML) decoding, which returns the codeword

$$ \hat{\varvec{x}} = {\text{argmax}}_{\varvec{x}} p\left( {\varvec{y} |\varvec{x}} \right). $$

It is readily shown that, over a BSC , ML decoding is equivalent to returning the codeword exhibiting the minimum Hamming distance from the received word y. (Recall that the Hamming distance between two sequences is the number of positions at which the corresponding symbols are different.) Moreover, over a Bi-AWGN channel, ML decoding consists of returning the codeword (whose symbols belong to the set \( \{{{-}\sqrt {E_{s} } , {+}\sqrt {E_{s} } }\} \)) exhibiting the minimum Euclidean distance from y.
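Both decision rules can be implemented by exhaustive search when M is small. The following Python sketch illustrates them on a toy \( \left( {7,4} \right) \) Hamming code; the generator matrix below is one standard choice, used here only for illustration.

```python
import numpy as np
from itertools import product

# Generator matrix of a (7,4) Hamming code (illustrative choice) and its 2^4 codewords.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
codewords = np.array([np.mod(np.array(m) @ G, 2) for m in product((0, 1), repeat=G.shape[0])])

def ml_decode_bsc(y_hard):
    """ML decoding over a BSC: codeword at minimum Hamming distance from y."""
    distances = np.sum(codewords != np.asarray(y_hard), axis=1)
    return codewords[int(np.argmin(distances))]

def ml_decode_biawgn(y_soft, es=1.0):
    """ML decoding over the Bi-AWGN channel: codeword (mapped to +/- sqrt(Es) via
    x = sqrt(Es) * (1 - 2c)) at minimum Euclidean distance from y."""
    x = np.sqrt(es) * (1 - 2 * codewords.astype(float))
    distances = np.sum((x - np.asarray(y_soft, dtype=float)) ** 2, axis=1)
    return codewords[int(np.argmin(distances))]

print(ml_decode_bsc([1, 0, 0, 0, 1, 1, 1]))                   # single bit error corrected
print(ml_decode_biawgn([0.9, 1.1, 0.7, 1.2, 0.8, -0.2, 1.0])) # noisy all-zero codeword recovered
```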

Optimum decoding is unfeasible for most codes (including linear codes), due to the need to compute M metrics, with M prohibitively large. Low-density parity-check codes, introduced in Sect. 12.4, are capable of performing close to the Shannon limit with manageable complexity.

12.3 NAND Flash Memory Channel Model

In NAND flash memories, the generic memory cell is a floating gate transistor. Writing the cell consists of exploiting the Fowler-Nordheim tunneling effect [2] to inject a certain amount of charge into the floating gate in order to program the threshold voltage \( V_{th} \) of the transistor. For an MLC memory with b bits per cell, there are \( 2^{b} \) nominal values for the threshold voltage \( V_{th} \), each bijectively associated with a word of b bits. (There are two nominal values for \( V_{th} \) in the particular case of an SLC memory.) The whole range of possible values of \( V_{th} \) is then partitioned into \( 2^{b} \) intervals, each corresponding to a nominal value of the threshold voltage.

Reading a cell is a decision problem consisting of picking one of the \( 2^{b} \) nominal values of \( V_{th} \) and forwarding the corresponding binary b-tuple. The value of \( V_{th} \), however, cannot be observed directly. In order to read the cell, a word-line voltage must be applied and the corresponding transistor drain current measured. In this chapter, we refer to the word-line voltage simply as the “read voltage”, denoting it by \( V_{READ} \). If for some \( V_{READ} \) a sufficiently high drain current is detected then we conclude that \( V_{READ} > V_{th} \); otherwise we conclude that \( V_{READ} < V_{th} \). In this sense, the application of a specific read voltage value is capable of providing exactly one bit of information. Therefore, in order to read the full content of a cell in an MLC memory the drain current must be analyzed for a sufficiently large number of read voltage values. A single \( V_{READ} \) value is sufficient in the SLC case, unless we wish to extract some soft information to improve the performance of the adopted error control coding scheme.

In ideal flash memories, after a cell is written the corresponding value of \( V_{th} \) is exactly equal to one of the \( 2^{b} \) nominal values. In real memories, however, the actual value of \( V_{th} \) may differ, even significantly, from its nominal value due to a number of possible physical impairments. For a thorough description of these impairments we refer the reader, for example, to [3, Chap. 4], [4]. As such, the actual value of \( V_{th} \) may fall into a voltage interval whose nominal voltage threshold is different from the one we attempted to set during the write operation. When this happens, the binary b-tuple forwarded after a read operation differs from the one that was written into the cell. A bit error generated by an erroneous decision about the interval of voltage values to which \( V_{th} \) belongs is called a raw bit error, and the probability of occurrence of raw bit errors is called the raw bit error probability.

The raw bit error probability may be analyzed by modeling the threshold voltage \( V_{th} \) of the generic cell as a continuous random variable whose pdf is here denoted by \( p\left( {V_{th} } \right) \). It must be pointed out that \( p\left( {V_{th} } \right) \) is not constant during the memory lifetime, as it is modified by subsequent write and read operations, leading to a progressive degradation of the channel in terms of increasing raw bit error probability. The threshold voltages of two different memory cells are typically assumed to be independent and identically distributed (i.i.d.) random variables. In the following two subsections, the channel models for SLC and MLC flash memories are addressed.

12.3.1 SLC Channel Model

The simplest channel model for an SLC flash memory consists of modeling the threshold voltage \( V_{th} \) of the generic cell as a random variable whose pdf is an equal-weight mixture of two Gaussian densities with the same variance \( \sigma^{2} \), neglecting that, in principle, Gaussian random variables assume their values over an infinite range. The mean values of the two Gaussian densities are the two nominal values of the threshold voltage, namely, \( V_{th,1} \) and \( V_{th,2} \), where we assume \( V_{th,1} < V_{th,2} \). Let \( X \in \left\{ {0,1} \right\} \) be a Bernoulli random variable with equiprobable values, representing the bit originally written into the memory cell. Moreover, let Y be the symbol read from the cell. Conditionally on X, the threshold voltage \( V_{th} \) is a Gaussian random variable with variance \( \sigma^{2} \), whose mean is \( V_{th,1} \) if \( X = 1 \) (erase state) and \( V_{th,2} \) if \( X = 0 \). This is depicted in Fig. 12.5. Overall, we have

Fig. 12.5 Plot of \( p(V_{th} |X = 1) \) and \( p(V_{th} |X = 0 ) \) for an SLC flash memory where the pdf of the threshold voltage \( V_{th} \) is modeled as an equal-weight mixture of two Gaussian densities

$$ \begin{aligned} p\left( {V_{th} } \right) & = \frac{1}{2}p\left( {V_{th} |X = 1} \right) + \frac{1}{2}p\left( {V_{th} |X = 0} \right) \\ & = \frac{1}{{\sqrt {8\pi \sigma^{2} } }}\left( {e^{{ - \frac{{\left( {V_{th} - V_{th,1} } \right)^{2} }}{{2\sigma^{2} }}}} + e^{{ - \frac{{\left( {V_{th} - V_{th,2} } \right)^{2} }}{{2\sigma^{2} }}}} } \right). \\ \end{aligned} $$

If we apply only one read voltage \( V_{th,1} < V_{READ,1} < V_{th,2} \), we get information about whether the actual value of \( V_{th} \) is larger or smaller than the applied read voltage value. Hence, if only one read voltage value is used, Y, like X, is a Bernoulli random variable. In particular, we have \( Y = 1 \) if \( V_{th} < V_{READ,1} \) is detected, and \( Y = 0 \) otherwise. A raw bit error occurs any time \( Y \ne X \), and the raw bit error probability is trivially minimized by setting \( V_{READ,1} = \left( {V_{th,1} + V_{th,2} } \right)/2 \), as depicted in Fig. 12.5. In this situation, the channel is clearly equivalent to the cascade of a Bi-AWGN channel and a one-bit quantizer described in Example 12.3 (i.e., to a BSC), and the raw bit error probability is given by (12.8) where \( E_{s} /N_{0} = \left( {V_{th,2} - V_{READ,1} } \right)^{2} /\left( {2\sigma^{2} } \right) \). At the beginning of the memory life, \( \sigma^{2} \) is very small and the memory is almost ideal. Then, \( \sigma^{2} \) increases with the memory use, increasing the raw bit error probability and degrading the channel. A typical value of the raw bit error probability towards the end of the memory life is \( 10^{{-}2} \).

If an error correcting code is employed to protect the data stored in the flash memory, hard-decision decoding must necessarily be performed if only one \( V_{READ} \) value is used, as no soft information is available at the decoder. As will be shown later in this chapter, however, the availability of soft information at the decoder input represents an essential feature to boost the performance of the coding scheme. In order to provide the decoder with soft information, and consequently to increase its coding gain, more read voltages must be applied sequentially. For example, with reference again to Fig. 12.5, we may employ three read voltage values \( V_{READ,1} \), \( V_{READ,2} \), and \( V_{READ,3} \) and apply two of them for each cell read operation. Specifically, \( V_{READ,1} \) is applied first. If \( V_{th} < V_{READ,1} \), then \( V_{READ,2} \) is applied to discriminate between \( V_{th} < V_{READ,2} \) and \( V_{READ,2} < V_{th} < V_{READ,1} \). Otherwise, \( V_{READ,3} \) is applied to discriminate between \( V_{th} > V_{READ,3} \) and \( V_{READ,1} < V_{th} < V_{READ,3} \). In this case the output symbol Y is a discrete random variable assuming the four possible values in the set \( \{ Y_{1} ,Y_{2} ,Y_{3} ,Y_{4} \} \) and the channel may be represented as the DMC depicted in Fig. 12.6.

Fig. 12.6 Equivalent channel model for an SLC flash memory where the pdf of the threshold voltage is modeled as an equal-weight mixture of two Gaussian densities and where three read voltage values are employed. Each read operation involves two read voltages

Each arrow in the depicted DMC is associated with a transition probability \( p\left( {y |x} \right) \), where the transition probabilities depend on the choice of the read voltages \( V_{READ,2} \) and \( V_{READ,3} \). A “natural” approach to choose them consists of maximizing the mutual information between the random variables X and Y under the setting \( \Pr \left( {X = 0} \right) = \Pr \left( {X = 1} \right) = 1/2 \). This approach, proposed in [5], may be easily extended to any number of read voltages. It may also be easily extended to different choices of the pdf \( p\left( {V_{th} } \right) \), and therefore to MLC Flash memories.
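A minimal sketch of this threshold design, under the two-Gaussian SLC model of Fig. 12.5, is given below in Python: the two extra read voltages are placed symmetrically around \( V_{READ,1} \) and the symmetric offset is found by a grid search maximizing \( I\left( {X;Y} \right) \). The numerical parameter values are purely illustrative, and the restriction to a symmetric offset is a simplification of the general optimization in [5].

```python
import numpy as np
from scipy.stats import norm

vth1, vth2, sigma = 1.0, 3.0, 0.45        # illustrative SLC model parameters
vread1 = 0.5 * (vth1 + vth2)              # central read voltage

def mutual_information(read_voltages):
    """I(X;Y) in bits for equiprobable X and Y quantized by the given read voltages."""
    edges = np.concatenate(([-np.inf], np.sort(read_voltages), [np.inf]))
    trans = np.array([np.diff(norm.cdf(edges, loc=mean, scale=sigma))
                      for mean in (vth1, vth2)])          # p(y|x), one row per input
    p_y = 0.5 * trans.sum(axis=0)
    mi = 0.0
    for row in trans:
        nz = row > 0
        mi += 0.5 * np.sum(row[nz] * np.log2(row[nz] / p_y[nz]))
    return mi

# Grid search on a symmetric offset d: thresholds {vread1 - d, vread1, vread1 + d}.
offsets = np.linspace(0.05, 1.5, 300)
mi_values = [mutual_information([vread1 - d, vread1, vread1 + d]) for d in offsets]
best = offsets[int(np.argmax(mi_values))]
print(f"best offset ≈ {best:.3f} V, I(X;Y) ≈ {max(mi_values):.4f} bits "
      f"(single read voltage: {mutual_information([vread1]):.4f} bits)")
```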

12.3.2 MLC Channel Model

While the channel model for SLC Flash memories is rather well established, the development of an MLC channel model is still a subject of research and measurement campaigns, and several models may be found in the literature. These models typically assume the pdf of the random variable \( V_{th} \) to be an equal-weight mixture of \( 2^{b} \) conditional pdfs, each corresponding to a nominal value of the threshold voltage. Among these models, the one described next has been adopted in several works [6]. Letting X denote the binary b-tuple that was written in the cell, the pdf \( p(V_{th} |\varvec{X}_{1} = 11 \ldots 1) \) associated with the lowest nominal threshold voltage value \( V_{th,1} \) (erase state) is modeled as Gaussian with mean \( V_{th,1} \) and variance \( \sigma_{0}^{2} \), while the pdf \( p(V_{th} |\varvec{X}_{i} ) \) associated with any other nominal value \( V_{th,i} \) \( (\varvec{X}_{i} \ne 11 \ldots 1) \) is characterized by a uniform central region of size \( \Delta V \) centered at the mean value \( V_{th,i} \) and by two Gaussian tails of variance \( \sigma^{2} < \sigma_{0}^{2} \). Formally, for \( i \in \left\{ {2,3, \ldots ,2^{b} } \right\} \) we have

$$ p\left( {V_{th} |\varvec{X}_{i} } \right) = \left\{ {\begin{array}{*{20}c} {\frac{1}{{\sqrt {2\pi \sigma^{2} } + \Delta V}}e^{{ - \frac{{\left( {V_{th} - V_{th,i} - \Delta V/2} \right)^{2} }}{{2\sigma^{2} }}}} } & {V_{th} > V_{th,i} + \frac{\Delta V}{2}} \\ {\frac{1}{{\sqrt {2\pi \sigma^{2} } + \Delta V}}} & {V_{th,i} - \frac{\Delta V}{2} < V_{th} < V_{th,i} + \frac{\Delta V}{2}} \\ {\frac{1}{{\sqrt {2\pi \sigma^{2} } + \Delta V}}e^{{ - \frac{{\left( {V_{th} - V_{th,i} + \Delta V/2} \right)^{2} }}{{2\sigma^{2} }}}} } & {V_{th} < V_{th,i} - \frac{\Delta V}{2}} \\ \end{array} } \right. $$

and

$$ p\left( {V_{th} } \right) = \frac{1}{{2^{b} }}\sum\limits_{i = 1}^{{2^{b} }} {p\left( {V_{th} |\varvec{X}_{i} } \right)} . $$

A pictorial representation of the four conditional pdfs \( p\left( {V_{th} |X_{i} } \right) \), \( i \in \left\{ {1,2,3,4} \right\} \), for an MLC flash memory with \( b = 2 \) bits per cell and equally spaced threshold voltages is shown in Fig. 12.7.

Fig. 12.7 Representation of the four conditional probability density functions \( p\left( {V_{th} |\varvec{X}_{i} } \right) \) of the threshold voltage in an MLC flash memory with \( b = 2 \) bits per cell
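The conditional pdf above is easy to evaluate numerically; the following Python sketch (with illustrative parameter values) implements it and checks that it integrates to one.

```python
import numpy as np

def p_vth_given_xi(vth, vth_i, sigma, delta_v):
    """Conditional pdf of V_th for a programmed state X_i != 11...1: uniform central
    region of width delta_v centered at vth_i, with Gaussian tails of variance sigma^2."""
    vth = np.asarray(vth, dtype=float)
    c = 1.0 / (np.sqrt(2 * np.pi * sigma ** 2) + delta_v)
    out = np.full_like(vth, c)                                   # central (uniform) region
    hi = vth > vth_i + delta_v / 2
    lo = vth < vth_i - delta_v / 2
    out[hi] = c * np.exp(-(vth[hi] - vth_i - delta_v / 2) ** 2 / (2 * sigma ** 2))
    out[lo] = c * np.exp(-(vth[lo] - vth_i + delta_v / 2) ** 2 / (2 * sigma ** 2))
    return out

v = np.linspace(-5.0, 15.0, 200001)
pdf = p_vth_given_xi(v, vth_i=5.0, sigma=0.4, delta_v=1.0)
print(np.sum(pdf) * (v[1] - v[0]))     # ≈ 1.0: the pdf is correctly normalized
```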

In an analogous way as for the SLC case, a read is performed by applying sequentially a certain number of read voltages \( V_{READ} \) in order to identify the interval to which the actual value of the threshold voltage belongs. If \( N \ge 2^{b} - 1 \) different read voltages are employed, the equivalent communication channel is a DMC with \( 2^{b} \) equiprobable input symbols X and \( N + 1 \) output symbols Y. Again, the larger the number of employed read voltages (i.e., the larger the number of intervals into which the range of possible \( V_{th} \) values is partitioned), the more accurate the soft information at the decoder input and the lower the bit error rate after decoding. Again, the values of the N read voltages must be properly designed, for instance, by maximizing the mutual information \( I\left( {\varvec{X};\varvec{Y}} \right) \) under the assumption \( { \Pr }\left( {\varvec{X} = \varvec{X}_{i} } \right) = 2^{ - b} \) for all \( i \in \left\{ {1,2, \ldots ,2^{b} } \right\} \).

12.4 Low-Density Parity -Check Codes

Low-density parity-check (LDPC) codes were introduced by Gallager in [7] and were then almost forgotten for about 30 years. They gained new interest only after the discovery of turbo codes [8], when it was shown that iterative decoding schemes can attain performance very close to the Shannon limit with manageable complexity [9, 10].

A binary LDPC code is defined as a binary linear block code whose parity-check matrix H is characterized by a relatively small number of 1 entries, i.e., whose parity-check matrix is sparse. LDPC codes are often represented graphically through a bipartite graph \( G = \left( {{\mathcal{V}} \cup {\mathcal{C}},{\mathcal{E}}} \right) \) called the Tanner graph [11]. In the Tanner graph there are two different types of nodes, namely, the variable nodes (whose set is \( {\mathcal{V}} \)) and the check nodes (whose set is \( {\mathcal{C}} \)). The n variable nodes and the m check nodes are associated in a bijective way with the n encoded bits of the generic codeword and with the m parity-check equations, respectively. An edge \( e \in {\mathcal{E}} \) in the Tanner graph connects a variable node \( V \in {\mathcal{V}} \) to a check node \( C \in {\mathcal{C}} \) if and only if the bit corresponding to V is involved in the parity-check equation corresponding to C. Note that, in general, the m parity-check equations may not all be linearly independent, so that the actual code rate R of the LDPC code fulfills

$$ R \ge \frac{n - m}{n} $$

where equality holds when all m equations are independent. In the Tanner graph of an LDPC code a cycle (or loop) is any closed path starting from a node and ending on the same node. The length of a cycle is the number of edges involved in the cycle. Moreover, the girth g of the Tanner graph is the length of its shortest loop. For reasons that will be clear in the next section, the Tanner graph of an LDPC code should exhibit a large girth. In the Tanner graph , the degree of a variable node or check node is the number of edges incident to it. An LDPC code is said to be regular if all of its variable nodes have the same degree and all of its check nodes have the same degree, and is said to be irregular otherwise.

The representation of LDPC codes in terms of their Tanner graphs is very convenient in order to describe their iterative decoding algorithm, known as belief propagation (BP). In fact, as will be discussed in Sect. 12.5, BP decoding of LDPC codes may be interpreted as an iterative exchange of messages between the variable nodes and the check nodes along the edges of the Tanner graph. In principle, the Tanner graph can be drawn for any H matrix of any linear block code. As an example, in Fig. 12.8 the Tanner graph is depicted for the \( \left( {7,4} \right) \) Hamming code represented by \( \varvec{H} = \left[ {1110100,1101010,1011001} \right] \).

Fig. 12.8 Tanner graph of a \( \left( {7,4} \right) \) Hamming code represented by \( \varvec{H} = \left[ {1110100,1101010,1011001} \right] \). There are seven variable nodes, \( \left\{ {V_{0} , \ldots ,V_{6} } \right\} \), one for each encoded bit, and three check nodes, \( \left\{ {C_{0} , \ldots ,C_{2} } \right\} \), one for each parity-check equation
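The Tanner graph can be read directly off the rows and columns of H; the following Python sketch builds the two neighborhoods and the node degrees for the Hamming code of Fig. 12.8.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code of Fig. 12.8.
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

m, n = H.shape
var_neighbors = {i: np.flatnonzero(H[:, i]).tolist() for i in range(n)}   # N(i)
chk_neighbors = {j: np.flatnonzero(H[j, :]).tolist() for j in range(m)}   # N(j)

print("variable-node degrees:", [len(v) for v in var_neighbors.values()])
print("check-node degrees:   ", [len(c) for c in chk_neighbors.values()])
print("number of edges:", int(H.sum()))   # variable-node degrees are not all equal,
                                          # so this code is irregular
```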

In this section we provide a few details about binary LDPC code design, while LDPC decoding is discussed in the next section. One of the major issues in LDPC coding is efficient encoding, i.e., the efficient computation of the encoded codeword of n bits from a message W represented by a binary k-tuple. Hence, we focus on the design of quasi-cyclic LDPC (QC-LDPC) codes based on circulant matrices, a class of LDPC codes characterized by low-complexity encoding and good performance [12]. In general, a linear block code is said to be quasi-cyclic when there exists some positive integer q such that a cyclic shift by q positions of any codeword results in another codeword. The encoder of QC-LDPC codes may be implemented very efficiently in hardware using shift register-based circuits [13]. Efficient hardware implementations of the decoder are also available [14].

12.4.1 LDPC Code Ensembles

As opposed to classical algebraic codes, LDPC codes are typically analyzed in terms of average ensemble properties, where an LDPC code ensemble is formed by all LDPC codes having the same codeword length n and nominally the same rate R, and sharing common properties. This approach was introduced by Gallager to analyze his regular LDPC codes [7], and has been successfully adopted to design irregular LDPC codes performing very close to the Shannon limit [15, 16].

An example of LDPC code ensemble is the unstructured irregular one [15]. Let n and m be the numbers of variable and check nodes , respectively. Moreover, let \( \Lambda _{i} \) and \( {\text{P}}_{i} \) be the fractions of variable nodes and check nodes of degree i, respectively. Hence, in the Tanner graph there are \( \Lambda _{i} n \) variable nodes with i sockets and \( {\text{P}}_{i} m \) check nodes with i sockets and the number of edges is \( E = n\sum\nolimits_{i = 2}^{D} {i\Lambda _{i} } = m\sum\nolimits_{i = 2}^{H} {i{\text{P}}_{i} } \) where D is the maximum variable node degree and H the maximum check node degree. For given \( \Lambda _{i} \), \( i = 2, \ldots ,D \) and \( {\text{P}}_{i} \), \( i = 2, \ldots ,H \),Footnote 2 the unstructured \( {\mathcal{C}}\left( {n,\Lambda ,{\text{P}}} \right) \) ensemble includes all LDPC codes corresponding to all possible \( E! \) edge permutations between the variable node and the check node sockets, according to a uniform probability distribution.
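As a small worked example of these relations, the design (nominal) rate of an unstructured ensemble follows directly from the degree fractions, since the edge count E can be written from either side of the graph. The degree fractions below are illustrative.

```python
# Design rate of an unstructured irregular ensemble from its degree fractions:
# E = n * sum(i * Lambda_i) = m * sum(i * P_i)  =>  m/n = sum(i * Lambda_i) / sum(i * P_i).
Lambda = {2: 0.5, 3: 0.3, 6: 0.2}     # fractions of variable nodes of each degree (sum to 1)
P = {6: 0.4, 7: 0.6}                  # fractions of check nodes of each degree (sum to 1)

avg_var_degree = sum(i * f for i, f in Lambda.items())
avg_chk_degree = sum(i * f for i, f in P.items())
m_over_n = avg_var_degree / avg_chk_degree
print(f"design rate R >= 1 - m/n = {1 - m_over_n:.3f}")
```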

Another example is the protograph ensemble [17] (see also the work [18] on LDPC codes from superposition). A protograph is defined as a small Tanner graph and represents the starting point to derive a larger Tanner graph via a “copy-and-permute” procedure. Specifically, the protograph is first copied Q times. Then, the edges of the individual replicas are permuted among the replicas, leading to a larger graph. The edge permutation is performed in such a way that, if an edge e connects a variable node V to a check node C in the protograph, then in the final graph any of the Q replicas of e may connect only a replica of V to a replica of C. Note that, while parallel edges between nodes are allowed in the protograph, they are avoided in the permutation phase. An example of this copy-and-permute procedure is depicted in Fig. 12.9. For a given protograph and a given Q the ensemble is composed of the LDPC codes corresponding to all possible edge permutations fulfilling the described constraints (again, the probability distribution over such permutations is uniform).

Fig. 12.9 Conceptual example of the copy-and-permute protograph procedure

12.4.2 QC-LDPC Codes Construction

A very popular technique to design finite-length LDPC codes consists of two consecutive steps. An ensemble of LDPC codes with desired properties is first designed, and then a code from the ensemble is picked by constructing its Tanner graph according to some graph-lifting algorithm. In the first design phase (ensemble optimization) asymptotic ensembles are considered, i.e., ensembles of LDPC codes whose codeword length tends to infinity (examples are the unstructured \( {\mathcal{C}}\left( {\infty ,\Lambda ,{\text{P}}} \right) \) ensemble and the protograph ensemble defined by a specific finite-length protograph in the limit where \( Q \to \infty \)). The main parameter characterizing an asymptotic ensemble of LDPC codes under iterative decoding is the asymptotic decoding threshold [15, 19]. Letting \( \ell \) be the iteration index and assuming that the communication channel is parameterized by some real parameter \( \theta \) such that \( \theta_{1} < \theta_{2} \) means that the channel corresponding to \( \theta_{2} \) is a degraded version of the channel corresponding to \( \theta_{1} \), the asymptotic threshold \( \theta^{*} \) is defined as

$$ \theta^{*} = { \sup }\left\{ {\theta \,{\text{s}}.{\text{t}}.\,P_{e,\ell }^{\infty } \to 0\,\,{\text{as}}\,\,\ell \to \infty } \right\} $$

where \( P_{e,\ell }^{\infty } \) is the average error probability under iterative decoding over the asymptotic ensemble (i.e., the expected probability of error for an LDPC code randomly picked in the asymptotic ensemble). For example, over a BSC the parameter \( \theta \) is the crossover probability p, while over a Bi-AWGN channel it is the noise power \( \sigma^{2} \) for given \( E_{s} \) (therefore over the Bi-AWGN channel the threshold may be expressed as \( \left( {E_{b} /N_{0} } \right)^{*} \), where \( E_{b} = E_{s} /R \) and R is the nominal ensemble rate). Note that, for the same ensemble, the threshold is different for different message-passing decoders. For unstructured ensembles the threshold may be calculated exactly via a procedure called density evolution [15] or approximately via a tool known as the EXIT chart [20]. For protograph ensembles it may be calculated with good approximation via multi-dimensional EXIT analysis [21]. In Sect. 12.6.2 density evolution is reviewed for unstructured regular LDPC ensembles and for a very simple decoder called the Gallager B decoder.

Once a protograph ensemble with a satisfying threshold over the channel of interest has been designed, a QC-LDPC code can be constructed from the protograph. This step is usually performed by first representing the protograph as a base matrix B. The numbers of rows and columns in the base matrix equal the numbers of check and variable nodes in the protograph, respectively. Moreover, the \( \left( {j,i} \right){\text{th}} \) entry of B is equal to the number of connections between check node \( C_{j} \) and variable node \( V_{i} \) in the protograph. For example, the base matrix corresponding to the protograph depicted in Fig. 12.9 is

$$ \varvec{B} = \left[ {\begin{array}{*{20}c} 1 & 2 & 1 \\ 1 & 1 & 1 \\ \end{array} } \right]. $$

In order to construct the parity-check matrix H of a QC-LDPC code from B, each entry in the base matrix is replaced with a \( Q \times Q \) circulant matrix, where a circulant matrix is any square matrix such that every row is obtained from the previous row by a cyclic shift to the right by one position. An entry in B equal to t is replaced by a circulant matrix whose rows and columns all have Hamming weight t. (Null entries in B are replaced by zero \( Q \times Q \) square matrices.) If the number of variable nodes in the protograph is \( n_{p} \), then the final LDPC code has length \( Qn_{p} \). Moreover, it is a QC-LDPC code as the cyclic shift of any codeword by \( n_{p} \) positions results in another codeword. The specific circulant matrices used to replace the entries of the base matrix are chosen according to algorithms aimed at increasing the girth g of the graph, making it suitable for iterative message-passing decoding. It is pointed out that sometimes the parity-check matrix H is obtained by lifting the base matrix in several steps. For example, instead of replacing each entry of B by a \( Q \times Q \) matrix (for large Q), \( \tilde{Q} \times \tilde{Q} \) circulant matrices may be used at first, with Q being a multiple of \( \tilde{Q} \), and then circulant permutation matrices of size \( Q/\tilde{Q} \) may replace each entry in the “intermediate” matrix.Footnote 3
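A minimal Python sketch of the basic (single-step) lifting is given below for the base matrix of Fig. 12.9; the circulant shift values are arbitrary illustrative choices, whereas in practice they are selected by girth-increasing algorithms.

```python
import numpy as np

def circulant(q, shifts):
    """Q x Q circulant matrix whose first row has ones at the given positions."""
    first_row = np.zeros(q, dtype=int)
    first_row[list(shifts)] = 1
    return np.array([np.roll(first_row, r) for r in range(q)])

def lift(base, q, shift_table):
    """Replace each nonzero entry t of the base matrix with a weight-t Q x Q circulant
    (shifts taken from shift_table); zero entries become all-zero Q x Q blocks."""
    return np.block([[circulant(q, shift_table[(j, i)]) if base[j][i] > 0
                      else np.zeros((q, q), dtype=int)
                      for i in range(len(base[0]))] for j in range(len(base))])

B = [[1, 2, 1],
     [1, 1, 1]]                                        # base matrix of the protograph in Fig. 12.9
shifts = {(0, 0): [1], (0, 1): [0, 3], (0, 2): [2],    # illustrative shifts; in practice chosen
          (1, 0): [4], (1, 1): [2], (1, 2): [0]}       # by girth-increasing algorithms
H = lift(B, q=5, shift_table=shifts)
print(H.shape)            # (10, 15): n = Q * n_p = 5 * 3
print(H.sum(axis=0))      # column weights follow the column sums of B: 2, 3, 2
```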

12.4.3 Error Floor

Finite-length LDPC codes are affected by a phenomenon known as the “error floor” [24, 25]. Considering again a communication channel parameterized by a real parameter \( \theta \) indicating the level of channel noise, the error floor consists of a sudden reduction in the slope of the LDPC code performance curve when \( \theta \) becomes lower than some value. For example, over the BSC the error floor appears at sufficiently low values of the error probability p, while over the Bi-AWGN channel it appears at sufficiently high values of \( E_{b} /N_{0} \). An example performance curve in terms of bit error rate (BER) versus \( E_{b} /N_{0} \) exhibiting an error floor is depicted in Fig. 12.10. In NAND flash memory applications, very stringent requirements are usually imposed on the error floor. More specifically, it is often required that the error floor not appear above a page error rate (i.e., codeword error rate) of \( 10^{{-}15} \).

Fig. 12.10 Performance curve (in terms of BER vs. \( E_{b} /N_{0} \)) exhibiting an error floor at \( {\text{BER}}\,\approx\,10^{ - 7} \) (\( E_{b} /N_{0} > 4.6 \) dB)

The error floor of LDPC codes under belief propagation decoding is mainly due to graphical structures in the Tanner graph called trapping sets [25]. Given a subset \( {\mathcal{W}} \) of the variable nodes, the subgraph induced by \( {\mathcal{W}} \) is the bipartite graph composed of \( {\mathcal{W}} \), of the subset \( {\mathcal{U}} \) of check nodes connected to \( {\mathcal{W}} \), and of the corresponding edges. By definition, an \( \left( {a,b} \right) \) trapping set is any size-a subset \( {\mathcal{W}} \) of the variable nodes such that there are exactly b check nodes of odd degree (and an arbitrary number of check nodes of even degree) in the corresponding induced subgraph. The parameter a is called the size of the trapping set. If there are only degree-1 and degree-2 check nodes in the induced subgraph, then the trapping set is said to be elementary. Elementary trapping sets of small size are a major cause of error floor for iteratively decoded LDPC codes. We point out that small-weight codewords may also contribute to the error floor, together with trapping sets.
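For a given parity-check matrix and a given set of variable nodes, the parameters \( \left( {a,b} \right) \) and the elementary property are straightforward to compute; a small Python sketch follows, applied to the Hamming parity-check matrix used earlier purely to illustrate the bookkeeping.

```python
import numpy as np

def trapping_set_parameters(H, var_subset):
    """Return (a, b, elementary) for the subgraph induced by the given variable nodes:
    a is the subset size, b the number of odd-degree check nodes in the induced subgraph."""
    sub = np.asarray(H)[:, sorted(var_subset)]
    check_degrees = sub.sum(axis=1)
    a = len(var_subset)
    b = int(np.sum(check_degrees % 2 == 1))
    elementary = bool(np.all(check_degrees[check_degrees > 0] <= 2))
    return a, b, elementary

H = np.array([[1, 1, 1, 0, 1, 0, 0],       # the Hamming parity-check matrix used above
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
print(trapping_set_parameters(H, {4, 5}))  # (2, 2, True): an elementary (2, 2) trapping set
```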

The need to construct LDPC codes characterized by very low error floors imposes some modifications to the QC-LDPC code design procedure described in the previous subsection, which becomes more involved. The asymptotic decoding threshold is not the only metric to be taken into account during the ensemble optimization phase, as other asymptotic parameters such as the typical relative minimum distance or smallest trapping set size must be considered [26, 27]. We also point out that reliable error floor analysis at very low error rates of LDPC codes for storage applications still represents an open issue. In fact, Monte Carlo software simulation is not feasible at very low error rates because of prohibitively long simulation times. Approaches proposed in the literature are hardware simulation, importance sampling [6, 28], and estimation techniques [29].

12.5 Belief Propagation (BP) Decoding of LDPC Codes

12.5.1 Introduction

As opposed to MAP and ML decoding algorithms (Sect. 12.2), which are block-wise algorithms, BP is a bit-wise decoding algorithm, working iteratively. More specifically, at the end of each decoding iteration a separate decision is taken about each bit in the codeword, and then it is checked whether the currently decoded hard-decision sequence is a codeword or not. Letting \( \varvec{y} = \left[ {y_{0} ,y_{1} , \ldots ,y_{n - 1} } \right] \) denote the sequence at the output of the communication channel, the decision about encoded bit \( c_{i} \), \( i = 0, \ldots ,n - 1 \), is taken according to its a posteriori likelihood ratio (LR), namely,

$$ L\left( {c_{i} |\varvec{y}} \right) = \frac{{{\text{Pr(}}c_{i} = 0 |\varvec{y})}}{{{\text{Pr(}}c_{i} = 1 |\varvec{y})}}\begin{array}{*{20}c} {\hat{c}_{i} = 0} \\ { \gtrless } \\ {\hat{c}_{i} = 1} \\ \end{array} 1. $$

Unfortunately, the only information available at variable node i at the beginning of the decoding process is the a priori LR

$$ L(c_{i} |y_{i} )= \frac{{{\text{Pr(}}c_{i} = 0 |y_{i} )}}{{{\text{Pr(}}c_{i} = 1 |y_{i} )}} $$

i.e., the LR conditioned only on the local observation, not the a posteriori LR \( L(c_{i} |\varvec{y}) \) as required. Indeed, the task of the BP decoder consists of calculating the a posteriori LR for each variable node, starting from the individual a priori LRs, by exploiting an iterative exchange of information among the nodes of the bipartite graph. In the following description of the BP decoder, we will make no assumption on the communication channel other than that it is memory-less with binary input and equally likely input values.

12.5.2 Preliminaries

We start with some preliminary material that will be useful to properly describe BP decoding of LDPC codes.

Let us consider a Bernoulli random variable B taking the values 0 and 1 with equal probabilities. As depicted in Fig. 12.11, assume that N random experiments are performed to get information about the value assumed by B and that all these experiments are independent. The outcome of the n-th experiment (n-th observation) is denoted by \( \omega_{n} \), while the vector of N observations is denoted by \( \varvec{\omega}= \left[ {\omega_{1} ,\omega_{2} , \ldots ,\omega_{N} } \right] \). We define the likelihood ratio (LR) of B conditioned on the observation \( \omega_{n} \) as

Fig. 12.11 N random experiments are conducted to obtain some information about the value of a Bernoulli random variable B. The observation associated with the n-th random experiment is \( \omega_{n} \)

$$ L\left( {B |\omega_{n} } \right) = \frac{{{\text{Pr(}}B = 0 |\omega_{n} )}}{{{\text{Pr(}}B = 1 |\omega_{n} )}} $$
(12.11)

and the a posteriori likelihood ratio of B (i.e., conditioned on the whole set of N independent observations) as

$$ L\left( {B |\varvec{\omega}} \right) = \frac{{{\text{Pr(}}B = 0 |\varvec{\omega})}}{{{\text{Pr(}}B = 1 |\varvec{\omega})}}. $$
(12.12)

We now seek an expression of the a posteriori LR, \( L(B|\varvec{\omega}) \), as a function of the individual LRs, each conditioned on a specific observation. By Bayes’ rule we have

$$ \begin{aligned} L\left( {B |\varvec{\omega}} \right) & = \frac{{p (\varvec{\omega}|B = 0)}}{{p\left( {\varvec{\omega}|B = 1} \right)}} \\ & = \prod\limits_{n = 1}^{N} {\frac{{p (\omega_{n} |B = 0)}}{{p\left( {\omega_{n} |B = 1} \right)}}} \\ & = \prod\limits_{n = 1}^{N} {L\left( {B |\omega_{n} } \right)} , \\ \end{aligned} $$
(12.13)

where the second equality follows from independence of the random experiments.

We also observe that, through (12.12) and the relationship \( \Pr \left( {B = 0 |\varvec{\omega}} \right) + \Pr \left( {B = 1 |\varvec{\omega}} \right) = 1 \), the probabilities \( \Pr \left( {B = 0 |\varvec{\omega}} \right) \) and \( \Pr \left( {B = 1 |\varvec{\omega}} \right) \) may be expressed as functions of the a posteriori LR as follows:

$$ { \Pr }\left( {B = 0 |\varvec{\omega}} \right) = \frac{{L (B |\varvec{\omega})}}{{1 + L (B |\varvec{\omega})}}, $$
(12.14)
$$ Pr\left( {B = 1 |\varvec{\omega}} \right) = \frac{1}{{1 + L (B |\varvec{\omega})}}. $$
(12.15)

This representation is sometimes referred to as a soft bit. Analogous relationships may be derived for \( { \Pr }(B = 0|\omega_{n} ) \) and \( { \Pr }(B = 1|\omega_{n} ) \).

Next, consider n statistically independent Bernoulli random variables \( B_{1} ,B_{2} , \ldots ,B_{n} \), each taking its value in \( \left\{ {0,1} \right\} \). We allow \( \Pr \left( {B_{k} = 1} \right) \ne \,\Pr \left( {B_{l} = 1} \right) \) if \( k \ne l \). We ask for the probability that the n variables sum to 0 (modulo 2), i.e., the probability that an even number of such random variables take value 1. This problem was solved in [7], where it was shown that

$$ { \Pr }(B_{1} + B_{2} + \cdots + B_{n} = 0) = \frac{{1 + \prod\nolimits_{{{\text{k}} = 1}}^{n} {\left( {1 - 2\Pr \left( {B_{k} = 1} \right)} \right)} }}{2}. $$
(12.16)
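Equation (12.16) is easy to verify by simulation; the following Python sketch compares the closed form with a Monte Carlo estimate for an arbitrary set of probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
p1 = np.array([0.1, 0.35, 0.2, 0.45])            # Pr(B_k = 1), arbitrary illustrative values

closed_form = (1 + np.prod(1 - 2 * p1)) / 2      # Eq. (12.16)

samples = rng.random((200_000, p1.size)) < p1    # Monte Carlo estimate of Pr(sum = 0 mod 2)
monte_carlo = np.mean(samples.sum(axis=1) % 2 == 0)

print(closed_form, monte_carlo)                  # the two values agree closely
```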

Consider now n Bernoulli random variables \( B_{1} ,B_{2} , \ldots ,B_{n} \) fulfilling a parity constraint \( B_{1} + B_{2} + \cdots + B_{n} = 0 \). Moreover, assume that some reliability information is known about variables \( B_{1} , \ldots ,B_{i - 1} ,B_{i + 1} , \ldots ,B_{n} \), in terms of LRs \( L\left( {B_{k} } \right) \), \( k \in \left\{ {1, \ldots ,i - 1,i + 1, \ldots ,n} \right\} \), and that \( B_{1} , \ldots ,B_{i - 1} ,B_{i + 1} , \ldots ,B_{n} \) are statistically independent. We seek an expression of the LR \( L\left( {B_{i} } \right) \), conditional on all available information about the other \( n - 1 \) variables. Since \( { \Pr }(B_{i} = 0) = { \Pr }(B_{1} + \cdots + B_{i - 1} + B_{i + 1} + \cdots + B_{n} = 0) \), through (12.16) we obtain

$$ \begin{aligned} & { \Pr }\left( {B_{i} = 0 |L\left( {B_{1} } \right), \ldots ,L\left( {B_{i - 1} } \right),L\left( {B_{i + 1} } \right), \ldots ,L\left( {B_{n} } \right)} \right) \\ & \,\,\, = \frac{{1 + \prod\nolimits_{{{\text{k}} \ne {\text{i }}}} {\left( {1 - 2\Pr \left( {B_{k} = 1} \right)} \right)} }}{2} \\ \end{aligned} $$

and, consequently,

$$ \begin{aligned} & { \Pr }\left( {B_{i} = 1 |L\left( {B_{1} } \right), \ldots ,L\left( {B_{i - 1} } \right),L\left( {B_{i + 1} } \right), \ldots ,L\left( {B_{n} } \right)} \right) \\ & \,\,\, = \frac{{1 - \prod\nolimits_{{{\text{k}} \ne {\text{i }}}} {\left( {1 - 2\Pr \left( {B_{k} = 1} \right)} \right)} }}{2}. \\ \end{aligned} $$

Note that each term \( \Pr \left( {B_{k} = 1} \right) \) involved in the multiplication may be expressed in terms of the corresponding \( L\left( {B_{k} } \right) \) through (12.15). From the term-by-term ratio between these two latter equations, we obtain

$$ {\text{L}}(B_{i} |L\left( {B_{1} } \right), \ldots ,L\left( {B_{i - 1} } \right),L\left( {B_{i + 1} } \right), \ldots ,L\left( {B_{n} } \right) )= \frac{{1 + \prod\nolimits_{{{\text{k}} \ne {\text{i }}}} {\left( {1 - 2\Pr \left( {B_{k} = 1} \right)} \right)} }}{{1 - \prod\nolimits_{{{\text{k}} \ne {\text{i }}}} {\left( {1 - 2\Pr \left( {B_{k} = 1} \right)} \right)} }}. $$

Through (12.15), after a few calculations this leads to

$$ {\text{L}}(B_{i} |L\left( {B_{1} } \right), \ldots ,L\left( {B_{i - 1} } \right),L\left( {B_{i + 1} } \right), \ldots ,L\left( {B_{n} } \right) )= \frac{{\prod\nolimits_{k \ne i} {\frac{{L\left( {B_{k} } \right) + 1}}{{L\left( {B_{k} } \right) - 1}} + 1} }}{{\prod\nolimits_{k \ne i} {\frac{{L\left( {B_{k} } \right) + 1}}{{L\left( {B_{k} } \right) - 1}} - 1} }}. $$
(12.17)

12.5.3 Algorithm Description

12.5.3.1 Overview

For ease of presentation, in the description of the algorithm we omit the decoding iteration index. We denote by \( r_{i}^{j} \) the message sent by variable node \( V_{i} \), \( i = 0, \ldots ,n - 1 \), to check node \( C_{j} \), \( j = 0, \ldots ,m - 1 \) during the current iteration, and by \( m_{j}^{i} \) the message sent back by check node \( C_{j} \), to variable node \( V_{i} \), during the same iteration. For \( i = 0, \ldots ,n - 1 \), we also denote by \( w_{i} \) the a priori LR for variable node \( V_{i} \), i.e. ,

$$ w_{i} = \frac{{\Pr \left( {c_{i} = 0 |y_{i} } \right)}}{{\Pr \left( {c_{i} = 1 |y_{i} } \right)}}. $$

This is illustrated in Fig. 12.12.

Fig. 12.12 Tanner graph of an LDPC code. The message sent by variable node \( V_{i} \) to check node \( C_{j} \) and the message sent by check node \( C_{j} \) to variable node \( V_{i} \) are denoted by \( r_{i}^{j} \) and \( m_{j}^{i} \), respectively

Belief-propagation decoding is composed of four steps, namelyFootnote 4:

  • initialization;

  • horizontal step;

  • vertical step;

  • hard decision and stopping criterion step.

Of these, the initialization step is executed only once, at the beginning of decoding. The other three steps are executed iteratively, until a termination condition is met or a maximum number of iterations, denoted by \( I_{max} \), is reached. Each decoding iteration is split into two half-iterations. During the first half-iteration (horizontal step), check nodes process messages incoming from their neighboring variable nodes. Then, each check node sends one message along every edge incident on it. Thus, every check node sends one message per iteration to each of its neighboring variable nodes. During the second half-iteration (vertical step), variable nodes process messages incoming from their neighboring check nodes. Similar to the previous half-iteration, at the end of this processing each variable node sends one message along each edge incident on it. Thus, every variable node sends one message per iteration to each of its neighboring check nodes. At the end of the two half-iterations, a hard decision is taken in each variable node about the value of the corresponding encoded bit.

The message transmitted by check node \( C_{j} \), \( j = 0, \ldots ,m - 1 \), to variable node \( V_{i} \), \( i = 0, \ldots ,n - 1 \), where \( V_{i} \) belongs to the neighborhood of \( C_{j} \), may be interpreted as the best estimate \( C_{j} \) has about the value of \( V_{i} \) up to the current iteration. This is the estimate of the value of \( V_{i} \) given all information about \( V_{i} \) that the check node has received from the variable nodes connected to it other than \( V_{i} \). This is known as extrinsic information. Analogously, the message sent back by variable node \( V_{i} \) to check node \( C_{j} \) may be interpreted as the best estimate \( V_{i} \) has about itself up to the current iteration. This is the estimate of its value given all information the variable node has received from the communication channel and from the check nodes connected to it other than \( C_{j} \) (extrinsic information). All messages exchanged between variable nodes and check nodes are LRs or, equivalently, soft bits.

At the end of the vertical step, each variable node takes a hard decision about the value of its associated bit, based on the a priori information incoming from the channel and on all estimates incoming from the check nodes connected to it. If the obtained hard-decision binary sequence \( \hat{\varvec{c}} \) is a codeword of the LDPC code, i.e., if every check node is connected to an even number of variable nodes whose current estimate is 1, then a decoding success is declared, decoding is terminated, and \( \hat{\varvec{c}} \) is returned as the decoded codeword. Otherwise, a new iteration is started, unless the maximum number of iterations has been reached. In this latter case, no codeword has been found and a decoding failure is declared. LDPC codes decoded via belief propagation are then characterized by two different error events: detected errors and undetected errors. A detected error takes place whenever no codeword is found up to the maximum number of iterations. An undetected error takes place whenever, at some iteration, the hard-decision sequence \( \hat{\varvec{c}} \) is a codeword but not the transmitted one. Undetected errors may be extremely dangerous in some contexts, including NAND Flash memories.

12.5.3.2 Initialization

At the beginning, each variable node broadcasts to all its neighboring check nodes the a priori LR received from the communication channel. Hence, we have

$$ r_{i}^{j} = w_{i} $$

for all \( j \in N\left( i \right) \), where \( N\left( i \right) \) is the set of indexes of check nodes connected to \( V_{i} \). The expression of \( w_{i} \) depends on the nature of the channel. For example, it is easy to check that over a BSC with error probability p and antipodal mapping \( x_{i} = 1 - 2c_{i} \in \{{{-}1, {+}1}\} \), we have

$$ w_{i} = \left\{ {\begin{array}{*{20}l} {\frac{1 - p}{p}\,\,\,\,{\text{if}}\,\,y_{i} = {+}1} \hfill \\ {\frac{p}{1 - p}\,\,\,\,{\text{if}}\,\,y_{i} = {-}1.} \hfill \\ \end{array} } \right. $$
(12.18)

As another example, over a Bi-AWGN channel with the same antipodal mapping \( x_{i} = 1 - 2c_{i} \) (meaning \( E_{s} \) normalized to 1), we have

$$ w_{i} = e^{{(2/\sigma^{2} )y_{i} }} . $$
(12.19)

Importantly, the initialization step requires knowledge of the channel parameters: in the BSC case the error probability p must be known, while in the Bi-AWGN case the noise power \( \sigma^{2} \) is required.
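As an illustration, the following Python fragment sketches how the a priori LRs in (12.18) and (12.19) might be evaluated in software; the function names and the numerical values are illustrative only, and the channel parameters are assumed to be known, as discussed above.

```python
import numpy as np

def bsc_lr(y, p):
    """A priori LR over a BSC, following (12.18).
    y is the received antipodal symbol in {+1, -1}; p is the (known) error probability."""
    return (1.0 - p) / p if y == +1 else p / (1.0 - p)

def biawgn_lr(y, sigma2):
    """A priori LR over the Bi-AWGN channel, following (12.19).
    y is the real channel output; sigma2 is the (known) noise variance (E_s = 1)."""
    return np.exp(2.0 * y / sigma2)

# Illustrative use on a few received samples.
print([bsc_lr(y, p=0.01) for y in (+1, -1, +1)])      # [99.0, 0.0101..., 99.0]
print([biawgn_lr(y, sigma2=0.5) for y in (0.8, -1.2)])
```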

12.5.3.3 Horizontal Step

For \( j = 0, \ldots ,m - 1 \), check node \( C_{j} \), of degree \( h_{j} \), sends to each of the \( h_{j} \) variable nodes connected to it its current estimate of the corresponding bit. If variable node \( V_{i} \) is connected to \( C_{j} \), the message from \( C_{j} \) to \( V_{i} \) is the LR of bit \( c_{i} \), conditional on the information available at \( C_{j} \) incoming from all its neighboring variable nodes, except the information incoming from \( V_{i} \). A pictorial representation of this process is provided in Fig. 12.13. Note that two different variable nodes connected to \( C_{j} \) will receive, in general, different messages.

Fig. 12.13 Check node processing of incoming messages during the horizontal step

The message \( m_{j}^{i} \) from \( C_{j} \) to \( V_{i} \) can be calculated exploiting one of the results introduced in Sect. 12.5.2. In fact, each of the \( h_{j} \) incoming messages is the LR of a specific bit on which the check node imposes a parity constraint. Hence, under independence hypothesis, denoting by \( N\left( j \right)\backslash \left\{ i \right\} \) the set of indexes of variable nodes connected to \( C_{j} \) except \( V_{i} \), from (12.17) we immediately obtain

$$ m_{j}^{i} = \frac{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{r_{k}^{j} + 1}}{{r_{k}^{j} - 1}} + 1} }}{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{r_{k}^{j} + 1}}{{r_{k}^{j} - 1}} - 1} }}. $$
(12.20)

Note that the independence hypothesis is fulfilled only during the first \( g/2 \) decoding iterations, where g is the girth of the Tanner graph ; during all subsequent iterations it holds only approximately.
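A minimal sketch of the check node update (12.20) follows, assuming the incoming LRs have already been collected from the neighboring variable nodes other than \( V_{i} \); the function name is illustrative.

```python
def check_node_message(incoming_lrs):
    """Check node update (12.20): incoming_lrs are the LRs r_k^j from all
    neighbors of C_j except V_i; the returned value is the extrinsic LR m_j^i."""
    prod = 1.0
    for r in incoming_lrs:
        prod *= (r + 1.0) / (r - 1.0)   # assumes no LR is exactly 1 (erasure)
    return (prod + 1.0) / (prod - 1.0)

# Two moderately reliable neighbors both pointing to 0 give m_j^i of about 4.56.
print(check_node_message([9.0, 9.0]))
```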

12.5.3.4 Vertical Step

For \( i = 0, \ldots ,n - 1 \), variable node \( V_{i} \), of degree \( d_{i} \), sends to each of its \( d_{i} \) neighboring check nodes its current estimate of the associated bit. With reference to Fig. 12.14, the message \( r_{i}^{j} \) sent to check node \( C_{j} \) is the LR about bit \( c_{i} \), conditional on the a priori information available from the communication channel and on the information incoming from all check nodes connected to it, except \( C_{j} \). Again, two different check nodes connected to \( V_{i} \) will receive, in general, different messages.

Fig. 12.14 Variable node processing of incoming messages during the vertical step

The message \( r_{i}^{j} \) that variable node \( V_{i} \) sends to check node \( C_{j} \) connected to it can be easily computed based on the result in Sect. 12.5.2. In fact, each of the \( d_{i} \) messages incoming towards the variable node (including the message \( w_{i} \) incoming from the channel) represents the LR of \( c_{i} \) conditioned to some observation. Under the hypothesis of independence for the \( d_{i} \) observations, denoting by \( N\left( i \right)\backslash \left\{ j \right\} \) the set of indexes of check nodes connected to \( V_{i} \) except the check node of index j, we have

$$ r_{i}^{j} = w_{i} \prod\limits_{{k \in N\left( i \right)\backslash \left\{ j \right\}}} {m_{k}^{i} } . $$
(12.21)

(Again, the independence hypothesis is valid rigorously only during the first \( g/2 \) decoding iterations.)
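A corresponding sketch of the variable node update (12.21), again with illustrative names only:

```python
def variable_node_message(w_i, extrinsic_lrs):
    """Variable node update (12.21): the channel LR w_i times the product of
    the extrinsic LRs m_k^i received from all check nodes except C_j."""
    r = w_i
    for m in extrinsic_lrs:
        r *= m
    return r

# Channel and two check nodes all favor bit value 0: the LR grows to 24.
print(variable_node_message(4.0, [2.0, 3.0]))
```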

12.5.3.5 Hard Decision and Stopping Criterion

At the last step of each iteration, every variable node takes a decision about its associated encoded bit. This decision is based on all currently available information about the bit, i.e. , on the a priori information from the communication channel and on all messages incoming from the check nodes . Let \( \varvec{m}^{i} \) denote the list of all messages incoming towards the variable node \( V_{i} \). Applying again the result developed in Sect. 12.5.2 under the hypothesis of independence of the incoming messages, we may write

$$ L\left( {c_{i} |w_{i} ,\varvec{m}^{i} } \right) = w_{i} \prod\limits_{k \in N\left( i \right)} {m_{k}^{i} } . $$
(12.22)

(Again, the independence hypothesis is fulfilled rigorously only during the first \( g/2 \) decoding iterations.) The decision about encoded bit \( c_{i} \) at the end of the generic iteration is then

$$ L\left( {c_{i} |w_{i} ,\varvec{m}^{i} } \right)\mathop{ \gtrless }\limits_{{\hat{c}_{i} = 1}}^{{\hat{c}_{i} = 0}} 1. $$

If the current hard-decision sequence \( \hat{\varvec{c}} \) is a codeword (\( \hat{\varvec{c}}\varvec{H}^{T} = 0 \), where H is any parity-check matrix of the code) then the algorithm is terminated and \( \hat{\varvec{c}} \) is returned as the decoded codeword. Else, if \( \hat{\varvec{c}} \) is not a codeword and the maximum number of iterations \( I_{max} \) has been reached, the algorithm is terminated and a failure is reported. Else, a new iteration is started jumping to the horizontal step. Belief propagation decoding of LDPC codes may be summarized as follows.
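To make the overall flow concrete, the Python sketch below runs the probability-domain decoder just described on a generic (dense, for clarity only) parity-check matrix. It is a didactic rendering under stated assumptions (all LRs are positive and none is exactly equal to 1), not an implementation-grade decoder.

```python
import numpy as np

def bp_decode(H, w, max_iter=50):
    """Probability-domain BP decoding sketch.
    H: (m, n) binary parity-check matrix as a NumPy array (dense for clarity);
    w: length-n array of a priori LRs; returns (codeword_found, c_hat)."""
    m, n = H.shape
    # Initialization (Sect. 12.5.3.2): variable-to-check messages equal w_i.
    r = {(j, i): w[i] for j in range(m) for i in range(n) if H[j, i]}
    msg = {}                                   # check-to-variable messages
    c_hat = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # Horizontal step (12.20).
        for j in range(m):
            neigh = [i for i in range(n) if H[j, i]]
            for i in neigh:
                prod = 1.0
                for k in neigh:
                    if k != i:
                        prod *= (r[(j, k)] + 1.0) / (r[(j, k)] - 1.0)
                msg[(j, i)] = (prod + 1.0) / (prod - 1.0)
        # Vertical step (12.21), a posteriori LR (12.22) and hard decision.
        for i in range(n):
            neigh = [j for j in range(m) if H[j, i]]
            app = w[i]
            for j in neigh:
                app *= msg[(j, i)]
            c_hat[i] = 0 if app >= 1.0 else 1
            for j in neigh:
                r[(j, i)] = app / msg[(j, i)]  # remove own contribution -> extrinsic
        # Stopping criterion: all parity checks satisfied.
        if not np.any(H.dot(c_hat) % 2):
            return True, c_hat
    return False, c_hat
```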

12.5.4 Log-Domain BP Decoder

The main issue in implementing the BP decoding described in Sect. 12.5.3 is the need to handle and combine, through multiplications and divisions, likelihood ratios whose values may differ by several orders of magnitude. For this reason, a log-domain implementation is usually preferred in practice. In the log-domain version of BP decoding, log-likelihood ratios (LLRs) of the encoded bits are exchanged between variable and check nodes . Next, we discuss how the above-described BP decoding shall be modified in the log domain. All logarithms are assumed to be natural logarithms. Moreover, \( {\text{sgn}}\left( x \right) \) will denote the sign function, i.e. , \( {\text{sgn}}\left( x \right) = {+}1 \) if \( x \ge 0 \) and \( {\text{sgn}}\left( x \right) = {-}1 \) otherwise.

The initialization step remains the same, the only difference being that the first message each variable node sends to all its neighboring check nodes is the a priori LLR of the corresponding encoded bit. Neglecting again the iteration index and denoting by \( R_{i}^{j} \) the message sent from variable node \( i \in \left\{ {0, \ldots ,n - 1} \right\} \) to check node \( j \in N\left( i \right) \), we have

$$ R_{i}^{j} = W_{i} , $$

where \( W_{i} = \log w_{i} \). For instance, assuming antipodal mapping \( x_{i} = 1 - 2c_{i} \), over a BSC with error probability p we have

$$ W_{i} = \left\{ {\begin{array}{*{20}l} {\log \frac{1 - p}{p}\,\,\,\,{\text{if}}\,\,y_{i} = {+}1} \hfill \\ {\log \frac{p}{1 - p}\,\,\,\,{\text{if}}\,\,y_{i} = {-}1} \hfill \\ \end{array} } \right. $$
(12.23)

while, over a Bi-AWGN channel,

$$ W_{i} = \frac{2}{{\sigma^{2} }}y_{i} . $$
(12.24)

The development of check node message processing (horizontal step) in the log domain is more involved. Denoting \( R_{i}^{j} = \log r_{i}^{j} \) and \( M_{j}^{i} = \log m_{j}^{i} \), from (12.20) we may write

$$ \begin{aligned} M_{j}^{i} & = \log \frac{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{e^{{R_{k}^{j} }} + 1}}{{e^{{R_{k}^{j} }} - 1}} + 1} }}{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{e^{{R_{k}^{j} }} + 1}}{{e^{{R_{k}^{j} }} - 1}} - 1} }} \\ & = \log \frac{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot \prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{e^{{\left| {R_{k}^{j} } \right|}} + 1}}{{e^{{\left| {R_{k}^{j} } \right|}} - 1}} + 1} }}{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot \prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{e^{{\left| {R_{k}^{j} } \right|}} + 1}}{{e^{{\left| {R_{k}^{j} } \right|}} - 1}} - 1} }}, \\ \end{aligned} $$

where we have exploited the fact that any odd function fulfills \( f\left( x \right) = {\text{sgn}}\left( x \right)f\left( {\left| x \right|} \right) \) and the fact that \( f\left( x \right) = (e^{x} + 1)/\left( {e^{x} - 1} \right) \) is odd. The obtained expression of \( M_{j}^{i} \) can be further developed through the identity \( { \log }\left( {\left( {x + 1} \right)/\left( {x - 1} \right)} \right) = {\text{sgn}}\left( x \right) \cdot { \log }\left( {\left( {\left| x \right| + 1} \right)/\left( {\left| x \right| - 1} \right)} \right) \) and through the fact that \( e^{\left| R \right|} \ge 1 \). This yields

$$ \begin{aligned} M_{j}^{i} & = \prod\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot { \log }\frac{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{e^{{|R_{k}^{j} |}} + 1}}{{e^{{\left| {R_{k}^{j} } \right|}} - 1}} + 1} }}{{\prod\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\frac{{e^{{\left| {R_{k}^{j} } \right|}} + 1}}{{e^{{\left| {R_{k}^{j} } \right|}} - 1}} - 1} }} \\ & = \prod\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot { \log }\frac{{e^{{\sum\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{ \log }\frac{{e^{{|R_{k}^{j} |}} + 1}}{{e^{{\left| {R_{k}^{j} } \right|}} - 1}}} }} + 1}}{{e^{{\sum\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{ \log }\frac{{e^{{|R_{k}^{j} |}} + 1}}{{e^{{\left| {R_{k}^{j} } \right|}} - 1}}} }} - 1}} \\ & = \prod\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot \varphi \left( {\sum\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\varphi (|R_{k}^{j} |)} } \right) \\ \end{aligned} $$
(12.25)

where, for \( x > 0 \), we have introduced the nonlinear function

$$ \varphi \left( x \right) = \log \frac{{e^{x} + 1}}{{e^{x} - 1}} = - { \log }(\tanh \left( {x/2} \right)). $$

A plot of this function is depicted in Fig. 12.15. Note that the function coincides with its inverse, i.e. , \( \varphi \left( {\varphi \left( x \right)} \right) = x \).
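As a small numerical illustration of these properties, the following sketch evaluates \( \varphi \) and the log-domain check node update (12.25); it assumes no incoming LLR is exactly zero, and the function names are illustrative.

```python
import numpy as np

def phi(x):
    """phi(x) = log((e^x + 1)/(e^x - 1)) = -log(tanh(x/2)), defined for x > 0."""
    return -np.log(np.tanh(x / 2.0))

def check_node_llr(extrinsic_llrs):
    """Log-domain check node update (12.25); assumes no incoming LLR is exactly 0."""
    R = np.asarray(extrinsic_llrs, dtype=float)
    return np.prod(np.sign(R)) * phi(np.sum(phi(np.abs(R))))

print(phi(phi(1.3)))                # about 1.3: phi coincides with its inverse
print(check_node_llr([2.0, -1.5]))  # negative, since the two neighbors disagree
```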

Fig. 12.15 Plot of the function \( \varphi \left( x \right) = - { \log }(\tanh \left( {x/2} \right)) \)

The transposition of the variable node processing (vertical step) to the logarithmic domain is much simpler. In fact, from (12.21) we immediately obtain

$$ R_{i}^{j} = W_{i} + \sum\limits_{{k \in N\left( i \right)\backslash \left\{ j \right\}}} {M_{k}^{i} } . $$
(12.26)

Analogously, (12.22) shall be updated as

$$ { \log }L\left( {c_{i} |W_{i} ,\varvec{M}^{i} } \right) = W_{i} + \sum\limits_{k \in N\left( i \right)} {M_{k}^{i} } . $$
(12.27)

The algorithm may then be summarized as follows.

Although operating on log-likelihood ratios achieves enhanced numerical stability and a lower complexity (for instance, the products in (12.21) and (12.22) are turned into the sums in (12.26) and (12.27), respectively), check node processing in the log domain imposes the evaluation of the nonlinear function \( \varphi \). For a single check node \( C_{j} \) of degree \( h_{j} \), this function should in principle be evaluated \( (h_{j} )^{2} \) times per iteration (even if techniques to limit the number of \( \varphi \) evaluations exist). The calculation of \( \varphi \) is typically performed by means of lookup tables. However, for small x the graph of \( \varphi \left( x \right) \) is very steep, which requires a very fine (in general, nonuniform) discretization of the corresponding region of the function domain, so that implementing \( \varphi \left( x \right) \) through a lookup table may be quite inconvenient in hardware. For these reasons, extensive work has been carried out to develop either approximations of the log-domain BP decoder or other reduced-complexity decoding schemes. All of these decoders offer a lower error correction capability than actual BP. However, they also exhibit a lower decoding complexity and, hence, a higher decoding speed.

12.6 Reduced-Complexity Decoders

So far we have focused on the BP decoder (both in probability domain and log-domain) originally developed by Gallager . Next, we present a few reduced-complexity, implementation-friendly decoders for LDPC codes. It must be pointed out that a large number of reduced-complexity decoding schemes for LDPC codes have been developed in the last decade [30]. Most of these decoding schemes may be seen as approximations of the BP decoder , in the sense that they are characterized by approximations of the most complex step of BP decoding, namely, the horizontal step (consisting of the calculation of extrinsic messages from the check nodes to the variable nodes). As such, these approximate BP decoding algorithms can be formalized via the same pseudo-code we have adopted for the log-domain BP decoder , with a difference in step 2.

We only present the most famous approximation of the BP decoder , called the Min-Sum (MS) decoder . We then move on to describe decoders exhibiting even lower complexity. More specifically, we present a binary message-passing algorithm known as “Gallager B” (and originally proposed in [7]) and a class of non-message-passing decoders named “flipping algorithms” (the idea of bit flipping appears again in [7]). These very low complexity decoding algorithms (along with some of their modifications, not addressed in this chapter) are of interest in NAND Flash memories at the beginning of the memory life, when the raw bit error probability is extremely low.

12.6.1 Min-Sum Decoder

The MS decoder can be directly developed from the log-domain BP decoder as follows. From Fig. 12.15 observe that the graph of function \( \varphi \left( x \right) \) is very steep for small values of x. Then, when x assumes small values, a small perturbation of x determines a large deviation of \( \varphi \left( x \right) \). For this reason, if at least one of the magnitudes \( |R_{k}^{j} | \) in the summation appearing in (12.25) is sufficiently small, the corresponding value of \( \varphi (|R_{k}^{j} |) \) dominates the other summands. Hence, we can write

$$ \begin{aligned} M_{j}^{i} & = \prod\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot \varphi \left( {\sum\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\varphi (|R_{k}^{j} |)} } \right) \\ & \approx \prod\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot \varphi \left( {\max_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} \varphi (|R_{k}^{j} |)} \right) \\ & = \prod\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {{\text{sgn}}\left( {R_{k}^{j} } \right)} \cdot \min_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} |R_{k}^{j} | \\ \end{aligned} $$
(12.28)

where the last equality follows from \( \varphi \left( x \right) \) being self-invertible (i.e. , \( \varphi \left( {\varphi \left( x \right)} \right) = x \)) and monotonically decreasing. The MS decoding algorithm is summarized next.
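A minimal sketch of the resulting Min-Sum check node update (12.28); the function name is illustrative.

```python
import numpy as np

def min_sum_check_node(extrinsic_llrs):
    """Min-Sum check node update (12.28): product of signs times the minimum
    magnitude among the extrinsic incoming LLRs."""
    R = np.asarray(extrinsic_llrs, dtype=float)
    return np.prod(np.sign(R)) * np.min(np.abs(R))

print(min_sum_check_node([2.0, -1.5, 0.7]))   # -0.7
```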

Several improvements to the MS decoder have been proposed in the literature, to reduce the gap between its performance and that of BP decoding, at the expense of a small increase in terms of computational cost. These refinements are beyond the scope of this book. Interested readers may refer, for example, to [31, 32].

12.6.2 Gallager B Decoder

The BP and MS decoders are characterized by real-valued (properly quantized, in hardware implementation) messages exchanged between the variable nodes and the check nodes . Moreover, as previously emphasized, both algorithms remain unchanged over a wide range of communication channels. In contrast, the Gallager B decoder , first proposed in [7], is a message-passing decoding algorithm for LDPC codes characterized by binary-valued messages and is specifically tailored for the BSC (i.e. , no soft information is available at the decoder input). Although its performance is poor compared with that of BP and MS algorithms over the BSC , it has been proved that it represents the optimum LDPC decoder over the BSC when the extrinsic messages are constrained to be binary.

The algorithm works as follows. Assuming transmission over a BSC with error probability p and input and output alphabets \( {\mathcal{X}} = {\mathcal{Y}} = \left\{ {0,1} \right\} \), for \( i = 0, \ldots ,n - 1 \) variable node \( V_{i} \) is fed with the corresponding binary symbol \( y_{i} \in \left\{ {0,1} \right\} \) received from the channel. (In contrast, to perform BP decoding over the BSC variable node i is initialized according to (12.18) or to its logarithmic version (12.23).) The symbol \( y_{i} \) is broadcast by variable node \( V_{i} \) to each of its neighboring check nodes . The algorithm is then structured similarly to BP or MS, where the horizontal, vertical, and stopping criterion steps are specified as follows.

During the horizontal step, for \( j = 0, \ldots ,m - 1 \) the message propagating from check node \( C_{j} \) to variable node \( V_{i} \), \( i \in N\left( j \right) \), is simply the modulo-2 summation of all binary messages incoming from variable nodes connected to \( C_{j} \), except the message incoming from \( V_{i} \). Hence, we can write

$$ m_{j}^{i} = \sum\limits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {r_{k}^{j} } $$
(12.29)

where the summation is modulo-2. (Note that \( r_{i}^{j} = y_{i} \) for all \( i = 0, \ldots ,n - 1 \) at the first iteration.) During the vertical step, for \( i = 0, \ldots ,n - 1 \) the message from variable node \( V_{i} \) to check node \( C_{j} \), \( j \in N\left( i \right) \), is equal to the modulo-2 complement of \( y_{i} \) if the number of incoming extrinsic messages different from \( y_{i} \) is above some threshold, and is equal to \( y_{i} \) otherwise. Letting

$$ X_{j}^{i} = \left| {\left\{ {m_{k}^{i} \ne y_{i}\,s.t.\,k \in N\left( i \right)\backslash \left\{ j \right\}} \right\}} \right| $$

and \( T^{\left( i \right)} \) be the number of such extrinsic messages and the threshold at the current iteration, respectively, and letting \( C\left( {y_{i} } \right) \) be the modulo-2 complement of \( y_{i} \), we have

$$ r_{i}^{j} = \left\{ {\begin{array}{*{20}l} {C\left( {y_{i} } \right)} \hfill & {{\text{if}}\,\,X_{j}^{i} \ge T^{\left( i \right)} } \hfill \\ {y_{i} } \hfill & {{\text{otherwise}}.} \hfill \\ \end{array} } \right. $$
(12.30)

At the end of each decoding iteration, for each variable node \( V_{i} \) the decision about the current value of the local bit \( \hat{c}_{i} \) is taken according to a majority policy. More specifically, if the variable node degree \( d_{i} \) is even, then \( \hat{c}_{i} \) is set equal to the value assumed by the majority of the incoming messages \( m_{j}^{i} \) and of \( y_{i} \). On the other hand, if the variable node degree is odd, then \( \hat{c}_{i} \) is set equal to the value assumed by the majority of the incoming messages \( m_{j}^{i} \) (\( y_{i} \) is not considered).
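The three update rules can be sketched in a few lines of Python; the function names are illustrative, and the threshold is passed in explicitly (its appropriate values are discussed next).

```python
def gallager_b_check_msg(extrinsic_bits):
    """Horizontal step (12.29): modulo-2 sum of the extrinsic incoming bits."""
    return sum(extrinsic_bits) % 2

def gallager_b_var_msg(y_i, extrinsic_bits, threshold):
    """Vertical step (12.30): complement y_i if at least `threshold` extrinsic
    messages disagree with it, otherwise repeat y_i."""
    disagreements = sum(1 for m in extrinsic_bits if m != y_i)
    return 1 - y_i if disagreements >= threshold else y_i

def gallager_b_decision(y_i, incoming_bits):
    """End-of-iteration majority decision: y_i joins the vote only when the
    variable node degree is even, so that ties cannot occur."""
    votes = list(incoming_bits) + ([y_i] if len(incoming_bits) % 2 == 0 else [])
    return 1 if sum(votes) > len(votes) / 2 else 0
```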

Appropriate values for the threshold \( T^{\left( i \right)} \) range between \( \lfloor (d_{i} - 1)/2\rfloor \) and \( d_{i} \), as the number of incoming extrinsic messages enforcing an outgoing message different from \( y_{i} \) must be sufficiently high. Note that in principle, for irregular codes the value of the threshold may be different for two different variable nodes, even during the same iteration. Also note that, for the same variable node, the value of the threshold may not remain constant with the iteration index, as it may be adjusted dynamically. In [7] it was shown that for a regular \( \left( {d,h} \right) \) LDPC code, the optimum value of the threshold (the same for all variable nodes at the same iteration) is the smallest integer T for which the inequality

$$ \frac{1 - p}{p} \le \left( {\frac{{1 + \left( {1 - 2\varepsilon } \right)^{h - 1} }}{{1 - \left( {1 - 2\varepsilon } \right)^{h - 1} }}} \right)^{2T - d + 1} $$
(12.31)

is fulfilled, where p is the BSC error probability and \( \varepsilon \) is the extrinsic error probability. This latter parameter represents the average probability that an edge in the Tanner graph carries an error message from the variable node set to the check node set at the considered iteration, and varies over iterations. In the asymptotic setting where the Tanner graph is assumed to be cycle-free, the update equation for \( \varepsilon \) for regular LDPC codes is [7]

$$ \begin{aligned} \varepsilon_{\ell + 1} & = p - p\sum\limits_{{z = T_{\ell } }}^{d - 1} {\left( {\begin{array}{*{20}c} {d - 1} \\ z \\ \end{array} } \right)} \left[ {\frac{{1 + \left( {1 - 2\varepsilon_{\ell } } \right)^{h - 1} }}{2}} \right]^{z} \left[ {\frac{{1 - \left( {1 - 2\varepsilon_{\ell } } \right)^{h - 1} }}{2}} \right]^{d - 1 - z} \\ & \quad + \left( {1 - p} \right)\sum\limits_{{z = T_{\ell } }}^{d - 1} {\left( {\begin{array}{*{20}c} {d - 1} \\ z \\ \end{array} } \right)} \left[ {\frac{{1 - \left( {1 - 2\varepsilon_{\ell } } \right)^{h - 1} }}{2}} \right]^{z} \left[ {\frac{{1 + \left( {1 - 2\varepsilon_{\ell } } \right)^{h - 1} }}{2}} \right]^{d - 1 - z} \\ \end{aligned} $$
(12.32)

where \( \ell \ge 0 \) is the iteration index and where \( \varepsilon_{0} = p \).

Example 12.5

Equation (12.32) represents the density evolution recursion for Gallager B decoding of regular unstructured \( \left( {d,h} \right) \) LDPC code ensembles. The asymptotic decoding threshold \( p^{*} \) for this ensemble under Gallager B decoding is then the sup of the set of all \( p > 0 \) such that \( {\lim }_{\ell \to \infty } \varepsilon_{\ell } = 0 \). For given d and h, whether some p is above or below threshold can be easily checked by running the recursion (with starting point \( \varepsilon_{0} = p \)), adapting the value of \( T_{\ell } \) at each iteration according to (12.31) for the current value of \( \varepsilon_{\ell } \). For example, for \( d = 4 \) and \( h = 40 \) (which corresponds to a rate \( R = 9/10 \) ensemble) we obtain a threshold \( p^{*} = 0.0041 \). Through (12.8) and \( E_{s} = RE_{b} \), this corresponds to a threshold \( (E_{b} /N_{o} )^{*} = 5.892 \) dB, about 1.5 dB away from the Shannon limit relevant to the one-bit quantized Bi-AWGN channel.
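A hedged sketch of this procedure is given below: it runs recursion (12.32), adapting \( T_{\ell } \) at each iteration via (12.31), and declares convergence when \( \varepsilon_{\ell } \) falls below a small tolerance. The iteration cap, the tolerance, and the fallback threshold are choices of this sketch rather than part of the original method; the printed outcomes are what one would expect if the reported threshold \( p^{*} = 0.0041 \) is accurate.

```python
import math

def gallager_b_converges(p, d=4, h=40, max_iter=5000, tol=1e-12):
    """Density evolution recursion (12.32) for a regular (d, h) ensemble, with the
    threshold T adapted at each iteration via (12.31). Returns True if the
    extrinsic error probability falls below `tol`, i.e., p is below threshold."""
    eps = p
    for _ in range(max_iter):
        s = (1.0 - 2.0 * eps) ** (h - 1)
        base = (1.0 + s) / (1.0 - s)
        # Smallest integer T fulfilling (12.31); fall back to d - 1 if none does.
        T = next((t for t in range(d) if (1.0 - p) / p <= base ** (2 * t - d + 1)),
                 d - 1)
        plus, minus = (1.0 + s) / 2.0, (1.0 - s) / 2.0
        a = sum(math.comb(d - 1, z) * plus ** z * minus ** (d - 1 - z)
                for z in range(T, d))
        b = sum(math.comb(d - 1, z) * minus ** z * plus ** (d - 1 - z)
                for z in range(T, d))
        eps = p - p * a + (1.0 - p) * b
        if eps < tol:
            return True
    return False

# With the threshold p* = 0.0041 reported above, one expects:
print(gallager_b_converges(0.0040))   # True  (below threshold)
print(gallager_b_converges(0.0045))   # False (above threshold)
```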

12.6.3 Flipping Algorithms

Flipping algorithms are a class of low-complexity, iterative decoding algorithms for LDPC codes over the BSC different from message-passing ones. The decoding strategy consists of flipping, at the end of each decoding iteration, the current value of a subset of variable nodes for which a certain flipping condition is fulfilled. If the obtained binary sequence is a codeword, decoding is stopped and the codeword is returned. Otherwise, a new iteration is started. The process continues until a codeword is found or a maximum number of iterations is reached. Different flipping algorithms are characterized by different criteria to identify the variable nodes to be flipped.

A popular flipping algorithm, hereafter referred to simply as the bit-flipping (BF) algorithm, consists of flipping at each iteration the bits associated with those variable nodes for which the number u of unsatisfied neighboring check nodes is maximum. A BSC with input and output alphabets \( {\mathcal{X}} = {\mathcal{Y}} = \left\{ {0,1} \right\} \) is assumed.
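A minimal sketch of this BF rule, operating on a dense parity-check matrix for clarity only, is the following.

```python
import numpy as np

def bit_flipping_decode(H, y, max_iter=50):
    """Bit-flipping decoding sketch over the BSC: at each iteration, flip the
    bits involved in the largest number of unsatisfied checks.
    H: (m, n) binary parity-check matrix; y: received word in {0,1}^n."""
    c = np.array(y, dtype=int)
    for _ in range(max_iter):
        syndrome = H.dot(c) % 2
        if not syndrome.any():
            return True, c                 # codeword found
        # u[i]: number of unsatisfied checks in which bit i participates
        u = H.T.dot(syndrome)
        c[u == u.max()] ^= 1               # flip the bits with the worst score
    return False, c
```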

12.7 Non-binary LDPC Codes

The LDPC codes introduced so far are binary, in that the code represents an \( {Rn} \)-dimensional subspace of the vector space \( \text{GF}\left({2}\right)^{{n}} \), where R is the code rate, n is the codeword length, and \( \text{GF}\left({2}\right) \) is the Galois field of order 2. More specifically, the LDPC code is the (\( {Rn} \)-dimensional) null space of an \( {m} \times {n} \) sparse parity-check matrix H. All n encoded symbols belong to \( \text{GF}\left({2}\right) \), as do all elements of H. If row vector a belongs to \( \text{GF}\left({2}\right)^{{n}} \), then the syndrome of a is \( \varvec{s} = \varvec{aH}^{\varvec{T}} \in {\text{GF}}\left({2} \right)^{{m}} \), where all operations are performed in \( \text{GF}\left({2} \right) \). Vector a is a codeword if and only if its syndrome is null.

Like other classes of linear block codes, also LDPC codes may be constructed on Galois fields of order \( q > 2 \) [33]. In this case the code can be represented by a sparse parity-check matrix H on \( {\text{GF}}\left( q \right) \), i.e. , a matrix whose elements \( h_{j,i} \), \( j \in \left\{ {0,1, \ldots ,m - 1} \right\} \) and \( i \in \left\{ {0,1, \ldots ,n - 1} \right\} \), belong to \( {\text{GF}}\left( q \right) \) and with a relatively small number of nonzero elements. The code is an \( Rn \)-dimensional subspace of the vector space \( {\text{GF}}\left( q \right)^{n} \), where R is still the code rate and the codeword length n is expressed in Galois field symbols. Letting row vector a belong to \( {\text{GF}}\left( q \right)^{n} \), the syndrome of a is still \( \varvec{s} = \varvec{aH}^{T} \in {\text{GF}}\left( q \right)^{m} \), where now all operations are performed in \( {\text{GF}}\left( q \right) \). Still, a is a codeword if and only if its syndrome is null. Hereafter we focus on LDPC codes constructed on extension fields \( {\text{GF}}\left( q \right) \) with \( q = 2^{p} \) for integer \( p \ge 2 \). We denote by \( \alpha \) a primitive element of \( {\text{GF}}\left( q \right) \). We use the terminology non-binary LDPC (NB-LDPC ) code to refer to an LDPC code constructed on the Galois field \( {\text{GF}}\left( q \right) \).

12.7.1 NB-LDPC Code Ensembles

Like binary LDPC codes, NB-LDPC codes also admit a graphical representation through a Tanner graph \( G = \left( {{\mathcal{V}} \cup {\mathcal{C}},{\mathcal{E}}} \right) \). Again, \( {\mathcal{V}} = \left\{ {V_{0} ,V_{1} , \ldots ,V_{n - 1} } \right\} \) is the set of variable nodes, \( {\mathcal{C}} = \left\{ {C_{0} ,C_{1} , \ldots ,C_{m - 1} } \right\} \) is the set of check nodes , and \( {\mathcal{E}} \) is the set of edges. The number of edges, equal to the number of non-zero entries of H, is still denoted by E. The n variable nodes and the m check nodes are still bijectively associated with the n codeword symbols and with the m parity -check equations, respectively; each encoded symbol now belongs to \( {\text{GF}}\left( q \right) \) and each parity -check equation is a linear equation in \( {\text{GF}}\left( q \right) \). In the Tanner graph , variable node \( V_{i} \in {\mathcal{V}} \) is connected to check node \( C_{j} \in {\mathcal{C}} \) by an edge if and only if \( h_{j,i} \in {\text{GF}}\left( q \right)\backslash \left\{ 0 \right\} \), i.e. , if and only if the element of H in row \( j \in \left\{ {0,1, \ldots ,m - 1} \right\} \) and column \( i \in \left\{ {0,1, \ldots ,n - 1} \right\} \) is non-zero. Equivalently, \( V_{i} \) is connected to \( C_{j} \) if and only if the non-binary codeword symbol \( c_{i} \), associated with \( V_{i} \), is involved in the parity -check equation corresponding to \( C_{j} \). Edge labeling represents the main difference between the Tanner graphs of binary and non-binary LDPC codes. As opposed to the Tanner graph of a binary LDPC code, in fact, in the Tanner graph of a NB-LDPC code the edge connecting variable node \( V_{i} \) to check node \( C_{j} \) is labeled by the corresponding non-zero element \( h_{j,i} \) of the parity-check matrix .

As an example, the Tanner graph of a linear block code with codeword length \( n = 5 \) and dimension \( k = 3 \) (where both n and k are measured in field symbols) is shown in Fig. 12.16. The Tanner graph has two check nodes , each imposing a linear constraint on the variable nodes connected to it, and five variable nodes, each representing a codeword symbol. The parity-check matrix of the corresponding linear block code is

Fig. 12.16 Tanner graph of a non-binary linear block code over \( {\text{GF}}\left( 4 \right) \) with codeword length 5 (field symbols) and code rate 3/5. Each edge in the Tanner graph is labeled with a non-zero element of \( {\text{GF}}\left( 4 \right) \)

$$ \varvec{H} = \left[ {\begin{array}{*{20}c} \alpha & 1 & 0 & 1 & 0 \\ 0 & {\alpha^{2} } & {\alpha^{2} } & \alpha & 1 \\ \end{array} } \right]. $$

Similarly to their binary counterparts, NB-LDPC codes are usually analyzed in terms of ensemble average. Ensembles of NB-LDPC codes are defined similarly to ensembles of binary LDPC codes, with the difference that edge labeling is also considered in the ensemble definition. For example, the unstructured ensemble of NB-LDPC codes over \( {\text{GF}}\left( q \right) \) of length n and degree distribution \( \left( {\Lambda },{\text{P}} \right) \), denoted by \( {\mathcal{C}}_{q} \left( {n,\Lambda ,{\text{P}}} \right) \), includes the LDPC codes constructed on \( {\text{GF}}\left( q \right) \) corresponding to all possible \( E! \) edge permutations between the variable node and the check node sockets, according to a uniform probability distribution and, for each such permutation, all possible edge labelings with non-zero elements of \( {\text{GF}}\left( q \right) \), again according to a uniform probability measure. Ensembles of ultra-sparse NB-LDPC codes (where all variable nodes have degree 2) have attracted an increasing interest in the past decade [34, 35]. Ensembles of protograph-based NB-LDPC codes may also be defined similarly to their binary counterparts, by including edge labeling in the ensemble definition [36, 37].

12.7.2 Iterative Decoding of NB-LDPC Codes

Similarly to binary LDPC codes, NB-LDPC codes may be decoded iteratively via BP decoding. The BP decoder for NB-LDPC codes may be regarded as a generalization of the above-described BP decoder for binary LDPC codes. Hereafter, we provide a description of such a decoder , focusing on its probability-domain implementation. We assume an extension field of order \( q = 2^{p} \) for integer \( p \ge 2 \). For the sake of clarity, we divide the algorithm into six steps called initialization, message permutation, horizontal step, message de-permutation, vertical step, and hard decision and stopping criterion. Out of these six steps, the first one (initialization) is performed only once, at the beginning of the algorithm, while the others are performed iteratively until a stopping rule is verified.

In the non-binary BP decoder , each message still represents extrinsic information. As opposed to the binary case, in which each message exchanged between a variable node \( V_{i} \) and a check node \( C_{j} \) is (in the log-domain implementation) a scalar value representing a likelihood ratio or log-likelihood ratio, in the non-binary setting each message is a vector of length \( q = 2^{p} \) representing a pmf for the non-binary symbol associated with \( V_{i} \). For example, for an LDPC code constructed over the Galois field \( {\text{GF}}\left( 4 \right) \), the message \( m_{j}^{i} \) from check node \( C_{j} \) to variable node \( V_{i} \) is a vector with four elements having the form \( m_{j}^{i} = \left( {m_{j}^{i} \left( 0 \right),m_{j}^{i} \left( 1 \right),m_{j}^{i} \left( \alpha \right),m_{j}^{i} \left( {\alpha^{2} } \right)} \right) \) where \( m_{j}^{i} \left( 0 \right) = \Pr \left( {V_{i} = 0} \right) \), \( m_{j}^{i} \left( 1 \right) = \Pr \left\{ {V_{i} = 1} \right\} \), \( m_{j}^{i} \left( \alpha \right) = \Pr \left\{ {V_{i} = \alpha } \right\} \), and \( m_{j}^{i} \left( {\alpha^{2} } \right) = \Pr \left\{ {V_{i} = \alpha^{2} } \right\} \), each probability being conditioned to the extrinsic information received by the check node along all of its edges but the one towards \( V_{i} \).

12.7.2.1 Initialization

In the initialization step, each variable node receives a priori information from the channel and simply broadcasts it along all of its edges, towards the check nodes that are connected to it. Hereafter we denote by \( r_{i} \) the a priori information for variable node \( V_{i} \). Like the messages exchanged between variable nodes and check nodes , \( r_{i} \) is a pmf for the non-binary symbol \( c_{i} \in {\text{GF}}\left( {2^{p} } \right) \) associated with \( V_{i} \). The way the a priori information \( r_{i} \) is computed depends on the channel.

For example, let us consider transmission of a NB-LDPC code constructed on \( {\text{GF}}\left( {2^{p} } \right) \) over the Bi-AWGN channel. Let \( \varvec{c} = \left( {c_{0} ,c_{1} , \ldots ,c_{n - 1} } \right) \) be the NB-LDPC codeword, \( c_{i} \in {\text{GF}}\left( {2^{p} } \right) \) for \( i \in \left\{ {0,1, \ldots ,n - 1} \right\} \). In this case the generic non-binary codeword symbol \( c_{i} \) is first converted to its binary representation \( \varvec{c}_{i} = \left( {c_{i,0} ,c_{i,1} , \ldots ,c_{i,p - 1} } \right) \), \( c_{i,j} \in {\text{GF}}\left( 2 \right) \) for \( j \in \left\{ {0,1, \ldots ,p - 1} \right\} \). Then, the binary representation is mapped onto a word of p antipodal symbols \( \varvec{x}_{i} = 1 - 2\varvec{c}_{i} \) (meaning \( E_{s} \) normalized to 1), yielding a sequence \( \varvec{x} = \left( {\varvec{x}_{0} ,\varvec{x}_{1} , \ldots ,\varvec{x}_{n - 1} } \right) \) of \( np \) channel symbols that are transmitted sequentially over the channel. Letting \( \varvec{y} = \left( {\varvec{y}_{0} ,\varvec{y}_{1} , \ldots ,\varvec{y}_{n - 1} } \right) \) be the corresponding Bi-AWGN channel output, it is easy to verify that, for all \( i \in \left\{ {0,1, \ldots ,n - 1} \right\} \) and for each \( \beta \in {\text{GF}}\left( {2^{p} } \right) \), we have

$$ \Pr \left( {c_{i} = \beta |\varvec{y}_{i} } \right) \propto \left( {2\pi \sigma^{2} } \right)^{{ - \frac{p}{2}}} \exp \left( { - \frac{{\left\| {\varvec{x}_{i} \left( \beta \right)} \right\|^{2} + \left\| {\varvec{y}_{i} } \right\|^{2} }}{{2\sigma^{2} }}} \right)\exp \left( {\frac{{\left\langle {\varvec{x}_{i} \left( \beta \right),\varvec{y}_{i} } \right\rangle }}{{\sigma^{2} }}} \right) $$
(12.33)

where \( \varvec{x}_{i} \left( \beta \right) \) is the antipodal version of the binary representation of \( \beta \), where \( \sigma^{2} \) is the variance of each noise sample, and where \(\langle \varvec{x}_{i} \left( \beta \right),\varvec{y}_{i}\rangle \) is the inner product between \( \varvec{x}_{i} \left( \beta \right) \) and \( \varvec{y}_{i} \). In this example, the a priori information for \( V_{i} \) (coinciding with the message \( V_{i} \) sends to all of its neighboring check nodes during the initialization step) is therefore \( r_{i} = \left( {\Pr \left\{ {c_{i} = 0} \right\},\Pr \left\{ {c_{i} = 1} \right\},\Pr \left\{ {c_{i} = \alpha } \right\}, \ldots ,\Pr \left\{ {c_{i} = \alpha^{q - 2} } \right\}} \right) \) where each element of the pmf \( r_{i} \) is computed according to (12.33). Over an SLC or MLC channel model , the a priori information shall be appropriately computed, usually based again on the binary representation of each non-binary codeword symbol.
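A minimal sketch of this initialization for the Bi-AWGN case follows. Since (12.33) is proportional to the Gaussian likelihood \( \exp ( - \left\| {\varvec{y}_{i} - \varvec{x}_{i} \left( \beta \right)} \right\|^{2} /(2\sigma^{2} )) \), the code evaluates the latter and normalizes; the bit ordering used for the binary representation of each symbol is a convention of the sketch.

```python
import numpy as np

def nb_apriori_pmf(y_block, sigma2, p=2):
    """A priori pmf r_i for a GF(2^p) symbol sent over the Bi-AWGN channel with
    antipodal mapping x = 1 - 2c, following (12.33); y_block holds the p channel
    outputs associated with the symbol, sigma2 the (known) noise variance."""
    q = 2 ** p
    y = np.asarray(y_block, dtype=float)
    pmf = np.empty(q)
    for beta in range(q):
        bits = [(beta >> j) & 1 for j in range(p)]    # binary representation of beta
        x = 1.0 - 2.0 * np.array(bits, dtype=float)   # antipodal version x_i(beta)
        pmf[beta] = np.exp(-np.sum((y - x) ** 2) / (2.0 * sigma2))
    return pmf / pmf.sum()                            # normalize to a pmf

# GF(4) example: two channel outputs carrying one non-binary symbol.
print(nb_apriori_pmf([0.9, -1.1], sigma2=0.5))
```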

All subsequent steps of the BP decoder for NB-LDPC codes, described next, remain the same regardless of the specific channel model and therefore irrespective of how a priori information is computed.

12.7.2.2 Message Permutation

As previously described, each edge in the Tanner graph of a NB-LDPC code is labeled by the corresponding non-zero element of the parity-check matrix . Considering check node \( C_{j} \) and letting \( c_{i} \in {\text{GF}}\left( {2^{p} } \right) \) be the \( i{\text{th}} \) codeword symbol, the check node imposes the constraint

$$ \sum\limits_{k \in N\left( j \right)} {h_{j,k} c_{k} } = 0. $$
(12.34)

where \( h_{j,k} \in {\text{GF}}\left( {2^{p} } \right)\backslash \left\{ 0 \right\} \) is the element of H in position \( \left( {j,k} \right) \). This means that the value of each variable node \( V_{i} \), connected to \( C_{j} \), is first multiplied by the corresponding edge label and then is checked by the check node through (12.34). In terms of BP decoding, where message \( r_{i}^{j} \) is a pmf for symbol \( c_{i} \in {\text{GF}}\left( {2^{p} } \right) \), multiplication of \( c_{i} \) by the non-zero edge label simply entails a permutation of the elements of \( r_{i}^{j} \). To make a distinction between the message sent by \( V_{i} \) and the message received by \( C_{j} \) (after the permutation), hereafter we denote the former by \( r_{i}^{j} \) and the latter by \( \Pi (r_{i}^{j} ) \). An example is provided next.

Example 12.6

Let the Galois field order be \( q = 4 \). Let the edge connecting variable node \( V_{i} \) and check node \( C_{j} \) be labeled by \( \alpha \in {\text{GF}}\left( 4 \right) \), and the message sent by \( V_{i} \) be \( r_{i}^{j} = \left( {0.4,0.3,0.2,0.1} \right) \). Since in \( {\text{GF}}\left( 4 \right) \) we have \( 0 \cdot \alpha = 0 \), \( 1 \cdot \alpha = \alpha \), \( \alpha \cdot \alpha = \alpha^{2} \), and \( \alpha^{2} \cdot \alpha = 1 \), the effect of the edge label \( \alpha \) on the message is a permutation of its elements, leading to the message \( \Pi \left( {r_{i}^{j} } \right) = \left( {0.4,0.1,0.3,0.2} \right) \) received by \( C_{j} \). Each non-zero edge label induces a specific permutation.
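The same permutation can be reproduced with a few lines of Python; the index-to-element mapping and the small log/antilog tables for \( {\text{GF}}\left( 4 \right) \) are conventions of this sketch.

```python
# GF(4) elements indexed as 0 -> 0, 1 -> 1, 2 -> alpha, 3 -> alpha^2.
EXP = {1: 0, 2: 1, 3: 2}          # nonzero element index -> exponent of alpha
LOG = {0: 1, 1: 2, 2: 3}          # exponent of alpha -> element index

def gf4_mul(a, b):
    """Multiply two GF(4) elements given by their indexes."""
    if a == 0 or b == 0:
        return 0
    return LOG[(EXP[a] + EXP[b]) % 3]

def permute_message(r, label):
    """Apply the edge-label permutation: the probability of c = beta becomes
    the probability of z = label * beta (see Example 12.6)."""
    out = [0.0] * len(r)
    for beta, prob in enumerate(r):
        out[gf4_mul(label, beta)] = prob
    return out

r = [0.4, 0.3, 0.2, 0.1]          # (Pr{0}, Pr{1}, Pr{alpha}, Pr{alpha^2})
print(permute_message(r, 2))      # label alpha -> [0.4, 0.1, 0.3, 0.2]
```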

12.7.2.3 Horizontal Step

Check node \( C_{j} \), \( j \in \left\{ {0,1, \ldots ,m - 1} \right\} \), receives one message \( \Pi (r_{i}^{j} ) \), \( i \in N\left( j \right) \), from each of its neighboring variable nodes and sends back one message \( m_{j}^{i} \), \( i \in N\left( j \right) \), to each of them. To understand how extrinsic information shall be generated at the check node and forwarded to the relevant variable node, we can look at (12.34) that we recast in the form \( \sum\nolimits_{k \in N\left( j \right)} {z_{k} } = 0 \) by defining \( z_{k} = h_{j,k} c_{k} \). For some \( i \in N\left( j \right) \), the constraint imposed by the check node is \( z_{i} = - \sum\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {z_{k} } = \sum\nolimits_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {z_{k} } \) where the “\( - \)” sign can be omitted owing to the fact that \( q = 2^{p} \). We may regard each summand \( z_{k} \) as a random variable taking values in \( {\text{GF}}\left( q \right) \) and with pmf equal to that of the incoming message \( \Pi (r_{k}^{j} ) \). Under the assumption that all \( z_{k} \) are independent, the pmf of their sum (hence the pmf of \( z_{i} \)) is the convolution of their pmfs. That is, we may write

$$ m_{j}^{i} = { \circledast }_{{k \in N\left( j \right)\backslash \left\{ i \right\}}}\Pi \left( {r_{k}^{j} } \right) $$
(12.35)

where \( { \circledast } \) denotes convolution between pmfs.

Since the complexity of convolution scales quadratically with the vector size, a naïve implementation of the horizontal step based on (12.35) leads to a complexity scaling as \( {\mathcal{O}}\left( {q^{2} } \right) \); this complexity dominates the overall decoding complexity and becomes problematic even for moderate q. A reduced-complexity but equivalent implementation of the horizontal step is based on applying the fast Hadamard transform to both sides of (12.35). The Hadamard transform turns vector convolution into element-wise multiplication of the transformed vectors; therefore, letting \( {\mathcal{H}} \) denote the Hadamard transform and recalling that the Hadamard transform coincides with its inverse (up to a scaling factor), \( m_{j}^{i} \) in (12.35) may equivalently be calculated as

$$ m_{j}^{i} = {\mathcal{H}}\left( {{ \otimes }_{{k \in N\left( j \right)\backslash \left\{ i \right\}}} {\mathcal{H}}\left( {\Pi \left( {r_{k}^{j} } \right)} \right)} \right) $$
(12.36)

where \( { \otimes } \) denotes element-wise product between two vectors. Using fast Hadamard transform reduces the horizontal step complexity (and more in general the complexity of the whole decoder ) to \( {\mathcal{O}}\left( {q\,\log q} \right) \).
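A compact sketch of this transform-domain update is given below. It assumes that each pmf is indexed by the binary representation of the corresponding field element, so that addition in \( {\text{GF}}\left( {2^{p} } \right) \) corresponds to an XOR of the indexes; the unnormalized transform is used, hence the final division by q.

```python
import numpy as np

def fwht(v):
    """Fast Walsh-Hadamard transform of a length-2^p vector (unnormalized);
    applying it twice returns the input scaled by 2^p."""
    v = np.array(v, dtype=float)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def check_node_pmf(pmfs):
    """Check node update (12.36): XOR-convolution of the (already permuted)
    incoming pmfs, computed in the Hadamard domain."""
    q = len(pmfs[0])
    spectrum = np.ones(q)
    for pmf in pmfs:
        spectrum *= fwht(pmf)
    return fwht(spectrum) / q      # inverse transform, including the 1/q factor

# GF(4) example: convolving any pmf with the uniform pmf yields the uniform pmf.
print(check_node_pmf([[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]))
```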

12.7.2.4 Message De-permutation

In the previously described message permutation step, the elements of message \( r_{i}^{j} \), sent by variable node \( V_{i} \) to check node \( C_{j} \), are permuted according to the permutation established by the edge label \( h_{j,i} \). The message \( m_{j}^{i} \), sent by \( C_{j} \) towards \( V_{i} \), must undergo the inverse permutation (equivalently, the permutation established by the inverse label \( h_{j,i}^{ - 1} \)) before reaching \( V_{i} \). For the sake of clarity, in order to distinguish the message sent by \( C_{j} \) from the message received by \( V_{i} \) after de-permutation, we keep denoting by \( m_{j}^{i} \) the former and by \( \Pi ^{ - 1} \left( {m_{j}^{i} } \right) \) the latter.

12.7.2.5 Vertical Step

Variable node \( V_{i} \), \( i \in \left\{ {0,1, \ldots ,n - 1} \right\} \), receives the \( d_{i} \) messages \( \Pi ^{ - 1} \left( {m_{j}^{i} } \right) \), \( j \in N\left( i \right) \), and generates \( d_{i} \) messages \( r_{i}^{j} \), \( j \in N\left( i \right) \), each of which is sent along a specific edge to the corresponding check node. Each message \( r_{i}^{j} \) is computed based on the a priori information \( r_{i} \) available from the channel and on the extrinsic information \( \Pi ^{ - 1} \left( {m_{k}^{i} } \right) \), \( k \in N\left( i \right)\backslash \left\{ j \right\} \). Specifically, assuming independence between all of the incoming messages (including a priori information), \( r_{i}^{j} \) is computed as

$$ r_{i}^{j} = \gamma_{j} \cdot r_{i} { \otimes }\left( { \otimes_{{k \in N\left( i \right)\backslash \left\{ j \right\}}}\Pi ^{ - 1} \left( {m_{k}^{i} } \right)} \right) $$
(12.37)

where again \( { \otimes } \) denotes element-wise product between two vectors and where the scalar \( \gamma_{j} \) is a scaling factor whose value makes the sum of the elements of \( r_{i}^{j} \) equal to 1.

12.7.2.6 Hard Decision and Stopping Criterion

At the end of each BP decoding iteration, a hard decision is made about the value taken by each variable node; this hard decision exploits a posteriori information for the variable node. In probability-domain BP decoding of NB-LDPC codes, a posteriori information is represented by the pmf of the Galois field symbol \( c_{i} \) associated with the variable node given all incoming messages and a priori information. Under independence assumption this is given by

$$ r_{i}^{\text{APP}} = \gamma \cdot r_{i} { \otimes }\left[ { \otimes_{j \in N\left( i \right)} \left( {\Pi ^{ - 1} \left( {m_{j}^{i} } \right)} \right)} \right] $$
(12.38)

where again \( \gamma \) is a normalization factor. Let \( {\varvec{\Pi}}^{ - 1} (\varvec{m}^{i} ) \) be the ordered list of messages received by variable node \( V_{i} \). Once the a posteriori pmf \( r_{i}^{\text{APP}} \) for symbol \( c_{i} \) has been computed, a symbol-wise MAP decision is made, namely,

$$ \hat{c}_{i} = {\text{argmax}}_{{c \in {\text{GF}}\left( q \right)}} \Pr \left\{ {c|r_{i} ,{\varvec{\Pi}}^{ - 1} (\varvec{m}^{i} )} \right\}. $$
(12.39)

In other words, \( \hat{c}_{i} \) is the element of \( {\text{GF}}\left( q \right) \) that corresponds to the largest element of the pmf \( r_{i}^{\text{APP}} \). Note that, as the scaling factor \( \gamma \) is the same for all elements of \( r_{i}^{\text{APP}} \), it does not affect the final decision and therefore it can be set to 1 for all \( i \in \left\{ {0,1, \ldots ,n - 1} \right\} \).

Similarly to the binary case, if the current hard-decision sequence \( \hat{\varvec{c}} = \left( {\hat{c}_{0} ,\hat{c}_{1} , \ldots ,\hat{c}_{n - 1} } \right) \) fulfills \( \hat{\varvec{c}}\varvec{H}^{T} = \varvec{0} \), then the algorithm terminates and \( \hat{\varvec{c}} \) is returned as the decoded codeword. Else, if \( \hat{\varvec{c}} \) is not a codeword and the maximum number of iterations \( I_{max} \) has been reached, the algorithm is terminated and a failure is reported. Else, a new iteration is started jumping to the message permutation step.

Belief propagation decoding of LDPC codes over \( {\text{GF}}\left( q \right) \), \( q = 2^{p} \), may be summarized as follows.

12.8 Numerical Example

In this section, we present some numerical results aimed at comparing the performance of binary LDPC and BCH codes, with the purpose of highlighting the potential of LDPC codes in Flash memory applications. We assume an SLC memory as the reference channel model. We compare the performance of a regular QC-LDPC code, under several decoding algorithms offering different tradeoffs between performance and complexity, with the performance of a narrow-sense binary BCH code with similar parameters, decoded via bounded distance decoding.

The LDPC code is characterized by a length \( n_{\text{LDPC}} = 8200 \) and a dimension \( k_{\text{LDPC}} = 7379 \) bits, and therefore by a code rate R very close to \( 9/10 \). Its minimum distance, estimated with the impulse method proposed in [38], is equal to \( d_{\text{LDPC}} = 114 \). All variable nodes of the LDPC code have degree 4, and all of its check nodes have degree 40. Its 820 × 8200 parity-check matrix is in block circulant form, where the generic block is a 205 × 205 circulant permutation matrix, and has been constructed according to a block circulant version of the progressive edge-growth (PEG) algorithm. The performance of this code has been evaluated via Monte Carlo software simulation, under BP, MS, and BF decoding algorithms. The performance curves under both BP and MS decoding have been obtained under two different settings, namely, soft-decision and hard-decision decoding . These two settings correspond to assuming the Bi-AWGN channel with unquantized output (Example 12.2) and with one-bit quantized output (Example 12.3), respectively, as the channel model. The first setting is equivalent to assuming an SLC memory with an infinite number of reads per bit, while the second one to assuming an SLC memory with one read per bit. The variable nodes are initialized according to (12.19) in the unquantized case and according to (12.18) in the quantized one. In the quantized case, the raw bit error rate of the channel can be obtained from \( E_{b} /N_{0} \) according to (12.8), where \( E_{s} /N_{0} = RE_{b} /N_{0} \). For instance, \( E_{b} /N_{0} = 5\,{\text{dB}} \) corresponds to a raw bit error rate \( p = 8.5 \cdot 10^{ - 3} \). The Shannon limits for the unquantized case and for the one-bit quantized case are also evaluated, for benchmarking purposes.
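For the quantized case, this conversion can be reproduced as sketched below, assuming that (12.8) is the usual hard-decision relation \( p = Q(\sqrt{2E_{s} /N_{0} }) \); this form is not restated here, but it is consistent with the value quoted above.

```python
import math

def raw_ber(ebn0_db, rate):
    """Raw bit error probability of the one-bit quantized Bi-AWGN channel,
    assuming p = Q(sqrt(2 * Es/N0)) with Es/N0 = R * Eb/N0 (cf. (12.8))."""
    esn0 = rate * 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(esn0))   # Q(x) = 0.5 * erfc(x / sqrt(2))

print(raw_ber(5.0, 0.9))   # about 8.5e-3, as quoted in the text
```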

The competitor BCH code has nominal parameters \( n_{\text{BCH}} = 8191 \), \( k_{\text{BCH}} = 7372 \), \( t = 63 \) (error correction capability ), and minimum distance \( d_{\text{BCH}} = 127 \). Its code rate is approximately equal to \( 9/10 \), similar to the code rate of the QC-LDPC code. The codeword error rate (CER) and the bit error rate (BER) of the BCH code under hard-decision bounded distance decoding have been evaluated analytically according to the relationships

$$ P_{e} = \sum\limits_{r = t + 1}^{{n_{\text{BCH}} }} {\left( {\begin{array}{*{20}c} {n_{\text{BCH}} } \\ r \\ \end{array} } \right)} p^{r} \left( {1 - p} \right)^{{n_{\text{BCH}} - r}} $$
(12.40)

and

$$ P_{b}\,\approx\,\frac{{d_{\text{BCH}} }}{{k_{\text{BCH}} }} \cdot P_{e} $$
(12.41)

respectively.
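These two relationships can be evaluated numerically as sketched below; the binomial terms are accumulated recursively to avoid very large intermediate quantities, which is an implementation choice of the sketch rather than part of the analysis.

```python
def bch_cer_ber(p, n=8191, k=7372, t=63, d_min=127):
    """Analytic CER (12.40) and approximate BER (12.41) of a t-error-correcting
    BCH code under bounded distance decoding over a BSC with error rate p."""
    term = (1.0 - p) ** n            # Pr{exactly 0 channel errors}
    cer = 1.0                        # will become Pr{more than t errors}
    for r in range(t + 1):
        cer -= term                  # subtract Pr{exactly r errors}
        term *= (n - r) / (r + 1) * p / (1.0 - p)
    cer = max(cer, 0.0)              # guard against round-off for very small p
    ber = d_min / k * cer
    return cer, ber

# Raw bit error rate quoted for Eb/N0 = 5 dB in the text:
print(bch_cer_ber(8.5e-3))
```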

With reference to Fig. 12.17, we see that over the hard-decision channel (SLC with one read) the BCH code exhibits nearly the same performance as the QC-LDPC code decoded via BP and that its performance is even slightly better at low error rates. This is not surprising, as BCH codes are well known to offer very good performance over hard-decision channels, especially at high code rates. As opposed to BCH codes, however, LDPC codes can handle in a very natural way soft information incoming from the communication channel, which makes it possible to attain substantial performance improvements over the error correction capabilities achievable with hard-decision decoding . In our example, when the LDPC decoder is fed with unquantized soft information, its coding gain with respect to that achieved under hard-decision decoding is improved by about 1.6 dB under both BP and MS decoding algorithms at \( {\text{CER}} = 10^{{-}4} \). Moreover, again at \( {\text{CER}} = 10^{{-}4} \), the LDPC code under unquantized BP decoding performs only 0.8 dB away from the corresponding Shannon limit , in terms of BER.

Fig. 12.17 Bit and codeword error rates for an (8191, 7372) QC-LDPC code (under different decoding algorithms) and an (8191, 7372), t = 63 narrow-sense binary BCH code under bounded distance decoding, over an SLC flash memory channel. Curves corresponding to filled and empty symbols illustrate the codeword error rates and the bit error rates of the LDPC code, respectively. The dashed and dot-dashed lines illustrate the codeword error rate and the bit error rate of the BCH code, respectively. The two straight solid lines are the Shannon limits for rate R = 9/10 under soft-decision and hard-decision decoding , respectively

For the same decoding algorithm (BP or MS), the performance curves of the LDPC code labeled as “soft” and “hard” represent the two extreme cases in which unconstrained soft information is available at the decoder , and no soft information is available. In general, when a finite number of cell reads is performed with different read voltage values, the corresponding performance curve will lie between the two extreme curves: The larger the number of cell reads, the closer the performance curve to the “soft” one. Therefore, LDPC codes can largely outperform BCH codes in Flash memory applications, provided a sufficient amount of soft information is available at the decoder . It is also pointed out that the design of appropriate QC irregular LDPC codes can favor an even larger coding gain with respect to BCH codes.

We also highlight how very simple decoding algorithms for LDPC codes, such as BF (or Gallager B) decoding, can be of interest at the beginning of the memory life, i.e. , when the raw bit error rate is very small. For example, as from Fig. 12.17, BF decoding could become of interest for values of \( E_{b} /N_{0} \) larger than 7.0 dB, corresponding to a raw bit error rate smaller than \( 1.3 \cdot 10^{{-}3} \).