1 Introduction

The information theoretic model of a communication channel for message transmission is described by the conditional probability law of the channel output given the input. For instance, the binary symmetric channel is a model for describing the communication of binary data in which noise may cause random bit-flips with a fixed probability. A reliable encoded transmission of a message generally entails multiple uses of the channel. In several applications, such as mobile wireless communication, digital fingerprinting, and storage memories, the probability law characterizing the channel can change with time. This time-varying behavior of the channel probability is described typically in terms of the evolution of the underlying channel condition, termed “state.” The availability of channel state information (CSI) at the transmitter or receiver can enhance overall communication performance (cf. [7, 16]).
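
To make the binary symmetric channel concrete, the following minimal sketch (our own illustration; the function name `bsc` and its signature are ours, and the crossover probability `p` is an arbitrary parameter) simulates independent random bit-flips:

```python
import random

def bsc(bits, p, rng=random):
    """Pass bits through a binary symmetric channel: each bit is
    flipped independently with crossover probability p."""
    return [b ^ int(rng.random() < p) for b in bits]

# Degenerate checks: p = 0 is a noiseless channel, p = 1 flips every bit.
assert bsc([0, 1, 1, 0], 0.0) == [0, 1, 1, 0]
assert bsc([0, 1, 1, 0], 1.0) == [1, 0, 0, 1]
```

For 0 < p < 1 the output is random; reliable transmission then requires coding over many channel uses, as discussed next.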

We consider a state-dependent discrete memoryless channel (DMC), in which the underlying state process is independent and identically distributed (i.i.d.) with known probability mass function (PMF), and for which the channel output at any time instant depends on the inputs and states only through their current values. We address the cases of causal and noncausal CSI at the transmitter. In the former case, the transmitter has knowledge of all the past channel states as well as the current state; this model was introduced by Shannon [8]. In the latter case, the transmitter is provided access at the outset to the entire state sequence prevailing during the transmission of a message; see Gelfand–Pinsker [5]. We restrict ourselves to the situation where the receiver has no CSI, for receiver CSI can be accommodated by considering the states, too, as channel outputs.

Two information theoretic performance measures are of interest: channel capacity and reliability function. The channel capacity characterizes the largest rate of encoded transmission for reliable communication. The reliability function describes the best exponential rate of decay of decoding error probability with transmission duration for coding rates below capacity. The capacities of the models above with causal and noncausal CSI were characterized in classic papers by Shannon [8] and Gelfand–Pinsker [5]. The reliability function is not fully characterized even for a DMC without states; however, good upper and lower bounds are known, which coincide at rates close to capacity [9, 10, 3].

Our contributions are twofold. First, we provide a strong converse for the capacity of state-dependent channels, which explains the structure of optimal codes. Second, exploiting this structure, we obtain upper bounds for the reliability functions of the causal and noncausal CSI models. Instrumental to our proofs is a new technical result which provides an upper bound on the rate of codes with codewords that are “conditionally typical over large message-dependent subsets of a typical set of state sequences.” This technical result is a nonstraightforward analog of [3, Lemma 2.1.4] for a DMC without states; the latter provides a bound on the rate of a good code with codewords of a fixed composition. A preliminary conference version of this work is in [11].

In the next section, we compile pertinent technical concepts and tools that will be used to prove our results. These standard staples can be found, for instance, in [3, 2]. The channel models are described in Sect. 22.3. Sections 22.4–22.6 contain our main results.

2 Preliminaries: Types, Typical Sets and Image Sets

Let \(\mathcal{X}\) be a finite set. For a sequence \(\mathbf{x} \in {\mathcal{X}}^{n}\), the type of \(\mathbf{x}\), denoted by \({Q}_{\mathbf{x}}\), is a pmf on \(\mathcal{X}\), where \({Q}_{\mathbf{x}}(x)\) is the relative frequency of \(x\) in \(\mathbf{x}\). Similarly, joint types are pmfs on product spaces. For example, the joint type of two given sequences \(\mathbf{x} \in {\mathcal{X}}^{n}\) and \(\mathbf{s} \in {\mathcal{S}}^{n}\) is the pmf \({Q}_{\mathbf{x},\mathbf{s}}\) on \(\mathcal{X}\times \mathcal{S}\), where \({Q}_{\mathbf{x},\mathbf{s}}(x,s)\) is the relative frequency of the tuple \((x,s)\) among the tuples \(\left ({x}_{t},{s}_{t}\right )\), \(t = 1,\ldots ,n\). Joint types of several n-length sequences are defined similarly.

The number of types of sequences in \({\mathcal{X}}^{n}\) is bounded above by \({(n + 1)}^{\vert \mathcal{X}\vert }\). Denoting by \({\mathcal{T}}_{Q}^{(n)}\) the set of all sequences in \({\mathcal{X}}^{n}\) of type Q, we note that

$$\begin{array}{rlrlrl} {(n + 1)}^{-\|\mathcal{X}\|}\exp [nH(Q)] \leq \left \|{\mathcal{T}}_{ Q}^{(n)}\right \| \leq \exp [nH(Q)]. & &\end{array}$$
(22.1)

For any pmf P on \(\mathcal{X}\) and any type Q of sequences in \({\mathcal{X}}^{n}\),

$$\begin{array}{rlrlrl} {P}^{n}(\mathbf{x}) & ={ \prod }_{t=1}^{n}P({x}_{t}) ={ \prod }_{x\in \mathcal{X}}P{(x)}^{nQ(x)} & & \\ & =\exp [-n(D(Q\|P) + H(Q))],\quad \mathbf{x} \in {\mathcal{T}}_{Q}^{(n)}, & & \end{array}$$

from which, along with (22.1), it follows that

$$\begin{array}{rlrlrl} {(n + 1)}^{-\|\mathcal{X}\|}\exp [-nD(Q\|P)] \leq {P}^{n}\left ({\mathcal{T}}_{ Q}^{(n)}\right ) \leq \exp [-nD(Q\|P)]. & & \end{array}$$
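
These type-counting estimates can be verified numerically for a small binary alphabet. The following sketch is our own check (the helper names `entropy`, `divergence`, and `check_type_bounds` are illustrative; logarithms are natural so that `exp` matches the formulas, and a tiny tolerance absorbs floating-point roundoff at the boundary types):

```python
import math

def entropy(q):
    """Shannon entropy H(q) in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)

def divergence(q, p):
    """Kullback-Leibler divergence D(q || p) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0)

def check_type_bounds(n, P, tol=1e-9):
    """Check (22.1) and the bounds on P^n(T_Q) for every binary type of length n."""
    for k in range(n + 1):
        Q = (k / n, 1 - k / n)                      # type with k occurrences of symbol 0
        size = math.comb(n, k)                      # |T_Q^{(n)}|
        prob = size * P[0] ** k * P[1] ** (n - k)   # P^n(T_Q^{(n)})
        poly = (n + 1) ** (-2)                      # (n+1)^{-|X|} with |X| = 2
        if not (poly * math.exp(n * entropy(Q)) * (1 - tol)
                <= size <= math.exp(n * entropy(Q)) * (1 + tol)):
            return False
        if not (poly * math.exp(-n * divergence(Q, P)) * (1 - tol)
                <= prob <= math.exp(-n * divergence(Q, P)) * (1 + tol)):
            return False
    return True

assert check_type_bounds(8, (0.3, 0.7))
```

Exhaustive enumeration is feasible here because the number of binary types grows only linearly in n, in line with the polynomial bound \({(n + 1)}^{\vert \mathcal{X}\vert }\).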

Next, for a pmf P on \(\mathcal{X}\) and δ > 0, a sequence \(\mathbf{x} \in {\mathcal{X}}^{n}\) is P-typical with constant δ if

$${\max }_{x\in \mathcal{X}}\left \vert {Q}_{\mathbf{x}}(x) - {P}_{}(x)\right \vert \leq \delta ,$$

and P(x) = 0 implies \({Q}_{\mathbf{x}}(x) = 0\). The set of all P-typical sequences with constant δ is called the P-typical set, denoted \({\mathcal{T}}_{[P]}^{(n)}\) (where the dependence on δ is not displayed explicitly). Thus,

$${\mathcal{T}}_{[P]}^{(n)} ={ \bigcup \nolimits }_{{ \mathrm{types}\,Q: \atop {\max }_{x\in \mathcal{X}}\left \vert Q(x) - P(x)\right \vert \leq \delta } }{\mathcal{T}}_{Q}^{(n)}.$$

In general, \(\delta = {\delta }_{n}\) is assumed to satisfy the “δ-convention” [3], namely

$$\begin{array}{rlrlrl} {\delta }_{n} \rightarrow 0,\quad \sqrt{n}\,{\delta }_{n} \rightarrow \infty \quad \mathrm{as}\ n \rightarrow \infty. & &\end{array}$$
(22.2)

The typical set has large probability. Precisely, for δ = δ n as in (22.2),

$$\begin{array}{rlrlrl} {P}^{n}\left ({\mathcal{T}}_{[P]}^{(n)}\right ) \geq 1 - \frac{\|\mathcal{X}\|} {4n{\delta }_{n}^{2}}. & &\end{array}$$
(22.3)
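
For a binary alphabet the left side of (22.3) can be computed exactly by summing binomial probabilities over the types within distance δ of P; the following sketch (our own numerical check, with arbitrarily chosen parameters) confirms the bound:

```python
import math

def typical_set_prob(n, p0, delta):
    """Exact P^n-probability of the delta-typical set for a binary pmf (p0, 1-p0):
    sum the probabilities of all type classes whose type is within delta of P."""
    return sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
               for k in range(n + 1) if abs(k / n - p0) <= delta)

n, p0, delta = 200, 0.4, 0.1
bound = 1 - 2 / (4 * n * delta ** 2)   # 1 - |X| / (4 n delta^2) with |X| = 2
assert typical_set_prob(n, p0, delta) >= bound
```

Here the Chebyshev-type bound gives 0.75, while the exact probability is much closer to 1, illustrating that (22.3) is a conservative but sufficient estimate.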

Consider sequences \(\mathbf{x} \in {\mathcal{X}}^{n}\), \(\mathbf{y} \in {\mathcal{Y}}^{n}\) of joint type \({Q}_{\mathbf{x},\mathbf{y}}\). The sequence \(\mathbf{y}\) has conditional type V given \(\mathbf{x}\) if \({Q}_{\mathbf{x},\mathbf{y}} = {Q}_{\mathbf{x}}V\) for some stochastic matrix \(V : \mathcal{X} \rightarrow \mathcal{Y}\). Given a stochastic matrix \(W : \mathcal{X} \rightarrow \mathcal{Y}\) and \(\mathbf{x} \in {\mathcal{X}}^{n}\), a sequence \(\mathbf{y} \in {\mathcal{Y}}^{n}\) of conditional type V is W-conditionally typical if for all \(x \in \mathcal{X}\):

$${\max }_{y\in \mathcal{Y}}\left \vert V (y\mid x) - W(y\mid x)\right \vert \leq \delta ,$$

and \(W(y\mid x) = 0\) implies \(V (y\mid x) = 0\). The set of all W-conditionally typical sequences conditioned on \(\mathbf{x} \in {\mathcal{X}}^{n}\) is denoted by \({\mathcal{T}}_{[W]}^{(n)}(\mathbf{x})\). In a manner similar to (22.3), it holds that

$$\begin{array}{rlrlrl} {W}^{n}\left ({\mathcal{T}}_{ [W]}^{(n)}(\mathbf{x})\mid \mathbf{x}\right ) \geq 1 - \frac{\|\mathcal{X}\|\|\mathcal{Y}\|} {4n{\delta }^{2}}. & & \end{array}$$

For a subset A of \(\mathcal{X}\), we shall require also estimates of the minimum cardinality of sets in \(\mathcal{Y}\) with significant W-conditional probability given x ∈ A. Precisely, a set \(B \subseteq \mathcal{Y}\) is an ε-image (0 < ε ≤ 1) of \(A \subseteq \mathcal{X}\) under \(W : \mathcal{X} \rightarrow \mathcal{Y}\) if \(W(B\mid x) \geq \epsilon \) for all x ∈ A. The minimum cardinality of ε-images of A is termed the image size of A (under W), and is denoted by \({g}_{W}(A,\epsilon )\). Coding theorems in information theory use estimates of the rates of image sizes of \(A \subseteq {\mathcal{X}}^{n}\) under \({W}^{n}\), i.e., \((1/n)\log {g}_{{W}^{n}}(A,\epsilon )\). In particular, for multiterminal systems, we compare the rates of image sizes of \(A \subseteq {\mathcal{X}}^{n}\) under two different channels \({W}^{n}\) and \({V }^{n}\). Precisely, given stochastic matrices \(W : \mathcal{X} \rightarrow \mathcal{Y}\) and \(V : \mathcal{X} \rightarrow \mathcal{S}\), for every 0 < ε < 1, δ > 0 and for every \(A \subseteq {\mathcal{T}}_{[{P}_{X}]}^{(n)}\), there exists an auxiliary rv U and associated pmfs \({P}_{UXY } = {P}_{U\mid X}{P}_{X}W\) and \({P}_{UXS} = {P}_{U\mid X}{P}_{X}V\) such that

$$\begin{array}{rlrlrl} \left \vert \frac{1} {n}\log {g}_{{W}^{n}}(A,\epsilon ) - H(Y \vert U) - t\right \vert & < \delta , & \\ \left \vert \frac{1} {n}\log {g}_{{V }^{n}}(A,\epsilon ) - H(S\vert U) - t\right \vert & < \delta , & & \end{array}$$
(22.4)

where \(0 \leq t \leq \min \{ I(U \wedge Y ),I(U \wedge S)\}\).
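
For tiny alphabets, image sizes can be computed directly from the definition by exhaustive search. The following sketch is our own illustration (the helper name `g_W` and the dictionary representation of the channel are ours): it finds the smallest \(B \subseteq \mathcal{Y}\) with W(B∣x) ≥ ε for every x ∈ A.

```python
from itertools import combinations

def g_W(W, A, eps):
    """Smallest cardinality of an eps-image B of A under the channel W,
    where W[x][y] = W(y|x); found by exhaustive search over subsets of Y."""
    ny = len(next(iter(W.values())))
    for size in range(1, ny + 1):
        for B in combinations(range(ny), size):
            if all(sum(W[x][y] for y in B) >= eps for x in A):
                return size
    return ny

# A noiseless channel on {0, 1, 2}: any 0.9-image of A = {0, 2} must contain
# both corresponding output letters, so the image size is 2.
W = {x: [1.0 if y == x else 0.0 for y in range(3)] for x in range(3)}
assert g_W(W, {0, 2}, 0.9) == 2
```

The brute-force search is exponential in \(\vert \mathcal{Y}\vert \), which is why the coding theorems rely on the single-letter characterization (22.4) rather than direct computation for n-length sets.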

3 Channels with States

Consider a state-dependent DMC \(W : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y}\) with finite input, state, and output alphabets \(\mathcal{X}\), \(\mathcal{S}\), and \(\mathcal{Y}\), respectively. The \(\mathcal{S}\)-valued state process \(\{{S}_{t}\}_{t=1}^{\infty }\) is i.i.d. with known pmf \({P}_{S}\). The probability law of the DMC is specified by

$$\begin{array}{rlrlrl} {W}^{n}(\mathbf{y}\mid \mathbf{x},\mathbf{s}) ={ \prod }_{t=1}^{n}W({y}_{ t}\mid {x}_{t},{s}_{t}),\quad \mathbf{x} \in {\mathcal{X}}^{n}\,,\mathbf{s} \in {\mathcal{S}}^{n},\,\mathbf{y} \in {\mathcal{Y}}^{n}. & & \end{array}$$

An (M, n)-code with encoder CSI consists of the mappings (f, ϕ) where the encoder mapping \(f = \left ({f}_{1},\ldots ,{f}_{n}\right )\) is either causal, i.e.,

$$\begin{array}{rlrlrl} {f}_{t} : \mathcal{M}\times {\mathcal{S}}^{t} \rightarrow \mathcal{X},\quad t = 1,\ldots ,n, & & \end{array}$$

or noncausal, i.e.,

$$\begin{array}{rlrlrl} {f}_{t} : \mathcal{M}\times {\mathcal{S}}^{n} \rightarrow \mathcal{X},\quad t = 1,\ldots ,n, & & \end{array}$$

with \(\mathcal{M} =\{ 1,\ldots ,M\}\) being the set of messages. The decoder ϕ is a mapping

$$\begin{array}{rlrlrl} \phi : {\mathcal{Y}}^{n} \rightarrow \mathcal{M}. & & \end{array}$$

We restrict ourselves to the situation where the receiver has no CSI. When the receiver, too, has CSI, our results apply in a standard manner by considering an associated DMC with augmented output alphabet \(\mathcal{Y}\times \mathcal{S}\).

The rate of the code is \((1/n)\log M\). The corresponding (maximum) probability of error is

$$\begin{array}{rlrlrl} e(f,\phi ) {=\max }_{m\in \mathcal{M}}{\sum }_{\mathbf{s}\in {\mathcal{S}}^{n}}{P}_{S}(\mathbf{s}){W}^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}), & &\end{array}$$
(22.5)

where \({\phi }^{-1}(m) =\{ \mathbf{y} \in {\mathcal{Y}}^{n} : \phi (\mathbf{y}) = m\}\) and \({(\cdot )}^{c}\) denotes complement.
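
To see (22.5) in action, the following sketch (our own toy construction; all names are illustrative) enumerates the maximum error probability exactly for a mod-2 additive-state channel \(Y = X \oplus S\). An encoder with noncausal CSI can precancel the state, so the resulting code is error-free:

```python
from itertools import product

def error_prob(M, n, f, phi, P_S, W):
    """Maximum error probability (22.5): f(m, s) -> codeword, phi(y) -> message,
    P_S = per-letter i.i.d. state pmf, W(y, x, s) = per-letter channel law."""
    worst = 0.0
    for m in range(M):
        err = 0.0
        for s in product((0, 1), repeat=n):
            x = f(m, s)
            for y in product((0, 1), repeat=n):
                if phi(y) != m:
                    ps = 1.0
                    for t in range(n):
                        ps *= P_S[s[t]] * W(y[t], x[t], s[t])
                    err += ps
        worst = max(worst, err)
    return worst

n, M = 2, 2
codebook = {0: (0, 0), 1: (1, 1)}
W = lambda y, x, s: 1.0 if y == x ^ s else 0.0                   # Y = X xor S
f = lambda m, s: tuple(c ^ st for c, st in zip(codebook[m], s))  # precancel the state
phi = lambda y: 0 if y == (0, 0) else 1
assert error_prob(M, n, f, phi, (0.5, 0.5), W) == 0.0
```

Without CSI the same channel garbles every input, so this toy case already illustrates why transmitter state knowledge can be so valuable.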

Definition 22.1.

Given 0 < ε < 1, a number R > 0 is ε-achievable if for every δ > 0 and for all n sufficiently large, there exist (M, n)-codes (f, ϕ) with \((1/n)\log M > R - \delta \) and e(f, ϕ) < ε. The supremum of all ε-achievable rates is denoted by C(ε). The capacity of the DMC is

$$C {=\lim }_{\epsilon \rightarrow 0}C(\epsilon ).$$

If C(ε) = C for all 0 < ε < 1, the DMC is said to satisfy a strong converse [12]. This terminology reflects the fact that for every rate R > C and every 0 < ε < 1, codes of rate R have e(f, ϕ) > ε for all n ≥ N(ε). (In contrast, a standard converse shows only that for R > C, e(f, ϕ) cannot be driven to 0 as n → ∞.)

For a given pmf \({P}_{\tilde{X}\tilde{S}}\) on \(\mathcal{X}\times \mathcal{S}\) and an rv U with values in a finite set \(\mathcal{U}\), let \(\mathcal{P}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) denote the set of all pmfs P UXSY on \(\mathcal{U}\times \mathcal{X}\times \mathcal{S}\times \mathcal{Y}\) with

$$\begin{array}{rlrlrl} X = h(U,S) & &\end{array}$$
(22.6)

for some mapping h,

$$\begin{array}{rlrlrl} &U -\circ - X,S -\circ - Y,\qquad {P}_{X,S,Y } = {P}_{\tilde{X}\tilde{S}}W. &\end{array}$$
(22.7)

For γ ≥ 0, let \({\mathcal{P}}_{\gamma }\left ({P}_{\tilde{X}\tilde{S}},W\right )\) be the subset of \(\mathcal{P}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) with I(U ∧ S) ≤ γ; note that \({\mathcal{P}}_{0}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) corresponds to the subset of \(\mathcal{P}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) with U independent of S.

The classical results on the capacity of a state-dependent channel are due to Shannon [8] for causal encoder CSI, and to Gelfand and Pinsker [5] for noncausal encoder CSI.

Theorem 22.1.

For the case with causal CSI, the capacity is

$${C}_{Sh} {=\max { }_{{P}_{X\mid S}}\max }_{{\mathcal{P}}_{0}\left ({P}_{X\mid S}{P}_{S},W\right )}I(U \wedge Y ),$$

and the strong converse holds.

Remark.

The capacity formula was derived by Shannon [8], and the strong converse was proved later by Wolfowitz [12].
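
The causal-CSI capacity formula above can be evaluated numerically: U may be taken to range over the Shannon strategies, i.e., maps \(u : \mathcal{S} \rightarrow \mathcal{X}\), and the capacity equals the ordinary capacity of the derived DMC \({W}_{Sh}(y\mid u) = {\sum }_{s}{P}_{S}(s)W(y\mid u(s),s)\). The sketch below (our own illustration; the function names and the Blahut–Arimoto stopping rule are ours) computes this. As a sanity check, a channel that ignores its state must recover the ordinary BSC capacity:

```python
import math
from itertools import product

def blahut_arimoto(Q, iters=300):
    """Approximate capacity (in nats) of a DMC with rows Q[u][y],
    via Blahut-Arimoto iterations on the input pmf p."""
    nu = len(Q)
    p = [1.0 / nu] * nu
    for _ in range(iters):
        r = [sum(p[u] * Q[u][y] for u in range(nu)) for y in range(len(Q[0]))]
        d = [sum(q * math.log(q / r[y]) for y, q in enumerate(row) if q > 0)
             for row in Q]
        w = [p[u] * math.exp(d[u]) for u in range(nu)]
        z = sum(w)
        p = [x / z for x in w]
    return sum(p[u] * d[u] for u in range(nu))   # lower-bound estimate of capacity

def shannon_strategy_capacity(W, P_S, X, S, Y):
    """Causal-CSI capacity via Shannon strategies u: S -> X."""
    rows = []
    for u in product(X, repeat=len(S)):          # u[i] = input used when state is S[i]
        rows.append([sum(P_S[i] * W(y, u[i], S[i]) for i in range(len(S)))
                     for y in Y])
    return blahut_arimoto(rows)

# Sanity check: a BSC(0.1) that ignores the state has capacity ln 2 - H(0.1) nats.
W = lambda y, x, s: 0.9 if y == x else 0.1
c = shannon_strategy_capacity(W, (0.5, 0.5), (0, 1), (0, 1), (0, 1))
h = -(0.9 * math.log(0.9) + 0.1 * math.log(0.1))
assert abs(c - (math.log(2) - h)) < 1e-3
```

The number of strategies is \({\vert \mathcal{X}\vert }^{\vert \mathcal{S}\vert }\), so this direct evaluation is practical only for small alphabets.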

Theorem 22.2 (Gelfand–Pinsker [5]). 

For the case with noncausal CSI, the capacity is

$${C}_{GP} {=\max { }_{{P}_{X\mid S}}\max }_{\mathcal{P}\left ({P}_{X\mid S}{P}_{S},W\right )}\left [I(U \wedge Y ) - I(U \wedge S)\right ].$$

One of our main results below shows that this result, too, holds with a strong converse.

Definition 22.2.

The reliability function E(R), R ≥ 0, of the DMC W is the largest number E ≥ 0 such that for every δ > 0 and for all sufficiently large n, there exist n-length block codes (f, ϕ), with causal or noncausal CSI as above, of rate greater than R − δ and \(e(f,\phi ) \leq \exp [-n(E - \delta )]\) (see, for instance, [3]).

4 A Technical Lemma

For a DMC without states, the result in [3, Corollary 6.4] provides, in effect, an image size characterization of a good codeword set; this does not involve any auxiliary rv. In the same spirit, our key technical lemma below provides an image size characterization for good codeword sets for the causal and noncausal DMC models, which now involves an auxiliary rv.

Lemma 22.1.

Let ε,τ > 0 be such that ε + τ < 1. Given a pmf \({P}_{\tilde{S}}\) on \(\mathcal{S}\) and a conditional pmf \({P}_{\tilde{X}\mid \tilde{S}}\) , let (f,ϕ) be an (M,n)-code as above. For each \(m \in \mathcal{M}\) , let A(m) be a subset of \({\mathcal{S}}^{n}\) which satisfies the following conditions:

$$\begin{array}{rcl} & A(m) \subseteq {\mathcal{T}}_{[{P}_{\tilde{S}}]}^{n}, &\end{array}$$
(22.8)
$$\begin{array}{rcl}& \|A(m)\| \geq \exp \left [n\left (H({P}_{\tilde{S}}) -\frac{\tau } {6} \right )\right ], &\end{array}$$
(22.9)
$$\begin{array}{rcl}& f(m,\mathbf{s}) \in {\mathcal{T}}_{[{P}_{\tilde{X}\mid \tilde{S}}]}^{n}(\mathbf{s}),\,\,\,\,\,\,\,\,\mathbf{s} \in A(m).&\end{array}$$
(22.10)

Furthermore, let (f,ϕ) satisfy one of the following two conditions:

$$ \begin{array}{rcl} & {W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \epsilon ,\,\,\,\,\,\,\,\,\mathbf{s} \in A(m), & \end{array}$$
(22.11a)
$$\begin{array}{rcl}& \frac{1} {\|A(m)\|}{ \sum \nolimits }_{\mathbf{s}\in A(m)}{W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \epsilon.&\end{array}$$
(22.11b)
(a)

    In the causal CSI case, for \(n \geq N(\|\mathcal{X}\|,\|\mathcal{S}\|,\|\mathcal{Y}\|,\tau ,\epsilon )\), it holds that

    $$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) + \tau ,& & \\ \end{array}$$

    for some \({P}_{UXSY } \in {\mathcal{P}}_{\tau }({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) .

(b)

    In the noncausal CSI case, for \(n \geq N(\|\mathcal{X}\|,\|\mathcal{S}\|,\|\mathcal{Y}\|,\tau ,\epsilon )\) , it holds that

    $$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) - I(U \wedge S) + \tau ,& & \\ \end{array}$$

    for some \({P}_{UXSY } \in \mathcal{P}({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) .

Furthermore, in both cases it suffices to restrict the rv U to take values in a finite set \(\mathcal{U}\) with \(\|\mathcal{U}\|\leq \|\mathcal{X}\|\|\mathcal{S}\| + 1\) .

Proof.

Our proof below is for the case when (22.11a) holds. The case when (22.11b) holds can be proved similarly with minor modifications; specifically, in the latter case, we can find subsets A′(m) of A(m), \(m \in \mathcal{M}\), that satisfy (22.8)–(22.10) and (22.11a) for some ε′, τ′ > 0 with ε′ + τ′ < 1 for all n sufficiently large.

With (22.11a) holding, set

$$\begin{array}{rcl} B(m) =\{ (f(m,\mathbf{s}),\mathbf{s}) \in {\mathcal{X}}^{n} \times {\mathcal{S}}^{n} : \mathbf{s} \in A(m)\},\quad m \in \mathcal{M}.& & \\ \end{array}$$

Let \({P}_{\tilde{Y }} = {P}_{\tilde{X}\tilde{S}}W\) be a pmf on \(\mathcal{Y}\) defined by

$$\begin{array}{rcl}{ P}_{\tilde{Y }}(y) ={ \sum \nolimits }_{s,x}{P}_{\tilde{S}\tilde{X}}(s,x)W(y\mid x,s),\quad y \in \mathcal{Y}.& & \\ \end{array}$$

Consequently,

$$\begin{array}{rcl}{ W}^{n}({\mathcal{T}}_{ [{P}_{\tilde{Y}}]}^{n}\mid f(m,\mathbf{s}),\mathbf{s}) > \epsilon + \tau ,\quad \mathbf{s} \in A(m),& &\end{array}$$
(22.12)

for all \(n \geq N(\|\mathcal{X}\|,\|\mathcal{S}\|,\|\mathcal{Y}\|,\tau ,\epsilon )\) (not depending on m and s in A(m)). Denoting

$$\begin{array}{rcl} C(m) = {\phi }^{-1}(m) \cap {\mathcal{T}}_{ [{P}_{\tilde{Y}}]}^{n},& & \\ \end{array}$$

we see from (22.11a) and (22.12) that

$$\begin{array}{rcl}{ W}^{n}(C(m)\mid f(m,\mathbf{s}),\mathbf{s}) > \tau > 0,\quad (f(m,\mathbf{s}),\mathbf{s}) \in B(m),& & \\ \end{array}$$

so that

$$\begin{array}{rcl} \|C(m)\| \geq {g}_{{W}^{n}}(B(m),\tau ),& & \\ \end{array}$$

where \({g}_{{W}^{n}}(B(m),\tau )\) denotes the smallest cardinality of a subset D of \({\mathcal{Y}}^{n}\) with

$$\begin{array}{rcl}{ W}^{n}(D\mid (f(m,\mathbf{s}),\mathbf{s})) > \tau ,\,\,\,\,\,\,\,\,(f(m,\mathbf{s}),\mathbf{s}) \in B(m).& &\end{array}$$
(22.13)

With \({m}_{0} = \arg {\min }_{1\leq m\leq M}\|C(m)\|\), we have

$$\begin{array}{rcl} M\|C({m}_{0})\| \leq {\sum \nolimits }_{m=1}^{M}\|C(m)\| =\| {\mathcal{T}}_{ [{P}_{\tilde{Y}}]}^{n}\| \leq \exp \left [n\left (H({P}_{\tilde{ Y }}) + \frac{\tau } {6}\right )\right ].& & \\ \end{array}$$

Consequently,

$$\begin{array}{rcl} \frac{1} {n}\log M \leq H({P}_{\tilde{Y }}) + \frac{\tau } {6} - \frac{1} {n}\log {g}_{{W}^{n}}(B({m}_{0}),\tau ).& &\end{array}$$
(22.14)

The remainder of the proof entails relating the “image size” of B(m 0), i.e., \({g}_{{W}^{n}}(B({m}_{0}),\tau )\), to \(\|A({m}_{0})\|\), and is completed below separately for the cases of causal and noncausal CSI.

First consider the causal CSI case. For a rv \(\hat{{S}}^{n}\) distributed uniformly over \(A({m}_{0})\), we have from (22.9) that

$$\begin{array}{rcl} \frac{1} {n}H(\hat{{S}}^{n}) = \frac{1} {n}\log \|A({m}_{0})\| \geq H({P}_{\tilde{S}}) -\frac{\tau } {6}.& &\end{array}$$
(22.15)

Since

$$\frac{1} {n}H(\hat{{S}}^{n}) = \frac{1} {n}{\sum \nolimits }_{i=1}^{n}H(\hat{{S}}_{ i}\mid \hat{{S}}^{i-1}) = H(\hat{{S}}_{ I}\mid \hat{{S}}^{I-1},I),$$

where the rv I is distributed uniformly over the set \(\{1,\ldots ,n\}\) and is independent of all other rvs, the previous identity, together with (22.15), yields

$$\begin{array}{rcl} H({P}_{\tilde{S}}) - H(\hat{{S}}_{I}\mid \hat{{S}}^{I-1},I) \leq \frac{\tau } {3}.& &\end{array}$$
(22.16)
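
The chain-rule identity above, expressing a per-letter entropy rate as a conditional entropy at a uniformly random time index, can be verified numerically for a small set. The sketch below is our own check (names are illustrative): for \(\hat{S}^n\) uniform over a set A, it compares \((1/n)H(\hat{S}^n) = (1/n)\log \|A\|\) with \((1/n)\sum _{i}H(\hat{S}_{i}\mid \hat{S}^{i-1})\).

```python
import math
from collections import Counter

def entropy(counts):
    """Entropy (nats) of the empirical pmf given by a Counter of outcomes."""
    tot = sum(counts.values())
    return -sum(c / tot * math.log(c / tot) for c in counts.values())

def cond_entropy_rate(A):
    """(1/n) * sum_i H(S_i | S^{i-1}) for S^n uniform over A, using
    H(S_i | S^{i-1}) = H(S^i) - H(S^{i-1}) on prefix marginals."""
    n = len(next(iter(A)))
    total = 0.0
    for i in range(n):
        total += (entropy(Counter(s[: i + 1] for s in A))
                  - entropy(Counter(s[:i] for s in A)))
    return total / n

A = {(0, 0, 1), (0, 1, 0), (1, 1, 1), (1, 0, 1)}
lhs = math.log(len(A)) / len(next(iter(A)))   # (1/n) H(S^n) for the uniform pmf
assert abs(lhs - cond_entropy_rate(A)) < 1e-9
```

The two sides agree exactly (up to floating-point roundoff) for any A, since the comparison is just the entropy chain rule in disguise.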

Next, denote by \(\hat{{X}}^{n}\) the rv \(f({m}_{0},\hat{{S}}^{n})\) and by \(\hat{{Y }}^{n}\) the rv which, conditioned on \(\hat{{X}}^{n}\), \(\hat{{S}}^{n}\), has (conditional) distribution \({W}^{n}\), i.e., \(\hat{{Y }}^{n}\) is the random output of the DMC W when the input is set to \(\left (\hat{{X}}^{n},\hat{{S}}^{n}\right )\). Then, using [3, Lemma 15.2], we get

$$\begin{array}{rcl} \frac{1} {n}\log {g}_{{W}^{n}}(B({m}_{0}),\tau ) \geq \frac{1} {n}H(\hat{{Y }}^{n}) -\frac{\tau } {6},& &\end{array}$$
(22.17)

for all n sufficiently large. Furthermore,

$$\begin{array}{rcl} \frac{1} {n}H(\hat{{Y }}^{n})& = \frac{1} {n}{ \sum \nolimits }_{i=1}^{n}H(\hat{{Y }}_{i}\mid \hat{{Y }}^{i-1}) & \\ & \geq H(\hat{{Y }}_{I}\mid \hat{{X}}^{I-1},\hat{{S}}^{I-1},\hat{{Y }}^{I-1},I)& \\ & = H(\hat{{Y }}_{I}\mid \hat{{X}}^{I-1},\hat{{S}}^{I-1},I) & \\ & = H(\hat{{Y }}_{I}\mid \hat{{S}}^{I-1},I), & \\ \end{array}$$

where the last-but-one equality follows from the DMC assumption, and the last equality holds since \(\hat{{X}}^{I-1} = f({m}_{0},\hat{{S}}^{I-1})\). The inequality above, along with (22.17) and (22.14) gives

$$\begin{array}{rcl} \frac{1} {n}\log M \leq H({P}_{\tilde{Y }}) - H(\hat{{Y }}_{I}\mid \hat{{S}}^{I-1},I) + \frac{\tau } {3}.& &\end{array}$$
(22.18)

Denote by \(\hat{U}\) the rv \((\hat{{S}}^{I-1},I)\) and note that the following Markov property holds:

$$\hat{{Y }}_{I} -\circ -\hat{ {X}}_{I},\hat{{S}}_{I} -\circ -\hat{ U}.$$

Also, from the definition of B(m 0),

$$\begin{array}{rcl}{ P}_{\hat{{X}}_{I},\hat{{S}}_{I}}(x,s)& = \frac{1} {n}{ \sum \nolimits }_{i=1}^{n}{P}_{\hat{{X}}_{i},\hat{{S}}_{i}}\left (x,s\right ) & \\ & = \frac{1} {n}{ \sum \nolimits }_{i=1}^{n}{ \sum \nolimits }_{(\mathbf{x},\mathbf{s})\in B({m}_{0})}\frac{\mathbf{1}({x}_{i}=x,{s}_{i}=s)} {\|B({m}_{0})\|} & \\ & = \frac{1} {\|B({m}_{0})\|}{ \sum \nolimits }_{(\mathbf{x},\mathbf{s})\in B({m}_{0})}{Q}_{\mathbf{x},\mathbf{s}}(x,s), & \\ \end{array}$$

where \({Q}_{\mathbf{x},\mathbf{s}}\) is the joint type of (x, s), and the last equation follows upon interchanging the order of summation. It follows from (22.8) and (22.10) that \(\|{P}_{\hat{{X}}_{I},\hat{{S}}_{I}} - {P}_{\tilde{X}\tilde{S}}\| \leq {\delta }_{n}\) for some \({\delta }_{n} \rightarrow 0\) satisfying the δ-convention (22.2). Furthermore,

$$\begin{array}{rcl} \|{P}_{\hat{{Y }}_{I}} - {P}_{\tilde{Y }}\|& ={ \sum \nolimits }_{y}\left \vert {\sum \nolimits }_{x,s}W(y\vert x,s){P}_{\tilde{X}\tilde{S}}(x,s) -{\sum \nolimits }_{x,s}W(y\vert x,s){P}_{\hat{{X}}_{I}\hat{{S}}_{I}}(x,s)\right \vert & \\ & \leq {\sum \nolimits }_{x,s}{ \sum \nolimits }_{y}W(y\vert x,s)\left \vert {P}_{\tilde{X}\tilde{S}}(x,s) - {P}_{\hat{{X}}_{I}\hat{{S}}_{I}}(x,s)\right \vert & \\ & =\| {P}_{\tilde{X}\tilde{S}} - {P}_{\hat{{X}}_{I}\hat{{S}}_{I}}\| \leq {\delta }_{n}. & \\ \end{array}$$

Let the rvs \(\tilde{X},\tilde{S},\tilde{Y }\) have a joint distribution \({P}_{\tilde{X}\tilde{S}\tilde{Y }}\). Define a rv U which takes values in the same set as \(\hat{U}\), has \({P}_{\hat{U}\mid \hat{{X}}_{I}\hat{{S}}_{I}}\) as its conditional distribution given \(\tilde{X},\tilde{S}\), and satisfies the Markov relation

$$Y -\circ - X,S -\circ - U.$$

Then using the continuity of the entropy function and the arguments above, (22.18) yields

$$\frac{1} {n}\log M \leq I(U \wedge Y ) + \tau ,$$

and (22.16) yields

$$I(U \wedge S) \leq \tau ,$$

for all n sufficiently large, where \({P}_{UXSY } \in {\mathcal{P}}_{\tau }({P}_{\tilde{X}\tilde{S}},W)\).

Turning to the case with noncausal CSI, define a stochastic matrix \(V : \mathcal{X}\,\times \,\mathcal{S}\,\rightarrow \,\mathcal{S}\) with

$$\begin{array}{rcl} V (s{^\prime}\mid x,s) = \mathbf{1}(s{^\prime} = s),& & \\ \end{array}$$

and let g V n be defined in a manner analogous to g Wn above with \({\mathcal{S}}^{n}\) in the role of \({\mathcal{Y}}^{n}\) in (22.13). For any \(m \in \mathcal{M}\) and subset E of \({\mathcal{S}}^{n}\), observe that

$$\begin{array}{rcl}{ V }^{n}(E\mid f(m,\mathbf{s}),\mathbf{s}) = \mathbf{1}(\mathbf{s} \in E),\,\,\,\,\,\,\,\,\mathbf{s} \in {\mathcal{S}}^{n}.& & \\ \end{array}$$

In particular, if E satisfies

$$\begin{array}{rcl}{ V }^{n}(E\mid f(m,\mathbf{s}),\mathbf{s}) > \tau ,\,\,\,\,\,\,\,\,\mathbf{s} \in A(m),& &\end{array}$$
(22.19)

it must be that A(m) ⊆ E, and since E = A(m) satisfies (22.19), we get that

$$\begin{array}{rcl} \|A(m)\| = {g}_{{V }^{n}}(B(m),\tau )& &\end{array}$$
(22.20)

using the definition of B(m). Using the image size characterization in (22.4) [3, Theorem 15.11], there exists an auxiliary rv U and associated pmf \({P}_{UXSY } = {P}_{U\mid XS}{P}_{\tilde{X}\tilde{S}}W\) such that

$$\begin{array}{rcl} \left \vert \frac{1} {n}\log {g}_{{W}^{n}}(B({m}_{0}),\tau ) - H(Y \vert U) - t\right \vert & < \frac{\tau } {6} ,& \\ \left \vert \frac{1} {n}\log {g}_{{V }^{n}}(B({m}_{0}),\tau ) - H(S\vert U) - t\right \vert & < \frac{\tau } {6} ,&\end{array}$$
(22.21)

where \(0 \leq t \leq \min \{ I(U \wedge Y ),I(U \wedge S)\}\). Then, using (22.14), (22.20), and (22.21) we get

$$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) + H(S\mid U) - \frac{1} {n}\log \|A({m}_{0})\| + \frac{\tau } {2},& & \\ \end{array}$$

which, by (22.9), yields

$$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) - I(U \wedge S) + \tau.& & \\ \end{array}$$

In (22.21), \({P}_{UXSY } \in \mathcal{P}({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) except that it need not satisfy (22.6). Finally, the asserted restriction to pmfs in \(\mathcal{P}({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) satisfying (22.6) follows from the convexity of I(U ∧ Y ) − I(U ∧ S) in \({P}_{X\mid US}\) for a fixed \({P}_{US}\) (as observed in [5]).

Lastly, it follows from the support lemma [3, Lemma 15.4] that it suffices to consider those rvs U for which \(\|\mathcal{U}\|\leq \|\mathcal{X}\|\|\mathcal{S}\| + 1\).

5 The Strong Converse

Theorem 22.3 (Strong converse). 

Given 0 < ε < 1 and a sequence of \(({M}_{n},n)\)-codes \(({f}_{n},{\phi }_{n})\) with \(e({f}_{n},{\phi }_{n}) < \epsilon \) , it holds that

$$\begin{array}{rcl} \lim {\sup }_{n}\, \frac{1} {n}\log {M}_{n} \leq C,& & \\ \end{array}$$

where \(C = {C}_{Sh}\) and \(C = {C}_{GP}\) for the cases of causal and noncausal CSI, respectively.

Proof.

Given 0 < ε < 1 and an (M, n)-code (f, ϕ) with e(f, ϕ) ≤ ε, the proof involves the identification of sets A(m), \(m \in \mathcal{M}\), satisfying (22.8)–(22.10) and (22.11a). The assertion then follows from Lemma 22.1. Note that e(f, ϕ) ≤ ε implies

$$\begin{array}{rcl} {\sum \nolimits }_{\mathbf{s}\in {\mathcal{S}}^{n}}{P}_{S}\left (\mathbf{s}\right ){W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \epsilon & & \\ \end{array}$$

for all \(m \in \mathcal{M}\). Since \({P}_{S}\left ({\mathcal{T}}_{[{P}_{S}]}^{n}\right ) \rightarrow 1\) as \(n \rightarrow \infty \), we get that for every \(m \in \mathcal{M}\),

$$\begin{array}{rcl}{ P}_{S}\left (\{\mathbf{s} \in {\mathcal{T}}_{[{P}_{S}]}^{n} : {W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) > \frac{1 - \epsilon } {2} \}\right ) \geq \frac{1 - \epsilon } {3} & &\end{array}$$
(22.22)

for all \(n \geq N(\|\mathcal{S}\|,\epsilon )\). Denoting the set \(\{ \cdot \}\) in (22.22) by \(\hat{A}(m)\), clearly for every \(m \in \mathcal{M}\),

$$\begin{array}{rcl}{ W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq \frac{1 - \epsilon } {2} ,\,\,\,\,\,\,\,\,\mathbf{s} \in \hat{ A}(m),& & \\ \end{array}$$

and

$$\begin{array}{rcl}{ P}_{S}\left (\hat{A}(m)\right ) \geq \frac{1 - \epsilon } {3} & & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\epsilon )\), whereby for an arbitrary δ > 0, we get

$$\begin{array}{rcl} \|\hat{A}(m)\| \geq \exp [n(H({P}_{S}) - \delta )]& & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\delta )\). Partitioning \(\hat{A}(m)\), \(m \in \mathcal{M}\), into sets according to the (polynomially many) conditional types of f(m, s) given s in \(\hat{A}(m)\), we obtain a subset A(m) of \(\hat{A}(m)\) for which

$$\begin{array}{rcl} f(m,\mathbf{s})& \in {\mathcal{T}}_{m}^{n}(\mathbf{s}),\,\,\,\,\,\,\,\,\mathbf{s} \in A(m),& \\ \|A(m)\|& \geq \exp [n(H({P}_{S}) - 2\delta )], & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\delta )\), where \({\mathcal{T}}_{m}^{n}(\mathbf{s})\) denotes the set of sequences in \({\mathcal{X}}^{n}\) of a fixed conditional type given s (the type depending only on m).

Once again, the polynomial size of such conditional types yields a subset \(\mathcal{M}{^\prime}\) of \(\mathcal{M}\) such that f(m, s) has a fixed conditional type (not depending on m) given s in A(m), and with

$$\begin{array}{rcl} \frac{1} {n}\log \|\mathcal{M}{^\prime}\| \geq \frac{1} {n}\log M - \delta & & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\delta )\). Finally, the strong converse follows by applying Lemma 22.1 to the subcode corresponding to \(\mathcal{M}{^\prime}\) and noting that δ > 0 is arbitrary.

6 Outer Bound on Reliability Function

An upper bound for the reliability function E(R), 0 < R < C, of a DMC without states is derived in [3] using a strong converse for codes with codewords of a fixed type. The key technical Lemma 22.1 gives an upper bound on the rate of codes with codewords that are conditionally typical over large message-dependent subsets of the typical set of state sequences and serves, in effect, as an analog of [3, Corollary 6.4] for state-dependent channels to derive an upper bound on the reliability function.

Theorem 22.4 (Sphere packing bound). 

Given δ > 0, for 0 < R < C, it holds that

$$\begin{array}{rcl} E(R) \leq {E}_{SP}(1 + \delta ) + \delta ,& & \\ \end{array}$$

where

$$\begin{array}{rcl}{ E}_{SP} {=\min { }_{{P}_{\tilde{S}}}\max {}_{{P}_{\tilde{X}\mid \tilde{S}}}\min }_{V \in \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})}\big{[}D({P}_{\tilde{S}}\|{P}_{S}) + D(V \|W\mid {P}_{\tilde{X}\tilde{S}})\big{]}& &\end{array}$$
(22.23)

with

$$\begin{array}{rcl} \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}}) = {\mathcal{V}}_{Sh}(R,{P}_{\tilde{X}\tilde{S}}) = \{V : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y} {:\max }_{{P}_{UXSY }\in {\mathcal{P}}_{0}({P}_{\tilde{X}\tilde{S}},V )}I(U \wedge Y ) < R\},& &\end{array}$$
(22.24)

and

$$\begin{array}{rcl} \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})& = {\mathcal{V}}_{GP}(R,{P}_{\tilde{X}\tilde{S}}) & \\ & = \{V : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y} {:\max }_{{P}_{UXSY }\in \mathcal{P}({P}_{\tilde{X}\tilde{S}},V )}\left [I(U \wedge Y ) - I(U \wedge S)\right ] < R\},&\end{array}$$
(22.25)

for the causal and noncausal CSI cases, respectively.

Remark.

In (22.23), the terms \(D({P}_{\tilde{S}}\|{P}_{S})\) and \(D(V \|W\mid {P}_{\tilde{S}}{P}_{\tilde{X}\mid \tilde{S}})\) account, respectively, for the shortcomings of a given code against a corresponding “bad” state pmf and a “bad” channel.

Proof.

Consider sequences of type \({P}_{\tilde{S}}\) in \({\mathcal{S}}^{n}\). Picking \(\hat{A}(m) = {\mathcal{T}}_{{P}_{\tilde{S}}}^{n}\), \(m \in \mathcal{M}\), in the proof of Theorem 22.3, and following the arguments therein to extract the subset A(m) of \(\hat{A}(m)\), we have for a given δ > 0 that for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\delta )\), there exists a subset \(\mathcal{M}{^\prime}\) of \(\mathcal{M}\) and a fixed conditional type, say \({P}_{\tilde{X}\mid \tilde{S}}\) (not depending on m), such that for every \(m \in \mathcal{M}{^\prime}\),

$$\begin{array}{rcl} A(m)& \subseteq \hat{ A}(m) = {\mathcal{T}}_{{P}_{\tilde{S}}}^{n}, & \\ \|A(m)\|& \geq \exp [n(H({P}_{\tilde{S}}) - \delta )], & \\ f(m,\mathbf{s})& \in {\mathcal{T}}_{{P}_{\tilde{X}\mid \tilde{S}}}^{n}(\mathbf{s}),\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\mathbf{s} \in A(m),& \\ \frac{1} {n}\log \|\mathcal{M}{^\prime}\|& \geq R - \delta. & \\ \end{array}$$

Then for every \(V \in \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})\), we obtain using Lemma 22.1 (in its version with condition (22.11b)), that for every δ′ > 0, there exists \(m \in \mathcal{M}{^\prime}\) (possibly depending on δ′ and V ) with

$$\begin{array}{rcl} \frac{1} {\|A(m)\|}{\sum \nolimits }_{\mathbf{s}\in A(m)}{V }^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \delta {^\prime}& & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\|\mathcal{Y}\|,\delta {^\prime})\). Since the average V n-(conditional) probability of \({\left ({\phi }^{-1}(m)\right )}^{c}\) is large, its W n-(conditional) probability cannot be too small. To that end, for this m, apply [3, Theorem 10.3, (10.21)] with the choices

$$\begin{array}{rcl} Z& = {\mathcal{Y}}^{n} \times A(m), & \\ S& = {({\phi }^{-1}(m))}^{c} \times A(m),& \\ {Q}_{1}(\mathbf{y},\mathbf{s})& = \frac{{V }^{n}(\mathbf{y}\mid f(m,\mathbf{s}),\mathbf{s})} {\|A(m)\|} , & \\ {Q}_{2}(\mathbf{y},\mathbf{s})& = \frac{{W}^{n}(\mathbf{y}\mid f(m,\mathbf{s}),\mathbf{s})} {\|A(m)\|} , & \\ \end{array}$$

for (y, s) ∈ Z, to obtain

$$\begin{array}{rcl} \frac{1} {\|A(m)\|}{\sum \nolimits }_{\mathbf{s}\in A(m)}{W}^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}) \geq \exp \left (-\frac{nD(V \|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}) + 1} {1 - \delta {^\prime}} \right ).& & \\ \end{array}$$

Finally,

$$\begin{array}{rcl} e(f,\phi )& \geq {\sum \nolimits }_{\mathbf{s}\in A(m)}{P}_{S}\left (\mathbf{s}\right ){W}^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}) & \\ & \geq \exp [-n(D({P}_{\tilde{S}}\|{P}_{S}) + D(V \|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})(1 + \delta ) + \delta )]& \\ \end{array}$$

for \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\|\mathcal{Y}\|,\delta ,\delta {^\prime})\), whereby it follows for the noncausal CSI case that

$$\limsup_{n}\, -\frac{1}{n}\log e(f,\phi ) \leq \min_{{P}_{\tilde{S}}}\,\max_{{P}_{\tilde{X}\mid \tilde{S}}}\,\min_{V \in \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})}\big[D({P}_{\tilde{S}}\|{P}_{S}) + D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})(1 + \delta ) + \delta \big]$$

for every δ > 0. Similarly, for the case of causal CSI, for τ > 0, letting

$$\mathcal{V}_{\tau }(R,{P}_{\tilde{X}\tilde{S}}) = \Big\{V : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y}\;:\; \max_{{P}_{UXSY}\in {\mathcal{P}}_{\tau }({P}_{\tilde{X}\tilde{S}},V)} I(U \wedge Y ) < R\Big\},$$
(22.26)

we get

$$\limsup_{n}\, -\frac{1}{n}\log e(f,\phi ) \leq \min_{{P}_{\tilde{S}}}\,\max_{{P}_{\tilde{X}\mid \tilde{S}}}\,\min_{V \in {\mathcal{V}}_{\tau }(R,{P}_{\tilde{X}\tilde{S}})}\big[D({P}_{\tilde{S}}\|{P}_{S}) + D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})\big].$$

The continuity of the right side of (22.26), as shown in the Appendix, yields the claimed expression for \({E}_{\mathrm{SP}}\) in (22.23) and (22.24).
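To make the sphere-packing form of these bounds concrete, the following sketch brute-forces the innermost minimization for a toy stateless binary channel (hypothetical parameters, with U = X and no state), i.e., \(\min_{V:\,I(X\wedge Y)<R} D(V\|W\mid P_{X})\) on a coarse grid of channels:

```python
import math
import itertools

# Toy stateless illustration of the innermost minimization in the bounds
# above: min over channels V with I(X ^ Y) < R of D(V||W | P_X), brute-forced
# over a grid of binary channels. All quantities in nats; the channel W, the
# input PMF P_X, and the rate R are hypothetical.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_info(PX, V):
    PY = [sum(PX[x] * V[x][y] for x in range(2)) for y in range(2)]
    return sum(PX[x] * kl(V[x], PY) for x in range(2))

PX = [0.5, 0.5]
W = [[0.9, 0.1], [0.1, 0.9]]   # BSC with crossover 0.1; capacity ~0.368 nats
R = 0.2                        # rate below capacity

best = float("inf")
grid = [i / 50 for i in range(1, 50)]
for a, b in itertools.product(grid, repeat=2):
    V = [[1 - a, a], [b, 1 - b]]       # V(1|0) = a, V(1|1) = b
    if mutual_info(PX, V) < R:         # "dummy" channels too noisy for rate R
        best = min(best, sum(PX[x] * kl(V[x], W[x]) for x in range(2)))
```

Since R is below capacity, every feasible V differs from W, so `best` is strictly positive: a grid approximation of the positive sphere-packing exponent at rate R.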

7 Acknowledgement

The work of H. Tyagi and P. Narayan was supported by the U.S. National Science Foundation under Grants CCF-0830697 and CCF-1117546.

8 Appendix: Continuity of the Right Side of (22.26)

Let

$$f(R,{P}_{U\tilde{X}\tilde{S}}) = \min_{\substack{V:\, I(U\wedge Y ) < R \\ {P}_{Y\mid \tilde{X}\tilde{S}} = V}} \big[D({P}_{\tilde{S}}\|{P}_{S}) + D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})\big].$$
(22.27)
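The conditional divergence in (22.27) expands as \(D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}) = \sum_{s,x} {P}_{\tilde{S}}(s)\,{P}_{\tilde{X}\mid \tilde{S}}(x\mid s)\, D(V(\cdot \mid x,s)\|W(\cdot \mid x,s))\). A small Python sketch of this evaluation, with hypothetical binary alphabets and channels (all names and values are illustrative):

```python
import math

# Evaluate the conditional divergence D(V || W | P_{X|S} P_S) by its expansion
#   sum_{s,x} P_S(s) * P_{X|S}(x|s) * D(V(.|x,s) || W(.|x,s)),
# divergences in nats. All distributions below are hypothetical.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cond_div(V, W, PXgS, PS):
    # V, W: dict (x, s) -> output PMF; PXgS: dict s -> PMF over x; PS: PMF over s
    return sum(PS[s] * PXgS[s][x] * kl(V[(x, s)], W[(x, s)])
               for s in range(len(PS)) for x in range(len(PXgS[s])))

PS = [0.5, 0.5]
PXgS = {0: [0.8, 0.2], 1: [0.4, 0.6]}
W = {(x, s): [0.9, 0.1] if x == s else [0.2, 0.8] for x in (0, 1) for s in (0, 1)}
V = {(x, s): [0.7, 0.3] if x == s else [0.3, 0.7] for x in (0, 1) for s in (0, 1)}
d = cond_div(V, W, PXgS, PS)
assert d >= 0.0
```

The same expansion is what the minimization in (22.27) ranges over as V varies.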

Further, let

$$g({P}_{\tilde{S}},\tau ) = \max_{\substack{{P}_{U\tilde{X}\mid \tilde{S}}:\, I(U\wedge \tilde{S})\leq \tau \\ U -\circ- \tilde{X},\tilde{S} -\circ- Y}} f(R,{P}_{U\tilde{X}\tilde{S}}),$$
(22.28)

and

$$g(\tau ) = \min_{{P}_{\tilde{S}}} g({P}_{\tilde{S}},\tau ).$$
(22.29)

To show the continuity of g(τ) at τ = 0, first note that g(τ) ≥ g(0) for all τ ≥ 0, since the feasible set of the maximization in (22.28) only grows with τ. Next, let \({P}_{\tilde{S}}^{0}\) attain the minimum in (22.29) for τ = 0. Clearly,

$$\begin{array}{rlrlrl} g({P}_{\tilde{S}}^{0},\tau ) \geq g(\tau ). & &\end{array}$$
(22.30)

Also, let \({P}_{U\tilde{X}\mid \tilde{S}}^{\tau }\) attain the maximum of \(g({P}_{\tilde{S}}^{0},\tau )\) in (22.28). For the associated joint pmf \({P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }\), let \({P}_{U}^{\tau }\) denote the resulting U-marginal pmf, and consider the joint pmf \({P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }\). Then, using (22.28) and (22.29) and the observations above,

$$f(R,{P}_{U}^{\tau }{P}_{\tilde{ S}}^{0}{P}_{\tilde{ X}\mid U\tilde{S}}^{\tau }) \leq g(0) \leq g(\tau ) \leq g({P}_{\tilde{ S}}^{0},\tau ) = f(R,{P}_{\tilde{ S}}^{0}{P}_{ U\tilde{X}\mid \tilde{S}}^{\tau }).$$

The continuity of g(τ) at τ = 0 will follow upon showing that

$$f(R,{P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }) - f(R,{P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }) \rightarrow 0 \quad \mathrm{as}\; \tau \rightarrow 0.$$

The constraint on the mutual information in (22.28) gives, by Pinsker’s inequality [3, 4], that

$$\tau \geq D\left ({P}_{U\mid \tilde{S}}^{\tau }{P}_{\tilde{ S}}^{0}\|{P}_{ U}^{\tau }{P}_{\tilde{ S}}^{0}\right ) \geq 2{\left \|{P}_{ U\mid \tilde{S}}^{\tau }{P}_{\tilde{ S}}^{0} - {P}_{ U}^{\tau }{P}_{\tilde{ S}}^{0}\right \|}^{2},$$

i.e.,

$$\begin{array}{rlrlrl} \left \|{P}_{U\mid \tilde{S}}^{\tau }{P}_{\tilde{ S}}^{0} - {P}_{ U}^{\tau }{P}_{\tilde{ S}}^{0}\right \| \leq \sqrt{\frac{\tau } {2}}. & &\end{array}$$
(22.31)
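Here \(\|\cdot \|\) is read as the variational (total-variation) distance, i.e., half the \(L_{1}\) distance, with divergences in nats, so that Pinsker’s inequality takes the form \(D(P\|Q) \geq 2\|P-Q\|^{2}\) used above. A quick numeric check with hypothetical PMFs:

```python
import math

# Numeric check of Pinsker's inequality as used in (22.31), taking ||.|| to be
# the total-variation distance (half the L1 distance) and D in nats:
#   D(P||Q) >= 2 * ||P - Q||^2,  hence  ||P - Q|| <= sqrt(D(P||Q) / 2).
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

P = [0.6, 0.3, 0.1]   # hypothetical PMFs on a 3-letter alphabet
Q = [0.3, 0.4, 0.3]
tau = kl(P, Q)        # plays the role of the bound tau >= I(U ^ S~)
assert tau >= 2 * tv(P, Q) ** 2
assert tv(P, Q) <= math.sqrt(tau / 2)   # the form of (22.31)
```

In the proof, τ bounds the mutual information \(I(U\wedge \tilde{S})\), which is exactly a divergence of the above kind between the joint and product PMFs.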

For \({P}_{U\tilde{X}\tilde{S}} = {P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }\), let V 0 attain the minimum in (22.27), i.e.,

$$\begin{array}{rl} {P}_{Y\mid \tilde{X}\tilde{S}} & = {V}^{0},\qquad I(U \wedge Y ) < R,\quad \mathrm{and} \\ f(R,{P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }) & = D({P}_{\tilde{S}}\|{P}_{S}) + D({V}^{0}\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}). \end{array}$$

By (22.31) and standard continuity arguments, for \({P}_{U\tilde{X}\tilde{S}} = {P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }\) and \({P}_{Y \mid \tilde{X}\tilde{S}} = {V }^{0}\), we have

$$I(U \wedge Y ) < R + \nu ,$$

and

$$D({P}_{\tilde{S}}\|{P}_{S}) + D({V }^{0}\|W\mid {P}_{\tilde{ X}\mid \tilde{S}}{P}_{\tilde{S}}) \leq f(R,{P}_{U}^{\tau }{P}_{\tilde{ S}}^{0}{P}_{\tilde{ X}\mid U\tilde{S}}^{\tau }) + \nu ,$$

where \(\nu = \nu (\tau ) \rightarrow 0\) as τ → 0. Consequently,

$$f(R+\nu ,{P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }) \leq D({P}_{\tilde{S}}\|{P}_{S}) + D({V}^{0}\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}) \leq f(R,{P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }) + \nu.$$

Finally, invoking the continuity of \(f(R,{P}_{U\tilde{X}\tilde{S}})\) in R [3, Lemma 10.4] completes the proof.