1 Introduction

The information theoretic model of a communication channel for message transmission is described by the conditional probability law of the channel output given the input. For instance, the binary symmetric channel is a model for describing the communication of binary data in which noise may cause random bit-flips with a fixed probability. A reliable encoded transmission of a message generally entails multiple uses of the channel. In several applications, such as mobile wireless communication, digital fingerprinting, and storage memories, the probability law characterizing the channel can change with time. This time-varying behavior of the channel probability is described typically in terms of the evolution of the underlying channel condition, termed “state.” The availability of channel state information (CSI) at the transmitter or receiver can enhance overall communication performance (cf. [7, 16]).
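
To make the binary symmetric channel concrete, the following minimal sketch (our own illustration; the function name `bsc` and its signature are ours, and the crossover probability `p` is an arbitrary parameter) simulates independent random bit-flips:

```python
import random

def bsc(bits, p, rng=random):
    """Pass bits through a binary symmetric channel: each bit is
    flipped independently with crossover probability p."""
    return [b ^ int(rng.random() < p) for b in bits]

# Degenerate checks: p = 0 is a noiseless channel, p = 1 flips every bit.
assert bsc([0, 1, 1, 0], 0.0) == [0, 1, 1, 0]
assert bsc([0, 1, 1, 0], 1.0) == [1, 0, 0, 1]
```

For 0 < p < 1 the output is random; reliable transmission then requires coding over many channel uses, as discussed next.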

We consider a state-dependent discrete memoryless channel (DMC), in which the underlying state process is independent and identically distributed (i.i.d.) with known probability mass function (PMF), and for which the channel output at any time instant depends on the inputs and states only through their current values. We address the cases of causal and noncausal CSI at the transmitter. In the former case, the transmitter has knowledge of all the past channel states as well as the current state; this model was introduced by Shannon [8]. In the latter case, the transmitter is provided access at the outset to the entire state sequence prevailing during the transmission of a message; see Gelfand–Pinsker [5]. We restrict ourselves to the situation where the receiver has no CSI, for receiver CSI can be accommodated by considering the states, too, as channel outputs.

Two information theoretic performance measures are of interest: channel capacity and reliability function. The channel capacity characterizes the largest rate of encoded transmission for reliable communication. The reliability function describes the best exponential rate of decay of decoding error probability with transmission duration for coding rates below capacity. The capacities of the models above with causal and noncausal CSI were characterized in classic papers by Shannon [8] and Gelfand–Pinsker [5]. The reliability function is not fully characterized even for a DMC without states; however, good upper and lower bounds are known, which coincide at rates close to capacity [9, 10, 3].

Our contributions are twofold. First, we provide a strong converse for the capacity of state-dependent channels, which explains the structure of optimal codes. Second, exploiting this structure, we obtain upper bounds for the reliability functions of the causal and noncausal CSI models. Instrumental to our proofs is a new technical result which provides an upper bound on the rate of codes with codewords that are “conditionally typical over large message-dependent subsets of a typical set of state sequences.” This technical result is a nonstraightforward analog of [3, Lemma 2.1.4] for a DMC without states; the latter provides a bound on the rate of a good code with codewords of a fixed composition. A preliminary conference version of this work is in [11].

In the next section, we compile pertinent technical concepts and tools that will be used to prove our results. These standard staples can be found, for instance, in [3, 2]. The channel models are described in Sect. 22.3. Sections 22.4–22.6 contain our main results.

2 Preliminaries: Types, Typical Sets and Image Sets

Let \(\mathcal{X}\) be a finite set. For a sequence \(\mathbf{x} \in {\mathcal{X}}^{n}\), the type of \(\mathbf{x}\), denoted by \({Q}_{\mathbf{x}}\), is a pmf on \(\mathcal{X}\), where \({Q}_{\mathbf{x}}(x)\) is the relative frequency of \(x\) in \(\mathbf{x}\). Similarly, joint types are pmfs on product spaces. For example, the joint type of two given sequences \(\mathbf{x} \in {\mathcal{X}}^{n}\) and \(\mathbf{s} \in {\mathcal{S}}^{n}\) is the pmf \({Q}_{\mathbf{x},\mathbf{s}}\) on \(\mathcal{X}\times \mathcal{S}\), where \({Q}_{\mathbf{x},\mathbf{s}}(x,s)\) is the relative frequency of the tuple \((x,s)\) among the tuples \(\left ({x}_{t},{s}_{t}\right )\), \(t = 1,\ldots ,n\). Joint types of several n-length sequences are defined similarly.

The number of types of sequences in \({\mathcal{X}}^{n}\) is bounded above by \({(n + 1)}^{\vert \mathcal{X}\vert }\). Denoting by \({\mathcal{T}}_{Q}^{(n)}\) the set of all sequences in \({\mathcal{X}}^{n}\) of type Q, we note that

$$\begin{array}{rlrlrl} {(n + 1)}^{-\|\mathcal{X}\|}\exp [nH(Q)] \leq \left \|{\mathcal{T}}_{ Q}^{(n)}\right \| \leq \exp [nH(Q)]. & &\end{array}$$
(22.1)

For any pmf P on \(\mathcal{X}\) and any type Q of sequences in \({\mathcal{X}}^{n}\),

$$\begin{array}{rlrlrl} {P}^{n}(\mathbf{x}) & ={ \prod }_{t=1}^{n}P({x}_{t}) ={ \prod }_{x\in \mathcal{X}}P{(x)}^{nQ(x)} & & \\ & =\exp [-n(D(Q\|P) + H(Q))],\quad \mathbf{x} \in {\mathcal{T}}_{Q}^{(n)}, & & \end{array}$$

from which, along with (22.1), it follows that

$$\begin{array}{rlrlrl} {(n + 1)}^{-\|\mathcal{X}\|}\exp [-nD(Q\|P)] \leq {P}^{n}\left ({\mathcal{T}}_{ Q}^{(n)}\right ) \leq \exp [-nD(Q\|P)]. & & \end{array}$$
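
These type-counting estimates can be verified numerically for a small binary alphabet. The following sketch is our own check (the helper names `entropy`, `divergence`, and `check_type_bounds` are illustrative; logarithms are natural so that `exp` matches the formulas, and a tiny tolerance absorbs floating-point roundoff at the boundary types):

```python
import math

def entropy(q):
    """Shannon entropy H(q) in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)

def divergence(q, p):
    """Kullback-Leibler divergence D(q || p) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0)

def check_type_bounds(n, P, tol=1e-9):
    """Check (22.1) and the bounds on P^n(T_Q) for every binary type of length n."""
    for k in range(n + 1):
        Q = (k / n, 1 - k / n)                      # type with k occurrences of symbol 0
        size = math.comb(n, k)                      # |T_Q^{(n)}|
        prob = size * P[0] ** k * P[1] ** (n - k)   # P^n(T_Q^{(n)})
        poly = (n + 1) ** (-2)                      # (n+1)^{-|X|} with |X| = 2
        if not (poly * math.exp(n * entropy(Q)) * (1 - tol)
                <= size <= math.exp(n * entropy(Q)) * (1 + tol)):
            return False
        if not (poly * math.exp(-n * divergence(Q, P)) * (1 - tol)
                <= prob <= math.exp(-n * divergence(Q, P)) * (1 + tol)):
            return False
    return True

assert check_type_bounds(8, (0.3, 0.7))
```

Exhaustive enumeration is feasible here because the number of binary types grows only linearly in n, in line with the polynomial bound \({(n + 1)}^{\vert \mathcal{X}\vert }\).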

Next, for a pmf P on \(\mathcal{X}\) and δ > 0, a sequence \(\mathbf{x} \in {\mathcal{X}}^{n}\) is P-typical with constant δ if

$${\max }_{x\in \mathcal{X}}\left \vert {Q}_{\mathbf{x}}(x) - {P}_{}(x)\right \vert \leq \delta ,$$

and P(x) = 0 implies \({Q}_{\mathbf{x}}(x) = 0\). The set of all P-typical sequences with constant δ is called the P-typical set, denoted \({\mathcal{T}}_{[P]}^{(n)}\) (where the dependence on δ is not displayed explicitly). Thus,

$${\mathcal{T}}_{[P]}^{(n)} ={ \bigcup \nolimits }_{{ \mathrm{types}\,Q: \atop {\max }_{x\in \mathcal{X}}\left \vert Q(x) - P(x)\right \vert \leq \delta } }{\mathcal{T}}_{Q}^{(n)}.$$

In general, \(\delta = {\delta }_{n}\) is assumed to satisfy the “δ-convention” [3], namely

$$\begin{array}{rlrlrl} {\delta }_{n} \rightarrow 0,\quad \sqrt{n}\,{\delta }_{n} \rightarrow \infty \quad \mathrm{as}\ n \rightarrow \infty. & &\end{array}$$
(22.2)

The typical set has large probability. Precisely, for δ = δ n as in (22.2),

$$\begin{array}{rlrlrl} {P}^{n}\left ({\mathcal{T}}_{[P]}^{(n)}\right ) \geq 1 - \frac{\|\mathcal{X}\|} {4n{\delta }_{n}^{2}}. & &\end{array}$$
(22.3)
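
For a binary alphabet the left side of (22.3) can be computed exactly by summing binomial probabilities over the types within distance δ of P; the following sketch (our own numerical check, with arbitrarily chosen parameters) confirms the bound:

```python
import math

def typical_set_prob(n, p0, delta):
    """Exact P^n-probability of the delta-typical set for a binary pmf (p0, 1-p0):
    sum the probabilities of all type classes whose type is within delta of P."""
    return sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
               for k in range(n + 1) if abs(k / n - p0) <= delta)

n, p0, delta = 200, 0.4, 0.1
bound = 1 - 2 / (4 * n * delta ** 2)   # 1 - |X| / (4 n delta^2) with |X| = 2
assert typical_set_prob(n, p0, delta) >= bound
```

Here the Chebyshev-type bound gives 0.75, while the exact probability is much closer to 1, illustrating that (22.3) is a conservative but sufficient estimate.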

Consider sequences \(\mathbf{x} \in {\mathcal{X}}^{n}\), \(\mathbf{y} \in {\mathcal{Y}}^{n}\) of joint type \({Q}_{\mathbf{x},\mathbf{y}}\). The sequence \(\mathbf{y}\) has conditional type V given \(\mathbf{x}\) if \({Q}_{\mathbf{x},\mathbf{y}} = {Q}_{\mathbf{x}}V\) for some stochastic matrix \(V : \mathcal{X} \rightarrow \mathcal{Y}\). Given a stochastic matrix \(W : \mathcal{X} \rightarrow \mathcal{Y}\) and \(\mathbf{x} \in {\mathcal{X}}^{n}\), a sequence \(\mathbf{y} \in {\mathcal{Y}}^{n}\) of conditional type V is W-conditionally typical if for all \(x \in \mathcal{X}\):

$${\max }_{y\in \mathcal{Y}}\left \vert V (y\mid x) - W(y\mid x)\right \vert \leq \delta ,$$

and \(W(y\mid x) = 0\) implies \(V (y\mid x) = 0\). The set of all W-conditionally typical sequences conditioned on \(\mathbf{x} \in {\mathcal{X}}^{n}\) is denoted by \({\mathcal{T}}_{[W]}^{(n)}(\mathbf{x})\). In a manner similar to (22.3), it holds that

$$\begin{array}{rlrlrl} {W}^{n}\left ({\mathcal{T}}_{ [W]}^{(n)}(\mathbf{x})\mid \mathbf{x}\right ) \geq 1 - \frac{\|\mathcal{X}\|\|\mathcal{Y}\|} {4n{\delta }^{2}}. & & \end{array}$$

For a subset A of \(\mathcal{X}\), we shall require also estimates of the minimum cardinality of sets in \(\mathcal{Y}\) with significant W-conditional probability given x ∈ A. Precisely, a set \(B \subseteq \mathcal{Y}\) is an ε-image (0 < ε ≤ 1) of \(A \subseteq \mathcal{X}\) under \(W : \mathcal{X} \rightarrow \mathcal{Y}\) if \(W(B\mid x) \geq \epsilon \) for all x ∈ A. The minimum cardinality of ε-images of A is termed the image size of A (under W), and is denoted by \({g}_{W}(A,\epsilon )\). Coding theorems in information theory use estimates of the rates of image sizes of \(A \subseteq {\mathcal{X}}^{n}\) under \({W}^{n}\), i.e., \((1/n)\log {g}_{{W}^{n}}(A,\epsilon )\). In particular, for multiterminal systems, we compare the rates of image sizes of \(A \subseteq {\mathcal{X}}^{n}\) under two different channels \({W}^{n}\) and \({V }^{n}\). Precisely, given stochastic matrices \(W : \mathcal{X} \rightarrow \mathcal{Y}\) and \(V : \mathcal{X} \rightarrow \mathcal{S}\), for every 0 < ε < 1, δ > 0 and for every \(A \subseteq {\mathcal{T}}_{[{P}_{X}]}^{(n)}\), there exists an auxiliary rv U and associated pmfs \({P}_{UXY } = {P}_{U\mid X}{P}_{X}W\) and \({P}_{UXS} = {P}_{U\mid X}{P}_{X}V\) such that

$$\begin{array}{rlrlrl} \left \vert \frac{1} {n}\log {g}_{{W}^{n}}(A,\epsilon ) - H(Y \vert U) - t\right \vert & < \delta , & \\ \left \vert \frac{1} {n}\log {g}_{{V }^{n}}(A,\epsilon ) - H(S\vert U) - t\right \vert & < \delta , & & \end{array}$$
(22.4)

where \(0 \leq t \leq \min \{ I(U \wedge Y ),I(U \wedge S)\}\).
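
For tiny alphabets, image sizes can be computed directly from the definition by exhaustive search. The following sketch is our own illustration (the helper name `g_W` and the dictionary representation of the channel are ours): it finds the smallest \(B \subseteq \mathcal{Y}\) with W(B∣x) ≥ ε for every x ∈ A.

```python
from itertools import combinations

def g_W(W, A, eps):
    """Smallest cardinality of an eps-image B of A under the channel W,
    where W[x][y] = W(y|x); found by exhaustive search over subsets of Y."""
    ny = len(next(iter(W.values())))
    for size in range(1, ny + 1):
        for B in combinations(range(ny), size):
            if all(sum(W[x][y] for y in B) >= eps for x in A):
                return size
    return ny

# A noiseless channel on {0, 1, 2}: any 0.9-image of A = {0, 2} must contain
# both corresponding output letters, so the image size is 2.
W = {x: [1.0 if y == x else 0.0 for y in range(3)] for x in range(3)}
assert g_W(W, {0, 2}, 0.9) == 2
```

The brute-force search is exponential in \(\vert \mathcal{Y}\vert \), which is why the coding theorems rely on the single-letter characterization (22.4) rather than direct computation for n-length sets.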

3 Channels with States

Consider a state-dependent DMC \(W : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y}\) with finite input, state, and output alphabets \(\mathcal{X}\), \(\mathcal{S}\), and \(\mathcal{Y}\), respectively. The \(\mathcal{S}\)-valued state process \(\{{S}_{t}\}_{t=1}^{\infty }\) is i.i.d. with known pmf \({P}_{S}\). The probability law of the DMC is specified by

$$\begin{array}{rlrlrl} {W}^{n}(\mathbf{y}\mid \mathbf{x},\mathbf{s}) ={ \prod }_{t=1}^{n}W({y}_{ t}\mid {x}_{t},{s}_{t}),\quad \mathbf{x} \in {\mathcal{X}}^{n}\,,\mathbf{s} \in {\mathcal{S}}^{n},\,\mathbf{y} \in {\mathcal{Y}}^{n}. & & \end{array}$$

An (M, n)-code with encoder CSI consists of the mappings (f, ϕ) where the encoder mapping \(f = \left ({f}_{1},\ldots ,{f}_{n}\right )\) is either causal, i.e.,

$$\begin{array}{rlrlrl} {f}_{t} : \mathcal{M}\times {\mathcal{S}}^{t} \rightarrow \mathcal{X},\quad t = 1,\ldots ,n, & & \end{array}$$

or noncausal, i.e.,

$$\begin{array}{rlrlrl} {f}_{t} : \mathcal{M}\times {\mathcal{S}}^{n} \rightarrow \mathcal{X},\quad t = 1,\ldots ,n, & & \end{array}$$

with \(\mathcal{M} =\{ 1,\ldots ,M\}\) being the set of messages. The decoder ϕ is a mapping

$$\begin{array}{rlrlrl} \phi : {\mathcal{Y}}^{n} \rightarrow \mathcal{M}. & & \end{array}$$

We restrict ourselves to the situation where the receiver has no CSI. When the receiver, too, has CSI, our results apply in a standard manner by considering an associated DMC with augmented output alphabet \(\mathcal{Y}\times \mathcal{S}\).

The rate of the code is \((1/n)\log M\). The corresponding (maximum) probability of error is

$$\begin{array}{rlrlrl} e(f,\phi ) {=\max }_{m\in \mathcal{M}}{\sum }_{\mathbf{s}\in {\mathcal{S}}^{n}}{P}_{S}(\mathbf{s}){W}^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}), & &\end{array}$$
(22.5)

where \({\phi }^{-1}(m) =\{ \mathbf{y} \in {\mathcal{Y}}^{n} : \phi (\mathbf{y}) = m\}\) and \({(\cdot )}^{c}\) denotes complement.
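
To see (22.5) in action, the following sketch (our own toy construction; all names are illustrative) enumerates the maximum error probability exactly for a mod-2 additive-state channel \(Y = X \oplus S\). An encoder with noncausal CSI can precancel the state, so the resulting code is error-free:

```python
from itertools import product

def error_prob(M, n, f, phi, P_S, W):
    """Maximum error probability (22.5): f(m, s) -> codeword, phi(y) -> message,
    P_S = per-letter i.i.d. state pmf, W(y, x, s) = per-letter channel law."""
    worst = 0.0
    for m in range(M):
        err = 0.0
        for s in product((0, 1), repeat=n):
            x = f(m, s)
            for y in product((0, 1), repeat=n):
                if phi(y) != m:
                    ps = 1.0
                    for t in range(n):
                        ps *= P_S[s[t]] * W(y[t], x[t], s[t])
                    err += ps
        worst = max(worst, err)
    return worst

n, M = 2, 2
codebook = {0: (0, 0), 1: (1, 1)}
W = lambda y, x, s: 1.0 if y == x ^ s else 0.0                   # Y = X xor S
f = lambda m, s: tuple(c ^ st for c, st in zip(codebook[m], s))  # precancel the state
phi = lambda y: 0 if y == (0, 0) else 1
assert error_prob(M, n, f, phi, (0.5, 0.5), W) == 0.0
```

Without CSI the same channel garbles every input, so this toy case already illustrates why transmitter state knowledge can be so valuable.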

Definition 22.1.

Given 0 < ε < 1, a number R > 0 is ε-achievable if for every δ > 0 and for all n sufficiently large, there exist (M, n)-codes (f, ϕ) with \((1/n)\log M > R - \delta \) and e(f, ϕ) < ε. The supremum of all ε-achievable rates is denoted by C(ε). The capacity of the DMC is

$$C {=\lim }_{\epsilon \rightarrow 0}C(\epsilon ).$$

If C(ε) = C for all 0 < ε < 1, the DMC is said to satisfy a strong converse [12]. This terminology reflects the fact that for every rate R > C and every 0 < ε < 1, codes of rate R have e(f, ϕ) > ε for all n ≥ N(ε). (In contrast, a standard converse shows only that for R > C, e(f, ϕ) cannot be driven to 0 as n → ∞.)

For a given pmf \({P}_{\tilde{X}\tilde{S}}\) on \(\mathcal{X}\times \mathcal{S}\) and an rv U with values in a finite set \(\mathcal{U}\), let \(\mathcal{P}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) denote the set of all pmfs P UXSY on \(\mathcal{U}\times \mathcal{X}\times \mathcal{S}\times \mathcal{Y}\) with

$$\begin{array}{rlrlrl} X = h(U,S) & &\end{array}$$
(22.6)

for some mapping h,

$$\begin{array}{rlrlrl} &U -\circ - X,S -\circ - Y,\qquad {P}_{X,S,Y } = {P}_{\tilde{X}\tilde{S}}W. &\end{array}$$
(22.7)

For γ ≥ 0, let \({\mathcal{P}}_{\gamma }\left ({P}_{\tilde{X}\tilde{S}},W\right )\) be the subset of \(\mathcal{P}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) with I(U ∧ S) ≤ γ; note that \({\mathcal{P}}_{0}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) corresponds to the subset of \(\mathcal{P}\left ({P}_{\tilde{X}\tilde{S}},W\right )\) with U independent of S.

The classical results on the capacity of a state-dependent channel are due to Shannon [8] for causal encoder CSI, and to Gelfand and Pinsker [5] for noncausal encoder CSI.

Theorem 22.1.

For the case with causal CSI, the capacity is

$${C}_{Sh} {=\max { }_{{P}_{X\mid S}}\max }_{{\mathcal{P}}_{0}\left ({P}_{X\mid S}{P}_{S},W\right )}I(U \wedge Y ),$$

and the strong converse holds.

Remark.

The capacity formula was derived by Shannon [8], and the strong converse was proved later by Wolfowitz [12].
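
The causal-CSI capacity formula above can be evaluated numerically: U may be taken to range over the Shannon strategies, i.e., maps \(u : \mathcal{S} \rightarrow \mathcal{X}\), and the capacity equals the ordinary capacity of the derived DMC \({W}_{Sh}(y\mid u) = {\sum }_{s}{P}_{S}(s)W(y\mid u(s),s)\). The sketch below (our own illustration; the function names and the Blahut–Arimoto stopping rule are ours) computes this. As a sanity check, a channel that ignores its state must recover the ordinary BSC capacity:

```python
import math
from itertools import product

def blahut_arimoto(Q, iters=300):
    """Approximate capacity (in nats) of a DMC with rows Q[u][y],
    via Blahut-Arimoto iterations on the input pmf p."""
    nu = len(Q)
    p = [1.0 / nu] * nu
    for _ in range(iters):
        r = [sum(p[u] * Q[u][y] for u in range(nu)) for y in range(len(Q[0]))]
        d = [sum(q * math.log(q / r[y]) for y, q in enumerate(row) if q > 0)
             for row in Q]
        w = [p[u] * math.exp(d[u]) for u in range(nu)]
        z = sum(w)
        p = [x / z for x in w]
    return sum(p[u] * d[u] for u in range(nu))   # lower-bound estimate of capacity

def shannon_strategy_capacity(W, P_S, X, S, Y):
    """Causal-CSI capacity via Shannon strategies u: S -> X."""
    rows = []
    for u in product(X, repeat=len(S)):          # u[i] = input used when state is S[i]
        rows.append([sum(P_S[i] * W(y, u[i], S[i]) for i in range(len(S)))
                     for y in Y])
    return blahut_arimoto(rows)

# Sanity check: a BSC(0.1) that ignores the state has capacity ln 2 - H(0.1) nats.
W = lambda y, x, s: 0.9 if y == x else 0.1
c = shannon_strategy_capacity(W, (0.5, 0.5), (0, 1), (0, 1), (0, 1))
h = -(0.9 * math.log(0.9) + 0.1 * math.log(0.1))
assert abs(c - (math.log(2) - h)) < 1e-3
```

The number of strategies is \({\vert \mathcal{X}\vert }^{\vert \mathcal{S}\vert }\), so this direct evaluation is practical only for small alphabets.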

Theorem 22.2 (Gelfand–Pinsker [5]). 

For the case with noncausal CSI, the capacity is

$${C}_{GP} {=\max { }_{{P}_{X\mid S}}\max }_{\mathcal{P}\left ({P}_{X\mid S}{P}_{S},W\right )}\left [I(U \wedge Y ) - I(U \wedge S)\right ].$$

One of our main results below shows that this result, too, holds with a strong converse.

Definition 22.2.

The reliability function E(R), R ≥ 0, of the DMC W is the largest number E ≥ 0 such that for every δ > 0 and for all sufficiently large n, there exist n-length block codes (f, ϕ), with causal or noncausal CSI as above, of rate greater than R − δ and \(e(f,\phi ) \leq \exp [-n(E - \delta )]\) (see, for instance, [3]).

4 A Technical Lemma

For a DMC without states, the result in [3, Corollary 6.4] provides, in effect, an image size characterization of a good codeword set; this does not involve any auxiliary rv. In the same spirit, our key technical lemma below provides an image size characterization for good codeword sets for the causal and noncausal DMC models, which now involves an auxiliary rv.

Lemma 22.1.

Let ε,τ > 0 be such that ε + τ < 1. Given a pmf \({P}_{\tilde{S}}\) on \(\mathcal{S}\) and a conditional pmf \({P}_{\tilde{X}\mid \tilde{S}}\) , let (f,ϕ) be an (M,n)-code as above. For each \(m \in \mathcal{M}\) , let A(m) be a subset of \({\mathcal{S}}^{n}\) which satisfies the following conditions:

$$\begin{array}{rcl} & A(m) \subseteq {\mathcal{T}}_{[{P}_{\tilde{S}}]}^{n}, &\end{array}$$
(22.8)
$$\begin{array}{rcl}& \|A(m)\| \geq \exp \left [n\left (H({P}_{\tilde{S}}) -\frac{\tau } {6} \right )\right ], &\end{array}$$
(22.9)
$$\begin{array}{rcl}& f(m,\mathbf{s}) \in {\mathcal{T}}_{[{P}_{\tilde{X}\mid \tilde{S}}]}^{n}(\mathbf{s}),\,\,\,\,\,\,\,\,\mathbf{s} \in A(m).&\end{array}$$
(22.10)

Furthermore, let (f,ϕ) satisfy one of the following two conditions:

$$ \begin{array}{rcl} & {W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \epsilon ,\,\,\,\,\,\,\,\,\mathbf{s} \in A(m), & \end{array}$$
(22.11a)
$$\begin{array}{rcl}& \frac{1} {\|A(m)\|}{ \sum \nolimits }_{\mathbf{s}\in A(m)}{W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \epsilon.&\end{array}$$
(22.11b)
(a)

    In the causal CSI case, for \(n \geq N(\|\mathcal{X}\|,\|\mathcal{S}\|,\|\mathcal{Y}\|,\tau ,\epsilon )\), it holds that

    $$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) + \tau ,& & \\ \end{array}$$

    for some \({P}_{UXSY } \in {\mathcal{P}}_{\tau }({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) .

(b)

    In the noncausal CSI case, for \(n \geq N(\|\mathcal{X}\|,\|\mathcal{S}\|,\|\mathcal{Y}\|,\tau ,\epsilon )\) , it holds that

    $$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) - I(U \wedge S) + \tau ,& & \\ \end{array}$$

    for some \({P}_{UXSY } \in \mathcal{P}({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) .

Furthermore, in both cases it suffices to restrict the rv U to take values in a finite set \(\mathcal{U}\) with \(\|\mathcal{U}\|\leq \|\mathcal{X}\|\|\mathcal{S}\| + 1\) .

Proof.

Our proof below is for the case when (22.11a) holds. The case when (22.11b) holds can be proved similarly with minor modifications; specifically, in the latter case, we can find subsets A′(m) of A(m), \(m \in \mathcal{M}\), that satisfy (22.8)–(22.10) and (22.11a) for some ε′, τ′ > 0 with ε′ + τ′ < 1 for all n sufficiently large.

With (22.11a) holding, set

$$\begin{array}{rcl} B(m) =\{ (f(m,\mathbf{s}),\mathbf{s}) \in {\mathcal{X}}^{n} \times {\mathcal{S}}^{n} : \mathbf{s} \in A(m)\},\quad m \in \mathcal{M}.& & \\ \end{array}$$

Let \({P}_{\tilde{Y }} = {P}_{\tilde{X}\tilde{S}}W\) be a pmf on \(\mathcal{Y}\) defined by

$$\begin{array}{rcl}{ P}_{\tilde{Y }}(y) ={ \sum \nolimits }_{s,x}{P}_{\tilde{S}\tilde{X}}(s,x)W(y\mid x,s),\quad y \in \mathcal{Y}.& & \\ \end{array}$$

Consequently,

$$\begin{array}{rcl}{ W}^{n}({\mathcal{T}}_{ [{P}_{\tilde{Y}}]}^{n}\mid f(m,\mathbf{s}),\mathbf{s}) > \epsilon + \tau ,\quad \mathbf{s} \in A(m),& &\end{array}$$
(22.12)

for all \(n \geq N(\|\mathcal{X}\|,\|\mathcal{S}\|,\|\mathcal{Y}\|,\tau ,\epsilon )\) (not depending on m and s in A(m)). Denoting

$$\begin{array}{rcl} C(m) = {\phi }^{-1}(m) \cap {\mathcal{T}}_{ [{P}_{\tilde{Y}}]}^{n},& & \\ \end{array}$$

we see from (22.11a) and (22.12) that

$$\begin{array}{rcl}{ W}^{n}(C(m)\mid f(m,\mathbf{s}),\mathbf{s}) > \tau > 0,\quad (f(m,\mathbf{s}),\mathbf{s}) \in B(m),& & \\ \end{array}$$

so that

$$\begin{array}{rcl} \|C(m)\| \geq {g}_{{W}^{n}}(B(m),\tau ),& & \\ \end{array}$$

where \({g}_{{W}^{n}}(B(m),\tau )\) denotes the smallest cardinality of a subset D of \({\mathcal{Y}}^{n}\) with

$$\begin{array}{rcl}{ W}^{n}(D\mid (f(m,\mathbf{s}),\mathbf{s})) > \tau ,\,\,\,\,\,\,\,\,(f(m,\mathbf{s}),\mathbf{s}) \in B(m).& &\end{array}$$
(22.13)

With \({m}_{0} = \arg {\min }_{1\leq m\leq M}\|C(m)\|\), we have

$$\begin{array}{rcl} M\|C({m}_{0})\| \leq {\sum \nolimits }_{m=1}^{M}\|C(m)\| =\| {\mathcal{T}}_{ [{P}_{\tilde{Y}}]}^{n}\| \leq \exp \left [n\left (H({P}_{\tilde{ Y }}) + \frac{\tau } {6}\right )\right ].& & \\ \end{array}$$

Consequently,

$$\begin{array}{rcl} \frac{1} {n}\log M \leq H({P}_{\tilde{Y }}) + \frac{\tau } {6} - \frac{1} {n}\log {g}_{{W}^{n}}(B({m}_{0}),\tau ).& &\end{array}$$
(22.14)

The remainder of the proof entails relating the “image size” of B(m 0), i.e., \({g}_{{W}^{n}}(B({m}_{0}),\tau )\), to \(\|A({m}_{0})\|\), and is completed below separately for the cases of causal and noncausal CSI.

First consider the causal CSI case. For a rv \(\hat{{S}}^{n}\) distributed uniformly over \(A({m}_{0})\), we have from (22.9) that

$$\begin{array}{rcl} \frac{1} {n}H(\hat{{S}}^{n}) = \frac{1} {n}\log \|A({m}_{0})\| \geq H({P}_{\tilde{S}}) -\frac{\tau } {6}.& &\end{array}$$
(22.15)

Since

$$\frac{1} {n}H(\hat{{S}}^{n}) = \frac{1} {n}{\sum \nolimits }_{i=1}^{n}H(\hat{{S}}_{ i}\mid \hat{{S}}^{i-1}) = H(\hat{{S}}_{ I}\mid \hat{{S}}^{I-1},I),$$

where the rv I is distributed uniformly over the set \(\{1,\ldots ,n\}\) and is independent of all other rvs, the previous identity, together with (22.15), yields

$$\begin{array}{rcl} H({P}_{\tilde{S}}) - H(\hat{{S}}_{I}\mid \hat{{S}}^{I-1},I) \leq \frac{\tau } {3}.& &\end{array}$$
(22.16)
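
The chain-rule identity above, expressing a per-letter entropy rate as a conditional entropy at a uniformly random time index, can be verified numerically for a small set. The sketch below is our own check (names are illustrative): for \(\hat{S}^n\) uniform over a set A, it compares \((1/n)H(\hat{S}^n) = (1/n)\log \|A\|\) with \((1/n)\sum _{i}H(\hat{S}_{i}\mid \hat{S}^{i-1})\).

```python
import math
from collections import Counter

def entropy(counts):
    """Entropy (nats) of the empirical pmf given by a Counter of outcomes."""
    tot = sum(counts.values())
    return -sum(c / tot * math.log(c / tot) for c in counts.values())

def cond_entropy_rate(A):
    """(1/n) * sum_i H(S_i | S^{i-1}) for S^n uniform over A, using
    H(S_i | S^{i-1}) = H(S^i) - H(S^{i-1}) on prefix marginals."""
    n = len(next(iter(A)))
    total = 0.0
    for i in range(n):
        total += (entropy(Counter(s[: i + 1] for s in A))
                  - entropy(Counter(s[:i] for s in A)))
    return total / n

A = {(0, 0, 1), (0, 1, 0), (1, 1, 1), (1, 0, 1)}
lhs = math.log(len(A)) / len(next(iter(A)))   # (1/n) H(S^n) for the uniform pmf
assert abs(lhs - cond_entropy_rate(A)) < 1e-9
```

The two sides agree exactly (up to floating-point roundoff) for any A, since the comparison is just the entropy chain rule in disguise.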

Next, denote by \(\hat{{X}}^{n}\) the rv \(f({m}_{0},\hat{{S}}^{n})\) and by \(\hat{{Y }}^{n}\) the rv which, conditioned on \(\hat{{X}}^{n}\), \(\hat{{S}}^{n}\), has (conditional) distribution \({W}^{n}\), i.e., \(\hat{{Y }}^{n}\) is the random output of the DMC W when the input is set to \(\left (\hat{{X}}^{n},\hat{{S}}^{n}\right )\). Then, using [3, Lemma 15.2], we get

$$\begin{array}{rcl} \frac{1} {n}\log {g}_{{W}^{n}}(B({m}_{0}),\tau ) \geq \frac{1} {n}H(\hat{{Y }}^{n}) -\frac{\tau } {6},& &\end{array}$$
(22.17)

for all n sufficiently large. Furthermore,

$$\begin{array}{rcl} \frac{1} {n}H(\hat{{Y }}^{n})& = \frac{1} {n}{ \sum \nolimits }_{i=1}^{n}H(\hat{{Y }}_{i}\mid \hat{{Y }}^{i-1}) & \\ & \geq H(\hat{{Y }}_{I}\mid \hat{{X}}^{I-1},\hat{{S}}^{I-1},\hat{{Y }}^{I-1},I)& \\ & = H(\hat{{Y }}_{I}\mid \hat{{X}}^{I-1},\hat{{S}}^{I-1},I) & \\ & = H(\hat{{Y }}_{I}\mid \hat{{S}}^{I-1},I), & \\ \end{array}$$

where the last-but-one equality follows from the DMC assumption, and the last equality holds since \(\hat{{X}}^{I-1} = f({m}_{0},\hat{{S}}^{I-1})\). The inequality above, along with (22.17) and (22.14) gives

$$\begin{array}{rcl} \frac{1} {n}\log M \leq H({P}_{\tilde{Y }}) - H(\hat{{Y }}_{I}\mid \hat{{S}}^{I-1},I) + \frac{\tau } {3}.& &\end{array}$$
(22.18)

Denote by \(\hat{U}\) the rv \((\hat{{S}}^{I-1},I)\) and note that the following Markov property holds:

$$\hat{{Y }}_{I} -\circ -\hat{ {X}}_{I},\hat{{S}}_{I} -\circ -\hat{ U}.$$

Also, from the definition of B(m 0),

$$\begin{array}{rcl}{ P}_{\hat{{X}}_{I},\hat{{S}}_{I}}(x,s)& = \frac{1} {n}{ \sum \nolimits }_{i=1}^{n}{P}_{\hat{{X}}_{i},\hat{{S}}_{i}}\left (x,s\right ) & \\ & = \frac{1} {n}{ \sum \nolimits }_{i=1}^{n}{ \sum \nolimits }_{(\mathbf{x},\mathbf{s})\in B({m}_{0})}\frac{\mathbf{1}({x}_{i}=x,{s}_{i}=s)} {\|B({m}_{0})\|} & \\ & = \frac{1} {\|B({m}_{0})\|}{ \sum \nolimits }_{(\mathbf{x},\mathbf{s})\in B({m}_{0})}{Q}_{\mathbf{x},\mathbf{s}}(x,s), & \\ \end{array}$$

where \({Q}_{\mathbf{x},\mathbf{s}}\) is the joint type of (x, s), and the last equation follows upon interchanging the order of summation. It follows from (22.8) and (22.10) that \(\|{P}_{\hat{{X}}_{I},\hat{{S}}_{I}} - {P}_{\tilde{X}\tilde{S}}\| \leq {\delta }_{n}\) for some \({\delta }_{n} \rightarrow 0\) satisfying the δ-convention (22.2). Furthermore,

$$\begin{array}{rcl} \|{P}_{\hat{{Y }}_{I}} - {P}_{\tilde{Y }}\|& ={ \sum \nolimits }_{y}\left \vert {\sum \nolimits }_{x,s}W(y\vert x,s){P}_{\tilde{X}\tilde{S}}(x,s) -{\sum \nolimits }_{x,s}W(y\vert x,s){P}_{\hat{{X}}_{I}\hat{{S}}_{I}}(x,s)\right \vert & \\ & \leq {\sum \nolimits }_{x,s}{ \sum \nolimits }_{y}W(y\vert x,s)\left \vert {P}_{\tilde{X}\tilde{S}}(x,s) - {P}_{\hat{{X}}_{I}\hat{{S}}_{I}}(x,s)\right \vert & \\ & =\| {P}_{\tilde{X}\tilde{S}} - {P}_{\hat{{X}}_{I}\hat{{S}}_{I}}\| \leq {\delta }_{n}. & \\ \end{array}$$

Let the rvs \(\tilde{X},\tilde{S},\tilde{Y }\) have a joint distribution \({P}_{\tilde{X}\tilde{S}\tilde{Y }}\). Define a rv U which takes values in the same set as \(\hat{U}\), has \({P}_{\hat{U}\mid \hat{{X}}_{I}\hat{{S}}_{I}}\) as its conditional distribution given \(\tilde{X},\tilde{S}\), and satisfies the Markov relation

$$Y -\circ - X,S -\circ - U.$$

Then using the continuity of the entropy function and the arguments above, (22.18) yields

$$\frac{1} {n}\log M \leq I(U \wedge Y ) + \tau ,$$

and (22.16) yields

$$I(U \wedge S) \leq \tau ,$$

for all n sufficiently large, where \({P}_{UXSY } \in {\mathcal{P}}_{\tau }({P}_{\tilde{X}\tilde{S}},W)\).

Turning to the case with noncausal CSI, define a stochastic matrix \(V : \mathcal{X}\,\times \,\mathcal{S}\,\rightarrow \,\mathcal{S}\) with

$$\begin{array}{rcl} V (s{^\prime}\mid x,s) = \mathbf{1}(s{^\prime} = s),& & \\ \end{array}$$

and let g V n be defined in a manner analogous to g Wn above with \({\mathcal{S}}^{n}\) in the role of \({\mathcal{Y}}^{n}\) in (22.13). For any \(m \in \mathcal{M}\) and subset E of \({\mathcal{S}}^{n}\), observe that

$$\begin{array}{rcl}{ V }^{n}(E\mid f(m,\mathbf{s}),\mathbf{s}) = \mathbf{1}(\mathbf{s} \in E),\,\,\,\,\,\,\,\,\mathbf{s} \in {\mathcal{S}}^{n}.& & \\ \end{array}$$

In particular, if E satisfies

$$\begin{array}{rcl}{ V }^{n}(E\mid f(m,\mathbf{s}),\mathbf{s}) > \tau ,\,\,\,\,\,\,\,\,\mathbf{s} \in A(m),& &\end{array}$$
(22.19)

it must be that A(m) ⊆ E, and since E = A(m) satisfies (22.19), we get that

$$\begin{array}{rcl} \|A(m)\| = {g}_{{V }^{n}}(B(m),\tau )& &\end{array}$$
(22.20)

using the definition of B(m). Using the image size characterization in (22.4) [3, Theorem 15.11], there exists an auxiliary rv U and associated pmf \({P}_{UXSY } = {P}_{U\mid XS}{P}_{\tilde{X}\tilde{S}}W\) such that

$$\begin{array}{rcl} \left \vert \frac{1} {n}\log {g}_{{W}^{n}}(B({m}_{0}),\tau ) - H(Y \vert U) - t\right \vert & < \frac{\tau } {6} ,& \\ \left \vert \frac{1} {n}\log {g}_{{V }^{n}}(B({m}_{0}),\tau ) - H(S\vert U) - t\right \vert & < \frac{\tau } {6} ,&\end{array}$$
(22.21)

where \(0 \leq t \leq \min \{ I(U \wedge Y ),I(U \wedge S)\}\). Then, using (22.14), (22.20), and (22.21) we get

$$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) + H(S\mid U) - \frac{1} {n}\log \|A({m}_{0})\| + \frac{\tau } {2},& & \\ \end{array}$$

which, by (22.9), yields

$$\begin{array}{rcl} \frac{1} {n}\log M \leq I(U \wedge Y ) - I(U \wedge S) + \tau.& & \\ \end{array}$$

In (22.21), \({P}_{UXSY } \in \mathcal{P}({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) except that it need not satisfy (22.6). Finally, the asserted restriction to pmfs in \(\mathcal{P}({P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}},W)\) satisfying (22.6) follows from the convexity of I(U ∧ Y ) − I(U ∧ S) in \({P}_{X\mid US}\) for a fixed \({P}_{US}\) (as observed in [5]).

Lastly, it follows from the support lemma [3, Lemma 15.4] that it suffices to consider those rvs U for which \(\|\mathcal{U}\|\leq \|\mathcal{X}\|\|\mathcal{S}\| + 1\).

5 The Strong Converse

Theorem 22.3 (Strong converse). 

Given 0 < ε < 1 and a sequence of \(({M}_{n},n)\)-codes \(({f}_{n},{\phi }_{n})\) with \(e({f}_{n},{\phi }_{n}) < \epsilon \) , it holds that

$$\begin{array}{rcl} \lim {\sup }_{n}\, \frac{1} {n}\log {M}_{n} \leq C,& & \\ \end{array}$$

where \(C = {C}_{Sh}\) and \(C = {C}_{GP}\) for the cases of causal and noncausal CSI, respectively.

Proof.

Given 0 < ε < 1 and an (M, n)-code (f, ϕ) with e(f, ϕ) ≤ ε, the proof involves the identification of sets A(m), \(m \in \mathcal{M}\), satisfying (22.8)–(22.10) and (22.11a). The assertion then follows from Lemma 22.1. Note that e(f, ϕ) ≤ ε implies

$$\begin{array}{rcl} {\sum \nolimits }_{\mathbf{s}\in {\mathcal{S}}^{n}}{P}_{S}\left (\mathbf{s}\right ){W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \epsilon & & \\ \end{array}$$

for all \(m \in \mathcal{M}\). Since \({P}_{S}\left ({\mathcal{T}}_{[{P}_{S}]}^{n}\right ) \rightarrow 1\) as \(n \rightarrow \infty \), we get that for every \(m \in \mathcal{M}\),

$$\begin{array}{rcl}{ P}_{S}\left (\{\mathbf{s} \in {\mathcal{T}}_{[{P}_{S}]}^{n} : {W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) > \frac{1 - \epsilon } {2} \}\right ) \geq \frac{1 - \epsilon } {3} & &\end{array}$$
(22.22)

for all \(n \geq N(\|\mathcal{S}\|,\epsilon )\). Denoting the set \(\{ \cdot \}\) in (22.22) by \(\hat{A}(m)\), clearly for every \(m \in \mathcal{M}\),

$$\begin{array}{rcl}{ W}^{n}({\phi }^{-1}(m)\mid f(m,\mathbf{s}),\mathbf{s}) \geq \frac{1 - \epsilon } {2} ,\,\,\,\,\,\,\,\,\mathbf{s} \in \hat{ A}(m),& & \\ \end{array}$$

and

$$\begin{array}{rcl}{ P}_{S}\left (\hat{A}(m)\right ) \geq \frac{1 - \epsilon } {3} & & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\epsilon )\), whereby for an arbitrary δ > 0, we get

$$\begin{array}{rcl} \|\hat{A}(m)\| \geq \exp [n(H({P}_{S}) - \delta )]& & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\delta )\). Partitioning \(\hat{A}(m)\), \(m \in \mathcal{M}\), into sets according to the (polynomially many) conditional types of f(m, s) given s in \(\hat{A}(m)\), we obtain a subset A(m) of \(\hat{A}(m)\) for which

$$\begin{array}{rcl} f(m,\mathbf{s})& \in {\mathcal{T}}_{m}^{n}(\mathbf{s}),\,\,\,\,\,\,\,\,\mathbf{s} \in A(m),& \\ \|A(m)\|& \geq \exp [n(H({P}_{S}) - 2\delta )], & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\delta )\), where \({\mathcal{T}}_{m}^{n}(\mathbf{s})\) denotes the set of sequences in \({\mathcal{X}}^{n}\) of a fixed conditional type given s (the type depending only on m).

Once again, the polynomial size of such conditional types yields a subset \(\mathcal{M}{^\prime}\) of \(\mathcal{M}\) such that f(m, s) has a fixed conditional type (not depending on m) given s in A(m), and with

$$\begin{array}{rcl} \frac{1} {n}\log \|\mathcal{M}{^\prime}\| \geq \frac{1} {n}\log M - \delta & & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\delta )\). Finally, the strong converse follows by applying Lemma 22.1 to the subcode corresponding to \(\mathcal{M}{^\prime}\) and noting that δ > 0 is arbitrary.

6 Outer Bound on Reliability Function

An upper bound for the reliability function E(R), 0 < R < C, of a DMC without states is derived in [3] using a strong converse for codes with codewords of a fixed type. The key technical Lemma 22.1 gives an upper bound on the rate of codes with codewords that are conditionally typical over large message-dependent subsets of the typical set of state sequences and serves, in effect, as an analog of [3, Corollary 6.4] for state-dependent channels to derive an upper bound on the reliability function.

Theorem 22.4 (Sphere packing bound). 

Given δ > 0, for 0 < R < C, it holds that

$$\begin{array}{rcl} E(R) \leq {E}_{SP}(1 + \delta ) + \delta ,& & \\ \end{array}$$

where

$$\begin{array}{rcl}{ E}_{SP} {=\min { }_{{P}_{\tilde{S}}}\max {}_{{P}_{\tilde{X}\mid \tilde{S}}}\min }_{V \in \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})}\big{[}D({P}_{\tilde{S}}\|{P}_{S}) + D(V \|W\mid {P}_{\tilde{X}\tilde{S}})\big{]}& &\end{array}$$
(22.23)

with

$$\begin{array}{rcl} \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}}) = {\mathcal{V}}_{Sh}(R,{P}_{\tilde{X}\tilde{S}}) = \{V : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y} {:\max }_{{P}_{UXSY }\in {\mathcal{P}}_{0}({P}_{\tilde{X}\tilde{S}},V )}I(U \wedge Y ) < R\},& &\end{array}$$
(22.24)

and

$$\begin{array}{rcl} \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})& = {\mathcal{V}}_{GP}(R,{P}_{\tilde{X}\tilde{S}}) & \\ & = \{V : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y} {:\max }_{{P}_{UXSY }\in \mathcal{P}({P}_{\tilde{X}\tilde{S}},V )}\left [I(U \wedge Y ) - I(U \wedge S)\right ] < R\},&\end{array}$$
(22.25)

for the causal and noncausal CSI cases, respectively.

Remark.

In (22.23), the terms \(D({P}_{\tilde{S}}\|{P}_{S})\) and \(D(V \|W\mid {P}_{\tilde{S}}{P}_{\tilde{X}\mid \tilde{S}})\) account, respectively, for the shortcomings of a given code against a corresponding “bad” state pmf and a “bad” channel.

Proof.

Consider sequences of type \({P}_{\tilde{S}}\) in \({\mathcal{S}}^{n}\). Picking \(\hat{A}(m) = {\mathcal{T}}_{{P}_{\tilde{S}}}^{n}\), \(m \in \mathcal{M}\), in the proof of Theorem 22.3, and following the arguments therein to extract the subset A(m) of \(\hat{A}(m)\), we have for a given δ > 0 that for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\delta )\), there exists a subset \(\mathcal{M}{^\prime}\) of \(\mathcal{M}\) and a fixed conditional type, say \({P}_{\tilde{X}\mid \tilde{S}}\) (not depending on m), such that for every \(m \in \mathcal{M}{^\prime}\),

$$\begin{array}{rcl} A(m)& \subseteq \hat{ A}(m) = {\mathcal{T}}_{{P}_{\tilde{S}}}^{n}, & \\ \|A(m)\|& \geq \exp [n(H({P}_{\tilde{S}}) - \delta )], & \\ f(m,\mathbf{s})& \in {\mathcal{T}}_{{P}_{\tilde{X}\mid \tilde{S}}}^{n}(\mathbf{s}),\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\mathbf{s} \in A(m),& \\ \frac{1} {n}\log \|\mathcal{M}{^\prime}\|& \geq R - \delta. & \\ \end{array}$$

Then for every \(V \in \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})\), we obtain using Lemma 22.1 (in its version with condition (22.11b)), that for every δ′ > 0, there exists \(m \in \mathcal{M}{^\prime}\) (possibly depending on δ′ and V ) with

$$\begin{array}{rcl} \frac{1} {\|A(m)\|}{\sum \nolimits }_{\mathbf{s}\in A(m)}{V }^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}) \geq 1 - \delta {^\prime}& & \\ \end{array}$$

for all \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\|\mathcal{Y}\|,\delta {^\prime})\). Since the average V n-(conditional) probability of \({\left ({\phi }^{-1}(m)\right )}^{c}\) is large, its W n-(conditional) probability cannot be too small. To that end, for this m, apply [3, Theorem 10.3, (10.21)] with the choices

$$\begin{array}{rcl} Z& = {\mathcal{Y}}^{n} \times A(m), & \\ S& = {({\phi }^{-1}(m))}^{c} \times A(m),& \\ {Q}_{1}(\mathbf{y},\mathbf{s})& = \frac{{V }^{n}(\mathbf{y}\mid f(m,\mathbf{s}),\mathbf{s})} {\|A(m)\|} , & \\ {Q}_{2}(\mathbf{y},\mathbf{s})& = \frac{{W}^{n}(\mathbf{y}\mid f(m,\mathbf{s}),\mathbf{s})} {\|A(m)\|} , & \\ \end{array}$$

for (y, s) ∈ Z, to obtain

$$\begin{array}{rcl} \frac{1} {\|A(m)\|}{\sum \nolimits }_{\mathbf{s}\in A(m)}{W}^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}) \geq \exp \left (-\frac{nD(V \|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}) + 1} {1 - \delta {^\prime}} \right ).& & \\ \end{array}$$

Finally,

$$\begin{array}{rcl} e(f,\phi )& \geq {\sum \nolimits }_{\mathbf{s}\in A(m)}{P}_{S}\left (\mathbf{s}\right ){W}^{n}({({\phi }^{-1}(m))}^{c}\mid f(m,\mathbf{s}),\mathbf{s}) & \\ & \geq \exp [-n(D({P}_{\tilde{S}}\|{P}_{S}) + D(V \|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})(1 + \delta ) + \delta )]& \\ \end{array}$$

for \(n \geq N(\|\mathcal{S}\|,\|\mathcal{X}\|,\|\mathcal{Y}\|,\delta ,\delta {^\prime})\), whereby it follows for the noncausal CSI case that

$$\limsup_{n}\, -\frac{1}{n}\log e(f,\phi ) \leq \min_{{P}_{\tilde{S}}}\,\max_{{P}_{\tilde{X}\mid \tilde{S}}}\,\min_{V \in \mathcal{V}(R,{P}_{\tilde{X}\tilde{S}})}\big[D({P}_{\tilde{S}}\|{P}_{S}) + D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})(1 + \delta ) + \delta \big]$$

for every δ > 0. Similarly, for the case of causal CSI, for τ > 0, letting

$$\mathcal{V}_{\tau }(R,{P}_{\tilde{X}\tilde{S}}) = \Big\{V : \mathcal{X}\times \mathcal{S}\rightarrow \mathcal{Y}\;:\; \max_{{P}_{UXSY}\in {\mathcal{P}}_{\tau }({P}_{\tilde{X}\tilde{S}},V)} I(U \wedge Y ) < R\Big\},$$
(22.26)

we get

$$\limsup_{n}\, -\frac{1}{n}\log e(f,\phi ) \leq \min_{{P}_{\tilde{S}}}\,\max_{{P}_{\tilde{X}\mid \tilde{S}}}\,\min_{V \in {\mathcal{V}}_{\tau }(R,{P}_{\tilde{X}\tilde{S}})}\big[D({P}_{\tilde{S}}\|{P}_{S}) + D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})\big].$$

The continuity of the right side of (22.26), as shown in the Appendix, yields the claimed expression for \({E}_{\mathrm{SP}}\) in (22.23) and (22.24).
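To make the sphere-packing form of these bounds concrete, the following sketch brute-forces the innermost minimization for a toy stateless binary channel (hypothetical parameters, with U = X and no state), i.e., \(\min_{V:\,I(X\wedge Y)<R} D(V\|W\mid P_{X})\) on a coarse grid of channels:

```python
import math
import itertools

# Toy stateless illustration of the innermost minimization in the bounds
# above: min over channels V with I(X ^ Y) < R of D(V||W | P_X), brute-forced
# over a grid of binary channels. All quantities in nats; the channel W, the
# input PMF P_X, and the rate R are hypothetical.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_info(PX, V):
    PY = [sum(PX[x] * V[x][y] for x in range(2)) for y in range(2)]
    return sum(PX[x] * kl(V[x], PY) for x in range(2))

PX = [0.5, 0.5]
W = [[0.9, 0.1], [0.1, 0.9]]   # BSC with crossover 0.1; capacity ~0.368 nats
R = 0.2                        # rate below capacity

best = float("inf")
grid = [i / 50 for i in range(1, 50)]
for a, b in itertools.product(grid, repeat=2):
    V = [[1 - a, a], [b, 1 - b]]       # V(1|0) = a, V(1|1) = b
    if mutual_info(PX, V) < R:         # "dummy" channels too noisy for rate R
        best = min(best, sum(PX[x] * kl(V[x], W[x]) for x in range(2)))
```

Since R is below capacity, every feasible V differs from W, so `best` is strictly positive: a grid approximation of the positive sphere-packing exponent at rate R.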

7 Acknowledgement

The work of H. Tyagi and P. Narayan was supported by the U.S. National Science Foundation under Grants CCF-0830697 and CCF-1117546.

8 Appendix: Continuity of the Right Side of (22.26)

Let

$$f(R,{P}_{U\tilde{X}\tilde{S}}) = \min_{\substack{V:\, I(U\wedge Y ) < R \\ {P}_{Y\mid \tilde{X}\tilde{S}} = V}} \big[D({P}_{\tilde{S}}\|{P}_{S}) + D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}})\big].$$
(22.27)
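The conditional divergence in (22.27) expands as \(D(V\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}) = \sum_{s,x} {P}_{\tilde{S}}(s)\,{P}_{\tilde{X}\mid \tilde{S}}(x\mid s)\, D(V(\cdot \mid x,s)\|W(\cdot \mid x,s))\). A small Python sketch of this evaluation, with hypothetical binary alphabets and channels (all names and values are illustrative):

```python
import math

# Evaluate the conditional divergence D(V || W | P_{X|S} P_S) by its expansion
#   sum_{s,x} P_S(s) * P_{X|S}(x|s) * D(V(.|x,s) || W(.|x,s)),
# divergences in nats. All distributions below are hypothetical.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cond_div(V, W, PXgS, PS):
    # V, W: dict (x, s) -> output PMF; PXgS: dict s -> PMF over x; PS: PMF over s
    return sum(PS[s] * PXgS[s][x] * kl(V[(x, s)], W[(x, s)])
               for s in range(len(PS)) for x in range(len(PXgS[s])))

PS = [0.5, 0.5]
PXgS = {0: [0.8, 0.2], 1: [0.4, 0.6]}
W = {(x, s): [0.9, 0.1] if x == s else [0.2, 0.8] for x in (0, 1) for s in (0, 1)}
V = {(x, s): [0.7, 0.3] if x == s else [0.3, 0.7] for x in (0, 1) for s in (0, 1)}
d = cond_div(V, W, PXgS, PS)
assert d >= 0.0
```

The same expansion is what the minimization in (22.27) ranges over as V varies.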

Further, let

$$g({P}_{\tilde{S}},\tau ) = \max_{\substack{{P}_{U\tilde{X}\mid \tilde{S}}:\, I(U\wedge \tilde{S})\leq \tau \\ U -\circ- \tilde{X},\tilde{S} -\circ- Y}} f(R,{P}_{U\tilde{X}\tilde{S}}),$$
(22.28)

and

$$g(\tau ) = \min_{{P}_{\tilde{S}}} g({P}_{\tilde{S}},\tau ).$$
(22.29)

To show the continuity of g(τ) at τ = 0, first note that g(τ) ≥ g(0) for all τ ≥ 0, since the feasible set of the maximization in (22.28) only grows with τ. Next, let \({P}_{\tilde{S}}^{0}\) attain the minimum in (22.29) for τ = 0. Clearly,

$$\begin{array}{rlrlrl} g({P}_{\tilde{S}}^{0},\tau ) \geq g(\tau ). & &\end{array}$$
(22.30)

Also, let \({P}_{U\tilde{X}\mid \tilde{S}}^{\tau }\) attain the maximum of \(g({P}_{\tilde{S}}^{0},\tau )\) in (22.28). For the associated joint pmf \({P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }\), let \({P}_{U}^{\tau }\) denote the resulting U-marginal pmf, and consider the joint pmf \({P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }\). Then, using (22.28) and (22.29) and the observations above,

$$f(R,{P}_{U}^{\tau }{P}_{\tilde{ S}}^{0}{P}_{\tilde{ X}\mid U\tilde{S}}^{\tau }) \leq g(0) \leq g(\tau ) \leq g({P}_{\tilde{ S}}^{0},\tau ) = f(R,{P}_{\tilde{ S}}^{0}{P}_{ U\tilde{X}\mid \tilde{S}}^{\tau }).$$

The continuity of g(τ) at τ = 0 will follow upon showing that

$$f(R,{P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }) - f(R,{P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }) \rightarrow 0 \quad \mathrm{as}\; \tau \rightarrow 0.$$

The constraint on the mutual information in (22.28) gives, by Pinsker’s inequality [3, 4], that

$$\tau \geq D\left ({P}_{U\mid \tilde{S}}^{\tau }{P}_{\tilde{ S}}^{0}\|{P}_{ U}^{\tau }{P}_{\tilde{ S}}^{0}\right ) \geq 2{\left \|{P}_{ U\mid \tilde{S}}^{\tau }{P}_{\tilde{ S}}^{0} - {P}_{ U}^{\tau }{P}_{\tilde{ S}}^{0}\right \|}^{2},$$

i.e.,

$$\begin{array}{rlrlrl} \left \|{P}_{U\mid \tilde{S}}^{\tau }{P}_{\tilde{ S}}^{0} - {P}_{ U}^{\tau }{P}_{\tilde{ S}}^{0}\right \| \leq \sqrt{\frac{\tau } {2}}. & &\end{array}$$
(22.31)
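Here \(\|\cdot \|\) is read as the variational (total-variation) distance, i.e., half the \(L_{1}\) distance, with divergences in nats, so that Pinsker’s inequality takes the form \(D(P\|Q) \geq 2\|P-Q\|^{2}\) used above. A quick numeric check with hypothetical PMFs:

```python
import math

# Numeric check of Pinsker's inequality as used in (22.31), taking ||.|| to be
# the total-variation distance (half the L1 distance) and D in nats:
#   D(P||Q) >= 2 * ||P - Q||^2,  hence  ||P - Q|| <= sqrt(D(P||Q) / 2).
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

P = [0.6, 0.3, 0.1]   # hypothetical PMFs on a 3-letter alphabet
Q = [0.3, 0.4, 0.3]
tau = kl(P, Q)        # plays the role of the bound tau >= I(U ^ S~)
assert tau >= 2 * tv(P, Q) ** 2
assert tv(P, Q) <= math.sqrt(tau / 2)   # the form of (22.31)
```

In the proof, τ bounds the mutual information \(I(U\wedge \tilde{S})\), which is exactly a divergence of the above kind between the joint and product PMFs.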

For \({P}_{U\tilde{X}\tilde{S}} = {P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }\), let V 0 attain the minimum in (22.27), i.e.,

$$\begin{array}{rl} {P}_{Y\mid \tilde{X}\tilde{S}} & = {V}^{0},\qquad I(U \wedge Y ) < R,\quad \mathrm{and} \\ f(R,{P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }) & = D({P}_{\tilde{S}}\|{P}_{S}) + D({V}^{0}\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}). \end{array}$$

By (22.31) and standard continuity arguments, for \({P}_{U\tilde{X}\tilde{S}} = {P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }\) and \({P}_{Y \mid \tilde{X}\tilde{S}} = {V }^{0}\), we have

$$I(U \wedge Y ) < R + \nu ,$$

and

$$D({P}_{\tilde{S}}\|{P}_{S}) + D({V }^{0}\|W\mid {P}_{\tilde{ X}\mid \tilde{S}}{P}_{\tilde{S}}) \leq f(R,{P}_{U}^{\tau }{P}_{\tilde{ S}}^{0}{P}_{\tilde{ X}\mid U\tilde{S}}^{\tau }) + \nu ,$$

where \(\nu = \nu (\tau ) \rightarrow 0\) as τ → 0. Consequently,

$$f(R+\nu ,{P}_{\tilde{S}}^{0}{P}_{U\tilde{X}\mid \tilde{S}}^{\tau }) \leq D({P}_{\tilde{S}}\|{P}_{S}) + D({V}^{0}\|W\mid {P}_{\tilde{X}\mid \tilde{S}}{P}_{\tilde{S}}) \leq f(R,{P}_{U}^{\tau }{P}_{\tilde{S}}^{0}{P}_{\tilde{X}\mid U\tilde{S}}^{\tau }) + \nu.$$

Finally, invoking the continuity of \(f(R,{P}_{U\tilde{X}\tilde{S}})\) in R [3, Lemma 10.4] completes the proof.