1 Introduction

Assume that Alice receives a value of a random variable X and wants to transmit that value to Bob. It is well known [8] that Alice can do this using one message over the binary alphabet of expected length less than \(H(X) + 1\). Assume now that there are n independent random variables \(X_1, \ldots , X_n\), each distributed as X, and Alice wants to transmit all of \(X_1, \ldots , X_n\) to Bob. Another classical result from [8] states that Alice can do this using one message of fixed length, namely \(\approx nH(X)\), with a small probability of error.

One of the possible ways to generalize this problem is to provide Bob with the value of another random variable Y which is jointly distributed with X, that is, to let Bob know some partial information about X for free. This problem is the subject of the classical Slepian-Wolf theorem [9], which asserts that if there are n independent pairs \((X_1, Y_1), \ldots , (X_n, Y_n)\), each pair distributed exactly as \((X, Y)\), then Alice can transmit all \(X_1, \ldots , X_n\) to Bob, who knows \(Y_1, \ldots , Y_n\), using one message of fixed length, namely \(\approx nH(X|Y)\), with a small probability of error. However, it turns out that a one-shot analogue of this theorem is impossible if only one-way communication is allowed.

The situation is quite different if we allow Alice and Bob to interact, that is, to send messages in both directions. In [7], Orlitsky studied this problem for average-case communication when no error is allowed. He showed that if the pair \((X, Y)\) is uniformly distributed on its support, then Alice may transmit X to Bob using at most

$$H(X|Y) + 3\log _2(H(X|Y) + 1) + 17$$

bits on average and 4 rounds. For pairs \((X, Y)\) whose support is a Cartesian product, Orlitsky showed that errorless transmission of X from Alice to Bob requires H(X) bits on average.

From a result of Braverman and Rao [3], it follows that for an arbitrary pair \((X, Y)\) it is sufficient to communicate at most

$$H(X|Y) + 5\sqrt{H(X|Y)} + O\left( \log _2\left( \frac{1}{\varepsilon }\right) \right) $$

bits on average (here \(\varepsilon \) stands for the error probability).

We show that for every positive \(\varepsilon \) and every positive integer r there is a public-coin protocol transmitting X from Alice to Bob with error probability at most \(\varepsilon \) (for each pair of inputs) using at most

$$\left( 1 + \frac{1}{r}\right) H(X|Y) + r + O\left( \log _2\frac{1}{\varepsilon }\right) $$

bits on average in at most

$$\frac{2H(X|Y)}{r} + 2$$

rounds on average. Furthermore, there is a private-coin protocol with the same properties plus extra \(O\left( \log \log |\mathsf {supp}(X, Y)|\right) \) bits of communication. Our protocol is inspired by the protocol from [1]; the idea is essentially the same, and we only apply a technical trick to reduce the communication.

This improves the result of Braverman and Rao: setting \(r = \left\lceil \sqrt{H(X|Y)}\right\rceil \) above, we obtain a protocol with expected communication at most \(H(X|Y) + 2\sqrt{H(X|Y)} + O\left( \log _2\left( \frac{1}{\varepsilon }\right) \right) \). In [4], a one-shot interactive analogue of the Slepian-Wolf theorem was established for bounded-round communication: it was shown that Alice may transmit X to Bob using at most \(O(H(X|Y) + 1)\) bits and O(1) rounds on average. More specifically, that protocol transmits at most \(4H(X|Y) +\log _2(1/\varepsilon )+O(1)\) bits on average in 10 rounds on average. Setting \(r = \lceil H(X|Y)\rceil \) above, we improve this result as well: we obtain a protocol with expected length at most \(2H(X|Y) + O\left( \log _2\left( \frac{1}{\varepsilon }\right) \right) \) and expected number of rounds at most 4.

Actually, a more general result was established in [3]. It was shown there that every one-round protocol \(\pi \) with information complexity I can be compressed to a (many-round) protocol with expected length at most

$$\begin{aligned} \approx I + 5\sqrt{I}. \end{aligned}$$
(1)

Using the result from [2], we improve the bound (1). Namely, we show that every one-round protocol \(\pi \) with information complexity I can be compressed to a (many-round) protocol with expected communication length at most

$$\approx I + 2\sqrt{I}.$$

Are there random variables X, Y for which an upper bound of the form \(H(X|Y) + O\left( \sqrt{H(X|Y)}\right) \) is tight? We make a step towards answering this question: we provide an example of random variables X, Y such that every public-coin communication protocol which transmits X from Alice to Bob with error probability \(\varepsilon \) (with respect to the input distribution and the protocol’s randomness) must communicate at least \(H(X|Y) + \varOmega \left( \log _2\left( \frac{1}{\varepsilon }\right) \right) \) bits on average.

2 Definitions

2.1 Information Theory

Let X, Y be two jointly distributed random variables taking values in finite sets \(\mathcal {X}\) and \(\mathcal {Y}\), respectively.

Definition 1

The Shannon entropy of X is defined by the formula

$$H(X) = \sum \limits _{x\in \mathcal {X}}\Pr [X = x]\log _2\left( \frac{1}{\Pr [X = x]}\right) .$$

Definition 2

The conditional Shannon entropy of X with respect to Y is defined by the formula

$$H(X|Y) = \sum \limits _{y\in \mathcal {Y}}H(X|Y = y)\Pr [Y = y],$$

where \(X|Y = y\) denotes the distribution of X conditioned on the event \(\{Y = y\}\).

If X is uniformly distributed in \(\mathcal {X}\), then obviously \(H(X) = \log _2(|\mathcal {X}|)\). We will also use the fact that the formula for conditional entropy may be rewritten as

$$ H(X|Y) = \sum \limits _{(x, y)\in \mathcal {X}\times \mathcal {Y}}\Pr [X = x, Y = y]\log _2\left( \frac{1}{\Pr [X = x|Y = y]}\right) . $$

A generalization of Shannon entropy is Rényi entropy.

Definition 3

The Rényi entropy (of order 2) of X is defined by the formula

$$H_2(X) = -\log _2\left( \sum \limits _{x\in \mathcal {X}}\Pr [X = x]^2\right) .$$

Concavity of \(\log \) implies that \(H(X) \ge H_2(X)\).
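To make these definitions concrete, the following short Python sketch (our illustration, not from the paper) computes \(H(X)\), \(H(X|Y)\) and \(H_2(X)\) from an explicit joint distribution table; the particular table below is an arbitrary toy example.

```python
import math
from collections import defaultdict

# An arbitrary toy joint distribution Pr[X = x, Y = y].
joint = {('a', 0): 0.3, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.4}

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

def shannon_entropy(dist):
    """H(X) = sum_x Pr[X = x] * log2(1 / Pr[X = x])."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def conditional_entropy(joint, p_y):
    """H(X|Y) = sum_{x,y} Pr[X = x, Y = y] * log2(1 / Pr[X = x | Y = y])."""
    return sum(p * math.log2(p_y[y] / p) for (x, y), p in joint.items() if p > 0)

def renyi_entropy(dist):
    """H_2(X) = -log2(sum_x Pr[X = x]^2)."""
    return -math.log2(sum(p * p for p in dist.values()))

print(shannon_entropy(p_x))             # H(X)
print(conditional_entropy(joint, p_y))  # H(X|Y) <= H(X)
print(renyi_entropy(p_x))               # H_2(X) <= H(X)
```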

The mutual information of two random variables X and Y, conditioned on another random variable Z, can be defined as:

$$ I(X:Y|Z) = H(X|Z) - H(X|Y, Z). $$

For a further introduction to information theory see, for example, [11].

2.2 Communication Protocols

Assume that we are given jointly distributed random variables X and Y, taking values in finite sets \(\mathcal {X}\) and \(\mathcal {Y}\). Let \(R_A, R_B\) be random variables, taking values in finite sets \(\mathcal {R_A}\) and \(\mathcal {R_B}\), such that \((X, Y), R_A, R_B\) are mutually independent.

Definition 4

A private-coin communication protocol is a rooted binary tree in which each non-leaf vertex is associated either with Alice or with Bob. For each non-leaf vertex v associated with Alice there is a function \(f_v:\mathcal {X}\times \mathcal {R_A}\rightarrow \{0, 1\}\), and for each non-leaf vertex u associated with Bob there is a function \(g_u:\mathcal {Y}\times \mathcal {R_B}\rightarrow \{0, 1\}\). For each non-leaf vertex, one of the two outgoing edges is labeled by 0 and the other by 1. Finally, for each leaf l there is a function \(\phi _l:\mathcal {Y}\times \mathcal {R_B}\rightarrow \mathcal {O}\), where \(\mathcal {O}\) denotes the set of Bob’s possible outputs.

A computation according to a protocol runs as follows. Alice is given \(x \in \mathcal {X}\), Bob is given \(y \in \mathcal {Y}\). Assume that \(R_A\) takes the value \(r_a\) and \(R_B\) takes the value \(r_b\). Alice and Bob start at the root of the tree. If they are at a non-leaf vertex v associated with Alice, then Alice sends \(f_v(x, r_a)\) to Bob and they follow the edge labeled by \(f_v(x, r_a)\). If they are at a non-leaf vertex u associated with Bob, then Bob sends \(g_u(y, r_b)\) to Alice and they follow the edge labeled by \(g_u(y, r_b)\). When they reach a leaf l, Bob outputs the result \(\phi _l(y, r_b)\).
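The tree-walk in this definition can be summarized by a small Python sketch (our illustration; the class and function names are hypothetical and not part of the paper):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Node:
    owner: Optional[str] = None      # 'A' or 'B' for a non-leaf vertex, None for a leaf
    send: Optional[Callable] = None  # f_v(x, r_a) for Alice's vertices, g_u(y, r_b) for Bob's
    children: Optional[Dict[int, 'Node']] = None  # outgoing edges labeled 0 and 1
    output: Optional[Callable] = None             # phi_l(y, r_b) at a leaf

def run_protocol(root: Node, x, y, r_a, r_b):
    """Walk the protocol tree; return (Bob's output, transcript of the sent bits)."""
    v, transcript = root, []
    while v.owner is not None:
        bit = v.send(x, r_a) if v.owner == 'A' else v.send(y, r_b)
        transcript.append(bit)
        v = v.children[bit]
    return v.output(y, r_b), transcript
```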

A protocol is called deterministic if \(f_v, g_u\) and \(\phi _l\) do not depend on the values of \(R_A, R_B\).

A randomized communication protocol is a distribution over private-coin protocols with the same \(\mathcal {X}\) for Alice and the same \(\mathcal {Y}\) for Bob. The random variable with this distribution (public randomness) is denoted below by R. Before the execution starts, Alice and Bob sample R to choose the private-coin protocol to be executed.

A protocol is called public-coin if it is a distribution over deterministic protocols.

We distinguish between average-case communication complexity and worst-case communication complexity. The (worst-case) communication complexity of a protocol \(\pi \), denoted by \(CC(\pi )\), is defined as the maximal depth of a leaf that Alice and Bob may reach in \(\pi \).

We say that a protocol \(\pi \) communicates d bits on average (or that the expected length of the protocol is equal to d) if the expected depth of the leaf that Alice and Bob reach during the execution of \(\pi \) is equal to d, where the expectation is taken over X, Y and the protocol’s randomness.

For a further introduction to communication complexity see [5].

3 Slepian-Wolf Theorem with Interaction

Consider the following auxiliary problem. Let A be a finite set. Assume that Alice receives an arbitrary \(a\in A\) and Bob receives an arbitrary probability distribution \(\mu \) on A. Alice wants to communicate a to Bob using about \(\log _2(1/\mu (a))\) bits with a small probability of error.

Lemma 1

Let \(\varepsilon \) be a positive real and r a positive integer. There exists a public-coin randomized communication protocol such that for all a in the support of \(\mu \) the following hold:

  • at the end of the communication Bob outputs some \(b\in A\), which is equal to a with probability at least \(1 - \varepsilon \);

  • the protocol communicates at most

    $$ \log _2\left( \frac{1}{\mu (a)}\right) + \frac{\log _2\left( \frac{1}{\mu (a)}\right) }{r} + r +\log _2\left( \frac{1}{\varepsilon }\right) + 2 $$

    bits, regardless of the randomness.

  • the number of rounds in the protocol does not exceed

    $$ \frac{2\log _2\left( \frac{1}{\mu (a)}\right) }{r} + 2.$$

Proof

Alice and Bob interpret each portion of |A| consecutive bits from the public randomness source as the table of a random function \(h:A\rightarrow \{0,1\}\). That is, we will think of them as having access to a large enough family of mutually independent random functions of the type \(A\rightarrow \{0,1\}\). These functions will be called hash functions and their values hash values below.

First, set \(k = \left\lceil \log _2\left( \frac{1}{\varepsilon }\right) \right\rceil \). Then, for all \(i = 0,1,\dots \), Bob sets:

$$S_i = \left\{ x\in A\,|\,\mu (x) \in (2^{-i - 1}, 2^{-i}]\right\} .$$

At the beginning Alice sends k hash values of a. Then Alice and Bob work in stages numbered \(1,2\dots \).

On Stage t:

  1. Alice sends r new hash values of a to Bob so that the total number of hash values of a available to Bob is \(k + rt\).

  2. For each \(i \in \{r(t - 1), \ldots , rt - 1\}\) Bob computes the set \(S_i^\prime \), which consists of all elements of \(S_i\) that agree with all of Alice’s hash values.

  3. If there exists \(i \in \{r(t - 1), \ldots , rt - 1\}\) such that \(S_i^\prime \ne \varnothing \), then Bob sends 1 to Alice, outputs any element of \(S_i^\prime \), and they terminate. Otherwise Bob sends 0 to Alice and they proceed to Stage \(t + 1\).

Let us first show that the protocol terminates for all a in the support of \(\mu \). Assume that Alice has a and Bob has \(\mu \). Let \(i = \left\lfloor \log _2\left( \frac{1}{\mu (a)}\right) \right\rfloor \), so that \(a\in S_i\). The protocol terminates on Stage t where

$$\begin{aligned} r(t - 1) \le i \le rt - 1 \end{aligned}$$
(2)

or earlier. Indeed, all hash values of a available to Bob on Stage t coincide with the hash values of some element of \(S_i\) (for instance, with those of a itself).

Thus Alice sends at most \(k + rt\) bits to Bob and Bob sends at most t bits to Alice. The left-hand side of (2) implies that \(t \le \frac{i}{r} + 1\). Therefore Alice’s communication is bounded by

$$ \begin{aligned} k + rt&\le k + r\left( \frac{i}{r} + 1\right) \\&= \left\lceil \log _2\left( \frac{1}{\varepsilon }\right) \right\rceil + i + r\\&\le \log _2\left( \frac{1}{\mu (a)}\right) + r + \log _2\left( \frac{1}{\varepsilon }\right) +1, \end{aligned} $$

and Bob’s communication is bounded by

$$t\le \frac{i}{r} + 1 \le \frac{\log _2\left( \frac{1}{\mu (a)}\right) }{r} + 1.$$

These two bounds imply that the total communication length is at most \( \log _2\left( \frac{1}{\mu (a)}\right) + \frac{\log _2\left( \frac{1}{\mu (a)}\right) }{r} + r + \log _2\left( \frac{1}{\varepsilon }\right) + 2\). The number of rounds equals the length of Bob’s communication, multiplied by 2. Hence this number is at most \(\frac{2\log _2\left( \frac{1}{\mu (a)}\right) }{r} + 2\). We conclude that the communication and the number of rounds are as short as required.

It remains to bound the error probability. An error may occur if, for some t, a set \(S_i\) considered on Stage t has an element \(b\ne a\) which agrees with the hash values sent by Alice. At that time Bob already has \(k+rt\ge k+i +1\) hash values, where the inequality follows from (2). The probability that \(k+i + 1\) hash values of b coincide with those of a is \(2^{-k-i -1}\). Hence by the union bound the error probability does not exceed

$$ \begin{aligned} \sum \limits _{i = 0}^{\infty } |S_i|2^{-k - i - 1}&= 2^{-k}\sum \limits _{i = 0}^\infty |S_i| 2^{-i - 1} < 2^{-k}\sum \limits _{i = 0}^\infty \sum \limits _{x\in S_i}\mu (x)\\&= 2^{-k}\sum \limits _{x\in A}\mu (x)= 2^{-k} = 2^{- \left\lceil \log _2\left( \frac{1}{\varepsilon }\right) \right\rceil } \le \varepsilon . \end{aligned} $$
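For concreteness, here is a Python sketch of the protocol of Lemma 1 (our own illustration, not the authors’ code). It makes two simplifying assumptions: the shared hash functions are simulated by deterministic pseudorandom bits derived from a common seed, and the two parties are simulated sequentially inside one function; the names transmit and hash_bit are hypothetical.

```python
import math, random

def hash_bit(seed, index, element):
    """The shared random bit h_index(element); a stand-in for the public random hash functions."""
    return random.Random(f"{seed}-{index}-{element}").getrandbits(1)

def transmit(a, mu, r, eps, seed=0):
    """Bob (holding the distribution mu) learns Alice's element a (assumed to be in supp(mu)).

    Returns (Bob's guess, total number of bits sent)."""
    k = math.ceil(math.log2(1 / eps))
    bucket = lambda x: math.floor(math.log2(1 / mu[x]))      # x in S_i  <=>  bucket(x) == i
    alice_bits = [hash_bit(seed, j, a) for j in range(k)]    # the initial k hash values of a
    bits_sent, t = k, 0
    while True:
        t += 1
        # Stage t: Alice sends r new hash values, so Bob now holds k + r*t of them.
        alice_bits += [hash_bit(seed, j, a) for j in range(k + r * (t - 1), k + r * t)]
        bits_sent += r
        for i in range(r * (t - 1), r * t):
            # S_i' = elements of S_i that agree with all hash values received so far.
            S_i_prime = [x for x in mu if mu[x] > 0 and bucket(x) == i
                         and all(hash_bit(seed, j, x) == alice_bits[j]
                                 for j in range(len(alice_bits)))]
            if S_i_prime:
                bits_sent += 1       # Bob sends 1 and the protocol terminates
                return S_i_prime[0], bits_sent
        bits_sent += 1               # Bob sends 0; proceed to Stage t + 1
```

For instance, under these assumptions transmit('c', {'a': 0.5, 'b': 0.25, 'c': 0.25}, r=2, eps=0.05) returns 'c' together with the number of bits sent, unless an unlikely hash collision occurs.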

Theorem 1

Let X, Y be jointly distributed random variables that take values in the finite sets \(\mathcal {X}\) and \(\mathcal {Y}\). Then for every positive \(\varepsilon \) and positive integer r there exists a public-coin protocol with the following properties.

  • For every pair \((x, y)\) from the support of \((X, Y)\), with probability at least \(1 - \varepsilon \) Bob outputs x;

  • The expected length of communication is at most

    $$H(X|Y) + \frac{H(X|Y)}{r} + r + \log _2\left( \frac{1}{\varepsilon }\right) + 2.$$
  • The expected number of rounds is at most

    $$\frac{2H(X|Y)}{r} + 2.$$

Proof

On input \((x, y)\), Alice and Bob run the protocol of Lemma 1 with \(A=\mathcal {X}\), \(a = x\) and \(\mu \) equal to the distribution of X, conditioned on the event \(Y = y\). Notice that Alice knows a and Bob knows \(\mu \).

Let us show that all the requirements are fulfilled for this protocol. The first requirement immediately follows from the first property of the protocol of Lemma 1.

From the second and the third properties of the protocol of Lemma 1 it follows that for the input pair \((x, y)\) our protocol communicates at most

$$ \log _2\left( \frac{1}{\Pr [X = x|Y = y]}\right) + \frac{\log _2\left( \frac{1}{\Pr [X = x|Y = y]}\right) }{r} + r +\log _2\left( \frac{1}{\varepsilon }\right) + 2$$

bits in at most

$$\frac{2\log _2\left( \frac{1}{\Pr [X = x|Y = y]}\right) }{r} + 2$$

rounds. Recalling that

$$ H(X|Y)=\sum \limits _{(x, y)\in \mathcal {X}\times \mathcal {Y}}\Pr [X = x, Y = y]\log _2\left( \frac{1}{\Pr [X = x|Y = y]}\right) $$

we see that, on average, the communication and the number of rounds are as short as required.
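In terms of the hypothetical transmit function sketched after the proof of Lemma 1, the reduction in this proof amounts to conditioning the joint distribution on Bob’s input (again only an illustration of ours):

```python
def transmit_given_y(x, y, joint, r, eps, seed=0):
    """Run the Lemma 1 protocol with mu = the distribution of X conditioned on Y = y."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    mu = {xx: p / p_y for (xx, yy), p in joint.items() if yy == y}
    return transmit(x, mu, r, eps, seed)
```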

Theorem 1 provides a trade-off between the communication and the number of rounds.

  • To obtain a protocol with minimal communication, set \(r =\left\lceil \sqrt{H(X|Y)}\right\rceil \). For such r the protocol communicates at most \(H(X|Y) + 2\sqrt{H(X|Y)} + O\left( \log _2\frac{1}{\varepsilon }\right) \) bits on average (a short calculation is given after this list).

  • To obtain a protocol with a constant number of rounds on average, set, for example, \(r = \lceil H(X|Y)\rceil \). For such r the protocol communicates at most \(2H(X|Y) + O\left( \log _2\frac{1}{\varepsilon }\right) \) bits on average in at most 4 rounds on average.

  • In the same manner, for every \(\delta \in (0, 0.5)\) we can obtain a protocol with expected communication at most \(H(X|Y) + O\left( H(X|Y)^{0.5 + \delta }\right) \) and expected number of rounds at most \(O\left( H(X|Y)^{0.5 - \delta }\right) \).
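For instance, the first bullet follows by substituting \(r = \left\lceil \sqrt{H(X|Y)}\right\rceil \) into the bound of Theorem 1; writing H for H(X|Y) and using \(\sqrt{H}\le r \le \sqrt{H} + 1\) (assuming \(H > 0\)):

$$H + \frac{H}{r} + r + \log _2\left( \frac{1}{\varepsilon }\right) + 2 \le H + \sqrt{H} + \left( \sqrt{H} + 1\right) + \log _2\left( \frac{1}{\varepsilon }\right) + 2 = H + 2\sqrt{H} + \log _2\left( \frac{1}{\varepsilon }\right) + 3.$$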

Remark. One may wonder whether there exists a private-coin communication protocol with the same properties as the protocol of Theorem 1. Newman’s theorem [6] states that every public-coin protocol can be transformed into a private-coin protocol at the expense of increasing the error probability by \(\delta \) and the worst-case communication by \(O(\log \log |\mathcal {X}\times \mathcal {Y}|+\log 1/\delta )\) (for any positive \(\delta \)). Lemma 1 provides an upper bound for the error probability and communication of our protocol for each pair of inputs. Repeating the arguments from the proof of Newman’s theorem, we are able to transform the public-coin protocol of Lemma 1 into a private-coin one with the same trade-off between the increase of error probability and the increase of communication length. It follows that for our problem there exists a private-coin communication protocol which errs with probability at most \(\varepsilon \) and communicates on average as many bits as the public-coin protocol from Theorem 1 plus extra \(O(\log \log |\mathcal {X}\times \mathcal {Y}|)\) bits.

4 One-Round Compression

The information complexity of a protocol \(\pi \) with inputs \((X, Y)\) is defined as

$$\begin{aligned} IC_\mu (\pi )&= I(X : \varPi |Y, R) + I(Y : \varPi |X, R)\\&= I(X : \varPi |Y, R,R_B) + I(Y : \varPi |X, R,R_A)\\&= I(X : \varPi ,R,R_B|Y) + I(Y : \varPi ,R,R_A|X), \end{aligned}$$

where \(R,R_A,R_B\) denote the shared, Alice’s, and Bob’s randomness respectively, \(\mu \) stands for the distribution of \((X, Y)\), and \(\varPi \) stands for the concatenation of all bits sent in \(\pi \) (\(\varPi \) is called the transcript). The first term equals the information that Bob learns about Alice’s input, and the second term equals the information that Alice learns about Bob’s input. Information complexity is an important concept in communication complexity; for example, it plays a crucial role in the direct-sum problem [10].

We will consider the special case when \(\pi \) is one-round. In this case Alice sends a single message \(\varPi \) to Bob, then Bob outputs the result (based on his input, his randomness, and Alice’s message) and the protocol terminates. Since Alice learns nothing, the information complexity can be rewritten as

$$I = IC_\mu (\pi ) = I(X:\varPi |Y, R).$$

Our goal is to simulate a given one-round protocol \(\pi \) by another protocol \(\tau \) which has the same input space \((X, Y)\) and whose expected communication complexity is close to I. The new protocol \(\tau \) may be many-round. The quality of the simulation will be measured by the statistical distance. The statistical distance between random variables A and B, both taking values in a set V, equals

$$\delta (A, B) = \max \limits _{U\subset V} \left| \Pr [A \in U] - \Pr [B \in U]\right| .$$
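Equivalently, the statistical distance is half the \(\ell _1\) distance between the two distributions; a small Python helper (ours, illustrative) for distributions given as dictionaries:

```python
def statistical_distance(p, q):
    """delta(A, B) = max_U |Pr[A in U] - Pr[B in U]| = (1/2) * sum_v |p(v) - q(v)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)
```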

One of the main results of [3] is the following theorem.

Theorem 2

For every one-round protocol \(\pi \) and for every probability distribution \(\mu \) there is a public-coin protocol \(\tau \) with expected length (with respect to \(\mu \) and the randomness of \(\tau \)) at most \(I + 5\sqrt{I} + O\left( \log _2\frac{1}{\varepsilon }\right) \) such that for each pair of inputs \((x, y)\) after termination of \(\tau \) Bob outputs a random variable \(\varPi ^\prime \) with \(\delta \left( \left( \varPi |X = x, Y = y\right) , \left( \varPi ^\prime |X = x, Y = y\right) \right) \le \varepsilon \).

We will show that Theorem 1, together with the main result of [2], implies that we can replace \(5\sqrt{I}\) by about \(2\sqrt{I}\) in this theorem. More specifically:

Theorem 3

For every one-round protocol \(\pi \) and for every probability distribution \(\mu \) there is a public-coin protocol \(\tau \) with expected length (with respect to \(\mu \) and the randomness of \(\tau \)) at most

$$I + \log _2(I + O(1)) + 2\sqrt{I + \log _2(I + O(1))} + O\left( \log _2\frac{1}{\varepsilon }\right) $$

such that for each pair of inputs \((x, y)\) in the protocol \(\tau \) Bob outputs \(\varPi ^\prime \) with \(\delta \left( \left( \varPi |X = x, Y = y\right) , \left( \varPi ^\prime |X = x, Y = y\right) \right) \le \varepsilon \).

We want to transmit Alice’s message \(\varPi \) to Bob (who knows Y and the randomness R) in many rounds so that the expected communication length is small. By Theorem 1 this task can be solved with error \(\varepsilon \) using expected communication

$$\begin{aligned} H(\varPi |Y,R) + 2\sqrt{H(\varPi |Y,R)} + O\left( \log _2 \frac{1}{\varepsilon }\right) . \end{aligned}$$
(3)

Assume first that the original protocol \(\pi \) uses only public randomness. Then

$$ I = I(X:\varPi |Y, R) = H(\varPi |Y, R) - H(\varPi |X, Y, R) = H(\varPi |Y, R). $$

Indeed, \(H(\varPi |X, Y, R) = 0\), since \(\varPi \) is determined by X and R. Thus (3) becomes

$$ I + 2\sqrt{I} + O\left( \log _2 \frac{1}{\varepsilon }\right) $$

and we are done.

In the general case, when the original protocol uses private randomness, I can be much smaller than \(H(\varPi |Y, R)\). Fortunately, by the following theorem from [2], we can remove private coins from the protocol with only a slight increase in information complexity.

Theorem 4

For every one-round protocol \(\pi \) and for every probability distribution \(\mu \) there is a one-round public-coin protocol \(\pi ^\prime \) with information complexity \(IC_\mu (\pi ^\prime ) \le I + \log _2 (I + O(1))\) such that for each pair of inputs \((x, y)\) in the protocol \(\pi '\) Bob outputs \(\varPi ^\prime \) for which \(\varPi ^\prime |X =x, Y = y\) and \(\varPi |X = x, Y = y\) are identically distributed.

Combining this theorem with our main result (Theorem 1), we obtain Theorem 3.

5 A Lower Bound for the Average-Case Communication

Let \((X, Y)\) be a pair of jointly distributed random variables. Assume that \(\pi \) is a deterministic protocol to transmit X from Alice to Bob who knows Y. Let \(\pi (X, Y)\) stand for the result output by the protocol \(\pi \) for the input pair \((X, Y)\). We assume that this result is correct with probability at least \(1-\varepsilon \):

$$ \Pr [\pi (X, Y)\ne X]\le \varepsilon . $$

It is not hard to see that in this case the expected communication length cannot be much less than H(X|Y) bits. Moreover, this holds even when only the communication from Alice to Bob is counted.

Proposition 1

For every deterministic protocol as above the expected communication from Alice to Bob is at least \(H(X|Y) - \varepsilon \log _2|\mathcal {X}| - 1.\)

The proof of this proposition is omitted due to space constraints.

There are random variables for which this lower bound is tight. For instance, let Y be empty and let X take each value \(x\in \{0,1\}^n\) with probability \(\varepsilon /2^n\) and the empty string with the remaining probability \(1-\varepsilon \). Then the trivial protocol with no communication does the job with error probability \(\varepsilon \), and \(H(X|Y)\approx \varepsilon \log _2|\mathcal {X}|\).

In this section we consider the following question: are there random variables \((X, Y)\) for which every public-coin communication protocol requires expected communication significantly larger than H(X|Y), say close to the upper bound \(H(X|Y) + 2\sqrt{H(X|Y)} + \log _2\left( \frac{1}{\varepsilon }\right) \) of Theorem 1?

Orlitsky [7] showed that if no error is allowed and the support of \((X, Y)\) is a Cartesian product, then every deterministic protocol must communicate at least H(X) bits on average.

Proposition 2

Let \((X, Y)\) be a pair of jointly distributed random variables whose support is a Cartesian product. Assume that \(\pi \) is a deterministic protocol which transmits X from Alice to Bob who knows Y and

$$\Pr [\pi (X, Y) \ne X] = 0.$$

Then the expected length of \(\pi \) is at least H(X).

This result can be easily generalized to the case when \(\pi \) is public-coin.

The main result of this section states that there are random variables \((X, Y)\) such that transmission of X from Alice to Bob with error probability \(\varepsilon \) requires \(H(X|Y) + \varOmega \left( \log _2\left( \frac{1}{\varepsilon }\right) \right) \) bits on average.

The random variables X, Y are specified by two parameters, \(\delta \in (0, 1/2)\) and \(n\in \mathbb {N}\). Both random variables take values in \(\{0, 1, \ldots , n\}\) and are distributed as follows: Y is distributed uniformly in \(\{0, 1, \ldots , n\}\); \(X=Y\) with probability \(1-\delta \), and with the remaining probability \(\delta \) the variable X is uniformly distributed in \(\{0, 1, \ldots , n\}\setminus \{Y\}\). That is,

$$ \Pr [X = i, Y = j] = \frac{(1 - \delta )\delta _{ij} + \frac{\delta }{n}(1 - \delta _{ij})}{n + 1}, $$

where \(\delta _{ij}\) stands for the Kronecker delta. Notice that X is uniformly distributed on \(\{0, 1, \ldots , n\}\) as well. A straightforward calculation reveals that

$$\Pr [X = i|Y = j] = \frac{\Pr [X = i, Y = j]}{\Pr [Y = j]} = (1 - \delta -\frac{\delta }{n})\delta _{ij} + \frac{\delta }{n}$$

and

$$H(X|Y) = (1 - \delta )\log _2\left( \frac{1}{1 - \delta }\right) + \delta \log _2\left( \frac{n}{\delta }\right) = \delta \log _2 n + O(1).$$
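As a quick numeric sanity check of the last formula (a hypothetical script of ours, not part of the paper), one can evaluate \(H(X|Y)\) for \(\delta = 1/4\) and compare it with \(\delta \log _2 n\):

```python
import math

def h_x_given_y(delta, n):
    """H(X|Y) for this distribution: Pr[X = i | Y = j] is (1 - delta) if i = j and delta/n otherwise."""
    return (1 - delta) * math.log2(1 / (1 - delta)) + delta * math.log2(n / delta)

for n in (10**2, 10**4, 10**6):
    print(n, round(h_x_given_y(0.25, n), 3), round(0.25 * math.log2(n), 3))
```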

We will think of \(\delta \) as a constant, say 1/4. For one-way protocols we are able to show that the communication length must be close to \(\log _2 n\), which is about \(1/\delta \) times larger than H(X|Y):

Proposition 3

Assume that \(\pi \) is a one-way deterministic protocol, which transmits X from Alice to Bob who knows Y and

$$ \Pr [\pi (X, Y) \ne X] \le \varepsilon . $$

Then the expected length of \(\pi \) is at least \(\left( 1 - \frac{\varepsilon }{\delta }\right) \log _2(n + 1) - 2\).

This result explains why the one-way one-shot analogue of the Slepian–Wolf theorem is not possible.

Proof

Let S be the number of leaves in \(\pi \). For each \(j\in \{0, 1, \ldots , n\}\)

$$\#\left\{ i\in \{0, 1, \ldots , n\}\,\middle |\,\pi (i, j) = i \right\} \le S.$$

Hence the error probability \(\varepsilon \) is at least \((n + 1 - S)\frac{\delta }{n}\). This implies that

$$ S \ge n\left( 1 - \frac{\varepsilon }{\delta }\right) + 1\ge (n + 1)\left( 1 - \frac{\varepsilon }{\delta }\right) . $$

Let \(\varPi (X)\) denote the leaf Alice and Bob reach in \(\pi \) (since the protocol is one-way, the leaf depends only on X). The expected length of \(\varPi (X)\) is at least \(H(\varPi )\) (identify each leaf with the binary string, written on the path from the root to this leaf in the protocol tree; the set of all these strings is prefix–free). Let \(l_1, l_2, \ldots , l_S\) be the list of all leaves in the support of the random variable \(\varPi (X)\). As X is distributed uniformly, we have

$$\Pr [\varPi = l_i]\ge \frac{1}{n + 1}$$

for all i. The statement follows from

Lemma 2

Assume that \(p_1, \ldots , p_k, q_1, \ldots , q_k\in (0, 1)\) satisfy

$$ \sum \limits _{i = 1}^kp_i = 1,$$
$$\forall i\in \{1, \ldots , k\}\qquad p_i\ge q_i. $$

Then

$$ \sum \limits _{i = 1}^kp_i\log _2\frac{1}{p_i}\ge \sum \limits _{i = 1}^kq_i\log _2\frac{1}{q_i} - 2. $$

The proof of this technical lemma is omitted due to space constraints. The lemma implies that

$$ \begin{aligned} H(\varPi )&= \sum \limits _{i = 1}^S \Pr [\varPi = l_i]\log _2\left( \frac{1}{\Pr [\varPi = l_i]}\right) \\&\ge \frac{S}{n + 1}\log _2(n + 1) - 2\ge \left( 1 - \frac{\varepsilon }{\delta }\right) \log _2(n + 1) - 2. \end{aligned} $$

The next theorem states that for any fixed \(\delta \) every two-way public-coin protocol with error probability \(\varepsilon \) must communicate about \(H(X|Y)+(1-\delta )\log _2(1/\varepsilon )\) bits on average.

Theorem 5

Assume that \(\pi \) is a public-coin communication protocol which transmits X from Alice to Bob who knows Y and

$$\Pr [X^\prime \ne X] \le \varepsilon ,$$

where \(X^\prime \) denotes Bob’s output and the probability is taken with respect to the input distribution and the public randomness of \(\pi \). Then the expected length of \(\pi \) is at least

$$(1 - \delta - \delta /n)\log _2\left( \frac{\delta }{\varepsilon + \delta /n}\right) + (\delta - 2\varepsilon )\log _2(n + 1) - 2\delta .$$

The lower bound in this theorem is quite complicated and comes from its proof. To understand this bound assume that \(\delta \) is a constant, say \(\delta =1/4\), and \(\frac{1}{n}\le \varepsilon \le \frac{1}{\log _2n}\). Then \(H(X|Y)=(1/4)\log _2 n+O(1)\) and the lower bound becomes

$$\begin{aligned} \left( 1 - \frac{1}{4} - \frac{1}{4n}\right) \log _2\left( \frac{\frac{1}{4}}{\varepsilon + \frac{1}{4n}}\right) + (1/4 - 2\varepsilon )\log _2(n + 1) - \frac{1}{2}. \end{aligned}$$

Condition \(\frac{1}{n}\le \varepsilon \) implies that the first term is equal to

$$(3/4)\log _2\left( \frac{1}{\varepsilon }\right) - O(1).$$

Condition \(\varepsilon \le \frac{1}{\log _2 n}\) implies that the second term is equal to

$$(1/4)\log _2 n - O(1).$$

Therefore under these conditions the lower bound becomes

$$(1/4)\log _2 n + (3/4)\log _2\left( \frac{1}{\varepsilon }\right) - O(1) = H(X|Y) + (3/4)\log _2\left( \frac{1}{\varepsilon }\right) - O(1).$$

Proof

Let us start with the case when \(\pi \) is deterministic. Let \(\varPi =\varPi (X, Y)\) denote the leaf Alice and Bob reach in the protocol \(\pi \) for the input pair \((X, Y)\). As we have seen, the expected length of communication is at least the entropy \(H(\varPi (X,Y))\). Let \(l_1, \ldots , l_S\) denote all the leaves in the support of the random variable \(\varPi (X, Y)\). The set \(\{(x, y)\,|\, \varPi (x, y) = l_i\}\) is a combinatorial rectangle \(R_i\subset \{0, 1, \ldots , n\}\times \{0, 1, \ldots , n\}\). Imagine \(\{0, 1, \ldots , n\}\times \{0, 1, \ldots , n\}\) as a table in which Alice owns the columns and Bob owns the rows. Let \(h_i\) be the height of \(R_i\), let \(w_i\) be its width, and let \(d_i\) stand for the number of diagonal elements in \(R_i\) (pairs of the form \((j, j)\)). By the definition of \((X, Y)\) we have

$$\begin{aligned} \Pr [\varPi (X,Y) = l_i] = \frac{(1 - \delta )d_i}{n + 1} + \frac{\delta (h_i w_i - d_i)}{n(n + 1)}.\end{aligned}$$
(4)

The numbers \(\{\Pr [\varPi (X,Y) = l_i]\}_{i = 1}^S\) define a probability distribution over the set \(\{1, 2, \ldots , S\}\) and its entropy equals \(H(\varPi (X,Y))\). Equation (4) represents this distribution as a weighted sum of the following distributions: \( \left\{ \frac{d_i}{n + 1}\right\} _{i = 1}^S\) and \(\left\{ \frac{h_i w_i}{(n + 1)^2}\right\} _{i = 1}^S\). That is, Eq. (4) implies that

$$\{\Pr [\varPi = l_i]\}_{i = 1}^S = (1 - \delta - \delta /n) \left\{ \frac{d_i}{n + 1}\right\} _{i = 1}^S + (\delta + \delta /n)\left\{ \frac{h_i w_i}{(n + 1)^2}\right\} _{i = 1}^S.$$

Since entropy is concave, we have

$$\begin{aligned} \begin{aligned} H(\varPi )&= H\left( \{\Pr [\varPi = l_i]\}_{i = 1}^S \right) \\ {}&\ge (1 - \delta - \delta /n)H\left( \left\{ \frac{d_i}{n + 1}\right\} _{i = 1}^S\right) + (\delta + \delta /n)H\left( \left\{ \frac{h_i w_i}{(n + 1)^2}\right\} _{i = 1}^S\right) \end{aligned} \end{aligned}$$
(5)

The lower bound of the theorem follows from lower bounds on the entropies of these two distributions.

A lower bound for \(H\left( \left\{ \frac{d_i}{n + 1}\right\} _{i = 1}^S\right) \). In each row of \(R_i\) there is at most one element \((x, y)\) for which \(\pi (x, y) = x\) (at the leaf \(l_i\) Bob’s output is determined by his row). The rectangle \(R_i\) contains \(d_i\) diagonal elements and, being a combinatorial rectangle, it contains all \(d_i^2\) pairs formed by their rows and columns; hence there are at least \(d_i^2 - d_i\) elements \((x, y)\) in \(R_i\) for which \(\pi (x, y)\ne x\). Summing over all i we get

$$\varepsilon \ge \sum \limits _{i = 1}^S \frac{\delta (d_i^2 - d_i)}{n(n + 1)}$$

and thus

$$\sum \limits _{i = 1}^S \left( \frac{d_i}{n + 1}\right) ^2 \le \frac{\varepsilon + \delta /n}{\delta }.$$

Since Rényi entropy is a lower bound for Shannon entropy, we have

$$ H\left( \left\{ \frac{d_i}{n + 1}\right\} _{i = 1}^S\right) \ge \log _2\left( \frac{1}{\sum \limits _{i = 1}^S \left( \frac{d_i}{n + 1}\right) ^2}\right) \ge \log _2\left( \frac{\delta }{\varepsilon + \delta /n}\right) . $$

A lower bound for \(H\left( \left\{ \frac{h_i w_i}{(n + 1)^2}\right\} _{i = 1}^S\right) \). In \(R_i\) there are at most \(h_i\) good pairs (pairs for which \(\pi \) works correctly). At most \(d_i\) of them have probability \(\frac{1 - \delta }{n + 1}\), and the remaining good pairs have probability \(\frac{\delta }{n(n + 1)}\). Hence

$$ \Pr [\varPi = l_i, \pi (X, Y) = X] \le \frac{(1 - \delta )d_i}{n + 1} + \frac{\delta (h_i - d_i)}{n(n + 1)} $$

and

$$ \begin{aligned} 1 - \varepsilon&\le \Pr [\pi (X, Y) = X] = \sum \limits _{i = 1}^S \Pr [\varPi = l_i, \pi (X, Y) = X]\\&\le \sum \limits _{i = 1}^S\left( \frac{(1 - \delta )d_i}{n + 1} + \frac{\delta (h_i - d_i)}{n(n + 1)}\right) = 1 - \delta - \delta /n + \frac{\delta }{n(n + 1)} \sum \limits _{i = 1}^S h_i. \end{aligned} $$

The last inequality implies that

$$\sum \limits _{i = 1}^S h_i \ge (1 - \varepsilon /\delta ) (n + 1)^2.$$

Since \(h_i \le n + 1\), we have

$$\begin{aligned}&\sum \limits _{i = 1}^S \frac{h_i w_i}{(n + 1)^2}\log _2\left( \frac{(n + 1)^2}{h_i w_i}\right) \ge \sum \limits _{i = 1}^S \frac{h_i w_i}{(n + 1)^2}\log _2\left( \frac{(n + 1)^2}{(n + 1) w_i}\right) \\&\quad = -\log _2(n + 1) + \sum \limits _{i = 1}^S h_i \frac{w_i}{(n + 1)^2}\log _2\left( \frac{(n + 1)^2}{w_i}\right) . \end{aligned}$$

Obviously \(\frac{w_i}{(n + 1)^2} \ge \frac{1}{(n + 1)^2}\). By Lemma 2 we get

$$ \begin{aligned} \sum \limits _{i = 1}^S h_i \frac{w_i}{(n + 1)^2}\log _2\left( \frac{(n + 1)^2}{w_i}\right)&\ge \left( \sum \limits _{i = 1}^Sh_i\right) \frac{1}{(n + 1)^2}\log _2\left( (n + 1)^2\right) - 2\\&\ge (2 - 2\varepsilon /\delta )\log _2(n + 1) - 2. \end{aligned} $$

Thus

$$ H\left( \left\{ \frac{h_i w_i}{(n + 1)^2}\right\} _{i = 1}^S\right) \ge (1 - 2\varepsilon /\delta )\log _2(n + 1) - 2,$$

and the theorem is proved for deterministic protocols.

Assume now that \(\pi \) is a public-coin protocol with public randomness R, and let r be a possible value of R. Let \(\pi _r\) stand for the deterministic communication protocol obtained from \(\pi \) by fixing \(R = r\). For any protocol \(\tau \), let \(\Vert \tau \Vert \) denote the random variable representing the communication length of \(\tau \) (which may depend on the input and the randomness). Finally, set \(\varepsilon _r = \Pr [X^\prime \ne X| R = r]\).

Note that \(\pi _r\) transmits X from Alice to Bob with error probability at most \(\varepsilon _r\) (with respect to input distribution). Since \(\pi _r\) is deterministic, the expected length of \(\pi _r\) is at least:

$$\mathrm {E}\Vert \pi _r\Vert \ge (1 - \delta - \delta /n)\log _2\left( \frac{\delta }{\varepsilon _r + \delta /n}\right) + (\delta - 2\varepsilon _r)\log _2(n + 1) - 2\delta .$$

Since \(\mathrm {E}_{r\sim R} \varepsilon _r = \varepsilon \) and by concavity of \(\log \):

$$\begin{aligned} \mathrm {E}\Vert \pi \Vert&= \mathrm {E}_{r\sim R} \mathrm {E}\Vert \pi _r\Vert \\&\ge \mathrm {E}_{r\sim R} \left[ (1 - \delta - \delta /n)\log _2\left( \frac{\delta }{\varepsilon _r + \delta /n}\right) + (\delta - 2\varepsilon _r)\log _2(n + 1) - 2\delta \right] \\&\ge (1 - \delta - \delta /n)\log _2\left( \frac{\delta }{\varepsilon + \delta /n}\right) + (\delta - 2\varepsilon )\log _2(n + 1) - 2\delta . \end{aligned}$$