1 Introduction

A fundamental question in complexity theory is how much resource is needed to solve k independent instances of a problem compared to the resource required to solve one instance. More specifically, suppose that solving one instance of a problem with probability of correctness p requires c units of some resource in a given model of computation. A natural way to solve k independent instances of the same problem is to solve them independently, which needs \(k \cdot c\) units of resource, and the overall success probability is \(p^k\). A strong direct product theorem for this problem would state that any algorithm which solves the k independent instances with \(o(k\cdot c)\) units of the resource can compute all k instances correctly with probability at most \(p^{\varOmega \,\left( k \right) }\). The weaker direct sum theorems state that if only \(o(k\cdot c)\) units of resource are provided for computing k independent instances of the problem, then the success probability for computing all k instances correctly is at most a constant \(q < 1\).

In this work, we are concerned with the model of communication complexity which was introduced by Yao [40]. In this model there are different parties who wish to compute a joint relation of their inputs. They do local computation, use public or private coins, and communicate to achieve this task. The resource that is counted is the number of bits communicated. The text by Kushilevitz and Nisan [27] is an excellent reference for this model.

Direct product and direct sum questions have been extensively investigated in different sub-models of communication complexity. Some examples of known direct product theorems are the theorem of Parnafes et al. [32] for forests of communication protocols and Shaltiel’s theorem [36] for the discrepancy bound (which is a lower bound on the distributional communication complexity) under the uniform distribution, which was extended to arbitrary distributions by Lee et al. [29], to the multi-party case by Viola and Wigderson [39], and to the generalized discrepancy bound by Sherstov [38]. Jain et al. [16] proved a direct product theorem for the subdistribution bound. Klauck et al. [26] proved one for the quantum communication complexity of the set disjointness problem, and Klauck [24] proved one for the public-coin communication complexity of the set disjointness problem (which was re-proven using different arguments by Jain [14]). Ben-Aroya et al. [4] showed one for the one-way quantum communication complexity of the index function problem. Jain showed a direct product theorem for randomized one-way communication complexity and for the conditional relative min-entropy bound [14], which is a lower bound on public-coin communication complexity. Recently, Jain and Yao [22] showed a strong direct product theorem in terms of the smooth rectangle bound. Later, Braverman and Weinstein [8] strengthened this result by showing a strong direct product theorem in terms of the (internal) information cost. Direct sum theorems have been shown in the public-coin one-way model [18], in the public-coin simultaneous message passing model [18], in the entanglement-assisted quantum one-way communication model [20], in the private-coin simultaneous message passing model [15], in the constant-round public-coin two-way model [5], and in the general two-way model [3]. On the other hand, strong direct product conjectures have been shown to be false by Shaltiel [36] in some models of distributional communication complexity (and of query complexity and circuit complexity) under specific choices of the error parameter. Examples of direct product theorems in other models of computation include Yao’s XOR lemma [41], Raz’s theorem [34] for two-prover games, Shaltiel’s theorem [36] for fair decision trees, the theorem of Nisan et al. [30] for decision forests, Drucker’s theorem [11] for randomized query complexity, Sherstov’s theorem [38] for approximate polynomial degree, and Lee and Roland’s theorem [28] for quantum query complexity. Besides their inherent importance, direct product theorems have found various important applications, such as in probabilistically checkable proofs [34], in circuit complexity [41], and in showing time-space trade-offs [1, 24, 26].

In this paper, we show a direct product theorem for two-party bounded-round public-coin randomized communication complexity. In this model, for computing a relation \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) (where \(\mathscr {X}\), \(\mathscr {Y}\), and \(\mathscr {Z}\) are finite sets), one party, say Alice, is given an input \(x\in \mathscr {X}\) and the other party, say Bob, is given an input \(y \in \mathscr {Y}\). They are supposed to do local computations using public coins shared between them, communicate for a fixed number of rounds, and at the end output an element \(z\in \mathscr {Z}\). We only consider complete relations, so such a z always exists. They succeed if \((x,y,z) \in f\). For a natural number \(t \ge 1\) and \(\varepsilon \in (0,1)\), let \(\mathrm {R}^{(t), \mathrm {pub}}_{\varepsilon } (f)\) be the two-party t-round public-coin communication complexity of f with worst case error \(\varepsilon \) (see Definition 2.13).

We show the following.

Theorem 1.1

Let \(\mathscr {X}\), \(\mathscr {Y}\), and \(\mathscr {Z}\) be finite sets, \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) a complete relation, \(\varepsilon > 0\), and \(k, t \ge 1\) integers. There exists a constant \(\kappa \ge 0\) such that

$$\begin{aligned} \mathrm {R}^{(t),\mathrm {pub}}_{1-\left( 1-\varepsilon /2 \right) ^{\varOmega \,\left( k \varepsilon ^2/t^2 \right) }} \left( f^k \right) = \varOmega \,\left( \frac{\varepsilon \cdot k}{t} \cdot \left( \mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f) - \frac{\kappa t^2}{\varepsilon } \right) \right) . \end{aligned}$$

In particular, it implies a strong direct product theorem for the two-party constant-round public-coin randomized communication complexity of all complete relations. Our result generalizes the result of Jain [14], which can be regarded as the special case \(t=1\). Prior to our result, randomized one-way communication complexity was the only model for which a strong direct product theorem had been established [14]. Hence our result can be considered important progress towards settling the strong direct product conjecture for two-party public-coin communication complexity, a major open question in this area. Recently, our result was improved by Braverman et al. [6] with a better dependence on the number of rounds, using a new sampling technique introduced in Ref. [7].

As a direct consequence of our result, we get a direct product theorem for the pointer chasing problem, defined as follows. Let \(n, t\ge 1\) be integers. Alice and Bob are given functions \(F_A: [n]\rightarrow [n]\) and \(F_B: [n]\rightarrow [n]\), respectively. Let \(F^t\) denote the alternate composition of \(F_A\) and \(F_B\) applied t times, starting with \(F_A\). The parties are supposed to communicate and determine \(F^t(1)\). In the bit version of the problem, the players are supposed to output the least significant bit of \(F^t(1)\). We refer to the t-pointer chasing problem as \(\mathrm {FP}_t\) and to the bit version as \(\mathrm {BP}_t\). The pointer chasing problem naturally captures the trade-off between the number of messages exchanged and the communication used. There is a straightforward t-round deterministic protocol with \(t\cdot \log n\) bits of communication for both \(\mathrm {FP}_t\) and \(\mathrm {BP}_t\) (a sketch is given below). However, if only \(t-1\) rounds of communication are allowed between the parties, exponentially more communication is required, treating t as a fixed constant. The communication complexity of this problem has been very well studied in both the classical and the quantum models [17, 23, 25, 31, 33]. Some tight lower bounds that we know so far are stated in Theorem 1.2 below.
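The straightforward protocol is easy to make concrete. The following is a minimal sketch in Python, where the encoding of \(F_A, F_B\) as dictionaries and the random instance are illustrative assumptions only: the parties alternately advance the pointer, starting with Alice, and each message is one pointer of \(\lceil \log n \rceil \) bits.

```python
import math, random

# A minimal sketch of the straightforward t-round protocol for FP_t (and BP_t):
# the pointer is advanced alternately, starting with Alice, and the current
# pointer (ceil(log2 n) bits) is sent in each round.  The random instance below
# is an illustrative assumption, not an input distribution used in the paper.

def naive_pointer_chasing(F_A, F_B, t, n):
    """Alice holds F_A, Bob holds F_B (functions from [n] to [n], as dicts).
    Returns (F^t(1), total number of bits communicated)."""
    pointer, bits_sent = 1, 0
    for round_no in range(1, t + 1):
        if round_no % 2 == 1:            # odd rounds: Alice applies F_A and sends
            pointer = F_A[pointer]
        else:                            # even rounds: Bob applies F_B and sends
            pointer = F_B[pointer]
        bits_sent += math.ceil(math.log2(n))
    return pointer, bits_sent            # bits_sent = t * ceil(log2 n)

n, t = 8, 3
rng = random.Random(0)
F_A = {i: rng.randrange(1, n + 1) for i in range(1, n + 1)}
F_B = {i: rng.randrange(1, n + 1) for i in range(1, n + 1)}
answer, cost = naive_pointer_chasing(F_A, F_B, t, n)
print(answer, cost, answer % 2)          # FP_t value, communication, BP_t bit
```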

Theorem 1.2

 [33] For any integer \(t \ge 1\),

$$\begin{aligned} \mathrm {R}^{(t-1),\mathrm {pub}}_{1/3} \left( \mathrm {FP}_t \right)&\ge \varOmega \,\left( n\log ^{(t-1)}n \right) \\ \mathrm {R}^{(t-1),\mathrm {pub}}_{1/3} \left( \mathrm {BP}_t \right)&\ge \varOmega \,\left( n \right) \end{aligned}$$

As a consequence of Theorem 1.1 we get strong direct product results for this problem. Note that in the descriptions of \(\mathrm {FP}_t\) and \(\mathrm {BP}_t\), t is a fixed constant, independent of the input size.

Corollary 1.3

For integers \(t,k\ge 1\),

$$\begin{aligned} \mathrm {R}^{(t-1),\mathrm {pub}}_{1-2^{-\varOmega \,\left( k/t^2 \right) }} \left( \mathrm {FP}_t^k \right)&\ge \varOmega \,\left( \frac{k}{t} \cdot n \log ^{(t-1)}n \right) \\ \mathrm {R}^{(t-1),\mathrm {pub}}_{1-2^{-\varOmega \,\left( k/t^2 \right) }} \left( \mathrm {BP}_t^k \right)&\ge \varOmega \,\left( \frac{k}{t} \cdot n \right) . \end{aligned}$$
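For instance, the first bound can be read off by instantiating Theorem 1.1 with error \(\varepsilon = 1/3\) and \(t-1\) rounds and then plugging in the first bound of Theorem 1.2; since t is a fixed constant, the \(\kappa t^2/\varepsilon \) term is dominated for large n:

$$\begin{aligned} \mathrm {R}^{(t-1),\mathrm {pub}}_{1-\left( 5/6 \right) ^{\varOmega \,\left( k/t^2 \right) }} \left( \mathrm {FP}_t^k \right) = \varOmega \,\left( \frac{k}{t} \cdot \left( \mathrm {R}^{(t-1),\mathrm {pub}}_{1/3} \left( \mathrm {FP}_t \right) - O\,\left( t^2 \right) \right) \right) = \varOmega \,\left( \frac{k}{t} \cdot n\log ^{(t-1)}n \right) , \end{aligned}$$

and \(\left( 5/6 \right) ^{\varOmega \,\left( k/t^2 \right) } = 2^{-\varOmega \,\left( k/t^2 \right) }\). The bound for \(\mathrm {BP}_t\) follows in the same way from the second bound of Theorem 1.2.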

1.1 Our Techniques

We prove our direct product result using information theoretic arguments. Information theory is a versatile tool in communication complexity, especially in proving lower bounds and direct sum and direct product theorems [2, 3, 5, 9, 14, 15, 18–20]. Similar information theoretic arguments have also been used to prove parallel repetition theorems for two-prover one-round games [12, 34]. The broad argument that we use is as follows. For a given relation f, let c be the communication required for computing one instance with t rounds and constant success. Consider a protocol for computing \(f^k\) with t rounds and communication o(kc), and condition on success in some \(\ell \) coordinates. If the overall success probability in these \(\ell \) coordinates is already as small as we want, then we are done. Otherwise, we exhibit another coordinate j outside of these \(\ell \) coordinates such that success in the j-th coordinate, even when conditioned on success in these \(\ell \) coordinates, is bounded away from 1. This way the overall success probability keeps going down and eventually becomes exponentially small in k. We carry out this argument in the distributional setting, where one is concerned with the average error over inputs coming from a specified distribution rather than the worst case error over all inputs. The distributional setting is then related to the worst case setting by the well known Yao’s principle [40].

More concretely, let \(\mu \) be a distribution on \(\mathscr {X}\times \mathscr {Y}\), possibly non-product across \(\mathscr {X}\) and \(\mathscr {Y}\). Let c be the minimum communication required for computing f with t-round protocols having error at most \(\varepsilon \) averaged over \(\mu \). Let the inputs for \(f^k\) be drawn from the distribution \(\mu ^{k}\) (k independent copies of \(\mu \)). Consider a t-round protocol \(\mathscr {P}\) for \(f^k\) with communication o(kc) and, for the rest of the argument, condition on success in a set of coordinates C. If the probability of this event is as small as we desire, then we are done. Otherwise we exhibit a new coordinate \(j\notin C\) satisfying the following conditions, conditioned on success in all coordinates in C. The distribution of Alice’s and Bob’s inputs in the j-th coordinate (\(X_j Y_j\)) is quite close to \(\mu \). (Here we use the same symbol to represent a random variable and its distribution.) The joint distribution \(X_j Y_j M\), where M is the message transcript of \(\mathscr {P}\), can be approximated very well by Alice and Bob using a t-round protocol for f, when they are given input according to \(\mu \), with communication less than c. This shows that the success probability in the j-th coordinate must be bounded away from one.

To sample the transcript, we adopt the message compression protocol of Braverman and Rao [5], which they used to show a direct sum theorem for the same communication model that we consider. Informally, the protocol can be stated as follows.

Braverman–Rao protocol (informal)

Given a Markov chain \(Y \leftrightarrow X \leftrightarrow M\) (see Definition 2.1), there exists a public-coin protocol between Alice and Bob, with inputs X and Y, with a single message from Alice to Bob of \(O\,\left( \mathrm {I}\,\left( X \, \!: \, \!M \, \!\big \vert \, \!Y \right) + \sqrt{\mathrm {I}\,\left( X \, \!: \, \!M \, \!\big \vert \, \!Y \right) } \right) + 1\) bits, such that at the end of the protocol, Alice and Bob both possess a random variable \(M^{\prime }\) which is close to M in the \(\ell _1\) distance.

Consider the situation after conditioning on success in all the coordinates in C, as above, and let \(X_j Y_j\) represent the input in the j-th coordinate. The Braverman–Rao compression protocol cannot be directly applied at this point. Take the first message \(M_1\), sent by Alice, for instance: \(Y_j X_j M_1\) does not necessarily form a Markov chain. For example, suppose \(M_1\) is a message in which Alice tries to guess Bob’s input \(Y_j\), and the event of success is that Alice succeeds in doing so. Then it is easy to see that \(Y_jX_jM_1\) is not a Markov chain conditioned on success. However, we are able to show that \(Y_jX_jM_1\) is ‘close’ to being a Markov chain by further conditioning on appropriate sub-events. We then use a more ‘robust’ Braverman–Rao compression protocol (along the lines of the original), where by ‘robust’ we mean that the communication cost and the error do not vary much even for XYM which is only close to being a Markov chain. (Similar arguments were used by Jain in Ref. [14].) We then apply such a robust message compression protocol to each successive message. Conditioning on success in C incurs a small statistical loss for each message. Thus, the overall error is bounded, since the number of messages exchanged is bounded in our model. Recently, Braverman et al. introduced in Ref. [7] a new simulation whose statistical error is independent of the number of messages. Using this simulation, Braverman et al. [6] strengthened our result with a better dependence on the number of rounds.

Another difficulty in this argument is that since \(\mu \) may be a non-product distribution, the inputs of Alice and Bob in other coordinates may be correlated with each other’s input in the j-th coordinate when conditioned on success in C. We overcome this by introducing new random variables DU, conditioned on which Alice’s input is independent of Bob’s input. Namely, DU splits \(\mu ^k\) into a convex combination of product distributions.

This idea of splitting a non-product distribution into a convex combination of product distributions has been used in several previous works [2, 3, 5, 12, 14, 34, 35]. Without conditioning on success in all coordinates in C, \(D_{-j}U_{-j}\) is independent of \(X_jY_j\); this fact is sufficient for several direct sum results [2, 5]. However, after conditioning on success in all coordinates in C, \(D_{-j}U_{-j}\) is correlated with \(X_jY_j\). This leads us to use another important tool, namely the correlated sampling protocol, which was also used, for example, by Holenstein [12] in his proof of a strong direct product theorem for two-prover one-round games. We prove that \(D_{-j}U_{-j}\) can be sampled by Alice and Bob in a correlated fashion. Conditioned on \(D_{-j}U_{-j}\) and their own inputs, Alice and Bob are able to complete the remaining parts of XY.
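To make the role of correlated sampling concrete, the following is a minimal sketch in the spirit of the protocol used by Holenstein [12]: with shared public coins and no communication, each party samples from its own distribution, and the two samples agree with high probability whenever the distributions are close in total variation distance. The rejection-sampling formulation, the toy universe, and the distributions P and Q below are illustrative assumptions, not the exact construction used in this paper.

```python
import random

universe = ["a", "b", "c"]
P = {"a": 0.50, "b": 0.30, "c": 0.20}     # Alice's target distribution (assumed)
Q = {"a": 0.48, "b": 0.32, "c": 0.20}     # Bob's target distribution, close to P

def correlated_sample(dist, shared):
    """Return the first shared candidate x whose threshold alpha falls below
    dist[x]; conditioned on acceptance, x is distributed exactly according to dist."""
    for x, alpha in shared:
        if alpha <= dist[x]:
            return x
    return None                            # with 1000 candidates this is negligible

def trial(rng):
    # Shared public coins: a list of candidates (x, alpha), with x uniform over
    # the universe and alpha uniform in [0, 1], visible to both parties.
    shared = [(rng.choice(universe), rng.random()) for _ in range(1000)]
    return correlated_sample(P, shared) == correlated_sample(Q, shared)

rng = random.Random(0)
agreement = sum(trial(rng) for _ in range(2000)) / 2000
print(agreement)   # close to 1, since P and Q are close in total variation distance
```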

As mentioned previously, we build on the arguments used by Jain [14]. He showed a new characterization of two-party one-way public-coin communication complexity and used that characterization to show a strong direct product result for all relations in this model. We are unable to arrive at such a characterization for protocols with more than one message, so we use a more direct approach, as outlined above, to prove our direct product result.

1.2 Organization

The rest of the paper is organized as follows. In Sect. 2, we present some background on information theory and communication complexity. In Sect. 3, we prove our main result, Theorem 1.1, starting with some lemmas that are helpful in building the proof. Some proofs are deferred to Sect. 4.

2 Preliminaries

2.1 Information Theory

For an integer \(n \ge 1\), let [n] represent the set \(\{1,2, \ldots , n\}\) and let \(\left[ 0 \right] \) be the empty set. Let \(\mathscr {X}\) and \(\mathscr {Y}\) be finite sets and let k be a natural number. Let \(\mathscr {X}^k\) be the set \(\mathscr {X}\times \cdots \times \mathscr {X}\), the k-fold Cartesian product of \(\mathscr {X}\). Let \(\mu \) be a probability distribution on \(\mathscr {X}\). Let \(\mu (x)\) represent the probability of \(x\in \mathscr {X}\) according to \(\mu \). Let X be a random variable distributed according to \(\mu \). We use the same symbol to represent a random variable and its distribution whenever it is clear from the context. We use lower-case letters such as x, y, z to represent elements in the supports of X, Y, Z, respectively. The expectation of a function f on \(\mathscr {X}\) is defined as

$$\begin{aligned} \mathop {\mathbb {E}}\limits _{\begin{array}{c} x \leftarrow X \end{array}}\left[ f(x) \right] \mathop {=}\limits ^{\mathrm {def}}\sum _{x \in \mathscr {X}} \mu \left( x \right) \cdot f(x). \end{aligned}$$

The entropy of X is defined by Shannon in  [37] as

$$\begin{aligned} \mathrm {H}(X) \mathop {=}\limits ^{\mathrm {def}}- \sum _{x \in \mathscr {X}} \mu (x) \cdot \log \mu (x). \end{aligned}$$

For two distributions \(\mu \) and \(\lambda \) on \(\mathscr {X}\), the distribution \(\mu \otimes \lambda \) is defined as \((\mu \otimes \lambda )(x_1,x_2)\mathop {=}\limits ^{\mathrm {def}}\mu (x_1)\cdot \lambda (x_2)\). Define \(\mu ^k\) to be the k-fold product \(\mu \otimes \cdots \otimes \mu \). If \(L = L_1 \cdots L_k\), we define \(L_{-i} \mathop {=}\limits ^{\mathrm {def}}L_1\cdots L_{i-1} L_{i+1} \cdots L_k\) and \(L_{<i} \mathop {=}\limits ^{\mathrm {def}}L_1 \cdots L_{i-1}\). The random variable \(L_{\le i}\) is defined analogously. The total variation distance between \(\mu \) and \(\lambda \) is defined to be half of the \(\ell _1\) norm of \(\mu - \lambda \), i.e.,

$$\begin{aligned} \left\| \lambda - \mu \right\| _{1} \mathop {=}\limits ^{\mathrm {def}}\frac{1}{2} \sum _x \left| \lambda (x) - \mu (x) \right| = \max _{S\subseteq \mathscr {X}} \left| \lambda _S - \mu _S \right| \end{aligned}$$

where \(\lambda _S \mathop {=}\limits ^{\mathrm {def}}\sum _{x\in S}\lambda (x)\). We say that \(\lambda \) is \(\varepsilon \)-close to \(\mu \) if \(\Vert \lambda -\mu \Vert _1\le \varepsilon \). The relative entropy between distributions X and Y on \(\mathscr {X}\) is defined as

$$\begin{aligned} \mathrm {D}\,\left( X \big \Vert Y \right) \mathop {=}\limits ^{\mathrm {def}}\mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X \end{array}}\left[ \log \frac{\Pr \,\left[ X=x \right] }{\Pr \,\left[ Y=x \right] } \right] . \end{aligned}$$

The relative min-entropy between them is defined as

$$\begin{aligned} \mathrm {S}_{\infty }\left( X \big \Vert Y \right) \mathop {=}\limits ^{\mathrm {def}}\max _{x\in \mathscr {X}} \left\{ \log \frac{\Pr \,\left[ X=x \right] }{\Pr \,\left[ Y=x \right] } \right\} . \end{aligned}$$

It is easy to see that \(\mathrm {D}\,\left( X \big \Vert Y \right) \le \mathrm {S}_{\infty }\left( X \big \Vert Y \right) \). Let X, Y, and Z be jointly distributed random variables. We often write XY as a shorthand for the pair \(\left( X,Y \right) \). With a slight abuse of notation, we write XX for a joint distribution \(XX^{\prime }\) where X and \(X^{\prime }\) are always equal, i.e., \(\Pr \left[ X=X^{\prime } \right] =1\). Let \(Y_x\) denote the distribution of Y conditioned on \(X=x\). The conditional entropy of Y conditioned on X is defined as

$$\begin{aligned} \mathrm {H}(Y|X) \mathop {=}\limits ^{\mathrm {def}}\mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X \end{array}}\left[ \mathrm {H}(Y_x) \right] = \mathrm {H}(XY)-\mathrm {H}(X). \end{aligned}$$

The mutual information between X and Y is defined as

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!Y \right)&\mathop {=}\limits ^{\mathrm {def}}\mathrm {H}(X)+\mathrm {H}(Y)-\mathrm {H}(XY) \\&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} y \leftarrow Y \end{array}}\left[ \mathrm {D}\,\left( X_y \big \Vert X \right) \right] \\&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} x \leftarrow X \end{array}}\left[ \mathrm {D}\,\left( Y_x \big \Vert Y \right) \right] . \end{aligned}$$

It is easily seen that \(\mathrm {I}\,\left( X \, \!: \, \!Y \right) = \mathrm {D}\,\left( XY \big \Vert X \otimes Y \right) \). We say that X and Y are independent if \(\mathrm {I}\,\left( X \, \!: \, \!Y \right) = 0\). The conditional mutual information between X and Y, conditioned on Z, is defined as

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z \right)&\mathop {=}\limits ^{\mathrm {def}}\mathop {\mathbb {E}}\limits _{\begin{array}{c} z \leftarrow Z \end{array}}\left[ \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z=z \right) \right] \\&= \mathrm {H}\left( X|Z \right) +\mathrm {H}\left( Y|Z \right) -\mathrm {H}\left( XY|Z \right) . \end{aligned}$$

The following chain rule for mutual information can be proved easily

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!YZ \right) = \mathrm {I}\,\left( X \, \!: \, \!Z \right) + \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z \right) . \end{aligned}$$
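For instance, one short derivation expands each term via conditional entropies:

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!YZ \right) = \mathrm {H}(X) - \mathrm {H}(X|YZ) = \left[ \mathrm {H}(X) - \mathrm {H}(X|Z) \right] + \left[ \mathrm {H}(X|Z) - \mathrm {H}(X|YZ) \right] = \mathrm {I}\,\left( X \, \!: \, \!Z \right) + \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z \right) , \end{aligned}$$

where the last step uses \(\mathrm {H}(XY|Z) = \mathrm {H}(Y|Z) + \mathrm {H}(X|YZ)\).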

Definition 2.1

Let X, \(X^{\prime }\), Y, and Z be jointly distributed random variables. We define the joint distribution of \((X^{\prime }Z)(Y|X)\) by

$$\begin{aligned} \Pr [(X^{\prime }Z)(Y|X)=\left( x,z,y \right) ] \mathop {=}\limits ^{\mathrm {def}}\Pr [X^{\prime }=x, Z=z] \cdot \Pr [Y=y|X=x]. \end{aligned}$$

We say that X, Y, and Z form a Markov chain if \(XYZ=(XY)(Z|Y)\), and we denote this by \(X\leftrightarrow Y\leftrightarrow Z\).

It is easy to see that X, Y, Z form a Markov chain if and only if \(\mathrm {I}\,\left( X \, \!: \, \!Z \, \!\big \vert \, \!Y \right) =0\). Ibinson et al. [13] showed that if \(\mathrm {I}\,\left( X \, \!: \, \!Z \, \!\big \vert \, \!Y \right) \) is small, then XYZ is close to being a Markov chain.

Lemma 2.2

([13]) For any random variables X, Y, and Z, it holds that

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!Z \, \!\big \vert \, \!Y \right) = \min \left\{ \mathrm {D}\,\left( XYZ \big \Vert X^{\prime }Y^{\prime }Z^{\prime } \right) : X^{\prime } \leftrightarrow Y^{\prime }\leftrightarrow Z^{\prime } \right\} . \end{aligned}$$

The minimum is achieved by the distribution \(X^{\prime }Y^{\prime }Z^{\prime }=(XY)(Z|Y)\).

We will need the following basic facts. A very good text for reference on information theory is [10].

Fact 2.3

([10, page 32]) The relative entropy is jointly convex in its arguments. That is, for distributions \(\mu , \mu ^1, \lambda , \lambda ^1\) on \(\mathscr {X}\) and \(p\in [0,1]\),

$$\begin{aligned} \mathrm {D}\,\left( p \mu + (1-p) \mu ^1 \big \Vert p \lambda + (1-p) \lambda ^1 \right) \le p \cdot \mathrm {D}\,\left( \mu \big \Vert \lambda \right) + (1-p) \cdot \mathrm {D}\,\left( \mu ^1 \big \Vert \lambda ^1 \right) . \end{aligned}$$

Fact 2.4

([10, page 24]) The relative entropy satisfies the following chain rule. Let XY and \(X^1Y^1\) be random variables on \(\mathscr {X}\times \mathscr {Y}\). It holds that

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert XY \right) = \mathrm {D}\,\left( X^1 \big \Vert X \right) + \mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X^1 \end{array}}\left[ \mathrm {D}\,\left( Y^1_x \big \Vert Y_x \right) \right] . \end{aligned}$$

In particular,

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X\otimes Y \right)&= \mathrm {D}\,\left( X^1 \big \Vert X \right) + \mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X^1 \end{array}}\left[ \mathrm {D}\,\left( Y^1_x \big \Vert Y \right) \right] \\&\ge \mathrm {D}\,\left( X^1 \big \Vert X \right) + \mathrm {D}\,\left( Y^1 \big \Vert Y \right) , \end{aligned}$$

where the inequality is from Fact 2.3.

Note that, in the last term of the first line, Y is not conditioned on x, since in the second argument of the relative entropy X and Y are independent. The following fact shows that the relative entropy to a product distribution is minimized by the product of the marginals.

Fact 2.5

Let XY and \(X^1Y^1\) be random variables on \(\mathscr {X}\times \mathscr {Y}\). It holds that

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X\otimes Y \right) \ge \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) =\mathrm {I}\,\left( X^1 \, \!: \, \!Y^1 \right) . \end{aligned}$$

Proof

From the definition of the relative entropy, we have

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X\otimes Y \right)= & {} \sum _{xy}\Pr \left[ X^1Y^1=xy \right] \log \frac{\Pr \left[ X^1Y^1=xy \right] }{\Pr \left[ X=x \right] \Pr \left[ Y=y \right] } \nonumber \\= & {} \sum _{xy}\Pr \left[ X^1Y^1=xy \right] \left( \log \frac{\Pr \left[ X^1Y^1=xy \right] }{\Pr \left[ X^1=x \right] \Pr \left[ Y^1=y \right] } \right. \nonumber \\&\quad \left. +\, \log \frac{\Pr \left[ X^1=x \right] \Pr \left[ Y^1=y \right] }{\Pr \left[ X=x \right] \Pr \left[ Y=y \right] } \right) \nonumber \\= & {} \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) +\mathrm {D}\,\left( X^1 \big \Vert X \right) +\mathrm {D}\,\left( Y^1 \big \Vert Y \right) \nonumber \\\ge & {} \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) \nonumber . \end{aligned}$$

The equality \(\mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) =\mathrm {I}\,\left( X^1 \, \!: \, \!Y^1 \right) \) can easily be verified from the definitions.
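Indeed, expanding the definitions,

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) = \sum _{xy}\Pr \left[ X^1Y^1=xy \right] \log \frac{\Pr \left[ X^1Y^1=xy \right] }{\Pr \left[ X^1=x \right] \Pr \left[ Y^1=y \right] } = \mathrm {H}(X^1)+\mathrm {H}(Y^1)-\mathrm {H}(X^1Y^1) = \mathrm {I}\,\left( X^1 \, \!: \, \!Y^1 \right) . \end{aligned}$$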

Fact 2.6

(Pinsker’s inequality, [10, page 370]) For distributions \(\lambda \) and \(\mu \),

$$\begin{aligned} 0 \le \left\| \lambda -\mu \right\| _{1} \le \sqrt{\mathrm {D}\,\left( \lambda \big \Vert \mu \right) }. \end{aligned}$$

The following fact gives a lower bound on each term in the summation in the definition of the relative entropy.

Fact 2.7

([21]) Let \(\lambda \) and \(\mu \) be distributions on \(\mathscr {X}\). For any subset \(\mathscr {S}\subseteq \mathscr {X}\), it holds that

$$\begin{aligned} \sum _{x \in \mathscr {S}} \lambda (x) \cdot \log \frac{\lambda (x)}{\mu (x)} \ge -1. \end{aligned}$$

Hence, for any \(r>0, c>0\), if \(\mathrm {D}\,\left( \lambda \big \Vert \mu \right) \le c\), then it holds that

$$\begin{aligned} \Pr _{\begin{array}{c} x\leftarrow \lambda \end{array}}\left[ \log \frac{\lambda \left( x \right) }{\mu \left( x \right) }\ge \frac{c+1}{r} \right] \le r. \end{aligned}$$
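The second statement can be derived from the first: let \(\mathscr {S}\mathop {=}\limits ^{\mathrm {def}}\left\{ x\in \mathscr {X}: \log \frac{\lambda \left( x \right) }{\mu \left( x \right) }\ge \frac{c+1}{r} \right\} \); then

$$\begin{aligned} c \ge \mathrm {D}\,\left( \lambda \big \Vert \mu \right) = \sum _{x \in \mathscr {S}} \lambda (x) \cdot \log \frac{\lambda (x)}{\mu (x)} + \sum _{x \notin \mathscr {S}} \lambda (x) \cdot \log \frac{\lambda (x)}{\mu (x)} \ge \lambda _{\mathscr {S}} \cdot \frac{c+1}{r} - 1, \end{aligned}$$

where the last inequality uses the first statement for the complement of \(\mathscr {S}\); rearranging gives \(\lambda _{\mathscr {S}} \le r\).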

The following fact easily follows from the triangle inequality and Fact 2.4.

Fact 2.8

The \(\ell _1\) distance and the relative entropy are monotone non-increasing when subsystems are considered. Let XY and \(X^1Y^1\) be random variables on \(\mathscr {X}\times \mathscr {Y}\), then

$$\begin{aligned} \left\| XY - X^1Y^1 \right\| _{1}&\ge \left\| X - X^1 \right\| _{1} \end{aligned}$$

and

$$\begin{aligned} \mathrm {D}\,\left( XY \big \Vert X^1Y^1 \right)&\ge \mathrm {D}\,\left( X \big \Vert X^1 \right) . \end{aligned}$$

Fact 2.9

For any function \(f : \, \mathscr {X}\times \mathscr {R} \rightarrow \mathscr {Y}\) and random variables X, \(X_1\) on \(\mathscr {X}\) and R on \(\mathscr {R}\), such that R is independent of \(X X_1\), it holds that

$$\begin{aligned} \left\| Xf(X,R) - X_1f(X_1,R) \right\| _{1} = \left\| X-X_1 \right\| _{1}. \end{aligned}$$

Proof

$$\begin{aligned}&\left\| Xf(X,R) - X_1f(X_1,R) \right\| _{1}\\&\quad = \frac{1}{2}\sum _{xy} \left| \Pr \left[ Xf(X,R)=xy \right] -\Pr \left[ X_1f(X_1,R)=xy \right] \right| \\&\quad = \frac{1}{2}\sum _x \left| \Pr \left[ X=x \right] -\Pr \left[ X_1=x \right] \right| \cdot \sum _y\Pr \left[ f(x,R)=y \right] \\&\quad = \frac{1}{2}\sum _x \left| \Pr \left[ X=x \right] -\Pr \left[ X_1=x \right] \right| = \left\| X-X_1 \right\| _{1}. \end{aligned}$$

\(\square \)

The following definition was introduced by Holenstein [12]. It plays a critical role in his proof of a parallel repetition theorem for two-prover games.

Definition 2.10

([12]) For two distributions \((X_0Y_0)\) and \((X_1SY_1T)\), we say that \((X_0,Y_0)\) is \(\left( 1-\varepsilon \right) \)-embeddable in \((X_1S,Y_1T)\) if there exist a random variable R over a set \(\mathscr {R}\), which is independent of \(X_0Y_0\), and functions \(f_A:\mathscr {X}\times \mathscr {R}\rightarrow \mathscr {S}\), \(f_B:\mathscr {Y}\times \mathscr {R}\rightarrow \mathscr {T}\), such that

$$\begin{aligned} \left\| X_0Y_0f_A(X_0,R)f_B(Y_0,R) - X_1Y_1ST \right\| _{1} \le \varepsilon . \end{aligned}$$

The following lemma was shown by Holenstein [12] using a correlated sampling protocol.

Lemma 2.11

(Corollary 5.3 in [12]) For random variables S, X, and Y, if

$$\begin{aligned} \left\| XYS-(XY)(S|X) \right\| _{1}&\le \varepsilon \end{aligned}$$

and

$$\begin{aligned} \left\| XYS-(XY)(S|Y) \right\| _{1}&\le \varepsilon \end{aligned}$$

then (XY) is \(\left( 1-5\varepsilon \right) \)-embeddable in (XSYS).

We will need the following generalization of the previous lemma.

Lemma 2.12

For joint random variables \((A^{\prime },B^{\prime },C^{\prime })\) and (AB), satisfying

$$\begin{aligned}&\mathrm {D}\,\left( A^{\prime }B^{\prime } \big \Vert AB \right) \le \varepsilon \end{aligned}$$
(1)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c)\leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a \right) \right] \le \varepsilon \end{aligned}$$
(2)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (b,c)\leftarrow B^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( A^{\prime }_{b,c} \big \Vert A_b \right) \right] \le \varepsilon \end{aligned}$$
(3)

it holds that (AB) is \(\left( 1-5\sqrt{\varepsilon } \right) \)-embeddable in \((A^{\prime }C^{\prime },B^{\prime }C^{\prime })\).

Proof

Using the definition of the relative entropy, we get the following.

$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a \right) \right] - \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B^{\prime }_a \right) \right] \\&\quad = \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,b,c) \leftarrow A^{\prime }B^{\prime }C^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ B^{\prime }=b|A^{\prime }=a \right] }{\Pr \,\left[ B=b|A=a \right] } \right] \\&\quad = \mathop {\mathbb {E}}\limits _{\begin{array}{c} a \leftarrow A^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_a \big \Vert B_a \right) \right] \ge 0. \end{aligned}$$

This means that

$$\begin{aligned} \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B^{\prime }_a \right) \right] \le \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a \right) \right] \le \varepsilon . \end{aligned}$$
(4)

Furthermore,

$$\begin{aligned} \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c)\leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a^{\prime } \right) \right]&= \mathrm {D}\,\left( A^{\prime }C^{\prime }B^{\prime } \big \Vert \left( A^{\prime }C^{\prime } \right) \left( B^{\prime }|A^{\prime } \right) \right) \end{aligned}$$
(5)
$$\begin{aligned}&= \mathrm {D}\,\left( A^{\prime }B^{\prime }C^{\prime } \big \Vert \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \right) \end{aligned}$$
(6)
$$\begin{aligned}&\ge \left\| A^{\prime }B^{\prime }C^{\prime } - \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \right\| _{1}^2. \end{aligned}$$
(7)

Above, Eq. (5) follows from the chain rule for the relative entropy, Eq. (6) is because the distributions \(\left( A^{\prime }C^{\prime } \right) \left( B^{\prime }|A^{\prime } \right) \) and \(\left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \) are the same by Definition 2.1, and Eq. (7) follows from Fact 2.6. Now from Eqs. (4) and (7) we get

$$\begin{aligned} \left\| A^{\prime }B^{\prime }C^{\prime } - \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \right\| _{1}&\le \sqrt{\varepsilon }. \end{aligned}$$

By similar arguments we get

$$\begin{aligned} \left\| A^{\prime }B^{\prime }C^{\prime } - \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|B^{\prime } \right) \right\| _{1}&\le \sqrt{\varepsilon }. \end{aligned}$$

The two inequalities above, together with Lemma 2.11, imply that \(\left( A^{\prime }, B^{\prime } \right) \) is \(\left( 1 - 5 \sqrt{\varepsilon } \right) \)-embeddable in \(\left( A^{\prime }C^{\prime }, B^{\prime }C^{\prime } \right) \). Namely, there exist functions \(f_1\) and \(f_2\) and a random variable R independent of \(A^{\prime }B^{\prime }\) such that \(\left\| A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\le 5\sqrt{\varepsilon }\). Furthermore, from Fact 2.6 and Eq. (1) we get that

$$\begin{aligned} \left\| A^{\prime }B^{\prime } - AB \right\| _{1} \le \sqrt{\varepsilon }. \end{aligned}$$

Finally,

$$\begin{aligned}&\left\| ABf_1\left( A,R \right) f_2\left( B,R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\\&\quad \le \left\| ABf_1\left( A,R \right) f_2\left( B,R \right) -A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) \right\| _{1}\\&\qquad +\left\| A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\\&\quad =\left\| AB-A^{\prime }B^{\prime } \right\| _{1}+\left\| A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\le 6\sqrt{\varepsilon }, \end{aligned}$$

where the equality is from Fact 2.9. Thus we get that (AB) is \(\left( 1-6\sqrt{\varepsilon } \right) \)-embeddable in \((A^{\prime }C^{\prime },B^{\prime }C^{\prime })\). \(\square \)

2.2 Communication Complexity

Let \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) be a relation, \(t \ge 1\) an integer, and \(\varepsilon \in (0,1)\). In this work we only consider complete relations, i.e., for every \((x,y) \in \mathscr {X}\times \mathscr {Y}\), there is some \(z \in \mathscr {Z}\) such that \((x,y,z) \in f\). In the two-party t-round public-coin model of communication, Alice, with input \(x \in \mathscr {X}\), and Bob, with input \(y \in \mathscr {Y}\), do local computation using public coins shared between them and exchange t messages, with Alice sending the first message. At the end of the protocol, the party receiving the t-th message outputs some \(z \in \mathscr {Z}\). The output is declared correct if \((x,y,z) \in f\) and wrong otherwise.

Definition 2.13

Let \(\mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f)\) represent the two-party t-round public-coin communication complexity of f with worst case error \(\varepsilon \), i.e., the minimum number of bits that Alice and Bob need to exchange in a t-round public-coin protocol whose output on each input \(\left( x,y \right) \) is correct with probability at least \(1-\varepsilon \). We similarly consider two-party t-round deterministic protocols, in which no public coins are used by Alice and Bob. Let \(\mu \) be a distribution on \(\mathscr {X}\times \mathscr {Y}\). We let \(\mathrm {D}_{\varepsilon }^{(t),\mu }(f)\) represent the two-party t-round distributional communication complexity of f under \(\mu \) with expected error \(\varepsilon \), i.e., the minimum number of bits Alice and Bob need to exchange in a two-party t-round deterministic protocol for f with distributional error (average error over the inputs) at most \(\varepsilon \) under \(\mu \).
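As an illustration of the two error measures, the following sketch evaluates a toy one-round public-coin protocol against the worst case error and a trivial deterministic protocol against the distributional error; the example problem (equality of 2-bit inputs, viewed as a complete relation), the protocol, and the distribution are assumptions made purely for illustration.

```python
from itertools import product

X = Y = [0, 1, 2, 3]                      # 2-bit inputs
R = [0, 1, 2, 3]                          # public-coin values, assumed uniform

def correct(x, y, z):
    """Toy complete relation: (x, y, z) is in f iff z is the indicator of x == y."""
    return z == int(x == y)

def public_coin_protocol(x, y, r):
    """One message: Alice sends the parity of x AND r; Bob outputs 1 iff it matches
    the parity of y AND r."""
    a = bin(x & r).count("1") % 2
    b = bin(y & r).count("1") % 2
    return int(a == b)

def worst_case_error(protocol):
    # Worst case over inputs of the probability, over the public coins, of an
    # incorrect output (the error measure behind R^{(t),pub}_eps).
    return max(sum(not correct(x, y, protocol(x, y, r)) for r in R) / len(R)
               for x, y in product(X, Y))

def distributional_error(det_protocol, mu):
    # Average error over inputs drawn from mu, for a deterministic protocol
    # (the error measure behind D^{(t),mu}_eps).
    return sum(p for (x, y), p in mu.items() if not correct(x, y, det_protocol(x, y)))

mu = {(x, y): 1 / 16 for x, y in product(X, Y)}            # uniform distribution
print(worst_case_error(public_coin_protocol))              # 0.5, attained on x != y
print(distributional_error(lambda x, y: 1, mu))            # 0.75 for "always output 1"
```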

The following is a consequence of the min–max theorem in game theory, see e.g., [27, Theorem 3.20].

Lemma 2.14

(Yao’s principle, [40]) \( \displaystyle \mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f) = \max _{\mu } \mathrm {D}^{(t),\mu }_{\varepsilon }(f) \).

The following fact about communication protocols can be verified easily.

Fact 2.15

Let \(M_1, \ldots , M_t\) be t messages in a deterministic communication protocol between Alice and Bob with inputs X and Y, where X and Y are independent. Then for any \(s \in [t]\), X and Y are independent even conditioned on \(M_1, \ldots , M_s\).
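One way to verify this is via the standard rectangle property of deterministic protocols: each message is a function of the sender’s input and the previously exchanged messages, so consistency with a fixed prefix of the transcript factors across the two inputs. Concretely,

$$\begin{aligned} \Pr \left[ X=x, Y=y \, \big \vert \, M_1=m_1, \ldots , M_s=m_s \right] \; \propto \; \Pr \left[ X=x \right] \cdot \Pr \left[ Y=y \right] \cdot a_{m_1 \cdots m_s}(x) \cdot b_{m_1 \cdots m_s}(y), \end{aligned}$$

where \(a_{m_1 \cdots m_s}(x), b_{m_1 \cdots m_s}(y) \in \{0,1\}\) indicate that x and y are consistent with the messages sent by Alice and Bob, respectively. The right-hand side is proportional to a product distribution, so X and Y remain independent.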

Let \(f^k \subseteq \mathscr {X}^k \times \mathscr {Y}^k \times \mathscr {Z}^k\) be the cross product of f with itself k times. In a protocol for computing \(f^k\), Alice will receive input in \(\mathscr {X}^k\), Bob will receive input in \(\mathscr {Y}^k\) and the output of the protocol will be in \(\mathscr {Z}^k\).

3 Proof of Theorem 1.1

We start by showing a few lemmas which are helpful in the proof of the main result. The following theorem was shown by Jain [14] and follows primarily from a message compression argument due to Braverman and Rao [5].

Theorem 3.1

(Lemma 3.8 in [14]) Let \(\delta > 0\) and \(c \ge 0\). Let \(X^{\prime }\), \(Y^{\prime }\), and N be random variables for which \(Y^{\prime } \leftrightarrow X^{\prime } \leftrightarrow N\) is a Markov chain and the following holds.

$$\begin{aligned} \Pr _{\begin{array}{c} (x,y,m)\leftarrow X^{\prime },Y^{\prime },N \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] }>c \right] \le \delta . \end{aligned}$$
(8)

There exists a public-coin protocol between Alice and Bob, with inputs \(X^{\prime }\) and \(Y^{\prime }\), with a single message from Alice to Bob of at most \(c+O\,\left( \log (1/\delta ) \right) \) bits, such that at the end of the protocol, Alice and Bob possess random variables \(M_A\) and \(M_B\), respectively, satisfying \( \left\| X^{\prime }Y^{\prime }NN-X^{\prime }Y^{\prime }M_AM_B \right\| _{1} \le 2\delta \).

Remark 3.2

In Ref. [5], the condition \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) \le c\) is used instead of Eq. (8). It was changed to the current condition in Ref. [14]. By the equality \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) =\mathrm {D}\,\left( X^{\prime }Y^{\prime }N \big \Vert \left( X^{\prime }Y^{\prime } \right) \left( N|Y^{\prime } \right) \right) \) and Fact 2.7, \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) \le c\) implies

$$\begin{aligned} \Pr _{\begin{array}{c} (x,y,m) \leftarrow X^{\prime },Y^{\prime },N \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } > \frac{c+1}{\delta } \right] \le \delta . \end{aligned}$$

This modification is essential in our argument since the condition in Eq. (8) is robust when the underlying joint distribution is perturbed slightly, while \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) \) may change a lot with such a perturbation.

As mentioned in Sect. 1, we will have to work with approximate Markov chains in our argument for the direct product. The following lemma makes Theorem 3.1 more robust to deal with approximate Markov chains. Its proof appears in Sect. 4.

Lemma 3.3

Let \( c \ge 0 \), \( 1 > \varepsilon > 0 \), and \(\varepsilon ^{\prime } > 0 \). Let \(X^{\prime }\), \(Y^{\prime }\), and \(M^{\prime }\) be random variables for which the following holds,

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M^{\prime } \, \!\big \vert \, \!Y^{\prime } \right) \le c \quad \text {and} \quad \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime } \, \!\big \vert \, \!X^{\prime } \right) \le \varepsilon . \end{aligned}$$

There exists a public-coin protocol between Alice and Bob, with inputs \(X^{\prime }\) and \(Y^{\prime }\), with a single message from Alice to Bob of at most \(\frac{c+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \) bits, such that at the end of the protocol, Alice and Bob possess random variables \(M_A\) and \(M_B\), respectively, satisfying

$$\begin{aligned} \left\| X^{\prime }Y^{\prime }M^{\prime }M^{\prime }-X^{\prime }Y^{\prime }M_AM_B \right\| _{1} \le 3 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime }. \end{aligned}$$

The following lemma generalizes the above lemma to deal with multiple messages. Its proof appears in Sect. 4.

Lemma 3.4

Let \(t \ge 1\) be an integer. Let \( \varepsilon ^{\prime } > 0\), \( c_s \ge 0 \), and \( 1 > \varepsilon _s > 0\) for all \( 1 \le s \le t\). Let \(R^{\prime }\), \(X^{\prime }\), \(Y^{\prime }\), \(M_1^{\prime }, \ldots , M_t^{\prime }\) be random variables for which the following holds. (Below \( M^{\prime }_{<s} = M^{\prime }_1 \cdots M^{\prime }_{s-1}\) by definition.)

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!Y^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le c_s \end{aligned}$$
(9)
$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!X^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le \varepsilon _s \end{aligned}$$
(10)

for odd s and

$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!X^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le c_s \end{aligned}$$
(11)
$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!Y^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le \varepsilon _s \end{aligned}$$
(12)

for even s. There exists a public-coin t-round protocol \(\mathscr {P}_t\) between Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), with Alice sending the first message. The total communication of \(\mathscr {P}_t\) is at most

$$\begin{aligned} \frac{\sum _{s=1}^tc_s+5t}{\varepsilon ^{\prime }} + O\,\left( t\log \frac{1}{\varepsilon ^{\prime }} \right) . \end{aligned}$$

At the end of the protocol, Alice and Bob possess random variables \(M_{A,1}^{\prime }, \ldots , M_{A,t}^{\prime }\) and \(M_{B,1}^{\prime }, \ldots , M_{B,t}^{\prime }\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime }M_t^{\prime }-R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \le 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$

In the above lemma, Alice and Bob share an input \(R^{\prime }\) (potentially correlated with \(X^{\prime }Y^{\prime }\)). Eventually we will need Alice and Bob to generate this shared part themselves using correlated sampling. The following lemma, obtained from the lemma above, is the one that we will finally use in the proof of our main result. Its proof appears in Sect. 4.

Lemma 3.5

Let random variables \(R^{\prime }\), \(X^{\prime }\), \(Y^{\prime }\), and \(M_1^{\prime }, \ldots , M_t^{\prime }\) and numbers \( \varepsilon ^{\prime }\), \(c_s\), and \(\varepsilon _s\) satisfy all the conditions in Lemma 3.4. Let \(\tau > 0\) and let random variables (XY) be \((1-\tau )\)-embeddable in \((X^{\prime }R^{\prime },Y^{\prime }R^{\prime })\). There exists a public-coin t-round protocol \(\mathscr {Q}_t\) between Alice, with input X, and Bob, with input Y, with Alice sending the first message, and total communication at most

$$\begin{aligned} \frac{\sum _{s=1}^tc_s+5t}{\varepsilon ^{\prime }} + O\,\left( t\log \frac{1}{\varepsilon ^{\prime }} \right) . \end{aligned}$$

At the end of the protocol, Alice possesses \(R_A M_{A,1} \cdots M_{A,t}\) and Bob possesses \(R_BM_{B,1}\cdots M_{B,t}\), such that

$$\begin{aligned}&\left\| X Y R_A R_B M_1M_1 \cdots M_tM_t - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \\&\quad \le \tau + 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$

We are now ready to prove our main result, Theorem 1.1. We restate it here for convenience.

Theorem 1.1 Let \(\mathscr {X}\), \(\mathscr {Y}\), and \(\mathscr {Z}\) be finite sets, \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) a complete relation, \(\varepsilon > 0\), and \(k, t \ge 1\) integers. There exists a constant \(\kappa \ge 0\) such that

$$\begin{aligned} \mathrm {R}^{(t),\mathrm {pub}}_{1-\left( 1-\varepsilon /2 \right) ^{\varOmega \,\left( k \varepsilon ^2/t^2 \right) }} \left( f^k \right) = \varOmega \,\left( \frac{\varepsilon \cdot k}{t} \cdot \left( \mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f) - \frac{\kappa t^2}{\varepsilon } \right) \right) . \end{aligned}$$

Proof of Theorem 1.1

Let \(\delta \mathop {=}\limits ^{\mathrm {def}}\frac{\varepsilon ^2}{7500t^2}\) and \(\delta _1 = \frac{\varepsilon }{3000 t}\). From Yao’s principle (Lemma 2.14) it suffices to prove that for any distribution \(\mu \) on \(\mathscr {X}\times \mathscr {Y}\),

$$\begin{aligned} \mathrm {D}^{(t),\mu ^k}_{1-(1-\varepsilon /2)^{\lfloor \delta k \rfloor }} \left( f^k \right) \ge \delta _1 k c \end{aligned}$$

where \(c \mathop {=}\limits ^{\mathrm {def}}\mathrm {D}^{(t), \mu }_{\varepsilon }(f) - \frac{\kappa t^2}{\varepsilon } \), for a constant \(\kappa \) to be chosen later. Let XY be distributed according to \(\mu ^k\). Let \(\mathscr {Q}\) be a t-round deterministic protocol between Alice, with input X, and Bob, with input Y, that computes \(f^k\), with Alice sending the first message and total communication \(\delta _1 k c\) bits. We assume that t is odd for the rest of the argument, so that Bob makes the final output. (The case when t is even follows similarly.) The following claim implies that the success probability of \(\mathscr {Q}\) is at most \((1-\varepsilon /2)^{\lfloor \delta k \rfloor }\), which shows the desired bound. \(\square \)

Claim 3.6

For each \(i \in [k]\), let us define a binary random variable \(T_i \in \left\{ 0,1 \right\} \), which represents the success of \(\mathscr {Q}\), i.e., Bob’s output being correct, on the i-th instance. So \(T_i=1\) if the protocol \(\mathscr {Q}\) computes the i-th instance of f correctly and \(T_i=0\) otherwise. Let \(k^{\prime } \mathop {=}\limits ^{\mathrm {def}}\lfloor \delta k \rfloor \). There exist \(k^{\prime }\) coordinates \(\left\{ i_1, \ldots , i_{k^{\prime }} \right\} \) such that for each \(1 \le r \le k^{\prime }-1\), either

$$\begin{aligned} \Pr \,\left[ T^{\left( r \right) }= 1 \right]&\le (1-\frac{\varepsilon }{2})^{k^{\prime }} \end{aligned}$$

or

$$\begin{aligned} \Pr \,\left[ T_{i_{r+1}}=1 \big \vert T^{\left( r \right) }= 1 \right]&\le 1-\frac{\varepsilon }{2} \end{aligned}$$

where \( \displaystyle T^{\left( r \right) }\mathop {=}\limits ^{\mathrm {def}}\prod _{j=1}^{r} T_{i_j} \).

Proof

For \(s \in [t]\), we denote the s-th message of \(\mathscr {Q}\) by \(M_s\). Let \(M \mathop {=}\limits ^{\mathrm {def}}M_1 \cdots M_t\). In the following, we assume that \(1 \le r < k^{\prime }\). However, the same argument also works when \(r=0\), i.e., for identifying the first coordinate, which we skip to avoid repetition. Suppose that we have already identified r coordinates \(i_1,\ldots ,i_r\) satisfying

$$\begin{aligned} \Pr \,\left[ T_{i_1}=1 \right]&\le 1 - \frac{\varepsilon }{2} \end{aligned}$$

and

$$\begin{aligned} \Pr \,\left[ T_{i_{j+1}} = 1 \big \vert T^{(j)} = 1 \right]&\le 1 - \frac{\varepsilon }{2} \end{aligned}$$

for \(1\le j\le r-1\). If \(\Pr \,\left[ T^{\left( r \right) }= 1 \right] \le (1-\frac{\varepsilon }{2})^{k^{\prime }}\) then we are done. So from now on, assume that

$$\begin{aligned} \Pr \,\left[ T^{\left( r \right) }=1 \right] > (1-\frac{\varepsilon }{2})^{k^{\prime }} \ge 2^{-\delta k}. \end{aligned}$$

Let D be a random variable uniformly distributed in \(\{0,1\}^k\) and independent of XY. Let \(U_i = X_i\) if \(D_i = 0\) and \(U_i = Y_i\) if \(D_i = 1\). For any random variable L, let us introduce the notation

$$\begin{aligned} L^1 \mathop {=}\limits ^{\mathrm {def}}(L | T^{\left( r \right) }= 1). \end{aligned}$$

For example, \(X^1Y^1=(XY|T^{\left( r \right) }=1)\). Let \(C \mathop {=}\limits ^{\mathrm {def}}\left\{ i_1, \ldots , i_r \right\} \) and

$$\begin{aligned} R_i \mathop {=}\limits ^{\mathrm {def}}D_{-i} U_{-i} X_{C \cup [i-1]} Y_{C \cup [i-1]} \end{aligned}$$

for \(i \in [k]\). We denote an element from the range of \(R_i\) by \(r_i\).

To prove the claim, we will show that there exists a coordinate \(j \notin C\) such that

  1. \(\left( X_j, Y_j \right) \) can be embedded well in \(\left( X^1_j R^1_j, Y^1_j R^1_j \right) \) (with appropriate parameters, as required by Lemma 2.12).

  2. Random variables \(R_j^1\), \(X^1_j\), \(Y^1_j\), and \(M^1_1, \ldots , M^1_t\) satisfy the conditions of Lemma 3.4 with appropriate parameters.

The following calculations are helpful for achieving the condition in Eq. (1) in Lemma 2.12. That is, \(X^1_jY^1_j\) is close to \(\mu \).

$$\begin{aligned} \delta k&> \mathrm {S}_{\infty }\left( X^1Y^1 \big \Vert XY \right) \nonumber \\&\ge \mathrm {D}\,\left( X^1Y^1 \big \Vert XY \right) \nonumber \\&\ge \sum _{i\notin C} \mathrm {D}\,\left( X^1_iY^1_i \big \Vert X_i Y_i \right) \end{aligned}$$
(13)

where the first inequality follows from the assumption that \(\Pr \,\left[ T^{\left( r \right) }=1 \right] > 2^{-\delta k}\) and the last inequality follows from Fact 2.4. The following calculations are helpful for achieving the conditions in Eqs. (2) and (3) in Lemma 2.12, namely that \(\left( X^1_j|R^1_jY^1_j \right) \approx \left( X_j|Y_j \right) \) and \(\left( Y^1_j|R^1_jX^1_j \right) \approx \left( Y_j|X_j \right) \). These conditions imply that Alice and Bob are able to sample \(R_j^1\) in a correlated fashion given inputs \(X^1_jY^1_j\).

$$\begin{aligned} \delta k&> \mathrm {S}_{\infty }\left( X^1Y^1D^1U^1 \big \Vert XYDU \right) \ge \mathrm {D}\,\left( X^1Y^1D^1U^1 \big \Vert XYDU \right) \nonumber \\&\ge \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d,u,x_C,y_C) \leftarrow D^1U^1X^1_CY^1_C \end{array}}\left[ \mathrm {D}\,\left( \left( X^1Y^1 \right) _{d,u,x_C,y_C} \big \Vert \left( XY \right) _{d,u,x_C,y_C} \right) \right] \end{aligned}$$
(14)
$$\begin{aligned}&= \sum _{i \notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d,u,x_{C\cup [i-1]},y_{C\cup [i-1]})\\ \leftarrow D^1U^1X_{C\cup [i-1]}^1Y_{C\cup [i-1]}^1 \end{array}}\left[ \mathrm {D}\,\left( \left( X_i^1Y_i^1 \right) _{\begin{array}{c} d,u,x_{C\cup [i-1]}, \\ y_{C\cup [i-1]} \end{array}} \big \Vert \left( X_iY_i \right) _{\begin{array}{c} d,u,x_{C\cup [i-1]}, \\ y_{C\cup [i-1]} \end{array}} \right) \right] \end{aligned}$$
(15)
$$\begin{aligned}&= \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d_i,u_i,r_i) \leftarrow D^1_iU^1_iR^1_i \end{array}}\left[ \mathrm {D}\,\left( (X^1_iY^1_i)_{d_i,u_i,r_i} \big \Vert (X_iY_i)_{d_i,u_i,r_i} \right) \right] \end{aligned}$$
(16)
$$\begin{aligned}&= \frac{1}{2} \sum _{i \notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_i,x_i)\leftarrow R^1_iX^1_i \end{array}}\left[ \mathrm {D}\,\left( \left( Y_i^1 \right) _{r_i, x_i} \big \Vert \left( Y_i \right) _{x_i} \right) \right] \nonumber \\&\quad {} + \frac{1}{2} \sum _{i \notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_i,y_i)\leftarrow R^1_iY^1_i \end{array}}\left[ \mathrm {D}\,\left( \left( X_i^1 \right) _{r_i, y_i} \big \Vert \left( X_i \right) _{y_i} \right) \right] . \end{aligned}$$
(17)

Above, Eqs. (14) and (15) follow from Fact 2.4. Equation (16) holds because \(\left( d_i,u_i,r_i \right) \) and \(\left( d,u,x_{C\cup [i-1]},y_{C\cup [i-1]} \right) \) are the same up to reordering. Equation (17) follows because \(D^1_i\) is independent of \(R^1_i\), and with probability half \( D^1_i\) is 0, in which case \(U^1_i = X^1_i\), and with probability half \( D^1_i\) is 1, in which case \(U_i^1 = Y_i^1\).

The following calculation is useful for achieving the conditions of Eqs. (9) and (11), exhibiting that the information carried by the messages about the sender’s input is small.

$$\begin{aligned} \delta _1 c k&\ge \left| M^1 \right| \quad (|M^1| \text{ represents the length of } M^1)\nonumber \\&\ge \mathrm {I}\,\left( X^1Y^1 \, \!: \, \!M^1 \, \!\big \vert \, \!D^1U^1X^1_CY^1_C \right) \nonumber \\&= \sum _{i\notin C} \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1 \, \!\big \vert \, \!D^1U^1X^1_{C\cup [i-1]}Y^1_{C\cup [i-1]} \right) \nonumber \\&= \sum _{i \notin C} \sum _{s=1}^t \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1U^1X^1_{C\cup [i-1]}Y^1_{C\cup [i-1]}M^1_{<s} \right) \nonumber \\&= \sum _{i\notin C} \sum _{s=1}^t \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1_iU^1_iR^1_iM^1_{<s} \right) \nonumber \\ \!&=\! \sum _{i\notin C} \left( \sum _{s \text {odd}} \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1_iU^1_iR^1_iM^1_{<s} \right) \!+\! \sum _{s \text {even}} \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1_iU^1_iR^1_iM^1_{<s} \right) \right) \nonumber \\&\ge \frac{1}{2} \sum _{i\notin C} \left( \sum _{s \text {odd}} \mathrm {I}\,\left( X^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iY^1_iM^1_{<s} \right) + \sum _{s \text {even}} \mathrm {I}\,\left( Y^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iX^1_iM^1_{<s} \right) \right) \end{aligned}$$
(18)

Above, we have used the chain rule for the mutual information in the first two equalities. The last inequality follows because \(D^1_i\) is independent of \( X^1_i Y_i^1 R^1_i M^1 \) and with probability half \( D^1_i \) is 0, in which case \(U^1_i = X^1_i\), and with probability half \( D^1_i \) is 1, in which case \(U_i^1 = Y_i^1\).

The following calculation is useful for achieving the conditions of Eqs. (10) and (12), exhibiting that the information carried by the messages about the receiver’s input is very small. Here we are only able to argue round by round and hence pay a factor proportional to the number of messages in the final result. Let \(s \in [t]\) be odd for now.

$$\begin{aligned} \delta k&\ge \mathrm {S}_{\infty }\left( D^1 U^1 X^1 Y^1 M^1_{\le s} \big \Vert DUXYM_{\le s} \right) \nonumber \\&\ge \mathrm {D}\,\left( D^1 U^1 X^1 Y^1 M^1_{\le s} \big \Vert DUXYM_{\le s} \right) \nonumber \\&\ge \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d,u,x_C,y_C,m_{\le s}) \leftarrow D^1U^1X^1_CY^1_CM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (X^1Y^1)_{d,u,x_C,y_C,m_{\le s}} \big \Vert (XY)_{d,u,x_C,y_C,m_{\le s}} \right) \right] \nonumber \\&= \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d, u, x_{C\cup [i-1]}, y_{C\cup [i-1]}, m_{\le s}) \\ \leftarrow D^1U^1X^1_{C\cup [i-1]}Y^1_{C\cup [i-1]}M^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (X^1_i Y^1_i)_{\begin{array}{c} d, u, x_{C\cup [i-1]}, \\ y_{C\cup [i-1]}, m_{\le s} \end{array}} \big \Vert (X_i Y_i)_{\begin{array}{c} d, u, x_{C\cup [i-1]}, \\ y_{C\cup [i-1]}, m_{\le s} \end{array}} \right) \right] \nonumber \\&= \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d_i, u_i, r_i, m_{\le s}) \leftarrow D^1_iU^1_iR^1_iM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (X^1_iY^1_i)_{d_i,u_i,r_i,m_{\le s}} \big \Vert (X_iY_i)_{d_i,u_i,r_i,m_{\le s}} \right) \right] \end{aligned}$$
(19)
$$\begin{aligned}&\ge \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{\le s}) \leftarrow X^1_iR^1_iM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (Y^1_i)_{x_i, r_i, m_{\le s}} \big \Vert (Y_i)_{x_i, r_i, m_{\le s}} \right) \right] \nonumber \\&= \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{\le s}) \leftarrow X^1_iR^1_iM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (Y^1_i)_{x_i,r_i,m_{\le s}} \big \Vert (Y_i)_{x_i,r_i,m_{< s}} \right) \right] \end{aligned}$$
(20)
$$\begin{aligned}&= \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{< s}) \leftarrow X^1_iR^1_iM^1_{< s} \end{array}}\left[ \mathrm {D}\,\left( (Y^1_iM^1_s)_{x_i,r_i,m_{<s}} \big \Vert (Y_i)_{x_i,r_i,m_{<s}} \otimes (M^1_s)_{x_i,r_i,m_{<s}} \right) \right] \nonumber \\&\ge \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{< s}) \leftarrow X^1_iR^1_iM^1_{< s} \end{array}}\left[ \mathrm {I}\,\left( (Y^1_i)_{x_i,r_i,m_{<s}} \, \!: \, \!(M^1_s)_{x_i,r_i,m_{<s}} \right) \right] \end{aligned}$$
(21)
$$\begin{aligned}&= \frac{1}{2}\sum _{i\notin C} \; \mathrm {I}\,\left( Y^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!X^1_iR^1_iM^1_{<s} \right) \end{aligned}$$
(22)

Above, we have used Fact 2.4 several times. Equation (19) follows from the definition of \(R_i\); Eq. (20) follows from the fact that \(Y\leftrightarrow X_iR_iM_{<s}\leftrightarrow M_s\) is a Markov chain for every i when s is odd; and Eq. (21) follows from Fact 2.5. A symmetric argument shows that when \(s \in [t]\) is even,

$$\begin{aligned} \frac{1}{2} \sum _{i \notin C} \; \mathrm {I}\,\left( X^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!Y^1_iR^1_iM^1_{<s} \right) \le \delta k. \end{aligned}$$

Since Eq. (22) holds for every odd \(s \in [t]\) and the above bound holds for every even \(s \in [t]\), summing over all \(s\) gives

$$\begin{aligned} \sum _{i\notin C} \left( \sum _{s\text {odd}} \mathrm {I}\,\left( Y^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iX^1_iM^1_{<s} \right) + \sum _{s\text {even}} \mathrm {I}\,\left( X^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iY^1_iM^1_{<s} \right) \right) \le 2 \delta kt. \end{aligned}$$
(23)

Note that in a true protocol the LHS of the above inequality is 0. Here we have shown that, even conditioned on success on all the coordinates in C, it remains small.

Combining Eqs. (13), (17), (18) and (23), and making standard use of Markov's inequality (see the sketch following Eq. (28)), we can find a coordinate \(j \notin C\) such that

$$\begin{aligned}&\mathrm {D}\,\left( X^1_j Y^1_j \big \Vert X_j Y_j \right) \le 12 \delta \end{aligned}$$
(24)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_j, x_j) \leftarrow R^1_jX^1_j \end{array}}\left[ \mathrm {D}\,\left( \left( Y_j^1 \right) _{r_j,x_j} \big \Vert \left( Y_j \right) _{x_j} \right) \right] \le 12 \delta \end{aligned}$$
(25)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_j, y_j) \leftarrow R^1_jY^1_j \end{array}}\left[ \mathrm {D}\,\left( \left( X_j^1 \right) _{r_j,y_j} \big \Vert \left( X_j \right) _{y_j} \right) \right] \le 12 \delta \end{aligned}$$
(26)
$$\begin{aligned}&\sum _{s \text {odd}} \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) + \sum _{s \text {even}} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) \le 12 \delta _1 c \end{aligned}$$
(27)
$$\begin{aligned}&\sum _{s \text {odd}} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) + \sum _{s \text {even}} \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) \le 12 \delta t. \end{aligned}$$
(28)
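The Markov-inequality step can be made explicit as follows (a sketch; here \(S = [k]{\setminus } C\), the quantities \(a^{(m)}_i \ge 0\) stand for the five per-coordinate quantities appearing in Eqs. (24)–(28), and \(B_m\) for the bounds on their sums over \(i \notin C\) given by Eqs. (13), (17), (18) and (23), which we do not repeat here). By Markov's inequality and a union bound,

$$\begin{aligned} \left| \left\{ i \in S : a^{(m)}_i > \frac{6 B_m}{|S|} \right\} \right| < \frac{|S|}{6} \quad \text {for each } m \in \{1, \ldots , 5\}, \qquad \text {so} \qquad \left| \bigcup _{m=1}^{5} \left\{ i \in S : a^{(m)}_i > \frac{6 B_m}{|S|} \right\} \right| < |S|. \end{aligned}$$

Hence some coordinate \(j \in S\) satisfies all five conditions simultaneously; the particular thresholds \(12\delta \), \(12 \delta _1 c\), and \(12 \delta t\) in Eqs. (24)–(28) arise from this argument with the respective bounds \(B_m\) and the lower bound on \(|S|\) established earlier in the proof (not reproduced here).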

Let

$$\begin{aligned} \varepsilon ^{\prime }&\mathop {=}\limits ^{\mathrm {def}}\frac{\varepsilon }{125t} \end{aligned}$$
(29)
$$\begin{aligned} \varepsilon _s&\mathop {=}\limits ^{\mathrm {def}}{\left\{ \begin{array}{ll} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\,\text {is odd}\\ \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\, \text {is even} \end{array}\right. }\end{aligned}$$
(30)
$$\begin{aligned} c_s&\mathop {=}\limits ^{\mathrm {def}}{\left\{ \begin{array}{ll} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\, \text {is even}\\ \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\,\text {is odd.} \end{array}\right. } \end{aligned}$$
(31)

From Eq. (28) and the Cauchy–Schwarz inequality, we have that \(\sum _{s=1}^t \sqrt{\varepsilon _s} \le \sqrt{t \sum _{s=1}^t \varepsilon _s} \le \sqrt{12\delta }\, t \). From Eqs. (24) to (26) and Lemma 2.12, we can infer that \(\left( X_j, Y_j \right) \) is \((1-10\sqrt{3\delta })\)-embeddable in \(\left( X^1_j R^1_j, Y^1_j R^1_j \right) \). This, combined with Eqs. (27) and (28) and Lemma 3.5 (taking \(\varepsilon ^{\prime }\), \(\varepsilon _s\), and \(c_s\) in Lemma 3.5 to be as defined in Eqs. (29), (30), and (31), respectively, and taking \( X Y X^{\prime } Y^{\prime } R^{\prime } M_1^{\prime } \cdots M_t^{\prime }\) to be \( X_j Y_j X_j^1 Y^1_j R^1_j M_1^1 \cdots M_t^1 \)), implies the following. There exists a public-coin t-round protocol \(\mathscr {Q}^1\) between Alice, with input \(X_j\), and Bob, with input \(Y_j\), with Alice sending the first message and total communication

$$\begin{aligned} \frac{12 \delta _1 c + 5t}{\varepsilon ^{\prime }} + O\,\left( t \log \frac{1}{\varepsilon ^{\prime }} \right) < \mathrm {D}^{(t), \mu }_{\varepsilon }(f) \end{aligned}$$

such that at the end Alice possesses \(R_AM_{A,1} \cdots M_{A,t}\) and Bob possesses \(R_BM_{B,1} \cdots M_{B,t}\), satisfying

$$\begin{aligned}&\left\| X_j Y_j R_A R_B M_{A,1}M_{B,1} \cdots M_{A,t}M_{B,t} - X^1_j Y^1_j R^1_j R^1_j M^1_1M^1_1 \cdots M^1_tM^1_t \right\| _{1}\\&\le 10 \sqrt{3 \delta } + 3 \sqrt{12 \delta } t + 6 \varepsilon ^{\prime } t < \frac{\varepsilon }{2}. \end{aligned}$$

Assume towards contradiction that \(\Pr \,\left[ T_{j}=1 \big \vert T^{\left( r \right) }= 1 \right] > 1-\frac{\varepsilon }{2}\). Consider a protocol \(\mathscr {Q}^2\) (with no communication) for f between Alice, with input \(X^1_j R^1_j M^1_1 \cdots M^1_t\), and Bob, with input \(Y^1_jR^1_j M^1_1 \cdots M^1_t\), as follows. Bob generates the rest of the random variables present in \(Y^1\) (not present in his input) himself. He can do this because, conditioned on his input, those other random variables are independent of Alice's input. (Here we used Fact 2.15.) He then generates the output for the j-th coordinate in \(\mathscr {Q}\), and makes it the output of \(\mathscr {Q}^2\). This ensures that the success probability of \(\mathscr {Q}^2\) is \(\Pr \,\left[ T_{j}=1 \big \vert T^{\left( r \right) }= 1 \right] >1 - \frac{\varepsilon }{2}\). Now consider a protocol \(\mathscr {Q}^3\) for f, with Alice's input \(X_j\) and Bob's input \(Y_j\), which is the composition of \(\mathscr {Q}^1\) followed by \(\mathscr {Q}^2\). By Fact 2.9, the probability of success (averaged over the public coins and inputs \(X_j\) and \(Y_j\)) of \(\mathscr {Q}^3\) is larger than \(1 - \varepsilon \) (see the calculation below). Finally, by fixing the public coins of \(\mathscr {Q}^3\), we get a deterministic protocol \(\mathscr {Q}^4\) for f with Alice's input \(X_j\) and Bob's input \(Y_j\) such that the communication of \(\mathscr {Q}^4\) is less than \(\mathrm {D}^{(t), \mu }_{\varepsilon }(f)\) and the success probability (averaged over the inputs \(X_j\) and \(Y_j\)) of \(\mathscr {Q}^4\) is larger than \(1 - \varepsilon \). This contradicts the definition of \(\mathrm {D}^{(t), \mu }_{\varepsilon }(f)\). (Recall that \(X_jY_j\) are distributed according to \(\mu \).) So it must be that \(\Pr \,\left[ T_{j} = 1 \big \vert T^{\left( r \right) }= 1 \right] \le 1 - \frac{\varepsilon }{2}\). The claim now follows by setting \(i_{r+1} = j\).
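The calculation behind the success probability of \(\mathscr {Q}^3\) is the following (using that the probability of any event changes by at most the \(\ell _1\) distance between the underlying distributions):

$$\begin{aligned} \Pr \,\left[ \mathscr {Q}^3 \text { outputs correctly} \right]&\ge \Pr \,\left[ T_{j}=1 \big \vert T^{\left( r \right) }= 1 \right] - \left\| X_j Y_j R_A R_B M_{A,1}M_{B,1} \cdots M_{A,t}M_{B,t} - X^1_j Y^1_j R^1_j R^1_j M^1_1M^1_1 \cdots M^1_tM^1_t \right\| _{1} \\&> \left( 1 - \frac{\varepsilon }{2} \right) - \frac{\varepsilon }{2} = 1 - \varepsilon . \end{aligned}$$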

4 Deferred Proofs

Proof of Lemma 3.3

Let us introduce a new random variable N with joint distribution

$$\begin{aligned} X^{\prime } Y^{\prime } N \mathop {=}\limits ^{\mathrm {def}}(X^{\prime }Y^{\prime })(M^{\prime }|X^{\prime }). \end{aligned}$$
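Spelled out, this notation means that for all m, x, and y,

$$\begin{aligned} \Pr \,\left[ X^{\prime }=x, Y^{\prime }=y, N=m \right] = \Pr \,\left[ X^{\prime }=x, Y^{\prime }=y \right] \cdot \Pr \,\left[ M^{\prime }=m \big \vert X^{\prime }=x \right] . \end{aligned}$$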

Note that \(Y^{\prime } \leftrightarrow X^{\prime } \leftrightarrow N\) is a Markov chain. Using Lemma 2.2, we have that

$$\begin{aligned} \mathrm {D}\,\left( X^{\prime }Y^{\prime }M^{\prime } \big \Vert X^{\prime }Y^{\prime }N \right) = \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime } \, \!\big \vert \, \!X^{\prime } \right) \le \varepsilon . \end{aligned}$$
(32)

Applying Fact 2.6, we get \(\left\| X^{\prime }Y^{\prime }M^{\prime }-X^{\prime }Y^{\prime }N \right\| _{1} \le \sqrt{\varepsilon }\). Theorem 3.1 and Claim 4.1 below together imply that there exists a public-coin protocol between Alice and Bob, with inputs \(X^{\prime }\) and \(Y^{\prime }\), with a single message from Alice to Bob of \(\frac{c+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon +\varepsilon ^{\prime }} \right) =\frac{c+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \) bits, at the end of which Alice and Bob possess random variables \(N^{\prime }_A\) and \(N^{\prime }_B\), respectively, satisfying

$$\begin{aligned} \left\| X^{\prime } Y^{\prime } N^{\prime }_AN^{\prime }_B - X^{\prime } Y^{\prime } NN \right\| _{1} \le 2 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime }. \end{aligned}$$

Finally, using the triangle inequality for the \(\ell _1\) norm, we conclude the desired bound (spelled out below).
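In more detail (a sketch, with \(3\sqrt{\varepsilon } + 6\varepsilon ^{\prime }\) being the bound in the form in which Lemma 3.3 is invoked later, e.g. in the base case of the proof of Lemma 3.4): since duplicating a random variable amounts to applying the same function to both distributions (Fact 2.9), we have

$$\begin{aligned} \left\| X^{\prime } Y^{\prime } N^{\prime }_AN^{\prime }_B - X^{\prime } Y^{\prime } M^{\prime }M^{\prime } \right\| _{1}&\le \left\| X^{\prime } Y^{\prime } N^{\prime }_AN^{\prime }_B - X^{\prime } Y^{\prime } NN \right\| _{1} + \left\| X^{\prime } Y^{\prime } NN - X^{\prime } Y^{\prime } M^{\prime }M^{\prime } \right\| _{1} \\&\le \left( 2 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime } \right) + \left\| X^{\prime } Y^{\prime } N - X^{\prime } Y^{\prime } M^{\prime } \right\| _{1} \le 3 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime }. \end{aligned}$$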

Claim 4.1

Let random variables \(X^{\prime }\), \(Y^{\prime }\), \(M^{\prime }\), and N and numbers c, \(\varepsilon \), and \(\varepsilon ^{\prime }\) be the same as in the statement and the proof of Lemma 3.3. It holds that

$$\begin{aligned} \Pr _{\begin{array}{c} (m,x,y) \leftarrow NX^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \le 3 \varepsilon ^{\prime } + \sqrt{\varepsilon }. \end{aligned}$$

Proof

For any m, x, and y, it holds that

$$\begin{aligned} \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] }&= \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \\&= \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }\\&\quad +\, \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \\&\quad +\, \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] }. \end{aligned}$$

From the union bound, the above calculation, the Markov chain \(Y^{\prime } \leftrightarrow X^{\prime } \leftrightarrow N\) (which gives \(\Pr \,\left[ N=m|X^{\prime }=x \right] = \Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] \)), and the fact that \( 1 > \varepsilon > 0 \), we get

$$\begin{aligned}&\Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad {} = \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad {} \le \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \ge \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad \quad {} + \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \ge \frac{c+1}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad \quad {} + \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } \ge \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right] . \end{aligned}$$
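The union bound applies because the three thresholds on the right-hand side sum to at most the original one: since \(0< \varepsilon < 1\),

$$\begin{aligned} \frac{\varepsilon +1}{\varepsilon ^{\prime }} + \frac{c+1}{\varepsilon ^{\prime }} + \frac{\varepsilon +1}{\varepsilon ^{\prime }} = \frac{c + 2\varepsilon + 3}{\varepsilon ^{\prime }} \le \frac{c+5}{\varepsilon ^{\prime }}, \end{aligned}$$

so if the logarithm on the left-hand side is at least \(\frac{c+5}{\varepsilon ^{\prime }}\), then at least one of its three summands in the decomposition above must be at least its respective threshold.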

We bound each of the above terms separately, starting with the first one. Let us define the set

$$\begin{aligned} G_1 \mathop {=}\limits ^{\mathrm {def}}\left\{ (m,x,y) : \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } < \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right\} . \end{aligned}$$

The following calculation gives a bound on the first term.

$$\begin{aligned} 0&\ge -\mathop {\mathbb {E}}\limits _{\begin{array}{c} (x,y) \leftarrow X^{\prime }Y^{\prime } \end{array}}\left[ \mathrm {D}\,\left( M^{\prime }_{xy} \big \Vert N_{xy} \right) \right] \nonumber \\&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right] \end{aligned}$$
(33)
$$\begin{aligned}&= \sum _{(m,x,y) \in G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \sum _{(m,x,y) \notin G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\ge \sum _{(m,x,y) \in G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(34)
$$\begin{aligned}&= \sum _{(m,x,y) \notin G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} -\, \mathrm {D}\,\left( M^{\prime }X^{\prime }Y^{\prime } \big \Vert NX^{\prime }Y^{\prime } \right) + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(35)
$$\begin{aligned}&\ge - 1 - \varepsilon + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(36)

Above, Eqs. (33) and (35) follow from the definition of the relative entropy and Eq. (34) follows from the definition of \(G_1\). To get Eq. (36), we use Fact 2.7 and Eq. (32). Equation (36) implies that

$$\begin{aligned} \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \le \varepsilon ^{\prime }. \end{aligned}$$

To upper bound the second term, let us define

$$\begin{aligned} G_2 \mathop {=}\limits ^{\mathrm {def}}\left\{ (m,x,y) : \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } < \frac{c+1}{\varepsilon ^{\prime }} \right\} . \end{aligned}$$

The following calculation gives a bound on the second term.

$$\begin{aligned} c&\ge \mathrm {I}\,\left( M^{\prime } \, \!: \, \!X^{\prime } \, \!\big \vert \, \!Y^{\prime } \right) \end{aligned}$$
(37)
$$\begin{aligned}&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} (m,x,y)\leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \right] \end{aligned}$$
(38)
$$\begin{aligned}&= \sum _{(m,x,y) \in G_2} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x,Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \sum _{(m,x,y) \notin G_2} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x,Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \right) \nonumber \\&\ge - 1 + \frac{c+1}{\varepsilon ^{\prime }} \cdot \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_2 \right] \end{aligned}$$
(39)

Above, Eq. (37) is one of the assumptions in Lemma 3.3. Equation (38) follows from the definition of the conditional mutual information and Eq. (39) follows from the definition of \(G_2\) and Fact 2.7. Equation (39) implies that

$$\begin{aligned} \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_2 \right] \le \varepsilon ^{\prime }. \end{aligned}$$

To bound the last term, we define

$$\begin{aligned} G_3 \mathop {=}\limits ^{\mathrm {def}}\left\{ (m,x,y) : \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } < \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right\} . \end{aligned}$$

The following calculation gives a bound on the third term.

$$\begin{aligned} \varepsilon&\ge \mathrm {D}\,\left( X^{\prime }Y^{\prime }M^{\prime } \big \Vert X^{\prime }Y^{\prime }N \right) \nonumber \\&\ge \mathrm {D}\,\left( Y^{\prime }M^{\prime } \big \Vert Y^{\prime }N \right) \end{aligned}$$
(40)
$$\begin{aligned}&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } \right] \nonumber \\&= \sum _{(m,x,y) \in G_3} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m,Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \sum _{(m,x,y) \notin G_3} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } \right) \nonumber \\&\ge -1 + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_3 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(41)

Above, Eq. (40) follows from Fact 2.8 and Eq. (41) follows from the definition of \(G_3\) and Fact 2.7. Equation (41) implies that

$$\begin{aligned} \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_3 \right] \le \varepsilon ^{\prime }. \end{aligned}$$

Combining the bounds for the three terms, we get

$$\begin{aligned} \Pr _{\begin{array}{c} (m,x,y)\leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \le 3 \varepsilon ^{\prime }. \end{aligned}$$

Using \(\left\| X^{\prime }Y^{\prime }M^{\prime } - X^{\prime }Y^{\prime }N \right\| _{1} \le \sqrt{\varepsilon } \) (as was shown previously), together with the fact that the probability of any event changes by at most the \(\ell _1\) distance when we pass from \(M^{\prime }X^{\prime }Y^{\prime }\) to \(NX^{\prime }Y^{\prime }\), we finally have

$$\begin{aligned} \Pr _{\begin{array}{c} (m,x,y)\leftarrow NX^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \le 3 \varepsilon ^{\prime }+\sqrt{\varepsilon }. \end{aligned}$$

\(\square \)

Proof of Lemma 3.4

We prove the lemma by induction on t. For the base case \(t=1\), note that

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime }R^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime } \right)&= \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime } \right) \le c_1 \end{aligned}$$

and

$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime }R^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime } \right)&= \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime } \right) \le \varepsilon _1. \end{aligned}$$

Lemma 3.3 implies (by taking \(X^{\prime }\), \(Y^{\prime }\), and \(M^{\prime }\) in Lemma 3.3 to be \(X^{\prime }R^{\prime }\), \(Y^{\prime }R^{\prime }\), and \(M_1^{\prime }\) respectively) that Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), can run a public-coin protocol with a single message from Alice to Bob of

$$\begin{aligned} \frac{c_1+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \end{aligned}$$

bits and generate random variables \(M_{A,1}^{\prime }\) and \(M_{B,1}^{\prime }\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime } X^{\prime } Y^{\prime } M^{\prime }_1 M^{\prime }_1 - R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \right\| _{1} \le 3 \sqrt{\varepsilon _1} + 6 \varepsilon ^{\prime }. \end{aligned}$$

Now let \(t > 1\). We assume that t is odd; for even t, a similar argument applies. From the induction hypothesis, there exists a public-coin \((t-1)\)-round protocol \(\mathscr {P}_{t-1}\) between Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), with Alice sending the first message, and total communication

$$\begin{aligned} \frac{\sum _{s=1}^{t-1}c_s+5(t-1)}{\varepsilon ^{\prime }} + O\,\left( (t-1) \log \frac{1}{\varepsilon ^{\prime }} \right) \end{aligned}$$
(42)

such that at the end Alice and Bob possess random variables \(M_{A,1}^{\prime }, \ldots , M_{A,t-1}^{\prime }\) and \(M_{B,1}^{\prime }, \ldots , M_{B,t-1}^{\prime }\), respectively, satisfying

$$\begin{aligned}&\left\| R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \cdots M_{A,t-1}^{\prime }M_{B,t-1}^{\prime } - R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime } \right\| _{1} \nonumber \\&\quad \le 3 \sum _{s=1}^{t-1} \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } (t-1). \end{aligned}$$
(43)

Note that

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } R^{\prime } M_{<t}^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime }M_{<t}^{\prime } \right) = \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime }M_{<t}^{\prime } \right) \le c_t \end{aligned}$$

and

$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime }R^{\prime }M_{<t}^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime }M_{<t}^{\prime } \right) = \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime }M_{<t}^{\prime } \right) \le \varepsilon _t. \end{aligned}$$

Therefore, Lemma 3.3 implies (by taking \(X^{\prime }\), \(Y^{\prime }\), and \(M^{\prime }\) in Lemma 3.3 to be \(X^{\prime }R^{\prime }M_{<t}^{\prime }\), \(Y^{\prime }R^{\prime }M_{<t}^{\prime }\), and \(M_t^{\prime }\), respectively) that Alice, with input \(X^{\prime }R^{\prime }M_{<t}^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }M_{<t}^{\prime }\), can run a public-coin protocol \(\mathscr {P}\) with a single message from Alice to Bob of

$$\begin{aligned} \frac{c_t+5}{\varepsilon ^{\prime }} + O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \end{aligned}$$
(44)

bits and generate new random variables \(M^{\prime \prime }_{A,t}\) and \(M^{\prime \prime }_{B,t}\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime } \cdots M_{t-1}^{\prime } M_t^{\prime }M_t^{\prime } - R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime } \cdots M_{t-1}^{\prime } M^{\prime \prime }_{A,t} M^{\prime \prime }_{B,t} \right\| _{1} \!\le \! 3 \sqrt{\varepsilon _t} \!+\! 6 \varepsilon ^{\prime }. \end{aligned}$$
(45)

Thus, by Fact 2.9 and Eq. (43), Alice, with input \(X^{\prime } R^{\prime } M_{A,<t}^{\prime }\), and Bob, with input \(Y^{\prime } R^{\prime } M_{B,<t}^{\prime }\), on running protocol \(\mathscr {P}\) will generate new random variables \(M_{A,t}^{\prime }\) and \(M_{B,t}^{\prime }\), respectively, satisfying

$$\begin{aligned}&\left\| R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \cdots M_{A,t-1}^{\prime }M_{B,t-1}^{\prime } M_{A,t}^{\prime }M_{B,t}^{\prime } \!-\! R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime } M_{A,t}^{\prime \prime }M_{B,t}^{\prime \prime } \right\| _{1} \nonumber \\&\quad = \left\| R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \cdots M_{A,t-1}^{\prime }M_{B,t-1}^{\prime } - R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime } \right\| _{1} \nonumber \\&\quad {} \le 3 \sum _{s=1}^{t-1} \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } (t-1). \end{aligned}$$
(46)

Above, the equality follows from Fact 2.9, because \(M_{A,t}^{\prime }\) and \(M_{A,t}^{\prime \prime }\) can be obtained by applying the same function (protocol) to \(X^{\prime }R^{\prime }M^{\prime }_{A,<t}\) and \(X^{\prime }R^{\prime }M^{\prime }_{<t}\), respectively, and similarly for \(M_{B,t}^{\prime }\) and \(M_{B,t}^{\prime \prime }\); the inequality follows from Eq. (43). Therefore, by composing protocol \(\mathscr {P}_{t-1}\) and protocol \(\mathscr {P}\), using Eqs. (42) and (44)–(46) and the triangle inequality for the \(\ell _1\) norm, we get a public-coin t-round protocol \(\mathscr {P}_t\) between Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), with Alice sending the first message, and total communication

$$\begin{aligned} \frac{\sum _{s=1}^{t} c_s + 5 t}{\varepsilon ^{\prime }} + O\,\left( t \log \frac{1}{\varepsilon ^{\prime }} \right) , \end{aligned}$$

such that at the end Alice and Bob possess random variables \(M_{A,1}^{\prime }, \ldots , M_{A,t}^{\prime }\) and \(M_{B,1}^{\prime }, \ldots , M_{B,t}^{\prime }\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime }M_t^{\prime }-R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \le 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$
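In more detail, the \(\ell _1\) bound for \(\mathscr {P}_t\) follows from Eqs. (45) and (46) and the triangle inequality (for the first term below, duplicating the earlier messages \(M_1^{\prime }, \ldots , M_{t-1}^{\prime }\) does not change the \(\ell _1\) distance, so Eq. (45) applies):

$$\begin{aligned}&\left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime }M_t^{\prime } - R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \\&\quad \le \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime }M_t^{\prime }M_t^{\prime } - R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime }M_{A,t}^{\prime \prime }M_{B,t}^{\prime \prime } \right\| _{1} \\&\qquad {}+ \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime }M_{A,t}^{\prime \prime }M_{B,t}^{\prime \prime } - R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \\&\quad \le \left( 3 \sqrt{\varepsilon _t} + 6 \varepsilon ^{\prime } \right) + \left( 3 \sum _{s=1}^{t-1} \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } (t-1) \right) = 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$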

\(\square \)

Proof of Lemma 3.5

In \(\mathscr {Q}_t\), Alice and Bob, using public coins and no communication, first generate \(R_A\) and \(R_B\) such that \( \left\| X Y R_A R_B - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right\| _{1} \le \tau \). They can do this because \(\left( X, Y \right) \) is \(\left( 1 - \tau \right) \)-embeddable in \(\left( X^{\prime }R^{\prime }, Y^{\prime }R^{\prime } \right) \). They then run protocol \(\mathscr {P}_t\) (given by Lemma 3.4) with Alice's input \(XR_A\) and Bob's input \(YR_B\); at the end, Alice and Bob possess \(M_{A,1},\ldots , M_{A,t}\) and \(M_{B,1},\ldots , M_{B,t}\), respectively. From Lemma 3.4, the communication of \(\mathscr {Q}_t\) is as desired. Now, from Fact 2.9, Lemma 3.4, and the triangle inequality for the \(\ell _1\) norm, we get

$$\begin{aligned}&\left\| X Y R_A R_B M_{A,1} M_{B,1} \cdots M_{A,t} M_{B,t} - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime } M_t^{\prime } \right\| _{1} \\&\quad \le \tau + 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime }t. \end{aligned}$$
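In more detail (a sketch; here \(\varPi \) denotes the map that runs \(\mathscr {P}_t\) on an input pair and appends the generated messages, a notation introduced only for this calculation):

$$\begin{aligned}&\left\| X Y R_A R_B M_{A,1} M_{B,1} \cdots M_{A,t} M_{B,t} - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime } M_t^{\prime } \right\| _{1} \\&\quad \le \left\| \varPi \left( X Y R_A R_B \right) - \varPi \left( X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right) \right\| _{1} + \left\| \varPi \left( X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right) - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime } M_t^{\prime } \right\| _{1} \\&\quad \le \left\| X Y R_A R_B - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right\| _{1} + \left( 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t \right) \le \tau + 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t, \end{aligned}$$

where the first term in the second line is bounded using Fact 2.9 (the same map \(\varPi \) is applied to both distributions) and the second term using Lemma 3.4.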

\(\square \)

5 Open Problems

Some natural questions that arise from this work are:

  1.

    Recently Braverman et al. [6] improved our result by showing that

    $$\begin{aligned} \mathrm {R}^{(t),\mathrm {pub}}_{1-2^{-\varOmega (\varepsilon ^2k)}} \left( f^k \right) = \varOmega \,\left( \varepsilon ^2 \cdot k \cdot \left( \mathrm {R}^{(7t), \mathrm {pub}}_{\varepsilon }(f) - \kappa \left( \frac{t \log t}{\varepsilon } - \frac{t}{\varepsilon ^2} \right) \right) \right) , \end{aligned}$$

    for some constant \(\kappa \). Can the dependence on t be improved further?

  2.

    Direct product conjectures for quantum communication complexity are still widely open. Can these techniques be extended to show direct product theorems for bounded-round quantum communication complexity?