1 Introduction

A fundamental question in complexity theory is how much resource is needed to solve k independent instances of a problem compared to the resource required to solve one instance. More specifically, suppose that solving one instance of a problem with probability of correctness p requires c units of some resource in a given model of computation. A natural way to solve k independent instances of the same problem is to solve them independently, which needs \(k \cdot c\) units of resource, and the overall success probability is \(p^k\). A strong direct product theorem for this problem would state that any algorithm which solves the k independent instances with \(o(k\cdot c)\) units of the resource can compute all k instances correctly with probability at most \(p^{\varOmega \,\left( k \right) }\). The weaker direct sum theorems state that if only \(o(k\cdot c)\) units of resource are provided for computing k independent instances of the problem, then the success probability for computing all k instances correctly is at most a constant \(q < 1\).

In this work, we are concerned with the model of communication complexity which was introduced by Yao [40]. In this model there are different parties who wish to compute a joint relation of their inputs. They do local computation, use public or private coins, and communicate to achieve this task. The resource that is counted is the number of bits communicated. The text by Kushilevitz and Nisan [27] is an excellent reference for this model.

Direct product and direct sum questions have been extensively investigated in different sub-models of communication complexity. Some examples of known direct product theorems are the theorem of Parnafes et al. [32] for forests of communication protocols and Shaltiel’s theorem [36] for the discrepancy bound (which is a lower bound on the distributional communication complexity) under the uniform distribution, which was extended to arbitrary distributions by Lee et al. [29], to the multi-party case by Viola and Wigderson [39], and to the generalized discrepancy bound by Sherstov [38]. Jain et al. [16] proved a direct product theorem for the subdistribution bound. Klauck et al. [26] proved one for the quantum communication complexity of the set disjointness problem, and Klauck [24] proved one for the public-coin communication complexity of the set disjointness problem (which was re-proven using different arguments by Jain [14]). Ben-Aroya et al. [4] showed one for the one-way quantum communication complexity of the index function problem. Jain showed a direct product theorem for randomized one-way communication complexity and for the conditional relative min-entropy bound [14], which is a lower bound on public-coin communication complexity. Recently, Jain and Yao [22] showed a strong direct product theorem in terms of the smooth rectangle bound. Later, Braverman and Weinstein [8] strengthened this result by showing a strong direct product theorem in terms of the (internal) information cost. Direct sum theorems have been shown in the public-coin one-way model [18], in the public-coin simultaneous message passing model [18], in the entanglement-assisted quantum one-way communication model [20], in the private-coin simultaneous message passing model [15], in the constant-round public-coin two-way model [5], and in the general two-way model [3]. On the other hand, strong direct product conjectures have been shown to be false by Shaltiel [36] in some models of distributional communication complexity (and of query complexity and circuit complexity) under specific choices of the error parameter. Examples of direct product theorems in other models of computation include Yao’s XOR lemma [41], Raz’s theorem [34] for two-prover games, Shaltiel’s theorem [36] for fair decision trees, the theorem of Nisan et al. [30] for decision forests, Drucker’s theorem [11] for randomized query complexity, Sherstov’s theorem [38] for approximate polynomial degree, and Lee and Roland’s theorem [28] for quantum query complexity. Besides their inherent importance, direct product theorems have found various important applications, such as in probabilistically checkable proofs [34], in circuit complexity [41], and in showing time-space trade-offs [1, 24, 26].

In this paper, we show a direct product theorem for two-party bounded-round public-coin randomized communication complexity. In this model, for computing a relation \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) (where \(\mathscr {X}\), \(\mathscr {Y}\), and \(\mathscr {Z}\) are finite sets), one party, say Alice, is given an input \(x\in \mathscr {X}\) and the other party, say Bob, is given an input \(y \in \mathscr {Y}\). They are supposed to do local computations using public coins shared between them, communicate for a fixed number of rounds, and at the end output an element \(z\in \mathscr {Z}\). We only consider complete relations, so such a z always exists. They succeed if \((x,y,z) \in f\). For a natural number \(t \ge 1\) and \(\varepsilon \in (0,1)\), let \(\mathrm {R}^{(t), \mathrm {pub}}_{\varepsilon } (f)\) be the two-party t-round public-coin communication complexity of f with worst case error \(\varepsilon \) (see Definition 2.13).

We show the following.

Theorem 1.1

Let \(\mathscr {X}\), \(\mathscr {Y}\), and \(\mathscr {Z}\) be finite sets, \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) a complete relation, \(\varepsilon > 0\), and \(k, t \ge 1\) integers. There exists a constant \(\kappa \ge 0\) such that

$$\begin{aligned} \mathrm {R}^{(t),\mathrm {pub}}_{1-\left( 1-\varepsilon /2 \right) ^{\varOmega \,\left( k \varepsilon ^2/t^2 \right) }} \left( f^k \right) = \varOmega \,\left( \frac{\varepsilon \cdot k}{t} \cdot \left( \mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f) - \frac{\kappa t^2}{\varepsilon } \right) \right) . \end{aligned}$$

In particular, it implies a strong direct product theorem for the two-party constant-round public-coin randomized communication complexity of all complete relations. Our result generalizes the result of Jain [14], which can be regarded as the special case \(t=1\). Prior to our result, randomized one-way communication complexity was the only model for which a strong direct product theorem had been established [14]. Hence our result can be considered important progress towards settling the strong direct product conjecture for two-party public-coin communication complexity, a major open question in this area. Recently, our result was improved by Braverman et al. [6] with a better dependence on the number of rounds, using a new sampling technique introduced in Ref. [7].

As a direct consequence of our result, we get a direct product theorem for the pointer chasing problem, defined as follows. Let \(n, t\ge 1\) be integers. Alice and Bob are given functions \(F_A: [n]\rightarrow [n]\) and \(F_B: [n]\rightarrow [n]\), respectively. Let \(F^t\) denote the alternate composition of \(F_A\) and \(F_B\) applied t times, starting with \(F_A\). The parties are supposed to communicate and determine \(F^t(1)\). In the bit version of the problem, the players are supposed to output the least significant bit of \(F^t(1)\). We refer to the t-pointer chasing problem as \(\mathrm {FP}_t\) and to the bit version as \(\mathrm {BP}_t\). The pointer chasing problem naturally captures the trade-off between the number of messages exchanged and the communication used. There is a straightforward t-round deterministic protocol with \(t\cdot \log n\) bits of communication for both \(\mathrm {FP}_t\) and \(\mathrm {BP}_t\) (a sketch is given below). However, if only \(t-1\) rounds of communication are allowed between the parties, exponentially more communication is required, treating t as a fixed constant. The communication complexity of this problem has been very well studied in both the classical and the quantum models [17, 23, 25, 31, 33]. Some tight lower bounds that we know so far are stated in Theorem 1.2 below.
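The straightforward protocol is easy to make concrete. The following is a minimal sketch in Python, where the encoding of \(F_A, F_B\) as dictionaries and the random instance are illustrative assumptions only: the parties alternately advance the pointer, starting with Alice, and each message is one pointer of \(\lceil \log n \rceil \) bits.

```python
import math, random

# A minimal sketch of the straightforward t-round protocol for FP_t (and BP_t):
# the pointer is advanced alternately, starting with Alice, and the current
# pointer (ceil(log2 n) bits) is sent in each round.  The random instance below
# is an illustrative assumption, not an input distribution used in the paper.

def naive_pointer_chasing(F_A, F_B, t, n):
    """Alice holds F_A, Bob holds F_B (functions from [n] to [n], as dicts).
    Returns (F^t(1), total number of bits communicated)."""
    pointer, bits_sent = 1, 0
    for round_no in range(1, t + 1):
        if round_no % 2 == 1:            # odd rounds: Alice applies F_A and sends
            pointer = F_A[pointer]
        else:                            # even rounds: Bob applies F_B and sends
            pointer = F_B[pointer]
        bits_sent += math.ceil(math.log2(n))
    return pointer, bits_sent            # bits_sent = t * ceil(log2 n)

n, t = 8, 3
rng = random.Random(0)
F_A = {i: rng.randrange(1, n + 1) for i in range(1, n + 1)}
F_B = {i: rng.randrange(1, n + 1) for i in range(1, n + 1)}
answer, cost = naive_pointer_chasing(F_A, F_B, t, n)
print(answer, cost, answer % 2)          # FP_t value, communication, BP_t bit
```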

Theorem 1.2

 [33] For any integer \(t \ge 1\),

$$\begin{aligned} \mathrm {R}^{(t-1),\mathrm {pub}}_{1/3} \left( \mathrm {FP}_t \right)&\ge \varOmega \,\left( n\log ^{(t-1)}n \right) \\ \mathrm {R}^{(t-1),\mathrm {pub}}_{1/3} \left( \mathrm {BP}_t \right)&\ge \varOmega \,\left( n \right) \end{aligned}$$

As a consequence of Theorem 1.1 we get strong direct product results for this problem. Note that in the descriptions of \(\mathrm {FP}_t\) and \(\mathrm {BP}_t\), t is a fixed constant, independent of the input size.

Corollary 1.3

For integers \(t,k\ge 1\),

$$\begin{aligned} \mathrm {R}^{(t-1),\mathrm {pub}}_{1-2^{-\varOmega \,\left( k/t^2 \right) }} \left( \mathrm {FP}_t^k \right)&\ge \varOmega \,\left( \frac{k}{t} \cdot n \log ^{(t-1)}n \right) \\ \mathrm {R}^{(t-1),\mathrm {pub}}_{1-2^{-\varOmega \,\left( k/t^2 \right) }} \left( \mathrm {BP}_t^k \right)&\ge \varOmega \,\left( \frac{k}{t} \cdot n \right) . \end{aligned}$$
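For instance, the first bound can be read off by instantiating Theorem 1.1 with error \(\varepsilon = 1/3\) and \(t-1\) rounds and then plugging in the first bound of Theorem 1.2; since t is a fixed constant, the \(\kappa t^2/\varepsilon \) term is dominated for large n:

$$\begin{aligned} \mathrm {R}^{(t-1),\mathrm {pub}}_{1-\left( 5/6 \right) ^{\varOmega \,\left( k/t^2 \right) }} \left( \mathrm {FP}_t^k \right) = \varOmega \,\left( \frac{k}{t} \cdot \left( \mathrm {R}^{(t-1),\mathrm {pub}}_{1/3} \left( \mathrm {FP}_t \right) - O\,\left( t^2 \right) \right) \right) = \varOmega \,\left( \frac{k}{t} \cdot n\log ^{(t-1)}n \right) , \end{aligned}$$

and \(\left( 5/6 \right) ^{\varOmega \,\left( k/t^2 \right) } = 2^{-\varOmega \,\left( k/t^2 \right) }\). The bound for \(\mathrm {BP}_t\) follows in the same way from the second bound of Theorem 1.2.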

1.1 Our Techniques

We prove our direct product result using information theoretic arguments. Information theory is a versatile tool in communication complexity, especially in proving lower bounds and direct sum and direct product theorems [2, 3, 5, 9, 14, 15, 18–20]. Similar information theoretic arguments have also been used to prove parallel repetition theorems for two-prover one-round games [12, 34]. The broad argument that we use is as follows. For a given relation f, let c be the communication required for computing one instance with t rounds and constant success. Consider a protocol for computing \(f^k\) with t rounds and communication o(kc), and condition on success in some \(\ell \) coordinates. If the overall success probability in these \(\ell \) coordinates is already as small as we want, then we are done. Otherwise, we exhibit another coordinate j outside of these \(\ell \) coordinates such that success in the j-th coordinate, even when conditioned on success in these \(\ell \) coordinates, is bounded away from 1. This way the overall success probability keeps going down and eventually becomes exponentially small in k. We carry out this argument in the distributional setting, where one is concerned with the average error over inputs coming from a specified distribution rather than the worst case error over all inputs. The distributional setting is then related to the worst case setting by the well known Yao’s principle [40].

More concretely, let \(\mu \) be a distribution on \(\mathscr {X}\times \mathscr {Y}\), possibly non-product across \(\mathscr {X}\) and \(\mathscr {Y}\). Let c be the minimum communication required for computing f with t-round protocols having error at most \(\varepsilon \) averaged over \(\mu \). Let the inputs for \(f^k\) be drawn from the distribution \(\mu ^{k}\) (k independent copies of \(\mu \)). Consider a t-round protocol \(\mathscr {P}\) for \(f^k\) with communication o(kc) and, for the rest of the argument, condition on success in a set of coordinates C. If the probability of this event is as small as we desire, then we are done. Otherwise we exhibit a new coordinate \(j\notin C\) satisfying the following conditions, conditioned on success in all coordinates in C. The distribution of Alice’s and Bob’s inputs in the j-th coordinate (\(X_j Y_j\)) is quite close to \(\mu \). (Here we use the same symbol to represent a random variable and its distribution.) The joint distribution \(X_j Y_j M\), where M is the message transcript of \(\mathscr {P}\), can be approximated very well by Alice and Bob using a t-round protocol for f, when they are given input according to \(\mu \), with communication less than c. This shows that the success probability in the j-th coordinate must be bounded away from one.

To sample the transcript, we adopt the message compression protocol of Braverman and Rao [5], which they used to show a direct sum theorem for the same communication model that we consider. Informally, the protocol can be stated as follows.

Braverman–Rao protocol (informal)

Given a Markov chain \(Y \leftrightarrow X \leftrightarrow M\) (see Definition 2.1), there exists a public-coin protocol between Alice and Bob, with inputs X and Y, with a single message from Alice to Bob of \(O\,\left( \mathrm {I}\,\left( X \, \!: \, \!M \, \!\big \vert \, \!Y \right) + \sqrt{\mathrm {I}\,\left( X \, \!: \, \!M \, \!\big \vert \, \!Y \right) } \right) + 1\) bits, such that at the end of the protocol, Alice and Bob both possess a random variable \(M^{\prime }\) which is close to M in the \(\ell _1\) distance.

Consider the situation after conditioning on success in all the coordinates in C, as above, and let \(X_j Y_j\) represent the input in the j-th coordinate. The Braverman–Rao compression protocol cannot be directly applied at this point. Take the first message \(M_1\), sent by Alice, for instance: \(Y_j X_j M_1\) does not necessarily form a Markov chain. For example, suppose \(M_1\) is a message in which Alice tries to guess Bob’s input \(Y_j\), and the event of success is that Alice succeeds in doing so. Then it is easy to see that \(Y_jX_jM_1\) is not a Markov chain conditioned on success. However, we are able to show that \(Y_jX_jM_1\) is ‘close’ to being a Markov chain by further conditioning on appropriate sub-events. We then use a more ‘robust’ Braverman–Rao compression protocol (along the lines of the original), where by ‘robust’ we mean that the communication cost and the error do not vary much even for XYM which is only close to being a Markov chain. (Similar arguments were used by Jain in Ref. [14].) We then apply such a robust message compression protocol to each successive message. Conditioning on success in C incurs a small statistical loss for each message. Thus, the overall error is bounded, since the number of messages exchanged is bounded in our model. Recently, Braverman et al. introduced in Ref. [7] a new simulation whose statistical error is independent of the number of messages. Using this simulation, Braverman et al. [6] strengthened our result with a better dependence on the number of rounds.

Another difficulty in this argument is that since \(\mu \) may be a non-product distribution, the inputs of Alice and Bob in other coordinates may be correlated with each other’s input in the j-th coordinate when conditioned on success in C. We overcome this by introducing new random variables DU, conditioned on which Alice’s input is independent of Bob’s input. Namely, DU splits \(\mu ^k\) into a convex combination of product distributions.

This idea of splitting a non-product distribution into a convex combination of product distributions has been used in several previous works [2, 3, 5, 12, 14, 34, 35]. Without conditioning on success in all coordinates in C, \(D_{-j}U_{-j}\) is independent of \(X_jY_j\); this fact is sufficient for several direct sum results [2, 5]. However, after conditioning on success in all coordinates in C, \(D_{-j}U_{-j}\) is correlated with \(X_jY_j\). This leads us to use another important tool, namely the correlated sampling protocol, which was also used, for example, by Holenstein [12] in his proof of a strong direct product theorem for two-prover one-round games. We prove that \(D_{-j}U_{-j}\) can be sampled by Alice and Bob in a correlated fashion. Conditioned on \(D_{-j}U_{-j}\) and their own inputs, Alice and Bob are able to complete the remaining parts of XY.
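To make the role of correlated sampling concrete, the following is a minimal sketch in the spirit of the protocol used by Holenstein [12]: with shared public coins and no communication, each party samples from its own distribution, and the two samples agree with high probability whenever the distributions are close in total variation distance. The rejection-sampling formulation, the toy universe, and the distributions P and Q below are illustrative assumptions, not the exact construction used in this paper.

```python
import random

universe = ["a", "b", "c"]
P = {"a": 0.50, "b": 0.30, "c": 0.20}     # Alice's target distribution (assumed)
Q = {"a": 0.48, "b": 0.32, "c": 0.20}     # Bob's target distribution, close to P

def correlated_sample(dist, shared):
    """Return the first shared candidate x whose threshold alpha falls below
    dist[x]; conditioned on acceptance, x is distributed exactly according to dist."""
    for x, alpha in shared:
        if alpha <= dist[x]:
            return x
    return None                            # with 1000 candidates this is negligible

def trial(rng):
    # Shared public coins: a list of candidates (x, alpha), with x uniform over
    # the universe and alpha uniform in [0, 1], visible to both parties.
    shared = [(rng.choice(universe), rng.random()) for _ in range(1000)]
    return correlated_sample(P, shared) == correlated_sample(Q, shared)

rng = random.Random(0)
agreement = sum(trial(rng) for _ in range(2000)) / 2000
print(agreement)   # close to 1, since P and Q are close in total variation distance
```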

As mentioned previously, we build on the arguments used by Jain [14]. He showed a new characterization of two-party one-way public-coin communication complexity and used that characterization to show a strong direct product result for all relations in this model. We are unable to arrive at such a characterization for protocols with more than one message, so we use a more direct approach, as outlined above, to prove our direct product result.

1.2 Organization

The rest of the paper is organized as follows. In Sect. 2, we present some background on information theory and communication complexity. In Sect. 3, we prove our main result, Theorem 1.1, starting with some lemmas that are helpful in building the proof. Some proofs are deferred to Sect. 4.

2 Preliminaries

2.1 Information Theory

For an integer \(n \ge 1\), let [n] represent the set \(\{1,2, \ldots , n\}\) and let \(\left[ 0 \right] \) be the empty set. Let \(\mathscr {X}\) and \(\mathscr {Y}\) be finite sets and let k be a natural number. Let \(\mathscr {X}^k\) be the set \(\mathscr {X}\times \cdots \times \mathscr {X}\), the k-fold Cartesian product of \(\mathscr {X}\). Let \(\mu \) be a probability distribution on \(\mathscr {X}\). Let \(\mu (x)\) represent the probability of \(x\in \mathscr {X}\) according to \(\mu \). Let X be a random variable distributed according to \(\mu \). We use the same symbol to represent a random variable and its distribution whenever it is clear from the context. We use lower-case letters such as x, y, z to represent elements in the supports of X, Y, Z, respectively. The expectation of a function f on \(\mathscr {X}\) is defined as

$$\begin{aligned} \mathop {\mathbb {E}}\limits _{\begin{array}{c} x \leftarrow X \end{array}}\left[ f(x) \right] \mathop {=}\limits ^{\mathrm {def}}\sum _{x \in \mathscr {X}} \mu \left( x \right) \cdot f(x). \end{aligned}$$

The entropy of X is defined by Shannon in  [37] as

$$\begin{aligned} \mathrm {H}(X) \mathop {=}\limits ^{\mathrm {def}}- \sum _{x \in \mathscr {X}} \mu (x) \cdot \log \mu (x). \end{aligned}$$

For two distributions \(\mu \) and \(\lambda \) on \(\mathscr {X}\), the distribution \(\mu \otimes \lambda \) is defined as \((\mu \otimes \lambda )(x_1,x_2)\mathop {=}\limits ^{\mathrm {def}}\mu (x_1)\cdot \lambda (x_2)\). Define \(\mu ^k\) to be the k-fold product \(\mu \otimes \cdots \otimes \mu \). If \(L = L_1 \cdots L_k\), we define \(L_{-i} \mathop {=}\limits ^{\mathrm {def}}L_1\cdots L_{i-1} L_{i+1} \cdots L_k\) and \(L_{<i} \mathop {=}\limits ^{\mathrm {def}}L_1 \cdots L_{i-1}\). The random variable \(L_{\le i}\) is defined analogously. The total variation distance between \(\mu \) and \(\lambda \) is defined to be half of the \(\ell _1\) norm of \(\mu - \lambda \), i.e.,

$$\begin{aligned} \left\| \lambda - \mu \right\| _{1} \mathop {=}\limits ^{\mathrm {def}}\frac{1}{2} \sum _x \left| \lambda (x) - \mu (x) \right| = \max _{S\subseteq \mathscr {X}} \left| \lambda _S - \mu _S \right| \end{aligned}$$

where \(\lambda _S \mathop {=}\limits ^{\mathrm {def}}\sum _{x\in S}\lambda (x)\). We say that \(\lambda \) is \(\varepsilon \)-close to \(\mu \) if \(\Vert \lambda -\mu \Vert _1\le \varepsilon \). The relative entropy between distributions X and Y on \(\mathscr {X}\) is defined as

$$\begin{aligned} \mathrm {D}\,\left( X \big \Vert Y \right) \mathop {=}\limits ^{\mathrm {def}}\mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X \end{array}}\left[ \log \frac{\Pr \,\left[ X=x \right] }{\Pr \,\left[ Y=x \right] } \right] . \end{aligned}$$

The relative min-entropy between them is defined as

$$\begin{aligned} \mathrm {S}_{\infty }\left( X \big \Vert Y \right) \mathop {=}\limits ^{\mathrm {def}}\max _{x\in \mathscr {X}} \left\{ \log \frac{\Pr \,\left[ X=x \right] }{\Pr \,\left[ Y=x \right] } \right\} . \end{aligned}$$

It is easy to see that \(\mathrm {D}\,\left( X \big \Vert Y \right) \le \mathrm {S}_{\infty }\left( X \big \Vert Y \right) \). Let X, Y, and Z be jointly distributed random variables. We often write XY as a shorthand for the pair \(\left( X,Y \right) \). With a slight abuse of notation, we write XX for a joint distribution \(XX^{\prime }\) where X and \(X^{\prime }\) are always equal, i.e., \(\Pr \left[ X=X^{\prime } \right] =1\). Let \(Y_x\) denote the distribution of Y conditioned on \(X=x\). The conditional entropy of Y conditioned on X is defined as

$$\begin{aligned} \mathrm {H}(Y|X) \mathop {=}\limits ^{\mathrm {def}}\mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X \end{array}}\left[ \mathrm {H}(Y_x) \right] = \mathrm {H}(XY)-\mathrm {H}(X). \end{aligned}$$

The mutual information between X and Y is defined as

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!Y \right)&\mathop {=}\limits ^{\mathrm {def}}\mathrm {H}(X)+\mathrm {H}(Y)-\mathrm {H}(XY) \\&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} y \leftarrow Y \end{array}}\left[ \mathrm {D}\,\left( X_y \big \Vert X \right) \right] \\&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} x \leftarrow X \end{array}}\left[ \mathrm {D}\,\left( Y_x \big \Vert Y \right) \right] . \end{aligned}$$

It is easily seen that \(\mathrm {I}\,\left( X \, \!: \, \!Y \right) = \mathrm {D}\,\left( XY \big \Vert X \otimes Y \right) \). We say that X and Y are independent if \(\mathrm {I}\,\left( X \, \!: \, \!Y \right) = 0\). The conditional mutual information between X and Y, conditioned on Z, is defined as

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z \right)&\mathop {=}\limits ^{\mathrm {def}}\mathop {\mathbb {E}}\limits _{\begin{array}{c} z \leftarrow Z \end{array}}\left[ \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z=z \right) \right] \\&= \mathrm {H}\left( X|Z \right) +\mathrm {H}\left( Y|Z \right) -\mathrm {H}\left( XY|Z \right) . \end{aligned}$$

The following chain rule for mutual information can be proved easily

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!YZ \right) = \mathrm {I}\,\left( X \, \!: \, \!Z \right) + \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z \right) . \end{aligned}$$
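For instance, one short derivation expands each term via conditional entropies:

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!YZ \right) = \mathrm {H}(X) - \mathrm {H}(X|YZ) = \left[ \mathrm {H}(X) - \mathrm {H}(X|Z) \right] + \left[ \mathrm {H}(X|Z) - \mathrm {H}(X|YZ) \right] = \mathrm {I}\,\left( X \, \!: \, \!Z \right) + \mathrm {I}\,\left( X \, \!: \, \!Y \, \!\big \vert \, \!Z \right) , \end{aligned}$$

where the last step uses \(\mathrm {H}(XY|Z) = \mathrm {H}(Y|Z) + \mathrm {H}(X|YZ)\).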

Definition 2.1

Let X, \(X^{\prime }\), Y, and Z be jointly distributed random variables. We define the joint distribution of \((X^{\prime }Z)(Y|X)\) by

$$\begin{aligned} \Pr [(X^{\prime }Z)(Y|X)=\left( x,z,y \right) ] \mathop {=}\limits ^{\mathrm {def}}\Pr [X^{\prime }=x, Z=z] \cdot \Pr [Y=y|X=x]. \end{aligned}$$

We say that X, Y, and Z form a Markov chain if \(XYZ=(XY)(Z|Y)\), and we denote this by \(X\leftrightarrow Y\leftrightarrow Z\).

It is easy to see that X, Y, Z form a Markov chain if and only if \(\mathrm {I}\,\left( X \, \!: \, \!Z \, \!\big \vert \, \!Y \right) =0\). Ibinson et al. [13] showed that if \(\mathrm {I}\,\left( X \, \!: \, \!Z \, \!\big \vert \, \!Y \right) \) is small, then XYZ is close to being a Markov chain.

Lemma 2.2

([13]) For any random variables X, Y, and Z, it holds that

$$\begin{aligned} \mathrm {I}\,\left( X \, \!: \, \!Z \, \!\big \vert \, \!Y \right) = \min \left\{ \mathrm {D}\,\left( XYZ \big \Vert X^{\prime }Y^{\prime }Z^{\prime } \right) : X^{\prime } \leftrightarrow Y^{\prime }\leftrightarrow Z^{\prime } \right\} . \end{aligned}$$

The minimum is achieved by the distribution \(X^{\prime }Y^{\prime }Z^{\prime }=(XY)(Z|Y)\).

We will need the following basic facts. A very good text for reference on information theory is [10].

Fact 2.3

([10, page 32]) The relative entropy is jointly convex in its arguments. That is, for distributions \(\mu , \mu ^1, \lambda , \lambda ^1\) on \(\mathscr {X}\) and \(p\in [0,1]\),

$$\begin{aligned} \mathrm {D}\,\left( p \mu + (1-p) \mu ^1 \big \Vert p \lambda + (1-p) \lambda ^1 \right) \le p \cdot \mathrm {D}\,\left( \mu \big \Vert \lambda \right) + (1-p) \cdot \mathrm {D}\,\left( \mu ^1 \big \Vert \lambda ^1 \right) . \end{aligned}$$

Fact 2.4

([10, page 24]) The relative entropy satisfies the following chain rule. Let XY and \(X^1Y^1\) be random variables on \(\mathscr {X}\times \mathscr {Y}\). It holds that

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert XY \right) = \mathrm {D}\,\left( X^1 \big \Vert X \right) + \mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X^1 \end{array}}\left[ \mathrm {D}\,\left( Y^1_x \big \Vert Y_x \right) \right] . \end{aligned}$$

In particular,

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X\otimes Y \right)&= \mathrm {D}\,\left( X^1 \big \Vert X \right) + \mathop {\mathbb {E}}\limits _{\begin{array}{c} x\leftarrow X^1 \end{array}}\left[ \mathrm {D}\,\left( Y^1_x \big \Vert Y \right) \right] \\&\ge \mathrm {D}\,\left( X^1 \big \Vert X \right) + \mathrm {D}\,\left( Y^1 \big \Vert Y \right) , \end{aligned}$$

where the inequality is from Fact 2.3.

Note that, in the last term of the first line, Y is not conditioned on x, since in the second argument of the relative entropy X and Y are independent. The following fact shows that the relative entropy to a product distribution is minimized by the product of the marginals.

Fact 2.5

Let XY and \(X^1Y^1\) be random variables on \(\mathscr {X}\times \mathscr {Y}\). It holds that

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X\otimes Y \right) \ge \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) =\mathrm {I}\,\left( X^1 \, \!: \, \!Y^1 \right) . \end{aligned}$$

Proof

From the definition of the relative entropy, we have

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X\otimes Y \right)= & {} \sum _{xy}\Pr \left[ X^1Y^1=xy \right] \log \frac{\Pr \left[ X^1Y^1=xy \right] }{\Pr \left[ X=x \right] \Pr \left[ Y=y \right] } \nonumber \\= & {} \sum _{xy}\Pr \left[ X^1Y^1=xy \right] \left( \log \frac{\Pr \left[ X^1Y^1=xy \right] }{\Pr \left[ X^1=x \right] \Pr \left[ Y^1=y \right] } \right. \nonumber \\&\quad \left. +\, \log \frac{\Pr \left[ X^1=x \right] \Pr \left[ Y^1=y \right] }{\Pr \left[ X=x \right] \Pr \left[ Y=y \right] } \right) \nonumber \\= & {} \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) +\mathrm {D}\,\left( X^1 \big \Vert X \right) +\mathrm {D}\,\left( Y^1 \big \Vert Y \right) \nonumber \\\ge & {} \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) \nonumber . \end{aligned}$$

The equality \(\mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) =\mathrm {I}\,\left( X^1 \, \!: \, \!Y^1 \right) \) can easily be verified from the definitions.
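Indeed, expanding the definitions,

$$\begin{aligned} \mathrm {D}\,\left( X^1Y^1 \big \Vert X^1\otimes Y^1 \right) = \sum _{xy}\Pr \left[ X^1Y^1=xy \right] \log \frac{\Pr \left[ X^1Y^1=xy \right] }{\Pr \left[ X^1=x \right] \Pr \left[ Y^1=y \right] } = \mathrm {H}(X^1)+\mathrm {H}(Y^1)-\mathrm {H}(X^1Y^1) = \mathrm {I}\,\left( X^1 \, \!: \, \!Y^1 \right) . \end{aligned}$$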

Fact 2.6

(Pinsker’s inequality, [10, page 370]) For distributions \(\lambda \) and \(\mu \),

$$\begin{aligned} 0 \le \left\| \lambda -\mu \right\| _{1} \le \sqrt{\mathrm {D}\,\left( \lambda \big \Vert \mu \right) }. \end{aligned}$$

The following fact gives a lower bound on each term in the summation in the definition of the relative entropy.

Fact 2.7

([21]) Let \(\lambda \) and \(\mu \) be distributions on \(\mathscr {X}\). For any subset \(\mathscr {S}\subseteq \mathscr {X}\), it holds that

$$\begin{aligned} \sum _{x \in \mathscr {S}} \lambda (x) \cdot \log \frac{\lambda (x)}{\mu (x)} \ge -1. \end{aligned}$$

Hence, for any \(r>0, c>0\), if \(\mathrm {D}\,\left( \lambda \big \Vert \mu \right) \le c\), then it holds that

$$\begin{aligned} \Pr _{\begin{array}{c} x\leftarrow \lambda \end{array}}\left[ \log \frac{\lambda \left( x \right) }{\mu \left( x \right) }\ge \frac{c+1}{r} \right] \le r. \end{aligned}$$
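The second statement can be derived from the first: let \(\mathscr {S}\mathop {=}\limits ^{\mathrm {def}}\left\{ x\in \mathscr {X}: \log \frac{\lambda \left( x \right) }{\mu \left( x \right) }\ge \frac{c+1}{r} \right\} \); then

$$\begin{aligned} c \ge \mathrm {D}\,\left( \lambda \big \Vert \mu \right) = \sum _{x \in \mathscr {S}} \lambda (x) \cdot \log \frac{\lambda (x)}{\mu (x)} + \sum _{x \notin \mathscr {S}} \lambda (x) \cdot \log \frac{\lambda (x)}{\mu (x)} \ge \lambda _{\mathscr {S}} \cdot \frac{c+1}{r} - 1, \end{aligned}$$

where the last inequality uses the first statement for the complement of \(\mathscr {S}\); rearranging gives \(\lambda _{\mathscr {S}} \le r\).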

The following fact easily follows from the triangle inequality and Fact 2.4.

Fact 2.8

The \(\ell _1\) distance and the relative entropy are monotone non-increasing when subsystems are considered. Let XY and \(X^1Y^1\) be random variables on \(\mathscr {X}\times \mathscr {Y}\), then

$$\begin{aligned} \left\| XY - X^1Y^1 \right\| _{1}&\ge \left\| X - X^1 \right\| _{1} \end{aligned}$$

and

$$\begin{aligned} \mathrm {D}\,\left( XY \big \Vert X^1Y^1 \right)&\ge \mathrm {D}\,\left( X \big \Vert X^1 \right) . \end{aligned}$$

Fact 2.9

For any function \(f : \, \mathscr {X}\times \mathscr {R} \rightarrow \mathscr {Y}\) and random variables X, \(X_1\) on \(\mathscr {X}\) and R on \(\mathscr {R}\), such that R is independent of \(X X_1\), it holds that

$$\begin{aligned} \left\| Xf(X,R) - X_1f(X_1,R) \right\| _{1} = \left\| X-X_1 \right\| _{1}. \end{aligned}$$

Proof

$$\begin{aligned}&\left\| Xf(X,R) - X_1f(X_1,R) \right\| _{1}\\&\quad = \frac{1}{2}\sum _{xy} \left| \Pr \left[ Xf(X,R)=xy \right] -\Pr \left[ X_1f(X_1,R)=xy \right] \right| \\&\quad = \frac{1}{2}\sum _x \left| \Pr \left[ X=x \right] -\Pr \left[ X_1=x \right] \right| \cdot \sum _y\Pr \left[ f(x,R)=y \right] \\&\quad = \frac{1}{2}\sum _x \left| \Pr \left[ X=x \right] -\Pr \left[ X_1=x \right] \right| = \left\| X-X_1 \right\| _{1}. \end{aligned}$$

\(\square \)

The following definition was introduced by Holenstein [12]. It plays a critical role in his proof of a parallel repetition theorem for two-prover games.

Definition 2.10

([12]) For two distributions \((X_0Y_0)\) and \((X_1SY_1T)\), we say that \((X_0,Y_0)\) is \(\left( 1-\varepsilon \right) \)-embeddable in \((X_1S,Y_1T)\) if there exist a random variable R over a set \(\mathscr {R}\), which is independent of \(X_0Y_0\), and functions \(f_A:\mathscr {X}\times \mathscr {R}\rightarrow \mathscr {S}\), \(f_B:\mathscr {Y}\times \mathscr {R}\rightarrow \mathscr {T}\), such that

$$\begin{aligned} \left\| X_0Y_0f_A(X_0,R)f_B(Y_0,R) - X_1Y_1ST \right\| _{1} \le \varepsilon . \end{aligned}$$

The following lemma was shown by Holenstein [12] using a correlated sampling protocol.

Lemma 2.11

(Corollary 5.3 in [12]) For random variables S, X, and Y, if

$$\begin{aligned} \left\| XYS-(XY)(S|X) \right\| _{1}&\le \varepsilon \end{aligned}$$

and

$$\begin{aligned} \left\| XYS-(XY)(S|Y) \right\| _{1}&\le \varepsilon \end{aligned}$$

then (XY) is \(\left( 1-5\varepsilon \right) \)-embeddable in (XSYS).

We will need the following generalization of the previous lemma.

Lemma 2.12

For joint random variables \((A^{\prime },B^{\prime },C^{\prime })\) and (AB), satisfying

$$\begin{aligned}&\mathrm {D}\,\left( A^{\prime }B^{\prime } \big \Vert AB \right) \le \varepsilon \end{aligned}$$
(1)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c)\leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a \right) \right] \le \varepsilon \end{aligned}$$
(2)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (b,c)\leftarrow B^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( A^{\prime }_{b,c} \big \Vert A_b \right) \right] \le \varepsilon \end{aligned}$$
(3)

it holds that (AB) is \(\left( 1-5\sqrt{\varepsilon } \right) \)-embeddable in \((A^{\prime }C^{\prime },B^{\prime }C^{\prime })\).

Proof

Using the definition of the relative entropy, we get the following.

$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a \right) \right] - \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B^{\prime }_a \right) \right] \\&\quad = \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,b,c) \leftarrow A^{\prime }B^{\prime }C^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ B^{\prime }=b|A^{\prime }=a \right] }{\Pr \,\left[ B=b|A=a \right] } \right] \\&\quad = \mathop {\mathbb {E}}\limits _{\begin{array}{c} a \leftarrow A^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_a \big \Vert B_a \right) \right] \ge 0. \end{aligned}$$

This means that

$$\begin{aligned} \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B^{\prime }_a \right) \right] \le \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c) \leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a \right) \right] \le \varepsilon . \end{aligned}$$
(4)

Furthermore,

$$\begin{aligned} \mathop {\mathbb {E}}\limits _{\begin{array}{c} (a,c)\leftarrow A^{\prime }C^{\prime } \end{array}}\left[ \mathrm {D}\,\left( B^{\prime }_{a,c} \big \Vert B_a^{\prime } \right) \right]&= \mathrm {D}\,\left( A^{\prime }C^{\prime }B^{\prime } \big \Vert \left( A^{\prime }C^{\prime } \right) \left( B^{\prime }|A^{\prime } \right) \right) \end{aligned}$$
(5)
$$\begin{aligned}&= \mathrm {D}\,\left( A^{\prime }B^{\prime }C^{\prime } \big \Vert \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \right) \end{aligned}$$
(6)
$$\begin{aligned}&\ge \left\| A^{\prime }B^{\prime }C^{\prime } - \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \right\| _{1}^2. \end{aligned}$$
(7)

Above, Eq. (5) follows from the chain rule for the relative entropy, Eq. (6) is because the distributions \(\left( A^{\prime }C^{\prime } \right) \left( B^{\prime }|A^{\prime } \right) \) and \(\left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \) are the same by Definition 2.1, and Eq. (7) follows from Fact 2.6. Now from Eqs. (4) and (7) we get

$$\begin{aligned} \left\| A^{\prime }B^{\prime }C^{\prime } - \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|A^{\prime } \right) \right\| _{1}&\le \sqrt{\varepsilon }. \end{aligned}$$

By similar arguments we get

$$\begin{aligned} \left\| A^{\prime }B^{\prime }C^{\prime } - \left( A^{\prime }B^{\prime } \right) \left( C^{\prime }|B^{\prime } \right) \right\| _{1}&\le \sqrt{\varepsilon }. \end{aligned}$$

The two inequalities above, together with Lemma 2.11, imply that \(\left( A^{\prime }, B^{\prime } \right) \) is \(\left( 1 - 5 \sqrt{\varepsilon } \right) \)-embeddable in \(\left( A^{\prime }C^{\prime }, B^{\prime }C^{\prime } \right) \). Namely, there exist functions \(f_1\) and \(f_2\) and a random variable R independent of \(A^{\prime }B^{\prime }\) such that \(\left\| A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\le 5\sqrt{\varepsilon }\). Furthermore, from Fact 2.6 and Eq. (1) we get that

$$\begin{aligned} \left\| A^{\prime }B^{\prime } - AB \right\| _{1} \le \sqrt{\varepsilon }. \end{aligned}$$

Finally,

$$\begin{aligned}&\left\| ABf_1\left( A,R \right) f_2\left( B,R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\\&\quad \le \left\| ABf_1\left( A,R \right) f_2\left( B,R \right) -A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) \right\| _{1}\\&\qquad +\left\| A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\\&\quad =\left\| AB-A^{\prime }B^{\prime } \right\| _{1}+\left\| A^{\prime }B^{\prime }f_1\left( A^{\prime },R \right) f_2\left( B^{\prime },R \right) -A^{\prime }B^{\prime }C^{\prime }C^{\prime } \right\| _{1}\le 6\sqrt{\varepsilon }, \end{aligned}$$

where the equality is from Fact 2.9. Thus we get that (AB) is \(\left( 1-6\sqrt{\varepsilon } \right) \)-embeddable in \((A^{\prime }C^{\prime },B^{\prime }C^{\prime })\). \(\square \)

2.2 Communication Complexity

Let \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) be a relation, \(t \ge 1\) an integer, and \(\varepsilon \in (0,1)\). In this work we only consider complete relations, i.e., for every \((x,y) \in \mathscr {X}\times \mathscr {Y}\), there is some \(z \in \mathscr {Z}\) such that \((x,y,z) \in f\). In the two-party t-round public-coin model of communication, Alice, with input \(x \in \mathscr {X}\), and Bob, with input \(y \in \mathscr {Y}\), do local computation using public coins shared between them and exchange t messages, with Alice sending the first message. At the end of the protocol, the party receiving the t-th message outputs some \(z \in \mathscr {Z}\). The output is declared correct if \((x,y,z) \in f\) and wrong otherwise.

Definition 2.13

Let \(\mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f)\) represent the two-party t-round public-coin communication complexity of f with worst case error \(\varepsilon \), i.e., the minimum number of bits that Alice and Bob need to exchange in a t-round public-coin protocol whose output on each input \(\left( x,y \right) \) is correct with probability at least \(1-\varepsilon \). We similarly consider two-party t-round deterministic protocols, in which no public coins are used by Alice and Bob. Let \(\mu \) be a distribution on \(\mathscr {X}\times \mathscr {Y}\). We let \(\mathrm {D}_{\varepsilon }^{(t),\mu }(f)\) represent the two-party t-round distributional communication complexity of f under \(\mu \) with expected error \(\varepsilon \), i.e., the minimum number of bits Alice and Bob need to exchange in a two-party t-round deterministic protocol for f with distributional error (average error over the inputs) at most \(\varepsilon \) under \(\mu \).
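As an illustration of the two error measures, the following sketch evaluates a toy one-round public-coin protocol against the worst case error and a trivial deterministic protocol against the distributional error; the example problem (equality of 2-bit inputs, viewed as a complete relation), the protocol, and the distribution are assumptions made purely for illustration.

```python
from itertools import product

X = Y = [0, 1, 2, 3]                      # 2-bit inputs
R = [0, 1, 2, 3]                          # public-coin values, assumed uniform

def correct(x, y, z):
    """Toy complete relation: (x, y, z) is in f iff z is the indicator of x == y."""
    return z == int(x == y)

def public_coin_protocol(x, y, r):
    """One message: Alice sends the parity of x AND r; Bob outputs 1 iff it matches
    the parity of y AND r."""
    a = bin(x & r).count("1") % 2
    b = bin(y & r).count("1") % 2
    return int(a == b)

def worst_case_error(protocol):
    # Worst case over inputs of the probability, over the public coins, of an
    # incorrect output (the error measure behind R^{(t),pub}_eps).
    return max(sum(not correct(x, y, protocol(x, y, r)) for r in R) / len(R)
               for x, y in product(X, Y))

def distributional_error(det_protocol, mu):
    # Average error over inputs drawn from mu, for a deterministic protocol
    # (the error measure behind D^{(t),mu}_eps).
    return sum(p for (x, y), p in mu.items() if not correct(x, y, det_protocol(x, y)))

mu = {(x, y): 1 / 16 for x, y in product(X, Y)}            # uniform distribution
print(worst_case_error(public_coin_protocol))              # 0.5, attained on x != y
print(distributional_error(lambda x, y: 1, mu))            # 0.75 for "always output 1"
```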

The following is a consequence of the min–max theorem in game theory, see e.g., [27, Theorem 3.20].

Lemma 2.14

(Yao’s principle, [40]) \( \displaystyle \mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f) = \max _{\mu } \mathrm {D}^{(t),\mu }_{\varepsilon }(f) \).

The following fact about communication protocols can be verified easily.

Fact 2.15

Let \(M_1, \ldots , M_t\) be t messages in a deterministic communication protocol between Alice and Bob with inputs X and Y, where X and Y are independent. Then for any \(s \in [t]\), X and Y are independent even conditioned on \(M_1, \ldots , M_s\).
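One way to verify this is via the standard rectangle property of deterministic protocols: each message is a function of the sender’s input and the previously exchanged messages, so consistency with a fixed prefix of the transcript factors across the two inputs. Concretely,

$$\begin{aligned} \Pr \left[ X=x, Y=y \, \big \vert \, M_1=m_1, \ldots , M_s=m_s \right] \; \propto \; \Pr \left[ X=x \right] \cdot \Pr \left[ Y=y \right] \cdot a_{m_1 \cdots m_s}(x) \cdot b_{m_1 \cdots m_s}(y), \end{aligned}$$

where \(a_{m_1 \cdots m_s}(x), b_{m_1 \cdots m_s}(y) \in \{0,1\}\) indicate that x and y are consistent with the messages sent by Alice and Bob, respectively. The right-hand side is proportional to a product distribution, so X and Y remain independent.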

Let \(f^k \subseteq \mathscr {X}^k \times \mathscr {Y}^k \times \mathscr {Z}^k\) be the cross product of f with itself k times. In a protocol for computing \(f^k\), Alice will receive input in \(\mathscr {X}^k\), Bob will receive input in \(\mathscr {Y}^k\) and the output of the protocol will be in \(\mathscr {Z}^k\).

3 Proof of Theorem 1.1

We start by showing a few lemmas which are helpful in the proof of the main result. The following theorem was shown by Jain [14] and follows primarily from a message compression argument due to Braverman and Rao [5].

Theorem 3.1

(Lemma 3.8 in [14]) Let \(\delta > 0\) and \(c \ge 0\). Let \(X^{\prime }\), \(Y^{\prime }\), and N be random variables for which \(Y^{\prime } \leftrightarrow X^{\prime } \leftrightarrow N\) is a Markov chain and the following holds.

$$\begin{aligned} \Pr _{\begin{array}{c} (x,y,m)\leftarrow X^{\prime },Y^{\prime },N \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] }>c \right] \le \delta . \end{aligned}$$
(8)

There exists a public-coin protocol between Alice and Bob, with inputs \(X^{\prime }\) and \(Y^{\prime }\), with a single message from Alice to Bob of at most \(c+O\,\left( \log (1/\delta ) \right) \) bits, such that at the end of the protocol, Alice and Bob possess random variables \(M_A\) and \(M_B\), respectively, satisfying \( \left\| X^{\prime }Y^{\prime }NN-X^{\prime }Y^{\prime }M_AM_B \right\| _{1} \le 2\delta \).

Remark 3.2

In Ref. [5], the condition \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) \le c\) is used instead of Eq. (8). It was changed to the current condition in Ref. [14]. By the equality \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) =\mathrm {D}\,\left( X^{\prime }Y^{\prime }N \big \Vert \left( X^{\prime }Y^{\prime } \right) \left( N|Y^{\prime } \right) \right) \) and Fact 2.7, \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) \le c\) implies

$$\begin{aligned} \Pr _{\begin{array}{c} (x,y,m) \leftarrow X^{\prime },Y^{\prime },N \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } > \frac{c+1}{\delta } \right] \le \delta . \end{aligned}$$

This modification is essential in our argument since the condition in Eq. (8) is robust when the underlying joint distribution is perturbed slightly, while \(\mathrm {I}\,\left( X^{\prime } \, \!: \, \!N \, \!\big \vert \, \!Y^{\prime } \right) \) may change a lot with such a perturbation.

As mentioned in Sect. 1, we will have to work with approximate Markov chains in our argument for the direct product. The following lemma makes Theorem 3.1 more robust to deal with approximate Markov chains. Its proof appears in Sect. 4.

Lemma 3.3

Let \( c \ge 0 \), \( 1 > \varepsilon > 0 \), and \(\varepsilon ^{\prime } > 0 \). Let \(X^{\prime }\), \(Y^{\prime }\), and \(M^{\prime }\) be random variables for which the following holds,

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M^{\prime } \, \!\big \vert \, \!Y^{\prime } \right) \le c \quad \text {and} \quad \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime } \, \!\big \vert \, \!X^{\prime } \right) \le \varepsilon . \end{aligned}$$

There exists a public-coin protocol between Alice and Bob, with inputs \(X^{\prime }\) and \(Y^{\prime }\), with a single message from Alice to Bob of at most \(\frac{c+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \) bits, such that at the end of the protocol, Alice and Bob possess random variables \(M_A\) and \(M_B\), respectively, satisfying

$$\begin{aligned} \left\| X^{\prime }Y^{\prime }M^{\prime }M^{\prime }-X^{\prime }Y^{\prime }M_AM_B \right\| _{1} \le 3 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime }. \end{aligned}$$

The following lemma generalizes the above lemma to deal with multiple messages. Its proof appears in Sect. 4.

Lemma 3.4

Let \(t \ge 1\) be an integer. Let \( \varepsilon ^{\prime } > 0\), \( c_s \ge 0 \), and \( 1 > \varepsilon _s > 0\) for all \( 1 \le s \le t\). Let \(R^{\prime }\), \(X^{\prime }\), \(Y^{\prime }\), \(M_1^{\prime }, \ldots , M_t^{\prime }\) be random variables for which the following holds. (Below \( M^{\prime }_{<s} = M^{\prime }_1 \cdots M^{\prime }_{s-1}\) by definition.)

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!Y^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le c_s \end{aligned}$$
(9)
$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!X^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le \varepsilon _s \end{aligned}$$
(10)

for odd s and

$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!X^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le c_s \end{aligned}$$
(11)
$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M^{\prime }_s \, \!\big \vert \, \!Y^{\prime }R^{\prime }M^{\prime }_{<s} \right)&\le \varepsilon _s \end{aligned}$$
(12)

for even s. There exists a public-coin t-round protocol \(\mathscr {P}_t\) between Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), with Alice sending the first message. The total communication of \(\mathscr {P}_t\) is at most

$$\begin{aligned} \frac{\sum _{s=1}^tc_s+5t}{\varepsilon ^{\prime }} + O\,\left( t\log \frac{1}{\varepsilon ^{\prime }} \right) . \end{aligned}$$

At the end of the protocol, Alice and Bob possess random variables \(M_{A,1}^{\prime }, \ldots , M_{A,t}^{\prime }\) and \(M_{B,1}^{\prime }, \ldots , M_{B,t}^{\prime }\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime }M_t^{\prime }-R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \le 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$

In the above lemma, Alice and Bob share an input \(R^{\prime }\) (potentially correlated with \(X^{\prime }Y^{\prime }\)). Eventually we will need Alice and Bob to generate this shared part themselves using correlated sampling. The following lemma, obtained from the lemma above, is the one that we will finally use in the proof of our main result. Its proof appears in Sect. 4.

Lemma 3.5

Let random variables \(R^{\prime }\), \(X^{\prime }\), \(Y^{\prime }\), and \(M_1^{\prime }, \ldots , M_t^{\prime }\) and numbers \( \varepsilon ^{\prime }\), \(c_s\), and \(\varepsilon _s\) satisfy all the conditions in Lemma 3.4. Let \(\tau > 0\) and let random variables (XY) be \((1-\tau )\)-embeddable in \((X^{\prime }R^{\prime },Y^{\prime }R^{\prime })\). There exists a public-coin t-round protocol \(\mathscr {Q}_t\) between Alice, with input X, and Bob, with input Y, with Alice sending the first message, and total communication at most

$$\begin{aligned} \frac{\sum _{s=1}^tc_s+5t}{\varepsilon ^{\prime }} + O\,\left( t\log \frac{1}{\varepsilon ^{\prime }} \right) . \end{aligned}$$

At the end of the protocol, Alice possesses \(R_A M_{A,1} \cdots M_{A,t}\) and Bob possesses \(R_BM_{B,1}\cdots M_{B,t}\), such that

$$\begin{aligned}&\left\| X Y R_A R_B M_1M_1 \cdots M_tM_t - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \\&\quad \le \tau + 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$

We are now ready to prove our main result, Theorem 1.1. We restate it here for convenience.

Theorem 1.1 Let \(\mathscr {X}\), \(\mathscr {Y}\), and \(\mathscr {Z}\) be finite sets, \(f \subseteq \mathscr {X}\times \mathscr {Y}\times \mathscr {Z}\) a complete relation, \(\varepsilon > 0\), and \(k, t \ge 1\) integers. There exists a constant \(\kappa \ge 0\) such that

$$\begin{aligned} \mathrm {R}^{(t),\mathrm {pub}}_{1-\left( 1-\varepsilon /2 \right) ^{\varOmega \,\left( k \varepsilon ^2/t^2 \right) }} \left( f^k \right) = \varOmega \,\left( \frac{\varepsilon \cdot k}{t} \cdot \left( \mathrm {R}^{(t),\mathrm {pub}}_{\varepsilon }(f) - \frac{\kappa t^2}{\varepsilon } \right) \right) . \end{aligned}$$

Proof of Theorem 1.1

Let \(\delta \mathop {=}\limits ^{\mathrm {def}}\frac{\varepsilon ^2}{7500t^2}\) and \(\delta _1 = \frac{\varepsilon }{3000 t}\). From Yao’s principle (Lemma 2.14) it suffices to prove that for any distribution \(\mu \) on \(\mathscr {X}\times \mathscr {Y}\),

$$\begin{aligned} \mathrm {D}^{(t),\mu ^k}_{1-(1-\varepsilon /2)^{\lfloor \delta k \rfloor }} \left( f^k \right) \ge \delta _1 k c \end{aligned}$$

where \(c \mathop {=}\limits ^{\mathrm {def}}\mathrm {D}^{(t), \mu }_{\varepsilon }(f) - \frac{\kappa t^2}{\varepsilon } \), for a constant \(\kappa \) to be chosen later. Let XY be distributed according to \(\mu ^k\). Let \(\mathscr {Q}\) be a t-round deterministic protocol between Alice, with input X, and Bob, with input Y, that computes \(f^k\), with Alice sending the first message and total communication \(\delta _1 k c\) bits. We assume that t is odd for the rest of the argument, so that Bob makes the final output. (The case when t is even follows similarly.) The following claim implies that the success probability of \(\mathscr {Q}\) is at most \((1-\varepsilon /2)^{\lfloor \delta k \rfloor }\), which shows the desired bound. \(\square \)

Claim 3.6

For each \(i \in [k]\), let us define a binary random variable \(T_i \in \left\{ 0,1 \right\} \), which represents the success of \(\mathscr {Q}\), i.e., Bob’s output being correct, on the i-th instance. So \(T_i=1\) if the protocol \(\mathscr {Q}\) computes the i-th instance of f correctly and \(T_i=0\) otherwise. Let \(k^{\prime } \mathop {=}\limits ^{\mathrm {def}}\lfloor \delta k \rfloor \). There exist \(k^{\prime }\) coordinates \(\left\{ i_1, \ldots , i_{k^{\prime }} \right\} \) such that for each \(1 \le r \le k^{\prime }-1\), either

$$\begin{aligned} \Pr \,\left[ T^{\left( r \right) }= 1 \right]&\le (1-\frac{\varepsilon }{2})^{k^{\prime }} \end{aligned}$$

or

$$\begin{aligned} \Pr \,\left[ T_{i_{r+1}}=1 \big \vert T^{\left( r \right) }= 1 \right]&\le 1-\frac{\varepsilon }{2} \end{aligned}$$

where \( \displaystyle T^{\left( r \right) }\mathop {=}\limits ^{\mathrm {def}}\prod _{j=1}^{r} T_{i_j} \).

Proof

For \(s \in [t]\), we denote the s-th message of \(\mathscr {Q}\) by \(M_s\). Let \(M \mathop {=}\limits ^{\mathrm {def}}M_1 \cdots M_t\). In the following, we assume that \(1 \le r < k^{\prime }\). However, the same argument also works when \(r=0\), i.e., for identifying the first coordinate, which we skip to avoid repetition. Suppose that we have already identified r coordinates \(i_1,\ldots ,i_r\) satisfying

$$\begin{aligned} \Pr \,\left[ T_{i_1}=1 \right]&\le 1 - \frac{\varepsilon }{2} \end{aligned}$$

and

$$\begin{aligned} \Pr \,\left[ T_{i_{j+1}} = 1 \big \vert T^{(j)} = 1 \right]&\le 1 - \frac{\varepsilon }{2} \end{aligned}$$

for \(1\le j\le r-1\). If \(\Pr \,\left[ T^{\left( r \right) }= 1 \right] \le (1-\frac{\varepsilon }{2})^{k^{\prime }}\) then we are done. So from now on, assume that

$$\begin{aligned} \Pr \,\left[ T^{\left( r \right) }=1 \right] > (1-\frac{\varepsilon }{2})^{k^{\prime }} \ge 2^{-\delta k}. \end{aligned}$$

Let D be a random variable uniformly distributed in \(\{0,1\}^k\) and independent of XY. Let \(U_i = X_i\) if \(D_i = 0\) and \(U_i = Y_i\) if \(D_i = 1\). For any random variable L, let us introduce the notation

$$\begin{aligned} L^1 \mathop {=}\limits ^{\mathrm {def}}(L | T^{\left( r \right) }= 1). \end{aligned}$$

For example, \(X^1Y^1=(XY|T^{\left( r \right) }=1)\). Let \(C \mathop {=}\limits ^{\mathrm {def}}\left\{ i_1, \ldots , i_r \right\} \) and

$$\begin{aligned} R_i \mathop {=}\limits ^{\mathrm {def}}D_{-i} U_{-i} X_{C \cup [i-1]} Y_{C \cup [i-1]} \end{aligned}$$

for \(i \in [k]\). We denote an element from the range of \(R_i\) by \(r_i\).

To prove the claim, we will show that there exists a coordinate \(j \notin C\) such that

  1. \(\left( X_j, Y_j \right) \) can be embedded well in \(\left( X^1_j R^1_j, Y^1_j R^1_j \right) \) (with appropriate parameters, as required by Lemma 2.12).

  2. Random variables \(R_j^1\), \(X^1_j\), \(Y^1_j\), and \(M^1_1, \ldots , M^1_t\) satisfy the conditions of Lemma 3.4 with appropriate parameters.

The following calculations are helpful for achieving the condition in Eq. (1) in Lemma 2.12. That is, \(X^1_jY^1_j\) is close to \(\mu \).

$$\begin{aligned} \delta k&> \mathrm {S}_{\infty }\left( X^1Y^1 \big \Vert XY \right) \nonumber \\&\ge \mathrm {D}\,\left( X^1Y^1 \big \Vert XY \right) \nonumber \\&\ge \sum _{i\notin C} \mathrm {D}\,\left( X^1_iY^1_i \big \Vert X_i Y_i \right) \end{aligned}$$
(13)

where the first inequality follows from the assumption that \(\Pr \,\left[ T^{\left( r \right) }=1 \right] > 2^{-\delta k}\) and the last inequality follows from Fact 2.4. The following calculations are helpful for achieving the conditions in Eqs. (2) and (3) in Lemma 2.12, namely that \(\left( X^1_j|R^1_jY^1_j \right) \approx \left( X_j|Y_j \right) \) and \(\left( Y^1_j|R^1_jX^1_j \right) \approx \left( Y_j|X_j \right) \). These conditions imply that Alice and Bob are able to sample \(R_j^1\) in a correlated fashion given inputs \(X^1_jY^1_j\).

$$\begin{aligned} \delta k&> \mathrm {S}_{\infty }\left( X^1Y^1D^1U^1 \big \Vert XYDU \right) \ge \mathrm {D}\,\left( X^1Y^1D^1U^1 \big \Vert XYDU \right) \nonumber \\&\ge \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d,u,x_C,y_C) \leftarrow D^1U^1X^1_CY^1_C \end{array}}\left[ \mathrm {D}\,\left( \left( X^1Y^1 \right) _{d,u,x_C,y_C} \big \Vert \left( XY \right) _{d,u,x_C,y_C} \right) \right] \end{aligned}$$
(14)
$$\begin{aligned}&= \sum _{i \notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d,u,x_{C\cup [i-1]},y_{C\cup [i-1]})\\ \leftarrow D^1U^1X_{C\cup [i-1]}^1Y_{C\cup [i-1]}^1 \end{array}}\left[ \mathrm {D}\,\left( \left( X_i^1Y_i^1 \right) _{\begin{array}{c} d,u,x_{C\cup [i-1]}, \\ y_{C\cup [i-1]} \end{array}} \big \Vert \left( X_iY_i \right) _{\begin{array}{c} d,u,x_{C\cup [i-1]}, \\ y_{C\cup [i-1]} \end{array}} \right) \right] \end{aligned}$$
(15)
$$\begin{aligned}&= \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d_i,u_i,r_i) \leftarrow D^1_iU^1_iR^1_i \end{array}}\left[ \mathrm {D}\,\left( (X^1_iY^1_i)_{d_i,u_i,r_i} \big \Vert (X_iY_i)_{d_i,u_i,r_i} \right) \right] \end{aligned}$$
(16)
$$\begin{aligned}&= \frac{1}{2} \sum _{i \notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_i,x_i)\leftarrow R^1_iX^1_i \end{array}}\left[ \mathrm {D}\,\left( \left( Y_i^1 \right) _{r_i, x_i} \big \Vert \left( Y_i \right) _{x_i} \right) \right] \nonumber \\&\quad {} + \frac{1}{2} \sum _{i \notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_i,y_i)\leftarrow R^1_iY^1_i \end{array}}\left[ \mathrm {D}\,\left( \left( X_i^1 \right) _{r_i, y_i} \big \Vert \left( X_i \right) _{y_i} \right) \right] . \end{aligned}$$
(17)

Above, Eqs. (14) and (15) follow from Fact 2.4. Equation (16) holds because \(\left( d_i,u_i,r_i \right) \) and \(\left( d,u,x_{C\cup [i-1]},y_{C\cup [i-1]} \right) \) are the same up to reordering. Equation (17) follows because \(D^1_i\) is independent of \(R^1_i\), and with probability half \( D^1_i\) is 0, in which case \(U^1_i = X^1_i\), and with probability half \( D^1_i\) is 1, in which case \(U_i^1 = Y_i^1\).

The following calculation is useful for achieving the conditions of Eqs. (9) and (11), exhibiting that the information carried by the messages about the sender’s input is small.

$$\begin{aligned} \delta _1 c k&\ge \left| M^1 \right| \quad (|M^1| \text{ represents the length of } M^1)\nonumber \\&\ge \mathrm {I}\,\left( X^1Y^1 \, \!: \, \!M^1 \, \!\big \vert \, \!D^1U^1X^1_CY^1_C \right) \nonumber \\&= \sum _{i\notin C} \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1 \, \!\big \vert \, \!D^1U^1X^1_{C\cup [i-1]}Y^1_{C\cup [i-1]} \right) \nonumber \\&= \sum _{i \notin C} \sum _{s=1}^t \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1U^1X^1_{C\cup [i-1]}Y^1_{C\cup [i-1]}M^1_{<s} \right) \nonumber \\&= \sum _{i\notin C} \sum _{s=1}^t \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1_iU^1_iR^1_iM^1_{<s} \right) \nonumber \\ \!&=\! \sum _{i\notin C} \left( \sum _{s \text {odd}} \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1_iU^1_iR^1_iM^1_{<s} \right) \!+\! \sum _{s \text {even}} \mathrm {I}\,\left( X^1_iY^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!D^1_iU^1_iR^1_iM^1_{<s} \right) \right) \nonumber \\&\ge \frac{1}{2} \sum _{i\notin C} \left( \sum _{s \text {odd}} \mathrm {I}\,\left( X^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iY^1_iM^1_{<s} \right) + \sum _{s \text {even}} \mathrm {I}\,\left( Y^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iX^1_iM^1_{<s} \right) \right) \end{aligned}$$
(18)

Above, we have used the chain rule for the mutual information in the first two equalities. The last inequality follows because \(D^1_i\) is independent of \( X^1_i Y_i^1 R^1_i M^1 \) and with probability half \( D^1_i \) is 0, in which case \(U^1_i = X^1_i\), and with probability half \( D^1_i \) is 1, in which case \(U_i^1 = Y_i^1\).

The following calculation is useful for achieving the conditions of Eqs. (10) and (12), exhibiting that the information carried by the messages about the receiver’s input is very small. Here we are only able to argue round by round and hence pay a factor proportional to the number of messages in the final result. Let \(s \in [t]\) be odd for now.

$$\begin{aligned} \delta k&\ge \mathrm {S}_{\infty }\left( D^1 U^1 X^1 Y^1 M^1_{\le s} \big \Vert DUXYM_{\le s} \right) \nonumber \\&\ge \mathrm {D}\,\left( D^1 U^1 X^1 Y^1 M^1_{\le s} \big \Vert DUXYM_{\le s} \right) \nonumber \\&\ge \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d,u,x_C,y_C,m_{\le s}) \leftarrow D^1U^1X^1_CY^1_CM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (X^1Y^1)_{d,u,x_C,y_C,m_{\le s}} \big \Vert (XY)_{d,u,x_C,y_C,m_{\le s}} \right) \right] \nonumber \\&= \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d, u, x_{C\cup [i-1]}, y_{C\cup [i-1]}, m_{\le s}) \\ \leftarrow D^1U^1X^1_{C\cup [i-1]}Y^1_{C\cup [i-1]}M^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (X^1_i Y^1_i)_{\begin{array}{c} d, u, x_{C\cup [i-1]}, \\ y_{C\cup [i-1]}, m_{\le s} \end{array}} \big \Vert (X_i Y_i)_{\begin{array}{c} d, u, x_{C\cup [i-1]}, \\ y_{C\cup [i-1]}, m_{\le s} \end{array}} \right) \right] \nonumber \\&= \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (d_i, u_i, r_i, m_{\le s}) \leftarrow D^1_iU^1_iR^1_iM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (X^1_iY^1_i)_{d_i,u_i,r_i,m_{\le s}} \big \Vert (X_iY_i)_{d_i,u_i,r_i,m_{\le s}} \right) \right] \end{aligned}$$
(19)
$$\begin{aligned}&\ge \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{\le s}) \leftarrow X^1_iR^1_iM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (Y^1_i)_{x_i, r_i, m_{\le s}} \big \Vert (Y_i)_{x_i, r_i, m_{\le s}} \right) \right] \nonumber \\&= \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{\le s}) \leftarrow X^1_iR^1_iM^1_{\le s} \end{array}}\left[ \mathrm {D}\,\left( (Y^1_i)_{x_i,r_i,m_{\le s}} \big \Vert (Y_i)_{x_i,r_i,m_{< s}} \right) \right] \end{aligned}$$
(20)
$$\begin{aligned}&= \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{< s}) \leftarrow X^1_iR^1_iM^1_{< s} \end{array}}\left[ \mathrm {D}\,\left( (Y^1_iM^1_s)_{x_i,r_i,m_{<s}} \big \Vert (Y_i)_{x_i,r_i,m_{<s}} \otimes (M^1_s)_{x_i,r_i,m_{<s}} \right) \right] \nonumber \\&\ge \frac{1}{2} \sum _{i\notin C} \; \mathop {\mathbb {E}}\limits _{\begin{array}{c} (x_i, r_i, m_{< s}) \leftarrow X^1_iR^1_iM^1_{< s} \end{array}}\left[ \mathrm {I}\,\left( (Y^1_i)_{x_i,r_i,m_{<s}} \, \!: \, \!(M^1_s)_{x_i,r_i,m_{<s}} \right) \right] \end{aligned}$$
(21)
$$\begin{aligned}&= \frac{1}{2}\sum _{i\notin C} \; \mathrm {I}\,\left( Y^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!X^1_iR^1_iM^1_{<s} \right) \end{aligned}$$
(22)

Above, we have used Fact 2.4 several times. Equation (19) follows from the definition of \(R_i\); Eq. (20) follows from the fact that \(Y\leftrightarrow X_iR_iM_{<s}\leftrightarrow M_s\) is a Markov chain for every i when s is odd; and Eq. (21) follows from Fact 2.5. A symmetric argument shows that when \(s \in [t]\) is even,

$$\begin{aligned} \frac{1}{2} \sum _{i \notin C} \; \mathrm {I}\,\left( X^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!Y^1_iR^1_iM^1_{<s} \right) \le \delta k. \end{aligned}$$

Since Eq. (22) holds for every odd \(s \in [t]\) and the above bound holds for every even \(s \in [t]\), summing over all \(s\) gives

$$\begin{aligned} \sum _{i\notin C} \left( \sum _{s\text {odd}} \mathrm {I}\,\left( Y^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iX^1_iM^1_{<s} \right) + \sum _{s\text {even}} \mathrm {I}\,\left( X^1_i \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_iY^1_iM^1_{<s} \right) \right) \le 2 \delta kt. \end{aligned}$$
(23)

Note that in a true protocol the LHS of the above inequality is 0. Here we have shown that, even conditioned on success on all the coordinates in C, it remains small.

Combining Eqs. (13), (17), (18) and (23), and making standard use of Markov's inequality (see the sketch following Eq. (28)), we can find a coordinate \(j \notin C\) such that

$$\begin{aligned}&\mathrm {D}\,\left( X^1_j Y^1_j \big \Vert X_j Y_j \right) \le 12 \delta \end{aligned}$$
(24)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_j, x_j) \leftarrow R^1_jX^1_j \end{array}}\left[ \mathrm {D}\,\left( \left( Y_j^1 \right) _{r_j,x_j} \big \Vert \left( Y_j \right) _{x_j} \right) \right] \le 12 \delta \end{aligned}$$
(25)
$$\begin{aligned}&\mathop {\mathbb {E}}\limits _{\begin{array}{c} (r_j, y_j) \leftarrow R^1_jY^1_j \end{array}}\left[ \mathrm {D}\,\left( \left( X_j^1 \right) _{r_j,y_j} \big \Vert \left( X_j \right) _{y_j} \right) \right] \le 12 \delta \end{aligned}$$
(26)
$$\begin{aligned}&\sum _{s \text {odd}} \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) + \sum _{s \text {even}} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) \le 12 \delta _1 c \end{aligned}$$
(27)
$$\begin{aligned}&\sum _{s \text {odd}} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) + \sum _{s \text {even}} \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) \le 12 \delta t. \end{aligned}$$
(28)
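The Markov-inequality step can be made explicit as follows (a sketch; here \(S = [k]{\setminus } C\), the quantities \(a^{(m)}_i \ge 0\) stand for the five per-coordinate quantities appearing in Eqs. (24)–(28), and \(B_m\) for the bounds on their sums over \(i \notin C\) given by Eqs. (13), (17), (18) and (23), which we do not repeat here). By Markov's inequality and a union bound,

$$\begin{aligned} \left| \left\{ i \in S : a^{(m)}_i > \frac{6 B_m}{|S|} \right\} \right| < \frac{|S|}{6} \quad \text {for each } m \in \{1, \ldots , 5\}, \qquad \text {so} \qquad \left| \bigcup _{m=1}^{5} \left\{ i \in S : a^{(m)}_i > \frac{6 B_m}{|S|} \right\} \right| < |S|. \end{aligned}$$

Hence some coordinate \(j \in S\) satisfies all five conditions simultaneously; the particular thresholds \(12\delta \), \(12 \delta _1 c\), and \(12 \delta t\) in Eqs. (24)–(28) arise from this argument with the respective bounds \(B_m\) and the lower bound on \(|S|\) established earlier in the proof (not reproduced here).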

Let

$$\begin{aligned} \varepsilon ^{\prime }&\mathop {=}\limits ^{\mathrm {def}}\frac{\varepsilon }{125t} \end{aligned}$$
(29)
$$\begin{aligned} \varepsilon _s&\mathop {=}\limits ^{\mathrm {def}}{\left\{ \begin{array}{ll} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\,\text {is odd}\\ \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\, \text {is even} \end{array}\right. }\end{aligned}$$
(30)
$$\begin{aligned} c_s&\mathop {=}\limits ^{\mathrm {def}}{\left\{ \begin{array}{ll} \mathrm {I}\,\left( Y^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jX^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\, \text {is even}\\ \mathrm {I}\,\left( X^1_j \, \!: \, \!M^1_s \, \!\big \vert \, \!R^1_jY^1_jM^1_{<s} \right) &{}\text {if}\, s \in [t]\,\text {is odd.} \end{array}\right. } \end{aligned}$$
(31)

From Eq. (28) and the Cauchy–Schwarz inequality, we have that \(\sum _{s=1}^t \sqrt{\varepsilon _s} \le \sqrt{t \sum _{s=1}^t \varepsilon _s} \le \sqrt{12\delta }\, t \). From Eqs. (24) to (26) and Lemma 2.12, we can infer that \(\left( X_j, Y_j \right) \) is \((1-10\sqrt{3\delta })\)-embeddable in \(\left( X^1_j R^1_j, Y^1_j R^1_j \right) \). This, combined with Eqs. (27) and (28) and Lemma 3.5 (taking \(\varepsilon ^{\prime }\), \(\varepsilon _s\), and \(c_s\) in Lemma 3.5 to be as defined in Eqs. (29), (30), and (31), respectively, and taking \( X Y X^{\prime } Y^{\prime } R^{\prime } M_1^{\prime } \cdots M_t^{\prime }\) to be \( X_j Y_j X_j^1 Y^1_j R^1_j M_1^1 \cdots M_t^1 \)), implies the following. There exists a public-coin t-round protocol \(\mathscr {Q}^1\) between Alice, with input \(X_j\), and Bob, with input \(Y_j\), with Alice sending the first message and total communication

$$\begin{aligned} \frac{12 \delta _1 c + 5t}{\varepsilon ^{\prime }} + O\,\left( t \log \frac{1}{\varepsilon ^{\prime }} \right) < \mathrm {D}^{(t), \mu }_{\varepsilon }(f) \end{aligned}$$

such that at the end Alice possesses \(R_AM_{A,1} \cdots M_{A,t}\) and Bob possesses \(R_BM_{B,1} \cdots M_{B,t}\), satisfying

$$\begin{aligned}&\left\| X_j Y_j R_A R_B M_{A,1}M_{B,1} \cdots M_{A,t}M_{B,t} - X^1_j Y^1_j R^1_j R^1_j M^1_1M^1_1 \cdots M^1_tM^1_t \right\| _{1}\\&\le 10 \sqrt{3 \delta } + 3 \sqrt{12 \delta } t + 6 \varepsilon ^{\prime } t < \frac{\varepsilon }{2}. \end{aligned}$$

Assume towards contradiction that \(\Pr \,\left[ T_{j}=1 \big \vert T^{\left( r \right) }= 1 \right] > 1-\frac{\varepsilon }{2}\). Consider a protocol \(\mathscr {Q}^2\) (with no communication) for f between Alice, with input \(X^1_j R^1_j M^1_1 \cdots M^1_t\), and Bob, with input \(Y^1_jR^1_j M^1_1 \cdots M^1_t\), as follows. Bob generates the rest of the random variables present in \(Y^1\) (not present in his input) himself. He can do this because, conditioned on his input, those other random variables are independent of Alice's input. (Here we used Fact 2.15.) He then generates the output for the j-th coordinate in \(\mathscr {Q}\), and makes it the output of \(\mathscr {Q}^2\). This ensures that the success probability of \(\mathscr {Q}^2\) is \(\Pr \,\left[ T_{j}=1 \big \vert T^{\left( r \right) }= 1 \right] >1 - \frac{\varepsilon }{2}\). Now consider a protocol \(\mathscr {Q}^3\) for f, with Alice's input \(X_j\) and Bob's input \(Y_j\), which is the composition of \(\mathscr {Q}^1\) followed by \(\mathscr {Q}^2\). By Fact 2.9, the probability of success (averaged over the public coins and inputs \(X_j\) and \(Y_j\)) of \(\mathscr {Q}^3\) is larger than \(1 - \varepsilon \) (see the calculation below). Finally, by fixing the public coins of \(\mathscr {Q}^3\), we get a deterministic protocol \(\mathscr {Q}^4\) for f with Alice's input \(X_j\) and Bob's input \(Y_j\) such that the communication of \(\mathscr {Q}^4\) is less than \(\mathrm {D}^{(t), \mu }_{\varepsilon }(f)\) and the success probability (averaged over the inputs \(X_j\) and \(Y_j\)) of \(\mathscr {Q}^4\) is larger than \(1 - \varepsilon \). This contradicts the definition of \(\mathrm {D}^{(t), \mu }_{\varepsilon }(f)\). (Recall that \(X_jY_j\) are distributed according to \(\mu \).) So it must be that \(\Pr \,\left[ T_{j} = 1 \big \vert T^{\left( r \right) }= 1 \right] \le 1 - \frac{\varepsilon }{2}\). The claim now follows by setting \(i_{r+1} = j\).
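The calculation behind the success probability of \(\mathscr {Q}^3\) is the following (using that the probability of any event changes by at most the \(\ell _1\) distance between the underlying distributions):

$$\begin{aligned} \Pr \,\left[ \mathscr {Q}^3 \text { outputs correctly} \right]&\ge \Pr \,\left[ T_{j}=1 \big \vert T^{\left( r \right) }= 1 \right] - \left\| X_j Y_j R_A R_B M_{A,1}M_{B,1} \cdots M_{A,t}M_{B,t} - X^1_j Y^1_j R^1_j R^1_j M^1_1M^1_1 \cdots M^1_tM^1_t \right\| _{1} \\&> \left( 1 - \frac{\varepsilon }{2} \right) - \frac{\varepsilon }{2} = 1 - \varepsilon . \end{aligned}$$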

4 Deferred Proofs

Proof of Lemma 3.3

Let us introduce a new random variable N with joint distribution

$$\begin{aligned} X^{\prime } Y^{\prime } N \mathop {=}\limits ^{\mathrm {def}}(X^{\prime }Y^{\prime })(M^{\prime }|X^{\prime }). \end{aligned}$$
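Spelled out, this notation means that for all m, x, and y,

$$\begin{aligned} \Pr \,\left[ X^{\prime }=x, Y^{\prime }=y, N=m \right] = \Pr \,\left[ X^{\prime }=x, Y^{\prime }=y \right] \cdot \Pr \,\left[ M^{\prime }=m \big \vert X^{\prime }=x \right] . \end{aligned}$$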

Note that \(Y^{\prime } \leftrightarrow X^{\prime } \leftrightarrow N\) is a Markov chain. Using Lemma 2.2, we have that

$$\begin{aligned} \mathrm {D}\,\left( X^{\prime }Y^{\prime }M^{\prime } \big \Vert X^{\prime }Y^{\prime }N \right) = \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M^{\prime } \, \!\big \vert \, \!X^{\prime } \right) \le \varepsilon . \end{aligned}$$
(32)

Applying Fact 2.6, we get \(\left\| X^{\prime }Y^{\prime }M^{\prime }-X^{\prime }Y^{\prime }N \right\| _{1} \le \sqrt{\varepsilon }\). Theorem 3.1 and Claim 4.1 below together imply that there exists a public-coin protocol between Alice and Bob, with inputs \(X^{\prime }\) and \(Y^{\prime }\), with a single message from Alice to Bob of \(\frac{c+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon +\varepsilon ^{\prime }} \right) =\frac{c+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \) bits, at the end of which Alice and Bob possess random variables \(N^{\prime }_A\) and \(N^{\prime }_B\), respectively, satisfying

$$\begin{aligned} \left\| X^{\prime } Y^{\prime } N^{\prime }_AN^{\prime }_B - X^{\prime } Y^{\prime } NN \right\| _{1} \le 2 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime }. \end{aligned}$$

Finally, using the triangle inequality for the \(\ell _1\) norm, we conclude the desired bound (spelled out below).
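In more detail (a sketch, with \(3\sqrt{\varepsilon } + 6\varepsilon ^{\prime }\) being the bound in the form in which Lemma 3.3 is invoked later, e.g. in the base case of the proof of Lemma 3.4): since duplicating a random variable amounts to applying the same function to both distributions (Fact 2.9), we have

$$\begin{aligned} \left\| X^{\prime } Y^{\prime } N^{\prime }_AN^{\prime }_B - X^{\prime } Y^{\prime } M^{\prime }M^{\prime } \right\| _{1}&\le \left\| X^{\prime } Y^{\prime } N^{\prime }_AN^{\prime }_B - X^{\prime } Y^{\prime } NN \right\| _{1} + \left\| X^{\prime } Y^{\prime } NN - X^{\prime } Y^{\prime } M^{\prime }M^{\prime } \right\| _{1} \\&\le \left( 2 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime } \right) + \left\| X^{\prime } Y^{\prime } N - X^{\prime } Y^{\prime } M^{\prime } \right\| _{1} \le 3 \sqrt{\varepsilon } + 6 \varepsilon ^{\prime }. \end{aligned}$$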

Claim 4.1

Let random variables \(X^{\prime }\), \(Y^{\prime }\), \(M^{\prime }\), and N and numbers c, \(\varepsilon \), and \(\varepsilon ^{\prime }\) be the same as in the statement and the proof of Lemma 3.3. It holds that

$$\begin{aligned} \Pr _{\begin{array}{c} (m,x,y) \leftarrow NX^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \le 3 \varepsilon ^{\prime } + \sqrt{\varepsilon }. \end{aligned}$$

Proof

For any m, x, and y, it holds that

$$\begin{aligned} \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] }&= \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \\&= \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }\\&\quad +\, \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \\&\quad +\, \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] }. \end{aligned}$$

From the union bound, the above calculation, the Markov chain \(Y^{\prime } \leftrightarrow X^{\prime } \leftrightarrow N\) (which gives \(\Pr \,\left[ N=m|X^{\prime }=x \right] = \Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] \)), and the fact that \( 1 > \varepsilon > 0 \), we get

$$\begin{aligned}&\Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad {} = \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad {} \le \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \ge \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad \quad {} + \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \ge \frac{c+1}{\varepsilon ^{\prime }} \right] \\&\qquad \qquad \quad {} + \Pr _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } \ge \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right] . \end{aligned}$$
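The union bound applies because the three thresholds on the right-hand side sum to at most the original one: since \(0< \varepsilon < 1\),

$$\begin{aligned} \frac{\varepsilon +1}{\varepsilon ^{\prime }} + \frac{c+1}{\varepsilon ^{\prime }} + \frac{\varepsilon +1}{\varepsilon ^{\prime }} = \frac{c + 2\varepsilon + 3}{\varepsilon ^{\prime }} \le \frac{c+5}{\varepsilon ^{\prime }}, \end{aligned}$$

so if the logarithm on the left-hand side is at least \(\frac{c+5}{\varepsilon ^{\prime }}\), then at least one of its three summands in the decomposition above must be at least its respective threshold.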

We bound each of the above terms separately, starting with the first one. Let us define the set

$$\begin{aligned} G_1 \mathop {=}\limits ^{\mathrm {def}}\left\{ (m,x,y) : \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } < \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right\} . \end{aligned}$$

The following calculation gives a bound on the first term.

$$\begin{aligned} 0&\ge -\mathop {\mathbb {E}}\limits _{\begin{array}{c} (x,y) \leftarrow X^{\prime }Y^{\prime } \end{array}}\left[ \mathrm {D}\,\left( M^{\prime }_{xy} \big \Vert N_{xy} \right) \right] \nonumber \\&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right] \end{aligned}$$
(33)
$$\begin{aligned}&= \sum _{(m,x,y) \in G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \sum _{(m,x,y) \notin G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\ge \sum _{(m,x,y) \in G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(34)
$$\begin{aligned}&= \sum _{(m,x,y) \notin G_1} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ N=m|X^{\prime }=x, Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} -\, \mathrm {D}\,\left( M^{\prime }X^{\prime }Y^{\prime } \big \Vert NX^{\prime }Y^{\prime } \right) + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(35)
$$\begin{aligned}&\ge - 1 - \varepsilon + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(36)

Above, Eqs. (33) and (35) follow from the definition of the relative entropy and Eq. (34) follows from the definition of \(G_1\). To get Eq. (36), we use Fact 2.7 and Eq. (32). Equation (36) implies that

$$\begin{aligned} \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_1 \right] \le \varepsilon ^{\prime }. \end{aligned}$$

To upper bound the second term, let us define

$$\begin{aligned} G_2 \mathop {=}\limits ^{\mathrm {def}}\left\{ (m,x,y) : \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } < \frac{c+1}{\varepsilon ^{\prime }} \right\} . \end{aligned}$$

The following calculation gives a bound on the second term.

$$\begin{aligned} c&\ge \mathrm {I}\,\left( M^{\prime } \, \!: \, \!X^{\prime } \, \!\big \vert \, \!Y^{\prime } \right) \end{aligned}$$
(37)
$$\begin{aligned}&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} (m,x,y)\leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x, Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \right] \end{aligned}$$
(38)
$$\begin{aligned}&= \sum _{(m,x,y) \in G_2} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x,Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \sum _{(m,x,y) \notin G_2} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m|X^{\prime }=x,Y^{\prime }=y \right] }{\Pr \,\left[ M^{\prime }=m|Y^{\prime }=y \right] } \right) \nonumber \\&\ge - 1 + \frac{c+1}{\varepsilon ^{\prime }} \cdot \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_2 \right] \end{aligned}$$
(39)

Above, Eq. (37) is one of the assumptions in Lemma 3.3. Equation (38) follows from the definition of the conditional mutual information and Eq. (39) follows from the definition of \(G_2\) and Fact 2.7. Equation (39) implies that

$$\begin{aligned} \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_2 \right] \le \varepsilon ^{\prime }. \end{aligned}$$

To bound the last term, we define

$$\begin{aligned} G_3 \mathop {=}\limits ^{\mathrm {def}}\left\{ (m,x,y) : \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } < \frac{\varepsilon +1}{\varepsilon ^{\prime }} \right\} . \end{aligned}$$

The following calculation gives a bound on the third term.

$$\begin{aligned} \varepsilon&\ge \mathrm {D}\,\left( X^{\prime }Y^{\prime }M^{\prime } \big \Vert X^{\prime }Y^{\prime }N \right) \nonumber \\&\ge \mathrm {D}\,\left( Y^{\prime }M^{\prime } \big \Vert Y^{\prime }N \right) \end{aligned}$$
(40)
$$\begin{aligned}&= \mathop {\mathbb {E}}\limits _{\begin{array}{c} (m,x,y) \leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } \right] \nonumber \\&= \sum _{(m,x,y) \in G_3} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m,Y^{\prime }=y \right] } \right) \nonumber \\&\quad {} + \sum _{(m,x,y) \notin G_3} \left( \Pr \,\left[ M^{\prime }=m, X^{\prime }=x, Y^{\prime }=y \right] \cdot \log \frac{\Pr \,\left[ M^{\prime }=m, Y^{\prime }=y \right] }{\Pr \,\left[ N=m, Y^{\prime }=y \right] } \right) \nonumber \\&\ge -1 + \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_3 \right] \cdot \frac{\varepsilon +1}{\varepsilon ^{\prime }} \end{aligned}$$
(41)

Above, Eq. (40) follows from Fact 2.8 and Eq. (41) follows from the definition of \(G_3\) and Fact 2.7. Equation (41) implies that

$$\begin{aligned} \Pr \,\left[ \left( M^{\prime },X^{\prime },Y^{\prime } \right) \notin G_3 \right] \le \varepsilon ^{\prime }. \end{aligned}$$

Combining the bounds for the three terms, we get

$$\begin{aligned} \Pr _{\begin{array}{c} (m,x,y)\leftarrow M^{\prime }X^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \le 3 \varepsilon ^{\prime }. \end{aligned}$$

Using \(\left\| X^{\prime }Y^{\prime }M^{\prime } - X^{\prime }Y^{\prime }N \right\| _{1} \le \sqrt{\varepsilon } \) (as was shown previously), together with the fact that the probability of any event changes by at most the \(\ell _1\) distance when we pass from \(M^{\prime }X^{\prime }Y^{\prime }\) to \(NX^{\prime }Y^{\prime }\), we finally have

$$\begin{aligned} \Pr _{\begin{array}{c} (m,x,y)\leftarrow NX^{\prime }Y^{\prime } \end{array}}\left[ \log \frac{\Pr \,\left[ N=m|X^{\prime }=x \right] }{\Pr \,\left[ N=m|Y^{\prime }=y \right] } \ge \frac{c+5}{\varepsilon ^{\prime }} \right] \le 3 \varepsilon ^{\prime }+\sqrt{\varepsilon }. \end{aligned}$$

\(\square \)

Proof of Lemma 3.4

We prove the lemma by induction on t. For the base case \(t=1\), note that

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime }R^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime } \right)&= \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime } \right) \le c_1 \end{aligned}$$

and

$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime }R^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime } \right)&= \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M_1^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime } \right) \le \varepsilon _1. \end{aligned}$$

Lemma 3.3 implies (by taking \(X^{\prime }\), \(Y^{\prime }\), and \(M^{\prime }\) in Lemma 3.3 to be \(X^{\prime }R^{\prime }\), \(Y^{\prime }R^{\prime }\), and \(M_1^{\prime }\) respectively) that Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), can run a public-coin protocol with a single message from Alice to Bob of

$$\begin{aligned} \frac{c_1+5}{\varepsilon ^{\prime }}+O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \end{aligned}$$

bits and generate random variables \(M_{A,1}^{\prime }\) and \(M_{B,1}^{\prime }\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime } X^{\prime } Y^{\prime } M^{\prime }_1 M^{\prime }_1 - R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \right\| _{1} \le 3 \sqrt{\varepsilon _1} + 6 \varepsilon ^{\prime }. \end{aligned}$$

Now let \(t > 1\). We assume that t is odd; for even t, a similar argument applies. From the induction hypothesis, there exists a public-coin \((t-1)\)-round protocol \(\mathscr {P}_{t-1}\) between Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), with Alice sending the first message, and total communication

$$\begin{aligned} \frac{\sum _{s=1}^{t-1}c_s+5(t-1)}{\varepsilon ^{\prime }} + O\,\left( (t-1) \log \frac{1}{\varepsilon ^{\prime }} \right) \end{aligned}$$
(42)

such that at the end Alice and Bob possess random variables \(M_{A,1}^{\prime }, \ldots , M_{A,t-1}^{\prime }\) and \(M_{B,1}^{\prime }, \ldots , M_{B,t-1}^{\prime }\), respectively, satisfying

$$\begin{aligned}&\left\| R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \cdots M_{A,t-1}^{\prime }M_{B,t-1}^{\prime } - R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime } \right\| _{1} \nonumber \\&\quad \le 3 \sum _{s=1}^{t-1} \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } (t-1). \end{aligned}$$
(43)

Note that

$$\begin{aligned} \mathrm {I}\,\left( X^{\prime } R^{\prime } M_{<t}^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime }M_{<t}^{\prime } \right) = \mathrm {I}\,\left( X^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!Y^{\prime }R^{\prime }M_{<t}^{\prime } \right) \le c_t \end{aligned}$$

and

$$\begin{aligned} \mathrm {I}\,\left( Y^{\prime }R^{\prime }M_{<t}^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime }M_{<t}^{\prime } \right) = \mathrm {I}\,\left( Y^{\prime } \, \!: \, \!M_t^{\prime } \, \!\big \vert \, \!X^{\prime }R^{\prime }M_{<t}^{\prime } \right) \le \varepsilon _t. \end{aligned}$$

Therefore, Lemma 3.3 implies (by taking \(X^{\prime }\), \(Y^{\prime }\), and \(M^{\prime }\) in Lemma 3.3 to be \(X^{\prime }R^{\prime }M_{<t}^{\prime }\), \(Y^{\prime }R^{\prime }M_{<t}^{\prime }\), and \(M_t^{\prime }\), respectively) that Alice, with input \(X^{\prime }R^{\prime }M_{<t}^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }M_{<t}^{\prime }\), can run a public-coin protocol \(\mathscr {P}\) with a single message from Alice to Bob of

$$\begin{aligned} \frac{c_t+5}{\varepsilon ^{\prime }} + O\,\left( \log \frac{1}{\varepsilon ^{\prime }} \right) \end{aligned}$$
(44)

bits and generate new random variables \(M^{\prime \prime }_{A,t}\) and \(M^{\prime \prime }_{B,t}\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime } \cdots M_{t-1}^{\prime } M_t^{\prime }M_t^{\prime } - R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime } \cdots M_{t-1}^{\prime } M^{\prime \prime }_{A,t} M^{\prime \prime }_{B,t} \right\| _{1} \!\le \! 3 \sqrt{\varepsilon _t} \!+\! 6 \varepsilon ^{\prime }. \end{aligned}$$
(45)

Thus, by Fact 2.9 and Eq. (43), Alice, with input \(X^{\prime } R^{\prime } M_{A,<t}^{\prime }\), and Bob, with input \(Y^{\prime } R^{\prime } M_{B,<t}^{\prime }\), on running protocol \(\mathscr {P}\) will generate new random variables \(M_{A,t}^{\prime }\) and \(M_{B,t}^{\prime }\), respectively, satisfying

$$\begin{aligned}&\left\| R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \cdots M_{A,t-1}^{\prime }M_{B,t-1}^{\prime } M_{A,t}^{\prime }M_{B,t}^{\prime } \!-\! R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime } M_{A,t}^{\prime \prime }M_{B,t}^{\prime \prime } \right\| _{1} \nonumber \\&\quad = \left\| R^{\prime } X^{\prime } Y^{\prime } M_{A,1}^{\prime }M_{B,1}^{\prime } \cdots M_{A,t-1}^{\prime }M_{B,t-1}^{\prime } - R^{\prime } X^{\prime } Y^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime } \right\| _{1} \nonumber \\&\quad {} \le 3 \sum _{s=1}^{t-1} \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } (t-1). \end{aligned}$$
(46)

Above, the equality follows from Fact 2.9, because \(M_{A,t}^{\prime }\) and \(M_{A,t}^{\prime \prime }\) can be obtained by applying the same function (protocol) to \(X^{\prime }R^{\prime }M^{\prime }_{A,<t}\) and \(X^{\prime }R^{\prime }M^{\prime }_{<t}\), respectively, and similarly for \(M_{B,t}^{\prime }\) and \(M_{B,t}^{\prime \prime }\); the inequality follows from Eq. (43). Therefore, by composing protocol \(\mathscr {P}_{t-1}\) and protocol \(\mathscr {P}\), using Eqs. (42) and (44)–(46) and the triangle inequality for the \(\ell _1\) norm, we get a public-coin t-round protocol \(\mathscr {P}_t\) between Alice, with input \(X^{\prime }R^{\prime }\), and Bob, with input \(Y^{\prime }R^{\prime }\), with Alice sending the first message, and total communication

$$\begin{aligned} \frac{\sum _{s=1}^{t} c_s + 5 t}{\varepsilon ^{\prime }} + O\,\left( t \log \frac{1}{\varepsilon ^{\prime }} \right) , \end{aligned}$$

such that at the end Alice and Bob possess random variables \(M_{A,1}^{\prime }, \ldots , M_{A,t}^{\prime }\) and \(M_{B,1}^{\prime }, \ldots , M_{B,t}^{\prime }\), respectively, satisfying

$$\begin{aligned} \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime }M_t^{\prime }-R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \le 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$
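In more detail, the \(\ell _1\) bound for \(\mathscr {P}_t\) follows from Eqs. (45) and (46) and the triangle inequality (for the first term below, duplicating the earlier messages \(M_1^{\prime }, \ldots , M_{t-1}^{\prime }\) does not change the \(\ell _1\) distance, so Eq. (45) applies):

$$\begin{aligned}&\left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime }M_t^{\prime } - R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \\&\quad \le \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime }M_t^{\prime }M_t^{\prime } - R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime }M_{A,t}^{\prime \prime }M_{B,t}^{\prime \prime } \right\| _{1} \\&\qquad {}+ \left\| R^{\prime }X^{\prime }Y^{\prime }M_1^{\prime }M_1^{\prime } \cdots M_{t-1}^{\prime }M_{t-1}^{\prime }M_{A,t}^{\prime \prime }M_{B,t}^{\prime \prime } - R^{\prime }X^{\prime }Y^{\prime }M_{A,1}^{\prime }M_{B,1}^{\prime }\cdots M_{A,t}^{\prime }M_{B,t}^{\prime } \right\| _{1} \\&\quad \le \left( 3 \sqrt{\varepsilon _t} + 6 \varepsilon ^{\prime } \right) + \left( 3 \sum _{s=1}^{t-1} \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } (t-1) \right) = 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t. \end{aligned}$$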

\(\square \)

Proof of Lemma 3.5

In \(\mathscr {Q}_t\), Alice and Bob, using public coins and no communication, first generate \(R_A\) and \(R_B\) such that \( \left\| X Y R_A R_B - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right\| _{1} \le \tau \). They can do this because \(\left( X, Y \right) \) is \(\left( 1 - \tau \right) \)-embeddable in \(\left( X^{\prime }R^{\prime }, Y^{\prime }R^{\prime } \right) \). They then run protocol \(\mathscr {P}_t\) (given by Lemma 3.4) with Alice's input \(XR_A\) and Bob's input \(YR_B\); at the end, Alice and Bob possess \(M_{A,1},\ldots , M_{A,t}\) and \(M_{B,1},\ldots , M_{B,t}\), respectively. From Lemma 3.4, the communication of \(\mathscr {Q}_t\) is as desired. Now, from Fact 2.9, Lemma 3.4, and the triangle inequality for the \(\ell _1\) norm, we get

$$\begin{aligned}&\left\| X Y R_A R_B M_{A,1} M_{B,1} \cdots M_{A,t} M_{B,t} - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime } M_t^{\prime } \right\| _{1} \\&\quad \le \tau + 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime }t. \end{aligned}$$
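In more detail (a sketch; here \(\varPi \) denotes the map that runs \(\mathscr {P}_t\) on an input pair and appends the generated messages, a notation introduced only for this calculation):

$$\begin{aligned}&\left\| X Y R_A R_B M_{A,1} M_{B,1} \cdots M_{A,t} M_{B,t} - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime } M_t^{\prime } \right\| _{1} \\&\quad \le \left\| \varPi \left( X Y R_A R_B \right) - \varPi \left( X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right) \right\| _{1} + \left\| \varPi \left( X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right) - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } M_1^{\prime }M_1^{\prime } \cdots M_t^{\prime } M_t^{\prime } \right\| _{1} \\&\quad \le \left\| X Y R_A R_B - X^{\prime } Y^{\prime } R^{\prime } R^{\prime } \right\| _{1} + \left( 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t \right) \le \tau + 3 \sum _{s=1}^t \sqrt{\varepsilon _s} + 6 \varepsilon ^{\prime } t, \end{aligned}$$

where the first term in the second line is bounded using Fact 2.9 (the same map \(\varPi \) is applied to both distributions) and the second term using Lemma 3.4.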

\(\square \)

5 Open Problems

Some natural questions that arise from this work are:

  1.

    Recently Braverman et al. [6] improved our result by showing that

    $$\begin{aligned} \mathrm {R}^{(t),\mathrm {pub}}_{1-2^{-\varOmega (\varepsilon ^2k)}} \left( f^k \right) = \varOmega \,\left( \varepsilon ^2 \cdot k \cdot \left( \mathrm {R}^{(7t), \mathrm {pub}}_{\varepsilon }(f) - \kappa \left( \frac{t \log t}{\varepsilon } - \frac{t}{\varepsilon ^2} \right) \right) \right) , \end{aligned}$$

    for some constant \(\kappa \). Can the dependence on t be improved further?

  2.

    Direct product conjectures for quantum communication complexity are still widely open. Can these techniques be extended to show direct product theorems for bounded-round quantum communication complexity?