1 Introduction

The main objective of this paper is to expand the available techniques for proving information complexity lower bounds for communication problems. Let \(f:\mathcal{X}\times \mathcal{Y}\rightarrow \{0,1\}\) be a function, and \(\mu \) be a distribution on \(\mathcal{X}\times \mathcal{Y}\). Informally, the information complexity of f is the least amount of information that Alice and Bob need to exchange on average to compute f(x, y) using a randomized communication protocol if initially x is given to Alice, y is given to Bob, and \((x,y)\sim \mu \). Note that information here is measured in the Shannon sense, and the amount of information may be much smaller than the number of bits exchanged. Thus the randomized communication complexity of f is an upper bound on its information complexity, but may not be a lower bound.

Within communication complexity, information complexity was first introduced in the context of direct sum theorems for randomized communication complexity [2, 7, 9]. These techniques have also been used in the related direction of direct product theorems [12, 16, 19, 20]. A direct sum theorem in a computational model states that the amount of resources needed to perform k independent tasks is roughly k times the amount of resources c needed for computing a single task. A direct product theorem, which is a stronger statement, asserts that any attempt to solve k independent tasks using o(kc) resources must result in an exponentially small success probability \(2^{-\Omega (k)}\).

The direct sum line of work [2, 5, 11, 14] has eventually led to a tight connection (equality) between amortized communication complexity and information complexity. Thus proving lower bounds on the communication complexity of k copies of f for a growing k is equivalent to proving lower bounds on the information complexity of f. In particular, if f satisfies \(IC(f)=\Omega (CC(f))\), i.e. its information cost is asymptotically as large as its communication complexity, then a strong direct sum theorem holds for f. In addition to the intrinsic interest of understanding the amount of information exchange that needs to be involved in computing f, direct sum theorems motivate the development of techniques for proving lower bounds on the information complexity of functions.

Another important motivation for seeking lower bounds on the information complexity of functions stems from understanding the limits of security in two-party computation. In a celebrated result Ben-Or et al. [3] (see also [1]) showed how a multi-party computation (with three or more parties) may be carried out in a way that reveals no information to the participants except for the computation’s output. The protocol relies heavily on the use of random bits that are shared between some, but not all, parties. Such a resource can clearly not exist in the two-party setting. While it can be shown that perfect information security is unattainable by two-party protocols [6, 8], quantitatively it is not clear just how much information the parties must “leak” to each other to compute f. The quantitative answer depends on the model in which the leakage occurs, and whether quantum computation is allowed [15]. Information complexity answers this question in the strongest possible sense for classical protocols: the parties are allowed to use private randomness to help them “hide” their information, and the information revealed is measured on average. Thus an information complexity lower bound of I on a problem implies that the average (as opposed to worst-case) amount of information revealed to the parties is at least I.

As mentioned above, the information complexity is always upper bounded by the communication complexity of f. The converse is not known to be true. Moreover, lower bound techniques for communication complexity do not readily translate into lower bound techniques for information complexity. The key difference is that a low-information protocol is not limited in the amount of communication it uses, and thus rectangle-based communication bounds do not readily convert into information lower bounds. No general technique has been known to yield sharp information complexity lower bounds. A linear lower bound on the communication complexity of the disjointness function has been shown in [22]. An information-theoretic proof of this lower bound [7] can be adapted to prove a linear information lower bound on disjointness [5]. One general technique for obtaining (weak) information complexity lower bounds was introduced in [5], where it has been shown that any function that has I bits of information complexity, has communication complexity bounded by \(2^{O(I)}\). This immediately implies that the information complexity of a function f is at least the log of its communication complexity (\(IC(f)\ge \Omega (\log (CC(f)))\)). In fact, this result easily follows from the stronger result we prove below (Theorem 3.1).

1.1 Our Results

In this paper we prove that the discrepancy method—a general communication complexity lower bound technique—generalizes to information complexity. The discrepancy of f with respect to a distribution \(\mu \) on inputs, denoted \(Disc_\mu (f)\), measures how “unbalanced” f can get on any rectangle, where the balancedness is measured with respect to \(\mu \):

$$\begin{aligned} Disc_{\mu }(f)\mathop {=}\limits ^{def}\max _{\text {rectangles }R} \bigg | \Pr _\mu [f(x,y) = 0 \wedge (x,y)\in R] - \Pr _\mu [f(x,y) = 1 \wedge (x,y)\in R] \bigg |. \end{aligned}$$
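For intuition, on tiny domains the maximum over rectangles can be evaluated by brute force, which serves as a sanity check on the definition. The following sketch (in Python; all names are ours, and the exhaustive enumeration is only feasible for very small \(\mathcal{X}\), \(\mathcal{Y}\)) computes \(Disc_{\mu }(f)\) exactly:

```python
from itertools import product

def discrepancy(f, mu, X, Y):
    """Brute-force Disc_mu(f): maximize |Pr[f=0 on R] - Pr[f=1 on R]|
    over all rectangles R = S x T."""
    best = 0.0
    for smask in range(1 << len(X)):            # subsets S of X
        S = [x for i, x in enumerate(X) if smask >> i & 1]
        for tmask in range(1 << len(Y)):        # subsets T of Y
            T = [y for j, y in enumerate(Y) if tmask >> j & 1]
            bal = sum(mu[(x, y)] * (1 if f(x, y) == 0 else -1)
                      for x in S for y in T)
            best = max(best, abs(bal))
    return best

# one-bit XOR under the uniform distribution
X = Y = [0, 1]
mu = {(x, y): 0.25 for x, y in product(X, Y)}
d = discrepancy(lambda x, y: x ^ y, mu, X, Y)   # = 0.25
```

For one-bit XOR every two-cell rectangle is perfectly balanced, so the maximum 1/4 is attained by a single-cell rectangle; low discrepancy of this kind is exactly what makes a function hard for the bound below.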

A well-known lower bound (see e.g. [18]) asserts that the distributional communication complexity of f, denoted \(D^\mu _{1/2-\epsilon }(f)\), when required to predict f with advantage \(\epsilon \) over a random guess (with respect to \(\mu \)), is bounded from below by \(\Omega (\log (1/Disc_{\mu }(f)))\):

$$\begin{aligned} D^\mu _{1/2-\epsilon }(f) \ge \log (2\epsilon /Disc_{\mu }(f)). \end{aligned}$$

Note that the lower bound holds even if we are merely trying to get an advantage of \(\varepsilon =\sqrt{Disc_{\mu }(f)}\) over random guessing in computing f. We prove that the information complexity of computing f with probability 9/10 with respect to \(\mu \) is also bounded from below by \(\Omega (\log (1/Disc_{\mu }(f)))\).

Theorem 1.1

Let \(f{:} \mathcal{X}\times \mathcal{Y}\rightarrow \{0,1\}\) be a Boolean function and let \(\mu \) be any probability distribution on \(\mathcal{X}\times \mathcal{Y}\). Then

$$\begin{aligned} {\mathsf {IC}_{\mu }(f,1/10)} \ge \Omega (\log (1/Disc_{\mu }(f))). \end{aligned}$$

Remark 1.2

The choice of 9/10 is somewhat arbitrary. For randomized worst-case protocols, we may replace the success probability with \(1/2 + \delta \) for a constant \(\delta \), since repeating the protocol a constant number of times (\(O(1/\delta ^2)\) repetitions) and taking the majority vote would yield the aforementioned success rate, while the information cost of the repeated protocol differs only by a constant factor from the original one. In particular, using the prior-free information cost [5] this implies \( {\mathsf {IC}\left( f,1/2-\delta \right) } \ge \Omega (\delta ^2 \log (1/Disc_{\mu }(f))).\)
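The amplification step in Remark 1.2 is standard majority voting. A minimal numerical check (ours, not part of the formal development) computes the exact success probability of the majority of k independent runs:

```python
from math import comb

def majority_success(p, k):
    """Exact probability that the majority vote of k independent runs,
    each correct with probability p, is correct (k odd)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range((k + 1) // 2, k + 1))

# per-run advantage delta = 0.1: about 1/delta^2 repetitions already
# push the success probability well above 9/10
print(majority_success(0.5 + 0.1, 101))
```

With advantage 0.1 per run, 101 repetitions give success probability around 0.97, consistent with the \(O(1/\delta ^2)\) bound in the remark.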

In particular, Theorem 1.1 implies a linear lower bound on the information complexity of the inner product function \(IP(x,y)=\sum _{i=1}^n x_i y_i \mod 2\), and on a random Boolean function \(f_r{:}\{0,1\}^n \times \{0,1\}^n \rightarrow \{0,1\}\), expanding the (limited) list of functions for which nontrivial information-complexity lower bounds are known:

Corollary 1.3

The information complexity \({\mathsf {IC}_{uniform}(IP,1/10)}\) of IP(x, y) is \(\Omega (n)\). The information complexity \({\mathsf {IC}_{uniform}(f_r,1/10)}\) of a random function \(f_r\) is \(\Omega (n)\), except with probability \(2^{-\Omega (n)}\).

We study the communication and information complexity of the Greater-Than function (\(GT_n\)) on numbers of length n. This is a very well-studied problem [18, 21, 24]. Only very recently was the tight lower bound of \(\Omega ({\log n})\) in the public-coin probabilistic model given by Viola [25]. We show that the discrepancy of the \(GT_n\) function is \(O(1/\sqrt{n})\):

Lemma 1.4

There exists a distribution \(\mu _n\) on \(\mathcal {X}\times \mathcal {Y}\) such that the discrepancy of \(GT_n\) with respect to \(\mu _n\) satisfies \(\; Disc_{\mu _n}(GT_n)< 20/\sqrt{n}.\)

We defer the proof to the “Appendix 4”. Lemma 1.4 provides an alternative (arguably simpler) proof of Viola’s [25] lower bound on the communication complexity of \(GT_n\). By Theorem 1.1, Lemma 1.4 immediately implies a lower bound on the information complexity of \(GT_n\):

Corollary 1.5

\({\mathsf {IC}_{\mu _n}(GT_n,1/10)} = \Omega (\log {n})\)

This settles the information complexity of the GT function, since this problem can be solved by a randomized protocol with \(O(\log {n})\) communication (see [18]). This lower bound is particularly interesting since it demonstrates the first tight information-complexity lower bound for a natural function that is not linear.

The key technical idea in the proof of Theorem 1.1 is a new simulation procedure that allows us to convert any protocol that has information cost I into a (two-round) protocol that has communication complexity O(I) and succeeds with probability \(>1/2 + 2^{-O(I)}\), yielding a \(2^{-O(I)}\) advantage over random guessing. Combined with the discrepancy lower bound for communication complexity, this proves Theorem 1.1.

Our approach, namely, the observation that any nontrivial correlation with the function’s output requires \(\Omega (\log (1/Disc_\mu f))\) communication, was used before by Viola and Wigderson [26], who observed that the discrepancy of a function is essentially equivalent to the maximum correlation obtainable by a 2-bit protocol. We show here that a protocol with information cost I can be converted into one whose success probability is at least \(1/2 + 2^{-O(I)}\) (see also Remark 3.6 in Sect. 3).

1.2 Comparison and Connections to Prior Results

The most relevant prior work is an article by Lee et al. [20]. Improving on an earlier work of Shaltiel [23], Lee et al. show a direct product theorem for discrepancy, proving that the discrepancy of \(f^{\otimes k}\)—the k-wise XOR of a function f with itself—behaves as \(Disc(f)^{\Omega (k)}\). This implies in particular that the communication complexity of \(f^{\otimes k}\) scales at least as \(\Omega (k\cdot \log (1/Disc(f)))\). Using the fact that the limit as \(k\rightarrow \infty \) of the amortized communication complexity of f is equal to the information cost of f [4], the result of Lee et al. “almost” implies the bound of Theorem 1.1. Unfortunately, the amortized communication complexity in the sense of [4] is the amortized cost of k copies of f, where each copy is allowed to err with some probability (say 1/10). Generally speaking, this task is much easier than computing the XOR (which requires all copies to be evaluated correctly with high probability). Thus the lower bound that follows from Lee et al. applies to a more difficult problem, and does not imply the information complexity lower bound.

Another generic approach one may try to take is to use compression results such as [2] to derive information cost lower bounds from communication complexity lower bounds. The logic of such a proof would go as follows: “Suppose there was an information-complexity-I protocol \(\pi \) for f; then if one could compress it into a low-communication protocol, one would get a contradiction to the communication complexity lower bound for f”. Unfortunately, all known compression results compress \(\pi \) into a protocol \(\pi '\) whose communication complexity depends not only on I but also on \(CC(\pi )\). Even for the external information complexity (which is always greater than the internal information complexity), the bound obtained in [2] is of the form \(I_{ext}(\pi )\cdot \mathrm{polylog} (CC(\pi ))\). Thus compression results of this type cannot rule out protocols that have low information complexity but a very high (e.g. exponential) communication complexity.

Our result can be viewed as a weak compression result for protocols, where a protocol for computing f that conveys I bits of information is converted into a protocol that uses O(I) bits of communication and gives an advantage of \(2^{-O(I)}\) in computing f. This strengthens the result in [5], where a compression to \(2^{O(I)}\) bits of communication has been shown. It should be noted that compression to O(I) bits of communication and constant (say 2/3) success probability (as opposed to a small advantage over random guessing) has recently been proven impossible by the breakthrough work of [10], who showed that \(2^{\Omega (I)}\) bits is the best one can hope for.

In a very recent breakthrough work of Kerenidis et al. [17], our main protocol played an important role in the proof that almost all known lower bound techniques for communication complexity (and not just discrepancy) apply to information complexity. The results of [17] shed more light on the information complexity of many communication problems, and the question of whether interactive communication can be compressed.

2 Preliminaries

In an effort to make this paper as self-contained as possible, we provide some background on information theory and communication complexity, which is essential to proving our results. For further details and a more thorough treatment of these subjects see [4] and references therein.

Notation We reserve capital letters for random variables and distributions, calligraphic letters for sets, and small letters for elements of sets. Throughout this paper, we often use the notation |b to denote conditioning on the event \(B=b\). Thus A|b is shorthand for \(A|B=b\).

We use the standard notion of statistical/total variation distance between two distributions.

Definition 2.1

Let D and F be two random variables taking values in a set \(\mathcal {S}\). Their statistical distance is \(| D-F | \mathop {=}\limits ^{def}\max _{\mathcal {T} \subseteq \mathcal {S}}(|\Pr [D \in \mathcal {T}] - \Pr [F \in \mathcal {T}]|) = \frac{1}{2} \sum _{s \in \mathcal {S}}|\Pr [D=s]-\Pr [F=s]|\).
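The two expressions in Definition 2.1 always coincide, since the maximizing \(\mathcal {T}\) is the set of points where D outweighs F. A quick numerical check (example distributions ours):

```python
def tv_max_subset(D, F, S):
    # max over subsets T of |Pr[D in T] - Pr[F in T]| is attained by
    # T = {s : D(s) > F(s)}
    return sum(D[s] - F[s] for s in S if D[s] > F[s])

def tv_half_l1(D, F, S):
    # the half-L1 form of the same quantity
    return 0.5 * sum(abs(D[s] - F[s]) for s in S)

S = ['a', 'b', 'c']
D = {'a': 0.5, 'b': 0.3, 'c': 0.2}
F = {'a': 0.2, 'b': 0.3, 'c': 0.5}
# both formulas give 0.3
assert abs(tv_max_subset(D, F, S) - tv_half_l1(D, F, S)) < 1e-12
```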

2.1 Information Theory

Definition 2.2

(Entropy) The entropy of a random variable X is \(H(X) \mathop {=}\limits ^{def}\sum _x \Pr [X=x] \log (1/\Pr [X=x]).\) The conditional entropy H(X|Y) is defined as \({\mathbf{E}}_{y \in _{R}Y}\left[ H(X|Y=y) \right] \).

Definition 2.3

(Mutual Information) The mutual information between two random variables A and B, denoted I(A; B), is defined to be the quantity \(H(A) - H(A|B) = H(B) - H(B|A).\) The conditional mutual information I(A; B|C) is \(H(A|C) - H(A|BC)\).
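These quantities are straightforward to compute for explicit distributions. A small sketch (ours) uses the identity \(I(A;B)=H(A)+H(B)-H(A,B)\), which follows from Definition 2.3:

```python
from math import log2

def H(p):
    """Shannon entropy of a distribution given as a list of probabilities."""
    return -sum(q * log2(q) for q in p if q > 0)

def mutual_information(joint):
    """I(A;B) = H(A) + H(B) - H(A,B) for a joint distribution joint[a][b]."""
    pa = [sum(row) for row in joint]          # marginal of A
    pb = [sum(col) for col in zip(*joint)]    # marginal of B
    pab = [q for row in joint for q in row]   # the joint, flattened
    return H(pa) + H(pb) - H(pab)

# independent fair bits carry no information about each other:
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# a bit fully determines its copy:
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```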

We also use the notion of divergence (also known as Kullback-Leibler distance or relative entropy), which is a different way to measure the distance between two distributions:

Definition 2.4

(Divergence) The informational divergence between two distributions is

$$\begin{aligned}\mathbf{D} \left( A ||B \right) \mathop {=}\limits ^{def}\sum _x A(x) \log (A(x)/B(x)).\end{aligned}$$

Proposition 2.5

Let A, B, C be random variables in the same probability space. For every a in the support of A and c in the support of C, let \(B_a\) denote \(B|A=a\) and \(B_{ac}\) denote \(B|A=a,C=c\). Then \(I(A;B | C) = {\mathbf E}_{a,c\in _R A,C} [\mathbf{D} \left( B_{ac} ||B_c \right) ]\).
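Proposition 2.5 expresses conditional mutual information as an expected divergence. Its unconditional analogue, \(I(A;B) = {\mathbf E}_{a\in _R A}[\mathbf{D} \left( B_a||B \right) ]\), can be checked numerically on any explicit joint distribution (the example below is ours):

```python
from math import log2

joint = [[0.4, 0.1], [0.2, 0.3]]        # p(a, b)
pa = [sum(row) for row in joint]        # marginal of A
pb = [sum(col) for col in zip(*joint)]  # marginal of B

# I(A;B) computed directly from the joint
mi = sum(p * log2(p / (pa[a] * pb[b]))
         for a, row in enumerate(joint) for b, p in enumerate(row) if p > 0)

# E_a [ D( B|A=a || B ) ]
exp_div = sum(pa[a] * sum((p / pa[a]) * log2((p / pa[a]) / pb[b])
                          for b, p in enumerate(row) if p > 0)
              for a, row in enumerate(joint))

assert abs(mi - exp_div) < 1e-12  # the two expressions agree
```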

2.2 Communication Complexity

We use the standard definitions of the computational model defined in [27]. For complete details see “Appendix 1”.

Given a communication protocol \(\pi \), \(\pi (x,y)\) denotes the concatenation of the public randomness with all the messages that are sent during the execution of \(\pi \). We call this the transcript of the protocol. When referring to the random variable denoting the transcript, rather than a specific transcript, we will use the notation \(\Pi (x,y)\)—or simply \(\Pi \) when x and y are clear from the context, thus \(\pi (x,y)\in _{R}\Pi (x,y)\). When the inputs are themselves random variables (X, Y), we will denote the transcript by \(\Pi (X,Y)\), or just \(\Pi \).

Definition 2.6

(Communication Complexity notation) For a function \(f: \mathcal {X} \times \mathcal {Y} \rightarrow \mathbb {Z}_K\), a distribution \(\mu \) supported on \(\mathcal {X} \times \mathcal {Y}\), and a parameter \(\epsilon > 0\), \(D^\mu _{\epsilon }(f)\) denotes the communication complexity of the cheapest deterministic protocol computing f on inputs sampled according to \(\mu \) with error \(\epsilon \).

Definition 2.7

(Combinatorial Rectangle) A rectangle in \(\mathcal {X} \times \mathcal {Y}\) is a subset \(R\subseteq \mathcal {X} \times \mathcal {Y}\) which satisfies \( (x_1,y_1) \in R \text { and } (x_2,y_2) \in R \Longrightarrow (x_1,y_2) \in R.\)

2.3 Information + Communication: The Information Cost of a Protocol

The following quantity, which is implicit in [7] and was explicitly defined in [2], is the central notion of this paper.

Definition 2.8

The (internal) information cost of a protocol \(\pi \) over inputs drawn from a distribution \(\mu \) on \(\mathcal{X}\times \mathcal{Y}\), is given by:

$$\begin{aligned} {\mathsf {IC}_{\mu }(\pi )}:=I(\Pi ;X|Y)+I(\Pi ;Y|X). \end{aligned}$$

Intuitively, Definition 2.8 captures how much the two parties learn about each other’s inputs from the execution transcript of the protocol \(\pi \). The first term captures what the second player learns about X from \(\Pi \)—the mutual information between the input X and the transcript \(\Pi \) given the input Y. Similarly, the second term captures what the first player learns about Y from \(\Pi \).

Note that the information of a protocol \(\pi \) depends on the prior distribution \(\mu \), as the mutual information between the transcript \(\Pi \) and the inputs depends on the prior distribution on the inputs. To give an extreme example, if \(\mu \) is a singleton distribution, i.e. one with \(\mu (\{(x,y)\}) = 1\) for some \((x,y)\in \mathcal{X}\times \mathcal{Y}\), then \({\mathsf {IC}_{\mu }(\pi )}=0\) for all possible \(\pi \), as no protocol can reveal anything to the players about the inputs that they do not already know a-priori. Similarly, \({\mathsf {IC}_{\mu }(\pi )}=0\) if \(\mathcal{X}=\mathcal{Y}\) and \(\mu \) is supported on the diagonal \(\{(x,x):x\in \mathcal{X}\}\). As expected, one can show that the communication cost \(\mathsf {CC}(\pi )\) of \(\pi \) is an upper bound on its information cost over any distribution \(\mu \):
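The prior dependence just described is easy to see concretely. For the one-round protocol in which Alice announces her bit X, the transcript is \(\Pi = X\), so \({\mathsf {IC}_{\mu }(\pi )} = I(X;X|Y) + I(X;Y|X) = H(X|Y)\). A minimal sketch (function names ours):

```python
from math import log2

def cond_entropy(joint):
    """H(X|Y) for a joint distribution joint[x][y]."""
    h = 0.0
    py = [sum(col) for col in zip(*joint)]   # marginal of Y
    for y, q in enumerate(py):
        for x in range(len(joint)):
            p = joint[x][y]
            if p > 0:
                h += p * log2(q / p)
    return h

# Transcript Pi = X, so IC_mu(pi) = H(X|Y) + 0 for this protocol.
uniform  = [[0.25, 0.25], [0.25, 0.25]]  # independent uniform bits
diagonal = [[0.5, 0.0], [0.0, 0.5]]      # mu supported on the diagonal
print(cond_entropy(uniform))    # 1.0: the protocol reveals a full bit
print(cond_entropy(diagonal))   # 0.0: Bob already knows X
```

The same protocol reveals one bit under the product prior and nothing under the diagonal prior, matching the discussion above.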

Lemma 2.9

[4] For any distribution \(\mu \), \({\mathsf {IC}_{\mu }(\pi )}\le \mathsf {CC}(\pi )\).

On the other hand, as noted in the introduction, the converse need not hold. In fact, by [4], getting a strict inequality in Lemma 2.9 is equivalent to violating the direct sum conjecture for randomized communication complexity.

As one might expect, the information cost of a function f with respect to \(\mu \) and error \(\rho \) is the least amount of information that needs to be revealed by a protocol computing f with error \(\le \rho \):

$$\begin{aligned} {\mathsf {IC}_{\mu }(f,\rho )}:= \inf _{\pi :~ {\mathbf P}_\mu [\pi (x,y)\ne f(x,y)]\le \rho } {\mathsf {IC}_{\mu }(\pi )}. \end{aligned}$$

The (prior-free) information cost was defined in [5] as the minimum amount of information that a worst-case error-\(\rho \) randomized protocol can reveal against all possible prior distributions.

$$\begin{aligned} {\mathsf {IC}\left( f,\rho \right) } := \inf _{\pi \text { is a protocol with } {\mathbf P}[\pi (x,y)\ne f(x,y)]\le \rho \text { for all } (x,y)} \max _\mu ~~ {\mathsf {IC}_{\mu }(\pi )}. \end{aligned}$$

This information cost matches the amortized randomized communication complexity of f [5]. It is clear that lower bounds on \({\mathsf {IC}_{\mu }(f,\rho )}\) for any distribution \(\mu \) also apply to \({\mathsf {IC}\left( f,\rho \right) } \).

3 Proof of Theorem 1.1

To establish the correctness of Theorem 1.1, we prove the following theorem, which is the central result of this paper:

Theorem 3.1

Suppose that \({\mathsf {IC}_{\mu }(f,1/10)} = I_\mu \). Then there exists a protocol \(\pi '\) such that

  • \(\mathsf {CC}(\pi ') = O(I_\mu )\).

  • \({\mathbf P}_{(x,y)\sim \mu }[\pi '(x,y) = f(x,y)] \ge 1/2 + 2^{-O(I_\mu )}\)

We first show how Theorem 1.1 follows from Theorem 3.1:

Proof of Theorem 1.1

Let \(f, \mu \) be as in Theorem 1.1, and let \({\mathsf {IC}_{\mu }(f,1/10)} = I_\mu \). By Theorem 3.1, there exists a protocol \(\pi '\) computing f with error probability \(1/2 - 2^{-O(I_\mu )}\) using \(O(I_\mu )\) bits of communication. Applying the discrepancy lower bound for communication complexity we obtain

$$\begin{aligned} O(I_\mu ) \ge D^\mu _{1/2-2^{-O(I_\mu )}}(f) \ge \log (2\cdot 2^{-O(I_\mu )}/Disc_{\mu }(f)) \end{aligned}$$
(1)

which after rearranging gives \(I_\mu \ge \Omega (\log (1/Disc_{\mu }(f)))\), as desired. \(\square \)

We turn to prove Theorem 3.1. The main step is the following sampling lemma.

Lemma 3.2

Let \(\mu \) be any distribution over a universe \(\mathcal{U}\) and let \(I\ge 0\) be a parameter that is known to both A and B. Further, let \(\nu _A\) and \(\nu _B\) be two distributions over \(\mathcal{U}\) such that \(\mathbf{D} \left( \mu ||\nu _A \right) \le I\) and \(\mathbf{D} \left( \mu ||\nu _B \right) \le I\). The players are each given a pair of real functions \((p_A,q_A)\), \((p_B,q_B)\), \(p_A,q_A,p_B,q_B:\mathcal{U}\rightarrow [0,1]\) such that for all \(x\in \mathcal{U}\), \(\mu (x) = p_A(x)\cdot p_B(x)\), \(\nu _A(x)=p_A(x)\cdot q_A(x)\), and \(\nu _B(x)=p_B(x)\cdot q_B(x)\). Then there is a (two-round) sampling protocol \(\Pi _1=\Pi _1(p_A, p_B, q_A, q_B,I)\) which has the following properties:

  1. At the end of the protocol, the players either declare that the protocol “fails”, or output \(x_A\in \mathcal{U}\) and \(x_B\in \mathcal{U}\), respectively (“success”).

  2. Let \({\mathcal {S}}\) be the event that the players output “success”. Then \({\mathcal {S}}\Rightarrow x_A = x_B\), and \(0.9 \cdot 2^{-50{(I+1)}} \le \Pr [{\mathcal {S}}] \le 2^{-50(I+1)}\).

  3. If \(\mu _1\) is the distribution of \(x_A\) conditioned on \({\mathcal {S}}\), then \(|\mu -\mu _1|<2/9\).

Furthermore, \(\Pi _1\) can be “compressed” to a protocol \(\Pi _2\) such that \(\mathsf {CC}(\Pi _2) = 211 I + 1\), whereas \(|\Pi _1 - \Pi _2| \le 2^{-59 I}\) (by an abuse of notation, here we identify \(\Pi _i\) with the random variable representing its output).

We will use the following technical fact about the divergence of distributions, which to the best of our knowledge first appeared in the substate theorem of [13].

Claim 3.3

([13], Claim 5.1 in [5]) Suppose that \(\mathbf{D} \left( \mu ||\nu \right) \le I\). Let \(\varepsilon \) be any parameter. Then \( \mu \left\{ x:~2^{(I+1)/\varepsilon }\cdot \nu (x) <\mu (x)\right\} < \varepsilon . \)

For completeness, we repeat the proof in “Appendix 2”.
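Claim 3.3 can be spot-checked numerically: compute \(I = \mathbf{D} \left( \mu ||\nu \right) \) and verify that the \(\mu \)-mass of points where \(\nu \) is undervalued by more than a \(2^{(I+1)/\varepsilon }\) factor stays below \(\varepsilon \) (example distributions ours):

```python
from math import log2

mu = {'x1': 0.5, 'x2': 0.5}
nu = {'x1': 0.9, 'x2': 0.1}

I = sum(p * log2(p / nu[x]) for x, p in mu.items())  # D(mu || nu)
eps = 0.25
thresh = 2 ** ((I + 1) / eps)

# mass of points where 2^{(I+1)/eps} * nu(x) < mu(x)
bad_mass = sum(p for x, p in mu.items() if thresh * nu[x] < p)
assert bad_mass < eps  # as Claim 3.3 guarantees
```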

Proof of Lemma 3.2

Throughout the execution of \(\Pi _1\), Alice and Bob interpret their shared random tape as a source of points \((x_i,\alpha _i,\beta _i)\) uniformly distributed in \(\mathcal{U}\times [0,2^{50(I+1)}] \times [0,2^{50(I+1)}]\). Alice and Bob consider the first \(T= |\mathcal{U}|\cdot 2^{100(I+1)}\cdot 60I\) such points. Their goal will be to discover the first index \(\tau \) such that \(\alpha _\tau \le p_A(x_\tau )\) and \(\beta _\tau \le p_B(x_\tau )\) (where they wish to find it using a minimal amount of communication, even if they are most likely to fail). First, we note that the probability that an index t satisfies \(\alpha _t\le p_A(x_t)\) and \(\beta _t \le p_B(x_t)\) is exactly \(1/|\mathcal{U}|2^{50(I+1)}2^{50(I+1)} = 1/|\mathcal{U}|2^{100(I+1)}\). Hence the probability that \(\tau >T\) (i.e. that \(x_\tau \) is not among the T points considered) is bounded by

$$\begin{aligned} \left( 1-1/|\mathcal{U}|2^{100(I+1)}\right) ^{T} < e^{-T/|\mathcal{U}|2^{100(I+1)}} = e^{-60I} < 2^{-60I} \end{aligned}$$
(2)

Denote by \(\mathcal{A}\) the following set of indices \(\mathcal{A}:= \{i\le T:~\alpha _i\le p_A(x_i) \text { and }\beta _i\le 2^{50(I+1)}\cdot q_A(x_i)\}\), the set of potential candidates for \(\tau \) from A’s viewpoint. Similarly, denote \(\mathcal{B}:= \{i\le T:~\alpha _i\le 2^{50(I+1)}\cdot q_B(x_i) \text { and }\beta _i\le p_B(x_i)\}\).

The protocol \(\Pi _1\) is very simple. Alice takes her bet on the first element \(a = x_i \in \mathcal{A}\) and sends it to Bob. Bob outputs a only if (it just so happens that) \(\beta _i \le p_B(a)\).

We turn to analyze \(\Pi _1\). Denote the set of “Good” elements by

$$\begin{aligned} {\mathcal {G}}\mathop {=}\limits ^{def}\{ x:~2^{50(I+1)}\cdot \nu _A(x) \ge \mu (x) \; \text { and } \; 2^{50(I+1)}\cdot \nu _B(x) \ge \mu (x)\}. \end{aligned}$$

Then by Claim 3.3, \(\mu ({\mathcal {G}}) \ge 48/50 = 24/25\). The following proposition asserts that if the protocol succeeds, the output of \(\Pi _1\) has the “correct” distribution on elements in \({\mathcal {G}}\).

Proposition 3.4

Assume \(\mathcal{A}\) is nonempty. Then for any \(x_i \in \mathcal{U}\), the probability that \(\Pi _1\) outputs \(x_i\) is at most \(\mu (x_i)\cdot 2^{-50(I+1)}\). If \(x_i \in {\mathcal {G}}\), then this probability is exactly \(\mu (x_i)\cdot 2^{-50(I+1)}\).

Proof

Note that if \(\mathcal{A}\) is nonempty, then for any \(x_i\in \mathcal{U}\), the probability that \(x_i\) is the first element in \(\mathcal{A}\) (i.e, \(a = x_i\)) is \(p_A(x_i)q_A(x_i) = \nu _A(x_i)\). By construction, the probability that \(\beta _i\le p_B(a)\) is \(\min \{p_B(x_i)/(2^{50{(I+1)}}q_A(x_i)), 1\}\), and thus

$$\begin{aligned} \Pr [\Pi _1 \text { outputs } x_i] \le p_A(x_i)q_A(x_i)\cdot \frac{p_B(x_i)}{2^{50(I+1)}q_A(x_i)} = \mu (x_i)\cdot 2^{-50(I+1)}. \end{aligned}$$

On the other hand, if \(x_i \in {\mathcal {G}}\), then we know that \( p_B(x_i)/q_A(x_i) = \mu (x_i)/\nu _A(x_i) \le 2^{50(I+1)}, \) and so the probability that \(\beta _i\le p_B(a)\) is exactly \(p_B(x_i)/(2^{50{(I+1)}}q_A(x_i))\). Since \(\Pr [\Pi _1 \text { outputs } x_i] = \Pr [a = x_i]\Pr [\beta _i\le p_B(a)]\) (assuming \(\mathcal{A}\) is nonempty), we conclude that:

$$\begin{aligned} x_i \in {\mathcal {G}}\Longrightarrow \Pr [\Pi _1 \text { outputs } x_i] = p_A(x_i)q_A(x_i)\cdot \frac{p_B(x_i)}{2^{50(I+1)}q_A(x_i)} = \mu (x_i)\cdot 2^{-50(I+1)}. \end{aligned}$$

\(\square \)
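The computation in Proposition 3.4 can be replayed exactly on a toy instance. In the sketch below (all numbers ours) the constant \(2^{50(I+1)}\) is replaced by a small C, subject only to \(C\cdot \nu _A(x)\ge \mu (x)\) so that every element is “good”; the conditional output distribution then equals \(\mu \) exactly:

```python
U = [0, 1, 2]
p_A = [1.0, 0.6, 0.4]; p_B = [0.5, 0.5, 0.5]   # mu(x) = p_A(x) * p_B(x)
q_A = [0.4, 0.5, 0.75]                          # nu_A(x) = p_A(x) * q_A(x)
mu   = [p_A[x] * p_B[x] for x in U]             # (0.5, 0.3, 0.2)
nu_A = [p_A[x] * q_A[x] for x in U]             # (0.4, 0.3, 0.3)
C = 2.0   # stands in for 2^{50(I+1)}; here C * nu_A(x) >= mu(x) for all x

# Alice's candidate a is distributed according to nu_A; given a = x,
# Bob accepts with probability min(p_B(x) / (C * q_A(x)), 1).
out = [nu_A[x] * min(p_B[x] / (C * q_A[x]), 1.0) for x in U]

# Proposition 3.4: Pr[protocol outputs x] = mu(x) / C on good elements
assert all(abs(out[x] - mu[x] / C) < 1e-12 for x in U)
success = sum(out)                    # Pr[S] = 1/C here
cond = [o / success for o in out]     # conditioned on success: exactly mu
```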

We are now ready to estimate the success probability of the protocol.

Proposition 3.5

Let \({\mathcal {S}}\) denote the event that \(\mathcal{A}\ne \emptyset \) and \(a \in \mathcal{B}\) (i.e., that the protocol succeeds). Then

$$\begin{aligned} 0.9 \cdot 2^{-50{(I+1)}} \le \Pr [{\mathcal {S}}] \le 2^{-50(I+1)}. \end{aligned}$$

Proof

Using Proposition 3.4, we have that

$$\begin{aligned} \Pr [{\mathcal {S}}] &\le {\mathbf P}[a\in \mathcal{B}\; |\; \mathcal{A}\ne \emptyset ] = \sum _{i\in \mathcal{U}}\Pr [a = x_i]\Pr [\beta _i\le p_B(a)] \\ &\le \sum _{i\in \mathcal{U}} \mu (x_i)\cdot 2^{-50(I+1)} = 2^{-50(I+1)} \end{aligned}$$
(3)

For the lower bound, we have

$$\begin{aligned} \Pr [{\mathcal {S}}] &\ge \Pr [\beta _i\le p_B(a) \; |\; \mathcal{A}\ne \emptyset ]\cdot \Pr [\mathcal{A}\ne \emptyset ] \\ &\ge (1-2^{-60I})\bigg (\sum _{i\in \mathcal{U}}\Pr [a = x_i]\Pr [\beta _i\le p_B(a)]\bigg ) \\ &\ge (1-2^{-60I})\bigg (\sum _{i\in {\mathcal {G}}}\Pr [a = x_i]\Pr [\beta _i\le p_B(a)]\bigg ) \\ &= (1-2^{-60I})\bigg ( 2^{-50(I+1)}\sum _{i\in {\mathcal {G}}} \mu (x_i) \bigg ) = (1-2^{-60I})\bigg ( 2^{-50(I+1)}\mu ({\mathcal {G}}) \bigg ) \\ &\ge \frac{24}{25}(1-2^{-60I})2^{-50{(I+1)}} \ge 0.9 \cdot 2^{-50{(I+1)}} \end{aligned}$$
(4)

where the equality follows again from Proposition 3.4. This proves the second claim of Lemma 3.2. \(\square \)

The following claim asserts that if \({\mathcal {S}}\) occurs, then the distribution of a is indeed close to \(\mu \).

Claim 4

Let \(\mu _1\) be the distribution of \(a|{\mathcal {S}}\). Then \(|\mu _1 - \mu | \le 2/9\).

Proof

The claim follows directly from Proposition 3.5. We defer the proof to the “Appendix 3”.

We turn to the “Furthermore” part of Lemma 3.2. The protocol \(\Pi _1\) satisfies the premises of the lemma, except it has a high communication cost. This is due to the fact that Alice explicitly sends a to Bob. To reduce the communication, Alice will instead send O(I) random hash values of a, and Bob will add corresponding consistency constraints to his set of candidates. The final protocol \(\Pi _2\) is given in Figure 1.

Fig. 1

The sampling protocol \(\Pi _2\) from Lemma 3.2

Let \(\mathcal{E}\) denote the event that in step 3 of the protocol, Bob finds an element \(x_i\ne a\) (that is, the event that the protocol outputs “success” but \(x_A \ne x_B\)). We upper bound the probability of \(\mathcal{E}\). Given \(a\in \mathcal{A}\) and \(x_i\in \mathcal{B}\) such that \(a\ne x_i\), the probability (over possible choices of the hash functions) that \(h_j(a)=h_j(x_i)\) for \(j=1,\ldots ,d\) is \(2^{-d} \le 2^{-211 I}\). For any t, \({\mathbf P}[t\in \mathcal{B}] \le \frac{1}{|\mathcal{U}|}\sum _{x_i\in \mathcal{U}}p_B(x_i)q_B(x_i)\cdot 2^{50(I+1)} = \frac{1}{|\mathcal{U}|}\sum _{x_i\in \mathcal{U}}\nu _B(x_i)\cdot 2^{50(I+1)} = 2^{50(I+1)}/|\mathcal{U}|\). Thus, by a union bound we have

$$\begin{aligned} {\mathbf P}[\mathcal{E}] &\le {\mathbf P}[\exists x_i\in \mathcal{B}\; s.t. \; x_i\ne a \; \wedge \; h_j(a)=h_j(x_i) \; \forall \; j=1,\ldots ,d] \\ &\le T\cdot 2^{50(I+1)}\cdot 2^{-d}/|\mathcal{U}| = 2^{150(I+1)}\cdot 60 I \cdot 2^{-211 I} \ll 2^{-60 I}. \end{aligned}$$
(5)

By a slight abuse of notation, let \(\Pi _2\) be the distribution of \(\Pi _2\)’s output. Similarly, denote by \(\Pi _1\) the distribution of the output of protocol \(\Pi _1\). Note that if \(\mathcal{E}\) does not occur, then the outcome of the execution of \(\Pi _2\) is identical to the outcome of \(\Pi _1\). Since \({\mathbf P}[\mathcal{E}] \le 2^{-60 I}\), we have

$$\begin{aligned} |\Pi _2 - \Pi _1| \le \Pr [\mathcal{E}] \cdot |[\Pi _2|\mathcal{E}]-[\Pi _1|\mathcal{E}]| \le 2\cdot 2^{-60I} \le 2^{-59 I} \end{aligned}$$

which finishes the proof of the lemma. \(\square \)
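The hashing step can be illustrated with a small simulation. The sketch below is ours: we use SHA-256 with per-index salts as a stand-in for the random hash functions \(h_1,\ldots ,h_d\), and a far smaller candidate set than T. It shows Bob filtering his candidates down to Alice's element using only d transmitted bits:

```python
import hashlib

def hash_bits(s, salts):
    """d hash bits of string s, one bit per salt (a stand-in for h_1..h_d)."""
    return [hashlib.sha256((salt + s).encode()).digest()[0] & 1 for salt in salts]

d = 40                                  # the paper uses d = 211*I bits
salts = ['h%d:' % j for j in range(d)]  # shared randomness selecting the hashes

a = 'transcript-417'                                     # Alice's element a
candidates = ['transcript-%d' % i for i in range(1000)]  # Bob's candidate set
alice_msg = hash_bits(a, salts)                          # the d bits Alice sends

matches = [c for c in candidates if hash_bits(c, salts) == alice_msg]
# a wrong candidate survives all d checks with probability about 2^{-40},
# so with overwhelming probability only a itself remains
assert matches == [a]
```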

Remark 3.6

The communication cost of the sampling protocol \(\Pi _2\) can be reduced from \(O(I_\mu )\) to O(1) (more precisely, to only two bits) in the following way: Instead of having Alice compute the hash values privately and send them to Bob in steps 2 and 3 of the protocol, the players can use their shared randomness to sample \(d=O(I_\mu )\) random hash values \(h_1(b_1),\ldots ,h_d(b_d)\) (where the \(b_i\)’s are random independent strings in \(\mathcal{U}\)), and Alice will only send Bob a single bit indicating whether those hash values match the hashing of her string a (i.e., \(h_i(b_i) = h_i(a)\) for all \(i\in [d]\)). In step 4 Bob will act as before, albeit comparing the hashes of his candidate b to the random hashes \(h_i(b_i)\), and output success (“1”) if the hashes match. Note that this modification incurs an additional loss of \(2^{-d}=2^{-O(I_\mu )}\) in the success probability of the protocol (as this is the probability that \(h_i(b_i) = h_i(a)\) for all \(i\in [d]\)), but since the success probability we are shooting for is already of the order \(2^{-O(I_\mu )}\), we can afford this loss. This modification was discovered in [17].

With Lemma 3.2 in hand, we are now ready to prove Theorem 3.1.

Proof of Theorem 3.1

Let \(\pi \) be a protocol that realizes the value \(I_\mu :={\mathsf {IC}_{\mu }(f,1/10)}\). In other words, \(\pi \) has an error rate of at most 1/10 and information cost of at most \(I_\mu \) with respect to \(\mu \). Denote by \(\pi _{xy}\) the random variable representing the transcript of \(\pi \) given the inputs (x, y), and by \(\pi _x\) (resp. \(\pi _y\)) the transcript conditioned on only the input x (resp. y). We denote by \(\pi _{XY}\) the transcript when (X, Y) are themselves a pair of random variables. By Proposition 2.5, we know that

$$\begin{aligned} I_\mu = I(X; \pi _{XY}|Y)+I(Y; \pi _{XY}|X) = {\mathbf E}_{(x,y)\sim \mu } [\mathbf{D} \left( \pi _{xy} ||\pi _x \right) +\mathbf{D} \left( \pi _{xy} ||\pi _y \right) ]. \end{aligned}$$
(6)
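As a sanity check, the identity (6) can be verified numerically on a toy example, computing the left-hand side via conditional entropies and the right-hand side via KL divergences. The distribution \(\mu\) and the conditional transcript distributions below are arbitrary illustrative choices, not taken from the paper.

```python
from math import log2

# Toy setting: X, Y in {0,1}, transcripts in {0,1,2}.
mu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
pi = {(0, 0): [0.7, 0.2, 0.1], (0, 1): [0.3, 0.3, 0.4],
      (1, 0): [0.2, 0.5, 0.3], (1, 1): [0.1, 0.1, 0.8]}

def kl(p, q):
    """KL divergence D(p || q) in bits."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)

def H(p):
    """Shannon entropy in bits."""
    return -sum(a * log2(a) for a in p if a > 0)

def pi_given_x(x):
    """pi_x: transcript distribution given only x, i.e. E_{y~mu(.|x)} pi_xy."""
    px = mu[(x, 0)] + mu[(x, 1)]
    return [sum(mu[(x, y)] / px * pi[(x, y)][t] for y in (0, 1)) for t in range(3)]

def pi_given_y(y):
    py = mu[(0, y)] + mu[(1, y)]
    return [sum(mu[(x, y)] / py * pi[(x, y)][t] for x in (0, 1)) for t in range(3)]

# LHS via entropies: I(X;T|Y) + I(Y;T|X) = H(T|X) + H(T|Y) - 2*H(T|XY).
H_T_XY = sum(mu[xy] * H(pi[xy]) for xy in mu)
H_T_X = sum((mu[(x, 0)] + mu[(x, 1)]) * H(pi_given_x(x)) for x in (0, 1))
H_T_Y = sum((mu[(0, y)] + mu[(1, y)]) * H(pi_given_y(y)) for y in (0, 1))
lhs = H_T_X + H_T_Y - 2 * H_T_XY

# RHS of (6): E_{(x,y)~mu}[ D(pi_xy || pi_x) + D(pi_xy || pi_y) ].
rhs = sum(mu[(x, y)] * (kl(pi[(x, y)], pi_given_x(x)) + kl(pi[(x, y)], pi_given_y(y)))
          for (x, y) in mu)
```

The two quantities agree exactly (up to floating-point error), since \(\pi_x\) and \(\pi_y\) are precisely the conditional marginals of the transcript.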

Let us now run the sampling algorithm \(\Pi _1\) from Lemma 3.2, with the distribution \(\mu \) of the lemma taken to be \(\pi _{xy}\), the distributions \(\nu _A\) and \(\nu _B\) taken to be \(\pi _x\) and \(\pi _y\) respectively, and I taken to be \(20 I_\mu \).

At each node v of the protocol tree that is owned by player X, let \(p_0(v)\) and \(p_1(v)=1-p_0(v)\) denote the probabilities that the next bit sent by X is 0 and 1, respectively. For nodes owned by player Y, let \(q_0(v)\) and \(q_1(v)=1-q_0(v)\) denote the probabilities that the next bit sent by Y is 0 and 1, respectively, as estimated by player X given the input x. For each leaf \(\ell \), let \(p_X(\ell )\) be the product of the values \(p_b(v)\) over the nodes owned by X along the path from the root to \(\ell \), and let \(q_X(\ell )\) be the product of the values \(q_b(v)\) over the nodes owned by Y along that path. The values \(p_Y(\ell )\) and \(q_Y(\ell )\) are defined similarly, with the roles of the players reversed. For each \(\ell \) we have \({\mathbf P}[\pi _{xy}=\ell ] = p_X(\ell )\cdot p_Y(\ell )\), \({\mathbf P}[\pi _{x} = \ell ] = p_X(\ell )\cdot q_X(\ell )\), and \({\mathbf P}[\pi _{y} = \ell ] = p_Y(\ell )\cdot q_Y(\ell )\). Thus we can apply Lemma 3.2 to obtain the following protocol \(\pi '\) for computing f:

  • If \(\Pi _1\) fails, we return a random unbiased coin flip.

  • If \(\Pi _1\) succeeds, we return the final bit of the transcript sample T. Denote this bit by \(T_{out}\).
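The leaf-probability factorizations used above to apply Lemma 3.2 can be checked on a toy two-round protocol tree, where X speaks first and Y second; all numbers below are illustrative assumptions.

```python
# Toy two-round protocol tree: X sends bit b1, then Y sends bit b2.
mu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p = {0: [0.6, 0.4], 1: [0.3, 0.7]}           # p[x][b1]: X's move given x
q = {0: [[0.8, 0.2], [0.5, 0.5]],            # q[y][b1][b2]: Y's move given y, b1
     1: [[0.1, 0.9], [0.4, 0.6]]}

leaves = [(b1, b2) for b1 in (0, 1) for b2 in (0, 1)]

def pi_xy(x, y, leaf):
    b1, b2 = leaf
    return p[x][b1] * q[y][b1][b2]           # = p_X(leaf) * p_Y(leaf)

def pi_x(x, leaf):
    """Transcript distribution given only x: average pi_xy over y ~ mu(.|x)."""
    px = mu[(x, 0)] + mu[(x, 1)]
    return sum(mu[(x, y)] / px * pi_xy(x, y, leaf) for y in (0, 1))

def q_X(x, leaf):
    """X's estimate of Y's move; before the Y-node only X has spoken,
    so the posterior on y is simply mu(.|x)."""
    b1, b2 = leaf
    px = mu[(x, 0)] + mu[(x, 1)]
    return sum(mu[(x, y)] / px * q[y][b1][b2] for y in (0, 1))

# Check P[pi_x = leaf] = p_X(leaf) * q_X(leaf), and that pi_xy is a distribution.
checks = [abs(pi_x(x, l) - p[x][l[0]] * q_X(x, l)) for x in (0, 1) for l in leaves]
total = sum(pi_xy(0, 1, l) for l in leaves)
```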

To prove the correctness of the protocol, let \(\mathcal {Z}\) denote the event that both \(\mathbf{D} \left( \pi _{xy} ||\pi _x \right) \le 20 I_\mu \) and \(\mathbf{D} \left( \pi _{xy} ||\pi _y \right) \le 20 I_\mu \). By (6) and Markov's inequality, \({\mathbf P}[\mathcal {Z}] \ge 19/20\) (where the probability is taken with respect to \(\mu \)). Denote by \(\delta \) the probability that \(\Pi _1\) succeeds. By the assertions of Lemma 3.2, \(\delta \ge 0.9\cdot 2^{-50(I+1)}\). Furthermore, if \(\Pi _1\) succeeds, then \(|T-\pi _{xy}|\le 2/9\), which in particular implies that \({\mathbf P}[T_{out} = \pi _{out}] \ge 7/9\). Finally, \({\mathbf P}[ \pi _{out} = f(x,y)] \ge 9/10\), since \(\pi \) has error at most 1/10 with respect to \(\mu \). Now let \(\mathcal{W}\) denote the indicator variable whose value is 1 iff \(\pi '(x,y) = f(x,y)\). Putting the above together,

$$\begin{aligned} {\mathbf E}[\mathcal{W}\; | \; \mathcal {Z}] \ge (1-\delta )\cdot \frac{1}{2} + \delta \cdot \left( \frac{7}{9}-\frac{1}{10}\right) > \frac{1}{2} + \delta \cdot \frac{1}{6} > \frac{1}{2} + \frac{1}{8}\cdot 2^{-50(I+1)}.\quad \end{aligned}$$
(7)
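The rational arithmetic behind (7) can be checked exactly, as a quick sanity check:

```python
from fractions import Fraction as F

# If Pi_1 succeeds, T is 2/9-close to pi_xy and pi errs with probability at
# most 1/10, so the output is correct with probability at least
# 7/9 - 1/10 = 61/90.
good = F(7, 9) - F(1, 10)

# E[W|Z] = (1-delta)/2 + delta*good = 1/2 + delta*(good - 1/2),
# and good - 1/2 = 16/90, which exceeds 1/6 = 15/90.
advantage = good - F(1, 2)

# With delta >= (9/10)*2^{-50(I+1)}: delta/6 >= (3/20)*t > (1/8)*t,
# where t = 2^{-50(I+1)}.
delta_coeff = F(9, 10) * F(1, 6)
```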

On the other hand, note that by Lemma 3.2 the probability that \(\Pi _1\) succeeds is at most \(2^{-50(I+1)}\) (no matter how large \(\mathbf{D} \left( \pi _{xy} ||\pi _x \right) \) and \(\mathbf{D} \left( \pi _{xy} ||\pi _y \right) \) are!), and so \( {\mathbf E}[\mathcal{W}\; | \; \lnot \mathcal {Z}] \ge (1-2^{-50(I+1)})/2. \)

Hence we conclude that

$$\begin{aligned} {\mathbf E}[\mathcal{W}] &= {\mathbf E}[\mathcal{W}\; | \; \mathcal {Z}]\cdot {\mathbf P}[\mathcal {Z}] + {\mathbf E}[\mathcal{W}\; | \; \lnot \mathcal {Z}]\cdot {\mathbf P}[\lnot \mathcal {Z}] \\ &\ge \left( \frac{1}{2} + \frac{1}{8}\cdot 2^{-50(I+1)}\right) \cdot \frac{19}{20} + \left( 1-2^{-50(I+1)}\right) \cdot \frac{1}{2}\cdot \frac{1}{20} \\ &\ge \frac{1}{2} + \frac{1}{12}\cdot 2^{-50(I+1)} > \frac{1}{2} + \frac{1}{12}\cdot 2^{-1000(I_\mu +1)}. \end{aligned}$$
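A quick sanity check of the last display, writing \(t = 2^{-50(I+1)}\) and expanding symbolically with exact rationals:

```python
from fractions import Fraction as F

# Expand (1/2 + t/8)*(19/20) + (1 - t)*(1/2)*(1/20) in t = 2^{-50(I+1)}.
const = F(1, 2) * F(19, 20) + F(1, 2) * F(1, 20)   # t-free terms: 19/40 + 1/40
coeff = F(1, 8) * F(19, 20) - F(1, 2) * F(1, 20)   # coefficient of t

# The constant part is exactly 1/2 and the t-coefficient is 3/32 > 1/12.
# Finally, since I = 20*I_mu, 50*(I+1) = 1000*I_mu + 50 < 1000*(I_mu + 1),
# so t = 2^{-50(I+1)} > 2^{-1000(I_mu+1)}, giving the strict final inequality.
```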

Finally, Lemma 3.2 asserts that \(|\Pi _1 - \Pi _2| < 2^{-59 I}\). Thus if we replace \(\Pi _1\) by \(\Pi _2\) in the execution of protocol \(\pi '\), the success probability decreases by at most \(2^{-59 I}\ll \frac{1}{12}\cdot 2^{-50(I+1)}\). Furthermore, the amount of communication used by \(\pi '\) is now

$$\begin{aligned} 211 I = 4220 I_\mu = O(I_\mu ).\end{aligned}$$

Hence we conclude that with this modification, \(\pi '\) has the following properties:

  • \(\mathsf {CC}(\pi ') = 4220\cdot I_\mu \);

  • \({\mathbf P}_{(x,y)\sim \mu }[\pi '(x,y) = f(x,y)] \ge 1/2 + 2^{-1000(I_\mu +1)-4}\);

which completes the proof. \(\square \)

Remark 3.7

Using similar techniques, Braverman [5] previously showed that any function f whose information complexity is I has communication cost at most \(2^{O(I)}\), thus implying that \(IC(f)\ge \Omega (\log (CC(f)))\). We note that this result can be easily derived (up to constant factors) from Theorem 3.1. Indeed, running the “compressed” protocol \(2^{O(I)}\log (1/\epsilon )\) times independently and taking a majority vote guarantees an error of at most \(\epsilon \) (by a standard Chernoff bound), with communication \(O(I)\cdot 2^{O(I)}\log (1/\epsilon ) = 2^{O(I)}\log (1/\epsilon )\). Thus, our result is strictly stronger than the former one.
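The majority-vote amplification can be sketched numerically. The Hoeffding-based repetition count below is a standard derivation (majority of n trials with per-trial advantage \(\gamma\) errs with probability at most \(e^{-2n\gamma^2}\)); the concrete parameters are illustrative assumptions. With \(\gamma = 2^{-O(I)}\) this gives \(n = 2^{O(I)}\log(1/\epsilon)\) repetitions, as claimed.

```python
from math import comb, ceil, log

def majority_success(n, p):
    """Exact probability that the majority of n independent trials (each
    correct with probability p, n odd) is correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def repetitions(gamma, eps):
    """Hoeffding bound: the majority errs with probability at most
    exp(-2*n*gamma^2); solve exp(-2*n*gamma^2) <= eps for n, made odd."""
    n = ceil(log(1 / eps) / (2 * gamma ** 2))
    return n if n % 2 == 1 else n + 1
```

For example, a per-trial advantage of 0.1 and target error 0.05 require 151 repetitions, and the exact binomial computation confirms the amplified success probability exceeds 0.95.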