1 Introduction

The past few years have seen a wave of commercially deployed systems [17, 29] for analysis of users’ sensitive data in the local model of differential privacy (LDP). LDP systems have several features that make them attractive in practice and limit the barriers to adoption. Each user sends only already-privatized data to the data collector, so users do not need to fully trust the collector, and the collector is not saddled with legal or ethical obligations to safeguard raw data. Moreover, these protocols are relatively simple and scalable, typically requiring each party to asynchronously send just a single short message.

However, the local model imposes strong constraints on the utility of the algorithm. These constraints preclude the most useful differentially private algorithms, which require a central model where the users’ data is sent in the clear, and the data collector is trusted to perform only differentially private computations. Compared to the central model, the local model requires enormous amounts of data, both in theory and in practice (see e.g. [20] and the discussion in [5]). Unsurprisingly, the local model has so far only been used by large corporations like Apple and Google with billions of users.

In principle, there is no dilemma between the central and local models, as any algorithm can be implemented without a trusted data collector using cryptographic multiparty computation (MPC). However, despite dramatic recent progress in the area of practical MPC, existing techniques still impose large costs in computation, communication, and rounds of interaction between the users and the data collector, and are considerably more difficult for companies to extend and maintain.

In this work, we initiate the analytic study of an intermediate model for distributed differential privacy called the shuffled model. This model, a special case of the ESA framework of [5], augments the standard model of local differential privacy with an anonymous channel (also called a shuffler) that collects messages from the users, randomly permutes them, and then forwards them to the data collector for analysis. For certain applications, this model overcomes the limitations on accuracy of local algorithms while preserving many of their desirable features. However, under natural constraints, this model is dramatically weaker than the central model. In more detail, we make two primary contributions:

  • We give a simple, non-interactive algorithm in the shuffled model for estimating a single Boolean-valued statistical query (also known as a counting query) that essentially matches the error achievable by centralized algorithms. We also show how to extend this algorithm to estimate a bounded real-valued statistical query, albeit at an additional cost in communication. These protocols are sufficient to implement any algorithm in the statistical queries model [22], which includes methods such as gradient descent.

  • We consider the ubiquitous variable-selection problem—a simple but canonical optimization problem. Given a set of counting queries, the variable-selection problem is to identify the query with nearly largest value (i.e. an “approximate argmax”). We prove that the sample complexity of variable selection in a natural restriction of the shuffled model is exponentially larger than in the central model. The restriction is that each user sends only a single message into the shuffle, rather than a set of messages; we call this the one-message shuffled model. Our positive results show that the sample complexity in the shuffled model is polynomially smaller than in the local model. Taken together, our results give evidence that the central, shuffled, and local models are strictly ordered in the accuracy they can achieve for selection. Our lower bounds follow from a structural result showing that any algorithm that is private in the one-message shuffled model is also private in the local model with weak, but non-trivial, parameters.

In concurrent and independent work, Erlingsson et al. [16] give conceptually similar positive results for local protocols aided by a shuffler. We give a more detailed comparison between our work and theirs after giving a thorough description of the model and our results (Sect. 2.3).

1.1 Background and Related Work

Models for Differentially Private Algorithms. Differential privacy [14] is a restriction on the algorithm that processes a dataset to provide statistical summaries or other output. It ensures that, no matter what an attacker learns by interacting with the algorithm, it would have learned nearly the same thing whether or not the dataset contained any particular individual’s data [21]. Differential privacy is now widely studied, and algorithms satisfying the criterion are increasingly deployed [1, 17, 24].

There are two well-studied models for implementing differentially-private algorithms. In the central model, raw data are collected at a central server where they are processed by a differentially private algorithm. In the local model [14, 18, 33], each individual applies a differentially private algorithm locally to their data and shares only the output of the algorithm—called a report or response—with a server that aggregates users’ reports. The local model allows individuals to retain control of their data since privacy guarantees are enforced directly by their devices. It avoids the need for a single, widely-trusted entity and the resulting single point of security failure. The local model has witnessed an explosion of research in recent years, ranging from theoretical work to deployed implementations. A complete survey is beyond the scope of this paper.

Unfortunately, for most tasks there is a large, unavoidable gap between the accuracy that is achievable in the two models. [4] and [8] show that estimating the sum of bits, one held by each player, requires error \(\varOmega (\sqrt{n}/\varepsilon )\) in the local model, while an error of just \(O(1/\varepsilon )\) is possible in the central model. [12] extended this lower bound to a wide range of natural problems, showing that the error must blow up by at least \(\varOmega (\sqrt{n})\), and often by an additional factor growing with the data dimension. More abstractly, [20] showed that the power of the local model is equivalent to the statistical query model [22] from learning theory. They used this to show an exponential separation between the accuracy and sample complexity of local and central algorithms. Subsequently, an even more natural separation arose for the variable-selection problem [12, 31], which we also consider in this work.

Implementing Central-Model Algorithms in Distributed Models. In principle, one could also use the powerful, general tools of modern cryptography, such as multiparty computation (MPC), or secure function evaluation, to simulate central model algorithms in a setting without a trusted server [13], but such algorithms currently impose bandwidth and liveness constraints that make them impractical for large deployments. In contrast, Google [17] now uses local differentially private protocols to collect certain usage statistics from hundreds of millions of users’ devices.

A number of specific, efficient MPC algorithms have been proposed for differentially private functionalities. They generally either (1) focus on simple summations and require a single “semi-honest”/“honest-but-curious” server that aggregates user answers, as in [6, 9, 26]; or (2) allow general computations, but require a network of servers, a majority of whom are assumed to behave honestly, as in [11]. As they currently stand, these approaches have a number of drawbacks: they either require users to trust that a server maintained by a service provider is behaving (semi-)honestly, or they require that a coalition of service providers collaborate to run protocols that reveal to each other who their users are and what computations they are performing on their users’ data. It is possible to avoid these issues by combining anonymous communication layers and MPC protocols for universal circuits but, with current techniques, such modifications destroy the efficiency gains relative to generic MPC.

Thus, a natural question—relevant no matter how the state of the art in MPC evolves—is to identify simple (and even minimal) primitives that can be implemented via MPC in a distributed model and are expressive enough to allow for sophisticated private data analysis. In this paper, we show that shuffling is a powerful primitive for differentially private algorithms.

Mixnets. One way to realize the shuffling functionality is via a mixnet. A mix network, or mixnet, is a protocol involving several computers that takes as input a sequence of encrypted messages, and outputs a uniformly random permutation of those messages’ plaintexts. Introduced by [10], the basic idea now exists in many variations. In its simplest instantiation, the network consists of a sequence of servers, whose identities and ordering are public information. Messages, each one encrypted with all the servers’ keys, are submitted by users to the first server. Once enough messages have been submitted, each server in turn performs a shuffle in which the server removes one layer of encryption and sends a permutation of the messages to the next server. In a verifiable shuffle, the server also produces a cryptographic proof that the shuffle preserved the multi-set of messages. The final server sends the messages to their final recipients, which might be different for each message. A variety of efficient implementations of mixnets with verifiable shuffles exist (see, e.g., [5, 23] and citations therein).

Another line of work [19, 30] shows how to use differential privacy in addition to mixnets to make communication patterns differentially private for the purposes of anonymous computation. Despite the superficial similarity, this line of work is orthogonal to ours, which is about how to use mixnets themselves to achieve (more accurate) differentially private data analysis.


Shufflers as a Primitive for Private Data Analysis. This paper studies how to use a shuffler (e.g. a mixnet) as a cryptographic primitive to implement differentially-private algorithms. Bittau et al. [5] propose a general framework, dubbed encode-shuffle-analyze (or ESA), which generalizes the local and central models by allowing a local randomized encoding step E performed on user devices, a permutation step S in which encrypted encodings are shuffled, and a final randomized process A that analyzes the permuted encodings. We ask what privacy guarantee can be provided if we rely only on the local encoding E and the shuffle S—the analyst A is untrusted. In particular, we are interested in protocols that are substantially more accurate than is possible in the local model (in which the privacy guarantee relies entirely on the encoding E). This general question was left open by [5].

One may think of the shuffled model as specifying a highly restricted MPC primitive on which we hope to base privacy. Relative to general MPC, the use of mixnets for shuffling provides several advantages: First, there already exist a number of highly efficient implementations. Second, their trust model is simple and robust—as long as a single one of the servers performs its shuffle honestly, the entire process is a uniformly random permutation, and our protocols’ privacy guarantees will hold. The architecture and trust guarantees are also easy to explain to nonexperts (say, with metaphors of shuffled cards or shell games). Finally, mixnets automatically provide a number of additional features that are desirable for data collection: they can maintain secrecy of a company’s user base, since each company’s users could use that company’s server as their first hop; and they can maintain secrecy of the company’s computations, since the specific computation is done by the analyst. Note that we think of a mixnet here as operating on large batches of messages, whose size is denoted by n. (In implementation, this requires a fair amount of latency, as the collection point must receive sufficiently many messages before proceeding—see Bittau et al. [5]).

Understanding the possibilities and limitations of shuffled protocols for private data analysis is interesting from both theoretical and practical perspectives. It provides an intermediate abstraction, and we give evidence that it lies strictly between the central and local models. Thus, it sheds light on the minimal cryptographic primitives needed to get the central model’s accuracy. It also provides an attractive platform for near-term deployment [5], for the reasons listed above.

For the remainder of this paper, we treat the shuffler as an abstract service that randomly permutes a set of messages. We leave a discussion of the many engineering, social, and cryptographic implementation considerations to future work.

2 Overview of Results

The Shuffled Model. In our model, there are n users, each with data \(x_i \in \mathcal {X}\). Each user applies some encoder \(R :\mathcal {X}\rightarrow \mathcal {Y}^m\) to their data and sends the messages \((y_{i,1},\dots ,y_{i,m}) = R(x_i)\). In the one-message shuffled model, each user sends \(m=1\) message. The \(n \cdot m\) messages \(y_{i,j}\) are sent to a shuffler \(S:\mathcal {Y}^* \rightarrow \mathcal {Y}^*\) that takes these messages and outputs them in a uniformly random order. The shuffled set of messages is then passed through some analyzer \(A :\mathcal {Y}^* \rightarrow \mathcal {Z}\) to estimate some function \(f(x_1,\dots ,x_n)\). Thus, the protocol P consists of the tuple \((R, S, A)\). We say that the algorithm is \((\varepsilon ,\delta )\)-differentially private in the shuffled model if the algorithm \( M_{R}(x_1,\dots ,x_n) = S( \cup _{i=1}^{n} R(x_i) ) \) satisfies \((\varepsilon ,\delta )\)-differential privacy. For more detail, see the discussion leading to Definition 8.
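To make the pipeline concrete, here is a minimal sketch in Python; the function and type names are ours, and the encoder and analyzer are placeholders to be instantiated by a specific protocol.

```python
import random
from typing import Callable, List, TypeVar

X = TypeVar("X")  # a single user's input
Y = TypeVar("Y")  # a single message
Z = TypeVar("Z")  # the analyzer's output

def run_shuffled_protocol(
    data: List[X],
    R: Callable[[X], List[Y]],  # randomized encoder: one input -> m messages
    A: Callable[[List[Y]], Z],  # analyzer: shuffled messages -> estimate
) -> Z:
    # Each user encodes their own input independently.
    messages = [y for x in data for y in R(x)]
    # The shuffler outputs all n*m messages in a uniformly random order.
    # Privacy must hold for this shuffled list, i.e. for everything the
    # (untrusted) analyzer ever sees.
    random.shuffle(messages)
    return A(messages)
```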

In contrast to the local model, differential privacy is now a property of all n users’ messages, and the \((\varepsilon ,\delta )\) may be functions of n. However, if an adversary were to inject additional messages, then it would not degrade privacy, provided that those messages are independent of the honest users’ data. Thus, we may replace n, in our results, with a lower bound on the number of honest users in the system. For example, if we have a protocol that is private for n users, but instead run it with \(\frac{n}{p}\) users of whom we assume at least a p fraction are honest, the protocol will continue to satisfy differential privacy.

2.1 Algorithmic Results

Our main result shows how to estimate any bounded, real-valued linear statistic (a statistical query) in the shuffled model with error that nearly matches the best possible utility achievable in the central model.

Theorem 1

For every \(\varepsilon \in (0,1)\), and every \(\delta \gtrsim \varepsilon n 2^{-\varepsilon n}\) and every function \(f :\mathcal {X}\rightarrow [0,1]\), there is a protocol P in the shuffled model that is \((\varepsilon ,\delta )\)-differentially private, and for every n and every \(X = (x_1,\dots ,x_n) \in \mathcal {X}^n\),

$$ { \mathbb {E} \left[ \left| P(X) - \sum _{i=1}^{n} f(x_i) \right| \right] } = O\left( \frac{1}{\varepsilon } \log \frac{n}{\delta }\right) . $$

Each user sends \(m = \varTheta (\varepsilon \sqrt{n})\) one-bit messages.

For comparison, in the central model, the Laplace mechanism achieves \((\varepsilon ,0)\)-differential privacy and error \(O(\frac{1}{\varepsilon })\). In contrast, error \(\varOmega (\frac{1}{\varepsilon }\sqrt{n})\) is necessary in the local model. Thus, for answering statistical queries, this protocol essentially has the best properties of the local and central models (up to logarithmic factors).
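For intuition, the central-model baseline amounts to one line of noise addition; a sketch (function name ours):

```python
import numpy as np

def central_counting_query(x, f, eps: float) -> float:
    # The sum of f(x_i) with f ranging in [0,1] has sensitivity 1, so
    # Laplace noise of scale 1/eps gives (eps, 0)-DP and O(1/eps) error.
    return sum(f(xi) for xi in x) + np.random.laplace(scale=1.0 / eps)
```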

In the special case of estimating a sum of bits (or a Boolean-valued linear statistic), our protocol has a slightly nicer guarantee and form.

Theorem 2

For every \(\varepsilon \in (0,1)\), and every \(\delta \gtrsim 2^{-\varepsilon n}\) and every function \(f :\mathcal {X}\rightarrow \{0,1\}\), there is a protocol P in the shuffled model that is \((\varepsilon ,\delta )\)-differentially private, and for every n and every \(X = (x_1,\dots ,x_n) \in \mathcal {X}^n\),

$$ { \mathbb {E} \left[ \left| P(X) - \sum _{i=1}^{n} f(x_i) \right| \right] } = O\left( \frac{1}{\varepsilon } \sqrt{\log \frac{1}{\delta }} \right) . $$

Each user sends a single one-bit message.

The protocol corresponding to Theorem 2 is extremely simple (a code sketch follows the description):

  1.

    For some appropriate choice of \(p \in (0,1)\), each user i with input \(x_i\) outputs \(y_i = x_i\) with probability \(1-p\) and a uniformly random bit \(y_i\) with probability p. When \(\varepsilon \) is not too small, \(p \approx \frac{\log (1/\delta )}{\varepsilon ^2 n}\).

  2.

    The analyzer collects the shuffled messages \(y_1,\dots ,y_n\) and outputs

    $$\frac{1}{1-p} \left( \sum _{i=1}^{n} y_i - \tfrac{pn}{2}\right) \, .$$
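A minimal end-to-end sketch of this protocol, with p set heuristically as above (the exact constants come from the analysis in Sect. 4; function names are ours):

```python
import math
import random

def encode(x: int, p: float) -> int:
    # Report the true bit w.p. 1-p; otherwise report a fresh uniform bit.
    return x if random.random() > p else random.randint(0, 1)

def analyze(ys, p: float) -> float:
    # E[sum y_i] = (1-p) * sum x_i + p*n/2, so this estimator is unbiased.
    n = len(ys)
    return (sum(ys) - p * n / 2) / (1 - p)

def bit_sum_protocol(xs, eps: float, delta: float) -> float:
    n = len(xs)
    p = min(0.5, math.log(1 / delta) / (eps ** 2 * n))  # heuristic choice
    ys = [encode(x, p) for x in xs]
    random.shuffle(ys)  # the shuffler; the analyzer sees only this list
    return analyze(ys, p)
```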

Intuition. In the local model, an adversary can map the set of observations \(\{y_1,\dots , y_n\}\) to users. Thus, to achieve \(\varepsilon \)-differential privacy, the parameter p should be set close to \(\frac{1}{2}\). In our model, the attacker sees only the anonymized set of observations \(\{y_1,\dots ,y_n\}\), whose distribution can be simulated using only \(\sum _i y_i\). Hence, to ensure that the protocol is differentially private, it suffices to ensure that \(\sum _i y_i\) is private, which we show holds for \(p \approx \frac{\log (1/\delta )}{\varepsilon ^2 n} \ll \frac{1}{2}\).

Communication Complexity. Our protocol for real-valued queries requires \(\varTheta (\varepsilon \sqrt{n})\) bits per user. In contrast, the local model requires just a single bit, but incurs error \(\varOmega (\frac{1}{\varepsilon } \sqrt{n})\). A generalization of Theorem 1 gives error \(O(\frac{\sqrt{n}}{r} + \frac{1}{\varepsilon } \log \frac{r}{\delta })\) and sends r bits per user, but we do not know if this tradeoff is necessary. Closing this gap is an interesting open question.

2.2 Negative Results

We also prove negative results for algorithms in the one-message shuffled model. These results hinge on a structural characterization of private protocols in the one-message shuffled model.

Theorem 3

If a protocol \(P = (R,S,A)\) satisfies \((\varepsilon ,\delta )\)-differential privacy in the one-message shuffled model, then R satisfies \(( \varepsilon + \ln n, \delta )\)-differential privacy. Therefore, P is \((\varepsilon + \ln n, \delta )\)-differentially private in the local model.

Using Theorem 3 (and a transformation of [7] from \((\varepsilon ,\delta )\)-DP to \((O(\varepsilon ),0)\)-DP in the local model), we can leverage existing lower bounds for algorithms in the local model to obtain lower bounds on algorithms in the shuffled model.

Variable Selection. In particular, consider the following variable selection problem: given a dataset \(x \in \{0,1\}^{n \times d}\), output \(\widehat{J}\) such that

$$ \sum _{i=1}^{n} x_{i,\widehat{J}} \ge \left( \max _{j \in [d]} \ \ \sum _{i=1}^{n} x_{i,j}\right) - \frac{n}{10} \, . $$

(The \(\frac{n}{10}\) approximation term is somewhat arbitrary—any sufficiently small constant fraction of n will lead to the same lower bounds and separations.)

Any local algorithm (with \(\varepsilon = 1\)) for selection requires \(n = \varOmega (d \log d)\), whereas in the central model the exponential mechanism [25] solves this problem for \(n = O(\log d)\); a sketch of that baseline follows. The lower bound below shows that for this ubiquitous problem, the one-message shuffled model cannot match the central model.
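A standard instantiation of the exponential mechanism with the column sums as scores (a generic sketch, not code from this paper):

```python
import numpy as np

def exp_mech_select(x: np.ndarray, eps: float) -> int:
    # x is an n-by-d binary matrix; each column sum is a counting query
    # of sensitivity 1, so sampling j with probability proportional to
    # exp(eps * score_j / 2) is (eps, 0)-DP and errs by O(log(d)/eps).
    scores = x.sum(axis=0)
    logits = eps * scores / 2.0
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return int(np.random.choice(x.shape[1], p=probs / probs.sum()))
```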

Theorem 4

If P is a \((1,\frac{1}{n^{10}})\)-differentially private protocol in the one-message shuffled model that solves the selection problem (with high probability) then \(n = \varOmega (d^{1/17})\). Moreover this lower bound holds even if x is drawn iid from a product distribution over \(\{0,1\}^{d}\).

In Sect. 6, we also prove lower bounds for the well studied histogram problem, showing that any one-message shuffled-model protocol for this problem must have error growing (polylogarithmically) with the size of the data domain. In contrast, in the central model it is possible to release histograms with no dependence on the domain size, even for infinite domains.

We remark that our lower bound proofs do not apply if the algorithm sends multiple messages through the shuffler. However, we do not know whether beating the bounds is actually possible. Applying our bit-sum protocol d times (together with differential privacy’s composition property) shows that \(n = \tilde{O}(\sqrt{d})\) samples suffice in the general shuffled model. We also do not know if this bound can be improved. We leave it as an interesting direction for future work to fully characterize the power of the shuffled model.

2.3 Comparison to [16]

In concurrent and independent work, Erlingsson et al. [16] give conceptually similar positive results for local protocols aided by a shuffler. Specifically, they prove a general amplification result: adding a shuffler to any protocol satisfying local differential privacy improves the privacy parameters, often quite significantly. This amplification result can be seen as a partial converse to our transformation from shuffled protocols to local protocols (Theorem 3).

Their result applies to any local protocol, whereas our protocol for bit-sums (Theorem 2) applies specifically to the one-bit randomized response protocol. However, when specialized to randomized response, their result is quantitatively weaker than ours. As stated, their results apply only to local protocols that satisfy \(\varepsilon \)-differential privacy for \(\varepsilon < 1\). In contrast, the proof of Theorem 2 shows that, for randomized response, local differential privacy \(\varepsilon \approx \ln (n)\) can be amplified to \(\varepsilon ' = 1\). Our best attempt at generalizing their proof to the case \(\varepsilon \gg 1\) gives no such amplification: applied to randomized response, it yields a shuffled protocol that is 1-differentially private with error \(\varTheta (n^{5/12})\), only slightly better than the error \(O(\sqrt{n})\) achievable without a shuffler.

3 Model and Preliminaries

In this section, we define terms and notation used throughout the paper. We use \(\mathrm {Ber}(p)\) to denote the Bernoulli distribution over \(\{0,1\}\), which has value 1 with probability p and 0 with probability \(1-p\). We will use \(\mathrm {Bin}(n,p)\) to denote the binomial distribution (i.e. the sum of n independent samples from \(\mathrm {Ber}(p)\)).

3.1 Differential Privacy

Let \(X \in \mathcal {X}^n\) be a dataset consisting of elements from some universe \(\mathcal {X}\). We say two datasets \(X,X'\) are neighboring if they differ on at most one user’s data, and denote this \(X \sim X'\).

Definition 5

(Differential Privacy [14]). An algorithm \(M :\mathcal {X}^* \rightarrow \mathcal {Z}\) is \((\varepsilon ,\delta )\)-differentially private if for every \(X \sim X' \in \mathcal {X}^*\) and every \(T \subseteq \mathcal {Z}\)

$$ { \mathbb {P} \left[ M(X) \in T\right] } \le e^{\varepsilon } { \mathbb {P} \left[ M(X') \in T\right] } + \delta , $$

where the probability is taken over the randomness of M.

Differential privacy satisfies two extremely useful properties:

Lemma 6

(Post-Processing [14]). If M is \((\varepsilon ,\delta )\)-differentially private, then for every A, \(A \circ M\) is \((\varepsilon ,\delta )\)-differentially private.

Lemma 7

(Composition [14, 15]). If \(M_1,\dots ,M_T\) are \((\varepsilon ,\delta )\)-differentially private, then the composed algorithm

$$ \widetilde{M}(X) = (M_1(X),\dots ,M_T(X)) $$

is \((\varepsilon ', \delta ' + T\delta )\)-differentially private for every \(\delta ' > 0\) and \(\varepsilon ' = \varepsilon (e^\varepsilon -1)T + \varepsilon \sqrt{2T \log (1/\delta ')}\).
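A direct transcription of this bound as a helper function (names ours):

```python
import math

def advanced_composition(eps: float, delta: float, T: int, delta_prime: float):
    # Returns the composed guarantee (eps', delta' + T*delta) of Lemma 7.
    eps_prime = eps * (math.exp(eps) - 1) * T \
        + eps * math.sqrt(2 * T * math.log(1 / delta_prime))
    return eps_prime, delta_prime + T * delta
```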

3.2 Differential Privacy in the Shuffled Model

In our model, there are n users, each of whom holds data \(x_i \in \mathcal {X}\). We will use \(X = (x_1,\dots ,x_n) \in \mathcal {X}^n\) to denote the dataset of all n users’ data. We say two datasets \(X,X'\) are neighboring if they differ on at most one user’s data, and denote this \(X \sim X'\).

The protocols we consider consist of three algorithms:

  • \(R :\mathcal {X}\rightarrow \mathcal {Y}^m\) is a randomized encoder that takes as input a single user’s data \(x_i\) and outputs a set of m messages \(y_{i,1},\dots ,y_{i,m} \in \mathcal {Y}\). If \(m=1\), then P is in the one-message shuffled model.

  • \(S:\mathcal {Y}^* \rightarrow \mathcal {Y}^*\) is a shuffler that takes a set of messages and outputs these messages in a uniformly random order. Specifically, on input \(y_1,\dots ,y_N\), \(S\) chooses a uniformly random permutation \(\pi :[N] \rightarrow [N]\) and outputs \(y_{\pi (1)},\dots ,y_{\pi (N)}\).

  • \(A :\mathcal {Y}^* \rightarrow \mathcal {Z}\) is some analysis function or analyzer that takes a set of messages \(y_1,\dots ,y_N\) and attempts to estimate some function \(f(x_1,\dots ,x_n)\) from these messages.

We denote the overall protocol \(P = (R,S,A)\). The mechanism by which we achieve privacy is

$$\varPi _{R}(x_1,\dots ,x_n) = S(\cup _{i=1}^n R(x_i)) = S(y_{1,1},\dots ,y_{n,m}),$$

where both R and \(S\) are randomized. We will use \(P(X) = A \circ \varPi _{R}(X)\) to denote the output of the protocol. However, by the post-processing property of differential privacy (Lemma 6), it will suffice to consider the privacy of \(\varPi _{R}(X)\), which will imply the privacy of P(X). We are now ready to define differential privacy for protocols in the shuffled model.

Definition 8

(Differential Privacy in the Shuffled Model). A protocol \(P = (R,S,A)\) is \((\varepsilon ,\delta )\)-differentially private if the algorithm \(\varPi _{R}(x_1,\dots ,x_n) = S(R(x_1),\dots ,R(x_n))\) is \((\varepsilon ,\delta )\)-differentially private (Definition 5).

In this model, privacy is a property of the entire set of users’ messages and of the shuffler, and thus \(\varepsilon ,\delta \) may depend on the number of users n. When we wish to refer to P or \(\varPi \) with a specific number of users n, we will denote this by \(P_{n}\) or \(\varPi _{n}\).

We remark that if an adversary were to inject additional messages, then it would not degrade privacy, provided that those messages are independent of the honest users’ data. Thus, we may replace n, in our results, with an assumed lower bound on the number of honest users in the system.

In some of our results it will be useful to have a generic notion of accuracy for a protocol P.

Definition 9

(Accuracy of Distributed Protocols). Protocol \(P=(R,S,A)\) is \((\alpha ,\beta )\)-accurate for the function \(f:\mathcal {X}^* \rightarrow \mathcal {Z}\) if, for every \(X \in \mathcal {X}^*\), we have \({ \mathbb {P} \left[ d(P(X),f(X)) \le \alpha \right] } \ge 1-\beta \) where \(d :\mathcal {Z}\times \mathcal {Z}\rightarrow \mathbb {R}\) is some application-dependent distance measure.

As with the privacy guarantees, the accuracy of the protocol may depend on the number of users n, and we will use \(P_n\) when we want to refer to the protocol with a specific number of users.

Composition of Differential Privacy. We will use the following useful composition property for protocols in the shuffled model, which is an immediate consequence of Lemma 7 and the post-processing Lemma 6. This lemma allows us to directly compose protocols in the shuffled model while only using the shuffler once, rather than using the shuffler independently for each protocol being composed.

Lemma 10

(Composition of Protocols in the Shuffled Model). If \(\varPi _1 = (R_1,S),\dots ,\varPi _{T} = (R_{T},S)\) for \(R_t :\mathcal {X}\rightarrow \mathcal {Y}^m\) are each \((\varepsilon ,\delta )\)-differentially private in the shuffled model, and \(\widetilde{R} :\mathcal {X}\rightarrow \mathcal {Y}^{mT}\) is defined as

$$ \widetilde{R}(x_i) = (R_{1}(x_i),\dots ,R_{T}(x_i)) $$

then, for every \(\delta ' > 0\), the composed protocol \(\widetilde{\varPi } = (\widetilde{R}, S)\) is \((\varepsilon ', \delta ' + T\delta )\)-differentially private in the shuffled model for \(\varepsilon ' = \varepsilon (e^\varepsilon -1)T + \varepsilon \sqrt{2T \log (1/\delta ')}\).

Local Differential Privacy. If the shuffler \(S\) were replaced with the identity function (i.e. if it did not randomly permute the messages) then we would be left with exactly the local model of differential privacy. That is, a locally differentially private protocol is a pair of algorithms \(P = (R,A)\), and the output of the protocol is \(P(X) = A(R(x_1),\dots ,R(x_n))\). A protocol P is differentially private in the local model if and only if the algorithm R is differentially private. In Sect. 6 we will see that if \(P = (R,S,A)\) is a differentially private protocol in the one-message shuffled model, then R itself must satisfy local differential privacy for non-trivial \((\varepsilon ,\delta )\), and thus \((R,A \circ S)\) is a differentially private local protocol for the same problem.

4 A Protocol for Boolean Sums

In this section we describe and analyze a protocol for computing a sum of \(\{0,1\}\) bits, establishing Theorem 2 in the introduction.

4.1 The Protocol

In our model, the data domain is \(\mathcal {X}= \{0,1\}\) and the function being computed is \(f(x_1,\dots ,x_n) = \sum _{i=1}^{n} x_i\). Our protocol, \(P_\lambda \), is specified by a parameter \(\lambda \in [0,n]\) that allows us to trade off the level of privacy and accuracy. Note that \(\lambda \) may be a function of the number of users n. We will discuss in Sect. 4.3 how to set this parameter to achieve a desired level of privacy. For intuition, one may wish to think of the parameter \(\lambda \approx \frac{1}{\varepsilon ^2}\) when \(\varepsilon \) is not too small.

The basic outline of \(P_\lambda \) is as follows. Roughly, a random set of \(\lambda \) users will choose \(y_i\) randomly, and the remaining \(n - \lambda \) will choose \(y_i\) to be their input bit \(x_i\). The output of each user is the single message \(y_i\). The outputs are then shuffled and the output of the protocol is the sum \(\sum _{i=1}^{n} y_i\), shifted and scaled so that it is an unbiased estimator of \(\sum _{i=1}^{n} x_i\).

The protocol is described in Algorithm 1. The full name of this protocol is \(P^{0/1}_\lambda \), where the superscript serves to distinguish it from the real-sum protocol \(P^\mathbb {R}_{\lambda ,r}\) (Sect. 5). Since the context is clear in this section, we drop the superscript. Because the analysis of both the privacy and the accuracy of the algorithm will depend on the number of users n, we will use \(P_{n,\lambda }, R_{n,\lambda }, A_{n,\lambda }\) to denote the protocol and its components in the case where the number of users is n.

[Algorithm 1: the bit-sum protocol \(P^{0/1}_{\lambda } = (R_{n,\lambda }, S, A_{n,\lambda })\)]
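The following Python sketch reflects our reading of the description above (a sketch, not the original box; \(b_i\) is the coin indicating that user i’s bit is replaced, matching the notation in the proofs below):

```python
import random

def R(x: int, n: int, lam: float) -> int:
    # b_i = 1 with probability lam/n: replace the input by a uniform coin.
    if random.random() < lam / n:
        return random.randint(0, 1)
    return x  # b_i = 0: report the true bit

def A(ys, n: int, lam: float) -> float:
    # E[sum y_i] = (1 - lam/n) * sum x_i + lam/2; shift and rescale so the
    # output is an unbiased estimate of sum x_i.
    return (sum(ys) - lam / 2) * (n / (n - lam))
```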

4.2 Privacy Analysis

In this section we will prove that \(P_{\lambda }\) satisfies \((\varepsilon ,\delta )\)-differential privacy. Note that if \(\lambda = n\) then each user’s output is independent of their input, so the protocol trivially satisfies (0, 0)-differential privacy; our goal is thus to prove an upper bound on the parameter \(\lambda \) that suffices to achieve a given \((\varepsilon ,\delta )\).

Theorem 11

(Privacy of \(P_{\lambda }\)). There are absolute constants \(\kappa _1,\dots ,\kappa _5\) such that the following holds for \(P_\lambda \). For every \(n \in \mathbb {N}\), \(\delta \in (0,1)\) and \(\frac{\kappa _2 \log (1/\delta )}{n}\le \varepsilon \le 1\), there exists a \(\lambda = \lambda (n,\varepsilon ,\delta )\) such that \(P_{n,\lambda }\) is \((\varepsilon ,\delta )\) differentially private and,

$$\begin{aligned} \lambda \le {\left\{ \begin{array}{ll} \frac{\kappa _4 \log (1/\delta )}{\varepsilon ^2} &{} if\,\varepsilon \ge \sqrt{\frac{\kappa _3 \log (1/\delta )}{n}} \\ n - \frac{\kappa _5 \varepsilon n^{3/2}}{\sqrt{\log (1/\delta )}} &{} otherwise \end{array}\right. } \end{aligned}$$

In the remainder of this section we will prove Theorem 11.

The first step in the proof is the observation that the output of the shuffler depends only on \(\sum _i y_i\). It will be more convenient to analyze the algorithm \(C_{\lambda }\) (Algorithm 2) that simulates \(S(R_\lambda (x_1),\dots ,R_{\lambda }(x_n))\). Claim 12 shows that the output distribution of \(C_{\lambda }\) is indeed the same as that of the output \(\sum _{i} y_i\). Therefore, privacy of \(C_{\lambda }\) carries over to \(P_{\lambda }\).

[Algorithm 2 (\(C_{\lambda }\)): include each \(i \in [n]\) in \(\mathbf {H}\) independently with probability \(\lambda /n\), then output \(\sum _{i\notin \mathbf {H}} x_i + \mathrm {Bin}(|\mathbf {H}|,\frac{1}{2})\)]
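A sketch of this simulation, matching the distribution computed in the proof of Claim 12:

```python
import random

def C(X, lam: float) -> int:
    # H is the set of users whose bits are replaced by uniform coins;
    # each i joins H independently with probability lam/n, as in R above.
    n = len(X)
    H = {i for i in range(n) if random.random() < lam / n}
    true_part = sum(X[i] for i in range(n) if i not in H)
    noise_part = sum(random.randint(0, 1) for _ in H)  # Bin(|H|, 1/2)
    return true_part + noise_part
```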

Claim 12

For every \(n\in \mathbb {N}\), every \(X\in \{0,1\}^n\), and every \(r\in \{0,1,2,\dots ,n\}\),

$$ { \mathbb {P} \left[ C_{\lambda }(X)=r\right] }={ \mathbb {P} \left[ \sum _{i=1}^n R_{n,\lambda }(x_i) =r\right] }$$

Proof

Fix any \(r\in \{0,1,2,\dots ,n\}\).

$$\begin{aligned} { \mathbb {P} \left[ C_{\lambda }(X)=r\right] }&= \sum _{H\subseteq [n]} { \mathbb {P} \left[ C_{\lambda }(X)=r \cap \mathbf {H}=H\right] } \nonumber \\&= \sum _{H\subseteq [n]} { \mathbb {P} \left[ \sum _{i\notin H} x_i + Bin \left( |H|,\frac{1}{2}\right) =r\right] } \cdot \left( \frac{\lambda }{n}\right) ^{|H|}\left( 1-\frac{\lambda }{n}\right) ^{n-|H|}\nonumber \\&= \sum _{H\subseteq [n]} { \mathbb {P} \left[ \sum _{i\notin H} x_i + \sum _{i\in H} Ber \left( \frac{1}{2}\right) =r\right] } \cdot \left( \frac{\lambda }{n}\right) ^{|H|}\left( 1-\frac{\lambda }{n}\right) ^{n-|H|} \end{aligned}$$
(1)

Let \(\mathbf {G}\) denote the (random) set of people for whom \(b_i= 1\) in \(P_{\lambda }\). Notice that

$$\begin{aligned} { \mathbb {P} \left[ \sum _{i=1}^n R_{n,\lambda }(x_i)=r\right] }&= \sum _{G \subseteq [n]} { \mathbb {P} \left[ \sum _i R_{n,\lambda }(x_i) = r \cap \mathbf {G} = G\right] }\\&= \sum _{G\subseteq [n]} { \mathbb {P} \left[ \sum _{i\notin G} x_i + \sum _{i\in G} Ber \left( \frac{1}{2}\right) =r\right] } \\&~~~~~\cdot \left( \frac{\lambda }{n}\right) ^{|G|}\left( 1-\frac{\lambda }{n}\right) ^{n-|G|} \end{aligned}$$

which is the same as (1). This concludes the proof.    \(\square \)

Now we establish that in order to demonstrate privacy of \(P_{n,\lambda }\), it suffices to analyze \(C_{\lambda }\).

Claim 13

If \(C_{\lambda }\) is \((\varepsilon ,\delta )\) differentially private, then \(P_{n,\lambda }\) is \((\varepsilon ,\delta )\) differentially private.

Proof

Fix any number of users n. Consider the randomized algorithm \(T:\{0,1,2,\dots ,n\} \rightarrow \{0,1\}^n\) that takes a number r and outputs a uniformly random string z that has r ones. If \(C_{\lambda }\) is differentially private, then the output of \(T\circ C_{\lambda }\) is \((\varepsilon ,\delta )\) differentially private by the post-processing lemma.

To complete the proof, we show that for any \(X\in \mathcal {X}^n\) the output of \((T\circ C_{\lambda })(X)\) has the same distribution as \(S(R_{\lambda }(x_1),\dots R_{\lambda }(x_n))\). Fix some vector \(Z \in \{0,1\}^{n}\) with sum r. Since T outputs a uniformly random string with r ones, and since, conditioned on the messages containing r ones in total, the shuffler’s output is uniform over all strings with r ones,

$$\begin{aligned} { \mathbb {P} \left[ (T\circ C_{\lambda })(X)=Z\right] } = { \mathbb {P} \left[ C_{\lambda }(X)=r\right] }\cdot \binom{n}{r}^{-1} = { \mathbb {P} \left[ \sum _{i=1}^n R_{n,\lambda }(x_i)=r\right] }\cdot \binom{n}{r}^{-1} = { \mathbb {P} \left[ S(R_{\lambda }(x_1),\dots ,R_{\lambda }(x_n))=Z\right] } \end{aligned}$$

where the middle equality is Claim 12.

This completes the proof of Claim 13.    \(\square \)

We will analyze the privacy of \(C_{\lambda }\) in three steps. First we show that for any sufficiently large H, the final step (encapsulated by Algorithm 3) will ensure differential privacy for some parameters. We then show that for any sufficiently large value s and H chosen randomly with \(|H| = s\), the privacy parameters actually improve significantly in the regime where s is close to n; this sampling of H is performed by Algorithm 4. Finally, we show that when s is chosen randomly then s is sufficiently large with high probability.

[Algorithm 3 (\(C_H\)): on input X, sample \(\mathbf {B} \sim \mathrm {Bin}(|H|,\frac{1}{2})\) and output \(\mathbf {B} + \sum _{i\notin H} x_i\)]

Claim 14

For any \(\delta >0\) and any \(H\subseteq [n]\) such that \(|H|>8\log \frac{4}{\delta }\), \(C_H\) is \((\varepsilon ,\frac{\delta }{2})\)-differentially private for

$$ \varepsilon = \ln \left( 1+\sqrt{\frac{32 \log \frac{4}{\delta }}{|H|}}\right) < \sqrt{\frac{32 \log \frac{4}{\delta }}{|H|}} $$

Proof

Fix neighboring datasets \(X\sim X'\in \{0,1\}^n\), any \(H\subseteq [n]\) such that \(|H|> 8\log \frac{4}{\delta }\), and any \(\delta >0\). If the point at which \(X,X'\) differ lies within H, the two distributions \(C_H(X),C_H(X')\) are identical. Hence, without loss of generality we assume that \(x_j=0\) and \(x_j'=1\) for some \(j \not \in H\).

Define \(u:=\sqrt{\frac{1}{2}|H|\log \frac{4}{\delta }}\) and \(I_u:= \left( \frac{1}{2}|H|-u, \frac{1}{2}|H|+u \right) \) so that by Hoeffding’s inequality, \({ \mathbb {P} \left[ \mathbf {B} \notin I_u\right] } < \frac{1}{2}\delta \). For any \(W\subseteq \{0,1,2,\dots ,n\}\) we have,

$$\begin{aligned} { \mathbb {P} \left[ C_H(X)\in W\right] }&= { \mathbb {P} \left[ C_H(X)\in W \cap \mathbf {B} \in I_u\right] } + { \mathbb {P} \left[ C_H(X)\in W \cap \mathbf {B} \notin I_u\right] }\\&\le { \mathbb {P} \left[ C_H(X)\in W \cap \mathbf {B} \in I_u\right] } + \frac{1}{2}\delta \\&= \sum _{r \in W\cap I_u} { \mathbb {P} \left[ \mathbf {B}+\sum _{i\notin H} x_i = r\right] } + \frac{1}{2}\delta \end{aligned}$$

Thus to complete the proof, it suffices to show that for any H and \(r \in W\cap I_u\)

$$\begin{aligned} \frac{{ \mathbb {P} \left[ \mathbf {B}+\sum _{i\notin H} x_i =r\right] }}{{ \mathbb {P} \left[ \mathbf {B}+\sum _{i\notin H} x_i' =r\right] }} \le 1+\sqrt{\frac{32 \log \frac{4}{\delta }}{|H|}} \end{aligned}$$
(2)

Because \(x_j=0,x_j'=1\) and \(j\notin H\), we have \(\sum _{i\notin H} x_i = \sum _{i\notin H} x_i' - 1\). Thus,

$$\begin{aligned} \frac{{ \mathbb {P} \left[ \mathbf {B}+\sum _{i\notin H} x_i =r\right] }}{{ \mathbb {P} \left[ \mathbf {B}+\sum _{i\notin H} x_i' =r\right] }} = \frac{{ \mathbb {P} \left[ \mathbf {B}=\left( r-\sum _{i\notin H} x_i'\right) +1\right] }}{{ \mathbb {P} \left[ \mathbf {B}=\left( r-\sum _{i\notin H} x_i'\right) \right] }} \end{aligned}$$

Now we define \(k = r-\sum _{i\notin H} x_i'\) so that

$$\frac{{ \mathbb {P} \left[ \mathbf {B}= \left( r-\sum _{i\notin H} x_i'\right) +1\right] }}{{ \mathbb {P} \left[ \mathbf {B} = \left( r-\sum _{i\notin H} x_i'\right) \right] }} = \frac{{ \mathbb {P} \left[ \mathbf {B} = k+1\right] }}{{ \mathbb {P} \left[ \mathbf {B} = k\right] }}.$$

Then we can calculate

$$\begin{aligned} \frac{{ \mathbb {P} \left[ \mathbf {B} = k+1\right] }}{{ \mathbb {P} \left[ \mathbf {B} = k\right] }} = \frac{|H|-k}{k+1} \le \frac{\frac{1}{2}|H|+u}{\frac{1}{2}|H|-u} \le 1+\frac{8u}{|H|} = 1+\sqrt{\frac{32 \log \frac{4}{\delta }}{|H|}} \end{aligned}$$

where the first inequality uses \(r \in I_u\) and the second holds because \(\frac{2u}{|H|} \le \frac{1}{2}\) whenever \(|H| > 8\log \frac{4}{\delta }\),

which completes the proof.   \(\square \)

Next, we consider the case where H is a random subset of [n] with a fixed size s. In this case we will use an amplification via sampling argument [20, 27] to argue that the randomness of H improves the privacy parameters by a factor of roughly \((1-\frac{s}{n})\), which will be crucial when \(s \approx n\).

[Algorithm 4 (\(C_s\)): sample \(\mathbf {H}\) uniformly from \(\mathcal {H}_s\), the family of all size-s subsets of [n], then run \(C_{\mathbf {H}}(X)\)]

Claim 15

For any \(\delta >0\) and any \(s>8\log \frac{4}{\delta }\), \(C_s\) is \(( \varepsilon ,\frac{1}{2}\delta )\) differentially private for

$$ \varepsilon = \sqrt{\frac{32\log \frac{4}{\delta }}{s}} \cdot \left( 1-\frac{s}{n}\right) $$

Proof

As before, fix \(X\sim X'\in \{0,1\}^n\) where \(x_j=0, x_j'=1\). \(C_s(X)\) selects \(\mathbf {H}\) uniformly from \(\mathcal {H}_s\) and runs \(C_\mathbf {H}(X)\); let H denote the realization of \(\mathbf {H}\). To enhance readability, we will use the shorthand \(\varepsilon _0(s) := \sqrt{\frac{32\log \frac{4}{\delta }}{s}}\). For any \(W\subset \{0, 1, 2, \dots , n\}\), we aim to show that

$$ \frac{{\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X)\in W\right] }-\frac{1}{2}\delta }{{\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X')\in W\right] }} \le \exp \left( \varepsilon _0(s) \cdot \left( 1-\frac{s}{n}\right) \right) $$

First, we have

$$\begin{aligned}&\frac{{\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X)\in W\right] }-\frac{1}{2}\delta }{{\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X')\in W\right] }} \nonumber \\ ={}&\frac{ {\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X)\in W \cap j\in \mathbf {H}\right] } + {\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X)\in W\cap j\notin \mathbf {H}\right] } -\frac{1}{2}\delta }{{\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X')\in W\cap j\in \mathbf {H}\right] } + {\underset{\mathbf {H},C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X')\in W\cap j\notin \mathbf {H}\right] } } \nonumber \\ ={}&\frac{(1-p)\gamma (X)+p\zeta (X)-\frac{1}{2}\delta }{(1-p)\gamma (X')+p\zeta (X')} \end{aligned}$$
(3)

where \(p := { \mathbb {P} \left[ j\notin \mathbf {H}\right] }=(1-s/n)\),

$$\gamma (X) := {\underset{C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X)\in W\mid j\in \mathbf {H}\right] }\quad \text {and}\quad \zeta (X) := {\underset{C_\mathbf {H}}{\mathbb {P}} \left[ C_\mathbf {H}(X)\in W\mid j\notin \mathbf {H}\right] }\, .$$

When user j outputs a uniformly random bit, their private value has no impact on the distribution. Hence, \(\gamma (X)=\gamma (X')\), and

$$\begin{aligned} (3) = \frac{(1-p)\gamma (X)+p\zeta (X)-\frac{1}{2}\delta }{(1-p)\gamma (X)+p\zeta (X')} \end{aligned}$$
(4)

Since \(s = |H|\) is sufficiently large, by Claim 14 we have \(\zeta (X) \le (1+\varepsilon _0(s)) \cdot \min \{ \zeta (X'), \gamma (X) \} + \frac{1}{2}\delta \).

$$\begin{aligned} (4)&\le \frac{(1-p)\gamma (X)+p\cdot \left( (1+\varepsilon _0(s)) \cdot \min \{\zeta (X'),\gamma (X)\}+\tfrac{1}{2}\delta \right) -\tfrac{1}{2}\delta }{(1-p)\gamma (X)+p\zeta (X')} \nonumber \\&\le \frac{(1-p)\gamma (X)+p\cdot (1+\varepsilon _0(s)) \cdot \min \{\zeta (X'),\gamma (X)\}}{(1-p)\gamma (X)+p\zeta (X')} \nonumber \\&= \frac{(1-p)\gamma (X)+p\cdot \min \{\zeta (X'),\gamma (X)\} + p\cdot \varepsilon _0(s) \cdot \min \{\zeta (X'),\gamma (X)\}}{(1-p)\gamma (X)+p\zeta (X')} \nonumber \\&\le \frac{(1-p)\gamma (X)+p\zeta (X') + p\cdot \varepsilon _0(s) \cdot \min \{\zeta (X'),\gamma (X)\}}{(1-p)\gamma (X)+p\zeta (X')} \nonumber \\&= 1+ \frac{p\cdot \varepsilon _0(s) \cdot \min \{\zeta (X'),\gamma (X)\}}{(1-p)\gamma (X)+p\zeta (X')} \end{aligned}$$
(5)

Observe that \(\min \{\zeta (X'),\gamma (X)\} \le (1-p) \gamma (X) + p \zeta (X')\), so

$$\begin{aligned} (5) \le 1+ p\cdot \varepsilon _0(s) = 1+\varepsilon _0(s)\cdot \left( 1-\frac{s}{n}\right)&\le \exp \left( \varepsilon _0(s)\cdot \left( 1-\frac{s}{n}\right) \right) \\&= \exp \left( \sqrt{\frac{32\log (4/\delta )}{s}} \cdot \left( 1-\frac{s}{n}\right) \right) \end{aligned}$$

which completes the proof.   \(\square \)

We now come to the actual algorithm \(C_{\lambda }\), where s is not fixed but is random. The bound on the privacy parameter from the analysis of \(C_{s}\) decreases as s grows, so we will complete the analysis of \(C_{\lambda }\) by using the fact that, with high probability, s is almost as large as \(\lambda \).

Claim 16

For any \(\delta >0\) and \(n \ge \lambda \ge 14\log \frac{4}{\delta }\), \(C_{\lambda }\) is \((\varepsilon , \delta )\) differentially private where

$$ \varepsilon = \sqrt{\frac{32\log \frac{4}{\delta }}{\lambda - \sqrt{2\lambda \log \tfrac{2}{\delta }}}} \cdot \left( 1 - \frac{\lambda - \sqrt{2 \lambda \log \tfrac{2}{\delta }}}{n} \right) $$

The proof is in the full version of the paper.

From Claim 13, \(C_{\lambda }\) and \(P_{n,\lambda }\) share the same privacy guarantees. Hence, Claim 16 implies the following:

Corollary 17

For any \(\delta \in (0,1)\), \(n \in \mathbb {N}\), and \(\lambda \in \left[ 14\log \frac{4}{\delta },n\right] \), \(P_{n,\lambda }\) is \((\varepsilon ,\delta )\) differentially private, where

$$ \varepsilon = \sqrt{\frac{32\log \frac{4}{\delta }}{\lambda - \sqrt{2\lambda \log \tfrac{2}{\delta }}}} \cdot \left( 1 - \frac{\lambda - \sqrt{2 \lambda \log \tfrac{2}{\delta }}}{n} \right) $$

4.3 Setting the Randomization Parameter

Corollary 17 gives a bound on the privacy of \(P_{n,\lambda }\) in terms of the number of users n and the randomization parameter \(\lambda \). While this may be enough on its own, in order to understand the tradeoff between \(\varepsilon \) and the accuracy of the protocol, we want to identify a suitable choice of \(\lambda \) to achieve a desired privacy guarantee \((\varepsilon ,\delta )\). To complete the proof of Theorem 11, we prove such a bound.

For the remainder of this section, fix some \(\delta \in (0,1)\). Corollary 17 states that for any n and \(\lambda \in \left[ 14\log \frac{4}{\delta },n\right] \), \(P_{n,\lambda }\) satisfies \((\varepsilon ^*(\lambda ),\delta )\)-differential privacy, where

$$\begin{aligned} \varepsilon ^*(\lambda ) = \sqrt{\frac{32\log \frac{4}{\delta }}{\lambda - \sqrt{2\lambda \log \tfrac{2}{\delta }}}} \cdot \left( 1 - \frac{\lambda - \sqrt{2 \lambda \log \tfrac{2}{\delta }}}{n} \right) \end{aligned}$$

Let \(\lambda ^*(\varepsilon )\) be the inverse of \(\varepsilon ^*\), i.e. the minimum \(\lambda \in [0,n]\) such that \(\varepsilon ^*(\lambda ) \le \varepsilon \). Note that \(\varepsilon ^*(\lambda )\) is decreasing as \(\lambda \rightarrow n\) while \(\lambda ^*(\varepsilon )\) increases as \(\varepsilon \rightarrow 0\). By definition, \(P_{n,\lambda }\) satisfies \((\varepsilon ,\delta )\) privacy if \(\lambda \ge \lambda ^*(\varepsilon )\); the following Lemma gives such an upper bound:

Lemma 18

For all \(\delta \in (0,1)\), \(n \ge 14\log \frac{4}{\delta }\), \(\varepsilon \in \left( \frac{\sqrt{3456}}{n}\log \frac{4}{\delta },1\right) \), \(P_{n,\lambda }\) is \((\varepsilon ,\delta )\) differentially private if

$$\begin{aligned} \lambda = {\left\{ \begin{array}{ll} \frac{64}{\varepsilon ^2}\log \frac{4}{\delta } &{} if\,\varepsilon \ge \sqrt{\frac{192}{n}\log \frac{4}{\delta }}\\ n-\frac{\varepsilon n^{3/2}}{\sqrt{432\log (4/\delta )}} &{} otherwise \end{array}\right. } \end{aligned}$$
(6)

We’ll prove the lemma in two claims, each of which corresponds to one of the two cases of our bound on \(\lambda ^*(\varepsilon )\). The first bound applies when \(\varepsilon \) is relatively large.

Claim 19

For all \(\delta \in (0,1)\), \(n \ge 14\log \frac{4}{\delta }\), \(\varepsilon \in \left( \sqrt{\frac{192}{n}\log \frac{4}{\delta }},1\right) \), if \(\lambda = \frac{64}{\varepsilon ^2} \log \frac{4}{\delta }\) then \(P_{n,\lambda }\) is \((\varepsilon ,\delta )\) private.

Proof

Let \(\lambda = \frac{64}{\varepsilon ^2} \log \frac{4}{\delta }\) as in the statement. Corollary 17 states that \(P_{n,\lambda }\) satisfies \((\varepsilon ^*(\lambda ),\delta )\) privacy for

$$\begin{aligned} \varepsilon ^*(\lambda )&= \sqrt{\frac{32\log \frac{4}{\delta }}{\lambda - \sqrt{2\lambda \log \tfrac{2}{\delta }}}} \cdot \left( 1 - \frac{\lambda - \sqrt{2 \lambda \log \tfrac{2}{\delta }}}{n} \right) \le \sqrt{\frac{32\log \frac{4}{\delta }}{\lambda - \sqrt{2\lambda \log \tfrac{2}{\delta }}}} \\&\le \sqrt{\frac{32\log \frac{4}{\delta }}{\lambda /2}} = \sqrt{\frac{64\log \frac{4}{\delta }}{\lambda }} = \varepsilon \end{aligned}$$

where the second inequality uses \(\sqrt{2\lambda \log \tfrac{2}{\delta }} \le \frac{\lambda }{2}\), which holds since \(\varepsilon \le 1\) implies \(\lambda = \frac{64}{\varepsilon ^2}\log \frac{4}{\delta } \ge 8\log \tfrac{2}{\delta }\). Note also that \(\varepsilon \ge \sqrt{\frac{192}{n}\log \frac{4}{\delta }}\) guarantees \(\lambda \le n/3 \le n\).

This completes the proof of the claim.   \(\square \)

The value of \(\lambda \) in the previous claim can be as large as n when \(\varepsilon \) approaches \(1/\sqrt{n}\). We now give a meaningful bound for smaller values of \(\varepsilon \).

Claim 20

For all \(\delta \in (0,1)\), \(n \ge 14\log \frac{4}{\delta }\), \(\varepsilon \in \left( \frac{\sqrt{3456}}{n}\log \frac{4}{\delta }, \sqrt{\frac{192}{n} \log \frac{4}{\delta }}\right) \), if

$$\lambda = n - \frac{\varepsilon n^{3/2}}{\sqrt{432\log (4/\delta )}}$$

then \(P_{n,\lambda }\) is \((\varepsilon ,\delta )\) private.

Proof

Let \(\lambda = n - \varepsilon n^{3/2} / \sqrt{432\log (4/\delta )}\) as in the statement. Note that for this \(\varepsilon \) regime, we have \(n/3< \lambda < n\). Corollary 17 states that \(P_{n,\lambda }\) satisfies \((\varepsilon ^*(\lambda ),\delta )\) privacy for

[calculation omitted: substituting this \(\lambda \) into \(\varepsilon ^*(\lambda )\) from Corollary 17 and using \(n/3< \lambda < n\) shows \(\varepsilon ^*(\lambda ) \le \varepsilon \)]

which completes the proof.   \(\square \)
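As a numerical cross-check of Lemma 18, one can also invert \(\varepsilon ^*\) directly; a sketch (names ours, assuming a feasible \(\lambda \) exists in the stated range):

```python
import math

def eps_star(lam: float, n: int, delta: float) -> float:
    # epsilon^*(lambda) from Corollary 17.
    g = lam - math.sqrt(2 * lam * math.log(2 / delta))
    return math.sqrt(32 * math.log(4 / delta) / g) * (1 - g / n)

def lam_star(eps: float, n: int, delta: float) -> float:
    # eps_star is decreasing on [14 log(4/delta), n], so binary search
    # finds the minimal lambda with eps_star(lambda) <= eps.
    lo, hi = 14 * math.log(4 / delta), float(n)
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if eps_star(mid, n, delta) <= eps else (mid, hi)
    return hi
```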

4.4 Accuracy Analysis

In this section, we will bound the error of \(P_{\lambda }(X)\) with respect to \(\sum _i x_i\). Recall that, to clean up notational clutter, we will often write \(f(X) = \sum _i x_i\). As with the previous section, our statements will at first be in terms of \(\lambda \) but the section will end with a statement in terms of \(\varepsilon ,\delta \).

Theorem 21

For every \(n \in \mathbb {N}\), \(\beta > 0\), \(n > \lambda \ge 2\log \frac{2}{\beta }\), and \(x \in \{0,1\}^{n}\),

$$ { \mathbb {P} \left[ \left| P_{n,\lambda }(x) - \sum _i x_i \right| > \sqrt{2\lambda \log (2/\beta )} \cdot \left( \frac{n}{n-\lambda }\right) \right] } \le \beta $$

Observe that, using the choice of \(\lambda \) specified in Theorem 11, we conclude that for every \(\frac{1}{n} \lesssim \varepsilon \lesssim 1\) and every \(\delta \) the protocol \(P_{\lambda }\) satisfies

$$ { \mathbb {P} \left[ \left| P_{n,\lambda }(x) - \sum _i x_i \right| > O\left( \frac{\sqrt{\log (1/\delta )\log (1/\beta )}}{\varepsilon } \right) \right] } \le \beta $$

To see how this follows from Theorem 21, consider two parameter regimes:

  1.

    When \(\varepsilon \gg 1/\sqrt{n}\) then \(\lambda \approx \frac{\log (1/\delta )}{\varepsilon ^2} \ll n\), so the bound in Theorem 21 is \(O(\sqrt{\lambda \log (1/\beta )})\), which yields the desired bound.

  2.

    When \(\varepsilon \ll 1/\sqrt{n}\) then \(n-\lambda \approx \varepsilon n^{3/2}/\sqrt{\log (1/\delta )} \ll n\), so the bound in Theorem 21 is \(O\left( \frac{n^{3/2} \sqrt{\log (1/\beta )}}{n-\lambda }\right) \), which yields the desired bound.

Theorem 2 in the introduction follows from this intuition; a formal proof can be found in the full version.
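A quick simulation that exercises these bounds end to end (a self-contained sanity check, not part of the formal proof):

```python
import random
import statistics

def one_run(xs, lam: float) -> float:
    n = len(xs)
    ys = [random.randint(0, 1) if random.random() < lam / n else x for x in xs]
    random.shuffle(ys)  # does not change the sum; mirrors the protocol
    est = (sum(ys) - lam / 2) * (n / (n - lam))
    return abs(est - sum(xs))

xs = [random.randint(0, 1) for _ in range(10_000)]
errs = [one_run(xs, lam=400.0) for _ in range(200)]
print(statistics.median(errs))  # on the order of sqrt(lam) when lam << n
```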

5 A Protocol for Sums of Real Numbers

In this section, we show how to extend our protocol to compute sums of bounded real numbers. In this case the data domain is \(\mathcal {X}= [0,1]\), but the function we wish to compute is still \(f(x) = \sum _i x_i\). The main idea of the protocol is to randomly round each number \(x_i\) to a Boolean value \(b_i \in \{0,1\}\) with expected value \(x_i\). However, since the randomized rounding introduces additional error, we may need to round multiple times and estimate several sums. As a consequence, this protocol is not one-message.

5.1 The Protocol

Our algorithm is described in two parts, an encoder \(E_r\) that performs the randomized rounding (Algorithm 5) and a shuffled protocol \(P^\mathbb {R}_{\lambda ,r}\) (Algorithm 6) that is the composition of many copies of our protocol for the binary case, \(P^{0/1}_{\lambda }\). The encoder takes a number \(x \in [0,1]\) and a parameter \(r \in \mathbb {N}\) and outputs a vector \((b_1,\dots ,b_r) \in \{0,1\}^{r}\) such that \({ \mathbb {E} \left[ \frac{1}{r} \sum _j b_j\right] } = x\) and \({ \mathrm {Var} \left[ \frac{1}{r} \sum _j b_j\right] } = O(1/r^2)\). To clarify, we give two examples of the encoding procedure, followed by a code sketch:

  • If \(r = 1\) then the encoder simply sets \(b = Ber (x)\). The mean and variance of b are x and \(x(1-x) \le \frac{1}{4}\), respectively.

  • If \(x = .4\) and \(r = 4\) then the encoder sets \(b = (1, Ber (.6),0,0)\). The mean and variance of \(\frac{1}{4}(b_1 + b_2 + b_3 + b_4)\) are .4 and .015, respectively.
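One encoder consistent with both examples and with the stated mean and variance is the following sketch (our reconstruction; the name is ours):

```python
import random

def encode_real(x: float, r: int):
    # Emit floor(x*r) deterministic ones, one Bernoulli bit for the
    # fractional remainder, then pad with zeros. The average of the r
    # bits has mean x and variance at most 1/(4 r^2).
    whole = int(x * r)
    frac = x * r - whole
    bits = [1] * whole
    if whole < r:
        bits.append(1 if random.random() < frac else 0)
    bits += [0] * (r - len(bits))
    return bits
```

For \(x = .4\) and \(r = 4\) this reproduces \(b = (1, Ber (.6),0,0)\) from the second example.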

After doing the rounding, we then run the bit-sum protocol \(P^{0/1}_{\lambda }\) on the bits \(b_{1,j},\dots ,b_{n,j}\) for each \(j \in [r]\) and average the results to obtain an estimate of the quantity

$$ \sum _{i} \frac{1}{r} \sum _{j} b_{i,j} \approx \sum _{i} x_i $$

To analyze privacy we use the fact that the protocol is a composition of bit-sum protocols, which are each private, and thus we can analyze privacy via the composition properties of differential privacy.

Much like in the bit-sum protocol, we use \(P^\mathbb {R}_{n,\lambda ,r}, R^\mathbb {R}_{n,\lambda ,r}, A^\mathbb {R}_{n,\lambda ,r}\) to denote the real-sum protocol and its components when n users participate.

[Algorithm 5: the randomized-rounding encoder \(E_r\)]

[Algorithm 6: the real-sum protocol \(P^\mathbb {R}_{\lambda ,r}\), which runs \(P^{0/1}_{\lambda }\) on each of the r coordinates of the encodings and averages the results]

Theorem 22

For every \(\delta = \delta (n)\) such that \(e^{-\varOmega (n^{1/4})}< \delta (n) < \frac{1}{n}\) and \(\frac{\mathrm {poly(\log n)}}{n}< \varepsilon < 1\) and every sufficiently large n, there exist parameters \(\lambda \in [0,n], r\in \mathbb {N}\) such that \(P^\mathbb {R}_{n,\lambda ,r}\) is \((\varepsilon ,\delta )\) differentially private and, for every \(\beta > 0\) and every \(X = (x_1,\dots ,x_n) \in [0,1]^n\),

$$ { \mathbb {P} \left[ \left| P^\mathbb {R}_{n,\lambda ,r}(X) - \sum _{i=1}^{n} x_{i} \right| > O\left( \frac{1}{\varepsilon } \log \frac{1}{\delta } \sqrt{ \log \frac{1}{\beta }}\right) \right] } \le \beta $$

5.2 Privacy Analysis

Privacy will follow immediately from the composition properties of shuffled protocols (Lemma 10) and the privacy of the bit-sum protocol \(P_{n,\lambda }\). One technical nuisance is that the composition properties are naturally stated in terms of \(\varepsilon \), whereas the protocol is described in terms of the parameter \(\lambda \), and the relationship between \(\varepsilon , \lambda ,\) and n is somewhat complex. Thus, we will state our guarantees in terms of the level of privacy that each individual bit-sum protocol achieves with parameter \(\lambda \). To this end, define the function \(\lambda ^*(n,\varepsilon ,\delta )\) to be the minimum value of \(\lambda \) such that the bit-sum protocol with n users satisfies \((\varepsilon ,\delta )\)-differential privacy. We will state the privacy guarantee in terms of this function.

Theorem 23

For every \(\varepsilon ,\delta \in (0,1), n,r \in \mathbb {N}\), define

$$ \varepsilon _0 = \frac{\varepsilon }{\sqrt{8 r \log (2/\delta )}}~~~~~~\delta _0 = \frac{\delta }{2r}~~~~~~\lambda ^* = \lambda ^*(n,\varepsilon _0,\delta _0) $$

For every \(\lambda \ge \lambda ^*\), \(P^\mathbb {R}_{n,\lambda ,r}\) is \((\varepsilon ,\delta )\)-differentially private.

5.3 Accuracy Analysis

In this section, we bound the error of \(P^\mathbb {R}_{\lambda ,r}(X)\) with respect to \(\sum _i x_i\). Recall that \(f(X) = \sum _i x_i\).

Observe that there are two sources of randomness: the encoding of the input \(X=(x_1,\dots x_n)\) as bits and the execution of \(R^{0/1}_{n,\lambda }\) on that encoding. We first show that the bit encoding lends itself to an unbiased and concentrated estimator of f(X). Then we show that the output of \(P_{n,\lambda ,r}\) is concentrated around any value that estimator takes.

Theorem 24

For every \(\beta > 0\), \(n \ge \lambda \ge \frac{16}{9}\log \frac{2}{\beta }\), \(r\in \mathbb {N}\), and \(X\in [0,1]^n\),

$$ { \mathbb {P} \left[ \left| P^\mathbb {R}_{n,\lambda ,r}(X) - f(X) \right| \ge \tfrac{\sqrt{2}}{r}\sqrt{ n\log \tfrac{2}{\beta }} + \tfrac{n}{n-\lambda } \cdot \sqrt{2\tfrac{\lambda }{r} \log \tfrac{2}{\beta }}\right] } < 2\beta $$

The analysis can be found in the full version of the paper, which also argues that setting \(r\leftarrow \varepsilon \cdot \sqrt{n}\) suffices to achieve the bound in Theorem 22.

6 Lower Bounds for the Shuffled Model

In this section, we prove separations between central model algorithms and shuffled model protocols where each user’s local randomizer is identical and sends one indivisible message to the shuffler (the one-message model).

Theorem 25

(Shuffled-to-Local Transformation). Let \(P_{S}\) be a protocol in the one-message shuffled model that is

  • \((\varepsilon _{S},\delta _{S})\)-differentially private in the shuffled model for some \(\varepsilon _{S} \le 1\) and \(\delta _{S} = \delta _{S}(n) < n^{-8}\), and

  • \((\alpha ,\beta )\)-accurate with respect to f for some \(\beta = \varOmega (1)\).

Then there exists a protocol \(P_{L}\) in the local model that is

  • \((\varepsilon _{L}, 0)\)-differentially private in the local model for \(\varepsilon _{L} = 8(\varepsilon _{S} + \ln n)\), and

  • \((\alpha ,4\beta )\)-accurate with respect to f (when n is larger than some absolute constant).

This means that an impossibility result for approximating f in the local model implies a related impossibility result for approximating f in the shuffled model. In Sect. 6.2 we combine this result with existing lower bounds for local differential privacy to obtain several strong separations between the central model and the one-message shuffled model.

The key to Theorem 25 is to show that if \(P_{S} = (R_{S},S,A_{S})\) is a protocol in the one-message shuffled model satisfying \((\varepsilon _S,\delta _S)\)-differential privacy, then the algorithm \(R_{S}\) itself satisfies \((\varepsilon _{L},\delta _{S})\)-differential privacy without use of the shuffler S. Therefore, the local protocol \(P_{L} = (R_{S}, A_{S} \circ S)\) is \((\varepsilon _{L},\delta _{S})\)-private in the local model and has the exact same output distribution, and thus the exact same accuracy, as \(P_{S}\). To complete the proof, we use (a slight generalization of) a transformation of Bun, Nelson, and Stemmer [7] to turn R into a related algorithm \(R'\) satisfying \((8(\varepsilon _{S} + \ln n),0)\)-differential privacy with only a slight loss of accuracy. We prove the latter result in the full version of the paper.

6.1 One-Message Randomizers Satisfy Local Differential Privacy

The following lemma is the key step in the proof of Theorem 25, and states that for any symmetric shuffled protocol, the local randomizer R must satisfy local differential privacy with weak, but still non-trivial, privacy parameters.

Theorem 26

Let \(P = (R,S,A)\) be a protocol in the one-message shuffled model. If \(n \in \mathbb {N}\) is such that \(P_{n}\) satisfies \((\varepsilon _{S},\delta _{S})\)-differential privacy, then the algorithm R satisfies \((\varepsilon _{L},\delta _{S})\)-differential privacy for \(\varepsilon _{L} = \varepsilon _{S} + \ln n\). Therefore, the symmetric local protocol \(P_{L} = (R,A \circ S)\) satisfies \((\varepsilon _{L},\delta _{S})\)-differential privacy.

Proof

By assumption, \(P_n\) is \((\varepsilon _S,\delta _S)\)-private. Suppose that \(R :\mathcal {X}\rightarrow \mathcal {Y}\) is not \((\varepsilon ,\delta _{S})\)-differentially private for some \(\varepsilon \ge 0\); we will derive an upper bound on \(\varepsilon \). By definition, there exist \(Y \subseteq \mathcal {Y}\) and \(x,x'\in \mathcal {X}\) such that

$$ { \mathbb {P} \left[ R(x')\in Y\right] } > \exp (\varepsilon ) \cdot { \mathbb {P} \left[ R(x)\in Y\right] } +\delta _{S} $$

For brevity, define \(p := { \mathbb {P} \left[ R(x)\in Y\right] }\) and \(p' := { \mathbb {P} \left[ R(x')\in Y\right] }\) so that we have

$$\begin{aligned} p' > \exp (\varepsilon )p +\delta _{S} \end{aligned}$$
(7)

We will show that if \(\varepsilon \) is too large, then (7) will imply that \(P_n\) is not \((\varepsilon _S,\delta _S)\)-differentially private, which contradicts our assumption. To this end, define the set \(\mathcal {W}:=\{(w_1,\dots ,w_n)\in \mathcal {Y}^n~|~\exists i~ w_i \in Y\}\). Define two neighboring datasets \(X\sim X'\) as

$$X := (\underbrace{x,\dots ,x}_{n \text { times}})~~~\text {and}~~~X' := (x',\underbrace{x,\dots ,x}_{n-1 \text { times}})$$

Because \(P_n\) is \((\varepsilon _S,\delta _S)\)-differentially private,

$$\begin{aligned} { \mathbb {P} \left[ P_n(X')\in \mathcal {W}\right] } \le \exp (\varepsilon _S)\cdot { \mathbb {P} \left[ P_n(X)\in \mathcal {W}\right] } + \delta _S \end{aligned}$$
(8)

Now we have

$$\begin{aligned} { \mathbb {P} \left[ P_n(X')\in \mathcal {W}\right] }&= { \mathbb {P} \left[ S\left( R(x'),R(x),\dots ,R(x)\right) \in \mathcal {W}\right] } \\&= { \mathbb {P} \left[ \left( R(x'),R(x),\dots ,R(x)\right) \in \mathcal {W}\right] } \\&\ge { \mathbb {P} \left[ R(x')\in Y\right] } = p' > \exp (\varepsilon )p + \delta _{S} \end{aligned}$$

where the second equality is because the set \(\mathcal {W}\) is closed under permutation, so we can remove the random permutation \(S\) without changing the probability. Similarly, we have

$$\begin{aligned} { \mathbb {P} \left[ P_n(X)\in \mathcal {W}\right] }&= { \mathbb {P} \left[ \left( R(x),\dots ,R(x)\right) \in \mathcal {W}\right] } \\&\le n\cdot { \mathbb {P} \left[ R(x)\in Y\right] } = np \end{aligned}$$

where the inequality is a union bound over the n independent executions of R.

Now, plugging the previous two inequalities into (8), we have

$$\begin{aligned} \exp (\varepsilon )p + \delta _{S} <{}&{ \mathbb {P} \left[ P_n(X')\in \mathcal {W}\right] } \\ \le {}&\exp (\varepsilon _{S})\cdot { \mathbb {P} \left[ P_n(X)\in \mathcal {W}\right] } + \delta _{S} \\ \le {}&\exp (\varepsilon _{S})np + \delta _{S} \end{aligned}$$

Canceling \(\delta _{S}\) from both sides gives \(\exp (\varepsilon )p < \exp (\varepsilon _{S})np\); in particular \(p > 0\), so dividing by p and taking logarithms yields

$$ \varepsilon \le \varepsilon _{S} + \ln n $$

Therefore R must satisfy \((\varepsilon _{S} + \ln n, \delta _{S})\)-differential privacy.   \(\square \)

Claim 27

If the shuffled protocol \(P_{S} = (R,S,A)\) is \((\alpha ,\beta )\)-accurate for some function f, then the local protocol \(P_{L} = (R, A \circ S)\) is \((\alpha ,\beta )\)-accurate for f, where

$$ (A \circ S)(y_1,\dots ,y_n) = A( S(y_1,\dots ,y_n)) $$

We do not present a proof of Claim 27, as it is immediate: the distributions of \(P_{S}(x)\) and \(P_{L}(x)\) are identical, since \(A \circ S\) incorporates the shuffler.

We conclude this section with a slight extension of a result of Bun, Nelson, and Stemmer [7] showing how to transform any local algorithm satisfying \((\varepsilon ,\delta )\)-differential privacy into one satisfying \((O(\varepsilon ),0)\)-differential privacy with only a small decrease in accuracy. Our extension covers the case where \(\varepsilon > 2/3\), whereas their result as stated requires \(\varepsilon \le 1/4\).

Theorem 28

(Extension of [7]). Suppose the local protocol \(P_{L} = (R,A)\) is \((\varepsilon ,\delta )\)-differentially private and \((\alpha ,\beta )\)-accurate with respect to f. If \(\varepsilon > 2/3\) and

$$ \delta < \frac{\beta }{8n\ln (n/\beta )}\cdot \frac{1}{\exp (6\varepsilon )} $$

then there exists another local protocol \(P_{L}' = (R',A)\) that is \((8\varepsilon ,0)\)-differentially private and \((\alpha ,4\beta )\)-accurate with respect to f.

The proof can be found in the full version of the paper. Theorem 25 now follows by combining Theorem 26 and Claim 27 with Theorem 28.

6.2 Applications of Theorem 25

In this section, we define two problems and present known lower bounds in the central and local models. By applying Theorem 25, we derive lower bounds in the one-message shuffled model. These bounds imply large separations between the central and one-message shuffled models.

Table 1. Comparisons Between Models. When a parameter is unspecified, the reader may substitute \(\varepsilon = 1, \delta = 0, \alpha = \beta = .01\). All results are presented as the minimum dataset size n for which we can hope to achieve the desired privacy and accuracy as a function of the relevant parameter for the problem.

The Selection Problem. We define the selection problem as follows. The data universe is \(\mathcal {X}= \{0,1\}^{d}\) where d is the dimension of the problem and the main parameter of interest. Given a dataset \(x = (x_1,\dots ,x_n) \in \mathcal {X}^n\), the goal is to identify a coordinate j such that the sum of the users’ j-th bits is approximately as large as possible. That is, a coordinate \(j \in [d]\) such that

$$\begin{aligned} \sum _{i=1}^{n} x_{i,j} \ge \max _{j' \in [d]} \sum _{i=1}^{n} x_{i,j'} - \frac{n}{10} \end{aligned}$$
(9)

We say that an algorithm solves the selection problem with probability \(1-\beta \) if for every dataset x, with probability at least \(1-\beta \), it outputs j satisfying (9).

We would like to understand the minimum n (as a function of d) such that there is a differentially private algorithm that can solve the selection problem with constant probability of failure. We remark that this is a very weak notion of accuracy, but since we are proving a negative result, using a weak notion of accuracy only strengthens our results.
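For concreteness, the following sketch checks condition (9) for a candidate coordinate on a toy 0/1 dataset; the helper names are ours, not the paper's.

```python
import numpy as np

def satisfies_selection(x, j, slack=0.1):
    """Condition (9): the sum of column j of the 0/1 dataset x
    (shape n x d) is within slack * n of the largest column sum."""
    sums = x.sum(axis=0)
    return sums[j] >= sums.max() - slack * x.shape[0]

rng = np.random.default_rng(1)
x = (rng.random((1000, 50)) < 0.3).astype(int)  # toy dataset: n = 1000, d = 50
j = int(x.sum(axis=0).argmax())                 # the true argmax always qualifies
print(satisfies_selection(x, j))                # True
```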

The following lower bound for locally differentially private protocols for selection is from [31], and is implicit in the work of [12].

Theorem 29

If \(P_{L} = (R_{L},A_{L})\) is a local protocol that satisfies \((\varepsilon ,0)\)-differential privacy and \(P_{L}\) solves the selection problem with probability \(\frac{9}{10}\) for datasets \(x \in (\{0,1\}^{d})^n\), then \(n = \varOmega \left( \frac{d \log d}{(e^{\varepsilon } - 1)^2}\right) \).

By applying Theorem 25 we immediately obtain the following corollary.

Corollary 30

If \(P_{S} = (R_{S},S,A_{S})\) is a \((1,\delta )\)-differentially private protocol in the one-message shuffled model, for \(\delta = \delta (n) < n^{-8}\), and \(P_{S}\) solves the selection problem with probability \(\frac{99}{100}\), then \(n = \varOmega ( (d \log d)^{1/17} )\).

Using a multi-message shuffled protocol, we can solve selection with \(\tilde{O}(\frac{1}{\varepsilon } \sqrt{d})\) samples. By contrast, in the local model \(n = \varTheta (\frac{1}{\varepsilon ^2} d \log d)\) samples are necessary and sufficient. In the central model, this problem is solved by the exponential mechanism [25] with a dataset of size just \(n = O(\frac{1}{\varepsilon } \log d)\), and this is optimal [2, 28]. These results are summarized in Table 1.
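To illustrate the size of these gaps, the following back-of-the-envelope computation plugs a concrete dimension into the rates quoted above, with all hidden constants and log-factor refinements dropped; the outputs are order-of-magnitude illustrations only.

```python
import math

def selection_sample_complexity(d, eps=1.0):
    """Order-of-magnitude minimum dataset sizes for selection, using the
    asymptotic rates quoted above with all constants dropped."""
    return {
        "central (exponential mech.)": math.log(d) / eps,
        "shuffled (multi-message)":    math.sqrt(d) / eps,
        "local":                       d * math.log(d) / eps ** 2,
    }

for model, n in selection_sample_complexity(d=2 ** 20).items():
    print(f"{model:>28}: n ~ {n:,.0f}")
```

Even at \(d = 2^{20}\), the local model needs millions of users where the central model needs only tens, consistent with the ordering in Table 1.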

Histograms. We define the histogram problem as follows. The data universe is \(\mathcal {X}= [D]\) where D is the domain size of the problem and the main parameter of interest. Given a dataset \(x = (x_1,\dots ,x_n) \in \mathcal {X}^n\), the goal is to build a vector of size D such that, for every \(j\in [D]\), the j-th element is close to the frequency of j in x. That is, a vector \(v \in [0,n]^D\) such that

$$\begin{aligned} \max _{j\in [D]} \left| v_j - \sum _{i=1}^n \mathbbm {1}(x_i=j) \right| \le \frac{n}{10} \end{aligned}$$
(10)

where \(\mathbbm {1}(\texttt {conditional})\) is defined to be 1 if conditional evaluates to true and 0 otherwise.
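As a small concrete check, the sketch below evaluates the left-hand side of (10) on a toy dataset (using a 0-indexed domain for convenience; the function names are ours).

```python
import numpy as np

def histogram_error(x, v, D):
    """Left-hand side of (10): the worst-case deviation of the estimate v
    from the true counts of x over the (0-indexed) domain {0, ..., D-1}."""
    true_counts = np.bincount(x, minlength=D)
    return np.abs(v - true_counts).max()

rng = np.random.default_rng(2)
D, n = 20, 1000
x = rng.integers(0, D, size=n)                         # toy dataset
v = np.bincount(x, minlength=D) + rng.normal(0, 5, D)  # noisy estimate
print(histogram_error(x, v, D) <= n / 10)              # True w.h.p.
```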

Similar to the selection problem, an algorithm solves the histogram problem with probability \(1-\beta \) if for every dataset x, with probability at least \(1-\beta \), it outputs v satisfying (10). We would like to find the minimum n such that a differentially private algorithm can solve the histogram problem. The following lower bound for locally differentially private protocols for histograms is from [3].

Theorem 31

If \(P_L=(R_L,A_L)\) is a local protocol that satisfies \((\varepsilon ,0)\)-differential privacy and \(P_L\) solves the histogram problem with probability \(\frac{9}{10}\) for any \(x\in [D]^n\), then \(n=\varOmega \left( \frac{\log D}{(e^\varepsilon -1)^2}\right) \).

By applying Theorem 25, we immediately obtain the following corollary.

Corollary 32

If \(P_{S} = (R_{S},S,A_{S})\) is a \((1,\delta )\)-differentially private protocol in the one-message shuffled model, for \(\delta = \delta (n) < n^{-8}\), and \(P_{S}\) solves the histogram problem with probability \(\frac{99}{100}\), then \(n=\varOmega \left( \log ^{1/17}D\right) \).

In the shuffled model, we can solve this problem using our protocol for bit-sums by having each user encode their data as a “histogram” of just their value \(x_i \in [D]\) and then running the bit-sum protocol D times, once for each value \(j \in [D]\), which incurs error \(O(\frac{1}{\varepsilon } \sqrt{ \log \frac{1}{\delta } \log D})\). But in the central model, this problem can be solved to error \(O( \min \{ \log \frac{1}{\delta }, \log D \})\), which is optimal (see, e.g. [32]). Thus, the central and one-message shuffled models are qualitatively different with respect to computing histograms: D may be infinite in the former whereas D must be bounded in the latter.
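The D-fold reduction described above is mechanical. The sketch below shows the one-hot encoding and the per-value invocation, with a non-private placeholder standing in for the bit-sum protocol (whose actual randomizer and parameters are defined earlier in the paper).

```python
import numpy as np

def histogram_via_bitsums(x, D, bitsum):
    """Estimate the histogram of x over {0, ..., D-1} by one-hot encoding
    each user's value and invoking a bit-sum protocol once per element.

    `bitsum` stands in for the paper's shuffled bit-sum protocol P^{0/1};
    any estimator of a sum of n bits can be plugged in.
    """
    onehot = np.eye(D, dtype=int)[x]   # row i is the one-hot encoding of x_i
    return np.array([bitsum(onehot[:, j]) for j in range(D)])

exact_sum = lambda bits: int(bits.sum())  # non-private placeholder
x = np.array([3, 1, 3, 0, 2, 3])
print(histogram_via_bitsums(x, D=4, bitsum=exact_sum))  # [1 1 1 3]
```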