1 Introduction

Individuals are producing ever-growing amounts of personal data, which in turn fuel innovative services based on machine learning (ML). The classic centralized paradigm consists of collecting, storing and analyzing this data on a (supposedly trusted) central server or in the cloud, which poses well documented privacy risks for the users. With increasing public awareness and regulation, we are witnessing a shift towards a more decentralized paradigm where personal data remains on each user’s device, as can be seen from the growing popularity of federated learning (Kairouz et al., 2021b). In this setting, users typically do not trust the central server (if any), or each other, which introduces new issues regarding privacy and security. First, the information shared by users during the decentralized training protocol can reveal a lot about their private data (see Geiping et al., 2020; Melis et al., 2019; Nasr et al., 2019 for inference attacks on federated learning). Formal guarantees such as differential privacy (DP) (Dwork, 2006) are needed to provably mitigate this and convince users to participate. Second, malicious users may send incorrect results to bias the learned model in arbitrary ways (Bagdasaryan et al., 2020; Bhagoji et al., 2019; Hayes & Ohrimenko, 2018). Ensuring the correctness of the computation is crucial to persuade service providers to move to a more decentralized and privacy-friendly setting.

In this work, we tackle these challenges in the context of private distributed averaging. In this canonical problem, the objective is to privately compute an estimate of the average of values owned by many users who do not want to disclose them. Beyond simple data analytics, distributed averaging is of high relevance to modern ML. Indeed, it is the key primitive used to aggregate user updates in gradient-based distributed and federated learning algorithms (Agarwal et al., 2018; Jayaraman et al., 2018; Kairouz et al., 2021b; Lin et al., 2020; McMahan et al., 2017; Stich, 2019). It also makes it possible to train ML models whose sufficient statistics are averages (e.g., linear models and decision trees). Distributed averaging with differential privacy guarantees has thus attracted a lot of interest in recent years. In the strong model of local differential privacy (LDP) (Chen et al., 2020; Duchi et al., 2013; Kairouz et al., 2015, 2016a, b; Kasiviswanathan et al., 2008), each user randomizes its input locally before sending it to an untrusted aggregator. Unfortunately, the best possible error for the estimated average with n users is a factor of order \(\sqrt{n}\) larger than in the centralized model of DP, where a trusted curator aggregates data in the clear and perturbs the output (Chan et al., 2012a). To fill this gap, some work has explored relaxations of LDP that make it possible to match the utility of the trusted curator model. This is achieved through the use of cryptographic primitives such as secure aggregation (Bonawitz et al., 2017; Chan et al., 2012b; Dwork et al., 2006; Jayaraman et al., 2018; Shi et al., 2011) and secure shuffling (Balle et al., 2020; Cheu et al., 2019; Erlingsson et al., 2019; Ghazi et al., 2020). However, many of these solutions assume that all users truthfully follow the protocol (they are honest-but-curious) and/or give significant power (the ability to reveal sensitive data) to a small number of servers. Furthermore, their practical implementation poses important challenges when the number of parties is large: for instance, the popular secure aggregation approach of Bonawitz et al. (2017) requires all \(O(n^2)\) pairs of users to exchange messages.

In this context, our contribution is fourfold.

  • First, we propose Gopa, a novel decentralized differentially private averaging protocol that relies on users exchanging (directly or through a server) pairwise-correlated Gaussian noise terms along the edges of a graph so as to mask their private values without affecting the global average. This noise, which ultimately cancels out in the average, is complemented by the addition of independent (non-canceling) Gaussian noise by each user.

  • Second, we analyze the differential privacy guarantees of Gopa. Remarkably, we establish that our approach can achieve nearly the same privacy-utility trade-off as a trusted curator who would average the values of honest users, provided that the graph of honest-but-curious users is connected and the pairwise-correlated noise variance is large enough. In particular, for \(n_H\) honest-but-curious users and any fixed DP guarantee, the variance of the estimated average is only \(n_H/(n_H-1)\) times larger than with a trusted curator, a factor which goes to 1 as \(n_H\) grows. We further show that if the graph is well-connected, the pairwise-correlated noise variance can be significantly reduced.

  • Third, to ensure both scalability and robustness to malicious users, we propose a randomized procedure in which each user communicates with only a logarithmic number of other users while still matching the privacy-utility trade-off of the trusted curator. Our analysis is novel and requires leveraging and adapting results from random graph theory on embedding spanning trees in random graphs. Additionally, we show that our protocol is robust to a fraction of users dropping out.

  • Finally, we propose a procedure to make Gopa verifiable by untrusted external parties, i.e., to enable users to prove the correctness of their computations without compromising the efficiency or the privacy guarantees of the protocol. To the best of our knowledge, we are the first to propose such a procedure. It offers a strong preventive countermeasure against various attacks such as data poisoning or protocol deviations aimed at reducing utility. Our construction relies on commitment schemes and zero knowledge proofs (ZKPs), which are very popular in auditable electronic payment systems and cryptocurrencies. These cryptographic primitives scale well both in communication and computational requirements and are perfectly suitable in our untrusted decentralized setting. We use classic ZKPs to design a procedure for the generation of noise with verifiable distribution, and ultimately to prove the correctness of the final computation (or detect malicious users who did not follow the protocol). Crucially, the privacy guarantees of the protocol are not compromised by this procedure, while the integrity of the computation relies on a standard discrete logarithm assumption. In the end, we argue that our protocol offers correctness guarantees that are essentially equivalent to the case where a trusted curator would hold the private data of users.

The paper is organized as follows. Section 2 introduces the problem setting. We discuss the related work in more detail in Sect. 3. The Gopa protocol is introduced in Sect. 4 and we analyze its differential privacy guarantees in Sect. 5. We present our procedure to ensure correctness against malicious behavior in Sect. 6, and summarize computational and communication costs in Sect. 7. Finally, we present some experimental results in Sect. 8 and conclude with future lines of research in Sect. 9.

2 Notations and setting

We consider a set \(U=\{1,\dots ,n\}\) of \(n\ge 3\) users (parties). Each user \(u\in U\) holds a private value \(X_u\), which can be thought of as being computed from the private dataset of user u. We assume that \(X_u\) lies in a bounded interval of \({\mathbb {R}}\) (without loss of generality, we assume \(X_u\in [0, 1]\)). The extension to the vector case is straightforward. We denote by X the column vector \(X=[X_1,\dots ,X_n]^\top \in [0,1]^n\) of private values. Unless otherwise noted, all vectors are column vectors. Users communicate over a network represented by a connected undirected graph \(G=(U,E)\), where \(\{u,v\}\in E\) indicates that users u and v are neighbors in G and can exchange secure messages. For a given user u, we denote by \(N(u)=\{v : \{u,v\}\in E\}\) the set of its neighbors. We note that in settings where users can only communicate with a server, the latter can act as a relay that forwards (encrypted and authenticated) messages between users, as done in secure aggregation (Bonawitz et al., 2017).

The users aim to collaboratively estimate the average value \(X^{avg}= \frac{1}{n}\sum _{u=1}^n X_u\) without revealing their individual private values. Such a protocol can be readily used to privately execute distributed ML algorithms that interact with data through averages over values computed locally by the participants, but do not actually need to see the individual values. We give two concrete examples below.

Example 1

(Linear regression) Let \(\uplambda \ge 0\) be a public parameter. Each user u holds a private feature vector \(\phi _u=[\phi _u^1,\dots ,\phi _u^d]\in {\mathbb {R}}^d\) and a private label \(y_u\in {\mathbb {R}}\). The goal is to solve a ridge regression task, i.e. find \(\theta ^*\in \mathop {\mathrm {arg\,min}}\limits _\theta \frac{1}{n}\sum _{u\in U} (\phi _u^\top \theta - y_u)^2 + \uplambda \Vert \theta \Vert ^2\). \(\theta ^*\) can be computed in closed form from the quantities \(\frac{1}{n}\sum _{u\in U} \phi _u^iy_u\) and \(\frac{1}{n}\sum _{u\in U}\phi _u^i\phi _u^j\) for all \(i,j\in \{1,\dots ,d\}\).
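To make Example 1 concrete, here is a minimal Python sketch (ours, not part of the original protocol) showing that the ridge solution only needs the averaged sufficient statistics \(\frac{1}{n}\sum _{u}\phi _u\phi _u^\top\) and \(\frac{1}{n}\sum _{u}\phi _uy_u\); the in-the-clear `np.mean` calls below stand in for a private averaging primitive such as Gopa applied coordinate-wise:

```python
import numpy as np

def ridge_from_averages(avg_gram, avg_xy, lam):
    """Closed-form ridge solution: theta* = (avg_gram + lam * I)^{-1} avg_xy."""
    d = avg_gram.shape[0]
    return np.linalg.solve(avg_gram + lam * np.eye(d), avg_xy)

rng = np.random.default_rng(0)
n, d, lam = 100, 3, 0.1
phi = rng.normal(size=(n, d))                       # user u's private features phi_u
y = phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)   # private labels y_u

# Each of these quantities is an average over users, so it could be estimated
# with a (differentially private) distributed averaging protocol instead.
avg_gram = np.mean([np.outer(phi[u], phi[u]) for u in range(n)], axis=0)
avg_xy = np.mean([phi[u] * y[u] for u in range(n)], axis=0)

theta_star = ridge_from_averages(avg_gram, avg_xy, lam)
print(theta_star)
```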

Example 2

(Federated ML) In federated learning (Kairouz et al., 2021b) and distributed empirical risk minimization, each user u holds a private dataset \({\mathcal {D}}_u\) and the goal is to find \(\theta ^*\) such that \(\theta ^*\in \mathop {\mathrm {arg\,min}}\limits _\theta \frac{1}{n}\sum _{u\in U}f(\theta ; {\mathcal {D}}_u)\) where f is some loss function. Popular algorithms (Agarwal et al., 2018; Jayaraman et al., 2018; Lin et al., 2020; McMahan et al., 2017; Stich, 2019) all follow the same high-level procedure: at round t, each user u computes a local update \(\theta _u^t\) based on \({\mathcal {D}}_u\) and the current global model \(\theta ^{t-1}\), and the updated global model is computed as \(\theta ^t=\frac{1}{n}\sum _u \theta _u^t\).
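A schematic sketch (ours) of one round of this generic procedure; `grad_fn` and `private_average` are hypothetical placeholders for the user-side update rule and the averaging primitive, which Gopa could instantiate:

```python
import numpy as np

def private_average(vectors):
    """Placeholder for a private averaging primitive such as Gopa."""
    return np.mean(vectors, axis=0)

def federated_round(theta, datasets, lr=0.1):
    # Each user u computes a local update theta_u^t from its own data,
    # then only the average of the updates is released.
    updates = [theta - lr * D["grad_fn"](theta) for D in datasets]
    return private_average(updates)

# Toy usage: two users with quadratic losses centered at different points.
datasets = [{"grad_fn": lambda th, c=c: th - c}
            for c in (np.array([0.0]), np.array([2.0]))]
theta = np.array([5.0])
for _ in range(50):
    theta = federated_round(theta, datasets)
print(theta)   # converges towards the average of the two minimizers, [1.0]
```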

Threat model We consider two commonly adopted adversary models formalized by Goldreich (1998) and used in the design of many secure protocols. An honest-but-curious (honest for short) user will follow the protocol specification, but may use all the information obtained during the execution to infer information about other users. An honest user may accidentally drop out at any point of the execution (in a way that is independent of the private values X). On the other hand, a malicious user may deviate from the protocol execution (e.g., sending incorrect values or dropping out on purpose). Malicious users can collude, and will thus be seen as a single malicious party (the adversary) who has access to all information collected by malicious users. Our privacy guarantees will hold under the assumption that honest users communicate through secure channels, while the correctness of our protocol will be guaranteed under some form of the Discrete Logarithm Assumption (DLA), a standard assumption in cryptography.

For a given execution of the protocol, we denote by \(U^O\) the set of the users who remained online until the end (i.e., did not drop out). Users in \(U^O\) are either honest or malicious: we denote by \(U^H\subseteq U^O\) those who are honest, by \(n_H=|U^{H}|\) their number and by \(\rho =n_H/n\) their proportion with respect to the total number of users. We also denote by \(G^H= (U^{H}, E^{H})\) the subgraph of G induced by the set of honest users \(U^{H}\), i.e., \(E^{H}= \{\{u,v\} \in E : u,v \in U^{H}\}\). The properties of G and \(G^H\) will play a key role in the privacy and scalability guarantees of our protocol.

Privacy definition Our goal is to design a protocol that satisfies differential privacy (DP) (Dwork, 2006), which has become a gold standard in private information release.

Definition 1

(Differential privacy) Let \(\varepsilon >0,\delta \ge 0\). A (randomized) protocol \({\mathcal {A}}\) is \((\varepsilon ,\delta )\)-differentially private if for all neighboring datasets \(X=[X_1,\dots ,X_n]\) and \(X'=[X'_1,\dots ,X'_n]\) differing only in a single data point, and for all sets of possible outputs \({\mathcal {O}}\), we have:

$$\begin{aligned} \Pr ({\mathcal {A}} (X) \in {\mathcal {O}}) \le e^ {\varepsilon }\Pr ({\mathcal {A}} ( X') \in {\mathcal {O}}) + \delta . \end{aligned}$$
(1)

3 Related work

In this section we review the most important work related to ours. A set of key approaches together with their main features are summarized in Table 1.

Table 1 Comparison of GOPA with previous DP averaging approaches in terms of their communication cost per party, mean squared error (MSE), verifiability (Verif) and additional risks

Distributed averaging is a key subroutine in distributed and federated learning (Agarwal et al., 2018; Jayaraman et al., 2018; Lin et al., 2020; McMahan et al., 2017; Kairouz et al., 2021b; Stich, 2019). Therefore, any improvement in the privacy-utility-communication trade-off for averaging implies gains for many ML approaches downstream.

Local differential privacy (LDP) (Duchi et al., 2013; Kairouz et al., 2015, 2016a, b; Kasiviswanathan et al., 2008) requires users to locally randomize their input before they send it to an untrusted aggregator. This very strong model of privacy comes at a significant cost in utility: the best possible mean squared error is of order \(1/n^2\) in the trusted curator model while it is of order 1/n in LDP (Chan et al., 2012a; Chen et al., 2020). This limits the usefulness of the local model to industrial settings where the number of participants is huge (Erlingsson et al., 2014; Ding et al., 2017). Our approach belongs to the recent line of work which attempts to relax the LDP model so as to improve utility without relying on a trusted curator (or, similarly, on a small fixed set of parties).

Previous work considered the use of cryptographic primitives like secure aggregation protocols, which can be used to compute the (exact) average of private values (Bonawitz et al., 2017; Chan et al., 2012b; Dwork et al., 2006; Shi et al., 2011). While secure aggregation makes it possible in principle to recover the utility of the trusted curator model, it suffers from three main drawbacks. Firstly, existing protocols require \(\varOmega (n)\) communication per party, which is hardly feasible beyond a few hundred or thousand users. In contrast, we propose a protocol which requires only \(O(\log n)\) communication. Secondly, combining secure aggregation with DP is nontrivial as the noise must be added in a distributed fashion and in the discrete domain. Existing complete systems (Agarwal et al., 2021; Kairouz et al., 2021a) assume an ideal secure aggregation functionality which does not reflect the impact of colluding/malicious users. In these more challenging settings, it is not clear how to add the necessary noise for DP and what the resulting privacy/utility trade-offs would be. Alternatively, Jayaraman et al. (2018) add the noise within the secure protocol but rely on two non-colluding servers. Thirdly, most of the above schemes are not verifiable. One exception is the verifiable secret sharing approach of Dwork et al. (2006), which again induces \(\varOmega (n)\) communication. Finally, we note that secure aggregation typically uses uniformly distributed pairwise masks, hence a single residual term completely destroys the utility. In contrast, we use Gaussian pairwise masks that have zero mean and bounded variance, which provides more robustness but requires the more involved privacy analysis we present in Sect. 5.

Recently, the shuffle model of privacy (Balle et al., 2020; Cheu et al., 2019; Erlingsson et al., 2019; Ghazi et al., 2020; Hartmann & West, 2019), where inputs are passed to a trusted/secure shuffler that obfuscates the source of the messages, has been studied theoretically as an intermediate point between the local and trusted curator models. For differentially private averaging, the shuffle model makes it possible to match the utility of the trusted curator setting (Balle et al., 2020). However, practical implementations of secure shuffling are not discussed in these works. Existing solutions typically rely on multiple layers of routing servers (Dingledine et al., 2004) with high communication overhead and non-collusion assumptions. Anonymous communication is also potentially at odds with the identification of malicious parties. To the best of our knowledge, all protocols for averaging in the shuffle model assume honest-but-curious parties.

The protocol proposed in Imtiaz et al. (2021) uses correlated Gaussian noise to achieve trusted curator utility for averaging, but the noise is only correlated at the global level (i.e., the noise terms sum to zero over all users). Generating such noise actually requires a call to a secure aggregation primitive, which incurs \(\varOmega (n)\) communication per party as discussed above. In contrast, our pairwise-canceling noise terms can be generated with only \(O(\log n)\) communication. Furthermore, Imtiaz et al. (2021) assume honest parties, while our protocol is robust to malicious participants.

In summary, an original aspect of our work is to match the privacy-utility trade-off of the trusted curator model at a relatively low cost, without requiring trust in a fixed small set of parties. By spreading trust over sufficiently many parties, we ensure that even in the unlikely case where many parties collude, they will not be able to infer much sensitive information, reducing the incentive to collude. We are not aware of other differential privacy work sharing this feature. Overall, our protocol provides a unique combination of three important properties: (a) utility of the same order as in the trusted curator setting, (b) logarithmic communication per user, and (c) robustness to malicious users.

4 Proposed protocol

In this section we describe our protocol called Gopa (GOssip noise for Private Averaging). The high-level idea of Gopa is to have each user u mask its private value by adding two different types of noise. The first type is a sum of pairwise-correlated noise terms \(\varDelta _{u,v}\) over the set of neighbors \(v\in N(u)\) such that each \(\varDelta _{u,v}\) cancels out with the \(\varDelta _{v,u}\) of user v in the final result. The second type of noise is an independent term \(\eta _u\) which does not cancel out. At the end of the protocol, each user has generated a noisy version \({\hat{X}}_u\) of its private value \(X_u\), which takes the following form:

$$\begin{aligned} {\hat{X}}_u = X_u + \textstyle \sum _{v\in N(u)} \varDelta _{u,v} + \eta _u. \end{aligned}$$
(2)

Algorithm 1 presents the detailed steps. Neighboring nodes \(\{u,v\}\in E\) contact each other to draw a real number from the Gaussian distribution \({\mathcal {N}}(0,\sigma _{\varDelta} ^2)\), which u adds to its private value and v subtracts. Intuitively, each user thereby distributes noise masking its private value across its neighbors so that even if some of them are malicious and collude, the remaining noise values will be enough to provide the desired privacy guarantees. The idea is reminiscent of uniformly random pairwise masks in secure aggregation (Bonawitz et al., 2017), but we use Gaussian noise and restrict exchanges to the edges of the graph instead of requiring messages between all pairs of users. As in gossip algorithms (Boyd et al., 2006), the pairwise exchanges can be performed asynchronously and in parallel. Additionally, every user \(u \in U\) adds an independent noise term \(\eta _u\sim {\mathcal {N}}{(0, \sigma _\eta ^2)}\) to its private value. This noise will ensure that the final estimate of the average satisfies differential privacy (see Sect. 5). The pairwise and independent noise variances \(\sigma _{\varDelta} ^2\) and \(\sigma _\eta ^2\) are public parameters of the protocol.

[Algorithm 1: pseudo-code of the Gopa protocol]
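The following Python sketch (our own illustration, not the paper's reference implementation) simulates Eq. (2) on a given graph: every edge draws one pairwise term that one endpoint adds and the other subtracts, each user adds independent Gaussian noise, and only the independent noise survives in the average:

```python
import numpy as np

def gopa_noisy_values(X, edges, sigma_delta, sigma_eta, seed=0):
    """Simulate Gopa: hat{X}_u = X_u + sum_{v in N(u)} Delta_{u,v} + eta_u."""
    rng = np.random.default_rng(seed)
    X_hat = np.array(X, dtype=float)
    for (u, v) in edges:                          # one exchange per edge {u, v}
        delta = rng.normal(0.0, sigma_delta)      # Delta_{u,v} = -Delta_{v,u}
        X_hat[u] += delta
        X_hat[v] -= delta
    X_hat += rng.normal(0.0, sigma_eta, size=len(X))   # independent noise eta_u
    return X_hat

# Toy run on a ring graph: the pairwise noise cancels exactly in the average.
n = 1000
X = np.random.default_rng(1).uniform(0, 1, size=n)
edges = [(u, (u + 1) % n) for u in range(n)]
X_hat = gopa_noisy_values(X, edges, sigma_delta=10.0, sigma_eta=1.0 / np.sqrt(n))
print(abs(X_hat.mean() - X.mean()))    # fluctuation of order sigma_eta / sqrt(n)
```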

Utility of Gopa The protocol generates a set of noisy values \({\hat{X}}=[{\hat{X}}_1,\dots ,{\hat{X}}_n]^\top\) which are then publicly released. They can be sent to an untrusted aggregator, or averaged in a decentralized way via gossiping (Boyd et al., 2006). In any case, the estimated average is given by \({\hat{X}}^{avg} = \frac{1}{n}\sum _{u\in U} {\hat{X}}_u = X^{avg}+ \frac{1}{n}\sum _{u \in U} \eta _u,\) which has expected value \(X^{avg}\) and variance \(\sigma _\eta ^2/n\). Recall that the local model of DP, where each user releases a locally perturbed input without communicating with other users, would require \(\sigma _\eta ^2=O(1)\). In contrast, we would like the total amount of independent noise to be of order \(O(1/n_H)\) as needed to protect the average of honest users with the standard Gaussian mechanism in the trusted curator model of DP (Dwork & Roth, 2014). We will show in Sect. 5 that we can achieve this privacy-utility trade-off by choosing an appropriate variance \(\sigma _{\varDelta} ^2\) for our pairwise noise terms.

Dealing with dropout A user \(u\notin U^{O}\) who drops out during the execution of the protocol does not actually publish any noisy value (i.e., \({\hat{X}}_u\) is empty). The estimated average is thus computed by averaging only over the noisy values of users in \(U^{O}\). Additionally, any residual noise term that a user \(u\notin U^{O}\) may have exchanged with a user \(v\in U^{O}\) before dropping out can be “rolled back” by having v reveal \(\varDelta _{u,v}\) so it can be subtracted from the result (we will ensure this does not threaten privacy by having sufficiently many neighbors, see Sect. 5.4). We can thus obtain an estimate of \(\frac{1}{|U^{O}|}\sum _{u\in U^{O}} X_u\) with variance \(\sigma _\eta ^2/|U^{O}|\). Note that even if some residual noise terms are not rolled back, e.g. to avoid extra communication, the estimate remains unbiased (with a larger variance that depends on \(\sigma _{\varDelta} ^2\)). This is a rather unique feature of Gopa which comes from the use of Gaussian noise rather than the uniformly random noise used in secure aggregation (Bonawitz et al., 2017). We discuss strategies to handle users dropping out in more detail in Appendix B.2.
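As a small illustration of the rollback mechanism (a sketch of ours, assuming every residual pairwise term exchanged with a dropped user is revealed by its online neighbor), the aggregation step simply subtracts those residual terms before averaging:

```python
def average_with_rollback(X_hat, online, residual):
    """Average over online users after rolling back residual pairwise noise.

    residual[v] is the sum of the pairwise terms Delta_{v,u} that online user v
    exchanged with users u who dropped out; since they no longer cancel with
    anyone, v reveals them so they can be removed from its noisy value.
    """
    corrected = [X_hat[v] - residual.get(v, 0.0) for v in online]
    return sum(corrected) / len(corrected)

# Toy usage: user 2 dropped out after exchanging Delta_{0,2} = 0.7 with user 0.
X_hat = {0: 0.4 + 0.7, 1: 0.6}           # user 0's noisy value still carries 0.7
print(average_with_rollback(X_hat, online=[0, 1], residual={0: 0.7}))   # 0.5
```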

5 Privacy guarantees

Our goal is to prove differential privacy guarantees for Gopa. First, we develop in Sect. 5.1 a general result providing privacy guarantees as a function of the structure of the communication graph \(G^H\), i.e., the subgraph of G induced by \(U^H\). Then, in Sects. 5.2 and 5.3, we study the special cases of the path graph and the complete graph respectively, showing that they are the worst and best cases in terms of privacy. Yet, we show that as long as \(G^H\) is connected and the variance \(\sigma _{\varDelta} ^2\) of the pairwise (canceling) noise is large enough, Gopa can (nearly) match the privacy-utility trade-off of the trusted curator setting. In Sect. 5.4, we propose a randomized procedure to construct the graph G and show that it strikes a good balance between privacy and communication costs. In each section, we first discuss the result and its consequences, and then present the proof. In Sect. 5.5, we summarize our results and provide further discussion.

5.1 Effect of the communication structure on privacy

The strength of the privacy guarantee we can prove depends on the communication graph \(G^H\) over honest users. Intuitively, this is because the more terms \(\varDelta _{u,v}\) a given honest user u exchanges with other honest users v, the more he/she spreads his/her secret over others and the more difficult it becomes to estimate the private value \(X_u\). We first introduce in Sect. 5.1.1 a number of preliminary concepts. Next, in Sect. 5.1.2, we prove an abstract result, Theorem 1, which gives DP guarantees for Gopa that depend on the choice of a labeling t of the graph \(G^H\).

In Sect. 5.1.3 we discuss a number of implications of Theorem 1 which provide some insight into the dependency between the structure of \(G^H\) and the privacy of Gopa, and will turn out helpful in the proofs of Theorems 2–4.

5.1.1 Preliminary concepts

Recall that each user \(u\in U^{O}\) who does not drop out generates \({\hat{X}}_u\) from its private value \(X_u\) by adding pairwise noise terms \({\bar{\varDelta }}_u=\sum _{v\in N(u)} \varDelta _{u,v}\) (with \(\varDelta _{u,v}+\varDelta _{v,u}=0\)) as well as independent noise \(\eta _u\). All random variables \(\varDelta _{u,v}\) (with \(u<v\)) and \(\eta _u\) are independent. We thus have the system of linear equations

$$\begin{aligned} {{\hat{X}}} = X+{\bar{\varDelta }}+\eta , \end{aligned}$$

where \({\bar{\varDelta }}=({\bar{\varDelta }}_u)_{u\in U^{O}}\) and \(\eta =(\eta _u)_{u\in U^{O}}\).

We now define the knowledge acquired by the adversary (colluding malicious users) during a given execution of the protocol. It consists of the following:

  (i) the noisy value \({\hat{X}}_u\) of all users \(u\in U^{O}\) who did not drop out,

  (ii) the private value \(X_u\) and the noise \(\eta _u\) of the malicious users, and

  (iii) all \(\varDelta _{u,v}\)’s for which u or v is malicious.

We also assume that the adversary knows the full network graph G and all the pairwise noise terms exchanged by dropped out users (since they can be rolled back, as explained in Sect. 4). The only unknowns are thus the private value \(X_u\) and independent noise \(\eta _u\) of each honest user \(u\in U^H\), as well as the \(\varDelta _{u,v}\) values exchanged between pairs of honest users \(\{u,v\}\in E^H\).

Letting \(N^H(u)=\{v : \{u,v\}\in E^H\}\), from the above knowledge the adversary can subtract \(\sum _{v \in N(u) \setminus N^H(u)} \varDelta _{u,v}\) from \({\hat{X}}_u\) to obtain

$$\begin{aligned} {\hat{X}}_u^H = X_u + \sum _{v \in N^H (u)} \varDelta _{u,v} + \eta _u \end{aligned}$$

for every honest user \(u \in U^H\), i.e., \({{\hat{X}}}^H_u = {{\hat{X}}}_u - \sum _{v\in N(u)\setminus N^H(u)} \varDelta _{u,v}\). The view of the adversary can thus be summarized by the vector \({\hat{X}}^H = ({\hat{X}}^H_u)_{u \in U^H}\) and the correlation between its elements. Let \(X^H=(X_u)_{u\in U^H}\) be the vector of private values restricted to the honest users and similarly \(\eta ^H=(\eta _u)_{u\in U^H}\).

Let the directed graph \((U^H,{{\vec {E}}^H})\) be an arbitrary orientation of the undirected graph \(G^H=(U^H,E^H)\), i.e., for every edge \(\{u,v\}\in E^H\), the set \({{\vec {E}}^H}\) either contains the arc (uv) or the arc (vu). For every arc \((u,v)\in {{\vec {E}}^H}\), let \(\varDelta _{(u,v)} = \varDelta _{u,v} = -\varDelta _{v,u}\). Let \(\varDelta ^H=(\varDelta _e^H)_{e\in {{\vec {E}}^H}}\) be a vector of pairwise noise values indexed by arcs from \({{\vec {E}}^H}\). Let \(K\in {\mathbb {R}}^{U^H\times {{\vec {E}}^H}}\) denote the oriented incidence matrix of the graph \(G^H\), i.e., for \((u,v)\in {{\vec {E}}^H}\) and \(w\in U^H\setminus \{u,v\}\) there holds \(K_{u,(u,v)}=-1\), \(K_{v,(u,v)}=1\) and \(K_{w,(u,v)}=0\). In this way, we can rewrite the system of linear equations as

$$\begin{aligned} {{\hat{X}}}^H = X^H+K\varDelta ^H+\eta ^H. \end{aligned}$$
(3)

Now, adapting differential privacy (Definition 1) to our setting, for any input X and any possible outcome \({\hat{X}}\), we need to compare the probability of the outcome being equal to \({\hat{X}}\) when a (non-malicious) user \(v_1\in U\) participates in the computation with private value \(X_{v_1}^A\) to the probability of obtaining the same outcome when the value of \(v_1\) is exchanged with an arbitrary value \(X_{v_1}^B\in [0,1]\). Since honest users drop out independently of X and do not reveal anything about their private value when they drop out, in our analysis we will fix an execution of the protocol where some set \(U^{H}\) of \(n_H\) honest users have remained online until the end of the protocol. For notational simplicity, we denote by \(X^A\) the vector of private values \((X_u)_{u \in U^H}\) of these honest users in which user \(v_1\) has value \(X_ {v_1}^A\), and by \(X^B\) the vector where \(v_1\) has value \(X_{v_1}^B\). \(X^A\) and \(X^B\) differ only in the \(v_1\)-th coordinate, and their maximum difference is 1.

All noise variables are zero mean, so the expectation and covariance matrix of \({\hat{X}}^H\) are respectively given by:

$$\begin{aligned} {\mathbb {E}}\left[ {{\hat{X}}}^H\right] = X^H,\quad \hbox {var}\left( {{\hat{X}}}^H\right) = \sigma _\eta ^2 I_{U^H} + \sigma _{\varDelta} ^2 L, \end{aligned}$$

where \(I_{U^H}\in {\mathbb {R}}^{n_H\times n_H}\) is the identity matrix and \(L=KK^\top\) is the graph Laplacian matrix of \(G^H\).
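This expression can be checked numerically; the sketch below (ours) builds the oriented incidence matrix K on a small honest subgraph and compares the empirical covariance of the noise part of \({\hat{X}}^H\) with \(\sigma _\eta ^2 I + \sigma _{\varDelta} ^2 L\):

```python
import numpy as np

def incidence_matrix(n, arcs):
    """Oriented incidence matrix: K[u, e] = -1 and K[v, e] = +1 for arc e = (u, v)."""
    K = np.zeros((n, len(arcs)))
    for e, (u, v) in enumerate(arcs):
        K[u, e], K[v, e] = -1.0, 1.0
    return K

rng = np.random.default_rng(0)
n, arcs = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]        # a small honest subgraph
sigma_eta, sigma_delta = 0.5, 2.0
K = incidence_matrix(n, arcs)

N = 200_000
eta = rng.normal(0.0, sigma_eta, size=(N, n))
delta = rng.normal(0.0, sigma_delta, size=(N, len(arcs)))
noise = delta @ K.T + eta                            # noise part of hat{X}^H
emp_cov = np.cov(noise.T)
theory = sigma_eta**2 * np.eye(n) + sigma_delta**2 * (K @ K.T)
print(np.abs(emp_cov - theory).max())                # small (sampling error only)
```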

Now consider the real vector space \(Z= {\mathbb {R}}^{n_H}\times {\mathbb {R}}^{|E^H|}\) of all possible value pairs \((\eta ^H,\varDelta ^H)\) of noise vectors of honest users. For the sake of readability, in the remainder of this section we will often drop the superscript H and write \((\eta ,\varDelta )\) when it is clear from the context that we work in the space Z.

Let

$$\begin{aligned} {\varSigma ^{(g)}}= \left[ \begin{array}{cc}\sigma _\eta ^2 I_{U^H} & 0 \\ 0 & \sigma _{\varDelta} ^2 I_{E^H}\end{array}\right] , \end{aligned}$$

and let \(\varSigma ^{(-g)}=\left( {\varSigma ^{(g)}}\right) ^{-1}\). We then have the following joint probability density of independent Gaussians:

$$\begin{aligned} P((\eta ,\varDelta )) = C_1\exp \left( -\frac{1}{2}(\eta ,\varDelta )^\top \varSigma ^{(-g)}(\eta ,\varDelta )\right) , \end{aligned}$$

where \(C_1=(2\pi )^{-(n_H+|E^H|)/2}|{\varSigma ^{(g)}}|^{-1/2}\).

Consider the following subspaces of Z:

$$\begin{aligned} Z^A &= \{(\eta ,\varDelta )\in Z \mid \eta +K \varDelta ={{\hat{X}}}^H-X^A\}, \\ Z^B &= \{(\eta ,\varDelta )\in Z \mid \eta +K \varDelta ={{\hat{X}}}^H-X^B\}. \end{aligned}$$

Assume that the (only) vertex for which \(X^A\) and \(X^B\) differ is \(v_1\). Recall that without loss of generality, private values are in the interval [0, 1]. Hence, if we set \(X^A_{v_1}-X^B_{v_1}=1\) then \(X^A\) and \(X^B\) are maximally apart and also the difference between \(P({{\hat{X}}}\mid X^A)\) and \(P({{\hat{X}}}|X^B)\) will be maximal.

Now choose any vector \(t=(t_\eta ,t_{\varDelta} )\in Z\) with \(t_\eta = (t_u)_{u\in U^H}\) and \(t_{\varDelta} = (t_e)_{e\in E^H}\) such that \(t_\eta +Kt_{\varDelta} = X^A-X^B\). It follows that \(Z^B = Z^A + t\), i.e., \(Y\in Z^A\) if and only if \(Y+t \in Z^B\). This linear constraint does not uniquely determine t: later we will choose t more precisely, in a way which is most convenient to prove our privacy claims.

In particular, for any \({\hat{X}}^H\) we have that \(X^A + \eta + K\varDelta = {\hat{X}}^H\) if and only if \(X^B + (\eta + t_\eta ) + K(\varDelta + t_{\varDelta} ) = {\hat{X}}^H\). The key idea is that appropriately choosing t allows us to map any noise \((\eta ,\varDelta )\) which results in observing \({{\hat{X}}}^H\) given dataset \(X^A\) on a similarly likely noise \((\eta +t_\eta ,\varDelta +t_{\varDelta} )\) which results in observing \({{\hat{X}}}^H\) given the adjacent dataset \(X^B\).

We illustrate the meaning of t using the example of Fig. 1. Consider the graph \(G^H\) of honest users shown in Fig. 1a and databases \(X^A\) and \(X^B\) as defined above. The difference of neighboring databases \(X^A - X^B\) is shown in Fig. 1b. Figure 1c illustrates a possible assignment of t, where \(t_\eta = (\frac{1}{9}, \dots , \frac{1}{9})\) and \(t_{\varDelta} = (\frac{5}{9}, \frac{3}{9}, \frac{3}{9}, \frac{1}{9}, \dots , \frac{1}{9}, 0, 0)\).

One can see that \(t=(t_\eta ,t_{\varDelta} )\) can be interpreted as a flow on \(G^H\) where \(t_{\varDelta}\) represents the values flowing through edges, and \(t_\eta\) represents the extent to which vertices are sources or sinks. The requirement \(t_\eta + K t_{\varDelta} = X^A- X^B\) means that for a given user u we have \(\sum _{(v,u) \in {{\vec {E}}^H}} t_{(v,u)} - \sum _{(u,v)\in {{\vec {E}}^H}} t_{(u,v)} = X^A_u - X^B_u-t_u\), which can be interpreted as the property of a flow, i.e., the value of incoming edges minus the value of outgoing edges equals the extent to which the vertex is a source or sink (here \(-t_u\) except for \(v_1\) where it is \(1-t_{v_1}\)). We will use this flow t to first distribute \(X^A-X^B\) as equally as possible over all users, and secondly to avoid huge flows through edges. For example, in Fig. 1c, we choose \(t_\eta\) in such a way that the difference \(X^A_{v_1} - X^B_{v_1} = 1\) is spread over a difference of 1/9 at each node, and \(t_{\varDelta}\) is chosen to let a value 1/9 flow from each of the vertices in \(U^H \setminus \{v_1\}\) to \(v_1\), making the flow consistent.
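The flow constraint can be verified numerically; the sketch below (ours, taking for simplicity a star graph centered at \(v_1\) as the honest subgraph, with the same choice of t as in the complete-graph case of Sect. 5.3) checks that \(t_\eta + K t_{\varDelta}\) equals the indicator vector of \(v_1\):

```python
import numpy as np

def incidence_matrix(n, arcs):
    """K[u, e] = -1 and K[v, e] = +1 for each arc e = (u, v), as in the text."""
    K = np.zeros((n, len(arcs)))
    for e, (u, v) in enumerate(arcs):
        K[u, e], K[v, e] = -1.0, 1.0
    return K

n_H, v1 = 9, 0
arcs = [(u, v1) for u in range(1, n_H)]      # star graph, arcs oriented towards v1
K = incidence_matrix(n_H, arcs)

t_eta = np.full(n_H, 1.0 / n_H)              # spread the unit difference evenly
t_delta = np.full(len(arcs), 1.0 / n_H)      # route 1/n_H from every vertex to v1

diff = np.zeros(n_H)
diff[v1] = 1.0                               # X^A - X^B is 1 at v1 and 0 elsewhere
print(np.allclose(t_eta + K @ t_delta, diff))   # True: t is a valid flow
```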

In the following we will first prove generic privacy guarantees for any (fixed) t, which will be used to map elements of \(Z^A\) and \(Z^B\) and bound the overall probability differences of outcomes of adjacent datasets. Next, we will instantiate t for different graphs \(G^H\) and obtain concrete guarantees.

Fig. 1 An example of a valid t over a communication graph of honest users \(G^H\): (a) the graph \(G^H\), (b) the difference of neighboring databases \(X^A - X^B\), and (c) a possible value of t which evenly distributes the flow of information

5.1.2 Abstract differential privacy result

We start by proving differential privacy guarantees which depend on the particular choice of labeling t. Theorem 1 holds for all possible choices of t, but some choices will lead to more advantageous results than others. Later, we will apply this theorem for specific choices of t for proving theorems giving privacy guarantees for communication graphs \(G^H\) with specific properties.

We first define the function \({\varTheta _{max}}:{\mathbb {R}}_+\times (0,1) \rightarrow {\mathbb {R}}_+\) that maps each pair \((\varepsilon ,\delta )\) to the largest positive value of \(\theta\) satisfying

$$\begin{aligned} \varepsilon&\ge \theta ^{1/2} + \theta /2, \end{aligned}$$
(4)
$$\begin{aligned} \frac{\left( \varepsilon - \theta /2\right) ^2}{\theta }&\ge 2\log \left( \frac{2}{\delta \sqrt{2\pi }}\right) . \end{aligned}$$
(5)

Note that for any \(\varepsilon\) and \(\delta\), any \(\theta \in (0,{\varTheta _{max}}(\varepsilon ,\delta )]\) satisfies Eqs. (4) and (5).
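Since \({\varTheta _{max}}\) is defined implicitly by Eqs. (4) and (5), it is convenient to evaluate it numerically. Below is a small sketch of ours: under (4), both constraints hold exactly on an interval \((0,\theta _{\max }]\), so a bisection on the largest feasible \(\theta\) suffices.

```python
import math

def theta_max(eps, delta, tol=1e-12):
    """Largest theta satisfying eps >= sqrt(theta) + theta/2          (Eq. 4)
    and (eps - theta/2)**2 / theta >= 2*log(2/(delta*sqrt(2*pi)))     (Eq. 5)."""
    c = 2.0 * math.log(2.0 / (delta * math.sqrt(2.0 * math.pi)))

    def feasible(theta):
        return (eps >= math.sqrt(theta) + theta / 2.0
                and (eps - theta / 2.0) ** 2 / theta >= c)

    lo, hi = 0.0, eps ** 2          # Eq. (4) already implies theta <= eps^2
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if feasible(mid) else (lo, mid)
    return lo

print(theta_max(eps=0.1, delta=1e-6))
```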

Theorem 1

Let \(\varepsilon ,\delta \in (0,1)\). Choose a vector \(t=(t_\eta ,t_{\varDelta} )\in Z\) with \(t_\eta = (t_u)_{u\in U^H}\) and \(t_{\varDelta} = (t_e)_{e\in E^H}\) such that \(t_\eta +Kt_{\varDelta} = X^A-X^B\) and let \(\theta = t^\top \varSigma ^{(-g)}t\). Under the setting introduced above, if \(\theta \le {\varTheta _{max}}(\varepsilon ,\delta )\) then Gopa is \((\varepsilon ,\delta )\)-DP, i.e.,

$$\begin{aligned} P({{\hat{X}}} \mid X^A) \le e^\varepsilon P({{\hat{X}}} \mid X^B)+\delta . \end{aligned}$$

The proof of this theorem is in Appendix A.1. Essentially, we adapt ideas from the privacy proof of the Gaussian mechanism (Dwork & Roth, 2014) to our setting.

5.1.3 Discussion

Essentially, given some \(\varepsilon\), Eq. (4) provides a lower bound for the noise (the diagonal of \({\varSigma ^{(g)}}\)) to be added. Equation (4) also implies that the left hand side of Eq. (5) is larger than 1. Equation (5) may then require the noise or \(\varepsilon\) to be even higher if \(2\log (2/\delta \sqrt{2\pi })\ge 1\), i.e., \(\delta \le 0.48394\).

If \(\delta\) is fixed, both (4) and (5) allow for a smaller \(\varepsilon\) if \(\theta\) is smaller. Let us analyze the implications of this result. We know that \(\theta = \sigma _\eta ^{-2} t_\eta ^\top t_\eta + \sigma _{\varDelta} ^ {-2}t_{\varDelta} ^\top t_{\varDelta}\). As we can make \(\sigma _{\varDelta}\) arbitrarily large without affecting the variance of the output of the algorithm (the pairwise noise terms canceling each other) and thus make the second term \(\sigma _{\varDelta} ^{-2}t_{\varDelta} ^\top t_{\varDelta}\) arbitrarily small, our first priority to achieve a strong privacy guarantee will be to choose a t making \(\sigma _\eta ^{-2}t_\eta ^\top t_\eta\) small. We have the following lemma.

Lemma 1

In the setting described above, for any t chosen as in Theorem 1 we have:

$$\begin{aligned} \sum _{u\in U^H} t_u = 1. \end{aligned}$$
(6)

Therefore, the vector \(t_\eta\) satisfying Eq. (6) and minimizing \(t_\eta ^\top t_\eta\) is the vector \(\mathbbm {1}_{n_H}/n_H\), i.e., the vector containing \(n_H\) components with value \(1/n_H\). The proofs of the several specializations of Theorem 1 we will present will all be based on this choice for \(t_\eta\). The proof of Lemma 1 can be found in Appendix A.2, along with the proof of another constraint that we derive from these observations.

Lemma 2

In the setting described above with t as defined in Theorem 1, if \(t_\eta = \mathbbm {1}_{n_H}/n_H\), then \(G^H\) must be connected.

Given a fixed privacy level and fixed variance of the output, a second priority is to minimize \(\sigma _{\varDelta}\), as this may be useful when a user drops out and the noise he/she exchanged cannot be rolled back or would take too much time to roll back (see Appendix B.2). For this, having more edges in \(G^H\) implies that the vector \(t_{\varDelta}\) has more components and therefore typically allows a solution to \(t_\eta +Kt_{\varDelta} =X^A-X^B\) with a smaller \(t_{\varDelta} ^\top t_{\varDelta}\) and hence a smaller \(\sigma _{\varDelta}\).

5.2 Worst case topology

We now specialize Theorem 1 to obtain a worst-case result.

Theorem 2

(Privacy guarantee for worst-case graph) Let \(X^A\) and \(X^B\) be two databases (i.e., graphs with private values at the vertices) which differ only in the value of one user. Let \(\varepsilon ,\delta \in (0,1)\) and \({\theta _{{\textsc {p}}}}= \frac{1}{\sigma _\eta ^{2}n_H} + \frac{n_H}{3\sigma _{\varDelta} ^{2}}\). If \(G^H\) is connected and \({\theta _{{\textsc {p}}}}\le {\varTheta _{max}}(\varepsilon ,\delta )\), then Gopa is \((\varepsilon ,\delta )\)-differentially private, i.e., \({P({{\hat{X}}} \mid X^A)} \le e^\varepsilon P({{\hat{X}}} \mid X^B)+\delta\).

Crucially, Theorem 2 holds as soon as the subgraph \(G^H\) of honest users who did not drop out is connected. Note that if \(G^H\) is not connected, we can still obtain a similar but weaker result for each connected component separately (\(n_H\) is replaced by the size of the connected component).

In order to get a constant \(\varepsilon\), inspecting the term \({\theta _{{\textsc {p}}}}\) shows that the variance \(\sigma _\eta ^{2}\) of the independent noise must be of order \(1/n_H\). This is in a sense optimal as it corresponds to the amount of noise required when averaging \(n_H\) values in the trusted curator model. It also matches the amount of noise needed when using secure aggregation with differential privacy in the presence of colluding users, where honest users need to add \(n/n_H\) times more noise to compensate for collusion (Shi et al., 2011).

Further inspection of the conditions in Theorem 2 also shows that the variance \(\sigma _{\varDelta} ^2\) of the pairwise noise must be large enough. How large it must be actually depends on the structure of the graph \(G^H\). Theorem 2 describes the worst case, which is attained when every node has as few neighbors as possible while still being connected, i.e., when \(G^H\) is a path. In this case, Theorem 2 shows that the variance \(\sigma _{\varDelta} ^2\) needs to be of order \(n_H\). Recall that this noise cancels out, so it does not impact the utility of the final output. It only has a minor effect on the communication cost (the representation space of reals needs to be large enough to avoid overflows with high probability), and on the variance of the final result if some residual noise terms of dropout users are not rolled back (see Sect. 4).

Proof of Theorem 2

Let T be a spanning tree of the (connected) communication graph \(G^H\). Let \(E^T\) be the set of edges in T. Let \(t\in {\mathbb {R}}^{n_H+|E^H|}\) be a vector such that:

  • For vertices \(u\in U^H\), \(t_u=1/n_H\).

  • For edges \(e\in E^H \setminus E^T\), \(t_e=0\).

  • Finally, for edges \(e\in E^T\), we choose \(t_e\) in the unique way such that \(t_\eta +K t_{\varDelta} =(X^A-X^B)\).

In this way, \(t_\eta +K t_{\varDelta}\) is a vector with a 1 on the \(v_1\) position and 0 everywhere else. We can find a unique vector t using this procedure for any communication graph \(G^H\) and spanning tree T. It holds that

$$\begin{aligned} t_\eta ^\top t_\eta = \left( \frac{\mathbbm {1}_{n_H}}{n_H}\right) ^\top \left( \frac{\mathbbm {1}_{n_H}}{n_H}\right) = \frac{1}{n_H}. \end{aligned}$$
(7)

Both Eqs. (4) and (5) of Theorem 1 require \(t^\top \varSigma ^{(-g)}t\) to be sufficiently small. We can see that \(t_{\varDelta} ^\top \sigma _{\varDelta} ^{-2} t_{\varDelta}\) is maximized (thus producing the worst case) if the spanning tree T is a path \(\left( v_1\,v_2\ldots v_{n_H}\right)\), in which case \(t_{\{v_i,v_{i+1}\}}=(n_H-i)/n_H\). Therefore,

$$\begin{aligned} t_{\varDelta} ^\top t_{\varDelta} \le \sum _{i=1}^{n_H-1} \left( \frac{n_H-i}{n_H}\right) ^2 = \frac{n_H(n_H-1)(2n_H-1)/6}{n_H^2} = \frac{(n_H-1)(2n_H-1)}{6n_H}. \end{aligned}$$
(8)

Combining Eqs. (7) and (8) we get

$$\begin{aligned} \theta = t^\top \varSigma ^{(-g)}t \le \sigma _\eta ^{-2}\frac{1}{n_H} + \sigma _{\varDelta} ^{-2}\frac{n_H(n_H-1)(2n_H-1)/6}{n_H^2} \end{aligned}$$

We can see that \(\theta \le {\theta _{{\textsc {p}}}}\), hence \({\theta _{{\textsc {p}}}}\le {\varTheta _{max}}(\varepsilon ,\delta )\) implies that \(\theta\) satisfies the conditions of Theorem 1 and Gopa is \((\varepsilon ,\delta )\)-differentially private. \(\square\)

In conclusion, we see that in the worst case \(\sigma _{\varDelta} ^2\) should be large (linear in \(n_H\)) to keep \(\varepsilon\) small, which has no direct negative effect on the utility of the resulting \({{\hat{X}}}\). On the other hand, \(\sigma _\eta ^2\) can be small (of the order \(1/n_H\)), which means that, independently of the number of participants or the way they communicate, a small amount of independent noise is sufficient to achieve DP as long as \(G^H\) is connected.

5.3 The complete graph

The necessary value of \(\sigma _{\varDelta} ^2\) depends strongly on the network structure. This becomes clear in Theorem 3, which covers the case of the complete graph and shows that for a fully connected \(G^H\), \(\sigma _{\varDelta} ^2\) can be of order \(O(1/n_H)\), which is a quadratic reduction compared to the path case.

Theorem 3

(Privacy guarantee for complete graph) Let \(\varepsilon ,\delta \in (0,1)\) and let \(G^H\) be the complete graph. Let \(\theta _C = \frac{1}{\sigma _\eta ^{2}n_H} + \frac{1}{\sigma _{\varDelta} ^{2}n_H}\). If \(\theta _C\le \varTheta _{max}(\varepsilon ,\delta )\), then Gopa is \((\varepsilon ,\delta )\)-DP.

Proof

If the communication graph is fully connected, we can use the following values for the vector t:

  • As earlier, for \(v\in U^H\), let \(t_v=1/n_H\).

  • For edges \(\{u,v\}\) with \(v_1\not \in \{u,v\}\), let \(t_{\{u,v\}}=0\).

  • For \(u\in U^H\setminus \{v_1\}\), let \(t_{\{u,v_1\}} = 1/n_H\).

Again, one can verify that \(t_\eta +Kt_{\varDelta} =X^A-X^B\) is a vector with a 1 on the \(v_1\) position and 0 everywhere else. In this way, again \(t_\eta ^\top t_\eta =1/n_H\) but now \(t_{\varDelta} ^\top t_{\varDelta} =(n_H-1)/n_H^2\) is much smaller. We now get

$$\begin{aligned} \theta = t^\top \varSigma ^{(-g)}t = \sigma _\eta ^{-2}/n_H+ \sigma _{\varDelta} ^{-2}(n_H-1)/n_H^2 \le \theta _C \le {\varTheta _{max}}(\varepsilon ,\delta ). \end{aligned}$$

Hence, we can apply Theorem 1 and Gopa is \((\varepsilon ,\delta )\)-differentially private. \(\square\)

A practical communication graph will be between the two extremes of the path and the complete graph, as shown in the next section.

5.4 Practical random graphs

Our results so far are not fully satisfactory from the practical perspective, when the number of users n is large. Theorem 2 assumes that we have a procedure to generate a graph G such that \(G^H\) is guaranteed to be connected (despite dropouts and malicious users), and requires a large \(\sigma _{\varDelta} ^2\) of \(O(n_H)\). Theorem 3 applies if we pick G to be the complete graph, which ensures connectivity of \(G^H\) and allows smaller \(O(1/n_H)\) variance but is intractable as all \(n^2\) pairs of users need to exchange noise.

To overcome these limitations, we propose a simple randomized procedure to construct a sparse network graph G such that \(G^H\) will be well-connected with high probability, and prove a DP guarantee for the whole process (random graph generation followed by Gopa), under much less noise than the worst-case. The idea is to make each (honest) user select k other users uniformly at random among all users. Then, the edge \(\{u,v\}\in E\) is created if u selected v or v selected u (or both). Such graphs are known as random k-out or random k-orientable graphs (Bollobás, 2001; Fenner & Frieze, 1982). They have very good connectivity properties (Fenner & Frieze, 1982; Yağan & Makowski, 2013) and are used in creating secure communication channels in distributed sensor networks (Chan et al., 2003). Note that Gopa can be conveniently executed while constructing the random k-out graph. Recall that \(\rho = n_H/n\) is the proportion of honest users. We have the following privacy guarantees (which we prove in Appendix A.3).
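The random k-out construction is straightforward to implement; here is a sketch (ours) in which each user selects k distinct peers uniformly at random and an undirected edge is kept whenever either endpoint selected the other:

```python
import random

def random_k_out_graph(n, k, seed=0):
    """Each user selects k distinct peers uniformly at random; the edge {u, v}
    is created if u selected v or v selected u (a random k-out graph)."""
    rng = random.Random(seed)
    edges = set()
    for u in range(n):
        for v in rng.sample([w for w in range(n) if w != u], k):
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)

n, k = 1000, 10
edges = random_k_out_graph(n, k)
print(2 * len(edges) / n)    # average degree, close to 2k, i.e. O(log n) contacts
```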

Theorem 4

(Privacy guarantee for random k-out graphs) Let \(\varepsilon ,\delta \in (0,1)\) and let G be obtained by letting all (honest) users randomly choose \(k\le n\) neighbors. Let k and \(\rho =n_H/n\) be such that \(\rho n \ge 81\), \(\rho k \ge 4\log (2\rho n/3\delta )\), \(\rho k \ge 6\log (\rho n/3)\) and \(\rho k \ge \frac{3}{2} + \frac{9}{4}\log (2e/\delta )\). Let

$$\begin{aligned} \theta _R&= \frac{1}{n_H\sigma _\eta ^{2}}+\frac{1}{\sigma _{\varDelta} ^{2}} \Big (\frac{1}{\lfloor (k-1)\rho /3 \rfloor -1} + \frac{12+6\log (n_H)}{n_H} \Big ) \end{aligned}$$

If \(\theta _R \le {\varTheta _{max}}(\varepsilon ,\delta )\) then Gopa is \((\varepsilon ,3\delta )\)-differentially private.

This result has a similar form as Theorems 2 and 3 but requires k to be large enough (of order \(\log (\rho n)/\rho\)) so that \(G^H\) is sufficiently connected despite dropouts and malicious users. Crucially, \(\sigma _{\varDelta} ^2\) only needs to be of order \(1/k\rho\) to match the utility of the trusted curator, and each user needs to exchange with only \(2k=O(\log n)\) peers in expectation, which is much more practical than a complete graph.

Notice that up to a constant factor this result is optimal. Indeed, in general, random graphs are not connected if their average degree is smaller than logarithmic in the number of vertices. The constant factors mainly serve for making the result practical and (unlike asymptotic random graph theory) applicable to moderately small communication graphs, as we illustrate in the next section.

5.5 Scaling the noise

Using these results, we can precisely quantify the amount of independent and pairwise noise needed to achieve a desired privacy guarantee depending on the topology, as illustrated in the corollary below.

Corollary 1

Let \(\varepsilon ,\delta ' \in (0,1)\), and \({\sigma _\eta ^2 = c^2 /n_H\varepsilon ^2}\), where \(c^2 > 2\log (1.25/\delta ')\). Given some \(\kappa > 0\), let \(\sigma _{\varDelta} ^2= \kappa \sigma _\eta ^2\) if G is complete, \(\sigma _{\varDelta} ^2= \kappa \sigma _\eta ^2n_H(\frac{1}{\lfloor (k-1)\rho /3 \rfloor -1} + (12+6\log (n_H))/n_H)\) if it is a random k-out graph with k and \(\rho\) as in Theorem 4, and \(\sigma _{\varDelta} ^2 = \kappa \sigma _\eta ^2n_H^2/3\) for an arbitrary connected \(G^H\). Then Gopa is \((\varepsilon , \delta )\)-DP with \(\delta \ge a (\delta '/1.25)^{\kappa /(\kappa +1)}\), where \(a=3.75\) for the k-out graph and 1.25 otherwise.

Table 2 Value of \(\sigma _{\varDelta}\) needed to ensure \((\varepsilon ,\delta )\)-DP with trusted curator utility for \(n=10000\), \(\varepsilon =0.1\), \(\delta '=1/n_H^2\), \(\delta =10\delta '\) depending on the topology, as obtained from Corollary 1 or numerical simulation

We prove Corollary 1 in Appendix A.4. The value of \(\sigma _\eta ^2\) is set such that after all noisy values are aggregated, the variance of the residual noise matches that required by the Gaussian mechanism (Dwork & Roth, 2014) to achieve \((\varepsilon , \delta ')\)-DP for an average of \(n_H\) values in the centralized setting. The privacy-utility trade-off achieved by Gopa is thus the same as in the trusted curator model up to a small constant in \(\delta\), as long as the pairwise variance \(\sigma _{\varDelta} ^2\) is large enough. As expected, we see that as \(\sigma _{\varDelta} ^2\rightarrow +\infty\) (that is, as \(\kappa \rightarrow +\infty\)), we have \(\delta \rightarrow \delta '\) for worst case and complete graphs, or \(\delta \rightarrow 3\delta '\) for k-out graphs. Given the desired \(\delta \ge \delta '\), we can use Corollary 1 to determine a value for \(\sigma _{\varDelta} ^2\) that is sufficient for Gopa to achieve \((\varepsilon ,\delta )\)-DP. Table 2 shows a numerical illustration with \(\delta\) only a factor 10 larger than \(\delta '\). For random k-out graphs, we report the values of \(\sigma _{\varDelta}\) and k given by Theorem 4, as well as smaller (yet admissible) values obtained by numerical simulation (see Appendix A.5). Although the conditions of Theorem 4 are a bit conservative (constants can likely be improved), they still lead to practical values. Clearly, random k-out graphs provide a useful trade-off in terms of scalability and robustness. Note that in practice, one often does not know in advance the exact proportion \(\rho\) of users who are honest and will not drop out, so a lower bound can be used instead.
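To illustrate how Corollary 1 translates into concrete noise scales, the sketch below (ours; the values of \(k\), \(\rho\) and \(\kappa\) are illustrative, and \(k\) must in addition satisfy the conditions of Theorem 4) computes \(\sigma _\eta\) and \(\sigma _{\varDelta}\) for the three topologies:

```python
import math

def gopa_noise_scales(n_H, eps, delta_p, kappa, topology, k=None, rho=None):
    """Standard deviations from Corollary 1 (taking c^2 at its lower bound
    2*log(1.25/delta'), which should in practice be exceeded slightly)."""
    c2 = 2.0 * math.log(1.25 / delta_p)
    sigma_eta2 = c2 / (n_H * eps ** 2)
    if topology == "complete":
        factor = 1.0
    elif topology == "k-out":
        factor = n_H * (1.0 / (math.floor((k - 1) * rho / 3) - 1)
                        + (12.0 + 6.0 * math.log(n_H)) / n_H)
    elif topology == "worst-case":      # any connected G^H, e.g. a path
        factor = n_H ** 2 / 3.0
    else:
        raise ValueError(topology)
    return math.sqrt(sigma_eta2), math.sqrt(kappa * sigma_eta2 * factor)

n, rho = 10_000, 0.5
n_H = int(rho * n)
for topo in ("complete", "k-out", "worst-case"):
    print(topo, gopa_noise_scales(n_H, eps=0.1, delta_p=1.0 / n_H**2,
                                  kappa=1.0, topology=topo, k=30, rho=rho))
```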

Remark 1

For clarity of presentation, our privacy guarantees protect against an adversary that consists of colluding malicious users. To simultaneously protect against each single honest-but-curious user (who knows his own independent noise term), we can simply replace \(n_H\) by \(n_H'=n_H-1\) in our results. This introduces a factor \(n_H/(n_H-1)\) in the variance, which is negligible for large \(n_H\). Note that the same applies to other approaches which distribute noise-generation over data-providing users, e.g., Dwork et al. (2006).

6 Correctness against malicious users

While the privacy guarantees of Sect. 5 hold regardless of the behavior of the (bounded number of) malicious users, the utility guarantees discussed in Sect. 4 are not valid if malicious users tamper with the protocol. In this section, we add to our protocol the capability of being audited to ensure the correctness of the computations while preserving the privacy guarantees. In particular, while Sect. 5 guarantees privacy and prevents inference attacks in which attackers illegitimately infer sensitive information, tampering with the protocol can be part of poisoning attacks which aim to change the result of the computation. We argue that we can detect all attacks to poison the output which can also be detected in the centralized setting. As we will explain, we do not assume prior knowledge about data distributions or patterns; under such conditions, neither the centralized setting nor ours can detect behavior which could be legal but unlikely.

We present here the main ideas. In Appendix B and Appendix D, we review some cryptographic background required to understand and verify the details of our construction.

Objective Our goal is to (i) verify that all calculations are performed correctly even though they are encrypted, and (ii) identify any malicious behavior. As a result, we guarantee that, given the input vector X, a truthfully computed \({\hat{X}}^{avg}\) is generated which excludes any faulty contribution.

Concretely, users will be able to prove the following properties:

$$\begin{aligned}&{\hat{X}}_u = X_u + \textstyle \sum _{v\in N(u)} \varDelta _{u,v} + \eta _u, \qquad \forall u \in U,\; \end{aligned}$$
(9)
$$\begin{aligned}&\varDelta _{u,v} = -\varDelta _{v,u}, \qquad \forall \{u,v\} \in E,\; \end{aligned}$$
(10)
$$\begin{aligned}&\eta _u \sim {\mathcal {N}}(0, \sigma _\eta ^2), \qquad \forall u \in U,\; \end{aligned}$$
(11)
$$\begin{aligned}&X_u \text { is a valid input,}\qquad \forall u \in U.\; \end{aligned}$$
(12)

It is easy to see that the correctness of the computation is guaranteed if Properties (9)–(12) are satisfied. Note that, as long as they are self-canceling and not excessively large (avoiding overflows and additional costs if a user drops out, see Appendix B.2), we do not need to ensure that pairwise noise terms \(\varDelta _{u,v}\) have been drawn from the prescribed distribution, as these terms do not influence the final result and only those involving honest users affect the privacy guarantees of Sect. 5. In contrast, Properties (11) and (12) are necessary to prevent a malicious user from biasing the outcome of the computation. Indeed, (11) ensures that the independent noise is generated correctly, while (12) ensures that input values are in the allowed domain. Moreover, we can force users to commit to input data so that they consistently use the same values for data over multiple computations.

We first explain the tools involved in verifying computations in Sect. 6.1, and then present our verification protocol in Sect. 6.2.

6.1 Tools for verifying computations

Our approach consists in publishing an encrypted log of the computation using cryptographic commitments and proving that it is performed correctly, without revealing any additional information, using zero knowledge proofs. These techniques are popular in a number of applications such as privacy-friendly financial systems (Narula et al., 2018; Ben Sasson et al., 2014). We explain below the different tools used to robustly verify our computations: a structure to post the encrypted log of our computations, hash functions to generate secure random numbers, commitments, and zero knowledge proofs.

Public bulletin board We implement the publication of commitments and proofs using a public bulletin board so that any party can verify the validity of the protocol, avoiding the need for a trusted verification entity. Users sign their messages so they cannot deny them. More general purpose distributed ledger technology (Nakamoto, 2008) could be used here, but we aim at an application-specific, light-weight and hence more scalable solution.

Representations We will represent numbers by elements of cyclic groups isomorphic to \({\mathbb {Z}}_q\) for some large prime q. To be able to work with signed fixed-precision values, we encode them in \({\mathbb {Z}}_q\) by multiplying them by a constant \(1/{\psi }\) and using the upper half of \({\mathbb {Z}}_q\), i.e., \(\{ x \in {\mathbb {Z}}_q : x \ge \lceil q/2 \rceil \}\) to represent negative values. Unless we explicitly state otherwise, properties (such as linear relationships) we establish for the \({\mathbb {Z}}_q\) elements translate straightforwardly to the fixed-precision values they represent. We choose the precision so that the error of approximating real numbers up to a multiple of \({\psi }\) does not impact our results.

Cryptographic hash functions We also use hash functions \({H: {\mathbb {Z}}\rightarrow {\mathbb {Z}}_{2^T}}\) for an integer T such that \(2^T\) is a few orders of magnitude bigger than q, so that numbers uniformly distributed over \({\mathbb {Z}}_{2^T}\) taken modulo q are indistinguishable from numbers uniformly distributed over \({\mathbb {Z}}_q\). Such a function is easy to evaluate, but predicting its outcome or distinguishing it from random numbers is intractable for polynomially bounded algorithms (Yao, 1982). Practical instances of H can be found in Barker and Kelsey (2007); Bertoni et al. (2010).

Pedersen commitments Commitments, first introduced by Blum (1983), allow users to commit to values while keeping them hidden from others. After a commitment is performed, the committer cannot change its value, but can later prove properties of it or reveal it. For our protocol we use the Pedersen commitment scheme (Pedersen, 1991). Pedersen commitments have as public parameters \(\varTheta = (q, {\mathbb {G}}, g, h)\) where \({\mathbb {G}}\) is a cyclic multiplicative group of prime order q, and g and h are two generators of \({\mathbb {G}}\) chosen at random. A commitment is computed by applying the function \(Com_\varTheta : {\mathbb {Z}}_q \times {\mathbb {Z}}_q \rightarrow {\mathbb {G}}\), defined as

$$\begin{aligned} Com_{\varTheta } (x, r) = g^x \cdot h^r, \end{aligned}$$
(13)

where \((\cdot )\) is the product operation of \({\mathbb {G}}\), \(x \in {\mathbb {Z}}_q\) is the committed value, and \(r \in {\mathbb {Z}}_q\) is a random number ensuring that \(Com_{\varTheta } (x, r)\) perfectly hides x. When r is not relevant, we simply denote by \(Com_\varTheta (x)\) a commitment of x and assume r is drawn appropriately. Under the Discrete Logarithm Assumption (DLA), \(Com_{\varTheta }\) is binding as long as g and h are picked at random such that no one knows the discrete logarithm of h in base g. That is, no computationally efficient algorithm can find \(x_1,x_2, r_1, r_2 \in {\mathbb {Z}}_q\) such that \(x_1 \ne x_2\) and \(Com_{\varTheta }(x_1, r_1) = Com_{\varTheta }(x_2, r_2)\). As the parameters \(\varTheta\) are public, many parties can share the same scheme, but the parameters must be sampled with unbiased randomness to ensure the binding property.

Pedersen commitments are homomorphic, as \(Com_{\varTheta }(x+y,r+s)=Com_{\varTheta }(x,r)\cdot Com_{\varTheta }(y,s)\). This is already enough to verify linear relations, such as the ones in Properties (9) and (10).
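
The toy sketch below illustrates Pedersen commitments and their homomorphic property in a small Schnorr subgroup; the parameters are deliberately tiny and insecure (real deployments use groups of cryptographic size, with g and h sampled so that nobody knows \(\log_g h\)).

```python
import secrets

p, q = 23, 11     # toy safe prime p = 2q + 1; G is the order-q subgroup of Z_p^*
g, h = 2, 3       # two generators of the order-q subgroup (toy, insecure choice)

def commit(x, r=None):
    """Com(x, r) = g^x * h^r mod p, with r uniform in Z_q if omitted."""
    if r is None:
        r = secrets.randbelow(q)
    return pow(g, x % q, p) * pow(h, r % q, p) % p, r

# Homomorphic property: Com(x, r) * Com(y, s) = Com(x + y, r + s).
cx, r = commit(4)
cy, s = commit(9)
cxy, _ = commit(4 + 9, r + s)
assert cx * cy % p == cxy
```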

Users sometimes also need to prove that they know the values underlying commitments. In our discussion, we implicitly assume that proofs of knowledge (Cramer & Damgård, 1998) (see also below) are inserted where needed.

Zero knowledge proofs To verify relationships other than linear ones, we use a family of techniques called Zero Knowledge Proofs (ZKPs), first proposed in Goldwasser et al. (1989). In these proofs, a party called the prover convinces another party, the verifier, of a statement over committed values. For our scope and informally speaking, ZKPs (see Footnote 2):

  • allow the prover to successfully prove a true statement (completeness),

  • allow the verifier to detect, with arbitrarily high probability, any attempt to prove a false statement (soundness),

  • guarantee that performing the proof reveals no information about the prover's knowledge beyond the proven statement (zero knowledge).

Importantly, the zero knowledge property of our proofs does not rely on any computational hardness assumption.

\(\varSigma\)-Protocols We use a family of ZKPs called \(\varSigma\)-protocols, introduced in Cramer (1997). They allow one to prove knowledge of committed values, relations between them expressed as arithmetic circuits over \({\mathbb {Z}}_q\) (Cramer & Damgård, 1998) (i.e., functions that only contain additions and products in \({\mathbb {Z}}_q\)), and disjunctions of statements involving such relations (Cramer et al., 1994). Let \(C_{cst}\) be the cost of computing an arithmetic circuit \(C: {\mathbb {Z}}_q^m \rightarrow {\mathbb {Z}}_q^s\); then proving the correct computation of C requires \(O(C_{cst})\) cryptographic computations (mainly exponentiations in \({\mathbb {G}}\)) and the transfer of \(O(C_{cst})\) elements of \({\mathbb {G}}\). Proving the disjunction \(S_1 \vee S_2\) of two statements \(S_1\) and \(S_2\) costs the sum of the costs of proving \(S_1\) and \(S_2\) separately. For simplicity, we say that a proof has cost c if generating the proof and verifying it each require at most c cryptographic computations, and the communication cost is at most c elements of \({\mathbb {G}}\). We denote by \(W\) the size in bits of an element of \({\mathbb {G}}\).

Proving that a commitment to a number \(a \in {\mathbb {Z}}_q\) lies in a range \([0, 2^k-1]\) for some integer k can be derived from circuit proofs with the following folklore protocol: commit to each bit \(b_1, \dots , b_k\) of a and prove that they are indeed bits, for example by proving that \(b_i (1 - b_i ) = 0\) for all \(i \in \{1, \dots , k\}\), then prove that \(a = \sum _{i=1}^k 2^{i-1} b_i\). This proof has a cost of 5k. The homomorphic property of commitments allows one to easily generalize this proof to any range \([a,b] \subset {\mathbb {Z}}_q\) with a cost of \(10\lceil \log _2(b-a) \rceil\).
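
The commitment-level part of this folklore range proof can be sketched as follows (toy parameters as above; the zero knowledge sub-proofs that each \(b_i\) is a bit are abstracted away): the prover commits to the bits of a with randomness that recombines to the randomness of the commitment of a, and anyone can check the linear relation using the homomorphic property.

```python
p, q, g, h = 23, 11, 2, 3                  # same toy Pedersen parameters as above

def commit(x, r):
    return pow(g, x % q, p) * pow(h, r % q, p) % p

k, a = 3, 5                                # claim: a lies in [0, 2^k - 1]
bits = [(a >> i) & 1 for i in range(k)]    # bit decomposition of a
r_bits = [3, 7, 2]                         # prover-chosen randomness for each bit
r_a = sum(2**i * r_bits[i] for i in range(k)) % q

c_a = commit(a, r_a)
c_bits = [commit(b, r) for b, r in zip(bits, r_bits)]
# (ZK proofs that each committed b_i satisfies b_i (1 - b_i) = 0 are omitted here.)

# Verifier: recombine the bit commitments and compare with the commitment of a.
recombined = 1
for i, c in enumerate(c_bits):
    recombined = recombined * pow(c, 2**i, p) % p
assert recombined == c_a
```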

\(\varSigma\)-protocols require the prover to interact with an honest verifier, which is not applicable to our setting where verifiers can be malicious. We can turn our proofs into non-interactive ones, with negligible additional cost, using the strong Fiat–Shamir heuristic (Bernhard et al., 2012). In this way, for a statement S, each user generates a proof transcript \(\varvec{\pi }_{S}\) together with the involved commitments and publishes it on the bulletin board. Any party can later verify the correctness of \(\varvec{\pi }_{S}\) offline. The transcript size is equal to the number of elements exchanged in the original interactive protocol.
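
As an illustration of both a \(\varSigma\)-protocol and the Fiat–Shamir transform, the sketch below makes a Schnorr-style proof of knowledge of a discrete logarithm non-interactive by replacing the verifier's challenge with a hash of the public values and the prover's first message. The toy group and the hash-to-challenge encoding are illustrative assumptions, not the exact instantiation used in our protocol.

```python
import hashlib, secrets

p, q, g = 23, 11, 2                        # toy Schnorr group of prime order q

def challenge(*parts: int) -> int:
    """Fiat-Shamir: hash the public transcript so far into a challenge in Z_q."""
    data = b"||".join(str(v).encode() for v in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x: int):
    """Prove knowledge of x such that y = g^x mod p; returns (y, transcript)."""
    y = pow(g, x, p)
    k = secrets.randbelow(q)               # prover's ephemeral randomness
    t = pow(g, k, p)                       # first message
    e = challenge(g, y, t)                 # non-interactive challenge
    s = (k + e * x) % q                    # response
    return y, (t, s)

def verify(y: int, transcript) -> bool:
    t, s = transcript
    e = challenge(g, y, t)
    return pow(g, s, p) == t * pow(y, e, p) % p

y, pi = prove(7)
assert verify(y, pi)                       # anyone can check the transcript offline
```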

6.2 Verification protocol

Our verification protocol, based on the primitives described in Sect. 6.1, consists of four phases:

  1. Private data commit At the start of our protocol, we assume users have committed to their private data. In particular, for every user u a commitment is available, either directly published by u or available through a distributed ledger or another suitable mechanism. This attenuates data poisoning, as it forces users to use the same value of \(X_u\) in each computation where it is needed.

  2. Setup In a setup phase at the start of our protocol, users generate Pedersen commitment parameters \(\varTheta\) and private random seeds that will be used to prove Property (11). Details are discussed in Appendix B.1.

  3. Verification During our protocol, users can prove that the execution is performed correctly and verify the logs containing such proofs published by others. If a user is detected cheating during the protocol, they are added to a cheater list. After the protocol, anyone can verify that all steps were performed correctly and that the protocol has been completed. We give details on this key step below.

  4. Mitigation Cheaters and drop-out users (who went offline for too long) detected during the protocol are excluded from the computation, and their contributions are rolled back. Details are provided in Appendix B.2.

Verification phase First, we use the homomorphic property of Pedersen commitments to prove Properties (9) and (10). Note that Property (10) involves secrets of two different users u and v. This is not a problem: as these pairwise noise terms are known by both involved users, they can use negated randomnesses \(r_{\varDelta _{u,v}}=-r_{\varDelta _{v,u}}\) in their commitments of \(\varDelta _{u,v}\) and \(\varDelta _{v,u}\), so that everybody can verify that \(Com_{\varTheta }(\varDelta _{u,v},r_{\varDelta _{u,v}}) \cdot Com_{\varTheta } (\varDelta _{v,u},r_{\varDelta _{v,u}})=Com_{\varTheta }(0,0)\). Users can choose how they generate the pairwise Gaussian noise (e.g., by convention, the user that initiates the exchange can generate the noise). We only require that, before publishing commitments, each user holds a message on the agreed noise terms signed by the other user, so that if one of them cheats, this can easily be discovered.
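
The public check on the pairwise noise terms can be sketched as follows (toy Pedersen parameters as in Sect. 6.1): u and v commit to \(\varDelta _{u,v}\) and \(\varDelta _{v,u}=-\varDelta _{u,v}\) with negated randomness, and anyone can multiply the two commitments and compare with a commitment of zero.

```python
import secrets

p, q, g, h = 23, 11, 2, 3

def commit(x, r):
    return pow(g, x % q, p) * pow(h, r % q, p) % p

delta_uv = 4                          # pairwise noise term agreed by u and v
r_uv = secrets.randbelow(q)
c_uv = commit(delta_uv, r_uv)         # published by u
c_vu = commit(-delta_uv, -r_uv)       # published by v, with negated randomness

assert c_uv * c_vu % p == commit(0, 0)   # public check that the noise terms cancel
```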

Verifying the correct drawing of Gaussian numbers is more involved and requires private seeds \(r_1, \dots , r_n\) generated in Phase 2. We explain the procedure step by step in Appendix D.3. The proof generates a transcript \(\varvec{\pi }_{\eta _u}\) for each user u.

To verify Property (12), we verify its domain and its consistency. For the domain, we prove that \(X_u \in [0,1]\) with the range proof outlined in Sect. 6.1. For the consistency, each user u publishes a Pedersen commitment \(\mathbf {c_{X_u}}=Com_{\varTheta }(X_u)\) and proves its consistency with the private data committed to in Phase 1, denoted \(\mathbf {c_D}\). Such a proof depends on the nature of the commitment in Phase 1: if the same Pedersen commitment scheme is used, nothing needs to be done, but users could also prove consistency with a record in a blockchain (which is also used for other applications), or they may need to prove more complex consistency relationships. We denote the entire proof transcript by \(\varvec{\pi }_{X_u}\). As an illustration, consider ridge regression in Example 1. Every user u can publish commitments \(\mathbf {c}_{\mathbf {y}_\mathbf {u}} = Com_ {\varTheta }(y_u)\), \(\mathbf {c}_{{\varvec{\phi }}_\mathbf {u}^\mathbf {i}} = Com_{\varTheta }(\phi _u^i)\) for \(i \in \{1, \dots ,d\}\) (computed with appropriately drawn randomness), and additionally commit to \(\phi _u^i y_u\) and \(\phi _u^i \phi _u^j\) for \(i,j \in \{1, \dots , d\}\). Then it can be verified that all these commitments are computed coherently, i.e., that the commitment of \(\phi _u^i y_u\) commits to the product of the secrets committed in \(\mathbf {c_{y_u}}\) and \(\mathbf {c}_{{\varvec{\phi }}_\mathbf {u}^\mathbf {i}}\) for \(i \in \{1, \dots , d\}\), and analogously for the commitment of \(\phi _u^i \phi _u^j\) in relation with \(\mathbf {c}_{{\varvec{\phi }}_\mathbf {u}^\mathbf {i}}\) and \(\mathbf {c}_{{\varvec{\phi }}_\mathbf {u}^\mathbf {j}}\), for \(i,j \in \{1, \dots , d \}\).

We note that if poisoned private values are used consistently after committing to them, this will remain undetected. However, if our verification methodology is applied to the training of many different models over time, users can be required to prove consistency with values that were committed long ago. Cheating is therefore discouraged, and such attacks are attenuated by the impossibility of adapting corrupted contributions to specific computations.

Compared to the central setting with a trusted curator, encrypting the input does not make input verification more problematic. Both in the central setting and in ours, one can perform domain tests, ask for certification of values from independent sources, and require consistency of the inputs over multiple computations, even though in some cases neither the central curator nor our protocol may be able to verify the correctness of some inputs.

Algorithm 2 gives a high-level overview of the four verification phases described above. By composition of ZKPs, these steps allow each user to prove the correctness of their computations while preserving the completeness, soundness and zero knowledge properties, thereby leading to our security guarantees:

Theorem 5

(Security guarantees of Gopa) Under the DLA, a user \(u \in U\) that passes the verification protocol proves that \({\hat{X}}_u\) was computed correctly. Additionally, u does not reveal any additional information about \(X_u\) by running the verification, even if the DLA does not hold.

To reduce the verification load, we note that it is possible to perform the verification for only a subset of users picked at random (for example, sampled using public unbiased randomness generated in Phase 2) after users have published the involved commitments. In this case, we obtain probabilistic security guarantees, which may be sufficient for some applications.
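
A minimal sketch of this idea (illustrative, not our exact procedure) is to derive the audited subset deterministically from the public unbiased randomness of Phase 2, so that no party can influence who gets audited once the commitments are published.

```python
import hashlib

def audit_subset(public_seed: bytes, n_users: int, n_audited: int) -> list:
    """Deterministically select which users to audit from a public random seed."""
    scored = []
    for u in range(n_users):
        digest = hashlib.sha256(public_seed + u.to_bytes(4, "big")).digest()
        scored.append((int.from_bytes(digest, "big"), u))
    return sorted(u for _, u in sorted(scored)[:n_audited])

# Example: audit 100 of the 10000 users, reproducibly from the public seed.
audited = audit_subset(b"phase-2 public randomness", n_users=10000, n_audited=100)
```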

We can conclude that Gopa is an auditable protocol that, through existing efficient cryptographic primitives, can offer guarantees similar to the automated auditing which is possible for data shared with a central party.

[Algorithm 2: overview of the verification protocol]

7 Computation and communication costs

Our cost analysis considers user-centered costs, which is natural as most operations can be performed asynchronously and in parallel. The following statement summarizes our results (concrete non-asymptotic costs are in Appendix C).

Theorem 6

(Complexity of Gopa) Let \({\psi } >0\) be the desired fixed precision, such that the number 1 is represented as \(1/{\psi }\). Let \(B > 0\) be such that the \(\eta _u\)'s are drawn from a Gaussian distribution approximated with 1/B equiprobable bins. Then each user u requires \(O(|N(u)|+\log (1/{\psi })\log (1/B) + \log (1/B) + \log (1/{\psi }))\) computations and transferred bits to perform and prove its contribution. Verifying its contribution has the same cost.

Unlike frameworks such as fully homomorphic encryption and secure multiparty computation, our cryptographic primitives (Franck et al., 2017) scale well to large data.

8 Experiments

Private averaging We present numerical simulations to study the empirical utility of Gopa, and in particular the influence of malicious and dropped-out users. We consider \(n=10000\), \(\varepsilon =0.1\), \(\delta =1/n^2\) and set the values of k, \(\sigma ^2_\eta\) and \(\sigma ^2_{\varDelta}\) using Corollary 1 so that Gopa satisfies \((\varepsilon ,\delta )\)-DP. Figure 2 (left) shows the utility of Gopa when executed with k-out graphs as a function of \(\rho\), the lower bound on the proportion of users who are honest and do not drop out. The curves in the figure are closed-form formulas given by Corollary 1 (for Gopa) and Appendix A of Dwork and Roth (2014) (for local DP and central DP). As long as the value of k is admissible, it does not change \(\sigma _\eta\). The utility of Gopa is shown for different values of \(\kappa\). This parameter allows different trade-offs between the magnitudes of \(\sigma _\eta\) and \(\sigma _{\varDelta}\). While a very small \(\kappa\) degrades utility, this impact quickly becomes negligible as \(\kappa\) reaches 10 (which also has a minor effect on \(\sigma _{\varDelta}\)). With \(\kappa = 10\), and even for reasonably small \(\rho\), Gopa already approaches a utility of the same order as a trusted curator that would average the values of all n users. Further increasing \(\kappa\) would not be of any use, as it will not have a significant impact on utility and will simply increase \(\sigma _{\varDelta}\).

While the values of \(\varepsilon\) and \(\delta\) obviously impact utility, we stress that they only have a uniform scaling effect which does not affect the relative distance between the utility of Gopa and that of a trusted curator. Regarding the communication graph \(G^H\), it greatly influences the communication cost and \(\sigma _{\varDelta}\), but only affects \(\sigma _\eta\) via the parameter a of Corollary 1, which has a negligible impact on utility.

In Appendix B.2, we further describe the ability of Gopa to tolerate a small number of residual pairwise noise terms of variance \(\sigma _{\varDelta} ^2\) in the final result. We note that this feature is rather unique to Gopa and is not possible with secure aggregation (Bonawitz et al., 2017; Bell et al., 2020b).

Application to federated SGD We present experiments on training a logistic regression model in a federated learning setting. We use a binarized version of the UCI Housing dataset with standardized features and points normalized to unit L2 norm to ensure a gradient sensitivity bounded by 2. We set aside \(20\%\) of the points as a test set and split the rest uniformly at random across \(n=10000\) users, so that each user u has a local dataset \(D_u\) composed of 1 or 2 points.

We use the Federated SGD algorithm, which corresponds to FedAvg with a single local update (McMahan et al., 2017). At each iteration, each user computes a stochastic gradient using one sample of their local dataset; these gradients are then averaged and used to update the model parameters. To privately average the gradients, we compare Gopa (using k-out graphs with \(\rho =0.5\) and \(\kappa = 10\)) to (i) a trusted aggregator that averages all n gradients in the clear and adds Gaussian noise to the result as per central DP, and (ii) local DP. We fix the total privacy budget to \(\varepsilon =1\) and \(\delta =1/(\rho n)^2\) and use advanced composition (Sect. 3.5.2 of Dwork and Roth (2014)) to compute the budget allocated to each iteration. Specifically, we use Corollary 3.21 of Dwork and Roth (2014) by requiring that each update is \((\varepsilon _s, \delta _s)\)-DP for \(\varepsilon _s = \varepsilon / (2\sqrt{2 T \ln (1/\delta _s)})\), \(\delta _s = \delta /(T+1)\) and T equal to the total number of iterations. This ensures \((\varepsilon , \delta )\)-DP for the overall algorithm. The step size is tuned for each approach, selecting the value with the highest accuracy after a predefined number T of iterations.
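
A minimal sketch of this per-iteration budget allocation is given below; the function name is illustrative and the formulas simply mirror the ones stated above.

```python
import math

def per_iteration_budget(eps: float, delta: float, T: int):
    """Split a total (eps, delta) budget across T iterations via advanced composition."""
    delta_s = delta / (T + 1)
    eps_s = eps / (2 * math.sqrt(2 * T * math.log(1 / delta_s)))
    return eps_s, delta_s

# Example with the experimental setting above: eps = 1, delta = 1/(rho*n)^2, T = 50.
n, rho, T = 10000, 0.5, 50
eps_s, delta_s = per_iteration_budget(1.0, 1 / (rho * n) ** 2, T)
```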

Figure 2 (middle) shows a typical run of the algorithm for \(T=50\) iterations. Local DP is not shown as it diverges unless the learning rate is overly small. On the other hand, Gopa is able to decrease the objective function steadily, although we see some difference with the trusted aggregator (this is expected since \(\rho =0.5\)). Figure 2 (right) shows the final test accuracy (averaged over 10 runs) for different numbers of iterations T. Despite the small gap in objective function, Gopa nearly matches the accuracy achieved by the trusted aggregator, while local DP is unable to learn useful models.

We provide the code to reproduce the experimental results presented in Figures 2 and 3 (see Appendix B.2) and in Tables 2 (see Sect. 5.5) and 3 (see Appendix A.5) in a public repository (see Footnote 3).

Fig. 2 Comparing Gopa to central and local DP. Left: Utility of \(\textsc {Gopa}\) (measured by the variance of the estimated average) w.r.t. \(\rho\). Middle: Evolution of the objective for a typical run of FedSGD. Right: Test accuracy of models learned with FedSGD. See text for details

9 Conclusion

We proposed Gopa, a protocol to privately compute averages over the values of many users. Gopa satisfies DP, can nearly match the utility of a trusted curator, and is robust to malicious parties. It can be used in distributed and federated ML (Jayaraman et al., 2018; Kairouz et al., 2021b) in place of more costly secure aggregation schemes. In future work, we plan to provide efficient implementations, to integrate our approach into complete ML systems, and to exploit scaling effects to reduce the cost per average. We believe that our work is also relevant beyond averaging, e.g., in the context of robust aggregation for distributed SGD (Blanchard et al., 2017) and for computing pairwise statistics (Bell et al., 2020a).