
1 Introduction

Federated learning (FL) [27] allows multiple clients to collaboratively train a machine learning model. In each training round, the clients share their local model updates with a central server, which then aggregates them to improve the global model. Although raw data is not shared in the clear, vanilla FL is prone to model-inversion attacks [18] and membership-inference attacks [29]. To mitigate such attacks, secure aggregation (SecAgg) [9] has been proposed, where the clients jointly mask their local model updates so that only the aggregate is revealed to the server.

Many papers explicitly or implicitly assume that SecAgg provides strong privacy against honest-but-curious servers in a single round [16, 19, 30, 31]. However, a formal analysis of the privacy offered by SecAgg is lacking, making this presumption unjustified. SecAgg has been combined with differential privacy (DP) [14] to ensure that the server only sees the aggregate of the noisy local updates  [2, 22]. However, in these works, the privacy analysis does not account for SecAgg. It remains unclear how much privacy SecAgg by itself provides for individual updates.

Main Contributions. We address the question: how much privacy does SecAgg by itself guarantee for the local updates? Specifically, we formally analyze the privacy of SecAgg against membership inference attacks wherein the server aims to distinguish, from two potential update vectors, the one a client submitted in a single training round of FL with SecAgg. Our approach consists in treating SecAgg as a local differential privacy (LDP) mechanism for each update, where the sum of the other clients’ updates plays the role of a source of uncontrolled noise. We then characterize the privacy parameters \((\epsilon ,\delta )\) for SecAgg to satisfy \((\epsilon ,\delta )\)-LDP via the following steps.

  • We show that, under some practical assumptions, as the client population grows, the sum of the clients’ updates converges to a Gaussian vector (Theorem 1). We analyze the optimal privacy guarantee of the Gaussian mechanism with correlated noise (Theorem 2).

  • We evaluate the optimal LDP parameters of SecAgg in some special cases (Theorem 3 and Corollary 1) and verify that these parameters are close to those of a Gaussian mechanism, even for a small number of clients (Fig. 1).

  • Exploiting the similarity of SecAgg and a Gaussian mechanism, we audit the privacy of SecAgg. Specifically, we design a simple membership inference attack wherein the server regards SecAgg as a Gaussian mechanism with correlated noise. We then evaluate the achievable false negative rate (FNR) and false positive rate (FPR) of this attack and use these values to compute a lower bound on the smallest \(\epsilon \) for SecAgg to satisfy \((\epsilon ,\delta )\)-LDP.

We apply our privacy auditing procedure to federated averaging for a classification problem on the ADULT dataset [5] and the EMNIST Digits dataset [11]. We show that both the FNR and FPR can be small simultaneously, and the audited \((\epsilon ,\delta )\) are high. Our results reveal that SecAgg provides weak privacy even for a single training round. Indeed, it is difficult to hide a local update by adding other independent local updates when the updates are of high dimension. Therefore, SecAgg cannot be used as a sole privacy-enhancing mechanism in FL.

2 Related Work

Secure Aggregation. Based on cryptographic multi-party computation, SecAgg ensures that the central server sees only the aggregate of the clients’ local updates, while individual updates are kept confidential. This is achieved by letting the clients jointly add randomly sampled masks to their updates via secret sharing, such that when the masked updates are aggregated, the masks cancel out [6, 9]. With SecAgg, a client’s update is obfuscated by many other clients’ updates. However, the level of privacy provided by SecAgg lacks a formal analysis. In [16], this level was measured by the mutual information between a local update and the aggregated update. However, mutual information only measures the average privacy leakage and does not capture the threat to the most vulnerable data points. Furthermore, the bound provided in [16] is not explicit, i.e., not computable.

Differential Privacy. DP is a rigorous privacy measure that quantifies the ability of an adversary to guess which dataset, out of two neighboring ones, a model was trained on [14, 15]. DP is typically achieved by adding noise to the model/gradients obtained from the dataset [1]. A variant of DP is LDP [13, 24], where the noise is added to individual data points. When applied to achieve client-level privacy in FL, LDP lets the clients add noise to their updates before sending the updates to the server.

Privacy Attacks in FL with SecAgg. Model-inversion attacks [18] and membership-inference attacks [29] have been shown to seriously jeopardize the integrity of FL. When SecAgg is employed, the server can still perform disaggregation attacks to learn individual data. A malicious server can mount active attacks by suppressing the updates of non-target clients [8, 17]. For an honest-but-curious server, existing passive attacks require the server to leverage the aggregated model across many rounds [25, 26]. In contrast to these works, we consider a passive attack based only on the observation in a single round.

3 Preliminaries

We denote random quantities with lowercase nonitalic letters, such as a scalar \({\text {x}}\) and a vector \({\textbf{x}}\). The only exception is the privacy-loss random variable (PLRV) \({\text {L}}\), which is in uppercase. Deterministic quantities are denoted with italic letters, such as a scalar x and a vector \({\boldsymbol{x}}\). We denote the multidimensional normal distribution with mean \({\boldsymbol{\mu }}\) and covariance matrix \({\boldsymbol{\varSigma }}\) by \({\mathcal {N}}({\boldsymbol{\mu }},{\boldsymbol{\varSigma }})\) and its probability density function (PDF) evaluated at \({\boldsymbol{x}}\) by \({\mathcal {N}}({\boldsymbol{x}}; {\boldsymbol{\mu }}, {\boldsymbol{\varSigma }})\). We denote by \(\varPhi (x)\) the cumulative distribution function (CDF) of the standard normal distribution \({\mathcal {N}}(0,1)\), i.e., \(\varPhi (x) \triangleq \frac{1}{\sqrt{2\pi }} \int _{-\infty }^x e^{-u^2/2} \textrm{d}u\). We denote by [m : n] the set of integers from m to n; \([n] \triangleq [1:n]\); \((\cdot )^+ \triangleq \max \{0,\cdot \}\). Furthermore, \({\mathbbm {1}{\left\{ \cdot \right\} }}\) denotes the indicator function, and \(f(n) = o(g(n))\) means that \(f(n)/g(n) \rightarrow 0\) as \(n \rightarrow \infty \).

Let \(\phi \) be a decision rule of a hypothesis test between \(\{H :\!\) the underlying distribution is \(P\}\) and \(\{H' :\!\) the underlying distribution is \(Q\}\). Specifically, \(\phi \) returns 0 and 1 in favor of H and \(H'\), respectively. A false positive (resp. false negative) occurs when H (resp. \(H'\)) is true but rejected. The FPR and FNR of the test are given by \(\alpha _\phi \triangleq \mathbb {E}_{P}\left[ {\phi }\right] \) and \(\beta _\phi \triangleq 1 - \mathbb {E}_{Q}\left[ {\phi }\right] \), respectively.

Definition 1

(Trade-off function). The trade-off function \(T_{P,Q}(\cdot ) :[0,1] \rightarrow [0,1]\) is the map from the FPR to the corresponding minimum FNR of the test between P and Q, i.e., \(T_{P,Q}(\alpha ) \triangleq \inf _{\phi :\alpha _\phi \le \alpha } \beta _\phi \), \(\alpha \in [0,1]\).

We also write the trade-off function for the distributions of \({\text {x}}\) and \({\text {y}}\) as \(T_{{\text {x}},{\text {y}}}(\cdot )\). We next state the definition of LDP.

Definition 2

(LDP [13, 24]). A mechanism M satisfies \((\epsilon ,\delta )\)-LDP if and only if, for every pair of data points \(({\boldsymbol{x}},{\boldsymbol{x}}')\) and for every measurable set \({\mathcal {E}}\), we have \( \mathbb {P}\left[ {M({\boldsymbol{x}}) \in {\mathcal {E}}}\right] \le e^{\epsilon } \mathbb {P}\left[ {M({\boldsymbol{x}}') \in {\mathcal {E}}}\right] + \delta \).

For a mechanism M, we define the optimal LDP curve \(\delta _M({\epsilon })\) as the function that returns the smallest \(\delta \) for which M satisfies \(({\epsilon },\delta )\)-LDP. We next define a variant of LDP that is built upon the trade-off function in a similar manner as f-DP [12, Def. 3].

Definition 3

(f-LDP). For a function f, a mechanism M satisfies \(f\)-LDP if for every pair of data points \(({\boldsymbol{x}},{\boldsymbol{x}}')\), we have that \(T_{M({\boldsymbol{x}}), M({\boldsymbol{x}}')}(\alpha ) \ge f(\alpha ), \forall \alpha \in [0,1].\)

For a mechanism M, we define the optimal f-LDP curve \(f_M(\cdot )\) as the upper envelope of all functions f such that M satisfies f-LDP. In this paper, we regard SecAgg as an LDP mechanism and provide bounds on both its optimal LDP curve and its optimal f-LDP curve.

4 Privacy Analysis of Secure Aggregation

We consider an FL scenario with \(n+1\) clients and a central server. The model update of client i can be represented as a vector \({\textbf{x}}_i \in {\mathbb {R}}^d\). Under SecAgg, the server only learns the aggregate model update \(\bar{{\textbf{x}}}= \sum _{i=0}^n {\textbf{x}}_i\), while the individual updates \(\{{\textbf{x}}_i\}_{i=0}^n\) remain confidential.

4.1 Threat Model

Server. The server is honest and follows the FL and SecAgg protocols. We assume that it observes the exact sum \(\bar{{\textbf{x}}}\). In practical SecAgg, the clients discretize their updates (using, e.g., randomized rounding) and the server obtains a modulo sum. These operations introduce perturbations that can improve privacy [34]. However, they do not capture the essence of SecAgg, which is to use the updates of other clients to obfuscate an individual update; indeed, rounding and modulo operations can be applied even without SecAgg. We therefore ignore the perturbation caused by these operations and focus on the privacy obtained by obfuscating an update with the updates of other clients.

Clients. The clients are also honest. Client i computes the local model update \({\textbf{x}}_i\) from the global model of the previous round and its local dataset. We also assume that each vector \({\textbf{x}}_i\), \(i\in [0:n]\), has correlated entries, since these entries together describe a model, and that the vectors are mutually independent. The latter assumption holds in the first training round if the clients have independent local data. Furthermore, the independence assumption corresponds to the best-case scenario for privacy since, if the vectors are dependent, the sum reveals more information about each vector. Therefore, the privacy level for the case of independent \(\{{\textbf{x}}_i\}_{i=0}^n\) acts as an upper bound on the privacy for the case of dependent vectors. Hence, if privacy does not hold for independent updates, it will not hold for dependent updates either.

Privacy Threat. The server is curious. It seeks to infer the membership of a targeted client, say client 0, from the aggregate model update \(\bar{{\textbf{x}}}\). We consider a membership inference game [33] where: i) a challenger selects a pair of possible local updates \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) of client 0, one of which is used in the aggregation, and sends this pair to the server; ii) the server observes \(\bar{{\textbf{x}}}\) and guesses whether \({\boldsymbol{x}}_0\) or \({\boldsymbol{x}}_0'\) was submitted by client 0. Note that this attack can be an entry point for the server to further infer the data of client 0. Our goal is to quantify the capability of SecAgg to mitigate this attack.

4.2 SecAgg as a Noiseless LDP Mechanism

Hereafter, we focus on client 0; the analysis for other clients follows similarly. Our key idea is to view SecAgg through the lens of noiseless DP [7], where the contribution of other clients can be seen as noise and no external (controlled) noise is added. More precisely, for client 0, SecAgg plays the role of the mechanism

$$\begin{aligned} M({\textbf{x}}_0) = {\textbf{x}}_0 + {\textbf{y}}, \end{aligned}$$
(1)

where \({\textbf{y}}= \sum _{i=1}^n{\textbf{x}}_i\) is a source of uncontrolled noise.

The aforementioned membership inference game can be cast as follows: given \(M({\textbf{x}}_0)\), the server guesses whether it came from \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0}\) or \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0'}\) for the worst-case pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\). This game is closely related to the LDP framework. First, the trade-off between the FPR and FNR of the server’s guesses is captured by the f-LDP guarantee of M. Second, as M achieves a stronger \((\epsilon ,\delta )\)-LDP guarantee, the distributions \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0}\) and \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0'}\) become more similar, and the hypothesis test between them becomes harder. Therefore, we shall address the following question: how much LDP or f-LDP does SecAgg guarantee for client 0? Specifically, we shall establish bounds on the optimal LDP curve \(\delta _M(\epsilon )\) and optimal f-LDP curve \(f_M(\cdot )\) of the mechanism M.

4.3 Asymptotic Privacy Guarantee

Let us first focus on the large-n regime. The following asymptotic analysis will be used as inspiration for our privacy auditing procedure to establish a lower bound on the LDP curve in Sect. 4.5.

We assume that the \(\ell _2\) norm of the vectors \(\{{\textbf{x}}_i\}_{i=1}^n\) scales as \(o(\sqrt{n})\), which holds if, e.g., d is fixed. In this case, \({\textbf{y}}\) converges to a Gaussian vector when \(n\rightarrow \infty \), as stated in the next theorem.

Theorem 1

(Asymptotic noise distribution). Assume that \(\{{\textbf{x}}_i\}_{i=1}^n\) are independent, \(\Vert {\textbf{x}}_i\Vert _2 = o(\sqrt{n})\) for \(i \in [n]\), and \(\frac{1}{n}\sum _{i=1}^n\textrm{Cov}[{\textbf{x}}_i] \rightarrow {\boldsymbol{\varSigma }}\) as \(n \rightarrow \infty \). Then \(\frac{1}{\sqrt{n}}\big (\!\sum _{i=1}^n \!{\textbf{x}}_i - \mathbb {E}\left[ {\sum _{i=1}^n \!{\textbf{x}}_i}\right] \!\big )\) converges in distribution to \({\mathcal {N}}(\textbf{0},{\boldsymbol{\varSigma }})\) as \(n\rightarrow \infty \).

Proof

Theorem 1 follows by applying the multivariate Lindeberg-Feller central limit theorem [32, Prop. 2.27] to the triangular array \(\big \{\frac{{\textbf{x}}_i}{\sqrt{n}}\big \}_{n,i}\), upon verifying the Lindeberg condition \(\lim \limits _{n\rightarrow \infty } \sum _{i=1}^n \mathbb {E}\left[ {\frac{\Vert {\textbf{x}}_i\Vert _2^2}{n} {\mathbbm {1}{\left\{ \frac{\Vert {\textbf{x}}_i\Vert _2}{\sqrt{n}} > \varepsilon \right\} }}}\right] = 0, \forall \varepsilon > 0\). Since \(\Vert {\textbf{x}}_i\Vert _2 = o(\sqrt{n})\), i.e., \(\Vert {\textbf{x}}_i\Vert _2/\sqrt{n} \rightarrow 0\) as \(n\rightarrow \infty \), this condition indeed holds.    \(\square \)

Theorem 1 implies that, when n is large, under the presented assumptions, the mechanism \(\widetilde{M}({\boldsymbol{x}}_0) = M({\boldsymbol{x}}_0) - \mathbb {E}\left[ {{\textbf{y}}}\right] \) behaves like a Gaussian mechanism with noise distribution \({\mathcal {N}}(\textbf{0},{\boldsymbol{\varSigma }}_{\textbf{y}})\), where \({\boldsymbol{\varSigma }}_{\textbf{y}}\) is the covariance matrix of \({\textbf{y}}\). Furthermore, since the map from M to \(\widetilde{M}\) is simply a shift by a fixed vector \(\mathbb {E}\left[ {{\textbf{y}}}\right] \), i.e., it is a bijection, we have from the post-processing property that the optimal LDP curve of M is the same as that of \(\widetilde{M}\).
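To illustrate the convergence stated in Theorem 1, the following minimal Python sketch (using an assumed toy update distribution with correlated entries; all values are illustrative) checks numerically that the sample covariance of the normalized sum approaches the limiting covariance \({\boldsymbol{\varSigma }}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3           # number of other clients and update dimension (illustrative values)
num_trials = 50_000

# Toy update model: x_i = A u_i with u_i ~ Uniform[-1/2, 1/2]^d, so Cov[x_i] = A A^T / 12.
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
u = rng.uniform(-0.5, 0.5, size=(num_trials, n, d))
x = u @ A.T                              # each x_i = A u_i, entries correlated within a vector
y = x.sum(axis=1)                        # y = sum_{i=1}^n x_i, one sample per trial

# Normalized sum from Theorem 1: its sample covariance should approach A A^T / 12.
z = (y - y.mean(axis=0)) / np.sqrt(n)
print(np.cov(z, rowvar=False))
print(A @ A.T / 12)                      # limiting covariance for comparison
```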

We now provide privacy guarantees for a Gaussian mechanism with correlated noise, to capture the correlation between the entries of the vectors \({\textbf{x}}_i\). The next theorem, proved in Appendix A.1, is an extension of the optimal privacy curve of the uncorrelated Gaussian mechanism [4, Theorem 8].

Theorem 2

(Correlated Gaussian mechanism). Consider the mechanism \(G({\textbf{x}}) = {\textbf{x}}+ {\textbf{y}}\) where \({\textbf{x}}\) belongs to a set \({\mathcal {S}}_d \subset {\mathbb {R}}^d\), and \({\textbf{y}}\sim {\mathcal {N}}(\textbf{0}, {\boldsymbol{\varSigma }}_{\textbf{y}})\). The optimal LDP curve of G is

$$\begin{aligned} \delta _G(\epsilon ) = \varPhi \Big (\frac{\varDelta }{2} - \frac{\epsilon }{\varDelta }\Big ) - e^{\epsilon }\varPhi \Big (-\frac{\varDelta }{2} - \frac{\epsilon }{\varDelta }\Big ) \end{aligned}$$
(2)

where \(\varDelta = \max _{{\boldsymbol{x}},{\boldsymbol{x}}' \in {\mathcal {S}}_d} \varDelta _{{\boldsymbol{x}}, {\boldsymbol{x}}'}\) with

$$\begin{aligned} \varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'} \triangleq \sqrt{({\boldsymbol{x}}-{\boldsymbol{x}}')^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1}({\boldsymbol{x}}- {\boldsymbol{x}}')}. \end{aligned}$$
(3)

In Sect. 4.4, we shall verify the similarity between the privacy of SecAgg and that of the Gaussian mechanism G via numerical examples.

Parameter \(\varDelta \) is the maximum Euclidean distance between a pair of input vectors transformed by the matrix \({\boldsymbol{\varSigma }}_{\textbf{y}}^{-1/2}\) (similar to the whitening transformation). It plays the same role as the ratio between the sensitivity and the noise standard deviation in the case of uncorrelated noise [4]. We remark that the privacy guarantee of G is weakened as \(\varDelta \) increases: for a given \(\epsilon \), \(\delta _G(\epsilon )\) increases with \(\varDelta \). To achieve small \(\epsilon \) and \(\delta \), we need \(\varDelta \) to be small. The impact of \(\varDelta \) can also be seen via the hypothesis test associated with the considered membership inference game. Consider an adversary that observes an output \({\boldsymbol{z}}\) of G and tests between \(\{H: {\boldsymbol{z}}\text { came from }P_{G({\textbf{x}}) | {\textbf{x}}= {\boldsymbol{x}}}\}\) and \(\{H': {\boldsymbol{z}}\text { came from }P_{G({\textbf{x}}) | {\textbf{x}}= {\boldsymbol{x}}'}\}\). This is effectively a test between \({\mathcal {N}}({\boldsymbol{x}}, {\boldsymbol{\varSigma }}_{\textbf{y}})\) and \({\mathcal {N}}({\boldsymbol{x}}', {\boldsymbol{\varSigma }}_{\textbf{y}})\). The trade-off function for this test is stated in the following proposition, which is proved in Appendix A.3.

Proposition 1

\(T_{{\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}_{\textbf{y}}), {\mathcal {N}}({\boldsymbol{x}}',{\boldsymbol{\varSigma }}_{\textbf{y}})}(\alpha ) = \varPhi \big (\varPhi ^{-1}(1-\alpha ) - \varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\big )\), \(\alpha \in [0,1]\).

The trade-off function decreases with \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\). A large \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\) facilitates the distinguishability of the pair \(({\boldsymbol{x}},{\boldsymbol{x}}')\), and thus weakens the privacy guarantee. Furthermore, the worst-case pair \(({\boldsymbol{x}},{\boldsymbol{x}}')\) that minimizes the trade-off function is given by the maximizer of \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\). It follows that the optimal f-LDP curve of the Gaussian mechanism G is \(f_G(\alpha ) = \varPhi \big (\varPhi ^{-1}(1-\alpha ) - \varDelta \big )\), \(\alpha \in [0,1]\).
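As a concrete reference for Theorem 2 and Proposition 1, the following minimal Python sketch evaluates \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\) in (3), the optimal LDP curve (2), and the trade-off curve of Proposition 1 for an assumed noise covariance matrix and input pair; the numerical values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def mahalanobis_delta(x, x_prime, sigma_y):
    """Delta_{x,x'} in Eq. (3): Euclidean distance after whitening by Sigma_y^{-1/2}."""
    diff = np.asarray(x) - np.asarray(x_prime)
    return float(np.sqrt(diff @ np.linalg.solve(sigma_y, diff)))

def delta_G(eps, Delta):
    """Optimal LDP curve of the correlated Gaussian mechanism, Eq. (2)."""
    return norm.cdf(Delta / 2 - eps / Delta) - np.exp(eps) * norm.cdf(-Delta / 2 - eps / Delta)

def f_G(alpha, Delta):
    """Optimal f-LDP (trade-off) curve of the Gaussian mechanism, Proposition 1."""
    return norm.cdf(norm.ppf(1 - alpha) - Delta)

# Illustrative values for the noise covariance and the worst-case input pair.
sigma_y = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
x, x_prime = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
Delta = mahalanobis_delta(x, x_prime, sigma_y)
print(Delta, delta_G(1.0, Delta), f_G(0.05, Delta))
```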

4.4 Upper Bounding \(\delta _M(\epsilon )\) via Dominating Pairs of Distributions

We now upper-bound the optimal LDP curve of M in (1) for finite n. We shall then consider the case in which the upper bound is tight and verify the convergence of SecAgg to a Gaussian mechanism.

We define the hockey-stick divergence with parameter \(\alpha \) between two probability measures P and Q as \({\textsf{E}}_\alpha (P\Vert Q) = \sup _{{\mathcal {E}}} (P({\mathcal {E}}) - \alpha Q({\mathcal {E}}))\). We also write the hockey-stick divergence between the distributions of \({\text {x}}\) and \({\text {y}}\) as \({\textsf{E}}_\alpha ({\text {x}}\Vert {\text {y}})\). The condition for LDP in Definition 2 is equivalent to \(\textstyle \sup _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} {\textsf{E}}_{e^{\epsilon }}(M({\boldsymbol{x}}) \Vert M({\boldsymbol{x}}')) \le \delta .\) Therefore, the optimal LDP curve of mechanism M is given by

$$\begin{aligned} \delta _M({\epsilon }) = \textstyle \sup _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} {\textsf{E}}_{e^{\epsilon }}(M({\boldsymbol{x}}) \Vert M({\boldsymbol{x}}'))\,. \end{aligned}$$
(4)

A pair of measures (P, Q) is called a dominating pair of distributions for M if

$$\begin{aligned} \textstyle \sup _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} {\textsf{E}}_{e^{\epsilon }}(M({\boldsymbol{x}}) \Vert M({\boldsymbol{x}}')) \le {\textsf{E}}_{e^{\epsilon }}(P \Vert Q),\quad \forall \epsilon \ge 0\,. \end{aligned}$$
(5)

If equality is achieved in (5) for every \(\epsilon \ge 0\), then (P, Q) is said to be a tightly dominating pair of distributions for M. For each dominating pair (P, Q), we associate a privacy-loss random variable \({\text {L}}\triangleq \ln \frac{\textrm{d}P}{\textrm{d}Q}({\textbf{y}})\) with \({\textbf{y}}\sim P\), where \(\frac{\textrm{d}P}{\textrm{d}Q}\) is the Radon-Nikodym derivative. We have that

$$\begin{aligned} {\textsf{E}}_{e^\epsilon }(P \Vert Q) = \mathbb {E}\left[ {(1-e^{\epsilon - {\text {L}}})^+}\right] \triangleq \delta _{\text {L}}(\epsilon )\,. \end{aligned}$$
(6)

It follows readily that \(\delta _{\text {L}}(\epsilon )\) is an upper bound on the optimal LDP curve \(\delta _M(\epsilon )\).

Without a known distribution of \({\textbf{y}}\), it is challenging to characterize a dominating pair of distributions for the mechanism M in (1). In the next theorem, proved in Appendix B, we make some assumptions on \(P_{\textbf{y}}\) to enable such characterization.

Theorem 3

(Dominating pair of distributions). Let \({\textbf{x}}_0 = ({\text {x}}_{01}, {\text {x}}_{02}, \dots , {\text {x}}_{0d})\) and assume that \(\underline{r}_j \le {\text {x}}_{0j} \le \overline{r}_j\), \(j\in [d]\). Assume further that \({\textbf{y}}\) has independent entries, i.e., \(P_{\textbf{y}}= P_{{\text {y}}_1} \times \dots \times P_{{\text {y}}_d}\), and that the marginal distributions \(\{P_{{\text {y}}_j}\}\) are log-concave and symmetric. Then, a dominating pair of distributions for the mechanism \(M({\textbf{x}}_0)\) in (1) is given by \((P_{\underline{r}_1 + {\text {y}}_1} \times \dots \times P_{\underline{r}_d + {\text {y}}_d}, P_{\overline{r}_1 + {\text {y}}_1} \times \dots \times P_{\overline{r}_d + {\text {y}}_d})\).

The family of log-concave distributions includes the typical noise distributions in DP, namely, the Gaussian and Laplace distributions, as well as many other common distributions, e.g., the exponential and uniform distributions [3]. If each vector \({\textbf{x}}_i, i\in [n],\) has independent entries following a log-concave distribution, then so does the sum \({\textbf{y}}= \sum _{i=1}^n {\textbf{x}}_i\), because log-concavity is closed under convolutions [28]. Under the presented assumptions, Theorem 3 allows us to characterize an upper bound on the LDP curve of M as \( \delta _{\text {L}}(\epsilon ) \) with \({\text {L}}= \sum _{j=1}^d \ln \frac{P_{\underline{r}_j + {\text {y}}_j}({\text {z}}_j)}{P_{\overline{r}_j + {\text {y}}_j}({\text {z}}_j)}\) where \({\text {z}}_j \sim P_{\underline{r}_j + {\text {y}}_j}\), \(j\in [d]\).
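As a minimal illustration of this upper bound, the following Python sketch estimates \(\delta _{\text {L}}(\epsilon )\) by Monte Carlo sampling of the PLRV for the case in which each entry of \({\textbf{x}}_i\) follows an exponential distribution with parameter 1 (so that each entry of \({\textbf{y}}\) follows a Gamma distribution with shape n), with assumed bounds \(\underline{r}_j = 0\) and \(\overline{r}_j = 4\); this matches the first special case considered below. The parameter values are illustrative, and the Monte Carlo estimate only approximates the exact curve.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
n, d = 100, 10                 # number of other clients and model dimension (illustrative)
r_lo, r_hi = 0.0, 4.0          # assumed per-entry range of x_0
num_samples = 200_000

# Entries of y = sum_i x_i are i.i.d. Gamma(n, 1) when the entries of each x_i are Exp(1).
z = r_lo + rng.gamma(shape=n, scale=1.0, size=(num_samples, d))      # z ~ P_{r_lo + y}

# PLRV: L = sum_j ln p_y(z_j - r_lo) / p_y(z_j - r_hi); the ratio is +inf where z_j <= r_hi.
log_num = gamma.logpdf(z - r_lo, a=n)
log_den = np.where(z > r_hi, gamma.logpdf(np.maximum(z - r_hi, 1e-300), a=n), -np.inf)
L = (log_num - log_den).sum(axis=1)

for eps in [1.0, 3.0, 5.0]:
    # Eq. (6): delta_L(eps) = E[(1 - exp(eps - L))^+]; clipping the exponent keeps it stable.
    delta_hat = np.mean(1.0 - np.exp(np.minimum(eps - L, 0.0)))
    print(eps, delta_hat)
```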

Corollary 1

If the support of \({\textbf{x}}_0\) contains \((\underline{r}_1,\dots ,\underline{r}_d)\) and \((\overline{r}_1,\dots ,\overline{r}_d)\), the dominating pair of distributions in Theorem 3 becomes tight, and the resulting upper bound \(\delta _{\text {L}}(\epsilon )\) is the optimal LDP curve.

We now use Corollary 1 to evaluate the optimal LDP curve of mechanism M in (1) when each \({\textbf{x}}_i\) has independent entries. We aim to verify the convergence of SecAgg to a Gaussian mechanism implied by Theorem 1 and understand how the LDP curve depends on the model size d. We consider two cases. In the first case, the entries follow the exponential distribution with parameter 1, and thus \({\textbf{y}}\) has independent entries following the Gamma distribution with shape n and scale 1. For convenience, we further assume that \({\textbf{x}}_{0}\) is truncated such that \(0\le {\text {x}}_{0j} \le 4\), \(j\in [d]\). In the second case, the entries are uniformly distributed in \([-1/2,1/2]\), and thus \({\textbf{y}}\) has independent entries following the shifted Irwin-Hall distribution with PDF \(p_{{\text {y}}_i}(y) = \frac{1}{(n-1)!} \sum _{k=0}^n (-1)^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \big [(y+n/2-k)^+\big ]^{n-1}\). Both cases satisfy the conditions of Corollary 1. We can therefore obtain the optimal LDP curves and depict them in Fig. 1. We also show the optimal LDP curve of the Gaussian mechanism G with the same noise covariance matrix \({\boldsymbol{\varSigma }}_{\textbf{y}}\). We see that the optimal LDP curve of M is indeed close to that of G, even for a small value of n in the second case. Furthermore, although Theorem 1 assumes a fixed d and \(n \rightarrow \infty \), Fig. 1 suggests that M behaves similarly to a Gaussian mechanism even for large d. Remarkably, for a given \(\epsilon \), the parameter \(\delta \) increases rapidly with d, indicating that the privacy of SecAgg is weak for high-dimensional models.

Fig. 1. The optimal LDP curve of M in (1) where each \({\textbf{x}}_i\) has independent entries, compared with the Gaussian mechanism G with the same noise covariance matrix.

4.5 Lower Bounding \(\delta _M(\epsilon )\) and Upper Bounding \(f_M(\cdot )\) via Privacy Auditing

In practical FL, the updates typically have a distribution that does not satisfy the conditions of Theorem 3 and is not known in closed form. Therefore, we now establish a numerical routine to compute a lower bound on the optimal LDP curve and an upper bound on the optimal f-LDP curve of M. The proposed numerical routine exploits the similarity of SecAgg and a Gaussian mechanism as discussed in Sects. 4.3 and 4.4. The bounds are based on the following result.

Proposition 2

(LDP via the trade-off function). A mechanism M satisfies \((\epsilon ,\delta )\)-LDP if and only if for every \(\alpha \in [0,1]\),

$$\begin{aligned} \epsilon \ge \ln \max \left\{ \frac{1-\delta -\alpha }{\inf _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} T_{M({\boldsymbol{x}}),M({\boldsymbol{x}}')}(\alpha )}, \frac{1-\delta -\inf _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} T_{M({\boldsymbol{x}}),M({\boldsymbol{x}}')}(\alpha )}{\alpha }\right\} \,. \end{aligned}$$
(7)

The proof of Proposition 2 follows from arguments similar to those used for DP in [23, Thm. 2.1]. This proposition implies that, if a pair \((\textrm{FPR},\textrm{FNR})\) is achievable by some decision rule \(\phi \) between the distributions of \(M({\boldsymbol{x}})\) and \(M({\boldsymbol{x}}')\) for some \(({\boldsymbol{x}},{\boldsymbol{x}}')\), then the mechanism does not satisfy \((\epsilon ,\delta )\)-LDP for \(\delta \in [0,1]\) and

$$\begin{aligned} \epsilon < \ln \max \left\{ \frac{1-\delta -\textrm{FPR}}{\textrm{FNR}}, \frac{1-\delta -\textrm{FNR}}{\textrm{FPR}}\right\} \,. \end{aligned}$$
(8)

This gives a lower bound on the optimal LDP curve of M. Furthermore, it follows readily from Definition 3 that, for an achievable pair \((\textrm{FPR},\textrm{FNR})\) of the mentioned test, the mechanism does not satisfy f-LDP for any trade-off function f such that \(f(\textrm{FPR}) < \textrm{FNR}\). Therefore, a collection of achievable pairs \((\textrm{FPR}, \textrm{FNR})\) constitutes an upper bound on the optimal f-LDP curve of M.
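A minimal sketch of this audit step: given an achievable \((\textrm{FPR},\textrm{FNR})\) pair and a target \(\delta \), it returns the value of \(\epsilon \) below which \((\epsilon ,\delta )\)-LDP is ruled out by (8); the example numbers are illustrative.

```python
import numpy as np

def eps_lower_bound(fpr, fnr, delta):
    """Eq. (8): (eps, delta)-LDP is violated for every eps below the returned value."""
    best = max((1.0 - delta - fpr) / fnr, (1.0 - delta - fnr) / fpr)
    return np.log(best) if best > 1.0 else 0.0   # the bound is vacuous when both ratios are <= 1

# An attack achieving FPR = FNR = 0.005 rules out all eps below about 5.3 at delta = 10^-3.
print(eps_lower_bound(0.005, 0.005, 1e-3))
```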

We shall use this result to perform privacy auditing [21, 33]. Specifically, following the defined membership inference game (see privacy threat in Sect. 4.1), we conduct a hypothesis test between \(\{H :{\boldsymbol{z}}\text { came from }P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0}\}\) and \(\{H' :{\boldsymbol{z}}\text { came from }P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0'}\}\) for a given output \({\boldsymbol{z}}\) of M. That is, we select a pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) for the challenger and a decision rule \(\phi \) for the server. We then evaluate the achievable pair \((\textrm{FPR}, \textrm{FNR})\), and obtain therefrom a lower bound on the optimal LDP curve and an upper bound on the optimal f-LDP curve of M. To design the attack, we draw inspiration from the asymptotic analysis in Sect. 4.3 as follows.

We consider the likelihood-ratio test, i.e., the test rejects H if

$$\begin{aligned} \ln \frac{P_{M({\textbf{x}}_0) \,\vert \,{\textbf{x}}_0}({\boldsymbol{z}}\,\vert \,{\boldsymbol{x}}_0)}{P_{M({\textbf{x}}_0) \,\vert \,{\textbf{x}}_0}({\boldsymbol{z}}\,\vert \,{\boldsymbol{x}}_0')} = \ln \frac{P_{{\textbf{y}}}({\boldsymbol{z}}- {\boldsymbol{x}}_0)}{P_{{\textbf{y}}}({\boldsymbol{z}}- {\boldsymbol{x}}_0')} \le \theta \end{aligned}$$
(9)

for a given threshold \(\theta \). We choose the input pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) as the worst-case pair, i.e., \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0') = \mathop {\mathrm {arg\,min}}\limits _{{\boldsymbol{x}},{\boldsymbol{x}}'} T_{M({\boldsymbol{x}}),M({\boldsymbol{x}}')}(\alpha ), ~\forall \alpha \in [0,1]\). However, the trade-off function is not known in closed-form in general, and thus finding the worst-case pair is challenging. Motivated by Theorem 1, we treat \({\textbf{y}}\) as a Gaussian vector with the same mean \({\boldsymbol{\mu }}_{{\textbf{y}}}\) and covariance \({\boldsymbol{\varSigma }}_{\textbf{y}}\). We thus approximate the trade-off function \(T_{M({\boldsymbol{x}}),M({\boldsymbol{x}}')}(\cdot )\) by \(T_{{\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}_{\textbf{y}}),{\mathcal {N}}({\boldsymbol{x}}',{\boldsymbol{\varSigma }}_{\textbf{y}})}(\cdot )\), and choose \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) as the minimizer of the latter. Using Proposition 1, we have that

$$\begin{aligned} ({\boldsymbol{x}}_0,{\boldsymbol{x}}_0') = \mathop {\mathrm {arg\,max}}\limits _{{\boldsymbol{x}},{\boldsymbol{x}}' \in {\mathcal {X}}_0} \varDelta _{{\boldsymbol{x}}, {\boldsymbol{x}}'} \end{aligned}$$
(10)

where \({\mathcal {X}}_0\) is the support of \({\textbf{x}}_0\).

If the server does not know \(P_{\textbf{y}}({\boldsymbol{y}})\) in closed form, we let it approximate \(P_{\textbf{y}}({\boldsymbol{y}})\) as \({\mathcal {N}}({\boldsymbol{y}}; {\boldsymbol{\mu }}_{\textbf{y}}, {\boldsymbol{\varSigma }}_{\textbf{y}})\). That is, the test rejects \({\boldsymbol{x}}_0\) if

$$\begin{aligned} \ln \frac{{\mathcal {N}}({\boldsymbol{z}}- {\boldsymbol{x}}_0; {\boldsymbol{\mu }}_{\textbf{y}}, {\boldsymbol{\varSigma }}_{\textbf{y}})}{{\mathcal {N}}({\boldsymbol{z}}- {\boldsymbol{x}}_0'; {\boldsymbol{\mu }}_{\textbf{y}}, {\boldsymbol{\varSigma }}_{\textbf{y}})} \le \theta \,. \end{aligned}$$
(11)

Moreover, if the server does not know \({\boldsymbol{\mu }}_{\textbf{y}}\) and \({\boldsymbol{\varSigma }}_{\textbf{y}}\) but can generate samples from \(P_{\textbf{y}}\), we let it estimate \({\boldsymbol{\mu }}_{\textbf{y}}\) and \({\boldsymbol{\varSigma }}_{\textbf{y}}\) as the sample mean and sample covariance matrix, and use these estimates instead of the true values in (10) and (11).
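The following minimal Python sketch summarizes the resulting attack, assuming the server already holds estimates of \({\boldsymbol{\mu }}_{\textbf{y}}\) and \({\boldsymbol{\varSigma }}_{\textbf{y}}\) and a pool of candidate updates for client 0 (all variable and function names are illustrative): it selects the worst-case input pair via (10) and evaluates the test statistic in (11).

```python
import numpy as np
from itertools import combinations

def mahalanobis(diff, sigma_y):
    """Whitened distance Delta_{x,x'} used in Eq. (10)."""
    return float(np.sqrt(diff @ np.linalg.solve(sigma_y, diff)))

def worst_case_pair(candidates, sigma_y):
    """Eq. (10): pick the pair of candidate updates with the largest whitened distance."""
    return max(combinations(candidates, 2),
               key=lambda pair: mahalanobis(pair[0] - pair[1], sigma_y))

def gaussian_llr(z, x0, x0_prime, mu_y, sigma_y):
    """Log-likelihood ratio in Eq. (11); the normalizing constants of the Gaussian cancel."""
    def quad(residual):
        return -0.5 * residual @ np.linalg.solve(sigma_y, residual)
    return quad(z - x0 - mu_y) - quad(z - x0_prime - mu_y)

# Usage sketch: the test rejects the hypothesis "client 0 submitted x0" when the statistic
# gaussian_llr(z, x0, x0_prime, mu_y_hat, sigma_y_hat) is at most the threshold theta.
```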

We evaluate the \(\textrm{FNR}\) and \(\textrm{FPR}\) of the test via Monte Carlo simulation. Specifically, we repeat the test \(N_{\textrm{s}}\) times and count the number of false negatives \(N_{\textrm{FN}}\) and the number of false positives \(N_{\textrm{FP}}\). We obtain a high-confidence upper bound on the FNR using the Clopper-Pearson method [10] as \(\overline{\textrm{FNR}} = B(1-\gamma /2; N_{\textrm{FN}} + 1, N_{\textrm{s}} - N_{\textrm{FN}})\), where \(B(x; a, b)\) is the quantile of the Beta distribution with shapes \((a, b)\), and \(1-\gamma \) is the confidence level. A high-confidence upper bound \(\overline{\textrm{FPR}}\) on the FPR is obtained similarly. By varying the threshold \(\theta \), we obtain an empirical trade-off curve of \(\overline{\textrm{FNR}}\) vs. \(\overline{\textrm{FPR}}\). This curve is an upper bound on the optimal f-LDP curve of SecAgg. For a given \(\delta \in [0,1]\), we also compute a lower confidence bound on the smallest \(\epsilon \) for which SecAgg satisfies \((\epsilon ,\delta )\)-LDP. Specifically, we use \(\overline{\textrm{FNR}}\) and \(\overline{\textrm{FPR}}\) in place of \(\textrm{FNR}\) and \(\textrm{FPR}\) in (8). Note that \(\overline{\textrm{FNR}}\) and \(\overline{\textrm{FPR}}\) are lower bounded by \(B(1-\gamma /2; 1, N_{\textrm{s}})\) even if \(N_{\textrm{FN}} = N_{\textrm{FP}} = 0\). Therefore, the estimated \(\epsilon \) is upper bounded by \(\overline{\epsilon }_\delta \triangleq \ln \frac{1 - \delta - B(1-\gamma /2; 1, N_{\textrm{s}})}{B(1-\gamma /2; 1, N_{\textrm{s}})}.\) That is, it is impossible to audit an arbitrarily large \(\epsilon \) with a finite number of trials.
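A minimal sketch of these confidence computations, using SciPy's Beta quantile function; the error counts and parameters are assumed values for illustration.

```python
import numpy as np
from scipy.stats import beta

def cp_upper(num_errors, num_trials, gamma=0.05):
    """Clopper-Pearson upper (1 - gamma/2)-confidence bound on an error rate."""
    return beta.ppf(1.0 - gamma / 2.0, num_errors + 1, num_trials - num_errors)

N_s, gamma, delta = 5000, 0.05, 1e-3
fnr_bar = cp_upper(25, N_s, gamma)       # e.g., 25 false negatives out of 5000 trials (assumed)
fpr_bar = cp_upper(25, N_s, gamma)

# Largest auditable epsilon with N_s trials, reached when no errors are observed.
floor = beta.ppf(1.0 - gamma / 2.0, 1, N_s)
eps_bar = np.log((1.0 - delta - floor) / floor)
print(fnr_bar, fpr_bar, eps_bar)
```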

5 Experiments and Discussion

Experimental Setting. We consider federated averaging with \(n_\textrm{tot} = 100\) clients, of which \(n+1\) clients are randomly selected in each round. The experiments are conducted for a classification problem on the ADULT dataset [5] and the EMNIST Digits dataset [11]. The ADULT dataset contains \({30\,162}\) entries with 104 features; the entries belong to two classes, with 7508 positive labels and \({22\,654}\) negative labels. The EMNIST Digits dataset contains \({280\,000}\) images of size \(28\times 28\) of handwritten digits belonging to 10 balanced classes. We allocate the training samples among the \(n_\textrm{tot}\) clients according to a latent Dirichlet allocation model with concentration parameter \(\omega \). Here, with \(\omega \rightarrow \infty \), the training samples are distributed evenly and uniformly among the clients; with \(\omega \rightarrow 0\), each client holds samples from only one class. We consider a single-layer neural network and use the cross-entropy loss and stochastic gradient descent with a learning rate of 0.01 and batch size of 64. The model size is \(d = 210\) for ADULT and \(d = 7850\) for EMNIST Digits.

We focus on the first round, containing one local epoch, of federated averaging and perform privacy auditing for a fixed initial model, which is known to the server. Note that performing an attack in the first round is the most challenging because, in later rounds, the server accumulates more observations. Let \(\{{\textbf{x}}_i\}_{i=0}^n\) be the local updates of the selected clients in the first round. The server does not know the distribution of \(\{{\textbf{x}}_i\}_{i=0}^n\) in closed form, but can sample from this distribution by simulating the learning scenario. Note that it is a common assumption in membership inference attacks that the adversary can sample from the population [33]. We let the server compute the sample mean \(\hat{{\boldsymbol{\mu }}}_{\textbf{x}}\) and sample covariance matrix \(\widehat{\boldsymbol{\varSigma }}_{\textbf{x}}\) from \({25\,000}\) samples of \({\textbf{x}}_i\), then estimate the mean and covariance matrix of \({\textbf{y}}= \sum _{i=1}^n {\textbf{x}}_i\) as \(\hat{{\boldsymbol{\mu }}}_{\textbf{y}}= n\hat{\boldsymbol{\mu }}_{\textbf{x}}\) and \(\widehat{\boldsymbol{\varSigma }}_{\textbf{y}}= n \widehat{\boldsymbol{\varSigma }}_{\textbf{x}}\). The server then uses \(\hat{{\boldsymbol{\mu }}}_{\textbf{y}}\) and \(\widehat{\boldsymbol{\varSigma }}_{\textbf{y}}\) for privacy auditing, as described in Sect. 4.5. Following (10), we find the worst-case input pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) by searching for the maximizer of \(\varDelta _{{\boldsymbol{x}}_0,{\boldsymbol{x}}_0'}\) among 5000 and 1000 samples of \({\textbf{x}}_0\) for the ADULT and EMNIST Digits datasets, respectively. The Clopper-Pearson confidence level is \(1-\gamma = 95\%\). For a given initial model, we consider \(N_\textrm{s} = 5000\) trials with random data partition and batch selection. In the simulation results, we report the average of \(\overline{\textrm{FPR}}\), \(\overline{\textrm{FNR}}\), and the audited \((\epsilon ,\delta )\) over 10 and 5 initial models for the ADULT and EMNIST Digits datasets, respectively.
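A minimal sketch of this moment-estimation step (placeholder Gaussian data stands in for the shadow updates that the server would obtain by re-simulating local training; the dimensions follow the ADULT setup, and the variable names are illustrative):

```python
import numpy as np

# Placeholder shadow samples of a single client's update, shape (num_shadow, d).
shadow_updates = np.random.default_rng(0).normal(size=(25_000, 210))

n = 59                                   # number of other selected clients (illustrative)
mu_x_hat = shadow_updates.mean(axis=0)
sigma_x_hat = np.cov(shadow_updates, rowvar=False)

# y is the sum of n i.i.d. updates, so its estimated moments scale linearly with n.
mu_y_hat = n * mu_x_hat
sigma_y_hat = n * sigma_x_hat
```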

Homogeneous Data Partitioning. We first consider \(\omega = \infty \). In Fig. 2(a), we show the trade-off between the estimated FNR and FPR for the ADULT dataset, achieved by varying the threshold \(\theta \) in (11) for \(n+1\in \{60, 70, 90\}\) clients. Both the FNR and FPR can be as small as 0.005 simultaneously. Hence, the server can reliably distinguish the selected input pair, and the membership inference attack is successful. We note that reference trade-off curves representing different privacy levels are given in [12, Fig. 3]; there, the case with both FNR and FPR equal to 0.07 is already considered nonprivate. Comparing Fig. 2(a) with this reference, we conclude that SecAgg provides essentially no privacy for the ADULT dataset. Next, in Fig. 2(b), we show the average audited LDP curves for the ADULT dataset. We observe that the audited LDP curves are close to the largest auditable \((\overline{\epsilon }_\delta ,\delta )\) for the considered \(N_\textrm{s}\) and \(\gamma \). As \(\epsilon \) increases, \(\delta \) remains high until it drops due to the limit of the Clopper-Pearson method: even for \(\epsilon =7\), we have \(\delta >10^{-1}\). Furthermore, increasing n provides only a marginal privacy improvement. This shows that the privacy of SecAgg, viewed through the lens of LDP, is weak.

Fig. 2. The audited FPR vs. FNR trade-off and LDP curve, averaged over 10 initial models, for SecAgg in federated learning on the ADULT dataset with homogeneous data partitioning. Here, \(d = 210\).

Fig. 3. Audited FPR vs. FNR trade-off and LDP curves, averaged over 5 initial models, for federated learning with SecAgg on the EMNIST Digits dataset with homogeneous data partitioning. Here, \(d = 7850\).

In Fig. 3, we show the FPR vs. FNR trade-off and the audited LDP curve for the EMNIST Digits dataset. Similar conclusions hold: SecAgg provides weak privacy. In this case, with a larger model size than the ADULT dataset, the adversary achieves even smaller FPR and FNR simultaneously.

Heterogeneous Data Partitioning. We next consider \(\omega = 1\) and show the FPR vs. FNR trade-off and the audited LDP curve for the EMNIST Digits dataset in Fig. 4. In this case, the FPR and FNR are simultaneously reduced with respect to the homogeneous case, and the audited \((\epsilon ,\delta )\) coincide with the largest auditable values. This is because the worst-case pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) is better separated than in the homogeneous case and thus easier to distinguish.

Fig. 4. Same as Fig. 3 but with heterogeneous data partitioning.

Discussion. We have seen that SecAgg is expected to perform like a correlated Gaussian mechanism (see Sect. 4.3). Why does SecAgg fail to prevent membership inference attacks, given that the Gaussian mechanism (with appropriate noise calibration) is known to be effective? We explain this as follows. We assume that the individual updates have entries with a bounded magnitude such that \(\Vert {\textbf{x}}_i\Vert _2 \le r \sqrt{d}\), \(i\in [0:n]\). For large n, we expect the mechanism M in (1) to have a privacy guarantee similar to that of G in Theorem 2 with \({\mathcal {S}}_d = \{{\boldsymbol{x}}\in {\mathbb {R}}^d:\Vert {\boldsymbol{x}}\Vert _2 \le r \sqrt{d}\}\) and \({\boldsymbol{\varSigma }}_{\textbf{y}}\) being the covariance matrix of \(\sum _{i=1}^n {\textbf{x}}_i\). In this case, \(\varDelta \ge 2\sqrt{d/n}\) (see Appendix A.2). A strong privacy guarantee requires \(\varDelta \) to be small, which implies that d/n must be small. This suggests that the privacy guarantee of SecAgg is weak if the vector dimension d is large compared to the number of clients n. This is, however, the case in most practical FL scenarios, as the model size is typically much larger than the number of clients. Note that reducing the ratio d/n by dividing the model into smaller chunks that are federated via SecAgg does not solve this issue, as the privacy loss composes over these chunks.
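As a rough numerical illustration (using the Gaussian approximation and the lower bound on \(\varDelta \)): with \(d = 7850\) and \(n = 89\), as in our EMNIST experiments with 90 selected clients, we have \(\varDelta \ge 2\sqrt{d/n} \approx 18.8\). Substituting this value into (2) gives \(\delta _G(\epsilon ) \approx \varPhi (\varDelta /2 - \epsilon /\varDelta ) \approx \varPhi (8.9) \approx 1\) even for \(\epsilon = 10\), i.e., essentially no privacy.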

While our results show that SecAgg provides weak privacy already for small models, where d is on the order of \(10^2\) to \(10^3\), we remark that the privacy guarantee is expected to deteriorate further for larger models. This is supported by the rapid degradation with d of the privacy guarantee of the Gaussian mechanism (see Fig. 1) and by the similarity between SecAgg and a Gaussian mechanism.

6 Conclusions

We analyzed the privacy of SecAgg through the lens of LDP. Via privacy auditing, we showed that membership inference attacks on the output of SecAgg succeed with high probability: adding independent local updates is not sufficient to hide a local update when the model is of high dimension. While this result may not be surprising, our work fills an important gap by providing a formal analysis of the privacy of SecAgg and challenges the prevailing claims of the privacy robustness of SecAgg. Hence, it underscores that additional privacy mechanisms, such as noise addition, are needed in federated learning.