Abstract
Secure aggregation (SecAgg) is a commonly-used privacy-enhancing mechanism in federated learning, affording the server access only to the aggregate of model updates while safeguarding the confidentiality of individual updates. Despite widespread claims regarding SecAgg’s privacy-preserving capabilities, a formal analysis of its privacy is lacking, making such presumptions unjustified. In this paper, we delve into the privacy implications of SecAgg by treating it as a local differential privacy (LDP) mechanism for each local update. We design a simple attack wherein an adversarial server seeks to discern which update vector a client submitted, out of two possible ones, in a single training round of federated learning under SecAgg. By conducting privacy auditing, we assess the success probability of this attack and quantify the LDP guarantees provided by SecAgg. Our numerical results unveil that, contrary to prevailing claims, SecAgg offers weak privacy against membership inference attacks even in a single training round. Indeed, it is difficult to hide a local update by adding other independent local updates when the updates are of high dimension. Our findings underscore the imperative for additional privacy-enhancing mechanisms, such as noise injection, in federated learning.
1 Introduction
Federated learning (FL) [27] allows multiple clients to collaboratively train a machine learning model. In each training round, the clients share their local model updates with a central server, which then aggregates them to improve the global model. Although raw data is not shared in the clear, vanilla FL is prone to model-inversion attacks [18] and membership-inference attacks [29]. To mitigate such attacks, secure aggregation (SecAgg) [9] has been proposed, where the clients jointly mask their local model updates so that only the aggregate is revealed to the server.
Many papers explicitly or implicitly assume that SecAgg provides strong privacy against honest-but-curious servers in a single round [16, 19, 30, 31]. However, a formal analysis of the privacy offered by SecAgg is lacking, making this presumption unjustified. SecAgg has been combined with differential privacy (DP) [14] to ensure that the server only sees the aggregate of the noisy local updates [2, 22]. However, in these works, the privacy analysis does not account for SecAgg. It remains unclear how much privacy SecAgg by itself provides for individual updates.
Main Contributions. We address the question: how much privacy does SecAgg by itself guarantee for the local updates? Specifically, we formally analyze the privacy of SecAgg against membership inference attacks wherein the server aims to distinguish, from two potential update vectors, the one a client submitted in a single training round of FL with SecAgg. Our approach consists in treating SecAgg as a local differential privacy (LDP) mechanism for each update, where the sum of the other clients’ updates plays the role of a source of uncontrolled noise. We then characterize the privacy parameters \((\epsilon ,\delta )\) for SecAgg to satisfy \((\epsilon ,\delta )\)-LDP via the following steps.
-
We show that, under some practical assumptions, as the client population grows, the sum of the clients’ updates converges to a Gaussian vector (Theorem 1). We analyze the optimal privacy guarantee of the Gaussian mechanism with correlated noise (Theorem 2).
-
We evaluate the optimal LDP parameters of SecAgg in some special cases (Theorem 3 and Corollary 1) and verify that these parameters are close to that of a Gaussian mechanism, even for a small number of clients (Fig. 1).
-
Exploiting the similarity of SecAgg and a Gaussian mechanism, we audit the privacy of SecAgg. Specifically, we design a simple membership inference attack wherein the server regards SecAgg as a Gaussian mechanism with correlated noise. We then evaluate the achievable false negative rate (FNR) and false positive rate (FPR) of this attack and use these values to compute a lower bound on the smallest \(\epsilon \) for SecAgg to satisfy \((\epsilon ,\delta )\)-LDP.
We apply our privacy auditing procedure to federated averaging for a classification problem on the ADULT dataset [5] and the EMNIST Digits dataset [11]. We show that both the FNR and FPR can be small simultaneously, and the audited \((\epsilon ,\delta )\) are high. Our results reveal that SecAgg provides weak privacy even for a single training round. Indeed, it is difficult to hide a local update by adding other independent local updates when the updates are of high dimension. Therefore, SecAgg cannot be used as a sole privacy-enhancing mechanism in FL.
2 Related Work
Secure Aggregation. Based on cryptographic multi-party computation, SecAgg ensures that the central server sees only the aggregate of the clients’ local updates, while individual updates are kept confidential. This is achieved by letting the clients jointly add randomly sampled masks to their updates via secret sharing, such that when the masked updates are aggregated, the masks cancel out [6, 9]. With SecAgg, a client’s update is obfuscated by many other clients’ updates. However, the level of privacy provided by SecAgg lacks a formal analysis. In [16], this level was measured by the mutual information between a local update and the aggregated update. However, mutual information only measures the average privacy leakage and does not capture the threat to the most vulnerable data points. Furthermore, the bound provided in [16] is not explicit, i.e., not computable.
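The pairwise-masking idea can be illustrated with a minimal sketch. This is a toy illustration only: the actual protocols of [6, 9] secret-share PRG seeds, work over a finite field, and handle client dropouts, all of which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, d = 5, 4
updates = rng.normal(size=(n_clients, d))   # toy local model updates

# Pairwise masks: each pair of clients i < j agrees on a random vector r_{ij};
# client i adds it and client j subtracts it, so all masks cancel in the sum.
masks = {(i, j): rng.normal(size=d)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = updates.copy()
for (i, j), r in masks.items():
    masked[i] += r
    masked[j] -= r

# The server sees only the masked updates; each one looks random,
# yet their sum equals the true aggregate.
print("aggregate recovered:", np.allclose(masked.sum(axis=0), updates.sum(axis=0)))
```

Each masked update is the true update plus a sum of independent Gaussian vectors, so individually it reveals nothing useful; only the aggregate is preserved.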
Differential Privacy. DP is a rigorous privacy measure that quantifies the ability of an adversary to guess which dataset, out of two neighboring ones, a model was trained on [14, 15]. DP is typically achieved by adding noise to the model/gradients obtained from the dataset [1]. A variant of DP is LDP [13, 24], where the noise is added to individual data points. When applied to achieve client-level privacy in FL, LDP lets the clients add noise to their updates before sending the updates to the server.
Privacy Attacks in FL with SecAgg. Model inversion attacks [18] and membership-inference attacks [29] have been shown to seriously jeopardize the integrity of FL. When SecAgg is employed, the server can perform disaggregation attacks to learn the individual data. A malicious server performs active attacks by suppressing the updates of non-target clients [8, 17]. For an honest-but-curious server, existing passive attacks require that the server leverages the aggregated model across many rounds [25, 26]. Differently from these works, we consider a passive attack based only on the observation in a single round.
3 Preliminaries
We denote random quantities with lowercase nonitalic letters, such as a scalar \({\text {x}}\) and a vector \({\textbf{x}}\). The only exception is the privacy-loss random variable (PLRV) \({\text {L}}\), which is in uppercase. Deterministic quantities are denoted with italic letters, such as a scalar x and a vector \({\boldsymbol{x}}\). We denote the multidimensional normal distribution with mean \({\boldsymbol{\mu }}\) and covariance matrix \({\boldsymbol{\varSigma }}\) by \({\mathcal {N}}({\boldsymbol{\mu }},{\boldsymbol{\varSigma }})\) and its probability density function (PDF) evaluated at \({\boldsymbol{x}}\) by \({\mathcal {N}}({\boldsymbol{x}}; {\boldsymbol{\mu }}, {\boldsymbol{\varSigma }})\). We denote by \(\varPhi (x)\) the cumulative distribution function (CDF) of the standard normal distribution \({\mathcal {N}}(0,1)\), i.e., \(\varPhi (x) \triangleq \frac{1}{\sqrt{2\pi }} \int _{-\infty }^x e^{-u^2/2} \textrm{d}u\). We denote by [m : n] the set of integers from m to n; \([n] \triangleq [1:n]\); \((\cdot )^+ \triangleq \max \{0,\cdot \}\). Furthermore, \({\mathbbm {1}{\left\{ \cdot \right\} }}\) denotes the indicator function, and \(f(n) = o(g(n))\) means that \(f(n)/g(n) \rightarrow 0\) as \(n \rightarrow \infty \).
Let \(\phi \) be a decision rule of a hypothesis test between \(\{H :\!\) the underlying distribution is \(P\}\) and \(\{H' :\!\) the underlying distribution is \(Q\}\). Specifically, \(\phi \) returns 0 and 1 in favor of H and \(H'\), respectively. A false positive (resp. false negative) occurs when H (resp. \(H'\)) is true but rejected. The FPR and FNR of the test are given by \(\alpha _\phi \triangleq \mathbb {E}_{P}\left[ {\phi }\right] \) and \(\beta _\phi \triangleq 1 - \mathbb {E}_{Q}\left[ {\phi }\right] \), respectively.
Definition 1
(Trade-off function). The trade-off function \(T_{P,Q}(\cdot ) :[0,1] \rightarrow [0,1]\) is the map from the FPR to the corresponding minimum FNR of the test between P and Q, i.e., \(T_{P,Q}(\alpha ) \triangleq \inf _{\phi :\alpha _\phi \le \alpha } \beta _\phi \), \(\alpha \in [0,1]\).
We also write the trade-off function for the distributions of \({\text {x}}\) and \({\text {y}}\) as \(T_{{\text {x}},{\text {y}}}(\cdot )\). We next state the definition of LDP.
Definition 2
(LDP [13, 24]). A mechanism M satisfies \((\epsilon ,\delta )\)-LDP if and only if, for every pair of data points \(({\boldsymbol{x}},{\boldsymbol{x}}')\) and for every measurable set \({\mathcal {E}}\), we have \( \mathbb {P}\left[ {M({\boldsymbol{x}}) \in {\mathcal {E}}}\right] \le e^{\epsilon } \mathbb {P}\left[ {M({\boldsymbol{x}}') \in {\mathcal {E}}}\right] + \delta \).
For a mechanism M, we define the optimal LDP curve \(\delta _M({\epsilon })\) as the function that returns the smallest \(\delta \) for which M satisfies \(({\epsilon },\delta )\)-LDP. We next define a variant of LDP that is built upon the trade-off function in a similar manner as f-DP [12, Def. 3].
Definition 3
(f-LDP). For a function f, a mechanism M satisfies \(f\)-LDP if for every pair of data points \(({\boldsymbol{x}},{\boldsymbol{x}}')\), we have that \(T_{M({\boldsymbol{x}}), M({\boldsymbol{x}}')}(\alpha ) \ge f(\alpha ), \forall \alpha \in [0,1].\)
For a mechanism M, we define the optimal f-LDP curve \(f_M(\cdot )\) as the upper envelope of all functions f such that M satisfies f-LDP. In this paper, we regard SecAgg as an LDP mechanism and provide bounds on both its optimal LDP curve and optimal f-LDP curve.
4 Privacy Analysis of Secure Aggregation
We consider an FL scenario with \(n+1\) clients and a central server. The model update of client i can be represented as a vector \({\textbf{x}}_i \in {\mathbb {R}}^d\). Under SecAgg, the server only learns the aggregate model update \(\bar{{\textbf{x}}}= \sum _{i=0}^n {\textbf{x}}_i\), while the individual updates \(\{{\textbf{x}}_i\}_{i=0}^n\) remain confidential.
4.1 Threat Model
Server. The server is honest and follows the FL and SecAgg protocols. We assume that it observes the exact sum \(\bar{{\textbf{x}}}\). In practical SecAgg, the clients discretize their updates (using, e.g., randomized rounding) and the server obtains a modulo sum. These operations introduce perturbations that can improve privacy [34]. However, they do not capture the essence of SecAgg, which is to use the updates of other clients to obfuscate an individual update; indeed, the rounding and modulo operations can be applied even in a setting without SecAgg. We therefore ignore the perturbation caused by these operations and focus on the privacy obtainable by obfuscating an update with the updates of the other clients.
Clients. The clients are also honest. Client i computes the local model update \({\textbf{x}}_i\) from the global model in the previous round and its local dataset. We also assume that each vector \({\textbf{x}}_i\), \(i\in [0:n]\), has correlated entries since these entries together describe a model, and that the vectors are mutually independent. The latter assumption holds in the first training round if the clients have independent local data. Furthermore, the independence assumption results in the best-case scenario for privacy as, if the vectors are dependent, the sum reveals more information about each vector. Therefore, the privacy level for the case of independent \(\{{\textbf{x}}_i\}_{i=0}^n\) acts as an upper bound on the privacy for the case of dependent vectors. So if privacy does not hold for independent data, it will also not hold when there is dependence.
Privacy Threat. The server is curious. It seeks to infer the membership of a targeted client, say client 0, from the aggregate model updates \(\bar{{\textbf{x}}}\). We consider a membership inference game [33] where: i) a challenger selects a pair of possible local updates \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) of client 0, one of which is used in the aggregation, and sends this pair to the server, ii) the server observes \(\bar{{\textbf{x}}}\) and guesses if \({\boldsymbol{x}}_0\) or \({\boldsymbol{x}}_0'\) was submitted by client 0. Note that this attack can be an entry point for the server to further infer the data of client 0. Our goal is to quantify the capability of SecAgg in mitigating this attack.
4.2 SecAgg as a Noiseless LDP Mechanism
Hereafter, we focus on client 0; the analysis for other clients follows similarly. Our key idea is to view SecAgg through the lens of noiseless DP [7], where the contribution of the other clients can be seen as noise and no external (controlled) noise is added. More precisely, for client 0, SecAgg plays the role of the mechanism
$$\begin{aligned} M({\textbf{x}}_0) = {\textbf{x}}_0 + {\textbf{y}}, \end{aligned}$$
(1)
where \({\textbf{y}}= \sum _{i=1}^n{\textbf{x}}_i\) is a source of uncontrolled noise.
The aforementioned membership inference game can be cast as follows: given \(M({\textbf{x}}_0)\), the server guesses whether it came from \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0}\) or \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0'}\) for the worst-case pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\). This game is closely related to the LDP framework. First, the trade-off between the FPR and FNR of the server’s guesses is captured by the f-LDP guarantee of M. Second, as M achieves a stronger \((\epsilon ,\delta )\)-LDP guarantee, the distributions \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0}\) and \(P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0'}\) become more similar, and the hypothesis test between them becomes harder. Therefore, we shall address the following question: how much LDP or f-LDP does SecAgg guarantee for client 0? Specifically, we shall establish bounds on the optimal LDP curve \(\delta _M(\epsilon )\) and optimal f-LDP curve \(f_M(\cdot )\) of the mechanism M.
4.3 Asymptotic Privacy Guarantee
Let us first focus on the large-n regime. The following asymptotic analysis will be used as inspiration for our privacy auditing procedure to establish a lower bound on the LDP curve in Sect. 4.5.
We assume that the \(\ell _2\) norm of the vectors \(\{{\textbf{x}}_i\}_{i=1}^n\) scales as \(o(\sqrt{n})\), which holds if, e.g., d is fixed. In this case, \({\textbf{y}}\) converges to a Gaussian vector when \(n\rightarrow \infty \), as stated in the next theorem.
Theorem 1
(Asymptotic noise distribution). Assume that \(\{{\textbf{x}}_i\}_{i=1}^n\) are independent, \(\Vert {\textbf{x}}_i\Vert _2 = o(\sqrt{n})\) for \(i \in [n]\), and \(\frac{1}{n}\sum _{i=1}^n\textrm{Cov}[{\textbf{x}}_i] \rightarrow {\boldsymbol{\varSigma }}\) as \(n \rightarrow \infty \). Then \(\frac{1}{\sqrt{n}}\big (\!\sum _{i=1}^n \!{\textbf{x}}_i - \mathbb {E}\left[ {\sum _{i=1}^n \!{\textbf{x}}_i}\right] \!\big )\) converges in distribution to \({\mathcal {N}}(\textbf{0},{\boldsymbol{\varSigma }})\) as \(n\rightarrow \infty \).
Proof
Theorem 1 follows by applying the multivariate Lindeberg-Feller central limit Theorem [32, Prop. 2.27] to the triangular array \(\big \{\frac{{\textbf{x}}_i}{\sqrt{n}}\big \}_{n,i}\), upon verifying the Lindeberg condition \(\lim \limits _{n\rightarrow \infty } \sum _{i=1}^n \mathbb {E}\left[ {\frac{\Vert {\textbf{x}}_i\Vert _2^2}{n} {\mathbbm {1}{\left\{ \frac{\Vert {\textbf{x}}_i\Vert _2}{\sqrt{n}} > \varepsilon \right\} }}}\right] = 0, \forall \varepsilon > 0\). Since \(\Vert {\textbf{x}}_i\Vert _2 = o(\sqrt{n})\), i.e., \(\Vert {\textbf{x}}_i\Vert _2/\sqrt{n} \rightarrow 0\) as \(n\rightarrow \infty \), this condition indeed holds. \(\square \)
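The convergence in Theorem 1 is easy to check numerically. The sketch below uses an assumed toy update distribution (bounded, correlated entries generated through a mixing matrix `A`, so \(\Vert {\textbf{x}}_i\Vert _2 = O(1) = o(\sqrt{n})\)) and compares a one-dimensional projection of the normalized, centered sum against the limiting Gaussian CDF.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n, d, trials = 2000, 3, 4000

# Correlated, bounded entries: x_i = A u_i with u_i uniform on [-1/2, 1/2]^d.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

# Normalized, centered sums (E[u] = 0): (1/sqrt(n)) * sum_i x_i.
u = rng.uniform(-0.5, 0.5, size=(trials, n, d))
s = (u @ A.T).sum(axis=1) / np.sqrt(n)            # shape (trials, d)

# Project onto a fixed direction; by Theorem 1 the projection is ~ N(0, w^T Sigma w)
# with Sigma = A Cov[u] A^T = A A^T / 12 (variance of Unif[-1/2,1/2] is 1/12).
w = np.array([1.0, -1.0, 2.0])
proj = s @ w
sigma = float(np.sqrt(w @ (A @ A.T / 12.0) @ w))

# Compare the empirical CDF with the Gaussian CDF at a few points.
for t in [-1.0, 0.0, 1.0]:
    emp = float(np.mean(proj <= t * sigma))
    print(f"t={t:+.1f}: empirical {emp:.3f} vs Gaussian {NormalDist().cdf(t):.3f}")
```

The empirical and Gaussian CDF values agree to within Monte Carlo error, consistent with the multivariate CLT argument in the proof.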
Theorem 1 implies that, when n is large, under the presented assumptions, the mechanism \(\widetilde{M}({\boldsymbol{x}}_0) = M({\boldsymbol{x}}_0) - \mathbb {E}\left[ {{\textbf{y}}}\right] \) behaves like a Gaussian mechanism with noise distribution \({\mathcal {N}}(\textbf{0},{\boldsymbol{\varSigma }}_{\textbf{y}})\), where \({\boldsymbol{\varSigma }}_{\textbf{y}}\) is the covariance matrix of \({\textbf{y}}\). Furthermore, since the map from M to \(\widetilde{M}\) is simply a shift by a fixed vector \(\mathbb {E}\left[ {{\textbf{y}}}\right] \), i.e., it is a bijection, we have from the post-processing property that the optimal LDP curve of M is the same as that of \(\widetilde{M}\).
We now provide privacy guarantees for a Gaussian mechanism with correlated noise, to capture the correlation between the entries of the vectors \({\textbf{x}}_i\). The next theorem, proved in Appendix A.1, is an extension of the optimal privacy curve of the uncorrelated Gaussian mechanism [4, Theorem 8].
Theorem 2
(Correlated Gaussian mechanism). Consider the mechanism \(G({\textbf{x}}) = {\textbf{x}}+ {\textbf{y}}\) where \({\textbf{x}}\) belongs to a set \({\mathcal {S}}_d \subset {\mathbb {R}}^d\), and \({\textbf{y}}\sim {\mathcal {N}}(\textbf{0}, {\boldsymbol{\varSigma }}_{\textbf{y}})\). The optimal LDP curve of G is
$$\begin{aligned} \delta _G(\epsilon ) = \varPhi \Big (\frac{\varDelta }{2} - \frac{\epsilon }{\varDelta }\Big ) - e^{\epsilon } \varPhi \Big (-\frac{\varDelta }{2} - \frac{\epsilon }{\varDelta }\Big ), \end{aligned}$$
(2)
where \(\varDelta = \max _{{\boldsymbol{x}},{\boldsymbol{x}}' \in {\mathcal {S}}_d} \varDelta _{{\boldsymbol{x}}, {\boldsymbol{x}}'}\) with
$$\begin{aligned} \varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'} = \big \Vert {\boldsymbol{\varSigma }}_{\textbf{y}}^{-1/2} ({\boldsymbol{x}}- {\boldsymbol{x}}') \big \Vert _2. \end{aligned}$$
(3)
In Sect. 4.4, we shall verify the similarity between the privacy of SecAgg and that of the Gaussian mechanism G via numerical examples.
Parameter \(\varDelta \) is the maximum Euclidean distance between a pair of input vectors transformed by matrix \({\boldsymbol{\varSigma }}_{\textbf{y}}^{-1/2}\) (similar to the whitening transformation). It plays the same role as the ratio of the sensitivity to the noise standard deviation in the case of uncorrelated noise [4]. We remark that the privacy guarantee of G is weakened as \(\varDelta \) increases: for a given \(\epsilon \), \(\delta _G(\epsilon )\) increases with \(\varDelta \). To achieve small \(\epsilon \) and \(\delta \), we need \(\varDelta \) to be small. The impact of \(\varDelta \) can also be seen via the hypothesis test associated with the considered membership inference game. Consider an adversary that observes an output \({\boldsymbol{z}}\) of G and tests between \(\{H: {\boldsymbol{z}}\text { came from }P_{G({\textbf{x}}) | {\textbf{x}}= {\boldsymbol{x}}}\}\) and \(\{H': {\boldsymbol{z}}\text { came from }P_{G({\textbf{x}}) | {\textbf{x}}= {\boldsymbol{x}}'}\}\). This is effectively a test between \({\mathcal {N}}({\boldsymbol{x}}, {\boldsymbol{\varSigma }}_{\textbf{y}})\) and \({\mathcal {N}}({\boldsymbol{x}}', {\boldsymbol{\varSigma }}_{\textbf{y}})\). The trade-off function for this test is stated in the following proposition, which is proved in Appendix A.3.
Proposition 1
\(T_{{\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}_{\textbf{y}}), {\mathcal {N}}({\boldsymbol{x}}',{\boldsymbol{\varSigma }}_{\textbf{y}})}(\alpha ) = \varPhi \big (\varPhi ^{-1}(1-\alpha ) - \varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\big )\), \(\alpha \in [0,1]\).
The trade-off function decreases with \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\). A large \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\) facilitates the distinguishability of the pair \(({\boldsymbol{x}},{\boldsymbol{x}}')\), and thus weakens the privacy guarantee. Furthermore, the worst-case pair \(({\boldsymbol{x}},{\boldsymbol{x}}')\) that minimizes the trade-off function is given by the maximizer of \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\). It follows that the optimal f-LDP curve of the Gaussian mechanism G is \(f_G(\alpha ) = \varPhi \big (\varPhi ^{-1}(1-\alpha ) - \varDelta \big )\), \(\alpha \in [0,1]\).
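Proposition 1 is straightforward to evaluate numerically. The sketch below (the helper names are ours) computes the whitened distance \(\varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'}\) and the resulting trade-off curve, using only the standard library's `statistics.NormalDist` for \(\varPhi \) and \(\varPhi ^{-1}\).

```python
import numpy as np
from statistics import NormalDist

def delta_xx(x, xp, Sigma):
    """Whitened distance between inputs x and x': the Euclidean norm of
    Sigma^{-1/2} (x - x'), computed via a linear solve."""
    diff = np.asarray(x, float) - np.asarray(xp, float)
    v = np.linalg.solve(Sigma, diff)       # diff^T Sigma^{-1} diff = diff . v
    return float(np.sqrt(diff @ v))

def tradeoff(alpha, Delta):
    """Proposition 1: minimum FNR at FPR alpha when testing
    N(x, Sigma) against N(x', Sigma) with whitened distance Delta."""
    nd = NormalDist()
    return nd.cdf(nd.inv_cdf(1.0 - alpha) - Delta)

# Toy example (assumed values): correlated 2-D noise.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
D = delta_xx([1.0, 0.0], [0.0, 1.0], Sigma)
print(f"Delta = {D:.3f}")
for a in [0.01, 0.05, 0.1]:
    print(f"FPR {a:.2f} -> min FNR {tradeoff(a, D):.3f}")
```

As the text notes, a larger \(\varDelta \) pushes the trade-off curve down, i.e., both error rates of the adversary can be made small simultaneously.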
4.4 Upper Bounding \(\delta _M(\epsilon )\) via Dominating Pairs of Distributions
We now upper-bound the optimal LDP curve of M in (1) for finite n. We shall then consider the case in which the upper bound is tight and verify the convergence of SecAgg to a Gaussian mechanism.
We define the hockey-stick divergence with parameter \(\alpha \) between two probability measures P and Q as \({\textsf{E}}_\alpha (P\Vert Q) = \sup _{{\mathcal {E}}} (P({\mathcal {E}}) - \alpha Q({\mathcal {E}}))\). We also write the hockey-stick divergence between the distributions of \({\text {x}}\) and \({\text {y}}\) as \({\textsf{E}}_\alpha ({\text {x}}\Vert {\text {y}})\). The condition for LDP in Definition 2 is equivalent to \(\textstyle \sup _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} {\textsf{E}}_{e^{\epsilon }}(M({\boldsymbol{x}}) \Vert M({\boldsymbol{x}}')) \le \delta .\) Therefore, the optimal LDP curve of mechanism M is given by
$$\begin{aligned} \delta _M(\epsilon ) = \sup _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} {\textsf{E}}_{e^{\epsilon }}\big (M({\boldsymbol{x}}) \Vert M({\boldsymbol{x}}')\big ). \end{aligned}$$
(4)
A pair of measures (P, Q) is called a dominating pair of distributions for M if, for every \(\epsilon \ge 0\),
$$\begin{aligned} \sup _{{\boldsymbol{x}}\ne {\boldsymbol{x}}'} {\textsf{E}}_{e^{\epsilon }}\big (M({\boldsymbol{x}}) \Vert M({\boldsymbol{x}}')\big ) \le {\textsf{E}}_{e^{\epsilon }}(P \Vert Q). \end{aligned}$$
(5)
If equality is achieved in (5) for every \(\epsilon \ge 0\), then (P, Q) is said to be a tightly dominating pair of distributions for M. For each dominating pair (P, Q), we associate a privacy-loss random variable \({\text {L}}\triangleq \ln \frac{\textrm{d}P}{\textrm{d}Q}({\textbf{y}})\) with \({\textbf{y}}\sim P\), where \(\frac{\textrm{d}P}{\textrm{d}Q}\) is the Radon-Nikodym derivative. We have that
$$\begin{aligned} {\textsf{E}}_{e^{\epsilon }}(P \Vert Q) = \mathbb {E}\left[ {\big (1 - e^{\epsilon - {\text {L}}}\big )^+}\right] \triangleq \delta _{\text {L}}(\epsilon ). \end{aligned}$$
(6)
It follows readily that \(\delta _{\text {L}}(\epsilon )\) is an upper bound on the optimal LDP curve \(\delta _M(\epsilon )\).
Without a known distribution of \({\textbf{y}}\), it is challenging to characterize a dominating pair of distributions for the mechanism M in (1). In the next theorem, proved in Appendix B, we make some assumptions on \(P_{\textbf{y}}\) to enable such characterization.
Theorem 3
(Dominating pair of distributions). Let \({\textbf{x}}_0 = ({\text {x}}_{01}, {\text {x}}_{02}, \dots , {\text {x}}_{0d})\) and assume that \(\underline{r}_j \le {\text {x}}_{0j} \le \overline{r}_j\), \(j\in [d]\). Assume further that \({\textbf{y}}\) has independent entries, i.e., \(P_{\textbf{y}}= P_{{\text {y}}_1} \times \dots \times P_{{\text {y}}_d}\), and that the marginal probabilities \(\{P_{{\text {y}}_j}\}\) are log-concave and symmetric. Then, a dominating pair of distributions for the mechanism \(M({\textbf{x}}_0)\) in (1) is given by \((P_{\underline{r}_1 + {\text {y}}_1} \times \dots \times P_{\underline{r}_d + {\text {y}}_d}, P_{\overline{r}_1 + {\text {y}}_1} \times \dots \times P_{\overline{r}_d + {\text {y}}_d})\).
The family of log-concave distributions includes the typical noise distributions in DP, namely, the Gaussian and Laplace distributions, as well as many other common distributions, e.g., the exponential and uniform distributions [3]. If each vector \({\textbf{x}}_i, i\in [n],\) has independent entries following a log-concave distribution, then so does the sum \({\textbf{y}}= \sum _{i=1}^n {\textbf{x}}_i\), because log-concavity is closed under convolutions [28]. Under the presented assumptions, Theorem 3 allows us to characterize an upper bound on the LDP curve of M as \( \delta _{\text {L}}(\epsilon ) \) with \({\text {L}}= \sum _{j=1}^d \ln \frac{P_{\underline{r}_j + {\text {y}}_j}({\text {z}}_j)}{P_{\overline{r}_j + {\text {y}}_j}({\text {z}}_j)}\) where \({\text {z}}_j \sim P_{\underline{r}_j + {\text {y}}_j}\), \(j\in [d]\).
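The resulting upper bound \(\delta _{\text {L}}(\epsilon ) = \mathbb {E}[(1 - e^{\epsilon - {\text {L}}})^+]\) can be estimated by Monte Carlo sampling of the PLRV. The sketch below (function name ours) does this for an illustrative case with Gaussian noise entries, a toy stand-in for the sum of the other clients' updates; note that the analysis in the paper uses Gamma and Irwin-Hall noise instead.

```python
import numpy as np

rng = np.random.default_rng(1)

def delta_L(eps, r_lo, r_hi, sigma, n_samples=100_000):
    """Monte Carlo estimate of delta_L(eps) = E[(1 - e^{eps - L})^+] for the
    dominating pair of Theorem 3 when entry j of the noise is N(0, sigma_j^2)."""
    r_lo, r_hi, sigma = (np.asarray(a, float) for a in (r_lo, r_hi, sigma))
    # z_j ~ P_{r_lo_j + y_j} = N(r_lo_j, sigma_j^2)
    z = r_lo + sigma * rng.standard_normal((n_samples, r_lo.size))
    # L = sum_j ln N(z_j; r_lo_j, s_j^2) / N(z_j; r_hi_j, s_j^2)
    L = np.sum(((z - r_hi) ** 2 - (z - r_lo) ** 2) / (2 * sigma**2), axis=1)
    return float(np.mean(np.maximum(0.0, 1.0 - np.exp(eps - L))))

# delta grows quickly with the dimension d, echoing the trend in Fig. 1.
for d in [1, 10, 100]:
    r = np.full(d, 1.0)
    print(f"d={d:4d}: delta_L(1.0) ~ {delta_L(1.0, -r, r, np.full(d, 5.0)):.4f}")
```

Even with per-coordinate noise standard deviation five times the coordinate range, the estimated \(\delta \) at \(\epsilon = 1\) approaches 1 once \(d\) reaches a few hundred, illustrating why high-dimensional updates are poorly hidden.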
Corollary 1
If the support of \({\textbf{x}}_0\) contains \((\underline{r}_1,\dots ,\underline{r}_d)\) and \((\overline{r}_1,\dots ,\overline{r}_d)\), the dominating pair of distributions in Theorem 3 becomes tight, and the resulting upper bound \(\delta _{\text {L}}(\epsilon )\) is the optimal LDP curve.
We now use Corollary 1 to evaluate the optimal LDP curve of mechanism M in (1) when each \({\textbf{x}}_i\) has independent entries. We aim to verify the convergence of SecAgg to a Gaussian mechanism implied by Theorem 1 and understand how the LDP curve depends on the model size d. We consider two cases. In the first case, the entries follow the exponential distribution with parameter 1, and thus \({\textbf{y}}\) has independent entries following the Gamma distribution with shape n and scale 1. For convenience, we further assume that \({\textbf{x}}_{0}\) is truncated such that \(0\le {\text {x}}_{0j} \le 4\), \(j\in [d]\). In the second case, the entries are uniformly distributed in \([-1/2,1/2]\), and thus \({\textbf{y}}\) has independent entries following the shifted Irwin-Hall distribution with PDF \(p_{{\text {y}}_i}(y) = \frac{1}{(n-1)!} \sum _{k=0}^n (-1)^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \big [(y+n/2-k)^+\big ]^{n-1}\). Both cases satisfy the conditions of Corollary 1. We can therefore obtain the optimal LDP curves and depict them in Fig. 1. We also show the optimal LDP curve of the Gaussian mechanism G with the same noise covariance matrix \({\boldsymbol{\varSigma }}_{\textbf{y}}\). We see that the optimal LDP curve of M is indeed close to that of G, even for a small value of n in the second case. Furthermore, although Theorem 1 assumes a fixed d and \(n \rightarrow \infty \), Fig. 1 suggests that M behaves similarly to a Gaussian mechanism even for large d. Remarkably, for a given \(\epsilon \), the parameter \(\delta \) increases rapidly with d, indicating that the privacy of SecAgg is weak for high-dimensional models.
4.5 Lower Bounding \(\delta _M(\epsilon )\) and Upper Bounding \(f_M(\cdot )\) via Privacy Auditing
In practical FL, the updates typically have a distribution that does not satisfy the conditions of Theorem 3 and is not known in closed form. Therefore, we now establish a numerical routine to compute a lower bound on the optimal LDP curve and an upper bound on the optimal f-LDP curve of M. The proposed numerical routine exploits the similarity of SecAgg and a Gaussian mechanism as discussed in Sects. 4.3 and 4.4. The bounds are based on the following result.
Proposition 2
(LDP via the trade-off function). A mechanism M satisfies \((\epsilon ,\delta )\)-LDP if and only if, for every pair of data points \(({\boldsymbol{x}},{\boldsymbol{x}}')\) and every \(\alpha \in [0,1]\),
$$\begin{aligned} T_{M({\boldsymbol{x}}), M({\boldsymbol{x}}')}(\alpha ) \ge \max \big \{0,\, 1 - \delta - e^{\epsilon }\alpha ,\, e^{-\epsilon }(1 - \delta - \alpha )\big \}. \end{aligned}$$
(7)
The proof of Proposition 2 follows from similar arguments for DP in [23, Thm. 2.1]. This proposition implies that, if a pair \((\textrm{FPR},\textrm{FNR})\) is achievable for some decision rule \(\phi \) between the distributions of \(M({\boldsymbol{x}})\) and \(M({\boldsymbol{x}}')\) for some \(({\boldsymbol{x}},{\boldsymbol{x}}')\), then the mechanism does not satisfy \((\epsilon ,\delta )\)-LDP for \(\delta \in [0,1]\) and
$$\begin{aligned} \epsilon < \max \bigg \{\ln \frac{1 - \delta - \textrm{FNR}}{\textrm{FPR}},\, \ln \frac{1 - \delta - \textrm{FPR}}{\textrm{FNR}}\bigg \}. \end{aligned}$$
(8)
This gives a lower bound on the optimal LDP curve of M. Furthermore, it follows readily from Definition 3 that, for an achievable pair \((\textrm{FPR},\textrm{FNR})\) of the mentioned test, the mechanism does not satisfy f-LDP for any trade-off function f such that \(f(\textrm{FPR}) < \textrm{FNR}\). Therefore, a collection of achievable pairs \((\textrm{FPR}, \textrm{FNR})\) constitutes an upper bound on the optimal f-LDP curve of M.
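The lower bound that Proposition 2 yields from an achievable \((\textrm{FPR},\textrm{FNR})\) pair is the larger of two one-sided bounds; the helper below (name ours) computes it, returning \(-\infty \) when the attack certifies nothing.

```python
import math

def eps_lower_bound(fpr, fnr, delta):
    """If (fpr, fnr) is achievable by some attack, the mechanism cannot satisfy
    (eps, delta)-LDP for any eps below this value. Returns -inf when neither
    one-sided bound applies."""
    bounds = []
    if 1.0 - delta - fnr > 0 and fpr > 0:
        bounds.append(math.log((1.0 - delta - fnr) / fpr))
    if 1.0 - delta - fpr > 0 and fnr > 0:
        bounds.append(math.log((1.0 - delta - fpr) / fnr))
    return max(bounds, default=float("-inf"))

# A sharp attack (both error rates small) certifies a large epsilon...
print(eps_lower_bound(fpr=0.01, fnr=0.05, delta=1e-5))
# ...while a blind attack (fpr + fnr close to 1) certifies essentially nothing.
print(eps_lower_bound(fpr=0.5, fnr=0.5, delta=1e-5))
```

This is exactly why observing both a small FNR and a small FPR in the experiments of Sect. 5 translates into a large audited \(\epsilon \).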
We shall use this result to perform privacy auditing [21, 33]. Specifically, following the defined membership inference game (see privacy threat in Sect. 4.1), we conduct a hypothesis test between \(\{H :{\boldsymbol{z}}\text { came from }P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0}\}\) and \(\{H' :{\boldsymbol{z}}\text { came from }P_{M({\textbf{x}}_0) | {\textbf{x}}_0 = {\boldsymbol{x}}_0'}\}\) for a given output \({\boldsymbol{z}}\) of M. That is, we select a pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) for the challenger and a decision rule \(\phi \) for the server. We then evaluate the achievable pair \((\textrm{FPR}, \textrm{FNR})\), and obtain therefrom a lower bound on the optimal LDP curve and an upper bound on the optimal f-LDP curve of M. To design the attack, we draw inspiration from the asymptotic analysis in Sect. 4.3 as follows.
We consider the likelihood-ratio test, i.e., the test rejects H if
$$\begin{aligned} \frac{P_{\textbf{y}}({\boldsymbol{z}}- {\boldsymbol{x}}_0')}{P_{\textbf{y}}({\boldsymbol{z}}- {\boldsymbol{x}}_0)} > \theta \end{aligned}$$
(9)
for a given threshold \(\theta \). We choose the input pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) as the worst-case pair, i.e., \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0') = \mathop {\mathrm {arg\,min}}\limits _{{\boldsymbol{x}},{\boldsymbol{x}}'} T_{M({\boldsymbol{x}}),M({\boldsymbol{x}}')}(\alpha ), ~\forall \alpha \in [0,1]\). However, the trade-off function is not known in closed form in general, and thus finding the worst-case pair is challenging. Motivated by Theorem 1, we treat \({\textbf{y}}\) as a Gaussian vector with the same mean \({\boldsymbol{\mu }}_{{\textbf{y}}}\) and covariance \({\boldsymbol{\varSigma }}_{\textbf{y}}\). We thus approximate the trade-off function \(T_{M({\boldsymbol{x}}),M({\boldsymbol{x}}')}(\cdot )\) by \(T_{{\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}_{\textbf{y}}),{\mathcal {N}}({\boldsymbol{x}}',{\boldsymbol{\varSigma }}_{\textbf{y}})}(\cdot )\), and choose \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) as the minimizer of the latter. Using Proposition 1, we have that
$$\begin{aligned} ({\boldsymbol{x}}_0,{\boldsymbol{x}}_0') = \mathop {\mathrm {arg\,max}}\limits _{{\boldsymbol{x}},{\boldsymbol{x}}' \in {\mathcal {X}}_0} \varDelta _{{\boldsymbol{x}},{\boldsymbol{x}}'} = \mathop {\mathrm {arg\,max}}\limits _{{\boldsymbol{x}},{\boldsymbol{x}}' \in {\mathcal {X}}_0} \big \Vert {\boldsymbol{\varSigma }}_{\textbf{y}}^{-1/2}({\boldsymbol{x}}- {\boldsymbol{x}}')\big \Vert _2, \end{aligned}$$
(10)
where \({\mathcal {X}}_0\) is the support of \({\textbf{x}}_0\).
If the server does not know \(P_{\textbf{y}}({\boldsymbol{y}})\) in closed form, we let it approximate \(P_{\textbf{y}}({\boldsymbol{y}})\) as \({\mathcal {N}}({\boldsymbol{y}}; {\boldsymbol{\mu }}_{\textbf{y}}, {\boldsymbol{\varSigma }}_{\textbf{y}})\). That is, the test rejects \({\boldsymbol{x}}_0\) if
$$\begin{aligned} \frac{{\mathcal {N}}({\boldsymbol{z}}- {\boldsymbol{x}}_0'; {\boldsymbol{\mu }}_{\textbf{y}}, {\boldsymbol{\varSigma }}_{\textbf{y}})}{{\mathcal {N}}({\boldsymbol{z}}- {\boldsymbol{x}}_0; {\boldsymbol{\mu }}_{\textbf{y}}, {\boldsymbol{\varSigma }}_{\textbf{y}})} > \theta . \end{aligned}$$
(11)
Moreover, if the server does not know \({\boldsymbol{\mu }}_{\textbf{y}}\) and \({\boldsymbol{\varSigma }}_{\textbf{y}}\) but can generate samples from \(P_{\textbf{y}}\), we let it estimate \({\boldsymbol{\mu }}_{\textbf{y}}\) and \({\boldsymbol{\varSigma }}_{\textbf{y}}\) as the sample mean and sample covariance matrix, and use these estimates instead of the true values in (10) and (11).
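The Gaussian-approximated likelihood-ratio attack can be simulated end to end; in the toy sketch below, the inputs, dimensions, and noise moments are all assumed illustrative values, not the experimental setup of Sect. 5.

```python
import numpy as np

def llr_attack(z, x0, x0p, mu_y, Sigma_y, theta=0.0):
    """Gaussian-approximated likelihood-ratio membership test: returns 1
    (guess x0') if the log-likelihood ratio exceeds theta, else 0 (guess x0).
    mu_y and Sigma_y may be sample estimates of the noise moments."""
    Sinv = np.linalg.inv(Sigma_y)
    def neg_quad(x):                       # log-density up to a constant
        r = z - mu_y - x
        return -0.5 * r @ Sinv @ r
    # log N(z; x0'+mu, Sigma) - log N(z; x0+mu, Sigma); normalizers cancel.
    return int(neg_quad(x0p) - neg_quad(x0) > theta)

rng = np.random.default_rng(2)
d, n = 50, 10
x0, x0p = rng.normal(size=d), rng.normal(size=d)   # candidate updates
mu_y, Sigma_y = np.zeros(d), n * np.eye(d)         # toy noise moments

# Simulate the game under H: client 0 submits x0, noise y ~ N(mu_y, Sigma_y).
guesses = [llr_attack(x0 + mu_y + rng.multivariate_normal(np.zeros(d), Sigma_y),
                      x0, x0p, mu_y, Sigma_y)
           for _ in range(500)]
fpr = float(np.mean(guesses))    # H is true, so guessing x0' is a false positive
print(f"empirical FPR at theta=0: {fpr:.3f}")
```

Sweeping `theta` and repeating the simulation under \(H'\) traces out the empirical trade-off curve used for auditing.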
We evaluate the \(\textrm{FNR}\) and \(\textrm{FPR}\) of the test via Monte-Carlo simulation. Specifically, we repeat the test \(N_{\textrm{s}}\) times and count the number of false negatives \(N_{\textrm{FN}}\) and the number of false positives \(N_{\textrm{FP}}\). We obtain a high-confidence upper bound on the FNR using the Clopper-Pearson method [10] as \(\overline{\textrm{FNR}} = B(1-\gamma /2; N_{\textrm{FN}} + 1, N_{\textrm{s}} - N_{\textrm{FN}})\), where B(x; a, b) is the quantile of the Beta distribution with shapes (a, b), and \(1-\gamma \) is the confidence level. A high-confidence upper bound \(\overline{\textrm{FPR}}\) on the FPR is obtained similarly. By varying the threshold \(\theta \), we obtain an empirical trade-off curve of \(\overline{\textrm{FNR}}\) vs. \(\overline{\textrm{FPR}}\). This curve is an upper bound on the optimal f-LDP curve of SecAgg. For a given \(\delta \in [0,1]\), we also compute a lower confidence bound on \(\epsilon \) for SecAgg to satisfy \((\epsilon ,\delta )\)-LDP. Specifically, we use \(\overline{\textrm{FNR}}\) and \(\overline{\textrm{FPR}}\) in place of \(\textrm{FNR}\) and \(\textrm{FPR}\) in (8). Note that \(\overline{\textrm{FNR}}\) and \(\overline{\textrm{FPR}}\) are lower bounded by \(B(1-\gamma /2; 1, N_{\textrm{s}})\) even if \(N_{\textrm{FN}} = N_{\textrm{FP}} = 0\). Therefore, the estimated \(\epsilon \) is upper bounded by \(\overline{\epsilon }_\delta \triangleq \ln \frac{1 - \delta - B(1-\gamma /2; 1, N_{\textrm{s}})}{B(1-\gamma /2; 1, N_{\textrm{s}})}.\) That is, it is impossible to audit an arbitrarily large \(\epsilon \) with a finite number of trials.
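The Clopper-Pearson bound and the resulting ceiling \(\overline{\epsilon }_\delta \) can be computed as follows. For self-containment, this sketch inverts the binomial CDF by bisection instead of calling a Beta-quantile routine (the two are equivalent), using \(N_\textrm{s} = 5000\) and \(1-\gamma = 95\%\) as in Sect. 5; the value \(\delta = 10^{-5}\) is an assumed example.

```python
import math

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p); fine for small k."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson_upper(k, n, gamma=0.05):
    """Upper (1 - gamma/2)-confidence bound on a binomial proportion,
    i.e., B(1 - gamma/2; k + 1, n - k), via bisection on the binomial CDF."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(60):                     # interval shrinks to ~2^-60
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > gamma / 2:
            lo = mid
        else:
            hi = mid
    return hi

# Even with zero observed errors, the bounded rates cap the auditable epsilon.
n_s, delta = 5000, 1e-5
B = clopper_pearson_upper(0, n_s)           # = 1 - (gamma/2)^(1/n_s)
eps_max = math.log((1 - delta - B) / B)
print(f"max auditable epsilon with {n_s} trials: {eps_max:.2f}")
```

With 5000 trials, the auditable \(\epsilon \) caps out at roughly 7, which is why reported audited values must be read as lower bounds on the true privacy loss.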
5 Experiments and Discussion
Experimental Setting. We consider federated averaging with \(n_\textrm{tot} = 100\) clients, out of which \(n+1\) clients are randomly selected in each round. The experiments are conducted for a classification problem on the ADULT dataset [5] and the EMNIST Digits dataset [11]. The ADULT dataset contains \({30\,162}\) entries with 104 features; the entries belong to two classes, with 7508 positive labels and \({22\,654}\) negative labels. The EMNIST Digits dataset contains \({280\,000}\) images of size \(28\times 28\) of handwritten digits belonging to 10 balanced classes. We allocate the training samples among the \(n_\textrm{tot}\) clients according to a latent Dirichlet allocation model with concentration parameter \(\omega \). Here, with \(\omega \rightarrow \infty \), the training samples are distributed evenly and uniformly among the clients; with \(\omega \rightarrow 0\), each client holds samples from only one class. We consider a single-layer neural network and use the cross-entropy loss and stochastic gradient descent with a learning rate of 0.01 and batch size of 64. The model size is \(d = 210\) for ADULT and \(d = 7850\) for EMNIST Digits.
We focus on the first round, containing one local epoch, of federated averaging and perform privacy auditing for a fixed initial model, which is known to the server. Note that performing an attack in the first round is the most challenging because, in later rounds, the server accumulates more observations. Let \(\{{\textbf{x}}_i\}_{i=0}^n\) be the local updates of the selected clients in the first round. The server does not know the distribution of \(\{{\textbf{x}}_i\}_{i=0}^n\) in closed form, but can sample from this distribution by simulating the learning scenario. Note that it is a common assumption in membership inference attacks that the adversary can sample from the population [33]. We let the server compute the sample mean \(\hat{{\boldsymbol{\mu }}}_{\textbf{x}}\) and sample covariance matrix \(\widehat{\boldsymbol{\varSigma }}_{\textbf{x}}\) from \({25\,000}\) samples of \({\textbf{x}}_i\), then estimate the mean and covariance matrix of \({\textbf{y}}= \sum _{i=1}^n {\textbf{x}}_i\) as \(\hat{{\boldsymbol{\mu }}}_{\textbf{y}}= n\hat{\boldsymbol{\mu }}_{\textbf{x}}\) and \(\widehat{\boldsymbol{\varSigma }}_{\textbf{y}}= n \widehat{\boldsymbol{\varSigma }}_{\textbf{x}}\). The server then uses \(\hat{{\boldsymbol{\mu }}}_{\textbf{y}}\) and \(\widehat{\boldsymbol{\varSigma }}_{\textbf{y}}\) for privacy auditing, as described in Sect. 4.5. Following (10), we find the worst-case input pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) by searching for the maximizer of \(\varDelta _{{\boldsymbol{x}}_0,{\boldsymbol{x}}_0'}\) among 5000 and 1000 samples of \({\textbf{x}}_0\) for the ADULT and EMNIST Digits datasets, respectively. The Clopper-Pearson confidence level is \(1-\gamma = 95\%\). For a given initial model, we consider \(N_\textrm{s} = 5000\) trials with random data partition and batch selection. 
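The server's estimation step, together with a log-likelihood-ratio statistic of the kind thresholded in the attack, can be sketched as follows (our own illustration of the Gaussian approximation, assuming NumPy; the exact statistic is the one given in (11)):

```python
import numpy as np

def fit_sum_statistics(x_samples, n):
    """Estimate the mean and covariance of y = sum of n i.i.d. updates
    from samples of a single update x (one sample per row):
    mu_y = n * mu_x, Sigma_y = n * Sigma_x."""
    mu_x = x_samples.mean(axis=0)
    Sigma_x = np.cov(x_samples, rowvar=False)
    return n * mu_x, n * Sigma_x

def llr_statistic(z, x0, x0p, mu_y, Sigma_y):
    """Log-likelihood ratio for H: z = x0 + y vs H': z = x0' + y,
    with y approximated as N(mu_y, Sigma_y)."""
    w = np.linalg.solve(Sigma_y, x0 - x0p)
    return w @ (z - mu_y - (x0 + x0p) / 2)
```

The server declares \({\boldsymbol{x}}_0\) (rather than \({\boldsymbol{x}}_0'\)) when the statistic exceeds the threshold \(\theta \).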
In the simulation results, we report the average of \(\overline{\textrm{FPR}}\), \(\overline{\textrm{FNR}}\), and the audited \((\epsilon ,\delta )\) over 10 and 5 initial models for the ADULT and EMNIST Digits datasets, respectively.
Homogeneous Data Partitioning. We first consider \(\omega = \infty \). In Fig. 2(a), we show the trade-off between the estimated FNR and FPR for the ADULT dataset, achieved by varying the threshold \(\theta \) in (11) for \(n+1\in \{60, 70, 90\}\) clients. Both the FNR and FPR can be as small as 0.005 simultaneously. Hence, the server can reliably distinguish the selected input pair, and the membership inference attack is successful. We note that a reference for the trade-off curve that represents different privacy levels is given in [12, Fig. 3]. There, the case with both FNR and FPR equal to 0.07 is already considered nonprivate. Comparing Fig. 2(a) with this reference, we conclude that SecAgg provides essentially no privacy for the ADULT dataset. Next, in Fig. 2(b), we show the average audited LDP curves for the ADULT dataset. We observe that the audited LDP curves are close to the largest auditable \((\overline{\epsilon }_\delta ,\delta )\) with the considered \(N_\textrm{s}\) and \(\gamma \). As \(\epsilon \) increases, \(\delta \) remains high until it drops due to the limit of the Clopper-Pearson method: even for \(\epsilon =7\), we have \(\delta >10^{-1}\). Furthermore, increasing n provides only a marginal privacy improvement. This shows that the privacy of SecAgg, viewed through the lens of LDP, is weak.
In Fig. 3, we show the FPR vs. FNR trade-off and the audited LDP curve for the EMNIST Digits dataset. Similar conclusions hold: SecAgg provides weak privacy. In this case, with a larger model size than the ADULT dataset, the adversary achieves even smaller FPR and FNR simultaneously.
Heterogeneous Data Partitioning. We next consider \(\omega = 1\) and show the FPR vs. FNR trade-off and the audited LDP curve for the EMNIST Digits dataset in Fig. 4. In this case, the FPR and FNR are simultaneously reduced with respect to the homogeneous case, and the audited \((\epsilon ,\delta )\) coincide with the largest auditable values. This is because the worst-case pair \(({\boldsymbol{x}}_0,{\boldsymbol{x}}_0')\) is better separated than in the homogeneous case and thus easier to distinguish.
Discussion. We have seen that SecAgg is expected to perform like a correlated Gaussian mechanism (see Sect. 4.3). Why does SecAgg fail to prevent membership inference attacks, given that the Gaussian mechanism (with appropriate noise calibration) is known to be effective? We explain this as follows. We assume that the individual updates have entries with a bounded magnitude such that \(\Vert {\textbf{x}}_i\Vert _2 \le r \sqrt{d}\), \(i\in [0:n]\). For large n, we expect the mechanism M in (1) to have a similar privacy guarantee as G in Theorem 2 with \({\mathcal {S}}_d = \{{\boldsymbol{x}}\in {\mathbb {R}}^d:\Vert {\boldsymbol{x}}\Vert _2 \le r \sqrt{d}\}\) and \({\boldsymbol{\varSigma }}_ {\textbf{y}}\) being the covariance matrix of \(\sum _{i=1}^n {\textbf{x}}_i\). In this case, \(\varDelta \ge 2\sqrt{d/n}\) (see Appendix A.2). A strong privacy guarantee requires \(\varDelta \) to be small, which implies that d/n must be small. This suggests that the privacy guarantee of SecAgg is weak if the vector dimension d is large compared to the number of clients n. This is, however, the case in most practical FL scenarios, as the model size is typically much larger than the number of clients. Note that reducing the ratio d/n by dividing the model into smaller chunks that are federated via SecAgg does not solve this issue, as the privacy loss composes over these chunks.
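A back-of-the-envelope check of this claim (our own sketch, assuming SciPy, and assuming SecAgg is well-approximated by a Gaussian mechanism with the optimal curve of Theorem 2): plugging the lower bound \(\varDelta \ge 2\sqrt{d/n}\) into the curve \(\delta (\epsilon ) = \varPhi (\varDelta /2 - \epsilon /\varDelta ) - e^\epsilon \varPhi (-\varDelta /2 - \epsilon /\varDelta )\) yields a lower bound on \(\delta \) at any \(\epsilon \).

```python
from math import exp, sqrt
from scipy.stats import norm

def delta_lower(eps, d, n):
    """Lower bound on delta(eps) for the Gaussian mechanism with
    Delta >= 2*sqrt(d/n) (Appendix A.2), using the optimal curve
    delta(eps) = Phi(Delta/2 - eps/Delta) - e^eps * Phi(-Delta/2 - eps/Delta).
    The curve increases with Delta, so a lower bound on Delta
    gives a lower bound on delta."""
    Delta = 2 * sqrt(d / n)
    return norm.cdf(Delta / 2 - eps / Delta) - exp(eps) * norm.cdf(-Delta / 2 - eps / Delta)
```

For the EMNIST Digits setting (\(d = 7850\), \(n \approx 70\)), \(\varDelta \ge 2\sqrt{d/n} \approx 21\), and even at \(\epsilon = 7\) the bound leaves \(\delta \approx 1\): essentially no privacy, consistent with the audited curves.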
While with our results we have shown that SecAgg provides weak privacy for small models where d is in the order of \(10^2\)–\(10^3\), we remark that the privacy guarantee is expected to further deteriorate for larger models. This is supported by the rapid degradation of the privacy guarantee with d of the Gaussian mechanism (see Fig. 1) and the similarity of SecAgg and a Gaussian mechanism.
6 Conclusions
We analyzed the privacy of SecAgg through the lens of LDP. Via privacy auditing, we showed that membership inference attacks on the output of SecAgg succeed with high probability: adding independent local updates is not sufficient to hide a local update when the model is of high dimension. While this result may not be surprising, our work fills an important gap by providing a formal analysis of the privacy of SecAgg and challenges prevailing claims about the privacy robustness of SecAgg. Hence, it underscores that additional privacy mechanisms, such as noise addition, are needed in federated learning.
Notes
- 1.
If a mechanism M satisfies \((\epsilon ,\delta )\)-LDP, then so does \(h \circ M\) for a mapping h that is independent of M. The proof of this result is similar to the proof of the post-processing property of DP [15, Prop. 2.1].
- 2.
\(P_{{\text {y}}_j}\) is symmetric if there exists a \(y^*\) such that \(P_{{\text {y}}_j}({\mathcal {A}}+ y^*) = P_{{\text {y}}_j}(-{\mathcal {A}}+ y^*)\) for every subset \({\mathcal {A}}\) of the support of \({\text {y}}_j\). Here, \(-{\mathcal {A}}\triangleq \{-y :y \in {\mathcal {A}}\}\).
- 3.
The code is available at https://github.com/khachoang1412/SecAgg_not_private.
References
Abadi, M., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM, New York (2016)
Agarwal, N., Suresh, A.T., Yu, F., Kumar, S., McMahan, H.B.: CpSGD: communication-efficient and differentially-private distributed SGD. In: Proceedings of the International Conference on Neural Information Processing Systems (NIPS), NIPS 2018, pp. 7575–7586 (2018)
Bagnoli, M., Bergstrom, T.: Log-concave probability and its applications. In: Aliprantis, C.D., Matzkin, R.L., McFadden, D.L., Moore, J.C., Yannelis, N.C. (eds.) Rationality and Equilibrium. Studies in Economic Theory, vol. 26, pp. 217–241. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-29578-X_11
Balle, B., Wang, Y.X.: Improving the Gaussian mechanism for differential privacy: analytical calibration and optimal denoising. In: Proceedings of the International Conference Machine Learning (ICML), pp. 394–403. PMLR (2018)
Becker, B., Kohavi, R.: Adult. UCI Machine Learning Repository (1996)
Bell, J.H., Bonawitz, K.A., Gascón, A., Lepoint, T., Raykova, M.: Secure single-server aggregation with (poly)logarithmic overhead. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 1253–1269. ACM, New York (2020)
Bhaskar, R., Bhowmick, A., Goyal, V., Laxman, S., Thakurta, A.: Noiseless database privacy. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 215–232. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0_12
Boenisch, F., Dziedzic, A., et al.: Reconstructing individual data points in federated learning hardened with differential privacy and secure aggregation. In: Proceedings of the European Symposium on Security and Privacy (EuroS&P), pp. 241–257 (2023)
Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191 (2017)
Clopper, C.J., Pearson, E.S.: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4), 404–413 (1934)
Cohen, G., Afshar, S., Tapson, J., van Schaik, A.: EMNIST: extending MNIST to handwritten letters. In: Proceedings of International Joint Conference on Neural Networks (IJCNN), pp. 2921–2926 (2017)
Dong, J., Roth, A., Su, W.: Gaussian differential privacy. J. Roy. Stat. Soc. 84(1), 3–37 (2021)
Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Local privacy and statistical minimax rates. In: Proceedings of the Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 429–438 (2013)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. Found. Trends® Theoret. Comput. Sci. 9(3–4), 211–407 (2014)
Elkordy, A.R., Zhang, J., Ezzeldin, Y.H., Psounis, K., Avestimehr, S.: How much privacy does federated learning with secure aggregation guarantee? In: Proceedings of the Privacy Enhancing Technologies Symposium (PETS), pp. 510–526 (2023)
Fowl, L.H., Geiping, J., Czaja, W., Goldblum, M., Goldstein, T.: Robbing the fed: directly obtaining private data in federated learning with modified models. In: Proceedings of the International Conference on Learning Representations (ICLR) (2022)
Geiping, J., Bauermeister, H., Dröge, H., Moeller, M.: Inverting gradients - how easy is it to break privacy in federated learning? In: Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), NeurIPS 2020 (2020)
Hatamizadeh, A., et al.: Do gradient inversion attacks make federated learning unsafe? IEEE Trans. Med. Imaging 42(7), 2044–2056 (2023)
Hogg, R.V., Tanis, E.A., Zimmerman, D.: Probability and Statistical Inference, 9th edn. Pearson, Upper Saddle River (2015)
Jagielski, M., Ullman, J., Oprea, A.: Auditing differentially private machine learning: how private is private SGD? In: Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 22205–22216 (2020)
Kairouz, P., Liu, Z., Steinke, T.: The distributed discrete Gaussian mechanism for federated learning with secure aggregation. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 5201–5212. PMLR (2021)
Kairouz, P., Oh, S., Viswanath, P.: The composition theorem for differential privacy. IEEE Trans. Inf. Theory 63(6), 4037–4049 (2017)
Kasiviswanathan, S.P., Lee, H.K., Nissim, K., Raskhodnikova, S., Smith, A.: What can we learn privately? SIAM J. Comput. 40(3), 793–826 (2011)
Kerkouche, R., Ács, G., Fritz, M.: Client-specific property inference against secure aggregation in federated learning. In: Proceedings of the Workshop Privacy in the Electronic Society, WPES 2023, pp. 45–60. ACM, New York (2023)
Lam, M., Wei, G.Y., Brooks, D., Reddi, V.J., Mitzenmacher, M.: Gradient disaggregation: breaking privacy in federated learning by reconstructing the user participant matrix. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 5959–5968. PMLR (2021)
McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.y.: Communication-efficient learning of deep networks from decentralized data. In: Singh, A., Zhu, J. (eds.) Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, vol. 54, pp. 1273–1282. PMLR, 20–22 April 2017
Merkle, M.: Convolutions of logarithmically concave functions. Publikacije Elektrotehničkog fakulteta. Serija Matematika, pp. 113–117 (1998)
Nasr, M., Shokri, R., Houmansadr, A.: Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 739–753 (2019)
So, J., Ali, R.E., Güler, B., Jiao, J., Avestimehr, A.S.: Securing secure aggregation: mitigating multi-round privacy leakage in federated learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2023/IAAI 2023/EAAI 2023. AAAI Press (2023)
Ullah, E., Choquette-Choo, C.A., Kairouz, P., Oh, S.: Private federated learning with autotuned compression. In: Proceedings of the International Conference on Machine Learning (ICML), ICML 2023. JMLR.org (2023)
Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (2000)
Ye, J., Maddi, A., Murakonda, S.K., Bindschaedler, V., Shokri, R.: Enhanced membership inference attacks against machine learning models. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, pp. 3093–3106 (2022)
Youn, Y., Hu, Z., Ziani, J., Abernethy, J.: Randomized quantization is all you need for differential privacy in federated learning. In: ICML Workshop (2023)
Zhu, Y., Dong, J., Wang, Y.X.: Optimal accounting of differential privacy via characteristic function. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, vol. 151, pp. 4782–4817. PMLR, 28–30 March 2022
Acknowledgments
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 101022113. This work was also partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) and by the Swedish Research Council (VR) under grant 2020-03687. The authors would like to thank Onur Günlü and Balázs Pejó for fruitful discussions.
Appendices
A Correlated Gaussian Mechanism
We present an analysis of the LDP guarantee of the Gaussian mechanism \(G({\textbf{x}}) = {\textbf{x}}+ {\textbf{y}}\), where \({\textbf{x}}\) belongs to a subset \({\mathcal {S}}_d\) of \({\mathbb {R}}^d\), and \({\textbf{y}}\sim {\mathcal {N}}(\textbf{0}, {\boldsymbol{\varSigma }}_{\textbf{y}})\).
1.1 A.1 Optimal LDP Curve: Proof of Theorem 2
We extend [4, Thm. 8] to the case of correlated noise. First, for a mechanism M and a pair \({\boldsymbol{x}},{\boldsymbol{x}}'\), we define the privacy loss function as \(L_{M,{\boldsymbol{x}},{\boldsymbol{x}}'}({\boldsymbol{z}}) \triangleq \ln \frac{P_{M({\boldsymbol{x}})}({\boldsymbol{z}})}{P_{M({\boldsymbol{x}}')}({\boldsymbol{z}})}\). The PLRV \({\text {L}}_{M,{\boldsymbol{x}},{\boldsymbol{x}}'}\) is defined as the output of \(L_{M,{\boldsymbol{x}},{\boldsymbol{x}}'}\) when the input follows \(P_{M({\boldsymbol{x}})}\). The PLRV can be used to express the optimal LDP curve as \(\delta (\epsilon ) = \max _{{\boldsymbol{x}},{\boldsymbol{x}}' \in {\mathcal {S}}_d} \mathbb {P}\left[ {\text {L}}_{M,{\boldsymbol{x}},{\boldsymbol{x}}'} \ge \epsilon \right] - e^\epsilon \mathbb {P}\left[ {\text {L}}_{M,{\boldsymbol{x}},{\boldsymbol{x}}'} \le -\epsilon \right] \). (12)
Equation (12) is obtained from a similar result for DP given in [4, Thm. 5], upon modifying the notion of neighboring datasets.
For the mechanism G, we have that \(P_{G({\boldsymbol{x}})}({\boldsymbol{z}}) = \frac{\exp (-\frac{1}{2}({\boldsymbol{z}}- {\boldsymbol{x}})^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1} ({\boldsymbol{z}}-{\boldsymbol{x}}))}{\sqrt{(2\pi )^d|{\boldsymbol{\varSigma }}_{\textbf{y}}|}}\). Therefore, the PLRV can be expressed as \({\text {L}}_{G,{\boldsymbol{x}},{\boldsymbol{x}}'} = ({\boldsymbol{x}}- {\boldsymbol{x}}')^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1} \big ({\textbf{z}}- \frac{{\boldsymbol{x}}+ {\boldsymbol{x}}'}{2}\big )\), where \({\textbf{z}}\) is the output of G.
With \({\textbf{z}}\) identically distributed to \(G({\boldsymbol{x}})\), i.e., \({\textbf{z}}\sim {\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}_{\textbf{y}})\), we have that \({\text {L}}_{G,{\boldsymbol{x}},{\boldsymbol{x}}'} \sim {\mathcal {N}}(\eta ,2\eta )\) with \(\eta = \frac{1}{2}({\boldsymbol{x}}- {\boldsymbol{x}}')^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1} ({\boldsymbol{x}}-{\boldsymbol{x}}')\). It follows from [4, Lemma 7] that for \({\text {x}}\sim {\mathcal {N}}(\eta ,2\eta )\), \(\mathbb {P}\left[ {{\text {x}}\ge \epsilon }\right] - e^\epsilon \mathbb {P}\left[ {{\text {x}}\le -\epsilon }\right] \) is monotonically increasing in \(\eta \). By applying this result with \({\text {x}}= {\text {L}}_{G,{\boldsymbol{x}},{\boldsymbol{x}}'}\), we obtain that the maximum on the right-hand side of (12) is achieved when \(\eta \) is maximized, i.e., when \(\eta = \max _{{\boldsymbol{x}},{\boldsymbol{x}}'\in {\mathcal {S}}_d} \frac{1}{2}({\boldsymbol{x}}- {\boldsymbol{x}}')^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1} ({\boldsymbol{x}}-{\boldsymbol{x}}') = \varDelta ^2/2\). We then obtain the optimal LDP curve (2) after some simple computations.
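This distributional claim is easy to sanity-check by simulation (an illustrative script of ours, assuming NumPy; not part of the proof): the empirical mean and variance of the PLRV should be close to \(\eta \) and \(2\eta \).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)      # an arbitrary positive-definite covariance
x, xp = rng.normal(size=d), rng.normal(size=d)
eta = 0.5 * (x - xp) @ np.linalg.solve(Sigma, x - xp)

# PLRV: L(z) = (x - x')^T Sigma^{-1} (z - (x + x')/2) with z ~ N(x, Sigma)
z = rng.multivariate_normal(x, Sigma, size=200_000)
L = (z - (x + xp) / 2) @ np.linalg.solve(Sigma, x - xp)
```

With these draws, `L.mean()` is close to `eta` and `L.var()` is close to `2 * eta`, as Theorem 2's proof requires.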
1.2 A.2 The Case \(\mathcal {S}_d=\{{\boldsymbol{x}}\in \mathbb {R}^{d}:\Vert {\boldsymbol{x}}\Vert _{2} \le r\sqrt{d}\}\)
In this case, for every \({\boldsymbol{x}},{\boldsymbol{x}}'\in {\mathcal {S}}_d\), we have that \(({\boldsymbol{x}}- {\boldsymbol{x}}')^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1} ({\boldsymbol{x}}- {\boldsymbol{x}}') \le \lambda _{\textrm{max}}({\boldsymbol{\varSigma }}_{\textbf{y}}^{-1}) \left( \Vert {\boldsymbol{x}}\Vert _2 + \Vert {\boldsymbol{x}}'\Vert _2\right) ^2 = \lambda _{\textrm{min}}^{-1}({\boldsymbol{\varSigma }}_{\textbf{y}}) \left( \Vert {\boldsymbol{x}}\Vert _2 + \Vert {\boldsymbol{x}}'\Vert _2\right) ^2\) (16) \(\le 4 r^2 d \, \lambda _{\textrm{min}}^{-1}({\boldsymbol{\varSigma }}_{\textbf{y}}),\) (17)
where \(\lambda _{\textrm{max}}({\boldsymbol{\varSigma }}_{\textbf{y}}^{-1})\) is the largest eigenvalue of \({\boldsymbol{\varSigma }}_{\textbf{y}}^{-1}\) and \(\lambda _{\textrm{min}}({\boldsymbol{\varSigma }}_{\textbf{y}})\) is the smallest eigenvalue of \({\boldsymbol{\varSigma }}_{\textbf{y}}\). Here, (16) follows from the triangle inequality and the Rayleigh-Ritz theorem, and (17) holds because both \(\Vert {\boldsymbol{x}}\Vert _2\) and \(\Vert {\boldsymbol{x}}'\Vert _2\) are bounded by \(r\sqrt{d}\). Equalities occur in (16) and (17) if \({\boldsymbol{x}}= -{\boldsymbol{x}}' = r\sqrt{d} {\boldsymbol{v}}_{\min }\), where \({\boldsymbol{v}}_{\textrm{min}}\) is the eigenvector of \({\boldsymbol{\varSigma }}_{\textbf{y}}\) corresponding to \(\lambda _{\textrm{min}}({\boldsymbol{\varSigma }}_{\textbf{y}})\). Therefore, \(\varDelta = 2r\sqrt{d/\lambda _{\min }({\boldsymbol{\varSigma }}_{\textbf{y}})}\).
If we let \({\boldsymbol{\varSigma }}_{\textbf{y}}\) equal the covariance matrix of the sum of n independent vectors \({\textbf{x}}_1, \dots , {\textbf{x}}_n\) in \({\mathcal {S}}_d\), it holds that \(\lambda _{\textrm{min}}({\boldsymbol{\varSigma }}_{\textbf{y}}) \le \frac{1}{d}\textrm{Tr}({\boldsymbol{\varSigma }}_{\textbf{y}}) = \frac{1}{d}\mathbb {E}\left[ {\textrm{Tr}\left( \sum _{i=1}^n {\textbf{x}}_i {\textbf{x}}_i^{\scriptscriptstyle \textsf{T}}\right) } \right] = \frac{1}{d} \sum _{i=1}^n \mathbb {E}\left[ {\textrm{Tr}({\textbf{x}}_i {\textbf{x}}_i^{\scriptscriptstyle \textsf{T}})}\right] = \frac{1}{d} \sum _{i=1}^n \mathbb {E}\left[ {\Vert {\textbf{x}}_i\Vert _2^2}\right] \le n r^2.\) As a consequence, \(\varDelta \ge 2\sqrt{d/n}\).
1.3 A.3 Trade-Off Function: Proof of Proposition 1
The log-likelihood ratio (LLR) for the test between \(\{H :{\boldsymbol{z}}\) is generated from \({\mathcal {N}}({\boldsymbol{x}}, {\boldsymbol{\varSigma }}_{\textbf{y}})\}\) and \(\{H' :{\boldsymbol{z}}\) is generated from \({\mathcal {N}}({\boldsymbol{x}}', {\boldsymbol{\varSigma }}_{\textbf{y}})\}\) is given by the privacy loss function \(L_{G,{\boldsymbol{x}},{\boldsymbol{x}}'}({\boldsymbol{z}})\). For a threshold \(\theta \), the FNR and FPR are given by \(\textrm{FNR}(\theta ) = \mathbb {P}_{{\textbf{z}}\sim {\mathcal {N}}({\boldsymbol{x}}',{\boldsymbol{\varSigma }}_{\textbf{y}})}\left[ L_{G,{\boldsymbol{x}},{\boldsymbol{x}}'}({\textbf{z}}) \ge \theta \right] \) and \(\textrm{FPR}(\theta ) = \mathbb {P}_{{\textbf{z}}\sim {\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}_{\textbf{y}})}\left[ L_{G,{\boldsymbol{x}},{\boldsymbol{x}}'}({\textbf{z}}) \le \theta \right] \).
In the proof of Theorem 2, we have shown that \({\text {L}}_{G,{\boldsymbol{x}},{\boldsymbol{x}}'}({\textbf{z}}) \sim {\mathcal {N}}(\eta ,2\eta )\) with \(\eta = \frac{1}{2}({\boldsymbol{x}}- {\boldsymbol{x}}')^{\scriptscriptstyle \textsf{T}}{\boldsymbol{\varSigma }}_{\textbf{y}}^{-1} ({\boldsymbol{x}}-{\boldsymbol{x}}')\). Therefore, \(\textrm{FNR}(\theta ) =\varPhi (\frac{-\theta - \eta }{\sqrt{2\eta }})\) and \(\textrm{FPR}(\theta ) = \varPhi (\frac{\theta - \eta }{\sqrt{2\eta }})\). To achieve \(\textrm{FPR}(\theta ) \le \alpha \), the threshold must satisfy \(\theta \le \eta - \sqrt{2\eta } \varPhi ^{-1}(1-\alpha )\). Under this constraint, the minimum FNR is given by \(\varPhi (\varPhi ^{-1}(1 - \alpha ) -\sqrt{2\eta })\). This is by definition the optimal trade-off \(T_{{\mathcal {N}}({\boldsymbol{x}},{\boldsymbol{\varSigma }}), {\mathcal {N}}({\boldsymbol{x}}',{\boldsymbol{\varSigma }})}(\alpha )\).
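These threshold formulas can be verified numerically (our own check, assuming SciPy):

```python
from math import sqrt
from scipy.stats import norm

def fnr_fpr(theta, eta):
    """FNR and FPR of the LLR test at threshold theta, using
    L ~ N(eta, 2*eta) under H and L ~ N(-eta, 2*eta) under H'."""
    s = sqrt(2 * eta)
    return norm.cdf((-theta - eta) / s), norm.cdf((theta - eta) / s)

def tradeoff(alpha, eta):
    """Optimal trade-off: minimum FNR at FPR <= alpha,
    T(alpha) = Phi(Phi^{-1}(1 - alpha) - sqrt(2*eta))."""
    return norm.cdf(norm.ppf(1 - alpha) - sqrt(2 * eta))

# the largest threshold meeting FPR <= alpha attains the optimum
alpha, eta = 0.05, 2.0
theta_star = eta - sqrt(2 * eta) * norm.ppf(1 - alpha)
fnr, fpr = fnr_fpr(theta_star, eta)
```

At `theta_star`, the FPR equals `alpha` exactly and the FNR matches `tradeoff(alpha, eta)`, as derived above.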
B LDP Analysis of the Mechanism (1) in a Special Case: Proof of Theorem 3
To prove Theorem 3, we shall use the following preliminary results.
Lemma 1
For two pairs of distributions \((P_1,Q_1)\) and \((P_2,Q_2)\), if \(T_{P_1,Q_1}(\alpha ) \ge T_{P_2,Q_2}(\alpha )\) for every \(\alpha \in [0,1]\), then \({\textsf{E}}_{e^\epsilon }(P_1\Vert Q_1) \le {\textsf{E}}_{e^\epsilon }(P_2\Vert Q_2)\) for every \(\epsilon > 0\).
Lemma 1 follows directly by expressing the hockey-stick divergence in terms of the trade-off function as follows.
Lemma 2
Consider two distributions P and Q defined over \({\mathcal {X}}\). Define a random variable \({\text {L}}_P = \ln \frac{Q({\text {x}})}{P({\text {x}})}\), \({\text {x}}\sim P\), and denote its CDF by \(F_P(x) = \mathbb {P}\left[ {{\text {L}}_P \le x}\right] \). It holds that \({\textsf{E}}_{e^\epsilon }(P\Vert Q) = F_P(-\epsilon ) - e^\epsilon T_{P,Q}(1- F_P(-\epsilon ))\).
Proof of Lemma 2. Define the random variable \({\text {L}}_Q = \ln \frac{Q({\text {x}})}{P({\text {x}})}\), \({\text {x}}\sim Q\), and denote its CDF by \(F_Q(x)\). Observe that \(1- F_P(\theta )\) and \(F_Q(\theta )\) are the FPR and FNR of the likelihood test between P and Q with threshold \(\theta \). It follows from Definition 1 and the Neyman-Pearson lemma [20, Thm. 8.6.1] that \(T_{P,Q}(1 - F_P(\theta )) = F_Q(\theta )\). (20)
We further have that \({\textsf{E}}_{e^\epsilon }(P\Vert Q) = F_P(-\epsilon ) - e^\epsilon F_Q(-\epsilon )\). (21)
By substituting (20) with \(\theta = -\epsilon \) into (21), we complete the proof. \(\square \)
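Lemma 2 can be checked numerically for a Gaussian pair \(P = {\mathcal {N}}(0,1)\), \(Q = {\mathcal {N}}(\mu ,1)\), where the CDFs of the log-likelihood ratio are available in closed form (our own verification, assuming SciPy):

```python
from math import exp
from scipy.stats import norm
from scipy.integrate import quad

mu, eps = 1.5, 0.7
P = norm(0, 1).pdf
Q = norm(mu, 1).pdf

# Hockey-stick divergence E_{e^eps}(P || Q), computed directly as the
# integral of (P - e^eps * Q)_+; the integrand has a kink where
# P(x) = e^eps * Q(x), i.e., at x = mu/2 - eps/mu.
kink = mu / 2 - eps / mu
hs, _ = quad(lambda x: max(P(x) - exp(eps) * Q(x), 0.0), -20, 20, points=[kink])

# Lemma 2: under P, L = ln(Q/P) ~ N(-mu^2/2, mu^2), so
# F_P(-eps) = Phi(mu/2 - eps/mu); under Q, L ~ N(mu^2/2, mu^2), so
# F_Q(-eps) = Phi(-mu/2 - eps/mu).
rhs = norm.cdf(mu / 2 - eps / mu) - exp(eps) * norm.cdf(-mu / 2 - eps / mu)
```

The direct integral `hs` and the closed-form `rhs` agree to numerical precision.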
We are now ready to prove Theorem 3. For brevity, we omit the subscript 0 in \({\textbf{x}}_0\) and \({\text {x}}_{0j}\), \(j\in [d]\).
The Univariate Case. We first consider \(d = 1\). Then M in (1) is written as \(M(x) = x + {\text {y}},\) (22)
where \(\underline{r} \le x \le \overline{r}\) and \({\text {y}}\) follows a symmetric and log-concave distribution \(P_{\text {y}}\). We next show that a dominating pair of distributions for this mechanism is \((P_{\underline{r}+{\text {y}}},P_{\overline{r}+{\text {y}}})\). By symmetry of \(P_{\text {y}}\) around \(y^*\), we have that, for every \(a, b \in {\mathbb {R}}\), \({\textsf{E}}_{e^\epsilon }(a + {\text {y}}\Vert b + {\text {y}}) = {\textsf{E}}_{e^\epsilon }(b + {\text {y}}\Vert a + {\text {y}})\).
Therefore, it suffices to prove that \(\max _{\underline{r} \le a \le b \le \overline{r}} {\textsf{E}}_{e^\epsilon }(a + {\text {y}}\Vert b + {\text {y}}) = {\textsf{E}}_{e^\epsilon }(\underline{r} + {\text {y}}\Vert \overline{r} + {\text {y}})\).
To this end, we shall show that \({\textsf{E}}_{e^\epsilon }(a + {\text {y}}\Vert b + {\text {y}})\) increases with \(b - a\), and is thus maximized when \((a,b) = \displaystyle \mathop {\mathrm {arg\,max}}\limits _{a', b' \in [\underline{r}, \overline{r}], a' \le b'}(b' - a') = (\underline{r}, \overline{r})\). In light of Lemma 1, it suffices to show that \(T_{a + {\text {y}}, b + {\text {y}}}(\alpha )\) decreases with \(b - a\) for all \(\alpha \in [0,1]\). Indeed, this is true because \(T_{a + {\text {y}}, b + {\text {y}}}(\alpha ) = F_{\text {y}}(F_{\text {y}}^{-1}(1-\alpha ) - (b - a))\) for \({\text {y}}\) following a log-concave distribution (this follows from [12, Prop. A.3]), and because the CDF \(F_{\text {y}}(\cdot )\) is an increasing function.
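The expression \(T_{a+{\text {y}},b+{\text {y}}}(\alpha ) = F_{\text {y}}(F_{\text {y}}^{-1}(1-\alpha ) - (b-a))\) can be evaluated for a concrete log-concave noise distribution, e.g., the standard Laplace (our own illustration, assuming SciPy):

```python
from scipy.stats import laplace

def shift_tradeoff(alpha, shift):
    """T_{a+y, b+y}(alpha) = F(F^{-1}(1 - alpha) - (b - a)) for
    log-concave noise y; here F is the standard Laplace CDF and
    shift = b - a."""
    return laplace.cdf(laplace.ppf(1 - alpha) - shift)
```

The curve drops pointwise as the shift \(b - a\) grows, which is exactly the monotonicity used in the argument above.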
The Multivariate Case. We now address the general case with \(d\ge 1\). Using the independence assumption, we can write the mechanism (1) as \(M({\boldsymbol{x}}) = (M_1({\boldsymbol{x}}), M_2({\boldsymbol{x}}), \dots , M_d({\boldsymbol{x}}))\) where \(M_j({\boldsymbol{x}}) = {\boldsymbol{e}}_j^{\scriptscriptstyle \textsf{T}}{\boldsymbol{x}}+ {\text {y}}_j\), with \({\boldsymbol{e}}_j\) being the jth d-dimensional canonical basis vector. Therefore, \(M({\boldsymbol{x}})\) is a (nonadaptive) composition of d mechanisms \(\{M_j\}_{j\in [d]}\). Observe that each mechanism \(M_j\) has the form (22). Therefore, \((P_{\underline{r}_j + {\text {y}}_j}, P_{\overline{r}_j + {\text {y}}_j})\) is a dominating pair of distributions for \(M_j\). The proof is completed by applying [35, Thm. 10], which states that if \((P_j,Q_j)\) is a dominating pair of distributions for mechanism \(M_j\), \(j \in [d]\), then \((P_1 \times \dots \times P_d, Q_1 \times \dots \times Q_d)\) is a dominating pair of distributions for mechanism \(M({\boldsymbol{x}}) = (M_1({\boldsymbol{x}}), \dots , M_d({\boldsymbol{x}}))\).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ngo, KH., Östman, J., Durisi, G., Graell i Amat, A. (2024). Secure Aggregation Is Not Private Against Membership Inference Attacks. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science(), vol 14946. Springer, Cham. https://doi.org/10.1007/978-3-031-70365-2_11
DOI: https://doi.org/10.1007/978-3-031-70365-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70364-5
Online ISBN: 978-3-031-70365-2