1 Introduction

In the modern era, there has been a rapid increase in the amount of digital information collected by governments, social media, hospitals, etc. Though this data holds great utility for business and research purposes, inappropriate use of data can lead to a myriad of issues pertaining to privacy. For example, Target inferred that a teenage girl was pregnant before her family knew and started sending her coupons for baby products [8]. A few years ago, Uber’s poor privacy practices caught the news media’s attention: their employees misused customer data to track customers, including politicians and celebrities, in real time, and blogged about “Rides of Glory”, where Uber was able to track one-night stands [13]. [11] used the Internet Movie Database as a source of prior knowledge to re-identify the anonymized Netflix records of users, unveiling their alleged political preferences and other sensitive information. Due to these and other similar incidents, governments and policymakers have started to recognize the importance of protecting personal data. The European Union (EU) recently proposed the General Data Protection Regulation (GDPR) to protect all EU citizens from privacy breaches in today’s data-driven world [17], and other countries are contemplating similar regulations. Unfortunately, the gaps in current privacy-preserving techniques make it difficult for data collectors to support this kind of privacy regulation. Differential privacy, however, helps organizations comply with these regulations. The key idea of differential privacy is to bound the privacy impact on any individual whose information is present in the dataset, so that individual records and sensitive information pertaining to a particular user cannot be identified.

Many approaches can be taken to preserve the privacy of datasets. Early techniques relied on simple anonymization: redacting or removing certain fields from datasets and then operating on them normally. However, it quickly became apparent that an adversary with auxiliary information could learn significant information from these anonymized datasets. This led to the development of k-anonymity, which generalizes quasi-identifiers (pieces of data that by themselves are not unique identifiers but can be combined with others to act like one) and ensures that a particular user’s data is indistinguishable from that of at least \((k-1)\) other users [16]. Though k-anonymity can protect against identity disclosure, it is susceptible to homogeneity and background-knowledge attacks [14]. l-diversity overcomes this problem and protects against inference-based attacks [10]. However, the semantic relationships between sensitive attributes leave l-diversity prone to skewness and similarity attacks, as it is inadequate to prevent attribute disclosure [14]. Differential privacy, which provides strong theoretical privacy guarantees, was proposed to provide statistical indistinguishability of datasets.

Differentially private mechanisms are used to release statistics of a dataset as a whole while protecting the sensitive information of individuals in the dataset. In essence, differential privacy guarantees that the released results reveal little or no new information about any individual in the dataset. As a single record cannot affect the output significantly, an attacker cannot infer the private information corresponding to a particular individual.

Though there have been myriad significant contributions in the field of differential privacy, it has not yet been widely adopted in industry for two reasons: first, a lack of flexibility in existing mechanisms due to a dearth of configurable parameters; second, concerns over reduced utility and privacy. In this paper, we address these issues and offer solutions. Our contributions are as follows:

  1.

    First, we propose the generalized truncated Laplacian mechanism and derive upper bounds on its noise amplitude and noise power. We show that the generalized truncated Laplacian mechanism offers better flexibility than existing \((\epsilon ,\delta )\)-differentially private mechanisms and performs better than the optimal Gaussian mechanism [1] by reducing the noise amplitude and noise power in all valid privacy regimes.

  2.

    Second, we show how different Laplacian distributions can be merged at different breakpoints and prove that the resulting distribution is differentially private. We also show how this can enhance utility while guaranteeing privacy.

The proposed mechanisms enable data controllers to fine-tune the perturbation that is necessary to protect privacy for use-case-specific distortion requirements. This also mitigates the problems pertaining to inaccuracy and provides better utility by bounding the noise.

The paper is organized as follows. Section 2 compares and contrasts our work with related work in the field of differential privacy. Section 3 provides background on differential privacy. In Sect. 4 and Sect. 5, we present the generalized truncated Laplacian mechanism and the merging of Laplacian distribution mechanism, respectively. In Sect. 6 we conclude with a summary and a discussion of our future work.

2 Related Work

For numeric queries, \(\epsilon \)-differential privacy [3] is achieved by adding Laplacian noise to the query result. This has been the de facto approach in a number of works pertaining to differential privacy [4, 5, 9]. [2] proposed \((\epsilon ,\delta )\)-differential privacy, which can be interpreted as \(\epsilon \)-differential privacy “with probability \(1-\delta \)”. In spite of its near-ubiquitous use, the Laplacian mechanism has no single substantiation of its optimality. [6] proposes a truncated Laplacian mechanism, which draws noise from a truncated Laplacian distribution, and shows that it outperforms the optimal Gaussian mechanism by significantly reducing the noise amplitude and noise power in a myriad of privacy regimes. The mechanism of [6] offers approximate differential privacy and is defined for a symmetric truncated region, that is, \([-A, A]\). [15] propose piecewise mixture distributions that preserve differential privacy and elucidate the importance of flexibility. Most differentially private mechanisms and algorithms use probability distributions whose density functions have \(\epsilon \) as the only variable, a predefined and fixed sensitivity, and a minimal amount of additional flexibility for the query-mechanism designer. In this paper, we propose mechanisms that offer greater flexibility and provide better privacy guarantees than existing mechanisms.

In order to make use of the perturbed query outputs, we have to understand the trade-off between accuracy and privacy. \(\epsilon \) plays a significant role in determining this trade-off: it is inversely proportional to the scale parameter of the Laplacian distribution. If the value of \(\epsilon \) is close to zero, the responses to two queries made on neighboring datasets are virtually indistinguishable. However, this renders the queries useless, as a large amount of noise is added to the result. In prior literature on the accuracy and privacy of differentially private mechanisms, accuracy is measured in terms of the amount of noise added to the output of a query or in terms of its variance. [7] studied the trade-off between privacy and error for answering a group of linear queries in a differentially private manner, where the error is defined as the lowest expectation of the \(\ell ^2\)-norm of the noise among the query outputs. They also derived boundary conditions on the error given the differential privacy constraint. [12] extended this result on the trade-off between privacy and error to the case of \((\epsilon , \delta )\)-differential privacy.

3 Background

In this section, we will provide an overview of differential privacy, describe the privacy-accuracy trade-off under \(\epsilon \)-differential privacy and \((\epsilon ,\delta )\)-differential privacy, and provide the cost functions that are commonly used in evaluating the utility and privacy trade-off of mechanisms that satisfy differential privacy.

3.1 Differential Privacy

Consider a query function,

$$q: \mathcal {D}\rightarrow \mathbb {R},$$

where \(\mathcal {D}\) denotes the set of all possible datasets. The query function q is applied to a dataset or to subsets of a dataset and returns a real number. Any two datasets \(\mathcal {D}_1\in \mathcal {D}\) and \(\mathcal {D}_2\in \mathcal {D}\) are called neighboring datasets if they differ in at most one element; in other words, one dataset is a subset of the other and \(|\mathcal {D}_1 - \mathcal {D}_2|\le 1\). We denote two neighboring datasets \(\mathcal {D}_1\), \(\mathcal {D}_2\) by \(\mathcal {D}_1\sim \mathcal {D}_2\). A randomized query-answering mechanism \(\mathcal {A}\) is a function of the query function q that randomly outputs a real number with a probability distribution \(\mathcal {P}\) depending on \(q(\mathcal {D})\), where \(\mathcal {D}\) is the dataset.

A more relaxed notion of \(\epsilon \)-differential privacy is \((\epsilon ,\delta )\)-differential privacy, under which the mechanism is \(\epsilon \)-differentially private except with probability at most \(\delta \). Formally, we have the following definition.

Definition 1

(\((\epsilon , \delta )\)-Differential Privacy). A randomized mechanism \(\mathcal {A}:\mathcal {D}\rightarrow \mathcal {O}\) preserves \((\epsilon ,\delta )\)-differential privacy (\((\epsilon ,\delta )\)-DP), with \(\epsilon > 0\) and \(\delta > 0\), if

$$\Pr [\mathcal {A}(\mathcal {D}_1)\in \mathcal {T}]\le e^\epsilon \Pr [\mathcal {A}(\mathcal {D}_2)\in \mathcal {T}] + \delta $$

holds for every subset \(\mathcal {T} \subseteq \mathcal {O}\) and for any two neighboring datasets \(\mathcal {D}_1\sim \mathcal {D}_2\).

Definition 2

(Global Sensitivity). For a real-valued query function \(q: \mathcal {D} \rightarrow \mathbb {R}\), where \(\mathcal {D}\) denotes the set of all possible datasets, the global sensitivity of q, denoted by \(\varDelta \), is defined as

$$\varDelta = \max _{\mathcal {D}_1\sim \mathcal {D}_2} |q(\mathcal {D}_1)-q(\mathcal {D}_2)|,$$

where the maximum is taken over all pairs of neighboring datasets \(\mathcal {D}_1, \mathcal {D}_2\in \mathcal {D}\).

Note that when the query function q is a counting query or a histogram query, the global sensitivity is \(\varDelta = 1\), because removing one user from the dataset \(\mathcal {D}\) affects the output of the query by at most 1.
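As a toy illustration (our own, with a hypothetical age dataset), removing one record from a dataset changes a counting query’s output by at most 1:

```python
# A counting query: how many records satisfy a predicate. Removing one
# record changes the count by at most 1, so the global sensitivity is 1.
def count_query(dataset, predicate):
    return sum(1 for record in dataset if predicate(record))

d1 = [17, 23, 35, 41, 58]                 # hypothetical dataset of ages
d2 = d1[:-1]                              # neighboring dataset: one record removed
q = lambda d: count_query(d, lambda age: age >= 30)
print(abs(q(d1) - q(d2)))                 # |q(D1) - q(D2)| <= Delta = 1
```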

3.2 Utility Model

In this section, we describe how we evaluate the utility and privacy of a differentially private mechanism. Consider a cost function \(\mathcal {L}:\mathbb {R}\rightarrow \mathbb {R}\), which is a function of the random additive noise in the mechanism \(\mathcal {A}\). Given a random additive noise x, the cost is \(\mathcal {L}(x)\), and the expectation of the cost over the probability distribution \(\mathcal {P}\) is:

$$\int _{x\in \mathbb {R}}\mathcal {L}(x)\mathcal {P}(x)dx.$$

Upper bounds on the minimum noise amplitude and noise power correspond to the \(l^1\) cost function \(\mathcal {L}(x) = |x|\) and the \(l^2\) cost function \(\mathcal {L}(x)=x^2\), respectively. Our objective is to minimize this expected cost over the probability distributions that preserve differential privacy.
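To make the cost concrete, the expected cost of a candidate noise density can be evaluated numerically; the following minimal Python sketch (all values hypothetical) integrates the \(l^1\) and \(l^2\) costs against a zero-mean Laplace density:

```python
import numpy as np

lam = 2.0                                   # hypothetical scale parameter
x = np.linspace(-50 * lam, 50 * lam, 2_000_001)
pdf = np.exp(-np.abs(x) / lam) / (2 * lam)  # zero-mean Laplace density f(x)

amplitude = np.trapz(np.abs(x) * pdf, x)    # l^1 cost E|X|; equals lam for Laplace
power = np.trapz(x**2 * pdf, x)             # l^2 cost E[X^2]; equals 2*lam^2
print(amplitude, power)                     # ~2.0 and ~8.0
```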

3.3 Differentially Private Mechanisms

For the case of real-valued output, introducing noise in an additive manner is a standard technique to preserve differential privacy. Thus, we will be discussing mechanisms \(\mathcal {A}\) that preserve \(\epsilon \)- or \((\epsilon ,\delta )\)-differential privacy by adding random noise X drawn from a probability distribution \(\mathcal {P}\). We reserve the notation \(\mathcal {A}\) for mechanisms of the standard form:

$$\mathcal {A}(\mathcal {D})=q(\mathcal {D})+X.$$

We will also reserve the variable X for the additive random noise drawn from the probability distribution \(\mathcal {P}\) from now on unless stated otherwise.

One of the most well-known differentially private mechanisms is the Laplacian mechanism, which uses random noise X drawn from the symmetric Laplacian distribution. The zero-mean Laplacian distribution has a symmetric probability density function f(x) with scale parameter \(\lambda \), defined as:

$$f(x) = \frac{1}{2\lambda }e^{-\frac{|x|}{\lambda }}.$$

Given the global sensitivity, \(\varDelta \), of the query function q, and the privacy parameter \(\epsilon \), the Laplacian mechanism \(\mathcal {A}\) uses random noise X drawn from the Laplacian distribution with scale \(\lambda = \frac{\varDelta }{\epsilon }\). The Laplacian mechanism preserves \(\epsilon \)-differential privacy [2].
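For concreteness, a minimal sketch of the Laplacian mechanism in Python (the function name and NumPy usage are ours, not from the paper):

```python
import numpy as np

def laplacian_mechanism(query_result, sensitivity, epsilon, rng=None):
    """Perturb a real-valued query result with zero-mean Laplace noise of
    scale lambda = sensitivity / epsilon, preserving epsilon-DP."""
    rng = np.random.default_rng() if rng is None else rng
    return query_result + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# e.g., a counting query (sensitivity 1) answered with a hypothetical epsilon
noisy_count = laplacian_mechanism(412, sensitivity=1.0, epsilon=0.5)
```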

A variant of the symmetric Laplacian mechanism is the truncated Laplacian mechanism, which uses random noise generated from the truncated Laplace distribution. The zero-mean truncated Laplace distribution has a symmetric, bounded probability density function f(x) with scale \(\lambda \), defined as:

$$f(x) = {\left\{ \begin{array}{ll} Be^{-\frac{|x|}{\lambda }}, \text { for } x\in [-A, A]\\ 0, \text { otherwise} \end{array}\right. }$$

where

$$A = \frac{\varDelta }{\epsilon }\ln \Big (1+\frac{e^\epsilon -1}{2\delta }\Big ) \text { and } B =\frac{1}{2\frac{\varDelta }{\epsilon }\Big (1-\frac{1}{1+\frac{e^\epsilon -1}{2\delta }}\Big )}.$$

Given the global sensitivity \(\varDelta \) of the query function q, and the privacy parameters \(\epsilon \), \(\delta \), the truncated Laplacian mechanism \(\mathcal {A}\) uses random noise X drawn from the truncated Laplacian distribution with scale \(\lambda = \frac{\varDelta }{\epsilon }\). It has been proven to be \((\epsilon ,\delta )\)-differentially private for \(\delta <\frac{1}{2}\) [6].
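One way to sample such noise is inverse-transform sampling restricted to \([-A, A]\); the helper below is our own sketch (not code from [6]) and assumes \(\delta < \frac{1}{2}\):

```python
import numpy as np

def truncated_laplace_noise(sensitivity, epsilon, delta, rng=None):
    """Draw noise from the truncated Laplace distribution by restricting
    the Laplace CDF to [-A, A] and inverting it."""
    rng = np.random.default_rng() if rng is None else rng
    lam = sensitivity / epsilon
    A = lam * np.log(1.0 + (np.exp(epsilon) - 1.0) / (2.0 * delta))
    # Laplace CDF, used only to bound the uniform draw to [F(-A), F(A)].
    F = lambda t: 0.5 * np.exp(t / lam) if t < 0 else 1.0 - 0.5 * np.exp(-t / lam)
    u = rng.uniform(F(-A), F(A))
    # Invert the Laplace CDF; the result always lands inside [-A, A].
    return lam * np.log(2.0 * u) if u < 0.5 else -lam * np.log(2.0 * (1.0 - u))
```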

Remark 1

Note that an \(\epsilon \)- or \((\epsilon ,\delta )\)-differentially private mechanism \(\mathcal {A}\) with additive noise X drawn from probability distribution \(\mathcal {P}\) remains \(\epsilon \)- or \((\epsilon ,\delta )\)-differentially private when the mean \(\mu \) of \(\mathcal {P}\) is any finite real number instead of 0. Therefore, we discuss and prove only the \(\mu = 0\) case in this paper; the proof for any real \(\mu \) is similar.

4 Generalized Truncated Laplacian Mechanism

In this section, we propose an (\(\epsilon ,\delta \))-differentially private mechanism that offers better flexibility than the symmetrically bounded truncated Laplacian mechanism [6] and better accuracy than the optimal Gaussian mechanism [1]. First, we state the probability density function and the cumulative distribution function of the generalized truncated Laplacian distribution. Then, we describe the (\(\epsilon ,\delta \))-differentially private mechanism. Finally, we derive the upper bounds on its noise amplitude and noise power.

4.1 Generalized Truncated Laplace Distribution

The probability distribution underlying our mechanism can be viewed as a generalized truncated Laplace distribution. It is motivated by the symmetrically bounded Laplace distribution proposed by [6]; the distribution proposed in this paper is a more general version, as it is asymmetrically bounded.

To construct such a distribution, we set the privacy parameters \(\epsilon \) and \(\delta \). In contrast to most existing \((\epsilon ,\delta )\)-differentially private mechanisms, where \(\epsilon \) and \(\delta \) are the only two variables in the algorithm design, the generalized truncated Laplacian distribution allows another parameter that specifies the upper or lower bound of the probability density function. With this additional bounding parameter, which does not depend on the value of \(\epsilon \) or \(\delta \), the proposed generalized truncated Laplace distribution provides more flexibility.

Fig. 1. Laplacian mechanism vs. generalized truncated Laplacian mechanism

Definition 3

The zero-mean generalized truncated Laplace distribution has a probability density function f(x) with scale \(\lambda \), and is asymmetrically bounded by A and B where \(A<0<B\), defined as:

$$\begin{aligned} f(x)={\left\{ \begin{array}{ll} Me^{-\frac{|x|}{\lambda }} \text { for } x\in [A,B]\\ 0 \text { otherwise} \end{array}\right. } \text {where, } M=\frac{1}{\lambda (2-e^{\frac{A}{\lambda }}-e^{-\frac{B}{\lambda }})} \end{aligned}$$

Figure 1 depicts a zero-mean Laplace distribution and generalized truncated Laplacian distribution with a scale factor of 2.

The proposed probability distribution is valid, as its probability density function f(x) is greater than 0 for x in the sample space and \(\int _{-\infty }^{\infty } f(x)dx = 1\).

Next, we present the closed form of the cumulative distribution function F(x) of the generalized truncated Laplacian distribution:

$$\begin{aligned} F(x) ={\left\{ \begin{array}{ll} \frac{e^{\frac{x}{\lambda }} - e^{\frac{A}{\lambda }}}{2-e^{\frac{A}{\lambda }}-e^{-\frac{B}{\lambda }}} \text { if } x < 0\\ \frac{2-e^{\frac{A}{\lambda }} - e^{-\frac{x}{\lambda }}}{2-e^{\frac{A}{\lambda }}-e^{-\frac{B}{\lambda }}} \text { if } x \ge 0 \end{array}\right. } \end{aligned}$$
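The density and CDF above translate directly into an inverse-transform sampler; the sketch below (names and structure are ours) assumes \(A< 0 < B\):

```python
import numpy as np

def gtl_pdf(x, lam, A, B):
    """Density f(x) of Definition 3 with scale lam and bounds A < 0 < B."""
    M = 1.0 / (lam * (2.0 - np.exp(A / lam) - np.exp(-B / lam)))
    return np.where((x >= A) & (x <= B), M * np.exp(-np.abs(x) / lam), 0.0)

def gtl_sample(lam, A, B, rng=None):
    """Draw one sample by inverting the closed-form CDF F(x) above."""
    rng = np.random.default_rng() if rng is None else rng
    Z = 2.0 - np.exp(A / lam) - np.exp(-B / lam)   # common denominator of F
    u = rng.uniform()
    F0 = (1.0 - np.exp(A / lam)) / Z               # F evaluated at x = 0
    if u <= F0:                                    # invert the x < 0 branch
        return lam * np.log(u * Z + np.exp(A / lam))
    return -lam * np.log(2.0 - np.exp(A / lam) - u * Z)  # invert the x >= 0 branch
```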

4.2 Mechanism

Given the global sensitivity \(\varDelta \) of the query function q, and the privacy parameters \(\epsilon \), \(\delta \), the Generalized Truncated Laplacian mechanism \(\mathcal {A}\) uses random noise X drawn from the generalized truncated Laplacian distribution in Definition 3 with the following parameters:

$$\begin{aligned}&\lambda = \frac{\varDelta }{\epsilon }, A + \varDelta \le 0 \le B-\varDelta \\&\text {If }|A|\ge |B|, \\&{\left\{ \begin{array}{ll} A = \lambda \ln \left[ 2+(\frac{1 - \delta }{\delta })e^{-\frac{B}{\lambda }} - (\frac{1}{\delta })e^{-\frac{B-\varDelta }{\lambda }}\right] \\ B = \text {any positive real number satisfying} |A|\ge |B| \end{array}\right. }\\&\text {If }|A|<|B|, \\&{\left\{ \begin{array}{ll} A = \text {any negative real number satisfying } |A| < |B|\\ B = -\lambda \ln \left[ 2 +(\frac{1 - \delta }{\delta })e^{\frac{A}{\lambda }} - (\frac{1}{\delta })e^{\frac{A+\varDelta }{\lambda }}\right] \end{array}\right. }. \end{aligned}$$
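As an illustration, in the \(|A|\ge |B|\) case one chooses B and computes A from the displayed formula; the helper below is a sketch (parameter values hypothetical), with assertions checking the stated constraints:

```python
import numpy as np

def gtl_parameters(sensitivity, epsilon, delta, B):
    """Compute lam and the lower bound A for the |A| >= |B| case,
    transcribed from the formulas in Sect. 4.2."""
    lam = sensitivity / epsilon
    A = lam * np.log(2.0 + ((1.0 - delta) / delta) * np.exp(-B / lam)
                     - (1.0 / delta) * np.exp(-(B - sensitivity) / lam))
    assert A + sensitivity <= 0 <= B - sensitivity, "constraint of Sect. 4.2"
    assert abs(A) >= abs(B), "B must be chosen so that |A| >= |B| holds"
    return lam, A

# Hypothetical instantiation: Delta = 1, epsilon = 1, delta = 0.2, B = 1.6
lam, A = gtl_parameters(1.0, 1.0, 0.2, B=1.6)   # gives A of roughly -2.76
```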

Theorem 1

The generalized truncated Laplacian mechanism preserves \((\epsilon ,\delta )\)-differential privacy.

Proof

The proof of Theorem 1 relies on the following two lemmas, whose proofs can be found in Appendix A.

Lemma 1

$$\max \Big (\int _A^{A+\varDelta }f(x)dx, \int _{B-\varDelta }^B f(x)dx\Big ) = \delta $$

for the probability density function f(x), \(\lambda , A\) and B of the generalized truncated Laplace distribution.

Lemma 2

A mechanism \(\mathcal {A}(\mathcal {D})=q(\mathcal {D})+X\) that adds a random noise X drawn from probability distribution \(\mathcal {P}\) with probability density function f(x), satisfies \((\epsilon ,\delta )\)-differential privacy when

$$\mathcal {P}(\mathcal {S})-e^\epsilon \mathcal {P}(\mathcal {S}+d)\le \delta $$

holds for any \(|d|\le \varDelta ,\) and any measurable set \(\mathcal {S}\subseteq \mathbb {R}\), where \(\varDelta \) is the global sensitivity for the query function q.

Using Lemma 2, in order to prove that our mechanism is \((\epsilon , \delta )\)-differentially private, we need to show that for the global sensitivity \(\varDelta \) of our query function q,

$$\mathcal {P}(\mathcal {S})-e^\epsilon \mathcal {P}(\mathcal {S}+d)\le \delta $$

holds for any \(|d|\le \varDelta \) and any measurable set \(\mathcal {S}\subseteq \mathbb {R}\). Equivalently, it is sufficient to show that

$$\max _{\mathcal {S},\, |d|\le \varDelta }(\mathcal {P}(\mathcal {S})-e^\epsilon \mathcal {P}(\mathcal {S}+d))\le \delta .$$

If \(x\in (A+\varDelta , B-\varDelta )\) then,

$$\begin{aligned} \frac{f(x)}{f(x+d)}&=\frac{Me^{-\frac{|x|}{\lambda }}}{Me^{-\frac{|x+d|}{\lambda }}} = e^{\frac{|x+d|-|x|}{\lambda }} \le e^{\frac{|d|}{\lambda }} \le e^\epsilon , \end{aligned}$$

which implies \(\forall |d|\le \varDelta , f(x)-e^\epsilon f(x+d)\le 0\) when \(x\in (A+\varDelta , B-\varDelta )\).

Thus, for measurable set \(\mathcal {S}\),

$$\mathcal {P}(\mathcal {S})-e^\epsilon \mathcal {P}(\mathcal {S}+d) \le \mathcal {P}(\mathcal {S}')-e^\epsilon \mathcal {P}(\mathcal {S}'+d)$$

for \(\mathcal {S}\subseteq \mathbb {R}\) and \(\mathcal {S}' = \mathcal {S}\backslash (A+\varDelta , B-\varDelta )\). Therefore, \(\mathcal {P}(\mathcal {S})-e^\epsilon \mathcal {P}(\mathcal {S}+d)\) is maximized for some set \(\mathcal {S}\subseteq (-\infty , A+\varDelta ]\) or \(\mathcal {S}\subseteq [B-\varDelta , \infty )\). Since the distribution decays exponentially with rate \(\frac{1}{\lambda } = \frac{\epsilon }{\varDelta }\), multiplying the probability distribution \(\mathcal {P}\) by \(e^\epsilon \) corresponds to shifting it by \(\varDelta \). Therefore,

$$\mathcal {P}(\mathcal {S})-e^\epsilon \mathcal {P}(\mathcal {S}+d)\le \max \Big (\int _A^{A+\varDelta }f(x)dx, \int _{B-\varDelta }^B f(x)dx\Big ).$$

From Lemma 1, we have the desired inequality

$$\mathcal {P}(S)-e^\epsilon \mathcal {P}(S+d)\le \delta .$$

Remark 2

We claim that

$$0 < \delta \le \min \left( \int _A^0 f(x)dx, \int _0^B f(x)dx\right) .$$

Proof

If \(|A|\ge |B|\), then

$$\begin{aligned}&\int _A^0 f(x)dx >\int _0^B f(x)dx\\&\Rightarrow \min \left( \int _A^0 f(x)dx, \int _0^B f(x)dx\right) = \int _0^B f(x)dx \end{aligned}$$

Additionally,

$$\max \left( \int _A^{A+\varDelta } f(x)dx , \int _{B-\varDelta }^B f(x)dx\right) = \int _{B-\varDelta }^B f(x)dx = \delta $$

Since \(0\le B-\varDelta \),

$$\delta = \int _{B-\varDelta }^B f(x)dx \le \int _0^B f(x)dx.$$

If \(|A|<|B|\), the proof is similar.

4.3 Upper Bound on Noise Amplitude and Noise Power

We apply the generalized truncated Laplacian mechanism to derive upper bounds on the minimum noise amplitude and noise power, corresponding to the \(l^1\) cost function \(\mathcal {L}(x) = |x|\) and \(l^2\) cost function \(\mathcal {L}(x)=x^2\), respectively.

When \(\mathcal {L}(x)=|x|\), the upper bound on minimum noise amplitude is

$$\frac{2\lambda -(\lambda -A)e^{\frac{A}{\lambda }}-(\lambda +B)e^{-\frac{B}{\lambda }}}{2-e^{\frac{A}{\lambda }}-e^{-\frac{B}{\lambda }}},$$

where \(\lambda = \frac{\varDelta }{\epsilon }\), and A and B are as specified in Sect. 4.2.

This result is obtained by evaluating

$$\int _A^B |x|f(x)dx = M\left( \int _A^0 -xe^{\frac{x}{\lambda }}dx + \int _0^B xe^{-\frac{x}{\lambda }}dx \right) .$$

As the noise with probability density function f(x) satisfies \((\epsilon ,\delta )\)-differential privacy, this provides an upper bound on \(\inf _{\mathcal {P}\in \mathcal {P}_{\epsilon ,\delta }}\int _{x\in \mathbb {R}}|x|\mathcal {P}(dx)\), where \(\mathcal {P}_{\epsilon ,\delta }\) denotes the set of noise probability distributions that preserve \((\epsilon ,\delta )\)-differential privacy.

Similarly, we derive the upper bound on the minimum noise power by having \(\mathcal {L}(x) = x^2\), and we get

$$\frac{4\lambda ^2 -(2\lambda ^2-2\lambda A+A^2)e^{\frac{A}{\lambda }} - (2\lambda ^2+2\lambda B + B^2) e^{-\frac{B}{\lambda }}}{2-e^{\frac{A}{\lambda }}-e^{-\frac{B}{\lambda }}}$$

where \(\lambda = \frac{\varDelta }{\epsilon }\), and A and B are specified in Sect. 4.2.

As the noise with probability density function f(x) satisfies \((\epsilon ,\delta )\)-differential privacy, this provides an upper bound on \(\inf _{\mathcal {P}\in \mathcal {P}_{\epsilon ,\delta }}\int _{x\in \mathbb {R}}x^2\mathcal {P}(dx)\).
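Both closed forms can be cross-checked numerically; the sketch below (hypothetical \(\lambda \), A, B) compares them against direct integration of \(|x|f(x)\) and \(x^2f(x)\):

```python
import numpy as np

def gtl_amplitude(lam, A, B):
    """Closed-form E|X| from Sect. 4.3 (the l^1 upper bound)."""
    Z = 2.0 - np.exp(A / lam) - np.exp(-B / lam)
    return (2.0 * lam - (lam - A) * np.exp(A / lam)
            - (lam + B) * np.exp(-B / lam)) / Z

def gtl_power(lam, A, B):
    """Closed-form E[X^2] from Sect. 4.3 (the l^2 upper bound)."""
    Z = 2.0 - np.exp(A / lam) - np.exp(-B / lam)
    return (4.0 * lam**2
            - (2.0 * lam**2 - 2.0 * lam * A + A**2) * np.exp(A / lam)
            - (2.0 * lam**2 + 2.0 * lam * B + B**2) * np.exp(-B / lam)) / Z

lam, A, B = 1.0, -2.75, 1.6                    # hypothetical parameters
x = np.linspace(A, B, 2_000_001)
f = np.exp(-np.abs(x) / lam) / (lam * (2.0 - np.exp(A / lam) - np.exp(-B / lam)))
print(gtl_amplitude(lam, A, B), np.trapz(np.abs(x) * f, x))  # values agree
print(gtl_power(lam, A, B), np.trapz(x**2 * f, x))           # values agree
```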

We performed experiments to compare the performance of the generalized truncated Laplacian mechanism with the optimal Gaussian mechanism [1]; [1] calculates the variance of the optimal Gaussian mechanism using the cumulative distribution function instead of a tail-bound approximation. The ratio of the noise amplitude (and noise power) of the generalized truncated Laplacian mechanism to that of the optimal Gaussian mechanism is always less than 1 for appropriate values of \(\delta \), A and B, as shown in Appendix B. Compared to the optimal Gaussian mechanism, the generalized truncated Laplacian mechanism thus reduces the noise amplitude and noise power across all privacy regimes.

5 Merged Laplacian Mechanism

In this section, we propose an \(\epsilon \)-differentially private mechanism that merges different Laplacian distributions at different breakpoints. We also evaluate the \(l^1\) cost function \(\mathcal {L}(x) = |x|\) and the \(l^2\) cost function \(\mathcal {L}(x)=x^2\) for the proposed mechanism, compare them with those of the Laplacian mechanism, and show that the proposed mechanism achieves better utility.

Definition 4

The zero-mean merged Laplacian distribution has a probability density function f(x) with n break points \(0<c_1<c_2<\cdots < c_n = \infty \) and n scale parameters \(\lambda _1, \lambda _2, \ldots , \lambda _n\), defined as:

$$\begin{aligned}&f(x) = f_m(x) \text { for } x\in (-c_m,-c_{m-1}]\cup [c_{m-1}, c_m),\ \forall m\in \{1,2,\ldots , n\},\\&\text {where } c_0=0 \text { and } f_m(x)= a_m e ^{-\frac{|x|}{\lambda _m}}, \end{aligned}$$

and all \(a_m >0\) computed by

$$\begin{aligned}&\sum _{m=1}^{n} \int _{c_{m-1}}^{c_{m}} a_m e^{-\frac{|x|}{\lambda _m}} dx = \dfrac{1}{2}, \end{aligned}$$
(1)
$$\begin{aligned} \text { and }&f_m(c_{m})=f_{m+1}(c_{m}). \end{aligned}$$
(2)

Remark 3

Note that (1) and (2) give sufficient information to calculate \(a_m\) for \(m\in \{1,2,\ldots ,n\}\), as we can write the \(a_m\)’s in terms of \(\lambda _1,\lambda _2, \ldots , \lambda _n\) and \(a_1\) by inductively applying

$$f_m(c_m) = f_{m+1}(c_m)\implies a_me^{-\frac{c_m}{\lambda _m}} = a_{m+1}e^{-\frac{c_m}{\lambda _{m+1}}} \implies a_{m+1} = a_m e^{\frac{c_m}{\lambda _{m+1}}-\frac{c_m}{\lambda _m}}.$$

Then, we can rewrite (1) with \(a_1\) as the only unknown and solve for \(a_1\); hence, we obtain the values of the remaining \(a_m\)’s.
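The inductive computation in this remark can be sketched as follows (function name and array layout are ours; \(c_n = \infty \) is represented by np.inf):

```python
import numpy as np

def merged_laplace_coeffs(breaks, lams):
    """Solve for a_1, ..., a_n of Definition 4, given finite break points
    c_1 < ... < c_{n-1} (c_n = infinity) and scales lambda_1, ..., lambda_n."""
    n = len(lams)
    c = np.concatenate(([0.0], np.asarray(breaks, dtype=float), [np.inf]))
    # Continuity (2): express each a_{m+1} as a multiple r[m] of a_1.
    r = np.ones(n)
    for m in range(1, n):
        r[m] = r[m - 1] * np.exp(c[m] / lams[m] - c[m] / lams[m - 1])
    # Normalization (1): band m contributes a_m*lam_m*(e^{-c_{m-1}/lam_m} - e^{-c_m/lam_m}).
    half_mass = sum(r[m] * lams[m] * (np.exp(-c[m] / lams[m]) - np.exp(-c[m + 1] / lams[m]))
                    for m in range(n))
    a1 = 0.5 / half_mass
    return a1 * r

# e.g., two bands with hypothetical scales and one break point:
print(merged_laplace_coeffs([1.0], np.array([2.0, 1.0])))
```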

Now we prove that the proposed probability distribution is valid. To do this, we need to show that its probability density function f(x) is continuous and greater than 0 on its domain, and that the cumulative probability from \(-\infty \) to \(\infty \) is 1.

Proof

First, it is easy to see that \(f(x)>0\) for x in the domain, as \(e^{-\frac{|x|}{\lambda _m}}>0\) and \(a_m>0\) for all \(m\in \{1,2,\ldots ,n\}\); thus \(f_m(x)>0\Rightarrow f(x)>0\).

Additionally, f(x) is continuous on \(\cup _{m = 1}^{n}(c_{m-1}, c_m)\) as \(e^{-\frac{|x|}{\lambda _m}}\) is continuous, and at each break point \(c_m\) continuity is ensured by \(f_m(c_m) = f_{m+1}(c_m)\). Finally, we show that the cumulative probability from \(-\infty \) to \(\infty \) is 1:

$$\int _{-\infty }^{\infty }f(x)dx = 2\sum _{m=1}^{n}\int _{c_{m-1}}^{c_{m}} a_m e^{-\frac{|x|}{\lambda _m}}dx = 2\cdot \frac{1}{2} = 1,$$

where the first equality holds because f(x) is symmetric about 0 and the second follows from (1).

Now we propose a differentially private mechanism that adds noise drawn from the merged Laplacian distribution of Definition 4.

Theorem 2

Given the global sensitivity, \(\varDelta \), of the query function q, and the privacy parameter \(\epsilon = \epsilon _1\), the merged Laplacian mechanism \(\mathcal {A}\) uses random noise X drawn from the merged Laplacian distribution with scale parameters \(\lambda _m = \frac{\epsilon _m}{\varDelta }\), where \(\lambda _1> \lambda _2> \cdots > \lambda _n\), and preserves \(\epsilon \)-differential privacy.

Proof

To prove that our mechanism preserves \(\epsilon \)-differential privacy, we need to show that for \(\mathcal {D}_1\sim \mathcal {D}_2\),

$$Pr [\mathcal {A}(\mathcal {D}_1)\in \mathcal {T}]\le e^{\epsilon } Pr [\mathcal {A}(\mathcal {D}_2)\in \mathcal {T}]$$

for any subset \(\mathcal {T}\subseteq \mathcal {O}\), where \(\mathcal {O}\) is the set of all outputs of the mechanism. The above inequality is equivalent to

$$\frac{Pr[\mathcal {A}(\mathcal {D}_1) = t]}{Pr[\mathcal {A}(\mathcal {D}_2) = t]}\le e^{\epsilon }\text {, }\forall t\in \mathcal {T} \Leftrightarrow \frac{Pr[X = t - q(\mathcal {D}_1)]}{Pr[X = t - q(\mathcal {D}_2)]}\le e^{\epsilon }.$$

We prove this inductively. The base case is \(n = 1\), where the mechanism reduces to the well-known Laplacian mechanism, which is \(\epsilon \)-differentially private with \(\epsilon = \epsilon _1\).

Now, notice that since \(\lambda _m=\frac{\epsilon _m}{\varDelta }\) and \(\lambda _1> \lambda _2>\cdots >\lambda _n\), we have \(\max (\epsilon _1, \epsilon _2, \ldots , \epsilon _n)=\epsilon _1=\epsilon \).

Now, assume that the merged Laplacian mechanism with break points \(c_1, c_2, \ldots , c_{k-1}\), where \(0< c_1< c_2< \cdots < c_{k-1}\), is \(\epsilon = \epsilon _1\)-differentially private. We want to prove that the mechanism obtained by adding one more finite break point \(c_k > c_{k-1}\) also satisfies \(\epsilon \)-differential privacy. We prove the case where \(t - q(\mathcal {D}_1)\) and \(t - q(\mathcal {D}_2)\) are negative, as the other cases follow a similar proof with a few sign changes. For \(m\in \{1,2,\ldots , k-1\}\), with \(t - q(\mathcal {D}_1)\) falling in the m-th band and \(t - q(\mathcal {D}_2)\) in the k-th band, we have

$$\begin{aligned}&\frac{Pr[X = t - q(\mathcal {D}_1)]}{Pr[X = t - q(\mathcal {D}_2)]} =\frac{a_m e^{\frac{t-q(\mathcal {D}_1)}{\lambda _m}}}{a_{k}e^{\frac{t-q(\mathcal {D}_2)}{\lambda _k}}} =\frac{a_m}{a_k}\cdot e^{\frac{t-q(\mathcal {D}_1)}{\lambda _m}-\frac{t-q(\mathcal {D}_2)}{\lambda _k}} \end{aligned}$$

We also know that,

$$\begin{aligned} a_k&= a_{k-1} e^{\frac{c_{k-1}}{\lambda _{k}}-\frac{c_{k-1}}{\lambda _{k-1}}} = a_{k-2} e^{\frac{c_{k-2}}{\lambda _{k-1}}-\frac{c_{k-2}}{\lambda _{k-2}}+\frac{c_{k-1}}{\lambda _{k}}-\frac{c_{k-1}}{\lambda _{k-1}}} =a_m e^{\sum _{i=m}^{k-1} \left( \frac{c_{i}}{\lambda _{i+1}}-\frac{c_{i}}{\lambda _{i}}\right) } \end{aligned}$$

Hence,

$$\frac{a_m}{a_k} = e^{\sum _{i=m}^{k-1} \left( \frac{c_{i}}{\lambda _{i}}-\frac{c_{i}}{\lambda _{i+1}}\right) }.$$

Notice that

$$\begin{aligned} \sum _{i=m}^{k-1} \left( \frac{c_{i}}{\lambda _{i}}-\frac{c_{i}}{\lambda _{i+1}}\right)&= \sum _{i=m}^{k-1} \left( \frac{c_i\left( \lambda _{i+1}-\lambda _i\right) }{\lambda _i\lambda _{i+1}}\right) < 0 \text { since } \lambda _1> \lambda _2>\cdots > \lambda _n. \end{aligned}$$

Thus,

$$\begin{aligned}&\frac{Pr[X = t - q(\mathcal {D}_1)]}{Pr[X = t - q(\mathcal {D}_2)]} = \frac{a_m}{a_k}\cdot e^{\frac{t-q(\mathcal {D}_1)}{\lambda _m}-\frac{t-q(\mathcal {D}_2)}{\lambda _k}}< e^{\frac{t-q(\mathcal {D}_1)}{\lambda _m}-\frac{t-q(\mathcal {D}_2)}{\lambda _k}} = e^{\frac{\lambda _k(t-q(\mathcal {D}_1))-\lambda _m(t-q(\mathcal {D}_2))}{\lambda _m\lambda _k}}\\<&e^{\frac{\lambda _m(t-q(\mathcal {D}_1))-\lambda _m(t-q(\mathcal {D}_2))}{\lambda _m\lambda _k}} = e^{\frac{t-q(\mathcal {D}_1)-t+q(\mathcal {D}_2)}{\lambda _k}} = e^{\frac{q(\mathcal {D}_2)-q(\mathcal {D}_1)}{\lambda _k}} = e^{\epsilon _k} < e^\epsilon . \end{aligned}$$

Hence, we have proved that the proposed mechanism is \(\epsilon \)-differentially private.

We evaluate the \(l^1\) cost function \(\mathcal {L}(x) = |x|\) and the \(l^2\) cost function \(\mathcal {L}(x)=x^2\) for the Laplacian mechanism, the merged Laplacian mechanism with one break point, and the merged Laplacian mechanism with two break points, as shown in Appendix C. We show that the cost for the merged Laplacian mechanism with two break points is lower than that of the Laplacian mechanism; hence we achieve better utility for the same privacy loss.
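As a rough numerical illustration (not the experiments of Appendix C), the \(l^1\) cost of a merged Laplacian density can be integrated directly, reusing merged_laplace_coeffs from the sketch after Remark 3; all scales and break points below are hypothetical:

```python
import numpy as np

lams = np.array([2.0, 1.0, 0.5])     # lambda_1 > lambda_2 > lambda_3
breaks = [1.0, 3.0]                  # break points c_1, c_2 (c_3 = infinity)
a = merged_laplace_coeffs(breaks, lams)

# Build the density on the positive half-line, band by band.
x = np.linspace(0.0, 60.0, 4_000_001)
edges = np.concatenate(([0.0], breaks, [np.inf]))
band = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(lams) - 1)
f = a[band] * np.exp(-x / lams[band])

merged_l1 = 2.0 * np.trapz(x * f, x)  # symmetric density: double the half-line
print(merged_l1, lams[0])             # roughly 1.03, versus E|X| = 2.0 for Laplace
```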

6 Conclusion and Future Work

In this paper, we presented two novel differentially private mechanisms that provide better accuracy guarantees than existing mechanisms. First, we presented a new noise-adding mechanism that preserves \((\epsilon ,\delta )\)-differential privacy. The proposed mechanisms provide more scope for customization, as they have more parameters to tune; owing to this flexible nature, appropriate values for the different parameters can be chosen for each use case. We also showed that the generalized truncated Laplacian mechanism performs better than the optimal Gaussian mechanism. Next, we showed that the proposed merged Laplacian mechanism performs better in terms of the \(l^1\) and \(l^2\) losses without sacrificing additional privacy. As part of future work, we plan to perform an in-depth comparison of \((\epsilon ,\delta )\)-differentially private and \(\epsilon \)-differentially private mechanisms and highlight the pros and cons of each.