
1 Introduction

With the advent of Big Data, research on privacy-preserving data collection and analysis is flourishing, as the data that users continuously produce becomes very valuable once aggregated. Data-analysis scenarios often involve an Aggregator, which collects individual data from multiple (independent) users in order to compute useful statistics; these statistics are generally forwarded to Data Analyzers, whose role is to extract insightful information about the entire user population. Various real-world examples motivate this generic scenario:

  • The analysis of different user profiles and the derivation of statistics can help recommendation engines provide targeted advertisements. In such scenarios a service provider collects data from each individual user (e.g., online purchases), thus acting as an Aggregator, and computes an on-demand aggregate value upon receiving a request from the advertisement company. The latter, acting as a Data Analyzer, further infers statistics in order to send the appropriate advertisements to each category of users.

  • Data aggregation is a promising tool in the field of healthcare research. Different types of data, sensed by body sensors (e.g., blood pressure), are collected on a large scale by Aggregators. Health scientists, who act as Data Analyzers, infer statistical information from these data without accessing the individual inputs (for privacy reasons). An aggregate value computed over a large population provides very useful information for deriving statistical models, evaluating therapeutic performance, or learning the likelihood of upcoming patients’ diseases.

Unfortunately, existing solutions only focus on the problem of data confidentiality and consider the Aggregator to be honest-but-curious: the Aggregator wants to discover the content of each individual's data, but performs the aggregation operation correctly. In this paper we consider a more powerful security model by assuming a malicious Aggregator: the Aggregator may provide a bogus aggregate value to the Data Analyzer. In order to protect against such malicious behavior, we propose that, along with the aggregate value, the Aggregator provide the Data Analyzer with a proof of the correctness of the computed aggregate result. For efficiency reasons, we require that the Data Analyzer verify the correctness of the computation without communicating with the users in the system.

The underlying idea of our solution is that each user encrypts its data according to the scheme of Shi et al. [17] using its own secret encryption key, and sends the resulting ciphertext to the untrusted Aggregator. Users also homomorphically tag their data, using two layers of randomness with two different keys, and forward the tags to the Aggregator. The latter computes the sum by applying operations on the ciphertexts and derives a proof of computation correctness from the tags. The Aggregator finally sends the result and the proof to the Data Analyzer. In addition to ensuring obliviousness against the Aggregator and the Data Analyzer (i.e., neither the Data Analyzer nor the Aggregator learns individual data inputs), the proposed protocol assures public verifiability: any third party can verify the correctness of the aggregate value.

To the best of our knowledge we are the first to define a model for Privacy and Unforgeability for Data Aggregation (\(\mathbf{{ PUDA }}\)). We also instantiate a \(\mathbf{{ PUDA }}\) scheme that supports:

  • A multi-user setting where multiple users produce personal sensitive data without interacting with each other.

  • Privacy of users’ individual data.

  • Public verifiability of the aggregate value.

2 Problem Statement

We envision a scenario whereby a set of users \(\mathbb {U} = \{\mathcal {U}_{i}\}_{i=1}^n\) produce sensitive data inputs \(x_{i,t}\) at each time interval t. These individual data are first encrypted into ciphertexts \(c_{i,t}\) and then forwarded to an untrusted Aggregator \(\mathcal{A}\). Aggregator \(\mathcal{A}\) aggregates all the received ciphertexts, decrypts the aggregate and forwards the resulting plaintext to a Data Analyzer \(\mathcal{DA}\), together with a cryptographic proof that assures the correctness of the aggregation operation, which in this paper corresponds to the sum of the users’ individual data. An important criterion that we aim to fulfill in this paper is that Data Analyzer \(\mathcal{DA}\) can verify the correctness of the Aggregator’s output without compromising users’ privacy. Namely, at the end of the verification operation, both Aggregator \(\mathcal{A}\) and Data Analyzer \(\mathcal{DA}\) learn nothing but the value of the aggregation. While the homomorphic signatures proposed in [4, 10] seem to answer the verifiability requirement, the authors of those papers only consider scenarios where a single user generates data.

With the aim of assuring both individual users’ privacy and unforgeable aggregation, we first come up with a generic model for privacy-preserving and unforgeable aggregation that identifies the algorithms necessary to implement such functionalities, and we define the corresponding privacy and security models. Furthermore, we propose a concrete solution which combines an existing privacy-preserving aggregation scheme [17] with an additively homomorphic tag designed for bilinear groups.

Concretely, a scheme that allows a malicious Aggregator to compute the sum of users’ data in a privacy-preserving manner and to produce a proof of correct aggregation starts by running a setup phase. During setup, each user receives a secret key that will be used to encrypt the user’s private input and to generate the corresponding authentication tag; the Aggregator \(\mathcal{A}\) and the Data Analyzer \(\mathcal{DA}\), on the other hand, are provided with a secret decryption key and a public verification key, respectively. After the key distribution, each user sends its data encrypted and authenticated to Aggregator \(\mathcal{A}\), while making sure that the computed ciphertext and the matching authentication tag leak no information about its private input. On receiving users’ data, Aggregator \(\mathcal{A}\) first aggregates the received ciphertexts and decrypts the sum using its decryption key, then uses the received authentication tags to produce a proof that demonstrates the correctness of the decrypted sum. Finally, Data Analyzer \(\mathcal{DA}\) verifies the correctness of the aggregation thanks to the public verification key.

2.1 PUDA Model

A \(\mathbf{{ PUDA }}\) scheme consists of the following algorithms:

  • \({\mathsf {Setup}}(1^{\kappa }) \rightarrow (\mathcal {P}, \mathsf {SK_A}, \{\mathsf{SK}_i\}_{\mathcal {U}_i \in \mathbb {U}}, \mathsf{VK})\): A randomized algorithm which, on input of a security parameter \(\kappa \), outputs the public parameters \(\mathcal {P}\) that will be used by subsequent algorithms, the Aggregator \(\mathcal{A}\)’s secret key \(\mathsf {SK_A}\), the secret keys \(\mathsf { SK}_i\) of users \(\mathcal {U}_i\) and the public verification key \(\mathsf {VK}\).

  • \({\mathsf {EncTag}}(t, \mathsf{SK}_i, x_{i,t}) \rightarrow (c_{i,t},{\mathsf {\sigma }}_{i,t})\): A randomized algorithm which, on input of a time interval t, the secret key \(\mathsf{SK}_i\) of user \(\mathcal {U}_i\) and data \(x_{i, t}\), encrypts \(x_{i,t}\) into a ciphertext \(c_{i,t}\) and computes a tag \({\mathsf {\sigma }}_{i,t}\) that authenticates \(x_{i, t}\).

  • \({\mathsf {Aggregate}}(\mathsf {SK_A}, \{c_{i,t}\}_{\mathcal {U}_i \in \mathbb {U}}, \{{\mathsf {\sigma }}_{i,t}\}_{\mathcal {U}_i \in \mathbb {U}})\rightarrow (\mathsf{sum_{t}}, {\mathsf {\sigma _t}})\): A deterministic algorithm executed by the Aggregator \(\mathcal{A}\). It takes as input Aggregator \(\mathcal{A}\)’s secret key \(\mathsf {SK_A}\), the ciphertexts \(\{c_{i, t}\}_{\mathcal {U}_i \in \mathbb {U}}\) and the authentication tags \(\{\sigma _{i, t}\}_{\mathcal {U}_i \in \mathbb {U}}\), and outputs the sum \(\mathsf{sum_{t}}\) of the values \(\{x_{i,t}\}_{{\mathcal {U}}_i \in \mathbb {U}}\) in cleartext, together with a proof \({\mathsf {\sigma _t}}\) of correctness for \(\mathsf{sum_{t}}\).

  • \({\mathsf {Verify}}(\mathsf {VK}, t, \mathsf{sum_{t}},{\mathsf {\sigma _t}})\rightarrow \{0, 1\}\): A deterministic algorithm executed by the Data Analyzer \(\mathcal{DA}\). It outputs 1 if Data Analyzer \(\mathcal{DA}\) is convinced that proof \({\mathsf {\sigma _t}}\) corresponds to the sum \(\mathsf{sum_{t}}= \sum _{{\mathcal {U}}_i \in \mathbb {U}} x_{i,t}\), where \(x_{i,t}\) is the individual data input of user \({\mathcal {U}}_i\) at time interval t, and 0 otherwise.

2.2 Security Model

In this paper, we only focus on the adversarial behavior of Aggregator \(\mathcal{A}\). The rationale behind this is that Aggregator \(\mathcal{A}\) is the only party in the protocol that sees all the messages exchanged during the protocol execution: namely, Aggregator \(\mathcal{A}\) has access to users’ ciphertexts. It follows that by ensuring security properties against the Aggregator, one, by the same token, ensures these security properties against both Data Analyzer \(\mathcal{DA}\) and external parties.

In accordance with previous work [11, 17], we formalize the property of Aggregator obliviousness, which ensures that at the end of a protocol execution, Aggregator \(\mathcal{A}\) only learns the sum of users’ inputs and nothing else. Also, we enhance the security definitions of data aggregation with the notion of aggregate unforgeability. As the name implies, aggregate unforgeability guarantees that Aggregator \(\mathcal{A}\) cannot forge a valid proof \({\mathsf {\sigma _t}}\) for a sum \(\mathsf{sum_{t}}\) that was not computed correctly from users’ inputs (i.e. cannot generate a proof for \(\mathsf{sum_{t}}\ne \sum x_{i, t}\)).

Aggregator Obliviousness. Aggregator Obliviousness ensures that when users \(\mathcal {U}_i\) provide Aggregator \(\mathcal{A}\) with ciphertexts \(c_{i, t}\) and authentication tags \({\mathsf {\sigma }}_{i,t}\), Aggregator \(\mathcal{A}\) cannot learn any information about individual inputs \(x_{i, t}\) other than the sum \(\sum x_{i, t}\). We extend the existing definition of Aggregator Obliviousness (cf. [11, 13, 17]) so as to capture the fact that Aggregator \(\mathcal{A}\) not only has access to the ciphertexts \(c_{i, t}\), but also to the authentication tags \({\mathsf {\sigma }}_{i, t}\) that enable Aggregator \(\mathcal{A}\) to generate proofs of correct aggregation.

Similarly to the work of [11, 17], we formalize Aggregator obliviousness using an indistinguishability-based game in which Aggregator \(\mathcal{A}\) accesses the following oracles:

  • \(\mathcal {O}_\mathsf{{Setup}}\): When called by Aggregator \(\mathcal{A}\), this oracle initializes the system parameters; it then gives the public parameters \(\mathcal {P}\), the Aggregator’s secret key \(\mathsf {SK_A}\) and public verification key \(\mathsf {VK}\) to \(\mathcal{A}\).

  • \(\mathcal {O}_\mathsf{{Corrupt}}\): When queried by Aggregator \(\mathcal{A}\) with a user \({\mathcal {U}}_i\)’s identifier \(\mathsf{uid}_i\), this oracle provides Aggregator \(\mathcal{A}\) with \({\mathcal {U}}_i\)’s secret key, denoted \(\mathsf {SK}_i\).

  • \(\mathcal {O}_\mathsf{EncTag}\): When queried with time t, user \({\mathcal {U}}_i\)’s identifier \(\mathsf{uid}_i\) and a data point \(x_{i, t}\), this oracle outputs the ciphertext \(c_{i, t}\) and the authentication tag \({\mathsf {\sigma }}_{i,t}\) of \(x_{i, t}\).

  • \(\mathcal {O}_\mathsf{AO}\): When called with a subset of users \(\mathbb {S} \subset \mathbb {U}\) and with two time-series \(\mathcal {X}^0_{t}=(\mathcal {U}_i, t, x^0_{i, t})_{\mathcal {U}_i \in \mathbb {S}}\) and \(\mathcal {X}^1_{t}=(\mathcal {U}_i, t, x^1_{i, t})_{\mathcal {U}_i \in \mathbb {S}}\) such that \(\sum x^0_{i, t} = \sum x^1_{i, t}\), this oracle flips a random coin \(b \in \{0, 1\}\) and returns an encryption of the time-series \((\mathcal {U}_i, t, x^b_{i, t})_{\mathcal {U}_i \in \mathbb {S}}\), that is, the tuple of ciphertexts \(\{c^b_{i, t}\}_{\mathcal {U}_i \in \mathbb {S}}\) and the corresponding authentication tags \(\{{\mathsf {\sigma }}^b_{i, t}\}_{\mathcal {U}_i \in \mathbb {S}}\).

Aggregator \(\mathcal{A}\) is accessing the aforementioned oracles during a learning phase (cf. Algorithm 1) and a challenge phase (cf. Algorithm 2). In the learning phase, \(\mathcal{A}\) calls oracle \(\mathcal {O}_\mathsf{{Setup}}\) which in turn returns the public parameters \(\mathcal {P}\), the public verification key \(\mathsf{VK}\) and the Aggregator’s secret key \(\mathsf {SK_A}\). It also interacts with oracle \(\mathcal {O}_\mathsf{{Corrupt}}\) to learn the secret keys \(\mathsf{SK}_i\) of users \({\mathcal {U}}_i\), and oracle \(\mathcal {O}_\mathsf{{EncTag}}\) to get a set of ciphertexts \(c_{i,t}\) and authentication tags \({\mathsf {\sigma }}_{i,t}\).

In the challenge phase, Aggregator \(\mathcal {A}\) chooses a subset \(\mathbb {S}^*\) of users that were not corrupted in the learning phase, and a challenge time interval \(t^*\) for which it did not make an encryption query. Oracle \(\mathcal {O}_\mathsf{AO}\) then receives two time-series \(\mathcal {X}^0_{t^*} = (\mathcal {U}_i, t^*, x^0_{i, t^*})_{\mathcal {U}_i \in \mathbb {S}^*}\) and \(\mathcal {X}^1_{t^*} = (\mathcal {U}_i, t^*, x^1_{i, t^*})_{\mathcal {U}_i \in \mathbb {S}^*}\) from \(\mathcal{A}\), such that \(\sum _{\mathcal {U}_i \in \mathbb {S}^*} x^0_{i, t^*} = \sum _{\mathcal {U}_i \in \mathbb {S}^*} x^1_{i, t^*}\). Then oracle \(\mathcal {O}_\mathsf{AO}\) flips a random coin \(b {\,\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}\,}\{0,1\}\) and returns to \( \mathcal{A}\) the ciphertexts \(\{c_{i, t^*}^b\}_{\mathcal {U}_i \in \mathbb {S}^*}\) and the matching authentication tags \(\{{\mathsf {\sigma }}_{i,t^*}^b\}_{\mathcal {U}_i \in \mathbb {S}^*}\).

At the end of the challenge phase, Aggregator \(\mathcal{A}\) outputs a guess \(b^*\) for the bit b.

We say that Aggregator \(\mathcal{A}\) succeeds in the Aggregator obliviousness game, if its guess \(b^*\) equals b.

Algorithm 1. Learning phase of the Aggregator obliviousness game.
Algorithm 2. Challenge phase of the Aggregator obliviousness game.

Definition 1

(Aggregator Obliviousness). Let \(\Pr [\mathcal{A}^{\mathbf {AO}}]\) denote the probability that Aggregator \(\mathcal{A}\) outputs \(b^*=b\). Then an aggregation protocol is said to ensure Aggregator obliviousness if, for any polynomially bounded Aggregator \(\mathcal{A}\), \(\Pr [\mathcal{A}^{\mathbf {AO}}] \le \frac{1}{2} + \epsilon (\kappa )\), where \(\epsilon \) is a negligible function and \(\kappa \) is the security parameter.

Aggregate Unforgeability. We augment the security requirements of data aggregation with the requirement of aggregate unforgeability. More precisely, we assume that Aggregator \(\mathcal{A}\) is not only interested in compromising the privacy of users participating in the data aggregation protocol, but is also interested in tampering with the sum of users’ inputs. That is, Aggregator \(\mathcal{A}\) may sometimes have an incentive to feed Data Analyzer \(\mathcal{DA}\) erroneous sums. Along these lines, we define aggregate unforgeability as the security feature that ensures that Aggregator \(\mathcal{A}\) cannot convince Data Analyzer \(\mathcal{DA}\) to accept a bogus sum, as long as users \({\mathcal {U}}_i\) in the system are honest (i.e. they always submit their correct input and do not collude with the Aggregator \(\mathcal{A}\)).

In compliance with previous work [7, 10] on homomorphic signatures, we formalize aggregate unforgeability via a game in which Aggregator \(\mathcal{A}\) accesses oracles \(\mathcal {O}_\mathsf{{Setup}}\) and \(\mathcal {O}_\mathsf{EncTag}\). Furthermore, given the property that anyone holding the public verification key \(\mathsf{VK}\) can execute the algorithm \(\mathsf{Verify}\), we assume that Aggregator \(\mathcal{A}\) during the unforgeability game runs the algorithm \(\mathsf{Verify}\) by itself.

As shown in Algorithm 3, Aggregator \(\mathcal{A}\) enters the aggregate unforgeability game by querying the oracle \(\mathcal {O}_\mathsf{{Setup}}\) with a security parameter \(\kappa \). Oracle \(\mathcal {O}_\mathsf{{Setup}}\) accordingly returns the public parameters \(\mathcal {P}\), the verification key \(\mathsf {VK}\) and the secret key \(\mathsf {SK_A}\) of Aggregator \(\mathcal{A}\). Moreover, Aggregator \(\mathcal{A}\) calls oracle \(\mathcal {O}_\mathsf{EncTag}\) with tuples \((t, \mathsf{uid}_i, x_{i, t})\) in order to receive the ciphertext \(c_{i,t}\) encrypting \(x_{i, t}\) and the matching authentication tag \({\mathsf {\sigma }}_{i,t}\), both computed using user \({\mathcal {U}}_i\)’s secret key \(\mathsf{SK}_i\). Note that for each time interval t, Aggregator \(\mathcal{A}\) is allowed to query oracle \(\mathcal {O}_\mathsf{EncTag}\) for user \({\mathcal {U}}_i\) only once. In other words, Aggregator \(\mathcal{A}\) cannot submit two distinct queries to oracle \(\mathcal {O}_\mathsf{EncTag}\) with the same time interval t and the same user identifier \(\mathsf{uid}_i\). Without loss of generality, we suppose that for each time interval t, Aggregator \(\mathcal{A}\) invokes oracle \(\mathcal {O}_\mathsf{EncTag}\) for all users \({\mathcal {U}}_i\) in the system.

At the end of the aggregate unforgeability game (see Algorithm 4), Aggregator \(\mathcal{A}\) outputs a tuple \((t^*, \mathsf{sum}_{t^*}, \sigma _{t^*})\). We say that Aggregator \(\mathcal{A}\) wins the aggregate unforgeability game if one of the following statements holds:

  1. \(\mathsf{Verify}(\mathsf {VK}, t^*, \mathsf{sum}_{t^*}, \sigma _{t^*}) \rightarrow 1\) and Aggregator \(\mathcal{A}\) never made a query to oracle \(\mathcal {O}_\mathsf{EncTag}\) that involves time interval \(t^*\). In the remainder of this paper, we call this type of forgery a Type I Forgery.

  2. \(\mathsf{Verify}(\mathsf {VK}, t^*, \mathsf{sum}_{t^*}, \sigma _{t^*}) \rightarrow 1\) and Aggregator \(\mathcal{A}\) has made a query to oracle \(\mathcal {O}_\mathsf{EncTag}\) for time \(t^*\), but \(\mathsf{sum}_{t^*} \ne \sum _{\mathcal {U}_i} x_{i, t^*}\). In what follows, we call this type of forgery a Type II Forgery.

Algorithm 3. Learning phase of the aggregate unforgeability game.
Algorithm 4. Output phase of the aggregate unforgeability game.

Definition 2

(Aggregate Unforgeability). Let \(\Pr [\mathcal{A}^{\mathbf {AU}}]\) denote the probability that Aggregator \(\mathcal{A}\) wins the aggregate unforgeability game, that is, the probability that Aggregator \(\mathcal{A}\) outputs a Type I Forgery or Type II Forgery that will be accepted by algorithm \(\mathsf{Verify}\).

An aggregation protocol is said to ensure aggregate unforgeability if for any polynomially bounded aggregator \(\mathcal{A}\), \(\Pr [\mathcal{A}^{\mathbf {AU}}] \le \epsilon (\kappa )\), where \(\epsilon \) is a negligible function in the security parameter \(\kappa \).

3 Idea of our PUDA Protocol

Our PUDA protocol relies on two building blocks:

  • A homomorphic encryption algorithm that allows the Aggregator to compute the sum without divulging individual data.

  • A homomorphic tag that allows each user to authenticate the data input \(x_{i,t}\), in such a way that the Aggregator can use the collected tags to construct a proof that demonstrates to the Data Analyzer \(\mathcal{DA}\) the correctness of the aggregated sum.

Concisely, a set of non-interacting users are connected to personal services and devices that produce personal data. Without any coordination, each user chooses a random tag key \(\mathsf {tk}_i\) and sends an encoding thereof to the key dealer. After collecting all encoded keys, the key dealer publishes the corresponding public verification key \(\mathsf {VK}\). This verification key is computed as a function of the encodings. Later, the key dealer gives each user in the system an encryption key \(\mathsf {ek}_i\) that will be used to compute the user’s ciphertexts. Accordingly, the secret key \(\mathsf{SK}_i\) of each user is defined as the pair of tag key \(\mathsf{tk}_i\) and encryption key \(\mathsf{ek}_i\). Finally, the key dealer provides the Aggregator with the secret key \(\mathsf{SK}_A\), computed as the sum of the encryption keys \(\mathsf{ek}_i\), and goes off-line.

Now at each time interval t, each user employs its secret key \(\mathsf {SK}_i\) to compute a ciphertext, based on the encryption algorithm of Shi et al. [17], and a homomorphic tag on its sensitive data input. When the Aggregator collects the ciphertexts and the tags from all users, it computes the sum \(\mathsf{sum_{t}}\) of users’ data and a matching proof \({\mathsf {\sigma _t}}\), and forwards both to the Data Analyzer. At the final step of the protocol, the Data Analyzer uses the verification key \(\mathsf {VK}\) and the proof \({\mathsf {\sigma _t}}\) to verify the validity of the result \(\mathsf{sum_{t}}\).

Thanks to the homomorphic encryption algorithm of Shi et al. [17] and the way in which we construct our homomorphic tags, we show that our protocol ensures Aggregator Obliviousness. Moreover, we show that the Aggregator cannot forge bogus results. Finally, we note that the Data Analyzer \(\mathcal{DA}\) does not keep any state with respect to users’ transcripts, be they ciphertexts or tags; it only holds the public verification key, the sum \(\mathsf{sum_{t}}\) and the proof \({\mathsf {\sigma _t}}\).

4 PUDA Instantiation

Let \(\mathbb {G}_1,\mathbb {G}_2,\mathbb {G}_T\) be three cyclic groups of large prime order p, let \(g_1,g_2\) be generators of \(\mathbb {G}_1,\mathbb {G}_2\) respectively, and let \(e:\mathbb {G}_1 \times \mathbb {G}_2 \rightarrow \mathbb {G}_T\) be a map. We say that e is a bilinear map if the following properties are satisfied:

  1. Bilinearity: \(e(g_1^a,g_2^b)=e(g_1,g_2)^{ab}\) for all \(a,b \in \mathbb {Z}_p\).

  2. Computability: there exists an efficient algorithm that computes \(e(g_1^a,g_2^b)\) for all \(a,b \in \mathbb {Z}_p\).

  3. Non-degeneracy: \(e(g_1,g_2)\ne 1\).

To encrypt users’ data homomorphically, we employ the discrete logarithm based encryption scheme of Shi et al. [17]:

4.1 Shi-Chan-Rieffel-Chow-Song Scheme

  • \({\mathsf {Setup}}(1^{\kappa })\): Let \(\mathbb {G}_1\) be a group of large prime order p. A trusted key dealer \(\mathcal {KD}\) selects a hash function \(H:\{0, 1\}^* \rightarrow \mathbb {G}_1\). Furthermore, \(\mathcal {KD}\) selects, for each user \({\mathcal {U}}_i\), a secret encryption key \(\mathsf {ek}_i \in \mathbb {Z}_p\) uniformly at random. \(\mathcal {KD}\) distributes to each user \({\mathcal {U}}_i\) its secret key \(\mathsf {ek}_i\) and sends the corresponding decryption key \(\mathsf {SK}_A=-\sum _{i=1}^n{\mathsf {ek}_i}\) to the Aggregator.

  • \({\mathsf {Encrypt}}(\mathsf {ek}_i,x_{i,t})\): Each user \({\mathcal {U}}_i\) encrypts the value \(x_{i,t}\) using its secret encryption key \(\mathsf {ek}_i\) and outputs the ciphertext \(c_{i,t}=H(t)^{\mathsf {ek}_i}g_1^{x_{i,t}} \in \mathbb {G}_1\).

  • \({\mathsf {Aggregate}}(\{c_{i,t}\}_{i=1}^n, \{{\mathsf {\sigma }}_{i,t}\}_{i=1}^n,\mathsf {SK_A})\): Upon receiving all the ciphertexts \(\{c_{i,t}\}_{i=1}^n\), the Aggregator computes: \(V_t = (\prod _{i=1}^n{c_{i,t}})H(t)^{\mathsf {SK}_A}=H(t)^{\sum _{i=1}^n{\mathsf {ek}_i}}g_1^{\sum _{i=1}^n{x_{i,t}}} H(t)^{-\sum _{i=1}^n{\mathsf {ek}_i}}=g_1^{\sum _{i=1}^n{x_{i,t}}} \in \mathbb {G}_1\). Finally, \(\mathcal{A}\) learns the sum \(\mathsf{sum_{t}}=\sum _{i=1}^n{x_{i,t}} \in \mathbb {Z}_p \) by computing the discrete logarithm of \(V_t\) to the base \(g_1\). The sum computation is correct as long as \(\sum _{i=1}^n{x_{i,t}}<p\).

The above scheme is efficient as long as the plaintext values remain in a small range, so that the discrete-logarithm computation in the \({\mathsf {Aggregate}}\) algorithm stays fast.

4.2 PUDA Scheme

In what follows we describe our PUDA protocol:

  • \({\mathsf {Setup}}(1^{\kappa })\): \(\mathcal {KD}\) outputs the parameters (\(p,g_1,g_2,\mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_T\)) for an efficiently computable bilinear map \(e:\mathbb {G}_1 \times \mathbb {G}_2 \rightarrow \mathbb {G}_T\), where \(g_1\) and \(g_2\) are two random generators of the multiplicative groups \(\mathbb {G}_1\) and \(\mathbb {G}_2\) respectively, and p is a prime number that denotes the order of the groups \(\mathbb {G}_1, \mathbb {G}_2\) and \(\mathbb {G}_T\). Moreover, \(\mathcal {KD}\) selects the secret keys \(a, \{\mathsf {tk_i}\}_{i=1}^n\). \(\mathcal {KD}\) publishes the verification key \(\mathsf {VK}=(\mathsf {vk}_1, \mathsf {vk}_2)= (g_2^{\sum _{i=1}^n{\mathsf {tk}_i}}, g_2^a) \) and distributes to each user \({\mathcal {U}}_i \in \mathbb {U}\) the secret value \(g_1^a \in \mathbb {G}_1\), the encryption key \(\mathsf{ek}_i\) and the tag key \(\mathsf {tk_i}\) through a secure channel. Thus the secret keys of the scheme are \(\mathsf {SK}_i=(\mathsf {ek}_i, \mathsf {tk}_i,g_1^a)\). After publishing the public parameters \(\mathcal {P}=(H,p,g_1,g_2,\mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_T)\) and the verification key \(\mathsf {VK}\), \(\mathcal {KD}\) goes off-line and does not further participate in any protocol phase.

  • \({\mathsf {EncTag}}(t, \mathsf {SK}_i=(\mathsf {ek}_i, \mathsf {tk}_i,g_1^a), x_{i,t})\): At each time interval t each user \({\mathcal {U}}_i\) encrypts the data value \(x_{i,t}\) with its secret encryption key \(\mathsf {ek}_i\), using the encryption algorithm, described in Sect. 4.1, which results in a ciphertext

    $$c_{i,t}=H(t)^{\mathsf {ek}_i}g_1^{x_{i,t}} \in \mathbb {G}_1$$

    \({\mathcal {U}}_i\) also constructs a tag \({\mathsf {\sigma }}_{i,t}\) with its secret tag key \((\mathsf {tk}_i,g_1^a)\):

    $${\mathsf {\sigma }}_{i,t}=H(t)^{\mathsf {tk}_i}(g_1^{a})^{x_{i,t}} \in \mathbb {G}_1$$

    Finally \({\mathcal {U}}_i\) sends \((c_{i,t}, {\mathsf {\sigma }}_{i,t})\) to \(\mathcal{A}\).

  • \({\mathsf {Aggregate}}(\mathsf {SK_A},\{c_{i,t}\}_{\mathcal {U}_i \in \mathbb {U}}, \{{\mathsf {\sigma }}_{i,t}\}_{\mathcal {U}_i \in \mathbb {U}})\): Aggregator \(\mathcal{A}\) computes the sum \(\mathsf{sum_{t}}=\sum _{i=1}^n{x_{i,t}}\) by using the \({\mathsf {Aggregate}}\) algorithm presented in Sect. 4.1. Moreover, \(\mathcal{A}\) aggregates the corresponding tags as follows:

    $$\begin{aligned} {\mathsf {\sigma _t}}=\prod _{i=1}^n{{\mathsf {\sigma }}_{i,t}}=\prod _{i=1}^n{H(t)^{\mathsf {tk}_i}(g_1^{a})^{x_{i,t}}}=H(t)^{\sum {\mathsf {tk}_i}}(g_1^{a})^{\sum {x_{i,t}}} \end{aligned}$$

    \(\mathcal{A}\) finally forwards \(\mathsf{sum_{t}}\) and \({\mathsf {\sigma _t}}\) to Data Analyzer \(\mathcal{DA}\).

  • \({\mathsf {Verify}}(\mathsf {VK},t, \mathsf{sum_{t}},{\mathsf {\sigma _t}})\): During the verification phase \(\mathcal{DA}\) verifies the correctness of the computation with the verification key \(\mathsf {VK}=(\mathsf {vk}_1, \mathsf {vk}_2) =(g_2^{\sum {\mathsf {tk}_i}},g_2^a)\), by checking the following equality:

    $$e({\mathsf {\sigma _t}},g_2)\mathop {=}\limits ^{?}e(H(t),\mathsf {vk}_1) e(g_1^{\mathsf{sum_{t}}},\mathsf {vk}_2)$$

Verification correctness follows from bilinear pairing properties:

$$\begin{aligned} e({\mathsf {\sigma _t}},g_2)&=e(\prod _{i=1}^n{{\mathsf {\sigma }}_{i,t}},g_2)=e(\prod _{i=1}^n{H(t)^{\mathsf {tk}_i} g_1^{a{x_{i,t}}}},g_2) \\ {}&= e(H(t)^{\sum _{i=1}^n{\mathsf {tk}_i}} g_1^{a{\sum _{i=1}^n{x_{i,t}}}},g_2) \\ {}&= e(H(t)^{\sum _{i=1}^n{\mathsf {tk}_i}},g_2)e( g_1^{a{\sum _{i=1}^n{x_{i,t}}}},g_2) \\ {}&= e(H(t),g_2^{\sum _{i=1}^n{\mathsf {tk}_i}})e( g_1^{{\sum _{i=1}^n{x_{i,t}}}},g_2^a) \\ {}&= e(H(t),g_2^{\sum _{i=1}^n{\mathsf {tk}_i}})e( g_1^{\mathsf{sum_{t}}},g_2^a) \\ {}&= e(H(t),\mathsf {vk}_1) e(g_1^{\mathsf{sum_{t}}},\mathsf {vk}_2) \end{aligned}$$
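The derivation above can be replayed at the level of exponents. The Python sketch below is a simulation only: it represents every group element by its discrete logarithm, which no real party could compute, so that the bilinear map \(e(g_1^u, g_2^v) = e(g_1,g_2)^{uv}\) becomes plain multiplication modulo a prime. It checks the algebra of the verification equation, not its security; a real implementation would use an actual pairing library (such as PBC/Charm, as in Sect. 5.3). The modulus and the seeded key generation are toy choices.

```python
import hashlib
import random

# Exponent-level simulation of the PUDA tag and verification algebra.
# Group elements are tracked by their discrete logs mod q, so pairings
# reduce to multiplication of exponents.
q = 2**127 - 1                     # toy prime standing in for the group order

def H(t):
    """Exponent of H(t), i.e. H(t) = g1^H(t) in this simulation."""
    return int.from_bytes(hashlib.sha256(str(t).encode()).digest(), "big") % q

def keygen(n, seed=1):
    """Key dealer: secret a, tag keys tk_i, VK = (g2^{sum tk_i}, g2^a)."""
    rng = random.Random(seed)      # toy deterministic randomness
    a = rng.randrange(1, q)
    tks = [rng.randrange(1, q) for _ in range(n)]
    vk = (sum(tks) % q, a)         # exponents of vk_1 and vk_2
    return a, tks, vk

def tag(tk, a, t, x):
    """sigma_{i,t} = H(t)^{tk_i} * (g1^a)^{x_{i,t}}, as an exponent."""
    return (H(t) * tk + a * x) % q

def aggregate_tags(tags):
    """sigma_t = prod_i sigma_{i,t}  <->  sum of exponents mod q."""
    return sum(tags) % q

def verify(vk, t, sum_t, sigma_t):
    """e(sigma_t, g2) =? e(H(t), vk_1) * e(g1^{sum_t}, vk_2),
    checked on the exponents of the target group."""
    vk1, vk2 = vk
    return sigma_t == (H(t) * vk1 + sum_t * vk2) % q
```

With three users contributing 5, 7 and 11, the aggregated tag verifies for the sum 23 and fails for any other claimed sum, mirroring the Type II Forgery game.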

5 Analysis

5.1 Aggregator Obliviousness

Theorem 1

The proposed solution achieves Aggregator Obliviousness in the random oracle model under the decisional Diffie-Hellman (DDH) assumption in \(\mathbb {G}_1\).

Due to space limitations the proof of Theorem 1 can be found in the full version [14].

5.2 Aggregate Unforgeability

We first introduce a new assumption that is used in the security analysis of our PUDA instantiation. Our new assumption, named hereafter \(\mathsf {LEOM}\), is a variant of the \(\mathsf {LRSW}\) assumption [16], which is proven secure in the generic group model [18] and used in the construction of CL signatures [5].

The oracle \(\mathcal {O}_{\mathsf {LEOM}}\) first chooses a and \(k_i\), \(1 \le i \le n\) in \(\mathbb {Z}^*_p\). Then it publishes the tuple \((g_1, g_2^{\sum _{i=1}^n{k_i}},g_2^{a})\). Thereafter, the adversary picks \(h_t \in \mathbb {G}_1\) and makes queries \((h_t, i,x_{i,t})\) for \(1 \le i \le n\) to the \(\mathcal {O}_{\mathsf {LEOM}}\) oracle which in turn replies with \(h_t^{k_i}g_1^{a x_{i,t}}\) for \(1 \le i \le n\).

The adversary is allowed to query the oracle \(\mathcal {O}_{\mathsf {LEOM}}\) for different \(h_t\) with the restriction that it cannot issue two queries for the same pair \((h_t, i)\).

We say that the adversary breaks the \(\mathsf {LEOM}\) assumption if it outputs a tuple \((z,h_t,h_t^{\sum _{i=1}^n{k_i}}g_1^{a z})\) for a previously queried \(h_t\) with \(z \ne \sum _{i=1}^n{x_{i,t}}\).

Theorem 2

(\(\mathsf {LEOM}\) Assumption) Given the security parameter \(\kappa \), the public parameters \((p, e,\mathbb {G}_1,\mathbb {G}_2,g_1,g_2)\), the public key \((g_2^{a}, g_2^{\sum _{i=1}^n{k_i}})\) and the oracle \(\mathcal {O}_{\mathsf {LEOM}}\), we say that the \(\mathsf {LEOM}\) assumption holds iff:

For all probabilistic polynomial time adversaries \(\mathcal{A}\), the following holds:

$$ \Pr [ (z,h_t,\sigma _t)\leftarrow \mathcal{A}^{\mathcal {O}_{\mathsf {LEOM}}(.)} : z\ne \sum _{i=1}^n{x_{i,t}} \wedge \sigma _t=h_t^{\sum _{i=1}^n{k_i}}g_1^{az}]\le \epsilon _2 (\kappa ) $$

where \(\epsilon _2\) is a negligible function.

We show in our analysis that a Type I Forgery implies breaking the \(\mathsf {BCDH}\) assumption and that a Type II Forgery implies breaking the \(\mathsf {LEOM}\) assumption.

Theorem 3

Our scheme achieves aggregate unforgeability against a Type I Forgery under the \(\mathsf {BCDH}\) assumption in the random oracle model.

Theorem 4

Our scheme guarantees aggregate unforgeability against a Type II Forgery under the \(\mathsf {LEOM}\) assumption in the random oracle model.

Due to space limitations, the security analysis of the \(\mathsf {LEOM}\) assumption and the proofs of Theorems 3 and 4 are deferred to Appendices A and B.

5.3 Performance Evaluation

In this section we analyze the extra overhead of ensuring the aggregate unforgeability property in our PUDA instantiation. First, we give a theoretical evaluation with respect to the mathematical operations each participant of the protocol, be it user, Aggregator or Data Analyzer, has to perform to ensure public verifiability: the computation of the tag by each user, of the proof by the Aggregator, and of the proof verification by the Data Analyzer. We then present an experimental evaluation that shows the practicality of our scheme.

To allow the Data Analyzer to verify the correctness of computations performed by an untrusted Aggregator, the key dealer distributes to each user \(g_1^a \in \mathbb {G}_1\) and \(\mathsf {tk}_i \in \mathbb {Z}_p\), and publishes \(g_2^a,g_2^{\sum _{i=1}^{n}{\mathsf {tk}_i}} \in \mathbb {G}_2\), which calls for one exponentiation in \(\mathbb {G}_1\) and \(1+n\) exponentiations in \(\mathbb {G}_2\). At each time interval t, each user computes \({\mathsf {\sigma }}_{i,t}=H(t)^{\mathsf {tk}_i}(g_1^{a})^{x_{i,t}} \in \mathbb {G}_1\), which entails two exponentiations and one multiplication in \(\mathbb {G}_1\). To compute the proof \({\mathsf {\sigma _t}}\), the Aggregator carries out \(n-1\) multiplications in \(\mathbb {G}_1\). Finally, the Data Analyzer verifies the validity of the aggregate sum by checking the equality \(e({\mathsf {\sigma _t}},g_2)\mathop {=}\limits ^{?}e(H(t),\mathsf {vk}_1) e(g_1^{\mathsf{sum_{t}}},\mathsf {vk}_2)\), which asks for three pairing evaluations, one hash into \(\mathbb {G}_1\), one exponentiation in \(\mathbb {G}_1\) and one multiplication in \(\mathbb {G}_T\) (see Table 1). The efficiency of \(\mathbf{{ PUDA }}\) stems from the verification time being constant in the number of users. This is of crucial importance since the Data Analyzer may not be computationally powerful.

Table 1. Performance of tag computation, proof construction and verification operations. l denotes the bit-size of the prime number p.

We implemented the verification functionalities of \(\mathbf{{ PUDA }}\) with the \(\texttt {Charm}\) cryptographic framework [1, 2]. For pairing computations, it relies on the \(\texttt {PBC}\) [15] library, which is written in \(\texttt {C}\). All of our benchmarks are executed on an Intel\(^\circledR \) Core\(^TM\) i5 CPU M 560 @ 2.67GHz \(\times \) 4 with 8GB of memory, running Ubuntu 12.04 32bit. \(\texttt {Charm}\) provides three types of asymmetric pairings, \(\mathtt {MNT159,MNT201,MNT224}\), and we run our benchmarks with each of them. The timings for all the underlying mathematical group operations are summarized in Table 3. There is a vast difference in the computation time of operations between \(\mathbb {G}_1\) and \(\mathbb {G}_2\) for all the different curves. The reason is that the bit-length of elements in \(\mathbb {G}_2\) is much larger than that of elements in \(\mathbb {G}_1\).

As shown in Table 2, the computation of the tags \({\mathsf {\sigma }}_{i,t}\) incurs a computational overhead on the order of milliseconds, increasing gradually with the bit size of the underlying elliptic curve. The Data Analyzer performs pairing evaluations and computations in the target group, whose cost is independent of the number of users.
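The asymmetry between aggregation and verification cost can be made explicit by expressing the operation counts of Table 1 as functions of the number of users n. The labels below (G1_exp, pairing, etc.) are our own shorthand for "exponentiation in \(\mathbb {G}_1\)", "pairing evaluation", and so on; this is an illustrative sketch, not part of the protocol.

```python
# Per-party operation counts for PUDA, as functions of the number of users n.

def user_cost():
    # sigma_{i,t} = H(t)^{tk_i} * (g1^a)^{x_{i,t}}
    return {"G1_exp": 2, "G1_mul": 1}

def aggregator_cost(n):
    # sigma_t = product of the n user tags in G1
    return {"G1_mul": n - 1}

def verifier_cost(n):
    # e(sigma_t, g2) ?= e(H(t), vk1) * e(g1^{sum_t}, vk2)
    # (the argument n is unused: verification is constant in n)
    return {"pairing": 3, "G1_hash": 1, "G1_exp": 1, "GT_mul": 1}

# Aggregation cost grows linearly with n; verification cost does not.
assert aggregator_cost(1000)["G1_mul"] == 999
assert verifier_cost(10) == verifier_cost(10**6)
```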

Table 2. Computational cost of \(\mathbf{{ PUDA }}\) operations with respect to different pairings.
Table 3. Average computation overhead of the underlying mathematical group operations for different type of curves.

6 Related Work

In [12] the authors presented a solution for verifiable aggregation in the case of untrustworthy users. The solution entails signatures on commitments of the secret values together with non-interactive zero-knowledge proofs, which are verified by the Aggregator. Hung-Min et al. [19] employed aggregate signatures in order to verify the integrity of the data, without addressing confidentiality issues for a malicious Aggregator. In [6], the authors proposed a solution based on homomorphic message authenticators in order to verify the computation of generic functions on outsourced data. Each data input is authenticated with an authentication tag, and the cloud computes a composition of the tags; thanks to the homomorphic properties of the tags, the user can verify the correctness of the output of a program P. The main drawback of this solution is that, in order to verify the correctness of the computation, the user has to perform computations that take exactly as long as evaluating the function f itself. Backes et al. [3] proposed a generic solution for the efficient verification of bounded-degree polynomials in time less than the evaluation of f. The solution is based on closed-form efficient pseudorandom functions (PRFs). Contrary to our solution, neither of these schemes provides individual privacy, nor are they designed for a multi-user scenario.

Catalano et al. [8] employed an elegant technique to allow single users to verify computations on encrypted data. The idea is to re-randomize the ciphertext and sign it with a homomorphic signature. Computations are then performed on both the randomized ciphertext and the original one. However, the untrusted Aggregator is not allowed to learn the aggregate value in cleartext, since these protocols are geared towards cloud-based scenarios.

In the multi-user setting, Choi et al. [9] proposed a protocol in which multiple users outsource their inputs to an untrusted server along with the definition of a functionality f. The server computes the result in a privacy-preserving manner without learning it, and the computation is verified by a user that has contributed to the function input. The users operate in a non-interactive model, whereby they cannot communicate with each other. The underlying machinery entails a novel proxy-based oblivious transfer protocol which, combined with a fully homomorphic encryption scheme and garbled circuits, allows for verifiability and privacy. However, the need for fully homomorphic encryption and garbled circuits renders the solution impractical for real-world scenarios.

7 Concluding Remarks

In this paper, we designed and analyzed a protocol for privacy-preserving and unforgeable data aggregation. The purpose of the protocol is to allow a Data Analyzer to verify the correctness of the computation performed by a malicious Aggregator, without revealing the underlying data to either the Aggregator or the Data Analyzer. In addition to being provably secure and privacy preserving, the proposed protocol enables public verifiability in constant time.