1 Introduction

It is well known that, in general, encrypted data cannot be compressed. In this work, we study the task of compressing encrypted data when a small amount of knowledge about the structure of the underlying plaintexts is available. More concretely, we consider encryptions of vectors \(\boldsymbol{m} = (m_1, \dots , m_n)\) in which at most t coordinates are non-zero. The task of compressing such vectors appears naturally in outsourced storage applications.

In searchable encryption [3, 26], we have a client Charlie, who holds a vector \((m_1, \dots , m_n)\) of data elements and wants to store it remotely on server Sally. To hide the contents of the data elements, Charlie encrypts the data vector under a secret key only she knows before sending it to Sally. Later on, Charlie may want to search through the vector and retrieve all elements that match some secret keyword. A series of recent works [1, 6,7,8, 18, 27] have shown how to construct searchable encryption schemes from fully homomorphic encryption [12, 22] with reasonable concrete efficiency. Conceptually, these approaches are comprised of two major steps. First Charlie sends a short keyword-dependent hint to the server, who uses it to obliviously transform the vector of ciphertexts \((c_1, \dots , c_n)\) into a new vector \(\boldsymbol{\tilde{c}} = (\tilde{c}_1, \dots , \tilde{c}_n)\), where for \(i \in \{1, \dots , n\}\) the ciphertext \(\tilde{c}_i\) is either an encryption of the original message \(m_i\) in \(c_i\) or an encryption of zero, depending on whether \(m_i\) matched Charlie’s keyword or not. In the second step, the server obliviously compresses the vector \(\boldsymbol{\tilde{c}}\) under the assumption that no more than t ciphertexts matched the keyword and sends it back to Charlie. If the assumption about the sparsity of the vector \(\boldsymbol{\tilde{c}}\) was correct, then Charlie successfully decodes the vector and obtains the desired result. If it was not, then Charlie may not be able to retrieve the output.

The best compression algorithm used in this context is due to Choi et al. [8], which compresses \(\boldsymbol{\tilde{c}}\) into a bit string of length \(\varOmega (t\xi (\kappa + \log n))\), where \(\xi \) is the length of one ciphertext entry. Under the assumption that the plaintext messages are of a specific form, the authors show that Charlie can correctly decode the vector with probability \(1-2^{-\kappa }\). Both compressing and decoding are concretely efficient.

In the oblivious message retrieval setting, recently introduced by Liu and Tromer [19], we have a server Sally, who keeps a public bulletin board, and multiple clients Charlie, Chucky, and Chris. Each of the clients can post encrypted messages for any of the other clients on the bulletin board, but would like to hide who is the recipient of which message. At some point, for example, Chucky may want to retrieve all messages that are intended for him. Naively, he could simply download all contents from Sally’s bulletin board, but this would incur a large bandwidth overhead that is linear in the total number of messages stored by Sally. Instead, the idea behind oblivious message retrieval is to let Chucky generate a short identity-dependent hint that can be used by Sally to obliviously generate a short message that contains all relevant encrypted messages for Chucky. Conceptually, the construction of Liu and Tromer follows the exact same blueprint as the searchable encryption scheme outlined above. First Sally obliviously filters her vector with the hint provided by Chucky and then she obliviously compresses the filtered vector under the assumption that not too many messages are addressed to Chucky.

From an efficiency perspective, the solution of Liu and Tromer is rather expensive. To compress, Sally performs \(\varOmega (tn + \kappa t \log t)\) homomorphic additions and \(\varOmega (tn)\) homomorphic multiplications by constants, where \(\kappa \) is the correctness error defined as in the searchable encryption example. Sally’s message to Chucky is \(\varOmega (\xi t + \xi \kappa t\log t)\) bits long. To decode the result from Sally’s message, Chucky needs to perform Gaussian elimination on a matrix of size \(\mathcal {O}({t}) \times \mathcal {O}({t})\), which incurs a computational overhead of \(\varOmega (t^3)\). The authors provide heuristic optimizations of their constructions that improve their performance significantly, but unfortunately these come without asymptotic bounds or provable correctness guarantees.

Taking a step back and looking at the two applications described above from a more abstract point of view, one can recognize that both follow a very similar blueprint. In the first step, the two works apply vastly different techniques to convert a vector of ciphertexts into a sparse vector containing only the desired entries. In the second step, however, both works solve the identical problem: they need to compress a sparse homomorphically encrypted vector with nothing but the knowledge of how many entries are non-zero, and in particular without any knowledge about which entries are zero and which ones are not. How to compress such a sparse encrypted vector is the topic of this work.

1.1 Our Contribution

We present two new algorithms, one based on polynomials and one based on algorithmic hashing, for compressing sparse encrypted vectors, which both improve upon the prior state of the art in terms of compression rate. Our algorithms only rely on homomorphic additions and homomorphic multiplications by constants. Both of our constructions have provable worst-case bounds for all their parameters.

Compressing via Polynomials. Our first construction (Sect. 4) is perfectly correct and is based on the concept of sparse polynomial interpolation. Its compression rate has an asymptotically optimal dependence on t, as the compressed vector is merely \(\mathcal {O}({\xi \cdot t})\) bits large. During compression one needs to perform \(\mathcal {O}({n \log n})\) homomorphic additions and equally many homomorphic multiplications by constants. The main bottleneck of this solution is the decompression complexity of \(\mathcal {\tilde{O}}(t \cdot \sqrt{n})\), which depends on the length n of the original vector. Although the compression rate is much better than that of previous works, such as those of Choi et al. [8] and of Liu and Tromer [19], this construction suffers from a slower decompression time.

Compressing via Hashing. Our second construction (Sect. 5) is a randomized hashing-based solution, which is correct with probability \(1-2^{-\kappa }\), where the probability is taken over the random coins of the compression algorithm. We develop a novel data structure that is heavily inspired by the invertible Bloom lookup tables of Goodrich and Mitzenmacher [14], but can be applied efficiently to encrypted data. Both compression and decompression are highly efficient. During compression one needs to perform \(\mathcal {O}({n\kappa /\log t})\) homomorphic additions and multiplications by constants, where \(\kappa \ge \log t\). During decompression the main costs come from \(\mathcal {O}({\kappa t /\log t})\) many decryptions and equally many evaluations of a pseudorandom permutation. In contrast to our polynomial based solution, however, the compressed vector is \(\mathcal {O}({\xi \kappa t /\log t})\) bits large. Nevertheless, this construction outperforms all prior works in terms of compression rate, while having either superior or comparable compression and decompression complexities.

1.2 Strawman Approach

When the sparse vector is encrypted using a fully homomorphic encryption scheme, conceptually simple solutions to the compression problem exist. For instance, Sally could just homomorphically sort all entries in the vector and then only send back the t largest entries. Such solutions, however, require her to perform multiplications of encrypted values. This is problematic for multiple reasons. Multiplications of encrypted values are much more computationally expensive than homomorphic additions or multiplications by constants. Since Sally may potentially store a very large database, we would like to minimize her computational overhead. Furthermore, if the data is encrypted using a somewhat homomorphic encryption scheme, then the multiplicative depth of the circuit that can be executed on the vector by Sally is bounded. Ideally, we would like the compression step to be concretely efficient and not require the use of any multiplications of encrypted values. For these reasons, we only focus on compression algorithms that require homomorphic additions and multiplications by constants in this work.

1.3 Additional Related Works

In addition to what has already been discussed above, there are several other works that are related to ours. Johnson, Wagner, and Ramchandran [16] showed that, assuming messages from a source with bounded entropy, it is possible to compress one-time pad encryptions without knowledge of the encryption key through a clever application of Slepian-Wolf coding [24]. Their result only applied to linear stream ciphers but was later extended to block ciphers using certain chaining modes by Klinc et al. [17]. These results do not apply to our setting, where we focus on compressing vectors encrypted using more complex homomorphic public-key encryption schemes.

In the context of fully homomorphic encryption, multiple works [4, 20, 25] have studied the question of how to optimize the encryption rate, i.e., the size of the ciphertext relative to the size of the plaintext, by packing multiple plaintexts into one ciphertext. These results are related, but do not allow for obliviously “removing” irrelevant encryptions of zero.

Another line of works [5, 11, 13] studies the compressed sensing problem, where the task is to design a matrix A such that it is possible to recover, possibly high-dimensional, but sparse vectors x from a vector of measurements Ax of small dimension. In general, these works aim to recover an approximation of x even when given somewhat noisy measurements. In our case, we are interested in the simpler problem of exact recovery of a sparse vector. Our construction based on polynomials can be seen as a matrix-vector multiplication. Looking ahead, our matrix A will be a carefully chosen Vandermonde matrix that allows for very efficient matrix-vector multiplication. The server will multiply this public matrix with the encrypted vector and send back the result that can be decoded by the client. Our second construction, based on hashing, does not fall into this category of algorithms.

2 Preliminaries

Notation. Given a possibly randomized function \(f: X \rightarrow Y\), we will sometimes abuse notation and write \(f(\boldsymbol{x}) := (f(x_1), \dots , f(x_n))\) for \(\boldsymbol{x} \in X^n\). For a set X, we write \(x \leftarrow X\) to denote the process of sampling a uniformly random element \(x \in X\). For a vector \(\boldsymbol{v} \in X^n\), we write \(v_i\) to denote its i-th component. For a matrix \(M \in X^{n \times m}\), we write M[i, j] to denote the cell in the i-th row and j-th column. We write [n] to denote the set \(\{1, \dots , n\}\). For a set \(X^n\), we define the scissor operator to denote the subset of \(X^n\) consisting only of those vectors with unique entries.

Definition 1 (Sparse Vector Representation)

Let \(\mathbb {F}_q\) be a field and let \(\boldsymbol{a}\in \mathbb {F}_q^n\) be a vector. The sparse representation of \(\boldsymbol{a}\) is the set \(\textsf{sparse}(\boldsymbol{a}) := \{(i,a_i)\mid a_i\ne 0\}\).
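As a minimal sketch, the sparse representation of Definition 1 can be written as follows (1-based indices as in the paper; the field arithmetic plays no role here, and the function name is our own):

```python
# A minimal sketch of Definition 1: the sparse representation of a vector,
# i.e. the set {(i, a_i) | a_i != 0} with 1-based indices.

def sparse(a):
    """Return the sparse representation of the vector a."""
    return {(i, v) for i, v in enumerate(a, start=1) if v != 0}
```

For example, `sparse([0, 3, 0, 5])` yields `{(2, 3), (4, 5)}`.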

2.1 Homomorphic Encryption

Informally, a homomorphic encryption scheme allows for computing an encryption of \(f(\boldsymbol{m})\), when only given the description of f and an encryption of message vector \(\boldsymbol{m}\). Throughout the paper, we assume that functions are represented as circuits composed of addition and multiplication gates.

Definition 2

A homomorphic encryption scheme \(\mathcal {E}\) is defined by a tuple of PPT algorithms \((\textsf{Gen}, \textsf{Enc}, \textsf{Eval},\textsf{Dec})\) that work as follows:

  • \(\textsf{Gen}(1^\lambda )\): The key generation algorithm takes the security parameter \(1^\lambda \) as input and returns a secret key \(\textsf{sk}\) and public key \(\textsf{pk}\). The public key implicitly defines a message space \(\mathcal {M}\) and ciphertext space \(\mathcal {C}\). We denote the set of all public keys as \(\mathcal {P}\).

  • \(\textsf{Enc}(\textsf{pk}, m)\): The encryption algorithm takes the public key \(\textsf{pk}\) and message \(m \in \mathcal {M}\) as input and returns a ciphertext \(c \in \mathcal {C}\).

  • \(\textsf{Eval}(\textsf{pk}, f, \boldsymbol{c})\): The evaluation algorithm takes the public key \(\textsf{pk}\), a function \(f : \mathcal {M}^n \rightarrow \mathcal {M}^m\), and a vector \(\boldsymbol{c} \in \mathcal {C}^n\) of ciphertexts as input and returns a new vector of ciphertexts \(\tilde{\boldsymbol{c}} \in \mathcal {C}^m\).

  • \(\textsf{Dec}(\textsf{sk}, c)\): The deterministic decryption algorithm takes the secret key \(\textsf{sk}\) and ciphertext \(c \in \mathcal {C}\) as input and returns a message \(m \in \mathcal {M}\cup \{\bot \}\).

Throughout the paper we assume that the ciphertext size is fixed and does not increase through the use of the homomorphic evaluation algorithm. We extend the definition of \(\textsf{Enc}\) and \(\textsf{Dec}\) to vectors and matrices of messages and ciphertexts respectively, by applying them componentwise, i.e., for any matrix \(\boldsymbol{M}\in \mathcal {M}^{n\times m}\), we have \(\textsf{Enc}(\textsf{pk},\boldsymbol{M}) = \boldsymbol{C}\) with \(\boldsymbol{C} \in \mathcal {C}^{n\times m}\) and \(C[i,j] = \textsf{Enc}(\textsf{pk},M[i,j])\) and equivalently \(\textsf{Dec}(\textsf{sk},\boldsymbol{C}) = \boldsymbol{M}'\) with \(\boldsymbol{M}' \in \mathcal {M}^{n\times m}\) and \(M'[i,j] = \textsf{Dec}(\textsf{sk},C[i,j])\). Let \(\mathcal {E}\) be an additively homomorphic encryption scheme with message space \(\mathcal {M}=\mathbb {F}_q\) for some prime power q, let \(f : \mathbb {F}_q^2 \rightarrow \mathbb {F}_q\), \(f(a,b) := a+b\), and let \(g_\alpha : \mathbb {F}_q \rightarrow \mathbb {F}_q\), \(g_\alpha (a) := \alpha \cdot a\) for any constant \(\alpha \in \mathbb {F}_q\). For notational convenience we write \(\textsf{Eval}(\textsf{pk},f,(c_1,c_2)^\intercal )\) as \(c_1 \boxplus c_2\) and \(\textsf{Eval}(\textsf{pk},g_\alpha ,c)\) as \(\alpha \boxdot c\) with \(\textsf{pk}\) being inferable from context. For the sake of simplicity we restrict ourselves to homomorphic encryption schemes with unique secret keys, i.e., for a given \(\textsf{pk}\), there exists at most one \(\textsf{sk}\), such that \((\textsf{sk},\textsf{pk})\leftarrow \textsf{Gen}(1^\lambda )\). We write \(\textsf{Gen}^{-1}(\textsf{pk})\) to denote the – not efficiently computable – unique secret key.
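To illustrate the \(\boxplus \) and \(\boxdot \) interface, the following functional mock treats a “ciphertext” as a wrapper around its plaintext. It is not an encryption scheme (all names and the toy modulus are our own choices); real instantiations such as exponential ElGamal or lattice-based schemes expose the same two operations.

```python
# A functional (NOT secure) mock of an additively homomorphic scheme over
# a toy field F_Q, purely to illustrate the Enc/Dec and ⊞/⊡ interface.

Q = 97  # toy field size (an assumption of this sketch)

class Ct:
    """A 'ciphertext' that simply wraps its plaintext modulo Q."""
    def __init__(self, m):
        self.m = m % Q

def enc(pk, m):    # Enc(pk, m); pk is ignored by the mock
    return Ct(m)

def dec(sk, c):    # Dec(sk, c); sk is ignored by the mock
    return c.m

def hadd(c1, c2):  # c1 ⊞ c2: encrypts the sum of the two plaintexts
    return Ct(c1.m + c2.m)

def cmul(alpha, c):  # α ⊡ c: encrypts α times the plaintext
    return Ct(alpha * c.m)
```

For instance, decrypting `hadd(enc(pk, 40), cmul(3, enc(pk, 20)))` yields \((40 + 3\cdot 20) \bmod 97\).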

Later on in the paper, it will be convenient for us to talk about ciphertexts that may not be fresh encryptions, but still allow for some homomorphic operations to be performed on them.

Definition 3

(\(\mathcal {Z}\) -Validity). Let \((\textsf{Gen}, \textsf{Enc}, \textsf{Eval}, \textsf{Dec})\) be a homomorphic encryption scheme, let \(\mathcal {Z}\) be a class of circuits, and let \(\textsf{pk}\) be a public key. A vector \(\boldsymbol{c}\) of ciphertexts is \(\mathcal {Z}\)-valid for \(\textsf{pk}\), iff for all functions \(f\in \mathcal {Z}\) it holds that \(\textsf{Dec}(\textsf{Gen}^{-1}(\textsf{pk}),\textsf{Eval}(\textsf{pk},f,\boldsymbol{c})) = f(\textsf{Dec}(\textsf{Gen}^{-1}(\textsf{pk}),\boldsymbol{c}))\). We denote by \(\textsf{vld}(\mathcal {Z},\textsf{pk})\) the set of ciphertext vectors \(\mathcal {Z}\)-valid for \(\textsf{pk}\).

2.2 Polynomial Kung Fu

Let \(f(x) = \sum _{i=0}^{d} a_i \cdot x^i \in \mathbb {F}_q[x]\) be a polynomial with coefficients from a finite field \(\mathbb {F}_q\). The degree of f is defined as the largest exponent in any monomial with a non-zero coefficient. We say that f is s-sparse, if the number of non-zero monomials is at most s or, more formally, if \(|\{i \in \{0, \dots , d\} \mid a_i \ne 0\}| \le s\). It is well-known that any polynomial of degree at most d can be interpolated from \(d+1\) evaluation points. In this work, we will make use of the less well-known fact that sparse polynomials can be interpolated from a number of evaluation points that is linear in the polynomial’s sparsity. The first algorithms for interpolating sparse univariate and multivariate polynomials were presented by Prony [21] and Ben-Or and Tiwari [2] respectively. We will make use of the following result by Huang and Gao [15] for sparse interpolation of univariate polynomials over finite fields:

Theorem 4

([15]). Let \(f \in \mathbb {F}_q[x]\) be an s-sparse univariate polynomial of degree at most d with coefficients from a finite field \(\mathbb {F}_q\). Let \(\omega \in \mathbb {F}_q\) be a primitive \(2(s+1)\)-th root of unity. There exists an algorithm \(\textsf{Interpolate}\) that takes evaluations \(f(\omega ^0), \dots , f(\omega ^{2s+1})\) as input and returns the coefficients of f in sparse representation in time \(\mathcal {\tilde{O}}(s \cdot \sqrt{d})\).

The algorithm of Huang and Gao relies on a subroutine for finding discrete logarithms. Using Shanks’ algorithm [23] for this step, we obtain the computational complexity stated in the above theorem with deterministic performance guarantees.
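To make the interpolation step concrete, here is a toy Prony-style sparse interpolation over a prime field. It is only a sketch in the spirit of, and much simpler and slower than, the Huang–Gao algorithm of Theorem 4: it assumes \(\omega \) has multiplicative order greater than d, that the polynomial has exactly s non-zero terms (so the Hankel system is invertible), and it brute-forces the discrete logarithms instead of using Shanks’ algorithm. All function names are our own.

```python
# Toy Prony-style sparse interpolation over F_q (q prime).

def _solve(A, b, q):
    """Solve A x = b over F_q by Gaussian elimination (A assumed invertible)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % q != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], q - 2, q)  # modular inverse via Fermat
        M[col] = [x * inv % q for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % q for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def sparse_interpolate(vals, s, d, omega, q):
    """Recover {(e, a_e)} of an s-sparse f of degree <= d from f(omega^j)."""
    # 1. Prony: find the linear recurrence v_{j+s} = -sum_i c_i v_{j+i}.
    H = [[vals[j + i] for i in range(s)] for j in range(s)]
    b = [(-vals[j + s]) % q for j in range(s)]
    c = _solve(H, b, q)
    # 2. Roots of Lambda(x) = x^s + sum_i c_i x^i are omega^e for the
    #    exponents e of the non-zero terms; brute-force the "discrete logs".
    def lam(x):
        return (pow(x, s, q) + sum(ci * pow(x, i, q) for i, ci in enumerate(c))) % q
    exps = [e for e in range(d + 1) if lam(pow(omega, e, q)) == 0]
    # 3. Solve a transposed Vandermonde system for the coefficients.
    V = [[pow(omega, e * j, q) for e in exps] for j in range(s)]
    a = _solve(V, vals[:s], q)
    return {(e, ai) for e, ai in zip(exps, a) if ai}
```

For example, with \(q=101\), \(\omega =2\) and \(f(x) = 5x^3 + 7x^8\), feeding the evaluations \(f(\omega ^0), \dots , f(\omega ^5)\) into `sparse_interpolate` with \(s=2\) recovers \(\{(3,5),(8,7)\}\).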

Another tool we will use is the Fast Fourier Transform, (re-)discovered by Cooley and Tukey [9], which allows for evaluating a degree-d polynomial given as a list of coefficients at \(\ell \le d\) evaluation points simultaneously in time \(\mathcal {O}({d\log d})\). More precisely, for a fixed set of evaluation points \((\omega ^0, \dots , \omega ^{\ell })\), one can represent the circuit taking the polynomial coefficients \((a_0, \dots , a_d)\) as input and returning \((f(\omega ^0), \dots , f(\omega ^\ell ))\) as a series of \(\mathcal {O}({\log d})\) alternating layers of \(\mathcal {O}({d})\) addition gates and multiplication-by-constant gates, respectively.

Theorem 5

[Fast Fourier Transform]. Let \(d, \ell \in \mathbb {N}\) with \(d \ge \ell \). Let \(f = \sum _{i=0}^{d} a_i \cdot x^i \in \mathbb {F}_q[x]\) be a polynomial of degree at most d with coefficients from a finite field \(\mathbb {F}_q\). Let \(\omega \in \mathbb {F}_q\) be a primitive \(\ell \)-th root of unity. There exists an arithmetic circuit \(\textsf{FFT}\) comprised of a series of \(\mathcal {O}({\log d})\) alternating layers of \(\mathcal {O}({d})\) addition gates and multiplication-by-constant gates, respectively, that takes \((a_0, \dots , a_d) \) as well as \((\omega ^0, \dots , \omega ^{\ell })\) as input and returns \((f(\omega ^0), \dots , f(\omega ^{\ell }))\).
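A minimal sketch of such an evaluation circuit is a recursive radix-2 number-theoretic transform over \(\mathbb {F}_q\). This sketch assumes q is prime, the input length is a power of two, and \(\omega \) is a primitive root of unity of that order; note that every butterfly uses only one multiplication by a public constant and two additions, matching the gate types allowed in Theorem 5.

```python
# Recursive radix-2 number-theoretic transform over F_q (q prime).
# Assumes len(a) is a power of two and omega is a primitive len(a)-th
# root of unity in F_q.

def ntt(a, omega, q):
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % q                 # multiplication by a constant
        out[k] = (even[k] + t) % q         # addition
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out  # [f(omega^0), ..., f(omega^{n-1})]
```

Applied homomorphically, the same butterfly structure needs only \(\boxplus \) and \(\boxdot \) operations, never a multiplication of two ciphertexts.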

2.3 Invertible Bloom Lookup Tables

An invertible Bloom lookup table (IBLT) is a data structure first introduced by Goodrich and Mitzenmacher [14] that supports three operations called \(\textsf{Insert}\), \(\textsf{Peel}\), and \(\textsf{List}\). The insertion operation adds elements to the data structure, the deletion operation removes them, and the list operation recovers all currently present elements with high probability, if not too many elements are present.

The data structure consists of two \(\gamma \times 8t\) matrices: the count matrix C and the valueSum matrix V. It further requires t-wise independent hash functions \(h_i : \{0,1\}^* \rightarrow [8t]\) for \(i \in [\gamma ]\). Initially all values are set to 0. To insert an element x into the data structure, we locate the cells \(C_{i, h_i(x)}\) and \(V_{i, h_i(x)}\) for \(i \in [\gamma ]\) and add 1 to each counter and x to each valueSum. To remove an element, we perform the inverse operations. To list all elements currently present in (C, V), we repeatedly perform a peeling operation until (C, V) is empty. The peeling operation finds a cell with counter 1, adds the corresponding valueSum value to the output list, and deletes the element from (C, V). The only way the list operation may fail is if (C, V) is not empty, but the peeling operation cannot find any cell with counter 1. It has been shown by Goodrich and Mitzenmacher that this probability decreases exponentially in \(\gamma \log t\). We formally describe the algorithms in Fig. 1.
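The operations above can be sketched on plaintext data as follows. Salted SHA-256 stands in for the t-wise independent hash functions \(h_1, \dots , h_\gamma \) (an assumption of this sketch, not the paper's construction), and removal is realized as an insertion with inverted sign.

```python
# A plaintext sketch of an IBLT with gamma rows of 8t cells each.

import hashlib

class IBLT:
    def __init__(self, gamma, t):
        self.gamma, self.width = gamma, 8 * t
        self.count = [[0] * self.width for _ in range(gamma)]   # matrix C
        self.valsum = [[0] * self.width for _ in range(gamma)]  # matrix V

    def _h(self, i, x):
        digest = hashlib.sha256(f"{i}|{x}".encode()).digest()
        return int.from_bytes(digest, "big") % self.width

    def insert(self, x, sign=1):
        # sign=+1 inserts x; sign=-1 performs the inverse operations (removal)
        for i in range(self.gamma):
            j = self._h(i, x)
            self.count[i][j] += sign
            self.valsum[i][j] += sign * x

    def list(self):
        # repeatedly peel cells with counter 1 until none remains
        out, progress = set(), True
        while progress:
            progress = False
            for i in range(self.gamma):
                for j in range(self.width):
                    if self.count[i][j] == 1:
                        x = self.valsum[i][j]
                        out.add(x)
                        self.insert(x, sign=-1)  # peel x from all its cells
                        progress = True
        return out
```

If peeling gets stuck before the table is empty, `list` returns only the recovered subset, mirroring the failure mode described above.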

Fig. 1. An invertible Bloom lookup table

Theorem 6

([14]). Let \(h_1, \dots , h_\gamma \) be t-wise independent hash functions, then for any \(X = \{x_1, \dots , x_t\}\) it holds that

$$ \Pr \left[ B := \textsf{Insert}((0^2)^{\gamma \times 8t},X) : \textsf{List}(B) \ne X \right] \le \mathcal {O}({2^{-(\gamma -2) \log t}}), $$

where the probability is taken over the random choices of \(h_1, \dots , h_\gamma \).

Remark 1

The construction of an IBLT can be modified to store tuples of values, by maintaining multiple valueSum matrices, one for each component. As long as one of the components remains unique among all inserted values, it is sufficient to use this component as input to the hash functions, without affecting Theorem 6. We will make use of this in our construction in Sect. 5.

3 Ciphertext Compression

In this section we formally define the concept of a ciphertext compression scheme \((\textsf{Compress}, \textsf{Decompress})\). Intuitively, the compression algorithm takes the public encryption key \(\textsf{pk}\) as well as a vector of ciphertexts \(\boldsymbol{c}\) from some family \(\mathcal {F}_\textsf{pk}\) of ciphertext vectors as input and returns some compressed representation thereof. The decompression algorithm gets the compressed representation as well as the secret decryption key as input and should return the decryption of \(\boldsymbol{c}\).

Definition 7 (Ciphertext Compression Scheme)

Let \(\mathcal {E}=(\textsf{Gen},\textsf{Enc},\textsf{Eval}, \textsf{Dec})\) be a homomorphic public key encryption scheme with ciphertext size \(\xi =\xi (\lambda )\). Let \(\mathcal {P}\) be the public key space of \(\mathcal {E}\). For each \(\textsf{pk}\in \mathcal {P}\) let \(\mathcal {F}_\textsf{pk}\) be a set of ciphertext vectors. A \(\delta \)-compressing, \((1-\epsilon )\)-correct ciphertext compression scheme for the family \(\mathcal {F}:=\{\mathcal {F}_\textsf{pk}\mid \textsf{pk}\in \mathcal {P}\}\) is a pair of PPT algorithms \((\textsf{Compress}, \textsf{Decompress})\), such that for any \((\textsf{sk},\textsf{pk})\leftarrow \textsf{Gen}(1^\lambda )\) and any \(\boldsymbol{c}\in \mathcal {F}_\textsf{pk}\) the output length of \(\textsf{Compress}(\textsf{pk}, \boldsymbol{c})\) is at most \(\delta \xi |\boldsymbol{c}|\) and it holds that

$$ \Pr [\textsf{Decompress}(\textsf{sk}, \textsf{Compress}(\textsf{pk}, \boldsymbol{c})) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))] = 1-\epsilon (\lambda ), $$

where the probability is taken over the random coins of the compression and decompression algorithms.

Remark 2

Note that a ciphertext compression scheme gives no guarantee whatsoever in the case where \(\boldsymbol{c} \notin \mathcal {F}_{\textsf{pk}}\).

4 Compression via Sparse Polynomials

In this section we present our first construction, which is based on the idea of interpolating sparse polynomials. Given the right building blocks, the construction is conceptually very simple. We simply view the sparse encrypted vector \((c_1, \dots , c_n)\) as the coefficient representation of a sparse polynomial. Using the Fast Fourier Transform, we homomorphically evaluate this polynomial efficiently at a sufficient number of points. These encrypted evaluations constitute the compression of the vector. To obtain the original vector during decompression, we simply decrypt the evaluation points and interpolate the corresponding sparse polynomial.

Definition 8 (Fast Fourier Functions)

The class of Fast Fourier functions is the set of functions \(\mathcal {Z}_{\textsf{FFT}}^\ell = \{f^\ell _{\boldsymbol{x}} \mid \boldsymbol{x} \in \mathbb {F}_q^\ell \}\) with

$$ f^\ell _{\boldsymbol{x}} : \mathbb {F}_q^n \rightarrow \mathbb {F}^\ell _q,\quad f^\ell _{\boldsymbol{x}}(\boldsymbol{a}):=\textsf{FFT}(\boldsymbol{a},\boldsymbol{x}). $$

Definition 9

(\(\mathcal {Z}_{\textsf{FFT}}^{2(t+1)}\) -Valid Low Hamming Weight Ciphertext Vectors). Let \(\mathcal {E}= (\textsf{Gen}, \textsf{Enc}, \textsf{Eval}, \textsf{Dec})\) be a homomorphic public key encryption scheme. For any \(\textsf{pk}\in \mathcal {P}\), let

$$ \mathcal {F}^{\textsf{FFT}}_{t,\textsf{pk}} := \bigl \{ \boldsymbol{c} \in \textsf{vld}(\mathcal {Z}_{\textsf{FFT}}^{2(t+1)},\textsf{pk}) \mid {{\,\mathrm{\textsf{hw}}\,}}(\textsf{Dec}(\textsf{Gen}^{-1}(\textsf{pk}),\boldsymbol{c})) \le t \bigr \}. $$

We then define the family of \(\mathcal {Z}_{\textsf{FFT}}^{2(t+1)}\)-valid ciphertext vectors with low hamming weight as \(\mathcal {F}^{\textsf{FFT}}_t := \{\mathcal {F}^\textsf{FFT}_{t,\textsf{pk}}\mid \textsf{pk}\in \mathcal {P}\}\).

Fig. 2. A ciphertext compression scheme for \(\mathcal {F}^\textsf{FFT}_t\) based on sparse polynomials. Here \(\textsf{FFT}(\cdot , (\omega ^0, \dots , \omega ^{2t+1}))\) refers to the circuit of the function \(f_{(\omega ^0, \dots , \omega ^{2t+1})}\) from Definition 8, i.e., the \(\textsf{FFT}\) circuit with the hardcoded second input \((\omega ^0, \dots , \omega ^{2t+1})\).

Theorem 10

Let \(\mathcal {E}= (\textsf{Gen}, \textsf{Enc}, \textsf{Eval}, \textsf{Dec})\) be an additively homomorphic encryption scheme with message space \(\mathcal {M}= \mathbb {F}_q\) and ciphertext size \(\xi =\xi (\lambda )\). Let \(n,t\in \mathbb {N}\) be integers such that \(n<q\) and let \(\omega \in \mathbb {F}_q\) be a \(2(t+1)\)-th primitive root of unity. Then \((\textsf{Compress},\textsf{Decompress})\) from Fig. 2 is a \(2(t+1)/n\)-compressing, perfectly correct ciphertext compression scheme for the family \(\mathcal {F}^{\textsf{FFT}}_t\).

Proof

Let \(\boldsymbol{c}\in \mathcal {F}^{\textsf{FFT}}_{t,\textsf{pk}}\) be an arbitrary, but fixed ciphertext vector and let S be an arbitrary vector in sparse representation. Due to the \(\mathcal {Z}_{\textsf{FFT}}^{2(t+1)}\)-validity of \(\boldsymbol{c}\) we know that

$$\begin{aligned}&\Pr \left[ \begin{aligned}&\boldsymbol{\tilde{c}} \leftarrow \textsf{Eval}\left( \textsf{pk}, \textsf{FFT}\left( \cdot , (\omega ^0, \dots , \omega ^{2t+1})\right) , \boldsymbol{c}\right) ,\\ &\boldsymbol{m} \leftarrow \textsf{Dec}(\textsf{sk}, \boldsymbol{\tilde{c}}) \end{aligned} \;:\; S = \textsf{Interpolate}(\boldsymbol{m}) \right] \\ =\;&\Pr \left[ \begin{aligned}&\boldsymbol{m} \leftarrow \textsf{Dec}(\textsf{sk}, \boldsymbol{c}),\\ &\boldsymbol{m'} \leftarrow \textsf{FFT}\left( \boldsymbol{m}, (\omega ^0, \dots , \omega ^{2t+1})\right) \end{aligned} \;:\; S = \textsf{Interpolate}(\boldsymbol{m'}) \right] . \end{aligned}$$

Furthermore, since \(\boldsymbol{c}\in \mathcal {F}^{\textsf{FFT}}_{t,\textsf{pk}}\), we know that \(\textsf{Dec}(\textsf{sk}, \boldsymbol{c})\) is a vector of Hamming weight at most t or, when viewed as a polynomial in coefficient representation, a t-sparse polynomial of degree at most \(n-1\). From Theorem 4 it follows that this sparse polynomial can be correctly interpolated from the \(2(t+1)\) evaluations produced by \(\textsf{FFT}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}) ,(\omega ^0, \dots , \omega ^{2t+1}))\) and therefore

$$ \textsf{Interpolate}(\textsf{FFT}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}) ,(\omega ^0, \dots , \omega ^{2t+1}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c})). $$

The output of \(\textsf{Compress}\) is a vector of \(2(t+1)\) ciphertexts of size \(\xi \) each and thus the scheme is \(2(t+1)/n\)-compressing.

5 Compression via IBLTs

In this section we present our second construction, which is based on a variant of invertible Bloom lookup tables. Given a vector of ciphertexts \(\boldsymbol{c}\), the idea is to homomorphically insert the corresponding non-zero plaintexts into an (encrypted) IBLT. The encrypted IBLT would then constitute the compression of the vector.

This approach encounters two problems: First, the insertion operation of an IBLT requires hashing the value to choose the cells to insert it into, which we cannot do because we do not have access to the plaintext. Second, since we do not know which of the ciphertexts correspond to a non-zero plaintext, it is unclear how to insert only those.

The first problem can be solved using Remark 1 by actually storing pairs \((d,m_d)\). Since the index d is both publicly known and unique, we can rely on hashing only d to derive the positions at which to insert the values.

The second problem is a bit trickier to solve. To build some intuition, we can first consider an easier compression problem where in addition to \(\boldsymbol{c}\) we are given an additional vector of ciphertexts \(\boldsymbol{h}\) containing “zero hints”. That is, \(h_d\) decrypts to 0 if \(c_d\) decrypts to 0, and \(h_d\) decrypts to 1 if \(c_d\) decrypts to anything else.

The IBLT then gets initialized as three matrices, a count matrix \(\boldsymbol{C}\) and two valueSum matrices \(\boldsymbol{M}\) and \(\boldsymbol{D}\), with each cell containing an encryption of zero. To insert the content of ciphertext \(c_d\) into the IBLT, we can then for \(i\in [\gamma ]\) compute \(j:=h_i(d)\) and insert the value by setting \(\boldsymbol{C}[i,j] := \boldsymbol{C}[i,j] \boxplus h_d\), \(\boldsymbol{M}[i,j] := \boldsymbol{M}[i,j] \boxplus c_d\), and \(\boldsymbol{D}[i,j] := \boldsymbol{D}[i,j] \boxplus (d\boxdot h_d)\).

Note that for any ciphertext corresponding to zero, this results in zero being added to all entries, which is equivalent to not inserting the value at all. Decompressing then involves decrypting all three matrices and using the \(\textsf{List}\) algorithm to extract pairs \((d,m_d)\) giving us a sparse representation of the plaintext vector corresponding to \(\boldsymbol{c}\). By the correctness guarantee of an IBLT, this works as long as not too many ciphertexts decrypt to a non-zero value.
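On plaintext data this insertion logic can be exercised directly. In the sketch below (all function names and the SHA-256-based cell choice are our own illustrative choices) a “ciphertext” is its plaintext value, so \(\boxplus \) becomes addition and \(\boxdot \) becomes multiplication modulo q; decompression peels pure cells exactly as in an IBLT.

```python
# A plaintext simulation of the "zero hint" compression and peeling.

import hashlib

def cell(i, d, width):
    """Cell index h_i(d); salted SHA-256 stands in for the hash functions."""
    digest = hashlib.sha256(f"{i}|{d}".encode()).digest()
    return int.from_bytes(digest, "big") % width

def compress(c, h, gamma, t, q):
    """c: plaintexts of the filtered vector, h: zero hints in {0, 1}."""
    width = 8 * t
    C = [[0] * width for _ in range(gamma)]  # count matrix
    M = [[0] * width for _ in range(gamma)]  # message valueSum matrix
    D = [[0] * width for _ in range(gamma)]  # index valueSum matrix
    for d, (cd, hd) in enumerate(zip(c, h), start=1):
        for i in range(gamma):
            j = cell(i, d, width)
            C[i][j] = (C[i][j] + hd) % q      # C[i,j] := C[i,j] ⊞ h_d
            M[i][j] = (M[i][j] + cd) % q      # M[i,j] := M[i,j] ⊞ c_d
            D[i][j] = (D[i][j] + d * hd) % q  # D[i,j] := D[i,j] ⊞ (d ⊡ h_d)
    return C, M, D

def decompress(C, M, D, gamma, t, q):
    """Peel pure cells (count 1) to recover pairs (d, m_d)."""
    width = 8 * t
    out, progress = {}, True
    while progress:
        progress = False
        for i in range(gamma):
            for j in range(width):
                if C[i][j] == 1:
                    d, m = D[i][j], M[i][j]
                    out[d] = m
                    for i2 in range(gamma):  # remove (d, m) from all cells
                        j2 = cell(i2, d, width)
                        C[i2][j2] = (C[i2][j2] - 1) % q
                        M[i2][j2] = (M[i2][j2] - m) % q
                        D[i2][j2] = (D[i2][j2] - d) % q
                    progress = True
    return out
```

With real ciphertexts, `compress` would operate on encrypted cells, and only `decompress`, run by the client after decryption, would see the plaintext sums.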

However, actually getting such “zero hint” ciphertexts may not be feasible in all scenarios, especially if the encryption scheme is only additively homomorphic. This means we need to somehow simulate having a count matrix without these zero hints.

The trick that we use is to choose a vector of random values \(\boldsymbol{k}\) that we will use to “recognize” cells that only contain a single message. We still initialize two matrices \(\boldsymbol{M}\) and \(\boldsymbol{K}\), but inserting into the IBLT is now done by setting \(\boldsymbol{M}[i,j] := \boldsymbol{M}[i,j] \boxplus c_d\) and \(\boldsymbol{K}[i,j] := \boldsymbol{K}[i,j] \boxplus (k_d\boxdot c_d)\). Note now that, after decryption, for any cell (i, j) that only contains a single value \(m_d\), we have that \(\boldsymbol{M}[i,j] = m_d\) and \(\boldsymbol{K}[i,j] = k_d\cdot m_d\). By checking whether \(\boldsymbol{K}[i,j]/\boldsymbol{M}[i,j]\) corresponds to one of the values in \(\boldsymbol{k}\), we can thus recognize which cells contain only a single value and which index it corresponds to, allowing us to peel the message from the IBLT. In a helpful lemma in Sect. 5.2 we prove a bound on the probability that this recognition procedure produces false positives.
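The recognition check itself is a one-line field computation. The following toy sketch (our own names and a toy prime field) declares a cell holding the sums \(M = \sum m_d\) and \(K = \sum k_d \cdot m_d\) pure iff \(K/M\) equals some entry \(k_d\), which simultaneously reveals the index d; mixed cells are rejected unless a false positive occurs.

```python
# A toy sketch of the "pure cell" recognition check over a small prime field.

Q = 10007  # toy prime field (an assumption of this sketch)

def recognize(M_cell, K_cell, k):
    """Return the 1-based index d if the cell looks pure, else None."""
    if M_cell % Q == 0:
        return None
    ratio = K_cell * pow(M_cell, Q - 2, Q) % Q  # K/M in F_Q
    lookup = {kd % Q: d for d, kd in enumerate(k, start=1)}
    return lookup.get(ratio)
```

For instance, with \(\boldsymbol{k} = (3, 5, 11)\), a cell containing only \(m_2 = 7\) has \(M = 7\) and \(K = 5\cdot 7\), and `recognize` returns 2.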

There still remains the problem that simply using and storing a random vector \(\boldsymbol{k}\) would require \(\mathcal {O}({n})\) storage and \(\mathcal {O}({n})\) computation to recognize the entries. To solve this issue we introduce the concept of wunderbar pseudorandom vectors in Sect. 5.1, which allows us to store a compact \(\mathcal {O}({\lambda })\) representation of a pseudorandom vector \(\boldsymbol{k}\) and to recognize vector entries in time \(\mathcal {O}({{{\,\mathrm{\textsf{polylog}}\,}}(n)})\).

5.1 Wunderbar Pseudorandom Vectors

The concept of a pseudorandom vector is conceptually similar to that of pseudorandom sets introduced in [10], except that we do not require puncturability. The idea is that it allows us to sample a short description of a long vector, which is indistinguishable from a random vector with unique entries. Importantly, we require that there exists an efficient algorithm that can recover the position of a given entry in the vector in time independent of the vector length. Naively one can always find the position in linear time in the vector length. This is, however, not good enough for our application, which is why we require the pseudorandom vector to be “wunderbar”. In particular, we want the description length of the vector to be in \(\mathcal {O}({\lambda })\) and getting individual entries as well as index recovery should be possible in \(\mathcal {O}({{{\,\mathrm{\textsf{polylog}}\,}}(n)})\).

Definition 11

A pseudorandom vector with index recovery for an efficiently sampleable universe \(K=K(\lambda )\) consists of a triple of ppt algorithms \((\textsf{Sample},\textsf{Entry},\textsf{Index})\) such that

  • \(\textsf{Sample}(1^\lambda ,n)\): The sampling algorithm takes as input the security parameter \(\lambda \) and the vector length n in unary and outputs the description of a pseudorandom vector s.

  • \(\textsf{Entry}(s,i)\): The deterministic retrieving algorithm takes as input a description s and an index \(i\in [n]\) and outputs a value \(k_i \in K\).

  • \(\textsf{Index}(s,k)\): The deterministic index recovery algorithm takes as input a description s and a value k and outputs either an index \(i\in [n]\) or \(\bot \).

A pseudorandom vector with index recovery is correct, if for all vector lengths \(n=\textsf{poly}(\lambda )\) and all descriptions \(s\leftarrow \textsf{Sample}(1^\lambda ,1^n)\) it holds that:

  1. For all indices \(i\in [n]\) it holds that \(\textsf{Index}(s,\textsf{Entry}(s,i))=i\).

  2. For all \(k^*\not \in \{\textsf{Entry}(s,i) \mid i\in [n]\}\) it holds that \(\textsf{Index}(s,k^*)=\bot \).

The pseudorandom vector is wunderbar if the description of a vector has length \(\mathcal {O}({\lambda })\) and the runtime of \(\textsf{Entry}\) and \(\textsf{Index}\) is \(\mathcal {O}({{{\,\mathrm{\textsf{polylog}}\,}}(n)})\). A pseudorandom vector is secure, if for all \(n=\textsf{poly}(\lambda )\) and all ppt algorithms \(\mathcal {A}\)

$$ \begin{aligned}&\left| {\Pr \left[ \begin{aligned} s\leftarrow \textsf{Sample}(1^\lambda ,1^n),\\ \boldsymbol{k} := \begin{pmatrix}\textsf{Entry}(s,1)\\ \vdots \\ \textsf{Entry}(s,n)\end{pmatrix} \end{aligned} : \mathcal {A}(\boldsymbol{k})\right] - \Pr [\boldsymbol{k}\leftarrow K^n \text { with pairwise distinct entries} : \mathcal {A}(\boldsymbol{k})]} \right| \le \textsf{negl}(\lambda )\end{aligned} $$

Remark 3

For ease of notation we define two algorithms \(\textsf{DummySample}\) and \(\textsf{DummyIndex}\) that represent a dummy version of a pseudorandom vector with index recovery. I.e., \(\textsf{DummySample}(1^\lambda ,n)\) simply samples a uniformly random vector \(\boldsymbol{k}\in K^n\) with pairwise distinct entries, and \(\textsf{DummyIndex}(\boldsymbol{k},k)\) performs an exhaustive search and returns i iff \(k_i=k\) and \(\bot \) if none of the entries match. Using this notation, the above security definition can be rewritten as

$$ \begin{aligned}&\left| {\Pr \left[ \begin{aligned} s\leftarrow \textsf{Sample}(1^\lambda ,1^n),\\ \boldsymbol{k} := \begin{pmatrix}\textsf{Entry}(s,1)\\ \vdots \\ \textsf{Entry}(s,n)\end{pmatrix} \end{aligned} : \mathcal {A}(\boldsymbol{k})\right] - \Pr [\boldsymbol{k}\leftarrow \textsf{DummySample}(1^\lambda ,n) : \mathcal {A}(\boldsymbol{k})]} \right| \\ \le&\textsf{negl}(\lambda )\end{aligned} $$

Wunderbar Pseudorandom Vectors from Pseudorandom Permutations. Let \(\mathbb {F}_{p^m}\) be a field such that \(m\cdot \lfloor \log p\rfloor \ge \lambda \). We construct wunderbar pseudorandom vectors over a subset \(K\subseteq \mathbb {F}_{p^m}\) from an arbitrary family of pseudorandom permutations over \(\mathbb {F}_2^\lambda \). To do so, we need an efficiently computable and efficiently invertible injective function \(\textsf{binToField}\) mapping from \(\mathbb {F}_2^\lambda \) to \(\mathbb {F}_{p^m}\). The exact function is irrelevant, but for concreteness, we specify it in the following. Let \(\textsf{bin}: [q] \rightarrow \mathbb {F}_2^{\lceil \log q\rceil }\) denote the function that maps an integer to its canonical binary representation and let \(\textsf{proj}: \mathbb {F}_2^{\lceil \log q\rceil } \rightarrow [q]\) be its inverse. Then we specify

$$ \textsf{binToField}: \mathbb {F}_2^\lambda \rightarrow \mathbb {F}_{p^m} \quad \textsf{binToField}((b_1,\dots ,b_\lambda )) := \sum _{i=0}^{m-1} c_ix^i $$

where

$$ c_i := \textsf{proj}\bigl ((b_{i\lceil \nicefrac {\lambda }{m}\rceil +1},\dots ,b_{\min \{(i+1)\lceil \nicefrac {\lambda }{m}\rceil ,\lambda \}})\bigr ) $$

We further specify the inverse function as

$$\begin{aligned}\begin{gathered} \textsf{fieldToBin}: \mathbb {F}_{p^m} \rightarrow \mathbb {F}_2^\lambda \cup \{\bot \}\\ \textsf{fieldToBin}(\sum _{i=0}^{m-1} c_ix^i) := {\left\{ \begin{array}{ll}\bot &{} \text {if }\exists c_i.\; c_i \ge 2^{\min \{\lceil \frac{\lambda }{m} \rceil ,\lambda -i\lceil \frac{\lambda }{m} \rceil \}}\\ (b_1,\dots ,b_\lambda )&{}\text {otherwise}\end{array}\right. } \end{gathered}\end{aligned}$$

where

$$\begin{aligned} b_i := \textsf{bin}\bigl (c_{\lfloor \nicefrac {i}{\lceil \nicefrac {\lambda }{m}\rceil }\rfloor }\bigr )_{i-\lfloor \nicefrac {i}{\lceil \nicefrac {\lambda }{m}\rceil }\rfloor \cdot \lceil \nicefrac {\lambda }{m}\rceil }. \end{aligned}$$
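The chunking behind \(\textsf{binToField}\)/\(\textsf{fieldToBin}\) can be sketched as follows. This is our own Python sketch: coefficients are returned as plain integer lists rather than \(\mathbb {F}_{p^m}\) elements, and we assume \(p > 2^{\lceil \lambda /m\rceil }\) so that every chunk fits into a coefficient.

```python
import math

def bin_to_field(bits, lam, m):
    """Pack lam bits into m coefficients by chunking ceil(lam/m) bits each;
    the last chunk may be shorter. Returns the coefficient list (c_0..c_{m-1})."""
    w = math.ceil(lam / m)
    coeffs = []
    for i in range(m):
        chunk = bits[i * w : min((i + 1) * w, lam)]
        c = 0
        for b in chunk:  # interpret the (possibly empty) chunk MSB-first
            c = (c << 1) | b
        coeffs.append(c)
    return coeffs

def field_to_bin(coeffs, lam, m):
    """Inverse map: fails (returns None, playing the role of bot) whenever a
    coefficient exceeds its chunk width min(ceil(lam/m), lam - i*ceil(lam/m))."""
    w = math.ceil(lam / m)
    bits = []
    for i, c in enumerate(coeffs):
        width = min(w, max(lam - i * w, 0))
        if c >= (1 << width):
            return None
        bits.extend((c >> (width - 1 - t)) & 1 for t in range(width))
    return bits

# tiny example: 8 bits into 3 chunks of widths 3, 3, 2
bits = [1, 0, 1, 1, 1, 0, 0, 1]
assert field_to_bin(bin_to_field(bits, 8, 3), 8, 3) == bits
```

The `None` branch mirrors the \(\bot \) case above: not every field element is the image of a bit string, which is why \(\textsf{fieldToBin}\) is partial.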
Fig. 3.
figure 3

A wunderbar pseudorandom vector for \(K \subseteq \mathbb {F}_{p^m}\) constructed from a family of pseudorandom permutations over \(\mathbb {F}_2^\lambda \).
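Since Fig. 3 itself is not reproduced here, the following Python sketch (our own) illustrates the three algorithms as described in the surrounding text. A toy Feistel network stands in for a real PRP family, and field elements are identified with integers so that \(\textsf{binToField}\) becomes a simple byte-to-integer cast; none of these instantiation choices are from the paper.

```python
import hashlib
import os

LAM = 128  # block size in bits; a real instantiation would use an actual PRP

def _round_fn(half, key, r):
    # round function of the toy Feistel network, derived from SHA-256
    return hashlib.sha256(key + bytes([r]) + half).digest()[: len(half)]

def prp(key, block, inverse=False):
    """Toy 4-round Feistel permutation on LAM-bit blocks (a stand-in sketch,
    NOT a vetted PRP). block must be LAM // 8 bytes long."""
    half = LAM // 16
    L, R = block[:half], block[half:]
    if inverse:
        for r in reversed(range(4)):
            L, R = bytes(a ^ b for a, b in zip(R, _round_fn(L, key, r))), L
        return L + R
    for r in range(4):
        L, R = R, bytes(a ^ b for a, b in zip(L, _round_fn(R, key, r)))
    return L + R

def sample(n):
    """Sample: the description is just a PRP key plus the vector length."""
    return (os.urandom(32), n)

def entry(desc, i):
    """Entry((s, n), i) := binToField(PRP(s, bin(i))); here binToField is
    simply a byte-to-integer cast."""
    s, n = desc
    return int.from_bytes(prp(s, i.to_bytes(LAM // 8, "big")), "big")

def index(desc, k):
    """Index((s, n), k): invert the PRP and check the result lies in [n]."""
    s, n = desc
    i = int.from_bytes(prp(s, k.to_bytes(LAM // 8, "big"), inverse=True), "big")
    return i if 1 <= i <= n else None  # None plays the role of bot
```

For instance, `index(desc, entry(desc, 7)) == 7` for any `desc = sample(n)` with `n >= 7`; both calls cost a constant number of PRP evaluations, independent of `n`.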

Theorem 12

Let \(\textsf{PRP}\) be a secure family of pseudorandom permutations over some \(\mathbb {F}_2^\lambda \). Then \((\textsf{Sample},\textsf{Entry},\textsf{Index})\) as described in Fig. 3 is a secure wunderbar pseudorandom vector with index recovery for universe \(K=\{\textsf{binToField}(\boldsymbol{b})\mid \) \(\boldsymbol{b} \in \mathbb {F}_2^\lambda \}\).

Proof

We need to establish that the construction is correct, wunderbar, and secure. It is simple to see that the construction is correct:

$$ \textsf{Index}((s,n),\textsf{Entry}((s,n),i)) = \textsf{proj}\bigl (\textsf{PRP}^{-1}(s,\textsf{fieldToBin}(\textsf{binToField}(\textsf{PRP}(s,\textsf{bin}(i)))))\bigr ) = \textsf{proj}(\textsf{bin}(i)) = i $$

Similarly, it is easy to see that the construction is wunderbar: the description consists of \(s\in \mathbb {F}_2^\lambda \) and \(n=\textsf{poly}(\lambda )\le \lambda ^c\) for some constant c. Therefore, it has size at most \(\lambda + c\cdot \log \lambda \in \mathcal {O}({\lambda })\). The runtime of \(\textsf{Entry}\) is in fact independent of n and thus trivially in \(\mathcal {O}({{{\,\mathrm{\textsf{polylog}}\,}}(n)})\), and the only computation in \(\textsf{Index}\) that depends on n is the membership check \(\textsf{proj}(\boldsymbol{b})\in [n]\), which can be performed in time \(\mathcal {O}({\log n}) \subset \mathcal {O}({{{\,\mathrm{\textsf{polylog}}\,}}(n)})\).

It remains to show that the construction is secure. Let \(n=n(\lambda )=\textsf{poly}(\lambda )\) and let \(\mathcal {A}\) be an arbitrary ppt algorithm. We construct an adversary \(\mathcal {B}\) against the pseudorandomness of \(\textsf{PRP}\) as follows. \(\mathcal {B}\) takes as input the security parameter \(\lambda \) and is given access to an oracle. For each \(i\in [n]\), it queries \(\textsf{bin}(i)\) to the oracle, receiving back \(\boldsymbol{b}'_i\), and computes \(k_i := \textsf{binToField}(\boldsymbol{b}'_i)\). It then invokes \(\mathcal {A}(\boldsymbol{k})\) and outputs whatever \(\mathcal {A}\) outputs. Clearly, \(\mathcal {B}\) is also ppt, needing a runtime overhead of just n oracle queries over simply running \(\mathcal {A}\). We now consider two cases: if, on the one hand, the oracle is \(\textsf{PRP}(s,\cdot )\), then for all \(i\in [n]\) it holds that \(k_i = \textsf{binToField}(\textsf{PRP}(s,\textsf{bin}(i))) = \textsf{Entry}((s,n),i)\). I.e., we have

$$\begin{aligned} \begin{aligned}&\Pr [s\leftarrow \mathbb {F}_2^\lambda : \mathcal {B}^{\textsf{PRP}(s,\cdot )}=1]\\ =&\Pr [s\leftarrow \textsf{Sample}(1^\lambda ,1^n), \boldsymbol{k} := (\textsf{Entry}(s,1),\dots ,\textsf{Entry}(s,n))^\intercal : \mathcal {A}(\boldsymbol{k})]. \end{aligned} \end{aligned}$$
(1)

If, on the other hand, the oracle is a truly random permutation g, then for all \(i\in [n]\) it holds that \(k_i = \textsf{binToField}(g(\textsf{bin}(i)))\) and therefore

$$\begin{aligned}&\Pr [g \leftarrow \varPi (\mathbb {F}_2^\lambda ) : \mathcal {B}^{g(\cdot )}=1]\\ =\,&\Pr [g \leftarrow \varPi (\mathbb {F}_2^\lambda ); \forall i\in [n].\, k_i = \textsf{binToField}(g(\textsf{bin}(i))) : \mathcal {A}(\boldsymbol{k})] \qquad (2)\\ =\,&\Pr [(\boldsymbol{b}_1,\dots ,\boldsymbol{b}_n)\leftarrow \mathbb {F}_2^\lambda \text { uniform without repetition}; \forall i\in [n].\, k_i = \textsf{binToField}(\boldsymbol{b}_i) : \mathcal {A}(\boldsymbol{k})] \qquad (3)\\ =\,&\Pr [\boldsymbol{k} \leftarrow \textsf{DummySample}(1^\lambda ,n) : \mathcal {A}(\boldsymbol{k})] \qquad (4)\end{aligned}$$

Here, Eq. 3 holds because g is a uniformly chosen random permutation and therefore the values \(g(\textsf{bin}(i))\) are uniformly distributed conditioned on not being duplicates and Eq. 4 holds because \(\textsf{binToField}\) is an injective function into K.

Combining Eq. 1 and Eq. 4 we get

$$\begin{aligned}&\left| \Pr [s\leftarrow \textsf{Sample}(1^\lambda ,1^n), \boldsymbol{k} := (\textsf{Entry}(s,1),\dots ,\textsf{Entry}(s,n))^\intercal : \mathcal {A}(\boldsymbol{k})] - \Pr [\boldsymbol{k}\leftarrow \textsf{DummySample}(1^\lambda ,n) : \mathcal {A}(\boldsymbol{k})]\right| \\ =\,&\left| \Pr [s\leftarrow \mathbb {F}_2^\lambda : \mathcal {B}^{\textsf{PRP}(s,\cdot )}=1] - \Pr [g \leftarrow \varPi (\mathbb {F}_2^\lambda ) : \mathcal {B}^{g(\cdot )}=1]\right| \le \textsf{negl}(\lambda )\end{aligned}$$

where the last inequality follows from the fact that \(\textsf{PRP}\) is pseudorandom. \(\square \)

5.2 A Helpful Lemma

We prove a helpful lemma which allows us to bound the probability of false positives when attempting to detect cells with only a single entry in the IBLT. Recall that we have two matrices \(\boldsymbol{M}\) and \(\boldsymbol{K}\), where the cells of \(\boldsymbol{M}\) contain sums of messages \(m_d\) and the cells of \(\boldsymbol{K}\) contain sums of \(k_d\cdot m_d\) for a random vector \(\boldsymbol{k}\). We check for cells containing only a single message, i.e., cells that can be peeled, by checking whether \(\boldsymbol{K}[i,j]/\boldsymbol{M}[i,j]\) corresponds to one of the values in \(\boldsymbol{k}\). A false positive could occur if for some set of at least two non-zero messages corresponding to indices \(I\subseteq [n]\) it happens to hold that

$$\begin{aligned} k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i} \end{aligned}$$

for some \(j\in [n]\). The lemma states that we can bound the probability of this occurring by choosing the entries of \(\boldsymbol{k}\) from a large enough space.
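Before stating the lemma, a quick Monte Carlo sanity check (our own sketch with toy parameters, not part of the paper) makes the false-positive event tangible over a deliberately tiny field, where it actually occurs with noticeable frequency:

```python
import random

q = 101  # a deliberately tiny prime field so that false positives are visible
n = 5
m = [3, 7, 0, 1, 0]                      # messages
I = [i for i in range(n) if m[i] != 0]   # indices of the non-zero messages

def false_positive(k):
    """Does the ratio of a multi-message cell happen to equal some k_j?"""
    num = sum(k[i] * m[i] for i in I) % q
    den = sum(m[i] for i in I) % q       # = 11 here, so the ratio is defined
    ratio = num * pow(den, q - 2, q) % q
    return ratio in set(k)

trials = 20000
hits = sum(false_positive([random.randrange(q) for _ in range(n)])
           for _ in range(trials))
print("empirical:", hits / trials, "lemma bound n/|K|:", n / q)
```

The empirical rate concentrates below the bound \(n/|K|\) proven in Lemma 13; with \(|K|\) of size roughly \(2^{\kappa }\cdot \textsf{poly}(n)\), as required later in Theorem 17, the event becomes negligible.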

Lemma 13

Let \(K \subseteq \mathbb {F}_q\), \((m_1,\dots ,m_n)\in \mathbb {F}_q^n\) and \(I\subseteq [n]\) be arbitrary such that \(\sum _{i\in I}m_i \ne 0\) and there exist \(i,i'\in I\) with \(0\not \in \{m_i,m_{i'}\}\). It holds that

$$\begin{aligned} \Pr \biggl [\boldsymbol{k} \leftarrow K^n : \exists j\in [n].\ k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \le \frac{n}{|K|}. \end{aligned}$$

Proof

Using a union bound we have

$$\begin{aligned} \Pr \biggl [\boldsymbol{k} \leftarrow K^n : \exists j\in [n].\ k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \le \sum _{j\in [n]}\Pr \biggl [\boldsymbol{k} \leftarrow K^n : k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \end{aligned}$$
(5)

It thus remains to bound the above probability for individual j. Let \(j\in [n]\) be arbitrary but fixed and let \(\xi =1\) if \(j\in I\) and \(\xi =0\) otherwise. It then holds that

$$\begin{aligned} k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i} \iff k_j\cdot \Bigl (\sum _{i\in I} m_i - \xi m_j\Bigr ) = \sum _{i\in I\setminus \{j\}} k_im_i \iff k_j\cdot \sum _{i\in I\setminus \{j\}} m_i = \sum _{i\in I\setminus \{j\}} k_im_i. \end{aligned}$$

We now consider two cases. If \(\sum _{i\in I\setminus \{j\}}m_i=0\), let \(j'\in I\setminus \{j\}\) be an index such that \(m_{j'}\ne 0\). Note that such an index always exists by the condition on I. We then have

$$\begin{aligned} k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i} \iff k_{j'} = (-m_{j'})^{-1}\cdot \sum _{i\in I\setminus \{j,j'\}} k_im_i \end{aligned}$$

where \((-m_{j'})^{-1}\) is always defined by the condition that \(m_{j'}\ne 0\). Since the right hand side of the equality is independent of \(k_{j'}\), the probability that the equality holds is at most \(1/|K|\) for any choice of \(k_i\), \(i\ne j'\). Thus, in this case

$$\begin{aligned} \Pr \biggl [\boldsymbol{k} \leftarrow K^n : k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \le \frac{1}{|K|}. \end{aligned}$$
(6)

In the other case, i.e., if \(\sum _{i\in I\setminus \{j\}}m_i\ne 0\), the inverse \((\sum _{i\in I\setminus \{j\}}m_i)^{-1}\) is well defined and we have

$$\begin{aligned} k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i} \iff k_j = \Bigl (\sum _{i\in I\setminus \{j\}} m_i\Bigr )^{-1}\cdot \sum _{i\in I\setminus \{j\}} k_im_i \end{aligned}$$

Here again, the right hand side of the equality is independent of \(k_{j}\). Thus, the probability that the equality holds is at most \(1/|K|\) for any choice of \(k_i\), \(i\ne j\) and also in this case it holds that

$$\begin{aligned} \Pr \biggl [\boldsymbol{k} \leftarrow K^n : k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \le \frac{1}{|K|} \end{aligned}$$
(7)

Finally, combining Eq. 5 with Eq. 6 and Eq. 7, we get

$$\begin{aligned} \Pr \biggl [\boldsymbol{k} \leftarrow K^n : \exists j\in [n].\ k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \le \sum _{j\in [n]}\frac{1}{|K|} = \frac{n}{|K|}. \end{aligned}$$

\(\square \)

By observing that the statistical distance between \(K^n\) and the output distribution of \(\textsf{DummySample}(1^\lambda ,n)\) is at most \(n^2/|K|\) due to the birthday bound, we obtain the following corollary.

Corollary 14

Let \(K \subseteq \mathbb {F}_q\), \((m_1,\dots ,m_n)\in \mathbb {F}_q^n\) and \(I\subseteq [n]\) be arbitrary such that \(\sum _{i\in I}m_i \ne 0\) and there exist \(i,i'\in I\) with \(0\not \in \{m_i,m_{i'}\}\). It holds that

$$\begin{aligned} \Pr \biggl [\boldsymbol{k} \leftarrow \textsf{DummySample}(1^\lambda ,n) : \exists j\in [n].\ k_j=\frac{\sum _{i\in I} k_im_i}{\sum _{i\in I} m_i}\biggr ] \le \frac{n^2+n}{|K|}. \end{aligned}$$

5.3 Construction of Ciphertext-Compression from IBLTs

Populating the IBLT involves homomorphically evaluating inner products between the encrypted vector and plain vectors. Therefore, the compression scheme can only work for ciphertext vectors that allow the evaluation of the inner product functions defined in the following.

Definition 15 (Inner Product Functions)

The class of inner product functions is the set of functions \(\mathcal {Z}_{\textsf{ip}}= \{f_{\boldsymbol{a}} \mid \boldsymbol{a} \in \mathbb {F}_q^n\}\) with

$$ f_{\boldsymbol{a}} : \mathbb {F}_q^n \rightarrow \mathbb {F}_q,\quad f_{\boldsymbol{a}}(\boldsymbol{x}):=\langle \boldsymbol{a},\boldsymbol{x}\rangle . $$

The family of ciphertext vectors to which the construction is applicable is then exactly the family of ciphertext vectors that have low Hamming weight and allow the evaluation of inner product functions. We define this family as follows.

Definition 16

(\(\mathcal {Z}_{\textsf{ip}}\) -Valid Low Hamming Weight Ciphertext Vectors). Let \(\mathcal {E}= (\textsf{Gen}, \textsf{Enc}, \textsf{Eval}, \textsf{Dec})\) be a homomorphic public key encryption scheme. For any \(\textsf{pk}\in \mathcal {P}\), let

$$ \mathcal {F}^{\textsf{ip}}_{t,\textsf{pk}} := \bigl \{ \boldsymbol{c} \in \textsf{vld}(\mathcal {Z}_{\textsf{ip}},\textsf{pk}) \mid {{\,\mathrm{\textsf{hw}}\,}}(\textsf{Dec}(\textsf{Gen}^{-1}(\textsf{pk}),\boldsymbol{c})) < t \bigr \}. $$

We then define the family of \(\mathcal {Z}_{\textsf{ip}}\)-valid ciphertext vectors with low Hamming weight as \(\mathcal {F}^{\textsf{ip}}_t := \{\mathcal {F}^{\textsf{ip}}_{t,\textsf{pk}}\mid \textsf{pk}\in \mathcal {P}\}\).

Fig. 4.
figure 4

A ciphertext compression scheme based on invertible Bloom lookup tables and wunderbar pseudorandom vectors.
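Since Fig. 4 itself is not reproduced here, the following plaintext-level Python sketch (our own) illustrates the compress/peel cycle following the surrounding description: \(\gamma \) rows of 8t cells, each message inserted into one cell per row, and decoding by repeatedly peeling singleton cells. Encryption, the PRF and the wunderbar vector are replaced by plain dictionaries and a random vector, so this is an illustration of the mechanism, not the actual scheme.

```python
import random

q = (1 << 61) - 1  # Mersenne prime standing in for the plaintext field F_q

def compress(m, t, gamma, h, k):
    """Plaintext analogue of Compress: each index d goes into one cell per
    row, cell h[(i, d)]. M accumulates message sums, K accumulates k_d * m_d."""
    cols = 8 * t
    M = [[0] * cols for _ in range(gamma)]
    K = [[0] * cols for _ in range(gamma)]
    for d, md in enumerate(m):
        for i in range(gamma):
            j = h[(i, d)]
            M[i][j] = (M[i][j] + md) % q
            K[i][j] = (K[i][j] + k[d] * md) % q
    return M, K

def decompress(M, K, n, t, gamma, h, k, k_inv):
    """Plaintext analogue of Decompress: repeatedly find cells whose ratio
    K/M matches some k_d (a singleton), record (d, m_d) and peel it."""
    out = {}
    for _ in range(n):  # at most n rounds of peeling
        progress = False
        for i in range(gamma):
            for j in range(8 * t):
                if M[i][j] == 0:
                    continue
                ratio = K[i][j] * pow(M[i][j], q - 2, q) % q
                d = k_inv.get(ratio)
                if d is None or d in out:
                    continue
                md = M[i][j]
                out[d] = md
                for r in range(gamma):  # subtract d's contribution everywhere
                    c = h[(r, d)]
                    M[r][c] = (M[r][c] - md) % q
                    K[r][c] = (K[r][c] - k[d] * md) % q
                progress = True
        if not progress:
            break
    return out

# toy instance: n = 4 messages, t = 1 (8 cells per row), gamma = 2 rows;
# indices 0 and 1 collide in row 0 and must be separated by peeling
n, t, gamma = 4, 1, 2
m = [5, 6, 7, 0]
k = random.sample(range(1, q), n)
k_inv = {v: d for d, v in enumerate(k)}
h = {(0, 0): 0, (0, 1): 0, (0, 2): 1, (0, 3): 2,
     (1, 0): 2, (1, 1): 3, (1, 2): 3, (1, 3): 5}
M, K = compress(m, t, gamma, h, k)
rec = decompress(M, K, n, t, gamma, h, k, k_inv)
print(rec)  # the index/value pairs of the non-zero messages
```

Peeling the singleton cell of index 2 in row 0 frees up the colliding cells in row 1, after which indices 0 and 1 become peelable as well; this is exactly the cascade that makes IBLT decoding succeed for sparse vectors.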

Theorem 17

Let \(\mathcal {E}= (\textsf{Gen}, \textsf{Enc}, \textsf{Eval}, \textsf{Dec})\) be an additively homomorphic encryption scheme with message space \(\mathcal {M}= \mathbb {F}_q\) for some prime power q with ciphertext size \(\xi =\xi (\lambda )\). Let \(n, t, \gamma , \kappa = \textsf{poly}(\lambda )\) be integers. Let \(\textsf{PRF}\) be a family of pseudorandom functions and let \((\textsf{Sample},\textsf{Entry},\textsf{Index})\) be a wunderbar pseudorandom vector with index recovery for a universe \(K=K(\lambda ) \subseteq \mathbb {F}_q\) with \(|K|\ge 2^{\kappa }(8t\gamma )(n^3+n^2)\). Then \((\textsf{Compress},\textsf{Decompress})\) from Fig. 4 is a \((\mathcal {O}({\lambda })+16t\gamma \xi )/(n\xi )\)-compressing \((1-\mathcal {O}({2^{-\kappa }}) - \textsf{negl}(\lambda ))\)-correct ciphertext compression scheme for family \(\mathcal {F}^{\textsf{ip}}_t\).

Proof

The output of the compression algorithm consists of \(s_1\), \(s_2\), and \(16t\gamma \) ciphertexts. Since the pseudorandom vector is wunderbar, it holds that \(|s_1| = \mathcal {O}({\lambda })\), and \(s_2\) is chosen as a \(\lambda \)-bit string. Therefore, it is easy to see that the scheme is \((\mathcal {O}({\lambda })+16t\gamma \xi )/(n\xi )\)-compressing. It remains to prove that it is correct. To do so, we define a series of six hybrid schemes, given in Figs. 5 through 9.
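For intuition about the rate, a small back-of-the-envelope computation; the parameter values below are illustrative choices of our own, not from the paper, and the \(\mathcal {O}({\lambda })\) term is approximated by \(2\lambda \) bits for \(s_1\) and \(s_2\).

```python
# Rate (O(lambda) + 16*t*gamma*xi) / (n*xi) with illustrative parameters:
lam, t, gamma, xi, n = 128, 10, 40, 8192, 2**20  # xi = ciphertext size in bits
compressed_bits = 2 * lam + 16 * t * gamma * xi  # s1, s2 plus 16*t*gamma ciphertexts
original_bits = n * xi                           # the n input ciphertexts
rate = compressed_bits / original_bits
print(rate)  # about 0.006: far smaller than storing all n ciphertexts
```

As soon as \(t\gamma \ll n\), the \(16t\gamma \) output ciphertexts dominate the compressed size while remaining a tiny fraction of the original n ciphertexts.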

Fig. 5.
figure 5

The first hybrid scheme works exactly as the actual ciphertext compression scheme, except that it operates on plaintext messages instead of encrypted messages. I.e., ciphertexts are now decrypted before compression instead of between compression and decompression.

Claim 18

For any \(\mathcal {Z}_{\textsf{ip}}\)-valid vector of ciphertexts it holds that

$$ \begin{aligned}&\Pr [\textsf{Decompress}(\textsf{sk}, \textsf{Compress}(\textsf{pk}, \boldsymbol{c})) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ =&\Pr [\textsf{Decompress}_1(\textsf{Compress}_1(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned} $$

Proof

Following Definition 15 we denote by \(f_{\boldsymbol{a}}\) the inner product function \(f_{\boldsymbol{a}} : \mathbb {F}_q^n \rightarrow \mathbb {F}_q,\quad f_{\boldsymbol{a}}(\boldsymbol{x}):=\langle \boldsymbol{a},\boldsymbol{x}\rangle \). Further, we denote

figure s

Now, let \(\boldsymbol{M}_0,\boldsymbol{K}_0,\boldsymbol{M}'_0,\boldsymbol{K}'_0\) and \(\boldsymbol{M}_1,\boldsymbol{K}_1,\boldsymbol{M}'_1,\boldsymbol{K}'_1\) denote the relevant matrices in the actual scheme and in hybrid 1, respectively. We note that since \(\boldsymbol{c}\) is \(\mathcal {Z}_{\textsf{ip}}\)-valid, it holds for all \((i,j)\in [\gamma ]\times [8t]\) that

figure t

as well as

figure u

Since the computation on \(\boldsymbol{M'},\boldsymbol{K'}\) is otherwise identical between the two hybrids, the claim immediately follows.   \(\square \)

Fig. 6.
figure 6

The second hybrid scheme works exactly as the first hybrid scheme, except that instead of using the wunderbar pseudorandom vector it uses the dummy sampler and the dummy index recovery to work with a uniformly random vector

figure v

Claim 19

If \((\textsf{Sample},\textsf{Entry},\textsf{Index})\) is a secure pseudorandom vector, it holds for any key pair \((\textsf{sk},\textsf{pk})\) and any vector \(\boldsymbol{c}\) that

$$ \Biggl |\begin{aligned}&\Pr [\textsf{Decompress}_1(\textsf{Compress}_1(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ -&\Pr [\textsf{Decompress}_2(\textsf{Compress}_2(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned}\Biggr | \le \textsf{negl}(\lambda ). $$

Proof

We construct an attacker \(\mathcal {A}\) against the security of the pseudorandom vector as follows. On input \(\boldsymbol{k}\), \(\mathcal {A}\) executes \(\textsf{Decompress}_2(\textsf{Compress}_2(\textsf{Dec}(\textsf{sk},\boldsymbol{c})))\), except that it uses its input \(\boldsymbol{k}\) instead of sampling a fresh one. If \(\boldsymbol{k}\) was chosen using \(\boldsymbol{k}\leftarrow \textsf{DummySample}(1^\lambda ,n)\), this is identical to a regular execution of \(\textsf{Decompress}_2(\textsf{Compress}_2(\textsf{Dec}(\textsf{sk},\boldsymbol{c})))\). If on the other hand \(\boldsymbol{k}\) was chosen by sampling \(s_2\leftarrow \textsf{Sample}(1^\lambda ,n)\) and setting \(\boldsymbol{k}:= (\textsf{Entry}(s_2,1),\dots ,\textsf{Entry}(s_2,n))^\intercal \), this is identical to a regular execution of \(\textsf{Decompress}_1(\textsf{Compress}_1(\textsf{Dec}(\textsf{sk},\boldsymbol{c})))\). Therefore, by the security of the pseudorandom vector

$$ \left| \begin{aligned}&\Pr [\textsf{Decompress}_1(\textsf{Compress}_1(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ -&\Pr [\textsf{Decompress}_2(\textsf{Compress}_2(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned}\right| \le \textsf{negl}(\lambda ). $$

\(\square \)

Fig. 7.
figure 7

The third hybrid scheme works exactly as the second hybrid scheme, except that it maintains a matrix counting how many non-zero messages are mapped to each individual cell and decides which messages to peel based on these exact counts instead of relying on the matrix \(\boldsymbol{K}\).

Claim 20

It holds that

$$ \left| {\begin{aligned}&\Pr [\textsf{Decompress}_2(\textsf{Compress}_2(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ -&\Pr [\textsf{Decompress}_3(\textsf{Compress}_3(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned}}\right| \le 2^{-\kappa }. $$

Proof

We first note two things

  1. Whenever a correct element \((d,m_d)\) is peeled, the resulting matrices \((\boldsymbol{C'},\boldsymbol{M'},\boldsymbol{K'},\boldsymbol{D})\) are identical to the scenario where \(m_d = \textsf{Dec}(\textsf{sk},c_d) = 0\) and all other messages are unchanged.

  2. In the third hybrid scheme only correct elements are peeled.

The first observation follows because a correct peeling removes a message from the relevant cells by subtracting the corresponding values, which is equivalent to not adding them in the first place, which is exactly what happens if the message is zero. The second observation follows because we correctly keep track of the number of non-zero elements in each cell and only peel those where a single non-zero element remains. By these observations, at any point during the execution of the decompression loop, there exists a vector \(\boldsymbol{m}' \in \mathbb {F}_q^n\), such that for all \((i,j)\)

$$\boldsymbol{K'}[i,j] := {\left\{ \begin{array}{ll}d &{} \text {if } \frac{\sum _{\iota \in I_i}k_\iota m_\iota }{\sum _{\iota \in I_i} m_\iota } = k_d\\ \bot &{} \text {otherwise}\end{array}\right. }$$

where \(I_i\) denotes the set of indices of the non-zero messages in \(\boldsymbol{m}'\) that are mapped to cell \((i,j)\).

We denote by \(E_{r,i,j}\) the event that before the r-th iteration of the main loop of \(\textsf{Decompress}_2\), it holds that \(C[i,j]>1\) but \(K[i,j]\ne \bot \). Note that \(\textsf{Decompress}_2\) and \(\textsf{Decompress}_3\) behave identically unless at least one of the events \(E_{r,i,j}\) for \((r,i,j)\in [n]\times [\gamma ]\times [8t]\) occurs.

Therefore by a union bound and Corollary 14

$$\begin{aligned} \Pr [\exists (r,i,j).\ E_{r,i,j}] \le \sum _{(r,i,j)\in [n]\times [\gamma ]\times [8t]} \frac{n^2+n}{|K|} = \frac{8t\gamma (n^3+n^2)}{|K|} \le 2^{-\kappa }. \end{aligned}$$

\(\square \)

Fig. 8.
figure 8

The fourth hybrid scheme works exactly as the third hybrid scheme, except that the matrix \(\boldsymbol{D}\), which before contained the indices of the messages whenever they could be inferred from the matrix \(\boldsymbol{K}\), is now maintained as a sum of the indices of all messages mapped to the cell. This means that whenever \(\boldsymbol{C}[i,j]=1\), \(\boldsymbol{D}[i,j]\) contains the index of the single non-zero message mapped to cell \((i,j)\).

Claim 21

It holds that

$$ \begin{aligned}&\Pr [\textsf{Decompress}_3(\textsf{Compress}_3(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ =&\Pr [\textsf{Decompress}_4(\textsf{Compress}_4(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned} $$

Proof

The only difference between the two hybrids could occur if, when peeling a message from cell \((i,j)\), the content of \(\boldsymbol{D}[i,j]\) differed between the two hybrids. However, this is not possible, since in hybrid three we have

$$\begin{aligned} \begin{aligned}&\boldsymbol{D}[i,j] = \textsf{DummyIndex}(\boldsymbol{k},\frac{\boldsymbol{K}[i,j]}{\boldsymbol{M}[i,j]})\\=&\textsf{DummyIndex}(\boldsymbol{k},\frac{k_d\cdot m_d}{m_d}) =\textsf{DummyIndex}(\boldsymbol{k},k_d) = d\end{aligned} \end{aligned}$$

just as in hybrid four.    \(\square \)

The fifth hybrid is identical to the fourth hybrid except that the now unnecessary matrix \(\boldsymbol{K}\) is removed. The following claim trivially follows from the fact that \(\boldsymbol{K}\) is not used in either hybrid.

Claim 22

It holds that

$$ \begin{aligned}&\Pr [\textsf{Decompress}_4(\textsf{Compress}_4(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ =&\Pr [\textsf{Decompress}_5(\textsf{Compress}_5(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned} $$
Fig. 9.
figure 9

The sixth hybrid scheme works exactly as the fifth hybrid scheme, except that instead of using a pseudorandom function to derive j from (id), it uses a truly random function h given as an oracle.

Claim 23

If the pseudorandom function used in the construction is secure, then it holds that

$$ \left| \begin{aligned}&\Pr [\textsf{Decompress}_5(\textsf{Compress}_5(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ -&\Pr [\textsf{Decompress}_6(\textsf{Compress}_6(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\end{aligned}\right| \le \textsf{negl}(\lambda ). $$

Proof

The only difference between the two hybrids is the use of the pseudorandom function in the fifth hybrid and of \(h(\cdot ,\cdot )\) in the sixth hybrid. Thus, the claim follows from a straightforward reduction that, given access to an oracle o, executes \(\textsf{Decompress}^{o(\cdot ,\cdot )}_6(\textsf{Compress}^{o(\cdot ,\cdot )}_6(\textsf{Dec}(\textsf{sk},\boldsymbol{c})))\) and outputs 0 if the result equals \(\textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))\) and 1 otherwise.    \(\square \)

Claim 24

It holds for any vector of ciphertexts whose underlying plaintext vector has Hamming weight at most t that

$$\begin{aligned} \Pr [\textsf{Decompress}_6(\textsf{Compress}_6(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))] \ge 1 - \mathcal {O}({2^{-\kappa }}) \end{aligned}$$

Proof

Denote \(\boldsymbol{m} := \textsf{Dec}(\textsf{sk},\boldsymbol{c})\), \(S := \{(d,m') \mid m'=m_d\}\) and \(S_{\ne 0} := \{(d,m')\in S \mid m'\ne 0\} = \textsf{sparse}(\boldsymbol{m})\). Since \(\boldsymbol{m}\) has Hamming weight at most t, we have that \(|S_{\ne 0}|\le t\).

Comparing hybrid six with the definition of an IBLT in Fig. 1, and keeping in mind Remark 1, we can observe that what \(\textsf{Compress}_6\) actually outputs is simply an IBLT containing all elements of \(S_{\ne 0}\), using hash functions \(h(i,\cdot )\). Further, \(\textsf{Decompress}_6\) is in fact the same as \(\textsf{List}\), with \(\textsf{Update}_6\) being identical to \(\textsf{Peel}\). Since \(|S_{\ne 0}|\le t\) and random functions are t-wise independent, it thus holds by Theorem 6 that

$$\begin{aligned} \Pr [\textsf{Decompress}_6(\textsf{Compress}_6(\textsf{Dec}(\textsf{sk},\boldsymbol{c}))) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))] \ge 1 - \mathcal {O}({2^{-\kappa }}). \end{aligned}$$

\(\square \)

By combining all of the above claims and using the triangle inequality it follows that

$$\begin{aligned}&\Pr [\textsf{Decompress}(\textsf{sk}, \textsf{Compress}(\textsf{pk}, \boldsymbol{c})) = \textsf{sparse}(\textsf{Dec}(\textsf{sk}, \boldsymbol{c}))]\\ \ge&1-\mathcal {O}({2^{-\kappa }}) - \textsf{negl}(\lambda )\end{aligned}$$

as claimed. \(\square \)