1 Introduction

Format preserving encryption Suppose that we have a database that stores credit card numbers for a large number of customers, and for security reason, we would like to encrypt all of the credit card numbers. If we take a straightforward approach of using any well known block cipher such as AES, each credit card number, being 16-digits long, should be transformed into a 128-bit plaintext (by adding some dummy information), and then encrypted as a ciphertext of the same length. In order to accommodate all the ciphertexts as 128-bit strings, the database should be largely modified, causing a significant amount of extra cost. With this consideration, it would be desirable to encrypt the credit card numbers into ciphertexts of the same format, namely 16-digit numbers. This problem, called format preserving encryption, does not allow any solution as straightforward as one might expect. One should either design a novel mode of operation in order to use a block cipher operating on large-sized blocks such as AES [2,3,4], or construct a (dedicated) small block cipher from scratch.

Card shuffle-based encryption Focusing on dedicated constructions, a (balanced) Feistel cipher, for example, might not be a satisfactory solution, at least from the point of view of provable security: no matter how carefully designed, the resulting block cipher provides only n/2-bit security for block size n [12, 13]. This level of security might be acceptable for a large block size n, but not for a small one. Credit card numbers of 16 digits in the above example can be represented by approximately 54 bits, and a 27-bit security level would be too low. In the search for an alternative block cipher structure to address this problem, card shuffle algorithms, which have a long history in probability theory, have begun to attract renewed interest. A card shuffle can be viewed as an encryption scheme when we regard the final position of a card at the end of the shuffle as the ciphertext of the initial position of that card.

In order for a card shuffle to yield a computationally feasible block cipher, it should be oblivious, namely one should be able to trace the trajectory of a single card without keeping track of the other cards in the deck. The Thorp shuffle is a well-known example of an oblivious card shuffle, where one first cuts a deck of cards into two equal piles, and then starts dropping the cards from either the left or the right hand with probability 1/2 [16]. Interpreted as a block cipher, a perfect matching is fixed on the set of positions for each round, and the two cards on each match are swapped or not according to a random coin of probability 1/2, or equivalently according to the evaluation of a single-bit random function at the match. A representative of each match might be defined as the maximum of the two positions of the match. From this cryptographic point of view, the Thorp shuffle operating on \(\{0,1\} ^n\) has been proved to be secure up to \(2^n/n\) queries for \(O(n^2)\) rounds [10].
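
The swap-or-not view described above can be sketched in a few lines; here the single-bit random function is modeled by lazily sampled coin flips, the matching pairs position x with x + N/2, and the riffle performed by the full shuffle is omitted, so this is an illustration of the round structure rather than a faithful implementation of [16]:

```python
import random

def thorp_round_swaps(deck, coin):
    """One round in the swap-or-not view of the Thorp shuffle: position x
    is matched with x + N/2, and the two cards on each match are swapped
    iff the single-bit function `coin` evaluates to 1 at the match
    (indexed here by its larger position)."""
    n = len(deck)                # n must be even
    half = n // 2
    out = list(deck)
    for x in range(half):
        if coin(x + half):       # single-bit random function at the match
            out[x], out[x + half] = out[x + half], out[x]
    return out

rng = random.Random(2024)
bits = {}
def coin(rep):
    # lazily sampled random bit per match representative
    return bits.setdefault(rep, rng.getrandbits(1))

deck = list(range(8))
shuffled = thorp_round_swaps(deck, coin)
```

Iterating such rounds with fresh coins yields the full shuffle.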

Afterwards, a randomized variant of the Thorp shuffle, named swap-or-not, was proposed [8]. In this shuffle, the perfect matching is chosen anew in each round by an additional round key; a round key \(K\in \{0,1\} ^n\) defines a perfect matching on \(\{0,1\} ^n\) by the difference K, namely position \(x\in \{0,1\} ^n\) is matched with \(x\oplus K\). A single-bit round function is then applied to each pair \(\{x,x\oplus K\}\), and the cards at the two positions are swapped or not according to the round function value. This improves the threshold number of queries significantly, up to \((1-\varepsilon )2^{n}\) for any \(\varepsilon >0\) with O(n) rounds. Precisely, the adversarial distinguishing advantage is upper bounded by

$$\begin{aligned} \frac{8N^{3/2}}{r+4}\left( \frac{q+N}{2N}\right) ^{r/4+1} \end{aligned}$$

for the r-round swap-or-not shuffle, where N and q denote the size of the domain and the number of queries, respectively. However, it still requires a large number of rounds to achieve a sufficient level of security, for example, more than 700 rounds for domain size \(2^{32}\) and threshold number of queries \(q=2^{31}\).
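
A minimal sketch of the swap-or-not shuffle itself might look as follows, with the round functions modeled as lazily sampled random bits indexed by the canonical representative \(\max (x, x\oplus K)\) of each pair (a sketch only; [8] derives the round functions from keys):

```python
import random

def swap_or_not(x, round_keys, round_bits):
    """Encrypt position x in {0,1}^n with the swap-or-not shuffle.
    round_keys[t] is the t-th round key K; round_bits[t] maps the
    canonical representative max(x, x ^ K) of each pair to the single
    bit deciding whether the pair is swapped."""
    for K, f in zip(round_keys, round_bits):
        partner = x ^ K
        if f.setdefault(max(x, partner), random.getrandbits(1)):
            x = partner
    return x

n = 4
random.seed(7)
keys = [random.randrange(2 ** n) for _ in range(6)]
bits = [{} for _ in keys]
ciphertexts = [swap_or_not(x, keys, bits) for x in range(2 ** n)]
```

Since every round is an involution, applying the rounds in reverse order with the same keys and bits decrypts.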

1.1 Our results

Partition-and-mix In this work, we naturally extend the swap-or-not shuffle by replacing the perfect matching used in each round of swap-or-not with a certain uniform keyed partition. Formally, fix a domain \([N]=\{0,\ldots ,N-1\}\) for \(N>0\), a block size D such that N is a multiple of D, and a key space \(\mathcal {K}\). Let \((\mathcal {B}_K)_{K\in \mathcal {K}}\) be a keyed partition of [N], where each key \(K\in \mathcal {K}\) defines a partition of [N]

$$\begin{aligned} \mathcal {B}_K=\{B_K^1,B_K^2,\ldots ,B_K^{\frac{N}{D}}\} \end{aligned}$$

such that \(|B_K^i|=D\) for \(i=1,\ldots ,N/D\) and \(\bigcup _{i=1}^{\frac{N}{D}}B_K^i=[N]\). For \(\varepsilon >0\), we will say the keyed partition \((\mathcal {B}_K)_{K\in \mathcal {K}}\) is \(\varepsilon \)-almost D-uniform if for every subset \(U\subseteq [N]\) such that \(|U|=D\)

$$\begin{aligned} \Pr \left[ {K \leftarrow _{\$} \mathcal {K}:U\in \mathcal {B}_K}\right] \le \frac{1+\varepsilon }{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }. \end{aligned}$$

Remark 1

Fix a subset \(U\subseteq [N]\) of size D, and any single element a of U. When a partition into blocks of size D is chosen uniformly at random from the set of all possible such partitions, the \(D-1\) other elements of the block containing a are uniformly distributed over the \((D-1)\)-subsets of \([N]\setminus \{a\}\). The probability that they are exactly \(U{\setminus } \{a\}\) is \(1/\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) \). In other words, for a uniformly random partition into blocks of size D, the probability of having U as a block is exactly \(1/\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) \) for any subset \(U\subseteq [N]\) of size D.
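
For small parameters the remark can be verified exhaustively; the enumeration below is purely illustrative and not part of the construction:

```python
from itertools import combinations
from math import comb

def partitions_into_blocks(elems, D):
    """Yield all partitions of the list `elems` into blocks of size D,
    each block represented as a sorted tuple."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for others in combinations(rest, D - 1):
        block = (first,) + others          # the block containing `first`
        remaining = [e for e in rest if e not in others]
        for tail in partitions_into_blocks(remaining, D):
            yield [block] + tail

N, D = 6, 2
U = (0, 1)                                  # a fixed size-D subset
parts = list(partitions_into_blocks(list(range(N)), D))
hits = sum(U in p for p in parts)
# a uniformly random partition has U as a block with probability 1/C(N-1, D-1)
assert hits * comb(N - 1, D - 1) == len(parts)
```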

Given an almost uniform keyed partition \((\mathcal {B}_K)_{K\in \mathcal {K}}\), the next step is to define an independent random permutation

$$\begin{aligned} \sigma _K^{i,t}:B_K^i\rightarrow B_K^i \end{aligned}$$

for each key \(K\in \mathcal {K}\), \(i=1,\ldots ,N/D\) and \(t=1,\ldots ,r\). Then the t-th round \(\varPsi _t\) of the partition-and-mix shuffle, \(t=1,\ldots ,r\), is defined as

$$\begin{aligned} \varPsi _t(a)=\sigma _{K_t}^{i,t}(a) \end{aligned}$$
(1)

for each \(a\in [N]\), where \(K_t\in \mathcal {K}\) is the t-th round key and \(i\in \{1,\ldots ,N/D\}\) is the index such that \(a\in B_{K_t}^i\). Finally, the r-round partition-and-mix shuffle is defined as

$$\begin{aligned} \textsf{PM}^{r} \mathrel {\mathop =^\textrm{def}} \varPsi _r\circ \cdots \circ \varPsi _1. \end{aligned}$$

As the entire domain is partitioned into blocks of larger size \(D\ge 2\) compared to the swap-or-not shuffle, and all the elements in each block are uniformly mixed, it is natural to expect a faster mixing time, or equivalently a smaller number of rounds for a given level of security. We remark that the swap-or-not shuffle can be viewed as an instantiation of the partition-and-mix shuffle with \(D=2\) and \(\mathcal {B}_K=\{\{x,x\oplus K\}:x\in \{0,1\}^n\}\).
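
One round of the partition-and-mix shuffle can be sketched as follows, with the mixing permutations \(\sigma _{K}^{i,t}\) modeled by fresh uniform shuffles of each block's contents (a sketch only; a real instantiation would derive both the partition and the permutations from keys):

```python
import random

def pm_round(state, partition, rng):
    """One partition-and-mix round: apply an independent uniformly random
    permutation to the cards sitting on each block of the partition.
    `state` maps each position in the domain to a card; `partition` is a
    list of disjoint position lists covering the domain."""
    out = dict(state)
    for block in partition:
        cards = [state[pos] for pos in block]
        rng.shuffle(cards)                   # an independent sigma per block
        for pos, card in zip(block, cards):
            out[pos] = card
    return out

# swap-or-not as the special case D = 2 with blocks {x, x XOR K}
rng = random.Random(5)
n, K = 3, 0b101
partition = [[x, x ^ K] for x in range(2 ** n) if x < (x ^ K)]
state = {x: x for x in range(2 ** n)}
state = pm_round(state, partition, rng)
```

The last four lines instantiate the blocks as the pairs \(\{x, x\oplus K\}\), recovering one swap-or-not round.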

The main contribution of this work is to prove the security of the partition-and-mix shuffle; for \(\textsf{PM}^{r}\), we will prove

$$\begin{aligned} {{\textbf {Adv}}}_{\textsf{PM}^r}^{\textrm{cca}}(q)\le \frac{4\left( 1+\varepsilon \right) ^{\frac{r}{4}}N^{\frac{r}{4}+\frac{1}{2}}}{(r-4)D^{\frac{r}{4}}(N-q)^{\frac{r}{4}-1}}. \end{aligned}$$

In particular, if \(q=(1-\delta )N\) for \(\delta >0\), then we have

$$\begin{aligned} {{\textbf {Adv}}}_{\textsf{PM}^r}^{\textrm{cca}}((1-\delta )N)\le \frac{4\delta N^{\frac{3}{2}}}{r-4}\left( \frac{1+\varepsilon }{\delta D}\right) ^{\frac{r}{4}}. \end{aligned}$$

So, for a fixed number of adversarial queries, the number of rounds is multiplied by a factor of \(\frac{1}{\log D -\log (1+\varepsilon )}\) (with logarithms to base 2) compared to the swap-or-not shuffle.

Uniform set partition In practice, the efficiency of the partition-and-mix shuffle depends on the instantiation of the keyed partition, and it seems of independent interest to find keyed partitions that allow efficient implementation. In this work, we propose two constructions of almost uniform keyed partitions.

The first construction uses binary Hamming codes. For each integer \(s\ge 2\), there is a binary Hamming code, denoted \(\mathcal {C}_s\), with block length \(2^s-1\) and message length \(2^s-s-1\). In other words, \(\mathcal {C}_s\) is a \((2^s-s-1)\)-dimensional subspace of \(\{0,1\} ^{2^s-1}\). Since binary Hamming codes are perfect, for any \(\textbf{x}\in \{0,1\} ^{2^s-1}\) there is exactly one codeword \(\textbf{c}\in \mathcal {C}_s\) whose Hamming distance to \(\textbf{x}\) is at most one. So the balls of radius one centered at the codewords partition the entire set \(\{0,1\} ^{2^s-1}\). With this observation, for \(n\ge 2^s-1\) and \(D=2^s\), we can construct an almost D-uniform keyed partition on \(\{0,1\} ^n\) by the following recipe.

  1.

    Linearly independent keys \(K_1,\ldots ,K_{D-1}\in \{0,1\} ^n\) are chosen uniformly at random. Then for a subspace

    $$\begin{aligned} V=\langle K_1,\ldots ,K_{D-1}\rangle \end{aligned}$$

    the entire domain \(\{0,1\} ^n\) is partitioned into the cosets of V.

  2.

    Each coset can be identified with \(\{0,1\} ^{D-1}\). For example, one might choose a representative \(\textbf{a}\) for each coset, and define a bijection from \(\{0,1\} ^{D-1}\) to the coset by mapping

    $$\begin{aligned} \textbf{e}=(e_1,\ldots ,e_{D-1})\in \{0,1\} ^{D-1} \mapsto \textbf{a}+e_1K_1+\cdots +e_{D-1}K_{D-1}. \end{aligned}$$
  3.

    A \([2^s-1,2^s-s-1,3]\)-Hamming code \(\mathcal {C}_s\) and an additional round key

    $$\begin{aligned} \textbf{b}=(b_1,\ldots ,b_{D-1})\in \{0,1\} ^{D-1} \end{aligned}$$

    define a partition of the set \(\{0,1\} ^{D-1}\), and hence of each coset of V in \(\{0,1\}^n\), as follows.

    $$\begin{aligned} \{0,1\} ^{D-1}=\bigcup _{\textbf{c}\in \mathcal {C}_s}\{\textbf{c}+\textbf{b}+\textbf{e}:\textbf{wt}(\textbf{e})\le 1\}. \end{aligned}$$

This keyed partition is shown to be \(\varepsilon \)-almost D-uniform for \(\varepsilon =2^{D-n}\). We will discuss in detail the properties and the instantiation of keyed partitions based on Hamming codes in Sects. 4 and 5.
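
Step 3 of the recipe can be checked concretely for \(s=3\), with the key \(\textbf{b}\) set to zero and the coset structure of steps 1-2 omitted for brevity: the radius-one balls around the codewords of the [7, 4, 3] Hamming code partition \(\{0,1\}^7\) into blocks of size \(D=8\).

```python
from itertools import product

s = 3
n = 2 ** s - 1                       # block length 7, D = 2**s = 8

def syndrome(v):
    """Syndrome of v under the parity-check matrix whose i-th column
    (1-based) is the binary representation of i."""
    t = 0
    for i, bit in enumerate(v, start=1):
        if bit:
            t ^= i
    return t

space = list(product((0, 1), repeat=n))
code = [v for v in space if syndrome(v) == 0]   # the 16 codewords of [7,4,3]

# radius-1 Hamming balls around the codewords (step 3 with b = 0)
blocks = []
for c in code:
    ball = {c}
    for i in range(n):
        flipped = list(c)
        flipped[i] ^= 1
        ball.add(tuple(flipped))
    blocks.append(ball)
```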

Our second construction is recursive: for a block size \(D>0\), one can construct a D-uniform keyed partition of \(X\times Y\) from a D-uniform keyed partition of X and a D-wise independent function family from X to Y. Recall that a function family \((f_K)_{K\in \mathcal {K}_2}\) is D-wise independent if for any distinct \(x_1,\ldots ,x_D\in X\) and any (not necessarily distinct) \(y_1,\ldots ,y_D\in Y\), the probability that \(f_K(x_i)=y_i\) for all \(i=1,\ldots ,D\) is the same, namely \(1/|Y|^D\), over the random choice of the key \(K\in \mathcal {K}_2\).

Let \((\mathcal {B}'_K)_{K\in \mathcal {K}_1}\) be an \(\varepsilon \)-almost D-uniform keyed partition of X and let Y be an additive group. For a pair of keys \(K=(K_1,K_2)\in \mathcal {K}_1\times \mathcal {K}_2\), let

$$\begin{aligned} \mathcal {B}_K=\left\{ \{(x,f_{K_2}(x)+c):x\in B\}:B\in \mathcal {B}'_{K_1},\ c\in Y\right\} . \end{aligned}$$

In Sect. 4, we prove that \((\mathcal {B}_K)_{K\in \mathcal {K}}\) is an \(\varepsilon '\)-almost D-uniform keyed partition of \(X\times Y\) for

$$\begin{aligned} \varepsilon '=\varepsilon +\frac{D^2}{|X|}+\frac{\varepsilon D^2}{|X|}. \end{aligned}$$

A D-wise independent function family is typically instantiated by a polynomial of degree at most \(D-1\) over a finite field. This construction might be particularly useful when the domain size is not a power of two: for example, if we want to encrypt data (such as credit card numbers) within the domain \(\{0,\ldots ,9\}^{16}\), then we can decompose the domain as \(\{0,\ldots ,9\}^{16}\cong X\times Y\), where \(X=\{0,1\} ^{16}\) and \(Y=\{0,1,2,3,4\}^{16}\). We might then use an almost uniform keyed partition on X based on a binary Hamming code and any D-wise independent function family from X to Y to obtain an almost uniform keyed partition of \(X\times Y\).
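
The polynomial construction can be sanity-checked exhaustively for tiny, illustrative parameters (\(p=5\), \(D=2\)): over a uniformly random coefficient vector, the values of the polynomial at any two distinct points are uniformly distributed on \(\mathrm {GF}(p)^2\).

```python
from itertools import product

def poly_eval(key, x, p):
    """Evaluate the polynomial with coefficient vector `key`
    (constant term first) at x over GF(p), via Horner's rule."""
    acc = 0
    for c in reversed(key):
        acc = (acc * x + c) % p
    return acc

p, D = 5, 2
x1, x2 = 1, 3                                # any two distinct points
counts = {}
for key in product(range(p), repeat=D):      # all p**D keys
    pair = (poly_eval(key, x1, p), poly_eval(key, x2, p))
    counts[pair] = counts.get(pair, 0) + 1
```

Each of the \(p^2\) output pairs is hit by exactly one key, which is D-wise independence in its sharpest form.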

Comparison Figure 1 compares the upper bounds on the distinguishing advantages of the swap-or-not shuffle and the partition-and-mix shuffle based on an 8-uniform keyed partition, for domain size \(N=2^{32}\) and threshold number of queries \(q=N/2\). In this example, the partition-and-mix shuffle requires a family of random 3-bit permutations, while it provides the same level of security with approximately one fourth of the number of rounds needed for the swap-or-not shuffle. Details on the instantiation of the partition-and-mix shuffle and its efficiency are discussed in Sect. 5.

Fig. 1

Upper bounds on distinguishing advantages for the swap-or-not shuffle (in a dashed line) and the partition-and-mix shuffle (in a solid line) for \(n=32\), \(q=2^{31}\) given as a function of the number of rounds. The \(\textsf{PM}\) shuffle is based on a uniform keyed partition using a binary [7, 4, 3]-Hamming code

1.2 Related work

The swap-or-not and partition-and-mix shuffles asymptotically guarantee security only up to \((1-\varepsilon )N\) queries for any \(\varepsilon >0\), not up to all N possible queries for domain size N. In [14], a new approach, called mix-and-cut, has been proposed that turns one shuffle into another: a deck of cards is randomly separated into two piles, and the shuffle algorithm is applied independently to each of the two piles. Within this framework, one obtains a shuffle achieving full security by repeatedly applying the swap-or-not shuffle \(O(\log ^2 N)\) times. This approach has been further improved in [11], which slightly modifies mix-and-cut and shows that applying the underlying shuffle to only one of the two piles is enough to achieve full security. This framework, named sometimes-recurse, requires only \(O(\log N)\) applications of the shuffle on average, significantly improving the efficiency over mix-and-cut.

As another line of research on block cipher construction, a substitution-permutation network has been modeled as an iterated Even-Mansour cipher. The original single-round construction is secure only up to the birthday bound [7]. Iteration naturally enhances security, and indeed the r-round Even-Mansour cipher on \(\{0,1\} ^n\) has been proved secure up to \(2^{\frac{rn}{r+1}}\) queries [5]. We note, however, that this security model is incomparable to ours: the Even-Mansour construction uses, as its underlying primitives, independent random permutations of the same size as the entire construction, and the adversary is allowed to make queries to these inner permutations.

The partition-and-mix shuffle might be viewed as a mode of operation that extends the domain of a small block cipher operating on each block of the partition. The small block cipher might be constructed from a perfect random number generator, which in turn can be constructed from any robust block cipher such as AES [15]. The domain extension of an ideal cipher has also been studied in [6], where a 3-round Feistel cipher is proved to be a secure domain extender of an ideal cipher within the indifferentiability framework, while 2 rounds are enough to obtain a domain extender of a tweakable block cipher in the standard model.

2 Preliminaries

Notation For a fixed domain size \(N>0\), the set of all permutations on [N] will be denoted \(\mathcal {P} \). For a set T and an integer \(s\ge 1\), \(T^{*s}\) denotes the set of all sequences that consists of s pairwise distinct elements of T. For integers \(1\le s\le t\), we will write \((t)_s=t(t-1)\cdots (t-s+1)\). If \(|T|=t\), then \((t)_s\) becomes the size of \(T^{*s}\).

For a binary string \(\textbf{w}\), the number of its nonzero components is called the weight of \(\textbf{w}\), denoted \(\textbf{wt}(\textbf{w})\). For an element \(x\in \{0,1,\dots ,2^{s}-1\}\), let \(\langle x \rangle _s \in \{0,1\}^s\) denote the binary representation of x, namely, an s-bit string \((a_1,\ldots ,a_s)\in \{0,1\}^s\) such that \(x=2^{s-1}a_{s}+\dots +2a_2+a_1\), and let \(\textbf{e}(x)\) denote a \((2^s-1)\)-bit string \((b_1,\ldots ,b_{2^{s}-1}) \in \{0,1\}^{2^s-1}\) such that \(b_i=1\) if \(i=x\), and \(b_i=0\) otherwise. So we have \(\textbf{wt}(\textbf{e}(x))=0\) if \(x=0\), and \(\textbf{wt}(\textbf{e}(x))=1\) otherwise.

Hamming code An \([n,k,d]_{2^e}\) linear error-correcting code \(\mathcal {C}\) is a k-dimensional subspace of \(\mathbb {F}^n_{2^e}\) with minimum distance d, where \(\mathbb {F}_{2^e}\) denotes a finite field of order \(2^e\). An \([n,k,d]_{2^e}\) code \(\mathcal {C}\) can be represented by a \(k\times n\) generator matrix G over \(\mathbb {F}_{2^e}\), where every codeword of \(\mathcal {C}\) is expressed as a linear combination of the row vectors of G, namely \(w\cdot G\) for some \(w\in \mathbb {F}^k_{2^e}\).

Hamming codes are a family of \([2^s-1,2^s-s-1,3]_{2}\) codes, where \(s\ge 2\). For each Hamming code, the balls of Hamming radius one centered at the codewords exactly fill the entire space \(\{0,1\}^{n}\), where \(n=2^s-1\).

D-wise independent function family Let \((f_K)_{K\in \mathcal {K}}\) be a family of functions from X to Y with key space \(\mathcal {K}\). For a positive integer D, \((f_K)_{K\in \mathcal {K}}\) is called D-wise independent if for any distinct \(x_1,\ldots ,x_D\in X\) and any (not necessarily distinct) \(y_1,\ldots ,y_D\in Y\), the probability that \(f_K(x_i)=y_i\) for every \(i=1,\ldots ,D\) is \(1/|Y|^D\) over the random choice of the key \(K\in \mathcal {K}\).

Security definition Let \(\textsf{E}\) be a block cipher on [N] that employs \(\lambda \)-bit keys, so that each key \(\textbf{k}\in \{0,1\} ^{\lambda }\) defines a permutation \(\textsf{E}_{\textbf{k}}\) on [N]. In the adaptive chosen-ciphertext attack indistinguishability (CCA-IND) model, an adversary \(\mathcal {A}\) adaptively makes forward and backward queries to either a permutation P or the block cipher \(\textsf{E}_{\textbf{k}}\), trying to tell the two apart, where \(\textsf{E}_{\textbf{k}}\) uses a random secret key \(\textbf{k}\) and P is chosen uniformly at random from \(\mathcal {P} \). The distinguishing advantage of \(\mathcal {A}\) is formally defined by

$$\begin{aligned} {{\textbf {Adv}}}^{\textrm{cca}}_{\textsf{E}}(\mathcal {A})= \Pr \left[ {P \leftarrow _{\$} \mathcal {P} :\mathcal {A}^{P,P^{-1}}=1}\right] - \Pr \left[ {\textbf{k} \leftarrow _{\$} \{0,1\} ^{\lambda }:\mathcal {A}^{\textsf{E}_{\textbf{k}},\textsf{E}_{\textbf{k}}^{-1}}=1}\right] . \end{aligned}$$

In the non-adaptive chosen-plaintext attack (NCPA) model, an adversary \(\mathcal {A}\) makes only non-adaptive forward queries. The advantage \( {{\textbf {Adv}}}^{\textrm{ncpa}}_{\textsf{E}}(\mathcal {A})\) is similarly defined in this model. For \(\textrm{atk}\in \{\textrm{cca},\textrm{ncpa}\}\), and for \(q>0\), we define

$$\begin{aligned} {{\textbf {Adv}}}_{\textsf{E}}^{\textrm{atk}}(q)=\max _{\mathcal {A}} {{\textbf {Adv}}}_{\textsf{E}}^{\textrm{atk}}(\mathcal {A}) \end{aligned}$$

where the maximum is taken over all \(\textrm{atk}\)-adversaries making at most q queries. If the encryption and decryption algorithms are symmetric in their structures, we can lift the NCPA-security of the block cipher to CCA-security by doubling the number of rounds [9].

Lemma 1

If F and G are block ciphers on the same message space, then for any \(q>0\),

$$\begin{aligned} {{\textbf {Adv}}}_{F\circ G^{-1}}^{\textrm{cca}}(q)\le {{\textbf {Adv}}}_{F}^{\textrm{ncpa}}(q)+ {{\textbf {Adv}}}_{G}^{\textrm{ncpa}}(q). \end{aligned}$$

Total variation distance Given a finite event space \(\Omega \) and two probability distributions \(\mu \) and \(\nu \) defined on \(\Omega \), the total variation distance between \(\mu \) and \(\nu \), denoted \(\Vert \mu -\nu \Vert \), is defined as

$$\begin{aligned} \Vert \mu -\nu \Vert \mathrel {\mathop =^\textrm{def}} \frac{1}{2}\sum _{x\in \Omega }|\mu (x)-\nu (x)|=\max _{S\subset \Omega }\{\mu (S)-\nu (S)\}=\max _{S\subset \Omega }\{\nu (S)-\mu (S)\}. \end{aligned}$$
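
The equality of the two expressions is easy to check by brute force on a small example:

```python
from itertools import chain, combinations

def tv_half_l1(mu, nu):
    """Total variation distance as half the L1 distance."""
    return sum(abs(mu[x] - nu[x]) for x in mu) / 2

def tv_max_subset(mu, nu):
    """Total variation distance as max_S (mu(S) - nu(S)), by brute force
    over all subsets S of the event space."""
    omega = list(mu)
    subsets = chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1))
    return max(sum(mu[x] - nu[x] for x in s) for s in subsets)

mu = {0: 0.5, 1: 0.3, 2: 0.2}
nu = {0: 0.2, 1: 0.3, 2: 0.5}
```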

Useful lemmas For a finite nonempty set \(\Omega \), let \(\mu \) and \(\nu \) be probability distributions supported on q-tuples of elements of \(\Omega \). If the first l elements \(u^*_1,\dots ,u^*_l\) are fixed, for some \(l=0,\ldots ,q-1\), then we can consider the distribution \(\mu \) restricted to the \((l+1)\)-th element, conditioned on \((u^*_1,\dots ,u^*_l)\), namely

$$\begin{aligned} \mu (u | u^*_1,\dots ,u^*_l)= \Pr \left[ {X_{l+1}=u | X_1=u^*_1,\ldots ,X_l=u^*_l}\right] \end{aligned}$$

where \((X_1,\ldots ,X_q)\sim \mu \). The distribution \(\nu (\ \cdot \ |u^*_1,\dots ,u^*_l)\) is similarly defined, and hence

$$\begin{aligned} \Vert \mu (\ \cdot \ | u^*_1,\dots ,u^*_l)-\nu (\ \cdot \ |u^*_1,\dots ,u^*_l)\Vert . \end{aligned}$$

Using this notation, given a set of random variables \((Z_1,\ldots ,Z_q)\), we can define a new random variable

$$\begin{aligned} \Vert \mu (\ \cdot \ | Z_1,\dots ,Z_l)-\nu (\ \cdot \ |Z_1,\dots ,Z_l)\Vert \end{aligned}$$

for \(l=0,\ldots ,q-1\). Then the total variation distance \(\Vert \mu -\nu \Vert \) is upper bounded by the sum of the conditional distances on average as follows.

Lemma 2

Fix a finite nonempty set \(\Omega \) and let \(\mu \) and \(\nu \) be probability distributions supported on q-tuples of elements of \(\Omega \), and suppose that \((Z_1,\ldots ,Z_q)\sim \mu \). Then

$$\begin{aligned} \Vert \mu -\nu \Vert \le \sum _{l=0}^{q-1}\textbf{E}\left( {\Vert \mu (\ \cdot \ | Z_1,\ldots ,Z_l )-\nu (\ \cdot \ | Z_1,\ldots ,Z_l)\Vert }\right) . \end{aligned}$$

Note that the expectation is taken over the set of random variables \((Z_1,\ldots ,Z_q)\).

Using the conventions \(\left( {\begin{array}{c}0\\ 0\end{array}}\right) =1\) and \(\left( {\begin{array}{c}p\\ q\end{array}}\right) =0\) for \(0\le p<q\), the following lemma on binomial coefficients will be also useful later.

Lemma 3

Let a, b, c be positive integers such that \(b\le c\). Then

$$\begin{aligned} \sum _{j=0}^{a}\frac{\left( {\begin{array}{c}b\\ j\end{array}}\right) \left( {\begin{array}{c}c-b\\ a-j\end{array}}\right) }{(j+1)\left( {\begin{array}{c}c\\ a\end{array}}\right) }\le \frac{c+1}{(a+1)(b+1)}. \end{aligned}$$
(2)

Proof

By integrating both sides of

$$\begin{aligned} (1+x)^b=\sum _{j=0}^{b}\left( {\begin{array}{c}b\\ j\end{array}}\right) x^j \end{aligned}$$

we obtain

$$\begin{aligned} \frac{1}{b+1}\left( (1+x)^{b+1}-1\right) =\sum _{j=0}^{b}\frac{1}{j+1}\left( {\begin{array}{c}b\\ j\end{array}}\right) x^{j+1}. \end{aligned}$$

Therefore the left-hand side of (2) is the coefficient of \(x^{a+1}\) in the polynomial

$$\begin{aligned} \frac{1}{\left( {\begin{array}{c}c\\ a\end{array}}\right) }\left( \sum _{j=0}^{b}\frac{1}{j+1}\left( {\begin{array}{c}b\\ j\end{array}}\right) x^{j+1}\right) \sum _{i=0}^{c-b}\left( {\begin{array}{c}c-b\\ i\end{array}}\right) x^i&=\frac{\left( (1+x)^{b+1}-1\right) \left( 1+x\right) ^{c-b}}{(b+1)\left( {\begin{array}{c}c\\ a\end{array}}\right) }\\&=\frac{(1+x)^{c+1}-(1+x)^{c-b}}{(b+1)\left( {\begin{array}{c}c\\ a\end{array}}\right) } \end{aligned}$$

which is upper bounded by the coefficient of \(x^{a+1}\) in

$$\begin{aligned} \frac{(1+x)^{c+1}}{(b+1)\left( {\begin{array}{c}c\\ a\end{array}}\right) }. \end{aligned}$$
(3)

The coefficient of \(x^{a+1}\) in (3) is

$$\begin{aligned} \frac{\left( {\begin{array}{c}c+1\\ a+1\end{array}}\right) }{(b+1)\left( {\begin{array}{c}c\\ a\end{array}}\right) }\le \frac{c+1}{(a+1)(b+1)}. \end{aligned}$$

\(\square \)
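
The inequality of Lemma 3 can be sanity-checked numerically over small parameters; note that Python's `math.comb` already follows the conventions above, returning 0 when the lower index exceeds the upper one:

```python
from math import comb

def lemma3_lhs(a, b, c):
    """Left-hand side of inequality (2)."""
    return sum(comb(b, j) * comb(c - b, a - j) / ((j + 1) * comb(c, a))
               for j in range(a + 1))

def lemma3_rhs(a, b, c):
    """Right-hand side of inequality (2)."""
    return (c + 1) / ((a + 1) * (b + 1))

# check the inequality for all positive a <= c and b <= c up to c = 12
for c in range(1, 13):
    for b in range(1, c + 1):
        for a in range(1, c + 1):
            assert lemma3_lhs(a, b, c) <= lemma3_rhs(a, b, c) + 1e-9
```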

3 Security of the partition-and-mix shuffle

The security of the r-round partition-and-mix shuffle \(\textsf{PM}^r\), defined by an \(\varepsilon \)-almost D-uniform keyed partition \((\mathcal {B}_K)_{K\in \mathcal {K}}\) and a set of independent random permutations \((\sigma _K^{i,t})_{(K,i,t)\in \mathcal {K}\times \{1,\ldots ,\frac{N}{D}\}\times \{1,\ldots ,r\}}\), is summarized in the following theorem.

Theorem 1

Let \(\textsf{PM}^r\) be the r-round partition-and-mix shuffle defined by a keyed partition \((\mathcal {B}_K)_{K\in \mathcal {K}}\) and a set of mixing permutations \((\sigma _K^{i,t})\). If \((\mathcal {B}_K)_{K\in \mathcal {K}}\) is \(\varepsilon \)-almost D-uniform, the \(\sigma _K^{i,t}\) are all independent and uniformly random, and the round keys \(K_1,\ldots ,K_r\) are chosen independently and uniformly at random from \(\mathcal {K}\), then

$$\begin{aligned} {{\textbf {Adv}}}_{\textsf{PM}^r}^{\textrm{cca}}(q)\le \frac{4\left( 1+\varepsilon \right) ^{\frac{r}{4}}N^{\frac{r}{4}+\frac{1}{2}}}{(r-4)D^{\frac{r}{4}}(N-q)^{\frac{r}{4}-1}}. \end{aligned}$$

3.1 Proof of Theorem 1

Fix q distinct elements \(z_1,\ldots ,z_q\in [N]\). For \(j=1,\ldots ,q\) and \(t=1,\ldots ,r\), let \(X_t(j)\) denote the random variable that indicates the position of \(z_j\) at the end of the t-th round of \(\textsf{PM}^r\), namely,

$$\begin{aligned} X_t(j)=\varPsi _t\circ \cdots \circ \varPsi _1(z_j) \end{aligned}$$

where \(\varPsi _1,\ldots ,\varPsi _t\) are as defined in (1). Let \(\tau _t\) be the distribution of

$$\begin{aligned} (X_t(1),\ldots ,X_t(q)) \end{aligned}$$

and let \(\pi \) be the uniform distribution on \([N]^{*q}\). So \(\pi \) is the distribution of q samples taken without replacement from [N]. The core of the security proof is to upper bound the statistical distance \(\Vert \tau _r-\pi \Vert \) for reasonably small r, since this distance is the distinguishing advantage of an NCPA-adversary that makes the q queries \(z_1,\ldots ,z_q\).

Given the first t round keys \(K=(K_1,\ldots ,K_t)\in \mathcal {K}^t\) for \(t=1,\ldots ,r\), we can consider the distribution of \((X_t(1),\ldots ,X_t(q))\) conditioned on the fixed set of partitions \((\mathcal {B}_{K_1},\ldots ,\mathcal {B}_{K_t})\), denoted \(\tau _t^{K}\). Then by the definition of the total variation distance and by the triangle inequality, we have

$$\begin{aligned} \Vert \tau _r-\pi \Vert&=\frac{1}{2}\times \sum _{(u_1,\ldots ,u_q)\in [N]^{*q}}\left| \left( \sum _{K\in \mathcal {K}^r}\frac{1}{|\mathcal {K}|^r}\tau _r^{K}(u_1,\ldots ,u_q)\right) -\pi (u_1,\ldots ,u_q)\right| \nonumber \\&=\frac{1}{2}\times \sum _{(u_1,\ldots ,u_q)\in [N]^{*q}}\left| \sum _{K\in \mathcal {K}^r}\frac{1}{|\mathcal {K}|^r}\left( \tau _r^{K}(u_1,\ldots ,u_q)-\pi (u_1,\ldots ,u_q)\right) \right| \nonumber \\&\le \sum _{K\in \mathcal {K}^r}\frac{1}{|\mathcal {K}|^r}\left( \frac{1}{2}\times \sum _{(u_1,\ldots ,u_q)\in [N]^{*q}}\left| \tau _r^{K}(u_1,\ldots ,u_q)-\pi (u_1,\ldots ,u_q)\right| \right) \nonumber \\&=\textbf{E}\left( {\Vert \tau _r^{K}-\pi \Vert }\right) \end{aligned}$$
(4)

where the expectation is taken over random variable K (regarded as defined on \(\mathcal {K}^r\) with the uniform distribution). Again, by Lemma 2, we have

$$\begin{aligned} \textbf{E}\left( {\Vert \tau _r^{K}-\pi \Vert }\right)&\le \textbf{E}\left( { \sum _{l=0}^{q-1}\textbf{E}\left( {\Vert \tau _r^{K}(\ \cdot \ | X_r(1),\ldots ,X_r(l) )-\pi (\ \cdot \ | X_r(1),\ldots ,X_r(l))\Vert }\right) }\right) \nonumber \\&= \sum _{l=0}^{q-1}\textbf{E}\left( {\Vert \tau _r^{K}(\ \cdot \ | X_r(1),\ldots ,X_r(l) )-\pi (\ \cdot \ | X_r(1),\ldots ,X_r(l))\Vert }\right) \nonumber \\&= \sum _{l=0}^{q-1}\textbf{E}\left( {\Vert \tau _r^{K}(\ \cdot \ | X_r(1),\ldots ,X_r(l) )-\frac{1}{m}\Vert }\right) \end{aligned}$$
(5)

where the last expectation is taken over random variables \(X_r(1),\ldots ,X_r(l)\) and K, and \(m=N-l\). For a fixed \(l=0,\ldots ,q-1\), let

$$\begin{aligned} p_t(a)=\tau _t^{K}(a | X_t(1),\ldots ,X_t(l)). \end{aligned}$$

Then we have

$$\begin{aligned} \Vert \tau _t^{K}(\ \cdot \ | X_t(1),\ldots ,X_t(l) )-\pi (\ \cdot \ | X_t(1),\ldots ,X_t(l))\Vert =\frac{1}{2}\sum _{a\in S_t}|p_t(a)-1/m| \end{aligned}$$

where \(S_t=[N]\setminus \{X_t(1),\ldots ,X_t(l)\}\). By using the inequality \(\textbf{E}\left( {X}\right) ^2\le \textbf{E}\left( {X^2}\right) \) (that holds for any random variable X) and the Cauchy-Schwarz inequality, we have

$$\begin{aligned} \left( \textbf{E}\left( {\sum _{a\in S_t}|p_t(a)-1/m|}\right) \right) ^2\le N\cdot \textbf{E}\left( {\sum _{a\in S_t}(p_t(a)-1/m)^2}\right) . \end{aligned}$$
(6)

Define \(s_t=\sum _{a\in S_t}(p_t(a)-1/m)^2\) for \(t=0,\ldots ,r\). Since the initial positions of the elements \(z_1,\ldots ,z_q\) are deterministic, we have

$$\begin{aligned} \textbf{E}\left( {s_0}\right) =\left( 1-\frac{1}{m}\right) ^2<1. \end{aligned}$$

We will then express \(\textbf{E}\left( {s_{t+1}|s_t}\right) \) as a linear function of \(s_t\) with a small coefficient.

Since \(s_t\) is a random variable determined by \(X_t(1),\ldots ,X_t(l)\) and \(K_1,\ldots ,K_t\), we fix the values of these variables and consider the conditional expectation of \(s_{t+1}\). Given a partition \(\mathcal {B}_{K_{t+1}}\), we need only determine the evolution of \(X_t(1),\ldots ,X_t(l)\) (not of the other elements) in order to determine \(S_{t+1}\). We can then arbitrarily define a bijection

$$\begin{aligned} f:S_t\longrightarrow S_{t+1} \end{aligned}$$

such that \(f(B\cap S_t)=B\cap S_{t+1}\) for every \(B\in \mathcal {B}_{K_{t+1}}\). (This is always possible since \(|B\cap S_t|=|B\cap S_{t+1}|\).) Since

$$\begin{aligned} p_{t+1}(f(a))= {\left\{ \begin{array}{ll} \sum _{u\in B\cap S_t}\frac{p_t(u)}{|B\cap S_t|}, &{} \text {if } a\in B\cap S_t\\ p_t(a), &{} \text {if } a\notin B\cap S_t \end{array}\right. } \end{aligned}$$

for every \(B\in \mathcal {B}_{K_{t+1}}\), it follows that

$$\begin{aligned}&\textbf{E}\left( {s_{t+1}|s_t}\right) \\&=\textbf{E}\left( {\sum _{a\in S_t}(p_{t+1}(f(a))-1/m)^2\ \bigg | \ s_t}\right) \\&=\sum _{a\in S_t}\sum _{\begin{array}{c} U\subset [N]\text { where}\\ a\in U\text { and } |U|=D \end{array}} \Pr \left[ {K_{t+1} \leftarrow _{\$} \mathcal {K}:U\in \mathcal {B}_{K_{t+1}}}\right] \left( \sum _{u\in U\cap S_t}\frac{p_t(u)}{|U\cap S_t|}-\frac{1}{m}\right) ^2\\&\le \left( 1+\varepsilon \right) \sum _{a\in S_t} \sum _{\begin{array}{c} U\subset [N]\text { where}\\ a\in U\text { and } |U|=D \end{array}}\frac{1}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\left( \sum _{u\in U\cap S_t}\frac{p_t(u)}{|U\cap S_t|}-\frac{1}{m}\right) ^2. \end{aligned}$$

For a fixed element \(a\in S_t\), we can choose a set \(U\subset [N]\) such that \(a\in U\) and \(|U|=D\) by the following process.

  1.

    Fix \(i=|U\cap S_t|\), where \(1\le i\le D\).

  2.

    Choose \(V=(U\cap S_t)\setminus \{a\}=\{v_1,\ldots ,v_{i-1}\}\).

  3.

    Choose \(W=U\setminus S_t\subseteq [N]\setminus S_t\) such that \(|W|=D-i\).

  4.

    Define \(U=V\cup W\cup \{a\}\).

Since \(|[N]\setminus S_t|=l\), the number of ways of choosing the set W is \(\left( {\begin{array}{c}l\\ D-i\end{array}}\right) \), and we have

$$\begin{aligned}&\sum _{\begin{array}{c} U\subset [N]\text { where}\\ a\in U\text { and } |U|=D \end{array}}\frac{1}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\left( \sum _{u\in U\cap S_t}\frac{p_t(u)}{|U\cap S_t|}-\frac{1}{m}\right) ^2 \\ {}&=\sum _{i=1}^{D}\frac{\left( {\begin{array}{c}l\\ D-i\end{array}}\right) }{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\sum _{\{v_1,\ldots ,v_{i-1}\}\subset S_t\setminus \{a\}}\left( \frac{p_t(a)+p_t(v_1)+\cdots +p_t(v_{i-1})}{i}-\frac{1}{m}\right) ^2 \\ {}&=\sum _{i=1}^{D}\frac{\left( {\begin{array}{c}l\\ D-i\end{array}}\right) }{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\cdot \frac{1}{i^2(i-1)!}\\ \quad&\times \sum _{(v_1,\ldots ,v_{i-1})\in (S_t\setminus \{a\})^{*(i-1)}}\left( \left( p_t(a)-\frac{1}{m}\right) +\cdots +\left( p_t(v_{i-1})-\frac{1}{m}\right) \right) ^2. \end{aligned}$$

We expand and simplify the inner summation using the following observations.

  1. 1.
    $$\begin{aligned} \sum _{(v_1,\ldots ,v_{i-1})\in (S_t\setminus \{a\})^{*(i-1)}}\left( p_t(a)-\frac{1}{m}\right) ^2=(m-1)_{i-1}\left( p_t(a)-\frac{1}{m}\right) ^2 \mathrel {\mathop =^\textrm{def}} A_1. \end{aligned}$$
  2.

    For \(1\le j\le i-1\), since \(\sum _{v\in S_t}\left( p_t(v)-\frac{1}{m}\right) =0\),

    $$\begin{aligned}&\sum _{(v_1,\ldots ,v_{i-1})\in (S_t\setminus \{a\})^{*(i-1)}}\left( p_t(a)-\frac{1}{m}\right) \left( p_t(v_j)-\frac{1}{m}\right) \\&=(m-2)_{i-2}\left( p_t(a)-\frac{1}{m}\right) \sum _{v\in S_t\setminus \{a\}}\left( p_t(v)-\frac{1}{m}\right) \\&=-(m-2)_{i-2}\left( p_t(a)-\frac{1}{m}\right) ^2 \mathrel {\mathop =^\textrm{def}} A_2 \end{aligned}$$

    where we assume \(m,i\ge 2\).

  3.

    For \(1\le j\le i-1\),

    $$\begin{aligned}&\sum _{(v_1,\ldots ,v_{i-1})\in (S_t\setminus \{a\})^{*(i-1)}}\left( p_t(v_j)-\frac{1}{m}\right) ^2\\&=(m-2)_{i-2}\sum _{v\in S_t\setminus \{a\}}\left( p_t(v)-\frac{1}{m}\right) ^2\\&=(m-2)_{i-2}\left( s_t-\left( p_t(a)-\frac{1}{m}\right) ^2\right) \mathrel {\mathop =^\textrm{def}} A_3 \end{aligned}$$

    where we assume \(m,i\ge 2\).

  4.

    For \(1\le j<h\le i-1\),

    $$\begin{aligned}&\sum _{(v_1,\ldots ,v_{i-1})\in (S_t\setminus \{a\})^{*(i-1)}}\left( p_t(v_j)-\frac{1}{m}\right) \left( p_t(v_h)-\frac{1}{m}\right) \\&=(m-3)_{i-3}\left( \left( \sum _{v\in S_t\setminus \{a\}}\left( p_t(v)-\frac{1}{m}\right) \right) ^2-\sum _{v\in S_t\setminus \{a\}}\left( p_t(v)-\frac{1}{m}\right) ^2\right) \\&=(m-3)_{i-3}\left( \left( p_t(a)-\frac{1}{m}\right) ^2-\sum _{v\in S_t\setminus \{a\}}\left( p_t(v)-\frac{1}{m}\right) ^2\right) \\&=(m-3)_{i-3}\left( 2\left( p_t(a)-\frac{1}{m}\right) ^2-s_t\right) \mathrel {\mathop =^\textrm{def}} A_4 \end{aligned}$$

    where we assume \(m,i\ge 3\).

Since

$$\begin{aligned} \sum _{a\in S_t}A_1&=(m-1)_{i-1}s_t,\\ \sum _{a\in S_t}A_2&=-(m-2)_{i-2}s_t,\\ \sum _{a\in S_t}A_3&=(m-2)_{i-2}\left( ms_t-s_t\right) =(m-1)_{i-1}s_t,\\ \sum _{a\in S_t}A_4&=(m-3)_{i-3}\left( 2s_t-ms_t\right) =-(m-2)_{i-2}s_t, \end{aligned}$$

we have

$$\begin{aligned} \sum _{a\in S_t}\left( A_1+2(i-1)A_2+(i-1)A_3+(i-1)(i-2)A_4\right) = i(m-i)(m-2)_{i-2}s_t, \end{aligned}$$

and hence

$$\begin{aligned} \textbf{E}\left( {s_{t+1}|s_t}\right)&\le \left( 1+\varepsilon \right) \sum _{i=1}^{D}\frac{\left( {\begin{array}{c}l\\ D-i\end{array}}\right) }{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\cdot \frac{1}{i^2(i-1)!}\nonumber \\&\quad \times \sum _{a\in S_t}\left( A_1+2(i-1)A_2+(i-1)A_3+(i-1)(i-2)A_4\right) \nonumber \\&=\left( 1+\varepsilon \right) \sum _{i=1}^{D}\frac{\left( {\begin{array}{c}l\\ D-i\end{array}}\right) \cdot i(m-i)(m-2)_{i-2}}{i^2(i-1)!\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }s_t\nonumber \\&\le \left( 1+\varepsilon \right) \sum _{i=1}^{D}\frac{\left( {\begin{array}{c}l\\ D-i\end{array}}\right) \left( m-1\right) _{i-1}}{i!\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }s_t\nonumber \\&\le \frac{\left( 1+\varepsilon \right) Ns_t}{Dm} \end{aligned}$$
(7)

where the last inequality follows by applying Lemma 3 with \(a=D-1\), \(b=m-1\) and \(c=N-1\), which gives

$$\begin{aligned} \sum _{i=1}^{D}\frac{\left( {\begin{array}{c}l\\ D-i\end{array}}\right) \left( m-1\right) _{i-1}}{i!\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }&=\sum _{j=0}^{a}\frac{\left( {\begin{array}{c}c-b\\ a-j\end{array}}\right) \left( b\right) _{j}}{(j+1)!\left( {\begin{array}{c}c\\ a\end{array}}\right) }=\sum _{j=0}^{a}\frac{\left( {\begin{array}{c}c-b\\ a-j\end{array}}\right) \left( {\begin{array}{c}b\\ j\end{array}}\right) }{(j+1)\left( {\begin{array}{c}c\\ a\end{array}}\right) }\\&\le \frac{c+1}{(a+1)(b+1)}=\frac{N}{Dm}. \end{aligned}$$
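This bound is easy to check numerically for small parameters; the following Python snippet (helper name ours) evaluates the sum directly and compares it against \((c+1)/((a+1)(b+1))\):

```python
from math import comb

def lemma3_sum(a, b, c):
    """Evaluate sum_{j=0}^{a} C(c-b, a-j) C(b, j) / ((j+1) C(c, a))."""
    return sum(comb(c - b, a - j) * comb(b, j) / ((j + 1) * comb(c, a))
               for j in range(a + 1))

# the sum never exceeds (c+1)/((a+1)(b+1)); with a = D-1, b = m-1, c = N-1
# this is exactly the N/(Dm) factor used in (7)
for (a, b, c) in [(3, 9, 15), (7, 19, 31), (3, 15, 15)]:
    assert lemma3_sum(a, b, c) <= (c + 1) / ((a + 1) * (b + 1)) + 1e-12
```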

By taking expectation on both sides of inequality (7), we have

$$\begin{aligned} \textbf{E}\left( {s_{t+1}}\right) \le \frac{\left( 1+\varepsilon \right) N}{Dm}\textbf{E}\left( {s_t}\right) . \end{aligned}$$

Since \(\textbf{E}\left( {s_0}\right) <1\), we have

$$\begin{aligned} \textbf{E}\left( {s_{r}}\right) \le \left( \frac{\left( 1+\varepsilon \right) N}{Dm}\right) ^r. \end{aligned}$$

Therefore by (4), (5) and (6), we have

$$\begin{aligned} {{\textbf {Adv}}}_{\textsf{PM}^r}^{\textrm{ncpa}}(q)&=\Vert \tau _r-\pi \Vert \\&\le \frac{1}{2}\sum _{l=0}^{q-1}\left( N\textbf{E}\left( {s_r}\right) \right) ^{\frac{1}{2}}\\&\le \frac{N^{\frac{1}{2}}}{2}\sum _{l=0}^{q-1}\left( \frac{\left( 1+\varepsilon \right) N}{Dm}\right) ^{\frac{r}{2}}\\&\le \frac{N^{\frac{3}{2}}}{2D^{\frac{r}{2}}}\sum _{l=0}^{q-1}\left( \frac{1+\varepsilon }{1-\frac{l}{N}}\right) ^{\frac{r}{2}}\cdot \frac{1}{N}\\&\le \frac{N^{\frac{3}{2}}}{2D^{\frac{r}{2}}}\int _{0}^{\frac{q}{N}}\left( \frac{1+\varepsilon }{1-x}\right) ^{\frac{r}{2}}dx\\&\le \frac{\left( 1+\varepsilon \right) ^{\frac{r}{2}}N^{\frac{r}{2}+\frac{1}{2}}}{(r-2)D^{\frac{r}{2}}(N-q)^{\frac{r}{2}-1}}. \end{aligned}$$

By using Lemma 1, we complete the proof of Theorem 1.
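The algebraic identity behind this proof, \(\sum _{a\in S_t}\left( A_1+2(i-1)A_2+(i-1)A_3+(i-1)(i-2)A_4\right) = i(m-i)(m-2)_{i-2}s_t\), can also be verified numerically by comparing the coefficients of \(s_t\) on both sides; a short Python check (helper names ours, restricted to \(i\ge 2\), where all four sums are defined):

```python
def falling(x, k):
    """Falling factorial (x)_k = x (x-1) ... (x-k+1); (x)_0 = 1."""
    r = 1
    for j in range(k):
        r *= x - j
    return r

def lhs(m, i):
    """Coefficient of s_t on the left-hand side, using the four sums above."""
    a1 = falling(m - 1, i - 1)      # sum_a A_1
    a2 = -falling(m - 2, i - 2)     # sum_a A_2
    a3 = falling(m - 1, i - 1)      # sum_a A_3
    a4 = -falling(m - 2, i - 2)     # sum_a A_4
    return a1 + 2 * (i - 1) * a2 + (i - 1) * a3 + (i - 1) * (i - 2) * a4

def rhs(m, i):
    """Coefficient of s_t on the right-hand side."""
    return i * (m - i) * falling(m - 2, i - 2)

for m in range(3, 20):
    for i in range(2, min(m, 10)):
        assert lhs(m, i) == rhs(m, i)
```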

4 Almost uniform partitions

In this section, we describe in detail how keyed partitions can be defined from binary Hamming codes and efficiently implemented within the PM shuffle. We also analyze the properties of the recursive construction given in Sect. 1.1.

4.1 Almost uniform partitions based on binary Hamming codes

For each integer \(s\ge 2\), let \(\mathcal {C}_s\) be a binary \([2^s-1,2^s-s-1,3]\)-Hamming code. Using the code \(\mathcal {C}_s\), we can define a keyed partition \((\mathcal {B}_K)_{K\in \mathcal {K}}\) of \(\{0,1\} ^n\) for any \(n\ge 2^s-1\) where each block is of size \(D=2^s\). The key space of this keyed partition is defined as

$$\begin{aligned} \mathcal {K}&=\{(K_1,\ldots ,K_{D-1})\in (\{0,1\} ^n)^{D-1}: K_1,\ldots ,K_{D-1}\text { are linearly independent}\}\\&\times \{0,1\} ^{D-1}. \end{aligned}$$

A key \(\left( K_1,\ldots ,K_{D-1},\textbf{b}\right) \in \mathcal {K}\) determines a subspace of dimension \(D-1\), namely

$$\begin{aligned} V=\langle K_1,\ldots ,K_{D-1}\rangle . \end{aligned}$$

If we arbitrarily fix a set of representatives R for the quotient space \(\{0,1\} ^n/V\), then the entire set \(\{0,1\} ^n\) is partitioned as

$$\begin{aligned} \{0,1\} ^n=\bigcup _{\textbf{a}\in R}(\textbf{a}+V). \end{aligned}$$

Again, we partition each coset \(\textbf{a}+V\) as

$$\begin{aligned} \textbf{a}+V=\bigcup _{\textbf{c}\in \mathcal {C}_s}\{&\textbf{a}+(c_1+b_1+e_1)K_1+\cdots +(c_{D-1}+b_{D-1}+e_{D-1})K_{D-1}:\\&\quad \textbf{wt}(e_1,\ldots ,e_{D-1})\le 1\}, \end{aligned}$$

where we write \(\textbf{c}=(c_1,\ldots ,c_{D-1})\) and \(\textbf{b}=(b_1,\ldots ,b_{D-1})\). So for each codeword \(\textbf{c}\in \mathcal {C}_s\) and the key component \(\textbf{b}\), the element

$$\begin{aligned} \textbf{a}+(c_1+b_1)K_1+\cdots +(c_{D-1}+b_{D-1})K_{D-1} \end{aligned}$$
(8)

becomes the center of the block containing it, in the sense that the other elements of the block are obtained by adding \(K_i\), \(i=1,\ldots ,D-1\), to the center. Given a key \(\left( K_1,\ldots ,K_{D-1},\textbf{b}\right) \), the center of each block is uniquely determined.

Let \(U=\{\textbf{u}_1,\ldots ,\textbf{u}_{D}\}\subset \{0,1\} ^n\) be a subset of size D. Suppose that U is a block in a partition with key \(\left( K_1,\ldots ,K_{D-1},\textbf{b}\right) \). Then \(\textbf{u}_i\) must be the center of the block for some \(i=1,\ldots ,D\), which is of the form (8). In this case, we have

$$\begin{aligned} (\textbf{u}_1+\textbf{u}_i,\ldots , \textbf{u}_{i-1}+\textbf{u}_i, \textbf{u}_{i+1}+\textbf{u}_i,\ldots , \textbf{u}_{D}+\textbf{u}_i)=(K_{g(1)},\ldots ,K_{g(D-1)}) \end{aligned}$$

for some permutation g on \([D-1]\). Once i and g are fixed, \(V=\langle K_1,\ldots ,K_{D-1}\rangle \) is determined, and hence so is the representative \(\textbf{a}\) such that \(U\subset \textbf{a}+V\). If we arbitrarily choose a codeword \(\textbf{c}\in \mathcal {C}_s\), then \(\textbf{b}\) is uniquely determined by \(\textbf{a}\), \(\textbf{c}\) and the center of the block \(\textbf{u}_i=\textbf{a}+(c_1+b_1)K_1+\cdots +(c_{D-1}+b_{D-1})K_{D-1}\). Since

$$\begin{aligned} |\mathcal {K}|&=2^{D-1}\cdot \prod _{i=0}^{D-2}(N-2^i),\\ |\mathcal {C}_s|&=2^{D-s-1}, \end{aligned}$$

and \(D=2^s\), we have

$$\begin{aligned} \Pr \left[ {K \leftarrow _{\$} \mathcal {K}:U\in \mathcal {B}_K}\right]&\le \frac{{D}\cdot (D-1)!\cdot |\mathcal {C}_s|}{|\mathcal {K}|}\\&= \frac{(D-1)!}{\prod _{i=0}^{D-2}(N-2^i)}\\&= \left( \prod _{i=0}^{D-2}\frac{1}{N-2^i}\right) \cdot \frac{(D-1)!\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\\&\le \left( \prod _{i=0}^{D-2}\frac{N}{N-2^i}\right) \cdot \frac{1}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\\&= \left( \frac{1}{\prod _{i=0}^{D-2}\left( 1-\frac{2^i}{N}\right) }\right) \cdot \frac{1}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\\&\le \frac{1}{1-\frac{2^{D-1}}{N}}\cdot \frac{1}{\left( {\begin{array}{c}N-1\\ {D-1}\end{array}}\right) }\\&\le \left( 1+\frac{2^{D}}{N}\right) \frac{1}{\left( {\begin{array}{c}N-1\\ {D-1}\end{array}}\right) }\\ \end{aligned}$$

if \(N\ge 2^{D}\). Therefore this keyed partition is \(\varepsilon \)-almost D-uniform for \(\varepsilon =2^{D}/N\).
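To make the construction concrete, the following Python sketch builds this keyed partition for the smallest parameter \(s=2\), where \(D=4\) and \(\mathcal {C}_2=\{000,111\}\) is the [3, 1, 3] repetition code, on \(\{0,1\}^4\); bit strings are encoded as integers, addition is XOR, and the particular key and function name are our own illustrative choices:

```python
def blocks_from_key(n, K, b):
    """Blocks of the keyed partition of {0,1}^n (bit strings as ints) for s = 2:
    D = 4 and C_2 = {000, 111} is the [3,1,3] Hamming (repetition) code.
    K = (K1, K2, K3) must be linearly independent; b is in {0,1}^3."""
    V = set()
    for mask in range(8):                       # span <K1, K2, K3>
        v = 0
        for i in range(3):
            if (mask >> i) & 1:
                v ^= K[i]
        V.add(v)
    seen, blocks = set(), []
    for a in range(2 ** n):                     # coset representatives
        if a in seen:
            continue
        coset = {a ^ v for v in V}
        seen |= coset
        for c in [(0, 0, 0), (1, 1, 1)]:        # codewords of C_2
            block = set()
            for e in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]:  # wt(e) <= 1
                u = a
                for i in range(3):
                    if c[i] ^ b[i] ^ e[i]:
                        u ^= K[i]
                block.add(u)
            blocks.append(frozenset(block))
    return blocks

blocks = blocks_from_key(4, (0b0001, 0b0010, 0b0100), (1, 0, 1))
assert all(len(B) == 4 for B in blocks)                 # every block has size D = 4
assert set().union(*blocks) == set(range(16)) and len(blocks) == 4
```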

4.2 Extension of almost uniform partitions using random functions

Let \((\mathcal {B}'_K)_{K\in \mathcal {K}_1}\) be an \(\varepsilon \)-almost D-uniform keyed partition of X, let Y be an additive group, and let \((f_K)_{K\in \mathcal {K}_2}\) be a D-wise independent function family from X to Y. Then we can construct an \(\varepsilon '\)-almost D-uniform keyed partition \((\mathcal {B}_K)_{K\in \mathcal {K}}\) of \(X\times Y\) with the key space being \(\mathcal {K}=\mathcal {K}_1\times \mathcal {K}_2\), where

$$\begin{aligned} \varepsilon '=\varepsilon +\frac{D^2}{|X|}+\frac{\varepsilon D^2}{|X|}. \end{aligned}$$

Given a key \(K=(K_1,K_2)\in \mathcal {K}_1\times \mathcal {K}_2\), the partition keyed with K is defined as

$$\begin{aligned} \mathcal {B}_K=\left\{ \{(x,f_{K_2}(x)+c):x\in B\}:B\in \mathcal {B}'_{K_1},\ c\in Y\right\} . \end{aligned}$$

Let \(U=\{(x_1,y_1),\ldots ,(x_D,y_D)\}\) be a subset of \(X\times Y\) of size D. If there is a collision in the first coordinate, namely \(x_i=x_j\) for some \(1\le i <j\le D\), then

$$\begin{aligned} \Pr \left[ {K \leftarrow _{\$} \mathcal {K}:U\in \mathcal {B}_K}\right] =0. \end{aligned}$$

Otherwise, for \(M=|X|\), \(M'=|Y|\) and \(N=|X\times Y|=MM'\), we have

$$\begin{aligned} \Pr \left[ {K \leftarrow _{\$} \mathcal {K}:U\in \mathcal {B}_K}\right]&\le \frac{(1+\varepsilon )}{\left( {\begin{array}{c}M-1\\ D-1\end{array}}\right) }\cdot \frac{1}{(M')^{D-1}}\\&=\frac{(1+\varepsilon )}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\cdot \frac{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }{\left( {\begin{array}{c}M-1\\ D-1\end{array}}\right) (M')^{D-1}}\\&\le \frac{(1+\varepsilon )}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\cdot \frac{M^{D-1}}{(M-1)_{D-1}}\\&=\frac{(1+\varepsilon )}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\prod _{i=1}^{D-1} \frac{1}{1-\frac{i}{M}}\\&\le \frac{(1+\varepsilon )}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) }\cdot \frac{1}{1-\frac{D^2}{2M}}\\&\le \frac{(1+\varepsilon )(1+\frac{D^2}{M})}{\left( {\begin{array}{c}N-1\\ D-1\end{array}}\right) } \end{aligned}$$

if \(D^2\le M\).
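The construction itself is straightforward to implement; the following Python sketch (with toy parameters, a fixed base partition, and a fixed function standing in for members of the keyed families, all our own choices) builds the extended partition and checks that it is indeed a partition of \(X\times Y\) with blocks of size D:

```python
from itertools import product

def extend_partition(base_blocks, Y, f):
    """Extend a partition of X (blocks of size D) to a partition of X x Y:
    each extended block is {(x, f(x) + c) : x in B} for a base block B and c in Y.
    Y is modeled as the additive group Z_{|Y|}."""
    blocks = []
    for B in base_blocks:
        for c in Y:
            blocks.append(frozenset((x, (f(x) + c) % len(Y)) for x in B))
    return blocks

X, Y = range(4), range(3)                       # toy sets; Y = Z_3
base = [frozenset({0, 1}), frozenset({2, 3})]   # toy base partition, D = 2
f = lambda x: (2 * x + 1) % 3                   # stand-in for f_{K_2}
blocks = extend_partition(base, Y, f)
assert all(len(B) == 2 for B in blocks)         # block size D is preserved
assert set().union(*blocks) == set(product(X, Y)) and len(blocks) == 6
```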

5 Concrete instantiation of the \(\textsf{PM}\) shuffle

In this section, we present a concrete instantiation of an n-bit \(\textsf{PM}\) shuffle based on a binary \([2^s-1,2^s-s-1,3]\)-Hamming code \(\mathcal {C}_s\). Suppose that \(n\ge 2^s-1\) and let \(D=2^s\).

A single round of the resulting PM shuffle. Given a key

$$\begin{aligned} K=\left( K_1,\ldots ,K_{D-1},\textbf{b}\right) \in \mathcal {K}\end{aligned}$$

the \((D-1)\times n\) matrix L whose i-th row is \(K_i\), \(i=1,\ldots ,D-1\), can be transformed into a reduced row echelon form \(H=(h_{ij})\); in the process, we can also compute and record a \((D-1)\times (D-1)\) invertible matrix \(M=(m_{ij})\) such that

$$\begin{aligned} ML=H. \end{aligned}$$

This computation uses only elementary row operations, so it is not costly in general, and it can be precomputed prior to the encryption of data. Let \(j_1,\ldots ,j_{D-1}\) denote the column indices of the leading ones in H, so that \(h_{\alpha , j_{\alpha }}=1\) for \(\alpha =1,\ldots ,D-1\).
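For concreteness, this Gaussian elimination over GF(2) can be sketched in Python as follows (our own helper, not the paper's pseudocode); it returns both the reduced row echelon form H and the transformation matrix M with \(ML=H\):

```python
def rref_gf2(L):
    """Reduced row echelon form of a binary matrix L (list of lists of 0/1)
    over GF(2), together with an invertible M such that M L = H (mod 2)."""
    rows, cols = len(L), len(L[0])
    H = [row[:] for row in L]
    M = [[1 if i == j else 0 for j in range(rows)] for i in range(rows)]  # identity
    r = 0
    for j in range(cols):
        # find a pivot in column j at or below row r
        piv = next((i for i in range(r, rows) if H[i][j]), None)
        if piv is None:
            continue
        H[r], H[piv] = H[piv], H[r]
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):            # eliminate column j in all other rows
            if i != r and H[i][j]:
                H[i] = [x ^ y for x, y in zip(H[i], H[r])]
                M[i] = [x ^ y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return H, M

L = [[1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 1]]
H, M = rref_gf2(L)
# verify M L = H over GF(2)
for i in range(3):
    for j in range(4):
        assert sum(M[i][k] * L[k][j] for k in range(3)) % 2 == H[i][j]
```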

Given an input \(\textbf{u}=(u_1,\ldots ,u_n)\in \{0,1\} ^n\), the representative of the coset containing \(\textbf{u}\) is obtained by setting the coordinates at the positions of the leading ones to zero. Namely, the representative \(\textbf{a}\) is computed as

$$\begin{aligned} \textbf{a}=\textbf{u}+u_{j_1}H_1+\cdots +u_{j_{D-1}}H_{D-1} \end{aligned}$$

where \(H_i\) denotes the i-th row of H. Since

$$\begin{aligned} H_i=m_{i1}K_1+\cdots +m_{i,D-1}K_{D-1} \end{aligned}$$

for \(i=1,\ldots ,D-1\), we can also compute \(p_1,\ldots ,p_{D-1}\in \{0,1\} \) such that

$$\begin{aligned} \textbf{a}=\textbf{u}+p_{1}K_1+\cdots +p_{D-1}K_{D-1} \end{aligned}$$

or equivalently,

$$\begin{aligned} \textbf{u}=\textbf{a}+b_1K_1+\cdots +b_{D-1}K_{D-1}+(b_{1}+p_{1})K_1+\cdots +(b_{D-1}+p_{D-1})K_{D-1}. \end{aligned}$$

Precisely, for \(i=1,\ldots ,D-1\),

$$\begin{aligned} p_i=u_{j_1}m_{1,i}+u_{j_2}m_{2,i}+\cdots +u_{j_{D-1}}m_{D-1,i}. \end{aligned}$$

By decoding the word \((b_1+p_1,\ldots ,b_{D-1}+p_{D-1})\) using the Hamming code \(\mathcal {C}_s\), we can obtain a codeword \(\textbf{c}=(c_1,\ldots ,c_{D-1})\) and the corresponding error vector

$$\begin{aligned} \textbf{e}=(e_1,\ldots ,e_{D-1})=(b_1+p_1+c_1,\ldots ,b_{D-1}+p_{D-1}+c_{D-1}) \end{aligned}$$

such that \(\textbf{wt}(\textbf{e})\le 1\). This step essentially amounts to computing the syndrome of the word \((b_1+p_1,\ldots ,b_{D-1}+p_{D-1})\) using the parity check matrix of \(\mathcal {C}_s\). Then we have

$$\begin{aligned} \textbf{u}=\textbf{a}+(b_1+c_1+e_1)K_1+\cdots +(b_{D-1}+c_{D-1}+e_{D-1})K_{D-1} \end{aligned}$$

and the block containing \(\textbf{u}\) is labeled as \((\textbf{a},\textbf{c})\in \{0,1\}^n\times \{0,1\}^{D-1}\). The position of the nonzero bit of \(\textbf{e}\) can be encoded as an element of \(\{0,1\} ^s\), with no error being encoded as \((0,\ldots ,0)\in \{0,1\} ^s\).

By applying the round permutation \(\sigma _{\textbf{a},\textbf{c}}\) to \(\textbf{e}\), a new error vector \(\textbf{e}'=(e'_1,\ldots ,e'_{D-1})\) such that \(\textbf{wt}(\textbf{e}')\le 1\) is obtained, and finally the element \(\textbf{u}\) is mapped to

$$\begin{aligned} \textbf{u}'=\textbf{a}+(b_1+c_1+e'_1)K_1+\cdots +(b_{D-1}+c_{D-1}+e'_{D-1})K_{D-1}. \end{aligned}$$

Pseudocode. Suppose that the r-round \(\textsf{PM}^r\) cipher uses an s-bit tweakable permutation

$$\begin{aligned} \sigma :\left( \{0,1\}^n \times \{0,1\}^{D-1}\times \{1,\ldots ,r\}\right) \times \{0,1\}^s \longrightarrow \{0,1\}^s \end{aligned}$$

as its underlying primitive. Then \(\textsf{PM}^r\) encrypts \(\textbf{w}\in \{0,1\}^n\) using a set of r round keys

$$\begin{aligned} \left( K_{t,1},\ldots ,K_{t,D-1},\textbf{b}_t\right) _{t\in [r]}\in \left( (\{0,1\}^n)^{D-1}\times \{0,1\}^{D-1}\right) ^r \end{aligned}$$

as described in Fig. 2.

Fig. 2

The r-round \(\textsf{PM}\) shuffle based on a binary \([2^s-1,2^s-s-1,3]\)-Hamming code

Numerical Example. Let \(s=3\), \(n=32\) and \(r=512\). Then one needs a 3-bit block cipher using 48-bit tweaks for the underlying primitive \(\sigma \). This small block cipher can be instantiated using a tweakable block cipher, e.g., Skinny-128-256 [1]. For each round, one makes a single call to Skinny-128-256 with a fixed plaintext using a 256-bit tweakey containing the 48-bit tweak, obtaining a 128-bit random string, from which one can construct a random permutation on 3 bits. A straightforward way of constructing such a permutation is to parse the 128-bit string into a sequence of eight 16-bit blocks. If there is no collision between the blocks, then the sequence defines a permutation on \(\{0,1\}^3\). The probability of a collision is upper bounded by \(\left( {\begin{array}{c}8\\ 2\end{array}}\right) /2^{16}\), which is smaller than \(\frac{1}{2^{11}}\).
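The ranking step can be sketched as follows, with a SHA-256 output standing in for the Skinny-128-256 output (the Skinny call itself and the tweakey encoding are outside the scope of this sketch):

```python
import hashlib

def perm_from_128bits(block128):
    """Turn a 16-byte string into a permutation of {0,...,7} by parsing it as
    eight 16-bit values and ranking them; returns None on a collision
    (probability < 2^-11)."""
    assert len(block128) == 16
    vals = [int.from_bytes(block128[2 * i:2 * i + 2], 'big') for i in range(8)]
    if len(set(vals)) < 8:
        return None
    order = sorted(range(8), key=lambda i: vals[i])
    perm = [0] * 8
    for rank, i in enumerate(order):
        perm[i] = rank
    return perm

# SHA-256 output as a stand-in for the Skinny-128-256 output on a fixed plaintext
out = hashlib.sha256(b"tweak-and-round-counter").digest()[:16]
p = perm_from_128bits(out)
if p is not None:                   # fails only on a (rare) collision
    assert sorted(p) == list(range(8))
```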

Lines 3 to 6 in the pseudocode can be precomputed for every round \(t\in [r]\) if a sufficient amount of memory is available. Line 11 can be executed using syndrome decoding: the generator matrix of the [7, 4, 3]-Hamming code (for \(s=3\)) is given as

$$\begin{aligned} G=\left[ \begin{array}{lllllll}1 &{} 0 &{} 0 &{} 0 &{} 1 &{} 1 &{} 0\\ 0 &{} 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 1\\ 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 1 &{} 1\\ 0 &{} 0 &{} 0 &{} 1 &{} 1 &{} 1 &{} 1\end{array}\right] \end{aligned}$$

and its parity-check matrix is defined as

$$\begin{aligned} G^*=\left[ \begin{array}{lllllll}1 &{} 1 &{} 0 &{} 1 &{} 1 &{} 0 &{} 0\\ 1 &{} 0 &{} 1 &{} 1 &{} 0 &{} 1 &{} 0\\ 0 &{} 1 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1\end{array}\right] . \end{aligned}$$

By computing \((\textbf{b}+\textbf{p}) (G^*)^T\), where \((G^*)^T\) denotes the transpose of \(G^*\), one obtains the 3-bit syndrome of \(\textbf{b}+\textbf{p}\). The syndrome specifies the exact position of the single-bit error in \(\textbf{b}+\textbf{p}\) (if any), allowing one to recover the corresponding codeword \(\textbf{c}\) and the error vector \(\textbf{e}\) such that \(\textbf{c}+\textbf{e}=\textbf{b}+\textbf{p}\).
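As a worked example, this decoding step can be sketched in Python using the matrix \(G^*\) above (the helper name is ours): compute the syndrome, locate it among the columns of \(G^*\), and flip the bit at that position:

```python
# parity-check matrix G* of the [7,4,3]-Hamming code, as given above
Gstar = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]

def decode(w):
    """Syndrome-decode a 7-bit word w (list of 0/1): return (codeword c,
    error vector e) with c + e = w and wt(e) <= 1."""
    syndrome = tuple(sum(Gstar[i][j] * w[j] for j in range(7)) % 2 for i in range(3))
    e = [0] * 7
    if syndrome != (0, 0, 0):
        # a nonzero syndrome equals the G* column at the error position
        cols = [tuple(Gstar[i][j] for i in range(3)) for j in range(7)]
        e[cols.index(syndrome)] = 1
    c = [w[j] ^ e[j] for j in range(7)]
    return c, e

# flip one bit of the codeword 1000110 (first row of G) and decode
w = [1, 0, 0, 0, 1, 1, 0]
w[2] ^= 1
c, e = decode(w)
assert c == [1, 0, 0, 0, 1, 1, 0] and e == [0, 0, 1, 0, 0, 0, 0]
```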