
1 Introduction

The applicability of card shuffles to cryptography was noticed many years ago, e.g., by Naor [17] for the Thorp shuffle. Shuffles can be categorized into two groups. The first group are the oblivious shuffles, in which the trajectory of a card during the shuffle can be traced without tracing the trajectories of the other cards; thus oblivious shuffles can be seen as block ciphers. The other group – the non-oblivious shuffles – requires tracing all the cards in order to trace a single one. Since one needs to trace each of the n cards, a straightforward application of non-oblivious shuffles as block ciphers would be inefficient. Nevertheless, non-oblivious shuffles are used in cryptographic schemes anyway, just in a slightly different role – often as a building block of a stream cipher.

Let us use the naming convention of RC4 – a stream cipher designed in 1987 by Ronald Rivest. There is a long line of similar stream ciphers: RC4A [23], Spritz [20], RC4+ [21], VMPC [25] – all of them are composed of two algorithms:

  1. KSA (Key Scheduling Algorithm) uses a secret key to transform the identity permutation of n cards into some other permutation (one can model KSA as a card shuffle).

  2. PRGA (Pseudo Random Generation Algorithm) starts with the permutation generated by KSA and outputs random bits from it, updating the permutation at the same time.

Thus, the KSAs of all the aforementioned algorithms (RC4, RC4A, Spritz, RC4+, VMPC) can be seen as performing some card shuffling, where a secret key corresponds to (replaces) randomness. If we consider a version of the algorithm with a purely random secret key of infinite length, then we indeed consider a card shuffling procedure. Following [15], we call such a version of the algorithm an idealized version. In the case of the KSA used by RC4, the idealized version (mathematical model) of the card shuffle is called the Cyclic-to-Random Transpositions shuffle, which indeed is an example of a non-oblivious shuffle. In 2013, Rivest and Schuldt presented a new version of an RC4-like cipher (Spritz [20]) which has a new sponge-like KSA performing a more complicated shuffle: 6N steps of Cyclic-to-Random Transpositions (as part of the Whip procedure, see Fig. 7; compared to only N steps in RC4), and in between, a partial sorting (the so-called Crush) of the elements in the internal state is performed twice (after the 2nd and 4th shuffling phases).

The KSAs of the mentioned ciphers perform shuffling for some predefined number of steps. The security of such a scheme is mainly based on analyzing the idealized version of the algorithm and corresponds to the “quality of the shuffling”. Roughly speaking, shuffling is modeled as a Markov chain on permutations, which converges to the uniform distribution (perfectly shuffled cards). One should then perform as many steps as needed to get close to this uniform distribution, which is directly related to the so-called mixing time. This is one of the main drawbacks of RC4: it performs Cyclic-to-Random Transpositions for n steps, whereas the mixing time is of order \(n\log n\).

There is a long list of papers which point out weaknesses of the RC4 algorithm. Attacks exploited weaknesses of both PRGA and KSA, or the way RC4 was used in specific systems [4, 9, 10, 12, 13]. As a result, in 2015 RC4 was prohibited in TLS by the IETF, Microsoft and Mozilla.

In this paper we use so-called Strong Stationary Times (SST) for Markov chains. The main area of application of SSTs is studying the rate of convergence of a chain to its stationary distribution. However, they may also be used for perfect sampling from the stationary distribution of a Markov chain; consult [19] (on the Coupling From The Past algorithm) and [8] (on algorithms involving Strong Stationary Times and Strong Stationary Duality).

1.1 Our Contribution

(1) Strong stationary time based KSA algorithm(s). Instead of running a KSA algorithm (i.e., performing the shuffle) for some pre-defined number of steps, we make it randomized (a Las Vegas algorithm). More specifically, we suggest the use of so-called Strong Stationary Times (SST) for Markov chains. We use SSTs to obtain samples from the uniform distribution on all permutations (we actually perform perfect sampling). We show the benefits of such an approach:

  1. Use of SSTs may allow one to close the gap between theoretical models and practice. As a result of Mironov’s [15] work, one knows that the idealized version of RC4’s KSA would need keys of length \({\approx }23~037\) in order to meet the mathematical model. In fact one may use a better shuffling than the one used in RC4, i.e., the time-reversed riffle shuffle, which requires (on average) 4096 bits – not much more than the 2048 bits which are allowed for RC4 (see Sect. 4.1).

  2. Coupling methods are the most commonly used tools for studying the rate of convergence to stationarity for Markov chains. They allow one to bound the so-called total variation distance between the distribution of a given chain at time k and its stationary distribution. However, (traditional) security definitions require something “stronger”. It turns out that bounding the separation distance is what one actually needs, and this fits perfectly into the notion of Strong Stationary Times we are using (see Sect. 4.2).

  3. By construction, the running time of our model is key dependent. In extreme (very unlikely) cases it may leak some information about the key, but it does not leak any information about the resulting permutation to an adversary. We also discuss how one can mask such a leakage (see Sect. 4.3).

(2) Better SST for RC4’s KSA. Our complementary contribution (Sect. 5.2) is an analysis of RC4 showing a new upper bound on the number of steps the algorithm should perform. Similarly as in [15], we propose an SST which is valid for both Cyclic-to-Random Transpositions and Random-to-Random Transpositions; for the latter we calculate the expected running time, which is “faster” than the one given in [15]. It is known that Random-to-Random Transpositions card shuffling needs \({1\over 2}n\log n\) steps to mix. It is worth mentioning that although Random-to-Random Transpositions and Cyclic-to-Random Transpositions are similar in spirit, it does not follow automatically that the latter also needs \({1\over 2}n\log n\) steps to mix. Mironov [15] states that the expected time until reaching the uniform distribution is upper bounded by \(2n H_n - n\); we show in Lemma 3 that the expected running time of this SST under Random-to-Random Transpositions is equal to:

$$ E[T] = nH_n + n + O(H_n)$$

and empirically check that the result is similar for Cyclic-to-Random Transpositions. This directly translates into the required number of steps that should be performed by RC4’s KSA.

(3) Note on the Spritz construction. We also take a look at Spritz (Sect. 6), a newer sponge-like construction. We explain why the Sign distinguisher attack cannot be successful and provide arguments why the KSA algorithm in Spritz may not perform enough steps.

2 Preliminaries

Throughout the paper, let \(\mathcal {S}_n\) denote the set of all permutations of the set \(\{1,\ldots ,n\}=:[n]\).

2.1 Markov Chains and Rate of Convergence

Consider an ergodic Markov chain \(\mathbf {X}=\{X_k, k\ge 0\}\) on a finite state space \(\mathbb {E}=\{0,\ldots ,M-1\}\) with stationary distribution \(\psi \). Let \(\mathcal {L}(X_k)\) denote the distribution of the chain at time k. By the rate of convergence we mean how fast the distribution of the chain converges to its stationary distribution, measured according to some distance dist. Define

$$\tau ^{dist}_{mix}(\varepsilon )=\inf \{k: dist(\mathcal {L}(X_k),\psi )\le \varepsilon \},$$

which is called the mixing time (w.r.t. the given distance dist). In our case the state space is the set of all permutations of [n], i.e., \(\mathbb {E}:=\mathcal {S}_n\), and the stationary distribution is the uniform distribution over \(\mathbb {E}\), i.e., \(\psi (\sigma )={1\over n!}\) for all \(\sigma \in \mathbb {E}\). In most applications the mixing time is defined w.r.t. the total variation distance:

$$d_{TV}(\mathcal {L}(X_k),\psi )={1\over 2} \sum _{\sigma \in \mathcal {S}_n} \left| Pr(X_k=\sigma )-{1\over n!}\right| .$$

The separation distance is defined by

$$sep(\mathcal {L}(X_k),\psi ):=\max _{\sigma \in \mathbb {E}} \left( 1-n!\cdot Pr(X_k=\sigma )\right) .$$

It is relatively easy to check that \(d_{TV}(\mathcal {L}(X_k),\psi )\le sep(\mathcal {L}(X_k),\psi )\).
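As a quick illustration (our own minimal sketch, not part of the original exposition), both distances can be computed exactly for a toy distribution on \(\mathcal {S}_3\); the helper names are ours:

```python
import math
from itertools import permutations
from random import Random

def distances(prob, n):
    """Total variation and separation distance of `prob`
    (a dict mapping permutation tuples to probabilities)
    from the uniform distribution on S_n."""
    uniform = 1 / math.factorial(n)
    perms = list(permutations(range(n)))
    d_tv = 0.5 * sum(abs(prob.get(p, 0.0) - uniform) for p in perms)
    sep = max(1 - prob.get(p, 0.0) / uniform for p in perms)
    return d_tv, sep

# Toy example: a slightly non-uniform distribution on S_3.
rng = Random(1)
perms = list(permutations(range(3)))
weights = [1 + 0.1 * rng.random() for _ in perms]
prob = {p: w / sum(weights) for p, w in zip(perms, weights)}

d_tv, sep = distances(prob, 3)
assert d_tv <= sep  # d_TV is always dominated by separation distance
```

The final assertion checks the inequality \(d_{TV}\le sep\) stated above on this example.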

Strong Stationary Times. The definition of separation distance fits perfectly into the notion of a Strong Stationary Time (SST) for Markov chains. This is a probabilistic tool for studying the rate of convergence of Markov chains which also allows perfect sampling. We can think of a stopping time as the running time of an algorithm which observes the Markov chain \(\mathbf {X}\) and stops according to some stopping rule (depending only on the past).

Definition 1

A random variable T is a randomized stopping time if it is the running time of the Randomized Stopping Time algorithm.

figure a

Definition 2

A random variable T is a Strong Stationary Time (SST) if it is a randomized stopping time for the chain \(\mathbf {X}\) such that:

$$\forall (i\in \mathbb {E})\ Pr(X_k=i | T=k) = \psi (i).$$

Having an SST T for a chain with uniform stationary distribution lets us bound the following (see [3])

$$\begin{aligned} sep(\mathcal {L}(X_k),\psi )\le Pr(T>k). \end{aligned}$$
(1)

We say that T is an optimal SST if \(sep(\mathcal {L}(X_k),\psi )=Pr(T>k)\).

2.2 Distinguishers and Security Definition

We consider three distinguishers: the Sign distinguisher and the Position distinguisher, which are exactly the same as those defined in [15], and the Permutation distinguisher, whose advantage we consider in a traditional cryptographic security definition. That is, the Permutation distinguisher is given a permutation \(\pi \) and needs to decide whether \(\pi \) is a result of the shuffle or a permutation selected uniformly at random from the set of all permutations of a given size. The important difference is that an upper bound on the Permutation distinguisher’s advantage is an upper bound on any possible distinguisher – even those which are not computationally bounded.

figure b

Position Distinguisher. Because of the nature of the process (in fact of both: Random-to-Random Transpositions and Cyclic-to-Random Transpositions), the probability that the ith card is at the jth position depends on t: \(p^{(t)}_{i,j}=Pr(S[j] = i\ \text {at time}\ t)\), which can be pre-computed according to the recursion \( p_{i,j}^{(0)}=\left\{ \begin{array}{ll} 1 & \text {if}\ i=j,\\ 0 & \text {otherwise}, \end{array} \right. \) and for \(t>0\):

$$ p_{i,j}^{(t)}=\left\{ \begin{array}{ll} p_{i,j}^{(t-1)}\left( 1-{1\over n}\right) + {1\over n} p_{t_0,j}^{(t-1)} & \text {if}\ i\ne t_0,\\ {1\over n} & \text {otherwise}, \end{array} \right. $$

where \(t_0=t\bmod n\). The advantage of the Position distinguisher fades over time – we will upper bound the time needed for this distinguisher to lose its advantage.
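The recursion can be transcribed directly into code. A minimal sketch (the function name is ours; index conventions exactly as stated in the text):

```python
def position_probs(n, t_max):
    """p[i][j] after t_max steps, computed by iterating the recursion
    from the text (with t_0 = t mod n)."""
    # t = 0: the identity permutation.
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(1, t_max + 1):
        t0 = t % n
        p = [[(1 / n) if i == t0
              else p[i][j] * (1 - 1 / n) + (1 / n) * p[t0][j]
              for j in range(n)]
             for i in range(n)]
    return p

p = position_probs(8, 16)
# Sanity check: each column remains a probability distribution.
assert all(abs(sum(p[i][j] for i in range(8)) - 1.0) < 1e-9 for j in range(8))
```

The stochasticity check passes because the recursion preserves column sums, as one can verify by summing the two cases over i.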

Sign Distinguisher. For a permutation \(\pi \) which has a representation as a product of non-trivial transpositions \(\pi = (a_1 b_1)(a_2 b_2) \ldots (a_m b_m)\), the sign is defined as \(sign(\pi ) = (-1)^m\). So the sign is \(+1\) whenever m is even and \(-1\) whenever m is odd.
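The sign can be computed without exhibiting an explicit product of transpositions, via the cycle decomposition (a cycle of length L is a product of L−1 transpositions). A small sketch, with our own function name:

```python
def sign(perm):
    """Sign of a permutation of {0,...,n-1}: +1 if even, -1 if odd,
    computed from the cycle decomposition (a cycle of length L
    contributes L-1 transpositions)."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
                length += 1
            transpositions += length - 1
    return 1 if transpositions % 2 == 0 else -1

assert sign([0, 1, 2]) == 1    # identity: even
assert sign([1, 0, 2]) == -1   # one transposition: odd
assert sign([1, 2, 0]) == 1    # 3-cycle = two transpositions: even
```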

Permutation Distinguisher. The distinguishability game for the adversary is as follows:

Definition 3

The permutation indistinguishability Shuffle \(_{\textsf {S},\mathcal {A}}\)(n, r) experiment.

figure c

If \(b = b'\) the adversary wins the game and we say that \(\mathcal {A}\) succeeded; that is, the adversary wins if she can distinguish the random permutation from the permutation produced by the shuffling algorithm.

Definition 4

A shuffling algorithm S generates indistinguishable permutations if for all adversaries \(\mathcal {A}\) there exists a negligible function \(\textsf {negl}\) such that

$$ Pr\left[ {\textsf {Shuffle}}_{\textsf {S},\mathcal {A}}{(n,r)} = 1\right] \le \frac{1}{2} + \textsf {negl}(n). $$

The above translates into:

Definition 5

A shuffling algorithm S generates indistinguishable permutations if for any adversary \(\mathcal {A}\) there exists a negligible function \(\textsf {negl}\) such that:

$$\left| \underset{\scriptscriptstyle K \leftarrow \{0, 1\}^{KeyLen}}{Pr} \left[ \mathcal {A}( \textsf {S}(K)) = 1 \right] - {\underset{\scriptscriptstyle R \leftarrow \mathcal {U}(\mathcal {S}_n)}{Pr}}[\mathcal {A}(R) = 1] \right| \le \textsf {negl}(n)$$

3 Related Work

3.1 RC4 Algorithm

RC4 is a stream cipher; its so-called internal state is (S, i, j), where S is a permutation of [n] and i, j are two indices. As input it takes an L-byte message \(m_1, \ldots , m_L\) and a secret key K, and returns a ciphertext \(c_1,\ldots ,c_L\). The initial state is the output of KSA; based on this state, PRGA is used to output bits which are XORed with the message. The actual KSA algorithm used in RC4 is presented in Fig. 1, together with its idealized version \({\textsf {KSA}}^{*}\) (where the secret-key-based randomness is replaced with pure randomness) and our version of the algorithm \({\textsf {KSA}}^{**}\) (which, in addition to \({\textsf {KSA}}^{*}\), does not run a pre-defined number of steps; instead, the number of steps depends on the key and is determined by some stopping procedure ST). The details of \({\textsf {KSA}}^{**}\) are given in Sect. 4.1.

Fig. 1. KSA of the RC4 algorithm and its idealized version \(\textsf {KSA} ^*\). The \(\textsf {KSA} ^{**}\) has some additional procedure ST (stopping time) which is computed during the execution of the algorithm (for the original RC4, ST is simply: stop after n steps).

A closer look at \({\textsf {KSA}}^{*}\) reveals that it actually performs so-called Cyclic-to-Random Transpositions. If we identify the elements [n] with cards, then we do the following: at step t, exchange the card at position \(t \bmod n\) with a randomly chosen one. Throughout the paper, let \(\mathbf {Z}=\{Z_t\}_{t\ge 0}\) denote the chain corresponding to this shuffling and let \(\mathcal {L}(Z_t)\) denote the distribution of the chain at time t.
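The idealized \({\textsf {KSA}}^{*}\) is a few lines of code. A minimal simulation sketch (function name ours; key bytes replaced by pure randomness, as in the idealized model):

```python
import random

def ksa_star(n, steps, rng=None):
    """Idealized KSA (KSA*): Cyclic-to-Random Transpositions.
    At step t, swap the card at position t mod n with the card
    at a uniformly chosen position."""
    rng = rng or random.Random()
    S = list(range(n))           # start from the identity permutation
    for t in range(steps):
        i = t % n
        j = rng.randrange(n)     # pure randomness in place of key bytes
        S[i], S[j] = S[j], S[i]
    return S

S = ksa_star(256, 256, random.Random(0))
assert sorted(S) == list(range(256))   # output is still a permutation
```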

3.2 Sign Distinguisher for RC4 ’s KSA

It was observed in [15] that the sign of the permutation at the end of the KSA algorithm is not uniform. As a conclusion, it was noticed that the number of discarded shuffles (by PRGA) must grow at least linearly in n. Below we present this result obtained in a different way than in [15], giving an exact formula for the advantage at any step t. This form will be used to draw conclusions about the Spritz algorithm in Sect. 6.1.

One can look at the sign-change process for Cyclic-to-Random Transpositions as follows: after the table is initialized, the sign of the permutation is \(+1\) since it is the identity, so the initial distribution is concentrated in \(v_0 = (Pr(\text {sign}(Z_0)=+1),Pr(\text {sign}(Z_0)=-1))=(1, 0)\).

Then in each step the sign is unchanged if and only if \(i = j\), which happens with probability 1/n. So the transition matrix \(M_n\) of the sign-change process induced by the shuffling process is equal to:

$$M_n := \left( \begin{array}{cc} \frac{1}{n} & 1-\frac{1}{n}\\ 1-\frac{1}{n} & \frac{1}{n} \end{array} \right) .$$

This conclusion corresponds to looking at the distribution of the sign-change process after t steps: \(v_0 \cdot M_n^t \), where \(v_0\) is the initial distribution. The eigenvalues of \(M_n\) are \(1\) and \({2-n\over n}\), with eigenvectors \((1,1)^T\) and \((-1,1)^T\) respectively. The spectral decomposition yields

$$ v_0\cdot M_n^t = (1,0) \left( \begin{array}{rr} 1 & -1 \\ 1 & 1 \end{array} \right) \left( \begin{array}{rr} 1 & 0 \\ 0 & {2-n\over n} \end{array} \right)^t \left( \begin{array}{rr} {1\over 2} & -{1\over 2} \\ -{1\over 2} & {1\over 2} \end{array} \right) = \left( {1\over 2}+{1\over 2}\left( {2\over n}-1\right) ^t,\ {1\over 2}-{1\over 2}\left( {2\over n}-1\right) ^t \right) . $$

For \(n = 256\) (the value of n used in RC4) and the initial state being the identity permutation, after \(t = n = 256\) steps one gets: \(v_0 \cdot M_{256}^{256} = (0.567138, 0.432862)\).
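The closed form can be cross-checked numerically against direct iteration of the sign-change chain. A small sketch (function names are ours):

```python
def sign_distribution(n, t):
    """(Pr[sign=+1], Pr[sign=-1]) after t steps, via the closed form
    from the spectral decomposition: 1/2 +/- (1/2)(2/n - 1)^t."""
    lam = 2 / n - 1
    return (0.5 + 0.5 * lam**t, 0.5 - 0.5 * lam**t)

def sign_distribution_iter(n, t):
    """Same quantity, by iterating v <- v * M_n directly."""
    p_plus = 1.0                 # the identity has sign +1
    for _ in range(t):
        p_plus = p_plus / n + (1 - p_plus) * (1 - 1 / n)
    return (p_plus, 1 - p_plus)

a = sign_distribution(256, 256)
b = sign_distribution_iter(256, 256)
assert abs(a[0] - b[0]) < 1e-12
assert abs(a[0] - 0.567138) < 1e-5   # the value reported in the text
```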

In [13] it was suggested that the first 512 bytes of output should be dropped. Fig. 3 in Appendix A presents the advantage \(\epsilon \) of a sign-adversary after dropping k bytes of the output (i.e., after \(n + k\) steps of the shuffle, in the mathematical model).

3.3 Position Distinguisher for RC4 ’s KSA

Mironov suggested analyzing an idealized version of the KSA algorithm. Being in permutation \(S\in \mathcal {S}_n\) at step i, the idealized version swaps element S[i] with a purely random S[j]. Treating the permutation as a deck of cards, this is exactly the known Cyclic-to-Random Transpositions card shuffling. If, on the other hand, both S[i] and S[j] are chosen uniformly at random, the procedure is called Random-to-Random Transpositions card shuffling. It is known that Random-to-Random Transpositions requires around \({1\over 2}n\log n\) steps to reach the uniform distribution, see [7]. Moreover, the authors showed that “most of the action” actually happens at this step – the process exhibits the so-called cut-off phenomenon. The analysis of the Position distinguisher uses a Strong Stationary Time (called a Strong Uniform Time in [15]), based on Broder’s construction for Random-to-Random Transpositions. Unfortunately, Mironov’s “estimate of the rate of growth of the strong uniform time T is quite loose” and the results “are a far cry both from the provable upper and lower bounds on the convergence rate”. He:

  • proved an upper bound of \(O(n\log n)\). More precisely, Mironov showed that there exists some positive constant c such that \(P[T>c n\log n]\rightarrow 0\) as \(n\rightarrow \infty \). The author experimentally checked that \(P[T > 2 n \lg n] < 1/n\) for \(n=256\), which corresponds to \(P[T > 4096] < 1/256\).

  • experimentally showed that \(E[T] \approx 11.16n \approx 1.4 n \lg n \approx 2857\) (for \(n = 256\)) – which translates into: on average one needs to drop \(\approx 2601\) initial bytes.

Later, Mossel et al. [16] proved a matching lower bound, establishing the mixing time to be of order \(\varTheta (n\log n)\). However, the constant was not determined.

4 Randomized Stopping Times and Cryptographic Schemes

4.1 Strong Stationary Time Based KSA Algorithms

We propose the \({\textsf {KSA}}^{**}_\texttt {Shuffle,ST}\)(n) algorithm, which works as follows. It starts with the identity permutation. At each step it performs some card shuffling procedure Shuffle. Instead of running for a pre-defined number of steps, it runs until an event defined by a procedure ST occurs. The procedure ST is designed in such a way that it guarantees that the event is a Strong Stationary Time. At each step the algorithm uses new randomness – one can think of this as an idealized version – but whenever the length of the key is greater than the number of random bits required by the algorithm, we end up with a permutation which cannot be distinguished from a random (uniform) one (even by a computationally unbounded adversary).

figure d

Notational convention: in \({\textsf {KSA}}^{**}_\texttt {Shuffle,ST}\)(n) we omit the parameter n. Moreover, if Shuffle and ST are omitted, it means that we use Cyclic-to-Random Transpositions as the shuffling procedure and the stopping rule is clear from the context (as in \({\textsf {KSA}}^{**}\) given earlier in Fig. 1). Note that if we use the stopping rule ST “stop after n steps” (which of course is not an SST), the algorithm is equivalent to RC4’s \({\textsf {KSA}}^{*}\) (also in Fig. 1).

Given a shuffling procedure Shuffle, one wants a “fast” stopping rule ST (ideally an optimal SST, which is stochastically the smallest). The stopping rule ST is a parameter, since for a given shuffling scheme one can come up with better stopping rules. This is exactly the case with Cyclic-to-Random Transpositions and Random-to-Random Transpositions: we recall Mironov’s [15] stopping rule, and a new, “faster” rule called StoppingRuleKLZ is given in Sect. 5.2.

4.2 RST and Security Guarantees

The coupling method is a commonly used tool for bounding the rate of convergence of Markov chains. Roughly speaking, a coupling of a Markov chain \(\mathbf {X}\) with transition matrix \(\mathbf {P}\) is a bivariate chain \((\mathbf {X}',\mathbf {X}'')\) such that marginally \(\mathbf {X}'\) and \(\mathbf {X}''\) are Markov chains with transition matrix \(\mathbf {P}\), and once the chains meet, they stay together (in some definitions this condition can be relaxed). Let \(T_c=\inf \{k: X'_k=X''_k\}\), i.e., the first time the chains meet, called the coupling time. The coupling inequality states that \(d_{TV}(\mathcal {L}(X_k),\psi )\le Pr(T_c>k)\).

On the other hand, the separation distance is an upper bound on the total variation distance, i.e., \(d_{TV}(\mathcal {L}(X_k),\psi )\le sep(\mathcal {L}(X_k),\psi )\). At first glance it seems better to bound \(d_{TV}\) directly, since \(d_{TV}\) can be very small while sep is (still) large. However, knowing that sep is small gives us much more than just knowing that \(d_{TV}\) is small, which turns out to be crucial for proving security guarantees (i.e., Definition 5). In our case (\(\mathbb {E}=\mathcal {S}_n\) and \(\psi \) the uniform distribution on \(\mathbb {E}\)), having \(d_{TV}\) small, i.e., \(d_{TV}(\mathcal {L}(X_k),\psi )={1\over 2}\sum _{\sigma \in \mathcal {S}_n}\left| Pr(X_k=\sigma )-{1\over n!}\right| \le \varepsilon \), does not imply that \(|Pr(X_k=\sigma )-{1\over n!}|\) is uniformly small (i.e., of order \({1\over n!}\)). Knowing however that \(sep(\mathcal {L}(X_k),\psi )\le \varepsilon \) implies

$$\begin{aligned} \forall (\sigma \in \mathbb {E}) \ \left| Pr(X_k=\sigma )-{1\over n!}\right| \le {\varepsilon \over n!}. \end{aligned}$$
(2)

The above inequality is what we need in our security definitions and shows that separation distance is an adequate measure of mixing time for our applications.

It is worth noting that \(d_{TV}(\mathcal {L}(X_k),\mathcal {U}(\mathbb {E}))\le \varepsilon \) implies (see Theorem 7 in [2]) that \(sep(\mathcal {L}(X_{2k}),\mathcal {U}(\mathbb {E}))\le \varepsilon \). This means that proof of security which bounds directly total variation distance by \(\varepsilon \) would require twice as many bits of randomness compared to the result which guarantees \(\varepsilon \) bound on separation distance.

4.3 RST and Timing-Attacks

One of the most serious threats to any cryptographic scheme are side-channel attacks. One type of such attacks are timing attacks, where an attacker derives information about the key by observing the running time of the execution of a cryptosystem. Timing attacks are especially powerful [1, 22] since an attacker may perform them remotely, over the network (while most other types of side-channel attacks can be performed only when the attacker is nearby). In order to limit the threat of timing attacks, attempts to implement constant-time cryptographic schemes are made. The problem is that such attempts are usually unsuccessful [18], even if the underlying architecture (allegedly) allows for that [11, 24].

The running time of an SST-based algorithm strictly depends on the secret key. In this section we explain why algorithms using randomized stopping times are nevertheless immune to timing attacks; we discuss the security of two assets separately: (1) the resulting permutation, (2) the secret key.

Timing Attacks and the Security of the Resulting Permutation. We already defined an SST (Definition 2) in Sect. 2.1, but an SST can be defined differently.

Definition 6

A random variable T is a Strong Stationary Time (SST) if it is a randomized stopping time for the chain X such that:

$$ X_T\ \text {has distribution}\ \psi \text { and is independent of } T.$$

Corollary 1

The information about the number of rounds that an SST-algorithm performs does not reveal any information about the resulting permutation.

Corollary 1 follows from the fact that Definition 2, which defines an SST as a certain randomized stopping time, is equivalent to Definition 6, which defines an SST as a stopping time independent of the resulting distribution. For the proof of the equivalence see [3].

Timing Attacks and the Security of the Secret Key. Unfortunately, although no information about the resulting permutation leaks, some information about the secret key may leak. Shuffling may reveal randomness through the running time (see Example 2 in Appendix D). In practical implementations, one may use some function of the key instead of pure randomness in each step. Then (at least) the following two cases may happen:

  1. Bits of the keystream are re-used: the running time of the algorithm (SST) may leak both information about the key and information about the permutation (compare with Example 1 in Appendix D).

  2. Bits of the keystream are “fresh” (never re-used): the running time of the algorithm (SST) may leak information about the key, but it does not leak any information about the produced permutation! (compare with Example 2 in Appendix D).

Masking SST. One can prevent timing attacks from obtaining information about the secret key by performing a simple masking: for a stopping rule \(\texttt {ST}\) with expected running time E[T], one runs the algorithm for at least E[T] steps even if \(\texttt {ST}\) occurred earlier.

This eliminates very short executions which could reveal information about the key.

On the other hand, for a practical implementation one may want to eliminate extremely long executions. This can be done by letting the algorithm run for at most, e.g., \(E[T] + c \cdot \sqrt{Var[T]}\) steps (where c is a parameter and Var[T] is the variance of the running time of \(\texttt {ST}\)).
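The two-sided masking described above can be sketched as follows (all names and the default c are our illustrative choices, not from the original text):

```python
import math

def masked_steps(t_sst, e_t, var_t, c=6):
    """Mask the SST running time: never stop before the expected time
    E[T], and cap extremely long runs at E[T] + c*sqrt(Var[T]).
    Note: applying the upper cap trades the exact SST guarantee for a
    bounded running time, as discussed in the text."""
    lower = math.ceil(e_t)
    upper = math.ceil(e_t + c * math.sqrt(var_t))
    return min(max(t_sst, lower), upper)

# A run whose SST fired early is padded up to E[T] = 100 steps:
assert masked_steps(10, 100, 25) == 100
# An extremely long run is cut at E[T] + 6*sqrt(Var[T]) = 130 steps:
assert masked_steps(10**6, 100, 25) == 130
```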

5 (Not So) Random Shuffles of RC4 – Revisited

5.1 Mironov’s Stopping Rule – Details

The goal of the KSA of the original RC4 is to produce a pseudorandom permutation of \(n=256\) cards. The original algorithm performs 256 steps of Cyclic-to-Random Transpositions. However, it is known that the mixing time of Cyclic-to-Random Transpositions is \(\varTheta (n\log n)\), so performing only 256 (i.e., n) steps seems far too few. In fact, it is recommended to perform at least 3072 steps, see Mironov [15]. Generally, the more steps are performed, the closer to uniform the final permutation is. Mironov considered the idealized version of the algorithm together with the following marking rule:

“At the beginning all cards numbered \(0,\ldots ,n-2\) are unchecked, the \((n-1)^{th}\) card is checked. Whenever the shuffling algorithm exchanges two cards, S[i] and S[j], one of the two rules may apply before the swap takes place:

  a. If S[i] is unchecked and \(i=j\), check S[i].

  b. If S[i] is unchecked and S[j] is checked, check S[i].

The event T happens when all cards become checked.”

Then the author proves that this is an SST for Cyclic-to-Random Transpositions and shows that there exists a constant c (which can be chosen less than 30) such that \(Pr[T>c n\log n]\rightarrow 0\) as \(n\rightarrow \infty \). Empirically, for \(n=256\) he shows that \(Pr[T>2n\log n]<1/n\). Note that this marking scheme is also valid for Random-to-Random Transpositions shuffling.
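The marking rule is easy to simulate. A sketch of one run (our own code; n = 64 for speed, with a loose sanity band around the Lemma 1 estimate \(2nH_n-n\approx 543\) for this n):

```python
import random

def mironov_sst(n, rng):
    """One run of Cyclic-to-Random Transpositions with Mironov's
    marking rule; returns the step T at which all cards are checked."""
    S = list(range(n))
    checked = [False] * n
    checked[n - 1] = True            # card n-1 starts checked
    num_checked, t = 1, 0
    while num_checked < n:
        i = t % n
        j = rng.randrange(n)
        # marking rules (a) and (b), applied before the swap:
        if not checked[S[i]] and (i == j or checked[S[j]]):
            checked[S[i]] = True
            num_checked += 1
        S[i], S[j] = S[j], S[i]
        t += 1
    return t

rng = random.Random(2024)
n, trials = 64, 300
avg = sum(mironov_sst(n, rng) for _ in range(trials)) / trials
assert 400 < avg < 700   # loose band around 2*n*H_n - n ~ 543
```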

Lemma 1

The expected running time of Random-to-Random Transpositions shuffling with Mironov’s stopping rule is:

$$ ET = 2nH_n -n + O(H_n). $$

Proof

We start with one card checked. When k cards are checked, the probability of checking another one is equal to \(p_k = \frac{(n-k)(k+1)}{n^2}\). Thus the time to check all the cards is distributed as a sum of independent geometric random variables and its expectation is equal to:

$$\sum _{k=1}^{n-1} \frac{1}{p_k} = 2 \frac{n^2}{n+1} H_n - n = 2nH_n - n + O(H_n).$$
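The identity above can be verified exactly with rational arithmetic (a sketch; the partial-fraction step \(\frac{1}{(n-k)(k+1)}=\frac{1}{n+1}\left( \frac{1}{n-k}+\frac{1}{k+1}\right) \) is what makes the closed form exact):

```python
from fractions import Fraction

n = 256
H = lambda m: sum(Fraction(1, k) for k in range(1, m + 1))

# Direct sum of expected geometric waiting times, p_k = (n-k)(k+1)/n^2.
ET = sum(Fraction(n * n, (n - k) * (k + 1)) for k in range(1, n))

# Closed form from the proof: 2 n^2/(n+1) H_n - n (an exact identity).
assert ET == 2 * Fraction(n * n, n + 1) * H(n) - n
```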

5.2 Better Stopping Rule

We suggest another, “faster” SST which is valid for both Cyclic-to-Random Transpositions and Random-to-Random Transpositions. We calculate its expectation and variance for Random-to-Random Transpositions and check experimentally (see Appendix C) that it behaves similarly when applied to Cyclic-to-Random Transpositions. As a result (the proof is given at the end of this section) we have:

Theorem 1

Let \(\mathcal {A}\) be an adversary. Let \(K\in \{0,1\}^{r n}\) be a secret key. Let \(\mathcal {S}(K)\) be \({\textsf {KSA}}^{*}_\texttt {RTRT}\) (i.e., with Random-to-Random Transpositions shuffling) which runs for

$$ r= n(H_n+1)+{\pi n\over 2} {1\over \sqrt{n!\varepsilon }}$$

steps with \(0<\varepsilon <{1\over n!}\). Then

$$\left| \underset{\scriptscriptstyle K \leftarrow \{0, 1\}^{rn}}{Pr} \left[ \mathcal {A}(\textsf {S}(K)) = 1 \right] - {\underset{\scriptscriptstyle R \leftarrow \mathcal {U}(\mathcal {S}_n)}{Pr}}[\mathcal {A}(R) = 1] \right| \le \varepsilon $$

The stopping rule is given in StoppingRuleKLZ algorithm.

figure e

Lemma 2

The resulting permutation of \({\textsf {KSA}}^{**}\) with ST =StoppingRuleKLZ has a uniform distribution over \(\mathcal {S}_n\).

Proof

We will show that the running time of the algorithm is an SST, i.e., that the card marking procedure specified in StoppingRuleKLZ is an SST for Cyclic-to-Random Transpositions. The first phase of the procedure (i.e., while fewer than \(\lceil (n-1)/2\rceil \) cards are marked) constructs a random permutation of the marked cards by placing unmarked cards on randomly chosen unoccupied positions; this is actually the first part of Matthews’s marking scheme [14]. The second phase is simply Broder’s construction; Theorem 9 of [15] shows that this is a valid SST for Cyclic-to-Random Transpositions. Both phases combined produce a random permutation of all cards.

Remark 1

One important remark should be made. The full Matthews’s marking scheme [14] is “faster” than ours. However, although it is an SST for Random-to-Random Transpositions, it is not an SST for Cyclic-to-Random Transpositions.

Calculating E[T] or Var[T] directly for Cyclic-to-Random Transpositions seems to be a challenging task. But note that the marking scheme StoppingRuleKLZ also yields a valid SST for Random-to-Random Transpositions. In the next lemma we calculate E[T] and Var[T] for this shuffle; later we experimentally show that E[T] is very similar for both shuffles.

Lemma 3

Let T be the running time of \({\textsf {KSA}}^{**}\) with Random-to-Random Transpositions shuffling and ST=StoppingRuleKLZ. Then we have

$$\begin{aligned} E[T] = nH_n + n + O(H_n), \qquad Var[T] \sim \frac{\pi ^2}{4}n^2, \end{aligned}$$
(3)

where \(H_n\) is the \(n-\)th harmonic number and \(f(k) \sim g(k)\) means that \(\lim _{k \rightarrow \infty } \frac{f(k)}{g(k)} = 1\).

The details of the proof of Lemma 3 are given in Appendix B.

Proof

Define \(T_k\) to be the first time when k cards are marked (thus \(T\equiv T_n\)). Let \(d=\lceil (n-1)/2\rceil \). Then \(T_d\) is the running time of the first phase and \(\left( T_n - T_d\right) \) is the running time of the second phase. Denote \(Y_k: = T_{k+1}-T_k\).

Assume that there are \(k<d\) marked cards at a certain step. Then a new card will be marked in the next step if we choose two unmarked cards, which happens with probability \(p_a(k)= {(n-k)^2\over n^2}.\) Thus \(Y_k\) is a geometric random variable with parameter \(p_a(k)\) and

$$\begin{aligned} E[T_d] = n^2\left( H_n^{(2)}-H_{n-d}^{(2)}\right) . \end{aligned}$$

Now assume that there are \(k\ge d\) cards marked at a certain step. Then, the new card will be marked in next step with probability:

$$p_b(k)= {(n-k)(k+1)\over n^2}$$

and \(Y_k\) is a geometric random variable with parameter \(p_b(k)\). Thus:

$$E[T_n-T_d] = nH_n - \frac{n}{n+1}H_n + {n^2\over n+1} \left( H_{n-d} - H_d \right) .$$

For variance we have:

$$Var[T_n] = Var[T_d]+Var[T_n-T_d] \sim \frac{\pi ^2}{4}n^2. $$

\(\square \)
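The two phase expectations can be checked numerically against the statement of Lemma 3. A sketch with exact rational sums (phase boundaries exactly as in the proof):

```python
from fractions import Fraction
import math

n = 256
d = math.ceil((n - 1) / 2)
H = lambda m: sum(Fraction(1, k) for k in range(1, m + 1))

# Phase 1: geometric steps with p_a(k) = (n-k)^2/n^2, k = 0..d-1 marked.
ET1 = sum(Fraction(n * n, (n - k) ** 2) for k in range(d))
# Phase 2: geometric steps with p_b(k) = (n-k)(k+1)/n^2, k = d..n-1.
ET2 = sum(Fraction(n * n, (n - k) * (k + 1)) for k in range(d, n))
ET = ET1 + ET2

# Lemma 3: E[T] = n H_n + n + O(H_n); the error term is of order H_n.
err = float(ET - n * H(n) - n)
assert abs(err) < 5 * float(H(n))
```

For n = 256 the exact sum differs from \(nH_n + n\) by only a few units, consistent with the \(O(H_n)\) error term.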

From Lemma 3 and Chebyshev’s inequality we immediately have the following:

Corollary 2

Consider the chain corresponding to \({\textsf {KSA}}^{**}\) with Random-to-Random Transpositions shuffling and ST=StoppingRuleKLZ. Then we have

$$\tau _{mix}^{sep}(\varepsilon )\le n(H_n+1)+{\pi n\over 2} {1\over \sqrt{\varepsilon }}.$$

Proof

(of Theorem 1). In Theorem 1 we perform Random-to-Random Transpositions for \(r=\tau _{mix}^{sep}(n!\varepsilon )\) steps, i.e., \(sep(\mathcal {L}(X_r),\psi )\le n!\varepsilon \). Inequality (2) implies that \(|Pr(X_r=\sigma )-{1\over n!}|\le \varepsilon \) for any permutation \(\sigma \), which completes the proof. \(\square \)

5.3 Predefined Number of Steps vs SST-based Algorithms

There is a subtle difference between a randomized stopping time (like the one suggested in this paper) and an algorithm that performs a predefined number of steps. If one wants to achieve a security level of, e.g., \(\varepsilon = O(1/{n^k})\), \(k > 2\), then the number of steps guaranteeing that the advantage is smaller than \(\varepsilon \) would need to be \(\tau _{mix}(\varepsilon )\le n(H_n+1)+{\pi n\over 2} {n}^{k/2} = O\left( n^{1+k/2}\right) \).
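To make the gap concrete, the following sketch (illustrative Python with hypothetical parameter choices \(n = 256\), \(k = 3\)) compares the expected running time of the SST-based algorithm with the predefined number of steps required for the same security level \(\varepsilon = 1/n^k\):

```python
import math

n, k = 256, 3                       # hypothetical parameters
H_n = sum(1 / j for j in range(1, n + 1))

# expected running time of the SST-based algorithm (Lemma 3): ~ n H_n + n
sst_expected = n * H_n + n

# predefined number of steps guaranteeing advantage < eps = 1/n^k
# (the bound above with eps = n^{-k}): n(H_n+1) + (pi n / 2) n^{k/2}
predefined = n * (H_n + 1) + (math.pi * n / 2) * n ** (k / 2)

print(f"SST expected steps: {sst_expected:.0f}")
print(f"predefined steps:   {predefined:.0f}")
print(f"ratio:              {predefined / sst_expected:.0f}x")
```

For these parameters the predefined-steps variant needs several hundred times more steps than the SST-based algorithm performs on average.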

As we can see, the estimated running time of Cyclic-to-Random Transpositions with both stopping rules is similar to the theoretical results for Random-to-Random Transpositions. Recall that for Cyclic-to-Random Transpositions the mixing time is known to be of order \(\varTheta (n\log n)\), see [16]; however, the constant was not determined. Based on our new SST and simulations (see Appendix C), one can state the following conjecture.

Conjecture 1

The mixing time \(\tau ^{sep}_{mix}\) for Cyclic-to-Random Transpositions satisfies \(\tau ^{sep}_{mix} \sim n\log n\) as \(n\rightarrow \infty \).

6 A Note on Spritz

Spritz [20] is a new stream cipher that was proposed in 2014 by Rivest and Schuldt as a possible replacement for RC4.

The first cryptanalytic results have already been obtained: an inefficient state-recovery attack [5] (with \(2^{1400}\) complexity), later improved by [6] (to \(2^{1247}\) steps). The second paper also presents devastating distinguishing attacks of complexity \(2^{44.8}\) (multiple key-IV setting) and \(2^{60.8}\) (single key-IV setting).

Here we analyze the distribution of the internal state of Spritz after the main part of the scheme (the \(\textsc {Shuffle()}\) procedure) is run. Similarly to the previous approach, we replace the deterministic part of the \(\textsc {Update()}\) function, \(j := k + S[j + S[i]]\), with its idealized version \(j :=\mathbf{random}(n)\).

The definitions of Spritz’ procedures that are of our interest are presented in Appendix E.

6.1 Sign Distinguisher

Although we did not find a strong stationary time for the "KSA" part of the Spritz algorithm, one can easily notice that the Sign Distinguisher has no advantage at all. This property is achieved thanks to the Crush procedure. During this procedure the table S is partially sorted, i.e., elements at positions v and \(n-1-v\) (for \(v = 0, \ldots , \lfloor n/2 \rfloor -1\)) are swapped whenever \(S[v] > S[n-1-v]\). This corresponds to multiplying the sign process by:

$$M_{crush} := \left( \begin{array}{cc} \frac{1}{2} & \frac{1}{2}\\ \frac{1}{2} & \frac{1}{2} \end{array} \right) .$$

If Spritz is used as a stream cipher then, as part of the Squeeze procedure, at least one call to Shuffle is made. So the distribution of the sign can be described as

$$v_0 \cdot M_n^{2n} M_{crush}^{n/2} M_n^{2n} M_{crush}^{n/2} M_n^{2n} = \left( \frac{1}{2}, \frac{1}{2}\right) .$$

This means that the advantage of the Sign Distinguisher for Spritz equals 0.
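This collapse can be verified by a direct \(2\times 2\) computation. In the sketch below the one-step matrix \(M_n\) is assumed to flip the sign with probability \((n-1)/n\) (the sign of a transposition flips unless the two chosen positions coincide); the key point is that \(M_{crush}\) maps any distribution to \((1/2, 1/2)\), which is stationary for \(M_n\), so the whole product collapses regardless of what precedes the first Crush:

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def vecmat(v, A):
    """Row vector times a 2x2 matrix."""
    return [sum(v[t] * A[t][j] for t in range(2)) for j in range(2)]

def matpow(A, e):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(e):
        R = matmul(R, A)
    return R

n = 64  # illustrative state size
# assumed one-step sign matrix: the sign flips with probability (n-1)/n
M_n = [[1 / n, (n - 1) / n], [(n - 1) / n, 1 / n]]
M_crush = [[0.5, 0.5], [0.5, 0.5]]

v = [1.0, 0.0]  # the sign process starts deterministically
for M, e in [(M_n, 2 * n), (M_crush, n // 2), (M_n, 2 * n),
             (M_crush, n // 2), (M_n, 2 * n)]:
    v = vecmat(v, matpow(M, e))

print(v)  # approximately [0.5, 0.5]
```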

6.2 Position Distinguisher

Let us recall one of the main drawbacks of the original RC4: it performed n steps instead of \(c n\log n\). The underlying mathematical model is simply the Cyclic-to-Random Transpositions card shuffle. This is somewhat similar to Random-to-Random Transpositions, which takes of order \(n\log n\) steps. More precisely, there is a so-called cutoff phenomenon at \({1\over 2}n\log n\); roughly speaking, lower and upper bounds are of this order. The analysis of Cyclic-to-Random Transpositions seemed to be harder; recently [16] the matching lower bound was established, showing that the mixing time is of order \(\varTheta (n\log n)\).

Recall that Spritz performs in total 6n steps of Cyclic-to-Random Transpositions (as part of the Whip procedure), and partial sorting of the elements in the internal state (the Crush procedure) is performed twice (after the 2nd and 4th shuffling). This is of course a more complicated shuffle than simply repeating Cyclic-to-Random Transpositions.

The clever use of Crush lets Spritz get rid of the Sign Distinguisher, but at the same time it seems that it may badly influence the mixing time.

Imagine that there exists some SST which, during the execution of Spritz, marks elements. Marked elements satisfy the property that their mutual positions are equally distributed. Now look at a step when Crush is performed and there are two marked elements at positions v and \(n-1-v\). After Crush their relative position will be uniquely determined! This observation suggests that the mixing time for Spritz may be greater than \(n \log n\).
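The observation can be checked directly: after a single Crush pass, the relative order of any pair sitting at mirrored positions v and \(n-1-v\) is forced. A minimal sketch (illustrative Python; `crush` follows the compare-and-swap description above, not Spritz's exact register updates):

```python
import random

def crush(S):
    """Partial sort from the Crush description: compare-and-swap
    mirrored positions v and n-1-v, keeping the smaller element first."""
    n = len(S)
    for v in range(n // 2):
        if S[v] > S[n - 1 - v]:
            S[v], S[n - 1 - v] = S[n - 1 - v], S[v]
    return S

n = 256
for _ in range(1000):
    S = list(range(n))
    random.shuffle(S)          # whatever the state was before Crush...
    crush(S)
    # ...afterwards every mirrored pair is ordered: S[v] < S[n-1-v], so the
    # relative position of two marked cards at v and n-1-v is fixed.
    assert all(S[v] < S[n - 1 - v] for v in range(n // 2))
print("relative order of mirrored pairs is always determined after Crush")
```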

7 A Note on Optimal Shuffling

Cyclic-to-Random Transpositions is the shuffle used in RC4 and in Spritz. To reach stationarity (i.e., produce a random permutation), as we have shown, one needs to perform \(O(n\log n)\) steps. In each step we use a random number from the set \(\{0,\ldots ,n-1\}\), so this shuffle requires \(O(n\log ^2 n)\) random bits.

One can ask the following question: is this shuffle optimal in terms of the number of required bits? The answer is no. The entropy of the uniform distribution on \(\mathcal {S}_n\) is \(O(n\log n)\) (since there are n! permutations of [n]), so one could expect that an optimal shuffle would require this number of bits.
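For a concrete sense of the gap, a short computation (with the hypothetical choice \(n = 256\), the RC4 state size) compares the entropy lower bound \(\log _2(n!)\) with the roughly \(nH_n \cdot \log _2 n\) bits consumed by Cyclic-to-Random Transpositions run to stationarity:

```python
import math

n = 256
H_n = sum(1 / j for j in range(1, n + 1))

# information-theoretic lower bound: log2(n!) = Theta(n log n) bits
entropy_bits = math.lgamma(n + 1) / math.log(2)

# Cyclic-to-Random Transpositions: ~ n H_n steps, log2(n) bits per step
ctr_bits = n * H_n * math.log2(n)

print(f"entropy lower bound: {entropy_bits:.0f} bits")
print(f"Cyclic-to-Random:    {ctr_bits:.0f} bits")
print(f"overhead factor:     {ctr_bits / entropy_bits:.1f}x")
```

The overhead factor is of order \(\log n\), matching the \(O(n\log ^2 n)\) versus \(O(n\log n)\) comparison in the text.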

We will shortly describe the (time reversal of the) Riffle Shuffle; for details see [2]. Given a permutation \(\sigma \in \mathcal {S}_n\), we assign each element a random bit. Then we move all the elements (cards) with assigned bit 0 to the top, keeping their relative ordering. The following is a known SST for this shuffle: at the beginning all \(\binom{n}{2}\) pairs of cards are unmarked; at each step one marks a pair (i, j) if elements i and j were assigned different bits. Let T be the first time all pairs are marked.

The above SST for the Riffle Shuffle has \(E[T]=2\lg n\); at each step n random bits are used, so this shuffle requires \(2n\lg n\) random bits, matching the requirement of an optimal shuffle (up to a constant).
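The claim \(E[T]\approx 2\lg n\) is easy to check by simulation. A pair (i, j) stays unmarked exactly as long as cards i and j have received identical bit sequences, so T is simply the first step at which all n bit histories are pairwise distinct. A minimal sketch (plain Python, illustrative parameters; the function name is ours):

```python
import math
import random

def riffle_sst_time(n):
    """One run of the time-reversed riffle shuffle SST: return the first
    step at which all n cards carry pairwise distinct bit histories."""
    labels = [0] * n            # bit history of each card, packed into an int
    t = 0
    while len(set(labels)) < n:
        labels = [2 * lab + (random.random() < 0.5) for lab in labels]
        t += 1
    return t

n, trials = 64, 2000
avg = sum(riffle_sst_time(n) for _ in range(trials)) / trials
print(f"n = {n}: average T = {avg:.2f}, 2 lg n = {2 * math.log2(n):.0f}")
```

For \(n = 64\) the empirical average lands close to \(2\lg n = 12\), consistent with the stated expectation.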

In this paper we mainly focused on RC4 and thus on the Cyclic-to-Random Transpositions shuffle. However, we want to point out that using the Riffle Shuffle (or other shuffling schemes) can result in better efficiency of the whole scheme.

8 Conclusions

We presented the benefits of using Strong Stationary Times in cryptographic schemes (pseudorandom permutation generators). These algorithms have a built-in "health check" and guarantee the best possible properties when it comes to the quality of randomness of the resulting permutation. We showed that the use of an SST does not lead to timing attacks, and that algorithms using an SST achieve better security guarantees than any algorithm which runs a predefined number of steps.

Fig. 2.

Comparison between the number of bits used by RC4 (40 to 2048) and the number required by the mathematical models (Mironov [15] and ours), versus the length of the key, for the time-reversed riffle shuffle. "# bits asymptotics" approximates the number of fresh bits required by the mathematical model (the number of bits required by the underlying Markov chain to converge to the stationary distribution). "Bits required" is the (rounded) value of "# bits asymptotics" for \(n = 256\).

Complementarily, we proved a better bound for the mixing time of the Cyclic-to-Random Transpositions shuffling process used in RC4, and showed that different, more efficient shuffling methods (e.g., the time reversal of the Riffle Shuffle) may be used as a KSA. This last observation shows that the gap between the mathematical model (4096 bits required) and reality (2048 bits allowed as the maximum RC4 key length) is not as big as previously thought (cf. the bound of 23037 by Mironov [15]).