1 Introduction

The RC4 stream cipher has a simple but robust structure, which, after going through significant analysis by researchers, proves to be robust enough on different platforms. The core of this cipher is only one internal state table of size N (generally 256), which eventually acts as an S-box to perform the main jumbling of bytes. There are two components in RC4: the KSA, i.e. the Key-Scheduling Algorithm, and the PRGA, i.e. the Pseudo-Random Generation Algorithm. The first one is usually employed for S-box initialisation and the second one generates the actual key-stream. In both components, the swap function plays an important role in exchanging the positions of two bytes. Researchers argued that the swap function introduces initialisation weakness in the S-box, as well as strong biases in the key-stream.

In this paper, the authors have first attempted to find out the optimal number of bytes that should be discarded to remove the biases from RC4. To study whether the security of RC4 really increases if more and more initial bytes are discarded from the key-stream, here N (256), 2N (512), 4N (1024) and lastly 8N (2048) initial bytes are discarded and then the outputs are analysed along with the original algorithm [1], following the guidelines of NIST (National Institute of Standards and Technology), USA, in their Statistical Test Suite, coded by the authors. It has been found that there is a certain optimal level of discarding the output bytes—it is not that discarding as many bytes as possible helps continuously in increasing the security.

Also, different logics have been introduced to modify the PRGA to handle at least two S-boxes to generate a more complicated key-stream, which has been analysed along with the model proposed by Paul and Preneel [2]. The authors have also introduced a key-expansion logic, which gives a stronger initialisation to the S-box and calculates the initial value of j in the PRGA from the key values, not as 0, as given in the original algorithm [1]—thus giving a more dynamic value to j. RC4 is described as follows:

2 Existing articles and motivation

Maitra and Paul [1] revealed the non-uniformity in KSA and presented a three-layer architecture in a scrambling phase to remove the weaknesses of the KSA and the PRGA. Their modified algorithm, named as RC4+, resolves most of the existing weaknesses of RC4. Paul and Preneel [2] described a new statistical bias in the distribution of the first two output bytes of RC4 and suggested some more random variables in the PRGA to reduce correlation between states, forming a new variant, RC4A. They proposed dropping 2N initial bytes. Rivest and Schuldt [3] used extensive simulations to search biases in RC4 and suggested that about 281 output bytes are required to distinguish a new variant. They named the improved variant as ‘Spritz’. Mironov [4] identified a weakness in RC4, stemming from an imperfect shuffling algorithm used in the KSA and the PRGA. He argued that discarding the initial 12N (or at least 3N) bytes from the output stream eliminates the possibility of strong attacks. Klein [5] argued that even after many steps of the PRGA, it is possible to exploit the weakness of the KSA. They also analysed RC4A and found that it also has weaknesses, similar to that of RC4—a new attack and a correlation between the output and the internal state of RC4 have been presented.

Roos [6] mentioned some limitations in the KSA by identifying several classes of weak keys for RC4 with some important technical results. He has defined strong correlations between the secret key bytes and the final key-stream generated. Akgün et al [7] showed that the KSA leaks information about the secret key if the initial state table is known. A new bias in the KSA has been detected by them, and they proposed the framework of a new algorithm to retrieve the RC4 key in a faster way. Mantin and Shamir [8] noted down the major non-uniformity in the output bytes at the second round of RC4, whose probability is twice the expected value. They pointed out that a cipher-text-only attack can exploit the weakness in broadcast applications.

SenGupta et al [9] studied the RC4 designing problem of throughput. An architecture to generate two key-stream bytes per clock using the idea of loop unrolling and pipelining has been implemented. Nawaz et al [10] introduced a new 32-bit RC4-like faster key-stream generator with a huge internal state and offered higher resistance against state recovery attacks. Tomašević and Bojanić [11] proposed a new technique to improve cryptanalytic attack on RC4 based on a tree representation of RC4. The strategy has been used to favour more promising values that should be assigned to unknown entries in the RC4 table. Grosul and Wallach [12] observed that by increasing key length, the strength of the RC4 key did not grow linearly and advised discarding the first 256 bytes of the key-stream. They showed that for each 2048-bit key, a family of related keys generates similar key-streams.

Al Fardan et al [13] recommended that RC4 should be avoided in TLS (Transport Layer Security) and WPA (Wi-Fi Protected Access). They have shown how plaintext recovery for RC4 is possible from arbitrary positions in the plaintext. Zoltak [14] reported that among 20 RC4-like algorithms, only VMPC-R produces strong pseudo-random output. They concentrated mainly on the PRGA, keeping the KSA same for all. They found statistical weaknesses in almost all of them. Fluhrer and McGrew [15] proved that the joint distribution of two successive bytes differs significantly from the uniform distribution. They computed the joint probability of two consecutive output bytes, which requires much less key-stream bytes. Sepehrdad et al [16] presented a tool that may be used as an application of automated discovery of weaknesses in ciphers—this may suggest a new kind of tool for cryptanalysts. They presented several weaknesses in RC4 by this technique.

Church [17] gave a complete list of irreducible polynomials for prime moduli (2, 3, 5, 7 and 11) of each degree. The determination of the exponents provided very satisfactory control of the irreducibility of the polynomials. Daemen and Rijmen [18] defined a process of creating S-boxes by a mathematical process in GF(28) (Galois Field). The process of calculating the multiplicative inverse of a byte with an irreducible polynomial as modulus has been explained. The publication SP 800-90A of NIST [19] contains specifications for cryptographically secured PRNGs (Pseudo-Random Number Generators), providing some methods based on hash functions, block cipher algorithms or number theoretic problems.

3 Proposed modifications to RC4

Roos [6] observed a strong correlation in RC4 between the S-box and the key. He pointed out the swap function of KSA as responsible for this while initialising the S-box. The line j = j + S[i] + K[i], preceding the swap function, calculates the indices i and j, where i is deterministic and j is pseudo-random. He argued that by this way, a particular element may change its position once or more, or may not relocate at all,—which becomes a major weakness in the cipher. According to his analysis, there is a high probability of about 37% that an element is swapped not even once. To get rid of this weakness, he proposed to discard at least N number of initial bytes from the key-stream.

Mironov [4] also identified the same kind of weaknesses in RC4 key-stream due to improper swap function, as he mentioned, as the prime cause of bias. He observed that the bias appears up to the first 2N or 3N bytes. Using an abstract model, he calculated a maximum number of 12N initial bytes to be dumped, to get a safe key-stream.

In this paper, an attempt has been made to identify the optimum number of bytes to be discarded from RC4 key-stream before starting the actual encryption. Undoubtedly, dumping more and more bytes from the output stream, as a result, does not keep on increasing the security. Here, a total of five key-streams have been analysed, generated by discarding 0, N, 2N, 4N and 8N (N = 256) bytes, in separation; the first one is the original cipher, and others are identified as RC4_N, RC4_2N, RC4_4N and RC4_8N, respectively.

Moreover, a new method of initialising the RC4 S-box has been proposed through modular arithmetic. The 256 cells of the S-box have been filled up by the multiplicative inverses of the 256 bytes (0–255), where an irreducible polynomial [16, 17, 20] in GF(28) has been used as the modulus [21, 22]. By this way, the S-box can be initialised with 256 random-like values, which is the primary aim of the KSA. This is how one can take care of the controversial swap function. The list of these polynomials is given as follows:

To ensure that the S-box is more unpredictable, these initial values have then been merged with the key bytes, by introducing a key-mixing logic, described in the following pseudocode. The logical block diagram of the same is shown in figure 1.

Figure 1
figure 1

Block diagram of the key-mixing procedure to initialise the RC4 S-box (represents addition mod 256).

By this way, each bit of the key is diffused into several stages, and sufficient nonlinearity is introduced against reconstructing the state array, to make it more resistant to known cryptanalytic attacks. This key-mixing process is simple to implement on any platform, removing biases and weak keys inserted by the swap function of KSA.

In the original RC4, values of the S-box are only swapped, not updated. Researchers argue on this point that many a value for the S-box may not at all move in this process, and thus serious bias occurs in the output of this cipher. Keeping the increase in security and robustness in mind as the main points, the authors tried to minimise the bias by removing the conventional swap from the KSA portion of RC4 and achieving the key-mixing logic.

While updating values with key-mixing, first the newly generated value is stored in a binary tree. Then, a binary search is performed on the tree from 0 to i 1 to search for a duplicate, with time complexity O(log i). If the newly generated value is found to be a duplicate one, the same is incremented by 1 (mod 256), and again a new search is performed, from the very beginning. Once it is confirmed that the newly generated value is unique from S[0] to S[i] up to the Nth value (N = 256, here), then only the value is taken as granted. After sufficient experimental and implementational observations, it can be concluded that the aforementioned algorithm is fast and efficient enough, and there is no scope of generating duplicate values once it completes its operation. Future work might be interesting for the mathematicians and researchers in this regard.

Though the aforementioned procedure definitely increases the time complexity of the cipher, it can be mentioned that the key-mixing is used only in place of the KSA, i.e. the S-box initialisation only. As the PRGA remains unchanged for the rest of the encryption process, which generally uses data of very large size that may tend to infinity, the increased time complexity of S-box initialisation will not eventually affect the overall time complexity of the cipher.

As a next phase, following the proposal of Paul and Preneel [2], multiple S-boxes have been used to make the output stream more randomised. Their algorithm of PRGA, marked as RC4_2A here, has been reformed by the authors, where two irreducible polynomials from GF(28) acted as moduli to initialise two S-boxes; the key-mixing logic has not been utilised in this part. Initial values of j1 and j2 have been calculated from key values, not from 0 as in RC4 or its other variants, thus, giving more dynamic values while starting the PRGA. The modified PRGA, marked as RC4_2B, is given as follows:

In another attempt, PRGA has been reformed again, where two values of i: i1 and i2, for each S-box, have been introduced, initialised as 0, forming a new variant. Initial values of j1 and j2 have been calculated using the same logic as before. The basic aim is to create the key-stream more random to make it harder for the intruder to break. This variant, marked as RC4_2C, is as follows:

Finally, another variant, with a single S-box, has been formed by implementing the key-mixing logic, whose initialisation has been done by calculating the multiplicative inverses using modular arithmetic, as described before. This algorithm is much simpler, and its space complexity is also reasonably less. Furthermore, in this way, the KSA has been replaced by a new function. This one is also analysed along with the aforementioned variants to check efficiency.

All these variants of RC4, along with the original one, have been statistically analysed using the NIST Statistical Test Suite, containing 15 different statistical tests to check the robustness of a cipher-text. For each test, a P-value (probability value) will be generated, from which two parameters, POP (Proportion of Passing) and UD (Uniformity Distribution), will be calculated to compare the results. For all the algorithms, an example text file has been encrypted 500 times using 500 same encryption keys, generating 500 cipher-texts for each algorithm, each containing 1,342,500 bits, as has been recommended by NIST [23]. Results are then compared to find out how the security varies following the steps of these algorithms.

4 Results and discussion

The analysis proves that though RC4 itself is sufficiently secured, the new variants claim themselves even more efficient than RC4. It has been found that the reformed algorithms create a tweak in RC4 to increase security. First of all, the optimum number of initial key-stream bytes to be discarded is determined by analysing RC4, RC4_N, RC4_2N, RC4_4N and RC4_8N. It has been observed that discarding some initial bytes undoubtedly increases the randomness in outputs, but discarding more and more bytes eventually may not increase the security, and at some point, ‘the beginning of RC4 ends’ [4].

Tables 1(a) and (b) show the results of the analysis, where the POP status and UD of NIST tests [23,24,25] for these five variants, are compared. Percentages of the probable best results in a particular test for each algorithm have been calculated (shaded in rows), which are then summed at the bottom of the table. The highest count (here, for RC4_4N) gives the winner, showing that this one is more robust than the remaining.

Table 1 Comparison of (a) POP status and (b) UD status generated by 15 NIST tests for discarded bytes.

POPs and UDs for the five algorithms, compared to the expected values, are shown in tables 2(a)–(e). The earlier two tables (table 1(a) and (b)) have been generated from these five tables. Distributions of P-values for the same data sets are displayed in table 3(a)–(e). The interval between 0 and 1 is divided into 10 sub-intervals, and P-values lying within each sub-interval are counted; distribution of P-values should be uniform in each sub-interval [23]. Figure 2 shows histograms on the distribution of P-values for two tests 4 and 8.

Table 2 POP status and UDs for (a) RC4, (b) RC4_N, (c) RC4_2N, (d) RC4_4N and (e) RC4_8N.
Table 3 P-value distribution for (a) RC4, (b) RC4_N, (c) RC4_2N, (d) RC4_4N and (e) RC4_8N.
Figure 2
figure 2figure 2

P-value distributions of test 4 for (a) RC4, (b) RC4_N, (c) RC4_2N, (d) RC4_4N, (e) RC4_8N. P-value distributions of Test 8 for (f) RC4, (g) RC4_N, (h) RC4_2N, (i) RC4_4N, (j) RC4_8N.

Figure 3 displays scattered graphs on the POP status for the 15 tests, which examine the proportion of sequences that pass a test. A threshold value (expected POP) has been calculated following the guidance given by NIST [23]. If the proportion falls outside of (i.e., less than or below) this expected value, then there is evidence that the data is not random [23]. If most of the values are greater than (or above) this expected value, then the data is considered to be random. For any algorithm, the more the number of POPs that tend to 1 for the 15 tests, the more the data set is assumed to be random.

Figure 3
figure 3

POP status of 15 NIST tests for (a) RC4, (b) RC4_N, (c) RC4_2N, (d) RC4_4N, (e) RC4_8N.

Thus, the observation is that discarding too many of the initial key-stream bytes does not help for increasing the security of RC4; it requires an optimum value in between to be maintained, which in the current data set has been found to be 4N, i.e. here 1024.

The next analysis is based on multiple S-boxes in RC4 (RC4_2A, RC4_2B and RC4_2C), along with the newly proposed variant using modular arithmetic and key-mixing logic to initiate the S-box. The results are shown in tables 4(a) and (b).

Table 4 Comparison of (a) POP status and (b) UD status generated by 15 NIST tests for RC4 variants including key-mixing.

Now it is clear from these two tables that obviously the new variant with key-mixing logic is able to establish itself as the best, even with a simpler logic and a single S-box among the other variants. Hence, it can now be decided that using only one S-box and an unchanged PRGA has a better time and space management than the variants with multiple S-boxes. POPs and UDs of this data set are depicted in table 5(a)–(e), and distributions of P-values generated by the same are included in tables 6(a)–(e) as per the rules stated earlier.

Table 5 POP status and UDs for (a) RC4, (b) RC4_2A, (c) RC4_2B, (d) RC4_2C and (e) key-mixing.
Table 6 P-value distribution for (a) RC4, (b) RC4_2A, (c) RC4_2B, (d) RC4_2C and (e) key-mixing.

Figure 4 shows histograms of the distribution of P-values for two tests 4 and 8 for four variants including original RC4. Scattered graphs on the POPs for the tests 4 and 8 for the same four variants are given in figure 5 to find the most secured variant following the logic as stated earlier.

Figure 4
figure 4figure 4

(a) and (b) P-value distributions of tests 4 and 8 for RC4. P-value distributions of tests (c) 4 and (d) 8 for RC4_N. (e) 4 and (f) 8 for RC4_2N. (g) 4 and (h) 8 for RC4_4N. (i) 4 and (j) 8 for RC4_8N.

Figure 5
figure 5

POP status of 15 NIST tests for (a) RC4 and (b) RC4_2A, (c) RC4_2B, (d) RC4_2C, (e) key-mixing.

These figures indicate that for key-mixing logic, more number of points tend to 1, compared with the remaining. Hence, graphically too, it is now evident that this particular variant gives more secured output among the other RC4 alternatives. It also points out that if an RC4 variant with single S-box and a simple key-mixing logic give better security than the ones with multiple S-boxes, it inevitably possesses much less operational complexity and better space management.

5 Conclusion

Besides having enough robustness, one can envisage that the internal state array of RC4 has some limitations and internal biases due to which convenience may be available to the intruders, as argued by a number of researchers. The security of RC4 will be undoubtedly enhanced if the state array can be initialised using any intelligent technique, where it becomes stronger for different applications. Here, by discarding an optimum number of initial bytes and accompanying modified intelligent PRGAs with multiple S-boxes, the security of RC4 has been grossly enhanced.

Using the key-mixing logic, it has been found that the security of RC4 can be strongly established without using multiple S-boxes and keeping the original PRGA unchanged. Other mathematical procedures, models, logic and modifications may also be employed to refine RC4 and studies on them are required to find better opportunities to generate more secured key-streams.