1 Introduction

With the development of electronic commerce, one-way hash function has been widely developed in public-key cryptography, digital signatures [1], integrity verification [2], message authentication and dynamic password authentication [3], etc. A hash function is a one-way function that can be used to map digital data of arbitrary size to digital data with fixed size. The digital data returned by a hash function are referred to as hash value or message digest. Hash functions have the properties of sensitivity to initial conditions, diffusion and confusion, collision resistance. There are some traditional one-way hash algorithms, such as MD2, MD4, MD5, and SHA [4, 5], but most of them are based on the hypothesis of complexity that requires lots of complicated logic computing (such as XOR) or packet encryption method-based multiple iterations. Nevertheless, there are inherent defects in logic operation although the computation is simple for the further method, and the computation of the latter method is large and it is hard to find fast and reliable encryption methods simultaneously. Chaos has some inherent merits of one way, sensitivity to tiny modifications in initial conditions and parameters, mixing property and ergodicity, which can be used for designing chaotic hash functions. In the last decades, many researchers propose different hash algorithms based on chaotic maps [619], encouraged by the specific properties of chaotic dynamical systems such as high sensitivity to the initial conditions, ergodicity, high complexity induced by their simple analytic expressions. However, these chaotic maps in cryptanalytic studies reveal security weakness [2027].

In this paper, we present a fast and efficient hash algorithm based on a generalized chaotic mapping with variable parameters. A piecewise linear chaotic map with variable parameter p is chosen, and a generalized chaotic mapping is defined by utilizing the piecewise linear chaotic map and trigonometric functions [such as sin(x) and cos(x)]. Then, we convert the arbitrary length of message into the corresponding ASCII values and perform 6-unit iterations with variable parameters and message values based on the generalized chaotic mapping, where the variable parameter p is updated by the generalized chaotic mapping. The final hash value is obtained by cascading extracted bits from iteration state values. We excessively evaluate the proposed algorithm in terms of distribution of hash value, sensitivity of hash value to the message and secret keys, statistical analysis of diffusion and confusion, analysis of birthday attacks and collision resistance, analysis of secret keys, analysis of speed, and comparison with other algorithms, and the results illustrate that the suggested algorithm is fast, efficient, and enough simple and has good confusion and diffusion capabilities, strong collision resistance, and a high level of security.

The main contributions of this work can be summarized as follows:

  • We present a fast and efficient hash algorithm based on a generalized chaotic mapping with variable parameters.

  • We define a generalized chaotic mapping by utilizing piecewise linear chaotic map and trigonometric functions.

  • We excessively evaluate the proposed algorithm, and the results illustrate that it has good capabilities of confusion and diffusion, strong collision resistance.

The remainder of this paper is organized as follows: Sect. 2 introduces the preliminaries of piecewise linear chaotic map and defines a generalized chaotic mapping used in the hash algorithm. In Sect. 3, we design the chaotic hash algorithm based on a generalized chaotic mapping with variable parameters in detail. We excessively evaluate the performance of the proposed hash algorithm in Sect. 4 and present conclusions in Sect. 5.

2 Preliminaries

To learn about the proposed hash algorithm, we should have some related preliminaries. In this section, we first introduce piecewise linear chaotic map, which is utilized for constructing a generalized chaotic mapping with trigonometric functions, and then define the generalized chaotic mapping in detail, which is used to iterate the message to generate hash values in the algorithm.

2.1 Piecewise linear chaotic map (PWLCM)

The chaotic tent map chosen in the hash algorithm is one-dimensional and piecewise linear chaoticmapping. Then, we define a gen map expressed as follows:

$$ X(t + 1) = F_{p} (X(t)) = \left\{{\begin{array}{ll}X(t)/p & 0 \le X(t) < p, \\ (X(t) - p)/(0.5 - p), & p \le X(t) < 0.5, \\ (1 - p - X(t))/(0.5 - p), & 0.5 \le X(t) < 1 - p, \\ (1 - X(t))/p, & 1 - p \le X(t) \le 1, \end{array}} \right. $$

where X(t) represents the iteration trajectory value and p denotes the control parameter. When p is assigned values in (0, 0.5), X(t) evolves into a chaotic state in range of (0, 1). That is, X(t) sequence values vary in (0, 1) and distribute independently. The PWLCM has properties of uniform distribution, good ergodicity, confusion and diffusion; therefore, it can provide chaotic random sequences. An explicit analysis of the bifurcation diagram of the PWLCM shows that with the specified initial value X(0) and parameter p, its iterative values are fixed, which are listed in Table 1. The map is running in a chaotic state within the range (0, 1), except for the specified values in Table 1. Therefore, we use the map as a key generator to produce the four initial buffers for our hash algorithm. Moreover, based on Ref. [28], we believe{X(t)}is ergodic and uniformly distributed in the interval (0, 1) and the autocorrelation function is δ-like [52].

Table 1 Fixed iterative values of PWLCM with the specified initial value X(0) and parameter p

2.2 Generalized chaotic mapping (GCM)

To realize multiple chaotic mappings under a unified structure, a key factor is that it should ensure each input of these chaotic maps does not exceed its value range [29]. Therefore, chaotic maps with same value range will be given preferential consideration to construct the generalized chaotic mapping. Then, we define a generalized chaotic mapping model as follows:

$$ \begin{aligned} f(t + 1) & = G(c(t + 1),\,X(t),\,f(t)) = a_{1} \times \frac{c(t + 1)}{256} + a_{2} \times \left| {\sin (c(t + 1))} \right| + a_{3} \times \left| {\sin (c(t + 1))} \right|^{2} \\ & \quad + a_{4} \times \left| {\sin (c(t + 1))} \right|^{3} \,+\, a_{5} \times \left| {\cos (c(t + 1))} \right| + a_{6} \times (1 - X(t)^{2} ) + a_{7} \times (1 - f(t)^{3} ) \\ \end{aligned} $$

where a i (i = 1, 2, 3,…, 7) only have two values: 0 or 1, and each variable element has the same value range of (0, 1). c(t + 1) is the (t + 1)th element of an array composed of the ASCII code values of message characters. X(t) is the tth iteration value of PWLCM. When a i (i = 1, 2, 3,…, 7) take different values, GCM illustrates different chaotic maps, and this kind of structure can easily control the range of the results. As we can see, the GCM is composed of outputs of PWLCM and trigonometric functions.

3 Description of the proposed algorithm

In this section, we design the fast and efficient hash algorithm based on a generalized chaotic mapping with variable parameters. The input is an arbitrary length of message M, and the output is l-bit hash value, where l = 128. The reason we choose the 128-bit length of hash value is that it is sufficient enough to ensure the security. We describe the hash algorithm in three steps of message preprocessing, hash algorithm description, and hash value generation, in the following:

  • Step 1 (Message preprocessing) We first convert the original message M into the corresponding ASCII code values and then store these values into an array c, where we denote the length of the array by n.

  • Step 2 (Hash algorithm description) We assign values to secret keys (f(0), X(0)) as f(0) = 0, X(0) = 0.2323, where f(0) is the input of GCM and the output is used to dynamically change the control parameter p of PWLCM, and X(0) is the initial value of PWLCM. Then, we perform 6-unit iterations, 1st-nth, (n + 1)th–2nth, (2n + 1)th–3nth, (3n + 1)th–4nth, (4n + 1)th–5nth, (5n + 1)th–6nth, on message array c. For each iteration, we first assign values to parameters a 1, a 2, a 3,…, a 7 in GCM: For i = 1 to n, we calculate s(i) = c(i) mod 8 and then assign values as a 1,…,a s(i) = 1 and a s(i+1),…,a 7 = 0. There are two exceptions: One is that when s(i) = 0, then we set s(i) = 7; the other is that, for some special conditions, for example, when all messages are all the same (such as all blank-space message), let coefficients of variables \( \frac{c(i + 1)}{256} \) and (1 − X(i))2 be 1 and others be 0. Then, we iterate GCM n times to generate f(n), then recalculate \( f(n) = \left| {f(n) - \left\lfloor {f(n)} \right\rfloor - 0.5} \right| \) such that f(n) is in the range of (0,0.5), and compute X(n) = F f(n)(X(n − 1)), which are then used as the inputs for the next iteration. The detailed 6-unit iteration processes are described in Algorithm 1:

    figure afigure a
  • Step 3 (Hash value generation) After 6-unit iterations, we obtain X(n), X(2n), X(3n), X(4n), X(5n), X(6n), then convert them to their corresponding binary formats, extract 20, 20, 20, 20, 24, 24 bits after their decimal point, respectively, and finally combine them to generate a 128-bit final hash value.

4 Performance analysis

We evaluate the proposed hash function based on a generalized chaotic mapping with variable parameters in terms of distribution of hash value, sensitivity of hash value to the message and secret keys, statistical analysis of diffusion and confusion, analysis of birthday attacks and collision resistance, analysis of secret keys, analysis of speed, and comparison with other algorithms. The arbitrary length of message for evaluating the performance of the proposed hash algorithm is randomly chosen as:

Southwest University (SWU) is a key comprehensive university, under the direct administration of the Ministry of Education. It was newly established in July 2005 through the incorporation of former Southwest China Normal University and Southwest Agricultural University upon the approval of the Ministry of Education. SWU is situated nearby the beautiful Jialing River, and is located at the foot of Jinyun Mountain, a state level scenic spot, in Beibei District, Chongqing Municipality.

4.1 Distribution of hash value

One of the most important properties of a hash function is the uniform distribution of hash value, which is directly related to the security of the hash function. We conduct the hash simulation experiment on the randomly chosen message and then use two two-dimensional graphs to display the distribution of the message and the final hash value. First of all, we plot the randomly chosen message in Fig. 1a, and we can see that the decimal ASCII code values of the message distribute in a small range of [32, 127], while in Fig. 1b, the distribution of the corresponding hash value in hexadecimal format is very irregular. To make a comparison, we choose an extreme message, “all 0”-message, and we can see that the distribution of the message in Fig. 2a and the distribution of the corresponding hash value in hexadecimal format in Fig. 2b also spread irregularly. The simulation results indicate that no information (including the statistic information) of the message can be left after the diffusion and confusion.

Fig. 1
figure 1

Spread of message and hash value: a distribution of the message in ASCII code and b distribution of the hash value in hexadecimal format (6FF0FEBF7460A59ED5EF9F8AEBED0B20)

Fig. 2
figure 2

Spread of “all 0”-message and hash value: a distribution of all “all 0”-message and b distribution of the hash value in hexadecimal format (6C79365B9EFD85966F599385F47568CE)

4.2 Sensitivity of hash value to the message and secret keys

According to the characteristics of a hash function, it is hard to find the original message if the hash value is known, which indicates that a hash function must be sensitive to tiny modifications in original message or secret keys. In particular, any slight modifications on message or secret keys will lead to a 50 % difference in hash value, according to a Hamming distance of approximately l/2 (l denotes the length of hash value) between the two hash values. In order to show the sensitivity of hash value to the message and secret keys, we conduct hash algorithm simulation experiment under the following seven conditions:

  • Condition 1: The original message chosen as the one in Sect. 4.1;

  • Condition 2: Change the first character “S” to “s”;

  • Condition 3: Change the word “key” to “non-key”;

  • Condition 4: Swap the word “Normal” with “Agricultural”;

  • Condition 5: Change the full stop “.” at the end of the message into a comma “,”;

  • Condition 6: Add a blank space to the end of the message;

  • Condition 7: Change the secret key X(0) = 0.2323 to X(0) = 0.2323000000000001.

The corresponding hash values in hexadecimal format are obtained, followed by the corresponding number of different bits compared with the hash value of Condition 1:

  • Condition 1: 6FF0FEBF7460A59ED5EF9F8AEBED0B20.

  • Condition 2: 54E86ABB3E709568B05A090BBA548938 (50).

  • Condition 3: 554B06A95906DE6E192456A46CD8FC36 (71).

  • Condition 4: 9AB0E492C1C6A3CF89BE84116F092CB1 (57).

  • Condition 5: AEF66656E072DFD495B5A579522BB5C7 (62).

  • Condition 6: FAA7F8AE9E6FAAD150DAC41810D9D88F (67).

  • Condition 7: 85D8AD49506CD53A3A6AC15A15A49010 (62).

The graphical display of binary sequences is plotted in Fig. 3.

Fig. 3
figure 3

Hash values under seven different conditions

As shown in the seven different hash values in hexadecimal format and Fig. 3, we conclude that the simulation results indicate the sensitivity property of the proposed hash algorithm is so perfect that any tiny difference in the message or secret keys will cause huge changes in the final hash value. Note that the reason we increase the value of X(0) with 10−17 in Condition 7 is the fact that 10−17 is the least value for the sensitive test of the hash algorithm, which has been verified in Sect. 4.5.

4.3 Statistical analysis of diffusion and confusion

Diffusion and confusion are two essential elements to hide message redundancy in the encryption algorithms that are introduced by Shannon [30]. The diffusion refers to as spreading out of the influence of a single plaintext bit over many cipher text bits so as to hide the statistical structure of the plaintext, while the confusion refers to as the use of transformations that complicate dependence of the statistics of cipher text on the statistics of plaintext.

We conduct the following experiment to capture qualitative characteristics of the diffusion and confusion: We first conduct the hash algorithm simulation on a randomly chosen message. Then, we randomly modify a bit in the message and then conduct the simulation. Finally, we count the difference between the two hash values in binary format. We denote the number of changed bits as B i . We perform N = 10,000 times of these tests, and the corresponding distribution of changed bit number is illustrated in Fig. 4. As depicted in Fig. 4, the maximum changed bit number is 84 and the minimum is 45, which show a good diffusion effect of the proposed hash algorithm.

Fig. 4
figure 4

Distribution of changed bit number: a plot of B i and b histogram of B i

In addition, we perform the same experiment to capture the statistic characteristics of the diffusion and confusion on the proposed hash algorithm. First, we define six statistical metrics of minimum changed bit number B min, maximum changed bit number B max, mean changed bit number \( \bar{B} \), mean changed probability P, standard deviation of the changed bit number ΔB, and standard deviation ΔP. They are computed as:

  • Minimum changed bit number: B min = min{B 1, B 2,…, B N },

  • Maximum changed bit number: B max = max{B 1, B 2,…, B N },

  • Mean changed bit number: \( \overline{B} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {B_{i} } \),

  • Mean changed probability: \( P = (\overline{B} /l) \times 100\,\% \),

  • Standard variance of the changed bit number: \( \Delta B = \sqrt {\frac{1}{N - 1}\sum\nolimits_{i = 1}^{N} {(B_{i} - \overline{B} )^{2} } }, \)

  • Standard variance: \( \Delta P = \sqrt {\frac{1}{N - 1}\sum\nolimits_{i = 1}^{N} {(B_{i} /l - P)^{2} } } \times 100\,\%, \) where N indicates the total number of tests, B i denotes the changed bit number in the ith test, and l is the length of hash value. Based on the six statistics, we conduct the experiment on the hash algorithm N times, where N = 256, 512, 1024, 2048, and 10,000. The corresponding results of B min, B max, \( \overline{B} \), P, ΔB, and ΔP are presented in Table 2. As depicted in Table 2, the mean changed bit number \( \overline{B} \) is very close to the ideal value 64 bits (half of length of hash values), which is an empirical proof that the hash algorithm shows strong capability of confusion and diffusion [31], and the mean changed probability P is very close to the ideal value 50 % as well. Furthermore, the value of standard deviation of the changed bit number ΔB and standard deviation ΔP is very small, and with the increase of N, the two values are smaller and smaller, so the capability of diffusion and confusion is very stable.

    Table 2 Statistics of number of changed bits

4.4 Analysis of birthday attack and collision resistance

In fact, collision resistance is similar to birthday attacks in theory. They are essentially a probability problem that two random input data are found such that they are hashed to the same output. In the following, we conduct qualitative and quantitative analysis of collision resistance on the proposed algorithm.

We first perform a qualitative analysis on the algorithm. In the proposed algorithm, the generalized chaotic mapping with variable parameters is introduced to ensure that the parameter p in each iteration is dynamically updated by the last iteration value f(i) and the corresponding message bit in different positions. This inherent structure expedites the avalanche effect, which will ensure that each bit of the final hash value will be related to all the bits of message and even a single bit change in message or key will be diffused and result in great changes in the final hash value. Therefore, the proposed hash algorithm can resist the collision attack.

Then, we conduct the following experiment to make a quantitative analysis on collision resistance [32]: First, the hash value for a paragraph of message randomly chosen is generated and stored in ASCII format. Then, a bit in the message is selected randomly and toggled. A new hash value is then generated and stored in ASCII format as well. Two hash values are compared, and the number of ASCII character with the same value at the same location in the hash value is counted. We conduct the experiment 10,000 times. A plot of the distribution of the number of hits is demonstrated in Fig. 5. As illustrated in Fig. 5, there are 42 tests hitting twice, 404 tests hitting once, while in 9554 tests, no hit occurs, which are listed in Table 3.

Fig. 5
figure 5

Distribution of the number of ASCII characters with the same value at the same location in the hash value

Table 3 Number of hits (10,000-time tests)

Moreover, we introduce the absolute difference of two hash values d, which is calculated by the formula: \( d = \sum \nolimits_{i = 1}^{N} \left| {t\left({e_{i}} \right) - t\left({e^{\prime}_{i}} \right)} \right| \), where e i and e i are the ith ASCII character of the original hash value and the new hash value, respectively, and the function t(*) converts the entries to their equivalent decimal values. This kind of collision test is performed 10,000 times as well, with the secret key X(0) = 0.2323, and f(0) = 0. The maximum, mean, minimum values of d and mean/character are listed in Table 4. Based on the calculation in Ref. [53], the theoretical mean/character value can be computed as \( \frac{1}{3} \times 256 = 85.3333 \). The mean/character value 89.29 is close the theoretical value.

Table 4 Absolute differences of two hash values (10,000-time tests)

Therefore, the qualitative and quantitative analysis on collision demonstrates that the proposed hash algorithm has good collision resistance.

4.5 Analysis of secret keys

In the proposed algorithm, the chaotic sensitivities to tiny modifications in initial values and parameters are fully utilized. The security of hash algorithm completely depends on message and secret keys. Therefore, the algorithm is immune from key recovery attack.

To investigate the key space, we conduct the following evaluation experiment: We first set the initial value X(0) of PWLCM to be larger than 10−16, such as X(0) = 0.2323200000000001, and the corresponding changed bit number of the hash value is around 64 (85D8AD49506CD53A3A6AC15A15A49010 (62)). In contrast, if we set X(0) to be 10−17 or less than 10−17, the hexadecimal format of hash value is permanently shown as “7C3DB62518F8E15F3A350B766D5E2A47”; that is, no corresponding hash bit changes. Therefore, the sensitive precision of secret key X(0) to hash value is 10−16. Similarly, the sensitive precision of the parameter p is 10−16 as well. Considering both X(0) ∊ (0, 1) and p ∊ (0, 0.5), we can derive that the secret key space is approximately larger than 2106. According to the ECRYPT II report on key lengths [33, 4749], 106 bits provide sufficient security against brute force key-search attacks [50, 51].

4.6 Analysis of speed

First, the proposed hash algorithm does not need message padding as the preparation, while message padding is proportional to the length of original message; therefore, our algorithm without the message padding process improves the executive speed. Then, a generalized chaotic mapping and a one-dimensional piecewise linear chaotic map are utilized in our algorithm, the dynamical property of which is enough for the security of the algorithm, and the structures of which are simple that greatly reduce complexity of the algorithm, thereby providing fast computation speed and achieving the high efficiency.

Moreover, the proposed algorithm has the parallel property. Six iteration values X(n), X(2n), X(3n), X(4n), X(5n), and X(6n) contribute 20, 20, 20, 20, 24, and 24 bits to the final hash value, respectively, which corresponds to the parallel computation of 2.5, 2.5, 2.5, 2.5, 3, and 3 bytes.

4.7 Comparison with other algorithms

We perform a comparison between the proposed hash function and some significant chaos-based hash functions as well as MD5, which is based on statistical performance and collision resistance.

Tables 5 and 6 describe the comparison of statistical performance between the proposed algorithm and selected existing algorithms. Note that the results reported in Table 5 are based on N = 2048 random tests and 128-bit hash value, while the results of Table 6 focus on N = 10,000 random tests and 128-bit hash value. Based on the results, our algorithm shows better statistical performance.

Table 5 Comparison on statistical performance with N = 2048 random tests and 128-bit hash value
Table 6 Comparison on statistical performance with N = 10,000 random tests and 128-bit hash value

In addition, Tables 7 and 8 present the comparison of the number of ASCII characters with the same value at the same location and absolute difference in 128-bit hash values between our algorithm and selected existing algorithms. Note that the reported results of Table 7 are based on N = 2048 random tests, while the results of Table 8 focus on N = 10,000 random tests. Based on the results, the proposed algorithm shows better collision resistance.

Table 7 Comparison on collision resistance with N = 2048 random tests and 128-bit hash value
Table 8 Comparison on collision resistance with N = 10,000 random tests and 128-bit hash value

As a discussion, the proposed hash algorithm illustrates better statistical performance and collision resistance capability. We present a fast and efficient hash algorithm based on a generalized chaotic mapping with variable parameters. We dedicate to the improvement of the efficiency with high execution speed. However, most of the start-of-the-art literature focuses on complexity of the algorithm, ignoring the speed and efficiency.

5 Conclusion

In this paper, we present a fast and efficient hash algorithm based on a generalized chaotic mapping with variable parameters. We first define a generalized chaotic mapping by utilizing piecewise linear chaotic map and trigonometric functions. Then, we convert the arbitrary length of message into the corresponding ASCII values and perform 6-unit iterations with variable parameters and message values based on the generalized chaotic mapping. The final hash value is obtained by cascading extracted bits from iteration state values. We excessively evaluate the proposed algorithm in terms of distribution of hash value, sensitivity of hash value to the message and secret keys, statistical analysis of diffusion and confusion, analysis of birthday attacks and collision resistance, analysis of secret keys, analysis of speed, and comparison with other algorithms, and the results illustrate that the suggested algorithm is fast, efficient, and enough simple and has good confusion and diffusion capabilities, strong collision resistance, and a high level of security.