1 Introduction

Over the past decade, advances in communication technology, multimedia technology, the internet, and internet protocols have made the volume of communication exchanged over wireless networks grow exponentially. Networks such as the internet are open, so transmitting sensitive information over them is a security risk. New methods are therefore needed to guarantee the confidentiality of sensitive information on open networks. Over the years, researchers have developed classical encryption schemes such as the International Data Encryption Algorithm (IDEA), the Data Encryption Standard (DES), RSA, Rivest Cipher 4 (RC4), and the Advanced Encryption Standard (AES) [4, 24, 32, 41]. These schemes rely on mathematical operations and iterated rounds and are considered secure for confidential communication. Multimedia data, however, are bulky and have distinctive features such as strong correlation between adjacent pixels and high redundancy. Consequently, classical schemes such as DES and AES are not well suited to multimedia encryption. Several cryptographic schemes based on different approaches have been presented for multimedia encryption [1, 2, 22, 30, 37, 38], and algorithms specifically for digital image encryption are given in [2, 11, 12, 15, 31, 36]. In 1998, Fridrich applied chaos-based cryptography to images and proposed an image encryption technique based on a chaotic map. Researchers subsequently observed that the distinctive features of chaotic systems, such as high sensitivity to initial conditions and control parameters, randomness, and ergodicity, make chaos theory suitable for data security applications, in particular for multimedia encryption.
Numerous chaos-based encryption schemes for digital images exist in the literature [34], built on either one-dimensional or multidimensional chaotic maps. However, because one-dimensional chaotic maps have only a small set of parameters, encryption schemes based on them have been shown to be insecure against cryptanalytic attacks.

1.1 Related work

Numerous algorithms have been presented for the encryption and decryption of digital audio; however, no single scheme applies to all audio formats. De Martin and Servetti introduced a technique for the partial encryption of telephone speech in 2002 [27], in which they proposed two partial encryption methods. The first method was designed to have a high bit rate but low security strength, whereas the second encrypted a larger portion of the bitstream and provided strong security. In the same year, Thorwirth et al. presented a selective encryption method for audio compressed with standard perceptual audio coding [33], focusing mainly on the analysis of encrypted MP3-encoded files. Servetti et al. then presented another MP3 audio encryption scheme in 2003 [28]: a partial encryption scheme with low time complexity that preserved the information content while altering the quality of the original audio. Afterward, Bhargava et al. introduced four encryption schemes for MPEG digital video [3], in which a shared secret key was used to randomly modify the Discrete Cosine Transform coefficients. Subsequently, Grangetto et al. presented a multimedia security framework based on arithmetic coding [8], in which encryption was achieved by introducing randomization into the arithmetic coding procedure. In 2008, Yan et al. proposed scrambling digital audio data in the compressed domain [39]; the scheme scrambled the secret audio data with a key before transmission, but it was later shown to be vulnerable to brute-force attack [40]. Neto and Lima suggested an audio encryption scheme using the cosine number transform [18].
Their mechanism encrypts blocks of the audio data, selecting the blocks according to a sample-overlapping rule that produces confusion and diffusion in the ciphered data. Chaos theory has been used extensively for multimedia encryption in the literature. Encryption techniques that use chaotic maps for static multimedia data such as digital images are given in [5, 14, 35], and chaotic maps are also widely used for dynamic multimedia data such as audio and video. Mosa et al. proposed an audio encryption scheme based on the Baker map in 2011 [21], in which the Baker chaotic map permutes audio segments and masking is applied in the transform and time domains. Sathiyamurthi et al. proposed a speech encryption scheme based on chaotic shift keying [25], achieving a high security level by permuting the data samples multiple times. Next, Madian et al. introduced audio scrambling based on two-dimensional cellular automata to break the correlation between audio samples [19]. Subsequently, Liu et al. presented a digital audio encryption mechanism in which a multi-scroll chaotic map was used for confusion and diffusion. In 2018, Kalpana et al. proposed an audio encryption scheme based on a fuzzy cellular neural network with delayed uncertain hybrid bidirectional associative memory [13].

1.2 Our contribution

In this paper, we introduce a novel three-dimensional discrete chaotic map whose structure is inspired by the map given in [20]. The new map extends that of [20] with additional control parameters while keeping the same dimension. Performance analysis demonstrates that the new map exhibits better chaotic behavior and a larger interval of control parameters than the existing map. In addition, we present a novel lossless audio encryption scheme based on the proposed chaotic map. The initial conditions and control parameters serve as the secret key used to generate chaotic sequences, which then drive the confusion and diffusion operations. Security analysis and simulations demonstrate that the proposed scheme is effective in resisting various cryptanalytic attacks.

The remainder of the manuscript is organized as follows. Section 2 presents the novel chaotic map and its evaluation results. The proposed audio encryption scheme is introduced in Section 3. Section 4 is devoted to the construction and analysis of the seven-bit S-box. Section 5 discusses the simulation results and compares them with existing schemes. Finally, Section 6 concludes the paper.

2 Preliminaries

This section recalls some fundamental definitions and theorems used in the proposed encryption scheme.

Definition 2.1. Let X and Y be two systems such that

$$ X\left(i+1\right)=G\left(X(i)\right) $$
(1)
$$ Y\left(i+1\right)=F\left(X(i),Y(i)\right) $$
(2)
$$ X(i)={\left({x}_1(i),{x}_2(i),\dots, {x}_{m-1}(i),{x}_m(i)\ \right)}^T $$
(3)
$$ Y(i)={\left({y}_1(i),{y}_2(i),\dots, {y}_{n-1}(i),{y}_n(i)\ \right)}^T\kern1em n\le m $$
(4)
$$ G\left(X(i)\right)={\left(g\left({x}_1(i)\right),g\left({x}_2(i)\right),\dots, g\left({x}_{m-1}(i)\right),g\left({x}_m(i)\right)\ \right)}^T $$
(5)
$$ F\left(X(i),Y(i)\right)={\left({f}_1\left(X(i),Y(i)\right),{f}_2\left(X(i),Y(i)\right),\dots, {f}_n\left(X(i),Y(i)\right)\right)}^T $$
(6)

Here, system (1) is called the drive system and system (2) the driven (response) system. Suppose there exists a transformation H such that

$$ H:{\mathbb{R}}^m\to {\mathbb{R}}^n $$
$$ H\left(X(i)\right)={\left({h}_1(i),{h}_2(i),\dots, {h}_n(i)\right)}^T $$
(7)

Then systems (1) and (2) are said to be in generalized synchronization if there exists a subset \( S={S}_X\times {S}_Y\subset {\mathbb{R}}^m\times {\mathbb{R}}^n \) such that every trajectory of Eqs. (1) and (2) with initial condition in S satisfies the following equation:

$$ \underset{i\to +\infty }{\lim}\left\Vert H\left(X(i)\right)-Y(i)\right\Vert =0 $$
(8)

Theorem 2.2

Let X, G(X), \( {X}_n \), and F(Y, X) be the systems defined in the preliminaries above, where \( {X}_n \) is defined as

$$ {X}_n=\left({x}_1(i),{x}_2(i),\dots, {x}_n(i)\right) $$
(9)

Let T be the invertible transformation defined as follows

$$ T:{\mathrm{\mathbb{R}}}^n\to {\mathrm{\mathbb{R}}}^n $$
$$ T\left({x}_1,{x}_2,\dots, {x}_{n-1},{x}_n\right)=\left({y}_1,{y}_2,\dots, {y}_{n-1},{y}_n\right) $$
(10)

If the systems given in Eqs. (1) and (2) are in generalized synchronization via the transformation Y = T(\( {X}_n \)), then the function F(Y, X) satisfies the following properties:

$$ F\left(Y,X\right)=T\left({G}_m(x)-r\left({X}_n,Y\right)\right) $$
(11)
$$ {G}_m(x)={\left({g}_1(x),{g}_2(x),\dots, {g}_m(x)\right)}^T $$
(12)

where the function r, defined as

$$ r\left({X}_n,Y\right)=\left({r}_1\left({X}_n,Y\right),{r}_2\left({X}_n,Y\right),{r}_3\left({X}_n,Y\right),\dots, {r}_n\left({X}_n,Y\right)\right) $$
(13)

guarantees the stability of the zero solution of the following error equation:

$$ e\left(i+1\right)=T\left({X}_n\left(i+1\right)\right)-Y\left(i+1\right)=r\left({X}_n,Y\right) $$
(14)

2.1 Proposed chaotic map

This subsection introduces the proposed chaotic map and discusses its properties. The map is defined as follows:

$$ {x}_{i+1}={y}_i+\alpha \sin \left({x}_i\right)+\gamma \cos \left({z}_i\right) $$
(15)
$$ {y}_{i+1}={x}_i+\sin \left({x}_i\right)\cos \left({y}_i\right)+\tan \left({z}_i\right) $$
(16)
$$ {z}_{i+1}={x}_i\sin (i)+{y}_i\cos (i)+\beta {\tan}^{-1}\left({z}_i\right)-\delta $$
(17)

In the above system, for any α ∈ [5, ∞), β ∈ [−10, 10], and γ, δ ∈ [−1, 1], and for the initial condition \( \left({x}_0,{y}_0,{z}_0\right) \) = (.0705, 00001, 0038), the chaotic orbit \( \left({x}_i,{y}_i,{z}_i\right) \) of the system over the first fifty thousand iterations is visualized in Fig. 1(a–d).
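The iteration in Eqs. (15)–(17) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the name `iterate_map`, the parameter values α = 5.5, β = 3, γ = 0.5, δ = 0.2, and the initial condition (0.1, 0.2, 0.3) are hypothetical choices inside the stated ranges. As in Eq. (17), the iteration index i itself enters through sin(i) and cos(i).

```python
import math

def iterate_map(x0, y0, z0, alpha, beta, gamma, delta, n):
    """Iterate the map of Eqs. (15)-(17) n times and return the x, y, z orbits."""
    xs, ys, zs = [], [], []
    x, y, z = x0, y0, z0
    for i in range(n):
        # Every right-hand side uses the iteration-i values, so compute all three first.
        x_next = y + alpha * math.sin(x) + gamma * math.cos(z)
        y_next = x + math.sin(x) * math.cos(y) + math.tan(z)
        z_next = x * math.sin(i) + y * math.cos(i) + beta * math.atan(z) - delta
        x, y, z = x_next, y_next, z_next
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs

# Hypothetical parameters and initial condition inside the stated ranges.
xs, ys, zs = iterate_map(0.1, 0.2, 0.3, alpha=5.5, beta=3.0,
                         gamma=0.5, delta=0.2, n=1000)
```

Plotting pairs such as (xs[i], ys[i]) gives phase portraits of the kind shown in Fig. 1.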

Fig. 1

Chaotic trajectories of variables (a) xi − yi (b) yi − zi (c) xi − zi (d) xi − yi − zi

Next, we define an invertible matrix M and the corresponding transformation

$$ T:{\mathrm{\mathbb{R}}}^3\to {\mathrm{\mathbb{R}}}^3 $$

defined by

$$ T\left({x}_i,{y}_i,{z}_i\right)=\left(\begin{array}{ccc}3& -2& -3\\ {}3& 0& -1\\ {}-3& 2& 3\end{array}\right)\left[\begin{array}{c}{x}_i\\ {}{y}_i\\ {}{z}_i\end{array}\right] $$
(18)

Let

$$ r\left(X,Y\right)=-0.5\left( TX-Y\right) $$
(19)

Then r(X, Y) yields an asymptotically stable error equation. By Theorem 2.2, the resulting driven system takes the form:

$$ Y\left(i+1\right)=M\left(G\left(X(i)\right)-r\left(X,Y\right)\right) $$
(20)
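A short worked step shows why the choice of r in Eq. (19) gives an asymptotically stable error equation. Writing e(i) = T X(i) − Y(i) as in Eq. (14) (here n = m = 3, so \( {X}_n \) = X), Eqs. (14) and (19) give

$$ e\left(i+1\right)=r\left(X(i),Y(i)\right)=-0.5\left(TX(i)-Y(i)\right)=-0.5\,e(i) $$
$$ e(i)={\left(-0.5\right)}^i\,e(0),\kern1em \left\Vert e(i)\right\Vert ={0.5}^i\left\Vert e(0)\right\Vert \to 0\kern0.5em \mathrm{as}\kern0.5em i\to +\infty $$

Since |−0.5| < 1, the error contracts geometrically to zero; with the initial condition Y(0) = T(x0, y0, z0) chosen below, e(0) = 0 and the error vanishes for every i.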

We choose the initial condition Y(0) = T(x0, y0, z0). The first fifty thousand iterations of system (20) are shown in Fig. 2. Its trajectories are completely different from those of the system given in (15), (16), and (17), so the two systems do not appear to be related by generalized synchronization. Moreover, its chaotic behavior is better than that of the system given in [20].

Fig. 2

Chaotic trajectories of variables (a) T(xi − yi) (b) T(yi − zi) (c) T(xi − zi) (d) T(xi − yi − zi)

3 Proposed algorithm

This section presents the proposed audio encryption scheme, which is designed to secure digital audio in WAV format before it is transmitted over an insecure channel. The scheme first reads the audio file as signed sixteen-bit integers, whose values lie in the interval \( \left[-{2}^{15},{2}^{15}-1\right] \). We denote the matrix of the original audio data by \( \mathcal{K} \), with dimension M × N, where M is the number of rows and N the number of columns. The encryption procedure is as follows:

  • Step 1. Because the scheme reads the audio file as signed integers, which include negative values, it first generates a binary matrix of 0s and 1s that records the positions of the negative and non-negative integers:

    $$ {\mathcal{B}}_{i,j}=\left\{\begin{array}{c}0\kern2em if\kern1.5em {k}_{i,j}<0\\ {}1\kern2em if\kern1.5em {k}_{i,j}\ge 0\end{array}\right. $$
    (21)

where \( {\mathcal{B}}_{i,j} \) denotes the (i, j)th element of the binary matrix and \( {k}_{i,j} \) denotes the element of the data matrix \( \mathcal{K} \) at position (i, j). This yields a binary matrix \( \mathcal{B} \) of dimension M × N.

  • Step 2. Next, map the audio data from the set \( \left[-{2}^{15},{2}^{15}-1\right] \) to the set \( \left[-\left({2}^{15}-1\right),{2}^{15}-1\right] \) using the following equation:

    $$ {k}_{ij}=\left\{\begin{array}{c}{k}_{ij}\kern3.25em if\kern1.5em {k}_{ij}>-{2}^{15}\\ {}{k}_{ij}+1\kern2em if\kern0.5em {k}_{ij}=-{2}^{15}\end{array}\right. $$
    (22)

The result is a new matrix \( {\mathcal{K}}^{\prime } \) whose entries lie between \( -\left({2}^{15}-1\right) \) and \( {2}^{15}-1 \), so that their absolute values fit in 15 bits.

  • Step 3. The scheme then applies the absolute-value function, mapping the entries of \( {\mathcal{K}}^{\prime } \) from the set \( \left\{-\left({2}^{15}-1\right),\dots, {2}^{15}-1\right\} \) to the set \( \left\{0,\dots, {2}^{15}-1\right\} \), and obtains a new matrix \( {\mathcal{K}}^{\prime \prime } \) whose entries are 15-bit non-negative integers.

  • Step 4. Next, the scheme generates the random sequences \( {x}_{R_i} \), \( {y}_{R_i} \), and \( {z}_{R_i} \) using the following modular equations:

    $$ {x}_{R_i}= floor\left({x}_i\times {10}^5\right)\mathit{\operatorname{mod}}\ M $$
    (23)
    $$ {y}_{R_i}= floor\left({y}_i\times {10}^5\right)\mathit{\operatorname{mod}}\ N $$
    (24)
    $$ {z}_{R_i}= floor\left({z}_i\times {10}^5\right)\mathit{\operatorname{mod}}\ L $$
    (25)

where \( {x}_i \), \( {y}_i \), and \( {z}_i \) are the sequences defined in Eqs. (15)–(17), and L denotes an integer greater than M × N. The random sequences were tested with the NIST statistical test suite for random and pseudorandom number generators for cryptographic applications; the results are given in Table 8.

  • Step 5. Multimedia data exhibit strong correlation among adjacent values, so efficient multimedia encryption schemes include mechanisms to break this correlation. Accordingly, in this step the algorithm uses the sequences generated in Step 4 to shuffle the matrix \( {\mathcal{K}}^{\prime \prime } \):

    $$ {\mathcal{K}}^p\left(i,j\right)={\mathcal{K}}^{\hbox{'}\hbox{'}}\left({x}_R(i),{y}_R(j)\right) $$
    (26)

where (i, j) denotes the position in the shuffled matrix \( {\mathcal{K}}^p \). The correlation analysis graphs in Fig. 7 show that the permutation step successfully breaks the correlation among adjacent samples.

  • Step 6. Confusion is an essential part of any cryptosystem. In this step, we use substitution to produce confusion in the ciphered data. Since the permuted block contains 15-bit integers, substituting them directly would require an S-box with \( {2}^{15} \) entries, which is computationally expensive; therefore, we divide the block into two subblocks containing 8-bit and 7-bit integers, respectively, using the following maps:

\( {\psi}_1:{{\mathbb{Z}}_2}^{15}\to {{\mathbb{Z}}_2}^8 \) and \( {\psi}_2:{{\mathbb{Z}}_2}^{15}\to {{\mathbb{Z}}_2}^7 \),

defined by

$$ {\psi}_1\left({a}_1,{a}_2,\dots, {a}_{15}\right)=\left({a}_1,{a}_2,\dots, {a}_8\right) $$
(27)
$$ {\psi}_2\left({a}_1,{a}_2,\dots, {a}_{15}\right)=\left({a}_9,{a}_{10},\dots, {a}_{15}\right) $$
(28)

This yields two subblocks \( {{\mathcal{K}}^p}_8 \) and \( {{\mathcal{K}}^p}_7 \), consisting of eight-bit and seven-bit elements, respectively.

  • Step 7. Generate an 8 × 8 S-box and a 7 × 7 S-box using Möbius transformations over the Galois fields \( GF\left({2}^8\right) \) and \( GF\left({2}^7\right) \). The procedure for generating the 8 × 8 S-box and its security analysis are given in [16]. Since a 7 × 7 S-box of this kind has not been used or analyzed before, Section 4 briefly describes the construction of the 7 × 7 S-box and its performance analysis.

  • Step 8. Substitute the subblock \( {{\mathcal{K}}^p}_8 \) using the 8 × 8 S-box; this substitution is the same as in AES. Because the 7 × 7 S-box is an 8 × 16 lookup table (Table 1), its substitution works slightly differently. First, write each element of the subblock \( {{\mathcal{K}}^p}_7 \) in binary. Then split each seven-bit element into its three most significant bits and its four least significant bits, convert both parts to decimal, and replace the element with the S-box entry at position (i, j), where i is the decimal value of the three-bit part and j is the decimal value of the four-bit part. Example 4.1 illustrates the procedure.

  • Step 9. The substitution yields new subblocks \( {{\mathcal{K}}^S}_8 \) and \( {{\mathcal{K}}^S}_7 \). Next, XOR \( {{\mathcal{K}}^S}_8 \) with the sequence \( {z}_{R_i}\ \mathit{\operatorname{mod}}\ 256 \) and \( {{\mathcal{K}}^S}_7 \) with the sequence \( {z}_{R_i}\ \mathit{\operatorname{mod}}\ 128 \) to obtain new blocks \( {{\mathcal{K}}^E}_8 \) and \( {{\mathcal{K}}^E}_7 \).

  • Step 10. Combine the eight-bit block \( {{\mathcal{K}}^E}_8 \) and the seven-bit block \( {{\mathcal{K}}^E}_7 \) into a single fifteen-bit block using the map

    $$ {\psi}^{-1}:{{\mathbb{Z}}_2}^8\times {{\mathbb{Z}}_2}^7\to {{\mathbb{Z}}_2}^{15} $$
    $$ {\psi}^{-1}\left(\left({a}_1,{a}_2,\dots, {a}_8\right),\left({a}_9,{a}_{10},\dots, {a}_{15}\right)\right)=\left({a}_1,{a}_2,\dots, {a}_{15}\right) $$
    (29)
Table 1 Proposed 7×7 S-box

where \( {a}_i\in \left\{0,1\right\} \). The result is an M × N block \( {{\mathcal{K}}^S}_{15} \) of fifteen-bit elements.

  • Step 11. Finally, map the elements of the matrix \( {{\mathcal{K}}^S}_{15} \) from the set \( \left\{0,1,\dots, {2}^{15}-1\right\} \) to the set \( \left\{-\left({2}^{15}-1\right),\dots, {2}^{15}-1\right\} \) using the binary matrix \( \mathcal{B} \):

$$ {\mathcal{K}}^E\left(i,j\right)=\left\{\begin{array}{c}\kern0.75em {{\mathcal{K}}^S}_{15}\left(i,j\right)\kern1.25em if\kern0.5em \mathcal{B}\left(i,j\right)=1\\ {}-{{\mathcal{K}}^S}_{15}\left(i,j\right)\kern1em if\kern0.75em \mathcal{B}\left(i,j\right)=0\end{array}\right. $$
(30)

The resulting matrix \( {\mathcal{K}}^E \) is then converted into an audio file, which is the required ciphered audio. The flow chart of the proposed scheme is shown in Fig. 3. To test its security strength, we encrypted various audio files of different types and sizes; the results are analyzed in the following section.
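The encryption steps above can be condensed into a Python sketch. It is a hedged illustration, not the authors' implementation: the chaotic sequences `xs`, `ys`, `zs` are passed in (for example from the map of Eqs. (15)–(17)), the S-boxes `sbox8` and `sbox7` are flat lookup lists (for the 7-bit S-box the flat index equals 16 · row + column, which matches the row/column lookup of Example 4.1), and L = M × N + 1 is an arbitrary choice satisfying L > M × N. One deliberate deviation: Eq. (26) is a bijection only when \( {x}_R \) and \( {y}_R \) are themselves permutations, so the sketch ranks the values of Eqs. (23)–(24) (an argsort, ties broken by position) to obtain row and column permutations; this ranking step is our assumption. Decryption is not shown.

```python
import math

def chaotic_mod(seq, modulus):
    """Eqs. (23)-(25): floor(s * 10**5) mod modulus (non-negative in Python)."""
    return [math.floor(s * 10**5) % modulus for s in seq]

def rank_permutation(values):
    """ASSUMPTION: rank values (argsort) to get a bijective index permutation."""
    return sorted(range(len(values)), key=lambda k: values[k])

def encrypt(samples, xs, ys, zs, sbox8, sbox7):
    """Encrypt an M x N list of int16 samples; S-boxes are flat lookup lists."""
    M, N = len(samples), len(samples[0])
    # Eq. (21): B = 1 for non-negative samples, 0 for negative samples.
    B = [[1 if k >= 0 else 0 for k in row] for row in samples]
    # Eq. (22), then absolute value: -2**15 becomes -(2**15 - 1), so magnitudes fit 15 bits.
    mags = [[abs(k + 1) if k == -2**15 else abs(k) for k in row] for row in samples]
    # Eqs. (23), (24), (26): chaotic row/column shuffle, made bijective by ranking.
    rows = rank_permutation(chaotic_mod(xs[:M], M))
    cols = rank_permutation(chaotic_mod(ys[:N], N))
    shuffled = [[mags[rows[i]][cols[j]] for j in range(N)] for i in range(M)]
    zr = chaotic_mod(zs[:M * N], M * N + 1)  # Eq. (25) with hypothetical L = M*N + 1
    cipher = []
    for i in range(M):
        out_row = []
        for j in range(N):
            v = shuffled[i][j]
            hi, lo = v >> 7, v & 0x7F        # Eqs. (27)-(28): 8 MSBs and 7 LSBs
            z = zr[i * N + j]
            hi = sbox8[hi] ^ (z % 256)       # substitute, then XOR with z mod 256
            lo = sbox7[lo] ^ (z % 128)       # substitute, then XOR with z mod 128
            c = (hi << 7) | lo               # Eq. (29): concatenate back to 15 bits
            out_row.append(c if B[i][j] == 1 else -c)  # Eq. (30)
        cipher.append(out_row)
    return cipher

# Hypothetical inputs: identity S-boxes and arbitrary stand-in chaotic values.
samples = [[-32768, -5, 0], [7, 32767, -1]]
xs, ys = [0.31, 0.72], [0.15, 0.94, 0.58]
zs = [0.11, 0.23, 0.37, 0.41, 0.53, 0.67]
cipher = encrypt(samples, xs, ys, zs, list(range(256)), list(range(128)))
```

Every operation is a constant-time lookup, shift, or XOR per sample, which is consistent with the linear complexity discussed in Section 5.6.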

Fig. 3

Flow chart of the proposed encryption scheme


4 Construction of the 7 × 7 S-box and its performance analysis

Since a 7 × 7 S-box of this kind has not been used before, this section briefly describes its construction and performance analysis. The S-box used to substitute the subblock of seven-bit integers is based on the action of the general linear group \( GL\left(2,{\mathbbm{F}}_{2^7}\right) \) on the binary Galois field extension \( {\mathbbm{F}}_{2^7} \) of order 128 [29], given by the Möbius transformation

$$ {\mathcal{F}}_{\mathcal{M}}(t)=\frac{u\times t+v}{w\times t+z} $$
(31)

where u, v, w, and z are elements of the extension field \( {\mathbbm{F}}_{2^7} \) satisfying u × z + v × w ≠ 0, and × and + denote multiplication and addition in \( {\mathbbm{F}}_{2^7} \). \( {\mathcal{F}}_{\mathcal{M}} \) is a discontinuous bijective map over the extension field \( {\mathbbm{F}}_{2^7} \). The images of \( {\mathcal{F}}_{\mathcal{M}} \) are then arranged into an 8 × 16 lookup table. In this study, we construct a 7 × 7 S-box using the parameters u = 53, v = 46, w = 33, and z = 34 and the primitive irreducible polynomial \( p(x)={x}^7+x+1 \). The S-box is shown in Table 1; its performance is analyzed in the next subsection.
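The construction can be sketched in Python under stated assumptions; this is not the authors' code. We assume the transformation has the standard Möbius form t ↦ (u·t + v)/(w·t + z) over GF(2^7) with p(x) = x^7 + x + 1, field elements encoded as 7-bit integers, and that the single pole t = z/w (in characteristic 2, −z/w = z/w) is sent to u/w, the usual way of making the map a bijection of the field. The helper names are hypothetical, and whether the output reproduces Table 1 exactly depends on conventions the text does not fully specify.

```python
POLY = 0b10000011  # p(x) = x^7 + x + 1
N_BITS = 7

def gf_mul(a, b):
    """Multiply two elements of GF(2^7) modulo p(x) (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N_BITS):  # degree reached 7: reduce using x^7 = x + 1
            a ^= POLY
    return result

def gf_inv(a):
    """Inverse of a non-zero element as a^(2^7 - 2)."""
    result, power, exponent = 1, a, (1 << N_BITS) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exponent >>= 1
    return result

def mobius_sbox(u, v, w, z):
    """S-box t -> (u t + v)/(w t + z); the pole is mapped to u/w (assumption)."""
    assert gf_mul(u, z) ^ gf_mul(v, w) != 0, "u*z + v*w must be non-zero"
    sbox = []
    for t in range(1 << N_BITS):
        den = gf_mul(w, t) ^ z
        if den == 0:
            sbox.append(gf_mul(u, gf_inv(w)))
        else:
            sbox.append(gf_mul(gf_mul(u, t) ^ v, gf_inv(den)))
    return sbox

sbox7 = mobius_sbox(53, 46, 33, 34)
table = [sbox7[16 * r:16 * r + 16] for r in range(8)]  # 8 x 16 lookup table
```

Because u·z + v·w ≠ 0, the map is a bijection of the projective line, so redirecting the pole to u/w yields a permutation of all 128 field elements.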

Example 4.1

Let I = \( {(0101101)}_2 \) be the input of the S-box. Its three most significant bits, \( {(010)}_2=2 \), select row 2 (rows are numbered from 0), and its four least significant bits, \( {(1101)}_2=13 \), select column 13. Substituting I with the S-box given in Table 1 gives the output \( {S}_1(45)=1 \), where \( 45={(0101101)}_2 \).
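The row and column lookup of Example 4.1 reduces to two bit operations. In the sketch below (the helper name `sbox7_position` is hypothetical), the row is the value shifted right by four bits and the column is the low nibble; for a table stored row by row, the flat index 16 · row + column is simply the input value itself.

```python
def sbox7_position(value):
    """Split a 7-bit input into (row, column) of the 8 x 16 S-box table."""
    if not 0 <= value < 128:
        raise ValueError("input must be a 7-bit integer")
    row = value >> 4      # three most significant bits
    col = value & 0b1111  # four least significant bits
    return row, col

row, col = sbox7_position(0b0101101)  # the input I of Example 4.1
print(row, col)  # 2 13
```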

Example 4.2

The following tables demonstrate the step-by-step procedure of the proposed encryption scheme.

figure d

4.1 Security analysis of the 7 × 7 S-box

In this subsection, we present statistical and algebraic analyses of the proposed S-box in Table 1, following [18]. The S-box is examined with several analyses: nonlinearity, differential approximation probability (DP), bit independence criterion (BIC), strict avalanche criterion (SAC), and linear approximation probability (LP). The nonlinearity of an S-box measures the distance between its Boolean functions and the set of all affine functions. The upper bound on the nonlinearity of an x-variable Boolean vector function is given by \( {2}^{x-1}-{2}^{\frac{\left(x-1\right)}{2}-1} \), so for x = 7 the optimum nonlinearity value is 64 − 4 = 60. As shown in Table 2, the average nonlinearity of the proposed S-box is 52.8571, which is close to the optimum value. The results of the other analyses, also listed in Table 2, indicate that the proposed S-box has good cryptographic properties.
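The nonlinearity values in Table 2 can be checked with a standard Walsh–Hadamard computation. The Python sketch below is not the authors' code and its function names are illustrative: for each output bit f of an n-bit S-box it evaluates NL(f) = \( {2}^{n-1} \) − max |W_f(a)| / 2 and returns the per-bit values, whose mean corresponds to the average nonlinearity quoted above.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of a Boolean function given as a 0/1 list."""
    w = [1 - 2 * bit for bit in truth_table]  # map 0 -> +1, 1 -> -1
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for k in range(start, start + h):
                a, b = w[k], w[k + h]
                w[k], w[k + h] = a + b, a - b
        h *= 2
    return w

def sbox_nonlinearities(sbox, n):
    """Nonlinearity of each of the n coordinate Boolean functions of an n x n S-box."""
    result = []
    for bit in range(n):
        f = [(sbox[x] >> bit) & 1 for x in range(1 << n)]
        spectrum = walsh_spectrum(f)
        result.append((1 << (n - 1)) - max(abs(s) for s in spectrum) // 2)
    return result
```

For example, the identity S-box has nonlinearity 0 in every bit, since each output bit is a linear function of the input.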

Table 2 Performance analyses of the 7×7 S-box

5 Security analysis

An effective multimedia encryption scheme should resist all kinds of attacks, such as statistical, brute-force, and other cryptanalytic attacks. In this section, we analyze the robustness of the proposed algorithm against multiple attacks. The simulations were carried out in MATLAB R2019b on a laptop computer. To evaluate the scheme, we chose multiple audio samples of different types, such as speech and music, and encrypted them with the proposed scheme using different keys. Figure 4 shows the waveforms of the original and encrypted audio files. The amplitudes in the waveforms of the encrypted audio are uniform and bear no resemblance to those of the original audio, indicating that the audio has been successfully encrypted. The following subsections examine the scheme using spectrogram, histogram, correlation, and entropy analyses.

Fig. 4

Waveforms of the man, female, bird and alarm audio (a) original audio file (b) encrypted audio file

5.1 Spectrogram analysis

Spectrogram analysis is widely used for sound analysis and is a basic tool of spectral analysis. A spectrogram is a two-dimensional graph with a third dimension represented by color: it shows how the frequency content of a signal varies with time, while the color indicates the amplitude (loudness) of the sound at a given time. Red and blue indicate low amplitude, and brighter colors indicate stronger amplitude. We analyzed the proposed encryption scheme using spectrograms; the results are shown in Fig. 5. Figure 5(a) displays the spectrograms of the original audio files, and Fig. 5(b) displays those of the encrypted audio files. The spectrograms of the encrypted audio are uniform, show strong amplitude throughout, and are completely different from those of the original audio, confirming that the audio is successfully encrypted.

Fig. 5

Spectrogram graph of man sound, female sound, birds sound and alarm (a) original audio (b) encrypted audio

5.2 Histogram analysis

Histogram analysis is a prominent method for examining the encryption quality of a cryptosystem against statistical attacks. A cryptosystem is expected to transform the original data into noise-like, random data, so a well-designed cryptosystem should make all values in the encrypted data roughly equally probable; the encrypted data then provide no information that would help an attacker decrypt them without the secret key. We examined the proposed scheme with histogram analysis; the results are illustrated in Fig. 6. Figure 6(a) displays the histograms of the original audio, and Fig. 6(b) displays the histograms of the encrypted audio files. The histograms of the original audio signals are concentrated around a single value, whereas the histograms of all encrypted audio files are almost uniform. Accordingly, the proposed scheme resists histogram-based statistical attacks, and eavesdroppers cannot extract information from the distribution of the encrypted data.

Fig. 6

Histogram analysis of man, female, birds and alarm audio (a) Histogram of the original audio (b) Histogram of the corresponding encrypted audio

5.3 Correlation

The correlation coefficient is a statistical measure used to assess the strength of a cryptosystem against statistical attacks. In multimedia data, adjacent segments are strongly correlated, so a secure cryptosystem should break the correlation among segments of the data. Correlation analysis examines the correlation between adjacent segments of the data. The correlation coefficient is given by

$$ {\gamma}_{pq}=\frac{\mathit{\operatorname{cov}}\left(p,q\right)}{\sqrt{\mathcal{D}(p)\mathcal{D}(q)}} $$
(32)

Where

$$ \mathit{\operatorname{cov}}\left(p,q\right)=\frac{1}{\mathcal{P}}\sum \limits_{i=1}^{\mathcal{P}}\left({p}_i-\mathcal{E}(p)\right)\left({q}_i-\mathcal{E}(q)\right) $$
(33)
$$ \mathcal{D}(p)=\frac{1}{\mathcal{P}}\sum \limits_{i=1}^{\mathcal{P}}{\left({p}_i-\mathcal{E}(p)\right)}^2 $$
(34)

And

$$ \mathcal{E}(p)=\frac{1}{\mathcal{P}}\sum \limits_{i=1}^{\mathcal{P}}{p}_i $$
(35)

In the above equations, \( {p}_i \) denotes the selected sample at the ith position and \( {q}_i \) denotes the corresponding adjacent sample. Correlation is usually measured in several directions, such as horizontal, vertical, and diagonal; because audio data form a single sequence, we computed the correlation of the proposed scheme in the horizontal direction only. The results are listed in Table 3. The correlation coefficient of the original audio is equal to 1, meaning that adjacent samples are strongly correlated, whereas that of the ciphered audio is approximately 0, i.e., the proposed scheme effectively removes the correlation between adjacent samples. The correlation plots of the original and encrypted audio files in Fig. 7 confirm that the proposed scheme greatly reduces the correlation within the audio file. Thus, the proposed scheme resists correlation-based statistical attacks.
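Eqs. (32)–(35) translate directly into code. The Python sketch below (the function name is hypothetical) computes the horizontal correlation of adjacent samples by pairing \( {p}_i \) with \( {q}_i={p}_{i+1} \); it uses every adjacent pair in the sequence rather than a random subset, which is an assumption since the text does not say how pairs were selected.

```python
import math

def adjacent_correlation(samples):
    """Correlation coefficient of Eq. (32) between each sample and its right neighbour."""
    p, q = samples[:-1], samples[1:]
    count = len(p)
    mean_p, mean_q = sum(p) / count, sum(q) / count                       # Eq. (35)
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / count  # Eq. (33)
    var_p = sum((a - mean_p) ** 2 for a in p) / count                     # Eq. (34)
    var_q = sum((b - mean_q) ** 2 for b in q) / count
    return cov / math.sqrt(var_p * var_q)

print(adjacent_correlation([1, 2, 3, 4, 5]))  # 1.0: perfectly correlated neighbours
```

Values near 1 (as for the original audio in Table 3) mean adjacent samples move together; values near 0 mean they are essentially unrelated.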

Table 3 Correlation analysis of different audio
Fig. 7

Correlation analysis of man, female, birds and alarm sound (a) Correlation analysis of original audios (b) Correlation analysis of their corresponding encrypted audios

5.4 Information entropy

Information entropy measures the degree of uncertainty in the ciphered data: the higher the entropy, the greater the uncertainty in the encrypted audio file. It is defined as follows.

$$ H=-\sum \limits_{k=0}^{\mathcal{L}}\mathcal{P}(k){\mathit{\log}}_2\mathcal{P}(k) $$
(36)

where \( \mathcal{L} \) denotes the largest sample level of the audio file and \( \mathcal{P}(k) \) denotes the probability of occurrence of level k. For 16-bit audio there are \( {2}^{16} \) possible levels, so the theoretical maximum of H is 16, and a cryptosystem is considered secure if the entropy of the ciphered file is close to 16. We evaluated the proposed scheme using information entropy; the results are listed in Table 4. The entropy values of all ciphered audio files are very close to 16, indicating near-optimal uncertainty; therefore, the proposed algorithm resists entropy attacks.
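A minimal sketch of Eq. (36), assuming the probabilities \( \mathcal{P}(k) \) are estimated from the empirical frequencies of the sample values (the function name is hypothetical). A perfectly flat histogram over all \( {2}^{16} \) levels reaches the maximum of 16 bits.

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Information entropy H of Eq. (36), in bits, from empirical sample frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy(list(range(-2**15, 2**15))))  # 16.0 for a perfectly flat histogram
```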

Table 4 Entropy analysis

5.5 Differential attacks

Resistance to differential attacks is measured by two quantities, the number of pixels change rate (NPCR) and the unified average changing intensity (UACI), which evaluate the sensitivity of a cryptosystem. A proficient cryptographic algorithm should be sensitive, so that a slight change in the plain data yields a large change in the cipher data. For two ciphered audio files \( {C}_1 \) and \( {C}_2 \) whose plain audio files differ slightly, NPCR and UACI are defined as follows.

$$ NPCR=\frac{1}{K}\sum \limits_{i=1}^{K}D(i)\times 100\% $$
(37)

In Eq. (37), K denotes the cardinality of the audio data set, and

$$ D(i)=\left\{\begin{array}{c}0\kern2em if\kern1.5em {C}_1(i)={C}_2(i)\\ {}1\kern2em if\kern1.5em {C}_1(i)\ne {C}_2(i)\end{array}\right. $$
(38)

The UACI is given by

$$ UACI=\frac{1}{K}\sum \limits_{i=1}^{K}\frac{\left|{C}_1(i)-{C}_2(i)\right|}{{2}^k-1}\times 100\% $$
(39)

In Eq. (39), k is the number of bits per sample of the audio data. An algorithm is considered secure against differential attacks if its NPCR and UACI values are approximately 100% and 33.3333%, respectively. We evaluated the proposed audio encryption scheme using NPCR and UACI; the resulting values are given in Table 5. The results of both analyses show that the proposed scheme resists differential attacks.
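A Python sketch of the NPCR and UACI computation, assuming the standard definitions (D(i) = 1 when the two cipher samples differ) and treating the cipher files as flat sequences. The function name and the default bit depth `bits=16` (the k in the denominator \( {2}^k-1 \)) are our assumptions.

```python
def npcr_uaci(c1, c2, bits=16):
    """NPCR and UACI (in percent) for two equal-length cipher sequences."""
    if len(c1) != len(c2):
        raise ValueError("cipher sequences must have the same length")
    count = len(c1)
    changed = sum(1 for a, b in zip(c1, c2) if a != b)           # sum of D(i)
    intensity = sum(abs(a - b) for a, b in zip(c1, c2)) / (2**bits - 1)
    return 100.0 * changed / count, 100.0 * intensity / count
```

In practice c1 and c2 would be the encryptions of two plain audio files that differ in a single sample.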

Table 5 Differential analysis

5.6 Asymptotic complexity and execution time

Asymptotic time complexity analysis gives a theoretical estimate of the running time an algorithm needs to complete, usually expressed in big-O notation \( \mathcal{O} \). In this subsection, we discuss the asymptotic time complexity of the proposed encryption scheme. The permutation step rearranges the audio data according to random numbers generated by the chaotic map. Permuting each element takes constant time \( \mathcal{O}(1) \), so the permutation step takes \( \mathcal{O}\left(M\times N\right) \) time in total, where M × N is the dimension of the audio matrix. Similarly, the substitution step uses S-boxes with a constant number of elements, so substituting each element takes constant time \( \mathcal{O}(1) \) and the whole substitution step takes \( \mathcal{O}\left(M\times N\right) \). Since every step of the algorithm has the same complexity, the overall encryption procedure is linear, that is, \( \mathcal{O}\left(M\times N\right) \). To measure the execution time, we implemented the scheme in MATLAB R2019b on a personal computer running 64-bit Windows 10, equipped with an Intel(R) Core(TM) i7-5600U CPU @ 2.60 GHz and 16 GB of RAM. The execution times for encrypting audio files of different types are listed in Table 6.

Table 6 Execution time

5.7 Peak signal to noise ratio

The peak signal-to-noise ratio (PSNR) measures how closely one signal matches another and is mainly used to assess data quality after processing by a cryptographic algorithm. The PSNR between the original audio I and the encrypted audio I′, in decibels (dB), is computed from the ratio between the maximum possible sample value and the mean squared error (MSE) introduced by the processing. Between original and encrypted audio, a low PSNR (equivalently, a high MSE) indicates that encryption has heavily distorted the signal, which is the desired outcome for an encryption algorithm. The PSNR is defined as

$$ PSNR=20\cdot {\log}_{10}\left(\frac{255}{\sqrt{MSE}}\right) $$

where

$$ MSE=\frac{1}{M\times N}\sum \limits_{i=1}^M\sum \limits_{j=1}^N{\left[I\left(i,j\right)-{I}^{\prime}\left(i,j\right)\right]}^2 $$

where I(i, j) and I′(i, j) are the sample values of the original and the encrypted audio, respectively. The PSNR and MSE results of the proposed encryption scheme are listed in Table 7. The scheme yields low PSNR values and high MSE values, confirming that the encrypted audio bears little resemblance to the original.
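The two formulas above can be computed together as in the following Python sketch (function name hypothetical). The `peak` parameter defaults to 255 to match the PSNR formula as printed; for 16-bit audio a caller might instead pass a 16-bit peak such as \( {2}^{15}-1 \), which is our suggestion rather than the paper's choice.

```python
import math

def mse_psnr(original, encrypted, peak=255.0):
    """MSE and PSNR (dB) between two M x N sample matrices given as nested lists."""
    rows, cols = len(original), len(original[0])
    mse = sum((original[i][j] - encrypted[i][j]) ** 2
              for i in range(rows) for j in range(cols)) / (rows * cols)
    if mse == 0:
        return 0.0, math.inf  # identical signals: PSNR is unbounded
    return mse, 20 * math.log10(peak / math.sqrt(mse))
```

A lossless scheme gives MSE = 0 (unbounded PSNR) between the original and the decrypted audio, while a strong cipher gives a large MSE between the original and the encrypted audio.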

Table 7 PSNR and MSE analysis

5.8 NIST statistical test

In this section, we analyze the random number sequence generated by the proposed random number generator to evaluate its suitability for cryptographic applications. Because the NIST tests apply to binary data, we first convert the generated sequence to binary. Table 8 lists the sixteen NIST statistical tests and their results. The generated sequence passes all the randomness tests, which indicates that the proposed scheme generates good-quality random sequences suitable for audio encryption.
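As an illustration of how a test in the NIST suite scores a bit sequence, the Python sketch below implements only the frequency (monobit) test of NIST SP 800-22; it is not the full suite used for Table 8. The helper `to_bits` and its fixed bit width are hypothetical, since the text does not specify how the chaotic sequence was binarized. Under SP 800-22, a sequence passes a test at the usual significance level if its P-value is at least 0.01.

```python
import math

def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test: P-value for a 0/1 sequence."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)  # map 0 -> -1, 1 -> +1 and sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def to_bits(values, width):
    """Hypothetical helper: concatenate fixed-width binary expansions of integers."""
    return [(v >> k) & 1 for v in values for k in range(width - 1, -1, -1)]

p = monobit_p_value([int(c) for c in "1011010101"])  # example input from SP 800-22
print(p)
```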

Table 8 NIST randomness test for cryptographic applications

6 Conclusion

This paper introduced a three-dimensional chaotic map and its application to audio encryption. In the first part of the paper, we presented the map and evaluated it through phase plots and bifurcation diagrams. We then used the map to design a novel audio encryption scheme. The chaotic sequences are used to shuffle the data of the plain audio to achieve diffusion. In the confusion module, the permuted block of fifteen-bit integers is first divided into two subblocks of eight-bit and seven-bit integers. Möbius transformations are used to generate good-quality 8 × 8 and 7 × 7 S-boxes, which then substitute the blocks of eight-bit and seven-bit integers, producing strong confusion in the ciphered blocks. Simulations demonstrate that the proposed scheme successfully encrypts audio into an unrecognizable, uniform sound. Moreover, the scheme was evaluated against various attacks, and the results show that it exhibits better resistance to statistical and differential attacks.