Keywords

1 Introduction

Every day millions (perhaps billions) of messages in the form of texts, audio, images, and videos, are communicated on the Internet, which is an open (unsecure) network. So, there must be robust technique(s) in order to communicate secretly. In the context of secure communication, encryption is the best choice, which encodes a secret message into a form which is unrecognizable, except by the intended one. Broadly, there are two types of encryption schemes: symmetric-key encryption and asymmetric-key encryption. The symmetric-key encryption, also known as (a.k.a.) private-key encryption, uses the same secret key for encoding and decoding a message. The foremost application of the private-key encryption is to provide confidentiality. On the other hand, asymmetric-key encryption, a.k.a. public-key encryption, uses different keys for encoding and decoding a message. In particular, public key is used for encoding, while private (secret) key is used for decoding a message. The foremost applications of the public-key encryption are authentication and non-repudiation, besides confidentiality.

Since the symmetric-key encryption methods are much faster and more efficient, for attaining confidentiality, as compared to the asymmetric-key encryption methods, therefore, we adopt the symmetric-key encryption method in the proposed technique. Note that the Rotation-XOR (RX) operations, used in the proposed technique, are the primitive operations, which are efficiently and directly supported by most of the computer processors. These operations aid in the possible improvement of speed of the designed technique.

The rest of the paper has been put in the following order: Sect. 2 provides related works; Sect. 3 gives preliminaries; Sect. 4 describes the encryption and decryption algorithms of the proposed technique; Sect. 5 describes the implementation and experimental results; Sect. 6 discusses security analyses of the proposed technique; Sect. 7 gives comparison of the proposed technique with the recent state-of-the-art techniques; and Sect. 8 concludes the paper, followed by the references.

2 Related Works

Abouelkheir and El-Sherbiny [1] in 2022 proposed a technique for the security of digital audio files based on a modified RSA (Rivest, Shamir, and Adleman) algorithm. The authors modified the RSA algorithm via using dynamic keys—for enhancing security of the proposed technique, and five numbers (two primes and three random numbers)—for enhancing speed of the proposed technique. Several metrics have been utilized in order to validate the aims of the designed scheme. Although the scheme performs well in terms of encryption, but in terms of decryption, it is not a good scheme. It performs lossy decryption, i.e., the decrypted audio files are not exactly identical to the original audio files.

Shah et al. [2] in 2021 proposed a technique for the secure communication of digital audio files based on the finite fields. The authors generated a sequence of pseudo-random numbers via an elliptic curve, which is used to scramble the samples of the plain audio files. Further, the scrambled audio samples are substituted via the newly constructed S-boxes, to ensure the confusion–diffusion properties [3] required for a secure encryption algorithm. Faragallah and El-Sayed [4] in 2021 proposed an encryption scheme for securing the audio files based on the XOR (eXclusive OR) operation and the Hartley Transform (HT). First of all, a plain audio file is reshaped into a two-dimensional (2D) data block, and then it is XOR-ed with a grayscale image (treated as a secret key). The obtained XOR-ed blocks are then transposed via a chaotic map, followed by an optical encryption using HT. Naskar et al. [5] in 2021 suggested an encryption scheme for audio files based on the distinct key blocks together with the Piece-Wise Linear Chaotic Map (PWLCM) and the Elementary Cellular Automata (ECA). The scheme encrypts a plain audio file in three stages: cyclic shift, substitution, and scrambling. The cyclic shift is used for reducing the correlation between the samples of each audio block. The shifted audio data blocks are substituted (modified) via PWLCM, and finally, modified blocks are scrambled via ECA for better diffusion.

Abdelfatah [6] in 2020 proposed an algorithm for securing audio files in three phases utilizing three secret keys. The first phase is the self-adaptive scrambling of the plain audio files via the first secret key. The second phase is the dynamic DNA (deoxyribonucleic acid) encoding of the scrambled audio data via the second secret key. The last phase is the cipher feedback mode via the third secret key, which aids in achieving better confusion and diffusion properties.

3 Preliminaries

3.1 Digital Audio

A digital audio, say, P is a l-by-c matrix, consisting of elements called samples, where l and c denote the number of samples and the number of channels in P, respectively. If \(c = 1\), then P is said to be a single (or mono) channel audio file, and if \(c = 2\), then P is said to be a dual (or sterio) channel audio file. Note that the samples in P are the floating-point values, i.e., real values. Figure 1 shows the oscillogram (a graph between amplitude and time) of the audio file “handel.wav”, which is of size 73113 \(\times \) 1, i.e., a single-channel audio file containing 73113 samples. For other details of the audio file “handel.wav”, namely, sample rate (in Hz—Hertz), duration (in sec—seconds), bits per sample, bit rate (in kbps—1000 bits per second), and size (in KB—1024 Bytes), see Table 1.

Fig. 1
figure 1

Oscillogram of the audio file “handel.wav”

3.2 Rotation operation

By rotation operation, we mean “circular shift” or “bit-wise” rotation. It is of two types:

  1. 1.

    Left rotation. It is denoted by “\(\lll \)”. By \(x \lll y\), it is meant that x is left rotated by y bits. For example, if \(x =\) 0001 0111 and \(y = 1\), then \(x \lll y\) gives 0010 1110. Figure 2a demonstrates the concept, wherein MSB is the Most Significant Bit and LSB is the Least Significant Bit.

  2. 2.

    Right rotation. It is denoted by “\(\ggg \)”. By \(x \ggg y\), it is meant that x is right rotated by y bits. For example, if \(x =\) 0001 0111 and \(y = 1\), then \(x \ggg y\) gives 1000 1011. Figure 2b demonstrates the concept.

Fig. 2
figure 2

a Left rotation of \(x =\) 0001 0111 by 1-bit and b right rotation of x by 1-bit

3.3 XOR Operation

It is one of the simplest operations in a computer’s processor. It is a bit-wise operation that takes two strings of bits of equal length and performs the XOR (denoted by \(\oplus \)) operation as: if two bits are same, the result is 0; and if not same, the result is 1. It’s actually addition modulo 2.

For example, if \(a =\) 1010 1011 and \(b =\) 0101 1100, then \(a \oplus b =\) 1111 0111.

4 Description of the Proposed Encryption and Decryption Algorithms

4.1 Preprocessing on the Audio File

Input. An audio file P of size \(l \times 1\).

  1. 1.

    Convert the audio samples of P from the floating point values (real values) to binary (matrix) via single-precision floating point (32-bit).Footnote 1

  2. 2.

    Convert the binary (matrix) to non-negative integers (bytes) array, i.e., P is of size \(1 \times l\). Note that, here samples of P are in bytes (0–\(2^8 - 1\)).

  3. 3.

    Now, if l is a multiple of 4, then no padding is required, else pad (\(4 - r\)) elements “post” with zeros to P, where r is a remainder on dividing l by 4.

  4. 4.

    Convert the bytes of P into WORDS, where WORD is a collection of 4 bytes, and rename the audio file P as \(P_w\).

Output. The audio file \(P_w\) of size \(1 \times m\), where m denotes the number of WORDS in \(P_w\).

4.2 Reverse Preprocessing on the Audio File

Input. The audio file \(P_w\) of size \(1 \times m\), where m being the number of WORDS in \(P_w\).

  1. 1.

    Convert the WORDS of the audio file \(P_w\) into bytes (0–2\(^8 -\) 1), and now, the size of \(P_w\) is \(1 \times 4m\). Rename \(P_w\) as P.

  2. 2.

    Remove “last” zero (padded) bytes, if any, from P, and let the size of P becomes \(1 \times l\) bytes.

  3. 3.

    Convert the bytes (non-negative integers—0–\(2^8 - 1\)) into a binary (matrix).

  4. 4.

    Convert the binary (matrix) into the floating-point values via the single-precision floating point (32-bit).

  5. 5.

    Take the transpose of P so that the size of P becomes \(l \times 1\).

Output. The audio file P of size \(l \times 1\).

4.3 Preprocessing on Secret Key

Input. Secret key \(K =\) {\(k_1\), \(k_2\), \(k_3\), \(k_4\), \(k_5\), \(k_6\), \(k_7\), \(k_8\), \(k_9\), \(k_{10}\), \(k_{11}\), \(k_{12}\), \(k_{13}\), \(k_{14}\), \(k_{15}\), \(k_{16}\), \(k_{17}\), \(k_{18}\), \(k_{19}\), \(k_{20}\), \(k_{21}\), \(k_{22}\), \(k_{23}\), \(k_{24}\), \(k_{25}\), \(k_{26}\), \(k_{27}\), \(k_{28}\), \(k_{29}\), \(k_{30}\), \(k_{31}\), \(k_{32}\)} of 32 bytes.

  1. 1.

    Split the secret key K into two equal parts, say, \(K_1\) and \(K_2\) as \(K_1 =\){\(k_1\), \(k_2\), \(k_3\), \(k_4\), \(k_5\), \(k_6\), \(k_7\), \(k_8\), \(k_9\), \(k_{10}\), \(k_{11}\), \(k_{12}\), \(k_{13}\), \(k_{14}\), \(k_{15}\), \(k_{16}\)} and \(K_2 =\){\(k_{17}\), \(k_{18}\), \(k_{19}\), \(k_{20}\), \(k_{21}\), \(k_{22}\), \(k_{23}\), \(k_{24}\), \(k_{25}\), \(k_{26}\), \(k_{27}\), \(k_{28}\), \(k_{29}\), \(k_{30}\), \(k_{31}\), \(k_{32}\)}.

  2. 2.

    Convert the key-bytes of \(K_1\) and \(K_2\) into WORDS as \(K_{1w} =\){\(q_{1w}\), \(q_{2w}\), \(q_{3w}\), \(q_{4w}\)}, and \(K_{2w} =\){\(r_{1w}\), \(r_{2w}\), \(r_{3w}\), \(r_{4w}\)}, where \(q_{1w} = k_1k_2k_3k_4\), \(q_{2w} = k_5k_6k_7k_8\), \(q_{3w} = k_9k_{10}k_{11}k_{12}\), and \(q_{4w} = \{k_{13}k_{14}k_{15}k_{16}\)}; \(r_{1w} = k_{17}k_{18}k_{19}k_{20}\), \(r_{2w} = k_{21}k_{22}k_{23}k_{24}\), \(r_{3w} = k_{25}k_{26}k_{27}k_{28}\), and \(r_{4w} = \{k_{29}k_{30}k_{31}k_{32}\)}.

  3. 3.

    Expansion of \(K_{1w}\).

    • \(\blacktriangleright \) Expand \(K_{1w}\) to the size m as:

      1. (a)

        For \(i =\) 1, 2, 3, 4; \(T_1[i] = K_{1w}[i]\), i.e., \(T_1[1] = q_{1w}\), \(T_1[2] = q_{2w}\), \(T_1[3] = q_{3w}\), and \(T_1[4] = q_{4w}\).

      2. (b)

        Calculate \(T_1[5]\) as

        $$\begin{aligned} T_1[5] = \text {mod}(\lceil mean(T_1[i])\rceil \text {,} \ 2^{32})\text {,} \qquad i = \text {1, 2, 3, 4.} \end{aligned}$$

        where “mean” denotes the average function, \(\lceil \cdot \rceil \) denotes the ceiling function, and “mod” denotes the modulus function.

      3. (c)

        Calculate \(T_1[i]\), for \(i =\) 6, 7, ..., m, as

        $$\begin{aligned} T_1[i] = \text {mod}(T_1[i - 1] + T_1[i - 2]\text {,}\ 2^{32})\text {,} \qquad i = \text {6, 7,} \dots \text {,}\ m{.} \end{aligned}$$
  4. 4.

    Expansion of \(K_{2w}\).

    • \(\blacktriangleright \) Expand \(K_{2w}\) to the size m as:

      1. (a)

        For \(i =\) 1, 2, 3, 4; \(T_2[i] = K_{2w}[i]\), i.e., \(T_2[1] = r_{1w}\), \(T_2[2] = r_{2w}\), \(T_2[3] = r_{3w}\), and \(T_2[4] = r_{4w}\).

      2. (b)

        Calculate \(T_2[5]\) as

        $$\begin{aligned} T_2[5] = \text {mod}(\lceil mean(T_2[i])\rceil \text {,} \ 2^{32})\text {,} \qquad i = \text {1, 2, 3, 4.} \end{aligned}$$

        where symbols have their usual meanings.

      3. (c)

        Calculate \(T_2[i]\), for \(i =\) 6, 7, ..., m, as

        $$\begin{aligned} T_2[i] = \text {mod}(T_2[i - 1] + T_2[i - 2]\text {,}\ 2^{32})\text {,} \qquad i = \text {6, 7,} \dots \text {,}\ m{.} \end{aligned}$$
  5. 5.

    Generation of a third key.

    • \(\blacktriangleright \) Generate a third key \(K_{3w}\) from \(K_{1w}\) and \(K_{2w}\) as:

      $$\begin{aligned} K_{3w} = \text {mod}(K_{1w} \cdot K_{2w}\text {,}\ 2^{32}) \end{aligned}$$

      where “\(\cdot \)” denotes component-wise multiplication.

Output. The expanded keys \(T_1\) and \(T_2\) of size m, and the generated key \(K_{3w}\) of size 4.

4.4 Encryption Algorithm

Input. An audio file P of size \(l \times 1\) and the secret key K of 32-byte.

  1. 1.

    Apply preprocessing on the audio file P (see Sect. 4.1), and let the obtained file be \(P_w\) of size \(1 \times m\).

  2. 2.

    Apply preprocessing on the secret key K (see Sect. 4.3) to obtain the expanded keys \(T_1\) & \(T_2\) of size m, and the generated key \(K_{3w}\) of size 4 (in WORDS).

  3. 3.

    Initial round substitution. XOR \(P_w\) with \(T_1\), i.e.,

    $$\begin{aligned} B[i] = P_w[i] \oplus T_1[i], \qquad i =1, 2, \ldots , m. \end{aligned}$$
  4. 4.

    First round substitution.

    1. (a)

      Let \(B =\){\(b_1\), \(b_2\), \(\dots \), \(b_m\)}, then do the following:

      $$\begin{aligned} \begin{array}{ll} \qquad \text {for} \ i = \text {1 to}\ m \\ \qquad \quad b_{i - 1} = c_{i - 1} \\ \qquad \quad c_i = [b_i \lll \sigma (b_{i - 1})] \oplus b_{i - 1}\\ \qquad \text {end for} \end{array} \end{aligned}$$

      where \(b_0 = c_0 = b_m\); “\(\lll \)” denotes the left rotation operator; and “\(\sigma \)” in \(\sigma (b_{i - 1})\) denotes sum-of-digits function, and \(\sigma (b_{i - 1})\) denotes sum-of-digits of \(b_{i - 1}\). For instance, if \( b_{i - 1} = 123\), then \(\sigma (123) = 1 + 2 + 3 = 6\).

    2. (b)

      Let \(C =\){\(c_1\), \(c_2\), \(\dots \), \(c_m\)}, then do the following:

      $$\begin{aligned} \begin{array}{l} C[i] = C[i] \oplus K_{3w}[i] \qquad i =\text {1, 2, 3, and} \\ C[m] = C[m] \oplus K_{3w}[4]. \end{array} \end{aligned}$$
  5. 5.

    Second round substitution.

    1. (a)

      Do the following:

      $$\begin{aligned} \begin{array}{l} \qquad \text {for} \ j = \text {1 to}\ m \\ \qquad \quad c_{j - 1} = d_{j - 1} \\ \qquad \quad d_j = [c_j \lll \sigma (c_{j - 1})] \oplus c_{j - 1}\\ \qquad \text {end for} \end{array} \end{aligned}$$

      where \(d_0 = c_m\) and the rest symbols have their usual meanings.

    2. (b)

      Let \(D =\){\(d_1\), \(d_2\), \(\dots \), \(d_m\)}, then do the following:

      $$\begin{aligned} E[j] = D[j] \oplus T_2[j]\text {,} \qquad j = \text {1, 2,} \dots \text {,}\ m. \end{aligned}$$
  6. 6.

    Apply the reverse preprocessing on the audio file E of size \(1 \times m\) (see Sect. 4.2), and let the obtained audio file be F of size \(l \times 1\).

Output. The encrypted audio file F of size \(l \times 1\).

4.5 Decryption Algorithm

Input. The encrypted audio file F of size \(l \times 1\) and the secret key K (32-byte).

  1. 1.

    Apply the preprocessing on the audio file F (see Sect. 4.1) to obtain an audio file E of size \(1 \times m\), m being number of WORDS in E.

  2. 2.

    Second round substitution.

    1. (a)

      XOR the audio file E with \(T_2\), i.e.,:

      $$\begin{aligned} D[j] = E[j] \oplus T_2[j]\text {,} \qquad j =\text {1, 2,} \dots \text {,}\ m. \end{aligned}$$
    2. (b)

      Let \(D =\){\(d_1\), \(d_2\), ..., \(d_m\)}, then do the following:

      $$\begin{aligned} \begin{array}{ll} \qquad \text {for} \ j = m \ \text {to 1} \\ \qquad \quad c_j = [d_j \oplus d_{j - 1}] \ggg \sigma (d_{i - 1}) \\ \qquad \text {end for} \end{array} \end{aligned}$$

      where “\(j = m\) to 1” means \(j = m\text {,} \ m - 1\), ..., 2, 1; \(d_0 = d_m\); and “\(\ggg \)” denotes the right rotation.

  3. 3.

    First round substitution.

    1. (a)

      Let \(C =\){\(c_1\), \(c_2\), ..., \(c_m\)}, then do the following:

      $$\begin{aligned} \begin{array}{ll} C[i] = C[i] \oplus K_{3w}[i]\text {,} \qquad i =\text {1, 2, 3, and} \\ C[m] = C[m] \oplus K_{3w}[4]. \end{array} \end{aligned}$$
    2. (b)

      Do the following:

      $$\begin{aligned} \begin{array}{ll} \qquad \text {for} \ i = m \ \text {to 1} \\ \qquad \quad b_i = [c_i \oplus c_{i - 1}] \ggg \sigma (c_{i - 1}) \\ \qquad \text {end for} \end{array} \end{aligned}$$

      where \(c_0 = C_m\) and the rest symbols have their usual meanings.

  4. 4.

    Initial round substitution. Let \(B =\){\(b_1\), \(b_2\), ..., \(b_m\)}, then do the following:

    $$\begin{aligned} P_w[i] = B[i] \oplus T_1[i]\text {,} \qquad i ={1, 2, \ldots ,} \ m. \end{aligned}$$
  5. 5.

    Apply the reverse preprocessing on the audio file \(P_w\) (see Sect. 4.2) of size \(1 \times m\), to obtain the audio file P of size \(l \times 1\).

Output. The decrypted (original) audio file P of size \(l \times 1\).

5 Implementation and Experimental Results

The proposed technique is implemented on MATLAB (R2021a) software under the Windows 10 operating system. To evaluate the performance (encryption and decryption qualities) of the proposed technique, two test audio files of different sample lengths are taken from the MATLAB IPT (Image Processing Toolbox).Footnote 2 The details of these audio files are provided in Table 1. Also, the oscillograms of the original, encrypted, and the decrypted audio files are shown in Fig. 3.

Table 1 Description of the test audio files

6 Security Analyses

6.1 Key Space Analysis

The space of all potential combinations of a key constitutes a key space of any encryption/decryption algorithm. Keyspace should be very large so that attacks, such as the brute-force [9], known/chosen plaintext [10], etc., could become unsuccessful. Our proposed technique is based on a secret key of 32 bytes (256 bits), which produces a key space of \(2^{256}\), and as of today, it is believed to be unbreakable.

6.2 Encryption Evaluation Metrics

Since any single metric cannot evaluate any encryption algorithm (or any encrypted audio file) fully, so we employ two important metrics, namely, the oscillogram and the number of sample change rates.

6.2.1 Oscillogram Analysis

The oscillogram is a 2D graph of an audio file (or signal) between the amplitude and the time, generated by the oscilloscope, a.k.a. oscillograph [11, Chap. 5]. It represents the change in amplitude of an audio file over the time. X-axis represents the time in seconds, while Y-axis represents the amplitude in volts. The oscillograms of the original, encrypted, and the decrypted audio files are shown in Fig. 3.

Fig. 3
figure 3

Experimental results: Figs. a and d show the oscillograms of the original audio files; Figs. b and e show the oscillograms of the corresponding encrypted audio files; and Figs. c and f show the oscillograms of the corresponding decrypted audio files

From Fig. 3, we observe that the oscillograms of the encrypted audio files are uniform, unlike those of the corresponding original audio files. Also, the oscillograms of the decrypted audio files are identical to those of the corresponding original files. Thus, our proposed technique performs a robust encryption. Also, since the audio files are successfully decrypted without any data loss, the designed technique performs lossless decryption.

6.2.2 Number of Sample Change Rate (NSCR) Test

The NSCR [13] is used to test the resistance of the differential attack [14], or judging the Shannon’s diffusion property [3]. The NSCR scores between the encrypted audio files \(E_1\) and \(E_2\) can be calculated via Eq. 1:

$$\begin{aligned} {} NSCR = \sum _{s=1}^{l}\dfrac{\beta (s\text {,}\ 1)}{l} \times 100\% \ \text {,} \end{aligned}$$
(1)

where \(\beta (s\text {,}\ 1)\) is given by Eq. 2:

$$\begin{aligned} {} \beta (s\text {,}\ 1) = {\left\{ \begin{array}{ll} 0 \text {,} &{} \text {if} \ E_1(s\text {,}\ 1) = E_2(s\text {,}\ 1)\\ 1 \text {,} &{} \text {if} \ E_1(s\text {,}\ 1) \ne E_2(s\text {,}\ 1) \end{array}\right. } \end{aligned}$$
(2)

where \(E_1(s\text {,}\ 1)\) and \(E_2(s\text {,}\ 1)\) are the samples of the encrypted audio files prior to and after alteration of only one sample of the original audio file.

We have calculated the NSCR scores by changing only one sample of the test audio files at different positions (from beginning—(1, 1)th sample as well as from the last—(l, 1)th sample), l being the total number of samples in an audio file. The obtained NSCR scores are shown in Table 2. Note that, if the calculated/reported NSCR score is greater than the theoretical NSCR value, which is 99.5527 at 0.01 significance level and 99.5693% at 0.05 level [13], then the NSCR test is passed. The proposed technique passes the NSCR test for all the audio files, and thus, ensures the property of diffusion, and also, outperforms the methods listed in Table 2, which are vulnerable to the differential attack.

Table 2 NSCR scores of the encrypted images

6.3 Decryption Evaluation Metric

To evaluate the decryption algorithm, i.e., the decrypted audio files, we use an important metric: the mean square error.

6.3.1 Mean Square Error (MSE) Analysis

The MSE [15] is used to judge the decryption quality of any decrypted audio file. The MSE value can be any non-negative integer. Lower the MSE, better is the decryption quality, in particular, value 0 denotes the perfect decryption, i.e., the original and the decrypted audio files are exactly identical—lossless decryption. The MSE can be calculated via Eq. 3:

$$\begin{aligned} MSE = \sum _{j = 1}^{l} \dfrac{(P_j - D_j)}{l} \ \text {,} \end{aligned}$$
(3)

where \(P_j\) and \(D_j\) denote the jth samples of the original and the decrypted audio files, respectively, while the other symbols have their usual meanings.

The values of the MSE between the original and the decrypted audio files are provided in Table 3. From the table, we observe that the MSE values are 0 (zero), endorsing that the decrypted audio files are perfectly identical to the original audio files.

Table 3 The MSE values between the decrypted and the original audio files

6.4 Key Sensitivity Analysis

This test is utilized to judge the confusion property [3] of any encryption/decryption algorithm. According to Shannon [3], a secure cryptographic algorithm must have the confusion property to thwart statistical attacks. It is the property of confusion that hides the relationship between the encrypted data and the secret key. The key sensitivity test is utilized to judge this confusion property. The sensitivity of the secret key is assessed in two aspects:

  1. 1.

    Encryption. It is used to measure the dissimilarity between the two encrypted audio files \(E_1\) and \(E_2\) w.r.t. the same plain audio file P using two different encryption keys \(\lambda _1\) and \(\lambda _2\), where \(\lambda _1\) and \(\lambda _2\) are obtained from the original secret key K by altering merely the LSB corresponding to the last and the first bytes of K, respectively.

  2. 2.

    Decryption. It is used to measure the dissimilarity between the two decrypted audio files \(D_1\) and \(D_2\) w.r.t. the same encrypted audio file E, encrypted via secret key K, using the decryption keys \(\lambda _1\) and \(\lambda _2\), respectively. Note that both the encryption/decryption keys \(\lambda _1\) and \(\lambda _2\) differ from each other as well as from the secret key K merely by 1-bit.

The results of the key sensitivity analysis w.r.t. the encryption (enc—in short) and decryption (dec—in short) aspects are shown in Figs. 4 and 5, respectively, whence we infer that the proposed technique has a very high bit-level sensitivity, and thus, ensures the property of confusion.

Fig. 4
figure 4

Key sensitivity analysis w.r.t. the encryption

Fig. 5
figure 5

Key sensitivity analysis w.r.t. the decryption

7 Comparison with the Existing Techniques

The proposed technique is compared with the recent state-of-the-art techniques based on the commonly available metrics, namely, the NSCR and the MSE. The comparisons of the proposed approach with the recent approaches based on the NSCR and the MSE metrics are provided in Tables 2 and 3, respectively. From these tables, we infer that our proposed technique performs well in terms of the respective compared metrics.

8 Conclusion

In this paper, we proposed a technique for securing digital audio files based on the WORD-oriented RX operations. Several performance evaluation metrics, i.e., encryption and decryption evaluation metrics, have been used on the audio files of varying sizes from the standard database, in order to empirically assess the efficiency and robustness of the designed approach. The results of these performance evaluation metrics validate the goals of the proposed approach. Moreover, a thorough comparison with the recent state-of-the-art techniques, based on several metrics, have also been made.