1 Introduction

The wide popularity of the Internet, combined with rapid advances in cell phone and computer technologies, has led to explosive growth in electronic data exchange and digital communications. As a result, digital speech communication is used in almost all activities of daily life, for example in commerce, education, the military, politics, e-learning, news telecasting, telephony, and banking. This means that a massive amount of sensitive audio data is exchanged daily over open and shared networks. Because of the rapid growth of digital audio and data communications, providing a high level of security has become a major issue.

Voice information, unlike text messages and images, has higher redundancy and stronger correlation between samples. In recent years, several image and video scrambling methods have been presented (Zhang et al. 2008; Lin and Chang 2001; Petitcolas et al. 1999; Langelaar et al. 2000; Chen and Lin 2003; Barni et al. 2001; Refregier and Javidi 1995; Hedelin et al. 1999; Yang et al. 1998; Sang et al. 2003; Kwon et al. 2006; Wu and Ng 2002; Shumei et al. 2009; Al Saad and Hato 2014), but audio scrambling techniques are relatively few. Audio scrambling algorithms must disorganize the audio signal and eliminate the correlation between samples. The common audio scrambling methods use 1-D linear mappings based on simple algorithms, which makes these methods easy to attack (Zeng et al. 2012).

The main problem in any audio encryption technique is that the audio signal varies slowly with time, so it contains a large number of adjacent samples with similar values. To solve this problem, we need to do the following:

  1. Destroy the correlation between samples in order to remove any intelligibility.

  2. Change the formant, pitch and energy of the original signal.

  3. Fill silent periods within the conversation without revealing the secret key.

  4. Keep the processing time small to decrease delay.

Sound waves are characterized in terms of frequency (Hz), amplitude (dB) and phase (degrees); frequency and amplitude are perceived as pitch and loudness, respectively.

Traditional cryptographic techniques are efficient for text data only. They are poorly suited to securing sensitive audio data because of its large volume and high redundancy. Strong cryptographic algorithms are therefore needed to encrypt sensitive audio data before transmission, and designing efficient audio encryption techniques that provide a high security level for sensitive audio data is a new challenge. This problem has recently been studied by many researchers, who have proposed a number of audio encryption schemes (Al Saad and Hato 2014; Li et al. 2009; Kohad et al. 2012; Sharma 2012; Zeng et al. 2012; Sheu 2011; Elshamy et al. 2013; Zhao et al. 2014; Mermoul and Belouchrani 2010; Maysaa and Qays 2013). These schemes can generally be classified into analog and digital (Al Saad and Hato 2014).

In a digital cryptosystem, the audio signal is sampled with a suitable sampling frequency to break the continuous audio signal into equal short time segments. This stream of discrete audio segments is quantized and encoded to generate a binary data stream. The output binary data stream can be compressed to generate a data signal at a suitable bit rate. At the transmitter side, the binary data stream is encrypted with an encryption algorithm, which changes the sequence of bits by means of block or stream ciphering. The changed sequence is then transmitted via digital modulation (Zhai et al. 2008). Digital encryption has a higher security level than analog, but it requires complex implementation and large bandwidth for transmission (Al Saad and Hato 2014).

The most efficient encryption techniques for dealing with redundant and bulky audio data are the chaos-based and double random phase encoding (DRPE) techniques, as they provide highly secure and fast encryption (Maysaa and Qays 2013; Del Re et al. 1989).

This paper presents an efficient method for encrypting digital audio signals using a hybrid of chaotic maps and optical encryption. The proposed audio cryptosystem aims to enhance audio security during transmission over shared networks. It is implemented using either Arnold’s cat map with DRPE or the Baker map with DRPE. As the names indicate, each variant combines a chaotic cryptosystem with DRPE as an optical cryptosystem, so each has two security layers. The first layer is provided by the chaotic map, while the second layer is provided by DRPE, which adds a physical security layer that reinforces the security level of the proposed audio cryptosystem. The performance of the proposed audio cryptosystems is investigated using different encryption/decryption audio quality metrics, including spectral distortion (SD), log likelihood ratio (LLR), correlation, processing time, histogram and spectrogram. The results show that the proposed audio cryptosystem provides a high security level with high confidentiality. The results also show that the hybrid Baker map with DRPE cryptosystem outperforms the Arnold’s cat map with DRPE system on the audio quality metrics considered.

The rest of the paper is organized as follows: Sect. 2 presents a general discussion of the chaotic cryptosystems embedded in the proposed systems. In Sect. 3, the details of DRPE are discussed. Section 4 presents the proposed audio cryptosystem. In Sect. 5, the key performance indicators used to evaluate the proposed audio cryptosystems are given. Section 6 presents the experimental results and discussion. Finally, Sect. 7 summarizes the main conclusions of the paper.

2 Chaotic system

Any nonlinear, deterministic dynamic system that exhibits pseudorandom behavior is a chaotic system. The output of a chaotic system depends on its initial conditions and on specific parameters; that is, different initial conditions and/or different parameter values yield different system outputs (Del Re et al. 1989). Chaotic systems are attractive for cryptography and pseudo-random number generation because of their random-like behavior and their sensitivity to initial conditions and parameter settings (Chen et al. 2014; Ahmad et al. 2012): any small change in the initial conditions or parameter settings can lead to a very large difference in the final system state within a few iterations. Because of these characteristics, chaotic systems can fulfill cryptographic properties such as confusion, diffusion and disorder. Many researchers have used chaotic systems in developing cryptographic algorithms (Al Saad and Hato 2014; Li et al. 2009; Kohad et al. 2012; Sharma 2012; Zeng et al. 2012; Sheu 2011; Elshamy et al. 2013; Zhao et al. 2014; Mermoul and Belouchrani 2010; Maysaa and Qays 2013; Del Re et al. 1989).

2.1 Chaotic Arnold’s cat map

In mathematics, a chaotic function or map is any function that exhibits some kind of chaotic behavior (Del Re et al. 1989). Arnold’s cat map (ACM), or Arnold transform (AT), proposed by Vladimir Arnold in 1960, is an example of a two-dimensional chaotic map (Zhang et al. 2008). If ACM is applied to a digital image, it randomizes the original arrangement of the pixels, and the image becomes unintelligible and noise-like. However, the map has a period p: if it is iterated p times, the original image reappears.

Definition

The generalized schema of Arnold’s cat map can be given by the following transformation

$$ \Gamma :{\rm T}^{2} \to {\rm T}^{2} $$

such that:

$$ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \pmod{N} $$
(1)

where x, y ∈ {0, 1, 2, …, N − 1} and N is the size of the digital image. A new image is produced when all points of a given image are mapped once by Eq. (1). ACM is simple, yet it is a strong transform (Lin and Chang 2001). Digital image encryption can be done by applying ACM in the following manner. Let p be the transform period of an N × N digital image I. Applying ACM to I for a randomly chosen number of iterations t ∈ [1, p] yields a scrambled image I′ that is completely chaotic and different from I. I′ can then be transmitted over communication channels without revealing any data to unauthorized receivers or sniffers (Petitcolas et al. 1999; Langelaar et al. 2000). The receiver repeats the operation (p − t) times to recover the original image. The proposed scheme uses ACM for encrypting audio: it first reshapes the audio from 1-D format to 2-D format, and then applies the same ACM procedure to the resulting 2-D matrix.

The original Arnold transformations given by Eq. (1) can be easily modified to produce a sequence of Arnold transformations as follows:

$$ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} i & i + 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \pmod{N} $$
(2)

or

$$ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} i + 1 & i \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \pmod{N} $$
(3)

where i ∈ {1, 2, 3, …}.

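The transformations given by Eqs. (2) and (3) are periodic because the absolute value of det(A) is always 1 in both cases, where A = [a, b; c, d] is the Arnold transform matrix. With |det(A)| = 1, A is invertible modulo N, so the map is a permutation of the finite set of N × N positions and must return to its starting arrangement after a finite number of iterations. Equations (2) and (3) generate a sequence of different Arnold transforms (Petitcolas et al. 1999; Langelaar et al. 2000) with different periodicity values \( P_k \) (Sang et al. 2003).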
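The scrambling and recovery procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`arnold_step`, `arnold_period`, `arnold_scramble`) are hypothetical, the period is found by brute-force iteration, and the pair (x, y) is assumed to index the rows and columns of a NumPy array.

```python
import numpy as np

def arnold_step(block, matrix=((2, 1), (1, 1))):
    """Apply one iteration of the map in Eq. (1) (or Eqs. (2)-(3)) to an N x N block."""
    n = block.shape[0]
    (a, b), (c, d) = matrix
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    x_new = (a * x + b * y) % n
    y_new = (c * x + d * y) % n
    out = np.empty_like(block)
    # |det| = 1 makes the map a permutation, so every target cell is written once.
    out[x_new, y_new] = block[x, y]
    return out

def arnold_scramble(block, iterations, matrix=((2, 1), (1, 1))):
    """Apply the map `iterations` times."""
    for _ in range(iterations):
        block = arnold_step(block, matrix)
    return block

def arnold_period(n, matrix=((2, 1), (1, 1))):
    """Smallest p >= 1 for which p iterations restore every position (brute force)."""
    reference = np.arange(n * n).reshape(n, n)
    current = arnold_step(reference, matrix)
    p = 1
    while not np.array_equal(current, reference):
        current = arnold_step(current, matrix)
        p += 1
    return p

# Usage: scramble with t iterations, recover with the remaining p - t iterations.
block = np.arange(64, dtype=float).reshape(8, 8)
p = arnold_period(8)
t = 1
scrambled = arnold_scramble(block, t)
recovered = arnold_scramble(scrambled, p - t)
```

Passing `matrix=((i, i + 1), (1, 1))` or `matrix=((i + 1, i), (1, 1))` gives the variants of Eqs. (2) and (3); the period then generally changes with i and N.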

2.2 Chaotic Baker Map

In this section, the chaotic Baker map is explained (Pande and Zambreno 2011; Fridrich 1998). There are two types of chaotic Baker map: the generalized map and the discretized map. The discretized Baker map is an effective way to randomize the elements of a square matrix. Let \( B_{(n_1, \ldots, n_k)} \) denote the discretized map, where the vector \( [n_1, \ldots, n_k] \) represents the secret key, \( S_{key} \). Defining N as the number of data elements in one row, the secret key is chosen such that each integer \( n_i \) divides N, and \( n_1 + \cdots + n_k = N \).

Let \( N_i = n_1 + \cdots + n_{i-1} \). The data item at the indices (q, z) is moved to the indices:

$$ B_{{(n_{1} , \ldots ,n_{k} )}} (q,z) = \left( {\frac{N}{{n_{i} }}(q - N_{i} ) + z \, \bmod \left( {\frac{N}{{n_{i} }}} \right),\frac{{n_{i} }}{N}\left( {z - z \, \bmod \left( {\frac{N}{{n_{i} }}} \right)} \right) + N_{i} } \right) $$
(4)

where \( N_i \le q < N_i + n_i \), \( 0 \le z < N \), and \( N_1 = 0 \).

The following steps summarize the execution procedure of chaotic permutation:

  1. An N × N square matrix is divided into N rectangles, each of width \( n_i \) and containing N elements.

  2. The elements in each rectangle are reshaped into a row of the permuted matrix. Rectangles are taken from left to right, beginning with the upper rectangles and then the lower ones. Inside each rectangle, the scan begins from the bottom left corner and proceeds towards the upper elements. Figure 1 shows an example of the permutation of an 8 × 8 matrix. The secret key is chosen to be (2, 4, 2), hence N = 8, \( n_1 = 2 \), \( n_2 = 4 \), and \( n_3 = 2 \). Figure 1a shows the generalized Baker map and Fig. 1b shows the discretized Baker map.

    Fig. 1 Baker map. a Generalized Baker map, b discretized Baker map
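Equation (4) translates almost directly into code. The following Python sketch is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, (q, z) is assumed to index the array as `block[q, z]`, and the inverse is obtained simply by reading the mapping backwards.

```python
import numpy as np

def _check_key(n_total, key):
    # The key must satisfy the two conditions stated for Eq. (4).
    if sum(key) != n_total or any(n_total % n_i != 0 for n_i in key):
        raise ValueError("each n_i must divide N and the n_i must sum to N")

def baker_permute(block, key):
    """Move the item at (q, z) to the position given by Eq. (4)."""
    n_total = block.shape[0]
    _check_key(n_total, key)
    out = np.empty_like(block)
    offset = 0                      # N_i = n_1 + ... + n_{i-1}, with N_1 = 0
    for n_i in key:
        w = n_total // n_i          # N / n_i
        for q in range(offset, offset + n_i):
            for z in range(n_total):
                q_new = w * (q - offset) + z % w
                z_new = (z - z % w) // w + offset
                out[q_new, z_new] = block[q, z]
        offset += n_i
    return out

def baker_inverse(block, key):
    """Undo baker_permute by reading each item back from its mapped position."""
    n_total = block.shape[0]
    _check_key(n_total, key)
    out = np.empty_like(block)
    offset = 0
    for n_i in key:
        w = n_total // n_i
        for q in range(offset, offset + n_i):
            for z in range(n_total):
                out[q, z] = block[w * (q - offset) + z % w, (z - z % w) // w + offset]
        offset += n_i
    return out

# Usage with the key of the 8 x 8 example, (2, 4, 2).
key = (2, 4, 2)
original = np.arange(64).reshape(8, 8)
permuted = baker_permute(original, key)
restored = baker_inverse(permuted, key)
```

The triple loop keeps the correspondence with Eq. (4) explicit; a vectorized version would be preferable for large blocks.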

3 Double random phase encoding (DRPE)

Many optical image encryption techniques have been proposed in the literature. The most widely applied and effective among them is DRPE, which was presented by Refregier and Javidi (1995). DRPE is selected from the optical family of security algorithms for use with a chaotic system in the proposed hybrid audio cryptosystems for several reasons. First, DRPE uses two keys in addition to the optical transformation, which increases the security level. Second, its encryption/decryption time is very small compared with other optical encryption algorithms. Third, it has no special requirements, and its implementation is very simple. Fourth, it is applicable to 2-D and 3-D formats. Finally, it can be applied both optically and digitally. Table 1 presents a comparison between some optical encryption techniques using different parameters.

Table 1 Comparison between different optical cryptosystems

DRPE is based on modifying the spectral distribution of the reformatted 2-D audio. It works by inserting two random phase masks (RPMs), one in the input plane and the other in the Fourier plane, to encrypt the transformed 2-D audio into stationary noise in a setup called “4f”. The 4f setup is an optical system consisting of two cascaded lenses separated by two focal lengths, as shown in Fig. 2, with the input and output image planes each located one focal length outside the lens system on opposite sides (so the total length is four focal lengths, hence “4f”).

Fig. 2 Encryption process by double random phase encoding (DRPE)

The decryption process must use the same RPM used in the encryption process. When applied in a 4f optical processor, the complex conjugate Fourier phase key is required to decrypt the encrypted 2-D audio in DRPE.

The normal DRPE is divided into three stages:

  1. RPM1, the first key, is multiplied by the transformed 2-D audio to be encrypted. This procedure introduces the first modification of the spectrum of the target 2-D audio.

  2. RPM2, the second key, is multiplied directly by the spectrum of the transformed 2-D audio in the Fourier plane. Multiplying RPM2 by the spectrum obtained in the first stage is the second modification of the spectrum of the target transformed 2-D audio.

  3. The second optical (inverse) Fourier transform is carried out through a second lens to obtain the encrypted 2-D audio in the original 2-D audio space.

To explain DRPE in detail, we consider a primary 2-D transformed audio f(x, y) with positive intensity values, where x and y denote the time domain coordinates, and υ and η denote the Fourier domain coordinates. Let ψ(x, y) represent the encrypted 2-D audio, and let n(x, y) and m(υ, η) represent two independent white key sequences, chosen so that the phases 2πn and 2πm are uniformly distributed in [0, 2π]. To encode the primary transformed 2-D audio f(x, y) into a white stationary sequence, the two RPMs used are \( \varphi_n(x, y) = \exp[2i\pi n(x, y)] \) and \( \varphi_m(\upsilon, \eta) = \exp[2i\pi m(\upsilon, \eta)] \). The RPM2 key \( \varphi_m(\upsilon, \eta) \) is the Fourier transform of a function h(x, y), that is,

$$ FT\left\{ {h(x,y)} \right\} = \hat{h}(\upsilon ,\eta ) = \varphi_{m} \left( {\upsilon ,\eta } \right) = \exp \left[ {2i\pi m\left( {\upsilon ,\eta } \right)} \right] $$
(5)

The encryption process consists of multiplying the primary transformed 2-D audio f(x, y) by RPM1, \( \varphi_n(x, y) \), and then convolving the product with h(x, y). The encrypted 2-D audio is complex-valued, with amplitude and phase, and is given by the following expression:

$$ \psi (x,y) = \left\{ {f\left( {x,y} \right)\varphi_{n} \left( {x,y} \right)} \right\}*FT^{ - 1} \left\{ {\varphi_{m} \left( {\upsilon ,\eta } \right)} \right\} $$
(6)

The symbol (*) denotes convolution. The encrypted audio by Eq. (6) has a noise-like appearance that does not reveal any details of the original 2-D audio.

The decryption result output is

$$ FT^{ - 1} \left\{ {FT\left[ {\psi \left( {x,y} \right)} \right]\varphi_{m}^{ * } \left( {\upsilon ,\eta } \right)} \right\} = FT^{ - 1} \left\{ {FT\left[ {f(x,y)\varphi_{n} (x,y)} \right]\varphi_{m} \left( {\upsilon ,\eta } \right)\varphi_{m}^{ * } \left( {\upsilon ,\eta } \right)} \right\} = f\left( {x,y} \right)\varphi_{n} \left( {x,y} \right) $$
(7)

Taking the absolute value of Eq. (7) yields the decrypted 2-D audio f(x, y), because f(x, y) is positive and \( |\varphi_n(x, y)| = 1 \). The whole encryption and decryption mechanism can be implemented either digitally or optically. The optical hardware is very simple; it can be the classical 4f processor (Goodman 1996; Javidi et al. 1997). In the encryption mechanism, the 4f processor has RPM1 as the first key, attached to the original transformed 2-D audio in the time plane, and RPM2 as the second key in the Fourier plane.
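A digital version of Eqs. (6) and (7) can be sketched with the discrete Fourier transform. The sketch below is illustrative only: the function names and the seeded NumPy generator used to draw the masks are assumptions, and the convolution in Eq. (6) becomes a circular convolution because it is evaluated as a product in the FFT domain.

```python
import numpy as np

def make_masks(shape, seed):
    """Draw two independent random phase masks exp(2*pi*i*n) and exp(2*pi*i*m)."""
    rng = np.random.default_rng(seed)   # the seed stands in for the secret key (assumption)
    n = rng.random(shape)               # uniform in [0, 1), so the phase is uniform in [0, 2*pi)
    m = rng.random(shape)
    return np.exp(2j * np.pi * n), np.exp(2j * np.pi * m)

def drpe_encrypt(f, rpm1, rpm2):
    """Eq. (6): multiply by RPM1, multiply the spectrum by RPM2, transform back."""
    return np.fft.ifft2(np.fft.fft2(f * rpm1) * rpm2)

def drpe_decrypt(psi, rpm2):
    """Eq. (7): cancel RPM2 with its complex conjugate, then take the absolute value."""
    return np.abs(np.fft.ifft2(np.fft.fft2(psi) * np.conj(rpm2)))

# Usage on a small positive-valued block.
f = np.random.default_rng(0).random((8, 8)) + 0.1
rpm1, rpm2 = make_masks(f.shape, seed=1)
psi = drpe_encrypt(f, rpm1, rpm2)
f_rec = drpe_decrypt(psi, rpm2)
```

Recovering f with the absolute value relies on f being non-negative, as assumed in the text; a block that contains negative samples would need an offset or the phase-encoded variant.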

If the 2-D audio is phase encoded as exp[iπf(x, y)] before encryption, the encrypted 2-D audio is given by Eq. (8):

$$ \psi_{p} \left( {x,y} \right) = \left\{ {\exp \left[ {i\pi f\left( {x,y} \right)} \right]\varphi_{n} \left( {x,y} \right)} \right\} * h\left( {x,y} \right) = \left\{ {\exp \left[ {i\pi f\left( {x,y} \right)} \right]\varphi_{n} \left( {x,y} \right)} \right\} * FT^{ - 1} \left\{ {\varphi_{m} \left( {\upsilon ,\eta } \right)} \right\} $$
(8)

The encrypted 2-D audio can be generated either optically or digitally, in a way similar to that described for Eq. (6). The same optical setup shown in Fig. 2 is used for the decryption process, but in this case the complex conjugates of both RPMs, \( \varphi_n^{*}(x, y) = \exp[-2i\pi n(x, y)] \) and \( \varphi_m^{*}(\upsilon, \eta) = \exp[-2i\pi m(\upsilon, \eta)] \), referred to as the first and second keys, are required to retrieve the original 2-D audio. The second key \( \varphi_m^{*}(\upsilon, \eta) \) is placed in the Fourier plane, whereas the first key \( \varphi_n^{*}(x, y) \) is placed at the input plane of the optical processor. The phase-only version of the original 2-D audio, exp[iπf(x, y)], is recovered in the time domain. The original 2-D audio f(x, y) can then be displayed as an intensity distribution by extracting the phase of exp[iπf(x, y)] and dividing it by π.

4 Proposed hybrid audio encryption technique

To meet the requirements of modern cryptography applications with high security levels, the proposed hybrid cryptosystem is composed of digital and optical encryption techniques. It combines DRPE as an optical encryption technique with Arnold’s Cat map or Chaotic Baker map as a digital encryption technique. Figure 3 illustrates how the proposed hybrid cryptosystem works.

Fig. 3 Flowchart of the proposed audio cryptosystem. a Encryption steps of the proposed audio cryptosystem, b decryption steps of the proposed audio cryptosystem

As shown in Fig. 3a, the processing steps of the proposed encryption system can be summarized as follows:

  1. Segment the original audio signal into segments, and then reshape them into 2-D format.

  2. Mask with Arnold cat map or Baker map.

  3. Add the mask to the transformed 2-D audio.

  4. Clipping:

    a. A value of 2 is subtracted from all values exceeding 1, resulting in negative values, so that all samples lie between −1 and 1.

  5. Apply DRPE.

    a. Generate the first RPM key, RPM1, and multiply it by the target audio signal to be encrypted.

    b. Generate the second RPM key, RPM2, and insert it into the audio signal in the Fourier plane. The insertion of RPM2 into the audio obtained in the first phase introduces the second modification of the target audio signal.

    c. Perform the second optical Fourier transform using a second lens to obtain the encoded audio in the original 2-D space of the audio signal.

  6. Reshape the 2-D format to 1-D format, which represents the encrypted audio signal.

  7. Synthesize segments.

As shown in Fig. 3b, the operation of the proposed decryption system can be summarized as follows:

  1. Segment the encrypted audio signal into segments, and then reshape them into 2-D format.

  2. Apply DRPE with the complex conjugates of the RPM keys.

    a. Generate the first RPM key, RPM1, and multiply it by the target audio signal to be decrypted.

    b. Generate the second RPM key, RPM2, and insert it into the audio signal in the Fourier plane. The insertion of RPM2 into the audio obtained in the first phase introduces the second modification of the target audio signal.

    c. Perform the second optical Fourier transform using a second lens to obtain the decoded audio in the original 2-D space of the audio signal.

  3. Inverse clipping.

    a. Add a value of 2 to values below −1 in the resulting 2-D audio.

  4. Masking with Arnold cat map or Baker map.

  5. Subtract the mask from the encrypted 2-D audio.

  6. Reshape the 2-D audio to 1-D format, which represents the original audio signal.

  7. Synthesize segments and reconstruct the audio signal.
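The segmentation and reshaping steps that open and close both procedures can be sketched as follows. This is a hedged illustration: the helper names are hypothetical, and zero-padding the last segment is an assumption, since the text does not say how a signal whose length is not a multiple of N × N is handled.

```python
import numpy as np

def to_blocks(signal, n):
    """Split a 1-D signal into N x N blocks, zero-padding the tail (assumption)."""
    signal = np.asarray(signal, dtype=float)
    block_size = n * n
    pad = (-len(signal)) % block_size
    padded = np.concatenate([signal, np.zeros(pad)])
    # The original length is returned so the padding can be dropped later.
    return padded.reshape(-1, n, n), len(signal)

def from_blocks(blocks, length):
    """Flatten the blocks back to 1-D and discard the padding."""
    return blocks.reshape(-1)[:length]

# Usage: 150 samples become three 8 x 8 blocks, then the original 150 samples again.
signal = np.sin(np.linspace(0.0, 20.0, 150))
blocks, length = to_blocks(signal, 8)
rebuilt = from_blocks(blocks, length)
```

Each block in `blocks` would then go through the masking, clipping and DRPE steps listed above before being flattened again.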

4.1 Masking

The mask is created from the secret key. A specific number of ones is introduced into an all-zero block, and this block is then permuted with the chaotic Baker map to create a mask of zeros and ones, as shown in Fig. 4. The output mask is added to each block of the audio signal after the reshaping step. This step is necessary to hide silent periods within the audio signal and so to resist known-plaintext attacks.

Fig. 4 Generation of the mask. a Fill first four rows with ones, b permute the ones with the chaotic Baker map

If, for example, the secret key of the Baker map is {4, 2, 2, 4}, then the sub-keys sum to 12, which gives 12 × 12 blocks.

The number of sub-keys is 4, so the following steps are executed:

  1. (a) First, fill the first four rows of a 12 × 12 all-zero block with ones; the number of rows equals the number of sub-keys, which is four.

  2. (b) Second, permute the block resulting from step (a) with the chaotic Baker map to spread the ones over the whole block.

  3. (c) Finally, add the output mask to each block and apply a clipping step to the output block. In the clipping step, a value of 2 is subtracted from all values exceeding 1, resulting in negative values, so that all samples lie between −1 and 1, as shown in Fig. 5.

    Fig. 5 Steps for addition of the mask. a Original signal, b mask, c after addition of mask, d final block
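Steps (a) to (c) can be sketched as below for the example key {4, 2, 2, 4}. The code is an illustrative sketch, not the authors' implementation: all function names are hypothetical, samples are assumed to lie strictly inside (−1, 1), and the removal routine subtracts the mask before undoing the clipping, an ordering chosen here because only then do the wrapped samples fall below −1.

```python
import numpy as np

def baker_permute(block, key):
    """Compact discretized Baker map following Eq. (4)."""
    n_total = block.shape[0]
    out = np.empty_like(block)
    offset = 0                                  # N_i
    for n_i in key:
        w = n_total // n_i                      # N / n_i
        for q in range(offset, offset + n_i):
            for z in range(n_total):
                out[w * (q - offset) + z % w, z // w + offset] = block[q, z]
        offset += n_i
    return out

def make_mask(key):
    """Steps (a)-(b): fill len(key) rows with ones, then spread them with the Baker map."""
    n_total = sum(key)
    mask = np.zeros((n_total, n_total))
    mask[:len(key), :] = 1.0
    return baker_permute(mask, key)

def add_mask(block, mask):
    """Step (c): add the mask, then subtract 2 from every value exceeding 1."""
    out = block + mask
    out[out > 1.0] -= 2.0
    return out

def remove_mask(masked, mask):
    """Subtract the mask, then add 2 to every value below -1 (assumed ordering)."""
    out = masked - mask
    out[out < -1.0] += 2.0
    return out

# Usage with the 12 x 12 example.
key = (4, 2, 2, 4)
mask = make_mask(key)
block = np.random.default_rng(0).uniform(-0.9, 0.9, size=mask.shape)
masked = add_mask(block, mask)
restored = remove_mask(masked, mask)
```

A sample equal to exactly 1 under a mask value of 1 cannot be distinguished from a sample equal to −1 after wrapping, which is why the sketch assumes amplitudes strictly inside the interval.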

5 Key performance metrics

Many metrics can be used to evaluate the quality of audio cryptosystems. These metrics are classified into two categories: encryption quality metrics and decryption quality metrics.

5.1 Encryption quality metrics

Encryption quality metrics are of great importance in the design of audio encryption techniques. They indicate the amount of distortion introduced by the audio cryptosystem, and they help in determining the parameter settings and optimizing the cryptosystem structure. The greater the distortion, the better the performance of the audio cryptosystem. Encryption quality metrics measure how far the encrypted signal is from the original signal. They are also concerned with determining the immunity of the encryption algorithm to cryptanalysis attacks.

A good encryption scheme should resist all types of known attacks. In this research paper, the security of the proposed audio encryption scheme is evaluated and compared with other audio encryption schemes using the following encryption quality metrics:

5.1.1 Statistical analysis

Several kinds of ciphers might be solved by statistical analysis (Pascal 2005). Statistical analysis is applied on the proposed audio cryptosystem to demonstrate its confusion and diffusion properties. This is shown by a test on the histogram of the encrypted audio signal, the correlation between samples in the clear and encrypted signals, and the SD measures.

Histogram A histogram is a graphical display of the tabulated densities of data (Shumei et al. 2009). In this research, the histogram test is used to evaluate the success of the substitution step by showing that new sample values are introduced into the encrypted signal in place of the original values.

Correlation coefficient (CC) A useful metric for determining the encryption quality of an audio cryptosystem is the CC between corresponding samples in the clear and the encrypted signals. It can be computed as follows:

$$ r_{xy} = \frac{{c_{v} (x,y)}}{{\sqrt {D(x)} \sqrt {D(y)} }} $$
(9)

where \( c_v(x, y) \) is the covariance between the original signal x and the encrypted signal y, and D(x) and D(y) are the variances of the signals x and y, respectively. In numerical calculations, the following discrete formulas can be applied (Manjunath and Anand 2002):

$$ E(x) = \frac{1}{N_{x}}\sum\limits_{n = 1}^{N_{x}} x(n) $$
(10)
$$ D(x) = \frac{1}{N_{x}}\sum\limits_{n = 1}^{N_{x}} \left( x(n) - E(x) \right)^{2} $$
(11)
$$ c_{v}(x,y) = \frac{1}{N_{x}}\sum\limits_{n = 1}^{N_{x}} \left( x(n) - E(x) \right)\left( y(n) - E(y) \right) $$
(12)

where \( N_x \) is the number of audio samples involved in the computations. A low value of the correlation coefficient \( r_{xy} \) indicates good encryption quality.
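Equations (9) to (12) can be evaluated directly with NumPy. The short sketch below is illustrative; the function name is hypothetical, and both signals are assumed to have the same length \( N_x \).

```python
import numpy as np

def correlation_coefficient(x, y):
    """Eq. (9), using the discrete mean, variance and covariance of Eqs. (10)-(12)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ex, ey = x.mean(), y.mean()               # Eq. (10)
    dx = np.mean((x - ex) ** 2)               # Eq. (11)
    dy = np.mean((y - ey) ** 2)
    cov = np.mean((x - ex) * (y - ey))        # Eq. (12)
    return cov / (np.sqrt(dx) * np.sqrt(dy))

# Usage: a signal compared with itself, and with an unrelated noise sequence.
clear = np.sin(np.linspace(0.0, 50.0, 2000))
noise = np.random.default_rng(0).uniform(-1.0, 1.0, size=clear.size)
r_same = correlation_coefficient(clear, clear)
r_noise = correlation_coefficient(clear, noise)
```

A constant signal has zero variance, so the ratio is undefined in that case; the sketch does not guard against it.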

Spectral distortion (SD) The SD is an encryption metric that is applied in the frequency domain to the spectra of the original and processed audio signals. It is computed in dB and indicates how far the spectrum of the processed audio signal is from that of the original audio signal. The SD can be computed as follows (Hedelin et al. 1999):

$$ SD = \frac{1}{M}\sum\limits_{m = 0}^{M - 1} \sum\limits_{k = L_{s} m}^{L_{s} m + L_{s} - 1} \left| V_{s}(k) - V_{y}(k) \right| $$
(13)

where \( V_s(k) \) is the spectrum of the original audio signal in dB for a certain segment, \( V_y(k) \) is the spectrum of the processed audio signal in dB for the same segment, M is the number of segments and \( L_s \) is the segment length. SD is used as a quality metric for both encryption and decryption. For encryption, a high value of SD between the original and encrypted signals indicates good quality. In contrast, for decryption, a low value of SD between the original and decrypted signals indicates good quality.
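One possible reading of Eq. (13) is sketched below. Several details are assumptions rather than statements from the text: the spectrum of each segment is taken as the full-length FFT magnitude, it is converted to dB as 20·log10, a small floor `eps` avoids taking the logarithm of zero, and samples beyond the last complete segment are ignored. The function names are hypothetical.

```python
import numpy as np

def spectrum_db(segment, eps=1e-12):
    """Magnitude spectrum of one segment in dB (eps is an assumed numerical floor)."""
    return 20.0 * np.log10(np.abs(np.fft.fft(segment)) + eps)

def spectral_distortion(original, processed, seg_len):
    """Eq. (13): per-segment sum of absolute dB differences, averaged over M segments."""
    original = np.asarray(original, dtype=float)
    processed = np.asarray(processed, dtype=float)
    m = min(len(original), len(processed)) // seg_len
    if m == 0:
        raise ValueError("signals are shorter than one segment")
    total = 0.0
    for i in range(m):
        part = slice(i * seg_len, (i + 1) * seg_len)
        total += np.sum(np.abs(spectrum_db(original[part]) - spectrum_db(processed[part])))
    return total / m

# Usage: distortion between a clear signal and an unrelated noise-like signal.
rng = np.random.default_rng(0)
clear = np.sin(np.linspace(0.0, 100.0, 1024))
noisy = rng.uniform(-1.0, 1.0, size=1024)
sd_value = spectral_distortion(clear, noisy, seg_len=256)
```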

5.1.2 Processing time (PT)

The processing time is the time needed to encrypt or decrypt the signal. The lower the processing time, the higher the encryption speed. Only the decryption time of the proposed audio cryptosystem is reported, because the encryption and decryption processes take approximately the same time.

5.2 Decryption quality metrics

Decryption quality metrics are of great importance in the design and maintenance of audio cryptosystems. They indicate the amount of distortion introduced by an audio cryptosystem, which helps in determining the parameter settings and optimizing the encryption algorithm. These metrics determine the immunity of the audio cryptosystem to distortion and attacks. They are computed on the decrypted signal to verify its quality. Two approaches are used to determine the quality of decrypted audio signals: subjective and objective (Yang et al. 1998). Subjective metrics determine the quality from the perceptual ratings of a group of listeners. Objective metrics determine the quality using physical parameters and computational models; they are less expensive, save time and give more consistent results. Therefore, objective audio metrics are preferred in practical applications. Current objective audio quality metrics base their estimates on both the original and decrypted audio signals. It is useful to measure the quality of the decrypted audio signal relative to the original audio signal to show the effect of the audio cryptosystem.

5.2.1 Log likelihood ratio (LLR)

The LLR metric for audio signal depends on the assumption that each segment can be represented by an all-pole linear predictive coding model of the form (Sang et al. 2003; Kwon et al. 2006):

$$ s(n) = \sum\limits_{m = 1}^{{m_{p} }} {a_{m} s(n - m) + G_{s} u(n)} $$
(14)

where \( a_m \) (for m = 1, 2, …, \( m_p \)) are the coefficients of the all-pole filter, \( G_s \) is the gain of the filter and u(n) is an appropriate excitation source for the filter. The audio signal is windowed to form frames of 15–30 ms length. The LLR metric is then defined as (Sang et al. 2003):

$$ LLR = \left| {\log \left( {\frac{{{\vec{\mathbf{a}}}_{{\mathbf{s}}} {\bar{\mathbf{R}}}_{{\mathbf{y}}} {\vec{\mathbf{a}}}_{{\mathbf{s}}}^{{\mathbf{T}}} }}{{{\vec{\mathbf{a}}}_{{\mathbf{y}}} {\bar{\mathbf{R}}}_{{\mathbf{y}}} {\vec{\mathbf{a}}}_{{\mathbf{y}}}^{{\mathbf{T}}} }}} \right)} \right| $$
(15)

where \( {\vec{\mathbf{a}}}_{{\mathbf{s}}} \) is the LPC coefficient vector \( [1, a_s(1), a_s(2), \ldots, a_s(m_p)] \) for the original clear audio signal, \( {\vec{\mathbf{a}}}_{{\mathbf{y}}} \) is the LPC coefficient vector \( [1, a_y(1), a_y(2), \ldots, a_y(m_p)] \) for the decrypted audio signal, and \( {\bar{\mathbf{R}}}_{{\mathbf{y}}} \) is the autocorrelation matrix of the decrypted audio signal. The closer the LLR is to zero, the higher the quality of the output audio signal.
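For a single frame, Eq. (15) might be evaluated as in the sketch below. This is an illustration under assumptions, not the authors' code: the LPC coefficients are estimated with the autocorrelation method by solving the normal equations directly, the coefficient vectors are taken in prediction-error form \( [1, -a(1), \ldots, -a(m_p)] \) (a sign convention assumed here), the default order of 10 is arbitrary, and the function names are hypothetical.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation lags r(0), ..., r(order) of one frame."""
    frame = np.asarray(frame, dtype=float)
    return np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])

def toeplitz_from_lags(r):
    """Symmetric Toeplitz matrix with entries r(|i - j|)."""
    idx = np.arange(len(r))
    return r[np.abs(idx[:, None] - idx[None, :])]

def lpc_vector(frame, order):
    """Prediction-error filter [1, -a(1), ..., -a(order)] by the autocorrelation method."""
    r = autocorrelation(frame, order)
    coeffs = np.linalg.solve(toeplitz_from_lags(r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -coeffs))

def llr_frame(original, decrypted, order=10):
    """Eq. (15) for one frame of the original and decrypted signals."""
    a_s = lpc_vector(original, order)
    a_y = lpc_vector(decrypted, order)
    r_y = toeplitz_from_lags(autocorrelation(decrypted, order))
    return abs(np.log((a_s @ r_y @ a_s) / (a_y @ r_y @ a_y)))

# Usage: an identical frame, and a slightly perturbed copy of it.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
same = llr_frame(frame, frame)
perturbed = llr_frame(frame, frame + 0.01 * rng.standard_normal(400))
```

For a whole signal, the per-frame values would typically be averaged over the 15–30 ms frames mentioned above; a silent frame makes the normal equations singular and would have to be skipped.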

6 Experimental results and discussions

In a series of experiments, the two proposed hybrid audio cryptosystems, namely Arnold cat map with DRPE and Baker map with DRPE, are implemented, and their performance is evaluated and compared.

6.1 Experiment 1

In this experiment, using CC, SD, LLR, and PT as key performance indicators, the performance of the two proposed audio cryptosystems is evaluated and compared. The results are tabulated in Table 2.

Table 2 Quality metric values for the hybrid Arnold cat map with DRPE and Baker map with DRPE audio cryptosystems

From Table 2, one can see that the hybrid Baker map with DRPE audio cryptosystem outperforms the Arnold map with DRPE. In the encryption phase, it provides a lower CC and a higher SD between the original and encrypted audio signals than the Arnold map with DRPE, which indicates better encryption quality. In the decryption phase, it provides lower values of LLR, SD, and PT for the decrypted signals than the Arnold map with DRPE. This means that the quality achieved by the hybrid Baker map with DRPE audio cryptosystem is better than that obtained with the hybrid Arnold map with DRPE audio cryptosystem.

6.2 Experiment 2

In this experiment, the histogram check is used to evaluate the success of the substitution step by showing that new sample values appear in the encrypted signal in place of the original values. Figure 6a presents the histogram of the original audio signal, Fig. 6b presents the histogram of the audio signal encrypted with the chaotic Arnold map with DRPE audio cryptosystem, and Fig. 6c presents the histogram of the audio signal encrypted with the chaotic Baker map with DRPE audio cryptosystem.

From Fig. 6b, it is observed that the histogram of the audio signal encrypted with the Arnold map with DRPE audio cryptosystem is completely different from that of the original audio signal in Fig. 6a, which means that the security level of the first proposed audio cryptosystem (Arnold map with DRPE) is high. Similarly, Fig. 6c shows that the histogram of the audio signal encrypted with the Baker map with DRPE audio cryptosystem is fairly uniform and significantly different from that of the original audio signal in Fig. 6a, which means that the security level of the second proposed audio cryptosystem (Baker map with DRPE) is also high.

Fig. 6 Histograms of the audio signal. a Histogram of original audio, b histogram of encrypted audio signal using chaotic Arnold map with DRPE encryption, c histogram of encrypted audio signal using chaotic Baker map with DRPE encryption

6.3 Experiment 3

A waveform is a plot of an audio signal that shows the changes in amplitude over a certain amount of time. Figure 7a, b, and c present the waveforms of the original audio signal, the audio signal encrypted with the Baker map with DRPE audio cryptosystem, and the decrypted audio signal, respectively. As can be seen from the figure, the waveform of the encrypted audio signal is completely different from that of the original audio signal, which means that the encryption quality of the proposed hybrid chaotic optical cryptosystems is high. Also, comparing the waveforms in Fig. 7a and c shows that they are very close to each other, which indicates that the decryption quality of the proposed audio cryptosystems is high.

Fig. 7 Waveform of original, encrypted, and decrypted audio signal. a Waveform of the original audio signal, b waveform of the encrypted audio signal, c waveform of the decrypted audio signal

6.4 Experiment 4

A spectrogram is a visual representation of the spectrum of frequencies in an audio signal as they change with time or some other variable. Figure 8a, b, and c present the spectrograms of the original audio signal, the audio signal encrypted with the Baker map with DRPE audio cryptosystem, and the decrypted audio signal, respectively. Comparing Fig. 8a and b shows that the spectrogram of the encrypted signal is completely different from that of the original signal, which implies high encryption quality. Comparing Fig. 8a and c shows that they are very close to each other, which implies high decryption quality.

Fig. 8 Spectrograms of the original, encrypted and decrypted signal. a Spectrogram of the original audio signal, b spectrogram of the encrypted audio signal, c spectrogram of the decrypted audio signal

7 Conclusion

Audio security is concerned with ensuring the secrecy, reliability, accessibility and confidentiality of data. The main target of voice security is to protect audio systems from unauthorized access, disruption, alteration, destruction and use. This paper proposed two efficient hybrid chaotic optical audio cryptosystems: Arnold cat map with DRPE, and Baker map with DRPE. These cryptosystems are multilayer security systems. They use chaotic systems to provide the first security layer and DRPE to provide the second security layer, which is a physical security layer intended to resist attacks. The two proposed hybrid chaotic optical audio cryptosystems were implemented, and their performance was evaluated and compared using different performance evaluation metrics. The results showed that the two proposed audio cryptosystems provide a high degree of security with high confidentiality. The results also showed that the hybrid Baker map with DRPE audio cryptosystem outperforms the Arnold map with DRPE on the encryption/decryption audio quality metrics considered.