1 Introduction

As communication technology improves, data transmission over the internet has increased manifold. Cryptographic encryption techniques provide confidentiality by generating unintelligible data. However, unintelligible data itself signals that a secret communication is taking place, which lures attackers into mounting various attacks to decipher it. Concealing the encrypted data in some cover data helps reduce such attacks. The data can be embedded in the cover in either the spatial or the frequency domain [13]. The challenges in embedding information in cover data are achieving high imperceptibility, high embedding capacity and robustness. The three qualities are interrelated: increasing any one of them tends to reduce the other two [7]. Spatial-domain embedding offers better embedding capacity than frequency-domain embedding, whereas frequency-domain embedding offers better imperceptibility.

Various authors have developed methods to conceal confidential multimedia data such as text, audio and images inside other multimedia data. Some text hiding methods are given in [6, 10, 15, 18, 19, 30]. Karakus et al. [15] proposed a method to conceal the doctor's comments in a medical image using Genetic Algorithm-Optimum Pixel Similarity. The method of Karakus et al. was compared with the classical similarity-based method and found to have a higher data hiding capacity. Ditta et al. [6] proposed a method to conceal personal text data using Arabic text as a carrier. The method hides the secret information using Unicode characters such as the Zero Width Joiner and the Zero Width character. To achieve high security, the secret information is encrypted using bit inversion before it is concealed in the cover medium. Younus et al. [30] proposed a method to secure secret text by hiding it in a cover image. The secret text is compressed using Huffman coding and encrypted using the Vigenère cipher. Random blocks are selected from the cover image using the Knight Tour Algorithm (KTA), and the compressed, encrypted data are embedded using the Exploiting Modification Direction (EMD) technique. Ren et al. [19] proposed a method to hide secret messages using Advanced Audio Coding (AAC) audio as a carrier. The method is designed to reduce the distortion in the carrier audio by combining the Huffman codeword histogram, modification of the quantized modified discrete cosine transform (MDCT) coefficients and perceptual masking using the psychoacoustic model. Qi et al. [18], inspired by natural data hiding techniques, proposed a method to securely transmit confidential data by creating a synthetic haze image that resembles a natural weather condition. The private data is embedded using the HILL cost function. Its demerits are the possibility of false results during parameter estimation and support for binary embedding only. Rahman [10] proposed a method for securely sending confidential information for a nuclear reactor by hiding the information in the middle frequencies of a DCT-transformed cover image. The private data is converted into a binary sequence, and every two bits are hidden in LSB1 and LSB2 of the middle-frequency DCT coefficients of the cover image. Since the secret data are embedded directly, an attacker who knows that communication is taking place using this method may be able to recover the confidential data. Basu et al. [3] proposed a method to hide a secret image in another cover image. A differential evolution optimization algorithm is deployed to hide the secret data constructively.

As images are among the most frequently communicated data, various authors have developed methods for concealing a secret image using multimedia covers such as images, audio or video. Some image hiding methods are given in [1, 8, 9, 11, 17, 22, 23, 25, 28, 31, 32]. Thanki et al. [23] proposed a method to hide a secret image using another cover image. The cover image's Ridgelet and wavelet coefficients are obtained by applying the Finite Ridgelet Transform and a single-level DWT, respectively. The secret image is scrambled using Arnold's transform and embedded in the LL subband of the cover image. The drawback of the method is that Arnold's transform takes a long time to execute. Valander et al. [25] proposed a method to hide a secret image in a cover image using the Integer Wavelet Transform (IWT). The secret image is encrypted using a modified Logistic chaotic map to increase its security. Banik et al. [11] proposed a robust binary image hiding technique using audio as the carrier. The image undergoes Arnold's scrambling operation and is embedded using the cocktail party effect in the LH subband of the Discrete Wavelet Transform of the audio carrier. El-Latif et al. [1] proposed a method to hide an image half the size of the cover image using an S-box created from quantum walks. The S-box determines the positions at which the secret image data are embedded in the cover image. Mukherjee et al. [17] proposed a multi-bit embedding method that can embed 2-6 bits with an image as a carrier or 6-13 bits with audio as a carrier. Embedding is done using pixel value differencing (PVD) in the spatial domain. Zhang [32] proposed a data hiding technique using an image as a cover. The cover image undergoes the Haar IWT to avoid truncation error. The data are hidden in the edges of the LL sub-band based on multidirectional line encoding. Wahab et al. [26] proposed a method for hiding secret data in a cover image. The secret data is first encrypted using the RSA (Rivest-Shamir-Adleman) encryption scheme and compressed using Huffman coding. The encoded data is hidden using the LSB embedding technique in the LH, HL and HH sub-bands of the DFT-decomposed cover image. Abdulhammed et al. proposed a novel method to hide secret data in a cover image. The secret image data is stored in the edges of the cover image, identified using a strong edges detection algorithm (SEDA). The embedding positions are computed using a random sequence generated by the Chirikov map. Yu et al. [31] proposed a reversible data hiding technique using audio as a carrier. The personal data is converted into novenary digits, embedded into the single-channel audio carrier using a magic matrix, and converted into dual-channel stego-audio. Their method achieves a better signal-to-noise ratio (SNR) and objective difference grade (ODG) than the method of Xiang et al. [28]. Shafi et al. [22] proposed a method to hide data using audio as cover based on amplitude differencing. The method uses two audio covers of similar size to embed the data. The amplitude difference between the two cover audios, with multiple indexes ranging from 0 to 255 (each at an interval of 16), is used to generate the stego-audio, in which 4 bits of the secret data are embedded sequentially. Their method has good imperceptibility, but the maximum embedding ratio is just 12.5% of the cover audio and the robustness is low. El-Khamy et al. [8] proposed a method to hide image data in an audio signal. The image is encrypted by XORing it with a random sequence generated from the Logistic map to increase security. The audio signal undergoes a two-level Cohen-Daubechies-Feauveau integer-to-integer lifting wavelet transform. The binary bits of the encrypted image are hidden using a threshold technique in the second detail sub-band coefficients. El-Khamy et al. [9] proposed a method to hide image data in an audio signal using sample comparison with the Discrete Wavelet Transform (DWT). The original image is first encrypted using the RSA encryption technique, and binary data is generated from the cipher image. The embedding positions are determined by a pseudo-random number generator. The cover audio is decomposed using the DWT. The cipher bits are embedded by comparing each DWT sample with a threshold value and modifying the DWT coefficient where the threshold requires it. The method is robust to noise attacks, but it has a low embedding capacity.

Cipher images lure attackers into performing cryptanalysis because they indicate secret communication. Concealing the cipher image in a cover image while maintaining high imperceptibility, robustness and high embedding capacity is of research importance. Motivated by the various data hiding works, a method is proposed to conceal secret image information in audio data. The proposed method aims to improve the embedding capacity while maintaining high imperceptibility and robustness. The contributions of the paper include:

  1. The secret image is converted to unintelligible data by scrambling and encrypting it using the Ikeda chaotic map. The initial conditions for the Ikeda map are secretly shared using the Elliptic Curve Cryptography (ECC) cryptosystem.

  2. To avoid a known-plaintext attack, the initial conditions for the Ikeda map depend on the Secure Hash Algorithm (SHA3-384) hash of the secret image.

  3. To avoid arousing an attacker's suspicion that communication is taking place, the secret image's unintelligible data is concealed in the Lifting Wavelet Transform (LWT) coefficients of the carrier audio, maintaining a good embedding capacity with little compromise on the carrier audio quality.

  4. To decrease the computation involved in random number generation using the Ikeda map, a base conversion operation is applied so that each loop iteration generates 12 random values.

The paper is organized as follows. Section 2 introduces the Ikeda map. Section 3 describes the proposed scheme, which includes key exchange and key generation for the Ikeda map, generation of the scrambled-encrypted image, and concealment of the encrypted image in audio data, along with the reverse process to recover the secret image. Section 4 presents the simulation results and the analyses of the proposed method. The conclusion is given in Section 5.

2 Ikeda map

Kensuke Ikeda pioneered the development of the Ikeda map [14]. The Ikeda map is a discrete time dynamic system given by:

$$ \begin{array}{@{}rcl@{}} x_{k+1}&=&\alpha+\beta(x_{k}\cos t+y_{k}\sin t); \\ y_{k+1}&=&\beta(x_{k}\sin t-y_{k}\cos t); \end{array} $$
(1)

where \(t =c-\frac {d}{1+{x_{k}^{2}}+{y_{k}^{2}}}\), α ∈ (0,8), β ∈ (0.6,1), c ∈ (2,5), d ∈ (35,50)

Figure 1 shows the attractor, bifurcation diagram and Lyapunov characteristic exponents of the Ikeda map.

Fig. 1

Ikeda map (a) attractor (b) Bifurcation diagram (c) Lyapunov characteristic exponents

Equation 1 is iterated and plotted to depict the attractor shown in Fig. 1a. The bifurcation plot for α ∈ (0,8) with a step size of 0.005 and β = 0.6 is shown in Fig. 1b. A bifurcation diagram helps in visually examining the chaotic behavior. From Fig. 1b, it is seen that the chaotic behavior is more ergodic for α > 2.3. The Lyapunov exponent measures the exponential separation of two infinitesimally close orbits with respect to variation in the control parameter. A positive Lyapunov exponent indicates that the system is chaotic. The Lyapunov exponent plot given in Fig. 1c shows that the Ikeda system is chaotic. The Ikeda map is used in the proposed scheme to generate the chaotic sequences for image scrambling and cipher image generation, as it possesses the desired chaotic behavior.
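As an illustration, the iteration of Eq. (1) can be sketched in a few lines of Python. This is a minimal sketch that follows the equation exactly as written above; the function names, the sample parameter values and the choice not to discard any initial transient are assumptions, not details taken from the simulation.

```python
import math


def ikeda_step(x, y, alpha, beta, c, d):
    """One iteration of the map as written in Eq. (1)."""
    t = c - d / (1.0 + x * x + y * y)
    x_next = alpha + beta * (x * math.cos(t) + y * math.sin(t))
    y_next = beta * (x * math.sin(t) - y * math.cos(t))
    return x_next, y_next


def ikeda_sequence(x0, y0, alpha, beta, c, d, n):
    """Return the first n iterates as two lists (X, Y), excluding (x0, y0)."""
    xs, ys = [], []
    x, y = x0, y0
    for _ in range(n):
        x, y = ikeda_step(x, y, alpha, beta, c, d)
        xs.append(x)
        ys.append(y)
    return xs, ys


# Illustrative values chosen inside the parameter ranges stated above.
xs, ys = ikeda_sequence(0.1, 0.2, alpha=5.5, beta=0.7, c=3.5, d=43.5, n=5)
```

The same routine is deterministic for a given parameter set, which is what allows the receiver to regenerate the sequences from the shared initial conditions.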

3 Proposed scheme

3.1 Key exchange and keys generation for Ikeda map

  1. Compute the 384-bit hash value h by applying SHA3-384 to the input image.

  2. Apply elliptic curve point multiplication between the hash value h and the generator G of a finite-field elliptic curve \(E_{\rho}: y^{2} = x^{3} + ax + b \bmod \rho\) to obtain the elliptic curve point \(hG = (hG_{x}, hG_{y})\). The point hG is secretly shared with the other communicating party using ECC [21].

  3. \(hG_{x}\) and \(hG_{y}\) are each converted into 384 binary bits.

  4. Two sets of initial parameters for the Ikeda map are computed from the bits obtained in Step 3. For image scrambling, the bits of \(hG_{x}\) are used, and for image encryption, the bits of \(hG_{y}\) are used: \(x_{0} = [b_{1} \ldots b_{64}]_{2}/2^{64}\); \(y_{0} = [b_{65} \ldots b_{128}]_{2}/2^{64}\); \(\alpha = 5 + [b_{129} \ldots b_{192}]_{2}/2^{64}\); \(\beta = e + [b_{193} \ldots b_{256}]_{2}/(2^{64} \times 10)\); \(t =c-\frac {d}{1+{x_{k}^{2}}+{y_{k}^{2}}}\); \(c = 3 + [b_{257} \ldots b_{320}]_{2}/2^{64}\); \(d = 43 + [b_{321} \ldots b_{384}]_{2}/2^{64}\). Here \([b_{i} \ldots b_{i+63}]_{2}\) denotes the integer obtained by reading the 64 bits as a base-2 number, and e = 0.7 for image scrambling and e = 0.9 for image encryption.
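The parameter derivation in Step 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the elliptic curve multiplication is not reproduced, so a SHA3-384 digest of some example bytes stands in for the 384-bit coordinate, and the names `ikeda_parameters` and `stand_in` are hypothetical.

```python
import hashlib

SCALE = 2 ** 64  # each 64-bit block is mapped to a fraction in [0, 1)


def ikeda_parameters(coordinate: int, e: float) -> dict:
    """Split a 384-bit integer into six 64-bit blocks (b1..b384, most
    significant bit first) and map them to Ikeda parameters as in Step 4."""
    if not 0 <= coordinate < 2 ** 384:
        raise ValueError("coordinate must fit in 384 bits")
    bits = format(coordinate, "0384b")
    blocks = [int(bits[i:i + 64], 2) for i in range(0, 384, 64)]
    return {
        "x0": blocks[0] / SCALE,
        "y0": blocks[1] / SCALE,
        "alpha": 5 + blocks[2] / SCALE,
        "beta": e + blocks[3] / (SCALE * 10),
        "c": 3 + blocks[4] / SCALE,
        "d": 43 + blocks[5] / SCALE,
    }


# Assumption: a hash digest is used here only as a stand-in for hG_x;
# in the scheme the input would be a coordinate of the shared point hG.
stand_in = int.from_bytes(hashlib.sha3_384(b"example image bytes").digest(), "big")
scramble_params = ikeda_parameters(stand_in, e=0.7)   # e = 0.7: scrambling
encrypt_params = ikeda_parameters(stand_in, e=0.9)    # e = 0.9: encryption
```

With this mapping the derived values always fall inside fixed sub-ranges, for example α between 5 and 6 and β between e and e + 0.1.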

3.2 Image scrambling

To generate a scrambled image from a colour input plain image, the following steps are performed:

  1. Input the plain RGB colour image and determine the image dimension (M × N).

  2. The Ikeda map is run for a loop of (M × N × 3) iterations to generate the X and Y sequences using the initial parameters derived from \(hG_{x}\).

  3. The X and Y sequences are each partitioned into three parts, (XRed, XGreen, XBlue) and (YRed, YGreen, YBlue), one for each colour channel.

  4. Each list is sorted and stored as (XRSort, XGSort, XBSort) and (YRSort, YGSort, YBSort).

  5. The permutation tables (PXRed, PXGreen, PXBlue, PYRed, PYGreen, PYBlue) are generated by determining the position of each value of XRed in XRSort, XGreen in XGSort, XBlue in XBSort, YRed in YRSort, YGreen in YGSort and YBlue in YBSort, respectively.

  6. The input image is separated into the corresponding colour channels (IR, IG, IB).

  7. Each of IR, IG and IB is partitioned vertically into M parts and vertically scrambled using the permutation tables PXRed, PXGreen and PXBlue, respectively, to generate the vertically scrambled RGB images.

  8. The vertically scrambled RGB images are partitioned horizontally into N parts and horizontally scrambled using the permutation tables PYRed, PYGreen and PYBlue, respectively, to generate the horizontally scrambled images.

  9. The output RGB images of Step 8 are combined to form the required scrambled image (Simg).
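The sort-based permutation of Steps 4 to 8 can be sketched for a single colour channel as below. This is an illustrative sketch only: it assumes that the M parts are the rows and the N parts are the columns of the channel, that part i is moved to position P[i], and that one chaotic value is used per row or column; none of these conventions is stated explicitly in the steps above.

```python
def rank_permutation(sequence):
    """Position of each value of `sequence` in its sorted copy (0-based)."""
    order = sorted(range(len(sequence)), key=lambda i: sequence[i])
    perm = [0] * len(sequence)
    for rank, index in enumerate(order):
        perm[index] = rank
    return perm


def scramble_rows(channel, perm):
    """Move row i of the channel to row perm[i] (vertical scrambling)."""
    out = [None] * len(channel)
    for i, row in enumerate(channel):
        out[perm[i]] = list(row)
    return out


def scramble_columns(channel, perm):
    """Move column j of the channel to column perm[j] (horizontal scrambling)."""
    out = []
    for row in channel:
        new_row = [None] * len(row)
        for j, value in enumerate(row):
            new_row[perm[j]] = value
        out.append(new_row)
    return out


# Toy 2 x 3 channel with hand-picked "chaotic" values.
channel = [[1, 2, 3], [4, 5, 6]]
row_perm = rank_permutation([0.9, 0.1])
col_perm = rank_permutation([0.3, 0.8, 0.5])
scrambled = scramble_columns(scramble_rows(channel, row_perm), col_perm)
```

Because the chaotic values are distinct with overwhelming probability, the ranks form a valid permutation without any collision handling.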

3.3 Image encryption

To generate the cipher image from the scrambled image (Simg), the following steps are performed:

  1. The Ikeda map is run for a loop of (M × N × 3)/12 iterations to generate the X and Y sequences using the initial parameters derived from \(hG_{y}\).

  2. Each value in X and Y is converted into 6 integers as follows:

    $$ \begin{array}{@{}rcl@{}} S_{\chi}&=&[X_{i} \times 10^{16}]_{256}[[2...7]] \\ S_{\Psi}&=&[Y_{i} \times 10^{16}]_{256}[[2...7]] \end{array} $$
    (2)

    where \([X_{i} \times 10^{16}]_{256}[[2...7]]\) and \([Y_{i} \times 10^{16}]_{256}[[2...7]]\) denote the conversion of the integer to base 256, which yields a list of digits in the range 0 to 255, from which the digits at positions 2 to 7 are taken.

  3. The values in SX = (sχ1,sχ2,...,sχn) and SY = (sΨ1,sΨ2,...,sΨn) are riffled (interleaved) as S = (sχ1,sΨ1,sχ2,sΨ2,...,sχn,sΨn) and used to generate a chaotic image Cimg.

  4. The chaotic image Cimg is XORed with the scrambled image Simg to generate the encrypted image Eimg.
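The base conversion, riffle and XOR steps can be sketched as follows. This is a hedged sketch rather than the simulation code: positions 2 to 7 are read with 1-based indexing from the most significant digit, the product is truncated to an integer, and the sign of negative values is dropped; these three conventions and all function names are assumptions.

```python
def base256_digits(n: int) -> list:
    """Base-256 digits of a non-negative integer, most significant first."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, 256)
        digits.append(remainder)
    return digits[::-1] or [0]


def six_values(value: float) -> list:
    """Digits at positions 2..7 of [value * 10^16] in base 256 (Eq. 2)."""
    n = abs(int(value * 10 ** 16))  # assumption: truncate, ignore the sign
    digits = base256_digits(n)
    if len(digits) < 7:
        raise ValueError("value too small to supply positions 2..7")
    return digits[1:7]


def riffle(first, second):
    """Interleave two equal-length lists as (a1, b1, a2, b2, ...)."""
    out = []
    for a, b in zip(first, second):
        out.extend((a, b))
    return out


def xor_with_keystream(pixels, keystream):
    """XOR each 8-bit pixel value with the matching keystream value."""
    if len(keystream) < len(pixels):
        raise ValueError("keystream is shorter than the pixel data")
    return [p ^ k for p, k in zip(pixels, keystream)]


# One (X, Y) pair yields 6 + 6 = 12 keystream values.
s_x = six_values(5.5)
s_y = six_values(3.25)
keystream = riffle(s_x, s_y)
cipher = xor_with_keystream([10, 20, 30], keystream)
```

Since XOR is its own inverse, applying `xor_with_keystream` to the cipher values with the same keystream returns the scrambled pixel values, which is the property Section 3.6 relies on.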

3.4 Hiding encrypted image in audio data

To hide the encrypted image in audio data, the following steps are performed:

  1. Import the audio file (.wav).

  2. Extract the number of audio channels (ac), the sample rate (as) and the audio data (values ranging from -1 to 1) from the imported audio.

  3. Combine all the channels (if there is more than one) and store them as a single list. Apply the Lifting Wavelet Transform (LWT) using the Cohen-Daubechies-Feauveau (CDF) wavelet for 2 levels (0, 1, 00, 01) of refinement.

  4. Generate three random integers (r1, r2, r3) between 1 and the length of the respective coefficient list at the 1, 00 and 01 refinement levels of the LWT, and share (r1, r2, r3) secretly with the receiver.

  5. The pixel values of the encrypted image Eimg are divided into 4 parts (P1, P2, P3, P4), where each pixel value is represented as a three-digit string, left-padded with zeros where necessary (for instance, a pixel value of 20 is written as 020). P1 and P2 are concatenated to form P12.

  6. The values in P12, P3 and P4 replace the fifth to seventh fractional digits of the coefficients, starting at positions (r1, r2, r3) of the 1, 00 and 01 levels of the LWT respectively, wrapping around if needed.

  7. The LWT coefficients at level 0 are combined with the coefficients carrying the cipher data from Step 6, and the inverse LWT is applied.

  8. The data from Step 7 is partitioned according to the number of audio channels (ac) and written as audio (.wav) using the same sample rate (as).
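The digit replacement of Step 6, and the matching read-out used during extraction, can be sketched for a single coefficient as below. This is a simplified illustration under stated assumptions: digits beyond the seventh fractional place are truncated so that the stored digits survive the conversion back to a float, the sign of the coefficient is preserved, and the function names are hypothetical. The wavelet transform itself and the wrap-around indexing are not reproduced.

```python
from decimal import Decimal


def embed_pixel(coefficient: float, pixel: int) -> float:
    """Write a pixel value (0..255) as three digits into fractional
    digits 5-7 of a coefficient, truncating anything beyond digit 7."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be in 0..255")
    sign = -1 if coefficient < 0 else 1
    # Work on the decimal text of the float to avoid binary rounding noise.
    scaled = int(abs(Decimal(repr(coefficient))) * 10 ** 7)
    scaled = scaled - scaled % 1000 + pixel
    return sign * scaled / 10 ** 7


def extract_pixel(coefficient: float) -> int:
    """Read fractional digits 5-7 of a coefficient back as an integer."""
    scaled = int(abs(Decimal(repr(coefficient))) * 10 ** 7)
    return scaled % 1000


stego = embed_pixel(0.123456789, 20)   # digits 5-7 become "020"
recovered = extract_pixel(stego)
```

Because only the fifth to seventh fractional digits change, each coefficient moves by less than 0.0001, which matches the order of magnitude of the cover/stego difference reported in Section 4.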

3.5 Extracting encrypted image information

The following operations are performed to extract the encrypted image information from the stego-audio.

  1. Import the stego-audio file.

  2. Extract the audio data.

  3. Apply the Lifting Wavelet Transform (LWT) using the Cohen-Daubechies-Feauveau wavelet for 2 levels (0, 1, 00, 01) of refinement.

  4. Obtain the values of (r1, r2, r3).

  5. Extract the values of P12, P3 and P4 from the 1, 00 and 01 levels of the LWT (Step 3), starting at positions (r1, r2, r3) respectively, wrapping around if necessary.

  6. The encrypted image Eimg is obtained by combining P12, P3 and P4.

Block diagram for enciphering and concealing in the proposed method is shown in Fig. 2a.

Fig. 2

Proposed method block diagram for (a) enciphering and concealing the secret image in audio data. (b) deciphering and revealing the secret image from the audio carrying secret image data

3.6 Image decryption

To generate the deciphered image from the encrypted image (Eimg) the following steps are performed:

  1. The Ikeda map is run for a loop of (M × N × 3)/12 iterations to generate the X and Y sequences using the initial parameters derived from \(hG_{y}\).

  2. Each value in X and Y is used to generate SX and SY as given in Step 2 of Section 3.3.

  3. The chaotic image Cimg is generated using the same process as given in Step 3 of Section 3.3.

  4. The chaotic image Cimg is XORed with the encrypted image Eimg to generate the deciphered scrambled image Simg.

3.7 Image descrambling

To generate a descrambled image from the scrambled image Simg the following steps are performed:

  1. Import the scrambled image Simg.

  2. Generate the X and Y sequences using the initial parameters derived from the shared elliptic curve coordinate \(hG_{x}\).

  3. The X and Y sequences are each partitioned into three parts, (XRed, XGreen, XBlue) and (YRed, YGreen, YBlue), one for each colour channel.

  4. Each list is sorted and stored as (XRSort, XGSort, XBSort) and (YRSort, YGSort, YBSort).

  5. Using the same process given in Step 5 of Section 3.2, the permutation tables (PXRed, PXGreen, PXBlue, PYRed, PYGreen, PYBlue) are generated.

  6. The inverse permutation tables (IPXRed, IPXGreen, IPXBlue) are generated by determining the position of each value from 1 to M in (PXRed, PXGreen, PXBlue). Similarly, the inverse permutation tables (IPYRed, IPYGreen, IPYBlue) are generated by determining the position of each value from 1 to N in (PYRed, PYGreen, PYBlue).

  7. The scrambled image Simg is separated into the corresponding colour channels (SIR, SIG, SIB).

  8. Each of SIR, SIG and SIB is partitioned vertically into M parts and vertically descrambled using the inverse permutation tables IPXRed, IPXGreen and IPXBlue, respectively, to generate the vertically descrambled RGB images.

  9. The vertically descrambled RGB images are partitioned horizontally into N parts and horizontally descrambled using the inverse permutation tables IPYRed, IPYGreen and IPYBlue, respectively, to generate the horizontally descrambled images.

  10. The output RGB images of Step 9 are combined to form the required descrambled image (Iimg).
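The inverse permutation of Step 6 can be sketched as below. This is a minimal sketch that uses 0-based positions instead of the 1 to M numbering above and assumes that scrambling moves part i to position P[i]; the helper names are hypothetical.

```python
def apply_permutation(parts, perm):
    """Move parts[i] to position perm[i]."""
    out = [None] * len(parts)
    for i, part in enumerate(parts):
        out[perm[i]] = part
    return out


def inverse_permutation(perm):
    """Position of each index 0..len(perm)-1 inside perm."""
    inverse = [0] * len(perm)
    for position, value in enumerate(perm):
        inverse[value] = position
    return inverse


perm = [2, 0, 3, 1]
rows = ["a", "b", "c", "d"]
scrambled_rows = apply_permutation(rows, perm)
restored_rows = apply_permutation(scrambled_rows, inverse_permutation(perm))
```

Applying the inverse table with the same routine that performed the scrambling restores the original order, so the descrambling path needs no additional logic.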

Block diagram for deciphering and revealing the secret image in the proposed method is shown in Fig. 2b.

4 Simulation and analysis of the proposed scheme

The proposed algorithm is simulated using Wolfram Mathematica 12.3 on a Fujitsu Celsius workstation with an Intel(R) Xeon(R) W-2133 CPU @ 3.60 GHz and 32 GB RAM. The sample images are taken from the USC-SIPI Image Database [24], and the audio files are taken from BBC Sound Effects [4]. The ECC technique uses the Brainpool [16] elliptic curve parameters for key exchange. Figure (3a) shows the plain image used as input. The input image is vertically and horizontally scrambled using the image scrambling technique given in Subsection 3.2, as shown in Fig. (3b) and (c) respectively. Figure (3d) shows the cipher image generated using the proposed method. The cover audio and the stego-audio are shown as audio plots in Fig. (3e) and (f) respectively, and the corresponding spectrograms are shown in Fig. (3g) and (h). Figure (3i-l) shows the recovered cipher image, deciphered descrambled image, horizontally descrambled and vertically descrambled image, respectively. The absolute difference between the input image and the deciphered descrambled image is calculated and depicted as an image in Fig. (3m). The darker the image, the smaller the difference. Figure (3n) depicts the absolute difference in audio magnitude between the cover audio and the stego-audio. The smaller the difference, the closer the amplitude is to zero. The difference between the cover audio and the stego-audio is minimal and lies between −0.0001 and +0.0001. The encrypted image data is concealed and distributed evenly across the audio data. The PSNR and SSIM values for the cover audio and stego-audio are tabulated in Table 1.

Fig. 3

(a) Sample image. (b) Vertically scrambled image of Fig. (3a). (c) Horizontally scrambled image of Fig. (3b). (d) Encrypted image. (e) Cover audio. (f) Stego audio. (g) Spectrogram of cover audio. (h) Spectrogram of stego audio. (i) Recovered cipher image from stego audio. (j) Deciphered descrambled image. (k) Horizontally descrambled image. (l) Vertically descrambled image. (m) Image difference between Fig. (3a) and (l). (n) Audio difference between Fig. (3e) and (f)

Table 1 PSNR and SSIM of the stego-audio

The PSNR and SSIM values of the stego-audio show that the stego-audio is very close to the cover audio, indicating high imperceptibility. The PSNR and SSIM values for the stego-audio, the cipher image and the decrypted image are tabulated in Table 1. The PSNR and SSIM values of the decrypted image indicate that the original input image and the deciphered image are the same.

4.1 Embedding capacity

Embedding capacity is computed as:

$$ \begin{array}{@{}rcl@{}} \text{Embedding capacity}=\frac{\text{Size of secret data}}{\text{Size of cover data}} \times 100 \% \end{array} $$
(3)

The proposed method uses 2200 KB of audio data to hide a colour image of size 768 KB. The embedding capacity is therefore 768/2200 × 100% ≈ 34.9%.
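The arithmetic of Eq. (3) for the sizes quoted above can be checked with a short sketch; the function name is hypothetical, and both sizes are assumed to be expressed in the same unit.

```python
def embedding_capacity(secret_size: float, cover_size: float) -> float:
    """Embedding capacity in percent, as defined in Eq. (3)."""
    return secret_size / cover_size * 100.0


# 768 KB colour image hidden in 2200 KB of audio data.
capacity = embedding_capacity(768, 2200)
```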

4.2 Noise attack

In a noise attack, random noise is introduced into the stego-audio. To determine the robustness of the proposed algorithm against noise attacks, noise with magnitude in the range (−1,1) is added to the stego-audio for a certain percentage of its duration. Table 2 shows the PSNR, SSIM and BER values for the decrypted images under random noise attack (Fig. 4).

Table 2 PSNR, SSIM and BER under noise attack and random cropping attack
Fig. 4

(a-c) Stego-audio under 12.5%, 25% and 50% noise attack respectively. (d-f) Deciphered images recovered from the noise-attacked stego-audio respectively

4.3 Random cropping attack

In a random cropping attack, certain parts of the audio signal are replaced by zeros. Cropping attacks of specific durations are applied to the stego-audio to determine the robustness of the proposed algorithm. The random cropping attacks of 12.5%, 25% and 50% on the stego-audio are shown in Fig. (5a-c). The respective deciphered plain images are shown in Fig. (5d-f).

Fig. 5

(a-c) Random audio crop attack for (12.5%,25%,50%) on the stego audio respectively. (d-f) Deciphered images from stego audio under audio trim attack for (12.5%,25%,50%) respectively

The PSNR, SSIM and BER values for the deciphered images under random cropping attack are tabulated in Table 2. The values show that the proposed method is robust against random cropping attacks and the generated deciphered images are visually perceivable.

4.4 Correlation analysis

The correlation coefficient shows how strongly two variables are related. Images are made up of pixel values arranged as a 2D matrix. The correlation coefficient analysis in cipher image helps assess the encryption algorithm’s statistical characteristic. The correlation coefficient is computed as:

$$ \begin{array}{@{}rcl@{}} \text{Correlation} = \frac{{\sum}_{i=1}^{n}(\chi_{i}-\overline{\chi})(\psi_{i}-\overline{\psi})} {\sqrt{{\sum}_{i=1}^{n}(\chi_{i}-\overline{\chi})^{2}} \sqrt{{\sum}_{i=1}^{n}(\psi_{i}-\overline{\psi})^{2}}} \end{array} $$
(4)

where n is the number of pixels under consideration. Ten thousand points are randomly selected to calculate the correlation coefficient in the horizontal, vertical and diagonal (HVD) directions. The correlation coefficient ranges from -1 to 1, where values toward -1 indicate anti-correlation, values toward 0 indicate no correlation and values toward 1 indicate high correlation. Standard images are usually highly correlated, indicated by values tending toward 1, whereas cipher images are weakly correlated, indicated by values tending toward 0, as shown in Table 3.
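A sketch of how Eq. (4) can be evaluated on randomly sampled pairs of adjacent pixels is given below. It is illustrative only: the sampling routine, the seed and the definition of the diagonal neighbour as the pixel one row down and one column right are assumptions, and a constant sample would make the denominator zero.

```python
import math
import random


def correlation(xs, ys):
    """Correlation coefficient of two equal-length samples, as in Eq. (4)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator


def sample_adjacent_pairs(image, direction, count, seed=0):
    """Sample `count` random pixel pairs that are neighbours in the
    horizontal ("H"), vertical ("V") or diagonal ("D") direction."""
    d_row, d_col = {"H": (0, 1), "V": (1, 0), "D": (1, 1)}[direction]
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    xs, ys = [], []
    for _ in range(count):
        r = rng.randrange(rows - d_row)
        c = rng.randrange(cols - d_col)
        xs.append(image[r][c])
        ys.append(image[r + d_row][c + d_col])
    return xs, ys


# Smooth toy "image": neighbouring pixels differ by a constant.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
xs, ys = sample_adjacent_pairs(image, "H", 100)
horizontal_correlation = correlation(xs, ys)
```

On a smooth image such as this toy gradient the coefficient is close to 1, whereas for a well-encrypted image the same computation is expected to give values near 0.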

Table 3 Correlation coefficients for plain and cipher images

Graphical plots of the correlation along HVD directions for plain and cipher images are shown in Fig. 6. The correlation graph is concentrated for plain images, indicating a higher correlation, while the correlation graph is dispersed for cipher images, indicating a lower correlation.

Fig. 6

HVD correlation graph for (a-c) Plain image. (d-f) Cipher image

4.5 Attack analysis

A cipher-text-only attack is an attack in which the adversary has access only to the cipher-text. In the proposed model, an adversary who has access to the cipher-text cannot decipher the plain-text without the correct key. Determining the key would require solving the ECDLP, a computationally hard problem, and a brute-force attack is impractical. A known-plaintext attack is an attack in which the adversary has access to a cipher-text and the corresponding plain-text and tries to decipher a new cipher-text. In the proposed method, the secret keys are generated from the SHA3-384 hash value of the input image, so every different image has a different set of secret keys. The proposed method is therefore safe from cipher-text-only and known-plaintext attacks.

4.6 Histogram analysis

A histogram plot of an image depicts the frequency of each pixel value. A standard image usually has specific intensities clustered together to convey meaningful information, so the histogram of a plain image shows an uneven distribution of pixel frequencies, as seen in Fig. 7a. A good cipher image conceals any meaningful information, so its histogram shows a uniform distribution of pixel frequencies, as seen in Fig. 7b.

Fig. 7

Histogram plot of (a) Plain Lena image. (b) Cipher Lena image

4.6.1 Variance

The variance of the histogram provides a mathematical confirmation of the histogram distribution. The lower the variance, the more uniform the histogram; the higher the variance, the more uneven the histogram. The variance is given by:

$$ \begin{array}{@{}rcl@{}} Variance=\frac{1}{N^{2}} \sum\limits_{i=0}^{N-1} \sum\limits_{j=0}^{N-1} \frac{1}{2} (h_{i}-h_{j})^{2} \end{array} $$
(5)

where N = 256, and \(h_{i}\) and \(h_{j}\) are the numbers of pixels with grey values i and j, respectively. The variances for the plain and cipher images are tabulated in Table 4.

4.6.2 Maximum deviation

The maximum deviation (MD) [5] is computed as:

$$ \begin{array}{@{}rcl@{}} MD=\frac{h_{0}+h_{255}}{2}+\sum\limits_{i=1}^{254}h_{i} \end{array} $$
(6)

where \(h_{i}\) is the absolute difference between the counts of pixel value i in the cipher and the plain image. MD measures the deviation of the cipher image from the plain image. The higher the deviation, the better the encryption algorithm. The maximum deviation of the proposed method is tabulated in Table 4.

Table 4 Variance, maximum deviation and irregular deviation for plain and cipher images

4.6.3 Irregular deviation

The irregular deviation (ID) [5] is computed as:

$$ \begin{array}{@{}rcl@{}} ID=\sum\limits_{i=0}^{255}(\lvert H_{i}-A_{h} \rvert) \end{array} $$
(7)

where \(H_{i}\) is the absolute difference between the counts of pixel value i in the cipher and the plain image, and \(A_{h}\) is the mean of these histogram values. ID measures how close the statistical distribution of the histogram deviation is to the uniform distribution. A smaller ID indicates a better encryption algorithm. The irregular deviation for the proposed method is tabulated in Table 4.
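The three histogram measures of this section (variance, maximum deviation and irregular deviation) can be sketched together as below. This is a minimal sketch with hypothetical function names; the interior sum of MD is taken over bins 1 to 254, the usual trapezoidal form, so that the two end bins are counted once at half weight.

```python
def histogram(pixels, levels=256):
    """Count how often each grey value occurs."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def histogram_variance(hist):
    """Eq. (5): mean of half the squared pairwise bin differences."""
    n = len(hist)
    total = sum(0.5 * (hi - hj) ** 2 for hi in hist for hj in hist)
    return total / n ** 2


def maximum_deviation(plain_hist, cipher_hist):
    """Eq. (6) on the absolute difference of the two histograms."""
    h = [abs(c - p) for p, c in zip(plain_hist, cipher_hist)]
    return (h[0] + h[-1]) / 2 + sum(h[1:-1])


def irregular_deviation(plain_hist, cipher_hist):
    """Eq. (7): total absolute deviation of the difference histogram
    from its own mean."""
    h = [abs(c - p) for p, c in zip(plain_hist, cipher_hist)]
    mean = sum(h) / len(h)
    return sum(abs(v - mean) for v in h)


# Toy histograms: two pixels move from bin 1 to bin 0.
plain_hist = [2] * 256
cipher_hist = [4, 0] + [2] * 254
md = maximum_deviation(plain_hist, cipher_hist)
ir = irregular_deviation(plain_hist, cipher_hist)
```

The double sum in `histogram_variance` is algebraically equal to the population variance of the bin counts, so a perfectly flat histogram gives 0.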

4.7 Key space and key sensitivity analysis

The proposed algorithm uses the 384-bit Brainpool parameters. The sender and receiver use a 384-bit private key to share the elliptic curve coordinate hG. Recovering the private key from a public key in ECC requires solving the elliptic curve discrete logarithm problem (ECDLP). A brute-force search requires CO computations, where CO is the cyclic order of the generator G for the given 384-bit Brainpool parameters. The best-known techniques, such as the Baby-step Giant-step method or Pollard's rho method, require \(\sqrt {CO}\) steps to solve the ECDLP. \(\sqrt {CO}=4.65395 \times 10^{57}\) steps are large enough to defeat the attacker. The proposed algorithm is also highly sensitive to the key: a single-bit change in the key drastically changes the output data. Figure (8a-b) shows the image deciphered using the correct key and the image deciphered using a key that differs from the original key by just one bit.

Fig. 8

Decrypted using (a) Correct key \(n_{receiver}\). (b) Incorrect key \(n_{receiver} - 1\)

4.8 Differential attack

Cryptanalysis using a differential attack tries to find non-random behavior in the cipher data generated from two minimally different plain inputs. To test the robustness of the proposed encryption scheme against differential attacks, two input images that differ by just one bit are considered. The Number of Pixels Change Rate (NPCR) and Unified Average Changing Intensity (UACI) values of the corresponding cipher images are analyzed. Wu et al. [27] proposed that the NPCR of a cipher image should be greater than the threshold \(N_{\alpha }^{*}\), where α is the significance level, and that the UACI should lie in the interval between \(U_{\alpha }^{*-}\) and \(U_{\alpha }^{*+}\). For an image of dimension 512 × 512, \(N_{\alpha }^{*}\) is 99.5717%, 99.5810% and 99.5893%, and the interval from \(U_{\alpha }^{*-}\) to \(U_{\alpha }^{*+}\) is 33.3115% − 33.6156%, 33.3445% − 33.5826% and 33.3730% − 33.5541%, for significance levels 0.001, 0.01 and 0.05 respectively. Figure (9a) and (b) show the two images that differ by just one bit, and the corresponding cipher images are given in Fig. 9c and d. Figure (9e) shows the image generated from the absolute pixel difference between the two cipher images given in Fig. 9c and d.

Fig. 9

(a) Original image. (b) One pixel value change from Fig. (9a) at location (255,255). (c) Encrypted image of Fig. (9a). (d) Encrypted image of Fig. (9b). (e) Image difference between Fig. (9c) and (d)

The NPCR and UACI values are computed and tabulated in Table 5. The proposed encryption scheme passes the NPCR and UACI threshold criteria \(N_{\alpha }^{*}\), \(U_{\alpha }^{*-}\) and \(U_{\alpha }^{*+}\).
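For reference, NPCR and UACI are commonly computed from two cipher images of equal size as the percentage of differing pixel positions and the mean absolute intensity difference normalized by 255, respectively. The sketch below uses these standard definitions on flat lists of 8-bit pixel values; the function names are hypothetical and the code is not taken from the simulation.

```python
def npcr(cipher_a, cipher_b):
    """Percentage of positions at which the two cipher images differ."""
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


def uaci(cipher_a, cipher_b, max_value=255):
    """Mean absolute intensity difference as a percentage of max_value."""
    total = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * total / (max_value * len(cipher_a))


# Toy example: half of the pixels differ by the full intensity range.
cipher_a = [0, 0, 0, 0]
cipher_b = [0, 255, 0, 255]
npcr_value = npcr(cipher_a, cipher_b)
uaci_value = uaci(cipher_a, cipher_b)
```

For two independent, uniformly random 8-bit images the expected values are roughly 99.61% for NPCR and 33.46% for UACI, which is why the acceptance thresholds quoted above cluster around those figures.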

Table 5 Entropy, NPCR, UACI and avalanche effect

4.9 Avalanche effect

The avalanche effect is a desirable property of a cryptographic algorithm whereby a slight change in the key or the plain image drastically changes the cipher image. The avalanche values computed for a single bit flipped in the input image and for a single bit flipped in the input key are tabulated in Table 5. The avalanche values indicate that a single-bit flip in the input image or the key changes about 50% of the cipher data.

4.10 Entropy

The entropy of an image measures the randomness of the distribution of its pixel values. Entropy is given by:

$$ \begin{array}{@{}rcl@{}} \text{Entropy} = \sum\limits_{i=0}^{2^{n}-1}P(p_{i})\log_{2}\frac{1}{ P(p_{i})} \end{array} $$
(8)

where n is the number of bits used to represent a pixel and \(P(p_{i})\) denotes the probability of the pixel value \(p_{i}\). For an image with pixels represented using 8 bits, the theoretical maximum entropy value is \(\log_{2}2^{8}=8\). The higher the entropy value, the more uniformly the pixel values are distributed, so a cipher image should have an entropy close to 8, whereas plain images typically have lower values. The entropy values for the plain and cipher images are tabulated in Table 5.
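The entropy computation can be sketched as follows, assuming an 8-bit grayscale image held in a NumPy array. The function name `shannon_entropy` is hypothetical. Pixel values that never occur are skipped, since their contribution to the sum is zero.

```python
import numpy as np


def shannon_entropy(image, bits=8):
    """Shannon entropy, in bits per pixel, of an image with `bits`-bit pixel values."""
    values = np.asarray(image).ravel()
    # Histogram over all 2**bits possible pixel values.
    counts = np.bincount(values, minlength=2 ** bits)
    # Keep only the values that occur; the others contribute nothing.
    probs = counts[counts > 0] / values.size
    return float(np.sum(probs * np.log2(1.0 / probs)))
```

A perfectly uniform 8-bit histogram gives the maximum of 8, and a constant image gives 0.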

4.11 Randomness test

Rukhin et al. [20] developed a standard statistical test suite (NIST SP 800-22 revision 1a) that can be used to test the randomness of the cipher data. Each test computes a p-value. A p-value greater than 0.01 suggests that the cipher data is random with a confidence of 99%. The fifteen tests are applied to each cipher image of Peppers, Lena, Splash and Baboon, and the p-values are tabulated in Table 6. The p-values indicate that the encryption algorithm passes the randomness tests with a confidence of 99%.
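To illustrate how a p-value is obtained, the sketch below implements only the first of the fifteen tests, the frequency (monobit) test, following its description in NIST SP 800-22. The function name `monobit_p_value` is hypothetical; the results in Table 6 come from the full suite, not from this fragment.

```python
import math


def monobit_p_value(bits):
    """p-value of the NIST SP 800-22 frequency (monobit) test for a sequence of 0/1 bits."""
    n = len(bits)
    if n == 0:
        raise ValueError("bit sequence must not be empty")
    # Map 1 -> +1 and 0 -> -1, then sum; a random sequence keeps the sum near 0.
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Example bit string used to illustrate the test in the NIST document.
example = [int(ch) for ch in "1011010101"]
p_value = monobit_p_value(example)
print("passes" if p_value >= 0.01 else "fails")
```

For a cipher image, the bit sequence would be formed from the binary representation of the cipher pixels.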

Table 6 Randomness test

4.12 Complexity of the proposed method

The proposed method consists of:

  1. Computing the hash value (SHA3-384) of the input image: O(n), where n is the number of bits in the input image.
  2. Computation of elliptic curve point multiplication: \(O(\log_{2}h)\), where h is the hash value of the input image.
  3. Scrambling of the input image: \(O(N^{2})\), where N is the dimension of the image.
  4. Chaotic sequence generation: O(p), where p is the number of pixels in the input image.
  5. Computing LWT: \(O(M\log M)\), where M is the number of audio samples.
  6. Concealing operation: O(p), where p is the number of pixels in the input image.

Overall, the complexity of the proposed method is \(O(n) + O(\log_{2}h) + O(N^{2}) + 2O(p) + O(M\log M)\).

4.13 Comparison and discussion

The performance of the proposed method is compared with other data hiding techniques in Table 7. The PSNR value of the embedded data for the proposed method is better than that of the existing techniques. The SSIM value of the embedded data for the proposed method is close to 1, where an SSIM of 1 indicates exact similarity with the original data. The embedding capacity of the proposed method is also better than that of the other existing methods. The compared data hiding techniques are discussed below:

  1. In Ref. [30], the plain text is secured using the Vigenere cipher, and the payload is increased using the Huffman lossless compression technique. However, the Huffman dictionary table must be shared between the communicating parties for each secret message.
  2. In Ref. [10], the confidential information of the nuclear reactor is hidden in the middle band DCT coefficients by replacing the LSBs. The technique possesses imperceptibility but has a low embedding capacity.
  3. In Ref. [3], the confidential data is hidden in the perceptually insignificant region of the cover image, which is determined by deploying the Differential Evolution optimization algorithm. The technique possesses imperceptibility and robustness but lacks embedding capacity.
  4. In Ref. [23], the scrambled secret image is hidden in the DWT coefficients. The technique has a high embedding capacity; however, the scrambled secret image is obtained through Arnold’s transformation, a permutation-only technique with finite cyclic order.
  5. In Ref. [25], the secret image is enciphered using a modified Logistic map and embedded in the cover image using IWT. The modified Logistic map has a larger key space, thereby increasing the security. The method possesses imperceptibility and high embedding capacity.
  6. In Ref. [11], the technique is a blind steganography method that uses both DCT and DWT to hide the secret image. Arnold’s transformation is used to scramble the pixels to increase security. The technique possesses strong imperceptibility and robustness but has a moderate embedding capacity. The security can be improved because Arnold’s transformation is a permutation-only technique with finite cyclic order.
  7. In Ref. [1], a substitution box (S-box) is developed using quantum walks. The secret data is expanded to the size of the cover image and hidden in the cover image based on the S-box entries. The technique possesses strong imperceptibility and embedding capacity but lacks robustness.
  8. In Ref. [32], the secret image is embedded in the edges obtained from 3 × 3 non-overlapping blocks of the cover image. To avoid truncation errors, IWT is deployed. The problem of overflow or underflow in embedding for pixels with values close to 0 or 255 is also handled. The technique possesses strong imperceptibility and shows robustness to some common attacks but has a low embedding capacity.
  9. In Ref. [26], the secret data is ciphered using RSA and compressed using Huffman coding, followed by embedding in the DWT LH, HL, and HH sub-bands. The method has moderate imperceptibility and embedding capacity. Choosing the primes of the RSA is a concern in this method. Smaller primes generate smaller cipher values, which improves the payload but leaves the scheme vulnerable to integer factorization attacks. Bigger primes are more secure, but the payload decreases.
  10. In Ref. [2], the plain text is secured through the elliptic curve encryption technique. However, the chosen elliptic curve parameter has a small cyclic order, and the technique uses a static table for each character.
  11. In Ref. [29], the secret image is stored in the strong edges detected using SEDA, which helps in maintaining robustness and imperceptibility, but the technique has a low embedding capacity.
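The remark on the finite cyclic order of Arnold’s transformation can be illustrated with a short sketch. It assumes the common form of the map, (x, y) → (x + y, x + 2y) mod N, on an N × N image; the function name `arnold_period` is hypothetical, and the compared methods may use a parameterised variant. The function returns the number of iterations after which every pixel is back at its original position, so repeating the transformation that many times undoes the scrambling.

```python
def arnold_period(n):
    """Smallest k >= 1 such that k Arnold iterations restore an n x n image."""
    if n < 2:
        raise ValueError("image dimension must be at least 2")
    # The map is the matrix [[1, 1], [1, 2]] acting on pixel coordinates mod n.
    base = (1 % n, 1 % n, 1 % n, 2 % n)
    identity = (1, 0, 0, 1)
    current = base
    k = 1
    # The matrix has determinant 1, so it is invertible mod n and some
    # power of it must return to the identity.
    while current != identity:
        a, b, c, d = current
        e, f, g, h = base
        current = (
            (a * e + b * g) % n,
            (a * f + b * h) % n,
            (c * e + d * g) % n,
            (c * f + d * h) % n,
        )
        k += 1
    return k


for size in (64, 128, 256, 512):
    print(size, arnold_period(size))
```

Because the period depends only on the image dimension, an attacker who knows that this map is used can recover the unscrambled image by simply continuing the iteration, which is why a permutation-only scrambling step offers limited security on its own.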

Table 7 Comparison with existing methods of data hiding

5 Conclusion

Images are among the most frequently transmitted digital data, and some of these images are confidential. Various authors have proposed image encryption methods that maintain confidentiality by converting the plain image into an unintelligible, noise-like image. From an attacker’s perspective, however, such unintelligible noisy images indicate that something important is being transmitted, and many such encryption schemes have been cryptanalysed. Concealing the noise-like encrypted data helps to deter cryptanalysis, as nothing clearly indicates that critical data is being transmitted. A method is proposed to safeguard confidential image transmission by first converting the plain image into a cipher image through scrambling and encryption operations using the Ikeda map. The cipher image is then concealed in the LWT-transformed audio data. Various statistical analyses are carried out to show that the technique blends the cipher image into the audio data with high embedding capacity and without affecting the quality of the audio. The encryption algorithm passes the statistical and security analyses. The proposed technique shows robustness to noise attacks and random cropping attacks. Among the compared techniques given in Table 7, the proposed method has the highest embedding capacity and better imperceptibility, with PSNR and SSIM values of 86.33 and 0.99, respectively. The code of the proposed method can be obtained from https://github.com/Dolendro/Securing-encrypted-image-information-in-audio-data on demand. As future work, concealing secret data in video using natural noise [12] for key generation can be researched.