1 Introduction

Visual Cryptography (VC) is a kind of secret sharing scheme, first proposed by Naor et al. [18], which allows secret information (mainly images) to be decoded without computation. A k-out-of-n visual secret sharing (VSS) scheme is a special category of VC in which the secret information is encoded into n meaningless shares that are printed onto transparencies. These n transparencies are distributed among n participants. The secret image can be decrypted only by superimposing any k or more transparencies, whereas k − 1 or fewer transparencies cannot decode the secret image, even with infinite computation power. Secret sharing is not the only application of VC; others include access control, identification [17], copyright protection [6], watermarking and visual authentication. The modus operandi of VSS is illustrated in Fig. 1, which shows a 2-out-of-2 VSS scheme (k = 2, n = 2). Here a binary image is treated as the secret information, and each pixel p of the secret binary image is encoded into a pair of black and white subpixels in each of the two random shares.

Fig. 1

2-out-of-2 VSS, where a secret pixel is encoded into two subpixels in each of the two shares

If p is white (black), one of the first (last) two columns tabulated under the white (black) pixel in Fig. 1 is selected at random, each with probability 50%. The first two subpixels in that column are allotted to share 1 and the remaining two subpixels are allotted to share 2. Regardless of whether p is black or white, the pixel is encoded in each share into two subpixels, black-white or white-black, with equal probability.

Thus an individual share carries no information about whether p is black or white. The last row of Fig. 1 shows the superimposition of the two shares. If the pixel p is black, the superimposition yields two black subpixels, corresponding to a gray level of 1. If p is white, the superimposition yields one white and one black subpixel, corresponding to a gray level of 1/2. Hence, by stacking the two shares together, we obtain approximate visual information about the secret image. Figure 2 shows a visual example of the 2-out-of-2 VSS scheme. Figure 2(a) shows a secret binary image \( I_{sec} \) which is to be encrypted. As per the encoding scheme shown in Fig. 1, each binary pixel p of \( I_{sec} \) is divided into two subpixels in each share, as shown in Fig. 2(b) and (c). Stacking the two shares leads to the output image shown in Fig. 2(d). The recovered image is decoded without any cryptographic computation. Some contrast loss can be noticed in the decoded image, and the width of the reconstructed image is twice that of the original secret image.

In this paper, the proposed approach uses the concepts of visual cryptography to secure digital audio. Much of the existing literature addresses image security, and our foundation for audio watermarking in a cloud environment comes from image-based security approaches in the cloud. Zhihua Xia et al. suggested a scheme in [24] that supports content-based image retrieval (CBIR) over encrypted images without leaking sensitive information to the cloud server. Zhihua Xia et al. also proposed a secure multi-keyword ranked search scheme over encrypted cloud data in [23], which simultaneously supports dynamic update operations such as deletion and insertion of documents. Specifically, the vector space model and the widely used TF × IDF model are combined in the index construction and query generation. A novel color image watermarking scheme based on the quaternion Hadamard transform (QHT) and Schur decomposition is presented by Jianzhong Li et al. in [15]. In order to take into account the correlation between different color channels and the significant color information, a new color image processing tool, termed the quaternion Hadamard transform, is also proposed in [15]. A parallel processing approach for images, which improves efficiency more than fivefold in comparison to existing approaches, is presented in [1].

Fig. 2

Example of 2-out-of-2 VSS. The secret image is encoded into two random patterns and the decoded image has 50% contrast
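The 2-out-of-2 encoding and stacking of Figs. 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the paper: the names `encode_pixel`, `stack` and `gray_level` are hypothetical, and the sketch assumes the usual convention that 1 denotes a black subpixel, that a white pixel gives both shares the same pattern, and that a black pixel gives them complementary patterns.

```python
import random

# Assumed subpixel patterns for one share: black-white or white-black (1 = black).
PATTERNS = [(1, 0), (0, 1)]


def encode_pixel(p, rng=random):
    """Encode secret pixel p (0 = white, 1 = black) into two 2-subpixel shares."""
    s1 = rng.choice(PATTERNS)            # share 1 is a fair coin flip in both cases
    if p == 0:
        s2 = s1                          # white: both shares get the same pattern
    else:
        s2 = tuple(1 - b for b in s1)    # black: share 2 gets the complement
    return s1, s2


def stack(s1, s2):
    """Superimposing transparencies behaves like a bitwise OR of black subpixels."""
    return tuple(a | b for a, b in zip(s1, s2))


def gray_level(subpixels):
    """Fraction of black subpixels in a stacked pixel."""
    return sum(subpixels) / len(subpixels)
```

Stacking the two shares of a black pixel always gives gray level 1, and stacking those of a white pixel always gives 1/2, while each share on its own is black-white or white-black with equal probability.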

Digital audio watermarking is an important technique for securing and authenticating audio media. Existing designs are generally classified into time domain and transform domain methods, and the reviewed works on audio security can each be related to audio watermarking, audio steganography or audio encryption [12]. Various state-of-the-art approaches for securing audio signals are present in the literature. Audio signals can be secured by either audio watermarking or basic cryptographic techniques. Hu Hwai et al. [10] proposed a robust, transparent and high-capacity audio watermarking scheme in the DCT domain. Because it operates in the frequency domain, the complexity of this scheme is high. Vivekannad Bhatt et al. [3] proposed an adaptive audio watermarking scheme based on singular value decomposition in the wavelet domain. This scheme is a hybrid of both domains, viz. frequency and spatial. A perceptual-based DWPT-DCT framework for selective blind audio watermarking, motivated by human auditory perception, is proposed in [11]. This scheme provides robustness and imperceptibility with respect to the original and recovered audio signals. Al-nuaimy et al. proposed a different type of audio encryption technique in [2], which is based on chaotic encrypted images and SVD. Pranab Kumar et al. proposed a blind SVD-based audio watermarking scheme using entropy and log-polar transformation in [5]. Here copyright protection of audio signals is achieved using the log-polar transformation (LPT). A different kind of audio watermarking is proposed in [13], where multiple watermarks are embedded into the cover signal in order to check its authenticity. Two novel covert communication techniques, using spreadsheets and a multi-scroll chaotic system respectively, are presented in [14, 16].

Besides the aforementioned audio watermarking approaches, several state-of-the-art audio encryption and data hiding techniques are presented in the literature. Hemalatha S et al. proposed an integer wavelet transform based audio data hiding technique in [9], which provides high embedding capacity and good imperceptibility of the cover signal. A phase coding and LSB based audio steganography technique is presented in [7]. This technique provides both data fragmentation and data encryption. A novel audio steganography technique based on block-based XOR operations on LSBs is presented in [25]. This approach can withstand steganalysis attacks. S Shivani et al. proposed a speech secret sharing approach in [20] which uses the concept of visual cryptography, but this approach provides only confidentiality and does not provide integrity verification. Reference [19] provides an approach for verifiable visual cryptography in which each share is able to authenticate itself against tampering. The approach presented in [19] is designed for secretly transmitting images and cannot transmit audio signals.

To the best of our knowledge, no effective algorithm in the literature on audio security provides all security requirements, such as confidentiality, integrity and authentication, in a single approach. The proposed approach applies the concepts of VC to audio and addresses well-known problems of VC such as the random pattern of the shares, lossy recovery of the secret and explicit generation of the codebook.

The remaining sections of the paper are structured as follows. The proposed approach is described in Section 2. To show the effectiveness of the proposed approach, experimental results and comparisons with various state-of-the-art approaches are discussed in Section 3. The paper is concluded in Section 4.

2 Proposed approach for providing security and privacy to songs using visual cryptography

Many websites, such as iTunes and hungama.com, provide their authentic users with paid access (to listen or download) to songs. Their servers are essentially huge song repositories which must be secured against unauthorized access. Since these repositories store songs without any encryption, they are more vulnerable to attacks. If a repository is compromised by an infiltration attack, the website owner may be held liable for copyright violation. Hence all songs must be kept in the repository in a secure manner so that, in spite of any attack, no one is able to infer the content of the songs. A basic visualization of the repository, authentic users and attacker is shown in Fig. 3. Here we can see that if the attacker breaches the single bottleneck point by any means, he can access all songs without interruption. In this paper we present an efficient approach to avoid this type of scenario. The proposed approach is a new type of audio security scheme which fuses image and audio processing while retaining their basic features. In this approach we apply the visual cryptography secret sharing approach to the audio bit stream. This generates two meaningful image shares in which the audio is embedded. Meaningful shares provide confidentiality to the secret. At the same time, the proposed approach inherits the properties of 2-out-of-2 visual cryptography: at the receiver end one can decrypt the secret audio if and only if one has both meaningful shares; otherwise nothing is decoded. Both shares are self-authenticating, so any intentional or unintentional alteration of the shares can be tracked by the proposed approach.

Most of the previous state-of-the-art approaches to image secret sharing generate meaningless shares. The randomness of such shares increases their vulnerability to cryptanalysis; hence meaningful information, such as a registered trademark or a copyright logo, is added to the shares. Besides these constraints, other limitations such as the predefined codebook are also removed in the proposed approach. A predefined (explicit) codebook causes excessive memory requirements and overhead at both the sender and receiver ends. The relations among song repositories, authentic users and attackers under the proposed approach are visualized in Fig. 4. Here we can see that a song is divided into two visually meaningful shares which are verifiable in nature. The two shares are stored in two different repositories. If an attacker breaches the security of either repository, he obtains only meaningful images instead of songs. If he tampers with the share images then, owing to their verifiability, we can detect that the shares are not authentic.

Fig. 3

Visualization of an insecure repository of songs

Fig. 4

Visualization of a secure repository of songs using the proposed approach

Novelty of the proposed approach:

The proposed audio security approach provides the following features:

  1. Meaningful shares.

  2. Confidentiality of the secret audio file by converting it into images.

  3. Implicit generation of the codebook.

  4. Access control by providing the features of 2-out-of-2 visual cryptography.

  5. Each share is self-authenticating in nature.

The proposed song security approach using visual cryptography is outlined in Fig. 5. It consists of seven steps: creation of basis matrices, creation of secret shares, creation of meaningful shares, creation of self-authenticating shares, tamper detection, audio extraction and, finally, secret recovery. Audio signals are recorded as unsigned eight-bit integers and each sample is converted into eight-bit binary form. The combined binary vector of all samples is denoted \( S_b \). Step 1 takes \( S_b \) as input to generate the basis matrices. Step 2 takes the basis matrices V as input and generates two binary secret vectors \( {S}_b^{e_1} \) and \( {S}_b^{e_2} \). Meaningful shares are generated in step 3 by using two cover images \( C_1 \) and \( C_2 \) (which are displayed on the shares); the outputs of this step are denoted \( C_{share_1} \) and \( C_{share_2} \) respectively. In step 4 these meaningful shares are converted into self-authenticating shares, denoted \( C_{vs_1} \) and \( C_{vs_2} \). Owing to the self-authentication feature, any mishandled or leaked share can be tracked in step 5, which was very difficult in previous state-of-the-art approaches. If the shares are authentic and unaltered, we extract the secret shares of the audio signal in step 6. Finally, the audio signal is regenerated in step 7. Each step is described in detail in the following subsections.

Fig. 5

Flow chart of the proposed self-authenticating audio secret sharing approach

2.1 Basis matrix creation

Let \( W=\{W_0,\dots ,W_{n-1}\} \) be the set of participants, which includes all senders and receivers of the audio signals. According to the visual cryptography scheme, a secret binary bit stream \( S_b \) is encoded into n images, which are called secret shares. Let \( \Gamma_{Qual}\subseteq 2^{W} \) and \( \Gamma_{Forb}\subseteq 2^{W} \), where \( 2^{W} \) is the power set of W and \( \Gamma_{Qual}\cap \Gamma_{Forb}=\varnothing \). The members of \( \Gamma_{Qual} \) are referred to as qualified sets and the members of \( \Gamma_{Forb} \) are referred to as forbidden sets. The pair \( (\Gamma_{Qual},\Gamma_{Forb}) \) is called the access structure of the VSS.

The secret audio signal can be decoded only by a qualified set of participants \( X\in \Gamma_{Qual} \), whereas any set of participants \( Y\in \Gamma_{Forb} \) cannot decode the secret.

figure f

Example-2.1

Let \( (\Gamma_{Qual},\Gamma_{Forb}) \) be an access structure for n participants. The proposed audio secret sharing approach adopts the concept of 2-out-of-2 visual cryptography. For two participants {1, 2}, the access structure is \( \Gamma_{Qual}=\{\{1,2\}\} \) and \( \Gamma_{Forb}=\{\{1\},\{2\}\} \), since the secret information can be obtained only by combining both shares.

In the proposed approach, a pixel expansion m is obtained for each secret audio sample in all n shares. The expansion is represented by an n × m boolean matrix M. Let \( r_i \) (i = 1, 2, …, n) be the i-th row of M, which contains the subpixels for the i-th share. Let \( X=\{i_1,i_2,\dots ,i_s\} \) be a subset of the rows of M which is assigned to s participants. The logical OR of the corresponding rows \( r_{i_k} \) (k = 1, 2, …, s) of M simulates the superimposition of the shares in X. The result of this operation is a row vector \( V= OR\left({r}_{i_1},{r}_{i_2},\dots ,{r}_{i_s}\right) \). The Hamming weight of V, denoted w(V), approximates the gray level of the superimposed pixel p.
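The OR-stacking and Hamming weight just defined can be illustrated with a minimal Python sketch. The function names `stack_rows` and `hamming_weight` are hypothetical, rows are indexed from 0 rather than 1, and the matrix `M` below is only an example.

```python
from functools import reduce


def stack_rows(M, X):
    """Return V = OR(r_i for i in X) for boolean matrix M (rows are 0-based)."""
    rows = [M[i] for i in X]
    # zip(*rows) walks the selected rows column by column.
    return [reduce(lambda a, b: a | b, column) for column in zip(*rows)]


def hamming_weight(V):
    """w(V): number of 1 entries, i.e. black subpixels after stacking."""
    return sum(V)


# Example 2 x 2 matrix whose rows are complementary.
M = [[1, 0],
     [0, 1]]
V_both = stack_rows(M, [0, 1])   # both shares stacked
V_one = stack_rows(M, [0])       # a single share on its own
```

Here `hamming_weight(V_both)` is 2 (fully black), while `hamming_weight(V_one)` is 1.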

Definition-2.1 [26]

Let \( (\Gamma_{Qual},\Gamma_{Forb}) \) be an access structure for n participants. Two n × m boolean matrices \( S^{ij} \) (i, j ∈ {0, 1}) are called basis matrices if the sets \( C^{ij} \), obtained by permuting the columns of \( S^{ij} \) in all possible ways, satisfy the following two conditions.

  1. Contrast condition: If \( X=\{i_1,i_2,\dots ,i_u\}\in \Gamma_{Qual} \), the row vectors \( V_0 \) and \( V_1 \) (for the extreme white and black combinations of bits), obtained by performing the OR operation on rows \( i_1,i_2,\dots ,i_u \) of \( S^{ij} \) respectively, satisfy

$$ w\left({V}_0\right)\le {t}_X-\alpha (m)\times m $$
(1)

and

$$ w\left({V}_1\right)\ge {t}_X $$
(2)
  2. Security condition: Any subset \( X=\{i_1,i_2,\dots ,i_v\}\in \Gamma_{Forb} \) of v participants has no information about the secret signal. The two collections of matrices \( D_j \) (j = 0, 1) of size v × m, formed by extracting rows \( i_1,i_2,\dots ,i_v \) from each matrix in \( C^{ij} \), are indistinguishable.

Here \( t_X \) is the threshold used to interpret the reconstructed sample as black or white, and α(m) is the relative difference, also referred to as the contrast of the decoded signal. They are obtained by

$$ {t}_X=\mathit{\min}\left(w\left({V}_1\left(X,M\right)\right)\right) $$
(3)

where M ∈ C 1

$$ \alpha (m)=\frac{\mathit{\min}\left(w\left({V}_1\left(X,M\right)\right)\right)-\mathit{\max}\left(w\left({V}_0\left(X,M\right)\right)\right)}{m} $$
(4)

The matrix M is randomly selected from C ij for any signal sample.

The proposed Algorithm 1 is used to create two basis matrices V of size n × m, one for binary bit 0 and one for binary bit 1. The two encoding sets can be obtained by permuting the columns of the respective V. First, each signal sample is converted into eight-bit binary form. A basis matrix is then generated for each binary bit of the sample.

Example-2.2

Let n = 2 and let \( V^0 \) and \( V^1 \) be the two basis matrices for the bits 0 and 1 of a signal sample. According to Algorithm 1, \( {V}^0=\left[\begin{array}{cc}1& 0\\ {}1& 0\end{array}\right] \) and \( {V}^1=\left[\begin{array}{cc}0& 1\\ {}1& 0\end{array}\right] \).

One can see that each row of matrix \( V^i \) contains exactly one 1 and one 0 for each binary bit of a sample. Hence it is very difficult to determine the underlying bit from an insufficient number of shares. The different column permutations of the basis matrices are

$$ {\displaystyle \begin{array}{c}{V}^0=\left\{\left[\begin{array}{c}10\\ {}10\end{array}\right],\left[\begin{array}{c}01\\ {}01\end{array}\right]\right\}\\ {}{V}^1=\left\{\left[\begin{array}{c}10\\ {}01\end{array}\right],\left[\begin{array}{c}01\\ {}10\end{array}\right]\right\}\end{array}} $$

Once the basis matrices for the binary bits of \( S_b \) are obtained, we proceed to the next algorithm.
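As a sanity check, the contrast and security conditions of Definition 2.1 can be verified for the 2-out-of-2 basis matrices of Example 2.2. The sketch below is illustrative only; the helper names (`column_permutations`, `or_weight`, `contrast`, `single_share_views`) are hypothetical and are not part of Algorithm 1.

```python
from itertools import permutations

V0 = [[1, 0], [1, 0]]   # basis matrix for bit 0 (Example 2.2)
V1 = [[0, 1], [1, 0]]   # basis matrix for bit 1 (Example 2.2)


def column_permutations(S):
    """All distinct matrices obtained by permuting the columns of S."""
    m = len(S[0])
    found = set()
    for perm in permutations(range(m)):
        found.add(tuple(tuple(row[j] for j in perm) for row in S))
    return sorted(found)


def or_weight(M, rows):
    """Hamming weight of the OR of the selected rows of M."""
    return sum(max(M[i][j] for i in rows) for j in range(len(M[0])))


def contrast(C0, C1, X):
    """Return (t_X, alpha(m)) following Eqs. (3) and (4)."""
    m = len(C0[0][0])
    t_X = min(or_weight(M, X) for M in C1)
    w0_max = max(or_weight(M, X) for M in C0)
    return t_X, (t_X - w0_max) / m


def single_share_views(C, i):
    """Everything participant i can observe across the collection C."""
    return sorted(M[i] for M in C)


C0, C1 = column_permutations(V0), column_permutations(V1)
t_X, alpha = contrast(C0, C1, [0, 1])
secure = all(single_share_views(C0, i) == single_share_views(C1, i) for i in (0, 1))
```

For these matrices the sketch gives t_X = 2 and α(m) = 0.5, and `secure` is true: a single row looks the same whether it was drawn from the collection for bit 0 or for bit 1.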

2.2 Secret share generation

This step takes the generated basis matrices as input. Two random bit streams \( {S}_b^{e_1} \) and \( {S}_b^{e_2} \) are generated by concatenating the elements of the first and second rows of V respectively, as shown in Algorithm 2. The length of each of \( {S}_b^{e_1} \) and \( {S}_b^{e_2} \) is 2 × length(\( S_b \)). These two bit streams are passed to the next step in order to generate the meaningful shares.

figure g
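The share generation step can be sketched as follows. This is a hedged illustration of the idea behind Algorithms 1 and 2 rather than a transcription of them: the names `samples_to_bits` and `generate_secret_shares` are hypothetical, the MSB-first bit order is an assumption, and the column permutation is chosen with Python's `random` module.

```python
import random

# Basis matrices from Example 2.2 (row 0 feeds share 1, row 1 feeds share 2).
BASIS = {
    0: [[1, 0], [1, 0]],
    1: [[0, 1], [1, 0]],
}


def samples_to_bits(samples):
    """Unpack unsigned 8-bit samples into a flat bit list (MSB first, assumed)."""
    return [(s >> k) & 1 for s in samples for k in range(7, -1, -1)]


def generate_secret_shares(bits, rng=random):
    """Build the two bit streams by concatenating rows of a randomly permuted basis matrix."""
    e1, e2 = [], []
    for b in bits:
        r1, r2 = BASIS[b]
        if rng.random() < 0.5:
            # With two columns, the only non-trivial permutation is a swap.
            r1, r2 = r1[::-1], r2[::-1]
        e1.extend(r1)
        e2.extend(r2)
    return e1, e2


S_b = samples_to_bits([0, 255, 165])
S_e1, S_e2 = generate_secret_shares(S_b)
```

Each stream has length 2 × length(S_b), and for every secret bit the two subpixel pairs are identical when the bit is 0 and complementary when it is 1.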

2.3 Creation of meaningful secret shares

Two meaningful binary cover images \( C_1 \) and \( C_2 \) are required as templates to generate two meaningful secret images \( C_{share_1} \) and \( C_{share_2} \). If the size of \( C_1 \) and \( C_2 \) is M × N, then the length of \( {S}_b^{e_1} \) is \( \frac{M\times N}{4} \). According to Algorithm 3, \( C_1 \) and \( C_2 \) are divided into non-overlapping 2 × 2 blocks, and each bit of \( {S}_b^{e_1} \) and \( {S}_b^{e_2} \) is assigned to the respective block of \( C_1 \) and \( C_2 \).
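A minimal sketch of the block-wise embedding is given below. Algorithm 3 is not reproduced in the text, so the exact pixel that carries the share bit is an assumption: the sketch writes it to the bottom-right pixel of each 2 × 2 block (the `pos` parameter), and the function name `embed_share_bits` is hypothetical.

```python
def embed_share_bits(cover, bits, pos=(1, 1)):
    """Write one share bit into position `pos` of every 2x2 block of a binary cover image.

    `cover` is a list of rows of 0/1 pixels; `pos` is an assumed location.
    """
    rows, cols = len(cover), len(cover[0])
    if rows % 2 or cols % 2:
        raise ValueError("cover dimensions must be even")
    if len(bits) != (rows // 2) * (cols // 2):
        raise ValueError("need exactly one bit per 2x2 block, i.e. M*N/4 bits")
    share = [row[:] for row in cover]        # leave the cover image untouched
    stream = iter(bits)
    for r in range(0, rows, 2):              # blocks are visited in raster order
        for c in range(0, cols, 2):
            share[r + pos[0]][c + pos[1]] = next(stream)
    return share


cover = [[0] * 4 for _ in range(4)]          # toy 4x4 cover image with 4 blocks
share = embed_share_bits(cover, [1, 0, 1, 1])
```

A 4 × 4 cover image therefore carries 4 bits, in line with the M × N / 4 length stated above.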

2.4 Creation of self authenticating meaningful secret shares

Shares are the most sensitive objects because they carry the secret information; hence they must be verified as untampered and authentic before decoding. To achieve this objective we create self-authenticating, or verifiable, shares which are able to locate their own tampered regions. Algorithm 4 is used to generate two verifiable meaningful secret shares \( C_{vs_1} \) and \( C_{vs_2} \). A self-embedding technique is used to create a single authentication bit for each 2 × 2 block of \( C_{share_1} \) and \( C_{share_2} \).

Example-2.3

Consider a block \( \left[\begin{array}{cc}1& 1\\ {}0& 1\end{array}\right] \) of \( C_{share_1} \) as input to Algorithm 4, and let the corresponding rows β and columns γ of each bit of the block be \( \left[\begin{array}{ccc} row\backslash col.& 37& 38\\ {}102& 1& 1\\ {}103& 0& 1\end{array}\right] \). According to Algorithm 4, the self-embedding authentication bit is calculated by XORing the secret bits with the fifth MSB of their corresponding row and column values.

figure h

Algorithm 4 is used to generate a single authentication bit \( A_u \) for each block of \( C_{share_1} \) and \( C_{share_2} \).

figure i
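Since the listing of Algorithm 4 is not reproduced here, the following Python sketch shows only one plausible reading of the description above. It is built on several assumptions that are not confirmed by the text: row and column indices are treated as 8-bit values (so the fifth MSB is bit 3), the authentication bit is the XOR over the three remaining pixels of the block of (pixel ⊕ fifth MSB of row ⊕ fifth MSB of column), and it is stored at the top-left pixel (`auth_pos`). All function names are hypothetical.

```python
def fifth_msb(value, width=8):
    """Fifth most significant bit of `value`, read as a `width`-bit number (assumed 8)."""
    return (value >> (width - 5)) & 1


def auth_bit(block, top, left, auth_pos=(0, 0)):
    """Assumed A_u: XOR of (pixel ^ fifth MSB of row ^ fifth MSB of col) over non-auth pixels."""
    a = 0
    for dr in range(2):
        for dc in range(2):
            if (dr, dc) == auth_pos:
                continue                      # the slot that will hold A_u is excluded
            a ^= block[dr][dc] ^ fifth_msb(top + dr) ^ fifth_msb(left + dc)
    return a


def make_verifiable(share, auth_pos=(0, 0)):
    """Embed the authentication bit of every 2x2 block at `auth_pos` (assumed location)."""
    out = [row[:] for row in share]
    for r in range(0, len(out), 2):
        for c in range(0, len(out[0]), 2):
            block = [out[r][c:c + 2], out[r + 1][c:c + 2]]
            out[r + auth_pos[0]][c + auth_pos[1]] = auth_bit(block, r, c, auth_pos)
    return out


# Block of Example 2.3, located at rows 102-103 and columns 37-38.
A_u = auth_bit([[1, 1], [0, 1]], top=102, left=37)
```

Under these assumptions the fifth MSBs of 102, 103, 37 and 38 are all 0, so `A_u` reduces to the XOR of the three non-authentication pixels of the block.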

2.5 Tamper detection

The authentic secret audio signal can be obtained at the receiver end only if the shares have not been tampered with, intentionally or unintentionally, during transmission. Hence, before decoding the secret audio signal, both shares must be checked for alteration. Algorithm 5 is used to identify the tampered pixels of both shares. Here we extract the R-th pixel of each 2 × 2 block of \( C_{vs_1} \) and \( C_{vs_2} \) and recalculate it using Algorithm 4. A bit-wise comparison is then made between the extracted and recalculated bit matrices. If any mismatch is found, that pixel is marked as tampered.

figure j
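The comparison of extracted and recalculated authentication bits can be sketched as below. The sketch reuses the same assumed authentication rule as a plausible reading of Algorithm 4 (XOR of pixel values with bit 3 of their row and column indices, stored at an assumed top-left position), so it illustrates the detection logic rather than the exact Algorithm 5. The names `expected_auth` and `detect_tampering` are hypothetical, and the whole block is flagged on a mismatch because authentication is block based.

```python
def expected_auth(img, r, c, auth_pos=(0, 0)):
    """Recompute the assumed authentication bit of the 2x2 block whose top-left pixel is (r, c)."""
    a = 0
    for dr in range(2):
        for dc in range(2):
            if (dr, dc) != auth_pos:
                a ^= img[r + dr][c + dc] ^ (((r + dr) >> 3) & 1) ^ (((c + dc) >> 3) & 1)
    return a


def detect_tampering(img, auth_pos=(0, 0)):
    """Return a mask with 1 on every pixel of a block whose stored bit mismatches."""
    rows, cols = len(img), len(img[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            stored = img[r + auth_pos[0]][c + auth_pos[1]]
            if stored != expected_auth(img, r, c, auth_pos):
                for dr in range(2):
                    for dc in range(2):
                        mask[r + dr][c + dc] = 1
    return mask


authentic = [[0, 1, 0, 0],
             [0, 1, 1, 1]]
tampered = [row[:] for row in authentic]
tampered[1][3] ^= 1                          # flip one pixel in the second block
```

`detect_tampering(authentic)` is all zeros, while `detect_tampering(tampered)` flags the second block. Note that a single parity-style bit, as assumed here, cannot notice an even number of flips inside one block.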

2.6 Audio extraction

Once we have established that the received meaningful shares have not been altered, intentionally or unintentionally, we extract the secret bits of the audio signal from each meaningful share using Algorithm 6.

figure k

2.7 Secret audio signal recovery

Secret audio signal recovery in the proposed approach requires only a small amount of computation. In this phase, according to Algorithm 7, the secret signal is recovered after the authenticity of both shares \( C_{vs_1} \) and \( C_{vs_2} \) has been verified. To recover a sample, we need both bit streams \( {S}_b^{e_1} \) and \( {S}_b^{e_2} \).

figure l
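Extraction and recovery can be sketched together in Python. This is a hedged illustration, not the listing of Algorithms 6 and 7: the function names are hypothetical, the share bit is assumed to sit at the bottom-right pixel of each block, samples are assumed to be packed MSB first, and recovery relies on the property of the basis matrices in Example 2.2 that stacked subpixel pairs have weight 2 for bit 1 and weight 1 for bit 0.

```python
def extract_share_bits(share, pos=(1, 1)):
    """Read the embedded bit from position `pos` (assumed) of every 2x2 block, in raster order."""
    return [share[r + pos[0]][c + pos[1]]
            for r in range(0, len(share), 2)
            for c in range(0, len(share[0]), 2)]


def recover_bits(e1, e2):
    """Recover one secret bit from each pair of 2-subpixel groups by OR-stacking."""
    bits = []
    for i in range(0, len(e1), 2):
        stacked = [a | b for a, b in zip(e1[i:i + 2], e2[i:i + 2])]
        bits.append(1 if sum(stacked) == 2 else 0)   # weight 2 -> bit 1, weight 1 -> bit 0
    return bits


def bits_to_samples(bits):
    """Pack groups of eight bits (MSB first, assumed) back into unsigned 8-bit samples."""
    return [int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)]
```

Because each secret bit is decided exactly by the weight of its stacked pair, the recovered samples equal the original ones whenever the shares are unaltered, which is the lossless behaviour claimed for the approach.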

\( A_u \) is a kind of fragile watermark produced by the self-embedding technique. A fragile watermark is destroyed by any intentional or unintentional attack on the cover signal [21]. In the proposed approach the cover signal is the set of audio samples. Block-wise authentication is performed to detect a tampered meaningful share. An \( A_u \) bit is generated for each 2 × 2 block using Algorithm 4. If even a single bit of the original is changed, we can detect that the received signal is not authentic.

2.8 Performance analysis

The self-authenticating meaningful shares \( {C}_{vs_1} \) and \( {C}_{vs_2} \) must satisfy the contrast and security conditions. Since we are dealing with binary cover images, the values of the objective evaluation parameters between \( C_k \) and \( {C}_{vs_k} \) must be satisfactory.

Lemma 1

The imperceptibility between \( C_k \) and \( {C}_{vs_k} \) must be satisfactory in terms of objective evaluation parameters.

Two bits of each 2 × 2 block of \( {C}_{share_k} \) are altered in order to create \( {C}_{vs_k} \). One bit is dedicated to the verifiability of the block, whereas the other bit is dedicated to the secret audio signal. The remaining two bits carry the visible information of the cover image. Hence at most 50% of the information differs between the cover image \( C_k \) and \( {C}_{vs_k} \), and the imperceptibility falls within an acceptable range.

Lemma 2

The original audio signal \( S_b \) is recovered exactly if \( {C}_{vs_k} \) is unaltered.

If \( {C}_{vs_k} \) is unaltered, then \( S_b \) can be recovered exactly at the receiver end, because the secret audio bits can be extracted from bit d of each 2 × 2 block of \( {C}_{vs_k} \) and the two extracted shares are then combined to recover the audio.

3 Experimental results and comparisons

Experiments have been performed on various audio signals. All audio signals are taken in unsigned eight-bit integer form so that their samples can be easily converted into eight-bit binary form. The proposed self-authenticating audio secret sharing approach has been verified and illustrated for two shares. All cover images are in halftone format, generated with an existing error diffusion technique, because such images look like gray scale images. Figure 6 gives an overview of the proposed approach. Figure 6(a) shows a graphical representation of the secret audio signal which is to be transmitted securely. Images (b) and (c) are the self-authenticating meaningful shares \( {C}_{vs_1} \) and \( {C}_{vs_2} \) respectively. Image (d) is the recovered signal at the receiver end. We can see that the recovered secret audio signal is the same as the original one, since no tampering has been done on the self-authenticating meaningful shares.

Fig. 6

a Original audio signal \( S_b \), b Self-authenticating meaningful share \( {C}_{vs_1} \), c Self-authenticating meaningful share \( {C}_{vs_2} \), d Recovered audio signal

Each share is embedded with authentication bits generated by the self-embedding technique in order to verify its integrity. Any tampering, intentional or unintentional, during offline storage or transmission can be tracked by the proposed approach. Figure 7 demonstrates the integrity verification of shares: image (a) is the original secret audio signal and image (b) is a tampered version of a self-authenticating meaningful share. The authenticity of the shares is checked using Algorithm 5. White pixels in Fig. 7(c) show the tampered portion of the share. Tamper localization is done without the help of the original share. Since block-based authentication is used, alteration of a single bit causes the whole block to be marked as tampered. Table 1 shows the accuracy of tamper detection. Here we tabulate the tampering results for four self-authenticating meaningful shares. One can observe that tamper localization is achieved at a very satisfactory level.

Fig. 7

a Original audio secret signal, b Tampered version of self-authenticating meaningful share \( {C}_{vs_k} \), c Detected pixels, d Modified audio signal at the receiver end

Table 1 Alteration detection accuracy of proposed approach

3.1 Comparison of relative reports on VC

To the best of our knowledge, the proposed approach is a new kind of audio secret sharing approach that has not previously been proposed in the literature. Various essential qualitative characteristics of visual cryptography and secret sharing have been considered in order to compare the proposed approach with existing approaches to audio secret sharing, as shown in Table 2. Some of the qualitative parameters used for the comparison are described as follows:

  1. Contents of shares: Most of the existing algorithms generate shares which are random in nature. These shares are highly vulnerable to cryptanalysis and may also cause confusion in share identification. Hence there should be some meaningful information on the shares. This may be any additional information about the shares or the share holders.

  2. Contrast α(m) of the share: The value of the contrast α(m) must be as high as possible so that the quality of the meaningful share image \( {C}_{vs_k} \) remains close to that of the original cover image \( C_k \). Since all VC schemes incur some contrast loss for the sake of security, this contrast loss must be minimized.

  3. Security criteria: Any subset in \( \Gamma_{Forb} \) must reveal no information about the secret audio signal, and for \( \Gamma_{Forb} \), rows from any matrix of \( V^j \) (j ∈ {0, 1}) must be indistinguishable with respect to \( S_b \).

  4. Codebook requirement: Most of the existing secret sharing algorithms explicitly require a codebook during the encoding and decoding processes. A codebook is a collection of all combinations of basis matrices that are fixed for the various possible signal samples. Since explicit codebooks are difficult to manage and require excessive static memory for storage, the explicit codebook requirement is a major overhead for any secure sharing algorithm.

  5. Share authentication: Shares are very important and sensitive primitives in secret sharing; hence they must be protected during transmission. Self-embedding is a suitable way to protect the shares because no extra authentication images are required.

  6. Lossless decryption of the secret signal: The original secret audio signal cannot be compromised, because lossy recovery may change its entire meaning.

Table 2 The comparison of relative reports on VC

Many other approaches to audio security in the literature could be compared with the proposed mechanism, but in Table 2 we have compared only those recent approaches to audio security which can be characterized by the aforementioned qualitative parameters.

4 Conclusions

In this paper a novel approach for self-authenticating audio using visual cryptography with meaningful shares has been proposed for securely storing songs in large repositories. The proposed method eliminates various basic constraints of visual cryptography such as the random pattern of shares, the explicit codebook requirement, contrast loss and lossy recovery. The proposed approach uses the concept of the 2-out-of-2 visual cryptography scheme, where both secret shares are visually meaningful and provide confidentiality (a vital security requirement) to the secret audio signal during transmission. All 2 × 2 blocks of both meaningful shares \( {C}_{share_k} \) contain self-embedded authentication bits, a kind of block-based fragile watermark that confirms the authenticity and content-based integrity of both shares and hence of the secret audio signal. The experimental results and the comparison with most of the existing state-of-the-art approaches to audio security show the effectiveness of the proposed self-authenticating audio security approach.