1 Introduction

Audio is one of the major components of multimedia, being the most natural way of communication. People contact call centers to resolve their issues, and these conversations may be recorded. Such recordings may contain important personal information, such as names, mobile numbers, account numbers, credit card numbers and Aadhaar numbers, which may later be used illegally. Similarly, if an intelligence agency receives prior information about a terror attack or a crime, this information must be shared secretly. Data received from sensors for purposes such as activity recognition [7, 8, 10, 11] must also be transmitted securely, and social media data [9] stored on servers must be secured because it contains personal information. Encryption alone is not sufficient to secure such sensitive information; hence, secret sharing schemes were introduced.

Secret sharing was first proposed independently by Shamir [14] and Blakley [2] in 1979. In a secret sharing scheme, the secret is divided into multiple units, called shares, before it is stored or transmitted. In a (k, n) secret sharing scheme, the secret is divided into n shares, and at least k shares are required to regenerate it. Secret sharing schemes fall into two categories according to the computation required to reveal the secret: in the first category some computation is necessary, while in the second no computation is required and the secret is revealed by the human perception system (ear, eye, etc.) alone. For example, in a visual cryptographic system, a person can reveal the secret without any computation by superimposing the visual shares in order. Similarly, a person can reveal an audio secret by playing the audio shares simultaneously.
Revealing an audio secret without computation is affected by hearing impairment, the location of the audio players, noise during transmission, etc. Thus, the first category (revealing the secret with some computation) is the better option for audio secret sharing.

Security of audio data is a major concern among researchers working in the field of multimedia security. A number of researchers have proposed different approaches to secure audio data, but it is still an open research problem. To secure audio data, we propose a verifiable (n, n) audio secret sharing scheme that separates the amplitudes and signs of the audio secret. The integrity of each received share can also be checked at the receiver end. The scheme is also suitable for real-time audio communication, as the available bandwidth is taken into consideration when constructing the shares. Since no cover audio is required to transmit the shares, the total size of all shares together is essentially the same as the size of the secret itself. Even if some attacks are made on the shares during transmission, the secret may still be revealed. Two types of attacks are considered here:

  • Addition of noise or meaningful audio to some part of one or more shares.

  • Replacement of some part of one or more shares by noise or meaningful audio.

The remainder of the paper is structured as follows.

The literature survey for audio secret sharing (ASS) is presented in Section 2. The proposed verifiable (n, n) ASS scheme is described in Section 3. The effectiveness of the proposed approach is shown through experimental results in Section 4. The proposed approach is compared with state-of-the-art approaches in Section 5. Section 6 concludes the paper.

2 Literature survey for Audio Secret Sharing (ASS)

A number of researchers have been working on ASS schemes, which provide a more secure way to transmit or store secret data through audio. This section briefly reviews state-of-the-art approaches for ASS.

Y. Desmedt et al. [3] proposed a (2, 2) audio secret sharing scheme that hides a binary secret message in cover audio. Two audio shares, each of the same duration as the original cover audio, are generated using the interference property of audio. Each bit of the binary secret is hidden in a part of the cover audio of fixed duration T. For example, if the secret bit is ‘1’, the phase difference between the two shares over that interval T is 0; otherwise it is π. When both shares are played simultaneously, the binary secret is revealed: a higher amplitude (constructive interference) is decoded as ‘1’ and a lower amplitude (destructive interference) as ‘0’. This scheme works for binary secrets but cannot be used for audio secrets. Moreover, the total size of all shares is very large compared to the secret size, and the scheme may not be usable for hiding secrets in real-time audio transmission. The scheme [3] was further extended to a (2, n) audio secret sharing scheme, which requires \(\lceil \log_2 n \rceil\) cover audios. Ching-Nung Yang [18] and Chen-chi-Lin et al. [6] also proposed (2, n) ASS schemes with only one cover audio, but the other limitations remain. These schemes may not generalize to (k, n) ASS. Later, Md. Ehdaie et al. [4] proposed a (2, n) ASS scheme that can be generalized to (k, n) ASS. In this scheme, the secret message is an audio file and the shares are generated from the secret itself: the first share is generated randomly and the remaining shares depend on it. Here also, the secret is revealed by playing the audio shares simultaneously. The total size of the shares is larger than the secret size but smaller than in the approaches above. This scheme is also not applicable to real-time audio transmission. Md. Ehdaie et al. [12] introduced a new generalization method to extend every (2, n) secret sharing scheme. Daniel Socek et al. [15] proposed a scheme for sharing an audio secret using Morse code, which has two types of sounds: a short beep and a long beep. Here, the audio secret is first converted into a prefix binary code signal, which is then transformed into Morse-code structures. No computation is required to reveal the secret in this approach.

Jing-Zhang et al. [17] proposed an audio secret sharing scheme based on fractal encoding, which reduces the size of the original secret by 40%. The fractal code is divided into n parts, which are embedded into n distinct cover audios using the Least Significant Bit (LSB) technique in the frequency domain. Hence, the total size of the shares is larger than the original secret. The computation time is high, so the scheme is not applicable to real-time audio communication. S. Vyavahare et al. [16] proposed an audio secret sharing scheme using a matrix projection technique that does not require any cover audio. Shares are generated from the secret itself, and their total size is approximately the same as the secret size. During share construction, a remainder matrix is generated, which is later used to reveal the secret. If the remainder matrix changes, the secret cannot be revealed; this is the major drawback of the approach. Ryouichi Nishimura et al. [13] proposed an audio secret sharing scheme for 1-bit audio, a high-quality digital audio format with a sampling frequency of 2,822.4 kHz and a resolution of 1 bit. Here, shares are constructed using a sharing table. This scheme cannot be generalized to other audio formats.

Apart from the limitations mentioned above, none of these works provides a method for checking the integrity of the received shares if an attack occurs during their transmission.

In this paper, the authenticity of the revealed secret can be confirmed by checking the integrity of each received share. The other limitations found in the literature are also reduced in the proposed scheme.

3 Proposed approach for Verifiable (n, n) Audio Secret Sharing (VASS) scheme

The flow chart of the proposed approach for generating verifiable shares and revealing the secret is shown in Fig. 1. Details of the construction of verifiable shares and of revealing the secret are shown in Fig. 2 and Fig. 7, respectively. The proposed approach generates verifiable audio shares in three phases. The first phase converts the input audio secret into two streams, a stream of scaled amplitudes (Sp) and a stream of signs (Ss), using Algorithm 1. The second phase generates n primary shares by changing the locations of the scaled amplitudes and the relative ordering of the signs using Algorithm 2. When played, the primary shares give the impression of noise. The share number and scaling factor, which are required later to reveal the secret, are also inserted into the primary shares. In the third phase, n authentication keys are generated and embedded into the respective primary shares; they are later used to check the integrity of the shares. The shares thus generated are verifiable. Once all shares are received, the secret is revealed through Algorithm 4, and the integrity of the shares can be verified (if required) at the receiver end using Algorithm 5. The time complexity of the proposed approach is O(N), where N is the total number of samples in the audio secret. Experimentally, the time required to generate the shares and reveal the secret is also found to be low, as shown in Table 6. Hence, this approach may be used to increase the security of real-time audio conversations over the Internet.

Fig. 1 Flow chart of the proposed approach for verifiable (n, n) ASS scheme

Fig. 2 Flow chart of the proposed approach for share generation

Since the quality of an audio signal degrades when the relative ordering of its signs changes, the shares are generated by processing the stream of signs (Ss) and the stream of amplitudes (Sp) of the original audio secret (S) separately, which provides more security. As a result, no cover audio is required for transmitting or storing the shares. Even playing all the shares together does not reveal the secret; the secret can be revealed only by applying Algorithm 4.

3.1 Phase 1 (pre-processing of audio secret): separation of stream of signs and stream of scaled amplitudes

First, the input audio secret (S) is scaled by dividing it by a scaling factor (sf). Scaling is done to reduce the transmission time of the shares. The scaling factor is the ratio of the largest amplitude of S to L, rounded up to an integer (as in the example of Section 3.2), where L is the largest amplitude allowed for transmission (default value 255; it can be changed as required). Scaling may introduce musical noise; to minimize it, samples with zero amplitude in the scaled audio secret are replaced by ones. The quality of the audio secret decreases as the scaling factor increases, and the upper limit of the scaling factor also depends on the sampling frequency of the audio secret.


Here, 1 and −1 represent positive and negative signs, respectively. After scaling, the audio secret is separated into two streams, the stream of scaled amplitudes (Sp) and the stream of signs (Ss), as shown in Fig. 3.
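The Phase 1 pre-processing described above can be sketched in Python as follows. This is a minimal sketch, not the authors' Algorithm 1: the function name `preprocess`, the integer-sample input, treating a zero sample as positive, and the rounding rule (nearest integer with ties rounded down, which is the rule that reproduces the Sp values of the worked example in Section 3.2) are all assumptions.

```python
def preprocess(secret, max_amp=255):
    """Hypothetical sketch of Phase 1 for integer samples.

    Returns (sf, sp, ss): the scaling factor, the stream of scaled
    amplitudes Sp and the stream of signs Ss.
    """
    peak = max(abs(x) for x in secret)
    # sf = ceil(peak / max_amp), computed with integer arithmetic; at least 1.
    sf = max(1, -(-peak // max_amp))
    sp, ss = [], []
    for x in secret:
        q, r = divmod(abs(x), sf)
        # Assumed rounding: nearest integer, ties rounded down.
        amp = q + 1 if 2 * r > sf else q
        sp.append(amp if amp != 0 else 1)  # zero amplitudes are replaced by ones
        ss.append(-1 if x < 0 else 1)      # -1 = negative, 1 = positive (zero assumed positive)
    return sf, sp, ss


# Worked example of Section 3.2, with a largest allowed amplitude of 26.
sf, sp, ss = preprocess(
    [128, -150, -255, -185, 158, 140, -80, 106, 208, 190, -170, -35], max_amp=26
)
# sf == 10; sp and ss match the Sp and Ss tables of Section 3.2
```

The integer `divmod` avoids floating-point surprises at exact ties such as 255 / 10 = 25.5.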

The primary shares of the audio secret are generated in Phase 2.

Fig. 3 (a) S: input audio secret; (b) Sp: stream of scaled amplitudes of the input audio secret; (c) Ss: stream of signs

3.2 Phase 2: generation of primary shares

The stream of amplitudes Sp and the stream of signs Ss are each divided into n (n ≥ 2) vectors of size \(\lceil \frac {N}{n}\rceil \times 1\) using the modulus-n operator applied to the indices. If \(\frac {N}{n}\) is not an integer, both Sp and Ss are padded with the required number of ones. To generate n primary shares, n vectors Avec1, Avec2, ..., Avecn are generated from Sp, and n vectors Svec1, Svec2, ..., Svecn are generated from Ss. The value at the ith index of Sp (Ss) is copied to the \((\lfloor \frac {i-1}{n} \rfloor + 1)\)th index of Avect (Svect), where t = ((i − 1) mod n) + 1.
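The index mapping above (sample i goes to vector t = ((i − 1) mod n) + 1 at position ⌊(i − 1)/n⌋ + 1) amounts to taking every nth sample starting at offset t − 1. A minimal Python sketch, assuming 0-based lists and a hypothetical helper name `split_mod_n`:

```python
def split_mod_n(stream, n, pad=1):
    """Split a stream into n vectors by index modulo n (hypothetical helper).

    The stream is first padded with `pad` (ones, as in the text) so that its
    length is a multiple of n. With 0-based indices, sample k goes to vector
    k % n at position k // n, i.e. vector t is the slice padded[t::n].
    """
    padded = list(stream) + [pad] * (-len(stream) % n)
    return [padded[t::n] for t in range(n)]


sp = [13, 15, 25, 18, 16, 14, 8, 11, 21, 19, 17, 3]
avecs = split_mod_n(sp, 4)  # [[13, 16, 21], [15, 14, 19], [25, 8, 17], [18, 11, 3]]
```

Applying the same function to Ss gives Svec1, ..., Svec4.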


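For example, let n = 4 shares be constructed for the following input audio secret S:

$$\begin{array}{c|cccccccccccc} \text{Index} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline \text{Data} & 128 & -150 & -255 & -185 & 158 & 140 & -80 & 106 & 208 & 190 & -170 & -35 \end{array}$$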

Here, the maximum amplitude of S is 255, at index 3. If the maximum amplitude that can be transmitted is 26, then \(sf= \lceil \frac {255 }{26} \rceil = 10\). The stream of scaled amplitudes (Sp) is then as follows:

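$$\begin{array}{c|cccccccccccc} \text{Index} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline \text{Data} & 13 & 15 & 25 & 18 & 16 & 14 & 8 & 11 & 21 & 19 & 17 & 3 \end{array}$$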

Similarly, stream of signs Ss is:

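$$\begin{array}{c|cccccccccccc} \text{Index} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline \text{Data} & 1 & -1 & -1 & -1 & 1 & 1 & -1 & 1 & 1 & 1 & -1 & -1 \end{array}$$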

To generate 4 primary shares, following four vectors Avec1, Avec2, Avec3 and Avec4 are generated from Sp using mod 4 as explained in Algorithm 2.

$$\begin{array}{cccc} Avec_1 & Avec_2 & Avec_3 & Avec_4 \\ \hline 13 & 15 & 25 & 18 \\ 16 & 14 & 8 & 11 \\ 21 & 19 & 17 & 3 \end{array}$$

Similarly, following four vectors Svec1, Svec2, Svec3 and Svec4 are generated from Ss using mod 4 as explained in Algorithm 2.

$$\begin{array}{cccc} Svec_1 & Svec_2 & Svec_3 & Svec_4 \\ \hline 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & -1 \end{array}$$

For the generation of the ith primary share, \(vec_i\) is first generated from \(Avec_i\) and \(Svec_{n-i+1}\) as follows:

$$vec_i(j) = Avec_i(j) \times Svec_{n-i + 1}\left( \frac{N}{n}-j + 1\right) $$
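In 0-based terms, the formula pairs amplitude vector i with sign vector n − 1 − i read in reverse order. The sketch below, with the hypothetical name `combine_signs`, reproduces the vec table of the example:

```python
def combine_signs(avecs, svecs):
    """vec_i(j) = Avec_i(j) * Svec_{n-i+1}(N/n - j + 1), rewritten with 0-based
    indices: vector i takes its signs from sign vector n - 1 - i, reversed."""
    n = len(avecs)
    return [
        [a * s for a, s in zip(avecs[i], reversed(svecs[n - 1 - i]))]
        for i in range(n)
    ]


avecs = [[13, 16, 21], [15, 14, 19], [25, 8, 17], [18, 11, 3]]
svecs = [[1, 1, 1], [-1, 1, 1], [-1, -1, -1], [-1, 1, -1]]
vecs = combine_signs(avecs, svecs)
# vecs == [[-13, 16, -21], [-15, -14, -19], [25, 8, -17], [18, 11, 3]]
```

Appending sf + i and sf − i to the ith of these vectors then gives the primary share shpr(i).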

$$\begin{array}{cccc} vec_1 & vec_2 & vec_3 & vec_4 \\ \hline -13 & -15 & 25 & 18 \\ 16 & -14 & 8 & 11 \\ -21 & -19 & -17 & 3 \end{array}$$

Next, the share number and the scaling factor are appended to the shares: sf + i is appended at the \((\frac {N}{n} + 1)\)th index and sf − i at the \((\frac {N}{n} + 2)\)th index of \(vec_i\) to generate the ith primary share (shpr(i)). These extra values are used to determine the share number and scaling factor when revealing the secret. The four primary shares generated by Algorithm 2 for sf = 10 are as follows.

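$$\begin{array}{cccc} sh_{pr1} & sh_{pr2} & sh_{pr3} & sh_{pr4} \\ \hline -13 & -15 & 25 & 18 \\ 16 & -14 & 8 & 11 \\ -21 & -19 & -17 & 3 \\ 11 & 12 & 13 & 14 \\ 9 & 8 & 7 & 6 \end{array}$$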

For primary share1: sf(10) + i(1) = 11 and sf(10) − i(1) = 9 have been appended at the 4th and 5th locations respectively. Similarly, for primary share2: sf(10) + i(2) = 12 and sf(10) − i(2) = 8, for primary share3: sf(10) + i(3) = 13 and sf(10) − i(3) = 7, and for primary share4: sf(10) + i(4) = 14 and sf(10) − i(4) = 6 have been appended at the 4th and 5th locations respectively.

Figure 4 shows the four primary shares generated from the secret S.

Fig. 4 Primary shares generated from S for n = 4

The authenticity of a secret revealed from primary shares cannot be confirmed. The next step, therefore, is to make the shares verifiable so that the authenticity of the revealed secret can be ensured.

3.3 Phase 3: process to make primary shares verifiable

To make the primary shares (obtained by Algorithm 2) verifiable, an authentication key is first generated for each share and then embedded in that same share. The authentication key is generated by a repeated bitwise XOR over the first \(\lceil \frac {N}{n} \rceil \) elements of the primary share. The sample with the maximum amplitude (largest absolute value) in the primary share is then replaced by the bitwise XOR of that sample and the authentication key. Let \(ak_i\) denote the authentication key of the ith primary share (generated in Section 3.2):

$$\begin{array}{@{}rcl@{}} \text{ak}_1 &=& -13 \oplus 16 \oplus -21 = 8\\ \text{ak}_2 &=& -15 \oplus -14 \oplus -19 = -18\\ \text{ak}_3 &=& 25 \oplus 8 \oplus -17 = -2 \\ \text{ak}_4 &=& 18 \oplus 11 \oplus 3 = 26 \end{array} $$

The maximum-amplitude value (−21) of primary share 1 is replaced by its XOR with ak1 (−21 ⊕ 8 = −29). Similarly, the maximum-amplitude value (−19) of primary share 2 is replaced by its XOR with ak2 (−19 ⊕ −18 = 3), the maximum-amplitude value (25) of primary share 3 by its XOR with ak3 (25 ⊕ −2 = −25), and the maximum-amplitude value (18) of primary share 4 by its XOR with ak4 (18 ⊕ 26 = 8).
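The key generation and embedding just described can be sketched as below. This is a hedged sketch, not the authors' Algorithm 3: `make_verifiable` is a hypothetical name, m stands for ⌈N/n⌉, ties for the largest amplitude are assumed to go to the first such sample, and Python's integer XOR (which behaves like two's complement for negative numbers) is assumed to match the signed arithmetic of the example.

```python
from functools import reduce
from operator import xor


def make_verifiable(primary, m):
    """Hypothetical sketch of Phase 3 for one primary share.

    primary: m = ceil(N/n) signed samples followed by the header (sf + i, sf - i).
    Returns (ak, verifiable_share).
    """
    body = list(primary[:m])
    ak = reduce(xor, body)  # authentication key: XOR of the first m samples
    # Position of the sample with the largest absolute value (first one on ties).
    k = max(range(m), key=lambda j: abs(body[j]))
    body[k] ^= ak  # embed the key in place of that sample
    return ak, body + list(primary[m:])  # header values are left unchanged


ak1, shv1 = make_verifiable([-13, 16, -21, 11, 9], m=3)
# ak1 == 8 and shv1 == [-13, 16, -29, 11, 9], as in the example above
```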

The four verifiable shares thus generated are as follows.

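$$\begin{array}{cccc} sh_{v1} & sh_{v2} & sh_{v3} & sh_{v4} \\ \hline -13 & -15 & -25 & 8 \\ 16 & -14 & 8 & 11 \\ -29 & 3 & -17 & 3 \\ 11 & 12 & 13 & 14 \\ 9 & 8 & 7 & 6 \end{array}$$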

This embedding helps in verifying the integrity of received shares without any explicit use of its authentication key.

For example, let a primary share consist of four elements A, B, C and D, so that its authentication key is E = A ⊕ B ⊕ C ⊕ D. Suppose B has the largest amplitude. Then F = B ⊕ E is embedded at the position of B, and the verifiable share contains A, F, C and D instead of A, B, C and D. The bitwise XOR of the elements of the received share is A ⊕ F ⊕ C ⊕ D = A ⊕ B ⊕ E ⊕ C ⊕ D = E ⊕ E = 0.

Hence, if no tampering occurs during transmission, the bitwise XOR of the first \(\lceil \frac {N}{n} \rceil \) elements of each share is 0; a non-zero XOR value confirms tampering.

To keep the shares looking normal, the authentication key is embedded at the location of the maximum amplitude value of the primary share. If the key were embedded in a silent or very low-amplitude part of the share, it could easily be identified by cryptanalysis techniques, defeating the purpose of embedding; an abrupt spike may arouse suspicion, as shown in Fig. 5c, where the authentication key is embedded at the middle location. Hence, the key is embedded at the location of the maximum amplitude value of the respective primary share shpr (Fig. 6).

Fig. 5 Verifiable shares generated from primary shares for n = 4

Fig. 6 Verifiable shares obtained using Algorithm 3; no point of suspicion is visible

Running time complexity of the proposed approach for verifiable share generation is O(N), where N is the total number of samples in the audio secret. Revealing of secret from received shares is discussed next.

3.4 Phase 4: revealing of secret from the shares

Algorithm 4 is used to reveal the secret from all n shares. The number of shares (n), the scaling factor (sf) and the share number (sn) are required to reveal the secret audio from the received shares. The number of shares (n) for a secret is publicly available. The scaling factor can be determined from any received share as sf \(= \frac {sh_{v}(\frac {N}{n}+ 1) + sh_{v}(\frac {N}{n}+ 2)}{2}\).

The share number (sn) of a received share is determined as \(sn = \frac {sh_{v}(\frac {N}{n}+ 1) - sh_{v}(\frac {N}{n}+ 2)}{2}\). The share numbers of the received shares must be distinct and lie between 1 and n for the secret to be revealed.

For example, let the received shares be shv1, shv2, shv3 and shv4. For shv1: \(sn=\frac {11-9}{2} = 1\) and \(sf=\frac {11 + 9}{2} = 10\). Similarly, for shv2: \(sn=\frac {12-8}{2} = 2\) and \(sf=\frac {12 + 8}{2} = 10\), for shv3: \(sn=\frac {13-7}{2} = 3\) and \(sf=\frac {13 + 7}{2} = 10\), and for shv4: \(sn=\frac {14-6}{2} = 4\) and \(sf=\frac {14 + 6}{2} = 10\). The share numbers are distinct and lie between 1 and 4, so the secret can be revealed from them. The flow chart for revealing the secret is shown in Fig. 7. The secret is revealed using Algorithm 4, which reverses the share-generation process. Applying Algorithm 4 to the received shares of Section 3.3 reveals the following secret, as shown in Fig. 8.

Fig. 7 Flow chart for revealing the secret

Fig. 8 Secret audio reconstructed from the verifiable shares using Algorithm 4

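Revealed secret:

$$\begin{array}{c|cccccccccccc} \text{Index} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline \text{Data} & 13 & -15 & -25 & -8 & 16 & 14 & -8 & 11 & 29 & 3 & -17 & -3 \end{array}$$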

The running time complexity of this algorithm is O(N).
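Algorithm 4 itself is not reproduced in the text, so the following is only a sketch of one way to invert Phases 1 and 2 under the conventions of the worked example: each share holds m = ⌈N/n⌉ samples followed by the two header values, amplitudes are read from the absolute values of a share, and the signs of Svec_t are read, in reverse, from the share numbered n − t + 1. The function name `reveal` is hypothetical, and multiplying the result by sf to restore the original scale is left to the caller.

```python
def reveal(shares, n):
    """Hypothetical inverse of Phases 1-2.

    Each share holds m signed samples followed by (sf + sn, sf - sn).
    Returns (sf, signed scaled samples in their original order); any padding
    ones added in Phase 2 remain at the end.
    """
    m = len(shares[0]) - 2
    vecs, sfs = {}, set()
    for sh in shares:
        a, b = sh[m], sh[m + 1]
        sfs.add((a + b) // 2)  # sf = (a + b) / 2
        sn = (a - b) // 2      # sn = (a - b) / 2
        if not 1 <= sn <= n or sn in vecs:
            raise ValueError("share numbers must be distinct and lie in 1..n")
        vecs[sn] = sh[:m]
    if len(vecs) != n or len(sfs) != 1:
        raise ValueError("all n shares with a common scaling factor are required")
    # Undo vec_i(j) = Avec_i(j) * Svec_{n-i+1}(m - j + 1).
    avec = {i: [abs(x) for x in v] for i, v in vecs.items()}
    svec = {n - i + 1: [1 if x >= 0 else -1 for x in reversed(v)] for i, v in vecs.items()}
    # Interleave: 0-based sample k came from vector k % n + 1, position k // n.
    signed = [avec[k % n + 1][k // n] * svec[k % n + 1][k // n] for k in range(n * m)]
    return sfs.pop(), signed


primary = [[-13, 16, -21, 11, 9], [-15, -14, -19, 12, 8],
           [25, 8, -17, 13, 7], [18, 11, 3, 14, 6]]
sf, signed = reveal(primary, n=4)
```

On the primary shares of Section 3.2 this sketch returns sf = 10 and the signed scaled secret [13, −15, −25, −18, 16, 14, −8, 11, 21, 19, −17, −3], in any share order. Applied to verifiable shares, each sample that carries an embedded authentication key is altered, so the output of this sketch can differ from the signed scaled secret at those positions.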


3.5 Phase 5: Integrity Verification of Share (IVS)

To confirm the authenticity of the revealed secret, each received share must pass the integrity test. The integrity of a share is checked by a repeated bitwise XOR over the first \(\lceil \frac {N}{n} \rceil \) elements of that share. A non-zero XOR result signals tampering; in that case, the sender is asked to retransmit that particular share.
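The integrity test reduces to a single XOR fold. A minimal sketch, assuming m = ⌈N/n⌉ is known at the receiver and using the hypothetical name `is_intact`:

```python
from functools import reduce
from operator import xor


def is_intact(share, m):
    """Integrity test: the XOR of the first m = ceil(N/n) samples must be 0.

    The two header values (sf + sn, sf - sn) are not part of the check.
    """
    return reduce(xor, share[:m], 0) == 0


verifiable = [[-13, 16, -29, 11, 9], [-15, -14, 3, 12, 8],
              [-25, 8, -17, 13, 7], [8, 11, 3, 14, 6]]
print([is_intact(sh, 3) for sh in verifiable])  # [True, True, True, True]
print(is_intact([-13, 17, -29, 11, 9], 3))      # False: one sample was changed
```

A share for which `is_intact` returns False would be requested again from the sender.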

It is worth mentioning that a zero XOR result does not guarantee the integrity of the share with certainty. However, such a false acceptance is rare, as the following calculation shows.

The probability of declaring a tampered share to be non-tampered is calculated as follows. Let k be the total number of samples in a share, each represented with b bits.

The total number of possible sets of samples is \(2^{b\times k}\).

Here, a share is assumed to be correct if the XOR of all its samples is zero. For the bitwise XOR output to be zero, one of the following conditions must hold at every bit position:

  • The bits at that position are zero in all samples of the share.

  • The number of ones at that position, across all samples of the share, is even.

There is only one (\({}^{k}C_{0}\)) way to have all bits zero at the same bit position in all samples.

The total number of ways to have ‘1’ in exactly two of the k samples at the same bit position = \({}^{k}C_{2}\).

The total number of ways to have ‘1’ in exactly four of the k samples at the same bit position = \({}^{k}C_{4}\).

The total number of ways to have ‘1’ in exactly six of the k samples at the same bit position = \({}^{k}C_{6}\).

If k is even, the total number of ways to have ‘1’ in all k samples at the same bit position = \({}^{k}C_{k}\).

If k is odd, the largest even count is k − 1, and the total number of ways to have ‘1’ in k − 1 samples at the same bit position = \({}^{k}C_{k-1}\).

Hence, for one bit position, the total number of favorable cases (zero output) is

$$ \begin{cases} {}^{k}C_{0} + {}^{k}C_{2} + {}^{k}C_{4} + {}^{k}C_{6} + \cdots + {}^{k}C_{k}, & \text{if } k \text{ is even} \\ {}^{k}C_{0} + {}^{k}C_{2} + {}^{k}C_{4} + {}^{k}C_{6} + \cdots + {}^{k}C_{k-1}, & \text{if } k \text{ is odd} \end{cases} $$

Since this holds independently for each of the b bit positions, the total number of favorable cases (zero output) is

$$ \begin{cases} \left({}^{k}C_{0} + {}^{k}C_{2} + {}^{k}C_{4} + {}^{k}C_{6} + \cdots + {}^{k}C_{k}\right)^{b}, & \text{if } k \text{ is even} \\ \left({}^{k}C_{0} + {}^{k}C_{2} + {}^{k}C_{4} + {}^{k}C_{6} + \cdots + {}^{k}C_{k-1}\right)^{b}, & \text{if } k \text{ is odd} \end{cases} $$

The even-index binomial coefficients sum to \({}^{k}C_{0} + {}^{k}C_{2} + {}^{k}C_{4} + \cdots = 2^{k-1}\) for k ≥ 1, in both the even and the odd case, so the number of favorable cases equals \((2^{k-1})^{b} = 2^{b(k-1)}\). Assuming that a tampered share is equally likely to be any of the \(2^{b\times k}\) possible sets of samples, the probability P of declaring a tampered share as non-tampered is

$$ P \approx \frac{\left({}^{k}C_{0} + {}^{k}C_{2} + {}^{k}C_{4} + {}^{k}C_{6} + \cdots\right)^{b}}{2^{b\times k}} = \frac{2^{b(k-1)}}{2^{b\times k}} = \frac{1}{2^{b}} $$

Hence, the probability of declaring a tampered share as non-tampered is \( \approx \frac {1}{2^b}\). It depends on the number of bits b used to represent a sample in a share, which is set by the maximum amplitude allowed for transmission. For example, if a sample is represented with 8 bits, this probability is \(\approx \frac {1}{2^8} = \frac {1}{256}\approx 0.0039\). Experimental results are analyzed in the next section.
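The 1/2^b result can be checked numerically for small cases by enumerating every set of k samples of b bits. The sketch below (hypothetical helper `zero_xor_fraction`) treats samples as unsigned b-bit patterns, which is sufficient because the XOR test depends only on the bit patterns; it is practical only for very small k and b.

```python
from functools import reduce
from itertools import product
from operator import xor


def zero_xor_fraction(k, b):
    """Fraction of all 2**(b*k) sets of k b-bit samples whose XOR is 0."""
    total = zero = 0
    for samples in product(range(2 ** b), repeat=k):
        total += 1
        zero += reduce(xor, samples) == 0  # True counts as 1
    return zero / total


print(zero_xor_fraction(3, 3))  # 0.125, i.e. 1/2**3
```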

4 Analysis of experimental results

To analyze the effectiveness of the proposed approach, experiments have been conducted with audio secrets of various sizes and sampling frequencies for n = 4 shares. SNR, MOS and the correlation coefficient (r), the most popular objective and subjective measures, have been used to measure the quality of the audio signal.

Share generation time and secret reconstruction time are also recorded. Each experiment is conducted under three cases. Clean speech is taken as the secret, and Algorithm 4 is used to reveal the secret from the received shares.

  • Case 1:- no distortion is applied on any share during transmission.

  • Case 2:- a fraction c (0 < c ≤ 1) of a share is replaced by different real-world noises or meaningful audio at different SNR values. The same is repeated for p (1 ≤ p ≤ n) shares.

  • Case 3:- the distortion is similar to the previous case, but the noise or meaningful audio is added to the share instead of replacing part of it.

4.1 Database used for the experiment

Audio secrets of various durations (1.5 s to 5 s) are taken from the IndicTTS [1] and TIMIT [5] databases. Both databases contain clean sentences spoken by male and female speakers. In the IndicTTS database, the spoken sentences are in 13 Indian languages with sampling frequency fs = 48000 Hz. In the TIMIT database, the sentences are recorded in English by speakers belonging to 8 distinct dialect regions of the United States, with sampling frequency fs = 16000 Hz. Additional audio secrets of 8 to 10 s duration, spoken in English by male and female speakers, were also recorded in our lab for the experiments. Various types of noise taken from the NOIZEUS database are used in the experiments.

4.2 Parameters used for performance measurement

To calculate the subjective parameter MOS, 20 persons are divided into two equal groups. In group 1, the original secret audio is played first, followed by the revealed secret, and the persons are asked to provide a rating according to the labels defined below; the average of their ratings is calculated. In group 2, only the revealed secret is played and the persons are asked to provide ratings. Finally, the average of the group 1 and group 2 ratings is taken as the final MOS rating.

$$\begin{array}{cl} \text{Rating} & \text{Label} \\ \hline 5 & \text{Excellent quality} \\ 4 & \text{Content of audio is clear with acceptable quality} \\ 3 & \text{Less than 50\% content of audio is clear with fair quality} \\ 2 & \text{Less than 25\% content of audio is clear with bad quality} \\ 1 & \text{Only noise like audio is audible} \end{array}$$

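The objective parameter SNR is computed as

$$ SNR_{dB} = 10 \times \log_{10}{\frac{\sum_{i = 1}^{N} S^{2}(i)}{\sum_{i = 1}^{N} \left(S(i)-S_{r}(i)\right)^{2}}} \tag{1} $$

where S is the original secret, \(S_r\) is the revealed secret and N is the number of samples.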
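Eq. (1) and the correlation coefficient r can be computed as follows. This is a sketch: the function names are hypothetical, the SNR is reported as infinite when S_r equals S, and r is assumed to be the Pearson correlation coefficient, since the text does not define it further.

```python
import math


def snr_db(s, s_r):
    """Eq. (1): 10 * log10(sum S^2 / sum (S - S_r)^2)."""
    signal = sum(x * x for x in s)
    noise = sum((x - y) ** 2 for x, y in zip(s, s_r))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed meaning of r(S, S_r))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


print(snr_db([3, 4], [3, 3]))  # 10 * log10(25 / 1), about 13.98 dB
```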

4.3 Experimental results

For the input secret shown in Fig. 9, four primary shares (Fig. 10) are generated using Algorithm 2, and Algorithm 3 is used to make them verifiable (Fig. 11). In case 1 of the experiment, where no modification occurs during transmission, the revealed secret (Fig. 12) is found to be the same as the original secret (Fig. 9).

Fig. 9 Input audio secret

Fig. 10 Four primary shares generated using Algorithm 2

Fig. 11 Four verifiable shares generated using Algorithm 3

Fig. 12 Secret revealed using Algorithm 4 in case 1

Figure 13 shows the shares attacked under case 2 with c = 1 and p = 1, where share shv1 is completely replaced by babble noise at 0 dB during transmission.

Fig. 13 Four verifiable shares generated using Algorithm 3; share shv1 has been completely replaced by noise

Figure 14 shows the secret revealed from the attacked shares (Fig. 13). Comparing the original audio secret and the revealed secret (Figs. 9 and 14) shows that the noise in one share has propagated throughout the secret, and the secret could not be recovered. Thus, the secret cannot be revealed if even one complete share is replaced by noise. Experimental results for other values of p and c are shown in Tables 2 and 3.

Fig. 14 Secret reconstructed using Algorithm 4 in case 2

Figure 15 shows the shares attacked under case 3 with c = 1 and p = 1, where babble noise is added to the existing share shv1 at 0 dB during transmission.

Fig. 15 Four verifiable shares sent; babble noise is added to share shv1 at 0 dB

Figure 16 shows the secret revealed from the attacked shares (Fig. 15) using Algorithm 4. Comparing the original audio secret and the revealed secret (Figs. 9 and 16) shows that the revealed secret Sr is completely different from the original. This again confirms that if even one complete share is affected by noise, the secret is not audible. Experimental results for other values of p and c are shown in Tables 4 and 5.

Fig. 16 Secret reconstructed using Algorithm 4 in case 3

In Table 1, S, Sn and Sr represent the input secret, the transmitted secret and the revealed secret, respectively. The correlation coefficients r(S, Sr) and r(Sn, Sr) are both 0.99 rather than 1, because of the scaling done in the preprocessing step. Table 1 shows that the MOS value of the revealed secret is consistently 5 in case 1, and the correlation coefficient in each case (columns 6 and 7) is ≈ 1. These two observations show that the revealed secret is of the same quality as the original secret when there is no attack during transmission, confirming the effectiveness of the proposed approach for secure transmission. In case 2 of the experiment, p is the number of attacked shares and c is the attacked fraction of a share.

Table 1 SNR, MOS and correlation coefficient (r) for reconstructed secret for case 1 of the experiment

Table 2 shows that if a single share is corrupted up to 50%, the secret is still audible; otherwise it is not. Tables 2 and 3 also show that if more than one share is attacked such that the total affected part exceeds 50% of the size of a single share, the original secret is again not audible. Similar results are seen in case 3 of the experiment (Tables 4 and 5).

Table 2 SNR and MOS value of reconstructed speech for case 2 of the experiment
Table 3 Correlation coefficient (r) for reconstructed speech for case 2 of the experiment
Table 4 SNR and MOS of reconstructed speech for case 3 of the experiment
Table 5 Correlation coefficient (r) value of reconstructed speech for case 3 of the experiment

Share generation time and secret revealing time for n = 4 shares are shown in Table 6. The overall time for generating the shares and revealing the secret is about 1/70 of the duration of the original secret at a sampling frequency of 8000 Hz, and about 1/38 of the duration at a sampling frequency of 16000 Hz. The share generation and secret revealing times recorded for n = 2 and n = 6 shares are similar to those for n = 4, so these times are not significantly affected by the number of shares generated. Where the possibility of attack is low, the primary shares themselves may be transmitted instead of the verifiable shares. Owing to its low processing time, the approach can also be used for real-time audio/video communication over the Internet.

Table 6 Share generation and reconstruction time for n = 4 shares

From the Figs. 17 and 18 following observations are made:

  1. Adding noise to more than one share is more harmful than adding the same amount of noise to a single share. In other words, the revealed secret has better quality when a given amount of noise is confined to a single share rather than spread over several shares.

  2. Similar results are seen for case 3, as confirmed by Fig. 18.

Fig. 17 SNR (dB), MOS and r(S, Sr) obtained for case 2 of the experiment when parts of shares are replaced by babble noise at 0 dB

Fig. 18 SNR (dB), MOS and r(S, Sr) obtained for case 3 of the experiment when babble noise is added at 0 dB

5 Comparison of proposed approach with existing state of the art approaches

To compare the proposed approach with various state-of-the-art approaches, the following criteria are taken into account, as shown in Table 7.

  • Threshold:- Threshold is the minimum number of shares that are required to regenerate the secret. Threshold should be high for better security of shares.

  • Decryption:- Decryption of secret in audio secret sharing can be performed either without computation or with some computation. In ASS scheme, decryption with some computation is preferred because decryption without computation is affected by hearing impairment, location of audio players, effect of noise during transmission of shares etc.

  • Secret type:- The types of secret which can be transmitted through audio shares.

  • Integrity assurance:- Provision of verifying the integrity of received shares.

  • Space efficiency (M):- For a space-efficient approach, M should ideally be close to 1, where M is defined in (2):

    $$ M = \frac{\sum_{i = 1}^{n} \textit{size of i}^{th}\textit{ share}}{\textit{size of secret}} \tag{2} $$

    where n is the total number of constructed shares.

  • Number of cover audio:- Cover audio is defined as an audio in which secret is embedded. It may be 0 or more.
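For the proposed scheme, Eq. (2) can be evaluated directly from the share layout of Section 3.2. The sketch below assumes that each of the n shares carries ⌈N/n⌉ samples plus the two header values (sf + i and sf − i); `space_efficiency` is a hypothetical name.

```python
def space_efficiency(num_samples, n, header=2):
    """M of Eq. (2): total size of the n shares divided by the size of the
    secret, assuming ceil(N/n) samples plus `header` values per share."""
    per_share = -(-num_samples // n) + header  # ceil(N/n) + header
    return n * per_share / num_samples


print(space_efficiency(12, 4))      # toy example of Section 3.2: 20/12
print(space_efficiency(144000, 4))  # e.g. 3 s at 48 kHz: about 1.00006
```

The header overhead is noticeable only for toy inputs; for audio of realistic length M is very close to 1.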

Comparison results are shown in Table 7.

Table 7 Comparison of state of the art approaches for ASS

6 Conclusion

In this paper, a novel verifiable (n, n) audio secret sharing scheme is proposed. It reduces drawbacks of existing work such as the need for cover audio and for more than one audio player to reveal the secret. The proposed approach additionally allows the authenticity of the revealed secret to be confirmed by checking the integrity of each received share. Experiments show that, in the absence of attacks, the quality of the revealed secret is as good as the original, and it is also shown that the approach performs better than state-of-the-art approaches in terms of space efficiency, integrity verification, etc. The approach is secure in the sense that even if all the shares are intercepted, no clue about the secret can be obtained without applying the revealing algorithm. Since generating the shares and revealing the secret from the received shares take little time, the approach can also be used to secure real-time applications such as video chatting and Internet telephony.