1 Introduction

Speech security systems are widely used in many applications. It is now essential to protect speech transmitted over communication systems with fast and secure cryptosystems. As speech communication becomes more widespread and carries more sensitive content, providing a high level of security becomes increasingly important. To date, various speech encryption methods have been proposed. Speech watermarking is a powerful means of hiding information and hence protecting it from intended or unintended misuse during communication. Speech watermarking types and applications, as well as the issues of robustness, capacity and imperceptibility, are detailed in [21]. The authors of [5] present an efficient secure communication system based on a speech watermarking approach that enables automatic speaker recognition, with the whole system optimized to boost its performance.

In [24] we proposed a new design for blind watermarking of speech and audio signals, in which the discrete wavelet transform (DWT) and the discrete cosine transform (DCT) are applied after segmenting the signal. For protection, the Arnold transform is applied to the watermark so that it cannot be recognized without the key. In addition, we presented in [18, 19] a robust blind speech watermarking method using the DCT and DWT with signal sub-sampling. This fusion achieves high imperceptibility and holds up against various attacks such as re-quantization, cropping, echo amplification and additive white Gaussian noise (AWGN).

Chaos is a characteristic behaviour of nonlinear dynamic systems. It is marked by a high sensitivity to parameters and initial conditions, which can serve as encryption keys, and is mathematically described as apparent randomness governed by deterministic laws. This behaviour makes chaotic signals useful in several applications, among which secure communication has received considerable attention. The chapter in [11] explains chaotic systems and illustrates their suitability for protecting communicated information. One of the major motivations for chaos-based secure communication is the broadband nature of chaotic signals, which allows the chaotic carrier to mask the spectrum of the message efficiently. The authors of [29] discuss the problem of chaotic secure communication: a new double-channel diffusion system is presented and used in a secure communication design, and channel-switching procedures are then adopted to further increase the security of message transmission. The paper [20] introduces a low-complexity, low-delay and highly secure speech encryption method based on permuting speech segments with the chaotic Baker map and substituting with masks in both the time and transform domains to fill the unvoiced periods within a speech conversation. Chaotic shift keying-based speech encryption and decryption approaches are presented in [25], where the input speech signal is sampled and its values are segmented into four levels that are permuted using four chaotic generators. A novel speech encryption scheme using fractional chaotic systems is given in [28], where a two-channel transmission process is used and the original speech is encoded by a nonlinear function of the chaotic states. The work in [23] presents modules for improving the security of speaker authentication by inserting the watermark in the detail coefficients of the speech signal after applying the wavelet transform, based on an energy computation; the speaker is identified from the speech and from the watermark extracted from the watermarked speech. The authors of [4] study the effect of using different floating-point representations on the performance of chaotic systems for speech security, with numerical simulations for all the discussed chaotic systems showing good results in terms of MSE, entropy and correlation coefficient, and passing the NIST test.

In this work, we combine watermarking with a chaotic approach in a hybrid scheme based on the Arnold scrambling algorithm, with the aim of enhancing the security of speech in communication systems. Our contribution is to replace the discrete wavelet transform (DWT), the discrete cosine transform (DCT) and the segmentation of the speech signal with a novel chaotic representation, applied after spreading and Arnold scrambling with a secret key. The superiority of the proposed scheme is shown through numerical comparisons with other published works, and the design is validated against recent strong approaches. In addition, this work highlights the efficiency of the designed scheme in providing copyright protection for ownership of the data and in authenticating people by using speech as a biometric tool.

2 Chaotic generator (tent map, logistic map)

2.1 Logistic map

One of the simplest discrete chaotic systems that has recently been used in cryptographic applications is the logistic map. The logistic map function is expressed as:

$$ {\mathrm{x}}_{\mathrm{n}+1}=r\,{\mathrm{x}}_{\mathrm{n}}\left(1-{\mathrm{x}}_{\mathrm{n}}\right) $$

where xn takes values in the interval (0, 1) and the parameter r is a positive constant taking values up to four. The value of r determines the behaviour of the logistic map. From approximately r = 3.57 the iterations become chaotic and suitable for encryption, so a high value of r is selected to obtain a highly chaotic yet deterministic discrete-time signal [20, 25, 28]. The initial value x0 and the parameter r are considered as the secret key.
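As a minimal sketch of how such a chaotic sequence can be generated (written in Python rather than the Matlab environment used in the experiments), the function below iterates the logistic map. The function name and the key values x0 = 0.3 and r = 3.99 are illustrative assumptions, not values taken from this work.

```python
def logistic_map(x0, r, n):
    """Return n successive iterates of x_{k+1} = r * x_k * (1 - x_k)."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie in the open interval (0, 1)")
    if not 0.0 < r <= 4.0:
        raise ValueError("r must lie in (0, 4]")
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


# Hypothetical key (x0, r); a tiny change in x0 gives a different sequence.
seq_a = logistic_map(0.3, 3.99, 1000)
seq_b = logistic_map(0.3 + 1e-10, 3.99, 1000)
```

Comparing `seq_a` with `seq_b` is a simple way to observe the sensitivity to the initial condition that makes (x0, r) usable as a key.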

2.2 Tent map

The chaotic behaviour of the tent map (a piecewise linear, continuous map with a single maximum) has been studied analytically throughout its chaotic region in terms of the invariant density and the power spectrum. As the height of the maximum is lowered, successive band-splitting transitions occur in the chaotic region and accumulate at the transition point into the non-chaotic region. The time-correlation function of non-periodic orbits and their power spectrum can be computed exactly at the band-splitting points and in the neighbourhood of these points. The tent map with u = 2 is topologically conjugate to the logistic map with r = 4, and hence the behaviours of the two maps are in this sense identical under iteration. The chaotic tent map is defined by:

$$ {\mathrm{x}}_{\mathrm{i}+1}=f\left({\mathrm{x}}_{\mathrm{i}},u\right)=\begin{cases}u\,{\mathrm{x}}_{\mathrm{i}}, & {\mathrm{x}}_{\mathrm{i}}<0.5\\ u\left(1-{\mathrm{x}}_{\mathrm{i}}\right), & \mathrm{otherwise}\end{cases} $$

where xi ∈ [0, 1] for i ≥ 0.

This map sends the interval [0, 1] into itself and has a single control parameter u ∈ [0, 2]; x0 is the initial value of the system. The set of real values x0, x1, …, xn is called the orbit of the system. Depending on the control parameter u, the system exhibits a range of dynamical behaviours from predictable to chaotic [7, 13, 26].
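The tent map can be iterated in the same way. The sketch below is an illustration under assumed names and key values (x0 = 0.4, u = 1.99); it is not the implementation used for the reported results.

```python
def tent_map(x0, u, n):
    """Return n iterates of the tent map with control parameter u."""
    if not 0.0 <= x0 <= 1.0:
        raise ValueError("x0 must lie in [0, 1]")
    if not 0.0 <= u <= 2.0:
        raise ValueError("u must lie in [0, 2]")
    values = []
    x = x0
    for _ in range(n):
        # Rising branch below 0.5, falling branch otherwise.
        x = u * x if x < 0.5 else u * (1.0 - x)
        values.append(x)
    return values


# Hypothetical key (x0, u). With u exactly 2, binary floating-point orbits
# collapse to 0 after a few dozen steps, so a value slightly below 2 is used.
orbit = tent_map(0.4, 1.99, 1000)
```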

3 Watermarking

Watermark retrieval is more reliable if the original un-watermarked signal is available. However, access to the original host signal cannot be assumed in all real-world circumstances [16]. In many applications, the detection algorithm can use the original audio signal to extract the watermark from the watermarked signal [3]. This often improves detector performance significantly, because the watermark information is extracted by subtracting the original signal from the watermarked signal. If, however, the detection algorithm does not have access to the original signal (blind detection), the amount of information that can be hidden in the original signal decreases considerably. The full process of watermark insertion and extraction is modelled as a communication channel in which the watermark is distorted by strong interference in addition to the channel properties.

4 Arnold scrambling algorithm

The K × K matrix W is transformed into W′ by the Arnold transformation to reduce the autocorrelation coefficient of the image, which strengthens the confidentiality of the watermark [14]. The Arnold transformation is periodic: when it is iterated enough times, the original signal is eventually recovered. The Arnold scrambling algorithm [10] is simple and periodic, so it is commonly used to provide an extra layer of protection. The Arnold transform, also known as the cat map, can be used to scramble speech signals by dividing the signal into vectors that are reshaped into N × N matrices, whose elements are then shuffled.

The signal decryption depends on the scrambling key, which serves as a secret key and specifies the number of times the scrambling has been applied.
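To make the scrambling and its periodicity concrete, here is a minimal sketch that assumes the common cat-map form (x, y) → (x + y, x + 2y) mod N; the exact variant used in [10] may differ. The function names are illustrative, and the key is taken to be the number of iterations, as described above.

```python
def arnold_scramble(matrix, iterations):
    """Apply the cat map (x, y) -> (x + y, x + 2y) mod N `iterations` times."""
    n = len(matrix)
    current = [row[:] for row in matrix]
    for _ in range(iterations):
        nxt = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n][(x + 2 * y) % n] = current[x][y]
        current = nxt
    return current


def arnold_unscramble(matrix, iterations):
    """Undo arnold_scramble by reading each element back from its target cell."""
    n = len(matrix)
    current = [row[:] for row in matrix]
    for _ in range(iterations):
        prev = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                prev[x][y] = current[(x + y) % n][(x + 2 * y) % n]
        current = prev
    return current


original = [[row * 4 + col for col in range(4)] for row in range(4)]
scrambled = arnold_scramble(original, 1)          # key = 1 iteration
restored = arnold_unscramble(scrambled, 1)        # same key recovers the data
after_full_period = arnold_scramble(original, 3)  # this map has period 3 for N = 4
```

Because the map is periodic, the iteration count only acts as a key modulo the period for the chosen N.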

5 The proposed hybrid chaotic watermarking architecture

5.1 Emitter side

We construct a random-like chaotic signal using the tent map and the logistic map. The watermark is first fused with the original speech signal. The resulting watermarked signal is then combined with the chaotic signal using a chaotic key to generate an encrypted signal, which is then transmitted (Figs. 1, 2 and 3).

Fig. 1 Flowchart of the encryption scheme

Fig. 2 Flowchart of the decryption scheme

Fig. 3 Encryption and decryption speech processes cycle

The encryption process starts by reading a speech signal stored on the hard disk using a Matlab function; Matlab represents the speech samples in the range [−1, 1]. The watermark file is also read. The steps are then as follows:

  1) The user inputs a key (key1) to embed the watermark securely within the original speech signal. The embedding scheme is the one presented in [24]: the watermark is embedded in the DCT and DWT domains using a sub-sampling technique. This method lets the trade-off between transparency and robustness of the watermark be controlled through a shifting value (∆). This step yields a speech signal marked with secret information (the watermark), named Wtr_Sp.

  2) The logistic map and the tent map generate two chaotic signals from the initial values entered by the user, named Lg_S and Tn_S, respectively.

  3) Using the formulas below, the three signals Lg_S, Tn_S and Wtr_Sp are mixed to produce a new signal named Mx_Sg:

$$ \mathrm{Mx\_Sg}_i=\begin{cases}\left(\mathrm{Tn\_S}_i\times \mathrm{Wtr\_Sp}_i+\left(1-\mathrm{Tn\_S}_i\right)\mathrm{Lg\_S}_i\right)-1, & \mathrm{Wtr\_Sp}_i\geq 0\\ \left(\mathrm{Tn\_S}_i\times \mathrm{Wtr\_Sp}_i+\left(1-\mathrm{Tn\_S}_i\right)\mathrm{Lg\_S}_i\right)+1, & \mathrm{Wtr\_Sp}_i<0\end{cases} $$
(1)

where i is the sample index.

  4) Mx_Sg is decomposed into segments, where each segment length is a square number (N × N).

  5) Before applying the Arnold transform, each segment is reshaped into a 2D matrix (N × N elements).

  6) The user enters another key (key2); the encryption process applies that key to each matrix to scramble its elements with the Arnold transform.

  7) Each scrambled matrix is reshaped into a 1D vector of length N².

  8) To obtain the final encrypted speech signal, the scrambled segments are concatenated.
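Steps 2 to 8 can be sketched as follows. This is an illustration under several assumptions, not the implementation behind the reported results: the watermark embedding of step 1 (the method of [24]) is taken as already done, the cat-map form (x, y) → (x + y, x + 2y) mod N is assumed for the Arnold transform, the signal length is assumed to be a multiple of N × N, and all function names and key values are hypothetical.

```python
def logistic_map(a0, r, n):
    out, x = [], a0
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out


def tent_map(b0, u, n):
    out, x = [], b0
    for _ in range(n):
        x = u * x if x < 0.5 else u * (1.0 - x)
        out.append(x)
    return out


def mix(wtr_sp, tn_s, lg_s):
    """Step 3: blend the watermarked speech with the two chaotic signals."""
    mixed = []
    for w, t, l in zip(wtr_sp, tn_s, lg_s):
        base = t * w + (1.0 - t) * l
        mixed.append(base - 1.0 if w >= 0 else base + 1.0)
    return mixed


def arnold_scramble(flat, n, iterations):
    """Steps 5-7: scramble one segment stored row by row as an n x n matrix."""
    for _ in range(iterations):
        nxt = [0.0] * (n * n)
        for x in range(n):
            for y in range(n):
                nxt[((x + y) % n) * n + (x + 2 * y) % n] = flat[x * n + y]
        flat = nxt
    return flat


def encrypt(wtr_sp, a0, r, b0, u, n, key2):
    seg = n * n
    if len(wtr_sp) % seg:
        raise ValueError("signal length must be a multiple of n * n")
    lg_s = logistic_map(a0, r, len(wtr_sp))           # step 2
    tn_s = tent_map(b0, u, len(wtr_sp))
    mx_sg = mix(wtr_sp, tn_s, lg_s)                   # step 3
    cipher = []
    for start in range(0, len(mx_sg), seg):           # step 4
        cipher.extend(arnold_scramble(mx_sg[start:start + seg], n, key2))
    return cipher                                     # step 8


# Toy "watermarked speech" in [-1, 1] and hypothetical keys.
wtr_sp = [0.5, -0.5, 0.0, 0.25] * 4
cipher = encrypt(wtr_sp, a0=0.3, r=3.99, b0=0.4, u=1.99, n=4, key2=1)
```

Note that with this mixing rule the encrypted samples fall in [−1, 2) rather than [−1, 1], since negative speech samples are shifted upward by one.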

5.2 Receiver side

The inverse process is performed here. Using the same chaotic key, the received signal is decrypted by removing the random chaotic signal that was generated on the transmitter side. We then extract the watermark from the decrypted signal and compare it with the original watermark to confirm that the signal is authentic and undegraded (Figs. 2 and 3).

  1) Steps 4 and 5 of the encryption process are applied to the encrypted speech signal.

  2) The inverse Arnold transform is then applied to each 2D matrix using the same key (key2) employed previously.

  3) Each retrieved matrix is reshaped into a 1D vector of length N².

  4) The retrieved segments are concatenated to produce \( Mx\_{Sg}_i^{\prime } \).

  5) Step 2 of the encryption process is repeated with the same initial values.

  6) The decrypted speech samples are recovered as follows:

$$ \mathrm{Wtr\_Sp}_i^{\prime }=\begin{cases}\dfrac{\mathrm{Mx\_Sg}_i^{\prime }+1-\left(1-\mathrm{Tn\_S}_i\right)\mathrm{Lg\_S}_i}{\mathrm{Tn\_S}_i}, & \mathrm{Mx\_Sg}_i^{\prime }<0\\ \dfrac{\mathrm{Mx\_Sg}_i^{\prime }-1-\left(1-\mathrm{Tn\_S}_i\right)\mathrm{Lg\_S}_i}{\mathrm{Tn\_S}_i}, & \mathrm{Mx\_Sg}_i^{\prime }\geq 0\end{cases} $$
(2)

where \( \mathrm{Wtr}\_{\mathrm{Sp}}_{\mathrm{i}}^{\prime } \) is the decrypted speech signal and \( Mx\_{Sg}_i^{\prime } \) results from the fourth step.
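The receiver steps can be sketched as a round trip. The un-mixing rule below is obtained by solving the mixing formula (1) for the speech sample, which is why the chaotic term is subtracted in both branches. As before, the cat-map form (x, y) → (x + y, x + 2y) mod N, the function names and the key values are assumptions made for illustration.

```python
def chaotic_signals(a0, r, b0, u, n):
    """Regenerate Lg_S (logistic) and Tn_S (tent) from the shared keys."""
    lg_s, tn_s = [], []
    a, b = a0, b0
    for _ in range(n):
        a = r * a * (1.0 - a)
        b = u * b if b < 0.5 else u * (1.0 - b)
        lg_s.append(a)
        tn_s.append(b)
    return lg_s, tn_s


def arnold(flat, n, iterations, inverse=False):
    """Cat-map scrambling of an n x n segment stored row by row, or its inverse."""
    for _ in range(iterations):
        out = [0.0] * (n * n)
        for x in range(n):
            for y in range(n):
                src = x * n + y
                dst = ((x + y) % n) * n + (x + 2 * y) % n
                if inverse:
                    out[src] = flat[dst]
                else:
                    out[dst] = flat[src]
        flat = out
    return flat


def unmix(mx_sg, tn_s, lg_s):
    """Step 6: invert the mixing; assumes no Tn_S sample is exactly zero."""
    out = []
    for m, t, l in zip(mx_sg, tn_s, lg_s):
        offset = 1.0 if m < 0 else -1.0
        out.append((m + offset - (1.0 - t) * l) / t)
    return out


def decrypt(cipher, a0, r, b0, u, n, key2):
    seg = n * n
    mx_sg = []
    for start in range(0, len(cipher), seg):                    # steps 1-4
        mx_sg.extend(arnold(cipher[start:start + seg], n, key2, inverse=True))
    lg_s, tn_s = chaotic_signals(a0, r, b0, u, len(cipher))     # step 5
    return unmix(mx_sg, tn_s, lg_s)                             # step 6


# Round trip on toy data with hypothetical keys.
speech = [0.5, -0.25, 0.0, 0.75, -0.5, 0.1, -0.9, 0.3] * 2
lg_s, tn_s = chaotic_signals(0.3, 3.99, 0.4, 1.99, len(speech))
mixed = [t * w + (1.0 - t) * l + (-1.0 if w >= 0 else 1.0)
         for w, t, l in zip(speech, tn_s, lg_s)]
cipher = arnold(mixed, 4, 2)
recovered = decrypt(cipher, 0.3, 3.99, 0.4, 1.99, 4, 2)
wrong_key = decrypt(cipher, 0.3, 3.99, 0.4, 1.99, 4, 1)
```

With the correct keys `recovered` matches `speech` up to floating-point rounding, while `wrong_key` (a different Arnold key) does not.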

Up to this step the speech signal is decrypted but not yet authenticated. To verify that the speech signal is intact and was sent by an authentic party, the decryption process continues with these steps:

  1) The watermark embedded in the decrypted speech signal is extracted using the same key (key1), with the extraction process presented in [24].

  2) The decrypted speech signal is authenticated by checking the similarity between the extracted and the original watermark: the greater the similarity, the more trustworthy the decrypted speech.

6 Performance evaluation metrics

6.1 Correlation coefficient

The correlation coefficient, usually denoted by ‘r’, is a measure of the strength of the linear relationship between two variables [22]. In our case the variables are the original, encrypted and decrypted speech signals. If two variables are strongly related, the correlation coefficient is close to 1. If the coefficient is close to 0, the two variables are unrelated and cannot predict each other.

The correlation coefficient ‘r’ can be calculated using the following formula [6]:

$$ {r}_{S_1{S}_2}=\frac{\frac{1}{L}\sum_{i=1}^{L}\left({S}_{1,i}-E\left({S}_1\right)\right)\left({S}_{2,i}-E\left({S}_2\right)\right)}{\sqrt{\frac{1}{L}\sum_{i=1}^{L}{\left({S}_{1,i}-E\left({S}_1\right)\right)}^2}\times \sqrt{\frac{1}{L}\sum_{i=1}^{L}{\left({S}_{2,i}-E\left({S}_2\right)\right)}^2}},\kern1em \mathrm{where}\ E(S)=\frac{1}{L}\sum_{i=1}^{L}{S}_i $$

L is the length of the speech signals (number of samples), and S1 and S2 are the pair of signals being compared: (original, encrypted) or (original, decrypted).
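A direct transcription of this formula is given below as a sketch; the function name is an illustrative choice.

```python
import math


def correlation_coefficient(s1, s2):
    """Correlation between two equal-length signals, per the formula above."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("signals must be non-empty and of equal length")
    length = len(s1)
    mean1 = sum(s1) / length
    mean2 = sum(s2) / length
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(s1, s2)) / length
    var1 = sum((a - mean1) ** 2 for a in s1) / length
    var2 = sum((b - mean2) ** 2 for b in s2) / length
    # A constant signal has zero variance, which would raise ZeroDivisionError.
    return cov / math.sqrt(var1 * var2)


same_trend = correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_trend = correlation_coefficient([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```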

6.2 Signal to noise ratio (SNR)

To assess the performance of digital speech encryption schemes, the SNR is calculated; here it measures the noise content of the encrypted speech signal. The encryption side always tries to increase the noise content of the encrypted signal so as to minimize its information content, while the decryption side tries to reduce the noise content of the decrypted signal. The signal-to-noise ratio quantifies how much the signal is corrupted by noise. It can be calculated by the equation below [8]:

$$ \mathrm{SNR}=10\times {\log}_{10}\frac{\sum_{\mathrm{i}=1}^{\mathrm{L}}{\mathrm{S}}_{1,\mathrm{i}}^2}{\sum_{\mathrm{i}=1}^{\mathrm{L}}{\left({\mathrm{S}}_{1,\mathrm{i}}-{\mathrm{S}}_{2,\mathrm{i}}\right)}^2} $$

S1,i and S2,i represent the ith samples of the (original, encrypted) or (original, decrypted) speech signals, respectively, and L represents the length of the speech signals.
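The SNR formula can be computed as in the following sketch. The function name and the convention of returning infinity for identical signals are assumptions.

```python
import math


def snr_db(s1, s2):
    """SNR in dB of s2 (processed signal) relative to s1 (original signal)."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("signals must be non-empty and of equal length")
    signal_energy = sum(a * a for a in s1)
    noise_energy = sum((a - b) ** 2 for a, b in zip(s1, s2))
    if noise_energy == 0.0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(signal_energy / noise_energy)


# An error of 0.1 on unit-amplitude samples corresponds to about 20 dB.
example = snr_db([1.0, 1.0], [0.9, 0.9])
```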

6.3 Bit error rate (BER)

To verify that the received encrypted speech signal was sent by a trusted party, we examine the BER. The BER is used to measure the similarity between the original and the extracted watermark images. A BER of zero means that the watermark is unaffected and the extraction is successful, which indicates that the received speech signal was sent by an authenticated party. The BER is expressed by the following formula [24]:

$$ \mathrm{BER}=\frac{{\mathrm{B}}_{\mathrm{ERR}}}{\mathrm{N}}\times 100\% $$

where B_ERR is the number of erroneous bits and N is the total number of bits (the size of the watermark).
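As a small illustration, the BER between two watermarks given as flat lists of bits can be computed as follows (the function name is an assumption):

```python
def bit_error_rate(original_bits, extracted_bits):
    """Percentage of watermark bits that differ after extraction."""
    if len(original_bits) != len(extracted_bits) or not original_bits:
        raise ValueError("watermarks must be non-empty and of equal size")
    errors = sum(1 for a, b in zip(original_bits, extracted_bits) if a != b)
    return errors / len(original_bits) * 100.0


# Two of the four bits differ, so the BER is 50 %.
example_ber = bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
```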

6.4 NSCR and UACI

In our proposal, the unified average changing intensity (UACI) and the number of sample change rate (NSCR) between two encrypted speech signals are computed to measure the degree of variation when the key is modified slightly, in other words to evaluate the sensitivity of the key. The NSCR and UACI of the two encrypted speech signals are calculated using the equations below [26]:

$$ {\displaystyle \begin{array}{c}\mathrm{NSCR}=\sum_{\mathrm{i}=1}^{\mathrm{L}}\frac{{\mathrm{d}}_{\mathrm{i}}}{\mathrm{L}}\times 100\%,\kern0.5em \mathrm{where}\ {\mathrm{d}}_{\mathrm{i}}=\begin{cases}1, & {\mathrm{S}}_{\mathrm{i},1}^{\prime }\ne {\mathrm{S}}_{\mathrm{i},2}^{\prime}\\ 0, & \mathrm{otherwise}\end{cases}\\ {}\mathrm{UACI}=\frac{1}{\mathrm{L}}\left[\sum_{\mathrm{i}=1}^{\mathrm{L}}\frac{\left|{\mathrm{S}}_{\mathrm{i},1}^{\prime }-{\mathrm{S}}_{\mathrm{i},2}^{\prime}\right|}{\operatorname{Max}}\right]\times 100\%\end{array}} $$

\( {\mathrm{S}}_{\mathrm{i},1}^{\prime } \) and \( {\mathrm{S}}_{\mathrm{i},2}^{\prime } \) are the ith samples of the two speech signals obtained with a slight difference in the key.

L: the length of the speech vector.

Max: the sample range. Following [2, 15], when each sample of the speech or audio signal takes an integer value in the range [0, 65,535], Max = 65,535; in the Matlab environment, digital speech is normalized to the range [−1, 1], so Max is 2.
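The two measures can be sketched as below. The sketch assumes the commonly used conventions in which a sample counts towards the NSCR when the two signals differ at that position, and the UACI uses the absolute difference expressed as a percentage; the function name and the default Max = 2 (signals normalized to [−1, 1]) are illustrative choices.

```python
def nscr_uaci(s1, s2, max_range=2.0):
    """Return (NSCR, UACI) in percent for two equal-length signals."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("signals must be non-empty and of equal length")
    length = len(s1)
    changed = sum(1 for a, b in zip(s1, s2) if a != b)
    nscr = changed / length * 100.0
    uaci = sum(abs(a - b) / max_range for a, b in zip(s1, s2)) / length * 100.0
    return nscr, uaci


# Two of four samples differ, each by half of the full range.
nscr, uaci = nscr_uaci([0.0, 0.5, -0.5, 1.0], [0.0, -0.5, -0.5, 0.0])
```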

7 Experimental results

In this part, we assess the proposed scheme through experimental tests on two PCs running Windows 7 (32-bit) with dual-core CPUs: the first with 2 GB of RAM and MATLAB 7.1, the second with 4 GB of RAM and MATLAB 8.1. The second machine was used to measure the elapsed execution time.

All experiments use 20 speech files that include male and female voices of different durations. These mono voice samples were selected randomly and use 16 bits per sample. Table 1 lists the speech signals used, with their durations and speaker gender, taken from the well-known TIMIT speech database. Table 2 presents the initial values of the two chaotic maps (tent and logistic) with which all results were obtained. The watermark image (16 × 16 bits) is shown in Fig. 4.

Table 1 The used speech signals with duration taken from the famous voices database TIMIT
Table 2 The initial statics data values of the two Chaotic maps
Fig. 4 The watermark used image

7.1 Key space

The total number of different keys that can be used in an encryption system is called the key space [13]. A good encryption system needs to offer a large key space to compensate for the degradation of chaotic dynamics in finite-precision computation, and thus to prevent attackers from decrypting the original data even after investing large amounts of resources and time [9]. Even leaving aside the logistic and tent maps, Arnold scrambling alone can give a wide key space: the number of possible permutations of the positions of an M × M matrix is (M × M)!. For example, for an 8 × 8 matrix this gives (8 × 8)! ≈ 1.27 × 10^89, and the number grows much larger still for matrices with sides in the hundreds.

According to [17], for a cryptosystem to resist brute-force attack, the size of the key space should be larger than 2^128 (≈ 3.40 × 10^38). On this basis we can conclude that the proposed cryptosystem can resist brute-force attacks well enough for reliable practical use.
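The two magnitudes quoted above are easy to check with exact integer arithmetic. The snippet below only verifies the arithmetic of the permutation count and of the 2^128 threshold; the variable names are illustrative.

```python
import math

# Number of permutations of the positions of an 8 x 8 matrix.
permutations_8x8 = math.factorial(8 * 8)

# Brute-force threshold on the key space.
threshold = 2 ** 128

permutations_text = f"{permutations_8x8:.2e}"  # scientific notation
threshold_text = f"{threshold:.2e}"
equivalent_bits = math.log2(permutations_8x8)  # key-space size in bits
```

The permutation count corresponds to roughly 296 bits, well above the 128-bit threshold.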

7.2 Keys sensitivity analysis

All the initial values (a0 and r for the logistic map, b0 and u for the tent map) and K for Arnold scrambling are keys, so we tested the sensitivity of the encryption algorithm by changing one key or several keys. Table 3 shows the values of the correlation coefficient, NSCR and UACI obtained by encrypting one of the selected speech signals with one set of keys and then attempting to decrypt the encrypted speech signal with different sets of keys. To examine the effect and importance of the keys, the speech signal is first encrypted, and the metrics mentioned previously are then calculated between the speech signals decrypted with the true keys and with wrong keys. The NSCR shows that the two decrypted speech signals differ in nearly 100% of their samples. The UACI confirms that the sample intensities of the decrypted speech signals diverge. Finally, the correlation coefficient confirms that the relationship between them is weak, particularly when the Arnold key is changed. From the obtained results we can conclude that even slight changes to the encryption key values during decryption lead to wrong decryption results.

Table 3 Keys Sensitivity

7.3 SNR and correlation coefficient

7.3.1 Encryption process effect

The encryption is considered better when the correlation coefficient is close to zero and when the SNR is low. From the data gathered in Table 4, we observe that the SNR values are very small and the correlation coefficient values are close to zero, some of them negative. This shows that the encrypted signal is very far from the original speech signal and indicates that the characteristics of the original signal are completely concealed.

Table 4 Numerical results of the Signal to Noise Ratio (SNR) and Correlation Coefficient (CC) between the original and the encrypted signals

7.3.2 Decryption process effect

The quality of the speech signal recovered from the encrypted signal is an essential characteristic; without it, the encryption process is of no practical use. We therefore assess the quality of the decrypted signal using the two previous measures (correlation and SNR). Table 5 gives these measures for all speech signals. The obtained values are excellent. The smallest correlation coefficient is 0.99943, which is close to 1 and signifies that there is almost no difference between the original and the decrypted speech signals. The SNR values are also significant. The variations in SNR values are due to the duration and energy of the speech signals (see Table 5). From this discussion, we can conclude that the proposed scheme largely preserves the quality of the speech signal when it is decrypted.

Table 5 Numerical results of the Signal to Noise Ratio (SNR) and Correlation Coefficient (CC) between the original and the decrypted signals

7.4 Waveforms review

7.4.1 Original and encrypted speech signals

Waveform A in Figs. 5, 6, 7 and 8 shows the original speech signals SI770, SI839, SI943 and SI1217, respectively. Waveform B shows the encrypted signal and, for clarity, waveform B is also shown in two halves. From these figures we can clearly see that there is no similarity between the original speech signal (A) and its encrypted version (B), which is regular and uniform in appearance and bears no relation to the variations of the original waveform (A).

Fig. 5 SI770 waveforms: (A) original, (B) encrypted, and the first and second halves of the encrypted signal (B)

Fig. 6 SI839 waveforms: (A) original, (B) encrypted, and the first and second halves of the encrypted signal (B)

Fig. 7 SI943 waveforms: (A) original, (B) encrypted, and the first and second halves of the encrypted signal (B)

Fig. 8 SI1217 waveforms: (A) original, (B) encrypted, and the first and second halves of the encrypted signal (B)

7.4.2 Original and decrypted speech signals

The speech signals SI1715, SI2194, SI2303 and SX29 are shown in the first waveform of Figs. 9, 10, 11 and 12, respectively, and the decrypted speech signals are presented in the second waveform of the same figures. The third waveform illustrates the difference between the original speech signal and the decrypted one. Even on close inspection of the waveforms, we cannot distinguish the original speech signal from the decrypted signal; the difference only becomes visible in the difference waveform. This difference waveform has a very small amplitude (within about ±0.01), and based on this tiny difference we can conclude that the original and decrypted speech signals are very close to each other.

Fig. 9 SI1715 waveforms: (A) original, (B) decrypted, and the difference between the original and the decrypted speech

Fig. 10 SI2194 waveforms: (A) original, (B) decrypted, and the difference between the original and the decrypted speech

Fig. 11 SI2303 waveforms: (A) original, (B) decrypted, and the difference between the original and the decrypted speech

Fig. 12 SX29 waveforms: (A) original, (B) decrypted, and the difference between the original and the decrypted speech

7.5 Watermark control and authentication

The proposed scheme adds a watermark to the original speech signal during the encryption process and extracts this watermark during the decryption process. The purpose of this operation is to enhance security and credibility: successfully extracting the watermark during decryption confirms that the received signal is authentic and was transmitted from an authenticated source without any alteration in the transmission medium. Table 6 provides results when the speech signal is attacked by additive white Gaussian noise (AWGN). The data in this table show that the watermark is extracted successfully in the presence of small noise, but that when the noise increases considerably it affects the watermark, which indicates that the speech signal itself is affected. This is visible in the BER values, which imply that the transmitted encrypted signal has suffered some attack. Note that the strength of the watermark embedding can be controlled, so that the sensitivity of the watermark to attacks can be increased or decreased simply by varying the value of ∆.

Table 6 BER values variation after speech Signals AWGN attacks

Reversible watermarking is typically used when a watermark is inserted into a medical image and the watermarked image is transmitted: the complete removal of the watermark on the recipient’s side is important, and after removal the original image is fully restored and unchanged. In our case, since the quality of the decrypted speech signal is acceptable and the SNR is greater than the required value (20 dB), the watermarking does not need to be reversible. The only condition on the watermark is that it does not affect the encrypted speech signal.

7.6 Time complexity analysis

Table 7 presents the elapsed time to accomplish the encryption/decryption operations using the proposed scheme on some speech signals.

Table 7 Elapsed times for encryption/decryption operations

We observe from Table 7 that the time taken by the proposed algorithm to complete the encryption/decryption process is less than the duration of the speech signal (see Table 1), so we can consider that the proposed scheme works in real time. This can be explained by its efficient and inexpensive use of the computing resources. Figure 13 shows the speech signal durations together with two curves in different colours that represent the execution times of the two operations (encryption and decryption). We can deduce from this figure that the length of the speech signal affects the execution time of the algorithm roughly proportionally: as the speech signal length increases, the processing time increases, while remaining within real time.

Fig. 13 Speech signal durations and the execution time variation of the two operations (encryption and decryption)

7.7 Comparisons

The previous results confirm that the proposed scheme gives excellent and reliable results. To further substantiate that our design merits interest and may be considered among the best methods, we compare it with other recently published strong approaches.

Based on the results in Table 8, the proposed method performs better than the other methods in the comparison on many measures and is very close on the others. The correlation coefficient between the original and the encrypted speech signal in the proposed approach ranks second in closeness to zero, just after the method proposed in [26]. The remaining values are also close to zero, indicating good-quality encryption.

Table 8 Comparison between the proposed approach and seven published methods

The correlation coefficient between the original and the decrypted speech signal recovered at the receiver is the best for the proposed scheme, with a value of one, which means that there is no difference between the original and the decrypted speech signal. In addition, the proposed approach has the advantage of the value farthest from one compared with the other methods. The very robust methods presented in [12, 15] show significant SNR values between the original and the decrypted speech signal, and the proposed scheme follows with an SNR value of 34.08 dB. However, the SNR between the original and the encrypted signal is the smallest for the proposed scheme, which demonstrates that its encrypted signal is farther from the original speech signal than those of the other methods.

8 Conclusion

In this work, we presented a novel scheme for securing speech signals using three approaches. First, a chaotic generator (tent and logistic maps) produces a random-like vector from initially supplied values, which is merged with the speech signal values. Second, a watermark is included inside the encrypted signal so that, during decryption, it can be verified that the encrypted signal is authentic and has not undergone external attacks. Third, Arnold scrambling (cat map) is used to disperse the signal samples with a secret key, and recovering the original signal from the samples is not achievable without this key. As a result, we can say that a larger key space is a measure of better encryption, and the correlation value obtained with the proposed scheme is close to zero, which shows that the original and encrypted signals are essentially uncorrelated. We also recovered the original speech without affecting its quality.