1 Introduction

Nowadays, we witness an unprecedented development and use of information and technology systems coupled with an expansion of complex communication networks. These interconnected networks which vehicle huge and diverse amounts of information are also connected to internet. Consequently, the information which circulates freely in several forms: audio, images, textual and otheris potentially subject to interception by unauthorized persons or systems for whom it is not intended and can be, therefore, falsified, exchanged, stored, diverted, copied, etc. This situation can raise major security concerns when the information is confidential such as in military, government or corporate speech communication. It is, thus, necessary to ensure confidentiality and integrity of the informationin face of any undesirable interception by adversaries.

Confidentiality of speech communication can be achieved by making the information incomprehensible for all intruders through modern cipher of data which is an effective means to resolve this security problem. The contribution of this work is focused on two major problems: the first problem is closely related to the very nature of the type of encryption chosen, namely stream cipher. Stream cipher is one of the two types of secret key encryption which, unlike block cipher, consists of adding bit to bit to the clear text a sequence of bits of the same length as the text to be encrypted. The added bit sequence is called the keystream and, it also represents the secret key. This system ensures perfect security under the condition that the keystream is a completely random sequence of the same size as the text to be encrypted and that it can be exchanged in a safe manner between the sender and the receiver.

However, this kind of stream encryption method presents a great disadvantage because it is neither reasonable nor practical to share a secret key whose length equals that of the text to be transmitted. In the case of speech encryption, the secret key must have the same size as that of the speech. The second problem is specific to speech signal encryption. In many cases, total encryption of a speech file is not necessary and the encryption effort can be targeted to those parts of the speech file containing useful and confidential information. Knowing that the time required for the encryption of a speech signal is directly proportional to its size (The larger the speech file, the longer the time required for its encryption), selective encryption of useful parts of the speech file can significantly reduce the encryption time and effort.

Several methods that deal with speech encryption are available in the literature Rahman et al. (2012), Musheer et al. (2012), Ashtiyani et al. (2012), Maysaa and Iman Qays (2013), Abd Elzaher et al. (2016), Farsana and Gopakumar (2016), Belmeguenai et al. (2015) and Belmeguenai et al. (2015). While some techniques use chaotic signals, others methods are based on registers Belmeguenai et al. (2015) and Belmeguenai et al. (2015). However, all of these contributions only address the first problem. In this work we propose a new approach by which both problems discussed above are addressed simultaneously. In fact, several schemes have been proposed to solve the problem related to the length of the key but these are rather complex and difficult to implement.

In this work the problem of the secret key is approached by using pseudo-random sequences generated in a deterministic way from a short secret key which can be exchanged more easily between sender and receiver. The mechanism for generating pseudo-random sequences relies on a set of linear feedback shift registers (LFSR) and nonlinear feedback shift registers (NLFSRs) combined with a Boolean function. On the other hand, the proposed method considerably optimizes the time required for encryption by focusing the encryption effort on a portion of the speech file which we call the useful information. This is achieved by removing silence/unvoiced parts of the speech based on some appropriate energy level threshold of the speech samples. The proposed method is simple, fast and easy to implement with the advantage of significant reduction of the computation time of the encryption compared to other stream cipher systems Gammel et al. (2006), Hell et al. (2006), Cid et al. (2009).

The paper is organized as follows. The Sect. 2 gives the notion of partial speech data encryption. In Sect. 3 the proposed pseudo-random generator criteria and choices are presented. Section 4 presents the proposed approach. The analysis and the results are given in Sect. 5. The concludes is presented in Sect. 6.

2 Partial speech data encryption

In partial encryption, the encryption process is not applied to the entire information set but, is limited to a selected subset of the original information. By limiting encryption to only a portion of the information, partial encryption allows significant reduction in required computing resources and encryption time. Selection of useful information is done manually or automatically depending on the application at hand Lamel et al. (1981) and Ramirez et al. (2007). This work adopts automatic detection and selection of voiced parts from silence/unvoiced parts based on the energy level of the speech samples. Very low energy samples are considered silence/unvoiced and thus removed. In order to achieve selection of useful speech information, the algorithm described in Burileanu et al. (2000) is implemented. Figure 1 shows autamatic selection of useful speech information.

3 Pseudo-random generator

The production of a pseudo-random generator (PRG), which is as resilient as possible to known attacks, requires an adequate mathematical base which makes it possible to generate robust and unpredictable pseudo-random sequences. An illustration of the different blocks used in the proposed pseudo random generator is depicted in Fig. 2.

The pseudo-random generator consists of one 256-bit linear Feedback Shift Register (LFSR), one 256-bit Non-Linear Feedback Shift Register (NLFSR) and a nonlinear Boolean function g. The content of both registers are denoted by \(s_{i+1},s_{i+2},...,s_{i+256}\) for the LSFR and \(u_{i+1},u_{i+2},...,u_{i+256}\) for NLSFR. The recurrence relationship for output of LFSR is given by:

(1)

The NLFSR uses the following recurrence relation:

(2)

The 512 contents in the LFSR and NLFSR represent the state of the PRG. From this state, 14 variables are taken as input to a Boolean function, g. seven inputs to g are taken from the NLFSR and seven are taken from the LFSR. The Boolean function g is chosen to be balanced, correlation immune of the thirth order, has algebraic degree 4 and has nonlinearity 7936. Its output is defined as

(3)

The Keystream bits Z(i) is produced by combining the two bits from the state of NLFSR and one bit from the state of LFSR with the output of a non-linear filtering function g, as:

$$\begin{aligned} Z(i)= u(i+73)\oplus u(i+89)\oplus s(i+93)\oplus g(i). \end{aligned}$$
(4)
Fig. 1
figure 1

The process of automatic selection of useful speech information: a original speech 1, b selected part to be encrypted, c original speech 2 and d selected part to be encrypted

Fig. 2
figure 2

Pseudo-random generator

4 Proposed approach

The fundamental object of this contribution is to propose an approach that allows a sender and receiver to exchange speech signals through insecure channel in such a way that the exchanged speeches are totally unintelligible to any unauthorized third party hijacker. The main idea of this work is inspired from the references Kulkarni et al. (2008), Bhatnagar and onathan Wu (2012) and Belmeguenai et al. (2015). The block diagram of the overall proposed approach is shown in Fig. 3.

4.1 Input speech

The input speech signals used to validate the proposed approach are three files from the TIMIT database Garofolo (1993). The files denoted si1699, si2257 and si2329 are available in .sph file format. Before any processing is initiated, these speech files are to be converted into .wav file format using the ‘readsph.m’ Matlab function.

4.2 Silence removal and normalization

The first step in the encryption process, at the sender side, is the pre-processing of the original speech with the purpose of ridding the signal of its silent/unvoiced parts. The remaining voiced parts are concatenated to form the useful speech signal to be encrypted. This pre-processing step is implemented automatically using the silence removal algorithm presented in Burileanu et al. (2000). Figures 1b and d illustrate the results of useful speech data selection carried out on two different speech signals. The selected signal is normalized by applying the following relationship

$$\begin{aligned} X=255\times \frac{U-Min(U)}{Max(U)-Min(U)}. \end{aligned}$$
(5)

where U is the useful speech signal, Max(U) and Min(U) are the maximum and minimum values of U, respectively, and X is the resulting normalized useful information to be encrypted. Subsequent to normalization, the useful data is converted into useful information digits using 8-bit quantization to obtain digital values in the range 0 to 255.

Fig. 3
figure 3

Block diagram of the encryption-decryption process

4.3 Encryption

Starting from an initial secret key k (initial values of LFSR and NLFSR), the sender generates a pseudo-random sequence (keystream) equal in size to the selected useful information digits to be encrypted. The encrypted useful information digits are obtained by XOR-ing each selected useful information digit with the keystream before being transferred to the receiver as shown in Fig. 3. The encryption process of the digital useful data is summarized in the following steps:

  • Generation of the keystream \(Z_{i}\) according to Eq. 4.

  • XOR-ing the normalized digital information \(X_{i}\) with keystream \(Z_{i}\) to produce the normalized encrypted useful information digits \(M_{i}\) such that; \(M_{i}=X_{i}\oplus Z_{i}.\)

  • Transmission of the normalized encrypted information Mi through the insecure communication channel.

4.4 Decryption

Having exchanged the secret key k with the sender, the receiver uses this key to restore of the normalized useful digital information by carring out the following the procedure:

  • Reception of the normalized ecrypted digital information \(M_{i}.\)

  • Generation of the keystream \(Z_{i}\) using the exchanged secret key k in Eq. 4.

  • Compute the normalized decrypted digital information \(X_{i}\) by XOR-ing the normalized ecrypted digital information \(M_{i}\) with the keystream \(Z_{i}:\) \(X_{i}=M_{i}\oplus Z_{i}.\)

4.5 Denormalization

The final step in the ecryption-decryption process to ultimately recover the original selected speech signal is denormalization of the decrypted speech digits \(X_{i}\) according to the following relation:

$$\begin{aligned} U=\frac{Max(U)-Min(U)}{255} X+ Min(U). \end{aligned}$$
(6)
Fig. 4
figure 4

Test for si1699: a original speech, b selected speech samples for encryption, c spectrum of selected speech, d normalized selected speech, e encrypted normalized selected speech, f spectrum of encrypted normalized selected speech

Fig. 5
figure 5

Test for si2257: a original speech, b selected speech samples for encryption, c spectrum of selected speech, d normalized selected speech, e encrypted normalized selected speech, f spectrum of encrypted normalized selected speech

Fig. 6
figure 6

Test for si2329: a original speech, b selected speech samples for encryption, c spectrum of selected speech, d normalized selected speech, e encrypted normalized selected speech, f spectrum of encrypted normalized selected speech

Fig. 7
figure 7

Test using Graine-128: a encrypted normalized selected speech of signal si1699, b spectrum of signal showen by a and c histograms of signal showen by a, d encrypted normalized selected speech of signal si2257, e spectrum of signal showen by d and f histograms of signal showen by d, g encrypted normalized selected speech of signal si2329, h spectrum of signal showen by g and j histograms of signal showen by g

Fig. 8
figure 8

Test using Achterbahn-128: a encrypted normalized selected speech of signal si1699, b spectrum of signal showen by a and c histograms of signal showen by a, d encrypted normalized selected speech of signal si2257, e spectrum of signal showen by d and f histograms of signal showen by d, g encrypted normalized selected speech of signal si2329, h spectrum of signal showen by g and j histograms of signal showen by g

Fig. 9
figure 9

Test using Rakaposhi: a encrypted normalized selected speech of signal si1699, b spectrum of signal showen by a and c histograms of signal showen by a, d encrypted normalized selected speech of signal si2257, e spectrum of signal showen by d and f histograms of signal showen by d, g encrypted normalized selected speech of signal si2329, h spectrum of signal showen by g and j histograms of signal showen by g

5 Analysis and discussion

In the sequel, the performance of the proposed scheme is demonstrated and analyzed through simulation experiments carried out on a MATLAB.7.5 platform. For this purpose, three speech files denoted si1699, si2257 and si2329 rare used as input speeches to the proposed encryption-decryption algorithm. Pre-processing and encryption results of these speech files sampled at a frequency of 16 Khz are shown in Figs. 4, 5 and 6. Figures 7, 8 and 9 illustrate the spectrums and histograms of encryption using Grain-128, Achterbahn-128 and Rakaposhi systems, respectively.

5.1 Histogram analysis

It is clear that the histogram of the encrypted speech illustrated in Fig. 10 is almost uniformly distributed, and different from the histogram of the original speech. Thus, the encrypted speech provides no index that would facilitate the use of a statistical attack on the proposed scheme, making statistical attacks difficult. These properties indicate that the proposed scheme provides high security against statistical attacks. Figure 10 shows the histograms of original speeches and their corresponding encrypted speeches.

Fig. 10
figure 10

Frames a, c and e show histograms of selected useful speeches in 4b, 5b and 6b, respectively. Frames b, d and f show histograms of their corresponding encrypted speeches, respectively

5.2 Key space

To further strengthen security of the system, the secret key of the proposed scheme is constituted not only of the initialization of a single register, but of both registers (LFSR and NLFSR) concatenated as a single string of 512-bit length. This large string of bits can ensure maximum security, and can reasonably prevent any attempts of attack by adversaries. Furthermore, the larger the size of the secret key, the less effective are exhaustive attacks. In the proposed scheme, \(2^{512}\) initializations are necessary for an exhaustive search which is large enough to prevent an exhaustive attack.

5.3 Key sensitivity

The most important feature of a cryptosystem based on stream cipher is key sensitivity. For example, a small change in the secret key leads to different results during the encryption and decryption of data. Correct decryption needs complete and precise knowledge of all of the secret key bits; otherwise the data cannot be decrypted. Figure 11 shows the decryption using a wrong key (one bit has been changed).

Fig. 11
figure 11

Sensitivity analysis: a decrypted speech 1 with wrong key, b the spectrum, c the histogram, d decrypted speech 2 with wrong key, e the spectrum and f the histogram

5.4 Correlation coefficient

The correlation coefficient analysis is a statistical measure to evaluate the mutual relationship between two variables. While a correlation coefficient that is equal to unity is an indication of a strong correlation between the original data and its corresponding encrypted data, a zero valued correlation coefficient points to a weak relationship between the signals. Correlation can be calculated as follows:

$$\begin{aligned} Cor(x,y)=\frac{Cov(x,y)}{\sigma _{x}\sigma _{y}}. \end{aligned}$$
(7)

where Cov is the covariance, \(\sigma _{x}\) and \(\sigma _{y}\) are the standard deviations of x any, respectively. Table 1 summarizes and compares correlation coefficients for the test speeches. It indicates that the proposed algorithm has correlation values close to zero which confirms its excellent performance.

Table 1 Correlation coefficient

5.5 Signals to noise ratio (SNR)

The signal to noise ratio is a statistical metric to evaluate the performance of any cryptosystem; it measures the information content in the encrypted data. It can be calculated as follows:

$$\begin{aligned} SNR=10\times \log _{10} \frac{E(x)}{E(d)}. \end{aligned}$$
(8)

where E(x) is the mean square of the original data and E(d) is the mean square difference between the original and reconstructed data. In Table 2 the proposed scheme gives better negative signal to noise ratio.

Table 2 Signals to noise ratio SNR(dB)

5.6 Peak signal to noise ratio (PSNR)

Peak signal to noise ratio can be used to evaluate an encryption scheme. PSNR reflects the encryption quality. It is a measurement which indicates the changes in values between the original speech and it’s encrypted form. The formula to calculate PSNR(dB) is:

$$\begin{aligned} PSNR=10\times \log _{10} \left[ \frac{n\times (Max(U))^{2}}{E(d)}\right] \end{aligned}$$
(9)

where n is the length of the original speech. The lower values of PSNR, the higher is the encryption quality. Table 3 summarizes the PSNR values.

Table 3 Peak signal to noise ratio PSNR(dB)

6 Conclusion

In this work, a novel approach for selective speech encryption based on stream cipher is elaborated. The speech signal is pre-processed to remove automatically the silent/unvoiced parts and produce the selected useful speech to be encrypted. This allows a drastic reduction in computing time and resources. The proposed pseudo-random generator allows a large key space which makes the system secure from brute force attack. The algorithm is computationally simple and highly secure and easy to implement for speech encryption and decryption. The experimental results show that the proposed scheme gives large key space, high key sensitivity, lower correlation, good SNR and a more uniform histogram. Analysis of the proposed approach shows superior performance in comparison with Grain-128, Achterbahn-128 and Rakaposhi algorithms.