1 Introduction

Many methods have been proposed for encrypting audio signals. An attacker tries to recover the secret key used for encryption. One of the major characteristics of audio is the presence of silent periods in the signal; an attacker can exploit these periods to find the key, because they act as ciphertext whose plaintext is known. Encryption methods therefore try to hide these periods, referred to as residual intelligibility, to make the process robust and highly secure. Connecting this service with the cloud of ERTU via several methods such as Wi-Fi and mobile networks is also an important issue to address.

One such method enciphers an image, embeds it in the right part of the tracks, and applies a transformation whose result is placed in the left side of the track; the whole operation is based on a chaotic map (Yin and Min 2010). Another algorithm scrambles the audio file in multiple dimensions, which gives more security against attacks (Li et al. 2009). Lessons learned on the usage of call logs for security and management in Internet Protocol (IP) telephony are described in Tartarelli et al. (2010). The balance of security strength and energy for a phasor measurement unit (PMU) monitoring system in the smart grid is presented in Qiu et al. (2012). Secure targeted broadcast in the smart grid is addressed in Fadlullah et al. (2012). Network access security for the Internet, namely the protocol for carrying authentication for network access, is covered in Marin-Lopez et al. (2012). The Euler project, an application of software defined radio in joint security operations, is demonstrated in Baldini et al. (2012). Security and network operations challenges with cellular infrastructure in the tactical theater are discussed in Elmasry et al. (2012). Physical layer security in the wireless smart grid is described in Lee et al. (2012). Secure service provision in smart grid communications is described in He et al. (2012). A secure wireless communication system for the smart grid with rechargeable electric vehicles is presented in Su et al. (2012). Synchronized multimedia streaming on the iPhone platform with network coding is presented in Vingelmann et al. (2011). A new tutorial on IEEE 802.1AS that updates an earlier description, together with new simulation results on timing performance for the synchronization of audio/video bridging networks, is provided in Garner and Ryu (2011). A survey of some of the prevalent and upcoming backhaul technology trends based on the evolution within the radio access network (RAN) is given in Raza (2011); wherever possible, it also presents a critical analysis of each technology trend for its technical and commercial feasibility.

Cloud computing for television has been used to broadcast live TV to cell phones via satellite, terrestrial towers or Wi-Fi networks. Land-based broadcasting techniques send analog or digital TV signals over the air from terrestrial base stations. A mobile telephone with a TV antenna and an analog or digital TV tuner can pick up the signals (Yasumoto et al. 2011; Polák and Kratochvíl 2011, 2012; Eldin et al. 2013; Tamayo-Fernández et al. 2011; Constantiou and Mahnke 2010). Some standards rely on satellite broadcasting to deliver live TV to cell phones. They can broadcast from satellite to mobile telephone, from satellite to base station to phone, or use both techniques simultaneously (Iqbal and Ahmed 2011). The Wi-Fi broadcast method streams live TV signals via the Internet. A web-enabled smartphone with data capabilities can pick up the stream from any Wi-Fi hotspot or WiMAX coverage area (Högberg 2010; Mierau et al. 2011).

The rest of this paper is organized as follows. Section 2 presents the software package proposed for ERTU cloud computing, together with the chaotic encryption results. Section 3 gives the Multiple Key (MK) encryption results, taking the noise effect into consideration. Finally, Sect. 4 concludes the paper.

2 The proposed software package for audio encryption of CICOMP

Figure 1 shows the layout of audio encryption proposed for ERTU cloud. This layout indicates the flow of the audio through the proposed package during processing. The graphical user interface for the proposed package is shown in Fig. 2. As shown, the package contains five pools as follows:

  • Scenario, in which the user chooses the multimedia to be protected or transmitted.

  • Flowcharts, in which the user can monitor the main characteristics of the audio, such as time domain, spectrogram, power, time/frequency and signal distribution.

  • Encryption, in which the audio is encrypted by either the chaotic or the MK algorithm. The package supports many transforms, such as the discrete cosine transform (DCT), discrete wavelet transform (DWT) and discrete sine transform (DST). The package also presents a new technique using an audio tone (AT), to be discussed later.

  • Recovery, in which the audio is recovered by AuthGs through the recovery process.

  • Metrics, in which many metrics are supported, such as elapsed time, log-likelihood ratio (LLR), correlation, and spectral distortion (SD), to enable the user to verify protection and security.

Fig. 1 The layout of the audio encryption used in the ERTU cloud

Fig. 2 Software package GUI of the audio encryption proposed for the ERTU cloud

A spectrogram is a graph that shows the frequency content of speech versus time. It gives a visual indication of, and a comparison between, the original and processed audio. The signal distribution gives the distribution of the signal’s amplitude and is a good tool for comparing processed and original speech. Histograms show the distribution of data values across a data range, which may be divided into many categories.

Scatter plots (also called scatter diagrams) are used to investigate the possible relationship between two variables that both relate to the same “event.” A straight line of best fit (using the least squares method) is often included. The scatter diagram helps to identify the existence of a measurable relationship between two items by measuring them in pairs and plotting them on a graph.

SD: in the frequency-variant spectral distance measure, the linear predictive coding (LPC) smoothed spectrum of speech in the 300–3400 Hz band is divided into six bands, and individual spectral distances are then computed over each band. Put another way, SD measures the distance between the original signal and the processed signal in the frequency domain. It simply evaluates the squared difference of the log-magnitude functions over an appropriate frequency band (Sridharan et al. 1991; Hedelin et al. 1999). It can be calculated as:

$$\begin{aligned} \hbox {SD}_{\mathrm{v}_\mathrm{o}}^{2}&= \frac{1}{2\mathrm{v}_\mathrm{o}}\int \nolimits _{-\mathrm{v}_\mathrm{o}}^{\mathrm{v}_\mathrm{o}} \Big (10\log _{10} \left| A\left( \hbox {e}^{\mathrm{j}2\pi \mathrm{v}} \right) \right| ^{2}\nonumber \\&\quad -10\log _{10} \left| \hat{A}\left( \hbox {e}^{\mathrm{j}2\pi \mathrm{v}} \right) \right| ^{2}\Big )^{2}\,\hbox {dv} \end{aligned}$$
(1)

where 1/A(z) is a filter model and \(1/\hat{A}(\hbox {z})\) is its quantized counterpart.
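As a rough illustration of Eq. (1), the following Python sketch approximates the integral with a midpoint sum. It assumes that each model is given as the coefficient vector of A(z) in powers of \(z^{-1}\), and the names `lpc_response`, `spectral_distortion`, `steps` and the example coefficients are hypothetical choices made here, not part of the package.

```python
import cmath
import math


def lpc_response(coeffs, v):
    """Evaluate A(e^{j 2 pi v}) for A(z) = sum_k coeffs[k] * z^{-k}."""
    return sum(c * cmath.exp(-2j * math.pi * v * k) for k, c in enumerate(coeffs))


def spectral_distortion(a, a_hat, v0=0.5, steps=512):
    """Approximate SD (in dB) of Eq. (1) with a midpoint Riemann sum.

    v0 is the normalized band edge; steps is an assumed grid resolution.
    """
    dv = 2.0 * v0 / steps
    total = 0.0
    for i in range(steps):
        v = -v0 + (i + 0.5) * dv
        ref = 10.0 * math.log10(abs(lpc_response(a, v)) ** 2)
        test = 10.0 * math.log10(abs(lpc_response(a_hat, v)) ** 2)
        total += (ref - test) ** 2 * dv
    # Eq. (1) defines SD squared, so take the square root at the end.
    return math.sqrt(total / (2.0 * v0))


a = [1.0, -0.9]        # hypothetical first-order model A(z)
a_hat = [1.0, -0.85]   # hypothetical quantized counterpart
print(spectral_distortion(a, a), spectral_distortion(a, a_hat))
```

Identical models give an SD of zero, and the value grows as the quantized spectrum departs from the original one.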

LLR is a measure of spectral similarity between two signals, and it has found wide use in speech recognition and verification (Crochiere et al. 1950; Tribolet et al. 1978). The principal assumption on which the LLR distance is based is that speech can be represented by a pth order all-pole model of the form:

$$\begin{aligned} { x}\left( { n} \right) =\sum \limits _{{m}={ 1}}^{p} {a}_{m} {x}\left( {{n}-{m}} \right) +{G}_{x} {u}\left( {n} \right) \end{aligned}$$
(2)

where \(x(n)\) is the sampled speech signal, \(\hbox {a}_\mathrm{m} (m\,=\,1,\,2,\ldots ,p)\) are the coefficients of an all-pole filter \(1/\hbox {A}_\mathrm{x}(\hbox {z})\), which models the resonances of the speech production mechanism, \(G_x\) is the gain of the filter, and u(n) is an appropriate excitation source for the filter. For a waveform coder, \(x(n)\) is the input speech, which can be modeled according to (2), and \(y(n)\) is the decoded output. The LLR for comparing \(x(n)\) and \(y(n)\) can then be defined as:

$$\begin{aligned} {l}={\log } \left[ {\frac{{a}_{x} {R}_{y} {a}_{x}^{t} }{{a}_{y} {R}_{y} {a}_{y}^{t} }} \right] \end{aligned}$$
(3)

where

  • \(a_x=\,\hbox {LPC}\) coefficient vector \(\left( {1,a_1 ,a_2 \ldots a_p } \right) \) for the original speech \(x(n),\)

  • \(a_y =\,\hbox {LPC}\) coefficient vector \(\left( {1,G_1 ,G_2 \ldots G_p } \right) \) for the coded speech \(y(n)\)

  • and \(R_y\) is the correlation matrix of \(y(n)\).
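A minimal Python sketch of Eq. (3) is given below. It assumes that the LPC vectors are already available and that \(R_y\) is the Toeplitz matrix built from the biased autocorrelation of \(y(n)\); the helper names and the sample values are hypothetical and serve only to illustrate the formula.

```python
import math


def autocorrelation_matrix(y, order):
    """Toeplitz matrix R_y built from biased autocorrelation lags 0..order."""
    n = len(y)
    lags = [sum(y[i] * y[i + k] for i in range(n - k)) / n
            for k in range(order + 1)]
    return [[lags[abs(i - j)] for j in range(order + 1)]
            for i in range(order + 1)]


def quadratic_form(a, R):
    """Compute a * R * a^t for a row vector a."""
    size = len(a)
    return sum(a[i] * R[i][j] * a[j] for i in range(size) for j in range(size))


def llr(a_x, a_y, y):
    """Log-likelihood ratio of Eq. (3) for LPC vectors a_x and a_y."""
    R_y = autocorrelation_matrix(y, len(a_x) - 1)
    return math.log(quadratic_form(a_x, R_y) / quadratic_form(a_y, R_y))


y = [1.0, 2.0, 0.5, -1.0, -2.0, 0.0, 1.5, -0.5]   # hypothetical decoded samples
a_x = [1.0, -0.5, 0.2]                            # hypothetical LPC vectors
a_y = [1.0, -0.9, 0.3]
print(llr(a_y, a_y, y), llr(a_x, a_y, y))
```

When both coefficient vectors coincide the ratio is one and the LLR is zero, which matches the very small value reported for the recovered signal.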

Correlation measures the degree of similarity between the original speech and the processed speech, and is given by Naeem et al. (2009):

$$\begin{aligned} { C}=\frac{ \sum \nolimits _{{ r}=1}^{ H} \sum \nolimits _{{ c}=1}^{ W} \left( {{ I}_1 \left( {{ r},{ c}} \right) - \overline{{ I}_1 } } \right) \left( {{ I}_2 \left( {{ r},{ c}} \right) - \overline{{ I}_2}} \right) }{\sqrt{\left[ { \sum \nolimits _{{ r}=1}^{ H} \sum \nolimits _{{ c}=1}^{ W} \left( {{ I}_1 \left( {{ r},{c}} \right) - \overline{{ I}_1}} \right) ^{2}} \right] \left[ { \sum \nolimits _{{ r}=1}^{ H} \sum \nolimits _{{ c}=1}^{ W} \left( { I}_2 \left( {{ r},{c}} \right) - \overline{{I}_{2}} \right) ^{2}} \right] }}\nonumber \\ \end{aligned}$$
(4)

where \(I_1(r,c)\) is the value of the element (pixel) at the point \((r,c)\) in the original audio, \(I_2(r,c)\) is the value of the element at \((r,c)\) in the encrypted audio, \(\bar{I_1}\) is the mean of the original audio and \(\bar{I_2}\) is the mean of the encrypted audio, each calculated as follows:

$$\begin{aligned} \bar{{I}} =\frac{{ 1}}{{W}*{H}} \sum \limits _{{r}={1}}^{H} \sum \limits _{{c}={1}}^{W} {I}\left( {{r},{c}} \right) \end{aligned}$$
(5)
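Equations (4) and (5) can be sketched in a few lines of Python. The sketch assumes the audio has already been arranged as an H × W list of lists; the function names are illustrative only.

```python
import math


def mean2d(I):
    """Mean of an H x W matrix, as in Eq. (5)."""
    H, W = len(I), len(I[0])
    return sum(sum(row) for row in I) / (W * H)


def correlation(I1, I2):
    """Correlation coefficient of Eq. (4) between two H x W matrices."""
    m1, m2 = mean2d(I1), mean2d(I2)
    num = den1 = den2 = 0.0
    for row1, row2 in zip(I1, I2):
        for p, q in zip(row1, row2):
            num += (p - m1) * (q - m2)
            den1 += (p - m1) ** 2
            den2 += (q - m2) ** 2
    return num / math.sqrt(den1 * den2)


original = [[1.0, 2.0], [3.0, 5.0]]            # hypothetical 2 x 2 block
inverted = [[-v for v in row] for row in original]
print(correlation(original, original), correlation(original, inverted))
```

A value near 1 indicates that the two signals are essentially identical, a value near 0 indicates no linear relation, and a value near −1 indicates an inverted copy.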

Sensitivity is the degree to which a process is affected by a change: if the process is strongly affected by a small change, its sensitivity is high, and vice versa.

A new algorithm for audio encryption is also proposed and applied via the package. The new algorithm encrypts the audio signal and then uses an AT with a mathematical operation to test and ensure that the encrypted signal is free from residual intelligibility, which increases the security. Audio encryption is done through both the chaotic and MK algorithms.

2.1 Encryption process

The package supports two encryption techniques: the chaotic and MK algorithms. In the chaotic algorithm, the audio signal is first permuted by the chaotic map using an initial key; a transform is then applied to the resulting signal so that a second permutation can be performed. In all, three levels of permutation are applied to the processed signal. A masking step then ensures that the value of each element does not exceed 2, which removes silent periods as far as possible.

A chaotic map is an encryption algorithm that relocates data from one position to another. It has the benefits of low correlation, good randomness and unpredictability. It is also highly sensitive to its initial parameters, so that even a small change in them changes the result (Naeem et al. 2009).

The general equation is:

$$\begin{aligned} B(x,y)= \left\{ \begin{array}{ll} \left( 2x,\ \frac{y}{2}\right) , &amp; 0\le x<\frac{1}{2} \\ \left( 2x-1,\ \frac{y}{2}+\frac{1}{2}\right) , &amp; \frac{1}{2}\le x\le 1 \end{array} \right. \end{aligned}$$
(6)
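For reference, the continuous map of Eq. (6) on the unit square can be written directly in Python; the function name `baker` is chosen here for illustration.

```python
def baker(x, y):
    """Continuous Baker map of Eq. (6) on the unit square."""
    if 0.0 <= x < 0.5:
        # Left half: stretch horizontally, compress into the lower half.
        return 2.0 * x, y / 2.0
    if 0.5 <= x <= 1.0:
        # Right half: stretch horizontally, compress into the upper half.
        return 2.0 * x - 1.0, y / 2.0 + 0.5
    raise ValueError("x must lie in [0, 1]")


print(baker(0.25, 0.5))  # point from the left half
print(baker(0.75, 0.5))  # point from the right half
```

The two sample points share the same height before the map but land in different halves of the square afterwards, which is the stretching and stacking behaviour that the generalized map builds on.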

Here, the key is used as the initial condition of the encryption algorithm. The Baker map can be generalized as follows:

  (a) An \(\hbox {N}\times \hbox {N}\) square matrix is divided into k vertical rectangles of height N and width \(\hbox {n}_\mathrm{i}\), where \(\hbox {n}_{1}+\hbox {n}_{2}+{\cdots }+\hbox {n}_\mathrm{k}\,=\,\hbox {N}.\)

  (b) These vertical rectangles are stretched horizontally.

  (c) The rectangles are then stacked, with the left one at the bottom and the right one at the top.

The discretized Baker map is denoted B\((\hbox {n}_{1},\hbox {n}_{2},\ldots ,\hbox {n}_\mathrm{k})\), where the sequence of k integers, \(\hbox {n}_{1},\hbox {n}_{2},\ldots ,\hbox {n}_\mathrm{k}\), is chosen such that each integer \(\hbox {n}_\mathrm{i}\) divides N and \(\hbox {N}\,=\hbox {n}_{1}\,+\cdots +\,\hbox {n}_\mathrm{k}\). With \(\hbox {N}_\mathrm{i}=\hbox {n}_{1}+\cdots +\hbox {n}_{\mathrm{i}-1}\) (so that \(\hbox {N}_{1}=0\)), the element at the position (r,s), with \(\hbox {N}_\mathrm{i}\le \hbox {r}<\hbox {N}_\mathrm{i}+\,\hbox {n}_\mathrm{i}\) and \(0\,\le \,\hbox {s}\,<\hbox {N}\), is mapped to

$$\begin{aligned} B_{(n_1 \ldots n_k)}\left( r,s\right)&= \Big [ \frac{N}{n_i}\left( r-N_i\right) +s\ \mathrm{mod}\left( \frac{N}{n_i}\right) ,\nonumber \\&\quad \frac{n_i}{N}\Big ( s-s\ \mathrm{mod}\Big (\frac{N}{n_i}\Big )\Big ) +N_i\Big ] \end{aligned}$$
(7)

An \(\hbox {N}\times \hbox {N}\) square matrix is divided into k vertical rectangles of height N and with width \(\hbox {n}_\mathrm{i}\). Then, each vertical rectangle of dimensions \(\hbox {N}\times \hbox {n}_\mathrm{i}\) is divided into \(\hbox {n}_\mathrm{i}\) boxes; each containing N points. Each of these boxes is mapped to a row of pixels by mapping column by column, with the left one at the bottom and the right one at the top.
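The discretized map can be sketched as a lookup table of positions. The sketch below assumes zero-based indices, the standard scaling factor \(N/n_i\) in the first coordinate, and \(N_i\) equal to the sum of the widths preceding strip i; the function names and the example key are hypothetical, and the key is simply taken to be the list of strip widths.

```python
def baker_permutation(N, widths):
    """Return {(r, s): (r2, s2)} for the discretized Baker map B(n_1..n_k)."""
    if sum(widths) != N or any(N % n for n in widths):
        raise ValueError("widths must sum to N and each must divide N")
    mapping = {}
    N_i = 0                      # sum of the widths of the preceding strips
    for n_i in widths:
        q = N // n_i             # scaling factor N / n_i
        for r in range(N_i, N_i + n_i):
            for s in range(N):
                mapping[(r, s)] = (q * (r - N_i) + s % q,
                                   (s - s % q) // q + N_i)
        N_i += n_i
    return mapping


def apply_map(matrix, mapping, inverse=False):
    """Permute (or, with inverse=True, restore) an N x N matrix."""
    N = len(matrix)
    out = [[None] * N for _ in range(N)]
    for (r, s), (r2, s2) in mapping.items():
        if inverse:
            out[r][s] = matrix[r2][s2]
        else:
            out[r2][s2] = matrix[r][s]
    return out


key = (2, 4, 2)                                   # hypothetical widths, N = 8
table = baker_permutation(8, key)
block = [[8 * r + s for s in range(8)] for r in range(8)]
scrambled = apply_map(block, table)
restored = apply_map(scrambled, table, inverse=True)
print(restored == block)
```

Because every position is mapped to a distinct position, the permutation is invertible with the same key, and a different choice of widths yields a different permutation.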

In the MK algorithm, several keys are derived from a single original key. In this work, a second key is generated by dividing the original key into two halves and reversing each half. Another key is used for block randomization of the data, that is, to permute the signal. A transform is then applied so that a second permutation can be performed, followed by the inverse transform. First, block randomization moves each element of the matrix to another position using the keys generated earlier. The matrix is divided into blocks whose size equals the key length, and a row and column randomization is then applied: if a key element equals zero, the corresponding row or column is left unchanged; if it equals one, the row or column is shifted by the number of preceding ones in the key. After that, a masking step ensures that the value of each element does not exceed 2.
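The key derivation and the row randomization described above might look as follows in Python. This is only a sketch under stated assumptions: keys are lists of bits, the shift is circular and to the right, and "preceding ones" counts the ones strictly before the current key position. None of these details is specified in the text, and all function names are hypothetical.

```python
def derive_second_key(key):
    """Split the key into two halves and reverse each half."""
    half = len(key) // 2
    return key[:half][::-1] + key[half:][::-1]


def row_shifts(key):
    """Shift per row: 0 for a zero bit, else the count of earlier ones."""
    shifts, ones_seen = [], 0
    for bit in key:
        shifts.append(ones_seen if bit == 1 else 0)
        ones_seen += bit
    return shifts


def randomize_rows(block, key, inverse=False):
    """Circularly shift each row of a block (assumed right shift)."""
    out = []
    for row, shift in zip(block, row_shifts(key)):
        k = (-shift if inverse else shift) % len(row)
        out.append(row[-k:] + row[:-k] if k else list(row))
    return out


key = [1, 0, 1, 1]                                   # hypothetical 4-bit key
block = [[4 * r + c for c in range(4)] for r in range(4)]
mixed = randomize_rows(block, key)
print(derive_second_key([1, 0, 1, 1, 0, 0, 1, 0]))
print(randomize_rows(mixed, key, inverse=True) == block)
```

Column randomization can reuse the same routine on the transposed block, and applying the shifts with `inverse=True` restores the original order.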

2.2 Package applications

2.2.1 Visual inspection

The proposed package enables the user to visually inspect the media under processing. For package testing, a six-second recording of Osarry’s voice is used. Figure 3a–f shows the time domain signals for the original, encrypted and recovered audio and the frequency domain signals for the original, encrypted and recovered audio, respectively. The figure shows the difference between the original signal and the encrypted one and the absence of residual intelligibility; the similarity between the original and recovered signals can also be verified.

Fig. 3 Time domain signals a original, b encrypted and c recovered, and frequency domain signals d original, e encrypted and f recovered

The package also supports the signal distribution, shown in Fig. 4, which indicates the difference in amplitude distribution between the original and encrypted signals and the close similarity between the original and recovered signals.

Fig. 4 The distribution of the signal a original, b encrypted, c decrypted

The scatter diagram, shown in Fig. 5, indicates the correlation between the original signal and both the encrypted and decrypted signals: the correlation is high for decryption and low for encryption. It also shows the scattering region between the original signal and the recovered and encrypted ones, which indicates how the two signals are related and how difficult it is to obtain one from the other. The correlation between the original and encrypted signals is \(-7121\times 10^{-4}\), and that between the original and recovered signals equals 1. The SD for encryption is 17.0025 dB, whereas for decryption it is \(2.2546\times 10^{-10}\) dB. The LLR for encryption is 0.8124 and for decryption is \(2.7513\times 10^{-14}\). The LLR thus tells the same story: it is large between the original and encrypted signals and very small between the original and decrypted signals.

Fig. 5 The scattering between the original signal and a encrypted, b decrypted

2.2.2 Key sensitivity test

The package was also tested against key changes to guarantee cloud security. One bit of the key was deliberately changed. Figure 6 shows the difference between the original and the decrypted signal when a small change occurs in the decryption key. It gives good evidence of how sensitive the system is to any change in the keys, which in turn supports its security. The quality metrics in this case show that the correlation between the original and decrypted signals is 0.1276, which reflects the dissimilarity between them. The same conclusion follows from the values of SD and LLR, which are 126.1073 dB and none, respectively.

Fig. 6 The time domain of the decrypted signal in the case of a change in the keys

2.2.3 Noise effect

The effect of noise was also evaluated for several metrics. Output SNR, segmental SNR, correlation, LLR and SD are calculated for different values of input SNR. A 100 dB input SNR is used to evaluate the different metrics. Segmental SNR increases with input SNR in an almost linear manner, as shown in Fig. 7a, and output SNR behaves in the same way (Fig. 7b). LLR varies while decreasing, and SD also decreases (Fig. 7c, d). In Fig. 7e, the correlation increases linearly with SNR. The technique therefore gives good encryption conditions at medium and high SNR, and becomes good at 15 dB. The variation in LLR is within a small range, segmental SNR and output SNR vary over a range of 20 dB, and the correlation and SD vary over a very large range.

Fig. 7 Effect of the noise on the decrypted signal a segmental SNR, b output SNR, c log-likelihood ratio, d spectral distortion and e correlation
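The output and segmental SNR values can be computed along the following lines. The sketch uses the usual definitions (signal energy over error energy, in dB, for the whole signal or averaged over frames); the frame length and the rule of skipping frames with zero signal or zero error energy are assumptions made here, not settings taken from the package.

```python
import math


def snr_db(reference, processed):
    """Output SNR in dB over the whole signal."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, processed))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


def segmental_snr_db(reference, processed, frame_len=256):
    """Mean of per-frame SNR values; degenerate frames are skipped."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, out))
        if signal > 0 and noise > 0:   # assumed handling of silent/exact frames
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values) if values else math.inf


original = [1.0] * 8                     # hypothetical samples
decrypted = [1.1] * 8                    # same samples with a 10 % error
print(snr_db(original, decrypted), segmental_snr_db(original, decrypted, 4))
```

Segmental SNR weights quiet and loud frames equally, so it can differ noticeably from the output SNR when the error is concentrated in a few frames.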

Fig. 8 Time domain of Osarry’s voice: a original signal, b AT used in encryption, c encrypted signal and d recovered signal

Fig. 9 Scattering between the original signal and a decrypted, b encrypted

Fig. 10 Time domain of the decrypted signal in the case of a key change

Table 1 The different metrics for all supported transforms

2.2.4 Audio tone

In this work, a new algorithm is proposed that encrypts the audio signal using chaotic encryption and then applies an AT through a mathematical operation to test and ensure that the encrypted signal is free from residual intelligibility, which increases the security of the signal against eavesdroppers and non-AuthGs. The signal resulting from the second chaotic encryption stage is compared with a certain level and a decision is made. If the level is smaller than the threshold, an arithmetic operation is performed in which the signal is added to the tone signal. Otherwise, the reverse operation is performed, in which the signal is subtracted from the tone signal.
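The threshold decision can be illustrated with the Python sketch below. The text does not state how the level is measured or how the receiver learns which branch was taken, so the sketch assumes a sample-by-sample comparison of the magnitude with the threshold and keeps a decision mask as side information for recovery. These choices and all names are hypothetical.

```python
def apply_tone(scrambled, tone, threshold):
    """Add the signal to the tone below the threshold, else subtract it."""
    encrypted, mask = [], []
    for s, t in zip(scrambled, tone):
        below = abs(s) < threshold      # assumed per-sample level test
        encrypted.append(t + s if below else t - s)
        mask.append(below)              # assumed side information
    return encrypted, mask


def remove_tone(encrypted, tone, mask):
    """Undo apply_tone using the same tone and the decision mask."""
    return [e - t if below else t - e
            for e, t, below in zip(encrypted, tone, mask)]


scrambled = [0.0, 0.5, -0.75, 0.25]     # hypothetical chaotic-stage output
tone = [0.5, 0.5, -0.5, 0.25]           # hypothetical tone samples
encrypted, mask = apply_tone(scrambled, tone, threshold=0.3)
print(encrypted, remove_tone(encrypted, tone, mask))
```

In this toy run the silent sample (0.0) is replaced by the tone value, which is the effect the AT step is meant to have on silent periods.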

To test the proposed AT algorithm, a six-second recording of Osarry’s voice is used as the test signal, and Osarry’s voice is also used as the tone. Figure 8a and b show the time domain voice and AT signals, respectively. The signal encrypted with the AT algorithm and the recovered one are shown in Fig. 8c and d, respectively. Figure 8 shows the difference between the original signal and the encrypted one and the absence of residual intelligibility; the similarity between the original and recovered signals can also be verified.

The scatter diagram between the original signal and both the encrypted and decrypted signals is shown in Fig. 9; it indicates high correlation for decryption and low correlation for encryption.

The AT algorithm was also tested for key sensitivity. Figure 10 shows the difference between the original and the decrypted signal when a small change occurs in the decryption key. It is a good example of how sensitive the system is to any change in the keys.

Table 1 gives more information on the different criteria for all supported transforms. Different metrics are available to help the user choose the desired transform.

3 Multiple keys results

A similar software package for audio encryption with the MK algorithm is also presented. All visual inspection tests and metrics described before are available in that package as well. Here, a comparison between the chaotic and MK encryption algorithms is presented. The performance of each algorithm is investigated in the presence of noise, as well as in terms of processing time. DWT and the proposed AT are also compared. Segmental SNR, output SNR, LLR, SD and correlation of the decrypted signal were investigated. Figure 11a shows that chaotic encryption gives a higher segmental SNR than the MK algorithm.

Fig. 11 Effect of the noise on the decrypted signal for the chaotic and MK algorithms a segmental SNR, b output SNR, c LLR, d SD and e correlation

In the following, the effect of noise is evaluated for several metrics. Output SNR, segmental SNR, correlation, LLR and SD are calculated for different values of input SNR. A 100 dB input SNR is used to evaluate the different metrics. The test is done for both the chaotic map and MK, with DWT and AT.

The segmental SNR is a metric used to evaluate the effect of noise on the encrypted signal and how it affects the decrypted signal. It evaluates the SNR over individual frames of the signal. In Fig. 11a, DWT with chaotic encryption increases smoothly until its output value settles at 60 dB, whereas AT with chaotic encryption peaks at the first SNR values and then decreases, settling after 60 dB with small variation in its value. AT with MK is almost constant as the SNR changes, but its result is very bad, and DWT with MK decreases sharply and becomes constant after 2 dB; it also has a bad response. For segmental SNR, AT with chaotic encryption is preferred at low SNR, whereas DWT with chaotic encryption is stronger than AT because of the stability of its value. MK performs badly for all values of SNR.

The output SNR is calculated for the whole signal as the SNR changes, and is shown in Fig. 11b. DWT with chaotic encryption increases smoothly until its output value settles at 60 dB, whereas AT with chaotic encryption increases linearly and becomes constant after 40 dB, but with noticeable variation in its value. AT with MK is almost constant as the SNR changes, but its result is very bad, and DWT with MK decreases sharply and becomes constant after 2 dB; it also has a bad response. For output SNR, AT with chaotic encryption is preferred, with some care needed at high SNR, whereas DWT with chaotic encryption gives lower values than AT but has the advantage of stability. MK performs badly for all values of SNR.

In Fig. 11c, the SD gives the effect of SNR on the similarity of the decrypted signal to the original. A high SD value is preferred for encryption and a low one for decryption. DWT with chaotic encryption decreases linearly, whereas AT with chaotic encryption decreases smoothly and becomes constant after 50 dB. AT with MK is almost constant as the SNR changes, but its result is very bad, and DWT with MK increases sharply and becomes constant after 2 dB; it also has a bad response. For SD, AT with chaotic encryption is preferred, whereas DWT with chaotic encryption is the worst of all at high SNR.

LLR is another metric that describes how close the processed signal is to the original. A high LLR value is preferred for encryption and a low one for decryption. DWT with chaotic encryption decreases over a small range with some variation, whereas AT with chaotic encryption decreases and becomes constant after 55 dB. AT with MK increases with some variation until it becomes constant at 50 dB, and DWT with MK increases sharply and becomes constant after 2 dB. For LLR, AT with chaotic encryption is the worst, whereas AT with MK is preferred at high SNR and DWT with MK is preferred at low SNR. All of these results are shown in Fig. 11d.

The correlation and its dependence on SNR are displayed in Fig. 11e. DWT with chaotic encryption increases almost linearly and settles after 60 dB, whereas AT with chaotic encryption increases almost linearly and becomes constant after 20 dB. AT with MK is constant as the SNR changes, and DWT with MK decreases sharply and becomes constant after 2 dB. For correlation, AT with chaotic encryption is the best and DWT with MK is the worst. MK has a bad response for all SNR values. Overall, AT with chaotic encryption is preferred over the others, and DWT with chaotic encryption is the second choice.

Figure 12a shows the effect of block size on correlation: all techniques behave similarly except AT with chaotic encryption, whose correlation varies with the block size. The LLR responses of all techniques are almost the same, with a slight difference in value for AT with chaotic encryption, as shown in Fig. 12b. For SD, Fig. 12c shows that the MK encryption variants have the same response, while AT with the chaotic map has the smallest value; DWT has the largest value and gives the best response. The elapsed processing time is the same for all techniques, especially for block sizes above 32, as shown in Fig. 12d.

Fig. 12 The effect of block size on a correlation, b LLR, c SD and d elapsed time

4 Conclusion

A software package that protects audio and multimedia by encrypting audio with an AT based on a chaotic map has been presented. It addresses one of the important concerns of any media organization: protecting its content from attacks such as theft, modification, or reuse without prior permission. For any ClComp, multimedia encryption is one of the applications that the cloud must keep in good condition and protect from tampering. For audio, there are two ways to protect the content, depending on the situation. The first is when the content is broadcast and must not be received by anyone except the AuthGs. The audio must be encrypted in such a way that anyone who hears the encrypted signal cannot recognize the content and so cannot reuse or benefit from it. The second is when the content is stored inside the organization’s ClComp, taking into consideration that it may be abused by internal employees. Encryption of audio signals based on a chaotic map has been addressed, and the algorithm has been tested using an AT to evaluate its performance. As the results show, DWT and AT with chaotic encryption perform better than the other transforms. DWT has a good response and high security, whereas AT is more robust to noise than the others. With MK encryption, DWT performs better than the others, but all of them respond badly to noise.

The software for encrypting audio using an AT based on a chaotic map should appeal not only to those skilled in the ClComp of ERTU, but also to those with an interest in applying web technologies to their organization. The software is therefore intended for experienced ICT professionals in the workplace; authorized users will be able to:

  (a) Create and deploy a web application.

  (b) Use both enterprise and web application frameworks.

  (c) Build scalable web applications and analyse their performance using the latest tools and techniques.

  (d) Develop rich internet applications using the latest rich internet application frameworks.

  (e) Perform usability testing on a given web application to ensure maximum effectiveness.

  (f) Design and deploy wire framing, eye tracking and web analytics for web sites.

  (g) Generate a suitable business case to justify a web-based development.