1 Introduction

With the rapid advance of multimedia processing techniques, illegal copying and modification of recorded audio have become very common, so the content integrity of recorded audio data has become of great concern. When dealing with audio data in medical, military, and audio applications, integrity is essential to ensure that the original data has not been altered (Gómez et al. 2002). It is therefore important to use sensitive integrity verification schemes that detect intended or unintended modifications of the data. Recently, watermarking has gained a prominent role in this task, with uses in copyright protection, content verification, and broadcast monitoring (Zhang et al. 2012).

Watermarking techniques can be classified according to various criteria, including robustness, perceptibility, and the embedding and retrieval procedures. Robustness describes the ability of the watermark to resist common data manipulations. A watermark can be robust, fragile, or semi-fragile. A robust watermark is used to protect copyright, because it aims at resisting different types of processing (Petrovic 2005; Chang and Chang 2010; Radharani and Valarmathi 2010). In contrast, a fragile or semi-fragile watermark is used to verify authenticity and content integrity: when a fragile watermark is attacked, it is destroyed (Li and Yang 2003).

A combination of robust and fragile watermarks in audio signals can be found in (Lu et al. 2000). Of these watermarks, one was used to provide copyright protection and the other was used for tampered region detection. Integrity verification using watermarking and fingerprinting techniques, with a fragile speech recording authentication scheme based on the discrete Fourier transform (DFT), was studied by Wu and Jay Kuo (2001, 2002). In (Gómez et al. 2002), the integrity of audio recordings is verified using a mixed watermarking-fingerprinting approach with a form of self embedding. The idea of self embedding is to extract a fingerprint, or a signal-dependent mark, from the signal itself and then keep it within the signal through a marking process. So, there is no need for additional external data during the integrity verification process. Some approaches based on the same concept have been presented for image, video, and audio (Dittmann et al. 1999; Dittmann 2001; Shaw 2000).

In (Liu and Chen 2004), a technique that localizes tampered regions of the speech content and recovers these regions based on LSB watermarking was proposed. In this paper, we present a self embedding mechanism based on the discrete cosine transform (DCT) for verifying audio content integrity. The idea of this mechanism is to work on the DCT of the signal blocks arranged in 2-D and to embed a signature extracted from each block into another block in a manner that does not distort the audio signal. The rest of the paper is arranged as follows. Section 2 provides a review of the DCT. Section 3 presents the proposed content verification scheme. Section 4 presents the obtained results. Finally, the conclusions and future trends are given in Sect. 5.

2 Discrete cosine transform

Transform domain techniques have shown better performance in watermark embedding than time domain techniques. The majority of transform domain techniques insert the data into the transform coefficients of the cover information and, after the coefficients are modified, convert the information back into the spatial domain (Shoemaker 2002; Tewfik 2000). One of the most popular transforms for watermark embedding is the DCT (Nassar et al. 2014).

The DCT is a real transform that maps a sequence of a certain length to another sequence of the same length. It has a good energy compaction property and enables segmentation of the signal of concern into sub-bands. The type-II DCT is the most widely used form in signal and image processing (Ahmed et al. 1974).

The DCT divides the signal into low, middle, and high frequency components (FL, FM, and FH), as shown in Fig. 1. Most of the signal energy resides in the low frequency components, which makes it improper to modify them. On the other hand, high frequency components are usually removed through compression. So, mid-frequency components are the most appropriate for watermark embedding (Shiva Kumar et al. 2010).

Fig. 1 Definition of DCT regions

In this work, we treat audio signals with an image processing concept. We first rearrange the 1-D audio signal into a 2-D matrix, and then apply the 2-D DCT, defined as (Khayam 2003):

$$C\left( {u,v} \right) = a(u)a(v)\sum\limits_{i = 0}^{M - 1} {\sum\limits_{j = 0}^{N - 1} {x(i,j)} } \cos \left( {\frac{{\pi u\left( {2i + 1} \right)}}{2M}} \right)\cos \left( {\frac{{\pi v\left( {2j + 1} \right)}}{2N}} \right)$$
(1)

where x(i, j) is the 2-D signal matrix, M and N are its dimensions, a(u) and a(v) are normalization factors, u = 0, 1, 2, …, M − 1, and v = 0, 1, 2, …, N − 1.
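As an illustration of the 2-D DCT of a rearranged signal, the following is a minimal pure-Python sketch. It assumes the orthonormal type-II normalization, a(0) = sqrt(1/n) and a(u) = sqrt(2/n) otherwise, which the text does not state explicitly. The function names (`dct_matrix`, `dct2`, `idct2`, `to_matrix`) are hypothetical and are not taken from the original MATLAB simulations.

```python
import math

def dct_matrix(n):
    """n x n orthonormal DCT-II matrix D with D[u][i] = a(u) * cos(pi*u*(2i+1)/(2n)).
    The normalization a(u) is an assumption of this sketch."""
    d = []
    for u in range(n):
        a = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        d.append([a * math.cos(math.pi * u * (2 * i + 1) / (2 * n)) for i in range(n)])
    return d

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def dct2(x):
    """2-D DCT of an M x N matrix, computed separably as D_M * X * D_N^T."""
    return matmul(matmul(dct_matrix(len(x)), x), transpose(dct_matrix(len(x[0]))))

def idct2(c):
    """Inverse 2-D DCT, D_M^T * C * D_N, valid because D is orthonormal."""
    return matmul(matmul(transpose(dct_matrix(len(c))), c), dct_matrix(len(c[0])))

def to_matrix(signal, width):
    """Rearrange a 1-D signal into rows of `width` samples; an incomplete tail is dropped."""
    rows = len(signal) // width
    return [list(signal[r * width:(r + 1) * width]) for r in range(rows)]
```

For example, `dct2(to_matrix(samples, 16))` would transform a signal rearranged into rows of 16 samples, and `idct2` applied to the result should return the same matrix up to floating-point error.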

3 The proposed verification algorithm

In this paper, the DCT is exploited for audio integrity verification by developing the concepts in (Nassar et al. 2014). After the audio signal is rearranged to 2-D, we work on 16 × 16 blocks. A self embedding process is applied to put a mark or signature of each block into another block. The proposed verification scheme is shown in Fig. 2.

Fig. 2 Proposed approach

3.1 Integrity protection process

The steps of the proposed integrity protection process are summarized as follows:

  • Step 1 The 1-D audio signal is rearranged into 2-D.

  • Step 2 The obtained 2-D signal is segmented into 16 × 16 blocks and the DCT is applied on each block.

  • Step 3 The first row and first column from each block are weighted with a small weight and embedded in the last row and last column of another block, respectively. This process is performed on each pair of blocks. 

  • Step 4 Inverse DCT is performed on each block to obtain a modified 2-D matrix.

  • Step 5 The signal is rearranged again to 1-D. 

  • Step 6 The obtained signal is transmitted through a communication channel.
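Steps 1 to 5 above can be sketched as follows. This is only an illustration under several assumptions that the text leaves open: each run of n × n consecutive samples forms one block, the signature of block k is embedded in block k + 1 (cyclically), embedding replaces the target coefficients rather than adding to them, the corner positions shared between first and last rows or columns are skipped so that the signatures stay readable, and the weight `alpha` is arbitrary. The function `protect` and its helpers are hypothetical names.

```python
import math

def dct_matrix(n):
    """n x n orthonormal DCT-II matrix (normalization is an assumption of this sketch)."""
    return [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * u * (2 * i + 1) / (2 * n)) for i in range(n)]
            for u in range(n)]

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def protect(signal, n=16, alpha=0.01):
    """Return the marked 1-D signal; an incomplete tail is passed through unchanged."""
    d = dct_matrix(n)
    dt = transpose(d)
    size = n * n
    count = len(signal) // size
    # Steps 1-2: each run of n*n samples becomes one n x n block, then the 2-D DCT.
    blocks = [[list(signal[k * size + r * n:k * size + (r + 1) * n]) for r in range(n)]
              for k in range(count)]
    coeffs = [matmul(matmul(d, b), dt) for b in blocks]
    # Step 3: the weighted first row/column of block k replaces the last
    # row/column of block k+1 (assumed pairing and assumed replacement rule).
    marked = [[row[:] for row in c] for c in coeffs]
    for k in range(count):
        src, dst = coeffs[k], marked[(k + 1) % count]
        for j in range(1, n):        # last row, skipping the first-column corner
            dst[n - 1][j] = alpha * src[0][j]
        for i in range(1, n - 1):    # last column, skipping both corners
            dst[i][n - 1] = alpha * src[i][0]
    # Steps 4-5: inverse DCT of each block, then flatten back to 1-D.
    out = []
    for c in marked:
        for row in matmul(matmul(dt, c), d):
            out.extend(row)
    return out + list(signal[count * size:])
```

A call such as `protect(samples, n=16, alpha=0.01)` would return a signal of the same length as `samples`. Whether the actual scheme uses replacement or additive embedding, and how blocks are paired, would need to be checked against the original implementation.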

3.2 Integrity verification process

The steps of the integrity verification process are summarized as follows:

  • Step 1 The 1-D audio signal is rearranged in 2-D. 

  • Step 2 The signal is segmented into 16 × 16 blocks as in the embedding process, and the 2-D DCT is applied.

  • Step 3 The embedded row and column in each DCT block are extracted and the correlation coefficients are estimated with the corresponding rows and columns in the source block, respectively.

  • Step 4 Based on the correlation coefficient values, a decision can be made whether the signature exists or not.

  • Step 5 The original signal can be recovered after signature or mark removal.
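Steps 3 and 4 above can be illustrated with the sketch below, which operates on blocks that have already been transformed with the 2-D DCT (Steps 1 and 2). It assumes that the signature of block k was written into block k + 1 (cyclically) with a known weight `alpha`, that the corner positions were skipped during embedding, and that a decision threshold of 0.9 is used. None of these choices is specified in the text, and the names `pearson` and `verify` are hypothetical. Signature removal (Step 5) is not covered.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either vector has zero variance."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0.0 else 0.0

def verify(coeff_blocks, alpha=0.01, threshold=0.9):
    """Return (source_index, correlation, passed) for every source/target block pair.

    coeff_blocks is a list of n x n DCT coefficient blocks (lists of rows).
    """
    n = len(coeff_blocks[0])
    count = len(coeff_blocks)
    report = []
    for k in range(count):
        src = coeff_blocks[k]
        dst = coeff_blocks[(k + 1) % count]
        # Step 3: extract the embedded row and column and undo the weighting.
        extracted = [dst[n - 1][j] / alpha for j in range(1, n)]
        extracted += [dst[i][n - 1] / alpha for i in range(1, n - 1)]
        # Reference signature: first row and first column of the source block.
        reference = [src[0][j] for j in range(1, n)]
        reference += [src[i][0] for i in range(1, n - 1)]
        # Step 4: decide from the correlation coefficient.
        r = pearson(extracted, reference)
        report.append((k, r, r >= threshold))
    return report
```

With this layout, a block whose samples were lost entirely (all-zero coefficients) yields a correlation of 0.0 for both pairs it takes part in, so both are flagged as failed.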

4 Result analysis and comparison

The proposed audio signal integrity verification process has been tested with MATLAB simulations. The original signal in Fig. 3 has been used.

Fig. 3 Original audio signal

4.1 Integrity protection process

Some quality metrics for the proposed scheme are presented below.

4.1.1 Inaudibility

Inaudibility (audio imperceptibility) is determined by the perceptual difference between the marked audio signal and the original audio signal. It is quantified by the signal-to-noise ratio (SNR), defined as (Can et al. 2014; Al-Haj et al. 2009):

$$SNR = 10\log_{10} \frac{{\sum\nolimits_{j = 1}^{m} {x_{j}^{2} } }}{{\sum\nolimits_{j = 1}^{m} {\left| {x_{j} - y_{j} } \right|}^{2} }}$$
(2)

where $x_j$ is the original audio signal, $y_j$ is the marked audio signal, j is the sample index, and m is the number of samples of the output audio signal (Bassia et al. 2001).
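Eq. (2) translates directly into code. The sketch below is a plain-Python illustration; the function name `snr_db` and the handling of the degenerate cases (identical signals, all-zero original) are choices made here and are not part of the original paper.

```python
import math

def snr_db(original, marked):
    """SNR of Eq. (2) in dB between an original signal x and a marked signal y."""
    if len(original) != len(marked):
        raise ValueError("signals must have the same number of samples")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, marked))
    if noise_energy == 0.0:
        return math.inf      # identical signals: no embedding distortion
    if signal_energy == 0.0:
        return -math.inf     # all-zero original with a non-zero difference
    return 10.0 * math.log10(signal_energy / noise_energy)
```

Higher values indicate that the marked signal is closer to the original, so the embedding is less likely to be audible.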

4.1.2 Audio Spectrogram

The spectrogram is a visual representation of the frequency spectrum of a sound or other signal as it varies with time or some other variable. It is commonly represented by a graph with two dimensions: the horizontal axis is assigned to time, and the vertical axis is assigned to frequency. A third dimension represents the amplitude of a particular frequency at a specific time; it can be shown by color or intensity, or sometimes by the height of a 3-D surface. The lower frequencies are more intense than the higher frequencies (http://en.wikipedia.org/wiki/Spectrogram).

Figure 4 shows that the proposed scheme does not noticeably deteriorate the marked signal.

Fig. 4 a Original audio signal and its spectrogram, b Marked audio signal and its spectrogram

4.1.3 Correlation

The correlation coefficient can be used for signature verification. Moreover, the signal can be recovered after signature removal and compared with the original signal, showing differences close to zero.

Figures 5, 7, and 8 show some sample results. 

Fig. 5 Original and marked audio signals difference

4.1.4 Spectral distortion (SD)

Spectral distortion is another important metric that deserves consideration. It is defined as:

$$SD = \frac{1}{M}\sum\limits_{m = 0}^{M - 1} {\sum\limits_{i = Nm}^{Nm + N - 1} {\left| {V_{x} (i) - V_{y} (i)} \right|} }$$
(3)

where $V_x(i)$ is the spectrum of the original audio signal in dB for a certain segment, $V_y(i)$ is the spectrum of the marked audio signal in dB for the same segment, M is the number of segments, and N is the segment length. Smaller values of SD mean better quality (Kubichek 1993; Wang et al. 1992; Yang et al. 1998).
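A minimal sketch of Eq. (3) is given below. The text does not specify how the dB spectrum is obtained, so this illustration assumes the magnitude of a direct DFT of each non-overlapping segment, converted with 20·log10 and clipped at a small floor to avoid taking the logarithm of zero. The names `spectrum_db` and `spectral_distortion` are hypothetical.

```python
import cmath
import math

def spectrum_db(segment, floor=1e-12):
    """Magnitude spectrum of one segment in dB, using a direct O(N^2) DFT (assumed)."""
    n = len(segment)
    out = []
    for k in range(n):
        acc = sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(20.0 * math.log10(max(abs(acc), floor)))
    return out

def spectral_distortion(original, marked, seg_len):
    """SD of Eq. (3): summed absolute dB differences per segment, averaged over segments."""
    count = min(len(original), len(marked)) // seg_len
    if count == 0:
        raise ValueError("signals are shorter than one segment")
    total = 0.0
    for m in range(count):
        vx = spectrum_db(original[m * seg_len:(m + 1) * seg_len])
        vy = spectrum_db(marked[m * seg_len:(m + 1) * seg_len])
        total += sum(abs(a - b) for a, b in zip(vx, vy))
    return total / count
```

Identical signals give an SD of zero. The direct DFT keeps the sketch dependency-free, and an FFT routine would normally replace it for realistic segment lengths.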

Table 1 shows a summary of the results.

Table 1 Numerical evaluation metrics for the integrity protection process

4.2 Integrity verification process

Figures 6, 7, and 8, together with Table 2, indicate good integrity verification results.

Fig. 6 Extracted audio signal and its spectrogram

Table 2 Numerical evaluation metrics for the integrity verification process

Fig. 7 Marked and extracted audio signals difference

Fig. 8 Original and extracted audio signals difference

4.2.1 Shearing

Shearing is a common signal manipulation attack. During transmission, some parts of the transmitted audio signal may be lost, intentionally or unintentionally, due to channel conditions, jamming, and so on. Figures 9 and 10 show a sheared marked audio signal and the corresponding extracted audio signal. The results reveal the sensitivity of the proposed verification algorithm to audio signal modification: any change in the marked signal, even if it is inaudible, has a large impact on the extracted audio signal. So, the quality of the extracted output audio signal can be considered an integrity verification reference.

Fig. 9 Sheared marked audio signal

Fig. 10 Extracted distorted audio signal and its spectrogram

4.2.2 Low pass filter

Here, the marked audio signal is attacked by a third-order Butterworth low-pass filter, as shown in Fig. 11. The output of the verification algorithm is shown in Fig. 12: the extracted audio signal is clearly distorted, which reveals the filtering.

Fig. 11 Filtered watermarked audio signal

Fig. 12 Extracted distorted audio signal and its spectrogram

4.2.3 Inaudible sample change

In this experiment, the marked audio signal is tampered with by changing only two sample values at different positions in the waveform, so that the tampered marked audio sounds unaffected, as shown in Fig. 13. When the tampered signal is verified with the proposed integrity verification algorithm, the tampering becomes audibly detectable, and the extracted output audio signal shows larger changes than the tampered audio signal, as shown in Fig. 14.

Fig. 13 Tampered audio signal and its spectrogram

Fig. 14 Extracted audio signal and its spectrogram

The differences between the extracted and tampered marked audio signals for the above-mentioned audio manipulation attacks are shown in Figs. 15, 16 and 17.

Fig. 15 Signal difference in case of shearing

Fig. 16 Signal difference in case of filtering

Fig. 17 Signal difference in case of inaudible sample change

The correlation coefficients are also measured under the same modifications and tabulated in Table 3.

Table 3 Correlation coefficient measurement in case of different attacks

The results in Table 3 confirm the ability of the proposed algorithm to detect changes in audio signals.

Tables 4 and 5 show more results on other data. They support the above-mentioned findings.

Table 4 Performance analysis in case of different audio samples without modification
Table 5 Performance analysis in case of different audio samples with shearing attack

5 Conclusion

The paper presented a self embedding algorithm for integrity verification of digital audio signals. Concepts from image processing have been used for signature or mark embedding. The suggested method is very simple and enables reconstruction of the original signal. Simulation results have shown that the suggested scheme is robust even in the presence of attacks. The proposed scheme is well suited to military as well as nuclear applications.