Abstract
Digital audio signals are vulnerable to several types of risks, which calls for an imperceptible and robust digital audio watermarking scheme. In this research, we propose such a scheme based on a hybrid of the discrete wavelet transform (DWT) and Schur decomposition. The proposed scheme embeds the foreground bits of the watermark image into the least significant bit of the diagonal coefficients of the triangular matrix S generated from Schur decomposition. Schur decomposition is applied on the second sub-band HL2 generated by applying a second-level 2D-Haar DWT to the first channel of the original audio signal. We analyze the performance of the proposed scheme in terms of signal-to-noise ratio (SNR), subjective difference grade (SDG), and objective difference grade (ODG), which average 81.43 dB, 4.78, and 0.184, respectively. The payload capacity is 319.29 bps, the minimum normalized correlation (NC) is 0.9911, and the maximum bit error rate (BER) is 0.0135. Experimental results confirm that the proposed scheme is inaudible and robust against common types of attacks such as Gaussian noise, re-quantization, re-sampling, low-pass filtering, high-pass filtering, echo, MP3 compression, and cropping. In comparison with state-of-the-art audio watermarking schemes, the proposed scheme performs better in terms of imperceptibility, robustness, and data payload size.
1 Introduction
Nowadays, advancements in digital multimedia technology allow for easy and efficient storing, transferring, broadcasting, and reproducing of digital audio files [57]. Nevertheless, these advancements have brought different types of risks and issues, such as copyright and digital rights management [4, 17]. Traditional cryptography algorithms are used to protect audio files by hiding the details of the audio file and allowing only authorized users to perform read and update processes [67]. Unfortunately, once an audio file has been decrypted, it is no longer protected from copyright infringement.
A promising solution for the copyright problem is to hide secret and imperceptible information in audio files [6]. Hiding secret information within digital contents is one of the main challenges for the multimedia industry [5]. Steganography and digital watermarking have been used to hide information inside digital signals [48]. In steganography, the presence of hidden information cannot be detected or known, whereas in watermarking it does not matter if the hidden information is detected or known [34]. Moreover, the hidden information in watermarking should not be disclosed or removed from the digital multimedia contents.
Digital watermarking refers to the art and science of hiding information within a digital signal (e.g., image, audio, and video) [47, 58]. During the last few years, the importance of digital watermarking has led to many research studies [67]. Many applications of digital watermarking, such as broadcast monitoring, owner identification, transaction tracking, and copy control, are related to copyright management and protection [11]. In general, every watermarking system should be robust and effective, with minimum degradation, high payload size, and very low false positive rates. One of the main areas that needs attention in many multimedia applications is security [2, 3, 7, 24].
Effective audio watermarking schemes must satisfy four main requirements (i.e., imperceptibility, security, robustness, and high payload size) [22, 48]. If watermarked audio is imperceptible, the quality of the audio signal before and after embedding the watermark remains the same when measured objectively and subjectively. The International Federation of the Phonographic Industry has recommended that watermarked audio signals have a signal-to-noise ratio of more than 20 dB (decibels) [66]. Furthermore, a watermarked audio signal is secure if obtaining information about the embedded watermark in the signal is impossible [4]. The robustness of watermarked audio refers to the ability to extract the embedded watermark from the audio signal after different types of attacks have been applied [38]. Finally, the payload size of watermarked audio refers to the amount of data that can be embedded per unit of time into the audio signal without losing imperceptibility, which must be more than 20 bits per second (bps) [21]. Maintaining these requirements in an audio watermarking scheme is a major challenge because doing so involves a trade-off [12]. For instance, increasing the number of embedded bits (high payload size) in the audio signals would decrease the quality of the watermarked signal and the scheme’s robustness.
Because the human auditory system is more sensitive and has a wider dynamic range than the human visual system, audio watermarking techniques are considered much more challenging than image and video watermarking techniques [6, 14]. Moreover, image and video signals are multidimensional, whereas audio signals are one-dimensional [12]. Therefore, the amount of information that can be embedded in an image or a video is much greater than that which can be embedded in audio. Consequently, the number of research articles related to digital audio watermarking is much smaller than the number related to image and video watermarking.
The rest of this article is organized as follows. In the next section, we present a review of the literature related to this research. We then discuss details of the DWT and Schur decomposition, followed by our proposed watermarking scheme. We next present our performance analysis and experimental results regarding the requirements of the audio watermarking scheme. Finally, we present our conclusions.
2 Literature review
In general, digital audio watermarking techniques can be performed in the time domain [10, 42, 63, 68] or in the transform domain [13, 45, 55, 64, 65] of the audio signal. In time-domain watermarking techniques, watermarking is embedded directly in host audio signals. The most popular technique in this domain is the least significant bit (LSB), which is based on modification of the LSB position of the original samples [51]. Although time-domain watermarking techniques are easy to implement and computationally inexpensive, they are vulnerable to intentional or unintentional audio signal-processing attacks [39].
In contrast, transform-domain watermarking techniques embed the watermark in the transform coefficients, which can be distributed over different bands of the transform. This leads to less degradation of the original audio signal and makes removing the watermark more difficult for attackers [46]. Therefore, transform-domain watermarking techniques are considered more robust and more imperceptible than time-domain watermarking techniques. Many transformations have been widely used in the literature; the most popular ones are the discrete wavelet transform (DWT), discrete Fourier transform (DFT), discrete cosine transform (DCT), and singular value decomposition (SVD) [8, 15, 16, 17, 27, 29, 30, 36, 40, 71].
Recently, more robust and imperceptible watermarking has been achieved by applying a hybrid transform to the audio signal before embedding the watermark, such as DWT-DCT [31, 37, 41, 43], DWT-STFT [50], DWT-SVD [23, 60, 61], and DWT. Moreover, recent studies have used Schur decomposition in image and video watermarking techniques [14, 22, 53]. This is proving to be a very promising research area that can achieve results competitive with those of other transform domains, giving robust and imperceptible watermarking results. Several image and video watermarking techniques have combined Schur decomposition with other transforms, such as DWT, SVD, and DCT [47, 48, 49, 53, 54]. These combinations have produced techniques that are robust against a set of image and video attacks and that extract the watermark with minimum degradation and high computational speed.
Schur decomposition improves technique efficiency because it needs less computational time than similar decomposition transforms. Research studies comparing the performance of SVD and Schur decomposition, both of which are mathematical tools for matrix analysis [22, 49], have found that Schur-based watermarking techniques are computationally faster than SVD-based ones. Schur decomposition is the main intermediate step in SVD. Thus, SVD requires three times the computational effort of Schur decomposition [47, 58].
In 2010, a blind watermarking technique based on SVD and dither modulation quantization was proposed [12]. The dither modulation quantization was used to embed the watermark bits into the singular values of the audio signal blocks. In 2012, an audio watermarking technique based on Schur decomposition and dither modulation was proposed [14]. It divided the audio signals into nonoverlapping frames that were converted into 2D matrix blocks, after which dither modulation quantization was used to embed the watermark into the singular values of the blocks. The analysis results showed the technique to be efficient and robust against various attacks, with a high computational speed.
Another research group proposed a semi-blind audio watermarking technique based on SVD [50]. They used short-time Fourier transform (STFT) to convert the audio into a matrix to which they applied the SVD operation, then embedded the watermarking bits by modifying the SVD coefficients adaptively based on quantization of the norm of singular value of STFT audio signal blocks.
In 2014, a semi-blind audio watermarking technique based on a hybrid DWT-SVD transform was proposed [4]. This technique scatters the watermark bits in the transformed audio in a way that achieves a high degree of imperceptibility and robustness. Four-level DWT decomposition is applied to each audio signal block to form a unique distributed matrix of all the detail sub-bands. The SVD operation is then applied to the DWT-coefficient matrix to embed the watermark bits in the off-diagonal positions of the singular value matrix.
Most proposed audio watermarking schemes are designed to satisfy requirements such as imperceptibility, payload capacity, and robustness. Maintaining all of these requirements at once remains a major challenge because doing so involves a trade-off [22]. For instance, increasing the number of embedded bits (high payload size) in the audio signals would decrease the quality of the watermarked signal and the scheme’s robustness. In this paper, we present a new audio watermarking scheme based on a hybrid discrete wavelet transform (DWT) and Schur decomposition method as a solution to this problem. The proposed scheme aims to achieve a balance among these characteristics. Schur decomposition improves watermarking efficiency because it has low complexity and needs less computational time than similar decomposition transforms such as SVD. Moreover, Schur decomposition increases the perceptual transparency of the proposed audio watermarking scheme. The DWT, in turn, increases the robustness of the proposed scheme by effectively resisting several types of audio signal attacks such as low-pass filtering, high-pass filtering, noise addition, cropping, compression, re-sampling, and re-quantization. In addition, slightly changing the DWT coefficients does not affect the signal quality. Therefore, the proposed hybrid scheme achieves a good trade-off between distortion and robustness.
3 DWT and Schur decomposition
In this research study, we propose creating an audio watermarking scheme based on DWT and Schur decomposition. Thus, in the following subsections, we describe the DWT and Schur decomposition method in detail.
3.1 Discrete wavelet transform (DWT)
The discrete wavelet transform (DWT) is a special linear transformation that can decompose the signal into a set of orthogonal and spatially oriented frequency channels called wavelets [32]. These wavelets are considered the basis functions for representing signals; they are produced by scaling (dilation) and translation of a single common wavelet called the mother wavelet [25].
One of the most common DWT basis functions is the Haar set of wavelets. DWT with Haar is very helpful in signal representation because it splits the signal into several sub-bands in the frequency domain [19]. In the one-dimensional DWT (1D DWT), low-pass and high-pass filters decompose the signal into two sub-bands associated with the father wavelet ϕ and the mother wavelet ψ. In the two-dimensional DWT (2D DWT), an extension of the 1D case, the father wavelet ϕ and mother wavelet ψ of the 1D DWT are combined to decompose the signal into four sub-bands: LL, LH, HL, and HH.
The sub-band ϕ(x, y) or (LL) is the 2D father wavelet of the decomposed signal. This sub-band holds the average (approximated) component of the signal and is given by the following equation:
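Assuming the standard separable construction from the 1D father wavelet ϕ (our reading of the definitions above, not a form confirmed by the surrounding text), this is:

\[ \phi(x, y) = \phi(x)\,\phi(y) \]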
The sub-bands LH, HL, and HH represent the horizontal details signal ψH(x, y), the vertical details signal ψV(x, y), and the diagonal details signal ψD(x, y), respectively [20]. These sub-bands are the 2D mother wavelets of the decomposed signal and are given by the following equations:
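Under the same separable assumption, and with one common assignment of orientations (conventions differ between texts), these are:

\[ \psi^{H}(x, y) = \psi(x)\,\phi(y), \qquad \psi^{V}(x, y) = \phi(x)\,\psi(y), \qquad \psi^{D}(x, y) = \psi(x)\,\psi(y) \]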
Fig. 1 shows the decomposition of the two-dimensional signal by the 2D DWT into four sub-bands.
DWT can decompose the signal by more than one level. Fig. 2 illustrates how the two-dimensional signal is decomposed first into four sub-bands in the first level 2D DWT decomposition [52]. After that, the DWT can further decompose the sub-bands to obtain new sub-bands in the second level. In the example given here, the 2D DWT decomposes the horizontal details wavelet into four sub-bands in the same way as in the first level.
Discrete wavelet transforms are extremely helpful in signal analysis because they serve as powerful tools for localizing and analyzing information [70]. Therefore, many engineering and computer science applications use DWT because it represents signal information well. Digital watermarking is one of the main fields in which DWT has been used [33]. The watermarking embedding becomes easier after analyzing areas in the signal’s DWT wavelets; hence, DWT is useful in image, video, and audio watermarking schemes. Although the audio signal is one-dimensional, the proposed scheme reshapes the input audio signals into a two-dimensional representation to apply the two-level 2D DWT rather than the 1D DWT.
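The two-level decomposition described above can be sketched in a few lines of pure Python. This is an illustrative sketch rather than the implementation used in the experiments: the orthonormal Haar filters, the row-then-column filtering order, the sub-band naming (first letter for the row filter, second for the column filter), and the function names `haar_1d`, `haar_2d`, and `two_level_haar` are all assumptions made for the example. Following the scheme described in this paper, the second level is applied to HL1.

```python
import math

ROOT2 = math.sqrt(2.0)

def haar_1d(v):
    """One Haar level on an even-length sequence: (approximation, detail)."""
    low = [(v[i] + v[i + 1]) / ROOT2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / ROOT2 for i in range(0, len(v), 2)]
    return low, high

def transpose(m):
    return [list(row) for row in zip(*m)]

def filter_columns(block):
    """Apply haar_1d down every column; return (low, high) blocks."""
    lows, highs = [], []
    for col in transpose(block):
        lo, hi = haar_1d(col)
        lows.append(lo)
        highs.append(hi)
    return transpose(lows), transpose(highs)

def haar_2d(m):
    """One 2D Haar level: filter rows, then columns. Returns LL, LH, HL, HH.

    Naming is an assumption: first letter = row filter, second = column filter.
    """
    lo_rows, hi_rows = [], []
    for row in m:
        lo, hi = haar_1d(row)
        lo_rows.append(lo)
        hi_rows.append(hi)
    ll, lh = filter_columns(lo_rows)
    hl, hh = filter_columns(hi_rows)
    return ll, lh, hl, hh

def two_level_haar(m):
    """Two levels, with the second level applied to HL1 (seven sub-bands)."""
    ll1, lh1, hl1, hh1 = haar_2d(m)
    ll2, lh2, hl2, hh2 = haar_2d(hl1)
    return {"LL1": ll1, "LH1": lh1, "HH1": hh1,
            "LL2": ll2, "LH2": lh2, "HL2": hl2, "HH2": hh2}
```

Because the filters are orthonormal, the total energy of the seven sub-bands equals the energy of the input matrix, which is a convenient sanity check when experimenting with the sketch.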
3.2 Schur decomposition
Schur decomposition, or Schur triangulation, is an important mathematical tool in linear algebra used in matrix analysis. This decomposition comes in two versions: the real Schur transformation and the complex Schur transformation [56]. Schur decomposition is defined for an n × n matrix; in other words, the matrix used must be square.
Accordingly, given that A is a real square matrix, then the real Schur decomposition of A should be computed by the following expression [47, 58]:
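In the notation defined next, the real Schur decomposition is usually written as:

\[ A = U\,S\,U^{T} \]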
where U is an orthogonal matrix (the real counterpart of a unitary matrix), UT is the transpose of U, and S is the upper quasi-triangular (block upper triangular) matrix called the real Schur form. The eigenvalues of S are the same as those of matrix A. Schur decomposition is used to compute matrix exponentials because the orthogonal matrix makes computing matrix functions easier and less complicated. It is also used to compute the eigenvalue decomposition of nonsymmetric matrices.
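To make the factorization concrete, the following sketch computes a Schur form with the unshifted QR iteration in pure Python. It is a teaching sketch under stated assumptions, not the routine used in this study: it only converges cleanly for matrices whose eigenvalues are real with distinct magnitudes, and the helper names (`matmul`, `qr_gram_schmidt`, `schur_qr`) are hypothetical. In practice a library routine such as `scipy.linalg.schur` would normally be used instead.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def qr_gram_schmidt(a):
    """QR factorization by modified Gram-Schmidt (a must have full rank)."""
    n = len(a)
    cols = transpose(a)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            r[i][j] = sum(q[k] * v[k] for k in range(n))
            v = [v[k] - r[i][j] * q[k] for k in range(n)]
        r[j][j] = sum(x * x for x in v) ** 0.5
        q_cols.append([x / r[j][j] for x in v])
    return transpose(q_cols), r

def schur_qr(a, iterations=300):
    """Return (U, S) with A = U S U^T, using the unshifted QR iteration.

    Assumes real eigenvalues of distinct magnitude; otherwise S may not
    become triangular within the given number of iterations.
    """
    n = len(a)
    s = [row[:] for row in a]
    u = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        q, r = qr_gram_schmidt(s)
        s = matmul(r, q)   # similar to the previous s: Q^T s Q
        u = matmul(u, q)   # accumulate the orthogonal factor
    return u, s
```

For a matrix A that satisfies the assumptions, `u, s = schur_qr(A)` gives a triangular `s` whose diagonal holds the eigenvalues of A, and `u` times `s` times the transpose of `u` reproduces A up to rounding.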
4 Proposed audio watermarking scheme
In this section, we describe the details of the proposed audio watermarking scheme. The proposed scheme’s design, based on a DWT and Schur decomposition hybrid method, consists of two main procedures: embedding and extraction. The following subsections describe these procedures in detail.
4.1 Watermarking embedding procedure
The embedding procedure applies the DWT and Schur decomposition to the input audio signal. First, the proposed embedding scheme applies two levels of 2D-Haar DWT to the input original audio signal. Schur decomposition is then applied on the HL2 sub-band, which gives two matrices (U and S). The foreground bits of a binary watermarking image are embedded in the diagonal coefficients of triangular matrix S. The block diagram in Fig. 3 shows the procedure and describes the detailed steps.
- Step 1 (preprocessing): Extract the first channel A of the input stereo wave audio file and then convert it into a 2D matrix (A2D).
- Step 2 (2D-Haar DWT): Apply two-level 2D-Haar DWT to A2D. This operation generates seven DWT sub-bands (LL1; [LL2, HL2, LH2, HH2]; LH1; HH1). Each sub-band is a matrix of DWT coefficients at a specific resolution. Fig. 4 shows the sub-bands produced by the 2-level DWT decomposition (see Eq. (6)).
- Step 3 (Schur): Apply Schur decomposition on the second sub-band (HL2) generated from the previous step. Schur decomposition decomposes the sub-band HL2 matrix into two independent matrices (U and S) (see Eq. (7)).
- Step 4 (input watermark image): The input watermark image is converted into binary image w (see Eq. (8)).
- Step 5 (watermark preprocess): Convert watermark image w into a vector and extract the foreground binary bits (wsi).
- Step 6 (embedding): Embed the binary bits of the watermark image (wsi) into the matrix S of the HL2 sub-band (\( {S}_{HL_2} \)) by substituting each watermark bit (wsi) for the seventh LSB of the integer part of the diagonal coefficients in \( {S}_{HL_2} \) (see Eq. (9)).
- Step 7 (audio reconstruction):
a) Schur inverse: Apply the inverse of the Schur operator to the modified \( {S}_{HL_2}^{\prime } \) matrix to generate a modified coefficient matrix \( {HL}_2^{\prime } \) (see Eq. (10)).
b) DWT inverse: Apply the inverse of the 2D-Haar DWT to the modified coefficient matrix \( {HL}_2^{\prime } \) to obtain the modified 2D matrix \( {A}_{2D}^{\prime } \).
c) Reconstruction: Convert \( {A}_{2D}^{\prime } \) into vector A′ and then combine it with the second channel of the stereo audio to generate the final watermarked audio \( {A}_{org}^{\prime } \).
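The bit substitution in Step 6 can be sketched as follows. This is a minimal illustration under several assumptions that the steps above do not spell out: bit positions are counted from 1 at the LSB, the substitution acts on the magnitude of a coefficient so that its sign and fractional part are preserved, and one bit is written per diagonal element in order. The names `embed_bit` and `embed_diagonal` are hypothetical.

```python
import math

BIT_POSITION = 7  # "seventh LSB"; 1-indexed from the LSB (assumed convention)

def embed_bit(coef, bit, pos=BIT_POSITION):
    """Write one watermark bit into the integer part of a coefficient.

    Assumption: sign and fractional part are kept; only the chosen bit of
    the integer part of the magnitude is overwritten.
    """
    sign = -1.0 if coef < 0 else 1.0
    magnitude = abs(coef)
    integer = int(math.floor(magnitude))
    fraction = magnitude - integer
    mask = 1 << (pos - 1)
    integer = (integer | mask) if bit else (integer & ~mask)
    return sign * (integer + fraction)

def embed_diagonal(s_matrix, bits, pos=BIT_POSITION):
    """Return a copy of S with bits[i] embedded in the diagonal entry S[i][i]."""
    if len(bits) > len(s_matrix):
        raise ValueError("more watermark bits than diagonal coefficients")
    out = [row[:] for row in s_matrix]
    for i, bit in enumerate(bits):
        out[i][i] = embed_bit(out[i][i], bit, pos)
    return out
```

Only the diagonal of the copy is modified, so the off-diagonal entries of S and the matrix U from Step 3 can be reused unchanged in the inverse Schur step.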
4.2 Watermarking extraction procedure
Given the watermarked audio signal, the watermarking can be extracted according to the extraction procedure described in this section. Fig. 5 demonstrates the extraction procedure.
- Step 1 (preprocessing): Input the watermarked audio file \( {A}_{org}^{\prime } \). The first channel of the watermarked signal is extracted and transformed into a 2D matrix \( {A}_{2D}^{\prime } \).
- Step 2 (2D-Haar DWT): Apply the two-level 2D-Haar DWT to \( {A}_{2D}^{\prime } \) to obtain \( {A}_{dwt}^{\prime } \), which gives the seven sub-bands \( \left({LL}_1;\left[{LL}_2,{HL}_2^{\prime },{LH}_2,{HH}_2\right];{LH}_1;{HH}_1\right) \).
- Step 3 (Schur): Apply the Schur transform on the HL2’ sub-band. The Schur transform decomposes the sub-band’s coefficient matrix into two independent matrices, a unitary matrix and the triangular matrix \( {S}_{HL_2}^{\prime } \).
- Step 6 (extraction): Extract the embedded watermark bits from the diagonal elements of the triangular matrix \( {S}_{HL_2}^{\prime } \).
- Step 7 (watermark reconstruction): Reconstruct the image watermark Wsi(ext) by cascading the watermark bits extracted from the watermarked audio file.
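The extraction and reconstruction steps can be sketched in the same spirit. The sketch assumes the same conventions as the embedding side (seventh bit counted from 1 at the LSB, read from the integer part of the coefficient magnitude) and assumes that the watermark is rebuilt row by row with a known width; the function names are hypothetical.

```python
import math

def extract_bit(coef, pos=7):
    """Read bit `pos` (1-indexed from the LSB) of the integer part of |coef|."""
    return (int(math.floor(abs(coef))) >> (pos - 1)) & 1

def extract_watermark(s_matrix, n_bits, pos=7):
    """Collect one bit from each of the first n_bits diagonal coefficients."""
    return [extract_bit(s_matrix[i][i], pos) for i in range(n_bits)]

def bits_to_image(bits, width):
    """Cascade the extracted bits into image rows of the given width."""
    return [bits[i:i + width] for i in range(0, len(bits), width)]
```

For example, `bits_to_image(extract_watermark(S, 4), 2)` turns the first four diagonal coefficients of a matrix `S` into a 2 × 2 binary image.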
5 Results and analysis
This section presents an evaluation of our proposed DWT-Schur audio watermarking scheme. Based on the materials and conducted experiments, subsequent sections will discuss the different measurements used to evaluate the imperceptibility, robustness, and payload capacity of the proposed watermarking scheme.
5.1 Materials
Because different audio types have different perceptual properties, we tested the proposed watermarking scheme on various audio signal types: blues, classical, noise, jazz, vocal, and pop music. Specifically, the experiment used twelve audio files, a pair of audio files for each audio type. All the audio files were downloaded from the Looperman website and were selected from different genres with the same duration (16 s) [44]. The audio files were stereo wave files, sampled at 44.1 kHz and quantized to 16 bits per sample. The watermark was embedded in the first channel. The embedded watermark image is a 120 × 60-pixel binary image, as Fig. 6 shows.
5.2 Experimental results and analysis
We performed several experimental tests to evaluate the imperceptibility, robustness, and payload capacity of the proposed DWT-Schur watermarking scheme for different audio types. Because the original audio signals were available, we could evaluate imperceptibility by comparing each watermarked signal with its original. The imperceptibility test measures the perceptual quality, or perceptual transparency, of the watermark embedded in the original audio signals. Imperceptibility was measured subjectively using the subjective difference grade (SDG) and objectively using the signal-to-noise ratio (SNR) and the objective difference grade (ODG). The subjective evaluation relies on human listening to rate the quality degradation of the watermarked audio file. SNR is a statistical difference metric that measures the noise introduced by the embedded watermark by comparing the watermarked audio with the original, which gives a general indication of imperceptibility. ODG measures the perceptual difference between the watermarked and original signals. Robustness was tested by measuring how well the embedded watermark image resists several types of attacks, such as compression, re-sampling, and linear filtering, using the normalized correlation (NC) and the bit error rate (BER). Finally, the data payload was evaluated by calculating the number of bits embedded in an audio signal per unit of time.
5.2.1 Imperceptibility
Imperceptibility is also called the “perceptual quality” or “perceptual transparency” of the watermark embedded in the original audio signals. Many studies have been conducted in this field to achieve a highly transparent watermarking scheme [1, 9, 18, 26, 32, 45, 51, 55, 59, 62, 69]. In this paper, we evaluated the imperceptibility of the proposed watermarking scheme with both subjective and objective tests.
-
1)
Subjective test
Because the host signal was audio, we assessed the perceptual quality (inaudibility) of the proposed DWT-Schur watermarking scheme subjectively with a human listening test. Ten participants listened to pairs of original and watermarked signals ten times for each pair and then reported the difference between the two audio signals. Based on ITU-R BS.1284, the participants chose the appropriate grade from five impairment grades varying from 5.0 to 1.0, signifying “imperceptible” to “very annoying,” as Table 1 shows. The average SDG over all participants is taken as each pair’s final grade.
Table 2 presents the average SDGs of each audio pair (original and watermarked) from all ten listening test participants. Because the average of the SDG results (4.78) is very close to 5, the listeners could hardly distinguish the watermarked audio from the original. Therefore, the results indicate that the proposed scheme is imperceptible.
-
2)
Objective test
The objective evaluation is based on calculating the signal-to-noise ratio (SNR) and the objective difference grade (ODG) of the watermarked audio.
-
a)
Signal-to-noise ratio (SNR)
SNR is a statistical difference metric used to measure the noise produced by the embedded watermark by comparing the watermarked audio with the original, as in the following equation:
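A standard form of this ratio, assumed here and written with the symbols defined next, is:

\[ \mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{n} A(i)^{2}}{\sum_{i=1}^{n} \left( A(i) - A^{\prime}(i) \right)^{2}} \]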
where A is the original audio, A’ is the watermarked audio signal, and n is the number of samples in the audio. A higher SNR indicates higher audio quality resulting from less error noise. According to the International Federation of the Phonographic Industry (IFPI) standards, the SNR of an audio watermarking scheme should be above 20 dB to be imperceptible [66].
Table 3 shows the average of the SNR values for each audio type. The SNR values obtained with the proposed scheme lie in a high range (74.22–85.99 dB), well above the minimum IFPI requirement (20 dB) [66]. The overall SNR average of the proposed scheme for the different audio types is 81.43 dB. Therefore, the proposed watermarking scheme is imperceptible.
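As a rough illustration of how such SNR values are obtained, the sketch below computes the ratio of signal energy to embedding-noise energy in decibels. It assumes the standard energy-ratio definition and equal-length sample sequences; the function name `snr_db` is hypothetical and the code is not the evaluation script used for Table 3.

```python
import math

def snr_db(original, watermarked):
    """SNR in dB between an original signal and its watermarked version."""
    signal_energy = sum(a * a for a in original)
    noise_energy = sum((a - b) ** 2 for a, b in zip(original, watermarked))
    if noise_energy == 0:
        return math.inf  # identical signals: no embedding noise at all
    return 10.0 * math.log10(signal_energy / noise_energy)
```

A watermarked signal whose embedding noise has one hundredth of the energy of the original signal yields 20 dB, which is the IFPI threshold mentioned above.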
-
b)
Objective difference grade
ODG is another metric used to evaluate the proposed scheme’s imperceptibility. SNR gives a general indication of imperceptibility, but it does not take the characteristics of the human auditory system into account. ODG, in contrast, measures the perceptual difference between the watermarked and original signals. To evaluate the ODG value, we used the perceptual evaluation of audio quality (PEAQ) scheme, which simulates the human auditory system based on ITU-R BS.1387. The PEAQ software implemented by the TSP Lab of the Electrical and Computer Engineering Department at McGill University gives ODG scores ranging from 0 to −4, where 0 indicates no perceptual difference between the two compared audios (original and watermarked) and −4 indicates that the watermarked audio is very annoying [35]. An ODG score can be greater than zero because it is derived from an artificial neural network that simulates the human auditory system.
Table 2 gives the ODG average of each audio type. The results show that the proposed watermarking scheme is imperceptible because the ODG scores are all near zero. The obtained ODG scores are higher than zero, which is outside the normal range (0 to −4). This is because the ODG score is calculated by an artificial neural network that simulates the human auditory system. To confirm this anomalous ODG, we conducted the PEAQ test on two identical audio files; the ODG was again higher than zero, which indicates that the watermarked audio file produced by the proposed scheme is perceptually identical to the original audio file. Fig. 7 shows the amplitude of the audio sample over time in seconds for both the original and watermarked versions of the signal. Fig. 8 shows the amplitude of the audio sample over the sample number for both versions. In both figures the two signals are visually indistinguishable, which indicates that the watermarked audio will not irritate the listener and confirms that the proposed scheme is imperceptible.
5.2.2 Robustness results
Several signal processing operations performed on the watermarked audio may not directly affect the quality of the host audio, but they may affect the quality of the watermark image embedded within the audio. Therefore, we applied a set of common types of attacks to the watermarked audio and computed the NC and BER metrics to test the robustness of the proposed watermarking scheme against these attacks. We measured the similarity between the original watermark image (w) and the extracted watermark image (w’) using the NC metric, calculated as follows:
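One common form of the normalized correlation, assumed here because several variants are in use, is:

\[ \mathrm{NC}(w, w^{\prime}) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} w(i,j)\, w^{\prime}(i,j)}{\sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} w(i,j)^{2}}\; \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} w^{\prime}(i,j)^{2}}} \]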
where M and N are the dimensions of the binary watermark image. If w and w’ are almost identical, then NC is close to 1. However, if w and w’ differ substantially, NC will be close to zero.
Table 4 describes the common signal processing attacks applied to the watermarked audio signal. The BER metric measures the bit error rate between the original watermark image (w) and the extracted watermark image (w’); if the original watermark image is identical to the extracted watermark, then the BER is zero. Eq. (17) gives its definition:
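Written with the symbols introduced for NC, and assuming the usual definition as the fraction of mismatched bits, the BER is:

\[ \mathrm{BER}(w, w^{\prime}) = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} w(i,j) \oplus w^{\prime}(i,j) \tag{17} \]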
where ⊕ is the exclusive-OR operation.
Tables 5 and 6 show the robustness results of the proposed scheme against several attacks in terms of NC and BER, respectively, for different audio types. The results show that the proposed scheme achieves high robustness against these attacks because the minimum value of NC is 0.9911 and the maximum value of BER is 0.0135 (see Fig. 9).
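Both robustness metrics are simple to compute for binary images stored as lists of rows. The sketch below assumes the square-root-normalized form of NC (other variants exist) and the fraction-of-mismatched-bits form of BER; the function names are hypothetical and the code is not the evaluation script behind Tables 5 and 6.

```python
import math

def normalized_correlation(w, w_ext):
    """NC between two equal-size binary images given as lists of rows."""
    pairs = [(a, b) for row_a, row_b in zip(w, w_ext)
             for a, b in zip(row_a, row_b)]
    numerator = sum(a * b for a, b in pairs)
    denominator = (math.sqrt(sum(a * a for a, _ in pairs))
                   * math.sqrt(sum(b * b for _, b in pairs)))
    return numerator / denominator if denominator else 0.0

def bit_error_rate(w, w_ext):
    """Fraction of bit positions where the two binary images differ."""
    pairs = [(a, b) for row_a, row_b in zip(w, w_ext)
             for a, b in zip(row_a, row_b)]
    return sum(a ^ b for a, b in pairs) / len(pairs)
```

Identical images give an NC of 1 and a BER of 0; each flipped bit lowers NC and raises BER by 1/(M × N).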
5.2.3 Payload
The data payload capacity is defined as the number of bits embedded in the audio signal within a unit of time. It is measured in bits per second and defined by the following equation:
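With the symbols defined next, this ratio (Eq. (18)) can be written as:

\[ P = \frac{B}{T} \tag{18} \]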
where P is the data payload, B is the number of embedded bits in the original audio signal, and T is the duration of the embedding in seconds. The recommended data payload capacity of an audio watermarking scheme is more than 20 bps [66].
According to Eq. (18), dividing the number of embedded bits in the original audio signal by the embedding duration in seconds gives a payload of 516.26 bps for the proposed scheme.
6 Discussion
In this section, we compare the experimental results of our proposed scheme with other digital audio watermarking schemes described in the literature. Because different authors use different metrics, the results of the imperceptibility analysis are not directly comparable, which makes it difficult to compare our scheme with others. The subjective listening test is significant for assessing the perceptual quality of the watermarked audio; however, the results can differ from one listener to another. Therefore, comparing the schemes in the literature with our scheme will not be completely accurate. Even so, Table 7 compares the values of SNR and ODG to give a better understanding of the imperceptibility performance of these schemes. Some of the values listed in Table 6 are average values for different types of audio mentioned in the recent literature. The proposed scheme presents an excellent inaudibility result because its SNR value is 81.43 dB, which is the highest value among the compared schemes. Moreover, because all the ODG values of the proposed watermarking scheme are near zero, the watermarked audio is close to the original audio file, which confirms the imperceptibility of the proposed scheme. Furthermore, the data payload capacity of the proposed method is 319.29 bps, which is a high payload rate compared with other proposed audio watermarking schemes.
Table 8 shows that the proposed scheme is robust against several types of attacks for different types of audio signals: its overall average BER across the attacks is the lowest among the compared schemes and is close to zero. Fig. 10 illustrates a sample audio signal after several types of attacks have been applied. Clearly, some types of attacks affect the audio signal more than others. Nevertheless, the analysis and results confirm the robustness of the proposed scheme, because the embedded image is extracted after these attacks with little degradation. In addition, the results of the proposed scheme meet the IFPI requirements for audio watermarking schemes. To conclude, the comparisons and analysis results confirm that the proposed scheme based on a DWT and Schur decomposition hybrid method meets the requirements of an audio watermarking scheme and performs well in comparison with other proposed schemes.
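As a reminder of how a robustness figure such as the BER is typically obtained, the following minimal sketch assumes the usual definition: the fraction of extracted watermark bits that differ from the embedded ones. The function name and the sample bit sequences are illustrative and are not taken from the experiments.

```python
def bit_error_rate(embedded_bits, extracted_bits):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit sequences must have the same length")
    if not embedded_bits:
        raise ValueError("bit sequences must not be empty")
    errors = sum(1 for a, b in zip(embedded_bits, extracted_bits) if a != b)
    return errors / len(embedded_bits)


# Illustrative data: one bit out of eight is flipped by a hypothetical attack.
original = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [1, 0, 1, 0, 0, 0, 1, 0]
ber = bit_error_rate(original, recovered)
```

A BER of zero means the watermark was recovered without error; values close to zero, as reported for the proposed scheme, indicate that only a small share of bits was corrupted by the attack.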
7 Conclusion and future works
In this research study, we propose a novel imperceptible and robust digital audio watermarking scheme based on a DWT and Schur decomposition hybrid method. First, we applied a two-level 2D-Haar DWT to the original audio signal: the first level segmented the input audio into four sub-bands, and the second level further segmented the second sub-band (HL1) into four new sub-bands. Second, we applied Schur decomposition to the second sub-band HL2 to decompose the sub-band's coefficient matrix into two matrices, U and S. Third, we embedded the watermark bits into the seventh LSB of the integer part of the diagonal coefficients of the S matrix. Finally, we reconstructed the watermarked audio by applying the inverse Schur decomposition and the inverse DWT.
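Two of the steps summarized above can be sketched in plain Python: the two-level 2D Haar decomposition that yields HL2, and the substitution of one bit in the integer part of a coefficient. This is a simplified illustration under stated assumptions, not the implementation used in the experiments. The sub-band naming convention, the zero-based bit position 6 for the "seventh LSB", the handling of negative coefficients through their magnitude, and all sample values are assumptions. The Schur step itself is omitted; the diagonal values below are placeholders standing in for the diagonal of S.

```python
def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on a matrix with even dimensions.

    Returns (LL, HL, LH, HH). Naming is an assumption: HL is the difference
    between horizontally adjacent samples, summed over vertical neighbours.
    """
    ll, hl, lh, hh = [], [], [], []
    for i in range(0, len(x), 2):
        rows = [[], [], [], []]
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # LL
            rows[1].append((a - b + c - d) / 2)  # HL
            rows[2].append((a + b - c - d) / 2)  # LH
            rows[3].append((a - b - c + d) / 2)  # HH
        for band, row in zip((ll, hl, lh, hh), rows):
            band.append(row)
    return ll, hl, lh, hh


def embed_bit(value, bit, position=0):
    """Set bit `position` (0 = least significant) of the integer part of |value|.

    The sign and the fractional part of the coefficient are preserved.
    """
    sign = -1.0 if value < 0 else 1.0
    magnitude = abs(value)
    integer = int(magnitude)
    fraction = magnitude - integer
    integer = (integer & ~(1 << position)) | (bit << position)
    return sign * (integer + fraction)


def extract_bit(value, position=0):
    """Read bit `position` of the integer part of |value|."""
    return (int(abs(value)) >> position) & 1


# Hypothetical 8x8 block standing in for a reshaped audio segment.
signal_matrix = [[float((3 * r + 5 * c) % 11) for c in range(8)] for r in range(8)]
_, hl1, _, _ = haar_dwt2(signal_matrix)  # first level: 4x4 sub-bands
_, hl2, _, _ = haar_dwt2(hl1)            # second level on HL1: 2x2 sub-bands

# Placeholder diagonal coefficients (not computed from hl2) and watermark bits.
s_diagonal = [37.25, -12.5, 8.0]
bits = [1, 0, 1]
marked = [embed_bit(v, b, position=6) for v, b in zip(s_diagonal, bits)]
recovered = [extract_bit(v, position=6) for v in marked]
```

In a fuller implementation, the Schur factors of HL2 would come from a numerical library routine (for example `scipy.linalg.schur`), the bits would be written into the diagonal of S, and HL2 would be rebuilt from the modified factors before the inverse DWT. Choosing a higher bit position trades imperceptibility for robustness, because a larger change to the coefficient is harder for an attack to erase but alters the signal more.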
The experimental findings and analysis results show that the proposed watermarking scheme is robust against common types of attacks such as Gaussian noise, re-quantization, re-sampling, low-pass filtering, high-pass filtering, echo, MP3 compression, and cropping. The ODG, SDG, and SNR tests confirm the imperceptibility of the proposed audio watermarking scheme because the average ODG score (0.18) is very close to zero, the average SDG score (4.78) is very close to 5, and the average SNR (81.43) meets the IFPI requirements for audio watermarking schemes. These tests indicate that the watermarked audio produced by the proposed scheme is perceptually indistinguishable from the original audio. Moreover, the capacity of the proposed scheme is high, with a data payload rate of 319.29 bps. Therefore, the results indicate that the proposed scheme is imperceptible and has a high payload capacity. In comparison with other recently proposed audio watermarking schemes, our proposed method offers a better balance among robustness, imperceptibility, and payload capacity.
The experiments show that the proposed watermarking scheme performs better than the state-of-the-art approaches in terms of imperceptibility, payload capacity, and robustness. Therefore, the proposed scheme can be applied to digital audio signals for content ownership identification, broadcast monitoring, publication monitoring, content authentication, copy control, and as an information carrier in different fields.
One main limitation of the proposed method is that, because of the DWT and Schur decompositions, the embedding process can embed only a watermark image that is small relative to the audio file size. Thus, our future work includes investigating how to increase the size of the watermark image. In addition, we may enhance the proposed scheme in terms of imperceptibility, payload capacity, and robustness. Furthermore, we plan to develop a secure watermarking scheme based on Schur decomposition and chaos theory for real-time applications, in order to maintain security and other important parameters. Finally, we plan to design and implement a watermarking scheme based on DWT and Schur decomposition for image and video signals.
References
Abd El-Samie FE (2009) An efficient singular value decomposition algorithm for digital audio watermarking. Int J Speech Technol 12:27–45. https://doi.org/10.1007/s10772-009-9056-2
Alanizy N, Alanizy A, Baghoza N, et al (2018) 3-Layer PC Text Security via Combining Compression, AES Cryptography 2LSB Image Steganography, Journal of Research in Engineering and Applied Sciences (JREAS), Vol. 3, No. 4, Pages: 118–124, October 2018
Alassaf N, Gutub A, Parah SA, Al Ghamdi M (2018) Enhancing Speed of SIMON: A Light-Weight-Cryptographic Algorithm for IoT Applications, Multimedia Tools and Applications: An International Journal - Springer, ISSN 1380–7501, DOI https://doi.org/10.1007/s11042-018-6801-z, Published online: 5 2018
Al-Haj A (2014) An imperceptible and robust audio watermarking algorithm. EURASIP J Audio, Speech, Music Proc 2014:1–12. https://doi.org/10.1186/s13636-014-0037-2
Al-Haj A (2014) A dual transform audio watermarking algorithm. Multimed Tools Appl 73:1897–1912. https://doi.org/10.1007/s11042-013-1645-z
Al-Haj A, Twal C, Mohammad A (2010) Hybrid DWT-SVD audio watermarking. 2010 5th Int Conf Digit Inf Manag ICDIM 2010:525–529. https://doi.org/10.1109/ICDIM.2010.5664651
Aljuaid N, Gutub A, Khan E (2018) Enhancing PC Data Security via Combining RSA Cryptography and Video Based Steganography, Journal of Information Security and Cybercrimes Research (JISCR), Vol. 1, No. 1, Pages: 8–18, Published by Naif Arab University for Security Sciences (NAUSS), June 2018
Attari AA, AsgharBeheshtiShirazi A (2017) Robust and Blind Audio Watermarking in Wavelet Domain. In: Proceedings of the International Conference on Graphics and Signal Processing - ICGSP ‘17. Singapore, Singapore, pp 69–73
Bansal N, Bansal A, Deolia V, Pathak P (2015) Comparative Analysis of LSB, DCT and DWT for Digital Watermarking. In: 2nd International Conference on Computing for Sustainable Global Development (INDIACom). Mathura, India, pp 40–45
Bassia P, Pitas I, Nikolaidis N (2001) Robust audio watermarking in the time domain. IEEE Trans Multimed 3:232–241
Bhat V, Sengupta KI, Das A (2010) An adaptive audio watermarking based on the singular value decomposition in the wavelet domain. Digit Sign Proc 20:1547–1558
Bhat V, Sengupta I, Das A (2011) An audio watermarking scheme using singular value decomposition and dither-modulation quantization. Multimed Tools Appl 52:369–383
Charfeddine M, El’Arbi M, Ben AC (2014) A new DCT audio watermarking scheme based on preliminary MP3 study. Multimed Tools Appl 70:1521–1557. https://doi.org/10.1007/s11042-012-1167-0
Choudhary A, Chauhan S (2012) Schur decomposition and dither modulation: an efficient and robust audio watermarking technique. Proc CUBE:744–748. https://doi.org/10.1145/2381716.2381858
Deb K, Rahman MA, Sultana KZ et al (2014) DCT and DWT based robust audio watermarking scheme for copyright protection. J Korea Inst Sign Proc Syst 15:1–9
Deokar SM, Dhaigude B (2015) Blind Audio Watermarking Based On Discrete Wavelet and Cosine Transform. In: 2015 International Conference on Industrial Instrumentation and Control (ICIC). Pune, pp 264–268
Dhar PK, Shimamura T (2017) Blind audio watermarking in transform domain based on singular value decomposition and exponential-log operations. Radioengineering 26:552–561. https://doi.org/10.13164/re.2017.0552
Dong L, Yan Q, Lv Y, Deng S (2017) Full band watermarking in DCT domain with Weibull model. Multimed Tools Appl 76:1983–2000. https://doi.org/10.1007/s11042-015-3115-2
Dutt S (2011) A DWT-HAAR based audio watermarking algorithm. 416–419
Elgamal AF (2013) Block-based watermarking for color images using DCT and DWT. Int J Comput Appl 66:33–40
feng LJ, Wang HX, Wu T et al (2017) Norm ratio-based audio watermarking scheme in DWT domain. Multimed Tools Appl:1–17. https://doi.org/10.1007/s11042-017-5024-z
Full EWK, Metkar S, Kamble HC (2014) Image watermarking by SCHUR decomposition. Int J Inf Comput Technol 4:1155–1159
Ganic E, Eskicioglu AM (2004) Robust DWT-SVD domain image watermarking. Proc 2004 Multimed Secur Work Multimed Secur - MM&Sec ‘04:166. https://doi.org/10.1145/1022431.1022461
Gutub A (2017) Counting-Based Secret Sharing Technique for Multimedia Applications, Multimedia Tools and Applications: An International Journal - Springer, ISSN 1380–7501, DOI https://doi.org/10.1007/s11042-017-5293-6, Published online: 2 November 2017
Hemalatha S, Acharya UD, Renuka A, Kamath PR (2012) A novel color image steganography using Discrete Wavelet Transform. Proc Second Int Conf Comput Sci Eng Inf Technol - CCSEIT ‘12:223–226. https://doi.org/10.1145/2393216.2393254
Hsu LY, Hu HT (2015) Blind image watermarking via exploitation of inter-block prediction and visibility threshold in DCT domain. J Vis Commun Image Represent 32:130–143. https://doi.org/10.1016/j.jvcir.2015.07.017
Hu HT, Hsu LY (2015) Robust, transparent and high-capacity audio watermarking in DCT domain. Signal Process 109:226–235. https://doi.org/10.1016/j.sigpro.2014.11.011
Hu HT, Hsu LY (2016) A DWT-based rational dither modulation scheme for effective blind audio watermarking. Circuits, Syst Sign Proc 35:553–572. https://doi.org/10.1007/s00034-015-0074-9
Hu HT, Hsu LY (2017) Incorporating spectral shaping filtering into DWT-based vector modulation to improve blind audio watermarking. Wirel Pers Commun 94:221–240. https://doi.org/10.1007/s11277-016-3178-z
Hu HT, Hsu LY, Chou HH (2014) Variable-dimensional vector modulation for perceptual-based DWT blind audio watermarking with adjustable payload capacity. Digit Sign Process A Rev J 31:115–123. https://doi.org/10.1016/j.dsp.2014.04.014
Hu HT, Hsu LY, Chou HH (2014) Perceptual-based DWPT-DCT framework for selective blind audio watermarking. Signal Process 105:316–327. https://doi.org/10.1016/j.sigpro.2014.05.003
Hu HT, Chen SH, Hsu LY (2014) Incorporation of perceptually energy-compensated qim into dwt-dct based blind audio watermarking. In: Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing. Kitakyushu, Japan, pp 748–752
Hu HT, Hsu LY, Garcia-Alfaro J (2015) Exploring DWT-SVD-DCT feature parameters for robust multiple watermarking against JPEG and JPEG2000 compression. Comput Electr Eng 41:52–63. https://doi.org/10.1016/j.compeleceng.2014.08.001
Hua G, Huang J, Shi YQ et al (2016) Twenty years of digital audio watermarking—a comprehensive review. Signal Process 128:222–242. https://doi.org/10.1016/j.sigpro.2016.04.005
Kabal P (2003) An Examination and Interpretation of ITU-R BS. 1387: Perceptual Evaluation of Audio Quality
Karajeh H, Maqableh M (2018) An imperceptible , robust , and high payload capacity audio watermarking scheme based on the DCT transformation and Schur decomposition. Analog Integr Circ Sig Process. https://doi.org/10.1007/s10470-018-1332-0
Kaur N, Kaur U (2013) Audio watermarking using Arnold transformation with DWT-DCT. Int J Comput Sci Eng 2:286–294
Kaur A, Dutta MK, Soni KM, Taneja N (2017) Localized & self adaptive audio watermarking algorithm in the wavelet domain. J Inf Secur Appl 33:1–15. https://doi.org/10.1016/j.jisa.2016.12.003
Kavadia C, Lodha A (2013) A review on spatial & transform domain digital watermarking techniques. Int J Adv Res Comput Sci 4:20–22
Lei B, Soon IY, Zhou F et al (2012) A robust audio watermarking scheme based on lifting wavelet transform and singular value decomposition. Signal Process 92:1985–2001
Li D, Ji Y, Kim J (2011) A Quantified Audio Watermarking Algorithm Based on DWT-DCT. In: Communications in Computer and Information Science. Springer, Berlin, Heidelberg, pp 339–340
Lie W-N, Chang L-C (2006) Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification. IEEE Trans Multimed 8:46–59
Liu J, She K (2012) A hybrid approach of DWT and DCT for rational dither modulation watermarking. Circuits Syst Sign Proc 31:797–811. https://doi.org/10.1007/s00034-011-9331-8
Looperman Pro Audio Resources Community Forums, https://www.looperman.com, Accessed Date: 15 July, 2017
Maha C, Maher E, Mohamed K, Chokri BA (2010) DCT Based blind audio watermarking scheme. In: 2010 International Conference on Signal Processing and Multimedia Applications (SIGMAP). Athens, Greece, pp 139–144
Meenakshi K, Rao CS, Prasad KS (2014) A fast and robust hybrid watermarking scheme based on Schur and SVD transform. Int J Res Eng Technol 3:7–11
Mohammad AA (2012) A new digital image watermarking scheme based on Schur decomposition. Multimed Tools Appl 59:851–883. https://doi.org/10.1007/s11042-011-0772-7
Mohan BC, Swamy KV (2010) On the use of Schur decomposition for copyright protection of digital images. Int J Comput Electr Eng 2:781–787
Mohan B, Swamy K, Kumar S (2011) A Comparative performance evaluation of SVD and Schur Decompositions for Image Watermarking. IJCA Proc Int Conf VLSI … 25–30
Özer H, Sankur B, Memon N (2005) An SVD-based audio watermarking technique. Proc 7th Work Multimed Secur - MM&Sec ‘05:51. https://doi.org/10.1145/1073170.1073180
Pattanshetti P, Dongaonkar S, Karpe S (2015) Digital watermarking in audio using least significant bit and discrete cosine transform. Int J Comput Sci Inf Technol 6:3688–3692
Ponni Alias Sathya S, Ramakrishnan S (2018) Fibonacci based key frame selection and scrambling for video watermarking in DWT–SVD domain. Wirel Pers Commun. https://doi.org/10.1007/s11277-018-5252-1
Rajab L, Al-khatib T, Al-haj A (2015) A blind DWT-SCHUR based digital video watermarking technique. J Softw Eng Appl 8:224–233
Razafindradina HB, Razafindrakoto NR, Randriamitantsoa PA (2013) Improved watermarking scheme using discrete cosine transform and Schur decomposition. IJCSN - Int J Comput Sci Netw 02:25–31
Roy S, Sarkar N, Chowdhury AK, Iqbal SMA (2015) An efficient and blind audio watermarking technique in DCT domain. In: 2015 18th International Conference on Computer and Information Technology, ICCIT. Dhaka, pp 362–367
Šego V (2014) The hyperbolic Schur decomposition. Linear Algebra Appl 440:90–110. https://doi.org/10.1016/j.laa.2013.10.037
Singh D, Singh SK (2017) DWT-SVD and DCT based robust and blind watermarking scheme for copyright protection. Multimed Tools Appl 76:13001–13024. https://doi.org/10.1007/s11042-016-3706-6
Su Q, Niu Y, Liu X, Zhu Y (2012) Embedding color watermarks in color images based on Schur decomposition. Opt Commun 285:1792–1802. https://doi.org/10.1016/j.optcom.2011.12.065
Subir JAM (2016) DWT-DCT based blind audio watermarking using Arnold scrambling and Cyclic codes. In: 3rd International Conference on Signal Processing and Integrated Networks (SPIN). Noida, pp 79–84
Suresh G, Lalitha NV, Srinivasa Rao C, Sailaja V (2012) An efficient and simple audio watermarking using DCT-SVD. 2012 Int Conf devices. Circuits Syst ICDCS 2012:177–181. https://doi.org/10.1109/ICDCSyst.2012.6188699
Thind DK, Jindal S (2015) A semi blind DWT-SVD video watermarking. Procedia Comput Sci 46:1661–1667. https://doi.org/10.1016/j.procs.2015.02.104
Tsai H-H, Cheng J-S, Yu P-T (2003) Audio watermarking based on HAS and neural networks in DCT domain. EURASIP J Adv Sign Proc 2003:252–263. https://doi.org/10.1155/S1110865703208027
Wang H, Nishimura R, Suzuki Y, Mao L (2008) Fuzzy self-adaptive digital audio watermarking based on time-spread echo hiding. Appl Acoust 69:868–874
Wang X-Y, Niu P-P, Yang H-Y (2009) A robust digital audio watermarking based on statistics characteristics. Pattern Recogn 42:3057–3064
Wang X, Wang P, Zhang P, HY SX (2013) Norm-space, adaptive, and blind audio watermarking algorithm by discrete wavelet transform. Signal Process 93:913–922
Wu S, Huang J, Huang D, Shi YQ (2005) Efficiently self-synchronized audio watermarking for assured audio data transmission. IEEE Trans Broadcast 51:69–76
Xiang S (2011) Audio watermarking robust against D/A and A/D conversions. EURASIP J Adv Sign Proc 2011:3. https://doi.org/10.1186/1687-6180-2011-3
Xiang S, Huang J (2007) Histogram-based audio watermarking against time-scale modification and cropping attacks. IEEE Trans Multimed 9:1357–1372
Yang Y, Lei M, Liu X et al (2016) Novel zero-watermarking scheme based on DWT- DCT. China Commun 13:122–126
Zear A, Singh AK, Kumar P (2016) A proposed secure multiple watermarking technique based on DWT, DCT and SVD for application in medicine. Multimed Tools Appl:1–20. https://doi.org/10.1007/s11042-016-3862-8
Zhang J (2015) Audio dual watermarking scheme for copyright protection and content authentication. Int J Speech Technol 18:443–448. https://doi.org/10.1007/s10772-015-9287-3
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Karajeh, H., Khatib, T., Rajab, L. et al. A robust digital audio watermarking scheme based on DWT and Schur decomposition. Multimed Tools Appl 78, 18395–18418 (2019). https://doi.org/10.1007/s11042-019-7214-3