1 Introduction

Advances in digital multimedia technology allow digital audio files to be stored, transferred, broadcast, and reproduced easily and efficiently [57]. However, these advances have also given rise to issues such as copyright protection and digital rights management [4, 17]. Traditional cryptographic algorithms protect audio files by hiding their content and allowing only authorized users to read and update them [67]. Unfortunately, once an audio file has been decrypted, it is no longer protected against copyright infringement.

A promising solution for the copyright problem is to hide secret and imperceptible information in audio files [6]. Hiding secret information within digital contents is one of the main challenges for the multimedia industry [5]. Steganography and digital watermarking have been used to hide information inside digital signals [48]. In steganography, the presence of hidden information cannot be detected or known, whereas in watermarking it does not matter if the hidden information is detected or known [34]. Moreover, the hidden information in watermarking should not be disclosed or removed from the digital multimedia contents.

Digital watermarking refers to the art and science of hiding information within a digital signal (e.g., image, audio, and video) [47, 58]. During the last few years, the importance of digital watermarking has led to many research studies [67]. Many applications of digital watermarking, such as broadcast monitoring, owner identification, transaction tracking, and copy control, are related to copyright management and protection [11]. In general, every watermarking system should be robust and effective, with minimum degradation, a high payload size, and very low false positive rates. Security is one of the main areas that needs attention in many multimedia applications [2, 3, 7, 24].

Effective audio watermarking schemes must satisfy four main requirements: imperceptibility, security, robustness, and high payload size [22, 48]. If watermarked audio is imperceptible, the quality of the audio signal before and after embedding the watermark remains the same when measured objectively and subjectively. The International Federation of the Phonographic Industry has recommended that watermarked audio signals have a signal-to-noise ratio of more than 20 dB (decibel) [66]. Furthermore, a watermarked audio signal is secure if obtaining information about the embedded watermark in the signal is impossible [4]. The robustness of watermarked audio refers to the ability to extract the embedded watermark from the audio signal after different types of attacks have been applied [38]. Finally, the payload size of watermarked audio refers to the amount of data that can be embedded per unit of time into the audio signal without losing imperceptibility, which must be more than 20 bits per second (bps) [21]. Maintaining these requirements in an audio watermarking scheme is a major challenge because doing so involves a trade-off [12]. For instance, increasing the number of embedded bits (high payload size) in the audio signals would decrease the quality of the watermarked signal and the scheme’s robustness.

Because the human auditory system is more sensitive and has a wider dynamic range than the human visual system, audio watermarking techniques are considered much more challenging than image and video watermarking techniques [6, 14]. Moreover, image and video signals are multidimensional, whereas audio signals are one-dimensional [12]. Therefore, the amount of information that can be embedded in an image or video is much greater than that which can be embedded in audio. Consequently, the number of research articles related to digital audio watermarking is much smaller than the number related to image and video watermarking.

The rest of this article is organized as follows. In the next section, we present a review of the literature related to this research. We then discuss details of the DWT and Schur decomposition, followed by our proposed watermarking scheme. We next present our performance analysis and experimental results regarding the requirements of the audio watermarking scheme. Finally, we present our conclusions.

2 Literature review

In general, digital audio watermarking techniques can be performed in the time domain [10, 42, 63, 68] or in the transform domain [13, 45, 55, 64, 65] of the audio signal. In time-domain watermarking techniques, the watermark is embedded directly in the host audio signal. The most popular technique in this domain is the least significant bit (LSB) technique, which modifies the LSB of the original samples [51]. Although time-domain watermarking techniques are easy to implement and computationally inexpensive, they are vulnerable to intentional or unintentional audio signal-processing attacks [39].

In contrast, transform-domain watermarking techniques embed the watermark in the transform coefficients, so that it can be distributed over different bands of the transform. This leads to less degradation of the original audio signal and makes watermark removal more difficult for attackers [46]. Therefore, transform-domain watermarking techniques are considered more robust and more imperceptible than time-domain watermarking techniques. Many transformations have been widely used in the literature; the most popular ones are the discrete wavelet transform (DWT), discrete Fourier transform (DFT), discrete cosine transform (DCT), and singular value decomposition (SVD) [8, 15, 16, 17, 27, 29, 30, 36, 40, 71].

Recently, more robust and imperceptible watermarking has been achieved by applying a hybrid transform to the audio signal before embedding the watermark, such as DWT-DCT [31, 37, 41, 43], DWT-STFT [50], DWT-SVD [23, 60, 61], and DWT. Moreover, recent studies have used Schur decomposition in image and video watermarking techniques [14, 22, 53]. This is proving to be a very promising research area that can achieve results competitive with those of other transform domains, giving robust and imperceptible watermarking results. Several image and video watermarking techniques have combined Schur decomposition with other transforms, such as DWT, SVD, and DCT [47, 48, 49, 53, 54]. These combinations have resulted in techniques that are robust against a set of image and video attacks and that extract the watermark with minimum degradation and high computational speed.

Schur decomposition improves the efficiency of a technique because it needs less computational time than similar decomposition transforms. Research studies comparing the performance of SVD and Schur decomposition, both of which are mathematical tools for matrix analysis [22, 49], have found that Schur watermarking techniques are computationally faster than SVD-based ones. Schur decomposition is the main intermediate step in SVD, and SVD is reported to require about three times the computational effort of Schur decomposition [47, 58].

In 2010, a blind watermarking technique based on SVD and dither modulation quantization was proposed [12]. The dither modulation quantization was used to embed the watermark bits into the singular values of the audio signal blocks. In 2012, an audio watermarking technique based on Schur decomposition and dither modulation was proposed [14]. It divided the audio signal into nonoverlapping frames that were converted into 2D matrix blocks, after which dither modulation quantization was used to embed the watermark into the singular values of the blocks. The analysis results showed the proposed technique to be efficient and robust against various attacks, with a high computational speed.

Another research group proposed a semi-blind audio watermarking technique based on SVD [50]. They used short-time Fourier transform (STFT) to convert the audio into a matrix to which they applied the SVD operation, then embedded the watermarking bits by modifying the SVD coefficients adaptively based on quantization of the norm of singular value of STFT audio signal blocks.

In 2014, a semi-blind audio watermarking technique based on a hybrid DWT-SVD transform was proposed [4]. This technique scatters the watermark bits in the transformed audio in a way that achieves a high degree of imperceptibility and robustness. Four-level DWT decomposition is applied to each audio signal block to form a unique distributed matrix of all the detail sub-bands. The SVD operation is then applied to the DWT-coefficient matrix to embed the watermark bits in the off-diagonal positions of the singular-value matrix.

Most proposed audio watermarking schemes are designed to satisfy requirements such as imperceptibility, payload capacity, and robustness. Maintaining robustness, imperceptibility, payload rate, and performance at the same time remains a major challenge, because these requirements involve a trade-off [22]. For instance, increasing the number of embedded bits (high payload size) in the audio signals would decrease the quality of the watermarked signal and the scheme’s robustness. In this paper, we present a new audio watermarking scheme based on a hybrid discrete wavelet transform (DWT) and Schur decomposition method as a solution to this problem. The proposed scheme aims to achieve a balance between these characteristics. Schur decomposition improves watermarking efficiency because it has low complexity and needs less computational time than similar decomposition transforms such as SVD. Moreover, Schur decomposition increases the perceptual transparency of the proposed audio watermarking scheme, whereas the DWT increases its robustness by effectively resisting several types of audio signal attacks such as low-pass filtering, high-pass filtering, noise addition, cropping, compression, resampling, and re-quantization. In addition, changing the DWT coefficients slightly does not affect the signal quality. Therefore, the proposed hybrid scheme achieves a good trade-off between distortion and robustness.

3 DWT and Schur decomposition

In this research study, we propose creating an audio watermarking scheme based on DWT and Schur decomposition. Thus, in the following subsections, we describe the DWT and Schur decomposition method in detail.

3.1 Discrete wavelet transform (DWT)

The discrete wavelet transform (DWT) is a special linear transformation that can decompose a signal into a set of orthogonal and spatially oriented frequency channels called wavelets [32]. These wavelets are the basis functions for representing signals; they are produced by dilation and translation of a single common wavelet called the mother wavelet [25].

One of the most common DWT basis functions is the Haar set of wavelets. DWT with Haar is very helpful in signal representation because it splits the signal into several subbands in a frequency domain [19]. In one-dimensional DWT (1D DWT), low-pass and high-pass filters decompose the signal into two sub-bands: the father wavelet ϕ and mother wavelet ψ. However, in two-dimensional DWT (2D DWT), an extension of the 1D case, the father wavelet ϕ and mother wavelet ψ of the signal 1D DWT are decomposed into four sub-bands: LL, LH, HL, and HH.

The sub-band ϕ(x, y) or (LL) is the 2D father wavelet of the decomposed signal. This sub-band holds the average (approximated) component of the signal and is given by the following equation:

$$ \phi \left(x,y\right)=\phi (x)\phi (y) $$
(1)

The sub-bands LH, HL, and HH represent the horizontal details signal \( {\psi}^H\left(x,y\right) \), the vertical details signal \( {\psi}^V\left(x,y\right) \), and the diagonal details signal \( {\psi}^D\left(x,y\right) \), respectively [20]. These sub-bands are the 2D mother wavelets of the decomposed signal and are given by the following equations:

$$ {\psi}^H\left(x,y\right)=\psi (x)\phi (y) $$
(2)
$$ {\psi}^V\left(x,y\right)=\phi (x)\ \psi (y) $$
(3)
$$ {\psi}^D\left(x,y\right)=\psi (x)\ \psi (y) $$
(4)

Fig. 1 shows the decomposition of the two-dimensional signal by the 2D DWT into four sub-bands.

Fig. 1 Sub-band decomposition of the 2D DWT

DWT can decompose the signal by more than one level. Fig. 2 illustrates how the two-dimensional signal is decomposed first into four sub-bands in the first level 2D DWT decomposition [52]. After that, the DWT can further decompose the sub-bands to obtain new sub-bands in the second level. In the example given here, the 2D DWT decomposes the horizontal details wavelet into four sub-bands in the same way as in the first level.

Fig. 2 Two-level 2D DWT sub-band decomposition
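The decomposition described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, the orthonormal Haar filters are assumed, and the LH/HL labels follow one common convention (first letter refers to the filter applied along the rows), which other texts swap.

```python
import math


def haar_1d(values):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def filter_columns(block):
    """Apply haar_1d down every column; return (low, high) halves."""
    lows, highs = [], []
    for col in transpose(block):
        a, d = haar_1d(col)
        lows.append(a)
        highs.append(d)
    return transpose(lows), transpose(highs)


def haar_2d(matrix):
    """One level of the separable 2D Haar DWT (rows first, then columns)."""
    row_low, row_high = [], []
    for row in matrix:
        a, d = haar_1d(row)
        row_low.append(a)
        row_high.append(d)
    ll, lh = filter_columns(row_low)    # low-pass along rows
    hl, hh = filter_columns(row_high)   # high-pass along rows
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


# A second level is obtained by applying haar_2d again to one sub-band.
sample = [[float(4 * r + c) for c in range(4)] for r in range(4)]
level1 = haar_2d(sample)
level2 = haar_2d(level1["HL"])  # which band is decomposed further is a design choice
```

Each call halves both dimensions, so the input needs an even number of rows and columns at every level.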

Discrete wavelet transforms are extremely helpful in signal analysis because they serve as powerful tools for localizing and analyzing information [70]. Therefore, many engineering and computer science applications use DWT because it represents signal information well. Digital watermarking is one of the main fields in which DWT has been used [33]. The watermarking embedding becomes easier after analyzing areas in the signal’s DWT wavelets; hence, DWT is useful in image, video, and audio watermarking schemes. Although the audio signal is one-dimensional, the proposed scheme reshapes the input audio signals into a two-dimensional representation to apply the two-level 2D DWT rather than the 1D DWT.
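The text does not spell out how the one-dimensional channel is reshaped, so the following is only a sketch under stated assumptions: samples are filled row by row into the largest square matrix whose side is a multiple of 4 (so that two Haar levels divide evenly and the resulting sub-band is square), and leftover samples are set aside unchanged. The function names are hypothetical.

```python
import math


def reshape_to_square(samples, multiple=4):
    """Row-major reshape of a 1D sample list into a square matrix.

    Assumption: the side is the largest multiple of `multiple` whose square
    fits in the signal; trailing samples are returned separately.
    """
    side = math.isqrt(len(samples))
    side -= side % multiple
    if side == 0:
        raise ValueError("not enough samples for a square matrix")
    used = side * side
    matrix = [list(samples[r * side:(r + 1) * side]) for r in range(side)]
    return matrix, list(samples[used:])


def flatten(matrix, remainder):
    """Inverse of reshape_to_square: back to a flat sample list."""
    return [x for row in matrix for x in row] + list(remainder)
```

Keeping the remainder makes the reshape exactly invertible, which matters when the watermarked matrix is converted back into an audio vector.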

3.2 Schur decomposition

Schur decomposition, or Schur triangulation, is an important mathematical tool in linear algebra used in matrix analysis. This decomposition comes in two versions: the real Schur decomposition and the complex Schur decomposition [56]. Schur decomposition applies to an n × n matrix; in other words, the matrix must be square.

Accordingly, given that A is a real square matrix, the real Schur decomposition of A is given by the following expression [47, 58]:

$$ Schur(A)=\left(U\times S\times {U}^T\right) $$
(5)

where U is an orthogonal matrix, U^T is the transpose of U, and S is an upper block-triangular matrix called the real Schur form. The eigenvalues of S are the same as those of matrix A. Schur decomposition is used to compute matrix exponentials because the orthogonal matrix makes computing matrix functions easier and less complicated. It is also used to compute the eigenvalues of nonsymmetric matrices.
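As a small worked illustration of Eq. (5), consider the following nonsymmetric 2 × 2 matrix. It is chosen only for this example and does not come from the experiments. Its eigenvalue 3 is repeated, with unit eigenvector \( \frac{1}{\sqrt{2}}{\left(1,-1\right)}^T \), which forms the first column of U:

$$ A=\begin{pmatrix}2&-1\\1&4\end{pmatrix},\quad U=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\-1&1\end{pmatrix},\quad S={U}^TAU=\begin{pmatrix}3&-2\\0&3\end{pmatrix} $$

Here U is orthogonal, S is upper triangular with the eigenvalues of A on its diagonal, and multiplying back gives \( U\times S\times {U}^T=A \).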

4 Proposed audio watermarking scheme

In this section, we describe the details of the proposed audio watermarking scheme. The proposed scheme’s design, based on a DWT and Schur decomposition hybrid method, consists of two main procedures: embedding and extraction. The following subsections describe these procedures in detail.

4.1 Watermarking embedding procedure

The embedding procedure applies the DWT and Schur decomposition to the input audio signal. First, the proposed embedding scheme applies two levels of 2D-Haar DWT to the input original audio signal. Schur decomposition is then applied on the HL2 sub-band, which gives two matrices (U and S). The foreground bits of a binary watermarking image are embedded in the diagonal coefficients of triangular matrix S. The block diagram in Fig. 3 shows the procedure and describes the detailed steps.

Step 1 (preprocessing): Extract the first channel A of the input stereo wave audio file and then convert it into a 2D matrix (A2D).

Step 2 (2D-Haar DWT): Apply two-level 2D-Haar DWT to A2D. This operation generates seven DWT sub-bands (LL1; [LL2, HL2, LH2, HH2]; LH1; HH1). Each sub-band is a matrix of DWT coefficients at a specific resolution. Fig. 4 shows the sub-bands produced by the two-level DWT decomposition (see Eq. (6)).

Fig. 3 Two-level 2D-Haar DWT-Schur watermarking embedding procedure

Fig. 4 Frames of two-level DWT sub-bands

$$ {A}_{dwt}= DWT\left({A}_{2D}\right) $$
(6)
Step 3 (Schur): Apply Schur decomposition to the HL2 sub-band generated in the previous step. Schur decomposition decomposes the HL2 matrix into two independent matrices (U and S) (see Eq. (7)).

$$ Schur\left({HL}_2\right)={U}_{HL_2}\times {S}_{HL_2} $$
(7)
Step 4 (input watermark image): The input watermark image is converted into a binary image w (see Eq. (8)).

$$ {w}_i\in \left\{0,1\right\},\quad 1\le i\le M\times N $$
(8)
Step 5 (watermark preprocess): Convert the watermark image w into a vector and extract the foreground binary bits (wsi).

Step 6 (embedding): Embed the binary bits of the watermark image (wsi) into the matrix \( {S}_{HL_2} \) of the HL2 sub-band by replacing the seventh LSB of the integer part of each diagonal coefficient in \( {S}_{HL_2} \) with a watermark bit wsi (see Eq. (9)).

$$ {S}_{HL_2}^{\prime}\left(i,i\right)= BITAND\left( LSB\left({S}_{HL_2}\left(i,i\right)\right),{w}_{si}\right) $$
(9)
Step 7 (audio reconstruction):

a) Schur inverse: Apply the inverse of the Schur operator to the modified matrix \( {S}_{HL_2}^{\prime } \) to generate the modified coefficient matrix \( {HL}_2^{\prime } \) (see Eq. (10)).

$$ {HL}_2^{\prime }={U}_{HL_2}\times {S}_{HL_2}^{\prime}\times {U}_{HL_2}^T $$
(10)

b) DWT inverse: Apply the inverse of the two-level 2D-Haar DWT, using the modified coefficient matrix \( {HL}_2^{\prime } \) in place of HL2 (see Eq. (11)).

$$ {A}_{2D}^{\prime }=\mathrm{DWT}\ \mathrm{Inverse}\ \left(\left[{\mathbf{LL}}_{\mathbf{1}};\left[{\mathbf{LL}}_{\mathbf{2}},{\mathbf{HL}}_{\mathbf{2}}^{\prime },{\mathbf{LH}}_{\mathbf{2}},{\mathbf{HH}}_{\mathbf{2}}\right];{\mathbf{LH}}_{\mathbf{1}};{\mathbf{HH}}_{\mathbf{1}}\right]\right) $$
(11)
c) Reconstruction: Convert \( {A}_{2D}^{\prime } \) back into a vector and then combine it with the second channel of the stereo audio to generate the final watermarked audio \( {A}_{org}^{\prime } \).
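The bit substitution in Step 6 can be sketched as follows. This is an illustration under assumptions rather than the authors' code: it follows the prose of Step 6 (replace one bit of the integer part) rather than the BITAND form of Eq. (9), it assumes the bit is written into the magnitude while the sign and fractional part are kept, and the names `embed_bit` and `embed_bits_on_diagonal` are hypothetical.

```python
import math


def embed_bit(coefficient, bit, position=7):
    """Replace one bit of the integer part of a coefficient with `bit`.

    `position` counts from 1 at the least significant bit; the default 7
    mirrors the "seventh LSB" of Step 6. Sign and fraction are preserved.
    """
    sign = -1.0 if coefficient < 0 else 1.0
    magnitude = abs(coefficient)
    integer_part = int(math.floor(magnitude))
    fraction = magnitude - integer_part
    mask = 1 << (position - 1)
    integer_part = (integer_part & ~mask) | (mask if bit else 0)
    return sign * (integer_part + fraction)


def embed_bits_on_diagonal(s_matrix, bits, position=7):
    """Write one bit into each diagonal entry S(i, i) of a copy of S."""
    out = [row[:] for row in s_matrix]
    for i, bit in enumerate(bits):
        out[i][i] = embed_bit(out[i][i], bit, position)
    return out
```

Because only one bit of the integer part changes, each modified coefficient moves by at most 2^(position − 1) in magnitude; a lower position disturbs the signal less but is easier to destroy.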

4.2 Watermarking extraction procedure

Given the watermarked audio signal, the watermarking can be extracted according to the extraction procedure described in this section. Fig. 5 demonstrates the extraction procedure.

Step 1 (preprocessing): Input the watermarked audio file \( {A}_{org}^{\prime } \). The first channel of the watermarked signal is extracted and transformed into the 2D matrix \( {A}_{2D}^{\prime } \).

Step 2 (2D-Haar DWT): Apply the two-level 2D-Haar DWT to \( {A}_{2D}^{\prime } \) (see Eq. (12)), which gives the seven sub-bands \( \left(\left[{\mathbf{LL}}_{\mathbf{1}};\left[{\mathbf{LL}}_{\mathbf{2}},{\mathbf{HL}}_{\mathbf{2}}^{\prime },{\mathbf{LH}}_{\mathbf{2}},{\mathbf{HH}}_{\mathbf{2}}\right];{\mathbf{LH}}_{\mathbf{1}};{\mathbf{HH}}_{\mathbf{1}}\right]\right) \).

Fig. 5 DWT-Schur watermarking extraction procedure

$$ {A}_{dwt}^{\prime }= DWT\left({A}_{2D}^{\prime}\right) $$
(12)
Step 3 (Schur): Apply the Schur transform to the \( {HL}_2^{\prime } \) sub-band. The Schur transform decomposes the sub-band’s coefficient matrix into two independent matrices:

$$ Schur\left({HL}_2^{\prime}\right)={U}_{HL_2}^{\prime}\times {S}_{HL_2}^{\prime } $$
(13)
Step 6 (extraction): Extract the embedded watermark bits from the diagonal elements of the triangular matrix \( {S}_{HL_2}^{\prime } \) as follows:

$$ {w}_{si(ext)}(i)= LSB\left({S}_{HL_2}^{\prime}\left(i,i\right)\right) $$
(14)

Step 7 (watermark reconstruction): Reconstruct the watermark image \( {w}_{si(ext)} \) by cascading the watermark bits extracted from the watermarked audio file.
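The extraction and reconstruction steps above can be sketched as follows. The sketch assumes the bit sits at the same position of the integer part that was used during embedding and that the extracted bits fill the image row by row; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def extract_bit(coefficient, position=7):
    """Read one bit of the integer part of a coefficient (1 = LSB)."""
    integer_part = int(math.floor(abs(coefficient)))
    return (integer_part >> (position - 1)) & 1


def extract_bits_from_diagonal(s_matrix, count, position=7):
    """Read `count` bits from the diagonal entries S(i, i)."""
    return [extract_bit(s_matrix[i][i], position) for i in range(count)]


def bits_to_image(bits, width):
    """Cascade a flat bit list into rows of `width` pixels."""
    return [bits[r:r + width] for r in range(0, len(bits), width)]
```

The `position` argument must match the one used at embedding time; otherwise the recovered bits are unrelated to the watermark.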

5 Results and analysis

This section presents an evaluation of our proposed DWT-Schur audio watermarking scheme. Based on the materials and conducted experiments, subsequent sections will discuss the different measurements used to evaluate the imperceptibility, robustness, and payload capacity of the proposed watermarking scheme.

5.1 Materials

Because different audio types have different perceptual properties, we tested the proposed watermarking scheme on various audio signal types: blues, classical, noise, jazz, vocal, and pop music. Specifically, the experiment used twelve audio files, a pair of audio files for each audio type. All the audio files were downloaded from the Looperman website [44]; they were selected from different genres and have the same duration (16 s). The audio files were stereo wave files, sampled at 44.1 kHz and quantized to 16 bits per sample. The watermark was embedded in the first channel. The embedded watermark image is a 120 × 60-pixel binary image, as Fig. 6 shows.

Fig. 6 The binary watermarking image

5.2 Experimental results and analysis

In this section, we report several experimental tests performed to evaluate the proposed watermarking scheme. We analyzed the imperceptibility, robustness, and payload capacity of the proposed DWT-Schur watermarking scheme for different audio types.

The imperceptibility test measures the perceptual quality, or perceptual transparency, of the watermarked audio. Because the original audio signal was available, we evaluated imperceptibility by comparing the watermarked signal with the original signal. Imperceptibility was measured subjectively using the subjective difference grade (SDG) and objectively using the signal-to-noise ratio (SNR) and the objective difference grade (ODG). The subjective evaluation is based on human listening and measures the degree of quality degradation in the watermarked audio file. SNR is a statistical difference metric that measures the noise produced by the embedded watermark by comparing the watermarked audio with the original; it gives a general indication of the imperceptibility of the proposed scheme. ODG measures the perceptual difference between the watermarked and original signals.

Robustness was tested to show the resistance of the embedded watermark image to several types of attacks, such as compression, re-sampling, and linear filtering, using the normalized correlation (NC) and the bit error rate (BER). Finally, the data payload of the proposed scheme was evaluated to measure the capacity for embedding data in an audio file by calculating the number of bits embedded in an audio signal.

5.2.1 Imperceptibility

Imperceptibility is also called the “perceptual quality” or “perceptual transparency” of the embedded watermarking in the original audio signals. The availability of the original audio signal provided the opportunity to evaluate the imperceptibility of the proposed watermarking scheme by comparing the watermarked signal with the original signal. Many studies have been conducted in this field to achieve a watermarking scheme with high perceptual transparency [1, 9, 18, 26, 32, 45, 51, 55, 59, 62, 69]. In this paper, we evaluated the imperceptibility of the proposed watermarking scheme both subjectively and objectively through intensive tests.

1) Subjective test

Because the host signal was audio, we assessed the perceptual quality (inaudibility) of the proposed DWT-Schur watermarking scheme subjectively with a human listening test. The subjective evaluation is based on human listening and measures the degree of quality degradation in the watermarked audio file. Ten participants listened to each pair of original and watermarked signals ten times and then reported the difference between the two audio signals. Based on ITU-R BS.1284, the participants chose the appropriate grade from five impairment grades varying from 5.0 to 1.0, signifying “imperceptible” to “very annoying,” as Table 1 shows. The average SDG over all participants is taken as each pair’s final grade.

Table 1 Impairment grades of the subjective and objective difference grades

Table 2 presents the average SDGs of each audio pair (original and watermarked) from all ten listening test participants. Because the average of the SDG results (4.78) is very close to 5, the listeners could hardly distinguish the watermarked audio from the original. Therefore, the results indicate that the proposed scheme is imperceptible.

Table 2 The ODG and SDG values of different audio signals

2) Objective test

The objective evaluation is based on calculating the signal-to-noise ratio (SNR) and the objective difference grade (ODG). We used both metrics to measure the proposed watermarking scheme’s imperceptibility objectively.

a) Signal-to-noise ratio (SNR)

SNR is a statistical difference metric used to measure the noise produced by the embedded watermark by comparing the watermarked audio with the original. It gives a general indication of the imperceptibility of the proposed scheme and is computed as in the following equation:

$$ SNR=10{\log}_{10}\left[\frac{\sum \limits_{i=1}^nA{(i)}^2}{\sum \limits_{i=1}^n{\left[A(i)-{A}^{\prime }(i)\right]}^2}\right] $$
(15)

where A is the original audio signal, A’ is the watermarked audio signal, and n is the number of samples in the audio. A higher SNR indicates higher audio quality resulting from less error noise. According to the International Federation of the Phonographic Industry (IFPI) standards, the SNR of an audio watermarking scheme should be above 20 dB to be imperceptible [66].
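Eq. (15) translates directly into code. The following is a minimal sketch in plain Python; the function name `snr_db` and the sample values in the usage lines are illustrative only and are not data from the experiments.

```python
import math


def snr_db(original, watermarked):
    """Signal-to-noise ratio of Eq. (15), in decibels."""
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, watermarked))
    if noise == 0:
        return math.inf  # identical signals: no embedding noise at all
    return 10.0 * math.log10(signal / noise)


# Illustrative values: one sample out of four is perturbed by 0.1.
original = [1.0, -1.0, 1.0, -1.0]
watermarked = [1.1, -1.0, 1.0, -1.0]
value = snr_db(original, watermarked)
meets_ifpi = value > 20.0  # IFPI recommendation quoted in the text
```

The zero-noise case is handled explicitly because the ratio in Eq. (15) is undefined when the two signals are identical.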

Table 3 shows the average of the SNR values for each audio type. The SNR values obtained with the proposed scheme are in a high range (74.22–85.99 dB), well above the minimum IFPI requirement (20 dB) [66]. The overall SNR average of the proposed scheme for the different audio types is 81.43 dB. Therefore, the proposed watermarking scheme is imperceptible.

Table 3 SNR, NC, and BER values for different audio signals

b) Objective difference grade

ODG is another metric used to evaluate the proposed scheme’s imperceptibility. Although the SNR gives a general indication of the scheme’s imperceptibility, it does not take the characteristics of the human auditory system into account. ODG measures the perceptual difference between the watermarked and original signals. To evaluate the ODG value, we used the perceptual evaluation of audio quality (PEAQ) scheme, which simulates the human auditory system based on ITU-R BS.1387. The PEAQ software, implemented by the TSP Lab of the Electrical and Computer Engineering Department at McGill University, gives ODG scores ranging from 0 to −4, where 0 indicates no perceptual difference between the two compared audios (original and watermarked) and −4 indicates that the degradation in the watermarked audio is very annoying [35]. An ODG score can nevertheless be greater than zero because it is derived from an artificial neural network that simulates the human auditory system.

Table 2 gives the ODG average of each audio type. The results show that the proposed watermarking scheme is imperceptible because the ODG scores are all near zero. The obtained ODG scores are higher than zero, which is outside the normal range (0 to −4). This is because the ODG score is calculated by artificial neural networks that simulate the human auditory system. To check this anomalous ODG, we ran the PEAQ test on two identical audio files; the ODG was again higher than zero, which indicates that the watermarked audio file produced by the proposed scheme is perceptually identical to the original audio file. Fig. 7 shows the amplitude of the audio samples over time in seconds for both the original and watermarked versions of the signal. Fig. 8 shows the amplitude of the audio samples over the sample number for both versions. In both figures the two signals are visually indistinguishable, which indicates that the watermarked audio will not irritate the listener and supports the conclusion that the proposed scheme is imperceptible.

Fig. 7 Original and watermarked signals. X-axis: time in seconds, Y-axis: amplitude of the samples

Fig. 8 Original and watermarked signals. X-axis: samples of the signal, Y-axis: the amplitude of the samples

5.2.2 Robustness results

Several signal processing operations performed on the watermarked audio may not directly affect the quality of the host audio, but they may affect the quality of the watermark image embedded within the audio. The robustness test shows the resistance of the embedded watermark image to several types of attacks, such as compression, re-sampling, and linear filtering. We therefore applied a set of common attacks to the watermarked audio and computed the normalized correlation (NC) and the bit error rate (BER) to test the robustness of the proposed watermarking scheme against these attacks. We measured the similarity between the original watermark image (w) and the extracted watermark image (w’) using the NC metric, calculated as follows:

$$ NC\left(w,{w}^{\prime}\right)=\frac{\sum \limits_{i=1}^M\sum \limits_{j=1}^Nw\left(i,j\right){w}^{\prime}\left(i,j\right)}{\sqrt{\sum \limits_{i=1}^M\sum \limits_{j=1}^Nw{\left(i,j\right)}^2}\sqrt{\sum \limits_{i=1}^M\sum \limits_{j=1}^N{w}^{\prime }{\left(i,j\right)}^2}} $$
(16)

where M and N are the dimensions of the binary watermark image. If w and w’ are almost identical, then NC is close to 1. However, if w and w’ differ substantially, NC will be close to zero.
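A minimal sketch of Eq. (16) in plain Python is given below. Images are assumed to be lists of rows holding 0/1 pixels, the function name is hypothetical, and returning 0.0 for an all-zero image is a choice made here because Eq. (16) is undefined in that case.

```python
import math


def normalized_correlation(w, w_ext):
    """Normalized correlation of Eq. (16) between two binary images."""
    num = sum(a * b for ra, rb in zip(w, w_ext) for a, b in zip(ra, rb))
    e1 = sum(a * a for row in w for a in row)
    e2 = sum(b * b for row in w_ext for b in row)
    if e1 == 0 or e2 == 0:
        return 0.0  # assumption: treat an all-zero image as uncorrelated
    return num / (math.sqrt(e1) * math.sqrt(e2))
```

For binary images, the numerator counts the foreground pixels the two images share, so NC drops as extracted foreground bits are lost or spurious ones appear.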

The BER metric measures the bit error rate between the original watermark image (w) and the extracted watermark image (w’), as Eq. (17) demonstrates. Thus, if the original watermark image is identical to the extracted watermark, then the BER is zero. Table 4 describes the common signal processing attacks applied to the watermarked audio signal.

$$ BER\left(w,{w}^{\prime}\right)=\frac{\sum \limits_{i=1}^M\sum \limits_{j=1}^Nw\left(i,j\right)\bigoplus {w}^{\prime}\left(i,j\right)}{M\times N} $$
(17)

where ⊕ is the exclusive-OR operation.
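Eq. (17) can be sketched in plain Python as follows, again assuming images stored as lists of rows with 0/1 integer pixels; the function name is hypothetical.

```python
def bit_error_rate(w, w_ext):
    """Bit error rate of Eq. (17): fraction of pixels that differ."""
    total = sum(len(row) for row in w)  # M x N
    errors = sum(a ^ b for ra, rb in zip(w, w_ext) for a, b in zip(ra, rb))
    return errors / total
```

The XOR of two bits is 1 exactly when they differ, so the numerator is the number of wrongly extracted pixels.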

Table 4 Attack types and specifications

Tables 5 and 6 show the robustness results of the proposed scheme against several attacks for different audio types in terms of NC and BER, respectively. The results show that the proposed scheme achieves high robustness against these attacks because the minimum value of NC is 0.9911 and the maximum value of BER is 0.0135 (see Fig. 9).

Table 5 Average NC for each audio type
Table 6 Average BER for each audio type
Fig. 9 Average BER for each audio type with different types of attacks

5.2.3 Payload

The data payload capacity is defined as the number of bits embedded in the audio signal within a unit of time. It is measured in bits per second and defined by the following equation:

$$ P=\frac{B}{T}(bps) $$
(18)

where P is the data payload, B is the number of embedded bits in the original audio signal, and T is the duration of the embedding in seconds. Typically, the data payload for audio watermarking methods must be more than 20 bps.
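Eq. (18) is a single division, sketched below for completeness. The function name and the numbers in the usage line are purely illustrative and are not the values measured for the proposed scheme.

```python
def payload_bps(embedded_bits, duration_seconds):
    """Data payload of Eq. (18): embedded bits per second of audio."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return embedded_bits / duration_seconds


# Illustrative only: 1000 bits embedded in a 10 s excerpt.
example = payload_bps(1000, 10)
above_minimum = example > 20  # threshold quoted in the text
```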

We performed a data payload capacity analysis on the proposed scheme to calculate the number of bits embedded in an audio signal. The recommended data payload capacity of an audio watermarking scheme is more than 20 bps [66]. According to Eq. (18), the payload capacity of the proposed scheme is computed by dividing the number of bits embedded in the original audio signal by the duration of the embedding in seconds, giving a payload of 516.26 bps.

6 Discussion

In this section, we compare the experimental results of our proposed scheme with other digital audio watermarking schemes described in the literature. Because different authors use different metrics, the imperceptibility results are difficult to compare directly across schemes. The subjective listening test is significant for assessing the perceptual quality of the watermarked audio; however, the results can differ from one listener to another. Therefore, comparing the schemes in the literature with our scheme will not be completely accurate. Even so, Table 7 compares the values of SNR and ODG to give a better understanding of the imperceptibility performance of these schemes. Some of the values listed in Table 7 are average values for different types of audio mentioned in the recent literature. The proposed scheme presents an excellent inaudibility result because its SNR value is 81.43 dB, the highest among the compared schemes. Moreover, because all the ODG values of the proposed watermarking scheme are near zero, the watermarked audio is close to the original audio file, which confirms the imperceptibility of the proposed scheme. Furthermore, the data payload capacity of the proposed method is 319.29 bps, which is considered a high payload rate compared with other proposed audio watermarking schemes.

Table 7 SNR, ODG, and Payload results for the different audio watermarking schemes

Table 8 shows that the proposed scheme is robust against several types of attacks for different types of audio signals because its overall average BER value across the different types of attacks is the lowest among the compared schemes and close to zero. Fig. 10 illustrates a sample audio signal after applying several types of attacks. Clearly, some types of attacks affect the audio signal more than others. Nevertheless, the analysis and results confirm the robustness of the proposed scheme, as the embedded image is extracted after these attacks with little degradation. In addition, the results of the proposed scheme meet the IFPI requirements for audio watermarking schemes. To conclude, the comparisons and analysis results confirm that our proposed scheme based on a DWT and Schur decomposition hybrid method meets all the requirements of an excellent audio watermarking scheme and performs extremely well in comparison with other proposed schemes.

Table 8 BER average comparisons for different audios of different schemes
Fig. 10 Blues audio signal: (a) original signal, (b) watermarked signal, (c) Gaussian noise (20 dB) attack, (d) re-quantization, (e) re-sampling (22.05) attack, (f) low-pass filter attack, (g) high-pass filter attack, (h) echo (delay = 227 ms) attack, (i) MP3 compression (96) attack, and (j) cropping (50%) attack. X-axis: sample index of the signal; Y-axis: amplitude of the samples

7 Conclusion and future works

In this research study, we proposed a novel imperceptible and robust digital audio watermarking scheme based on a DWT and Schur decomposition hybrid method. First, we carried out the watermark embedding procedure by applying a two-level 2D-Haar DWT to the original audio signal, where the first level segmented the input audio into four sub-bands. Then, we further segmented the second band (HL1) into four new sub-bands based on a 2D-Haar DWT. Second, we applied Schur decomposition to the second sub-band HL2 to decompose the sub-band’s coefficient matrix into two matrices, U and S. Third, we used the S matrix to embed the watermark bits into the seventh LSB of the integer part of the diagonal coefficients. Finally, we reconstructed the watermarked audio by applying the inverse of the Schur decomposition and the DWT.
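The bit-level embedding step summarized above can be sketched as follows. This is only an illustration of writing one watermark bit into the seventh LSB of a coefficient's integer part; it does not perform the DWT or the Schur decomposition. Several details are assumptions: the seventh LSB is taken as zero-based bit 6, the sign and fractional part are preserved unchanged, and the helper names and sample diagonal values are hypothetical.

```python
BIT_INDEX = 6  # assumption: "seventh LSB" counted from 1, i.e. zero-based bit 6


def embed_bit(coefficient, bit, bit_index=BIT_INDEX):
    """Write `bit` (0 or 1) into the chosen bit of the coefficient's integer part."""
    sign = -1.0 if coefficient < 0 else 1.0   # assumption: the sign is kept as-is
    magnitude = abs(coefficient)
    integer_part = int(magnitude)
    fraction = magnitude - integer_part       # fractional part is left untouched
    mask = 1 << bit_index
    integer_part = (integer_part & ~mask) | (bit << bit_index)
    return sign * (integer_part + fraction)


def extract_bit(coefficient, bit_index=BIT_INDEX):
    """Read the embedded bit back from the coefficient's integer part."""
    return (int(abs(coefficient)) >> bit_index) & 1


# Hypothetical diagonal coefficients of S and three watermark bits.
diagonal = [200.75, 10.5, -97.25]
bits = [0, 1, 1]

marked = [embed_bit(c, b) for c, b in zip(diagonal, bits)]
print(marked)                            # [136.75, 74.5, -97.25]
print([extract_bit(c) for c in marked])  # [0, 1, 1]
```

Changing bit 6 alters a coefficient by at most 64, which is large relative to noise-level perturbations but small relative to large diagonal values; this is one way to read the robustness and imperceptibility trade-off discussed in the paper.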

The experimental findings and analysis results show that the proposed watermarking scheme is robust against common types of attacks such as Gaussian noise, re-quantization, re-sampling, low-pass filtering, high-pass filtering, echo, MP3 compression, and cropping. The ODG, SDG, and SNR tests confirm the imperceptibility of the proposed audio watermarking scheme because the average ODG score (0.18) is very close to zero, the average SDG score (4.78) is very close to 5, and the average SNR (81.43 dB) meets the IFPI requirements for audio watermarking schemes. These tests confirm that the original audio and the watermarked audio obtained from the proposed scheme are perceptually indistinguishable. Moreover, the capacity of the proposed scheme is high because the data payload rate is 319.29 bps. Therefore, the results indicate that the proposed scheme is imperceptible, with high payload capacity. In comparison with other recently proposed audio watermarking schemes, our proposed method is superior in terms of balanced performance among robustness, imperceptibility, and payload capacity.

The experiments show that the proposed watermarking scheme performs better than state-of-the-art approaches in terms of imperceptibility, payload capacity, and robustness. Therefore, the proposed scheme can be used in digital audio signals for identification of content ownership, broadcast monitoring, publication monitoring, content authentication, copy control, and as an information carrier in different fields.

One main limitation of the proposed method is that the embedding process can embed only a small watermark image relative to the audio file size, owing to the DWT and Schur decompositions. Thus, our future work includes investigating how to increase the size of the watermark image. In addition, we may enhance the proposed scheme in terms of imperceptibility, payload capacity, and robustness. Furthermore, we plan to develop a secure watermarking scheme based on Schur decomposition and chaos theory for real-time applications, maintaining security alongside the other important parameters. Finally, we plan to design and implement a watermarking scheme based on DWT and Schur decomposition for image and video signals.