1 Introduction

With multimedia data widely available in digital form and multimedia technology advancing, data such as images, video and audio can be copied, modified and distributed over the Internet to distant places very easily. Anyone in the world can therefore access such data with little effort, and protecting it from unauthorized users is a major challenge for its owner. The most promising solution to the problem of copyright protection of digital multimedia data is digital watermarking (Cox et al., 2007), the process of embedding copyright information into host data. The embedded copyright information is called the watermark.

Audio watermarking (Ercelebi & Batakci, 2009; Lei et al., 2011; Li & Wu, 2015; Wu et al., 2005) has recently become a hot research topic, and many researchers are developing watermarking techniques for the copyright protection of audio signals. According to the International Federation of the Phonographic Industry (IFPI; Vivekananda Bhat et al., 2010), an effective audio watermarking method must satisfy the following properties: (i) Imperceptibility: the quality of the host audio signal should not be degraded by embedding the watermark, i.e., the host and watermarked audio signals should be perceptually similar. As stated by the IFPI, the signal-to-noise ratio (SNR) between the host and watermarked audio signals should be more than 20 dB. (ii) Robustness: the ability to extract the watermark from a watermarked audio signal that has been subjected to different audio attacks. (iii) Payload: the amount of watermark information that can be embedded into the host audio signal while still satisfying the imperceptibility property. It is defined as the number of watermark bits embedded per unit of time, is measured in bits per second (bps), and should be more than 20 bps. (iv) Security: the watermark can be extracted only by the intended users. Watermarked signals should not reveal any clues about the watermark they carry, and extraction should depend only on secret keys rather than on the secrecy of the watermarking technique. Imperceptibility, robustness and payload conflict with one another. For example, increasing the data rate of a watermarking technique degrades the quality of the watermarked signal and reduces its robustness against attacks. Designing an effective watermarking technique therefore requires a suitable trade-off among these properties.

In the literature, many watermarking techniques have been proposed for images and videos (Muhammad & Bibi, 2015; Rasti et al., 2016). These techniques can also be applied to audio watermarking. However, developing an audio watermarking technique is more difficult than developing one for images or videos, for two reasons (Vivekananda Bhat et al., 2011a). Firstly, audio signals are represented by far fewer samples per time interval than images and videos. As a result, the amount of information that can be added to audio signals is much lower than for images and videos. Secondly, the human auditory system (HAS) is much more sensitive than the human visual system (HVS). It is therefore more difficult to satisfy the imperceptibility (inaudibility) property of audio watermarking techniques than the imperceptibility (invisibility) property of watermarking techniques for images and videos.

In recent years, many audio watermarking methods have been developed. They are broadly classified into two groups: (i) time-domain methods (Binny & Koilakuntla, 2014; Lie & Chang, 2006; Subbarayan & Ramanatha, 2009) and (ii) transform-domain methods (Dhar & Shimamura, 2015; Elshazly et al., 2017; Vivekananda Bhat et al., 2011b; Zhang, 2015). In time-domain techniques, the watermark information is embedded directly into the audio samples. These techniques are easy to develop and require less computation than transform-domain techniques. In transform-domain techniques, the audio signal is first transformed into the frequency domain, and the watermark is embedded by adjusting the resulting coefficients. These techniques are more robust than time-domain techniques. Time-domain methods include least significant bit (LSB) replacement, echo hiding, phase coding and patchwork, while transform-domain methods are built on tools such as the FFT, the discrete wavelet transform (DWT), the discrete cosine transform (DCT) and SVD.

Binny and Koilakuntla (2014) proposed a steganographic technique that embeds text data into an audio signal using the LSB method. To embed the text data, each audio sample is converted into bits; the authors show that the SNR decreases as the text length increases. A time-spread (TS) echo-hiding audio watermarking method is presented in Hu et al. (2016), where each watermark bit is embedded into the corresponding segment of the audio signal by adding echoes with different delays. In Lie and Chang (2006), a time-domain audio watermarking technique is presented. The algorithm exploits the differential average of the absolute amplitude relations to embed a watermark bit in a group of audio samples. It maintains high audio quality and is also robust against attacks. In Subbarayan and Ramanatha (2009), an LSB-based audio watermarking technique is described in which the watermark is encrypted with the RSA algorithm before embedding. This technique is not robust against attacks. In Chetan et al. (2021), an audio watermarking method based on a modified LSB bit plane of the audio signal is proposed. The watermark is embedded in the least significant bits of samples spread throughout the audio signal instead of being localized to a particular set of bits. For embedding, the audio signal is quantized to the range [0–255] and each sample is converted to 8-bit binary form. A DCT-based blind audio watermarking method is proposed in Roy et al. (2015). In the embedding process, the DCT representation of the audio signal is partitioned into non-overlapping segments of equal size, and each segment is divided into four non-overlapping frames. A watermark bit is embedded into a segment by modifying the coefficients of its four frames. The experimental results show good imperceptibility and robustness, but the data payload is not reported. In Li and Wu (2015), a blind audio watermarking technique based on the lifting wavelet transform and QR decomposition is proposed. The watermark bits are embedded by quantizing random correlation coefficients, which are computed from a random vector generated by a secret key and the vector obtained by the QR decomposition of the lifting wavelet coefficients. The experimental results show good robustness, imperceptibility and data payload. In Zhang (2015), a semi-fragile audio watermarking scheme in the DWT and DCT domains is proposed for copyright and content authentication.

To improve the effectiveness of watermarking techniques, many researchers use singular value decomposition (SVD; Dhar & Shimamura, 2014; Lei et al., 2012) to embed the watermark into the host data. In an SVD-based audio watermarking system, the obvious preprocessing step is the conversion of the 1-D signal into a 2-D signal (2-D matrix), to which SVD is then applied. In Dhar and Shimamura (2014), a blind audio watermarking technique is described in the fast Fourier transform (FFT) domain based on SVD and the Cartesian-polar transformation (CPT). The FFT is applied to each audio frame, and SVD is then applied to the low-frequency FFT components. The highest singular value is decomposed into two components using the Cartesian-polar transform for embedding the watermark information. In Vivekananda Bhat et al. (2010), a blind audio watermarking technique is proposed in which the watermark bits are embedded by modifying the singular values of the wavelet coefficients. The method shows very good imperceptibility and robustness, but its payload is very low. In Lei et al. (2011), a blind audio watermarking technique based on SVD and DCT is described. A binary watermark is embedded into the high-frequency coefficients of the SVD–DCT blocks. This method also has a low payload. In Farzaneh and Toroghi (2020), an audio watermarking scheme based on a graph-based transform and SVD is proposed. This scheme is of limited effectiveness, as it is robust only against compression and additive noise. In Novamizanti et al. (2020), the authors propose an audio watermarking algorithm based on LWT–DCT–SVD. To embed the watermark, the algorithm decomposes the audio signal by LWT and then transforms the selected sub-band by DCT. The DCT output is passed to the SVD, and the singular matrix is modified by quantization to embed the watermark. An SVD-based audio watermarking method using dual watermarking for copyright protection is proposed in Patil and Chitode (2021). The DWT is applied to the audio signal, followed by SVD, to embed the watermark. The watermark is embedded as a linear combination of the SVD components of the watermark image and the audio signal. Elshazly et al. (2021) proposed an audio watermarking algorithm based on DWT and SVD that embeds a color image as the watermark. During embedding, each watermark bit is embedded by quantization of the largest singular values (LSVs).

In this paper, a blind audio watermarking technique is proposed based on SVD and quantization. Each watermark bit is embedded by modifying the LSV of an audio signal block. The advantages of using the LSVs and quantization are: (i) quantization of the LSV has little effect on the signal quality; (ii) the LSVs change very little under attacks; (iii) quantization is simple, easy to implement and of low complexity; (iv) the quantization method helps to achieve a good trade-off among payload, imperceptibility and robustness. In the present work, to enhance the security level, the watermark image is scrambled using the Fibonacci–Lucas transformation (FLT). The proposed method is simple (of low complexity) and easy to implement.

This paper is organized as follows. Section 2 provides background information including watermark scrambling using Fibonacci–Lucas transform and SVD. Section 3 demonstrates the embedding and extraction process of the proposed technique. The experimental results are given in Sect. 4. Section 5 provides performance analysis of the proposed technique. Finally, conclusions are given in Sect. 6.

2 Background information

2.1 Fibonacci–Lucas transformation

In this work, we use Fibonacci–Lucas (Mishra et al., 2012) transformation to scramble the binary watermark image for enhancing the confidentiality of the proposed method. The FLT is defined as the mapping \(FLT: T^{2} \rightarrow T^{2}\) as given in Eq. (1).

$$\left( \begin{array}{cc}x'\\ y'\end{array}\right) = \left( \begin{array}{cc} F_{i} &{} F_{i+1}\\ L_{i} &{} L_{i+1} \end{array} \right) \left( \begin{array}{cc}x\\ y\end{array}\right) \mod \ N,$$
(1)

where \(x, y \in \{0, 1, \ldots , N-1\}\), \(F_{i}\) is the ith term of the Fibonacci series, \(L_{i}\) is the ith term of the Lucas series, and \(N\times N\) is the size of the image. Using the FLT given in Eq. (1), the pixel \((x, y)\) of the given image is mapped to a new location \((x', y')\). So, for a particular i, the FLT matrix is defined as

$$FLT_{i}= \left( {\begin{array}{cc} F_{i} &{} F_{i+1} \\ L_{i} &{} L_{i+1} \\ \end{array} } \right) .$$

In this way, one can define many Fibonacci–Lucas transforms by changing the value of i, which makes the Fibonacci–Lucas transform a more secure scrambling method. For example, the first matrix of this series, for \(i=1\), is given by

$$FLT_{1}= \left( {\begin{array}{cc} 1 &{} 1 \\ 2 &{} 1 \\ \end{array} } \right) .$$

It may be noted that \(FLT_{1}\) coincides with the matrix of the Arnold transform up to an interchange of its two columns.

In this work, we have scrambled the watermark image by applying the FLT. This transformation is adopted to ensure more security.
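
The scrambling in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, the series are assumed to start as F = 1, 1, 2, ... and L = 2, 1, 3, ... (which reproduces the \(FLT_{1}\) above), and unscrambling is done by re-reading the forward mapping instead of inverting the matrix.

```python
from math import gcd


def flt_matrix(i):
    """Return ((F_i, F_{i+1}), (L_i, L_{i+1})), 1-indexed.

    Assumed starting values: F = 1, 1, 2, ... and L = 2, 1, 3, ...
    """
    f_a, f_b = 1, 1
    l_a, l_b = 2, 1
    for _ in range(i - 1):
        f_a, f_b = f_b, f_a + f_b
        l_a, l_b = l_b, l_a + l_b
    return ((f_a, f_b), (l_a, l_b))


def _map(x, y, m, n):
    """Apply Eq. (1) to one pixel coordinate (x, y) of an n x n image."""
    (a, b), (c, d) = m
    return (a * x + b * y) % n, (c * x + d * y) % n


def scramble(image, m):
    """Move every pixel (x, y) of a square image to its FLT location."""
    n = len(image)
    (a, b), (c, d) = m
    # The mapping is a permutation only if det(m) is invertible modulo n.
    if gcd((a * d - b * c) % n, n) != 1:
        raise ValueError("FLT matrix is not invertible modulo N")
    out = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            xp, yp = _map(x, y, m, n)
            out[xp][yp] = image[x][y]
    return out


def unscramble(scrambled, m):
    """Undo scramble() by reading each pixel back from its mapped location."""
    n = len(scrambled)
    out = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            xp, yp = _map(x, y, m, n)
            out[x][y] = scrambled[xp][yp]
    return out
```

Here the matrix returned by `flt_matrix(i)` plays the role of the secret key: the same matrix has to be supplied to `unscramble` to recover the original pixel order.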

2.2 Singular value decomposition

SVD has numerous applications (Aslantas, 2009; Lai, 2011) in the field of image processing. An image can be viewed as a matrix of non-negative scalar elements. SVD decomposes a matrix A of size \(m \times n\) into the product of three matrices U, \(\Sigma\) and V, i.e., \(A=U\Sigma V^{T}\). Here, \(U_{m\times m}\) and \(V_{n \times n}\) are orthogonal matrices, and \(\Sigma _{m \times n}\) is a diagonal matrix whose ith diagonal entry equals the ith singular value. The singular values are denoted as \(\alpha _{i}, i=1, \ldots ,n\). When the rank of A is p, the elements of \(\Sigma\) satisfy the conditions \(\alpha _{1}\ge \alpha _{2}\ge \cdots \ge \alpha _{p}>0\) and \(\alpha _{p+1}=\alpha _{p+2} =\cdots =\alpha _{n} =0\). SVD is frequently used in designing watermarking techniques due to its attractive properties (Vivekananda Bhat et al., 2011a, b; Lai, 2011; Lei et al., 2012). In this work, we consider square matrices of size \(n \times n\). The SVD of \(A_{n \times n}\) is then formulated as given in Eq. (2).

$$A = U \Sigma V^{T} = \left[ \begin{array}{c} u_{1}, \ldots ,u_{n} \end{array} \right] \times \left[ \begin{array}{ccc} \alpha _{1} &{} &{} \\ &{} \ddots &{} \\ &{} &{} \alpha _{n} \end{array} \right] \times \left[ \begin{array}{c} v_{1}^{T} \\ \vdots \\ v_{n}^{T} \end{array} \right] .$$
(2)

3 Proposed technique

The proposed method is discussed in this section. In the watermark embedding process, the watermark is first scrambled by FLT; the host audio signal is then decomposed by SVD, and the watermark bits are embedded by modifying the LSV. The watermark extraction process is simple: the watermarked audio signal is decomposed by SVD, and the watermark bit is determined from the LSV. The details of the embedding and extraction processes are described in the following subsections.

3.1 Embedding process

The block diagram of the embedding process is shown in Fig. 1. In the proposed method, SVD is used to embed the watermark bits. SVD operates on a 2-D matrix, whereas the given audio signal is a 1-D signal. For simplicity, we consider a square matrix. Let \(X=\{ x(i), i=1, 2, \ldots , L \}\) be a host audio signal of L samples. The audio signal X is converted into a 2-D audio signal of size \(M \times M\), where \(M=\sqrt{L}\).

Fig. 1 Block diagram of embedding process

The 2-D audio signal is partitioned into non-overlapping blocks \(B_{k}\) of size \(q \times q\). The number of such blocks is \(\frac{M}{q} \times \frac{M}{q}\), which is the same as the watermark size, i.e., one bit is embedded in each block \(B_{k}\). So, the size of the watermark image determines the length of the host signal, or vice versa.
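
The conversion to a square matrix and the block partition can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; in particular, the row-major layout of the 2-D signal is an assumption, since the text does not state how the samples are arranged.

```python
import math


def to_square(samples):
    """Arrange L = M*M samples as an M x M matrix (row-major: an assumption)."""
    m = math.isqrt(len(samples))
    if m * m != len(samples):
        raise ValueError("number of samples must be a perfect square")
    return [samples[r * m:(r + 1) * m] for r in range(m)]


def partition(matrix, q):
    """Split an M x M matrix into (M/q)^2 non-overlapping q x q blocks."""
    m = len(matrix)
    if m % q != 0:
        raise ValueError("block size q must divide M")
    blocks = []
    for top in range(0, m, q):
        for left in range(0, m, q):
            blocks.append([row[left:left + q] for row in matrix[top:top + q]])
    return blocks
```

For a signal of 230,400 samples and \(q=15\), `partition(to_square(samples), 15)` yields 1024 blocks, one per bit of a \(32 \times 32\) watermark.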

Before embedding the watermark, it is scrambled by FLT. The Fibonacci–Lucas transform matrix is used as the secret key to enhance the security level of the watermark.

The block \(B_{k}\) is decomposed by SVD method and the watermark bit is embedded into the block \(B_{k}\) by modifying the LSV \(\alpha _{k,1}\). Here, a user defined quantization interval \(\Delta\) is used in the embedding process. The \(\alpha _{k,1}\) is quantized as given in Eq. (3).

$$h_{k}= \left\lceil \frac{\alpha _{k,1}}{\Delta } \right\rceil ,$$
(3)

where \(h_{k} \in [ 0, H_{max} ]\) and \(H_{max}\) is computed as \(H_{max}=\frac{||A||_{2}}{\Delta }\), with \(||A||_{2}\) denoting the induced 2-norm of the matrix A. The embedding logic is simple. We describe the process for watermark bit ‘0’; an exactly symmetric procedure applies when the watermark bit is ‘1’.

If the watermark bit is ‘0’, we modify \(\alpha _{k,1}\) to \(\alpha _{k,1}'\), the middle value of the previous or next quantization interval (whichever is appropriate), so that the corresponding quantization index \(h_{k}'\) is even. For watermark bit ‘0’ there are four different cases, which are as follows:

  1. (1)

    If \(h_{k}\) is even then do nothing, i.e., \(\alpha _{k,1}'=\alpha _{k,1}\).

  2. (2)

    If \(h_{k}=1\), then \(\alpha _{k,1}'=\frac{3\Delta }{2}\), i.e., \(\alpha _{k,1}'=h_{k}\,*\,\Delta + \frac{\Delta }{2}\) (see Fig. 2a).

  3. (3)

    If \(h_{k}=H_{max}\) and \(h_{k}\) is odd, then \(\alpha _{k,1}'= h_{k}\,*\,\Delta - \frac{3\Delta }{2}\) (see Fig. 2b).

  4. (4)

    If \(h_{k}\) is odd and neither of the two preceding cases applies, then whichever of the middle value of the previous quantization interval \(\left( P_{m}=h_{k}\,*\,\Delta - \frac{3\Delta }{2}\right)\) and the middle value of the next quantization interval \(\left( N_{m}=h_{k}\,*\,\Delta +\frac{\Delta }{2}\right)\) is closer to \(\alpha _{k,1}\) is taken as \(\alpha _{k,1}'\) (see Fig. 2c).

Fig. 2 Watermark bit embedding process when the watermark bit is ‘0’ and \(h_{k}\) is odd. Three cases are illustrated: a extreme left, i.e., \(h_{k}=1\), b extreme right, i.e., \(h_{k}=H_{max}\) and c middle case, i.e., \(1<h_{k}<H_{max}\)
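
The four cases for bit ‘0’, together with their mirror image for bit ‘1’, can be condensed into one small function. The following Python sketch is an illustration under stated assumptions, not the authors' MATLAB code: the names `embed_lsv` and `extract_bit` are hypothetical, `h_max` is supplied by the caller, and the left boundary cases (\(h_{k}=0\) or \(h_{k}=1\)) are detected by testing whether the previous-interval midpoint would be negative.

```python
import math


def embed_lsv(alpha, bit, delta, h_max):
    """Return the modified largest singular value carrying one watermark bit.

    alpha: original LSV (non-negative), bit: 0 or 1,
    delta: quantization interval, h_max: largest allowed quantization index.
    """
    if alpha < 0 or delta <= 0:
        raise ValueError("alpha must be >= 0 and delta must be > 0")
    h = math.ceil(alpha / delta)           # quantization index, Eq. (3)
    if h % 2 == bit:
        return alpha                       # parity already encodes the bit
    p_m = h * delta - 1.5 * delta          # middle of the previous interval
    n_m = h * delta + 0.5 * delta          # middle of the next interval
    if p_m < 0:                            # h is 0 or 1: no previous interval
        return n_m
    if h >= h_max:                         # extreme right: no next interval
        return p_m
    # Middle case: move to the nearer midpoint (ties go to the next interval).
    return p_m if abs(alpha - p_m) < abs(alpha - n_m) else n_m


def extract_bit(alpha_w, delta):
    """Recover the bit as the parity of the quantization index."""
    return math.ceil(alpha_w / delta) % 2
```

In the middle case the LSV moves by at most \(\Delta\), and at the left boundary by less than \(1.5\Delta\), which is why a smaller \(\Delta\) favors imperceptibility while a larger one leaves more room before an attack can flip the parity.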

The algorithmic sketch of the proposed embedding process is given in Algorithm: Embedding Process().

   Algorithm: Embedding Process(X, W, FL, \(\Delta\), \(X_{w}\))

Input: Host audio(X), Binary watermark(W), FLT matrix (FL), Quantization interval(\(\Delta\))

Output: Watermarked audio(\(X_{w}\))

  1. (1)

    Convert the 1-D host audio signal X of L samples into 2-D audio signal S.

  2. (2)

    Partition the 2-D audio signal S into non-overlapping blocks \(B_{k}\), \(k=1, 2, \ldots , N \times N\), each of size \(q \times q\).

  3. (3)

    The watermark W of size \(N \times N\) is scrambled by FLT which gives scrambled watermark \(W_{s}\).

  4. (4)

    SVD is applied on each block \(B_{k}\) and \(B_{k}=U_{k} \Sigma _{k} {V_{k}}^{T}\), where the LSV is \(\alpha _{k,1}\).

  5. (5)

    Quantize \(\alpha _{k,1}\) and obtain \(h_{k}=\left\lceil \frac{\alpha _{k,1}}{\Delta }\right\rceil\).

  6. (6)

    The watermark bits are embedded by modifying \(\alpha _{k,1}\) as \(\alpha _{k,1}'\)

    1. (a)

      \(P_{m}= h_{k}\,*\,\Delta - 3*\frac{\Delta }{2}\)

    2. (b)

      \(N_{m}= h_{k}\,*\,\Delta + \frac{\Delta }{2}\)

    3. (c)

      If watermark bit \(W_{s,k} = 0\)   //make \(h_{k}\) even

      $$\alpha _{k,1}'=\left\{ \begin{array}{ll} 0, &{} \text{ if } h_{k} {=0 } \\ N_{m}, &{} \text{ if } h_{k} {=1 } \\ \alpha _{k,1}, &{} \text{ if } h_{k} \text{ is } \text{ even }\\ P_{m}, &{} \text{ if } h_{k} \text{ is } \text{ odd } \text{ and } h_{k} =H_{max} \\ P_{m}, &{} \text{ if } h_{k} \text{ is } \text{ odd } \text{ and } \\ &{}\,\, |\alpha _{k,1} - P_{m}|<|\alpha _{k,1} - N_{m}|\\ N_{m}, &{} \text{ if } h_{k} \text{ is } \text{ odd } \text{ and } \\ &{}\,\, |\alpha _{k,1} - P_{m}|\ge |\alpha _{k,1} - N_{m}| \end{array} \right.$$
    4. (d)

      When watermark bit \(W_{s,k} = 1\)   //make \(h_{k}\) odd

      $$\alpha _{k,1}'=\left\{ \begin{array}{ll} N_{m} , &{} \text{ if } h_{k} {=0 } \\ \alpha _{k,1}, &{} \text{ if } h_{k} \text{ is } \text{ odd } \\ P_{m}, &{} \text{ if } h_{k} \text{ is } \text{ even } \text{ and } h_{k} =H_{max} \\ P_{m}, &{} \text{ if } h_{k} \text{ is } \text{ even } \text{ and } \\ &{}\,\,|\alpha _{k,1} - P_{m}| < |\alpha _{k,1} - N_{m}| \\ N_{m}, &{} \text{ if } h_{k} \text{ is } \text{ even } \text{ and } \\ &{}\,\,|\alpha _{k,1} - P_{m}| \ge |\alpha _{k,1} - N_{m}| \end{array} \right.$$
  7. (7)

    Apply inverse SVD to obtain watermarked block \(B_{kw}\)

    1. (a)

      \(\Sigma _{kw}=\Sigma _{k} - \{\alpha _{k,1}\} \cup \{\alpha _{k,1}'\}\)

    2. (b)

      \(B_{kw}=U_{k}\Sigma _{kw}{V_{k}}^{T}\)

  8. (8)

    The watermarked audio signal \(X_{w}\) is reconstructed from all blocks \(B_{kw}\).

  9. (9)

    Return \(X_{w}\)
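
Step (7) replaces only the largest singular value, so \(B_{kw}\) differs from \(B_{k}\) by a rank-one term: \(B_{kw}=B_{k}+(\alpha _{k,1}'-\alpha _{k,1})\,u_{1}v_{1}^{T}\), where \(u_{1}\) and \(v_{1}\) are the first columns of \(U_{k}\) and \(V_{k}\). The sketch below illustrates this in dependency-free Python. It is not the authors' implementation: a plain power iteration is swapped in for a full SVD routine, and the function names are hypothetical.

```python
import math


def top_singular_triple(block, iters=500):
    """Estimate (sigma_1, u_1, v_1) of a matrix by power iteration on B^T B."""
    rows, cols = len(block), len(block[0])
    v = [1.0 / math.sqrt(cols)] * cols          # fixed start vector (assumption)
    for _ in range(iters):
        w = [sum(block[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        z = [sum(block[r][c] * w[r] for r in range(rows)) for c in range(cols)]
        norm = math.sqrt(sum(t * t for t in z))
        if norm == 0.0:                         # zero matrix (or unlucky start)
            return 0.0, [0.0] * rows, v
        v = [t / norm for t in z]
    w = [sum(block[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    sigma = math.sqrt(sum(t * t for t in w))
    u = [t / sigma for t in w] if sigma > 0 else [0.0] * rows
    return sigma, u, v


def set_largest_singular_value(block, new_sigma):
    """Return B + (new_sigma - sigma_1) * u_1 v_1^T, i.e. U * Sigma_w * V^T."""
    sigma, u, v = top_singular_triple(block)
    d = new_sigma - sigma
    return [[block[r][c] + d * u[r] * v[c] for c in range(len(v))]
            for r in range(len(u))]
```

The new value remains the largest singular value of the returned block only while it stays above the second singular value of the original block; otherwise the ordering of the singular values changes.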

3.2 Extraction process

The watermark extraction process is very simple. The watermarked audio signal \(X_{w}\) is processed as in the embedding method, and from the LSV \(\alpha _{w_{k,1}}\) we compute the value \(h_{w_{k}}\). The parity (even/odd) of \(h_{w_{k}}\) determines the watermark bit (\(Wt_{k}\)), i.e.,

$$Wt_{k}=\left\{ \begin{array}{ll} 0 , &{} \text{ if } h_{w_{k}} \text{ mod } \text{2 } = 0, \\ 1, &{} \text{ if } h_{w_{k}} \text{ mod } \text{2 } = 1. \end{array} \right.$$

Finally, this watermark is unscrambled using FLT to obtain the binary watermark \(W'\). The proposed method is blind: extraction requires neither the host audio signal nor the original watermark, only the watermarked audio signal \(X_{w}\) and the secret keys (the FLT matrix and \(\Delta\)). The block diagram of the proposed extraction process is shown in Fig. 3 and the algorithmic structure is given in Algorithm: Extraction Process().

   Algorithm: Extraction Process(\(X_{w}\), FL, \(\Delta\), \(W'\))

Input: watermarked audio(\(X_{w}\)), FLT matrix(FL), Quantization interval(\(\Delta\))

Output: Extracted binary watermark(\(W'\))

  1. (1)

    Convert the 1-D watermarked audio signal \(X_{w}\) of L samples into 2-D audio signal \(S'\).

  2. (2)

    Partition the 2-D audio signal \(S'\) into non-overlapping blocks \(B'_{k}\), \(k=1, 2, \ldots , N \times N\), each of size \(q \times q\).

  3. (3)

    SVD is applied on each block \(B'_{k}\) and \(B'_{k}=U'_{k} \Sigma '_{k} {{V'}_{k}}^{T}\), where, the LSV is \(\alpha _{w_{k,1}}\).

  4. (4)

    Quantize \(\alpha _{w_{k,1}}\) and obtain \(h_{w_{k}}= \left\lceil \frac{\alpha _{w_{k,1}}}{\Delta }\right\rceil\).

  5. (5)

    If mod(\(h_{w_{k}}\), 2)= 0, then \(Wt_{k}=0\) Else \(Wt_{k}=1\)

  6. (6)

    Assemble all extracted watermark bits and then apply inverse FLT using FL and obtain extracted watermark \(W'\)

  7. (7)

    Return \(W'\)

Fig. 3 Block diagram of extraction process

4 Experimental results

In this section, we present the experimental results of the proposed watermarking method. We have implemented the proposed method using MATLAB 7.1.

Four different host audio signals, namely ‘Classical’, ‘Jazz’, ‘Piano’ and ‘Tabla’, are used as test signals in the experiment. These test signals were provided by the author of the Ph.D. thesis (Ghosal, 2014). Each audio signal has a sampling rate of 44.1 kHz with 16 bits/sample in the WAVE format and contains 230,400 samples (duration 5.224 s). The audio signals are transformed into 2-D signals of size \(480 \times 480\), and each 2-D signal is further divided into blocks of size \(15\times 15\). The number of blocks is \(\frac{480}{15} \times \frac{480}{15}\) = \(32\times 32\). Since one bit is embedded into each block, a binary image of size \(32 \times 32\) is used as the watermark in our proposed method. If the block size is larger than \(15 \times 15\), the current watermark can still be used for embedding, but some of its bits will not be used. On the other hand, with smaller blocks, some blocks will have no bit to embed.

The watermark image is scrambled using FLT as described in Sect. 2. The original and scrambled versions of the watermark are shown in Fig. 4a and 4b, respectively. The quantization interval \(\Delta\) is set to 0.19 to achieve a good trade-off between the conflicting requirements of imperceptibility and robustness; the details are discussed in Sect. 5.2. Since the embedding and extraction processes both depend on \(\Delta\), its value is used as one secret key of the method. The other secret key is the FLT matrix: without the appropriate transform matrix, the actual sequence of the watermark cannot be recovered by the reverse scrambling process. The host and watermarked audio signals for ‘Jazz’ are shown in Fig. 5.

Fig. 4 Binary watermark image of size \(32 \times 32\): a original and b scrambled

Fig. 5 Performance of the proposed watermark embedding method on the host signal ‘Jazz’

4.1 Imperceptibility test

In our experiment, the imperceptibility test is performed using SNR which is used to measure the quality of watermarked audio signal. It is defined as

$$SNR=10\,*\,\log _{10}\frac{\sum ^{L}_{i=1}X^2(i)}{\sum ^{L}_{i=1}[X(i)-X_{w}(i)]^2}\,{\text {dB}},$$
(4)

where X and \(X_{w}\) are the host and watermarked audio signals, respectively. The SNR is higher when the difference between the host signal and the watermarked signal is smaller, i.e., when the watermark is less perceptible (which is a desired property of any watermarking technique). In Fig. 5c, the difference signal between the original and watermarked ‘Jazz’ is shown. The magnitude of this difference signal is very close to zero, so the watermark is imperceptible. The SNRs of the watermarked audio signals ‘Classical’, ‘Jazz’, ‘Piano’ and ‘Tabla’ are 29.14 dB, 22.16 dB, 26.91 dB and 21.10 dB, respectively. According to the IFPI standard, the SNR of a watermarked audio signal should be above 20 dB (Vivekananda Bhat et al., 2010), so the proposed technique satisfies the IFPI requirement on all four test signals.
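
Eq. (4) translates directly into code. The following Python sketch is only an illustration; the function name `snr_db` is hypothetical, and returning infinity for two identical signals is a convention chosen here rather than something stated in the text.

```python
import math


def snr_db(host, watermarked):
    """Signal-to-noise ratio of Eq. (4), in dB, for two equal-length signals."""
    if len(host) != len(watermarked):
        raise ValueError("signals must have the same number of samples")
    signal = sum(x * x for x in host)
    noise = sum((x - y) ** 2 for x, y in zip(host, watermarked))
    if noise == 0:
        return math.inf        # identical signals: no embedding distortion
    return 10.0 * math.log10(signal / noise)
```

A watermarked signal would meet the IFPI requirement quoted above when `snr_db(host, watermarked) > 20`.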

4.2 Robustness test

Robustness means the ability to extract the watermark under attacks, i.e., we need to measure the similarity/dissimilarity between the original watermark and the extracted watermark. In this experiment, we use two parameters: (i) normalized correlation (NC) and (ii) bit error rate (BER).

The NC is the measure of similarity between the watermarks and it is defined as follows:

$$NC=\frac{\sum ^{N}_{i=1} \sum ^{N}_{j=1}W(i,j)W^{\prime }(i,j)}{\sqrt{\sum ^{N}_{i=1} \sum ^{N}_{j=1}W(i,j)}\sqrt{\sum ^{N}_{i=1} \sum ^{N}_{j=1}W^{\prime }(i,j)}}.$$
(5)

If \(NC(W,W^{\prime })\) is close to 1, then the similarity between W and \(W^{\prime }\) is very high; otherwise, the similarity is low.

The BER is the measure of dissimilarity and it is defined as

$$BER= \frac{\sum ^{N}_{i=1} \sum ^{N}_{j=1}W(i,j) \oplus W^{\prime }(i,j)}{N \times N} \times 100\%,$$
(6)

where \(\oplus\) is the exclusive or (XOR) operator and W and \(W^{\prime }\) are the original and extracted watermarks, respectively. So, when BER value is close to zero then we may infer that both the watermarks (original and extracted) are close enough.
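
Both measures are easy to compute for binary watermarks stored as nested lists of 0/1 values. The sketch below follows Eqs. (5) and (6) as written; the function names are hypothetical, and raising an error for an all-zero watermark is a choice made here, since Eq. (5) is undefined in that case. For binary images the sums in the denominator of Eq. (5) equal the corresponding sums of squares.

```python
import math


def normalized_correlation(w, w_ext):
    """NC of Eq. (5) for two binary N x N watermarks."""
    num = sum(a * b for ra, rb in zip(w, w_ext) for a, b in zip(ra, rb))
    den = math.sqrt(sum(map(sum, w))) * math.sqrt(sum(map(sum, w_ext)))
    if den == 0:
        raise ValueError("NC is undefined for an all-zero watermark")
    return num / den


def bit_error_rate(w, w_ext):
    """BER of Eq. (6), in percent, for two binary N x N watermarks."""
    total = sum(len(row) for row in w)
    errors = sum(a ^ b for ra, rb in zip(w, w_ext) for a, b in zip(ra, rb))
    return 100.0 * errors / total
```

For identical watermarks these functions return an NC of 1 and a BER of 0%.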

So, a method is robust when the NC between the original watermark and the extracted watermark is high and the BER is low. To measure the robustness of the proposed method, we have considered different audio attacks. The audio editing and attacking tools used in this experiment are Adobe Audition 1.0 (for echo addition and reverse amplitude attacks), GoldWave 5.18 (for denoising, smoothing, MP3 compression and resampling attacks) and MATLAB (for low-pass filtering, additive white Gaussian noise, cropping and requantization attacks).

  1. (i)

    Low-pass filtering: a second-order Butterworth low-pass filter with a cut-off frequency of 11.025 kHz is applied to the watermarked audio signals.

  2. (ii)

    Additive noise: white Gaussian noise is added to the watermarked audio signals until the SNR of the resulting signal falls below 20 dB.

  3. (iii)

    Cropping: segments of 500 samples of the watermarked audio signals are replaced, at five positions, by segments of the watermarked signal attacked with additive white Gaussian noise.

  4. (iv)

    Echo addition: an echo signal with a delay of 98 ms and a decay of 41% is added to the watermarked audio signal.

  5. (v)

    Denoising: the “Hiss removal” function of GoldWave is used to denoise the watermarked audio signal.

  6. (vi)

    Reverse amplitude: the sign of every sample amplitude is reversed.

  7. (vii)

    Smoothing: the smoothing operation of GoldWave is used to produce slow changes in the watermarked audio signal.

  8. (viii)

    Re-quantization: the 16-bit watermarked audio signal is quantized down to 8 bits/sample and then back to 16 bits/sample.

  9. (ix)

    MP3 compression: MPEG-1 layer-3 compression is applied to the watermarked audio signals using GoldWave. The watermarked audio signal is compressed at different bit rates (128 kbps, 64 kbps and 48 kbps) and then decompressed back to the WAVE format.

  10. (x)

    Re-sampling: since the host audio signal is sampled at 44.1 kHz, the watermarked audio signal is down-sampled to 22.05 kHz, 11.025 kHz and 8.0 kHz and then up-sampled back to 44.1 kHz.

The extracted watermarks, together with the NC and BER values under different attacks on the watermarked audio signal ‘Jazz’, are presented in Table 1. The NC values are all above 0.9837 and the BER values are all below 1.9%. The extracted watermark images are visually similar to the original watermark image. This establishes the high robustness of the proposed technique for the ‘Jazz’ audio signal. In the first column of this table, numerals represent the different attacks; the same numerals are used in the subsequent tables to refer to the attacks.

Table 1 Extracted watermark from watermarked ‘Jazz’ audio signal under different audio attacks with NC and BER

Similar results for the audio signals ‘Piano’, ‘Tabla’ and ‘Classical’ are shown in Table 2 (for reasons of space, the extracted watermark images are omitted). The NC values are all above 0.8879, indicating close agreement with the original watermark. On the other hand, the performance under the ‘echo addition’ attack is comparatively poor in terms of BER, which is around 12.5%. As a whole, the proposed method is robust against almost all attacks except ‘echo addition’.

Table 2 NC and BER of extracted watermark from watermarked audio signal under different attacks for ‘Piano’, ‘Tabla’ and ‘Classical’

4.3 Data payload

The data payload refers to the number of bits that can be embedded into the audio signal per unit of time. The data payload for watermarking techniques can be measured by various methods. In the proposed method data payload is denoted as \(D_{p}\) and it is defined as

$$D_{p}=\frac{F_{s}}{N_{B}} \text{ bps },$$
(7)

where \(F_{s}\) is the sampling rate of the audio signal in Hz and \(N_{B}\) is the number of samples in each block. From Eq. (7), the data payload can be increased by decreasing the number of samples in each block. The data payload can affect the imperceptibility and robustness of any audio watermarking technique. We have performed our experiment with a varying number of samples in each block (i.e., varying data payload) and observed that the SNR increases as the data payload decreases; in our method, increasing the block size decreases the payload. In Table 3, the SNR values with varying data payload on different audio signals are given. For our proposed method, \(F_{s}\) is 44,100 Hz and \(N_{B}\) = 225, thus the data payload of our method is 196 bps. This is a very high payload, as a typical payload is 20–50 bps (Lei et al., 2012).
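
The arithmetic behind Eq. (7) can be checked with a few lines of Python. The function names below are hypothetical and serve only as an illustration of the formula.

```python
def data_payload_bps(sampling_rate_hz, samples_per_block):
    """Data payload D_p = F_s / N_B of Eq. (7), in bits per second."""
    if samples_per_block <= 0:
        raise ValueError("samples_per_block must be positive")
    return sampling_rate_hz / samples_per_block


def embedding_time_s(num_bits, payload_bps):
    """Seconds of audio needed to carry num_bits at the given payload."""
    return num_bits / payload_bps
```

With \(F_{s}=44{,}100\) Hz and \(15 \times 15\) blocks, `data_payload_bps(44100, 15 * 15)` gives 196 bps, and a \(32 \times 32\) watermark then occupies \(1024/196 \approx 5.22\) s of audio, which matches the duration of the test signals.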

Table 3 SNR versus payload

The data payload also affects the robustness of the watermarking method. Robustness (under any attack) is expected to decrease as the data payload increases. The results in Table 4 demonstrate that our proposed technique is robust even with a high payload on the ‘Classical’ audio signal under different attacks.

Table 4 BER (%) versus payload (under attacks) with ‘Classical’ audio signal

From Tables 3 and 4, we may note that increasing the window size (\(N_{B}=q \times q\)) improves both the imperceptibility and the robustness, but at the cost of payload, which decreases very fast. Therefore, larger window sizes are not considered.

4.4 Security

In the proposed method, a quantization interval \(\Delta\) is used during both watermark embedding and watermark extraction. Without the actual value of \(\Delta\), an unauthorized user cannot reliably extract the watermark, so this quantization interval serves as a secret key in our method. The other secret key is the FLT matrix, which is used to scramble the watermark image. Without the correct matrix, the actual sequence of the watermark cannot be identified. Together, these two secret keys enhance the security of the watermark.

5 Performance analysis

5.1 Error analysis

The error analysis of a watermarking technique is characterized by two types of error: (i) false positive error and (ii) false negative error. It is difficult to give an exact model of these errors. Here, we adopt a simplified model based on binomial probability distribution to provide an error analysis to compute the probability of two types of errors for the proposed technique.

5.1.1 False positive error analysis

The false positive error is the probability that an unwatermarked audio signal is considered to be watermarked during the extraction process. Let r be the total number of watermark bits and s be the number of matching bits found while extracting the watermark. The extracted watermark bits are assumed to be independent random variables. Let \(P_{e}\) be the probability that an extracted watermark bit matches the corresponding original watermark bit. Treating the bits as Bernoulli trials, we denote by \(P_{s}\) the probability that exactly s matching bits are found out of r watermark bits; it is defined as

$$P_{s}={r \atopwithdelims ()s} P_{e}^{s}(1-P_{e})^{r-s},$$
(8)

where \({r \atopwithdelims ()s}\) is the binomial coefficient. Since each watermark bit is either 0 or 1, \(P_{e} = \frac{1}{2}\) for an unwatermarked signal. A false positive error occurs when the number of matching bits is greater than or equal to some threshold \(T_{p}\). The probability of false positive error is denoted by \(P_{fp}\) and is defined as

$$P_{fp}= \sum ^{r}_{s=T_{p}}P_{s} = \sum ^{r}_{s=T_{p}}\frac{{r \atopwithdelims ()s}}{2^{r}}.$$
(9)

In our proposed scheme, we assume that an unwatermarked audio signal is claimed to be watermarked when 75% or more of the bits match. Thus, \(T_{p}= \left\lceil 0.75 \times r \right\rceil\) and \(P_{fp}\) is given as

$$P_{fp}=2^{-r} \sum ^{r}_{s=\left\lceil 0.75 \times r \right\rceil } {r \atopwithdelims ()s},$$
(10)

where r is the number of bits in the watermark (here \(r=32\times 32 = 1024\)). A plot of the false positive error probability for \(r \in (0, 100]\) is shown in Fig. 6. The probability of false positive error approaches zero when \(r \ge 25\). In this experiment, the false positive error probability \(P_{fp}\) is therefore close to 0: we obtain \(P_{fp} =9.12 \times 10^{-318}\) by putting r = 1024 in Eq. (10).
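Eq. (10) can be evaluated with exact integer arithmetic, which avoids floating-point underflow for large r. The following is a minimal sketch under the simplified binomial model above; the function name and the sample values of r are illustrative. The value printed for r = 1024 is whatever exact arithmetic yields under this reading of Eq. (10) and is not asserted to reproduce the figure reported above.

```python
from math import ceil, comb


def false_positive_probability(r: int, threshold_ratio: float = 0.75) -> float:
    """P_fp of Eq. (10): probability that at least ceil(ratio * r) of r
    independent fair bits match the watermark."""
    t_p = ceil(threshold_ratio * r)
    favourable = sum(comb(r, s) for s in range(t_p, r + 1))
    # Dividing two Python integers keeps full precision until the final
    # rounding to float, even when both operands are very large.
    return favourable / 2 ** r


for r in (4, 25, 100, 1024):
    print(f"r = {r:4d}, P_fp = {false_positive_probability(r):.3e}")
```

For a very small watermark the probability is far from negligible (for r = 4 the threshold is 3 matching bits, giving 5/16), which illustrates why a long watermark is needed for reliable detection.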

Fig. 6 Probability of false positive error

5.1.2 False negative error analysis

False negative error is the probability that a watermarked audio signal is declared unwatermarked. Let r be the total number of watermark bits and s be the number of matching bits. A watermarked audio signal is considered to be unwatermarked when the number of matching bits is less than or equal to a threshold \(T_{n}\). The probability of false negative error is denoted by \(P_{fn}\) and is defined as

$$P_{fn}=\sum ^{T_{n}}_{s=0}{r \atopwithdelims ()s}P_{e}^{s}(1-P_{e})^{r-s}.$$
(11)

Here, we use the same 75% threshold, i.e., if fewer than 75% of the bits match, we claim that there is no watermark in the given audio signal. So, \(T_{n} = \lceil 0.75 \times r \rceil -1\), which corresponds to a BER threshold of 25%. Then \(P_{fn}\) is written as

$$P_{fn}=\sum ^{\left\lceil 0.75 \times r \right\rceil -1}_{s=0}{r \atopwithdelims ()s} P_{e}^{s}(1-P_{e})^{r-s}.$$
(12)

Here, \(P_{e}\) depends on the attack; for different attacks, \(P_{e}\) has different values. From Tables 1 and 2, the BERs are all less than 0.1255, and hence \(P_{e}\) is taken as 0.8745 in our experiment. We obtain \(P_{fn} = 1.67 \times 10^{-88}\) from Eq. (12) by putting \(P_{e}=0.8745\) and r = 1024. Figure 7 plots the false negative error probability for \(r \in (0, 100]\). The false negative error probability approaches 0 when r is greater than 50. Table 5 shows the average NC values, the average BER values (averages are computed over the data in Tables 1 and 2) and the false negative error probability (\(P_{fn}\)) under audio attacks. The \(P_{fn}\) values of our proposed technique are very close to 0 against different attacks.
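Eq. (12) can be evaluated in the same exact manner by treating \(P_{e}\) as a rational number. The sketch below follows the simplified binomial model above; the function name is illustrative, and passing \(P_{e}\) as a `Fraction` is a design choice made here to avoid rounding, not something prescribed by the method. The printed values are what exact arithmetic yields under this reading of Eq. (12) and are not asserted to reproduce the figure reported above.

```python
from fractions import Fraction
from math import ceil, comb


def false_negative_probability(r: int, p_e: Fraction,
                               threshold_ratio: Fraction = Fraction(3, 4)) -> float:
    """P_fn of Eq. (12): probability that fewer than ceil(ratio * r) of r
    bits match when each bit matches independently with probability p_e."""
    t_n = ceil(threshold_ratio * r) - 1
    a, b = p_e.numerator, p_e.denominator  # p_e = a / b, 1 - p_e = (b - a) / b
    numerator = sum(comb(r, s) * a ** s * (b - a) ** (r - s)
                    for s in range(t_n + 1))
    # Every term shares the denominator b ** r, so a single integer
    # division at the end gives the probability without underflow.
    return numerator / b ** r


p_e = Fraction("0.8745")  # per-bit matching probability taken from the text
for r in (4, 50, 100, 1024):
    print(f"r = {r:4d}, P_fn = {false_negative_probability(r, p_e):.3e}")
```

Setting \(P_{e} = \frac{1}{2}\) in this function gives the complement of the false positive probability for the same r, since the two thresholds split the range of s at \(\lceil 0.75 \times r \rceil\).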

Fig. 7 Probability of false negative error

Table 5 NC, BER (%) and false negative error probability (\(P_{fn}\)) against different audio attacks

5.2 Quantization parameter

The quantization parameter \(\Delta\) plays an important role in the performance of any quantization-based audio watermarking technique. The two conflicting requirements of watermarking, i.e., imperceptibility and robustness, depend strongly on the quantization parameter. As the value of \(\Delta\) increases, the imperceptibility decreases, whereas the robustness of the technique increases. For an effective audio watermarking technique, a suitable \(\Delta\) value has to be selected experimentally. Figures 8 and 9 show the imperceptibility and robustness under different \(\Delta\), respectively. From the experimental evaluation, we have found that the optimal range of \(\Delta\) is [0.1, 0.3]; within this range the SNR is greater than 20 dB and the BER is less than 0.1. For \(\Delta \in [0.1, 0.3]\), the two figures show that (i) for lower \(\Delta\), imperceptibility is high and robustness is low, and (ii) for higher \(\Delta\), imperceptibility is low and robustness is high. So, we take a value near the middle of the range as the trade-off, and in this implementation we set \(\Delta =0.19\).

Fig. 8 SNR (dB) values under varying quantization steps (measures imperceptibility)

Fig. 9 BER under varying quantization steps (measures robustness)

5.3 Performance comparison and discussion

In this section, the performance of the proposed watermarking technique is compared with state-of-the-art techniques. We compare the proposed method with the four methods described in Vivekananda Bhat et al. (2011b), Lei et al. (2012), Li et al. (2017) and Novamizanti et al. (2020) with respect to robustness under different audio attacks, imperceptibility and data payload. The detailed experimental results for this comparison are given in Tables 6 and 7. From Table 6, the proposed method achieves the highest SNR with the maximum payload. The robustness of the five algorithms has been tested against MP3 compression, and the results are also shown in Table 6. Against MP3 compression, the proposed method performs better than two of the existing methods and slightly worse than the other two. So, we may note that the overall performance of the proposed method is better than that of the existing methods. In Table 7, the robustness of the different methods is compared; in terms of NC, the performance of the proposed method is equal to or better than that of the first three methods and similar to that of the fourth method. With respect to BER, no single method dominates: one method performs best against a particular attack, whereas another method performs best against a different attack, so the performances are very close. Therefore, we may note that the performance of the proposed method is comparable with that of the state-of-the-art methods. We may also note that the proposed method uses a very simple approach, SVD followed by quantization, to embed the watermark bit. Other existing methods use SVD together with more complex steps such as QIM, LWT, DWT, LWT–DCT, etc., and in that sense our method is less complex.

Table 6 Performance of audio watermarking scheme sorted by data payload
Table 7 Result of robustness of different audio watermarking techniques on the ‘Classical’ audio signal

6 Conclusions

In this work, we have proposed a blind audio watermarking technique using SVD and quantization. The watermark is embedded by quantization of the LSV of the audio blocks. The proposed method provides a good trade-off among imperceptibility, robustness and payload. The quantization parameter \(\Delta\) and the FLT provide an additional level of security. The experimental results demonstrate that the proposed technique is robust against different audio attacks. The performance of the proposed method is similar to or even better than that of the state-of-the-art methods. In addition, the proposed technique provides low error probabilities. These results indicate that the proposed watermarking technique can be used for copyright protection of audio signals. To further improve the performance, synchronization and error-correcting codes may be considered within the proposed architecture.