Secure logarithmic audio watermarking scheme based on the human auditory system

Fallahpour, Mehdi; Megías, David

doi:10.1007/s00530-013-0325-1

Regular Paper
Published: 09 June 2013

Volume 20, pages 155–164, (2014)
Multimedia Systems

Mehdi Fallahpour¹ &
David Megías²

Abstract

This paper proposes a high capacity audio watermarking algorithm in the logarithm domain based on the absolute threshold of hearing of the human auditory system (HAS), which makes this scheme a novel technique. Considering that the human ear requires more precise samples at low amplitudes (soft sounds), the use of the logarithm helps us design a logarithmic quantization algorithm. The key idea is to divide the selected frequency band into short frames and quantize the samples based on the HAS. Using frames and the HAS improves the robustness, since embedding a secret bit into a set of samples is more reliable than embedding it into a single sample. In addition, the quantization level is adjusted according to the HAS. Apart from remarkable capacity, transparency and robustness, this scheme provides three parameters (frequency band, scale factor and frame size) which facilitate the regulation of the watermarking properties. The experimental results show that the method has a high capacity (800–7,000 bits per second), without significant perceptual distortion (ODG > −1) and provides robustness against common audio signal processing such as added noise, filtering and MPEG compression (MP3).

1 Introduction

Traditional data protection methods, such as encryption, are not enough for audio copyright enforcement. Digital watermarking is a popular technique for digital data protection and digital rights management [26, 27]. According to the International Federation of the Phonographic Industry (IFPI) [3], audio watermarking should meet the following requirements: (a) imperceptibility: the watermarking scheme should not affect the perceptual quality of the audio (in this paper, this is achieved using a psychoacoustic model to guarantee that the watermarking process does not distort the cover audio signal); (b) capacity: the number of bits that can be embedded into the audio signal per second; and (c) robustness: the embedded watermark data should not be removed or eliminated by common audio signal processing operations and attacks, such as additive and multiplicative noise, MP3 compression and filtering. These requirements often conflict with each other, which makes the design of high capacity, transparent and robust audio watermarking schemes a challenging task.

Several research results exist for watermarking in the logarithm and cepstrum domains. Lee and Ho [4] introduced a digital audio watermarking method in which the watermark is embedded into the cepstrum coefficients of the audio signal using techniques analogous to spread spectrum communications. Li and Yu [5] suggested a robust and transparent audio data embedding method in the cepstrum domain. BCH code-based robust audio data hiding in the cepstrum domain is presented in [7]. Hsieh and Sou [6] suggested an audio watermarking technique based on time energy features. Li et al. [8] proposed an audio watermarking scheme in the cepstrum domain based on statistical mean manipulation; the embedded watermark is robust against MP3 compression and additive noise. Hu and Chen [9] proposed cepstral watermarking that manipulates the statistical mean. To avoid sharp discontinuities at the frame boundaries caused by the watermarking process, a small transition area is deliberately placed between frames, which also improves the perceived quality. Ko et al. [21] suggested a digital watermarking method based on the log scaling of frequency in the decoding process for robust detection. Yang et al. [10] were the first to apply log-polar mapping to audio watermarking. The log-polar mapping is only applied to the frequency index, not to the transform coefficients, which prevents the reconstruction distortion of the inverse log-polar transform and reduces the computational cost.

Watermarking methods based on the human auditory system (HAS) have been suggested in different previous works, such as [1, 2, 11, 30]. Garcia [1] proposed an algorithm to estimate the masking threshold in the psychoacoustic model of the HAS. Tsai et al. [2] proposed an intelligent audio watermarking method based on the characteristics of the HAS and neural networks in the DCT domain. Also, in [11], the watermark is embedded into selected DCT coefficients of the host audio signal such that the signal to noise ratio is maintained at a level which is audibly annoying to the HAS. Lie and Chang [30] proposed an algorithm that maintains an energy relation between every three sample sections to represent the embedded bit information by scaling up or down corresponding amplitudes and conserving audio waveforms that are perceivable to human ears.

With respect to the embedding domain, audio watermarking techniques can be classified into time domain and frequency domain methods. In [12, 13], which were proposed by the authors of this paper, the discrete/fast Fourier transform (DFT/FFT) domain is selected to embed watermarks in order to take advantage of the translation-invariant property of the FFT coefficients to resist small distortions in the time domain. In fact, compared to time domain schemes, transform-based methods provide better perceptual quality and robustness against common attacks at the price of an increased computational burden.

This paper presents an audio watermarking algorithm in the logarithm domain based on the HAS. Adapting the quantization level to the HAS in the logarithm domain is the novel idea of the algorithm. Based on the requirements, a frequency band, a frame size and a scaling factor are selected and each secret bit is embedded into a frame. In addition to very high capacity, imperceptible distortion and robustness against common attacks, which make this scheme outperform other works in the literature, the other main features of the proposed algorithm are as follows: (1) using logarithm coefficients enhances the robustness, (2) watermark extraction is blind, i.e. it does not use the host signal, (3) adjusting the quantization level based on the HAS improves transparency and robustness, making it possible to enhance them at the same time, which is a significant challenge for many techniques, (4) embedding a single secret bit into a frame with an adjustable frame size provides a convenient way to obtain a trade-off between the properties of the watermarking system, and (5) an encryption technique enhances the security of the system in such a way that an attacker, even if he/she knows the watermarking method, cannot extract the raw secret bits since a key is required to decrypt them.

The remainder of this paper is organized as follows. A brief overview of the HAS is given in Sect. 2. Section 3 combines the above techniques to propose a new method for audio watermarking. Moreover, the detailed watermark embedding and extraction algorithms are explained in that section. The experimental results and comparison with other schemes are given in Sect. 4 and, finally, relevant conclusions are drawn in Sect. 5.

2 Human auditory system

Extensive work has been performed over the years in understanding the characteristics of the HAS and applying this knowledge to audio compression and audio watermarking. Figure 1 shows a typical absolute threshold curve, where the abscissae are the frequencies measured in hertz (Hz) and the ordinates are the absolute thresholds in decibels (dB). As can be observed, human beings tend to be most sensitive to frequencies in the range from 1 to 4 kHz, while the threshold increases steeply at very high and very low frequencies. In particular, the sensitivity of the human ear is lower at high frequencies than at middle frequencies. Thus, by embedding data in the high frequency band, as is done in the proposed scheme, the distortion will be mostly inaudible and better transparency will be obtained.

The HAS can be modeled as a frequency analyzer containing a set of 25 band-pass filters, named critical bands, that cover the range 10 Hz–20 kHz. In the absence of other sounds, the perceived intensity of a single sound, called loudness, depends on this sound’s pressure level (SPL), duration and frequency. The threshold for masking a sound is determined by the frequency and SPL [2].

3 Proposed method

In this method, we use the following technique to embed a bit stream (secret bits) into the logarithm coefficients. First, based on the desired capacity, transparency and robustness, the frequency band, frame size and scale factor should be selected. The selected band is then divided into short frames and each sample is quantized based on the HAS. Each single secret bit of the watermark stream is embedded into all samples of a frame, which makes the method more robust against attacks.

Based on the HAS, the human ear sensitivity is different in various frequencies, i.e. the absolute threshold of hearing (ATH) is different for different frequency bands. The embedding scheme takes advantage of changes of ATH in various frequency bands to adjust the quantization level.
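As a rough illustration of how the ATH varies across the embedding band, the following Python sketch evaluates the ATH formula given in Step 5 of Sect. 3.2 and the resulting quantization scale $\delta_{j} = \alpha / {\text{ATH}}_{j}$. The function names `ath_db` and `quantization_scale` are hypothetical, and α = 10 is simply the default value mentioned in Sect. 3.1; this is a sketch, not the authors' implementation.

```python
import math


def ath_db(f_hz):
    """Absolute threshold of hearing (dB SPL) at frequency f_hz, in Hz."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 0.001 * k ** 4)


def quantization_scale(f_hz, alpha=10.0):
    """delta_j = alpha / ATH_j; the step size in the log domain is 1 / delta_j.

    Note: the formula yields negative ATH values around 3-4 kHz, so this
    ratio is only meaningful where ATH > 0 (e.g. in the 9-16 kHz band).
    """
    return alpha / ath_db(f_hz)


# Print ATH and delta at a few frequencies inside the default band.
for f in (9000, 12000, 16000):
    print(f, round(ath_db(f), 2), round(quantization_scale(f), 3))
```

Since the ATH grows quickly towards 16 kHz, δ shrinks and the quantization step 1/δ becomes coarser exactly where the ear is least sensitive.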

3.1 Tuning

The proposed method provides three parameters to adjust three properties of the watermarking system. The frequency band, the scaling factor (α) and the frame size (d) are the three parameters of this method to adjust capacity, perceptual distortion and robustness.

Since most MP3 cut-off frequencies [25] are higher than 16 kHz, the upper limit of the frequency band is set to 16 kHz. Then, to select the frequency band, only the lower limit, $f_{l}$, needs to be adjusted. The default value of $f_{l}$ is 9 kHz. Decreasing $f_{l}$ increases both capacity and distortion.

Increasing the frame size, d, results in a better robustness, but capacity decreases. The default value for the frame size is d = 5. Finally, to achieve better transparency the scaling factor, α, should be increased. However, decreasing the scaling factor leads to better robustness.

Figure 2 shows the flowchart for the selection of the tuning parameters. In the initialization, $f_{l}$ is 9 kHz, d is 5 and α is 10. This flowchart facilitates adjusting the parameters based on the requirements. However, adjusting the parameters to meet some demands is very difficult, and a trade-off between capacity, transparency and robustness must always be considered.
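The effect of the band limits and the frame size on capacity can be estimated with a short calculation. The sketch below rests on an assumption that is not stated explicitly in the text: an FFT over T seconds of audio has bins spaced 1/T Hz apart, so the band from $f_{l}$ to 16 kHz contributes about (16000 − $f_{l}$) positive-frequency bins per second of audio, and one bit is carried per frame of d bins. The function name `capacity_bps` is hypothetical.

```python
def capacity_bps(f_low_hz, f_high_hz=16000.0, frame_size=5):
    """Approximate payload in bits per second of audio.

    Assumption: the band holds (f_high - f_low) FFT bins per second of
    signal, and every frame of `frame_size` bins carries one secret bit.
    """
    if frame_size < 1 or f_high_hz <= f_low_hz:
        raise ValueError("need frame_size >= 1 and f_high_hz > f_low_hz")
    return (f_high_hz - f_low_hz) / frame_size


print(capacity_bps(9000))                 # band 9-16 kHz, d = 5
print(capacity_bps(9000, frame_size=1))   # band 9-16 kHz, d = 1
print(capacity_bps(12000))                # band 12-16 kHz, d = 5
```

Under this assumption, a 12–16 kHz band with d = 5 gives 800 bps and a 9–16 kHz band with d = 1 gives 7,000 bps, which is consistent with the capacity range reported in the abstract.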

3.2 Embedding the secret bits

The frequency band, the scaling factor (α) and the frame size (d) are the three required parameters in the embedding process which have to be adjusted according to the requirements. In this section, for simplicity, we do not consider the regulation of these parameters and just take them as fixed. The effects of these parameters are analyzed in Sect. 4.

In the embedding steps, first the FFT is calculated and then the logarithm is computed. The next step is embedding the secret bits and, finally, the inverse FFT is applied to generate the marked audio signal. The embedding steps are detailed below.

1. Compute the FFT of the original audio signal. We can use the whole file (for short clips, e.g. with less than 1 min) or blocks of a given length (e.g. 10 s) for longer files.
2. Calculate the logarithm coefficients of the FFT samples.
3. Divide the logarithm samples in the selected frequency band into frames of size d.
4. To improve the security, the secret bit stream, B, is encrypted by a key, C, to form the watermark signal W:
$$W = E\left( C, B \right),$$
where E is the encryption operation.

For example, the embedded bit stream W may be computed as the exclusive-or (XOR) sum of the real watermark and a pseudo-random bit stream. Then, the seed C to produce the pseudo-random bit stream would be required as part of the secret key both at the embedder and the detector [20].
5. The marked logarithm samples $\{ c_{j}^{\prime} \}$ are obtained by using the following equation:
$$c_{j}^{\prime} = \begin{cases} \lfloor c_{j}\delta_{j} \rfloor / \delta_{j}, & \text{if } w_{l} = 0, \\ \left( \lfloor c_{j}\delta_{j} \rfloor + 0.5 \right) / \delta_{j}, & \text{if } w_{l} = 1, \end{cases}$$
where $l = \lfloor j/d \rfloor + 1$, $w_{l}$ is the lth bit of the watermark, $\delta_{j} = \alpha / {\text{ATH}}_{j}$, $\alpha$ is a scaling factor and $\lfloor x \rfloor$ denotes the largest integer not greater than x (rounding towards negative infinity). ${\text{ATH}}_{j}$ is the absolute threshold sound level for each sample, which is calculated by:
$${\text{ATH}}\left( f \right) = 3.64\left( \frac{f}{1000} \right)^{-0.8} - 6.5\,e^{-0.6\left( \frac{f}{1000} - 3.3 \right)^{2}} + 0.001\left( \frac{f}{1000} \right)^{4} \quad \left( {\text{dB SPL}} \right).$$
Each secret bit is embedded into a suitable frame.
6. Finally, use the inverse logarithm (exponential function) and the inverse FFT to obtain the marked audio signal.

Figure 3 shows the flowchart for the embedding steps.
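The quantization in Step 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `ath_db` and `embed_log_coeffs` are hypothetical, indices are zero-based (so the frame index is `j // d` rather than $\lfloor j/d \rfloor + 1$), and the log-domain coefficients of the selected band and their frequencies are assumed to be supplied by the caller. The FFT, logarithm, exponential and inverse FFT steps are omitted.

```python
import math


def ath_db(f_hz):
    """Absolute threshold of hearing (dB SPL), formula from Step 5."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 0.001 * k ** 4)


def embed_log_coeffs(log_coeffs, freqs_hz, bits, d=5, alpha=10.0):
    """Quantize log-domain coefficients so each frame of d samples carries one bit.

    log_coeffs : c_j, log-domain coefficients inside the selected band
    freqs_hz   : frequency (Hz) of each coefficient, used to evaluate ATH_j
    bits       : (encrypted) watermark bits w_l, one per frame
    Coefficients beyond the last complete frame are returned unchanged.
    """
    marked = list(log_coeffs)
    n_frames = min(len(bits), len(log_coeffs) // d)
    for j in range(n_frames * d):
        w = bits[j // d]                       # bit assigned to this frame
        delta = alpha / ath_db(freqs_hz[j])    # delta_j = alpha / ATH_j
        q = math.floor(log_coeffs[j] * delta)  # floor(c_j * delta_j)
        marked[j] = (q + 0.5 * w) / delta      # grid point for "0", half-step for "1"
    return marked


# Example with ten made-up coefficients between 9.0 and 9.9 kHz and two bits.
coeffs = [3.7, -1.2, 0.4, 5.9, 2.2, 1.1, -0.3, 4.4, 0.9, 2.8]
freqs = [9000 + 100 * i for i in range(10)]
print(embed_log_coeffs(coeffs, freqs, [0, 1]))
```

Each marked coefficient differs from the original by at most one quantization step 1/δ, which is why a larger α (finer step) gives less distortion but leaves less margin against attacks.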

As is evident, increasing the scale factor increases the accuracy of the samples, which results in better transparency (less distortion) but also less robustness against attacks. In addition, by enlarging the frequency band, the capacity and distortion increase and robustness decreases. Finally, increasing the frame size strengthens the robustness against attacks and reduces the capacity.

Note that the HAS model has been applied (in Step 5) using only its passive properties (without frequency masking). This choice is much more efficient from a computational point of view and makes it possible to use the proposed system in real-time applications. If real-time embedding is not a requirement, frequency masking could be considered in the scheme. However, the transparency results achieved with the scheme (as presented in Sect. 4) are remarkable even without using frequency masking. Thus, the application of frequency masking is left for future work.

3.3 Extracting the secret bits

The watermark extraction process is performed in the logarithm domain and the required parameters can be considered as side information. The scale factor, the frame size and the frequency band can be transmitted in a secure way to the decoder or they could be embedded using some fixed settings. For example, we could use default parameters to embed only the value of the adjusted parameters. Then, in the decoder, the adjusted parameters would be extracted using the default parameters and the secret bits would be obtained using the extracted adjusted parameters. Note that these parameters (frequency band, scaling factor and frame size) are also part of the secret key of the scheme (required both at the embedding and the detector side), together with the key C used for encryption. Hence, if the values of the tuning parameters are embedded at fixed (default) positions, they should be embedded as ciphertext for security reasons. Because the host audio signal is not required in the detection process, the detector is blind. The detection process can be summarized in the following steps:

1. Compute the FFT of the marked audio signal.
2. Calculate the logarithm of the FFT coefficients.
3. Divide the logarithm samples in the selected frequency band into frames of size d.
4. To detect a secret bit in a frame, each sample is examined to check whether it indicates a zero frame (“0” embedded) or a one frame (“1” embedded). Then, depending on the evaluation of all samples in the current frame, a secret bit can be detected. The bit extracted from each sample ($S_{j}^{\prime}$) is obtained using the following equation:
$$S_{j}^{\prime} = \begin{cases} 0, & \text{if } \left| c_{j}^{\prime}\delta_{j} - \mathrm{round}\left( c_{j}^{\prime}\delta_{j} \right) \right| < 0.25, \\ 1, & \text{if } \left| c_{j}^{\prime}\delta_{j} - \mathrm{round}\left( c_{j}^{\prime}\delta_{j} \right) \right| \ge 0.25. \end{cases}$$
After getting information about all samples in the frame, the secret bit ($w_{l}^{\prime}$) related to the frame is extracted based on the number of samples which represent “0” or “1” (voting scheme). If the number of samples identified as “0” is equal to or larger than half the frame size, the extracted bit is “0”, otherwise it is “1”.
5. To recover the raw watermark stream, the extracted bits are decrypted using the encryption key and the decryption algorithm.
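The per-sample decision and the voting rule of Step 4 can be sketched in Python as follows. As before, this is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the log-domain coefficients of the selected band and their frequencies are assumed to be given, and decryption (Step 5) is left out.

```python
import math


def ath_db(f_hz):
    """Absolute threshold of hearing (dB SPL), formula from Sect. 3.2."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 0.001 * k ** 4)


def extract_bits(marked_log_coeffs, freqs_hz, d=5, alpha=10.0):
    """Blind extraction: one bit per complete frame of d samples."""
    bits = []
    for start in range(0, len(marked_log_coeffs) - d + 1, d):
        zeros = 0
        for j in range(start, start + d):
            delta = alpha / ath_db(freqs_hz[j])
            x = marked_log_coeffs[j] * delta
            # Distance to the nearest integer: close to 0 votes for "0",
            # close to 0.5 votes for "1".
            if abs(x - round(x)) < 0.25:
                zeros += 1
        # A frame decodes to "0" when at least half of its samples vote "0".
        bits.append(0 if zeros >= d / 2 else 1)
    return bits


# Two frames at 10 kHz: the first on grid points ("0"), the second on half-steps ("1").
delta = 10.0 / ath_db(10000)
frame0 = [k / delta for k in (3, 4, 5, 6, 7)]
frame1 = [(k + 0.5) / delta for k in (1, 2, 3, 4, 5)]
print(extract_bits(frame0 + frame1, [10000] * 10))
```

Because the decision is taken by majority over the frame, a few samples pushed across the 0.25 threshold by an attack do not necessarily flip the extracted bit, which is the reason a larger d improves robustness.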

In fact, an attacker would need access to all of the following information to extract the secret stream:

Embedding algorithm;
Encryption algorithm;
Encryption/decryption key;
Frequency band in the embedding procedure;
Frame size in the embedding procedure;
Scaling factor in the embedding procedure.

Thus, it is extremely difficult, if not impossible, for an attacker to extract the secret information embedded into the audio signal.
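The XOR-based encryption mentioned as an example in Step 4 of Sect. 3.2 can be sketched as follows. The names `keystream`, `xor_encrypt` and `xor_decrypt` are hypothetical, and Python's `random.Random` is used only for illustration: it is not a cryptographically secure generator, so a real system would need a secure keystream generator seeded by the key C.

```python
import random


def keystream(seed, n):
    """Pseudo-random bit stream derived from the secret seed C (illustration only)."""
    rng = random.Random(seed)  # Mersenne Twister: NOT cryptographically secure
    return [rng.getrandbits(1) for _ in range(n)]


def xor_encrypt(bits, seed):
    """W = E(C, B): XOR the secret bits B with the keystream generated from C."""
    return [b ^ k for b, k in zip(bits, keystream(seed, len(bits)))]


# XOR with the same keystream is its own inverse, so decryption reuses the function.
xor_decrypt = xor_encrypt

secret = [1, 0, 1, 1, 0, 0, 1, 0]
watermark = xor_encrypt(secret, seed=1234)
print(xor_decrypt(watermark, seed=1234) == secret)
```

The detector needs the same seed to regenerate the keystream, which is why the seed forms part of the secret key at both the embedder and the detector.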

4 Experimental results

To evaluate the performance of the proposed method and to consider the applicability of the scheme in a real scenario, all songs in the album Rust by No, Really [16] and the most popular tracks of different albums in different genres [28] have been used. All audio clips are sampled at 44.1 kHz with 16 bits per sample and two channels. Note that the presented results are just for one channel: the left one. In other words, we have converted the stereo signals to mono and used only the left channel.

Audio watermarking applications require a trade-off between the desired properties, namely, capacity, robustness and transparency. The following scenarios can be assumed to obtain different results regarding these three properties:

(1) No robustness: in this case, very high capacity and transparency can be achieved;
(2) Semi-robustness: robustness against MP3 compression and common attacks is demanded. In this case, more distortion should be accepted, as compared to Scenario 1;
(3) Robustness against many attacks with a wide range of changes is desirable. This is more difficult and complicated than the previous scenarios, since we need robustness against various attacks. Thus, according to the trade-off between capacity, transparency and robustness, a sacrifice in capacity and transparency is required.

The significant advantage of this scheme is providing superior results for all these three conditions.

The objective difference grade (ODG) has been used in this paper to evaluate the transparency of the proposed algorithm. The ODG is one of the output values of the ITU-R BS.1387 PEAQ [17] standard, where ODG = 0 means no degradation and ODG = −4 means a very annoying distortion. Values of ODG between −1 and 0 are required for transparent watermarking. The OPERA software [29] based on the ITU-R BS.1387 standard has been used to compute this objective measure of quality.

Table 1 shows the perceptual distortion, payload and BER under the MP3 compression attack with different bit rates. Note that different values for parameters are used to achieve a different trade-off between capacity, transparency and robustness, as usual for all watermarking systems. For example, for “Beginning of the end”, a frame size d = 1 and a wide frequency band, the results show high capacity and robustness against MP3-128. On the other hand, using a frame size d = 5 and a narrower frequency band, less capacity and better robustness is achieved. Also, increasing the scaling factor results in more accuracy and better transparency, whereas decreasing it leads to better robustness.

Table 1 Results of three real song signals (robust against Table 2 attacks)

In this scheme, we have three parameters and audio watermarking schemes have three main properties. Thus, we have three inputs and three outputs for a nonlinear system which works based on the HAS. Finding explicit equations to adjust the requirements is extremely difficult and sometimes impossible. We may use different loops and conditions to obtain better results.

As mentioned in Sect. 3.1, we have general tuning rules which can help us to reach the requirements or to get close to them very quickly. The frame size has more effect on robustness, whereas the scaling factor and frequency band have more effect on transparency and capacity. In other words, increasing the frame size yields better robustness, widening the frequency band yields higher capacity, and increasing the scaling factor yields better transparency.

Note that these parameters make it possible to regulate the ODG between 0 (not perceptible) and −1 (not annoying), with a capacity of about 800–7,000 bits per second (bps) and robustness against MP3-128, which considerably exceeds typical requirements.

The default parameter values (frequency band 12–16 kHz, frame size equal to 5 and scaling factor equal to 10) have been selected for “Stop payment” and “Breathing on another planet” audio test files. The ODG for “Breathing on another planet” is −0.43 and for “Stop payment” it is −0.19.

Table 2 illustrates the effect of several common attacks, provided by the Stirmark Benchmark for Audio (SMBA) v1.0 [14], on ODG and BER for the two selected audio test files. The parameters were selected for each signal, then the embedding method was applied, the SMBA software was used to attack the marked files and, finally, the detection method was applied for the attacked files. The ODG in Table 2 is calculated between the marked and the attacked-marked files. The parameters of the attacks are selected according to the definitions provided in the SMBA web site [18].

Table 2 Robustness test results

For example, in AddBrumm, 1–4 k shows the strength and 1–4.5 k shows the frequency. This row reports that any value in the range 1–4 k for the strength and 1–4.5 k for the frequency can be used without any significant change in BER. In fact, this table provides the average results for the test signals based on the BER and, in the case with the same BER, based on the limitation of the parameters. It can be seen that the proposed scheme produces excellent robustness against all these attacks (BER close to zero) even if the attacks significantly distort the audio files (even for ODG lower than −3).
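The text does not spell out how the BER is computed. Assuming the usual definition, i.e. the fraction of extracted bits that differ from the embedded ones, it can be sketched as follows (the function name `bit_error_rate` is hypothetical):

```python
def bit_error_rate(embedded, extracted):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("bit streams must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)


# One wrong bit out of four.
print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```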

Table 3 shows how considering the HAS improves the properties of the watermarking system. In the proposed method, the quantization level is adjusted by the ATH, which results in better properties. For example, for “Breathing on another planet”, when the frequency band is 9–16 kHz, the BER is about 0.05 both for adaptive quantization (considering the HAS) and for constant quantization (without any HAS model). However, the distortion caused by watermarking is almost imperceptible for adaptive quantization, whereas it is very annoying when constant quantization is used.

Table 3 Adaptive vs. constant quantization

To reduce the computational time and memory usage, songs can be divided into small clips, e.g. 10 s each. Then, the synchronization method described in [19] and the embedding algorithm described in this paper are applied to each clip separately.

Figure 4 shows the difference between adaptive and constant quantization. As this plot illustrates, using adaptive scaling quantization, the transparency can be improved and kept in a perceptible but not annoying area, which is the typical requirement for a watermarking system. However, using constant quantization, the embedding method can destroy the cover audio signal and the ODG will be in the annoying area when capacity is increased beyond some threshold.

The method proposed in this paper has been compared with several recent audio watermarking strategies. Almost all the audio data hiding schemes which produce very high capacity are fragile against signal processing attacks. Because of this, it is not possible to establish a comparison of the proposed scheme with other audio watermarking schemes which are similar to it as far as capacity is concerned. Hence, we have chosen a few recent and relevant audio watermarking schemes in the literature. In Table 4, we compare the performance of the proposed watermarking algorithm and several recent audio watermarking strategies robust against the MP3 attack. Speech applications and codecs are considered in [14]. The distortion introduced to the marked signal is slightly annoying, capacity is very low and robustness is achieved against compression attacks. More recently, [15] introduced a very fast scheme which uses the Fourier transform. The embedding bit-rate is low, 64 bits per second, but the scheme is very robust against several attacks. Lie and Chang [30] consider the HAS to present a method in the time domain, but the embedding capacity is quite low. Baras et al. [31] present a transparent technique, but, in some cases, the distortion is slightly annoying. The provided capacity in [31] is about a hundred bits. The schemes in [12, 13], which were also proposed by the authors of this paper, have a remarkable performance in the different properties, but the scheme proposed in this paper can manage the needed properties better, since there are three useful adjustable parameters. In particular, the results of this paper make it possible to improve the transparency results with respect to [12, 13] due to the explicit use of the HAS and adaptive quantization. This comparison shows the superiority in both capacity and imperceptibility of the suggested method for the same robustness with respect to other robust schemes.
This is particularly relevant, since the proposed scheme can embed much more information and, at the same time, introduces less distortion in the marked file. In short, the proposed scheme achieves higher capacity if we compare it with methods with similar robustness and imperceptibility, and more robustness and imperceptibility if we compare it to methods with similar capacity.

Table 4 Comparison of different watermarking algorithms

5 Conclusions

This paper suggests an audio watermarking algorithm in the logarithm domain based on the HAS. The human ear requires more precise samples at low amplitudes (soft sounds) and taking advantage of the logarithm it is possible to design a logarithmic quantization algorithm to exploit this property. Adjusting the quantization level based on the HAS in the logarithm domain results in a very high capacity, imperceptible distortion and robustness. The most notable features of the proposed algorithm are as follows: (1) blind watermark extraction, (2) adaptive quantization based on the HAS that improves transparency and robustness, making it possible to enhance them simultaneously, which is a main challenge of many techniques; and (3) embedding a single secret bit into all samples of a frame, with an adjustable frame size, delivers a suitable solution to obtain a convenient trade-off between the properties of the watermarking system.

The experimental results show that the scheme provides high capacity (800–7,000 bps), without significant perceptual distortion (ODG is greater than −1) whilst achieving robustness against common audio signal processing, such as added noise, filtering and MPEG compression (MP3).

References

1. Garcia, R.: Digital watermarking of audio signals using a psychoacoustic auditory model and spread spectrum theory. In: AES 107th Convention, pp. 123–131 (1999)
2. Tsai, H.H., Cheng, J.S., Yu, P.T.: Audio watermarking based on HAS and neural networks in DCT domain. EURASIP J. Appl. Signal Process. 3, 252–263 (2003)
3. Katzenbeisser, S., Petitcolas, F.A.P.: Information hiding techniques for steganography and digital watermarking. Artech House, Boston (2000)
4. Lee, S.K., Ho, Y.S.: Digital audio watermarking in the cepstrum domain. IEEE Trans. Consum. Electron. 46(3), 744–750 (2000)
5. Li, X., Yu, H.H.: Transparent and robust audio data hiding in cepstrum domain. In: IEEE International Conference on Multimedia and Expo, vol. 1, pp. 397–400 (2000)
6. Hsieh, C.-T., Sou, P.-Y.: Blind cepstrum domain audio watermarking based on time energy features. In: 14th International Conference on Digital Signal Processing, vol. 2, pp. 705–708 (2002)
7. Liu, S.C., Lin, S.D.: BCH code based robust audio watermarking in the cepstrum domain. J. Inform. Sci. Eng. 22, 535–543 (2006)
8. Li, S., Cui, L., Choi, J., Cui, X.: An audio copyright protection schemes based on SMM in cepstrum domain. In: International Workshops on Structural, Syntactic, and Statistical Pattern Recognition (SSPR and SPR’06), LNCS, vol. 4109, pp. 923–927 (2006)
9. Hu, H.T., Chen, W.H.: A dual cepstrum-based watermarking scheme with self-synchronization. Signal Process. 92(4), 1109–1116 (2012)
10. Yang, R., Kang, X., Huang, J.: Robust audio watermarking based on log-polar frequency index. In: 7th International Workshop on Digital Watermarking, IWDW 2008, Lecture Notes in Computer Science, vol. 5450, pp. 124–138. Springer (2008)
11. Dutta, M.K., Gupta, P., Pathak, V.K.: A perceptible watermarking algorithm for audio signals. Multimed. Tools Appl., pp. 1–23 (Feb 2012)
12. Fallahpour, M., Megías, D.: High capacity audio watermarking using FFT amplitude interpolation. IEICE Electron. Express 6(14), 1057–1063 (2009)
13. Fallahpour, M., Megías, D.: Robust high-capacity audio watermarking based on FFT amplitude modification. IEICE Trans. Inf. Syst. E93-D(01), 87–93 (2010)
14. Nishimura, A.: Audio data hiding that is robust with respect to aerial transmission and speech codecs. Int. J. Innov. Comput. Inf. Control 6(3(B)), 1389–1400 (2010)
15. Kang, X., Yang, R., Huang, J.: Geometric invariant audio watermarking based on an LCM feature. IEEE Trans. Multimed. 13(2), 181–190 (2011)
16. No, Really: “Rust”. http://www.jamendo.com/en/album/7365
17. Thiede, T., Treurniet, W.C., Bitto, R., Schmidmer, C., Sporer, T., Beerens, J.G., Colomes, C., Keyhl, M., Stoll, G., Brandenburg, K., Feiten, B.: PEAQ - The ITU standard for objective measurement of perceived audio quality. J. AES 48(1/2), 3–29 (2000)
18. Stirmark Benchmark for Audio. http://wwwiti.cs.uni-magdeburg.de/~alang/smba.php
19. Wang, X.Y., Zhao, H.: A novel synchronization invariant audio watermarking scheme based on DWT and DCT. IEEE Trans. Signal Process. 54(12), 4835–4840 (2006)
20. Megías, D., Herrera-Joancomartí, J., Minguillón, J.: Total disclosure of the embedding and detection algorithms for a secure digital watermarking scheme for audio. In: 7th International Conference on Information and Communication Security, ICICS 2005, Lecture Notes in Computer Science, vol. 3783, pp. 427–440. Springer (2005)
21. Ko, B.S., Nishimura, R., Suzuki, Y.: Log-scaling watermark detection in digital audio watermarking. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04), vol. 3, pp. 81–84 (2004)
22. Unoki, M., Hamada, D.: Method of digital-audio watermarking based on cochlear delay characteristics. Int. J. Innov. Comput. Inf. Control 6(3(B)), 1325–1346 (2010)
23. Kondo, K., Nakagawa, K.: A digital watermark for stereo audio signals using variable inter-channel delay in high-frequency bands and its evaluation. Int. J. Innov. Comput. Inf. Control 6(3(B)), 1209–1220 (2010)
24. Gulbis, M., Muller, E., Steinebach, M.: Content-based audio authentication watermarking. Int. J. Innov. Comput. Inf. Control 5(7), 1883–1892 (2009)
25. Burnett, I.S., Pereira, F., Van de Walle, R., Koenen, R.: The MPEG-21 book. Wiley (2006)
26. Xu, C.S., Feng, D.D.: Robust and efficient content-based digital audio watermarking. Multimedia Syst. 8(5), 353–368 (2002)
27. Peinado, M., Petitcolas, F.A.P., Kirovski, D.: Digital rights management for digital cinema. Multimedia Syst. 9(3), 228–238 (2003)
28. http://www.jamendo.com/en/
29. http://www.opticom.de/products/opera-demoversion.html
30. Lie, W.N., Chang, L.C.: Robust and high-quality time-domain audio watermarking subject to psychoacoustic masking. In: The 2001 IEEE International Symposium on Circuits and Systems, ISCAS 2001, vol. 2. IEEE (2001)
31. Baras, C., Moreau, N., Dymarski, P.: Controlling the inaudibility and maximizing the robustness in an audio annotation watermarking system. IEEE Trans. Audio Speech Lang. Process. 14(5), 1772–1782 (2006)

Acknowledgments

This work was partly funded by the Spanish Government through projects TSI2007-65406-C03-03 “E-AEGIS”, TIN2011-27076-C03-02 “CO-PRIVACY” and CONSOLIDER INGENIO 2010 CSD2007-0004 “ARES”.

Author information

Authors and Affiliations

School of Information Technology and Engineering (SITE), University of Ottawa, Ottawa, Canada
Mehdi Fallahpour
Estudis d’Informàtica, Multimèdia i Telecomunicació, Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya, Barcelona, Spain
David Megías

Corresponding author

Correspondence to Mehdi Fallahpour.

About this article

Cite this article

Fallahpour, M., Megías, D. Secure logarithmic audio watermarking scheme based on the human auditory system. Multimedia Systems 20, 155–164 (2014). https://doi.org/10.1007/s00530-013-0325-1

Published: 09 June 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s00530-013-0325-1
