1 Introduction

Online video piracy is a major concern for film production companies. Illegal camcording in cinemas is the most common source of pirated video: footage captured from the screen by a camcorder is widely distributed over the Internet in violation of copyright law. Easy Internet access and the large number of streaming servers make pirated videos simple to watch online. Although strict laws have been passed in various countries to stop online piracy, they have been ineffective in practice. For example, The Dark Knight was illegally downloaded seven million times within only six months of its original release in 2008, even though Warner Brothers had prepared for this challenge several months in advance [30]. Video watermarking is one of the most important techniques for tackling this problem.

In video watermarking, copyright information is embedded into the video signal. A watermarking system normally consists of two stages: embedding and detection. At the embedding stage, a watermark is generated and embedded into the original signal to produce the watermarked signal. At the detection stage, the input signal is examined to determine whether it contains the watermark [21]. In most cases, referred to as blind watermarking, the original signal is not available at the detection stage, and detection must be carried out without the reference signal. One of the main issues in blind watermarking is synchronization, which can be defined as the process of extracting the relation between the spatial and temporal coordinates or, in other words, locating the watermark. Any synchronization problem therefore impairs watermark detection [21].

Since the watermark is extra information added to the host signal, it can degrade the visual quality of the video. Maintaining the video quality at an acceptable level after watermarking, known as imperceptibility, is therefore a major characteristic of a watermarking system. A watermarked video may also suffer from various kinds of distortion. Consider, for instance, camcording in a cinema. If the video is captured from a large screen with a high-quality camera capable of HD recording, the video resolution might be preserved. However, if the video is to be played on portable devices such as mobile phones, its format or resolution might be changed. In addition, an improper camera position can introduce distortions such as upscaling, rotation and cropping, which disturb the synchronization. The video might also lose some information during transmission over the Internet. Frame rate conversion is another attack often applied to break the synchronization. Overall, the watermark must remain detectable whether the distortion was applied maliciously or unintentionally. Robustness is therefore another major characteristic of a watermarking system [21].

In recent years, various blind watermarking algorithms have been proposed to withstand different types of attacks. In [12], Harris detection is used and the watermark is embedded into the discrete cosine transform (DCT) domain of the Y component of the cover video. Similar techniques are presented in [32] and [19], where the watermark is embedded into the low frequency DCT coefficients because attacks such as downscaling mainly affect the high frequency coefficients: downscaling a frame in the spatial domain is almost equivalent to removing the high frequency bands in the DCT domain. However, the human visual system (HVS) is more sensitive to changes in the low frequency coefficients. Since the coefficients around DC normally have high magnitudes, even subtle modification of such coefficients results in perceivable degradation of the video quality. Therefore, the watermark power cannot be increased in such methods because of imperceptibility considerations. Although the methods proposed in [12] and [32] are robust against scaling and rotation, they face limitations under cropping attacks. The method proposed in [19] is based on quantization index modulation (QIM) embedding and is robust against downscaling, frame rate conversion and format conversion. However, it cannot withstand geometric attacks such as rotation and cropping.

Rasti et al. [28] proposed to embed the watermark into the low entropy parts of all three RGB color channels of each frame by combining QR decomposition, SVD, the Chirp Z-transform and DWT. This method performs well against different attacks; however, it uses a non-blind detection algorithm. To provide more security and to mislead the intruder, a frame selection algorithm is proposed in [5]. In this method, a few frames are selected for embedding based on a mathematical relationship between the number of frames and the capacity. However, this may make the system vulnerable to temporal attacks, which are not considered in that work. In [29], a video watermarking method based on the bi-orthogonal wavelet transform (BWT) is proposed, and an improved artificial bee colony algorithm is employed to generate random frames for the embedding process. Experiments show good performance against different attacks, but combined attacks are not considered at all.

It should be mentioned that the DWT and DCT transforms have a variety of uses in image processing [20]. A combined DWT-DCT transform is utilized in [13], where the Arnold transform is also employed to enhance the security and robustness. To improve the robustness of the watermarking algorithm, key-frames can be selected as carriers in the frame sequence [10]. The authors of [10] proposed a method based on boundary luminosity analysis to extract the key-frames; the watermark is then embedded in the low frequency DCT coefficients of the key-frames. In contrast, the method proposed in [1] utilizes the P-frames of the video for embedding. The motion information is analyzed to find appropriate selections, and then nonzero quantized residuals are used for watermark embedding. One popular solution for controlling the watermark strength and generating robust and imperceptible watermarked carriers is to consider the Just Noticeable Distortion (JND). A saliency-modulated JND profile is proposed in [6] to improve the video watermarking scheme. That method aims for the most robust possible scheme with an imperceptible watermark.

In [11] and [7], watermarking algorithms based on the discrete wavelet transform (DWT) are proposed. In [11], the watermark is embedded into the DWT coefficients of a sub-image. To detect the watermark, the sub-image is first extracted from the entire image, and then the watermark detection algorithm is applied to it. This method is robust against cropping but faces some problems under rotation. The method in [7] also lacks robustness against geometric attacks. These weaknesses arise from two issues in the DWT domain. The first is the lack of shift invariance: a subtle shift of the input signal causes significant alterations in the distribution of the coefficients at various scales. The second is the poor directional sensitivity. The dual-tree complex wavelet transform (DTCWT), with intrinsic features such as approximate shift invariance, good directional sensitivity, perfect reconstruction and efficient computation, has been introduced to address these issues [14,15,16]. Because the magnitude of the low frequency coefficients remains almost the same after rotation and scaling, watermarking techniques based on DTCWT are robust against geometric attacks [9].

Recently, many watermarking algorithms have been proposed based on DTCWT [2,3,4, 18, 22, 23, 26, 27, 31]. In [9], all video frames are first transformed using a 4-level DTCWT. The watermark coefficients are derived by applying a 1-level DTCWT to the watermark and are added to the third and fourth level coefficients of the frame with proper weights. This method is robust against upscaling, cropping, rotation and lossy compression, but it performs poorly against downscaling in resolution. Moreover, in this method, the luminance channel (Y) of the frames is used for watermark embedding. Since the HVS is more sensitive to the luminance channel than to the chrominance channels [17, 25], the imperceptibility is reduced as the watermark power increases. A hybrid watermarking system based on DTCWT and Singular Value Decomposition (SVD) is proposed in [4]; it improves the imperceptibility by using the U channel of the frames. However, the performance of this method is not evaluated against frame rate conversion. SVD is also combined with secret sharing to obtain a robust video watermarking scheme [8]. Another DTCWT based algorithm is proposed in [3], where the watermark is embedded into all six sub-bands of the third level of a 3-level DTCWT. This method also embeds the watermark into the U channel and is robust against the frame rate conversion attack in addition to geometric attacks, but its computational complexity is high and it performs poorly against large combined geometric distortions.

In this paper, a video watermarking technique based on DTCWT is presented. To improve detection, the watermark generation differs from common approaches in that every watermark row is generated with its own key, different from those of the other rows. The U channel is used for watermark embedding to increase the imperceptibility. Moreover, to increase the robustness against attacks, the high-pass sub-bands of the third level of a 3-level DTCWT are used, and only the coefficients of the first and sixth sub-bands are modified to improve the imperceptibility. To significantly decrease the embedding time, the watermark is not embedded directly in the transform domain; instead, its equivalent mask is applied in the spatial domain. At the detection stage, the watermark is first extracted from the received video. Then the two-dimensional (2D) normalized cross correlation (NCC) between the original watermark (generated from a given key) and the extracted watermark is computed. The presence or absence of the watermark is decided by comparing the result to a preset threshold. Several scenarios are considered to evaluate the performance of the proposed method, and it is compared with state-of-the-art methods.

The rest of the paper is organized as follows. Section 2 describes the different parts of the proposed method in detail. Section 3 evaluates the method against different video attacks. Finally, Section 4 concludes the paper.

2 Watermark algorithm

2.1 Watermark generation

The watermark is assumed to be a random 2D matrix with elements −1 and +1. A pseudo-random number generator with the following key is used to generate the watermark:

$$ k = k_{c}+ k_{v} $$
(1)

The overall key consists of two parts: the fixed key \(k_{c}\) and the variable key \(k_{v}\), which changes every v frames. \(k_{v}\) prevents the same watermark from being embedded repeatedly into many frames. A large value of v means that the same watermark is embedded into many consecutive frames, which makes it vulnerable to estimation attacks, so v should not be too large. On the other hand, a small value of v helps the attacker remove the watermark from the video by means of temporal frame averaging [21]. Therefore, v must be set properly to resist both estimation and temporal frame averaging attacks.
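As an illustration of the key schedule in (1), the following minimal sketch derives a per-frame key. The text does not specify how the variable key is obtained, so taking it as the index of the current block of v frames is an assumption made here for illustration only; the function name `frame_key` is likewise hypothetical.

```python
def frame_key(k_c: int, frame_index: int, v: int = 20) -> int:
    """Return k = k_c + k_v for one frame, following Eq. (1).

    Assumption: k_v is taken as the index of the block of v consecutive
    frames that contains frame_index. The paper only states that k_v
    changes every v frames, not how it is derived.
    """
    if v <= 0:
        raise ValueError("v must be positive")
    k_v = frame_index // v  # constant within each block of v frames
    return k_c + k_v


# Keys for the first 60 frames with a hypothetical fixed key of 1234.
keys = [frame_key(1234, f) for f in range(60)]
```

With v = 20, frames 0 to 19 share one key, frames 20 to 39 the next, and so on, so the same watermark is never embedded into more than v consecutive frames.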

At the detection stage, the watermark is not recoverable unless the size of the watermark generated at the embedding stage is known. We illustrate this with an example. Figure 1a shows the 2D NCC between two watermarks of sizes 60 × 120 and 55 × 115. Assume that the first watermark is embedded into the cover video and the second one is produced from the key and the size of the input video. As can be observed, there is no correlation between these two watermarks, and the detector is misled. The detector therefore fails when it does not know the size of the original video, because it must generate the watermark from the size of the manipulated video, which is not the correct size. A solution to this problem is suggested in the following.

Fig. 1 The effect of the size of the generated watermark on the correlation strength. (a) The traditional way of watermark generation. (b) The proposed method for watermark generation. The horizontal and vertical axes show the correlation index and the 2D normalized correlation, respectively

To solve this problem, the key \(k_{c}\) must change for every row (or column) while the watermark is generated. With this approach, generating watermarks of different sizes does not disturb the synchronization, and a smaller watermark is always a sub-matrix of a larger one. Figure 1b shows the 2D NCC between two watermarks of sizes 60 × 120 and 55 × 115 generated in this way. As can be observed, the detector readily detects the correlation between the two watermarks; hence, the size of the original video is not required at the receiver side, and the watermark generation key suffices for detection.
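The sub-matrix property of row-wise keying can be checked with a short sketch. How the row index is combined with the key is not given in the text; the mixing rule below, the generator (Python's `random.Random`) and the function name `generate_watermark` are assumptions chosen only to make the idea concrete.

```python
import random


def generate_watermark(k: int, rows: int, cols: int) -> list[list[int]]:
    """Generate a rows x cols watermark of -1/+1 values, one key per row."""
    watermark = []
    for r in range(rows):
        # Hypothetical row key: the paper only says the key changes per row.
        rng = random.Random(k * 1_000_003 + r)
        # Values are drawn left to right, so a shorter row is a prefix
        # of a longer row generated with the same row key.
        watermark.append([1 if rng.random() < 0.5 else -1 for _ in range(cols)])
    return watermark


large = generate_watermark(42, 60, 120)
small = generate_watermark(42, 55, 115)
is_submatrix = all(small[r] == large[r][:115] for r in range(55))
```

Because each row restarts its generator from a key that depends only on the row index, the 55 × 115 watermark coincides with the top-left block of the 60 × 120 one, which is the behaviour illustrated in Fig. 1b.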

Due to the redundancy of the DTCWT, some information of the pseudo-random watermark lies in the null space and would be lost during the inverse transform. To avoid this, the watermark is added in the transform domain. The last issue is the watermark size. Since the watermark is a 2D array, its size is defined by its numbers of rows and columns. The horizontal and vertical dimensions of the watermark are chosen as one eighth of those of the U frame. The watermark embedding procedure is described in detail in the following section.

2.2 Watermark embedding

Two important issues must be considered when designing a watermark embedding algorithm: the embedded watermark must be robust against various attacks, and it should not cause severe degradation of the video quality. The low frequency coefficients of a video frame are robust against compression and geometric attacks, but the HVS is more sensitive to changes in them [9]. Deciding where to add the watermark is therefore a trade-off between imperceptibility and robustness against attacks. In the proposed method, low frequency coefficients are used to increase the watermark robustness, while the U frame of the YUV representation is chosen to improve the imperceptibility. The YUV model of a video frame is derived from its RGB model as follows:

$$ \left[\begin{array}{l} Y\\ U\\ V \end{array}\right] =\left[\begin{array}{ccc} 0.2989 & 0.5866 & 0.1145\\ -0.1688 & -0.3312 & 0.5\\ 0.5 & -0.4184 & -0.0816 \end{array}\right] \times \left[\begin{array}{l} R\\ G\\ B \end{array}\right] $$
(2)
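For illustration, the conversion in (2) can be written per pixel as in the sketch below. The function name `rgb_to_yuv` is hypothetical, and no offset is added to the chrominance channels because (2) does not include one.

```python
# Conversion matrix copied from Eq. (2); rows give Y, U and V.
RGB_TO_YUV = (
    (0.2989, 0.5866, 0.1145),
    (-0.1688, -0.3312, 0.5),
    (0.5, -0.4184, -0.0816),
)


def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Apply the 3x3 matrix of Eq. (2) to a single RGB pixel."""
    rgb = (r, g, b)
    y, u, v = (sum(c * x for c, x in zip(row, rgb)) for row in RGB_TO_YUV)
    return y, u, v


# A neutral (gray or white) pixel carries no chrominance.
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
```

Each chrominance row of the matrix sums to zero, so for any pixel with R = G = B the U and V components vanish up to rounding.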

To increase the robustness of our method, the third level coefficients of a 3-level DTCWT (low frequency coefficients) are used for watermark embedding. Different regions of a frame have different frequency characteristics. Therefore, to improve the imperceptibility, a mask should be designed to adjust the watermark power in different regions. This mask must increase the watermark power in regions with higher frequency components. Since the mask must be perfectly recoverable at the detection stage, it is extracted from the first level coefficients of the Y frame. The Y channel is selected because it changes negligibly before the detection stage [9], and the first level coefficients are selected because they improve the imperceptibility. Before discussing this selection further, we explain how the mask is constructed.

As noted before, every DTCWT level contains six high frequency sub-bands. Only the first and sixth sub-bands are used for embedding in the proposed method; two sub-bands are selected instead of all six to improve the watermark imperceptibility. The coefficients of the first and sixth sub-bands of the first level of the transformed Y frame are used for the mask generation, and they are combined as follows to produce the mask:

$$ \hat{M}_{y} = \left\lceil\frac{|Y_{1,1}^{H}| + |Y_{1,6}^{H}|}{2}\right\rceil $$
(3)

where \(\hat {M}_{y}\) is the mask (which depends on the Y frame) and \(|Y_{1,i}^{H}|\) is the magnitude of the i-th sub-band of the first level of the transformed Y frame. To prevent the watermark power from increasing excessively, the generated mask is constrained as follows:

$$ M_{y} = \left\{\begin{array}{l} \hat{M}_{y} \quad \hat{M}_{y} \le \beta\\ \\ \beta \quad \hat{M}_{y} > \beta \end{array}\right. $$
(4)

In this way, excessive growth of the watermark power in some regions, which would harm the imperceptibility, is prevented. The input video is assumed to be in YUV 4:2:0 format. In this format, the horizontal and vertical dimensions of the U frame are one half of those of the Y frame; hence, the generated mask has the same size as the U frame. We now discuss the selection of the first level coefficients for the mask generation. If the mask were constructed from the coefficients of a coarser level, it would have to be upsampled to the size of the U frame. With such an expansion, however, the effect of every mask element leaks into the neighboring elements. For example, the mask normally has higher values at the edges, so the watermark power increases in those regions. Enlarging the mask therefore spreads these high values, to which the watermark power is proportional, around the edges and leaves an undesirable artifact in those regions. This issue will be illustrated through an example in Section 4.
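The mask construction in (3) and (4) can be sketched as follows. The two sub-bands are assumed to be given as equally sized 2D lists of complex coefficients (the DTCWT itself is not computed here), and the function name `build_mask` and the toy values are illustrative only.

```python
import math


def build_mask(band1, band6, beta=20):
    """Combine two complex sub-bands into a clipped mask, Eqs. (3) and (4).

    band1, band6: 2D lists of complex coefficients, standing in for the
    first and sixth first-level sub-bands of the transformed Y frame.
    """
    return [
        # Eq. (3): ceiling of the mean magnitude; Eq. (4): clip at beta.
        [min(math.ceil((abs(a) + abs(b)) / 2), beta) for a, b in zip(row1, row6)]
        for row1, row6 in zip(band1, band6)
    ]


# Toy coefficients, not taken from any real frame.
band1 = [[3 + 4j, 0j], [30 + 40j, 1 + 0j]]
band6 = [[0j, 0j], [30 + 40j, 0j]]
mask = build_mask(band1, band6)  # [[3, 0], [20, 1]]
```

Note that smooth regions can yield mask entries equal to zero; the text does not state how such entries are treated in the element-wise division of (6), so that case is left open here.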

Once the mask is generated, the watermark must be added to the first and sixth sub-bands of the third level of the transformed U frame. In the proposed method, this is performed equivalently in the spatial domain rather than directly in the transform domain, in order to speed up the watermarking process. More precisely, inserting the watermark directly into the third level sub-bands would require a 3-level DTCWT of the U frame, which is avoided because of its computational cost.

To prepare the watermark, a one-level transform is first applied to the watermark W. The watermark \(W^{\prime }\) is then constructed such that the first and sixth sub-bands of its third level transform are equal to those of W, while all other sub-bands at all levels are set to zero. To this end, a 3-level DTCWT is first applied to a zero-valued 2D array of the same size as the U frame. Then the first and sixth sub-bands of its third level are set to those of W, and the inverse transform is applied. Finally, the watermark is inserted into the U frame in the spatial domain according to:

$$ \hat{U} = U + \alpha M_{y} . W^{\prime} $$
(5)

where \( M_{y} . W^{\prime }\) denotes element-by-element multiplication and α determines the watermark power. More precisely, α and β control the trade-off between the robustness and the imperceptibility of the watermark. Consequently, their proper adjustment is critical for the system performance. The block diagram of the embedding method is shown in Fig. 2.
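The spatial-domain update in (5) amounts to one multiply-add per pixel, as the sketch below shows. The arrays are plain 2D lists with toy values; in particular, the spatial-domain watermark is assumed to have been prepared beforehand, and the function name `embed` is hypothetical.

```python
def embed(u_frame, mask, w_prime, alpha=3.5):
    """Element-wise embedding of Eq. (5): U_hat = U + alpha * mask * W'."""
    return [
        [u + alpha * m * w for u, m, w in zip(u_row, m_row, w_row)]
        for u_row, m_row, w_row in zip(u_frame, mask, w_prime)
    ]


# Toy 2x2 example; values are illustrative, not taken from a real frame.
u_frame = [[100.0, 120.0], [90.0, 110.0]]
mask = [[3, 0], [20, 1]]
w_prime = [[0.5, -0.5], [-0.25, 1.0]]
u_hat = embed(u_frame, mask, w_prime)  # [[105.25, 120.0], [72.5, 113.5]]
```

Pixels where the mask is zero are left unchanged, while the largest change occurs where the mask reaches its upper bound β.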

Fig. 2 The block diagram of the proposed watermark embedding algorithm

2.2.1 Parameter adjustment

Imperceptibility is one of the most important factors in watermarking system design. An embedded watermark that is perceivable by the HVS is not suitable for practical applications. The peak signal to noise ratio (PSNR) is a common metric for evaluating the visual quality of videos. However, it often fails to reflect the perceptual quality properly because of the non-linear behavior of the HVS. Therefore, standards such as JPEG and MPEG rely on subjective tests to better evaluate different algorithms.
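For reference, the PSNR mentioned above can be computed as in the following sketch, which uses the standard definition based on the mean squared error. The peak value of 255 assumes 8-bit samples, which the text does not state explicitly; the function name `psnr` is illustrative.

```python
import math


def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equally sized 2D lists of sample values."""
    squared_error = 0.0
    count = 0
    for row_o, row_d in zip(original, distorted):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


# Every sample off by one grey level gives an MSE of 1.
value = psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]])  # about 48.13 dB
```

Since the metric depends only on the mean squared error, two distortions with equal PSNR can still differ in how visible they are.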

In view of the above discussion, we also adopt subjective tests to adjust the α and β parameters of our watermarking system. To this end, the following experiment is conducted. In each test, the watermarked and original videos, each one minute long, are played simultaneously for the subject at 30 frames per second (fps). The starting point of the video is chosen randomly. The subject then rates the perceived quality of the watched videos on a scale from one to five, where one and five represent the worst and best qualities, respectively. This experiment is repeated for 10 different videos.

Finding the best values for α and β requires extensive experiments. Considering the high correlation between these two parameters, β is fixed at 20 to avoid a large number of experiments, and only α is derived subjectively. To this end, each video is watermarked with all α values from 0.5 to 5 in steps of 0.5; thus every subject makes 100 different comparisons (10 videos, each watermarked with 10 different α values). In each experiment, the original video and the α value of the watermarked video are chosen randomly so that all 100 experiments are performed fairly. The mean opinion score (MOS) of this experiment is shown in Fig. 3.

Fig. 3 Perceptual quality of the watermarked videos in terms of mean opinion score for different watermark strengths (α)

As expected, the subjects were unable to distinguish between the original and watermarked videos for low α values, while in some cases they correctly detected the quality loss of the watermarked video for α values exceeding 3.5. This is because, in videos with more low frequency components, increasing α decreases the watermark imperceptibility rapidly. Therefore, we set α = 3.5 and β = 20 hereafter.

2.3 Watermark detection

The inverse of the watermark embedding process must be carried out for watermark detection. As the first step, the desired watermark is calculated. The watermark is then detected using the correlation between the extracted and desired watermarks. To this end, the mask \(M_{y^{\prime }}\) is extracted from the Y channel of the received video (denoted by \(Y^{\prime }\)) in the same manner as in the watermark embedding process. Then the received U frame (denoted by \(U^{\prime }\)) is divided element by element by the mask \(M_{y^{\prime }}\), yielding \(U^{\prime \prime }\). This reduces the effect of the U frame content on the detection process. Then the first and sixth sub-bands of the third level of a 3-level DTCWT applied to \(U^{\prime \prime }\) are extracted. They are denoted by \(W_{1}^{\prime }\) and \(W_{6}^{\prime }\), respectively.

To detect the embedded watermark, \(W_{1}^{\prime }\) and \(W_{6}^{\prime }\) must be compared to the watermark expected for the key k. To this end, the watermark W is constructed according to the size of \(U^{\prime }\), as described in the previous section. Then the first and sixth sub-bands of W are extracted using a one-level DTCWT and are called \(W_{1}^{\prime \prime }\) and \(W_{6}^{\prime \prime }\), respectively. Finally, it suffices to calculate the correlation between the extracted sub-bands \(W_{1}^{\prime }\), \(W_{6}^{\prime }\) and the expected sub-bands \(W_{1}^{\prime \prime }\), \(W_{6}^{\prime \prime }\).

Now the detection steps are described mathematically. After extracting the received Y and U frames (\(Y^{\prime }\) and \(U^{\prime }\)) and generating the mask \(M_{y^{\prime }}\), the matrix \(U^{\prime \prime }\) is constructed as below:

$$ U^{\prime\prime} = U^{\prime}./M_{y^{\prime}} $$
(6)

where ./ represents element-wise division. Then the correlation between the desired and extracted watermarks is evaluated with the 2D NCC criterion as follows:

$$ NC_{f} = \frac{1}{2} \times \left( W^{\prime}_{1} * W^{\prime\prime}_{1} + W^{\prime}_{6} * W^{\prime\prime}_{6}\right) $$
(7)

where * stands for 2D NCC calculated as below:

$$ W*V = \frac{||W.V||}{{||W|| \times ||V||}} $$
(8)

The subscript f in \(NC_{f}\) indicates that the correlation is computed for each frame. To reduce the effect of spurious correlation values in non-watermarked frames, \(NC_{f}\) is averaged over t consecutive frames, which decreases the false positive rate. The final correlation value is therefore:

$$ NC = \frac{1}{t} \sum\limits_{f = 1}^{t} NC_{f} $$
(9)
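The correlation steps in (7) to (9) can be sketched as below. Two simplifying assumptions are made for illustration: the sub-bands are treated as real-valued 2D lists, and the 2D NCC is implemented as the signed inner product divided by the product of the norms, which is the reading that matches a cosine of the angle between the two arrays. The function names are hypothetical.

```python
import math


def ncc(w, v):
    """Normalized correlation between two equally sized real 2D lists."""
    dot = sum(a * b for row_w, row_v in zip(w, v) for a, b in zip(row_w, row_v))
    norm_w = math.sqrt(sum(a * a for row in w for a in row))
    norm_v = math.sqrt(sum(b * b for row in v for b in row))
    if norm_w == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero input
    return dot / (norm_w * norm_v)


def frame_score(w1_ext, w6_ext, w1_ref, w6_ref):
    """Eq. (7): mean of the two sub-band correlations for one frame."""
    return 0.5 * (ncc(w1_ext, w1_ref) + ncc(w6_ext, w6_ref))


def video_score(frame_scores):
    """Eq. (9): average of the per-frame scores over t frames."""
    return sum(frame_scores) / len(frame_scores)


a = [[1, -1], [1, 1]]
b = [[1, 1], [-1, 1]]  # orthogonal to a
score = frame_score(a, a, a, b)  # 0.5 * (1 + 0) = 0.5
```

Averaging over frames leaves the score of a consistently watermarked video roughly unchanged, while scores of unrelated frames tend to cancel.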

NC is then compared to a threshold. The threshold is set such that the probability of false detection (\(p_{fd}\)) lies below \(10^{-6}\). The detection threshold is calculated using the model described in [24]. The watermark detection system is illustrated in Fig. 4. The detection value calculator is the 2D NCC in our problem. The NCC can also be written as:

$$ W*V = \frac{||W.V||}{{||W|| \times ||V||}} = \cos(\gamma) $$
(10)

where γ is the angle between W and V. More precisely, since W and V are two-dimensional, γ is the angle between the normal vectors of the W and V planes. The threshold value can be determined with respect to the angle γ. All possible watermarks lie on a hyper-circle. The ratio of the area of the hyper-circle located inside the detection region to the entire hyper-circle area determines the false alarm probability, as shown in Fig. 5. This ratio is calculated as follows [24]:

$$ p_{fd} = \frac{I_{n-2}(T)}{2\times I_{n-2}(\pi/2)} $$
(11)

where n is the number of watermark elements and

$$ I_{d}(\theta) = {\int}_{0}^{\theta} \sin^{d}(u)\, du $$
(12)

Fig. 4 The block diagram of the watermark detection system. Adapted from [24]

Fig. 5 Three-dimensional detection region. Adapted from [24]

In (11) and (12), T and d are the threshold value and the watermark dimensionality, respectively. Using these formulas, the threshold value is determined numerically to achieve \(p_{fd} = 10^{-6}\). The threshold values for three different video sizes are given in Table 1.
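A numerical evaluation of (11) and (12) might look like the following sketch. It treats T as an angle in radians (under (10), the corresponding correlation threshold would then be cos T, which is an interpretation rather than something stated in the text), approximates the integral with the composite Simpson rule, and finds T for a target probability by bisection. All function names and the step count are illustrative choices.

```python
import math


def sin_power_integral(d, theta, steps=2000):
    """Composite Simpson approximation of I_d(theta), Eq. (12)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = theta / steps
    total = math.sin(0.0) ** d + math.sin(theta) ** d
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(i * h) ** d
    return total * h / 3


def false_detection_probability(n, t_angle):
    """Eq. (11) with T interpreted as an angle (assumption)."""
    d = n - 2
    return sin_power_integral(d, t_angle) / (2 * sin_power_integral(d, math.pi / 2))


def threshold_angle(n, target, iterations=60):
    """Bisection for the angle T whose false detection probability is target."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if false_detection_probability(n, mid) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Sanity check in three dimensions, where I_1(T) = 1 - cos(T).
p = false_detection_probability(3, math.pi / 3)  # (1 - 0.5) / 2 = 0.25
```

The bisection relies on the probability in (11) increasing monotonically with T on the interval from 0 to π/2. For watermarks with many elements, the integrand is sharply peaked near π/2, so a finer step count may be needed than the default used here.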

Table 1 The threshold of the watermark detection system for each video size

3 Experimental results

To evaluate the proposed method, three video sets are used: 20 standard videos of resolution 352 × 288 (akiyo, bowing, bridge_close, bridge_far, bus, coastguard, crew, deadline, foreman, highway, husky, mad900, silent, students, suzie, waterfall, city, container, flower, and football(b)), 10 HD-quality videos of resolution 1920 × 1080 (blue_sky, crowd_run, dinner, ducks_take_off, factory, into_tree, old_town_cross, park_joy, pedestrian_area, riverbed) and 8 videos of resolution 1280 × 720 (vidyo4, vidyo3, vidyo1, Stockholm, sintel_trailer, shields, parkrun and mobcal). All of them contain 300 frames in 4:2:0 format recorded at 30 fps. The watermarking key changes once every 20 frames, that is, v = 20 in (1). The average correlation over all frames is used at the receiver for detection, i.e. t = 300. The performance of the proposed method is compared to [3] and [13]. Since the watermark power has a different effect in each method, the watermark power factor (α) is chosen such that the videos watermarked by all methods have the same PSNR. This choice can be better understood through a simple example. Figure 6 shows a single video frame watermarked by the proposed method at α = 3.5 and the same frame watermarked by the method proposed in [3] at α = 36 (the authors’ recommendation). As can be observed, the proposed method does not introduce any distortion perceivable by the HVS, while [3] with the recommended watermark power introduces perceivable distortion. Therefore, for a fairer comparison, the PSNR of the videos is used to adjust the α values.

Fig. 6 The effect of the watermark strength on the perceptual quality of video. (a) The method of [3] with the recommended watermark strength. (b) The proposed method with α = 3.5

3.1 Robustness against compression

The first experiment investigates the robustness of the proposed method against lossy compression. All three video sets mentioned above are tested in this experiment. Every video is watermarked 30 times and then compressed using an H.264/AVC encoder and decoder with the quantization parameter set to 30. All methods detect all the watermarked videos perfectly. Therefore, for a better comparison, the average and standard deviation of the normalized correlation for the watermarked videos are shown in Table 2.

Table 2 The robustness of three methods against compression attack

Although the methods perform very similarly on the smaller videos, the superiority of the proposed method at higher resolutions is evident. The standard deviation of its normalized correlation is also always smaller than that of the other methods, which shows that the proposed method is more robust against the compression attack. The receiver operating characteristic (ROC) curves of the three methods are plotted for a better comparison. Figure 7 illustrates that the proposed method achieves a higher detection rate for a given false alarm ratio; hence it is more robust to the compression attack.
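As a reminder of how an ROC curve is obtained from detector outputs, the sketch below sweeps a threshold over two sets of correlation scores. The scores are made-up toy numbers used only to exercise the code, not results from the experiments, and the function name `roc_points` is hypothetical.

```python
def roc_points(watermarked_scores, clean_scores, thresholds):
    """Return (false alarm rate, detection rate) pairs, one per threshold."""
    points = []
    for t in thresholds:
        # A video is declared watermarked when its score reaches the threshold.
        detection = sum(s >= t for s in watermarked_scores) / len(watermarked_scores)
        false_alarm = sum(s >= t for s in clean_scores) / len(clean_scores)
        points.append((false_alarm, detection))
    return points


# Toy scores for illustration only.
watermarked = [0.9, 0.7, 0.6, 0.2]
clean = [0.3, 0.1, 0.05, 0.0]
points = roc_points(watermarked, clean, [0.0, 0.25, 0.5, 1.0])
# [(1.0, 1.0), (0.25, 0.75), (0.0, 0.75), (0.0, 0.0)]
```

The false negative rate at a given threshold is one minus the detection rate, so a curve that lies higher corresponds to fewer missed detections at the same false alarm ratio.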

Fig. 7 ROC curve for the compression attack

3.2 Robustness against scaling

The robustness of the proposed method against rescaling of the watermarked video is investigated in the next experiment. To this end, the two video sets of sizes 1920 × 1080 and 1280 × 720 are used. Videos of size 1920 × 1080 are rescaled to 1280 × 720 and 352 × 288 after being watermarked and compressed. Furthermore, the 1280 × 720 videos are rescaled to 352 × 288 after compression. The average and standard deviation of the normalized correlation for all methods are reported in Table 3.

Table 3 The robustness of the proposed method against scaling

Table 3 indicates that, for all attacks, the normalized correlation of the proposed method lies above that of [3] and [13]. The ROC curves are presented in Fig. 8 for a better comparison of the three methods; the curve of the proposed method always lies above those of the other methods.

Fig. 8 ROC curves for the scaling attack. (a) 1920 × 1080 → 1280 × 720. (b) 1920 × 1080 → 352 × 288. (c) 1280 × 720 → 352 × 288

3.3 Robustness against rotation

To investigate the robustness of the proposed method against rotation, every video is rotated by one to five degrees after being watermarked. This attack can be carried out in two scenarios. In the first scenario, the rotated video is cropped and scaled back to the original size. In the second scenario, it is not cropped, so the video size increases and less information is lost. The first scenario is used here to examine the robustness against the rotation attack; that is, the video is cropped and scaled back to the original size after being rotated. The results are given in Table 4.

Table 4 The robustness of three methods against the rotation attack

It can be inferred from Table 4 that the proposed method reaches higher correlation peaks than the other methods in all rotation cases. The ROC curves for three of the five rotation cases are given in Fig. 9, where the proposed method performs better than [3] and [13]. This advantage becomes more apparent as the rotation angle increases.

Fig. 9 ROC curves for the rotation attack. (a), (b), and (c) correspond to 3, 4, and 5 degrees of rotation, respectively

3.4 Combined attacks

In this section, the robustness against combined attacks is investigated. To this end, videos of size 1920 × 1080 and 1280 × 720 are used. The videos are rescaled after being watermarked and are then rotated by three degrees in the same manner as in the previous section. They are cropped from both sides by 1, 3, 5, 7, 10, 15, and 20 percent of the original size. Finally, the videos are compressed as described in Section 3.1. The results of this experiment in terms of the false negative rate (FNR) are given in Table 5. The FNR values are given in percent.

Table 5 The robustness of the methods against combined attacks which includes rotation, scaling, cropping, and compression in terms of False Negative Rate

3.5 Computational complexity

To compare the embedding complexity of the methods, all standard videos are watermarked three times and the watermarking speed is recorded in fps. The results are obtained with MATLAB 2017a on a system with an Intel Core i7 960 processor at 3.2 GHz and 16 GB of memory, and are reported in Table 6 by the resolution of the watermarked video. The values in the table show the number of frames watermarked by the corresponding algorithm in one second. The results show that the proposed method is much faster than [3]. This computational gain is achieved by shifting the watermark addition from the transform domain to the spatial domain, which removes the need to apply the forward and inverse transforms.

Table 6 Comparison of methods in terms of watermarking computational complexity

4 Conclusion

In this paper, a video watermarking algorithm is proposed to tackle the problem of video piracy. The proposed method uses the low frequency components of the U channel of the video frames for watermark embedding. To embed the watermark, the coefficients of the first and sixth sub-bands of the DTCWT are employed. The proposed method is evaluated under different kinds of attacks. Experimental results show that it is highly robust against compression, scaling, rotation, and their combination. To ensure the high perceptual quality of the watermarked videos, the watermark power is chosen using a subjective test. Also, thanks to the linearity of the DTCWT, the embedding mask is designed to be applied in the spatial domain. This significantly reduces the processing time of the watermark embedding system.