
1 Introduction

Video watermarking is a technique for protecting digital video data from piracy. As illegal distribution of copyrighted digital video keeps growing, video watermarking attracts increasing attention within the information security community. Over the last decade, various watermarking techniques have been introduced for copyright protection and data authentication. Based on the domain where the watermark information is embedded, these techniques can be divided into three main classes: compressed, spatial, and transform domain [1].

Among these three categories, transform-domain algorithms are widely used because of their robustness against various attacks. The most commonly used transforms are the singular value decomposition (SVD), discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete wavelet transform (DWT), and dual-tree complex wavelet transform (DT CWT) [1]. In [2], Huan et al. introduced an algorithm applying SVD in the DT CWT domain. In [3], Bhaskar et al. proposed a robust video watermarking scheme based on the squirrel search algorithm. Lacking a rotational invariance guaranteed by mathematical principles, these methods are not robust to rotation attacks with large angles, a property that Zernike moments possess. In the image watermarking field, methods based on Zernike moments have been widely used, for example, in [4,5,6]. In [7], the author proposed a video watermarking algorithm based on Zernike moments; however, that algorithm resists only rotation attacks, not scaling attacks. Therefore, although prior contributions have advanced the development of robust watermarking techniques, resistance to geometric attacks remains a challenging problem in the video watermarking community and needs further research.

In this paper, we propose a robust video watermarking algorithm based on normalized Zernike moments to resist geometric distortions. Since different video compression algorithms are used on the Internet, the proposed algorithm is designed in the uncompressed domain so that it suits any video compression standard. Because their geometric invariance is proved by mathematical principles, normalized Zernike moments are employed in our method as invariant features. For watermark embedding and extraction, Dither Modulation-Quantization Index Modulation (DM-QIM) is employed, using dither vectors to modulate the Zernike moments into different clusters and thereby make an adequate trade-off between robustness and distortion [8]. To achieve high visual quality, we embed the watermark information into the U channel of the cover video sequence, because distortion in luminance is more noticeable to the human visual system than distortion in chrominance [9]. The experimental results show that our approach maintains good visual quality and achieves strong robustness to high-intensity geometric attacks compared with the prior work.

The remainder of this paper is organized as follows. The preliminary knowledge related to the scheme is discussed in Sect. 2. In Sect. 3, we introduce the proposed video watermarking approach in detail. In Sect. 4, experiments evaluating the imperceptibility and robustness of the proposed scheme are conducted. Finally, conclusions and future work are presented in Sect. 5.

2 Preliminaries

In this section, we describe the preliminary knowledge behind the proposed algorithm, separated into four parts. In each part, we present the main content and explain why it is used.

2.1 Geometric Attacks

When watermarked videos are available online, various content-preserving attacks may be applied, which inevitably reduce the energy of the watermark inside the transmitted videos [6]. Among these distortions, geometric attacks are relatively challenging, since even a slight geometric deformation often causes watermark detection to fail. In this paper, for practical applications, we mainly discuss the most common geometric attacks: rotation and scaling.

A geometric attack is defined by a set of parameters that determines the operation performed on the target document; for example, a scaling attack can be characterized by the scaling ratio applied to the sampling grid, and rotation attacks can be characterized similarly. These common geometric attacks cause two typical distortions. One is the shifting of pixels in the spatial plane; the other is the alteration of pixel values due to interpolation [10]. Hence, withstanding an arbitrary displacement of all or some pixels by a random amount is the main concern for resistance to geometric deformations.

2.2 Zernike Moments

In our method, we embed data in normalized Zernike moments, a modification of Zernike moments whose geometric invariance is proved by mathematical principles. Therefore, we first introduce Zernike moments in this part.

Zernike moments are orthogonal moments based on the Zernike polynomials, which form a complete orthogonal set over the interior of the unit circle [11]. These polynomials are given by the following equation:

$$\begin{aligned} V_{nm}(x,y)=R_{nm}(\rho )e^{jm\theta } \end{aligned}$$
(1)

where \(x,y\) denote the pixel position, \(\rho =\sqrt{x^{2}+y^{2}}\), and \(\theta =\tan^{-1}(y/x)\). \(n\) is a non-negative integer representing the order, and \(m\) is the repetition, an integer such that \(n-\left| m \right| \) is non-negative and even. \(R_{nm}(\rho )\) are the radial Zernike polynomials, given by the equation below:

$$\begin{aligned} R_{nm}(\rho )=\sum _{s=0}^{(n-\left| m \right| )/2}\frac{(-1)^{s}\left[ (n-s)! \right] \rho ^{n-2s} }{s!\left( \frac{n+\left| m \right| }{2}-s \right) !\left( \frac{n-\left| m \right| }{2}-s \right) !} \end{aligned}$$
(2)
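As a concrete illustration, Eq. (2) can be evaluated directly with factorials. The following is a minimal Python sketch; the function name `radial_poly` is our own, not from the paper:

```python
from math import factorial

def radial_poly(n, m, rho):
    """Radial Zernike polynomial R_nm(rho) per Eq. (2)."""
    m = abs(m)
    # valid only when n - |m| is non-negative and even
    assert n >= m and (n - m) % 2 == 0
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total
```

For example, \(R_{00}(\rho)=1\), \(R_{11}(\rho)=\rho\), and \(R_{20}(\rho)=2\rho^{2}-1\), which the sketch reproduces.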

After computing the Zernike polynomials in Eq. (1), we obtain the Zernike moment of order \(n\) with repetition \(m\) for a continuous image function:

$$\begin{aligned} A_{nm}=\frac{n+1}{\pi }\int \int _{x^{2}+y^{2}\le 1}f(x,y)V^{*}_{nm}(\rho ,\theta )dxdy \end{aligned}$$
(3)

where \(V_{nm}\) represents the Zernike polynomial, and \(*\) denotes complex conjugate.

For digital signals, the integrals are replaced by summations. Since the Zernike polynomials form a complete set over the interior of the unit circle, each frame can be reconstructed from its moments. Using the properties of the Zernike polynomial set discussed above, the frame image \(f(x,y)\) can be reconstructed as \(\hat{f}(x,y)\) in Eq. (4).

$$\begin{aligned} \hat{f}(x,y)=\sum _{n=0}^{N}\sum _{m}{A}_{nm}{V}_{nm}(\rho ,\theta ) \end{aligned}$$
(4)

where \(A_{nm}\) represents the Zernike moment of order \(n\) with repetition \(m\). A larger \(N\) yields a more accurate reconstruction.
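The discrete counterpart of Eq. (3) replaces the integral by a sum over pixels mapped into the unit disk. Below is a hedged numpy sketch; the grid mapping and helper names are our own choices, not prescribed by the paper:

```python
import numpy as np
from math import factorial

def radial(n, m, rho):
    # radial polynomial R_nm from Eq. (2); rho may be a scalar or an array
    m = abs(m)
    return sum((-1) ** s * factorial(n - s)
               / (factorial(s)
                  * factorial((n + m) // 2 - s)
                  * factorial((n - m) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - m) // 2 + 1))

def zernike_moment(img, n, m):
    """Discrete A_nm of a square image: Eq. (3) with sums for integrals."""
    N = img.shape[0]
    coords = (2 * np.arange(N) - N + 1) / N   # pixel centers mapped into [-1, 1]
    X, Y = np.meshgrid(coords, coords)
    rho = np.hypot(X, Y)
    theta = np.arctan2(Y, X)
    inside = rho <= 1.0                        # unit-disk support
    V = radial(n, m, rho) * np.exp(1j * m * theta)
    dA = (2.0 / N) ** 2                        # area of one pixel on the grid
    return (n + 1) / np.pi * np.sum(img[inside] * np.conj(V[inside])) * dA
```

For a constant image \(f\equiv 1\), the sketch gives \(A_{00}\approx 1\) and \(A_{11}\approx 0\), consistent with the orthogonality of the basis.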

2.3 Invariant Properties of Normalized Zernike Moment

By the mathematical definition of Zernike moments, their amplitudes can be used as rotation-invariant features. By additionally applying a normalization technique, the normalized Zernike moments become invariant to both rotation and scaling attacks. The derivation is addressed in detail as follows.

Rotation Invariance. From Eq. (3), \(A_{nm}\) can be written as \(A_{nm}=\left| A_{nm} \right| e^{j\varphi _{nm}}\), where \(\varphi _{nm}\) is its phase. After rotating a frame image clockwise by angle \(\alpha \), the relationship between the original and rotated frames in the same polar coordinates becomes

$$\begin{aligned} A'_{nm}=A_{nm}e^{-jm\alpha } \end{aligned}$$
(5)

which means that after rotation the amplitude of the Zernike moment remains the same. As a result, the amplitude can be used as a rotation-invariant feature of each frame.

Scaling Invariance. After scaling an image, interpolation alters the content inside the unit circle relative to the original, which means that Zernike moments alone are not robust to scaling deformations. The geometric moments \(m_{pq}\) of an \(M\times N\) image \(f(x,y)\), used below, are defined as

$$\begin{aligned} m_{pq}=\sum _{x=0}^{M-1}\sum _{y=0}^{N-1}x^{p}y^{q}f(x,y) \end{aligned}$$
(6)

To achieve scaling invariance, we normalize each frame as in [12] before computing the Zernike moments. The normalization consists of the following four steps:

Step 1) Center the image by transforming \(f(x,y)\) to \(f_{1}(x,y) = f(x-\bar{x},y-\bar{y})\), where \((\bar{x},\bar{y})\) is the centroid of \(f(x,y)\), calculated as follows.

$$\begin{aligned} \bar{x}=\frac{m_{10}}{m_{00}} , \bar{y}=\frac{m_{01}}{m_{00}} \end{aligned}$$
(7)

where \(m_{10}\), \(m_{01}\) and \(m_{00}\) are the geometric moments of \(f(x,y)\) as defined in Eq. (6).
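The centroid of Eq. (7) follows directly from the geometric moments of Eq. (6). A small numpy sketch (function name and index convention are our own):

```python
import numpy as np

def centroid(f):
    """Centroid (x_bar, y_bar) of image f via m10/m00 and m01/m00."""
    M, N = f.shape
    x = np.arange(M)[:, None]   # row index treated as x
    y = np.arange(N)[None, :]   # column index treated as y
    m00 = f.sum()               # zeroth-order moment (total mass)
    m10 = (x * f).sum()         # first-order moment in x
    m01 = (y * f).sum()         # first-order moment in y
    return m10 / m00, m01 / m00
```

For a single bright pixel, the centroid is that pixel's coordinates, as expected.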

Step 2) Apply a shearing transform from \(f_{1}(x,y)\) to \(f_{2}(x,y)\) in the \(x\) direction using Eq. (8) with \(A_{x}=\begin{pmatrix}1 &{} \beta \\ 0 &{}1 \end{pmatrix}\), choosing \(\beta \) so that the central moment \(\mu _{30}\) of the resulting image, defined in Eq. (9), is zero.

$$\begin{aligned} g(x,y)=A\cdot f(x,y) \end{aligned}$$
(8)

Step 3) Apply a shearing transform from \(f_{2}(x,y)\) to \(f_{3}(x,y)\) in the \(y\) direction with \(A_{y}=\begin{pmatrix}1 &{} 0 \\ \gamma &{}1 \end{pmatrix}\), choosing \(\gamma \) so that the central moment \(\mu _{11}\) of the resulting frame is zero.

$$\begin{aligned} \mu _{pq}=\sum _{x=0}^{M-1}\sum _{y=0}^{N-1}(x-\bar{x})^{p}(y-\bar{y})^{q}f(x,y) \end{aligned}$$
(9)

Step 4) Scale \(f_{3}(x,y)\) in both the \(x\) and \(y\) directions with \(A_{s}=\begin{pmatrix}\alpha &{} 0 \\ 0 &{}\delta \end{pmatrix}\) to a prescribed standard size, such that the result satisfies \(\mu _{50}>0\) and \(\mu _{05}>0\).

In [12], it is proved that an image and its affine transforms have the same normalized image. Consequently, the same conclusion holds when normalization is employed in video algorithms. As a result, after normalization, the amplitude of the Zernike moments is invariant to both rotation and scaling attacks.

2.4 Quantization Index Modulation

Quantization Index Modulation (QIM) [8] is an embedding operation for information hiding that achieves provably better rate-distortion-robustness trade-offs than spread-spectrum and low-bit modulation methods. In this paper, we use a modification of QIM, the Dither Modulation (DM)-QIM algorithm, whose basic theory is introduced in this subsection.

Embedding Procedure. Suppose \(f(n,m)\) is an image, where \(n \in \left[ 1,N \right] \) and \(m \in \left[ 1,M \right] \), and \(W(k)\), \(k\in \left[ 1,N\times M \right] \), is the watermark. Let \(d(k)\) be an array of uniformly distributed pseudo-random integers within [−\(\varDelta \)/2, \(\varDelta \)/2], generated according to a secret key. Dither vectors \(d_{0}(k)\) and \(d_{1}(k)\) are used for embedding the ‘0’ and ‘1’ bits of the watermark, respectively. For simplicity, we combine the two vectors into \(d_{W(k)}(k)\).

$$\begin{aligned} d_{0}(k)=d(k) \end{aligned}$$
(10)
$$\begin{aligned} d_{1}(k)=d_{0}(k)-sign(d_{0}(k))\times \frac{\varDelta }{2} \end{aligned}$$
(11)

where \(sign(x)\) returns the sign of \(x\). Let \(f^{w}(n,m)\) denote the watermarked image and \(\varDelta \) the quantization step, which is the most important parameter of QIM. The watermark embedding operation is performed in Eq. (12).

$$\begin{aligned} f^{w}(n,m)=Q(f(n,m)+d_{W(k)}(k),\varDelta )-d_{W(k)}(k) \end{aligned}$$
(12)

where \(Q(x,y)\) is defined below, and \(round(x)\) returns the nearest integer of \(x\).

$$\begin{aligned} Q(x,\varDelta )=\varDelta \times round(\frac{x}{\varDelta }) \end{aligned}$$
(13)
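The embedding rule of Eqs. (10)–(13) can be sketched in a few lines of Python. Note that Eq. (11) degenerates when \(d_{0}=0\) (since \(sign(0)=0\)); our sketch follows the experimental setting of Sect. 4.1, where \(d_{0}=0\) and \(d_{1}=\varDelta /2\). Function names are ours:

```python
def quantize(x, delta):
    # Q(x, Delta) = Delta * round(x / Delta), Eq. (13)
    return delta * round(x / delta)

def dither_pair(d0, delta):
    # Eq. (10)-(11); when d0 == 0 we take d1 = delta/2 (our assumption,
    # matching the experimental setting d0 = 0, d1 = Delta/2)
    d1 = d0 + delta / 2 if d0 == 0 else d0 - (1 if d0 > 0 else -1) * delta / 2
    return d0, d1

def dm_qim_embed(value, bit, d0, delta):
    # Eq. (12): quantize the dithered value, then remove the dither
    d = dither_pair(d0, delta)[bit]
    return quantize(value + d, delta) - d
```

With \(\varDelta =30000\) and \(d_{0}=0\), embedding bit 0 into the value 12345 moves it onto the lattice point 0, while embedding bit 1 moves it onto 15000.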

Extraction Procedure. To extract the watermark data, we substitute the watermark bits ‘0’ and ‘1’ into Eq. (12), using the received watermarked frame as input instead of the original one, and then estimate the errors between the received frame and the results so obtained. The bit yielding the lower error is taken as the extracted watermark bit. The extraction procedure is summarized below.

$$\begin{aligned} g^{W(k)}(n,m)=Q(\tilde{f}^{w}(n,m)+d_{W(k)}(k),\varDelta )-d_{W(k)}(k) \end{aligned}$$
(14)
$$\begin{aligned} \tilde{W}(k)= argmin_{p\in [0,1]}\left| \tilde{f}^{w}(n,m)-g^{p}(n,m) \right| \end{aligned}$$
(15)

where \(\tilde{f}^{w}(n,m)\) denotes the received frame, and \(g^{W(k)}(n,m)\) is used to calculate the watermark value in Eq. (15). \(d_{W(k)}(k)\) is the dither vector used for embedding the watermark bit, and \(\varDelta \) is the quantization step; both must be the same as in the embedding procedure. \(argmin_{p}(x)\) returns the value of \(p\) that minimizes \(x\).
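Extraction per Eqs. (14)–(15) re-quantizes the received value with each dither and picks the bit with the smaller error. A self-contained round-trip sketch (repeating the embedding helpers; as before, \(d_{1}=\varDelta /2\) when \(d_{0}=0\) is our assumption):

```python
def quantize(x, delta):
    return delta * round(x / delta)          # Eq. (13)

def dither_pair(d0, delta):
    # dithers for bits 0 and 1; d0 == 0 handled as d1 = delta/2 (assumption)
    d1 = d0 + delta / 2 if d0 == 0 else d0 - (1 if d0 > 0 else -1) * delta / 2
    return d0, d1

def dm_qim_embed(value, bit, d0, delta):
    d = dither_pair(d0, delta)[bit]
    return quantize(value + d, delta) - d    # Eq. (12)

def dm_qim_extract(received, d0, delta):
    # Eq. (14): candidate re-quantizations; Eq. (15): pick the smaller error
    errors = []
    for d in dither_pair(d0, delta):
        g = quantize(received + d, delta) - d
        errors.append(abs(received - g))
    return 0 if errors[0] <= errors[1] else 1
```

An attack that perturbs the embedded value by less than \(\varDelta /4\) leaves the extracted bit unchanged, which is the robustness margin QIM trades against distortion.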

3 Proposed Method

In this section, we introduce the proposed video watermarking algorithm in terms of embedding and extraction, which are described separately below.

3.1 Watermark Embedding

The watermark embedding procedure is shown in Fig. 1, and the key steps of the block diagram are explained in the following subsections.

Fig. 1.
figure 1

Proposed robust watermarking framework

U Channel Extraction. In YUV format, Y represents the luminance channel and U, V are the two independent chrominance channels. As the human visual system is less sensitive to distortion in the chrominance channels than in the luminance channel [9], we extract the U channel of a YUV-represented video for watermark embedding to enhance imperceptibility in the proposed method.

The following equation shows how to generate YUV signals from RGB sources:

$$\begin{aligned} \begin{bmatrix} Y\\ U\\ V \end{bmatrix}=\begin{bmatrix} 0\\ 127\\ 127\end{bmatrix}+\begin{bmatrix} 0.2989 &{} 0.5866 &{} 0.1145\\ -0.1688&{}-0.3312&{}0.5000 \\ 0.5000&{}-0.4184&{}-0.0816 \end{bmatrix}\begin{bmatrix} R\\ G\\ B\end{bmatrix} \end{aligned}$$
(16)
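Eq. (16) is a plain affine transform per pixel, which can be vectorized over an array of RGB triples. A numpy sketch (names are ours):

```python
import numpy as np

# conversion matrix and offset taken from Eq. (16)
RGB2YUV = np.array([[ 0.2989,  0.5866,  0.1145],
                    [-0.1688, -0.3312,  0.5000],
                    [ 0.5000, -0.4184, -0.0816]])
OFFSET = np.array([0.0, 127.0, 127.0])

def rgb_to_yuv(rgb):
    """Convert an (..., 3) RGB array to YUV per Eq. (16)."""
    return np.asarray(rgb, dtype=float) @ RGB2YUV.T + OFFSET
```

Note that a neutral gray maps to U = V = 127, so a watermark embedded in U perturbs only chrominance.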

Adaptive Normalization. The adaptive normalization is almost identical to the procedure described in Sect. 2.3, except for Step 4. Experiments show that if a low-resolution video is normalized to a size much larger than the original, more distortion is produced.

To deal with this issue, the standard size \(M \times M\) mentioned in Sect. 2.3, Step 4 is set adaptively based on the video size. For example, if the U channel of the input video sequence has a resolution of \(176\times 144\), \(M\) can empirically be set to 256 for higher accuracy. \(M\) can be adjusted for other requirements.

Moments Selection. In [13], it is shown that Zernike moments with repetition \(m = 4j\), \(j\) an integer, deviate from orthogonality, meaning these moments cannot be computed accurately. In [14], \(\left| A_{00} \right| \) and \(\left| A_{11} \right| \) are shown to be independent of the image content; for that reason, they are not appropriate for watermark embedding. Given that Eq. (3) implies \(\left| A_{n,m}\right| \) = \(\left| A_{n,-m}\right| \), where \(\left| x\right| \) denotes the amplitude, we can dismiss the moments with negative repetition to avoid redundant embedding modifications. Accordingly, we remove all of these moments to maximize the applicability and performance of our algorithm.

Data Embedding. After selection, we embed the watermark data into the amplitudes of the selected moments using DM-QIM, which was discussed in Sect. 2.4. In this paper, the watermark data for each target frame contains 1 bit, and the embedding operation is described below.

$$\begin{aligned} \left| A_{n,m}^{w} \right| =Q(\left| A_{n,m} \right| +d_{w},\varDelta )-d_{w} \end{aligned}$$
(17)

where the superscript \(w\) indicates the value after embedding. \(Q(x,\varDelta )\) is the quantizer defined in Eq. (13), and \(w=0, 1\) represents the watermark bit. \(\varDelta \) is the quantization step, which is set based on the value of \(\left| A_{n,m} \right| \), and \(d_{w}\in [-\varDelta /2, \varDelta /2]\) is the dither vector used for embedding watermark bit \(w\).

Watermark Signal Reconstruction. The watermark signal \(I_{w}(x,y)\) is reconstructed using Eq. (4) and multiplied with a coefficient based on the amplitude of both the original and the watermarked moment, which is demonstrated below.

$$\begin{aligned} I_{w}(x,y)=\sum _{n}^{N_{max}}\sum _{m}(\frac{\left| A^{w}_{nm} \right| }{\left| A_{nm} \right| }-1)\times {A}_{nm}{V}_{nm}(\rho ,\theta ) \end{aligned}$$
(18)

where \(x, y\) denote the pixel position, and \(A_{nm}\) represents the Zernike moment of order \(n\) and repetition \(m\), while the superscript \(w\) indicates the value after embedding the watermark. \({V}_{nm}(\rho ,\theta )\) denotes the Zernike polynomial, with \(\rho =\sqrt{x^{2}+y^{2}}\) and \(\theta =\tan^{-1}(y/x)\).

Finally, the watermark signal is added to the target frame with a coefficient \(\alpha \) that controls the embedding strength of the watermark and ensures imperceptibility. We calculate \(\alpha \) with the following equation.

$$\begin{aligned} \alpha =\frac{\varTheta I(x,y)}{\varTheta I_{r}(x,y)} \end{aligned}$$
(19)

where \(x^{2}+y^{2}\le 1\) and \(\varTheta (x)\) returns the mean value of \(x\). \(I(x,y)\) is the original frame image restricted to the unit circle. \(I_{r}\), given in Eq. (20), is the frame reconstructed from all moments of the original frame, without the moment selection used for watermark embedding.

$$\begin{aligned} I_{r}(x,y)=\sum _{n}^{N_{max}}\sum _{m}{A}_{nm}{V}_{nm}(\rho ,\theta ) \end{aligned}$$
(20)

To ensure visual quality, we add the reconstruction to the original frame rather than replacing the frame, since the reconstruction is limited to the unit circle and its quality is far from satisfactory even at high orders. This is verified in Fig. 2, which takes a \(256\times 256\) image of ‘Lena’ as an example to illustrate the reconstruction results at different orders. Furthermore, the reconstruction phase is rather time-consuming; for instance, at order 30 it takes over 10 s to reconstruct a single image. Replacing the original frame with the reconstructed watermarked signal is therefore not a practical choice for data embedding.

Fig. 2.
figure 2

Reconstruction of ‘Lena’ with different orders

To sum up, the embedding procedure can be concluded as follows:

Step 1) Divide the input video into groups and select the target frames.

Step 2) Perform adaptive normalization for calculating Zernike moments.

Step 3) Calculate the Zernike moments from the normalized frame and select the appropriate ones as invariant features for watermark embedding.

Step 4) Compute the amplitudes of the selected moments and embed the same watermark bit into all of them using DM-QIM.

Step 5) Reconstruct the watermarked moments as the watermark signal and add it to the original frame with a coefficient \(\alpha \) defined in Eq. (19).

3.2 Watermark Extraction

In Fig. 3, the process of watermark extraction is shown, which is similar to the embedding procedure. After computing the Zernike moments of each frame and selecting the appropriate ones, we extract the watermark bit from each moment using the DM-QIM extraction step in Eqs. (14) and (15).

Majority Vote. To dismiss outliers among the watermark bits extracted from the selected moments of one frame, we choose the bit with the highest frequency as the extracted watermark bit of that frame for higher accuracy.
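The majority vote over the bits extracted from one frame is a one-liner. A hedged sketch (ties are resolved toward 0, an assumption the paper does not specify):

```python
def majority_vote(bits):
    """Return the most frequent bit in a list of extracted 0/1 bits."""
    ones = sum(bits)
    # strict majority of ones -> 1; ties default to 0 (our assumption)
    return 1 if 2 * ones > len(bits) else 0
```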

Fig. 3.
figure 3

Watermark extraction from Zernike Moments of the watermarked frame

The extraction procedure can be concluded in the following five steps:

Step 1) Divide the input video into groups and select the target frames.

Step 2) Perform adaptive normalization for calculating Zernike moments.

Step 3) Calculate the Zernike moments from the normalized frame and select the same moments used in watermark embedding procedure for extraction.

Step 4) Compute the amplitude of the selected moments and extract all the watermark bits using DM-QIM extraction method described in Sect. 2.4.

Step 5) Use majority voting to select the bit with the highest frequency as the final extracted watermark bit for each target frame.

4 Experimental Results and Analysis

In this section, to evaluate the effectiveness of the proposed method, we design experiments to analyze its imperceptibility and robustness against geometric attacks, comparing our scheme with the existing approach [2].

4.1 Experimental Setup

All experiments in this paper are implemented in Matlab R2016a on a PC with 8 GB RAM and a 2.3 GHz Intel Core i5 CPU, running 64-bit Windows 10. To evaluate our method fairly, we selected six standard video sequences in CIF format (\(352\times 288\)), i.e., Akiyo, Foreman, Hall, Mother and Daughter, Paris and Silent [15]; each test video contains 300 frames.

For simulation, we normalize the U-channel frame of each video to \(256\times 256\) and set the GOP (Group of Pictures) length to 6. The watermark length is 50 bits, generated pseudo-randomly using a key, so that each GOP carries one watermark bit. Based on the preliminary experiment in Sect. 4.2, we set the step length \(\varDelta \) to 30000, with \(d_{0}= 0\) and \(d_{1}= 15000\). For simplicity, we embed each watermark bit into the first frame of each GOP, which serves as the index frame. For a fair comparison, the GOP length in the prior work [2] is also set to 6, and the embedding strength T in [2] is set to the recommended value of 400.

4.2 Parameter Setting

In this subsection, we conduct an experiment to find the optimal setting of \(\varDelta \), the most important parameter in our scheme. To evaluate the accuracy of the extracted watermark, the Normalized Cross-Correlation (NCC) is used as the metric, given by Eq. (21).

$$\begin{aligned} NCC(X,Y)=\frac{Cov(X,Y)}{\sqrt{Var(X)\cdot Var(Y)}} \end{aligned}$$
(21)

where \(Cov(X,Y)\) denotes the covariance between images \(X\) and \(Y\), and \(Var(X)\) the variance of \(X\). The range of NCC is [−1,1], where ‘1’ means a complete match and ‘−1’ indicates that the two images are exact opposites.
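Eq. (21) is the Pearson correlation of the two pixel arrays. A numpy sketch (names ours):

```python
import numpy as np

def ncc(x, y):
    """Normalized cross-correlation of two equal-sized arrays, Eq. (21)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x = x - x.mean()                 # center both signals
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))
```

Identical arrays give NCC = 1; an array and its negation (about the mean) give NCC = −1.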

To evaluate different quantization step values, we use the six standard test video sequences mentioned in Sect. 4.1 with the other parameters unchanged. Figure 4 shows how the NCC value of our proposed method changes as the quantization step increases; the maximum NCC value is reached by setting \(\varDelta \) to 30000 or 40000. Since the accuracy of the extracted watermark increases with the NCC value, either can be chosen as the best quantization step. In the following discussions, we set \(\varDelta = 30000\) unless otherwise stated.

Fig. 4.
figure 4

NCC of the watermarked video for our scheme in different quantization steps

4.3 Imperceptibility

For practical applications, watermark imperceptibility is a very important requirement of a digital video watermarking algorithm. In this subsection, we adopt the peak signal-to-noise ratio (PSNR) to measure the visual quality of the watermarked video, defined below.

$$\begin{aligned} PSNR=10\cdot log_{10}\frac{max^{2}}{E}, E=\frac{1}{m\cdot n}\sum _{i=1}^{m}\sum _{j=1}^{n}\left| I(i,j)-K(i,j) \right| ^{2} \end{aligned}$$
(22)

where \(I(i,j)\) and \(K(i,j)\) represent two different images, and \(|x|\) is the absolute value of \(x\). \(m, n\) denote the height and width of each frame, and \(max\) is the upper limit of the pixel values in each frame.
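Eq. (22) translates directly into code. A numpy sketch for 8-bit frames, where \(max = 255\) (names ours):

```python
import numpy as np

def psnr(i, k, peak=255.0):
    """Peak signal-to-noise ratio between two frames, Eq. (22)."""
    e = np.mean((np.asarray(i, dtype=float) - np.asarray(k, dtype=float)) ** 2)
    if e == 0:
        return float('inf')          # identical frames
    return float(10.0 * np.log10(peak ** 2 / e))
```

For example, a mean squared error of 1 against a 255-level signal gives roughly 48.13 dB.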

The experimental results on the test video sequences are listed in Table 1. The PSNR of the watermarked videos in this paper is around 37 dB, while the counterpart in [2] is around 30 dB; thus our method gains about 7 dB on average compared with the prior work. Consequently, in terms of PSNR, our scheme outperforms the existing scheme [2] in imperceptibility, which results in better visual quality.

Table 1. PSNR of the proposed method and the existing work after watermarking

4.4 Geometric Robustness

In this subsection, experiments are conducted to analyze the robustness of our method against geometric attacks. The experimental data in Table 2 is obtained by averaging the results of the aforementioned six standard test video sequences.

Table 2. NCC of the proposed method and the existing work after Scaling attack

From Table 2, it can be observed that, before attacks, the accuracy of our method is 2% higher than that of approach [2]. After scaling attacks, the NCC value of our method remains nearly the same as without attacks, while in approach [2] the result decreases remarkably as the scaling factor increases from 200% to 300% and from 300% to 400%. When the scaling factor reaches 400%, the NCC value of [2] is 40% lower than that of our approach. As a result, the proposed method notably outperforms [2] under scaling attacks.

Table 3. NCC of the proposed method and the existing work after Rotation Attack

In Table 3, the NCC of the proposed method and the existing work after rotation attacks is given. As the rotation angle increases, the NCC value of our method is maintained, only slightly lower than without attack, whereas in [2] the NCC value drops significantly with the rotation angle, especially from 90\(^{\circ }\) to 120\(^{\circ }\), falling 60% to 80% below the value of our method. Hence, our method performs remarkably better than [2] in robustness against rotation attacks.

In summary, our method outperforms [2] in both imperceptibility and robustness against scaling and rotation. We therefore conclude that our method achieves relatively good visual quality together with strong robustness against geometric attacks.

5 Conclusion

In this paper, we propose a novel video watermarking scheme that combines the benefits of Zernike moments and normalization to resist geometric distortions. Zernike moments are employed for their invariance against rotation attacks; normalizing the target frame makes the normalized Zernike moments robust to both scaling and rotation attacks. After calculating the Zernike moments, we select appropriate ones for watermark embedding according to certain principles to improve robustness and reduce modifications. Because reconstructing frames from Zernike moments is computationally heavy and of limited accuracy, we use the reconstructed moments only to form an additive watermark signal. The watermark bit of each frame is obtained by majority vote, taking the most frequent bit among all candidate bits extracted from the amplitudes of the selected Zernike moments, to avoid errors. The experimental results show that our approach maintains good visual quality and achieves strong robustness to rotation and scaling attacks compared with the prior work.

Our method applies a pair of forward and inverse normalizations, which introduce unavoidable distortion. To deal with this problem, a new watermarking strategy needs to be designed to eliminate the loss. Meanwhile, as Zernike moments are computationally expensive, a more efficient watermark embedding algorithm needs to be explored. In future work, we will focus on optimizing both precision and efficiency.