1 Introduction

Significant technological advances have produced high-speed, powerful storage devices and communication networks, allowing enormous amounts of data to be transferred across the globe in seconds. With the number of Internet users growing every day, digital data in all its forms has become vulnerable to misuse during transmission, distribution, processing, and storage. Advanced software and multimedia tools have made it very easy to manipulate and forge digital data, especially video, audio, text, and images, resulting in legal disputes and in people claiming ownership of others’ intellectual property. Resolving such problems necessitates methods to secure digital data and protect copyrights, such as cryptography, steganography, and watermarking [13]. In digital watermarking techniques, a watermark or secret message is embedded in the data before transmission and extracted later at the receiver to ensure the authenticity of the transmitted data and verify ownership. A proper watermarking technique is judged on several characteristics [30]: imperceptibility, robustness, security, and capacity.

Watermarking techniques for still images have been studied for a long time. Researchers have recently shifted focus to other media types, especially video, driven by the growing online distribution of illegal copies of movies, shows, and games, and by the widespread deployment of surveillance cameras [8]. In general, a video stream consists of a sequence of images with changing scenes and content. Video watermarking therefore faces more challenges [3] than image watermarking, especially regarding watermark transparency and the types of attacks videos are exposed to. Video watermarking schemes are classified into three categories: blind, where only secret keys are needed for watermark extraction; semi-blind, where the original watermark and secret keys are required for extraction; and non-blind, where the original video and secret keys are needed for extraction.

To increase the imperceptibility of watermarked videos and reduce processing time, some video watermarking techniques hide watermarks only in selected video frames, as in [2, 40]. However, this leaves the non-watermarked frames open to tampering and attacks, reducing the robustness of those schemes. Other techniques, as in [12, 34, 39], divide the watermark among the video frames based on scene changes detected with the histogram difference method. Although adopting scene-change techniques increases robustness against frame dropping, switching, and averaging, tampering with just one frame is enough to lose part of the watermark. On the other hand, the majority of proposed techniques, as in [33,34,35], embed the same watermark in every video frame. Although this is sometimes seen as time-consuming and redundant, it has achieved robustness against several types of attacks.

Video watermarks are usually embedded either in the spatial domain or in a transform domain of the original videos [18]. In spatial domain techniques [16], watermarks are embedded in the least significant bits of the pixels to increase imperceptibility. Spatial domain techniques are easy to implement and have low processing time; however, they are less robust against geometric attacks. In transform domain techniques, watermarks are embedded by modifying the transform coefficients. Researchers prefer transform domains for watermark embedding because they are more robust against geometric and compression attacks and offer high watermark imperceptibility. The most commonly used transforms for watermarking are the discrete cosine transform (DCT) [19], discrete Fourier transform (DFT) [15], discrete wavelet transform (DWT) [31, 38], and singular value decomposition (SVD) [17]. Hybrid techniques [25, 29, 36] that combine two or more transforms are also available.

In [19], the authors present a video watermarking technique based on DCT, where the DCT coefficients of the blocks of the watermark image are added, scaled by a watermark strength factor, to the DCT coefficients of the blocks of each original video frame. Although their watermarked videos show high peak signal-to-noise ratio (PSNR) values, the authors report no results for signal processing or geometric attacks. In [17], Kong et al. propose a technique based on SVD only, where 4 bits of a binary watermark are embedded in the middle singular values of each frame of the original video according to a specific equation. Although the technique is fully blind and secure, it is confined to particular video sizes, has low capacity, and its watermark bits are susceptible to common processing attacks such as noise, filtering, and blurring.

In [39], Sinha et al. present a hybrid technique based on DWT and principal component analysis (PCA). The video frames are decomposed using DWT, and the LL sub-band of each frame is divided into blocks. The watermark bits are embedded in the principal components of the blocks of the LL sub-band. The technique is non-blind: the original video is needed for the watermark extraction process. Embedding watermark bits into the low-frequency sub-band leads to low PSNR and low correlation between the extracted and original watermarks under video attacks. In [29], the authors embed a binary watermark in the singular values (SVs) of the second-level DWT of the HL sub-band by substituting the least significant bits of the SV coefficients with the watermark bits. Although high imperceptibility of the watermarked video is achieved, the robustness of the technique suffers under common processing attacks such as noise, since the watermark bits are very sensitive and change easily.

In [10], Faragallah presents a hybrid non-blind video watermarking technique where a binary watermark is hidden in the second-level DWT decomposition of the LL sub-band of each frame by adding it to the SVs of the LH2, HL2, and HH2 sub-bands. He uses Hamming error-correction codes to protect the watermark from bit errors. The technique performs well and is highly robust; however, it is non-blind, and the author did not show the effects of noise and filtering attacks on it. In [11], the authors present a hybrid technique that hides the watermark in the second-level DWT decomposition of the LH sub-band by adding the SVs of the watermark to the SVs of the LL2 sub-band with a visibility factor. The technique is non-blind, and its imperceptibility depends heavily on the video content and the embedded watermark because of the constant visibility factor. In [20], Liu proposed a non-blind watermarking scheme that applies DWT and SVD to the blue channel of the host image. The author applies the Logistic and RSA encryption techniques to the watermark image before embedding to increase the scheme’s security. The encrypted watermark is embedded into the low-frequency sub-band of the host. Security is improved; however, the quality of the extracted watermark is low.

In [36], the authors compute the SVs of the HL2 sub-band of the second-level DWT decomposition of the LH sub-band of both the watermark and each original video frame. The SVs are added with a constant visibility factor, making the technique confined to specific videos and watermarks. In [38], Singh proposes a blind video watermarking scheme based on DWT and SIFT feature points to improve robustness against rotation attacks. Watermark bits are embedded by modifying two random high-frequency coefficients in the first-level DWT of the key video frames only. He proposes restoring rotationally attacked videos by matching a group of SIFT feature-point pairs generated from the original video at embedding to those computed from the attacked watermarked video during extraction. However, restoring a rotated video requires a long secret key consisting of a 128-dimensional feature vector for each point, the points’ distances from the frame center, the distances between the points themselves, and the frame size, in addition to other secret keys for watermark CAT mapping and the DWT. Also, embedding the watermark only in selected frames leaves the remaining frames unprotected.

Although SVD-based watermarking schemes are widely used for their stability and robustness to various attacks, they suffer from a major drawback: the false positive problem (FPP), a security issue discussed and analyzed in [24]. To overcome the FPP, many authors [7, 21, 22] have suggested computing a one-way hash, using the Message Digest 5 (MD5) algorithm or the Secure Hash Algorithm 1 (SHA-1), of the U and V matrices of the SVD of the watermark image, and using these hash values as secret keys. However, the disadvantage of these hashing algorithms is their sensitivity to bit-level changes in the input message. In [14], the authors replace the SVs of the HH sub-band of the original image with the SVs of the watermark. A hash is computed over the U and V matrices and hidden in the LL sub-band of the image. The smallest amount of noise can change the embedded hash, thus preventing watermark extraction.

As mentioned earlier, DWT is robust against geometric and compression attacks but lacks robustness against other attacks, while SVD is robust and stable but suffers from the false positive problem. Therefore, in this paper, DWT is integrated with both SVD and the Laplacian pyramid to exploit the advantages of DWT and SVD while overcoming their drawbacks and increasing the robustness of the watermark against signal processing and geometric attacks. The Laplacian pyramid separates the watermark image into two images: a high-frequency watermark image and a low-frequency watermark image. The SVs of the high-frequency watermark image are embedded by substitution into the SVs of the HH sub-band of every frame. The SVs of the low-frequency watermark image are embedded by substitution into the SVs of the HH1 sub-band of the second-level DWT decomposition of the HL sub-band of every frame. This separation preserves the low-frequency part of the watermark image by hiding it in a lower-frequency part of the video frame, making it robust against noise, filtering, and compression attacks. The FPP of SVD is resolved by applying perceptual image hashing to the watermark image, as discussed in Section 3, instead of MD5 or SHA-1, to overcome the bit-sensitivity problem mentioned earlier. The proposed scheme by itself is robust against geometric attacks such as rotation, scaling, and shear. However, to further increase robustness and improve the quality of the extracted watermark under geometric attacks, a restore algorithm based on SURF points computed from the first video frame only is also proposed.

The rest of the paper is organized as follows: Section 2 presents a summary of DWT, SVD, the Laplacian pyramid, and SURF feature points. Section 3 discusses in detail the proposed embedding algorithm, the proposed extraction algorithm, and the proposed geometric-attack restore algorithm, with an emphasis on how security is achieved in the proposed scheme. Section 4 summarizes the metrics used to evaluate the performance of the proposed scheme. Experimental results are presented and analyzed in Section 5, along with comparisons to results obtained using two conventional schemes. Finally, conclusions are drawn in Section 6.

2 Background

2.1 Discrete wavelet transform (DWT)

DWT [18, 39] decomposes a signal into a set of orthogonal wavelets that can reconstruct the original signal. It has a wide variety of applications in signal coding, data compression, signal processing, digital communication, and more. The most crucial feature of DWT is that it preserves both the frequency content of the signal and its location. The 2-D DWT decomposes a video frame into four sub-bands (LL, LH, HL, HH), each a quarter of the size of the original video frame. LL, known as the approximation, is the low-frequency sub-band, where most of the information content of the image is found.

The other sub-bands hold the details: LH the vertical details, HL the horizontal details, and HH the diagonal details. Any of these sub-bands can be further decomposed into a second-level DWT, and so on, as shown in Fig. 1.

Fig. 1
figure 1

Decomposition of 2D DWT sub-bands of a video frame
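The two-level decomposition above can be illustrated with a short sketch, assuming the PyWavelets package is available (`frame` is a hypothetical 256 × 256 luminance image; the mapping of PyWavelets' detail coefficients onto the LH/HL/HH names follows the paper's convention):

```python
import numpy as np
import pywt

# a hypothetical 256x256 luminance frame
frame = np.random.default_rng(0).random((256, 256))

# one-level 2-D DWT: approximation LL plus three detail sub-bands
LL, (LH, HL, HH) = pywt.dwt2(frame, 'haar')

# each sub-band is a quarter of the frame; HL can be decomposed again (level 2)
LL1, (LH1, HL1, HH1) = pywt.dwt2(HL, 'haar')

# the transform is invertible: the original frame is reconstructed exactly
recon = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
```

Each first-level sub-band here is 128 × 128, and each second-level sub-band 64 × 64, matching the quarter-size property stated above.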

2.2 Singular value decomposition (SVD)

SVD is a useful numerical analysis tool used to decompose an image into three matrices, each carrying valuable information about the image. For an M × N image, its SVD representation is defined as:

$$ x=US{V}^{T} $$
(1)
$$ x=\left(\begin{array}{ccc}{U}_{1,1}& \cdots & {U}_{1,M}\\ {U}_{2,1}& \cdots & {U}_{2,M}\\ \vdots & \ddots & \vdots \\ {U}_{M,1}& \cdots & {U}_{M,M}\end{array}\right)\left(\begin{array}{ccc}{S}_{1,1}& \cdots & 0\\ 0& {S}_{2,2}& 0\\ \vdots & \ddots & \vdots \\ 0& \cdots & {S}_{M,N}\end{array}\right){\left(\begin{array}{ccc}{V}_{1,1}& \cdots & {V}_{1,N}\\ {V}_{2,1}& \cdots & {V}_{2,N}\\ \vdots & \ddots & \vdots \\ {V}_{N,1}& \cdots & {V}_{N,N}\end{array}\right)}^{T} $$
(2)

Where S is a diagonal matrix of dimension M × N, and U and V are orthogonal matrices of dimensions M × M and N × N, respectively. The diagonal entries of S are called singular values and represent the image luminance; the decomposition automatically arranges them in descending order. The matrices U and V represent the horizontal and vertical details of the image [41]. Owing to its high stability under noise and its robustness against geometric attacks such as rotation, translation, flipping, and transposition, SVD has become a strong candidate for watermarking applications [32].
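These properties can be checked numerically with NumPy (a sketch; the 6 × 4 matrix is an arbitrary toy "image"):

```python
import numpy as np

img = np.random.default_rng(1).random((6, 4))   # toy M x N image, M=6, N=4
U, S, Vt = np.linalg.svd(img)                   # x = U S V^T, as in Eq. (1)

# the singular values are returned already sorted in descending order
assert np.all(np.diff(S) <= 0)

# U is orthogonal, and the product reconstructs the image exactly
assert np.allclose(U.T @ U, np.eye(6))
recon = U[:, :4] @ np.diag(S) @ Vt
assert np.allclose(recon, img)
```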

2.3 Laplacian pyramid

A pyramid transform of an image produces a collection of images at different scales, organized in a pyramid structure, that together represent the original image. The Laplacian pyramid, introduced by Burt and Adelson [1, 6], provides an interpretation of the image information at different resolutions. The Gaussian pyramid is a sequence of images in which each level is obtained by filtering and sub-sampling the preceding level. The lowest level of the pyramid holds the highest-resolution information, the original image itself, while higher levels are reduced-resolution versions of it [37]. To build the pyramid from the original image G0, the blurring and sub-sampling of the lower level Gl−1 is repeated to produce the image Gl at the next level, according to the following equation [1]:

$$ {G}_l\left(i,j\right)={\left[w\ast {G}_{l-1}\right]}_{\downarrow 2}=\sum \limits_{m,n=-2}^2w\left(m,n\right)\ {G}_{l-1}\left(2i+m,2j+n\right) $$
(3)

Where Gl is the lth pyramid level, w(m,n) is a Gaussian-like weighting function, and [·]↓2 denotes down-sampling of the signal by 2. If the image size is (2^N + 1) × (2^N + 1), the pyramid has N + 1 levels. The Laplacian pyramid, on the other hand, is derived from the Gaussian pyramid by subtracting two successive levels. Since the Gaussian pyramid images at different levels have different sizes, the smaller image is expanded (up-sampled and interpolated) to match the larger one before subtraction. The Laplacian pyramid is then constructed as [1]:

$$ {L}_l={G}_l\left(i,j\right)- EXPAND\left({G}_{l+1}\right)={G}_l\left(i,j\right)-{G}_{l+1,1}\left(i,j\right) $$
(4)

The original image is reconstructed by reversing the previous process using all levels of the Laplacian pyramid, as well as the top level of the Gaussian pyramid based on the following equation:

$$ {G}_l\left(i,j\right)={L}_l+ EXPAND\left({G}_{l+1}\left(i,j\right)\right)={L}_l+{G}_{l+1,1} $$
(5)
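Equations (3)–(5) can be sketched in plain NumPy as follows. This is a simplified sketch: it uses zero boundary handling rather than the reflected borders of [1], and the names `reduce_level`/`expand_level` are illustrative:

```python
import numpy as np

w = np.array([1, 4, 6, 4, 1]) / 16.0             # Gaussian-like generating kernel

def smooth(G, kern):
    # separable 5-tap filtering along columns, then rows
    G = np.apply_along_axis(np.convolve, 0, G, kern, 'same')
    return np.apply_along_axis(np.convolve, 1, G, kern, 'same')

def reduce_level(G):
    # Eq. (3): blur, then subsample by 2 in both directions
    return smooth(G, w)[::2, ::2]

def expand_level(G, shape):
    # EXPAND: upsample by zero insertion, then interpolate
    up = np.zeros(shape)
    up[::2, ::2] = G
    return smooth(up, 2 * w)                      # x2 gain per axis restores energy

G0 = np.random.default_rng(2).random((64, 64))   # hypothetical original image
G1 = reduce_level(G0)                            # next Gaussian level (32x32)
L0 = G0 - expand_level(G1, G0.shape)             # Eq. (4): Laplacian level
recon = L0 + expand_level(G1, G0.shape)          # Eq. (5): exact reconstruction
```

Note that reconstruction is exact by construction: adding back the same expanded image that was subtracted in Eq. (4) recovers G0 regardless of the kernel used.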

2.4 Speeded up robust features (SURF)

SURF [4] is a scale- and rotation-invariant feature detector and descriptor used to find feature points in an image and describe them. These points are special because, if the image undergoes an affine transformation, the detector can find the equivalent points in the modified version. SURF is a speeded-up version of SIFT (Scale Invariant Feature Transform) [23]. In SIFT, the Laplacian of Gaussian (LoG) is approximated with a Difference of Gaussians to build the scale space. In contrast, SURF approximates LoG with box-filter convolution, which can be computed efficiently using integral images. Using integral images and box filters makes SURF three times faster than SIFT [27]. SURF relies on the determinant of the Hessian matrix [26] for both scale and location. In addition, SURF’s 64-element feature vector is much smaller than SIFT’s 128-element vector.

3 Proposed video watermarking scheme

The purpose of the proposed scheme is to build a secure, semi-blind video watermarking system based on DWT, SVD, and the Laplacian pyramid that is robust against different types of attacks while maintaining the perceptual quality of the watermarked videos. The main idea is to exploit the advantages of DWT and SVD and overcome their drawbacks using the Laplacian pyramid and perceptual image hashing. First, the proposed method divides the watermark image, using the Laplacian pyramid, into two images: a low-frequency watermark image and a high-frequency watermark image. The high-frequency watermark image is inserted in the HH sub-band of the Y component of each frame, while the low-frequency watermark image is inserted in the second-level decomposition of the HL sub-band. Hiding the low-frequency watermark image in the HL sub-band increases watermark robustness against attacks that affect the high-frequency components of the frames, such as noise and median filtering. Hiding the high-frequency watermark image in the HH sub-band improves the quality of the watermarked video and prevents the quality degradation that would occur if the whole watermark were hidden in the HL sub-band.

Furthermore, the proposed scheme maintains the perceptual quality of the watermarked video by using an embedding algorithm that replaces the singular values of the DWT sub-band each watermark image is inserted into with scaled singular values of the corresponding watermark image. The singular values of both watermark images thus closely match those of the host frame, which reduces the effect of the embedding process and preserves the perceptual quality of the watermarked video [42].

Moreover, the proposed watermarking scheme addresses the security problem of SVD-based schemes by generating a perceptual hash vector for the watermark image before embedding, to be validated before the extraction process. This guarantees that extraction halts when an attacker substitutes a different watermark image, since the hash vector changes. The proposed system also uses SURF points to restore geometrically attacked videos at the decoding side. This restoration improves the quality of the extracted watermark, thus increasing the robustness of the scheme.

In the following subsections, the proposed embedding algorithm, the proposed extraction algorithm, the security scheme, and the proposed restore algorithm are described in detail.

3.1 Proposed embedding algorithm

Figure 2 shows the block diagram of the proposed embedding process. The algorithm steps are as follows:

  1. Step 1:

    Divide the video into frames and compute SURF points of the first video frame only.

  2. Step 2:

    Convert each frame into YUV components.

  3. Step 3:

    Apply one level DWT on the luminance Y component of each frame to decompose it into four sub-bands (LL, HL, LH, and HH).

  4. Step 4:

    Apply DWT on HL to get the four sub-bands LL1, HL1, LH1, and HH1.

  5. Step 5:

    Apply the SVD on the HH of the Y component to generate the singular values SHH.

Fig. 2
figure 2

Block diagram of the Proposed Embedding Algorithm

$$ {\mathrm{U}}_{\mathrm{HH}}{\mathrm{S}}_{\mathrm{HH}}{\mathrm{V}}_{\mathrm{HH}}=\mathrm{SVD}\left(\mathrm{HH}\right) $$
(6)
  6. Step 6:

    Apply the SVD on the HH1 to get the singular values SHH1.

$$ {\mathrm{U}}_{\mathrm{H}{\mathrm{H}}_1}{\mathrm{S}}_{\mathrm{H}{\mathrm{H}}_1}{\mathrm{V}}_{\mathrm{H}{\mathrm{H}}_1}=\mathrm{SVD}\left(\mathrm{H}{\mathrm{H}}_1\right) $$
(7)
  7. Step 7:

    Apply Laplacian Pyramid transform on the watermark image to get high-frequency watermark image H and low-frequency watermark image L.

  8. Step 8:

    Apply the SVD on the high-frequency image H of the watermark image.

$$ {\mathrm{U}}_{\mathrm{H}}{\mathrm{S}}_{\mathrm{H}}{\mathrm{V}}_{\mathrm{H}}=\mathrm{SVD}\left(\mathrm{H}\right) $$
(8)
  9. Step 9:

    Apply the SVD on the low-frequency image L of the watermark image.

$$ {\mathrm{U}}_{\mathrm{L}}{\mathrm{S}}_{\mathrm{L}}{\mathrm{V}}_{\mathrm{L}}=\mathrm{SVD}\left(\mathrm{L}\right) $$
(9)
  10. Step 10:

    Modify the singular values of the HH band with the singular values of the high-frequency watermark image H according to the following equation [42]:

$$ {\mathrm{S}}_{\mathrm{H}\mathrm{H}}^{\ast }={\upalpha}_1.{\mathrm{S}}_{\mathrm{H}}=\frac{\left\Vert {\mathrm{S}}_{\mathrm{H}\mathrm{H}}\right\Vert }{\left\Vert {\mathrm{S}}_{\mathrm{H}}\right\Vert}\ast {\mathrm{S}}_{\mathrm{H}} $$
(10)
  11. Step 11:

    Modify the singular values of the HH1 band with the singular values of the low-frequency watermark image L using the following equation:

$$ {\mathrm{S}}_{\mathrm{H}{\mathrm{H}}_1}^{\ast }={\alpha}_2.{\mathrm{S}}_{\mathrm{L}}=\frac{\left\Vert {\mathrm{S}}_{\mathrm{H}{\mathrm{H}}_1}\right\Vert }{\left\Vert {\mathrm{S}}_{\mathrm{L}}\right\Vert}\ast {\mathrm{S}}_{\mathrm{L}} $$
(11)
  12. Step 12:

    Apply the inverse SVD to get the modified sub-band HH of the cover frame.

$$ \mathrm{HH}={\mathrm{U}}_{\mathrm{HH}}\ast {\mathrm{S}}_{\mathrm{HH}}^{\ast}\ast {\mathrm{V}}_{\mathrm{HH}}^{\mathrm{T}} $$
(12)
  13. Step 13:

    Apply the inverse SVD to get the modified sub-band HH1 of the cover frame.

$$ \mathrm{H}{\mathrm{H}}_1={\mathrm{U}}_{\mathrm{H}{\mathrm{H}}_1}\ast {\mathrm{S}}_{\mathrm{H}{\mathrm{H}}_1}^{\ast}\ast {\mathrm{V}}_{\mathrm{H}{\mathrm{H}}_1}^{\mathrm{T}} $$
(13)
  14. Step 14:

    Apply the inverse DWT with the modified sub-band HH1 to get HL.

  15. Step 15:

    Apply the inverse DWT with the modified sub-bands HH and HL to get the watermarked luminance Y component.

  16. Step 16:

    Calculate the DWT hash function for the watermark image and store the hash vector.

$$ \mathrm{Hash}\ \mathrm{vec}1=\mathrm{Hash}\left(\mathrm{watermark}\ \mathrm{image}\right) $$
(14)
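For a single frame, Steps 3–15 can be condensed into the following sketch (assuming the PyWavelets package; the 256 × 256 frame, the watermark singular-value vectors `S_H`/`S_L`, and the function name `embed_frame` are illustrative assumptions, not the authors' code, and the sub-band naming follows the paper's convention):

```python
import numpy as np
import pywt

def embed_frame(Y, S_H, S_L):
    # Steps 3-4: two-level DWT of the luminance component
    LL, (LH, HL, HH) = pywt.dwt2(Y, 'haar')
    LL1, (LH1, HL1, HH1) = pywt.dwt2(HL, 'haar')
    # Steps 5-6: SVD of the two target sub-bands
    U, S_HH, Vt = np.linalg.svd(HH)
    U1, S_HH1, Vt1 = np.linalg.svd(HH1)
    # Steps 10-11: scale factors of Eqs. (10) and (11)
    a1 = np.linalg.norm(S_HH) / np.linalg.norm(S_H)
    a2 = np.linalg.norm(S_HH1) / np.linalg.norm(S_L)
    # Steps 12-13: inverse SVD with the substituted, scaled singular values
    HHm = U @ np.diag(a1 * S_H) @ Vt
    HH1m = U1 @ np.diag(a2 * S_L) @ Vt1
    # Steps 14-15: inverse DWT back to the watermarked Y component
    HLm = pywt.idwt2((LL1, (LH1, HL1, HH1m)), 'haar')
    Yw = pywt.idwt2((LL, (LH, HLm, HHm)), 'haar')
    return Yw, a1, a2

# hypothetical demo data: a random frame and watermark SVs sized to match
rng = np.random.default_rng(0)
Y = rng.random((256, 256))
S_H = np.linalg.svd(rng.random((128, 128)), compute_uv=False)  # high-freq part
S_L = np.linalg.svd(rng.random((64, 64)), compute_uv=False)    # low-freq part
Yw, a1, a2 = embed_frame(Y, S_H, S_L)
```

Note how the scale factors make the embedded singular values match the energy of the host sub-bands, which is what keeps the embedding imperceptible.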

3.2 Proposed extraction algorithm

In the extraction algorithm, the watermark image is recovered from the watermarked video only if the hash vector computed at the embedding process is equal to the hash vector computed at extraction or the correlation between them is greater than a threshold. The details of the hash vector calculation are described in subsection 3.3. Figure 3 shows the block diagram of the proposed extraction algorithm. The extraction algorithm steps are as follows:

  1. Step 1:

    Calculate the DWT hash vector for the received watermark image.

Fig. 3
figure 3

Block diagram of the Proposed Extraction Algorithm

$$ \mathrm{Hash}\ \mathrm{vec}2=\mathrm{Hash}\left(\mathrm{received}\ \mathrm{watermark}\ \mathrm{image}\right) $$
(15)
  2. Step 2:

    Compare the stored hash vec1 and hash vec2; if they are equal or the correlation between them is greater than a threshold, go to Step 3, otherwise exit.

  3. Step 3:

    If received watermarked video is geometrically attacked, match the SURF points of the 1st frame of the original video to those of the 1st frame of the attacked watermarked video to select the corresponding points to be used in the restore process.

  4. Step 4:

    Apply Laplacian Pyramid transform on the received watermark image to get high-frequency watermark image H and low-frequency watermark image L.

  5. Step 5:

    Apply the SVD on the high-frequency watermark image H

$$ {U}_H{S}_H{V}_H= SVD(H) $$
(16)
  6. Step 6:

    Apply the SVD on the low-frequency watermark image L

$$ {U}_L{S}_L{V}_L= SVD(L) $$
(17)
  7. Step 7:

    Divide watermarked video into frames and convert each frame into YUV components.

  8. Step 8:

    Apply one level DWT on the luminance Y component of each frame to decompose it into four sub-bands (LL, HL, LH, and HH).

  9. Step 9:

    Apply DWT on HL to get the four sub-bands LL1, HL1, LH1, and HH1.

  10. Step 10:

    Apply the SVD on the HH of the Y component to generate the singular values SHH

$$ {U}_{HH}{S}_{HH}{V}_{HH}= SVD(HH) $$
(18)
  11. Step 11:

    Apply the SVD on the HH1 to get the singular values SHH1

$$ {U}_{H{H}_1}{S}_{H{H}_1}{V}_{H{H}_1}= SVD\left(H{H}_1\right) $$
(19)
  12. Step 12:

    Recover the singular values of the extracted high-frequency watermark image

$$ new\ {S}_H=\frac{S_{HH}^{\ast }}{\alpha_1} $$
(20)
  13. Step 13:

    Recover the singular values of the extracted low-frequency watermark image

$$ new\ {S}_L=\frac{S_{H{H}_1}^{\ast }}{\alpha_2} $$
(21)
  14. Step 14:

    Apply inverse SVD to get the extracted high-frequency watermark image.

$$ H={U}_H\ast new\ {S}_H\ast {V}_H^T $$
(22)
  15. Step 15:

    Apply inverse SVD to get the extracted low-frequency watermark image.

$$ L={U}_L\ast new\ {S}_L\ast {V}_L^T $$
(23)
  16. Step 16:

    Apply the inverse Laplacian pyramid transform on the reconstructed L and H to recover the extracted watermark.

  17. Step 17:

    Compare the extracted watermark to the original watermark to verify ownership.
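The extraction side can be exercised end-to-end with a condensed sketch: the stored side information is the two scale factors and the watermark images' U and V matrices, which is what makes the scheme semi-blind. The random frame and watermark images below are hypothetical stand-ins, and PyWavelets is assumed:

```python
import numpy as np
import pywt

# embedding side (condensed): substitute scaled watermark SVs into HH and HH1
rng = np.random.default_rng(1)
Y = rng.random((256, 256))
H = rng.random((128, 128))    # high-frequency watermark image (hypothetical)
Lo = rng.random((64, 64))     # low-frequency watermark image (hypothetical)
U_H, S_H, Vt_H = np.linalg.svd(H)
U_L, S_L, Vt_L = np.linalg.svd(Lo)
LL, (LH, HL, HH) = pywt.dwt2(Y, 'haar')
LL1, (LH1, HL1, HH1) = pywt.dwt2(HL, 'haar')
U, S_HH, Vt = np.linalg.svd(HH)
U1, S_HH1, Vt1 = np.linalg.svd(HH1)
a1 = np.linalg.norm(S_HH) / np.linalg.norm(S_H)    # Eq. (10)
a2 = np.linalg.norm(S_HH1) / np.linalg.norm(S_L)   # Eq. (11)
HLm = pywt.idwt2((LL1, (LH1, HL1, U1 @ np.diag(a2 * S_L) @ Vt1)), 'haar')
Yw = pywt.idwt2((LL, (LH, HLm, U @ np.diag(a1 * S_H) @ Vt)), 'haar')

# extraction side (Steps 7-15): recover the SVs and rebuild both images
LLe, (LHe, HLe, HHe) = pywt.dwt2(Yw, 'haar')
_, (_, _, HH1e) = pywt.dwt2(HLe, 'haar')
new_S_H = np.linalg.svd(HHe, compute_uv=False) / a1   # Eq. (20)
new_S_L = np.linalg.svd(HH1e, compute_uv=False) / a2  # Eq. (21)
H_rec = U_H @ np.diag(new_S_H) @ Vt_H                 # Eq. (22)
L_rec = U_L @ np.diag(new_S_L) @ Vt_L                 # Eq. (23)
```

In the attack-free case the recovered images match the originals to floating-point precision; Step 16 would then invert the Laplacian pyramid on `L_rec` and `H_rec` to reassemble the watermark.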

3.3 Security of the proposed scheme

Perceptual image hashing is used in the proposed algorithm instead of conventional cryptographic hashing to overcome the SVD security issue mentioned previously. The main problem with conventional cryptographic hashing is that any change in the input image, due to noise or filtering, generates a completely different hash vector. Perceptual image hashing, in contrast, depends on the image content, and its hash vector is unaffected by most content-preserving manipulations. Multiple perceptual image hash methods have been developed over the past years to check the authenticity of image content. One of them is the statistics-based method, where signatures are generated by computing statistics such as the mean, variance, moments of image blocks, and histogram. This paper adopts the method presented in [5], where the hash vector is extracted from different sub-bands of the image wavelet decomposition. The following steps explain the hash vector calculation:

  1. Step 1:

A random tiling of each sub-band of the image is created. Averages of the coefficients in the tiles are computed in the coarse sub-band, while variances are computed in the other sub-bands. As a result, an image-statistics vector m is obtained:

$$ m=\sigma \left(I,K\right) $$
(24)

where I is the image, K is the key that determines the tiling, and σ is the feature (statistics) mapping operation.

  2. Step 2:

    The vector m generated in Step 1 is quantized using a randomized quantizer Q. The function Q uses the pseudorandom seed K from Step 1 to produce the probability distribution required for quantization. Step 2 produces a vector with 3-bit entries.

$$ x=Q\left(m,K\right)\in {\left\{0,1,\dots, 7\right\}}^l $$
(25)
  3. Step 3:

    The quantized vector x obtained in Step 2 is decoded using a first-order Reed-Muller error-correcting decoder D to produce the length-n binary hash value [5].

$$ h=D(x)\in {\left\{0,1\right\}}^n $$
(26)
  4. Step 4:

    A final decoding step of a linear code with random parameters converts the current intermediate hash value h into a shorter string.
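A heavily simplified sketch of Steps 1–2 follows (key-driven tiling of the wavelet sub-bands plus 3-bit quantization). The Reed-Muller and final linear-code decoding stages of Steps 3–4 are omitted, and the tile size and counts are arbitrary choices for illustration, not those of [5]; PyWavelets is assumed:

```python
import numpy as np
import pywt

def perceptual_hash(img, key, tiles=16):
    """Simplified statistics-based hash: random tiling + 3-bit quantization."""
    rng = np.random.default_rng(key)          # K: the key drives the tiling
    LL, (LH, HL, HH) = pywt.dwt2(np.asarray(img, float), 'haar')
    feats = []
    # means in the coarse sub-band, variances in the detail sub-bands
    for band, stat in ((LL, np.mean), (LH, np.var), (HL, np.var), (HH, np.var)):
        h, wd = band.shape
        for _ in range(tiles):
            r = rng.integers(0, h - 7)        # random 8x8 tile position
            c = rng.integers(0, wd - 7)
            feats.append(stat(band[r:r + 8, c:c + 8]))
    m = np.array(feats)                       # Eq. (24): m = sigma(I, K)
    # quantize each statistic into one of 8 bins (3 bits), Eq. (25)
    edges = np.quantile(m, np.linspace(0, 1, 9)[1:-1])
    return np.digitize(m, edges)              # entries in {0, ..., 7}

img = np.random.default_rng(3).random((64, 64)) * 255   # hypothetical image
h = perceptual_hash(img, key=7)
```

Because the statistics are block aggregates, small content-preserving changes move only a few entries of the vector, unlike the bit-sensitive MD5/SHA-1 hashes discussed above.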

3.4 Proposed restore algorithm

Geometric attacks can prevent watermarks from being correctly extracted at the decoder [3]. Therefore, the proposed scheme uses SURF feature points to recover from rotation, scaling, and shearing attacks. The restore technique improves the quality of the extracted watermark by enhancing the quality of the attacked watermarked video. During embedding, feature points are calculated only for the first frame of the original video, as the first frame’s feature points are sufficient to determine any geometric attack. At extraction, SURF points are calculated for the first frame of the attacked watermarked video and compared to those of the original one. Table 1 shows how the three attacks transform the feature-point parameters (scale, orientation, coordinates, and percentage of matched points), thereby helping to determine the type of attack. Figure 4 shows the block diagram of the proposed restore algorithm. The algorithm steps are as follows:

Table 1 The effect of different attacks on video parameters
Fig. 4
figure 4

Block Diagram of the Proposed Restore Algorithm

  1. Step 1:

    Match the SURF points of the first frame of the original video to those of the first frame of the attacked watermarked video to select the corresponding points.

  2. Step 2:

    Use the percent of matched points, and variations in scale, orientation, horizontal and vertical locations, to decide the type of attack as per Table 1.

  3. Step 3:

    If a scale attack is detected, an approximated scale ratio is calculated. Video is restored by inverting the scale of the video.

  4. Step 4:

    If a rotation attack is detected, an approximated rotation angle is calculated, and an inverse rotation restores the original orientation. However, a rotation attack increases the frame size, so adjusting the angle leaves zeros padded around the frame; a cropping stage is therefore needed to remove the additional black border, as shown in Fig. 5. To crop the black border, the original frame size must be calculated. Equations (27) and (28) are used to find the original frame size, where the variables in the equations are clarified in Fig. 6.

Fig. 5
figure 5

Two steps Rotation Restore: (a) Attacked frame with rotation 30o, (b) Frame after adjusting its direction, (c) Frame after cropping the black surrounding

Fig. 6
figure 6

Parameters used in Eqs. (27) and (28)

$$ \mathrm{a}=\frac{x\cos \theta -y\sin \theta }{\cos \left(2\theta \right)} $$
(27)
$$ \mathrm{b}=\frac{y\cos \theta -x\sin \theta }{\cos \left(2\theta \right)} $$
(28)
  5. Step 5:

    If a shear attack is detected, determine the shear direction by comparing the average displacement in both directions; the larger is taken as the shear direction. If it is horizontal, discard points with a small horizontal displacement or a large vertical displacement; in the case of vertical shear, the conditions are reversed. The average of the remaining displacements is taken as the shearing value used to adjust the frames.
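The crop-size recovery of Eqs. (27) and (28) can be verified numerically: rotating an a × b frame by θ produces a bounding box of size (a cos θ + b sin θ) × (a sin θ + b cos θ), and the equations invert this relation (the frame size below is an arbitrary example; the relation is undefined at θ = 45°, where cos 2θ = 0):

```python
import numpy as np

def original_size(x, y, theta):
    """Eqs. (27)-(28): recover the pre-rotation frame size a x b
    from the padded bounding-box size x x y and the detected angle."""
    c2 = np.cos(2 * theta)
    a = (x * np.cos(theta) - y * np.sin(theta)) / c2
    b = (y * np.cos(theta) - x * np.sin(theta)) / c2
    return a, b

# forward model: bounding box of an a x b frame rotated by theta
theta = np.deg2rad(30)
a, b = 352.0, 288.0
x = a * np.cos(theta) + b * np.sin(theta)
y = a * np.sin(theta) + b * np.cos(theta)
```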

4 Evaluation metrics

The two main evaluation metrics used to evaluate the proposed video watermarking scheme are PSNR and normalized correlation (NC). The PSNR metric is used to measure the quality in dB of both the watermarked video after the embedding process and the extracted watermark image after the extraction process. The NC metric is used to measure the correlation between the extracted watermark image and the original watermark image. Both metrics are explained below [28].

4.1 Peak signal to noise ratio (PSNR)

Given a 2-D matrix I(x,y) of dimensions M × N representing an original image, and I′(x,y) of the same dimensions representing the degraded image, the PSNR between the two images is calculated as in (29) and (30):

$$ \mathrm{PSNR}=20{\log}_{10}\left(\frac{\mathrm{MA}{\mathrm{X}}_{\mathrm{I}}}{\sqrt{\mathrm{MSE}}}\right) $$
(26)
$$ \mathrm{MSE}=\frac{1}{\mathrm{MN}}{\sum}_{i=0}^{M-1}{\sum}_{j=0}^{N-1}{\left\Vert I\left(i,j\right)-{I}^{\prime}\left(i,j\right)\right\Vert}^2 $$
(27)

Where MAXI is the maximum intensity level in I(x,y) and MSE is the mean square error. The PSNRv between an original video of K frames and its degraded version is taken as the average of the PSNRs of the individual original frames and their degraded versions, as given in (28).

$$ \mathrm{PSN}{\mathrm{R}}_v=\frac{\sum \limits_{i=1}^K\mathrm{PSN}{\mathrm{R}}_{\mathrm{i}}}{\mathrm{K}} $$
(28)
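Eqs. (26)–(28) translate directly into code; a minimal NumPy sketch, assuming 8-bit frames so that MAXI = 255:

```python
import numpy as np

def psnr(orig, degraded, max_i=255.0):
    # PSNR (dB) between two equally sized images, Eqs. (26)-(27).
    mse = np.mean((orig.astype(float) - degraded.astype(float)) ** 2)
    if mse == 0:
        return float('inf')       # identical images
    return 20 * np.log10(max_i / np.sqrt(mse))

def psnr_video(orig_frames, degraded_frames):
    # Average PSNR over all K frame pairs, Eq. (28).
    vals = [psnr(o, d) for o, d in zip(orig_frames, degraded_frames)]
    return sum(vals) / len(vals)
```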

4.2 Normalized correlation (NC)

The correlation coefficient measures the similarity between the original and the extracted watermark logos. NC values range from −1 to +1: two images are very similar when NC is close to +1, while an NC value close to −1 indicates that the two images are highly dissimilar. The correlation between the original watermark logo and the extracted watermark logo, both of size N×M pixels, is computed as:

$$ NC=\frac{\sum \limits_{i=1}^M\sum \limits_{j=1}^N\left({A}_{i,j}-\overline{A}\right)\left({B}_{i,j}-\overline{B}\right)}{\sqrt{\sum \limits_{i=1}^M\sum \limits_{j=1}^N{\left({A}_{i,j}-\overline{A}\right)}^2\sum \limits_{i=1}^M\sum \limits_{j=1}^N{\left({B}_{i,j}-\overline{B}\right)}^2}} $$
(29)

Where \( \overline{A} \) and \( \overline{B} \) are the means of the original watermark logo and the extracted watermark logo, respectively, and Ai,j and Bi,j are the pixel values at position (i, j) of the original watermark logo and the extracted watermark logo, respectively.
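The NC formula above is the zero-mean normalized cross-correlation of pixel values; a minimal NumPy sketch:

```python
import numpy as np

def normalized_correlation(A, B):
    # NC between two equally sized images: the zero-mean
    # cross-correlation divided by the product of the norms.
    a = A.astype(float) - A.mean()
    b = B.astype(float) - B.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())
```

For identical images the value is exactly +1, and negating an image flips the sign to −1, matching the stated range.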

5 Experimental results and analysis

In this section, the experimental results obtained using the proposed video watermarking scheme are presented and analyzed. These results are also compared to those obtained using two other watermarking schemes, presented by Liu et al. [20] and Gupta et al. [14], to emphasize the benefits of using the Laplacian pyramid and the restore algorithm in the proposed technique. Experiments were performed on the five videos shown in Fig. 7a using the original watermark image in Fig. 7b. These videos are Talk show, Akiyo, Big buck bunny, Flying bird, and Elephants dream, of which the first three are benchmark videos. The sizes of these videos and their numbers of frames are given in Table 2, together with the PSNR of the watermarked videos after the watermark embedding process has been applied to the original videos using the proposed scheme, the Liu et al. scheme, and the Gupta et al. scheme. The PSNR of each watermarked video is calculated as the average PSNR of all its watermarked frames. The proposed and Gupta et al. schemes show comparable PSNR values: depending on the content of the video, one scheme can have a higher PSNR than the other, but within a small margin, and both demonstrate high imperceptibility of the watermark. Meanwhile, the Liu et al. scheme shows the lowest watermarked-video PSNR for all tested videos.

Fig. 7
figure 7

(a) Original Test Videos, (b) Original Watermark Image

Table 2 PSNR results of Watermarked Videos

Figure 8 shows selected frames from the original Big Bunny video as well as the same selected frames watermarked using the proposed scheme, Gupta et al. scheme, and Liu et al. scheme.

Fig. 8
figure 8

Original and watermarked frames from Big Bunny Video

The PSNR of each of the selected watermarked frames is included in the figure. Figure 9 shows the PSNR of all the watermarked Big Bunny video frames versus frame number for the three schemes. It is clear from both Figs. 8 and 9 that, compared to the conventional schemes, the PSNR of the watermarked frames for the proposed scheme is highly dependent on the contents of the frame, as part of the watermark in the proposed scheme is hidden in the HL band of the DWT of every frame.

Fig. 9
figure 9

PSNR of the watermarked frames versus frame number for Big Bunny video

Table 3 shows the PSNR and NC of the extracted watermark for the five test videos using both the proposed scheme and the Gupta et al. scheme [14], without attacks and with common signal processing attacks such as median filtering, salt & pepper noise, Gaussian noise, MPEG4 and H.264 [9] compression, and cropping.

Table 3 PSNR & NC of Extracted Watermark for Proposed and Gupta et al. [14] Schemes for signal processing attacks

From Table 3, it can be observed that in the case of no attack, the proposed scheme outperforms the Gupta et al. scheme by 4 to 20 dB in the PSNR of the extracted watermark, depending on the video tested. Under signal processing attacks, the proposed scheme achieves higher PSNR and correlation of the extracted watermark than the Gupta et al. scheme: by 4 to 20 dB for median filtering, 7 to 20 dB for salt & pepper noise, 4 to 20 dB for Gaussian noise, and 4 to 20 dB for MPEG4 and H.264 compression. Only in the case of cropping does the comparative performance vary, depending on the frequency contents of the original video tested and the part of the video where the crop occurs.

Table 4 shows the PSNR and NC of the extracted watermark for the same five test videos using both the proposed scheme and the Gupta et al. scheme [14] under geometric attacks such as scaling, rotation, and shear. The proposed scheme is robust by itself to geometric attacks, and the proposed restore algorithm is used to further increase robustness and improve the quality of the extracted watermark.

Table 4 PSNR & NC of extracted watermark for proposed and Gupta et al. [14] for geometric attack

Thus, the proposed scheme's results for geometric attacks are presented twice: once with the proposed restore algorithm in use and once without restoration. From Table 4, it is observed that for scaling, the PSNR of the extracted watermark and the NC of the proposed scheme without the restore algorithm are higher than those of the Gupta et al. scheme by 3 to 16 dB, depending on the video tested; the performance improves further, adding an extra 2–4 dB, with the proposed restore algorithm. For rotation, the performance of the proposed scheme without restore is 5–20 dB above the Gupta et al. scheme, depending on the video tested and the angle of rotation; with the restore algorithm, the performance gains an extra 5–16 dB, again depending on the video and the rotation angle. For shear, the proposed scheme outperforms the Gupta et al. scheme by 2–11 dB without restore, and the proposed restore algorithm adds a further 2–7 dB of improvement.

Figure 10 shows the watermarks extracted using both the proposed scheme and the Gupta et al. scheme from the Big Bunny video under signal processing attacks, geometric attacks, and no attack. Only results with the proposed restore algorithm are considered here. Figure 10 supports the conclusions drawn from the results in Tables 3 and 4.

Fig. 10
figure 10

Extracted watermarks using proposed and Gupta schemes from Big bunny video after different attacks

Figure 11a shows the PSNR of the extracted watermark versus frame number for both the proposed and Gupta et al. schemes in the case of no attack for the Big Bunny video. The watermarks extracted from all video frames using the proposed scheme are of higher quality and have higher PSNR than those extracted using the Gupta et al. scheme. Figure 11b shows the PSNR of the extracted watermark versus noise ratio for both schemes for the Big Bunny video. Increasing the noise ratio decreases the PSNR for both schemes; however, the proposed scheme has higher PSNR than the Gupta et al. scheme regardless of the noise ratio. This improvement in PSNR is due to hiding part of the watermark in the HL sub-band of the DWT of the original video frames. Figure 11c and d show the PSNR of the extracted watermark versus rotation angle and shear value, respectively, for the Gupta et al. scheme and for the proposed scheme both with and without the proposed restore algorithm. From the graphs, it can be concluded that the proposed video watermarking scheme performs better than the Gupta et al. scheme under rotation and shear attacks regardless of the rotation angle and shear value. The proposed restore algorithm further improves the results of the proposed scheme, yielding higher PSNR for the extracted watermark. Using SURF feature points, the restore algorithm can determine the type of geometric attack and provide a good estimate of the rotation angle, scale ratio, and shear value.

Fig. 11
figure 11

PSNR of Extracted Watermark for Big Bunny Video versus (a) Frame Number, (b) Noise Ratio, (c) Rotation Angle, (d) Shear Value

Table 5 shows the normalized correlation (NC) of the extracted watermark for three test videos using both the proposed scheme and the Liu et al. scheme, without attacks, with common signal processing attacks such as median filtering, salt & pepper noise, Gaussian noise, MPEG4 compression, and cropping, and with geometric attacks such as scaling, rotation, and shear. It is clear that the NC of the watermarks extracted using the proposed scheme is higher than that of the Liu et al. scheme under all the previously mentioned attacks for all tested videos. Figure 12 shows the watermarks extracted from the Big Bunny video using the Liu et al. scheme under the same attacks. Their quality is very low compared to those of the proposed scheme shown in Fig. 10: the scrambling technique Liu et al. use in their scheme to improve security has noticeably degraded the quality of the extracted watermarks.

Table 5 NC of Extracted Watermark using Proposed and Liu et al. [20] Schemes for different types of attacks
Fig. 12
figure 12

Extracted Watermarks from Yang Liu [20] Scheme for Big Bunny after different attacks

6 Conclusions

This paper proposes a semi-blind, robust, and secure video watermarking scheme based on DWT, SVD, and the Laplacian pyramid. The main purpose of the proposed scheme is to improve the robustness of the watermark towards different types of attacks without deteriorating the quality of the watermarked video. The Laplacian pyramid has been used to separate the watermark image into a low-frequency watermark image and a high-frequency watermark image, each of which is hidden in the singular values (SVs) of a different DWT subband. The scheme was tested on five colored videos, three of which are benchmark videos, and the results obtained were compared to those of other schemes. From the experimental and comparative results, the watermarked videos obtained using the proposed scheme show very high watermark transparency in terms of the average peak signal-to-noise ratio (PSNR) of the watermarked frames. Watermarks extracted using the proposed scheme, without attacks occurring during transmission, are strongly correlated to the original watermark and have higher quality in terms of PSNR than those extracted using the Gupta et al. and Liu et al. schemes. Robustness of the proposed scheme towards common signal processing attacks, such as noise, filtering, and compression, and towards geometric attacks is higher than that of the Gupta et al. and Liu et al. schemes; an improvement of 4 to 20 dB in the quality of the extracted watermark is achieved using the proposed technique compared to that of Gupta et al.

Moreover, the proposed restore scheme, based on SURF, shows a significant improvement in performance in the case of scaling, rotation, and shear attacks. Only under the cropping attack may the comparative results vary, depending on the contents of the test video and the location where cropping occurs. Security of the proposed scheme has been achieved by applying a perceptual image hash function to the original watermark image during embedding for authentication purposes. If watermark authentication fails during extraction, the extraction process does not proceed. This security step guarantees that the owner's watermark is the one used in the extraction process, thus preventing false-positive problems.