1 Introduction

Digital media are spreading widely via the Internet. Common digital media include text, video, audio and images. However, digital media can easily be duplicated, counterfeited or tampered with by illegitimate users, so protecting their copyright and content has become an important concern in the digital world. With a large amount of digital media exchanged on the Internet, the protection of intellectual property rights is an increasingly important issue. Information hiding is generally used to conceal messages in cover media for copyright protection, for authentication, or to share secret messages [40, 7, 25]. Within information hiding, watermarking schemes are effective for copyright protection [10], whereas steganographic schemes are commonly used to share secret messages [17, 30]. However, regardless of whether a watermarking or a steganographic scheme is adopted, the cover media suffer some permanent distortion, even after the embedded messages have been extracted. In some applications, such as medical diagnosis, military imaging or law enforcement, even slight distortion is unacceptable, because it may lead to an incorrect decision. Digital watermarking is defined as the process of embedding a piece of multimedia data, called the watermark, into another piece of multimedia data, called the cover data, to protect the latter from misuse [11]. Digital watermarking systems have drawn increasing attention in recent years for the copyright protection and authentication of multimedia data.

Watermarking algorithms are generally classified into four types according to the data to be watermarked: (i) text, (ii) image, (iii) video and (iv) audio [27, 12, 24, 28]. Most research has dealt with image watermarking, and owing to advances in technology the focus is currently on video. Image watermarking algorithms are classified by embedding domain into spatial domain [44] and frequency or transform domain [37, 8] methods. In the spatial domain [13], the watermark is embedded by directly modifying a subset of the image pixels. Spatial domain watermarks can be visible or invisible, and invisible watermarking is mostly preferred for content authentication [35]. Invisible spatial watermarking can in turn be blind [26] or non-blind. The quality of a spatial watermark is assessed by (i) its embedding capacity, which can be increased by embedding different bits of the watermark image in different pixels of the cover image based on their color values [14]; (ii) its fidelity, which can be improved by embedding based on the prediction error sequence of the cover image, which matches the properties of the human visual system well [11]; and (iii) its robustness, which can be achieved by embedding the watermark pixels in the most valuable area of the image, called the Region of Interest (ROI) [41].

Previous work shows that the many available spatial domain algorithms fail to provide good robustness [34], although they are faster than transform domain techniques. Frequency domain watermarking was introduced to overcome these drawbacks. It transforms the digital multimedia content into multiple frequency bands using reversible transforms [21] and then embeds the watermark in the transformed coefficients, which are more robust to various image processing, video processing and geometric attacks. In the literature, researchers have discussed robust single watermarking systems using DFT only, DCT only, DWT only [23] or SVD only [49]; hybrid techniques such as DFT and Radon [33], DCT and SVD [55], DWT and SVD [19], DWT and Hilbert transforms [2], and DCT, SVD and DE [5]; and a multiple watermarking system using BAM in the DWT domain [3]. Although many transforms exist for image watermarking, DWT-based watermarking is strongly motivated by the DWT's good time-frequency localization and directional features, which match the Human Visual System (HVS) well.

Like spatial domain watermarking, transform domain watermarking can be visible or invisible. Almost all watermarking algorithms used for content authentication and copyright protection are invisible. In image watermarking, most research has focused on gray scale watermarking techniques. However, color images are the basic component of many multimedia applications, so color image watermarking is an important challenge for modern digital watermarking techniques.

Watermarking algorithms are generally judged by their robustness, imperceptibility and embedding capacity. A digital watermark is imperceptible if the original cover image and the watermarked image are perceptually indistinguishable. A digital watermark is robust if it resists a selected class of attacks. Robust watermarks may be used in copy protection applications to carry copy-control and access-control information. The capacity of a digital watermarking scheme is determined by the number of images that can be embedded in the cover image or video. In pursuing high embedding capacity, most proposed reversible data hiding schemes suffer from a large amount of auxiliary information, and the complexity of the watermarking system also increases.

Video watermarking differs from image watermarking in three ways. First, video signals are highly susceptible to pirate attacks such as interpolation, frame swapping, frame dropping and frame averaging, which have no counterpart in image watermarking. Second, ensuring the imperceptibility of the watermark is more difficult in video than in images, because the embedding procedure must take temporal variation into account, given the three-dimensional nature of video. Third, a video watermark can either be identical in every frame or independent for each frame. With an identical watermark, an attacker can collude frames from different scenes to extract the watermark [54, 16], which leads to the problem of maintaining statistical perceptual invisibility [9]. With independent watermarks, an attacker can exploit the motionless regions in successive video frames to remove the watermark by comparing and averaging the frames [50, 56]. The solution to this collusion and averaging problem, pointed out by Su et al. [48], is to embed an identical watermark in the motionless frames and different watermarks in the motion frames. Thus, two types of watermarks can be embedded in the same video.

Embedding watermarks in the detail coefficients of the wavelet transform increases robustness [29]. Niu and Sun [38] proposed a watermarking method that embeds the decomposed watermark into the decomposed video according to its decomposition level. A blind video watermarking scheme proposed by Serdean et al. [46] embeds in the wavelet domain using a Human Visual System (HVS) model and is invariant to geometric attacks such as rotation, cropping, scaling and shift. Barni et al. [6] proposed a robust watermarking scheme for raw video that alters the DFT coefficients of the brightness components of the frames to be marked. It is robust against JPEG compression, filtering, scaling, sharpening, rotation and cropping attacks. Vassaux et al. proposed an MPEG-4 based video watermarking scheme [52].

Contourlet transform based robust watermarking methods have also been reported in the literature [31, 22, 43, 45, 3]. Li et al. [31] proposed a contourlet transform based image watermarking algorithm in which scale-space feature based watermark synchronization is combined with the Non-Subsampled Contourlet Transform (NSCT) for embedding. However, this scheme handles gray scale images only, and the PSNR values of its watermarked images are between 40 and 45 dB. Haohao [22] proposed a YCbCr based watermarking scheme in which the watermarks are embedded in the largest detail sub bands of the contourlet coefficients. However, it is less robust against image processing and signal processing attacks, and the level of watermark detection after attacks is very low because the watermarks are embedded in the largest frequency sub bands. Rahimi et al. [43] presented an adaptive dual watermarking scheme for DICOM images that embeds the watermark bits in the singular value vectors of low pass contourlet sub bands, but it is less robust against salt and pepper noise and motion blur. Ranjbar et al. [45] proposed a blind and robust watermarking method with two embedding stages. In the first stage, the odd description of the image is divided into non-overlapping fixed-size blocks, and the signature (watermark) is embedded in the high frequency components of the contourlet transformed (CT) blocks. In the second stage, the signature is embedded in the low frequency components. However, this method is less resistant to median filtering, Gaussian noise, salt and pepper noise and JPEG compression attacks. Akhaee et al. [4] introduced a robust blind scheme and a non-blind multiplicative watermarking scheme in which the watermarks are embedded in the higher-energy directional sub bands, which represent the edges of the image, but it is less robust against compression and rotation attacks.

Based on the above, we focus on designing a robust video watermarking system that efficiently embeds a color image watermark for authentication and copyright protection, using bit plane slicing, the Contourlet Transform, the Discrete Wavelet Transform and Singular Value Decomposition. To increase robustness, imperceptibility and watermarking capacity, the color watermark is first sliced into 24 bit planes, and each slice is scrambled using the Arnold transform with a key. The contourlet transform is then applied to each color-converted cover frame to capture smooth contours, and the wavelet transform is applied to the result to obtain better multi-resolution bands. Next, the SVD of the selected DWT sub band is computed, and the SVD of each scrambled slice is embedded into the SVD of that DWT band of the cover video. The proposed scheme achieves a good level of watermark quality, with PSNR values greater than 68 dB. Because the watermark is embedded in both the low and high frequency DWT sub bands, the scheme is robust against image processing, geometric, temporal and multiple attacks, with high normalized correlation values and low bit error rates.

The paper is organized as follows. Section 2 reviews current systems and presents our contribution. Section 3 gives preliminary information. Section 4 describes the proposed work. Sections 5 and 6 present the experimental analysis and the comparison between conventional systems and the proposed approach. Section 7 concludes the paper and discusses future directions.

2 Review of current systems and our work contribution

In this section the existing approaches are reviewed and the contribution of our work is presented.

2.1 Review of existing systems

From the literature, the following weaknesses of the existing algorithms have been identified:

  (i) None of the existing algorithms provides robustness against all types of attacks, especially frame dropping in video watermarking.

  (ii) Most existing algorithms mainly use gray scale or binary images as watermarks, but such images are of relatively limited use today compared with color images.

  (iii) In most existing algorithms, the performance of the watermarking technique is evaluated in terms of robustness and imperceptibility only; most fail to consider the embedding capacity, i.e., the payload.

  (iv) Most algorithms embed an image watermark into a cover image, which often degrades imperceptibility, i.e., lowers the PSNR. In contrast, we embed color images into color video, so the degree of degradation can be controlled.

  (v) Most existing video watermarking algorithms embed the same image in all video frames. They can tolerate dropping up to N-1 of the N frames in a video, but mainly because of the large amount of duplication.

2.2 Our contribution

We have taken all the above mentioned points into consideration and developed a novel embedding scheme as follows:

  (i) Our approach tolerates a frame dropping rate of about 95 %, i.e., 23 out of 24 frames.

  (ii) We use a color image as the watermark and embed it into a color cover video.

  (iii) In addition to fidelity and robustness, we also consider payload as a metric. The embedding capacity is E = (N - Nm) / 24 images, where N is the number of frames in the video, Nm is the number of motion frames, and 24 is the number of bit plane slices, each embedded in one non-motion frame, needed for one color watermark. For example, if N = 150 frames and Nm = 6 motion frames, then E = (150 - 6) / 24 = 6 images.

  (iv) We embed the scrambled bit plane slices into the frames of the video, one slice per non-motion frame. This avoids duplicating the same content throughout the video and also offers a good level of imperceptibility and security.

  (v) The watermark slices are embedded in the hybrid contourlet and wavelet transform domain, which strengthens robustness and visual quality.

3 Preliminary concepts

3.1 Bit plane slicing

Every pixel in a color image is represented by 24 bits: 8 bits each for the red, green and blue components. In each component, the Most Significant Bits (MSBs), i.e., bits 7, 6, 5 and 4, contain the visually significant information of the image, whereas the Least Significant Bits (LSBs), i.e., bits 3, 2, 1 and 0, contribute only subtle details. Consequently, removing these lower-order bits does not noticeably affect human visual perception of the image. Representing an image by one or more of the bits of the byte used for each pixel is called bit plane slicing. A color image represented by 24 bits is therefore composed of 24 one-bit planes, ranging from bit plane 0 (least significant bit) to bit plane 7 (most significant bit) for each of the red, green and blue components. Here, we slice the color watermark (the VIT University logo) into 24 bit planes. Sample MSB bit planes of the red, green and blue components are shown in Fig. 1. These planes are embedded into consecutive non-motion frames of a cover video for copyright protection and ownership authentication.

Fig. 1

MSB bit plane slicing output for the color watermark image (vitlogo.bmp)
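The 24-plane slicing described above and its inverse can be sketched with NumPy as follows. The helper names and the plane ordering are assumptions for illustration. Red bits 0 to 7 come first, then green, then blue, so that consecutive groups of eight slices map to R, G and B as in the extraction steps of Section 4.2.

```python
import numpy as np


def slice_bit_planes(rgb: np.ndarray) -> list[np.ndarray]:
    """Split an H x W x 3 uint8 image into 24 binary planes.

    Order (assumed): R bit 0..7, G bit 0..7, B bit 0..7, where bit 7 is the MSB.
    """
    if rgb.dtype != np.uint8 or rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 uint8 image")
    return [((rgb[:, :, c] >> b) & 1).astype(np.uint8)
            for c in range(3) for b in range(8)]


def merge_bit_planes(planes: list[np.ndarray]) -> np.ndarray:
    """Inverse of slice_bit_planes: OR each plane back into its bit position."""
    h, w = planes[0].shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for idx, plane in enumerate(planes):
        c, b = divmod(idx, 8)                 # channel index, bit index
        out[:, :, c] |= plane.astype(np.uint8) << b
    return out
```

Keeping only bit planes 4 to 7 changes each component value by at most 15 (2^4 - 1). This is the sense in which the LSB planes carry only subtle detail.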

3.2 Shot boundary detection

Traditionally, a scene is a continuous sequence that is temporally and spatially interconnected in the real world. Shot boundary detection, the process of automatically detecting the boundaries between shots in a video, is one of the most fundamental video segmentation tasks. It has attracted much attention since video became available in digital form, and it is an essential pre-processing step for almost all video analysis, indexing, summarization, search and other content based applications. In the proposed work, a histogram correlation approach [42] is used for shot boundary detection. Here, we perform histogram difference based shot boundary detection for the test videos manwalk.avi and mountain.avi (see Fig. 2).

Fig. 2

Histogram difference based shot boundary detection for manwalk.avi and mountain.avi videos with a threshold value of t = 80
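Histogram difference based detection can be sketched as follows. This is an illustrative sketch, not the detector of [42]. It works on grayscale (for example, luminance) frames and compares each frame with the previous one. It expresses the histogram difference as a percentage of pixels (0 to 100) so that a threshold such as t = 80 has a concrete meaning. That normalization, the 64-bin histogram and the function names are all assumptions.

```python
import numpy as np


def gray_hist(frame: np.ndarray, bins: int = 64) -> np.ndarray:
    """Intensity histogram of an 8-bit grayscale frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist


def motion_frames(frames: list[np.ndarray], threshold: float = 80.0,
                  bins: int = 64) -> list[int]:
    """Indices of frames whose histogram differs from the previous frame's
    by more than `threshold` percent of the pixels (hypothetical metric)."""
    flagged = []
    prev = gray_hist(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_hist(frames[i], bins)
        # Half the L1 distance between histograms = number of pixels that
        # changed bin; scale to 0..100.
        diff = 50.0 * np.abs(cur - prev).sum() / frames[i].size
        if diff > threshold:
            flagged.append(i)
        prev = cur
    return flagged
```

Frames not flagged here would be the non-motion candidates used for embedding.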

3.3 Arnold transform

The Arnold transform, also called the cat face transform, applies a transformation that apparently randomizes the original arrangement of an image's pixels. However, if the transformation is iterated enough times, the original image eventually reappears. The number of iterations needed is known as the Arnold period. It depends on the image size, so different image sizes have different Arnold periods [39, 18]. The Arnold transform (Eq. 1) and its inverse (Eq. 2) apply only to square M × M digital images and are given as follows:

The Arnold Transform is:

$$ \left(\begin{array}{c}p'\\ q'\end{array}\right)=\left(\begin{array}{cc}1 & 1\\ 1 & 2\end{array}\right)\left(\begin{array}{c}p\\ q\end{array}\right) \bmod M $$
(1)

where,

p, q:

are the coordinates of the original image.

p′, q′:

are the coordinates of the scrambled image.

M:

is the height or width of the square image to be processed.

The Inverse Arnold Transform is:

$$ \left(\begin{array}{c}p\\ q\end{array}\right)=\left(\begin{array}{cc}2 & -1\\ -1 & 1\end{array}\right)\left(\begin{array}{c}p'\\ q'\end{array}\right) \bmod M $$
(2)

where,

p, q:

are the coordinates of the descrambled image.

p′, q′:

are the coordinates of the scrambled image.

M:

is the height or width of the square image to be processed.

Eq. 1 is applied to every pixel coordinate of the image. Once all coordinates have been transformed, one iteration of scrambling is complete. The transform is iterated a chosen number of times, and this iteration count is used as the secret key. The same key is used to descramble the image by applying the inverse transform (Eq. 2) the same number of times. Here, we applied the Arnold transform to the image Blue 8th plane.jpg with an Arnold key of 30. Fig. 3 shows the output of the Arnold transform.

Fig. 3

Arnold’s transformation a original image and b scrambled image
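The following is a minimal NumPy sketch of Eqs. 1 and 2, in which the number of iterations plays the role of the key (30 above). The function names are hypothetical. `arnold_period` finds the Arnold period by iterating the 2 × 2 matrix modulo M, which is much cheaper than iterating the image itself.

```python
import numpy as np


def arnold(img: np.ndarray, key: int) -> np.ndarray:
    """Scramble a square image by applying Eq. (1) `key` times."""
    m = img.shape[0]
    if img.ndim < 2 or img.shape[1] != m:
        raise ValueError("the Arnold transform needs an M x M image")
    p, q = np.indices((m, m))                         # original coordinates
    out = img.copy()
    for _ in range(key):
        nxt = np.empty_like(out)
        nxt[(p + q) % m, (p + 2 * q) % m] = out[p, q]   # (p', q') of Eq. (1)
        out = nxt
    return out


def inverse_arnold(img: np.ndarray, key: int) -> np.ndarray:
    """Descramble by applying Eq. (2) `key` times."""
    m = img.shape[0]
    ps, qs = np.indices((m, m))                       # scrambled coordinates
    out = img.copy()
    for _ in range(key):
        nxt = np.empty_like(out)
        nxt[(2 * ps - qs) % m, (qs - ps) % m] = out[ps, qs]   # (p, q) of Eq. (2)
        out = nxt
    return out


def arnold_period(m: int) -> int:
    """Smallest n > 0 with [[1, 1], [1, 2]]^n = I (mod m)."""
    base = np.array([[1, 1], [1, 2]], dtype=np.int64)
    ident = np.eye(2, dtype=np.int64) % m
    cur, n = base % m, 1
    while not np.array_equal(cur, ident):
        cur = (cur @ base) % m
        n += 1
    return n
```

Applying `arnold` a number of times equal to the period restores the original. Descrambling can therefore also be done with (period - key) forward iterations, as an alternative to Eq. 2.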

3.4 Contourlet transform

The contourlet transform is an efficient multi-scale directional transform developed by M.N. Do and Martin Vetterli [15]. It uses a double filter bank structure formed by combining two distinct, successive decomposition stages: a Laplacian Pyramid (LP) and a Directional Filter Bank (DFB). The LP performs the multi-scale decomposition, i.e., it decomposes an image into a number of detail (high frequency) sub bands and a low frequency approximation sub band. A DFB then performs directional decomposition on the detail sub bands. The discrete contourlet transform captures directional edges better than wavelets [15]. Various pyramid and directional filters are available; here, we use the "9-7" pyramid filter and the "pkva" directional filter. The schematic diagram of the contourlet transform is given in Fig. 4a, and its application to one frame of our video is shown in Fig. 4b.

Fig. 4

Contourlet transform a schematic diagram b CT on ‘mountain.avi’
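The Laplacian pyramid stage of the contourlet transform can be sketched in a few lines. This sketch deliberately simplifies the transform. A 2 × 2 block mean and nearest-neighbour interpolation stand in for the "9-7" pyramid filter. The "pkva" directional filter bank, which would split the detail band into directional sub bands, is not implemented. The sketch shows only the LP structure: a coarse approximation plus a full-size prediction-error detail band, with exact reconstruction. The function names are hypothetical.

```python
import numpy as np


def upsample(coarse: np.ndarray) -> np.ndarray:
    """Nearest-neighbour interpolation by a factor of 2 in each direction."""
    return np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)


def lp_level(img: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One Laplacian pyramid level: (coarse approximation, full-size detail)."""
    x = np.asarray(img, dtype=np.float64)
    h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("even image dimensions expected")
    # Low-pass and decimate by 2: a 2x2 block mean stands in for the 9-7 filter.
    coarse = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Predict the image from the coarse band; the prediction error is the detail.
    detail = x - upsample(coarse)
    return coarse, detail


def lp_reconstruct(coarse: np.ndarray, detail: np.ndarray) -> np.ndarray:
    """Exact inverse of lp_level: prediction plus prediction error."""
    return upsample(coarse) + detail
```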

3.5 Discrete wavelet transform

The discrete wavelet transform is an efficient and powerful tool for the multiresolution analysis of an image. At high frequencies, the wavelet transform gives good time resolution and poor frequency resolution, whereas at low frequencies it gives good frequency resolution and poor time resolution. In the DWT, an image signal is analyzed by passing it through an analysis filter bank followed by decimation. A one-level decomposition yields four sub bands, LLa1, LHd1, HLd1 and HHd1 (see Fig. 5a). A two-level decomposition (see Fig. 5b) further splits LLa1 into LLa2, LHd2, HLd2 and HHd2, giving seven sub bands in total.

Fig. 5

a 1-level DWT decomposition b 2-level DWT decomposition c 1-level decomposition of an image frame of ‘mountain.avi’ d 2-level decomposition of ‘mountain.avi’

Here, we use a biorthogonal wavelet, because the decomposition and reconstruction filters of a biorthogonal transform are obtained from two distinct scaling functions, each the dual of the other. This helps to achieve better robustness and invisibility than other wavelet bases [47]. It also provides good embedding capacity when used to decompose the image into different channels [36].
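Below is a one-level 2-D DWT and its inverse, applied twice as in the two-level decomposition above. The sketch uses the Haar wavelet, the simplest biorthogonal wavelet (bior1.1), as a stand-in for the biorthogonal filters used in this work, whose exact order is not stated here. The function names are hypothetical. Which detail band is called LH and which HL varies between texts.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One level of an orthonormal 2-D Haar DWT: returns (ll, lh, hl, hh)."""
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("even dimensions expected")
    a, b = x[0::2, 0::2], x[0::2, 1::2]     # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]     # bottom-left, bottom-right
    ll = (a + b + c + d) / 2                # low-pass in both directions
    lh = (a - b + c - d) / 2                # high-pass horizontally
    hl = (a + b - c - d) / 2                # high-pass vertically
    hh = (a - b - c + d) / 2                # high-pass in both directions
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


# Two levels: decompose, then decompose the level-1 approximation again.
frame = np.random.default_rng(5).random((32, 32))
ll1, lh1, hl1, hh1 = haar_dwt2(frame)
ll2, lh2, hl2, hh2 = haar_dwt2(ll1)
```

The orthonormal Haar transform preserves signal energy. A two-level decomposition of a 32 × 32 frame gives 8 × 8 level-2 sub bands.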

3.6 Singular value decomposition

The Singular Value Decomposition (SVD) is a powerful technique in many matrix computations and analyses. Computing with the SVD of a matrix rather than the original matrix has the advantage of being more robust to numerical errors. The singular vectors give orthonormal bases for the row and column spaces of the matrix, and the singular values quantify how much the matrix stretches vectors along each of these basis directions. Many fundamental aspects of linear algebra rely on determining the rank of a matrix, which equals the number of nonzero singular values. This makes the SVD an important and widely used technique [49].

For any image X, its SVD is given by

$$ X=A\;B\;{C}^T $$
(3)

where,

X:

m × n matrix

A, C:

orthonormal matrices

B:

diagonal matrix of the singular values of X
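In NumPy, `numpy.linalg.svd` returns three things: the left singular vectors, the singular values as a 1-D array in descending order, and the transpose of the right singular vectors. Mapping it onto the notation of Eq. 3 therefore needs a `diag` and a transpose. The helper name `svd_abc` is hypothetical.

```python
import numpy as np


def svd_abc(x: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (A, B, C) with X = A @ B @ C.T, matching the notation of Eq. (3)."""
    a, s, ct = np.linalg.svd(np.asarray(x, dtype=np.float64), full_matrices=False)
    # NumPy returns C^T ("Vh"); transpose it back and put s on a diagonal.
    return a, np.diag(s), ct.T
```

By Weyl's inequality, each singular value changes by at most the spectral norm of a perturbation added to X. This stability is one reason singular values are a popular place to embed watermarks.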

4 Proposed work

The proposed video watermarking scheme is based on bit plane slicing, shot boundary detection, the Arnold transform, the contourlet transform, the discrete wavelet transform and singular value decomposition. Several existing works on watermarking use DWT and SVD [1, 19], the contourlet transform [45] and the Arnold transform [5, 19]. The novelty of our work is as follows. We use a hybrid of the DWT and the contourlet transform to gain the advantages of both transforms. To improve imperceptibility, the bit plane slices of the watermark image are embedded instead of the watermark image itself, and the slices are scrambled with an Arnold transform key before embedding. In addition, an eigenvector is generated for the watermark image from its covariance matrix and maximum eigenvalue. These two features, the scrambled slices and the eigenvector, are embedded in the mid frequency coefficients of the DWT of the contourlet transformed non-motion frames of the cover video, which gives the method high robustness. The scheme also provides two levels of authentication. The first descrambles the extracted watermark slices with the valid key. The second compares the extracted eigenvector with the one regenerated from the extracted slices. If they match, we assume that no alteration has occurred; otherwise, alteration is assumed. Because the slices and the eigenvector are embedded only in non-motion frames (i.e., similar content in similar frames), the scheme counters collusion attacks, which are common against video watermarks. The scheme has two stages, embedding and detection cum extraction, which are described step by step in the following sections:

4.1 Embedding process

The embedding process embeds scrambled slices into successive non-motion frames of the cover video in a hybrid transform domain. It consists of three steps: (i) cover video pre-processing, (ii) watermark image pre-processing and (iii) embedding.

Cover video pre-processing: The color cover video is converted into frames, and the non-motion frames are extracted using a histogram difference based scene change detection algorithm. These non-motion frames are used to embed the similar watermarks, i.e., the scrambled slices and the generated eigenvector matrix. Each non-motion frame is converted from RGB to YCbCr, and the contourlet transform is applied to its Y component. The DWT is then applied to the low frequency sub band of the contourlet output.

Watermark image pre-processing: The color watermark image is sliced into 24 bit planes using bit plane slicing, and the slices are scrambled using the Arnold transform with key K (30 in our case). In addition, a relevance vector of the watermark image is generated from its covariance matrix and eigenvalues: the eigenvector corresponding to the maximum eigenvalue. These two features are used in the embedding process.

Watermark embedding: The scrambled watermark slices and the generated eigenvector are embedded into the identified non-motion frames of the cover video using a hybrid transform. After the contourlet transform has been applied to the chosen cover video frames, the DWT is applied to the low frequency sub band. The watermark slices and the generated vector are then embedded in the mid frequency LH and HL DWT sub bands, respectively. Embedding in this hybrid domain helps the algorithm achieve both visual quality and robustness. The process is explained in detail as follows:

  • Step 1: Choose an appropriate color watermark ‘w’ and a color video as a cover video ‘cv’.

  • Step 2: Perform bit plane slicing on the color watermark image. Because the image is in color, each of the R, G and B components yields 8 slices, giving 24 slices in total, denoted bitplanew1, bitplanew2, …, bitplanew24.

  • Step 3: Scramble each of the generated slices using Arnold Transformation Key K as,

    $$ {E}_K\left(\mathrm{bitplane}_{w1},\ \mathrm{bitplane}_{w2},\ \dots,\ \mathrm{bitplane}_{w24}\right) $$
    (4)
  • Step 4: Divide the RGB cover video 'cv' into RGB frames cvf1, cvf2, …, cvfn, where n is the number of frames.

  • Step 5: Apply the shot boundary detection algorithm of Section 4.3 to all extracted frames:

    $$ \mathrm{motion\ frames}=\mathrm{scenechangedetection}\left(cv{f}_i\right) $$

    where i = 1, 2, …, n.

  • Step 6: Skip the motion frames identified in Step 5 and select only the first 24 non-motion frames of the cover video, in which the 24 watermark slices will be embedded. Denote these non-motion frames cvf1′, cvf2′, …, cvf24′.

  • Step 7: Convert each RGB frame cvf1′, cvf2′, …, cvf24′ into YCbCr form:

    $$ \left.\begin{array}{l}\left[{f}_{1y}\ {f}_{1cb}\ {f}_{1cr}\right]=\mathrm{rgb2ycbcr}\left(cv{f}_1'\right)\\ \left[{f}_{2y}\ {f}_{2cb}\ {f}_{2cr}\right]=\mathrm{rgb2ycbcr}\left(cv{f}_2'\right)\\ \vdots \\ \left[{f}_{24y}\ {f}_{24cb}\ {f}_{24cr}\right]=\mathrm{rgb2ycbcr}\left(cv{f}_{24}'\right)\end{array}\right\} $$
    (5)
  • Step 8: Apply a 1-level Contourlet Transform (CT) to the luminance component f1y of the first cover video frame:

    $$ \left[LL,\left[D 1,\ D 2,\ D 3,\ D 4\right]\right]=CT\left({f}_{1y}\right) $$
    (6)
  • Step 9: Apply 2-level DWT on the LL component of Step 8 as,

    $$ \left[ll 1,lh 1, hl 1,hh 1\right]=DWT(LL) $$
    (7)
    $$ \left[ll 2,lh 2, hl 2,hh 2\right]=DWT\left(ll 1\right) $$
    (8)
  • Step 10: Compute the SVD of the lh2 and hl2 components of the DWT output obtained in Step 9:

    $$ \left[{A}_c,{B}_c,{C}_c\right]=SVD\left(lh 2\right) $$
    (9)
    $$ \left[{A}_d,{B}_d,{C}_d\right]=SVD\left( hl 2\right) $$
    (10)

    where Bc and Bd denote the singular values of lh2 and hl2, respectively, and Ac, Cc and Ad, Cd are orthogonal matrices.

  • Step 11: Compute the eigenvector V of the watermark image w as follows:

    (i) Compute the zero-mean matrix M of the watermark image W:

      $$ M=W-m $$
      (11)

      where m is the mean of W.

    (ii) Calculate the covariance matrix:

      $$ CM=M\times {M}^T $$
      (12)

    (iii) Determine the eigenvalues γi and eigenvectors φi of the covariance matrix CM.

    (iv) Choose the maximum eigenvalue and its corresponding eigenvector, denoted γmax and φmax, respectively:

    $$ V={\varphi}_{\max} $$
    (13)
  • Step 12: Compute the SVD of each scrambled watermark slice (from Step 3):

    $$ \left[{A}_{wi},{B}_{wi},{C}_{wi}\right]=SVD\left({E}_K\left(\mathrm{bitplane}_{wi}\right)\right) $$
    (14)

    where i = 1, 2, …, 24.

  • Step 13: Calculate the new singular values Bci′ by adding to the cover video singular values Bci (of lh2) the watermark slice's principal component (Awi × Bwi), and Bdi′ by adding to the cover video singular values Bdi (of hl2) the eigenvector V, each multiplied by the robustness factor αi:

    $$ {B}_{ci}'={B}_{ci}+{\alpha}_i\left({A}_{wi}\times {B}_{wi}\right) $$
    (15)
    $$ {B}_{di}'={B}_{di}+{\alpha}_i V $$
    (16)

    where αi is the robustness factor for the i-th frame.

  • Step 14: Reconstruct the DWT coefficients lh2′ and hl2′ from the new singular values Bci′ and Bdi′ by inverse SVD:

    $$ lh2'={A}_{ci}\times {B}_{ci}'\times {C}_{ci}^T $$
    (17)
    $$ hl2'={A}_{di}\times {B}_{di}'\times {C}_{di}^T $$
    (18)
  • Step 15: Obtain the modified contourlet low frequency sub band LL′ using the 2-level inverse DWT:

    $$ ll1'=\mathrm{idwt}\left(ll2,lh2',hl2',hh2\right) $$
    (19)
    $$ LL'=\mathrm{idwt}\left(ll1',lh1,hl1,hh1\right) $$
    (20)
  • Step 16: Obtain the watermarked frame's Y component cf1y′ using the inverse contourlet transform:

    $$ c{f}_{1y}'=\mathrm{ict}\left(LL',\left[D1,D2,D3,D4\right]\right) $$
    (21)
  • Step 17: Construct the YCbCr frame from the modified luminance component cf1y′ (Eq. 21) and the unmodified chrominance components f1cb and f1cr (Eq. 5).

  • Step 18: Convert the resulting YCbCr frame of Step 17 to an RGB frame.

  • Step 19: Repeat Steps 6 to 18 for the remaining 23 frames.

  • Step 20: Combine all embedded frames 'WF' with the motion frames set aside before embedding, in their original order, to obtain the watermarked video 'wv'.
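The singular value modification of Steps 10 to 14 can be sketched for a single sub band (the lh2 path only). This is a sketch under stated assumptions, not the authors' code. The scrambled slice is assumed to be square and the same size as lh2. The function names are hypothetical. The detector is assumed to keep Ac, Bc, Cc and the slice's Cw, which makes the sketch non-blind. Because Bci′ in Eq. 15 is no longer diagonal, a fresh SVD of the marked band (as in Eqs. 26 and 28 of Section 4.2) would not return it. The sketch therefore substitutes a different step: it recovers Bci′ by projecting the marked band onto the stored singular vectors.

```python
import numpy as np


def embed_band(band, wm_slice, alpha):
    """Eqs. (9), (14), (15), (17) for one square sub band such as lh2."""
    band = np.asarray(band, dtype=np.float64)
    wm = np.asarray(wm_slice, dtype=np.float64)
    if band.shape != wm.shape or band.shape[0] != band.shape[1]:
        raise ValueError("sketch assumes a square slice the same size as the band")
    a_c, s_c, ct_c = np.linalg.svd(band)          # band  = A_c B_c C_c^T
    a_w, s_w, ct_w = np.linalg.svd(wm)            # slice = A_w B_w C_w^T
    b_new = np.diag(s_c) + alpha * (a_w @ np.diag(s_w))   # Eq. (15)
    marked = a_c @ b_new @ ct_c                   # Eq. (17), with C_c^T
    side_info = (a_c, s_c, ct_c, ct_w)            # kept by the detector (assumed)
    return marked, side_info


def extract_band(marked, side_info, alpha):
    """Recover the (still scrambled) slice from a marked band."""
    a_c, s_c, ct_c, ct_w = side_info
    # Project onto the stored singular vectors to get B_c' back
    # (a substitute for a fresh SVD, since B_c' is not diagonal).
    b_new = a_c.T @ np.asarray(marked, dtype=np.float64) @ ct_c.T
    pc_w = (b_new - np.diag(s_c)) / alpha         # A_w B_w, cf. Eq. (28)
    return pc_w @ ct_w                            # A_w B_w C_w^T, cf. Eq. (30)
```

Because the singular vectors are orthogonal, the change to the band has a Frobenius norm of exactly α times the Frobenius norm of the slice. The factor α therefore directly trades imperceptibility against robustness.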

4.2 Detector and extraction process

The detection and extraction process reverses the embedding. It involves (i) watermarked video pre-processing and detection, (ii) extraction and (iii) watermark post-processing.

Watermarked video pre-processing: The received watermarked video is converted into frames, and the non-motion frames are identified. To verify the presence of the watermark, the correlation between the non-motion frames of the watermarked video and those of the original video is computed and compared with a predefined threshold. If the correlation exceeds the threshold, the watermark is declared present; otherwise, no watermark is declared.

Extraction: The scrambled slices and the eigenvector V′ are extracted from the watermarked content.

Watermark post-processing: The authorized user descrambles the extracted slices with the appropriate Arnold key, and the slices are grouped together to form the watermark image. To verify the authenticity of the content, the eigenvector V″ is computed from the regenerated image and compared with the extracted vector V′. If they match, the content is assumed unchanged; otherwise, it is assumed to have been altered. The process is explained in detail as follows:

  • Step 1: Let us designate the watermarked video which is received as ‘wv’.

  • Step 2: Divide the RGB watermarked video 'wv' into RGB frames wvf1, wvf2, …, wvfn, where n is the number of frames.

  • Step 3: Repeat Steps 5 and 6 of the embedding algorithm to obtain the first 24 non-motion frames wvf1′, wvf2′, …, wvf24′.

  • Step 4: Convert each RGB frame wvf1′, wvf2′, …, wvf24′ into YCbCr form:

    $$ \left.\begin{array}{l}\left[x{f}_{1y}\ x{f}_{1cb}\ x{f}_{1cr}\right]=\mathrm{rgb2ycbcr}\left(wv{f}_1'\right)\\ \left[x{f}_{2y}\ x{f}_{2cb}\ x{f}_{2cr}\right]=\mathrm{rgb2ycbcr}\left(wv{f}_2'\right)\\ \vdots \\ \left[x{f}_{24y}\ x{f}_{24cb}\ x{f}_{24cr}\right]=\mathrm{rgb2ycbcr}\left(wv{f}_{24}'\right)\end{array}\right\} $$
    (22)
  • Step 5: Apply a 1-level CT to the luminance component xf1y of the first watermarked video frame:

    $$ \left[wLL,\left[wD 1,wD 2,wD 3,wD 4\right]\right]=CT\left(x{f}_{1y}\right) $$
    (23)
  • Step 6: Apply a 2-level DWT to the wLL subband of the CT output as,

    $$ \left[wll_1, wlh_1, whl_1, whh_1\right] = DWT\left(wLL\right) $$
    (24)
    $$ \left[wll_2, wlh_2, whl_2, whh_2\right] = DWT\left(wll_1\right) $$
    (25)
  • Step 7: Apply SVD to the $wlh_2$ and $whl_2$ subbands obtained in Step 6:

    $$ \left[A_c', B_c'', C_c'\right] = SVD\left(wlh_2\right) $$
    (26)
    $$ \left[A_d', B_d'', C_d'\right] = SVD\left(whl_2\right) $$
    (27)

    where $B_c''$ and $B_d''$ are the singular values of $wlh_2$ and $whl_2$, respectively.

  • Step 8: Extract the singular values of watermark slice 1, and the eigenvector V′, using the following equations,

    $$ A_{w1}'B_{w1}' = \left(B_{c1}'' - B_{c1}'\right)/\alpha_1 $$
    (28)
    $$ V' = \left(B_{d1}'' - B_{d1}'\right)/\alpha_1 $$
    (29)
  • Step 9: Apply inverse SVD to the extracted principal component $A_{w1}'B_{w1}'$ to obtain scrambled watermark slice 1,

    $$ ext{w}_1 = A_{w1}' \times B_{w1}' \times C_{w1}^T $$
    (30)
  • Step 10: Apply the Arnold key K′ to $ext{w}_1$ to obtain the descrambled watermark slice 1 as,

    $$ {D}_K\left(ext{w}_1\right) $$
    (31)
  • Step 11: Repeat Steps 5 to 10 on the remaining non-motion frames to obtain the other 23 slices.

  • Step 12: Group slices 1 to 8, 9 to 16 and 17 to 24 to form the R, G and B components of the watermark image, respectively, and combine the three components to obtain the watermark image.

  • Step 13: Compute the eigenvector V″ of the watermark image obtained in Step 12.

  • Step 14: Compare the extracted and recomputed eigenvectors:

    $$ C = V' - V'' $$
    (32)

    If C = 0, the content is authenticated; otherwise, it is unauthenticated.
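The post-processing in Steps 8 to 14 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical; the detector is assumed to already hold the pre-embedding singular values $B_{c1}'$ and the watermark's right singular vectors $C_{w1}$, which Eqs. (28) and (30) require but the watermarked video does not carry; the Arnold key is assumed to be the number of cat-map iterations; slices are assumed to be ordered from least to most significant bit within each color plane; and the exact test C = 0 of Step 14 is relaxed to a small numerical tolerance. How V″ is computed in Step 13 is not specified in the text, so it is not shown.

```python
import numpy as np


def arnold(img: np.ndarray, iterations: int) -> np.ndarray:
    """Scramble a square image with the Arnold cat map (x, y) -> (x + y, x + 2y) mod n."""
    n = img.shape[0]
    if img.shape[:2] != (n, n):
        raise ValueError("the Arnold transform needs a square image")
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img.copy()
    for _ in range(iterations):
        nxt = np.empty_like(out)
        nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]  # move every pixel to its new position
        out = nxt
    return out


def arnold_inverse(img: np.ndarray, iterations: int) -> np.ndarray:
    """Undo `arnold` (assumed key = iteration count): read pixel (x, y) back from (x + y, x + 2y) mod n."""
    n = img.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img.copy()
    for _ in range(iterations):
        out = out[(x + y) % n, (x + 2 * y) % n]
    return out


def extract_slice(b_marked, b_orig, alpha, c_w):
    """Eqs. (28) and (30): recover the principal component, then rebuild the binary slice."""
    principal = (b_marked - b_orig) / alpha  # A_w1' B_w1' = (B_c1'' - B_c1') / alpha_1
    slice_est = principal @ c_w.T            # extw_1 = A_w1' B_w1' C_w1^T
    return np.clip(np.rint(slice_est), 0, 1).astype(np.uint8)  # a bit plane holds only 0/1


def bit_planes(channel: np.ndarray) -> list:
    """Split an 8-bit channel into 8 binary planes, least significant bit first (assumed order)."""
    return [(channel >> k) & 1 for k in range(8)]


def from_bit_planes(planes) -> np.ndarray:
    """Inverse of `bit_planes`: weight plane k by 2**k and sum."""
    acc = np.zeros(planes[0].shape, dtype=np.uint16)
    for k, plane in enumerate(planes):
        acc += plane.astype(np.uint16) << k
    return acc.astype(np.uint8)


def regroup_watermark(slices) -> np.ndarray:
    """Step 12: slices 1-8 form R, 9-16 form G, 17-24 form B."""
    if len(slices) != 24:
        raise ValueError("expected 24 bit-plane slices")
    channels = [from_bit_planes(slices[c * 8:(c + 1) * 8]) for c in range(3)]
    return np.stack(channels, axis=-1)


def is_authentic(v_extracted, v_recomputed, tol=1e-6) -> bool:
    """Step 14: C = V' - V''; treat |C| <= tol in every entry as C = 0."""
    c = np.asarray(v_extracted, dtype=float) - np.asarray(v_recomputed, dtype=float)
    return bool(np.all(np.abs(c) <= tol))
```

In use, `extract_slice` would be called once per non-motion frame (Step 11), each result passed through `arnold_inverse`, and the 24 descrambled slices handed to `regroup_watermark`.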

4.3 Shot boundary detection (motion frames=scenechangedetection(cv))

  • Input: individual frames of a video

  • Output: identifying motion frames

  • Step 1: Initialize the first (reference) frame as $cvf_1'$ and the next (current) frame as $cvf_k'$, where $k = 2, 3, 4, \ldots, 24$.

  • Step 2: Calculate the histogram of every frame,

    $$ Hist\left(cvf_i\right) $$
    (33)

    where $i = 1, 2, 3, \ldots, 24$.

  • Step 3: Correlate the histogram of the first frame with that of every other frame in the video:

    $$ p_{\mathrm{diff}\,y} = \mathrm{correlation}\left(Hist\left(cvf_1'\right), Hist\left(cvf_k'\right)\right) $$
    (34)

    where $k = 2, \ldots, n$ and $y = 1, 2, \ldots, n-1$.

  • Step 4: Set count = 1 and choose a threshold T. While count <= total number of frames, examine the frames $cvf_y'$, $y = 1, 2, \ldots, n-1$, in turn:

    $$ \text{if } p_{\mathrm{diff}\,y} \le T \text{ then } cvf_y' \text{ is suitable for embedding; otherwise, it is not suitable} $$
    (35)

    After each comparison, count = count + 1.
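The histogram-based frame classification of Steps 1 to 4 can be sketched as below. It is a minimal NumPy illustration with hypothetical function names, assuming 8-bit single-channel frames (for example, the luminance channel). It also assumes that p_diff is one minus the histogram correlation, so that the test p_diff ≤ T keeps frames that resemble the first frame; this reading is ours, chosen to match the stated aim of embedding only in non-motion frames.

```python
import numpy as np


def gray_histogram(frame: np.ndarray, bins: int = 256) -> np.ndarray:
    """Histogram of an 8-bit frame (Step 2, Eq. 33)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist.astype(float)


def hist_correlation(h1: np.ndarray, h2: np.ndarray) -> float:
    """Pearson correlation between two histograms."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0:
        return 1.0 if np.array_equal(h1, h2) else 0.0
    return float(np.sum(a * b) / denom)


def non_motion_frames(frames, threshold: float) -> list:
    """Indices of frames whose histogram stays close to that of the first frame.

    ASSUMPTION: p_diff = 1 - correlation, so a small value means the frame
    resembles the reference frame and is kept for embedding.
    """
    ref = gray_histogram(frames[0])
    keep = []
    for k in range(1, len(frames)):
        p_diff = 1.0 - hist_correlation(ref, gray_histogram(frames[k]))
        if p_diff <= threshold:
            keep.append(k)
    return keep
```

Following Step 3 of the extraction procedure, only the first 24 indices returned would then be used, e.g. `non_motion_frames(frames, T)[:24]`.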

Figure 6 shows the general block diagram of the proposed approach described in the steps above.

Fig. 6

General block diagram of proposed approach

5 Experimental analysis

The performance of the proposed method is measured in terms of imperceptibility, robustness and embedding capacity. Six sample cover videos of the same dimensions and a color watermark image of the VIT University logo were used for the evaluation. Figure 7 shows the sample videos, ‘mountain.avi’, ‘rhinos.avi’, ‘walk.avi’, ‘suzie.avi’, ‘foreman.avi’ and ‘mother_daughter.avi’, together with the watermark image ‘vitlogo.bmp’.

Fig. 7

Test videos a mountain.avi b rhinos.avi c walk.avi d suzie.avi e foreman.avi f mother_daughter.avi and g Watermark Image (vitlogo.bmp)

Figure 8 shows the watermarked videos (‘moun.avi’, ‘rhin.avi’, ‘wal.avi’, ‘suz.avi’, ‘forem.avi’ and ‘motdau.avi’) and the extracted watermark image (‘vitlog.bmp’). The following parameters were used for embedding and extraction. The scaling factor α is adaptive: it is chosen based on the input frame to improve robustness, so its value varies from one non-motion frame to another. Biorthogonal filter coefficients were used for the 2-level wavelet decomposition.

Fig. 8

Watermarked videos a moun.avi b rhin.avi c wal.avi d suz.avi e forem.avi f motdau.avi and g extracted watermark image (vitlog.bmp)

5.1 Quality metrics

The proposed algorithm is evaluated against the three requirements of watermarking: imperceptibility, robustness and embedding capacity. Imperceptibility is measured with the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM), and robustness with the Normalized Correlation Coefficient (NCC) and the Bit Error Rate (BER). For capacity, we define a new metric based on the number of images that can be embedded into the entire cover video.

5.1.1 Visual perception or transparency or imperceptibility

PSNR and SSIM are used as common metrics to evaluate the degradation caused by various attacks. A PSNR greater than 20 dB is considered acceptable [51], and SSIM values range from 0 (no match) to 1 (exact match).

$$ PSNR=10\ { \log}_{10}\left(\frac{255^2}{MSE}\right) $$
(36)

where the Mean Square Error (MSE) between the cover video frame cvf(t) and the attacked watermarked video frame awvf(t) is defined as,

$$ MSE=\frac{1}{T}\left({\displaystyle \sum_{t=1}^T{\left(cvf(t)- awvf(t)\right)}^2}\right) $$
(37)

where T is the total number of pixels per frame and t indexes the pixels.
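Equations (36) and (37) translate directly into code. The sketch below assumes 8-bit frames (peak value 255) and returns infinity for identical frames, where the MSE is zero; the function names are illustrative. A per-video figure, such as the average PSNR reported later, would presumably be obtained by averaging the per-frame values.

```python
import numpy as np


def mse(cover: np.ndarray, attacked: np.ndarray) -> float:
    """Eq. (37): mean of squared pixel differences over the T pixels of a frame."""
    diff = cover.astype(np.float64) - attacked.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(cover: np.ndarray, attacked: np.ndarray, peak: float = 255.0) -> float:
    """Eq. (36): PSNR in dB, assuming 8-bit frames (peak value 255)."""
    err = mse(cover, attacked)
    if err == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / err))
```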

The SSIM between the cover video frame cvf(t) and the attacked watermarked video frame avf(t) is given as,

$$ SSIM\left(cvf(t), avf(t)\right) = \frac{\left(2L_{cvf(t)}L_{avf(t)}\right)\left(2V_{cvf(t)avf(t)}\right)}{\left(L_{cvf(t)}^2 + L_{avf(t)}^2\right)\left(V_{cvf(t)}^2 + V_{avf(t)}^2\right)} $$
(38)

where $L_{cvf(t)}$ and $L_{avf(t)}$ are the luminance terms of the two frames, i.e., the means of cvf(t) and avf(t); $V_{cvf(t)}$ and $V_{avf(t)}$ are the contrast terms, i.e., the standard deviations of cvf(t) and avf(t); and $V_{cvf(t)avf(t)}$ is the structure term, i.e., the covariance between cvf(t) and avf(t).
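Equation (38) can be evaluated over a whole frame as sketched below. The standard SSIM formulation additionally adds small stabilizing constants to the luminance and contrast terms and computes the index over local windows before averaging; Eq. (38) omits the constants, so here they default to zero and can be supplied optionally. The function name is hypothetical.

```python
import numpy as np


def ssim_global(cover: np.ndarray, attacked: np.ndarray, c1: float = 0.0, c2: float = 0.0) -> float:
    """Eq. (38) evaluated once over whole frames.

    With c1 = c2 = 0 this is Eq. (38) as written; c1 = (0.01 * 255) ** 2 and
    c2 = (0.03 * 255) ** 2 give the stabilised form commonly used for 8-bit images.
    """
    x = cover.astype(np.float64).ravel()
    y = attacked.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()         # luminance terms L
    vx, vy = x.std(), y.std()           # contrast terms V
    cov = np.mean((x - mx) * (y - my))  # covariance term V_xy
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx ** 2 + vy ** 2 + c2)
    return float(num / den)
```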

5.1.2 Robustness

The robustness of the embedded watermark against various attacks is measured with the Normalized Correlation Coefficient (NCC) and the Bit Error Rate (BER). Both metrics are interpreted on a scale from 0 to 1, as described below.

  (i) NCC: If the two images, i.e., the original watermark and the watermark extracted after the attack, are identical or highly correlated, the NCC is close to 1; if they are uncorrelated, it is close to 0.

    The correlation coefficient is computed as,

    $$ NCC=\frac{{\displaystyle \sum \left(\left(O{W}_i-O{W}_m\right)\left(E{W}_i-E{W}_m\right)\right)}}{\sqrt{{\displaystyle \sum {\left(O{W}_i-O{W}_m\right)}^2}}\sqrt{{\displaystyle \sum {\left(E{W}_i-E{W}_m\right)}^2}}} $$
    (39)

    where $OW_i$ and $EW_i$ are the intensities of the ith pixel in the original and extracted watermarks, respectively, and $OW_m$ and $EW_m$ are the mean intensities of the original and extracted watermarks.

  (ii) BER: The ratio of wrongly extracted watermark bits to the total number of embedded watermark bits. If there is no error in the received message, the BER is 0; it grows toward 1 as more bits are extracted wrongly.

    It can be computed using the equation as,

    $$ BER\left(OW,EW\right)=\frac{{\displaystyle \sum_{i=1}^m\left|O{W}_i-E{W}_i\right|}}{m} $$
    (40)

    where $OW_i$ and $EW_i$ are the ith bits of the original and extracted watermarks, respectively, and m is the total number of embedded watermark bits.
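Equations (39) and (40) can be sketched as follows. The function names are illustrative, and the BER inputs are assumed to be the embedded and extracted bit sequences (for example, the binary bit plane slices). Eq. (39) is a Pearson-type correlation, so it can become negative (down to -1) for an inverted watermark.

```python
import numpy as np


def ncc(original: np.ndarray, extracted: np.ndarray) -> float:
    """Eq. (39): mean-removed normalized correlation of two watermark images."""
    ow = original.astype(np.float64).ravel()
    ew = extracted.astype(np.float64).ravel()
    a, b = ow - ow.mean(), ew - ew.mean()
    denom = np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2))
    if denom == 0.0:
        raise ValueError("NCC is undefined for a constant image")
    return float(np.sum(a * b) / denom)


def ber(original_bits: np.ndarray, extracted_bits: np.ndarray) -> float:
    """Eq. (40): fraction of the m embedded bits that were extracted wrongly."""
    ow = np.asarray(original_bits).astype(np.int64).ravel()  # int64 avoids uint8 wrap-around
    ew = np.asarray(extracted_bits).astype(np.int64).ravel()
    if ow.size != ew.size:
        raise ValueError("bit sequences must have the same length m")
    return float(np.sum(np.abs(ow - ew)) / ow.size)
```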

5.1.3 Payload

In our approach, bit plane slicing is applied to a color watermark image to obtain 24 bit plane slices, i.e., 8 slices for each of the R, G and B planes. Since each slice is embedded in one non-motion frame, a color cover video with N frames can carry at most (N − number of motion frames)/24 color watermark images; this quantity is the payload.
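As a worked example of the payload formula, the sketch below assumes that a partial set of 24 slices cannot carry an image, so the division is rounded down. The function name, the video length and the motion-frame count in the example are hypothetical.

```python
def payload_images(n_frames: int, n_motion_frames: int, slices_per_image: int = 24) -> int:
    """Whole color watermark images that fit: (N - motion frames) / 24, rounded down."""
    if not 0 <= n_motion_frames <= n_frames:
        raise ValueError("motion-frame count must lie between 0 and N")
    return (n_frames - n_motion_frames) // slices_per_image


# Hypothetical example: a 300-frame video in which 12 frames contain motion.
print(payload_images(300, 12))  # 12
```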

5.2 Attacks

The fidelity and robustness of the proposed approach are validated on the sample videos (mountain.avi, rhinos.avi and walkingman.avi) under the following conditions: (i) no attack, (ii) image processing attacks, (iii) geometrical attacks, (iv) temporal attacks and (v) multiple attacks.

5.2.1 No attacks

The impact of embedding on the cover video, in terms of transparency and robustness, is best evaluated when there is no attack on the watermarked video. The former is measured with PSNR, the latter with NCC and BER. The PSNR, NCC and BER values obtained by embedding the watermark in ‘mountain.avi’, ‘rhinos.avi’ and ‘manwalk.avi’ are given in Table 1. The minimum average PSNR is 55 dB, for ‘rhinos.avi’, and the maximum is 68 dB, for ‘mountain.avi’, while the NCC remains 0.999 (approximately 1) and the BER remains 0 for all three sample videos.

Table 1 No attack Vs PSNR, NCC and BER

5.2.2 Image processing attacks

The image processing attacks used to validate the proposed approach are Gaussian noise (G) with variances of 0.1 and 0.5, Poisson noise (P), salt and pepper noise (SP) with noise densities of 0.02 and 0.06, median filtering (M) with 3*3 and 5*5 windows, contrast adjustment (C) and histogram attack (H). Figures 9 and 10 show the attacked watermarked videos and the extracted watermarks for these attacks.

  1) Gaussian attack: Adding Gaussian noise to the watermarked video normally degrades its perceived visual quality, because it removes the edge component. Consequently, Gaussian noise affects the PSNR and SSIM values more than the NCC and BER values. To obtain better PSNR, SSIM, BER and NCC values under Gaussian attack, we embedded the scrambled watermark slices in the mid-frequency band (LH band) of the color video. The visual effect of Gaussian noise on the watermarked video is shown in Fig. 9a and b.

  2) Salt and pepper noise: Another common form of noise is data drop-out noise (also called intensity spikes, speckle or salt and pepper noise), which is caused by errors in data transmission. Corrupted pixels are set either to the maximum value (white) or to zero (black), giving the image a ‘salt and pepper’ appearance, while unaffected pixels remain unchanged. The noise is usually quantified by its density, the fraction of corrupted pixels, which the user can set; the quality of the watermarked image and of the extracted watermark varies with this density. The densities used in this work are 0.02 and 0.06, i.e., 2 % and 6 % of the pixels of the watermarked frames. The proposed approach tolerates both densities in terms of the transparency metrics (PSNR, SSIM) and the robustness metrics (NCC, BER) (Fig. 9c and d).

  3) Poisson attack: This kind of noise arises from the data itself rather than being added artificially. A Poisson distribution statistically models the discrete arrivals (counts) at each pixel over a period of time. It resembles the Gaussian distribution, except that its mean equals its variance and it applies only to discrete data. Its effect on the watermarked image (PSNR, SSIM) is the same as that of the Gaussian attack, and because the mean and variance are equal, it also affects the quality of the extracted watermark (NCC, BER). Nevertheless, the experimental results show that the proposed approach tolerates Poisson attacks (Fig. 9e).

  4) Median filtering: Median filtering smooths an image, as other smoothing filters do. Other smoothing techniques are effective at removing noise in smooth patches or regions of a signal but adversely affect edges, whereas the median filter removes noise without affecting the edge pixels. This property increases the quality of the watermarked video (PSNR, SSIM), but it degrades the quality of the extracted watermark (NCC, BER), as our experimental results also show (Fig. 9f and g).

  5) Contrast adjustment: The contrast of an image can be modified using standard techniques. We tested the effect of contrast adjustment on the watermarked image. The results (Fig. 9h) show that contrast adjustment affects both the visual quality (PSNR, SSIM) and the robustness (NCC, BER) values.

  6) Histogram attack: From the viewpoint of the human visual system, a watermarking scheme is effective when the original and watermarked videos cannot be distinguished. Strictly, this holds only for lossless watermarking schemes; for lossy schemes such as ours, the changes appear as variations in the histogram peaks, and small variations indicate that the scheme resists the histogram attack. Figure 10 shows the histograms of the Red, Green and Blue components of original test video frames (foreman, suzie, mountain) and of the corresponding watermarked frames; the differences between corresponding histograms are negligible. Hence, the proposed approach withstands the histogram attack.
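The noise and filtering attacks above can be simulated with NumPy as sketched below, for example to test a detector on single-channel frames. The sketch assumes intensities scaled to [0, 1], the scale on which noise variances and densities such as those above are commonly specified; the exact noise generators used in the experiments are not stated, so these are illustrative stand-ins with hypothetical names. Contrast adjustment and the histogram analysis are not covered.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_noise(img: np.ndarray, var: float, rng: np.random.Generator) -> np.ndarray:
    """Additive zero-mean Gaussian noise with the given variance, clipped to [0, 1]."""
    return np.clip(img + rng.normal(0.0, np.sqrt(var), img.shape), 0.0, 1.0)


def salt_and_pepper(img: np.ndarray, density: float, rng: np.random.Generator) -> np.ndarray:
    """Corrupt about `density` of the pixels: half set to 0 (pepper), half to 1 (salt)."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0
    out[(u >= density / 2) & (u < density)] = 1.0
    return out


def poisson_noise(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Treat 8-bit intensities as Poisson means (mean equals variance)."""
    counts = np.rint(img * 255.0)
    return np.clip(rng.poisson(counts) / 255.0, 0.0, 1.0)


def median_filter(img: np.ndarray, size: int) -> np.ndarray:
    """size x size median filter with edge padding (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    padded = np.pad(img, size // 2, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))
```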

Fig. 9

PSNR and NCC values for various image processing attacks on sample watermarked videos (mountain.avi, rhinos.avi, manwalk.avi) and the extracted watermark (vitlogo.bmp)

Fig. 10

Histogram analysis (last two columns at the right) (histogram plot for Red, Green and Blue components of original frame of foreman, suzie, mountain) and (histogram plot for Red, Green and Blue components of watermarked frame of foreman, suzie, mountain) respectively

5.2.3 Geometrical attack

The geometrical attack used to test the proposed approach is rotation. The watermarked video was rotated by various angles (1°, 2°, 5°, 10° and 180°), and the watermark was then extracted from the rotated video. Bilinear interpolation is used to resize the rotated frames to their original size. Figure 11 shows the attacked watermarked videos and the extracted watermarks, with PSNR and SSIM as imperceptibility measures and NCC and BER as robustness measures. For test video 1, our approach extracts 65, 76, 83, 51 and 99 % of the watermark when the frames are rotated by 1°, 2°, 5°, 10° and 180°, respectively. For test videos 2 and 3, it extracts 48, 48, 47, 43 and 99 % and 36, 35, 34, 32 and 99 % of the watermark, respectively. Hence, the proposed algorithm withstands rotation better for test video 1 than for the other test videos. In particular, 99 % of the watermark is extracted from all test videos when the frames are rotated by 180°.
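A rotation attack that keeps the frame size can be sketched as follows. Each output pixel is mapped back into the input frame and sampled with bilinear interpolation, and pixels that fall outside the input are set to zero. This crop-to-size approach is our assumption; the text resizes the rotated frames with bilinear interpolation, which may differ in detail. Color frames would be rotated one channel at a time, and the function name is hypothetical.

```python
import numpy as np


def rotate_bilinear(frame: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate a 2-D frame about its centre, keeping the original size.

    Each output pixel is mapped back into the input and sampled with bilinear
    interpolation; pixels that fall outside the input are set to 0.
    """
    h, w = frame.shape
    img = frame.astype(np.float64)
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    # Inverse mapping: source coordinates of every output pixel.
    xs = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    ys = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    eps = 1e-9  # tolerate rounding error at the frame border
    inside = (xs > -eps) & (xs < w - 1 + eps) & (ys > -eps) & (ys < h - 1 + eps)
    xs, ys = np.clip(xs, 0, w - 1), np.clip(ys, 0, h - 1)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    dx, dy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx
    bottom = img[y1, x0] * (1 - dx) + img[y1, x1] * dx
    return np.where(inside, top * (1 - dy) + bottom * dy, 0.0)
```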

Fig. 11

PSNR and NCC values for the geometrical attack on sample watermarked videos (mountain.avi, rhinos.avi, manwalk.avi) and the extracted watermark (vitlogo.bmp)

5.2.4 Temporal attacks

The performance of a video watermarking algorithm is generally also evaluated using temporal video attacks such as frame dropping (FD) and frame swapping (FS). Frame dropping removes randomly chosen frames from a video, and frame swapping exchanges the positions of frames in a video. A robust approach should allow the watermark to be extracted even at the maximum dropping rate, and likewise after frames have been swapped. The results are shown in Fig. 12.

Fig. 12

NCC and BER values for various video attacks on sample watermarked videos (mountain.avi, rhinos.avi, manwalk.avi) and the extracted watermark (vitlogo.bmp)

For frame dropping, frames of the watermarked video were dropped at various rates (4, 20, 41, 62, 83 and 96 %), and the watermark was then extracted from the resulting video. Figure 12a–f shows the attacked watermarked videos and the extracted watermarks for these attacks. For test video 1, our approach extracts 99, 99, 95, 87, 85 and 72 % of the watermark at dropping rates of 4, 20, 41, 62, 83 and 96 %, respectively. For test videos 2 and 3, it extracts 98, 98, 95, 91, 82 and 75 % and 97, 97, 96, 92, 74 and 73 % of the watermark, respectively. Hence, the proposed algorithm withstands frame dropping better for test video 1 than for the other test videos; in particular, nearly 99 % of the watermark is extracted from test video 1 at dropping rates of 4 % and 20 %.

Figure 12g–i shows the attacked watermarked videos and the extracted watermarks for various frame swapping attacks. For test video 1, the proposed approach extracts 92, 88 and 86 % of the watermark at swapping rates of 8, 25 and 50 %, respectively. For test videos 2 and 3, it extracts 89, 86 and 85 % and 79, 78 and 75 % of the watermark, respectively. Hence, the proposed algorithm withstands frame swapping better for test video 1 than for the other test videos; in particular, nearly 92 % of the watermark is extracted from test video 1 at the lowest swapping rate (8 %).
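The two temporal attacks can be simulated on a list of frames as sketched below. How the dropped or swapped frames were chosen in the experiments is not stated, so random selection without replacement and swaps of disjoint frame pairs are assumptions here, as are the function names. With 24 frames, for instance, a swapping rate of 0.25 exchanges three pairs.

```python
import numpy as np


def drop_frames(frames: list, rate: float, rng: np.random.Generator) -> list:
    """Frame dropping: remove round(rate * n) randomly chosen frames, keeping the order of the rest."""
    n = len(frames)
    dropped = set(rng.choice(n, size=int(round(rate * n)), replace=False).tolist())
    return [f for i, f in enumerate(frames) if i not in dropped]


def swap_frames(frames: list, rate: float, rng: np.random.Generator) -> list:
    """Frame swapping: exchange disjoint random pairs covering about rate * n frames."""
    n = len(frames)
    n_pairs = int(round(rate * n)) // 2
    idx = rng.choice(n, size=2 * n_pairs, replace=False)
    out = list(frames)
    for a, b in idx.reshape(-1, 2):
        out[a], out[b] = out[b], out[a]
    return out
```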

5.2.5 Multiple attacks

Apart from the individual image processing, temporal and geometrical attacks, we introduced a new kind of attack, called multiple attacks, in which more than one attack is applied to the same video at the same time, but on different frames. Six combinations were used: (i) Gaussian noise (variance = 0.5) and Poisson noise (GP), (ii) Gaussian noise (variance = 0.5) and salt and pepper noise (density = 0.05) (GS), (iii) Poisson and salt and pepper noise (density = 0.05) (PS), (iv) Gaussian noise (variance = 0.5) and rotation (10°) (GR), (v) Gaussian noise (variance = 0.5), rotation (10°) and frame dropping (3 frames) (GRD), and (vi) Gaussian noise (variance = 0.5), rotation (10°), salt and pepper noise (density = 0.05) and frame dropping (20 frames) (GRSD). Figure 13 shows the attacked watermarked videos and the extracted watermarks for these multiple attacks.

Fig. 13

PSNR, SSIM, NCC and BER values for various multiple attacks on sample watermarked videos (mountain.avi, rhinos.avi, manwalk.avi) and the extracted watermark (vitlogo.bmp)

Figure 13 shows that, under multiple attacks, both the visual quality (PSNR, SSIM) and the robustness (NCC, BER) are better than under the individual attacks, because the effect of one type of noise is compensated by the others. The improvement is largest for the combinations that include rotation, namely Gaussian noise with rotation (GR), with rotation and frame dropping (GRD), and with rotation, salt and pepper noise and frame dropping (GRSD). The best PSNR (66.0942), SSIM (0.9675), NCC (0.9903) and BER (0.0099) values were obtained when the watermarked video was corrupted by Gaussian and salt and pepper noise (Fig. 13b). Better PSNR, SSIM, NCC and BER values are obtained for sample video 1 (mountain.avi) than for the other two videos (rhinos.avi and manwalk.avi).

The experimental results show that the PSNR (min 54.0885 dB, max 68.0295 dB) and NCC (min 0.6021, max 0.9947) values are within acceptable levels. Sample video 1 (mountain.avi) shows better PSNR (min 60.3838 dB, max 68.0292 dB) and NCC (min 0.6299, max 0.9999) values for all types of attacks than the other two sample videos (rhinos.avi and manwalk.avi). We attribute this to three factors: (i) in test video 1 the background motion dominates the object motion, whereas the reverse holds for the other two videos; (ii) test video 1 has good brightness and contrast, unlike the other two videos; and (iii) the blue component dominates more in sample video 1 than in the other two, and the human visual system is less sensitive to blue than to red and green [20].

6 Comparison between conventional methods and the proposed approach

To benchmark the proposed approach, we implemented the existing SVD [49], DWT [23], DWT-SVD [19], DFT-Radon [33] and DCT-SVD [55] based algorithms on our dataset and compared them under the same attacks. Figures 14, 15, 16 and 17 show the comparison plots between the existing techniques and the proposed approach.

Fig. 14

a Image processing attacks Vs PSNR, b Image processing attacks Vs NCC

Fig. 15

Geometrical attacks Vs NCC

Fig. 16

a Frame dropping Vs NCC and b Frame swapping Vs NCC

Fig. 17

a Multiple attacks Vs PSNR, b Multiple attacks Vs NCC

We tested the imperceptibility and robustness of the proposed watermarking algorithm against common distortions using four types of attacks, namely image processing attacks, geometrical attacks, video attacks and multiple attacks, and compared the results with those of the existing watermarking algorithms.

In the first test, watermarks were extracted from watermarked videos distorted by the following image processing attacks, indexed as follows: Gaussian noise (G) with variance 0.1 (index 1) and 0.5 (index 2), Poisson noise (P) (index 3), salt and pepper noise (SP) with noise density 0.02 (index 4) and 0.06 (index 5), contrast adjustment (C) (index 6), median filtering (M) with 3*3 (index 7) and 5*5 (index 8) windows, and histogram attack (H) (index 9). The results are shown in Fig. 14.

From Fig. 14a and b, the proposed approach extracts 99.39, 98.04, 99.72, 98.44, 87.36, 88.49, 98.24, 97.77 and 92.99 % of the watermark for attacks 1 to 9, respectively (G with variances 0.1 and 0.5, P, SP with densities 0.02 and 0.06, C, M with 3*3 and 5*5 windows, and H). The figure also shows that the approach is most imperceptible under median filtering (M) with 3*3 and 5*5 windows, with PSNR values of 68.0292 dB and 67.8916 dB, respectively. In particular, nearly 99.39 % of the watermark is extracted when the frames are corrupted by Gaussian noise with variance 0.1. Overall, in terms of imperceptibility (Fig. 14a), the proposed algorithm withstands image processing attacks better than the existing watermarking algorithms, except for contrast adjustment.

In the second test, the watermarked video sequences were distorted by geometrical attack namely rotation. Here, various degrees of rotation such as 1, 2, 5, 10, and 180° are used and then the watermarks are extracted from the rotated video.

Figure 15 compares the NCC values under the various rotation attacks with those of the conventional DWT and SVD based methods. The proposed approach extracts 91 %, 90 %, 86 %, 84 % and 99 % of the watermark when the frames are rotated by 1°, 2°, 5°, 10° and 180°, respectively. These results compare favorably with those of the conventional algorithms, namely SVD [49], DWT [23], DWT-SVD [19], DFT-Radon [33] and DCT-SVD [55].

In the third test, we compared the effect of video processing attacks, namely frame dropping and frame swapping, on the proposed approach and on the related methods [19, 23, 33, 49, 55]. Fig. 16a and b show that the extracted watermark remains robust at all dropping and swapping rates. The proposed approach extracts 99, 98, 95, 87, 85 and 72 % of the watermark when the frames are dropped at rates of 4, 20, 41, 62, 83 and 96 %, respectively; these values are better than those of the existing algorithms.

The fourth test used to compare the proposed algorithm with the conventional DWT and SVD algorithms is multiple attacks. Six combinations of attacks are used to distort the watermarked video: (i) Poisson and salt and pepper noise (density = 0.05) (PS), (ii) Gaussian noise (variance = 0.5) and salt and pepper noise (density = 0.05) (GS), (iii) Gaussian noise (variance = 0.5) and Poisson noise (GP), (iv) Gaussian noise (variance = 0.5) and rotation (10°) (GR), (v) Gaussian noise (variance = 0.5), rotation (10°) and frame dropping (3 frames) (GRD), and (vi) Gaussian noise (variance = 0.5), rotation (10°), salt and pepper noise (density = 0.05) and frame dropping (20 frames) (GRSD). Figure 17a and b compare the PSNR and NCC values under these multiple attacks with those of the conventional methods. The proposed approach extracts 98.11, 99.03, 92.59, 97.35, 98.66 and 98.51 % of the watermark when the frames are distorted by multiple attacks of type PS, GS, GP, GR, GRD and GRSD, respectively. These results compare favorably with those of the conventional methods.

6.1 Detection rate and receiver operating characteristics

To evaluate the robustness of the proposed algorithm further, the detection rate and the receiver operating characteristic (ROC) are used in addition to the above metrics.

6.1.1 Detection rate

The detection rate is computed after an attack has been applied to the watermarked video. To avoid false positives on the given dataset, the detection threshold is set to Dt = 12 %, because the highest detection value among the unwatermarked videos was 11 %. The detection rate was calculated for image processing, geometrical, temporal and multiple attacks. The average detection rate of the proposed approach is 97.87 %, compared with 75.46 %, 77.23 %, 89.91 % and 88.73 % for DCT-SVD [55], SVD only [49], DFT-Radon [33] and DWT-SVD [19], respectively. Hence, in terms of detection rate, the proposed watermarking algorithm withstands all the tested kinds of attacks better than the existing watermarking algorithms.

6.1.2 Receiver operating characteristics (ROC)

ROC curves avoid the influence of a single predefined threshold. An ROC curve plots the probability of true positive detection against the probability of false positive detection [53, 32], and it is a good tool for estimating the behavior of a detector under the different types of degradation introduced in a video. A false positive occurs when the detector finds a watermark in an unwatermarked image, and a high false positive rate is unacceptable for most watermarking methods.

The probability of a false positive is the probability that the detection value for an unwatermarked image exceeds the threshold, which is 0.17 in our case. The ROC curve is drawn only for image processing attacks and is compared with those of the related methods [19, 33, 49, 55]. Figure 18 compares the ROC curves of these methods for various image processing attacks, in terms of NCC and BER respectively.

Fig. 18

Comparison of ROC curves for various image processing attacks a in terms of NCC b in terms of BER

The area under the ROC curve (AUC) is used to validate the accuracy of the test cases: the larger the area, the more accurate the test. The AUC lies between 0 and 1; an area of 1 represents a perfect test, whereas an area of 0.5 represents a worthless test, no better than chance. From Fig. 18, the AUC of the proposed system for image processing attacks is 0.96 in terms of NCC and 0.88 in terms of BER, which rate as excellent and good, respectively, on the usual AUC scale.
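The ROC curve and the area under it can be computed from detection values as sketched below, where positives are detection values (for example, NCC) measured on watermarked videos and negatives are those measured on unwatermarked videos. Sweeping every distinct score as a threshold is what frees the ROC from a single predefined threshold. The function names are hypothetical, and the area is integrated with the trapezoidal rule.

```python
import numpy as np


def roc_points(pos_scores, neg_scores):
    """ROC curve: (FPR, TPR) for every distinct threshold, from strictest to loosest."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]
    tpr = np.array([0.0] + [np.mean(pos >= t) for t in thresholds])
    fpr = np.array([0.0] + [np.mean(neg >= t) for t in thresholds])
    return fpr, tpr


def auc(fpr, tpr) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

The area computed this way equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.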

All the above results show that the proposed video watermarking technique is more robust against various attacks than the existing algorithms, except for contrast adjustment. The payload capacity is also increased, since (N − number of motion frames)/24 different images of any size can be embedded in the cover color video. Thus, the proposed technique is well suited to multilevel authentication applications.

7 Conclusion

In this paper, we have presented a novel bit plane slicing based watermarking algorithm for color watermark images. The simulation results show that our approach has the following attractive features. First, the watermark slices are scrambled using an Arnold transform key, and the scrambled slices are embedded in the singular values of the DWT sub-bands in the CT domain of the non-motion frames. Good fidelity is achieved owing to the spatial-frequency resolution of the DWT, the multi-scale and directional properties of the CT and the intrinsic algebraic properties of the SVD, and the adaptive embedding factor further improves imperceptibility: we achieved an average PSNR of 55.0885 dB and a best PSNR of 68.0292 dB. Second, a high level of robustness is achieved by hybridizing CT and DWT with SVD. Our approach is robust against image processing, multiple, geometrical and temporal attacks, as shown by the Normalized Correlation Coefficient (NCC): the minimum NCC of the extracted watermark in our experiments is 0.6021, which is still visually acceptable. Third, a payload of (N − number of motion frames)/24 images is achieved for an N-frame video. Fourth, two levels of authentication are introduced to improve security. (i) Because the watermark slices are scrambled with an Arnold key, descrambling the extracted slices requires the same key. (ii) To detect whether the transferred video has been tampered with, the eigenvector V′ of the watermark image is also embedded; at the receiver, the extracted slices are grouped to form an estimate of the watermark image, its eigenvector V″ is computed, and comparing V′ with V″ verifies the authenticity of the received video.
Finally, the watermark slices and the eigenvector are embedded only in the non-motion frames, which protects the transmission against collusion attacks. We conclude that the proposed approach is well suited to copyright protection and content authentication applications compared with the existing approaches.