1 Introduction

Nowadays, video surveillance is broadly deployed across several sectors. Surveillance cameras are increasingly installed in public as well as private places, for instance at street corners, commercial stores, residential areas, airports and train stations. Indeed, 245 million security cameras were active around the world in 2014 [22]. According to Information Handling Services (IHS), there were fewer than 10 million professionally installed video surveillance cameras globally in 2006 [20]; this number rose quickly beyond 100 million in 2016, and more than 130 million cameras were shipped in 2018. The main reasons for this burgeoning deployment are the improvement of public safety against growing crime threats and the protection of property [13]. In addition, the low cost of the hardware compared to human surveillance further enhances the ubiquity of video surveillance systems [13, 18]. Furthermore, videos recorded by a video surveillance system are the subject of many analytical functions such as object classification and identification, object tracking, and activity and behavior analysis [7,8,9]. Besides, they play an important role in police and judicial investigations as legal evidence.

On the other hand, the recent revolution in the computer technology field has created several problems for the multimedia industry in general and for video surveillance in particular. Indeed, this progress has come along with the development of sophisticated signal and image processing software able to maliciously manipulate stored video content without deteriorating its visual quality. For instance, surveillance sequences can easily be doctored in such a way as to exculpate or incriminate an individual. Thereby, stored videos lose their trustworthiness and credibility as legal proof in a court of law. Hence, there is a critical need for video surveillance systems to integrate authentication procedures in order to guarantee data integrity and prove the data's true origin [66].

To overcome this challenge, a broad range of authentication techniques has already been introduced. Cryptography, with its different protocols, is one of the most used solutions to protect video authenticity and integrity [1, 40, 56]. Nonetheless, this authentication mechanism has some shortcomings, such as its computation and storage requirements. Likewise, once the digital video is encrypted, any visualization, analysis or visual data search requires its decryption. To deal with these weaknesses, video watermarking was introduced as a promising alternative to cryptography [35, 36, 45, 46]. It is the procedure of embedding a signature, called a watermark, in the video frames. The embedded watermark can be an image, a logo or any particular kind of information. A video watermarking system consists of two processes, as shown in Fig. 1. The first one is the embedding, which refers to combining the watermark with the host video. The information used as a watermark can be an image or a binary sequence; in addition, it can be constructed by exploiting video frame features. The watermark extraction is the second process of a video watermarking system. It extracts the hidden information from the possibly tampered watermarked video, which is then used to ascertain the authenticity of the video content.

Fig. 1 Video watermarking general framework

Watermarking-based authentication approaches were first introduced as fragile watermarking systems. In this case, any modification of the watermarked video readily causes a mark detection failure; thus, the watermark loss is considered evidence of content tampering. The main benefit of fragile watermarking is its ability to localize tampering, but it is very difficult to discriminate between malicious video processing, which aims to alter the semantic content, and non-intentional processing [3, 12]. Another popular approach is robust watermarking, aptly named for its resilience against any form of attack. Indeed, the hidden information can be recovered even from a heavily attacked watermarked video [19, 43]. To exploit the advantages of both the fragile and the robust approaches, another paradigm, referred to as semi-fragile watermarking, was introduced [17, 44]. This type of watermarking method is designed to be fragile against intentional tampering while tolerating only unintentional manipulations. Semi-fragile watermarking systems have proven their efficiency for applications that require a trade-off between robustness and fragility, notably video surveillance. Thus, we propose in this work a blind semi-fragile watermarking scheme for video authentication in the video surveillance context using the Discrete Wavelet Transform (DWT), Singular Value Decomposition (SVD), the Quick Response code (QR code) and the Arnold transform.

The remainder of this paper is organized as follows. Section 2 provides an overview of the video watermarking field. A review of the state of the art of video watermarking based authentication techniques is given in Section 3. Section 4 presents the proposed semi-fragile watermarking scheme. Performance results and a comparison with existing techniques are reported in Section 5. Finally, conclusions are drawn and perspectives are opened in the last section.

2 Overview of video watermarking

In this section, an overview of video watermarking and its main terminology is given. First, we define the video watermarking applications. Next, we describe the requirements in this field. Finally, we present various classifications of video watermarking techniques.

2.1 Video watermarking applications

Digital watermarking came into vogue in the late 1980s. This research area quickly witnessed great growth owing to its important applications. Broadcast monitoring is one of the most common video watermarking applications. It enables advertising agencies to verify whether their commercial contents are broadcast as contracted, by hiding a watermark in the advertisements. In fact, extracting the embedded watermark makes it possible to check that commercials have been aired during all of the paid-for time [28, 47]. Moreover, watermarking can be used for fingerprinting. This application allows tracing the source of illegal copies. Indeed, the owner can embed a different watermark in each copy of the media content. This mark enables the intellectual property owner to identify the buyer of each legal distribution and to check who has broken the license by providing the content to third parties [38, 70]. Copyright protection is a fundamental video watermarking application. In this case, specific owner information is used as a watermark in order to identify the copyright ownership as well as to prevent video fraud and misappropriation. Indeed, retrieving the watermark from the watermarked video allows the rightful owner to prove ownership when someone else claims it [14, 61]. Besides, data authentication is another popular watermarking application, which aims to confirm the integrity of the watermarked video and to detect attempted alterations of the original content. The watermark concealed in the host video is designed to be affected by signal manipulations and thus to indicate whether the watermarked video content is authentic or not [30, 63].

2.2 Video watermarking requirements

As previously mentioned, video watermarking is exploited in a wide range of applications. Consequently, every watermarking system should have its own specific properties with respect to the considered application. Mostly, three basic requirements are given for video watermarking systems. The first one is imperceptibility, or transparency, which refers to the perceptual quality of the watermarked video. Obviously, it depends on the embedding process: the distortion caused by the watermarking algorithm should add only a minor degradation to the perceptual quality of the host video. Therefore, the watermarked video should not be distinguishable from the original one by the human eye. The second property is robustness. It denotes the ability of the watermark to survive distortions. These attacks are mainly divided into two types, unintentional and intentional. Unintentional attacks are processing operations that do not aim to impair or remove the watermark; intentional attacks mischievously attempt to damage the data embedded in the watermarked video. Capacity is the third requirement for a video watermarking system. It defines the maximum amount of information that can be hidden in the host video as a watermark. The size of the embedded information varies with the targeted application: for security purposes a large capacity is required, whereas for copy protection a one-bit capacity is generally sufficient. Imperceptibility, robustness and capacity are mutually dependent. In fact, increasing the capacity decreases the robustness and degrades the visual quality. Therefore, a good trade-off among all the properties listed above should be maintained when designing a watermarking system [5, 64, 65].

2.3 Video watermarking techniques classification

Video watermarking techniques can be classified based on distinct criteria. According to human perception, they are divided into two classes: visible and invisible watermarking techniques. In the first class, the watermark is embedded in such a way as to be noticeable when viewing the watermarked video. In the second class, the watermark is concealed in the host video so as to be perceptually unidentifiable by the human eye. Based on the watermark detection criterion, video watermarking techniques are classified as non-blind, semi-blind and blind. In non-blind techniques, both the original video and the watermark are required during the extraction process. In semi-blind techniques, the information used as a watermark can be successfully extracted from the watermarked video without using the original video. In blind detection, neither the embedded watermark nor the original host video is required for watermark extraction [5, 6].

Another criterion frequently used to classify video watermarking schemes is the working domain. Depending on this criterion, video watermarking techniques are usually divided into two categories. The first one is spatial domain watermarking. In this type, the embedding process is achieved by directly modifying or replacing the pixel values of the original video frame. Spread spectrum, Least Significant Bit and correlation-based techniques are the most used techniques in this domain [54, 55, 69]. Spatial domain watermarking approaches are characterized by a simple implementation and a low computational complexity. However, these techniques have several drawbacks, namely a low embedding capacity and a weak robustness against several attacks, especially compression. The frequency domain, also referred to as the transform domain, is the alternative to the spatial domain. A video watermarking technique in this case starts by converting the host frame to a new appropriate working domain; the transform coefficients are then adjusted by the watermark to obtain a watermarked frame. The common domain transformation techniques are the Singular Value Decomposition (SVD), the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT) and the Lifting Wavelet Transform (LWT) [26, 49, 53]. Frequency domain approaches have gained tremendous exposure compared to spatial domain ones, since they are more resilient to geometrical and compression attacks. Moreover, they yield a larger capacity and a better imperceptibility by better respecting human visual system properties. Therefore, transform domain approaches allow efficiently meeting the trade-off between the different watermarking system requirements [6, 11, 68].

3 Related work

Video authentication through watermarking is an appealing field which motivates several researchers, and the literature offers a variety of approaches relevant to this research area. As noted in Section 2.3, video watermarking techniques are commonly classified, based on the embedding domain criterion, into two categories, i.e., spatial domain and frequency domain techniques. In the present section, we only investigate frequency domain watermarking schemes, since this domain better attains the compromise between the different watermarking requirements. Regarding the number of domain transformations used, existing frequency domain approaches can be mono-frequency or multi-frequency.

Mono-frequency watermarking systems involve only one transform to embed the mark. In [4], Alenizi et al. propose a new DWT-based video watermarking scheme for authentication purposes. The luminance component Y undergoes a DWT decomposition via randomly generated filters to increase the algorithm security. The watermark is inserted in the mid-frequency sub-band using an additive method with a pseudo-random sequence P, generated using a secret key, and a constant magnitude factor α controlling the watermark robustness. The simulation results show that this scheme performs well under different well-known attacks. However, it gives lower performance in terms of correlation when the scenes are smooth and contain little motion. In [25], a DCT-based video watermarking scheme is introduced. In this scheme, the watermark is concealed in the low-frequency sub-band resulting from applying the DCT to specific frames at scene changes. Farfoura et al. present a semi-fragile watermarking scheme for content-based authentication [15]. The authentication codes used in this scheme are composed of frame index timing information and invariant features extracted from intra macroblocks. The watermark is inserted into Quantized DCT (QDCT) coefficients in a set of randomly chosen Groups of Pictures (GOP). The advantages of this watermarking scheme are its resilience against semantic-content-preserving attacks as well as its sensitivity to content-altering attacks. In addition, the technique shows a low computational complexity and a good imperceptibility level. Furthermore, Bhardwaj et al. introduced a robust video watermarking technique operating in the mono-frequency domain [10]. In this scheme, the frames to be watermarked are chosen via a frame selection procedure based on the mathematical relationship between the non-watermarked video frame index, the embedding capacity and the coefficient block size.
The watermark bits are hidden in the quantized LH3 sub-band coefficients resulting from the lifting wavelet transform (LWT). Experimental results demonstrate that this technique is robust to various image processing attacks with a good level of imperceptibility. Khosravi et al. propose several efficient interpolation-based watermarking schemes operating in the mono-frequency domain for data management and transmission in remote sensing video surveillance by video synthetic aperture radar (ViSAR). In fact, ViSAR provides several principal, control and managerial data which should be compressed before being transmitted. Hence, the authors adopt watermarking systems based on interpolators and domain transformations such as the Fast Fourier Transform (FFT), the DCT and the DWT to aggregate and reduce the size of the ViSAR information [32,33,34].

Conversely, multi-frequency video watermarking techniques combine several transformations in the embedding process. A DWT and SVD based watermarking technique is developed in [59]. In this methodology, the Fibonacci sequence is used to identify the key frames to be watermarked. The watermark singular values are embedded in the LH mid-frequency sub-band coefficients of the selected frames. Based on simulation results, this technique is immune to video processing attacks and ensures a good quality of the watermarked videos. Another multi-frequency video watermarking scheme, combining the DWT with principal component analysis (PCA), is proposed by Yassin et al. in [67]. In this work, a two-level DWT is used to transform the Y component to the frequency domain. The maximum coefficients of the maximum-entropy PCA blocks are identified as the optimal watermarking locations, and the watermark is hidden in the quantized values of the selected coefficients. According to the experimental results, this watermarking methodology proves its robustness against different distortions, especially contrast adjustment, Gaussian noise addition and JPEG coding.

In [51], Nouioua et al. introduce a novel digital video watermarking technique based on SVD which performs in the multi-resolution singular value decomposition domain. The watermark is encrypted through a logistic map encryption and then hosted only in the fast-motion frames of each video shot. The embedding follows a blind Quantization Index Modulation algorithm. The authors claim that this scheme is secure and robust to a variety of manipulations such as compression, image processing and frame synchronization attacks. Another multi-frequency video watermarking technique is developed by Panyavaraporn for both copyright protection and content authentication purposes [52]. In this scheme, the discrete wavelet transform is combined with the discrete cosine transform: the DWT is applied to the Y component of the video frames, the DCT is then performed on the mid-frequency sub-bands, and finally the watermark is inserted in the mid-band DCT coefficients. The proposed algorithm has proven its robustness, especially against compression attacks, and has shown visually acceptable quality. Similarly, an enhanced watermarking approach using DWT, DCT and interpolation is proposed in [29]. In this algorithm, an interpolation technique is applied after the watermark extraction to zoom the host frame and to recover an improved version of the information hidden in the watermarked frame.

According to the above overview of existing video watermarking approaches, it is clear that combining transform domain techniques offers better resilience to different attacks than techniques involving one single transform. Consequently, in the proposed work the watermark embedding is carried out in the multi-frequency domain.

4 Proposed approach

The proposed system is a blind and semi-fragile video watermarking scheme in the frequency domain, based on DWT, SVD, the QR code and the Arnold transform, for video authentication in the video surveillance context.

As illustrated in Fig. 2, it involves three processes, namely the watermark generation, the watermark embedding and the detection processes. The design of each process is explained in the following subsections.

Fig. 2 The proposed approach general framework

The main contributions of this work are:

  1) The selection of proper invariant features to construct a content-based watermark that exhibits the semi-fragility property and fulfills the task of discriminating between malicious and non-malicious processing actions.

  2) The adoption of the QR code technique and the Arnold transform to address the watermark security and computational complexity challenges. Before being embedded in the host video frame, the watermark is processed by a QR code generator and then encrypted with the Arnold transform. Therefore, the hidden information cannot be recovered in its original form even if an attacker successfully uncovers the extraction algorithm.

  3) The hybridization of two transform domain techniques, namely the DWT and the SVD, exploiting their complementary characteristics to enhance the watermarking system performance. In fact, the DWT sub-band properties and the relation between the SVD coefficients are jointly used to embed the watermark into the host video and to guarantee blind detection during the extraction process.

4.1 Preliminaries

To better understand the details of the proposed approach, a brief overview of the YUV color space, the discrete wavelet transform, the singular value decomposition, the QR code technique and the Arnold transform is provided in this section.

4.1.1 YUV color space

The YUV color space consists of a luminance (intensity) component and two chrominance (color) components. YUV components are less correlated than those of the RGB color space, which makes YUV more suitable for image and video processing applications, and for watermarking in particular. The conversions from RGB to YUV and from YUV to RGB are done using formulas (1) and (2), respectively.

$$ \left\{ \begin{array}{lcl} Y = 0.299\times R + 0.587\times G + 0.114\times B \\ U= -0.147\times R - 0.289\times G + 0.436\times B\\ V = 0.615\times R - 0.515\times G - 0.100\times B \end{array} \right. $$
(1)
$$ \left\{ \begin{array}{lcl} R = Y + 1.140\times V \\ G= Y - 0.395\times U - 0.581\times V\\ B = Y + 2.032\times U \end{array} \right. $$
(2)
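As an illustration, the two conversions can be applied with the coefficient matrices of formulas (1) and (2) (a minimal NumPy sketch; the function names are ours):

```python
import numpy as np

# RGB -> YUV coefficient matrix, rows taken from formula (1)
RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                    [-0.147, -0.289,  0.436],
                    [ 0.615, -0.515, -0.100]])

def rgb_to_yuv(rgb):
    """Apply formula (1) to an (..., 3) RGB array."""
    return rgb @ RGB2YUV.T

def yuv_to_rgb(yuv):
    """Apply formula (2): R = Y + 1.140 V, G = Y - 0.395 U - 0.581 V, B = Y + 2.032 U."""
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    return np.stack([y + 1.140 * v,
                     y - 0.395 * u - 0.581 * v,
                     y + 2.032 * u], axis=-1)

pixel = np.array([0.5, 0.2, 0.8])          # one RGB pixel
roundtrip = yuv_to_rgb(rgb_to_yuv(pixel))  # close to the input (coefficients are rounded)
```

Because the published coefficients are rounded, the round trip reproduces the input only up to a small error.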

4.1.2 Singular value decomposition

SVD is a numerical transform which decomposes an m×n real matrix A into a factorization of three matrices [54, 55]:

$$ A = U \times S \times V^{t} $$
(3)

Where:

$$ U= \begin{bmatrix} u_{11} & u_{12} & {\ldots} & u_{1m} \\ u_{21} & u_{22} & {\ldots} & u_{2m} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ u_{m1} & u_{m2} & {\ldots} & u_{mm} \\ \end{bmatrix} \quad S= \begin{bmatrix} S_{00} & 0 & {\ldots} & 0 \\ 0 & S_{11} & {\ldots} & 0 \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ 0 & 0 & {\ldots} & S_{nn} \\ \end{bmatrix} \quad V^{t}= \begin{bmatrix} v_{11} & v_{12} & {\ldots} & v_{1n} \\ v_{21} & v_{22} & {\ldots} & v_{2n} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ v_{n1} & v_{n2} & {\ldots} & v_{nn} \\ \end{bmatrix} $$

U and V, which are orthogonal matrices of size m×m and n×n respectively, contain the singular vectors of matrix A. S is an m×n diagonal matrix whose non-zero elements, arranged in descending order, define the singular values of A. The singular value matrix S ensures higher invisibility and more robustness against attacks compared to the U and V matrices, thereby suiting the watermarking requirements. Generally, SVD is gaining popularity in the image and video processing area thanks to its attractive properties, namely its conceptual stability and its maximum energy packing [39, 57].
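The factorization and the descending order of the singular values can be checked directly with NumPy (a small verification sketch, not part of the scheme itself):

```python
import numpy as np

A = np.array([[4., 0., 2.],
              [1., 3., 0.],
              [0., 1., 5.],
              [2., 0., 1.]])        # an m x n real matrix (m = 4, n = 3)

U, s, Vt = np.linalg.svd(A)         # NumPy returns the singular values as a 1-D vector s
S = np.zeros_like(A)
np.fill_diagonal(S, s)              # rebuild the m x n diagonal matrix S

A_rebuilt = U @ S @ Vt              # equals A up to floating-point error
```

NumPy already sorts the singular values in descending order, matching the convention described above.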

4.1.3 Discrete wavelet transform

The DWT is a mathematical tool used to hierarchically decompose images and video frames. It separates an image into four frequency sub-bands, i.e., the low-frequency sub-band (LL), the high-frequency sub-band (HH) and the mid-frequency sub-bands (HL and LH). The process can be repeated to compute a multi-level wavelet decomposition. The DWT is well known for its resilience to noise addition and compression. It also models the aspects of the human visual system better than other domain transformation techniques. Hence, it has been adopted in many practical image and video processing applications such as image restoration and image zooming as well as transmission and compression [23, 24, 48, 50]. It is often used in watermarking schemes due to its spatial localization, frequency spread and multi-resolution modelling [2].
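One decomposition level can be sketched with the Haar wavelet (our illustrative filter choice; the text does not mandate a specific wavelet):

```python
import numpy as np

def haar_dwt2(block):
    """One-level 2-D Haar DWT: returns the (LL, LH, HL, HH) sub-bands."""
    a = block[0::2, 0::2]        # even rows, even columns
    b = block[0::2, 1::2]
    c = block[1::2, 0::2]
    d = block[1::2, 1::2]
    LL = (a + b + c + d) / 2.0   # low-frequency approximation
    LH = (a - b + c - d) / 2.0   # detail sub-band
    HL = (a + b - c - d) / 2.0   # detail sub-band
    HH = (a - b - c + d) / 2.0   # high-frequency (diagonal) detail
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2, reconstructing the original block exactly."""
    h, w = LL.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2.0
    out[0::2, 1::2] = (LL - LH + HL - HH) / 2.0
    out[1::2, 0::2] = (LL + LH - HL - HH) / 2.0
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2.0
    return out

block = np.arange(16, dtype=float).reshape(4, 4)
LL, LH, HL, HH = haar_dwt2(block)    # four 2x2 sub-bands
```

Applying `haar_dwt2` again to the LL sub-band yields the second decomposition level.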

Figure 3 illustrates the sub-bands obtained after two decomposition levels.

Fig. 3 a Original image b 1-level DWT decomposition c 2-level DWT decomposition

4.1.4 Quick response code

The Quick Response code is a two-dimensional matrix symbol introduced in 1994 by Denso Wave and standardized by the International Organization for Standardization as ISO/IEC 18004:2015 [21].

A QR code is a set of black square blocks arranged on a white background. Version information, separators, timing patterns, format information, data and error correction, the quiet zone, alignment patterns and position detection patterns are the basic structural elements of a QR code, as shown in Fig. 4. It is used in a wide range of multimedia applications, especially when a large amount of information should be transmitted in a compact format. In fact, a QR code can carry up to 7089 numeric characters or up to 4296 alphanumeric characters [27]. Likewise, its good damage resilience and high storage capacity are the main reasons for the adoption of the QR code in the watermarking field.

Fig. 4 Quick Response code basic structure

4.1.5 Arnold transform

The Arnold transform is an invertible and iterative mapping which randomizes the original pixel positions in an image. The number of iterations considered is called the Arnold period, and it depends on the original image size. The main purpose of the Arnold transform is to destroy the original image semantics, which become unreadable in the scrambled version. The Arnold transform of an N×N image is described by the following equation [63]:

$$ \begin{bmatrix} x^{\prime} \\ y^{\prime}\\ \end{bmatrix} = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y\\ \end{bmatrix} \mod{N} $$
(4)

Where (x, y) and (\(x^{\prime }, y^{\prime }\)) are the original and scrambled pixel coordinates, respectively, and N is the image size.

The Arnold transform is recognized as one of the most used image scrambling techniques. It has various applications, particularly in the watermarking field, where it is often utilized to encrypt the watermark in order to ensure confidentiality and improve the security level of the watermarking scheme [58]. Indeed, the watermark cannot be extracted without accurate knowledge of the particular Arnold period K.
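The scrambling of equation (4) and its size-dependent period can be sketched as follows (a minimal NumPy illustration):

```python
import numpy as np

def arnold(img, iterations=1):
    """Apply the Arnold cat map of equation (4) 'iterations' times to an N x N image."""
    N = img.shape[0]
    out = img
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        for x in range(N):
            for y in range(N):
                # [x'; y'] = [[1, 1], [1, 2]] [x; y] mod N
                scrambled[(x + y) % N, (x + 2 * y) % N] = out[x, y]
        out = scrambled
    return out

img = np.arange(16).reshape(4, 4)
scrambled = arnold(img, 1)           # pixel positions permuted per equation (4)
```

For a 4×4 image the map returns to the identity after three iterations, i.e., the Arnold period for N = 4 is 3; for the larger QR code images used as watermarks, the period K differs accordingly.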

Figure 5 depicts an example of Arnold transform applied to an image with different periods K.

Fig. 5 a Original image Lena b Arnold transform with period = 1 c Arnold transform with period = 3 d Arnold transform with period = 7

4.2 Watermark generation process

A well-designed watermark is a prominent requirement for the efficiency of a watermarking scheme. In the proposed watermarking system, the host video is divided into sequences of N successive frames. For every video sequence, a watermark is generated from its first frame following Algorithm 1 and then repeatedly inserted into each frame of the given sequence.

Algorithm 1

In order to cater to the security need, N, which defines the number of frames in each sequence, is kept secret. In fact, a large value of N means embedding the same watermark into a large number of consecutive frames, whereas a small value of N denotes watermarking a few frames with the same watermark. Hence, its value should be properly fixed to avoid making the watermark vulnerable to unintentional manipulations.

As illustrated in Fig. 6, the watermark generation process implies two main steps: Region Of Interest (ROI) extraction and watermark construction. Since we focus on videos captured for surveillance purposes in public places, moving objects, for instance pedestrians and vehicles, are the required regions. Indeed, they are the regions most targeted by malicious attacks in a video frame, and any intentional forgery of their content should be detectable. A technique based on an adaptive improved version of the Gaussian Mixture Model (GMM) [62] is used to detect the ROI. In order to remove noise, morphological filtering operations such as closing and opening are applied as explained in [7].

Fig. 6 The proposed watermark generation process flow chart

Then the extracted regions are exploited in the watermark construction strategy. First, the external contours of the ROI are extracted. We select only salient points from the moving object edges in order to keep relevant information, significantly reduce the computation time and enhance the watermark robustness. Salient point selection is done with the Shi-Tomasi corner detector [60], which is resilient to several attacks. In our algorithm, the detected corner positions are considered as features: a cartographic map is constructed from the selected salient point coordinates. To provide an additional security level, the constructed map is fed as input to a QR code generator. This not only enhances the security of the system but also allows concealing a large amount of information with a shorter embedding time. To further strengthen the security of the secret information to be hidden, the obtained QR code is encrypted using the Arnold transform with a period K. Hence, this image scrambling technique ensures that the watermark extraction cannot be done without accurate knowledge of the particular Arnold period K, which represents the second watermarking secret key in our approach. Finally, the scrambled version of the QR code is used as the watermark and hosted in the video frames.
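The construction of the cartographic map from the selected salient points can be sketched as follows (corner detection itself, via Shi-Tomasi, is omitted; the coordinates and map size below are hypothetical):

```python
import numpy as np

def build_map(corners, shape):
    """Build a binary cartographic map marking the detected corner positions."""
    cmap = np.zeros(shape, dtype=np.uint8)
    for (row, col) in corners:
        cmap[row, col] = 1
    return cmap

corners = [(10, 12), (40, 55), (63, 7)]   # hypothetical Shi-Tomasi corner positions
cmap = build_map(corners, (64, 64))       # this map would then feed the QR code generator
```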

4.3 Embedding process

As mentioned before, the host video is first segregated into sequences of N frames. All frames in each sequence are watermarked with a unique scrambled watermark, which is intrinsic to the given video sequence. The embedding process flow chart is shown in Fig. 7 and described in Algorithm 2. Each RGB frame is first converted into YUV format, as its components are less correlated than those of the RGB color space [16]. Because it is better harmonized with the human visual system (HVS), the luminance component Y is selected for the embedding process to strengthen the watermark imperceptibility. More precisely, the human eye is less sensitive to the luminance component Y than to the chrominance components U and V [16].

Fig. 7 The proposed watermark embedding process flow chart

The selected component is divided into non-overlapping blocks of size 4×4. The block size is chosen to maximize the number of bits to be inserted, i.e., to guarantee a large capacity. Indeed, one bit will be concealed in every resulting block. Thereafter, each block is subjected to a single-level DWT. The DWT is chosen as the domain transformation technique thanks to its efficient resilience to noise addition. Moreover, it models the aspects of the human visual system more faithfully than other domain transformation techniques. Among the produced sub-bands, only the mid-frequency sub-bands (LH1 and HL1) are selected as the watermarking locus, because they strike the right trade-off between the imperceptibility and robustness requirements. In fact, involving the low-frequency sub-band (LL), which represents the most significant parts of the video frame, in the embedding process can increase the watermark robustness at the cost of perceptual quality. Conversely, inserting the watermark into the high-frequency sub-band (HH) guarantees a good imperceptibility, but the embedded secret information risks being lost during compression, since HH carries the least important information in the given video frame [41, 42].

Algorithm 2

Afterwards, the singular value decomposition is applied to the selected sub-bands. This operation yields three matrices, namely U, S and V. Since S provides higher invisibility and more robustness against attacks than U and V, it is taken as the matrix to be watermarked. The watermark insertion is carried out by modifying the singular values of the S matrices of the mid-frequency sub-bands HL1 and LH1 according to the equations below:

If Wembedding = 0

$$ \left\{ \begin{array}{lcl} S_{watermarked}(0,0) = S_{original}(0,0) +Fact_{\alpha} \\ \\ S_{watermarked}(1,1) = S_{original}(0,0) \end{array} \right. $$
(5)

Else

$$ \left\{ \begin{array}{lcl} S_{watermarked}(0,0) = S_{original}(1,1) +Fact_{\beta} \\ \\ S_{watermarked}(1,1) = S_{original}(1,1) \end{array} \right. $$
(6)

Where W_embedding is the watermark bit, and S_original and S_watermarked are respectively the original and watermarked versions of the singular value matrix. Fact_α and Fact_β are two scaling factors used to control the watermarked video visual quality as well as the watermark robustness. Their values, which depend on the coefficients of the original matrix S, are calculated using the following formulas.

$$ Fact_{\alpha} = \frac{S_{original}(0,0) +S_{original}(1,1)}{\alpha} $$
(7)
$$ Fact_{\beta} = \frac{S_{original}(0,0) +S_{original}(1,1)}{\beta} $$
(8)

Where α and β are two integer values.
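The embedding rules of Eqs. (5)-(8) can be sketched as a small helper. The function name `embed_bit` is illustrative; the values α = 2 and β = 4 anticipate the tuning reported in Section 5.2.

```python
import numpy as np

ALPHA, BETA = 2, 4  # tuned values from Section 5.2

def embed_bit(S, bit):
    """Apply Eqs. (5)-(8) to a singular-value matrix S; returns the
    watermarked matrix and the two scaling factors."""
    Sw = S.copy()
    fact_a = (S[0, 0] + S[1, 1]) / ALPHA   # Eq. (7)
    fact_b = (S[0, 0] + S[1, 1]) / BETA    # Eq. (8)
    if bit == 0:                           # Eq. (5)
        Sw[0, 0] = S[0, 0] + fact_a
        Sw[1, 1] = S[0, 0]
    else:                                  # Eq. (6)
        Sw[0, 0] = S[1, 1] + fact_b
        Sw[1, 1] = S[1, 1]
    return Sw, fact_a, fact_b
```

Note that either rule forces the diagonal gap S_watermarked(0,0) − S_watermarked(1,1) to equal Fact_α (bit 0) or Fact_β (bit 1), which is what the detection rule of Eq. (9) later exploits.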

Next, the inverse singular value decomposition and the inverse discrete wavelet transform are applied to yield the watermarked luminance component Y. The latter is combined with the unmodified chrominance components, and the color space is converted back from YUV to RGB using (2), to obtain the watermarked RGB frame.
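The inverse-SVD step above simply recombines the (possibly modified) S matrix with the U and V factors kept from the decomposition. A minimal sketch on a hypothetical 2×2 sub-band, with no modification applied so that the round trip is exact:

```python
import numpy as np

band = np.array([[3.0, 1.0], [1.0, 2.0]])      # a toy 2x2 mid-frequency sub-band
U, s, Vt = np.linalg.svd(band)                 # forward SVD
S = np.diag(s)
# ... in the scheme, S is modified here by the rules of Eqs. (5)-(6) ...
rebuilt = U @ S @ Vt                           # inverse SVD
print(np.allclose(rebuilt, band))              # True (no modification applied)
```

After modifying S, the same product U·S·Vᵀ yields the watermarked sub-band, which is then fed to the inverse DWT.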

The watermarked video is obtained by repeating the above-described process for each frame of every sequence.

4.4 Detection process

Figure 8 illustrates the general watermark detection scheme, which involves two processes: regeneration and extraction.

Fig. 8

The proposed watermark detection process flow chart

The detection is blind, since only the watermarked video and the two secret keys N and K are required as inputs to the scheme. The regeneration process is composed of the same steps used in the watermark generation process; the regenerated watermark is denoted by W_regenerated. The extraction process, on the other hand, starts by operating in analogy with the watermark embedding process, as described in Algorithm 3. In fact, the watermarked video is subdivided into video sequences using the secret key N, and a watermark is then extracted from each sequence. At first, a conversion from the RGB to the YUV color space is performed. Then the luminance component Y is decomposed into 4×4 non-overlapping blocks. After performing a single-level DWT on each block, the singular value decomposition (SVD) is applied to the mid-frequency sub-bands LH1 and HL1. Finally, the hidden signature is extracted from the coefficients of the singular value matrices based on the following rule:

$$ \left\{ \begin{array}{lcl} W_{extracted}(0,0) = 0 \quad If\quad S_{extracted}(0,0) - S_{extracted}(1,1)>\frac{Fact_{\alpha} +Fact_{\beta}}{2} \\ \\ W_{extracted}(0,0) = 1 \quad Otherwise \end{array} \right. $$
(9)

Where S_extracted is the extracted singular value matrix, W_extracted is the extracted watermark bit, and Fact_α and Fact_β are the two scaling factors computed using (7) and (8) respectively.
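The decision rule of Eq. (9) can be sketched and checked against the embedding rules of Eqs. (5)-(8); the helper name `extract_bit` is illustrative. Since the gap between the two leading singular values is Fact_α for bit 0 and Fact_β for bit 1, comparing it with the midpoint (Fact_α + Fact_β)/2 recovers the bit:

```python
import numpy as np

ALPHA, BETA = 2, 4

def extract_bit(S, fact_a, fact_b):
    """Eq. (9): bit 0 iff the diagonal gap exceeds the factor midpoint."""
    return 0 if S[0, 0] - S[1, 1] > (fact_a + fact_b) / 2 else 1

# Round-trip check against the embedding rules:
S = np.diag([5.0, 3.0])
fa = (S[0, 0] + S[1, 1]) / ALPHA               # Eq. (7): 4.0
fb = (S[0, 0] + S[1, 1]) / BETA                # Eq. (8): 2.0
S0 = np.diag([S[0, 0] + fa, S[0, 0]])          # bit 0 embedded, Eq. (5)
S1 = np.diag([S[1, 1] + fb, S[1, 1]])          # bit 1 embedded, Eq. (6)
print(extract_bit(S0, fa, fb), extract_bit(S1, fa, fb))  # 0 1
```

This midpoint test works as long as Fact_α > Fact_β, i.e., α < β, which holds for the tuned values α = 2, β = 4.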


For tampering detection, the extracted watermark W_extracted and the regenerated one W_regenerated are compared. A mismatch between these two watermarks indicates that an alteration has occurred.

5 Experimental results

The proposed scheme is tested on various videos. The selected videos include at least one moving object and cover low to high amounts of movement activity. Details of the test videos are given in Table 1: they are test.avi, camera2.avi, video1.avi, foreman.avi, tempete.avi, table.avi and mobile.avi. The first three sequences belong to the PETS benchmark datasets, while the other videos are often used to evaluate previous existing works. The videos differ in number of frames, frame size and frame rate (FPS).

Table 1 Specifications of the used videos for simulation

The performance of the proposed watermarking system is assessed by analyzing its watermark capacity, imperceptibility and robustness. In the following, the evaluation metrics used to measure these properties are introduced, and the obtained results are displayed, discussed and compared with those of other existing approaches.

5.1 Metrics

The watermark capacity is usually quantified by the maximum number of bits that can be embedded in a given frame. According to our embedding algorithm, the watermark capacity C_max per frame is equal to the number of blocks resulting from subdividing Y into 4×4 blocks. Thus, it can be computed via the following equation:

$$ C_{\max} = \frac{h \times w}{B_{size}} $$
(10)

Where h and w are respectively the height and the width of the corresponding Y component, and B_size denotes the number of pixels per block, i.e., 4×4 = 16 in our work. The imperceptibility property is quantitatively scrutinized using the Peak Signal to Noise Ratio (PSNR) as well as the Structural Similarity index (SSIM) [31, 37], while the robustness requirement is examined by computing two metrics, the Normalized Correlation (NC) and the Bit Error Ratio (BER) [16]. The PSNR measures the perceptual quality degradation of the watermarked video after the embedding process with reference to the non-watermarked one. It is calculated by:

$$ PSNR = 20 \times \log_{10}\frac{2^{d}-1}{\sqrt{\frac{1}{h \times w \times c} \times {\sum}_{i=0}^{h}{\sum}_{j=0}^{w}{\sum}_{k=0}^{c}(F(i,j,k)-F^{\prime}(i,j,k))^{2}}} \quad[69] $$
(11)

Where F and F′ are the original host frame and the watermarked one respectively, with a radiometric accuracy of d bits and c channels. For an RGB frame with 256 gray levels per channel, the values of d and c are 8 and 3 respectively. h and w are respectively the height and the width of the corresponding frame.
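A minimal sketch of Eq. (11) for same-shape frames (single- or multi-channel arrays); the function name `psnr` is illustrative:

```python
import numpy as np

def psnr(f, g, d=8):
    """PSNR of Eq. (11) between frames f and g with d-bit radiometric accuracy."""
    mse = np.mean((f.astype(float) - g.astype(float)) ** 2)
    return 20 * np.log10((2 ** d - 1) / np.sqrt(mse))

a = np.zeros((4, 4))
b = np.ones((4, 4))          # uniform difference of one gray level
print(round(psnr(a, b), 2))  # 48.13 dB for 8-bit frames
```

Identical frames give an infinite PSNR (zero MSE), so in practice the metric is only reported for watermarked, i.e., modified, frames.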

The structural similarity index (SSIM) measures the similarity between two images. This metric is based on neighboring pixel dependencies and is computed using the following equation:

$$ SSIM=\frac{{\sum}_{j=0}^{c}(SSIM_{channel})}{3}\quad [69] $$
(12)

Where SSIM_channel is the structural similarity index per channel. It is defined as:

$$ SSIM_{channel} = \frac{(2\mu_{x}\mu_{y}+c_{1})(2\sigma_{xy}+c_{2})}{({\mu_{x}^{2}}+{\mu_{y}^{2}}+c_{1}) ({\sigma_{x}^{2}}+{\sigma_{y}^{2}}+c_{2})}\quad [70] $$
(13)

Where μ_x and μ_y are the average intensities of the original frame channel and the watermarked one respectively, \({\sigma _{x}^{2}}\) and \({\sigma _{y}^{2}}\) are the corresponding intensity variances, σ_xy is the covariance between the original and watermarked frames, and c_1 and c_2 are two factors used as division stabilizers.
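Eq. (13) can be sketched as follows. This is a global (whole-channel) SSIM rather than the usual windowed variant, which is a simplification; the stabilizer defaults assume the standard choices c1 = (0.01·255)² and c2 = (0.03·255)², which the paper does not specify.

```python
import numpy as np

def ssim_channel(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM of Eq. (13) for one channel; c1, c2 stabilize the divisions."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

x = np.linspace(0, 255, 64).reshape(8, 8)
print(ssim_channel(x, x))  # 1.0 for identical channels
```

Per Eq. (12), the frame-level SSIM is then the average of `ssim_channel` over the three RGB channels.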

Normalized Correlation (NC) measures the similarity between the original and the extracted watermarks. The NC value is derived by utilizing (14) given below:

$$ NC=\frac{\sum\limits_{i=0}^{m}\sum\limits_{j=0}^{n}W(i,j)_{original} W(i,j)_{extracted}}{\sqrt{\sum\limits_{i=0}^{m}\sum\limits_{j=0}^{n}W(i,j)_{original}^{2}} \sqrt{\sum\limits_{i=0}^{m}\sum\limits_{j=0}^{n}W(i,j)_{extracted}^{2}}} \quad [66] $$
(14)

Where m and n are the watermark dimensions, and W_original and W_extracted are the original watermark and the extracted one respectively.
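A minimal sketch of Eq. (14) for binary watermark arrays; the function name `nc` is illustrative:

```python
import numpy as np

def nc(w, we):
    """Normalized correlation of Eq. (14) between two watermarks."""
    num = np.sum(w * we)
    den = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(we ** 2))
    return num / den

w = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nc(w, w))      # 1.0: identical watermarks
print(nc(w, 1 - w))  # 0.0: fully complementary binary watermarks
```

NC equals 1 for a perfectly recovered watermark and decreases toward 0 as the extracted bits diverge from the original ones.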

The bit error ratio quantitatively evaluates the accuracy of the extracted watermark. It is the ratio of the number of bits received in error during the extraction process to the total number of bits in the extracted watermark. The BER value is calculated via the following formula:

$$ BER = \sum\limits_{i=0}^{m}\sum\limits_{j=0}^{n}\frac {W(i,j)_{original}\oplus W(i,j)_{extracted}}{m \times n} \quad [66] $$
(15)

Where m × n is the total number of pixels of the watermark, W_original and W_extracted represent the original and the extracted watermark respectively, and ⊕ is the exclusive OR operation.
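Eq. (15) reduces to the fraction of mismatching bits, which can be sketched directly (the element-wise inequality plays the role of the XOR for binary arrays; the name `ber` is illustrative):

```python
import numpy as np

def ber(w, we):
    """Bit error ratio of Eq. (15): fraction of mismatching watermark bits."""
    return float(np.sum(w != we)) / w.size

w = np.array([1, 0, 1, 1])
print(ber(w, w))      # 0.0: no bit errors
print(ber(w, 1 - w))  # 1.0: every bit flipped
```

BER is therefore 0 for a perfect extraction and grows toward 1 as more watermark bits are corrupted, making it complementary to NC.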

For a video, the PSNR, SSIM, NC and BER values are computed as the average of their values over all video frames. For instance, the NC value of a video composed of N_F frames is defined as:

$$ NC_{video} = \frac{\sum\limits_{i=0}^{N_{F}}NC_{F_{i}}}{N_{F}} $$
(16)

Where NF is the total number of frames in the video. \(NC_{F_{i}}\) is the normalized correlation corresponding to the frame number i in the video.

5.2 Configuration of parameters used for experimentation

In our system, three parameters have to be fixed. The first one is the number of frames held in each video sequence yielded after splitting the host video, already denoted by N and used as the first secret key. As highlighted before, this parameter should be properly adjusted to ensure the watermark resilience to non-malicious attacks. In order to avoid watermarking a large number of frames with the same watermark, N is experimentally tuned as:

$$ N= FPS-5 $$
(17)

Where FPS denotes the frame rate in frames per second.

Indeed, Table 2 provides the resulting NC for different N values obtained under a compression attack, which is the most important non-malicious manipulation, applied to several videos. It is clear that (17) yields the N value that ensures the highest NC.

Table 2 NC results for different N values under compression attack

The two other factors are α and β, used in (7) and (8), which control the trade-off between watermark robustness and imperceptibility. Therefore, their suitable adjustment is crucial for the system efficiency. To this end, the PSNR is computed for different (α, β) values. According to the results tabulated in Table 3, the couple (2, 4) exhibits the best PSNR values. Consequently, α = 2 and β = 4 are the values considered for the watermarking process.

Table 3 PSNR results for different (α,β) values

5.3 Capacity results

For each video, the capacity per frame C_max is calculated using (10), and the capacity per video is then deduced by multiplying C_max by the number of frames in the given video. According to the values reported in Table 4, the proposed scheme proves its proficiency in terms of capacity. In fact, subdividing the luminance component Y into 4×4 non-overlapping blocks during the embedding process allows scattering a large watermark in each frame.

Table 4 Capacity obtained for the used videos

5.4 Imperceptibility results

The perceptual quality of the proposed scheme is assessed through subjective and objective measures. For the subjective evaluation, non-watermarked frames from some tested videos and their corresponding watermarked versions are shown in Fig. 9. Clearly, no visual artifacts can be observed between the original frames and the watermarked ones.

Fig. 9

Up: original frames a test.avi b foreman.avi c camera2.avi, Down: watermarked frames: d test.avi e foreman.avi f camera2.avi

Concerning the objective evaluation, the PSNR values of the different watermarked videos are calculated and presented in Fig. 10. The resulting PSNR values exceed 37 dB and reach 47 dB, which demonstrates that the proposed scheme preserves the visual quality of the watermarked video. For videos with different textures, however, the PSNR is not a metric that faithfully reflects the visual quality. Hence, the SSIM is also employed as another objective metric, since it is more accurate and consistent than the PSNR. Figure 11 exhibits the resulting SSIM values, which are approximately equal to 1. This confirms that the host video and the watermarked one are visually indistinguishable. Based on both the subjective and the objective evaluation, the watermark is therefore visually transparent, and the proposed scheme meets the imperceptibility requirement of a watermarking system. This high imperceptibility level is reached thanks to the selection of the singular value matrix coefficients as watermark embedding holders.

Fig. 10

Obtained PSNR values of various watermarked videos

Fig. 11
figure 11

Obtained SSIM values of various watermarked videos

5.5 Robustness and fragility results

The effectiveness of the proposed scheme is evaluated against two categories of attacks. The first group focuses on intentional tampering that seeks to change the semantic content of the video frames. The second set contains incidental attacks that preserve the frame semantics. The distinction between intentional and non-intentional modifications is achieved using thresholds. Since the robustness investigation is based on the two metrics NC and BER, two different thresholds are considered, denoted by T_NC and T_BER. In this work, T_NC and T_BER are set to 0.9 and 0.1 respectively.
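The resulting authentication decision can be sketched as follows. The helper name `is_authentic` is illustrative, and the boundary behavior (whether NC = 0.9 exactly counts as authentic) is an assumption, since the paper only states "above" and "below":

```python
T_NC, T_BER = 0.9, 0.1  # thresholds used in this work

def is_authentic(nc_value, ber_value):
    """A sequence is deemed authentic when NC >= T_NC and BER <= T_BER."""
    return nc_value >= T_NC and ber_value <= T_BER

# Values in the ranges reported for compression (benign) and cropping (malicious):
print(is_authentic(0.9975, 0.002))  # True:  incidental manipulation
print(is_authentic(0.704, 0.469))   # False: intentional tampering
```

Both conditions must hold simultaneously; a failure of either metric flags the sequence as maliciously tampered.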

The set of incidental attacks includes compression, noise addition, and brightness and contrast changes with moderate ratios. Experimental results presented in Tables 5 and 6 demonstrate that the detector is able to successfully retrieve the hidden watermark from the compressed watermarked videos. Indeed, the obtained NC values reach 0.9975 and the BER values are close to 0. The resulting NC and BER values are respectively above and below their thresholds, which indicates that no malicious distortion has occurred. This high resilience is provided by the selection of the mid-frequency sub-bands of the discrete wavelet transform as locations for the watermark embedding, a choice that avoids potential information loss during compression.

Table 5 NC values obtained for the used videos under non-intentional manipulations
Table 6 BER values obtained for the used videos under non-intentional manipulations

Furthermore, the robustness of the proposed watermarking system is investigated in the presence of Gaussian noise and salt-and-pepper attacks. As seen from the simulation results tabulated in Table 5, the minimum obtained NC is 0.92471 after adding zero-mean white Gaussian noise with various variances, and 0.95386 after conducting the salt-and-pepper attack. As shown in Table 6, the maximum BER remains below 0.1. These results indicate that the obtained NC values are superior to the relative threshold T_NC and the BER values do not exceed T_BER. Hence, the watermark can be correctly extracted after applying white Gaussian noise and salt-and-pepper noise to all the watermarked video frames. Using the discrete wavelet transform, which is immune to noise addition, improves the robustness of the scheme against these two manipulations. Analyzing the results presented in Table 7, it can be noticed that the proposed technique efficiently survives the adjustment of both brightness and contrast, since the obtained NC and BER values are respectively above and below their predefined thresholds. In fact, using moderate ratios does not affect the frame semantic content.

Table 7 NC and BER values obtained for the used videos under brightness and contrast adjustment

The effectiveness of the proposed watermarking scheme is assessed against intentional manipulations, namely rotation, cropping, filtering, object removal, object insertion and high variation of the brightness and contrast levels.

First, each frame is rotated by different angles. It is observed from Tables 8 and 9 that the BER varies between 0.46031 and 0.51988 and the NC ranges between 0.65343 and 0.71645 for all tested videos when varying the rotation angle from 5° to 90° in steps of 5°. From these results, it is clear that the NC is below 0.9 and the BER is above 0.1. Hence, we conclude that the watermarked video is deliberately tampered.

Table 8 NC values obtained after rotation attack
Table 9 BER values obtained after rotation attack

In addition to rotation, the tested videos are subjected to cropping with different window sizes. Results are depicted in Table 10. In this case, the maximum NC and the minimum BER are 0.70400 and 0.46938 respectively. From these results, it is noticed that the detector fails to recover the embedded watermark, since the achieved BER values are far above the threshold 0.1 and the NC values are below the relative threshold 0.9. Afterwards, the sensitivity to frame filtering is tested: the watermarked video frames are subjected to a median filter with window sizes of 2×2 and 3×3. The NC and BER values corresponding to these manipulations are given in Figs. 12 and 13. It is observed that the BER values are greater than 0.4 and reach 0.51528. In addition, the NC values lie between 0.68971 and 0.72964. Comparing these results to the preset thresholds T_NC = 0.9 and T_BER = 0.1, it is evident that the watermarked videos are regarded as non-authentic.

Fig. 12

NC values obtained for the used videos after median filtering attack

Fig. 13

BER values obtained for the used videos after median filtering attack

Table 10 NC and BER values obtained for the used videos under cropping attack

The next considered malicious attacks are object deletion and insertion. These two attacks are among the common tamperings that must be detectable by an efficient semi-fragile watermarking scheme, notably in the video surveillance context. Therefore, an object is intentionally removed from watermarked frames of randomly selected sequences of the test videos. For illustration, Fig. 14 depicts an example of a watermarked frame and its maliciously tampered version from one of the used videos. The resulting values of the two authentication metrics BER and NC for the test.avi and camera2.avi videos are presented in Tables 11 and 12 respectively. As can be seen from these results, the minimum BER is higher than the threshold 0.1 and the maximum NC is lower than the preset threshold 0.9 for the two considered videos. Therefore, the watermarking scheme proves its ability to successfully detect this malicious tampering. Likewise, to test the object insertion attack, watermarked frames of arbitrarily chosen sequences are corrupted by introducing an external object into their visual content, as shown in Fig. 14. As seen from Tables 13 and 14, which summarize the resulting measurements for the two previously used videos, the obtained BER values are high whereas the NC values are low. Thus, the watermarked videos are deemed maliciously attacked.

Fig. 14

test.avi video: a watermarked frame b attacked frame after object insertion c attacked frame after object deletion

Table 11 NC and BER values obtained for test.avi under object deletion attack
Table 12 NC and BER values obtained for camera2.avi under object deletion attack
Table 13 NC and BER values obtained for test.avi under object insertion attack
Table 14 NC and BER values obtained for camera2.avi under object insertion attack

Finally, the fragility of the proposed scheme to brightness change and contrast adjustment with high variation ratios is checked. Unlike brightness and contrast variations with moderate ratios, these modifications are identified as malicious because they allow an attacker to hide several semantic details of the frames. Figure 15 displays examples of watermarked frames after being attacked by strongly increasing and decreasing the luminance and contrast levels. The BER and NC values tabulated in Table 15 indicate that the videos are unauthentic for the different ratios. Again, the detector properly identifies the intentional tampering.

Fig. 15

test.avi: video a watermarked frame b attacked frame after brightness increasing c attacked frame after brightness decreasing

Table 15 NC and BER values obtained for the used videos under brightness and contrast varying attacks with high ratios

5.6 Comparison of our proposed scheme with existing authentication approaches

The performance of the proposed technique is compared to the existing works presented in [10, 15, 25, 51, 59, 67] with respect to watermark capacity, imperceptibility and robustness. As previously described in Section 3, [10, 15, 25] are mono-frequency-domain approaches involving the DCT, QDCT and LWT respectively, whereas our proposed scheme as well as those presented in [51, 59, 67] operate in the multi-frequency domain. In fact, our technique and [59] jointly involve the SVD and the DWT for the watermarking, while [51, 67] use the combinations (DWT, PCA) and (SVD, MR-SVD) respectively. The comparison between the proposed method and [15] is provided in Tables 16 and 17: Table 17 presents a comparison in terms of robustness, while Table 16 depicts a comparison in terms of the capacity and imperceptibility requirements. Referring to Table 17, it can be seen that our scheme, which exhibits the lowest BER values, performs better than the method in [15] under Gaussian noise, salt-and-pepper, compression and brightness variation attacks. Similarly, the values of the quality measure PSNR reported in Table 16 indicate that our technique noticeably outperforms the watermarking technique of [15] with respect to the capacity and imperceptibility requirements. Indeed, the proposed scheme provides an average PSNR of 47.727 dB while offering a watermarking capacity four times greater than that of the aforementioned method. This demonstrates that the watermark embedding holders in our work are selected correctly and ensure a good watermarked video quality despite the high capacity.

Table 16 Capacity and imperceptibility comparison between our method and the work proposed in [15]
Table 17 Robustness comparison between our method and the scheme proposed in [15] (BER)

The comparison between our scheme and those in [10, 25, 51, 59, 67] is given in Tables 18 and 19. Analyzing Table 19, it can be seen that the robustness of our scheme against the Gaussian noise attack is superior to that of [10, 25, 51, 67]; however, the method in [59] provides an NC value slightly better than our approach. The same table shows that our technique and those in [10, 25, 51, 59] are resilient to the salt-and-pepper attack, with our method and the scheme presented in [51] ensuring the best performance. Regarding compression, the scheme introduced in [10] and the proposed one show a comparable robustness level, whereas the methods presented in [51, 67] provide poor resilience to this attack. From the same table, it can also be inferred that our technique is more robust to contrast adjustment than the methods in [59, 67].

Table 18 Capacity and imperceptibility comparison for Foreman video between our method and the works proposed in [10, 51, 59, 67] and [25]
Table 19 Robustness comparison for Foreman video between our method and the works proposed in [10, 51, 59, 67] and [25] (NC)

As far as imperceptibility is concerned, the watermarking approaches in [10, 25, 51, 67] are more imperceptible than the proposed one because the capacity of the present scheme is noticeably high in comparison with the methods in [10] and [51], as shown in Table 18. Besides, the two previously cited approaches and the ones introduced in [25, 67] use watermark holder selection strategies; consequently, the video perceptual quality is only slightly affected, since very few frames or blocks are chosen for the watermark embedding process and the remaining frames and blocks are left unused. Moreover, the proposed method exhibits a better imperceptibility level than [59], as shown in Table 18.

6 Conclusion and future works

In this paper, a blind semi-fragile watermarking scheme for video content authentication in the hybrid SVD-DWT domain was proposed. The scheme starts with a watermark generation process built on features extracted from regions of interest and the QR code technique. After being encrypted by the Arnold transform, the authentication watermark is embedded into the singular value matrix coefficients relative to the mid-frequency sub-bands of the discrete wavelet transform. Involving these sub-bands in the watermarking lessens the visual degradation effect while ensuring a high resilience to common image processing attacks. On the verification side, a blind detection is performed to extract the hidden watermark, which is compared to the regenerated one in order to detect occurred forgeries. Results of simulation experiments conducted on various surveillance videos as well as standard ones show that the proposed semi-fragile watermarking scheme is able to differentiate intentional attacks from non-intentional ones. In fact, the achieved NC and BER values, which are above 0.9 and below 0.1 respectively, prove that our detector withstands moderate content-preserving modifications such as common image processing. Conversely, it exhibits a high fragility to semantic-content-changing alterations such as cropping and object manipulations, yielding NC and BER values far below 0.9 and far above 0.1 respectively. Moreover, the proposed scheme successfully satisfies the trade-off between capacity and imperceptibility by achieving a large capacity with negligible perceptual quality degradation, as shown by the high obtained PSNR and SSIM values. Future work may focus on tampering localization and self-recovery, which consists in recovering the original content within the tampered areas.
In addition, the fragility of the proposed watermarking scheme to spatio-temporal attacks can be improved by exploiting other pertinent features during the watermark generation process.