1 Introduction

In recent years, the deployment of digital videos is proliferating rapidly. In fact, this multimedia medium is applied in different applications such as intelligent surveillance, event of interest detection and future behaviors inspection. Besides, videos often serve as criminal evidences and insurance claims especially in surveillance context. Hence, stored videos credibility and authenticity must be guaranteed. Tackling this issue is a challenging concern notably with the multimedia technologies tremendous development. Indeed, the emergence of advanced editing software makes it increasingly easy for unauthorized users to forge video content without a noticeable perception quality degradation. Therefore, reliable content authentication procedure becomes an inevitable need in the digitally advanced community. Digital encryption, which converts the original video content into unreadable version, has been widely used as a solution to protect videos authenticity and trustworthiness [13, 19, 48]. However, this traditional authenticity protection technique exhibits many shortcomings. As first, preserving the actual video content is one of the most important requirements in practical applications due to the fast transmission [47]. Moreover, encryption methods do not enable to accurately locate tampered areas [49].

To address these defects, digital watermarking has been introduced and has become an important research discipline in the security field [16, 21, 39]. A watermarking system usually consists of two processes: an embedding process and an extraction one. The first procedure encapsulates a secret information referred as watermark into the host carrier content. The to-be-hidden watermark may be an image, a random binary sequence or a message. The extraction process allows recovering the inserted information from the watermarked content [21]. Watermarking techniques are widely exploited in many applications such as copyright protection, broadcast monitoring and data authentication [4]. Generally, a watermarking scheme is acknowledged in terms of three main properties namely (i) robustness: the hidden watermark should be resilient to several attacks; (ii) imperceptibility: the watermark embedding should not yield any conspicuous difference between the original content visual quality and the watermarked one; (iii) capacity: it determines the data amount that can be embedded into the host multimedia content [11, 24]. Obviously, the aforementioned characteristics are antagonist. In fact, increasing the capacity degrades significantly the watermarked content quality as well as diminishes the watermark robustness against attacks. Thus, an effective watermarking system must assure, depending on the dedicated application, a reasonable compromise between these properties [22].

Based on the robustness criterion, watermarking algorithms are split into three classes: robust, fragile and semi-fragile watermarking schemes [4]. Robust schemes have the ability to recover the watermark after undergoing modifications by any attack. Conversely, the fragile watermarking systems are designed to be susceptible to any small distortion. Semi-fragile watermarking is known as a robust and fragile algorithms modified version. For this watermarking type, the discrimination between malicious and non-malicious attacks is an important requirement. Indeed, semi-fragile schemes are introduced to allow legitimate manipulations including noises addition and compression while rejecting malicious distortions such as frames content changing by object deletion or insertion [22]. The main advantage of semi-fragile watermarking adoption in video content authentication purpose is that it jointly provides a greater potential for tamper modifications characterization and a good resilience level to certain manipulations especially the compression process, which represents a demanding feature in practical applications.

In addition to the robustness criterion, blindness is also commonly used to classify the watermarking systems as non-blind, semi-blind and blind ones [23]. A non-blind scheme necessitates the knowledge of the original video as well as the original watermark in order to extract the concealed information from the watermarked content. Actually, non-blind watermarking algorithms are not appropriate for practical applications where the non-watermarked cover video version is not available at the decoder. As far as semi-blind systems are concerned, only the original watermark is required during the extraction process. Besides, it is stated that these two categories, i.e., the non-blind and semi-blind schemes, lead for several information storage and bandwidth consumption issues as the original video version and/or the secret signature need to be saved and transmitted along with the watermarked sequence. Lastly, neither the host video nor the original watermark are required to successfully extract the hidden information by a blind detector. This characteristic makes the blind scheme gaining more popularity compared to the other two watermarking classes as it avoids the storage and bandwidth overload problems.

In this work, a blind semi-fragile dual watermarking scheme for video content authentication with a high robustness level to common processing, an efficient intentional attacks localization ability and a good watermarked video quality is proposed. The developed technique involves simultaneously the Least Significant Bit (LSB) technique, Torus automorphism mapping, the Lifting Wavelet Transform (LWT) and the Singular Value Decomposition (SVD). The remainder of this paper is framed as follows. A review on video watermarking techniques is provided in Sect. 2. LWT, SVD, Torus automorphism mapping and texture features are briefly introduced in Sect. 3. Section 4 explains in detail the proposed scheme and the associated processes. Performance evaluation results and comparative analysis are investigated in Sect. 5. Finally, Sect. 6 deals with this work conclusions and futures perspectives.

2 Related Works

As reported earlier, video watermarking has become an interesting research focus in the information security field in last decades. Thereby, various video watermarking techniques have been proposed in the literature. These techniques often differ in their design strategies, although they rely on some similar features. In fact, according to the working domain, acknowledged also as the embedding domain, watermarking methods are set into the spatial, the frequency, or the hybrid techniques [30].

Watermarking techniques that perform in the spatial domain conceal the watermark bits by directly adjusting the host video frames pixels values. In [3], a spatial domain-based watermarking scheme for content authentication and tamper detection is presented. In order to encapsulate the watermark, the authors use LSB technique, which consists in substituting the least significant bits of the original video frame pixels with the watermark bit. Simulations results show that this technique gives a good imperceptibility and an efficient forged areas detection. Mohammed A.A et al. introduce another video watermarking scheme performing in the spatial domain [35]. In this algorithm, the watermark is inserted via an additive method in selected lowest pixels values from the luminance component of the host frame. According to test results, this scheme has a high compression resilience with a good preserved watermarked video quality. In [36], a watermarking algorithm is proposed to verify the digital video content integrity and authenticity. In order to fulfill security requirement, the watermark is ciphered by XOR-ing it with an arbitrary image generated using Arnold Cat Map [1]. Then, the encrypted watermark is embedded following a LSB pixel-wise algorithm. Simulation experiments demonstrate the ability of this scheme to identify the modified regions in maliciously manipulated frames. The video watermarking proposed by Guangxi et al. in [8] is executed in the spatial domain by fine-tuning the luminance component pixels values. This method includes an embedding area selection strategy based on luminance adaptive and edge mask. The proposed approach can withstand a variety of attacks such as compression and scaling. Besides, it achieves a good imperceptibility. Watermark embedding in the spatial domain is recognized with its simplicity and inexpensive computational complexity. Furthermore, it does not lead to a significant distortion to the cover video quality. As a result, this technique is more convenient for applications in low computing power environment.

In contrast to the former, frequency watermarking schemes initially convert pixels values into the transform domain using a specific transformation. Then, the watermark is embedded into certain frequency bands coefficients. A wide range of transformations, including but not limited to Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), LWT and SVD, are extensively applied in the watermarking field. A watermarking scheme involving the DWT is proposed in [41]. For transparency improvement, the watermark is infused only on a single frame per scene. In fact, this watermarking methodology applies a scene-change detector to identify the scene-changed frames that are used as watermark information carriers. Then, these latter are decomposed using the 3 levels DWT and the to-be-hidden watermark is inserted in the LL sub-band coefficients. This approach can successfully sustain different manipulations. Furthermore, it yields watermarked videos with a good quality. Another DWT-based semi-fragile watermarking approach for video authentication is proposed in [46]. The authentication-based watermark used in this method is a noise-like image generated by arranging a random binary sub-sequence bits in the spectral domain. The built up watermark is embedded in the DWT frequency domain using a frame-by-frame watermarking algorithm. Based on the experiments results, the suggested design is convenient for solving authentication task as well as localizing the tampered frames.

Ponni and Ramakrishnan introduce a video watermarking combining the DWT and the SVD [40]. They use a scene change detection technique to identify key scenes that exhibit frequent changes. Later, key frames are chosen based on Fibonacci sequence and used as watermark information holders. The insertion is fulfilled by embedding the singular value of the watermark, which is encrypted using Fibonacci–Lucas transform [51], in the selected frames singular values. This methodology is robust against various attacks while achieving a good imperceptibility level. A modified version of the above-described scheme is proposed in [38]. Likewise, this technique performs in the transform domain using the DWT and the SVD. This scheme differs from the method in [40] by embedding the principal components of the watermark instead of the singular values in the specific selected frames. This technique satisfies the robustness and imperceptibility aspects. Relying on the same general concept, Abdulla and Navas advance a semi-blind watermarking scheme in multi-frequency domain [2]. In this methodology, only the lower entropy frames are used as watermark holders in order to reduce the yielded visual distortion. Under this framework, two mathematical transforms namely 2-level DWT and SVD are jointly exploited through an additive method to conduct the insertion function. Based on the simulations outcomes, this system successfully preserves the perceptual quality of the watermarked videos. Furthermore, it performs robustly under numerous manipulations.

Bhardwaj et al. present a video watermarking system in the LWT domain [6]. In this scheme, the authors exploit the mathematical relationship among the frames number in the cover video, the coefficient block size and the capacity to select a few frames for the insertion process. LWT is performed to each selected frame luminance, which is decomposed into different frequency sub-bands. Then, the watermark is injected into the lower frequency band via a quantization method. This scheme achieves a noteworthy robustness and imperceptibility enhancement. Another frequency video watermarking approach using SVD and Multiresolution Singular Value Decomposition (MSVD) is employed in [37]. Only significant frames, which are identified based on motion estimation, are chosen to host the to-be-hidden information. This latter is firstly scrambled and then, blindly inserted using a Quantization Index Modulation-based algorithm [7]. According to the experimental findings, this method gives a very good imperceptibility as well as a high robustness to different geometric and image processing-based attacks.

A semi-fragile watermarking system for video content authentication is provided in [15]. The used authentication signature in this algorithm is composed of the timing information and invariant features, which are extracted by a spatial analysis of macroblocks. The watermark embedding is executed by flipping the sign of nonzero QDCT (Quantized DCT) coefficients in a set of random selected Group of Pictures (GOP). The prime advantage of this methodology is that it is insensitive to mild processing while being sensitive to intentional distortions. Likewise, another semi-fragile watermarking approach using DCT is suggested in [10]. The frame number and the relationship among the DCT nonzero coefficients are used to generate the watermark, which is concealed in the medium frequency sub-bands coefficients. This algorithm has a negligible effect on the watermarked video quality. Besides, it shows a good immunity against non-malicious manipulations. Similarly, a chromatic residual DCT-based watermarking scheme is presented in [45]. In this system, an authentication data are generated through the intra-prediction mode categorization. Actually, the chroma blocks with the higher number of nonzero (NNZ) items are selected as watermark holders. In fact, the watermark is hidden within the mid-frequency quantized AC coefficients of chromatic \(4\,\times \,4\) sub blocks. The simulations results demonstrate that this method is resilient against compression while exhibiting a high sensitivity toward malicious tampering at both the temporal and spatial level. Another frequency domain-based watermarking solution is introduced in [31]. Herein, the watermark holders are chosen tacking into account the human visual system specifications. Then, a three-dimensional discrete cosine transform-based embedding algorithm is involved to hide the watermark information in the selected host blocks pixels. The obtained results show that this approach offers a reasonable sustainability level as well as good watermarked videos quality. Although frequency domain-based techniques are high in computation, they exhibit an efficient robustness to a wide variety of attacks.

The third category of watermarking is hybrid domain-based techniques. The watermark embedding is fulfilled combining the two areas viz. the spatial domain and the frequency one. In [28], authors use both of LSB and DCT to develop a new video watermarking scheme. A first watermark is concealed in the blue channel of the host frame using LSB as spatial domain-based embedding technique. Then, two levels DCT is applied to the initially watermarked frame and next, the second watermark is inserted into each transformed coefficients. This method shows a high invisibility level and a good resilience to several manipulations. Another hybrid video watermarking algorithm, which combines three transformations together, i.e., LSB, DWT and DCT, is presented in [14]. A first mark is doubly inserted in the host frame using DCT and LSB-based embedding techniques. The final watermarked frame is obtained by hiding a second watermark in the initially marked one following an embedding algorithm in the DWT domain. Evaluation results ascertain the robustness and the invisibility criteria of this scheme. Kerbiche et al. develop a similar video watermarking scheme, which jointly uses DWT, SVD and LSB [30]. In this methodology, crowdsourced regions and moving objects, both generated from the mosaic host video frames, are adopted as watermarks carriers. Since two signatures are considered, a DWT-SVD-based insertion algorithm as well a LSB based one are carried out during the embedding process. This approach has a high robustness and a good imperceptibility.

Algorithms based on hybrid domain are more suitable for watermarking schemes that involve multi-watermarks embedding because they guarantee high imperceptibility and robustness levels as well as a less computational complexity. For this reason, the dual watermarking approach proposed in this paper operates in the hybrid domain. Indeed, LSB-based embedding method is used as a combination with LWT-SVD based one to carry out the insertion of two distinct watermarks in each host video frame.

3 Preliminaries

LWT, SVD, texture features and Torus automorphism mapping are simultaneously utilized as fundamental components to build up the proposed watermarking scheme. A brief description of each of these components is given in the following subsections.

3.1 Lifting Wavelet Transform (LWT)

The lifting wavelet transform was initially introduced by Sweldens as an alternative concept for traditional DWT. This domain transform technique produces reversible integer wavelet and scaling coefficients rather than floating-point ones. Hence, it avoids aliasing effects and information loss. Allowing a wavelet transform fully in-place calculation, lifting scheme necessitates the half computations number that is involved in conventional wavelet transform.

Fig. 1
figure 1

Lifting wavelet transform application to Barbara image

The signal decomposition into frequency sub-bands by LWT is achieved using iteration operations of three fundamental steps that are termed as splitting, prediction and update. Readers can refer to [12, 44] for the detailed lifting scheme mathematical description. For visual perception, Fig. 1 depicts Barbara image decomposition using lifting scheme. LWT is known for its attractive properties including a good robustness to noise addition, a less computation complexity, a reduced distortion and a better frequency localization [34]. These features make LWT suitable for several applications such as denoising [9], features extraction [25] and watermarking [6].

3.2 Singular Value Decomposition (SVD)

SVD is a well-known algebraic numerical tool used for complex matrix decomposition. First, SVD was proposed by Beltrami and Jordan only for square matrices, and then, Eckart and Yong have generalized this technique to handle rectangular matrices [33]. For any matrix M of an arbitrary size m*n, the singular value decomposition can be expressed using the following formula:

$$\begin{aligned} M = U \times S \times V^t \end{aligned}$$
(1)

where S is a diagonal matrix that contains only positive values called the singular values and arranged in descending order. U and V are orthogonal matrices termed as the left-hand side singular vectors and the right-hand side ones, respectively. In image analysis, singular values matrix defines the image luminance, while the corresponding singular vectors pair describes the image geometry. SVD becomes an interesting algebra transform for signal processing area owing to its prominent properties such as its maximum energy packing, multivariate analysis and adaptation to distortions ability [23, 27, 29]. It is worth mentioning that after applying SVD the obtained U matrix holds the main and greatest coefficients values and it exhibits two interesting characteristics [5]. Indeed, all elements in its first column have the same sign. Moreover, these elements are strongly correlated.

3.3 Texture Features

Texture is recognized as an important image descriptor that is broadly applied in several pattern recognition and computer vision applications. Up to now, no unique texture definition has been proposed by researchers. However, it can be defined as a set of complex visual patterns. These latter are composed of spatially arranged entities having color, brightness, size or shape as well as distinguished by heaviness, uniformity and smoothness [42]. Several texture features are proposed in the literature. Energy, entropy and kurtosis are among the commonly texture features used to classify image as a single entity or its blocks as sub-entities. Energy, calculated using (2), is among the statistical measures that provide most meaningful texture information [17].

$$\begin{aligned} {\text {Energy}} = \sum _{i=1}^{n}\sum _{j=1}^{m}{I}^{2}(i,j) \end{aligned}$$
(2)

where I is an image of \(m {*} n\) size. The energy amount describes the uniformity of the distribution. In a blocks cluster, textured blocks are identified by higher energy values compared to their neighbors, while non-textured ones are recognized by lower energy values.

Entropy is a statistical metric, which measures the randomness of image content dispersion. It can be expressed using the following formula:

$$\begin{aligned} {\text {Entropy}} = -\sum _{i=1}^{255}p(x_{i})\log _{2} p(x_{i}) \end{aligned}$$
(3)

where p(\(x_{i}\)) is the symbol \(x_{i}\) occurrence of probability. Entropy is an efficient indicator for image information magnitude [32]. High entropy value reveals a randomly dispersed intensities distribution; thus, it indicates a textured block. In contrast, low entropy value is associated with uniform intensities distribution through untextured block.

Kurtosis represents the normalized fourth-order moment [20]. It measures the flatness of a distribution. Textured blocks, which have a significant deal of information, are identified by low kurtosis values. Contrariwise, high kurtosis value designates a non-textured block. The kurtosis mathematical expression is given in (4).

$$\begin{aligned} {\text {Kurtosis}} = \sigma ^{-4}\sum _{i=1}^{255}(x_{i}-\mu ) ^{4}p(x_{i})-3 \end{aligned}$$
(4)

where p(\(x_{i}\)) is the symbol \(x_{i}\) occurrence of probability, \(\mu =\sum _{i=1}^{n}x_{i}p(x_{i})\) is the pixels mean value and \(\sigma =\sqrt{(x_{i}-\mu )^{2} \times p(x_{i})}\) is the variance square root.

3.4 Torus Automorphism Mapping

Torus automorphism mapping represents a chaotic map version that allows spatially dispersing pixels locations [18]. In this technique, for each point only one unique mapping point is assigned. Torus automorphism mapping expression corresponding to one dimension sequence is defined as:

$$\begin{aligned} X_\textrm{mapped}= (K \times X)\mod (L) +1 \end{aligned}$$
(5)

X and \(X_\textrm{mapped}\) represent the bit positions before and after the Torus automorphism mapping. L is the sequence length, and K denotes the mapping secret key. Torus automorphism is commonly adopted as content scrambling method especially in watermarking application to enhance the embedded watermark security [22, 32].

4 Proposed Approach

The proposed approach is a semi-fragile dual video watermarking system devoted for content authentication and tampering detection and localization. As shown in Fig. 2, this scheme consists of three processes: the watermarks construction, the watermarks embedding and the detection.

In particular, the main contributions in this work are outlined as below:

  1. 1.

    A content-based watermarks generation procedure is proposed. Indeed, in the advanced dual watermarking technique two distinct authentication information are designed using frame content characteristics and then, individually encapsulated in each host video frame. The built up content-based watermarks permit to fulfill the tasks of (i) discrimination between intentional manipulations and non-intentional ones for a faithful authenticity verification, (ii) the detection of inter-frames as well as intra-frame forgeries and (iii) the localization of maliciously tampered areas.

  2. 2.

    To improve the overall proposed scheme imperceptibility attribute, a texture analysis-based block classification method is introduced. This technique permits to optimally choose the watermark holders in concordance with the given video frame characteristics and thereby to lessen the visual artifacts introduced by the embedding process.

  3. 3.

    To proficiently establish the trade-off between the different semi-fragile watermarking requirements, a hybrid domain-based watermarks embedding procedure is proposed. In fact, a LSB-based insertion method is employed to successfully hide the first watermark in the spatial domain. However, the second watermark is embedded using a multi frequential algorithm utilizing both of the LWT frequency bands specifications and the SVD coefficients characteristics to strengthen the scheme performance. The established embedding algorithms not only permit to successfully infuse the watermarks data in the host video frames but also to guarantee a blind detection at the receiving side. Indeed, the proposed detector is fully capable of blindly extracting the two-hosted watermarks. This allows avoiding information storage and bandwidth consumption issues as neither the unwatermarked video version nor the secret signature need to be saved and transmitted along with the watermarked sequence.

  4. 4.

    To boost the watermark ability to precisely locate spatial distortions, a glide window-based detection strategy is proposed. It allows investigating the inconsistency between the extracted and the reconstructed second watermark versions for a rigorous content authentication with an accurate tamper localization.

The watermarks construction, embedding and detection processes are separately explored in the incoming subsections.

Fig. 2
figure 2

Overall flow diagram of the proposed video watermarking approach

4.1 Watermarks Construction Process

The proposed scheme is a dual watermarking scheme, in which two watermarks are tailored for each video frame, as illustrated in Fig. 3, to detect and locate each tampering type.

Fig. 3
figure 3

Overall flow diagram of the proposed watermarks construction process

In fact, video forgery techniques are often classified to two sets namely intra-frame and inter-frames forgery [43]. The former, also termed as spatial tampering, refers to the host frame content modification such objects manipulation through object addition or deletion. Inter-frames forgeries, also called temporal tampering, designate a manipulation conducted at time level such as frames dropping, replacing or swapping.

The watermark construction process starts by dividing the host video into RGB frames. Subsequently, the primary watermark is built up using the current frame index as timing information. Indeed, the frame index is modulated to a binary sequence for convenient insertion. Then, it is served as watermark to detect and locate temporal tampering.

Next, a block-based technique is adopted to generate the second watermark, which is used to reveal spatial tampering. First, each RGB frame is converted to YCbCr space color. Y component, which allows ensuring the best compromise between the robustness and the imperceptibility, is segmented into non-overlapping blocks of 16*16 size. Following, a texture analysis is applied to each block. In fact, each 16*16 block is further decomposed into four sub-block of 8*8 size. Then, three texture features including the energy, entropy and kurtosis are investigated to evaluate the texture aspect of the segmented sub-blocks as follows:

step1: For each sub-block among the four, ones associated with the same block, calculate the energy, entropy and kurtosis values denoted, respectively, by \(V_\textrm{energy}\), \(V_\textrm{entropy}\) and \(V_\textrm{kurtosis}\).

step2: Calculate \(Mean_\textrm{energy}\), \(Mean_\textrm{entropy}\) and \(Mean_\textrm{kurtosis}\) the average of each feature values.

step3: Associate a Boolean variable B to each feature and compute its value based on the following rules:

If \(V_\textrm{energy} > Mean_\textrm{energy}\): \(B_\textrm{energy}=1\); Otherwise, \(B_\textrm{energy}=0\)

If \(V_\textrm{entropy} > Mean_\textrm{entropy}\): \(B_\textrm{entropy}=1\); Otherwise, \(B_\textrm{entropy}=0\)

If \(V_\textrm{kurtosis} < Mean_\textrm{kurtosis}\): \(B_\textrm{kurtosis}=1\); Otherwise, \(B_\textrm{kurtosis}=0\)

step4: Identify the sub-blocks texture status based on the rules illustrated in Table 1.

Table 1 Used rules for sub-blocks texture identification

After classifying all the four sub-blocks based on texture aspect, the corresponding block is regarded as textured block if it holds more than one textured sub-block and thereby, the current generated watermark bit is 1. Otherwise, it is labeled as non-textured block and the corresponding watermark bit is 0. All the generated bits are combined in one-dimensional binary sequence, which is encrypted using Torus automorphism mapping to further improve the scheme security aspect. Finally, the scrambled sequence is used as a second watermark.

4.2 Watermarks Embedding Process

As mentioned previously, two different watermarks are embedded in each video frame. Since the proposed watermarking scheme is a hybrid domain-based approach, the watermarks embedding is conducted in both spatial and frequency domains. The embedding process flow diagram is given in Fig. 4.

Fig. 4
figure 4

Overall flow diagram of the proposed watermarks embedding process

To begin the process, the video is decomposed into consecutive RGB frames. To further satisfy the trade-off between the imperceptibility and the robustness requirements, the RGB frame is converted to the YCbCr color model as its components are less correlated than the RGB ones. To improve the imperceptibility aspect, among the constituent channels, only the luminance component Y is implied in the watermarking by virtue of the fact that it provides a high relevancy to the human visualization capacity. Explicitly, the human visual system is less sensitive toward distortions at luminance level compared to modifications in chrominance components (Cb and Cr). In addition, the involvement of the Y band in the watermarking further strengthens the scheme sustainability owing the fact that the luminance element is lenient against different attacks particularly compression operations. Hence, the primary watermark is hidden in Y component pixels using the LSB-based insertion method where the least significant bit of pixel is substituted by the current watermark bit.

Thereafter, the second watermark is embedded inside of the initially watermarked Y component. Indeed, this latter is split into 16*16 blocks as illustrated in Fig 4. Then, every bit in the second watermarked is expanded into three copies which are inserted in the same block. Indeed, each block is divided in four equal-sized sub blocks, which are classified into textured and non-textured sub-blocks based on texture analysis as already discussed in Sect. 4.1. The three most textured sub-blocks are selected as the best embedding locations for the three expanded similar bits. The explanation behind watermarking the strongly textured sub-blocks is that the human visual system (HVS) is less sensitive to changes in textured areas that hold many details rather than in non-textured ones.

Taking into account that increasing the decomposition level can induce serious perceivable distortions to the final watermarked video quality as well as and significantly reduce the embedding payload capacity, a single level LWT is applied to transform the selected candidate sub-blocks to the frequency domain. This operation generates four different frequency sub-bands termed as low-frequency sub-band (LL), high-frequency sub-band (HH) and mid-frequency sub-bands (LH) and (HL). Inserting a mark in LL sub-band, which contains the greatest frame energy amount, guarantees a good robustness, whereas it yields an apparent distortion in the video watermarked quality. In contrast, involving the high-frequency sub-band provides high perceptual quality while making the watermark fragile to attacks. Among the two mid-frequency bands, HL is more susceptible to human visual system (HVS). Thereby, LH is recognized as the most appropriate sub-band for the embedding since it successfully ensures a favorable balance between the mutually inversely related transparency and sustainability aspects.

For enhancing the proposed scheme performance, the LWT is associated with SVD. Thus, the selected sub-band (LH) is factorized to the product of three matrices namely U, S and V using SVD. As already highlighted in Sect. 3.2, the U matrix first column coefficients exhibit a stable relationship by having the same sign and being strongly correlated values. Besides, it holds the main and greatest coefficients values. In the proposed scheme, these attractive properties are exploited to establish an embedding algorithm that permits to increase the robustness and provide further coherence with the human visual system as well as guarantee a blind detection at the receiving side. In order to find the most correlated two elements among the U matrix first column coefficients, the average of difference between each couple of coefficients values, denoted by \(D_\textrm{avg}\), is calculated for different videos and summarized in Table 2. As can be noticed from this table, U(2,1) and U(3,1) are the closet coefficients since they exhibit the less difference of values. Hence, the watermark bits are encapsulated within the U matrix by adjusting these two values via the formulas below:

If \(W_\textrm{embedding}=0 \)

$$\begin{aligned} \left\{ \begin{array}{lcl} U_\textrm{watermarked}(2,1) = sign(U(2,1)) \times |U_\textrm{avg}+T |\\ U_\textrm{watermarked}(3,1) = sign(U(3,1)) \times |U_\textrm{avg}-T |\\ \end{array} \right. \end{aligned}$$
(6)

Else

$$\begin{aligned} \left\{ \begin{array}{lcl} U_\textrm{watermarked}(2,1) = sign(U(2,1)) \times |U_\textrm{avg}-T |\\ U_\textrm{watermarked}(3,1) = sign(U(3,1)) \times |U_\textrm{avg}+T |\\ \end{array} \right. \end{aligned}$$
(7)

Where U(2,1) and U(3,1) are the original values of the second and the third coefficients in U matrix. \(U(2,1)_\textrm{watermarked}\) and \(U(3,1)_\textrm{watermarked}\) are the watermarked values corresponding to the second and the third coefficients in U matrix. \(U_\textrm{avg}\) is the mean between U(2,1) and U(3,1) values computed using (8), and T is the embedding strength factor.

$$\begin{aligned} U_\textrm{avg}= \frac{|U(2,1)+U(3,1) |}{2} \end{aligned}$$
(8)

The next step is to apply the SVD inverse and the LWT inverse to obtain the watermarked sub-block. The three watermarked sub-blocks are merged with the residual non-watermarked one to get the corresponding watermarked 16*16 block. Similarly, watermarked blocks are combined to create the finally watermarked Y component. To reconstruct the RGB watermarked frame, this latter is merged with the non-watermarked chrominance channels, i.e., Cr and Cb and then, converted to RGB space color. Lastly, the above-explained steps are repeated for all the host frames to obtain the watermarked video.

Table 2 The average difference between each couple of coefficients in first column of U component for different videos

4.3 Detection Process

Figure 5 presents the flow diagram of the detection process. It includes three stages: the reconstruction, the extraction and the authentication and tampering localization. The reconstruction phase allows regenerating the watermark, which is used to detect spatial tamper, by applying the second watermark construction steps as described in the Sect. 4.1. \(W_\textrm{reconstructed}\) is the label utilized to denote the obtained watermark. Execution concrete steps of the other two stages, i.e., the extraction and the authentication and tampering localization are introduced in the subsequent subsections.

Fig. 5
figure 5

Overall flow diagram of the proposed detection process

4.3.1 Watermarks Extraction Process

During this process, two watermarks are blindly extracted. This allows avoiding information storage and bandwidth consumption issues as neither the unwatermarked video version nor the secret signature need to be saved and transmitted along with the watermarked sequence.

The extraction process begins with the same watermarks insertion steps as shown in Fig. 5. After partitioning the watermarked video into RGB frames and converting them to the YCbCr color model, Y channel is retained for further processing. To get the primary extracted watermark, the least significant bits are extracted from the Y pixels values and aggregated in one binary sequence, which is later converted to decimal number representing the extracted frame index.

In order to extract the second watermark, the same luminance component Y is decomposed into 16*16 blocks. From each block, which is subdivided into 8*8 sub-blocks, three copies (\(C_{1}\),\(C_{2}\),\(C_{3}\)) for the same watermark bit \(W(i)_\textrm{extracted}\) are extracted. In fact, the watermark holders among the resulting sub-blocks are identified with the same procedure used during the watermark embedding process. Recognized watermark carriers are subjected to LWT following by SVD. Each copy of the watermark bit \(C_{j}\) is extracted from the considered \(U_\textrm{extracted}\) matrix through the following rules:

$$\begin{aligned} \left\{ \begin{array}{lcl} C_{j} = 0 \quad If\quad U_\textrm{extracted}(2,1) > U_\textrm{extracted}(3,1)\\ \\ C_{j} = 1 \quad {\text {Otherwise}} \end{array} \right. \end{aligned}$$
(9)

Where \(U_\textrm{extracted}(2,1)\) and \(U_\textrm{extracted}(3,1)\) denote the extracted values corresponding, respectively, to the second and the third coefficients in the \(U_\textrm{extracted}\) matrix first column. \(C_{j}\) is the watermark bit copy relative to the sub-block number j, with j \(\in \) {1, 2, 3}.

The extracted watermark bit final value \(W(i)_\textrm{extracted}\) assigned to a considered block is evaluated by applying a majority voting method on the three versions (\(C_{1}\),\(C_{2}\),\(C_{3}\)) issued from its corresponding sub-blocks as explained in Table 3. All the extracted watermark bits are concatenated together to obtain the final extracted second watermark \(W_\textrm{extracted}\).

Table 3 Majority voting rules for watermark bit value investigation

4.3.2 Authentication and Tampering Localization

Establish an efficient strategy to verify the video content authenticity and to detect and locate the tampered area is the proposed approach main focus. In temporal authentication stage, the observed frame index and the extracted index from the current frame are compared. A mismatch between these two indexes permits to detect and locate a performed temporal tampering.

On the other hand, the spatial authentication is conducted for each watermarked video frame by inspecting the similarity between the extracted second watermark \(W_\textrm{extracted}\) and its reconstructed version \(W_\textrm{reconstructed}\) using two measures namely Bit Error Rate (BER) and Normalized Correlation (NC) as well as two thresholds termed as \(Tr_\textrm{BER}\) and \(Tr_\textrm{NC}\). Definitions and mathematical descriptions of these metrics are provided in Sect. 5.3. The authentication decision is made after comparing the BER and the NC measures to their corresponding thresholds. The considered frame is judged as authentic only if BER is below its threshold \(Tr_\textrm{BER}\) and NC is above \(Tr_\textrm{NC}\). Otherwise, the given frame of \(h {*} w\) size is regarded as non-authentic.

Altered content regions are located in non-authentic watermarked video frame based on a tampering localization strategy including the following steps. First, an error map \(E_\textrm{map}\), which is a \(\frac{h}{16}\) * \(\frac{w}{16}\) size matrix, is elaborated by XOR-ing the extracted second watermark with the reconstructed one as shown in (10).

$$\begin{aligned} E_\textrm{map}= W_\textrm{extracted} \oplus W_\textrm{reconstructed} \end{aligned}$$
(10)

To classify the \(16\times 16\) blocks, issued from the Y component segmentation, as authentic, intentionally manipulated or non-intentionally manipulated, the constructed error map \(E_\textrm{map}\) coefficients are scanned using a 2x2 sized glide window. Obviously, the authentication sensitivity depends on the glide window dimensions. In fact, the 2x2 size enables to increasing the localization ability by accurately finding the tampered blocks. Thus, the glide window is opened from each location in the \(E_\textrm{map}\) and then, the values of the four held in elements are examined. Based on these latter, maliciously tampered regions are identified.

As \(E_\textrm{map}\) is constructed by XOR-ing the extracted and the reconstructed second watermark versions (application of the Exclusive OR between the two watermarks), a nonzero element in \(E_\textrm{map}\) coefficients corresponds to a mismatch between the two investigated watermarks. Therefore, if at least two nonzero values are clustered among the glide window elements, the corresponding 16x16 block localized in the considered frame is flagged as an intentionally manipulated block content. If there are only zero elements inside the glide window elements, the corresponding block is considered as authentic. Otherwise, the associated block is deemed as unintentionally manipulated.

Fig. 6
figure 6

Sample of tampering localization with Error map and 2*2 glide window size

Figure 6 depicts an illustration of tampering localization using the error map \(E_\textrm{map}\) with the glide window.

5 Simulations Findings and Comparisons

To testify the proposed system performance and prove the effectiveness of the used techniques and procedures to build up its overall framework, several simulations are carried out under diverse videos including test, camera1, video1, coastguard, news, foreman and paris. The test videos can be classified into two sets. The first one includes well-known standard videos, which are commonly applied in the watermarking field and often utilized to evaluate state-of-the-art schemes. The second set involves different surveillance sequences.

Actually, the developed watermarking scheme is evaluated in terms of the three requirements. In Sects. 5.1 and 5.2, the capacity and the imperceptibility are scrutinized to demonstrate the proposed scheme ability to hide a large amount of watermark data without raising a rigorous visual quality degradation and thereby highlight the proficiency of (i) the involved texture features-based watermarking positions selection strategy and (ii) the proposed watermarks insertion algorithm.

As previously stated, the proposed semi fragile video watermarking system is devoted for authenticity verification as well as tampering detection and localization goals. Hence, its semi-fragility performance is evaluated in Sect. 5.3 to substantiate its reliability in terms of discrimination between unintentional attacks and intentional ones along with maliciously tampered content localization. More precisely, Sect. 5.3 is divided into two subsections namely Sects. 5.3.1 and 5.3.2. Section 5.3.1 is dedicated to analyze the robustness against common processing, while Sect. 5.3.2 deals with the fragility to temporal and spatial intentional attacks investigation in addition to tampering localization.

The conducted simulations in the above-mentioned two subsections permit to attest the appropriateness of (i) the advanced content-based watermarks generation procedure to construct two distinct authentication information that allow to successfully fulfill the tasks of attacks types characterization, inter-frames as well as intra-frame forgeries detection and maliciously tampered areas localization (ii) the various beneficial techniques and procedures used to build up the hybrid domain-based watermarks embedding process particularly the LSB, the LWT and the SVD techniques and (iii) the proposed glide window-based spatial distortions detection strategy for an accurate tamper localization.

In the sequel, simulations results belonging to each performance level, i.e., capacity, imperceptibility and semi-fragility are reported in detail, thoroughly discussed and compared with previous works. It is worth mentioning that the embedding strength factor is empirically set to T = 0.05 in the performed experiments.

5.1 Capacity Results

As explained before, every watermark bit is inserted in three copies in one 16*16 size block during the embedding process. Hence, the information amount that can be concealed into a given frame with h*w dimension, denoted by Cap, can be expressed by (11).

$$\begin{aligned} Cap = 3 \times \frac{h \times w}{16 \times 16} \end{aligned}$$
(11)

The different test videos capacity results are illustrated in Table 4. From this table, it can be inferred that the proposed scheme offers a large payload capacity. The main reason behind the achieved results is the use of block-based embedding method with the bit expansion mode.

Table 4 Capacity values for used videos for simulation experiments

5.2 Imperceptibility Results

In order to evaluate the proposed watermarking scheme imperceptibility, the Peak Signal to Noise Ratio (PSNR) is used as measure. This metric allows assessing the quality difference between the original host video and its watermarked version with respect to human visual system [21]. The PSNR value, which is proportional to the Mean Square Error (MSE) value, is computed using (12).

$$\begin{aligned} PSNR = 10 \times \log \frac{255^2}{MSE} \end{aligned}$$
(12)
Fig. 7
figure 7

Imperceptibility results: The obtained PSNR values for all watermarked videos

PSNRs of all watermarked videos are shown in Fig. 7. From this figure, it is observed that the obtained PSNR values are ranged between 33.1738 dB and 36.1641 dB. As the marked frames are estimated to have high perceptual quality when the PSNR is higher than 30 dB [26, 50], the obtained results correspondingly disclose the good imperceptibility level, achieved using the proposed dual watermarking scheme.

For subjective observation and better visual comparison, examples of unwatermarked original frames and their corresponding watermarked frames are plotted in Fig. 8. According to Fig. 8, no perceptible distortion is visually detectable in the dual watermarked frames. Thus, our watermarking approach preserves the video perpetual quality after the embedment of a watermark data with a significant size. Involving an efficient texture features-based method to choose the best watermark holders positions in coherence with the human visual system, selecting the mid-frequency sub-band for the watermarking and combining the LWT and the SVD during the embedding process enable to successfully fulfill the imperceptibility requirement.

Fig. 8
figure 8

Up: non-watermarked host frames: a test.avi b camera1.avi c news.avi d paris.avi, Down: watermarked frames: e test.avi f camera1.avi g news.avi h paris.avi

5.3 Semi-fragility Results

To assess the semi-fragility performance, two standard measurement parameters are considered namely the Normalized Correlation (NC) and the Bit Error Rate (BER) which are computed through (13) and (14), respectively [23]. NC is used to inspect the difference between the extracted watermark \(W_\textrm{extracted}\) and the reconstructed one \(W_\textrm{reconstructed}\), while BER is defined as the watermark erroneous bits rate overall the correct ones.

$$\begin{aligned}{} & {} \begin{aligned} NC=\frac{\sum \limits _{i=0}^{m}\sum \limits _{j=0}^{n}W(i,j)_\textrm{extracted}\times W(i,j)_\textrm{reconstructed}}{\sqrt{\sum \limits _{i=0}^{m}\sum \limits _{j=0}^{n}W(i,j)_\textrm{extracted}^{2}} \sqrt{\sum \limits _{i=0}^{m}\sum \limits _{j=0}^{n}W(i,j)_\textrm{reconstructed}^{2}}} \end{aligned} \end{aligned}$$
(13)
$$\begin{aligned}{} & {} \begin{aligned} BER = \sum \limits _{i=0}^{m}\sum \limits _{j=0}^{n}(\frac{W(i,j)_\textrm{extracted}\oplus W(i,j)_\textrm{reconstructed}}{m \times n} \end{aligned} \end{aligned}$$
(14)

Where, the watermarks dimensions are represented by m and n and \(\oplus \) is the XOR operator.

Attacks characterization is a paramount property that must be satisfied by an authentication-based watermarking scheme. Indeed, a semi-fragile scheme for authentication goal should survive admissible processing which preserve semantic content. On the other hand, it should detect malicious attacks, which attempt to alter the data general semantic, and be able to locate the edited regions in the tampered watermarked video frames. To these ends, two thresholds \(Tr_\textrm{NC}\) and \(Tr_\textrm{BER}\) are, respectively, associated with NC and BER. As prescribed in Sect. 4.3.2, the content is deemed as authentic only if BER is inferior to \(Tr_\textrm{BER}\) and NC is superior to \(Tr_\textrm{NC}\). Otherwise, the content is regarded as non-authentic. \(Tr_\textrm{NC}\) and \(Tr_\textrm{BER}\) are empirically set to 0.9 and 0.1, respectively. Several experiments are directed to test the robustness of the proposed watermarking scheme against non-malicious modifications and its fragility to malicious ones. The following subsections detail and discuss the obtained results.

5.3.1 Robustness to Non-malicious Attacks Results

The proposed scheme robustness is investigated against common processing operations including Gaussian noise, salt and pepper noise, speckle noise, brightening, sharping, histogram equalization, H.264 compression and MJPEG compression. NC and BER values obtained after performing the already listed non-malicious attacks as well as in the case of absence of attacks are recorded in Table 5 and Table 6. If no attack is carried out, the complete hidden information can be successfully extracted since the obtained NC and BER are, respectively, 1 and 0 as shown in Table 5 and Table 6. This property is fundamental for an authentication scheme where a faithful authenticity decision and an accurate tamper localization are required.

After adding Gaussian noise, the procured NC values vary between 0.92973 and 0.98894; thereby, they are above the fixed threshold \(Tr_\textrm{NC}=0.9\). Besides, the obtained BER values are ranged between 0.02197 and 0.08135; hence, they do not exceed their relative threshold \(Tr_\textrm{BER}=0.1\). Based on these results, it can be concluded that the proposed scheme can resist this manipulation type. Similarly, the detector is able to efficiently extract the encapsulated watermark from all watermarked videos processed by the two next considered non-malicious attacks viz. salt and pepper and speckle noises. In fact, according to the experiments results tabulated in Table 5 and Table 6, the achieved NC values under salt and pepper attack reach 0.99751 and the obtained BER is ranged between 0.00494 and 0.06144. Likewise, the resulting NC and BER after performing speckle noise attack are, respectively, superior to 0.95018 and inferior to 0.09661. The resilience to Gaussian, salt and pepper and speckle noises is issued from involving the LWT, which is broadly recognized by its high immunity to noises addition, in the watermarking algorithm.

Table 5 Robustness results: The NC values after the non-malicious attacks application

Afterward, three different enhancement-based attacks namely brightening, sharpening and histogram equalization are applied to the watermarked videos in order to test the proposed scheme robustness to them. As shown in Table 5 and Table 6, the minimum NC and the maximum BER associated with brightening attack are 0.98731 and 0.02514, respectively. As far as the sharpening manipulation is concerned, the achieved NC vary between 0.99489 and 0.99999, while the BER values lie between 0 and 0.01017. For the last considered enhancement attack, i.e., the histogram equalization manipulation, the obtained NC values are beyond 0.98704 and the BER values do not exceed 0.02569. Hence, all the acquired NC measurements are above \(Tr_\textrm{NC}\), while BER is below \(Tr_\textrm{BER}\), which confirm that the embedded signature detection can be successfully fulfilled after applying the enhancement-based attacks. The selection of the most convenient blocks for the watermark insertion along with the simultaneous use of the intrinsic characteristics relative to two powerful transforms, i.e., LWT and SVD to build up the proposed watermarking scheme processes are the fundamental arguments for the explanation of this strong resilience to enhancement-based attacks.

Table 6 Robustness results: The BER values after the non-malicious attacks application

As seen from Table 5 and Table 6, the last considered non-malicious attacks are the H.264 and MJPEG compression. The averages of NC and BER values obtained under H.264 compression are equal to 0.99169 and 0.01592, respectively. However, 0.00314 and 0.99841 are, respectively, the achieved BER and NC averages after compressing the watermarked video based on MJPEG format. The above outcomes demonstrate the proposed scheme immunity to these attacks. The main reason for this result is the selection of the middle frequency sub-band as appropriate watermark carrier, which improves the overall proposed scheme robustness attribute and particularly against compression processing.

All the above reported evaluations outcomes ascertain that our proposed scheme has an outstanding capability to survive a large set of incidental distortions including noises addition attacks, enhancement-based operations and compression processing. These important findings are obtained owing to the exploitation of the attractive and complementary characteristics of both LWT and SVD transforms throughout the different watermarking stages. Moreover, apart from improving the transparency aspect, the texture feature-based method involved during the insertion process permits to select the proper watermarking positions with an important stability level, which further improves the advanced algorithm performance in terms of robustness.

5.3.2 Fragility to Malicious Attacks Results and Tampering Localization

The watermarked videos are mischievously tampered in order to scrutinize the proposed approach effectiveness against malicious attacks and its ability to detect tampering locations. Malicious attacks are often classified into temporal and spatial attacks. As explained in Sect. 4.3.2, a mismatch between the observed frame index and the extracted one from the current watermarked frame permits to detect and locate a performed temporal distortion. However, spatial tampering is detected and tampered segments are located by assessing the difference between the extracted second watermark and its reconstructed version using an error map and a glide window.

Firstly, watermarked videos are applied to different temporal attacks, including (i) frame dropping; (ii) frame replacing and (iii) frame swapping.

Without conducting any temporal manipulation, the observed frame index as well as the extracted one are shown in Fig. 9a. As expected, the graph shows that the two indexes are identical which indicates that no temporal tampering is occurred.

Frame dropping attack is simulated by deleting the frames 40-60. The corresponding authentication result is given in Fig. 9b. It can be noticed that the watermarked video contains 80 frames, but the last extracted frame index is 100. In addition, it can be seen that the observed and extracted indexes keep equal until frame 39, then a jump between frames 40 and 60 is occurred. Based on these findings, it can be claimed that the frames 40–60 are dropped.

Figure. 9c displays the authentication result after applying frame replacing attack by changing the frames ranged from 40 to 60 with the twenty first frames. According to Fig. 9c, the observed index and the extracted one are consistent for frames 1–39 and 61–100, but a total mismatch is inferred from frame 40 to frame 60. These drawn observations demonstrate that frames 40–60 are tampered in temporal domain.

Fig. 9
figure 9

Temporal tampering detection and localization results: a without tampering b frame dropping attack c frame replacing attack d frame swapping attack

Figure 9d illustrates the result of frame swapping manipulation where the frames 25-49 are sequentially replaced by the frames 50 to 75 and vice versa. The observed and the extracted indexes are unaffected until frame number 24 and after frame number 75 as depicted by the plot in Fig. 9d. However, the indexes of the two ranges namely 25–49 and 50–75 are exchanged. Therefore, this temporal manipulation is successfully detected.

In summary, the above-discussed experimental findings confirm the proposed semi-fragile watermarking approach reliability in detecting and precisely locating all the simulated inter-frames tampering. This high performance is resulted from the suitable choice of the current frame index as timing information for the construction of the first authentication information that is used to mark the considered cover video frame.

On the other hand, watermarked videos are subjected to another set of attacks in order to test the proposed scheme performance in detecting and locating spatial distortions namely (i) object removal and (ii) object insertion attacks. Thus, the visual content of different watermarked videos frames is deliberately manipulated by deleting or adding a specific object. In the presence of these two tampering manipulation types, the obtained authentication metrics, i.e., NC and BER are, respectively, above and below their preset thresholds \(Tr_\textrm{NC}\) and \(Tr_\textrm{BER}\) as indicated in Fig. 10 and Fig. 11. These results represent an explicit proof about the intentional modifications that the frames had undergone. The glide window-based tampering localization strategy described in Sect. 4.3.2 is then performed to determine the altered regions yielded by object insertion or object removal attacks as illustrated in Fig. 10 and Fig. 11, respectively.

Fig. 10
figure 10

Spatial tampering detection and localization results after object insertion attack: a watermarked frame b tampered frame with BER=0.12366 and NC= 0.87058 c corresponding error map d localized tampering

Fig. 11
figure 11

Spatial tampering detection and localization results after object removal attack: a watermarked frame b tampered frame with BER=0.13089 and NC= 0.86710 c corresponding error map d localized tampering

For better illustration of the tampering localization ability, several watermarked frames, their maliciously tampered versions by one of the two considered spatial forgeries and their corresponding versions with the localized tampering are depicted in Fig. 12 and Fig. 13. As obvious from these illustrations, the proposed scheme successes to detect and concisely locate these kinds of tampering manipulations in the spatial domain. This achievement is due to the involvement of an effective watermark generation strategy, which benefits from 3 independent texture features to elaborate to be embedded authentication data that faithfully describe the current frame content characteristics. Indeed, being subjected to a malicious attack contributes to an inconsistence between the watermarked and the intentionally tampered frame versions at texture information level. Taking advantage of this property as well as the glide window concept allows ensuring reliable spatial tamper detection and localization.

Fig. 12
figure 12

Samples of A Watermarked frames, B their tampered versions by object removal attack and their corresponding located tampered area from videos. (a) test, (b) camera1, (c) news

Fig. 13
figure 13

Samples of (A) Watermarked frames, (B) their tampered versions by object insertion attack and their corresponding located tampered area from videos. (a) test, (b) camera1, (c) news

5.4 Comparative Study

To further validate its performance, the proposed technique is compared with seven previous existing watermarking techniques, which are discussed in [2, 6, 31, 37, 38, 40, 41], with respect to robustness, capacity and imperceptibility properties. Moreover, a comparison in terms of tampering localization ability is conducted with state-of-the-art systems cited in [10, 15, 45, 46]. For a faithful comparison seeking, the experiments comparing the robustness, which are depicted in Tables 7 and 8, are redone using watermarks of length 90 bits.

The comparison in terms of robustness between our technique and that in [6] is provided in Table 7. Referring to Table 7, it is quite evident that the proposed technique carries superior robustness against Gaussian noise, salt and pepper and speckle noise as well as sharpening, histogram equalization and MJPEG compression. In addition, the two watermarking methods provide a comparable robustness performance against brightening attack. In brief, it is noted that the most findings of the proposed technique surpass those reached by the considered comparative algorithm. This superiority is resulted from the use of LWT as a combination with the SVD to carry out the watermarking in the multi-frequential domain instead of the mono-frequential one, which contributes to a considerable improvement in terms of the resilience performance.

Table 7 Robustness comparison (NC) between the proposed approach and the watermarking scheme [6]

On the other hand, Table 8 depicts the comparison of the proposed scheme robustness with those presented in [2, 31, 37, 38, 40, 41]. Indeed, the NC values tabulated in Table 8 demonstrate that our watermarking technique outperforms all the techniques cited in this table in terms of the resistance to noises addition attacks except salt and pepper to which the method introduced in [37] exhibits a slightly better robustness level. The involvement of LWT transform, which is broadly recognized by its noteworthy immunity to noises addition, in the watermarking processes is the prime reason for our scheme good sustainability against this type of manipulations.

Moreover, based on the comparison results summarized in Table 8, it is revealed that our proposed system is the most robust to sharpening and brightening as it attains the high performance value when compared to the existing watermarking approaches. Furthermore, it is observed that a competitive survival level against histogram equalization is ensured by the proposed system as well as the previous work in [2], which has a slightly enhanced NC value. Selecting the embedding positions in coherence with the human visual system capacity along with the watermarking execution in the multi-frequency domain allow reaching this robustness degree in the presence of these three enhancement-based attacks. In addition, while analyzing Table 8, it is inferred that our approach outperforms the techniques [37, 38, 40, 41] with respect to the sustainability criterion to the MJPEG compression. For H.264 compression, both of our algorithm and the one in [37] have the same strong resilience degree to this manipulation. The proposed system robustness to compression processing is the result of the middle frequency sub-band use as watermark information holder.

Table 8 Robustness comparison (NC) between the proposed approach and the watermarking schemes [2, 37, 38, 40, 41] and [31] on Foreman video
Table 9 Imperceptibility and capacity comparison between the proposed approach and the watermarking schemes [2, 6, 37, 38, 40, 41] and [31] on Foreman video

PSNR and capacity measurements values corresponding to our algorithm and to the previous works in [2, 6, 31, 37, 38, 40, 41] can be found in Table 9. According to this latter, the related schemes in [2, 6, 31, 37, 38, 40, 41] achieve better imperceptibility results than the proposed one because the embedding payload in our advanced system is noticeably larger as compared to those methods. Actually, our watermarking scheme capacity is about 3.5 times greater than the comparative works as shown in Table 9. Moreover, in the proposed system, dual watermarks are embedded in every frame in the host video. However, for the approaches [6, 37, 38, 40, 41] and [31] only a few number of frames or blocks are selected for the watermarking, which denotes a less visual content modification. In the same vein, it has to be pointed out that the reasons for the high imperceptibility of the scheme in [2] are its involvement of a holders selection strategy that allows to choose a limited number of frames to convey the watermark information as well as its semi-blindness nature. Actually, the method is a semi-blind SVD-DWT watermarking technique that uses the unwatermarked video format during the extraction. Therefore, it adopts a very small embedding factor to fuse the watermark information within the cover video. This implies that the perceptual quality is barely degraded. In meantime, our proposed approach is a fully blind scheme. Thus, our provided PSNR value remains reasonable and satisfactory especially as it is superior to 30 db [26, 50].

Fig. 14
figure 14

Spatial tampering localization ability comparison between the proposed approach and the watermarking scheme [15]: (a) watermarked frame (b) tampered frame (c) localized tampering by the scheme [15] (d) localized tampering by our scheme

To compare the proposed watermarking scheme performance regarding the spatial distortions localization ability with the scheme presented in [15], we deliberately tampered the original watermarked frame from news.avi video by deleting the blue TV from the upper right as shown in Fig. 14b. Figures 14c and 14d illustrate the tampering localization findings belonging to the work [15] and our proposed one, respectively. From these figures, it can be seen that the two approaches fulfill the localization requirement. However, it is clearly noticed that our proposed technique ensures a more concise maliciously edited areas identification. In fact, unlike the watermarking scheme in [15], which involves a 3x3 sized gliding window, our methodology relies on a 2x2 glide window-based tampered regions recognition strategy. Actually, the used 2x2 size enables to more improve the tampering detection reliability as well as further increase the tamper localization accuracy.

As far as the temporal tampering localization ability is concerned, the comparative study, recapitulated in Table 10, confirms that all the techniques cited in this table succeed in localizing the frame dropping-based attack. Moreover, except the work in [46] the other schemes can efficaciously recognize frame replacing-based tampering. Nevertheless, only the proposed approach and the system suggested in [46] are able to successfully identify the swapping-based inter-frames manipulation. Therefore, only our approach can offer the capability to jointly find out and properly locate the considered three temporal tampering attacks.

The comparison with other methods testifies the proposed algorithm effectiveness in terms of capacity, imperceptibility, robustness and tampering localization. It efficiently strikes a favorable compromise between the different requirements. This performance is derived from the joint use of various beneficial techniques and procedures to build up its overall framework.

Table 10 Temporal tampering localization ability comparison between the proposed approach and the watermarking schemes [10, 15, 46] and [45]

6 Conclusion and Future Works

In this paper, a semi-fragile dual video watermarking scheme was proposed for content authentication and tampering detection and localization. Timing information and reliable texture features are used to design two authentication watermarks, which are used to detect and locate intra-frame and inter-frames manipulations. The embedding process is done in the hybrid domain. Indeed, the first watermark is embedded within the host frames pixels using LSB method. To meet the security requirement, the second watermark is encrypted by Torus isomorphism mapping. Afterward, it is encapsulated into the most textured sub-blocks in bit expansion mode following a LWT-SVD-based embedding method. In the detection side, the two hidden watermarks are blindly extracted. The mismatch between the observed frame index and the extracted one from the current frame indicates a temporal tampering occurrence. Whereas, the extracted second watermark and its reconstructed version are processed as inputs to a glide window-based tampering localization procedure in order to accurately determine the spatial forgeries. Several experiments were performed to assess the proposed algorithm performance. Simulations results demonstrate the effectiveness of the proposed semi-fragile scheme method in attacks types characterization. In fact, it efficiently survives incidental processing such as compression and noises addition while being sensitive to intentional distortions such as content modification and frames manipulations with a concise location ability. Moreover, it carries a good watermarked video perceptual quality with a large payload capacity. In the follow-up work, the proposed approach will be extended to ensure the tampered regions recovery.