1 Introduction

Three-Dimensional Video-plus-Depth (3DV + D) content comprises multiple video streams captured by different cameras around an object. Efficient compression is therefore essential for transmitting and storing 3DV + D content within future resource limits while preserving an acceptable reception quality. The security of the transmitted 3DV + D content is also critical for protecting its copyright. Owing to rapid progress in networking, users can easily distribute or access digital multimedia data over networks, so ownership protection has become an important issue that deserves more attention. Copyright owners and digital multimedia producers thus face a significant threat and need to guard their content against intruders to avoid loss of the transmitted data [32]. Watermarking is one of the most favourable methods for securing digital multimedia files in copyright protection and data authentication: a secret watermark code is inserted in the transmitted multimedia, carrying information about the creator of the media, the copyright owner, or the authorized user. Digital watermarks can thus help enforce copyright in video transmission. A digital watermark can be embedded in either compressed or uncompressed video [11]. Video data is typically transported and stored in compressed form. Uncompressed-video watermarking techniques can also be applied to compressed-video bit streams, but they require complete decoding and re-encoding of the video for watermark insertion or extraction. In many cases, complete decoding of the video stream is not recommended, so compressed-video watermarking has recently attracted more attention. Furthermore, watermark insertion and extraction in the compressed domain require fewer computations, because complete decoding and re-encoding of the transmitted stream are not needed for embedding and extracting the watermark bits.

Several video encoding standards have emerged recently. The objective of an encoding standard is to achieve high data compression while maintaining acceptable quality. 3D-HEVC is an efficient and recent encoding standard used in different applications. It has received broad attention and is expected to rapidly replace traditional 2D video coding in numerous applications [5]. The predictive 3D H.265/HEVC framework is used to compress the transmitted 3DV sequences [7, 10]. In the 3D-HEVC system, the original 3D Video (3DV) consists of multiple video streams of the same object captured by various cameras. Thus, to transport the 3DV over resource-limited networks, a highly efficient compression standard must be applied while preserving a high reception quality. 3D-HEVC exploits the temporal and spatial correlations between frames in the same stream, in addition to the inter-view correlations among the different 3DV streams, to enhance the encoding process. However, highly compressed videos are more sensitive to transmission errors.

3DV + D is a common 3DV representation format that has recently been discussed intensively [3, 4, 27]. In this format, per-pixel scene geometry data is available. Depth data is essential in 3D applications: it allows the perceived depth to be adapted to various 3D displays, and it optimizes the number of stored 3DV bits compared to traditional 2D videos. Because the pixels of an object are expected to have nearly identical depth values, the depth information can be used to recognize object boundaries. Given the importance of the depth data that corresponds to the texture color data of the transmitted 3DV, it must be included when 3DV + D content is transmitted over wireless networks. Unfortunately, depth data increases the transmission bit rate, because it needs bandwidth in addition to that required for the color data of the 3DV + D content.

Therefore, one of the main contributions of this paper is to present robust and reliable compressed-video watermarking techniques for efficient transmission of 3D-HEVC compressed bit streams. These techniques have the following characteristics:

  1. Quality. The quality of the watermarked 3D-HEVC frames resulting from the embedding process is maintained as high as possible by efficiently choosing the most suitable domain for watermark embedding.

  2. Robustness and Security. The proposed watermark embedding and extraction procedures are robust. The embedded watermarks can survive different types of attacks.

  3. Transmission Bit Rate. The proposed watermarking techniques reduce the transmission bit rate by hiding and embedding multiple depth frames into the corresponding color frames of the transmitted 3D-HEVC data.

  4. Complexity. The computational cost due to watermark embedding is kept to a minimum.

The rest of this paper is organized as follows. Section 2 introduces some of the existing hybrid watermarking schemes for 3D-HEVC applications. Section 3 presents the proposed watermarking techniques. Simulation results and the comparative study are presented in Section 4. The conclusions are presented in Section 5.

2 Related work

With the evolution of 3D-HEVC applications, security and copyright protection have become important aspects of 3DV content storage and transmission. Multimedia watermarking techniques are employed to protect the copyright of the 3DV + D data. These techniques fall into two main categories: spatial-domain and transform-domain methods. The spatial-domain methods hide the watermark in the given video frames by directly adjusting their pixel values. They are simple to carry out and need fewer computations, but they are not sufficiently robust against attacks. The transform-domain methods adjust the coefficients of the video frames in a certain transform domain according to the adopted watermark embedding method, and they achieve more robustness than the spatial-domain methods.

There are few research works on 3DV data watermarking, and most of them deal with Depth-Based Image Rendering (DBIR). Thus, 3DV watermarking is still in its rudimentary phase. A watermarking method in the wavelet-domain for stereo images was introduced in [2]. It depends on extracting the depth map from the stereo pairs for watermark embedding. In [22], a visual model method for watermarking of High Definition (HD) stereo images in DCT domain was presented. It is based on the visual sensitivity of the human eye to define the perceptual modifications in the watermark embedding process. Lin et al. [19] suggested a watermarking method depending on the rendering conditions of the 3D images. Another blind diverse watermarking method was suggested in [18] based on the DBIR method performed on the centre image and the depth image generated by the content provider. Kim et al. [15] also introduced a watermarking method for 3D DBIR images through the utilization of the quantization on Dual-Tree Complex Wavelet Transform (DT-CWT) coefficients. To improve the watermark robustness, two features of the DT-CWT are utilized, which are the approximate shift invariance and the directional selectivity. In [13], some efficient and robust hybrid watermarking schemes for different color image systems have been presented.

An efficient watermarking method for 3D images based on DBIR scheme was presented in [34] by utilizing the Scale-Invariant Feature Transform (SIFT) to choose some suitable regions for watermarking and the spread spectrum technique to insert the watermark data in the DCT coefficients of the selected regions. A 3DV blind watermarking scheme based on a virtual view invariant domain was introduced in [9]. The luminance average values of the 3DV frames are chosen for watermark embedding. In [17], another 3DV watermarking scheme that concentrates on perceptual quality embedding was introduced. It takes advantage of motion on the z-axis, visual features, and the rendered hidden pixels from the depth data.

Swati et al. [31] suggested a fragile watermarking method, where the watermark is inserted in the Least Significant Bit (LSB) of the non-zero quantized coefficients in the HEVC compressed video. Ogawa et al. [26] proposed an efficient watermarking scheme for HEVC bit streams that inserts the watermark information through the video compression phase. Also, there are several traditional works existing for the watermarking of the 2D H.264/AVC compressed bit streams. Zhang et al. [36] suggested a video watermarking scheme, in which the security information is represented in a pre-processed binary data sequence and embedded into the middle frequency coefficients in the I frame. To enhance the watermark verification, the coefficients signs are altered depending on the watermark. The work introduced in [36] has been enhanced in [37] by concentrating on gray-scale characters and patterns. Qiu et al. [28] suggested a robust intra-frame watermark embedding scheme in quantized DCT coefficients and a fragile inter-frame watermark embedding method in motion vectors. Kuo and Lo [16] enhanced the video watermark embedding scheme that was suggested in [28] by selecting more appropriate regions for both robust and fragile watermark embedding within the H.264 compressed video through the video encoding process.

In [23], the process of watermark embedding is executed through directly changing some data bits within the bit stream, however the pre-embedding process has complex computations. In [24], the same authors of [23] suggested a non-blind and robust watermarking method by utilizing the Watson Visual Model (WVM) for watermark embedding in the I frame. Their proposed non-blind method [24] was extended for the P frame in [25], where the watermark bits are embedded in all non-zero coefficients of the P frame. An information hiding model was implemented in [8] to choose the watermark embedding area based on the forbidden-zone-data-hiding concepts. The sign parity of the coefficients and the values of the middle-frequency coefficients are altered for watermark embedding in the I frame [35]. In [6], the watermark has been embedded in the non-zero coefficients of the P-frame in the compressed domain to achieve better perceptual quality of the watermarked video frames and a minimal increase in video bit rate. In [29], a structure preserving non-blind H.264 watermarking scheme was suggested to insert watermarks through substituting secret bits in the motion vector differences of the non-reference images. Su et al. [30] suggested another non-blind watermark embedding algorithm for the I frames and P frames. The watermark embedding is implemented based on the spread spectrum technique and the WVM [24].

Several authors have reported work on 3D image watermarking in the spatial domain, whereas most recent 3D image and video watermarking techniques have been implemented in transform domains. In general, there are few contributions in the literature on 3D compressed-video watermarking, and some of the introduced techniques have critical problems with watermark verification and extraction. The traditional video watermarking techniques have not achieved adequate subjective and objective quality for the watermarked frames and the extracted watermarks in the presence of multimedia attacks; thus, they have low robustness and imperceptibility. Moreover, most of the traditional video watermarking techniques work on uncompressed frames. They therefore require more computations in the watermark insertion and extraction processes, since complete decoding and re-encoding of the transmitted stream are needed, which increases the computational overhead. Furthermore, the traditional techniques fail to select the most suitable regions inside the host frames for watermark embedding, which degrades the imperceptibility and quality of the watermarked frames and further increases the computations.

Taking into account the limitations of the state-of-the-art video watermarking techniques, the main contribution of this paper is to present efficient hybrid techniques for secure 3DV + D HEVC communication. These transform-domain watermarking techniques efficiently protect the copyright of the 3DV + D HEVC streams to preserve both robustness and imperceptibility. Moreover, the proposed hybrid techniques save the 3D-HEVC transmission bit rates. Therefore, they have good imperceptibility, high quality, high robustness, acceptable bit rates, low computational complexity, and adequate immunity to different types of multimedia attacks compared to the traditional watermarking techniques.

3 The proposed watermarking techniques

In this section, the proposed watermarking techniques are introduced. We present the homomorphic-transform-based SVD watermarking in the DWT domain and the three-level DSWT watermarking in the DCT domain. They are developed for 3DV data hiding taking into account an increasing embedding capacity without affecting the quality of the watermarked 3DV streams. In the proposed techniques, multiple depth watermark frames are firstly fused using the proposed wavelet-based fusion technique. Then, the resultant fused depth watermark is embedded in the original 3DV color frames using the two proposed hybrid watermarking techniques to produce the watermarked 3D-HEVC streams. So, the proposed watermarking framework consists of two phases as shown in Fig. 1. In the first phase, the primary depth watermark frame is fused with the secondary depth watermark frame using the proposed wavelet-based fusion technique to produce the fused depth watermark frame. In the second phase, the fused depth watermark frame is embedded in the original 3D-HEVC color frames using the two proposed watermarking techniques. The watermark extraction process is performed in two reverse steps; the extraction of the fused depth watermark frame and the reconstruction process to extract the primary and secondary depth watermark frames.

Fig. 1 The general framework of the proposed watermarking system

In the first phase, we exploit the proposed wavelet-based fusion technique, which is effective for combining perceptually important image features. It is used for several applications such as medical and remote sensing applications.

The basic idea of the proposed wavelet-based fusion technique is that the two depth watermark frames are firstly transformed using the DWT. Then, the fusion process is executed and after that the Inverse DWT (IDWT) is employed to construct the fused depth watermark frame. The proposed wavelet-based fusion process is shown in Fig. 2. An average fusion rule is adopted in this scheme. Further discussion details, conditions, and equations for the utilized wavelet-based fusion technique are found in [21].

Fig. 2 The proposed wavelet-based fusion process of the primary and secondary depth watermark frames
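As an illustration of the fusion phase, the following minimal Python sketch transforms two depth frames with a single-level Haar DWT, averages the corresponding sub-bands, and applies the IDWT. The Haar wavelet, the single decomposition level, the use of the averaging rule on all four sub-bands, and the function names (`haar_dwt2`, `haar_idwt2`, `fuse_depth_frames`) are assumptions made for this sketch only; the rule and conditions actually used are those detailed in [21].

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT; x needs even height and width."""
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s   # low-pass along rows, downsampled
    hi = (x[:, 0::2] - x[:, 1::2]) / s   # high-pass along rows, downsampled
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    s = np.sqrt(2.0)
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :] = (ll + lh) / s
    lo[1::2, :] = (ll - lh) / s
    hi[0::2, :] = (hl + hh) / s
    hi[1::2, :] = (hl - hh) / s
    x = np.empty((lo.shape[0], lo.shape[1] * 2))
    x[:, 0::2] = (lo + hi) / s
    x[:, 1::2] = (lo - hi) / s
    return x

def fuse_depth_frames(primary, secondary):
    """Average fusion rule applied to every sub-band (assumption of this sketch)."""
    bands_p = haar_dwt2(np.asarray(primary, dtype=float))
    bands_s = haar_dwt2(np.asarray(secondary, dtype=float))
    fused_bands = [(bp + bs) / 2.0 for bp, bs in zip(bands_p, bands_s)]
    return haar_idwt2(*fused_bands)
```

Under these assumptions, the linearity of the DWT means that averaging every sub-band yields the pixel-wise mean of the two frames; a rule that treats the detail sub-bands differently would not reduce to this.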

3.1 Homomorphic-transform-based SVD watermarking in the DWT domain

In this section, the proposed homomorphic-transform-based SVD watermarking in the DWT domain is introduced. The proposed framework for fused depth watermark embedding is shown in Fig. 3, and that for fused depth watermark extraction is shown in Fig. 4.

Fig. 3 Watermark embedding in the proposed scheme of homomorphic-transform-based SVD watermarking in the DWT domain

Fig. 4 Watermark extraction in the proposed homomorphic-transform-based SVD watermarking in the DWT domain

The watermark embedding steps shown in Fig. 3 are summarized below:

  Step 1: The original HEVC stream is partitioned into groups of k frames, and the resulting frames are used as the host frames.

  Step 2: The 2D DWT is applied to each HEVC frame to split it into four sub-bands (LL, LH, HL, and HH), and the LL sub-bands are used further.

  Step 3: The reflectance components of the LL sub-band are extracted through the homomorphic transform.

  Step 4: The fused depth image is used as a watermark, and it is embedded with the SVD algorithm in the extracted reflectance components of the LL sub-band.

  Step 5: The inverse SVD is performed, followed by the inverse homomorphic transform and the inverse DWT, to obtain the watermarked frames and thus the watermarked HEVC stream.

The watermark extraction steps shown in Fig. 4 are summarized below:

  Step 1: The received watermarked HEVC bit stream is partitioned into groups of k frames to get the watermarked frames.

  Step 2: The 2D DWT is applied to each of the original and watermarked frames to split them into their four sub-bands (LL, LH, HL, and HH), and the LL sub-bands are used further.

  Step 3: The reflectance components of the LL sub-bands of the original and watermarked frames are extracted through the homomorphic transform.

  Step 4: The SVD algorithm is applied to the reflectance components of the LL sub-bands of the original and watermarked frames to perform the watermark extraction.

  Step 5: The inverse SVD is performed, and then the inverse DWT is applied to extract the possibly distorted depth watermark images.

In the proposed watermarking technique, the fused depth watermark frames are inserted in the chosen wavelet transform coefficients of the 3DV luminance Y components of the color frames. The first step of this technique is the transformation of the RGB color space to the YUV color space. The 2D DWT is used to split each Y frame into four sub-bands, which are the approximation sub-band (low frequency LL), the horizontal detail sub-band (high frequency LH), the vertical detail sub-band (high frequency HL), and the diagonal detail sub-band (high frequency HH). So, the wavelet transform is performed on the luminance component Y of every color frame within the 3DV stream. The reflectance components of the LL sub-bands are extracted through the homomorphic transform. The fused depth watermark frame is embedded by employing the SVD algorithm on the extracted reflectance components of the LL sub-bands.

$$ {F}_{LL}\left({n}_1,{n}_2\right)=I\left({n}_1,{n}_2\right).R\left({n}_1,{n}_2\right) $$
(1)
$$ \ln\ \left[{F}_{LL}\left({n}_1,{n}_2\right)\right]=\ln \left[I\left({n}_1,{n}_2\right)\right]+\ln \left[R\left({n}_1,{n}_2\right)\right] $$
(2)
$$ \mathbf{R}=\mathbf{US}{\mathbf{V}}^{\mathrm{T}} $$
(3)
$$ \mathbf{D}=\mathbf{S}+k\mathbf{W} $$
(4)
$$ \mathbf{D}={\mathbf{U}}_{\mathbf{w}}{\mathbf{S}}_{\mathbf{w}}{\mathbf{V}}_{\mathbf{w}}^{\mathrm{T}} $$
(5)
$$ {\mathbf{R}}_{\mathbf{w}}=\mathbf{U}{\mathbf{S}}_{\mathbf{w}}{\mathbf{V}}^{\mathrm{T}} $$
(6)
$$ {\mathbf{X}}_{\mathbf{w}}={\mathbf{R}}_{\mathbf{w}}+\mathbf{I} $$
(7)
$$ {\mathbf{F}}_{\mathbf{L}{\mathbf{L}}_{\mathbf{w}}}=\exp\ \left({\mathbf{X}}_{\mathbf{w}}\right) $$
(8)

The main contribution of the proposed homomorphic-transform-based SVD watermarking in the DWT domain is the utilization of the homomorphic transform jointly with the DWT and SVD transforms. So, the homomorphic transform improves the performance of the watermarking process through choosing the most suitable components of the host color frames for watermark embedding to maintain the imperceptibility and robustness of the watermarked frames. The homomorphic transform is performed based on the fact that the frame intensity is represented by the multiplication of light illumination and reflectance of objects inside images. Because the illumination is approximately constant and the reflectance is variable from an image to another, the image reflectance represents the most important component of the transmitted images. Thus, the image reflectance can be extracted through the homomorphic transform, and then it is used for watermark embedding.

The low-frequency LL sub-band intensity can be formulated by (1), where I(n1, n2) is the illumination and R(n1, n2) is the reflectance of the selected frame, whose values at the spatial coordinates (n1, n2) are positive scalar quantities. The homomorphic transform is executed as given in (2). A High Pass Filter (HPF) and a Low Pass Filter (LPF) are applied to the ln [FLL(n1, n2)] to separate the illumination from the reflectance. We can represent ln [R(n1, n2)] and ln [I(n1, n2)] in matrix form as R and I matrices. After that, the SVD is applied on the reflectance R matrix as in (3), where U and V are orthogonal matrices and S is a diagonal matrix. The Singular Values (SVs) of the matrix R are the entries of the S matrix. Then, the fused depth watermark (W matrix) is combined with the SVs of the reflectance R matrix as in (4). After that, the SVD is employed on the modified matrix (D matrix) as given by (5), then the frame (Rw matrix) is obtained by utilizing the modified matrix (Sw matrix) as in (6). The inverse homomorphic transform is applied on the I and Rw to get a matrix Xw as in (7), and then the low-frequency sub-band of the watermarked frame (FLLw) can be obtained by (8). The inverse DWT is implemented to get the 3DV watermarked frame Fw.
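The embedding chain of (1) to (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the LPF is a simple moving-average filter (`box_lowpass`, a hypothetical stand-in), the HPF is taken as its complement so that ln I + ln R reproduces ln F_LL exactly, the LL sub-band and the watermark are square matrices of the same size, and the value of k is arbitrary.

```python
import numpy as np

def box_lowpass(x, size=5):
    """Separable moving-average low-pass filter with edge padding (stand-in LPF)."""
    pad = size // 2
    xp = np.pad(x, pad, mode="edge")
    kernel = np.ones(size) / size
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, xp)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def embed_svd(f_ll, w, k=0.01):
    """Embed watermark w in the reflectance of a strictly positive LL sub-band f_ll."""
    if f_ll.shape[0] != f_ll.shape[1] or f_ll.shape != w.shape:
        raise ValueError("this sketch assumes square f_ll and w of equal size")
    log_f = np.log(f_ll)                 # Eq. (2): product becomes a sum
    illum = box_lowpass(log_f)           # ln I: slowly varying part (LPF output)
    refl = log_f - illum                 # ln R: residual (complementary HPF output)
    u, s, vt = np.linalg.svd(refl)       # Eq. (3): R = U S V^T
    d = np.diag(s) + k * w               # Eq. (4): D = S + kW
    u_w, s_w, vt_w = np.linalg.svd(d)    # Eq. (5): D = U_w S_w V_w^T
    refl_w = u @ np.diag(s_w) @ vt       # Eq. (6): R_w = U S_w V^T
    f_ll_w = np.exp(refl_w + illum)      # Eqs. (7)-(8): X_w = R_w + I, F_LLw = exp(X_w)
    key = (u_w, vt_w, s)                 # side information kept for extraction
    return f_ll_w, key
```

The returned `key` collects the matrices that extraction needs in addition to the received frame; with a zero watermark the function returns the input sub-band unchanged, which is a convenient sanity check.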

For the extraction of the possibly distorted fused depth watermark from the possibly corrupted watermarked 3D-HEVC frame, given Uw,Sw,Vw matrices and the possibly distorted frame Fw, the above-mentioned steps are reversely executed. The homomorphic transform is applied on the LL sub-band of the watermarked frame FLLw. An HPF is utilized to obtain the possibly distorted reflectance component R*w, and then the SVD is employed on the R*w matrix as given by (9). The matrix that includes the watermark is computed by (10), and so the possibly-corrupted fused depth watermark is obtained by (11).

$$ {\mathrm{R}}_{\mathrm{w}}^{\ast }={\mathrm{U}}^{\ast }{\mathrm{S}}_{\mathrm{w}}^{\ast }{\mathrm{V}}^{\ast \mathrm{T}} $$
(9)
$$ {\mathrm{D}}^{\ast }={\mathrm{U}}_{\mathrm{w}}{\mathrm{S}}_{\mathrm{w}}^{\ast }{\mathrm{V}}_{\mathrm{w}}^{\mathrm{T}} $$
(10)
$$ {\mathrm{W}}^{\ast }=\left({\mathrm{D}}^{\ast }-\mathrm{S}\right)/k $$
(11)
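To make the extraction of (9) to (11) concrete, the sketch below runs a round trip directly on a reflectance matrix R in the log domain, leaving out the DWT and the homomorphic filtering. The function names, the square test matrices, and the value of k are assumptions for illustration; the stored side information (U_w, V_w, and S) follows the quantities named in the text.

```python
import numpy as np

def embed_reflectance(r, w, k):
    """Eqs. (3)-(6) on a square reflectance matrix r; returns R_w and the key."""
    u, s, vt = np.linalg.svd(r)
    u_w, s_w, vt_w = np.linalg.svd(np.diag(s) + k * w)
    r_w = u @ np.diag(s_w) @ vt
    return r_w, (u_w, vt_w, s)

def extract_reflectance(r_w_star, key, k):
    """Recover the (possibly distorted) watermark from a received reflectance."""
    u_w, vt_w, s = key
    s_w_star = np.linalg.svd(r_w_star, compute_uv=False)  # Eq. (9): SVs of R_w*
    d_star = u_w @ np.diag(s_w_star) @ vt_w               # Eq. (10)
    return (d_star - np.diag(s)) / k                      # Eq. (11)

rng = np.random.default_rng(0)
r = rng.normal(size=(8, 8))          # placeholder reflectance matrix
w = rng.uniform(size=(8, 8))         # placeholder fused depth watermark
r_w, key = embed_reflectance(r, w, k=0.05)
w_star = extract_reflectance(r_w, key, k=0.05)
```

Without any attack, `w_star` matches `w` up to floating-point error, because the singular values of R_w are exactly the entries of S_w; any distortion of the received frame perturbs those singular values and therefore the recovered watermark.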

3.2 Three-level DSWT watermarking in the DCT domain

In this technique, the transformation from the RGB color space to the YUV color space is the first step, and then the DCT is applied on each Y frame. The 3-level DSWT is utilized to divide the DCT domain into four sub-bands, which are the approximation sub-band (A), the horizontal sub-band (H), the vertical sub-band (V), and the diagonal sub-band (D). These A, H, V, and D sub-band matrices have identical sizes. The fused depth watermark frame is embedded into the matrix A.

The fused depth watermark frame embedding steps are shown in Fig. 5 and summarized below:

  Step 1: The original compressed 3DV stream is transformed from the RGB to the YUV color space, and the luminance Y values of the 3DV frames are used further.

  Step 2: The converted 3DV stream is partitioned into groups of k frames.

  Step 3: The DCT components of each Y frame are obtained using the 2D DCT.

  Step 4: The DCT components of each Y frame are decomposed into four sub-bands (A, H, V, and D) using the 3-level DSWT.

  Step 5: The fused depth watermark frame is embedded into the matrix A of each Y frame by multiplying the watermark by a key k and adding it to the matrix A as presented in (12), where 0 < k < 1.

Fig. 5 The proposed framework of the three-level DSWT in the DCT domain for 3D-HEVC watermark embedding

$$ {\mathbf{A}}_{\mathbf{w}}=\mathbf{A}+k\mathbf{W} $$
(12)

where A is the approximation sub-band of the host frame, Aw is its watermarked version, W is the fused depth watermark, and k is the embedding factor.

  Step 6: The inverse DSWT is applied, followed by the inverse DCT, to obtain the watermarked Y frame and thus the watermarked 3DV stream.
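The embedding steps above can be sketched in Python as follows. Several choices here are assumptions, since the text does not fix them: the 2D DCT is an orthonormal DCT-II built as an explicit matrix, the DSWT is an undecimated (à trous) Haar transform with circular boundary handling, the synthesis step simply sums the sub-bands (one valid inverse of this redundant transform), and the function names and the value of k are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(x):
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T

def idct2(coeffs):
    return dct_matrix(coeffs.shape[0]).T @ coeffs @ dct_matrix(coeffs.shape[1])

def swt_haar_level(a, step):
    """One undecimated Haar level; all four outputs keep the input size."""
    lo_r = (a + np.roll(a, -step, axis=1)) / 2.0
    hi_r = (a - np.roll(a, -step, axis=1)) / 2.0
    approx = (lo_r + np.roll(lo_r, -step, axis=0)) / 2.0
    horiz = (lo_r - np.roll(lo_r, -step, axis=0)) / 2.0
    vert = (hi_r + np.roll(hi_r, -step, axis=0)) / 2.0
    diag = (hi_r - np.roll(hi_r, -step, axis=0)) / 2.0
    return approx, horiz, vert, diag

def swt3(x):
    """Three-level decomposition; returns the level-3 A and all detail bands."""
    details, a = [], x
    for level in range(3):
        a, h, v, d = swt_haar_level(a, 2 ** level)
        details.append((h, v, d))
    return a, details

def iswt3(a, details):
    """Synthesis for the filters above: each level is the sum of its sub-bands."""
    for h, v, d in reversed(details):
        a = a + h + v + d
    return a

def embed_dct_dswt(y, w, k=0.05):
    """Steps 3-6: DCT, 3-level DSWT, A_w = A + kW (Eq. 12), inverse transforms."""
    a, details = swt3(dct2(np.asarray(y, dtype=float)))
    return idct2(iswt3(a + k * w, details))
```

Because the stationary transform is not decimated, A, H, V, and D all have the size of the Y frame, so the fused depth watermark W must be resized to the frame size before embedding in this sketch.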

The fused depth watermark frame extraction steps are shown in Fig. 6 and summarized below:

  Step 1: The watermarked 3DV is transformed from the RGB to the YUV color space, and only the luminance Y values of the frames are processed further.

  Step 2: The converted watermarked 3DV stream is partitioned into groups of k frames.

  Step 3: The DCT components are extracted from each watermarked Y frame using the 2D DCT.

  Step 4: The DCT components of each Y frame are decomposed into four frequency sub-bands (Aw, H, V, D) using the 3-level DSWT.

  Step 5: The possibly distorted fused depth watermark frame (W*) is extracted from the matrix A*w of each watermarked Y frame by subtracting the matrix A of the original frame from the matrix A*w of the watermarked frame and dividing the result by k as introduced in (13).

Fig. 6 The proposed framework of the three-level DSWT in the DCT domain for 3D-HEVC watermark extraction

$$ {\mathbf{W}}^{\ast}=\left({{\mathbf{A}}_{\mathbf{w}}}^{\ast}-\mathbf{A}\right)/k $$
(13)
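The additive rule (12) and its inverse (13) can be illustrated on the approximation sub-band alone. In the minimal sketch below, the matrices, the noise standing in for an attack, the value of k, and the function names are all placeholders chosen for illustration; the DCT and DSWT stages are omitted.

```python
import numpy as np

def embed_additive(a, w, k):
    """Eq. (12): A_w = A + kW."""
    return a + k * w

def extract_additive(a_w_star, a, k):
    """Eq. (13): W* = (A_w* - A) / k; requires the original sub-band A."""
    return (a_w_star - a) / k

rng = np.random.default_rng(1)
a = rng.normal(size=(8, 8))                  # placeholder approximation sub-band A
w = rng.uniform(size=(8, 8))                 # placeholder fused depth watermark W
k = 0.05                                     # embedding factor, 0 < k < 1
noise = 0.001 * rng.normal(size=a.shape)     # stand-in for an attack on A_w

w_clean = extract_additive(embed_additive(a, w, k), a, k)
w_noisy = extract_additive(embed_additive(a, w, k) + noise, a, k)
```

Here `w_clean` reproduces `w`, while `w_noisy` differs from `w` by exactly `noise / k`. This shows the trade-off behind the choice of k: a smaller k perturbs the host less but amplifies any distortion of A*w in the extracted watermark.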

4 Simulation results and discussions

To assess the proposed watermarking techniques, several simulation tests have been carried out on the standard, well-known 3DV + D Shark and Dancer sequences (1920 × 1088) [20]. For each sequence, the coded 3D H.265/HEVC bit streams are produced with the reference HM codec [12]. All simulation tests have been performed on an Intel® Core™ i7-4500U CPU @1.80GHz and 2.40GHz with 8GB RAM, running a 64-bit Windows 10 operating system and MATLAB 2017a. The visual results confirm watermark invisibility and show no visible degradation in the quality of the watermarked frames compared to the original frames. The Peak Signal-to-Noise Ratios (PSNRs) of the watermarked frames and the Normalized Correlations (NCs) of the extracted, possibly corrupted fused, primary, and secondary depth watermarks are estimated. The PSNR is calculated by (14) and (15) [7], where MSE is the Mean Square Error between the original host and watermarked color frames, A is the main color frame, Aw is the watermarked color frame, and M × N is the size of the main and the watermarked color frames. The NC is estimated by (16), where W is the original depth watermark and W* is the extracted, possibly corrupted depth watermark. In our simulations, we apply different types of attacks on the watermarked frames as in [1], but we present only sample results from the tested 3DV frames to simplify the presentation of the performance evaluation. We then extract the fused, primary, and secondary depth watermark frames to test the robustness of the proposed techniques. We have run two experiments for each proposed watermarking technique. The first uses the 3DV + D Shark sequence, with color frame 100 as the test host Y frame, depth frame 100 as the primary depth watermark frame, and depth frame 200 as the secondary depth watermark frame. The second uses the 3DV + D Dancer sequence, with color frame 50 as the test host Y frame, depth frame 50 as the primary depth watermark frame, and depth frame 150 as the secondary depth watermark frame. Fig. 7 shows these frames together with their fused depth watermark frames.

$$ PSNR(dB)=10{\log}_{10}\left({255}^2/ MSE\right) $$
(14)
$$ MSE=\frac{1}{M\times N}\sum \limits_{x=0,y=0}^{M-1,N-1}{\left({A}_w\left(x,y\right)-A\left(x,y\right)\right)}^2 $$
(15)
$$ NC=\frac{{\mathbf{W}}^{\ast}.\mathbf{W}}{\left\Vert {\mathbf{W}}^{\ast}\right\Vert .\left\Vert \mathbf{W}\right\Vert } $$
(16)
Fig. 7 The 3D-HEVC Shark and Dancer host color frames and the primary, secondary, and fused depth watermark frames
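For reference, the quality metrics of (14) to (16) can be computed as in the following Python sketch. It assumes 8-bit frames (peak value 255, as in (14)) stored as NumPy arrays; the function names are illustrative, and the infinite PSNR returned for identical frames is a convention of this sketch rather than part of the definitions.

```python
import numpy as np

def mse(a, a_w):
    """Eq. (15): mean squared error between host frame A and watermarked frame A_w."""
    diff = np.asarray(a_w, dtype=float) - np.asarray(a, dtype=float)
    return float(np.mean(diff ** 2))

def psnr(a, a_w):
    """Eq. (14): PSNR in dB for a peak value of 255."""
    err = mse(a, a_w)
    if err == 0.0:
        return float("inf")   # identical frames: PSNR is unbounded
    return 10.0 * np.log10(255.0 ** 2 / err)

def normalized_correlation(w_star, w):
    """Eq. (16): inner product of W* and W divided by the product of their norms."""
    w_star = np.asarray(w_star, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w_star * w) / (np.linalg.norm(w_star) * np.linalg.norm(w)))
```

Note that the NC in (16) is invariant to a positive scaling of W*, so it measures how well the structure of the watermark is preserved rather than its absolute amplitude.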

To clarify the efficiency of the proposed watermarking techniques in protecting and securing the transmitted 3DV + D HEVC bit streams in the presence of attacks, we have compared their performance with those of the state-of-the-art hybrid watermarking techniques such as DCT + DWT, DWT + SVD, and DCT + SVD [13, 14, 33]. In the introduced simulation results, the DWT + Homomorphic + SVD refers to the first proposal of the homomorphic-transform-based SVD watermarking in the DWT domain, and the DCT + DSWT refers to the second proposal of the three-level DSWT watermarking in DCT domain. Figures 8 and 9 show the visual results with the PSNR and NC values of the color watermarked and extracted primary, secondary, and fused depth watermark frames for the Shark and Dancer streams without attacks compared to those of the state-of-the-art techniques.

Fig. 8 3DV watermarked and extracted fused, primary, and secondary depth watermark Shark frames without attacks

Fig. 9 3DV watermarked and extracted fused, primary, and secondary depth watermark Dancer frames without attacks

It is clear from Figs. 8 and 9 that there is a high similarity between the original and watermarked frames with the proposed techniques compared to the state-of-the-art techniques. The proposed techniques also improve the similarity between the original and extracted fused, primary, and secondary depth watermark frames. Moreover, the proposed DWT + Homomorphic + SVD technique introduces better watermarked and extracted watermark frames than those of the proposed DCT + DSWT technique. The proposed techniques achieve high PSNR and NC values for all tested 3DV frames compared to those of the related works. Table 1 presents the average CPU time results of the proposed embedding techniques compared to those of the state-of-the-art embedding techniques for the Shark and Dancer 3DV streams without attacks. It is noticed that the proposed techniques introduce acceptable embedding CPU processing times, and hence they can be recommended for online and real-time video transmission applications. The DCT + DWT technique has the shortest CPU processing time, and the DCT + SVD technique has the longest CPU processing times in the watermark embedding process.

Table 1 Average CPU embedding times of all techniques for the Shark and Dancer 3DV streams

In Tables 2, 3, 4, 5, 6, and 7, we compare the objective average PSNR values of the watermarked color frames and the NC values of the extracted fused, primary, and secondary depth watermark frames of the Shark and Dancer 3D-HEVC sequences for the proposed watermarking techniques and the state-of-the-art watermarking techniques for different types of attacks. From all presented simulation results, we deduce that the suggested techniques always achieve superior PSNR and NC values. It can be realized that the proposed techniques have a meaningful average gain in objective PSNR and NC for all tested 3D-HEVC frames for different types of attacks. From Table 2 presenting the simulation results in the cases of different rotation attacks, it is noticed that both the DWT + Homomorphic + SVD and the DCT + DSWT watermarking techniques give not only the highest PSNR values between the original and watermarked frames, but also the best correlation between the original and extracted fused, primary, and secondary depth watermark frames. Also, it is clear that the DWT + Homomorphic + SVD technique achieves the best results with the rotation attacks.

Table 2 Objective average PSNR values of the watermarked color frames and average NC, NC1, and NC2 values of the extracted fused, primary, and secondary depth watermark frames for the Shark and Dancer 3DV streams with different rotation attacks
Table 3 Objective average PSNR values of the watermarked color frames and average NC, NC1, and NC2 values of the extracted fused, primary, and secondary depth watermark frames for the Shark and Dancer 3DV streams with different Gaussian noise attacks
Table 4 Objective average PSNR values of the watermarked color frames and average NC, NC1, and NC2 values of the extracted fused, primary, and secondary depth watermark frames for the Shark and Dancer 3DV streams with different types of blurring attacks
Table 5 Objective average PSNR values of the watermarked color frames and average NC, NC1, and NC2 values of the extracted fused, primary, and secondary depth watermark frames for the Shark and Dancer 3DV streams with different types of JPEG compression attacks
Table 6 Objective average PSNR values of the watermarked color frames and average NC, NC1, and NC2 values of the extracted fused, primary, and secondary depth watermark frames for the Shark and Dancer 3DV streams with HEVC compression attack
Table 7 Objective average PSNR values of the watermarked color frames and average NC, NC1, and NC2 values of the extracted fused, primary, and secondary depth watermark frames for the Shark and Dancer 3DV streams with resizing and crop attacks

For the Gaussian noise attacks in Table 3, the proposed watermarking techniques give not only the highest PSNR values between the original and watermarked frames, but also the best correlation values between the original and extracted fused, primary, and secondary depth watermark frames. The DWT + Homomorphic + SVD technique achieves the best results in these cases. For the different (Motion, Disk, and Average) blurring attacks in Table 4, both the DWT + Homomorphic + SVD and the DCT + DSWT watermarking techniques again give the highest PSNR values between the original and watermarked frames and the best correlation values between the original and extracted fused, primary, and secondary depth watermark frames. The DWT + Homomorphic + SVD technique also achieves the best results under the different blurring attacks.
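A Gaussian noise attack of the kind evaluated in Table 3 can be simulated as in the sketch below. It assumes 8-bit frames stored as lists of rows and a noise variance expressed on the 0 to 255 intensity scale; the variances actually used in the experiments, and their scale, are not restated here. The function name and the `seed` parameter are illustrative.

```python
import random


def add_gaussian_noise(frame, variance, seed=0):
    """Return a noisy copy of an 8-bit frame (list of rows).

    Zero-mean Gaussian noise with the given variance is added to every
    pixel, and the result is rounded and clipped to the range 0..255.
    A fixed seed keeps the simulated attack reproducible.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    noisy = []
    for row in frame:
        noisy.append(
            [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        )
    return noisy
```

The attacked frame is then passed to the watermark extractor, and the NC between the original and extracted watermarks quantifies the robustness.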

For the JPEG compression attacks in Table 5, the proposed watermarking techniques give not only the highest PSNR values between the original and watermarked frames, but also the best correlation values between the original and extracted watermarks, and the DWT + Homomorphic + SVD technique achieves the best results. The same holds for the HEVC compression attacks in Table 6: the proposed techniques give the highest PSNR and the best correlation values, and the DWT + Homomorphic + SVD technique achieves the best results. For the resizing and crop attacks in Table 7, all the watermarking techniques presented in this paper give high PSNR values between the original and watermarked frames and high correlation values between the original and extracted fused, primary, and secondary depth watermark frames. The proposed watermarking techniques still achieve the best results, with the DWT + Homomorphic + SVD technique performing best. From all presented simulation results, we deduce that the proposed techniques consistently achieve superior PSNR and NC values, with a meaningful average gain in objective PSNR and NC for all tested 3DV + D HEVC frames in the presence of different types of attacks.
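The resizing and crop attacks can be approximated by the following sketch. It is an assumption-laden illustration rather than the attack implementation used in the experiments: cropping is modelled by zeroing the top-left corner of the frame, and resizing by nearest-neighbour downsampling followed by upsampling back to the original size. The function names, `fraction`, and `factor` are hypothetical.

```python
def crop_attack(frame, fraction=0.25):
    """Zero out the top-left corner spanning `fraction` of each dimension."""
    height, width = len(frame), len(frame[0])
    crop_h, crop_w = int(height * fraction), int(width * fraction)
    return [
        [0 if (r < crop_h and c < crop_w) else frame[r][c] for c in range(width)]
        for r in range(height)
    ]


def resize_attack(frame, factor=2):
    """Downsample by `factor`, then upsample back (nearest neighbour)."""
    height, width = len(frame), len(frame[0])
    small = [row[::factor] for row in frame[::factor]]
    return [
        [small[r // factor][c // factor] for c in range(width)]
        for r in range(height)
    ]
```

Both functions return a frame of the original size, so the extractor can be applied to the attacked frame without further alignment.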

The color and depth frames of the transmitted 3DV + D HEVC sequences normally need two separate channels for their transmission over networks. To further confirm the efficiency of the proposed watermarking techniques in minimizing the bandwidth required for transmitting the 3DV + D data, we ran additional simulation tests. The results show that the proposed watermarking techniques can jointly transmit the color and depth frames on the same channel by embedding multiple fused depth frames within the color frames of the 3DV + D. Table 8 shows the size in bytes of the original host color frame, the original primary, secondary, and fused watermark depth frames, and the watermarked color frames for the Shark and Dancer 3DV streams. The size of the fused watermark depth frame is smaller than the sum of the sizes of the primary and secondary watermark depth frames, and the size of the watermarked color frame is smaller than the sum of the sizes of the host color frame and the fused watermark depth frame. Therefore, the proposed techniques and all presented techniques give good results for minimizing the channel bandwidth required for transmitting the color and depth frames of the 3DV + D data. Instead of transmitting the color and depth frames separately, which needs more bandwidth, we transmit both on the same channel with a bandwidth smaller than that required for separate transmission. This is achieved by hiding multiple fused depth frames inside the color frames of the transmitted 3DV + D data. So, the proposed fusion and watermarking techniques improve the information embedding capacity, and subsequently save the transmission bit rate.
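The size comparison described above amounts to two inequalities and one ratio, which the following sketch makes explicit. The function and field names are illustrative, and any sizes passed to it in an example are placeholders rather than values from Table 8.

```python
def bandwidth_saving(host, primary, secondary, fused, watermarked):
    """Compare joint and separate transmission sizes (all in bytes).

    Separate transmission sends the host color frame and both depth
    frames; joint transmission sends only the watermarked color frame.
    """
    separate = host + primary + secondary
    return {
        # Fusion helps if the fused frame is smaller than both depth frames.
        "fusion_reduces_size": fused < primary + secondary,
        # Embedding helps if the watermarked frame is smaller than
        # the host frame plus the fused frame sent alongside it.
        "embedding_reduces_size": watermarked < host + fused,
        # Fraction of bytes saved relative to separate transmission.
        "saving_ratio": 1.0 - watermarked / separate,
    }
```

A positive `saving_ratio` means that sending the single watermarked color frame requires fewer bytes than sending the color frame and the two depth frames separately.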

Table 8 Size of the transmitted host, primary, secondary, fused watermark, and watermarked frames for the Shark and Dancer 3DV streams

Therefore, all presented objective and subjective results indicate that the proposed watermarking techniques are suitable for securing the transmission of 3DV + D HEVC data, and that they can survive different types of multimedia attacks. The proposed techniques give the highest robustness while maintaining good imperceptibility and security. Moreover, by embedding multiple fused depth frames within the color frames, they reduce the transmission bandwidth required for streaming the 3DV + D HEVC data and thus enhance the bandwidth efficiency of the communication channel. The subjective and objective results also show no remarkable difference between the original and the watermarked frames, which reveals the high fidelity of the proposed watermarking techniques. In all simulation experiments, the watermarked frames and the extracted fused, primary, and secondary watermarks of the proposed techniques have higher PSNR and NC values than those of the state-of-the-art techniques.

To further confirm the performance and security of the proposed techniques, we ran additional simulation tests on the PoznanStreet and PoznanHall2 3D-HEVC test sequences, which have different spatial resolutions and temporal characteristics. Two experiments were run for each proposed technique. The first uses the 3DV + D PoznanStreet sequence, with color frame 100 as the test host Y frame, depth frame 100 as the primary depth watermark frame, and depth frame 200 as the secondary depth watermark frame. The second uses the 3DV + D PoznanHall2 sequence, with color frame 50 as the test host Y frame, depth frame 50 as the primary depth watermark frame, and depth frame 150 as the secondary depth watermark frame. Fig. 10 shows these frames together with their fused depth watermark frames.

Fig. 10 The 3D-HEVC PoznanStreet and PoznanHall2 host color frame and the primary, secondary, and fused depth watermark frames
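To make the idea of a fused depth watermark frame concrete, the sketch below fuses two depth frames with a single-level Haar wavelet transform. It is only an illustration under stated assumptions: the averaging form of the Haar transform, the rule of averaging the approximation band and keeping the larger-magnitude detail coefficient, and all function names are choices made for this example, and the fusion rule actually used by the proposed techniques may differ. Frames must have even height and width.

```python
def haar2d(frame):
    """Single-level 2-D Haar transform (averaging form) of an even-sized frame."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(frame), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(frame[0]), 2):
            a, b = frame[r][c], frame[r][c + 1]
            d, e = frame[r + 1][c], frame[r + 1][c + 1]
            rows["LL"].append((a + b + d + e) / 4)  # approximation
            rows["LH"].append((a - b + d - e) / 4)  # detail across columns
            rows["HL"].append((a + b - d - e) / 4)  # detail across rows
            rows["HH"].append((a - b - d + e) / 4)  # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands


def inverse_haar2d(bands):
    """Exact inverse of haar2d."""
    n_rows, n_cols = len(bands["LL"]), len(bands["LL"][0])
    frame = [[0.0] * (2 * n_cols) for _ in range(2 * n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            ll, lh = bands["LL"][i][j], bands["LH"][i][j]
            hl, hh = bands["HL"][i][j], bands["HH"][i][j]
            frame[2 * i][2 * j] = ll + lh + hl + hh
            frame[2 * i][2 * j + 1] = ll - lh + hl - hh
            frame[2 * i + 1][2 * j] = ll + lh - hl - hh
            frame[2 * i + 1][2 * j + 1] = ll - lh - hl + hh
    return frame


def fuse_depth_frames(primary, secondary):
    """Fuse two depth frames: average LL, keep the larger-magnitude details."""
    bp, bs = haar2d(primary), haar2d(secondary)
    fused = {}
    for name in bp:
        fused[name] = [
            [
                (p + s) / 2 if name == "LL" else (p if abs(p) >= abs(s) else s)
                for p, s in zip(row_p, row_s)
            ]
            for row_p, row_s in zip(bp[name], bs[name])
        ]
    return inverse_haar2d(fused)
```

The fused frame has the same dimensions as each input depth frame, which is why a single fused watermark can stand in for two depth frames during embedding.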

Table 9 presents the objective average PSNR results of the watermarked color frames, the NC of the extracted fused, primary, and secondary depth watermark frames, and the embedding CPU times for the PoznanStreet and PoznanHall2 3DV streams with the proposed and the state-of-the-art embedding techniques in the absence of attacks. The two proposed techniques achieve higher PSNR and NC values for all tested 3DV frames than the related works. Their embedding CPU processing time is also acceptable for online and real-time 3DV communication applications.

Table 9 Objective average PSNR values of the watermarked color frames, average NC values of the extracted fused depth watermark frames, and average CPU embedding times for the PoznanStreet and PoznanHall2 3DV streams without attacks

Tables 10, 11, 12, 13, 14, and 15 compare the objective average PSNR values of the watermarked frames and the average NC values of the extracted watermark frames for the PoznanStreet and PoznanHall2 3DV sequences with the two proposed watermarking techniques and the state-of-the-art watermarking techniques in the presence of different types of attacks. For the rotation attacks in Table 10, the proposed DWT + Homomorphic + SVD and DCT + DSWT watermarking techniques give not only the highest PSNR values between the original and watermarked frames, but also the best correlation values between the original and extracted watermarks, and the DWT + Homomorphic + SVD technique achieves the best results. For the Gaussian noise attacks in Table 11, the proposed watermarking techniques again give the highest PSNR values and the best correlation values, and the DWT + Homomorphic + SVD technique achieves the best results. For the different (Motion, Disk, and Average) blurring attacks in Table 12, the proposed DWT + Homomorphic + SVD and DCT + DSWT watermarking techniques give the highest PSNR values and the best correlation values, and the DWT + Homomorphic + SVD technique achieves the best results. For the JPEG compression attacks in Table 13, the proposed hybrid watermarking techniques give not only the highest PSNR values between the original and watermarked frames, but also the best correlation values between the original and extracted watermarks, and the proposed DWT + Homomorphic + SVD scheme achieves the best results.

Table 10 Objective average PSNR values of the watermarked color frames and average NC values of the extracted fused depth watermark frames for the PoznanStreet and PoznanHall2 3DV streams with different rotation attacks
Table 11 Objective average PSNR values of the watermarked color frames and average NC values of the extracted fused depth watermark frames for the PoznanStreet and PoznanHall2 3DV streams with different Gaussian noise attacks
Table 12 Objective average PSNR values of the watermarked color frames and average NC values of the extracted fused depth watermark frames for the PoznanStreet and PoznanHall2 3DV streams with different types of blurring attacks
Table 13 Objective average PSNR values of the watermarked color frames and average NC values of the extracted fused depth watermark frames for the PoznanStreet and PoznanHall2 3DV streams with different types of JPEG compression attacks
Table 14 Objective average PSNR values of the watermarked color frames and average NC values of the extracted fused depth watermark frames for the PoznanStreet and PoznanHall2 3DV streams with HEVC compression attack
Table 15 Objective average PSNR values of the watermarked color frames and average NC values of the extracted fused depth watermark frames for the PoznanStreet and PoznanHall2 3DV streams with resizing and crop attacks

For the HEVC compression attacks in Table 14, the proposed watermarking techniques give not only the highest PSNR values between the original and watermarked frames, but also the best correlation values between the original and extracted watermarks, and the DWT + Homomorphic + SVD technique achieves the best results. For the resizing and crop attacks in Table 15, all watermarking techniques give high PSNR values between the original and watermarked frames and acceptable correlation values between the original and extracted watermarks. The proposed watermarking techniques achieve the best results, and the DWT + Homomorphic + SVD technique achieves better results than the DCT + DSWT technique under the resizing and crop attacks.

Table 16 shows the size in bytes of the original host color frame, the original primary, secondary, and fused watermark depth frames, and the watermarked color frame for the PoznanStreet and PoznanHall2 3DV streams. The size of the fused watermark depth frame is smaller than the sum of the sizes of the primary and secondary watermark depth frames, and the size of the watermarked color frame is smaller than the sum of the sizes of the host color frame and the fused watermark depth frame. Therefore, the proposed techniques and all presented techniques give good results for minimizing the channel bandwidth required for transmitting the color and depth frames of the 3DV + D data. Instead of transmitting the color and depth frames separately, which needs more bandwidth, we can transmit both on the same channel to save bandwidth. This is achieved by hiding multiple fused depth frames inside the color frames of the transmitted 3DV + D data. So, the proposed watermarking techniques improve the information embedding capacity, save the transmission bandwidth, and subsequently enhance the channel bandwidth efficiency.

Table 16 Size of the transmitted host, primary, secondary, fused watermark, and watermarked frames for the PoznanStreet and PoznanHall2 3DV streams

From all objective simulation results for the PoznanStreet and PoznanHall2 3DV streams, we notice that the two proposed techniques consistently achieve higher PSNR and NC values under the different types of attacks. The DWT + Homomorphic + SVD technique also performs better than the DCT + DSWT technique in all tests. The subjective and objective results for the tested Shark, Dancer, PoznanStreet, and PoznanHall2 3DV sequences show no remarkable difference between the original and the watermarked frames, which reveals the high fidelity of the two proposed watermarking techniques. In addition, the watermarked frames and extracted watermarks of the proposed watermarking techniques have higher PSNR and NC values than those of the state-of-the-art techniques. Moreover, the simulation results reveal that the proposed watermarking techniques have high robustness against different types of attacks, which supports the efficiency of these techniques.

5 Conclusions

This paper presented efficient and robust hybrid fusion-watermarking techniques for 3D-HEVC streams, together with a comparative study between these techniques and the existing state-of-the-art techniques. The evaluation metrics for the comparisons on standard 3D-HEVC streams include stability, reliability, and robustness. Experimental results revealed that the proposed techniques maintain higher robustness and fidelity than the existing watermarking techniques in the presence of different types of attacks. The proposed techniques also extract the fused, primary, and secondary depth watermark frames with a high probability of detection and good 3DV perceptual quality. Moreover, they show that the wavelet-based fusion technique can be used as a new way to embed more watermark frames in the watermarking system. Therefore, the proposed techniques can be utilized for minimizing the transmission bandwidth of color-plus-depth 3D-HEVC content.